Amazon ElastiCache is a managed in-memory caching service offering Redis OSS, Valkey (the fork Amazon backs after Redis's license change), and Memcached. It offloads hot reads from databases, accelerates session state, and supports pub/sub and streams — all with sub-millisecond latency.
For cost and capacity planning, watch the DatabaseMemoryUsagePercentage CloudWatch metric to catch memory pressure before evictions, and consider reserved nodes for savings on steady-state workloads.
import json

import boto3
import redis

# TLS-enabled connection to the cluster endpoint.
r = redis.Redis(
    host="prod-cache.abcdef.ng.0001.usw2.cache.amazonaws.com",
    port=6379,
    ssl=True,
    decode_responses=True,
)

def get_user(user_id: str) -> dict:
    """Cache-aside read: try the cache, fall back to DynamoDB on a miss."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached:
        return json.loads(cached)
    # Cache miss: read from DynamoDB and backfill the cache.
    ddb = boto3.resource("dynamodb").Table("Users")
    item = ddb.get_item(Key={"pk": user_id})["Item"]
    r.setex(key, 300, json.dumps(item, default=str))  # TTL 5 min
    return item

def invalidate_user(user_id: str) -> None:
    """Delete the cached entry after a write; the next read refills it."""
    r.delete(f"user:{user_id}")
aws elasticache create-replication-group \
  --replication-group-id prod-cache \
  --replication-group-description "App-tier hot cache" \
  --engine valkey \
  --engine-version 7.2 \
  --cache-node-type cache.r7g.large \
  --num-node-groups 3 \
  --replicas-per-node-group 2 \
  --automatic-failover-enabled \
  --multi-az-enabled \
  --transit-encryption-enabled \
  --at-rest-encryption-enabled \
  --kms-key-id alias/elasticache \
  --cache-subnet-group-name prod-private \
  --security-group-ids sg-0abc123 \
  --snapshot-retention-limit 7
Redis/Valkey: rich data types (sorted sets, hashes, streams, geo), persistence, replication, Multi-AZ failover, pub/sub, transactions, Lua scripting. Memcached: pure key-value, multithreaded (better single-node throughput on simple GET/SET), no persistence or replication. Pick Memcached only for trivial caches; pick Valkey/Redis for almost everything else.
Cluster mode shards data across multiple primary nodes (each with optional replicas), allowing horizontal scaling beyond a single node's RAM and write throughput. Enable it when the working set exceeds the largest node type, or when write throughput exceeds what a single primary can handle. Requires a cluster-aware Redis client.
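The routing that makes cluster mode work can be made concrete: per the Redis Cluster specification, a key maps to slot CRC16(key) mod 16384, and an optional {hash tag} lets related keys land on the same shard. A minimal sketch of that rule in pure Python (no server needed):

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16-CCITT (XMODEM), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster hash slots.

    If the key contains a non-empty {hash tag}, only the tag is hashed,
    so keys sharing a tag are guaranteed to live on the same shard.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag
            key = key[start + 1 : end]
    return crc16_xmodem(key.encode()) % 16384
```

Cluster-aware clients (e.g. redis.cluster.RedisCluster in redis-py) compute this mapping themselves and route each command to the correct primary; multi-key operations only work when all keys hash to the same slot, which is what hash tags are for.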
Cache-aside: the app checks the cache first, falls through to the DB on a miss, and writes update the DB then invalidate the cache. Simple and tolerant of cache failures, but stale reads are possible. Write-through: every write goes to the cache and the DB synchronously — the cache stays fresher, at the cost of extra write latency and a hard dependency on the cache being available.
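The difference between the two write paths fits in a few lines. This sketch uses plain dicts as stand-ins for the cache and database (in production these would be the Redis and DynamoDB calls shown earlier):

```python
cache: dict = {}  # stand-in for Redis
db: dict = {}     # stand-in for the backing database

# Cache-aside: writes update the DB and *invalidate* the cached copy;
# the next read misses and refills the cache from the DB.
def ca_write(key, value):
    db[key] = value
    cache.pop(key, None)  # invalidate, don't update

def ca_read(key):
    if key in cache:
        return cache[key]
    value = db[key]     # miss: fall through to the DB
    cache[key] = value  # backfill for subsequent reads
    return value

# Write-through: every write updates cache and DB together, so a read
# right after a write always hits fresh data — but the write path now
# fails if the cache is unavailable.
def wt_write(key, value):
    db[key] = value
    cache[key] = value
```

Note that cache-aside invalidates rather than updates on write: writing the new value into the cache directly reintroduces a race where two concurrent writers can leave the cache and DB disagreeing.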
The thundering-herd problem occurs when a popular key expires and many requests miss simultaneously, all hitting the backing store at once. Mitigations: random TTL jitter, single-flight locks (only one process refills while the others wait), early refresh before expiry (probabilistic early expiration), or background refresh jobs.
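Two of those mitigations fit in a few lines. Jitter spreads expirations so keys cached at the same moment don't expire together; the early-refresh rule is the probabilistic-early-expiration idea — the closer a key is to expiry, the more likely a reader volunteers to recompute it. The parameter names and defaults here are illustrative, not from any particular library:

```python
import math
import random

def jittered_ttl(base_seconds: int = 300, jitter_fraction: float = 0.1) -> int:
    """Return the base TTL plus/minus a uniform random jitter, so hot
    keys written in the same burst don't all expire in the same instant."""
    jitter = int(base_seconds * jitter_fraction)
    return base_seconds + random.randint(-jitter, jitter)

def should_refresh_early(ttl_remaining: float, recompute_cost: float,
                         beta: float = 1.0) -> bool:
    """Probabilistic early expiration: refresh before expiry with a
    probability that rises as the TTL runs down and as the value gets
    more expensive to recompute. Times are in seconds; beta > 1 makes
    refreshes more eager. Uses log(1 - U) to avoid log(0)."""
    return ttl_remaining <= -recompute_cost * beta * math.log(1.0 - random.random())
```

In practice a reader that wins this coin flip recomputes the value and resets the TTL while everyone else keeps serving the still-valid cached copy, so the herd never forms.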
Multi-AZ places read replicas in other AZs within one region (replication is asynchronous) for HA, with automatic failover typically completing in seconds to under a minute. Global Datastore replicates across regions (also async, typically sub-second lag) for low-latency local reads worldwide and regional DR — promoting a secondary region is a manual action.
ElastiCache Serverless suits spiky or unpredictable workloads where capacity planning is hard, dev/test environments, and new applications without established traffic patterns. It auto-scales data storage and request capacity in seconds and bills per usage. Trade-off: typically more expensive than a right-sized provisioned cluster at steady state.