Amazon ElastiCache is a managed in-memory caching service offering Redis OSS, Valkey (the fork Amazon backs after Redis's license change), and Memcached. It offloads hot reads from databases, accelerates session state, and supports pub/sub and streams — all with sub-millisecond latency.
For cost and capacity planning, watch the DatabaseMemoryUsagePercentage CloudWatch metric to catch memory pressure before evictions, and consider reserved nodes for savings on steady-state workloads.
import json

import boto3
import redis

# TLS-enabled connection to the cluster endpoint.
r = redis.Redis(
    host="prod-cache.abcdef.ng.0001.usw2.cache.amazonaws.com",
    port=6379,
    ssl=True,
    decode_responses=True,
)

def get_user(user_id: str) -> dict:
    """Cache-aside read: try the cache, fall back to DynamoDB on a miss."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached:
        return json.loads(cached)
    # Cache miss: read from DynamoDB and backfill the cache.
    ddb = boto3.resource("dynamodb").Table("Users")
    item = ddb.get_item(Key={"pk": user_id})["Item"]
    r.setex(key, 300, json.dumps(item, default=str))  # TTL 5 min
    return item

def invalidate_user(user_id: str) -> None:
    """Delete the cached entry after a write; the next read refills it."""
    r.delete(f"user:{user_id}")
aws elasticache create-replication-group \
  --replication-group-id prod-cache \
  --replication-group-description "App-tier hot cache" \
  --engine valkey \
  --engine-version 7.2 \
  --cache-node-type cache.r7g.large \
  --num-node-groups 3 \
  --replicas-per-node-group 2 \
  --automatic-failover-enabled \
  --multi-az-enabled \
  --transit-encryption-enabled \
  --at-rest-encryption-enabled \
  --kms-key-id alias/elasticache \
  --cache-subnet-group-name prod-private \
  --security-group-ids sg-0abc123 \
  --snapshot-retention-limit 7
Redis/Valkey: rich data types (sorted sets, hashes, streams, geo), persistence, replication, Multi-AZ failover, pub/sub, transactions, Lua scripting. Memcached: pure key-value, multithreaded (better single-node throughput on simple GET/SET), no persistence or replication. Pick Memcached only for trivial caches; pick Valkey/Redis for almost everything else.
Cluster mode shards data across multiple primary nodes (each with optional replicas), allowing horizontal scaling beyond a single node's RAM and write throughput. Enable it when the working set exceeds the largest node type, or when write throughput exceeds what a single primary can handle. Requires a cluster-aware Redis client.
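The routing that makes cluster mode work can be made concrete: per the Redis Cluster specification, a key maps to slot CRC16(key) mod 16384, and an optional {hash tag} lets related keys land on the same shard. A minimal sketch of that rule in pure Python (no server needed):

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16-CCITT (XMODEM), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster hash slots.

    If the key contains a non-empty {hash tag}, only the tag is hashed,
    so keys sharing a tag are guaranteed to live on the same shard.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag
            key = key[start + 1 : end]
    return crc16_xmodem(key.encode()) % 16384
```

Cluster-aware clients (e.g. redis.cluster.RedisCluster in redis-py) compute this mapping themselves and route each command to the correct primary; multi-key operations only work when all keys hash to the same slot, which is what hash tags are for.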
Cache-aside: the app checks the cache first, falls through to the DB on a miss, and writes update the DB then invalidate the cache. Simple and tolerant of cache failures, but stale reads are possible. Write-through: every write goes to the cache and the DB synchronously — the cache stays fresher, at the cost of extra write latency and a hard dependency on the cache being available.
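The difference between the two write paths fits in a few lines. This sketch uses plain dicts as stand-ins for the cache and database (in production these would be the Redis and DynamoDB calls shown earlier):

```python
cache: dict = {}  # stand-in for Redis
db: dict = {}     # stand-in for the backing database

# Cache-aside: writes update the DB and *invalidate* the cached copy;
# the next read misses and refills the cache from the DB.
def ca_write(key, value):
    db[key] = value
    cache.pop(key, None)  # invalidate, don't update

def ca_read(key):
    if key in cache:
        return cache[key]
    value = db[key]     # miss: fall through to the DB
    cache[key] = value  # backfill for subsequent reads
    return value

# Write-through: every write updates cache and DB together, so a read
# right after a write always hits fresh data — but the write path now
# fails if the cache is unavailable.
def wt_write(key, value):
    db[key] = value
    cache[key] = value
```

Note that cache-aside invalidates rather than updates on write: writing the new value into the cache directly reintroduces a race where two concurrent writers can leave the cache and DB disagreeing.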
The thundering-herd problem occurs when a popular key expires and many requests miss simultaneously, all hitting the backing store at once. Mitigations: random TTL jitter, single-flight locks (only one process refills while the others wait), early refresh before expiry (probabilistic early expiration), or background refresh jobs.
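Two of those mitigations fit in a few lines. Jitter spreads expirations so keys cached at the same moment don't expire together; the early-refresh rule is the probabilistic-early-expiration idea — the closer a key is to expiry, the more likely a reader volunteers to recompute it. The parameter names and defaults here are illustrative, not from any particular library:

```python
import math
import random

def jittered_ttl(base_seconds: int = 300, jitter_fraction: float = 0.1) -> int:
    """Return the base TTL plus/minus a uniform random jitter, so hot
    keys written in the same burst don't all expire in the same instant."""
    jitter = int(base_seconds * jitter_fraction)
    return base_seconds + random.randint(-jitter, jitter)

def should_refresh_early(ttl_remaining: float, recompute_cost: float,
                         beta: float = 1.0) -> bool:
    """Probabilistic early expiration: refresh before expiry with a
    probability that rises as the TTL runs down and as the value gets
    more expensive to recompute. Times are in seconds; beta > 1 makes
    refreshes more eager. Uses log(1 - U) to avoid log(0)."""
    return ttl_remaining <= -recompute_cost * beta * math.log(1.0 - random.random())
```

In practice a reader that wins this coin flip recomputes the value and resets the TTL while everyone else keeps serving the still-valid cached copy, so the herd never forms.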
Multi-AZ places read replicas in other AZs within one region (replication is asynchronous) for HA, with automatic failover typically completing in seconds to under a minute. Global Datastore replicates across regions (also async, typically sub-second lag) for low-latency local reads worldwide and regional DR — promoting a secondary region is a manual action.
ElastiCache Serverless suits spiky or unpredictable workloads where capacity planning is hard, dev/test environments, and new applications without established traffic patterns. It auto-scales data storage and request capacity in seconds and bills per usage. Trade-off: typically more expensive than a right-sized provisioned cluster at steady state.