Exercise · Infrastructure

Distributed Cache

Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.

Out of 10 points · 45 min whiteboard · Reference solution →

Prompt

Design an in-memory key-value store serving ~1M ops/sec per node with sub-millisecond latency. The hard parts: hash-slot sharding across a cluster of ~100 nodes so any key routes deterministically, eviction policies (LRU/LFU) that keep hot data in RAM while memory stays bounded, and a replication + persistence layer (leader-follower async replication, RDB snapshots + AOF logs) that survives node crashes while losing at most a small, bounded window of recent writes (async replication cannot guarantee zero loss). Redis, Memcached, Dragonfly -- same core pattern, different tradeoffs.

Time budget: 45 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.

Hints (progressive)

Hint 1

Start with the access pattern. "Read-heavy workload, cache-aside pattern, Redis as L2 between app and DB." This frames the problem correctly.
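A minimal sketch of the cache-aside pattern named in this hint, assuming plain dicts stand in for both Redis and the database. The class name, the ttl_seconds parameter, and the db_reads counter are hypothetical and exist only for illustration.

```python
import time


class CacheAside:
    """Cache-aside sketch: the app reads the cache first and fills it on a miss."""

    def __init__(self, db, ttl_seconds=60.0, clock=time.monotonic):
        self._db = db            # stand-in for the database (a plain dict)
        self._cache = {}         # stand-in for Redis: key -> (value, expires_at)
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be exercised in tests
        self.db_reads = 0        # counts how often a read fell through to the DB

    def read(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                       # cache hit
        self.db_reads += 1                        # miss or expired: go to the DB
        value = self._db.get(key)
        if value is not None:
            self._cache[key] = (value, self._clock() + self._ttl)
        return value

    def write(self, key, value):
        self._db[key] = value
        self._cache.pop(key, None)                # invalidate; next read refills
```

Writes invalidate rather than update the cached entry, so the cache only ever holds values that were read back from the database. Every entry also carries a TTL, which bounds staleness if an invalidation is missed.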

Hint 2

Name the sharding scheme. "CRC16 mod 16384 hash slots, client-side routing, MOVED/ASK redirects." Shows you know Redis internals, not just the API.
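The slot calculation in this hint can be sketched as follows. This assumes the CRC16 variant is XMODEM (polynomial 0x1021, initial value 0), which is the one the Redis Cluster specification describes, and it includes the hash-tag rule where only the text between the first "{" and the next "}" is hashed. The function names are illustrative, not a client library API.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, polynomial 0x1021, initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str, slots: int = 16384) -> int:
    """Map a key to a hash slot, honoring {hash tags} when present."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" falls back to hashing the whole key.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % slots


if __name__ == "__main__":
    for k in ("foo", "{user1000}.following", "{user1000}.followers"):
        print(k, key_slot(k))
```

Because the slot depends only on the key bytes, every client computes the same answer without coordination. MOVED/ASK redirects are then needed only when the client's slot-to-node map is out of date or a slot is mid-migration.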

Hint 3

Explain eviction before they ask. "allkeys-lru with sample size 10. Approximated LRU, not true LRU -- sampling gives up exactness to avoid the memory and bookkeeping of a full LRU list." This is the detail that separates senior from mid-level.
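A simplified sketch of sampled (approximated) LRU, to make the tradeoff in this hint concrete. It is not Redis's actual implementation: the class and parameter names are hypothetical, a logical tick replaces a wall clock, and the key list is copied on each eviction, which a real server would avoid.

```python
import random


class SampledLRUCache:
    """Approximated LRU: evict the oldest of a random sample, not the global oldest."""

    def __init__(self, capacity, samples=10, seed=None):
        self.capacity = capacity
        self.samples = samples
        self._data = {}                  # key -> [value, last_access_tick]
        self._tick = 0                   # logical clock, bumped on every access
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._data)

    def _next_tick(self):
        self._tick += 1
        return self._tick

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        entry[1] = self._next_tick()     # record the access
        return entry[0]

    def set(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            self._evict_one()
        self._data[key] = [value, self._next_tick()]

    def _evict_one(self):
        keys = list(self._data)          # O(n) copy, acceptable only in a sketch
        candidates = self._rng.sample(keys, min(self.samples, len(keys)))
        victim = min(candidates, key=lambda k: self._data[k][1])
        del self._data[victim]
```

Each key stores only a last-access stamp, with no linked list to maintain on every read. A larger sample brings the victim closer to the true least-recently-used key at the cost of more work per eviction.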

Rubric — 10 points

+2 Start with the access pattern. "Read-heavy workload, cache-aside pattern, Redis as L2 between app and DB." This frames the problem correctly.
+2 Name the sharding scheme. "CRC16 mod 16384 hash slots, client-side routing, MOVED/ASK redirects." Shows you know Redis internals, not just the API.
+2 Explain eviction before they ask. "allkeys-lru with sample size 10. Approximated LRU, not true LRU -- sampling gives up exactness to avoid the memory and bookkeeping of a full LRU list." This is the detail that separates senior from mid-level.
+2 Persistence is not optional. "RDB+AOF hybrid. RDB prefix for fast restart, AOF tail to cap data loss at roughly one second (fsync every second)." Don't hand-wave "we'll use persistence" -- name the mode.
+1 Address the thundering herd. Without prompting, mention cache stampede mitigation: lock-based fill, probabilistic early expiration, or background refresh. Shows production experience.
+1 Know when Redis is wrong. "Above 100 GB per node, consider sharding more aggressively or using a disk-backed store. Redis is not a database replacement." Showing boundaries earns trust.
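Of the stampede mitigations listed in the rubric, lock-based fill is the easiest to sketch. The example below assumes a single process with threads and an in-process dict as the cache. The class name and the loader callback are hypothetical, and TTLs are left out to keep the locking logic visible.

```python
import threading


class SingleFlightCache:
    """Lock-based fill: on a miss, one caller loads the value and the rest wait."""

    def __init__(self, loader):
        self._loader = loader            # hypothetical callback that hits the DB
        self._values = {}                # key -> cached value
        self._key_locks = {}             # key -> lock guarding that key's fill
        self._guard = threading.Lock()   # protects the _key_locks dict itself

    def get(self, key):
        if key in self._values:          # fast path: no locking on a hit
            return self._values[key]
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            if key in self._values:      # re-check: another thread filled it
                return self._values[key]
            value = self._loader(key)    # only one thread per key gets here
            self._values[key] = value
            return value
```

With many concurrent misses on the same hot key, the backing store sees one load instead of one per caller. Across multiple app servers the same idea needs a lock held in the cache itself, which this sketch does not attempt.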

Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.

Red flags (things that tank the interview)

Use Redis as the primary database -- "it's fast, who needs Postgres?"
Store 500 GB in a single Redis instance -- "just add more RAM"
Run KEYS * in production to find matching keys
Cache with no TTL -- "we'll invalidate manually when data changes"