Exercise · Storage & Data

Key-Value Store

Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.

Out of 10 points60 min whiteboardReference solution →
01

Prompt

10 million requests per second, 100 TB of data, single-digit millisecond p99 at any scale. The hard parts: a consistent hash ring that routes every request to the right partition without a central coordinator, an LSM storage engine (memtable + WAL + SSTables + compaction) that turns random writes into sequential I/O, and quorum replication across three AZs so a single node or zone death never loses data and never causes downtime. Amazon DynamoDB, Apache Cassandra, Riak -- same lineage from the 2007 Dynamo paper.

Time budget: 60 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.

02

Hints (progressive — click to reveal)

Hint 1

Start with consistent hashing. "hash(partition_key) maps to a virtual node on a consistent hash ring. Adding/removing nodes only remaps 1/N of keys." This is the foundation.

Hint 2

Name the LSM tree explicitly. "Writes append to WAL + memtable, flush to SSTables, background compaction merges them. Bloom filters avoid unnecessary reads." Shows you understand the storage engine, not just the API.

Hint 3

Quorum math: W + R > N. "3 replicas, write quorum of 2, read quorum of 2. W+R=4 > 3, so any read sees the latest write. For eventual consistency, R=1 is enough."

03

Rubric — 10 points

  • +2 Start with consistent hashing. "hash(partition_key) maps to a virtual node on a consistent hash ring. Adding/removing nodes only remaps 1/N of keys." This is the foundation.
  • +2 Name the LSM tree explicitly. "Writes append to WAL + memtable, flush to SSTables, background compaction merges them. Bloom filters avoid unnecessary reads." Shows you understand the storage engine, not just the API.
  • +2 Quorum math: W + R > N. "3 replicas, write quorum of 2, read quorum of 2. W+R=4 > 3, so any read sees the latest write. For eventual consistency, R=1 is enough."
  • +1 Partition key design is the interview. High cardinality, uniform distribution, matches the access pattern. Bad PK = hot partition = system down. Give a concrete example: user_id good, date bad.
  • +1 Mention adaptive capacity. This is the detail that shows you have gone beyond textbook knowledge. "DynamoDB borrows unused capacity from cold partitions to hot ones -- no manual intervention."
  • +1 Contrast with the original Dynamo paper. "Dynamo used vector clocks and sloppy quorums for AP. DynamoDB uses a single leader per partition for CP within a region, which is a simpler model. Global tables add eventual consistency across regions."
  • +1 Discuss single-table design. "Overload the sort key to store multiple entity types in one table. PK=USER#123 SK=PROFILE for metadata, SK=ORDER#2024-01-15 for orders. One Query returns all data for a user. This is the DynamoDB-native pattern."

Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.

04

Red flags (things that tank the interview)

  • Scan entire table for analytics queries
  • Using a date or timestamp as partition key
  • Storing items larger than 400 KB inline