Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.
Out of 10 points · 45 min whiteboard
01
Prompt
Design a distributed counter system that handles 10M+ increments per second across 10B+ videos, with eventual consistency for displayed counts. The hard parts are a sharded counter architecture that absorbs bursty write traffic without hot rows, a deduplication layer that ensures each user can like a video only once, and a Redis-cached read path that serves stale-but-fast counts within ~5 seconds of real time. Instagram solved media pipelines; this problem isolates the counter subsystem at YouTube's write volume.
Time budget: 45 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.
02
Hints (progressive)
Hint 1
Start with the write path, not the read path. The hard part is absorbing 500K writes/sec without hot rows. Most candidates jump to "use Redis INCR" — which works until you need durability, dedup, and crash recovery.
Hint 2
Name the sharded counter pattern explicitly. "I'll use N counter shards per video, flush deltas periodically" shows you understand distributed counting. This is the key insight the interviewer is looking for.
Hint 3
Don't forget dedup. A counter without dedup is a broken counter. Mention Bloom filter + authoritative store as a two-tier approach. Discuss the false-positive tradeoff.
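The two-tier dedup idea in Hint 3 can be sketched roughly as follows. This is a minimal illustration, not a reference design: the class names `BloomFilter` and `LikeDeduper`, the filter size, the hash scheme, and the in-memory set standing in for the authoritative likes store are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: str):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class LikeDeduper:
    """Tier 1: Bloom filter (cheap). Tier 2: authoritative store (exact)."""

    def __init__(self) -> None:
        self.bloom = BloomFilter()
        self.store = set()       # hypothetical stand-in for a durable likes table
        self.store_lookups = 0   # how often the expensive tier was consulted

    def try_like(self, user_id: str, video_id: str) -> bool:
        """Return True if this is a new like that should be counted."""
        key = f"{user_id}:{video_id}"
        if self.bloom.might_contain(key):
            # "Maybe seen": could be a false positive, so confirm exactly.
            self.store_lookups += 1
            if (user_id, video_id) in self.store:
                return False
        # "Definitely not seen", or a false positive that the store cleared.
        self.store.add((user_id, video_id))
        self.bloom.add(key)
        return True


deduper = LikeDeduper()
first = deduper.try_like("user-1", "video-9")    # new like
second = deduper.try_like("user-1", "video-9")   # repeat like, rejected
```

The false-positive tradeoff shows up in `store_lookups`: a false positive costs one extra lookup in the authoritative store but never drops or double-counts a like. In a real system the store write would likely be a conditional insert on a unique (user, video) key so that concurrent requests cannot both succeed; that part is not modeled here.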
03
Rubric — 10 points
+2 Start with the write path, not the read path. The hard part is absorbing 500K writes/sec without hot rows. Most candidates jump to "use Redis INCR" — which works until you need durability, dedup, and crash recovery.
+2 Name the sharded counter pattern explicitly. "I'll use N counter shards per video, flush deltas periodically" shows you understand distributed counting. This is the key insight the interviewer is looking for.
+2 Don't forget dedup. A counter without dedup is a broken counter. Mention Bloom filter + authoritative store as a two-tier approach. Discuss the false-positive tradeoff.
+2 Clarify consistency requirements early. "Is exact real-time count needed, or is ~5s eventual consistency OK?" This question shows maturity. YouTube shows approximate counts — that's a hint.
+2 Kafka is the shock absorber. Explain why async is safe here: unlike a payment, a like being delayed by 5 seconds is invisible to the user. Kafka decouples the write spike from the DB capacity.
Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.
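For review after scoring, the "N counter shards per video, flush deltas periodically" pattern from the rubric might look something like the sketch below. It is an in-memory illustration under stated assumptions: `ShardedCounter` is a hypothetical name, the `pending` dict stands in for deltas buffered behind a queue such as Kafka, `durable` stands in for the database rows, and `cache` stands in for the Redis read path. The periodic timer is left out; the caller invokes `flush()` on whatever interval the staleness budget allows.

```python
import random
from collections import defaultdict


class ShardedCounter:
    """Spread each video's increments over N shard rows, flush deltas in batches."""

    def __init__(self, num_shards: int = 8) -> None:
        self.num_shards = num_shards
        self.pending = defaultdict(int)  # (video_id, shard) -> unflushed delta
        self.durable = defaultdict(int)  # (video_id, shard) -> persisted count
        self.cache = {}                  # video_id -> last published total

    def increment(self, video_id: str, delta: int = 1) -> None:
        # A random shard keeps any single row from absorbing every write.
        shard = random.randrange(self.num_shards)
        self.pending[(video_id, shard)] += delta

    def flush(self) -> int:
        """Apply buffered deltas, one write per touched shard row.

        Returns the number of row writes performed.
        """
        batch, self.pending = self.pending, defaultdict(int)
        touched = set()
        for (video_id, shard), delta in batch.items():
            self.durable[(video_id, shard)] += delta
            touched.add(video_id)
        # Publish fresh totals for the read path.
        for video_id in touched:
            self.cache[video_id] = sum(
                self.durable.get((video_id, s), 0) for s in range(self.num_shards)
            )
        return len(batch)

    def read(self, video_id: str) -> int:
        # Reads see only the cached total, so they lag by up to one flush.
        return self.cache.get(video_id, 0)


counter = ShardedCounter(num_shards=4)
for _ in range(1000):
    counter.increment("video-9")
stale = counter.read("video-9")    # still 0: nothing has been flushed yet
row_writes = counter.flush()       # at most 4 row writes for 1000 increments
fresh = counter.read("video-9")    # 1000 after the flush
```

The point of the sketch is the write amplification in reverse: a burst of increments collapses into at most `num_shards` row writes per video per flush, and the displayed count is stale by at most one flush interval.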