A distributed counter system that handles 10M+ increments per second across 10B+ videos with eventual consistency for display.
The hard parts: a sharded counter architecture that absorbs bursty write traffic without hot rows,
a deduplication layer ensuring each user can like a video only once,
and a read path from Redis cache that serves stale-but-fast counts within ~5 seconds of real-time.
Instagram solved media pipelines; this problem isolates the counter subsystem at YouTube's write volume.
Dedup must prevent double-counting even under retries and network partitions
03
Scale Estimation
2B logged-in users. Average user likes ~5 videos/day. That's 10B likes/day ~ 115K writes/sec average, ~500K peak. Reads are heavier: every video page load fetches the like count. At 5B video views/day, that's ~58K count reads/sec, but most are served from cache. The dedup set is ~5 TB (300B entries at 16 bytes each over 30 days).
10B+
total videos
~500K
peak likes/sec
~58K
count reads/sec
~5 TB
dedup state (30-day window)
~5s
max display staleness
~10B
likes/day
< 10 ms
p99 read latency (cached)
2B+
registered users
The dominant challenge is write volume. A single Cassandra row per video becomes a hot partition instantly. The entire design revolves around distributing writes across shards and reconciling periodically.
04
API Design
POST/v1/videos/{id}/like
Like a video. Idempotent — re-liking a video you already liked returns 200 with no side effect. Requires authenticated user. Publishes event to Kafka after dedup check passes.
DELETE/v1/videos/{id}/like
Unlike a video. Removes the user from the like-set and publishes a decrement event. Idempotent — unliking when not liked returns 200.
GET/v1/videos/{id}/stats
Returns { like_count, dislike_count, view_count, user_liked: bool }. Count fields are eventually consistent (~5s staleness). user_liked is strongly consistent, read from dedup store.
05
High-Level Architecture
Two paths diverge from the API gateway:
Write path: Like request → dedup check (Bloom filter + Redis SET) → publish to Kafka → counter aggregator consumes and increments sharded counters → periodic flush to Cassandra → cache invalidation in Redis.
Read path: Stats request → Redis cache hit (most cases) → on miss, read from Cassandra counter table + rebuild cache. User's own like status read directly from dedup store.
Architecture — Like Write + Count ReadSVG
Request Flow — Step Through
Client · Web / Mobile→API Gateway · Auth + rate-limit→Like Service · Dedup + publish→Bloom Filter · Fast reject dupes→Kafka · like-events topic→Counter Aggregator · Sharded in-memory→Cassandra · Durable counter store
Click Next Step to walk through the request flow.
06
Deep Dive — Sharded Counters & Dedup
The Core Question
How do you handle 500K counter increments per second without creating hot rows in your database, while guaranteeing each user can only like once?
Step 1 — Dedup check. Two-tier: Bloom filter (in-memory, ~1% FP) rejects obvious duplicates instantly. On "maybe," check Redis SET likes:{video_id}. False positive from Bloom = one extra Redis lookup. Prevents double-counting under retries.
Step 2 — Publish to Kafka. Publish LikeEvent{video_id, user_id, LIKE|UNLIKE} to a topic partitioned by video_id. Kafka provides durability and ordering. Partitioning ensures all likes for one video go to the same consumer.
Step 3 — Sharded in-memory aggregation. Each aggregator instance holds a local shard counter per video. 50 aggregators handling a viral video at 100K likes/sec = ~2K increments/sec each — trivial in memory.
Step 4 — Periodic flush to Cassandra. Every ~5s, flush accumulated deltas. Table: video_id, shard_id, count_delta. Total = SUM(shards). Turns 500K writes/sec into ~10K batch writes every 5s.
Step 5 — Cache invalidation. After flush, update Redis count:{video_id}. Reads hit cache (sub-ms); on miss, sum shards from Cassandra. Staleness: at most ~5s.
The unlike path mirrors the like path: dedup confirms user liked it, publishes decrement, aggregator subtracts.
Sequence — Like Request FlowMermaid.js
sequenceDiagram
participant C as Client
participant API as API Gateway
participant LS as Like Service
participant BF as Bloom Filter
participant RS as Redis (dedup SET)
participant K as Kafka
participant AG as Counter Aggregator
participant CA as Cassandra
participant RC as Redis (count cache)
C->>API: POST /videos/{id}/like
API->>LS: forward (auth validated)
LS->>BF: check(user_id, video_id)
BF-->>LS: maybe new
LS->>RS: SISMEMBER likes:{vid} user_id
RS-->>LS: 0 (not member)
LS->>RS: SADD likes:{vid} user_id
LS->>K: publish LikeEvent{vid, uid, LIKE}
LS-->>C: 200 OK
K->>AG: consume (partitioned by vid)
AG->>AG: increment in-memory shard counter
Note over AG: every ~5 seconds
AG->>CA: batch flush shard deltas
AG->>RC: SET count:{vid} new_total
Note over RC: reads serve from here
07
Key Design Decisions & Tradeoffs
Sharded counters + periodic rollup
Distribute writes across N shards, sum on read
Each aggregator holds a local counter per video. Flush deltas every ~5s. Total = SUM(shards). Write throughput scales linearly.
Single atomic counter per video
One row, UPDATE count = count + 1
Simple but fatal at scale. A viral video becomes a hot partition. Even Cassandra LWTs can't handle 100K concurrent increments on one row.
Async writes via Kafka
Publish like event; aggregate asynchronously
Kafka absorbs write bursts; provides durability and replay. Like request returns in ~10ms (dedup + publish). Perfect for ~5s staleness tolerance.
Synchronous DB write on every like
Direct Cassandra UPDATE on each request
Strong consistency but 500K writes/sec directly to Cassandra is expensive. API latency tied to DB performance. A Cassandra hiccup = user-facing errors.
Bloom filter + Redis SET dedup
Two-tier: fast reject, then exact check
Bloom filter rejects ~99% of duplicate likes in nanoseconds (in-memory). The remaining 1% false-positives fall through to Redis SISMEMBER — one network hop. Handles 500K checks/sec without breaking a sweat.
DB-only dedup (query before write)
Check Cassandra for existing like row
Correct but slow. Every like requires a Cassandra read-before-write. At 500K/sec, that's 500K random reads — high latency, wastes DB capacity.
Redis cache with ~5s TTL refresh
Serve stale counts from cache
Aggregator pushes updated totals to Redis every flush cycle. Sub-ms reads. For "1.2M likes," 5 seconds of drift is invisible.
Read-through sum from Cassandra
Sum all shards on every read
Always fresh but 50 shard reads per video per request. At 58K reads/sec = 2.9M Cassandra reads/sec. Cache is the obvious win.
08
What Can Go Wrong
🐌
Kafka consumer lag — stale counts
Aggregator can't keep up with burst traffic. Displayed count lags by minutes instead of seconds.
-> Mitigation: auto-scale consumers on lag metric. Tune partition count for peak. Alert at > 10s lag.
🏚
Redis cache miss stampede
Viral video's cache key expires. 10K concurrent reads slam Cassandra to rebuild count.
Kafka partitions skewed: one partition gets all BTS likes, one aggregator overloaded.
-> Mitigation: partition by video_id hash. For extreme hotspots, sub-partition with random suffix.
🎯
Bloom filter false positive — lost likes
Bloom filter says "already liked" for a new user. Like silently dropped if Bloom is sole authority.
-> Mitigation: always fall through to Redis SET on Bloom positive. Bloom is an optimization, not a gatekeeper.
💥
Aggregator crash before flush
5 seconds of in-memory increments lost on crash. Counter drifts permanently low.
-> Mitigation: commit Kafka offsets only after flush. On restart, replay from last offset. No data loss.
🔁
Dedup Redis SET memory explosion
A video with 100M likes = 100M SET entries = 800 MB per key. Top 1000 videos = 800 GB in Redis.
-> Mitigation: per-video Bloom filters or HyperLogLog for approximate dedup. Shard across Redis cluster. Evict cold video sets to Cassandra.
⚠
Anti-patterns
🚫
Single row UPDATE counter per video (hot row)
Every like for a viral video contends on the same row. At 100K likes/sec, even Cassandra's last-write-wins semantics lose increments. The row becomes a global bottleneck.
Better: Sharded counters — distribute increments across N shard rows; sum on read. No single row sees more than write_rate / N operations/sec.
🚫
Synchronous Cassandra write on every like
500K direct writes/sec with quorum consistency. Each like adds 5-10ms of DB latency to the request path. A slow Cassandra node = user-facing 500s.
Better: Publish to Kafka (fire-and-forget from the user's perspective); aggregate and batch-flush to Cassandra every ~5s. API latency drops to ~10ms.
🚫
No dedup — user likes a video 1000 times
Without membership checks, a malicious user (or buggy client with retry loops) can inflate counts arbitrarily. Advertiser trust destroyed; engagement metrics meaningless.
Better: Two-tier dedup: Bloom filter for fast rejection of known duplicates, Redis SET for authoritative membership. Check before publishing to Kafka.
09
Interview Tips
Start with the write path, not the read path. The hard part is absorbing 500K writes/sec without hot rows. Most candidates jump to "use Redis INCR" — which works until you need durability, dedup, and crash recovery.
Name the sharded counter pattern explicitly. "I'll use N counter shards per video, flush deltas periodically" shows you understand distributed counting. This is the key insight the interviewer is looking for.
Don't forget dedup. A counter without dedup is a broken counter. Mention Bloom filter + authoritative store as a two-tier approach. Discuss the false-positive tradeoff.
Clarify consistency requirements early. "Is exact real-time count needed, or is ~5s eventual consistency OK?" This question shows maturity. YouTube shows approximate counts — that's a hint.
Kafka is the shock absorber. Explain why async is safe here: unlike a payment, a like being delayed by 5 seconds is invisible to the user. Kafka decouples the write spike from the DB capacity.
One row per video with a like_count integer. UPDATE on every like. Works fine at ~100 likes/sec. Simple, strongly consistent. This is where every startup begins.
2
Redis INCR + periodic DB flush
Move hot counter to Redis (INCR is atomic, single-threaded, ~100K ops/sec). Background job flushes Redis value to DB every 30s for durability. Handles ~50K likes/sec on one Redis instance.
3
Kafka + sharded aggregator
Kafka absorbs write bursts. Multiple aggregator instances each hold a shard counter in memory; batch-flush to Cassandra. Dedup via Redis SET. Scales to ~500K likes/sec. This is the core design described above.
4
Per-region counters + global rollup
Each geographic region (US, EU, APAC) runs its own aggregator cluster with local Kafka + Cassandra. A global rollup job merges regional totals every ~30s. Reduces cross-region latency; improves write locality.