YouTube Likes Counter

A distributed counter system that handles ~500K increments per second at peak across 10B+ videos, with eventual consistency for display. The hard parts: a sharded counter architecture that absorbs bursty write traffic without hot rows, a deduplication layer ensuring each user can like a video only once, and a Redis-cached read path that serves counts at most ~5 seconds stale. Instagram solved media pipelines; this problem isolates the counter subsystem at YouTube's write volume.

Core: Sharded Counters + Dedup | 10B+ videos | ~500K likes/sec peak | ~5s eventual consistency | Write-heavy
02

Requirements

Functional
  • User can like a video (toggle on)
  • User can unlike a video (toggle off)
  • Retrieve the total like count for any video
  • Each user can like a given video at most once (deduplication)
  • Display count is eventually consistent; an exact value is not required for display (large counts are abbreviated, e.g. "1.2M")
  • User's own like status must be strongly consistent (if I liked, I see the filled thumb)
Non-Functional
  • Write throughput: ~500K likes/sec peak (viral events, BTS comeback)
  • Read latency < 10 ms p99 for count retrieval
  • Display count within ~5 seconds of real-time
  • 99.99% availability for reads; 99.95% for writes
  • Scale to 10B+ videos, 2B+ users
  • Dedup must prevent double-counting even under retries and network partitions
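
The abbreviated display mentioned in the functional requirements might be produced by something like the sketch below. The function name `format_count` and the rule of truncating to one decimal place are assumptions for illustration; the source does not specify how rounding is done.

```python
def format_count(n: int) -> str:
    """Abbreviate a like count for display, e.g. 1_234_567 -> '1.2M'."""
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            # Integer math, truncated to one decimal, so the display
            # never overstates the real count.
            tenths = n * 10 // threshold
            whole, frac = divmod(tenths, 10)
            return f"{whole}.{frac}{suffix}" if frac else f"{whole}{suffix}"
    return str(n)


print(format_count(1_234_567))
print(format_count(100_000))
```

Because the display drops everything past the first decimal, a few seconds of counter drift on a large video rarely changes what the user sees.
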
03

Scale Estimation

2B logged-in users, each liking ~5 videos/day on average, gives 10B likes/day, or ~115K writes/sec average and ~500K at peak. Every video page load also fetches the like count: at 5B video views/day that is ~58K count reads/sec on average, most of them served from cache. The dedup set is ~5 TB (300B entries at 16 bytes each over a 30-day window).

  • 10B+ total videos
  • ~500K peak likes/sec
  • ~58K count reads/sec
  • ~5 TB dedup state (30-day window)
  • ~5s max display staleness
  • ~10B likes/day
  • < 10 ms p99 read latency (cached)
  • 2B+ registered users
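
The estimates above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope check using the figures stated in this section; the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

users = 2_000_000_000
likes_per_user_per_day = 5
views_per_day = 5_000_000_000
dedup_entry_bytes = 16
dedup_window_days = 30

likes_per_day = users * likes_per_user_per_day
avg_writes_per_sec = likes_per_day / SECONDS_PER_DAY
avg_reads_per_sec = views_per_day / SECONDS_PER_DAY
dedup_entries = likes_per_day * dedup_window_days
dedup_bytes = dedup_entries * dedup_entry_bytes

print(f"likes/day:         {likes_per_day:,}")
print(f"avg writes/sec:    {avg_writes_per_sec:,.0f}")
print(f"avg reads/sec:     {avg_reads_per_sec:,.0f}")
print(f"dedup entries:     {dedup_entries:,}")
print(f"dedup state (TB):  {dedup_bytes / 1e12:.1f}")
```

The ~500K/sec peak is a stated assumption (roughly 4x the average), not something derived from these numbers.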

The dominant challenge is write volume. A single Cassandra row per video becomes a hot partition instantly. The entire design revolves around distributing writes across shards and reconciling periodically.

04

API Design

POST /v1/videos/{id}/like

Like a video. Idempotent — re-liking a video you already liked returns 200 with no side effect. Requires authenticated user. Publishes event to Kafka after dedup check passes.

DELETE /v1/videos/{id}/like

Unlike a video. Removes the user from the like-set and publishes a decrement event. Idempotent — unliking when not liked returns 200.

GET /v1/videos/{id}/stats

Returns { like_count, dislike_count, view_count, user_liked: bool }. Count fields are eventually consistent (~5s staleness). user_liked is strongly consistent, read from dedup store.
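
The idempotency contract of the like and unlike endpoints can be sketched with in-memory stand-ins. Everything here is hypothetical: `LikeService` is not a real class from the design, a Python set per video stands in for the dedup store, and a plain list stands in for the Kafka topic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LikeService:
    """In-memory stand-in: sets play the dedup store, a list plays Kafka."""
    liked_by: dict[str, set[str]] = field(default_factory=dict)
    events: list[tuple[str, str, str]] = field(default_factory=list)

    def like(self, video_id: str, user_id: str) -> int:
        members = self.liked_by.setdefault(video_id, set())
        if user_id not in members:
            # Only the first like passes dedup and produces an event.
            members.add(user_id)
            self.events.append((video_id, user_id, "LIKE"))
        return 200  # same status whether or not anything changed

    def unlike(self, video_id: str, user_id: str) -> int:
        members = self.liked_by.get(video_id, set())
        if user_id in members:
            members.remove(user_id)
            self.events.append((video_id, user_id, "UNLIKE"))
        return 200

    def user_liked(self, video_id: str, user_id: str) -> bool:
        # Read straight from the dedup store, so it is never stale.
        return user_id in self.liked_by.get(video_id, set())


svc = LikeService()
svc.like("v1", "u1")
svc.like("v1", "u1")  # retry: no second event
print(svc.events)
```

Repeated calls return 200 without emitting a second event, which is what lets clients retry safely.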

05

High-Level Architecture

Two paths diverge from the API gateway:

  • Write path: Like request → dedup check (Bloom filter + Redis SET) → publish to Kafka → counter aggregator consumes and increments sharded counters → periodic flush to Cassandra → cache invalidation in Redis.
  • Read path: Stats request → Redis cache hit (most cases) → on miss, read from Cassandra counter table + rebuild cache. User's own like status read directly from dedup store.
Architecture diagram (Like Write + Count Read): Client (web / mobile) → API Gateway (auth + rate limit) → Like Service (dedup + publish) → Dedup Check (Bloom filter + Redis dedup SET holding user:video membership) → Kafka (like-events topic) → Counter Aggregator (sharded, in-memory) → Cassandra (durable counter store) and Redis (count cache). Read path: Redis cache → fallback to Cassandra.
06

Deep Dive — Sharded Counters & Dedup

The Core Question

How do you handle 500K counter increments per second without creating hot rows in your database, while guaranteeing each user can only like once?

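Step 1 — Dedup check. Two-tier: a Bloom filter (in-memory, ~1% FP) answers either "definitely new" or "maybe seen". A "definitely new" like skips the membership lookup; on "maybe seen," check the Redis SET likes:{video_id}, which is authoritative. A Bloom false positive costs one extra Redis lookup, never a lost like. This prevents double-counting under retries.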
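
A minimal sketch of the two-tier check follows. The `BloomFilter` and `Deduper` classes, the bit-array size, and the hash count are all illustrative assumptions, and a flat Python set of "video:user" keys stands in for the per-video Redis SET.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 7):
        self.num_bits = num_bits  # must be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        # Double hashing: derive k positions from two halves of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class Deduper:
    def __init__(self):
        self.bloom = BloomFilter()
        self.exact = set()      # stand-in for the authoritative Redis SET
        self.exact_lookups = 0  # how often the slow tier was consulted

    def try_like(self, video_id: str, user_id: str) -> bool:
        """Return True if this like is new and should be published."""
        key = f"{video_id}:{user_id}"
        if self.bloom.might_contain(key):
            # "Maybe seen": only the exact store can decide.
            self.exact_lookups += 1
            if key in self.exact:
                return False
        # "Definitely new", or a Bloom false positive: record and accept.
        self.bloom.add(key)
        self.exact.add(key)
        return True
```

A standard Bloom filter cannot delete entries, so in this sketch an unlike would only remove the key from the exact store; the leftover Bloom bits cost one extra exact lookup if the user likes the video again.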

Step 2 — Publish to Kafka. Publish LikeEvent{video_id, user_id, LIKE|UNLIKE} to a topic partitioned by video_id. Kafka provides durability and per-partition ordering. Keying by video_id sends all likes for a typical video to the same consumer; extreme hotspots get a salted key so their events spread across several partitions.
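
Partition selection by key can be sketched as below. This does not use a Kafka client: `partition_for`, `NUM_PARTITIONS`, and the `#suffix` salting scheme are hypothetical, and SHA-256 stands in for whatever hash the real partitioner applies.

```python
import hashlib
import random

NUM_PARTITIONS = 64  # assumed topic size, for illustration only


def partition_for(video_id: str, hot_shards: int = 1, rng=random) -> int:
    """Map an event key to a partition.

    Normal videos hash on video_id alone, so all their events land on one
    partition. Hot videos get a random suffix so they span up to
    `hot_shards` different keys, and therefore several partitions.
    """
    key = video_id
    if hot_shards > 1:
        key = f"{video_id}#{rng.randrange(hot_shards)}"
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


print(partition_for("video-123"))  # stable for a given id
print(len({partition_for("viral", hot_shards=50) for _ in range(2000)}))
```

Salting gives up single-partition ordering for the hot video. That is tolerable here because increments commute and dedup has already run before the event is published.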

Step 3 — Sharded in-memory aggregation. Each aggregator instance holds a local shard counter per video. With a viral video's events spread across 50 aggregators, 100K likes/sec works out to ~2K increments/sec each, which is trivial in memory.

Step 4 — Periodic flush to Cassandra. Every ~5s, flush accumulated deltas. Table: video_id, shard_id, count_delta. Total = SUM(shards). Turns 500K writes/sec into ~10K batch writes every 5s.
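
The aggregate-then-flush cycle might look like the following sketch. `Aggregator` and `CounterTable` are hypothetical names, and a dictionary keyed by (video_id, shard_id) stands in for the Cassandra table; the real flush interval and batching are not modeled.

```python
from collections import defaultdict


class Aggregator:
    def __init__(self, shard_id: int):
        self.shard_id = shard_id
        self.pending = defaultdict(int)  # video_id -> delta since last flush

    def consume(self, video_id: str, action: str) -> None:
        self.pending[video_id] += 1 if action == "LIKE" else -1

    def flush(self):
        """Return (video_id, shard_id, count_delta) rows and reset state."""
        rows = [(vid, self.shard_id, d) for vid, d in self.pending.items() if d]
        self.pending.clear()
        return rows


class CounterTable:
    """Stand-in for the durable table keyed by (video_id, shard_id)."""

    def __init__(self):
        self.counts = defaultdict(int)

    def apply(self, rows) -> None:
        for vid, shard, delta in rows:
            self.counts[(vid, shard)] += delta

    def total(self, video_id: str) -> int:
        # Total = SUM(shards)
        return sum(c for (vid, _), c in self.counts.items() if vid == video_id)


a0, a1 = Aggregator(0), Aggregator(1)
for _ in range(3):
    a0.consume("v", "LIKE")
a1.consume("v", "LIKE")
a1.consume("v", "LIKE")
a1.consume("v", "UNLIKE")

table = CounterTable()
table.apply(a0.flush())
table.apply(a1.flush())
print(table.total("v"))
```

However many events arrive in a window, each aggregator writes at most one row per video per flush, which is where the write reduction comes from.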

Step 5 — Cache invalidation. After flush, update Redis count:{video_id}. Reads hit cache (sub-ms); on miss, sum shards from Cassandra. Staleness: at most ~5s.
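
The read path with its fallback can be sketched as a small cache-aside reader. `CountReader` is a hypothetical name, and plain dictionaries stand in for both Redis and the Cassandra shard rows.

```python
class CountReader:
    def __init__(self, shard_rows):
        self.shard_rows = shard_rows  # {(video_id, shard_id): count}, stand-in for Cassandra
        self.cache = {}               # stand-in for Redis
        self.db_reads = 0

    def get_count(self, video_id: str) -> int:
        key = f"count:{video_id}"
        if key in self.cache:
            return self.cache[key]  # common case: served from cache
        # Cache miss: sum every shard row for this video, then repopulate.
        self.db_reads += 1
        total = sum(c for (vid, _), c in self.shard_rows.items() if vid == video_id)
        self.cache[key] = total
        return total

    def on_flush(self, video_id: str, new_total: int) -> None:
        # The aggregator pushes the fresh total after each flush.
        self.cache[f"count:{video_id}"] = new_total


reader = CountReader({("v", 0): 3, ("v", 1): 4})
print(reader.get_count("v"), reader.db_reads)
print(reader.get_count("v"), reader.db_reads)
```

Only the first read touches the shard rows; later reads are served from the cached total until the next flush overwrites it.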

The unlike path mirrors the like path: dedup confirms user liked it, publishes decrement, aggregator subtracts.

Sequence diagram (Like Request Flow, Mermaid.js):

sequenceDiagram
    participant C as Client
    participant API as API Gateway
    participant LS as Like Service
    participant BF as Bloom Filter
    participant RS as Redis (dedup SET)
    participant K as Kafka
    participant AG as Counter Aggregator
    participant CA as Cassandra
    participant RC as Redis (count cache)
    C->>API: POST /videos/{id}/like
    API->>LS: forward (auth validated)
    LS->>BF: check(user_id, video_id)
    BF-->>LS: maybe seen
    LS->>RS: SISMEMBER likes:{vid} user_id
    RS-->>LS: 0 (not member)
    LS->>RS: SADD likes:{vid} user_id
    LS->>K: publish LikeEvent{vid, uid, LIKE}
    LS-->>C: 200 OK
    K->>AG: consume (partitioned by vid)
    AG->>AG: increment in-memory shard counter
    Note over AG: every ~5 seconds
    AG->>CA: batch flush shard deltas
    AG->>RC: SET count:{vid} new_total
    Note over RC: reads serve from here
07

Key Design Decisions & Tradeoffs

Sharded counters + periodic rollup

Distribute writes across N shards, sum on read

Each aggregator holds a local counter per video. Flush deltas every ~5s. Total = SUM(shards). Write throughput scales linearly.

Single atomic counter per video

One row, UPDATE count = count + 1

Simple but fatal at scale. A viral video becomes a hot partition. Even Cassandra LWTs can't handle 100K concurrent increments on one row.

Async writes via Kafka

Publish like event; aggregate asynchronously

Kafka absorbs write bursts; provides durability and replay. Like request returns in ~10ms (dedup + publish). Perfect for ~5s staleness tolerance.

Synchronous DB write on every like

Direct Cassandra UPDATE on each request

Strong consistency but 500K writes/sec directly to Cassandra is expensive. API latency tied to DB performance. A Cassandra hiccup = user-facing errors.

Bloom filter + Redis SET dedup

Two-tier: fast reject, then exact check

A Bloom negative is definitive, so new likes are recognized in memory without an exact membership lookup. Bloom positives (true duplicates plus ~1% false positives) fall through to Redis SISMEMBER, one network hop. Handles 500K checks/sec comfortably.

DB-only dedup (query before write)

Check Cassandra for existing like row

Correct but slow. Every like requires a Cassandra read-before-write. At 500K/sec, that's 500K random reads — high latency, wastes DB capacity.

Redis cache with ~5s TTL refresh

Serve stale counts from cache

Aggregator pushes updated totals to Redis every flush cycle. Sub-ms reads. For "1.2M likes," 5 seconds of drift is invisible.

Read-through sum from Cassandra

Sum all shards on every read

Always fresh but 50 shard reads per video per request. At 58K reads/sec = 2.9M Cassandra reads/sec. Cache is the obvious win.

08

What Can Go Wrong

🐌
Kafka consumer lag — stale counts
Aggregator can't keep up with burst traffic. Displayed count lags by minutes instead of seconds.
-> Mitigation: auto-scale consumers on lag metric. Tune partition count for peak. Alert at > 10s lag.
🏚
Redis cache miss stampede
Viral video's cache key expires. 10K concurrent reads slam Cassandra to rebuild count.
-> Mitigation: jitter TTL + single-flight rebuild (one goroutine rebuilds, others wait). Pre-warm trending videos.
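
A rough sketch of the two mitigations, using threads in place of goroutines. `SingleFlight` and `jittered_ttl` are hypothetical helpers, and the 20% jitter spread is an assumed value.

```python
import random
import threading


class _Call:
    def __init__(self):
        self.done = threading.Event()
        self.value = None


class SingleFlight:
    """Collapse concurrent rebuilds of the same key into one call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}

    def do(self, key, rebuild):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = self._calls[key] = _Call()
        if not leader:
            call.done.wait()  # followers block until the leader finishes
            return call.value
        try:
            call.value = rebuild()  # only the leader hits the database
        finally:
            # Limitation of this sketch: if rebuild() raises, followers
            # wake up and receive None.
            with self._lock:
                del self._calls[key]
            call.done.set()
        return call.value


def jittered_ttl(base_seconds: float = 5.0, spread: float = 0.2, rng=random) -> float:
    """Randomize TTLs so hot keys do not all expire at the same instant."""
    return base_seconds * (1 + rng.uniform(-spread, spread))
```

With this in place, a burst of readers that miss on the same key triggers one rebuild; the rest wait and reuse its result.
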
Counter shard imbalance
Kafka partitions skewed: one partition gets all BTS likes, one aggregator overloaded.
-> Mitigation: hashing video_id spreads typical videos evenly across partitions. For extreme hotspots, sub-partition by appending a random suffix to the key.
🎯
Bloom filter false positive — lost likes
Bloom filter says "already liked" for a new user. Like silently dropped if Bloom is sole authority.
-> Mitigation: always fall through to Redis SET on Bloom positive. Bloom is an optimization, not a gatekeeper.
💥
Aggregator crash before flush
5 seconds of in-memory increments lost on crash. Counter drifts permanently low.
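-> Mitigation: commit Kafka offsets only after flush. On restart, replay from the last committed offset, so no increments are lost. Make the flush idempotent so a replay after a partial flush does not double-count.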
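
The commit-after-flush recovery can be simulated without Kafka. In this sketch a list stands in for the partition log and a dictionary for the durable store. Storing the committed offset next to the counts is an assumption that makes flush and commit atomic; with separate systems the flush itself would need to be idempotent.

```python
class ReplayingAggregator:
    def __init__(self, log, store):
        self.log = log      # list of (video_id, delta), stand-in for a partition
        self.store = store  # {"counts": {...}, "committed_offset": int}
        self.offset = store["committed_offset"]  # resume from last commit
        self.pending = {}

    def poll(self, max_events: int) -> None:
        batch = self.log[self.offset:self.offset + max_events]
        for vid, delta in batch:
            self.pending[vid] = self.pending.get(vid, 0) + delta
        self.offset += len(batch)

    def flush(self) -> None:
        # Counts and offset move together, so a replay never re-applies
        # events that were already flushed.
        for vid, delta in self.pending.items():
            self.store["counts"][vid] = self.store["counts"].get(vid, 0) + delta
        self.store["committed_offset"] = self.offset
        self.pending = {}


log = [("v", 1)] * 10
store = {"counts": {}, "committed_offset": 0}

first = ReplayingAggregator(log, store)
first.poll(4)
first.flush()  # 4 events durable, offset 4 committed
first.poll(3)  # 3 more held in memory, then the process "crashes"

second = ReplayingAggregator(log, store)  # restart from committed offset
second.poll(100)
second.flush()
print(store)
```

The three events lost in memory are re-read after the restart, so the final count matches the log.
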
🔁
Dedup Redis SET memory explosion
A video with 100M likes = 100M SET entries = 800 MB per key. Top 1000 videos = 800 GB in Redis.
-> Mitigation: per-video Bloom filters for approximate dedup (HyperLogLog estimates cardinality and cannot answer membership queries). Shard across a Redis cluster. Evict cold video sets to Cassandra.
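
To put numbers on the memory trade-off, the standard Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits, can be compared against the 8 bytes per entry assumed above. The helper name `bloom_bits` is illustrative.

```python
import math


def bloom_bits(n: int, p: float) -> int:
    """Optimal Bloom filter size in bits for n items at false-positive rate p."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))


likes = 100_000_000
set_bytes = likes * 8  # the 8 bytes/entry figure used above
bloom_bytes = bloom_bits(likes, 0.01) // 8

print(f"exact SET:  {set_bytes / 1e6:,.0f} MB")
print(f"Bloom @ 1%: {bloom_bytes / 1e6:,.0f} MB")
```

At a 1% false-positive rate the filter needs roughly 9.6 bits per entry, a large saving over the exact set. The cost is that a Bloom filter used alone would reject about 1% of genuine new likes, so this only suits videos where that loss is acceptable.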

Anti-patterns

🚫
Single row UPDATE counter per video (hot row)

Every like for a viral video contends on the same row. At 100K likes/sec, concurrent read-modify-write updates under Cassandra's last-write-wins semantics lose increments. The row becomes a global bottleneck.

Better: Sharded counters — distribute increments across N shard rows; sum on read. No single row sees more than write_rate / N operations/sec.
🚫
Synchronous Cassandra write on every like

500K direct writes/sec with quorum consistency. Each like adds 5-10ms of DB latency to the request path. A slow Cassandra node = user-facing 500s.

Better: Publish to Kafka (fire-and-forget from the user's perspective); aggregate and batch-flush to Cassandra every ~5s. API latency drops to ~10ms.
🚫
No dedup — user likes a video 1000 times

Without membership checks, a malicious user (or buggy client with retry loops) can inflate counts arbitrarily. Advertiser trust destroyed; engagement metrics meaningless.

Better: Two-tier dedup: a Bloom filter to fast-path likes that are definitely new, and a Redis SET for authoritative membership. Check before publishing to Kafka.
09

Interview Tips

  1. Start with the write path, not the read path. The hard part is absorbing 500K writes/sec without hot rows. Most candidates jump to "use Redis INCR" — which works until you need durability, dedup, and crash recovery.
  2. Name the sharded counter pattern explicitly. "I'll use N counter shards per video, flush deltas periodically" shows you understand distributed counting. This is the key insight the interviewer is looking for.
  3. Don't forget dedup. A counter without dedup is a broken counter. Mention Bloom filter + authoritative store as a two-tier approach. Discuss the false-positive tradeoff.
  4. Clarify consistency requirements early. "Is exact real-time count needed, or is ~5s eventual consistency OK?" This question shows maturity. YouTube shows approximate counts — that's a hint.
  5. Kafka is the shock absorber. Explain why async is safe here: unlike a payment, a like being delayed by 5 seconds is invisible to the user. Kafka decouples the write spike from the DB capacity.
11

Evolution

1

Single DB counter column

One row per video with a like_count integer. UPDATE on every like. Works fine at ~100 likes/sec. Simple, strongly consistent. This is where every startup begins.

2

Redis INCR + periodic DB flush

Move hot counter to Redis (INCR is atomic, single-threaded, ~100K ops/sec). Background job flushes Redis value to DB every 30s for durability. Handles ~50K likes/sec on one Redis instance.

3

Kafka + sharded aggregator

Kafka absorbs write bursts. Multiple aggregator instances each hold a shard counter in memory; batch-flush to Cassandra. Dedup via Redis SET. Scales to ~500K likes/sec. This is the core design described above.

4

Per-region counters + global rollup

Each geographic region (US, EU, APAC) runs its own aggregator cluster with local Kafka + Cassandra. A global rollup job merges regional totals every ~30s. Reduces cross-region latency; improves write locality.
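
The global rollup step amounts to summing per-region totals for each video. A minimal sketch, assuming each region reports a plain mapping of video_id to count (the function name and data shape are hypothetical):

```python
from collections import Counter


def global_rollup(regional_totals):
    """Merge {region: {video_id: count}} into one global Counter."""
    total = Counter()
    for region_counts in regional_totals.values():
        total.update(region_counts)  # Counter.update adds counts per key
    return total


rollup = global_rollup({
    "US": {"v1": 10, "v2": 3},
    "EU": {"v1": 5},
    "APAC": {"v1": 1, "v3": 7},
})
print(rollup["v1"], rollup["v2"], rollup["v3"])
```

This sketch takes for granted that a given user's likes are deduplicated within one region; how dedup works across regions is outside what the source describes.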

5

Real-time streaming counters (Flink) + approximate display

Replace batch aggregators with Apache Flink streaming jobs. Counters update in real-time with exactly-once semantics. Display uses HyperLogLog for approximate unique counts. Counts shown as "~1.2M" — nobody needs the exact number.
