News Feed

02

Requirements

Functional

Users can create posts (text, images, links)
Users can follow / unfollow other accounts
Users see a personalized feed of posts from followed accounts
Feed is algorithmically ranked — not pure reverse-chron
Users can like, comment, share posts
Comments support nested replies
Push notifications when followed users post

Non-Functional

Feed load under 200ms p99 latency
99.99% availability — feed never fully down
Handles celebrities with 10M+ followers
Eventual consistency acceptable — 1-2s post propagation fine
Horizontally scalable — no single bottleneck
Spam and bot detection on comments

03

Scale Estimation

Metric	Calculation	Result
Daily Active Users	Twitter/Instagram scale assumption	500M
Posts/day	500M × 1 post per 3 days	~167M
Post write RPS	167M ÷ 86,400 (peak 2×)	~1,900 / peak 4K
Feed loads/day	500M × 5 sessions × 2 pages	2.5B
Read RPS	2.5B ÷ 86,400 (peak 2×)	~29K / peak 60K
Likes/sec	500M × 10 likes/day ÷ 86,400	~58K / peak 120K
Comments/sec	500M × 2 comments/day ÷ 86,400	~11,500
Total engagement writes/sec	likes + comments + shares	~87K / peak 180K
Fan-out Redis writes/sec	1,900 posts × avg 200 followers	~380K
Redis feed cache size	500M users × 1.6 KB per feed	~800 GB
Post storage (5 years)	167M/day × 500B × 365 × 5	~150 TB

Key Insight

The 15:1 read-to-write ratio drives every architectural decision. Optimise aggressively for reads. The 87K engagement writes/second is the real write challenge — Postgres alone cannot handle this. The fan-out problem (380K Redis writes/sec) is why the hybrid strategy exists.

04

API Design

GET/v1/feedFetch personalized feed (cursor-paginated)

// Query params: cursor=<timestamp:post_id>, limit=20 // Response 200 { "posts": [...], "next_cursor": "1735000000:abc123", "has_more": true }

POST/v1/postsCreate a new post

// Request { "content": "Hello world", "media_ids": ["img_123"] } // Response 201 — optimistic, fan-out async { "post_id": "abc123", "created_at": "2025-01-01T00:00:00Z" }

POST/v1/posts/:id/likeLike a post (idempotent)

// Response 200 — counter updated in Redis immediately { "like_count": 1424, "liked_by_me": true }

POST/v1/posts/:id/commentsAdd a comment (supports nested replies)

// Request — parent_id null for top-level, set for replies { "content": "Great post!", "parent_id": null } // Response 201 { "comment_id": "cmt_456", "path": "0001", "depth": 0 }

POST/v1/users/:id/followFollow a user

// Response 200 — updates follow graph, triggers feed rebuild if needed { "following": true, "follower_count": 10241 }

05

High-Level Architecture

Architecture — Full SystemSVG Diagram

Write Path

Client → API Server → Posts DB (sync) → Kafka event (async) → Fan-out workers → Redis sorted set per follower. Post creation is fast; fan-out happens in the background.

Read Path

Client → API Server → Redis feed cache (top 200 candidates) → hydrate post content → Re-ranker (10-20ms) → return top 20. Feed load target: under 50ms including re-ranking.

Kafka as shock absorber

Decouples post creation from fan-out. A celebrity post triggers 10M Redis writes — doing this synchronously would make post creation painfully slow. Kafka absorbs the spike; workers consume at their own pace.

Cassandra for engagement

87K engagement writes/second. Cassandra handles 100K+ writes/sec on a modest cluster with linear horizontal scaling. Postgres with this write volume would need 20+ shards — operational nightmare.

06

Deep Dive — The Fan-out Problem

The Core Question

When Beyoncé posts, her 10 million followers all need to see it. When do you update their feeds? At write time (when she posts) or at read time (when each follower opens the app)? Neither works alone. The hybrid approach is the real answer.

Option A

Fan-out on Read

Build each follower's feed when they open the app. Merge posts from all 500 followed accounts. Write cost: cheap (1 DB write). Read cost: 500 DB queries per feed load at 29K read RPS = catastrophic.

✗ Fails at scale

Option B

Fan-out on Write

Push post ID into every follower's Redis feed immediately. Read: O(1) — one Redis lookup. Write: 10M Redis writes for a celebrity post. Blocks post creation. Write storm problem.

~ Celebrity problem

Option C — Chosen

Hybrid Fan-out

Regular users (<10K followers) → fan-out on write. Cheap to push to small follower lists. Celebrities (>10K) → fan-out on read. At read time, merge the pre-built feed with fresh posts from a handful of celebrity accounts you follow. The merge is tiny (5-10 sources), not catastrophic.

✓ Best of both worlds

Algorithmic Ranking

Two-Stage Ranking

Stage 1 (write time): Pre-score = unix_timestamp + small engagement boost. Stored as sorted set score. Fast, no ML. Stage 2 (read time): Pull top 200 candidates, run lightweight re-ranker with fresh signals (10-20ms). ML model trains offline only — never scores at query time.

✓ Fast + relevant

Sequence — Feed Read FlowMermaid.js

sequenceDiagram participant C as Client participant API as API Server participant Redis as Feed Cache participant Rank as Re-ranker participant DB as Posts DB C->>API: GET /v1/feed API->>Redis: ZREVRANGE feed:user123 0 199 alt Feed exists (cache hit) Redis-->>API: 200 post IDs + scores API->>DB: Hydrate content (batch fetch) DB-->>API: Post content API->>Rank: Re-rank top 200 (~15ms) Rank-->>API: Top 20 ranked posts API-->>C: 200 OK — feed in ~50ms else Cache miss (expired or new user) API->>DB: Rebuild feed from follows + posts DB-->>API: Posts from followees API->>Redis: Cache rebuilt feed (TTL 24h+jitter) API-->>C: 200 OK — feed in ~300ms end

Request Flow — Step Through

Client · Posts content→API Server · Stateless→Posts DB · Postgres→Kafka · Fan-out queue→Fan-out Worker · Parallel→Redis Feed · Sorted set

Click Next Step to walk through the request flow.

11

Key Design Decisions & Tradeoffs

Chosen

Async fan-out via Kafka

Post creation is fast (~10ms). Fan-out happens in background. Kafka provides durability, retry, and backpressure. A celebrity post triggers 10M Redis writes without slowing the API.

✓ Decoupled, durable, scalable

Alternative

Synchronous fan-out

Write to all follower feeds before returning 200 OK. Simple to reason about, but post creation latency scales with follower count. Celebrity posts time out. No retry on partial failure.

✗ Unusable at celebrity scale

Chosen

Redis INCR + Cassandra + Postgres flush

Three-layer engagement writes. Redis absorbs 87K writes/sec. Cassandra stores durable records at scale. Postgres gets batch-flushed aggregates. Each layer does what it's best at.

✓ Handles 180K writes/sec peak

Alternative

Postgres only for engagement

Simpler. Works fine up to ~20K writes/sec. At 87K sustained, needs 20+ shards. Operationally painful. Index contention on hot posts during viral moments causes cascading slowdowns.

~ Fine at <10K users

Chosen

Materialized path for comments

Single index scan for any subtree. O(1) insert. ORDER BY path returns depth-first tree order. No recursive CTEs. Works at celebrity post scale (50K comments/min).

✓ Fast reads + fast writes

Alternative

Adjacency list

Simpler schema. But fetching a thread requires recursive CTEs — one DB join per nesting level. Catastrophic on a post with 50K comments and 10 levels of nesting under viral conditions.

✗ Recursive queries fail at scale

12

What Can Go Wrong

Design Principle

Every failure answer has three parts: what breaks, what the user experiences, and how you recover. Not just "add a replica" — but what happens in the gap before the replica promotes, and does data get lost?

⚡

API Server Crash

In-flight requests fail. No data loss because API servers are stateless — no in-memory state that matters. LB health checks detect the dead instance within 5-10 seconds and stop routing to it.

→ Fix: Stateless design + LB health checks every 5s
→ Fix: Min 3 instances across 3 AZs — one AZ down = no impact
→ Fix: Graceful shutdown — drain in-flight requests over 30s on SIGTERM

🌊

Kafka Lag / Broker Down

Fan-out workers fall behind. Posts are written to the DB but don't appear in follower feeds for minutes. If a broker dies, partition leaders re-elect in seconds. Consumer offsets committed to Kafka — if a worker crashes mid-processing, it restarts from the last committed offset. No data loss.

→ Fix: Replication factor 3 on all topics
→ Fix: Alert on consumer lag > 100K messages or growing for > 5 min
→ Fix: Auto-scale fan-out workers based on lag metric
→ Fix: Dead-letter queue for messages failing after 3 retries

🔄

Fan-out Worker Crash Mid-Processing

A worker fans out a celebrity post to 800K of 2M followers, then crashes. 1.2M followers never get the post. Without protection: permanent gap. With Kafka: offset not committed → reprocessed on restart. The 800K writes already done are idempotent (Redis sorted sets deduplicate by member).

→ Fix: Never commit Kafka offset before all writes succeed
→ Fix: Checkpoint progress in Redis for >1M follower fan-outs
→ Fix: Parallel workers with consistent hash bucketing — 2M ÷ 20 workers = 10s total

🔥

Redis Cluster Down

Feed cache gone. Like/comment counts show 0. Rate limiting stops working (risk: spam through). Session tokens unvalidatable. This is the scariest failure because Redis is in both the hot read and write path.

→ Fix: Graceful fallback — try Redis, on failure rebuild feed from Postgres (~300ms vs ~5ms)
→ Fix: Redis Cluster — 3 primary shards × 1 replica, multi-AZ
→ Fix: AOF persistence — worst case: lose last 1 second of counter data
→ Fix: Rate limiting fallback — fail open with aggressive logging, not fail closed

💾

Postgres Primary Goes Down

All writes fail for 20-30 seconds during failover. Reads continue from replicas. New posts cannot be created. During the gap, API queues writes to Kafka rather than dropping them — when Postgres returns, they land safely.

→ Fix: Patroni/PGBouncer for automatic failover (~20s total)
→ Fix: synchronous_commit = on for post creation — zero data loss on failover
→ Fix: Kafka buffers writes during the 20s gap — no user-visible data loss

⚡

Thundering Herd on Cache Expiry

All users' feed caches expire at the same time (24h TTL). Thousands of simultaneous requests hit Postgres to rebuild feeds. DB gets crushed before caches are repopulated. Latency spikes → timeouts → potential cascade.

→ Fix: TTL jitter — base 24h + random(0, 3600s) spreads expiry over 1 hour
→ Fix: Mutex lock on cache rebuild — only one request triggers rebuild per user
→ Fix: Proactive cache warming — background job pre-warms feeds for users with last_active_at > 20h

🌡️

Hot Shard (Celebrity Problem)

Sharding by user_id means Beyoncé's posts all land on one shard. 100M followers reading her posts → that shard gets 10-100× the traffic of any other shard. Standard sharding doesn't help — the problem is data locality, not distribution.

→ Fix: Celebrity-dedicated shards — accounts >10K followers get their own shard
→ Fix: Post content cache in Redis — celebrity posts almost never hit Postgres directly
→ Fix: Extra read replicas for hot shards (5 replicas instead of 2)

🏔️

Cascading Failure — the Most Dangerous

Slow Postgres → API servers wait longer → thread pools fill up → new requests queue → memory fills → API starts timing out all requests including ones that don't need the DB. One slow component takes down the entire service.

→ Fix: Circuit breaker — CLOSED → OPEN (50% failure rate in 10s) → HALF-OPEN (30s) → CLOSED
→ Fix: Bulkheads — separate thread pools for feed reads, post writes, fan-out
→ Fix: Explicit timeouts — API→Redis: 50ms, API→DB: 500ms, API→Kafka: 200ms
→ Fix: When circuit opens, serve stale feed from Redis rather than 503

Failure	User Impact	Recovery Time	Key Mechanism
API server crash	Brief errors on in-flight requests	< 10s	Stateless + LB health checks
Kafka lag	Feed delay — not data loss	Minutes	Consumer auto-scaling
Fan-out worker crash	Partial feed, auto-retried	Seconds	Kafka offset replay
Redis cluster down	Slower feeds (DB fallback)	< 30s	Replica promotion + fallback code
Postgres primary down	20-30s write outage	20-30s	Patroni failover + Kafka buffer
Thundering herd	Slow feeds during spike	Minutes	TTL jitter + mutex lock
Hot shard	Degraded for celebrity content	Ongoing	Dedicated shards + content cache
Cascading failure	Full outage if unmitigated	Minutes–hours	Circuit breaker + bulkheads

⚠

Anti-patterns

🚫

Scan all posts, filter by follow graph, sort by time

O(N × following) per feed load. Dies past ~1M users.

✓ Better: Pre-computed sorted-set feed per user; fan-out-on-write for normal users.

🚫

ElasticSearch as primary feed store

Great for search; suboptimal for infinite-scroll + real-time inserts.

✓ Better: Redis sorted sets for hot feed; ES for search + analytics.

🚫

Re-rank every feed request from scratch

Ranking 5000 candidates per load × 100M DAU is compute-bound.

✓ Better: Pre-compute ranked batches offline; rerank top-200 online with lightweight ML.

13

Interview Tips

01

Lead with the fan-out question before drawing anything. The first thing you say after "I'll design a news feed" should be: "The core tension is the fan-out problem — when a celebrity with 10M followers posts, when do I update their feeds?" Then answer it. This frames the entire design and signals you know what makes this problem hard.

02

The hybrid strategy is not optional at FAANG scale. Pure fan-out on read fails (500 queries per feed load). Pure fan-out on write fails (celebrity problem). Hybrid is the only correct answer. Name the threshold explicitly: >10K followers = celebrity = fan-out on read. Interviewers will push you on where you draw the line.

03

Engagement writes are the trap most candidates miss. If you say "write likes to Postgres" in a 500M DAU system, you've already lost the interviewer. The number — 87,000 writes/second — is what justifies Redis counters + Cassandra. Always derive the number before proposing the solution. "87K writes/second exceeds Postgres's single-node limit by 4×, so here's my three-layer approach…"

04

Name the composite index explicitly. Say: (user_id, created_at DESC) is the most important index in this system. Explain why: celebrity fetches, feed rebuilds, and fan-out lookups all use it. Without it, every such query is a full table scan. Interviewers who care about databases will nod and move on — this is the signal that you understand the data access patterns.

05

Always address the follow graph sharding problem unprompted. If you shard the follows table, you can't efficiently serve both "who follows me?" and "who do I follow?" from the same shard. The answer — store it twice — shows you understand write amplification as a conscious tradeoff. Most candidates either ignore this or give a vague answer about "using a graph database."

06

For failure scenarios: three parts every time. What breaks → what the user experiences → how you recover. Don't just say "add a read replica." Say: "During the 30-second Postgres failover, API servers queue writes to Kafka instead of failing them. The user gets an optimistic 200 OK. By the time they refresh, Postgres is back and the post is durably stored." That's a complete answer.

07

On comments: say "materialized path" immediately. Most candidates say "adjacency list" or "just store parent_id." Materialized path — where you store the full ancestry as a dot-separated string — gives O(1) writes and a single LIKE index scan for any subtree. Name it, describe it in one sentence, and move on. It signals you've thought about tree storage beyond the toy example.

08

Know when NOT to use this stack. The Kafka + Flink + Cassandra + Redis architecture is justified at 87K writes/second. At 500 writes/second, it's massive over-engineering. If an interviewer asks "what would you do differently at 10K users?", say: "Synchronous Postgres writes, no Kafka, no Cassandra. This stack adds 6 failure points — you don't need it until you've hit Postgres's limits." Shows maturity and pragmatism.

09

The cascading failure question is almost always asked. Prepare: "If Postgres slows down, API thread pools fill up and the whole service degrades — even endpoints that don't touch the DB. The solution is a circuit breaker: if the failure rate exceeds 50% in 10 seconds, trip the breaker and serve stale feeds from Redis rather than timing out everything." This answer shows systems thinking, not just component knowledge.

10

For the OnlyFans / paid content follow-up: flip the model. "Instead of fan-out, think content broadcast. Pre-warm the CDN before firing push notifications. 500K subscribers hit the CDN — every read is a cache hit, origin DB never touched. Separate access control via signed URLs so auth doesn't block content delivery." This three-beat answer covers the thundering herd, CDN strategy, and access control in under 30 seconds.

14

How the Design Evolves

Phase 1 — 0 to 10K users

Monolith + Single Postgres

One server, one database, reverse-chronological feed built at read time. No cache, no queue, no fan-out service. Ship fast — premature optimisation kills startups. Feed is slow but the product can be validated.

Phase 2 — 10K to 500K users

Separate DB + Redis Cache

Move DB to its own server. Add Redis in front for hot reads. CDN for media. Read replica on Postgres. Start pre-building feeds for active users in Redis. Most products live here their entire lives.

Phase 3 — 500K to 10M users

Async Fan-out + Sharding

Add Kafka for async fan-out. Shard Postgres by user_id. Introduce hybrid fan-out (celebrities vs regular users). Engagement writes move to Redis counters + periodic Postgres flush. First detection of hot shards.

Phase 4 — 10M to 100M users

Full Pipeline + Cassandra

Engagement records move to Cassandra. Add Flink for stream processing (dedup, trending, counter flush). Algorithmic ranking introduced — two-stage scoring. Follow graph moves to Redis sets. Celebrity-dedicated shards. CDN pre-warming for high-fan-count creators.

Phase 5 — 100M+ users

Multi-Region + Custom Infrastructure

Multi-region active-active with global load balancing. Regional feed caches. ML ranking model updated hourly. Separate microservices for every major component. Most engineers never operate here — but knowing it shows architectural depth.

📺

References & Videos

Design Twitter — System Design Interview

Gaurav Sen · 15 min

Design a News Feed System

ByteByteGo · 11 min

Design Twitter — Fan-out Deep Dive

Jordan Has No Life · 32 min

Design a News Feed System

AlgoMaster

Design Twitter

GeeksforGeeks

Requirements

Scale Estimation

API Design

High-Level Architecture

Deep Dive — The Fan-out Problem

Key Design Decisions & Tradeoffs

What Can Go Wrong

Anti-patterns

Interview Tips

Similar Problems

How the Design Evolves

References & Videos

Interview Framework