WhatsApp

02

Requirements

Functional — Must Design

One-to-one messaging with three-state delivery receipts — sent / delivered / read
Group messaging up to 256 members
Offline message queuing — held for 30 days, delivered on reconnect
Push notifications via APNs (iOS) and FCM (Android)

Non-Functional

Availability over consistency — system stays up even if delivery is briefly delayed
Online-to-online delivery under 100ms
At-least-once delivery — duplicates acceptable, loss is not
Scale: 100 billion messages/day, 50M concurrent connections

Mention — Don't Fully Design

Media sending — object storage + URL in message body
Typing indicators and online / last seen presence
Multi-device sync

E2EE — Acknowledge Deeply

Signal Protocol — per-message key derivation
Server is a blind relay — forwards ciphertext it cannot read
Keys stored only on devices, never on server

03

Scale Estimation

Metric	Calculation	Result
Total / Daily Active Users	Industry figures	2B / 500M DAU
Messages per day	Industry figure	100B
Average write RPS	100B ÷ 86,400	~1.16M/sec
Peak write RPS	1.16M × 3 (peak multiplier)	~3.5M/sec
Avg message size	Text + metadata	~500 bytes
Storage per day	1.16M × 500B × 86,400	~50 TB/day
Concurrent WebSocket connections	10% of DAU online simultaneously	~50M

Chat Server Sizing

Memory per WebSocket connection: ~100 KB. A 32 GB server at 65% utilisation holds ~200K connections. To serve 50M concurrent connections: 50M ÷ 200K = ~250 chat servers. Provision 300–350 for failover headroom and traffic spikes.

The Write Amplification Trap

One message in a 256-person group generates 255 delivery receipts + 255 read receipts — ~510 additional writes. At scale, the actual write load is 3–5× the raw message count. Always account for this in your estimates.

04

API Design

The primary API is WebSocket-based for real-time messaging. REST endpoints handle auxiliary operations — fetching history, managing groups, registering devices.

WS wss://chat.whatsapp.com/connect Establish persistent connection

// Client sends on connect (auth + registration) { "type": "connect", "user_id": "u123", "token": "jwt...", "device_id": "d456" } // Server registers: SET session:u123 → "chat-server-7" { "type": "connected", "server_id": "chat-server-7" }

WS send_message Send a message over WebSocket frame

// Client → Server (WebSocket frame) { "type": "message", "message_id": "msg_uuid_abc", // client-generated for deduplication "conversation_id": "conv_xyz", "recipient_id": "u456", "content": "hey", "timestamp": 1711180800000 } // Server → Client (immediate ✓ sent receipt) { "type": "ack", "message_id": "msg_uuid_abc", "status": "sent" }

GET /v1/conversations/:id/messages Fetch message history (paginated)

// Query params: ?before=timestamp&limit=50 // Response 200 { "messages": [ { "id": "msg_1", "content": "hey", "ts": 1711180800000, "status": "read" } ], "cursor": "eyJ0cyI6MTcxMX0=", "has_more": true }

POST /v1/devices/register Register push notification token

// Request { "user_id": "u123", "platform": "ios", "push_token": "apns_token_xyz" } // Response 200 { "device_id": "d456", "registered": true }

05

High-Level Architecture

Architecture — High Level SVG Diagram

WebSocket + Load Balancer

Persistent bidirectional connections. The LB uses sticky sessions — consistent hashing on user ID ensures Alice's frames always reach Server A. If Server A dies, Alice reconnects with exponential backoff + jitter.

Redis — Connection Registry

Maps user_id → server_id with a 60-second TTL refreshed by heartbeat. The most underrated component in this design. Without it, cross-server message routing is impossible. If Redis goes down, delivery degrades gracefully to the offline queue.

Kafka — Message Pipeline

Three jobs: (1) durable buffer so Alice gets ✓ in <1ms before DB write, (2) fan-out for group messages — one write triggers N deliveries, (3) carries receipt events back upstream from recipient to sender.

Cassandra — Message Storage

Partitioned by conversation_id, clustered by timestamp DESC. Chosen over PostgreSQL for linear write scalability — append-only write path (memtable + commit log) handles 1.2M writes/sec without manual sharding.

06

Deep Dive — WebSockets & Real-Time Routing

The Core Problem

HTTP is request-response — the server can never reach out unprompted. Chat requires the server to push to the client at any moment. WebSockets solve this with a persistent bidirectional connection. But persistent connections are stateful — and stateful connections break standard horizontal scaling.

Why delivery feels instant across geography. The naive mental model is Alice → WhatsApp California → Bob. That would be ~250ms of physics. The real path is Alice → nearest regional DC → private backbone → Bob's regional DC → Bob. Multi-region deployment dramatically shortens the actual travel distance.

More importantly, the WebSocket is already open. HTTP incurs TCP + TLS handshake overhead on every request — 3-4 round trips before any data moves. WebSocket pays this cost once on connect. Every subsequent message is one round trip. At 50ms RTT to the nearest DC, that's 50ms latency for same-region delivery.

Receipt latency breakdown (Bob receives → Alice sees ✓✓): Bob → Server B ~10ms · Server B → Kafka <1ms · Kafka → Receipt Handler ~5ms · Redis lookup <1ms · Handler → Server A ~10ms · Server A → Alice ~10ms = ~37ms total.

Sequence — Message Send & Receipt Flow Mermaid.js

sequenceDiagram participant A as Alice participant SA as Chat Server A participant K as Kafka participant R as Redis participant C as Cassandra participant SB as Chat Server B participant B as Bob A->>SA: send "hey" (WebSocket frame) SA->>K: publish message event K-->>SA: ack < 1ms SA-->>A: ✓ sent receipt par Async consumers K->>C: write to Cassandra and K->>R: GET session:bob R-->>K: "chat-server-B" K->>SB: deliver to Bob SB->>B: push message (WebSocket) B-->>SB: delivered receipt SB->>K: publish receipt event K->>R: GET session:alice R-->>K: "chat-server-A" K->>SA: route receipt SA-->>A: ✓✓ grey (delivered) end B->>SB: opens chat (read receipt) SB->>K: publish read event K->>SA: route read receipt SA-->>A: ✓✓ blue (read)

The Connection Registry — Most Missed Component

When Bob connects, Chat Server B writes: SET session:bob → "chat-server-B" EX 60. Every message delivery does GET session:bob first. The 60-second TTL means if Server B crashes, Redis auto-cleans Bob's stale entry — no manual cleanup needed. This is the piece most YouTube tutorials skip entirely.

Request Flow — Step Through

Alice · Mobile→Chat Server A · WebSocket→Kafka · Pipeline→Redis Lookup · session:bob→Chat Server B · Bob's server→Bob · Mobile

Click Next Step to walk through the request flow.

08

Key Design Decisions & Tradeoffs

Chosen

WebSockets

Persistent bidirectional connection. One-time handshake cost. Idle connections cost only memory (~100KB). 50M connections = ~250 servers. Server can push at any time.

✓ Necessary at this scale

Rejected

HTTP Long Polling

Works. Used by early chat systems. But every poll is a full HTTP request — TCP + TLS overhead. At 50M users polling every 3 seconds: ~17M requests/sec just to check for messages. Unsustainable.

✗ Too expensive at scale

Chosen

Cassandra

Linear horizontal scaling. Append-only write path (memtable + commit log). Access pattern is always "last N messages in conversation X" — perfectly fits partition-by-conversation model. Handles 1.2M writes/sec natively.

✓ Justified by write characteristics

Rejected for messages

PostgreSQL

ACID, joins, complex queries. But primary-node writes become the bottleneck. Manual sharding by conversation_id is operationally painful. No advantage here since we never need joins on the message read path.

~ Right for users / metadata tables

Chosen

At-Least-Once Delivery

Kafka publishes, DB consumer and delivery consumer run independently. If delivery consumer crashes post-delivery but pre-ack, message is replayed. Client deduplicates via message_id. Duplicates are rare and invisible.

✓ Industry standard approach

Rejected

Exactly-Once Delivery

Requires distributed transactions and two-phase commits across Kafka + DB + delivery layer. Adds significant latency overhead. At 1.2M writes/sec the cost is prohibitive. The benefit (no duplicates) is solvable client-side for free.

✗ Too expensive for marginal gain

Chosen

Availability over Consistency

Messages queue when downstream components degrade. Kafka accepts writes even if Cassandra is slow. Delivery router queues offline even if Redis is flaky. System never rejects a send — it just delivers later.

✓ Users tolerate delay, not failure

Rejected

Consistency over Availability

Would require all replicas to confirm before acknowledging a send. Under any network partition, sends fail with errors. UX catastrophe — users panic on failed sends, retry storms compound the problem.

✗ Wrong tradeoff for chat

09

What Can Go Wrong

💥

Chat Server Crash — Thundering Herd

Server holding 200K connections crashes. All 200K users attempt reconnection simultaneously. Remaining servers at 65% capacity get slammed — potentially cascading to more crashes. Redis session entries are stale, pointing to dead server.

Recovery: Load balancer detects dead server via health check (every 5-10s), stops routing. Redis TTL (60s) auto-expires stale session entries — no manual cleanup. Clients reconnect with exponential backoff + jitter: 200K reconnects spread over 30-60 seconds instead of simultaneously. Messages missed during outage window delivered from offline queue on reconnect.

→ Fix: Exponential backoff + jitter on client · Redis TTL self-cleanup · offline queue covers message gap

🔴

Redis Goes Down — Routing Blind

Connection registry unavailable. Delivery router can't find recipient's server. All real-time delivery to online users breaks. Crucially: no messages are lost — Kafka and Cassandra are still running. The only broken path is real-time delivery.

Recovery: Delivery router treats all users as offline, queues everything. Redis Sentinel / Redis Cluster promotes replica within ~30 seconds. Real-time delivery resumes. Queued messages flush automatically. Users see a ~30s delay on message delivery — not an error.

→ Fix: Redis Cluster with automatic failover · graceful degradation to offline queue path

📢

Group Fan-out Explosion

One message in a 256-person group where all are online generates: 256 Redis lookups + 256 WebSocket pushes + 256 delivered receipts + 256 receipt deliveries = ~1,000+ operations per message send. At scale with many active groups, the delivery router becomes a bottleneck.

Mitigation: Batch the 256 Redis lookups into a single pipeline call — one round trip instead of 256. WhatsApp's 256-member cap is a deliberate architectural decision, not a product preference — it's the threshold where server-side fan-out remains manageable. For systems with larger groups (Telegram at 200K members), flip to fan-out on read.

→ Fix: Redis pipelining for batch lookups · group size caps · fan-out on read for very large groups

⚡

Kafka Consumer Lag

If a Kafka consumer (message writer, delivery router, receipt handler) falls behind, messages pile up in the Kafka topic. Delivery delays grow. If the consumer crashes entirely, unprocessed messages wait until it recovers or another instance takes over.

Kafka's consumer group model handles this automatically — multiple consumer instances share partitions. If one crashes, its partitions are reassigned to healthy instances within seconds. Messages are not lost because Kafka retains messages until explicitly committed.

→ Fix: Consumer groups with multiple instances · monitor consumer lag · auto-scaling consumers on lag spike

⚠

Anti-patterns

🚫

Relational DB with rows per message

~100B messages/day on a RDBMS is hopeless.

✓ Better: Cassandra (partition by chat_id, clustering by ts); cold data tiered to cheaper storage.

🚫

Always-on server storage for all media

S3 bills at EB scale get expensive.

✓ Better: E2E encrypted media; server holds encrypted blob briefly, expires after recipient fetches.

🚫

Single global WebSocket gateway

Cross-region latency kills typing indicators + delivery receipts.

✓ Better: Regional gateways; users connect nearest; cross-region messages routed via Kafka-style log.

10

Interview Tips

01

Lead with why WebSockets, not just that you're using them. Say: "HTTP long polling would generate ~17M requests/second at our scale just to check for new messages. WebSockets are persistent — 50M idle connections costs only ~5TB of RAM across ~250 servers." Numbers win interviews.

02

Name the connection registry immediately. Most candidates skip this. The question "how does Server A know Bob is on Server B?" will always come. Have the answer ready: Redis maps user_id → server_id, 60-second TTL refreshed by heartbeat, auto-expires on crash. This one component differentiates good answers from great ones.

03

Kafka appears in two directions. Interviewers will ask "what is Kafka doing here?" The answer has two parts: (1) message delivery pipeline from sender to recipient, and (2) receipt events flowing back from recipient to sender. Make sure you explain both or your design feels incomplete.

04

The 256 group cap is an architecture decision. When asked why WhatsApp limits group size, don't say "product choice." Say: one group message to 256 online members generates ~1,000 operations. At scale with millions of active groups, the fan-out math dictates the cap. Shows you think in systems, not features.

05

E2EE in one sentence, then move on. "Signal Protocol — keys are exchanged on registration and stored only on devices. The server is a blind relay forwarding ciphertext it cannot read." This shows depth without burning 10 minutes on cryptography that won't be tested.

06

When asked about consistency: say "availability over consistency" and immediately explain what that means concretely — messages queue, not fail. Users tolerate a 300ms delay. They do not tolerate a send failure. The CAP tradeoff should come with a user experience justification, not just the theory.

11

How the Design Evolves

Phase 1 — 0 to 10K users

Simple Server + Long Polling

Single Node.js server with Socket.io or basic long polling. SQLite or PostgreSQL for messages. No Kafka, no Redis, no separate push service. Ship fast — premature real-time infrastructure kills early-stage products.

Phase 2 — 10K to 500K users

Upgrade to True WebSockets + Add Redis

Migrate to real WebSocket connections. Add Redis for session registry and presence. Separate the database. Add read replicas. Introduce basic push notifications. Most startups live here for years — this handles tens of millions of messages per day comfortably.

Phase 3 — 500K to 50M users

Kafka + Cassandra + Dedicated Chat Servers

Introduce Kafka as the message pipeline. Migrate message storage to Cassandra as write volume overwhelms PostgreSQL. Separate chat servers from REST API servers. Add consumer microservices for delivery routing, receipts, offline queuing. This is the architecture we designed today.

Phase 4 — 50M+ users

Multi-Region + Custom Infrastructure

Deploy to multiple geographic regions — Americas, Europe, Asia-Pacific. Private fibre backbone between DCs for sub-50ms inter-region latency. Global load balancing routes users to nearest region. Cassandra multi-region replication. At WhatsApp scale: ~250 chat servers hold 50M WebSocket connections — surprisingly lean for 2 billion users.

📺

References & Videos

Whatsapp System Design — Chat Messaging Systems

Gaurav Sen · 18 min

Design WhatsApp/Facebook Messenger

Tech Dummies · 28 min

Design a Chat System

ByteByteGo · 10 min

Design WhatsApp Messenger

GeeksforGeeks

Design WhatsApp

AlgoMaster

Requirements

Scale Estimation

API Design

High-Level Architecture

Deep Dive — WebSockets & Real-Time Routing

Key Design Decisions & Tradeoffs

What Can Go Wrong

Anti-patterns

Interview Tips

Similar Problems

How the Design Evolves

References & Videos

Interview Framework