Slack / Discord

A real-time messaging platform with channels, DMs, threads, reactions, presence, and file sharing. The hard parts: a persistent WebSocket fleet holding tens of millions of long-lived connections, message fan-out into channels with up to hundreds of thousands of members, and a presence system that reliably tells everyone who's online without collapsing under the cross-product of "everyone watching everyone." Discord carries ~200M MAU; Slack ~35M DAU. Both push billions of messages per day through hot WebSocket paths.

⚡ Core: WebSockets + Fan-Out + Presence200M MAU (Discord)~4B msgs/dayHuge channels (100k+ members)Persistent connections

Requirements

Functional

Send & receive text messages in channels, DMs, group DMs in real-time
Threads / replies — sub-conversations inside a parent message
Presence — know which users are online / away / offline
Typing indicators, read receipts, emoji reactions
File / image attachments; link previews (unfurling)
Message history: infinite scroll, search, edit, delete
Push notifications when mentioned or DM'd while away

Non-Functional

Message delivery < 200 ms p99 end-to-end on the same region
Ordered delivery per-channel; never show msg B before msg A if A was sent first
At-least-once delivery; clients dedupe on message_id
Scale to 10M concurrent WebSocket connections per region
Channels with up to 500k members (Discord "large guilds")
99.99% availability — users notice outages within seconds
Eventual consistency on reactions + presence; strong on message order

Scale Estimation

Concurrent WS connections

10M

per region; need ~500 gateway servers @ 20k conns each

Messages / sec (peak)

50k

4B/day ÷ 86,400 ≈ 46k avg; peaks 2–3×

Fan-out amplification

~200×

1 msg → avg 200 channel members → 10M deliveries/sec peak

Message storage

~2 TB / day

4B msgs × 500 B avg; ~700 TB/yr raw; compressed ≈ 250 TB/yr

Presence events

10M/min

connect/disconnect/away transitions — dominant overhead on large servers

WebSocket payload

~1 KB

avg msg envelope; reels/file pointers small; bytes live in blob store

API Design

Two surfaces: a REST API for stateful operations (create channel, fetch history) and a WebSocket gateway for real-time events. All real-time flows use the WebSocket; REST is for things you'd do from any tab.

WSwss://gateway.example.com/?v=9&token=JWT

Establish persistent connection. Server sends Hello + heartbeat_interval; client heartbeats every N seconds. Receives push events (MESSAGE_CREATE, TYPING_START, PRESENCE_UPDATE, etc.)

POST/api/channels/{channel_id}/messages

Send a message. Body: { content, attachments[], reply_to?, nonce }. Nonce enables client-side dedup. Returns { message_id, ts }.

GET/api/channels/{channel_id}/messages?before=ID&limit=50

Paginated message history. Cursor-based (not offset); cursor = message_id. Lets you scroll back indefinitely.

POST/api/channels/{channel_id}/messages/{id}/reactions/{emoji}/@me

Add reaction. Separate endpoint so rate limits can target reactions specifically (common spam vector).

POST/api/channels/{channel_id}/typing

Broadcast typing-indicator. Fire-and-forget; no persistence. Gateway broadcasts to channel subscribers; auto-expires after 10 s.

GET/api/search?q=term&channel_id=X

Full-text search across message history. Backed by Elasticsearch; results scoped by user's channel membership.

POST/api/attachments/upload

Returns pre-signed S3 URL + attachment_id. Client uploads directly; then includes attachment_id in POST /messages.

POST/api/guilds/{guild_id}/members/{user_id}

Join / leave server. Triggers membership update event to all guild members (GUILD_MEMBER_ADD).

Architecture

The defining piece is the gateway tier — a fleet of stateful WebSocket servers. Each gateway holds tens of thousands of connections. A session store (Redis) maps user_id → gateway_id + conn_id so any service can push to any user. A fan-out bus (Redis pub/sub or Kafka) delivers messages to the right gateways.

Gateway + Fan-Out Architecture SVG

Request Flow — Step Through

Client · sends WS frame→Gateway · auth + rate-limit→Message svc · validate + persist→Cassandra · partition by chan→Redis pub/sub · channel topic→Recipient GW · subscribed gateways→Recipient · WS push

Click Next Step to walk through the request flow.

Deep Dive — Message Send + Fan-Out

Sending a message in a 50k-member channel means one write + 50k pushes. The critical path is: persist → fan-out → edge-deliver. The trick is keeping it fast while staying correct under reconnects, dedup, and order preservation.

Client sends over WebSocket to its gateway with a client-generated nonce (for dedup) and the target channel_id.
Gateway validates auth + rate limits (e.g., 5 msgs / 5 s per channel per user). Relays to Message service via internal gRPC.
Message service writes to Cassandra with partition key = channel_id, clustering key = message_id (Snowflake/ULID — monotonic, sortable). The message_id IS the ordering; no per-row timestamps needed.
Publish to Redis pub/sub on topic channel:{channel_id}. Every gateway subscribes to topics for channels it has active viewers in.
Gateways receive the pub/sub event and push over WS to each local subscriber. At 10M conns × 500 gateways, one channel publish touches only the ~10 gateways where actual viewers sit — not all 500.
Offline users route through Push service: APNs / FCM notification if mobile; queued in inbox if web.

Sequence — Message Send End-to-End Mermaid

sequenceDiagram participant C as Client participant G1 as Gateway A participant MS as Message svc participant DB as Cassandra participant PS as Redis pub/sub participant G2 as Gateway B participant R as Recipient C->>G1: WS send { nonce, text } G1->>G1: auth + rate-limit G1->>MS: gRPC createMessage MS->>DB: INSERT by channel_id, msg_id DB-->>MS: ok MS->>PS: PUBLISH channel:X {envelope} MS-->>G1: ack { message_id, ts } G1-->>C: ACK (echoes nonce) PS-->>G2: deliver envelope G2->>R: WS push MESSAGE_CREATE

Ordering guarantee. Clients see messages in the order the Message service wrote them to Cassandra, because message_ids are monotonic and clients sort by message_id. Out-of-order arrivals (e.g., reconnected client replaying) are reordered on the client before render.

Dedup. Client-generated nonce lets the Message service detect and reject replays of the same send. Server-assigned message_id lets clients detect duplicate delivery (pub/sub at-least-once) and collapse.

Huge channels. A 500k-member channel can't fan out to 500k WS conns per message — that's 50k msgs/sec × 500k = 25B events/sec in aggregate. Real answer: only users currently viewing the channel get real-time pushes; everyone else gets an unread-count update and fetches history when they open the channel. Discord calls this "lazy loading" of large guilds.

Interview answer

"Message goes over the client's persistent WebSocket to its gateway, which persists to Cassandra (partition by channel_id, clustering by monotonic message_id) and publishes to a Redis topic per channel. Every gateway subscribes only to channels its users are actively viewing, so one publish fans out to ~10 gateways, not 500. Ordering comes from monotonic message_ids; dedup via client nonce. Large guilds don't fan out messages — only viewers get real-time; everyone else gets unread counts."

Tradeoffs & Design Choices

Redis pub/sub vs Kafka for fan-out. Redis pub/sub: sub-ms latency, zero durability, easy ops. Kafka: durable, replayable, higher latency (~5–10 ms). Chat demands latency; the "durable replay" lives in Cassandra history, so Redis pub/sub is the right pick. Kafka is still used upstream for analytics + notifications (audit trail).
Cassandra vs Postgres for messages. Per-channel write patterns are append-only by message_id; Cassandra's partition-by-chan_id gives free sharding + cheap range scans for history. Postgres fails at 4B-row monotonic tables.
Presence is cross-product expensive. 10M online users × ~200 mutuals = 2B presence events per state change wave. Batch + coalesce on gateway egress: only send presence updates that changed in the last few seconds, bundled per tick.
Sticky WS sessions. Can't round-robin each WS frame — they belong to a specific conn. Load balancer uses consistent hashing on conn_id; reconnect lands on a different gateway fine (session store holds user→gateway).

Failure Modes

💥

Gateway crashes with 20k active connections

One gateway dies; 20k clients reconnect simultaneously → thundering herd on the load balancer + session store.

→ Mitigation: client reconnects with jittered backoff (1–30 s). Gateway capacity planned with 30% headroom. Session resume uses session_id to replay missed events (Discord "resume" semantics).

🕒

Redis pub/sub partition drops

Pub/sub is not durable. A partition loss or restart means dropped messages in flight.

→ Mitigation: message is always persisted in Cassandra first; pub/sub is only for notify. Clients with live sessions that detect a gap (by message_id sequence) fetch from Cassandra to backfill.

📴

Presence storm on deploy

Rolling deploy drains a gateway → 20k disconnects → 20k × 200 mutuals = 4M offline events. Fan-out service melts.

→ Mitigation: graceful shutdown. Drain over 60–120 s. Presence transitions damped with a 30 s "maybe offline" state so reconnects within that window never emit offline→online cycle.

📣

Celebrity @everyone ping

User @-mentions 500k-member channel. Push service must notify every offline member; APNs rate-limits kick in; notifications delayed by minutes.

→ Mitigation: throttle @everyone to mods only. Coalesce per-user notifications ("3 mentions in #general"). Pre-shard push pipeline so no single APNs connection bottlenecks.

🔁

Client replays same send on flaky network

Client sends msg, network hiccups before ACK, client retries → server writes twice.

→ Mitigation: client-supplied nonce. Message service rejects duplicate (user_id, nonce) within a 5-minute window.

🔍

Search index lag on new messages

User searches for a message they sent 2 seconds ago; Elasticsearch hasn't indexed it yet.

→ Mitigation: accept ~30 s search lag as normal. Client-side fallback: scan last N messages of open channel locally before falling through to Elasticsearch.

⚠

Anti-patterns

🚫

One WebSocket server per user

Stateful long-lived connections don't scale horizontally with naive sharding.

✓ Better: Pool of gateway servers each holding tens of thousands of conns; consistent hashing by conn_id.

🚫

Persist message in Kafka as source of truth

Kafka is event log, not OLTP. Random access for scrollback is slow.

✓ Better: Persist in Cassandra (partition by channel_id, clustering by message_id); Kafka only for fan-out notify.

🚫

Broadcast every message to every channel subscriber via WebSocket

500k-member Discord channels × 1 msg = 500k WS sends. Infrastructure melts.

✓ Better: Lazy loading: only users actively viewing get real-time pushes; others see unread counters + fetch on open.

Interview Tips

Name the hard parts fast. "Persistent WebSockets, fan-out, presence." Don't start with sharding the user table — that's trivial here. The gateway tier and fan-out is the actual design.
Distinguish durability from delivery. Cassandra holds the durable record; pub/sub is fire-and-forget notify. This decoupling is the whole design — explain it.
Huge channels ≠ huge fan-out. Show that you know Discord doesn't actually push messages to 500k WS conns on every send. Viewers get pushed; others get unread counts + history pull on open.
Pick the right storage. Cassandra (messages) + Postgres (users/guilds) + Elasticsearch (search) + S3 (attachments). Don't jam it all in one DB.
Presence is its own system. It's read-heavy, write-heavy, and not durable. Separate service, separate scale story.

Evolution

MVP — long-polling + single Postgres

HTTP long-poll for "new messages since ts". Postgres for everything. Works to ~10K users. Slack's earliest architecture was close to this.

WebSocket gateways + Redis

Persistent WS connections via dedicated gateways. Redis pub/sub for fan-out. Cassandra for messages. Carries to ~1M concurrent.

Separated presence service + sharded pub/sub

Presence extracted because write + read patterns differ. Pub/sub sharded by channel_id hash. ~10M concurrent.

Large-guild "lazy" fan-out

Guilds > 75k members stop pushing messages to non-viewers; they get unread counts + history-on-open. Discord shipped this specifically because huge servers melted the fan-out tier.

Multi-region gateways + global routing

Gateways in every region; users connect to nearest. Cross-region channels replicate via Kafka-style log. End-to-end latency held under 200 ms globally.

📺

References & Videos

Design Discord — System Design Interview

Jordan Has No Life · 30 min

Slack System Architecture

Hussein Nasser · 22 min

How Discord Stores Trillions of Messages

ByteByteGo · 9 min

How Discord Stores Trillions of Messages

ByteByteGo Blog

How Discord Stores Trillions of Messages (Official)

Discord Engineering

Next up

PROBLEM

DM-first chat; end-to-end encrypted; same gateway pattern

Read →

PROBLEM

Notification System

Push notifications for offline members

Read →

Slack / Discord

Requirements

Scale Estimation

API Design

Architecture

Deep Dive — Message Send + Fan-Out

Tradeoffs & Design Choices

Failure Modes

Anti-patterns

Interview Tips

Similar Problems

WhatsApp

Notification System

Live Streaming Chat

Distributed Queue

Video Conferencing

Evolution

MVP — long-polling + single Postgres

WebSocket gateways + Redis

Separated presence service + sharded pub/sub

Large-guild "lazy" fan-out

Multi-region gateways + global routing

References & Videos

WhatsApp

Notification System