Exercise · Communication

Slack / Discord — Real-Time Messaging

Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.

Out of 10 points · 45 min whiteboard
01

Prompt

Design Slack or Discord: real-time team messaging with channels, DMs, threads, reactions, presence, and file sharing. Target scale: 10M concurrent WebSocket connections per region, with channels of up to 500k members (as in large Discord guilds).

Time budget: 45 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.

02

Hints (progressive)

Hint 1

The hard part is the WebSocket gateway tier — tens of millions of long-lived connections. Stateful, sticky, connection-oriented. Different from REST.
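
A quick sizing sketch for the gateway tier can make the scale concrete. The 10M figure comes from the prompt; the per-server capacity of 20k connections and the `headroom` parameter are assumptions for illustration, not measured limits.

```python
import math

CONNECTIONS_PER_REGION = 10_000_000  # from the prompt
CONNS_PER_GATEWAY = 20_000           # assumption: connections one server can hold


def gateways_needed(connections: int, per_server: int, headroom: float = 0.0) -> int:
    """Servers required to hold `connections`, with optional spare capacity.

    `headroom` is a fraction of extra capacity (0.25 means 25% spare) kept
    free so that reconnect storms after a deploy or failure have somewhere to land.
    """
    return math.ceil(connections * (1 + headroom) / per_server)


if __name__ == "__main__":
    print(gateways_needed(CONNECTIONS_PER_REGION, CONNS_PER_GATEWAY))
    print(gateways_needed(CONNECTIONS_PER_REGION, CONNS_PER_GATEWAY, headroom=0.25))
```

Because each connection is pinned to one server for its lifetime, losing a gateway means every client on it reconnects at once, which is why some headroom is usually worth budgeting for.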

Hint 2

Large channels break naive fan-out. A 500k-member channel can't push every message to every member's WebSocket.
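
One possible way to avoid naive fan-out is to push in real time only to members who are currently viewing a large channel, and to send everyone else a cheap unread-count update. The sketch below assumes that approach; `plan_delivery`, `DeliveryPlan`, and the 75k threshold are illustrative names and values, not a description of how Slack or Discord implement it.

```python
from dataclasses import dataclass

LAZY_THRESHOLD = 75_000  # assumption: member count above which a channel is "large"


@dataclass
class DeliveryPlan:
    push_to: set[str]      # members who receive the message over their WebSocket
    bump_unread: set[str]  # members who only receive an unread-count update


def plan_delivery(
    members: set[str],
    viewers: set[str],
    threshold: int = LAZY_THRESHOLD,
) -> DeliveryPlan:
    """Decide who gets a real-time push for one message.

    `members` is the set of connected channel members; `viewers` is the set
    of users with the channel currently open.
    """
    if len(members) <= threshold:
        # Small channel: pushing to every connected member is affordable.
        return DeliveryPlan(push_to=set(members), bump_unread=set())
    active = members & viewers
    return DeliveryPlan(push_to=active, bump_unread=members - active)
```

With 500k members and a few thousand viewers, the per-message push count drops by roughly two orders of magnitude, and the unread updates can be batched.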

Hint 3

Separate the durable message log (Cassandra) from the real-time notify fabric (Redis pub/sub). They have different consistency + latency needs.
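
The separation can be sketched with two in-memory stand-ins: an append-only log partitioned by channel (playing the role of Cassandra) and a fire-and-forget pub/sub (playing the role of Redis). All class and function names here are hypothetical, and neither class talks to a real database; the point is only the write path: persist first, then notify whoever is subscribed.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count
from typing import Callable


@dataclass(frozen=True)
class Message:
    channel_id: str
    message_id: int
    body: str


class DurableLog:
    """Stand-in for the durable store: one ordered partition per channel."""

    def __init__(self) -> None:
        self._partitions: dict[str, list[Message]] = defaultdict(list)
        self._ids = count(1)  # monotonic ids; a real system would need a distributed id scheme

    def append(self, channel_id: str, body: str) -> Message:
        msg = Message(channel_id, next(self._ids), body)
        self._partitions[channel_id].append(msg)
        return msg

    def history(self, channel_id: str, after_id: int = 0) -> list[Message]:
        return [m for m in self._partitions.get(channel_id, []) if m.message_id > after_id]


class NotifyFabric:
    """Stand-in for pub/sub: delivers to current subscribers, retains nothing."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Message], None]]] = defaultdict(list)

    def subscribe(self, channel_id: str, callback: Callable[[Message], None]) -> None:
        self._subscribers[channel_id].append(callback)

    def publish(self, msg: Message) -> int:
        callbacks = self._subscribers.get(msg.channel_id, [])
        for callback in callbacks:
            callback(msg)
        return len(callbacks)  # number of subscribers (e.g. gateways) reached


def send_message(log: DurableLog, fabric: NotifyFabric, channel_id: str, body: str) -> Message:
    msg = log.append(channel_id, body)  # durable write first
    fabric.publish(msg)                 # best-effort notify; a miss is repaired from history
    return msg
```

A subscriber that was offline during `publish` simply never sees the event, which is acceptable because `history` remains the source of truth. Only channels with a registered subscriber receive anything, so gateways with no interested clients are never touched.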

03

Rubric — 10 points

  • +1 WebSocket gateway tier ~500 servers × 20k conns each; consistent-hash LB by conn_id
  • +2 Per-channel publish fan-out: only gateways with active subscribers get the message (not all N gateways)
  • +2 Cassandra for durable message history: partition by channel_id, clustering by monotonic message_id
  • +1 Redis pub/sub (not Kafka) for real-time delivery — sub-ms latency; durability is Cassandra's job
  • +1 Large-guild lazy load: channels > 75k members don't push real-time to non-viewers; only unread counts
  • +1 Presence as separate service; batched + coalesced; tolerates brief staleness
  • +1 Client-side dedup using message_id + nonce; clients sort by message_id
  • +1 Graceful reconnect: session resume with sequence number for replay
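
The session-resume point can be illustrated with a bounded replay buffer keyed by sequence number. This is a minimal sketch under assumed names (`Session`, `emit`, `resume`) and an assumed buffer size; a real gateway would also need to authenticate the resume and expire idle sessions.

```python
from collections import deque
from typing import Optional


class Session:
    """Per-connection event stream with a bounded replay buffer."""

    def __init__(self, buffer_size: int = 1000) -> None:
        self._seq = 0
        # Holds (seq, event) pairs; the oldest entries fall off when full.
        self._buffer: deque[tuple[int, str]] = deque(maxlen=buffer_size)

    def emit(self, event: str) -> int:
        """Record an event sent to the client and return its sequence number."""
        self._seq += 1
        self._buffer.append((self._seq, event))
        return self._seq

    def resume(self, last_seen_seq: int) -> Optional[list[str]]:
        """Return the events the client missed, or None if it must fully resync."""
        if last_seen_seq > self._seq:
            return None  # client reports a sequence number we never issued
        oldest = self._buffer[0][0] if self._buffer else self._seq + 1
        if last_seen_seq + 1 < oldest:
            return None  # some missed events were already evicted from the buffer
        return [event for seq, event in self._buffer if seq > last_seen_seq]
```

On reconnect the client sends the last sequence number it processed. A list (possibly empty) means it can continue where it left off; `None` means it should reload state from the durable store instead.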

Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.

04

Red flags (things that tank the interview)

  • Proposes single Postgres table for all messages
  • Doesn't mention WebSocket — uses HTTP polling
  • Broadcasts every large-channel message to every member's WebSocket
  • Conflates durable storage with notify fabric (e.g., Kafka for both)
  • Ignores presence as a scale problem