System Design — 06

Live Streaming Platform

Design a Twitch-scale live streaming platform where creators broadcast live video to millions of concurrent viewers with sub-5-second latency, real-time chat, and adaptive quality.

Video Pipeline · HLS / CDN · WebSocket · Real-Time · Fan-Out
01

Problem Statement

Design a live streaming platform like Twitch or YouTube Live where creators broadcast live video to potentially millions of concurrent viewers. The system must handle video ingest from streamers, real-time transcoding to multiple quality levels, global distribution via CDN, interactive chat, stream discovery, and social features — all with glass-to-glass latency under 5 seconds.

The defining challenge is getting video from one producer to millions of consumers in near real-time. Unlike VOD (YouTube/Netflix), every second of latency matters because chat interaction between streamer and audience is the core product experience.

Core question: How do you ingest a single video stream from one creator and deliver it to potentially millions of concurrent viewers with low latency, adaptive quality, and interactive features?

02

Requirements

Functional Requirements

  • Go Live — Streamer broadcasts from OBS/mobile via RTMP/SRT with stream key authentication, reconnection handling, and health monitoring
  • Watch Live — Viewers watch with adaptive bitrate (1080p→360p), signed CDN URLs, and sub-2-second time-to-first-frame
  • Live Chat — Real-time bidirectional messaging via WebSocket with emotes, badges, moderation, rate limiting, and message sampling for mega-streams
  • Stream Discovery — Browse by category, search channels/streams, personalized recommendations, real-time viewer count rankings
  • Follow / Subscribe — Free follows (social graph), paid subscriptions (entitlements), go-live push notifications with engagement tiering
  • VOD — Past broadcasts recorded in parallel during live stream, clip creation without re-encoding, chat replay alongside VODs

Non-Functional Requirements

  • Low Latency — Glass-to-glass < 5 seconds (standard HLS), < 2.5 seconds (LL-HLS). Latency budget: encoding 0.5s + transcoding 0.5s + segmentation 0-2s + CDN 0.3s + player buffer 2s
  • High Availability — 99.99% uptime. Three-plane separation (ingest, media, control) ensures blast radius containment. Database crash doesn't kill video.
  • Massive Concurrency — 1M+ viewers per stream. CDN handles 10 Tbps egress. Chat scales via partitioned WebSocket servers.
  • Adaptive Quality — Multi-rendition transcoding (1080p60/720p30/480p30/360p30/audio-only). Tiered: full ladder for popular streams, passthrough for tiny streams.
  • Global Distribution — Multi-CDN strategy (Akamai + CloudFront + Fastly) with intelligent routing. Origin shield reduces origin load by 100x.
03

Scale Estimation

Derived from Twitch-scale assumptions: 140M MAU, 30M DAU, ~12% online at peak.

  • Peak Concurrent Viewers: 3.5M
  • Peak Concurrent Streamers: 150K
  • Peak Egress Bandwidth: ~10 Tbps
  • Transcoding CPU Cores: 110K
  • Viewer Heartbeats: 117K/sec
  • Chat Msg (Burst): ~50K/sec
  • VOD Storage (Steady): ~8 PB
  • CDN Cost: ~$24M/mo

Key Derivation

Egress: 3.5M viewers × 2.7 Mbps weighted average (mix of 1080p/720p/480p/360p) = ~10 Tbps. This single number forces the entire architecture through CDN — no company self-serves 10 Tbps. Transcoding: tiered rendition strategy (passthrough for <10 viewers, 2 renditions for 10-100, full ladder for 1000+) yields ~110K core-equivalents across 150K streams. The power-law distribution matters: top 0.1% of streams consume 30-40% of viewers.
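
The arithmetic behind these headline numbers can be checked in a few lines. This is a back-of-envelope sketch using the figures quoted above; the 30-second heartbeat interval is an assumption (the text does not state it), chosen because it reproduces the ~117K/sec heartbeat figure.

```python
# Back-of-envelope check of the scale figures quoted in the text.
DAU = 30_000_000
PEAK_ONLINE_FRACTION = 0.12      # "~12% online at peak"
PEAK_VIEWERS = 3_500_000         # rounded figure the text uses
AVG_BITRATE_MBPS = 2.7           # weighted mix of renditions
HEARTBEAT_INTERVAL_S = 30        # ASSUMPTION: interval is not stated in the text


def peak_online(dau: int, fraction: float) -> float:
    """Users online at peak, before rounding to the 3.5M headline number."""
    return dau * fraction


def egress_tbps(viewers: int, avg_mbps: float) -> float:
    """Aggregate egress in Tbps (1 Tbps = 1,000,000 Mbps)."""
    return viewers * avg_mbps / 1_000_000


def heartbeats_per_sec(viewers: int, interval_s: int) -> float:
    """Request rate if every viewer sends one heartbeat per interval."""
    return viewers / interval_s
```

With these inputs, `egress_tbps(PEAK_VIEWERS, AVG_BITRATE_MBPS)` gives 9.45, which the text rounds to ~10 Tbps, and `heartbeats_per_sec(PEAK_VIEWERS, HEARTBEAT_INTERVAL_S)` gives roughly 116.7K/sec.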

04

API Design

Video is never served through the API — the API returns signed CDN URLs and metadata only. Three auth models: User JWT (API calls), Stream Key (RTMP ingest), Signed CDN Token (playback URLs with HMAC + expiry).

Stream Lifecycle
POST   /v1/streams                    # Create stream (metadata setup, status → CONFIGURED)
PATCH  /v1/streams/{id}               # Update title/category while live
POST   /v1/streams/{id}/end           # End stream → triggers VOD processing
POST   /v1/channels/{id}/stream-key/regenerate  # Invalidate + regenerate stream key
Viewer Playback
GET    /v1/streams/{id}/playback      # Returns signed HLS/DASH/LL-HLS URLs, chat WS URL
POST   /v1/streams/{id}/heartbeat     # Lightweight: rendition, buffer health, QoE metrics → 204
POST   /v1/streams/{id}/playback/refresh-token  # Rotate CDN token every 5-10 min
Chat (WebSocket + REST)
WS     wss://chat.platform.com/ws?stream={id}&token={jwt}
         → CONNECTED (last 50 msgs, emote sets, room settings)
         → MESSAGE (nonce-based dedup, server-side emote parsing)
         → SUBSCRIPTION, RAID, MODERATION events

POST   /v1/chat/{room_id}/moderate    # timeout, ban, unban, delete_message
GET    /v1/chat/{room_id}/messages?around_timestamp=T  # VOD chat replay
Discovery & Social
GET    /v1/directory/streams?category=valorant&sort=viewers  # Cursor-based, Redis ZSET backed
GET    /v1/search?q=pokimane&type=all&live_only=true       # Elasticsearch with liveness boost
POST   /v1/users/{id}/follows         # Follow with per-channel notification toggle
POST   /v1/subscriptions              # Paid sub → Stripe → entitlement cache in Redis
POST   /v1/clips                      # Time-range clip from live/VOD (no re-encoding)

Design principle: The heartbeat endpoint (117K req/sec) goes to a dedicated lightweight service — not the main API. It does zero business logic: updates a Redis counter, publishes QoE to Kafka, returns 204.
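
As a rough illustration of how little this handler does, the sketch below uses in-memory stand-ins: a `Counter` in place of Redis and a `deque` in place of a Kafka topic. The time-bucketed counting scheme and the 30-second interval are assumptions, not details from the design above.

```python
from collections import Counter, deque

HEARTBEAT_INTERVAL_S = 30        # ASSUMPTION: one heartbeat per viewer per 30s

heartbeat_counts = Counter()     # stand-in for Redis counters, keyed by (stream, bucket)
qoe_events = deque()             # stand-in for a Kafka topic of QoE events


def handle_heartbeat(stream_id: str, payload: dict, now_s: int) -> int:
    """No auth lookups, no DB access: bump a counter, enqueue QoE, return 204."""
    bucket = now_s // HEARTBEAT_INTERVAL_S
    heartbeat_counts[(stream_id, bucket)] += 1
    qoe_events.append({"stream_id": stream_id, "ts": now_s, **payload})
    return 204


def estimated_viewers(stream_id: str, now_s: int) -> int:
    """Viewer estimate: heartbeats seen in the last fully completed bucket."""
    previous_bucket = now_s // HEARTBEAT_INTERVAL_S - 1
    return heartbeat_counts[(stream_id, previous_bucket)]
```

Reading the previous bucket rather than the current one trades up to one interval of staleness for a count that is not still filling up.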

05

High-Level Architecture

Three independent planes: Ingest (receives RTMP), Media (transcode + CDN delivery), Control (API, chat, discovery). They scale independently and fail independently — a database crash never kills live video.

[Architecture diagram]
  • Ingest plane: Streamer (OBS / Mobile), Ingest PoP (RTMP / SRT), Stream Router (capacity-aware)
  • Media plane: Transcoder (GPU, multi-rendition), Segmenter (HLS / LL-HLS), Origin (NVMe + S3), Object Store (S3 / VOD archive)
  • Delivery: CDN Edge (multi-CDN), Origin Shield (request collapse), Viewer (HLS player)
  • Control plane: API Gateway (auth / rate limit), API Servers (stateless), Chat Cluster (~100 WS servers), Heartbeat Svc (117K req/sec), PostgreSQL (primary data), Redis Cluster (cache + pub/sub), Kafka (event bus), Elasticsearch (search), Cassandra (chat history), ClickHouse (analytics)
  • Edge labels: RTMP, route, async S3
Request Flow — Step Through
Viewer Clicks Play → API Server → CDN Edge → Origin Shield → Origin (NVMe) → Player Renders
06

Deep Dive — Video Ingest → Transcode → HLS Pipeline

This is the defining technical challenge of live streaming — the component that doesn't exist in any other system design problem. Every byte travels: Streamer's GPU → OBS encoder → RTMP → Ingest server → Transcoder → Segmenter → Origin → CDN → Player.

sequenceDiagram
    participant OBS as Streamer (OBS)
    participant Ingest as Ingest PoP
    participant TC as Transcoder
    participant Seg as Segmenter
    participant Origin as Origin (NVMe)
    participant CDN as CDN Edge
    participant Player as Viewer Player
    OBS->>Ingest: RTMP connect + stream key
    Ingest->>Ingest: Validate stream key (Redis)
    Ingest->>TC: Forward raw H.264 + AAC
    TC->>TC: Decode → Scale → Re-encode (4 renditions)
    TC->>Seg: Continuous encoded output per rendition
    Seg->>Seg: Wait for keyframe boundary (2s)
    Seg->>Origin: Write .ts segment + update .m3u8
    Origin->>CDN: CDN pulls segment (on viewer request)
    CDN->>CDN: Cache at edge PoP
    Player->>CDN: GET /720p/playlist.m3u8
    CDN-->>Player: Manifest with latest segments
    Player->>CDN: GET /720p/seg_4530.ts
    CDN-->>Player: Segment (cache hit)
    Player->>Player: Decode + render frame

Transcoding: The Expensive Core

Each ingest stream (1080p60, 6 Mbps H.264) is decoded to raw YUV frames, scaled to 4 resolutions, and re-encoded at target bitrates. Encoding at "veryfast" preset (real-time constraint) costs ~2.5 CPU cores per stream. At 150K streams with tiered renditions, that's ~110K cores — the #1 compute cost. GPU encoding (NVIDIA NVENC) reduces per-stream cost with slightly lower quality.

Segmentation: Continuous → Discrete

The segmenter chops continuous encoder output into 2-second .ts files at keyframe boundaries. Each segment is an independently-decodable HTTP-cacheable file. The manifest (.m3u8) is a rolling window of ~10 segments. The CDN caches segments at 200+ edge PoPs — your origin sees ~200 requests per segment, not 1M.
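
The rolling-window manifest described above can be sketched as follows. The class name and segment naming scheme are hypothetical; the playlist tags are standard HLS media playlist tags. A live playlist omits `#EXT-X-ENDLIST`, and `#EXT-X-MEDIA-SEQUENCE` advances as old segments fall out of the window.

```python
import math
from collections import deque


class RollingPlaylist:
    """Keeps the newest `window` segments and renders a live HLS media playlist."""

    def __init__(self, window: int = 10, segment_duration_s: float = 2.0):
        self.segment_duration_s = segment_duration_s
        self.segments = deque(maxlen=window)   # (sequence_number, uri); oldest evicted
        self.next_sequence = 0

    def add_segment(self) -> str:
        """Register a segment that was just closed at a keyframe boundary."""
        uri = f"seg_{self.next_sequence}.ts"   # hypothetical naming scheme
        self.segments.append((self.next_sequence, uri))
        self.next_sequence += 1
        return uri

    def render(self) -> str:
        first_sequence = self.segments[0][0] if self.segments else 0
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{math.ceil(self.segment_duration_s)}",
            f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
        ]
        for _, uri in self.segments:
            lines.append(f"#EXTINF:{self.segment_duration_s:.3f},")
            lines.append(uri)
        return "\n".join(lines) + "\n"
```

After twelve calls to `add_segment()` with the default window of ten, the rendered playlist starts at media sequence 2 and lists `seg_2.ts` through `seg_11.ts`.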

Low-Latency HLS (LL-HLS)

Standard HLS adds ~2s of segmentation delay (wait for full segment). LL-HLS emits 200ms partial segments — the player fetches parts as they're produced. Combined with blocking playlist reloads (player long-polls the manifest, server responds instantly when a new part exists), latency drops from ~5s to ~2.5 seconds.

Latency budget: Encoding 0.5s + Transcoding 0.5s + Part duration 0.2s + CDN propagation 0.3s + Player buffer 1.0s = ~2.5s glass-to-glass with LL-HLS.
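
Summing the two budgets stage by stage makes the comparison concrete. The sketch below uses the stage values quoted in this document (the standard HLS figures come from the non-functional requirements), expressed in integer milliseconds as (min, max) pairs; the dictionary and function names are illustrative.

```python
# Per-stage latency in milliseconds as (min, max), taken from the figures in the text.
STANDARD_HLS_MS = {
    "encoding": (500, 500),
    "transcoding": (500, 500),
    "segmentation": (0, 2000),     # wait for the current 2s segment to close
    "cdn": (300, 300),
    "player_buffer": (2000, 2000),
}

LL_HLS_MS = {
    "encoding": (500, 500),
    "transcoding": (500, 500),
    "part_duration": (200, 200),   # 200ms partial segments
    "cdn_propagation": (300, 300),
    "player_buffer": (1000, 1000),
}


def budget_range_ms(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (best case, worst case) glass-to-glass latency in milliseconds."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst
```

With these inputs the LL-HLS budget is exactly 2500 ms, while standard HLS ranges from 3300 ms to 5300 ms. The worst case for standard HLS therefore sits slightly above the 5-second target, which is worth acknowledging if asked.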

07

Key Design Decisions & Tradeoffs

1. Viewer Delivery Protocol

✓ Chosen

HLS over CDN

Segments are HTTP-cacheable files. With 200+ edge PoPs, each PoP serves ~5000 viewers of a 1M-viewer stream from cache, and the origin sees ~200 req/segment. Adding viewers adds CDN load, not origin load. Cost: 3-5s latency floor.

✗ Alternative

WebRTC (SFU)

Sub-second latency (200-500ms). But each viewer = separate connection to your infra. 1M viewers = 4 Tbps from YOUR servers, ~20K SFU servers. No CDN caching. Only viable for <10K viewers.

2. Segment Duration

✓ Chosen

2-Second Segments + LL-HLS

3-5s latency. Fast ABR switching (adapt quality every 2s). Twitch's proven sweet spot. More CDN requests but smaller per-request overhead.

✗ Alternative

6-Second Segments

10-15s latency. Better CDN efficiency (3x fewer requests). More resilient to jitter. Chat interaction feels sluggish. Better for pure broadcast (sports/news).

3. Transcoding Strategy

✓ Chosen

Server-Side, Tiered Renditions

Full ladder (5 renditions) for 1000+ viewer streams. 2 renditions for 10-100. Passthrough for <10. Saves ~60% transcoding cost while ensuring ABR for most viewers.

✗ Alternative

Passthrough Only / SVC

Zero transcoding cost. But viewers with slow connections can't watch. SVC (Scalable Video Coding) is theoretically ideal but lacks decoder support and produces worse quality.
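
The chosen tiered strategy reduces to a small lookup on viewer count. In this sketch the thresholds come from the text, but two things are assumptions: which two renditions the middle tier uses, and how to treat the 100-1000 viewer band, which the text does not specify (here it is given the 2-rendition tier as a placeholder).

```python
FULL_LADDER = ["1080p60", "720p30", "480p30", "360p30", "audio-only"]
TWO_RENDITIONS = ["source", "480p30"]   # ASSUMPTION: the text does not say which two
PASSTHROUGH = ["source"]


def renditions_for(viewer_count: int) -> list[str]:
    """Pick a rendition set from the current viewer count."""
    if viewer_count < 10:
        return PASSTHROUGH              # no transcoding at all
    if viewer_count < 1000:
        # The text specifies 2 renditions for 10-100 viewers.
        # ASSUMPTION: the unspecified 100-999 band gets the same tier.
        return TWO_RENDITIONS
    return FULL_LADDER                  # 1000+ viewers: full ABR ladder
```

In practice the tier would likely be re-evaluated as viewer counts change, with some hysteresis so a stream hovering near a threshold does not flap between tiers.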

4. Chat Pub/Sub Backend

✓ Chosen

Redis Pub/Sub (real-time) + Kafka (persistence)

Redis delivers chat in <1ms. Kafka persists to Cassandra async. Chat is ephemeral — losing a message during failover is unnoticeable. Dual-path gives speed + durability.

✗ Alternative

Kafka as Primary Chat Bus

2-10ms latency (vs <1ms). 150K dynamic channels don't map well to Kafka's partition model. Durability is overkill for the real-time chat path.
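
The dual-path idea can be illustrated with in-memory stand-ins: a subscriber list in place of Redis Pub/Sub and a plain list in place of a Kafka topic. The class and method names are hypothetical, and this is a sketch of the shape of the design rather than a client for either system.

```python
from collections import defaultdict
from typing import Callable


class ChatBus:
    """Real-time fan-out plus an append-only log for later persistence."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.durable_log: list[tuple[str, dict]] = []   # stand-in for a Kafka topic

    def subscribe(self, room_id: str, callback: Callable[[dict], None]) -> None:
        self.subscribers[room_id].append(callback)

    def publish(self, room_id: str, message: dict) -> int:
        """Deliver to live subscribers first, then record for async persistence."""
        delivered = 0
        for callback in self.subscribers[room_id]:
            try:
                callback(message)        # real-time path: best effort
                delivered += 1
            except Exception:
                continue                 # a broken subscriber must not block the room
        self.durable_log.append((room_id, message))   # persistence path
        return delivered
```

The real-time path tolerates loss, matching the observation that chat is ephemeral, while the log is what a separate consumer would drain into chat history storage.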

5. CDN Strategy

✓ Chosen

Multi-CDN with Intelligent Routing

2-3 CDN providers. Route viewers to best CDN by real-time performance + cost. Saves 15-25% ($3-6M/mo at scale). CDN-level resilience if one provider has outage.

✗ Alternative

Single CDN

Simpler operations, one API/config. Right choice when CDN spend <$1M/mo. No failover if the provider has regional issues. No negotiating leverage on pricing.
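
One simple way to express "best CDN by real-time performance + cost" is a quality gate followed by a cost minimization. The sketch below is one possible policy, not the policy of any particular platform; the thresholds, field names, and CDN names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CdnStats:
    name: str
    error_rate: float        # fraction of failed segment requests, recent window
    p95_latency_ms: float    # segment fetch latency seen by players
    cost_per_gb: float
    healthy: bool = True


def pick_cdn(candidates: list[CdnStats],
             max_error_rate: float = 0.02,           # hypothetical quality bar
             max_p95_latency_ms: float = 400.0) -> CdnStats:
    """Cheapest CDN that meets the quality bar; otherwise the least broken one."""
    healthy = [c for c in candidates if c.healthy]
    if not healthy:
        raise RuntimeError("no healthy CDN available")
    good = [c for c in healthy
            if c.error_rate <= max_error_rate and c.p95_latency_ms <= max_p95_latency_ms]
    if good:
        return min(good, key=lambda c: c.cost_per_gb)
    return min(healthy, key=lambda c: c.error_rate)
```

A production router would typically evaluate this per region or per ISP and shift traffic gradually, since moving all viewers at once can overload the newly selected provider.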

6. Transcoding Redundancy

✓ Chosen

Selective Dual Encoding (Top 5%)

Dual transcoders for streams with 1000+ viewers (70% of all viewers). Seamless failover on crash. Adds ~$216K/mo — cost-proportional protection.

✗ Alternative

No Redundancy / Fast Failover Only

Every crash = 12-18s viewer freeze. Unacceptable for esports finals with 500K viewers. Acceptable for a stream with 3 viewers.

08

What Can Go Wrong

Transcoder Crash Mid-Stream

Top 5% streams (dual-encoded): standby transcoder output takes over in 0-2 seconds — seamless. Remaining 95%: stream router detects failure in 6s, re-routes RTMP to new transcoder, first new segment in 14-18s. Viewers see a freeze then resume. Process isolation prevents one corrupt stream from crashing 20+ co-located streams.

CDN Edge PoP Failure

Multi-CDN router shifts new viewers to healthy CDN within seconds. Existing viewers switch on next token refresh (5-10 min) or immediately via player-side error recovery: detect 3 consecutive segment failures → request new manifest from API → receive URLs for healthy CDN. Recovery: 2-3 seconds.
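
The player-side recovery rule (three consecutive segment failures, then ask the API for a fresh manifest) is a small state machine. In this sketch `fetch_playback` is a hypothetical callable standing in for the playback API call; it is assumed to return a manifest URL on a healthy CDN.

```python
from typing import Callable


class PlaybackRecovery:
    """Tracks consecutive segment failures and switches manifests at a threshold."""

    FAILURE_THRESHOLD = 3

    def __init__(self, fetch_playback: Callable[[], str]):
        self.fetch_playback = fetch_playback      # hypothetical API client call
        self.manifest_url = fetch_playback()
        self.consecutive_failures = 0

    def on_segment_result(self, ok: bool) -> bool:
        """Record one segment fetch; return True if the manifest was switched."""
        if ok:
            self.consecutive_failures = 0         # any success resets the streak
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures < self.FAILURE_THRESHOLD:
            return False
        self.manifest_url = self.fetch_playback() # new URLs, ideally on a healthy CDN
        self.consecutive_failures = 0
        return True
```

Requiring consecutive failures, rather than a total count, keeps a single dropped segment from triggering a CDN switch.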

Origin Server Failure

CDN origin shield serves cached segments for 8 seconds (TTL). Meanwhile, CDN falls back to S3 backup origin (segments async-replicated). Viewers see 0-1 second stall. Without S3 backup: 25-35 second gap while transcoder output is redirected to new origin.

PostgreSQL Primary Failure

Three-plane separation saves us: live streams keep playing (video pipeline doesn't touch DB). Chat keeps working (Redis + Cassandra). Writes fail (new streams, follows, subscriptions) for 15-30s until Patroni promotes a replica. Write queue in Redis bridges the gap.

Thundering Herd — Viral Stream Goes Live

500K viewers arrive in 30 seconds. CDN origin shield + request collapsing reduces origin load to O(1) per segment. API caches playback responses (per-stream, not per-viewer). Chat pre-provisions servers for known large channels. Notifications spread over 60-120s with engagement tiering. Self-healing as caches warm within 10-30s.
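
Request collapsing is what turns a burst of identical cache misses into a single origin fetch. The sketch below shows the technique with threads in one process; CDNs implement it inside their own proxies, so the class here is purely illustrative and `fetch_from_origin` is a hypothetical callable.

```python
import threading
from typing import Callable


class CollapsingCache:
    """Concurrent misses for the same key share a single origin fetch."""

    def __init__(self, fetch_from_origin: Callable[[str], str]):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._cache: dict[str, str] = {}
        self._inflight: dict[str, threading.Event] = {}

    def get(self, key: str) -> str:
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._inflight.get(key)
            is_leader = event is None
            if is_leader:
                event = threading.Event()
                self._inflight[key] = event
        if is_leader:
            try:
                value = self._fetch(key)          # the only origin request
                with self._lock:
                    self._cache[key] = value
                return value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()                       # wake followers, success or not
        event.wait()
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        return self.get(key)                      # leader failed: try again
```

Fifty threads requesting the same uncached segment at once produce one call to `fetch_from_origin`; the other forty-nine wait on the event and read the cached result.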

Cascading Overload — Transcoder Fleet at Capacity

Admission control: reject new small streams before dropping existing live ones. Priority lanes for partners. Degraded mode: reduce renditions (5→3→1 passthrough). Circuit breakers at each boundary prevent cascade. Auto-scale GPU fleet within 3-5 minutes.
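
Admission control and the degraded-mode ladder can both be driven by fleet utilization. The rendition steps (5→3→1) come from the text; the utilization thresholds and the size of the partner reserve below are hypothetical values chosen only to make the sketch concrete.

```python
def renditions_under_load(utilization: float) -> int:
    """Degraded mode: shrink the ladder for new work as the fleet fills up."""
    if utilization < 0.80:       # hypothetical threshold
        return 5                 # full ladder
    if utilization < 0.95:       # hypothetical threshold
        return 3
    return 1                     # passthrough only


def admit_new_stream(utilization: float, is_partner: bool,
                     partner_reserve: float = 0.10) -> bool:
    """Reject new non-partner streams first, keeping headroom for partners."""
    if is_partner:
        return utilization < 1.0
    return utilization < 1.0 - partner_reserve
```

Note that both functions only affect new or future work: existing live streams are never dropped to make room, which matches the priority stated above.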

09

Interview Tips

🎬
Lead with the video pipeline, not the chat.
90% of candidates start with the database or chat. The defining challenge is RTMP → transcode → HLS → CDN. Start here to differentiate yourself. Cover it in 3-5 minutes, then layer on chat and discovery.
⏱️
Know your latency budget cold.
"Glass-to-glass breaks into five stages: encoding 0.5s, transcoding 0.5s, segmentation 0-2s, CDN 0.3s, player buffer 2s. With LL-HLS partial segments, we get to ~2.5s." This 20-second answer shows more depth than most show in 20 minutes.
🌐
Say "CDN" in the first two minutes.
10 Tbps egress is impossible to self-serve. CDN is the architecture, not an optimization. Mention multi-CDN for bonus points. If you're 10 minutes in without mentioning CDN, the interviewer questions your understanding of scale.
💰
Mention cost unprompted.
Three big costs: CDN bandwidth (~$24M/mo), transcoding compute (~$1.3M/mo), storage (~$184K/mo). Mention tiered transcoding (save 60%) and AV1 codec transition (save 30-50% CDN). This shows you think like an architect who owns a P&L.
🧱
Separate the three planes early.
Draw Ingest, Media, and Control planes in the first 5 minutes. Say: "They scale independently and fail independently — a database crash never kills live video." This framing organizes the entire discussion.
📊
Use specific numbers, not hand-waves.
"3.5M concurrent viewers × 2.7 Mbps = ~10 Tbps" beats "we'd need a lot of bandwidth." Memorize 5-6 key numbers: 3.5M viewers, 150K streamers, 10 Tbps egress, 110K transcoding cores, 117K heartbeats/sec, $24M/mo CDN.
🚫
Avoid these anti-patterns.
Don't confuse live streaming with video calling (opposite tradeoffs). Don't serve video from your API servers. Don't spend 15 minutes on chat (it's solved — WebSocket + pub/sub). Don't say "microservices" without justifying each extraction.
11

Evolution

How this design grows from MVP to planet-scale. Each transition is triggered by specific numerical thresholds, not arbitrary timelines.

1

MVP — Single Server (~100 viewers)

One machine: Nginx-RTMP receives ingest, FFmpeg transcodes to 1 rendition, Nginx serves HLS segments. Socket.io chat. SQLite. Total cost: ~$50/mo. Breaks at: >1 concurrent streamer or >100 viewers (CPU/bandwidth saturated).

2

Separation — Dedicated Components (~10K viewers)

Separate ingest, transcoding, and origin. Add CloudFront CDN — the single biggest leap (100→unlimited viewers). Multi-rendition transcoding. PostgreSQL + Redis. Dedicated chat service. Cost: ~$3K/mo.

3

Clustering — Horizontal Scale (~100K viewers)

Transcoder fleet with stream router. Origin cluster. Chat cluster with Redis Pub/Sub. Kafka event bus. Elasticsearch for search. Cassandra for chat history. Tiered transcoding strategy. Basic VOD pipeline. Cost: ~$30K/mo.

4

Multi-Region — Global Distribution (~1M viewers)

Ingest PoPs in 5 regions. Regional transcoding clusters. LL-HLS for 2.5s latency. GPU transcoding (NVENC). Dual encoding for top streams. Multi-CDN. Partitioned chat. Dedicated heartbeat service. PostgreSQL sharding. Cost: ~$500K/mo.

5

Planet-Scale — Twitch Tier (3.5M+ concurrent)

Intelligent multi-CDN routing (saves 15-25%). AV1 codec (saves 30-50% bandwidth). Active-active multi-region. Edge computing for manifest manipulation. ISP peering / embedded caches. Real-time ML moderation. Custom CDN infrastructure. Cost: ~$30M+/mo.
