Live Streaming Platform

01

Problem Statement

Design a live streaming platform like Twitch or YouTube Live where creators broadcast live video to potentially millions of concurrent viewers. The system must handle video ingest from streamers, real-time transcoding to multiple quality levels, global distribution via CDN, interactive chat, stream discovery, and social features — all with glass-to-glass latency under 5 seconds.

The defining challenge is getting video from one producer to millions of consumers in near real-time. Unlike VOD (YouTube/Netflix), every second of latency matters because chat interaction between streamer and audience is the core product experience.

Core question: How do you ingest a single video stream from one creator and deliver it to potentially millions of concurrent viewers with low latency, adaptive quality, and interactive features?

02

Requirements

Functional Requirements

Go Live — Streamer broadcasts from OBS/mobile via RTMP/SRT with stream key authentication, reconnection handling, and health monitoring
Watch Live — Viewers watch with adaptive bitrate (1080p→360p), signed CDN URLs, and sub-2-second time-to-first-frame
Live Chat — Real-time bidirectional messaging via WebSocket with emotes, badges, moderation, rate limiting, and message sampling for mega-streams
Stream Discovery — Browse by category, search channels/streams, personalized recommendations, real-time viewer count rankings
Follow / Subscribe — Free follows (social graph), paid subscriptions (entitlements), go-live push notifications with engagement tiering
VOD — Past broadcasts recorded in parallel during live stream, clip creation without re-encoding, chat replay alongside VODs

Non-Functional Requirements

Low Latency — Glass-to-glass < 5 seconds (standard HLS), ~2.5 seconds (LL-HLS). Standard HLS latency budget: encoding 0.5s + transcoding 0.5s + segmentation 0-2s + CDN 0.3s + player buffer 2s = 3.3-5.3s, so the worst case sits right at the 5-second target.
High Availability — 99.99% uptime. Three-plane separation (ingest, media, control) ensures blast radius containment. A database crash doesn't kill video.
Massive Concurrency — 1M+ viewers per stream. CDN handles 10 Tbps egress. Chat scales via partitioned WebSocket servers.
Adaptive Quality — Multi-rendition transcoding (1080p60/720p30/480p30/360p30/audio-only). Tiered: full ladder for popular streams, passthrough for tiny streams.
Global Distribution — Multi-CDN strategy (Akamai + CloudFront + Fastly) with intelligent routing. Origin shield reduces origin load by 100x.

03

Scale Estimation

Derived from Twitch-scale assumptions: 140M MAU, 30M DAU, ~12% online at peak.

Peak Concurrent Viewers: 3.5M
Peak Concurrent Streamers: 150K
Peak Egress Bandwidth: ~10 Tbps
Transcoding CPU Cores: 110K
Viewer Heartbeats: 117K/sec
Chat Msg (Burst): ~50K/sec
VOD Storage (Steady): ~8 PB
CDN Cost: ~$24M/mo

Key Derivation

Egress: 3.5M viewers × 2.7 Mbps weighted average (mix of 1080p/720p/480p/360p) ≈ 9.5 Tbps, rounded to ~10 Tbps. This single number forces the entire architecture through a CDN: serving 10 Tbps from your own origin is not practical. Transcoding: a tiered rendition strategy (passthrough for <10 viewers, 2 renditions for 10-100, full ladder for 1000+) yields ~110K core-equivalents across 150K streams. The power-law distribution matters: the top 0.1% of streams account for 30-40% of viewers.
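The headline numbers can be re-derived in a few lines. This is a back-of-envelope sketch only; the 30-second heartbeat interval is an assumption (the text gives the 117K/sec rate but not the interval behind it).

```python
# Back-of-envelope check of the scale estimates.
DAU = 30_000_000
PEAK_ONLINE_FRACTION = 0.12        # "~12% online at peak"
PEAK_VIEWERS = 3_500_000           # figure used throughout the text
AVG_BITRATE_MBPS = 2.7             # weighted average across renditions
HEARTBEAT_INTERVAL_S = 30          # assumption: not stated in the text

peak_from_dau = DAU * PEAK_ONLINE_FRACTION                    # viewers
egress_tbps = PEAK_VIEWERS * AVG_BITRATE_MBPS / 1_000_000     # Mbps -> Tbps
heartbeats_per_sec = PEAK_VIEWERS / HEARTBEAT_INTERVAL_S

print(f"peak viewers from DAU: {peak_from_dau:,.0f}")
print(f"egress: {egress_tbps:.2f} Tbps")
print(f"heartbeats: {heartbeats_per_sec:,.0f}/sec")
```

The DAU-based estimate lands at 3.6M, which the text rounds down to 3.5M.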

04

API Design

Video is never served through the API — the API returns signed CDN URLs and metadata only. Three auth models: User JWT (API calls), Stream Key (RTMP ingest), Signed CDN Token (playback URLs with HMAC + expiry).
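A signed playback URL with HMAC and expiry might look like the sketch below. The secret, the `expires`/`sig` parameter names, and the `path:expires` message layout are all hypothetical; real CDNs each define their own token format.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # hypothetical key shared by the API and the CDN edge


def _signature(path, expires):
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path, ttl_s, now=None):
    """Return path plus an expiry timestamp and an HMAC over both."""
    now = time.time() if now is None else now
    expires = int(now + ttl_s)
    return f"{path}?expires={expires}&sig={_signature(path, expires)}"


def verify_url(url, now=None):
    """Edge-side check: reject expired or tampered URLs."""
    now = time.time() if now is None else now
    path, _, query = url.partition("?")
    try:
        params = dict(pair.split("=", 1) for pair in query.split("&"))
        expires = int(params["expires"])
        sig = params["sig"]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(_signature(path, expires), sig)
```

Because the path is part of the signed message, a token issued for one rendition playlist cannot be reused for a different path.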

Stream Lifecycle

POST   /v1/streams                              # Create stream (metadata setup, status → CONFIGURED)
PATCH  /v1/streams/{id}                         # Update title/category while live
POST   /v1/streams/{id}/end                     # End stream → triggers VOD processing
POST   /v1/channels/{id}/stream-key/regenerate  # Invalidate + regenerate stream key

Viewer Playback

GET    /v1/streams/{id}/playback                # Returns signed HLS/DASH/LL-HLS URLs, chat WS URL
POST   /v1/streams/{id}/heartbeat               # Lightweight: rendition, buffer health, QoE metrics → 204
POST   /v1/streams/{id}/playback/refresh-token  # Rotate CDN token every 5-10 min

Chat (WebSocket + REST)

WS     wss://chat.platform.com/ws?stream={id}&token={jwt}
         → CONNECTED (last 50 msgs, emote sets, room settings)
         → MESSAGE (nonce-based dedup, server-side emote parsing)
         → SUBSCRIPTION, RAID, MODERATION events

POST   /v1/chat/{room_id}/moderate    # timeout, ban, unban, delete_message
GET    /v1/chat/{room_id}/messages?around_timestamp=T  # VOD chat replay
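Nonce-based dedup on the MESSAGE path can be sketched as a bounded "recently seen" set. The class name, the `(user_id, nonce)` key, and the capacity are assumptions for illustration; a production chat server would more likely keep this state in Redis with a TTL.

```python
from collections import OrderedDict


class NonceDeduper:
    """Drops a chat message if the same (user, nonce) pair was already accepted."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._seen = OrderedDict()  # insertion-ordered, so the oldest entry evicts first

    def accept(self, user_id, nonce):
        key = (user_id, nonce)
        if key in self._seen:
            return False            # client retry of a message we already have
        self._seen[key] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)
        return True
```

A client that retries a send after a flaky connection reuses the same nonce, so the retry is dropped instead of appearing twice in the room.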
      

Discovery & Social

GET    /v1/directory/streams?category=valorant&sort=viewers  # Cursor-based, Redis ZSET backed
GET    /v1/search?q=pokimane&type=all&live_only=true         # Elasticsearch with liveness boost
POST   /v1/users/{id}/follows                                # Follow with per-channel notification toggle
POST   /v1/subscriptions                                     # Paid sub → Stripe → entitlement cache in Redis
POST   /v1/clips                                             # Time-range clip from live/VOD (no re-encoding)

Design principle: The heartbeat endpoint (117K req/sec) goes to a dedicated lightweight service — not the main API. It does zero business logic: updates a Redis counter, publishes QoE to Kafka, returns 204.
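A minimal sketch of that handler, with in-memory stand-ins for Redis and Kafka so it runs on its own. The windowed counter key and the 30-second window are assumptions; the text only says the service bumps a counter, publishes QoE, and returns 204.

```python
from collections import defaultdict, deque

WINDOW_S = 30  # assumed heartbeat interval

window_counts = defaultdict(int)  # stand-in for Redis INCR on a per-stream, per-window key
qoe_events = deque()              # stand-in for a Kafka producer


def handle_heartbeat(stream_id, payload, now):
    """Count the viewer in the current window, forward QoE, return HTTP 204."""
    window = int(now // WINDOW_S)
    window_counts[(stream_id, window)] += 1
    qoe_events.append({"stream_id": stream_id, "ts": now, **payload})
    return 204


def viewer_count(stream_id, now):
    """Approximate viewers: heartbeats seen in the last complete window."""
    previous_window = int(now // WINDOW_S) - 1
    return window_counts.get((stream_id, previous_window), 0)
```

Reading the previous complete window keeps the count stable while the current window is still filling.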

05

High-Level Architecture

Three independent planes: Ingest (receives RTMP), Media (transcode + CDN delivery), Control (API, chat, discovery). They scale independently and fail independently — a database crash never kills live video.

Request Flow — Step Through

Viewer Clicks Play → API Server → CDN Edge → Origin Shield → Origin (NVMe) → Player Renders

06

Deep Dive — Video Ingest → Transcode → HLS Pipeline

This is the defining technical challenge of live streaming — the component that doesn't exist in any other system design problem. Every byte travels: Streamer's GPU → OBS encoder → RTMP → Ingest server → Transcoder → Segmenter → Origin → CDN → Player.

sequenceDiagram
    participant OBS as Streamer (OBS)
    participant Ingest as Ingest PoP
    participant TC as Transcoder
    participant Seg as Segmenter
    participant Origin as Origin (NVMe)
    participant CDN as CDN Edge
    participant Player as Viewer Player
    OBS->>Ingest: RTMP connect + stream key
    Ingest->>Ingest: Validate stream key (Redis)
    Ingest->>TC: Forward raw H.264 + AAC
    TC->>TC: Decode → Scale → Re-encode (4 renditions)
    TC->>Seg: Continuous encoded output per rendition
    Seg->>Seg: Wait for keyframe boundary (2s)
    Seg->>Origin: Write .ts segment + update .m3u8
    Origin->>CDN: CDN pulls segment (on viewer request)
    CDN->>CDN: Cache at edge PoP
    Player->>CDN: GET /720p/playlist.m3u8
    CDN-->>Player: Manifest with latest segments
    Player->>CDN: GET /720p/seg_4530.ts
    CDN-->>Player: Segment (cache hit)
    Player->>Player: Decode + render frame

Transcoding: The Expensive Core

Each ingest stream (1080p60, 6 Mbps H.264) is decoded to raw YUV frames, scaled to 4 resolutions, and re-encoded at target bitrates. Encoding at "veryfast" preset (real-time constraint) costs ~2.5 CPU cores per stream. At 150K streams with tiered renditions, that's ~110K cores — the #1 compute cost. GPU encoding (NVIDIA NVENC) reduces per-stream cost with slightly lower quality.
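The tiered rendition strategy can be expressed as a lookup on current viewer count. The text specifies passthrough under 10 viewers, two renditions for 10-100, and the full ladder at 1000+; which two renditions are chosen, and what happens between 100 and 1000 viewers, are assumptions in this sketch.

```python
FULL_LADDER = ["1080p60", "720p30", "480p30", "360p30", "audio-only"]


def renditions_for(viewers):
    """Pick the rendition set for a stream based on its current audience."""
    if viewers < 10:
        return ["source"]                       # passthrough, no transcode
    if viewers <= 100:
        return ["source", "480p30"]             # two renditions (choice is assumed)
    if viewers < 1000:
        return ["source", "720p30", "480p30"]   # assumed middle tier, not in the text
    return list(FULL_LADDER)


for audience in (3, 50, 500, 25_000):
    print(audience, renditions_for(audience))
```

Since most streams sit in the first two tiers, the fleet average lands well below the per-stream cost of a full ladder.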

Segmentation: Continuous → Discrete

The segmenter chops continuous encoder output into 2-second .ts files at keyframe boundaries. Each segment is an independently-decodable HTTP-cacheable file. The manifest (.m3u8) is a rolling window of ~10 segments. The CDN caches segments at 200+ edge PoPs — your origin sees ~200 requests per segment, not 1M.
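The rolling-window manifest can be illustrated with a small generator for an HLS media playlist. The `seg_<n>.ts` naming follows the sequence diagram; the function name and defaults are illustrative, and a real packager would also handle discontinuities and encryption tags.

```python
import math


def media_playlist(latest_seq, window=10, seg_duration=2.0):
    """Build a live HLS media playlist covering the last `window` segments."""
    first = max(0, latest_seq - window + 1)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(seg_duration)}",
        f"#EXT-X-MEDIA-SEQUENCE:{first}",  # advances as old segments roll off
    ]
    for seq in range(first, latest_seq + 1):
        lines.append(f"#EXTINF:{seg_duration:.3f},")
        lines.append(f"seg_{seq}.ts")
    # No #EXT-X-ENDLIST: its absence tells the player the stream is still live.
    return "\n".join(lines) + "\n"


print(media_playlist(4530))
```

Each time the segmenter finishes a segment, the playlist is regenerated with `latest_seq` incremented, so the oldest entry drops out of the window.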

Low-Latency HLS (LL-HLS)

Standard HLS adds ~2s of segmentation delay (wait for full segment). LL-HLS emits 200ms partial segments — the player fetches parts as they're produced. Combined with blocking playlist reloads (player long-polls the manifest, server responds instantly when a new part exists), latency drops from ~5s to ~2.5 seconds.

Latency budget: Encoding 0.5s + Transcoding 0.5s + Part duration 0.2s + CDN propagation 0.3s + Player buffer 1.0s = ~2.5s glass-to-glass with LL-HLS.
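Summing the per-stage figures makes the two budgets easy to compare. The stage values are the ones quoted in this document; the dictionary layout and function name are just for illustration.

```python
# Stage durations in seconds; a tuple means (best case, worst case).
STANDARD_HLS = {
    "encoding": 0.5,
    "transcoding": 0.5,
    "segmentation": (0.0, 2.0),  # wait for the current 2s segment to close
    "cdn": 0.3,
    "player_buffer": 2.0,
}
LL_HLS = {
    "encoding": 0.5,
    "transcoding": 0.5,
    "part_duration": 0.2,        # 200ms partial segments
    "cdn": 0.3,
    "player_buffer": 1.0,
}


def budget(stages):
    """Return (best, worst) glass-to-glass latency for a set of stages."""
    low = high = 0.0
    for value in stages.values():
        lo, hi = value if isinstance(value, tuple) else (value, value)
        low += lo
        high += hi
    return round(low, 2), round(high, 2)


print("standard HLS:", budget(STANDARD_HLS))
print("LL-HLS:", budget(LL_HLS))
```

The saving comes from two places: the segmentation wait shrinks from up to 2s to a 0.2s part, and the player buffer drops from 2s to 1s.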

07

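Key Design Decisions & Tradeoffs

1. Viewer Delivery Protocol: HLS over CDN vs. WebRTC (SFU)
2. Segment Duration: 2-Second Segments + LL-HLS vs. 6-Second Segments
3. Transcoding Strategy: Server-Side, Tiered Renditions vs. Passthrough Only / SVC
4. Chat Pub/Sub Backend: Redis Pub/Sub (real-time) + Kafka (persistence) vs. Kafka as Primary Chat Bus
5. CDN Strategy: Multi-CDN with Intelligent Routing vs. Single CDN
6. Transcoding Redundancy: Selective Dual Encoding (Top 5%) vs. No Redundancy / Fast Failover Only

Without transcoding redundancy, every transcoder crash means a 12-18s viewer freeze. That is unacceptable for an esports final with 500K viewers, but acceptable for a stream with 3 viewers.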

08

What Can Go Wrong

Transcoder Crash Mid-Stream

Top 5% of streams (dual-encoded): the standby transcoder's output takes over in 0-2 seconds, which viewers do not notice. Remaining 95%: the stream router detects the failure in 6s and re-routes RTMP to a new transcoder, with the first new segment arriving in 14-18s. Viewers see a freeze, then playback resumes. Process isolation prevents one corrupt stream from crashing 20+ co-located streams.

CDN Edge PoP Failure

Multi-CDN router shifts new viewers to healthy CDN within seconds. Existing viewers switch on next token refresh (5-10 min) or immediately via player-side error recovery: detect 3 consecutive segment failures → request new manifest from API → receive URLs for healthy CDN. Recovery: 2-3 seconds.
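The player-side recovery rule (three consecutive segment failures, then ask the API for a new manifest) is a small counter. This sketch is written in Python for consistency with the other examples, although a real player would run it in the browser or app; `refresh_playback` is a hypothetical callable standing in for the playback API request.

```python
class SegmentFailover:
    """Switch manifests after `threshold` consecutive segment fetch failures."""

    def __init__(self, refresh_playback, threshold=3):
        self.refresh_playback = refresh_playback  # returns a manifest URL on a healthy CDN
        self.threshold = threshold
        self.failures = 0
        self.manifest_url = None

    def record(self, ok):
        """Record one segment fetch result; return True if a failover happened."""
        if ok:
            self.failures = 0      # any success resets the streak
            return False
        self.failures += 1
        if self.failures < self.threshold:
            return False
        self.manifest_url = self.refresh_playback()
        self.failures = 0
        return True
```

Resetting on success matters: isolated failures spread across a session should not add up to a CDN switch.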

Origin Server Failure

CDN origin shield serves cached segments for 8 seconds (TTL). Meanwhile, CDN falls back to S3 backup origin (segments async-replicated). Viewers see 0-1 second stall. Without S3 backup: 25-35 second gap while transcoder output is redirected to new origin.

PostgreSQL Primary Failure

Three-plane separation saves us: live streams keep playing (video pipeline doesn't touch DB). Chat keeps working (Redis + Cassandra). Writes fail (new streams, follows, subscriptions) for 15-30s until Patroni promotes a replica. Write queue in Redis bridges the gap.

Thundering Herd — Viral Stream Goes Live

500K viewers arrive in 30 seconds. CDN origin shield + request collapsing reduces origin load to O(1) per segment. API caches playback responses (per-stream, not per-viewer). Chat pre-provisions servers for known large channels. Notifications spread over 60-120s with engagement tiering. Self-healing as caches warm within 10-30s.
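Request collapsing is what turns a burst of cache misses into one origin fetch per segment. The sketch below shows the idea with threads: the first caller for a key does the fetch, and concurrent callers wait for its result. The class and method names are illustrative, not a CDN's actual API.

```python
import threading


class SingleFlight:
    """Collapse concurrent requests for the same key into one upstream fetch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}

    def waiters(self, key):
        """Number of followers currently waiting on an in-flight fetch."""
        with self._lock:
            call = self._inflight.get(key)
            return call["waiters"] if call else 0

    def do(self, key, fetch):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None,
                        "error": None, "waiters": 0}
                self._inflight[key] = call
            else:
                call["waiters"] += 1
        if leader:
            try:
                call["value"] = fetch()          # the only origin request
            except Exception as exc:
                call["error"] = exc              # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                call["done"].set()
        else:
            call["done"].wait()
        if call["error"] is not None:
            raise call["error"]
        return call["value"]
```

Once the leader finishes, the key is removed, so the next miss for that segment starts a fresh fetch; in practice the edge cache answers before that happens.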

Cascading Overload — Transcoder Fleet at Capacity

Admission control: reject new small streams before dropping existing live ones. Priority lanes for partners. Degraded mode: reduce renditions (5→3→1 passthrough). Circuit breakers at each boundary prevent cascade. Auto-scale GPU fleet within 3-5 minutes.
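Admission control and degraded mode can both be driven by fleet utilization. The 5→3→1 rendition steps and partner priority come from the text; every threshold below is a hypothetical value chosen for illustration.

```python
def rendition_cap(utilization):
    """Maximum renditions per stream as the fleet fills up (5 -> 3 -> 1)."""
    if utilization < 0.80:       # assumed threshold
        return 5
    if utilization < 0.95:       # assumed threshold
        return 3
    return 1                     # passthrough only


def admit_new_stream(utilization, is_partner):
    """Reject new small streams first; partners keep a priority lane longer."""
    if utilization < 0.90:       # assumed threshold
        return True
    if utilization < 0.98:       # assumed threshold
        return is_partner
    return False


for u in (0.50, 0.85, 0.93, 0.99):
    print(u, rendition_cap(u), admit_new_stream(u, False), admit_new_stream(u, True))
```

Both functions only affect new work or optional quality, so streams that are already live are never dropped to relieve pressure.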

09

Interview Tips

🎬

Lead with the video pipeline, not the chat.
Most candidates start with the database or chat. The defining challenge is RTMP → transcode → HLS → CDN. Start here to differentiate yourself. Cover it in 3-5 minutes, then layer on chat and discovery.

⏱️

Know your latency budget cold.
"Glass-to-glass breaks into five stages: encoding 0.5s, transcoding 0.5s, segmentation 0-2s, CDN 0.3s, player buffer 2s. With LL-HLS partial segments, we get to ~2.5s." This 20-second answer shows more depth than most candidates show in 20 minutes.

🌐

Say "CDN" in the first two minutes.
10 Tbps of egress is not something you serve from your own origin. CDN is the architecture, not an optimization. Mention multi-CDN for bonus points. If you're 10 minutes in without mentioning CDN, the interviewer will question your understanding of scale.

💰

Mention cost unprompted.
Three big costs: CDN bandwidth (~$24M/mo), transcoding compute (~$1.3M/mo), storage (~$184K/mo). Mention tiered transcoding (save 60%) and AV1 codec transition (save 30-50% CDN). This shows you think like an architect who owns a P&L.

🧱

Separate the three planes early.
Draw Ingest, Media, and Control planes in the first 5 minutes. Say: "They scale independently and fail independently — a database crash never kills live video." This framing organizes the entire discussion.

📊

Use specific numbers, not hand-waves.
"3.5M concurrent viewers × 2.7 Mbps = ~10 Tbps" beats "we'd need a lot of bandwidth." Memorize 5-6 key numbers: 3.5M viewers, 150K streamers, 10 Tbps egress, 110K transcoding cores, 117K heartbeats/sec, $24M/mo CDN.

🚫

Avoid these anti-patterns.
Don't confuse live streaming with video calling (opposite tradeoffs). Don't serve video from your API servers. Don't spend 15 minutes on chat (it's solved — WebSocket + pub/sub). Don't say "microservices" without justifying each extraction.

10

Evolution

How this design grows from MVP to planet-scale. Each transition is triggered by specific numerical thresholds, not arbitrary timelines.

1

MVP — Single Server (~100 viewers)

One machine: Nginx-RTMP receives ingest, FFmpeg transcodes to 1 rendition, Nginx serves HLS segments. Socket.io chat. SQLite. Total cost: ~$50/mo. Breaks at: >1 concurrent streamer or >100 viewers (CPU/bandwidth saturated).

2

Separation — Dedicated Components (~10K viewers)

Separate ingest, transcoding, and origin. Add CloudFront CDN, the single biggest leap: the viewer ceiling moves from ~100 (one server's bandwidth) to whatever the CDN can absorb. Multi-rendition transcoding. PostgreSQL + Redis. Dedicated chat service. Cost: ~$3K/mo.

3

Clustering — Horizontal Scale (~100K viewers)

Transcoder fleet with stream router. Origin cluster. Chat cluster with Redis Pub/Sub. Kafka event bus. Elasticsearch for search. Cassandra for chat history. Tiered transcoding strategy. Basic VOD pipeline. Cost: ~$30K/mo.

4

Multi-Region — Global Distribution (~1M viewers)

Ingest PoPs in 5 regions. Regional transcoding clusters. LL-HLS for 2.5s latency. GPU transcoding (NVENC). Dual encoding for top streams. Multi-CDN. Partitioned chat. Dedicated heartbeat service. PostgreSQL sharding. Cost: ~$500K/mo.

5

Planet-Scale — Twitch Tier (3.5M+ concurrent)

Intelligent multi-CDN routing (saves 15-25%). AV1 codec (saves 30-50% bandwidth). Active-active multi-region. Edge computing for manifest manipulation. ISP peering / embedded caches. Real-time ML moderation. Custom CDN infrastructure. Cost: ~$30M+/mo.
