06
Deep Dive — Video Ingest → Transcode → HLS Pipeline
This is the defining technical challenge of live streaming, the one component with no close analogue in other system design problems. Every byte follows the same path: streamer's GPU → OBS encoder → RTMP → ingest server → transcoder → segmenter → origin → CDN → player.
sequenceDiagram
participant OBS as Streamer (OBS)
participant Ingest as Ingest PoP
participant TC as Transcoder
participant Seg as Segmenter
participant Origin as Origin (NVMe)
participant CDN as CDN Edge
participant Player as Viewer Player
OBS->>Ingest: RTMP connect + stream key
Ingest->>Ingest: Validate stream key (Redis)
Ingest->>TC: Forward raw H.264 + AAC
TC->>TC: Decode → Scale → Re-encode (4 renditions)
TC->>Seg: Continuous encoded output per rendition
Seg->>Seg: Wait for keyframe boundary (2s)
Seg->>Origin: Write .ts segment + update .m3u8
Origin->>CDN: CDN pulls segment (on viewer request)
CDN->>CDN: Cache at edge PoP
Player->>CDN: GET /720p/playlist.m3u8
CDN-->>Player: Manifest with latest segments
Player->>CDN: GET /720p/seg_4530.ts
CDN-->>Player: Segment (cache hit)
Player->>Player: Decode + render frame
Transcoding: The Expensive Core
Each ingest stream (1080p60, 6 Mbps H.264) is decoded to raw YUV frames, scaled to 4 resolutions, and re-encoded at target bitrates. Because encoding has to keep up with real time, it runs at the "veryfast" preset, which costs ~2.5 CPU cores per stream. At a flat 2.5 cores, 150K streams would need ~375K cores; tiered renditions bring the total down to ~110K cores, which is still the #1 compute cost. GPU encoding (NVIDIA NVENC) reduces per-stream cost further, at slightly lower quality.
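The core counts above can be checked with a few lines of arithmetic. This is a rough sketch using only the figures quoted in this section; it assumes the ~2.5 cores figure applies to a stream that gets the full set of renditions, and the function names are illustrative.

```python
# Back-of-the-envelope transcoding capacity, using the figures quoted above.
STREAMS = 150_000
CORES_PER_STREAM = 2.5   # "veryfast" preset; assumed to cover all 4 renditions
FLEET_CORES = 110_000    # quoted total once renditions are tiered


def flat_cores(streams: int, cores_per_stream: float) -> float:
    """Cores needed if every stream paid the same per-stream cost."""
    return streams * cores_per_stream


def average_cores_per_stream(fleet_cores: int, streams: int) -> float:
    """Average per-stream cost implied by a given fleet size."""
    return fleet_cores / streams


upper_bound = flat_cores(STREAMS, CORES_PER_STREAM)
implied_avg = average_cores_per_stream(FLEET_CORES, STREAMS)

print(f"flat 2.5 cores for every stream: {upper_bound:,.0f} cores")
print(f"implied average with tiering:    {implied_avg:.2f} cores/stream")
```

The gap between the two numbers is the saving attributed to tiering: the quoted fleet size implies an average well under one core per stream.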
Segmentation: Continuous → Discrete
The segmenter chops continuous encoder output into 2-second .ts files at keyframe boundaries. Each segment is an independently decodable, HTTP-cacheable file. The manifest (.m3u8) is a rolling window of ~10 segments. The CDN caches segments at 200+ edge PoPs, and each PoP fetches a given segment from origin roughly once, so your origin sees ~200 requests per segment, not 1M.
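The rolling-window manifest can be sketched as a small class that keeps the last 10 segments and renders a standard HLS media playlist. This is a minimal illustration, not the pipeline's actual segmenter: the `LivePlaylist` class and its method names are hypothetical, and the `seg_<n>.ts` naming simply follows the example URL in the sequence diagram.

```python
from collections import deque


class LivePlaylist:
    """Rolling window of the most recent segments for one rendition."""

    def __init__(self, target_duration: int = 2, window: int = 10) -> None:
        self.target_duration = target_duration
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.segments: deque = deque(maxlen=window)

    def add_segment(self, sequence: int, duration: float) -> None:
        """Record a finished segment (sequence number, duration in seconds)."""
        self.segments.append((sequence, duration))

    def render(self) -> str:
        """Render the current window as an HLS media playlist."""
        if not self.segments:
            raise ValueError("no segments written yet")
        first_sequence = self.segments[0][0]
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{self.target_duration}",
            # Tells players which segment number the window starts at.
            f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
        ]
        for sequence, duration in self.segments:
            lines.append(f"#EXTINF:{duration:.3f},")
            lines.append(f"seg_{sequence}.ts")
        return "\n".join(lines) + "\n"


playlist = LivePlaylist()
for seq in range(4521, 4536):  # 15 segments written, only the last 10 kept
    playlist.add_segment(seq, 2.0)
manifest = playlist.render()
print(manifest)
```

A live playlist carries no `#EXT-X-ENDLIST` tag, which is how the player knows to keep reloading it for new segments.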
Low-Latency HLS (LL-HLS)
Standard HLS adds ~2s of segmentation delay, because a segment cannot be published until it is complete. LL-HLS emits 200ms partial segments that the player fetches as they are produced. Combined with blocking playlist reloads (the player long-polls the manifest; the server holds the request until a new part exists, then responds immediately), latency drops from ~5s to ~2.5s.
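The blocking reload is essentially a wait on a condition: the request parks until the part it asks for has been published. The sketch below shows that idea with a thread condition variable. It is a simplification under stated assumptions: `PartFeed` and its methods are hypothetical, parts are numbered with a single running index, and a real server would map this onto HTTP requests carrying the `_HLS_msn` and `_HLS_part` query parameters.

```python
import threading
from typing import Optional


class PartFeed:
    """Tracks the newest published part and lets readers block until it advances."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._latest_part = -1  # nothing published yet

    def publish_part(self) -> int:
        """Called by the segmenter each time a ~200ms part is complete."""
        with self._cond:
            self._latest_part += 1
            self._cond.notify_all()  # wake every parked playlist request
            return self._latest_part

    def wait_for_part(self, wanted: int, timeout: Optional[float] = None) -> Optional[int]:
        """Block until part `wanted` exists; return the latest index, or None on timeout."""
        with self._cond:
            ready = self._cond.wait_for(
                lambda: self._latest_part >= wanted, timeout=timeout
            )
            return self._latest_part if ready else None


feed = PartFeed()
# Simulate the segmenter finishing its first part 200ms from now.
timer = threading.Timer(0.2, feed.publish_part)
timer.start()
got = feed.wait_for_part(0, timeout=2.0)  # parks, then returns once part 0 is out
timer.join()
print("unblocked at part", got)
```

The timeout matters in practice: a request for a part that never arrives (for example, because the stream ended) should fail rather than hang forever.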
Latency budget: Encoding 0.5s + Transcoding 0.5s + Part duration 0.2s + CDN propagation 0.3s + Player buffer 1.0s = ~2.5s glass-to-glass with LL-HLS.
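The budget is easy to keep honest as a small table in code, which also makes it obvious where the time goes. This sketch uses only the component figures listed above; the dictionary keys are illustrative names.

```python
# Glass-to-glass latency budget for LL-HLS, in seconds (figures from the text).
LL_HLS_BUDGET_S = {
    "encoding": 0.5,
    "transcoding": 0.5,
    "part_duration": 0.2,
    "cdn_propagation": 0.3,
    "player_buffer": 1.0,
}

total_s = sum(LL_HLS_BUDGET_S.values())
largest = max(LL_HLS_BUDGET_S, key=LL_HLS_BUDGET_S.get)

for stage, seconds in LL_HLS_BUDGET_S.items():
    share = seconds / total_s
    print(f"{stage:<16} {seconds:.1f}s  ({share:.0%})")
print(f"{'total':<16} {total_s:.1f}s")
print(f"largest contributor: {largest}")
```

The player buffer is the largest single contributor at 1.0s of the ~2.5s total, so shrinking it buys the most latency, at the cost of less protection against network jitter.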