YouTube / Netflix

02

Requirements

Functional

Users can upload videos — raw file, any format, up to several GB
Uploaded videos are automatically transcoded into multiple resolutions
Users can stream videos with adaptive quality based on connection speed
Users can search for videos by title, description, and tags
Users see a personalised recommendations feed on the homepage

Non-Functional

99.99% availability — video must always be watchable
First frame in < 2 seconds — drop-off spikes after 2s buffer
Upload async OK — users tolerate a few minutes for processing
99.999999% durability — losing someone's video is catastrophic
Eventual consistency OK — new videos can take minutes to propagate globally

YouTube vs Netflix — Key Difference

YouTube's hard problem is the write side — processing 500 hours of video per minute from arbitrary sources. Netflix's hard problem is the read side — delivering studio-quality content flawlessly to 200M subscribers. Same product surface, fundamentally different architecture pressure.

YouTube

User-generated content — anyone uploads
500 hours of video per minute
Reactive transcoding pipeline
Fixed bitrate ladder shared by every video
Hard problem: write path scale

Netflix

Studio-licensed catalog — controlled intake
Hundreds of titles per year
Proactive transcoding — weeks before release
Per-title encoding — custom bitrate per show
Hard problem: read path quality & reliability

03

Scale Estimation

Metric	Calculation	Result
Daily Active Users	YouTube benchmark	500M DAU
Video views/day	~2 views per DAU	~1B views/day
View RPS	1B ÷ 86,400	~11,600 RPS
Upload volume	Industry stat	500h video/min
Raw ingestion bandwidth	500 h/min × 60 min/h × ~1 GB per 10 min of video ÷ 60 s/min	~50 GB/s
Storage per year (raw)	50 GB/s × 86,400 s/day × 365 days	~1.6 EB/year
Storage multiplier (transcoding)	6 renditions × ~0.7 avg ratio	~4.2× raw size (~5× with the original kept)
Read : Write ratio	1B views ÷ ~5M uploads	~200:1
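
The arithmetic in the table can be checked with a few lines of Python. This is only a back-of-envelope sketch: every input is an assumption taken from the table above, and the variable names are illustrative.

```python
# Back-of-envelope estimate; every input is an assumption from the table above.
SECONDS_PER_DAY = 86_400

dau = 500_000_000
views_per_dau = 2
views_per_day = dau * views_per_dau
view_rps = views_per_day / SECONDS_PER_DAY

upload_hours_per_minute = 500
video_minutes_per_gb = 10                     # assumed: ~1 GB per 10 min of raw video
video_minutes_per_minute = upload_hours_per_minute * 60
ingest_gb_per_second = video_minutes_per_minute / video_minutes_per_gb / 60

# 1 EB = 1e9 GB (decimal units)
raw_eb_per_year = ingest_gb_per_second * SECONDS_PER_DAY * 365 / 1e9

uploads_per_day = 5_000_000
read_write_ratio = views_per_day / uploads_per_day

print(f"view RPS:        {view_rps:,.0f}")
print(f"ingest:          {ingest_gb_per_second:.0f} GB/s")
print(f"raw storage:     {raw_eb_per_year:.2f} EB/year")
print(f"read:write:      {read_write_ratio:.0f}:1")
```

Changing any single input (for example the assumed raw size per minute) shows how sensitive the storage figure is, while the read-to-write ratio stays in the hundreds.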

The Number That Drives Everything

The 200:1 read-to-write ratio is the most important number. This is almost entirely a read and delivery problem once a video is processed. Every architectural decision — CDN-first, aggressive caching, pre-warming, adaptive bitrate — flows from this single fact. Writes can be slow and async. Reads must be instant.

04

API Design

POST /v1/videos/initiate Start a resumable upload session

// Request
{ "filename": "vacation.mov", "size_bytes": 2147483648, "content_type": "video/quicktime" }
// Response 201
{ "upload_id": "upl_abc123", "chunk_size_bytes": 5242880, "total_chunks": 410 }

PUT /v1/videos/:upload_id/chunks/:n Upload a single 5MB chunk

// Body: raw binary chunk data (5MB)
// Response 200 (idempotent, safe to retry)
{ "chunk": 47, "received": true, "chunks_remaining": 363 }

POST /v1/videos/:upload_id/complete Signal upload complete, begin transcoding

// Response 202 Accepted (async processing begins)
{ "video_id": "vid_xyz789", "status": "processing", "eta_seconds": 300 }

GET /v1/videos/:video_id Fetch video metadata and manifest URL

// Response 200
{
  "id": "vid_xyz789",
  "title": "My Vacation",
  "duration_s": 342,
  "manifest_url": "https://cdn.example.com/vid_xyz789/manifest.m3u8",
  "thumbnail_url": "https://cdn.example.com/vid_xyz789/thumb.jpg",
  "status": "ready"
}

GET /v1/search?q=:query&page_token=:token Two-stage retrieval + ranking

// Response 200
{ "results": [ { "id": "vid_abc", "title": "...", "score": 0.92 } ], "next_page_token": "pg_2" }

GET /v1/feed?user_id=:id&page_token=:token Personalised recommendations homepage

// Response 200
{ "videos": [ ... ], "next_page_token": "pg_2" }
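
The upload endpoints above imply some simple chunk arithmetic on the client. The sketch below shows one way a client might plan its chunks. The function name `plan_chunks` and the 1-based chunk numbering are assumptions, not part of the API as documented.

```python
def plan_chunks(size_bytes: int, chunk_size_bytes: int = 5 * 1024 * 1024):
    """Return (chunk_number, offset, length) tuples covering the whole file.

    Chunk numbers are 1-based here, which is an assumption.
    """
    if size_bytes <= 0 or chunk_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: the last chunk may be shorter than the rest.
    total = (size_bytes + chunk_size_bytes - 1) // chunk_size_bytes
    plan = []
    for n in range(1, total + 1):
        offset = (n - 1) * chunk_size_bytes
        length = min(chunk_size_bytes, size_bytes - offset)
        plan.append((n, offset, length))
    return plan


plan = plan_chunks(2_147_483_648)   # the 2 GiB example file
print(len(plan), plan[-1])
```

For the 2 GiB file and 5 MiB chunks this yields 410 chunks, the last one being a partial chunk of 3,145,728 bytes.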

05

High-Level Architecture

Full System — Write + Read + Search Paths (diagram not reproduced)

Kafka Queue

The architectural pivot between upload and transcoding. The upload service publishes an event and returns 202 Accepted immediately. Transcoding workers consume independently — decoupled, scalable, resilient to worker crashes.

Transcoding Workers

A fleet of CPU-heavy machines running FFmpeg. Parallelised in two dimensions: one worker per rendition, and the video is split into 10-second segments so each segment transcodes concurrently — a DAG of jobs.

CDN Edge

The most critical component on the read path. Netflix's Open Connect Appliances sit inside ISP data centres. YouTube uses Google's private PoP network. Over 95% of traffic never reaches origin servers.

Metadata DB + Redis

MySQL stores video metadata — title, owner, manifest URL, status. Redis caches hot video metadata and approximate view counts. View counts use Redis INCR and batch-flush to DB — exact real-time counts are unnecessary.
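
The approximate view counter can be sketched in a few lines. In this minimal illustration plain dictionaries stand in for Redis and the metadata DB, and the class name `ViewCounter` is hypothetical. A real deployment would call Redis `INCR` in `record_view` and run `flush` on a timer.

```python
class ViewCounter:
    """Buffer view increments and flush them to the database in batches."""

    def __init__(self):
        self.pending = {}   # stands in for Redis counters (INCR per view)
        self.db = {}        # stands in for the view_count column in the DB

    def record_view(self, video_id: str) -> None:
        # Cheap in-memory increment on the hot path.
        self.pending[video_id] = self.pending.get(video_id, 0) + 1

    def flush(self) -> int:
        """Move buffered counts into the DB; returns number of rows touched."""
        batch, self.pending = self.pending, {}
        for video_id, count in batch.items():
            self.db[video_id] = self.db.get(video_id, 0) + count
        return len(batch)

    def approximate_count(self, video_id: str) -> int:
        return self.db.get(video_id, 0) + self.pending.get(video_id, 0)


counter = ViewCounter()
for _ in range(3):
    counter.record_view("vid_a")
counter.record_view("vid_b")
before_flush = dict(counter.db)
rows_written = counter.flush()
```

Four views turn into two database writes. The longer the flush interval, the fewer writes, at the price of a larger window of counts that could be lost if the buffer disappears.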

06

Deep Dive — The Video Encoding Pipeline

Why This Is the Hard Part

Most candidates describe the CDN and stop. But the encoding pipeline is what makes YouTube possible. Every video you watch has been automatically processed into 6+ renditions by a distributed job system. Understanding how that works — and why it's designed the way it is — separates good answers from great ones.

When a raw video arrives, three problems need solving simultaneously: resilience (uploads fail), decoupling (encoding is slow), and speed (encoding is embarrassingly parallel). Each problem has a clean solution.

One upload → six renditions

Resolution	Codec	Bitrate
4K	VP9 / AV1	~20 Mbps
1080p	H.264	~8 Mbps
720p	H.264	~4 Mbps
480p	H.264	~1.5 Mbps
360p	H.264	~0.7 Mbps
144p	H.264	~0.1 Mbps
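
To make the ladder concrete, here is a small sketch that turns the six renditions into an HLS master playlist, the file a `manifest.m3u8` URL would typically point to. The pixel dimensions and the `<rendition>/index.m3u8` path layout are assumptions for illustration. Only the bitrates come from the ladder above.

```python
# (name, width, height, bits per second); dimensions are assumed, not from the source.
LADDER = [
    ("4K", 3840, 2160, 20_000_000),
    ("1080p", 1920, 1080, 8_000_000),
    ("720p", 1280, 720, 4_000_000),
    ("480p", 854, 480, 1_500_000),
    ("360p", 640, 360, 700_000),
    ("144p", 256, 144, 100_000),
]


def master_playlist(ladder) -> str:
    """Build an HLS master playlist with one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for name, width, height, bps in ladder:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bps},RESOLUTION={width}x{height}")
        lines.append(f"{name}/index.m3u8")   # hypothetical per-rendition playlist path
    return "\n".join(lines) + "\n"


playlist = master_playlist(LADDER)
print(playlist)
```

The player reads this file once, then switches between the listed variant playlists as its bandwidth estimate changes.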

Sequence — Upload to Video Live

sequenceDiagram
    participant C as Client
    participant US as Upload Service
    participant S3r as S3 Raw
    participant Q as Kafka Queue
    participant TW as Transcoding Workers
    participant S3v as S3 Renditions
    participant CDN as CDN Edge
    participant DB as Metadata DB
    C->>US: POST /initiate → upload_id
    loop Per 5MB chunk
        C->>US: PUT /chunks/:n (idempotent)
    end
    C->>US: POST /complete
    US->>S3r: Store raw file
    US->>Q: Publish { video_id, raw_path }
    US-->>C: 202 Accepted — processing
    par Parallel transcoding (per rendition)
        Q->>TW: Consume event
        TW->>S3r: Fetch raw segments
        TW->>TW: FFmpeg transcode (DAG jobs)
        TW->>S3v: Store renditions
    end
    TW->>CDN: Pre-warm edge caches
    TW->>DB: Mark video status = ready

Problem 1 — Resilient uploads. A 2GB file can't be a single HTTP request. Networks drop. Phones sleep. The solution is chunked uploading — split the file into 5MB pieces, upload each independently with its own retry. The server tracks received chunks in Redis. A failed chunk retries without restarting the entire upload. The upload is idempotent per chunk — safe to retry any number of times.
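
A minimal sketch of the server-side chunk tracking described above. A Python set stands in for the Redis structure, and the class name `UploadSession` and 1-based chunk numbers are assumptions. Adding to a set is naturally idempotent, which is what makes retries safe.

```python
class UploadSession:
    """Track which chunks of one upload have arrived."""

    def __init__(self, total_chunks: int):
        self.total_chunks = total_chunks
        self.received = set()   # in production this state would live in Redis

    def receive(self, n: int) -> dict:
        if not 1 <= n <= self.total_chunks:
            raise ValueError(f"chunk {n} out of range")
        self.received.add(n)    # re-adding the same chunk changes nothing
        return {
            "chunk": n,
            "received": True,
            "chunks_remaining": self.total_chunks - len(self.received),
        }

    def missing(self) -> list:
        """Chunks the client still has to send, for resuming after a failure."""
        return [n for n in range(1, self.total_chunks + 1) if n not in self.received]


session = UploadSession(total_chunks=5)
session.receive(1)
session.receive(3)
retry_response = session.receive(3)   # duplicate delivery of chunk 3
still_missing = session.missing()
```

The retried chunk leaves the count unchanged, and `missing()` gives the client the exact list to resume from.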

Problem 2 — Decoupled transcoding. Encoding takes minutes. You can't block the upload service waiting. The upload service does two things: stores the raw file in S3, then publishes an event to Kafka. That's it: it returns 202 Accepted and walks away. Transcoding workers consume the queue independently, at their own pace, on their own machines. If a worker crashes before committing its offset, the message is still in Kafka and is redelivered to another worker. The two services are completely decoupled.

Problem 3 — Parallel DAG jobs. A naïve transcoder processes renditions sequentially. But the 1080p job has no dependency on the 4K job. And the first minute of a video has no dependency on the last. So the pipeline splits across two dimensions: one worker per rendition (fan-out by format), and the video is first split into 10-second segments (fan-out by time). With enough workers, a 2-hour video transcodes in minutes, not hours. This DAG approach is what lets YouTube make a video watchable within ~5 minutes of upload.
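
The two-dimensional fan-out can be illustrated by enumerating the jobs. This sketch ignores the split and stitch stages of the DAG and assumes every job takes the same time, so the wall-clock figure is an idealised lower bound. The 30-second job time and the worker counts are made-up inputs.

```python
import math
from itertools import product

RENDITIONS = ["4K", "1080p", "720p", "480p", "360p", "144p"]


def build_jobs(duration_s: int, segment_s: int = 10, renditions=RENDITIONS):
    """One job per (rendition, segment index) pair."""
    n_segments = math.ceil(duration_s / segment_s)
    return list(product(renditions, range(n_segments)))


def ideal_wall_clock_s(n_jobs: int, workers: int, seconds_per_job: int) -> int:
    """Jobs run in waves of `workers`; assumes uniform job time, no overhead."""
    return math.ceil(n_jobs / workers) * seconds_per_job


jobs = build_jobs(duration_s=2 * 60 * 60)             # a 2-hour video
sequential = ideal_wall_clock_s(len(jobs), 1, 30)     # one worker
parallel = ideal_wall_clock_s(len(jobs), 1000, 30)    # a thousand workers
print(len(jobs), sequential, parallel)
```

Under these assumptions, 4,320 independent jobs take 36 hours on one worker and a few minutes on a thousand, which is the whole argument for splitting by segment as well as by rendition.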

Netflix's edge: Netflix has weeks before a title goes live. They run a complexity analysis pass first — measuring how visually complex each scene is. A dark action sequence needs more bits. A static talking head needs fewer. They then set a per-title bitrate ladder — a custom encoding profile for each show. Same perceived quality at lower bandwidth. This is only possible because they control the intake timeline.

Request Flow — Step Through

Client · Chunked upload
→ Upload Service · 202 Accepted
→ Kafka Queue · Event published
→ Segment Split · 10s chunks
→ Workers ×N · Parallel FFmpeg
→ S3 Renditions · 6 formats stored
→ CDN Pre-warm · Edges populated
→ Video Live · Status = ready


07

Key Design Decisions & Tradeoffs

Option A — Chosen

Async transcoding via queue

Upload service publishes to Kafka and returns immediately. Workers consume independently. Resilient to worker failures. Scales to any upload volume by adding workers. Enables parallel segment-level processing.

✓ Required at YouTube's scale

Option B

Synchronous transcoding

Upload service waits for encoding to finish before returning. Simple to reason about. But a single slow 4K upload blocks a thread for 10+ minutes. One traffic spike saturates the upload service. Does not scale past hundreds of uploads/hour.

✗ Fails under any real load

Option A — Chosen (Netflix)

Proactive CDN cache warming

Push popular content to edge caches before requests arrive. No cold start. First viewer gets the same fast experience as the millionth. Netflix pre-warms Open Connect Appliances during off-peak hours with predicted popular content.

✓ Eliminates cold-start latency spikes

Option B

Pull-based CDN caching

Standard CDN behaviour — cache on first request, serve subsequent requests from cache. Simple. No infrastructure overhead. But the first user in a region after a cache miss hits origin — potentially slow and expensive for a viral video.

~ Fine for small-scale deployments

Option A — Chosen

Chunked resumable upload

Split uploads into 5MB chunks. Each chunk is idempotent, so it is safe to retry. Server tracks progress in Redis. A 2GB upload that fails at 1.8GB resumes from roughly chunk 368 (1.8 GiB ÷ 5 MiB), not from zero. Works on any network quality.

✓ Standard for any large file upload

Option B

Single-stream upload

One HTTP connection for the entire file. Simplest to implement. But any network interruption means restarting from zero. A user on mobile uploading a 2GB video has a near-zero chance of success on a single stream.

✗ Unusable on mobile / poor connections

Option A — Chosen (at scale)

Erasure coding for cold storage

Split file into data + parity chunks, distributed across disks. Can reconstruct from partial data. ~1.5× storage overhead vs 3× for replication. At exabyte scale, the difference between 1.5 EB and 3 EB of raw capacity is a major cost line.

✓ Netflix + YouTube both use this

Option B

3× replication

Store three identical copies on separate disks/nodes. Simple, fast reads, well-understood failure model. But 3× storage cost is prohibitive at exabyte scale. Fine for small deployments or hot-tier data.

~ Start here, migrate later
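
The overhead figures in the erasure-coding versus replication comparison follow from a one-line formula, and the "reconstruct from partial data" property can be shown with the simplest possible code: a single XOR parity shard. This is a teaching sketch only. Production systems typically use Reed-Solomon codes with several parity shards, and the 6 + 3 layout below is an assumed example, not a documented configuration.

```python
def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per logical byte."""
    return (data_shards + parity_shards) / data_shards


def xor_parity(shards) -> bytes:
    """XOR equal-length shards together, byte by byte."""
    parity = bytearray(len(shards[0]))
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return bytes(parity)


overhead_6_3 = erasure_overhead(6, 3)    # assumed 6 data + 3 parity layout
overhead_replication = 3.0               # three full copies

data = [b"seg0", b"seg1", b"seg2"]
parity = xor_parity(data)
# Lose data[1]; XOR of the survivors and the parity shard rebuilds it.
recovered = xor_parity([data[0], data[2], parity])
```

A single XOR parity shard survives only one lost shard, which is why real deployments pay for more parity shards while still staying well under the cost of three full copies.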

08

What Can Go Wrong

🔥

Transcoding Worker Crash Mid-Job

A worker dies halfway through encoding a 4K rendition. Without idempotency, you'd have partial files in S3 and a corrupted video. The Kafka message was consumed but its offset was never committed, so it is redelivered to another worker and re-processed from scratch.

→ Fix: Idempotent jobs — each 10s segment is a separate job with unique ID. Failed segments retry independently. Kafka's at-least-once delivery ensures no message is lost.
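
One way to make segment jobs idempotent is to derive the output key deterministically from the job, so a redelivered job overwrites the same object instead of creating a second one. The sketch below uses a dict as a stand-in for object storage. `segment_key`, `fake_encode` and `handle_job` are hypothetical names, and the key layout is an assumption.

```python
def segment_key(video_id: str, rendition: str, segment_index: int) -> str:
    """Deterministic output key: same job always maps to the same object."""
    return f"{video_id}/{rendition}/seg_{segment_index:05d}.ts"


def fake_encode(key: str) -> bytes:
    # Placeholder for the real FFmpeg invocation.
    return f"encoded:{key}".encode()


def handle_job(job, store: dict, done: set) -> bool:
    """Process one job; returns False if it was already completed."""
    key = segment_key(*job)
    if key in done:
        return False                # duplicate delivery, nothing to do
    store[key] = fake_encode(key)   # overwrites any partial output
    done.add(key)
    return True


store, done = {}, set()
job = ("vid_xyz789", "1080p", 42)
first = handle_job(job, store, done)
duplicate = handle_job(job, store, done)

# Simulate a crash: partial output written, job never marked done.
crashed = ("vid_xyz789", "1080p", 43)
store[segment_key(*crashed)] = b"partial"
retried = handle_job(crashed, store, done)
```

After the retry the partial object has been replaced by a complete one, and the duplicate delivery of the first job was a no-op.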

🌊

Viral Video CDN Cache Miss Storm

A video goes viral. Millions of viewers hit play simultaneously in a region where the CDN hasn't cached the content yet. All requests fall back to origin — a thundering herd that overwhelms the origin server.

→ Fix: Pre-warm CDN edges immediately after transcoding completes. For uploads gaining rapid traction, a watch-count threshold triggers additional pre-warming. CDN request coalescing ensures only one origin fetch per cache miss, not one per user.
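
Request coalescing can be sketched with `asyncio`: concurrent requests for the same missing key share one in-flight origin fetch. The class name `CoalescingCache` and the fake origin function are illustrative assumptions. Real CDNs implement this inside the cache server.

```python
import asyncio


class CoalescingCache:
    """Cache where concurrent misses for one key trigger a single origin fetch."""

    def __init__(self, fetch_origin):
        self._fetch_origin = fetch_origin
        self._cache = {}
        self._inflight = {}   # key -> asyncio.Task for the fetch in progress

    async def get(self, key: str) -> bytes:
        if key in self._cache:
            return self._cache[key]
        task = self._inflight.get(key)
        if task is None:
            # First miss for this key: start the one and only origin fetch.
            task = asyncio.create_task(self._fill(key))
            self._inflight[key] = task
        return await task      # later callers wait on the same task

    async def _fill(self, key: str) -> bytes:
        try:
            data = await self._fetch_origin(key)
            self._cache[key] = data
            return data
        finally:
            self._inflight.pop(key, None)


async def demo():
    calls = 0

    async def fetch_origin(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)          # simulated origin latency
        return f"segment:{key}".encode()

    cache = CoalescingCache(fetch_origin)
    results = await asyncio.gather(*(cache.get("vid_xyz789/seg0") for _ in range(1000)))
    return calls, results


origin_calls, results = asyncio.run(demo())
```

A thousand simultaneous viewers produce one origin request. Without the in-flight map, each of them would have gone to origin.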

📴

Upload Service Crash Mid-Chunk

User is uploading chunk 47 of 200. The upload service instance dies. Without stateful tracking, the client has no idea which chunks were received and must restart the entire upload.

→ Fix: Chunk state stored in Redis (not in-process memory). Client retries against any healthy instance — the state is in Redis, not the server. Client can query which chunks are missing and resume exactly.

⚡

Metadata DB Overload on Viral Content

A video goes viral — 10 million simultaneous viewers, each triggering a metadata read. The DB gets crushed. Latency spikes. Errors cascade.

→ Fix: Redis cache in front of metadata DB for hot videos. View counts use approximate counters (Redis INCR, batch-flush every 30s) — exact real-time counts aren't needed. Hot videos get a "pinned" flag in Redis that bypasses DB entirely.
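
The cache-in-front-of-the-DB part of the fix is the classic cache-aside pattern. Below is a minimal in-process sketch with a TTL. In the design above the cache would be Redis, and the class name `MetadataCache`, the 60-second TTL and the injected clock are assumptions made so the example is testable.

```python
import time


class MetadataCache:
    """Cache-aside: read from cache, fall back to the DB loader on miss or expiry."""

    def __init__(self, load_from_db, ttl_s: float = 60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_s
        self._clock = clock
        self._entries = {}   # video_id -> (expires_at, value)

    def get(self, video_id: str):
        now = self._clock()
        entry = self._entries.get(video_id)
        if entry is not None and entry[0] > now:
            return entry[1]                       # cache hit
        value = self._load(video_id)              # miss: one DB read
        self._entries[video_id] = (now + self._ttl, value)
        return value


db_reads = []


def load_from_db(video_id):
    db_reads.append(video_id)
    return {"id": video_id, "status": "ready"}


fake_now = [0.0]
cache = MetadataCache(load_from_db, ttl_s=60.0, clock=lambda: fake_now[0])
for _ in range(10_000):
    cache.get("vid_xyz789")
reads_while_hot = len(db_reads)

fake_now[0] = 61.0            # move past the TTL
cache.get("vid_xyz789")
reads_after_expiry = len(db_reads)
```

Ten thousand reads of a hot video cost one database query per TTL window, which is what keeps a viral video from reaching the DB at all.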

🕐

Kafka Consumer Lag

A sudden spike in uploads (live event, viral moment) floods the Kafka topic. Transcoding workers can't keep up. Queue depth grows. Videos take hours instead of minutes to become watchable.

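→ Fix: Auto-scale the transcoding worker fleet based on queue depth (consumer lag). Kafka's consumer group model means adding workers is just deploying more instances; the group rebalances partitions automatically. Parallelism is capped by the topic's partition count, so provision enough partitions up front. Prioritise short videos (faster to process) over long ones during load spikes.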
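
A queue-depth autoscaling rule can be as simple as "enough workers to drain the backlog within a target time". The sketch below is one such rule under stated assumptions: the throughput per worker, the drain target and the worker limits are all made-up parameters, and `prioritise` orders jobs by duration only.

```python
import math


def desired_workers(queue_depth: int, jobs_per_worker_per_min: float,
                    target_drain_min: float, min_workers: int = 1,
                    max_workers: int = 10_000) -> int:
    """Workers needed to drain the current backlog within the target time."""
    needed = math.ceil(queue_depth / (jobs_per_worker_per_min * target_drain_min))
    return max(min_workers, min(max_workers, needed))


def prioritise(jobs):
    """Shortest videos first; each job is (video_id, duration_s)."""
    return sorted(jobs, key=lambda job: job[1])


normal = desired_workers(12_000, jobs_per_worker_per_min=2, target_drain_min=10)
idle = desired_workers(0, jobs_per_worker_per_min=2, target_drain_min=10)
capped = desired_workers(10**9, 2, 10, max_workers=5_000)
ordered = prioritise([("a", 7200), ("b", 45), ("c", 600)])
```

The `max_workers` clamp is where a real system would also account for hard limits such as budget or available consumer parallelism.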

⚠

Anti-patterns

🚫

Stream directly from S3

Origin bandwidth bills + latency kill the playback.

✓ Better: Multi-CDN edge caching; HLS segments served from edges; origin is last resort.

🚫

Transcode on playback request

Encoding compute per request is prohibitive; viewers see a spinner for minutes.

✓ Better: Pre-transcode all renditions on upload; manifest points at pre-built segments.

🚫

Stream a single 1080p file to all devices

Mobile on 3G can't handle 1080p; desktop wants 4K.

✓ Better: Adaptive bitrate (HLS/DASH); player chooses rendition per-segment based on bandwidth.
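
The per-segment rendition choice can be sketched as a throughput rule: measure how fast the last segment downloaded, then pick the highest bitrate that fits under a safety margin. This is a simplified illustration. The 0.8 safety factor is an assumption, and real players usually also consider buffer level. The bitrates are the ones from the rendition ladder earlier in this document.

```python
LADDER_BPS = {
    "4K": 20_000_000,
    "1080p": 8_000_000,
    "720p": 4_000_000,
    "480p": 1_500_000,
    "360p": 700_000,
    "144p": 100_000,
}


def measured_throughput_bps(bytes_downloaded: int, seconds: float) -> float:
    """Throughput observed while downloading the previous segment."""
    return bytes_downloaded * 8 / seconds


def pick_rendition(throughput_bps: float, safety: float = 0.8) -> str:
    """Highest rendition whose bitrate fits within safety * throughput."""
    budget = throughput_bps * safety
    fitting = [(bps, name) for name, bps in LADDER_BPS.items() if bps <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest rendition rather than stalling.
        return min(LADDER_BPS, key=LADDER_BPS.get)
    return max(fitting)[1]


on_wifi = pick_rendition(30_000_000)
on_lte = pick_rendition(measured_throughput_bps(1_000_000, 1.0))   # 8 Mbps measured
on_weak_signal = pick_rendition(50_000)
```

Because the decision is repeated for every segment, quality follows the connection up and down during playback instead of being fixed at start.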

09

Interview Tips

01

Draw the write path and read path separately — from the start. This is the single most clarifying thing you can do. Say: "I'll design the upload pipeline first, then the watch path — they're completely separate systems." Candidates who conflate them end up drawing confusing spaghetti diagrams.

02

The queue between upload and transcoding is the pivotal decision. State it proactively: "We can't synchronously encode video — it takes minutes. So the upload service publishes to Kafka and returns 202 immediately. This decouples ingestion from processing." If you don't say this, the interviewer will probe until you get there. Get there first.

03

Name the CDN strategy specifically. Don't say "we'll use a CDN." Say: "Netflix uses Open Connect Appliances — physical servers inside ISPs, pre-warmed with popular content. YouTube uses Google's own PoP network. Both serve 95%+ of traffic without ever hitting origin." Specificity signals depth.

04

Know adaptive bitrate streaming by name. The player doesn't pick one quality and stick to it. It requests 2–10 second chunks independently, measures download speed between each chunk, and picks the best rendition for the next one. Know HLS and DASH by name — they're the two dominant standards.

05

Common follow-up: "How do you handle a video going viral 5 minutes after upload?" Answer: immediately after transcoding completes, push the manifest and first few chunks to CDN edges proactively — before the traffic wave hits. This is the pre-warming pattern. Without it, the first million viewers all get a slow cache-miss experience simultaneously.

06

YouTube vs Netflix is a strong signal of depth. Most candidates describe one system. Interviewers love when you distinguish them: "YouTube's hard problem is the write side — reactive transcoding at scale. Netflix's hard problem is the read side — per-title encoding and Open Connect delivery." This shows architectural thinking, not just pattern matching.

10

How the Design Evolves

Phase 1 — 0 to 10K users

Synchronous encoding, single server

Upload to S3, trigger a single FFmpeg process, transcode to 2–3 renditions. No queue, no workers. Ship fast — get the product in front of users. Most startups that build video features live here forever. Synchronous is fine when uploads are rare.

Phase 2 — 10K to 1M users

Add queue + worker pool + CDN

The moment uploads start failing or taking too long, introduce SQS or Kafka and a pool of transcoding workers. Add a third-party CDN (Cloudflare, Akamai) for video delivery. Add Redis for metadata caching. Most video startups live here. The architecture that handles 1M users is usually sufficient for 10M.

Phase 3 — 1M to 100M users

Segment-level parallelism + multi-region

Introduce DAG-based transcoding — split videos into segments, transcode in parallel. Multi-region deployment. CDN cache warming based on engagement signals. Separate microservices for upload, transcoding, delivery, and search. A dedicated recommendations engine. Auto-scaling worker fleet based on queue depth.

Phase 4 — 100M+ users

Per-title encoding + embedded CDN hardware

Netflix-level: Open Connect Appliances inside ISPs. Per-title encoding with complexity analysis. Erasure coding for cold storage at exabyte scale. ML-driven predictive cache population. Custom ABR algorithms tuned per device class and network type. Most engineers never operate here — but knowing it demonstrates architectural depth in interviews.

