User uploads a 12 MP photo from their phone. Your app needs to display it as: thumbnail (200×200), card image (600×400), full view (1920×1080), maybe AVIF + WebP + JPEG variants for different browsers (the phone original is often HEIC, which most browsers can't display). Maybe with a face-aware crop. Maybe in 4 different aspect ratios for different feed templates. One upload becomes 8-20 derived assets.
Doing this synchronously on upload kills latency. Doing it on every read kills CPU. The right answer is a processing pipeline: upload triggers an async job, derivatives generated once, served from CDN. Same shape applies to video (transcoding + multi-bitrate ladder) at much higher cost.
02
The image pipeline
Direct-to-S3 upload. Client gets a pre-signed URL; uploads original directly to S3. App server never sees the bytes — saves bandwidth + load.
S3 event triggers a Lambda / SQS message ("new image at path X").
Worker processes: read original, generate variants (resize, compress, format-convert), write derivatives back to S3 under predictable paths.
On-demand resize (modern variant): instead of pre-generating all sizes, generate each on first request via an image proxy (Imgix, Cloudflare Images, Next.js Image). Cache the result at the edge. Better for long-tail sizes.
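A minimal sketch of the worker's planning step, in Python. It assumes the raw S3 notification JSON reaches the worker as-is (via Lambda or an SQS message body); the variant set, the `derived/<id>/<variant>.<fmt>` key layout, and the helper names are all hypothetical. The actual decode/resize/encode step is left out: this only turns "new image at path X" into a deterministic list of output keys and target sizes.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical variant set, sized after the examples in the intro.
VARIANTS = {
    "thumb": (200, 200),
    "card": (600, 400),
    "full": (1920, 1080),
}
FORMATS = ("avif", "webp", "jpg")


def keys_from_s3_event(event_json):
    """Return (bucket, key) pairs for newly created objects in an S3 notification."""
    event = json.loads(event_json)
    found = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue  # ignore deletes and other event types
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        found.append((bucket, key))
    return found


def fit_within(src_w, src_h, box_w, box_h):
    """Scale to fit inside the box, keeping aspect ratio and never upscaling."""
    scale = min(box_w / src_w, box_h / src_h, 1.0)
    return max(1, round(src_w * scale)), max(1, round(src_h * scale))


def derivative_plan(original_key, src_w, src_h):
    """Map each derivative's predictable output key to its target (width, height).

    Hypothetical layout: uploads/<id>.<ext> -> derived/<id>/<variant>.<fmt>
    """
    image_id = original_key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    plan = {}
    for name, (box_w, box_h) in VARIANTS.items():
        size = fit_within(src_w, src_h, box_w, box_h)
        for fmt in FORMATS:
            plan[f"derived/{image_id}/{name}.{fmt}"] = size
    return plan
```

For a 4032×3024 original (a typical 12 MP phone photo), `derivative_plan("uploads/abc123.heic", 4032, 3024)` yields 9 keys, with `card` fitted to 533×400 and `thumb` to 200×150. A real square thumbnail would usually center-crop (or face-crop) rather than fit; fitting keeps the sketch short. Because the keys are derived deterministically, re-running the job after a failure simply overwrites the same objects.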
03
Pre-generate vs on-demand
Pre-generate
All variants on upload
Predictable storage, predictable serving cost, instant CDN serve. Wasted compute + storage for variants that are never requested. Common for small fixed variant sets (3-5 sizes).
On-demand resize
Generate on first request
Serve any size or format from a single source. CDN caches each result. Pay only for what's actually used, at the cost of extra latency on the first request for each new variant. Used by Imgix, Cloudflare Images, Next.js Image. Modern default.
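A minimal sketch of the on-demand path, in Python: the first request for a (path, width, format) combination triggers generation, and every later request is a cache hit. The `w`/`fm` query parameters borrow Imgix's names for illustration; `resize` is a hypothetical stand-in for the real image work, and a dict stands in for the edge cache. Snapping widths to an allowed set is an added assumption, a common way to keep the number of cached variants bounded.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: requested widths snap up to a small allowed set so arbitrary ?w=
# values cannot multiply the number of cached variants without bound.
ALLOWED_WIDTHS = (200, 400, 600, 800, 1200, 1920)
ALLOWED_FORMATS = {"avif", "webp", "jpg"}

edge_cache = {}   # stands in for the CDN's edge cache
generated = []    # records cache misses, i.e. real resize work


def normalize(url):
    """Turn a request URL into a canonical cache key: (path, width, format)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    width = int(query.get("w", ["1920"])[0])
    fmt = query.get("fm", ["jpg"])[0]
    if fmt not in ALLOWED_FORMATS:
        fmt = "jpg"
    # Smallest allowed width that is at least the requested one; else the largest.
    snapped = next((w for w in ALLOWED_WIDTHS if w >= width), ALLOWED_WIDTHS[-1])
    return parts.path, snapped, fmt


def resize(path, width, fmt):
    """Hypothetical stand-in for fetching the original and resizing/encoding it."""
    generated.append((path, width, fmt))
    return f"{path} as {fmt} at {width}px".encode()


def serve(url):
    key = normalize(url)
    if key not in edge_cache:        # first request for this variant: generate it
        edge_cache[key] = resize(*key)
    return edge_cache[key]           # every later request: served from cache
```

Requests for `?w=300&fm=webp` and `?w=350&fm=webp` on the same path both map to the 400 px WebP, so only one resize happens. The trade-off is images slightly larger than requested in exchange for a bounded set of cached variants.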
04
Video — much harder
Video needs transcoding — decode the source, re-encode to multiple bitrate/resolution combinations for adaptive streaming (HLS / DASH). Each variant is a "rendition." A typical ladder:
Resolution | Bitrate | Use
240p | ~400 kbps | Slow mobile
480p | ~1 Mbps | 3G mobile
720p | ~3 Mbps | WiFi mobile
1080p | ~5 Mbps | Desktop / smart TV
4K | ~16 Mbps | Premium
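The ladder above can be turned into encoder jobs. This Python sketch only builds ffmpeg argument lists (H.264 video, AAC audio, 6-second HLS segments) and runs nothing; the output layout, audio bitrate, and `-maxrate`/`-bufsize` choices are assumptions, not recommendations.

```python
LADDER = [
    # (rendition name, output height, target video bitrate in kbps)
    ("240p", 240, 400),
    ("480p", 480, 1000),
    ("720p", 720, 3000),
    ("1080p", 1080, 5000),
    ("2160p", 2160, 16000),  # 4K UHD
]
SEGMENT_SECONDS = 6


def hls_commands(src, src_height, out_dir="out"):
    """Build one ffmpeg argument list per rendition, skipping rungs above the source."""
    commands = []
    for name, height, kbps in LADDER:
        if height > src_height:
            continue  # upscaling spends bits without adding detail
        commands.append([
            "ffmpeg", "-i", src,
            "-vf", f"scale=-2:{height}",  # keep aspect ratio; -2 keeps width even
            "-c:v", "libx264", "-b:v", f"{kbps}k",
            "-maxrate", f"{kbps}k", "-bufsize", f"{2 * kbps}k",
            # Keyframe every SEGMENT_SECONDS so segment boundaries line up across
            # renditions and the player can switch cleanly between them.
            "-force_key_frames", f"expr:gte(t,n_forced*{SEGMENT_SECONDS})",
            "-c:a", "aac", "-b:a", "128k",
            "-f", "hls", "-hls_time", str(SEGMENT_SECONDS),
            "-hls_playlist_type", "vod",
            "-hls_segment_filename", f"{out_dir}/{name}/seg-%03d.ts",
            f"{out_dir}/{name}/index.m3u8",
        ])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)` after creating the output directories. One process per rendition decodes the source once per rung, which is wasteful but makes every rung an independent job that can run on a different worker; a single ffmpeg invocation with several outputs decodes only once.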
Each rendition is encoded with a codec: H.264 (universal support), H.265/HEVC (smaller files, weaker browser support), AV1 (smallest files, slow to encode), VP9 (used heavily by YouTube). Large providers often encode each rendition in several codecs and serve whichever one the client supports, with H.264 as the fallback.
Cost, roughly: a 30-second 1080p clip transcoded to 5 renditions takes on the order of 30 seconds of compute; a 90-minute movie takes hours. Actual figures vary widely with codec, encoder preset, and hardware. Encoding farms are massive cost centers at YouTube/Netflix scale.
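The cost figures can be sanity-checked with back-of-envelope arithmetic. This Python sketch assumes constant bitrates from the ladder table and scales the rough "30-second clip takes ~30 seconds of compute" figure linearly; both are simplifications.

```python
# Video bitrates from the ladder table, in kbps.
LADDER_KBPS = {"240p": 400, "480p": 1000, "720p": 3000, "1080p": 5000, "4K": 16000}


def storage_bytes(duration_s, ladder_kbps=LADDER_KBPS):
    """Bytes to store every rendition of one title (video only; ignores audio,
    container overhead, and variable-bitrate swings)."""
    total_bits_per_s = sum(ladder_kbps.values()) * 1000
    return total_bits_per_s * duration_s / 8


def encode_seconds(duration_s, ref_clip_s=30, ref_compute_s=30):
    """Linear extrapolation from the rough '30 s clip -> ~30 s of compute' figure."""
    return duration_s * ref_compute_s / ref_clip_s


movie_s = 90 * 60
print(f"{storage_bytes(60) / 1e6:.1f} MB per minute of content")
print(f"{storage_bytes(movie_s) / 1e9:.1f} GB for a 90-minute movie")
print(f"{encode_seconds(movie_s) / 3600:.1f} h of encode compute")
```

Summing the ladder gives 25.4 Mbps, so storing every rendition costs about 190.5 MB per minute of content (roughly 17.1 GB for a 90-minute movie, video only), and linear scaling puts the 90-minute encode at about 1.5 hours of compute, before multiplying by the number of codecs.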
05
Deep dive — adaptive bitrate streaming (HLS / DASH)
The user's network changes mid-video: WiFi drops, cellular kicks in. Streaming adapts by switching to a lower bitrate, then switching back up when bandwidth recovers. HLS and DASH work essentially the same way:
Manifest file (.m3u8 for HLS, .mpd for DASH) lists all renditions + their segments.
Each rendition is split into ~6-second segments, pre-encoded and stored as ordinary files (seg-001.ts, etc.; MPEG-TS .ts or fragmented-MP4 .m4s).
Player downloads manifest, picks initial bitrate based on initial bandwidth estimate.
Player downloads segments sequentially, keeping a buffer ahead of playback, and switches renditions between segments based on observed throughput (many players also weigh buffer level).
CDN serves segments — each is just an HTTP-cacheable file. CDN scaling is normal HTTP scaling.
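A sketch of the two manifest levels for HLS, in Python: a master playlist listing the renditions, and a media playlist listing one rendition's segments. Tag names follow the HLS specification (RFC 8216); the ladder widths assume 16:9 sources, the file layout is hypothetical, and using the nominal video bitrate for `BANDWIDTH` is a simplification (the spec defines it as the peak bitrate, audio included). The recommended `CODECS` attribute is omitted for brevity.

```python
import math

LADDER = [  # (name, width, height, video kbps); widths assume 16:9
    ("240p", 426, 240, 400),
    ("480p", 854, 480, 1000),
    ("720p", 1280, 720, 3000),
    ("1080p", 1920, 1080, 5000),
]


def master_playlist(ladder):
    """Top-level .m3u8: one entry per rendition, pointing at its media playlist."""
    lines = ["#EXTM3U"]
    for name, w, h, kbps in ladder:
        # BANDWIDTH is in bits per second.
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={kbps * 1000},RESOLUTION={w}x{h}")
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines) + "\n"


def media_playlist(duration_s, seg_s=6):
    """Per-rendition .m3u8 for a finished (VOD) video cut into seg_s-second pieces."""
    n = math.ceil(duration_s / seg_s)
    lines = ["#EXTM3U", "#EXT-X-VERSION:3", f"#EXT-X-TARGETDURATION:{seg_s}",
             "#EXT-X-MEDIA-SEQUENCE:0", "#EXT-X-PLAYLIST-TYPE:VOD"]
    for i in range(n):
        length = min(seg_s, duration_s - i * seg_s)  # last segment gets the remainder
        lines.append(f"#EXTINF:{length:.3f},")
        lines.append(f"seg-{i:03d}.ts")
    lines.append("#EXT-X-ENDLIST")  # no more segments will be added
    return "\n".join(lines) + "\n"
```

For a 20-second rendition, `media_playlist(20)` lists four segments (three of 6 s and a final one of 2 s) and ends with `#EXT-X-ENDLIST`. Both files are static text, so the CDN caches them like any other object.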
Why segments + manifest beat continuous streaming: any HTTP CDN can serve them. No special streaming protocols (such as RTMP or RTSP) or stateful media servers needed. The player can prefetch, seek, pause, and switch quality cheaply.
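Because every segment boundary is a switch point, the player's adaptation logic can be small. This Python sketch is a throughput-only rule: smooth each segment's measured download speed with an exponentially weighted moving average and pick the highest rendition that fits under a safety margin. The smoothing weight (0.3) and the 0.8 margin are assumptions; production players also factor in buffer level.

```python
BITRATES_KBPS = [400, 1000, 3000, 5000, 16000]  # the ladder, lowest first


class ThroughputAbr:
    """Pick the next segment's bitrate from a smoothed throughput estimate."""

    def __init__(self, initial_kbps, alpha=0.3, safety=0.8):
        self.estimate_kbps = initial_kbps  # initial guess, e.g. from the manifest fetch
        self.alpha = alpha    # weight given to the newest sample (EWMA)
        self.safety = safety  # spend only part of the estimate to absorb dips

    def on_segment_downloaded(self, size_bytes, seconds):
        sample_kbps = size_bytes * 8 / 1000 / seconds
        self.estimate_kbps = (self.alpha * sample_kbps
                              + (1 - self.alpha) * self.estimate_kbps)

    def choose_kbps(self):
        budget = self.estimate_kbps * self.safety
        fitting = [b for b in BITRATES_KBPS if b <= budget]
        return fitting[-1] if fitting else BITRATES_KBPS[0]  # never below the floor
```

Starting from a 6,000 kbps estimate it picks 3,000 kbps; after a run of slow (500 kbps) segments it drops to the 400 kbps rung, and after sustained 25 Mbps downloads it climbs back to 16,000 kbps.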
Interview answer
"Direct-to-S3 upload triggers an async transcoding pipeline. Output 5-rendition HLS ladder + WebP/AVIF thumbnails. Manifests + segments served via CDN with long TTLs. Player adapts bitrate based on observed bandwidth. Image variants generated on-demand via image proxy (Imgix-style) so we don't pre-generate sizes that are never used."
06
Real-world
YouTube / Netflix encoding farms
Massive scale
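Huge fleets of encoding hardware process new uploads and re-encode older content to newer codecs; YouTube even designed custom video-transcoding chips. Re-encoding to more efficient codecs such as AV1 cuts the bytes delivered per stream, which at Netflix scale adds up to very large bandwidth savings.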
Imgix / Cloudflare Images
On-demand image proxy
URL-based transformations. A request like ?w=300&fm=webp (Imgix parameter names) generates the variant and caches it at the edge. A common default for image-heavy sites.
Cloudflare Stream / AWS MediaConvert
Managed video pipelines
Upload → transcode → deliver as HLS/DASH. SaaS for teams that don't want to run encoding farms.
YouTube/Netflix are fundamentally video pipelines at extreme scale. A news feed processes user-uploaded photos through an image pipeline. Live streaming uses real-time transcoding (a different beast). Google Drive generates file thumbnails through an image pipeline.