Concept · Frontend & Mobile

Image & Video Processing Pipeline

01

Why this matters

A user uploads a 12 MP photo from their phone. Your app needs to display it as a thumbnail (200×200), a card image (600×400), and a full view (1920×1080), possibly in AVIF, WebP, and JPEG variants for different browsers. Maybe with a face crop. Maybe in 4 different aspect ratios for different feed templates. One upload becomes 8-20 derived assets.

Doing this synchronously on upload kills latency. Doing it on every read kills CPU. The right answer is a processing pipeline: the upload triggers an async job, derivatives are generated once, and they are served from a CDN. The same shape applies to video (transcoding + multi-bitrate ladder) at much higher cost.

02

The image pipeline

  1. Direct-to-S3 upload. Client gets a pre-signed URL; uploads original directly to S3. App server never sees the bytes — saves bandwidth + load.
  2. S3 event triggers a Lambda / SQS message ("new image at path X").
  3. Worker processes: read original, generate variants (resize, compress, format-convert), write derivatives back to S3 under predictable paths.
  4. CDN serves derivatives. URL pattern: cdn.example.com/img/<id>/600x400.webp.
  5. On-demand resize (modern variant): instead of pre-generating all sizes, generate on first request via image proxy (Imgix, Cloudflare Images, Vercel/Next Image). Cache result at edge. Better for long-tail sizes.
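
Steps 2 to 4 can be sketched as a small planning function: it reads an S3 "object created" notification and lists the derivatives a worker would write. This is a minimal sketch, not a full worker. The `originals/<id>.<ext>` upload layout, the `VARIANTS` list, and the function names are assumptions made for illustration; the actual resize and upload calls are left out.

```python
from urllib.parse import unquote_plus

# Hypothetical variant set; the sizes are the examples used in this article.
VARIANTS = [(200, 200, "webp"), (600, 400, "webp"), (1920, 1080, "webp")]


def derivative_key(image_id: str, width: int, height: int, fmt: str) -> str:
    """Predictable path, matching the CDN pattern img/<id>/600x400.webp."""
    return f"img/{image_id}/{width}x{height}.{fmt}"


def plan_jobs(event: dict) -> list[dict]:
    """Turn an S3 event notification into a flat list of resize jobs."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        source_key = unquote_plus(record["s3"]["object"]["key"])
        # Assumed layout: originals/<id>.<ext>
        image_id = source_key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        for width, height, fmt in VARIANTS:
            jobs.append({
                "bucket": bucket,
                "source": source_key,
                "target": derivative_key(image_id, width, height, fmt),
                "size": (width, height),
            })
    return jobs
```

Because the target key depends only on the image ID and the variant, re-running a job overwrites the same object, which keeps retries harmless.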

03

Pre-generate vs on-demand

Pre-generate

All variants on upload

Predictable storage, predictable serving cost, instant CDN serve. Wasted compute + storage for variants that are never requested. Common for small fixed variant sets (3-5 sizes).

On-demand resize

Generate on first request

Serve any size or format from a single source. CDN caches result. Pay only for what's actually used. Used by Imgix, Cloudflare Images, Next.js Image. Modern default.
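
The on-demand path can be sketched as a read-through cache keyed by the normalized transform. This is only an illustration under stated assumptions: the `w` and `fm` query parameter names, the format allow-list, the width limit, and the `OnDemandResizer` class are all hypothetical, and a plain dict stands in for the CDN edge cache.

```python
from urllib.parse import parse_qs, urlsplit

ALLOWED_FORMATS = {"jpeg", "webp", "avif"}  # assumed allow-list
MAX_WIDTH = 4096                            # assumed upper bound


def cache_key(url: str) -> tuple[str, int, str]:
    """Normalize a transform URL into (path, width, format)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    width = int(query.get("w", ["0"])[0])
    fmt = query.get("fm", ["jpeg"])[0].lower()
    if not 0 < width <= MAX_WIDTH or fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported transform: {parts.query}")
    return parts.path, width, fmt


class OnDemandResizer:
    def __init__(self, render):
        self._render = render   # callable(path, width, fmt) -> bytes
        self._cache = {}        # stands in for the CDN edge cache
        self.renders = 0        # how many times we actually did the work

    def get(self, url: str) -> bytes:
        key = cache_key(url)
        if key not in self._cache:
            # First request for this variant: generate once, then cache.
            self.renders += 1
            self._cache[key] = self._render(*key)
        return self._cache[key]
```

Normalizing the key means `?w=300&fm=webp` and `?fm=webp&w=300` share one cache entry, and the validation step keeps arbitrary sizes from forcing unbounded renders.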

04

Video — much harder

Video needs transcoding — decode the source, re-encode to multiple bitrate/resolution combinations for adaptive streaming (HLS / DASH). Each variant is a "rendition." A typical ladder:

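| Resolution | Bitrate   | Use                |
|------------|-----------|--------------------|
| 240p       | ~400 kbps | Slow mobile        |
| 480p       | ~1 Mbps   | 3G mobile          |
| 720p       | ~3 Mbps   | WiFi mobile        |
| 1080p      | ~5 Mbps   | Desktop / smart TV |
| 4K         | ~16 Mbps  | Premium            |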

Each rendition encoded with a codec — H.264 (universal), H.265/HEVC (smaller, weaker support), AV1 (smallest, slow encode), VP9 (YouTube). Most providers encode multiple codecs per rendition for browser compatibility.

Cost: transcoding a 30-second 1080p clip into 5 renditions takes on the order of 30 seconds of GPU time, depending on codec and encoder settings. A 90-minute movie takes hours. Encoding farms are massive cost centers at YouTube/Netflix scale.
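
Storage follows from the ladder by simple arithmetic: size is bitrate times duration. The sketch below uses the approximate bitrates from the table above and assumes a single codec, video only, with no audio or container overhead, so real numbers will be somewhat higher. The function names are illustrative.

```python
# Approximate video bitrates from the ladder above, in kilobits per second.
LADDER_KBPS = {
    "240p": 400,
    "480p": 1_000,
    "720p": 3_000,
    "1080p": 5_000,
    "4K": 16_000,
}


def rendition_megabytes(bitrate_kbps: int, duration_s: float) -> float:
    """bits = kbps * 1000 * seconds; divide by 8 for bytes, 1e6 for MB."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


def ladder_megabytes(duration_s: float) -> float:
    """Total size of all five renditions for one codec."""
    return sum(rendition_megabytes(k, duration_s) for k in LADDER_KBPS.values())


if __name__ == "__main__":
    print(f"30 s clip:    {ladder_megabytes(30):.2f} MB")
    print(f"90 min movie: {ladder_megabytes(90 * 60) / 1000:.1f} GB")
```

Under these assumptions a 30-second clip needs about 95 MB across the ladder and a 90-minute movie about 17 GB, before multiplying by the number of codecs.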

05

Deep dive — adaptive bitrate streaming (HLS / DASH)

The user's network changes mid-video — WiFi drops, cellular kicks in. Streaming adapts: switch to lower bitrate; switch back when bandwidth recovers. Both HLS and DASH work the same way:

  1. Manifest file (.m3u8 for HLS, .mpd for DASH) lists all renditions + their segments.
  2. Each rendition split into ~6-second segments. Pre-encoded; stored as files (seg-001.ts, etc.).
  3. Player downloads manifest, picks initial bitrate based on initial bandwidth estimate.
  4. Player downloads segments sequentially, switching renditions between segments based on observed throughput.
  5. CDN serves segments — each is just an HTTP-cacheable file. CDN scaling is normal HTTP scaling.
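
Step 1 can be made concrete with a minimal HLS master playlist generator. This is a sketch: the pixel dimensions, the `<name>/index.m3u8` layout for per-rendition playlists, and the `Rendition` type are assumptions, and a production manifest would usually also carry codec and audio information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    bandwidth_bps: int


# Ladder from this article; pixel dimensions are common 16:9 values (assumed).
LADDER = [
    Rendition("240p", 426, 240, 400_000),
    Rendition("480p", 854, 480, 1_000_000),
    Rendition("720p", 1280, 720, 3_000_000),
    Rendition("1080p", 1920, 1080, 5_000_000),
    Rendition("4K", 3840, 2160, 16_000_000),
]


def master_playlist(renditions: list[Rendition]) -> str:
    """Build a master playlist that points at one media playlist per rendition."""
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        # Hypothetical layout: each rendition has its own segment playlist.
        lines.append(f"{r.name}/index.m3u8")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist(LADDER))
```

The player reads the `BANDWIDTH` attribute of each entry to decide which rendition playlist to fetch segments from.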

Why segments + manifest beat continuous streaming: any HTTP CDN can serve them. No special streaming protocol or persistent connection is needed. The player can prefetch, seek, pause, and switch quality cheaply.
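
The rendition switch in step 4 can be sketched as a throughput-based rule: before each segment, pick the highest bitrate that fits within a safety margin of the measured bandwidth. This is a deliberately simplified model. Real players also consider buffer level and smooth their estimates, and the 0.8 safety factor here is an assumed value.

```python
# Ladder bitrates from this article, in bits per second.
LADDER_BPS = [400_000, 1_000_000, 3_000_000, 5_000_000, 16_000_000]
SAFETY = 0.8  # assumed headroom so small throughput dips do not stall playback


def pick_bitrate(throughput_bps: float,
                 ladder: list[int] = LADDER_BPS,
                 safety: float = SAFETY) -> int:
    """Highest rendition whose bitrate fits in the throughput budget."""
    budget = throughput_bps * safety
    affordable = [b for b in sorted(ladder) if b <= budget]
    # Fall back to the lowest rendition when nothing fits.
    return affordable[-1] if affordable else min(ladder)


def simulate(throughputs_bps: list[float]) -> list[int]:
    """One decision per segment, using the throughput seen for the previous one."""
    return [pick_bitrate(t) for t in throughputs_bps]


if __name__ == "__main__":
    # WiFi, then a cellular dip, then recovery.
    print(simulate([8_000_000, 1_200_000, 25_000_000]))
```

Because every segment is an independent file, switching costs nothing beyond requesting the next segment from a different rendition playlist.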

Interview answer

"Direct-to-S3 upload triggers an async transcoding pipeline. Output a 5-rendition HLS ladder + WebP/AVIF thumbnails. Manifests + segments are served via CDN with long TTLs. The player adapts bitrate based on observed bandwidth. Image variants are generated on-demand via an image proxy (Imgix-style) so we don't pre-generate sizes that are never used."

06

Real-world

YouTube / Netflix encoding farms

Massive scale

Very large encoding fleets process new content and re-encode the back catalog to newer codecs. Re-encoding to AV1 trades extra compute for substantial bandwidth savings at this scale.

Imgix / Cloudflare Images

On-demand image proxy

URL-based transformations. A request such as ?w=300&fm=webp (Imgix parameter names) generates the variant and caches it at the edge. The default for image-heavy sites.

Cloudflare Stream / AWS MediaConvert

Managed video pipelines

Upload → transcode → deliver as HLS/DASH. SaaS for teams that don't want to run encoding farms.

Mux

Video API for developers

"Stripe for video." API-driven encoding + delivery. Handles HLS + DASH + analytics + thumbnails.

07

Used in problems

YouTube/Netflix is fundamentally a video pipeline at extreme scale. News feed processes user-uploaded photos via the image pipeline. Live streaming uses real-time transcoding (a different beast). Google Drive generates file thumbnails via the image pipeline.
