Instagram

A photo and short-video sharing platform with feeds, stories, reels, and direct messaging serving 2 billion monthly active users. The hard parts: a media upload pipeline handling 95M+ daily uploads with on-the-fly transcoding, a ranked feed blending followed accounts and algorithmic recommendations, and a stories/reels delivery system that's effectively a global short-video CDN. Twitter solved feeds; Instagram solved feeds + media at heavier per-item cost.

⚡ Core: Media Pipeline + Ranked Feed2B MAU95M uploads/dayRead-heavy 100:1Heavy media (~3 MB/photo, ~10 MB/reel)
02

Requirements

Functional
  • Upload photos and short videos (reels) with caption, location, tags
  • Browse a ranked home feed blending followed accounts + algorithmic suggestions
  • Post and view stories (24-hour ephemeral content)
  • Like, comment, save, share posts and reels
  • Follow / unfollow accounts; view profile feeds
  • Search by username, hashtag, location
Non-Functional
  • Feed load < 200 ms p99 globally
  • Media upload resilient to flaky mobile networks (resumable, retry)
  • Reel playback starts in < 500 ms with adaptive bitrate
  • 99.95% availability for read; 99.9% for write
  • Scale to 2B MAU, 500M DAU, 95M uploads/day
  • Eventual consistency on counts (likes, views) is fine; strong on follow graph
03

Scale Estimation

500M DAU. Average user uploads ~0.2 posts/day → 100M uploads/day ≈ 1,200 writes/sec. Reads are heavier: ~50 feed loads/user/day × 500M = 25B reads/day ≈ 290k feed loads/sec.

Each feed load fetches ~30 posts metadata → ~9M post fetches/sec. Each post displays media; even with CDN cache the origin bandwidth is dominated by uploads:

500M
DAU
~1,200
writes/sec (uploads)
~290k
feed loads/sec
~9M
post fetches/sec
~3 PB/day
new media (avg 30 MB after transcoding ladder)
~20 EB
total stored media (5 years)
~50 Tbps
peak read bandwidth (CDN edge)
~500 GB
graph storage (2B users × ~250 follows avg)

The dominant cost is media — both storage (PB-scale) and egress bandwidth (Tbps-scale). Compute cost is dominated by transcoding and feed ranking. Database load is small relative to media; the post metadata DB at ~1,200 writes/sec sits comfortably in a sharded Postgres or Cassandra.

04

API Design

POST/v1/uploads/initiate

Returns a pre-signed S3 URL + upload_id. Client uploads media bytes directly to S3 in chunks (resumable). Server never touches bytes.

POST/v1/posts

{ upload_id, caption, tags, location, audience }. Creates the post once media is uploaded. Triggers async transcoding + fan-out.

GET/v1/feed?cursor=&limit=30

Returns ranked feed page. Cursor-based pagination over a per-user feed cache. Includes posts + interleaved reels + ad slots.

GET/v1/users/{id}/posts

Profile grid. Reverse chronological. Includes media URLs from CDN.

POST/v1/posts/{id}/like

Idempotent like. Increments engagement counter (eventual consistency); tags user_id in like-set for "who liked this."

POST/v1/follows

{ followee_id }. Strong consistency — affects feed eligibility immediately.

GET/v1/stories

Stories from followed accounts within 24h TTL. Pre-joined per-viewer cache.

GET/v1/reels?cursor=

Algorithmic reel feed. Pure recommendation — not constrained to follows. Vertical autoplay stream.

05

High-Level Architecture

Three concurrent pipelines fan out from a single upload:

  • Media path: client → S3 (direct) → transcoding workers → multi-rendition derivatives → CDN.
  • Metadata path: post created in sharded Postgres; emits PostCreated event to Kafka.
  • Distribution path: fan-out workers consume PostCreated; push post_id into followers' Redis feed sorted-sets (for normal users), plus emit to recommendation pipelines (for ML-driven reel feed).
Architecture — Upload + Feed ReadSVG
Client app CDN edge API gateway Upload service S3 originals Feed service Reel service Redis · feed cache Redis · counters Postgres · posts (sharded) Cassandra · graph Kafka · PostCreated Transcoding workers (GPU) Fan-out workers Reel ranker (ML) CDN derivative cache (HLS + thumbs)
Request Flow — Step Through
Client · Mobile appAPI Gateway · Auth + rate-limitUpload svc · pre-sign URLS3 originals · multipart uploadValidator · NSFW + EXIFTranscoder · GPU poolCDN · derivative cache
Click Next Step to walk through the request flow.
06

Deep Dive — The Media Upload Pipeline

The Core Question

How do you accept 95M uploads/day, transcode each into 4-6 renditions, and have them globally available within seconds — without your servers ever touching the bytes?

Step 1 — Direct-to-S3 upload. Client requests a pre-signed multi-part upload URL. Uploads chunks of ~5 MB directly to S3, resumable (network drops? resume from last chunk). Server sees only metadata: upload_id, expected size, declared MIME. Saves ~50 Tbps of inbound bandwidth at the API tier.

Step 2 — Validate + extract. S3 event triggers a Lambda. Reads the original, validates format/size, extracts EXIF, runs initial moderation (NSFW classifier + virus scan). Marks the upload ready or rejects.

Step 3 — Transcode ladder. A GPU worker pool consumes ready uploads. Photos get resized to 5 sizes (thumbnail 150², card 600², feed 1080², profile 1440², full 2048²) + 2 formats (WebP for modern, JPEG fallback). Reels get the full HLS ladder: 240p / 480p / 720p / 1080p (+ AV1 for newer clients) split into 4-second segments. One reel = ~40 derivative files; one photo = ~10.

Step 4 — Push to CDN. Derivatives written to S3 with predictable paths. CDN pulls on first request (lazy) or pre-warmed via API call (for high-confidence viral content from verified accounts). HLS manifest stitched with adaptive bitrate URLs.

Step 5 — Ack to client. Original Kafka UploadReadyPostCreated handlers update the post status. Client polls the post; once status is processed, post becomes visible in feeds.

Latency budget: first thumbnail in 1-2s, full HLS ladder in 10-30s for reels. Users see their photo immediately (we serve the original or a fast-path resize); the optimal rendition replaces it within seconds.

Cost realities — a 30-second reel transcoded into 4 renditions × 2 codecs takes ~30 GPU-seconds. At 95M uploads/day with say 30% reels, that's ~280 GPU-hours/day for transcoding alone. At AWS spot pricing ($0.50/GPU-hour for older A10G), ~$140/day. At Instagram's actual scale with H100s and AV1 re-encoding, it's millions in annual GPU spend.

Sequence — Upload & TranscodeMermaid.js
sequenceDiagram participant C as Client participant API as API participant S3 as S3 participant L as Validator (Lambda) participant K as Kafka participant T as Transcode workers participant CDN as CDN C->>API: POST /uploads/initiate API->>S3: pre-sign multipart URL API-->>C: { upload_url, upload_id } C->>S3: PUT chunks (5 MB each, resumable) S3->>L: ObjectCreated event L->>L: validate, EXIF, NSFW, virus L->>K: UploadReady{ id, type } K->>T: consume T->>S3: read original T->>T: encode 5 sizes + 2 formats (photo)
or HLS ladder (reel) T->>S3: write derivatives T->>K: TranscodeComplete{ id, manifest } K->>API: status update CDN->>S3: lazy pull on first request CDN-->>C: serve derivative
07

Key Design Decisions & Tradeoffs

Hybrid fan-out

Fan-out on write for normal users; fan-out on read for celebrities

Normal user (≤ 10K followers) — push PostCreated into each follower's Redis feed sorted-set. O(followers) work but happens once. Reads become O(1) ZRANGE.

Celebrity (≥ 1M followers, e.g. @cristiano @selenagomez) — fan-out write would mean 600M Redis writes for one post. Skip; instead, at read time, merge celebrity posts from a smaller "celebrity feed" cache pulled per-followed-celebrity. Avoids the celebrity fan-out storm.

Pure fan-out on read

Compute every feed at request time

Cleaner, no fan-out workers. But 290k feed loads/sec × 250 follows avg = 73M Redis lookups/sec just to compose feeds. Too expensive at this scale.

Cassandra for the social graph

Wide-column for follow relationships

Graph queries are simple (one hop: who do I follow / who follows me). Volume is huge: 2B users × 250 avg follows = 500B edges. Cassandra scales linearly; reads of "all my follows" are one partition lookup.

Graph DB (Neo4j)

True graph traversals

Better for multi-hop ("friends of friends") but Instagram's actual product needs are 1-hop. Neo4j at 500B edges is operationally hard. Cassandra wins for this workload.

Pre-generate photo derivatives

Encode all sizes on upload

Photo set sizes are small (5 sizes × 2 formats = 10 files). Pre-generate; serve from CDN. Predictable cost; instant first-byte.

On-demand image proxy (Imgix-style)

Generate any size on first request

Better when you have many possible sizes (responsive design); not the case here. Pre-generation is simpler at this product's known sizes.

08

What Can Go Wrong

🌪
Celebrity post fan-out storm
A verified account with 600M followers posts. Naive fan-out = 600M Redis writes in seconds → Redis cluster melts.
→ Mitigation: hybrid fan-out (skip write for celebrities; merge at read time from a small per-celebrity feed cache).
🎥
Transcoding queue backlog
Big creator drops a 60-second reel; 50M downstream re-shares trigger transcoding; queue fills; new uploads delayed by hours.
→ Mitigation: priority lanes (verified accounts, organic uploads first); autoscale GPU pool on queue depth; degrade gracefully (serve original + lower-quality fallback while ladder finishes).
📵
Mobile upload failure mid-stream
User on subway uploads a 20 MB reel; tunnel kills connection at 80%. Without resumable, full re-upload — terrible UX.
→ Mitigation: S3 multipart resumable; client retries individual chunks; idempotent upload_id survives across app restarts.
💾
Storage cost explosion
Naive 3× replication of 20 EB media = 60 EB. At $0.023/GB/month S3 standard, ~$1.4B/year just for storage.
→ Mitigation: tiered storage. Hot (last 30 days) on standard with 3× replication. Warm (30d-1y) on Infrequent Access (1.4×). Cold (older + rarely-viewed) on Glacier with erasure coding (1.5×). ~70% cost reduction.
Counter inconsistency
Likes counter shows 1.4M; refresh shows 1.39M. Replicated counter drift visible to user.
→ Mitigation: per-shard counter increments + periodic reconciliation; round display to 1.4M for any value > 100K (spec ambiguity hides drift).
🧨
Hot feed cache key
Celebrity-of-the-week's feed is the most-read key in Redis. One Redis shard drowns; 100k followers see latency spikes.
→ Mitigation: replicate hot keys across multiple Redis shards via consistent-hashing virtual nodes; client-side load-balance reads across replicas.

Anti-patterns

🚫
Store photos + videos in Postgres as blobs

Database sharding is for rows, not multi-MB binaries. Every query pulls massive bytes; cache churn destroys hit ratio.

✓ Better: Direct-to-S3 via pre-signed URLs; metadata rows in Postgres reference S3 keys; CDN serves derivatives.
🚫
Fan out on write to all 500k followers of a celebrity

10M celebrities × average-follower-count = trillions of writes/day. Hot-key Redis explodes.

✓ Better: Hybrid fan-out: push to normal users' timelines, pull-on-read for celebrity accounts.
🚫
Transcode video synchronously on upload

CPU-bound work on the request path. One viral moment = API tier down.

✓ Better: Upload returns immediately; async transcoder consumes from a queue and emits derivatives.
09

Interview Tips

  1. Lead with the media pipeline. Most candidates over-focus on the feed (which is just a Twitter clone) and skim the photo/video pipeline. The transcoding ladder + storage tiering is what makes Instagram different.
  2. Acknowledge the hybrid fan-out tradeoff. Push-on-write for normal users + pull-on-read for celebrities is THE Instagram-specific design choice. Naming it explicitly earns credibility.
  3. Storage tiering as cost answer. When asked "how much will this cost?", show storage tiering math. 70% reduction via cold tiers is concrete; "we'll use S3" is hand-waving.
  4. Reels & feed are different feeds. Home feed is follow-graph based; reels feed is purely algorithmic recommendation. Different ranker, different infra. Treat them as separate problems.
  5. Direct-to-S3 upload. Mention it explicitly. Saves bandwidth at API tier; avoids common candidate mistake of assuming app servers proxy media bytes.
11

Evolution

1

MVP — single Postgres + S3

One DB. App server transcodes synchronously. CDN wrapping S3. Works to ~100K MAU. Instagram literally launched this way (Django + Postgres) in 2010.

2

Direct-to-S3 + async transcoding

App server stops touching bytes. Transcoding moves to a worker queue. Critical for ~10M MAU.

3

Sharded Postgres + Redis feed cache

Posts table sharded by user_id; Redis sorted-set per follower for feed. Fan-out workers consume Kafka. Carries to ~500M MAU.

4

Hybrid fan-out + tiered storage

Celebrity exception added. Storage tiered hot/warm/cold. CDN multi-region. Scales to ~2B MAU (current).

5

ML-driven feed + AV1 video

Home feed transitions from chronological to ranked-with-recommendations. Reels added with separate purely-algorithmic ranker. AV1 transcoding cuts bandwidth ~40% over H.264.

Next up