Back-of-Envelope Estimation

01

Why this matters

Every design decision hinges on numbers. Do we need sharding? Only if writes exceed ~10k/sec. Do we need a cache? Only if read latency is a bottleneck. Do we need a CDN? Only if users are geographically far from origin. You can't answer any of these without doing back-of-envelope math first. Interviewers care about this more than any specific technology.

The goal isn't precision — it's order of magnitude. Knowing it's "10k QPS" vs "10M QPS" changes the architecture entirely. Knowing it's exactly 11,437 QPS vs 9,812 QPS changes nothing.

02

The numbers you must memorize

Quantity	Value	Mnemonic
Seconds/day	86,400	"~100k" — round up for estimates
Seconds/month	~2.6M	86.4k × 30
Seconds/year	31.5M	"π × 10⁷"
Bytes per char (ASCII)	1	—
Bytes per char (UTF-8 general)	~2	pessimistic average
KB / MB / GB / TB / PB	10³ / 10⁶ / 10⁹ / 10¹² / 10¹⁵	base 10 for estimation
Days in 5 years	1825	for storage projections
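
The values and mnemonics in the table can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original material: the constant names and the `relative_error` helper are made up for illustration, and it assumes a 30-day month and a 365-day year, as the table does.

```python
import math

# Assumed conventions from the table: 30-day month, 365-day year, base-10 units.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = SECONDS_PER_DAY * 30  # 2,592,000
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365  # 31,536,000
DAYS_IN_5_YEARS = 365 * 5                 # 1,825

# Base-10 size units, as used for estimation.
KB, MB, GB, TB, PB = (10 ** e for e in (3, 6, 9, 12, 15))


def relative_error(approx: float, exact: float) -> float:
    """Fractional error of an approximation against the exact value."""
    return abs(approx - exact) / exact


if __name__ == "__main__":
    # How far off are the two mnemonics?
    print(f"100k vs seconds/day:    {relative_error(100_000, SECONDS_PER_DAY):.1%}")
    print(f"pi*1e7 vs seconds/year: {relative_error(math.pi * 1e7, SECONDS_PER_YEAR):.1%}")
```

The "~100k" shortcut is off by roughly a sixth, while "π × 10⁷" is within half a percent; both are well inside order-of-magnitude tolerance.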

03

The QPS / storage / bandwidth recipe

Given N DAU (daily active users) and k actions/user/day:

Writes/sec = (N × k) ÷ 86,400
Reads/sec = Writes × read-to-write ratio (typical: 10× to 1000× for social apps)
Storage/day = (N × k) × avg-record-size
Storage/5yr = storage/day × 1825 × replication-factor (3× typical)
Bandwidth/sec = Reads × avg-response-size

Worked example — Twitter-style feed, 100M DAU, 2 posts + 200 views each per day:

Metric	Value	Derivation
Writes/sec	2,300	100M × 2 / 86.4k
Reads/sec	230,000	100× writes
Posts/day	200 GB	1KB each
5-year storage	1.1 PB	3× replication
Read bandwidth	230 MB/s	1KB/response

Those numbers immediately drive decisions: 2,300 writes/sec fits a single Postgres. 230k reads/sec needs Redis + CDN. 1.1 PB needs object storage and tiered storage. 230 MB/s is fine for one region but doubles per extra continent served.
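
The recipe and the worked example above can be reproduced in code. The sketch below is one possible encoding under stated assumptions: the `Estimate` dataclass, the `estimate` function, and its parameter names are hypothetical, and sizes use base-10 bytes (1 KB = 1,000 bytes).

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
DAYS_IN_5_YEARS = 1_825


@dataclass
class Estimate:
    writes_per_sec: float
    reads_per_sec: float
    storage_per_day_bytes: float
    storage_5yr_bytes: float
    bandwidth_bytes_per_sec: float


def estimate(dau, actions_per_user_per_day, read_write_ratio,
             record_bytes, response_bytes, replication=3):
    """Apply the QPS / storage / bandwidth recipe to one workload."""
    daily_writes = dau * actions_per_user_per_day
    writes = daily_writes / SECONDS_PER_DAY
    reads = writes * read_write_ratio
    storage_day = daily_writes * record_bytes
    storage_5yr = storage_day * DAYS_IN_5_YEARS * replication
    bandwidth = reads * response_bytes
    return Estimate(writes, reads, storage_day, storage_5yr, bandwidth)


if __name__ == "__main__":
    # Twitter-style feed: 100M DAU, 2 posts and 200 views per user per day,
    # so the read-to-write ratio is 200 / 2 = 100.
    e = estimate(dau=100_000_000, actions_per_user_per_day=2,
                 read_write_ratio=100, record_bytes=1_000, response_bytes=1_000)
    print(f"writes/sec:     {e.writes_per_sec:,.0f}")
    print(f"reads/sec:      {e.reads_per_sec:,.0f}")
    print(f"storage/day:    {e.storage_per_day_bytes / 1e9:,.0f} GB")
    print(f"storage/5yr:    {e.storage_5yr_bytes / 1e15:.2f} PB")
    print(f"read bandwidth: {e.bandwidth_bytes_per_sec / 1e6:,.0f} MB/s")
```

The exact results land slightly above the rounded figures in the example (about 2,315 writes/sec rather than 2,300), which is the level of precision the estimate calls for.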

04

Shortcuts that save time

86400 ≈ 100k — round up, the error is ~15% which is fine.
1M DAU × 1 action/day ≈ 12 QPS — memorize this as an anchor. 100M DAU → 1200 QPS per action. 1B DAU → 12k QPS per action.
1 KB × 1 QPS = ~1 KB/s — scale bandwidth by record size × QPS.
Peak ≈ 2–3× average — if daily average is 1200 QPS, plan for ~3000 QPS at peak hours. Interviewers reward this.
Replication factor 3 — always assume 3× storage for durability. Some systems (Ceph, MinIO erasure-coded) can do 1.5× but 3× is the safe baseline.
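
The shortcuts above are easy to sanity-check. The following sketch uses hypothetical helper names (`avg_qps`, `peak_qps`, `replicated_bytes`) and assumes a peak factor of 2.5, the midpoint of the 2–3× range, as its default.

```python
SECONDS_PER_DAY = 86_400


def avg_qps(dau: int, actions_per_user_per_day: float = 1.0) -> float:
    """Average queries per second over a full day."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY


def peak_qps(average: float, peak_factor: float = 2.5) -> float:
    """Peak load, assuming peak is a fixed multiple of the daily average."""
    return average * peak_factor


def replicated_bytes(raw_bytes: float, factor: float = 3.0) -> float:
    """Storage footprint after replication (3x baseline, 1.5x erasure-coded)."""
    return raw_bytes * factor


if __name__ == "__main__":
    # The "1M DAU x 1 action/day" anchor and its multiples.
    for dau in (1_000_000, 100_000_000, 1_000_000_000):
        print(f"{dau:>13,} DAU -> {avg_qps(dau):,.0f} QPS per action")
    print(f"peak for 1200 QPS average: {peak_qps(1200):,.0f} QPS")
    print(f"200 GB replicated 3x: {replicated_bytes(200e9) / 1e9:,.0f} GB")
```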

05

Deep dive — the memorizable latency ladder

Jeff Dean's "numbers every engineer should know" — updated. These explain why caches, CDNs, and in-region placement matter.

Operation	Latency	Scale if 1ns = 1 sec
L1 cache	0.5 ns	0.5 sec
L2 cache	7 ns	7 sec
RAM access	100 ns	~2 min
SSD random read	~16 μs	~4.5 hours
Redis / in-region	~0.5–1 ms	~1.5 weeks
Cross-AZ round-trip	~1 ms	~2 weeks
Cross-region (SF → NY)	~70 ms	~2 years
Cross-continent (US → EU)	~100 ms	~3 years
Cross-ocean (US → Asia)	~180 ms	~5.7 years

Two takeaways for interviews: (1) Going by the table, RAM is ~160× faster than an SSD random read, and an SSD read is ~30–60× faster than an in-region network round-trip; that gap is why caches work. (2) Cross-ocean latency is physics; no amount of engineering beats the speed of light. This is why you need multi-region deployments and CDNs.
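
The "1 ns = 1 sec" column and the speed ratios can be derived from the latency column alone. This is an illustrative sketch: the `LATENCIES_NS` dictionary copies the table's values (taking 1 ms, the upper end of the range, for Redis), and `humanize` is a hypothetical formatting helper.

```python
# Latencies from the table, in nanoseconds.
LATENCIES_NS = {
    "L1 cache": 0.5,
    "L2 cache": 7,
    "RAM access": 100,
    "SSD random read": 16_000,
    "Redis / in-region": 1_000_000,   # upper end of the 0.5-1 ms range
    "Cross-AZ round-trip": 1_000_000,
    "Cross-region (SF to NY)": 70_000_000,
    "Cross-continent (US to EU)": 100_000_000,
    "Cross-ocean (US to Asia)": 180_000_000,
}

# Largest unit first, so the first match is the most readable one.
UNITS = [
    ("years", 365 * 86_400),
    ("weeks", 7 * 86_400),
    ("days", 86_400),
    ("hours", 3_600),
    ("min", 60),
]


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that fits."""
    for name, size in UNITS:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} sec"


if __name__ == "__main__":
    # With 1 ns scaled to 1 sec, the nanosecond count is the second count.
    for op, ns in LATENCIES_NS.items():
        print(f"{op:<28} {humanize(ns)}")
    ssd, ram = LATENCIES_NS["SSD random read"], LATENCIES_NS["RAM access"]
    net = LATENCIES_NS["Redis / in-region"]
    print(f"SSD / RAM ratio:     {ssd / ram:.0f}x")
    print(f"network / SSD ratio: {net / ssd:.1f}x")
```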

06

Example estimations

URL Shortener

~100M short links/year

100M × 7 chars = 700MB raw. Add 500B of metadata per entry = ~50GB/year. Fits in one Postgres. Reads: 10k/sec at peak, easy with Redis cache.

News Feed

200M DAU, 3 reads per second each at peak

600M RPS across the fleet → impossible for any single database. Need a Redis feed cache (200 post IDs per user × 200M users = 40B IDs, or ~320 GB at 8 bytes per ID) and per-follower pre-computation (fan-out on write).

2B users, 100B messages/day

That's 1.2M writes/sec, roughly 100× what a single Cassandra node handles if you assume ~10k writes/sec per node. Shard by (user_id, chat_id). Persistent TCP connections ≈ 2B sockets if every user is online at once → need ~100k servers × 20k conns each.

Uber

5M drivers, location update every 4s

1.25M writes/sec of (driver_id, lat, lng). Redis sorted sets by geohash prefix handles this. Matching RPS ≈ 200 rides/sec × 10 candidates = 2k lookups/sec — trivially easy.
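
The arithmetic behind these four estimates can be checked in one place. The sketch below is illustrative only: the function and dictionary keys are hypothetical names, and the 8-byte post ID is an assumption made here to size the feed cache, since the inputs above do not state an ID width.

```python
SECONDS_PER_DAY = 86_400


def example_estimates(bytes_per_post_id: int = 8) -> dict:
    """Recompute the headline numbers for the example systems.

    bytes_per_post_id is an assumed ID width, not a value from the text.
    """
    links_per_year = 100_000_000
    feed_users = 200_000_000
    return {
        # URL shortener: 7-char codes plus 500 B of metadata per link.
        "url_raw_bytes": links_per_year * 7,
        "url_metadata_bytes": links_per_year * 500,
        # News feed: 3 reads/sec per user at peak, 200 cached IDs per user.
        "feed_reads_per_sec": feed_users * 3,
        "feed_cached_ids": 200 * feed_users,
        "feed_cache_bytes": 200 * feed_users * bytes_per_post_id,
        # Messaging: 100B messages/day, 2B sockets at 20k per server.
        "chat_writes_per_sec": 100_000_000_000 / SECONDS_PER_DAY,
        "chat_servers": 2_000_000_000 // 20_000,
        # Ride hailing: 5M drivers reporting every 4 s; 200 rides/sec x 10 candidates.
        "driver_writes_per_sec": 5_000_000 / 4,
        "matching_lookups_per_sec": 200 * 10,
    }


if __name__ == "__main__":
    for name, value in example_estimates().items():
        print(f"{name:<26} {value:,.0f}")
```

Changing `bytes_per_post_id` shows how sensitive the feed cache size is to the ID width: the ID count is fixed at 40 billion, so every extra byte per ID adds 40 GB.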

07