Latency Numbers

01

Why this matters

Every architectural decision ("do we need a cache?", "should this call be async?", "can we fit this in RAM?") collapses to the same question: how long does the operation actually take? Engineers who can't estimate this routinely pick the wrong tool. Engineers who can estimate it know, without running a benchmark, that reading 1 MB from a spinning disk (~20 ms) will dominate a request and must therefore be served from cache.

The numbers below are Jeff Dean's famous list, updated for 2025 hardware. Memorize the orders of magnitude.

02

The ladder — 9 orders of magnitude

Operation	Time	Relative
L1 cache reference	0.5 ns	1×
Branch mispredict	5 ns	10×
L2 cache reference	7 ns	14×
Mutex lock/unlock	25 ns	50×
Main memory reference	100 ns	200×
Compress 1 KB with Snappy	2 μs	4 000×
Send 1 KB over 1 Gbps network	10 μs	20 000×
SSD random read (NVMe)	16 μs	32 000×
Read 1 MB sequentially from memory	100 μs	200 000×
Round-trip within datacenter	500 μs	1 000 000×
Read 1 MB sequentially from SSD	1 ms	2 000 000×
Disk seek (spinning, rare now)	10 ms	20 000 000×
Read 1 MB sequentially from disk	20 ms	40 000 000×
Round-trip CA → Netherlands	150 ms	300 000 000×

03

Scaled to human time

If an L1 access took 1 second:

L2 cache: 14 seconds
RAM: 3.3 minutes
SSD random read: 9 hours
DC round-trip: ~11.6 days
SSD sequential 1 MB: ~23 days
Disk seek: ~7.6 months
CA → Netherlands round-trip: ~9.5 years

This is why reading from disk to serve a user request is insane when the data could be in RAM. It's also why a cross-ocean call must never happen synchronously in a hot path.
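The scaling above is plain division by the L1 latency. The following is a minimal sketch of that arithmetic, using the figures from the ladder; the names `LATENCY_NS`, `relative_to_l1`, and `human_scale` are illustrative and not part of the original list.

```python
# Latencies in nanoseconds, copied from the ladder (a subset of rows).
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "SSD random read (NVMe)": 16_000,
    "Round-trip within datacenter": 500_000,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Round-trip CA -> Netherlands": 150_000_000,
}


def relative_to_l1(latency_ns: float) -> float:
    """Number of L1 references that fit into one operation of this latency."""
    return latency_ns / LATENCY_NS["L1 cache reference"]


def human_scale(latency_ns: float) -> str:
    """Express a latency as if one L1 reference took exactly 1 second."""
    seconds = relative_to_l1(latency_ns)
    units = (
        ("years", 365 * 86_400),
        ("days", 86_400),
        ("hours", 3_600),
        ("minutes", 60),
    )
    # Pick the largest unit that the scaled duration reaches.
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.0f} seconds"


if __name__ == "__main__":
    for name, ns in LATENCY_NS.items():
        print(f"{name:35s} {human_scale(ns)}")
```

The sketch reports days rather than months, so the disk seek appears as roughly 231 days, which is the same ~7.6 months quoted in the list.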

04

Deep dive — what this means in practice

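Why caches exist. A main-memory reference (~100 ns) is roughly 160× faster than an NVMe random read (~16 μs). A 1 ms DB query becomes a 5 μs in-memory cache hit; a networked cache such as Redis adds a datacenter round-trip (~500 μs), which is still half the query time. On a page that makes 20 sequential backend calls, that's 20 ms vs 100 μs, the difference between "snappy" and "laggy."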
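Real caches rarely hit 100% of the time, so the useful number is the blended cost. The sketch below uses the 1 ms and 5 μs figures from the paragraph above; the `hit_ratio` parameter and the function name are assumptions added for illustration, and a miss is modeled simply as the cost of the DB query.

```python
DB_QUERY_US = 1_000  # 1 ms database query (figure from the text)
CACHE_HIT_US = 5     # 5 us cache hit (figure from the text)


def page_backend_time_us(calls: int, hit_ratio: float) -> float:
    """Total time for `calls` sequential backend calls, in microseconds.

    hit_ratio is the fraction of calls served from cache (hypothetical
    parameter). A miss is modeled as costing one DB query.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    per_call = hit_ratio * CACHE_HIT_US + (1.0 - hit_ratio) * DB_QUERY_US
    return calls * per_call


if __name__ == "__main__":
    for ratio in (0.0, 0.9, 1.0):
        print(f"hit ratio {ratio:.0%}: {page_backend_time_us(20, ratio):.0f} us")
```

Under these assumptions, even a 90% hit ratio leaves the page at about 2 ms, because the remaining misses dominate the total.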

Why CDNs exist. A round-trip from Singapore to Virginia is ~180ms. You cannot beat this with code. The only answer is to move the data closer — CDN edges cut this to ~20ms.

Why asynchronous processing exists. A single synchronous cross-continent call (~150 ms round-trip) per request limits each thread to ~6–7 RPS (1 s / 150 ms). An async architecture (queue + workers) decouples request latency from processing latency: the user gets a fast ACK and the work happens in the background.
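The per-thread ceiling is the reciprocal of the blocking time. A small sketch of that calculation follows, with the 150 ms round-trip taken from the ladder; the 1 000 RPS target and both function names are hypothetical, chosen only to show how many blocked threads a synchronous design would need.

```python
import math


def max_rps_per_thread(blocking_latency_ms: float) -> float:
    """Upper bound on requests per second for one thread that blocks
    for blocking_latency_ms on every request."""
    return 1000.0 / blocking_latency_ms


def threads_needed(target_rps: float, blocking_latency_ms: float) -> int:
    """Threads required to sustain target_rps when each request blocks
    for blocking_latency_ms (concurrency = rate x latency)."""
    return math.ceil(target_rps * blocking_latency_ms / 1000.0)


if __name__ == "__main__":
    print(f"{max_rps_per_thread(150):.2f} RPS per thread at 150 ms")
    print(f"{max_rps_per_thread(0.5):.0f} RPS per thread at 0.5 ms")
    print(f"{threads_needed(1000, 150)} threads for 1000 RPS at 150 ms")
```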

Why compression is cheap. Snappy compresses 1 KB in 2 μs. Sending that 1 KB uncompressed over a 1 Gbps link takes 10 μs, but if it compresses to 400 bytes, sending takes 4 μs. Net: 2 μs + 4 μs = 6 μs instead of 10 μs, before counting decompression on the receiver. For compressible payloads crossing the network, compress.
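The trade-off can be checked by adding compression time to the shorter wire time. This is a rough sketch using the ladder's per-KB figures (2 μs to compress, 10 μs to send); the compression `ratio` of 0.4 matches the 400-byte example, and the helper names are illustrative. Decompression on the receiver is deliberately left out.

```python
SEND_US_PER_KB = 10.0      # 1 KB over 1 Gbps, from the ladder
COMPRESS_US_PER_KB = 2.0   # Snappy on 1 KB, from the ladder


def send_us(n_bytes: float) -> float:
    """Wire time, scaled linearly from the ladder's 1 KB figure."""
    return n_bytes / 1000 * SEND_US_PER_KB


def compress_us(n_bytes: float) -> float:
    """Compression time, scaled linearly from the ladder's 1 KB figure."""
    return n_bytes / 1000 * COMPRESS_US_PER_KB


def total_us(raw_bytes: float, ratio: float, compress: bool) -> float:
    """Sender-side cost. ratio = compressed size / raw size (assumed).
    Receiver-side decompression is not modeled."""
    if not compress:
        return send_us(raw_bytes)
    return compress_us(raw_bytes) + send_us(raw_bytes * ratio)


if __name__ == "__main__":
    print(f"raw:        {total_us(1000, 0.4, compress=False):.1f} us")
    print(f"compressed: {total_us(1000, 0.4, compress=True):.1f} us")
```

With these figures, compression pays off whenever the ratio is below 0.8, since 2 + 10 × ratio must be less than 10.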

Why sequential beats random. 1 MB sequential SSD read = 1 ms. 1 MB of random 4 KB reads = 256 × 16 μs ≈ 4 ms. 4× slower. This is why structures that pack related keys into contiguous pages (B-tree nodes) outperform pointer-chasing layouts (one random read per hop) even when the total bytes read are equal.
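The 4× figure follows from counting blocks. Here is a minimal sketch of that arithmetic, assuming the reads are issued one at a time (no parallel I/O, which would narrow the gap on real NVMe drives); the constant and function names are illustrative.

```python
SSD_RANDOM_READ_US = 16    # one random 4 KB read, from the ladder
SSD_SEQ_1MB_US = 1_000     # 1 MB sequential read, from the ladder


def random_read_time_us(total_bytes: int, block_bytes: int = 4096) -> int:
    """Time to fetch total_bytes as serial random reads of block_bytes each."""
    blocks = -(-total_bytes // block_bytes)  # ceiling division
    return blocks * SSD_RANDOM_READ_US


if __name__ == "__main__":
    one_mb = 1024 * 1024
    random_us = random_read_time_us(one_mb)
    print(f"random:     {random_us} us")
    print(f"sequential: {SSD_SEQ_1MB_US} us")
    print(f"slowdown:   {random_us / SSD_SEQ_1MB_US:.1f}x")
```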

05

Budget per user request

200 ms

Total request budget

Typical SLO for an interactive API endpoint. Anything over 500ms feels slow to a human.

50 ms

Database per query

Four sequential queries consume the entire 200 ms budget, leaving nothing for anything else. Parallelize or cache.

10 ms

Cache per lookup

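This is a ceiling, not a target: 15–20 sequential lookups at the full 10 ms each would consume 150–200 ms, most of the budget. A typical in-datacenter hit costs about one round-trip (~0.5 ms), so 15–20 hits add roughly 8–10 ms without slowing the user.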

1 ms

In-process computation

JSON parse, template render. If yours exceeds this, profile.
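One way to sanity-check a design against these budgets is to sum the stages that run one after another and take the maximum of the stages that run in parallel. The sketch below uses the four budget figures above; the function names and the four-query example are assumptions for illustration.

```python
# Per-request budgets in milliseconds, from the cards above.
BUDGET_MS = {"total": 200, "db_query": 50, "cache_lookup": 10, "compute": 1}


def sequential_ms(stage_ms: list[float]) -> float:
    """Stages that wait on each other: latencies add up."""
    return sum(stage_ms)


def parallel_ms(stage_ms: list[float]) -> float:
    """Independent stages issued together: the slowest one decides."""
    return max(stage_ms, default=0.0)


def headroom_ms(spent_ms: float, total_ms: float = BUDGET_MS["total"]) -> float:
    """Budget left after spending spent_ms; negative means over budget."""
    return total_ms - spent_ms


if __name__ == "__main__":
    queries = [BUDGET_MS["db_query"]] * 4
    print("4 queries, sequential:", headroom_ms(sequential_ms(queries)), "ms left")
    print("4 queries, parallel:  ", headroom_ms(parallel_ms(queries)), "ms left")
```

Run sequentially, four budgeted queries leave no headroom; issued in parallel, the same four leave 150 ms for everything else.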

06

Used in problems

Every problem sets explicit latency targets in its requirements. These targets drive whether you need CDN, cache, in-memory indices, or async processing.
