Latency Numbers

01

Why this matters

Every architectural decision — "do we need a cache?", "should this call be async?", "can we fit this in RAM?" — collapses to the same question: how long does the operation actually take? Engineers who can't estimate this pick the wrong tool constantly. Engineers who can know, without running a benchmark, that reading 1MB from disk will dominate a request and therefore must go through cache.

The numbers below are Jeff Dean's famous list, updated for 2025 hardware. Memorize the orders of magnitude.

02

The ladder — 9 orders of magnitude

Operation                              Time      Relative
L1 cache reference                     0.5 ns    1×
Branch mispredict                      5 ns      10×
L2 cache reference                     7 ns      14×
Mutex lock/unlock                      25 ns     50×
Main memory reference                  100 ns    200×
Compress 1 KB with Snappy              2 μs      4 000×
Send 1 KB over 1 Gbps network          10 μs     20 000×
SSD random read (NVMe)                 16 μs     32 000×
Read 1 MB sequentially from memory     100 μs    200 000×
Round-trip within datacenter           500 μs    1 000 000×
Read 1 MB sequentially from SSD        1 ms      2 000 000×
Disk seek (spinning, rare now)         10 ms     20 000 000×
Read 1 MB sequentially from disk       20 ms     40 000 000×
Round-trip CA → Netherlands            150 ms    300 000 000×

03

Scaled to human time

If an L1 access took 1 second:

  • L2 cache: 14 seconds
  • RAM: 3.3 minutes
  • SSD random read: 9 hours
  • DC round-trip: 11.5 days
  • SSD sequential 1MB: 23 days
  • Disk seek: ~7.6 months
  • CA → Netherlands round-trip: ~9.5 years

This is why reading from disk to serve a user request is insane when the data could be in RAM. It's also why a cross-ocean call must never happen synchronously in a hot path.
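The human-time scaling is plain division by the L1 latency. A minimal sketch, assuming the latencies from the table; the names `LATENCY_NS`, `human_seconds`, and `pretty` are illustrative, not from any library:

```python
# Latencies in nanoseconds, copied from the table above.
L1_NS = 0.5
LATENCY_NS = {
    "L2 cache": 7,
    "RAM": 100,
    "SSD random read": 16_000,
    "DC round-trip": 500_000,
    "SSD sequential 1MB": 1_000_000,
    "Disk seek": 10_000_000,
    "CA -> Netherlands round-trip": 150_000_000,
}


def human_seconds(latency_ns: float) -> float:
    """Seconds on a scale where one L1 reference (0.5 ns) takes 1 second."""
    return latency_ns / L1_NS


def pretty(seconds: float) -> str:
    """Render a duration in the largest unit that fits (years use 365 days)."""
    units = (("years", 365 * 86400), ("days", 86400), ("hours", 3600), ("minutes", 60))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.0f} seconds"


for name, ns in LATENCY_NS.items():
    print(f"{name}: {pretty(human_seconds(ns))}")
```

Swapping in measured numbers from your own hardware keeps the same scale and makes the gaps easy to compare.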

04

Deep dive — what this means in practice

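Why caches exist. A RAM reference is ~160× faster than an NVMe random read (100 ns vs 16 μs). A 1 ms DB query becomes a ~5 μs lookup once the result is held in memory; a networked cache such as Redis also pays the datacenter round trip from the table on top of that. On a page that makes 20 sequential backend calls, that's 20 ms vs 100 μs of lookup time: the difference between "snappy" and "laggy."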

Why CDNs exist. A round-trip from Singapore to Virginia is ~180ms. You cannot beat this with code. The only answer is to move the data closer — CDN edges cut this to ~20ms.

Why asynchronous processing exists. A thread that blocks on one synchronous cross-continent call per request (~150 ms round trip, per the table) can serve at most ~6–7 requests per second. An async architecture (queue + workers) decouples request latency from processing latency: the user gets a fast ACK and the work happens in the background.
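The per-thread ceiling is the reciprocal of the time a thread spends blocked. A rough sketch, assuming the thread does nothing else per request; `max_rps_per_thread` is a hypothetical helper name:

```python
def max_rps_per_thread(blocking_latency_s: float) -> float:
    """Upper bound on requests/second for one thread that blocks this long per request."""
    return 1.0 / blocking_latency_s


# 150 ms intercontinental round trip (from the table): fewer than 7 requests/second.
print(f"{max_rps_per_thread(0.150):.1f} RPS")

# 500 us round trip inside one datacenter: roughly 2000 requests/second.
print(f"{max_rps_per_thread(0.0005):.0f} RPS")
```

Real throughput is lower once parsing, queuing, and the downstream service's own work are counted; this is only the bound set by the wire.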

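Why compression is cheap. Snappy compresses 1 KB in 2 μs. Sending that 1 KB uncompressed over a 1 Gbps link takes 10 μs; if it compresses to 400 bytes, sending takes 4 μs. Net win: 6 μs total (2 μs compress + 4 μs send) instead of 10 μs, before counting decompression on the receiver. For compressible payloads crossing the network, compress.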
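The trade-off can be checked by counting both the compression step and the transfer. A small sketch using the table's per-KB figures; the constants and function names are illustrative, and receiver-side decompression is deliberately left out:

```python
COMPRESS_US_PER_KB = 2.0   # Snappy, from the table
SEND_US_PER_KB = 10.0      # 1 KB over a 1 Gbps link, from the table


def time_uncompressed_us(size_kb: float) -> float:
    """Transfer time when the payload is sent as-is."""
    return size_kb * SEND_US_PER_KB


def time_compressed_us(size_kb: float, ratio: float) -> float:
    """Compress first, then send. ratio = compressed size / original size."""
    return size_kb * (COMPRESS_US_PER_KB + ratio * SEND_US_PER_KB)


def break_even_ratio() -> float:
    """Compression only pays off when the payload shrinks below this ratio."""
    return 1.0 - COMPRESS_US_PER_KB / SEND_US_PER_KB


print(time_uncompressed_us(1))      # plain send of 1 KB
print(time_compressed_us(1, 0.4))   # 1 KB that compresses to 400 bytes
print(break_even_ratio())
```

On faster links the break-even ratio drops, so already-compressed data (images, video) is usually not worth compressing again.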

Why sequential beats random. 1MB sequential SSD read = 1ms. 1MB of random 4KB reads = 256 × 16μs = 4ms. 4× slower. This is why B-tree indexes (sequential) outperform pointer-chasing (random) even when the total bytes read are equal.
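The same arithmetic as a sketch, assuming 4 KB pages and reads issued one at a time (NVMe drives can overlap many requests, which narrows the gap); the constants come from the table and the function names are illustrative:

```python
SSD_RANDOM_READ_US = 16    # one random NVMe read, from the table
SSD_SEQ_1MB_US = 1000      # 1 MB sequential SSD read, from the table
PAGE_KB = 4                # assumed page size


def random_read_us(total_kb: int) -> int:
    """Cost of fetching total_kb as independent 4 KB random reads, one after another."""
    pages = -(-total_kb // PAGE_KB)  # ceiling division
    return pages * SSD_RANDOM_READ_US


def sequential_read_us(total_kb: int) -> float:
    """Cost of streaming total_kb in one sequential pass."""
    return total_kb / 1024 * SSD_SEQ_1MB_US


one_mb = 1024
slowdown = random_read_us(one_mb) / sequential_read_us(one_mb)
print(f"random: {random_read_us(one_mb)} us, sequential: {sequential_read_us(one_mb):.0f} us")
print(f"random is {slowdown:.1f}x slower")
```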

05

Budget per user request

200 ms

Total request budget

Typical SLO for an interactive API endpoint. Anything over 500ms feels slow to a human.

50 ms

Database per query

If you do 4 sequential queries, you've blown the budget. Parallelize or cache.

10 ms

Cache per lookup

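Treat this as a ceiling, not a target: 15–20 sequential lookups at 10 ms each would consume the entire 200 ms budget. Keep typical hits well below it, or batch and parallelize lookups.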

1 ms

In-process computation

JSON parse, template render. If yours exceeds this, profile.
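A quick way to sanity-check a request plan against these budgets is to add up the steps that run back to back and take the maximum of the steps that run concurrently. A minimal sketch; the helper names are hypothetical and the figures are the budget numbers above:

```python
BUDGET_MS = 200  # total request budget


def sequential_ms(steps_ms: list[float]) -> float:
    """Steps that wait on each other: latencies add up."""
    return sum(steps_ms)


def parallel_ms(steps_ms: list[float]) -> float:
    """Independent steps issued together: the slowest one sets the latency."""
    return max(steps_ms)


def remaining_ms(spent_ms: float, budget_ms: float = BUDGET_MS) -> float:
    """Budget left for everything else in the request."""
    return budget_ms - spent_ms


queries = [50, 50, 50, 50]  # four database queries at the per-query budget

print(remaining_ms(sequential_ms(queries)))  # nothing left for rendering or network
print(remaining_ms(parallel_ms(queries)))    # most of the budget is still free
```

Parallelizing only helps when the queries do not depend on each other's results; dependent queries stay on the sequential path.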

06

Used in problems

Every problem sets explicit latency targets in its requirements. These targets drive whether you need CDN, cache, in-memory indices, or async processing.
