Your homepage is served from Redis with a 5-minute TTL. At T = 0, the cache entry expires. At T = 0.001s, 10,000 concurrent requests all miss the cache and all query the origin database at the same time. The DB, sized for 100 RPS, is suddenly asked for 10,000 queries at once. It falls over. Every request fails, and the cache can't be refilled because the DB is down: a cascading outage.
This is a cache stampede (a.k.a. thundering herd, dog-pile). It happens whenever a hot cache entry expires or is invalidated. Mitigation is not optional at scale.
02
Why the stampede happens
Normal cache-aside: check cache → miss → read DB → fill cache → return. When one request misses, 100ms later the cache is warm and every subsequent request hits it.
The stampede: at the moment of expiry, every in-flight request sees a miss simultaneously. They all read the DB (which is slow — that's why you cached). Each succeeds eventually and fills the cache, but the DB is briefly asked to serve full load. Repeat every time the hottest keys expire.
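To make the failure mode concrete, here is a minimal sketch of naive cache-aside under concurrent misses. Everything in it is illustrative: `query_db`, `naive_get`, and `run_naive` are hypothetical names, the cache is a plain dict, and a `threading.Barrier` stands in for "every request sees the miss before any refill lands".

```python
import threading

cache = {}
db_calls = 0
db_calls_lock = threading.Lock()

def query_db(key):
    """Hypothetical origin read; here it only counts how often it is called."""
    global db_calls
    with db_calls_lock:
        db_calls += 1
    return f"rendered:{key}"

def naive_get(key, all_checked):
    value = cache.get(key)   # every request checks the cache...
    all_checked.wait()       # ...and all of them observe the miss before any refill
    if value is None:
        value = query_db(key)
        cache[key] = value
    return value

def run_naive(n_requests, key="homepage"):
    """Fire n_requests concurrent reads at an expired key; return the DB call count."""
    global db_calls
    cache.clear()
    db_calls = 0
    all_checked = threading.Barrier(n_requests)
    threads = [threading.Thread(target=naive_get, args=(key, all_checked))
               for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_calls
```

Calling `run_naive(50)` issues one origin read per request, because nothing in the read path lets a request know that another one is already refilling the key.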
03
Five mitigations
1. Request coalescing (single-flight). First miss locks the key; other concurrent misses wait for the first to fill the cache and read the result. Only one DB call per miss window. Works great in a single process (Go's singleflight package). Across servers, needs distributed coordination — Redis Lua script or per-key lock.
2. Probabilistic early expiry. Before the TTL actually expires, some requests randomly decide "close enough, let me re-fetch." Spreads the refresh across many requests instead of a cliff. A few requests pay the cost; the cache is always warm.
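3. Jittered TTL. Instead of every key expiring at exactly T+300s, set each TTL to random(280, 320)s. Expirations spread across a 40-second window, so keys written together no longer expire together. Jitter does not stop concurrent misses on a single hot key; pair it with coalescing or early refresh for that.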
4. Background refresh. A cron job or dedicated worker refreshes hot keys before they expire. Users never see a miss on hot keys — they always return from cache. Good for predictable hot keys (homepage, celebrity profiles).
5. Serve stale while revalidating. When a request sees a slightly-expired entry, return the stale value immediately AND kick off a background refresh. No request waits. Most CDNs support this (stale-while-revalidate directive).
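Request coalescing (mitigation 1) is the easiest of the five to get subtly wrong, so a sketch may help. This is a minimal in-process version in Python, loosely modeled on the idea behind Go's singleflight; the `SingleFlight` class and its `do` method are hypothetical names, and it does nothing across servers.

```python
import threading

class _Call:
    """Bookkeeping for one in-flight load."""
    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None

class SingleFlight:
    """Collapse concurrent calls for the same key into one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> _Call

    def do(self, key, fn):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                # First miss for this key: register ourselves as the loader.
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = fn()
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()  # wake every waiting follower
        else:
            call.done.wait()     # followers block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.result
```

A read path would wrap only the miss branch, for example `flight.do(key, lambda: load_and_fill(key))` where `load_and_fill` is whatever reads the DB and writes the cache. Followers share the leader's result or its exception, so a failing origin is hit once per miss window rather than once per request.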
XFetch — probabilistic early recomputation
import math, random, time

def xfetch(key, ttl, beta=1.0, recompute_fn=None):
    """Probabilistically recompute before expiry; prevents stampedes."""
    entry = cache_get(key)  # (value, delta, expiry), or None on a miss
    now = time.time()
    if entry is not None:
        value, delta, expiry = entry  # delta = seconds the last recompute took
        # log(u) <= 0 for u in (0, 1], so this pushes "now" forward by a random gap
        if now - delta * beta * math.log(1.0 - random.random()) < expiry:
            return value
    start = time.time()
    value = recompute_fn()
    delta = time.time() - start
    cache_set(key, (value, delta, start + delta + ttl), ttl=ttl)
    return value

# Without XFetch: at TTL=T+0, all concurrent requests miss → thundering herd.
# With XFetch: probability of early recompute rises smoothly as expiry nears;
# an early request refreshes while the others keep serving the still-valid cached value.
Figure: Stampede vs Mitigated, DB load at cache expiry
10 000×: DB load spike (naive)
~5 /sec: DB load (XFetch + jitter)
±20 s: jittered TTL window
β = 1.0: default XFetch aggressiveness
04
Deep dive — probabilistic early expiry
Also called "XFetch" (Vattani, Chierichetti, Lowenstein, 2015). On every cache read, compute a probability that this request should refresh, based on how close to expiry the entry is.
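import math, random, time

def get(key, ttl=300, beta=1.0):
    entry = cache.get(key)  # (value, delta, expiry), or None on a miss
    now = time.time()
    if entry is not None:
        value, delta, expiry = entry  # delta = seconds the last recompute took
        # Recompute early with small probability, growing as we near expiry:
        # -log(u) is a positive random gap, scaled by delta * beta
        if now - delta * beta * math.log(1.0 - random.random()) < expiry:
            return value
    start = time.time()
    value = compute_fresh()
    delta = time.time() - start
    cache.set(key, (value, delta, start + delta + ttl), ttl)
    return value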
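Far from expiry, the probability of refresh is ≈ 0; near expiry it approaches 1. Concretely, a single read refreshes with probability exp(-(time left) / (beta × delta)), where delta is how long the last recompute took. Parameter beta (typically 1.0) controls aggressiveness: values above 1 refresh earlier, values below 1 refresh later. Only a small fraction of requests pay the refresh cost, and the refresh usually lands shortly before the TTL. No cliff.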
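The shape of that curve is easier to see with numbers. The sketch below assumes the XFetch test `now - delta * beta * ln(u) >= expiry` with `u` uniform on (0, 1] and `delta` equal to the duration of the last recompute; solving for `u` gives the per-read probability. The function name `early_refresh_probability` is made up for this example.

```python
import math

def early_refresh_probability(remaining, delta, beta=1.0):
    """Chance that one read triggers an early recompute.

    remaining: seconds until the entry expires
    delta:     seconds the last recompute took
    beta:      aggressiveness; larger values refresh earlier
    """
    if remaining <= 0:
        return 1.0   # at or past expiry, every read recomputes
    if delta <= 0:
        return 0.0   # an instantaneous recompute leaves no early window
    return math.exp(-remaining / (delta * beta))

# Probability per read at various distances from expiry, for a 0.5 s recompute.
for remaining in (60, 10, 2, 0.5, 0):
    p = early_refresh_probability(remaining, delta=0.5)
    print(f"{remaining:>5}s before expiry: p = {p:.3g}")
```

With a 0.5 s recompute and beta = 1.0, a read 2 s before expiry refreshes about 1.8% of the time and a read 0.5 s before expiry about 37% of the time. The early window therefore scales with how expensive the value is to rebuild, which is the point of storing delta alongside the value.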
It is a tiny code change that removes the expiry cliff for any reasonably hot key. Combined with jittered TTL and stale-while-revalidate, cache stampedes become very hard to trigger.
05
Real-world
Facebook memcached (2013 paper)
Leases
First miss gets a "lease" to fill the cache. Concurrent misses wait for the lease holder. Famous for eliminating stampedes at FB scale.
Cloudflare / Fastly
stale-while-revalidate
Serve the stale value to the user; refresh in the background. Zero user-visible cache misses on hot content. An HTTP Cache-Control extension defined in RFC 5861.
Go singleflight
In-process coalescing
Package golang.org/x/sync/singleflight (an official Go sub-repository, not part of the standard library). Concurrent calls for the same key collapse into one. A good default for a Go cache is to wrap origin calls in a singleflight.Group.
Spring @Cacheable
sync=true for coalescing
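JVM: Spring's cache abstraction has a sync flag, @Cacheable(sync = true), that lets only one thread compute a missing value per key while the others wait, provided the underlying cache provider supports it.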
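The stale-while-revalidate behavior listed above for Cloudflare / Fastly can also be applied inside an application cache. The following is a minimal in-memory sketch, not how any CDN implements it: `SwrCache`, `loader`, `ttl`, and `stale_window` are hypothetical names, and cold misses are loaded synchronously without coalescing.

```python
import threading
import time

class SwrCache:
    """In-memory cache that serves stale entries while one thread refreshes."""

    def __init__(self, loader, ttl, stale_window):
        self._loader = loader              # hypothetical origin read: key -> value
        self._ttl = ttl                    # seconds an entry counts as fresh
        self._stale_window = stale_window  # extra seconds it may be served stale
        self._entries = {}                 # key -> (value, stored_at)
        self._refreshing = set()           # keys with a refresh already running
        self._lock = threading.Lock()

    def _store(self, key):
        value = self._loader(key)          # origin call happens outside the lock
        with self._lock:
            self._entries[key] = (value, time.monotonic())
        return value

    def _refresh(self, key):
        try:
            self._store(key)
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key):
        now = time.monotonic()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, stored_at = entry
                age = now - stored_at
                if age < self._ttl:
                    return value           # fresh hit
                if age < self._ttl + self._stale_window:
                    if key not in self._refreshing:
                        # Only the first stale reader starts a refresh.
                        self._refreshing.add(key)
                        threading.Thread(target=self._refresh, args=(key,),
                                         daemon=True).start()
                    return value           # stale hit, returned without waiting
        # Cold miss, or older than the stale window: load synchronously.
        return self._store(key)
```

The `_refreshing` set is what keeps this from becoming a stampede of background refreshes: every stale reader returns immediately, but only one of them triggers the origin call.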
06
Used in problems
News feed jitters TTL across user feed caches. E-commerce uses request coalescing for product detail pages. Ticketmaster uses stale-while-revalidate for event availability (stale by 100ms is fine; stampede is not).
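The TTL jitter mentioned for feed caches is a one-line change at write time. The sketch below uses the 300 s base and ±20 s spread from the mitigation list; `jittered_ttl` and `busiest_second` are hypothetical helper names, and the second function only exists to show the effect on a batch of keys written together.

```python
import random
from collections import Counter

def jittered_ttl(base_ttl=300, spread=20, rng=random):
    """Return an integer TTL drawn uniformly from [base_ttl - spread, base_ttl + spread]."""
    return rng.randint(base_ttl - spread, base_ttl + spread)

def busiest_second(n_keys, ttl_fn):
    """For n_keys written at the same instant, count the keys expiring in the worst second."""
    expiries = Counter(ttl_fn() for _ in range(n_keys))
    return max(expiries.values())
```

With a fixed TTL, `busiest_second(10_000, lambda: 300)` puts all 10,000 expirations in the same second. With `jittered_ttl` they spread over 41 distinct seconds, so the worst second sees only a small fraction of the keys.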