Cache Stampede

01

Why this matters

Your homepage is served from Redis. TTL 5 minutes. At T = 0, the cache expires. At T = 0.001s, 10,000 concurrent requests all miss the cache and all query the origin database at the same time. The DB, designed for 100 RPS, now handles 10,000 RPS. It dies. Every request fails. The cache can't be filled because the DB is down. Outage cascade.

This is a cache stampede (a.k.a. thundering herd, dog-pile). It happens whenever a hot cache entry expires or is invalidated. Mitigation is not optional at scale.

02

Why the stampede happens

Normal cache-aside: check cache → miss → read DB → fill cache → return. When one request misses, 100ms later the cache is warm and every subsequent request hits it.

The stampede: at the moment of expiry, every in-flight request sees a miss simultaneously. They all read the DB (which is slow — that's why you cached). Each succeeds eventually and fills the cache, but the DB is briefly asked to serve full load. Repeat every time the hottest keys expire.

03

Five mitigations

1. Request coalescing (single-flight). First miss locks the key; other concurrent misses wait for the first to fill the cache and read the result. Only one DB call per miss window. Works great in a single process (Go's singleflight package). Across servers, needs distributed coordination — Redis Lua script or per-key lock.

2. Probabilistic early expiry. Before the TTL actually expires, some requests randomly decide "close enough, let me re-fetch." Spreads the refresh across many requests instead of a cliff. A few requests pay the cost; the cache is always warm.

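3. Jittered TTL. Instead of all keys expiring at exactly T+300s, set TTL to random(280, 320)s. Spreads expirations across a 40-second window, so keys written together no longer expire together. This removes the cliff across many keys; a single very hot key still needs one of the other mitigations.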

4. Background refresh. A cron job or dedicated worker refreshes hot keys before they expire. Users never see a miss on hot keys — they always return from cache. Good for predictable hot keys (homepage, celebrity profiles).

5. Serve stale while revalidating. When a request sees a slightly-expired entry, return the stale value immediately AND kick off a background refresh. No request waits. Most CDNs support this (stale-while-revalidate directive).
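Mitigation 5 can be sketched in-process as follows. This is an illustrative sketch under stated assumptions, not how any CDN implements it: `SwrCache`, `load_fn`, `ttl`, and `stale_window` are hypothetical names, and the store is a plain dict rather than a shared cache.

```python
import threading
import time

class SwrCache:
    """Serve stale entries inside a grace window while one thread refreshes."""

    def __init__(self, load_fn, ttl, stale_window):
        self._load = load_fn          # hypothetical origin loader: key -> value
        self._ttl = ttl               # seconds an entry counts as fresh
        self._stale = stale_window    # extra seconds a stale entry may be served
        self._data = {}               # key -> (value, fresh_until)
        self._refreshing = set()      # keys with a background refresh in flight
        self._lock = threading.Lock()

    def _refresh(self, key):
        try:
            value = self._load(key)
            with self._lock:
                self._data[key] = (value, time.monotonic() + self._ttl)
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key):
        now = time.monotonic()
        with self._lock:
            entry = self._data.get(key)
            if entry is not None:
                value, fresh_until = entry
                if now < fresh_until:
                    return value                      # fresh hit
                if now < fresh_until + self._stale:
                    if key not in self._refreshing:   # at most one refresh per key
                        self._refreshing.add(key)
                        threading.Thread(
                            target=self._refresh, args=(key,), daemon=True
                        ).start()
                    return value                      # stale hit, no waiting
        # Cold or too stale: load synchronously (not coalesced in this sketch).
        value = self._load(key)
        with self._lock:
            self._data[key] = (value, time.monotonic() + self._ttl)
        return value
```

The cold path still calls the origin once per concurrent caller, so in practice this pattern is usually paired with request coalescing for the very first fill.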

XFetch — probabilistic early recomputation

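import random, time, math

def xfetch(key, ttl, recompute_fn, beta=1.0):
    """Probabilistically recompute before expiry; prevents stampedes."""
    # cache_get / cache_set are assumed helpers. cache_get is assumed to
    # return (value, delta, expiry), or (None, 0, 0) on a miss.
    # delta = how long the last recompute took, in seconds.
    value, delta, expiry = cache_get(key)
    now = time.time()
    # 1 - random() lies in (0, 1], so log() never receives 0 and is <= 0;
    # subtracting it moves "now" forward by a random head start.
    if value is None or now - delta * beta * math.log(1.0 - random.random()) >= expiry:
        start = time.time()
        value = recompute_fn()
        delta = time.time() - start          # measure the recompute cost
        cache_set(key, value, delta, start + delta + ttl, ttl=ttl)
    return value

# Without XFetch: at TTL=T+0, all concurrent requests miss → thundering herd.
# With XFetch: probability of early recompute rises smoothly as expiry nears;
# an early request usually refreshes first, while the others keep serving the
# still-valid cached value.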

Figure: Stampede vs Mitigated, DB load at cache expiry. Naive: DB load spike of 10 000×. XFetch + jitter: DB load of ~5/sec. Jittered TTL window: ±20 s. XFetch aggressiveness default: β = 1.0.

04

Deep dive — probabilistic early expiry

Also called "XFetch" (Vattani, Chierichetti, Lowenstein, 2015). On every cache read, compute a probability that this request should refresh, based on how close to expiry the entry is.

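import math, random, time

BETA = 1.0  # aggressiveness; higher values refresh earlier

def get(key):
  # delta = seconds the last recompute took (assumed stored with the entry)
  value, fetch_time, ttl, delta = cache.get(key)
  now = time.time()
  remaining = fetch_time + ttl - now
  # Recompute early with small probability, growing as we near TTL.
  # 1 - random() is in (0, 1], so -log(...) is >= 0: a random head start.
  if -delta * BETA * math.log(1.0 - random.random()) >= remaining:
    start = time.time()
    value = compute_fresh()
    delta = time.time() - start
    cache.set(key, value, start + delta, ttl, delta)
  return value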

Near the start of the TTL the probability of refresh is close to 0; at expiry it reaches 1. The parameter beta (typically 1.0) controls aggressiveness: values above 1 refresh earlier, values below 1 refresh later. Because the random head start is scaled by how long the value takes to recompute, only a small fraction of requests pay the refresh cost, and slow-to-rebuild values start refreshing sooner. No cliff.
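To make the shape of that curve concrete: if the XFetch test is written as -delta * beta * ln(u) >= remaining, with u uniform on (0, 1] and delta the recompute time, then a single read triggers an early refresh with probability exp(-remaining / (beta * delta)). The sketch below only evaluates that expression; `early_refresh_probability` is a hypothetical helper name, and the 100 ms recompute time is borrowed from the cache-aside example earlier in the article.

```python
import math

def early_refresh_probability(remaining, delta, beta=1.0):
    """P(one read triggers a refresh) with `remaining` seconds left before expiry.

    delta is the time (seconds) the last recompute took; beta scales aggressiveness.
    """
    if remaining <= 0:
        return 1.0          # at or past expiry every read refreshes
    return math.exp(-remaining / (beta * delta))

if __name__ == "__main__":
    delta = 0.1             # assumed 100 ms recompute
    for remaining in (10.0, 1.0, 0.1, 0.01, 0.0):
        p = early_refresh_probability(remaining, delta)
        print(f"{remaining:>5.2f}s left -> p = {p:.6f}")
```

With a 100 ms recompute, early refreshes only become likely in the last few hundred milliseconds before expiry, which is why few requests pay the cost.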

Tiny code change that removes the expiry cliff for any reasonably hot key. Combined with jittered TTL and stale-while-revalidate, cache stampedes become very unlikely.

05

Real-world

Facebook memcached (2013 paper)

Leases

First miss gets a "lease" to fill the cache. Concurrent misses wait for the lease holder. Famous for eliminating stampedes at FB scale.

Cloudflare / Fastly

stale-while-revalidate

Serve the stale value to the user; refresh in the background. Zero user-visible cache miss on hot content. A standard HTTP Cache-Control extension (RFC 5861).

Go singleflight

In-process coalescing

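Package golang.org/x/sync/singleflight, maintained by the Go team outside the standard library. Concurrent calls for the same key collapse into one. A Go cache that fronts a slow origin should usually wrap origin calls in singleflight.Group.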

Spring @Cacheable

sync=true for coalescing

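JVM: with @Cacheable(sync = true), concurrent callers for the same key wait while one thread computes the value. The protection applies within a single JVM and only if the underlying cache provider supports it.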
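The in-process coalescing that singleflight and sync=true provide can be sketched in Python as follows. This is an illustration of the idea only, not a port of either library: `SingleFlight`, `_Call`, and `do` are hypothetical names, and the sketch covers one process, not a fleet of servers.

```python
import threading

class _Call:
    """State shared between the leader and the callers waiting on it."""
    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None

class SingleFlight:
    """Collapse concurrent calls for the same key into one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}                  # key -> in-flight _Call

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:                    # first caller for this key runs fn
                call = _Call()
                self._calls[key] = call
        if not leader:                    # everyone else waits for the leader
            call.done.wait()
            if call.error is not None:
                raise call.error
            return call.value
        try:
            call.value = fn()
        except Exception as exc:
            call.error = exc              # waiters see the same failure
            raise
        finally:
            with self._lock:
                del self._calls[key]      # later calls start a new flight
            call.done.set()
        return call.value
```

A cache-aside read would then wrap only the miss path, for example `group.do(key, lambda: load_from_db(key))`, where `load_from_db` is whatever origin call the application already has.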

06

Used in problems

News feed jitters TTL across user feed caches. E-commerce uses request coalescing for product detail pages. Ticketmaster uses stale-while-revalidate for event availability (stale by 100ms is fine; stampede is not).
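For the TTL jitter mentioned here, a minimal sketch might look like the following. The 300 s base and ±20 s spread are the figures used earlier in the article; `jittered_ttl` and `expiry_histogram` are hypothetical helper names.

```python
import random

def jittered_ttl(base_ttl=300.0, jitter=20.0, rng=random):
    """Return base_ttl shifted by a uniform random offset in [-jitter, +jitter]."""
    return base_ttl + rng.uniform(-jitter, jitter)

def expiry_histogram(n_keys, bucket_seconds=5, **kwargs):
    """Count how many of n_keys would expire in each bucket of the TTL window."""
    buckets = {}
    for _ in range(n_keys):
        bucket = int(jittered_ttl(**kwargs) // bucket_seconds) * bucket_seconds
        buckets[bucket] = buckets.get(bucket, 0) + 1
    return dict(sorted(buckets.items()))

if __name__ == "__main__":
    # With jitter=0 all keys land in one bucket; with jitter they spread out.
    print(expiry_histogram(10_000, jitter=0.0))
    print(expiry_histogram(10_000, jitter=20.0))
```

The value returned by `jittered_ttl` would be passed wherever the application currently passes a fixed TTL when writing a cache entry.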
