Concept · Caching

Cache Eviction

01

Why this matters

Your cache has 16 GB of RAM. Your working set grows to 20 GB. Something must go. Which entries you evict — and which you keep — is the difference between a 95% hit rate and a 50% hit rate. At scale, that gap is millions of dollars of backend load and seconds of page latency.

The policy that decides is the eviction algorithm. LRU dominates, but it's not always the right choice. Know the menu and when each one wins.

02

The five main policies

PolicyEvictsBest forWeakness
LRULeast recently usedTemporal locality — hot data stays hot for a whileOne-time scan (e.g., big export) wipes your cache
LFULeast frequently usedLong-tail popularity — top 10% served 90% of trafficNew items that haven't had time to accumulate hits get evicted instantly
FIFOOldest insertedSimple, low-overheadIgnores access patterns entirely; poor hit rate
ARCAdaptive — blends LRU and LFUMixed workloads, unknown access patternPatented for years (IBM); now available in some systems (ZFS)
TinyLFU / W-TinyLFUFrequency-based with scan protectionModern default — Caffeine, Redis 4+More complex to implement
03

LRU in one paragraph

Every access moves the entry to the head of a doubly-linked list. When space is needed, drop the tail. O(1) insert, access, and evict. The structure: hash map (key → list node) + doubly-linked list. Every major language has a standard implementation. Redis's maxmemory-policy allkeys-lru is exactly this.

LRU's Achilles' heel: one-time sequential scans. A nightly job that reads 10M rows in order will kick every useful entry out of the cache. Every morning, you wake up to a cold cache and a degraded first hour.

LRU cache — hashmap + doubly-linked list
class Node:
    __slots__ = "k", "v", "prev", "next"
    def __init__(self, k, v):
        self.k, self.v = k, v
        self.prev = self.next = None

class LRU:
    def __init__(self, cap):
        self.cap, self.map = cap, {}
        self.head, self.tail = Node(0, 0), Node(0, 0)  # sentinels
        self.head.next, self.tail.prev = self.tail, self.head

    def _remove(self, n):
        n.prev.next, n.next.prev = n.next, n.prev

    def _add(self, n):  # insert right after head
        n.next = self.head.next; n.prev = self.head
        self.head.next.prev = n; self.head.next = n

    def get(self, k):
        if k not in self.map: return -1
        n = self.map[k]; self._remove(n); self._add(n)
        return n.v

    def put(self, k, v):
        if k in self.map: self._remove(self.map[k])
        elif len(self.map) == self.cap:
            lru = self.tail.prev; self._remove(lru); del self.map[lru.k]
        n = Node(k, v); self._add(n); self.map[k] = n
04

Deep dive — why TinyLFU beats LRU in modern systems

LRU treats one-time and popular items the same: if you saw it recently, it stays. TinyLFU (2015) separates the two by using a frequency sketch (Count-Min Sketch) to estimate how often each key has been seen overall.

Admission policy: when a new item arrives and you need to evict something to fit it, compare the new item's global frequency to the victim's. Only admit if the new item is more popular. A one-time scan adds items with frequency 1 — they fail the admission check, so the scan doesn't evict hot items.

Hit rate gains are real: Caffeine (Java) reports 20–40% higher hit rates than LRU on production traces. Redis adopted the approach in its LFU mode from 4.0 onward (allkeys-lfu). In 2025, the defensible default for a new cache is W-TinyLFU (Caffeine) or LFU with decay (Redis), not plain LRU.

Rule of thumb

If your workload has stable "hot items" (product catalog, celebrity accounts, popular searches), switch to LFU. If you primarily see temporal locality without a long-tail popularity curve, LRU is fine.

05

TTL as a form of eviction

Besides memory-pressure eviction, most caches evict on time-to-live (TTL). An entry set with SET key value EX 300 lives 5 minutes regardless of access frequency.

TTL is your tool for bounded staleness — you're saying "I accept at most this much out-of-date data." Lower TTL = fresher data, lower hit rate. Higher TTL = stale data risk, higher hit rate. See cache strategies for how TTL interacts with cache-aside vs write-through.

06

Real-world

Redis

Configurable policy

allkeys-lru, allkeys-lfu, volatile-lru (only entries with TTL), noeviction (refuse writes when full). LFU recommended for most workloads.

Memcached

LRU only

Simple, fast, no frills. Segmented LRU with 4 queues to approximate LFU behavior.

Caffeine (Java)

W-TinyLFU default

The default in Spring. Industry-leading hit rates. Adaptive window sizing.

OS page cache

Variant of LRU (clock algorithm)

Linux page cache uses a clock-based approximation of LRU — cheaper than exact LRU. All file I/O benefits from it.

07

Used in problems

URL shortener uses LRU for redirect cache (temporal locality). News feed uses LFU for feed cache (hot-user skew). Typeahead uses LFU for prefix suggestions (popular queries dominate).

Next up