Cache Eviction

01

Why this matters

Your cache has 16 GB of RAM. Your working set grows to 20 GB. Something must go. Which entries you evict — and which you keep — is the difference between a 95% hit rate and a 50% hit rate. At scale, that gap is millions of dollars of backend load and seconds of page latency.

The policy that decides is the eviction algorithm. LRU dominates, but it's not always the right choice. Know the menu and when each one wins.

02

The five main policies

Policy	Evicts	Best for	Weakness
LRU	Least recently used	Temporal locality — hot data stays hot for a while	One-time scan (e.g., big export) wipes your cache
LFU	Least frequently used	Long-tail popularity — top 10% served 90% of traffic	New items that haven't had time to accumulate hits get evicted instantly
FIFO	Oldest inserted	Simple, low-overhead	Ignores access patterns entirely; poor hit rate
ARC	Adaptive — blends LRU and LFU	Mixed workloads, unknown access pattern	Patented for years (IBM); now available in some systems (ZFS)
TinyLFU / W-TinyLFU	Frequency-based with scan protection	Modern default — Caffeine, Redis 4+	More complex to implement

03

LRU in one paragraph

Every access moves the entry to the head of a doubly-linked list. When space is needed, drop the tail. O(1) insert, access, and evict. The structure: hash map (key → list node) + doubly-linked list. Every major language has a standard implementation. Redis's maxmemory-policy allkeys-lru is exactly this.

LRU's Achilles' heel: one-time sequential scans. A nightly job that reads 10M rows in order will kick every useful entry out of the cache. Every morning, you wake up to a cold cache and a degraded first hour.

LRU cache — hashmap + doubly-linked list

class Node:
    __slots__ = "k", "v", "prev", "next"
    def __init__(self, k, v):
        self.k, self.v = k, v
        self.prev = self.next = None

class LRU:
    def __init__(self, cap):
        self.cap, self.map = cap, {}
        self.head, self.tail = Node(0, 0), Node(0, 0)  # sentinels
        self.head.next, self.tail.prev = self.tail, self.head

    def _remove(self, n):
        n.prev.next, n.next.prev = n.next, n.prev

    def _add(self, n):  # insert right after head
        n.next = self.head.next; n.prev = self.head
        self.head.next.prev = n; self.head.next = n

    def get(self, k):
        if k not in self.map: return -1
        n = self.map[k]; self._remove(n); self._add(n)
        return n.v

    def put(self, k, v):
        if k in self.map: self._remove(self.map[k])
        elif len(self.map) == self.cap:
            lru = self.tail.prev; self._remove(lru); del self.map[lru.k]
        n = Node(k, v); self._add(n); self.map[k] = n

04

Deep dive — why TinyLFU beats LRU in modern systems

LRU treats one-time and popular items the same: if you saw it recently, it stays. TinyLFU (2015) separates the two by using a frequency sketch (Count-Min Sketch) to estimate how often each key has been seen overall.

Admission policy: when a new item arrives and you need to evict something to fit it, compare the new item's global frequency to the victim's. Only admit if the new item is more popular. A one-time scan adds items with frequency 1 — they fail the admission check, so the scan doesn't evict hot items.

Hit rate gains are real: Caffeine (Java) reports 20–40% higher hit rates than LRU on production traces. Redis adopted the approach in its LFU mode from 4.0 onward (allkeys-lfu). In 2025, the defensible default for a new cache is W-TinyLFU (Caffeine) or LFU with decay (Redis), not plain LRU.

Rule of thumb

If your workload has stable "hot items" (product catalog, celebrity accounts, popular searches), switch to LFU. If you primarily see temporal locality without a long-tail popularity curve, LRU is fine.

05

TTL as a form of eviction

Besides memory-pressure eviction, most caches evict on time-to-live (TTL). An entry set with SET key value EX 300 lives 5 minutes regardless of access frequency.

TTL is your tool for bounded staleness — you're saying "I accept at most this much out-of-date data." Lower TTL = fresher data, lower hit rate. Higher TTL = stale data risk, higher hit rate. See cache strategies for how TTL interacts with cache-aside vs write-through.

06

Real-world

Redis

Configurable policy

allkeys-lru, allkeys-lfu, volatile-lru (only entries with TTL), noeviction (refuse writes when full). LFU recommended for most workloads.

Memcached

LRU only

Simple, fast, no frills. Segmented LRU with 4 queues to approximate LFU behavior.

Caffeine (Java)

W-TinyLFU default

The default in Spring. Industry-leading hit rates. Adaptive window sizing.

OS page cache

Variant of LRU (clock algorithm)

Linux page cache uses a clock-based approximation of LRU — cheaper than exact LRU. All file I/O benefits from it.

07

Used in problems

URL shortener uses LRU for redirect cache (temporal locality). News feed uses LFU for feed cache (hot-user skew). Typeahead uses LFU for prefix suggestions (popular queries dominate).

📺

References & Videos

Caching — System Design Interview Concept

Gaurav Sen · 10 min

Cache Eviction Policies Explained

ByteByteGo · 12 min

Redis Eviction Policies Deep Dive

Hussein Nasser · 35 min

Redis Eviction Policies

Redis Docs

Cache Eviction Policies

GeeksforGeeks

Why this matters

The five main policies

LRU in one paragraph

Deep dive — why TinyLFU beats LRU in modern systems

TTL as a form of eviction

Real-world

Configurable policy

LRU only

W-TinyLFU default

Variant of LRU (clock algorithm)

Used in problems

References & Videos

Cache Strategies

Cache Stampede

URL Shortener