Problem Statement
On a single machine, locking is solved — your language gives you mutexes backed by shared memory and atomic CPU instructions. In a distributed system, there's no shared memory. Process A on Machine 1 and Process B on Machine 2 can only coordinate via the network, which is unreliable. The lock state must live somewhere both can access — a separate coordination service.
The central insight: A distributed lock, by itself, does NOT guarantee mutual exclusion. If a lock holder pauses (GC) and the lock expires (TTL), another holder acquires it. Two holders operate simultaneously. True mutual exclusion requires a lock service plus fencing tokens at the protected resource.
Three Categories of Locking Need
Tier 1 — Efficiency
Prevent duplicate work (cache stampede, job dedup). Violation = wasted work. Tolerable. Use Redis SET NX.
Tier 2 — Correctness
Protect shared state (wallet balance, leader election). Violation = data corruption, split brain. Catastrophic. Use consensus + fencing.
The most common production mistake: using a Tier 1 lock for a Tier 2 problem. Fast, simple, almost always correct — until the one time it isn't, and $50,000 vanishes from a user's account.
The Naive Lock & Its Four Failures
A database row lock (INSERT INTO locks ON CONFLICT DO NOTHING) fails in four ways: holder crashes → lock stuck forever (no TTL), TTL expires while holder works (two holders simultaneously), network partition (lock unreachable), and clock skew (TTL unreliable). Every component in our distributed design solves one of these.