Distributed Locking

01

Why this matters

Two instances of your service want to send a reminder email for order #42. Both run the cron job at the same second. Without coordination, the user gets two emails. Or two payment workers process the same refund. Or two job schedulers pick the same task.

In a single process, a mutex solves this — across machines, mutexes don't exist. Distributed locking is the class of techniques that emulates mutual exclusion across a cluster. Getting it wrong produces duplicate work at best, silent data corruption at worst.

02

Why it's hard

A distributed lock is just a shared register: "whoever writes their ID first owns the resource." But three failure modes destroy naive implementations:

Lock holder pauses. Process A acquires the lock, then stalls in a GC pause that outlasts the lock's TTL. Lock expires. Process B acquires it, does work. A wakes up and still thinks it holds the lock. Both do the work.
Network partition. A holds the lock. Network to the lock store is cut. A can't renew. Lock server expires the lease. B acquires. Partition heals and A writes anyway, still believing it's the holder.
Lock server fails over. A holds lock on Redis master. Master dies before replicating to replica. Replica promoted. B acquires same lock on new master. Two holders.

These are real. A correct distributed lock must survive all three.

03

The Redis SET NX pattern (the one you'll use)

The simplest working lock, from Redis docs:

Acquire: SET lock:orders:42 <unique-token> NX PX 30000 — sets the key only if absent, with 30s expiry. Returns OK if you won, nil if someone else holds.
Release: Lua script: if GET lock:orders:42 == unique-token then DEL. Atomic — you can only release a lock you own.
Renewal: If your work takes longer than 30s, periodically extend the TTL via Lua: if GET == token then PEXPIRE 30000.

This covers most use cases. The gap: Redis replication is asynchronous, so if the master fails over before replicating your SET, the promoted replica doesn't know the lock is held, and another process can acquire it. Where correctness is critical, you need a stronger option.
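The acquire, release, and renewal steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: it assumes `client` is a redis-py `Redis` instance (only its `set` and `eval` methods are used), and the function names `acquire`, `release`, and `renew` are hypothetical helpers introduced for illustration.

```python
import uuid

# Lua: delete the key only if it still holds our token (atomic check-and-delete).
RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
end
return 0
"""

# Lua: reset the TTL only if the key still holds our token.
RENEW_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("PEXPIRE", KEYS[1], ARGV[2])
end
return 0
"""


def acquire(client, key, ttl_ms=30_000):
    """Return a fresh unique token if the lock was won, else None."""
    token = uuid.uuid4().hex
    # Equivalent to: SET key token NX PX ttl_ms (succeeds only if key is absent).
    if client.set(key, token, nx=True, px=ttl_ms):
        return token
    return None


def release(client, key, token):
    """Delete the lock only if we still own it. Returns True if released."""
    return client.eval(RELEASE_SCRIPT, 1, key, token) == 1


def renew(client, key, token, ttl_ms=30_000):
    """Extend the TTL only if we still own the lock. Returns True if extended."""
    return client.eval(RENEW_SCRIPT, 1, key, token, ttl_ms) == 1
```

A worker would call `acquire(client, "lock:orders:42")`, do its work while calling `renew` periodically, and finish with `release`. If `renew` ever returns False, the lock has been lost and the worker should stop.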

Optimistic concurrency (version column)

-- Version column incremented on every write.
CREATE TABLE orders (
  id      BIGSERIAL PRIMARY KEY,
  status  TEXT NOT NULL,
  version INT NOT NULL DEFAULT 1,
  ...
);

-- Update with CAS semantics:
UPDATE orders
SET status = 'shipped', version = version + 1
WHERE id = $1 AND version = $2;   -- $2 = version client fetched

-- If 0 rows updated: conflict; client re-reads + retries.
-- No locks held during user think-time. Wins on low-contention paths;
-- retry-storms on hot rows (use pessimistic there).
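The re-read-and-retry loop described in the comments above might look like this from the client side. It is a sketch under stated assumptions: SQLite (via the standard-library `sqlite3` module) stands in for the PostgreSQL-style schema so the example runs anywhere, the table is trimmed to three columns, and `ship_order` and `max_retries` are hypothetical names.

```python
import sqlite3


def ship_order(conn, order_id, max_retries=3):
    """Mark an order shipped using compare-and-set on the version column."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT version FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is None:
            return False  # no such order
        cur = conn.execute(
            "UPDATE orders SET status = 'shipped', version = version + 1 "
            "WHERE id = ? AND version = ?",
            (order_id, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # the version we read was still current
        # rowcount == 0: another writer bumped the version; re-read and retry.
    return False


# In-memory database with a trimmed-down orders table (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
    "version INTEGER NOT NULL DEFAULT 1)"
)
conn.execute("INSERT INTO orders (id, status) VALUES (42, 'paid')")
conn.commit()
shipped = ship_order(conn, 42)
```

Bounding the retries keeps a hot row from turning into an unbounded retry loop. The caller decides what to do when `ship_order` returns False.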

Redis SET NX Lock

sequenceDiagram
    participant A as Worker A
    participant B as Worker B
    participant R as Redis
    A->>R: SET lock:job NX PX 30000 token=a1
    R-->>A: OK (won)
    B->>R: SET lock:job NX PX 30000 token=b1
    R-->>B: nil (A holds)
    A->>A: do work ...
    A->>R: Lua: if GET == a1 then DEL
    R-->>A: released
    B->>R: SET lock:job NX PX 30000 token=b2
    R-->>B: OK (won)

04

Options in order of strength

Strategy	Correctness	Complexity	Use when
Redis SET NX	Best-effort	Low	Deduping work, not correctness-critical (email sends, cron)
Redlock (multi-Redis)	Stronger but controversial (Kleppmann vs antirez debate)	Medium	Redis-only shops who want more safety
Consensus-backed (etcd / ZooKeeper)	Strong	Medium	When correctness matters, or you already run these for coordination
Fencing tokens	Strong	High (needs resource-side check)	Correctness-critical: payments, inventory decrements

05

Deep dive — fencing tokens

A lock alone can't save you from the GC-pause scenario: A holds lock → pauses → lock expires → B acquires → A unpauses and writes. B's state gets overwritten.

Fencing token: every time the lock is acquired, the lock server returns a monotonically-increasing token. A gets token 33. Pauses. B acquires, gets token 34. B writes to the resource, tagging the write with "token 34." The resource (database) remembers the highest token it's seen. When A finally writes with token 33, the resource says "I've already seen 34, rejecting."

This requires the resource to participate — it must know about tokens and reject stale ones. That's the catch. Not every resource can do this (e.g., third-party APIs). For critical invariants (payments, inventory), make your own backend enforce the token check.
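The token check on the resource side can be sketched in a few lines of Python. Everything here is hypothetical and in-memory: `LockServer` only models the increasing counter (not leases or expiry), and `FencedResource` stands in for a database that stores the highest token it has accepted.

```python
import itertools


class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already accepted."""


class LockServer:
    """Hypothetical lock server: issues a strictly increasing token per acquire."""

    def __init__(self, start=33):
        self._counter = itertools.count(start)

    def acquire(self):
        return next(self._counter)


class FencedResource:
    """Hypothetical resource that remembers the highest token it has accepted."""

    def __init__(self):
        self.max_token = 0
        self.value = None

    def write(self, value, token):
        if token < self.max_token:
            raise StaleTokenError(f"token {token} < max_token {self.max_token}")
        self.max_token = token
        self.value = value


locks = LockServer()
db = FencedResource()
token_a = locks.acquire()   # A gets 33, then pauses past the TTL
token_b = locks.acquire()   # B gets 34 once A's lease has expired
db.write("x", token_b)      # accepted; max_token becomes 34
try:
    db.write("y", token_a)  # A wakes up and writes with its stale token
    rejected = False
except StaleTokenError:
    rejected = True
```

In a real database the comparison and the write would need to happen atomically, for example in a single conditional UPDATE, so that two writers cannot both pass the check.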

Design pattern

"Redis SET NX for coordination (dedup work, avoid duplicate emails) + fencing tokens in the database for anything that must not double-execute (decrement inventory, charge card)."

GC Pause + Fencing Token

sequenceDiagram
    participant A as Worker A
    participant B as Worker B
    participant L as Lock Server
    participant DB as Resource (DB)
    A->>L: acquire(orders)
    L-->>A: granted, token=33
    Note over A: GC pause · 15s
    Note over L: TTL expires · 30s
    B->>L: acquire(orders)
    L-->>B: granted, token=34
    B->>DB: write x, token=34
    DB->>DB: max_token = 34
    DB-->>B: ok
    Note over A: GC done · A still thinks it holds the lock
    A->>DB: write y, token=33
    DB->>DB: 33 < max_token (34) · REJECT
    DB-->>A: stale-token error

30 s: typical Redis SET NX TTL
10 s: renewal interval (TTL/3)
~5 s: JVM long-GC pause we must survive
1: monotonically-increasing token per acquire

06

Real-world

Redis + Redlock

Most common

Most web apps use the basic SET NX pattern. Redlock (acquiring the lock on a majority of N independent Redis instances) is used when a team wants the lock to survive the failure of a single Redis node.

etcd / ZooKeeper locks

Strong, battle-tested

Kubernetes leader election (backed by etcd) and Kafka broker coordination (ZooKeeper in pre-KRaft deployments) are built on these. They are consensus-backed, so a held lock survives failover of the lock store.

DB row lock

Simpler than distributed locks

SELECT ... FOR UPDATE in one DB. Works for single-DB coordination. Limited to DB throughput; doesn't help across services.

Idempotency keys

Better alternative for API deduplication

The client sends an Idempotency-Key header and the server stores the first response under that key. Retries return the cached response, so no lock is needed. Stripe's API popularized this.
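A rough Python sketch of the idempotency-key idea follows. The names `IdempotentHandler` and `charge_card` are hypothetical, and the dictionary is an in-memory stand-in for whatever durable store a real server would use.

```python
class IdempotentHandler:
    """Caches the first response per idempotency key and replays it on retries."""

    def __init__(self, handler):
        self._handler = handler
        self._responses = {}  # in-memory stand-in for a durable store

    def handle(self, idempotency_key, request):
        if idempotency_key in self._responses:
            # Retry: replay the stored response without re-running the handler.
            return self._responses[idempotency_key]
        response = self._handler(request)
        self._responses[idempotency_key] = response
        return response


charges = []  # records every time the side effect actually runs


def charge_card(request):
    charges.append(request["amount"])
    return {"status": "charged", "charge_no": len(charges)}


api = IdempotentHandler(charge_card)
first = api.handle("key-123", {"amount": 500})
retry = api.handle("key-123", {"amount": 500})  # same key: no second charge
```

This sketch is single-threaded. With concurrent requests carrying the same key, the check and the store would have to be one atomic step, for instance an insert guarded by a unique constraint on the key.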

07

Used in problems

Ticketmaster uses distributed locks to prevent double-booking. Payment gateway uses fencing tokens for idempotent charge processing. Distributed job scheduler uses etcd locks for exclusive leadership. Bidding platform locks the auction record during settlement.

References & Videos

Distributed Locking with Redis (Redlock)

Hussein Nasser · 18 min

Distributed Locks — System Design

Arpit Bhayani · 22 min

How to do distributed locking

Martin Kleppmann

Distributed Locks with Redis

Redis Docs
