Two instances of your service want to send a reminder email for order #42. Both run the cron job at the same second. Without coordination, the user gets two emails. Or two payment workers process the same refund. Or two job schedulers pick the same task.
In a single process, a mutex solves this — across machines, mutexes don't exist. Distributed locking is the class of techniques that emulates mutual exclusion across a cluster. Getting it wrong produces duplicate work at best, silent data corruption at worst.
02
Why it's hard
A distributed lock is just a shared register: "whoever writes their ID first owns the resource." But three failure modes destroy naive implementations:
Lock holder pauses. Process A acquires the lock, then stalls in a 5-second GC pause just as its lease is about to run out. The lock expires. Process B acquires it and does the work. A wakes up still believing it holds the lock, so both do the work.
Network partition. A holds the lock. The network path to the lock store is cut, so A can't renew. The lock server expires the lease and B acquires. When the partition heals, A writes under the stale belief that it is still the holder.
Lock server fails over. A holds lock on Redis master. Master dies before replicating to replica. Replica promoted. B acquires same lock on new master. Two holders.
These are real. A correct distributed lock must survive all three.
03
The Redis SET NX pattern (the one you'll use)
The simplest working lock, from the Redis docs:
Acquire: SET lock:orders:42 <unique-token> NX PX 30000. This sets the key only if it is absent, with a 30 s expiry. It returns OK if you won and nil if someone else holds the lock.
Release: Lua script: if GET lock:orders:42 == unique-token then DEL. Atomic — you can only release a lock you own.
Renewal: If your work takes longer than 30s, periodically extend the TTL via Lua: if GET == token then PEXPIRE 30000.
This covers most use cases. The gap: if Redis fails over before replicating your SET, the promoted replica doesn't know the lock is held, and another process can acquire it. When correctness is critical, you need the next level.
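The acquire, release, and renew steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the helper names (`acquire`, `release`, `renew`) are hypothetical, and `client` is assumed to expose redis-py-style `set(name, value, nx=..., px=...)` and `eval(script, numkeys, *keys_and_args)` methods. `FakeRedis` is a hypothetical in-memory stand-in, included only so the sketch runs without a server; it does not interpret Lua.

```python
import secrets
import time

# Compare-and-delete: only the holder's token may remove the key.
RELEASE_LUA = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
end
return 0
"""

# Compare-and-extend: only the holder's token may push the expiry out.
RENEW_LUA = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('PEXPIRE', KEYS[1], ARGV[2])
end
return 0
"""


def acquire(client, key, ttl_ms=30_000):
    """Return a fresh unique token if the lock was won, else None."""
    token = secrets.token_hex(16)
    if client.set(key, token, nx=True, px=ttl_ms):
        return token
    return None


def release(client, key, token):
    """Delete the key only if it still holds our token."""
    return client.eval(RELEASE_LUA, 1, key, token) == 1


def renew(client, key, token, ttl_ms=30_000):
    """Extend the TTL only if we still hold the lock."""
    return client.eval(RENEW_LUA, 1, key, token, ttl_ms) == 1


class FakeRedis:
    """Hypothetical in-memory stand-in that mimics the semantics used above."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expiry in clock seconds)
        self._clock = clock

    def _live(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]  # lazily expire, like a TTL running out
            entry = None
        return entry

    def set(self, name, value, nx=False, px=None):
        if nx and self._live(name) is not None:
            return None
        expiry = float("inf") if px is None else self._clock() + px / 1000
        self._data[name] = (value, expiry)
        return True

    def eval(self, script, numkeys, key, token, ttl_ms=None):
        entry = self._live(key)
        if entry is None or entry[0] != token:
            return 0
        if script == RELEASE_LUA:
            del self._data[key]
        else:
            self._data[key] = (token, self._clock() + int(ttl_ms) / 1000)
        return 1
```

With a real server, the same three helpers would be called with a redis-py client in place of `FakeRedis`; the token comparison then happens inside Redis, which is what makes release and renew atomic.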
Optimistic concurrency (version column)
-- Version column incremented on every write.
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
status TEXT NOT NULL,
version INT NOT NULL DEFAULT 1,
...
);
-- Update with CAS semantics:
UPDATE orders
SET status = 'shipped', version = version + 1
WHERE id = $1 AND version = $2; -- $2 = version client fetched
-- If 0 rows updated: conflict; client re-reads + retries.
-- No locks held during user think-time. Wins on low-contention paths;
-- retry-storms on hot rows (use pessimistic there).
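The compare-and-swap update and the re-read-and-retry loop described in the comments above can be exercised end to end. The sketch below is an illustration under assumptions: it uses Python's built-in `sqlite3` with an in-memory database instead of the PostgreSQL-style schema above, and the helper names (`read_order`, `cas_update_status`, `update_with_retry`) are hypothetical.

```python
import sqlite3


def read_order(conn, order_id):
    """Return (status, version) for the order, or None if it does not exist."""
    return conn.execute(
        "SELECT status, version FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


def cas_update_status(conn, order_id, new_status, expected_version):
    """Apply the update only if the row still has the version we read."""
    cur = conn.execute(
        "UPDATE orders SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, order_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means someone else won


def update_with_retry(conn, order_id, new_status, max_attempts=3):
    """Re-read and retry on conflict, giving up after max_attempts."""
    for _ in range(max_attempts):
        row = read_order(conn, order_id)
        if row is None:
            raise LookupError(f"order {order_id} not found")
        if cas_update_status(conn, order_id, new_status, row[1]):
            return True
    return False


# Demo table: a trimmed-down version of the orders schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "status TEXT NOT NULL, "
    "version INTEGER NOT NULL DEFAULT 1)"
)
conn.execute("INSERT INTO orders (id, status) VALUES (42, 'paid')")
conn.commit()
```

Two clients that both read version 1 can then race: the first `cas_update_status` call succeeds and bumps the version, and the second call with the same stale version updates zero rows and must re-read.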
Redis SET NX Lock
sequenceDiagram
participant A as Worker A
participant B as Worker B
participant R as Redis
A->>R: SET lock:job NX PX 30000 token=a1
R-->>A: OK (won)
B->>R: SET lock:job NX PX 30000 token=b1
R-->>B: nil (A holds)
A->>A: do work ...
A->>R: Lua: if GET == a1 then DEL
R-->>A: released
B->>R: SET lock:job NX PX 30000 token=b2
R-->>B: OK (won)
04
Options in order of strength
| Strategy | Correctness | Complexity | Use when |
| --- | --- | --- | --- |
| Redis SET NX | Best-effort | Low | Deduping work, not correctness-critical (email sends, cron) |
| Redlock (multi-Redis) | Stronger but controversial (Kleppmann vs antirez debate) | Medium | Redis-only shops who want more safety |
| Consensus-backed (etcd / ZooKeeper) | Strong | Medium | When correctness matters; running these anyway for coordination |
A lock alone can't save you from the GC-pause scenario: A holds lock → pauses → lock expires → B acquires → A unpauses and writes. B's state gets overwritten.
Fencing token: every time the lock is acquired, the lock server returns a monotonically-increasing token. A gets token 33. Pauses. B acquires, gets token 34. B writes to the resource, tagging the write with "token 34." The resource (database) remembers the highest token it's seen. When A finally writes with token 33, the resource says "I've already seen 34, rejecting."
This requires the resource to participate — it must know about tokens and reject stale ones. That's the catch. Not every resource can do this (e.g., third-party APIs). For critical invariants (payments, inventory), make your own backend enforce the token check.
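The token check can be sketched as below. This is a single-process model under stated assumptions, not a real lock service: `LockServer`, `FencedResource`, and `StaleTokenError` are hypothetical names, lease expiry is left out, and the only things modeled are the increasing token and the resource-side rejection.

```python
import threading


class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already seen."""


class LockServer:
    """Hypothetical lock server: issues a strictly increasing token per grant."""

    def __init__(self, first_token=1):
        self._next = first_token
        self._mutex = threading.Lock()

    def acquire(self):
        # Lease expiry and ownership tracking are omitted in this sketch;
        # only the monotonically increasing token is modeled.
        with self._mutex:
            token = self._next
            self._next += 1
            return token


class FencedResource:
    """Hypothetical resource that remembers the highest token it has seen."""

    def __init__(self):
        self.value = None
        self.max_token = 0
        self._mutex = threading.Lock()

    def write(self, value, token):
        with self._mutex:
            # Equal tokens are allowed so the current holder can write repeatedly.
            if token < self.max_token:
                raise StaleTokenError(
                    f"token {token} < max_token {self.max_token}"
                )
            self.max_token = token
            self.value = value
```

In a database-backed resource, the same comparison would typically live in the write itself (for example, a conditional update on a stored token column), so the check and the write cannot be separated.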
Design pattern
"Redis SET NX for coordination (dedup work, avoid duplicate emails) + fencing tokens in the database for anything that must not double-execute (decrement inventory, charge card)."
GC Pause + Fencing Token
sequenceDiagram
participant A as Worker A
participant B as Worker B
participant L as Lock Server
participant DB as Resource (DB)
A->>L: acquire(orders)
L-->>A: granted, token=33
Note over A: GC pause · 15s
Note over L: TTL expires · 30s
B->>L: acquire(orders)
L-->>B: granted, token=34
B->>DB: write x, token=34
DB->>DB: max_token = 34
DB-->>B: ok
Note over A: GC done · A still thinks it holds the lock
A->>DB: write y, token=33
DB->>DB: 33 < max_token (34) · REJECT
DB-->>A: stale-token error
30 s: typical Redis SET NX TTL
10 s: renewal interval (TTL/3)
~5 s: JVM long-GC pause we must survive
1: monotonically increasing token per acquire
06
Real-world
Redis + Redlock
Most common
Most web apps use the basic SET NX pattern. Redlock (acquiring the lock across N independent Redis instances) is used when a team wants the lock to survive the loss of a single Redis node.
etcd / ZooKeeper locks
Strong, battle-tested
Kubernetes leader election (on etcd) and classic Kafka broker coordination (on ZooKeeper) are built on these stores. Both are consensus-backed, so a lock survives failover.
DB row lock
Simpler than distributed locks
SELECT ... FOR UPDATE in one DB. Works for single-DB coordination. Limited to DB throughput; doesn't help across services.
Idempotency keys
Better alternative for API deduplication
The client sends an Idempotency-Key header; the server stores the first response under that key. Retries return the cached response. No locks needed. Stripe popularized this pattern.
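The idempotency-key flow can be sketched as a thin wrapper around a request handler. This is a single-process illustration under assumptions: `IdempotentEndpoint` and `charge` are hypothetical names, the response cache is a plain dict, and two simultaneous first requests with the same key are out of scope (a shared store with a unique constraint on the key would be one way to cover that case).

```python
class IdempotentEndpoint:
    """Hypothetical wrapper: run the handler once per key, replay afterwards."""

    def __init__(self, handler):
        self._handler = handler
        self._responses = {}  # idempotency key -> first response

    def handle(self, idempotency_key, request):
        # A retry with a known key returns the stored response
        # without running the handler again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        response = self._handler(request)
        self._responses[idempotency_key] = response
        return response


# Demo handler with a visible side effect, so double execution would show up.
charges = []


def charge(request):
    charges.append(request["amount"])
    return {"status": "charged", "charge_number": len(charges)}


endpoint = IdempotentEndpoint(charge)
first = endpoint.handle("key-abc", {"amount": 500})
retry = endpoint.handle("key-abc", {"amount": 500})  # replayed, not re-charged
```

Here `retry` is the same stored response as `first`, and `charges` contains a single entry, because the handler ran only for the first request.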
07
Used in problems
Ticketmaster uses distributed locks to prevent double-booking. Payment gateway uses fencing tokens for idempotent charge processing. Distributed job scheduler uses etcd locks for exclusive leadership. Bidding platform locks the auction record during settlement.