Replication

01

Why this matters

One database server holds your data. The disk dies. You lose everything. Replication — keeping copies on multiple servers — solves three problems at once: durability (survive hardware failures), availability (serve reads when one replica is down), and read scale (spread reads across replicas).

It introduces two hard problems: when is a write "done" (replica 1? all replicas?) and what happens when replicas disagree. Every DB answers these differently.

02

The three topologies

Leader-follower (primary-replica)

One writer, N readers

All writes go to one leader. The leader ships the change log to N followers. Reads can go anywhere. Simple, well-understood, works for 90% of systems. Postgres, MySQL, MongoDB, Redis default to this.

Multi-leader

Every node accepts writes

Multiple leaders, each in a different datacenter. Writes are eventually reconciled across leaders. Used for multi-region active-active. Conflict resolution is the hard part — last-write-wins, CRDTs, or application-level merging.

Leaderless (Dynamo-style)

No designated leader

Writes and reads go to any N nodes. Quorum rules (W + R > N) ensure consistency. Cassandra, DynamoDB. Maximum availability, weakest consistency.

03

Sync vs async — the latency/durability dial

A write arrives at the leader. The leader can:

Synchronous replication: wait for at least one replica to confirm, then return success. Durable against leader failure. Costs — every write latency = leader + network + replica. If a replica is slow or down, writes stall.
Asynchronous replication: return success immediately after writing locally. Ship to replicas in the background. Fast. Costs — if the leader crashes before shipping, data is lost. Replicas lag (replica lag).
Semi-sync: wait for ≥1 replica to acknowledge receipt (not commit). Middle ground. MySQL's default.

For money: always sync. For social media: async is usually fine (a second of replica lag is invisible; an outage from a slow replica is catastrophic).

+1–5 ms

sync replication cost (in-region)

+70–150 ms

sync replication cost (cross-region)

ms–sec

typical async replica lag

hours

worst-case lag during replica catch-up

04

Deep dive — the three eternal replication bugs

1. Read-your-own-writes. User posts a photo. Write goes to leader. User immediately refreshes — read goes to a lagging replica that doesn't have the photo yet. User panics, re-uploads. Two photos now exist.

Mitigation: route writes and the user's subsequent reads to the same replica for some window (session stickiness). Or force reads-after-writes to hit the leader for a few seconds.

2. Monotonic reads. User refreshes the comment thread: reads from replica A, sees 10 comments. Refreshes again: reads from replica B (more lagged), sees 8. The count went backward.

Mitigation: route each user to a consistent replica (hash by user ID). Or use consistent-read APIs.

3. Failover split-brain. Leader briefly unreachable. Ops promotes a follower to leader. Original leader comes back, still thinks it's leader. Two leaders accept conflicting writes.

Mitigation: fencing tokens (every write includes a monotonically-increasing token; stale leader's writes get rejected). Consensus (Raft/Paxos) picks the unambiguous leader. Never rely on "if I can't reach the other node, I'm the leader."

05

Real-world

Postgres streaming replication

Async leader-follower, tunable to sync

Ships the WAL to followers. Set synchronous_commit=on + synchronous_standby_names for sync. Standard pattern for most apps.

MySQL group replication

Multi-primary optional

Plugin that lets any node accept writes. Uses Paxos for conflict resolution. Tricky to run; most shops stick with leader-follower.

Cassandra

Leaderless

W + R > N quorum. Writes to any N replicas; reads from any R. Tunable per query.

DynamoDB global tables

Multi-leader across regions

Each region is a leader for local writes. Cross-region conflicts resolved last-write-wins. Under 1 second typical convergence.

06

Used in problems

URL shortener uses leader-follower Postgres. Payment gateway uses synchronous replication for the ledger. News feed uses Cassandra's leaderless model for the feed store. WhatsApp uses cross-region multi-primary for message delivery.

📺