04
Deep dive — the three eternal replication bugs
1. Read-your-own-writes. User posts a photo. Write goes to leader. User immediately refreshes — read goes to a lagging replica that doesn't have the photo yet. User panics, re-uploads. Two photos now exist.
Mitigation: pin the user's reads to the node that accepted the write, which in a single-leader setup means the leader, for some window after each write (session stickiness). The simplest version forces reads-after-writes to hit the leader for a few seconds.
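A minimal sketch of the "hit the leader for a few seconds" rule, assuming a single leader and an in-memory record of each user's last write. The class name `ReadAfterWriteRouter`, the 5-second default, and the `"leader"` / `"replica"` labels are illustrative assumptions, not part of any particular database client.

```python
import time
from typing import Callable, Dict


class ReadAfterWriteRouter:
    """Hypothetical router: sends a user's reads to the leader shortly after they write."""

    def __init__(self, window_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._last_write_at: Dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        # Call this whenever the leader acknowledges a write from this user.
        self._last_write_at[user_id] = self.clock()

    def choose_read_target(self, user_id: str) -> str:
        last = self._last_write_at.get(user_id)
        if last is not None and self.clock() - last < self.window_seconds:
            return "leader"   # a replica may not have the user's write yet
        return "replica"      # window elapsed, or the user never wrote


router = ReadAfterWriteRouter(window_seconds=5.0)
router.record_write("alice")
print(router.choose_read_target("alice"))  # inside the window
print(router.choose_read_target("bob"))    # bob has not written
```

The window only helps if it is longer than typical replication lag. In a multi-process deployment the last-write timestamp would have to live somewhere shared, such as the session or a cookie, rather than in one process's memory.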
2. Monotonic reads. User refreshes the comment thread: reads from replica A, sees 10 comments. Refreshes again: reads from replica B (more lagged), sees 8. The count went backward.
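Mitigation: route each user to the same replica every time (for example, hash by user ID), so their reads never jump to a more lagged copy. Or use the datastore's consistent-read API where one exists, typically at higher latency or cost.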
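Hashing by user ID can be sketched as follows. The function name `replica_for_user` and the replica names are assumptions for illustration. The sketch uses `hashlib` rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process by default and would send the same user to different replicas from different app servers.

```python
import hashlib
from typing import Sequence


def replica_for_user(user_id: str, replicas: Sequence[str]) -> str:
    """Map a user to one replica deterministically (hypothetical helper)."""
    if not replicas:
        raise ValueError("need at least one replica")
    # A stable digest gives the same answer in every process and on every host.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["replica-a", "replica-b", "replica-c"]
# Same user, same replica, on every call.
assert replica_for_user("user-42", replicas) == replica_for_user("user-42", replicas)
```

Simple modulo hashing remaps many users when the replica list changes, so a user can still see time move backward once, right after a replica is added or removed.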
3. Failover split-brain. Leader briefly unreachable. Ops promotes a follower to leader. Original leader comes back, still thinks it's leader. Two leaders accept conflicting writes.
Mitigation: fencing tokens (each newly promoted leader gets a higher token and attaches it to every write; the storage layer rejects any write whose token is older than the newest it has seen, so the stale leader's writes fail). Consensus (Raft/Paxos) elects a single unambiguous leader. Never rely on "if I can't reach the other node, I'm the leader."
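The fencing-token check can be sketched on the storage side like this. `FencedStore` and `StaleTokenError` are hypothetical names, and the sketch assumes tokens are handed out by something trustworthy (a consensus service, for instance) and increase with each promotion.

```python
from typing import Dict, Optional


class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already seen."""


class FencedStore:
    """Hypothetical storage layer that rejects writes from a deposed leader."""

    def __init__(self) -> None:
        self._highest_token = 0
        self._data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> None:
        if token < self._highest_token:
            # A newer leader has already written; this writer is stale.
            raise StaleTokenError(
                f"token {token} is older than {self._highest_token}"
            )
        self._highest_token = token
        self._data[key] = value

    def read(self, key: str) -> Optional[str]:
        return self._data.get(key)


store = FencedStore()
store.write(1, "photo", "from old leader")   # old leader holds token 1
store.write(2, "photo", "from new leader")   # promoted follower holds token 2
try:
    store.write(1, "photo", "late write from old leader")
except StaleTokenError:
    pass  # rejected: the old leader's token is behind
```

The check lives in the store rather than in the leaders because a stale leader cannot be trusted to know that it is stale.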