
CAP Theorem

01

Why this matters

Every distributed database picks one of two behaviors when the network between nodes breaks: return stale data or refuse to serve the request. CAP is the formal statement that you cannot avoid this choice. Pretending otherwise (marketing saying "CP AND AP!") is nonsense.

Getting this right in interviews means naming what your system does under partition and why. "Cassandra is AP" is worth saying. "Postgres with synchronous replication is CP" is worth saying. "We're eventually consistent" is a red flag unless you make the CAP trade explicit.

02

The three letters

C — Consistency. Every read sees the most recent write, or an error. "Linearizable" is the formal term. NOT the C in ACID.

A — Availability. Every request gets a non-error response. You may get stale data, but you don't get a 500.

P — Partition tolerance. The system keeps working when the network between nodes drops messages.

In a real distributed system, P is not optional. Networks will partition — transient congestion, a cable cut, a misconfigured firewall, whole-region outages. So the real choice is C or A when P happens.

03

The choice made visible

Imagine your database runs in two regions, US-East and EU-West. A submarine cable fails — they can't talk. A user writes to US-East. An EU user now tries to read the same key.

CP system (Spanner, ZooKeeper, etcd): the EU read blocks or returns an error. The EU replica can't confirm its value is the latest, so it refuses. Consistency preserved; the EU user sees downtime.

AP system (Cassandra, DynamoDB, Riak): the EU read returns the old value (or whatever the local replica has). The request succeeds. Consistency is sacrificed; the two regions diverge until the network heals, then they reconcile (last-write-wins, vector clocks, etc.).
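The two-region scenario above can be sketched in a few lines. This is a toy model, not how any of the named databases is implemented: the `Replica` class, its `mode` switch, and the logical `timestamp` are assumptions made for illustration, and the US-side write is accepted unconditionally to mirror the story.

```python
from dataclasses import dataclass


class PartitionError(Exception):
    """A CP replica cannot confirm that it holds the latest value."""


@dataclass
class Replica:
    name: str
    mode: str                    # "CP" or "AP"; a hypothetical per-replica switch
    value: int = 41              # value of key X before the partition
    timestamp: int = 1           # logical write time, used for last-write-wins
    peer_reachable: bool = True

    def write(self, value: int, timestamp: int) -> None:
        # Simplification: the write is always accepted locally.
        self.value, self.timestamp = value, timestamp

    def read(self) -> int:
        if self.mode == "CP" and not self.peer_reachable:
            raise PartitionError(f"{self.name}: cannot confirm latest value")
        return self.value  # AP: answer from local state, possibly stale

    def reconcile(self, other: "Replica") -> None:
        """Last-write-wins: both replicas adopt the newer write."""
        newest = max(self, other, key=lambda r: r.timestamp)
        value, timestamp = newest.value, newest.timestamp
        self.write(value, timestamp)
        other.write(value, timestamp)


def eu_read_during_partition(mode: str) -> str:
    us, eu = Replica("US-East", mode), Replica("EU-West", mode)
    us.peer_reachable = eu.peer_reachable = False  # the cable fails
    us.write(42, timestamp=2)                      # US user writes X=42
    try:
        return f"value {eu.read()}"
    except PartitionError:
        return "error"


print(eu_read_during_partition("CP"))  # error
print(eu_read_during_partition("AP"))  # value 41
```

Once the link is back, calling `eu.reconcile(us)` leaves both replicas at 42. Last-write-wins is the simplest policy and silently discards the older write, which is one reason some systems use vector clocks instead.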

Neither is universally better. A banking ledger usually needs CP. A social media feed usually wants AP. An interviewer asking about CAP wants to see you identify which side your system is on, and why the other side would be wrong.

CP vs AP Under Partition (Mermaid)

sequenceDiagram
    participant U as US User
    participant USE as US Region
    participant EUW as EU Region
    participant E as EU User
    Note over USE,EUW: Network partition between regions
    U->>USE: write(key=X, val=42)
    USE-->>U: OK
    rect rgb(255, 240, 240)
        Note over EUW: CP system
        E->>EUW: read(key=X)
        EUW-->>E: ERROR · partition
    end
    rect rgb(240, 255, 240)
        Note over EUW: AP system
        E->>EUW: read(key=X)
        EUW-->>E: old value · 41
        Note over EUW: will reconcile when partition heals
    end

04

Where common databases land

| Database | CAP stance | Why |
| --- | --- | --- |
| Postgres (primary + sync replicas) | CP | Sync replication blocks writes if a replica is unreachable. Primary fails? No writes until failover. |
| MySQL (async replicas) | AP-ish | Primary stays writable during partition. Replicas drift, then catch up after. |
| Cassandra | AP (tunable) | Configurable consistency per query. QUORUM reads/writes lean CP. ONE leans AP. |
| DynamoDB | AP | Optional strongly consistent reads (slower, more expensive). Defaults to eventual consistency. |
| MongoDB (replica set) | CP by default | Writes go to primary. During partition, minority side refuses writes. |
| Redis (single-node) | CA (not P-tolerant) | One node = no partition to worry about. Redis Cluster leans AP: replication is asynchronous, so acknowledged writes can be lost on failover. |
| Spanner | CP | Uses TrueTime + Paxos. External consistency at global scale. Pays latency cost. |
| etcd / ZooKeeper / Consul | CP | Consensus-based (Raft in etcd and Consul, Zab in ZooKeeper). Refuse writes below quorum. Used precisely because you need CP. |

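The "tunable" stance in the Cassandra row comes down to simple arithmetic: with N replicas, a write acknowledged by W of them and a read that consults R of them must share at least one replica whenever R + W > N. A minimal sketch of that rule follows; the function names are made up for illustration, and it ignores real-world details such as failed nodes and repair.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_overlaps_write(n: int, w: int, r: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n


N = 3  # replication factor, assumed for illustration

# QUORUM writes + QUORUM reads: 2 + 2 > 3, so a read always touches
# at least one replica that saw the latest acknowledged write.
print(read_overlaps_write(N, quorum(N), quorum(N)))  # True

# ONE + ONE: 1 + 1 <= 3, so a read may land on a replica that missed the write.
print(read_overlaps_write(N, 1, 1))  # False
```

The same arithmetic explains the cost: a quorum operation cannot complete on the minority side of a partition, which is the availability you give up.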
05

When to pick each

CP

Correctness over availability

Anything involving money, inventory, uniqueness (usernames), locks, or coordination. Better to reject a write than to have two airline seats sold to different people.
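The seat example can be sketched as a write that only succeeds when a majority of replicas is reachable. This is a toy, single-threaded model under stated assumptions: `reserve_seat`, the dict-per-replica storage, and the `reachable` flags are hypothetical, and real systems delegate this to a consensus protocol rather than hand-rolled checks.

```python
from typing import Dict, List


class NoQuorumError(Exception):
    """Too few replicas are reachable to accept the write safely."""


class SeatTakenError(Exception):
    """The seat already belongs to another passenger."""


def reserve_seat(seat: str, passenger: str,
                 replicas: List[Dict[str, str]],
                 reachable: List[bool]) -> None:
    live = [rep for rep, ok in zip(replicas, reachable) if ok]
    # CP choice: without a majority, refuse instead of guessing.
    if len(live) < len(replicas) // 2 + 1:
        raise NoQuorumError(f"only {len(live)} of {len(replicas)} replicas reachable")
    # Any two majorities overlap, so an earlier reservation is visible here.
    for rep in live:
        if seat in rep:
            raise SeatTakenError(f"{seat} is held by {rep[seat]}")
    for rep in live:
        rep[seat] = passenger


replicas: List[Dict[str, str]] = [{}, {}, {}]
reserve_seat("12A", "alice", replicas, [True, True, False])  # majority side: accepted
try:
    reserve_seat("12A", "bob", replicas, [False, False, True])  # minority side
except NoQuorumError as err:
    print("rejected:", err)
```

The minority side sees downtime, but seat 12A is never sold twice.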

AP

Availability over freshness

Social feeds, counters, notifications, analytics, caches. A tweet showing on one phone 2 seconds before another is fine. Downtime is not fine.
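Counters are a good fit for AP because they can be merged without coordination. One common way to do this is a grow-only counter (G-Counter), a simple CRDT; the source does not name a specific technique, so treat this as one illustrative option, with the class and node names invented for the sketch.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-node maximum: merging is safe to repeat and order-independent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


us, eu = GCounter("us-east"), GCounter("eu-west")
us.increment(3)  # likes recorded in the US during the partition
eu.increment(2)  # likes recorded in the EU during the partition
print(us.value(), eu.value())  # 3 2

us.merge(eu)     # the partition heals
eu.merge(us)
print(us.value(), eu.value())  # 5 5
```

Both sides stayed writable throughout, and each showed a slightly stale total until the merge, which is exactly the trade AP accepts.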

06

Deep dive — CAP is the wrong framing for the no-partition case

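CAP only talks about what happens during a partition. But partitions are rare; normal operation is far more common. In the steady state, the real tradeoff is between latency and consistency. PACELC names both halves: if Partitioned (P), choose Availability or Consistency (A/C); Else (E), choose Latency or Consistency (L/C).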

A system can be "AP during partition, EL during normal" (Cassandra, written PA/EL). Or "CP during partition, EC during normal" (Spanner, written PC/EC). These are very different beasts, and a bare CAP label captures only the first half of each. Interviewers who've worked with distributed systems will push you on this, so don't stop at CAP.
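One way to keep the two halves straight is to record them as separate fields. The sketch below is only a bookkeeping aid: the `Pacelc` type and `SYSTEMS` mapping are invented for illustration, and the two entries are the classifications stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pacelc:
    on_partition: str  # "A" (availability) or "C" (consistency)
    otherwise: str     # "L" (latency) or "C" (consistency)

    def label(self) -> str:
        return f"P{self.on_partition}/E{self.otherwise}"


SYSTEMS = {
    "Cassandra": Pacelc(on_partition="A", otherwise="L"),
    "Spanner": Pacelc(on_partition="C", otherwise="C"),
}

for name, stance in SYSTEMS.items():
    print(f"{name}: {stance.label()}")
```

In an interview answer, the second field is the part CAP never asks about: what the system pays, in latency or in freshness, on an ordinary day.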

Common misunderstanding

"CAP says pick 2 of 3." No — CAP says during a network partition, you pick C or A. In the absence of a partition, you can have both. The confusion comes from the original phrasing and gets repeated forever.

07

Used in problems

Google Docs chooses CP for concurrent edits (server-side OT needs one agreed operation order; CRDT-based editors relax this). News feed chooses AP for engagement counts. Payment gateway chooses CP for the ledger. WhatsApp messaging chooses AP for delivery (better to double-deliver than lose).
