Concept · Data Structures

Merkle Trees

01

Why this matters

Two Cassandra replicas hold 100M rows each. They're supposed to be identical. Something went wrong and one has a few stale entries. Comparing all 100M rows to find the differences would take hours and saturate the network. A Merkle tree lets you find exactly which rows diverge in O(log N) comparisons of small hashes. Same trick underpins Bitcoin, git, rsync, and every distributed DB's anti-entropy process.

02

How it works

Take your data. Split it into chunks. Hash each chunk → leaf hashes. Pair up leaves, concatenate-and-hash each pair → level-1 hashes. Pair those, hash → level-2. Continue until one root hash remains.

Two key properties:

  • Any change to a chunk → the root hash changes. A single bit flip deep in the data propagates up to the root.
  • If two roots match, all data matches. Guaranteed (modulo hash collisions).

If roots differ, recurse into child subtrees to find where the divergence is. Two children: one matches, one doesn't. Go into the non-matching one. Continue. You find all differences in O(log N + d) comparisons where d = number of differences.

Merkle TreeSVG
root h(L, R) h(L, R) h(A) h(B) h(C) h(D) chunk A chunk B chunk C chunk D
03

Anti-entropy in Dynamo/Cassandra

Periodically (every few hours), replicas run anti-entropy repair:

  1. Each replica builds a Merkle tree over the key range it owns.
  2. Replicas exchange roots.
  3. Roots match → done, data identical, zero network spent on the actual rows.
  4. Roots differ → exchange next level. Recurse on differing children.
  5. Eventually pinpoint the specific rows that diverge. Transfer only those rows.

For 100M identical rows, comparison is one hash comparison (hundreds of bytes). For 100M rows with 100 differences, you exchange ~log₂(100M) × a few hashes ≈ 30 hashes × 3 levels of recursion ≈ ~100 hashes to identify all discrepancies. Orders of magnitude cheaper than naive row-by-row comparison.

Merkle tree root computation
import hashlib

def leaf(d): return hashlib.sha256(b"L" + d).digest()
def node(a, b): return hashlib.sha256(b"N" + a + b).digest()

def merkle_root(leaves):
    if not leaves: return None
    level = [leaf(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2: level.append(level[-1])  # dup last
        level = [node(level[i], level[i+1]) for i in range(0, len(level), 2)]
    return level[0]

def proof(leaves, idx):
    path = []
    level = [leaf(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2: level.append(level[-1])
        sibling = idx ^ 1
        path.append((level[sibling], sibling & 1))
        level = [node(level[i], level[i+1]) for i in range(0, len(level), 2)]
        idx //= 2
    return path

# Bitcoin, Git, Dynamo, ZFS — all use variants of this
04

Deep dive — Bitcoin's Merkle root

A Bitcoin block contains thousands of transactions. The block header (80 bytes) includes a single Merkle root hashed from all transactions. Light clients (SPV wallets) don't store every transaction — they trust the root in the header.

To verify a specific transaction is in a block, the light client asks a full node for a Merkle proof: the sibling hashes along the path from the transaction's leaf to the root. With those ~20 hashes, the client recomputes the root and compares against the header. If it matches, the transaction is definitely in the block. If not, the node is lying.

This is the cryptographic foundation of SPV ("Simplified Payment Verification") and by extension why Bitcoin can scale to millions of clients without each having the full blockchain.

Key insight

Merkle trees let you prove membership in a large dataset with a tiny, verifiable proof — without needing the whole dataset. Used everywhere this property matters: blockchains, transparent logs (Certificate Transparency), content-addressable storage (IPFS, git).

05

Real-world

Cassandra / DynamoDB

Anti-entropy repair

Periodic Merkle-based reconciliation between replicas. Essential to Dynamo-style leaderless consistency.

git

Every commit is a Merkle root

Trees + blobs hashed together; root hash = commit SHA. Why git can efficiently verify repository integrity.

Bitcoin / Ethereum

Block headers include Merkle root

Enables SPV wallets. State roots in Ethereum (via Merkle Patricia trees) serve the same role for account state.

Certificate Transparency

Append-only Merkle log

Every SSL cert issuance logged in a tamper-evident Merkle tree. Auditors can verify inclusion without downloading the whole log.

06

Used in problems

Google Drive uses Merkle-like hashing for file-chunk deduplication. Google Docs uses tree hashes for history. Distributed logging uses Merkle trees to audit ingestion without re-reading logs.

Next up