Concept · Databases

Sharding

01

Why this matters

One server, even a massive one, tops out around ~100k writes/sec. Replication buys you read scale but not write scale — the leader is still a bottleneck. Once writes exceed one box, you shard: split the data across N boxes, each owning a slice. Done right, throughput scales linearly with N. Done wrong, you get hot shards, impossible cross-shard queries, and a year-long migration project.

Sharding is the most-misused word in system design. Know the four strategies, their failure modes, and when each is correct.

02

The four strategies

StrategyHow shard is pickedProsCons
Range-basedKey falls in a range (e.g., user_id 1–1M → shard 0)Easy range scans, natural for time seriesHot shards if traffic skews to one range (new users, recent time)
Hash-basedhash(key) mod N picks shardEvenly distributes load by defaultRebalancing on add/remove node moves most data. Range scans cost N queries.
Consistent hashHash key into a ring; shard is the next node clockwiseAdding/removing a node only moves ~1/N of dataMore complex; typically needs virtual nodes for even distribution
Directory-basedLookup table maps key → shardFully flexible; can move any key anywhereThe directory is itself a scaling problem. Used for "celebrity" keys.
Add 1 Node — Keys Moved per Strategy SVG
Going from 10 → 11 nodes — % of keys that move Range Hash mod N Consistent Hash Directory ~9% ~91% ~9% ~0% new range only rehash everyone 1/N of keyspace manual map
03

Picking a shard key

The shard key is the single most consequential choice. It determines:

  • Read distribution — ideally, each query hits one shard. Queries that fan out to all shards scale with N, not down.
  • Write distribution — keys should spread evenly across shards. user_id usually does. created_at does not (all new writes hit one shard).
  • Locality — data that's queried together should live on the same shard. Shard by user_id so all of user 42's posts live together — no cross-shard join to build their feed.

Heuristic: shard by the column that appears in the WHERE clause of your hottest query. For Twitter that's user_id. For an ad-tech event store that's advertiser_id. For a multi-tenant SaaS that's tenant_id.

04

What sharding costs

  • Cross-shard transactions disappear — a database can't give you ACID across shards without 2PC or Paxos (expensive). Design so transactions stay within one shard.
  • Cross-shard joins become application logic — the DB can't join across shards. You fetch from shard A, fetch from shard B, join in your service. Or you denormalize (replicate user data to every post's shard).
  • Rebalancing is painful — adding shard #17 when you have 16 is a migration. Plan from day 1.
  • Hot shards — if one user has 100× normal traffic (celebrity), their shard gets hammered. Solutions: split their data across sub-shards; add a cache layer in front.
The 1000-shard rule

Don't start with 2 shards. Start with at least 1000 virtual shards mapped onto fewer physical nodes. Adding physical nodes becomes a reassignment of virtual shards rather than a global rebalance. Same math Instagram, Pinterest, and many others used.

05

Deep dive — the celebrity shard problem

You shard tweets by user_id. Beyoncé's user_id hashes to shard 7. Beyoncé posts. Ten million followers all query shard 7 for her tweet. Shard 7 is 100× hotter than every other shard. Its cache saturates, its disk saturates, replies to other users on that shard slow down.

Mitigation stack:

  1. Cache the celebrity's data in Redis — for the top 0.1% of keys by traffic, skip the shard entirely. Redis holds the latest 1000 tweets by hot users, refreshed every few seconds.
  2. Sub-shard the celebrity — store Beyoncé's posts across shards 7, 7a, 7b, 7c using hash(user_id + post_id) mod 4. Reads fan out to 4 shards but throughput quadruples.
  3. Fan-out on write — copy each celebrity post into each follower's Redis feed at post time. Read becomes O(1) — one Redis ZRANGE on the follower's feed key. The shard-7 problem disappears because reads never hit it.

Real systems use all three in combination. Twitter famously does (3) for normal users and (1) for celebrities, because fan-out-on-write to 10M followers is too slow.

06

Real-world

Instagram

Thousands of Postgres shards

Shard by user_id. ~1000 logical shards across physical Postgres nodes. Users and their media always colocated.

Vitess (YouTube)

MySQL sharding framework

Now an Apache project. Runs YouTube's MySQL tier at scale. Transparent query routing to shards.

Cassandra / DynamoDB

Consistent hashing built-in

Automatically re-shards as you add nodes. No application effort needed. You just pick a partition key.

MongoDB sharded clusters

Range or hash sharding

You configure shard key; Mongo routes queries. Historically easy to misconfigure (bad shard key = hot chunks).

07

Used in problems

URL shortener shards by short-code hash. News feed shards by user_id. WhatsApp shards messages by (chat_id). Google Drive shards files by user_id. Uber shards driver locations by geohash prefix.

Next up