06
Deep Dive — Real-Time Ranking at Scale
The entire leaderboard design rests on one data structure: the skip list inside Redis ZSET. Let's understand exactly why it's perfect for ranking, how it achieves O(log N) on all operations, and what happens when it's not enough.
Why Ranking Is a Data Structure Problem
You have 100M scores. You need two queries: top-K and "what's my rank?" A flat array needs O(n log n) sort for top-K and O(n) scan for rank. A B-Tree (database index) can do top-K efficiently but rank still requires a range scan — the B-Tree doesn't store subtree sizes. The answer: a data structure that maintains sorted order incrementally with position tracking.
The Skip List — How Redis ZSET Works
A skip list is a layered linked list. Level 0 is a sorted linked list of all elements. Higher levels are "express lanes" that skip over multiple elements. Think of it like a subway — local train (level 0), express (level 1), super-express (level 2). Search starts at the highest level and drops down.
sequenceDiagram
participant C as Game Client
participant GW as API Gateway
participant WS as Write Service
participant R as Redis ZSET
participant PG as PostgreSQL
participant K as Kafka
participant CW as Consumer Worker
C->>GW: POST /scores (score=150)
GW->>WS: Route (authenticated)
WS->>WS: Check idempotency key
WS->>PG: INSERT score_event
PG-->>WS: OK (durable)
WS->>R: ZINCRBY lb:global 150 user_a
R-->>WS: new_score = 2450
WS->>K: Publish score.updated
WS-->>C: 200 { new_score: 2450 }
K->>CW: Consume event
CW->>R: ZINCRBY lb:weekly 150 user_a
CW->>R: ZINCRBY lb:daily 150 user_a
The Key Augmentation — Span Tracking
Each forward pointer in the skip list stores a span — the number of elements it skips over. To find rank, traverse from HEAD at the highest level, accumulating spans as you go. For 100M elements with 32 levels, that's ~27 hops. Sub-microsecond on modern hardware.
ZSET = Skip List + Hash Table
The skip list gives O(log N) sorted operations. The hash table maps member → (score, node pointer) for O(1) score lookup. Together: ZADD O(log N), ZSCORE O(1), ZREVRANK O(log N), ZREVRANGE O(log N + K).
The Tiebreaker Trick — Score Packing
Redis sorts by one dimension. For tiebreaking, pack score + inverted timestamp into one float64:
composite = score × 10⁷ + (MAX_TIMESTAMP − actual_timestamp)
// Alice scores 2450 at t=1712345600 → 24502390099200
// Bob scores 2450 at t=1712345700 → 24502390099100
// Alice's composite is higher → Alice ranks above Bob ✓
// Extract raw score: floor(composite / 10⁷) = 2450
IEEE 754 doubles have 52 bits of mantissa — integer precision up to 9 × 10¹⁵. Our composite uses at most 15 digits. Safely within range.
When One ZSET Isn't Enough — Sharding Strategies
A. Score-Range Sharding
Shard by score range. Rank = ZCARD of higher shards + rank within own shard. Fast rank queries. Uneven distribution, cross-shard migration on score change.
B. Hash-Based Sharding
Hash users across shards. Top-K = query all shards, merge. Rank = ZCOUNT on all shards. Even distribution. Every rank query fans out to all shards.
C. Tiered Approach (Production Answer)
Tier 1: ZSET of top 1M users (~110 MB). Exact ranking, fast, unsharded.
Tier 2: Score histogram (Fenwick tree) for all users. "How many users have score > X?" as a prefix sum in O(log B).
Result: exact rank for top players, approximate percentile for the rest. This is how Xbox Live and Steam work at scale.
The Decision Tree
| Users | Ranking Engine | Precision |
| < 1M | PostgreSQL with B-Tree index | Exact (COUNT query fast enough) |
| 1M – 100M | Single Redis ZSET | Exact for all users |
| 100M – 1B | Tiered: ZSET + histogram | Exact top-1M, percentile rest |
| 1B+ | Streaming top-K (Space-Saving) | Exact top-1K, approximate rest |