06
Deep Dive — Key Generation & Hotness Tiers
Two Core Concepts
This problem has two technically interesting cores: how you generate unique short codes safely across distributed servers, and how you detect and serve viral links without touching your database. Everything else is standard web infrastructure.
Sequence — Cache Hit vs Cache Miss
Mermaid.js
sequenceDiagram
participant C as Client
participant CDN as CDN Edge
participant API as API Server
participant R as Redis
participant DB as PostgreSQL
C->>CDN: GET /x9kZ3mP
alt Tier 2/3 — CDN hit
CDN-->>C: 302 redirect (< 5ms globally)
else CDN miss
CDN->>API: Forward request
API->>R: GET x9kZ3mP
alt Redis hit
R-->>API: long_url (< 1ms)
API-->>C: 302 redirect (~10ms total)
else Redis miss
API->>DB: SELECT long_url WHERE short_code = 'x9kZ3mP'
DB-->>API: long_url (~5ms)
API->>R: SET x9kZ3mP TTL=86400 (async)
API-->>C: 302 redirect (~20ms total)
end
end
Key Generation — The Pre-Generated Pool
Three approaches exist for generating short codes. Random on-demand requires a DB read on every write to check for collisions. MD5 hashing is deterministic (deduplication for free) but needs salt-and-rehash on collision. The pre-generated key pool wins because all uniqueness checking happens offline in the background — the write path just claims a key atomically.
The critical SQL clause is FOR UPDATE SKIP LOCKED. When 10 API servers simultaneously claim keys, each one locks a row and any server that finds its target already locked simply skips to the next available key. No race condition. No collision. No coordination overhead. The entire operation is a single atomic transaction.
BEGIN;
SELECT short_code FROM keys_available
LIMIT 1
FOR UPDATE SKIP LOCKED;
DELETE FROM keys_available WHERE short_code = 'x9kZ3mP';
INSERT INTO keys_used (short_code, long_url, user_id)
VALUES ('x9kZ3mP', 'https://amazon.com/...', 12345);
COMMIT;
Hotness Monitor — Three-Tier Traffic Detection
Every click increments a Redis counter with minute-level granularity: INCR clicks:x9kZ3mP:2024-01-15-14:37. The hotness monitor sums the last 5 minute-buckets every 60 seconds to get a stable clicks/minute figure for each active link. Links are then sorted into three tiers based on traffic thresholds.
Tier 1 (100–999/min) — Redis cache with TTL refreshed on access. Tier 2 (1,000–9,999/min) — Redis plus CDN edge globally, TTL renewed every 60s by the monitor. Tier 3 (10,000+/min) — pre-computed static redirect served at CDN with no application logic involved at all. Demotion is passive — when traffic drops, the monitor stops renewing the CDN TTL, and it expires naturally within 5 minutes.
Why 7 Characters? The Math.
7 characters from Base62 (a–z, A–Z, 0–9) gives 62^7 = 3.5 trillion unique combinations. At 1 million new links per day, that's 9,500 years of runway. 6 characters would give 56 billion combinations — enough mathematically but without safety margin. 8 characters is unnecessary. 7 is the sweet spot derived from requirements, not guesswork.
The avalanche effect in Base62-encoded hashes ensures inputs are uniformly distributed across the entire 3.5 trillion slot space. Even after 2 billion stored URLs, you're using only 0.057% of available space — collision probability per new insert is approximately 0.029%.