"Make me a short URL." Sounds trivial. The shortcode must be: unique, URL-safe, short (6-8 chars), and ideally non-guessable. Plus, you need to mint billions of them at thousands per second without coordination overhead. The encoding choice — Base62 vs Base64 vs hash-based — drives the whole system design. The classic URL-shortener interview question is really about this trade-off.
02
Three approaches
Strategy
How
Pros
Cons
Hash + truncate
SHA256 of URL → take first N chars Base62
Deterministic — same URL always returns the same code
Collisions inevitable; need DB lookup to confirm uniqueness, retry if collision
Counter + Base62
Auto-increment counter, encode in Base62
Guaranteed unique; predictable length progression
Codes are sequential / guessable; counter is a hot key
Pre-generated pool
Generate batch of random codes, store, hand out
Random + collision-free; no counter contention
Need a maintenance job to refill; storage overhead
03
Why Base62
Alphabet: 0-9 a-z A-Z = 62 characters. All URL-safe (no escaping). Each character carries log₂(62) ≈ 5.95 bits.
6 chars = 62⁶ ≈ 56 billion codes
7 chars = 62⁷ ≈ 3.5 trillion codes
8 chars = 62⁸ ≈ 218 trillion codes
Why not Base64? Base64 includes + and / which aren't URL-safe — they'd need percent-encoding. URL-safe Base64 (- and _) is fine but visually ugly. Base62 reads like a license plate.
Why not hex (Base16)? Each char is only 4 bits → twice as long for the same key space. Base62 is the sweet spot of "URL-safe + readable + dense."
Snowflake 64-bit ID generator
package snowflake
import (
"sync"
"time"
)
const (
epoch int64 = 1577836800000 // 2020-01-01 UTC ms
machineBits = 10
seqBits = 12
maxSeq = (1 << seqBits) - 1
)
type Node struct {
mu sync.Mutex
machine int64 // 0..1023
lastMs int64
seq int64
}
func (n *Node) NextID() int64 {
n.mu.Lock(); defer n.mu.Unlock()
now := time.Now().UnixMilli() - epoch
if now == n.lastMs {
n.seq = (n.seq + 1) & maxSeq
if n.seq == 0 {
for now <= n.lastMs { now = time.Now().UnixMilli() - epoch }
}
} else { n.seq = 0 }
n.lastMs = now
return (now << (machineBits+seqBits)) | (n.machine << seqBits) | n.seq
}
// 4096 IDs/ms/machine × 1024 machines = 4B IDs/sec ceiling
04
Deep dive — the counter problem and how to scale it
Naive: a single Postgres counter. SELECT nextval('shortcode_seq'). Encode in Base62. Insert. Done.
Problem: every shortener creation takes a round-trip to one DB. At 10k QPS, the counter is a hot row. Single point of contention.
Range-batch trick — each app server reserves a range of 10,000 IDs from the central counter. It hands out IDs locally without DB calls. When the range runs low, it requests another range. Coordination overhead drops 10,000×. This is how Twitter Snowflake-style ID generators work.
Snowflake IDs — 64 bits split into: timestamp (41 bits) + machine ID (10 bits) + sequence-within-ms (12 bits). Each server generates locally with no coordination. Roughly time-ordered. Trivially Base62-encodable to ~11 chars.
Pre-generated random pool — for non-guessable codes (interview ask: "make codes hard to enumerate"), a worker pre-generates batches of cryptographically-random 7-char Base62 strings, deduplicates against existing codes, stores them as "available." App grabs one per insertion. No contention; no enumeration attack.
Production mix
Bit.ly + TinyURL use predictable counter-based codes for free tier, custom random codes for premium. Snowflake-style timestamp IDs are common when ordering matters (Twitter post IDs, Discord snowflakes).
Twitter Snowflake — 64-bit ID LayoutSVG
05
Real-world
Bit.ly
Counter + Base62
Sequential by default; non-sequential / vanity codes for paid plans.
YouTube video IDs
Random 11-char Base64
11 chars × 6 bits = 66 bits of namespace. Non-guessable; videos can be unlisted via secret URL.
Twitter Snowflake
Timestamp + machine + seq
64-bit IDs, base-10 in URLs. Enables cursor pagination by time without an extra column.
Imgur / Reddit
5-7 char Base36 / Base62
Mix of counter-based (older content) and random (newer). Reddit visible IDs use Base36.
06
Used in problems
URL shortener is the canonical problem. Pastebin uses the same encoding tricks. Any "shareable link" feature (Google Drive shares, Notion pages, Stripe payment links) builds on these patterns.