URL Encoding & Base62

01

Why this matters

"Make me a short URL." Sounds trivial. The shortcode must be: unique, URL-safe, short (6-8 chars), and ideally non-guessable. Plus, you need to mint billions of them at thousands per second without coordination overhead. The encoding choice — Base62 vs Base64 vs hash-based — drives the whole system design. The classic URL-shortener interview question is really about this trade-off.

02

Three approaches

Strategy	How	Pros	Cons
Hash + truncate	SHA256 of URL → take first N chars Base62	Deterministic — same URL always returns the same code	Collisions inevitable; need DB lookup to confirm uniqueness, retry if collision
Counter + Base62	Auto-increment counter, encode in Base62	Guaranteed unique; predictable length progression	Codes are sequential / guessable; counter is a hot key
Pre-generated pool	Generate batch of random codes, store, hand out	Random + collision-free; no counter contention	Need a maintenance job to refill; storage overhead

03

Why Base62

Alphabet: 0-9 a-z A-Z = 62 characters. All URL-safe (no escaping). Each character carries log₂(62) ≈ 5.95 bits.

6 chars = 62⁶ ≈ 56 billion codes
7 chars = 62⁷ ≈ 3.5 trillion codes
8 chars = 62⁸ ≈ 218 trillion codes

Why not Base64? Base64 includes + and / which aren't URL-safe — they'd need percent-encoding. URL-safe Base64 (- and _) is fine but visually ugly. Base62 reads like a license plate.

Why not hex (Base16)? Each char is only 4 bits → twice as long for the same key space. Base62 is the sweet spot of "URL-safe + readable + dense."

Snowflake 64-bit ID generator

package snowflake

import (
    "sync"
    "time"
)

const (
    epoch     int64 = 1577836800000 // 2020-01-01 UTC ms
    machineBits = 10
    seqBits     = 12
    maxSeq      = (1 << seqBits) - 1
)

type Node struct {
    mu       sync.Mutex
    machine  int64 // 0..1023
    lastMs   int64
    seq      int64
}

func (n *Node) NextID() int64 {
    n.mu.Lock(); defer n.mu.Unlock()
    now := time.Now().UnixMilli() - epoch
    if now == n.lastMs {
        n.seq = (n.seq + 1) & maxSeq
        if n.seq == 0 {
            for now <= n.lastMs { now = time.Now().UnixMilli() - epoch }
        }
    } else { n.seq = 0 }
    n.lastMs = now
    return (now << (machineBits+seqBits)) | (n.machine << seqBits) | n.seq
}

// 4096 IDs/ms/machine × 1024 machines = 4B IDs/sec ceiling

04

Deep dive — the counter problem and how to scale it

Naive: a single Postgres counter. SELECT nextval('shortcode_seq'). Encode in Base62. Insert. Done.

Problem: every shortener creation takes a round-trip to one DB. At 10k QPS, the counter is a hot row. Single point of contention.

Range-batch trick — each app server reserves a range of 10,000 IDs from the central counter. It hands out IDs locally without DB calls. When the range runs low, it requests another range. Coordination overhead drops 10,000×. This is how Twitter Snowflake-style ID generators work.

Snowflake IDs — 64 bits split into: timestamp (41 bits) + machine ID (10 bits) + sequence-within-ms (12 bits). Each server generates locally with no coordination. Roughly time-ordered. Trivially Base62-encodable to ~11 chars.

Pre-generated random pool — for non-guessable codes (interview ask: "make codes hard to enumerate"), a worker pre-generates batches of cryptographically-random 7-char Base62 strings, deduplicates against existing codes, stores them as "available." App grabs one per insertion. No contention; no enumeration attack.

Production mix

Bit.ly + TinyURL use predictable counter-based codes for free tier, custom random codes for premium. Snowflake-style timestamp IDs are common when ordering matters (Twitter post IDs, Discord snowflakes).

Twitter Snowflake — 64-bit ID Layout SVG

05

Real-world

Bit.ly

Counter + Base62

Sequential by default; non-sequential / vanity codes for paid plans.

YouTube video IDs

Random 11-char Base64

11 chars × 6 bits = 66 bits of namespace. Non-guessable; videos can be unlisted via secret URL.

Twitter Snowflake

Timestamp + machine + seq

64-bit IDs, base-10 in URLs. Enables cursor pagination by time without an extra column.

Imgur / Reddit

5-7 char Base36 / Base62

Mix of counter-based (older content) and random (newer). Reddit visible IDs use Base36.

06

Used in problems

URL shortener is the canonical problem. Pastebin uses the same encoding tricks. Any "shareable link" feature (Google Drive shares, Notion pages, Stripe payment links) builds on these patterns.

📺

References & Videos

URL Shortening — Base62 Encoding

Gaurav Sen · 12 min

Design a URL Shortener

ByteByteGo · 8 min

URL Shortener (Base62 section)

AlgoMaster

URL Shortener Design

GeeksforGeeks

Why this matters

Three approaches

Why Base62

Deep dive — the counter problem and how to scale it

Real-world

Counter + Base62

Random 11-char Base64

Timestamp + machine + seq

5-7 char Base36 / Base62

Used in problems

References & Videos

Indexing

Consistent Hashing

URL Shortener