Unique ID Generator

Generate globally unique, k-sortable, compact identifiers at over 1 million IDs per second across N machines with zero coordination. The hard parts: a bit-layout balancing timestamp, machine, and sequence that stays unique without a central authority, handling clock skew from NTP corrections that can break monotonicity, and keeping IDs compact enough (~11 chars base62) for URL-friendly use while encoding enough entropy to avoid collisions across 1,024 machines.

Core: Snowflake-style Distributed IDs1M+ IDs/sec64-bit / ~11 chars base62Sub-ms latency per IDZero coordination

Requirements

Functional

Generate globally unique 64-bit IDs from any machine, no central bottleneck
IDs must be k-sortable (roughly ordered by creation time)
Support batch generation: request N IDs in a single call
Parse an ID back into its components: timestamp, machine_id, sequence
Encode IDs as compact base62 strings (~11 chars) for URL use

Non-Functional

Sub-millisecond latency per ID generation (local operation)
Throughput: 4,096 IDs/ms per machine, 1M+ globally
No single point of failure for ID generation itself
IDs must not leak sensitive info (exact server count, internal topology)
Monotonically increasing within a single machine
Survive NTP clock adjustments without producing duplicates

Scale Estimation

Each machine can produce 4,096 IDs per millisecond (12-bit sequence). With 1,024 machines (10-bit machine_id), the system yields 4,096 x 1,024 x 1,000 = ~4.2 billion IDs/sec theoretical max. In practice, 1M+/sec sustained is the target.

A 41-bit timestamp in milliseconds covers 2^41 ms = ~69.7 years from a custom epoch. IDs are 64 bits = 8 bytes; base62-encoded they become ~11 characters.

64 bits

ID size (8 bytes)

~11 chars

base62-encoded length

41 bits

timestamp (ms, ~69 years)

10 bits

machine_id (1,024 machines)

12 bits

sequence (4,096/ms/machine)

4.2B/sec

theoretical max throughput

< 1 ms

latency per ID (no network)

~69.7 years

before timestamp rollover

API Design

POST/ids/generate?count=N

Generate N unique IDs (default 1, max 1000). Returns array of 64-bit IDs as both integer and base62 string. Each ID encodes timestamp + machine_id + sequence.

GET/ids/parse/{id}

Decode a 64-bit ID (integer or base62) into its components: { timestamp, machine_id, sequence, created_at_utc }. Useful for debugging and audit.

GET/ids/health

Returns generator status: current machine_id, last timestamp, sequence position, clock health (detects backward jumps).

High-Level Architecture

The ID service is stateless per-request — each instance holds only its machine_id and a local clock + sequence counter. No database, no network call on the hot path.

ID Service instances: one per application host (or sidecar). Generates IDs locally with zero network overhead.
Zookeeper / etcd: assigns unique machine_ids on startup. Only needed at boot, not on the hot path.
Alternative — Ticket Servers: dual-MySQL with odd/even auto-increment (Flickr's design). Centralized but simpler; acceptable at lower scale.

Architecture — Snowflake ID GenerationSVG

Request Flow — Step Through

Application · needs unique ID→ID Generator · local, stateless→System Clock · ms timestamp→Sequence Counter · 12-bit, per-ms→Bit Composer · sign|ts|machine|seq→Base62 Encoder · ~11 chars output→Consumer · URL, DB key, etc.

Click Next Step to walk through the request flow.

Deep Dive — Snowflake Bit Layout & Alternatives

The Core Question

How do you pack enough information into 64 bits to guarantee global uniqueness, preserve time-ordering, and avoid any coordination between machines on the hot path?

Snowflake (Twitter, 2010). The canonical design: 1 sign bit + 41-bit timestamp (ms since custom epoch, ~69 years) + 10-bit machine_id (1,024 machines) + 12-bit sequence (4,096 IDs per ms per machine). Total: 64 bits. IDs are k-sortable because the timestamp occupies the most-significant bits. Each machine generates IDs independently; the machine_id guarantees no collisions across hosts.

ULID (Universally Unique Lexicographically Sortable Identifier). 128 bits: 48-bit timestamp (ms, ~8,919 years) + 80-bit cryptographically random tail. No machine registration needed — randomness provides uniqueness. Tradeoff: slight collision probability (birthday problem at ~2^40 IDs/ms, negligible in practice). Encodes to 26 chars Crockford base32.

UUID v7 (RFC 9562). 128-bit, timestamp-first layout for B-tree index friendliness. Replaces UUID v4's fully-random approach with time-ordered prefix. Wider than Snowflake (128 vs 64 bits) but standardized and needs no coordinator.

Ticket Servers (Flickr, 2010). Two MySQL instances: one auto-increments by 2 starting at 1 (odd), the other starting at 2 (even). App round-robins between them. Simple, no clock dependency, but centralized — each MySQL is a SPOF risk. Flickr ran this in production for years.

Clock skew handling. Snowflake relies on the local clock. If NTP corrects the clock backward, the generator would produce IDs with a past timestamp — risking duplicates. Defense: if current_ms < last_ms, either refuse to generate (throw error) or spin-wait until the clock catches up. Critical invariant: never generate an ID with a timestamp older than the last generated ID.

Sequence — ID Generation FlowMermaid.js

sequenceDiagram participant App as Application participant Gen as ID Generator participant Clk as System Clock App->>Gen: generate_id() Gen->>Clk: current_time_ms() Clk-->>Gen: timestamp alt same ms as last call Gen->>Gen: sequence++ alt sequence > 4095 Gen->>Gen: spin-wait for next ms Gen->>Clk: current_time_ms() Clk-->>Gen: new timestamp Gen->>Gen: sequence = 0 end else new ms Gen->>Gen: sequence = 0 end Gen->>Gen: compose: sign|timestamp|machine|seq Gen-->>App: 64-bit ID App->>App: base62_encode(id) for URL use

Anti-patterns

Auto-increment INT from a single database

Single DB becomes a write bottleneck and SPOF. IDs are sequential and predictable — attackers can enumerate all records. Sharding breaks the sequence.

Better: Distributed generator (Snowflake) where each machine produces IDs independently with no central coordination.

Random UUID v4 as primary key in B-tree indexed tables

Fully random UUIDs fragment B-tree indexes — every insert goes to a random leaf page, causing excessive page splits and write amplification. 128 bits is also twice as wide as needed, wasting index space.

Better: Time-ordered IDs (Snowflake, UUID v7) that append to the end of the B-tree. 64 bits where possible.

Timestamp-only ID (no machine or sequence component)

Two machines generating an ID in the same millisecond produce identical values. Even on one machine, burst traffic within 1 ms causes collisions.

Better: Include machine_id + per-ms sequence counter to disambiguate within the same timestamp.

Key Design Decisions & Tradeoffs

Snowflake (64-bit)

Timestamp + machine_id + sequence in 64 bits

K-sortable, compact, 4,096 IDs/ms/machine. Requires a machine_id registry (Zookeeper/etcd) at boot time. Clock-dependent — NTP backward jump must be handled. The gold standard for high-throughput systems (Twitter, Discord, Instagram).

ULID / UUID v7 (128-bit)

Timestamp + random tail, no coordinator

No machine registration needed — randomness provides uniqueness. Wider (128 bits, 26 chars). Slight theoretical collision risk under extreme load within same ms. Better for systems that cannot run Zookeeper.

Local generation (no network)

Each machine generates IDs independently

Sub-microsecond latency; no SPOF on the hot path. Machine_id assigned once at startup. The Snowflake/ULID approach. All production systems use this.

Centralized ticket server

Dual-MySQL auto-increment (Flickr-style)

Simpler to reason about; strictly ordered. But every ID requires a network round-trip to the DB. Two servers with odd/even offset provide redundancy but not horizontal scale. Works at moderate throughput only.

Custom epoch

Start timestamp from a recent date, not Unix epoch

41-bit ms from Jan 1, 2020 lasts until ~2089. Starting from Unix epoch (1970) would waste 50 years of bits, reducing usable lifespan to ~19 years from now.

Unix epoch

Standard milliseconds since 1970

Universal, no custom epoch to document. But wastes precious timestamp bits on the past. Only viable with wider formats (ULID's 48 bits = 8,919 years).

What Can Go Wrong

Clock

NTP clock jump backward

NTP daemon corrects a drifted clock by stepping it backward. Snowflake generator now sees a timestamp older than the last ID it issued — generating would produce duplicate or out-of-order IDs.

Mitigation: refuse to generate IDs until clock catches up (spin-wait or error). Log alert for operator. Use NTP slew mode (gradual correction) instead of step mode.

Limit

Machine_id exhaustion

10-bit machine_id supports only 1,024 unique machines. In large Kubernetes clusters with frequent pod churn, IDs can be exhausted if not recycled properly.

Mitigation: use ephemeral Zookeeper nodes so machine_ids are reclaimed on pod death. Alternatively, hash pod IP + port into 10 bits (risk: collisions).

Down

Zookeeper unavailable at boot

New ID service instance starts but cannot reach Zookeeper to get its machine_id. Without a machine_id, it cannot generate unique IDs.

Mitigation: cache last assigned machine_id to local disk. On boot, use cached value if ZK is unreachable. Alternatively, derive machine_id from stable host identity (MAC address lower bits).

Burst

Sequence overflow within single millisecond

Extreme burst: one machine receives more than 4,096 ID requests in a single millisecond. The 12-bit sequence counter overflows.

Mitigation: spin-wait (busy loop) until the next millisecond, then reset sequence to 0. Adds at most 1 ms latency. Under sustained overload, add more machines to spread load.

Epoch

Timestamp bit exhaustion (year ~2089)

41-bit timestamp from a 2020 custom epoch rolls over around 2089. After rollover, IDs wrap to timestamp 0 and collide with historical IDs.

Mitigation: plan migration to wider format (128-bit) before rollover. Monitor remaining timestamp headroom. 69 years is enough for most systems — plan succession, not permanence.

Interview Tips

Start with requirements, not solutions. Ask: how many IDs/sec? Must they be sortable? How compact? 64-bit or 128-bit acceptable? This shapes the entire design.
Draw the bit layout. Literally sketch 1 + 41 + 10 + 12 = 64 on the whiteboard. Interviewers love seeing the tradeoff between timestamp range, machine count, and per-ms throughput.
Address clock skew proactively. Mention NTP backward jump before being asked. The defense (refuse or wait) shows you understand the real-world failure mode that trips up Snowflake.
Know the alternatives. Snowflake vs ULID vs UUID v7 vs Ticket Server. Each has a sweet spot. Naming all four and their tradeoffs demonstrates breadth.
Mention base62 encoding. 64-bit integer to ~11-char URL-safe string. Shows you think about the consumer of IDs, not just the generator.

Evolution

Auto-increment database column

Single MySQL/Postgres with BIGINT AUTO_INCREMENT. Simple, strictly ordered. Works to ~10K writes/sec on one machine. Cannot scale horizontally without splitting ranges.

UUID v4 (random)

No coordination, no central DB. But 128 bits (too wide), not sortable, fragments B-tree indexes. Good enough for low-write systems that don't need ordering.

Ticket servers (Flickr, 2010)

Dual-MySQL with odd/even auto-increment offsets. Round-robin between them for redundancy. Centralized but practical. Flickr used this for all photo IDs in production.

Snowflake (Twitter, 2010)

The breakthrough: 64-bit, k-sortable, fully distributed. Each machine generates independently. Became the industry standard. Used by Twitter, Discord, Instagram (with variations).

ULID / UUID v7 (modern, no coordinator)

Timestamp-first + random tail. No Zookeeper needed. UUID v7 standardized in RFC 9562 (2024). Best choice for new systems that want sortability without infrastructure overhead.

📺

References & Videos

Generating Unique IDs in Distributed Systems

ByteByteGo · 8 min

Snowflake ID — Distributed ID Generator

Arpit Bhayani · 20 min

Announcing Snowflake

Twitter Engineering

Design a Unique ID Generator

AlgoMaster

Next up

PROBLEM

URL Encoding / Base62

Compact encoding used for ID-to-string conversion

Read →

PROBLEM

Distributed Locking

Coordination primitives that ID generators avoid on hot path

Read →

Unique ID Generator

Requirements

Scale Estimation

API Design

High-Level Architecture

Deep Dive — Snowflake Bit Layout & Alternatives

Anti-patterns

Key Design Decisions & Tradeoffs

Timestamp + machine_id + sequence in 64 bits

Timestamp + random tail, no coordinator

Each machine generates IDs independently

Dual-MySQL auto-increment (Flickr-style)

Start timestamp from a recent date, not Unix epoch

Standard milliseconds since 1970

What Can Go Wrong

Interview Tips

Similar Problems

URL Encoding / Base62

Distributed Locking

URL Shortener

Count Active Users

Leaderboard

Evolution

Auto-increment database column

UUID v4 (random)

Ticket servers (Flickr, 2010)

Snowflake (Twitter, 2010)

ULID / UUID v7 (modern, no coordinator)

References & Videos

URL Encoding / Base62

Distributed Locking