Generate globally unique, k-sortable, compact identifiers at over 1 million IDs per second across N machines with zero coordination.
The hard parts: a bit-layout balancing timestamp, machine, and sequence that stays unique without a central authority,
handling clock skew from NTP corrections that can break monotonicity,
and keeping IDs compact enough (~11 chars base62) for URL-friendly use while encoding enough entropy to avoid collisions across 1,024 machines.
Generate globally unique 64-bit IDs from any machine, no central bottleneck
IDs must be k-sortable (roughly ordered by creation time)
Support batch generation: request N IDs in a single call
Parse an ID back into its components: timestamp, machine_id, sequence
Encode IDs as compact base62 strings (~11 chars) for URL use
Non-Functional
Sub-millisecond latency per ID generation (local operation)
Throughput: 4,096 IDs/ms per machine, 1M+ globally
No single point of failure for ID generation itself
IDs must not leak sensitive info (exact server count, internal topology)
Monotonically increasing within a single machine
Survive NTP clock adjustments without producing duplicates
03
Scale Estimation
Each machine can produce 4,096 IDs per millisecond (12-bit sequence). With 1,024 machines (10-bit machine_id), the system yields 4,096 x 1,024 x 1,000 = ~4.2 billion IDs/sec theoretical max. In practice, 1M+/sec sustained is the target.
A 41-bit timestamp in milliseconds covers 2^41 ms = ~69.7 years from a custom epoch. IDs are 64 bits = 8 bytes; base62-encoded they become ~11 characters.
64 bits
ID size (8 bytes)
~11 chars
base62-encoded length
41 bits
timestamp (ms, ~69 years)
10 bits
machine_id (1,024 machines)
12 bits
sequence (4,096/ms/machine)
4.2B/sec
theoretical max throughput
< 1 ms
latency per ID (no network)
~69.7 years
before timestamp rollover
04
API Design
POST/ids/generate?count=N
Generate N unique IDs (default 1, max 1000). Returns array of 64-bit IDs as both integer and base62 string. Each ID encodes timestamp + machine_id + sequence.
GET/ids/parse/{id}
Decode a 64-bit ID (integer or base62) into its components: { timestamp, machine_id, sequence, created_at_utc }. Useful for debugging and audit.
GET/ids/health
Returns generator status: current machine_id, last timestamp, sequence position, clock health (detects backward jumps).
05
High-Level Architecture
The ID service is stateless per-request — each instance holds only its machine_id and a local clock + sequence counter. No database, no network call on the hot path.
ID Service instances: one per application host (or sidecar). Generates IDs locally with zero network overhead.
Zookeeper / etcd: assigns unique machine_ids on startup. Only needed at boot, not on the hot path.
Alternative — Ticket Servers: dual-MySQL with odd/even auto-increment (Flickr's design). Centralized but simpler; acceptable at lower scale.
Architecture — Snowflake ID GenerationSVG
Request Flow — Step Through
Application · needs unique ID→ID Generator · local, stateless→System Clock · ms timestamp→Sequence Counter · 12-bit, per-ms→Bit Composer · sign|ts|machine|seq→Base62 Encoder · ~11 chars output→Consumer · URL, DB key, etc.
Click Next Step to walk through the request flow.
06
Deep Dive — Snowflake Bit Layout & Alternatives
The Core Question
How do you pack enough information into 64 bits to guarantee global uniqueness, preserve time-ordering, and avoid any coordination between machines on the hot path?
Snowflake (Twitter, 2010). The canonical design: 1 sign bit + 41-bit timestamp (ms since custom epoch, ~69 years) + 10-bit machine_id (1,024 machines) + 12-bit sequence (4,096 IDs per ms per machine). Total: 64 bits. IDs are k-sortable because the timestamp occupies the most-significant bits. Each machine generates IDs independently; the machine_id guarantees no collisions across hosts.
ULID (Universally Unique Lexicographically Sortable Identifier). 128 bits: 48-bit timestamp (ms, ~8,919 years) + 80-bit cryptographically random tail. No machine registration needed — randomness provides uniqueness. Tradeoff: slight collision probability (birthday problem at ~2^40 IDs/ms, negligible in practice). Encodes to 26 chars Crockford base32.
UUID v7 (RFC 9562). 128-bit, timestamp-first layout for B-tree index friendliness. Replaces UUID v4's fully-random approach with time-ordered prefix. Wider than Snowflake (128 vs 64 bits) but standardized and needs no coordinator.
Ticket Servers (Flickr, 2010). Two MySQL instances: one auto-increments by 2 starting at 1 (odd), the other starting at 2 (even). App round-robins between them. Simple, no clock dependency, but centralized — each MySQL is a SPOF risk. Flickr ran this in production for years.
Clock skew handling. Snowflake relies on the local clock. If NTP corrects the clock backward, the generator would produce IDs with a past timestamp — risking duplicates. Defense: if current_ms < last_ms, either refuse to generate (throw error) or spin-wait until the clock catches up. Critical invariant: never generate an ID with a timestamp older than the last generated ID.
Sequence — ID Generation FlowMermaid.js
sequenceDiagram
participant App as Application
participant Gen as ID Generator
participant Clk as System Clock
App->>Gen: generate_id()
Gen->>Clk: current_time_ms()
Clk-->>Gen: timestamp
alt same ms as last call
Gen->>Gen: sequence++
alt sequence > 4095
Gen->>Gen: spin-wait for next ms
Gen->>Clk: current_time_ms()
Clk-->>Gen: new timestamp
Gen->>Gen: sequence = 0
end
else new ms
Gen->>Gen: sequence = 0
end
Gen->>Gen: compose: sign|timestamp|machine|seq
Gen-->>App: 64-bit ID
App->>App: base62_encode(id) for URL use
07
Anti-patterns
X
Auto-increment INT from a single database
Single DB becomes a write bottleneck and SPOF. IDs are sequential and predictable — attackers can enumerate all records. Sharding breaks the sequence.
Better: Distributed generator (Snowflake) where each machine produces IDs independently with no central coordination.
X
Random UUID v4 as primary key in B-tree indexed tables
Fully random UUIDs fragment B-tree indexes — every insert goes to a random leaf page, causing excessive page splits and write amplification. 128 bits is also twice as wide as needed, wasting index space.
Better: Time-ordered IDs (Snowflake, UUID v7) that append to the end of the B-tree. 64 bits where possible.
X
Timestamp-only ID (no machine or sequence component)
Two machines generating an ID in the same millisecond produce identical values. Even on one machine, burst traffic within 1 ms causes collisions.
Better: Include machine_id + per-ms sequence counter to disambiguate within the same timestamp.
08
Key Design Decisions & Tradeoffs
Snowflake (64-bit)
Timestamp + machine_id + sequence in 64 bits
K-sortable, compact, 4,096 IDs/ms/machine. Requires a machine_id registry (Zookeeper/etcd) at boot time. Clock-dependent — NTP backward jump must be handled. The gold standard for high-throughput systems (Twitter, Discord, Instagram).
ULID / UUID v7 (128-bit)
Timestamp + random tail, no coordinator
No machine registration needed — randomness provides uniqueness. Wider (128 bits, 26 chars). Slight theoretical collision risk under extreme load within same ms. Better for systems that cannot run Zookeeper.
Local generation (no network)
Each machine generates IDs independently
Sub-microsecond latency; no SPOF on the hot path. Machine_id assigned once at startup. The Snowflake/ULID approach. All production systems use this.
Centralized ticket server
Dual-MySQL auto-increment (Flickr-style)
Simpler to reason about; strictly ordered. But every ID requires a network round-trip to the DB. Two servers with odd/even offset provide redundancy but not horizontal scale. Works at moderate throughput only.
Custom epoch
Start timestamp from a recent date, not Unix epoch
41-bit ms from Jan 1, 2020 lasts until ~2089. Starting from Unix epoch (1970) would waste 50 years of bits, reducing usable lifespan to ~19 years from now.
Unix epoch
Standard milliseconds since 1970
Universal, no custom epoch to document. But wastes precious timestamp bits on the past. Only viable with wider formats (ULID's 48 bits = 8,919 years).
09
What Can Go Wrong
Clock
NTP clock jump backward
NTP daemon corrects a drifted clock by stepping it backward. Snowflake generator now sees a timestamp older than the last ID it issued — generating would produce duplicate or out-of-order IDs.
Mitigation: refuse to generate IDs until clock catches up (spin-wait or error). Log alert for operator. Use NTP slew mode (gradual correction) instead of step mode.
Limit
Machine_id exhaustion
10-bit machine_id supports only 1,024 unique machines. In large Kubernetes clusters with frequent pod churn, IDs can be exhausted if not recycled properly.
Mitigation: use ephemeral Zookeeper nodes so machine_ids are reclaimed on pod death. Alternatively, hash pod IP + port into 10 bits (risk: collisions).
Down
Zookeeper unavailable at boot
New ID service instance starts but cannot reach Zookeeper to get its machine_id. Without a machine_id, it cannot generate unique IDs.
Mitigation: cache last assigned machine_id to local disk. On boot, use cached value if ZK is unreachable. Alternatively, derive machine_id from stable host identity (MAC address lower bits).
Burst
Sequence overflow within single millisecond
Extreme burst: one machine receives more than 4,096 ID requests in a single millisecond. The 12-bit sequence counter overflows.
Mitigation: spin-wait (busy loop) until the next millisecond, then reset sequence to 0. Adds at most 1 ms latency. Under sustained overload, add more machines to spread load.
Epoch
Timestamp bit exhaustion (year ~2089)
41-bit timestamp from a 2020 custom epoch rolls over around 2089. After rollover, IDs wrap to timestamp 0 and collide with historical IDs.
Mitigation: plan migration to wider format (128-bit) before rollover. Monitor remaining timestamp headroom. 69 years is enough for most systems — plan succession, not permanence.
10
Interview Tips
Start with requirements, not solutions. Ask: how many IDs/sec? Must they be sortable? How compact? 64-bit or 128-bit acceptable? This shapes the entire design.
Draw the bit layout. Literally sketch 1 + 41 + 10 + 12 = 64 on the whiteboard. Interviewers love seeing the tradeoff between timestamp range, machine count, and per-ms throughput.
Address clock skew proactively. Mention NTP backward jump before being asked. The defense (refuse or wait) shows you understand the real-world failure mode that trips up Snowflake.
Know the alternatives. Snowflake vs ULID vs UUID v7 vs Ticket Server. Each has a sweet spot. Naming all four and their tradeoffs demonstrates breadth.
Mention base62 encoding. 64-bit integer to ~11-char URL-safe string. Shows you think about the consumer of IDs, not just the generator.
Single MySQL/Postgres with BIGINT AUTO_INCREMENT. Simple, strictly ordered. Works to ~10K writes/sec on one machine. Cannot scale horizontally without splitting ranges.
2
UUID v4 (random)
No coordination, no central DB. But 128 bits (too wide), not sortable, fragments B-tree indexes. Good enough for low-write systems that don't need ordering.
3
Ticket servers (Flickr, 2010)
Dual-MySQL with odd/even auto-increment offsets. Round-robin between them for redundancy. Centralized but practical. Flickr used this for all photo IDs in production.
4
Snowflake (Twitter, 2010)
The breakthrough: 64-bit, k-sortable, fully distributed. Each machine generates independently. Became the industry standard. Used by Twitter, Discord, Instagram (with variations).
5
ULID / UUID v7 (modern, no coordinator)
Timestamp-first + random tail. No Zookeeper needed. UUID v7 standardized in RFC 9562 (2024). Best choice for new systems that want sortability without infrastructure overhead.