Designing a Distributed ID Generator: Snowflake Algorithm, ULID & UUID v7 at Scale
Every distributed system needs globally unique identifiers — for database primary keys, request tracing, event IDs, and distributed transactions. The choice of ID generation strategy affects database index performance, sortability, URL readability, and system scalability. This guide covers every approach you'll encounter in a system design interview or production deployment.
TL;DR — Quick Decision Guide
"Use Snowflake for time-ordered, high-throughput IDs (Twitter, Discord, Uber scale). Use ULID or UUID v7 for sortable, URL-safe IDs without coordination. Use DB auto-increment only for single-node systems. Never use UUID v4 as a primary key in a B-tree index — random insertion destroys page locality."
Table of Contents
- Requirements for a Distributed ID System
- Why Not Database Auto-Increment?
- UUID v4 — The Naive Approach & Its Pitfalls
- Twitter Snowflake — 64-Bit Time-Ordered IDs
- Clock Skew & Sequence Overflow Handling
- Machine ID Registry with ZooKeeper
- ULID — Lexicographically Sortable IDs
- UUID v7 — The 2026 Standard
- Comparison Table & Decision Framework
- Impact on Database Index Performance
- Production Deployment & Conclusion
1. Requirements for a Distributed ID System
- Globally unique: No two IDs ever collide across all nodes, datacenters, and time
- High throughput: Generate millions of IDs per second without coordination bottlenecks
- No single point of failure: ID generation cannot depend on a centralized service
- Sortable (optional but valuable): IDs generated later sort higher — critical for pagination and range queries
- Compact: Fits in 64 or 128 bits — avoids excessive index bloat
- Low latency: ID generation should take < 1ms — must not become a bottleneck
2. Why Not Database Auto-Increment?
The simplest approach — BIGSERIAL or AUTO_INCREMENT — works perfectly for single-node deployments. It breaks in distributed systems for several reasons:
- Single point of failure: All ID generation routes through one database node — any downtime means zero ID generation
- Write bottleneck: At 100K inserts/sec, the sequence generator becomes the bottleneck
- Cross-shard conflicts: Sharding introduces duplicate IDs unless you use separate sequences per shard — breaks global uniqueness
- Predictability: Sequential IDs expose business metrics (number of users, orders) through URL enumeration
Ticket server workaround: A dedicated "ticket server" (Flickr's approach) generates auto-increment IDs in a single database, avoiding sharding conflicts. Still a single point of failure — needs careful HA setup. Only viable below ~10K req/sec.
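The "separate sequences per shard" idea can be made collision-free by interleaving: shard k of N issues k, k + N, k + 2N, and so on. The sketch below is a minimal in-memory illustration; the class name `StridedSequence` is hypothetical, and a real deployment would persist the counter in the shard's database rather than in process memory.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: shard k of n issues k, k + n, k + 2n, ... so shards never overlap. */
final class StridedSequence {
    private final long stride;
    private final AtomicLong next;

    StridedSequence(int shardIndex, int shardCount) {
        if (shardCount <= 0 || shardIndex < 0 || shardIndex >= shardCount) {
            throw new IllegalArgumentException("shardIndex must be in [0, shardCount)");
        }
        this.stride = shardCount;
        this.next = new AtomicLong(shardIndex);
    }

    /** Returns the next ID owned by this shard. */
    long nextId() {
        return next.getAndAdd(stride);
    }
}
```

IDs stay unique across shards, but they are only ordered within a shard, and changing the shard count means re-planning the strides.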
3. UUID v4 — The Naive Approach & Its Pitfalls
UUID v4 (128-bit random) is the most widely used "quick fix" for distributed IDs. It's easy to generate in any language without coordination and has astronomically low collision probability (2^122 unique values).
// UUID v4: 128-bit random
550e8400-e29b-41d4-a716-446655440000
// Format: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
// "4" = version 4; y = 8,9,a,b (variant bits)
// ⚠️ Problem: Completely random → NOT sortable
// Inserting into a B-tree index scatters writes across the tree and causes frequent page splits
// Benchmark: UUID v4 PK → 3-5× slower inserts vs auto-increment at 1M rows
// Page utilization drops to 50-60% vs 99% for sequential IDs
UUID v4 Anti-Pattern: Primary Key in B-Tree Index
Using UUID v4 as a clustered primary key (the default in MySQL InnoDB) causes index fragmentation. Since UUIDs are random, every insert potentially lands in the middle of an existing page, causing a page split and leaving pages half-empty. At 100M rows, this can cause 5–10× more disk I/O than sequential IDs.
Mitigation: Use UUID v4 only as a secondary unique key; keep an auto-increment surrogate as the clustered primary key. Or switch to UUID v7 / ULID (time-ordered).
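To make the "astronomically low" collision claim concrete, here is a small sketch using only the JDK. `UuidV4Demo` and its method names are illustrative, and the probability uses the standard birthday-bound approximation p ≈ n² / (2 · 2^122), which is an estimate rather than an exact figure.

```java
import java.util.UUID;

final class UuidV4Demo {
    /** Birthday-bound estimate of a collision among n random v4 UUIDs (122 random bits). */
    static double collisionProbability(double n) {
        return (n * n) / (2.0 * Math.pow(2.0, 122));
    }

    /** True if the value carries the version 4 marker and the RFC variant bits. */
    static boolean isV4(UUID id) {
        return id.version() == 4 && id.variant() == 2;
    }

    public static void main(String[] args) {
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();
        System.out.println(first + " (v4: " + isV4(first) + ")");
        System.out.println(second + " (v4: " + isV4(second) + ")");
        // No ordering relationship: the second value may sort before or after the first.
        System.out.println("second sorts after first: "
                + (first.toString().compareTo(second.toString()) < 0));
        System.out.println("p(collision) for 1e9 IDs ~ " + collisionProbability(1e9));
    }
}
```

Uniqueness is not the issue with UUID v4; the lack of any ordering between consecutive values is what hurts B-tree inserts.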
4. Twitter Snowflake — 64-Bit Time-Ordered IDs
Twitter's Snowflake (2010) meets all of these requirements: IDs are globally unique and time-sortable, need no coordination per ID, and can be issued at up to 4,096 per millisecond per machine, roughly 4.2 million per millisecond across 1,024 machines.
Snowflake Implementation
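public class Snowflake {
    // Twitter Snowflake epoch: Nov 4, 2010 at 01:42:54 UTC
    private static final long EPOCH = 1288834974657L;
    private static final long DATACENTER_BITS = 5;
    private static final long MACHINE_BITS = 5;
    private static final long SEQUENCE_BITS = 12;
    private static final long MAX_DATACENTER_ID = (1L << DATACENTER_BITS) - 1; // 31
    private static final long MAX_MACHINE_ID = (1L << MACHINE_BITS) - 1; // 31
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095
    private static final long MACHINE_SHIFT = SEQUENCE_BITS; // 12
    private static final long DATACENTER_SHIFT = SEQUENCE_BITS + MACHINE_BITS; // 17
    private static final long TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    private final long datacenterId;
    private final long machineId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    public Snowflake(long datacenterId, long machineId) {
        if (datacenterId < 0 || datacenterId > MAX_DATACENTER_ID
                || machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("datacenterId and machineId must be in [0, 31]");
        }
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long timestamp = currentMs();
        if (timestamp < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards! Refusing to generate ID");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                timestamp = waitForNextMs(lastTimestamp); // sequence exhausted: wait
            }
        } else {
            sequence = 0L; // new millisecond: reset sequence
        }
        lastTimestamp = timestamp;
        return ((timestamp - EPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (machineId << MACHINE_SHIFT)
                | sequence;
    }

    private long currentMs() {
        return System.currentTimeMillis();
    }

    // Spin until the wall clock moves past the given millisecond
    private long waitForNextMs(long lastTimestamp) {
        long timestamp = currentMs();
        while (timestamp <= lastTimestamp) {
            timestamp = currentMs();
        }
        return timestamp;
    }
}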
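Because every field sits at a fixed bit offset, a Snowflake ID can be taken apart again with shifts and masks. The following sketch assumes the same layout and epoch as the implementation above; `SnowflakeDecoder` and its method names are illustrative and not part of any Twitter library.

```java
final class SnowflakeDecoder {
    static final long EPOCH = 1288834974657L; // same epoch as the generator
    static final int SEQUENCE_BITS = 12;
    static final int MACHINE_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int MACHINE_SHIFT = SEQUENCE_BITS;                       // 12
    static final int DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS;     // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    /** Packs the four fields into one 64-bit ID (mirrors the generator's return expression). */
    static long compose(long timestampMs, long datacenterId, long machineId, long sequence) {
        return ((timestampMs - EPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (machineId << MACHINE_SHIFT)
                | sequence;
    }

    /** Unix time in milliseconds at which the ID was generated. */
    static long timestampMs(long id) {
        return (id >>> TIMESTAMP_SHIFT) + EPOCH;
    }

    static long datacenterId(long id) {
        return (id >>> DATACENTER_SHIFT) & ((1L << DATACENTER_BITS) - 1);
    }

    static long machineId(long id) {
        return (id >>> MACHINE_SHIFT) & ((1L << MACHINE_BITS) - 1);
    }

    static long sequence(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }
}
```

Since the timestamp occupies the highest bits, any ID from a later millisecond compares greater than any ID from an earlier one, regardless of machine or sequence.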
Properties of Snowflake IDs
- ✅ Time-ordered: IDs generated later are always numerically larger (within same machine)
- ✅ No coordination: Each machine generates IDs independently using its machine ID
- ✅ Max throughput: 4,096 IDs/ms/machine × 1,024 machines ≈ 4.2 million IDs/ms (about 4.2 billion IDs/sec) globally
- ✅ 64-bit integer — fits in a BIGINT column (8 bytes) — same as a regular DB PK
- ⚠️ Requires machine ID assignment — needs a registry (ZooKeeper, etcd)
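- ⚠️ Long in decimal form (up to 19 digits) and above JavaScript's 2^53 safe-integer limit: typically encoded as base36 or sent as a string in APIs, and kept as BIGINT in the database
- ⚠️ The 41-bit timestamp overflows ~69 years after the chosen epoch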
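The throughput and lifetime figures follow directly from the bit widths. The sketch below works them out, assuming a 41-bit timestamp (64 bits minus the sign bit and the 22 lower bits) and the Twitter epoch used earlier; the class name `SnowflakeCapacity` is illustrative.

```java
import java.time.Instant;

final class SnowflakeCapacity {
    static final long EPOCH = 1288834974657L; // Twitter epoch from the implementation above
    static final int TIMESTAMP_BITS = 41;     // assumption: 64 - 1 sign bit - 22 lower bits
    static final int WORKER_BITS = 10;        // 5 datacenter bits + 5 machine bits
    static final int SEQUENCE_BITS = 12;

    /** Global ceiling: sequence values per millisecond times the number of workers. */
    static long idsPerMillisecond() {
        return (1L << SEQUENCE_BITS) * (1L << WORKER_BITS);
    }

    /** How long the timestamp field lasts, in years of 365.25 days. */
    static double lifetimeYears() {
        return (double) (1L << TIMESTAMP_BITS) / (1000.0 * 60 * 60 * 24 * 365.25);
    }

    /** The last instant the timestamp field can represent for this epoch. */
    static Instant lastRepresentableInstant() {
        return Instant.ofEpochMilli(EPOCH + (1L << TIMESTAMP_BITS) - 1);
    }

    public static void main(String[] args) {
        System.out.println("IDs per ms (global): " + idsPerMillisecond());
        System.out.println("Lifetime in years:   " + lifetimeYears());
        System.out.println("Timestamp runs out:  " + lastRepresentableInstant());
    }
}
```

Picking a custom epoch close to the service's launch date shifts that end date later by the same amount.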
5. Clock Skew & Sequence Overflow Handling
Clock Skew Defense Strategies
- Detect and wait: If currentTime < lastTimestamp, wait for the clock to catch up before generating the next ID. Acceptable for small skew (<5ms).
- Detect and throw: For large backward jumps (> 500ms), throw an exception and alert; a jump this size likely indicates a serious NTP misconfiguration.
- Hybrid Logical Clocks (HLC): Advanced approach used by CockroachDB. It tracks both wall time and logical time to handle NTP drift without waiting or failing. Best for globally distributed systems.
- NTP configuration: Run ntpd with the -x flag, which raises the step threshold to 600 s so that smaller offsets are corrected by gradual slewing instead of a step. This prevents most backward jumps.
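The first two strategies combine naturally into one guard: wait out a small regression, reject a large one. The sketch below is one possible shape, not a reference implementation. It collapses the two thresholds from the list into a single `MAX_WAIT_MS` limit (an assumption), and takes the clock as a `LongSupplier` so the behavior can be tested without touching the system clock.

```java
import java.util.function.LongSupplier;

final class ClockSkewGuard {
    /** Assumed policy: regressions up to this many milliseconds are waited out. */
    static final long MAX_WAIT_MS = 5;

    /**
     * Returns a timestamp that is not earlier than lastTimestamp.
     * Small backward skew is absorbed by spinning; larger skew is rejected.
     */
    static long awaitMonotonic(LongSupplier clock, long lastTimestamp) {
        long now = clock.getAsLong();
        long skew = lastTimestamp - now;
        if (skew <= 0) {
            return now; // clock is at or ahead of the last issued timestamp
        }
        if (skew > MAX_WAIT_MS) {
            throw new IllegalStateException("Clock moved backwards by " + skew + " ms");
        }
        while (now < lastTimestamp) {
            Thread.onSpinWait(); // hint to the CPU that this is a busy-wait loop
            now = clock.getAsLong();
        }
        return now;
    }
}
```

A generator would call `ClockSkewGuard.awaitMonotonic(System::currentTimeMillis, lastTimestamp)` in place of a bare clock read, and surface the exception to monitoring.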
Sequence Overflow
When all 4,096 sequence numbers are exhausted within a single millisecond, the generator must wait for the next millisecond before issuing more IDs. This creates brief burst pauses — acceptable for most workloads. If 4,096 IDs/ms per machine is consistently insufficient, add more machines (scale out, don't scale up).
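The overflow rule is easiest to see with a controllable clock. This sketch reproduces only the sequence logic of the generator and injects the time source, so a test can hold the clock on one millisecond; `SequenceOverflowDemo` is an illustrative name, and the spin-wait is one of several reasonable ways to pause.

```java
import java.util.function.LongSupplier;

final class SequenceOverflowDemo {
    static final long MAX_SEQUENCE = 4095; // 12-bit sequence field

    private final LongSupplier clock;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SequenceOverflowDemo(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns {timestamp, sequence} for the next ID, pausing when the sequence wraps. */
    synchronized long[] next() {
        long timestamp = clock.getAsLong();
        if (timestamp < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4,096 values used in this millisecond: wait for the next one.
                while ((timestamp = clock.getAsLong()) <= lastTimestamp) {
                    Thread.onSpinWait();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        return new long[] {timestamp, sequence};
    }
}
```

With a clock held at one millisecond, the 4,096th call returns sequence 4095 and the next call only returns once the clock has advanced, with the sequence back at 0.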
6. Machine ID Registry with ZooKeeper
The Snowflake machine ID (5-bit = 0 to 31 per datacenter) must be uniquely assigned to each running instance. Manual assignment doesn't scale — use ZooKeeper or etcd for automatic assignment:
// ZooKeeper machine ID registration (on service startup)
import java.net.InetAddress;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

void registerMachineId() throws Exception {
    ZooKeeper zk = new ZooKeeper("zk-cluster:2181", 3000, null);
    String path = "/snowflake/workers/";
    // Create ephemeral sequential node; ZK appends a zero-padded sequential suffix
    String nodePath = zk.create(
        path + "worker-",
        InetAddress.getLocalHost().getHostName().getBytes(),
        ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL_SEQUENTIAL // auto-deleted when the session ends
    );
    // Map the sequential suffix onto the 5-bit machine ID field (0-31 per datacenter).
    // Caveat: modulo alone does not guarantee uniqueness. Suffix 32 maps to the same
    // ID as suffix 0, which may still be held by a live worker.
    int machineId = Integer.parseInt(nodePath.substring(nodePath.lastIndexOf('-') + 1)) % 32;
    this.snowflake = new Snowflake(datacenterId, machineId);
}
// When the process dies, its session expires and ZK deletes the ephemeral node
// Next startup gets a new (potentially different) machine ID; that's OK
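Deriving the ID from a sequential suffix with a modulo can hand the same ID to two live workers once the counter wraps. One alternative is to claim an explicit slot: try slot 0, 1, 2, ... and keep the first one that is free. The sketch below models that with an in-memory map so it runs without a cluster; `WorkerSlotRegistry` is hypothetical. With ZooKeeper, the `putIfAbsent` step would correspond to creating an ephemeral (non-sequential) node at a fixed per-slot path and treating a `NodeExistsException` as "slot taken".

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a coordination service; illustrates the claim protocol only. */
final class WorkerSlotRegistry {
    static final int MAX_WORKERS = 1024; // 10 worker bits: 5 datacenter + 5 machine

    private final Map<Integer, String> slots = new ConcurrentHashMap<>();

    /** Claims the lowest free slot for the given owner, or returns empty if all are taken. */
    OptionalInt claim(String owner) {
        for (int slot = 0; slot < MAX_WORKERS; slot++) {
            if (slots.putIfAbsent(slot, owner) == null) {
                return OptionalInt.of(slot);
            }
        }
        return OptionalInt.empty();
    }

    /** Releases a slot, but only if it is still held by this owner. */
    void release(int slot, String owner) {
        slots.remove(slot, owner);
    }

    /** Upper 5 bits of the 10-bit worker ID. */
    static int datacenterId(int workerId) {
        return workerId >>> 5;
    }

    /** Lower 5 bits of the 10-bit worker ID. */
    static int machineId(int workerId) {
        return workerId & 0x1F;
    }
}
```

A slot is reused only after its previous holder has released it, which is the property the modulo approach lacks.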
7. ULID — Lexicographically Sortable IDs
ULID (Universally Unique Lexicographically Sortable Identifier) is a 128-bit ID designed as a UUID replacement that is time-sortable and URL-friendly.
// ULID format: 26-character Crockford base32
01ARZ3NDEKTSV4RRFFQ69G5FAV
// Layout: [10-char timestamp][16-char random]
// 01ARZ3NDEK = 48-bit Unix timestamp in milliseconds
// TSV4RRFFQ69G5FAV = 80-bit random component
// Properties:
// - Monotonic within the same millisecond when the generator runs in monotonic mode (random component incremented)
// - URL-safe (no hyphens, no special characters)
// - Case-insensitive
// - 128 bits = same size as a UUID (80 random bits per millisecond vs. 122 in UUID v4)
// Java library
import de.huxhorn.sulky.ulid.ULID;
ULID ulid = new ULID();
String id = ulid.nextULID(); // e.g. "01ARZ3NDEKTSV4RRFFQ69G5FAV"
ULID Monotonicity Guarantee
When a monotonic generator (an optional mode in the ULID spec) produces multiple ULIDs in the same millisecond, it increments the random component by 1 rather than regenerating it. This guarantees strict monotonic ordering within the same process, making ULIDs safe to use as B-tree primary keys without fragmentation. Across processes, ordering holds only at millisecond granularity.
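The layout and the increment rule fit in a few lines. What follows is a from-scratch sketch for illustration, not the API of any ULID library: `UlidSketch` is a hypothetical class, the 80 random bits are held as two 40-bit halves for simplicity, and clock regressions are not handled.

```java
import java.security.SecureRandom;

final class UlidSketch {
    private static final char[] ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".toCharArray();
    private static final long MASK_40 = (1L << 40) - 1;

    private final SecureRandom random = new SecureRandom();
    private long lastTime = -1L;
    private long hi; // upper 40 bits of the random component
    private long lo; // lower 40 bits of the random component

    /** Encodes the low (5 * chars) bits of value as Crockford base32, most significant first. */
    static String encode(long value, int chars) {
        char[] out = new char[chars];
        for (int i = chars - 1; i >= 0; i--) {
            out[i] = ALPHABET[(int) (value & 31)];
            value >>>= 5;
        }
        return new String(out);
    }

    /** Returns a 26-character ULID; repeated calls with the same timestamp increment the random part. */
    synchronized String next(long timestampMs) {
        if (timestampMs == lastTime) {
            lo = (lo + 1) & MASK_40;
            if (lo == 0) {
                hi = (hi + 1) & MASK_40; // carry into the upper half
                if (hi == 0) {
                    throw new IllegalStateException("random component overflow in one millisecond");
                }
            }
        } else {
            lastTime = timestampMs;
            hi = random.nextLong() & MASK_40;
            lo = random.nextLong() & MASK_40;
        }
        return encode(timestampMs, 10) + encode(hi, 8) + encode(lo, 8);
    }
}
```

Because the Crockford alphabet is in ascending ASCII order, plain string comparison of two ULIDs matches their numeric order, which is what makes them sort correctly in a text or binary column.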
8. UUID v7 — The 2026 Standard
UUID v7 (RFC 9562, finalized 2024) is the modern replacement for UUID v4. It combines a millisecond-precision timestamp in the high bits with random data in the low bits, making it time-sortable while maintaining UUID compatibility.
// UUID v7 format: xxxxxxxx-xxxx-7xxx-yxxx-xxxxxxxxxxxx
// 018e2462-d8f0-7c3a-b4f2-8d1e9a3c2f01
// ^-- 48-bit unix_ts_ms (sortable) ^-- 74-bit random
// Java (java-uuid-generator library, com.fasterxml.uuid)
import com.fasterxml.uuid.Generators;
import java.util.UUID;
UUID uuid7 = Generators.timeBasedEpochGenerator().generate();
// → e.g. 018e2462-d8f0-7c3a-b4f2-8d1e9a3c2f01
// PostgreSQL (native uuidv7() function in PostgreSQL 18+)
SELECT uuidv7();
// Advantages over UUID v4:
// ✅ Time-sortable → no B-tree fragmentation
// ✅ UUID-compatible → works with existing UUID columns
// ✅ No coordination required (no machine ID registry)
// ✅ Growing support: PostgreSQL 18 (uuidv7()), java-uuid-generator; elsewhere, generate the value in the application
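For environments without a library or a native function, the bit layout is simple enough to assemble by hand. The sketch below follows the RFC 9562 field order (48-bit timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits) using only the JDK; `UuidV7Sketch` is an illustrative name, and the optional sub-millisecond counter from the RFC is left out.

```java
import java.security.SecureRandom;
import java.util.UUID;

final class UuidV7Sketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Builds a version 7 UUID for the given Unix time in milliseconds. */
    static UUID generate(long unixMs) {
        long randA = RANDOM.nextInt(1 << 12);                  // 12 random bits
        long msb = ((unixMs & 0xFFFFFFFFFFFFL) << 16)          // 48-bit timestamp
                | (7L << 12)                                   // version 7
                | randA;
        long randB = RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL;  // 62 random bits
        long lsb = 0x8000000000000000L | randB;                // variant bits "10"
        return new UUID(msb, lsb);
    }

    /** Recovers the embedded Unix timestamp in milliseconds. */
    static long timestampMs(UUID id) {
        return id.getMostSignificantBits() >>> 16;
    }
}
```

Calling `UuidV7Sketch.generate(System.currentTimeMillis())` yields values whose leading 12 hex digits grow over time, so rows inserted later land at the right-hand edge of the index.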
9. Comparison Table & Decision Framework
| Approach | Size | Sortable | Coordination | Best For |
|---|---|---|---|---|
| DB Auto-Increment | 8B | ✅ Sequential | DB (SPOF) | Single-node DB |
| UUID v4 | 16B | ❌ Random | None | Secondary keys only |
| Snowflake (64-bit) | 8B | ✅ Time-ordered | Machine ID registry | Twitter/Discord scale |
| ULID | 16B (26 chars) | ✅ Lexicographic | None | URL-safe sortable IDs |
| UUID v7 | 16B | ✅ Time-ordered | None | Modern APIs (2026+) |
10. Impact on Database Index Performance
The choice of ID type dramatically affects B-tree index performance — arguably more than any other single database design decision at scale.
Random vs. Sequential Insert Benchmark
| ID Type | Insert Speed (1M rows) | Index Fragmentation | Page Fill % |
|---|---|---|---|
| Auto-increment | Baseline (fastest) | None | ~99% |
| UUID v4 | 3–5× slower | High (page splits) | 50–60% |
| Snowflake | ~1.1× slower | Minimal | ~95% |
| UUID v7 / ULID | ~1.1× slower | Minimal | ~95% |
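The gap in page fill can be reproduced qualitatively with a toy model of leaf pages. The sketch below is not a storage engine and will not reproduce the exact figures in the table. It assumes fixed-capacity pages, a half split when a key lands inside a full page, and a fresh page when a key is appended past the end of the rightmost page; `PageFillSimulation` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Toy model of B-tree leaf pages, for illustration only. */
final class PageFillSimulation {

    /** Inserts keys in the given order and returns keys stored / total page capacity. */
    static double fillFactor(long[] keys, int capacity) {
        // Leaf pages indexed by their lower-bound key; each page holds a sorted list.
        TreeMap<Long, ArrayList<Long>> pages = new TreeMap<>();
        pages.put(Long.MIN_VALUE, new ArrayList<>());
        long stored = 0;
        for (long key : keys) {
            Map.Entry<Long, ArrayList<Long>> entry = pages.floorEntry(key);
            ArrayList<Long> page = entry.getValue();
            int found = Collections.binarySearch(page, key);
            if (found >= 0) {
                continue; // ignore duplicates
            }
            int index = -found - 1;
            stored++;
            if (page.size() < capacity) {
                page.add(index, key);
            } else if (index == page.size() && pages.higherKey(entry.getKey()) == null) {
                // Append past the end of the rightmost page: open a fresh page, no split.
                ArrayList<Long> fresh = new ArrayList<>();
                fresh.add(key);
                pages.put(key, fresh);
            } else {
                // Insert into the middle of a full page: split it in half.
                int mid = page.size() / 2;
                ArrayList<Long> right = new ArrayList<>(page.subList(mid, page.size()));
                page.subList(mid, page.size()).clear();
                pages.put(right.get(0), right);
                if (index <= mid) {
                    page.add(index, key);
                } else {
                    right.add(index - mid, key);
                }
            }
        }
        return (double) stored / ((long) pages.size() * capacity);
    }

    static long[] sequentialKeys(int n) {
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = i;
        }
        return keys;
    }

    static long[] randomKeys(int n, long seed) {
        Random random = new Random(seed);
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = random.nextLong();
        }
        return keys;
    }
}
```

In this model, sequential keys keep pages essentially full, while random keys typically leave them around two-thirds full because every split produces two half-empty pages.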
Practical Recommendation
- If migrating from UUID v4 primary keys: Add a UUID v7 column and re-cluster the index. Migration is worth the effort at > 10M rows.
- New services: Default to UUID v7 (native uuidv7() in PostgreSQL 18+, or generated in the application) or Snowflake (for compact 64-bit keys)
- Run REINDEX or OPTIMIZE TABLE periodically if stuck with UUID v4 for legacy reasons
11. Production Deployment & Conclusion
Deployment Patterns
- Embedded (recommended for <100 services): Embed Snowflake library in each service instance; assign machine IDs via ZooKeeper on startup. Zero network hop, maximum throughput.
- Sidecar: Deploy as a gRPC sidecar per pod in Kubernetes. Services call localhost:50051/nextId. Good for polyglot environments.
- Centralized ID service: Single gRPC service per datacenter. Simplest ops but adds ~1ms latency per ID request and is a dependency. Use batch allocation (get 1000 IDs at once) to amortize the cost.
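Batch allocation for the centralized pattern can be sketched as a thin client that reserves a block of IDs in one call and then serves them locally. The remote call is modeled here as a `LongUnaryOperator` that takes a block size and returns the first ID of the reserved block; that signature and the name `BatchedIdClient` are assumptions for illustration, not an existing API.

```java
import java.util.function.LongUnaryOperator;

final class BatchedIdClient {
    private final LongUnaryOperator reserveBlock; // hypothetical remote call: size -> first ID of block
    private final int batchSize;
    private long next;  // next ID to hand out
    private long limit; // exclusive end of the current block

    BatchedIdClient(LongUnaryOperator reserveBlock, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.reserveBlock = reserveBlock;
        this.batchSize = batchSize;
    }

    /** Returns the next ID, contacting the central service only when the local block is used up. */
    synchronized long nextId() {
        if (next == limit) {
            next = reserveBlock.applyAsLong(batchSize);
            limit = next + batchSize;
        }
        return next++;
    }
}
```

With a batch size of 1,000, the network round trip is paid once per 1,000 IDs. The trade-offs are that unused IDs in a block are lost on restart and that IDs from different clients are ordered by block rather than by time.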
ID Generator System Design Checklist
- ☐ Never use UUID v4 as a clustered primary key in B-tree indexed tables
- ☐ Snowflake: handle clock skew (wait or throw) and sequence overflow (wait for next ms)
- ☐ Machine IDs assigned via ZooKeeper/etcd with ephemeral nodes (auto-released on crash)
- ☐ New projects: default to UUID v7 or ULID for zero-coordination sortable IDs
- ☐ Batch allocation used if ID service is centralized (e.g. 1,000 IDs per batch request)
- ☐ Custom epoch chosen far enough in past to avoid short IDs initially
- ☐ Epoch expiry calculated and documented (Snowflake: ~69 years from epoch)
- ☐ ID type decision documented — explain sortability, coordination, and size trade-offs
ID generation is a foundational decision that affects every other system layer. Getting it wrong early (UUID v4 primary keys at scale) leads to painful migrations later. In 2026, UUID v7 is the pragmatic default for new services: it requires no coordination, is sortable, fits existing UUID columns, and can be generated natively in PostgreSQL 18+ or in the application. Choose Snowflake when you need 64-bit integers, need strict per-machine ordering within a millisecond, or are building at Twitter/Discord scale where every byte of storage costs real money.