Designing a Distributed ID Generator: Snowflake Algorithm, ULID & UUID v7 at Scale

Every distributed system needs globally unique identifiers — for database primary keys, request tracing, event IDs, and distributed transactions. The choice of ID generation strategy affects database index performance, sortability, URL readability, and system scalability. This guide covers every approach you'll encounter in a system design interview or production deployment.

Md Sanwar Hossain April 6, 2026 16 min read System Design

[Figure: Distributed ID generator system design: Snowflake algorithm, ULID, and UUID v7 for distributed systems]

TL;DR — Quick Decision Guide

"Use Snowflake for time-ordered, high-throughput IDs (Twitter, Discord, Uber scale). Use ULID or UUID v7 for sortable, URL-safe IDs without coordination. Use DB auto-increment only for single-node systems. Never use UUID v4 as a primary key in a B-tree index — random insertion destroys page locality."

Requirements for a Distributed ID System
Why Not Database Auto-Increment?
UUID v4 — The Naive Approach & Its Pitfalls
Twitter Snowflake — 64-Bit Time-Ordered IDs
Clock Skew & Sequence Overflow Handling
Machine ID Registry with ZooKeeper
ULID — Lexicographically Sortable IDs
UUID v7 — The 2026 Standard
Comparison Table & Decision Framework
Impact on Database Index Performance
Production Deployment & Conclusion

1. Requirements for a Distributed ID System

Globally unique: No two IDs ever collide across all nodes, datacenters, and time
High throughput: Generate millions of IDs per second without coordination bottlenecks
No single point of failure: ID generation cannot depend on a centralized service
Sortable (optional but valuable): IDs generated later sort higher — critical for pagination and range queries
Compact: Fits in 64 or 128 bits — avoids excessive index bloat
Low latency: ID generation should take < 1ms — must not become a bottleneck

2. Why Not Database Auto-Increment?

The simplest approach — BIGSERIAL or AUTO_INCREMENT — works perfectly for single-node deployments. It breaks in distributed systems for several reasons:

Single point of failure: All ID generation routes through one database node — any downtime means zero ID generation
Write bottleneck: At 100K inserts/sec, the sequence generator becomes the bottleneck
Cross-shard conflicts: Sharding introduces duplicate IDs unless you use separate sequences per shard — breaks global uniqueness
Predictability: Sequential IDs expose business metrics (number of users, orders) through URL enumeration

Ticket server workaround: A dedicated "ticket server" (Flickr's approach) generates auto-increment IDs in a single database, avoiding sharding conflicts. Still a single point of failure — needs careful HA setup. Only viable below ~10K req/sec.
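One common way to keep separate per-shard sequences from colliding is to give every shard the same step and a different starting offset; MySQL exposes this through `auto_increment_increment` and `auto_increment_offset`. The sketch below models the idea in memory only. `InterleavedSequence`, `shardIndex`, and `shardCount` are illustrative names, not part of any database API.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: shard i of n issues i+1, i+1+n, i+1+2n, ... */
final class InterleavedSequence {
    private final long step;       // equals the total number of shards
    private final AtomicLong next; // next value this shard will hand out

    InterleavedSequence(int shardIndex, int shardCount) {
        if (shardCount <= 0 || shardIndex < 0 || shardIndex >= shardCount) {
            throw new IllegalArgumentException("shardIndex must be in [0, shardCount)");
        }
        this.step = shardCount;
        this.next = new AtomicLong(shardIndex + 1L);
    }

    long nextId() {
        return next.getAndAdd(step);
    }
}
```

With two shards, shard 0 issues 1, 3, 5, ... and shard 1 issues 2, 4, 6, .... The values never collide, but the shard count is baked into every sequence, so adding a shard later means re-planning the offsets, and the IDs are only roughly ordered across shards.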

3. UUID v4 — The Naive Approach & Its Pitfalls

UUID v4 (128 bits, 122 of them random) is the most widely used "quick fix" for distributed IDs. It's easy to generate in any language without coordination and has an astronomically low collision probability (2^122 possible values).

// UUID v4: 128-bit random
550e8400-e29b-41d4-a716-446655440000
// Format: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
// "4" = version 4; y = 8,9,a,b (variant bits)

// ⚠️ Problem: Completely random → NOT sortable
// Inserts land on random leaf pages of a B-tree index, so page splits are frequent
// Benchmark: UUID v4 PK → 3-5× slower inserts vs auto-increment at 1M rows
// Page utilization drops to 50-60% vs 99% for sequential IDs
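In Java, `java.util.UUID.randomUUID()` produces a version 4 UUID. The short sketch below checks the version and variant fields described in the format comment and prints two consecutive IDs to show that generation order says nothing about sort order. `UuidV4Demo` and `isV4` are names invented for this example.

```java
import java.util.UUID;

final class UuidV4Demo {
    /** True if the UUID carries version 4 and the RFC variant (top bits 10). */
    static boolean isV4(UUID id) {
        return id.version() == 4 && id.variant() == 2;
    }

    public static void main(String[] args) {
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();
        System.out.println(first + " isV4=" + isV4(first));
        System.out.println(second + " isV4=" + isV4(second));
        // Which of the two sorts first is effectively a coin flip.
        boolean firstSortsLower = first.toString().compareTo(second.toString()) < 0;
        System.out.println("first sorts before second: " + firstSortsLower);
    }
}
```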

UUID v4 Anti-Pattern: Primary Key in B-Tree Index

Using UUID v4 as a clustered primary key causes index fragmentation (in MySQL InnoDB the primary key is always the clustered index). Since UUIDs are random, each insert can land in the middle of an existing page, causing a page split and leaving pages half-empty. At 100M rows, this can cause 5–10× more disk I/O than sequential IDs.

Mitigation: Use UUID v4 only as a secondary unique key; keep an auto-increment surrogate as the clustered primary key. Or switch to UUID v7 / ULID (time-ordered).

4. Twitter Snowflake — 64-Bit Time-Ordered IDs

Twitter's Snowflake (2010) solves all the requirements: globally unique, time-sortable, no coordination needed per ID, and generates 4M+ IDs per millisecond across 1024 machines.

[Figure: Snowflake 64-bit layout and comparison of Snowflake, UUID, ULID, and UUID v7. Source: mdsanwarhossain.me]

Snowflake Implementation

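public class Snowflake {
    // Twitter Snowflake epoch: Nov 4, 2010 at 01:42:54 UTC
    private static final long EPOCH = 1288834974657L;
    private static final long DATACENTER_BITS = 5;
    private static final long MACHINE_BITS    = 5;
    private static final long SEQUENCE_BITS   = 12;

    private static final long MAX_DATACENTER_ID = (1L << DATACENTER_BITS) - 1; // 31
    private static final long MAX_MACHINE_ID    = (1L << MACHINE_BITS) - 1;    // 31
    private static final long MAX_SEQUENCE      = (1L << SEQUENCE_BITS) - 1;   // 4095
    private static final long MACHINE_SHIFT     = SEQUENCE_BITS;               // 12
    private static final long DATACENTER_SHIFT  = SEQUENCE_BITS + MACHINE_BITS; // 17
    private static final long TIMESTAMP_SHIFT   = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    private final long datacenterId;
    private final long machineId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    // Constructor and the two helpers at the bottom were not shown in the
    // original listing; they are filled in here so the class compiles.
    public Snowflake(long datacenterId, long machineId) {
        if (datacenterId < 0 || datacenterId > MAX_DATACENTER_ID) {
            throw new IllegalArgumentException("datacenterId must be 0-" + MAX_DATACENTER_ID);
        }
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("machineId must be 0-" + MAX_MACHINE_ID);
        }
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long timestamp = currentMs();

        if (timestamp < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards by "
                + (lastTimestamp - timestamp) + " ms; refusing to generate ID");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                timestamp = waitForNextMs(lastTimestamp); // sequence exhausted: wait
            }
        } else {
            sequence = 0L; // new millisecond: reset sequence
        }
        lastTimestamp = timestamp;

        return ((timestamp - EPOCH) << TIMESTAMP_SHIFT)
             | (datacenterId << DATACENTER_SHIFT)
             | (machineId    << MACHINE_SHIFT)
             | sequence;
    }

    private long currentMs() {
        return System.currentTimeMillis();
    }

    // Busy-waits until the wall clock moves past the given millisecond
    private long waitForNextMs(long last) {
        long timestamp = currentMs();
        while (timestamp <= last) {
            timestamp = currentMs();
        }
        return timestamp;
    }
}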
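Because every field sits at a fixed bit offset, an ID can be taken apart again with shifts and masks. The sketch below assumes the same layout and epoch as the listing above (41-bit timestamp, 5-bit datacenter, 5-bit machine, 12-bit sequence). `SnowflakeDecoder` and its method names are made up for this illustration.

```java
import java.time.Instant;

final class SnowflakeDecoder {
    static final long EPOCH = 1288834974657L; // same custom epoch as the generator

    static long timestampMs(long id)  { return (id >>> 22) + EPOCH; } // top 41 bits
    static long datacenterId(long id) { return (id >>> 17) & 0x1F; }  // 5 bits
    static long machineId(long id)    { return (id >>> 12) & 0x1F; }  // 5 bits
    static long sequence(long id)     { return id & 0xFFF; }          // low 12 bits

    public static void main(String[] args) {
        // Hand-built ID: datacenter 3, machine 7, sequence 42, at Unix ms 1700000000000
        long id = ((1700000000000L - EPOCH) << 22) | (3L << 17) | (7L << 12) | 42L;
        System.out.println("created:    " + Instant.ofEpochMilli(timestampMs(id)));
        System.out.println("datacenter: " + datacenterId(id));
        System.out.println("machine:    " + machineId(id));
        System.out.println("sequence:   " + sequence(id));
    }
}
```

Being able to recover the creation time from the ID alone is handy for debugging, but it also means the ID leaks that timestamp to anyone who sees it.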

Properties of Snowflake IDs

✅ Time-ordered: IDs generated later are always numerically larger (within same machine)
✅ No coordination: Each machine generates IDs independently using its machine ID
✅ Max throughput: 4,096 IDs/ms/machine × 1,024 machines ≈ 4.2 million IDs/ms (about 4.2 billion IDs/s) globally
✅ 64-bit integer — fits in a BIGINT column (8 bytes) — same as a regular DB PK
⚠️ Requires machine ID assignment — needs a registry (ZooKeeper, etcd)
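⚠️ Awkward in URLs and JSON: up to 19 decimal digits, and values above 2^53 lose precision as JavaScript numbers, so IDs are typically sent as strings, base36-encoded, or kept as BIGINT
⚠️ Timestamp field (41 bits of milliseconds) overflows ~69 years after the chosen epoch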
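The capacity and lifetime figures follow directly from the bit widths. The sketch below derives them for the 41/5/5/12 layout used in this article; `SnowflakeLimits` is a name chosen for the example, and the year length of 365.25 days is an approximation.

```java
import java.time.Instant;

final class SnowflakeLimits {
    static final long EPOCH = 1288834974657L;
    static final int TIMESTAMP_BITS = 41;
    static final int DATACENTER_BITS = 5;
    static final int MACHINE_BITS = 5;
    static final int SEQUENCE_BITS = 12;

    static long idsPerMsPerMachine() { return 1L << SEQUENCE_BITS; }                    // 2^12
    static long machines()           { return 1L << (DATACENTER_BITS + MACHINE_BITS); } // 2^10
    static long idsPerMsGlobal()     { return idsPerMsPerMachine() * machines(); }      // 2^22
    static long lifetimeMs()         { return 1L << TIMESTAMP_BITS; }                   // 2^41

    static double lifetimeYears() {
        return lifetimeMs() / (365.25 * 24 * 60 * 60 * 1000);
    }

    /** Last millisecond that still fits in the timestamp field. */
    static Instant lastUsableInstant() {
        return Instant.ofEpochMilli(EPOCH + lifetimeMs() - 1);
    }

    public static void main(String[] args) {
        System.out.println("IDs/ms per machine: " + idsPerMsPerMachine());
        System.out.println("machines:           " + machines());
        System.out.println("IDs/ms globally:    " + idsPerMsGlobal());
        System.out.printf("lifetime:           %.1f years%n", lifetimeYears());
        System.out.println("field overflows after " + lastUsableInstant());
    }
}
```

The global ceiling works out to 4,194,304 IDs per millisecond, which is roughly 4.2 billion per second, and the timestamp field lasts a little under 70 years from whichever epoch is chosen.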

5. Clock Skew & Sequence Overflow Handling

[Figure: Clock skew and sequence overflow in Snowflake ID generation, with prevention strategies and production deployment. Source: mdsanwarhossain.me]

Clock Skew Defense Strategies

Detect and wait: If currentTime < lastTimestamp, wait for clock to catch up before generating the next ID. Acceptable for small skew (<5ms).
Detect and throw: For large backward jumps (> 500ms), throw an exception and alert — likely indicates a serious NTP misconfiguration.
Hybrid Logical Clocks (HLC): Advanced approach used by CockroachDB — tracks both wall time and logical time to handle NTP drift without waiting or failing. Best for globally distributed systems.
NTP configuration: Run ntpd with the -x flag, which raises the step threshold to 600 s so that smaller offsets are corrected by gradual slewing instead of stepping the clock. This prevents most backward jumps.
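The first two strategies can be combined into one guard: wait out a small regression, fail loudly on a large one. The sketch below is one possible shape, not the original Snowflake code. `ClockSkewGuard` and `maxWaitMs` are hypothetical names, a single threshold stands in for the separate 5 ms and 500 ms figures, and the clock is injected as a `LongSupplier` so the behaviour can be tested without touching the system clock.

```java
import java.util.function.LongSupplier;

final class ClockSkewGuard {
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private final long maxWaitMs;     // largest regression we are willing to wait out

    ClockSkewGuard(LongSupplier clock, long maxWaitMs) {
        this.clock = clock;
        this.maxWaitMs = maxWaitMs;
    }

    /**
     * Returns a timestamp that is not earlier than lastTimestamp.
     * Small regressions are waited out; larger ones raise an exception.
     */
    long timestampNotBefore(long lastTimestamp) {
        long now = clock.getAsLong();
        long behind = lastTimestamp - now;
        if (behind <= 0) {
            return now; // clock is fine
        }
        if (behind > maxWaitMs) {
            throw new IllegalStateException("Clock moved backwards by " + behind + " ms");
        }
        while (now < lastTimestamp) { // small skew: spin until the clock catches up
            Thread.onSpinWait();
            now = clock.getAsLong();
        }
        return now;
    }
}
```

A generator would call `timestampNotBefore(lastTimestamp)` in place of reading the clock directly. For waits longer than a few milliseconds, sleeping is kinder to the CPU than spinning.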

Sequence Overflow

When all 4,096 sequence numbers are exhausted within a single millisecond, the generator must wait for the next millisecond before issuing more IDs. This creates brief burst pauses — acceptable for most workloads. If 4,096 IDs/ms per machine is consistently insufficient, add more machines (scale out, don't scale up).

6. Machine ID Registry with ZooKeeper

The Snowflake machine ID (5-bit = 0 to 31 per datacenter) must be uniquely assigned to each running instance. Manual assignment doesn't scale — use ZooKeeper or etcd for automatic assignment:

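// ZooKeeper machine ID registration (on service startup)
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

void registerMachineId() throws Exception {
    // Keep the handle in a field (assumed here): the ephemeral node only
    // lives as long as this session stays open
    this.zk = new ZooKeeper("zk-cluster:2181", 3000, null);
    String path = "/snowflake/workers/"; // parent node must already exist

    // Create ephemeral sequential node: ZK appends a 10-digit sequential suffix
    String nodePath = zk.create(
        path + "worker-",
        InetAddress.getLocalHost().getHostName().getBytes(StandardCharsets.UTF_8),
        ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL_SEQUENTIAL // auto-deleted when the session ends
    );

    // Map the suffix onto the 5-bit machine ID field (0-31)
    // Caveat: once the counter passes 32, the modulo can repeat an ID that a
    // long-running worker still holds, so check for that before using it
    int machineId = Integer.parseInt(nodePath.replace(path + "worker-", "")) % 32;
    this.snowflake = new Snowflake(datacenterId, machineId);
}

// When the process dies, the ZK ephemeral node is deleted
// Next startup gets a new (potentially different) machine ID, and that's OK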
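An alternative to deriving the ID from a sequential suffix is to claim a specific slot: try each candidate ID in turn and keep the first one nobody else holds. The sketch below models the registry as an in-memory map so the logic is easy to follow. `SlotRegistry`, `claim`, and `release` are hypothetical names. With ZooKeeper, the `putIfAbsent` step would roughly correspond to creating an ephemeral node named after the ID and treating `KeeperException.NodeExistsException` as "slot taken".

```java
import java.util.OptionalInt;
import java.util.concurrent.ConcurrentHashMap;

final class SlotRegistry {
    private final int slots; // 32 for a 5-bit machine ID
    private final ConcurrentHashMap<Integer, String> owners = new ConcurrentHashMap<>();

    SlotRegistry(int slots) {
        this.slots = slots;
    }

    /** Claims the lowest free slot, or returns empty if every slot is held. */
    OptionalInt claim(String owner) {
        for (int id = 0; id < slots; id++) {
            if (owners.putIfAbsent(id, owner) == null) {
                return OptionalInt.of(id);
            }
        }
        return OptionalInt.empty();
    }

    /** Frees a slot, but only if it is still held by the given owner. */
    void release(int id, String owner) {
        owners.remove(id, owner);
    }
}
```

Because a slot is only handed out when it is free, two live workers can never share an ID, and a worker that finds every slot taken gets an explicit failure instead of a silent duplicate.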

7. ULID — Lexicographically Sortable IDs

ULID (Universally Unique Lexicographically Sortable Identifier) is a 128-bit ID designed as a UUID replacement that is time-sortable and URL-friendly.

// ULID format: 26-character Crockford base32
01ARZ3NDEKTSV4RRFFQ69G5FAV

// Layout: [10-char timestamp][16-char random]
// 01ARZ3NDEK = 48-bit Unix timestamp in milliseconds
// TSV4RRFFQ69G5FAV = 80-bit random component

// Properties:
// - Sorts by creation time; monotonic generators also keep order within a millisecond (random component incremented)
// - URL-safe (no hyphens, no special characters)
// - Case-insensitive
// - 128 bits total, 80 of them random per millisecond (UUID v4 has 122 random bits)

// Java library (sulky-ulid)
import de.huxhorn.sulky.ulid.ULID;
ULID ulid = new ULID();
String id = ulid.nextULID(); // e.g. "01ARZ3NDEKTSV4RRFFQ69G5FAV"

ULID Monotonicity Guarantee

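When multiple ULIDs are generated in the same millisecond, a monotonic generator increments the random component by 1 rather than drawing a new random value. This guarantees strictly increasing order within the same process; across processes, ULIDs are ordered only to the millisecond. Either way, new keys land near the right-hand edge of a B-tree, which avoids most of the fragmentation seen with UUID v4.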
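The encoding itself is small enough to write out by hand. The sketch below builds a non-monotonic ULID: 10 Crockford base32 characters for the timestamp followed by 16 random characters. `UlidSketch` and its methods are illustrative only and are not the API of any ULID library; a production generator would also add the same-millisecond increment described above.

```java
import java.security.SecureRandom;

final class UlidSketch {
    // Crockford base32: digits plus letters, without I, L, O, U
    private static final char[] ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".toCharArray();
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encodes a 48-bit millisecond timestamp as 10 base32 characters. */
    static String encodeTime(long timestampMs) {
        if (timestampMs < 0 || timestampMs >= (1L << 48)) {
            throw new IllegalArgumentException("timestamp must fit in 48 bits");
        }
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {   // fill from the least significant character
            out[i] = ALPHABET[(int) (timestampMs & 31)];
            timestampMs >>>= 5;
        }
        return new String(out);
    }

    /** 16 random characters of 5 bits each, for the 80-bit random part. */
    static String encodeRandom() {
        char[] out = new char[16];
        for (int i = 0; i < out.length; i++) {
            out[i] = ALPHABET[RANDOM.nextInt(32)];
        }
        return new String(out);
    }

    static String next(long timestampMs) {
        return encodeTime(timestampMs) + encodeRandom();
    }

    public static void main(String[] args) {
        System.out.println(next(System.currentTimeMillis()));
    }
}
```

Because the timestamp occupies the leading characters and the alphabet is in ascending ASCII order, plain string comparison orders ULIDs by creation time.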

8. UUID v7 — The 2026 Standard

UUID v7 (RFC 9562, finalized 2024) is the modern replacement for UUID v4. It combines a millisecond-precision timestamp in the high bits with random data in the low bits, making it time-sortable while maintaining UUID compatibility.

// UUID v7 format: xxxxxxxx-xxxx-7xxx-yxxx-xxxxxxxxxxxx
// 018e2462-d8f0-7c3a-b4f2-8d1e9a3c2f01
//  ^-- 48-bit unix_ts_ms (sortable)     ^-- 74-bit random

// Java (java-uuid-generator library)
import java.util.UUID;
import com.fasterxml.uuid.Generators;
UUID uuid7 = Generators.timeBasedEpochGenerator().generate();
// → e.g. 018e2462-d8f0-7c3a-b4f2-8d1e9a3c2f01

// PostgreSQL (native UUID v7 support in PostgreSQL 18+)
SELECT uuidv7();

// Advantages over UUID v4:
// ✅ Time-sortable → no B-tree fragmentation
// ✅ UUID-compatible → works with existing UUID columns
// ✅ No coordination required (no machine ID registry)
// ✅ Widely adopted: PostgreSQL 18 (uuidv7()), java-uuid-generator, and UUID libraries in most languages
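Where no library is available, the layout is simple enough to assemble by hand. The sketch below packs a 48-bit Unix millisecond timestamp, the version nibble, and 74 random bits into a `java.util.UUID`. `UuidV7Sketch` is a name made up for this example, and the code omits the optional same-millisecond counter that RFC 9562 allows, so IDs created within one millisecond are not ordered among themselves.

```java
import java.security.SecureRandom;
import java.util.UUID;

final class UuidV7Sketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    static UUID generate(long unixMs) {
        long msb = ((unixMs & 0xFFFFFFFFFFFFL) << 16)   // 48-bit timestamp in the high bits
                 | 0x7000L                              // version 7
                 | (RANDOM.nextLong() & 0x0FFFL);       // 12 random bits (rand_a)
        long lsb = (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL) // 62 random bits (rand_b)
                 | 0x8000000000000000L;                 // variant bits 10
        return new UUID(msb, lsb);
    }

    /** Recovers the embedded Unix timestamp in milliseconds. */
    static long timestampMs(UUID id) {
        return id.getMostSignificantBits() >>> 16;
    }

    public static void main(String[] args) {
        UUID id = generate(System.currentTimeMillis());
        System.out.println(id + " version=" + id.version()
            + " timestampMs=" + timestampMs(id));
    }
}
```

Compare the textual or byte form when sorting: `UUID.compareTo` in Java compares the two halves as signed longs, which stops matching byte order once the top bit of either half is set.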

9. Comparison Table & Decision Framework

Approach	Size	Sortable	Coordination	Best For
DB Auto-Increment	8B	✅ Sequential	DB (SPOF)	Single-node DB
UUID v4	16B	❌ Random	None	Secondary keys only
Snowflake (64-bit)	8B	✅ Time-ordered	Machine ID registry	Twitter/Discord scale
ULID	16B (26 chars)	✅ Lexicographic	None	URL-safe sortable IDs
UUID v7	16B	✅ Time-ordered	None	Modern APIs (2026+)

10. Impact on Database Index Performance

The choice of ID type dramatically affects B-tree index performance — arguably more than any other single database design decision at scale.

Random vs. Sequential Insert Benchmark

ID Type	Insert Speed (1M rows)	Index Fragmentation	Page Fill %
Auto-increment	Baseline (fastest)	None	~99%
UUID v4	3–5× slower	High (page splits)	50–60%
Snowflake	~1.1× slower	Minimal	~95%
UUID v7 / ULID	~1.1× slower	Minimal	~95%

Practical Recommendation

If migrating from UUID v4 primary keys: Add a UUID v7 column and re-cluster the index. Migration is worth the effort at > 10M rows.
New services: Default to UUID v7 (generated natively with uuidv7() on PostgreSQL 18+, or in the application on older versions) or Snowflake (for compact 64-bit keys)
Run REINDEX or OPTIMIZE TABLE periodically if stuck with UUID v4 for legacy reasons

11. Production Deployment & Conclusion

Deployment Patterns

Embedded (recommended for <100 services): Embed Snowflake library in each service instance; assign machine IDs via ZooKeeper on startup. Zero network hop, maximum throughput.
Sidecar: Deploy as a gRPC sidecar per pod in Kubernetes. Services call localhost:50051/nextId. Good for polyglot environments.
Centralized ID service: Single gRPC service per datacenter. Simplest ops but adds ~1ms latency per ID request and is a dependency. Use batch allocation (get 1000 IDs at once) to amortize the cost.
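Batch allocation on the client side can be as simple as reserving a block of IDs and handing them out locally until the block runs dry. The sketch below is one way to do that. `BatchedIdClient` and `reserveBlock` are hypothetical: `reserveBlock` stands in for the remote call to the central service and is assumed to return the first ID of a freshly reserved, contiguous block of the requested size.

```java
import java.util.function.LongUnaryOperator;

final class BatchedIdClient {
    private final LongUnaryOperator reserveBlock; // batch size -> first ID of the block
    private final long batchSize;
    private long next;  // next ID to hand out
    private long limit; // exclusive end of the current block

    BatchedIdClient(LongUnaryOperator reserveBlock, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.reserveBlock = reserveBlock;
        this.batchSize = batchSize;
    }

    synchronized long nextId() {
        if (next == limit) { // block exhausted (or nothing reserved yet)
            next = reserveBlock.applyAsLong(batchSize);
            limit = next + batchSize;
        }
        return next++;
    }
}
```

With a batch size of 1,000, only one call in a thousand pays the network round trip. The trade-off is that IDs left in a block are lost when the process restarts, and IDs from different clients interleave instead of following global creation order.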

ID Generator System Design Checklist

☐ Never use UUID v4 as a clustered primary key in B-tree indexed tables
☐ Snowflake: handle clock skew (wait or throw) and sequence overflow (wait for next ms)
☐ Machine IDs assigned via ZooKeeper/etcd with ephemeral nodes (auto-released on crash)
☐ New projects: default to UUID v7 or ULID for zero-coordination sortable IDs
☐ Batch allocation used if ID service is centralized (e.g., 1,000 IDs per batch request)
☐ Custom epoch chosen far enough in past to avoid short IDs initially
☐ Epoch expiry calculated and documented (Snowflake: ~69 years from epoch)
☐ ID type decision documented — explain sortability, coordination, and size trade-offs

ID generation is a foundational decision that affects every other system layer. Getting it wrong early (UUID v4 primary keys at scale) leads to painful migrations later. In 2026, UUID v7 is the pragmatic default for new services: it requires no coordination, is sortable, fits existing UUID columns, and can be generated natively in PostgreSQL 18 or by a library anywhere else. Choose Snowflake when you need 64-bit integers, want strict per-machine ordering within a millisecond, or are building at Twitter/Discord scale where every byte of storage costs real money.
