
Designing a Distributed ID Generator: Snowflake Algorithm, ULID & UUID v7 at Scale

Every distributed system needs globally unique identifiers — for database primary keys, request tracing, event IDs, and distributed transactions. The choice of ID generation strategy affects database index performance, sortability, URL readability, and system scalability. This guide covers every approach you'll encounter in a system design interview or production deployment.

Md Sanwar Hossain · April 6, 2026 · 16 min read · System Design

TL;DR — Quick Decision Guide

"Use Snowflake for time-ordered, high-throughput IDs (Twitter, Discord, Uber scale). Use ULID or UUID v7 for sortable, URL-safe IDs without coordination. Use DB auto-increment only for single-node systems. Never use UUID v4 as a primary key in a B-tree index — random insertion destroys page locality."

Table of Contents

  1. Requirements for a Distributed ID System
  2. Why Not Database Auto-Increment?
  3. UUID v4 — The Naive Approach & Its Pitfalls
  4. Twitter Snowflake — 64-Bit Time-Ordered IDs
  5. Clock Skew & Sequence Overflow Handling
  6. Machine ID Registry with ZooKeeper
  7. ULID — Lexicographically Sortable IDs
  8. UUID v7 — The 2026 Standard
  9. Comparison Table & Decision Framework
  10. Impact on Database Index Performance
  11. Production Deployment & Conclusion

1. Requirements for a Distributed ID System

2. Why Not Database Auto-Increment?

The simplest approach, BIGSERIAL or AUTO_INCREMENT, works perfectly for single-node deployments. It breaks down in distributed systems: independent shards hand out conflicting values, and a single central counter becomes a single point of failure.

Ticket server workaround: A dedicated "ticket server" (Flickr's approach) generates auto-increment IDs in a single database, avoiding sharding conflicts. Still a single point of failure — needs careful HA setup. Only viable below ~10K req/sec.
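One common way to soften the single point of failure is to run two ticket servers that hand out interleaved sequences (one odd, one even), which is how Flickr described its own setup. The sketch below simulates that idea in memory. `TicketServer` and `TicketClient` are hypothetical names, and a real deployment would back each server with its own database rather than an `AtomicLong`.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class TicketDemo {
    /** In-memory stand-in for one ticket database (hypothetical). */
    static final class TicketServer {
        private final AtomicLong next;
        private final long step;
        private volatile boolean up = true;

        TicketServer(long offset, long step) {
            this.next = new AtomicLong(offset);
            this.step = step;
        }

        void setUp(boolean up) { this.up = up; }

        long nextTicket() {
            if (!up) throw new IllegalStateException("ticket server down");
            return next.getAndAdd(step);
        }
    }

    /** Round-robins across servers and skips any that are down. */
    static final class TicketClient {
        private final List<TicketServer> servers;
        private int cursor = 0;

        TicketClient(List<TicketServer> servers) { this.servers = servers; }

        synchronized long nextId() {
            for (int i = 0; i < servers.size(); i++) {
                TicketServer server = servers.get(cursor);
                cursor = (cursor + 1) % servers.size();
                try {
                    return server.nextTicket();
                } catch (IllegalStateException down) {
                    // fall through and try the next server
                }
            }
            throw new IllegalStateException("no ticket server available");
        }
    }

    public static void main(String[] args) {
        TicketServer odd = new TicketServer(1, 2);  // 1, 3, 5, ...
        TicketServer even = new TicketServer(2, 2); // 2, 4, 6, ...
        TicketClient client = new TicketClient(List.of(odd, even));
        for (int i = 0; i < 4; i++) System.out.println(client.nextId());
    }
}
```

IDs stay unique because the two sequences never overlap, but they are no longer strictly increasing once one server falls behind or goes down.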

3. UUID v4 — The Naive Approach & Its Pitfalls

UUID v4 (128 bits, 122 of them random) is the most widely used "quick fix" for distributed IDs. It is easy to generate in any language without coordination, and with 2^122 possible values the collision probability is astronomically low.

// UUID v4: 128-bit random
550e8400-e29b-41d4-a716-446655440000
// Format: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
// "4" = version 4; y = 8,9,a,b (variant bits)

// ⚠️ Problem: Completely random → NOT sortable
// Inserting into a B-tree index scatters writes across pages and causes frequent page splits
// Benchmark: UUID v4 PK → 3-5× slower inserts vs auto-increment at 1M rows
// Page utilization drops to 50-60% vs ~99% for sequential IDs

UUID v4 Anti-Pattern: Primary Key in B-Tree Index

Using UUID v4 as the primary key of a clustered table causes index fragmentation (in MySQL InnoDB, a table is always clustered on its primary key). Because the values are random, each insert can land in the middle of an existing page, forcing a page split and leaving pages half-empty. At 100M rows, this can cause 5–10× more disk I/O than sequential IDs.

Mitigation: Use UUID v4 only as a secondary unique key; keep an auto-increment surrogate as the clustered primary key. Or switch to UUID v7 / ULID (time-ordered).
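To make the locality argument concrete, the following sketch counts how many inserts land before the current maximum key, which is a rough proxy for "mid-page" inserts in a B-tree. It is only an illustration: a `TreeSet` stands in for the index, the UUIDs come from a seeded `Random` so the run is repeatable, and `InsertLocality` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;
import java.util.UUID;

final class InsertLocality {
    /** Fraction of inserts that land before the current maximum key, i.e. are not plain appends. */
    static double midIndexFraction(List<String> keys) {
        TreeSet<String> index = new TreeSet<>(); // sorted set standing in for a B-tree
        int midIndex = 0;
        for (String key : keys) {
            if (!index.isEmpty() && key.compareTo(index.last()) < 0) {
                midIndex++;
            }
            index.add(key);
        }
        return (double) midIndex / keys.size();
    }

    /** Zero-padded sequential keys, so string order matches numeric order. */
    static List<String> sequentialKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) keys.add(String.format("%012d", i));
        return keys;
    }

    /** Seeded random UUIDs with the v4 version and variant bits set. */
    static List<String> randomUuidKeys(int n, long seed) {
        Random rnd = new Random(seed);
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            long msb = (rnd.nextLong() & ~0xF000L) | 0x4000L;
            long lsb = (rnd.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
            keys.add(new UUID(msb, lsb).toString());
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.printf("sequential: %.3f%n", midIndexFraction(sequentialKeys(10_000)));
        System.out.printf("uuid v4:    %.3f%n", midIndexFraction(randomUuidKeys(10_000, 42L)));
    }
}
```

Sequential keys always append, so their fraction is zero, while nearly every random UUID lands somewhere inside the existing key range.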

4. Twitter Snowflake — 64-Bit Time-Ordered IDs

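Twitter's Snowflake (2010) produces IDs that are globally unique, time-sortable, and small enough to fit in 64 bits, with no coordination needed per ID. With 4,096 sequence values per millisecond on each of 1,024 machines, the scheme tops out at about 4.19 million IDs per millisecond across the cluster.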

[Figure: Snowflake 64-bit layout (sign bit, timestamp, datacenter ID, machine ID, sequence number) and distributed ID generator comparison table: Snowflake vs UUID vs ULID vs UUID v7. Source: mdsanwarhossain.me]

Snowflake Implementation

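public class Snowflake {
    // Twitter Snowflake epoch: Nov 4, 2010 at 01:42:54 UTC
    private static final long EPOCH = 1288834974657L;
    private static final long DATACENTER_BITS = 5;
    private static final long MACHINE_BITS    = 5;
    private static final long SEQUENCE_BITS   = 12;

    private static final long MAX_DATACENTER_ID = (1L << DATACENTER_BITS) - 1; // 31
    private static final long MAX_MACHINE_ID    = (1L << MACHINE_BITS) - 1;    // 31
    private static final long MAX_SEQUENCE      = (1L << SEQUENCE_BITS) - 1;   // 4095
    private static final long MACHINE_SHIFT     = SEQUENCE_BITS;               // 12
    private static final long DATACENTER_SHIFT  = SEQUENCE_BITS + MACHINE_BITS; // 17
    private static final long TIMESTAMP_SHIFT   = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    private final long datacenterId;
    private final long machineId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    public Snowflake(long datacenterId, long machineId) {
        // Out-of-range values would spill into neighboring bit fields
        if (datacenterId < 0 || datacenterId > MAX_DATACENTER_ID) {
            throw new IllegalArgumentException("datacenterId must be in 0.." + MAX_DATACENTER_ID);
        }
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("machineId must be in 0.." + MAX_MACHINE_ID);
        }
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long timestamp = currentMs();

        if (timestamp < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards! Refusing to generate ID");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                timestamp = waitForNextMs(lastTimestamp); // sequence exhausted: wait
            }
        } else {
            sequence = 0L; // new millisecond: reset sequence
        }
        lastTimestamp = timestamp;

        return ((timestamp - EPOCH) << TIMESTAMP_SHIFT)
             | (datacenterId << DATACENTER_SHIFT)
             | (machineId    << MACHINE_SHIFT)
             | sequence;
    }

    private long currentMs() {
        return System.currentTimeMillis();
    }

    // Spins until the wall clock moves past the given millisecond
    private long waitForNextMs(long lastTs) {
        long ts = currentMs();
        while (ts <= lastTs) {
            ts = currentMs();
        }
        return ts;
    }
}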
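Because every field sits at a fixed bit offset, a Snowflake ID can be taken apart again with shifts and masks. The sketch below assumes the same layout and epoch as the implementation above (22/17/12-bit shifts, 5+5+12 low bits). `SnowflakeDecoder` is a hypothetical helper, not part of Twitter's code.

```java
import java.time.Instant;

final class SnowflakeDecoder {
    static final long EPOCH = 1288834974657L; // same custom epoch as the generator

    /** Unix time in milliseconds at which the ID was generated. */
    static long timestampMs(long id) { return (id >>> 22) + EPOCH; }

    /** 5-bit datacenter field. */
    static long datacenterId(long id) { return (id >>> 17) & 0x1F; }

    /** 5-bit machine field. */
    static long machineId(long id) { return (id >>> 12) & 0x1F; }

    /** 12-bit per-millisecond sequence. */
    static long sequence(long id) { return id & 0xFFF; }

    public static void main(String[] args) {
        long ts = 1700000000000L;
        long id = ((ts - EPOCH) << 22) | (3L << 17) | (7L << 12) | 42L;
        System.out.println(Instant.ofEpochMilli(timestampMs(id)));
        System.out.println(datacenterId(id) + " / " + machineId(id) + " / " + sequence(id));
    }
}
```

Since the timestamp occupies the highest bits, comparing two IDs as plain numbers orders them by creation time first.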

Properties of Snowflake IDs

5. Clock Skew & Sequence Overflow Handling

[Figure: Clock skew problem and sequence overflow in Snowflake ID generation, with prevention strategies and production deployment. Source: mdsanwarhossain.me]

Clock Skew Defense Strategies
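The Snowflake implementation in section 4 throws as soon as the clock moves backwards. A common refinement, sketched here as an assumption rather than as a definitive list of strategies, is to wait out small regressions (for example after an NTP adjustment) and fail only on large ones. `ClockSkewGuard` and its `maxToleratedMs` threshold are hypothetical.

```java
import java.util.function.LongSupplier;

final class ClockSkewGuard {
    private final LongSupplier clock;  // injectable so the behavior can be tested
    private final long maxToleratedMs; // hypothetical threshold for "small" skew

    ClockSkewGuard(LongSupplier clock, long maxToleratedMs) {
        this.clock = clock;
        this.maxToleratedMs = maxToleratedMs;
    }

    /** Returns a timestamp that is not earlier than lastTimestamp, or throws on large skew. */
    long awaitNotBefore(long lastTimestamp) {
        long now = clock.getAsLong();
        long drift = lastTimestamp - now;
        if (drift <= 0) {
            return now; // clock is fine
        }
        if (drift > maxToleratedMs) {
            throw new IllegalStateException("Clock moved backwards by " + drift + " ms");
        }
        try {
            Thread.sleep(drift); // small regression: wait for the clock to catch up
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for clock", e);
        }
        now = clock.getAsLong();
        if (now < lastTimestamp) {
            throw new IllegalStateException("Clock still behind after waiting");
        }
        return now;
    }

    public static void main(String[] args) {
        ClockSkewGuard guard = new ClockSkewGuard(System::currentTimeMillis, 5);
        System.out.println(guard.awaitNotBefore(System.currentTimeMillis()));
    }
}
```

A generator would call `awaitNotBefore(lastTimestamp)` in place of reading the clock directly, so small skew costs a few milliseconds of latency instead of an error.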

Sequence Overflow

When all 4,096 sequence numbers are exhausted within a single millisecond, the generator must wait for the next millisecond before issuing more IDs. This creates brief burst pauses — acceptable for most workloads. If 4,096 IDs/ms per machine is consistently insufficient, add more machines (scale out, don't scale up).
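The capacity limits follow directly from the bit widths. The arithmetic below assumes the standard 41/5/5/12 split (41 timestamp bits is what remains after the sign bit and the 22 low bits). `SnowflakeCapacity` is a hypothetical name used only for this calculation.

```java
final class SnowflakeCapacity {
    static final int TIMESTAMP_BITS = 41;
    static final int DATACENTER_BITS = 5;
    static final int MACHINE_BITS = 5;
    static final int SEQUENCE_BITS = 12;

    /** Sequence values available to one machine in one millisecond. */
    static long idsPerMsPerMachine() { return 1L << SEQUENCE_BITS; }

    /** Sustained ceiling for one machine per second. */
    static long idsPerSecondPerMachine() { return idsPerMsPerMachine() * 1000L; }

    /** Distinct (datacenter, machine) pairs. */
    static long machines() { return 1L << (DATACENTER_BITS + MACHINE_BITS); }

    /** Cluster-wide ceiling per millisecond. */
    static long clusterIdsPerMs() { return idsPerMsPerMachine() * machines(); }

    /** Years until the millisecond timestamp field overflows, counted from the epoch. */
    static double epochLifetimeYears() {
        double msPerYear = 1000.0 * 60 * 60 * 24 * 365.25;
        return (double) (1L << TIMESTAMP_BITS) / msPerYear;
    }

    public static void main(String[] args) {
        System.out.println(idsPerSecondPerMachine());
        System.out.println(clusterIdsPerMs());
        System.out.printf("%.1f years%n", epochLifetimeYears());
    }
}
```

One machine therefore sustains at most 4,096,000 IDs per second, and the timestamp field lasts a little under 70 years from whichever epoch is chosen.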

6. Machine ID Registry with ZooKeeper

The Snowflake machine ID (5-bit = 0 to 31 per datacenter) must be uniquely assigned to each running instance. Manual assignment doesn't scale — use ZooKeeper or etcd for automatic assignment:

// ZooKeeper machine ID registration (on service startup)
// Assumes the parent znode /snowflake/workers exists and is scoped to one datacenter
void registerMachineId() throws Exception {
    // Keep this session open for the life of the process; closing it deletes the node
    ZooKeeper zk = new ZooKeeper("zk-cluster:2181", 3000, null);
    String prefix = "/snowflake/workers/worker-";

    // Create ephemeral sequential node: ZK appends a zero-padded sequential suffix
    String nodePath = zk.create(
        prefix,
        InetAddress.getLocalHost().getHostName().getBytes(StandardCharsets.UTF_8),
        ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL_SEQUENTIAL // auto-deleted when the session ends
    );

    // The machine field is only 5 bits, so the ID must fall in 0-31.
    // Caveat: the modulo wraps around, so after 32 registrations a new worker can
    // collide with one that is still alive; production code should check live nodes.
    int machineId = Integer.parseInt(nodePath.substring(prefix.length())) % 32;
    this.snowflake = new Snowflake(datacenterId, machineId);
}

// When the process dies, its ZK ephemeral node is deleted
// Next startup gets a new (potentially different) machine ID, and that's OK
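If a single registry serves the whole fleet, the sequential suffix can instead be treated as a 10-bit worker ID and split into the two 5-bit fields, so that neither field overflows. This is a sketch under that assumption. `WorkerIdMapper` and `Assignment` are hypothetical names, and the znode path in the example only mirrors the one used above.

```java
final class WorkerIdMapper {
    static final int MACHINE_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int MAX_WORKERS = 1 << (MACHINE_BITS + DATACENTER_BITS); // 1024

    /** The two 5-bit fields a Snowflake generator expects. */
    static final class Assignment {
        final int datacenterId;
        final int machineId;

        Assignment(int datacenterId, int machineId) {
            this.datacenterId = datacenterId;
            this.machineId = machineId;
        }
    }

    /** Extracts the numeric suffix from a path such as ".../worker-0000000037". */
    static int parseSequence(String nodePath) {
        return Integer.parseInt(nodePath.substring(nodePath.lastIndexOf('-') + 1));
    }

    /** Maps a sequence number onto (datacenterId, machineId), wrapping at 1024. */
    static Assignment fromSequence(int zkSequence) {
        if (zkSequence < 0) {
            throw new IllegalArgumentException("sequence must be non-negative");
        }
        int workerId = zkSequence % MAX_WORKERS;
        int datacenterId = workerId >>> MACHINE_BITS;          // high 5 bits
        int machineId = workerId & ((1 << MACHINE_BITS) - 1);  // low 5 bits
        return new Assignment(datacenterId, machineId);
    }

    public static void main(String[] args) {
        Assignment a = fromSequence(parseSequence("/snowflake/workers/worker-0000000037"));
        System.out.println(a.datacenterId + " / " + a.machineId);
    }
}
```

The wrap at 1,024 still means a long-running cluster can eventually reuse an ID held by a live worker, so a real registry needs a liveness check before accepting the assignment.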

7. ULID — Lexicographically Sortable IDs

ULID (Universally Unique Lexicographically Sortable Identifier) is a 128-bit ID designed as a UUID replacement that is time-sortable and URL-friendly.

// ULID format: 26-character Crockford base32
01ARZ3NDEKTSV4RRFFQ69G5FAV

// Layout: [10-char timestamp][16-char random]
// 01ARZ3NDEK = 48-bit Unix timestamp in milliseconds
// TSV4RRFFQ69G5FAV = 80-bit random component

// Properties:
// - Sortable by creation time at millisecond granularity
// - Monotonic within the same millisecond only when the generator increments the random component (optional in the spec)
// - URL-safe (no hyphens, no special characters)
// - Case-insensitive
// - 128 bits like a UUID, but 80 random bits per millisecond (UUID v4 has 122)

// Java library (sulky-ulid)
import de.huxhorn.sulky.ulid.ULID;
ULID ulid = new ULID();
String id = ulid.nextULID(); // e.g. "01ARZ3NDEKTSV4RRFFQ69G5FAV"

ULID Monotonicity Guarantee

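When a monotonic ULID generator produces several IDs in the same millisecond, it increments the previous random component by 1 instead of drawing fresh randomness. This keeps IDs from a single generator instance strictly increasing, so inserts into a B-tree primary key stay close to append-only. The guarantee holds only within one process and only when the library's monotonic mode is used; IDs created on different machines in the same millisecond still arrive in arbitrary order relative to each other.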
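The encoding and the same-millisecond increment can be sketched in a few lines. This is a minimal illustration, not a replacement for a maintained library: `UlidSketch` is a hypothetical class, the timestamp is passed in explicitly to keep the behavior testable, and a clock that moves backwards is not handled.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

final class UlidSketch {
    private static final char[] ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".toCharArray();
    private static final BigInteger MAX_RANDOM = BigInteger.ONE.shiftLeft(80).subtract(BigInteger.ONE);

    private final SecureRandom rng = new SecureRandom();
    private long lastMs = -1L;
    private BigInteger lastRandom = BigInteger.ZERO;

    /** Encodes the low (chars * 5) bits of a non-negative value as Crockford base32. */
    static String encode(BigInteger value, int chars) {
        char[] out = new char[chars];
        for (int i = chars - 1; i >= 0; i--) {
            out[i] = ALPHABET[value.intValue() & 31];
            value = value.shiftRight(5);
        }
        return new String(out);
    }

    /** 48-bit millisecond timestamp as 10 characters. */
    static String encodeTime(long unixMs) {
        return encode(BigInteger.valueOf(unixMs), 10);
    }

    /** Next ULID for the given millisecond; increments the random part on repeats. */
    synchronized String next(long unixMs) {
        if (unixMs == lastMs) {
            if (lastRandom.equals(MAX_RANDOM)) {
                throw new IllegalStateException("random component overflow in this millisecond");
            }
            lastRandom = lastRandom.add(BigInteger.ONE);
        } else {
            lastMs = unixMs;
            lastRandom = new BigInteger(80, rng); // fresh 80-bit random value
        }
        return encodeTime(unixMs) + encode(lastRandom, 16);
    }

    public static void main(String[] args) {
        UlidSketch ulid = new UlidSketch();
        long now = System.currentTimeMillis();
        System.out.println(ulid.next(now));
        System.out.println(ulid.next(now)); // same timestamp prefix, random part + 1
    }
}
```

Because the timestamp comes first and the alphabet is in ascending ASCII order, comparing two ULID strings byte by byte gives the same result as comparing their 128-bit values.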

8. UUID v7 — The 2026 Standard

UUID v7 (RFC 9562, finalized 2024) is the modern replacement for UUID v4. It combines a millisecond-precision timestamp in the high bits with random data in the low bits, making it time-sortable while maintaining UUID compatibility.

// UUID v7 format: xxxxxxxx-xxxx-7xxx-yxxx-xxxxxxxxxxxx
// 018e2462-d8f0-7c3a-b4f2-8d1e9a3c2f01
//  ^-- 48-bit unix_ts_ms (sortable)     ^-- 74-bit random

// Java (java-uuid-generator library)
import java.util.UUID;
import com.fasterxml.uuid.Generators;
UUID uuid7 = Generators.timeBasedEpochGenerator().generate();
// e.g. 018e2462-d8f0-7c3a-b4f2-8d1e9a3c2f01

// PostgreSQL (native UUID v7 function since PostgreSQL 18)
SELECT uuidv7();

// Advantages over UUID v4:
// ✅ Time-sortable → mostly append-only inserts, far less B-tree fragmentation
// ✅ UUID-compatible → works with existing UUID columns
// ✅ No coordination required (no machine ID registry)
// ✅ Growing support: PostgreSQL 18 (uuidv7()), java-uuid-generator; on MySQL, generate v7 values in the application
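The bit layout is simple enough to assemble by hand with `java.util.UUID`, which helps when checking what a library produces. The sketch below follows the RFC 9562 field order (48-bit timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). `UuidV7Sketch` is a hypothetical name, and this version makes no attempt at same-millisecond monotonicity.

```java
import java.security.SecureRandom;
import java.util.UUID;

final class UuidV7Sketch {
    private static final SecureRandom RNG = new SecureRandom();

    /** Builds a UUID v7 for the given Unix time in milliseconds. */
    static UUID fromTimestamp(long unixMs) {
        long randA = RNG.nextLong() & 0x0FFFL;             // 12 random bits
        long randB = RNG.nextLong() & 0x3FFFFFFFFFFFFFFFL; // 62 random bits
        long msb = ((unixMs & 0xFFFFFFFFFFFFL) << 16)      // 48-bit timestamp in the high bits
                 | 0x7000L                                 // version 7
                 | randA;
        long lsb = 0x8000000000000000L | randB;            // variant bits "10"
        return new UUID(msb, lsb);
    }

    /** Recovers the embedded Unix timestamp in milliseconds. */
    static long timestampMs(UUID v7) {
        return v7.getMostSignificantBits() >>> 16;
    }

    public static void main(String[] args) {
        UUID id = fromTimestamp(System.currentTimeMillis());
        System.out.println(id + " version=" + id.version());
        System.out.println(timestampMs(id));
    }
}
```

Two IDs created in different milliseconds sort correctly as strings because the timestamp occupies the first twelve hex digits.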

9. Comparison Table & Decision Framework

| Approach | Size | Sortable | Coordination | Best For |
|---|---|---|---|---|
| DB Auto-Increment | 8B | ✅ Sequential | DB (SPOF) | Single-node DB |
| UUID v4 | 16B | ❌ Random | None | Secondary keys only |
| Snowflake (64-bit) | 8B | ✅ Time-ordered | Machine ID registry | Twitter/Discord scale |
| ULID | 16B (26 chars) | ✅ Lexicographic | None | URL-safe sortable IDs |
| UUID v7 | 16B | ✅ Time-ordered | None | Modern APIs (2026+) |
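The table and the TL;DR reduce to a short decision procedure. The sketch below encodes one reading of it. The two input flags and the `IdStrategyChooser` name are assumptions made for illustration, and real decisions usually weigh more factors than these.

```java
final class IdStrategyChooser {
    enum Strategy { DB_AUTO_INCREMENT, SNOWFLAKE, UUID_V7_OR_ULID }

    /**
     * @param singleNodeDatabase   true if all writes go through one database node
     * @param needs64BitIntegerIds true if IDs must fit a 64-bit integer column
     */
    static Strategy choose(boolean singleNodeDatabase, boolean needs64BitIntegerIds) {
        if (singleNodeDatabase) {
            return Strategy.DB_AUTO_INCREMENT; // simplest option, no distribution needed
        }
        if (needs64BitIntegerIds) {
            return Strategy.SNOWFLAKE;         // implies running a machine ID registry
        }
        return Strategy.UUID_V7_OR_ULID;       // sortable, no coordination
    }

    public static void main(String[] args) {
        System.out.println(choose(false, false));
        System.out.println(choose(false, true));
    }
}
```

UUID v4 never appears as an output because the guidance here limits it to secondary keys.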

10. Impact on Database Index Performance

The choice of ID type strongly affects B-tree index performance, and at scale it is one of the most consequential schema decisions.

Random vs. Sequential Insert Benchmark

| ID Type | Insert Speed (1M rows) | Index Fragmentation | Page Fill % |
|---|---|---|---|
| Auto-increment | Baseline (fastest) | None | ~99% |
| UUID v4 | 3–5× slower | High (page splits) | 50–60% |
| Snowflake | ~1.1× slower | Minimal | ~95% |
| UUID v7 / ULID | ~1.1× slower | Minimal | ~95% |

Practical Recommendation

11. Production Deployment & Conclusion

Deployment Patterns

ID Generator System Design Checklist

  • ☐ Never use UUID v4 as a clustered primary key in B-tree indexed tables
  • ☐ Snowflake: handle clock skew (wait or throw) and sequence overflow (wait for next ms)
  • ☐ Machine IDs assigned via ZooKeeper/etcd with ephemeral nodes (auto-released on crash)
  • ☐ New projects: default to UUID v7 or ULID for zero-coordination sortable IDs
  • ☐ Batch allocation used if ID service is centralized (>1000 IDs per batch request)
  • ☐ Custom epoch chosen far enough in past to avoid short IDs initially
  • ☐ Epoch expiry calculated and documented (Snowflake: ~69 years from epoch)
  • ☐ ID type decision documented — explain sortability, coordination, and size trade-offs
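The batch-allocation item in the checklist can be illustrated with a small in-memory model: clients reserve a whole range from the central service in one call and then hand out IDs locally. `CentralService` and `Client` are hypothetical names, and a real service would persist its counter before returning a range.

```java
import java.util.concurrent.atomic.AtomicLong;

final class BatchIdAllocator {
    /** Half-open range [start, endExclusive) reserved for one client. */
    static final class Range {
        final long start;
        final long endExclusive;

        Range(long start, long endExclusive) {
            this.start = start;
            this.endExclusive = endExclusive;
        }
    }

    /** Stand-in for the centralized ID service. */
    static final class CentralService {
        private final AtomicLong next = new AtomicLong(1);

        Range allocate(int size) {
            long start = next.getAndAdd(size);
            return new Range(start, start + size);
        }
    }

    /** Serves IDs from a cached range and refills only when it runs out. */
    static final class Client {
        private final CentralService service;
        private final int batchSize;
        private long cursor = 0;
        private long end = 0;
        private int roundTrips = 0;

        Client(CentralService service, int batchSize) {
            this.service = service;
            this.batchSize = batchSize;
        }

        synchronized long nextId() {
            if (cursor >= end) {
                Range range = service.allocate(batchSize); // one call per batch
                cursor = range.start;
                end = range.endExclusive;
                roundTrips++;
            }
            return cursor++;
        }

        synchronized int roundTrips() { return roundTrips; }
    }

    public static void main(String[] args) {
        CentralService service = new CentralService();
        Client a = new Client(service, 1000);
        Client b = new Client(service, 1000);
        System.out.println(a.nextId() + " " + b.nextId());
    }
}
```

The trade-off is that IDs are unique but only roughly ordered across clients, and any unused part of a range is lost when a client restarts.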

ID generation is a foundational decision that affects every other system layer. Getting it wrong early (UUID v4 primary keys at scale) leads to painful migrations later. In 2026, UUID v7 is the pragmatic default for new services: it requires no coordination, is sortable, fits existing UUID columns, and can be generated natively in PostgreSQL 18 or by a library in the application. Choose Snowflake when you need 64-bit integers or are building at Twitter/Discord scale, where every byte of storage costs real money.

Md Sanwar Hossain

Software Engineer · Java · Spring Boot · Microservices · System Design

Last updated: April 6, 2026