URL Shortener System Design: TinyURL & Bitly at Scale (2026)
A URL shortener is a deceptively simple service that hides enormous engineering challenges at scale. Designing one for 100 million URLs per day — like TinyURL or Bitly — forces you to make hard decisions about hash generation, collision handling, caching, database choice, analytics pipelines, and global distribution. This deep-dive covers every layer of the system with Java code examples you can take to an interview or production.
TL;DR — Core Design Decisions at a Glance
- Hash strategy: Use counter-based ID + Base62 encoding; avoid MD5 (collision-prone, expensive to resolve).
- Storage: NoSQL (Cassandra/DynamoDB) for URL mappings; Redis cluster for caching hot short codes.
- Redirects: HTTP 302 for analytics tracking; serve from CDN edge nodes for <10ms latency.
- Capacity: 100M writes/day ≈ 1,157 writes/sec; 10:1 read:write = ~11,570 reads/sec on average.
- Analytics: Async Kafka pipeline; never block the redirect path for click tracking.
Table of Contents
1. What Is a URL Shortener?
2. Functional & Non-Functional Requirements
3. High-Level Architecture
4. Hash Generation Strategies
5. Database Design
6. Caching Strategy
7. Custom Aliases & Rate Limiting
8. Analytics & Tracking
9. Scalability Deep Dive
10. Common Mistakes & Interview Tips
11. Conclusion & Key Takeaways
1. What Is a URL Shortener?
A URL shortener converts a long, unwieldy URL into a compact short code that redirects to the original destination. When a user visits the short URL, the service performs an HTTP redirect to the original long URL — transparently, within milliseconds.
The classic example: https://tinyurl.com/y7k2xq3p expands to a 200-character Amazon product URL. Real-world companies operating at scale include:
- Bitly — 10+ billion links created; enterprise analytics, custom domains, QR codes
- TinyURL — the original (1997), anonymous, no account required
- t.co — Twitter's internal shortener; every tweet link is wrapped automatically
- goo.gl — Google's deprecated shortener (shut down 2019); handled ~1B redirects/day at peak
- rb.gy, short.io, Rebrandly — modern competitors with white-label branded domains
Core Use Cases
- Social media sharing: Twitter's 280-character limit made short URLs essential; every URL consumes exactly 23 characters regardless of length via t.co wrapping.
- SMS & print campaigns: Short URLs are typeable and memorable in offline contexts.
- Link tracking & analytics: Marketers measure CTR, geo-distribution, device breakdown per campaign link.
- API responses & deep links: Programmatic URL generation for mobile deep links, QR codes, and referral systems.
- Expiring/one-time links: Password reset, file download, or payment confirmation URLs that expire after use.
In system design interviews, URL shorteners are a favorite because they are simple to understand but rich in trade-offs: hashing vs. sequential ID generation, SQL vs. NoSQL, cache consistency, 301 vs. 302 redirects, and horizontal scaling — all in one system.
2. Functional & Non-Functional Requirements
Functional Requirements
- Given a long URL, generate a unique short URL (e.g., https://short.ly/aB3kZ9q)
- Redirect a short URL to the original long URL with <10ms added latency
- Support optional custom aliases (e.g., /my-brand-launch)
- Support optional expiry date/time on short URLs
- Track click analytics: timestamp, referrer, geolocation, device/browser
- User accounts: dashboard for managing and auditing personal links
- Delete or deactivate a short URL
Non-Functional Requirements & Capacity Estimation
| Parameter | Assumption | Derived Metric |
|---|---|---|
| Daily URL writes | 100 million | ~1,157 writes/sec |
| Read:Write ratio | 10:1 | ~11,570 reads/sec |
| Storage per URL record | ~500 bytes avg | 50 GB/day; 18 TB/year |
| Short code length | 7 chars Base62 | 62^7 ≈ 3.5 trillion codes |
| Redirect latency SLA | p99 < 20ms | Requires aggressive caching |
| Availability | 99.99% uptime | <52 min downtime/year |
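The derived columns in the table are straight division; the arithmetic can be sanity-checked with a throwaway helper (illustrative only, not part of the system):

```java
// Back-of-envelope capacity math matching the table above
public class CapacityEstimate {
    static final long SECONDS_PER_DAY = 86_400L;

    // 100M writes/day spread evenly over 86,400 seconds
    public static long writesPerSecond(long writesPerDay) {
        return writesPerDay / SECONDS_PER_DAY;
    }

    // Apply the read:write ratio to the write rate
    public static long readsPerSecond(long writesPerDay, int readWriteRatio) {
        return writesPerSecond(writesPerDay) * readWriteRatio;
    }

    // 500 bytes/record converted to GB/day
    public static long storageGbPerDay(long writesPerDay, long bytesPerRecord) {
        return writesPerDay * bytesPerRecord / 1_000_000_000L;
    }
}
```

Multiplying the 50 GB/day figure out by 365 gives the ~18 TB/year storage estimate in the table.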
Key non-functional constraints: low-latency reads (the redirect is on the hot path for every user click), high availability (a dead link is a broken user experience for every downstream campaign), and eventual consistency for analytics (a few delayed click events are acceptable; a failed redirect is not).
3. High-Level Architecture
The system has two primary flows: the write path (create short URL) and the read path (redirect). These paths have very different performance profiles and can be scaled independently.
┌──────────────────────────────────────────────────────────────────┐
│ WRITE PATH │
│ │
│ Client ──► API Gateway ──► Write Service ──► ID Generator │
│ │ │ │
│ ▼ ▼ │
│ URL Validator Zookeeper/Range │
│ │ │
│ ▼ │
│ NoSQL DB (Cassandra) │
│ + Async Kafka → Analytics DB │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ READ PATH │
│ │
│ User ──► CDN Edge ──► Load Balancer ──► Redirect Service │
│ │ │ │
│ │ (cache HIT) ▼ │
│ └──────────── Redis Cluster ◄───── Lookup │
│ │ │ │
│ cache HIT cache MISS │
│ │ │ │
│ HTTP 302 Cassandra │
│ Redirect Lookup │
│ + populate cache │
└──────────────────────────────────────────────────────────────────┘
Component Responsibilities
- API Gateway: Rate limiting, authentication, TLS termination, routing to write vs. read service clusters
- Write Service: URL validation, short code generation, deduplication check, database write
- Redirect Service: Short code lookup (cache first, DB fallback), HTTP redirect, async click event emit
- ID Generator: Distributed counter with pre-allocated range blocks — eliminates collision by construction
- Redis Cluster: Cache short_code → long_url mappings; 20% of hot URLs absorb 80% of redirect traffic
- Cassandra: Distributed NoSQL store for all URL records; partition key = short_code for O(1) lookup
- Kafka: Click events streamed asynchronously to analytics consumers — zero impact on redirect latency
- CDN: Edge caching of redirect responses for globally distributed low-latency access
4. Hash Generation Strategies
Choosing the right short code generation strategy is the most critical design decision. The wrong approach causes collisions, wasted DB round-trips, and unpredictable performance under load.
Strategy Comparison
| Strategy | Pros | Cons | Verdict |
|---|---|---|---|
| MD5 / SHA-256 Hash | Same URL always maps to same code (deterministic) | Hash collisions require DB check + retry loop; truncation worsens collision rate | ❌ Avoid |
| Random Base62 | Simple, no coordination needed | Collision probability increases as DB fills; must always check DB before write | ⚠ Acceptable |
| Counter + Base62 | Zero collisions; predictable; tiny storage | Requires distributed counter coordination (Zookeeper/Redis) | ✅ Recommended |
| UUID + Truncate | Decentralized generation; no coordination | Truncating 128 bits to 7 chars discards most of the entropy; birthday-bound collisions appear long before the keyspace fills | ❌ Avoid |
Base62 Encoding Explained
Base62 uses the character set 0-9a-zA-Z (62 characters). A 7-character code gives 62^7 = 3,521,614,606,208 unique values. The mapping simply converts an integer counter to its Base62 representation:
// ❌ Bad - MD5 hash with truncation causes collisions and wastes DB round-trips
import java.security.MessageDigest;
public class BadUrlShortener {
public String shorten(String longUrl) throws Exception {
// ❌ MD5 is cryptographically broken and produces 128-bit output
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] hash = md.digest(longUrl.getBytes());
// ❌ Taking only first 7 chars of hex increases collision probability dramatically
String hexHash = bytesToHex(hash);
String shortCode = hexHash.substring(0, 7);
// ❌ Must check DB for collision on EVERY write — O(n) as DB fills
if (codeExists(shortCode)) {
// ❌ Appending counter and re-checking is fragile under concurrency
shortCode = hexHash.substring(1, 8);
if (codeExists(shortCode)) {
throw new RuntimeException("Too many collisions"); // 💀
}
}
return shortCode;
}
private String bytesToHex(byte[] bytes) {
StringBuilder sb = new StringBuilder();
for (byte b : bytes) sb.append(String.format("%02x", b));
return sb.toString();
}
private boolean codeExists(String code) { /* DB lookup */ return false; }
}
// ✅ Good - Counter-based Base62 encoding with range pre-allocation eliminates collisions
public class Base62Encoder {
private static final String ALPHABET =
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static final int BASE = 62;
private static final int CODE_LENGTH = 7;
// ✅ Convert a unique monotonic counter value to a 7-char Base62 string
public String encode(long id) {
StringBuilder sb = new StringBuilder();
while (id > 0) {
sb.append(ALPHABET.charAt((int)(id % BASE)));
id /= BASE;
}
// ✅ Pad to fixed length to prevent code length leaking sequence info
while (sb.length() < CODE_LENGTH) {
sb.append(ALPHABET.charAt(0));
}
return sb.reverse().toString();
}
// ✅ Decode back to the original counter ID (useful for debugging)
public long decode(String code) {
long result = 0;
for (char c : code.toCharArray()) {
result = result * BASE + ALPHABET.indexOf(c);
}
return result;
}
}
// ✅ Distributed ID generator using range pre-allocation
// Each app node claims a range (e.g., 1000 IDs at a time) from Zookeeper/Redis
// and generates IDs locally without network calls — zero collision, high throughput
@Component
public class RangeBasedIdGenerator {
private final RedisTemplate<String, Long> redisTemplate;
private static final String COUNTER_KEY = "url:global:counter";
private static final long RANGE_SIZE = 1_000L;
private long rangeStart = 0;
private long rangeEnd = 0;
private final Object lock = new Object();
public RangeBasedIdGenerator(RedisTemplate<String, Long> redisTemplate) {
this.redisTemplate = redisTemplate;
}
// ✅ Claim a new range atomically from Redis when local range is exhausted
public long nextId() {
synchronized (lock) {
if (rangeStart >= rangeEnd) {
// Atomic increment — claims RANGE_SIZE IDs in one Redis call
Long newEnd = redisTemplate.opsForValue()
.increment(COUNTER_KEY, RANGE_SIZE);
rangeEnd = newEnd;
rangeStart = newEnd - RANGE_SIZE;
}
return rangeStart++;
}
}
}
5. Database Design
The database choice underpins the scalability of your entire system. The URL mapping table is a classic write-once, read-many workload — an almost perfect use case for NoSQL.
SQL vs. NoSQL Trade-off Analysis
| Criterion | SQL (PostgreSQL / MySQL) | NoSQL (Cassandra / DynamoDB) |
|---|---|---|
| Horizontal scaling | Hard — requires sharding middleware or Vitess | Native — add nodes, data redistributes automatically |
| Read throughput at scale | Good with read replicas; cap at ~100K RPS | Excellent; millions of RPS with linear scaling |
| Data model fit | Overkill — joins & ACID not needed for key-value lookup | Perfect — short_code is the partition key, O(1) lookup |
| Consistency model | Strong ACID — guarantees unique short codes natively | Eventual (tunable); use lightweight transactions for uniqueness |
Use SQL for user accounts, billing, and subscription data — where ACID guarantees and relational queries matter. Use NoSQL for the URL mapping table — a pure key-value workload: 18 TB/year, billions of rows, read-heavy.
Cassandra Schema
-- CQL Schema for URL mapping table
CREATE KEYSPACE url_shortener
WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
AND durable_writes = true;
CREATE TABLE url_shortener.url_mappings (
    short_code TEXT PRIMARY KEY,  -- partition key = O(1) lookup
    long_url   TEXT,
    user_id    UUID,
    created_at TIMESTAMP,
    expires_at TIMESTAMP,
    is_active  BOOLEAN
);
-- Cassandra forbids mixing counter columns with regular columns,
-- so the approximate click counter lives in its own table
CREATE TABLE url_shortener.url_click_counts (
    short_code  TEXT PRIMARY KEY,
    click_count COUNTER           -- approximate; exact counts in analytics DB
);
CREATE TABLE url_shortener.user_urls (
user_id UUID,
created_at TIMESTAMP,
short_code TEXT,
long_url TEXT,
PRIMARY KEY (user_id, created_at) -- partition by user, cluster by time
) WITH CLUSTERING ORDER BY (created_at DESC);
-- Index for deduplication: same long URL by the same user → return existing short code
CREATE INDEX ON url_shortener.url_mappings (long_url); -- use sparingly!
Why separate tables? The url_mappings table is optimized for the redirect path (lookup by short_code). The user_urls table is optimized for the dashboard query (list all links by user_id, paginated by time). This is Cassandra's denormalization pattern — store data in the shape of your queries.
6. Caching Strategy
Caching is the most impactful optimization for the redirect path. URL access follows a Zipfian distribution — the top 20% of URLs account for 80%+ of redirects. A well-tuned Redis cache can serve 95%+ of redirects without a database hit.
Cache Design Decisions
- Cache aside (lazy loading): On cache miss, redirect service fetches from Cassandra and populates Redis. Simple, no stale data on writes.
- TTL: 24–48 hours for regular URLs; shorter (1 hour) for expiring links. Never cache deleted/deactivated URLs — use a tombstone entry with TTL=5 minutes.
- Cache capacity: 20% of daily URLs fit in cache covers 80% of traffic. At 500 bytes/entry × 20M entries = ~10 GB RAM — very manageable for Redis.
- Eviction policy: allkeys-lru — evict the least recently used key when memory is full. This automatically keeps hot URLs in cache.
- Redis cluster: Shard across 6 nodes (3 primary + 3 replica) for HA and throughput; keys are distributed across the cluster's 16,384 hash slots.
// ✅ Good - Cache-aside pattern with Spring Boot + Redis for URL redirect lookup
@Service
public class UrlRedirectService {
private static final String CACHE_KEY_PREFIX = "url:short:";
private static final Duration CACHE_TTL = Duration.ofHours(24);
private static final String TOMBSTONE = "__DELETED__";
private final StringRedisTemplate redisTemplate;
private final UrlMappingRepository cassandraRepo;
public UrlRedirectService(StringRedisTemplate redisTemplate,
UrlMappingRepository cassandraRepo) {
this.redisTemplate = redisTemplate;
this.cassandraRepo = cassandraRepo;
}
// ✅ Cache-aside: check cache first, fall back to Cassandra, then populate cache
public String resolveLongUrl(String shortCode) {
String cacheKey = CACHE_KEY_PREFIX + shortCode;
// 1. Check Redis cache
String cached = redisTemplate.opsForValue().get(cacheKey);
if (cached != null) {
if (TOMBSTONE.equals(cached)) {
// ✅ Tombstone entry means URL was deleted — avoid DB hit
throw new UrlNotFoundException(shortCode);
}
return cached;
}
// 2. Cache miss — query Cassandra
UrlMapping mapping = cassandraRepo.findByShortCode(shortCode)
.orElseThrow(() -> {
// ✅ Write tombstone to prevent cache stampede on non-existent codes
redisTemplate.opsForValue().set(cacheKey, TOMBSTONE,
Duration.ofMinutes(5));
return new UrlNotFoundException(shortCode);
});
// 3. Validate: check expiry and active flag
if (!mapping.isActive() || isExpired(mapping)) {
redisTemplate.opsForValue().set(cacheKey, TOMBSTONE,
Duration.ofMinutes(5));
throw new UrlExpiredException(shortCode);
}
// 4. Populate cache and return
redisTemplate.opsForValue().set(cacheKey, mapping.getLongUrl(), CACHE_TTL);
return mapping.getLongUrl();
}
// ✅ Invalidate cache on URL deletion or deactivation
public void invalidate(String shortCode) {
String cacheKey = CACHE_KEY_PREFIX + shortCode;
redisTemplate.opsForValue().set(cacheKey, TOMBSTONE, Duration.ofMinutes(5));
}
private boolean isExpired(UrlMapping mapping) {
return mapping.getExpiresAt() != null
&& mapping.getExpiresAt().isBefore(Instant.now());
}
}
7. Custom Aliases & Rate Limiting
Custom Aliases
Custom aliases (e.g., short.ly/my-product-launch) are user-specified short codes. They require additional validation: alias format check, reserved word filtering, uniqueness check, and user tier enforcement (free users get 5 custom aliases; premium get unlimited).
- Validate the alias against the regex ^[a-zA-Z0-9_-]{3,30}$ — no path separators or reserved URI characters
- Block reserved words: admin, api, login, health, metrics, etc.
- Check Cassandra for uniqueness with a conditional write (INSERT ... IF NOT EXISTS, a Cassandra lightweight transaction)
- Return HTTP 409 Conflict if the alias is already taken
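The format and reserved-word checks might be sketched as a small validator (the class name and the exact reserved-word set are illustrative):

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical alias validator covering the first two checks above;
// the uniqueness check still happens in the database via a conditional write
public class AliasValidator {
    private static final Pattern ALIAS_PATTERN =
        Pattern.compile("^[a-zA-Z0-9_-]{3,30}$");

    // Words that would shadow real application routes
    private static final Set<String> RESERVED =
        Set.of("admin", "api", "login", "health", "metrics");

    public static boolean isValid(String alias) {
        if (alias == null || !ALIAS_PATTERN.matcher(alias).matches()) {
            return false; // bad format: wrong length or illegal characters
        }
        return !RESERVED.contains(alias.toLowerCase()); // reject reserved words
    }
}
```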
Rate Limiting
Rate limiting protects the write path from abuse (scraping, competitor bulk creation, DoS). Use a Token Bucket or Sliding Window Counter per user/IP, backed by Redis for distributed enforcement.
// ✅ Good - Sliding window rate limiter using Redis sorted set
// Allows N requests per window, tracked per user ID
@Component
public class SlidingWindowRateLimiter {
private final StringRedisTemplate redisTemplate;
// ✅ Configuration: max 100 URL creations per hour per user
private static final long WINDOW_MILLIS = 3_600_000L; // 1 hour
private static final long MAX_REQUESTS = 100L;
public SlidingWindowRateLimiter(StringRedisTemplate redisTemplate) {
this.redisTemplate = redisTemplate;
}
/**
* ✅ Returns true if the request is allowed; false if rate limit exceeded.
* Uses Redis ZSET: score = timestamp, member = unique request ID
*/
public boolean isAllowed(String userId) {
String key = "ratelimit:create:" + userId;
long now = System.currentTimeMillis();
long windowStart = now - WINDOW_MILLIS;
// ✅ Execute as a Lua script for atomicity — avoids TOCTOU race conditions
String luaScript =
"local key = KEYS[1] " +
"local now = tonumber(ARGV[1]) " +
"local window = tonumber(ARGV[2]) " +
"local max = tonumber(ARGV[3]) " +
"local id = ARGV[4] " +
// Remove timestamps outside the window
"redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window) " +
// Count requests inside the window
"local count = redis.call('ZCARD', key) " +
"if count < max then " +
" redis.call('ZADD', key, now, id) " +
" redis.call('EXPIRE', key, math.ceil(window / 1000)) " +
" return 1 " +
"else " +
" return 0 " +
"end";
Long result = redisTemplate.execute(
new DefaultRedisScript<>(luaScript, Long.class),
List.of(key),
String.valueOf(now),
String.valueOf(WINDOW_MILLIS),
String.valueOf(MAX_REQUESTS),
userId + ":" + now // unique member per request
);
return Long.valueOf(1L).equals(result);
}
}
8. Analytics & Tracking
Analytics is a key differentiator for paid tiers (Bitly Enterprise charges for advanced analytics). Every click generates a rich event: timestamp, referrer URL, IP → geolocation, User-Agent → device/browser/OS, and country/region.
Architecture: Never Block the Redirect
The cardinal rule of analytics in a URL shortener: the redirect must never wait for analytics to complete. Analytics writes must be 100% asynchronous. The flow:
- The Redirect Service resolves the short code, emits a click event to the Kafka topic url.click.events, then immediately returns the HTTP 302 redirect. Total added latency: <1ms for the fire-and-forget Kafka produce call.
- A separate Analytics Consumer (Kafka consumer group) processes click events in micro-batches, enriches them with MaxMind GeoIP and User-Agent parsing, then writes to ClickHouse (or TimescaleDB) for fast analytical queries.
- An Aggregation Job (Spark Streaming or Flink) pre-aggregates hourly counts, top referrers, and geo heatmaps into a reporting table read by the user dashboard.
// ✅ Clean API — Redirect Controller with async analytics
@RestController
@RequestMapping("/")
public class RedirectController {
private final UrlRedirectService redirectService;
private final KafkaTemplate<String, ClickEvent> kafkaTemplate;
private static final String CLICK_TOPIC = "url.click.events";
public RedirectController(UrlRedirectService redirectService,
KafkaTemplate<String, ClickEvent> kafkaTemplate) {
this.redirectService = redirectService;
this.kafkaTemplate = kafkaTemplate;
}
@GetMapping("/{shortCode}")
public ResponseEntity<Void> redirect(
@PathVariable String shortCode,
HttpServletRequest request) {
// 1. Resolve URL (cache → DB)
String longUrl = redirectService.resolveLongUrl(shortCode);
// 2. Fire-and-forget analytics event — non-blocking
ClickEvent event = ClickEvent.builder()
.shortCode(shortCode)
.timestamp(Instant.now())
.ipAddress(extractClientIp(request))
.userAgent(request.getHeader("User-Agent"))
.referer(request.getHeader("Referer"))
.build();
kafkaTemplate.send(CLICK_TOPIC, shortCode, event); // async, returns Future
// 3. 302 redirect — browser will NOT cache this (unlike 301)
// ✅ Use 302 to ensure every click hits the server for analytics tracking
return ResponseEntity.status(HttpStatus.FOUND)
.header(HttpHeaders.LOCATION, longUrl)
.header("Cache-Control", "no-cache, no-store")
.build();
}
// ✅ Extract real client IP — handle X-Forwarded-For from CDN/load balancer
private String extractClientIp(HttpServletRequest request) {
String xff = request.getHeader("X-Forwarded-For");
if (xff != null && !xff.isBlank()) {
return xff.split(",")[0].trim(); // first IP in chain is the client
}
return request.getRemoteAddr();
}
}
// ✅ Clean API — URL creation endpoint
@RestController
@RequestMapping("/api/v1/urls")
public class UrlCreateController {
private final UrlCreateService urlCreateService;
private final SlidingWindowRateLimiter rateLimiter;
@PostMapping
public ResponseEntity<UrlCreateResponse> createShortUrl(
@Valid @RequestBody UrlCreateRequest request,
@AuthenticationPrincipal UserDetails user) {
if (!rateLimiter.isAllowed(user.getUsername())) {
return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS).build();
}
UrlCreateResponse response = urlCreateService.create(
request.getLongUrl(),
request.getCustomAlias(), // nullable
request.getExpiresAt(), // nullable
user.getUsername()
);
return ResponseEntity.status(HttpStatus.CREATED).body(response);
}
}
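The Analytics Consumer itself is not shown above; as a rough sketch, its micro-batch aggregation step might look like this (the event shape is simplified, and the Kafka polling and ClickHouse writes are omitted):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical micro-batch aggregation for the analytics consumer.
// Real code would poll a Kafka consumer and persist to ClickHouse;
// only the pure aggregation logic is shown here.
public class ClickAggregator {
    // Simplified stand-in for the enriched click event
    public record Click(String shortCode, String country) {}

    // Count clicks per short code within one micro-batch
    public static Map<String, Long> countByShortCode(List<Click> batch) {
        Map<String, Long> counts = new HashMap<>();
        for (Click c : batch) {
            counts.merge(c.shortCode(), 1L, Long::sum);
        }
        return counts;
    }
}
```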
9. Scalability Deep Dive
Horizontal Scaling the Application Tier
Both the write service and redirect service are stateless — they hold no local session state. This makes horizontal scaling trivial: add more instances behind a load balancer. The only shared state is in Redis and Cassandra. Target 10–20 redirect service pods per region, autoscaled on CPU and RPS.
Consistent Hashing for Redis Cluster
When adding Redis nodes to handle traffic growth, consistent hashing minimizes cache invalidation. Standard modular sharding (hash % N) invalidates ~100% of keys when N changes. Consistent hashing invalidates only ~1/N keys — critical for a cache serving 100M+ items.
Note that Redis Cluster does not use a classic consistent-hash ring: it partitions the key space into 16,384 fixed hash slots that can be migrated between nodes, which achieves the same goal of minimal key movement when the cluster resizes. Spring Data Redis with Lettuce or Jedis handles cluster-aware routing transparently. Ring-based systems such as Cassandra use virtual nodes (vnodes) to keep key distribution even when nodes have different capacities.
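For intuition, the ring-with-virtual-nodes technique can be sketched as follows (illustrative only; Redis Cluster's slot mechanism differs in detail):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring with virtual nodes.
// Each physical node owns `vnodes` points on the ring; a key maps to
// the first vnode clockwise from its own hash position.
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int vnodes;

    public ConsistentHashRing(int vnodes) { this.vnodes = vnodes; }

    public void addNode(String node) {
        for (int i = 0; i < vnodes; i++) {
            ring.put(hash(node + "#" + i), node); // place virtual points
        }
    }

    // Walk clockwise to the first vnode at or after the key's hash
    public String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }

    // First 32 bits of an MD5 digest as an unsigned ring position
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                 | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (Exception e) { throw new IllegalStateException(e); }
    }
}
```

Adding a node only claims the ring segments adjacent to its vnodes, so roughly 1/N of keys move — the property the text describes.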
CDN for Global Edge Caching
For globally distributed systems, serve redirects from CDN edge nodes (Cloudflare Workers, AWS CloudFront with Lambda@Edge). A user in Tokyo hitting a Cloudflare edge near Tokyo gets a redirect in <5ms instead of <100ms to a US origin. CDN caches the 302 response with a short TTL (e.g., 60 seconds) — balancing freshness with latency.
Important: CDN edge caching of redirects means analytics events are NOT generated at the origin server. Handle this by having the CDN forward a click beacon (async HTTP request) to your analytics ingestion endpoint, or accept that CDN-cached redirects will under-count clicks.
Multi-Region Deployment
- Active-active: Write service runs in all regions; Cassandra uses multi-datacenter replication with LOCAL_QUORUM for writes. Consistency sacrifice: ~100ms replication lag between regions is acceptable for URL creation.
- ID generation: Each region is assigned a unique prefix bit pattern in the counter to prevent cross-region ID collisions without coordination. (Similar to Twitter Snowflake: region bits in the ID.)
- DNS routing: Route 53 or Cloudflare Geo-Routing sends users to the nearest region. Failover is automatic if health checks fail.
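The region-bit idea can be sketched as a Snowflake-style bit layout; the widths below (4 region bits, 12 sequence bits) are illustrative assumptions, not a production layout:

```java
// Hypothetical Snowflake-style ID: [ timestamp | region | sequence ].
// Region bits guarantee cross-region uniqueness with no coordination.
public class RegionAwareId {
    static final int REGION_BITS = 4;    // up to 16 regions
    static final int SEQUENCE_BITS = 12; // 4,096 IDs per ms per region

    public static long compose(long timestampMillis, long region, long sequence) {
        return (timestampMillis << (REGION_BITS + SEQUENCE_BITS))
             | (region << SEQUENCE_BITS)
             | sequence;
    }

    // Extract the region bits back out of an ID
    public static long region(long id) {
        return (id >> SEQUENCE_BITS) & ((1L << REGION_BITS) - 1);
    }

    // Extract the per-millisecond sequence number
    public static long sequence(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }
}
```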
Handling Thundering Herd / Cache Stampede
When a viral URL's cache entry expires simultaneously, thousands of requests hit Cassandra at once — the classic cache stampede. Mitigations:
- Mutex lock: The first miss acquires a Redis distributed lock (SET key value NX EX ttl, supported by Jedis/Lettuce); other misses wait for the first request to populate the cache.
- Probabilistic early expiry (PER): Refresh the cache entry slightly before it expires, with a small probability that increases as the TTL approaches zero. This prevents synchronized expiry.
- Background refresh: A background thread refreshes cache entries that are approaching TTL expiry, keeping them warm before the stampede window.
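The probabilistic early expiry idea can be sketched with an XFetch-style formula: refresh when a random gap, scaled by the estimated recompute cost and a tunable beta, covers the remaining TTL. The parameter values here are assumptions:

```java
import java.util.Random;

// Probabilistic early refresh: the refresh probability rises smoothly as
// the TTL approaches zero, de-synchronizing refreshes across requests.
public class EarlyRefresh {
    /**
     * @param ttlRemainingMs time until the cache entry expires
     * @param recomputeMs    estimated cost of recomputing the entry
     * @param beta           tuning knob; >1 refreshes more eagerly
     */
    public static boolean shouldRefresh(long ttlRemainingMs, long recomputeMs,
                                        double beta, Random rng) {
        // -log(U) is exponentially distributed; scale it by cost and beta
        double gap = -recomputeMs * beta * Math.log(rng.nextDouble());
        return gap >= ttlRemainingMs; // refresh early when the gap covers the TTL
    }
}
```

With plenty of TTL left the gap almost never reaches it, so refreshes are rare; as expiry nears, refreshes become near-certain, and different requests trigger at different random moments instead of all at once.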
10. Common Mistakes & Interview Tips
Mistakes Candidates Commonly Make
Mistake 1: Using MD5 for Hash Generation
MD5 is not an ID generator — it's a fingerprint. Truncating the 32-char hex digest to 7 characters leaves only 16^7 ≈ 268 million possible codes, so at 1 billion URLs collisions are guaranteed by the pigeonhole principle — and the birthday bound makes them frequent long before that. Always clarify in the interview that you'll use counter-based Base62 encoding instead.
Mistake 2: Using 301 Instead of 302
301 (Moved Permanently) causes browsers to cache the redirect locally and go directly to the destination on future visits — bypassing your server entirely. This eliminates all analytics data for repeat visitors. The correct choice for a tracked URL shortener is 302 (Found), which browsers treat as temporary and do not cache by default.
Mistake 3: Blocking the Redirect for Analytics
Writing analytics data synchronously (calling the analytics DB inside the redirect handler) adds 5–50ms to every single redirect. At 100M redirects/day, even 10ms of extra latency is catastrophic. Always use an async message queue (Kafka, SQS) and decouple analytics writes from the hot path.
Mistake 4: Not Discussing the ID Generator Bottleneck
A single Redis counter for ID generation is a single point of failure and a bottleneck under high write throughput. Always mention range pre-allocation (each write node claims a batch of 1,000 IDs) and Zookeeper-based coordination for true distributed ID generation (Twitter Snowflake pattern).
Mistake 5: Ignoring URL Validation and Security
A URL shortener without validation becomes a phishing vector. Production systems must: validate URL format, check against malware/phishing URL blocklists (Google Safe Browsing API), block javascript: and data: scheme URLs, and show a preview/warning page for suspicious destinations.
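A minimal scheme allowlist for the validation step might look like this (illustrative; a production system would additionally consult a blocklist service such as the Google Safe Browsing API):

```java
import java.net.URI;
import java.util.Set;

// Reject any URL whose scheme is not plain HTTP(S) — this blocks
// javascript:, data:, file:, and other dangerous schemes in one check
public class UrlValidator {
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    public static boolean isSafeToShorten(String url) {
        try {
            String scheme = URI.create(url.trim()).getScheme();
            return scheme != null
                && ALLOWED_SCHEMES.contains(scheme.toLowerCase());
        } catch (IllegalArgumentException e) {
            return false; // malformed URL — refuse to shorten
        }
    }
}
```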
Interview Bonus Points
- Mention Bloom filter as a pre-check before DB lookups for non-existent short codes (reduces unnecessary DB reads for 404 abuse)
- Discuss database TTL / scheduled cleanup for expired URLs to reclaim storage
- Propose short URL recycling for expired codes after a grace period
- Address abuse prevention: CAPTCHA for anonymous creation, honeypot fields, anomaly detection on high-volume creators
- Mention vanity metrics: interviewers love it when you ask "should the system support the same long URL creating multiple distinct short codes or deduplicate?"
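The Bloom filter pre-check from the first bullet can be sketched with a toy implementation (a real system would use a library implementation, e.g. Guava's BloomFilter, sized from expected insertions and a target false-positive rate):

```java
import java.util.BitSet;

// Toy Bloom filter for the "does this short code possibly exist?" pre-check.
// A negative answer is definitive (skip the DB); a positive answer means
// "maybe present" and still requires the real lookup.
public class ShortCodeBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    public ShortCodeBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    public void add(String code) {
        for (int i = 0; i < hashes; i++) bits.set(index(code, i));
    }

    public boolean mightContain(String code) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(code, i))) return false; // definitely absent
        }
        return true; // possibly present
    }

    // Derive k bit positions by perturbing the string hash per round
    private int index(String code, int i) {
        int h = code.hashCode() * 31 + i * 0x9E3779B9;
        return Math.floorMod(h, size);
    }
}
```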
11. Conclusion & Key Takeaways
A production-grade URL shortener at Bitly/TinyURL scale is a masterclass in distributed systems trade-offs. The system is deceptively simple to describe but requires careful engineering at every layer:
Key Takeaways Checklist
- ✅ Hash generation: Counter + Base62 encoding — zero collisions, O(1) generation
- ✅ Storage: NoSQL (Cassandra) for URL mappings; SQL only for user accounts
- ✅ Caching: Redis with LRU eviction and tombstone entries; target 95%+ cache hit rate
- ✅ Redirect: HTTP 302 (not 301) to enable per-click analytics tracking
- ✅ Analytics: Async Kafka pipeline — never block the redirect on analytics writes
- ✅ Rate limiting: Sliding window counter per user in Redis; Lua scripts for atomicity
- ✅ Scalability: Stateless app tier + Redis cluster + CDN edge caching for global distribution
- ✅ Security: URL validation, Safe Browsing API check, reserved alias blocklist
Whether you're taking this to a system design interview or building a real product, the patterns here — cache-aside, async analytics, range-based ID generation, CDN edge caching — apply far beyond URL shorteners. They are the building blocks of any high-throughput, low-latency read-heavy distributed service.
FAQs: URL Shortener System Design
Q: How many characters should a short URL code be?
7 characters using Base62 (a-z, A-Z, 0-9) gives 62^7 ≈ 3.5 trillion unique codes — more than enough for billions of URLs with headroom. TinyURL uses 8 characters; Bitly uses 7. 6 characters yields 62^6 ≈ 56 billion codes, which is still sufficient for most services.
Q: What database should I use for a URL shortener?
NoSQL (Cassandra, DynamoDB) is preferred for the URL mapping table because read throughput far exceeds write throughput (10:1), the data model is a simple key-value lookup, and horizontal scaling is straightforward. Use a relational database only if you need strong transactional guarantees for user accounts or billing.
Q: How do you handle hash collisions in a URL shortener?
First, check the database before inserting — if the short code already exists for a different long URL, append a counter suffix or re-hash with a salt and retry. Using a Bloom filter as a pre-check avoids unnecessary database round-trips. A counter-based pre-allocation strategy (range-based ID generation) eliminates collisions entirely.
Q: How do you scale a URL shortener to handle 100M URLs per day?
Use horizontal scaling at the application tier behind a load balancer, a distributed cache (Redis cluster) for hot URLs, a NoSQL database with sharding for storage, CDN edge nodes for the redirect path, and an async event pipeline (Kafka) for click analytics. The redirect path must be ultra-low latency — target under 10ms p99.
Q: What is the difference between 301 and 302 redirects in a URL shortener?
301 (Moved Permanently) instructs browsers to cache the redirect and go directly to the destination on future visits — this reduces server load but prevents analytics tracking of repeat visits. 302 (Found) forces the browser to hit your server every time, enabling accurate click counting. Services like Bitly use 302 to track every click for analytics.