System Design Interview Prep: Design URL Shortener, Twitter & YouTube Step-by-Step (2026)
System design interviews are the highest-stakes round at top tech companies — a 45-minute whiteboard session where vagueness is penalized and structured thinking is rewarded. Unlike coding interviews with deterministic answers, system design requires you to demonstrate judgment: knowing what to estimate, what to optimize, and what tradeoffs to explicitly discuss. This guide walks through a repeatable framework and three complete designs you can study and adapt.
The System Design Interview Framework
The most common failure mode in system design interviews is jumping straight to solutions. An interviewer presents "design Twitter" and the candidate immediately says "I'd use Kafka for the message queue." This is equivalent to a surgeon picking up a scalpel before examining the patient. Before any technical decision, you must establish requirements, constraints, and scale — because the right architecture for 10,000 daily active users is fundamentally different from the one for 100 million.
The RADIO framework provides a structured 45-minute approach. R = Requirements (5 min): clarify functional requirements (what the system does) and non-functional requirements (latency, availability, consistency, durability). A = API Design (5 min): define the public interface — endpoints, request/response contracts. D = Data Model (5 min): schema design, entity relationships, choice of storage engine. I = Implementation (20 min): deep-dive the core components — queues, caches, databases, services. O = Optimization (10 min): bottlenecks, scaling strategies, failure modes, monitoring.
Interviewers award credit for each layer, so even if you run out of time during Implementation, you will have accumulated partial credit from Requirements through Data Model. Never skip Requirements to appear impressive — candidates who skip directly to Kafka or Cassandra without establishing scale requirements are silently marked down for poor engineering judgment.
| Phase | Time | Key Questions to Answer |
|---|---|---|
| Requirements | 5 min | What are the core features? Read:write ratio? Target scale? SLA? |
| API Design | 5 min | REST vs gRPC? What endpoints? Auth model? |
| Data Model | 5 min | Entities? SQL vs NoSQL? Indexing strategy? |
| Implementation | 20 min | Component diagram, data flow, critical path, caching layer |
| Optimization | 10 min | Bottlenecks, hotspots, failure modes, monitoring |
Design #1: URL Shortener (Like bit.ly)
The URL shortener is the canonical system design warm-up. It appears simple but introduces fundamental concepts: hashing, collisions, caching read-heavy workloads, and geographic distribution. Start with requirements. Functional: given a long URL, return a short code; given a short code, redirect to the original URL. Non-functional: 100 million URLs stored, 1 billion redirects per day (100:1 read:write ratio), redirect latency <10ms P99 globally, 99.99% availability.
Capacity estimation: 1 billion redirects/day = ~11,600 redirects/second. At 100:1 ratio, ~116 writes/second. Each URL record: ~500 bytes (short code 7 chars, long URL ~250 chars, user ID, timestamps). 100 million URLs × 500 bytes = ~50 GB total storage — fits in a single database with proper indexing, no sharding required. For caching: top 20% of URLs serve ~80% of traffic (Pareto). Cache 20M hot URLs × 500 bytes = ~10 GB in Redis — trivially affordable.
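The arithmetic above can be sketched as a pair of helper functions — an illustrative back-of-envelope calculator for interview practice, not part of any real system (all names are made up):

```java
// Back-of-envelope helpers for the capacity estimates above (illustrative only).
public class CapacityEstimate {
    static final long SECONDS_PER_DAY = 86_400L;

    // Requests per second from a daily total.
    public static long rps(long requestsPerDay) {
        return requestsPerDay / SECONDS_PER_DAY;
    }

    // Total bytes for a record count at a fixed record size.
    public static long storageBytes(long records, long bytesPerRecord) {
        return records * bytesPerRecord;
    }

    public static void main(String[] args) {
        System.out.println("Redirect RPS: " + rps(1_000_000_000L));        // 11574, i.e. ~11,600
        System.out.println("Write RPS:    " + rps(1_000_000_000L) / 100);  // 115, i.e. ~116
        System.out.println("Storage (GB): "
            + storageBytes(100_000_000L, 500) / 1_000_000_000L);           // 50
    }
}
```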
The encoding algorithm must generate a 7-character short code that is unique. Base62 (a-z, A-Z, 0-9) gives 62^7 ≈ 3.5 trillion combinations — more than sufficient for 100 million URLs. The two main approaches are: (1) hash the long URL using MD5/SHA-256 and take the first 7 characters of the base62-encoded digest, or (2) use an auto-increment ID from the database and encode it in base62. Approach 1 risks collisions once the hash is truncated to 7 characters, which forces a check-and-retry loop on conflict. Approach 2 is collision-free by construction; its one downside is that sequential codes are guessable, which you can mitigate with a random offset or a bijective scramble of the ID.
// Base62 encoding for URL shortener
public class Base62Encoder {
    private static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int BASE = 62;

    public static String encode(long id) {
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % BASE)));
            id /= BASE;
        }
        // 'a' is the zero digit in this alphabet, so padding here becomes
        // leading zeros after the reverse — decode() still round-trips.
        while (sb.length() < 7) sb.append('a');
        return sb.reverse().toString();
    }

    public static long decode(String code) {
        long id = 0;
        for (char c : code.toCharArray()) {
            id = id * BASE + ALPHABET.indexOf(c);
        }
        return id;
    }
}
// API endpoints
// POST /api/v1/shorten
// Request: { "longUrl": "https://example.com/very/long/path", "customAlias": "mylink" }
// Response: { "shortCode": "ab3Xz9k", "shortUrl": "https://sho.rt/ab3Xz9k", "expiresAt": "2027-03-10" }
// GET /:code → 301 Redirect to longUrl (301 = permanent, cached by browser; 302 = temporary)
Database schema: a single urls table with columns id BIGINT PRIMARY KEY, short_code VARCHAR(10) UNIQUE, long_url TEXT, user_id BIGINT, created_at TIMESTAMP, expires_at TIMESTAMP, click_count BIGINT DEFAULT 0. The unique B-tree index on short_code makes lookups O(log n) — effectively constant time at this scale. For caching: on redirect, check Redis first using GET short_code. On a miss, query PostgreSQL and populate the cache with a TTL of 24 hours. Since reads vastly outnumber writes, Redis absorbs >99% of redirect traffic.
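The cache-aside read path just described can be sketched as follows. A plain HashMap stands in for Redis and the injected dbLookup function stands in for the PostgreSQL query, so the example is self-contained; the class and method names are illustrative, and a real implementation would also set a TTL on the cache entry (e.g. SETEX short_code 86400 long_url):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Cache-aside read path sketch: check the cache first, fall back to the
// database on a miss, then populate the cache for the next reader.
public class CacheAsideResolver {
    private final Map<String, String> cache = new HashMap<>(); // stand-in for Redis
    private final Function<String, String> dbLookup;           // stand-in for the SQL query
    private int dbHits = 0;                                    // instrumentation for the example

    public CacheAsideResolver(Function<String, String> dbLookup) {
        this.dbLookup = dbLookup;
    }

    public String resolve(String shortCode) {
        String cached = cache.get(shortCode);
        if (cached != null) return cached;          // cache hit: no DB round-trip
        dbHits++;
        String longUrl = dbLookup.apply(shortCode); // cache miss: query PostgreSQL
        if (longUrl != null) cache.put(shortCode, longUrl); // populate for next time
        return longUrl;
    }

    public int dbHitCount() { return dbHits; }
}
```

With a 100:1 read:write ratio, the second and all later resolves of a hot code never touch the database — which is exactly why Redis absorbs the bulk of redirect traffic.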
For global latency <10ms, deploy behind a CDN. Because browsers and CDN edge nodes cache 301 redirects aggressively, repeat visits to popular short URLs rarely hit your origin servers. The tradeoff: a cached 301 never reaches your servers, so per-click analytics are lost — use 302 for custom aliases that might change or when click tracking matters. Place your Redis cluster in multiple regions using active-active replication. The result: a system that handles 11,600 RPS with commodity hardware, a few Redis nodes, and a CDN.
Design #2: Twitter / X Timeline
The Twitter design tests your understanding of the fan-out problem — one of the most nuanced tradeoffs in distributed systems. When a user tweets, that tweet must appear in the timelines of all their followers. With 100 million followers (e.g., a celebrity account), delivering that tweet to every follower's timeline in near real time implies a write amplification factor of 100 million for a single tweet.
There are two approaches. Fan-out on write (push model): when a tweet is created, immediately write it to all followers' timeline caches in Redis. Timeline reads are O(1) Redis lookups. Problem: for celebrity accounts with 100M followers, one tweet triggers 100M Redis writes — a "thundering herd" that can overwhelm the cache tier. Fan-out on read (pull model): store tweets only once in a tweet table; on timeline read, query all followed accounts' recent tweets and merge-sort them. Reads are slow (fanout to N accounts), but writes are instant. Neither is correct in isolation.
The production approach (which Twitter/X has publicly described) is a hybrid: fan-out on write for ordinary accounts, fan-out on read for high-follower accounts above a threshold (somewhere between tens of thousands and a million followers, tuned empirically). Regular users' timeline caches are pre-populated on write. Celebrity tweets are injected into timelines at read time by merging the pre-populated cache with a query for followed celebrities' recent tweets.
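A minimal sketch of the hybrid fan-out logic, with in-memory maps standing in for the Redis timeline cache and the tweet store. The 10,000-follower threshold and all names here are illustrative assumptions, not Twitter's actual values:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hybrid fan-out sketch: push tweets to follower timelines for regular
// accounts; store-only for celebrity accounts, merged in at read time.
public class HybridFanout {
    static final int CELEBRITY_THRESHOLD = 10_000;

    final Map<Long, List<Long>> followers = new HashMap<>();       // userId -> follower ids
    final Map<Long, Deque<String>> timelines = new HashMap<>();    // followerId -> cached timeline
    final Map<Long, Deque<String>> tweetsByUser = new HashMap<>(); // userId -> own tweets (pull path)

    public void postTweet(long userId, String tweet) {
        tweetsByUser.computeIfAbsent(userId, k -> new ArrayDeque<>()).addFirst(tweet);
        List<Long> f = followers.getOrDefault(userId, List.of());
        if (f.size() < CELEBRITY_THRESHOLD) {
            // Push model: write into every follower's cached timeline.
            for (long followerId : f)
                timelines.computeIfAbsent(followerId, k -> new ArrayDeque<>()).addFirst(tweet);
        }
        // Celebrity tweets are NOT fanned out; readers pull them below.
    }

    public List<String> readTimeline(long userId, List<Long> followedCelebrities) {
        List<String> result = new ArrayList<>(timelines.getOrDefault(userId, new ArrayDeque<>()));
        for (long celeb : followedCelebrities)             // merge celebrity tweets at read time
            result.addAll(tweetsByUser.getOrDefault(celeb, new ArrayDeque<>()));
        return result;
    }
}
```

A production system would also sort the merged result by tweet timestamp; that step is omitted to keep the fan-out decision itself in focus.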
-- Tweet storage in Cassandra (optimized for time-series reads)
CREATE TABLE tweets (
    user_id UUID,
    tweet_id TIMEUUID,        -- encodes timestamp, enables time-range queries
    content TEXT,
    media_urls LIST<TEXT>,
    PRIMARY KEY (user_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);

-- Cassandra forbids mixing counter and non-counter columns in one table,
-- so like/retweet counts live in a dedicated counter table:
CREATE TABLE tweet_counters (
    tweet_id TIMEUUID PRIMARY KEY,
    like_count COUNTER,
    retweet_count COUNTER
);
-- Timeline cache in Redis (sorted set, score = tweet timestamp)
-- Key: timeline:{user_id}
-- Members: tweet_id, Score: unix_timestamp
-- ZADD timeline:123 1710000000 tweet_abc (add tweet to timeline)
-- ZREVRANGE timeline:123 0 49 (fetch 50 most recent tweets)
-- ZREMRANGEBYRANK timeline:123 0 -1001 (cap timeline at 1000 entries)
For search, index tweet content in Elasticsearch. The tweet service publishes to Kafka on write; an Elasticsearch consumer indexes asynchronously. Search latency target is 500ms, well within the capability of a properly tuned Elasticsearch cluster. For trending topics, maintain a Redis sorted set of hashtag counts updated on every tweet write, and take the top-K with ZREVRANGE trending:global 0 9.
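The trending-topics counter can be sketched with a plain map playing the role of the Redis sorted set (ZINCRBY on each tweet write, ZREVRANGE for the top-K read) — illustrative only, and without the sliding time window a real trending system would add:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Trending-hashtag sketch: count hashtag occurrences, then take the top K.
public class TrendingTopics {
    private final Map<String, Long> counts = new HashMap<>();

    public void recordHashtag(String tag) {  // ~ ZINCRBY trending:global 1 tag
        counts.merge(tag, 1L, Long::sum);
    }

    public List<String> topK(int k) {        // ~ ZREVRANGE trending:global 0 k-1
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```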
Design #3: YouTube Video Upload & Streaming
YouTube-scale design introduces a new dimension: large binary data. A 1080p video at 8 Mbps is 3.6 GB for a 1-hour upload. YouTube receives 500 hours of video per minute. This means the upload, processing, and storage pipeline must be designed for massive throughput of large objects — fundamentally different from the text-centric designs above.
The upload pipeline separates concerns cleanly. The client uploads directly to blob storage (S3/GCS) via a pre-signed URL — bypassing your application servers entirely and preventing them from becoming a bottleneck on large uploads. The application server generates the pre-signed URL (valid for 1 hour), returns it to the client, and stores the video metadata (title, description, uploader, raw storage key) in PostgreSQL. When the client finishes uploading, it calls POST /videos/{id}/processing-complete to trigger the transcoding pipeline.
// Video upload flow
// 1. Client requests upload URL
// POST /api/v1/videos/upload-url
// Response: { "uploadUrl": "https://storage.googleapis.com/bucket/raw/abc123?X-Goog-Signature=...", "videoId": "abc123" }
// 2. Client uploads directly to GCS (bypasses app servers)
// PUT https://storage.googleapis.com/bucket/raw/abc123?X-Goog-Signature=...
// Body: video bytes
// 3. Client notifies completion
// POST /api/v1/videos/abc123/processing-complete
// 4. App server publishes to Kafka
// Topic: video.uploaded
// Message: { "videoId": "abc123", "rawStoragePath": "gs://raw-videos/abc123.mp4", "uploaderId": "user_789" }
// 5. Transcoding workers consume from Kafka
// For each resolution (360p, 720p, 1080p, 4K):
// - Download raw video from GCS
// - Run FFmpeg transcode
// - Upload output to CDN origin bucket
// - Publish to video.transcoding-complete Kafka topic
// FFmpeg command for 720p transcode
// ffmpeg -i input.mp4 -vf scale=-2:720 -c:v libx264 -crf 23 -preset fast
// -c:a aac -b:a 128k -movflags +faststart output_720p.mp4
Streaming uses HLS (HTTP Live Streaming) or DASH, which segments video into 6-second chunks. The CDN caches these segments at edge nodes globally. A viewer in Tokyo streams chunks from a Tokyo edge node — latency is tens of milliseconds, not trans-Pacific. The manifest file (.m3u8) lists available quality levels; the player auto-selects based on measured bandwidth (adaptive bitrate streaming).
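The player-side quality selection can be sketched as a simple rule: pick the highest rendition that fits within a safety fraction of measured bandwidth. The 0.8 headroom factor and the bitrate ladder below are illustrative assumptions, not values mandated by HLS or DASH:

```java
import java.util.List;

// Adaptive bitrate selection sketch: choose the best rendition the
// viewer's measured bandwidth can sustain, with headroom for variance.
public class AbrSelector {
    // Bitrate ladder in kbps, ascending (e.g. 360p, 720p, 1080p, 4K).
    static final List<Integer> LADDER_KBPS = List.of(1_000, 3_000, 8_000, 20_000);

    public static int selectBitrateKbps(int measuredBandwidthKbps) {
        int budget = (int) (measuredBandwidthKbps * 0.8); // leave headroom for variance
        int chosen = LADDER_KBPS.get(0);                  // never go below the lowest rung
        for (int rung : LADDER_KBPS)
            if (rung <= budget) chosen = rung;
        return chosen;
    }
}
```

Real players re-run this decision before fetching each segment, which is what lets playback step down smoothly when bandwidth drops mid-stream.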
View counts present a subtle consistency problem. Incrementing a counter in PostgreSQL on every view at YouTube scale (1 billion views/day = ~11,600/second) would saturate the database. Instead, use Redis counters: INCR view_count:videoId atomically in Redis (which handles millions of ops/second), and persist to PostgreSQL asynchronously every 60 seconds via a background job. Accept that view counts are eventually consistent — they do not need to be real-time accurate.
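A sketch of that batched counter: increment in memory on the hot path, flush the accumulated deltas to the database on a timer. In production the in-memory tier is Redis INCR; a LongAdder map stands in here so the example is self-contained, and all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// View-count batching sketch: the hot path never touches the database.
public class ViewCounter {
    private final ConcurrentHashMap<String, LongAdder> pending = new ConcurrentHashMap<>();

    public void recordView(String videoId) {  // hot path: in-memory increment only
        pending.computeIfAbsent(videoId, k -> new LongAdder()).increment();
    }

    // Called every ~60s by a background job; returns deltas to persist, e.g.
    // UPDATE videos SET view_count = view_count + ? WHERE id = ?
    public Map<String, Long> flush() {
        Map<String, Long> deltas = new HashMap<>();
        for (Map.Entry<String, LongAdder> e : pending.entrySet()) {
            long n = e.getValue().sumThenReset(); // atomically read and zero the counter
            if (n > 0) deltas.put(e.getKey(), n);
        }
        return deltas;
    }
}
```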
Capacity Estimation Cheat Sheet
Interviewers evaluate your ability to perform back-of-envelope calculations quickly and accurately. Memorize the reference points below, then derive each estimate from first principles during the interview — showing the derivation, rather than reciting a memorized answer, is what demonstrates understanding.
| Metric | Value | Derivation |
|---|---|---|
| 1M DAU, 1 action/day | ~12 RPS | 1M / 86,400s ≈ 12 |
| 1M DAU, 10 actions/day | ~120 RPS | 10M / 86,400s ≈ 120 |
| 100M DAU, 10 actions/day | ~11,600 RPS | 1B / 86,400s ≈ 11,600 |
| 1 KB per tweet × 1M tweets/day | ~365 GB/year | 1M × 1KB × 365 = 365 GB |
| 500B per user profile × 100M users | ~50 GB | 100M × 500B = 50 GB |
| Redis throughput | ~100K–1M ops/sec | Single node, in-memory |
| PostgreSQL throughput | ~10K–50K QPS | With connection pooling + SSD |
| CDN cache-hit rate | ~95%+ | Pareto: 20% of content = 80% of traffic |
Database Selection Guide for System Design
One of the most common interview questions — implicit or explicit — is "why did you choose that database?" A vague answer like "NoSQL is faster" is a red flag. Demonstrate understanding of the specific tradeoffs each database makes and how those tradeoffs align with the system's access patterns and consistency requirements.
PostgreSQL / MySQL for entities with complex relationships, ACID transaction requirements, and moderate scale. Use for user accounts, payment records, inventory management, order tables. The relational model and foreign keys enforce data integrity that is critical for financial data. Scale vertically first; read replicas for read scaling; sharding only when necessary and painful.
Apache Cassandra for write-heavy time-series data with known partition keys. Use for activity feeds, IoT sensor data, analytics event streams, messaging history. Cassandra's LSM-tree storage sustains tens of thousands of writes per second per node on commodity hardware. Design your partition key around your primary read pattern — every query must include the partition key, or you pay for an expensive full-cluster scan.
Redis for caching, rate limiting, session storage, leaderboards, and pub/sub. Redis is a data structure server, not just a cache — sorted sets for rankings, streams for event queues, HyperLogLog for unique visitor counts. Never use Redis as your primary data store — it is in-memory, and without persistence configured (RDB snapshots or AOF), data loss on restart is real.
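As an example of the rate-limiting use case, here is a fixed-window limiter — the pattern Redis implements with INCR plus EXPIRE on a per-user, per-window key. A HashMap stands in for Redis so the sketch is self-contained; the limit and window size are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Fixed-window rate limiter sketch: allow at most `limit` requests
// per user per window; the counter resets when a new window begins
// (the role EXPIRE plays in the Redis version).
public class FixedWindowRateLimiter {
    private final int limit;
    private final long windowMillis;
    private final Map<String, int[]> counts = new HashMap<>();    // key -> count in current window
    private final Map<String, Long> windowStart = new HashMap<>();

    public FixedWindowRateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public boolean allow(String userId, long nowMillis) {
        long window = nowMillis / windowMillis;        // which window are we in?
        Long prev = windowStart.get(userId);
        if (prev == null || prev != window) {          // new window: reset the count
            windowStart.put(userId, window);
            counts.put(userId, new int[]{0});
        }
        int[] c = counts.get(userId);
        if (c[0] >= limit) return false;               // over the limit: reject
        c[0]++;
        return true;
    }
}
```

Fixed windows allow a burst of up to 2× the limit straddling a window boundary; mention sliding-window or token-bucket variants in the interview if the interviewer probes that edge case.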
Elasticsearch for full-text search, faceted filtering, and log aggregation. Use for product search, tweet search, application log search. Elasticsearch's inverted index enables sub-second search across billions of documents. Never use it as primary storage — it trades consistency guarantees for search performance.
| Use Case | Database | Key Reason |
|---|---|---|
| User profiles, payments | PostgreSQL | ACID, relations, integrity |
| Activity feeds, time-series | Cassandra | High write throughput, TTL |
| Session cache, rate limiting | Redis | Sub-ms latency, data structures |
| Product / tweet search | Elasticsearch | Inverted index, full-text |
| User-generated content / docs | MongoDB | Flexible schema, nested docs |
Common System Design Mistakes to Avoid
After reviewing hundreds of system design interviews, the same failure modes appear repeatedly. The most damaging is starting with the database before establishing requirements. Database selection is a downstream decision that follows from access patterns, scale, and consistency requirements. Announcing "I'll use Cassandra" in the first minute demonstrates a solution looking for a problem — the opposite of engineering judgment.
Ignoring failure modes is the second most common mistake. Every interviewer will eventually ask "what happens if your cache goes down?" or "what if the message queue is unavailable?" A production system is defined as much by how it degrades gracefully as by how it performs under normal conditions. Address single points of failure proactively: "the Redis tier uses cluster mode with 3 primary shards and 1 replica each; on node failure, replicas are promoted automatically within 30 seconds."
Missing explicit tradeoff discussions leaves points on the table. When you choose fan-out on write, say "I'm choosing fan-out on write because 99.9% of users have fewer than 10K followers, accepting higher write latency for faster read performance. For celebrity accounts, I'll switch to fan-out on read to avoid the thundering herd." This shows architectural maturity — you understand what you're trading and why. Interviewers at FAANG companies are specifically evaluating whether you can articulate tradeoffs, not just name patterns.
Not discussing monitoring and observability signals insufficient production experience. Mention the key metrics you would alert on: P99 redirect latency, cache hit rate, error rate by endpoint, database connection pool saturation. A system with no observability cannot be operated at production scale.
FAQs: System Design Interview
Q: How long should I spend on capacity estimation?
A: 3–5 minutes. Do not over-optimize your calculations — a 2× error in estimation does not change the architecture. The goal is to establish an order of magnitude: thousands of RPS vs. millions of RPS. One requires a single load-balanced cluster; the other requires sharding and multi-region deployment.
Q: Should I use microservices or a monolith?
A: In a 45-minute interview, default to a pragmatic service decomposition — 3 to 5 services maximum. Full microservices decomposition introduces distributed transaction complexity that consumes your entire implementation time. Focus on the core data flow, not service boundaries.
Q: How do I handle the interviewer redirecting me?
A: Welcome it — redirects indicate the interviewer wants to evaluate a specific area. Acknowledge the redirect ("great point, let me focus there"), address it fully, then summarize where you were. A candidate who refuses to be redirected is demonstrating inflexibility, not depth.
Q: Do I need to know every database's internals?
A: Know the storage model (B-tree vs LSM-tree), consistency guarantees (ACID vs eventual), and primary use case of PostgreSQL, Cassandra, Redis, Elasticsearch, and MongoDB. That covers 95% of system design questions.
Q: How important is the diagram?
A: Very. A clean component diagram with labeled data flows communicates architecture more efficiently than paragraphs of text. Practice drawing box-and-arrow diagrams: client → load balancer → API gateway → service → database, with cache layers and message queues. Label arrows with protocols (HTTP/2, gRPC, Kafka topic names).
Key Takeaways
- Use RADIO framework: Requirements → API → Data Model → Implementation → Optimization. Never skip Requirements to appear impressive.
- Capacity estimation drives architecture: 12 RPS (1M DAU) requires very different solutions than 12,000 RPS (100M DAU). Always derive scale before choosing technologies.
- Fan-out tradeoff is fundamental: Push (fan-out on write) optimizes reads at the cost of write amplification. Pull (fan-out on read) does the opposite. Hybrid approaches handle celebrity accounts at Twitter scale.
- Cache everything read-heavy: A 95% cache hit rate on a 100:1 read:write workload means your database serves 5% of traffic. Redis sorted sets are the right tool for timeline and leaderboard caches.
- Separate upload from processing: Pre-signed URLs bypass application servers for large binary uploads. Async transcoding workers via Kafka provide horizontal scale for video processing.
- Articulate tradeoffs explicitly: Interviewers score judgment, not just knowledge. Say what you are trading away when making every architectural decision.