Designing Twitter at 500M Users Scale: Feed, Fanout & Follow Graph (2026)
A complete system design deep-dive covering tweet storage, home timeline fanout, the follow graph, Snowflake IDs, Redis caching, Kafka-driven fanout workers, and trending topics — ready for senior system design interviews.
1. Scale Requirements & Estimation
Before designing any system, anchor your design decisions with concrete numbers. Twitter's scale in 2026:
- 500M registered users, 200M DAU, 500M tweets/day
- Peak write throughput: ~15,000 tweets/sec (traffic spikes at events)
- Peak read throughput: ~300,000 timeline reads/sec — reads dwarf writes 20:1
- Average tweet: ~300 bytes (text) — ~150GB/day raw, roughly 500GB/day with 3× Cassandra replication
- Media: ~1.5MB average per media tweet — adds ~5TB/day
- Follow graph: avg user follows 200 accounts; follower counts are heavily skewed (most users have few, celebrities have tens of millions)
- Latency SLA: <500ms for home timeline (p99)
| Component | Peak RPS | Storage | Latency SLA |
|---|---|---|---|
| Tweet writes | 15K/s | 500GB/day | <200ms |
| Timeline reads | 300K/s | Redis cache | <500ms |
| Fanout workers | 50K/s writes | Kafka queue | Eventual |
| Media CDN | 500K/s | 5TB/day | <100ms |
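The estimates above can be sanity-checked with a few lines of arithmetic (a sketch using the figures quoted in this section; class and method names are illustrative):

```java
// Back-of-envelope check of the storage and throughput estimates above.
public class ScaleEstimator {
    static final long TWEETS_PER_DAY = 500_000_000L;
    static final long BYTES_PER_TWEET = 300;

    // Raw tweet-text storage per day, in GB
    public static long rawTextGbPerDay() {
        return TWEETS_PER_DAY * BYTES_PER_TWEET / 1_000_000_000L; // 150 GB
    }

    // Average tweet-write throughput; peaks run roughly 2-3x higher
    public static long avgWriteRps() {
        return TWEETS_PER_DAY / 86_400;
    }

    public static void main(String[] args) {
        System.out.println(rawTextGbPerDay() + " GB/day raw, ~"
            + rawTextGbPerDay() * 3 + " GB/day at replication factor 3, avg "
            + avgWriteRps() + " writes/s");
    }
}
```

Note how the average write rate (~5.8K/s) is well below the 15K/s peak in the table: capacity planning must target the peak, not the mean.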
2. High-Level Architecture
The system is composed of several loosely-coupled services, each with a distinct responsibility:
- Tweet Service: Accepts tweet writes, validates, persists to Cassandra, emits to the tweet-created Kafka topic
- Fanout Service: Kafka consumer group that reads tweet-created events, fetches the follower list, and writes to Redis timelines
- Timeline Service: Serves the home timeline by reading a Redis sorted set; falls back to a Cassandra rebuild on cache miss
- User Service: Stores user profiles in MySQL (sharded); manages follow/unfollow operations
- Search Service: Elasticsearch cluster for full-text tweet search and trending topic computation
- Notification Service: Consumes events from Kafka, sends push notifications (FCM/APNs)
- Media Service: Handles image/video uploads; stores in S3; invalidates CDN (CloudFront)
All services sit behind an API Gateway (rate limiting, auth, routing). A Service Mesh (Istio) handles mTLS between services, circuit breaking, and distributed tracing.
3. Tweet Service & Snowflake ID
Every tweet needs a globally unique, time-sortable ID with no central coordination point. Twitter's open-source Snowflake generator solves this elegantly:
- 1 bit — unused sign bit (keeps IDs positive)
- 41 bits — timestamp (ms since a custom epoch, ~69 years of IDs)
- 10 bits — machine ID (5 bits datacenter + 5 bits worker node)
- 12 bits — sequence counter (4,096 IDs/ms/machine)
Cassandra stores tweets partitioned by author_id with tweet_id as the clustering key in descending order — enabling fast "get latest N tweets by user" queries without full scans.
-- Anti-pattern: one auto-increment table (MySQL) — hot partition problem at scale!
CREATE TABLE tweets (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    author_id BIGINT,
    content TEXT,
    created_at TIMESTAMP
);
-- All writes go to the last partition: sequential scan, locking, single-point bottleneck
-- Cassandra CQL: partition by author, cluster by time descending
CREATE TABLE tweets (
author_id UUID,
tweet_id BIGINT, -- Snowflake ID (time-sortable, globally unique)
content TEXT,
media_urls LIST<TEXT>,
created_at TIMESTAMP,
deleted BOOLEAN,
PRIMARY KEY (author_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC)
AND default_time_to_live = 0
AND compaction = {'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'DAYS',
'compaction_window_size': 7};
// Java Snowflake ID Generator
public class SnowflakeIdGenerator {
    private final long datacenterId; // 5 bits (0-31)
    private final long machineId;    // 5 bits (0-31)
    private final long epoch = 1288834974657L; // Twitter's custom epoch
    private long sequence = 0L;      // 12-bit counter within one millisecond
    private long lastTimestamp = -1L;

    public SnowflakeIdGenerator(long datacenterId, long machineId) {
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long timestamp = currentTime();
        if (timestamp < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards; refusing to generate ID");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF; // 12-bit mask
            if (sequence == 0) timestamp = waitNextMillis(lastTimestamp); // 4096 IDs this ms exhausted
        } else {
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        return ((timestamp - epoch) << 22) | (datacenterId << 17) | (machineId << 12) | sequence;
    }

    private long currentTime() { return System.currentTimeMillis(); }

    private long waitNextMillis(long last) {
        long ts = currentTime();
        while (ts <= last) ts = currentTime();
        return ts;
    }
}
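Going the other direction, the same bit layout lets any service recover the timestamp and origin of an ID without a lookup (an illustrative decoder mirroring the layout above, not part of the original generator):

```java
// Decompose a Snowflake ID back into its fields using the bit layout
// described above: 41-bit timestamp | 5-bit DC | 5-bit machine | 12-bit seq.
public class SnowflakeDecoder {
    static final long EPOCH = 1288834974657L; // must match the generator's epoch

    public static long timestampMs(long id)  { return (id >>> 22) + EPOCH; }
    public static long datacenterId(long id) { return (id >>> 17) & 0x1F; }
    public static long machineId(long id)    { return (id >>> 12) & 0x1F; }
    public static long sequence(long id)     { return id & 0xFFF; }
}
```

Because the timestamp occupies the high bits, numeric ordering of IDs equals chronological ordering — which is why Redis can later use the ID directly as a sorted-set score.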
4. Fanout Service: Push vs Pull vs Hybrid
The fanout problem is the central system design challenge at Twitter scale: when Lady Gaga tweets, you cannot write to 100 million follower timelines synchronously.
| Strategy | How It Works | When to Use | Pros | Cons |
|---|---|---|---|---|
| Fan-out on Write | Push tweet to all follower Redis timelines immediately | ≤10K followers | Fast reads (O(1)) | Expensive for celebrities |
| Fan-out on Read | Fetch followees' tweets at query time & merge | Celebrities (>10K followers) | Cheap writes | Slow reads, N+1 queries |
| Hybrid | Push for normal users; pull celebrity tweets at read time | Production Twitter | Balanced cost & speed | Complexity at read merge |
@Component
public class FanoutWorker {
@Autowired private FollowGraphService followGraphService;
@Autowired private RedisTemplate<String, Long> redisTemplate;
private static final int CELEBRITY_THRESHOLD = 10_000;
private static final int BATCH_SIZE = 1_000;
private static final int TIMELINE_MAX_SIZE = 800;
@KafkaListener(topics = "tweet-created", groupId = "fanout-group",
concurrency = "10") // 10 parallel consumers
public void handleTweet(TweetCreatedEvent event) {
long followerCount = followGraphService.getFollowerCount(event.getAuthorId());
if (followerCount > CELEBRITY_THRESHOLD) {
// Celebrities are NOT fanned out — pulled at read time
return;
}
// Fetch followers in batches to avoid memory explosion
int offset = 0;
while (true) {
List<Long> batch = followGraphService.getFollowers(
event.getAuthorId(), offset, BATCH_SIZE);
if (batch.isEmpty()) break;
// Pipeline Redis writes for throughput
redisTemplate.executePipelined((RedisCallback<Object>) conn -> { // lambda needs a concrete type target
for (Long followerId : batch) {
String key = "timeline:" + followerId;
conn.zAdd(key.getBytes(), event.getTweetId(),
String.valueOf(event.getTweetId()).getBytes());
conn.zRemRangeByRank(key.getBytes(), 0, -(TIMELINE_MAX_SIZE + 1));
}
return null;
});
offset += BATCH_SIZE;
}
}
}
5. Home Timeline: Redis Sorted Sets
The home timeline is stored in Redis as a sorted set where the score is the tweet's Snowflake ID. Since Snowflake IDs are time-ordered, sorting by score gives chronological order:
- Key: timeline:{user_id}
- Score: Snowflake tweet_id (time-sortable)
- Value: tweet_id (look up the full tweet from Cassandra or a tweet cache)
- Keep the last 800 tweets per timeline (trim on each fanout write)
- Cache miss: rebuild from Cassandra by querying each followee's recent tweets, merge & populate
@Service
public class TimelineService {
@Autowired private RedisTemplate<String, Long> redis;
@Autowired private TweetRepository tweetRepository;
@Autowired private FollowGraphService followGraph;
@Autowired private CelebrityService celebrityService;
public List<Tweet> getHomeTimeline(long userId, int limit, Long cursor) {
String key = "timeline:" + userId;
// 1. Read from Redis sorted set (newest first)
        Set<Long> cached = redis.opsForZSet()
            .reverseRangeByScore(key, 0, cursor != null ? cursor : Double.MAX_VALUE,
                0, limit);
        if (cached == null || cached.isEmpty()) {
            // 2. Cache miss: rebuild from Cassandra
            cached = rebuildFromCassandra(userId, limit);
        }
        // Copy into a mutable set: the Redis client may return an immutable view
        Set<Long> tweetIds = new HashSet<>(cached);
// 3. Merge celebrity tweets (fan-out on read for celebrities)
List<Long> celebrities = followGraph.getCelebrityFollowees(userId);
for (long celebId : celebrities) {
List<Long> celebTweets = tweetRepository.getRecentTweetIds(celebId, 20);
tweetIds.addAll(celebTweets);
}
// 4. Sort merged list, take top N
return tweetIds.stream()
.sorted(Comparator.reverseOrder())
.limit(limit)
.map(tweetRepository::findById)
.filter(Objects::nonNull)
.collect(Collectors.toList());
}
}
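The cache-miss rebuild reduces to a k-way merge of each followee's recent tweet IDs. A minimal, self-contained sketch of that merge step (the class and method names are illustrative, not Twitter's):

```java
import java.util.*;
import java.util.stream.*;

// Merge per-followee recent-tweet-ID lists into one newest-first timeline.
// Snowflake IDs sort chronologically, so plain descending order works.
public class TimelineMerger {
    public static List<Long> merge(Collection<List<Long>> perFollowee, int limit) {
        return perFollowee.stream()
            .flatMap(List::stream)
            .distinct()                         // a retweeted ID may appear twice
            .sorted(Comparator.reverseOrder())  // newest (largest ID) first
            .limit(limit)
            .collect(Collectors.toList());
    }
}
```

In production the merged result is also written back into the Redis sorted set so subsequent reads hit the cache.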
6. Follow Graph: Social Graph Database
The follow graph stores who follows whom. At Twitter scale (500M users averaging ~200 followees each — roughly 100 billion directed edges), the options are:
| Option | Pros | Cons | Used by |
|---|---|---|---|
| MySQL sharded | Familiar, ACID | Cross-shard joins hard | Twitter (historically) |
| Cassandra | High write throughput | No joins, eventual consistency | Large-scale follow graphs |
| Neo4j / Graph DB | Rich traversals (mutual follows, suggestions) | Hard to scale past 10B edges | LinkedIn recommendations |
Twitter chose MySQL sharded by user_id for the follow table (two tables: followers and following, both indexed). Cache follower counts and recent follower lists in Redis with a 10-minute TTL.
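One subtlety of the two-table layout: sharding the followers table by the *followed* user's ID keeps the fanout query ("get all followers of X") on a single shard. A small routing sketch (shard scheme and names are assumptions for illustration):

```java
// Hypothetical shard routing for the two sharded MySQL follow tables.
// followers(followed_id, follower_id) is sharded by followed_id so fanout
// reads hit one shard; following(follower_id, followed_id) by follower_id.
public class FollowShardRouter {
    private final int shardCount;

    public FollowShardRouter(int shardCount) { this.shardCount = shardCount; }

    // Shard holding the follower list of a given user (fanout path)
    public int shardForFollowersOf(long followedUserId) {
        return (int) Long.remainderUnsigned(followedUserId, shardCount);
    }

    // Shard holding the followee list of a given user (timeline-rebuild path)
    public int shardForFollowingOf(long followerUserId) {
        return (int) Long.remainderUnsigned(followerUserId, shardCount);
    }
}
```

A follow write touches both tables (two shards, no distributed transaction), which is one reason follower counts are cached and treated as eventually consistent.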
7. Search & Trending Topics
Full-text search: Tweets are indexed in Elasticsearch as they are written. Elasticsearch clusters use time-based index rotation (one index per day) with aliases for search and reindex operations.
Trending topics are a top-K approximate counting problem. Twitter's approach:
- Count-Min Sketch: O(1) frequency updates for arbitrary hashtags without storing exact per-key counters — approximate count with bounded error
- Min-Heap (top-K): Maintain a heap of the K most frequent topics; update as new counts arrive
- Kafka Streams: Process tweet stream with a 1-hour sliding window; output trending topics every 5 minutes to Redis
- Geo-awareness: Trending computed per geo-region (city, country, global) using separate Kafka Streams topologies
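The Count-Min Sketch at the heart of this pipeline fits in a few lines. A minimal sketch (depth/width parameters and the seeded hashing are illustrative choices, not Twitter's exact configuration):

```java
import java.util.Random;

// Minimal Count-Min Sketch: d hash rows of w counters each. Estimates never
// undercount; hash collisions can only inflate them, with error bounded by w.
public class CountMinSketch {
    private final int depth, width;
    private final long[][] table;
    private final int[] seeds;

    public CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        Random r = new Random(42); // fixed seed for reproducibility
        for (int i = 0; i < depth; i++) seeds[i] = r.nextInt();
    }

    private int bucket(int row, String item) {
        return Math.floorMod(item.hashCode() ^ seeds[row], width);
    }

    public void add(String item) {
        for (int i = 0; i < depth; i++) table[i][bucket(i, item)]++;
    }

    public long estimate(String item) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++)
            min = Math.min(min, table[i][bucket(i, item)]);
        return min; // the minimum across rows is the tightest estimate
    }
}
```

The top-K heap then only needs to consult `estimate()` for hashtags seen in the current window, so memory stays constant regardless of how many distinct hashtags flow through.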
8. Media Upload & CDN
- Pre-signed S3 URL pattern: Client requests upload URL from API → API returns pre-signed S3 URL → Client uploads directly to S3 (bypassing app servers)
- CDN caching: Media URLs are immutable (content-addressed by hash) → infinite TTL on CloudFront → 99% cache hit rate
- Video transcoding: ffmpeg workers consume from SQS queue; produce multiple resolutions (360p, 720p, 1080p); store in S3; update tweet record with CDN URLs
- Image optimization: WebP conversion, responsive sizes generated on upload via Lambda@Edge
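Content-addressing is what makes the infinite CDN TTL safe: the key is derived from the bytes, so a URL can never point at changed content. A sketch of such a key derivation (the `media/` prefix and key scheme are assumptions for illustration):

```java
import java.security.MessageDigest;
import java.util.HexFormat;

// Derive an immutable S3/CDN key from the media bytes' SHA-256 digest.
// Identical uploads dedupe automatically; the URL never needs invalidation.
public class MediaKey {
    public static String keyFor(byte[] media) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(media);
        return "media/" + HexFormat.of().formatHex(digest);
    }
}
```

Requires Java 17+ for `HexFormat`; on older JDKs a manual hex loop does the same job.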
9. Notification System
Notifications (likes, retweets, replies, follows) are event-driven via Kafka:
- Actions emit events to a notification-events Kafka topic
- Notification workers consume events and look up user preferences (push/email/in-app enabled?)
- For push: dispatch to FCM (Android) or APNs (iOS) via a dedicated Push Service
- For in-app: write to a notifications:{user_id} Redis list (capped at 100 items); serve via WebSocket on active connections
- Batching: if a user receives 50 likes in 10 seconds, batch into "50 people liked your tweet" (reduces push-notification fatigue)
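The batching rule above reduces to a tiny summarization step once the window's events are collected (class and message wording are illustrative, not Twitter's exact copy):

```java
import java.util.List;

// Collapse a burst of like-events on one tweet into a single summary
// notification instead of N individual pushes.
public class LikeBatcher {
    public static String summarize(List<String> likerNames) {
        if (likerNames.size() == 1) {
            return likerNames.get(0) + " liked your tweet";
        }
        return likerNames.size() + " people liked your tweet";
    }
}
```

The interesting engineering is upstream: buffering events per (user, tweet) pair for the batching window, typically with a keyed state store in Kafka Streams or a short-TTL Redis key.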
10. Rate Limiting & Abuse Prevention
Twitter's rate limiting uses a token bucket per user, implemented atomically in Redis using a Lua script. The bucket refills continuously at a fixed rate up to a burst capacity (note that a plain counter with an hourly expiry would be a fixed-window limiter, not a token bucket):
local key   = KEYS[1]
local rate  = tonumber(ARGV[1]) -- tokens refilled per hour (e.g., 100 for tweets)
local burst = tonumber(ARGV[2]) -- max burst capacity
local now   = tonumber(ARGV[3]) -- current timestamp (seconds)
local state  = redis.call("HMGET", key, "tokens", "ts")
local tokens = tonumber(state[1])
local ts     = tonumber(state[2])
if not tokens then
    tokens, ts = burst, now -- first request: start with a full bucket
end
-- Refill proportionally to elapsed time, capped at burst capacity
tokens = math.min(burst, tokens + (now - ts) * rate / 3600)
local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end
redis.call("HSET", key, "tokens", tokens, "ts", now)
redis.call("EXPIRE", key, 86400) -- garbage-collect idle buckets
return allowed -- 1 = allowed, 0 = rate limited
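The same bucket logic in plain Java makes the refill math easy to unit-test (a sketch for local, per-instance limits; the globally shared state lives in Redis):

```java
import java.util.function.LongSupplier;

// In-memory token bucket mirroring the Redis script: continuous refill at
// ratePerSec, capped at burst. The clock is injected for deterministic tests.
public class TokenBucket {
    private final double ratePerSec;
    private final double burst;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastNs;

    public TokenBucket(double ratePerSec, double burst, LongSupplier nanoClock) {
        this.ratePerSec = ratePerSec;
        this.burst = burst;
        this.nanoClock = nanoClock;
        this.tokens = burst;            // start full
        this.lastNs = nanoClock.getAsLong();
    }

    public synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        // Refill proportionally to elapsed time, capped at burst
        tokens = Math.min(burst, tokens + (now - lastNs) / 1e9 * ratePerSec);
        lastNs = now;
        if (tokens < 1.0) return false; // rate limited
        tokens -= 1.0;
        return true;                    // allowed
    }
}
```

A common layered design is a local bucket like this in front of the Redis script, shedding obvious abuse before it costs a network round trip.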
Abuse prevention: Shadowbanning — the tweet write succeeds (so the abuser sees no error) but their tweets are excluded from search results and other users' timelines. Implemented via a Redis Set of shadowbanned user IDs checked at read time.
11. Global Scale: Multi-Region Architecture
- Active-active multi-region: Users are geo-routed to their nearest region (US-East, US-West, EU, APAC) via Route53 latency-based routing
- Cassandra: Geo-distributed cluster with NetworkTopologyStrategy; use LOCAL_QUORUM for fast regional writes/reads; asynchronous cross-region replication
- Follow graph (MySQL): Master in home region, read replicas in all regions — follow/unfollow goes to the master, fanout reads from the local replica
- Redis: Regional Redis clusters (not cross-region replicated); timeline cache is rebuilt from Cassandra on miss — acceptable since reads are fast
- Consistency model: The timeline is eventually consistent — a new tweet may take several seconds to appear in all followers' timelines globally. This is an explicitly accepted trade-off, not a bug to engineer away.
12. Interview Questions & System Design Checklist
Q: What happens when a celebrity like Lady Gaga (100M+ followers) tweets?
A: Lady Gaga is classified as a "celebrity" (follower count > threshold). Her tweet is NOT fanned out to 100M Redis timelines — that would cost 100M writes per tweet. Instead, at read time, when a follower requests their home timeline, the system additionally queries Lady Gaga's recent tweets from Cassandra and merges them into the response. The hybrid approach adds at most a few milliseconds of latency at read time.
Q: What happens to a user's timeline when they unfollow someone?
A: The follow graph is updated immediately (MySQL write). Future fanout will no longer include the unfollowed user. However, existing tweets from that user already in the Redis timeline sorted set remain until they age out (the timeline holds only 800 recent items). There is no cleanup of existing timeline entries — an accepted eventual-consistency trade-off at Twitter's scale.
Q: How does Twitter search differ from general web search?
A: Twitter search is real-time (tweets must be searchable within seconds) while web search crawls periodically. Twitter uses Elasticsearch with near-real-time indexing (tweets indexed within <5s). It also heavily weighs recency over PageRank-style authority. Queries are short-text matching, not document similarity. The index is time-partitioned (daily) for efficient deletion and TTL of old indices.
- Anchor with back-of-envelope numbers first
- Justify Cassandra over MySQL for tweets
- Explain Snowflake ID structure
- Hybrid fanout: push for normal, pull for celebrity
- Redis sorted set for timeline cache
- Kafka for async fanout decoupling
- Count-Min Sketch for trending topics
- Pre-signed S3 for media uploads
- Active-active multi-region design
- Eventual consistency acknowledgment
- Rate limiting with token bucket + Redis
- Shadow ban for abuse prevention
Frequently Asked Questions
What is the hardest part of designing Twitter?
The home timeline — serving pre-computed, low-latency tweet feeds to 200M daily active users reading from hundreds of followees. The naive solution (query all followees' tweets at read time) breaks at scale; the production answer is the hybrid fanout model detailed in this post.
Why does Twitter use Cassandra instead of PostgreSQL?
A single-primary PostgreSQL deployment struggles to sustain 15,000 writes/second with 99.99% availability across 3+ data centers without extraordinary sharding complexity. Cassandra is designed for exactly this: tunable consistency, no single master, linear horizontal scalability, and write patterns that match time-series tweet storage.
How does Twitter's trending algorithm work?
Twitter uses approximate frequency counting (Count-Min Sketch) over a sliding time window via Kafka Streams. Topics that show abnormally high growth rate relative to their baseline are surfaced as trending — it's not just raw count but acceleration that matters. Results are geo-localized and personalized.
How would you scale the fanout service further?
Increase Kafka partition count (currently 10 consumer instances) for parallelism. Use Redis Cluster to distribute timeline keys across multiple shards. Pre-warm timelines for "active users" (those who have opened the app recently) — inactive users' timelines can be rebuilt lazily on next login.