Designing Twitter at 500M Users Scale: Feed, Fanout & Follow Graph (2026)

A complete system design deep-dive covering tweet storage, home timeline fanout, the follow graph, Snowflake IDs, Redis caching, Kafka-driven fanout workers, and trending topics — ready for senior system design interviews.

TL;DR: Twitter's core challenge is the home timeline — serving 500M users reading tweets from people they follow. The answer is a hybrid push/pull fanout model: push for users with <10k followers (fan-out on write), pull at read time for celebrities (fan-out on read).

1. Scale Requirements & Estimation

Before designing any system, anchor your design decisions with concrete numbers. Twitter's scale in 2026:

  • 500M registered users, 200M DAU, 500M tweets/day
  • Peak write throughput: ~15,000 tweets/sec (traffic spikes at events)
  • Peak read throughput: ~300,000 timeline reads/sec — reads dwarf writes 20:1
  • Average tweet: ~300 bytes of text; with metadata and indexes, roughly 1KB per tweet — ~500GB/day of tweet storage
  • Media: ~1.5MB average per media tweet; with a few million media tweets/day, that adds ~5TB/day
  • Follow graph: avg user follows 200 people, avg 500 followers
  • Latency SLA: <500ms for home timeline (p99)

| Component | Peak RPS | Storage | Latency SLA |
|---|---|---|---|
| Tweet writes | 15K/s | 500GB/day | <200ms |
| Timeline reads | 300K/s | Redis cache | <500ms |
| Fanout workers | 50K/s writes | Kafka queue | Eventual |
| Media CDN | 500K/s | 5TB/day | <100ms |
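The arithmetic behind these numbers is worth being able to reproduce on a whiteboard. A quick sanity check (the ~1KB-per-tweet figure and the ~3x peak factor are assumptions stated above):

```java
public class ScaleEstimate {
    public static void main(String[] args) {
        long tweetsPerDay = 500_000_000L;
        long avgTweetBytes = 1_000;                      // ~300B text + metadata/indexes

        long storageGbPerDay = tweetsPerDay * avgTweetBytes / 1_000_000_000L;
        System.out.println("Tweet storage/day: " + storageGbPerDay + " GB");      // 500 GB

        long avgWriteRps = tweetsPerDay / 86_400;        // ~5.8K/s average; ~15K/s at ~3x peak
        System.out.println("Average write RPS: " + avgWriteRps);

        long peakReadRps = 300_000, peakWriteRps = 15_000;
        System.out.println("Read:write ratio: " + peakReadRps / peakWriteRps + ":1");  // 20:1
    }
}
```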

2. High-Level Architecture

The system is composed of several loosely-coupled services, each with a distinct responsibility:

  • Tweet Service: Accept tweet writes, validate, persist to Cassandra, emit to Kafka topic tweet-created
  • Fanout Service: Kafka consumer group that reads tweet-created events, fetches follower list, writes to Redis timelines
  • Timeline Service: Serves home timeline by reading Redis sorted set; falls back to Cassandra rebuild on cache miss
  • User Service: Stores user profiles in MySQL (sharded); manages follow/unfollow operations
  • Search Service: Elasticsearch cluster for full-text tweet search and trending topic computation
  • Notification Service: Consumes events from Kafka, sends push notifications (FCM/APNs)
  • Media Service: Handles image/video uploads; stores in S3; invalidates CDN (CloudFront)

All services sit behind an API Gateway (rate limiting, auth, routing). A Service Mesh (Istio) handles mTLS between services, circuit breaking, and distributed tracing.

3. Tweet Service & Snowflake ID

Every tweet needs a globally unique, time-sortable ID with no central coordination point. Twitter's open-source Snowflake generator solves this elegantly:

Snowflake ID (64-bit):
  • 41 bits — timestamp (ms since custom epoch, ~69 years of IDs)
  • 10 bits — machine ID (5 datacenter + 5 worker node)
  • 12 bits — sequence counter (4096 IDs/ms/machine)

Cassandra stores tweets partitioned by author_id with tweet_id as the clustering key in descending order — enabling fast "get latest N tweets by user" queries without full scans.

-- ❌ BAD: MySQL with auto-increment IDs — hot spot at scale
CREATE TABLE tweets (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  author_id BIGINT,
  content TEXT,
  created_at TIMESTAMP
);
-- Every insert contends on the tail of the clustered index, and the
-- auto-increment counter is a single point of coordination per shard
-- ✅ GOOD: Cassandra + Snowflake ID
-- Cassandra CQL: partition by author, cluster by time descending
CREATE TABLE tweets (
  author_id   UUID,
  tweet_id    BIGINT,   -- Snowflake ID (time-sortable, globally unique)
  content     TEXT,
  media_urls  LIST<TEXT>,
  created_at  TIMESTAMP,
  deleted     BOOLEAN,
  PRIMARY KEY (author_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC)
  AND default_time_to_live = 0
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                   'compaction_window_unit': 'DAYS',
                   'compaction_window_size': 7};

// Java Snowflake ID Generator
public class SnowflakeIdGenerator {
    private static final long EPOCH = 1288834974657L; // Twitter's custom epoch
    private final long datacenterId; // 5 bits (0-31)
    private final long machineId;    // 5 bits (0-31)
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    public SnowflakeIdGenerator(long datacenterId, long machineId) {
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long timestamp = currentTime();
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF; // 12-bit mask
            if (sequence == 0) timestamp = waitNextMillis(lastTimestamp); // sequence exhausted this ms
        } else {
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        return ((timestamp - EPOCH) << 22) | (datacenterId << 17) | (machineId << 12) | sequence;
    }

    private long currentTime() { return System.currentTimeMillis(); }

    // Spin until the clock advances past the last used millisecond
    private long waitNextMillis(long last) {
        long ts = currentTime();
        while (ts <= last) ts = currentTime();
        return ts;
    }
}
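Because the timestamp occupies the high bits, a tweet's creation time can be recovered from its ID alone, with no database lookup. A quick decoding sketch (bit offsets match the layout above; the epoch constant mirrors the generator):

```java
public class SnowflakeDecoder {
    private static final long EPOCH = 1288834974657L; // same custom epoch as the generator

    // Millisecond timestamp embedded in the top 41 bits
    public static long timestampMillis(long id) { return (id >> 22) + EPOCH; }

    // Datacenter (bits 17-21), worker (bits 12-16), sequence (bits 0-11)
    public static long datacenterId(long id) { return (id >> 17) & 0x1F; }
    public static long machineId(long id)    { return (id >> 12) & 0x1F; }
    public static long sequence(long id)     { return id & 0xFFF; }
}
```

This is also why sorting tweet IDs numerically yields chronological order, which the timeline cache relies on.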

4. Fanout Service: Push vs Pull vs Hybrid

The fanout problem is the central system design challenge at Twitter scale: when Lady Gaga tweets, you cannot write to 100 million follower timelines synchronously.

| Strategy | How It Works | When to Use | Pros | Cons |
|---|---|---|---|---|
| Fan-out on write | Push tweet to all follower Redis timelines immediately | ≤10K followers | Fast reads (O(1)) | Expensive for celebrities |
| Fan-out on read | Fetch followees' tweets at query time & merge | Celebrities (>10K followers) | Cheap writes | Slow reads, N+1 queries |
| Hybrid | Push for normal users; pull celebrity tweets at read time | Production Twitter | Balanced cost & speed | Complexity at read merge |

// ✅ GOOD: Kafka Fanout Consumer with celebrity check & batch processing
@Component
public class FanoutWorker {
    @Autowired private FollowGraphService followGraphService;
    @Autowired private RedisTemplate<String, Long> redisTemplate;
    private static final int CELEBRITY_THRESHOLD = 10_000;
    private static final int BATCH_SIZE = 1_000;
    private static final int TIMELINE_MAX_SIZE = 800;

    @KafkaListener(topics = "tweet-created", groupId = "fanout-group",
                   concurrency = "10")  // 10 parallel consumers
    public void handleTweet(TweetCreatedEvent event) {
        long followerCount = followGraphService.getFollowerCount(event.getAuthorId());
        if (followerCount > CELEBRITY_THRESHOLD) {
            // Celebrities are NOT fanned out — pulled at read time
            return;
        }
        // Fetch followers in batches to avoid memory explosion
        int offset = 0;
        while (true) {
            List<Long> batch = followGraphService.getFollowers(
                event.getAuthorId(), offset, BATCH_SIZE);
            if (batch.isEmpty()) break;

            // Pipeline Redis writes for throughput
            redisTemplate.executePipelined((RedisCallback<Object>) conn -> {
                for (Long followerId : batch) {
                    String key = "timeline:" + followerId;
                    conn.zAdd(key.getBytes(), event.getTweetId(),
                              String.valueOf(event.getTweetId()).getBytes());
                    conn.zRemRangeByRank(key.getBytes(), 0, -(TIMELINE_MAX_SIZE + 1));
                }
                return null;
            });
            offset += BATCH_SIZE;
        }
    }
}

5. Home Timeline: Redis Sorted Sets

The home timeline is stored in Redis as a sorted set where the score is the tweet's Snowflake ID. Since Snowflake IDs are time-ordered, sorting by score gives chronological order:

  • Key: timeline:{user_id}
  • Score: Snowflake tweet_id (time-sortable)
  • Value: tweet_id (look up full tweet from Cassandra or Tweet cache)
  • Keep last 800 tweets per timeline (trim on each fanout write)
  • Cache miss: rebuild from Cassandra by querying each followee's recent tweets, merge & populate
// ✅ GOOD: Timeline service — Redis first, Cassandra fallback + celebrity merge
@Service
public class TimelineService {
    @Autowired private RedisTemplate<String, Long> redis;
    @Autowired private TweetRepository tweetRepository;
    @Autowired private FollowGraphService followGraph;
    @Autowired private CelebrityService celebrityService;

    public List<Tweet> getHomeTimeline(long userId, int limit, Long cursor) {
        String key = "timeline:" + userId;
        // 1. Read from Redis sorted set (newest first)
        Set<Long> cached = redis.opsForZSet()
            .reverseRangeByScore(key, 0, cursor != null ? cursor : Double.MAX_VALUE,
                                 0, limit);

        // Copy into a mutable set before merging in celebrity tweets
        Set<Long> tweetIds = new HashSet<>(cached != null ? cached : Set.of());
        if (tweetIds.isEmpty()) {
            // 2. Cache miss: rebuild from Cassandra
            tweetIds.addAll(rebuildFromCassandra(userId, limit));
        }

        // 3. Merge celebrity tweets (fan-out on read for celebrities)
        for (long celebId : followGraph.getCelebrityFollowees(userId)) {
            tweetIds.addAll(tweetRepository.getRecentTweetIds(celebId, 20));
        }

        // 4. Sort merged IDs newest-first (Snowflake IDs are time-ordered), take top N
        return tweetIds.stream()
            .sorted(Comparator.reverseOrder())
            .limit(limit)
            .map(tweetRepository::findById)
            .filter(Objects::nonNull)
            .collect(Collectors.toList());
    }
}
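The cache-miss rebuild is essentially a k-way merge: fetch each followee's recent tweets from Cassandra, then merge them newest-first by Snowflake ID. The merge step can be sketched over plain lists with a heap (the Cassandra fetch itself is elided; list-based inputs are an illustrative simplification):

```java
import java.util.*;

public class TimelineMerge {
    // Merges per-followee tweet ID lists (each sorted newest-first)
    // into a single newest-first list of at most `limit` IDs.
    public static List<Long> merge(List<List<Long>> perFollowee, int limit) {
        // Max-heap of {tweetId, sourceListIndex, positionInList};
        // largest Snowflake ID = newest tweet
        PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> Long.compare(b[0], a[0]));
        for (int i = 0; i < perFollowee.size(); i++) {
            if (!perFollowee.get(i).isEmpty()) {
                heap.add(new long[]{perFollowee.get(i).get(0), i, 0});
            }
        }
        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            long[] top = heap.poll();
            out.add(top[0]);
            int src = (int) top[1], next = (int) top[2] + 1;
            if (next < perFollowee.get(src).size()) {
                heap.add(new long[]{perFollowee.get(src).get(next), src, next});
            }
        }
        return out;
    }
}
```

With ~200 followees and 20 recent tweets each, this merge touches at most a few thousand IDs, which is why the rebuild path stays fast enough to serve inline.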

6. Follow Graph: Social Graph Database

The follow graph stores who follows whom. At Twitter scale (500M users × ~200 followees ≈ 100B edges), the options are:

| Option | Pros | Cons | Used by |
|---|---|---|---|
| MySQL sharded | Familiar, ACID | Cross-shard joins hard | Twitter (historically) |
| Cassandra | High write throughput | No joins, eventual consistency | Large-scale follow graphs |
| Neo4j / Graph DB | Rich traversals (mutual follows, suggestions) | Hard to scale past 10B edges | LinkedIn recommendations |

Twitter chose MySQL sharded by user_id for the follow table (two tables: followers and following, both indexed). Cache follower counts and recent follower lists in Redis with a 10-minute TTL.
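The 10-minute TTL pattern is simple to sketch. Here it is as an in-process cache (in production this would be a Redis GET/SETEX; the loader function is a hypothetical stand-in for the sharded MySQL count query):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongUnaryOperator;

public class FollowerCountCache {
    private record Entry(long count, long expiresAtMillis) {}

    private final Map<Long, Entry> cache = new ConcurrentHashMap<>();
    private final LongUnaryOperator loader;   // userId -> follower count (e.g. MySQL query)
    private final long ttlMillis;

    public FollowerCountCache(LongUnaryOperator loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    public long get(long userId) {
        long now = System.currentTimeMillis();
        Entry e = cache.get(userId);
        if (e == null || e.expiresAtMillis < now) {
            // Miss or expired: reload from the source of truth and re-cache
            e = new Entry(loader.applyAsLong(userId), now + ttlMillis);
            cache.put(userId, e);
        }
        return e.count;
    }
}
```

A 10-minute staleness window is acceptable here because follower counts only gate the celebrity/non-celebrity fanout decision, not anything user-visible.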

8. Media Upload & CDN

  • Pre-signed S3 URL pattern: Client requests upload URL from API → API returns pre-signed S3 URL → Client uploads directly to S3 (bypassing app servers)
  • CDN caching: Media URLs are immutable (content-addressed by hash) → infinite TTL on CloudFront → 99% cache hit rate
  • Video transcoding: ffmpeg workers consume from SQS queue; produce multiple resolutions (360p, 720p, 1080p); store in S3; update tweet record with CDN URLs
  • Image optimization: WebP conversion and responsive sizes generated on upload by an S3-triggered Lambda

9. Notification System

Notifications (likes, retweets, replies, follows) are event-driven via Kafka:

  1. Actions emit events to a notification-events Kafka topic
  2. Notification workers consume events, look up user preferences (push/email/in-app enabled?)
  3. For push: dispatch to FCM (Android) or APNs (iOS) via a dedicated Push Service
  4. For in-app: write to a notifications:{user_id} Redis list (capped at 100 items); serve via WebSocket on active connections
  5. Batching: If user receives 50 likes in 10 seconds, batch into "50 people liked your tweet" (reduces push notification fatigue)
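Step 5's batching can be sketched as a small time-window aggregator (shown in-memory here; production would more likely window inside the Kafka consumer — the method names and injected clock are illustrative assumptions):

```java
import java.util.*;

public class LikeBatcher {
    private final long windowMillis;
    // tweetId -> {windowStartMillis, likes accumulated in current window}
    private final Map<Long, long[]> windows = new HashMap<>();

    public LikeBatcher(long windowMillis) { this.windowMillis = windowMillis; }

    // Returns a batched message once the window has elapsed, else empty
    public Optional<String> onLike(long tweetId, long nowMillis) {
        long[] w = windows.computeIfAbsent(tweetId, k -> new long[]{nowMillis, 0});
        if (nowMillis - w[0] < windowMillis) {
            w[1]++;                                     // accumulate within the window
            return Optional.empty();
        }
        long count = w[1];
        windows.put(tweetId, new long[]{nowMillis, 1}); // start the next window
        return Optional.of(count + " people liked your tweet");
    }
}
```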

10. Rate Limiting & Abuse Prevention

Twitter's rate limiting uses token bucket per user, implemented atomically in Redis using Lua scripts:

-- Redis Lua: atomic token bucket rate limiter
local key   = KEYS[1]
local rate  = tonumber(ARGV[1])   -- tokens refilled per hour (e.g., 100 for tweets)
local burst = tonumber(ARGV[2])   -- max burst capacity
local now   = tonumber(ARGV[3])   -- current timestamp (seconds)

local bucket = redis.call("HMGET", key, "tokens", "ts")
local tokens = tonumber(bucket[1])
local ts     = tonumber(bucket[2])
if not tokens then tokens, ts = burst, now end

-- Refill proportionally to elapsed time, capped at burst capacity
tokens = math.min(burst, tokens + (now - ts) * rate / 3600)

if tokens < 1 then
  redis.call("HMSET", key, "tokens", tokens, "ts", now)
  return 0  -- rate limited
end

redis.call("HMSET", key, "tokens", tokens - 1, "ts", now)
redis.call("EXPIRE", key, 7200)   -- GC idle buckets
return 1  -- allowed
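A token bucket with time-based refill can also be unit-tested in plain Java without Redis. A minimal sketch (the hourly rate unit and the injected clock are assumptions for testability):

```java
public class TokenBucket {
    private final double ratePerHour;  // refill rate
    private final double burst;        // bucket capacity
    private double tokens;
    private long lastRefillSeconds;

    public TokenBucket(double ratePerHour, double burst, long nowSeconds) {
        this.ratePerHour = ratePerHour;
        this.burst = burst;
        this.tokens = burst;           // start full: allows an initial burst
        this.lastRefillSeconds = nowSeconds;
    }

    public synchronized boolean tryAcquire(long nowSeconds) {
        // Refill proportionally to elapsed time, capped at capacity
        tokens = Math.min(burst, tokens + (nowSeconds - lastRefillSeconds) * ratePerHour / 3600.0);
        lastRefillSeconds = nowSeconds;
        if (tokens < 1) return false;  // rate limited
        tokens -= 1;
        return true;
    }
}
```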

Abuse prevention: Shadowbanning — the tweet write succeeds (so the abuser sees no error) but their tweets are excluded from search results and other users' timelines. Implemented via a Redis Set of shadowbanned user IDs checked at read time.

11. Global Scale: Multi-Region Architecture

  • Active-active multi-region: Users are geo-routed to their nearest region (US-East, US-West, EU, APAC) via Route53 latency-based routing
  • Cassandra: Geo-distributed cluster with NetworkTopologyStrategy; use LOCAL_QUORUM for fast regional writes/reads; asynchronous cross-region replication
  • Follow graph (MySQL): Master in home region, read replicas in all regions — follow/unfollow goes to master, fanout reads from local replica
  • Redis: Regional Redis clusters (not cross-region replicated); timeline cache is rebuilt from Cassandra on miss — acceptable since reads are fast
  • Consistency model: The timeline is eventually consistent — a new tweet may take up to 10 seconds to appear in all followers' timelines globally, an explicitly accepted trade-off.

12. Interview Questions & System Design Checklist

Q: How do you handle Lady Gaga tweeting to 100M followers?

A: Lady Gaga is classified as a "celebrity" (follower count > threshold). Her tweet is NOT fanned out to 100M Redis timelines — that would cost 100M writes per tweet. Instead, at read time, when a follower requests their home timeline, the system additionally queries Lady Gaga's recent tweets from Cassandra and merges them into the response. The hybrid approach adds at most a few milliseconds of latency at read time.

Q: What happens to a user's timeline when they unfollow someone?

A: The follow graph is updated immediately (MySQL write). Future fanout will no longer include the unfollowed user. However, existing tweets from that user already in the Redis timeline sorted set remain until they age out (timeline holds only 800 recent items). There is no cleanup of existing timeline entries — this is an accepted eventual consistency trade-off at Twitter's scale.

Q: How does Twitter search differ from web search?

A: Twitter search is real-time (tweets must be searchable within seconds) while web search crawls periodically. Twitter uses Elasticsearch with near-real-time indexing (tweets indexed within <5s). It also heavily weighs recency over PageRank-style authority. Queries are short-text matching, not document similarity. The index is time-partitioned (daily) for efficient deletion and TTL of old indices.

✅ Twitter System Design Interview Checklist
  • Anchor with back-of-envelope numbers first
  • Justify Cassandra over MySQL for tweets
  • Explain Snowflake ID structure
  • Hybrid fanout: push for normal, pull for celebrity
  • Redis sorted set for timeline cache
  • Kafka for async fanout decoupling
  • Count-Min Sketch for trending topics
  • Pre-signed S3 for media uploads
  • Active-active multi-region design
  • Eventual consistency acknowledgment
  • Rate limiting with token bucket + Redis
  • Shadow ban for abuse prevention

Frequently Asked Questions

What is the hardest part of designing Twitter?

The home timeline — serving pre-computed, low-latency tweet feeds to 200M daily active users reading from hundreds of followees. The naive solution (query all followees' tweets at read time) breaks at scale; the production answer is the hybrid fanout model detailed in this post.

Why does Twitter use Cassandra instead of PostgreSQL?

PostgreSQL cannot handle 15,000 writes/second with 99.99% availability across 3+ data centers without extraordinary sharding complexity. Cassandra is designed for this: tunable consistency, no single master, linear horizontal scalability, and excellent time-series write patterns that match tweet storage needs.

How does Twitter's trending algorithm work?

Twitter uses approximate frequency counting (Count-Min Sketch) over a sliding time window via Kafka Streams. Topics that show abnormally high growth rate relative to their baseline are surfaced as trending — it's not just raw count but acceleration that matters. Results are geo-localized and personalized.
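A minimal Count-Min Sketch, the approximate counter behind this (a generic sketch, not Twitter's actual implementation — the width, depth, and hashing scheme here are illustrative choices):

```java
import java.util.Random;

public class CountMinSketch {
    private final int width, depth;
    private final long[][] table;
    private final int[] seeds;   // one hash seed per row

    public CountMinSketch(int width, int depth) {
        this.width = width;
        this.depth = depth;
        this.table = new long[depth][width];
        this.seeds = new Random(42).ints(depth).toArray();
    }

    // Seeded hash -> bucket index in [0, width)
    private int bucket(String item, int row) {
        int h = item.hashCode() * 31 + seeds[row];
        return Math.floorMod(h, width);
    }

    public void add(String item) {
        for (int r = 0; r < depth; r++) table[r][bucket(item, r)]++;
    }

    // Estimate = min over rows; never undercounts, may overcount on hash collisions
    public long estimate(String item) {
        long min = Long.MAX_VALUE;
        for (int r = 0; r < depth; r++) min = Math.min(min, table[r][bucket(item, r)]);
        return min;
    }
}
```

Memory is fixed (width × depth counters) no matter how many distinct hashtags flow through, which is the whole point at tweet-firehose volumes; the trending job then compares current-window estimates against a historical baseline to detect acceleration.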

How would you scale the fanout service further?

Increase Kafka partition count (currently 10 consumer instances) for parallelism. Use Redis Cluster to distribute timeline keys across multiple shards. Pre-warm timelines for "active users" (those who have opened the app recently) — inactive users' timelines can be rebuilt lazily on next login.

Tags:
twitter system design, tweet fanout, home timeline design, snowflake id, cassandra, twitter system design interview 2026, social media at scale


Last updated: April 11, 2026