Designing a Social Media News Feed at Scale: Fan-Out Strategies, Timeline Caching & Personalization
Building a news feed for hundreds of millions of users is one of the hardest system design problems: it demands sub-100ms read latency, support for celebrity accounts with 100M+ followers, personalized ranking, and real-time freshness. This guide covers the architecture patterns popularized by Twitter, Instagram, and Facebook.
TL;DR — Core Decision
"Use fan-out on write (push to Redis timelines) for regular users; use fan-out on read (pull from Cassandra at query time) for celebrities with >1M followers. Merge both at read time for the final feed. Rank with ML. Cache the ranked result per user with a short TTL."
Table of Contents
- Requirements & Scale
- Data Model & Social Graph
- Fan-Out on Write (Push Model)
- Fan-Out on Read (Pull Model)
- Hybrid Strategy — The Real-World Approach
- Storage Architecture & Data Tiers
- Feed Ranking & Personalization
- Media Storage & CDN Delivery
- Real-Time Updates & Push Notifications
- Scaling the Feed at Global Scale
- Design Checklist & Conclusion
1. Requirements & Scale
Functional Requirements
- Users can post text, images, and videos
- Users can follow/unfollow other users
- Home feed shows posts from followed accounts, ranked by relevance
- Feed must reflect new posts within seconds (near real-time freshness)
- Support pagination (infinite scroll) efficiently
- Users can like, comment, and share posts
Scale Estimates
| Metric | Value | Notes |
|---|---|---|
| DAU | 500M | Twitter-scale |
| Posts/day | 100M | ~1,150 posts/sec |
| Feed reads/day | 5B | ~58K reads/sec; 50:1 read:write ratio |
| Avg followers/user | 200 | Celebrities: 10M–100M |
Write amplification challenge: A user with 1M followers posting once = 1M Redis writes. A celebrity with 50M followers = 50M writes per post. This is why a single fan-out strategy doesn't work at scale.
2. Data Model & Social Graph
Core Tables
-- Posts (stored in Cassandra for write-heavy, time-series access)
CREATE TABLE posts_by_user (
user_id UUID,
created_at TIMESTAMP,
post_id UUID,
content TEXT,
media_urls LIST<TEXT>,
-- like_count omitted: Cassandra cannot mix counter and regular columns,
-- so like/comment counts live in a separate counter-only table
PRIMARY KEY (user_id, created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC);
-- Social graph (follows) — stored in a graph DB or Cassandra
CREATE TABLE user_follows (
follower_id UUID,
followee_id UUID,
created_at TIMESTAMP,
PRIMARY KEY (follower_id, followee_id)
);
-- Reverse index for fan-out
CREATE TABLE followers_of (
followee_id UUID,
follower_id UUID,
PRIMARY KEY (followee_id, follower_id)
);
Social Graph Storage
The social graph (who follows whom) is the backbone of the feed. At scale, store it in:
- Cassandra: For the raw follow relationship table — excellent for wide-row fan-out lookups
- Redis Sets: Cache the follower list for active users — SMEMBERS followers:{user_id}
- Neo4j / Dgraph: For graph traversal queries (mutual friends, second-degree connections, recommendations)
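The Redis-cached follower list above is a read-through cache in front of the durable store. A minimal sketch of the pattern, with an in-memory map standing in for Redis and a lookup function standing in for Cassandra (class and method names are illustrative, not from any specific codebase):

```java
import java.util.*;
import java.util.function.Function;

// Read-through follower cache. In production the cache layer would be Redis
// Sets (SMEMBERS / SADD) and the backing store the followers_of table.
class FollowerCache {
    private final Map<String, Set<String>> cache = new HashMap<>();   // stands in for Redis
    private final Function<String, Set<String>> backingStore;         // stands in for Cassandra

    FollowerCache(Function<String, Set<String>> backingStore) {
        this.backingStore = backingStore;
    }

    Set<String> getFollowers(String userId) {
        // Cache hit: serve directly (SMEMBERS followers:{user_id})
        Set<String> cached = cache.get(userId);
        if (cached != null) return cached;
        // Cache miss: read from the store and populate the cache (SADD + EXPIRE)
        Set<String> fromStore = backingStore.apply(userId);
        cache.put(userId, fromStore);
        return fromStore;
    }

    void onFollow(String followeeId, String followerId) {
        // Keep the cached set coherent on new follows instead of invalidating wholesale
        cache.computeIfAbsent(followeeId, k -> new HashSet<>()).add(followerId);
    }
}
```

Updating the cached set in place on follow/unfollow avoids a thundering herd of Cassandra reads when a popular account gains followers quickly.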
3. Fan-Out on Write (Push Model)
When a user posts, immediately push the post ID to each follower's timeline cache. The timeline is stored as a Redis Sorted Set, scored by publish timestamp.
// Fan-out worker (runs as Kafka consumer per partition)
void onPost(PostCreatedEvent event) {
String postId = event.postId;
long score = event.createdAtMs;
// Batch Redis pipeline for all followers
List<String> followers = socialGraph.getFollowers(event.userId);
try (Pipeline pipe = redis.pipelined()) {
for (String followerId : followers) {
String key = "timeline:" + followerId;
pipe.zadd(key, score, postId); // add to sorted set
pipe.zremrangeByRank(key, 0, -801); // keep only 800 latest posts
pipe.expire(key, 86400 * 7); // 7-day TTL
}
pipe.sync();
}
}
// Feed read — O(1) Redis fetch
List<String> getFeed(String userId, int page, int size) {
long offset = (long) page * size;
return redis.zrevrange("timeline:" + userId, offset, offset + size - 1);
}
Pros and Cons
✅ Advantages
- Feed reads are O(1) — just a Redis range query
- Read latency is consistently < 5ms
- No complex merge logic at read time
- Works perfectly for users with ≤ 100K followers
❌ Disadvantages
- Write amplification for celebrities (50M writes per post)
- Inactive users waste Redis memory (they never read the feed)
- Lag for ultra-popular accounts during fan-out
4. Fan-Out on Read (Pull Model)
Posts are stored once in Cassandra. At read time, fetch the most recent posts from each followee and merge them into a ranked feed. No write amplification during posting.
// Fan-out on read — merge N timelines at query time
List<Post> getFeedByMerge(String userId, int size) {
List<String> followees = socialGraph.getFollowees(userId); // users I follow
// Parallel fetch from Cassandra — async
List<CompletableFuture<List<Post>>> futures = followees.stream()
.map(fId -> CompletableFuture.supplyAsync(
() -> postRepo.getLatest(fId, size)))
.toList();
// Merge and sort by created_at
return futures.stream()
.map(CompletableFuture::join)
.flatMap(Collection::stream)
.sorted(Comparator.comparing(Post::getCreatedAt).reversed())
.limit(size)
.toList();
}
Problem: If you follow 500 accounts, this requires 500 Cassandra reads per feed request, pushing tail latency past 500ms. Completely unacceptable for interactive feeds — only works for users following few accounts, or with aggressive result caching.
5. Hybrid Strategy — The Real-World Approach
Twitter, Instagram, and Facebook all use a hybrid approach. The key insight: classify users by follower count and apply different strategies.
Hybrid Fan-Out Rules
- Regular user (< 10K followers): Fan-out on write → push post_id to all follower Redis timelines immediately
- Popular user (10K – 1M followers): Fan-out on write but async (Kafka worker) with priority queue to prevent blocking
- Celebrity (> 1M followers): Skip fan-out entirely. Store post in Cassandra only. Inject at read time.
- Inactive followers: Skip fan-out if the follower hasn't been active in 30 days — saves millions of writes
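The rules above reduce to a small routing function on follower count plus a per-follower activity check. A sketch using the thresholds from this section (enum and class names are illustrative):

```java
// Strategy selection for the hybrid fan-out rules: push for regular users,
// async push for popular users, pull-on-read for celebrities.
enum FanOutStrategy { PUSH_SYNC, PUSH_ASYNC, PULL_ON_READ }

class FanOutRouter {
    static final long POPULAR_THRESHOLD = 10_000;
    static final long CELEBRITY_THRESHOLD = 1_000_000;

    static FanOutStrategy forAuthor(long followerCount) {
        if (followerCount > CELEBRITY_THRESHOLD) return FanOutStrategy.PULL_ON_READ; // store once, inject at read time
        if (followerCount >= POPULAR_THRESHOLD) return FanOutStrategy.PUSH_ASYNC;    // Kafka worker, priority queue
        return FanOutStrategy.PUSH_SYNC;                                             // push to Redis timelines now
    }

    // Skip fan-out writes for followers idle longer than 30 days
    static boolean shouldDeliver(long followerLastActiveMs, long nowMs) {
        long thirtyDaysMs = 30L * 24 * 60 * 60 * 1000;
        return nowMs - followerLastActiveMs <= thirtyDaysMs;
    }
}
```

The fan-out worker would call forAuthor once per post and shouldDeliver once per follower while iterating the follower list.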
Read-time merge: When serving a feed, take the user's Redis timeline (regular followees) + fetch last N posts from celebrity followees in Cassandra + merge + re-rank.
// Hybrid feed assembly
List<Post> getHybridFeed(String userId, int size) {
// 1. Get cached timeline from Redis (regular followees already fan-outed)
List<String> postIdsFromCache = redis.zrevrange("timeline:" + userId, 0, size * 2);
// 2. Identify celebrity followees
List<String> celebFollowees = socialGraph.getCelebrityFollowees(userId);
// 3. Fetch latest posts from celebrities (parallel Cassandra reads)
List<Post> celebPosts = fetchCelebPosts(celebFollowees, size);
// 4. Hydrate post IDs from cache
List<Post> regularPosts = hydrate(postIdsFromCache);
// 5. Merge and rank
return rankingService.rank(merge(regularPosts, celebPosts), userId, size);
}
6. Storage Architecture & Data Tiers
Redis Timeline Cache
- Data structure: Sorted Set — ZADD timeline:{user_id} {timestamp} {post_id}
- Keep only the 800 most recent posts per user — trim with ZREMRANGEBYRANK
- TTL: 7 days — evict timelines of inactive users
- Memory estimate: 500M users × 800 posts × 16 bytes/entry ≈ 6.4TB Redis cluster
- Use Redis Cluster with replication factor 2 for HA; at ~6.4TB the keyspace needs dozens of shards, not a handful
Cassandra Post Store
- Partition key: user_id — all posts for a user on one partition (locality for fan-out)
- Clustering key: created_at DESC — natural time-series ordering without a secondary index
- Replication factor: 3 across 3 datacenters for global availability
- Compaction: TWCS (TimeWindowCompactionStrategy) for time-series write patterns
- TTL: 90 days for content in warm tier; archive older posts to S3 Parquet
7. Feed Ranking & Personalization
A purely chronological feed stopped working at scale: engagement dropped as users missed important content from close friends, buried under high-frequency posters. Ranked feeds reportedly lifted engagement 40–70% at major platforms.
Multi-Stage Ranking Pipeline
| Stage | Goal | Latency Budget |
|---|---|---|
| Candidate Generation | Retrieve 500–2000 candidate posts from Redis + Cassandra | < 20ms |
| Light Scoring | Apply fast heuristics: recency, relationship strength, content type | < 10ms |
| ML Re-ranking | Top-200 candidates scored by engagement prediction model | < 30ms |
| Diversity Filter | Limit same-author posts, inject ads/promoted content | < 5ms |
| Final Feed | Top-20 posts returned to client | — |
Ranking Features
- Recency signal: Exponential decay — posts older than 24h score 50% lower
- Relationship strength: DMs, comments, mutual follows → weighted higher
- Engagement velocity: Post receiving 1000 likes in 10 minutes scores much higher
- Content affinity: User's historical engagement with video vs. text vs. images
- Diversity: Maximum 3 posts from the same author in top-20; inject varied content types
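The recency rule above ("posts older than 24h score 50% lower") is an exponential decay with a 24-hour half-life. A light-scoring sketch combining it with two other features; the weights are illustrative assumptions, not production values:

```java
// Light-scoring heuristic for the fast pre-ranking stage. Only the 24h
// half-life comes from this section; the weight constants are placeholders.
class LightScorer {
    static final double HALF_LIFE_HOURS = 24.0;

    // 1.0 for a fresh post, 0.5 at 24h, 0.25 at 48h, ...
    static double recencyFactor(double ageHours) {
        return Math.pow(0.5, ageHours / HALF_LIFE_HOURS);
    }

    // Combine recency with relationship strength and engagement velocity;
    // log1p damps runaway viral posts so they don't drown out close friends.
    static double score(double ageHours, double relationshipStrength, double likesPerMinute) {
        double base = 1.0 * relationshipStrength + 0.1 * Math.log1p(likesPerMinute);
        return base * recencyFactor(ageHours);
    }
}
```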
8. Media Storage & CDN Delivery
Images and videos dominate storage and bandwidth costs. The media pipeline:
- Upload: Client uploads directly to S3 via presigned URL — bypasses your backend entirely
- Processing: S3 event → Lambda/worker → transcode video (HLS with multiple bitrates), generate image thumbnails (3 sizes)
- Storage: Original in S3 Standard; thumbnails in S3 Standard; videos older than 90 days in S3 Intelligent-Tiering
- CDN delivery: CloudFront/Akamai serves all media — cache at edge PoPs closest to users
- Lazy loading: Feed initially shows placeholder; images load as user scrolls into viewport
- Adaptive quality: Client sends connection speed → serve appropriate resolution (360p/720p/1080p)
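The adaptive-quality step can be a simple threshold lookup on the client's reported bandwidth. A sketch; the kbps cutoffs are illustrative assumptions, not values from any specific platform:

```java
// Map the client's measured connection speed to a video rendition.
class QualitySelector {
    static String pickResolution(int bandwidthKbps) {
        if (bandwidthKbps >= 5_000) return "1080p"; // fast wifi / 5G
        if (bandwidthKbps >= 1_500) return "720p";  // typical LTE
        return "360p";                              // constrained networks
    }
}
```

In practice HLS adaptive streaming handles mid-playback switching; this selection matters for the initial rendition and for static images.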
9. Real-Time Updates & Push Notifications
Users expect to see new posts appear in their feed without refreshing. Two strategies:
Polling vs. Push
- Long polling: Client holds open HTTP connection for 30s; server responds when new posts arrive or times out. Simple but doesn't scale to millions of concurrent connections.
- Server-Sent Events (SSE): Server pushes new post IDs to connected clients over a persistent HTTP stream. Better than WebSocket for unidirectional feed updates.
- Push notifications: For background users (app not open), send APNs/FCM push notification when a followed account posts. Respect notification preferences and rate-limit to prevent notification fatigue.
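An SSE stream carries plain-text frames: optional event: and data: lines terminated by a blank line. A sketch of framing a new-post notification (the event name new_post and the JSON payload shape are illustrative choices):

```java
// Minimal SSE frame builder for the feed-update stream. The wire format
// (event:/data: lines ending in a blank line) is standard Server-Sent Events.
class SseFrames {
    static String newPostEvent(String postId) {
        return "event: new_post\n" +
               "data: {\"post_id\":\"" + postId + "\"}\n" +
               "\n"; // blank line terminates the event
    }
}
```

The browser's built-in EventSource client handles reconnection and Last-Event-ID resume, which is a large part of why SSE beats raw WebSocket for one-way feed updates.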
New Post Indicator Pattern
Rather than live-streaming posts into the feed (jarring UX), show a "X new posts — tap to refresh" banner. This decouples real-time notification from feed refresh and gives users control over when to update.
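The banner count is one sorted-set query against the timeline cache (a ZCOUNT over scores newer than the reader's last-seen timestamp). A sketch with a plain list of publish timestamps standing in for the sorted set; names are illustrative:

```java
import java.util.List;

// Compute the "X new posts" banner from the reader's last-seen watermark.
class NewPostBanner {
    // Count posts published strictly after the last-seen timestamp
    // (equivalent to ZCOUNT timeline:{user_id} (lastSeen +inf).
    static int newPostCount(List<Long> publishTimesMs, long lastSeenMs) {
        int count = 0;
        for (long t : publishTimesMs) if (t > lastSeenMs) count++;
        return count;
    }

    static String bannerText(int count) {
        return count == 0 ? "" : count + " new posts — tap to refresh";
    }
}
```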
10. Scaling the Feed at Global Scale
Geographic Distribution
- Deploy Cassandra with multi-datacenter replication (US-East, EU-West, APAC)
- Redis Cluster per region — users' timelines served from their local region
- Fan-out workers replicate to all regions via Kafka MirrorMaker for global celebrities
- Social graph partitioned by user geography — reduce cross-region latency for followee lookups
Engagement Counter Design
Like/comment counts are read frequently but need not be perfectly accurate at every read. Use approximate counters:
- Increment a Redis counter on every like: INCR post_likes:{post_id}
- Async Kafka consumer writes to Cassandra (eventual consistency — acceptable for counters)
- For viral posts (> 10K likes/min), use Redis HyperLogLog for approximate unique-user count
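The increment-then-drain pattern above can be sketched with an in-memory delta map: likes bump a counter immediately (the INCR step), and a periodic flush hands the accumulated deltas to the async persistence path (the Kafka-consumer step). Class and method names are illustrative:

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Write-behind engagement counter: hot-path increments are in-memory and
// lock-free; a background flush drains deltas for the durable store.
class EngagementCounters {
    private final ConcurrentHashMap<String, LongAdder> deltas = new ConcurrentHashMap<>();

    void onLike(String postId) {
        deltas.computeIfAbsent(postId, k -> new LongAdder()).increment();
    }

    // Drain accumulated deltas; the caller persists them (e.g. batched
    // counter updates to Cassandra) and tolerates the staleness window.
    Map<String, Long> flush() {
        Map<String, Long> snapshot = new HashMap<>();
        for (var e : deltas.entrySet()) {
            long n = e.getValue().sumThenReset();
            if (n > 0) snapshot.put(e.getKey(), n);
        }
        return snapshot;
    }
}
```

The trade-off is explicit: a crash between flushes loses a few seconds of deltas, which is acceptable for like counts but not for anything billing-related.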
11. Design Checklist & Conclusion
News Feed System Design Checklist
- ☐ Fan-out strategy defined: write for regular users, read for celebrities, hybrid at threshold
- ☐ Redis Sorted Sets used for timeline cache with bounded size (800 posts) and TTL
- ☐ Cassandra schema uses (user_id, created_at DESC) primary key for time-ordered fan-out
- ☐ Inactive users skipped during fan-out (30-day activity check)
- ☐ Multi-stage ranking pipeline: candidate gen → light scoring → ML re-rank → diversity filter
- ☐ Media uploads via presigned S3 URLs (never through your API servers)
- ☐ CDN configured for media delivery with aggressive edge caching
- ☐ New post notification via "X new posts" banner rather than live-injecting into feed
- ☐ Read and write paths are independently scalable services
- ☐ Engagement counters use Redis + async Cassandra persistence
The news feed is one of the most read-heavy, latency-sensitive systems in existence. Its design requires accepting fundamental trade-offs: write amplification for fast reads, eventual consistency for engagement counters, and approximate ranking for acceptable latency. Start with a pure fan-out-on-write approach, measure your P99 latencies under celebrity posting scenarios, and adopt the hybrid strategy as your user base grows past 10M accounts.