WhatsApp System Design: Messaging at 2B+ Users Scale (2026)
How do you design a system that delivers 100 billion messages per day to 2 billion users with sub-second latency, end-to-end encryption, and four-nines availability? This deep dive covers every layer of the WhatsApp architecture — from WebSocket connection management and the Signal encryption protocol to Cassandra schema design and group-messaging fan-out strategies — with annotated Java code you can bring to your next system design interview.
TL;DR — The Architecture in Five Lines
- Persistent WebSocket connections per device on stateful chat servers — one connection = one open TCP socket per client.
- Signal Protocol (double-ratchet + X3DH) for end-to-end encryption; keys never leave the device.
- Cassandra time-series schema for per-user message queues; Redis for presence & offline-queue metadata.
- Fan-out on write for small groups (<256 members); fan-out on read for broadcast channels with thousands of members.
- CDN + S3-compatible object store for media; deduplication via SHA-256 content hash before upload.
Table of Contents
- WhatsApp at Scale — What We're Building
- Functional & Non-Functional Requirements
- High-Level Architecture
- Real-Time Messaging: WebSocket vs HTTP Long-Polling
- Message Delivery & Offline Queue
- End-to-End Encryption (Signal Protocol)
- Group Messaging Fan-Out
- Media Storage & CDN
- Database Design
- Scalability & Fault Tolerance
- Common Interview Mistakes & What Interviewers Look For
- Conclusion & Checklist
1. WhatsApp at Scale — What We're Building
WhatsApp is the world's most used messaging platform. As of 2026 its key public metrics are staggering: 2 billion+ monthly active users, 100 billion messages per day, 1 billion+ groups, and a backend that runs on a surprisingly lean engineering team relative to its scale. Building a system at this level requires rethinking every conventional design assumption.
In a system design interview, you will rarely be asked to design the entire WhatsApp product. Instead, the interviewer wants to see that you can scope sensibly and dive deep on the hard problems. The core subsystems we will design are:
- 1-to-1 and group text messaging — real-time delivery with acknowledgement and read receipts
- Offline message storage — durable message queuing for offline recipients
- End-to-end encryption — client-side encryption using the Signal Protocol so servers never see plaintext
- Media sharing — images, videos, and documents via an object storage pipeline
- Presence — online/last-seen indicators without polling the server on every request
Scale Reality Check
- ~1.16 million messages per second on average (100B / 86,400s); peak traffic runs meaningfully higher
- Average message size ~1 KB → ~1.16 GB/s of message payload throughput
- ~500 million daily group conversations
- ~100 million media files shared per day
- Each user connects from up to 4 linked devices simultaneously (WhatsApp multi-device)
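These envelope numbers are worth being able to reproduce on the spot. A minimal sketch of the arithmetic (class and method names are ours, purely for illustration):

```java
// Back-of-envelope check of the scale numbers above.
public class CapacityEstimate {
    static final long MESSAGES_PER_DAY = 100_000_000_000L; // 100B
    static final long SECONDS_PER_DAY = 86_400L;
    static final long AVG_MESSAGE_BYTES = 1_024L;          // ~1 KB average payload

    // Average messages per second across the whole day
    static long avgMessagesPerSecond() {
        return MESSAGES_PER_DAY / SECONDS_PER_DAY;         // ~1.16M msg/s
    }

    // Average message payload throughput in bytes per second
    static long avgBytesPerSecond() {
        return avgMessagesPerSecond() * AVG_MESSAGE_BYTES; // ~1.16 GB/s
    }

    public static void main(String[] args) {
        System.out.println(avgMessagesPerSecond()); // 1157407
        System.out.println(avgBytesPerSecond());
    }
}
```

Remember these are day-wide averages; in an interview, state that peaks (say 2-3x average) drive the actual provisioning.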
2. Functional & Non-Functional Requirements
Functional Requirements
- Send and receive text messages in 1-to-1 and group chats (up to 1,024 members)
- Message delivery statuses: sent ✓, delivered ✓✓, read ✓✓ (blue)
- Offline message queuing — messages delivered when the recipient comes online
- Send and receive media (images <16 MB, video <64 MB, documents <100 MB)
- End-to-end encryption — server processes only opaque ciphertext
- Presence indicators — online, last seen, typing indicator
- Push notifications via APNs / FCM for offline devices
- Multi-device support — messages synchronized across up to 4 linked devices
Non-Functional Requirements & Capacity Estimates
| Requirement | Target | Notes |
|---|---|---|
| Message latency (P99) | < 500 ms | Sender to server; server to receiver depends on their connectivity |
| Availability | 99.99% | <53 min downtime/year |
| Message throughput | ~1.16M msg/s avg; provision for higher peaks | 100B/day ÷ 86,400s ≈ 1.16M/s average |
| Storage (messages, 30 days) | ~3 PB | 100B × 1KB avg × 30 days |
| Active WebSocket connections | ~600M concurrent | ~30% of 2B users online at any moment |
| Consistency model | Eventual + ordered per conversation | Messages within a chat must be in order; cross-chat is eventually consistent |
3. High-Level Architecture
The system is composed of several horizontally-scalable services. Below is the data-flow for a message sent from Alice to Bob:
Alice (Client)
│ WebSocket (TLS 1.3)
▼
┌─────────────────────┐
│ Load Balancer │ (L4, sticky by user_id hash)
└─────────────────────┘
│
▼
┌─────────────────────┐ ┌─────────────────┐
│ Chat Server A │─────▶│ Message Queue │ (Kafka / internal queue)
│ (stateful WS) │ └────────┬────────┘
└─────────────────────┘ │
▼
┌───────────────────────────────────────────────┐
│ Message Processing Service │
│ (decrypt metadata, store, fan-out, ACK) │
└────────┬──────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌──────────────────────┐
│ Cassandra DB │ │ Presence Service │
│ (message store) │ │ (Redis cluster) │
└─────────────────┘ └──────────────────────┘
│
▼
Chat Server B (Bob's connection) ──▶ Bob (Client)
Or Push Notification Service ──▶ Bob (offline)
Core Services Breakdown
- Chat Servers: Stateful, long-lived WebSocket servers. Each server holds tens to hundreds of thousands of open connections. Horizontally scalable — a new chat server can be added to accept more connections at any time.
- Message Queue (Kafka): Decouples message ingestion from processing. Producers (chat servers) write to a per-user-partition topic. Consumers (message processors) fan out to recipients, store messages, and send ACKs.
- Presence Service: Redis-backed service tracking who is online and on which chat server. Uses Redis pub/sub for cross-server notification routing.
- Push Notification Service: Integrates with APNs (iOS) and FCM (Android) to wake offline clients and trigger a reconnect.
- Media Service: Handles upload URLs, compression, deduplication, and CDN distribution for binary attachments.
- User Service: Manages accounts, phone-to-user-id mapping, profile metadata, and contact discovery.
4. Real-Time Messaging: WebSocket vs HTTP Long-Polling
The choice of transport protocol is the most critical architectural decision for a messaging system. Both WebSocket and HTTP Long-Polling have been used in production, but they have fundamentally different scalability and latency characteristics.
| Dimension | WebSocket | HTTP Long-Polling |
|---|---|---|
| Connection model | Persistent, full-duplex TCP | Repeated HTTP requests, semi-duplex |
| Server push latency | ~1–10 ms | Low while a poll is held open; reconnect gaps between polls add seconds of delay |
| Header overhead | 2–10 bytes per frame | ~800 bytes per HTTP request |
| Connections per server | 50,000–500,000 (with tuning) | ~10,000–30,000 (thread-limited) |
| Load balancer compatibility | Requires sticky sessions or L4 LB | Stateless; any L7 LB works |
| Firewall & proxy support | Some proxies block WS; port 443 fallback needed | Works everywhere (plain HTTPS) |
| WhatsApp's choice | ✅ WebSocket (primary) | Fallback for restricted networks |
WhatsApp uses a custom binary protocol over WebSocket (originally XMPP, later replaced with a proprietary protocol). The persistent connection lets the server push messages instantly without any polling overhead. A single chat server can hold 500,000+ concurrent connections using Java's NIO (non-blocking I/O) or a reactive stack.
❌ Bad WebSocket Handler — Blocking, No ACK, No Retry
// ❌ Bad: Blocking handler — ties up a thread per message,
// no ACK sent back to sender, drops message on DB error
@ServerEndpoint("/ws/chat")
public class BadChatWebSocketHandler {
@OnMessage
public void onMessage(String rawMessage, Session session) {
// ❌ Synchronous DB write blocks the WebSocket thread pool
Message msg = parseMessage(rawMessage);
messageRepository.save(msg); // blocks thread up to 2s
// ❌ No acknowledgement — sender never knows if message arrived
// ❌ No retry — if DB write fails, message is silently lost
// ❌ No async fan-out — recipient delivery happens inline
User recipient = userService.findById(msg.getRecipientId()); // 2nd blocking call
if (recipient.isOnline()) {
Session recipientSession = sessionRegistry.get(recipient.getId());
recipientSession.getBasicRemote().sendText(rawMessage); // 3rd blocking call
}
}
}
✅ Good WebSocket Handler — Async, ACK, Exponential Backoff Retry
// ✅ Good: Non-blocking handler — publishes to Kafka, sends ACK immediately,
// retry logic lives in the consumer, not the WebSocket handler
@ServerEndpoint("/ws/chat")
public class GoodChatWebSocketHandler {
private final KafkaProducer<String, ChatMessage> kafkaProducer;
private final MessageAckService ackService;
private final WebSocketSessionRegistry sessionRegistry; // used for cleanup in @OnError
private final PresenceService presenceService; // used for cleanup in @OnError
@OnMessage
public void onMessage(byte[] rawMessage, Session session) {
// Binary frame (custom binary protocol over WebSocket)
ChatMessage msg = ChatMessage.parseFrom(rawMessage);
// ✅ Publish to Kafka asynchronously — non-blocking fire-and-return
ProducerRecord<String, ChatMessage> record =
new ProducerRecord<>("chat-messages", msg.getConversationId(), msg);
kafkaProducer.send(record, (metadata, ex) -> {
if (ex == null) {
// ✅ ACK immediately once message is durable in Kafka
ackService.sendAck(session, msg.getMessageId(), MessageStatus.SENT);
} else {
// ✅ Report failure to client — client will retry with backoff
ackService.sendNack(session, msg.getMessageId(), ex.getMessage());
}
});
// ✅ Handler thread returns immediately — never blocks
}
@OnError
public void onError(Session session, Throwable t) {
// Graceful cleanup; pending outbound messages stay in Kafka topic
sessionRegistry.remove(session);
presenceService.markOffline(session.getUserPrincipal().getName());
}
}
// ✅ Client-side retry with exponential backoff (pseudocode reference)
// maxRetries = 5, baseDelay = 500ms, factor = 2.0
// retry 1: 500ms, retry 2: 1s, retry 3: 2s, retry 4: 4s, retry 5: 8s + jitter
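The retry schedule sketched in the comment above can be made concrete. A minimal implementation of exponential backoff with jitter (names and the 20% jitter fraction are our choices, not a WhatsApp-specified value):

```java
import java.util.concurrent.ThreadLocalRandom;

// Client-side retry delay: exponential backoff capped at a maximum,
// plus random jitter so many clients don't retry in lockstep.
public class RetryBackoff {
    static final long BASE_DELAY_MS = 500;
    static final double FACTOR = 2.0;
    static final long MAX_DELAY_MS = 8_000;

    // Deterministic part: 500ms, 1s, 2s, 4s, 8s for attempts 1..5
    static long baseDelayMs(int attempt) {
        double delay = BASE_DELAY_MS * Math.pow(FACTOR, attempt - 1);
        return Math.min((long) delay, MAX_DELAY_MS);
    }

    // Full delay: base plus up to 20% random jitter
    static long delayWithJitterMs(int attempt) {
        long base = baseDelayMs(attempt);
        long jitter = ThreadLocalRandom.current().nextLong(base / 5 + 1);
        return base + jitter;
    }
}
```

The same pattern reappears later for mass reconnects after an outage; jitter is what prevents a synchronized retry storm.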
5. Message Delivery & Offline Queue
WhatsApp's three-tick system (sent ✓ / delivered ✓✓ / read ✓✓ blue) is one of the most recognizable UI patterns in messaging. Implementing it correctly requires a well-designed ACK pipeline.
Message State Machine
PENDING ──server_ack──▶ SENT ──device_ack──▶ DELIVERED ──read_receipt──▶ READ
│ │
│ └──▶ push_notification (if recipient offline)
│
└──▶ FAILED (no server ack after timeout → client retries)
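The legal transitions in this state machine can be captured in a small enum, which makes it easy to reject out-of-order status updates (e.g., a late DELIVERED ack arriving after READ). This is an illustrative sketch, not production code:

```java
import java.util.EnumSet;
import java.util.Set;

// Message delivery state machine: each state lists the states
// it may legally transition to. READ is terminal.
public enum MessageState {
    PENDING, SENT, DELIVERED, READ, FAILED;

    static Set<MessageState> nextStates(MessageState from) {
        switch (from) {
            case PENDING:   return EnumSet.of(SENT, FAILED);  // server ack / timeout
            case SENT:      return EnumSet.of(DELIVERED);     // recipient device ack
            case DELIVERED: return EnumSet.of(READ);          // read receipt
            case FAILED:    return EnumSet.of(PENDING);       // client retries
            default:        return EnumSet.noneOf(MessageState.class);
        }
    }

    static boolean canTransition(MessageState from, MessageState to) {
        return nextStates(from).contains(to);
    }
}
```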
Message ACK Service — Java Implementation
// ✅ Good: Message delivery service with full status tracking
@Service
public class MessageDeliveryService {
private final MessageRepository messageRepo; // Cassandra
private final PresenceService presenceService; // Redis
private final WebSocketSessionRegistry sessionRegistry;
private final PushNotificationService pushService;
private final OfflineQueueRepository offlineQueueRepo; // per-user offline queue (Cassandra)
private final KafkaTemplate<String, AckEvent> kafka;
/**
* Called by Kafka consumer after message is persisted in Cassandra.
* Determines delivery path: real-time WS or offline push.
*/
@KafkaListener(topics = "chat-messages", groupId = "delivery-group")
public void deliverMessage(ChatMessage message) {
String recipientId = message.getRecipientId();
// Update persisted status to SENT (server received it)
messageRepo.updateStatus(message.getMessageId(), MessageStatus.SENT);
notifySender(message.getSenderId(), message.getMessageId(), MessageStatus.SENT);
PresenceInfo presence = presenceService.getPresence(recipientId);
if (presence.isOnline()) {
// ✅ Real-time delivery via WebSocket
boolean delivered = tryDeliverViaWebSocket(recipientId, message);
if (delivered) {
messageRepo.updateStatus(message.getMessageId(), MessageStatus.DELIVERED);
notifySender(message.getSenderId(), message.getMessageId(), MessageStatus.DELIVERED);
return;
}
}
// ✅ Recipient is offline — store in offline queue + push notification
offlineQueueRepo.enqueue(recipientId, message);
pushService.sendPushNotification(recipientId,
PushPayload.builder()
.title("New message")
.conversationId(message.getConversationId())
.build());
}
/**
* Called when recipient opens the conversation (read receipt).
*/
public void markAsRead(String userId, String conversationId, long upToMessageSeq) {
List<Message> messages = messageRepo.findUnread(conversationId, userId, upToMessageSeq);
messages.forEach(m -> {
messageRepo.updateStatus(m.getMessageId(), MessageStatus.READ);
notifySender(m.getSenderId(), m.getMessageId(), MessageStatus.READ);
});
}
private void notifySender(String senderId, String messageId, MessageStatus status) {
AckEvent ack = new AckEvent(messageId, status, Instant.now());
Session senderSession = sessionRegistry.getSession(senderId);
if (senderSession != null && senderSession.isOpen()) {
senderSession.getAsyncRemote().sendObject(ack); // non-blocking send
}
}
private boolean tryDeliverViaWebSocket(String recipientId, ChatMessage msg) {
Session session = sessionRegistry.getSession(recipientId);
if (session == null || !session.isOpen()) return false;
try {
session.getAsyncRemote().sendObject(msg);
return true;
} catch (Exception e) {
return false; // falls back to offline queue
}
}
}
Offline Queue Design
When a recipient is offline, messages are stored in their per-user Cassandra partition (not a separate queue table — the same message store, just undelivered). When the client reconnects:
- Client sends a `CONNECT` frame with its last received sequence number.
- Chat server queries Cassandra for all messages with `seq > last_seen_seq` for that user.
- Messages are streamed back over the newly opened WebSocket in order.
- Client sends a bulk ACK; the server updates delivery statuses and notifies the original senders.
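The replay step above is essentially a filtered range scan over the user's partition. A minimal in-memory sketch (`StoredMessage` is a stand-in for the real Cassandra row type):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Reconnect replay: given a user's stored messages and the last sequence
// number the client reports, return the undelivered tail in order.
public class OfflineReplay {
    record StoredMessage(long seq, String ciphertext) {}

    static List<StoredMessage> replayAfter(List<StoredMessage> partition, long lastSeenSeq) {
        return partition.stream()
                .filter(m -> m.seq() > lastSeenSeq)                   // seq > last_seen_seq
                .sorted(Comparator.comparingLong(StoredMessage::seq)) // oldest first
                .collect(Collectors.toList());
    }
}
```

In Cassandra, the filter and sort are free: the clustering key on `message_seq` means the server issues a single range query rather than scanning.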
Retention: WhatsApp stores messages on its servers only until they are delivered. Once all recipient devices have acknowledged delivery, the server-side copy is deleted. This is a deliberate design choice to reinforce end-to-end privacy.
6. End-to-End Encryption (Signal Protocol)
WhatsApp uses the Signal Protocol for end-to-end encryption — the same protocol used by Signal messenger and originally developed by Open Whisper Systems. The server processes only ciphertext and never has access to the encryption keys or plaintext message content.
Key Concepts
- Identity Key Pair: Long-term Curve25519 key pair generated on device install. Public key registered with the server; private key never leaves the device.
- Signed Prekey: A medium-term Curve25519 key pair signed by the identity key. Rotated every few weeks.
- One-Time Prekeys (OTKs): A batch of single-use Curve25519 key pairs uploaded to the server. Each new conversation consumes one OTK to prevent replay attacks.
- X3DH (Extended Triple Diffie-Hellman): The initial key agreement protocol used when Alice sends a message to Bob for the first time. Combines four DH operations using Alice's identity key, Bob's identity key, Bob's signed prekey, and Bob's OTK to derive a shared secret without Bob being online.
- Double Ratchet Algorithm: After X3DH, every message uses a new encryption key derived from the double ratchet — a combination of a symmetric-key ratchet and a Diffie-Hellman ratchet. This provides forward secrecy (past messages remain secure if current keys are compromised) and break-in recovery (future messages are safe even after a temporary key compromise).
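The symmetric-key half of the double ratchet is simple enough to sketch: each step derives a one-off message key and advances the chain key, so a leaked message key cannot recover past or future keys. The `0x01`/`0x02` constants follow the convention in the public Double Ratchet specification; everything else here is a simplified illustration, not the full protocol:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Symmetric-key ratchet step: one fresh message key per message,
// and a new chain key that replaces the old one.
public class SymmetricRatchet {
    static byte[] hmacSha256(byte[] key, byte[] data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data);
    }

    // Returns {messageKey, nextChainKey}
    static byte[][] step(byte[] chainKey) throws Exception {
        byte[] messageKey = hmacSha256(chainKey, new byte[] {0x01});
        byte[] nextChainKey = hmacSha256(chainKey, new byte[] {0x02});
        return new byte[][] {messageKey, nextChainKey};
    }
}
```

The Diffie-Hellman half of the ratchet (which provides break-in recovery) periodically mixes fresh DH output into the chain key; that part is omitted here.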
What the Server Stores vs What It Doesn't
✅ Server Stores
- Public identity keys
- Public signed prekeys
- Public one-time prekeys
- Encrypted ciphertext blobs
- Message metadata (sender, recipient, timestamp, size)
❌ Server Never Stores
- Private keys (any type)
- Session keys / ratchet state
- Plaintext message content
- Media decryption keys
- Message content after delivery
Media Encryption
Media files are encrypted on-device before upload with a randomly generated AES-256-CBC key and an HMAC-SHA-256 integrity key. Only the encrypted blob is uploaded to the media CDN. The decryption key is included in the message payload — itself encrypted by the Signal session — so only the recipient can decrypt the media.
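A minimal sketch of that encrypt-then-MAC construction. Note the real client derives the cipher key, MAC key, and IV from a single master key via HKDF; here separate random keys are passed in for simplicity, and class and field names are ours:

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

// Client-side media encryption: AES-256-CBC for confidentiality,
// HMAC-SHA256 over IV + ciphertext for integrity.
public class MediaEncryptor {
    record EncryptedMedia(byte[] iv, byte[] ciphertext, byte[] mac) {}

    static EncryptedMedia encrypt(byte[] plaintext, byte[] aesKey, byte[] macKey) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(aesKey, "AES"), new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(macKey, "HmacSHA256"));
        hmac.update(iv);
        byte[] mac = hmac.doFinal(ciphertext); // MAC covers IV + ciphertext
        return new EncryptedMedia(iv, ciphertext, mac);
    }

    static byte[] decrypt(EncryptedMedia media, byte[] aesKey) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(aesKey, "AES"), new IvParameterSpec(media.iv()));
        return cipher.doFinal(media.ciphertext());
    }
}
```

Only the `EncryptedMedia` blob is uploaded; the keys travel inside the Signal-encrypted message payload.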
7. Group Messaging Fan-Out
Group messaging is significantly more complex than 1-to-1 messaging because a single message must be delivered to potentially hundreds or thousands of recipients. The two competing strategies are fan-out on write and fan-out on read.
Fan-Out on Write vs Fan-Out on Read
| Dimension | Fan-Out on Write | Fan-Out on Read |
|---|---|---|
| Mechanism | Write one copy per recipient's mailbox at send time | Store one copy; each recipient fetches it at read time |
| Write amplification | High (N writes for N members) | Low (1 write per message) |
| Read latency | Very low (message already in mailbox) | Higher (must query shared timeline) |
| Suitable for | Small groups (<256 members); active users | Large broadcast channels (>1,000 members) |
| Individual read receipt | Easy — per-copy status field | Harder — needs separate read-tracking table |
| WhatsApp strategy | ✅ Used for groups <256 | ✅ Used for channels & large groups |
Group Encryption Challenge
End-to-end encrypting a group message is more complex than a 1-to-1 message because there is no single shared secret. WhatsApp uses the Sender Key mechanism from the Signal Protocol:
- Each group member generates a random Sender Key for a specific group.
- The sender encrypts their Sender Key individually to each group member (using their 1-to-1 Signal session) — this is the only per-member work on the first message.
- Subsequent group messages are encrypted once with the Sender Key. All members can decrypt it since they received the Sender Key in step 2.
- If a member leaves or is added, all Sender Keys are rotated.
This means group messages only require O(N) work once per key distribution, not O(N) per message — a major efficiency improvement over naive group encryption.
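The efficiency gain is easy to quantify. For a group of n members sending m messages, naive pairwise encryption costs m × (n − 1) encryptions, while Sender Key costs (n − 1) key distributions once plus one encryption per message:

```java
// Encryption-operation counts for a group of `members` sending `messages`
// messages (illustrative arithmetic only).
public class GroupFanOutCost {
    // Naive: every message is encrypted once per recipient
    static long naiveOps(int members, int messages) {
        return (long) messages * (members - 1);
    }

    // Sender Key: distribute the key once per member, then one encrypt per message
    static long senderKeyOps(int members, int messages) {
        return (members - 1) + (long) messages;
    }
}
```

For a 256-member group and 100 messages, that is 25,500 operations naively versus 355 with Sender Keys.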
8. Media Storage & CDN
WhatsApp handles approximately 100 million media files per day. Storing and serving these efficiently requires a dedicated media pipeline separate from the message pipeline.
Upload Flow
- Hash before upload: Client computes `SHA-256` of the encrypted media blob and sends the hash to the Media Service to check for an existing copy (deduplication).
- Pre-signed URL: If no duplicate exists, the Media Service returns a pre-signed upload URL pointing to object storage (S3-compatible). The client uploads directly to object storage — no chat server involvement.
- Confirm upload: Client notifies the Media Service that the upload is complete. The Media Service triggers post-processing (thumbnail generation, virus scan, CDN distribution).
- Media URL in message: Sender includes the CDN URL + decryption key in the encrypted message payload sent via WebSocket.
Content Deduplication
If 10,000 people forward the same viral video, it only needs to be stored once. The content-addressed storage key is sha256(encrypted_blob). When a new upload's hash matches an existing blob, the Media Service returns the existing CDN URL immediately — no re-upload needed. This simple deduplication reportedly reduces WhatsApp's storage costs by 30–40% for heavily forwarded media.
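The content-addressed key is a one-liner. A sketch (class name ours; note this dedup only works because a forwarded file reuses the original encrypted blob and key rather than being re-encrypted, which would produce a different ciphertext and hash):

```java
import java.security.MessageDigest;
import java.util.HexFormat;

// Content-addressed storage key: identical encrypted blobs map to the same
// SHA-256 key, so a duplicate can be detected before any bytes are uploaded.
public class MediaDedup {
    static String storageKey(byte[] encryptedBlob) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(encryptedBlob);
        return HexFormat.of().formatHex(digest); // 64 hex chars
    }
}
```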
CDN & Expiry Strategy
- Media files are served from a geo-distributed CDN (Points of Presence close to users) to minimize download latency.
- CDN URLs include a time-limited signature (TTL: 30 days). After expiry, recipients must request a refresh URL — the encrypted blob remains in object storage for the duration of the chat backup period.
- Original quality and compressed thumbnail versions are stored separately. The thumbnail is included in the message metadata for instant preview; the full-quality file is downloaded on-demand.
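A time-limited signed URL can be sketched as an HMAC over the path plus expiry timestamp; the CDN edge rejects anything expired or tampered with. Path and parameter names here are illustrative, not WhatsApp's real URL scheme:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.HexFormat;

// Time-limited signed media URL: the signature binds the path and the
// expiry time, so neither can be altered without the shared secret.
public class SignedUrl {
    static String sign(String path, long expiresAtEpochSec, byte[] secret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] sig = mac.doFinal((path + "|" + expiresAtEpochSec).getBytes());
        return path + "?expires=" + expiresAtEpochSec + "&sig=" + HexFormat.of().formatHex(sig);
    }

    static boolean verify(String path, long expiresAtEpochSec, String sig,
                          byte[] secret, long nowEpochSec) throws Exception {
        if (nowEpochSec > expiresAtEpochSec) return false; // expired
        String expected = sign(path, expiresAtEpochSec, secret);
        return expected.endsWith("&sig=" + sig); // use constant-time compare in real code
    }
}
```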
9. Database Design
Database selection is critical for a messaging system. The access patterns are highly predictable: reads and writes are almost always scoped to a single conversation, and the data is time-series in nature — making Apache Cassandra an excellent fit.
Why Cassandra for Messages?
- Write-optimized: Cassandra's LSM-tree storage engine handles extremely high write throughput with predictable, low-latency writes — critical for 1.2M messages/second.
- Partition by conversation: All messages in a conversation land in the same partition, enabling efficient range scans by sequence number or timestamp.
- Linear horizontal scale: Add nodes to increase capacity without schema changes or data migration complexity.
- Tunable consistency: Use `LOCAL_QUORUM` for writes (durability) and `LOCAL_ONE` for reads (speed) within a datacenter.
- Time-to-live (TTL): Cassandra's built-in TTL feature automatically expires undelivered messages after a configurable period (e.g., 30 days) without separate cleanup jobs.
Cassandra Schema
-- Messages table: partition by conversation, cluster by time (descending for latest-first queries)
CREATE TABLE messages (
conversation_id UUID,
message_seq BIGINT, -- monotonically increasing per conversation (Snowflake ID)
message_id UUID,
sender_id UUID,
ciphertext BLOB, -- encrypted payload (server never decrypts)
message_type TINYINT, -- 0=text, 1=image, 2=video, 3=audio, 4=doc
media_url TEXT, -- null for text messages
status TINYINT, -- 0=sent, 1=delivered, 2=read
created_at TIMESTAMP,
PRIMARY KEY ((conversation_id), message_seq)
) WITH CLUSTERING ORDER BY (message_seq DESC)
AND default_time_to_live = 2592000; -- 30-day TTL; deleted after delivery in practice
-- User inbox: tracks per-user last-seen seq for offline replay
CREATE TABLE user_inbox (
user_id UUID,
conversation_id UUID,
last_read_seq BIGINT,
last_seen_seq BIGINT,
unread_count INT,
updated_at TIMESTAMP,
PRIMARY KEY ((user_id), conversation_id)
);
-- Conversations table (metadata)
CREATE TABLE conversations (
conversation_id UUID PRIMARY KEY,
conversation_type TINYINT, -- 0=direct, 1=group, 2=channel
participant_ids SET<UUID>,
group_name TEXT,
created_at TIMESTAMP,
last_message_seq BIGINT
);
User Service — Relational DB (PostgreSQL)
User profiles, phone number to user_id mapping, and contact discovery use PostgreSQL (or a consistent relational store) because these require strong consistency and are read-heavy relative to writes. The user service is accessed far less frequently than the message store and doesn't need Cassandra's write throughput.
Redis for Presence & Session Routing
The Presence Service uses Redis for ultra-low-latency lookups:
-- Key: presence:{user_id}
-- Value: { "status": "online", "chat_server": "chat-server-7", "last_seen": 1712856321 }
-- TTL: 60 seconds (auto-expires if client disconnects without sending DISCONNECT frame)
-- Key: sessions:{chat_server_id}
-- Value: SET of user_ids connected to this server (used for server failover)
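The TTL semantics above can be mimicked in-memory to make the behaviour concrete: an entry counts as online only if its last heartbeat falls within the TTL window. (Redis expires keys itself; this sketch checks on read. Names are ours.)

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory sketch of presence lookups with a 60-second TTL, mimicking
// the Redis key scheme above.
public class PresenceTable {
    static final long TTL_MS = 60_000;

    record Entry(String chatServer, long lastHeartbeatMs) {}
    private final Map<String, Entry> table = new ConcurrentHashMap<>();

    void heartbeat(String userId, String chatServer, long nowMs) {
        table.put(userId, new Entry(chatServer, nowMs)); // refresh on every ping
    }

    boolean isOnline(String userId, long nowMs) {
        Entry e = table.get(userId);
        return e != null && nowMs - e.lastHeartbeatMs() < TTL_MS;
    }
}
```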
10. Scalability & Fault Tolerance
Chat Server Failure Recovery
Chat servers are stateful (they hold open WebSocket connections), which makes failure recovery non-trivial. The strategy:
- Heartbeat-based failure detection: Each client sends a ping every 30 seconds. If the chat server doesn't respond within 60 seconds, the client triggers reconnection.
- Redis session tracking: A health-check service monitors each chat server. When a server fails, it bulk-deletes all presence entries for that server. Clients detect absence of pong and reconnect to a different server.
- No in-flight message loss: Messages are durable in Kafka and Cassandra before the WebSocket ACK is sent. If the client doesn't receive an ACK and reconnects, it replays from the last acknowledged sequence number.
- Zero-downtime rolling restarts: Before a chat server is shut down (e.g., for deployment), it sends a `RECONNECT` frame to all connected clients with a list of alternative servers, triggering a graceful migration.
Message Ordering
Within a conversation, messages must be ordered. WhatsApp uses a per-conversation monotonic sequence number generated by a distributed ID service (similar to Twitter's Snowflake). The sequence number encodes a timestamp, data-center ID, and machine ID, making it globally unique and time-sortable without coordination across servers.
- Sequence numbers are assigned by the Message Processing Service (single writer per partition) — not the sender — to prevent gaps from network re-ordering.
- Clients display messages in sequence-number order, not arrival order, to handle cases where two messages from the same device arrive out of order at the server.
- Clock skew between client devices is normalized by server-assigned timestamps.
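A Snowflake-style ID packs those components into a single 64-bit long with bit shifts. The bit widths below follow Twitter's original layout (41-bit timestamp, 5-bit datacenter, 5-bit machine, 12-bit sequence); WhatsApp's exact split and epoch are not public, so treat both as assumptions:

```java
// Snowflake-style 64-bit ID: time-sortable without cross-server coordination.
public class SnowflakeId {
    static final long CUSTOM_EPOCH_MS = 1_700_000_000_000L; // assumed custom epoch

    static long compose(long timestampMs, long datacenterId, long machineId, long sequence) {
        return ((timestampMs - CUSTOM_EPOCH_MS) << 22) // 41 bits of milliseconds
                | (datacenterId << 17)                 // 5 bits
                | (machineId << 12)                    // 5 bits
                | sequence;                            // 12-bit per-ms counter
    }

    static long timestampMs(long id)  { return (id >>> 22) + CUSTOM_EPOCH_MS; }
    static long datacenterId(long id) { return (id >>> 17) & 0x1F; }
    static long machineId(long id)    { return (id >>> 12) & 0x1F; }
    static long sequence(long id)     { return id & 0xFFF; }
}
```

Because the timestamp occupies the high bits, sorting IDs numerically sorts them by creation time.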
Multi-Region Architecture
WhatsApp operates in multiple geographic regions (Americas, EMEA, APAC) with the following design:
- Users are homed to a region based on phone number prefix. All messages to/from that user are processed in their home region.
- Cross-region messages (Alice in the US, Bob in Germany) are routed through a global backbone. Alice's chat server writes to Kafka in the US region; a cross-region Kafka MirrorMaker replicates to the EU region where Bob's chat server delivers it.
- Cassandra replication uses `NetworkTopologyStrategy` with a replication factor of 3 in each datacenter and 2 in a backup datacenter. `LOCAL_QUORUM` writes require 2 of 3 replicas in the local datacenter for durability.
11. Common Interview Mistakes & What Interviewers Look For
Common Mistakes
- Using a single SQL database for messages without discussing partitioning strategy
- Not addressing the offline message delivery problem — treating all users as always-online
- Designing the group fan-out as a single synchronous call to N chat servers
- Ignoring message ordering — assuming arrival order equals send order
- Conflating end-to-end encryption with transport-layer TLS
- Forgetting multi-device support — designing for single-device users only
- Not separating the media pipeline from the message pipeline
- Using HTTP polling instead of WebSocket without justification
What Strong Candidates Do
- Drive clear requirements before drawing any architecture
- Identify the hardest problems early: fan-out at scale, offline delivery, E2E encryption
- Explain the ACK pipeline in detail — sent/delivered/read state transitions
- Justify database choices with specific access pattern reasoning
- Discuss failure modes: what happens when a chat server dies?
- Distinguish between fan-out-on-write and fan-out-on-read with thresholds
- Address the group encryption key distribution problem proactively
- Think about storage costs and propose TTL / deduplication strategies
Bonus: Interviewers Love These Deep Dives
- Typing indicator: Sent via a separate lightweight WebSocket event (not through Kafka/DB). Ephemeral — server doesn't persist it. Debounced client-side to avoid flooding the server on every keystroke.
- Last Seen: Updated in Redis with a TTL. If a user disables "last seen", the server stops broadcasting it but still updates internally for the user's own devices.
- Message search: Not natively supported server-side for privacy reasons. Full-text search is done client-side on the locally decrypted message database — reinforcing the E2E encryption model.
- Backup: Optional, encrypted backups to Google Drive / iCloud use a different key path — a password-derived key managed by the user, not WhatsApp.
Frequently Asked Questions
Q: Why does WhatsApp use Cassandra and not a traditional relational database like MySQL?
A: MySQL (and relational databases generally) are excellent for complex queries, joins, and strong ACID transactions — but WhatsApp's message access pattern doesn't need any of that. Every message read is a simple range scan within a single conversation partition. What WhatsApp needs is extremely high write throughput (1.2M messages/second), horizontal scalability without schema-change friction, and built-in TTL for message expiry. Cassandra's write-optimized LSM-tree storage and masterless peer-to-peer architecture deliver all three. Trying to achieve WhatsApp's write scale on a single-primary MySQL cluster would require enormous engineering effort to shard, replicate, and manage.
Q: If messages are end-to-end encrypted, how does WhatsApp detect spam and abuse?
A: This is a genuine tension in the design. WhatsApp can inspect metadata — sender, recipient, frequency, group membership, phone number reputation — without decrypting content. Unencrypted surfaces such as profile photos and group names and images can be scanned server-side against known CSAM hashes (PhotoDNA). Users can also report messages, which sends the last few messages (decrypted by the reporting user's device) to WhatsApp for review — a deliberate opt-in disclosure. WhatsApp has stated that forwarding limits (capping how many chats a message can be forwarded to) are applied based on forward-count metadata without reading content.
Q: How does WhatsApp handle the "thundering herd" problem when millions of users reconnect after an outage?
A: Reconnection is controlled with exponential backoff with jitter on the client side. When clients detect disconnection, they wait a random delay (e.g., base 2s ± jitter) before reconnecting, doubling the base delay on each retry up to a cap. This spreads the reconnection wave over minutes instead of seconds, preventing a synchronized storm from overwhelming the load balancer and chat servers simultaneously. The load balancer itself distributes reconnecting clients using consistent hashing to avoid hot-spotting individual chat servers.
Q: How does WhatsApp's multi-device feature work with end-to-end encryption?
A: Each linked device has its own identity key pair. When Alice links a new device (Device B), a key agreement happens between Device A (the primary) and Device B using the same X3DH protocol used for regular contacts. The primary device sends all conversation history to the new device encrypted to Device B's identity key. Subsequently, every outgoing message is encrypted separately for each of Alice's linked devices and each of Bob's linked devices — the sender's device knows all the public keys for all linked devices of all participants. This is why adding the 4th linked device to a large group chat requires distributing Sender Keys to every device of every member.
Q: What is the estimated storage cost for 100 billion messages per day?
A: A typical text message payload (ciphertext + metadata) averages ~1 KB. 100B × 1KB = ~100 TB/day of raw message data. With replication factor 3 in Cassandra, that's ~300 TB/day of raw storage. Over 30 days (before TTL expiry for delivered messages — in practice WhatsApp deletes server copies on delivery), the steady-state is dominated by undelivered messages, which is far smaller. For media: 100M files × average 500 KB = ~50 TB/day; with deduplication reducing that 30–40%, effective media ingestion is ~30 TB/day. CDN caching absorbs most of the read traffic so origin bandwidth is far lower than the total media size implies.
12. Conclusion & Checklist
Designing a messaging system at WhatsApp's scale requires getting multiple hard problems right simultaneously: real-time delivery, durable offline storage, end-to-end encryption, efficient group fan-out, and massive media throughput. No single technology or pattern solves all of them — the architecture is a careful composition of WebSocket connection management, event-driven message processing, time-series database design, and client-side cryptography.
Design Review Checklist
- ☐ Persistent WebSocket connections — not HTTP polling — for real-time bidirectional messaging
- ☐ Three-state ACK pipeline: SENT (server ack) → DELIVERED (recipient device ack) → READ (opened)
- ☐ Offline message queue in Cassandra with per-user partition; replay on reconnect
- ☐ Signal Protocol (X3DH + Double Ratchet) for E2E encryption; keys never leave devices
- ☐ Group Sender Key for efficient group encryption (O(N) key distribution, O(1) per message encrypt)
- ☐ Fan-out-on-write for groups <256 members; fan-out-on-read for channels/large groups
- ☐ Media uploaded directly to object store via pre-signed URLs; SHA-256 deduplication
- ☐ CDN for media serving; time-limited signed URLs for access control
- ☐ Cassandra with `(conversation_id, message_seq)` primary key; TTL for auto-expiry
- ☐ Redis for presence, session routing, and typing indicators
- ☐ Kafka for durable message ingestion; decouples chat servers from message processing
- ☐ Snowflake-style distributed IDs for monotonic per-conversation message ordering
- ☐ Chat server failure recovery via client reconnect + Redis cleanup
- ☐ Push notifications (APNs/FCM) for offline client wake-up
In a system design interview, you don't need to cover all of this. Pick the two or three components the interviewer cares most about and go deep. The ACK pipeline, the offline queue, and the group fan-out decision are the three areas where most candidates are weakest — and therefore the three areas where a strong answer earns the most signal.