WhatsApp System Design: Messaging at 2B+ Users Scale (2026)
How do you design a system that delivers 100 billion messages per day to 2 billion users with sub-second latency, end-to-end encryption, and four-nines availability? This deep dive covers every layer of the WhatsApp architecture — from WebSocket connection management and the Signal encryption protocol to Cassandra schema design and group-messaging fan-out strategies — with annotated Java code you can bring to your next system design interview.
TL;DR — The Architecture in Five Lines
- Persistent WebSocket connections per device on stateful chat servers — one connection = one open TCP socket per client.
- Signal Protocol (double-ratchet + X3DH) for end-to-end encryption; keys never leave the device.
- Cassandra time-series schema for per-user message queues; Redis for presence & offline-queue metadata.
- Fan-out on write for small groups (<256 members); fan-out on read for broadcast channels with thousands of members.
- CDN + S3-compatible object store for media; deduplication via SHA-256 content hash before upload.
Table of Contents
- WhatsApp at Scale — What We're Building
- Functional & Non-Functional Requirements
- High-Level Architecture
- Real-Time Messaging: WebSocket vs HTTP Long-Polling
- Message Delivery & Offline Queue
- End-to-End Encryption (Signal Protocol)
- Group Messaging Fan-Out
- Media Storage & CDN
- Database Design
- Scalability & Fault Tolerance
- Common Interview Mistakes & What Interviewers Look For
- Conclusion & Checklist
1. WhatsApp at Scale — What We're Building
WhatsApp is the world's most used messaging platform. As of 2026 its key public metrics are staggering: 2 billion+ monthly active users, 100 billion messages per day, 1 billion+ groups, and a backend that runs on a surprisingly lean engineering team relative to its scale. Building a system at this level requires rethinking every conventional design assumption.
In a system design interview, you will rarely be asked to design the entire WhatsApp product. Instead, the interviewer wants to see that you can scope sensibly and dive deep on the hard problems. The core subsystems we will design are:
- 1-to-1 and group text messaging — real-time delivery with acknowledgement and read receipts
- Offline message storage — durable message queuing for offline recipients
- End-to-end encryption — client-side encryption using the Signal Protocol so servers never see plaintext
- Media sharing — images, videos, and documents via an object storage pipeline
- Presence — online/last-seen indicators without polling the server on every request
Scale Reality Check
- ~1.16 million messages per second on average (100B / 86,400s); peak traffic runs meaningfully higher
- Average message size ~1 KB → ~1.16 GB/s of message payload throughput
- ~500 million daily group conversations
- ~100 million media files shared per day
- Each user connects from up to 4 linked devices simultaneously (WhatsApp multi-device)
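These envelope numbers are worth being able to reproduce on the spot. A minimal sketch of the arithmetic (class and method names are ours, purely for illustration):

```java
// Back-of-envelope check of the scale numbers above.
public class CapacityEstimate {
    static final long MESSAGES_PER_DAY = 100_000_000_000L; // 100B
    static final long SECONDS_PER_DAY = 86_400L;
    static final long AVG_MESSAGE_BYTES = 1_024L;          // ~1 KB average payload

    // Average messages per second across the whole day
    static long avgMessagesPerSecond() {
        return MESSAGES_PER_DAY / SECONDS_PER_DAY;         // ~1.16M msg/s
    }

    // Average message payload throughput in bytes per second
    static long avgBytesPerSecond() {
        return avgMessagesPerSecond() * AVG_MESSAGE_BYTES; // ~1.16 GB/s
    }

    public static void main(String[] args) {
        System.out.println(avgMessagesPerSecond()); // 1157407
        System.out.println(avgBytesPerSecond());
    }
}
```

Remember these are day-wide averages; in an interview, state that peaks (say 2-3x average) drive the actual provisioning.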
2. Functional & Non-Functional Requirements
Functional Requirements
- Send and receive text messages in 1-to-1 and group chats (up to 1,024 members)
- Message delivery statuses: sent ✓, delivered ✓✓, read ✓✓ (blue)
- Offline message queuing — messages delivered when the recipient comes online
- Send and receive media (images <16 MB, video <64 MB, documents <100 MB)
- End-to-end encryption — server processes only opaque ciphertext
- Presence indicators — online, last seen, typing indicator
- Push notifications via APNs / FCM for offline devices
- Multi-device support — messages synchronized across up to 4 linked devices
Non-Functional Requirements & Capacity Estimates
| Requirement | Target | Notes |
|---|---|---|
| Message latency (P99) | < 500 ms | Sender to server; server to receiver depends on their connectivity |
| Availability | 99.99% | <53 min downtime/year |
| Message throughput | ~1.16M msg/s avg; provision for higher peaks | 100B/day ÷ 86,400s ≈ 1.16M/s average |
| Storage (messages, 30 days) | ~3 PB | 100B × 1KB avg × 30 days |
| Active WebSocket connections | ~600M concurrent | ~30% of 2B users online at any moment |
| Consistency model | Eventual + ordered per conversation | Messages within a chat must be in order; cross-chat is eventually consistent |
3. High-Level Architecture
The system is composed of several horizontally-scalable services. Below is the data-flow for a message sent from Alice to Bob:
Alice (Client)
│ WebSocket (TLS 1.3)
▼
┌─────────────────────┐
│ Load Balancer │ (L4, sticky by user_id hash)
└─────────────────────┘
│
▼
┌─────────────────────┐ ┌─────────────────┐
│ Chat Server A │─────▶│ Message Queue │ (Kafka / internal queue)
│ (stateful WS) │ └────────┬────────┘
└─────────────────────┘ │
▼
┌───────────────────────────────────────────────┐
│ Message Processing Service │
│ (decrypt metadata, store, fan-out, ACK) │
└────────┬──────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌──────────────────────┐
│ Cassandra DB │ │ Presence Service │
│ (message store) │ │ (Redis cluster) │
└─────────────────┘ └──────────────────────┘
│
▼
Chat Server B (Bob's connection) ──▶ Bob (Client)
Or Push Notification Service ──▶ Bob (offline)
Core Services Breakdown
- Chat Servers: Stateful, long-lived WebSocket servers. Each server holds tens to hundreds of thousands of open connections. Horizontally scalable — a new chat server can be added to accept more connections at any time.
- Message Queue (Kafka): Decouples message ingestion from processing. Producers (chat servers) write to a per-user-partition topic. Consumers (message processors) fan out to recipients, store messages, and send ACKs.
- Presence Service: Redis-backed service tracking who is online and on which chat server. Uses Redis pub/sub for cross-server notification routing.
- Push Notification Service: Integrates with APNs (iOS) and FCM (Android) to wake offline clients and trigger a reconnect.
- Media Service: Handles upload URLs, compression, deduplication, and CDN distribution for binary attachments.
- User Service: Manages accounts, phone-to-user-id mapping, profile metadata, and contact discovery.
4. Real-Time Messaging: WebSocket vs HTTP Long-Polling
The choice of transport protocol is the most critical architectural decision for a messaging system. Both WebSocket and HTTP Long-Polling have been used in production, but they have fundamentally different scalability and latency characteristics.
| Dimension | WebSocket | HTTP Long-Polling |
|---|---|---|
| Connection model | Persistent, full-duplex TCP | Repeated HTTP requests, semi-duplex |
| Server push latency | ~1–10 ms | Low while a poll is held open; reconnect gaps between polls add seconds of delay |
| Header overhead | 2–10 bytes per frame | ~800 bytes per HTTP request |
| Connections per server | 50,000–500,000 (with tuning) | ~10,000–30,000 (thread-limited) |
| Load balancer compatibility | Requires sticky sessions or L4 LB | Stateless; any L7 LB works |
| Firewall & proxy support | Some proxies block WS; port 443 fallback needed | Works everywhere (plain HTTPS) |
| WhatsApp's choice | ✅ WebSocket (primary) | Fallback for restricted networks |
WhatsApp uses a custom binary protocol over WebSocket (originally XMPP, later replaced with a proprietary protocol). The persistent connection lets the server push messages instantly without any polling overhead. A single chat server can hold 500,000+ concurrent connections using Java's NIO (non-blocking I/O) or a reactive stack.
❌ Bad WebSocket Handler — Blocking, No ACK, No Retry
// ❌ Bad: Blocking handler — ties up a thread per message,
// no ACK sent back to sender, drops message on DB error
@ServerEndpoint("/ws/chat")
public class BadChatWebSocketHandler {
@OnMessage
public void onMessage(String rawMessage, Session session) {
// ❌ Synchronous DB write blocks the WebSocket thread pool
Message msg = parseMessage(rawMessage);
messageRepository.save(msg); // blocks thread up to 2s
// ❌ No acknowledgement — sender never knows if message arrived
// ❌ No retry — if DB write fails, message is silently lost
// ❌ No async fan-out — recipient delivery happens inline
User recipient = userService.findById(msg.getRecipientId()); // 2nd blocking call
if (recipient.isOnline()) {
Session recipientSession = sessionRegistry.get(recipient.getId());
recipientSession.getBasicRemote().sendText(rawMessage); // 3rd blocking call
}
}
}
✅ Good WebSocket Handler — Async, ACK, Exponential Backoff Retry
// ✅ Good: Non-blocking handler — publishes to Kafka, sends ACK immediately,
// retry logic lives in the consumer, not the WebSocket handler
@ServerEndpoint("/ws/chat")
public class GoodChatWebSocketHandler {
private final KafkaProducer<String, ChatMessage> kafkaProducer;
private final MessageAckService ackService;
private final WebSocketSessionRegistry sessionRegistry; // used for cleanup in @OnError
private final PresenceService presenceService; // used for cleanup in @OnError
@OnMessage
public void onMessage(byte[] rawMessage, Session session) {
// Binary frame (custom binary protocol over WebSocket)
ChatMessage msg = ChatMessage.parseFrom(rawMessage);
// ✅ Publish to Kafka asynchronously — non-blocking fire-and-return
ProducerRecord<String, ChatMessage> record =
new ProducerRecord<>("chat-messages", msg.getConversationId(), msg);
kafkaProducer.send(record, (metadata, ex) -> {
if (ex == null) {
// ✅ ACK immediately once message is durable in Kafka
ackService.sendAck(session, msg.getMessageId(), MessageStatus.SENT);
} else {
// ✅ Report failure to client — client will retry with backoff
ackService.sendNack(session, msg.getMessageId(), ex.getMessage());
}
});
// ✅ Handler thread returns immediately — never blocks
}
@OnError
public void onError(Session session, Throwable t) {
// Graceful cleanup; pending outbound messages stay in Kafka topic
sessionRegistry.remove(session);
presenceService.markOffline(session.getUserPrincipal().getName());
}
}
// ✅ Client-side retry with exponential backoff (pseudocode reference)
// maxRetries = 5, baseDelay = 500ms, factor = 2.0
// retry 1: 500ms, retry 2: 1s, retry 3: 2s, retry 4: 4s, retry 5: 8s + jitter
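The retry schedule sketched in the comment above can be made concrete. A minimal implementation of exponential backoff with jitter (names and the 20% jitter fraction are our choices, not a WhatsApp-specified value):

```java
import java.util.concurrent.ThreadLocalRandom;

// Client-side retry delay: exponential backoff capped at a maximum,
// plus random jitter so many clients don't retry in lockstep.
public class RetryBackoff {
    static final long BASE_DELAY_MS = 500;
    static final double FACTOR = 2.0;
    static final long MAX_DELAY_MS = 8_000;

    // Deterministic part: 500ms, 1s, 2s, 4s, 8s for attempts 1..5
    static long baseDelayMs(int attempt) {
        double delay = BASE_DELAY_MS * Math.pow(FACTOR, attempt - 1);
        return Math.min((long) delay, MAX_DELAY_MS);
    }

    // Full delay: base plus up to 20% random jitter
    static long delayWithJitterMs(int attempt) {
        long base = baseDelayMs(attempt);
        long jitter = ThreadLocalRandom.current().nextLong(base / 5 + 1);
        return base + jitter;
    }
}
```

The same pattern reappears later for mass reconnects after an outage; jitter is what prevents a synchronized retry storm.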
5. Message Delivery & Offline Queue
WhatsApp's three-tick system (sent ✓ / delivered ✓✓ / read ✓✓ blue) is one of the most recognizable UI patterns in messaging. Implementing it correctly requires a well-designed ACK pipeline.
Message State Machine
PENDING ──server_ack──▶ SENT ──device_ack──▶ DELIVERED ──read_receipt──▶ READ
│ │
│ └──▶ push_notification (if recipient offline)
│
└──▶ FAILED (no server ack after timeout → client retries)
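The legal transitions in this state machine can be captured in a small enum, which makes it easy to reject out-of-order status updates (e.g., a late DELIVERED ack arriving after READ). This is an illustrative sketch, not production code:

```java
import java.util.EnumSet;
import java.util.Set;

// Message delivery state machine: each state lists the states
// it may legally transition to. READ is terminal.
public enum MessageState {
    PENDING, SENT, DELIVERED, READ, FAILED;

    static Set<MessageState> nextStates(MessageState from) {
        switch (from) {
            case PENDING:   return EnumSet.of(SENT, FAILED);  // server ack / timeout
            case SENT:      return EnumSet.of(DELIVERED);     // recipient device ack
            case DELIVERED: return EnumSet.of(READ);          // read receipt
            case FAILED:    return EnumSet.of(PENDING);       // client retries
            default:        return EnumSet.noneOf(MessageState.class);
        }
    }

    static boolean canTransition(MessageState from, MessageState to) {
        return nextStates(from).contains(to);
    }
}
```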
Message ACK Service — Java Implementation
// ✅ Good: Message delivery service with full status tracking
@Service
public class MessageDeliveryService {
private final MessageRepository messageRepo; // Cassandra
private final PresenceService presenceService; // Redis
private final WebSocketSessionRegistry sessionRegistry;
private final PushNotificationService pushService;
private final OfflineQueueRepository offlineQueueRepo; // per-user offline queue (Cassandra)
private final KafkaTemplate<String, AckEvent> kafka;
/**
* Called by Kafka consumer after message is persisted in Cassandra.
* Determines delivery path: real-time WS or offline push.
*/
@KafkaListener(topics = "chat-messages", groupId = "delivery-group")
public void deliverMessage(ChatMessage message) {
String recipientId = message.getRecipientId();
// Update persisted status to SENT (server received it)
messageRepo.updateStatus(message.getMessageId(), MessageStatus.SENT);
notifySender(message.getSenderId(), message.getMessageId(), MessageStatus.SENT);
PresenceInfo presence = presenceService.getPresence(recipientId);
if (presence.isOnline()) {
// ✅ Real-time delivery via WebSocket
boolean delivered = tryDeliverViaWebSocket(recipientId, message);
if (delivered) {
messageRepo.updateStatus(message.getMessageId(), MessageStatus.DELIVERED);
notifySender(message.getSenderId(), message.getMessageId(), MessageStatus.DELIVERED);
return;
}
}
// ✅ Recipient is offline — store in offline queue + push notification
offlineQueueRepo.enqueue(recipientId, message);
pushService.sendPushNotification(recipientId,
PushPayload.builder()
.title("New message")
.conversationId(message.getConversationId())
.build());
}
/**
* Called when recipient opens the conversation (read receipt).
*/
public void markAsRead(String userId, String conversationId, long upToMessageSeq) {
List<Message> messages = messageRepo.findUnread(conversationId, userId, upToMessageSeq);
messages.forEach(m -> {
messageRepo.updateStatus(m.getMessageId(), MessageStatus.READ);
notifySender(m.getSenderId(), m.getMessageId(), MessageStatus.READ);
});
}
private void notifySender(String senderId, String messageId, MessageStatus status) {
AckEvent ack = new AckEvent(messageId, status, Instant.now());
Session senderSession = sessionRegistry.getSession(senderId);
if (senderSession != null && senderSession.isOpen()) {
senderSession.getAsyncRemote().sendObject(ack); // non-blocking send
}
}
private boolean tryDeliverViaWebSocket(String recipientId, ChatMessage msg) {
Session session = sessionRegistry.getSession(recipientId);
if (session == null || !session.isOpen()) return false;
try {
session.getAsyncRemote().sendObject(msg);
return true;
} catch (Exception e) {
return false; // falls back to offline queue
}
}
}
Offline Queue Design
When a recipient is offline, messages are stored in their per-user Cassandra partition (not a separate queue table — the same message store, just undelivered). When the client reconnects:
- Client sends a `CONNECT` frame with its last received sequence number.
- Chat server queries Cassandra for all messages with `seq > last_seen_seq` for that user.
- Messages are streamed back over the newly opened WebSocket in order.
- Client sends a bulk ACK; the server updates delivery statuses and notifies the original senders.
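The replay step above is essentially a filtered range scan over the user's partition. A minimal in-memory sketch (`StoredMessage` is a stand-in for the real Cassandra row type):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Reconnect replay: given a user's stored messages and the last sequence
// number the client reports, return the undelivered tail in order.
public class OfflineReplay {
    record StoredMessage(long seq, String ciphertext) {}

    static List<StoredMessage> replayAfter(List<StoredMessage> partition, long lastSeenSeq) {
        return partition.stream()
                .filter(m -> m.seq() > lastSeenSeq)                   // seq > last_seen_seq
                .sorted(Comparator.comparingLong(StoredMessage::seq)) // oldest first
                .collect(Collectors.toList());
    }
}
```

In Cassandra, the filter and sort are free: the clustering key on `message_seq` means the server issues a single range query rather than scanning.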
Retention: WhatsApp stores messages on its servers only until they are delivered. Once all recipient devices have acknowledged delivery, the server-side copy is deleted. This is a deliberate design choice to reinforce end-to-end privacy.
6. End-to-End Encryption (Signal Protocol)
WhatsApp uses the Signal Protocol for end-to-end encryption — the same protocol used by Signal messenger and originally developed by Open Whisper Systems. The server processes only ciphertext and never has access to the encryption keys or plaintext message content.
Key Concepts
- Identity Key Pair: Long-term Curve25519 key pair generated on device install. Public key registered with the server; private key never leaves the device.
- Signed Prekey: A medium-term Curve25519 key pair signed by the identity key. Rotated every few weeks.
- One-Time Prekeys (OTKs): A batch of single-use Curve25519 key pairs uploaded to the server. Each new conversation consumes one OTK to prevent replay attacks.
- X3DH (Extended Triple Diffie-Hellman): The initial key agreement protocol used when Alice sends a message to Bob for the first time. Combines four DH operations using Alice's identity key, Bob's identity key, Bob's signed prekey, and Bob's OTK to derive a shared secret without Bob being online.
- Double Ratchet Algorithm: After X3DH, every message uses a new encryption key derived from the double ratchet — a combination of a symmetric-key ratchet and a Diffie-Hellman ratchet. This provides forward secrecy (past messages remain secure if current keys are compromised) and break-in recovery (future messages are safe even after a temporary key compromise).
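The symmetric-key half of the double ratchet is simple enough to sketch: each step derives a one-off message key and advances the chain key, so a leaked message key cannot recover past or future keys. The `0x01`/`0x02` constants follow the convention in the public Double Ratchet specification; everything else here is a simplified illustration, not the full protocol:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Symmetric-key ratchet step: one fresh message key per message,
// and a new chain key that replaces the old one.
public class SymmetricRatchet {
    static byte[] hmacSha256(byte[] key, byte[] data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data);
    }

    // Returns {messageKey, nextChainKey}
    static byte[][] step(byte[] chainKey) throws Exception {
        byte[] messageKey = hmacSha256(chainKey, new byte[] {0x01});
        byte[] nextChainKey = hmacSha256(chainKey, new byte[] {0x02});
        return new byte[][] {messageKey, nextChainKey};
    }
}
```

The Diffie-Hellman half of the ratchet (which provides break-in recovery) periodically mixes fresh DH output into the chain key; that part is omitted here.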
What the Server Stores vs What It Doesn't
✅ Server Stores
- Public identity keys
- Public signed prekeys
- Public one-time prekeys
- Encrypted ciphertext blobs
- Message metadata (sender, recipient, timestamp, size)
❌ Server Never Stores
- Private keys (any type)
- Session keys / ratchet state
- Plaintext message content
- Media decryption keys
- Message content after delivery
Media Encryption
Media files are encrypted on-device before upload with a randomly generated AES-256-CBC key and an HMAC-SHA-256 integrity key. Only the encrypted blob is uploaded to the media CDN. The decryption key is included in the message payload — itself encrypted by the Signal session — so only the recipient can decrypt the media.
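A minimal sketch of that encrypt-then-MAC construction. Note the real client derives the cipher key, MAC key, and IV from a single master key via HKDF; here separate random keys are passed in for simplicity, and class and field names are ours:

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

// Client-side media encryption: AES-256-CBC for confidentiality,
// HMAC-SHA256 over IV + ciphertext for integrity.
public class MediaEncryptor {
    record EncryptedMedia(byte[] iv, byte[] ciphertext, byte[] mac) {}

    static EncryptedMedia encrypt(byte[] plaintext, byte[] aesKey, byte[] macKey) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(aesKey, "AES"), new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(macKey, "HmacSHA256"));
        hmac.update(iv);
        byte[] mac = hmac.doFinal(ciphertext); // MAC covers IV + ciphertext
        return new EncryptedMedia(iv, ciphertext, mac);
    }

    static byte[] decrypt(EncryptedMedia media, byte[] aesKey) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(aesKey, "AES"), new IvParameterSpec(media.iv()));
        return cipher.doFinal(media.ciphertext());
    }
}
```

Only the `EncryptedMedia` blob is uploaded; the keys travel inside the Signal-encrypted message payload.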
7. Group Messaging Fan-Out
Group messaging is significantly more complex than 1-to-1 messaging because a single message must be delivered to potentially hundreds or thousands of recipients. The two competing strategies are fan-out on write and fan-out on read.
Fan-Out on Write vs Fan-Out on Read
| Dimension | Fan-Out on Write | Fan-Out on Read |
|---|---|---|
| Mechanism | Write one copy per recipient's mailbox at send time | Store one copy; each recipient fetches it at read time |
| Write amplification | High (N writes for N members) | Low (1 write per message) |
| Read latency | Very low (message already in mailbox) | Higher (must query shared timeline) |
| Suitable for | Small groups (<256 members); active users | Large broadcast channels (>1,000 members) |
| Individual read receipt | Easy — per-copy status field | Harder — needs separate read-tracking table |
| WhatsApp strategy | ✅ Used for groups <256 | ✅ Used for channels & large groups |
Group Encryption Challenge
End-to-end encrypting a group message is more complex than a 1-to-1 message because there is no single shared secret. WhatsApp uses the Sender Key mechanism from the Signal Protocol:
- Each group member generates a random Sender Key for a specific group.
- The sender encrypts their Sender Key individually to each group member (using their 1-to-1 Signal session) — this is the only per-member work on the first message.
- Subsequent group messages are encrypted once with the Sender Key. All members can decrypt it since they received the Sender Key in step 2.
- If a member leaves or is added, all Sender Keys are rotated.
This means group messages only require O(N) work once per key distribution, not O(N) per message — a major efficiency improvement over naive group encryption.
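The efficiency gain is easy to quantify. For a group of n members sending m messages, naive pairwise encryption costs m × (n − 1) encryptions, while Sender Key costs (n − 1) key distributions once plus one encryption per message:

```java
// Encryption-operation counts for a group of `members` sending `messages`
// messages (illustrative arithmetic only).
public class GroupFanOutCost {
    // Naive: every message is encrypted once per recipient
    static long naiveOps(int members, int messages) {
        return (long) messages * (members - 1);
    }

    // Sender Key: distribute the key once per member, then one encrypt per message
    static long senderKeyOps(int members, int messages) {
        return (members - 1) + (long) messages;
    }
}
```

For a 256-member group and 100 messages, that is 25,500 operations naively versus 355 with Sender Keys.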
8. Media Storage & CDN
WhatsApp handles approximately 100 million media files per day. Storing and serving these efficiently requires a dedicated media pipeline separate from the message pipeline.
Upload Flow
- Hash before upload: Client computes `SHA-256` of the encrypted media blob and sends the hash to the Media Service to check for an existing copy (deduplication).
- Pre-signed URL: If no duplicate exists, the Media Service returns a pre-signed upload URL pointing to object storage (S3-compatible). The client uploads directly to object storage — no chat server involvement.
- Confirm upload: Client notifies the Media Service that the upload is complete. The Media Service triggers post-processing (thumbnail generation, virus scan, CDN distribution).
- Media URL in message: Sender includes the CDN URL + decryption key in the encrypted message payload sent via WebSocket.
Content Deduplication
If 10,000 people forward the same viral video, it only needs to be stored once. The content-addressed storage key is sha256(encrypted_blob). When a new upload's hash matches an existing blob, the Media Service returns the existing CDN URL immediately — no re-upload needed. This simple deduplication reportedly reduces WhatsApp's storage costs by 30–40% for heavily forwarded media.
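The content-addressed key is a one-liner. A sketch (class name ours; note this dedup only works because a forwarded file reuses the original encrypted blob and key rather than being re-encrypted, which would produce a different ciphertext and hash):

```java
import java.security.MessageDigest;
import java.util.HexFormat;

// Content-addressed storage key: identical encrypted blobs map to the same
// SHA-256 key, so a duplicate can be detected before any bytes are uploaded.
public class MediaDedup {
    static String storageKey(byte[] encryptedBlob) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(encryptedBlob);
        return HexFormat.of().formatHex(digest); // 64 hex chars
    }
}
```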
CDN & Expiry Strategy
- Media files are served from a geo-distributed CDN (Points of Presence close to users) to minimize download latency.
- CDN URLs include a time-limited signature (TTL: 30 days). After expiry, recipients must request a refresh URL — the encrypted blob remains in object storage for the duration of the chat backup period.
- Original quality and compressed thumbnail versions are stored separately. The thumbnail is included in the message metadata for instant preview; the full-quality file is downloaded on-demand.
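A time-limited signed URL can be sketched as an HMAC over the path plus expiry timestamp; the CDN edge rejects anything expired or tampered with. Path and parameter names here are illustrative, not WhatsApp's real URL scheme:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.HexFormat;

// Time-limited signed media URL: the signature binds the path and the
// expiry time, so neither can be altered without the shared secret.
public class SignedUrl {
    static String sign(String path, long expiresAtEpochSec, byte[] secret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] sig = mac.doFinal((path + "|" + expiresAtEpochSec).getBytes());
        return path + "?expires=" + expiresAtEpochSec + "&sig=" + HexFormat.of().formatHex(sig);
    }

    static boolean verify(String path, long expiresAtEpochSec, String sig,
                          byte[] secret, long nowEpochSec) throws Exception {
        if (nowEpochSec > expiresAtEpochSec) return false; // expired
        String expected = sign(path, expiresAtEpochSec, secret);
        return expected.endsWith("&sig=" + sig); // use constant-time compare in real code
    }
}
```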
9. Database Design
Database selection is critical for a messaging system. The access patterns are highly predictable: reads and writes are almost always scoped to a single conversation, and the data is time-series in nature — making Apache Cassandra an excellent fit.
Why Cassandra for Messages?
- Write-optimized: Cassandra's LSM-tree storage engine handles extremely high write throughput with predictable, low-latency writes — critical for 1.2M messages/second.
- Partition by conversation: All messages in a conversation land in the same partition, enabling efficient range scans by sequence number or timestamp.
- Linear horizontal scale: Add nodes to increase capacity without schema changes or data migration complexity.
- Tunable consistency: Use `LOCAL_QUORUM` for writes (durability) and `LOCAL_ONE` for reads (speed) within a datacenter.
- Time-to-live (TTL): Cassandra's built-in TTL feature automatically expires undelivered messages after a configurable period (e.g., 30 days) without separate cleanup jobs.
Cassandra Schema
-- Messages table: partition by conversation, cluster by time (descending for latest-first queries)
CREATE TABLE messages (
conversation_id UUID,
message_seq BIGINT, -- monotonically increasing per conversation (Snowflake ID)
message_id UUID,
sender_id UUID,
ciphertext BLOB, -- encrypted payload (server never decrypts)
message_type TINYINT, -- 0=text, 1=image, 2=video, 3=audio, 4=doc
media_url TEXT, -- null for text messages
status TINYINT, -- 0=sent, 1=delivered, 2=read
created_at TIMESTAMP,
PRIMARY KEY ((conversation_id), message_seq)
) WITH CLUSTERING ORDER BY (message_seq DESC)
AND default_time_to_live = 2592000; -- 30-day TTL; deleted after delivery in practice
-- User inbox: tracks per-user last-seen seq for offline replay
CREATE TABLE user_inbox (
user_id UUID,
conversation_id UUID,
last_read_seq BIGINT,
last_seen_seq BIGINT,
unread_count INT,
updated_at TIMESTAMP,
PRIMARY KEY ((user_id), conversation_id)
);
-- Conversations table (metadata)
CREATE TABLE conversations (
conversation_id UUID PRIMARY KEY,
conversation_type TINYINT, -- 0=direct, 1=group, 2=channel
participant_ids SET<UUID>,
group_name TEXT,
created_at TIMESTAMP,
last_message_seq BIGINT
);
User Service — Relational DB (PostgreSQL)
User profiles, phone number to user_id mapping, and contact discovery use PostgreSQL (or a consistent relational store) because these require strong consistency and are read-heavy relative to writes. The user service is accessed far less frequently than the message store and doesn't need Cassandra's write throughput.
Redis for Presence & Session Routing
The Presence Service uses Redis for ultra-low-latency lookups:
-- Key: presence:{user_id}
-- Value: { "status": "online", "chat_server": "chat-server-7", "last_seen": 1712856321 }
-- TTL: 60 seconds (auto-expires if client disconnects without sending DISCONNECT frame)
-- Key: sessions:{chat_server_id}
-- Value: SET of user_ids connected to this server (used for server failover)
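The TTL semantics above can be mimicked in-memory to make the behaviour concrete: an entry counts as online only if its last heartbeat falls within the TTL window. (Redis expires keys itself; this sketch checks on read. Names are ours.)

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory sketch of presence lookups with a 60-second TTL, mimicking
// the Redis key scheme above.
public class PresenceTable {
    static final long TTL_MS = 60_000;

    record Entry(String chatServer, long lastHeartbeatMs) {}
    private final Map<String, Entry> table = new ConcurrentHashMap<>();

    void heartbeat(String userId, String chatServer, long nowMs) {
        table.put(userId, new Entry(chatServer, nowMs)); // refresh on every ping
    }

    boolean isOnline(String userId, long nowMs) {
        Entry e = table.get(userId);
        return e != null && nowMs - e.lastHeartbeatMs() < TTL_MS;
    }
}
```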
10. Scalability & Fault Tolerance
Chat Server Failure Recovery
Chat servers are stateful (they hold open WebSocket connections), which makes failure recovery non-trivial. The strategy:
- Heartbeat-based failure detection: Each client sends a ping every 30 seconds. If the chat server doesn't respond within 60 seconds, the client triggers reconnection.
- Redis session tracking: A health-check service monitors each chat server. When a server fails, it bulk-deletes all presence entries for that server. Clients detect absence of pong and reconnect to a different server.
- No in-flight message loss: Messages are durable in Kafka and Cassandra before the WebSocket ACK is sent. If the client doesn't receive an ACK and reconnects, it replays from the last acknowledged sequence number.
- Zero-downtime rolling restarts: Before a chat server is shut down (e.g., for deployment), it sends a `RECONNECT` frame to all connected clients with a list of alternative servers, triggering a graceful migration.
Message Ordering
Within a conversation, messages must be ordered. WhatsApp uses a per-conversation monotonic sequence number generated by a distributed ID service (similar to Twitter's Snowflake). The sequence number encodes a timestamp, data-center ID, and machine ID, making it globally unique and time-sortable without coordination across servers.
- Sequence numbers are assigned by the Message Processing Service (single writer per partition) — not the sender — to prevent gaps from network re-ordering.
- Clients display messages in sequence-number order, not arrival order, to handle cases where two messages from the same device arrive out of order at the server.
- Clock skew between client devices is normalized by server-assigned timestamps.
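A Snowflake-style ID packs those components into a single 64-bit long with bit shifts. The bit widths below follow Twitter's original layout (41-bit timestamp, 5-bit datacenter, 5-bit machine, 12-bit sequence); WhatsApp's exact split and epoch are not public, so treat both as assumptions:

```java
// Snowflake-style 64-bit ID: time-sortable without cross-server coordination.
public class SnowflakeId {
    static final long CUSTOM_EPOCH_MS = 1_700_000_000_000L; // assumed custom epoch

    static long compose(long timestampMs, long datacenterId, long machineId, long sequence) {
        return ((timestampMs - CUSTOM_EPOCH_MS) << 22) // 41 bits of milliseconds
                | (datacenterId << 17)                 // 5 bits
                | (machineId << 12)                    // 5 bits
                | sequence;                            // 12-bit per-ms counter
    }

    static long timestampMs(long id)  { return (id >>> 22) + CUSTOM_EPOCH_MS; }
    static long datacenterId(long id) { return (id >>> 17) & 0x1F; }
    static long machineId(long id)    { return (id >>> 12) & 0x1F; }
    static long sequence(long id)     { return id & 0xFFF; }
}
```

Because the timestamp occupies the high bits, sorting IDs numerically sorts them by creation time.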
Multi-Region Architecture
WhatsApp operates in multiple geographic regions (Americas, EMEA, APAC) with the following design:
- Users are homed to a region based on phone number prefix. All messages to/from that user are processed in their home region.
- Cross-region messages (Alice in the US, Bob in Germany) are routed through a global backbone. Alice's chat server writes to Kafka in the US region; a cross-region Kafka MirrorMaker replicates to the EU region where Bob's chat server delivers it.
- Cassandra replication uses `NetworkTopologyStrategy` with a replication factor of 3 in each datacenter and 2 in a backup datacenter. `LOCAL_QUORUM` writes require 2 of 3 replicas in the local datacenter for durability.
11. Common Interview Mistakes & What Interviewers Look For
Common Mistakes
- Using a single SQL database for messages without discussing partitioning strategy
- Not addressing the offline message delivery problem — treating all users as always-online
- Designing the group fan-out as a single synchronous call to N chat servers
- Ignoring message ordering — assuming arrival order equals send order
- Conflating end-to-end encryption with transport-layer TLS
- Forgetting multi-device support — designing for single-device users only
- Not separating the media pipeline from the message pipeline
- Using HTTP polling instead of WebSocket without justification
What Strong Candidates Do
- Drive clear requirements before drawing any architecture
- Identify the hardest problems early: fan-out at scale, offline delivery, E2E encryption
- Explain the ACK pipeline in detail — sent/delivered/read state transitions
- Justify database choices with specific access pattern reasoning
- Discuss failure modes: what happens when a chat server dies?
- Distinguish between fan-out-on-write and fan-out-on-read with thresholds
- Address the group encryption key distribution problem proactively
- Think about storage costs and propose TTL / deduplication strategies
Bonus: Interviewers Love These Deep Dives
- Typing indicator: Sent via a separate lightweight WebSocket event (not through Kafka/DB). Ephemeral — server doesn't persist it. Debounced client-side to avoid flooding the server on every keystroke.
- Last Seen: Updated in Redis with a TTL. If a user disables "last seen", the server stops broadcasting it but still updates internally for the user's own devices.
- Message search: Not natively supported server-side for privacy reasons. Full-text search is done client-side on the locally decrypted message database — reinforcing the E2E encryption model.
- Backup: Optional, encrypted backups to Google Drive / iCloud use a different key path — a password-derived key managed by the user, not WhatsApp.
Frequently Asked Questions
Q: Why does WhatsApp use Cassandra and not a traditional relational database like MySQL?
A: MySQL (and relational databases generally) are excellent for complex queries, joins, and strong ACID transactions — but WhatsApp's message access pattern doesn't need any of that. Every message read is a simple range scan within a single conversation partition. What WhatsApp needs is extremely high write throughput (1.2M messages/second), horizontal scalability without schema-change friction, and built-in TTL for message expiry. Cassandra's write-optimized LSM-tree storage and masterless peer-to-peer architecture deliver all three. Trying to achieve WhatsApp's write scale on a single-primary MySQL cluster would require enormous engineering effort to shard, replicate, and manage.
Q: If messages are end-to-end encrypted, how does WhatsApp detect spam and abuse?
A: This is a genuine tension in the design. WhatsApp can inspect metadata — sender, recipient, frequency, group membership, phone number reputation — without decrypting content. Unencrypted surfaces such as profile photos and group names and images can be scanned server-side against known CSAM hashes (PhotoDNA). Users can also report messages, which sends the last few messages (decrypted by the reporting user's device) to WhatsApp for review — a deliberate opt-in disclosure. WhatsApp has stated that forwarding limits (capping how many chats a message can be forwarded to) are applied based on forward-count metadata without reading content.
Q: How does WhatsApp handle the "thundering herd" problem when millions of users reconnect after an outage?
A: Reconnection is controlled with exponential backoff with jitter on the client side. When clients detect disconnection, they wait a random delay (e.g., base 2s ± jitter) before reconnecting, doubling the base delay on each retry up to a cap. This spreads the reconnection wave over minutes instead of seconds, preventing a synchronized storm from overwhelming the load balancer and chat servers simultaneously. The load balancer itself distributes reconnecting clients using consistent hashing to avoid hot-spotting individual chat servers.
Q: How does WhatsApp's multi-device feature work with end-to-end encryption?
A: Each linked device has its own identity key pair. When Alice links a new device (Device B), a key agreement happens between Device A (the primary) and Device B using the same X3DH protocol used for regular contacts. The primary device sends all conversation history to the new device encrypted to Device B's identity key. Subsequently, every outgoing message is encrypted separately for each of Alice's linked devices and each of Bob's linked devices — the sender's device knows all the public keys for all linked devices of all participants. This is why adding the 4th linked device to a large group chat requires distributing Sender Keys to every device of every member.
Q: What is the estimated storage cost for 100 billion messages per day?
A: A typical text message payload (ciphertext + metadata) averages ~1 KB. 100B × 1KB = ~100 TB/day of raw message data. With replication factor 3 in Cassandra, that's ~300 TB/day of raw storage. Over 30 days (before TTL expiry for delivered messages — in practice WhatsApp deletes server copies on delivery), the steady-state is dominated by undelivered messages, which is far smaller. For media: 100M files × average 500 KB = ~50 TB/day; with deduplication reducing that 30–40%, effective media ingestion is ~30 TB/day. CDN caching absorbs most of the read traffic so origin bandwidth is far lower than the total media size implies.
12. Conclusion & Checklist
Designing a messaging system at WhatsApp's scale requires getting multiple hard problems right simultaneously: real-time delivery, durable offline storage, end-to-end encryption, efficient group fan-out, and massive media throughput. No single technology or pattern solves all of them — the architecture is a careful composition of WebSocket connection management, event-driven message processing, time-series database design, and client-side cryptography.
Design Review Checklist
- ☐ Persistent WebSocket connections — not HTTP polling — for real-time bidirectional messaging
- ☐ Three-state ACK pipeline: SENT (server ack) → DELIVERED (recipient device ack) → READ (opened)
- ☐ Offline message queue in Cassandra with per-user partition; replay on reconnect
- ☐ Signal Protocol (X3DH + Double Ratchet) for E2E encryption; keys never leave devices
- ☐ Group Sender Key for efficient group encryption (O(N) key distribution, O(1) per message encrypt)
- ☐ Fan-out-on-write for groups <256 members; fan-out-on-read for channels/large groups
- ☐ Media uploaded directly to object store via pre-signed URLs; SHA-256 deduplication
- ☐ CDN for media serving; time-limited signed URLs for access control
- ☐ Cassandra with `(conversation_id, message_seq)` primary key; TTL for auto-expiry
- ☐ Redis for presence, session routing, and typing indicators
- ☐ Kafka for durable message ingestion; decouples chat servers from message processing
- ☐ Snowflake-style distributed IDs for monotonic per-conversation message ordering
- ☐ Chat server failure recovery via client reconnect + Redis cleanup
- ☐ Push notifications (APNs/FCM) for offline client wake-up
In a system design interview, you don't need to cover all of this. Pick the two or three components the interviewer cares most about and go deep. The ACK pipeline, the offline queue, and the group fan-out decision are the three areas where most candidates are weakest — and therefore the three areas where a strong answer earns the most signal.