
Designing a Gmail-Scale Email System: SMTP, Storage, Search & Delivery Architecture

Email is one of the most critical and deceptively complex distributed systems ever built. Gmail processes over 10 billion emails per day across 1.8 billion active users. This guide walks through every engineering layer — from SMTP ingestion and Bigtable-style message storage to full-text search indexing, ML-based spam filtering, IMAP/IDLE push delivery, and globally-replicated active-active architecture — giving you a production-grade blueprint for system design interviews and real-world platform builds.

Md Sanwar Hossain · April 7, 2026 · 22 min read · Email Architecture

TL;DR — The Architecture in One Paragraph

"A Gmail-scale email system routes inbound SMTP through MX gateways that enforce SPF/DKIM/DMARC, stores messages in a sharded Bigtable-style store with separate blob storage for attachments, indexes content in a real-time inverted index for sub-second search, applies ML-based spam scoring at ingestion time, delivers to clients via IMAP/IDLE + WebSocket push, and replicates across regions in an active-active topology with eventual consistency for mailbox state."

Table of Contents

  1. Functional & Non-Functional Requirements
  2. High-Level Architecture Overview
  3. SMTP Gateway & MX Routing
  4. Message Storage Design
  5. Full-Text Search & Indexing
  6. Spam & Phishing Filtering
  7. Outbound Delivery & MTA
  8. Push Notifications & IMAP
  9. Global Replication & Availability
  10. Capacity Estimation & Back-of-Envelope Math
  11. Security & Compliance
  12. System Design Interview Checklist

1. Functional & Non-Functional Requirements

Before diving into architecture, nail down the requirements. Email is one of the most feature-dense platforms in existence — scope carefully to avoid analysis paralysis in interviews and misaligned delivery in production.

Functional Requirements

  • Send and receive email (inbound and outbound SMTP), with attachments up to 25 MB
  • Organize mail: labels, conversation threading, read/unread and starred state
  • Full-text search across the entire mailbox history
  • Spam and phishing filtering with a user-visible Spam folder and override rules
  • Real-time new-mail delivery to web, mobile, and IMAP clients
  • Drafts, user-defined filters/rules, and multi-device synchronization

Non-Functional Requirements

Dimension        Target                           Notes
Scale            1.8B users, 10B emails/day       ~116,000 emails/sec average; peak 2–3× higher
Availability     99.99% (52 min/year downtime)    Multi-region active-active
Durability       11 nines (99.999999999%)         Triple replication minimum
Latency (send)   < 2 seconds end-to-end           From compose-click to inbox
Search latency   < 300 ms p99                     Over billions of messages
Storage          15 GB free per user              ~27 EB quota ceiling across all users
Spam accuracy    > 99.9% precision                < 0.1% false-positive rate

2. High-Level Architecture Overview

A Gmail-scale email system decomposes into seven distinct planes. Each plane is independently scalable and deployed as its own service cluster. Understanding the data flow through these planes is the foundation of any system design answer.

Data Flow: Inbound Email

  1. DNS / MX lookup → sender's MTA resolves recipient's MX records → connects to our SMTP Gateway
  2. SMTP Gateway → validates connection (IP reputation, rate limiting), runs SPF/DKIM/DMARC checks, accepts or rejects at SMTP level
  3. Spam & AV Scanner → scores the message for spam probability, scans attachments for malware, extracts phishing signals
  4. Routing Service → applies user-defined filters and rules, determines destination mailbox and labels
  5. Storage Writer → writes message metadata to the mailbox index shard, stores raw message body in blob store
  6. Search Indexer → asynchronously tokenizes subject/body/headers, writes to the per-user inverted index
  7. Notification Dispatcher → pushes new-mail events to IMAP/IDLE connections, WebSocket sessions, and FCM/APNs for mobile
Gmail-scale email system architecture — inbound SMTP flow through spam filtering, sharded storage, search indexing, and push delivery. Source: mdsanwarhossain.me

Core Service Boundaries

Each service is isolated behind an internal RPC interface (gRPC) and communicates asynchronously via a distributed message queue (Apache Kafka) for non-latency-critical paths. Synchronous paths include spam scoring (inline, pre-delivery) and storage writes (must complete before SMTP 250 OK is returned to the sender). Asynchronous paths include search indexing, notification dispatch, and delivery status webhook callbacks.

3. SMTP Gateway & MX Routing

The SMTP gateway is the first line of defense and the entry point for all inbound email. It must handle enormous connection concurrency (millions of simultaneous SMTP sessions from external mail servers worldwide), enforce authentication standards, and make accept/reject decisions in milliseconds — because rejected spam at SMTP level is cheaper than accepting it and filtering downstream.

MX Record Architecture

Multiple MX records with different priority values provide load distribution and failover. A typical production setup publishes MX records at priority 5 and 10 pointing to anycast IP ranges backed by multiple physical SMTP gateway pools. Senders use the lowest-priority MX first (5), failing over to priority 10 only when the primary is unreachable. Within each priority group, DNS-level load balancing via round-robin or GeoDNS routes to the nearest gateway cluster.
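The priority-based failover above can be sketched as follows. `MxRecord` is a hypothetical type for illustration; a real sending MTA obtains this list from a DNS MX lookup rather than constructing it in code:

```java
import java.util.Comparator;
import java.util.List;

// Sender-side MX selection sketch: the lowest preference value is tried
// first; higher values are fallbacks used only when the primary fails.
public class MxSelector {
    public record MxRecord(int preference, String host) {}

    // Returns gateway hosts in the order a sending MTA should attempt them.
    public static List<String> deliveryOrder(List<MxRecord> records) {
        return records.stream()
                .sorted(Comparator.comparingInt(MxRecord::preference))
                .map(MxRecord::host)
                .toList();
    }
}
```

Within one preference level a real resolver also shuffles hosts for load spreading, which this sketch omits.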

Sender Authentication: SPF, DKIM, DMARC

The gateway validates three sender authentication mechanisms during the SMTP transaction. Failure modes must be handled carefully — misclassifying legitimate email as fraudulent creates false positives that erode user trust.

Connection-Level Rate Limiting & IP Reputation

Before accepting even the SMTP banner exchange, the gateway applies connection-level throttling:

# SMTP Gateway rate-limit policy (pseudoconfig)
connection_limits:
  per_ip_max_concurrent: 20          # max simultaneous connections per IP
  per_ip_rate_limit: 100/minute      # new connections per minute per IP
  unknown_ip_greylisting: true       # defer unknown IPs with 451 for 5 min
  ip_reputation_threshold: 0.4      # block IPs with reputation score < 0.4
  
spamhaus_dnsbl_lookup: true          # check sending IP against Spamhaus ZEN
surbl_url_check: true                # check embedded URLs against SURBL
tarpitting:
  enabled: true
  delay_ms_per_rcpt_unknown: 5000   # slow down dictionary attacks
  
greylisting:
  first_seen_defer_seconds: 300
  allowlist_after_first_success: true

IP reputation is maintained in a globally replicated distributed cache (Redis Cluster with cross-region replication). Each successfully delivered message from an IP increases its reputation score; each spam complaint, bounce, or authentication failure decreases it. Reputation scores decay over 30 days without updates, allowing reformed senders to rebuild trust gradually.
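A minimal sketch of such a score, assuming exponential decay toward a neutral 0.5 over the 30-day window (the update weights here are illustrative, not Gmail's actual values):

```java
// Illustrative IP reputation model: good deliveries nudge the score up,
// spam complaints/bounces push it down harder, and without updates the
// score decays toward neutral over ~30 days. All constants are made up.
public class IpReputation {
    static final double NEUTRAL = 0.5;
    static final double DECAY_DAYS = 30.0;

    // Decay the stored score toward neutral based on days since last update.
    public static double decayed(double score, double daysSinceUpdate) {
        double factor = Math.exp(-daysSinceUpdate / DECAY_DAYS);
        return NEUTRAL + (score - NEUTRAL) * factor;
    }

    // Apply a delivery outcome, clamped to [0, 1]. Penalties outweigh
    // rewards so a burst of spam cannot be offset by a burst of volume.
    public static double update(double score, boolean goodOutcome) {
        double delta = goodOutcome ? 0.01 : -0.05;
        return Math.max(0.0, Math.min(1.0, score + delta));
    }
}
```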

4. Message Storage Design

Email storage is a classic write-heavy, read-mostly workload with complex access patterns. Users read recent emails frequently, search across all historical email occasionally, and almost never access emails older than 6 months. This access pattern demands a tiered storage architecture that separates hot metadata from cold raw content.

Mailbox Metadata Store — Bigtable-Style Sharding

Message metadata (subject, sender, timestamp, size, labels, thread ID, read/unread status, blob reference) is stored in a wide-column store modeled after Google Bigtable. The row key is constructed as user_id + reverse_timestamp, which sorts messages chronologically in reverse order within each user's key range — this means mailbox listing queries (most common operation) read a contiguous sequence of rows from the user's shard without scatter-gather across multiple nodes.

// Row key design: user_id (8 bytes) + inverted_epoch_ms (8 bytes),
// both encoded fixed-width big-endian so keys sort lexicographically.
// Row key for user abc123, message at 2026-04-07T10:00:00Z (epoch 1775556000000 ms):
// "abc123" + (Long.MAX_VALUE - 1775556000000) → "abc123" + 9223370261298775807

// Column families:
// meta: {from, to, subject, size_bytes, mime_type}
// flags: {read, starred, archived, deleted}
// labels: {label_id_1: true, label_id_2: true, ...}
// refs:   {blob_key: "sha256:abc...", thread_id: "t_xyz..."}
// spam:   {score: 0.03, classifier_version: "v47"}

// Sharding: users are range-partitioned across tablet servers
// Each tablet covers ~10GB of data before splitting
// Hot users (high-volume inboxes) get dedicated tablet servers

The metadata store uses Paxos-based replication (similar to Spanner) within each region for strong consistency within a user's shard. All write operations for a given user are routed to their primary shard leader, which replicates synchronously to two followers before acknowledging the write. Cross-region replication is asynchronous, enabling reads from nearby replicas without cross-region latency.
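Under these assumptions, constructing the 16-byte row key might look like the sketch below. Fixed-width big-endian encoding matters: variable-length decimal strings would not sort correctly byte-wise, which is why the key is built from raw longs rather than concatenated text:

```java
import java.nio.ByteBuffer;

// Sketch of the row key layout described above: an 8-byte user id followed
// by an 8-byte inverted timestamp, so that (a) all of one user's rows are
// contiguous and (b) newer messages sort before older ones for that user.
public class RowKey {
    public static byte[] build(long userId, long epochMs) {
        return ByteBuffer.allocate(16)
                .putLong(userId)
                .putLong(Long.MAX_VALUE - epochMs) // invert: newest first
                .array();
    }

    // Unsigned lexicographic byte comparison, as a Bigtable-style store
    // would order its keys on disk.
    public static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }
}
```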

Blob Store for Message Bodies & Attachments

Raw MIME message bodies and attachments are stored separately in an immutable, content-addressed blob store — analogous to Google's Colossus distributed file system or Amazon S3. Content addressing means the blob key is the SHA-256 hash of the raw bytes, providing automatic deduplication: if two users receive the same mass-mailing, only one copy of the attachment blob is stored on disk.
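A minimal sketch of such a content-addressed key (the `sha256:` prefix mirrors the refs column family shown earlier):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Content-addressed blob keys: the key is derived purely from the bytes,
// so identical attachments hash to identical keys and are stored once.
public class BlobKey {
    public static String of(byte[] content) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            return "sha256:" + HexFormat.of().formatHex(sha256.digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always present
        }
    }
}
```

Deduplication then falls out for free: the storage writer checks whether the key already exists before uploading, and multiple mailbox rows simply reference the same blob key with a per-blob reference count for garbage collection.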

Thread Grouping

Grouping messages into conversation threads requires matching on the RFC 2822 Message-ID, In-Reply-To, and References headers. The threading service maintains a per-user thread graph stored as a separate column family in the metadata store. When a new message arrives, the service checks its In-Reply-To header against the thread index; if matched, the message is added to the existing thread. If no match is found and the subject line (normalized by stripping Re:/Fwd: prefixes) matches a recent thread, a heuristic grouping is applied. Thread IDs are stable 64-bit identifiers that clients use to fetch all messages in a conversation with a single query.
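The subject-normalization step of that heuristic can be sketched as below (the prefix set and regex are illustrative; production systems handle many localized reply prefixes):

```java
import java.util.regex.Pattern;

// Heuristic subject normalization for thread fallback matching: strip any
// leading run of "Re:" / "Fwd:" / "Fw:" prefixes (case-insensitive) and
// collapse whitespace, so "RE: Re: Budget" and "Budget" match one thread.
public class SubjectNormalizer {
    private static final Pattern PREFIX =
            Pattern.compile("^(?:(?:re|fwd?|aw)\\s*:\\s*)+", Pattern.CASE_INSENSITIVE);

    public static String normalize(String subject) {
        String s = PREFIX.matcher(subject.trim()).replaceFirst("");
        return s.replaceAll("\\s+", " ").trim().toLowerCase();
    }
}
```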

5. Full-Text Search & Indexing

Full-text search over a personal mailbox of potentially millions of messages must return results in under 300 ms at p99. This is a hard engineering problem: the corpus is enormous, updates are real-time (new mail must be searchable within seconds of delivery), and queries are unpredictable (wildcard, phrase, field-scoped, date-ranged). Gmail's search system is one of the most complex parts of the platform.

Inverted Index Architecture

The search subsystem maintains a per-user inverted index — a mapping from each term (word) to the list of message IDs containing that term. The index is sharded by user ID, with each user's index living on a dedicated index shard that can be up to a few gigabytes for heavy users with decades of email history.

// Inverted index structure (Lucene-style segment format)
// Term dictionary (sorted, prefix-compressed):
//   "amazon"       → [msg_001, msg_047, msg_293, msg_1102, ...]
//   "invoice"      → [msg_001, msg_112, msg_293, ...]
//   "order"        → [msg_001, msg_039, msg_112, ...]

// Posting list entry:
// {
//   msg_id: uint64,
//   field_mask: uint8,   // bitmask: 0x01=subject, 0x02=body, 0x04=from
//   term_freq: uint16,   // how many times term appears in message
//   positions: []uint16  // term positions (phrase queries, snippet highlighting)
// }

// Per-field boosting at query time:
// subject_match_weight: 5.0
// from_match_weight:    3.0
// body_match_weight:    1.0
// attachment_name:      2.0

Tokenization & Analysis Pipeline

Raw message content passes through a multi-stage analysis pipeline before indexing:

  1. MIME decoding: unwrap the multipart structure, decode base64/quoted-printable parts, normalize all charsets to UTF-8
  2. HTML stripping: extract visible text; discard markup, scripts, and style blocks
  3. Tokenization: split text into terms on word boundaries (language-aware segmentation for CJK scripts)
  4. Normalization: lowercase, Unicode folding, accent stripping
  5. Language detection & stemming: reduce terms to root forms per detected language before writing postings
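A grossly simplified version of that analysis chain, for illustration only (a production analyzer adds charset handling, language detection, and stemming):

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;

// Toy analyzer: strip HTML tags, Unicode-normalize, lowercase, then split
// on runs of non-letter/non-digit characters to produce index terms.
public class Analyzer {
    public static List<String> tokens(String html) {
        String text = html.replaceAll("<[^>]*>", " ");           // drop tags
        text = Normalizer.normalize(text, Normalizer.Form.NFKC); // fold compat chars
        text = text.toLowerCase();
        return Arrays.stream(text.split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .toList();
    }
}
```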

Real-Time Index Updates

Search indexing must be real-time: users expect new mail to be searchable immediately. The system uses a two-tier index architecture inspired by Elasticsearch's translog and Lucene's segment merging. New messages are first written to a small in-memory index (backed by a durable write-ahead log) that becomes searchable within seconds; background jobs periodically flush this tier into immutable on-disk segments, which are merged into progressively larger segments over time to bound the number of structures a query must consult. Each query fans out to the in-memory tier plus all live segments and merges the ranked results.

6. Spam & Phishing Filtering

Email spam and phishing represent an adversarial arms race. Gmail's spam system reportedly blocks over 100 million spam and phishing emails every day. The classifier must maintain above 99.9% precision (very few false positives — legitimate email going to spam) while maximizing recall (catching all spam). These competing objectives require a multi-layer defense in depth.

Layer 1: Connection & Authentication Signals

The cheapest signals to compute are those available at connection time, before even reading the message body. These include IP reputation score, Spamhaus DNSBL lookup result, SPF/DKIM/DMARC authentication outcome, sending domain age (recently registered domains are high-risk), and the ratio of this IP's historically accepted versus rejected messages. Approximately 70–80% of inbound spam connections can be rejected at this layer alone with zero content analysis required.

Layer 2: Bayesian & Rule-Based Classification

For messages that pass connection-level checks, a Naive Bayes classifier scores the message based on token frequency statistics. Bayesian spam filtering works by computing, for each token in the message, the conditional probability that the token appears in spam versus legitimate (ham) email. The individual token probabilities are combined using Bayes' theorem to produce a final message-level spam probability score.
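The combination step can be written out concretely. This is the classic Graham-style formulation in log space (to avoid floating-point underflow on long messages); the per-token probabilities are assumed to come from a trained token-frequency table, and this is a textbook sketch rather than Gmail's actual classifier:

```java
import java.util.List;

// Graham-style naive Bayes combination: given per-token spam probabilities
// p_i = P(spam | token_i), the message-level score is
//   P = prod(p_i) / (prod(p_i) + prod(1 - p_i)),
// computed via log-sums so hundreds of tokens do not underflow to zero.
public class BayesScore {
    public static double spamProbability(List<Double> tokenProbs) {
        double logSpam = 0.0, logHam = 0.0;
        for (double p : tokenProbs) {
            logSpam += Math.log(p);
            logHam  += Math.log(1.0 - p);
        }
        // Algebraically: P = 1 / (1 + exp(logHam - logSpam))
        return 1.0 / (1.0 + Math.exp(logHam - logSpam));
    }
}
```

Note how the formula amplifies agreement: three tokens each at 0.9 push the combined score well above 0.99.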

Alongside the Bayesian model, a rule engine (SpamAssassin-style) evaluates hundreds of hand-crafted rules. These rules detect patterns such as: excessive use of HTML formatting tricks (invisible text, tiny fonts used to fool word-frequency-based classifiers), URL shortener abuse, mismatches between the display name and email address in the From header, and known spam phrases. Each rule contributes a positive or negative score; the total rule score is combined with the Bayesian probability to produce an aggregate signal.

Layer 3: Neural Network Classifier

A deep neural network model — typically a transformer-based text classifier fine-tuned on billions of labeled spam/ham examples — provides the final classification for messages that score in the ambiguous range from the rule-based system. The model is retrained continuously on a stream of user feedback signals (user marks as spam, user marks as not spam, user moves out of spam folder). This adversarial retraining loop allows the classifier to adapt to new spam campaigns within hours of their first appearance.

Quarantine & User Feedback Loop

Messages classified as spam above a configurable threshold are routed to the Spam folder rather than hard-bounced. Hard bouncing spam backscatter (sending NDRs to spoofed senders) is itself a spam amplification vector. Spam folder retention is 30 days before automatic permanent deletion. User actions — marking a message as spam or moving it out of spam — generate training signals that flow back into the classifier's online learning pipeline with a few hours of lag. Users who routinely receive high volumes of a particular sender pattern (newsletters, mailing lists) can create personal spam-override rules that prevent future messages from that sender from being marked as spam regardless of classifier output.

7. Outbound Delivery & MTA

Outbound email delivery — sending messages composed by users to external email servers — is a complex, stateful process that involves queue management, retry logic, DKIM signing, bounce handling, and reputation management across thousands of destination mail servers, each with their own acceptance policies and rate limits.

MTA Architecture & Delivery Queues

The outbound MTA (Mail Transfer Agent) maintains per-destination delivery queues. Rather than a single global queue, messages are partitioned by destination domain. This design prevents a single slow or unavailable destination domain from blocking delivery to other domains — a critical isolation property at scale.

// Per-destination delivery queue state machine
enum DeliveryStatus {
    QUEUED,          // awaiting delivery worker pickup
    IN_FLIGHT,       // active SMTP connection to destination
    DELIVERED,       // 250 OK received → move to Sent folder
    TEMP_FAILED,     // 4xx response → retry with backoff
    PERM_FAILED,     // 5xx response → generate bounce NDR
    DEFERRED         // destination rate-limited us → hold and retry
}

// Exponential backoff schedule for temporary failures:
// Attempt 1: immediate
// Attempt 2: +5 minutes
// Attempt 3: +30 minutes
// Attempt 4: +2 hours
// Attempt 5: +6 hours
// Attempt 6: +24 hours
// Max retry window: 5 days (RFC 5321 suggests a give-up time of at least 4–5 days)
// After 5 days: generate bounce NDR to sender
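That retry ladder can be encoded directly as a lookup (a sketch; real MTAs typically add jitter to these delays so retries from many queues do not synchronize):

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;

// The backoff schedule above: attempt n (1-based) maps to the delay before
// that attempt. Once the table is exhausted, delivery is abandoned and a
// bounce NDR is generated for the sender.
public class RetrySchedule {
    private static final List<Duration> DELAYS = List.of(
            Duration.ZERO,            // attempt 1: immediate
            Duration.ofMinutes(5),    // attempt 2
            Duration.ofMinutes(30),   // attempt 3
            Duration.ofHours(2),      // attempt 4
            Duration.ofHours(6),      // attempt 5
            Duration.ofHours(24));    // attempt 6

    public static Optional<Duration> delayBefore(int attempt) {
        return (attempt >= 1 && attempt <= DELAYS.size())
                ? Optional.of(DELAYS.get(attempt - 1))
                : Optional.empty();   // exhausted → bounce
    }
}
```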

DKIM Signing

Every outbound message is DKIM-signed before delivery. The signing service maintains a pool of RSA-2048 or Ed25519 private keys, rotating them every 90 days. The public key is published in DNS as a TXT record at selector._domainkey.gmail.com. Key rotation is non-breaking: the old selector remains published in DNS for 30 days after rotation so that messages signed with the old key (which may still be in transit or cached) can still be verified by recipients. The DKIM signature covers the message body (SHA-256 hash) and a defined set of headers (From, To, Subject, Date, Message-ID), making it tamper-evident.
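For reference, a signature produced by this pipeline would look roughly like the following header (the selector, timestamp, hash, and signature values here are placeholders, not real values):

```
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com;
        s=20260407; t=1775556000;
        h=from:to:subject:date:message-id;
        bh=<base64 SHA-256 hash of the canonicalized body>;
        b=<base64 signature over the selected headers>
```

The `s=` tag names the selector, so a verifier fetches the public key from `20260407._domainkey.gmail.com`; `h=` lists exactly which headers the signature covers.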

Bounce Handling & Feedback Loops

Bounces (NDRs — Non-Delivery Reports) returned by destination servers must be processed to protect outbound reputation. A high bounce rate signals to destination servers that the sender is sending to invalid addresses (a common spam pattern), triggering throttling or blocking. The bounce processor parses NDR messages, classifies them as hard (permanent: unknown user, domain does not exist) or soft (transient: mailbox full, destination server temporarily unavailable), and updates the sending user's account state. Hard bounces to a specific address suppress future delivery attempts to that address from all users. Abuse feedback loops (FBL) from major ISPs (Yahoo, AOL, Outlook.com) report spam complaints back to Gmail; these complaints influence the sender's outbound reputation score and can trigger account-level sending throttling.

8. Push Notifications & IMAP

Clients — web browsers, mobile apps, desktop email clients — need to receive new email notifications in real time without polling. Polling at scale (1.8B users each polling every 30 seconds) would be catastrophically expensive. The system must support multiple push delivery protocols to serve the full ecosystem of client types.

IMAP with IDLE Extension

IMAP (Internet Message Access Protocol) is the standard protocol for email client access. The IDLE extension (RFC 2177) allows a client to place an IMAP connection into a "listening" state where the server pushes EXISTS and FETCH notifications to the client whenever the mailbox changes, without the client needing to poll. Each IDLE connection is a persistent TCP connection maintained per authenticated user session.
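On the wire, an IDLE session follows the shape defined in RFC 2177 (`C:` is the client, `S:` the server; the untagged `* 24 EXISTS` line is the server-pushed new-mail event):

```
C: a01 IDLE
S: + idling
   ...time passes; a new message is delivered to the mailbox...
S: * 24 EXISTS
S: * 1 RECENT
C: DONE
S: a01 OK IDLE terminated
```

Clients must also re-issue IDLE periodically (RFC 2177 suggests at least every 29 minutes) so that NAT mappings and server-side timeouts do not silently kill the connection.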

At Gmail scale, maintaining billions of long-lived IMAP connections requires a dedicated IMAP proxy tier. The proxy uses multiplexed, event-driven I/O (similar to Nginx's event loop model) to serve thousands of IMAP connections per process, each forwarding to a user's mailbox shard. IDLE connections are load-balanced across proxy instances; when a new message arrives for a user, the storage write path publishes a notification event to Kafka, which is consumed by the IMAP proxy that holds the user's active IDLE connection. The proxy immediately sends the EXISTS notification to the connected client.

WebSocket Push for Web Clients

The Gmail web application uses a WebSocket connection (or HTTP/2 server push, or long-polling as fallback) to receive real-time new-mail events. The web push gateway maintains a mapping of authenticated user sessions to WebSocket connections. When the notification Kafka topic receives a new-mail event for a user, the gateway fans out the notification to all open WebSocket sessions for that user (supporting multi-tab and multi-device scenarios). The notification payload contains minimal data: message ID, subject preview, sender, timestamp, and unread count — enough to update the inbox UI without a full page reload. The client then lazily fetches the full message only when the user clicks on it.

Mobile Push via FCM & APNs

For mobile clients (Android and iOS), battery life constraints mean maintaining a persistent background connection from the app is not viable. Instead, the system integrates with platform-specific push notification services: Firebase Cloud Messaging (FCM) for Android and Apple Push Notification service (APNs) for iOS.

9. Global Replication & Availability

Email is mission-critical infrastructure. A 1-hour Gmail outage makes international news. Achieving 99.99% availability requires an architecture that can survive the complete failure of an entire datacenter region without user-visible disruption. This mandates multi-region, active-active deployment with careful consistency trade-offs.

Multi-Region Active-Active Topology

User mailboxes are assigned a home region based on user location and capacity planning. Writes (new message delivery, read/unread status changes, label mutations) are directed to the home region's shard leader and synchronously replicated to a second replica within the same region before acknowledging. Cross-region replication to at least two additional geographic regions happens asynchronously, with a typical replication lag of 1–5 seconds under normal conditions.

Reads are served from the nearest region. Because cross-region replication is asynchronous, reads from non-home regions may reflect a slightly stale view (eventual consistency). In practice, for email — which is not a financial transaction system — users tolerate a few seconds of stale inbox state when reading from a geographically distant replica. Operations that require strict freshness (explicit inbox refresh, sending a reply that must reflect the latest draft state) are pinned to the home region with a read-your-writes consistency guarantee.

Regional Failover

When a home region becomes unavailable (network partition, power failure, major hardware incident), the system promotes the most up-to-date secondary replica to become the new primary using a distributed leader election protocol (Paxos or Raft). The promotion process includes a brief write-hold window (typically 10–30 seconds) during which writes are buffered in a durable WAL (write-ahead log) at the SMTP gateway layer. Once the new primary is established, the buffered writes are replayed. This brief hold is invisible to external senders (the SMTP gateway has not yet returned 250 OK) and results in slightly delayed email delivery — acceptable given the alternative of data loss.

Consistency Choices by Operation Type

Operation                Consistency Model                 Rationale
Message delivery write   Strong (within region)            No message loss after SMTP 250 OK
Read/unread status       Eventual (cross-region)           Tolerable 2–5 s lag across devices
Label mutation           Eventual (cross-region)           Low-conflict; last-write-wins acceptable
Send draft               Read-your-writes (home region)    Must read latest draft before sending
Search index update      Eventual (< 10 seconds)           Async; brief delay is acceptable

10. Capacity Estimation & Back-of-Envelope Math

Capacity estimation grounds the architecture in physical reality. In a system design interview, precise calculations demonstrate engineering maturity. Here is the complete back-of-envelope analysis for a Gmail-scale email platform.

Traffic & Throughput

  • Ingest: 10B emails/day ÷ 86,400 s ≈ 116,000 emails/sec on average; assuming a 2–3× peak-to-average ratio, provision for roughly 250,000–350,000 emails/sec at peak
  • Each inbound message fans out into several internal operations (spam scan, metadata write, blob write, index update, notification), so internal RPC volume is a multiple of the edge rate
  • Reads dominate writes: mailbox listings and message fetches run at an estimated 10:1 read-to-write ratio

Storage Calculations

  • Assume an average message size of ~100 KB (body plus amortized attachments): 10B × 100 KB ≈ 1 PB of new content per day
  • With triple replication, ~3 PB/day of raw capacity, before deduplication and compression reclaim a significant fraction
  • Metadata: ~2 KB per message × 10B/day ≈ 20 TB/day in the index store
  • Quota ceiling: 1.8B users × 15 GB = 27 EB; actual consumption is far lower, since most mailboxes use a small fraction of their quota

Network Bandwidth

  • 1 PB/day of inbound content ≈ 12 GB/s ≈ 93 Gbps sustained at the SMTP edge, with peaks in the 200–300 Gbps range
  • Asynchronous replication to two additional regions adds roughly 2× the ingest volume in inter-region egress
  • Anycast spreads this across regional gateway pools, keeping per-site load in the tens of Gbps
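The back-of-envelope figures in this section can be reproduced as plain arithmetic from the numbers stated earlier in the article. The ~100 KB average message size is an assumption for the sketch, not a measured figure:

```java
// Capacity estimation from this article's stated inputs:
// 10B emails/day, 1.8B users, 15 GB quota, ~100 KB average message size.
public class Capacity {
    static final double EMAILS_PER_DAY  = 10e9;
    static final double USERS           = 1.8e9;
    static final double AVG_MSG_BYTES   = 100e3;   // assumed average size
    static final double SECONDS_PER_DAY = 86_400;

    static double avgEmailsPerSec() { return EMAILS_PER_DAY / SECONDS_PER_DAY; }
    static double dailyIngestPB()   { return EMAILS_PER_DAY * AVG_MSG_BYTES / 1e15; }
    static double quotaCeilingEB()  { return USERS * 15e9 / 1e18; }
    // Sustained inbound bandwidth at the SMTP edge, in gigabits per second.
    static double inboundGbps()     { return dailyIngestPB() * 1e15 * 8 / SECONDS_PER_DAY / 1e9; }
}
```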

11. Security & Compliance

An email platform stores some of the most sensitive personal and business information that exists. Security must be built into every layer, from transport encryption to at-rest data protection, access control, and regulatory compliance frameworks spanning dozens of jurisdictions.

Transport Security

All SMTP connections — both inbound from external servers and outbound to external destinations — enforce STARTTLS opportunistic encryption. Connections from major email providers (Microsoft 365, Yahoo, iCloud) that support SMTP MTA-STS (RFC 8461) require TLS with certificate validation, preventing downgrade attacks. Between internal services (SMTP gateway → spam scanner → storage writer), mutual TLS (mTLS) is enforced using internally managed PKI certificates rotated every 90 days. No plaintext internal traffic is permitted.

At-Rest Encryption

All data at rest — message blobs, metadata index rows, search index segments — is encrypted using AES-256-GCM. Encryption keys are managed by an internal Key Management Service (KMS) that wraps data encryption keys (DEKs) with key encryption keys (KEKs). KEKs are stored in hardware security modules (HSMs) and rotated annually. This envelope encryption model ensures that even if raw disk images are stolen, they cannot be decrypted without access to the KMS. For enterprise Google Workspace customers, customer-managed encryption keys (CMEK) allow businesses to hold their own KEKs outside of Google's infrastructure, enabling them to revoke Google's ability to decrypt their data by revoking the KEK.

Access Control & Audit Logging

Access to mailbox data is governed by strict authorization policies. The storage service enforces that only authenticated, authorized services (spam scanner, search indexer, client API) can access message data, and only for the specific user they are authorized to serve. All data access — including internal service-to-service accesses — is logged to an immutable audit log. These audit logs are retained for 7 years for compliance with financial sector regulations and are regularly reviewed by automated anomaly detection systems to flag unusual data access patterns (e.g., bulk export of messages from millions of accounts, which could indicate an insider threat or compromised service account).

GDPR, CCPA & Data Retention

Privacy regulation imposes hard requirements on data handling. Under GDPR's right to erasure, deleting an account must purge message blobs, metadata rows, and search index entries within a bounded window (allowing for backup cycling), and users must be able to export their full mailbox for data portability. CCPA adds disclosure and deletion rights for California residents. Some enterprise customers additionally require data residency guarantees that constrain which regions may hold a tenant's mailbox shards — a constraint the replication topology in Section 9 must honor when assigning home regions.

12. System Design Interview Checklist

Use this checklist as a structured interview framework. Covering all dimensions within a 45–60 minute session requires preparation and practice. Map each checklist item to the architecture sections above for confident, detailed answers.

Phase 1: Requirements Clarification (5 min)

  • ☑ Clarify scale: number of users, daily email volume, expected read vs write ratio
  • ☑ Confirm feature scope: send, receive, search, spam filtering, push notifications
  • ☑ Clarify availability target (99.9% vs 99.99%) and durability SLA
  • ☑ Ask about global vs single-region requirement
  • ☑ Clarify attachment size limits and storage quota per user

Phase 2: Back-of-Envelope Estimation (5 min)

  • ☑ Calculate peak email throughput (emails/sec)
  • ☑ Estimate storage per day and total corpus size
  • ☑ Estimate inbound network bandwidth at SMTP gateway
  • ☑ Size the metadata store vs blob store separately
  • ☑ Estimate search index size as percentage of raw content

Phase 3: High-Level Design (15 min)

  • ☑ Draw inbound flow: external MTA → SMTP Gateway → Spam Filter → Storage → Search Index → Push
  • ☑ Draw outbound flow: Client → API → MTA Queue → DKIM Signing → External Delivery
  • ☑ Separate metadata store (structured, sharded) from blob store (immutable, content-addressed)
  • ☑ Explain the inverted index structure and real-time update path
  • ☑ Name the push delivery mechanisms: IMAP/IDLE, WebSocket, FCM/APNs

Phase 4: Deep Dives (15 min)

  • ☑ Storage sharding strategy: row key design (user_id + inverted_timestamp), range partitioning
  • ☑ Blob deduplication via content-addressed SHA-256 keys
  • ☑ Spam filtering layers: IP reputation → Bayesian → rules → neural model
  • ☑ SPF/DKIM/DMARC validation at SMTP gateway
  • ☑ Outbound delivery retry backoff and bounce handling
  • ☑ Full-text search: tokenization pipeline, inverted index, real-time segment merge
  • ☑ Multi-region replication: strong within-region, eventual cross-region
  • ☑ Failover: replica promotion, write-hold window, WAL replay

Phase 5: Edge Cases & Failure Modes (5 min)

  • ☑ What happens if spam classifier is unavailable? → Accept email, mark for async reclassification
  • ☑ What if blob store write fails mid-delivery? → SMTP gateway returns 4xx (temp failure), sender retries
  • ☑ What if a destination domain is down during outbound delivery? → 5-day retry queue with exponential backoff
  • ☑ What if search indexer falls behind? → Queue-based catch-up; messages still delivered, temporarily unsearchable
  • ☑ Large attachment (25 MB): stream directly to blob store during SMTP DATA phase, never buffer in memory
  • ☑ Mail loop detection: count Received headers to detect recursive loops; reject past a configurable hop limit (common MTA defaults range from 25 to 100 hops)


Md Sanwar Hossain

Software Engineer · Java · Spring Boot · Microservices · Distributed Systems

Last updated: April 7, 2026