Designing an Autocomplete/Typeahead System at Scale: Trie, Top-K Ranking & Distributed Architecture
Autocomplete (typeahead) appears simple but conceals deep engineering challenges at Google or Amazon scale: returning top-5 relevant completions in under 100ms for billions of queries per day across 50+ languages, while adapting to trending queries in near real-time.
TL;DR — Core Design
"Store top-K completions per prefix in a Trie (in-memory) or Redis Sorted Set (distributed). Update scores offline via Spark batch on query logs. Shard the Trie by prefix first-letter for horizontal scaling. Client-side debouncing + server-side caching reduce backend QPS by 90%."
Table of Contents
- Requirements & Scale Estimation
- Trie Data Structure — The Core Abstraction
- Top-K per Node — Efficient Completion Retrieval
- Distributed Trie — Sharding & Replication
- Redis Sorted Set Alternative
- Offline Aggregation Pipeline (Spark)
- Near Real-Time Updates for Trending Queries
- Ranking: Frequency, Personalization & Context
- API Design & Client-Side Optimization
- Global Scaling & Multi-Language Support
- Design Checklist & Conclusion
1. Requirements & Scale Estimation
Functional Requirements
- Return top-5 query completions as the user types each character
- Completions ranked by query frequency (most searched first)
- Support prefix matching: typing "sys" returns "system design", "syscall", "system monitor", etc.
- Adapt to trending queries within hours (not weeks)
- Optionally: personalized suggestions based on user history
- Support 50+ languages with different character sets
Non-Functional Requirements & Scale
| Metric | Target | Notes |
|---|---|---|
| Read latency | < 100ms P99 | User perceives as instant |
| QPS (peak) | 100,000 req/sec | Google: 99K searches/sec |
| Total unique queries | 5 billion | Top 1M queries cover 90% traffic |
| Freshness | Trending within 1 hour | Breaking news, viral events |
2. Trie Data Structure — The Core Abstraction
A Trie (prefix tree) is the natural data structure for prefix-based lookups. Each node represents one character; traversing from root to any node spells out a prefix. All strings sharing that prefix are in the subtree below that node.
Trie Node Structure
class Completion {
    final String query;
    final long frequency;
    Completion(String query, long frequency) { this.query = query; this.frequency = frequency; }
}

class TrieNode {
    Map<Character, TrieNode> children = new HashMap<>();
    boolean isEnd;      // marks a complete query string
    long frequency;     // how often this query is searched
    // Optimization: cache top-K completions at each node
    List<Completion> topK = new ArrayList<>(); // pre-computed top-5 completions for this prefix
}

class Trie {
    TrieNode root = new TrieNode();

    void insert(String query, long frequency) {
        TrieNode node = root;
        for (char c : query.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
            // Update topK at every node along the query's prefix path
            updateTopK(node, query, frequency);
        }
        node.isEnd = true;
        node.frequency = frequency;
    }

    TrieNode traverseTo(String prefix) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null; // no query starts with this prefix
        }
        return node;
    }

    List<Completion> topKCompletions(String prefix) {
        TrieNode node = traverseTo(prefix);
        return node == null ? List.of() : node.topK; // O(prefix.length) + O(1)
    }
}
Key optimization: Instead of traversing the entire subtree at query time (expensive), pre-compute and cache the top-K completions at every intermediate node during the insert/update phase. Retrieval becomes O(prefix_length) = effectively O(1) for short prefixes.
Time & Space Complexity
| Operation | Time (with top-K cache) | Time (without cache) |
|---|---|---|
| Insert | O(L × K log K) | O(L) |
| Search (top-K completions) | O(L) | O(L + subtree size) |
| Space (5B queries, avg 20 chars) | ~50GB uncompressed; 15–20GB with prefix sharing | n/a |
3. Top-K per Node — Efficient Completion Retrieval
Each Trie node caches its top-K completions (K=5 typically). When a new query is inserted or its frequency updated, propagate the update up to all ancestor nodes, maintaining a sorted list of top-K at each node.
// Update top-K at a node — sort-and-truncate approach (K is small, so sorting is cheap)
void updateTopK(TrieNode node, String query, long frequency) {
    // Replace if query already in top-K (update score)
    node.topK.removeIf(e -> e.query.equals(query));
    node.topK.add(new Completion(query, frequency));
    // Sort descending by frequency, keep only top-K
    node.topK.sort((a, b) -> Long.compare(b.frequency, a.frequency));
    // Truncate in place — assigning subList() would retain a view of the old list
    while (node.topK.size() > K) node.topK.remove(node.topK.size() - 1);
}
// Example: prefix "spr" → top-5 completions
// 1. "spring boot" score: 9,823,441
// 2. "spring framework" score: 7,234,182
// 3. "spring data jpa" score: 4,102,091
// 4. "sprint planning" score: 2,891,333
// 5. "spreadsheet" score: 2,100,009
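The snippets in sections 2–3 can be consolidated into a small runnable demo. Class and field names here are illustrative; K is fixed at 5 to match the example.

```java
import java.util.*;

// Minimal runnable sketch: a Trie whose nodes cache their top-K completions,
// updated on insert so lookups never traverse the subtree.
class Completion {
    final String query;
    final long frequency;
    Completion(String query, long frequency) { this.query = query; this.frequency = frequency; }
}

class TrieNode {
    final Map<Character, TrieNode> children = new HashMap<>();
    final List<Completion> topK = new ArrayList<>(); // cached top-K for this prefix
}

class AutocompleteTrie {
    static final int K = 5;
    private final TrieNode root = new TrieNode();

    public void insert(String query, long frequency) {
        TrieNode node = root;
        for (char c : query.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
            updateTopK(node, query, frequency); // refresh cache at every prefix node
        }
    }

    private static void updateTopK(TrieNode node, String query, long frequency) {
        node.topK.removeIf(e -> e.query.equals(query)); // drop stale score, if any
        node.topK.add(new Completion(query, frequency));
        node.topK.sort((a, b) -> Long.compare(b.frequency, a.frequency));
        while (node.topK.size() > K) node.topK.remove(node.topK.size() - 1);
    }

    public List<String> topKCompletions(String prefix) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return List.of(); // no query starts with this prefix
        }
        List<String> out = new ArrayList<>();
        for (Completion comp : node.topK) out.add(comp.query);
        return out; // O(prefix length) traversal + O(K) copy
    }
}
```

Inserting "spring boot", "spring framework", and "spreadsheet" with the example scores and querying "spr" returns them in descending-frequency order, without visiting the rest of the subtree.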
4. Distributed Trie — Sharding & Replication
A single in-memory Trie for 5B queries won't fit comfortably on one machine (~50GB). Distribute it across shards partitioned by prefix range.
Sharding Strategy: First Letter Prefix
- Shard 1: prefixes starting with a–e
- Shard 2: prefixes starting with f–l
- Shard 3: prefixes starting with m–r
- Shard 4: prefixes starting with s–z + numbers
- Adjust shard boundaries based on actual traffic distribution (hot prefixes like "s" get more shards)
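A routing layer only needs the prefix's first character to pick a shard. This sketch hard-codes the example boundaries above; a real router would load them from config and rebalance based on traffic histograms. `ShardRouter` is a hypothetical name.

```java
// Maps a prefix's first character to a shard, mirroring the example split:
// a–e → 1, f–l → 2, m–r → 3, everything else (s–z, digits, symbols) → 4.
public class ShardRouter {
    public static int shardFor(String prefix) {
        if (prefix == null || prefix.isEmpty()) {
            throw new IllegalArgumentException("empty prefix");
        }
        char c = Character.toLowerCase(prefix.charAt(0));
        if (c >= 'a' && c <= 'e') return 1;
        if (c >= 'f' && c <= 'l') return 2;
        if (c >= 'm' && c <= 'r') return 3;
        return 4; // s–z, numbers, and anything else
    }
}
```

Because every prefix of a given query shares a first character, all lookups for one typing session hit the same shard, which keeps routing stateless.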
Replication for High Availability
- Each shard has a primary + 2 read replicas (3-way replication)
- All reads served from read replicas — updates only go to primary, then async replicate
- Trie updates are infrequent (hourly batch) — eventual consistency is acceptable
- If a shard primary fails, replica election via ZooKeeper/etcd (leader election)
Trie Serialization
Persist the Trie to Cassandra or S3 as a serialized snapshot. On startup, load from snapshot (~5-10s) rather than rebuilding from query logs (hours). Rolling restarts avoid cold-start latency.
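A snapshot round-trip can be sketched with standard Java serialization; here a prefix → top-K map stands in for the full serialized Trie, and the byte array stands in for the S3/Cassandra blob. `SnapshotStore` is an illustrative name, not a real library.

```java
import java.io.*;
import java.util.*;

// Sketch of snapshot save/load. A real deployment would stream these bytes
// to S3 or Cassandra; here we round-trip through memory.
public class SnapshotStore {
    public static byte[] save(Map<String, List<String>> prefixTopK) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(new HashMap<>(prefixTopK)); // HashMap is Serializable
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    public static Map<String, List<String>> load(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (Map<String, List<String>>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

In practice a compact custom binary format (or Protocol Buffers) loads faster than Java serialization, but the contract is the same: dump on publish, load on startup.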
5. Redis Sorted Set Alternative
For simpler deployments, Redis Sorted Sets provide an excellent autocomplete engine without building a custom Trie service. Each prefix is a Sorted Set key; members are completion strings; scores are query frequencies.
// Populating Redis Sorted Sets for "spring boot" (all 11 prefix expansions)
ZADD autocomplete:s 9823441 "spring boot"
ZADD autocomplete:sp 9823441 "spring boot"
ZADD autocomplete:spr 9823441 "spring boot"
ZADD autocomplete:spri 9823441 "spring boot"
... (all 11 prefixes)
// Query: user types "spr"
ZREVRANGE autocomplete:spr 0 4 WITHSCORES
// Returns: spring boot (9.8M), spring framework (7.2M), ...
// Memory estimate per prefix key: ~200 bytes × 5 completions = 1KB per key
// Total for 100K hot prefixes × avg prefix length 5 = 500K keys × 1KB = ~500MB
// Very practical for most deployments
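The prefix expansion behind those ZADD commands is a simple loop. This sketch emits one command string per prefix; the `autocomplete:` key naming follows the example above, and in a real pipeline you would issue these via a Redis client rather than build command strings.

```java
import java.util.*;

// Expands one completed query into the ZADD commands that index it
// under every prefix (prefixes containing spaces would need quoting
// for a real redis-cli, omitted here for clarity).
public class PrefixExpander {
    public static List<String> zaddCommands(String query, long score) {
        List<String> cmds = new ArrayList<>();
        for (int end = 1; end <= query.length(); end++) {
            String prefix = query.substring(0, end);
            cmds.add(String.format("ZADD autocomplete:%s %d \"%s\"", prefix, score, query));
        }
        return cmds;
    }
}
```

For "spring boot" this produces 11 commands, one per prefix length, which is exactly why the memory estimate multiplies hot-prefix count by average prefix length.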
Redis vs. Trie Service: When to Use Which
| Approach | Scale | Ops Complexity | Best For |
|---|---|---|---|
| Redis Sorted Sets | Up to 10M queries | Low | Startups, internal tools |
| Custom Trie Service | Billions of queries | High | Google, Amazon, Twitter |
| Elasticsearch Completion | Medium scale | Medium | Combined search + suggest |
6. Offline Aggregation Pipeline (Spark)
Query frequencies must be updated regularly for the autocomplete to reflect actual search popularity. A Spark batch pipeline runs hourly or daily:
Pipeline Stages
- Ingest: Raw search logs from Kafka → S3 (Parquet format, partitioned by hour)
- Aggregate: Spark reads last 7 days of logs → GROUP BY query → sum search counts
- Filter: Remove queries below minimum frequency threshold (e.g., < 100 searches/week)
- Normalize: Lowercase, strip special characters, handle misspellings using edit-distance correction
- Build: Generate Trie or Redis key-value updates from top-N queries per prefix
- Deploy: Push updated Trie snapshot to all shard primaries; replicas sync asynchronously
// Spark aggregation job (pseudocode)
Dataset<Row> logs = spark.read().parquet("s3://search-logs/hourly/**");
Dataset<Row> topQueries = logs
.filter(col("query_len").gt(1)) // skip single chars
.groupBy("query")
.agg(sum("count").as("total_count"))
.filter(col("total_count").gt(100)) // minimum threshold
.orderBy(col("total_count").desc())
.limit(5_000_000); // top 5M queries only
// Generate per-prefix top-K
topQueries.flatMap(row -> generatePrefixEntries(row), encoder)
.groupBy("prefix")
.agg(topK("query", 5, "total_count")) // top-5 per prefix (topK = custom aggregator, not built-in)
.write().mode("overwrite").parquet("s3://trie-updates/latest/");
7. Near Real-Time Updates for Trending Queries
A Spark hourly batch introduces up to 60-minute lag for trending queries. During breaking news ("earthquake los angeles"), users expect suggestions to appear within minutes. Two approaches:
Approach 1: Streaming Aggregation (Kafka Streams)
Run a parallel streaming pipeline that computes a sliding window count (last 10 minutes) and injects high-velocity queries directly into Redis with a boosted score. Blend: final_score = (7-day_count × 0.7) + (10-min_velocity × 0.3 × boost_factor).
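The blend formula translates directly into a scoring function. The 0.7/0.3 weights and the boost factor are the illustrative constants from the formula above; in practice they would be tuned offline.

```java
// Blends long-term popularity with short-term velocity, per the formula:
// final_score = (7-day_count × 0.7) + (10-min_velocity × 0.3 × boost_factor)
public class ScoreBlender {
    public static double blend(long weeklyCount, long tenMinVelocity, double boostFactor) {
        return weeklyCount * 0.7 + tenMinVelocity * 0.3 * boostFactor;
    }
}
```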
Approach 2: Trending Topics Override
Maintain a separate "trending" sorted set in Redis, updated every minute. At query time, merge trending results with the regular top-K: if a trending query matches the prefix, boost it to the top regardless of historical frequency. This handles viral events without rebuilding the Trie.
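The merge step described above can be sketched as follows: trending matches for the prefix are pinned first, then the historical top-K fills the remaining slots, with duplicates removed. `TrendingMerger` is an illustrative name.

```java
import java.util.*;

// Merges trending queries with the regular top-K for one prefix.
// Trending matches go first regardless of historical frequency.
public class TrendingMerger {
    public static List<String> merge(List<String> trending, List<String> regular,
                                     String prefix, int k) {
        LinkedHashSet<String> out = new LinkedHashSet<>(); // preserves insertion order, dedups
        for (String q : trending) {
            if (q.startsWith(prefix)) out.add(q);  // pin trending matches to the top
        }
        out.addAll(regular);                        // then the historical top-K
        List<String> merged = new ArrayList<>(out);
        return merged.subList(0, Math.min(k, merged.size()));
    }
}
```

This keeps the Trie and its snapshots untouched: the override lives entirely in the serving path and in the minute-level trending set.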
8. Ranking: Frequency, Personalization & Context
Base Ranking Signal: Query Frequency
The simplest and most effective ranking signal is historical search frequency. Popular queries are almost always relevant. The Trie top-K uses raw frequency as the score.
Personalization Layer
- User search history: If user has previously searched "system design interview", boost "system design" completions for that user
- Location context: Typing "pizza" in New York → "pizza near me nyc"; in Chicago → "pizza chicago style"
- Device type: Mobile users get shorter suggestions; desktop users get longer completions
- Time context: Monday morning boosts work-related queries; Sunday night boosts entertainment
Safety Filters
Before returning completions, apply a blocklist filter (profanity, hate speech, dangerous content) using a trie-based blocklist or ML classifier. This runs in memory in < 1ms and removes inappropriate suggestions from the returned set.
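As a minimal sketch of that contract, here is an exact-match, in-memory blocklist filter; a production system would use a trie-based matcher or ML classifier as noted above, and the class name is illustrative.

```java
import java.util.*;

// Drops blocked completions before the response is returned.
// Blocklist entries are assumed to be lowercase; matching is exact.
public class SafetyFilter {
    private final Set<String> blocklist;

    public SafetyFilter(Set<String> blocklist) {
        this.blocklist = blocklist;
    }

    public List<String> filter(List<String> completions) {
        List<String> safe = new ArrayList<>();
        for (String q : completions) {
            if (!blocklist.contains(q.toLowerCase())) safe.add(q);
        }
        return safe;
    }
}
```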
9. API Design & Client-Side Optimization
API Endpoint
GET /v1/autocomplete?q=spr&limit=5&locale=en-US
Authorization: Bearer {token}
// Response (JSON)
{
"prefix": "spr",
"completions": [
{"query": "spring boot", "score": 9823441},
{"query": "spring framework", "score": 7234182},
{"query": "spring data jpa", "score": 4102091},
{"query": "sprint planning", "score": 2891333},
{"query": "spreadsheet", "score": 2100009}
],
"request_id": "req_abc123",
"latency_ms": 12
}
Client-Side Optimizations (90% QPS Reduction)
- Debounce: Only fire request after 150–300ms of no typing — eliminates requests for fast typists
- Client-side prefix cache: Cache last 20 prefixes in memory; if user backspaces, serve from cache
- Minimum length: Don't request completions for prefixes shorter than 2 characters
- Abort in-flight: Cancel the previous HTTP request when new character typed (AbortController)
- CDN caching: Cache popular prefixes (top 10K) at CDN edge — hit rate 60–80% for global traffic
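Debouncing is usually done in browser JavaScript, but the same pattern can be sketched in Java (e.g., for a desktop or mobile client) with a `ScheduledExecutorService`: each keystroke cancels the pending request and schedules a new one, so only the last keystroke in a burst fires.

```java
import java.util.concurrent.*;

// Keystroke debouncer: only the last call within the delay window runs.
// A 150–300ms delay matches the range suggested above.
public class Debouncer {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final long delayMs;
    private ScheduledFuture<?> pending;

    public Debouncer(long delayMs) { this.delayMs = delayMs; }

    public synchronized void onKeystroke(Runnable fetch) {
        if (pending != null) pending.cancel(false); // drop the stale request
        pending = scheduler.schedule(fetch, delayMs, TimeUnit.MILLISECONDS);
    }

    public void shutdown() { scheduler.shutdown(); }
}
```

Three rapid keystrokes with a 100ms debounce window result in exactly one fetch, which is the mechanism behind the QPS reduction claimed for fast typists.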
10. Global Scaling & Multi-Language Support
Geographic Distribution
- Deploy regional autocomplete clusters (US, EU, APAC) — each serving local traffic
- Trie snapshot replicated to all regions via S3 cross-region replication
- Local trending streams per region (trending in Brazil differs from trending in Japan)
- Anycast DNS routes users to nearest region automatically
Multi-Language Challenges
- Chinese, Japanese, Korean: character-based rather than alphabet-based — Trie nodes store CJK codepoints
- RTL languages (Arabic, Hebrew): prefix direction inverted — store reversed string in Trie for correct prefix matching
- Transliteration: "pizza" → also match "پیتزا" (Persian) for bilingual users
- Separate Trie per locale+language combination; language detection from request headers
11. Design Checklist & Conclusion
Autocomplete System Design Checklist
- ☐ Trie nodes cache top-K completions (not traverse subtree at query time)
- ☐ Sharding strategy defined (first-letter prefix) with load-balanced hot prefixes
- ☐ Offline Spark aggregation pipeline runs hourly/daily on query logs
- ☐ Near real-time streaming for trending queries (last 10 minutes velocity)
- ☐ Redis Sorted Sets used as L1 cache in front of Trie service
- ☐ Client debouncing (150–300ms) and min-prefix-length enforced
- ☐ CDN caches top-10K prefix responses at edge (60–80% hit rate)
- ☐ Safety filter (blocklist/ML) applied before returning completions
- ☐ Trie snapshots persisted to S3 for fast cold-start recovery
- ☐ Multi-language support: separate Trie/locale with CJK and RTL handling
Autocomplete looks simple but touches almost every hard problem in distributed systems: low-latency reads, high-QPS serving, near-real-time freshness, and personalization. For most applications, start with Redis Sorted Sets — it covers the vast majority of use cases with minimal complexity. Graduate to a custom distributed Trie service only when you outgrow Redis memory or need consistently low latency at web-search scale.