Designing an Autocomplete/Typeahead System at Scale: Trie, Top-K Ranking & Distributed Architecture
Autocomplete (typeahead) appears simple but conceals deep engineering challenges at Google or Amazon scale: returning top-5 relevant completions in under 100ms for billions of queries per day across 50+ languages, while adapting to trending queries in near real-time.
TL;DR — Core Design
"Store top-K completions per prefix in a Trie (in-memory) or Redis Sorted Set (distributed). Update scores offline via Spark batch on query logs. Shard the Trie by prefix first-letter for horizontal scaling. Client-side debouncing + server-side caching reduce backend QPS by 90%."
Table of Contents
- Requirements & Scale Estimation
- Trie Data Structure — The Core Abstraction
- Top-K per Node — Efficient Completion Retrieval
- Distributed Trie — Sharding & Replication
- Redis Sorted Set Alternative
- Offline Aggregation Pipeline (Spark)
- Near Real-Time Updates for Trending Queries
- Ranking: Frequency, Personalization & Context
- API Design & Client-Side Optimization
- Global Scaling & Multi-Language Support
- Design Checklist & Conclusion
1. Requirements & Scale Estimation
Functional Requirements
- Return top-5 query completions as the user types each character
- Completions ranked by query frequency (most searched first)
- Support prefix matching: typing "sys" returns "system design", "syscall", "system monitor", etc.
- Adapt to trending queries within hours (not weeks)
- Optionally: personalized suggestions based on user history
- Support 50+ languages with different character sets
Non-Functional Requirements & Scale
| Metric | Target | Notes |
|---|---|---|
| Read latency | < 100ms P99 | User perceives as instant |
| QPS (peak) | 100,000 req/sec | Google: 99K searches/sec |
| Total unique queries | 5 billion | Top 1M queries cover 90% traffic |
| Freshness | Trending within 1 hour | Breaking news, viral events |
2. Trie Data Structure — The Core Abstraction
A Trie (prefix tree) is the natural data structure for prefix-based lookups. Each node represents one character; traversing from root to any node spells out a prefix. All strings sharing that prefix are in the subtree below that node.
Trie Node Structure
class Completion {
    final String query;
    final long frequency;
    Completion(String query, long frequency) { this.query = query; this.frequency = frequency; }
}

class TrieNode {
    Map<Character, TrieNode> children = new HashMap<>();
    boolean isEnd;      // marks a complete query string
    long frequency;     // how often this query is searched
    // Optimization: cache top-K completions at each node
    List<Completion> topK = new ArrayList<>(); // pre-computed top-5 completions for this prefix
}

class Trie {
    TrieNode root = new TrieNode();

    void insert(String query, long frequency) {
        TrieNode node = root;
        for (char c : query.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
            // Update topK at every node along the query's prefix path
            updateTopK(node, query, frequency);
        }
        node.isEnd = true;
        node.frequency = frequency;
    }

    TrieNode traverseTo(String prefix) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null; // no query starts with this prefix
        }
        return node;
    }

    List<Completion> topKCompletions(String prefix) {
        TrieNode node = traverseTo(prefix);
        return node == null ? List.of() : node.topK; // O(prefix.length) + O(1)
    }
}
Key optimization: Instead of traversing the entire subtree at query time (expensive), pre-compute and cache the top-K completions at every intermediate node during the insert/update phase. Retrieval becomes O(prefix_length) = effectively O(1) for short prefixes.
Time & Space Complexity
| Operation | Time (with top-K cache) | Time (without cache) |
|---|---|---|
| Insert | O(L × K log K) | O(L) |
| Search (top-K completions) | O(L) | O(L + subtree size) |
| Space (5B queries, avg 20 chars) | ~50GB uncompressed; 15–20GB with prefix sharing | n/a |
3. Top-K per Node — Efficient Completion Retrieval
Each Trie node caches its top-K completions (K=5 typically). When a new query is inserted or its frequency updated, propagate the update up to all ancestor nodes, maintaining a sorted list of top-K at each node.
// Update top-K at a node — sort-and-truncate approach (K is small, so sorting is cheap)
void updateTopK(TrieNode node, String query, long frequency) {
    // Replace if query already in top-K (update score)
    node.topK.removeIf(e -> e.query.equals(query));
    node.topK.add(new Completion(query, frequency));
    // Sort descending by frequency, keep only top-K
    node.topK.sort((a, b) -> Long.compare(b.frequency, a.frequency));
    // Truncate in place — assigning subList() would retain a view of the old list
    while (node.topK.size() > K) node.topK.remove(node.topK.size() - 1);
}
// Example: prefix "spr" → top-5 completions
// 1. "spring boot" score: 9,823,441
// 2. "spring framework" score: 7,234,182
// 3. "spring data jpa" score: 4,102,091
// 4. "sprint planning" score: 2,891,333
// 5. "spreadsheet" score: 2,100,009
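The snippets in sections 2–3 can be consolidated into a small runnable demo. Class and field names here are illustrative; K is fixed at 5 to match the example.

```java
import java.util.*;

// Minimal runnable sketch: a Trie whose nodes cache their top-K completions,
// updated on insert so lookups never traverse the subtree.
class Completion {
    final String query;
    final long frequency;
    Completion(String query, long frequency) { this.query = query; this.frequency = frequency; }
}

class TrieNode {
    final Map<Character, TrieNode> children = new HashMap<>();
    final List<Completion> topK = new ArrayList<>(); // cached top-K for this prefix
}

class AutocompleteTrie {
    static final int K = 5;
    private final TrieNode root = new TrieNode();

    public void insert(String query, long frequency) {
        TrieNode node = root;
        for (char c : query.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
            updateTopK(node, query, frequency); // refresh cache at every prefix node
        }
    }

    private static void updateTopK(TrieNode node, String query, long frequency) {
        node.topK.removeIf(e -> e.query.equals(query)); // drop stale score, if any
        node.topK.add(new Completion(query, frequency));
        node.topK.sort((a, b) -> Long.compare(b.frequency, a.frequency));
        while (node.topK.size() > K) node.topK.remove(node.topK.size() - 1);
    }

    public List<String> topKCompletions(String prefix) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return List.of(); // no query starts with this prefix
        }
        List<String> out = new ArrayList<>();
        for (Completion comp : node.topK) out.add(comp.query);
        return out; // O(prefix length) traversal + O(K) copy
    }
}
```

Inserting "spring boot", "spring framework", and "spreadsheet" with the example scores and querying "spr" returns them in descending-frequency order, without visiting the rest of the subtree.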
4. Distributed Trie — Sharding & Replication
A single in-memory Trie for 5B queries won't fit comfortably on one machine (~50GB). Distribute it across shards partitioned by prefix range.
Sharding Strategy: First Letter Prefix
- Shard 1: prefixes starting with a–e
- Shard 2: prefixes starting with f–l
- Shard 3: prefixes starting with m–r
- Shard 4: prefixes starting with s–z + numbers
- Adjust shard boundaries based on actual traffic distribution (hot prefixes like "s" get more shards)
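A routing layer only needs the prefix's first character to pick a shard. This sketch hard-codes the example boundaries above; a real router would load them from config and rebalance based on traffic histograms. `ShardRouter` is a hypothetical name.

```java
// Maps a prefix's first character to a shard, mirroring the example split:
// a–e → 1, f–l → 2, m–r → 3, everything else (s–z, digits, symbols) → 4.
public class ShardRouter {
    public static int shardFor(String prefix) {
        if (prefix == null || prefix.isEmpty()) {
            throw new IllegalArgumentException("empty prefix");
        }
        char c = Character.toLowerCase(prefix.charAt(0));
        if (c >= 'a' && c <= 'e') return 1;
        if (c >= 'f' && c <= 'l') return 2;
        if (c >= 'm' && c <= 'r') return 3;
        return 4; // s–z, numbers, and anything else
    }
}
```

Because every prefix of a given query shares a first character, all lookups for one typing session hit the same shard, which keeps routing stateless.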
Replication for High Availability
- Each shard has a primary + 2 read replicas (3-way replication)
- All reads served from read replicas — updates only go to primary, then async replicate
- Trie updates are infrequent (hourly batch) — eventual consistency is acceptable
- If a shard primary fails, replica election via ZooKeeper/etcd (leader election)
Trie Serialization
Persist the Trie to Cassandra or S3 as a serialized snapshot. On startup, load from snapshot (~5-10s) rather than rebuilding from query logs (hours). Rolling restarts avoid cold-start latency.
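A snapshot round-trip can be sketched with standard Java serialization; here a prefix → top-K map stands in for the full serialized Trie, and the byte array stands in for the S3/Cassandra blob. `SnapshotStore` is an illustrative name, not a real library.

```java
import java.io.*;
import java.util.*;

// Sketch of snapshot save/load. A real deployment would stream these bytes
// to S3 or Cassandra; here we round-trip through memory.
public class SnapshotStore {
    public static byte[] save(Map<String, List<String>> prefixTopK) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(new HashMap<>(prefixTopK)); // HashMap is Serializable
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    public static Map<String, List<String>> load(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (Map<String, List<String>>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

In practice a compact custom binary format (or Protocol Buffers) loads faster than Java serialization, but the contract is the same: dump on publish, load on startup.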
5. Redis Sorted Set Alternative
For simpler deployments, Redis Sorted Sets provide an excellent autocomplete engine without building a custom Trie service. Each prefix is a Sorted Set key; members are completion strings; scores are query frequencies.
// Populating Redis Sorted Sets for "spring boot" (all 11 prefix expansions)
ZADD autocomplete:s 9823441 "spring boot"
ZADD autocomplete:sp 9823441 "spring boot"
ZADD autocomplete:spr 9823441 "spring boot"
ZADD autocomplete:spri 9823441 "spring boot"
... (all 11 prefixes)
// Query: user types "spr"
ZREVRANGE autocomplete:spr 0 4 WITHSCORES
// Returns: spring boot (9.8M), spring framework (7.2M), ...
// Memory estimate per prefix key: ~200 bytes × 5 completions = 1KB per key
// Total for 100K hot prefixes × avg prefix length 5 = 500K keys × 1KB = ~500MB
// Very practical for most deployments
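The prefix expansion behind those ZADD commands is a simple loop. This sketch emits one command string per prefix; the `autocomplete:` key naming follows the example above, and in a real pipeline you would issue these via a Redis client rather than build command strings.

```java
import java.util.*;

// Expands one completed query into the ZADD commands that index it
// under every prefix (prefixes containing spaces would need quoting
// for a real redis-cli, omitted here for clarity).
public class PrefixExpander {
    public static List<String> zaddCommands(String query, long score) {
        List<String> cmds = new ArrayList<>();
        for (int end = 1; end <= query.length(); end++) {
            String prefix = query.substring(0, end);
            cmds.add(String.format("ZADD autocomplete:%s %d \"%s\"", prefix, score, query));
        }
        return cmds;
    }
}
```

For "spring boot" this produces 11 commands, one per prefix length, which is exactly why the memory estimate multiplies hot-prefix count by average prefix length.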
Redis vs. Trie Service: When to Use Which
| Approach | Scale | Ops Complexity | Best For |
|---|---|---|---|
| Redis Sorted Sets | Up to 10M queries | Low | Startups, internal tools |
| Custom Trie Service | Billions of queries | High | Google, Amazon, Twitter |
| Elasticsearch Completion | Medium scale | Medium | Combined search + suggest |
6. Offline Aggregation Pipeline (Spark)
Query frequencies must be updated regularly for the autocomplete to reflect actual search popularity. A Spark batch pipeline runs hourly or daily:
Pipeline Stages
- Ingest: Raw search logs from Kafka → S3 (Parquet format, partitioned by hour)
- Aggregate: Spark reads last 7 days of logs → GROUP BY query → sum search counts
- Filter: Remove queries below minimum frequency threshold (e.g., < 100 searches/week)
- Normalize: Lowercase, strip special characters, handle misspellings using edit-distance correction
- Build: Generate Trie or Redis key-value updates from top-N queries per prefix
- Deploy: Push updated Trie snapshot to all shard primaries; replicas sync asynchronously
// Spark aggregation job (pseudocode)
Dataset<Row> logs = spark.read().parquet("s3://search-logs/hourly/**");
Dataset<Row> topQueries = logs
.filter(col("query_len").gt(1)) // skip single chars
.groupBy("query")
.agg(sum("count").as("total_count"))
.filter(col("total_count").gt(100)) // minimum threshold
.orderBy(col("total_count").desc())
.limit(5_000_000); // top 5M queries only
// Generate per-prefix top-K
topQueries.flatMap(row -> generatePrefixEntries(row), encoder)
.groupBy("prefix")
.agg(topK("query", 5, "total_count")) // top-5 per prefix (topK = custom aggregator, not built-in)
.write().mode("overwrite").parquet("s3://trie-updates/latest/");
7. Near Real-Time Updates for Trending Queries
A Spark hourly batch introduces up to 60-minute lag for trending queries. During breaking news ("earthquake los angeles"), users expect suggestions to appear within minutes. Two approaches:
Approach 1: Streaming Aggregation (Kafka Streams)
Run a parallel streaming pipeline that computes a sliding window count (last 10 minutes) and injects high-velocity queries directly into Redis with a boosted score. Blend: final_score = (7-day_count × 0.7) + (10-min_velocity × 0.3 × boost_factor).
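The blend formula translates directly into a scoring function. The 0.7/0.3 weights and the boost factor are the illustrative constants from the formula above; in practice they would be tuned offline.

```java
// Blends long-term popularity with short-term velocity, per the formula:
// final_score = (7-day_count × 0.7) + (10-min_velocity × 0.3 × boost_factor)
public class ScoreBlender {
    public static double blend(long weeklyCount, long tenMinVelocity, double boostFactor) {
        return weeklyCount * 0.7 + tenMinVelocity * 0.3 * boostFactor;
    }
}
```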
Approach 2: Trending Topics Override
Maintain a separate "trending" sorted set in Redis, updated every minute. At query time, merge trending results with the regular top-K: if a trending query matches the prefix, boost it to the top regardless of historical frequency. This handles viral events without rebuilding the Trie.
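The merge step described above can be sketched as follows: trending matches for the prefix are pinned first, then the historical top-K fills the remaining slots, with duplicates removed. `TrendingMerger` is an illustrative name.

```java
import java.util.*;

// Merges trending queries with the regular top-K for one prefix.
// Trending matches go first regardless of historical frequency.
public class TrendingMerger {
    public static List<String> merge(List<String> trending, List<String> regular,
                                     String prefix, int k) {
        LinkedHashSet<String> out = new LinkedHashSet<>(); // preserves insertion order, dedups
        for (String q : trending) {
            if (q.startsWith(prefix)) out.add(q);  // pin trending matches to the top
        }
        out.addAll(regular);                        // then the historical top-K
        List<String> merged = new ArrayList<>(out);
        return merged.subList(0, Math.min(k, merged.size()));
    }
}
```

This keeps the Trie and its snapshots untouched: the override lives entirely in the serving path and in the minute-level trending set.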
8. Ranking: Frequency, Personalization & Context
Base Ranking Signal: Query Frequency
The simplest and most effective ranking signal is historical search frequency. Popular queries are almost always relevant. The Trie top-K uses raw frequency as the score.
Personalization Layer
- User search history: If user has previously searched "system design interview", boost "system design" completions for that user
- Location context: Typing "pizza" in New York → "pizza near me nyc"; in Chicago → "pizza chicago style"
- Device type: Mobile users get shorter suggestions; desktop users get longer completions
- Time context: Monday morning boosts work-related queries; Sunday night boosts entertainment
Safety Filters
Before returning completions, apply a blocklist filter (profanity, hate speech, dangerous content) using a trie-based blocklist or ML classifier. This runs in memory in < 1ms and removes inappropriate suggestions from the returned set.
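As a minimal sketch of that contract, here is an exact-match, in-memory blocklist filter; a production system would use a trie-based matcher or ML classifier as noted above, and the class name is illustrative.

```java
import java.util.*;

// Drops blocked completions before the response is returned.
// Blocklist entries are assumed to be lowercase; matching is exact.
public class SafetyFilter {
    private final Set<String> blocklist;

    public SafetyFilter(Set<String> blocklist) {
        this.blocklist = blocklist;
    }

    public List<String> filter(List<String> completions) {
        List<String> safe = new ArrayList<>();
        for (String q : completions) {
            if (!blocklist.contains(q.toLowerCase())) safe.add(q);
        }
        return safe;
    }
}
```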
9. API Design & Client-Side Optimization
API Endpoint
GET /v1/autocomplete?q=spr&limit=5&locale=en-US
Authorization: Bearer {token}
// Response (JSON)
{
"prefix": "spr",
"completions": [
{"query": "spring boot", "score": 9823441},
{"query": "spring framework", "score": 7234182},
{"query": "spring data jpa", "score": 4102091},
{"query": "sprint planning", "score": 2891333},
{"query": "spreadsheet", "score": 2100009}
],
"request_id": "req_abc123",
"latency_ms": 12
}
Client-Side Optimizations (90% QPS Reduction)
- Debounce: Only fire request after 150–300ms of no typing — eliminates requests for fast typists
- Client-side prefix cache: Cache last 20 prefixes in memory; if user backspaces, serve from cache
- Minimum length: Don't request completions for prefixes shorter than 2 characters
- Abort in-flight: Cancel the previous HTTP request when new character typed (AbortController)
- CDN caching: Cache popular prefixes (top 10K) at CDN edge — hit rate 60–80% for global traffic
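Debouncing is usually done in browser JavaScript, but the same pattern can be sketched in Java (e.g., for a desktop or mobile client) with a `ScheduledExecutorService`: each keystroke cancels the pending request and schedules a new one, so only the last keystroke in a burst fires.

```java
import java.util.concurrent.*;

// Keystroke debouncer: only the last call within the delay window runs.
// A 150–300ms delay matches the range suggested above.
public class Debouncer {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final long delayMs;
    private ScheduledFuture<?> pending;

    public Debouncer(long delayMs) { this.delayMs = delayMs; }

    public synchronized void onKeystroke(Runnable fetch) {
        if (pending != null) pending.cancel(false); // drop the stale request
        pending = scheduler.schedule(fetch, delayMs, TimeUnit.MILLISECONDS);
    }

    public void shutdown() { scheduler.shutdown(); }
}
```

Three rapid keystrokes with a 100ms debounce window result in exactly one fetch, which is the mechanism behind the QPS reduction claimed for fast typists.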
10. Global Scaling & Multi-Language Support
Geographic Distribution
- Deploy regional autocomplete clusters (US, EU, APAC) — each serving local traffic
- Trie snapshot replicated to all regions via S3 cross-region replication
- Local trending streams per region (trending in Brazil differs from trending in Japan)
- Anycast DNS routes users to nearest region automatically
Multi-Language Challenges
- Chinese, Japanese, Korean: character-based rather than alphabet-based — Trie nodes store CJK codepoints
- RTL languages (Arabic, Hebrew): prefix direction inverted — store reversed string in Trie for correct prefix matching
- Transliteration: "pizza" → also match "پیتزا" (Persian) for bilingual users
- Separate Trie per locale+language combination; language detection from request headers
11. Design Checklist & Conclusion
Autocomplete System Design Checklist
- ☐ Trie nodes cache top-K completions (not traverse subtree at query time)
- ☐ Sharding strategy defined (first-letter prefix) with load-balanced hot prefixes
- ☐ Offline Spark aggregation pipeline runs hourly/daily on query logs
- ☐ Near real-time streaming for trending queries (last 10 minutes velocity)
- ☐ Redis Sorted Sets used as L1 cache in front of Trie service
- ☐ Client debouncing (150–300ms) and min-prefix-length enforced
- ☐ CDN caches top-10K prefix responses at edge (60–80% hit rate)
- ☐ Safety filter (blocklist/ML) applied before returning completions
- ☐ Trie snapshots persisted to S3 for fast cold-start recovery
- ☐ Multi-language support: separate Trie/locale with CJK and RTL handling
Autocomplete looks simple but touches almost every hard problem in distributed systems: low-latency reads, high-QPS serving, near-real-time freshness, and personalization. For most applications, start with Redis Sorted Sets — it covers the vast majority of use cases with minimal complexity. Graduate to a custom distributed Trie service only when you outgrow Redis memory or need consistently low latency at web-search scale.