
Designing an Autocomplete/Typeahead System at Scale: Trie, Top-K Ranking & Distributed Architecture

Autocomplete (typeahead) appears simple but conceals deep engineering challenges at Google or Amazon scale: returning top-5 relevant completions in under 100ms for billions of queries per day across 50+ languages, while adapting to trending queries in near real-time.

Md Sanwar Hossain · April 6, 2026 · 18 min read · System Design

TL;DR — Core Design

"Store top-K completions per prefix in a Trie (in-memory) or Redis Sorted Set (distributed). Update scores offline via Spark batch on query logs. Shard the Trie by prefix first-letter for horizontal scaling. Client-side debouncing + server-side caching reduce backend QPS by 90%."

Table of Contents

  1. Requirements & Scale Estimation
  2. Trie Data Structure — The Core Abstraction
  3. Top-K per Node — Efficient Completion Retrieval
  4. Distributed Trie — Sharding & Replication
  5. Redis Sorted Set Alternative
  6. Offline Aggregation Pipeline (Spark)
  7. Near Real-Time Updates for Trending Queries
  8. Ranking: Frequency, Personalization & Context
  9. API Design & Client-Side Optimization
  10. Global Scaling & Multi-Language Support
  11. Design Checklist & Conclusion

1. Requirements & Scale Estimation

Functional Requirements

Non-Functional Requirements & Scale

Metric                 Target                   Notes
Read latency           < 100ms P99              User perceives it as instant
QPS (peak)             100,000 req/sec          Google handles ~99K searches/sec
Total unique queries   5 billion                Top 1M queries cover 90% of traffic
Freshness              Trending within 1 hour   Breaking news, viral events

2. Trie Data Structure — The Core Abstraction

A Trie (prefix tree) is the natural data structure for prefix-based lookups. Each node represents one character; traversing from root to any node spells out a prefix. All strings sharing that prefix are in the subtree below that node.

Figure: Trie Structure and Distributed Autocomplete Architecture — prefix nodes, top-K per node, Redis/Cassandra sharding. Source: mdsanwarhossain.me

Trie Node Structure

import java.util.*;

class Completion {
    final String query;
    long frequency;
    Completion(String query, long frequency) { this.query = query; this.frequency = frequency; }
}

class TrieNode {
    Map<Character, TrieNode> children = new HashMap<>();
    boolean isEnd;                  // marks a complete query string
    long frequency;                 // how often this query is searched
    // Optimization: cache top-K completions at each node
    List<Completion> topK = new ArrayList<>();  // pre-computed top-5 completions for this prefix
}

class Trie {
    private final TrieNode root = new TrieNode();

    void insert(String query, long frequency) {
        TrieNode node = root;
        for (char c : query.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
            // Update topK at every node on the path (every ancestor prefix)
            updateTopK(node, query, frequency);
        }
        node.isEnd = true;
        node.frequency = frequency;
    }

    List<Completion> topKCompletions(String prefix) {
        TrieNode node = traverseTo(prefix);
        return node == null ? List.of() : node.topK; // O(prefix.length) + O(1)
    }

    private TrieNode traverseTo(String prefix) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null;
        }
        return node;
    }
}

Key optimization: Instead of traversing the entire subtree at query time (expensive), pre-compute and cache the top-K completions at every intermediate node during the insert/update phase. Retrieval becomes O(prefix_length) = effectively O(1) for short prefixes.

Time & Space Complexity

Operation                          With top-K cache    Without cache
Insert                             O(L × K log K)      O(L)
Search (top-K completions)         O(L)                O(L + subtree size)
Space (5B queries, avg 20 chars)   ~50GB uncompressed; 15–20GB with prefix sharing

3. Top-K per Node — Efficient Completion Retrieval

Each Trie node caches its top-K completions (K=5 typically). When a new query is inserted or its frequency updated, propagate the update up to all ancestor nodes, maintaining a sorted list of top-K at each node.

// Update top-K at a node — sorted-list approach (a bounded min-heap also works)
void updateTopK(TrieNode node, String query, long frequency) {
    // Replace if query already in top-K (update its score)
    node.topK.removeIf(e -> e.query.equals(query));
    node.topK.add(new Completion(query, frequency));
    // Sort descending by frequency, keep only top-K
    node.topK.sort((a, b) -> Long.compare(b.frequency, a.frequency));
    if (node.topK.size() > K) node.topK = new ArrayList<>(node.topK.subList(0, K)); // copy — subList is only a view
}

// Example: prefix "spr" → top-5 completions
// 1. "spring boot"      score: 9,823,441
// 2. "spring framework" score: 7,234,182
// 3. "spring data jpa"  score: 4,102,091
// 4. "sprint planning"  score: 2,891,333
// 5. "spreadsheet"      score: 2,100,009
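Putting the pieces together, a minimal runnable sketch of the node-cached top-K idea (class and field names here are illustrative, and eviction uses a simple linear scan rather than a heap):

```java
import java.util.*;

class TopKTrie {
    static final int K = 5;

    static class Node {
        Map<Character, Node> children = new HashMap<>();
        // completion -> frequency, trimmed to the K highest-frequency entries
        Map<String, Long> topK = new HashMap<>();
    }

    final Node root = new Node();

    void insert(String query, long frequency) {
        Node node = root;
        for (char c : query.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
            updateTopK(node, query, frequency);   // refresh cache at every prefix node
        }
    }

    private void updateTopK(Node node, String query, long frequency) {
        node.topK.put(query, frequency);
        if (node.topK.size() > K) {
            // evict the lowest-frequency completion
            String worst = null;
            long min = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : node.topK.entrySet()) {
                if (e.getValue() < min) { min = e.getValue(); worst = e.getKey(); }
            }
            node.topK.remove(worst);
        }
    }

    List<String> topKCompletions(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return List.of();
        }
        final Map<String, Long> scores = node.topK;
        List<String> out = new ArrayList<>(scores.keySet());
        out.sort((a, b) -> Long.compare(scores.get(b), scores.get(a)));
        return out;
    }
}
```

Inserting the five "spr" queries above with their scores and calling `topKCompletions("spr")` returns them in descending-frequency order; an unknown prefix returns an empty list.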

4. Distributed Trie — Sharding & Replication

A single in-memory Trie for 5B queries (~50GB, plus JVM object overhead) is a poor fit for one machine: no headroom for growth, and a single point of failure at 100K QPS. Distribute it by prefix partition.

Sharding Strategy: First Letter Prefix
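One plausible routing scheme can be sketched as follows (the shard count and the hot-letter set are illustrative assumptions): route on the first letter, but split known-hot letters further by the second character so that prefixes starting with 'a', 's', or 't' do not overload a single shard. Single-character prefixes would typically be served from a dedicated cache, since this scheme only disambiguates hot letters from the second character onward.

```java
import java.util.*;

class PrefixShardRouter {
    // Hot first letters are spread across shards by the second character
    private static final Set<Character> HOT = Set.of('a', 's', 't');
    private final int numShards;

    PrefixShardRouter(int numShards) { this.numShards = numShards; }

    int route(String prefix) {
        if (prefix.isEmpty()) return 0;
        char first = Character.toLowerCase(prefix.charAt(0));
        if (HOT.contains(first) && prefix.length() > 1) {
            char second = Character.toLowerCase(prefix.charAt(1));
            // Hash the first two characters so a hot letter's load is distributed
            return Math.floorMod(31 * first + second, numShards);
        }
        return Math.floorMod(first, numShards);
    }
}
```

The key property: every prefix of length ≥ 2 of the same query routes to the same shard, so a user extending "spr" to "spring" keeps hitting the same node's cache.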

Replication for High Availability

Trie Serialization

Persist the Trie to Cassandra or S3 as a serialized snapshot. On startup, load from snapshot (~5-10s) rather than rebuilding from query logs (hours). Rolling restarts avoid cold-start latency.
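The snapshot idea can be sketched as a depth-first serialization of the per-node top-K lists (the binary format below is illustrative, not any specific production format):

```java
import java.io.*;
import java.util.*;

class TrieSnapshot {
    static class Node {
        TreeMap<Character, Node> children = new TreeMap<>();  // sorted for deterministic output
        List<String> topK = new ArrayList<>();
    }

    // Depth-first: write this node's top-K, then each child prefixed by its edge char
    static void write(Node n, DataOutputStream out) throws IOException {
        out.writeInt(n.topK.size());
        for (String s : n.topK) out.writeUTF(s);
        out.writeInt(n.children.size());
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            out.writeChar(e.getKey());
            write(e.getValue(), out);
        }
    }

    static Node read(DataInputStream in) throws IOException {
        Node n = new Node();
        int k = in.readInt();
        for (int i = 0; i < k; i++) n.topK.add(in.readUTF());
        int c = in.readInt();
        for (int i = 0; i < c; i++) n.children.put(in.readChar(), read(in));
        return n;
    }

    static byte[] toBytes(Node root) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            write(root, new DataOutputStream(bos));
            return bos.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    static Node fromBytes(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

On startup a shard calls `fromBytes` on the snapshot fetched from S3/Cassandra, which is what makes the 5–10s reload possible.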

5. Redis Sorted Set Alternative

For simpler deployments, Redis Sorted Sets provide an excellent autocomplete engine without building a custom Trie service. Each prefix is a Sorted Set key; members are completion strings; scores are query frequencies.

// Populating Redis Sorted Sets for "spring" (all prefix expansions)
ZADD autocomplete:s   9823441 "spring boot"
ZADD autocomplete:sp  9823441 "spring boot"
ZADD autocomplete:spr 9823441 "spring boot"
ZADD autocomplete:spri 9823441 "spring boot"
... (all 11 prefixes)

// Query: user types "spr"
ZREVRANGE autocomplete:spr 0 4 WITHSCORES
// Returns: spring boot (9.8M), spring framework (7.2M), ...

// Memory estimate per prefix key: ~200 bytes × 5 completions = 1KB per key
// Total for 100K hot prefixes × avg prefix length 5 = 500K keys × 1KB = ~500MB
// Very practical for most deployments
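The per-query fan-out into prefix keys — the step that produces the ZADD commands above — can be sketched as follows (key naming follows the example; a production loader would also cap the maximum prefix length to bound key count):

```java
import java.util.*;

class PrefixExpander {
    record Entry(String key, String member, long score) {}

    // One (key, member, score) triple per prefix of the query
    static List<Entry> expand(String query, long score) {
        List<Entry> out = new ArrayList<>();
        for (int i = 1; i <= query.length(); i++) {
            out.add(new Entry("autocomplete:" + query.substring(0, i), query, score));
        }
        return out;
    }
}
```

For "spring boot" this yields 11 entries, one per prefix, each of which becomes a `ZADD key score member` call against Redis.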

Redis vs. Trie Service: When to Use Which

Approach                   Scale                 Ops Complexity   Best For
Redis Sorted Sets          Up to ~10M queries    Low              Startups, internal tools
Custom Trie Service        Billions of queries   High             Google, Amazon, Twitter
Elasticsearch Completion   Medium scale          Medium           Combined search + suggest

6. Offline Aggregation Pipeline (Spark)

Query frequencies must be updated regularly for the autocomplete to reflect actual search popularity. A Spark batch pipeline runs hourly or daily:

Pipeline Stages

  1. Ingest: Raw search logs from Kafka → S3 (Parquet format, partitioned by hour)
  2. Aggregate: Spark reads last 7 days of logs → GROUP BY query → sum search counts
  3. Filter: Remove queries below minimum frequency threshold (e.g., < 100 searches/week)
  4. Normalize: Lowercase, strip special characters, handle misspellings using edit-distance correction
  5. Build: Generate Trie or Redis key-value updates from top-N queries per prefix
  6. Deploy: Push updated Trie snapshot to all shard primaries; replicas sync asynchronously
// Spark aggregation job (pseudocode)
Dataset<Row> logs = spark.read().parquet("s3://search-logs/hourly/**");
Dataset<Row> topQueries = logs
    .filter(col("query_len").gt(1))          // skip single chars
    .groupBy("query")
    .agg(sum("count").as("total_count"))
    .filter(col("total_count").gt(100))       // minimum threshold
    .orderBy(col("total_count").desc())
    .limit(5_000_000);                        // top 5M queries only

// Generate per-prefix top-K
topQueries.flatMap(row -> generatePrefixEntries(row), encoder)
    .groupBy("prefix")
    .agg(topK("query", 5, "total_count"))     // top-5 per prefix
    .write().mode("overwrite").parquet("s3://trie-updates/latest/");

7. Near Real-Time Updates for Trending Queries

An hourly Spark batch introduces up to an hour of lag for trending queries. During breaking news ("earthquake los angeles"), users expect suggestions to appear within minutes. Two approaches:

Approach 1: Streaming Aggregation (Kafka Streams)

Run a parallel streaming pipeline that computes a sliding window count (last 10 minutes) and injects high-velocity queries directly into Redis with a boosted score. Blend: final_score = (7-day_count × 0.7) + (10-min_velocity × 0.3 × boost_factor).
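The blend itself is a one-liner; the weights come from the formula above, and boostFactor is a tunable hypothetical parameter:

```java
class TrendingBlend {
    // final_score = (7-day count × 0.7) + (10-min velocity × 0.3 × boost_factor)
    static double finalScore(long weeklyCount, long tenMinVelocity, double boostFactor) {
        return weeklyCount * 0.7 + tenMinVelocity * 0.3 * boostFactor;
    }
}
```

With a large enough boost factor, a query with little history but high 10-minute velocity outranks a stale high-frequency query, which is exactly the breaking-news behavior wanted here.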

Approach 2: Trending Topics Override

Maintain a separate "trending" sorted set in Redis, updated every minute. At query time, merge trending results with the regular top-K: if a trending query matches the prefix, boost it to the top regardless of historical frequency. This handles viral events without rebuilding the Trie.
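The merge step can be sketched as: trending queries matching the prefix go first, then the regular top-K fills the remaining slots, with set semantics handling duplicates (names are illustrative):

```java
import java.util.*;

class TrendingMerge {
    static List<String> merge(List<String> trending, List<String> regularTopK,
                              String prefix, int k) {
        LinkedHashSet<String> out = new LinkedHashSet<>();  // preserves insertion order, dedups
        for (String q : trending) {
            if (q.startsWith(prefix)) out.add(q);           // boosted to the top
            if (out.size() == k) break;
        }
        for (String q : regularTopK) {
            if (out.size() == k) break;
            out.add(q);                                     // fill with historical top-K
        }
        return new ArrayList<>(out);
    }
}
```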

8. Ranking: Frequency, Personalization & Context

Base Ranking Signal: Query Frequency

The simplest and most effective ranking signal is historical search frequency. Popular queries are almost always relevant. The Trie top-K uses raw frequency as the score.

Personalization Layer

Safety Filters

Before returning completions, apply a blocklist filter (profanity, hate speech, dangerous content) using a trie-based blocklist or ML classifier. This runs in memory in < 1ms and removes inappropriate suggestions from the returned set.
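A minimal sketch of the blocklist pass, assuming a substring match against a small in-memory term set (an ML classifier, or the trie-based matcher the text mentions, would replace the inner loop for large blocklists):

```java
import java.util.*;

class SafetyFilter {
    private final Set<String> blockedTerms;   // lowercase blocked terms (placeholders in tests)

    SafetyFilter(Set<String> blockedTerms) { this.blockedTerms = blockedTerms; }

    // Drop any completion containing a blocked term before returning to the client
    List<String> filter(List<String> completions) {
        List<String> safe = new ArrayList<>();
        for (String c : completions) {
            String lower = c.toLowerCase(Locale.ROOT);
            boolean blocked = false;
            for (String term : blockedTerms) {
                if (lower.contains(term)) { blocked = true; break; }
            }
            if (!blocked) safe.add(c);
        }
        return safe;
    }
}
```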

9. API Design & Client-Side Optimization

API Endpoint

GET /v1/autocomplete?q=spr&limit=5&locale=en-US
Authorization: Bearer {token}

// Response (JSON)
{
  "prefix": "spr",
  "completions": [
    {"query": "spring boot", "score": 9823441},
    {"query": "spring framework", "score": 7234182},
    {"query": "spring data jpa", "score": 4102091},
    {"query": "sprint planning", "score": 2891333},
    {"query": "spreadsheet", "score": 2100009}
  ],
  "request_id": "req_abc123",
  "latency_ms": 12
}

Client-Side Optimizations (90% QPS Reduction)
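Although debouncing normally lives in client-side JavaScript, the pattern can be sketched in Java for consistency with the rest of the article: each keystroke cancels the pending request, so only the final prefix after a 150–300ms quiet period reaches the backend.

```java
import java.util.concurrent.*;

class Debouncer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending;
    private final long delayMs;

    Debouncer(long delayMs) { this.delayMs = delayMs; }

    // Each call cancels the previously scheduled task, so only the last one runs
    synchronized void submit(Runnable task) {
        if (pending != null) pending.cancel(false);
        pending = scheduler.schedule(task, delayMs, TimeUnit.MILLISECONDS);
    }

    void shutdown() { scheduler.shutdown(); }

    static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

Three rapid submits produce one backend call; this, combined with a minimum prefix length and server-side caching, is where the ~90% QPS reduction comes from.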

10. Global Scaling & Multi-Language Support

Geographic Distribution

Multi-Language Challenges

11. Design Checklist & Conclusion

Autocomplete System Design Checklist

  • ☐ Trie nodes cache top-K completions (not traverse subtree at query time)
  • ☐ Sharding strategy defined (first-letter prefix) with load-balanced hot prefixes
  • ☐ Offline Spark aggregation pipeline runs hourly/daily on query logs
  • ☐ Near real-time streaming for trending queries (last 10 minutes velocity)
  • ☐ Redis Sorted Sets used as L1 cache in front of Trie service
  • ☐ Client debouncing (150–300ms) and min-prefix-length enforced
  • ☐ CDN caches top-10K prefix responses at edge (60–80% hit rate)
  • ☐ Safety filter (blocklist/ML) applied before returning completions
  • ☐ Trie snapshots persisted to S3 for fast cold-start recovery
  • ☐ Multi-language support: separate Trie/locale with CJK and RTL handling

Autocomplete looks simple but touches almost every hard problem in distributed systems: low-latency reads, high-QPS serving, near-real-time freshness, and personalization. For most applications, start with Redis Sorted Sets — it covers 95% of use cases with minimal complexity. Graduate to a custom distributed Trie service only when you outgrow Redis memory or need sub-10ms global latency at billions of queries per day.

Last updated: April 6, 2026