Designing a Maps & Navigation System at Scale: Google Maps Architecture, Routing & Live Traffic
Google Maps serves over 1 billion users monthly, answering routing queries in milliseconds, rendering interactive maps with billions of tiles, and incorporating live traffic from millions of GPS-traced devices. This guide covers the complete system: from how map tiles are generated and cached, to routing graph algorithms, ETA prediction with ML, and geocoding at global scale.
TL;DR — Core Architecture Decisions
"A maps platform needs: (1) a tile rendering pipeline that pre-generates billions of PNG/WebP map tiles at zoom levels 0–21 and serves them from a global CDN, (2) a road graph (OpenStreetMap-scale: 1B+ nodes) pre-processed with CH (Contraction Hierarchies) for sub-millisecond routing, (3) live traffic ingestion from GPS probes via Kafka into a stream processor that updates edge weights in real time, (4) an ETA model combining graph travel time + ML corrections for traffic patterns, and (5) a geocoding service that resolves place names ↔ coordinates using a multi-layer index (exact → prefix → fuzzy)."
Table of Contents
- System Architecture Overview
- Map Data Ingestion & Processing Pipeline
- Tile Rendering & CDN Delivery
- Road Graph & Routing Algorithms
- Live Traffic Ingestion & Graph Updates
- ETA Prediction: From Dijkstra to ML
- Geocoding & Reverse Geocoding
- Places Search & POI Data
- Turn-by-Turn Navigation
- Scale, Fault Tolerance & Cost
- Capacity Estimation & Conclusion
1. System Architecture Overview
A maps platform consists of several distinct subsystems that share geographic data but serve fundamentally different user needs:
- Map Rendering: Generate and serve visual map tiles at every zoom level. Read-heavy; served almost entirely from CDN.
- Routing: Compute the shortest/fastest path between two or more locations. Computationally intensive; must return results in <200ms for global queries.
- Traffic: Ingest GPS traces from hundreds of millions of devices and compute real-time traffic speed per road segment.
- Geocoding: Convert between human-readable addresses and geographic coordinates. Highly variable query patterns — both forward (address → lat/lng) and reverse (lat/lng → address).
- Places: Search for businesses, landmarks, and points of interest (POIs). Combines text search with geographic proximity.
- Navigation: Turn-by-turn guidance that re-routes in real time as the user deviates or traffic conditions change.
2. Map Data Ingestion & Processing Pipeline
The foundation of any maps platform is geographic data. Google Maps combines dozens of data sources into a unified, authoritative representation of the physical world.
Data Sources
- Government / official data: Road network from national mapping agencies (USGS, Ordnance Survey), building footprints, administrative boundaries.
- OpenStreetMap: Community-maintained dataset covering 1B+ GPS nodes and 800M+ ways globally. Basis for many commercial alternatives.
- Street View imagery: 360° photos used to extract business names, hours, road signs, and lane markings via computer vision.
- Satellite & aerial imagery: Used for terrain rendering, building height extraction (3D maps), and change detection.
- Business data providers: Yelp, Foursquare, local data aggregators for POI data (names, hours, phone, reviews).
- User contributions: Corrections submitted via "Edit the map" flow — reviewed and merged with quality scoring.
Data Processing Pipeline
All data sources feed into a batch processing pipeline (Google uses MapReduce/Dataflow) that: normalizes formats → detects conflicts between sources → applies authority ranking (official data beats crowdsourced) → validates topology (roads must connect, no orphan nodes) → generates the canonical graph and place database. This pipeline runs continuously; changes from Street View ML detectors or user edits trigger incremental re-processing of affected regions.
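The authority-ranking step can be sketched in a few lines. This is a minimal illustration, not Google's actual scheme — the source names and ranks are invented for the example:

```python
# Conflict resolution by authority ranking: when sources disagree on an
# attribute, the highest-authority source wins. Ranks are illustrative.
AUTHORITY = {"official": 3, "street_view_ml": 2, "osm": 1, "user_edit": 0}

def resolve(claims):
    """claims: list of (source, value) pairs for one attribute of one feature."""
    return max(claims, key=lambda c: AUTHORITY[c[0]])[1]

claims = [("osm", "Main Street"), ("official", "Main St"), ("user_edit", "Maine St")]
print(resolve(claims))  # official data beats crowdsourced -> "Main St"
```

A real pipeline would add per-attribute confidence scores and freshness tie-breaking, but the principle — a total order over sources applied per attribute — is the same.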
3. Tile Rendering & CDN Delivery
The classic map tile system divides the world into a grid of square images at each zoom level. At zoom level 0, the entire world fits in a single 256×256 pixel tile. Each zoom level quadruples the number of tiles. At zoom level 21 (street level), there are 4.4 trillion tiles.
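The grid math above is the standard Web Mercator "slippy map" scheme, which any client can compute locally to decide which tiles to fetch:

```python
import math

def lat_lng_to_tile(lat, lng, zoom):
    """Map a coordinate to its (x, y) tile index: the world is a
    2^zoom x 2^zoom grid of 256x256 tiles at each zoom level."""
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

print(lat_lng_to_tile(37.7749, -122.4194, 12))  # downtown San Francisco -> (655, 1583)
print(4 ** 21)  # total tile positions at zoom 21: ~4.4 trillion
```

The quadrupling per zoom level falls straight out of the formula: each zoom increment doubles `n` in both axes.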
Tile Generation Pipeline
- Style definition: Cartographers define styling rules: which features appear at which zoom level, colors, fonts. Google Maps uses a proprietary style language; open-source systems use MapboxGL styles.
- Batch rendering: A MapReduce job iterates over all (zoom, x, y) tile coordinates for the changed region, renders each tile using the geographic data + style, outputs PNG/WebP images.
- Tile storage: Tiles are stored in a hierarchical key-value store keyed by zoom/x/y. At zoom >15, only populated tiles are stored — skipping empty ocean tiles cuts storage by 60%+.
- CDN push: Changed tiles are pushed to the CDN. With 200+ PoPs, a map tile for any major city is served from a cache within ~50 ms of most users.
Vector Tiles: The Modern Approach
Google Maps shifted to vector tiles on mobile in 2010 (Maps 5.0 for Android); the 2013 redesign brought vector rendering to the desktop. Instead of sending pre-rendered images, the server sends raw geographic data (road outlines, building footprints) as binary-encoded vectors. The client renders them locally using WebGL (Mapbox GL) or native GL. Benefits: 10× smaller payload, smooth rotation/tilt, runtime style changes, and 3D extrusion of buildings. The tile data is still served from the CDN; rendering computation shifts to the client GPU.
4. Road Graph & Routing Algorithms
Routing is a shortest-path problem on a directed weighted graph where nodes are intersections and edges are road segments. The challenge: the global road network has 1 billion+ nodes — naive Dijkstra runs in O((V + E) log V) and cannot return results fast enough for global queries.
Contraction Hierarchies (CH)
Contraction Hierarchies is the industry-standard technique for fast routing on large road networks. It precomputes a hierarchical overlay graph:
- Node ordering: Assign each node an "importance" rank (based on degree, edge density, hierarchy in the road network). Highways and major roads get higher ranks.
- Contraction: Remove low-rank nodes one by one, adding "shortcut" edges that preserve the shortest path through the removed node. The graph gains shortcut edges but becomes much sparser at higher ranks.
- Bidirectional Dijkstra: At query time, run Dijkstra forward from the source and backward from the destination, relaxing only edges to higher-rank nodes. The two searches meet at a "meeting node," yielding the shortest path in milliseconds — even across continents.
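The query phase can be demonstrated on a toy graph whose contraction has been done by hand: a path 0–1–2–3 where contracting nodes 1 and 2 added shortcuts 0–2 (weight 5) and 0–3 (weight 6). This is a minimal sketch — real implementations add witness searches during contraction and stall-on-demand at query time:

```python
import heapq

def upward_dijkstra(adj, rank, src):
    """Dijkstra that only relaxes edges leading to higher-ranked nodes."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if rank[v] > rank[u] and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def ch_query(adj, rank, s, t):
    """Forward search from s, backward search from t (same adjacency on an
    undirected graph); the answer is the best distance over meeting nodes."""
    fwd = upward_dijkstra(adj, rank, s)
    bwd = upward_dijkstra(adj, rank, t)
    return min((fwd[v] + bwd[v] for v in fwd.keys() & bwd.keys()),
               default=float("inf"))

# Path 0-1-2-3 (weights 2, 3, 1) plus the shortcuts created when the two
# lowest-ranked nodes (1, then 2) were contracted.
adj = {
    0: [(1, 2), (2, 5), (3, 6)],
    1: [(0, 2), (2, 3)],
    2: [(1, 3), (3, 1), (0, 5)],
    3: [(2, 1), (0, 6)],
}
rank = {1: 0, 2: 1, 0: 2, 3: 3}
print(ch_query(adj, rank, 0, 3))  # 6 = 2 + 3 + 1, found via the 0-3 shortcut
```

Note how the forward search from node 0 never touches the low-rank nodes 1 and 2 at all — it jumps straight up the hierarchy via the shortcut, which is exactly why CH queries stay fast on continent-scale graphs.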
| Algorithm | Preprocessing | Query Time | Traffic Updates |
|---|---|---|---|
| Dijkstra (basic) | None | Seconds (global) | Trivial |
| A* with heuristic | None | 100ms–2s (global) | Trivial |
| Contraction Hierarchies | Hours (one-time) | <10ms (global) | Complex (re-weight) |
| CH + Customizable (CCH) | Hours (structure), seconds (weights) | <10ms (global) | Fast (re-customize weights in seconds) |
5. Live Traffic Ingestion & Graph Updates
Google Maps computes live traffic by aggregating anonymized GPS traces from Android devices with location sharing enabled. On busy roads, hundreds of cars per minute serve as live speed sensors — far more granular than fixed traffic cameras.
Traffic Processing Pipeline
# Traffic ingestion flow
Millions of Android devices → GPS trace batches (every 30s)
│
▼
Kafka topic "gps-traces" (500k events/sec)
│
▼
Stream processor (Flink/Dataflow):
1. Map-match: snap GPS coordinates to nearest road segment
2. Compute speed: distance between consecutive points / time
3. Aggregate: median speed per road segment per 1-minute window
4. Detect incidents: speed < 10% of baseline → flag potential incident
│
▼
Traffic graph store (Redis): edge_id → current_speed, congestion_level
│
▼
Routing service reads edge weights from traffic graph at query time
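The aggregation step (stage 3 in the diagram) can be sketched in batch form. A real Flink/Dataflow job would do this incrementally over event-time windows with watermarks; here the probes are assumed to be already map-matched:

```python
from collections import defaultdict
from statistics import median

def aggregate_speeds(probes, window_s=60):
    """probes: (segment_id, timestamp_s, speed_kmh) tuples.
    Returns the median speed per (segment, 1-minute window) — the median
    resists outliers like a single stopped or speeding vehicle."""
    buckets = defaultdict(list)
    for seg, ts, speed in probes:
        buckets[(seg, ts // window_s)].append(speed)
    return {key: median(speeds) for key, speeds in buckets.items()}

probes = [("e42", 5, 48.0), ("e42", 20, 52.0), ("e42", 55, 9.0),
          ("e42", 70, 11.0)]
speeds = aggregate_speeds(probes)
print(speeds[("e42", 0)])  # median of 48, 52, 9 in the first window -> 48.0
print(speeds[("e42", 1)])  # second window -> 11.0 (possible incident)
```

The second window's sharp speed drop is the kind of signal stage 4 compares against the segment's historical baseline to flag incidents.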
6. ETA Prediction: From Dijkstra to ML
ETA prediction is a multi-layer problem. The base estimate comes from graph traversal (sum of edge travel times on the computed path), but this is systematically biased — traffic changes over the course of a trip, drivers stop at traffic lights, and historical patterns matter.
ETA Correction Layers
- Base graph ETA: Sum of (edge_length / current_speed) for all edges on the route. Includes typical turn delay constants (e.g., left turns average 12 seconds more than right turns in US cities).
- Traffic progression model: Predict how traffic will change during the trip using historical patterns by time-of-day, day-of-week, and special events. A trip starting at 4:55pm might encounter rush-hour traffic that doesn't exist when the query is made.
- ML correction: A gradient-boosted tree model trained on historical trip records adjusts the graph estimate based on: distance, time of day, day of week, origin/destination types, predicted weather, and error patterns for that specific route.
- Real-time updates: As the trip progresses, the ETA is continuously recalculated with fresh traffic data for remaining segments.
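The base layer is simple enough to write out. The turn-delay constants below are illustrative placeholders (the 12-second left-turn figure echoes the example above; the rest are invented):

```python
def base_eta_seconds(route, turn_delays):
    """Layer-1 ETA: sum of edge_length / current_speed for each edge on
    the route, plus a per-maneuver delay constant."""
    eta = 0.0
    for edge in route:
        eta += edge["length_m"] / edge["speed_mps"]    # free-flow/traffic travel time
        eta += turn_delays.get(edge.get("turn"), 0.0)  # turn penalty, if any
    return eta

turn_delays = {"left": 12.0, "right": 4.0}  # assumed averages, not real data
route = [
    {"length_m": 500,  "speed_mps": 10, "turn": "left"},
    {"length_m": 1200, "speed_mps": 20, "turn": "right"},
    {"length_m": 300,  "speed_mps": 10},
]
print(base_eta_seconds(route, turn_delays))  # 62 + 64 + 30 = 156.0 seconds
```

The later layers then treat this value as a feature to correct, not a final answer.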
7. Geocoding & Reverse Geocoding
Geocoding converts a human-readable address ("1600 Amphitheatre Pkwy, Mountain View, CA") to coordinates (37.4224, -122.0849). Reverse geocoding goes the other way. Both must handle abbreviations, typos, international address formats, and partial addresses.
Multi-Layer Geocoding Index
- Layer 1 — Exact match: Key-value lookup on normalized address string. Cache hit for most repeated queries (e.g., "Times Square, New York").
- Layer 2 — Token-based prefix search: Parse the address into tokens (house number, street name, city, zip). Look up each token in an inverted index. Score matches by how many tokens align and return top-5 candidates.
- Layer 3 — Fuzzy matching: For typos and alternative spellings, use edit-distance matching on street names with geographic proximity ranking (a "Mian St" query near Austin should prefer "Main St, Austin, TX" over "Main St, Austin, MN").
- Layer 4 — ML reranking: A neural model ranks candidates by combining text similarity, geographic coherence, and query context signals (user's current location, recent searches).
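The core of Layer 3 is plain edit distance. Production systems bound the distance and pre-filter candidates with n-gram indexes rather than scanning every street name, but the scoring primitive looks like this:

```python
def edit_distance(a, b):
    """Levenshtein distance (insert/delete/substitute) via the classic
    dynamic program, keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete from a
                           cur[j - 1] + 1,            # insert into a
                           prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = cur
    return prev[-1]

print(edit_distance("Mian St", "Main St"))  # transposed-letter typo -> 2
```

A distance of 2 on a 7-character query is close enough to surface "Main St" as a candidate, after which geographic proximity ranking picks the right city.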
8. Places Search & POI Data
Google Maps indexes hundreds of millions of places (businesses, landmarks, public facilities). A "coffee shop near me" query must combine text search with geographic proximity and quality ranking in under 100ms.
Places Search Architecture
- Index: Elasticsearch with geospatial support indexes all POIs. Each document contains: name, category, location (lat/lng), rating, review count, price tier, opening hours, and text embeddings.
- Query execution: Query is parsed into intent (category: "coffee") + location constraint (within 2km of user). Elasticsearch executes a geo-filtered query with BM25 text relevance.
- Ranking: A learning-to-rank model re-scores candidates using: text relevance, proximity, quality (rating × review count), business status (open now?), and personalization (user has visited before).
- Freshness: Business data changes constantly (hours change, businesses close). A change data pipeline from business owners + Google's web crawlers keeps data fresh, triggering Elasticsearch index updates via Kafka consumers.
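A hand-rolled stand-in for the ranking stage shows how the signals combine; the feature names, weights, and the 2 km distance-decay constant are all illustrative, not the production model:

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def rank_score(poi, user, w):
    """Linear blend of text relevance, proximity decay, quality, and status."""
    proximity = math.exp(-haversine_km(user, (poi["lat"], poi["lng"])) / 2.0)
    quality = poi["rating"] * math.log1p(poi["reviews"])  # damp review-count effect
    return (w["text"] * poi["bm25"] + w["prox"] * proximity
            + w["qual"] * quality + w["open"] * poi["open_now"])

w = {"text": 1.0, "prox": 3.0, "qual": 0.2, "open": 1.0}
user = (37.7749, -122.4194)
near = {"lat": 37.776, "lng": -122.42, "bm25": 2.0, "rating": 4.2,
        "reviews": 150, "open_now": 1}
far = {"lat": 37.80, "lng": -122.27, "bm25": 2.0, "rating": 4.2,
       "reviews": 150, "open_now": 1}
print(rank_score(near, user, w) > rank_score(far, user, w))  # True: proximity wins
```

The real system learns these weights from click and visit data instead of hard-coding them, but the feature shapes (exponential distance decay, log-damped review counts) are standard.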
9. Turn-by-Turn Navigation
Navigation differs from routing: instead of returning a single path, the system must guide the user in real time, detecting when they deviate and re-routing instantly.
Client-Server Split
The initial route and turn-by-turn instructions are downloaded to the client at session start. The client executes navigation logic locally (position matching to route, upcoming turn detection) without server calls — this enables offline navigation and reduces server load. Server involvement during active navigation:
- GPS position upload every 5 seconds (traffic probe contribution)
- Traffic update poll every 60 seconds (are there new incidents on my route?)
- Re-route request when user deviates more than 50m from route
- ETA update pushed from server when traffic significantly changes
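The deviation check in the third bullet runs entirely client-side. A minimal sketch measures distance to the nearest route vertex; a production client would measure point-to-polyline (segment) distance, and the 50 m threshold is the figure from the list above:

```python
import math

def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

def off_route(position, route_points, threshold_m=50.0):
    """True when the GPS fix is farther than threshold_m from every
    route vertex — the trigger for a re-route request."""
    return min(haversine_m(position, p) for p in route_points) > threshold_m

route = [(37.7749, -122.4194), (37.7753, -122.4189), (37.7757, -122.4184)]
print(off_route((37.7750, -122.4193), route))  # ~14 m off: still on route
print(off_route((37.7760, -122.4210), route))  # >50 m off: trigger re-route
```

GPS noise in urban canyons means real clients also debounce this signal (e.g. require several consecutive off-route fixes) before requesting a new route.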
10. Scale, Fault Tolerance & Cost
Tile Serving
- 99%+ of tile requests served from CDN cache
- Cache TTL: 24h for static tiles; 5min for tiles with live traffic overlays
- Tile storage: ~10 PB (all zoom levels, all regions)
- On CDN miss: origin serves from GCS; re-renders only if data changed
Routing Service
- Road graph loaded entirely in RAM per routing server (~50 GB)
- Routing servers are stateless; any server can answer any query
- Queries: 10M+ routing requests/day; peak ~500 QPS per server
- Routing graph updates: edge weights re-computed from traffic every 2min
11. Capacity Estimation & Conclusion
Back-of-Envelope: Tile Requests
- 1 billion MAU × 5 map sessions/day × 20 tile requests/session = 100 billion tile requests/day
- 100B ÷ 86,400 sec = ~1.16 million tile requests/second
- With 99% CDN cache hit ratio: 11,600 tile requests/second reach origin
- Each tile ≈ 15 KB (vector tile) → 1.16M × 15KB = 17.4 GB/s egress from CDN
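The estimate above can be reproduced in a few lines (the inputs are the same assumptions: 1B MAU, 5 sessions/day, 20 tiles/session, 99% cache hit, 15 KB/tile):

```python
# Back-of-envelope check of the tile-serving numbers.
mau, sessions_per_day, tiles_per_session = 1e9, 5, 20
daily = mau * sessions_per_day * tiles_per_session  # 1e11 requests/day
per_sec = daily / 86_400                            # requests/second at the CDN
origin = per_sec * 0.01                             # 1% CDN miss reaches origin
egress_gb_s = per_sec * 15e3 / 1e9                  # 15 KB per vector tile
print(f"{per_sec:,.0f} req/s, {origin:,.0f} req/s to origin, {egress_gb_s:.1f} GB/s")
```

The exact figures (1,157,407 req/s; 11,574 req/s to origin; 17.4 GB/s) round to the ~1.16M, ~11,600, and 17.4 GB/s quoted above.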
A maps platform is uniquely interesting because it is read-dominated at massive scale (tile serving) yet contains critical write-intensive subsystems (traffic ingestion). The CDN absorbs the read load; Contraction Hierarchies remains one of the most elegant graph-algorithm applications in production. And the ETA prediction system — graph algorithms layered with ML corrections — is a template for combining mathematical models and machine learning productively.