Designing a Food Delivery System at Scale: DoorDash Architecture, Courier Dispatch & ETA Prediction
Building a food delivery platform that handles millions of orders per day — with real-time courier dispatch, sub-second ETA updates, and four-nines reliability — is one of the most demanding system design challenges in industry. This guide works through every subsystem: from idempotent order placement and H3-indexed geospatial dispatch, to ML-powered ETA prediction and dynamic surge pricing that keeps supply and demand in balance.
TL;DR — Core Design Decisions
"Use an idempotent order service backed by a distributed saga, an H3 hex-grid dispatch engine with expanding ring search for courier matching, a Kafka-driven GPS pipeline for real-time location push, and a two-stage ML ETA model (prep time + travel time) that recalculates every 30 seconds. Partition every hot data path by geographic zone to contain blast radius and scale independently."
Table of Contents
- Functional & Non-Functional Requirements
- High-Level Architecture Overview
- Order Service — Cart, Checkout & Idempotent Payments
- Restaurant Service — Menu, Prep Time & Availability
- Courier Dispatch Engine — H3 Grid, Batching & Acceptance
- Real-Time Location Service — GPS Streams & WebSockets
- ETA Prediction — ML Features, Online Recalculation
- Dynamic Surge Pricing — Demand/Supply Ratio & Incentives
- Notification & Communication — Push, SMS, Rate Limiting
- Capacity Estimation — Peak Orders, GPS Events, Bandwidth
- Scalability & Reliability Patterns
- System Design Interview Checklist
1. Functional & Non-Functional Requirements
Before drawing boxes and arrows, nail down the exact problem scope. In a system design interview, requirements elicitation is the first thing the interviewer evaluates. Food delivery sits at the intersection of three real-time domains — ordering, logistics, and payments — all of which carry different SLA expectations.
Functional Requirements
- Customer flow: Browse restaurants by location, view menus, add items to cart, place order, pay, track order in real time, receive delivered notification.
- Restaurant flow: Accept/reject orders, update prep time estimate, mark order as ready for pickup, update menu availability in real time.
- Courier flow: Go online/offline, receive dispatch offer (with accept/decline), navigate to restaurant, pick up order, navigate to customer, confirm delivery.
- Operations: Dynamic surge pricing by zone and time window, order batching (one courier carries multiple orders from nearby restaurants to nearby customers), ETA displayed at checkout and updated continuously during delivery.
- Payments: Idempotent charge capture at order placement; refunds on failed/cancelled orders; tip adjustment post-delivery.
Non-Functional Requirements
| Dimension | Target | Notes |
|---|---|---|
| Availability | 99.99% (order path) | Multi-region active-active |
| Order placement latency | < 500 ms p99 | Async payment capture allowed |
| Dispatch offer latency | < 3 s from order confirmed | First courier offer sent |
| Location update frequency | Every 5 seconds (courier app) | Adaptive: 15 s when idle |
| ETA accuracy | ±3 minutes (80th percentile) | Customer-facing promise |
| Throughput | 500,000 concurrent active orders | Peak dinner rush globally |
| Consistency | Eventual (location), strong (payments) | Mixed model per domain |
2. High-Level Architecture Overview
A food delivery platform is best understood as a set of loosely coupled domain services communicating over an event bus, with geographic partitioning as the primary scaling axis.
The system is organized around six core domains, each owning its data store and communicating asynchronously through Kafka topics:
- Order Service: manages the order lifecycle (cart → placed → accepted → picked up → delivered → cancelled). Owns an RDBMS (PostgreSQL) for ACID guarantees on order state transitions.
- Restaurant Service: manages menus, operating hours, prep time estimates, and real-time availability signals. Uses a document store (DynamoDB) for flexible menu schemas.
- Dispatch Engine: stateful service that matches couriers to orders using geospatial search. Maintains courier positions in Redis with GEO commands and H3 index lookups.
- Location Service: ingests GPS pings from courier mobile apps via a high-throughput Kafka topic, stores last-known position in Redis, and fans out real-time updates to customers over WebSockets.
- ETA Service: combines prep time signals from restaurants with travel time estimates from the routing engine and ML models trained on historical delivery data.
- Pricing Service: computes surge multipliers per geographic zone based on real-time supply/demand ratio, persisted in Redis with short TTLs.
API Gateway & Client Communication
Customer and courier mobile apps communicate through a GraphQL API Gateway for request/response flows (menu queries, order placement, history). For real-time tracking, both apps maintain a persistent WebSocket connection to the Location Service through a dedicated WebSocket Gateway cluster. The WebSocket gateway is horizontally scaled and each instance maintains at most 50,000 concurrent connections, with sticky routing via a consistent hash on user_id.
3. Order Service — Cart, Checkout & Idempotent Payments
The order service is the most correctness-sensitive service in the stack. Duplicate charges, ghost orders, and lost state transitions are catastrophic — both financially and in customer trust. The architecture must guarantee exactly-once order creation even when mobile clients retry on network failures.
Idempotent Order Placement
Every POST /orders request from the client carries a client-generated idempotency_key (UUID v4). The server stores this key in a dedicated idempotency_keys table with the resulting order ID and response payload. On retry, the server looks up the key and returns the cached response — no downstream payment or database writes are executed.
-- Idempotency key table (PostgreSQL)
CREATE TABLE idempotency_keys (
idempotency_key UUID PRIMARY KEY,
order_id BIGINT NOT NULL,
response_status SMALLINT NOT NULL,
response_body JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ NOT NULL DEFAULT NOW() + INTERVAL '24 hours'
);
CREATE INDEX idx_ikey_expires ON idempotency_keys (expires_at);
-- Order placement (simplified pseudocode)
BEGIN;
-- Check for existing key
SELECT order_id, response_body FROM idempotency_keys
WHERE idempotency_key = $1 FOR UPDATE;
IF found THEN
RETURN cached_response; -- Idempotent replay
END IF;
-- Create the order
INSERT INTO orders (customer_id, restaurant_id, items, total, status)
VALUES ($2, $3, $4, $5, 'PENDING') RETURNING order_id;
-- Fire async payment capture via outbox pattern
INSERT INTO outbox_events (aggregate_id, event_type, payload)
VALUES (order_id, 'ORDER_PAYMENT_REQUESTED', $payload);
-- Record idempotency key (the PRIMARY KEY makes a concurrent duplicate abort)
INSERT INTO idempotency_keys (idempotency_key, order_id, response_status, response_body)
VALUES ($1, order_id, 202, $response);
COMMIT;
Order State Machine
Every order transitions through a well-defined state machine. Illegal transitions are rejected at the application layer — state updates are never accepted via direct database writes from any service other than the Order Service.
PENDING → [payment_captured] → CONFIRMED → [restaurant_accepted] → PREPARING → [courier_assigned] → COURIER_ASSIGNED → [courier_arrived_restaurant] → PICKED_UP → [courier_arrived_customer] → DELIVERED
PENDING → [payment_failed] → PAYMENT_FAILED
CONFIRMED / PREPARING → [customer/restaurant cancel] → CANCELLED → [refund_issued] → REFUNDED
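The transitions above can be encoded as a small whitelist so that illegal jumps are rejected at the application layer. A minimal sketch (Python; the guard structure is illustrative, not the actual Order Service code):

```python
# Legal transitions for the order state machine, mirroring the diagram above.
LEGAL_TRANSITIONS = {
    "PENDING": {"CONFIRMED", "PAYMENT_FAILED"},
    "CONFIRMED": {"PREPARING", "CANCELLED"},
    "PREPARING": {"COURIER_ASSIGNED", "CANCELLED"},
    "COURIER_ASSIGNED": {"PICKED_UP"},
    "PICKED_UP": {"DELIVERED"},
    "CANCELLED": {"REFUNDED"},
}

def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the transition is illegal."""
    if target not in LEGAL_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Because the whitelist is data, adding a new state (say, a partial-refund flow) is a table change rather than a code change.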
Payment Capture via Saga
Payment capture is orchestrated via a choreography-based saga using the Transactional Outbox Pattern. The Order Service writes a PAYMENT_REQUESTED event to the outbox table in the same database transaction that creates the order. A Debezium CDC connector streams the event to the payments.commands Kafka topic. The Payment Service consumes the event, calls the payment processor (Stripe/Adyen), and publishes either PAYMENT_SUCCEEDED or PAYMENT_FAILED. The Order Service listens and transitions state accordingly, triggering downstream dispatch only on success.
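The Order Service's reaction to payment events can be sketched as follows (Python; `emit()` and the topic names `dispatch.requests` / `notifications.requests` are hypothetical stand-ins for a Kafka producer and real topic names):

```python
OUTBOX = []  # stands in for the Kafka producer in this sketch

def emit(topic, key):
    # Placeholder for a transactional-outbox write / Kafka produce.
    OUTBOX.append((topic, key))

def handle_payment_event(order, event_type):
    """Transition order state in response to Payment Service saga events."""
    if event_type == "PAYMENT_SUCCEEDED":
        order["status"] = "CONFIRMED"
        emit("dispatch.requests", order["order_id"])       # kick off courier dispatch
    elif event_type == "PAYMENT_FAILED":
        order["status"] = "PAYMENT_FAILED"
        emit("notifications.requests", order["order_id"])  # notify the customer
    return order
```

The key property of the choreography style is visible here: the Order Service never calls the Payment Service directly; it only reacts to events, so either side can be redeployed or replayed independently.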
4. Restaurant Service — Menu Management, Prep Time & Availability
The Restaurant Service handles three distinct responsibilities with very different read/write characteristics: menu management (write-heavy during onboarding, read-heavy during browse), real-time availability (high-frequency "86-ing" — marking items sold out — during busy periods), and prep time estimation (a critical input to ETA accuracy).
Menu Storage & Serving
Menu schemas vary enormously across restaurant types — pizza has modifier groups (size, crust, toppings), sushi has combo options, coffee has temperature/milk/syrup combinations. A document store (DynamoDB with a restaurant_id partition key) is ideal for this schema flexibility. The canonical menu document is ~5–50 KB per restaurant and is cached aggressively in CDN edge nodes (CloudFront) with a 5-minute TTL. Cache invalidation is triggered by menu edit events published to Kafka.
Real-Time Item Availability
Restaurants mark items as unavailable (sold out) or pause entirely during rush periods. These signals must propagate to customer apps within seconds to prevent orders for unavailable items. Architecture:
- Restaurant tablet app sends availability updates to the Restaurant Service via a persistent HTTPS/2 connection with streaming.
- Restaurant Service writes the availability delta to Redis (item-level boolean flags with restaurant-scoped hash keys, e.g., `avail:{restaurant_id}`).
- An event is published to Kafka (`restaurant.availability.changed`), consumed by the Search Service to remove items from browse results within 10 seconds.
- Order validation at checkout re-checks availability against Redis to prevent race conditions — an order containing an unavailable item is rejected with a user-friendly error.
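The checkout-time re-check amounts to filtering the cart against the availability flags. A sketch (Python; the `availability` dict stands in for the Redis hash `avail:{restaurant_id}`, and the function name is illustrative):

```python
def validate_cart(cart_items, availability):
    """Return the item_ids that are unavailable; an empty list means checkout may proceed.

    `availability` maps item_id -> bool, mirroring the Redis hash flags.
    Items missing from the hash are treated as unavailable (fail closed).
    """
    return [item for item in cart_items if not availability.get(item, False)]
```

Failing closed on missing flags is deliberate: a stale or evicted key should block the order rather than risk charging for an item the kitchen cannot make.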
Prep Time Estimation
Prep time is one of the two primary inputs to ETA (alongside travel time). The Restaurant Service computes a dynamic prep time estimate using three signals:
- Restaurant-declared estimate: The restaurant tablet allows the operator to declare current kitchen queue depth (e.g., "orders backed up 15 min").
- Historical average: Per-restaurant, per-hour-of-week median prep time from the last 4 weeks. Stored in a time-series table (TimescaleDB).
- Real-time queue model: A lightweight M/M/1 queuing model using the current active order count and historical throughput rate gives a live queue wait estimate.
A weighted blend (40% declared, 30% historical, 30% queue model) is computed and published as a prep-time signal every 60 seconds to the ETA Service via Kafka. This signal is also used by the Dispatch Engine to decide when to send a courier offer — targeting arrival at the restaurant within 2 minutes of order readiness to minimize wait time.
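The blend and the queue-model signal can be sketched as follows (Python; the 40/30/30 weights are from the text, the M/M/1 formula is the standard expected time in system, and the function names are illustrative):

```python
def blended_prep_minutes(declared, historical, queue_model):
    """Blend the three prep-time signals: 40% declared, 30% historical, 30% queue model."""
    return 0.4 * declared + 0.3 * historical + 0.3 * queue_model

def mm1_wait_minutes(arrivals_per_min, service_rate_per_min):
    """M/M/1 expected time in system, W = 1 / (mu - lambda); infinite if saturated."""
    if service_rate_per_min <= arrivals_per_min:
        return float("inf")  # kitchen cannot keep up with incoming orders
    return 1.0 / (service_rate_per_min - arrivals_per_min)
```

Note the M/M/1 estimate degrades gracefully: as the kitchen approaches saturation the wait estimate grows sharply, which is exactly the signal the Dispatch Engine needs to delay courier offers.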
5. Courier Dispatch Engine — H3 Grid Search, Batching & Acceptance Flow
The Dispatch Engine is the algorithmic heart of the platform. Its job is to solve a continuous, real-time assignment problem: given a set of available couriers and a set of unassigned orders, find the optimal matching that minimizes total delivery time while maximizing courier utilization. At DoorDash scale, this must happen in under 3 seconds for millions of events per hour.
Geospatial Indexing with H3
Uber's H3 hierarchical hexagonal grid is the industry standard for geospatial dispatch. Every latitude/longitude coordinate is mapped to a hex cell at multiple resolutions. For courier dispatch, resolution 9 (average area ~0.1 km²) provides the right granularity for urban areas.
// Courier positions stored in Redis: one Set of courier_ids per H3 cell,
// refreshed on every position update.
// Key: h3:{resolution}:{h3_cell_address}
H3Core h3 = H3Core.newInstance();

// When a new order is placed at restaurant lat/lng:
String restaurantCell = h3.latLngToCellAddress(lat, lng, 9);

// Expanding ring search: k=1 (7 cells) → k=2 (19 cells) → k=3 (37 cells)
Set<String> candidateIds = Collections.emptySet();
for (int k = 1; k <= MAX_RING; k++) {
    List<String> diskCells = h3.gridDisk(restaurantCell, k);
    candidateIds = redis.sunion(
        diskCells.stream().map(c -> "h3:9:" + c).toArray(String[]::new));
    if (candidateIds.size() >= MIN_CANDIDATES) break;
}

// Score each candidate courier by: distance + prep_time_remaining + batching_bonus
List<Courier> candidates = loadCouriers(candidateIds);
candidates.sort(Comparator.comparingDouble(c -> scoreCourier(c, order)));

// Send offer to the top-scored courier with a 30-second acceptance window
dispatchOffer(candidates.get(0), order, /* offerTtlSeconds */ 30);
Key advantages of H3 over naive radius search: hex cells tile perfectly (no gaps or overlaps), neighbors are equidistant (no corner bias as with square grids), and the hierarchical structure allows resolution switching for different search radii without re-indexing. Courier position updates in Redis are O(1) set operations; ring lookups are O(k²) but with very small k in practice.
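The O(k²) ring-lookup cost follows directly from the disk-size formula: a gridDisk of radius k contains 1 + 3k(k+1) cells. A quick check (Python):

```python
def grid_disk_size(k: int) -> int:
    """Number of H3 cells in a disk of radius k: 1 center cell plus 6*i cells per ring i."""
    return 1 + 3 * k * (k + 1)
```

This matches the expanding-ring comment in the dispatch snippet: k=1 covers 7 cells, k=2 covers 19, k=3 covers 37.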
Order Batching & Stacking Algorithm
Order batching (combining multiple orders for one courier trip) dramatically improves unit economics: one courier delivering two orders on a single trip captures 60–80% of the margin of two separate trips at roughly half the variable cost. The batching algorithm runs on every newly confirmed order and evaluates potential stacks:
- Eligibility check: Both orders must have the same pickup zone (within 800m of each other), compatible delivery directions (delivery destinations within a 120° arc from pickup), and ETA impact < 8 minutes for any individual order.
- Route optimization: The routing engine (OSRM or Google Maps Routes API) computes the optimal sequence — pickup restaurant A → pickup restaurant B → deliver A → deliver B — and all permutations. Lowest total route duration wins.
- Stack size cap: Maximum 2 orders per stack in urban high-density, maximum 3 in suburban/rural zones where restaurant clustering is lower.
- Dynamic re-stacking: If a courier's first order is delayed (restaurant not ready), the system evaluates opportunistic addition of a second order that was just confirmed nearby.
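The eligibility check (pickup proximity plus the delivery-direction arc) can be sketched with standard great-circle formulas (Python; the 800 m and 120° thresholds come from the text, helper names are hypothetical, and the <8 min ETA-impact check is omitted because it requires the routing engine):

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def bearing_deg(lat1, lng1, lat2, lng2):
    """Initial bearing from point 1 to point 2, in [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lng2 - lng1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(y, x)) % 360

def batch_eligible(pickup_a, pickup_b, drop_a, drop_b,
                   max_pickup_m=800, max_arc_deg=120):
    """Pickups close together AND delivery bearings within the allowed arc."""
    if haversine_m(*pickup_a, *pickup_b) > max_pickup_m:
        return False
    diff = abs(bearing_deg(*pickup_a, *drop_a) - bearing_deg(*pickup_b, *drop_b))
    return min(diff, 360 - diff) <= max_arc_deg
```

In production this pre-filter runs before the (much more expensive) route-permutation step, so the routing engine only sees stacks that are plausibly compatible.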
Acceptance Flow & Fallback
The dispatch offer is sent to the courier app as a push notification and an in-app overlay. The courier has a configurable acceptance window (default 30 seconds). If declined or timed out:
- Offer is immediately sent to the next-ranked courier in the candidate list.
- After 3 declines/timeouts, the search ring expands by 2 additional rings and the order priority score is boosted.
- After 10 minutes without assignment, the Operations team is alerted and a manual intervention workflow activates.
- A courier's acceptance rate is tracked; consistently low acceptance rates trigger a review workflow — the system avoids penalizing couriers for market conditions (e.g., rain surge) vs genuine cherry-picking.
6. Real-Time Location Service — GPS Stream Ingestion & WebSocket Push
The Location Service handles one of the highest-throughput data streams in the platform. With 500,000 active couriers each sending a GPS ping every 5 seconds, that is 100,000 location events per second — sustained, with 3–5× peaks during dinner rush. This requires a purpose-built streaming pipeline, not a general-purpose REST API.
GPS Ingestion Pipeline
The courier mobile SDK sends location pings over a persistent WebSocket connection (not HTTP polling) to the Location Ingestion Gateway. This gateway is a thin, stateless Netty-based server that validates the payload, stamps a server timestamp, and produces to a partitioned Kafka topic courier.location.raw (partitioned by courier_id for in-order processing).
// Kafka message schema: courier.location.raw
{
"courier_id": "cour_8f3a91bc",
"lat": 40.712776,
"lng": -74.005974,
"accuracy_m": 4.2,
"speed_kmh": 18.3,
"heading_deg": 247,
"battery_pct": 68,
"client_ts_ms": 1712345678901,
"server_ts_ms": 1712345678950 // stamped by gateway
}
// Kafka topic config for 100k events/sec throughput
Topic: courier.location.raw
Partitions: 256 // courier_id % 256
Replication: 3
Retention: 24 hours (location data not needed long-term in raw form)
Compression: LZ4 // ~40% size reduction on GPS payloads
Stream Processing & Fan-out
A Kafka Streams (or Flink) application consumes courier.location.raw and performs three actions per event:
- Redis write: Updates the courier's last-known position in Redis (`GEOADD couriers lng lat courier_id`) for Dispatch Engine lookups. Also updates the H3 cell index for the new position.
- ETA trigger: If the courier is on an active delivery and has moved more than 100m since the last ETA calculation, publishes a recalculation event to `eta.recalculate.requests`.
- Customer push: If the courier is on an active delivery, fans out the sanitized location payload to an `order.location.{order_id}` Kafka topic. WebSocket Gateway workers subscribed to this topic push the update to the customer's WebSocket connection.
WebSocket Connection Management
The WebSocket Gateway is a cluster of stateful servers. Each customer app connects to one gateway node for the duration of an active order. The connection is established after order confirmation and terminated after the "Delivered" state is received. Key design decisions:
- Sticky routing: API Gateway uses consistent hashing on `order_id` to route all WebSocket traffic for an order to the same gateway node.
- Each gateway node subscribes to Kafka partitions for the order IDs of its connected customers — avoiding a pub/sub fan-out bottleneck.
- Heartbeat: client sends a ping every 30 seconds; server times out connections after 90 seconds of inactivity to reclaim resources.
- Reconnection: clients automatically reconnect with exponential backoff (1 s, 2 s, 4 s, 8 s, max 30 s). On reconnect, the last 5 location payloads are replayed from Kafka to avoid a stale map display.
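The reconnection schedule is plain capped exponential backoff; a one-liner makes the sequence explicit (Python, illustrative):

```python
def backoff_seconds(attempt, base=1, cap=30):
    """Delay before reconnect attempt `attempt` (1-indexed): 1, 2, 4, 8, ... capped at 30 s."""
    return min(base * 2 ** (attempt - 1), cap)
```

Production clients typically also add random jitter to each delay so that a gateway-node failure does not produce a synchronized reconnect stampede.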
7. ETA Prediction — ML Features, Model Architecture & Online Recalculation
ETA accuracy is the single metric most correlated with customer satisfaction and repeat ordering. A system that consistently shows 25 minutes but delivers in 35 minutes destroys trust faster than a system that accurately shows 35 minutes. ETA is fundamentally a machine learning problem because the underlying factors (traffic, restaurant throughput, courier skill) are too complex for rule-based estimation.
Two-Stage ETA Architecture
Total ETA = Prep Time ETA + Travel Time ETA (with courier wait buffer). Each component is modelled separately because they have different feature sets and update cadences:
| Component | Model Type | Key Features | Update Cadence |
|---|---|---|---|
| Prep Time | Gradient Boosted Trees (XGBoost) | Restaurant queue depth, order complexity, hour-of-day, day-of-week, current active orders | On order placement; every 2 min thereafter |
| Travel Time | Graph Neural Net + OSRM baseline | Current courier position, destination, real-time traffic, courier speed history, route congestion | Every 30 s or every 100m movement |
| Buffer | Empirical percentile model | Handoff variance, elevator wait, parking, weather | Static per zone; updated weekly |
Feature Engineering for Travel Time
The travel time model consumes a rich feature vector assembled in real time from multiple sources:
- Static road graph features: Road classification (highway/arterial/local), speed limits, turn restrictions, one-way segments — pre-computed from OSM data.
- Real-time traffic: Speed observations aggregated from thousands of active courier GPS traces on each road segment, updated every 2 minutes. This is better than third-party traffic APIs because it reflects exactly the roads couriers use.
- Courier behavioral features: Individual courier's average speed by road type, historical on-time rate, current fatigue level (shift duration), transport mode (bike/scooter/car).
- Temporal features: Time of day, day of week, school schedule, local event calendar (stadium events cause massive traffic spikes).
- Weather: Current rain/snow intensity (from weather API) — rain increases delivery time by 15–30% on average.
Online Recalculation Pipeline
ETA recalculation is triggered by the Location Service every 30 seconds for each active delivery. The ETA Service maintains a feature store (Redis + offline features in DynamoDB) to assemble the feature vector in <50 ms. Model inference runs on CPU-optimized instances using ONNX runtime — each inference takes <10 ms. The updated ETA is:
- Written to the Order Service (for persistence and the order tracking API).
- Pushed to the customer app via the WebSocket pipeline (if the change exceeds 2 minutes, to avoid noisy jitter).
- Logged to a time-series store for model training (actual delivery time vs predicted, per recalculation checkpoint).
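The jitter guard on customer-facing pushes can be sketched as (Python; the 2-minute threshold comes from the text, the function name is hypothetical):

```python
def should_push(displayed_eta_min, new_eta_min, threshold_min=2):
    """Surface a recalculated ETA only when it moves beyond the jitter threshold."""
    return abs(new_eta_min - displayed_eta_min) > threshold_min
```

Every recalculation is still persisted and logged for training; the threshold only suppresses the customer-visible update, so the displayed ETA stays stable without losing model feedback data.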
8. Dynamic Surge Pricing — Demand/Supply Ratio, Zone Pricing & Dasher Incentives
Surge pricing in food delivery has two economic goals: (1) dampen excess demand during peak periods by increasing customer prices, and (2) attract additional supply (couriers) to underserved zones via incentive bonuses. Importantly, customer price surge and courier incentives are independent levers — you may activate courier incentives without changing customer prices in markets where demand elasticity is low.
Zone-Level Supply/Demand Computation
The Pricing Service partitions every city into pricing zones (H3 resolution 6 hexagons, average area ~36 km²). Every 60 seconds, a Flink job computes for each zone:
// Supply/demand ratio per zone (computed every 60 seconds)
supply_score = (available_couriers * 60) / avg_delivery_time_min;
demand_score = new_orders_last_5_min * 12; // extrapolate to an hourly rate
sd_ratio = supply_score / demand_score;
// Surge multiplier lookup table (calibrated by market; first match wins)
IF sd_ratio > 1.5: multiplier = 1.0 // healthy supply; no surge
ELSE IF sd_ratio > 1.0: multiplier = 1.1 // mild pressure
ELSE IF sd_ratio > 0.8: multiplier = 1.25 // moderate surge
ELSE IF sd_ratio > 0.6: multiplier = 1.5 // high surge
ELSE: multiplier = 2.0 // extreme surge; also trigger incentives
// Courier incentive bonuses (independent of customer price)
IF available_couriers < demand_score * 0.5:
bonus_per_delivery = base_pay * 0.3 // 30% delivery bonus
push_notification to offline couriers in adjacent zones
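The lookup table above reads naturally as a first-match threshold chain; a small sketch making the evaluation order explicit (Python, illustrative):

```python
def surge_multiplier(sd_ratio):
    """First-match threshold table, evaluated from healthy supply to constrained."""
    if sd_ratio > 1.5:
        return 1.0    # healthy supply; no surge
    elif sd_ratio > 1.0:
        return 1.1    # mild pressure
    elif sd_ratio > 0.8:
        return 1.25   # moderate surge
    elif sd_ratio > 0.6:
        return 1.5    # high surge
    return 2.0        # extreme surge; also triggers courier incentives
```

The elif chain matters: with independent IF statements, a healthy ratio of 2.0 would fall through every branch down to the 1.5× bucket.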
Surge Price Display & Transparency
The surge multiplier is fetched from Redis (TTL 90 seconds) during checkout price computation. The customer app shows a "Busy period — delivery fee increased" banner with the current multiplier when surge is active. Design constraints:
- Surge must be applied at the time of checkout lock-in, not at delivery time (prevents bait-and-switch).
- The surge price shown at checkout is valid for a 10-minute window — the checkout session includes a server-side expiry timestamp and the price is locked in on order placement.
- Regulatory compliance: some markets (e.g., New York City) cap surge multipliers; Pricing Service has a per-market cap configuration.
9. Notification & Communication — Push, SMS, Email & Rate Limiting
Order state transitions generate notifications across multiple channels — customer push, restaurant tablet push, courier push, SMS fallback for critical states, and email for receipts. A dedicated Notification Service decouples notification logic from business logic and provides rate limiting, templating, channel fallback, and delivery tracking.
Notification Triggers by State Transition
| Event | Customer | Courier | Restaurant |
|---|---|---|---|
| ORDER_CONFIRMED | Push + Email (receipt) | — | Tablet push + sound alert |
| COURIER_ASSIGNED | Push (courier info + ETA) | Push (dispatch offer) | — |
| COURIER_ARRIVING_RESTAURANT | — | — | Tablet push (2-min warning) |
| PICKED_UP | Push (en route + ETA) | — | — |
| COURIER_NEARBY (500m) | Push + SMS fallback | — | — |
| DELIVERED | Push + Email (review prompt) | Push (earnings summary) | — |
Rate Limiting & Channel Fallback
Push notification delivery is not guaranteed (device offline, DND mode, permission revoked). The Notification Service implements:
- Channel priority: Push first, SMS fallback after 45-second delivery timeout for critical events (COURIER_NEARBY, DELIVERED). Email is supplementary, never a fallback for time-sensitive events.
- Global rate limits per user: Maximum 15 push notifications per order, 3 SMS messages per order. Prevents notification spam that leads to users disabling push entirely.
- Deduplication: Each notification has a deterministic `notification_id = sha256(order_id + event_type + epoch_minute)`. If an event triggers duplicate Kafka deliveries, the Notification Service deduplicates within a 5-minute window using a Redis SET.
- FCM/APNs rate limits: The service maintains per-app token send rate tracking to stay within FCM (600k/min) and APNs quotas. A token bucket in Redis governs burst behavior.
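The deterministic id and dedup window can be sketched as follows (Python; the `:` separator is an assumption, as the text only shows concatenation, and the in-memory set stands in for a Redis SET with a 5-minute TTL):

```python
import hashlib

def notification_id(order_id, event_type, epoch_minute):
    """Deterministic id per the formula in the text (separator is an assumption)."""
    raw = f"{order_id}:{event_type}:{epoch_minute}".encode()
    return hashlib.sha256(raw).hexdigest()

def deliver_once(seen, order_id, event_type, epoch_minute):
    """Return True if this notification should actually be sent (first sighting)."""
    nid = notification_id(order_id, event_type, epoch_minute)
    if nid in seen:        # in production: SADD + EXPIRE on a Redis SET
        return False
    seen.add(nid)
    return True
```

Because the id is derived purely from the event, any number of duplicate Kafka deliveries within the window hash to the same key and collapse to a single send.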
10. Capacity Estimation — Peak Orders, GPS Events & Bandwidth
Back-of-the-envelope estimation demonstrates you understand system scale and helps drive architectural decisions. Here is a worked example for a DoorDash-scale platform serving 10 major metros globally.
Order Volume
- Daily orders: 5 million (across all markets)
- Peak hour (6–8 PM local): 3× average = ~10,400 orders/minute ≈ 174 orders/second globally
- Active orders at peak: ~174 orders/s × 1,800 s average delivery time ≈ 313,000 in steady state (Little's law); provision for ~500,000 to absorb regional peak overlap and the delivery-time tail
- Order Service write throughput: ~174 order creations/sec plus roughly 10 state-transition writes per order — comfortably handled by a single PostgreSQL primary with read replicas for query traffic
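These figures can be sanity-checked with Little's law (L = λW); a quick back-of-the-envelope in Python, using the assumptions above:

```python
# Inputs mirror the assumptions in the bullets above.
DAILY_ORDERS = 5_000_000
PEAK_FACTOR = 3
AVG_DELIVERY_S = 30 * 60  # 30-minute average delivery time

avg_per_sec = DAILY_ORDERS / 86_400            # average order arrival rate
peak_per_sec = avg_per_sec * PEAK_FACTOR       # dinner-rush arrival rate
active_orders = peak_per_sec * AVG_DELIVERY_S  # Little's law: L = lambda * W
```

The steady-state result of roughly 313,000 active orders is why provisioning for 500,000 leaves healthy headroom rather than being the literal steady-state figure.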
GPS Event Volume
- Active couriers at peak: 500,000 (including those on pickup/delivery/waiting)
- GPS ping frequency: every 5 seconds → 100,000 events/second
- Payload size per event: ~200 bytes (JSON with compression) → 20 MB/s ingestion bandwidth
- Kafka storage (24 h retention, 3× replication): 20 MB/s × 86,400 s × 3 = ~5 TB/day — reasonable for 10-node Kafka cluster with 600 GB/node
WebSocket Connections & Bandwidth
- Customer WebSocket connections at peak: 500,000 (one per active order)
- Location update push to customers: 500,000 deliveries × one 200-byte update every 5 seconds = 100,000 messages/s ≈ 20 MB/s of payload outbound from the WebSocket Gateway cluster (more on the wire once WebSocket framing and TLS overhead are added)
- WebSocket Gateway sizing: 50,000 connections per node → 10 gateway nodes at peak, auto-scaling to 20 nodes with 2× headroom
- Redis for courier positions: 500,000 keys × 200 bytes = 100 MB of working set — fits in a single Redis shard with room to spare
11. Scalability & Reliability Patterns
Food delivery is a geo-local problem — orders in New York have zero direct interaction with orders in London. This geographic independence is the primary scaling axis. Every hot path in the system is designed to be partitioned by zone, with zone-level failover and independent scaling budgets.
Zone-Based Partitioning
Every service that processes order or location data is partitioned by a zone_id derived from the restaurant's H3 cell at resolution 4 (large hexagons averaging ~1,770 km²). This means:
- Kafka topics are partitioned by `zone_id`; Flink jobs can process each zone independently.
- The Dispatch Engine runs separate instances per zone — a failure in one zone's dispatch instance does not affect other zones.
- Database sharding for Orders and Location history uses `zone_id` as the shard key, enabling independent scaling of high-demand metros.
- Redis clusters for courier positions are zone-scoped — no cross-shard lookups needed for dispatch (dispatch is always local to a zone).
Circuit Breakers & Graceful Degradation
Every downstream dependency call is wrapped in a circuit breaker (Resilience4j). Defined degradation modes per failure scenario:
- ETA Service down: Fall back to a static lookup table (restaurant_type × time_of_day → p75 historical ETA). Display "approximately 30–45 min" rather than failing the order flow entirely.
- Routing engine down: Fall back to straight-line (haversine) distance × average speed estimate × 1.3 road-correction factor.
- Payment processor timeout: Do not block order placement. Place order in PENDING_PAYMENT state, queue retry via Kafka with exponential backoff (1 s, 2 s, 4 s). Alert on-call if unresolved in 30 seconds.
- Location Service overloaded: Drop GPS events older than 10 seconds on the Kafka consumer — stale position data is worse than slightly delayed position data because outdated coordinates mislead the dispatch engine.
- Restaurant Service down: Serve stale menu from CDN cache. Block new order placement for that restaurant if menu cache is >10 minutes stale (risk of unavailable items slipping through).
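The first degradation mode (the static ETA lookup for when the ETA Service is down) is essentially a table keyed by restaurant type and daypart with a wide catch-all. A sketch (Python; the table values are illustrative, not real calibration data):

```python
# Illustrative p75 fallback ranges in minutes, keyed by (restaurant_type, daypart).
FALLBACK_ETA_MIN = {
    ("pizza", "dinner"): (30, 45),
    ("coffee", "morning"): (15, 25),
}

def fallback_eta(restaurant_type, daypart):
    """Return an (low, high) ETA range in minutes when the ML ETA path is down.

    Unknown combinations fall back to the wide catch-all range shown to
    customers as "approximately 30-45 min".
    """
    return FALLBACK_ETA_MIN.get((restaurant_type, daypart), (30, 45))
```

Returning a range rather than a point estimate is deliberate: in degraded mode the system should communicate reduced confidence instead of a falsely precise number.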
Multi-Region Active-Active
The platform deploys across three AWS regions (us-east-1, eu-west-1, ap-southeast-1) in an active-active configuration. Traffic is routed to the nearest region via AWS Route 53 latency routing. Cross-region concerns:
- Orders: Each order is mastered in the region closest to the restaurant. Cross-region order routing is rejected at the API Gateway — all order mutations go to the origin region.
- Read replicas: Order history queries can be served by any region using Aurora Global Database with <1 second cross-region replication lag.
- Kafka: MirrorMaker 2 replicates analytics topics cross-region for reporting. Operational topics are region-local to avoid cross-region latency on the hot path.
- RTO/RPO: Order data RPO ≈ 0 within a region (synchronous replication across AZs); for a full region loss, RPO ≤ 1 second via asynchronous cross-region replication. If an AZ fails, RTO = 30 seconds (Aurora failover). If a region fails, RTO = 5 minutes (manual DNS failover to a secondary region).
12. System Design Interview Checklist
When asked to design a food delivery system in an interview, structure your answer in this order. The progression moves from requirements through the hardest subsystems, matching what interviewers at companies like DoorDash, Uber Eats, and Amazon typically probe for:
Food Delivery System Design Checklist
- ☐ Requirements (5 min): Clarify functional scope (customer/courier/restaurant), scale (orders/day, active couriers), SLAs (latency, ETA accuracy), and explicit non-goals (payments processor internals, driver background checks).
- ☐ Capacity estimation (3 min): Orders/sec at peak, GPS events/sec, WebSocket connections, storage growth. Name specific numbers.
- ☐ Core API design (5 min): Define `POST /orders` with idempotency key, `GET /orders/{id}/track`, `PUT /couriers/location`.
- ☐ Data model (5 min): Orders table (PostgreSQL), Menu document (DynamoDB/MongoDB), Courier positions (Redis GEO), Location history (TimescaleDB or Cassandra).
- ☐ Order service (5 min): Idempotent placement, Transactional Outbox for payment, state machine.
- ☐ Dispatch engine (8 min): H3 cell indexing in Redis, expanding ring search, scoring function, batch stacking, acceptance timeout and fallback cascade.
- ☐ Location & WebSocket (5 min): Kafka ingestion pipeline, stream processor for Redis + ETA trigger + customer push, WebSocket Gateway fan-out.
- ☐ ETA (5 min): Two-stage model (prep + travel), feature list, online recalculation trigger, graceful fallback.
- ☐ Surge pricing (3 min): S/D ratio computation, zone-level multipliers, courier incentive bonuses, regulatory caps.
- ☐ Reliability (5 min): Geographic zone partitioning, circuit breakers with named degradation modes, multi-region active-active, RTO/RPO targets.
- ☐ Bottlenecks & trade-offs (3 min): Proactively name the hard problems: dispatch latency under heavy load, ETA accuracy vs compute cost, GPS battery drain vs freshness trade-off.
Common Mistakes to Avoid
- Using a simple SQL query for courier search: `SELECT * FROM couriers WHERE ST_Distance(position, $restaurant) < 5000` will table-scan at scale. Use Redis GEO or an H3 index in Redis/Elasticsearch.
- Polling for location updates: HTTP polling from the customer app for location creates 10–100× unnecessary load vs a persistent WebSocket. Always push.
- Forgetting idempotency: Mobile networks are unreliable; retries are guaranteed. Order placement, payment capture, and state transitions must all be idempotent.
- Single-region architecture: A dinner rush incident in one region should not affect other regions. Geo-partitioning is not just a scaling optimization — it is a reliability requirement.
- Hard-coding delivery-only batching: A well-designed batch system should also support restaurant-side batching (grouping multiple orders from the same restaurant into a single courier trip pickup) — not just customer-side delivery batching.
- Ignoring the courier wait problem: If a courier arrives at a restaurant 15 minutes before the order is ready, they are idle and frustrated. The dispatch timing algorithm must account for restaurant prep time signal to dispatch couriers at the right moment, not just the nearest courier.