Designing an E-Commerce Platform at Scale: Catalog, Cart, Inventory & Order Management
E-commerce platforms are among the most complex distributed systems — they must handle millions of concurrent users, prevent overselling during flash sales, process payments atomically, and coordinate fulfillment across global warehouse networks. This guide covers every major subsystem from product catalog to order delivery tracking.
TL;DR — Core Architecture Decisions
"Product catalog: Elasticsearch for search, PostgreSQL for truth. Cart: Redis (24h TTL) with persistent backup in Cassandra. Inventory: Redis atomic DECR prevents oversell. Checkout: Saga pattern coordinates inventory → payment → fulfillment with compensation on failure. Orders: PostgreSQL + Kafka event bus."
Table of Contents
- Requirements & Scale Estimation
- Microservices Architecture Overview
- Product Catalog — Search, Filtering & CDN
- Shopping Cart — Redis & Persistence Strategy
- Inventory Management — Oversell Prevention
- Flash Sale — Handling 100K TPS Spikes
- Checkout Saga — Distributed Transaction
- Order Management & Fulfillment
- Pricing Engine & Promotions
- Scaling Strategy & Data Partitioning
- Design Checklist & Conclusion
1. Requirements & Scale Estimation
Functional Requirements
- Users browse, search, and filter products from a catalog of 500M+ SKUs
- Users add items to cart, apply coupons, and check out
- Inventory is reserved atomically during checkout to prevent overselling
- Orders are processed, tracked, and fulfilled across global warehouses
- Flash sales with extreme traffic spikes (100K users simultaneously competing for 1K items)
- Sellers can list products, manage inventory, and view sales analytics
Scale Estimates (Amazon-like)
| Metric | Value | Notes |
|---|---|---|
| DAU | 100M | Large consumer platform |
| Product catalog | 500M SKUs | ~1TB product data |
| Orders/day | 5M | ~58 orders/sec average |
| Peak checkout TPS | 10,000 | Black Friday peak |
| Flash sale peak TPS | 100,000+ | Requires queue/rate limiting |
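The averages in the table follow from simple division, and the same arithmetic shows how far peak load sits above the average. A minimal sketch, where the class and method names are illustrative and not part of any library:

```java
final class ScaleEstimate {
    private static final double SECONDS_PER_DAY = 86_400.0;

    // Average events per second for a given daily volume.
    static double perSecond(long perDay) {
        return perDay / SECONDS_PER_DAY;
    }

    // How many times larger the peak rate is than the daily average rate.
    static double peakToAverage(double peakTps, long perDay) {
        return peakTps / perSecond(perDay);
    }
}
```

With 5M orders/day the average is about 58 orders/sec, so a 10,000 TPS checkout peak is roughly 173 times the average. That gap is the reason capacity planning is driven by peak figures, not daily means.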
2. Microservices Architecture Overview
Each service owns its domain and database. Services communicate synchronously (gRPC/REST) only when an immediate response is needed (inventory check, price lookup). All post-checkout operations (fulfillment, notifications, analytics) are driven by Kafka events for resilience.
3. Product Catalog — Search, Filtering & CDN
Dual Storage: PostgreSQL + Elasticsearch
Product data lives in two stores simultaneously:
- PostgreSQL (source of truth): Product master data — name, description, price, seller, categories, SKUs. ACID transactions for write correctness. Sharded by seller_id (multi-tenant partitioning).
- Elasticsearch (search index): Denormalized copy optimized for full-text search, faceted filtering, and ranked results. Updated via Debezium CDC from PostgreSQL → Kafka → Elasticsearch consumer.
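The consumer at the end of the CDC pipeline has to turn each change event into an index operation. The sketch below assumes Debezium's change envelope with its `op`, `before`, and `after` fields; the `IndexAction` record, the `CatalogCdcMapper` class, and the `product_id` column name are hypothetical, and a real consumer would pass the result to an Elasticsearch client.

```java
import java.util.Map;

// Hypothetical result type: what the consumer should do to the search index.
record IndexAction(String kind, String productId, Map<String, Object> document) {}

final class CatalogCdcMapper {
    // Maps one Debezium-style change envelope to an index action.
    // Debezium sets op to "c" (create), "u" (update), "d" (delete) or "r" (snapshot read).
    @SuppressWarnings("unchecked")
    static IndexAction toIndexAction(Map<String, Object> envelope) {
        String op = (String) envelope.get("op");
        if ("d".equals(op)) {
            // Deletes carry the old row in "before"; only the id is needed.
            Map<String, Object> before = (Map<String, Object>) envelope.get("before");
            return new IndexAction("delete", String.valueOf(before.get("product_id")), Map.of());
        }
        if ("c".equals(op) || "u".equals(op) || "r".equals(op)) {
            // Creates, updates and snapshot reads all become an upsert of the new row.
            Map<String, Object> after = (Map<String, Object>) envelope.get("after");
            return new IndexAction("upsert", String.valueOf(after.get("product_id")), after);
        }
        throw new IllegalArgumentException("Unsupported op: " + op);
    }
}
```

Treating creates and updates as the same upsert keeps the consumer idempotent, which matters because Kafka may redeliver an event.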
Elasticsearch product document:

```json
{
  "product_id": "B08N5WRWNW",
  "title": "Apple iPhone 15 Pro 256GB",
  "brand": "Apple",
  "categories": ["Electronics", "Phones", "Smartphones"],
  "price": 999.99,
  "rating": 4.8,
  "review_count": 12849,
  "in_stock": true,
  "variants": [
    {"color": "Natural Titanium", "storage": "256GB", "sku_id": "SKU-001"}
  ],
  "features": ["5G", "USB-C", "Action Button"],
  "image_urls": ["https://cdn.store.com/B08N5WRWNW/main.webp"],
  "seller_id": "seller_apple_official"
}
```
Search query with facets:

```
GET /products/_search
{
  "query": { "multi_match": { "query": "iphone 15", "fields": ["title^3", "brand^2", "features"] } },
  "aggs": {
    "brands": { "terms": { "field": "brand" } },
    "price_ranges": { "range": { "field": "price", "ranges": [...] } }
  },
  "sort": [{ "_score": "desc" }, { "rating": "desc" }]
}
```
Product Page Caching
- Static product metadata (title, images, description): CDN edge cache with 1-hour TTL — invalidated on price/availability change
- Dynamic data (live price, stock count, reviews): Short-lived Redis cache (30s TTL) or served directly from DB
- Images: S3 + CloudFront with 30-day cache; served as WebP with multiple resolutions (400px, 800px, 1600px)
- Product recommendations: Pre-computed ML recommendations cached in Redis per user (4h TTL); generated by a nightly batch job
4. Shopping Cart — Redis & Persistence Strategy
The shopping cart is read extremely frequently (on every page view in a shopping session) but has low durability requirements: a lost cart is annoying but not catastrophic. This makes Redis a good fit as the primary store.
```
// Cart stored as a Redis Hash
// Key: cart:{user_id}
// Field: {sku_id}, Value: JSON {qty, price (at time of add), added_at}
HSET cart:user_abc123 SKU-001 '{"qty":2,"price":999.99,"added_at":1712300000}' SKU-002 '{"qty":1,"price":49.99,"added_at":1712300100}'

// 24h TTL; reset on every cart modification
EXPIRE cart:user_abc123 86400

// Read full cart
HGETALL cart:user_abc123

// Add or update quantity
HSET cart:user_abc123 SKU-001 '{"qty":3,...}'

// Remove item
HDEL cart:user_abc123 SKU-001
```
Cart Persistence for Logged-In Users
- On add-to-cart: write to Redis immediately (sync) + publish to Kafka (async)
- Kafka consumer persists cart to Cassandra (eventually consistent backup)
- On login: merge anonymous guest cart with persisted logged-in cart (union, keeping higher quantity)
- Price staleness: prices stored in cart at time of add; on checkout, validate current prices and alert user if changed > 5%
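The merge-on-login rule and the 5% price check are both small pure functions. The following is a minimal sketch of each; the `CartMerge` class is hypothetical, carts are reduced to SKU-to-quantity maps, and prices are assumed to be held in cents.

```java
import java.util.HashMap;
import java.util.Map;

final class CartMerge {
    // Union of both carts; when a SKU appears in both, keep the higher quantity.
    static Map<String, Integer> merge(Map<String, Integer> guest, Map<String, Integer> persisted) {
        Map<String, Integer> merged = new HashMap<>(persisted);
        guest.forEach((sku, qty) -> merged.merge(sku, qty, Integer::max));
        return merged;
    }

    // True when the current price differs from the price captured at add time
    // by more than 5%. Integer cents avoid floating-point rounding on money.
    static boolean priceChangedBeyondThreshold(long priceAtAddCents, long currentCents) {
        long diff = Math.abs(currentCents - priceAtAddCents);
        return diff * 100 > priceAtAddCents * 5;
    }
}
```

Keeping the higher quantity errs on the side of not dropping anything the user added; a platform could just as reasonably sum the quantities or prefer the most recent cart.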
5. Inventory Management — Oversell Prevention
Inventory is the most consistency-critical part of e-commerce. Selling more items than you have in stock leads to order cancellations, customer trust loss, and potential legal liability.
Two-Phase Reservation
- Soft reservation (add to cart): DECR the Redis counter. If the result is ≥ 0, the item is soft-reserved for 15 minutes; if it is negative, INCR to undo the decrement and report the item as out of stock. When the reservation's TTL expires, the unit is released automatically.
- Hard reservation (checkout initiated): Move from soft to hard reservation in a DB transaction. The hard reservation holds inventory while payment processes (typically < 30 seconds).
- Commit (payment succeeded): Deduct inventory permanently in PostgreSQL. Publish an inventory.deducted event.
- Rollback (payment failed / timeout): INCR the Redis counter + update the DB reservation status. Inventory is made available again.
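The reservation lifecycle above can be sketched as a small state machine. This version is an in-memory stand-in written under stated assumptions: an `AtomicLong` plays the role of the Redis counter, maps play the role of the reservation records, and the `ReservationBook` class and its method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for the Redis counter plus reservation records.
final class ReservationBook {
    enum State { SOFT, HARD, COMMITTED }

    private static final long SOFT_TTL_MILLIS = 15 * 60 * 1000L;
    private final AtomicLong available;
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, State> states = new ConcurrentHashMap<>();
    private final Map<Long, Long> softExpiry = new ConcurrentHashMap<>();

    ReservationBook(long stock) { this.available = new AtomicLong(stock); }

    // Soft reservation: decrement first, then undo if the counter went negative.
    Optional<Long> softReserve(long nowMillis) {
        if (available.decrementAndGet() < 0) {
            available.incrementAndGet();
            return Optional.empty();
        }
        long id = nextId.incrementAndGet();
        states.put(id, State.SOFT);
        softExpiry.put(id, nowMillis + SOFT_TTL_MILLIS);
        return Optional.of(id);
    }

    // Checkout started: the hold no longer expires on its own.
    boolean harden(long id) {
        boolean ok = states.replace(id, State.SOFT, State.HARD);
        if (ok) softExpiry.remove(id);
        return ok;
    }

    // Payment succeeded: the unit is permanently sold.
    boolean commit(long id) { return states.replace(id, State.HARD, State.COMMITTED); }

    // Payment failed or timed out: return the unit to the pool.
    boolean rollback(long id) {
        if (!states.remove(id, State.HARD)) return false;
        available.incrementAndGet();
        return true;
    }

    // Releases soft holds whose 15-minute window has passed; returns how many were freed.
    int releaseExpired(long nowMillis) {
        int freed = 0;
        for (Map.Entry<Long, Long> e : softExpiry.entrySet()) {
            if (e.getValue() <= nowMillis && states.remove(e.getKey(), State.SOFT)) {
                softExpiry.remove(e.getKey());
                available.incrementAndGet();
                freed++;
            }
        }
        return freed;
    }

    long available() { return available.get(); }
}
```

Each transition is a conditional update on the current state, so a hold that was hardened just before its TTL fired cannot also be released by the expiry sweep.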
```sql
-- Inventory table: one row per SKU per warehouse
CREATE TABLE inventory (
    sku_id        UUID   NOT NULL,
    warehouse_id  UUID   NOT NULL,
    total_qty     INT    NOT NULL,
    reserved_qty  INT    NOT NULL DEFAULT 0,  -- soft + hard reservations
    available_qty INT    GENERATED ALWAYS AS (total_qty - reserved_qty) STORED,
    version       BIGINT NOT NULL DEFAULT 0,  -- optimistic locking
    PRIMARY KEY (sku_id, warehouse_id)
);

-- Atomic reservation with optimistic locking
UPDATE inventory
SET reserved_qty = reserved_qty + :qty, version = version + 1
WHERE sku_id = :sku_id
  AND warehouse_id = :warehouse_id
  AND available_qty >= :qty          -- check atomically
  AND version = :expected_version;
-- If 0 rows affected: either out of stock or concurrent update (re-read and retry)
```
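The caller's side of optimistic locking is a read, check, conditional write, and retry loop. The sketch below simulates it in memory: an `AtomicReference` holds the row and `compareAndSet` stands in for the `version = :expected_version` predicate. The `InventoryRow` record and `OptimisticReserver` class are hypothetical, and a real implementation would issue the UPDATE and inspect the affected row count.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of one inventory row (in-memory stand-in for the table).
record InventoryRow(int totalQty, int reservedQty, long version) {
    int availableQty() { return totalQty - reservedQty; }
}

final class OptimisticReserver {
    enum Result { RESERVED, OUT_OF_STOCK, TOO_MUCH_CONTENTION }

    // Read, check, and conditionally write; retry only when another writer won the race.
    static Result reserve(AtomicReference<InventoryRow> row, int qty, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            InventoryRow seen = row.get();
            if (seen.availableQty() < qty) {
                return Result.OUT_OF_STOCK; // retrying cannot help
            }
            InventoryRow next =
                new InventoryRow(seen.totalQty(), seen.reservedQty() + qty, seen.version() + 1);
            // Succeeds only if nobody replaced the snapshot we read.
            if (row.compareAndSet(seen, next)) {
                return Result.RESERVED;
            }
        }
        return Result.TOO_MUCH_CONTENTION;
    }
}
```

Separating "out of stock" from "lost the race" matters: the first is a final answer, while the second is worth retrying, ideally with a bounded attempt count so hot SKUs do not spin forever.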
6. Flash Sale — Handling 100K TPS Spikes
Flash sales are designed to create urgency — 1000 iPhones at 50% off for 10 minutes. The resulting traffic spike (100K users fighting for 1K units) requires a completely different architecture than normal e-commerce traffic.
Flash Sale Architecture
- Pre-load stock to Redis: 10 minutes before the sale, set flash_inventory:sale_id = 1000 in Redis
- Request queue: Incoming checkout requests are enqueued in a Redis List (LPUSH). Queue length = 3× stock (3000). Reject requests beyond queue capacity with "sale sold out" immediately.
- Virtual waiting room: Show users their queue position and estimated wait time; this reduces frustration and retry storms
- Worker drains queue: Workers pop from the queue (RPOP), call DECR flash_inventory:sale_id, and process the request if the result is ≥ 0
- Async persistence: Successful reservations published to Kafka → DB consumer persists to PostgreSQL
- Early termination: Stop queue processing when the counter hits 0; reject all remaining requests
```java
// Flash sale checkout handler
public FlashSaleResult attemptPurchase(String saleId, String userId, int qty) {
    // 1. Check queue capacity (reject early once the queue is full)
    Long queueLen = redis.llen("flash_queue:" + saleId);
    if (queueLen != null && queueLen >= maxQueueSize) {
        return FlashSaleResult.SOLD_OUT;
    }

    // 2. Rate limit per user (max 1 attempt per 5 seconds).
    // SETNX followed by EXPIRE is not atomic: if the process dies between the two
    // calls, the key never expires. Prefer a single SET ... NX EX 5 where the client allows it.
    String rateLimitKey = "flash_limit:" + saleId + ":" + userId;
    Boolean allowed = redis.setNX(rateLimitKey, "1");
    if (!Boolean.TRUE.equals(allowed)) return FlashSaleResult.RATE_LIMITED; // null-safe
    redis.expire(rateLimitKey, 5);

    // 3. Atomically decrement inventory; a negative result means stock ran out
    Long remaining = redis.decrBy("flash_inventory:" + saleId, qty);
    if (remaining != null && remaining < 0) {
        redis.incrBy("flash_inventory:" + saleId, qty); // refund counter
        return FlashSaleResult.SOLD_OUT;
    }

    // 4. Publish async order creation
    kafka.publish("flash.orders", new FlashOrderEvent(saleId, userId, qty));
    return FlashSaleResult.SUCCESS;
}
```
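The handler above decrements the counter directly; the queue-and-worker flow from the architecture list is a separate piece. A single-threaded sketch of it follows, assuming a `Deque` in place of the Redis List and a plain field in place of the Redis counter. The `FlashSaleQueue` class is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Single-threaded sketch of the admission queue; a Deque stands in for the Redis List.
final class FlashSaleQueue {
    private final Deque<String> queue = new ArrayDeque<>();
    private final int capacity;
    private long stock;

    FlashSaleQueue(long stock) {
        this.stock = stock;
        this.capacity = (int) (stock * 3); // queue length = 3x stock
    }

    // Returns the 1-based queue position, or -1 when the queue is full or stock is gone.
    int enqueue(String userId) {
        if (stock <= 0 || queue.size() >= capacity) return -1;
        queue.addLast(userId);
        return queue.size();
    }

    // Serves requests in arrival order until stock hits zero, then rejects the rest.
    List<String> drain() {
        List<String> winners = new ArrayList<>();
        while (!queue.isEmpty() && stock > 0) {
            winners.add(queue.pollFirst());
            stock--;
        }
        queue.clear(); // early termination: remaining requests are rejected
        return winners;
    }
}
```

The position returned by `enqueue` is what a virtual waiting room would show the user. Capping the queue at a small multiple of stock means most of the 100K requests are rejected at the edge without touching inventory at all.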
7. Checkout Saga — Distributed Transaction
Checkout spans multiple services: inventory reservation, payment, and order creation. There's no global transaction across these services — we use the Saga pattern with compensating transactions for failure recovery.
Checkout Saga Steps
| Step | Action | Compensating Action |
|---|---|---|
| 1 | Validate cart & prices | — |
| 2 | Hard-reserve inventory | Release reservation |
| 3 | Apply coupon / calculate final price | Un-apply coupon usage |
| 4 | Charge payment (PSP) | Refund charge |
| 5 | Create order record | Cancel order (status: cancelled) |
| 6 | Confirm inventory deduction, clear cart | — |
Orchestration vs. Choreography: For checkout, use orchestration (a dedicated Checkout Orchestrator service manages the saga state machine) rather than choreography — it's easier to reason about failure modes, implement timeouts, and perform compensation in a single place.
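The core of an orchestrator is a loop that runs steps in order and, on failure, runs the compensations of the completed steps in reverse. A minimal in-process sketch, assuming each step is a pair of callbacks; the `CheckoutSaga` class and `Step` record are hypothetical, and a production orchestrator would also persist saga state so it can resume after a crash.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class CheckoutSaga {
    // One saga step: a forward action plus the compensation that undoes it.
    // Both are placeholder callbacks; real ones would call other services.
    record Step(String name, Runnable action, Runnable compensation) {}

    // Runs steps in order. On failure, compensates the completed steps
    // in reverse order and returns false.
    static boolean run(List<Step> steps, List<String> log) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
                log.add("done:" + step.name());
            } catch (RuntimeException e) {
                log.add("failed:" + step.name());
                while (!completed.isEmpty()) {
                    Step undo = completed.pop();
                    undo.compensation().run();
                    log.add("compensated:" + undo.name());
                }
                return false;
            }
        }
        return true;
    }
}
```

If the payment step throws after inventory and coupon steps succeeded, the log reads: reserve done, coupon done, charge failed, coupon compensated, reserve compensated. The failed step itself is not compensated because it never completed.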
8. Order Management & Fulfillment
Order State Machine
Orders follow a strict state machine: placed → confirmed → packed → shipped → delivered → completed, with cancelled as the exit state for orders that do not complete. Each transition is triggered by an event (from fulfillment, carrier, or user) and stored as an immutable audit log entry. The current state is the latest entry.
```sql
-- PostgreSQL has no inline ENUM(...) column syntax; declare the type first
CREATE TYPE order_status AS ENUM (
    'placed', 'confirmed', 'packed', 'shipped', 'delivered', 'completed', 'cancelled'
);

CREATE TABLE orders (
    -- gen_random_uuid() yields UUIDv4; PostgreSQL 18+ also provides uuidv7() for time-ordered IDs
    order_id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id          UUID NOT NULL,
    status           order_status NOT NULL DEFAULT 'placed',
    total_amount     BIGINT NOT NULL,  -- cents
    currency         CHAR(3) NOT NULL,
    shipping_address JSONB,
    created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE order_events (  -- immutable event log
    event_id   UUID PRIMARY KEY,
    order_id   UUID NOT NULL REFERENCES orders(order_id),
    event_type TEXT NOT NULL,  -- 'status_changed', 'tracking_updated', etc.
    payload    JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
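Enforcing the state machine in application code keeps invalid transitions out of the event log. The sketch below encodes the allowed transitions as an enum. One labeled assumption: cancellation is permitted only before shipping, since the schema lists a cancelled status but the text does not say when it is reachable.

```java
import java.util.EnumSet;
import java.util.Set;

enum OrderStatus {
    PLACED, CONFIRMED, PACKED, SHIPPED, DELIVERED, COMPLETED, CANCELLED;

    // Allowed next states. Assumption: cancellation is possible only before shipping.
    Set<OrderStatus> next() {
        switch (this) {
            case PLACED:    return EnumSet.of(CONFIRMED, CANCELLED);
            case CONFIRMED: return EnumSet.of(PACKED, CANCELLED);
            case PACKED:    return EnumSet.of(SHIPPED, CANCELLED);
            case SHIPPED:   return EnumSet.of(DELIVERED);
            case DELIVERED: return EnumSet.of(COMPLETED);
            default:        return EnumSet.noneOf(OrderStatus.class); // terminal states
        }
    }

    boolean canTransitionTo(OrderStatus target) {
        return next().contains(target);
    }
}
```

An order service would call `canTransitionTo` before appending a status_changed event, rejecting out-of-order carrier webhooks such as a delivered event arriving for an order that was never shipped.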
Warehouse Routing & Fulfillment
- Warehouse selection: Choose warehouse closest to buyer with sufficient stock. Use geo-proximity scoring + stock availability query.
- Split shipments: If no single warehouse has all items, split order across multiple warehouses (each becomes a separate shipment)
- Carrier selection: Choose carrier based on SLA, cost, and historical on-time delivery rate for the destination zone
- Tracking updates: Carrier webhooks → Kafka → Order Service → notify user via email/SMS/app
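Warehouse selection reduces to filtering by stock and ranking by distance. The sketch below uses the haversine formula for great-circle distance as the proximity score; the `WarehouseRouter` class and `Warehouse` record are hypothetical, and a real scorer would likely also weigh shipping cost and carrier SLAs.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class WarehouseRouter {
    record Warehouse(String id, double lat, double lon, int stock) {}

    // Great-circle distance in kilometres (haversine formula, Earth radius 6371 km).
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 6371.0 * Math.asin(Math.sqrt(a));
    }

    // Nearest warehouse that can cover the whole quantity.
    // An empty result means no single warehouse can, so the order must be split.
    static Optional<Warehouse> pick(List<Warehouse> all, double lat, double lon, int qty) {
        return all.stream()
                .filter(w -> w.stock() >= qty)
                .min(Comparator.comparingDouble(w -> distanceKm(lat, lon, w.lat(), w.lon())));
    }
}
```

An empty `Optional` is the signal to fall back to the split-shipment path described above.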
9. Pricing Engine & Promotions
Pricing in e-commerce is complex: base price, sale price, coupon discounts, loyalty points, bundle deals, and dynamic repricing must all be applied correctly and atomically.
Coupon Design
- Coupon storage: Redis SET coupon:{code} stores coupon config (discount %, expiry, usage limit, eligible product categories)
- Usage deduplication: Redis SETNX coupon_used:{code}:{user_id} prevents double-use per user
- Global usage limit: Redis INCR with a limit check (e.g., first 500 users only); atomic even under high concurrency
- Coupon validation at checkout: Validate in Checkout Service before applying to total; never trust client-side discount calculations
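The per-user marker and the global counter combine into one redemption routine. The sketch below is an in-memory stand-in so it runs without a server: a concurrent set plays the role of the SETNX keys and an `AtomicLong` plays the role of the INCR counter. The `CouponRedeemer` class is hypothetical and models a single coupon's usage limit.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for the two Redis keys: a per-user marker and a global counter.
final class CouponRedeemer {
    enum Result { APPLIED, ALREADY_USED, LIMIT_REACHED }

    private final Set<String> usedBy = ConcurrentHashMap.newKeySet(); // like SETNX coupon_used:{code}:{user_id}
    private final AtomicLong redemptions = new AtomicLong();          // like INCR on a global counter
    private final long limit;

    CouponRedeemer(long limit) { this.limit = limit; }

    Result redeem(String code, String userId) {
        String marker = code + ":" + userId;
        // add() returns false if the marker already existed, like SETNX returning 0.
        if (!usedBy.add(marker)) return Result.ALREADY_USED;
        // Increment first, then check; undo both writes if the limit was exceeded.
        if (redemptions.incrementAndGet() > limit) {
            redemptions.decrementAndGet();
            usedBy.remove(marker);
            return Result.LIMIT_REACHED;
        }
        return Result.APPLIED;
    }
}
```

Incrementing before checking is what makes the limit safe under concurrency: two racing requests each receive a distinct counter value, so at most `limit` of them can see a value within bounds. The same undo steps would serve as the saga's "un-apply coupon usage" compensation.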
Dynamic Pricing
For marketplace platforms (Amazon-style), sellers set their own prices. Dynamic repricing algorithms adjust prices based on competitor pricing and demand elasticity. Prices are updated in PostgreSQL + synced to Elasticsearch and CDN cache via invalidation events.
10. Scaling Strategy & Data Partitioning
Database Partitioning Strategy
| Service | Database | Sharding Key | Rationale |
|---|---|---|---|
| Products | PostgreSQL | seller_id | All of a seller's products co-located |
| Orders | PostgreSQL | user_id | User sees all their orders on one shard |
| Inventory | PostgreSQL + Redis | sku_id | High contention items spread across shards |
| Cart | Redis + Cassandra | user_id | Session affinity for fast access |
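Routing a request to its shard is a deterministic function of the sharding key. A minimal sketch using hash-modulo placement; the `ShardRouter` class is hypothetical, and the choice of modulo over consistent hashing is an assumption made for brevity.

```java
final class ShardRouter {
    // Maps a sharding key (seller_id, user_id or sku_id) to one of shardCount shards.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int shardFor(String key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }
}
```

Plain modulo moves most keys when `shardCount` changes, so platforms that expect to add shards often use consistent hashing or a lookup table of key ranges instead.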
Read Scaling
- Product listing pages: served from Elasticsearch + CDN (99% of product traffic is reads)
- Order history: read replicas for order queries; user-facing queries never hit write primary
- Inventory availability (product page): Redis cache with 30s TTL; approximate count acceptable for display ("only 5 left!")
- Inventory at checkout: always read from the write primary for accuracy before reserving
11. Design Checklist & Conclusion
E-Commerce System Design Checklist
- ☐ Product catalog: PostgreSQL (write truth) + Elasticsearch (search) + CDC sync
- ☐ Cart stored in Redis Hash with 24h TTL; persisted async to Cassandra via Kafka
- ☐ Inventory uses two-phase reservation (soft at cart, hard at checkout)
- ☐ Atomic inventory check-and-reserve via Lua script or DB optimistic locking
- ☐ Flash sale uses Redis DECR + request queue + virtual waiting room
- ☐ Checkout uses saga pattern with orchestrator (not choreography) for clarity
- ☐ Payment service is idempotent (idempotency keys for all PSP calls)
- ☐ Orders stored with immutable event log (every status change appended)
- ☐ Coupon usage is atomic (Redis SETNX per user + INCR global counter)
- ☐ All post-checkout work (fulfillment, notifications) driven by Kafka events
E-commerce system design covers a remarkably wide range of distributed systems challenges: from the read-heavy product catalog (Elasticsearch, CDN) to the write-contended inventory (Redis atomics, optimistic locking) to the multi-service checkout (saga pattern) to the event-driven fulfillment pipeline (Kafka). Each subsystem has its own failure modes and scaling patterns — which is exactly why e-commerce is such a rich topic for system design interviews and a rigorous test of distributed systems thinking.