Designing a Hotel Booking System at Scale: Airbnb Architecture, Inventory & Double-Booking Prevention

Airbnb handles over 150 million users, 7 million active listings, and tens of thousands of bookings per minute at peak. Building a hotel or short-term rental booking platform at this scale is one of the most challenging system design problems you'll face — touching distributed inventory, strong consistency guarantees, real-time search, dynamic pricing, saga-based transactions, payment escrow, and fraud prevention all at once. This guide walks through every layer with production-grade depth.

Md Sanwar Hossain · April 7, 2026 · 21 min read · Booking Architecture

TL;DR — Core Design Decisions

"Use Elasticsearch with geo_point for sub-100ms listing discovery. Model availability as a date-range bitset or calendar table with optimistic locking to prevent double booking. Orchestrate bookings as a Saga with compensating transactions. Capture payment in escrow and release after check-in. Trigger reviews 24 hours post-checkout via an event pipeline — not a cron job."

Table of Contents

  1. Functional & Non-Functional Requirements
  2. High-Level Architecture Overview
  3. Search Service — Geo-Search, Facets & ML Ranking
  4. Inventory & Availability Calendar
  5. Dynamic Pricing Engine
  6. Booking Engine & Saga Pattern
  7. Payment & Escrow Service
  8. Review & Trust System
  9. Notification Service
  10. Capacity Estimation
  11. Scalability & Reliability Patterns
  12. System Design Interview Checklist

1. Functional & Non-Functional Requirements

Before touching any architecture, lock down what you're building and what you're not. In a system design interview, this step separates strong candidates from those who dive straight into databases.

Functional Requirements

Non-Functional Requirements

Property | Target | Rationale
Search latency (p99) | < 150 ms | User conversion drops 7% per 100 ms delay
Booking confirmation latency | < 3 s end-to-end | Includes availability lock + payment auth
Availability consistency | Strong (no double bookings) | Double bookings are catastrophic to trust
Search availability freshness | Eventual (< 30 s lag) | Search can show slightly stale results
System availability | 99.99% (4.3 min/month downtime) | Revenue-critical path
Payment idempotency | 100% (exactly once) | Duplicate charges are legal/trust risks
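
The downtime figure in the availability row follows from simple arithmetic. A minimal sketch, assuming a 30-day month (the class and method names are illustrative only):

```java
class DowntimeBudget {
    /** Allowed downtime in minutes for an availability target over a period of days. */
    static double allowedDowntimeMinutes(double availabilityPercent, int days) {
        double totalMinutes = days * 24.0 * 60.0;          // 30 days = 43,200 minutes
        return totalMinutes * (1.0 - availabilityPercent / 100.0);
    }
}
```

Calling allowedDowntimeMinutes(99.99, 30) gives roughly 4.32 minutes, which matches the 4.3 min/month budget in the table.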

Out of Scope (for this design)

2. High-Level Architecture Overview

The platform is decomposed into bounded-context microservices aligned with domain ownership. Each service owns its database, communicates asynchronously via events for most flows, and synchronously only where consistency is critical (availability lock, payment).

Hotel Booking System Architecture — search, inventory, booking saga, payment escrow, and notification pipeline. Source: mdsanwarhossain.me

Core Services & Responsibilities

Data Store Selection

Service | Primary Store | Cache | Reason
Search | Elasticsearch | Redis | Geo-queries, full-text, facets
Listing | PostgreSQL | Redis | Relational, ACID, complex joins
Inventory | PostgreSQL | Redis (read cache) | Strong consistency, row-level locks
Booking | PostgreSQL | (none) | Saga state machine, ACID required
Pricing | TimescaleDB | Redis | Time-series demand data
Reviews | PostgreSQL + Cassandra | CDN | Write-heavy review feed, read-heavy display

3. Search Service — Geo-Search, Facets & ML Ranking

Search is the highest-traffic, most latency-sensitive component. At Airbnb scale, 80% of all requests are search queries. The search service runs on an Elasticsearch cluster with custom ML ranking layered on top of BM25 relevance.

Elasticsearch Geo-Search Design

Each listing document is indexed with a geo_point field. Search queries combine a geo filter with availability and facet filters:

// Elasticsearch query: location + date availability + price filter
GET /listings/_search
{
  "query": {
    "bool": {
      "must": [
        {"range": {"price_per_night": {"gte": 50, "lte": 300}}},
        {"term":  {"property_type": "entire_apartment"}},
        {"range": {"max_guests": {"gte": 2}}}
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "10km",
            "location": {"lat": 40.7128, "lon": -74.0060}
          }
        },
        // Availability filter: date range NOT in booked_dates
        {"bool": {"must_not": [
          {"nested": {
            "path": "booked_ranges",
            "query": {
              "bool": {
                "must": [
                  {"range": {"booked_ranges.start": {"lte": "2026-06-20"}}},
                  {"range": {"booked_ranges.end":   {"gte": "2026-06-15"}}}
                ]
              }
            }
          }}
        ]}}
      ]
    }
  },
  "sort": [
    {"_score": "desc"},
    {"_geo_distance": {"location": {"lat": 40.7128, "lon": -74.0060}, "order": "asc"}}
  ],
  "from": 0, "size": 20
}
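
The must_not clause in the query encodes the standard interval-overlap test: a booked range conflicts with the requested stay when it starts on or before the requested end and ends on or after the requested start. A minimal sketch of that predicate, with hypothetical names and the same inclusive bounds as the query:

```java
import java.time.LocalDate;

class DateRangeOverlap {
    /**
     * Inclusive-bounds overlap test, mirroring the two range clauses in the query:
     * booked.start <= requested.end AND booked.end >= requested.start.
     */
    static boolean overlaps(LocalDate bookedStart, LocalDate bookedEnd,
                            LocalDate reqStart, LocalDate reqEnd) {
        return !bookedStart.isAfter(reqEnd) && !bookedEnd.isBefore(reqStart);
    }
}
```

With inclusive bounds, a booking that ends on the requested start date counts as a conflict. If booked_ranges.end stores the check-out day rather than the last night (an assumption the query does not settle), strict comparisons would be needed so that back-to-back stays remain bookable.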

Availability in Search vs. Inventory Service

A critical design decision: search availability is eventually consistent, but booking availability is strongly consistent. Here's the split:

ML Ranking Layer

Elasticsearch BM25 is the retrieval layer; a LambdaMART or XGBoost ranking model is the ranking layer. Features fed to the ranking model include:

Search Result Caching Strategy

Cache keys are constructed from hash(lat_lng_bucket + date_range + filters) where lat/lng is quantized to 0.01° cells (≈1 km). Cache TTL is 60 seconds for popular searches, 300 seconds for rare searches. Personalized ranking is applied post-cache, so the cache stores un-ranked result IDs. This approach gives 40–60% cache hit rates while still delivering personalized results.
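
The key construction can be sketched as follows. This is an illustration only: the SearchCacheKey class, the "search:" prefix, and the choice of SHA-256 are assumptions, since the text only specifies a hash over the quantized location, date range, and filters.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

class SearchCacheKey {
    static final double CELL_DEGREES = 0.01; // cell size from the text (about 1 km)

    /** Index of the 0.01-degree cell that contains the coordinate. */
    static long cell(double degrees) {
        return (long) Math.floor(degrees / CELL_DEGREES);
    }

    static String build(double lat, double lon, LocalDate checkIn, LocalDate checkOut,
                        Map<String, String> filters) {
        // TreeMap gives a stable filter order so equal filter sets hash identically.
        String raw = cell(lat) + ":" + cell(lon) + "|" + checkIn + "_" + checkOut
                + "|" + new TreeMap<String, String>(filters);
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(raw.getBytes(StandardCharsets.UTF_8));
            StringBuilder key = new StringBuilder("search:");
            for (int i = 0; i < 8; i++) {            // first 8 bytes keep the key short
                key.append(String.format("%02x", digest[i]));
            }
            return key.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is available on every standard JVM", e);
        }
    }
}
```

Two searches a few hundred metres apart with the same dates and filters land in the same cell and therefore share a cache entry, which is where the hit rate comes from.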

4. Inventory & Availability Calendar — Preventing Double Bookings

Double booking is the single worst failure mode in a booking system. It destroys host and guest trust and is the source of most legal disputes. The availability calendar design must guarantee that two concurrent booking requests for the same property and overlapping date range cannot both succeed.

Availability Data Model

We model availability as a per-listing, per-date table rather than a range table. Each night is then a single primary-key lookup on (listing_id, date), which enables atomic locking at the individual-night level:

-- Inventory availability table (PostgreSQL)
CREATE TABLE availability (
    listing_id   BIGINT       NOT NULL,
    date         DATE         NOT NULL,
    status       VARCHAR(20)  NOT NULL DEFAULT 'available',
    -- 'available' | 'booked' | 'blocked' | 'pending'
    booking_id   BIGINT       REFERENCES bookings(id),
    price        NUMERIC(10,2),
    version      BIGINT       NOT NULL DEFAULT 0,  -- optimistic locking
    PRIMARY KEY (listing_id, date)
);

-- Index for range queries
CREATE INDEX idx_avail_listing_date ON availability (listing_id, date)
    WHERE status = 'available';

-- Partial index for pending holds (garbage-collected after TTL)
CREATE INDEX idx_avail_pending ON availability (listing_id, date, booking_id)
    WHERE status = 'pending';

Optimistic Locking for Double-Booking Prevention

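When a guest initiates a booking, we atomically move every date in the requested range from available to pending inside a single transaction. The variant shown here takes row locks with SELECT ... FOR UPDATE and then re-checks the versions read in step 1 as part of the UPDATE; a purely optimistic variant would drop the row locks and rely on the version check alone. If any date is missing or has a version mismatch (concurrent modification), the transaction rolls back: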

-- Atomic availability hold (all-or-nothing for date range)
-- Step 1: Read current state with FOR UPDATE (pessimistic variant)
SELECT date, version FROM availability
WHERE listing_id = $1
  AND date BETWEEN $2 AND $3
  AND status = 'available'
FOR UPDATE;

-- Step 2: Verify all dates are present (= no gaps in availability)
-- If count != expected nights, raise "DATES_UNAVAILABLE"

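-- Step 3: Atomically mark as 'pending' with booking reference.
-- $5 and $6 are parallel arrays of the dates and versions read in step 1,
-- so each row is compared against its own version (optimistic check).
UPDATE availability AS a
SET status     = 'pending',
    booking_id = $4,
    version    = a.version + 1
FROM unnest($5::DATE[], $6::BIGINT[]) AS seen(date, version)
WHERE a.listing_id = $1
  AND a.date       = seen.date
  AND a.version    = seen.version
  AND a.status     = 'available';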

-- Step 4: If rows_updated != expected nights → concurrent conflict → rollback
-- Saga compensating action: release hold
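
The all-or-nothing rule in steps 1 to 4 can also be pictured with a small in-memory model. This is a sketch under stated assumptions, not the Inventory Service itself: the AvailabilityHolds class is hypothetical, and synchronized stands in for the row locks and transaction boundary that PostgreSQL provides.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

class AvailabilityHolds {
    enum Status { AVAILABLE, PENDING, BOOKED }

    private final Map<LocalDate, Status> nights = new HashMap<>();

    synchronized void open(LocalDate night) {
        nights.put(night, Status.AVAILABLE);
    }

    /**
     * Holds every night in [firstNight, lastNight] or none of them.
     * synchronized plays the role of the locks taken by SELECT ... FOR UPDATE.
     */
    synchronized boolean tryHold(LocalDate firstNight, LocalDate lastNight) {
        // Pass 1: every night must exist and be AVAILABLE (steps 1 and 2).
        for (LocalDate d = firstNight; !d.isAfter(lastNight); d = d.plusDays(1)) {
            if (nights.get(d) != Status.AVAILABLE) {
                return false; // DATES_UNAVAILABLE: nothing has been modified yet
            }
        }
        // Pass 2: flip all nights to PENDING (step 3).
        for (LocalDate d = firstNight; !d.isAfter(lastNight); d = d.plusDays(1)) {
            nights.put(d, Status.PENDING);
        }
        return true;
    }

    synchronized Status statusOf(LocalDate night) {
        return nights.get(night);
    }
}
```

A second request that overlaps an existing hold by even one night is rejected as a whole, so no partial hold is ever left behind.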

Pending Hold TTL & Cleanup

A pending hold is created when availability is locked but payment has not yet been processed. If payment fails or the user abandons the flow, the hold must be released within a bounded time. Two mechanisms enforce this:
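
Whatever the enforcement mechanism, the core rule is a time bound on each hold. A minimal sketch of a periodic sweep, assuming the TTL is supplied by the caller (the PendingHoldSweeper class and any concrete TTL value are illustrative and not taken from the original design):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class PendingHoldSweeper {
    private final Duration ttl;
    private final Map<Long, Instant> holdCreatedAt = new HashMap<>(); // bookingId -> hold time

    PendingHoldSweeper(Duration ttl) {
        this.ttl = ttl;
    }

    void recordHold(long bookingId, Instant createdAt) {
        holdCreatedAt.put(bookingId, createdAt);
    }

    /** Returns the booking ids whose holds are older than the TTL and forgets them. */
    List<Long> releaseExpired(Instant now) {
        List<Long> released = new ArrayList<>();
        Iterator<Map.Entry<Long, Instant>> it = holdCreatedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Instant> entry = it.next();
            if (Duration.between(entry.getValue(), now).compareTo(ttl) > 0) {
                released.add(entry.getKey()); // caller reverts these dates to 'available'
                it.remove();
            }
        }
        return released;
    }
}
```

Passing the current time in as a parameter keeps the sweep deterministic and easy to test; a production job would read it from a clock and run on a schedule.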

5. Dynamic Pricing Engine

Dynamic pricing is a competitive differentiator. Airbnb's "Smart Pricing" tool is estimated to increase host revenue by 15–30% compared to flat pricing. The engine computes a per-night recommended price based on multiple signals and updates it daily or on-demand.

Pricing Factors & Weights

Pricing Computation Pipeline

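// Pricing service sketch (Java, Spring Boot). Collaborator types such as ListingCache
// and DayOfWeekModel are illustrative and assumed to be defined elsewhere.
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import org.springframework.stereotype.Service;

@Service
public class DynamicPricingEngine {
    private final ListingCache listingCache;
    private final DayOfWeekModel dayOfWeekModel;
    private final LeadTimeModel leadTimeModel;
    private final DemandSignalService demandSignalService;
    private final EventService eventService;
    private final CompetitorService competitorService;

    public DynamicPricingEngine(ListingCache listingCache, DayOfWeekModel dayOfWeekModel,
            LeadTimeModel leadTimeModel, DemandSignalService demandSignalService,
            EventService eventService, CompetitorService competitorService) {
        this.listingCache = listingCache;
        this.dayOfWeekModel = dayOfWeekModel;
        this.leadTimeModel = leadTimeModel;
        this.demandSignalService = demandSignalService;
        this.eventService = eventService;
        this.competitorService = competitorService;
    }

    public BigDecimal computePrice(Long listingId, LocalDate date) {
        Listing listing = listingCache.get(listingId);
        BigDecimal base = listing.getBasePrice();

        // Multiplicative factors (each returns a ratio, e.g., 1.25 = +25%)
        double dowFactor    = dayOfWeekModel.getFactor(listing.getCityId(), date.getDayOfWeek());
        double leadFactor   = leadTimeModel.getFactor(ChronoUnit.DAYS.between(LocalDate.now(), date));
        double demandFactor = demandSignalService.getDemandFactor(listing.getGeoCell(), date);
        double eventFactor  = eventService.getEventFactor(listing.getCityId(), date);
        double compFactor   = competitorService.getMedianRatio(listingId, date);

        BigDecimal recommended = base
            .multiply(BigDecimal.valueOf(dowFactor))
            .multiply(BigDecimal.valueOf(leadFactor))
            .multiply(BigDecimal.valueOf(demandFactor))
            .multiply(BigDecimal.valueOf(eventFactor))
            .multiply(BigDecimal.valueOf(compFactor))
            .setScale(2, RoundingMode.HALF_UP); // round to currency precision before clamping

        // Clamp to [minPrice, maxPrice] set by host
        return recommended.max(listing.getMinPrice()).min(listing.getMaxPrice());
    }
}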

Pricing Caching Strategy

Computing prices on every search request would be prohibitively expensive. Instead:

6. Booking Engine & Saga Pattern

The booking flow spans multiple services — Inventory, Pricing, User, Payment — with no single ACID transaction boundary. We use the Saga pattern with an orchestrator (the Booking Service) coordinating the sequence and triggering compensating transactions on failure.

Booking Saga Steps

Step | Action | Compensating Action | Owner Service
1 | Create booking record (PENDING) | Mark booking CANCELLED | Booking Service
2 | Lock inventory dates (PENDING hold) | Release inventory hold | Inventory Service
3 | Confirm final price | N/A (read-only) | Pricing Service
4 | Authorize & capture payment (escrow) | Void authorization or refund | Payment Service
5 | Confirm inventory (PENDING → BOOKED) | Revert to AVAILABLE | Inventory Service
6 | Publish BookingConfirmed event → Notifications | Publish BookingCancelled event | Booking Service
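
The orchestration logic behind this table can be sketched in a few lines: run the steps in order and, on the first failure, run the compensating actions of the completed steps in reverse. The BookingSaga and Step types below are hypothetical, and the in-process lambdas stand in for what would be remote service calls.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

class BookingSaga {
    static final class Step {
        final String name;
        final BooleanSupplier action;  // returns false when the step fails
        final Runnable compensation;   // undoes a step that already completed

        Step(String name, BooleanSupplier action, Runnable compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    /** Runs steps in order; on the first failure, compensates completed steps in reverse. */
    static boolean run(List<Step> steps, List<String> log) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            if (step.action.getAsBoolean()) {
                log.add("done:" + step.name);
                completed.push(step);
            } else {
                log.add("failed:" + step.name);
                while (!completed.isEmpty()) {
                    Step undo = completed.pop(); // most recently completed step first
                    undo.compensation.run();
                    log.add("compensated:" + undo.name);
                }
                return false;
            }
        }
        return true;
    }
}
```

If payment capture fails after the booking record and inventory hold succeeded, the log shows the hold being released before the booking is marked cancelled, which is the reverse order the table implies.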

Idempotency in the Booking Saga

Network failures can cause the same saga step to be retried. Every saga step must be idempotent:

Booking State Machine

// Booking states (stored in bookings table)
enum BookingStatus {
    PENDING,              // Created, awaiting inventory lock
    INVENTORY_HELD,       // Dates locked in availability table
    PAYMENT_AUTHORIZED,   // Payment captured in escrow
    CONFIRMED,            // Inventory confirmed BOOKED
    CHECKED_IN,           // Guest checked in (triggers escrow release timer)
    COMPLETED,            // Post-checkout, payout released to host
    CANCELLATION_PENDING, // Cancellation requested, calculating refund
    CANCELLED,            // Fully cancelled, refund processed
    FAILED                // Saga failed, all compensating actions run
}

// Saga transition table
PENDING              → INVENTORY_HELD    (on: inventory lock success)
PENDING              → FAILED            (on: inventory lock failure)
INVENTORY_HELD       → PAYMENT_AUTHORIZED (on: payment captured)
INVENTORY_HELD       → FAILED            (on: payment failure → release inventory)
PAYMENT_AUTHORIZED   → CONFIRMED         (on: inventory confirmed)
CONFIRMED            → CHECKED_IN        (on: check-in event)
CHECKED_IN           → COMPLETED         (on: checkout + 24h timer)
CONFIRMED            → CANCELLATION_PENDING (on: cancellation request)
CANCELLATION_PENDING → CANCELLED         (on: refund processed)
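
One way to enforce the transition table in code is a lookup of allowed target states per source state. The sketch below copies the transitions listed above into an EnumMap; the BookingStateMachine class name is an assumption.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class BookingStateMachine {
    enum Status {
        PENDING, INVENTORY_HELD, PAYMENT_AUTHORIZED, CONFIRMED, CHECKED_IN,
        COMPLETED, CANCELLATION_PENDING, CANCELLED, FAILED
    }

    private static final Map<Status, Set<Status>> ALLOWED = new EnumMap<>(Status.class);
    static {
        ALLOWED.put(Status.PENDING, EnumSet.of(Status.INVENTORY_HELD, Status.FAILED));
        ALLOWED.put(Status.INVENTORY_HELD, EnumSet.of(Status.PAYMENT_AUTHORIZED, Status.FAILED));
        ALLOWED.put(Status.PAYMENT_AUTHORIZED, EnumSet.of(Status.CONFIRMED));
        ALLOWED.put(Status.CONFIRMED, EnumSet.of(Status.CHECKED_IN, Status.CANCELLATION_PENDING));
        ALLOWED.put(Status.CHECKED_IN, EnumSet.of(Status.COMPLETED));
        ALLOWED.put(Status.CANCELLATION_PENDING, EnumSet.of(Status.CANCELLED));
        // COMPLETED, CANCELLED and FAILED are terminal: no outgoing transitions.
    }

    /** True only for transitions that appear in the saga transition table. */
    static boolean canTransition(Status from, Status to) {
        return ALLOWED.getOrDefault(from, EnumSet.noneOf(Status.class)).contains(to);
    }
}
```

Rejecting any transition that is not in the map turns a retried or out-of-order saga message into a no-op instead of a corrupted booking.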

7. Payment & Escrow Service

Payment in a booking platform is fundamentally different from an e-commerce checkout. The money is collected upfront but held for days or weeks before the host earns it. This escrow model protects guests (funds can still be refunded if the host cancels or the stay is not delivered) while guaranteeing hosts eventual payment for delivered stays.

Payment Lifecycle

Cancellation Policy Engine

Policy | > 5 days before check-in | 2–5 days before | < 48 hours
Flexible | 100% refund | 100% refund | No refund
Moderate | 100% refund | 50% refund | No refund
Strict | 50% refund | No refund | No refund
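
The policy table maps directly onto a small lookup. A minimal sketch, assuming that a cancellation exactly 5 days out falls into the middle column and exactly 48 hours out still counts as "2–5 days" (the table does not define these boundaries, and the class name is hypothetical):

```java
import java.time.Duration;
import java.time.Instant;

class CancellationPolicy {
    enum Policy { FLEXIBLE, MODERATE, STRICT }

    // Rows follow the Policy enum order; columns are >5 days, 2-5 days, <48 hours.
    private static final int[][] REFUND_PERCENT = {
        {100, 100, 0},  // FLEXIBLE
        {100,  50, 0},  // MODERATE
        { 50,   0, 0},  // STRICT
    };

    /** Refund percentage for a cancellation made at cancelledAt for a stay starting at checkIn. */
    static int refundPercent(Policy policy, Instant cancelledAt, Instant checkIn) {
        Duration lead = Duration.between(cancelledAt, checkIn);
        int column;
        if (lead.compareTo(Duration.ofDays(5)) > 0) {
            column = 0;
        } else if (lead.compareTo(Duration.ofHours(48)) >= 0) {
            column = 1;
        } else {
            column = 2; // also covers cancellations after check-in (negative lead time)
        }
        return REFUND_PERCENT[policy.ordinal()][column];
    }
}
```

Keeping the percentages in one table makes it straightforward to add a policy row or adjust a threshold without touching the lookup logic.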

Preventing Duplicate Charges

The Payment Service maintains a payment_operations table with a unique constraint on (booking_id, operation_type). Before issuing any Stripe API call, it checks this table. If an entry already exists (prior successful operation), it skips the Stripe call and returns the cached result. This — combined with Stripe's idempotency keys — gives a two-layer guarantee against duplicate charges even under aggressive retries.
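
The first layer of that guarantee can be sketched with an in-memory stand-in for the payment_operations table. The PaymentOperations class and the string result are assumptions for illustration; in the real service the uniqueness would come from the database constraint rather than a map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class PaymentOperations {
    // Stand-in for payment_operations; the key mirrors (booking_id, operation_type).
    private final Map<String, String> results = new ConcurrentHashMap<>();

    /** Runs the provider call at most once per key; retries receive the stored result. */
    String executeOnce(long bookingId, String operationType, Supplier<String> providerCall) {
        String key = bookingId + ":" + operationType;
        // computeIfAbsent is atomic per key, so concurrent retries cannot both call the provider.
        return results.computeIfAbsent(key, k -> providerCall.get());
    }
}
```

If the provider call throws, nothing is stored and a later retry runs it again, which is the desired behaviour for a failed attempt.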

8. Review & Trust System

Reviews are the trust mechanism that makes the entire marketplace work. Guests rely on them to choose listings; hosts rely on them to attract bookings. The review system must be tamper-resistant, prompt, and fair to both parties.

Post-Stay Review Trigger

Reviews are triggered by the BookingCompleted event, published when the booking transitions to COMPLETED state (checkout + 24 hours). The Review Service consumes this event and:

Review Fraud Detection

Fake reviews (both positive and negative) are a marketplace threat. Multi-layer fraud detection runs on every submitted review:

Impact on Search Ranking

When a review is published (both parties submit or window expires), the Review Service publishes a RatingUpdated event to Kafka. The Search Service consumes this event and re-indexes the listing's aggregate rating in Elasticsearch. The ML ranking model uses the updated rating within the next 60-second reindex cycle. A single 1-star review on a previously 5-star listing will affect search position within minutes.
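
The publication rule in the first sentence (both parties submit, or the window expires) reduces to a small predicate. A sketch, assuming the 14-day window quoted in the interview checklist and a hypothetical DoubleBlindReview class:

```java
import java.time.Duration;
import java.time.Instant;

class DoubleBlindReview {
    static final Duration WINDOW = Duration.ofDays(14); // window length from the checklist

    /** True once submitted reviews may be revealed and RatingUpdated may be emitted. */
    static boolean shouldPublish(boolean guestSubmitted, boolean hostSubmitted,
                                 Instant windowOpenedAt, Instant now) {
        boolean bothSubmitted = guestSubmitted && hostSubmitted;
        boolean windowExpired = !now.isBefore(windowOpenedAt.plus(WINDOW));
        return bothSubmitted || windowExpired;
    }
}
```

Until the predicate returns true, neither party can see the other's review, so one review cannot be written in retaliation for the other.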

9. Notification Service — Email, SMS & Push

Notifications are the voice of your platform. A missed booking confirmation or a delayed check-in reminder erodes user trust as quickly as a bug in the booking flow. The notification service is event-driven, fan-out capable, and multi-channel.

Architecture

The Notification Service subscribes to multiple Kafka topics. Each domain service emits events; the Notification Service maps event types to notification templates and channels:

Channel Providers & Fallback

Channel | Primary Provider | Fallback | Volume (Airbnb scale)
Email | AWS SES | SendGrid | ~5M emails/day
SMS | Twilio | MessageBird | ~500K SMS/day
Push (iOS) | Apple APNs | (none listed) | ~3M push/day
Push (Android) | Google FCM | (none listed) | ~4M push/day

Deduplication & Rate Limiting

Kafka consumer retries can cause duplicate notifications — a guest should not receive three "Booking Confirmed" emails. The Notification Service maintains a sent_notifications table keyed by (user_id, event_id, channel). Before dispatching, it checks for an existing entry. This deduplication check uses Redis with a 72-hour TTL for high-throughput lookups before falling back to the database. Additionally, per-user rate limits (max 3 email notifications per hour, max 5 push per day for non-critical events) prevent spam fatigue.
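
The two checks can be sketched together for the email channel. This is an in-memory illustration: the NotificationGate class is hypothetical, the set stands in for the Redis and sent_notifications lookups, and a fixed one-hour bucket is assumed for the rate limit because the text does not say whether the window is fixed or sliding.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NotificationGate {
    static final int MAX_EMAILS_PER_HOUR = 3; // limit quoted in the text

    private final Set<String> sent = new HashSet<>();                    // dedup keys already dispatched
    private final Map<String, Integer> emailsPerHour = new HashMap<>();  // userId + hour bucket -> count

    /** True if the email should be dispatched; false for duplicates or when over the limit. */
    synchronized boolean allowEmail(long userId, String eventId, long epochSeconds) {
        String dedupKey = userId + ":" + eventId + ":email";   // mirrors (user_id, event_id, channel)
        if (sent.contains(dedupKey)) {
            return false;                                       // duplicate from a consumer retry
        }
        String bucket = userId + ":" + (epochSeconds / 3600);   // fixed one-hour window
        int count = emailsPerHour.getOrDefault(bucket, 0);
        if (count >= MAX_EMAILS_PER_HOUR) {
            return false;                                       // rate limited, may be retried later
        }
        sent.add(dedupKey);
        emailsPerHour.put(bucket, count + 1);
        return true;
    }
}
```

A rate-limited event is not recorded as sent, so it can still go out in the next window. A real implementation would also exempt critical events such as booking confirmations from the limit.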

10. Capacity Estimation

Back-of-envelope calculations anchor your infrastructure sizing decisions. These numbers reflect Airbnb-scale and are good baselines for system design interviews.

Traffic Estimates

Storage Estimates

Infrastructure Sizing (Production Baseline)

Component | Count | Instance Type | Rationale
Search (Elasticsearch) | 9 nodes (3 primaries + 6 replicas) | r6g.2xlarge (64 GB RAM) | Memory for inverted index; HA
Inventory DB (PostgreSQL) | 1 primary + 2 read replicas | r6g.4xlarge (128 GB RAM) | Availability table in memory
Redis Cluster | 6 nodes (3 masters, 3 replicas) | r6g.xlarge (32 GB RAM) | Price cache + session + dedup
Kafka | 6 brokers | m6i.2xlarge + NVMe storage | Event backbone; 7-day retention
Booking Service (K8s pods) | 20 pods (HPA, min: 10, max: 50) | 2 vCPU / 4 GB | Saga orchestration is CPU-bound

11. Scalability & Reliability Patterns

A booking platform must handle extreme seasonal peaks (New Year's Eve, major holidays) — traffic can spike 5–10× overnight. Several architectural patterns make the system elastic and fault-tolerant.

Hot Listing Problem

A listing that goes viral (featured in a magazine, shared on social media) can receive thousands of concurrent booking attempts for a handful of available dates. This creates a hotspot on specific rows in the availability table. Mitigation strategies:

Multi-Region Deployment

Three AWS regions: us-east-1 (primary), eu-west-1 (Europe primary), ap-southeast-1 (APAC primary). Each region is a full active-active deployment for search and reads. Booking writes are routed to the listing's "home region" — the region where the listing was created — to keep all availability writes co-located and avoid cross-region consistency issues.

Circuit Breakers & Bulkheads

Database Sharding Strategy

The availability table is the most write-intensive table in the system. As the platform grows beyond 20 million listings, a single PostgreSQL instance will not sustain the write throughput, so the table is sharded by listing_id. A plain listing_id % num_shards scheme spreads load evenly but remaps most listings whenever num_shards changes, so the Inventory Service instead uses a consistent hash ring to determine which shard owns a given listing_id, with the shard mapping cached in Redis for sub-millisecond routing. New shards are added by splitting existing shards; the Citus extension for PostgreSQL can take over distribution and rebalancing with few application code changes.
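
A consistent hash ring can be sketched with a sorted map. This is an illustration under assumptions: the ShardRing class, the CRC32 hash, and 64 virtual nodes per shard are choices made for the example, not details from the original design.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

class ShardRing {
    private static final int VNODES = 64; // virtual nodes per shard (assumed value)
    private final TreeMap<Long, String> ring = new TreeMap<>();

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Places VNODES points for the shard on the ring to smooth out the distribution. */
    void addShard(String shard) {
        for (int i = 0; i < VNODES; i++) {
            ring.put(hash(shard + "#" + i), shard);
        }
    }

    /** The owner is the first ring point at or after the key's hash, wrapping around. */
    String shardFor(long listingId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no shards registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(Long.toString(listingId)));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

When a shard is added, a listing either keeps its current owner or moves to the new shard, so only the rows that actually move need to be copied.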

12. System Design Interview Checklist

When asked to design a hotel booking system like Airbnb or Booking.com in a system design interview, hit these points to demonstrate senior-level thinking:

Requirements Clarification (5 min)

  • ✅ What is the scale? (number of listings, daily bookings, concurrent searches)
  • ✅ Instant booking vs. host approval flow?
  • ✅ What cancellation policies are supported?
  • ✅ Is this a global system (multi-currency, multi-region)?
  • ✅ What is the consistency requirement for availability? (This is the key question — answer: strong consistency for booking writes, eventual for search)

Core Design Decisions to Discuss

  • Double-booking prevention: Explain optimistic locking on availability table, pending hold TTL, and saga compensating transactions
  • Search vs. booking consistency split: Elasticsearch for search (eventually consistent), PostgreSQL for booking (strongly consistent) — explain why
  • Saga pattern for distributed transaction: Name each step, its compensating action, and the idempotency mechanism
  • Payment escrow: Capture at booking, release 24h post check-in — not a simple charge
  • Hot listing handling: Virtual queue + rate limiting + dedicated shard
  • Pricing architecture: Pre-computed nightly batch + real-time event override, cached in Redis
  • Review double-blind window: 14-day window, neither review published until both submitted

Reliability & Failure Scenarios to Address

  • ✅ Payment service is down mid-booking → circuit breaker, no ghost inventory holds
  • ✅ Saga step fails after inventory held but before payment → compensating transaction releases hold
  • ✅ Duplicate booking request (retry storm) → idempotency keys at every step
  • ✅ Elasticsearch cluster goes down → Redis full-page cache fallback
  • ✅ Database primary failure → automatic failover to the standby (RDS Multi-AZ; typically 60–120 s for a Multi-AZ instance, under about 35 s for a Multi-AZ DB cluster)
  • ✅ Kafka consumer lag spike → dead letter queue for failed notification events, replay after service recovery

Common Mistakes in Interviews

  • ❌ Using a single global transaction across services (impossible in microservices without 2PC overhead)
  • ❌ Querying the inventory PostgreSQL database for every search request (will not scale)
  • ❌ Storing availability as a binary "available" flag on the listing row (no date granularity, race condition on concurrent updates)
  • ❌ Making payment synchronously in the user's HTTP request without a timeout and saga retry (payment providers have p99 latencies of 3–8 seconds)
  • ❌ Forgetting the pending hold TTL mechanism (orphaned holds permanently block availability)
  • ❌ Not addressing idempotency for payment — "what if the same booking request is retried twice?" is always asked

Key Metrics to Monitor in Production

Md Sanwar Hossain

Software Engineer · Java · Spring Boot · Microservices · AI/LLM Systems

Last updated: April 7, 2026