Designing Scalable Systems at Uber & Netflix Scale: Patterns, Trade-offs, and Architecture Decisions

High-scale distributed system architecture visualization

Designing systems that serve millions of users simultaneously requires a fundamentally different approach than building typical enterprise applications. The patterns used by Uber, Netflix, and similar platforms are not magic — they are well-understood architectural decisions with specific trade-offs. Understanding those trade-offs is what separates senior system designers from the rest.

The Three Axes of Scale

Before diving into specific patterns, it is important to distinguish the different dimensions of scale. Traffic scale: can the system handle millions of requests per second? Data scale: can the system store and query terabytes or petabytes efficiently? Geographic scale: can the system serve users in multiple regions with low latency? Each axis requires different architectural solutions, and real production systems at Uber/Netflix scale must address all three.

The common mistake is optimizing for one axis while ignoring others. A horizontally scaled application cluster with a single-region PostgreSQL database is not truly scalable at traffic scale — the database becomes the bottleneck. Thinking systematically about all three axes prevents architectural blind spots.

Capacity Estimation: The Foundation of Scalable Design

Good system design starts with numbers, not patterns. Before choosing technologies, estimate the scale requirements:

  • Traffic: Daily active users × average requests per user = daily request volume. Assume peak traffic runs 2–3× the daily average. For Uber, demand peaks on Friday evenings — that peak is your design point, not the average.
  • Storage: Data generated per event × events per day × retention period. Account for replication (typically 3×) and index overhead (typically 2–3×).
  • Bandwidth: Average request size × peak requests per second. Estimate for both ingress and egress separately.

These estimates drive hardware sizing decisions and identify the primary bottleneck — is it compute, storage, or network? — which determines where to focus architectural effort.
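
A quick back-of-envelope pass over these formulas, using purely illustrative numbers (none of them are real Uber or Netflix figures):

```java
// Back-of-envelope capacity estimation for a hypothetical ride-hailing service.
// Every input number below is an illustrative assumption.
public class CapacityEstimate {
    public static void main(String[] args) {
        long dailyActiveUsers = 10_000_000L;
        long requestsPerUserPerDay = 50;
        long dailyRequests = dailyActiveUsers * requestsPerUserPerDay; // 500M/day

        long avgRps = dailyRequests / 86_400; // seconds per day -> ~5,787 rps
        long peakRps = avgRps * 3;            // design point: ~17,361 rps

        long bytesPerEvent = 500;             // one GPS/trip event
        int retentionDays = 90;
        double replication = 3.0;             // typical 3x replication
        double indexOverhead = 2.5;           // typical 2-3x index overhead
        double storageTB = bytesPerEvent * (double) dailyRequests * retentionDays
                * replication * indexOverhead / 1e12; // ~169 TB

        long avgRequestBytes = 2_000;
        double peakIngressMBps = peakRps * avgRequestBytes / 1e6; // ~34.7 MB/s

        System.out.printf("peak rps: %d, storage: %.0f TB, peak ingress: %.1f MB/s%n",
                peakRps, storageTB, peakIngressMBps);
    }
}
```

Even this rough pass is informative: at these assumed numbers the bottleneck is storage growth, not request throughput, which is exactly the kind of conclusion that should precede any technology choice.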

The Read-Heavy System: Netflix's Content Delivery Approach

Netflix streams to hundreds of millions of subscribers worldwide, with tens of millions of concurrent streams at peak. Video content is inherently read-heavy — content is uploaded once and read millions of times. The architectural strategy is aggressive caching at every layer.

CDN as the Primary Serving Layer

Netflix's Open Connect CDN deploys servers inside internet service providers' data centers globally, caching the most popular content files close to end users. By the time a user requests a popular title, the content file is on a server in their ISP's facility — dramatically reducing latency and Netflix's egress bandwidth costs. Designing around a CDN for static and semi-static content is one of the highest-leverage architectural decisions for content-heavy systems.

Database Reads: Read Replicas and Caching

For metadata reads (movie information, user preferences, recommendations), the pattern is: cache first, read replica second, primary last. Redis or Memcached serves cache hits in under 1 ms. Read replicas handle cache misses, taking load off the primary. The primary handles only writes. This read-path hierarchy can absorb 99%+ of read traffic without touching the primary database.

// Cache-aside pattern for user preferences
import java.time.Duration;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class UserPreferenceService {
    private final RedisTemplate<String, UserPreference> redis;
    private final UserPreferenceRepository repository;
    private static final Duration CACHE_TTL = Duration.ofMinutes(30);

    public UserPreferenceService(RedisTemplate<String, UserPreference> redis,
                                 UserPreferenceRepository repository) {
        this.redis = redis;
        this.repository = repository;
    }

    public UserPreference getPreferences(String userId) {
        String cacheKey = "user:preferences:" + userId;
        UserPreference cached = redis.opsForValue().get(cacheKey);
        if (cached != null) return cached;

        UserPreference prefs = repository.findByUserId(userId);
        if (prefs != null) {  // don't cache misses; RedisTemplate rejects null values
            redis.opsForValue().set(cacheKey, prefs, CACHE_TTL);
        }
        return prefs;
    }

    public void updatePreferences(String userId, UserPreference prefs) {
        repository.save(prefs);                      // write to the source of truth first
        redis.delete("user:preferences:" + userId);  // then invalidate the cache entry
    }
}

The Write-Heavy System: Uber's Trip Data Approach

Uber processes millions of GPS location updates per second at peak. Unlike Netflix's read-heavy model, location tracking is write-heavy. The architectural strategy is write path optimization with eventual consistency.

Message Queue as Write Buffer

Raw GPS events are published to Apache Kafka topics rather than written directly to a database. Kafka's throughput (millions of messages per second per cluster) far exceeds any database's write throughput. Consumers process events asynchronously: the trip position service reads events for active trips, the fare calculation service reads events to estimate ETAs and fares, and the analytics service reads events for business intelligence. The database receives only aggregated, post-processed data.
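
The buffer-then-aggregate flow can be sketched in plain Java. This is an illustrative stand-in only: an ArrayBlockingQueue plays the role of the Kafka topic, a ConcurrentHashMap plays the role of the database, and all class and field names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the write-buffer pattern: raw location events land in a buffer
// (a Kafka topic in production, a BlockingQueue here), and a consumer
// aggregates them before anything touches the "database".
public class LocationWriteBuffer {
    record GpsEvent(String tripId, double lat, double lon) {}

    private final BlockingQueue<GpsEvent> buffer = new ArrayBlockingQueue<>(10_000);

    // The "database" receives only the latest aggregated position per trip.
    private final Map<String, GpsEvent> latestPosition = new ConcurrentHashMap<>();

    public boolean publish(GpsEvent e) {
        // Non-blocking write path: under backpressure, callers can drop or retry
        // rather than stalling the GPS ingestion hot path.
        return buffer.offer(e);
    }

    public void drainOnce() {
        // Consumer side: drain buffered events, keeping only the newest per trip.
        GpsEvent e;
        while ((e = buffer.poll()) != null) {
            latestPosition.put(e.tripId(), e);
        }
    }

    public GpsEvent latest(String tripId) {
        return latestPosition.get(tripId);
    }
}
```

The key property the sketch demonstrates is decoupling: the publish path never waits on the slow aggregation path, which is the same reason Kafka sits between GPS ingestion and the database in the real architecture.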

Database Sharding for Write Scale

When write volume exceeds a single database's capacity, horizontal sharding distributes data across multiple database instances. Each shard handles a subset of the key space — for trip data, sharding by driver ID or geographic region is natural. The application layer (or a proxy like Vitess, which Uber uses for MySQL sharding) routes writes and reads to the correct shard based on the sharding key.

Sharding introduces complexity: cross-shard queries require scatter-gather aggregation; shard rebalancing as the dataset grows requires careful migration planning; and operations that span multiple shards cannot be wrapped in a single ACID transaction. Design the sharding key to minimize cross-shard operations.
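
A minimal sketch of application-level routing by sharding key, assuming a modulo-based scheme and a hypothetical shard naming convention (production routers such as Vitess use range-based or lookup-based routing precisely so shards can be rebalanced without rehashing everything):

```java
// Minimal sketch of application-level shard routing by driver ID.
// Plain hash-modulo is shown for clarity; it makes adding shards expensive,
// which is why real systems prefer consistent hashing or a routing table.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    public int shardFor(String driverId) {
        // floorMod guards against negative hashCode values
        return Math.floorMod(driverId.hashCode(), shardCount);
    }

    public String jdbcUrlFor(String driverId) {
        // Hypothetical DSN naming scheme for illustration only.
        return "jdbc:mysql://trips-shard-" + shardFor(driverId) + ".internal:3306/trips";
    }
}
```

Note that the router is deterministic: the same driver ID always maps to the same shard, which is what keeps all of one driver's trip writes on a single instance and avoids cross-shard transactions for the common case.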

Geographic Distribution: Multi-Region Architecture

For true global scale, a single cloud region is not sufficient — neither for latency (roughly 250 ms round-trip from the US to Asia Pacific) nor for resilience (a single-region outage must not take down the entire service). Multi-region architecture typically follows one of two patterns: active-passive (the primary region handles all writes; a secondary region is a warm standby for disaster recovery) or active-active (multiple regions handle writes with cross-region synchronization). Active-active is harder to operate but achieves both lower global latency and higher resilience.

The challenge in active-active is handling conflicting writes to the same data from different regions. Design your data model to minimize cross-region write conflicts: partition data by geography (US users managed by US region, EU users by EU region), or use CRDTs (Conflict-free Replicated Data Types) for data that can be merged without conflict (counters, sets).
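
As a concrete example of conflict-free merging, a grow-only counter (G-Counter) fits in a few lines: each region increments only its own slot, and replicas merge by taking the per-region maximum, so concurrent increments from different regions never conflict. The sketch below is illustrative, not a production CRDT library.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a grow-only counter (G-Counter) CRDT. Each region increments
// its own entry; merging takes the per-region maximum, so merge order
// doesn't matter and replicas converge without coordination.
public class GCounter {
    private final Map<String, Long> counts = new HashMap<>();

    public void increment(String region) {
        counts.merge(region, 1L, Long::sum);
    }

    public long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    public void merge(GCounter other) {
        other.counts.forEach((region, n) -> counts.merge(region, n, Math::max));
    }
}
```

Because merge is commutative, associative, and idempotent, two regions can exchange state in any order, any number of times, and still agree on the total — the defining property that makes CRDTs safe for active-active replication.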

The CAP Theorem in Practice

The CAP theorem states that a distributed system can provide at most two of three guarantees: Consistency (all nodes see the same data at the same time), Availability (every request receives a response), and Partition tolerance (the system continues despite network partition). In practice, network partitions happen, so systems must choose between consistency and availability under partition.

Uber's dispatch system prioritizes availability: it is better to show a driver's last known location than to block dispatch because the location database is temporarily unavailable. Netflix's streaming prioritizes availability: it is better to play content from a slightly stale CDN cache than to block playback waiting for the freshest metadata. On the other hand, financial transaction systems (payment processing, account balances) prioritize consistency: a stale or incorrect balance is not an acceptable trade-off for availability.

Know your business requirements and make an explicit consistency vs availability decision for each data type rather than applying a blanket policy.

"Scale is not a binary — you scale to your actual requirements. Over-designing for scale you don't have is just as damaging as under-designing for scale you do have."

Key Takeaways

  • Address all three axes of scale: traffic, data, and geographic distribution.
  • Start with capacity estimates — numbers drive architectural decisions, not pattern preferences.
  • Read-heavy systems: CDN + read replicas + aggressive caching reduce database load by 99%+.
  • Write-heavy systems: Kafka as a write buffer + database sharding for write throughput at scale.
  • Make explicit CAP theorem decisions for each data type based on business requirements.
