
Designing a Video Streaming Platform at Scale: YouTube Architecture, Adaptive Bitrate & CDN

Video streaming platforms are among the most data-intensive distributed systems ever built. YouTube serves over 1 billion hours of video daily; Netflix accounts for 15% of global internet traffic. This guide dissects every layer of the stack — from the upload pipeline to the adaptive player in your browser — with concrete design decisions and real numbers.

Md Sanwar Hossain · April 6, 2026 · 22 min read · System Design

TL;DR — Core Architecture Decisions

"A video streaming platform needs: (1) a resumable upload pipeline that chunks videos and writes to blob storage (S3/GCS), (2) a parallel transcoding fleet producing a bitrate ladder (240p–4K) in H.264/VP9/AV1, (3) HLS/DASH manifest generation for adaptive bitrate delivery, (4) a multi-tier CDN (edge PoPs → regional clusters → origin shield) serving segments with 99%+ cache hit ratios, and (5) a two-stage recommendation engine (candidate retrieval + ranking) powered by watch-history embeddings."

Table of Contents

  1. Architecture Overview & Scale Numbers
  2. Video Ingestion & Upload Pipeline
  3. Transcoding at Scale: Bitrate Ladder & Codec Selection
  4. Adaptive Bitrate Streaming: HLS & DASH
  5. CDN & Edge Delivery Architecture
  6. Metadata & Storage Layer
  7. Recommendation Engine
  8. Live Streaming vs VOD: Key Differences
  9. Search & Discovery
  10. Cost Optimization Strategies
  11. Capacity Estimation & Conclusion

1. Architecture Overview & Scale Numbers

A video streaming platform has two fundamentally different traffic patterns: write path (upload, transcode, index) and read path (browse, play, seek). The read path dwarfs the write path by orders of magnitude — for every video uploaded, millions of views happen. This asymmetry drives most design decisions.

YouTube-Scale Numbers (2026)

| Metric | Number | Design Implication |
|---|---|---|
| Daily active users | 2.5 billion | Global CDN with 200+ PoPs |
| Video hours watched/day | 1 billion hours | 200+ Tbps egress bandwidth |
| Videos uploaded/minute | 500 hours of video | Parallel transcoding workers |
| Storage | Exabytes | Tiered cold/warm/hot blob storage |
| Peak concurrent viewers | 80+ million | Aggressive CDN prefetching + edge caching |

High-Level System Components

The platform decomposes into five planes:

  • Ingestion pipeline: resumable chunked uploads into blob storage
  • Transcoding fleet: parallel workers producing the bitrate ladder
  • CDN edge delivery: multi-tier caching that serves video segments close to viewers
  • Metadata layer: titles, counters, comments, and watch history
  • Recommendation engine: two-stage candidate retrieval and ranking

Complete video streaming platform architecture — from upload to playback at global scale. Source: mdsanwarhossain.me

2. Video Ingestion & Upload Pipeline

Uploading a large video file requires careful engineering. Raw uploads can be hundreds of gigabytes (4K, 8K films). A naive single-request upload would time out and lose progress on network interruptions. The solution is resumable chunked uploads.

Resumable Upload Protocol

  1. Initiate: Client sends a POST to the upload API with file metadata (size, MIME type, title). Server returns a unique upload session ID and a presigned URL pointing to blob storage.
  2. Chunk: Client splits the file into 5–10 MB chunks and sends each with a Content-Range header. Chunks can be sent in parallel (8–16 concurrent connections) for speed.
  3. Track state: Upload service stores chunk completion state in Redis (UPLOAD:{sessionId} → bitmap). If the network drops, the client queries which chunks are missing and resends only those.
  4. Assemble: Once all chunks are received, the upload service writes the assembled raw file to a "raw" bucket in S3 and publishes a VIDEO_UPLOADED event to Kafka.
  5. Validate: A validation worker consumes the event, verifies the file (format check, virus scan, copyright fingerprint via Content ID), and transitions the video to PROCESSING state.
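The chunk-tracking step above can be sketched in a few lines of Python. This is a minimal in-memory stand-in for the Redis bitmap; the `UploadSession` class and the 8 MB chunk size are illustrative, not a real API:

```python
# Resumable-upload chunk tracking, with an in-memory set standing in for
# the Redis bitmap keyed by UPLOAD:{sessionId}.
import math

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, within the 5-10 MB range above

class UploadSession:
    def __init__(self, session_id: str, file_size: int):
        self.session_id = session_id
        self.total_chunks = math.ceil(file_size / CHUNK_SIZE)
        self.received = set()  # stand-in for the Redis bitmap

    def mark_received(self, chunk_index: int) -> None:
        if 0 <= chunk_index < self.total_chunks:
            self.received.add(chunk_index)

    def missing_chunks(self) -> list[int]:
        # After a network drop, the client queries exactly this list
        # and resends only the missing chunks.
        return [i for i in range(self.total_chunks) if i not in self.received]

    def is_complete(self) -> bool:
        return len(self.received) == self.total_chunks

# Example: a 50 MB file yields 7 chunks; chunks 0-4 arrive, then the
# connection drops.
session = UploadSession("abc123", 50 * 1024 * 1024)
for i in range(5):
    session.mark_received(i)
print(session.missing_chunks())  # -> [5, 6]
```

The same shape works whether the state lives in Redis (SETBIT per chunk) or a database row; the key property is that resume cost is proportional to what is missing, not to file size.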

Direct-to-Storage Upload Pattern

For large files, routing through application servers wastes bandwidth and CPU. The preferred pattern: the upload API generates a presigned S3 URL with a 6-hour expiry. The client browser uploads directly to S3, bypassing your application tier entirely. Your upload API only handles metadata; your servers are never a bottleneck for the bytes themselves. Upon successful S3 upload, S3 triggers a Lambda or sends an S3 event notification to SQS/Kafka to kick off the processing pipeline.
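To make the presigned-URL idea concrete, here is an illustrative HMAC-based sketch of how signing and later verification work. This is not AWS Signature V4 (with boto3 you would call `s3.generate_presigned_url("put_object", ...)` instead); the secret, bucket hostname, and function names below are hypothetical:

```python
# Illustrative presigned-URL mechanics: the API server signs the
# (method, bucket, key, expiry) tuple with a secret; the storage service
# later recomputes the signature to authorize the PUT without any
# database lookup.
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"server-side-secret"  # hypothetical shared secret

def presign_put(bucket: str, key: str, expires_in: int = 6 * 3600) -> str:
    expiry = int(time.time()) + expires_in          # 6-hour expiry, as above
    payload = f"PUT:{bucket}:{key}:{expiry}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expiry, "signature": sig})
    return f"https://{bucket}.storage.example.com/{key}?{query}"

def verify(bucket: str, key: str, expiry: int, sig: str) -> bool:
    if time.time() > expiry:
        return False  # link expired
    payload = f"PUT:{bucket}:{key}:{expiry}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

url = presign_put("raw-uploads", "videos/abc123.mp4")
```

Because the URL is self-authenticating, storage can accept the upload with no call back to your application tier, which is precisely why the pattern removes your servers from the data path.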

3. Transcoding at Scale: Bitrate Ladder & Codec Selection

Transcoding is the most compute-intensive operation in the platform. A single 4K, 60fps, 2-hour film may require 10+ hours of CPU time for a full codec ladder — which is why the fleet must parallelize aggressively.

The Bitrate Ladder

Each uploaded video is transcoded into multiple quality tiers (the "bitrate ladder") so adaptive streaming can choose the right one at runtime:

| Resolution | H.264 Bitrate | VP9/AV1 Bitrate | Use Case |
|---|---|---|---|
| 240p | 300 kbps | 150 kbps | 2G / very slow connections |
| 480p | 1 Mbps | 500 kbps | Mobile, 3G |
| 720p | 2.5 Mbps | 1.2 Mbps | Standard Wi-Fi |
| 1080p | 5 Mbps | 2.5 Mbps | Full HD, broadband |
| 1440p (2K) | 10 Mbps | 5 Mbps | High-end desktop |
| 2160p (4K) | 20 Mbps | 10 Mbps | 4K TV, premium |
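The ladder also sets the storage bill. A quick back-of-envelope sketch using the bitrates in the table: one hour of content across all six H.264 rungs works out to roughly 17.5 GB, and the full VP9 ladder to roughly 8.7 GB.

```python
# Storage cost of the bitrate ladder, per hour of content.
# bits -> bytes: divide by 8; 1 hour = 3600 seconds.
H264_LADDER_KBPS = {"240p": 300, "480p": 1000, "720p": 2500,
                    "1080p": 5000, "1440p": 10000, "2160p": 20000}
VP9_LADDER_KBPS  = {"240p": 150, "480p": 500, "720p": 1200,
                    "1080p": 2500, "1440p": 5000, "2160p": 10000}

def ladder_gb_per_hour(ladder_kbps: dict[str, int]) -> float:
    total_kbps = sum(ladder_kbps.values())
    return total_kbps * 1000 * 3600 / 8 / 1e9  # GB per hour of content

h264 = ladder_gb_per_hour(H264_LADDER_KBPS)  # ~17.5 GB
vp9 = ladder_gb_per_hour(VP9_LADDER_KBPS)    # ~8.7 GB
```

Storing both ladders lands near 26 GB per hour of content, which is consistent with the ~30 GB/hour figure used in the capacity estimation later in this guide.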

Parallel Segmented Transcoding

Rather than transcoding a 2-hour film as a single job, the pipeline splits the raw video into 2-minute segments, transcodes each segment in parallel across dozens of workers, then stitches the outputs. This reduces a 4-hour transcoding job to under 10 minutes for most content. The workflow:

# Transcoding pipeline (simplified)
1. Split raw video into 120s segments (ffmpeg -segment_time 120)
2. Fan out: publish N segment jobs to Kafka topic "transcode-jobs"
3. Worker pool (auto-scaled on Kubernetes) each picks a job:
   - Downloads segment from S3 (raw bucket)
   - Transcodes to all bitrate rungs: H.264, VP9, AV1
   - Uploads output segments to S3 (processed bucket)
   - Publishes "SEGMENT_DONE" event
4. Manifest builder waits for all segments to complete
5. Generates HLS (.m3u8) and DASH (MPD) manifests
6. Updates video state to PUBLISHED in metadata DB
7. Triggers CDN cache warming for top-N edge PoPs
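The fan-out/fan-in shape of steps 2–4 can be sketched with a local thread pool standing in for the Kafka topic and the Kubernetes worker fleet; the `transcode_segment` stub below replaces the real ffmpeg invocation:

```python
# Minimal fan-out/fan-in sketch of segmented transcoding.
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SECONDS = 120
RUNGS = ["240p", "480p", "720p", "1080p", "1440p", "2160p"]

def transcode_segment(segment_index: int) -> dict:
    # Real worker: download the raw segment from S3, run ffmpeg once per
    # rung, upload outputs, publish SEGMENT_DONE.
    return {"segment": segment_index,
            "outputs": [f"seg{segment_index:04d}_{r}.ts" for r in RUNGS]}

def transcode_video(duration_seconds: int, workers: int = 32) -> list[dict]:
    n_segments = -(-duration_seconds // SEGMENT_SECONDS)  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transcode_segment, range(n_segments)))
    # The manifest builder runs only once every segment has completed.
    assert len(results) == n_segments
    return results

results = transcode_video(2 * 3600)  # a 2-hour film -> 60 segments
```

With 60 independent segments and dozens of workers, wall-clock time approaches the cost of the slowest single segment, which is where the "4 hours down to under 10 minutes" speedup comes from.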

Codec Strategy

YouTube uses a multi-codec strategy: H.264 for maximum device compatibility (every browser and device since 2010), VP9 for Chromium browsers (50% bandwidth saving vs H.264), and AV1 for new devices where supported (additional 30% saving over VP9). Netflix uses a per-title encoding approach — complex content (action, fireworks) gets higher bitrates at each rung; simple content (talking heads) gets lower bitrates with identical visual quality. This alone saves Netflix 20% in storage and bandwidth costs.

HLS/DASH adaptive bitrate streaming with multi-tier CDN: edge PoPs serve segments from cache; origin shield protects blob storage from cache misses. Source: mdsanwarhossain.me

4. Adaptive Bitrate Streaming: HLS & DASH

Adaptive Bitrate (ABR) streaming is the technology that allows the player to seamlessly switch quality levels based on available bandwidth — without the user pressing a button. This is what prevents buffering on slow connections while still delivering 4K on fast ones.

How HLS Works

HTTP Live Streaming (HLS, developed by Apple) works as follows:

  1. The transcoding pipeline outputs video segments (typically 2–10 seconds each) as .ts or .fmp4 files per quality level.
  2. A media playlist (.m3u8) file lists all segment URLs for one quality level.
  3. A master playlist (.m3u8) lists all quality-level playlists with bandwidth hints.
  4. The player downloads the master playlist, measures available bandwidth, selects the appropriate quality, then downloads segments sequentially.
  5. Every few seconds, the player re-evaluates bandwidth and may switch to a higher or lower quality rung — the switch is seamless because all segments are independently decodable.

# HLS Master Playlist (example)
#EXTM3U
#EXT-X-VERSION:6

#EXT-X-STREAM-INF:BANDWIDTH=300000,RESOLUTION=426x240,CODECS="avc1.42E01E,mp4a.40.2"
240p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480,CODECS="avc1.42E01E,mp4a.40.2"
480p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4D401F,mp4a.40.2"
720p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/playlist.m3u8
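The player's selection step (item 4 above) boils down to picking the highest rung that fits the measured bandwidth with a safety margin. A minimal sketch, assuming the BANDWIDTH values from the master playlist above; real players such as hls.js and Shaka also smooth throughput estimates and factor in buffer level:

```python
# Simple ABR rung selection over the master playlist's BANDWIDTH hints.
LADDER = [  # (bits per second, media playlist), lowest to highest
    (300_000, "240p/playlist.m3u8"),
    (1_000_000, "480p/playlist.m3u8"),
    (2_500_000, "720p/playlist.m3u8"),
    (5_000_000, "1080p/playlist.m3u8"),
]

def pick_rung(measured_bps: float, safety: float = 0.8) -> str:
    """Highest rung whose declared bandwidth fits within a safety margin."""
    budget = measured_bps * safety
    best = LADDER[0][1]  # always fall back to the lowest rung
    for bandwidth, playlist in LADDER:
        if bandwidth <= budget:
            best = playlist
    return best

pick_rung(4_000_000)  # 4 Mbps link -> "720p/playlist.m3u8"
pick_rung(200_000)    # very slow link -> "240p/playlist.m3u8"
```

The safety factor is the interesting knob: too aggressive and the player oscillates between rungs; too conservative and fast connections never reach 1080p.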

DASH vs HLS

MPEG-DASH (Dynamic Adaptive Streaming over HTTP) is the open-standard alternative. Netflix uses DASH; YouTube supports both. Key differences: HLS uses .m3u8 + .ts/.fmp4; DASH uses XML manifests (MPD) + .mp4 segments. Both achieve the same ABR outcome. For browsers, DASH requires the Media Source Extensions (MSE) API; HLS is natively supported in Safari and iOS. Modern platforms serve DASH via JavaScript players (Shaka Player, dash.js) on desktop and HLS on iOS.

5. CDN & Edge Delivery Architecture

The CDN is the single most important performance component for a video platform. Without edge caching, every viewer's video segments would cross the globe to reach origin storage — adding hundreds of milliseconds of latency and incurring enormous origin egress costs.

Multi-Tier CDN Architecture

Requests flow through three caching tiers: edge PoPs closest to viewers serve the vast majority of segments directly from cache; regional clusters absorb misses from many edges; and an origin shield (a single consolidated cache in front of blob storage) collapses duplicate misses so that origin sees only a small fraction of total traffic.

Cache Warming for Viral Content

When a video goes viral, the CDN cold-cache problem can cause a massive spike at origin. The mitigation strategy: upon video publication, proactively push the first 30 seconds of the 720p rung (the most popular quality for initial playback) to the top 20 edge PoPs by geographic traffic volume. This ensures the first wave of viewers hits warm cache. For highly anticipated events (product launches, sports finals), pre-warm all rungs to all PoPs 15 minutes before broadcast.

6. Metadata & Storage Layer

Video bytes live in blob storage; everything else — titles, descriptions, view counts, likes, comments — lives in the metadata layer. This layer must handle writes (new views, likes) at extreme rates while serving reads with sub-50ms latency.

Storage Decisions by Data Type

| Data Type | Storage | Rationale |
|---|---|---|
| Video bytes (raw + transcoded) | S3 / GCS | Petabyte-scale, 11 nines durability, tiered storage |
| Video metadata (title, description, tags) | PostgreSQL (sharded) | ACID, complex queries, relational integrity |
| View counters, like counts | Redis (+ Cassandra for durability) | INCR at millions/sec; eventual consistency OK |
| Comments | Bigtable / Cassandra | Write-heavy, time-ordered, wide rows per video |
| Watch history / user events | Kafka → BigQuery / Iceberg | Append-only stream, batch analytics for ML |
| Search index | Elasticsearch | Full-text search, faceting, ranking |

View Count Scaling

View counts on viral videos can hit millions per second. Writing every view directly to PostgreSQL would saturate the database. The solution: use Redis INCR as a write buffer, then flush to PostgreSQL asynchronously via a background job every 60 seconds. For display, serve the Redis value (eventually consistent but fast). For analytics and billing, use the Kafka event log as the ground truth — every view event is published to Kafka and consumed by BigQuery for accurate aggregation.
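The buffer-and-flush pattern can be sketched with one dict standing in for Redis INCR and another for PostgreSQL; the function names are illustrative:

```python
# Write-buffer pattern for hot counters: absorb writes in a cheap
# in-memory increment, drain to the durable store in periodic batches.
from collections import defaultdict

pending = defaultdict(int)   # stand-in for Redis INCR counters
persisted = {}               # stand-in for PostgreSQL

def record_view(video_id: str) -> None:
    pending[video_id] += 1   # O(1), absorbs millions of writes/sec

def flush() -> None:
    # Background job (every ~60s): drain the buffer in one batch write.
    for video_id, count in pending.items():
        persisted[video_id] = persisted.get(video_id, 0) + count
    pending.clear()

def display_count(video_id: str) -> int:
    # Fast, eventually consistent read: durable total + unflushed delta.
    return persisted.get(video_id, 0) + pending[video_id]

for _ in range(1000):
    record_view("v1")
flush()
record_view("v1")
print(display_count("v1"))  # -> 1001
```

A viral video then costs PostgreSQL one row update per minute instead of millions of writes per second, while viewers still see a count that lags reality by at most one flush interval.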

7. Recommendation Engine

YouTube's recommendation engine is the highest-value component of the platform — it drives 70%+ of views. The architecture follows a classic two-stage retrieval + ranking pipeline used by Netflix, Spotify, and all major recommendation systems.

Stage 1: Candidate Retrieval

The corpus has billions of videos; you cannot rank all of them. Retrieval narrows the field to a few hundred candidates in <50ms, typically via approximate nearest-neighbor search over watch-history embeddings, combined with lightweight signals such as subscriptions, co-watch patterns, and regional trends.

Stage 2: Ranking

The ~500 candidates from retrieval are passed to the ranking model, which scores each one using a deep neural network. Features include: user-video affinity (predicted watch percentage), video freshness, creator quality score, expected watch time, CTR calibration (prevent clickbait), and diversity penalty (avoid 10 cooking videos in a row). The ranker outputs a final ordered list in <100ms. YouTube's ranker is trained on billions of examples and optimized for a mix of watch time and user satisfaction signals.

8. Live Streaming vs VOD: Key Differences

Live streaming (Twitch, YouTube Live) and Video on Demand (YouTube VOD) share the CDN and player infrastructure but diverge significantly in the ingestion and transcoding layers.

VOD (Pre-recorded)

  • Upload → transcode → publish (async, minutes to hours)
  • Can optimize codec per title (per-title encoding)
  • Pre-warm CDN cache before release
  • Seekable anywhere in the video
  • Latency to viewer: milliseconds (buffered)

Live Streaming

  • RTMP/SRT ingest → real-time transcode → segment push
  • Latency is a design constraint (2–30s depending on use case)
  • Cannot pre-warm cache; origin shield bears initial spike
  • DVR functionality: retain last N minutes as seekable segments
  • Latency to viewer: 2–30s (HLS) or ~1s (WebRTC for ultra-low)

9. Search & Discovery

Search on a video platform must handle entity recognition (creator names, show titles), typo tolerance, and relevance ranking that factors in engagement signals — not just text match.

Indexing Pipeline

When a video is published, a Kafka consumer triggers the search indexing pipeline: video metadata (title, description, tags, transcript from automated speech recognition) is sent to Elasticsearch. The index maintains inverted indexes for text search and dense vector fields (via Elasticsearch's kNN support) for semantic search. Relevance ranking is a learning-to-rank model that uses query-video text similarity + view count + engagement rate + recency as features, trained on click-through data.

10. Cost Optimization Strategies

At YouTube scale, a 1% reduction in storage or bandwidth cost saves tens of millions of dollars annually. The major levers:

  • Codec efficiency: VP9 cuts bandwidth roughly in half versus H.264, and AV1 saves a further ~30% over VP9 (Section 3)
  • Per-title encoding: tuning the bitrate ladder to each video's complexity; Netflix reports ~20% savings in storage and bandwidth
  • Tiered storage: moving cold, rarely watched content (and its less popular rungs) to cheaper storage classes
  • CDN cache hit ratio: every additional point of hit ratio directly removes expensive origin egress

11. Capacity Estimation & Conclusion

For a system design interview, demonstrate structured estimation:

Back-of-Envelope: Storage

  • 500 hours of video uploaded per minute = 30,000 hours of content per hour ≈ 720,000 hours/day
  • 1 hour of raw video ≈ 10 GB (4K source)
  • After transcoding (6 rungs × ~0.5× compression): ~30 GB per hour of content
  • Daily new storage: 720,000 hours × 30 GB ≈ 21.6 PB/day
  • Over 10 years that compounds to tens of exabytes before deduplication, deletion, and re-encoding with newer codecs, which is exactly why tiered storage and codec efficiency are existential cost levers

Back-of-Envelope: Bandwidth

  • 1 billion viewer-hours watched daily ÷ 24 hours = ~42 million average concurrent viewers
  • Average bitrate per viewer: 3 Mbps (mix of resolutions)
  • Total egress: 42M × 3 Mbps ≈ 125 Tbps average; peak ≈ 2× = ~250 Tbps
  • CDN cache hit ratio 99% → origin sees only 1% ≈ 2.5 Tbps
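The arithmetic is worth making reproducible. A sketch assuming 1 billion viewer-hours/day, a 3 Mbps average bitrate, a 2× peak factor, and a 99% cache hit ratio:

```python
# Back-of-envelope bandwidth model. Average concurrency is total
# viewer-hours per day divided by 24 hours.
HOURS_WATCHED_PER_DAY = 1e9
AVG_BITRATE_BPS = 3e6          # 3 Mbps, mix of resolutions
CACHE_HIT_RATIO = 0.99
PEAK_FACTOR = 2.0

avg_concurrent = HOURS_WATCHED_PER_DAY / 24
avg_egress_tbps = avg_concurrent * AVG_BITRATE_BPS / 1e12
peak_egress_tbps = avg_egress_tbps * PEAK_FACTOR
origin_tbps = peak_egress_tbps * (1 - CACHE_HIT_RATIO)

print(f"{avg_concurrent/1e6:.1f}M concurrent")   # ~41.7M
print(f"{avg_egress_tbps:.0f} Tbps average")     # ~125 Tbps
print(f"{peak_egress_tbps:.0f} Tbps peak")       # ~250 Tbps
print(f"{origin_tbps:.1f} Tbps at origin")       # ~2.5 Tbps
```

Note the units: dividing viewer-hours per day by 24 hours yields a dimensionless concurrency count, and the peak figure lines up with the 200+ Tbps egress and 80+ million peak-viewer numbers in the scale table.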

A production-grade video streaming platform is a masterclass in distributed systems. The key insight is that read optimization dominates: the CDN, the ABR player, and the recommendation engine all exist to serve a read-heavy workload with minimum latency and cost. The upload and transcoding systems are comparatively straightforward engineering challenges — the hard part is serving tens of millions of concurrent viewers at <200ms buffer start time across 200 countries.

Md Sanwar Hossain

Software Engineer · Java · Spring Boot · Microservices · System Design

Last updated: April 6, 2026