Designing a Video Streaming Platform at Scale: YouTube Architecture, Adaptive Bitrate & CDN
Video streaming platforms are among the most data-intensive distributed systems ever built. YouTube serves over 1 billion hours of video daily; Netflix accounts for 15% of global internet traffic. This guide dissects every layer of the stack — from the upload pipeline to the adaptive player in your browser — with concrete design decisions and real numbers.
TL;DR — Core Architecture Decisions
"A video streaming platform needs: (1) a resumable upload pipeline that chunks videos and writes to blob storage (S3/GCS), (2) a parallel transcoding fleet producing a bitrate ladder (240p–4K) in H.264/VP9/AV1, (3) HLS/DASH manifest generation for adaptive bitrate delivery, (4) a multi-tier CDN (edge PoPs → regional clusters → origin shield) serving segments with 99%+ cache hit ratios, and (5) a two-stage recommendation engine (candidate retrieval + ranking) powered by watch-history embeddings."
Table of Contents
- Architecture Overview & Scale Numbers
- Video Ingestion & Upload Pipeline
- Transcoding at Scale: Bitrate Ladder & Codec Selection
- Adaptive Bitrate Streaming: HLS & DASH
- CDN & Edge Delivery Architecture
- Metadata & Storage Layer
- Recommendation Engine
- Live Streaming vs VOD: Key Differences
- Search & Discovery
- Cost Optimization Strategies
- Capacity Estimation & Conclusion
1. Architecture Overview & Scale Numbers
A video streaming platform has two fundamentally different traffic patterns: write path (upload, transcode, index) and read path (browse, play, seek). The read path dwarfs the write path by orders of magnitude — for every video uploaded, millions of views happen. This asymmetry drives most design decisions.
YouTube-Scale Numbers (2026)
| Metric | Number | Design Implication |
|---|---|---|
| Daily active users | 2.5 billion | Global CDN with 200+ PoPs |
| Video hours watched/day | 1 billion hours | 200+ Tbps egress bandwidth |
| Videos uploaded/minute | 500 hours of video | Parallel transcoding workers |
| Storage | Exabytes | Tiered cold/warm/hot blob storage |
| Peak concurrent viewers | 80+ million | Aggressive CDN prefetching + edge caching |
High-Level System Components
The platform decomposes into five planes:
- Upload plane: Client SDK → API gateway → chunked upload service → raw blob storage (S3/GCS)
- Processing plane: Upload event → message queue (Kafka) → transcoding workers → processed blob storage + manifest generation
- Serving plane: Client player → CDN edge → origin shield → blob storage
- Metadata plane: PostgreSQL (source of truth), Elasticsearch (search), Redis (hot data cache), Cassandra (view counters)
- Intelligence plane: Kafka events → stream processor → feature store → ML recommendation models → ranking service
2. Video Ingestion & Upload Pipeline
Uploading a large video file requires careful engineering. Raw uploads can be hundreds of gigabytes (4K, 8K films). A naive single-request upload would time out and lose progress on network interruptions. The solution is resumable chunked uploads.
Resumable Upload Protocol
- Initiate: Client sends a POST to the upload API with file metadata (size, MIME type, title). Server returns a unique upload session ID and a presigned URL pointing to blob storage.
- Chunk: Client splits the file into 5–10 MB chunks and sends each with a `Content-Range` header. Chunks can be sent in parallel (8–16 concurrent connections) for speed.
- Track state: Upload service stores chunk completion state in Redis (`UPLOAD:{sessionId}` → bitmap). If the network drops, the client queries which chunks are missing and resends only those.
- Assemble: Once all chunks are received, the upload service writes the assembled raw file to a "raw" bucket in S3 and publishes a `VIDEO_UPLOADED` event to Kafka.
- Validate: A validation worker consumes the event, verifies the file (format check, virus scan, copyright fingerprint via Content ID), and transitions the video to `PROCESSING` state.
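The chunk-tracking step can be sketched in a few lines. This is a minimal in-process sketch with hypothetical names; a real deployment would back the completion set with a Redis bitmap (`SETBIT`/`BITCOUNT`) keyed by `UPLOAD:{sessionId}` rather than a Python set.

```python
# Sketch of resumable-upload chunk tracking (in-memory stand-in for Redis).
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, middle of the 5-10 MB range

class UploadSession:
    def __init__(self, session_id, file_size):
        self.session_id = session_id
        self.total_chunks = -(-file_size // CHUNK_SIZE)  # ceiling division
        self.completed = set()

    def mark_done(self, chunk_index):
        self.completed.add(chunk_index)

    def missing_chunks(self):
        # After a network drop, the client asks for this list and
        # resends only the chunks that never arrived.
        return [i for i in range(self.total_chunks) if i not in self.completed]

    def is_complete(self):
        return len(self.completed) == self.total_chunks

session = UploadSession("abc123", file_size=50 * 1024 * 1024)  # 50 MB -> 7 chunks
for i in (0, 1, 2, 4, 6):            # chunks 3 and 5 were lost mid-transfer
    session.mark_done(i)
print(session.missing_chunks())      # -> [3, 5]
```

The bitmap representation matters at scale: tracking a 100 GB upload as ~12,800 chunk bits costs under 2 KB of Redis memory per session.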
Direct-to-Storage Upload Pattern
For large files, routing through application servers wastes bandwidth and CPU. The preferred pattern: the upload API generates a presigned S3 URL with a 6-hour expiry. The client browser uploads directly to S3, bypassing your application tier entirely. Your upload API only handles metadata; your servers are never a bottleneck for the bytes themselves. Upon successful S3 upload, S3 triggers a Lambda or sends an S3 event notification to SQS/Kafka to kick off the processing pipeline.
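To build intuition for what a presigned URL actually is, here is a deliberately simplified HMAC scheme: the server signs `(path, expiry)` with a secret, and storage verifies the same HMAC before accepting the upload. Real S3 presigning uses AWS Signature V4 (e.g. via the SDK's presigning helpers); this stripped-down version exists only to show the mechanism, and all names in it are illustrative.

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # hypothetical key; never leaves the server

def presign(path, ttl_seconds, now=None):
    # Sign the path plus an absolute expiry timestamp.
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    payload = f"{path}?expires={expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}&signature={sig}"

def verify(url, now=None):
    # Storage-side check: recompute the HMAC and enforce the expiry.
    base, _, sig = url.rpartition("&signature=")
    _, _, exp = base.rpartition("?expires=")
    expected = hmac.new(SECRET, base.encode(), hashlib.sha256).hexdigest()
    not_expired = int(exp) > (now if now is not None else time.time())
    return hmac.compare_digest(sig, expected) and not_expired

url = presign("/raw-bucket/upload-abc123", ttl_seconds=6 * 3600, now=1_700_000_000)
print(verify(url, now=1_700_000_000 + 60))         # True: within the window
print(verify(url + "x", now=1_700_000_000 + 60))   # False: tampered signature
print(verify(url, now=1_700_000_000 + 7 * 3600))   # False: past the 6-hour expiry
```

The key property: the client can prove it was authorized without your servers ever touching the bytes, and the grant self-destructs at expiry.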
3. Transcoding at Scale: Bitrate Ladder & Codec Selection
Transcoding is the most compute-intensive operation in the platform. A single 4K, 60fps, 2-hour film may require 10+ hours of CPU time for a full codec ladder — which is why the fleet must parallelize aggressively.
The Bitrate Ladder
Each uploaded video is transcoded into multiple quality tiers (the "bitrate ladder") so adaptive streaming can choose the right one at runtime:
| Resolution | H.264 Bitrate | VP9/AV1 Bitrate | Use Case |
|---|---|---|---|
| 240p | 300 kbps | 150 kbps | 2G / very slow connections |
| 480p | 1 Mbps | 500 kbps | Mobile, 3G |
| 720p | 2.5 Mbps | 1.2 Mbps | Standard Wi-Fi |
| 1080p | 5 Mbps | 2.5 Mbps | Full HD, broadband |
| 1440p (2K) | 10 Mbps | 5 Mbps | High-end desktop |
| 2160p (4K) | 20 Mbps | 10 Mbps | 4K TV, premium |
Parallel Segmented Transcoding
Rather than transcoding a 2-hour film as a single job, the pipeline splits the raw video into 2-minute segments, transcodes each segment in parallel across dozens of workers, then stitches the outputs. This reduces a 4-hour transcoding job to under 10 minutes for most content. The workflow:
# Transcoding pipeline (simplified)
1. Split raw video into 120s segments (ffmpeg -segment_time 120)
2. Fan out: publish N segment jobs to Kafka topic "transcode-jobs"
3. Worker pool (auto-scaled on Kubernetes) each picks a job:
- Downloads segment from S3 (raw bucket)
- Transcodes to all bitrate rungs: H.264, VP9, AV1
- Uploads output segments to S3 (processed bucket)
- Publishes "SEGMENT_DONE" event
4. Manifest builder waits for all segments to complete
5. Generates HLS (.m3u8) and DASH (MPD) manifests
6. Updates video state to PUBLISHED in metadata DB
7. Triggers CDN cache warming for top-N edge PoPs
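The fan-out/join shape of the pipeline can be sketched with a thread pool. `transcode_segment` here is a stand-in for the real ffmpeg invocation; the point is the structure: each segment is independent, so wall-clock time approaches the cost of one segment rather than the whole film, and the manifest builder runs only after every segment reports done.

```python
import concurrent.futures

RUNGS = ["240p", "480p", "720p", "1080p", "1440p", "2160p"]

def transcode_segment(segment_index):
    # Placeholder for: download segment from the raw bucket, run ffmpeg
    # once per rung/codec, upload outputs, emit SEGMENT_DONE.
    return {
        "segment": segment_index,
        "outputs": [f"{r}/seg{segment_index:05d}.ts" for r in RUNGS],
    }

def transcode_video(duration_seconds, segment_seconds=120):
    n_segments = -(-duration_seconds // segment_seconds)  # ceiling division
    with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
        # Fan out one job per segment; pool.map joins on all of them,
        # mirroring the "manifest builder waits for all segments" step.
        results = list(pool.map(transcode_segment, range(n_segments)))
    return results

done = transcode_video(duration_seconds=2 * 3600)  # 2-hour film -> 60 segments
print(len(done), len(done[0]["outputs"]))           # 60 segments x 6 rungs
```

In production the pool is a Kubernetes worker fleet consuming a Kafka topic rather than threads, but the fan-out/join contract is the same.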
Codec Strategy
YouTube uses a multi-codec strategy: H.264 for maximum device compatibility (every browser and device since 2010), VP9 for Chromium browsers (50% bandwidth saving vs H.264), and AV1 for new devices where supported (additional 30% saving over VP9). Netflix uses a per-title encoding approach — complex content (action, fireworks) gets higher bitrates at each rung; simple content (talking heads) gets lower bitrates with identical visual quality. This alone saves Netflix 20% in storage and bandwidth costs.
4. Adaptive Bitrate Streaming: HLS & DASH
Adaptive Bitrate (ABR) streaming is the technology that allows the player to seamlessly switch quality levels based on available bandwidth — without the user pressing a button. This is what prevents buffering on slow connections while still delivering 4K on fast ones.
How HLS Works
HTTP Live Streaming (HLS, developed by Apple) works as follows:
- The transcoding pipeline outputs video segments (typically 2–10 seconds each) as `.ts` or `.fmp4` files per quality level.
- A media playlist (`.m3u8`) lists all segment URLs for one quality level.
- A master playlist (`.m3u8`) lists all quality-level playlists with bandwidth hints.
- The player downloads the master playlist, measures available bandwidth, selects the appropriate quality, then downloads segments sequentially.
- Every few seconds, the player re-evaluates bandwidth and may switch to a higher or lower quality rung; the switch is seamless because all segments are independently decodable.
# HLS Master Playlist (example)
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-STREAM-INF:BANDWIDTH=300000,RESOLUTION=426x240,CODECS="avc1.42E01E,mp4a.40.2"
240p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480,CODECS="avc1.42E01E,mp4a.40.2"
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4D401F,mp4a.40.2"
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/playlist.m3u8
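A minimal version of the player's rung-selection logic: pick the highest rung whose declared `BANDWIDTH` fits within a safety fraction of measured throughput. The ladder values mirror the master playlist above; the 0.8 safety factor is a common but tunable choice, and real players (Shaka, dash.js) layer buffer-level heuristics on top of this throughput rule.

```python
# (name, required bits per second), ascending, matching the master playlist
LADDER = [
    ("240p", 300_000),
    ("480p", 1_000_000),
    ("720p", 2_500_000),
    ("1080p", 5_000_000),
]

def select_rung(measured_bps, safety=0.8):
    # Spend only a fraction of measured throughput so a small dip
    # doesn't immediately stall playback.
    budget = measured_bps * safety
    chosen = LADDER[0][0]  # never go below the lowest rung
    for name, required in LADDER:
        if required <= budget:
            chosen = name
    return chosen

print(select_rung(4_000_000))   # 3.2 Mbps budget -> "720p"
print(select_rung(10_000_000))  # 8 Mbps budget -> "1080p"
print(select_rung(200_000))     # too slow for everything -> "240p"
```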
DASH vs HLS
MPEG-DASH (Dynamic Adaptive Streaming over HTTP) is the open-standard alternative. Netflix uses DASH; YouTube supports both. Key differences: HLS uses .m3u8 + .ts/.fmp4; DASH uses XML manifests (MPD) + .mp4 segments. Both achieve the same ABR outcome. For browsers, DASH requires the Media Source Extensions (MSE) API; HLS is natively supported in Safari and iOS. Modern platforms serve DASH via JavaScript players (Shaka Player, dash.js) on desktop and HLS on iOS.
5. CDN & Edge Delivery Architecture
The CDN is the single most important performance component for a video platform. Without edge caching, every viewer's video segments would cross the globe to reach origin storage — adding hundreds of milliseconds of latency and consuming enormous egress bandwidth costs.
Multi-Tier CDN Architecture
- Tier 1 — Edge PoPs (200+ locations): Closest to end users. Cache hot video segments. Aim for 99%+ cache hit ratio for top 1% of videos. Serve the majority of requests without going upstream.
- Tier 2 — Regional clusters (20–30 locations): Aggregate requests from multiple edge PoPs. Cache mid-popularity content. Reduce origin shield load by 80–90% for the long tail.
- Tier 3 — Origin shield (2–3 locations): Single point that shields blob storage from the internet. All CDN misses converge here before reaching S3/GCS. Prevents thundering herd on popular uploads (a viral video goes from 0 to 10M requests in minutes).
- Origin — Blob storage: S3-compatible object storage. Never exposed directly to clients. Accessed only by the origin shield on CDN misses.
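A toy edge cache makes the tiering concrete. Production PoPs use far more sophisticated admission and eviction policies (size-aware, popularity-aware), but plain LRU already captures why hot segments stay at the edge while the long tail falls through to upstream tiers; everything here is an illustrative sketch.

```python
from collections import OrderedDict

class EdgeCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()   # segment URL -> bytes, in recency order
        self.hits = 0
        self.misses = 0

    def get(self, segment_url, fetch_upstream):
        if segment_url in self.store:
            self.hits += 1
            self.store.move_to_end(segment_url)   # mark as recently used
            return self.store[segment_url]
        self.misses += 1
        data = fetch_upstream(segment_url)        # regional tier / origin shield
        self.store[segment_url] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)        # evict least recently used
        return data

edge = EdgeCache(capacity=2)
upstream = lambda url: f"bytes-of-{url}".encode()
edge.get("viral/720p/seg1.ts", upstream)   # miss, fills cache
edge.get("viral/720p/seg1.ts", upstream)   # hit, served from the edge
edge.get("viral/720p/seg2.ts", upstream)   # miss
edge.get("longtail/seg9.ts", upstream)     # miss, evicts seg1
print(edge.hits, edge.misses)
```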
Cache Warming for Viral Content
When a video goes viral, the CDN cold-cache problem can cause a massive spike at origin. The mitigation strategy: upon video publication, proactively push the first 30 seconds of the 720p rung (the most popular quality for initial playback) to the top 20 edge PoPs by geographic traffic volume. This ensures the first wave of viewers hits warm cache. For highly anticipated events (product launches, sports finals), pre-warm all rungs to all PoPs 15 minutes before broadcast.
6. Metadata & Storage Layer
Video bytes live in blob storage; everything else — titles, descriptions, view counts, likes, comments — lives in the metadata layer. This layer must handle writes (new views, likes) at extreme rates while serving reads with sub-50ms latency.
Storage Decisions by Data Type
| Data Type | Storage | Rationale |
|---|---|---|
| Video bytes (raw + transcoded) | S3 / GCS | Petabyte-scale, 11 nines durability, tiered storage |
| Video metadata (title, description, tags) | PostgreSQL (sharded) | ACID, complex queries, relational integrity |
| View counters, like counts | Redis (+ Cassandra for durability) | INCR at millions/sec; eventual consistency OK |
| Comments | Bigtable / Cassandra | Write-heavy, time-ordered, wide rows per video |
| Watch history / user events | Kafka → BigQuery / Iceberg | Append-only stream, batch analytics for ML |
| Search index | Elasticsearch | Full-text search, faceting, ranking |
View Count Scaling
View counts on viral videos can hit millions per second. Writing every view directly to PostgreSQL would saturate the database. The solution: use Redis INCR as a write buffer, then flush to PostgreSQL asynchronously via a background job every 60 seconds. For display, serve the Redis value (eventually consistent but fast). For analytics and billing, use the Kafka event log as the ground truth — every view event is published to Kafka and consumed by BigQuery for accurate aggregation.
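The write-buffer pattern above can be sketched with in-memory stand-ins (a dict for Redis, another for PostgreSQL); the names and flush cadence are illustrative.

```python
from collections import defaultdict

class ViewCounter:
    def __init__(self):
        self.buffer = defaultdict(int)   # stand-in for Redis INCR counters
        self.durable = {}                # stand-in for PostgreSQL

    def record_view(self, video_id):
        self.buffer[video_id] += 1       # O(1); Redis sustains millions/sec

    def display_count(self, video_id):
        # What the player shows: durable base plus the not-yet-flushed delta.
        return self.durable.get(video_id, 0) + self.buffer.get(video_id, 0)

    def flush(self):
        # Background job (e.g. every 60 s): one batched write per video
        # instead of one database write per view.
        for video_id, delta in self.buffer.items():
            self.durable[video_id] = self.durable.get(video_id, 0) + delta
        self.buffer.clear()

counter = ViewCounter()
for _ in range(1000):
    counter.record_view("dQw4w9WgXcQ")
print(counter.display_count("dQw4w9WgXcQ"))  # 1000, served from the buffer
counter.flush()
print(counter.durable["dQw4w9WgXcQ"])        # 1000, landed in one batch
```

Displayed counts lag durable truth by at most one flush interval, which is an acceptable trade for collapsing millions of writes per second into a handful of batched updates.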
7. Recommendation Engine
YouTube's recommendation engine is the highest-value component of the platform — it drives 70%+ of views. The architecture follows a classic two-stage retrieval + ranking pipeline used by Netflix, Spotify, and all major recommendation systems.
Stage 1: Candidate Retrieval
The corpus has billions of videos; you cannot rank all of them. Retrieval narrows the field to a few hundred candidates in <50ms using:
- Collaborative filtering: "Users who watched X also watched Y." Pre-computed item-item similarity matrices stored in Redis/Memcached. Fast lookup: O(1) per video.
- Embedding-based ANN search: User watch history is encoded into a 256-dim embedding. FAISS or ScaNN retrieves the top-K nearest video embeddings from the corpus. Latency: ~5ms on GPU indexes.
- Content-based filtering: If the user watched a cooking video, retrieve other videos with similar tags, channel, or topic clusters. Cold-start friendly.
- Trending & contextual signals: Globally trending videos, time-of-day signals (news in morning, entertainment in evening), and geography-adjusted trending.
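The embedding-retrieval item above reduces to nearest-neighbor search. A brute-force stand-in shows the logic: embed the user, score every video by cosine similarity, keep the top K. FAISS or ScaNN replace the linear scan with an approximate index at billion-video scale; the toy embeddings below are invented for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(user_emb, video_embs, k):
    # Exact (brute-force) nearest neighbors; ANN indexes approximate this.
    ranked = sorted(video_embs,
                    key=lambda vid: cosine(user_emb, video_embs[vid]),
                    reverse=True)
    return ranked[:k]

videos = {
    "cooking-101": [0.9, 0.1, 0.0],
    "pasta-night": [0.8, 0.2, 0.1],
    "gpu-review":  [0.0, 0.1, 0.9],
}
user = [0.85, 0.15, 0.05]  # watch history leaning heavily toward cooking
print(retrieve_top_k(user, videos, k=2))  # -> ['cooking-101', 'pasta-night']
```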
Stage 2: Ranking
The ~500 candidates from retrieval are passed to the ranking model, which scores each one using a deep neural network. Features include: user-video affinity (predicted watch percentage), video freshness, creator quality score, expected watch time, CTR calibration (prevent clickbait), and diversity penalty (avoid 10 cooking videos in a row). The ranker outputs a final ordered list in <100ms. YouTube's ranker is trained on billions of examples and optimized for a mix of watch time and user satisfaction signals.
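Of the ranking features listed, the diversity penalty is the easiest to make concrete: greedily pick the next video by base score minus a penalty for each already-picked video on the same topic. The weights and the 0.15 penalty here are invented for illustration; production rankers learn these trade-offs rather than hand-tuning them.

```python
def rank_with_diversity(candidates, penalty=0.15):
    remaining = list(candidates)
    picked = []
    while remaining:
        def adjusted(c):
            # Penalize each repeat of an already-selected topic.
            same_topic = sum(1 for p in picked if p["topic"] == c["topic"])
            return c["score"] - penalty * same_topic
        best = max(remaining, key=adjusted)
        picked.append(best)
        remaining.remove(best)
    return [c["id"] for c in picked]

candidates = [
    {"id": "cook-a", "topic": "cooking", "score": 0.95},
    {"id": "cook-b", "topic": "cooking", "score": 0.90},
    {"id": "tech-a", "topic": "tech",    "score": 0.85},
]
# By raw score the order would be cook-a, cook-b, tech-a; the penalty
# lifts tech-a above the second cooking video.
print(rank_with_diversity(candidates))  # -> ['cook-a', 'tech-a', 'cook-b']
```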
8. Live Streaming vs VOD: Key Differences
Live streaming (Twitch, YouTube Live) and Video on Demand (YouTube VOD) share the CDN and player infrastructure but diverge significantly in the ingestion and transcoding layers.
VOD (Pre-recorded)
- Upload → transcode → publish (async, minutes to hours)
- Can optimize codec per title (per-title encoding)
- Pre-warm CDN cache before release
- Seekable anywhere in the video
- Latency to viewer: milliseconds (buffered)
Live Streaming
- RTMP/SRT ingest → real-time transcode → segment push
- Latency is a design constraint (2–30s depending on use case)
- Cannot pre-warm cache; origin shield bears initial spike
- DVR functionality: retain last N minutes as seekable segments
- Latency to viewer: 2–30s (HLS) or ~1s (WebRTC for ultra-low)
9. Search & Discovery
Search on a video platform must handle entity recognition (creator names, show titles), typo tolerance, and relevance ranking that factors in engagement signals — not just text match.
Indexing Pipeline
When a video is published, a Kafka consumer triggers the search indexing pipeline: video metadata (title, description, tags, transcript from automated speech recognition) is sent to Elasticsearch. The index maintains inverted indexes for text search and dense vector fields (via Elasticsearch's kNN support) for semantic search. Relevance ranking is a learned-to-rank model that uses query-video text similarity + view count + engagement rate + recency as features, trained on click-through data.
10. Cost Optimization Strategies
At YouTube scale, a 1% reduction in storage or bandwidth cost saves tens of millions of dollars annually. The major levers:
- AV1 adoption: AV1 provides ~30% bandwidth savings over VP9 and ~50% over H.264. As device support grows (90%+ of new devices by 2026), shifting traffic from H.264 to AV1 is the single biggest bandwidth cost reduction lever.
- Cold storage tiering: Videos with <100 views/month are migrated to S3 Glacier or GCS Archive (90% cheaper than standard tiers). 80% of YouTube's video catalog falls in this "long tail" category.
- Spot/preemptible instances for transcoding: Transcoding is fault-tolerant (checkpointed by segment) and can run on spot instances. Cost reduction: 70% vs on-demand.
- Deduplicate identical uploads: Content fingerprinting (perceptual hash) detects re-uploads of the same video. Serve from existing transcoded assets instead of re-transcoding.
- Adaptive segment duration: Use longer segments (10s) for long-tail videos (reduces manifest request overhead) and shorter segments (2s) for live/sports (reduces latency).
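The cold-storage lever can be expressed as a simple routing policy: classify each video's transcoded assets by recent demand. The thresholds and tier names below are hypothetical; a real policy would also weigh retrieval latency and per-request restore costs of archive tiers.

```python
def storage_tier(views_last_30d):
    # Hypothetical thresholds mapping demand to a storage class.
    if views_last_30d >= 10_000:
        return "hot"      # standard tier, kept CDN-adjacent
    if views_last_30d >= 100:
        return "warm"     # standard object storage
    return "archive"      # Glacier / GCS Archive: roughly 90% cheaper

print(storage_tier(1_000_000))  # -> hot
print(storage_tier(500))        # -> warm
print(storage_tier(3))          # -> archive
```

Run periodically as a batch job over view aggregates, this is what moves the 80% long tail onto storage that costs a tenth as much.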
11. Capacity Estimation & Conclusion
For a system design interview, demonstrate structured estimation:
Back-of-Envelope: Storage
- 500 hours of video uploaded per minute = 30,000 hours of content per hour = 720,000 hours/day
- 1 hour of uploaded source video ≈ 10 GB (a compressed 4K upload at roughly 22 Mbps)
- After transcoding the full ladder (6 rungs in up to 3 codecs; lower rungs are much smaller): ~30 GB per hour of content
- New storage: 30,000 × 30 GB = 900 TB/hour ≈ 21.6 PB/day
- After 10 years: on the order of 80 EB before optimization; cold tiering, deduplication, and discarding raw masters shrink the effective footprint substantially, but the platform is firmly exabyte-scale
Back-of-Envelope: Bandwidth
- 1 billion watch-hours daily ÷ 24 hours = ~42 million average concurrent viewers
- Average bitrate per viewer: 3 Mbps (mix of resolutions)
- Total egress: 42M × 3 Mbps = ~125 Tbps average; peak ≈ 2× = ~250 Tbps (consistent with the 200+ Tbps in the scale table)
- CDN cache hit ratio 99% → origin sees only 1% ≈ 2.5 Tbps at peak
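The arithmetic is worth checking in code; note that 1 billion watch-hours per day spread across the 24 hours of a day gives roughly 42 million average concurrent viewers, and the peak-to-average factor of 2 is an assumption.

```python
# Back-of-envelope bandwidth check for the stated assumptions.
HOURS_WATCHED_PER_DAY = 1e9
AVG_BITRATE_BPS = 3e6        # 3 Mbps average across the resolution mix
CDN_HIT_RATIO = 0.99

# Concurrency: total watch-hours spread over a 24-hour day.
avg_concurrent = HOURS_WATCHED_PER_DAY / 24
avg_egress_tbps = avg_concurrent * AVG_BITRATE_BPS / 1e12
peak_egress_tbps = 2 * avg_egress_tbps            # assume peak = 2x average
origin_tbps = peak_egress_tbps * (1 - CDN_HIT_RATIO)

print(round(avg_concurrent / 1e6, 1))  # 41.7 million average concurrent viewers
print(round(avg_egress_tbps))          # 125 Tbps average egress
print(round(peak_egress_tbps))         # 250 Tbps peak egress
print(round(origin_tbps, 1))           # 2.5 Tbps reaching origin on misses
```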
A production-grade video streaming platform is a masterclass in distributed systems. The key insight is that read optimization dominates: the CDN, the ABR player, and the recommendation engine all exist to serve a read-heavy workload with minimum latency and cost. The upload and transcoding systems are comparatively straightforward engineering challenges; the hard part is serving tens of millions of concurrent viewers at <200ms buffer start time across 200 countries.