YouTube is a video streaming platform where users upload, stream, and discover video content. With 2.5 billion monthly active users and 500+ hours of video uploaded every minute, designing YouTube is a classic system design interview question that tests your understanding of video processing, storage optimization, and content delivery at massive scale.
This walkthrough follows the Interview Framework and focuses on what you'd actually present in a 45-60 minute interview.
Users should be able to upload videos - Content creators upload videos in various formats
Users should be able to stream videos - Viewers watch videos with adaptive quality based on bandwidth
Users should be able to search for videos - Find videos by title, description, and tags
Users should be able to interact with videos - Like, comment, and subscribe to channels
Keep the scope tight. In an interview, explicitly defer features like live streaming, recommendations, and monetization unless asked. These are entire systems on their own.
Scale: 2.5 billion monthly active users, 500M daily active users
Availability: 99.99% uptime (a budget of roughly 52 minutes of downtime per year) - users expect YouTube to always work
Latency: First byte within ~200ms at the edge; playback starts within ~1-2s for cached content
Consistency: Eventual consistency is acceptable - a new upload doesn't need to be instantly visible to all users
Key insight for the interviewer: YouTube is extremely read-heavy. The upload:view ratio is approximately 1:300. This heavily influences our design decisions around caching and CDN strategy.
Let's establish the scale we're designing for:
Traffic:
500M daily active users
Each user watches ~5 videos/day = 2.5B video views/day
2.5B views / 86,400 seconds = ~29K video streams/second
Uploads:
500 hours of video uploaded per minute
Average video length: 5 minutes
Videos per minute: (500 * 60) / 5 = 6,000 videos/minute = ~100 uploads/second
Storage:
Source upload (already compressed): ~600 MB per 5-minute video
Encoded renditions (360p, 480p, 720p, 1080p): ~250 MB total per video on average
4K + AV1 are generated only for a subset of videos, which increases storage per title
Daily new storage (encoded renditions only): 6,000 videos/min × 1,440 min/day × 250 MB = ~2.2 PB/day
Storage grows linearly and never stops. The design needs a strategy for migrating very old, rarely watched content out of hot storage tiers and into cold storage.
Bandwidth:
Streaming: Assume average bitrate of 5 Mbps (720p)
Peak concurrent streams: ~50M users (10% of DAU)
Outbound bandwidth: 50M * 5 Mbps = 250 Tbps for streaming alone
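The arithmetic above can be checked with a few lines of Python. This is only a back-of-envelope sketch: the variable names are illustrative, the inputs are the assumptions already stated in this section, and decimal units are assumed (1 PB = 1e9 MB, 1 Tbps = 1e6 Mbps).

```python
# Back-of-envelope capacity estimate using the assumptions stated above.
SECONDS_PER_DAY = 86_400
MINUTES_PER_DAY = 1_440

daily_active_users = 500_000_000
videos_watched_per_user = 5
upload_hours_per_minute = 500
avg_video_minutes = 5
encoded_mb_per_video = 250
avg_bitrate_mbps = 5
peak_concurrent_streams = 50_000_000

# Traffic: views per day spread evenly over the day.
views_per_day = daily_active_users * videos_watched_per_user
streams_per_second = views_per_day / SECONDS_PER_DAY

# Uploads: hours of video per minute converted to a video count.
videos_per_minute = upload_hours_per_minute * 60 / avg_video_minutes
uploads_per_second = videos_per_minute / 60

# Storage: encoded renditions only, in decimal petabytes (1 PB = 1e9 MB).
storage_pb_per_day = videos_per_minute * MINUTES_PER_DAY * encoded_mb_per_video / 1e9

# Bandwidth: peak concurrent streams at the average bitrate (1 Tbps = 1e6 Mbps).
peak_bandwidth_tbps = peak_concurrent_streams * avg_bitrate_mbps / 1e6

print(f"{streams_per_second:,.0f} video streams started per second")
print(f"{uploads_per_second:,.0f} uploads per second")
print(f"{storage_pb_per_day:.2f} PB of new encoded video per day")
print(f"{peak_bandwidth_tbps:,.0f} Tbps peak outbound bandwidth")
```

Keeping the inputs as named variables makes it easy to rerun the estimate if the interviewer changes an assumption, such as the average bitrate.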
Video
├── video_id (PK)
├── user_id (FK)
├── title
├── description
├── tags[]
├── upload_timestamp
├── duration_seconds
├── status (processing, published, failed)
├── source_url (original upload in blob storage)
├── manifest_url (DASH/HLS manifest)
├── thumbnail_url
├── view_count
├── like_count
└── dislike_count
User
├── user_id (PK)
├── email
├── username
├── channel_name
├── subscriber_count
└── created_at
Comment
├── comment_id (PK)
├── video_id (FK)
├── user_id (FK)
├── content
├── timestamp
└── parent_comment_id (for replies)
YouTube uses 11-character IDs drawn from the URL-safe Base64 alphabet (e.g., dQw4w9WgXcQ). With 64 possible characters per position, 11 characters allow up to 64^11 ≈ 73 quintillion unique IDs.
Options:
Random generation: Generate random 11-char string, check for collision
Counter + Base64: Similar to Snowflake, encode timestamp + machine ID + sequence
UUID shortened: Generate UUID, Base64 encode, truncate
Unlike Twitter's Snowflake IDs, YouTube IDs don't need to be time-sortable since videos are queried by creation timestamp, not ID order. Random IDs work fine and are simpler.
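The random-generation option could look roughly like the sketch below. The function name and the in-memory `existing_ids` set are assumptions for illustration; in a real system the collision check would more likely be a uniqueness constraint in the metadata database.

```python
import secrets

def generate_video_id(existing_ids: set) -> str:
    """Return a random 11-character URL-safe Base64 ID that is not already in use."""
    while True:
        # 8 random bytes (64 bits) encode to 11 URL-safe Base64 characters
        # once padding is stripped, so this sketch draws from 2^64 values
        # rather than the full 64^11 space.
        candidate = secrets.token_urlsafe(8)
        if candidate not in existing_ids:
            # existing_ids is a hypothetical stand-in for a database lookup.
            existing_ids.add(candidate)
            return candidate

issued = set()
print(generate_video_id(issued))
```

With an ID space this large, collisions are rare enough that the retry loop almost never runs more than once.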
| Data Type | Storage Solution | Rationale |
|---|---|---|
| Video files | Object storage (S3/GCS) | Designed for large binary files, highly durable |
| Video metadata | SQL (MySQL/PostgreSQL) | Structured data, ACID for ownership/permissions |
| Thumbnails | Object storage + metadata in Bigtable | Blob storage for images, fast lookup for metadata |
| User sessions | Redis | Fast lookups, can tolerate data loss |
| Search index | Elasticsearch | Full-text search on titles, descriptions, tags |
Interview insight: Mention that YouTube uses Vitess (MySQL sharding middleware) to scale their relational database. This shows awareness of real-world solutions beyond generic "just shard it" answers.
We'll use REST for simplicity. In practice, YouTube uses gRPC internally for service-to-service communication.
Large file uploads require a two-step resumable protocol:
Step 1: Initiate upload
POST /api/v1/videos/upload
Headers: Authorization: Bearer <token>
Request Body:
{
  "title": "My Video",
  "description": "Description here",
  "tags": ["tech", "tutorial"],
  "file_size": 524288000,
  "privacy": "public" | "private" | "unlisted"
}
Response: 200 OK
{
  "video_id": "dQw4w9WgXcQ",
  "upload_url": "https://upload.youtube.com/v1/upload/dQw4w9WgXcQ"
}
Step 2: Upload chunks
PUT {upload_url}
Headers: Content-Range: bytes 0-5242879/524288000
Body: <binary chunk>
Response: 308 Resume Incomplete (or 200 OK when complete)
Resumable uploads are essential at scale. Users on flaky connections can resume from the last successful chunk. The upload service tracks progress and triggers transcoding only when all chunks are received.
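As a rough illustration of the chunking arithmetic, the sketch below computes the Content-Range value for each chunk and the byte offset a client should resume from. Both function names are hypothetical, and the 5 MiB chunk size is simply the one implied by the example header above.

```python
from typing import Iterator, List, Tuple

def content_range_headers(file_size: int, chunk_size: int) -> Iterator[str]:
    """Yield the Content-Range header value for each chunk of a file."""
    for start in range(0, file_size, chunk_size):
        end = min(start + chunk_size, file_size) - 1  # byte ranges are inclusive
        yield f"bytes {start}-{end}/{file_size}"

def resume_offset(received_ranges: List[Tuple[int, int]]) -> int:
    """Return the first byte not yet received contiguously from byte 0."""
    offset = 0
    for start, end in sorted(received_ranges):
        if start > offset:
            break  # gap found: everything after it must be re-sent
        offset = max(offset, end + 1)
    return offset

headers = list(content_range_headers(524_288_000, 5 * 1024 * 1024))
print(len(headers), headers[0])
```

A server that tracks received ranges this way can tell a reconnecting client exactly where to continue, and can trigger transcoding once the offset equals the file size.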
GET /api/v1/videos/{video_id}/status
Response: 200 OK
{
  "video_id": "dQw4w9WgXcQ",
  "status": "processing" | "published" | "failed",
  "progress": 75,
  "available_resolutions": ["360p", "480p"] // Partial availability
}
Clients poll this endpoint or subscribe to webhooks to know when a video is ready.
GET /api/v1/videos/{video_id}/stream
Headers: Authorization: Bearer <token> (optional)
Query Parameters:
Response: 200 OK
{
  "manifest_url": "https://cdn.youtube.com/.../manifest.mpd",
  "available_resolutions": ["360p", "480p", "720p", "1080p"],
  "duration": 300
}
The client uses the manifest URL to fetch video chunks via DASH or HLS adaptive streaming protocols. Seeking is done by requesting the segment that covers the target timestamp.
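Mapping a seek position to a segment is a single division when segments have a fixed length. The sketch below assumes fixed-length segments, and its function name is hypothetical; real manifests may list variable segment durations, in which case the client would scan or binary-search the segment list instead.

```python
def segment_for_timestamp(seek_seconds: float,
                          segment_duration: float,
                          video_duration: float) -> int:
    """Return the zero-based index of the segment covering seek_seconds."""
    if not 0 <= seek_seconds < video_duration:
        raise ValueError("seek position is outside the video")
    # Floor division finds how many whole segments precede the seek position.
    return int(seek_seconds // segment_duration)

# Seeking to 2:05 in a 300-second video with 4-second segments.
print(segment_for_timestamp(125, 4, 300))
```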
GET /api/v1/search?q={query}&cursor={page_token}&limit={limit}
Response: 200 OK
{
  "results": [
    {
      "video_id": "abc123",
      "title": "Matching Video",
      "thumbnail_url": "...",
      "channel_name": "Creator",
      "view_count": 1000000,
      "duration": 300,
      "upload_date": "2024-01-15"
    }
  ],
  "next_page_token": "..."
}
Use cursor-based pagination for search results. Offset-based pagination (page=5) is expensive at scale because the database must scan and skip all previous rows. Cursors (opaque tokens encoding the last result's position) allow efficient range queries.
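A minimal sketch of an opaque cursor, assuming results are kept sorted by a key of negated score plus video ID. The function names and the in-memory `rows` list are illustrative stand-ins; a search backend would apply the same idea as an indexed range query.

```python
import base64
import bisect
import json

def encode_cursor(sort_key: list) -> str:
    """Pack the last result's sort key into an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps(sort_key).encode()).decode()

def decode_cursor(token: str) -> list:
    """Recover the sort key from a token produced by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(token.encode()))

def fetch_page(rows, cursor=None, limit=2):
    """Return (page, next_page_token) from rows sorted ascending.

    Each row is [negated_score, video_id]; the token is None on the last page.
    """
    start = 0
    if cursor is not None:
        # Jump straight past the last row already returned, like an indexed
        # range query, instead of counting and skipping earlier rows.
        start = bisect.bisect_right(rows, decode_cursor(cursor))
    page = rows[start:start + limit]
    has_more = start + limit < len(rows)
    token = encode_cursor(page[-1]) if page and has_more else None
    return page, token

rows = [[-900, "a"], [-500, "b"], [-500, "c"], [-10, "d"], [-1, "e"]]
page, token = fetch_page(rows)
print(page, token is not None)
```

Including the video ID in the sort key breaks ties between equal scores, so a page boundary never splits or repeats results.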
[Architecture diagram. Layers and components: Clients (Web Browser, Mobile App, Smart TV); Edge Layer (CDN / Edge Cache, Load Balancer, API Gateway); Application Services (Upload Service, Streaming Service, Search Service, User Service); Video Processing (Message Queue, Transcoding Workers, Thumbnail Generator); Data Stores (Metadata DB, Blob Storage, Search Index, Redis Cache). Labeled flows: API + video requests, API requests, video cache miss, raw video, enqueue job, encoded videos, thumbnails, update status, get metadata, cache miss, video URL.]
Client initiates upload: Sends metadata first, receives a video_id and resumable upload URL
Chunked upload: Client uploads video in chunks (enables resume on failure)
Raw storage: Video stored in temporary blob storage
Queue processing job: Message sent to transcoding queue
Transcoding: Workers convert video to multiple resolutions (360p, 480p, 720p, 1080p, and 4K for a subset of videos) and codecs (H.264, VP9, and AV1 for a subset)
Thumbnail generation: Extract frames or accept user-uploaded thumbnails
Update metadata: Mark video as "published", store URLs for each resolution
CDN push (optional): For predicted popular videos, proactively push to CDN edge nodes
Why per-segment encoding? Videos are split into 4-10 second segments, each encoded independently. This enables:
Parallel processing across many workers
Adaptive bitrate streaming (switch quality mid-video)
Earlier availability (playback can start before the full transcode finishes)
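Per-segment encoding can be sketched as a fan-out of independent jobs. In the sketch below, `encode_segment` is a hypothetical stand-in for a real encoder invocation, the 4-second segment length and rendition list are taken from this walkthrough, and a thread pool stands in for a fleet of transcoding workers.

```python
from concurrent.futures import ThreadPoolExecutor

RENDITIONS = ["360p", "480p", "720p", "1080p"]

def plan_segments(duration_s: float, segment_s: float = 4.0):
    """Split a video into (start, end) segment boundaries in seconds."""
    bounds = []
    start = 0.0
    while start < duration_s:
        bounds.append((start, min(start + segment_s, duration_s)))
        start += segment_s
    return bounds

def encode_segment(job):
    """Hypothetical encoder stub: returns the output name it would produce."""
    index, (start, end), rendition = job
    return f"seg{index}_{rendition}.mp4"

def transcode(duration_s: float, max_workers: int = 8):
    """Encode every (segment, rendition) pair independently and in parallel."""
    jobs = [(i, seg, r)
            for i, seg in enumerate(plan_segments(duration_s))
            for r in RENDITIONS]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves job order even though jobs run concurrently.
        return list(pool.map(encode_segment, jobs))

print(len(transcode(300)), "encode jobs for a 5-minute video")
```

Because no job depends on another, a failed segment can be retried on its own without restarting the whole video.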
Client requests video: API returns a manifest file (DASH .mpd or HLS .m3u8)
Manifest describes chunks: Lists URLs for each segment at each quality level
Adaptive bitrate: Client monitors bandwidth and requests appropriate quality chunks
CDN serves chunks: Most chunks served from edge cache, cache miss goes to origin
Manifest Example (simplified to JSON for readability; real DASH manifests are XML and HLS playlists are plain text):
{
  "duration": 300,
  "segments": [
    {
      "start": 0,
      "duration": 4,
      "qualities": {
        "360p": "https://cdn.youtube.com/video123/seg0_360p.mp4",
        "720p": "https://cdn.youtube.com/video123/seg0_720p.mp4",
        "1080p": "https://cdn.youtube.com/video123/seg0_1080p.mp4"
      }
    },
    // ... more segments
  ]
}
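The adaptive bitrate step in the streaming flow could be approximated on the client as shown here. This is a simplified throughput-based rule, not any particular player's algorithm; the bitrate ladder values and the 0.8 safety factor are assumptions chosen for illustration.

```python
# Hypothetical bitrate ladder in Mbps; real ladders vary per title and codec.
BITRATE_LADDER = [("360p", 1.0), ("480p", 2.5), ("720p", 5.0), ("1080p", 8.0)]

def pick_rendition(measured_mbps: float, safety_factor: float = 0.8) -> str:
    """Pick the highest rendition whose bitrate fits within a safety margin."""
    budget = measured_mbps * safety_factor
    chosen = BITRATE_LADDER[0][0]  # always fall back to the lowest rung
    for name, bitrate in BITRATE_LADDER:
        if bitrate <= budget:
            chosen = name
    return chosen

# Re-evaluated before each segment request as throughput estimates change.
for mbps in (0.5, 4, 7, 20):
    print(mbps, "Mbps ->", pick_rendition(mbps))
```

Production players typically also consider buffer level, so that a brief throughput dip does not immediately force a quality drop.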
Index on upload: When a video is published, extract metadata (title, description, tags, auto-generated captions)
Inverted index: Elasticsearch maintains mapping from keywords to video IDs
Ranking factors: Relevance score + view count + recency + user engagement
Query flow: Search service queries Elasticsearch, enriches results with metadata from cache/DB
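To make the inverted index idea concrete, here is a toy in-memory version. It is not Elasticsearch's API: the class and method names are hypothetical, tokenization is a plain whitespace split, and ranking is omitted entirely.

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index mapping each term to the videos that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of video IDs

    def add(self, video_id: str, text: str) -> None:
        """Index a video's searchable text (title, description, tags)."""
        for term in text.lower().split():
            self.postings[term].add(video_id)

    def search(self, query: str) -> set:
        """Return IDs of videos containing every query term (AND semantics)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

index = InvertedIndex()
index.add("v1", "System design interview tutorial")
index.add("v2", "Cooking tutorial for beginners")
print(index.search("tutorial"), index.search("design tutorial"))
```

A real search service would then score the matching IDs using the ranking factors listed above before returning a page of results.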
Low Latency (~200ms TTFB, ~1-2s startup)
Global CDN: Deploy edge servers in 100+ locations worldwide
Predictive caching: Push popular content to edge before requests arrive
Connection reuse: HTTP/2 or QUIC for faster connection establishment
Segment pre-fetch: Client fetches next segment while current one plays
High Availability (99.99% uptime)
Multi-region deployment: Active-active across 3+ regions
Graceful degradation: If high-res transcoding fails, serve lower resolutions that completed
Circuit breakers: Isolate failing services to prevent cascade
Data replication: 3x replication for blob storage, sync within a region + async cross-region for metadata
Massive Scale (250 Tbps bandwidth)
Tiered caching:
L1: ISP-level cache (Google Global Cache)
L2: Regional CDN PoPs
L3: Origin data centers
Storage tiering:
Hot: Popular videos on SSD-backed storage
Warm: Moderate traffic on HDD
Cold: Rarely accessed videos on tape/archive
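A tiering policy can be as simple as a threshold on recent views. The function name and both thresholds below are made up for illustration; a real policy would be tuned against measured access patterns and storage costs.

```python
def storage_tier(views_last_30_days: int,
                 hot_min_views: int = 10_000,
                 warm_min_views: int = 100) -> str:
    """Map recent view volume to a storage tier name (thresholds are assumed)."""
    if views_last_30_days >= hot_min_views:
        return "hot"   # SSD-backed storage
    if views_last_30_days >= warm_min_views:
        return "warm"  # HDD
    return "cold"      # tape/archive

for views in (250_000, 1_200, 3):
    print(views, "->", storage_tier(views))
```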
CDN Strategy: Build vs. Buy
| Approach | Pros | Cons |
|---|---|---|
| Public CDN (Akamai, Cloudflare) | Quick to deploy, global coverage | Expensive at YouTube's scale, less control |
| Private CDN (Google's approach) | Optimized for video, cost-effective at scale | Huge upfront investment, complex operations |
YouTube uses a hybrid: their own infrastructure + partnerships with ISPs (Google Global Cache boxes installed at ISP data centers).
Database: SQL vs. NoSQL
Video metadata: SQL (Vitess-sharded MySQL) - structured data, strong consistency for ownership/permissions
Comments: NoSQL (Cassandra) - high write volume, eventually consistent is fine
View counts: Redis + async flush to SQL - high throughput, approximate counts acceptable
User sessions: Redis - ephemeral, speed is priority
Handling Viral Videos
When a video suddenly goes viral:
Real-time popularity detection: Monitor view velocity
Automatic CDN promotion: Push to more edge locations
Origin shielding: Aggregate cache misses at regional level before hitting origin
Rate limiting: Protect origin from thundering herd
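Popularity detection by view velocity can be sketched with a sliding window of timestamps. The class name, window length, and threshold are assumptions; at real scale this would more likely run as a streaming aggregation than as one in-memory deque per video.

```python
from collections import deque

class ViewVelocityMonitor:
    """Flags a video as trending when views within a sliding window reach a threshold."""

    def __init__(self, window_seconds: float = 60.0, threshold: int = 1000):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.timestamps = deque()

    def record_view(self, now: float) -> bool:
        """Record one view at time `now` (in seconds); return True if trending."""
        self.timestamps.append(now)
        # Drop views that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.threshold

monitor = ViewVelocityMonitor(window_seconds=10, threshold=3)
print([monitor.record_view(t) for t in (0, 1, 2, 20)])
```

A True result is the signal that would trigger CDN promotion for that video.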
Assume roughly 10% of uploads are duplicates. At 6,000 videos/minute, that's about 600 duplicate uploads per minute, wasting storage and, when content is re-uploaded without permission, infringing copyright.
Solutions:
Content fingerprinting: Hash video frames, compare against database (Content ID system)
Perceptual hashing: Detect near-duplicates (slightly modified videos)
Audio fingerprinting: Catch re-uploads with different video but same audio
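Perceptual hashing can be illustrated with an average hash, one of the simplest variants. This is a generic technique and not a description of how Content ID works; the function names and the distance threshold are assumptions, and frames are assumed to be already downscaled to a small grayscale grid.

```python
def average_hash(pixels) -> int:
    """Hash a small grayscale frame: one bit per pixel, set if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")

def is_near_duplicate(frame_a, frame_b, max_distance: int = 5) -> bool:
    """Treat frames as near-duplicates when their hashes differ in few bits."""
    return hamming_distance(average_hash(frame_a), average_hash(frame_b)) <= max_distance

original = [[0, 255], [255, 0]]
re_encoded = [[10, 245], [250, 5]]  # slightly altered pixel values
print(is_near_duplicate(original, re_encoded))
```

Unlike a cryptographic hash, small pixel changes from re-encoding leave the bit pattern mostly intact, which is what lets near-duplicates match.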
View counts must be:
Accurate: No inflated counts from bots
Real-time enough: Creators expect to see views increase
Scalable: Handle millions of increments per second
Solution:
Write increments to Redis (fast)
Batch flush to database every few seconds
Apply bot detection (rate limiting, behavioral analysis)
Show "approximate" counts for very recent videos
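The buffer-then-flush pattern can be sketched without any external services. Here two in-memory counters stand in for Redis and the SQL database, and the class and method names are hypothetical; bot detection is left out.

```python
from collections import Counter

class ViewCounter:
    """Buffers view increments and applies them to a durable store in batches."""

    def __init__(self):
        self.pending = Counter()  # stand-in for fast Redis counters
        self.durable = Counter()  # stand-in for the SQL view_count column

    def record_view(self, video_id: str) -> None:
        """Cheap write path: bump the buffered counter only."""
        self.pending[video_id] += 1

    def flush(self) -> int:
        """Apply buffered increments as one batch; return how many videos changed."""
        batch, self.pending = self.pending, Counter()
        self.durable.update(batch)
        return len(batch)

    def approximate_count(self, video_id: str) -> int:
        """Durable count plus whatever has not been flushed yet."""
        return self.durable[video_id] + self.pending[video_id]

counter = ViewCounter()
for _ in range(3):
    counter.record_view("dQw4w9WgXcQ")
print(counter.approximate_count("dQw4w9WgXcQ"), counter.flush())
```

Batching turns many small increments into one database write per video per flush interval, at the cost of losing unflushed increments if the buffer is lost.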
Ignoring the upload:view ratio - This is roughly 1:300. Size your read path (caching, CDN, replicas) for about 300x the request volume of your write path.
Forgetting video processing time - Raw uploads need transcoding. This can take minutes to hours. Design for asynchronous processing with status updates.
Underestimating bandwidth costs - At YouTube's scale, bandwidth is the primary cost driver, not storage. CDN and ISP partnerships are critical.
Not considering mobile networks - Many users are on 3G/4G with variable bandwidth. Adaptive bitrate streaming is essential, not optional.
Treating all videos equally - A small fraction of videos accounts for most views (as a rule of thumb, roughly 90% of views come from 10% of videos). Your caching and storage tiering strategy must account for this power law distribution.
Before concluding, verify you've covered:
Upload flow with resumable uploads and async processing
Streaming with adaptive bitrate (DASH/HLS)
CDN strategy for global low-latency delivery
Storage tiering for cost optimization
Video transcoding pipeline (multiple resolutions/codecs)
Search with inverted index
Handling viral videos / thundering herd
View count accuracy and bot prevention
Trade-off discussion (consistency vs. availability)
| Aspect | Decision | Rationale |
|---|---|---|
| Upload | Resumable chunked upload | Handle large files over unreliable connections |
| Processing | Async transcoding via message queue | Decouple upload from encoding, parallel processing |
| Storage | Blob storage + SQL + Elasticsearch | Right tool for each data type |
| Streaming | Adaptive bitrate (DASH/HLS) via CDN | Adjust quality to bandwidth in real-time |
| CDN | Multi-tier: ISP → Regional → Origin | Minimize distance to users, reduce origin load |
| Database | Vitess (sharded MySQL) + Redis | Scale relational data, sub-ms cache reads |
| Consistency | Eventual consistency | Acceptable for views/likes, prioritize availability |