
Design YouTube

System Design | Onsite | Phone | Software Engineer | Reported Feb 2026

YouTube is a video streaming platform where users upload, stream, and discover video content. With 2.5 billion monthly active users and 500+ hours of video uploaded every minute, designing YouTube is a classic system design interview question that tests your understanding of video processing, storage optimization, and content delivery at massive scale.

This walkthrough follows the Interview Framework and focuses on what you'd actually present in a 45-60 minute interview.

Phase 1: Requirements

Functional Requirements

Users should be able to upload videos - Content creators upload videos in various formats

Users should be able to stream videos - Viewers watch videos with adaptive quality based on bandwidth

Users should be able to search for videos - Find videos by title, description, and tags

Users should be able to interact with videos - Like, comment, and subscribe to channels

Keep the scope tight. In an interview, explicitly defer features like live streaming, recommendations, and monetization unless asked. These are entire systems on their own.

Non-Functional Requirements

Scale: 2.5 billion monthly active users, 500M daily active users

Availability: 99.99% uptime - users expect YouTube to always work

Latency: First byte within ~200ms at the edge; playback starts within ~1-2s for cached content

Consistency: Eventual consistency is acceptable - a new upload doesn't need to be instantly visible to all users

Key insight for the interviewer: YouTube is extremely read-heavy. The upload:view ratio is approximately 1:300. This heavily influences our design decisions around caching and CDN strategy.

Capacity Estimation

Let's establish the scale we're designing for:

Traffic:

500M daily active users

Each user watches ~5 videos/day = 2.5B video views/day

2.5B views / 86,400 seconds = ~29K video starts/second on average

Uploads:

500 hours of video uploaded per minute

Average video length: 5 minutes

Videos per minute: (500 * 60) / 5 = 6,000 videos/minute = ~100 uploads/second

Storage:

Source upload (already compressed): ~600 MB per 5-minute video

Encoded renditions (360p, 480p, 720p, 1080p): ~250 MB total per video on average

4K + AV1 are generated only for a subset of videos, which increases storage per title

Daily new storage (encoded renditions only): 6,000 videos/min × 1,440 min/day × 250 MB = ~2.2 PB/day

Storage grows linearly and never stops. YouTube must have a strategy for cold storage migration and potentially removing very old, rarely-watched content from hot storage tiers.

Bandwidth:

Streaming: Assume average bitrate of 5 Mbps (720p)

Peak concurrent streams: ~50M users (10% of DAU)

Outbound bandwidth: 50M * 5 Mbps = 250 Tbps for streaming alone
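
The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, the inputs are the assumptions stated in this section, and units are decimal (1 PB = 10^9 MB, 1 Tbps = 10^6 Mbps).

```python
# Back-of-envelope capacity math using the assumptions stated in this section.
SECONDS_PER_DAY = 86_400
MINUTES_PER_DAY = 1_440

daily_active_users = 500_000_000
videos_watched_per_user = 5
upload_hours_per_minute = 500
avg_video_minutes = 5
encoded_mb_per_video = 250        # 360p-1080p renditions combined
peak_concurrent_fraction = 0.10   # 10% of DAU streaming at peak
avg_bitrate_mbps = 5

# Traffic
views_per_day = daily_active_users * videos_watched_per_user
views_per_second = views_per_day / SECONDS_PER_DAY

# Uploads
videos_per_minute = upload_hours_per_minute * 60 / avg_video_minutes
uploads_per_second = videos_per_minute / 60

# Storage (encoded renditions only), converted from MB to PB
storage_pb_per_day = videos_per_minute * MINUTES_PER_DAY * encoded_mb_per_video / 1e9

# Bandwidth, converted from Mbps to Tbps
peak_streams = daily_active_users * peak_concurrent_fraction
egress_tbps = peak_streams * avg_bitrate_mbps / 1e6

print(f"views/second:   {views_per_second:,.0f}")
print(f"uploads/second: {uploads_per_second:,.0f}")
print(f"views per upload: {views_per_second / uploads_per_second:,.0f}")
print(f"new storage:    {storage_pb_per_day:.2f} PB/day")
print(f"peak egress:    {egress_tbps:,.0f} Tbps")
```

The views-per-upload figure (about 290) is where the roughly 1:300 upload:view ratio comes from.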

Phase 2: Data Model

Core Entities

Video

├── video_id (PK)
├── user_id (FK)
├── title
├── description
├── tags[]
├── upload_timestamp
├── duration_seconds
├── status (processing, published, failed)
├── source_url (original upload in blob storage)
├── manifest_url (DASH/HLS manifest)
├── thumbnail_url
├── view_count
├── like_count
└── dislike_count

User

├── user_id (PK)
├── email
├── username
├── channel_name
├── subscriber_count
└── created_at

Comment

├── comment_id (PK)
├── video_id (FK)
├── user_id (FK)
├── content
├── timestamp
└── parent_comment_id (for replies)

Video ID Generation

YouTube video IDs are 11 characters drawn from a URL-safe Base64 alphabet (A-Z, a-z, 0-9, -, _), e.g., dQw4w9WgXcQ. With 64 possible characters per position, this allows up to 64^11 ≈ 7.4 × 10^19 (about 73 quintillion) unique IDs.

Options:

Random generation: Generate random 11-char string, check for collision

Counter + Base64: Similar to Snowflake, encode timestamp + machine ID + sequence

UUID shortened: Generate UUID, Base64 encode, truncate

Unlike Twitter's Snowflake IDs, YouTube IDs don't need to be time-sortable since videos are queried by creation timestamp, not ID order. Random IDs work fine and are simpler.
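
A minimal sketch of the random-generation option, not YouTube's actual scheme. It encodes 64 random bits as URL-safe Base64, which yields 11 characters once padding is stripped. Note that this gives 2^64 possible IDs rather than the full 64^11, because the last character carries only 4 bits. The `existing_ids` set is a hypothetical stand-in for a uniqueness check against the metadata store.

```python
import base64
import secrets


def new_video_id(existing_ids: set[str]) -> str:
    """Return an unused 11-character URL-safe Base64 ID built from 64 random bits."""
    while True:
        raw = secrets.token_bytes(8)  # 64 random bits
        # 8 bytes encode to 12 Base64 characters, the last being '=' padding.
        candidate = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
        # Stand-in for a unique-key check in the metadata DB; retry on collision.
        if candidate not in existing_ids:
            existing_ids.add(candidate)
            return candidate


if __name__ == "__main__":
    print(new_video_id(set()))
```

With a 64-bit space, collisions are rare enough that the retry loop almost never runs, but the check keeps the ID a safe primary key.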

Storage Strategy

Data Type | Storage Solution | Rationale
Video files | Object storage (S3/GCS) | Designed for large binary files, highly durable
Video metadata | SQL (MySQL/PostgreSQL) | Structured data, ACID for ownership/permissions
Thumbnails | Object storage + metadata in Bigtable | Blob storage for images, fast lookup for metadata
User sessions | Redis | Fast lookups, can tolerate data loss
Search index | Elasticsearch | Full-text search on titles, descriptions, tags

Interview insight: Mention that YouTube uses Vitess (MySQL sharding middleware) to scale their relational database. This shows awareness of real-world solutions beyond generic "just shard it" answers.

Phase 3: API Design

We'll use REST for simplicity. In practice, YouTube uses gRPC internally for service-to-service communication.

Upload Video (Resumable)

Large file uploads require a two-step resumable protocol:

Step 1: Initiate upload

POST /api/v1/videos/upload

Headers: Authorization: Bearer <token>

Request Body:

{
  "title": "My Video",
  "description": "Description here",
  "tags": ["tech", "tutorial"],
  "file_size": 524288000,
  "privacy": "public"
}

privacy is one of "public", "private", or "unlisted".

Response: 200 OK

{
  "video_id": "dQw4w9WgXcQ",
  "upload_url": "https://upload.youtube.com/v1/upload/dQw4w9WgXcQ"
}

Step 2: Upload chunks

PUT {upload_url}

Headers: Content-Range: bytes 0-5242879/524288000

Body: <binary chunk>

Response: 308 Resume Incomplete (or 200 OK when complete)

Resumable uploads are essential at scale. Users on flaky connections can resume from the last successful chunk. The upload service tracks progress and triggers transcoding only when all chunks are received.
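
A minimal in-memory sketch of how the upload service might track progress for one session. The class and method names are hypothetical. The `Range: bytes=0-N` header on the 308 response is an assumption modeled on Google-style resumable upload APIs, and a real service would write chunks to blob storage rather than keep them in memory.

```python
import re

CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+)")


class UploadSession:
    """Tracks the contiguous prefix of bytes received for one resumable upload."""

    def __init__(self, total_size: int) -> None:
        self.total_size = total_size
        self.received = 0              # bytes stored so far (always a prefix)
        self.chunks: list[bytes] = []  # stand-in for temporary blob storage

    def put_chunk(self, content_range: str, body: bytes) -> tuple[int, dict[str, str]]:
        """Handle one PUT and return (status_code, response_headers)."""
        match = CONTENT_RANGE.fullmatch(content_range)
        if match is None:
            return 400, {}
        start, end, total = (int(g) for g in match.groups())
        if (total != self.total_size or start > end or end >= total
                or end - start + 1 != len(body)):
            return 400, {}
        if start != self.received:
            # Gap or repeated chunk: store nothing, tell the client where to resume.
            return 308, self._range_header()
        self.chunks.append(body)
        self.received = end + 1
        if self.received == self.total_size:
            return 200, {}             # complete: safe to enqueue the transcoding job
        return 308, self._range_header()

    def _range_header(self) -> dict[str, str]:
        if self.received == 0:
            return {}
        return {"Range": f"bytes=0-{self.received - 1}"}
```

A client that loses its connection can send an empty status request, read the `Range` header, and continue from the next byte instead of restarting the upload.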

Check Video Status

GET /api/v1/videos/{video_id}/status

Response: 200 OK

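{
  "video_id": "dQw4w9WgXcQ",
  "status": "processing",
  "progress": 75,
  "available_resolutions": ["360p", "480p"]
}

status is one of "processing", "published", or "failed". available_resolutions lists the renditions that have finished transcoding so far, so it can be partial while status is "processing".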

Clients poll this endpoint or subscribe to webhooks to know when a video is ready.

Stream Video

GET /api/v1/videos/{video_id}/stream

Headers: Authorization: Bearer <token> (optional)

Query Parameters:

  • resolution: 360p | 480p | 720p | 1080p | 4k (optional, auto-selected)

Response: 200 OK

{
  "manifest_url": "https://cdn.youtube.com/.../manifest.mpd",
  "available_resolutions": ["360p", "480p", "720p", "1080p"],
  "duration": 300
}

The client uses the manifest URL to fetch video chunks via DASH or HLS adaptive streaming protocols. Seeking is done by requesting the segment that covers the target timestamp.

Search Videos

GET /api/v1/search?q={query}&cursor={page_token}&limit={limit}

Response: 200 OK

{
  "results": [
    {
      "video_id": "abc123",
      "title": "Matching Video",
      "thumbnail_url": "...",
      "channel_name": "Creator",
      "view_count": 1000000,
      "duration": 300,
      "upload_date": "2024-01-15"
    }
  ],
  "next_page_token": "..."
}

Use cursor-based pagination for search results. Offset-based pagination (page=5) is expensive at scale: the backend must fetch and discard every preceding result on each request. Cursors (opaque tokens encoding the last result's sort position) let the next page start with an efficient range query.
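
A minimal sketch of an opaque cursor, assuming results are ordered by score (descending) with video_id as a tie-breaker. The function names are hypothetical, and the list filter stands in for the range query a real search backend would run.

```python
import base64
import json


def encode_cursor(score: float, video_id: str) -> str:
    """Pack the last result's sort position into an opaque token."""
    payload = json.dumps([score, video_id]).encode()
    return base64.urlsafe_b64encode(payload).decode()


def decode_cursor(token: str) -> tuple[float, str]:
    score, video_id = json.loads(base64.urlsafe_b64decode(token))
    return score, video_id


def search_page(hits, cursor=None, limit=2):
    """hits: (score, video_id) pairs in any order. Returns (page, next_page_token)."""
    # Sort by score descending, then video_id ascending as a stable tie-breaker.
    ordered = sorted(hits, key=lambda h: (-h[0], h[1]))
    if cursor is not None:
        last_score, last_id = decode_cursor(cursor)
        # Keep only results that sort strictly after the cursor position.
        ordered = [h for h in ordered if (-h[0], h[1]) > (-last_score, last_id)]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    token = encode_cursor(*page[-1]) if has_more else None
    return page, token
```

Because the token encodes a position rather than a count, results inserted ahead of the cursor between requests do not shift or duplicate later pages.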

Phase 4: High-Level Design

Architecture Diagram

[Figure: architecture diagram]

Clients: Web Browser, Mobile App, Smart TV

Edge Layer: CDN / Edge Cache, Load Balancer, API Gateway

Application Services: Upload Service, Streaming Service, Search Service, User Service

Video Processing: Message Queue, Transcoding Workers, Thumbnail Generator

Data Stores: Metadata DB, Blob Storage, Search Index, Redis Cache

Flows shown: API + video requests, API requests, video cache miss, raw video, enqueue job, encoded videos, thumbnails, update status, get metadata, cache miss, video URL

Video Upload Flow

Client initiates upload: Sends metadata first, receives a video_id and resumable upload URL

Chunked upload: Client uploads video in chunks (enables resume on failure)

Raw storage: Video stored in temporary blob storage

Queue processing job: Message sent to transcoding queue

Transcoding: Workers convert video to multiple resolutions (360p, 480p, 720p, 1080p, 4K) and formats (H.264, VP9, AV1)

Thumbnail generation: Extract frames or accept user-uploaded thumbnails

Update metadata: Mark video as "published", store URLs for each resolution

CDN push (optional): For predicted popular videos, proactively push to CDN edge nodes

Why per-segment encoding? Videos are split into 4-10 second segments, each encoded independently. This enables:

Parallel processing across many workers

Adaptive bitrate streaming (switch quality mid-video)

Faster time-to-publish (early segments and lower renditions can be served before the full transcode finishes)
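
A minimal sketch of how per-segment encoding fans out into independent jobs, one per (segment, rendition) pair. The `TranscodeJob` type and `plan_jobs` function are hypothetical, and the fixed-length split is a simplification: real encoders cut at keyframe boundaries.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TranscodeJob:
    video_id: str
    segment_index: int
    start_s: float
    duration_s: float
    rendition: str


def plan_jobs(video_id, duration_s, renditions, segment_s=4.0):
    """Return one independent job per (segment, rendition) pair."""
    segment_count = math.ceil(duration_s / segment_s)
    starts = [i * segment_s for i in range(segment_count)]
    return [
        # The final segment may be shorter than segment_s.
        TranscodeJob(video_id, i, start, min(segment_s, duration_s - start), r)
        for (i, start), r in product(enumerate(starts), renditions)
    ]


if __name__ == "__main__":
    for job in plan_jobs("video123", 10, ["360p", "720p"]):
        print(job)
```

Each job can be placed on the message queue separately, so a 5-minute video at 4-second segments and four renditions becomes 300 small jobs that workers process in parallel.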

Video Streaming Flow

Client requests video: API returns a manifest file (DASH .mpd or HLS .m3u8)

Manifest describes chunks: Lists URLs for each segment at each quality level

Adaptive bitrate: Client monitors bandwidth and requests appropriate quality chunks

CDN serves chunks: Most chunks served from edge cache, cache miss goes to origin

Manifest Example (simplified):

{
  "duration": 300,
  "segments": [
    {
      "start": 0,
      "duration": 4,
      "qualities": {
        "360p": "https://cdn.youtube.com/video123/seg0_360p.mp4",
        "720p": "https://cdn.youtube.com/video123/seg0_720p.mp4",
        "1080p": "https://cdn.youtube.com/video123/seg0_1080p.mp4"
      }
    },
    // ... more segments
  ]
}
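
A minimal sketch of client-side segment selection against a manifest shaped like the simplified example above. The bitrate ladder and the 0.8 safety factor are illustrative assumptions, not YouTube's values. Production players also weigh buffer level and recent throughput history.

```python
# Assumed bitrates per rendition in Mbps; real ladders vary by codec and content.
BITRATE_MBPS = {"360p": 1.0, "480p": 2.5, "720p": 5.0, "1080p": 8.0}


def pick_quality(available, measured_mbps, safety=0.8):
    """Highest rendition whose bitrate fits within a safety fraction of throughput."""
    budget = measured_mbps * safety
    ladder = sorted(available, key=BITRATE_MBPS.__getitem__)
    fitting = [q for q in ladder if BITRATE_MBPS[q] <= budget]
    # Fall back to the lowest rendition when nothing fits the budget.
    return fitting[-1] if fitting else ladder[0]


def segment_for(manifest, seek_s):
    """Segment whose [start, start + duration) interval covers the seek position."""
    for seg in manifest["segments"]:
        if seg["start"] <= seek_s < seg["start"] + seg["duration"]:
            return seg
    raise ValueError("seek position is outside the video")


def next_url(manifest, seek_s, measured_mbps):
    """URL of the segment to fetch for a playback position and measured bandwidth."""
    seg = segment_for(manifest, seek_s)
    return seg["qualities"][pick_quality(seg["qualities"], measured_mbps)]
```

The player re-runs the selection before each segment request, which is how quality can change mid-video without restarting playback.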

Search Architecture

Index on upload: When a video is published, extract metadata (title, description, tags, auto-generated captions)

Inverted index: Elasticsearch maintains mapping from keywords to video IDs

Ranking factors: Relevance score + view count + recency + user engagement

Query flow: Search service queries Elasticsearch, enriches results with metadata from cache/DB
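
A toy inverted index that illustrates the indexing and query steps above. It is a sketch, not Elasticsearch's API or scoring: the class name, the term-count relevance score, and the 0.1 weight on log view count are all illustrative assumptions.

```python
import math
import re
from collections import defaultdict


class SearchIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of video_ids
        self.view_counts = {}

    def add(self, video_id, title, description, tags, view_count=0):
        """Index a published video's text fields."""
        text = " ".join([title, description, *tags]).lower()
        for term in re.findall(r"[a-z0-9]+", text):
            self.postings[term].add(video_id)
        self.view_counts[video_id] = view_count

    def search(self, query, limit=10):
        """Return video_ids ranked by matched terms plus a small popularity boost."""
        scores = defaultdict(float)
        for term in re.findall(r"[a-z0-9]+", query.lower()):
            for video_id in self.postings.get(term, ()):
                scores[video_id] += 1.0  # relevance: number of matching query terms
        ranked = sorted(
            scores,
            key=lambda v: (scores[v] + 0.1 * math.log10(1 + self.view_counts[v]), v),
            reverse=True,
        )
        return ranked[:limit]
```

The search service would then enrich the returned IDs with titles, thumbnails, and counts from the cache or metadata DB.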

Phase 5: Deep Dives

Addressing Non-Functional Requirements

Low Latency (~200ms TTFB, ~1-2s startup)

Global CDN: Deploy edge servers in 100+ locations worldwide

Predictive caching: Push popular content to edge before requests arrive

Connection reuse: HTTP/2 or QUIC for faster connection establishment

Segment pre-fetch: Client fetches next segment while current one plays

High Availability (99.99% uptime)

Multi-region deployment: Active-active across 3+ regions

Graceful degradation: If high-res transcoding fails, serve lower resolutions that completed

Circuit breakers: Isolate failing services to prevent cascade

Data replication: 3x replication for blob storage, sync within a region + async cross-region for metadata

Massive Scale (250 Tbps bandwidth)

Tiered caching:

L1: ISP-level cache (Google Global Cache)

L2: Regional CDN PoPs

L3: Origin data centers

Storage tiering:

Hot: Popular videos on SSD-backed storage

Warm: Moderate traffic on HDD

Cold: Rarely accessed videos on tape/archive

Key Design Decisions

CDN Strategy: Build vs. Buy

Approach | Pros | Cons
Public CDN (Akamai, Cloudflare) | Quick to deploy, global coverage | Expensive at YouTube's scale, less control
Private CDN (Google's approach) | Optimized for video, cost-effective at scale | Huge upfront investment, complex operations

YouTube uses a hybrid: their own infrastructure + partnerships with ISPs (Google Global Cache boxes installed at ISP data centers).

Database: SQL vs. NoSQL

Video metadata: SQL (Vitess-sharded MySQL) - structured data, strong consistency for ownership/permissions

Comments: NoSQL (Cassandra) - high write volume, eventually consistent is fine

View counts: Redis + async flush to SQL - high throughput, approximate counts acceptable

User sessions: Redis - ephemeral, speed is priority

Handling Viral Videos

When a video suddenly goes viral:

Real-time popularity detection: Monitor view velocity

Automatic CDN promotion: Push to more edge locations

Origin shielding: Aggregate cache misses at regional level before hitting origin

Rate limiting: Protect origin from thundering herd
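
A minimal sketch of the popularity-detection step: count views in a sliding time window and flag the video once the count crosses a threshold. The class name, window length, and threshold are illustrative assumptions. At real scale this would run in a stream processor over sampled events rather than hold every timestamp in memory.

```python
from collections import deque


class ViewVelocity:
    """Counts views in a sliding time window and flags a video above a threshold."""

    def __init__(self, window_s=60.0, viral_threshold=1000):
        self.window_s = window_s
        self.viral_threshold = viral_threshold
        self.events = deque()  # view timestamps in seconds, oldest first

    def record(self, now_s):
        self.events.append(now_s)
        self._evict(now_s)

    def is_viral(self, now_s):
        self._evict(now_s)
        return len(self.events) >= self.viral_threshold

    def _evict(self, now_s):
        # Drop timestamps that have fallen out of the window.
        while self.events and self.events[0] <= now_s - self.window_s:
            self.events.popleft()
```

When `is_viral` flips to true, the system can trigger CDN promotion for that video's segments.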

Duplicate Video Detection

Assume roughly 10% of uploads are duplicates. At 6,000 videos/minute, that's about 600 duplicate uploads per minute, which waste storage and often infringe copyright.

Solutions:

Content fingerprinting: Hash video frames, compare against database (Content ID system)

Perceptual hashing: Detect near-duplicates (slightly modified videos)

Audio fingerprinting: Catch re-uploads with different video but same audio
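
A minimal sketch of perceptual hashing using the average-hash technique on a single downscaled grayscale frame. This is not how Content ID works internally. The function names and the distance threshold are illustrative, and a real pipeline would hash many frames per video and search the hashes with an index.

```python
def average_hash(gray):
    """gray: 2D list of 0-255 luminance values for a downscaled frame (e.g. 8x8)."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        # One bit per pixel: 1 if brighter than the frame's mean, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(frame_a, frame_b, max_distance=2):
    return hamming(average_hash(frame_a), average_hash(frame_b)) <= max_distance
```

Because each bit depends only on whether a pixel is above the frame's mean, uniform brightness changes leave the hash unchanged, which an exact byte hash would not tolerate.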

View Count Accuracy

View counts must be:

Accurate: No inflated counts from bots

Real-time enough: Creators expect to see views increase

Scalable: Handle millions of increments per second

Solution:

Write increments to Redis (fast)

Batch flush to database every few seconds

Apply bot detection (rate limiting, behavioral analysis)

Show "approximate" counts for very recent videos
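
A minimal sketch of the buffer-and-flush pattern above, with in-memory counters standing in for Redis and the SQL table. The class name and the per-viewer dedupe window are illustrative assumptions. Real bot detection is far more involved than ignoring rapid repeat views.

```python
from collections import Counter


class ViewCounter:
    def __init__(self, dedupe_window_s=30.0):
        self.pending = Counter()   # stand-in for Redis counters
        self.durable = Counter()   # stand-in for the SQL view_count column
        self.last_seen = {}        # (video_id, viewer_id) -> last counted timestamp
        self.dedupe_window_s = dedupe_window_s

    def record_view(self, video_id, viewer_id, now_s):
        """Buffer one view; returns False if it was ignored as a rapid repeat."""
        key = (video_id, viewer_id)
        last = self.last_seen.get(key)
        if last is not None and now_s - last < self.dedupe_window_s:
            return False
        self.last_seen[key] = now_s
        self.pending[video_id] += 1
        return True

    def flush(self):
        """Apply buffered increments in one batch; intended to run every few seconds."""
        batch, self.pending = self.pending, Counter()
        self.durable.update(batch)
        return dict(batch)

    def displayed_count(self, video_id):
        # Durable count plus not-yet-flushed increments gives a near-real-time figure.
        return self.durable[video_id] + self.pending[video_id]
```

Batching turns many per-view increments into one database write per video per flush interval, at the cost of losing unflushed counts if the buffer fails.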

Common Pitfalls

Ignoring the upload:view ratio - This is roughly 1:300 (~100 uploads/s vs. ~29K views/s). Provision the read path (CDN, caches, metadata reads) for about 300x the request rate of the write path.

Forgetting video processing time - Raw uploads need transcoding. This can take minutes to hours. Design for asynchronous processing with status updates.

Underestimating bandwidth costs - At YouTube's scale, bandwidth is the primary cost driver, not storage. CDN and ISP partnerships are critical.

Not considering mobile networks - Many users are on 3G/4G with variable bandwidth. Adaptive bitrate streaming is essential, not optional.

Treating all videos equally - Views follow a power-law distribution: as a rule of thumb, ~90% of views come from ~10% of videos. Your caching and storage tiering strategy must account for this.

Interview Checklist

Before concluding, verify you've covered:

Upload flow with resumable uploads and async processing

Streaming with adaptive bitrate (DASH/HLS)

CDN strategy for global low-latency delivery

Storage tiering for cost optimization

Video transcoding pipeline (multiple resolutions/codecs)

Search with inverted index

Handling viral videos / thundering herd

View count accuracy and bot prevention

Trade-off discussion (consistency vs. availability)

Summary

Aspect | Decision | Rationale
Upload | Resumable chunked upload | Handle large files over unreliable connections
Processing | Async transcoding via message queue | Decouple upload from encoding, parallel processing
Storage | Blob storage + SQL + Elasticsearch | Right tool for each data type
Streaming | Adaptive bitrate (DASH/HLS) via CDN | Adjust quality to bandwidth in real-time
CDN | Multi-tier: ISP → Regional → Origin | Minimize distance to users, reduce origin load
Database | Vitess (sharded MySQL) + Redis | Scale relational data, sub-ms cache reads
Consistency | Eventual consistency | Acceptable for views/likes, prioritize availability
