Video Streaming Platform (YouTube / Netflix / Hotstar)
Chapter 1 — Product Requirements, Scale, and Design Targets
This chapter defines what kind of video platform we are building and the physical limits it must survive.
Everything that follows in this book is constrained by these numbers.
We are designing a global video streaming platform in the class of YouTube, Netflix, and Amazon Prime Video that supports:
- User-generated uploads
- Studio-grade content
- On-demand playback
- Live streaming
- Offline viewing
- Multi-device continuity
The system must feel instant, reliable, and smooth for hundreds of millions of users.
1. Functional Requirements
The platform must support the following core user actions:
Content creators
- Upload raw video files of arbitrary length and size
- See upload progress and failure recovery
- Have videos transcoded into multiple qualities
- Publish videos to be watchable by viewers
Viewers
- Discover and open a video
- Start playback in under 2 seconds
- Seek, pause, and change quality without visible glitches
- Continue watching the same video on another device
- Download videos for offline playback
- Watch live streams with minimal delay
Platform
- Track watch time, views, and engagement
- Recommend content
- Enforce regional, subscription, and DRM rules
- Protect against piracy and abuse
2. Non-Functional Requirements
These are the invisible constraints that shape the architecture.
Latency
- Time-to-first-frame: < 2 seconds for most users
- Seek latency: < 500 ms
- Live stream delay: < 5 seconds from broadcaster to viewer
Reliability
- A CDN edge failure must not stop playback
- Analytics outages must not stop playback
- Backend outages should only block new playback starts, not active streams
Consistency
- Resume position can be eventually consistent
- View counts can be delayed
- DRM enforcement must be strongly consistent
Scalability
- Must support global viral traffic spikes
- One video can be watched by tens of millions simultaneously
3. Traffic Model
We design for a YouTube-scale service.
Users
- 300 million daily active users
- 50 million concurrent viewers at peak
Playback
- Average session: 30 minutes
- Average bitrate: 3 Mbps
- Peak bitrate: 15–25 Mbps (4K)
This means peak outbound traffic can exceed:
50M users × 3 Mbps = 150 Tbps
This immediately tells us:
No backend service can ever sit in the video data path.
Only CDNs can handle this scale.
4. Upload Model
Creators upload far fewer videos than viewers watch.
- 10 million uploads per day
- Average file size: 1–3 GB
- Sustained upload throughput: ~2 Tbps globally, with higher regional peaks (see the back-of-the-envelope sketch below)
Uploads are heavy but not latency-sensitive.
They can be queued, retried, and processed asynchronously.
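A quick back-of-the-envelope check makes the upload figure concrete (a sketch; the 2 GB average and a uniform arrival rate are simplifying assumptions):

```typescript
// Rough sustained upload throughput implied by the numbers above.
const uploadsPerDay = 10_000_000;
const avgFileBytes = 2 * 1e9;      // ~2 GB, midpoint of the 1-3 GB range
const secondsPerDay = 86_400;

const sustainedBitsPerSecond = (uploadsPerDay * avgFileBytes * 8) / secondsPerDay;
console.log(`${(sustainedBitsPerSecond / 1e12).toFixed(1)} Tbps sustained`); // ~1.9 Tbps
// Regional evening peaks and viral events push this several times higher.
```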
5. Storage Model
We store multiple versions of every video.
If a 1-hour video is transcoded into:
- 4K
- 1080p
- 720p
- 480p
- 360p
And segmented into 4-second chunks, a single video produces thousands of objects.
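A rough count makes "thousands of objects" concrete (a sketch; audio renditions, subtitles, and manifests are treated as a small constant overhead):

```typescript
// Objects produced by a single 1-hour upload with the ladder above.
const durationSeconds = 3600;
const segmentSeconds = 4;
const videoRenditions = 5;          // 4K, 1080p, 720p, 480p, 360p

const segmentsPerRendition = durationSeconds / segmentSeconds;  // 900
const videoSegments = segmentsPerRendition * videoRenditions;   // 4,500
console.log(`~${videoSegments} video segments per hour of content, plus audio, subtitles, and manifests`);
```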
At YouTube scale:
- Exabytes of cold storage
- Petabytes of hot CDN cache
This forces us to use:
- Cheap object storage (S3-like)
- Aggressive CDN caching
- Versioned immutable files
6. Design Targets
These numbers lock in the architecture.
| Constraint | Consequence |
|---|---|
| 150+ Tbps video traffic | Video must flow only through CDNs |
| Millions of concurrent users | Backend must be stateless & horizontally scalable |
| Billions of video segments | Storage must be object-based, not filesystem-based |
| UI must never freeze | Player must run off the main thread |
| Analytics can lag | Events must be async via Kafka-style logs |
These constraints will force:
- A two-plane architecture (control vs data)
- A frontend-driven control loop
- A CDN-first delivery model
End of Chapter 1.
Chapter 2 — Global Platform Architecture
This chapter defines the full system at 30,000 feet before we dive into any single pipeline.
Every service, database, CDN, and client lives inside this picture.
The most important idea is this:
Video bytes and playback control must never flow through the same systems.
This is the single architectural rule that allows the platform to scale to hundreds of millions of users.
1. The Two-Plane Architecture
The platform is split into two planes:
- Control Plane handles:
  - Authentication
  - Authorization
  - Metadata
  - Manifests
  - DRM
  - Analytics events
- Data Plane handles:
  - Video bytes
  - Audio bytes
  - Subtitle bytes
  - Segment delivery
The control plane is backend-heavy.
The data plane is CDN-heavy.
2. High-Level System Diagram
┌─────────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Web App │ │ Mobile App │ │ Smart TV │ │
│ │ (React/Vue) │ │ (iOS/Android)│ │ App │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼──────────────────┼──────────────────┼──────────────────────────┘
│ │ │
└──────────────────┼──────────────────┘
│
┌────────▼────────┐
│ API Gateway │
│ (Rate Limiting,│
│ Auth, Routing)│
└────────┬────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌──────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐
│ Video │ │ Metadata │ │ User │
│ Upload │ │ Service │ │ Service │
│ Service │ │ │ │ │
└──────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
│ │ │
┌──────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐
│ Transcode │ │ Comment │ │ Recommend. │
│ Service │ │ Service │ │ Service │
│ (Queue) │ │ │ │ (ML) │
└──────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
│ │ │
┌─────────▼─────────────────▼──────────────────▼─────────────┐
│ DATA LAYER │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ SQL │ │ NoSQL │ │ Object │ │ Cache │ │
│ │ (RDS) │ │(Cassandra│ │ Storage │ │ (Redis) │ │
│ │ │ │/DynamoDB)│ │ (S3) │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ CDN LAYER │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ CDN Edge│ │ CDN Edge│ │ CDN Edge│ │ CDN Edge│ │
│ │ (US) │ │ (EU) │ │ (APAC) │ │ (Others)│ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ BACKGROUND JOBS │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Thumbnail │ │ View │ │Analytics │ │ CDN │ │
│ │Generator │ │ Counter │ │Processor │ │ Warmer │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
The backend never streams video.
It only gives the client permission and coordinates where to get it.
3. Why This Architecture Exists
If even 1% of video traffic hit the backend:
150 Tbps × 1% = 1.5 Tbps
No database, API layer, or VPC can survive that.
So:
- Backend gives URLs
- CDN gives bytes
This separation makes:
- Video cheap
- Latency low
- Scaling trivial
4. Where the Frontend Fits
The frontend is not “just a UI”.
It is the playback brain.
It decides:
- Which quality to use
- When to prefetch
- When to pause
- When to seek
- When to retry
The backend only provides:
- The map (manifest)
- The rules (DRM, region, quality caps)
This makes the platform:
- Highly available
- Resistant to partial failures
- Cheap to operate
5. Failure Boundaries
This architecture enforces strong blast-radius isolation.
If this fails:
| Component | What breaks |
|---|---|
| CDN edge | Player switches to another edge |
| Metadata DB | New playback may fail |
| Analytics | No metrics, playback continues |
| Kafka | Data piles up, playback continues |
Playback is protected by design.
End of Chapter 2.
Chapter 3 — The Ingestion & Transcoding Pipeline (Merged Logic)
To handle millions of hours of uploads globally, the system must treat ingestion as an asynchronous, fault-tolerant factory.
3.1 Resumable Upload Flow
We avoid simple POST requests for large files. Instead, the Frontend utilizes the TUS Protocol or S3 Multipart Upload to ensure reliability.
- Handshake: The client requests a unique `videoId` and a pre-signed `uploadUrl` from the Upload Service.
- Chunking: The frontend client breaks the video file into small, equal-sized chunks (e.g., 5 MB each).
- Transmission: Chunks are sent sequentially or in parallel, each with a checksum. If the connection drops, the client queries the server for the last successful byte offset and resumes from there (see the sketch below).
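A minimal client-side sketch of the resume path, assuming a TUS-style endpoint (the `HEAD`/`PATCH` exchange and `Upload-Offset` header follow the TUS protocol; the chunk size and error handling are illustrative):

```typescript
const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB, matching the chunking step above

async function resumeUpload(uploadUrl: string, file: Blob): Promise<void> {
  // 1. Ask the server how many bytes it has already stored.
  const head = await fetch(uploadUrl, { method: "HEAD", headers: { "Tus-Resumable": "1.0.0" } });
  let offset = Number(head.headers.get("Upload-Offset") ?? 0);

  // 2. Send the remaining chunks from that offset onward.
  while (offset < file.size) {
    const chunk = file.slice(offset, offset + CHUNK_SIZE);
    const res = await fetch(uploadUrl, {
      method: "PATCH",
      headers: {
        "Tus-Resumable": "1.0.0",
        "Content-Type": "application/offset+octet-stream",
        "Upload-Offset": String(offset),
      },
      body: chunk,
    });
    if (!res.ok) throw new Error(`Chunk failed at offset ${offset}`);
    offset += chunk.size;
  }
}
```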
3.2 Transcoding & Processing (The "Refinery")
Once the raw file is stored in Object Storage (S3), an event triggers the Transcoding Service.
The Transcoding Workflow:
- Job Orchestration: A Message Queue (Kafka/SQS) holds transcoding tasks to decouple the upload from processing.
- Parallel Workers: Distributed workers (using FFmpeg) pick up jobs to generate the Quality Ladder:
  - Resolutions: 4K (2160p), 1080p, 720p, 480p, 360p, 240p.
  - Codecs: H.264 (Compatibility), H.265/VP9 (Efficiency).
- Segmenting: The workers break each rendition into 5-10 second segments (`.ts` or `.m4s` files) and generate the HLS/DASH Manifests (`.m3u8` or `.mpd`), as sketched below.
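A sketch of a transcode worker producing one rung of the ladder by shelling out to FFmpeg (the resolution, bitrate, and paths are illustrative; the HLS flags shown are standard FFmpeg options):

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

interface Rendition { name: string; height: number; videoBitrate: string } // illustrative rung

async function transcodeToHls(inputPath: string, outDir: string, r: Rendition): Promise<void> {
  // Scale, encode H.264/AAC, and cut into ~6-second HLS segments with a playlist.
  await run("ffmpeg", [
    "-i", inputPath,
    "-vf", `scale=-2:${r.height}`,
    "-c:v", "libx264", "-b:v", r.videoBitrate,
    "-c:a", "aac", "-b:a", "128k",
    "-f", "hls",
    "-hls_time", "6",
    "-hls_playlist_type", "vod",
    "-hls_segment_filename", `${outDir}/${r.name}_%05d.ts`,
    `${outDir}/${r.name}.m3u8`,
  ]);
}

// One queue job per rendition, e.g.:
// await transcodeToHls("/tmp/raw/video.mp4", "/tmp/out/videoId", { name: "720p", height: 720, videoBitrate: "3000k" });
```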
3.3 Ancillary Background Jobs
- Thumbnail Generation: Extracting keyframes at specific intervals to generate "Preview Sprites" for the frontend seek-bar.
- Content Moderation: Running ML models to scan for spam, copyright violations, or prohibited content.
- CDN Warming: Proactively pushing the newly created manifest and initial segments to edge caches in regions where the creator has a high following.
3.4 Ingestion Architecture (ASCII)
┌─────────┐ ┌────────────────┐ ┌──────────────┐ ┌───────────────┐
│ Creator │ ───> │ Upload Service │ ───> │ Raw S3 Bucket│ ───> │ Message Queue │
└─────────┘ └────────────────┘ └──────────────┘ └───────┬───────┘
│
┌───────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ Metadata DB │ <─── │ Storage (CDN)│ <─── │ Transcode Worker│ <─────┘
└───────────────┘ └──────────────┘ └─────────────────┘
End of Chapter 3
Chapter 4 — The Frontend Player Engine & ABR Logic (The "Spine" Core)
This chapter addresses the "Brain" of the system: the Client-Side Player. We treat the player not as a UI component, but as a resource orchestrator that manages the hardware-software bridge.
4.1 Architecture of a Production-Grade Player
To prevent UI jank, we separate the playback logic from the rendering thread.
- The Controller: Coordinates between the UI, the network, and the hardware buffer.
- The Buffering Engine (MSE): Utilizes Media Source Extensions (MSE) to feed binary video segments into the browser's `<video>` tag (see the sketch below).
- The Decryption Module (EME): Handles Encrypted Media Extensions (EME) for DRM-protected content (Widevine/FairPlay).
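A minimal sketch of the MSE feed path, assuming fMP4 segments and an illustrative codec string (a production engine also handles quota errors, buffer eviction, and codec switches):

```typescript
// Attach a MediaSource to the <video> tag and append downloaded segments.
const video = document.querySelector("video")!;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener("sourceopen", () => {
  // Codec string is illustrative; it must match what the transcoder produced.
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f, mp4a.40.2"');

  const appendSegment = async (url: string): Promise<void> => {
    const bytes = await (await fetch(url)).arrayBuffer();
    // appendBuffer is async; wait for "updateend" before appending the next segment.
    await new Promise<void>((resolve) => {
      sourceBuffer.addEventListener("updateend", () => resolve(), { once: true });
      sourceBuffer.appendBuffer(bytes);
    });
  };

  // Init segment first, then media segments in order (URLs are placeholders).
  appendSegment("/segments/init.mp4").then(() => appendSegment("/segments/seg_00001.m4s"));
});
```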
4.2 Adaptive Bitrate (ABR) Heuristics
The player must decide which quality to download next without human intervention. We use a Hybrid Algorithm:
- Throughput-Based: Measures the download speed of the last few segments.
- Buffer-Based (BBA): Measures how many seconds of video are currently stored in RAM:
  - Safe Zone (30s+): Stay at High Quality.
  - Danger Zone (<10s): Aggressively switch to Low Quality to avoid a "Spinner."
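A sketch of this hybrid rule (the ladder, thresholds, and 80% safety factor are illustrative assumptions):

```typescript
interface Variant { bitrateKbps: number; label: string }

// Quality ladder, ordered from lowest to highest bitrate (illustrative numbers).
const LADDER: Variant[] = [
  { bitrateKbps: 400, label: "240p" },
  { bitrateKbps: 800, label: "360p" },
  { bitrateKbps: 1500, label: "480p" },
  { bitrateKbps: 3000, label: "720p" },
  { bitrateKbps: 6000, label: "1080p" },
  { bitrateKbps: 16000, label: "4K" },
];

function pickVariant(measuredThroughputKbps: number, bufferSeconds: number): Variant {
  // Danger zone: buffer is nearly empty, survival beats quality.
  if (bufferSeconds < 10) return LADDER[0];

  // Throughput rule: only spend ~80% of measured bandwidth as a safety margin.
  const budget = measuredThroughputKbps * 0.8;
  let candidate = LADDER[0];
  for (const v of LADDER) {
    if (v.bitrateKbps <= budget) candidate = v;
  }

  // Safe zone: a deep buffer lets us probe one rung above what throughput suggests.
  if (bufferSeconds > 30) {
    return LADDER[Math.min(LADDER.indexOf(candidate) + 1, LADDER.length - 1)];
  }
  return candidate;
}

// Example: pickVariant(5000, 35) -> 1080p (throughput alone would pick 720p).
```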
4.3 Handling the "Thin Client" vs. "Thick Client"
Staff engineers must account for hardware diversity.
| Feature | Thick Client (Desktop/PS5) | Thin Client (2018 Smart TV) |
|---|---|---|
| Logic | Runs full ABR heuristics locally. | Server dictates the bitrate. |
| Threading | Uses Web Workers for parsing. | Single-threaded, synchronous. |
| Buffering | Large 60s forward buffer. | Minimal 5-8s buffer to avoid RAM crash. |
4.4 The Internal Player State Machine
The player does not just "Play" or "Pause." It transitions through complex states:
- IDLE: Resource allocation.
- LOADING: Fetching the Master Manifest (`.m3u8`).
- STALLED: Buffer empty; UI shows a "spinner," ABR shifts to the lowest bitrate.
- SEEKING: Clearing the current buffer and performing a "Cold Start" at the new timestamp.
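A compact sketch of these transitions (the `PLAYING` state and the allowed-transition table are assumptions added to make the example complete):

```typescript
type PlayerState = "IDLE" | "LOADING" | "PLAYING" | "STALLED" | "SEEKING";

// Which transitions the controller accepts (assumed; real players have more states).
const TRANSITIONS: Record<PlayerState, PlayerState[]> = {
  IDLE: ["LOADING"],
  LOADING: ["PLAYING", "STALLED"],
  PLAYING: ["STALLED", "SEEKING", "IDLE"],
  STALLED: ["PLAYING", "SEEKING"],
  SEEKING: ["PLAYING", "STALLED"],
};

class PlayerStateMachine {
  private state: PlayerState = "IDLE";

  transition(next: PlayerState): void {
    if (!TRANSITIONS[this.state].includes(next)) {
      throw new Error(`Illegal transition ${this.state} -> ${next}`);
    }
    this.state = next;
  }

  get current(): PlayerState {
    return this.state;
  }
}
```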
4.5 Performance Optimization: The "Zero-Latency" Goal
- VTT (Video Thumbnails): Fetching a single "Sprite Sheet" image for the seek-bar rather than individual frames.
- Pre-fetching: Using `<link rel="prefetch">` for the first 3 segments of the "Next Video" in a playlist.
- Request Interleaving: Prioritizing the video chunk download over secondary metadata (like comments or likes) on slow networks.
[ UI: React ] <--- (Events) --- [ Player State Manager ]
^
|
[ Adaptive Bitrate Logic ] <---> [ Segment Downloader ]
|
[ Media Source Extensions ] <----------+
|
v
[ Hardware Decoder ] --> [ Screen ]
End of Chapter 4
Chapter 5 — Metadata DB, Schema, and Discovery (Merged Logic)
While the video bytes live on the CDN, the Metadata Plane handles the "Brain" of the platform: users, subscriptions, and video details. This chapter merges the SQL/NoSQL strategy from the Backend Doc with the Discovery requirements of the Frontend Spine.
5.1 The Data Modeling Strategy
We use a polyglot persistence model to balance ACID transactions (for ownership) with high availability (for views/likes).
Primary Database (PostgreSQL/Spanner)
- Users Table: `userId`, `email`, `channelName`, `subscriptionLevel`.
- Videos Table: `videoId`, `creatorId`, `title`, `description`, `manifestUrl`, `thumbnailUrl`, `status` (Processing/Live/Private).
- Subscriptions: `(followerId, creatorId)` with a composite unique index.
High-Frequency Metadata (Cassandra/BigTable)
- View Counts & Likes: These require massive write-throughput. We use an Eventual Consistency model where counts are buffered in Redis and flushed to Cassandra.
- Comments: Stored in a wide-column store, partitioned by `videoId`.
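A sketch of these record shapes as TypeScript types (field names follow the lists above; the enum values and counter layout are assumptions):

```typescript
type VideoStatus = "Processing" | "Live" | "Private";

// Relational side (PostgreSQL/Spanner): ownership and permissions need ACID.
interface UserRow {
  userId: string;
  email: string;
  channelName: string;
  subscriptionLevel: "free" | "premium";
}

interface VideoRow {
  videoId: string;
  creatorId: string;
  title: string;
  description: string;
  manifestUrl: string;
  thumbnailUrl: string;
  status: VideoStatus;
}

// Wide-column side (Cassandra/BigTable): high write throughput, eventual consistency.
interface EngagementCounters {
  videoId: string;   // partition key
  views: number;     // flushed periodically from Redis
  likes: number;
}
```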
5.2 Discovery & Search Architecture
The Frontend "Home Feed" and "Search Bar" are powered by a specialized indexing layer.
- Search Index (Elasticsearch/OpenSearch):
  - Whenever a video is transcoded, the Metadata Service pushes a document to Elasticsearch.
  - Insight: We use "Fuzzy Matching" and "Autocomplete" to handle typos in the frontend search bar (see the query sketch below).
- Recommendation Engine:
  - Feature Store: Collects user signals (watch time, skipped videos, likes).
  - Ranking Service: A machine learning model that generates a list of `videoId`s for the user's home feed.
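A sketch of the fuzzy search call, assuming a standard Elasticsearch `match` query with `fuzziness: "AUTO"` against a hypothetical `videos` index:

```typescript
// Query the (assumed) "videos" index with typo tolerance.
async function searchVideos(esUrl: string, term: string): Promise<unknown> {
  const res = await fetch(`${esUrl}/videos/_search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: {
        match: {
          title: {
            query: term,          // e.g. "naturre documentry"
            fuzziness: "AUTO",    // tolerate small edit distances
          },
        },
      },
      size: 20,
    }),
  });
  return res.json();
}
```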
5.3 Scalability Trade-offs
| Decision | Choice | Why? |
|---|---|---|
| Video ID | UUID/Snowflake | Prevents predictable-ID scraping and allows distributed generation. |
| Consistency | Eventual | A 1-second delay in "Like" count visibility is better than a system crash during a viral video. |
| Database Sharding | By VideoId | Ensures that metadata for a single viral video doesn't overwhelm a single DB node. |
5.4 The API Handshake (Frontend Fetching)
The Frontend does not "join" tables. It calls a BFF (Backend-for-Frontend) or GraphQL Gateway:
- `GET /v1/video/:id` returns a pre-aggregated JSON object containing video details, creator info, and the HLS manifest URL (see the sketch below).
- Prefetching Logic: When the user hovers over a thumbnail, the frontend pre-warms the Metadata Cache to make the actual click feel instant.
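A sketch of the aggregated response shape and the hover prefetch (the field names, endpoint, and in-memory cache are illustrative assumptions):

```typescript
// Shape the BFF returns for GET /v1/video/:id (illustrative fields).
interface VideoDetailResponse {
  videoId: string;
  title: string;
  creator: { creatorId: string; channelName: string };
  manifestUrl: string;     // signed HLS manifest URL
  thumbnailUrl: string;
  stats: { views: number; likes: number };
}

const metadataCache = new Map<string, Promise<VideoDetailResponse>>();

// Called on thumbnail hover; the later click re-uses the same promise.
function prefetchVideoDetail(videoId: string): Promise<VideoDetailResponse> {
  if (!metadataCache.has(videoId)) {
    metadataCache.set(
      videoId,
      fetch(`/v1/video/${videoId}`).then((r) => r.json() as Promise<VideoDetailResponse>)
    );
  }
  return metadataCache.get(videoId)!;
}
```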
[ Metadata Flow ]
[ Client ] <---(GraphQL/REST)---> [ Metadata Service ]
|
+-------------------------+-------------------------+
| | |
[ PostgreSQL ] [ Redis Cache ] [ Elasticsearch ]
(Users/Permissions) (Hot Metadata) (Video Search)
End of Chapter 5
Chapter 6 — State Management & Multi-Device Resume Sync
In a global platform, "State" exists in three places: the Local UI, the Video Player, and the Cloud. Maintaining a seamless "Continue Watching" experience requires a sophisticated synchronization strategy.
6.1 The State Hierarchy
- Volatile State (UI): Search queries, hover states, menu toggles. Stored in React State / Signals.
- Player State: Current playback timestamp, volume, selected quality. Stored in a specialized Player Controller.
- Persistent State: Watch history, "Resume" points, User preferences. Stored in the Cloud.
6.2 The "Resume-Sync" Pipeline
How does Netflix know you stopped at 12:45 on your TV and show it on your phone instantly?
- Client-Side Heartbeat: The Player Engine emits a "Pulse" event every 5-10 seconds.
- Throttling & Batching: To avoid DDoSing the backend, the frontend batches these pulses; we don't send an API call for every second played.
- The Write-Ahead Log (WAL): The backend receives the pulse and appends it to a high-speed log (Kafka).
- The Sync Store: A high-availability key-value store (Redis/Cassandra) updates `last_watched_pos` for the `userId:videoId` pair.
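A client-side sketch of the heartbeat batching (the `/v1/heartbeat` endpoint and 10-second flush interval are assumptions):

```typescript
interface Pulse { videoId: string; positionSeconds: number; emittedAt: number }

const pending: Pulse[] = [];

// The player calls this frequently; nothing leaves the device yet.
function recordPulse(videoId: string, positionSeconds: number): void {
  pending.push({ videoId, positionSeconds, emittedAt: Date.now() });
}

// One batched POST every 10 seconds instead of one call per pulse.
setInterval(async () => {
  if (pending.length === 0) return;
  const batch = pending.splice(0, pending.length);
  await fetch("/v1/heartbeat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ pulses: batch }),
  });
}, 10_000);
```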
6.3 Handling Conflicts (The Edge Case)
If a user is watching on two devices simultaneously:
- Conflict Resolution: We follow a Last-Write-Wins (LWW) or Max-Timestamp logic.
- Race Conditions: If the user closes the app suddenly, we utilize the `navigator.sendBeacon()` API or a Service Worker to send a "Final Pulse" before the process is killed (see the sketch below).
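A sketch of the final pulse on page teardown (`pagehide` and `navigator.sendBeacon()` are standard browser APIs; the endpoint is an assumption):

```typescript
// Register once per playback session; the videoId and <video> element are assumed inputs.
function registerFinalPulse(videoId: string, video: HTMLVideoElement): void {
  window.addEventListener("pagehide", () => {
    const payload = JSON.stringify({
      videoId,
      positionSeconds: video.currentTime,
      emittedAt: Date.now(),
    });
    // sendBeacon queues the request in the browser so it survives page unload.
    navigator.sendBeacon("/v1/heartbeat/final", new Blob([payload], { type: "application/json" }));
  });
}
```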
6.4 Local State Persistence (Offline Mode)
For the "Partial Offline Download" requirement:
- IndexedDB: We store downloaded video segments and their metadata in the browser's IndexedDB.
- Background Sync: When the user goes back online, a Service Worker triggers a background sync to upload any "Watch History" accumulated while offline.
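A minimal sketch of segment persistence in IndexedDB (the database and store names are illustrative; a real offline player also tracks licenses and expiry):

```typescript
// Open (or create) the offline database with a single "segments" store.
function openSegmentDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("offline-video", 1);
    req.onupgradeneeded = () => {
      req.result.createObjectStore("segments", { keyPath: "key" });
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Persist one downloaded segment under a "videoId:index" key.
async function saveSegment(videoId: string, index: number, bytes: ArrayBuffer): Promise<void> {
  const db = await openSegmentDb();
  await new Promise<void>((resolve, reject) => {
    const tx = db.transaction("segments", "readwrite");
    tx.objectStore("segments").put({ key: `${videoId}:${index}`, bytes });
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```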
6.5 State Management Architecture (ASCII)
[ Device A ] [ Device B ]
| |
(Heartbeat: 10s) (Fetch Resume)
| |
v v
[ API Gateway ] ───> [ Redis / Cassandra ] <── [ Metadata API ]
| (Resume Store)
+───> [ Kafka ] ───> [ Analytics DB ]
End of Chapter 6
Chapter 7 — Global Distribution & CDN Strategy
We recognize that a centralized "Cloud" origin is too slow for video. To meet the sub-2-second time-to-first-frame (TTFF) and sub-500 ms seek targets from Chapter 1, we must move the data as close to the user's ISP as possible using a multi-tiered distribution strategy.
7.1 Multi-Tier CDN Architecture
We do not rely on a single origin. We use a layered approach:
- Origin Server (S3): The source of truth for all transcoded segments.
- Regional Edges: Larger caches that store 80% of popular content within a geographic region (e.g., US-East).
- Local Edge (PoPs): Small, highly distributed servers inside local ISPs. These store the "Top 10%" viral videos to ensure zero-buffering for the most-watched content.
7.2 Cache Invalidation vs. Short TTLs
Video segments are immutable: once `segment_101.ts` is created, it never changes.
- Strategy: We set an infinitely long TTL for video segments.
- The Manifest Problem: Unlike segments, the manifest (`.m3u8`) is dynamic (especially for Live). We use a short TTL (1-2 s) for manifests, or a `Cache-Control: no-cache` strategy, to ensure the player always knows the latest state of the stream (see the header sketch below).
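A sketch of the origin's cache headers under this policy (shown as plain Node `http` helpers; the split into two functions is illustrative):

```typescript
import { ServerResponse } from "node:http";

// Immutable segments: cache "forever" at every tier.
function setSegmentHeaders(res: ServerResponse): void {
  res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
}

// Dynamic manifests (especially live): keep edge caches on a very short leash.
function setManifestHeaders(res: ServerResponse): void {
  res.setHeader("Cache-Control", "public, max-age=2"); // or "no-cache" for strict freshness
}
```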
7.3 Geo-Routing & Request Steering
When a user hits "Play," the system must decide which CDN to use:
- Anycast DNS: Routes the user to the nearest IP address.
- Latency-Based Routing: The Backend Metadata API provides a manifest URL pointing to the CDN with the lowest current latency for that user's specific IP.
7.4 Content Steering (Fault Tolerance)
What if a major CDN provider (like Akamai or Cloudflare) goes down?
- Client-Side Steering: The manifest contains URLs for multiple CDNs. If the frontend player detects a `5xx` error or a timeout from CDN A, it automatically fails over to CDN B without stopping the video (see the sketch below).
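A sketch of that client-side failover loop (the host list, 3-second timeout, and retry order are illustrative; `AbortSignal.timeout` is available in current browsers and Node):

```typescript
// Try each CDN in order until one returns the segment.
const CDN_HOSTS = ["https://cdn-a.example.com", "https://cdn-b.example.com"]; // illustrative

async function fetchSegmentWithFailover(path: string): Promise<ArrayBuffer> {
  for (const host of CDN_HOSTS) {
    try {
      const res = await fetch(`${host}${path}`, { signal: AbortSignal.timeout(3000) });
      if (res.ok) return res.arrayBuffer();
      // 4xx/5xx: fall through and try the next CDN.
    } catch {
      // Network error or timeout: try the next CDN.
    }
  }
  throw new Error(`All CDNs failed for ${path}`);
}
```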
7.5 The "Hot" Video Problem (Thundering Herd)
When a viral video is released, millions of people request the same segment at the same millisecond.
- Request Collapsing: The CDN Edge ensures that if 1,000 requests come in for the same segment, it only sends one request back to the origin, then broadcasts the result to all 1,000 users.
7.6 Distribution Architecture (ASCII)
[ Origin S3 ]
|
+-----> [ Regional Cache (London) ]
| |
| +-----> [ Local PoP (UK ISP) ] ---> [ Viewer A ]
| +-----> [ Local PoP (EU ISP) ] ---> [ Viewer B ]
|
+-----> [ Regional Cache (Mumbai) ]
|
+-----> [ Local PoP (India ISP) ] --> [ Viewer C ]
End of Chapter 7
Chapter 8 — Security, DRM Handshake & Access Control (Merged)
For a video platform, security is more than just an Auth token; it is an end-to-end chain of trust that protects billions of dollars in intellectual property while ensuring seamless user access.
8.1 The Access Control Handshake
We use a decoupled security model where the Backend defines the policy and the CDN enforces it.
- Authentication: Users authenticate via OAuth2/OIDC. The frontend stores a short-lived JWT in a `Secure; HttpOnly` cookie.
- Authorization: When a user clicks "Play," the frontend requests a Signed URL or Cookie from the backend.
- CDN Enforcement: The CDN Edge validates the signature (HMAC) on the request. If the signature is expired or the IP doesn't match, the request is rejected at the edge, saving origin bandwidth.
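A backend-side sketch of the signature (HMAC-SHA256 over the path plus expiry; the query-parameter names and secret handling are illustrative assumptions):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Issue a URL the CDN edge can verify without calling the origin.
function signUrl(path: string, secret: string, ttlSeconds: number): string {
  const expires = Math.floor(Date.now() / 1000) + ttlSeconds;
  const signature = createHmac("sha256", secret).update(`${path}:${expires}`).digest("hex");
  return `${path}?expires=${expires}&sig=${signature}`;
}

// What the edge (or an edge function) checks before serving bytes.
function verifyUrl(path: string, expires: number, sig: string, secret: string): boolean {
  if (expires < Math.floor(Date.now() / 1000)) return false; // link expired
  const expected = createHmac("sha256", secret).update(`${path}:${expires}`).digest("hex");
  return expected.length === sig.length &&
    timingSafeEqual(Buffer.from(expected), Buffer.from(sig));
}
```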
8.2 Digital Rights Management (DRM)
To prevent stream ripping and unauthorized screen recording, we implement a DRM Handshake using the browser's EME (Encrypted Media Extensions).
- The Components:
  - CDM (Content Decryption Module): A sandbox in the browser/OS that handles decryption keys.
  - License Server: A backend service that verifies the user's right to watch and issues a decryption key.
- The Flow (see the sketch below):
  - The player detects encrypted segments in the manifest.
  - The player sends a License Request (containing the device's hardware ID) to our License Server.
  - The server returns an encrypted key.
  - The CDM decrypts the frames directly in GPU memory, ensuring the clear-text video never touches the JavaScript heap.
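A sketch of this handshake using the standard EME API (the Widevine key-system configuration and license endpoint are illustrative; certificate requests and error handling are omitted):

```typescript
async function setupDrm(video: HTMLVideoElement, licenseUrl: string): Promise<void> {
  // 1. Ask the browser for a CDM that supports our stream.
  const access = await navigator.requestMediaKeySystemAccess("com.widevine.alpha", [{
    initDataTypes: ["cenc"],
    videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.64001f"' }],
  }]);
  const mediaKeys = await access.createMediaKeys();
  await video.setMediaKeys(mediaKeys);

  // 2. When the player hits encrypted segments, open a license session.
  video.addEventListener("encrypted", async (event) => {
    const session = mediaKeys.createSession();

    // 3. The CDM produces a license request; forward it to our License Server.
    session.addEventListener("message", async (msg) => {
      const res = await fetch(licenseUrl, { method: "POST", body: msg.message });
      const license = await res.arrayBuffer();
      await session.update(license); // 4. Hand the key back to the CDM.
    });

    await session.generateRequest(event.initDataType, event.initData!);
  });
}
```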
8.3 Protecting the API & Metadata
- Rate Limiting: Using a Leaky Bucket algorithm at the API Gateway to prevent "View Count" manipulation and scraping.
- CORS & CSRF: Strict Origin policies to ensure only our official web/mobile clients can initiate playback.
- Geofencing: Backend checks the user's Geo-IP against the video's distribution rights before issuing a Signed URL.
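A sketch of a leaky-bucket limiter as it might sit at the gateway (the capacity and drain rate are illustrative):

```typescript
// Leaky bucket: requests fill the bucket; it drains at a fixed rate.
class LeakyBucket {
  private level = 0;
  private lastDrain = Date.now();

  constructor(private capacity: number, private drainPerSecond: number) {}

  allow(): boolean {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastDrain) / 1000;
    this.level = Math.max(0, this.level - elapsedSeconds * this.drainPerSecond);
    this.lastDrain = now;

    if (this.level + 1 > this.capacity) return false; // bucket full: reject request
    this.level += 1;
    return true;
  }
}

// Example: roughly 20 requests/second sustained per client, bursts up to 40.
const limiter = new LeakyBucket(40, 20);
```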
8.4 Security Architecture (ASCII)
[ Browser / CDM ] [ API Gateway ] [ License Server ]
| | |
(1) Get Signed URL -------------> | (Verify JWT & Rights) |
| <--- (Signed URL) -------| |
| | |
(2) Request Segments (CDN) | |
| | |
(3) EME License Request ----------------------------------> |
| <--- (Encrypted Key) ----------------------------- |
| | |
(4) Decrypt & Render | |
End of Chapter 8
Chapter 9 — Real-time Engagement & Live Streaming Deep-Dive
Live streaming is the "final boss" of video engineering. It requires shifting from a "pull-based" VOD model to a "push-based" real-time model where the end-to-end delay budget is a few seconds (Chapter 1 targets < 5 s), not the minutes of buffer a VOD player can afford.
9.1 The Live Ingestion Pipeline
Unlike VOD, where we transcode the whole file, Live requires Streaming Transcoding.
- Ingestion (RTMP/SRT): The creator's encoder (like OBS) pushes a continuous stream to our Live Ingest Service.
- Transmuxing: The backend converts the incoming stream into tiny LL-HLS (Low-Latency HLS) or DASH chunks (typically 1-second segments).
- The Live Edge: The CDN must be optimized to never cache the "Manifest" for more than a fraction of a second, ensuring users are always at the "Live Edge."
9.2 Real-time Engagement (Comments & Likes)
To handle viral moments (e.g., a sports final with 10M+ viewers), we cannot use standard polling.
- WebSocket Gateways: Maintain persistent connections for the "Live Chat."
- Pub/Sub (Kafka/Redis): When a comment is posted:
  - The Comment Service writes to a DB.
  - The event is published to a Redis Pub/Sub topic.
  - The WebSocket Gateway "fans out" the message to all connected viewers of that specific `videoId` (see the sketch below).
- Throttling & Sampled Likes: For massive streams, we don't show every single "Like" in real-time. We aggregate and sample at the edge to prevent the UI from becoming a resource hog.
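A sketch of the gateway-side fan-out table (connection bookkeeping only; the socket type and the Redis subscription plumbing are assumed):

```typescript
// Minimal fan-out table: which sockets are watching which stream.
type Socket = { send(data: string): void };

const viewersByVideo = new Map<string, Set<Socket>>();

function subscribe(videoId: string, socket: Socket): void {
  if (!viewersByVideo.has(videoId)) viewersByVideo.set(videoId, new Set());
  viewersByVideo.get(videoId)!.add(socket);
}

function unsubscribe(videoId: string, socket: Socket): void {
  viewersByVideo.get(videoId)?.delete(socket);
}

// Called when the Redis Pub/Sub topic delivers a new comment event.
function fanOutComment(videoId: string, comment: { user: string; text: string }): void {
  const payload = JSON.stringify({ type: "comment", videoId, ...comment });
  for (const socket of viewersByVideo.get(videoId) ?? []) {
    socket.send(payload);
  }
}
```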
9.3 DVR & Catch-up Capability
The system allows users to "Rewind" a live stream.
- Rolling Window: The CDN and Origin keep the last 2 hours of segments available.
- Manifest Manipulation: The frontend player detects the `EXT-X-PLAYLIST-TYPE:EVENT` tag and allows the seek-bar to move backward into the cached segments while the stream continues at the live edge.
9.4 Challenge: The "Herd" Effect
When the stream ends, 10 million people hit the "Home" button at once.
- Solution: We use Staggered Reconnection and Jitter in our frontend retry logic to ensure that a massive audience doesn't crash the discovery services upon exit.
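A sketch of staggered reconnection with jitter (the base delay, cap, and full-jitter strategy are illustrative choices):

```typescript
// Exponential backoff with "full jitter": each client waits a random slice of the window.
function reconnectDelayMs(attempt: number): number {
  const baseMs = 1_000;
  const capMs = 60_000;
  const windowMs = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * windowMs; // spreads millions of clients across the window
}

async function reconnectWithBackoff(connect: () => Promise<void>): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      return;
    } catch {
      await new Promise((resolve) => setTimeout(resolve, reconnectDelayMs(attempt)));
    }
  }
}
```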
9.5 Live & Engagement Architecture (ASCII)
[ Creator ] --(RTMP)--> [ Ingest ] --+--> [ Transcoder ] --(HLS)--> [ CDN ]
|
+--> [ Frame Capture ] (Thumbnails)
[ Viewer ] <--(WS)--> [ Gateway ] <--(Pub/Sub)-- [ Engagement Service ]
| |
+----(GET/POST)----------------------------------+
End of Chapter 9
Chapter 10 — Cost Model, Performance Trade-offs, and Final Architecture
In an interview, the final goal is to prove that the system is not just technically sound but economically viable. This chapter explains the "Business Logic" of our architectural choices.
10.1 The Economic Model of Video
The biggest costs in this system are Bandwidth, CDN Egress, and Storage. Everything else (CPU for APIs, Database lookups) is negligible by comparison.
- The "Thick Client" Strategy: By moving ABR logic and buffering to the frontend, we utilize the user's local CPU for free, rather than paying for server-side logic.
- Storage Tiering: We use S3 Intelligent-Tiering. Raw videos move to "Glacier" (Cold) after 30 days, while transcoded fragments stay in "S3 Standard" (Hot) for CDN delivery.
10.2 Performance Trade-offs (Decisions)
| Decision | Choice | The Trade-off |
|---|---|---|
| Consistency | Eventual | We sacrifice perfect counters (Likes/Views) for absolute availability of playback. |
| Latency | Buffering | We intentionally delay playback start by 2-3 segments to ensure a "Stall-free" experience. |
| Resolution | Transcoding | We spend money upfront on transcoding to save money on bandwidth later (by serving smaller files). |
10.3 Summary of the "Sweet Spot" Architecture
This design succeeds because it separates concerns into three distinct layers:
- The Client (Spine): Controls reality. It handles the network's unpredictability and manages the hardware resources.
- The Edge (CDN): Controls scale. It brings the bits to the user's doorstep, bypassing the slow public internet.
- The Backend (Foundation): Controls policy. It handles metadata, security keys, and the heavy lifting of transcoding.
10.4 Final Conclusion for the Interview
"We have built a system that is Offline-First, Global by Design, and Economically Optimized. By leveraging a metadata-driven ingestion pipeline and a sophisticated client-side player engine, we ensure that the platform remains performant for the next 100M users, regardless of their device or network speed."
📄 Document Audit Checklist
- [x] Ingestion: Resumable, chunked, and multi-bitrate.
- [x] Playback: ABR, MSE/EME, and Frame-accurate seeking.
- [x] Discovery: Decoupled metadata DB with search indexing.
- [x] Scale: Multi-tier CDN and Edge-caching.
- [x] Security: Signed URLs and DRM Handshake.
- [x] Consistency: Eventual consistency for engagement; Strong for auth.
End of Chapter 10.
Chapter 11 — The End-to-End Playback Lifecycle: A Narrative Walkthrough
To tie the previous 10 chapters together, we will trace the journey of a single user (Alice) watching a single video (4K "Nature Documentary") from the moment she hits "Play" to the moment she switches devices.
11.1 Phase 1: The Handshake (Chapters 5 & 8)
- Action: Alice clicks the "Play" button on her React-based Discovery Feed.
- The Logic:
  1. The frontend calls the Discovery API (Chapter 5) to fetch video metadata.
  2. Simultaneously, the Security Service (Chapter 8) issues a Signed Manifest URL and a DRM License Challenge.
  3. The browser receives a JSON response containing the Master Manifest URL (`.m3u8`).
11.2 Phase 2: Orchestration & ABR (Chapter 4)
- Action: The Player Controller (Chapter 4) takes over.
- The Logic:
  - The player downloads the Master Manifest.
  - The ABR Logic Unit (Chapter 4) detects Alice is on a 50 Mbps connection and chooses the 4K variant.
  - The Segment Downloader maps the 4K variant to a specific CDN Edge location (Chapter 7).
11.3 Phase 3: The Data Flow (Chapter 3 & 7)
- Action: Pixels move from the Edge to the Screen.
- The Logic:
  - The browser requests `segment_001.ts` from the CDN.
  - The CDN Edge (Chapter 7) serves the file from its SSD cache (originally generated by the Transcoder in Chapter 3).
  - The binary data is fed into the Media Source Extensions (MSE) buffer (Chapter 4).
  - The CDM/DRM Module (Chapter 8) decrypts the data in hardware, and Alice sees the first frame.
11.4 Phase 4: Reality Reporting (Chapter 6 & 9)
- Action: The system "remembers" Alice’s experience.
- The Logic:
  - Every 10 seconds, the frontend emits a Heartbeat (Chapter 6).
  - This pulse updates the Resume Store (Chapter 6) so Alice can switch to her iPad later.
  - High-volume signals like "Likes" or "Real-time Views" flow through Kafka to update the global counters (Chapter 9).
11.5 Phase 5: The Handover
- Action: Alice closes her laptop and opens her phone.
- The Logic:
  - The phone app calls the Metadata API.
  - It receives the `last_watched_pos` from the Resume Store.
  - The Player Engine seeks to 12:45, and the cycle repeats instantly.
Summary: The Core Invariant
This narrative proves that our architecture is not just a list of services, but a synchronized loop.
- The Backend defines what can be watched.
- The CDN handles the weight of the bits.
- The Frontend owns the decision-making logic.
End of Chapter 11.

