Video Streaming Platform (YouTube / Netflix / Hotstar)
Chapter 1 — Product Requirements, Scale, and Design Targets
This chapter defines what kind of video platform we are building and the physical limits it must survive.
Everything that follows in this book is constrained by these numbers.
We are designing a global video streaming platform in the class of YouTube, Netflix, and Amazon Prime Video that supports:
- User-generated uploads
- Studio-grade content
- On-demand playback
- Live streaming
- Offline viewing
- Multi-device continuity
The system must feel instant, reliable, and smooth for hundreds of millions of users.
1. Functional Requirements
The platform must support the following core user actions:
Content creators
- Upload raw video files of arbitrary length and size
- See upload progress and failure recovery
- Have videos transcoded into multiple qualities
- Publish videos to be watchable by viewers
Viewers
- Discover and open a video
- Start playback in under 2 seconds
- Seek, pause, and change quality without visible glitches
- Continue watching the same video on another device
- Download videos for offline playback
- Watch live streams with minimal delay
Platform
- Track watch time, views, and engagement
- Recommend content
- Enforce regional, subscription, and DRM rules
- Protect against piracy and abuse
2. Non-Functional Requirements
These are the invisible constraints that shape the architecture.
Latency
- Time-to-first-frame: < 2 seconds for most users
- Seek latency: < 500 ms
- Live stream delay: < 5 seconds from broadcaster to viewer
Reliability
- A CDN edge failure must not stop playback
- Analytics outages must not stop playback
- Backend outages should only block new playback starts, not active streams
Consistency
- Resume position can be eventually consistent
- View counts can be delayed
- DRM enforcement must be strongly consistent
Scalability
- Must support global viral traffic spikes
- One video can be watched by tens of millions simultaneously
3. Traffic Model
We design for a YouTube-scale service.
Users
- 300 million daily active users
- 50 million concurrent viewers at peak
Playback
- Average session: 30 minutes
- Average bitrate: 3 Mbps
- Peak bitrate: 15–25 Mbps (4K)
This means peak outbound traffic can exceed:
50M users × 3 Mbps = 150 Tbps
This immediately tells us:
No backend service can ever sit in the video data path.
Only CDNs can handle this scale.
4. Upload Model
Creators upload far fewer videos than viewers watch.
- 10 million uploads per day
- Average file size: 1–3 GB
- Sustained upload throughput: ~2 Tbps globally, with higher regional peaks (see the back-of-the-envelope sketch below)
Uploads are heavy but not latency-sensitive.
They can be queued, retried, and processed asynchronously.
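A quick back-of-the-envelope check makes the upload figure concrete (a sketch; the 2 GB average and a uniform arrival rate are simplifying assumptions):

```typescript
// Rough sustained upload throughput implied by the numbers above.
const uploadsPerDay = 10_000_000;
const avgFileBytes = 2 * 1e9;      // ~2 GB, midpoint of the 1-3 GB range
const secondsPerDay = 86_400;

const sustainedBitsPerSecond = (uploadsPerDay * avgFileBytes * 8) / secondsPerDay;
console.log(`${(sustainedBitsPerSecond / 1e12).toFixed(1)} Tbps sustained`); // ~1.9 Tbps
// Regional evening peaks and viral events push this several times higher.
```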
5. Storage Model
We store multiple versions of every video.
If a 1-hour video is transcoded into:
- 4K
- 1080p
- 720p
- 480p
- 360p
And segmented into 4-second chunks, a single video produces thousands of objects.
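A rough count makes "thousands of objects" concrete (a sketch; audio renditions, subtitles, and manifests are treated as a small constant overhead):

```typescript
// Objects produced by a single 1-hour upload with the ladder above.
const durationSeconds = 3600;
const segmentSeconds = 4;
const videoRenditions = 5;          // 4K, 1080p, 720p, 480p, 360p

const segmentsPerRendition = durationSeconds / segmentSeconds;  // 900
const videoSegments = segmentsPerRendition * videoRenditions;   // 4,500
console.log(`~${videoSegments} video segments per hour of content, plus audio, subtitles, and manifests`);
```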
At YouTube scale:
- Exabytes of cold storage
- Petabytes of hot CDN cache
This forces us to use:
- Cheap object storage (S3-like)
- Aggressive CDN caching
- Versioned immutable files
6. Design Targets
These numbers lock in the architecture.
| Constraint | Consequence |
|---|---|
| 150+ Tbps video traffic | Video must flow only through CDNs |
| Millions of concurrent users | Backend must be stateless & horizontally scalable |
| Billions of video segments | Storage must be object-based, not filesystem-based |
| UI must never freeze | Player must run off the main thread |
| Analytics can lag | Events must be async via Kafka-style logs |
These constraints will force:
- A two-plane architecture (control vs data)
- A frontend-driven control loop
- A CDN-first delivery model
End of Chapter 1.
Chapter 2 — Global Platform Architecture
This chapter defines the full system at 30,000 feet before we dive into any single pipeline.
Every service, database, CDN, and client lives inside this picture.
The most important idea is this:
Video bytes and playback control must never flow through the same systems.
This is the single architectural rule that allows the platform to scale to hundreds of millions of users.
1. The Two-Plane Architecture
The platform is split into two planes:
- Control Plane handles:
  - Authentication
  - Authorization
  - Metadata
  - Manifests
  - DRM
  - Analytics events
- Data Plane handles:
  - Video bytes
  - Audio bytes
  - Subtitle bytes
  - Segment delivery
The control plane is backend-heavy.
The data plane is CDN-heavy.
2. High-Level System Diagram
┌─────────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Web App │ │ Mobile App │ │ Smart TV │ │
│ │ (React/Vue) │ │ (iOS/Android)│ │ App │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼──────────────────┼──────────────────┼──────────────────────────┘
│ │ │
└──────────────────┼──────────────────┘
│
┌────────▼────────┐
│ API Gateway │
│ (Rate Limiting,│
│ Auth, Routing)│
└────────┬────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌──────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐
│ Video │ │ Metadata │ │ User │
│ Upload │ │ Service │ │ Service │
│ Service │ │ │ │ │
└──────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
│ │ │
┌──────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐
│ Transcode │ │ Comment │ │ Recommend. │
│ Service │ │ Service │ │ Service │
│ (Queue) │ │ │ │ (ML) │
└──────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
│ │ │
┌─────────▼─────────────────▼──────────────────▼─────────────┐
│ DATA LAYER │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ SQL │ │ NoSQL │ │ Object │ │ Cache │ │
│ │ (RDS) │ │(Cassandra│ │ Storage │ │ (Redis) │ │
│ │ │ │/DynamoDB)│ │ (S3) │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ CDN LAYER │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ CDN Edge│ │ CDN Edge│ │ CDN Edge│ │ CDN Edge│ │
│ │ (US) │ │ (EU) │ │ (APAC) │ │ (Others)│ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ BACKGROUND JOBS │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Thumbnail │ │ View │ │Analytics │ │ CDN │ │
│ │Generator │ │ Counter │ │Processor │ │ Warmer │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
The backend never streams video.
It only gives the client permission and coordinates where to get it.
3. Why This Architecture Exists
If even 1% of video traffic hit the backend:
150 Tbps × 1% = 1.5 Tbps
No database, API layer, or VPC can survive that.
So:
- Backend gives URLs
- CDN gives bytes
This separation makes:
- Video cheap
- Latency low
- Scaling trivial
4. Where the Frontend Fits
The frontend is not “just a UI”.
It is the playback brain.
It decides:
- Which quality to use
- When to prefetch
- When to pause
- When to seek
- When to retry
The backend only provides:
- The map (manifest)
- The rules (DRM, region, quality caps)
This makes the platform:
- Highly available
- Resistant to partial failures
- Cheap to operate
5. Failure Boundaries
This architecture enforces strong blast-radius isolation.
If this fails:
| Component | What breaks |
|---|---|
| CDN edge | Player switches to another edge |
| Metadata DB | New playback may fail |
| Analytics | No metrics, playback continues |
| Kafka | Data piles up, playback continues |
Playback is protected by design.
End of Chapter 2.
Chapter 3 — The Ingestion & Transcoding Pipeline (Merged Logic)
To handle millions of hours of uploads globally, the system must treat ingestion as an asynchronous, fault-tolerant factory.
3.1 Resumable Upload Flow
We avoid simple POST requests for large files. Instead, the Frontend utilizes the TUS Protocol or S3 Multipart Upload to ensure reliability.
- Handshake: The client requests a unique `videoId` and a pre-signed `uploadUrl` from the Upload Service.
- Chunking: The frontend client breaks the video file into small, equal-sized chunks (e.g., 5 MB each).
- Transmission: Chunks are sent sequentially or in parallel, each with a checksum. If the connection drops, the client queries the server for the last successful byte offset and resumes from there (see the sketch below).
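A minimal client-side sketch of the resume path, assuming a TUS-style endpoint (the `HEAD`/`PATCH` exchange and `Upload-Offset` header follow the TUS protocol; the chunk size and error handling are illustrative):

```typescript
const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB, matching the chunking step above

async function resumeUpload(uploadUrl: string, file: Blob): Promise<void> {
  // 1. Ask the server how many bytes it has already stored.
  const head = await fetch(uploadUrl, { method: "HEAD", headers: { "Tus-Resumable": "1.0.0" } });
  let offset = Number(head.headers.get("Upload-Offset") ?? 0);

  // 2. Send the remaining chunks from that offset onward.
  while (offset < file.size) {
    const chunk = file.slice(offset, offset + CHUNK_SIZE);
    const res = await fetch(uploadUrl, {
      method: "PATCH",
      headers: {
        "Tus-Resumable": "1.0.0",
        "Content-Type": "application/offset+octet-stream",
        "Upload-Offset": String(offset),
      },
      body: chunk,
    });
    if (!res.ok) throw new Error(`Chunk failed at offset ${offset}`);
    offset += chunk.size;
  }
}
```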
3.2 Transcoding & Processing (The "Refinery")
Once the raw file is stored in Object Storage (S3), an event triggers the Transcoding Service.
The Transcoding Workflow:
- Job Orchestration: A Message Queue (Kafka/SQS) holds transcoding tasks to decouple the upload from processing.
- Parallel Workers: Distributed workers (using FFmpeg) pick up jobs to generate the Quality Ladder:
  - Resolutions: 4K (2160p), 1080p, 720p, 480p, 360p, 240p.
  - Codecs: H.264 (Compatibility), H.265/VP9 (Efficiency).
- Segmenting: The workers break each rendition into 5-10 second segments (`.ts` or `.m4s` files) and generate the HLS/DASH Manifests (`.m3u8` or `.mpd`), as sketched below.
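A sketch of a transcode worker producing one rung of the ladder by shelling out to FFmpeg (the resolution, bitrate, and paths are illustrative; the HLS flags shown are standard FFmpeg options):

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

interface Rendition { name: string; height: number; videoBitrate: string } // illustrative rung

async function transcodeToHls(inputPath: string, outDir: string, r: Rendition): Promise<void> {
  // Scale, encode H.264/AAC, and cut into ~6-second HLS segments with a playlist.
  await run("ffmpeg", [
    "-i", inputPath,
    "-vf", `scale=-2:${r.height}`,
    "-c:v", "libx264", "-b:v", r.videoBitrate,
    "-c:a", "aac", "-b:a", "128k",
    "-f", "hls",
    "-hls_time", "6",
    "-hls_playlist_type", "vod",
    "-hls_segment_filename", `${outDir}/${r.name}_%05d.ts`,
    `${outDir}/${r.name}.m3u8`,
  ]);
}

// One queue job per rendition, e.g.:
// await transcodeToHls("/tmp/raw/video.mp4", "/tmp/out/videoId", { name: "720p", height: 720, videoBitrate: "3000k" });
```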
3.3 Ancillary Background Jobs
- Thumbnail Generation: Extracting keyframes at specific intervals to generate "Preview Sprites" for the frontend seek-bar.
- Content Moderation: Running ML models to scan for spam, copyright violations, or prohibited content.
- CDN Warming: Proactively pushing the newly created manifest and initial segments to edge caches in regions where the creator has a high following.
3.4 Ingestion Architecture (ASCII)
┌─────────┐ ┌────────────────┐ ┌──────────────┐ ┌───────────────┐
│ Creator │ ───> │ Upload Service │ ───> │ Raw S3 Bucket│ ───> │ Message Queue │
└─────────┘ └────────────────┘ └──────────────┘ └───────┬───────┘
│
┌───────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ Metadata DB │ <─── │ Storage (CDN)│ <─── │ Transcode Worker│ <─────┘
└───────────────┘ └──────────────┘ └─────────────────┘
End of Chapter 3
Chapter 4 — The Frontend Player Engine & ABR Logic (The "Spine" Core)
This chapter addresses the "Brain" of the system: the Client-Side Player. We treat the player not as a UI component, but as a resource orchestrator that manages the hardware-software bridge.
4.1 Architecture of a Production-Grade Player
To prevent UI jank, we separate the playback logic from the rendering thread.
- The Controller: Coordinates between the UI, the network, and the hardware buffer.
- The Buffering Engine (MSE): Utilizes Media Source Extensions (MSE) to feed binary video segments into the browser's `<video>` tag (see the sketch below).
- The Decryption Module (EME): Handles Encrypted Media Extensions (EME) for DRM-protected content (Widevine/FairPlay).
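A minimal sketch of the MSE feed path, assuming fMP4 segments and an illustrative codec string (a production engine also handles quota errors, buffer eviction, and codec switches):

```typescript
// Attach a MediaSource to the <video> tag and append downloaded segments.
const video = document.querySelector("video")!;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener("sourceopen", () => {
  // Codec string is illustrative; it must match what the transcoder produced.
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f, mp4a.40.2"');

  const appendSegment = async (url: string): Promise<void> => {
    const bytes = await (await fetch(url)).arrayBuffer();
    // appendBuffer is async; wait for "updateend" before appending the next segment.
    await new Promise<void>((resolve) => {
      sourceBuffer.addEventListener("updateend", () => resolve(), { once: true });
      sourceBuffer.appendBuffer(bytes);
    });
  };

  // Init segment first, then media segments in order (URLs are placeholders).
  appendSegment("/segments/init.mp4").then(() => appendSegment("/segments/seg_00001.m4s"));
});
```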
4.2 Adaptive Bitrate (ABR) Heuristics
The player must decide which quality to download next without human intervention. We use a Hybrid Algorithm:
- Throughput-Based: Measures the download speed of the last few segments.
- Buffer-Based (BBA): Measures how many seconds of video are currently stored in RAM:
  - Safe Zone (30s+): Stay at High Quality.
  - Danger Zone (<10s): Aggressively switch to Low Quality to avoid a "Spinner."
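A sketch of this hybrid rule (the ladder, thresholds, and 80% safety factor are illustrative assumptions):

```typescript
interface Variant { bitrateKbps: number; label: string }

// Quality ladder, ordered from lowest to highest bitrate (illustrative numbers).
const LADDER: Variant[] = [
  { bitrateKbps: 400, label: "240p" },
  { bitrateKbps: 800, label: "360p" },
  { bitrateKbps: 1500, label: "480p" },
  { bitrateKbps: 3000, label: "720p" },
  { bitrateKbps: 6000, label: "1080p" },
  { bitrateKbps: 16000, label: "4K" },
];

function pickVariant(measuredThroughputKbps: number, bufferSeconds: number): Variant {
  // Danger zone: buffer is nearly empty, survival beats quality.
  if (bufferSeconds < 10) return LADDER[0];

  // Throughput rule: only spend ~80% of measured bandwidth as a safety margin.
  const budget = measuredThroughputKbps * 0.8;
  let candidate = LADDER[0];
  for (const v of LADDER) {
    if (v.bitrateKbps <= budget) candidate = v;
  }

  // Safe zone: a deep buffer lets us probe one rung above what throughput suggests.
  if (bufferSeconds > 30) {
    return LADDER[Math.min(LADDER.indexOf(candidate) + 1, LADDER.length - 1)];
  }
  return candidate;
}

// Example: pickVariant(5000, 35) -> 1080p (throughput alone would pick 720p).
```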
4.3 Handling the "Thin Client" vs. "Thick Client"
Staff engineers must account for hardware diversity.
| Feature | Thick Client (Desktop/PS5) | Thin Client (2018 Smart TV) |
|---|---|---|
| Logic | Runs full ABR heuristics locally. | Server dictates the bitrate. |
| Threading | Uses Web Workers for parsing. | Single-threaded, synchronous. |
| Buffering | Large 60s forward buffer. | Minimal 5-8s buffer to avoid RAM crash. |
4.4 The Internal Player State Machine
The player does not just "Play" or "Pause." It transitions through complex states:
- IDLE: Resource allocation.
- LOADING: Fetching the Master Manifest (`.m3u8`).
- STALLED: Buffer empty; UI shows a "spinner," ABR shifts to the lowest bitrate.
- SEEKING: Clearing the current buffer and performing a "Cold Start" at the new timestamp.
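A compact sketch of these transitions (the `PLAYING` state and the allowed-transition table are assumptions added to make the example complete):

```typescript
type PlayerState = "IDLE" | "LOADING" | "PLAYING" | "STALLED" | "SEEKING";

// Which transitions the controller accepts (assumed; real players have more states).
const TRANSITIONS: Record<PlayerState, PlayerState[]> = {
  IDLE: ["LOADING"],
  LOADING: ["PLAYING", "STALLED"],
  PLAYING: ["STALLED", "SEEKING", "IDLE"],
  STALLED: ["PLAYING", "SEEKING"],
  SEEKING: ["PLAYING", "STALLED"],
};

class PlayerStateMachine {
  private state: PlayerState = "IDLE";

  transition(next: PlayerState): void {
    if (!TRANSITIONS[this.state].includes(next)) {
      throw new Error(`Illegal transition ${this.state} -> ${next}`);
    }
    this.state = next;
  }

  get current(): PlayerState {
    return this.state;
  }
}
```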
4.5 Performance Optimization: The "Zero-Latency" Goal
- VTT (Video Thumbnails): Fetching a single "Sprite Sheet" image for the seek-bar rather than individual frames.
- Pre-fetching: Using `<link rel="prefetch">` for the first 3 segments of the "Next Video" in a playlist.
- Request Interleaving: Prioritizing the video chunk download over secondary metadata (like comments or likes) on slow networks.
[ UI: React ] <--- (Events) --- [ Player State Manager ]
^
|
[ Adaptive Bitrate Logic ] <---> [ Segment Downloader ]
|
[ Media Source Extensions ] <----------+
|
v
[ Hardware Decoder ] --> [ Screen ]
End of Chapter 4
Chapter 5 — Metadata DB, Schema, and Discovery (Merged Logic)
While the video bytes live on the CDN, the Metadata Plane handles the "Brain" of the platform: users, subscriptions, and video details. This chapter merges the SQL/NoSQL strategy from the Backend Doc with the Discovery requirements of the Frontend Spine.
5.1 The Data Modeling Strategy
We use a polyglot persistence model to balance ACID transactions (for ownership) with high availability (for views/likes).
Primary Database (PostgreSQL/Spanner)
- Users Table: `userId`, `email`, `channelName`, `subscriptionLevel`.
- Videos Table: `videoId`, `creatorId`, `title`, `description`, `manifestUrl`, `thumbnailUrl`, `status` (Processing/Live/Private).
- Subscriptions: `(followerId, creatorId)` with a composite unique index.
High-Frequency Metadata (Cassandra/BigTable)
- View Counts & Likes: These require massive write-throughput. We use an Eventual Consistency model where counts are buffered in Redis and flushed to Cassandra.
- Comments: Stored in a wide-column store, partitioned by `videoId`.
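A sketch of these record shapes as TypeScript types (field names follow the lists above; the enum values and counter layout are assumptions):

```typescript
type VideoStatus = "Processing" | "Live" | "Private";

// Relational side (PostgreSQL/Spanner): ownership and permissions need ACID.
interface UserRow {
  userId: string;
  email: string;
  channelName: string;
  subscriptionLevel: "free" | "premium";
}

interface VideoRow {
  videoId: string;
  creatorId: string;
  title: string;
  description: string;
  manifestUrl: string;
  thumbnailUrl: string;
  status: VideoStatus;
}

// Wide-column side (Cassandra/BigTable): high write throughput, eventual consistency.
interface EngagementCounters {
  videoId: string;   // partition key
  views: number;     // flushed periodically from Redis
  likes: number;
}
```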
5.2 Discovery & Search Architecture
The Frontend "Home Feed" and "Search Bar" are powered by a specialized indexing layer.
- Search Index (Elasticsearch/OpenSearch):
  - Whenever a video is transcoded, the Metadata Service pushes a document to Elasticsearch.
  - Insight: We use "Fuzzy Matching" and "Autocomplete" to handle typos in the frontend search bar (see the query sketch below).
- Recommendation Engine:
  - Feature Store: Collects user signals (watch time, skipped videos, likes).
  - Ranking Service: A machine learning model that generates a list of `videoId`s for the user's home feed.
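A sketch of the fuzzy search call, assuming a standard Elasticsearch `match` query with `fuzziness: "AUTO"` against a hypothetical `videos` index:

```typescript
// Query the (assumed) "videos" index with typo tolerance.
async function searchVideos(esUrl: string, term: string): Promise<unknown> {
  const res = await fetch(`${esUrl}/videos/_search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: {
        match: {
          title: {
            query: term,          // e.g. "naturre documentry"
            fuzziness: "AUTO",    // tolerate small edit distances
          },
        },
      },
      size: 20,
    }),
  });
  return res.json();
}
```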
5.3 Scalability Trade-offs
| Decision | Choice | Why? |
|---|---|---|
| Video ID | UUID/Snowflake | Prevents predictable-ID scraping and allows distributed generation. |
| Consistency | Eventual | A 1-second delay in "Like" count visibility is better than a system crash during a viral video. |
| Database Sharding | By VideoId | Ensures that metadata for a single viral video doesn't overwhelm a single DB node. |
5.4 The API Handshake (Frontend Fetching)
The Frontend does not "join" tables. It calls a BFF (Backend-for-Frontend) or GraphQL Gateway:
- `GET /v1/video/:id` returns a pre-aggregated JSON object containing video details, creator info, and the HLS manifest URL (see the sketch below).
- Prefetching Logic: When the user hovers over a thumbnail, the frontend pre-warms the Metadata Cache to make the actual click feel instant.
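A sketch of the aggregated response shape and the hover prefetch (the field names, endpoint, and in-memory cache are illustrative assumptions):

```typescript
// Shape the BFF returns for GET /v1/video/:id (illustrative fields).
interface VideoDetailResponse {
  videoId: string;
  title: string;
  creator: { creatorId: string; channelName: string };
  manifestUrl: string;     // signed HLS manifest URL
  thumbnailUrl: string;
  stats: { views: number; likes: number };
}

const metadataCache = new Map<string, Promise<VideoDetailResponse>>();

// Called on thumbnail hover; the later click re-uses the same promise.
function prefetchVideoDetail(videoId: string): Promise<VideoDetailResponse> {
  if (!metadataCache.has(videoId)) {
    metadataCache.set(
      videoId,
      fetch(`/v1/video/${videoId}`).then((r) => r.json() as Promise<VideoDetailResponse>)
    );
  }
  return metadataCache.get(videoId)!;
}
```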
[ Metadata Flow ]
[ Client ] <---(GraphQL/REST)---> [ Metadata Service ]
|
+-------------------------+-------------------------+
| | |
[ PostgreSQL ] [ Redis Cache ] [ Elasticsearch ]
(Users/Permissions) (Hot Metadata) (Video Search)
End of Chapter 5
Chapter 6 — State Management & Multi-Device Resume Sync
In a global platform, "State" exists in three places: the Local UI, the Video Player, and the Cloud. Maintaining a seamless "Continue Watching" experience requires a sophisticated synchronization strategy.
6.1 The State Hierarchy
- Volatile State (UI): Search queries, hover states, menu toggles. Stored in React State / Signals.
- Player State: Current playback timestamp, volume, selected quality. Stored in a specialized Player Controller.
- Persistent State: Watch history, "Resume" points, User preferences. Stored in the Cloud.
6.2 The "Resume-Sync" Pipeline
How does Netflix know you stopped at 12:45 on your TV and show it on your phone instantly?
- Client-Side Heartbeat: The Player Engine emits a "Pulse" event every 5-10 seconds.
- Throttling & Batching: To avoid DDoSing the backend, the frontend batches these pulses; we don't send an API call for every second played.
- The Write-Ahead Log (WAL): The backend receives the pulse and appends it to a high-speed log (Kafka).
- The Sync Store: A high-availability key-value store (Redis/Cassandra) updates `last_watched_pos` for the `userId:videoId` pair.
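A client-side sketch of the heartbeat batching (the `/v1/heartbeat` endpoint and 10-second flush interval are assumptions):

```typescript
interface Pulse { videoId: string; positionSeconds: number; emittedAt: number }

const pending: Pulse[] = [];

// The player calls this frequently; nothing leaves the device yet.
function recordPulse(videoId: string, positionSeconds: number): void {
  pending.push({ videoId, positionSeconds, emittedAt: Date.now() });
}

// One batched POST every 10 seconds instead of one call per pulse.
setInterval(async () => {
  if (pending.length === 0) return;
  const batch = pending.splice(0, pending.length);
  await fetch("/v1/heartbeat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ pulses: batch }),
  });
}, 10_000);
```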
6.3 Handling Conflicts (The Edge Case)
If a user is watching on two devices simultaneously:
- Conflict Resolution: We follow a Last-Write-Wins (LWW) or Max-Timestamp logic.
- Race Conditions: If the user closes the app suddenly, we utilize the `navigator.sendBeacon()` API or a Service Worker to send a "Final Pulse" before the process is killed (see the sketch below).
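A sketch of the final pulse on page teardown (`pagehide` and `navigator.sendBeacon()` are standard browser APIs; the endpoint is an assumption):

```typescript
// Register once per playback session; the videoId and <video> element are assumed inputs.
function registerFinalPulse(videoId: string, video: HTMLVideoElement): void {
  window.addEventListener("pagehide", () => {
    const payload = JSON.stringify({
      videoId,
      positionSeconds: video.currentTime,
      emittedAt: Date.now(),
    });
    // sendBeacon queues the request in the browser so it survives page unload.
    navigator.sendBeacon("/v1/heartbeat/final", new Blob([payload], { type: "application/json" }));
  });
}
```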
6.4 Local State Persistence (Offline Mode)
For the "Partial Offline Download" requirement:
- IndexedDB: We store downloaded video segments and their metadata in the browser's IndexedDB.
- Background Sync: When the user goes back online, a Service Worker triggers a background sync to upload any "Watch History" accumulated while offline.
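A minimal sketch of segment persistence in IndexedDB (the database and store names are illustrative; a real offline player also tracks licenses and expiry):

```typescript
// Open (or create) the offline database with a single "segments" store.
function openSegmentDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("offline-video", 1);
    req.onupgradeneeded = () => {
      req.result.createObjectStore("segments", { keyPath: "key" });
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Persist one downloaded segment under a "videoId:index" key.
async function saveSegment(videoId: string, index: number, bytes: ArrayBuffer): Promise<void> {
  const db = await openSegmentDb();
  await new Promise<void>((resolve, reject) => {
    const tx = db.transaction("segments", "readwrite");
    tx.objectStore("segments").put({ key: `${videoId}:${index}`, bytes });
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```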
6.5 State Management Architecture (ASCII)
[ Device A ] [ Device B ]
| |
(Heartbeat: 10s) (Fetch Resume)
| |
v v
[ API Gateway ] ───> [ Redis / Cassandra ] <── [ Metadata API ]
| (Resume Store)
+───> [ Kafka ] ───> [ Analytics DB ]
End of Chapter 6
Chapter 7 — Global Distribution & CDN Strategy
We recognize that a centralized "Cloud" origin is too slow for video. To meet the sub-2-second time-to-first-frame (TTFF) and sub-500 ms seek targets from Chapter 1, we must move the data as close to the user's ISP as possible using a multi-tiered distribution strategy.
7.1 Multi-Tier CDN Architecture
We do not rely on a single origin. We use a layered approach:
- Origin Server (S3): The source of truth for all transcoded segments.
- Regional Edges: Larger caches that store 80% of popular content within a geographic region (e.g., US-East).
- Local Edge (PoPs): Small, highly distributed servers inside local ISPs. These store the "Top 10%" viral videos to ensure zero-buffering for the most-watched content.
7.2 Cache Invalidation vs. Short TTLs
Video segments are immutable: once `segment_101.ts` is created, it never changes.
- Strategy: We set an infinitely long TTL for video segments.
- The Manifest Problem: Unlike segments, the manifest (`.m3u8`) is dynamic (especially for Live). We use a short TTL (1-2 s) for manifests, or a `Cache-Control: no-cache` strategy, to ensure the player always knows the latest state of the stream (see the header sketch below).
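A sketch of the origin's cache headers under this policy (shown as plain Node `http` helpers; the split into two functions is illustrative):

```typescript
import { ServerResponse } from "node:http";

// Immutable segments: cache "forever" at every tier.
function setSegmentHeaders(res: ServerResponse): void {
  res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
}

// Dynamic manifests (especially live): keep edge caches on a very short leash.
function setManifestHeaders(res: ServerResponse): void {
  res.setHeader("Cache-Control", "public, max-age=2"); // or "no-cache" for strict freshness
}
```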
7.3 Geo-Routing & Request Steering
When a user hits "Play," the system must decide which CDN to use:
- Anycast DNS: Routes the user to the nearest IP address.
- Latency-Based Routing: The Backend Metadata API provides a manifest URL pointing to the CDN with the lowest current latency for that user's specific IP.
7.4 Content Steering (Fault Tolerance)
What if a major CDN provider (like Akamai or Cloudflare) goes down?
- Client-Side Steering: The manifest contains URLs for multiple CDNs. If the frontend player detects a `5xx` error or a timeout from CDN A, it automatically fails over to CDN B without stopping the video (see the sketch below).
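A sketch of that client-side failover loop (the host list, 3-second timeout, and retry order are illustrative; `AbortSignal.timeout` is available in current browsers and Node):

```typescript
// Try each CDN in order until one returns the segment.
const CDN_HOSTS = ["https://cdn-a.example.com", "https://cdn-b.example.com"]; // illustrative

async function fetchSegmentWithFailover(path: string): Promise<ArrayBuffer> {
  for (const host of CDN_HOSTS) {
    try {
      const res = await fetch(`${host}${path}`, { signal: AbortSignal.timeout(3000) });
      if (res.ok) return res.arrayBuffer();
      // 4xx/5xx: fall through and try the next CDN.
    } catch {
      // Network error or timeout: try the next CDN.
    }
  }
  throw new Error(`All CDNs failed for ${path}`);
}
```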
7.5 The "Hot" Video Problem (Thundering Herd)
When a viral video is released, millions of people request the same segment at the same millisecond.
- Request Collapsing: The CDN Edge ensures that if 1,000 requests come in for the same segment, it only sends one request back to the origin, then broadcasts the result to all 1,000 users.
7.6 Distribution Architecture (ASCII)
[ Origin S3 ]
|
+-----> [ Regional Cache (London) ]
| |
| +-----> [ Local PoP (UK ISP) ] ---> [ Viewer A ]
| +-----> [ Local PoP (EU ISP) ] ---> [ Viewer B ]
|
+-----> [ Regional Cache (Mumbai) ]
|
+-----> [ Local PoP (India ISP) ] --> [ Viewer C ]
End of Chapter 7
Chapter 8 — Security, DRM Handshake & Access Control (Merged)
For a video platform, security is more than just an Auth token; it is an end-to-end chain of trust that protects billions of dollars in intellectual property while ensuring seamless user access.
8.1 The Access Control Handshake
We use a decoupled security model where the Backend defines the policy and the CDN enforces it.
- Authentication: Users authenticate via OAuth2/OIDC. The frontend stores a short-lived JWT in a `Secure; HttpOnly` cookie.
- Authorization: When a user clicks "Play," the frontend requests a Signed URL or Cookie from the backend.
- CDN Enforcement: The CDN Edge validates the signature (HMAC) on the request. If the signature is expired or the IP doesn't match, the request is rejected at the edge, saving origin bandwidth.
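A backend-side sketch of the signature (HMAC-SHA256 over the path plus expiry; the query-parameter names and secret handling are illustrative assumptions):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Issue a URL the CDN edge can verify without calling the origin.
function signUrl(path: string, secret: string, ttlSeconds: number): string {
  const expires = Math.floor(Date.now() / 1000) + ttlSeconds;
  const signature = createHmac("sha256", secret).update(`${path}:${expires}`).digest("hex");
  return `${path}?expires=${expires}&sig=${signature}`;
}

// What the edge (or an edge function) checks before serving bytes.
function verifyUrl(path: string, expires: number, sig: string, secret: string): boolean {
  if (expires < Math.floor(Date.now() / 1000)) return false; // link expired
  const expected = createHmac("sha256", secret).update(`${path}:${expires}`).digest("hex");
  return expected.length === sig.length &&
    timingSafeEqual(Buffer.from(expected), Buffer.from(sig));
}
```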
8.2 Digital Rights Management (DRM)
To prevent stream ripping and unauthorized screen recording, we implement a DRM Handshake using the browser's EME (Encrypted Media Extensions).
- The Components:
  - CDM (Content Decryption Module): A sandbox in the browser/OS that handles decryption keys.
  - License Server: A backend service that verifies the user's right to watch and issues a decryption key.
- The Flow (see the sketch below):
  - The player detects encrypted segments in the manifest.
  - The player sends a License Request (containing the device's hardware ID) to our License Server.
  - The server returns an encrypted key.
  - The CDM decrypts the frames directly in GPU memory, ensuring the clear-text video never touches the JavaScript heap.
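A sketch of this handshake using the standard EME API (the Widevine key-system configuration and license endpoint are illustrative; certificate requests and error handling are omitted):

```typescript
async function setupDrm(video: HTMLVideoElement, licenseUrl: string): Promise<void> {
  // 1. Ask the browser for a CDM that supports our stream.
  const access = await navigator.requestMediaKeySystemAccess("com.widevine.alpha", [{
    initDataTypes: ["cenc"],
    videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.64001f"' }],
  }]);
  const mediaKeys = await access.createMediaKeys();
  await video.setMediaKeys(mediaKeys);

  // 2. When the player hits encrypted segments, open a license session.
  video.addEventListener("encrypted", async (event) => {
    const session = mediaKeys.createSession();

    // 3. The CDM produces a license request; forward it to our License Server.
    session.addEventListener("message", async (msg) => {
      const res = await fetch(licenseUrl, { method: "POST", body: msg.message });
      const license = await res.arrayBuffer();
      await session.update(license); // 4. Hand the key back to the CDM.
    });

    await session.generateRequest(event.initDataType, event.initData!);
  });
}
```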
8.3 Protecting the API & Metadata
- Rate Limiting: Using a Leaky Bucket algorithm at the API Gateway to prevent "View Count" manipulation and scraping.
- CORS & CSRF: Strict Origin policies to ensure only our official web/mobile clients can initiate playback.
- Geofencing: Backend checks the user's Geo-IP against the video's distribution rights before issuing a Signed URL.
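A sketch of a leaky-bucket limiter as it might sit at the gateway (the capacity and drain rate are illustrative):

```typescript
// Leaky bucket: requests fill the bucket; it drains at a fixed rate.
class LeakyBucket {
  private level = 0;
  private lastDrain = Date.now();

  constructor(private capacity: number, private drainPerSecond: number) {}

  allow(): boolean {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastDrain) / 1000;
    this.level = Math.max(0, this.level - elapsedSeconds * this.drainPerSecond);
    this.lastDrain = now;

    if (this.level + 1 > this.capacity) return false; // bucket full: reject request
    this.level += 1;
    return true;
  }
}

// Example: roughly 20 requests/second sustained per client, bursts up to 40.
const limiter = new LeakyBucket(40, 20);
```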
8.4 Security Architecture (ASCII)
[ Browser / CDM ] [ API Gateway ] [ License Server ]
| | |
(1) Get Signed URL -------------> | (Verify JWT & Rights) |
| <--- (Signed URL) -------| |
| | |
(2) Request Segments (CDN) | |
| | |
(3) EME License Request ----------------------------------> |
| <--- (Encrypted Key) ----------------------------- |
| | |
(4) Decrypt & Render | |
End of Chapter 8
Chapter 9 — Real-time Engagement & Live Streaming Deep-Dive
Live streaming is the "final boss" of video engineering. It requires shifting from a "pull-based" VOD model to a "push-based" real-time model where the end-to-end delay budget is a few seconds (Chapter 1 targets < 5 s), not the minutes of buffer a VOD player can afford.
9.1 The Live Ingestion Pipeline
Unlike VOD, where we transcode the whole file, Live requires Streaming Transcoding.
- Ingestion (RTMP/SRT): The creator's encoder (like OBS) pushes a continuous stream to our Live Ingest Service.
- Transmuxing: The backend converts the incoming stream into tiny LL-HLS (Low-Latency HLS) or DASH chunks (typically 1-second segments).
- The Live Edge: The CDN must be optimized to never cache the "Manifest" for more than a fraction of a second, ensuring users are always at the "Live Edge."
9.2 Real-time Engagement (Comments & Likes)
To handle viral moments (e.g., a sports final with 10M+ viewers), we cannot use standard polling.
- WebSocket Gateways: Maintain persistent connections for the "Live Chat."
- Pub/Sub (Kafka/Redis): When a comment is posted:
  - The Comment Service writes to a DB.
  - The event is published to a Redis Pub/Sub topic.
  - The WebSocket Gateway "fans out" the message to all connected viewers of that specific `videoId` (see the sketch below).
- Throttling & Sampled Likes: For massive streams, we don't show every single "Like" in real-time. We aggregate and sample at the edge to prevent the UI from becoming a resource hog.
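A sketch of the gateway-side fan-out table (connection bookkeeping only; the socket type and the Redis subscription plumbing are assumed):

```typescript
// Minimal fan-out table: which sockets are watching which stream.
type Socket = { send(data: string): void };

const viewersByVideo = new Map<string, Set<Socket>>();

function subscribe(videoId: string, socket: Socket): void {
  if (!viewersByVideo.has(videoId)) viewersByVideo.set(videoId, new Set());
  viewersByVideo.get(videoId)!.add(socket);
}

function unsubscribe(videoId: string, socket: Socket): void {
  viewersByVideo.get(videoId)?.delete(socket);
}

// Called when the Redis Pub/Sub topic delivers a new comment event.
function fanOutComment(videoId: string, comment: { user: string; text: string }): void {
  const payload = JSON.stringify({ type: "comment", videoId, ...comment });
  for (const socket of viewersByVideo.get(videoId) ?? []) {
    socket.send(payload);
  }
}
```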
9.3 DVR & Catch-up Capability
The system allows users to "Rewind" a live stream.
- Rolling Window: The CDN and Origin keep the last 2 hours of segments available.
- Manifest Manipulation: The frontend player detects the `EXT-X-PLAYLIST-TYPE:EVENT` tag and allows the seek-bar to move backward into the cached segments while the stream continues at the live edge.
9.4 Challenge: The "Herd" Effect
When the stream ends, 10 million people hit the "Home" button at once.
- Solution: We use Staggered Reconnection and Jitter in our frontend retry logic to ensure that a massive audience doesn't crash the discovery services upon exit.
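A sketch of staggered reconnection with jitter (the base delay, cap, and full-jitter strategy are illustrative choices):

```typescript
// Exponential backoff with "full jitter": each client waits a random slice of the window.
function reconnectDelayMs(attempt: number): number {
  const baseMs = 1_000;
  const capMs = 60_000;
  const windowMs = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * windowMs; // spreads millions of clients across the window
}

async function reconnectWithBackoff(connect: () => Promise<void>): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      return;
    } catch {
      await new Promise((resolve) => setTimeout(resolve, reconnectDelayMs(attempt)));
    }
  }
}
```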
9.5 Live & Engagement Architecture (ASCII)
[ Creator ] --(RTMP)--> [ Ingest ] --+--> [ Transcoder ] --(HLS)--> [ CDN ]
|
+--> [ Frame Capture ] (Thumbnails)
[ Viewer ] <--(WS)--> [ Gateway ] <--(Pub/Sub)-- [ Engagement Service ]
| |
+----(GET/POST)----------------------------------+
End of Chapter 9
Chapter 10 — Cost Model, Performance Trade-offs, and Final Architecture
In an interview, the final goal is to prove that the system is not just technically sound but economically viable. This chapter explains the "Business Logic" of our architectural choices.
10.1 The Economic Model of Video
The biggest costs in this system are Bandwidth, CDN Egress, and Storage. Everything else (CPU for APIs, Database lookups) is negligible by comparison.
- The "Thick Client" Strategy: By moving ABR logic and buffering to the frontend, we utilize the user's local CPU for free, rather than paying for server-side logic.
- Storage Tiering: We use S3 Intelligent-Tiering. Raw videos move to "Glacier" (Cold) after 30 days, while transcoded fragments stay in "S3 Standard" (Hot) for CDN delivery.
10.2 Performance Trade-offs (Decisions)
| Decision | Choice | The Trade-off |
|---|---|---|
| Consistency | Eventual | We sacrifice perfect counters (Likes/Views) for absolute availability of playback. |
| Latency | Buffering | We intentionally delay playback start by 2-3 segments to ensure a "Stall-free" experience. |
| Resolution | Transcoding | We spend money upfront on transcoding to save money on bandwidth later (by serving smaller files). |
10.3 Summary of the "Sweet Spot" Architecture
This design succeeds because it separates concerns into three distinct layers:
- The Client (Spine): Controls reality. It handles the network's unpredictability and manages the hardware resources.
- The Edge (CDN): Controls scale. It brings the bits to the user's doorstep, bypassing the slow public internet.
- The Backend (Foundation): Controls policy. It handles metadata, security keys, and the heavy lifting of transcoding.
10.4 Final Conclusion for the Interview
"We have built a system that is Offline-First, Global by Design, and Economically Optimized. By leveraging a metadata-driven ingestion pipeline and a sophisticated client-side player engine, we ensure that the platform remains performant for the next 100M users, regardless of their device or network speed."
📄 Document Audit Checklist
- [x] Ingestion: Resumable, chunked, and multi-bitrate.
- [x] Playback: ABR, MSE/EME, and Frame-accurate seeking.
- [x] Discovery: Decoupled metadata DB with search indexing.
- [x] Scale: Multi-tier CDN and Edge-caching.
- [x] Security: Signed URLs and DRM Handshake.
- [x] Consistency: Eventual consistency for engagement; Strong for auth.
End of Chapter 10.
Chapter 11 — The End-to-End Playback Lifecycle: A Narrative Walkthrough
To tie the previous 10 chapters together, we will trace the journey of a single user (Alice) watching a single video (4K "Nature Documentary") from the moment she hits "Play" to the moment she switches devices.
11.1 Phase 1: The Handshake (Chapters 5 & 8)
- Action: Alice clicks the "Play" button on her React-based Discovery Feed.
- The Logic:
  1. The frontend calls the Discovery API (Chapter 5) to fetch video metadata.
  2. Simultaneously, the Security Service (Chapter 8) issues a Signed Manifest URL and a DRM License Challenge.
  3. The browser receives a JSON response containing the Master Manifest URL (`.m3u8`).
11.2 Phase 2: Orchestration & ABR (Chapter 4)
- Action: The Player Controller (Chapter 4) takes over.
- The Logic:
  - The player downloads the Master Manifest.
  - The ABR Logic Unit (Chapter 4) detects Alice is on a 50 Mbps connection and chooses the 4K variant.
  - The Segment Downloader maps the 4K variant to a specific CDN Edge location (Chapter 7).
11.3 Phase 3: The Data Flow (Chapter 3 & 7)
- Action: Pixels move from the Edge to the Screen.
- The Logic:
  - The browser requests `segment_001.ts` from the CDN.
  - The CDN Edge (Chapter 7) serves the file from its SSD cache (originally generated by the Transcoder in Chapter 3).
  - The binary data is fed into the Media Source Extensions (MSE) buffer (Chapter 4).
  - The CDM/DRM Module (Chapter 8) decrypts the data in hardware, and Alice sees the first frame.
11.4 Phase 4: Reality Reporting (Chapter 6 & 9)
- Action: The system "remembers" Alice’s experience.
- The Logic:
  - Every 10 seconds, the frontend emits a Heartbeat (Chapter 6).
  - This pulse updates the Resume Store (Chapter 6) so Alice can switch to her iPad later.
  - High-volume signals like "Likes" or "Real-time Views" flow through Kafka to update the global counters (Chapter 9).
11.5 Phase 5: The Handover
- Action: Alice closes her laptop and opens her phone.
- The Logic:
  - The phone app calls the Metadata API.
  - It receives the `last_watched_pos` from the Resume Store.
  - The Player Engine seeks to 12:45, and the cycle repeats instantly.
Summary: The Core Invariant
This narrative proves that our architecture is not just a list of services, but a synchronized loop.
- The Backend defines what can be watched.
- The CDN handles the weight of the bits.
- The Frontend owns the decision-making logic.
End of Chapter 11.

