Interview‑Ready: Vertical vs. Horizontal Scaling

🚀 Scaling Deep‑Dive: Vertical vs. Horizontal (Interview‑Ready Guide)

A practical, interview‑focused deep dive into scaling strategies with trade‑offs, architecture patterns, and scenario playbooks. Includes crisp "interview lines" you can quote, plus detailed explanations for your blog.


Introduction

Scaling is about increasing a system’s capacity to handle load—more users, more data, more requests—without breaking reliability or blowing up cost. You have two primary levers:

  • Vertical scaling (scale‑up): Make a single machine bigger/faster.
  • Horizontal scaling (scale‑out): Add more machines and distribute work.

Most real systems do both over time: scale up early for simplicity, then scale out for sustained growth and resilience.


Comparison / Trade-offs

Which is easier to implement and why?

Vertical is easier. You upgrade CPU, memory, disk, or instance size. No code changes, minimal architecture churn.

But: you inherit a single point of failure (SPOF) and a hard ceiling (the biggest box you can buy/afford). Costs rise non‑linearly at the high end.

Interview line: “Scale up is the fastest band-aid; scale out is the durable fix.”

Limitations of vertical scaling

  • Hardware ceiling: Limited by max CPU sockets, RAM capacity, memory bandwidth, and storage IOPS.
  • SPOF: One box, one failure domain. If it dies, you’re down.
  • Diminishing returns: Amdahl’s Law means some parts won’t speed up by adding cores (see the worked example after this list).
  • Upgrade blast radius: Upgrades can cause downtime or risky live resizes.
  • Cost curve: High‑end hardware is disproportionately expensive.
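
A quick worked example of that diminishing‑returns ceiling (the 90% parallel fraction and core counts below are illustrative assumptions, not benchmarks):

```typescript
// Amdahl's Law: speedup = 1 / ((1 - p) + p / n),
// where p is the parallelizable fraction of the work and n the number of cores.
function amdahlSpeedup(p: number, n: number): number {
  return 1 / ((1 - p) + p / n);
}

// With 90% of the work parallelizable, speedup can never exceed 1 / (1 - p) = 10x,
// no matter how big the box gets.
console.log(amdahlSpeedup(0.9, 8).toFixed(1));  // ~4.7x on 8 cores
console.log(amdahlSpeedup(0.9, 64).toFixed(1)); // ~8.8x on 64 cores (far from 64x)
```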

Why large-scale systems prefer horizontal scaling

  • Elasticity: Add/remove nodes to match demand.
  • Fault isolation: Fail one node, others keep serving.
  • Throughput & locality: Parallelize work, push compute closer to data/users.
  • Economics: Commodity hardware + pay‑as‑you‑go beats monolithic super‑servers.
  • Multi‑AZ/region: Survive data center failures and improve latency.

Deep Dive / Architecture

Scaling a relational database: vertical vs horizontal

Vertical (scale‑up):

  • Bigger instance (more vCPU/RAM), faster NVMe, larger buffer pool.
  • Tune: connection pooling, query plans, proper indexes, memory settings.
  • Pros: Simple, transactional semantics preserved, minimal app changes.
  • Cons: Ceiling, SPOF, expensive.

Horizontal (scale‑out):

  • Read replicas: Offload reads; app does read/write splitting.
  • Partitioning/Sharding: Split data across shards by key (e.g., user_id, tenant_id, order_id range).

    • Hash sharding: Even distribution; harder range queries.
    • Range sharding: Good for time series/ranges; watch for hot shards.
    • Directory/lookup sharding: Indirect mapping; flexible but adds a hop.
  • Multi‑primary (write scaling): Requires conflict resolution or partitioned ownership.

  • Federation/CQRS: Write to OLTP store; project to read models/materialized views for scale.

Migration sketch:

  1. Add replicas → route safe reads to them.
  2. Introduce a shard key at the write path.
  3. Backfill + dual‑write.
  4. Cut traffic over shard‑by‑shard.
  5. Decommission the monolith.

Interview line: “Start with replicas, then shard by a stable, high‑cardinality key; plan re‑sharding from day one.”
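
To make “shard by a stable, high‑cardinality key” concrete, here is a minimal TypeScript sketch of hash‑based shard routing; the shard names, shard count, and hash choice are assumptions for illustration only:

```typescript
// Route every row/request for a given user to one shard, deterministically.
import { createHash } from "crypto";

const SHARDS = ["orders-shard-0", "orders-shard-1", "orders-shard-2", "orders-shard-3"];

function shardFor(shardKey: string): string {
  // Hash the stable, high-cardinality key (e.g. user_id) to get an even spread.
  const hex = createHash("md5").update(shardKey).digest("hex");
  const bucket = parseInt(hex.slice(0, 8), 16) % SHARDS.length;
  return SHARDS[bucket];
}

console.log(shardFor("user-42"));   // the same key always routes to the same shard
console.log(shardFor("user-1337")); // different keys spread across shards
```

Note that plain modulo hashing remaps most keys when the shard count changes, which is exactly why re‑sharding has to be planned up front (consistent hashing or a directory/lookup table keeps the remapping bounded).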

How horizontal scaling impacts consistency (CAP & PACELC)

  • CAP: In the presence of partitions (P), you trade Consistency (C) vs Availability (A). Horizontal systems must tolerate P → many choose availability with eventual consistency (AP). Others choose CP (strong consistency) and accept unavailability during partitions.
  • PACELC: If a Partition (P) happens → trade Availability (A) vs Consistency (C); Else (no partition) → trade Latency (L) vs Consistency (C). Even without failures, cross‑node coordination adds latency for strong consistency.
  • Read models:

    • Strong: Linearizable reads/writes (single leader, quorum).
    • Eventual: Replicas converge over time (read‑after‑write may be stale).
    • Tunable: Quorums (R+W>N) for “strong‑enough”.
  • Client patterns: read‑your‑writes, monotonic reads, session consistency.

Interview line: “Horizontal scale introduces replica lag; we often relax to eventual consistency or use quorums to balance latency and correctness.”
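
A tiny sketch of the quorum rule behind “strong‑enough” reads (the replica counts are illustrative):

```typescript
// With N replicas, choosing R (read) and W (write) such that R + W > N guarantees
// every read quorum overlaps the latest write quorum on at least one replica.
function quorumsOverlap(n: number, r: number, w: number): boolean {
  return r + w > n;
}

console.log(quorumsOverlap(3, 2, 2)); // true  -> reads see the latest acknowledged write
console.log(quorumsOverlap(3, 1, 1)); // false -> stale reads possible (eventual consistency)
```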

Infrastructure you need for horizontal scaling

  • Load balancer (L4/L7): Health checks, weighted routing, sticky sessions when necessary.
  • Distributed caching: Redis/Memcached (clustered), cache key discipline, eviction/TTL, stampede protection.
  • Distributed storage:

    • Relational with replicas/shards;
    • NoSQL (Cassandra/Dynamo) for partitioned, high‑throughput;
    • Object storage (S3/GCS) for blobs;
    • Distributed FS for shared files where needed.
  • Service discovery & config: DNS, Consul, Eureka, Kubernetes DNS; config/coordination via etcd/ZooKeeper.

  • Orchestration & autoscaling: Kubernetes, ASGs; HPA/VPA and Cluster Autoscaler.

  • Observability: Metrics (Prometheus), tracing (OpenTelemetry), logs (ELK/Loki). SLOs, alerts, dashboards.

  • Networking: VPC design, pod/service CIDRs, egress/ingress, rate limiting, WAF, API gateway.

  • Reliability controls: Circuit breakers, retries with jitter, timeouts, bulkheads, backpressure (see the circuit‑breaker sketch after this list).

  • Delivery & infra as code: CI/CD, blue‑green/canary, Terraform/Pulumi, secrets management.

What role does caching play in scaling?

  • Primary lever for read scale & latency: Serve hot data from RAM at micro‑ to millisecond speed.
  • Where to cache:

    • Client/browser; CDN/edge (static & edge compute); app tier (Redis); DB/materialized views.
  • Patterns: cache‑aside (lazy), write‑through, write‑back, refresh‑ahead.

  • Keys & invalidation: Namespacing, versioning, TTLs, event‑driven invalidation; protect against stampedes (request coalescing, mutex, jittered TTLs).

  • Consistency: Decide tolerance for staleness; support read‑after‑write where necessary (e.g., bypass or short TTL on user‑profile updates).

Interview line: “Cache first; it’s the cheapest scale. Then scale out storage/compute.”
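
A minimal cache‑aside sketch with request coalescing and TTL jitter for stampede protection. The in‑memory Map, key name, and TTL are stand‑ins for illustration; a clustered Redis/Memcached would back this in a real deployment:

```typescript
type Entry<T> = { value: T; expiresAt: number };

const cache = new Map<string, Entry<unknown>>();
const inFlight = new Map<string, Promise<unknown>>(); // coalesces concurrent misses

async function cacheAside<T>(key: string, ttlMs: number, load: () => Promise<T>): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value as T; // serve hot data from memory

  let pending = inFlight.get(key) as Promise<T> | undefined;
  if (!pending) {
    pending = load()
      .then((value) => {
        const jitter = Math.random() * 0.1 * ttlMs; // jittered TTL: keys don't all expire at once
        cache.set(key, { value, expiresAt: Date.now() + ttlMs + jitter });
        return value;
      })
      .finally(() => inFlight.delete(key));
    inFlight.set(key, pending); // later callers await the same load instead of hammering the DB
  }
  return pending;
}

// Usage (hypothetical loader): cacheAside("user:42:profile", 30_000, () => loadProfileFromDb("42"));
```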


Failure Handling / Availability

SPOF vs fault tolerance

  • Vertical scale concentrates risk: single machine, single AZ, single NIC/volume.
  • Horizontal scale spreads risk: N replicas, multi‑AZ/region, rolling upgrades.
  • Design for failure: graceful degradation, brownouts, traffic shedding, overload protection.

Read-heavy vs write-heavy: choosing a strategy

Read‑heavy: replicas + caches + denormalized read models.
Write‑heavy: sharding/partition ownership, log‑structured stores, batching/queues, idempotent writes, hot‑key mitigation (random suffixing, consistent hashing, time‑bucketed IDs).

Interview line: “Reads scale with replicas; writes scale with ownership (shards).”
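
One way to apply the hot‑key mitigation mentioned above, sketched with random suffixing (the suffix count and key format are assumptions):

```typescript
const SUFFIXES = 8; // one hot logical key is spread across 8 physical partition keys

// Writes for a hot key (e.g. a viral SKU) land on a random suffix, spreading load.
function writeKeyFor(hotKey: string): string {
  return `${hotKey}#${Math.floor(Math.random() * SUFFIXES)}`;
}

// Reads must fan out across all suffixes and merge the results.
function readKeysFor(hotKey: string): string[] {
  return Array.from({ length: SUFFIXES }, (_, i) => `${hotKey}#${i}`);
}

console.log(writeKeyFor("sku-123"));  // e.g. "sku-123#5"
console.log(readKeysFor("sku-123")); // ["sku-123#0", ..., "sku-123#7"]
```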


Scenario Playbooks

Slow web app: vertical or horizontal?

  1. Measure first: CPU, memory, GC, heap, disk IOPS, DB QPS/latency, queue depth, p95/p99, error rates, RPS.
  2. If host‑bound: scale up (bigger instance, faster disk) + fix hot paths.
  3. If throughput‑bound: scale out (LB + more app pods/instances).
  4. Always: add caching, index queries, async offload, connection pooling.
  5. Guardrails: timeouts, retries with jitter, circuit breakers to avoid cascades.

Interview line: “Diagnose before dollars; cache before clusters.”
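
A hedged sketch of the guardrails in step 5: a per‑call timeout plus retries with exponential backoff and full jitter, so synchronized retries don’t stampede a recovering dependency (the timeout, attempt count, and base delay are illustrative):

```typescript
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

async function retryWithJitter<T>(fn: () => Promise<T>, attempts = 3, baseMs = 100): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await withTimeout(fn(), 2_000); // never wait on a dependency forever
    } catch (err) {
      if (attempt >= attempts - 1) throw err;
      const delay = Math.random() * baseMs * 2 ** attempt; // full jitter on the backoff cap
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```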

Stateless vs stateful: how to scale horizontally

Stateless services

  • Store no user/session state in memory beyond a request.
  • Scale by simply adding nodes; LB can send any request to any node.
  • Externalize state to Redis/DB; use idempotent handlers.

Stateful services

  • Options: sticky sessions, external session store, or partitioned ownership (e.g., user‑ID‑based ownership).
  • For consensus‑bound state (leaders, locks), use Raft/Zab systems (etcd/ZooKeeper) and expect lower write throughput.
  • For collaborative/near‑real‑time (docs/chat), consider CRDTs for AP trade‑offs.

When AWS/RDS hits vertical limits

  • Exploit replicas for reads; use a read‑write proxy to split traffic.
  • Shard by tenant, region, or entity (user/order). Precompute routing (lookup table/consistent hashing).
  • Denormalize hot paths to reduce cross‑shard joins; use CQRS + events to project read models.
  • Add cache layers (Redis) and materialized views for expensive reads.
  • Archive cold data and compress; keep indexes narrow; tune autovacuum.
  • Zero‑downtime plan: dual writes + backfill → switch reads → cut writes shard‑by‑shard.

Interview line: “RDS scale‑up buys time; sharding is the destination. Plan re‑shardability on day one.”
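
A minimal read/write‑splitting sketch, assuming a generic `Pool` interface with a `query` method (the interface and the round‑robin replica choice are illustrative, not a specific driver’s API):

```typescript
interface Pool {
  query(sql: string, params?: unknown[]): Promise<unknown[]>;
}

class SplitRouter {
  private next = 0;
  constructor(private primary: Pool, private replicas: Pool[]) {}

  // All writes go to the primary to preserve a single source of truth.
  write(sql: string, params?: unknown[]) {
    return this.primary.query(sql, params);
  }

  // "Safe" reads (tolerant of replica lag) round-robin across read replicas.
  read(sql: string, params?: unknown[]) {
    const replica = this.replicas[this.next++ % this.replicas.length];
    return replica.query(sql, params);
  }

  // Read-your-writes reads (e.g. right after checkout) should still hit the primary.
  readFresh(sql: string, params?: unknown[]) {
    return this.primary.query(sql, params);
  }
}
```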

Scaling an e‑commerce checkout service

  • Requirements: high availability, idempotency, exactly‑once‑ish payment capture, inventory integrity.
  • Topology:

    • Multiple checkout service instances behind LB.
    • External session/cart store (Redis/DynamoDB).
    • Order DB partitioned by customer/region/order_id.
    • Payment worker consuming from a queue (SQS/Kafka) with retry & dead‑letter.
    • Inventory service with reservation (hold → confirm → release on timeout).
    • Idempotency keys on APIs; dedupe table with TTL.
  • Consistency: prefer sagas over 2PC. Design compensating actions (refund, restock).

  • Resilience: circuit breakers to PSPs, fallback to alternate gateways, poison‑pill handling.

  • Observability: trace an order end‑to‑end (distributed tracing). Auditable logs.

Interview line: “Scale nodes, externalize cart state, partition orders, queue payments, and enforce idempotency throughout.”
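
A minimal idempotency sketch for the checkout API, assuming the client sends an Idempotency‑Key header; the in‑memory Map stands in for the dedupe table with TTL, and all names are illustrative:

```typescript
type StoredResponse = { status: number; body: unknown; expiresAt: number };

const seen = new Map<string, StoredResponse>(); // dedupe "table" with TTL

async function handleCheckout(
  idempotencyKey: string,
  capturePayment: () => Promise<unknown>
): Promise<StoredResponse> {
  const prior = seen.get(idempotencyKey);
  if (prior && prior.expiresAt > Date.now()) return prior; // duplicate: replay the first result

  const body = await capturePayment(); // charge at most once per key
  const response = { status: 200, body, expiresAt: Date.now() + 24 * 3_600_000 };
  seen.set(idempotencyKey, response);
  return response;
}
```

A real dedupe table would insert the key atomically (e.g. via a unique constraint) before capturing payment, so two concurrent duplicates can’t both reach the payment provider.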

How Kubernetes supports horizontal scaling

  • HPA (Horizontal Pod Autoscaler): scales pods based on CPU/memory/custom/external metrics (v2). Set min/max replicas and stabilization windows.
  • Cluster Autoscaler: adds/removes nodes to fit pending pods.
  • Service & Ingress: built‑in L4/L7 load distribution. Readiness/liveness probes for safe rollout.
  • StatefulSets + PVCs: stable identities, ordered updates for stateful workloads.
  • Best practices: pod anti‑affinity (spread across AZs), PDBs, resource requests/limits, autoscaling cooldowns, and slow start to avoid thundering herds.

Interview line: “HPA scales pods, Cluster Autoscaler scales nodes, and readiness gates keep traffic on healthy replicas.”
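
A small Node/TypeScript sketch of the readiness‑probe side of this: report 503 during warm‑up and while draining on SIGTERM so the Service stops routing new traffic to the pod (the port and warm‑up delay are assumptions):

```typescript
import { createServer } from "http";

let ready = false;
setTimeout(() => { ready = true; }, 2_000);      // pretend warm-up: caches, DB connections, etc.
process.on("SIGTERM", () => { ready = false; }); // start draining before the pod is terminated

createServer((req, res) => {
  if (req.url === "/readyz") {
    res.writeHead(ready ? 200 : 503).end(ready ? "ok" : "draining");
    return;
  }
  res.writeHead(200).end("hello from a horizontally scaled replica");
}).listen(8080);
```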


Quick Interview Lines (Cheat Sheet)

  • “Scale up for simplicity; scale out for longevity.”
  • “Reads scale with replicas; writes scale with ownership (shards).”
  • “Diagnose before dollars; cache before clusters.”
  • “Eventual consistency buys availability; quorums buy confidence.”
  • “Design re‑sharding on day one.”

Glossary

  • SPOF: Single Point of Failure.
  • Shard key: Field used to route data to a shard.
  • Quorum (R/W/N): Read/Write counts over N replicas ensuring overlap.
  • Idempotency: Same request multiple times → one effect.
  • Saga: Orchestration of local transactions with compensations.

Further Reading Pointers

  • Indexing & query tuning checklists (RDBMS).
  • Cache design patterns (cache‑aside, write‑through, write‑back).
  • Sharding strategies (hash/range/directory, re‑sharding).
  • CAP & PACELC, consistency models.
  • Kubernetes autoscaling (HPA v2), PDBs, anti‑affinity.

More Details:

Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli

systemdesignwithzeeshanali

Git: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli
