Concept · Scaling

Horizontal vs Vertical Scaling

01

Why this matters

You launch with 1000 users on one server. Six months later you have 1M users. The server's CPU sits at 99%. You have exactly two knobs: make that one server bigger (vertical / "scale up") or add more servers alongside it (horizontal / "scale out"). This is the first architectural decision of almost every system design.

Get it wrong and you'll either spend $40k/month on a box that still dies when one disk fails, or spend three months rebuilding a monolith into a distributed system during a traffic surge.

02

Intuition

A restaurant has one cook. Rush hour hits. You can (a) hire a much faster cook (vertical: replace the person) or (b) hire five more cooks of the same skill (horizontal: add more people, now you need a kitchen manager to coordinate).

Option (a) is simple — same kitchen, same menu, nobody to train. But the best cook in town is 3× the price and there's a ceiling on how fast one human can work. Option (b) is cheaper per cook, has no ceiling (you can always hire more), and keeps working if any one cook calls in sick — but now you need coordination (the manager = load balancer, consistent recipes = stateless services, shared ingredient fridge = shared database).

03

How each works

Vertical scaling is usually a short sequence of cloud-provider API calls: stop the VM, resize it from c5.2xlarge (8 vCPU, 16 GB) to c5.24xlarge (96 vCPU, 192 GB), start it. Downtime: typically around 60 seconds. Your code doesn't change at all.
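As a rough sketch of that stop / resize / start sequence: the text names AWS-style instance types but not an API, so this assumes the VM is an EC2 instance and that `ec2` is a boto3 EC2 client. The function name `resize_instance` is hypothetical.

```python
def resize_instance(ec2, instance_id: str, new_type: str) -> None:
    """Stop an instance, change its type, and start it again.

    Assumption: `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2").
    The instance is unavailable between the stop and the start.
    """
    # 1. Stop the VM and block until it is fully stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # 2. Change the instance type while the VM is stopped.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    # 3. Start it again and wait until it reports as running.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

A call might look like `resize_instance(boto3.client("ec2"), instance_id, "c5.24xlarge")`, where `instance_id` is whatever ID your VM has. Nothing in the application is touched, which is the whole appeal.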

Horizontal scaling requires architectural work: put a load balancer in front, run N identical instances, make the service stateless (no in-memory sessions; put state in Redis or a DB), and ensure the DB can handle the concurrent load (or shard it). Downtime during scale-out: zero, because you just add boxes. But the one-time investment is weeks, not minutes.
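To make the "load balancer in front of N identical instances" shape concrete, here is a toy round-robin balancer. It is only a sketch: `make_instance` and `RoundRobinBalancer` are hypothetical names, the "instances" are plain functions, and a real load balancer would also do health checks and connection handling.

```python
from typing import Callable, List

Handler = Callable[[str], str]

def make_instance(name: str) -> Handler:
    """Build one stateless instance: the reply depends only on the request."""
    def handle(request: str) -> str:
        return f"{name} handled {request}"
    return handle

class RoundRobinBalancer:
    """Toy load balancer: hands each request to the next instance in turn."""

    def __init__(self, instances: List[Handler]) -> None:
        self._instances = list(instances)
        self._next = 0

    def add_instance(self, instance: Handler) -> None:
        # Scale-out: a new box joins the pool while the others keep serving.
        self._instances.append(instance)

    def route(self, request: str) -> str:
        instance = self._instances[self._next % len(self._instances)]
        self._next += 1
        return instance(request)

lb = RoundRobinBalancer([make_instance("web-1"), make_instance("web-2")])
first = [lb.route(f"req-{i}") for i in range(2)]
lb.add_instance(make_instance("web-3"))  # no restart, no downtime
later = [lb.route(f"req-{i}") for i in range(2, 5)]
```

Because every instance is interchangeable, adding `web-3` needs no coordination with the existing two; that property only holds if the instances are stateless.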

  • Typical vertical ceiling: ~64 vCPU
  • Horizontal ceiling: 10 000s of servers
  • Price premium, top vs bottom of instance line: 3–5×
  • Vertical resize downtime: ~60 s
  • Horizontal add-box downtime: 0 s

04

Tradeoffs

Vertical (scale up)

Bigger box, same code

Zero app changes. No distributed-systems tax. Fine for legacy monoliths that can't be refactored and for workloads where parallelism is hard (e.g., single-writer databases). Has a ceiling (roughly 96 vCPU and about 1 TB RAM in mainstream instance families as of 2025; specialized instances go higher, at a steep price). The box is a single point of failure. Price scales super-linearly at the top of the instance family.

Horizontal (scale out)

More boxes, orchestrated

No practical ceiling. Better availability. Each extra instance adds capacity roughly linearly. Losing one of N servers costs 1/N of your capacity, so users don't notice as long as the remaining servers have headroom. Requires statelessness, a load balancer, and harder debugging (distributed tracing, race conditions, cache coherency). Needs infra investment up front.
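The "1/N of your capacity" claim is easy to check with arithmetic. The sketch below assumes load is spread evenly and that survivors absorb the failed server's share; the function name is hypothetical.

```python
def utilization_after_failures(n_servers: int, utilization: float, failed: int = 1) -> float:
    """Per-server utilization once `failed` servers drop out and the
    survivors absorb their share of an evenly spread load."""
    if not 0 <= failed < n_servers:
        raise ValueError("need at least one surviving server")
    total_load = n_servers * utilization  # measured in "servers' worth" of work
    return total_load / (n_servers - failed)

# 10 servers at 60% busy: losing one leaves the other nine at about 67%.
comfortable = utilization_after_failures(10, 0.60)

# 3 servers at 80% busy: losing one pushes the other two to 120%, i.e. overload.
overloaded = utilization_after_failures(3, 0.80)
```

So the availability benefit depends on N and on headroom: a small fleet running hot can still fall over when one box fails.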

Rule of thumb

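Scale vertically first to buy time (it takes a day), then refactor to scale horizontally once sustained load passes roughly 1 vCPU-hour per minute, i.e. about 60 busy vCPUs, which is close to the typical single-box ceiling. Stateful systems (such as the primary DB) stay vertical longer than stateless services.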

05

Deep dive — what actually scales horizontally

Horizontal scaling is trivial for stateless services, excruciating for stateful ones. "Stateless" means every request can be answered by any instance with no knowledge of prior requests from that client. Any state (sessions, counters, cached results) lives outside the service — in Redis, the DB, or a JWT the client carries.
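A minimal sketch of what "state lives outside the service" looks like. `SharedStore` is a hypothetical in-memory stand-in for Redis or a database, and `CartService` is an invented example service; the point is only that two instances give consistent answers because neither keeps the session itself.

```python
from typing import Dict

class SharedStore:
    """Stand-in for an external store such as Redis. Here it is just a dict
    that every service instance holds a reference to."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str, default: int = 0) -> int:
        return self._data.get(key, default)

    def set(self, key: str, value: int) -> None:
        self._data[key] = value

class CartService:
    """One service instance. It keeps no per-client state of its own."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self._store = store

    def add_item(self, session_id: str) -> int:
        # Read-modify-write against the shared store. A real store would
        # need an atomic increment to be safe under concurrent requests.
        count = self._store.get(session_id) + 1
        self._store.set(session_id, count)
        return count

store = SharedStore()
web_1 = CartService("web-1", store)
web_2 = CartService("web-2", store)
web_1.add_item("sess-42")
total = web_2.add_item("sess-42")  # a different instance sees the same session
```

If `CartService` kept the count in an instance attribute instead, the second request would start from zero whenever the load balancer picked the other box.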

Stateful services that don't scale horizontally for free: relational databases (a single primary takes all writes; read replicas help reads, but scaling writes needs sharding), in-memory session stores (move to Redis), file uploads sitting on local disk (move to object storage), WebSocket connections (sticky sessions + a pub/sub fabric).

The stateless checklist — if any answer is "yes," you have work before horizontal scaling works:

  • Does the app read or write files on local disk? → move to S3 / EFS.
  • Does it store sessions in process memory? → move to Redis.
  • Does it cache computed results per-process? → fine if it is a soft cache that can be rebuilt and may be slightly stale; otherwise move to a shared cache.
  • Does it hold open WebSocket / SSE connections? → introduce a pub/sub for cross-server message delivery.
  • Does it assume singleton jobs (e.g., a cron)? → introduce distributed locking so only one instance runs the job.
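
The last checklist item can be sketched with a set-if-absent lock that expires on its own. `FakeLockStore` is a hypothetical in-memory stand-in whose `set(key, value, nx=True, ex=ttl)` loosely mirrors the shape of redis-py's `set`; `run_if_leader` is likewise an invented name. Treat this as an illustration of the idea, not a production-grade lock.

```python
import time
import uuid
from typing import Callable, Dict, Optional, Tuple

class FakeLockStore:
    """In-memory stand-in for a shared store. Only set-if-absent with an
    expiry is modelled."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, nx: bool = False, ex: Optional[int] = None):
        now = self._clock()
        current = self._data.get(key)
        if nx and current is not None and current[1] > now:
            return None  # someone else holds an unexpired lock
        expires = now + ex if ex is not None else float("inf")
        self._data[key] = (value, expires)
        return True

def run_if_leader(store, job_name: str, job: Callable[[], None], ttl_s: int = 60) -> bool:
    """Run `job` only if this instance wins the lock for `job_name`."""
    token = str(uuid.uuid4())  # identifies this instance's claim
    if not store.set(f"lock:{job_name}", token, nx=True, ex=ttl_s):
        return False  # another instance already claimed this run
    job()
    return True

store = FakeLockStore()
runs = []
# Three instances all wake up for the same cron tick; only one should run.
results = [
    run_if_leader(store, "nightly-report", lambda: runs.append("ran"))
    for _ in range(3)
]
```

The lock is never released explicitly here; it simply expires after `ttl_s`, so the TTL should be longer than the clock skew between instances and shorter than the cron interval.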

06

Real-world

Instagram early days

Scaled vertically to a few massive Postgres boxes

They bought the biggest instances available, then eventually sharded Postgres horizontally when vertical scaling hit the wall. A typical path.

AWS ECS/EKS

Pure horizontal

Services are declared with replica counts. An autoscaler adds or removes tasks based on CPU or requests per second. There is no "resize" concept for the service layer.
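For a feel of how such an autoscaler picks a replica count, the sketch below uses the ratio rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × metric / target)). Whether a given ECS policy behaves exactly this way is an assumption, and real autoscalers also apply tolerances, cooldowns, and min/max bounds that are omitted here.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Replica count that would bring the per-replica metric back to target,
    assuming load divides evenly across replicas."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 replicas averaging 90% CPU against a 60% target: scale out.
scale_out = desired_replicas(4, 90, 60)

# 6 replicas averaging 20% CPU against a 60% target: scale in.
scale_in = desired_replicas(6, 20, 60)
```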

RDS / Aurora

Vertical for writes, horizontal for reads

One primary, N read replicas. When the primary saturates you resize it (vertical). When reads saturate you add replicas (horizontal).
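
A sketch of the routing this implies on the application side, assuming a hypothetical `ReplicatedRouter` that sends writes to the primary and rotates reads across replicas. The endpoint names are placeholders, and the `SELECT` prefix check is a deliberate simplification.

```python
import itertools
from typing import List

class ReplicatedRouter:
    """Choose a database endpoint per statement: primary for writes,
    round-robin over replicas for reads."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when there are no replicas, go to the primary.
        return self.primary

router = ReplicatedRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.endpoint_for("INSERT INTO posts VALUES (1)"),
    router.endpoint_for("SELECT * FROM posts"),
    router.endpoint_for("SELECT * FROM users"),
    router.endpoint_for("SELECT 1"),
]
```

Replicas usually lag the primary slightly, so a read that must see a write just made is safer sent to the primary.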

DynamoDB / Cassandra

Horizontal by default

Data is partitioned across N nodes from day one. Adding nodes scales both reads and writes roughly linearly, provided keys spread evenly across partitions. No "primary" to resize.
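
The core idea of partitioning by key can be sketched with a hash and a modulo. This is a simplification and an assumption about mechanism, not how either system is implemented: plain modulo placement moves most keys when the node count changes, which is why real systems tend to use consistent hashing or similar schemes. `node_for` and the node names are hypothetical.

```python
import hashlib
from typing import Dict, List

def node_for(key: str, nodes: List[str]) -> str:
    """Map a partition key to one node using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

nodes = ["node-a", "node-b", "node-c"]

# Count how 1,000 user keys spread across the three nodes.
counts: Dict[str, int] = {n: 0 for n in nodes}
for i in range(1000):
    counts[node_for(f"user:{i}", nodes)] += 1
```

Every node owns a slice of the keys and takes both reads and writes for that slice, so there is no single box that all writes funnel through.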

07

Used in problems

Every problem in this portfolio eventually chooses horizontal scaling for its stateless API tier. The interesting decisions are in the data layer — see the sidebar for specific problems.
