The vocabulary of system design.
Deep, interview-grade references for the concepts that show up in every system design discussion. Each concept explains the intuition first, then the mechanics, then the tradeoffs, then the deep dive. Cross-linked to the problem pages that use them.
Foundations
9 / 9 shipped
Interview Framework
You get 45 minutes. The interviewer says "design Twitter." There are a thousand things to say and you'll remember half of them. Without structure, you'll ramble about load balancers for 20 minutes, never estimate scale,
Back-of-Envelope Estimation
Every design decision hinges on numbers. Do we need sharding? Only if writes exceed ~10k/sec. Do we need a cache? Only if read latency is a bottleneck. Do we need a CDN? Only if users are geographically far from origin.
Latency Numbers
Every architectural decision — "do we need a cache?", "should this call be async?", "can we fit this in RAM?" — collapses to the same question: how long does the operation actually take? Engineers who can't estimate this
Availability — The Nines
"Four nines" (99.99%) sounds marginally better than "three nines" (99.9%). The truth: four nines allows 52 minutes of downtime per year; three nines allows 8.8 hours. That's a 10× difference in how much engineering you p
SLOs, SLIs, SLAs
"Our service is reliable" is not a claim; it's marketing. "Our service is reliable" with numbers — "99.9% of requests complete under 200ms, measured over a 30-day window" — is an engineering target. Without SLOs/SLIs/SLA
Time Synchronization & Clocks
Two machines, separated by a network, disagree on the order of events. "Event A at 12:00:00.500" on server 1. "Event B at 12:00:00.400" on server 2. Did B come before A? Only if the clocks agree, and they never perfectly
Concurrency Models
"Handle 50k concurrent connections" is an easy requirement to write and a hard one to meet. Every language offers a concurrency model — threads, event loops, actors, coroutines, or some combo. Each has different costs: t
Shared-Nothing Architecture
The single most important architectural principle for scalable systems: no resource is shared across nodes. No shared memory, no shared disk, no shared lock manager, no shared cache. Each node owns its slice of data and
Multi-Tenancy
You're building B2B SaaS. 10,000 customer companies, each with their own users + data + settings. Should each customer get their own database, their own EC2 instances, their own Kubernetes namespace? Or do they all share
Networking & Delivery
13 / 13 shipped
DNS
Every request to your API starts with "what IP is api.example.com?" That's DNS. Get it wrong and users can't reach you. Configure TTLs wrong and users are stuck on a dead IP for hours after you change it. Pick the wrong
CDN
Your origin server is in Virginia. Your user is in Singapore. A round-trip over the Pacific is ~180ms, minimum, without TLS handshake, TCP slow-start, or any actual work. Serve a page with 40 assets (images, scripts, fon
TCP vs UDP
"TCP is reliable, UDP is fast" is the bumper sticker. True, but the interesting question is why you'd ever choose UDP. For decades the answer was niche (DNS, streaming). Now — with QUIC, WebRTC, and modern games — UDP is
HTTP/1 vs HTTP/2 vs HTTP/3
Picking HTTP/2 over HTTP/1 gave an instant 2× speedup on a typical page load with no app code changes. HTTP/3 fixes the one failure mode HTTP/2 still had. In interviews: naming the specific wins of each version shows you
TLS & HTTPS
Every byte between a client and your server passes through routers you don't control. Without encryption, anyone on the path reads passwords, session tokens, the works. TLS encrypts it all — but also adds latency (the ha
Proxy vs Reverse Proxy
Both types of proxy stand between a client and a server. Which one faces which way flips the use case entirely. Candidates mix them up all the time. Interviewers notice.
WebSockets vs SSE vs Polling
Chat. Live scores. Notifications. Collaborative editing. Any feature where the server pushes data to the client the moment it's ready. HTTP was designed for the opposite: client asks, server answers. To make push happen,
REST vs GraphQL vs gRPC
Picking an API protocol shapes your client experience, your throughput, your tooling, and your caching story for years. "We'll use REST" is the lazy default, often the wrong one. gRPC is 7× faster but kills browser usabi
Service Mesh
You have 50 microservices in production. Every service-to-service call needs: TLS encryption, mTLS auth, retries, timeouts, circuit breaking, observability, traffic routing. Implementing all of that in every service, in
Webhooks
Stripe processes a payment. Your app needs to know when it succeeds. Two options:
API Versioning
You ship v1 of your API. Customers integrate. Six months later, you need to rename a field, change a response shape, drop a deprecated endpoint. Every existing customer integration will break. They didn't sign up for a m
Edge Computing
Your origin server is in Virginia. A user in Singapore: 180ms round-trip just to reach you, before any work. CDNs solved this for static content. Edge computing extends the same idea to code — run your business logic at
Compression & Encoding
Sending 100 KB to a user costs 100ms over an 8 Mbps link. Compress it to 20 KB → 20ms. Compression is the cheapest performance win in your stack: zero infrastructure changes, ~80% of the bytes saved, milliseconds of CPU.
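The arithmetic behind those figures, as a quick sketch. It assumes decimal units (1 KB = 1000 bytes, 1 Mbps = 1,000,000 bits/s) and ignores round-trip latency, TCP slow-start, and protocol overhead; transfer_ms is a hypothetical helper name:

```python
def transfer_ms(size_kb: float, link_mbps: float) -> float:
    """Milliseconds needed to push size_kb kilobytes through a link_mbps link.

    Assumes decimal units: 1 KB = 1000 bytes, 1 Mbps = 1_000_000 bits/s.
    """
    bits = size_kb * 1000 * 8
    seconds = bits / (link_mbps * 1_000_000)
    return seconds * 1000


for size_kb in (100, 20):
    print(f"{size_kb} KB over 8 Mbps: {transfer_ms(size_kb, 8):.0f} ms")
```

On slower links the same 80 KB saving is worth proportionally more time, so the win grows as the user's connection gets worse.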
Scaling
5 / 5 shipped
Horizontal vs Vertical Scaling
You launch with 1000 users on one server. Six months later you have 1M users. The server's CPU sits at 99%. You have exactly two knobs: make that one server bigger (vertical / "scale up") or add more servers alongside it
Load Balancer
A single server can serve maybe a few thousand requests per second before CPU, memory, or connection limits pin it. The moment your traffic crosses that ceiling, you have two options: buy a much bigger server (vertical s
Stateless Services
You want to scale horizontally. Your load balancer spreads traffic across N servers. User Alice's session is in memory on server 3. The LB routes Alice's next request to server 7. Server 7 doesn't know Alice — login lost
Autoscaling
Traffic is spiky. Your e-commerce site does 1000 RPS on a Tuesday afternoon and 15,000 RPS during Black Friday. Static provisioning is wasteful (pay for peak capacity 24/7) or risky (provision for average, die during spi
API Gateway
You have 40 microservices behind the scenes. A mobile client wanting to render a home screen shouldn't call 20 different hostnames, each with different auth, rate limits, and retry policies. An API gateway is the single
Databases
15 / 15 shipped
SQL vs NoSQL
Picking the wrong database is the most expensive architectural mistake you can make early. You can change a cache vendor in a weekend; you can't change your database in a quarter. Interviewers want to see a justified cho
ACID vs BASE
"Strong consistency" and "eventual consistency" are the two ends of a spectrum that drives every data-layer decision. Pick strong when you must (payments, inventory, anything involving money or exclusivity). Pick eventua
CAP Theorem
Every distributed database picks one of two behaviors when the network between nodes breaks: return stale data or refuse to serve the request. CAP is the formal statement that you cannot avoid this choice. Pretending oth
PACELC Theorem
CAP only describes what happens during a partition — but partitions are rare. Most of your system's life is spent in the no-partition state, where CAP has nothing to say. The real everyday tradeoff there is between laten
Replication
One database server holds your data. The disk dies. You lose everything. Replication — keeping copies on multiple servers — solves three problems at once: durability (survive hardware failures), availability (serve reads
Sharding
One server, even a massive one, tops out around 100k writes/sec. Replication buys you read scale but not write scale; the leader is still a bottleneck. Once writes exceed one box, you shard: split the data across N box
Indexing
A database table with 100M rows. A query: WHERE email = 'x@y.com'. Without an index, the DB scans all 100M rows — seconds per query. With a B-tree index, it's ~10 lookups, 0.5ms total. That's a 10,000× difference and the
Normalization vs Denormalization
Your data model has two extremes. Fully normalized: every fact lives in exactly one place, connected by foreign keys. Great for writes (update once, seen everywhere), bad for reads (join 5 tables to render one page). Ful
Database Types
"We'll use a database" is not an architecture decision — it's punting. Every system design interview eventually asks: Postgres or Cassandra? Redis or DynamoDB? Mongo or Elasticsearch? The answer is never "it depends" alo
Distributed Transactions
User clicks "Buy." You must: charge card (payments service) + decrement inventory (warehouse service) + create order (orders DB). All three or none — partial is catastrophic (card charged, no order). In one database, ACI
Write-Ahead Log (WAL)
You commit a transaction. Two milliseconds later, the power fails. On reboot, is the commit there? If yes, your DB is durable. If no, it's a lie. The trick every durable datastore uses — Postgres, MySQL, Cassandra, SQLit
Connection Pooling
Opening a TCP connection takes 1 round-trip. Adding TLS adds another. Adding Postgres handshake adds 2-3 more. Each connection: ~5-10ms of pure overhead before any query runs. If your app opens a new connection per reque
Database Federation
One huge database holds everything: users, orders, products, reviews, notifications, sessions. As the company grows, this DB becomes the bottleneck — every team's slow query affects everyone, schema changes touch everyon
Change Data Capture (CDC)
Your transactional DB has the source-of-truth for orders. You also need: a search index in Elasticsearch, a denormalized read store in Redis, an analytics warehouse, a notifications pipeline triggered on order events. Fi
Data Lake vs Warehouse
"Where do analytics queries run?" If you say "Postgres," you've never had real analytics needs. Running aggregations over 10 TB of order history on your transactional DB tanks every API. Analytics belongs in a separate s
Caching
4 / 4 shipped
Cache Strategies
A database lookup costs 1–10ms. A cache lookup costs 0.1–1ms. Your feed page makes 40 lookups. Without a cache you're over 100ms just on DB; with one you're under 5ms. That's the difference between a snappy app and a slo
Cache Eviction
Your cache has 16 GB of RAM. Your working set grows to 20 GB. Something must go. Which entries you evict — and which you keep — is the difference between a 95% hit rate and a 50% hit rate. At scale, that gap is millions
CDN vs Application Cache
Both are caches. Both reduce load on your origin. They live at different layers of the stack, cache different things, and solve different problems. A candidate who says "we'll use a cache" and can't distinguish them is p
Cache Stampede
Your homepage is served from Redis. TTL 5 minutes. At T = 0, the cache expires. At T = 0.001s, 10,000 concurrent requests all miss the cache and all query the origin database at the same time. The DB, designed for 100 RP
Messaging & Async
5 / 5 shipped
Message Queue vs Pub/Sub
Two teams both say "we use a queue." One means point-to-point work distribution (one message, one consumer takes it, runs a job). The other means broadcast (one event, N subscribers each process it). These have different
Delivery Guarantees
You send a message. The network is flaky. Did it arrive? If you retry, does the consumer see it twice? If you don't retry, did it get lost?
Event Sourcing & CQRS
A traditional DB stores the current state: user 42 has balance $150. How they got there is gone. Event sourcing inverts this: you store the stream of changes (deposits, withdrawals, transfers), and "current balance" is a
Kafka Internals
Kafka is the backbone of event streaming at every serious company. 500k+ messages/sec per broker, years of retention, replayable. In interviews, "we'll use Kafka" isn't an answer — how Kafka works is. Partitions, consume
Batch vs Stream Processing
"Compute total revenue per day per product." Two answers exist:
Distributed Systems
13 / 13 shipped
Consensus — Paxos & Raft
Five nodes each think they might be the leader. The old leader's network cable was unplugged; now it's plugged back in. Who's in charge? Who accepts writes? If two nodes both think they're leader, they both accept writes
Distributed Locking
Two instances of your service want to send a reminder email for order #42. Both run the cron job at the same second. Without coordination, the user gets two emails. Or two payment workers process the same refund. Or two
Leader Election
Distributed systems are full of "only one at a time" jobs: one cron runner, one database primary, one billing reconciler, one job scheduler. Without coordination, every replica does the same work — duplicate emails, dupl
Consistent Hashing
You have 10 cache servers. You route keys to them with hash(key) mod 10. It works beautifully. Then you add an 11th server. Now ~90% of your keys hash to a different server — your entire cache invalidates. Origin gets ha
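To see the remapping problem concretely, here is a small sketch that counts how many keys change servers when hash(key) mod N goes from 10 to 11 servers. The md5-based hash and the user:<i> key names are assumptions chosen for illustration. This demonstrates the failure of mod-N routing, not consistent hashing itself:

```python
import hashlib


def bucket(key: str, n_servers: int) -> int:
    """Route a key with hash(key) mod N, using a stable md5-based hash."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose server index changes when N goes old_n -> new_n."""
    moved = sum(1 for k in keys if bucket(k, old_n) != bucket(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(100_000)]  # hypothetical key names
print(f"{moved_fraction(keys, 10, 11):.1%} of keys changed servers")
```

A key stays put only when its hash gives the same remainder mod 10 and mod 11, which happens for about 1 key in 11. The rest move, matching the ~90% figure.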
Vector Clocks & LWW
Two users edit the same document at (roughly) the same moment. Their clients both commit locally. Now the server has two versions. Which one wins?
Gossip Protocols
1000 nodes in a Cassandra cluster. Each needs to know about the others — who's alive, what ranges they own, their load. Centralized approach (one server polls all 1000) creates a SPOF and a bottleneck. Broadcasting (each
Quorum
In a replicated system, you have N copies of the data. Do you write to all N before returning? Just 1? Somewhere in between? Do you read from all N (slow), or 1 (possibly stale)? The answer is a quorum — a minimum number
Heartbeat & Failure Detection
Is node 7 dead, or is its network link just slow? You can't tell from a missed response. Decide too quickly → false-positive, you kick a healthy node. Decide too slowly → real failures go unnoticed, requests time out for
Service Discovery
Service A calls Service B. Where is B? Hard-coding an IP breaks the moment B autoscales, moves to a new host, or deploys to a new region. Hard-coding a hostname helps, but still routes through DNS caches that are slow to
Two Generals & Byzantine Problems
The two foundational impossibility results in distributed systems. Knowing them isn't trivia — they tell you exactly what's possible to build and what's not. Every distributed protocol you'll ever design is constrained b
Read Repair & Anti-Entropy
You replicate data across 3 nodes for durability. A network blip causes node 2 to miss a write. Now nodes 1 and 3 have v2; node 2 has v1. Replicas have diverged. Without active repair, this divergence accumulates: the long
Tunable Consistency per Query
"What consistency does our database give us?" is the wrong question. Different queries need different guarantees. A user's password change must be strongly consistent (next login should see the new hash). A like-count di
Clock-Skew Tolerance Design
You read Time Synchronization & Clocks and learned wall clocks drift, NTP misbehaves, leap seconds break things. Now what? You can't avoid using time entirely: TTLs, timeouts, scheduling, ordering, JWT expiry all need it. Clock-skew
Reliability
11 / 11 shipped
Circuit Breaker
Service B is slow. Service A calls it on every request, each call hanging for 30 seconds before timing out. A's threads fill up waiting for B. A's latency spikes. A's LB marks A unhealthy. Now A is down because B is slow
Retries, Backoff & Jitter
Network calls fail. Retry solves 90% of transient failures for free. But naive retries cause retry storms — every client retrying the failing service simultaneously, which is exactly what the failing service can least ha
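One common way to keep retries from synchronizing is exponential backoff with "full jitter": each wait is drawn uniformly between zero and an exponentially growing cap. A minimal sketch under assumed defaults (0.1 s base, 5 s cap, 5 attempts); backoff_delay and call_with_retries are hypothetical helper names, and a real client would retry only errors it knows to be transient:

```python
import random
import time


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(fn, max_attempts: int = 5, base: float = 0.1,
                      cap: float = 5.0, sleep=time.sleep):
    """Call fn(); on failure wait a jittered delay, then retry.

    Makes at most max_attempts calls (max_attempts >= 1) and re-raises
    the last error if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # assumption: every exception is treated as transient
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, base, cap))
```

The randomness is the point: clients that failed at the same instant pick different wait times, so their retries arrive spread out instead of as a second spike.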
Bulkhead Isolation
Your service has 200 threads. It calls services A, B, C. Service C hangs. Requests to C accumulate; all 200 threads end up waiting on C. Requests to A and B can't get served because no thread is free — even though A and
Rate Limiting Algorithms
"Limit users to 100 requests/minute" sounds simple. It is not. Which minute? The last 60 seconds? The current calendar minute? Count as they come, or enforce an even drip? Each answer is a different algorithm with differ
Idempotency
A user clicks "Pay." The request hits your server, charges their card, but the response packet is lost in the network. The user's app doesn't see success, retries. Now you've charged them twice. Classic distributed-syste
Graceful Degradation
Your recommendation service is down. Does your homepage return a 500, or does it just skip the "Recommended for you" section and still render? The first is a hard failure; the second is graceful degradation. Same failure
Backpressure & Flow Control
A producer emits 100k events/sec. A consumer can process 10k/sec. With no coordination, the consumer's queue grows unboundedly — memory bloats, latency cliffs, eventually the process OOMs. Backpressure is the mechanism b
Feature Flags & Rollouts
"Deploy a new feature to all 100M users at once" is asking for a 3am incident. Feature flags let you separate code deploy from feature release: ship the code dark, then turn it on for 1% of users, watch metrics, ramp to
Chaos Engineering
You think your system handles a database failover. The runbook says it does. The diagram says it does. But you've never actually tested it. The first time it happens for real — at 3am, with no warning — you discover the
Request Hedging
Your service makes 5 backend calls per user request. Each call has a P99 of 50ms — sounds great. But the combined P99 is much worse: even one slow call ruins the request. With 5 calls, the chance of at least one being a
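A back-of-the-envelope sketch of that fan-out effect. It assumes the calls are independent and that "slow" means slower than the per-call P99, so each call is slow 1% of the time; p_any_slow is a hypothetical helper name:

```python
def p_any_slow(calls: int, per_call_slow: float = 0.01) -> float:
    """P(at least one of `calls` independent calls is slower than its P99)."""
    return 1 - (1 - per_call_slow) ** calls


for n in (1, 5, 100):
    print(f"{n} calls: {p_any_slow(n):.1%} of requests hit a slow call")
```

Under these assumptions, roughly 1 request in 20 sees a tail-latency call at a fan-out of 5, and the share keeps climbing as fan-out grows. Real backends are rarely fully independent, so treat the numbers as an estimate.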
Blue-Green & Canary Deployments
You ship 100 deploys a week. Each one risks breaking production. The naive "stop the old version, start the new version" gives you 30 seconds of total outage and a rollback that takes longer than the original deploy. Dep
Observability & Security
8 / 8 shipped
Observability Triad
3 AM. Your on-call phone rings. Something is broken. You have 15 minutes before users notice and executives notice after that. With good observability, you grep a log, check a metric, pull a trace, and see the problem in
Auth — OAuth & JWT
Every API needs to answer: who is this user, and what are they allowed to do? Roll your own auth and you'll have a CVE within months. OAuth 2.0 is how you delegate authentication to a trusted identity provider. JWTs are
DDoS Protection
A botnet of 100,000 compromised devices each sends 10 requests/sec at your site. 1M RPS at your origin. Your LB handles 50k. Site is down. No users can reach you until the attack stops or you have defenses.
Secret Management
Your app needs a Postgres password, an AWS access key, a Stripe API key, a Twilio token, an OAuth client secret. Where do they live?
Zero Trust Networking
The traditional security model: castle-and-moat. Strong perimeter (firewall + VPN); inside the perimeter, services trust each other implicitly. The flaw: one compromised laptop or one breached service inside the perimete
Tokenization & PCI Compliance
Your e-commerce app needs to store credit card numbers so customers don't re-enter them every time. The moment you do, your entire stack — every server that touches that data, every database that stores it, every backup
Field-Level Encryption
"Our database is encrypted." Sounds great. Look closer: encrypted at rest with a single key controlled by the cloud provider. A DBA, a compromised app, an SRE running SELECT * sees plaintext. Encryption-at-rest only prot
GDPR — Right to Be Forgotten
EU GDPR Article 17: a user can request all of their personal data be deleted. You have 30 days. "Delete a row" sounds simple. The problem: that user's data is in the live DB, in 7 read replicas, in 30 days of database ba
Data Structures
7 / 7 shipped
Bloom Filter
You have 1 billion URLs you've already crawled. Before crawling a new URL, you want to check "have I seen this before?" A hash set costs ~100 GB of RAM for 1B URLs. Most queries will be for URLs you've never seen. You ju
Geospatial Indexes
"Find all drivers within 2km of (37.77°N, 122.42°W)." A standard index on (lat, lng) doesn't help — you'd scan every row checking distance. Geospatial indexes let you answer proximity queries in milliseconds over million
Merkle Trees
Two Cassandra replicas hold 100M rows each. They're supposed to be identical. Something went wrong and one has a few stale entries. Comparing all 100M rows to find the differences would take hours and saturate the networ
HyperLogLog & Sketches
"How many unique visitors did we have today?" Naive: keep a Set<user_id>. For 100M users, that's 6+ GB of RAM. Do it per-URL or per-minute and you're out of memory fast. HyperLogLog answers the same question with 12 KB,
Memory-Mapped Files (mmap)
Reading a 100 GB file the normal way: read() in chunks, copy from kernel buffer to user buffer, process. Two copies per byte. Slow at scale. mmap maps the file's bytes directly into your process's virtual memory — access
Erasure Coding
Storing 1 PB of files with 3× replication = 3 PB of disk. Storage at scale is dominated by disk cost, and 3× is expensive — Facebook's photos, Netflix's video, Google's mail all add up to exabytes. Erasure coding achieve
URL Encoding & Base62
"Make me a short URL." Sounds trivial. The shortcode must be: unique, URL-safe, short (6-8 chars), and ideally non-guessable. Plus, you need to mint billions of them at thousands per second without coordination overhead.
Machine Learning Systems
7 / 7 shipped
Feature Store
"User's last 7-day click count" is a feature. Your training pipeline computes it from click logs in Spark; your serving pipeline computes it from a Kafka stream. The two implementations drift — one bug here, one rounding
Model Serving — Online vs Batch
You trained a recommendation model. Now what — does it predict in real time when the user opens the app, or do you precompute every user's recs nightly and read from a cache? Same model, completely different infrastructu
Vector Databases
The query is "movies like Inception." Postgres can't help — there's no SQL operator for "semantically similar." With embeddings (vectors of 384-1536 floats representing meaning), the answer is: nearest neighbors of Incep
Embedding Generation Pipelines
You have 100 million product descriptions, 50 million user-generated images, 1 billion documents. To use them with vector search, every one needs an embedding — a 384–1536-dim vector. Calling an embedding model API for e
Online vs Offline Training
Your recommendation model was trained on last month's data. New trends emerged. New users appeared. The model is already stale — predictions degrade by the hour. Do you re-train nightly (offline), continuously update wei
A/B Testing Platform
You ship a new homepage. Did it improve conversion or hurt it? Eyeballing the dashboard for a week proves nothing — traffic patterns shift hour to hour, day to day. A/B testing is the discipline of statistically comparin
LLM Serving Infrastructure
Serving GPT-class LLMs is unlike serving any model that came before. A single inference can take 30 seconds, generate thousands of tokens, require tens of GB of GPU memory, and cost $0.001-0.10 per request. Multiply b
Architecture Patterns
5 / 5 shipped
Microservices vs Monolith
"Should we go microservices?" is one of the most consequential — and most misunderstood — architectural decisions. Pick microservices too early and you'll spend years debugging distributed systems instead of building fea
Domain-Driven Design
"What does Order mean?" In your warehouse system, Order = a list of SKUs to pick. In your billing system, Order = an invoice with payment terms. In your shipping system, Order = a box with an address. Same word, three co
Event-Driven Architecture
Order placed. Now: charge card, decrement inventory, send confirmation email, notify warehouse, update analytics, trigger fraud check, alert support if VIP. Seven things must happen.
Hexagonal & Clean Architecture
You wrote your business logic in a controller class that calls Postgres directly via an ORM and HTTP-responds with JSON. Year 2: you need to support gRPC. Year 3: you need to use DynamoDB instead of Postgres. Year 4: the
Backend for Frontend (BFF)
One backend serves your iOS app, Android app, web SPA, and partner API. iOS wants compact JSON for slow networks. Web wants every field for the rich UI. Partners want a stable contract that never changes. You can't make
Operations
3 / 3 shipped
Disaster Recovery — RTO & RPO
Your entire AWS region goes down. Power failure, fiber cut, malicious actor, hurricane. How long until your service is back? How much data did you lose? These two questions have formal names: RTO (Recovery Time Objective
Multi-Region — Active-Active vs Active-Passive
One AWS region fails — your service is down for 30 minutes. Multi-region deployment fixes this, but how? You can run the second region passively (cold or warm standby that takes traffic only on disaster) or actively (bot
Zero-Downtime Database Migration
You need to rename a column. Drop a table. Migrate from MySQL to Postgres. Add a NOT NULL field to a 1-billion-row table. Each of these can be a 30-minute outage if you do it naively — or zero downtime if you follow the
Frontend & Mobile
4 / 4 shipped
Service Workers & PWA Offline
Native mobile apps work offline. Web apps don't — refresh the page on a flight, you get the dinosaur. Service workers close that gap. A service worker is a JavaScript file the browser runs in the background, intercepting
Mobile Offline-First Sync
The user opens your app on the subway. No signal. They write a comment, mark a task done, edit a doc. The app should accept their work, not pop "no internet" errors. When connectivity returns, everything they did syncs t
Image & Video Processing Pipeline
User uploads a 12 MP photo from their phone. Your app needs to display it as: thumbnail (200×200), card image (600×400), full view (1920×1080), maybe HEIC + WebP + JPEG variants for different browsers. Maybe with face cr
Push Notification Protocols
Sending a push notification to 10 million users sounds simple — until you confront that you're not actually delivering the notification. Apple is. Google is. The browser vendor is. Your server hands the message to APNs (