CDN

A global Content Delivery Network: serve cached copies of customer content from hundreds of edge points-of-presence (PoPs), delivering to users with sub-50 ms RTT from the nearest PoP. The hard parts: global routing so requests land at the closest healthy PoP (Anycast); a cache hierarchy that protects origin from stampedes at 10M+ req/sec; and cache invalidation that reliably purges stale content from ~300 locations in seconds, not hours. Cloudflare: 300+ PoPs, ~55M HTTP req/sec, serves ~20% of internet traffic. Akamai: 4000+ PoPs, an older, deeper-edge model.

⚡ Core: Anycast + Cache Hierarchy + Purge · 300+ PoPs · 55M req/sec · > 90% cache hit ratio · Sub-50 ms RTT
02

Requirements

Functional
  • Pull-through cache for customer HTTP content (images, CSS/JS, videos, API responses)
  • Customer-configurable cache rules (TTL, cache key, bypass conditions)
  • Fresh-on-write: purge cache entry globally within seconds of API call
  • TLS termination at the edge with customer certs (SNI)
  • Optional: WAF rules, bot management, DDoS mitigation, image resize
  • Real-time analytics: req/sec, bytes served, cache hit ratio per customer
  • Edge compute (Workers / Lambda@Edge) for request-path JS execution
Non-Functional
  • RTT < 50 ms to end-user from nearest PoP (p95)
  • Cache hit ratio > 90% for static assets
  • Purge propagation < 5 seconds globally
  • 99.99% availability — outages affect huge swaths of the internet
  • Survive DDoS up to tens of millions of req/sec per target
  • No single PoP's death should impact serving
03

Scale Estimation

  • PoPs: ~300. Cloudflare ~320; Akamai ~4,000 (different model: deep edge).
  • HTTP req/sec: ~55M. Cloudflare-reported peak; average is lower. Each PoP handles ~100k–500k rps.
  • Cache storage per PoP: ~100 TB. SSD tier. Not all content lives in every PoP; tiered cache spills to regional.
  • Cache hit ratio: > 90% for static content; < 50% for dynamic / API responses without tuning.
  • Customers: ~25M. Cloudflare-disclosed figure; each has its own certs, cache rules, and WAF config.
  • Purge propagation: ~3–5 s. Global invalidation across 300 PoPs via pub/sub, with one message per PoP.
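As a quick sanity check, the estimates above can be multiplied out. This is only back-of-envelope arithmetic on the figures already quoted; applying the 90% hit ratio to all traffic is a simplifying assumption, since the text only claims it for static assets.

```python
# Back-of-envelope check of the scale figures (inputs are the estimates quoted above).
POPS = 300
PEAK_RPS = 55_000_000        # global peak HTTP requests per second
CACHE_TB_PER_POP = 100       # SSD-tier cache per PoP, in TB
EDGE_HIT_RATIO = 0.90        # assumption: applied to all traffic for simplicity

avg_rps_per_pop = PEAK_RPS / POPS
total_cache_pb = POPS * CACHE_TB_PER_POP / 1000      # 1 PB = 1000 TB
edge_miss_rps = PEAK_RPS * (1 - EDGE_HIT_RATIO)      # traffic leaving the edge tier

print(f"average load per PoP at peak: {avg_rps_per_pop:,.0f} rps")
print(f"aggregate edge cache: {total_cache_pb:.0f} PB")
print(f"requests missing at the edge at peak: {edge_miss_rps:,.0f} rps")
```

The per-PoP average lands inside the ~100k–500k rps range quoted above, and even a 90% edge hit ratio still leaves millions of requests per second for the tiers behind the edge.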
04

API Design

Two API surfaces: the data plane — HTTP(S) requests from end-users hitting edge PoPs — and the control plane — customer-facing API for configuring rules, purging cache, reading analytics.

GET https://customer-site.com/path

End-user HTTP request. Anycast routes to nearest PoP. Cache check → serve or fetch from origin. TLS term at edge using customer cert.

POST /client/v4/zones/{zone_id}/purge_cache

Purge specific URLs, tags, or the entire zone. Body: {files: [URLs], tags: [cache_tag_1, ...], purge_everything: true}. Propagated to all PoPs within seconds.
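A minimal sketch of a client building that purge call, using only the path and body fields described above. The host name, the bearer-token header, and the helper name `build_purge_request` are assumptions for illustration; the request is constructed but never sent.

```python
import json
import urllib.request

API_BASE = "https://api.example-cdn.test"   # hypothetical host, not a real endpoint

def build_purge_request(zone_id, token, files=None, tags=None, purge_everything=False):
    """Build (but do not send) a purge request for one zone."""
    if purge_everything:
        body = {"purge_everything": True}
    else:
        body = {}
        if files:
            body["files"] = list(files)
        if tags:
            body["tags"] = list(tags)
        if not body:
            raise ValueError("nothing to purge: pass files, tags, or purge_everything")
    return urllib.request.Request(
        url=f"{API_BASE}/client/v4/zones/{zone_id}/purge_cache",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        # Auth scheme is an assumption; the text does not specify one.
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )

req = build_purge_request("zone123", "TOKEN", files=["https://customer-site.com/path"])
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Refusing an empty body is a deliberate choice in this sketch: an accidental no-argument call should fail loudly rather than be interpreted as a zone-wide purge.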

POST /client/v4/zones/{zone_id}/rules

Configure cache rules, page rules, WAF rules, workers. Changes propagate to edge via config sync (seconds).

GET /client/v4/zones/{zone_id}/analytics?start=X&end=Y

Returns time-series metrics per zone: requests, bytes, hit ratio, status codes, countries, threats blocked.

POST /client/v4/zones/{zone_id}/workers

Deploy edge compute (JavaScript/WASM). Uploads script bundle to all edge PoPs; runs in an isolate on every request to the configured routes.

05

Architecture

Three tiers: edge PoPs (close to users, serve 90%+ of requests), regional tier (fewer, larger, act as a mid-cache + origin shield), and origin (customer's server). Requests cascade upward only on cache miss. A control plane in a few central regions handles config + purge + analytics rollup.

Global CDN Topology (diagram)
  • User / browser: HTTP(S) request; DNS resolver lookup returns the Anycast address.
  • Edge tier (hundreds of PoPs, e.g. NYC, London, Singapore): LB + cache + TLS. Each PoP runs nginx/envoy proxy → cache (NVMe) → WAF → workers → origin-fetch. TLS certs are served from an SNI-indexed local store. Request logs and metrics ship to a regional aggregator in real time.
  • Regional tier (mid-cache, e.g. Regional NA and Regional EU, ~10 TB SSD each): origin-shield role; absorbs PoP miss traffic and reduces origin load ~10×.
  • Control plane (central): config service (per-customer rules), purge service (global pub/sub), analytics (logs → DB), worker deploy (ships isolates).
  • Origin (customer): app servers.
  • Edge / regional log stream → Kafka → ClickHouse / BigQuery for per-customer analytics.
Request Flow — Step Through
User (HTTP request) → Anycast BGP (nearest PoP wins) → Edge PoP (TLS + cache) → Cache decision (HIT or MISS) → Regional tier (mid-cache) → Origin (customer server) → Response (populate cache)
06

Deep Dive — Anycast + Cache Hierarchy + Purge

Anycast routing. All CDN PoPs announce the same IP prefix (typically a /24 per service) via BGP. Internet routers forward each packet toward the announcement that BGP path selection prefers, which is usually the topologically closest one. A user in London gets routed to the London PoP; a user in Tokyo to the Tokyo PoP, with no DNS games. If a PoP dies, its BGP announcement is withdrawn and traffic reroutes to the next-closest PoP.

This is dramatically simpler than DNS-based geographic routing (which is the Akamai-era approach): no TTL games, no resolver location guesses, no DNS caching issues. Cloudflare popularized Anycast-for-CDN; now industry standard.
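The failover behaviour can be illustrated with a toy model. This is not how BGP is implemented: real path selection weighs local preference, AS-path length, MED, and operator policy. The sketch assumes path length is the only criterion, and the `Announcement` type and PoP codes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Announcement:
    pop: str
    as_path_len: int       # path length as seen from one particular user's network
    healthy: bool = True   # an unhealthy PoP withdraws its route

def select_pop(announcements):
    """Pick the PoP traffic would flow to: shortest path among live announcements."""
    live = [a for a in announcements if a.healthy]
    if not live:
        raise RuntimeError("no PoP is announcing the prefix")
    return min(live, key=lambda a: a.as_path_len).pop

# Routes for the same prefix as seen from a hypothetical London ISP.
routes = [Announcement("LHR", 1), Announcement("AMS", 2), Announcement("NRT", 5)]
print(select_pop(routes))    # LHR
routes[0].healthy = False    # London PoP fails and withdraws its announcement
print(select_pop(routes))    # AMS
```

Nothing on the client side changes between the two calls: the user keeps the same destination IP, and only the routing table entry behind it moves.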

Cache hierarchy with tiered cache. Problem: 300 PoPs × 100 TB each is 30 PB of storage — but cumulative customer content is far more than that. Each PoP can only hold a subset. Result: cold content causes repeated fetches from origin, hammering customer servers.

Solution: a regional tier between edge and origin. A PoP miss queries its regional tier. The regional tier has more storage, sees more traffic, and has a better hit ratio. Only regional-tier misses hit origin. Net: origin load drops roughly 10× compared to flat-edge.

Request Flow — Cache Hit / Miss Cascade Mermaid
sequenceDiagram
    participant U as User
    participant E as Edge PoP
    participant R as Regional
    participant O as Origin
    U->>E: GET /foo.jpg
    E->>E: cache lookup
    alt HIT (90% of the time)
        E-->>U: serve from NVMe (p50 ~5 ms)
    else MISS at edge
        E->>R: fetch /foo.jpg
        alt HIT at regional
            R-->>E: bytes + cache-control
            E-->>U: serve + populate edge cache
        else MISS at regional
            R->>O: fetch /foo.jpg
            O-->>R: bytes
            R-->>E: bytes
            E-->>U: serve + populate both caches
        end
    end
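The hit/miss cascade can be sketched in a few lines. This is a simplified model under stated assumptions: no TTLs, no eviction, no concurrency, and the class names `CacheTier` and `Origin` are invented for illustration.

```python
class Origin:
    """Stand-in for the customer's server; counts how often it is hit."""
    def __init__(self):
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return f"bytes-of-{key}"

class CacheTier:
    """A cache that falls through to `upstream` (another tier or the origin) on miss."""
    def __init__(self, name, upstream):
        self.name, self.upstream, self.store = name, upstream, {}
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.upstream.get(key)   # cascade upward only on a miss
        self.store[key] = value          # populate this tier on the way back
        return value

origin = Origin()
regional = CacheTier("regional-eu", origin)
edges = [CacheTier(name, regional) for name in ("lhr", "cdg", "fra")]

for edge in edges:            # three PoPs each request the same cold object twice
    edge.get("/foo.jpg")
    edge.get("/foo.jpg")

print("origin fetches:", origin.fetches)                          # 1
print("regional hits/misses:", regional.hits, regional.misses)    # 2 1
```

With a flat edge the same workload would cost the origin three fetches, one per PoP; the regional tier collapses them into one.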

Cache key. Default is (host, URL-path, query-string). Customer can override — strip tracking params, normalize case, include Vary headers (Accept-Encoding, device type). Mis-configured cache keys are a top source of inexplicable "why isn't my site caching?" support tickets.
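A minimal sketch of cache-key normalization along those lines. The list of tracking parameters and the way `Accept-Encoding` is bucketed into three variants are assumptions chosen for illustration, not any CDN's actual defaults.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Assumed set of parameters that never change the response body.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

def cache_key(url, accept_encoding=""):
    """Return a normalized (host, path, query, encoding-variant) cache key."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Drop tracking params and sort the rest so parameter order does not split the cache.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS
    )
    # Bucket Accept-Encoding into a few variants instead of keying on the raw header.
    codings = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    variant = "br" if "br" in codings else "gzip" if "gzip" in codings else "identity"
    return (host, parts.path or "/", urlencode(query), variant)

a = cache_key("https://Customer-Site.com/img/a.png?w=200&utm_source=mail", "gzip, br")
b = cache_key("https://customer-site.com/img/a.png?utm_campaign=x&w=200", "br")
print(a)
print(a == b)
```

Both URLs map to one cache entry, whereas the default (host, path, raw query string) key would have stored them separately and halved the hit ratio for that asset.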

Purge / invalidation. Two flavors: URL-based (specific URLs) and tag-based (arbitrary label applied to responses at cache-time, then "purge all cached items with tag X"). Purge flow:

  1. Customer API call hits control plane.
  2. Control plane writes purge message to a global pub/sub bus (Kafka / internal multicast).
  3. Every PoP subscribes; consumes message.
  4. Each PoP invalidates matching entries in local cache (usually by writing a tombstone so serves return miss).

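Total time: seconds. Not instant, but predictable. "Purge everything" zone-wide can be much slower (up to hours) when each PoP has to find and invalidate millions of keys one by one; hitting a seconds-level target for a whole zone usually means avoiding that walk, for example by versioning the zone's cache namespace so old entries simply stop matching.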
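The purge steps above can be sketched with an in-process stand-in for the bus. Several things here are assumptions: the `PopCache` and `publish` names, the per-PoP tag index, and the choice to delete entries outright instead of writing tombstones. A real deployment would deliver messages over a durable pub/sub system.

```python
from collections import defaultdict

class PopCache:
    """Per-PoP cache with a tag index, so a tag purge does not scan every entry."""
    def __init__(self):
        self.entries = {}                     # key -> body
        self.keys_by_tag = defaultdict(set)   # tag -> keys carrying that tag
        self.tags_by_key = {}                 # key -> tags, for index cleanup

    def put(self, key, body, tags=()):
        self.entries[key] = body
        self.tags_by_key[key] = set(tags)
        for tag in tags:
            self.keys_by_tag[tag].add(key)

    def get(self, key):
        return self.entries.get(key)          # None means cache miss

    def apply_purge(self, message):
        """Apply one purge message; return how many entries were invalidated."""
        keys = set(message.get("files", []))
        for tag in message.get("tags", []):
            keys |= self.keys_by_tag.get(tag, set())
        keys = {k for k in keys if k in self.entries}
        for key in keys:
            del self.entries[key]
            for tag in self.tags_by_key.pop(key, ()):
                self.keys_by_tag[tag].discard(key)
        return len(keys)

def publish(pops, message):
    """Stand-in for the pub/sub bus: deliver to every PoP and collect per-PoP results."""
    return {name: pop.apply_purge(message) for name, pop in pops.items()}

pops = {"lhr": PopCache(), "nrt": PopCache()}
for pop in pops.values():
    pop.put("/p/1.html", "<html>1", tags=["product-1"])
    pop.put("/p/1.jpg", "jpg", tags=["product-1"])
    pop.put("/about", "<html>about")

acks = publish(pops, {"tags": ["product-1"]})
print(acks)   # {'lhr': 2, 'nrt': 2}
```

The per-PoP return value doubles as an acknowledgement, which is the hook a control plane would need to detect a PoP that never applied the message.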

Stampede prevention. If 1M users request a newly-popular image simultaneously and edge cache misses, all 1M requests would forward to the regional/origin. Protection: request coalescing — concurrent requests for the same key at a single PoP collapse into one upstream fetch. Only the first miss goes upstream; others wait for its response. One-request-in-flight invariant per key per PoP.
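A minimal sketch of request coalescing within one process, using threads to stand in for concurrent requests at a PoP. The `Coalescer` class is hypothetical; it only deduplicates in-flight fetches and does not cache results, so a request arriving after the fetch completes would trigger a new one.

```python
import threading
import time
from concurrent.futures import Future

class Coalescer:
    """At most one upstream fetch in flight per key; other callers wait for its result."""
    def __init__(self, fetch):
        self._fetch = fetch
        self._inflight = {}              # key -> Future for the fetch in progress
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None         # first caller for this key does the fetch
            if leader:
                fut = Future()
                self._inflight[key] = fut
        if leader:
            try:
                fut.set_result(self._fetch(key))
            except Exception as exc:
                fut.set_exception(exc)   # followers observe the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
        return fut.result()              # followers block here until the leader finishes

calls = []                               # records every upstream fetch

def slow_origin_fetch(key):
    calls.append(key)
    time.sleep(0.5)                      # simulate a slow upstream
    return f"bytes-of-{key}"

coalescer = Coalescer(slow_origin_fetch)
results = []

def worker():
    results.append(coalescer.get("/hot.jpg"))

threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls), len(results))
```

Twenty concurrent misses produce a single upstream fetch, and every waiter receives the same bytes. The fetch runs outside the lock so that requests for other keys are never blocked by a slow upstream.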

Interview answer

"Anycast BGP routing sends users to the nearest PoP. Each PoP runs nginx/envoy with NVMe-backed cache, TLS termination with SNI-indexed certs, WAF, and workers. Cache misses cascade to a regional tier (origin shield) before ever hitting the customer's origin, cutting origin load 10×. Purge is global pub/sub from a control plane — seconds to propagate to all PoPs. Stampede protection via per-key request coalescing so only one upstream fetch runs for concurrent misses. Real-time logs stream to a central analytics pipeline."

07

Tradeoffs & Design Choices

  • Anycast vs DNS-based routing. Anycast is simpler, faster to fail over, no TTL dependencies. DNS-based allows policy ("send premium customers to better PoPs") but is brittle. Cloudflare-style = Anycast; Akamai = historically DNS. New CDNs default to Anycast.
  • Flat-edge vs tiered cache. Flat (no regional) is simpler but hammers origin on misses. Tiered requires another hop but protects origin. Modern default is tiered; "flat-edge" survives only in the smallest deployments.
  • Pull vs push cache. Pull = first request populates cache (lazy). Push = pre-populate before launch (eager, for known-popular events). Pull is default; push is opt-in for live-streamed, Super Bowl-style events.
  • Shared vs per-customer caches. Shared cache (all customers share PoP capacity) is more efficient but creates noisy-neighbor risk. Per-customer quotas prevent one customer filling the cache. Some products isolate premium customers into dedicated capacity.
  • TLS cert distribution. Every PoP needs the cert for every customer it serves. Options: replicate all certs to all PoPs (wastes storage), fetch on demand (slow first request), or SNI-based lazy pull with LRU. Cloudflare uses a combination of Keyless SSL + selective pre-staging.
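The "SNI-based lazy pull with LRU" option from the last bullet can be sketched as follows. The `CertCache` class and the `fetch_cert` callback (standing in for a call to a central certificate store) are hypothetical, and real certificates would be key material, not strings.

```python
from collections import OrderedDict

class CertCache:
    """Per-PoP LRU of TLS certs keyed by SNI hostname, filled lazily on first use."""
    def __init__(self, capacity, fetch_cert):
        self.capacity = capacity
        self.fetch_cert = fetch_cert          # hypothetical call to a central cert store
        self._certs = OrderedDict()           # oldest entry first

    def get(self, sni_hostname):
        host = sni_hostname.lower()
        if host in self._certs:
            self._certs.move_to_end(host)     # mark as most recently used
            return self._certs[host]
        cert = self.fetch_cert(host)          # slow path: first handshake for this host
        self._certs[host] = cert
        if len(self._certs) > self.capacity:
            self._certs.popitem(last=False)   # evict the least recently used cert
        return cert

fetched = []

def fetch_from_central_store(host):
    fetched.append(host)
    return f"cert-for-{host}"

cache = CertCache(capacity=2, fetch_cert=fetch_from_central_store)
for host in ["a.example", "b.example", "a.example", "c.example", "b.example"]:
    cache.get(host)

print(fetched)   # ['a.example', 'b.example', 'c.example', 'b.example']
```

The second lookup for `b.example` pays the slow path again because it was evicted, which is the tradeoff of this option: bounded storage per PoP in exchange for occasional slow handshakes on rarely used hostnames.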
08

Failure Modes

🌋
PoP power / network failure
A PoP in a region loses upstream. Traffic normally routed there now has no path.
→ Mitigation: BGP announcement withdrawn on failure; Anycast redirects to next-closest PoP. Capacity planning assumes any PoP's load can fall on neighbors. Health checks + automated withdrawal within seconds.
🌊
DDoS attack — 10M rps on one customer
Attacker floods victim's hostname from botnets worldwide. Without CDN, victim's origin drops. With CDN, attack lands on CDN's edge.
→ Mitigation: Anycast naturally spreads attack across all PoPs (attacker can't target one location); per-customer rate limits at edge; L3/L4 volumetric scrubbing; WAF rules for L7 application attacks; emergency mode drops suspicious traffic before it hits backend.
🕳️
Cache poisoning via header tricks
Attacker crafts request with unusual headers that change response semantics but not cache key → serves poisoned content to everyone after them.
→ Mitigation: canonicalize cache keys; put every header that can change the response into the cache key (or Vary), or strip it before forwarding to origin; track public cache-poisoning disclosures and monitor for this pattern specifically.
📡
Purge doesn't propagate
Customer purges cache; one PoP misses the message; serves stale content to its users for hours.
→ Mitigation: each PoP ACKs every purge message; the control plane retries and alerts on missing ACKs; a per-PoP purge log supports auditing. The customer's purge dashboard shows propagation per PoP.
🌀
Origin goes down → cache slowly empties → outage
Customer's origin is unreachable. As cache entries expire, edge returns 5xx.
→ Mitigation: stale-while-revalidate (serve stale content while a best-effort background revalidation runs) plus stale-if-error (if the origin is down, keep serving the stale copy for an extended window, i.e. "serve stale on error"). Customer opts in; most do.
🔒
TLS certificate expiry
Customer's cert expires (or CDN's auto-renewal fails). All users see browser cert warnings.
→ Mitigation: auto-renewal via ACME / Let's Encrypt; monitoring + alerts for certs within 30 days of expiry; cert health dashboards for customers; redundant cert providers so one CA outage doesn't block renewal.
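For the origin-down failure mode above, the "serve stale on error" fallback can be sketched as a single lookup function. This is a simplified model: it covers only the error fallback, not background revalidation, and the `Entry` fields, the 60 s / 3600 s defaults, and the injected `now` clock are assumptions made for the example.

```python
import time
from dataclasses import dataclass

@dataclass
class Entry:
    body: str
    fetched_at: float
    ttl: float              # seconds the entry counts as fresh
    stale_if_error: float   # extra seconds it may be served while origin is failing

def serve(cache, key, fetch_origin, now=None):
    """Return (status, body), falling back to a stale copy if the origin fetch fails."""
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry and now - entry.fetched_at <= entry.ttl:
        return 200, entry.body                        # fresh hit
    try:
        body = fetch_origin(key)
    except Exception:
        if entry and now - entry.fetched_at <= entry.ttl + entry.stale_if_error:
            return 200, entry.body                    # origin down: serve stale
        return 502, ""                                # nothing usable left to serve
    cache[key] = Entry(body, now, ttl=60, stale_if_error=3600)
    return 200, body

def origin_up(key):
    return "v1"

def origin_down(key):
    raise ConnectionError("origin unreachable")

cache = {}
print(serve(cache, "/", origin_up, now=0))        # (200, 'v1')   populated from origin
print(serve(cache, "/", origin_down, now=120))    # (200, 'v1')   expired, served stale
print(serve(cache, "/", origin_down, now=4000))   # (502, '')     past the stale window
```

The stale window turns a hard outage at TTL expiry into a bounded grace period, which is why the size of that window is a per-customer decision.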
09

Interview Tips

  1. Anycast first, not DNS. Modern CDN answer. Explain the mechanism in one sentence.
  2. Tiered cache beats flat-edge. Without it, your origin burns. Mention by name; it's a load-reducing design choice that shows sophistication.
  3. Purge is not magic. It's pub/sub with seconds of propagation. Don't say "instant cache invalidation."
  4. Request coalescing for stampedes. One-request-in-flight per key. Classic pattern that many candidates forget.
  5. Analytics + logs. CDN's real product is observability. Mention shipping edge logs → central analytics; it's half the value customers pay for.
  6. Stale-while-revalidate. One line, massive impact. Shows you know the production-grade answer.
11

Evolution

1

MVP — DNS-based geo routing, few PoPs

GeoDNS sends user to nearest PoP. Simple Varnish/nginx cache at each PoP. Origin fetch on miss. Works for small scale.

2

Anycast routing + 50+ PoPs

BGP-Anycast IP replaces GeoDNS. Failover automatic via BGP. Hundreds of thousands of req/sec per PoP.

3

Regional tier + origin shield

Regional mid-cache absorbs PoP misses. Origin load drops 10×. Tiered cache becomes default for all customers.

4

Edge compute + bot management

Customer-provided JS runs at the edge: in V8 isolates at every PoP for Cloudflare Workers, with Lambda@Edge as the AWS counterpart on a different runtime model. Bot management, WAF, A/B-splitting all at edge.

5

CDN as platform (DB / KV / pub-sub at edge)

Cloudflare Durable Objects, KV, R2 (object storage). Edge becomes an application platform — full apps shippable with no origin. Still evolving (2024+).
