Concept · Networking & Delivery

DNS

01

Why this matters

Every request to your API starts with "what IP is api.example.com?" That's DNS. Get it wrong and users can't reach you. Set TTLs too long and users stay stuck on a dead IP for hours after you change it. Depend on a single DNS provider and you go down when they do (see the October 2016 Dyn outage, which made many major sites unreachable).

DNS is also the first place in your stack where you can do global traffic management: geo-route users to the nearest region, fail over from a dead datacenter, or split traffic by percentage for A/B tests. If you don't use it that way, you're leaving power on the table.

02

How resolution works

A client wants api.example.com. It asks its recursive resolver (usually run by the ISP, or a public one such as Google at 8.8.8.8 or Cloudflare at 1.1.1.1). On a cache miss, the resolver walks the DNS hierarchy:

  1. Root servers → "who handles .com?" → "these TLD servers."
  2. TLD servers → "who handles example.com?" → "these authoritative name servers."
  3. Authoritative name server → "what's api.example.com?" → "IP 203.0.113.42 (or: here's a CNAME → resolve that)."

Each answer includes a TTL (time-to-live): how long the resolver may cache it. The next time a client asks, the resolver returns the cached answer without walking the hierarchy, until the TTL expires.
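
The walk and the TTL cache can be sketched in a few lines. This is a toy model, not a real resolver: the three dictionaries are hypothetical in-memory stand-ins for the root, TLD, and authoritative servers, and the class name and the 300-second TTL are assumptions made for illustration.

```python
import time

# Hypothetical stand-ins for the three levels of the hierarchy.
ROOT = {"com": "tld-com"}                           # root: TLD -> TLD server
TLD = {"tld-com": {"example.com": "ns-example"}}    # TLD server: zone -> authoritative server
AUTH = {"ns-example": {"api.example.com": ("203.0.113.42", 300)}}  # name -> (IP, TTL in seconds)


class CachingResolver:
    """Toy recursive resolver: walks the hierarchy on a miss, caches until TTL expiry."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock    # injectable so the example can be tested without sleeping
        self.cache = {}       # name -> (ip, expires_at)
        self.walks = 0        # number of full hierarchy walks performed

    def resolve(self, name):
        hit = self.cache.get(name)
        if hit and hit[1] > self.clock():
            return hit[0]                                      # fresh cache entry: no walk
        self.walks += 1
        labels = name.split(".")
        tld_server = ROOT[labels[-1]]                          # step 1: ask the root
        auth_server = TLD[tld_server][".".join(labels[-2:])]   # step 2: ask the TLD server
        ip, ttl = AUTH[auth_server][name]                      # step 3: ask the authoritative server
        self.cache[name] = (ip, self.clock() + ttl)
        return ip
```

Calling `resolve("api.example.com")` twice in a row triggers one walk; a third call after the TTL has elapsed triggers a second walk.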

03

Record types you care about

Record   Purpose                                            Example
A        Domain → IPv4                                      api.example.com → 203.0.113.42
AAAA     Domain → IPv6                                      api.example.com → 2001:db8::1
CNAME    Alias → another domain                             www.example.com → example.com
MX       Mail server for a domain                           example.com → mx.google.com
TXT      Arbitrary text (SPF, DKIM, domain verification)    v=spf1 include:_spf.google.com ~all
NS       Which name servers are authoritative               example.com → ns1.cloudflare.com
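
To make the record types concrete, here is a minimal sketch of a lookup over an in-memory record set, including following a CNAME to its target. The `RECORDS` dictionary and the `lookup` helper are hypothetical, and the A record for example.com is an assumption added so the alias has something to resolve to.

```python
# Hypothetical zone data: (name, record type) -> list of values.
RECORDS = {
    ("example.com", "A"): ["203.0.113.42"],           # assumed for illustration
    ("example.com", "AAAA"): ["2001:db8::1"],
    ("www.example.com", "CNAME"): ["example.com"],
    ("example.com", "MX"): ["mx.google.com"],
    ("example.com", "TXT"): ["v=spf1 include:_spf.google.com ~all"],
    ("example.com", "NS"): ["ns1.cloudflare.com"],
}


def lookup(name, rtype, records=RECORDS, max_hops=8):
    """Return values for (name, rtype), following CNAME aliases when needed."""
    for _ in range(max_hops):
        values = records.get((name, rtype))
        if values:
            return values                      # direct answer for this type
        alias = records.get((name, "CNAME"))
        if not alias:
            return []                          # no record and no alias
        name = alias[0]                        # restart the lookup at the alias target
    raise RuntimeError("CNAME chain too long")
```

Here `lookup("www.example.com", "A")` has no A record of its own, so it follows the CNAME to example.com and returns that name's address.
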
04

DNS as a traffic-management tool

Authoritative DNS servers can return different answers to different clients. This enables:

  • Geo-routing. US client asks → returns US-East IP. EU client asks → returns EU-West IP. Usually the lowest-latency choice.
  • Weighted round-robin. 80% of answers point to region A, 20% to region B. Used for gradual traffic shifts during deploys.
  • Health-based failover. The DNS provider monitors each endpoint. If US-East fails health checks, it returns US-West until US-East recovers.
  • Anycast. Strictly a routing technique rather than a DNS answer policy, but it pairs with the above: many servers announce the same IP from different regions via BGP, and the network itself routes each client to a nearby one. Used by Cloudflare, most CDNs, and 1.1.1.1.
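
The first three policies can be sketched as answer-selection functions. This is an illustrative model only: the endpoint list, region names, IPs, weights, and function names are assumptions, and a managed DNS service would expose these as configuration rather than code.

```python
import random

# Hypothetical endpoints; "healthy" would be set by the provider's health checks.
ENDPOINTS = [
    {"region": "us-east", "ip": "203.0.113.10", "weight": 80, "healthy": True},
    {"region": "eu-west", "ip": "203.0.113.20", "weight": 20, "healthy": True},
]


def geo_answer(client_region, endpoints):
    """Geo-routing with health-based failover: prefer the client's region if healthy."""
    healthy = [e for e in endpoints if e["healthy"]]
    if not healthy:
        return None
    for e in healthy:
        if e["region"] == client_region:
            return e["ip"]
    return healthy[0]["ip"]        # local region is down or unknown: fail over


def weighted_answer(endpoints, rng=random):
    """Weighted routing: pick a healthy endpoint in proportion to its weight."""
    healthy = [e for e in endpoints if e["healthy"]]
    if not healthy:
        return None
    ips = [e["ip"] for e in healthy]
    weights = [e["weight"] for e in healthy]
    return rng.choices(ips, weights=weights, k=1)[0]
```

With both endpoints healthy, roughly 80% of `weighted_answer` calls return the us-east IP; marking us-east unhealthy sends every answer to eu-west.
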
05

TTL — the propagation tradeoff

TTL controls how long resolvers cache the answer. Short TTL = fast failover, high query load on your DNS. Long TTL = cached everywhere, slow to propagate changes.

  • 60–300 s: TTL for actively managed records (blue/green)
  • 1–24 h: TTL for stable records
  • ~48 h: commonly quoted worst-case stale cache (ignoring misbehaving resolvers)
  • A few ms: typical resolution time when the answer is already cached

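Before a planned cutover: lower the TTL days in advance, at least one full old-TTL period ahead, so every cache has picked up the short value by the time you switch. After the cutover is stable: raise the TTL again to reduce query load and cost.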
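
The arithmetic behind this advice is simple enough to sketch. Both helpers are hypothetical, and the load estimate rests on a stated simplification: each active resolver re-fetches a popular name about once per TTL, so authoritative query load scales roughly as 1/TTL.

```python
from datetime import datetime, timedelta


def latest_time_to_lower_ttl(cutover, old_ttl_seconds):
    """Latest moment to publish the short TTL so that every TTL-respecting
    cache has dropped the long-TTL record before the cutover."""
    return cutover - timedelta(seconds=old_ttl_seconds)


def approx_authoritative_qps(active_resolvers, ttl_seconds):
    """Rough queries per second for one popular name, assuming each
    resolver re-fetches once per TTL (a simplification)."""
    return active_resolvers / ttl_seconds


cutover = datetime(2024, 1, 10, 12, 0)
print(latest_time_to_lower_ttl(cutover, 24 * 3600))   # one day before the cutover
print(approx_authoritative_qps(30_000, 300))           # short TTL: more load
print(approx_authoritative_qps(30_000, 3600))          # long TTL: less load
```

The model gives a lower bound on lead time; lowering the TTL earlier than that leaves margin for resolvers that hold records longer than they should.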

Figure: TTL=300s, when does each resolver see the new IP? Timeline with ticks at t=0, t=100s, t=300s (TTL), t=500s, t=600s. At t=0 the primary fails and authoritative DNS is updated. Resolver A serves the bad cached answer until its next miss, then refreshes. Resolver B re-fetches when its cached entry expires. Resolver C is the last to expire. A misbehaving resolver ignores the TTL and stays on the bad IP for hours.
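  • ~300 s: worst-case failover for TTL-respecting resolvers with TTL=300 (about half that on average, if cache ages are evenly spread)
  • ~10%: resolvers that ignore TTL
  • ~24 hr: long tail of misbehaving resolvers
  • 3–7 days: how far ahead to lower TTL before a planned cutover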
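
A small simulation shows how the refresh times spread out for resolvers that respect the TTL. It assumes each resolver cached the old answer at a uniformly random moment within the last TTL, so its remaining cache lifetime is uniform between 0 and the TTL. The function names are hypothetical, and misbehaving resolvers are deliberately left out.

```python
import random


def seconds_until_refresh(ttl, n_resolvers, rng):
    """Remaining cache lifetime per resolver at the moment the record changes.
    Assumes cache ages are uniformly distributed over one TTL."""
    return [rng.uniform(0, ttl) for _ in range(n_resolvers)]


def fraction_refreshed(delays, t):
    """Share of resolvers that have picked up the new answer t seconds after the change."""
    return sum(1 for d in delays if d <= t) / len(delays)


rng = random.Random(42)                      # fixed seed so runs are repeatable
delays = seconds_until_refresh(300, 10_000, rng)
print(sum(delays) / len(delays))             # mean delay, close to TTL / 2
print(max(delays))                           # never more than the TTL
print(fraction_refreshed(delays, 100))       # roughly a third after 100 s
```

Under this assumption the TTL is an upper bound for compliant resolvers, and the typical resolver switches over in about half that time.
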
06

Deep dive — anycast, the trick behind 1.1.1.1

Normal (unicast) IP: one machine owns that IP. Anycast IP: many machines across the world each announce "I am 1.1.1.1" via BGP. When a client sends a packet to 1.1.1.1, routers forward it to the topologically nearest announcer (nearest by BGP path, not physical distance), which is usually also the closest one geographically.

Result: a single IP address served from ~300 cities. Any one PoP can fail and traffic automatically reroutes to the next closest. No DNS change needed. Clients don't know or care. This is how CDNs, global DNS resolvers, and modern anycast load balancers work.

For your systems: usually you rent anycast from someone else (Cloudflare, AWS Global Accelerator). Running your own anycast means BGP peering at multiple internet exchanges — a commitment, not a side project.
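
As a rough mental model of the selection and failover behaviour, consider one client's view of the path cost to each announcing PoP. This is a sketch, not BGP: the PoP names and cost numbers are invented, and real route selection depends on AS paths and operator policy.

```python
def nearest_pop(path_cost, withdrawn=frozenset()):
    """Pick the announcing PoP with the lowest path cost from this client's view.
    PoPs in `withdrawn` have stopped announcing the anycast IP."""
    candidates = {pop: cost for pop, cost in path_cost.items() if pop not in withdrawn}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical path costs as seen by a client in central Europe.
costs = {"fra": 2, "ams": 3, "nyc": 7}
print(nearest_pop(costs))                      # fra
print(nearest_pop(costs, withdrawn={"fra"}))   # ams: same IP, next-nearest PoP
```

The client keeps sending to the same IP in both cases; only the routing decision changes, which is why no DNS update is involved.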

07

Real-world

Route 53

AWS managed DNS

Integrated with AWS. Geo routing, weighted routing, health checks. 100% availability SLA. Default for AWS-native stacks.

Cloudflare DNS

Anycast everywhere

Among the fastest public resolvers, and free. Cloudflare runs both authoritative DNS (name servers under cloudflare.com) and a recursive resolver (1.1.1.1).

NS1 / Dyn

Traffic-steering specialists

Sophisticated, data-driven routing based on real-user metrics. Pricier. Historically used by Netflix and LinkedIn.

Redundant providers

Run two authoritative DNS providers

Since the 2016 Dyn outage, many companies serve the same zone from two authoritative DNS providers at once (for example, Cloudflare + Route 53). If one goes down, the other still answers.
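
A quick way to check this property is to look at a zone's NS set and count distinct providers. The sketch below is a heuristic under stated assumptions: the `PROVIDER_HINTS` substrings and the sample hostnames are illustrative, not a complete or authoritative mapping.

```python
# Assumed substring hints for recognising a provider from an NS hostname.
PROVIDER_HINTS = {
    "cloudflare.com": "Cloudflare",
    "awsdns-": "Route 53",
}


def provider_of(ns_host):
    """Map an NS hostname to a provider name, or to itself if unrecognised."""
    host = ns_host.rstrip(".").lower()
    for hint, provider in PROVIDER_HINTS.items():
        if hint in host:
            return provider
    return host


def has_provider_redundancy(ns_hosts):
    """True when the NS set spans at least two distinct providers."""
    return len({provider_of(h) for h in ns_hosts}) >= 2


# Illustrative hostnames only.
print(has_provider_redundancy(["ns1.cloudflare.com", "ns2.cloudflare.com"]))
print(has_provider_redundancy(["ns1.cloudflare.com", "ns-123.awsdns-45.com"]))
```

The first call reports a single-provider setup and the second a dual-provider one. Keeping the two providers' zone contents in sync is a separate problem that this check does not cover.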

08

Used in problems

A URL shortener uses short, memorable domains plus a low TTL for fast failover. YouTube/Netflix use DNS geo-routing to direct users to the nearest CDN edge. Google Maps uses DNS-based region routing.
