Concept · Reliability

Feature Flags & Rollouts

01

Why this matters

"Deploy a new feature to all 100M users at once" is asking for a 3am incident. Feature flags let you separate code deploy from feature release: ship the code dark, then turn it on for 1% of users, watch metrics, ramp to 10%, 50%, 100%. If anything goes wrong, flip it off — no rollback, no redeploy.

Modern continuous deployment depends on this. Stripe deploys hundreds of times per day; nothing reaches all customers without a flag-controlled rollout.

02

Four flavors of flag

Type                    | Lifetime                  | Use for
Release flag            | Days–weeks                | Hide an unfinished feature in production until it's ready
Experiment flag         | Weeks (A/B test duration) | Compare variant A vs B; analyze which performs better
Permission flag         | Long-lived                | "Premium tier sees this; free tier doesn't"
Operational kill switch | Permanent                 | "Disable the recommendations service if it misbehaves" (pairs with graceful degradation)
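
As an illustration of the kill-switch flavor, here is a minimal sketch of a flag guarding the recommendations call with a graceful fallback. The flag name, the in-memory flagStore, and fetchRecommendations are hypothetical stand-ins; a real system would read the flag from its SDK's local cache and call an actual service.

```javascript
// Hypothetical in-memory flag store; stands in for a flag SDK's local cache.
const flagStore = { "recommendations-enabled": true };

// Hypothetical stand-in for a call to the recommendations service.
function fetchRecommendations(userId) {
  return [`rec-for-${userId}-1`, `rec-for-${userId}-2`];
}

function getRecommendations(userId) {
  if (!flagStore["recommendations-enabled"]) {
    // Kill switch is thrown: degrade gracefully and render without recommendations.
    return [];
  }
  try {
    return fetchRecommendations(userId);
  } catch (err) {
    // Same fallback if the service call itself fails.
    return [];
  }
}
```

Returning an empty list rather than throwing is one possible degradation choice; the point is that the caller's page still renders when the switch is off.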
03

How a check actually evaluates

Code path:

if (flags.isEnabled("new-feed-ranker", { user_id, country, plan })) {
  return newRanker(user);
} else {
  return legacyRanker(user);
}

Inside isEnabled:

  1. Look up the flag's rules (cached locally; refreshed every few seconds).
  2. Apply targeting: "enabled for premium users in EU" → check the context.
  3. Apply percentage rollout: hash the user_id deterministically into a 0–99 bucket; if bucket < rollout%, return true.
  4. Cache the result.

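Critical: the bucketing is deterministic per user. The same user always gets the same answer at a given rollout%, and because a user's bucket never changes, anyone already enabled stays enabled as rollout% increases. Otherwise users would flip in and out of the experience randomly per request.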
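
The evaluation steps above can be sketched roughly as follows. This is a simplified illustration, not any platform's actual implementation: the rule shape (enabled, targeting, rolloutPercent), the FNV-1a-style hash, and salting the hash with the flag key are all assumptions made for the example. Rule lookup is reduced to a plain object, and result caching is omitted.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units (an illustrative choice).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map (flag, user) to a stable bucket in 0..99. Salting with the flag key
// (an assumption) avoids the same users landing in the first 1% of every flag.
function bucketFor(flagKey, userId) {
  return fnv1a(`${flagKey}:${userId}`) % 100;
}

// Hypothetical rule shape: { enabled, targeting: { plan, countries }, rolloutPercent }
function isEnabled(rules, flagKey, context) {
  const rule = rules[flagKey];
  if (!rule || !rule.enabled) return false; // unknown or disabled flags stay OFF

  // Targeting: every configured condition must match the context.
  const targeting = rule.targeting || {};
  if (targeting.plan && targeting.plan !== context.plan) return false;
  if (targeting.countries && !targeting.countries.includes(context.country)) return false;

  // Percentage rollout: deterministic bucket compared with the rollout%.
  return bucketFor(flagKey, context.user_id) < rule.rolloutPercent;
}

// Example usage with made-up rules and context.
const exampleRules = {
  "new-feed-ranker": {
    enabled: true,
    targeting: { plan: "premium", countries: ["DE", "FR"] },
    rolloutPercent: 10,
  },
};
const exampleContext = { user_id: "user-42", country: "DE", plan: "premium" };
console.log(isEnabled(exampleRules, "new-feed-ranker", exampleContext)); // same answer on every call
```

Because the bucket depends only on the flag key and user_id, raising rolloutPercent from 10 to 50 only adds users; nobody who was already enabled is dropped.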

04

Operational rules that make flags safe

  • Default to OFF for new flags. Code that hasn't been tested in prod stays dark.
  • Test both branches in CI. Run your suite with the flag on and off.
  • Time-bound release flags. Track them in a registry; flags older than 60 days get reviewed and deleted. Otherwise you accumulate dead branches.
  • Monitor by flag. Each flag should expose metrics: enabled-vs-disabled performance and error rate per branch. Alert when one branch regresses.
  • Make rollback instant. Flag changes propagate in < 60s. If you have to redeploy to disable a feature, your flag system is broken.
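
The time-bound rule above could be enforced with a small registry check along these lines. This is a sketch under assumptions: the registry entry shape (key, type, createdAt) and the function name are hypothetical, and only the 60-day limit comes from the rule itself.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical registry entries: { key, type, createdAt }, where type is
// "release", "experiment", "permission", or "kill-switch".
function findStaleReleaseFlags(registry, now, maxAgeDays = 60) {
  return registry
    .filter((flag) => flag.type === "release") // long-lived flag types are exempt
    .filter((flag) => (now - flag.createdAt) / DAY_MS > maxAgeDays)
    .map((flag) => flag.key);
}

// Example usage with made-up flags.
const exampleRegistry = [
  { key: "new-feed-ranker", type: "release", createdAt: new Date("2024-01-01T00:00:00Z") },
  { key: "checkout-v2", type: "release", createdAt: new Date("2024-03-20T00:00:00Z") },
  { key: "disable-recommendations", type: "kill-switch", createdAt: new Date("2022-06-01T00:00:00Z") },
];

// Only "new-feed-ranker" is a release flag older than 60 days on this date.
console.log(findStaleReleaseFlags(exampleRegistry, new Date("2024-04-01T00:00:00Z")));
```

Running a check like this on a schedule and opening a review ticket per stale key is one way to keep dead branches from accumulating.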
05

Deep dive — progressive delivery patterns

Beyond simple percentage rollouts, modern teams use a ladder:

  1. Internal dogfood — flag enabled only for company employees (filter by email domain). Catch egregious bugs.
  2. Beta cohort — opt-in users / specific accounts. Real-world feedback at low risk.
  3. Canary deploy — code rolled out to 1 server out of N. Hardware-level isolation. If that one server crashes, only its share of traffic is affected.
  4. 1% percentage rollout — flag bumps from 0% to 1%. Watch error rate, latency, business metrics. Hold for an hour.
  5. 10% → 50% → 100% — over hours or days. Each step is a checkpoint.
  6. Cleanup — once at 100% for a stable period, remove the flag and the legacy branch. Avoid permanent flag debt.

Sophisticated platforms (LaunchDarkly, Split, Optimizely) automate the ladder: you define a rollout policy, and the platform advances rollout% based on guardrail metrics, halting automatically if anomalies are detected.
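
A guardrail-driven ramp of this kind might look roughly like the sketch below. It is not any vendor's API: the function names and the metrics shape (errorRate as a fraction of requests, p99Ms in milliseconds) are assumptions. The thresholds mirror the +0.1% error-rate and +10% P99 latency auto-revert figures quoted in this section, read here as an absolute increase in error rate and a relative increase in latency, which is itself an interpretation.

```javascript
// Percentage steps of the ladder, taken from the ramp described above.
const LADDER = [1, 10, 50, 100];

// Assumed interpretation of the auto-revert thresholds.
const GUARDRAILS = {
  maxErrorRateIncrease: 0.001, // +0.1 percentage points, as a fraction of requests
  maxP99LatencyIncrease: 0.10, // +10% relative to the control branch
};

// Compare the flagged (treatment) branch against the unflagged (control) branch.
function guardrailsBreached(control, treatment, guardrails = GUARDRAILS) {
  const errorDelta = treatment.errorRate - control.errorRate;
  const latencyIncrease = treatment.p99Ms / control.p99Ms - 1;
  return (
    errorDelta > guardrails.maxErrorRateIncrease ||
    latencyIncrease > guardrails.maxP99LatencyIncrease
  );
}

// Returns the next rollout percentage: 0 on a breach (flag off),
// otherwise the next ladder step, or the current value once at 100%.
function nextRolloutPercent(current, control, treatment) {
  if (guardrailsBreached(control, treatment)) return 0;
  const next = LADDER.find((step) => step > current);
  return next === undefined ? current : next;
}
```

A scheduler would call nextRolloutPercent once each step's hold time has elapsed and write the result back to the flag's rollout%.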

Interview one-liner

"Every major change is behind a feature flag. We deploy code dark, ramp via percentage rollout with metric guardrails, and have an instant kill switch. Failed releases roll back in < 1 minute by flipping the flag — no redeploy required."

Figure: Progressive delivery ladder
Internal (~50 users, 1 day) → Beta (~500 users, 3 days) → 1% (1 hour hold) → 10% (2 hour hold) → 50% (12 hour hold) → 100% (cleanup and remove flag)
Each step: SLO check → promote or auto-revert. Any guardrail breach at any step → instant flag-off (rollback in < 60 s).

Key numbers:
  • < 60 s: flag-flip propagation (LaunchDarkly)
  • +0.1%: error-rate threshold for auto-revert
  • +10%: P99 latency threshold for auto-revert
  • 60 days: flag age before mandatory cleanup review
06

Real-world

LaunchDarkly

Managed feature-flag platform

SDKs for most major languages. Sub-second flag propagation. The default for teams that don't want to build it themselves.

Statsig / Optimizely

Flags + experimentation

Automatic A/B test analysis per flag. Used by companies with heavy experiment cadence.

Stripe / Netflix

Internal platforms

Built their own at scale, tightly integrated with their deploy pipelines.

Cloudflare Pages / Vercel

Edge-config flags

Flags read at the CDN edge → near-instant propagation worldwide. Useful for static-site flag flips.

07

Used in problems

The news feed gates new ranker rollouts behind flags and A/B tests. E-commerce uses flags for checkout-flow experiments and for a kill switch per payment provider. The notification system gates new channel integrations behind flags.
