Code Deployment Pipeline

Developer pushes code. CI builds and tests inside isolated ephemeral containers. An immutable, SHA-tagged artifact is stored in an S3-backed content-addressed store. CD does a canary deploy to 1% of traffic, runs automated statistical analysis comparing canary vs baseline metrics over 15 minutes, then progressively promotes: 10% → 50% → 100%. Rollback = re-deploy the previous artifact — no new build needed. Feature flags decouple deploy from release: ship code dark, enable for 1% → 100% independently. Google, Netflix, Meta — same pattern, different scale.

Core: Hermetic Build + Immutable Artifact + Canary Analysis · ~10K deploys/day · ~1000 concurrent builds · Artifact store ~50 TB · Auto-rollback on regression

Requirements

Functional

Developer pushes to main → CI automatically triggers build + test in ephemeral containers
Build produces an immutable artifact (Docker image, SHA-tagged, content-addressed)
CD performs progressive rollout: canary 1% → 10% → 50% → 100%
Automated canary analysis compares p99 latency + error rate against baseline
Auto-promote on pass; auto-rollback on fail (re-deploy previous SHA)
Feature flags allow toggling features independently of deploy pipeline

Non-Functional

Hermetic builds: no network access during the build (dependencies come only from a pre-populated cache); fully reproducible
Build time < 5 min p90 — developers won't wait longer
Canary analysis window: 15 min per stage, long enough to reach statistical significance
Rollback latency < 60 sec from detection to full rollback
Artifact store: never overwrite — append-only, content-addressed
Support ~10K deploys/day, ~1000 concurrent build jobs

Scale Estimation

Deploys per day: ~10K, across all services in the org; peak during business hours

Concurrent builds: ~1000 ephemeral containers spun up on a shared build cluster

Artifact store: ~50 TB, S3-backed; content-addressed by SHA; GC after retention policy

Build p90 latency: < 5 min, including container spin-up, dep fetch, compile, and test

Canary window: 15 min, enough samples for statistical significance at 1% traffic

Rollback time: < 60 sec; re-point the deploy config at the previous SHA, no rebuild needed

API Design

POST /api/builds

Trigger a new build. Body: {repo, commit_sha, branch}. Returns {build_id, status: queued}. Webhook fires on completion.

GET /api/builds/{build_id}

Poll build status. Returns {status: queued|building|testing|passed|failed, artifact_sha, logs_url, duration_sec}.

POST /api/deploys

Start a deployment. Body: {service, artifact_sha, strategy: canary|blue_green|rolling}. Returns {deploy_id, status: canary_1pct}.

POST /api/deploys/{deploy_id}/rollback

Immediate rollback. Re-deploys previous artifact SHA. Returns {deploy_id, rolled_back_to: prev_sha}.

GET /api/artifacts/{sha}

Fetch artifact metadata. Returns {sha, build_id, created_at, size_bytes, image_url}. Artifact itself served via signed S3 URL.

Architecture

Four tiers: Build tier (ephemeral containers compile + test), Artifact tier (S3 content-addressed store), Deploy tier (canary + progressive rollout), Observability tier (metrics collection + canary analysis). Git push triggers the pipeline end-to-end.

Figure: Deployment Pipeline Architecture

Request Flow

Developer (git push) → CI Build (ephemeral container) → Artifact Store (S3, SHA-tagged) → Deploy Controller (canary orchestrator) → Canary (1%, new artifact) → Canary Analyzer (Mann-Whitney test) → Promote/Rollback (10% → 50% → 100%)


Deep Dive — Build Isolation, Canary Analysis & Feature Flags

(a) Build Isolation. Each build runs in an ephemeral container spun from a per-language base image (e.g., ci-golang:1.22, ci-java:21). The container has no outbound network access during the build, which makes builds hermetic. Dependencies are pre-fetched into a shared, content-addressed cache layer. Combined with deterministic build settings (pinned toolchain, no embedded timestamps or local paths), builds are fully reproducible: same commit SHA = same artifact SHA, always. The container is destroyed after the build, so no state leaks between builds.

Why hermetic? Three reasons: (1) Reproducibility — build the same commit six months later and get the exact same binary. No "it worked on my machine." (2) Security — no network means a compromised dependency mirror can't inject malware at build time. (3) Cacheability — since builds are deterministic, unchanged modules produce the same output and can be skipped entirely (build-graph caching).

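# syntax=docker/dockerfile:1
# Hermetic build: container spec
FROM ci-golang:1.22
# Pre-fetched, content-addressed module cache; nothing is downloaded at build time
COPY --from=dep-cache /deps /go/pkg/mod
ENV GOPROXY=off GOFLAGS=-mod=readonly
WORKDIR /src
COPY . /src
# No network access; -trimpath drops local paths so the output is reproducible
RUN --network=none go build -trimpath -o /out/app .
# Tests are hermetic too
RUN --network=none go test ./...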
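Point (3) above, build-graph caching, relies on deriving a cache key from every input to a build step. The sketch below assumes a hypothetical `BuildAction` that lists only the toolchain version, the source files, and the cache keys of dependencies. Real build systems hash more inputs (flags, environment), but the principle is the same: identical inputs give an identical key, so the cached output can be reused.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// BuildAction (hypothetical) lists everything that can affect a step's output.
type BuildAction struct {
	Toolchain string            // e.g. "go1.22.5"
	Sources   map[string][]byte // file path -> contents
	DepKeys   []string          // cache keys of already-built dependencies
}

// CacheKey hashes all inputs in a fixed order: identical inputs always give the
// same key, and changing any byte, dependency, or the toolchain gives a new one.
func CacheKey(a BuildAction) string {
	h := sha256.New()
	fmt.Fprintf(h, "toolchain %s\n", a.Toolchain)

	paths := make([]string, 0, len(a.Sources))
	for p := range a.Sources {
		paths = append(paths, p)
	}
	sort.Strings(paths) // map order is random; sort so the key is deterministic
	for _, p := range paths {
		fmt.Fprintf(h, "src %s %x\n", p, sha256.Sum256(a.Sources[p]))
	}

	deps := append([]string(nil), a.DepKeys...)
	sort.Strings(deps)
	for _, d := range deps {
		fmt.Fprintf(h, "dep %s\n", d)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := BuildAction{Toolchain: "go1.22.5", Sources: map[string][]byte{"main.go": []byte("package main")}}
	b := BuildAction{Toolchain: "go1.22.5", Sources: map[string][]byte{"main.go": []byte("package main // edited")}}
	fmt.Println(CacheKey(a) == CacheKey(a)) // true: unchanged inputs hit the cache
	fmt.Println(CacheKey(a) == CacheKey(b)) // false: an edit forces a rebuild
}
```

Because dependency keys feed into the key, a change deep in the graph invalidates every step above it and nothing else.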

(b) Artifact Store. S3-backed, content-addressed by the SHA-256 of the artifact. Key design: never overwrite. Once sha256:abc123 is written, it is immutable. Rollback is trivial: point the deploy config at the previous SHA. No rebuild required. Retention policy garbage-collects artifacts older than 90 days with no active deploy reference. Total store size ~50 TB across org.

Content-addressing gives you deduplication for free. If two services produce the same binary from the same source, they share one artifact in S3. The store is append-only — writes are idempotent (PUT with If-None-Match). Reads go through a CDN layer for geographic caching; a deploy in us-east pulls artifacts from the local edge, not cross-region. Metadata (build_id, commit, timestamp, SBOM) lives in Postgres; the blob itself lives in S3. Signed URLs ensure only authorized deploy agents can pull artifacts.

// Artifact store put: idempotent, content-addressed
PUT /artifacts/{sha256}
  If-None-Match: *          // write only if the key doesn't exist yet (412 if it does)
  Content-Type: application/octet-stream
  Body: <docker image tar>

// Deploy config — just a pointer to a SHA
{ "service": "payments",
  "artifact_sha": "sha256:abc123",
  "replicas": 50,
  "canary_pct": 1 }
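The never-overwrite rule and the If-None-Match behavior can be sketched with an in-memory map standing in for S3. `Store`, `Digest`, and `Put` are hypothetical names. The key is derived from the content, so a second upload of identical bytes is a no-op (deduplication), and an existing key is never replaced.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// Store is an in-memory stand-in for the S3 bucket; key = "sha256:<hex>" of the blob.
type Store struct {
	mu    sync.Mutex
	blobs map[string][]byte
}

func NewStore() *Store { return &Store{blobs: make(map[string][]byte)} }

// Digest returns the content address of a blob.
func Digest(blob []byte) string {
	sum := sha256.Sum256(blob)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// Put writes the blob only if its key is absent (If-None-Match: * semantics)
// and reports whether a new object was created.
func (s *Store) Put(blob []byte) (key string, created bool) {
	key = Digest(blob)
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.blobs[key]; exists {
		return key, false // already stored: never overwrite
	}
	s.blobs[key] = append([]byte(nil), blob...) // copy so callers can't mutate it
	return key, true
}

// Get returns a copy of the blob stored under key.
func (s *Store) Get(key string) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	b, ok := s.blobs[key]
	return append([]byte(nil), b...), ok
}

func main() {
	s := NewStore()
	k1, c1 := s.Put([]byte("image-tar-bytes"))
	k2, c2 := s.Put([]byte("image-tar-bytes")) // same bytes from another service
	fmt.Println(k1 == k2, c1, c2, len(s.blobs)) // true true false 1
}
```

Treating "already exists" as success is what makes retries safe: a build agent that times out mid-upload can simply PUT again.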

(c) Canary Analysis. When a new artifact is deployed, it receives 1% of production traffic. Over a 15-minute window, the canary analyzer collects latency samples (summarized as p99) and error rates from both canary (new) and baseline (old) pods. It applies a Mann-Whitney U test (non-parametric, no normality assumption) to determine whether the canary is statistically worse. If the test detects a regression (p < 0.05 and the canary is worse), auto-rollback fires within 60 seconds. If it passes, the canary is promoted to 10% and the analysis repeats, then 50%, then 100%.

Why Mann-Whitney instead of a simple threshold? A threshold like "error rate < 1%" is meaningless if your baseline already runs at 0.95%, because ordinary noise will cross the line. Mann-Whitney compares the distributions of canary and baseline samples, so it detects relative regressions ("canary is statistically worse than baseline") regardless of absolute values. Assuming the service handles ~1,000 requests/sec in total, 1% of traffic over 15 minutes yields ~9K canary request samples (1,000 × 0.01 × 900 s), more than enough for statistical power. The test needs no normality assumption, which matters because latency distributions are heavily right-skewed.

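# Canary analysis pseudocode
# Compare raw per-request latencies (~9K canary samples), not a single p99 value
canary_latencies   = collect_latencies(canary_pods, window=15min)
baseline_latencies = collect_latencies(baseline_pods, window=15min)
stat, p_value = mann_whitney_u(canary_latencies, baseline_latencies)
canary_worse = median(canary_latencies) > median(baseline_latencies)
# Error rate is compared the same way before deciding
if p_value < 0.05 and canary_worse:
    trigger_rollback()       # canary is worse; roll back within 60s
else:
    promote_to_next_stage()  # 1% → 10% → 50% → 100%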
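The `mann_whitney_u` call above is the textbook rank-sum test. Below is a minimal Go sketch of a one-sided version ("canary tends to be slower") using the large-sample normal approximation with tie and continuity corrections. Library implementations such as SciPy's `mannwhitneyu` can also compute exact p-values for small samples, which this sketch does not. `MannWhitneyGreater` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// MannWhitneyGreater tests H1 "values in x tend to be larger than values in y"
// (x = canary latencies, y = baseline). It returns U for x and a one-sided
// p-value from the normal approximation with tie and continuity corrections.
func MannWhitneyGreater(x, y []float64) (u, p float64) {
	if len(x) == 0 || len(y) == 0 {
		return 0, 1 // no data, no evidence of regression
	}
	type obs struct {
		v     float64
		fromX bool
	}
	all := make([]obs, 0, len(x)+len(y))
	for _, v := range x {
		all = append(all, obs{v, true})
	}
	for _, v := range y {
		all = append(all, obs{v, false})
	}
	sort.Slice(all, func(i, j int) bool { return all[i].v < all[j].v })

	// Rank the pooled samples; tied values share the average of their ranks.
	var rankSumX, tieTerm float64
	for i := 0; i < len(all); {
		j := i + 1
		for j < len(all) && all[j].v == all[i].v {
			j++
		}
		avgRank := float64(i+j+1) / 2 // 1-based ranks i+1..j, averaged
		t := float64(j - i)
		tieTerm += t*t*t - t
		for k := i; k < j; k++ {
			if all[k].fromX {
				rankSumX += avgRank
			}
		}
		i = j
	}

	n1, n2 := float64(len(x)), float64(len(y))
	n := n1 + n2
	u = rankSumX - n1*(n1+1)/2
	mean := n1 * n2 / 2
	sd := math.Sqrt(n1 * n2 / 12 * ((n + 1) - tieTerm/(n*(n-1))))
	if sd == 0 {
		return u, 1 // every value identical: no evidence either way
	}
	// Continuity correction for the upper tail, then P(Z >= z).
	z := (u - mean - 0.5) / sd
	return u, 0.5 * math.Erfc(z/math.Sqrt2)
}

func main() {
	baseline := []float64{100, 102, 98, 101, 99, 103, 97, 100} // latency samples, ms
	canary := []float64{110, 112, 108, 111, 109, 113, 107, 110}
	_, p := MannWhitneyGreater(canary, baseline)
	fmt.Printf("p=%.4f rollback=%v\n", p, p < 0.05) // rollback=true
}
```

Because this test is one-sided, it already encodes "canary is worse," so the separate median check in the pseudocode becomes unnecessary.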

Canary Deploy Sequence (Mermaid)

sequenceDiagram
    participant D as Developer
    participant CI as CI Build
    participant AS as Artifact Store
    participant DC as Deploy Controller
    participant CA as Canary Analyzer
    participant LB as Load Balancer
    D->>CI: git push (commit SHA)
    CI->>CI: build + test in container
    CI->>AS: push artifact (SHA-tagged)
    CI->>DC: trigger deploy (artifact SHA)
    DC->>LB: route 1% to canary pods
    CA->>CA: collect metrics 15 min
    CA->>DC: pass (p99 OK, errors OK)
    DC->>LB: promote to 10%
    CA->>CA: collect metrics 15 min
    CA->>DC: pass
    DC->>LB: promote 50% then 100%
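The promotion logic in the sequence above is a small control loop: analyze at 1%, 10%, and 50%, then go to 100%, and roll back on the first regression. The sketch below passes the analysis and the rollback in as callbacks. `Analyze` and `Rollout` are hypothetical names.

```go
package main

import "fmt"

// analyzedStages are the traffic percentages that must pass canary analysis
// before the artifact is promoted to 100%.
var analyzedStages = []int{1, 10, 50}

// Analyze is a hypothetical hook: it runs the 15-minute canary analysis at the
// given traffic percentage and reports whether the canary regressed.
type Analyze func(pct int) (regressed bool)

// Rollout walks the stages in order. On the first regression it calls rollback
// (re-point at the previous SHA) and returns 0; otherwise it returns 100.
func Rollout(analyze Analyze, rollback func()) int {
	for _, pct := range analyzedStages {
		if analyze(pct) {
			rollback()
			return 0
		}
	}
	return 100
}

func main() {
	failsAt10 := func(pct int) bool { return pct == 10 } // pretend the 10% stage regresses
	final := Rollout(failsAt10, func() { fmt.Println("rolling back to previous SHA") })
	fmt.Println("final traffic percentage:", final)
}
```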

(d) Feature Flags. Decouple deploy from release. Code ships "dark": the feature exists in the artifact but is gated behind a flag. Each flag carries a targeting rule (e.g., user ID % 100 < rollout_pct) that the application evaluates locally. Rollout: 1% → 5% → 50% → 100%, independent of the deploy pipeline. If the feature causes issues, flip the flag off; no redeploy needed. Because evaluation reads an in-memory cache kept current by SSE push, per-request overhead is well under 1 ms.

Architecture: a central flag store (Postgres) holds flag definitions + targeting rules. Each application instance caches the full flag set in memory (~10 KB). The flag service pushes updates via SSE — when an engineer changes a flag, all instances receive the update within 2 seconds. Evaluation is local (no network call per request). Stale flags are a real problem: after a feature is fully rolled out, the flag should be cleaned up. A "flag hygiene" job alerts on flags older than 90 days with 100% rollout — time to remove the conditional and delete the flag.

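// Feature flag evaluation: local, no network
// (localCache and its flag type with an int RolloutPct field are assumed)
func isEnabled(flagKey string, userID int64) bool {
    flag := localCache.Get(flagKey) // in-memory lookup, ~0 ms
    if flag == nil {
        return false // unknown flag = off
    }
    bucket := uint64(userID) % 100 // unsigned, so negative IDs still land in 0..99
    return bucket < uint64(flag.RolloutPct) // deterministic bucketing
}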
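One caveat with `userID % 100` is that every flag enables the same low-numbered buckets first, so the same 1% of users receives every new feature. A common alternative, sketched here as an assumption rather than the document's design, hashes the flag key together with the user ID. Each flag then gets an independent cohort while staying deterministic. `Bucket` and `Enabled` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"strconv"
)

// Bucket maps (flagKey, userID) to 0..99. Salting with the flag key gives every
// flag its own independent cohort of users.
func Bucket(flagKey string, userID int64) uint64 {
	sum := sha256.Sum256([]byte(flagKey + ":" + strconv.FormatInt(userID, 10)))
	return binary.BigEndian.Uint64(sum[:8]) % 100
}

// Enabled is deterministic per (flag, user), and raising rolloutPct only adds users.
func Enabled(flagKey string, userID int64, rolloutPct int) bool {
	return int(Bucket(flagKey, userID)) < rolloutPct
}

func main() {
	on := 0
	for uid := int64(0); uid < 100000; uid++ {
		if Enabled("new-checkout", uid, 5) {
			on++
		}
	}
	fmt.Printf("%.2f%% of users enabled\n", float64(on)/1000) // close to 5%
}
```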

Interview answer

"Developer pushes code. CI spins an ephemeral container from a language-specific base image — hermetic, no network. Build produces an immutable, SHA-tagged artifact stored in S3. CD deploys the artifact to 1% of traffic as a canary. Over 15 minutes, we compare canary p99 latency and error rate against baseline using a Mann-Whitney test. Auto-promote on pass through 10% / 50% / 100%. Auto-rollback on fail — just re-point to the previous SHA, no rebuild. Feature flags let us decouple deploy from release: ship dark, enable gradually."

Anti-patterns

🚫

Deploy on Friday at 5 PM — "it's just a small change"

No one is watching dashboards over the weekend. A subtle regression in p99 latency compounds for 60 hours before Monday morning. Incident response time goes from minutes to days.

Better: Deploy windows during business hours. Freeze deploys Friday afternoon. Even with automated canary analysis, a human must be on-call to respond to the alerts it raises.

🚫

Mutable artifacts — "just SSH in and patch the JAR on the production server"

You can never reproduce what's running in prod. Rollback is impossible because the previous state is gone. Debugging requires guessing what was patched. Audit trail is destroyed.

Better: Immutable, SHA-tagged artifacts. Every deploy is a known, reproducible state. Rollback = point to previous SHA. Always.

🚫

All-at-once deploy with no canary — "we tested it in staging, ship it"

Staging never perfectly mirrors production traffic patterns. A bug that only manifests under real load hits 100% of users simultaneously. Blast radius is the entire fleet.

Better: Canary 1% with automated analysis. If it regresses, only 1% of users are affected, for at most 15 minutes. Auto-rollback limits blast radius.

Tradeoffs & Design Choices

Canary vs Blue-Green. Canary: gradual traffic shift (1% → 100%), fine-grained analysis, but slower rollout. Blue-Green: instant swap between two full environments, fast rollback, but double infrastructure cost and no gradual validation.
Hermetic builds vs cached network fetches. Hermetic (no network) = reproducible, secure, no supply-chain attack at build time. Cost: you must pre-populate and maintain the dependency cache. Network fetches (even through a cache) = less setup, but builds break if a dependency is yanked (the left-pad incident), and a compromised registry can inject code at build time.
Statistical canary vs threshold-based. A threshold (e.g., "error rate < 1%") gives false positives when the baseline already sits at 0.95%, and it misses a jump from 0.1% to 0.9%. Mann-Whitney compares distributions directly, so it detects real regressions even when absolute values look fine.
Feature flags vs deploy-gated releases. Feature flags add runtime complexity (flag evaluation on every request, stale flags accumulate). But they decouple risky feature rollouts from infrastructure deploys: turning a feature on or off is a flag change, not a new build that must go through another canary cycle.
Monorepo vs polyrepo CI. Monorepo: one pipeline triggers affected services (Google-style). Polyrepo: each repo owns its pipeline. Monorepo is simpler for cross-service changes but requires smart build-graph analysis to avoid rebuilding everything.
Rollback vs forward-fix. Rollback (re-deploy the old artifact) is fast, but it means running old code, possibly against a newer data schema. Forward-fix (push a new commit with the fix) is slower but maintains forward progress. Best practice: roll back immediately to stop the bleeding, then forward-fix at normal pace.

Failure Modes

💥

Build cluster overloaded — builds queued for 30+ minutes

1000 developers push simultaneously after a merge window. Build queue backs up. Developer velocity drops to zero.

→ Mitigation: auto-scale build cluster on queue depth. Priority lanes for hotfix branches. Build deduplication (same commit = reuse in-flight build).

🔥

Canary analysis misses a regression (false negative) and promotes a bad build

15-minute window isn't long enough to catch a slow memory leak. Canary passes, promotes to 100%, OOM crash 2 hours later.

→ Mitigation: post-promotion monitoring with auto-rollback triggers. Extended soak tests for memory/disk metrics. A separate slow-burn canary that runs for hours to catch gradual regressions such as leaks.
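One way to catch a slow leak during an extended soak is to fit a least-squares line to memory samples and alert on its slope. In the sketch below, `SlopePerHour` is a hypothetical helper, and the 10 MB/hour threshold in `main` is a hypothetical policy, not a recommendation.

```go
package main

import "fmt"

// SlopePerHour fits a least-squares line to (hoursSinceStart, rssMB) samples
// and returns its slope in MB/hour. A steady positive slope over a long soak
// is the signature of a slow leak that a 15-minute window can miss.
func SlopePerHour(hours, rssMB []float64) float64 {
	if len(hours) != len(rssMB) || len(hours) < 2 {
		return 0 // not enough paired samples to fit a line
	}
	n := float64(len(hours))
	var sx, sy, sxx, sxy float64
	for i := range hours {
		sx += hours[i]
		sy += rssMB[i]
		sxx += hours[i] * hours[i]
		sxy += hours[i] * rssMB[i]
	}
	den := n*sxx - sx*sx
	if den == 0 {
		return 0 // all samples taken at the same instant
	}
	return (n*sxy - sx*sy) / den
}

func main() {
	hours := []float64{0, 1, 2, 3, 4, 5, 6}
	rss := []float64{512, 530, 549, 566, 585, 602, 620} // steady growth
	slope := SlopePerHour(hours, rss)
	fmt.Printf("slope=%.1f MB/h leak=%v\n", slope, slope > 10) // slope=18.0 MB/h leak=true
}
```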

🔄

Artifact store corruption — SHA mismatch

S3 returns a corrupted artifact. Deploy rolls out a broken binary to canary pods.

→ Mitigation: verify SHA-256 checksum on pull before deploying. Content-addressed means any corruption is detectable. Fail deploy if checksum doesn't match.

⏰

Rollback cascade — new version is already depended on

Service A v2 deployed. Service B deployed against A v2's new API. Rolling back A to v1 breaks B.

→ Mitigation: backward-compatible API changes only. Feature flags to gate new endpoints. Expand-contract migration pattern for breaking changes.

🛡

Supply chain attack — compromised dependency

Malicious package published to npm/PyPI. Build fetches it and bakes it into the artifact.

→ Mitigation: hermetic builds with pinned, pre-verified dependency cache. SBOM (software bill of materials) generated per artifact. Dependency scanning in CI.

Interview Tips

Lead with immutability. "Artifacts are SHA-tagged and never overwritten. Rollback is just re-pointing to the previous SHA." This single idea simplifies everything.
Name the statistical test. "Mann-Whitney U for canary analysis" — shows you've thought beyond "just check if errors go up."
Distinguish deploy from release. "Deploy puts code on servers. Release exposes it to users. Feature flags let you do these independently." This nuance impresses.
Talk about blast radius. "Canary at 1% means worst case, 1% of users see a regression for 15 minutes before auto-rollback." Quantify the risk.
Mention hermetic builds. "No network during build. Dependencies from a pre-populated cache. Same commit = same artifact, always." Production-grade insight.

Evolution

MVP — Manual deploy via SSH

Developer builds locally, SCPs binary to server, restarts the process manually. Works for 1 server. No rollback — if something breaks, you SSH in and fix it live. Deployment is a human ritual, not a pipeline.

CI server + scripted deploy

Jenkins watches the repo, builds on push. A shell script SCPs the artifact to every server and restarts. All-at-once — if it's broken, every server is broken. Rollback = re-run the old build, which takes 5 minutes. Better, but still painful.

Immutable artifacts + container orchestration

Docker images tagged by SHA-256. Kubernetes rolling update replaces pods gradually. Rollback = kubectl set image deployment/X <container>=<registry>/<image>@sha256:<prev>, or simply kubectl rollout undo deployment/X. No rebuild needed. First taste of sub-minute rollback. Handles 100+ services.

Canary deploys + automated statistical analysis

New artifacts go to 1% of traffic first. Canary analyzer runs Mann-Whitney test on p99 latency and error rate vs baseline over 15 minutes. Auto-promote through 10% / 50% / 100% or auto-rollback. Blast radius limited to 1% of users for at most 15 minutes.

Feature flags + hermetic builds + SBOM

Full supply-chain security with hermetic builds and software bill of materials. Feature flags decouple deploy from release — ship code dark, enable gradually. Progressive delivery with fine-grained audience targeting (by user segment, geography, device). Flag hygiene automation cleans up stale flags.
