(a) Build Isolation. Each build runs in an ephemeral container started from a per-language base image (e.g., ci-golang:1.22, ci-java:21). The container has no outbound network access during the build, which makes builds hermetic. Dependencies are pre-fetched into a shared, content-addressed cache layer. Result: builds are reproducible. Provided the toolchain itself is deterministic (no embedded timestamps or absolute build paths), the same commit SHA yields the same artifact SHA. The container is destroyed after the build, so no state leaks between builds.
Why hermetic? Three reasons: (1) Reproducibility: build the same commit six months later and get the exact same binary. No "it worked on my machine." (2) Security: with no network, a compromised dependency mirror can't inject code at build time; the only dependencies available are the pre-fetched ones, whose content hashes are fixed. (3) Cacheability: because builds are deterministic, an unchanged module produces the same output, so its build can be skipped entirely (build-graph caching).
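# syntax=docker/dockerfile:1
# Hermetic build: container spec (BuildKit; dep-cache holds the pre-fetched, content-addressed modules)
FROM ci-golang:1.22
COPY --from=dep-cache /deps /go/pkg/mod
COPY . /src
WORKDIR /src
# Fail instead of downloading if a module is missing from the cache
ENV GOPROXY=off
# No network access; -trimpath keeps host paths out of the binary
RUN --network=none go build -trimpath -o /out/app .
# Tests are hermetic too
RUN --network=none go test ./...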
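Reason (3) only holds if the cache key covers every input that can change a module's output. A minimal Python sketch, assuming those inputs are the toolchain image tag, the dependency digests, and the module's source files; `cache_key` and `BuildCache` are hypothetical names, not the API of any specific build tool.

```python
import hashlib
from collections.abc import Callable


def _feed(h, data: bytes) -> None:
    # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def cache_key(toolchain_image: str, dep_digests: list[str],
              sources: dict[str, bytes]) -> str:
    """Hash every input that can change a module's build output."""
    h = hashlib.sha256()
    _feed(h, toolchain_image.encode())
    for digest in sorted(dep_digests):  # order-independent
        _feed(h, digest.encode())
    for path in sorted(sources):  # dict order must not change the key
        _feed(h, path.encode())
        _feed(h, hashlib.sha256(sources[path]).digest())
    return h.hexdigest()


class BuildCache:
    """Maps cache key -> build output; a hit skips the build entirely."""

    def __init__(self) -> None:
        self._outputs: dict[str, bytes] = {}
        self.builds_run = 0

    def build(self, key: str, run_build: Callable[[], bytes]) -> bytes:
        if key in self._outputs:
            return self._outputs[key]  # inputs unchanged: reuse prior output
        self.builds_run += 1
        output = run_build()
        self._outputs[key] = output
        return output


# Usage: two builds with identical inputs run the build step only once.
cache = BuildCache()
key = cache_key("ci-golang:1.22", ["sha256:aa"], {"main.go": b"package main"})
first = cache.build(key, lambda: b"app-binary")
second = cache.build(key, lambda: b"never-called")
```

Sorting and length-prefixing make the key independent of iteration order and unambiguous, so identical inputs always map to the same key.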
(b) Artifact Store. S3-backed, content-addressed by the SHA-256 of the artifact. Key design: never overwrite. Once sha256:abc123 is written, it is immutable. Rollback is trivial: point the deploy config at the previous SHA. No rebuild required. Retention policy garbage-collects artifacts older than 90 days with no active deploy reference. Total store size ~50 TB across org.
Content-addressing gives you deduplication for free: if two services produce byte-identical artifacts, they share one object in S3. The store is append-only, and writes are idempotent (PUT with If-None-Match: *, so a repeated write of an existing key is refused rather than overwriting it). Reads go through a CDN layer for geographic caching; a deploy in us-east pulls artifacts from the local edge, not cross-region. Metadata (build_id, commit, timestamp, SBOM) lives in Postgres; the blob itself lives in S3. Artifacts are pulled through short-lived signed URLs that are issued only to authorized deploy agents.
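// Artifact store PUT: idempotent, content-addressed.
// {sha256} is the SHA-256 of the body. If-None-Match: * means "write only if
// absent": a repeat PUT gets 412 Precondition Failed instead of overwriting.
PUT /artifacts/{sha256} HTTP/1.1
If-None-Match: *
Content-Type: application/octet-stream

<docker image tar>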
// Deploy config — just a pointer to a SHA
{ "service": "payments",
"artifact_sha": "sha256:abc123",
"replicas": 50,
"canary_pct": 1 }
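To make the write-once, rollback, and retention rules concrete, here is a minimal in-memory Python sketch. It assumes the store recomputes the SHA-256 of each upload and rejects a body that does not match its key (the text does not say whether the real store does this), and it models the 90-day garbage collection as "older than 90 days and not referenced by any deploy config". `ArtifactStore`, `content_address`, and `gc` are hypothetical names, not an S3 API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

NINETY_DAYS_S = 90 * 24 * 3600


def content_address(body: bytes) -> str:
    return "sha256:" + hashlib.sha256(body).hexdigest()


@dataclass
class ArtifactStore:
    """In-memory stand-in for the S3 bucket: append-only, keyed by content hash."""
    blobs: dict[str, bytes] = field(default_factory=dict)
    created_at: dict[str, float] = field(default_factory=dict)

    def put(self, sha: str, body: bytes, now: float) -> bool:
        """Write once. Returns False (the HTTP 412 case) if the key already exists."""
        if content_address(body) != sha:
            raise ValueError("body does not match its content address")
        if sha in self.blobs:
            return False  # never overwrite; same key means same bytes anyway
        self.blobs[sha] = body
        self.created_at[sha] = now
        return True

    def get(self, sha: str) -> Optional[bytes]:
        return self.blobs.get(sha)

    def gc(self, now: float, referenced: set[str],
           max_age_s: float = NINETY_DAYS_S) -> list[str]:
        """Delete artifacts older than max_age_s that no deploy config points at."""
        doomed = [sha for sha, t in self.created_at.items()
                  if now - t > max_age_s and sha not in referenced]
        for sha in doomed:
            del self.blobs[sha]
            del self.created_at[sha]
        return doomed


# Usage: rollback is just re-pointing the deploy config at an older SHA.
store = ArtifactStore()
v1, v2 = b"release-1", b"release-2"
sha1, sha2 = content_address(v1), content_address(v2)
store.put(sha1, v1, now=0.0)
store.put(sha2, v2, now=10.0)
deploy = {"service": "payments", "artifact_sha": sha2}
deploy["artifact_sha"] = sha1  # rollback: no rebuild, no re-upload
```

Because the key is derived from the bytes, a duplicate write is harmless and a rollback never needs to touch the blob store, only the pointer.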
(c) Canary Analysis. When a new artifact is deployed, it receives 1% of production traffic. Over a 15-minute window, the canary analyzer collects per-request latency (the distribution behind p99) and error rate from both canary (new) and baseline (old) pods. It applies a Mann-Whitney U test (non-parametric, no normality assumption) to determine whether the canary is statistically worse. If the test detects a regression (p < 0.05, with the canary on the worse side), auto-rollback fires within 60 seconds. Otherwise the canary is promoted to 10% and the analysis repeats, then 50%, then 100%.
Why Mann-Whitney instead of a simple threshold? A threshold like "error rate < 1%" is meaningless if your baseline already runs at 0.95%. Mann-Whitney compares the canary and baseline sample distributions, so it detects relative regressions ("canary is statistically worse than baseline") regardless of absolute values. At ~1,000 req/s of total traffic, 1% over 15 minutes gives ~9K canary samples (10 req/s × 900 s), typically ample power to detect a modest shift. And because the test makes no normality assumption, it suits latency distributions, which are heavily right-skewed.
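# Canary analysis (Python; collect_latencies, trigger_rollback and
# promote_to_next_stage stand in for the metrics and deploy APIs)
from scipy.stats import mannwhitneyu
# Per-request latency samples, not a single p99 value per window
canary_latencies = collect_latencies(canary_pods, window="15m")
baseline_latencies = collect_latencies(baseline_pods, window="15m")
# One-sided test: is canary latency stochastically greater than baseline?
stat, p_value = mannwhitneyu(canary_latencies, baseline_latencies, alternative="greater")
if p_value < 0.05:
    trigger_rollback()        # canary is worse; roll back within 60s
else:
    promote_to_next_stage()   # 1% → 10% → 50% → 100%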
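For readers without SciPy at hand, here is roughly what the test computes. This stdlib-only Python sketch uses the large-sample normal approximation with tie and continuity corrections, which is reasonable for thousands of samples; for small samples an exact test is preferable. `mann_whitney_greater` and `canary_is_worse` are hypothetical helper names.

```python
import math


def mann_whitney_greater(x: list[float], y: list[float]) -> tuple[float, float]:
    """One-sided Mann-Whitney U test, H1: values in x tend to exceed values in y.

    Returns (U statistic for x, p-value) via the normal approximation.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the mean of ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2  # pairs where x beats y (ties count 1/2)
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every value identical: no evidence of a regression
    z = (u1 - mean_u - 0.5) / math.sqrt(var_u)  # continuity correction
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z), standard normal
    return u1, p_value


def canary_is_worse(canary: list[float], baseline: list[float],
                    alpha: float = 0.05) -> bool:
    _, p_value = mann_whitney_greater(canary, baseline)
    return p_value < alpha
```

Only ranks enter the statistic, which is why a few extreme latency outliers cannot dominate the result the way they would dominate a comparison of means.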
Canary Deploy Sequence (Mermaid)
sequenceDiagram
participant D as Developer
participant CI as CI Build
participant AS as Artifact Store
participant DC as Deploy Controller
participant CA as Canary Analyzer
participant LB as Load Balancer
D->>CI: git push (commit SHA)
CI->>CI: build + test in container
CI->>AS: push artifact (SHA-tagged)
CI->>DC: trigger deploy (artifact SHA)
DC->>LB: route 1% to canary pods
CA->>CA: collect metrics 15 min
CA->>DC: pass (p99 OK, errors OK)
DC->>LB: promote to 10%
CA->>CA: collect metrics 15 min
CA->>DC: pass
DC->>LB: promote 50% then 100%
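The promotion logic in the sequence can be expressed as a small loop. In this Python sketch, `set_canary_pct`, `analyze`, and `rollback` are hypothetical hooks into the load balancer, canary analyzer, and deploy config. It also assumes analysis runs at every stage below 100%, including 50%, which the diagram compresses into a single "50% then 100%" step.

```python
from collections.abc import Callable

STAGES = (1, 10, 50, 100)  # percent of traffic on the canary


def run_canary(set_canary_pct: Callable[[int], None],
               analyze: Callable[[int], bool],
               rollback: Callable[[], None]) -> bool:
    """Walk the stages; return True if the artifact reached 100%."""
    for pct in STAGES[:-1]:
        set_canary_pct(pct)
        if not analyze(pct):  # e.g. the 15-minute Mann-Whitney check
            rollback()  # hypothetical hook: shift traffic back, re-point to previous SHA
            return False
    set_canary_pct(STAGES[-1])  # 100%: no baseline left to compare against
    return True
```

Keeping the stage list as data means changing the ramp (say, adding a 25% step) does not touch the control flow.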
(d) Feature Flags. Decouple deploy from release. Code ships "dark": the feature exists in the artifact but is gated behind a flag. Each flag carries targeting rules (e.g., user ID % 100 < rollout_pct), which every application instance evaluates locally. Rollout: 1% → 5% → 50% → 100%, independent of the deploy pipeline. If the feature causes issues, flip the flag off; no redeploy needed. Because evaluation hits an in-memory cache (kept fresh by SSE push), runtime overhead stays under ~1 ms.
Architecture: a central flag store (Postgres) holds flag definitions and targeting rules. Each application instance caches the full flag set in memory (~10 KB). The flag service pushes updates via SSE, so when an engineer changes a flag, all instances receive the update within 2 seconds. Evaluation is local (no network call per request). Stale flags are a real problem: after a feature is fully rolled out, the flag should be cleaned up. A "flag hygiene" job alerts on flags older than 90 days with 100% rollout, signalling that it is time to remove the conditional and delete the flag.
// Feature flag evaluation: local, no network (localCache and Flag types assumed)
func isEnabled(flagKey string, userID int64) bool {
	flag := localCache.Get(flagKey) // in-memory lookup, no network call
	if flag == nil {
		return false // unknown flag = off
	}
	bucket := ((userID % 100) + 100) % 100 // Go's % can be negative; map to 0..99
	return bucket < int64(flag.RolloutPct) // deterministic bucketing
}
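One property of the `userID % 100` rule is that cohorts are shared across flags: users whose IDs end in 00 fall in bucket 0 and are in every flag's 1% rollout. A common variant, swapped in here and not necessarily what this system does, hashes the flag key together with the user ID. The Python sketch also shows the flag-hygiene check, assuming a flag's age is measured from its creation time; `Flag`, `bucket`, `is_enabled`, and `stale_flags` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

NINETY_DAYS_S = 90 * 24 * 3600


@dataclass
class Flag:
    key: str
    rollout_pct: int   # 0..100
    created_at: float  # unix seconds (assumed field)


def bucket(flag_key: str, user_id: int) -> int:
    """Stable bucket in 0..99, salted with the flag key so cohorts differ per flag."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(cache: dict[str, Flag], flag_key: str, user_id: int) -> bool:
    flag = cache.get(flag_key)  # local in-memory lookup, no network call
    if flag is None:
        return False  # unknown flag = off
    return bucket(flag_key, user_id) < flag.rollout_pct


def stale_flags(cache: dict[str, Flag], now: float,
                max_age_s: float = NINETY_DAYS_S) -> list[str]:
    """Fully rolled-out flags created more than max_age_s ago: cleanup candidates."""
    return sorted(f.key for f in cache.values()
                  if f.rollout_pct >= 100 and now - f.created_at > max_age_s)
```

The hash keeps bucketing deterministic (same user, same flag, same answer) while spreading each flag's early cohort across a different set of users.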
Interview answer
"Developer pushes code. CI spins up an ephemeral container from a language-specific base image: hermetic, no network. The build produces an immutable, SHA-tagged artifact stored in S3. CD deploys the artifact to 1% of traffic as a canary. Over 15 minutes, we compare canary p99 latency and error rate against baseline using a Mann-Whitney U test. On a pass, auto-promote through 10% / 50% / 100%. On a fail, auto-rollback: just re-point to the previous SHA, no rebuild. Feature flags let us decouple deploy from release: ship dark, enable gradually."