Flash Sale

100K users hit "Buy" in the same second for 1K limited items, and none may be oversold. The hard parts: a waiting room that absorbs the thundering herd without crashing your backend, an atomic inventory decrement that never goes negative even under 100K concurrent requests, and a payment pipeline that pre-authorizes cards while users wait in the queue so checkout is instant for winners. Amazon Prime Day, Xiaomi phone drops, Supreme releases: same pattern, different merch.

⚡ Core: Waiting Room + Atomic Inventory + Fair Queue · 100K concurrent users · 1K items · Sub-second checkout · Zero oversell
02

Requirements

Functional
  • Admin creates a flash sale event: product, quantity, start time, duration
  • Users enter a waiting room before sale starts; admitted in controlled batches at T=0
  • Admitted user sees product page with live inventory count → clicks Buy → instant checkout
  • Payment pre-authorized during queue wait; charge on successful purchase only
  • Fair ordering: FIFO within the waiting room; no advantage to page-refresh spam
  • Sold-out notification to remaining queued users immediately
Non-Functional
  • Zero oversell — never sell item #1001 of 1000
  • Checkout latency < 500 ms for admitted users
  • Waiting room absorbs 100K+ concurrent connections
  • Sale starts within 1 sec of the configured time (clock-accurate)
  • Survive bot attacks: CAPTCHA + rate-limit + proof-of-work
  • Graceful: sold-out ≠ error; it's a clean UX state
03

Scale Estimation

  • Concurrent users at T=0: ~100K, all hitting the same endpoint within 1 second
  • Inventory: 1K flash-sale items; sometimes as low as 10 for hype drops
  • Admission rate: ~500/sec, controlled drain from waiting room to checkout
  • Checkout latency: < 500 ms (pre-authed payment → atomic decrement → confirm)
  • Bot traffic: ~60% of total requests during hype drops; must be filtered
  • Sale duration: ~2–30 min; popular items sell out in seconds, long-tail in minutes
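
A quick back-of-envelope check of what these estimates imply. This is only arithmetic on the figures above, under the simplifying assumption that every admitted user buys exactly one unit immediately.

```python
# Back-of-envelope numbers taken from the estimates above.
concurrent_users = 100_000
inventory = 1_000
admission_rate_per_sec = 500

# Time to admit every queued user if the sale never sold out.
full_drain_sec = concurrent_users / admission_rate_per_sec

# Earliest possible sell-out: every admitted user buys one unit at once.
min_sellout_sec = inventory / admission_rate_per_sec

# Share of queued users who can win a unit at all.
win_ratio = inventory / concurrent_users

print(f"full drain: {full_drain_sec:.0f} s")          # full drain: 200 s
print(f"earliest sell-out: {min_sellout_sec:.0f} s")  # earliest sell-out: 2 s
print(f"win ratio: {win_ratio:.1%}")                  # win ratio: 1.0%
```

At these numbers the queue mostly exists to deliver a clean "sold out" to the other 99% of users, which is why the sold-out push path matters as much as checkout.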
04

API Design

POST /api/flash-sales/{sale_id}/enter-queue

User enters waiting room. Returns {queue_ticket, position, estimated_wait_sec}. Ticket is a signed JWT with enqueue timestamp — proves FIFO position. Requires CAPTCHA token.
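
A minimal sketch of a signed ticket, assuming an HMAC-SHA256 signature over a small JSON payload. It is a simplified stand-in for a real JWT library, and the field names (`sub`, `sale`, `enq`), the `SECRET` value, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret"  # assumption: a managed signing key in practice


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_ticket(user_id: str, sale_id: str, enqueue_ts_ms: int) -> str:
    """Sign the server-assigned enqueue timestamp so the client cannot alter it."""
    payload = json.dumps(
        {"sub": user_id, "sale": sale_id, "enq": enqueue_ts_ms},
        separators=(",", ":"),
        sort_keys=True,
    ).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(sig)}"


def verify_ticket(ticket: str) -> Optional[dict]:
    """Return the payload if the signature matches, else None."""
    try:
        payload_b64, sig_b64 = ticket.split(".")
        payload = _unb64(payload_b64)
        sig = _unb64(sig_b64)
    except ValueError:  # wrong part count or invalid base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(payload)


ticket = issue_ticket("user-1", "sale-42", 1_700_000_000_000)
print(verify_ticket(ticket)["enq"])  # 1700000000000
```

Because the timestamp is inside the signed payload, a client that edits it to claim an earlier position fails verification.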

GET /api/flash-sales/{sale_id}/status?ticket=TOKEN

Poll queue status. Returns {status: waiting|admitted|sold_out, position, inventory_remaining}. Client polls every 2 s or receives SSE push.

POST /api/flash-sales/{sale_id}/purchase

Admitted user purchases. Body: {ticket, payment_token, quantity: 1}. Atomic inventory decrement + payment capture. Returns {order_id, status: success|sold_out}. Idempotency via ticket — can't buy twice.

POST /api/flash-sales/{sale_id}/pre-auth

Pre-authorize payment while in queue. Returns {auth_id}. If user doesn't win, auth voided automatically after 30 min.

GET /api/flash-sales/{sale_id}/inventory

Live inventory count. Served from Redis; eventually consistent (~1 s lag). Used for countdown UI.

05

Architecture

Three tiers isolated by concern: queue tier (absorb + order the herd), checkout tier (atomic purchase), payment tier (pre-auth + capture). A CDN serves the static product page; only the queue-enter and purchase calls hit the backend.

Flash Sale Architecture (diagram)
  • Users (100K): browsers + bots
  • CDN + WAF: static page + CAPTCHA
  • API Gateway: rate-limit + bot filter
  • Queue Service: Redis Sorted Set, FIFO by enqueue_ts
  • Admission Controller: drains N/sec
  • Checkout svc: atomic purchase
  • Payment svc: pre-auth + capture
  • Redis inventory: DECR if > 0 (Lua)
  • Postgres orders: authoritative ledger
  • Kafka events: order created / sold out
  • Notification: push sold-out to all remaining queue users via SSE / WebSocket
Request Flow
User (enters queue) → CDN + WAF (CAPTCHA + static) → Queue svc (Redis ZADD) → Admission ctrl (drains N/sec) → Checkout svc (atomic purchase) → Redis inventory (Lua DECR) → Payment (capture pre-auth)
06

Deep Dive — Atomic Inventory + Waiting Room

The single hardest problem: 100K users all call POST /purchase in 1 second. If each checks inventory, sees "999 remaining," and then decrements — you sell 100K items instead of 1K. Classic lost-update / check-then-act race.

Solution: Redis Lua atomic decrement.

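-- Redis Lua script: atomic decrement-if-positive
-- KEYS[1] = inventory key for the sale
local stock = tonumber(redis.call('GET', KEYS[1]))
-- A missing key makes GET return false, so stock is nil; treat that as sold out
if stock and stock > 0 then
  redis.call('DECR', KEYS[1])
  return 1  -- success: caller got a unit
else
  return 0  -- sold out (or sale not initialized)
end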

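Redis runs each script atomically: no other command executes while the script is running, so the read and the decrement cannot interleave with another buyer's. No race. No oversell. Each call either gets a unit or doesn't. The caller gets a definitive answer in < 1 ms of server time.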
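
To see the decrement-if-positive guarantee without a Redis server, here is a minimal in-process sketch. It assumes a `threading.Lock` as a stand-in for Redis's one-at-a-time script execution; `InventoryCounter` and `run_sale` are hypothetical names for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


class InventoryCounter:
    """In-process stand-in for the single Redis inventory key."""

    def __init__(self, stock: int) -> None:
        self._stock = stock
        # The lock plays the role of Redis running one script at a time.
        self._lock = threading.Lock()

    def try_take(self) -> bool:
        """Check and decrement as one indivisible step."""
        with self._lock:
            if self._stock > 0:
                self._stock -= 1
                return True
            return False

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._stock


def run_sale(stock: int, buyers: int, workers: int = 64) -> Tuple[int, int]:
    """Fire `buyers` concurrent purchase attempts; return (sold, remaining)."""
    counter = InventoryCounter(stock)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: counter.try_take(), range(buyers)))
    return sum(results), counter.remaining


sold, remaining = run_sale(stock=1_000, buyers=20_000)
print(sold, remaining)  # 1000 0
```

However many buyers contend, exactly `stock` calls succeed and the counter never goes below zero, which is the property the Lua script provides on the real key.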

Waiting room. Without it, 100K users slam the checkout tier simultaneously — even if inventory check is atomic, the backend can't handle 100K DB writes/sec. Solution: a queue that controls admission rate.

  1. User enters queue. Redis ZADD with score = enqueue timestamp. Returns ticket (signed JWT) + position.
  2. Admission controller (a timer loop) pops the top N users per second (e.g., 500/sec) from the sorted set, lowest score first, and marks them "admitted" in Redis.
  3. Admitted user's next poll returns status=admitted. Client-side JS transitions to checkout page.
  4. User calls POST /purchase with ticket. Checkout service verifies ticket is admitted, calls Redis Lua decrement, on success writes order to Postgres + captures pre-authed payment.
  5. When inventory hits 0, admission controller stops admitting. Kafka event fires. SSE/WS pushes "sold out" to all remaining queue connections.
Purchase Sequence: Queue to Checkout (Mermaid)
sequenceDiagram
  participant U as User
  participant Q as Queue svc
  participant AC as Admission ctrl
  participant C as Checkout svc
  participant R as Redis (inventory)
  participant P as Payment svc
  U->>Q: POST /enter-queue (CAPTCHA)
  Q-->>U: ticket + position=4521
  U->>Q: GET /status (poll)
  Q-->>U: waiting, position=312
  AC->>Q: pop top 500 from ZSET
  Q-->>AC: [user_tokens...]
  AC->>Q: mark admitted
  U->>Q: GET /status
  Q-->>U: admitted
  U->>C: POST /purchase (ticket)
  C->>R: Lua DECR-if-positive
  R-->>C: 1 (success)
  C->>P: capture pre-auth
  P-->>C: captured
  C-->>U: order confirmed
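
The queue-and-admit steps above can be sketched in memory. This is an illustration under stated assumptions, not the service itself: a heap stands in for the Redis sorted set, the first-enqueue-wins rule mirrors what `ZADD` with the `NX` option would give, and `admit_batch` mirrors a `ZPOPMIN` of N members. The `WaitingRoom` class and its method names are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple


class WaitingRoom:
    """In-memory stand-in for the sorted-set queue (score = enqueue time)."""

    def __init__(self) -> None:
        # Heap entries: (enqueue_ts_ms, arrival sequence, user_id).
        self._heap: List[Tuple[int, int, str]] = []
        self._enqueued: Dict[str, int] = {}  # user_id -> first enqueue time
        self._seq = 0
        self.admitted: Set[str] = set()

    def enter(self, user_id: str, now_ms: int) -> int:
        """First enqueue wins; repeat calls keep the original timestamp."""
        if user_id not in self._enqueued:
            self._enqueued[user_id] = now_ms
            heapq.heappush(self._heap, (now_ms, self._seq, user_id))
            self._seq += 1
        return self._enqueued[user_id]

    def admit_batch(self, n: int) -> List[str]:
        """Pop up to n users with the earliest timestamps and mark them admitted."""
        batch: List[str] = []
        while self._heap and len(batch) < n:
            _, _, user_id = heapq.heappop(self._heap)
            self.admitted.add(user_id)
            batch.append(user_id)
        return batch

    def status(self, user_id: str) -> str:
        if user_id in self.admitted:
            return "admitted"
        return "waiting" if user_id in self._enqueued else "unknown"


room = WaitingRoom()
room.enter("alice", now_ms=1000)
room.enter("bob", now_ms=1005)
room.enter("carol", now_ms=1010)
room.enter("alice", now_ms=1020)  # page refresh: original position kept
print(room.admit_batch(2))   # ['alice', 'bob']
print(room.status("carol"))  # waiting
```

Ignoring repeat enqueues is what removes any advantage from refresh spam: a user's position is fixed by the first server-assigned timestamp.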

Pre-authorization. While user is in queue, client silently calls POST /pre-auth with their saved card. If they win a slot, capture is instant — no "enter your card" delay at checkout. If they lose, auth auto-voids after 30 min. UX: "You're in line. We'll charge your card only if you get one."

Interview answer

"CDN serves the product page. Users enter a FIFO waiting room (Redis sorted set by enqueue timestamp). An admission controller drains N users/sec into checkout. Inventory is a single Redis key decremented atomically via a Lua script — no oversell possible. Payment is pre-authorized during queue wait; on admission + successful decrement, payment captured instantly. When inventory hits 0, 'sold out' pushed to all remaining queue users via SSE. Total purchase latency for admitted user: < 500 ms."

Anti-patterns

🚫
SELECT stock FROM products WHERE id=X; if stock > 0 then UPDATE products SET stock = stock - 1 WHERE id=X

Classic check-then-act race. 100K users all see stock=999 simultaneously; all decrement. You sell 100K items.

✓ Better: Redis Lua atomic DECR-if-positive. Single-threaded, no race, sub-ms.
🚫
Let all 100K users hit checkout simultaneously — "the server will handle it"

No server handles 100K concurrent DB writes/sec gracefully. You get 503s, timeouts, and angry customers.

✓ Better: Waiting room with controlled admission rate (500/sec). Backend sees a smooth stream, not a spike.
🚫
Checkout asks user to enter card info after winning the slot

User takes 30 seconds to type card number. Slot is wasted. Others in queue wait longer. Conversion drops.

✓ Better: Pre-authorize payment while in queue. Checkout = one-click capture. Sub-second.
07

Tradeoffs & Design Choices

  • Redis atomic vs Postgres FOR UPDATE. Redis Lua: ~50K ops/sec on single key, sub-ms. Postgres pessimistic lock: ~5K/sec, 2–5 ms. Redis wins overwhelmingly for the hot path. Postgres is the authoritative ledger written after Redis succeeds.
  • Waiting room (fair, controlled) vs open free-for-all (unfair, chaotic). Without a queue, users with lower latency (closer server, faster network) win. A queue equalizes: everyone gets a ticket; admission is FIFO. Supreme and Nike SNKRS both adopted queues for this reason.
  • Per-item vs per-SKU inventory. Flash sales typically have one SKU. Single Redis key works. Multi-SKU (sizes, colors) needs one key per variant — same Lua pattern, multiple keys.
  • Optimistic (check-and-set) vs pessimistic (lock). At 100K contenders, optimistic = 99K retries. Pessimistic = sequential. Redis Lua is effectively "pessimistic within a single thread" — no retries, no contention, because there's only one thread.
  • Pre-auth cost. Pre-authorizing 100K cards when only 1K win = 99K void auths. Payment processors charge ~$0.01 per auth. 100K × $0.01 = $1K per sale — acceptable for high-value items; questionable for $5 items.
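
The pre-auth cost bullet can be worked through as arithmetic. The sketch below uses the document's rough $0.01-per-auth figure (actual processor fees vary) and two illustrative item prices; all variable names are hypothetical.

```python
queued_users = 100_000
winners = 1_000
fee_per_auth_cents = 1  # the rough $0.01 figure above; real fees vary

total_auth_cost_usd = queued_users * fee_per_auth_cents / 100
voided_auths = queued_users - winners
# Pre-auth spend attributed to each unit actually sold.
cost_per_unit_sold_usd = total_auth_cost_usd / winners

print(f"total pre-auth cost: ${total_auth_cost_usd:,.0f}")       # $1,000
print(f"voided auths: {voided_auths:,}")                         # 99,000
print(f"cost per unit sold: ${cost_per_unit_sold_usd:.2f}")      # $1.00

for item_price_usd in (5, 500):
    overhead = cost_per_unit_sold_usd / item_price_usd
    print(f"${item_price_usd} item: overhead {overhead:.1%} of price")
# $5 item: overhead 20.0% of price
# $500 item: overhead 0.2% of price
```

Seen per unit sold, the fee is about $1: negligible on a $500 phone, a large share of a $5 item.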
08

Failure Modes

🤖
Bot army floods the queue
Scalper bots create 10K queue entries in 1 second with different IPs / accounts.
→ Mitigation: CAPTCHA on queue entry; proof-of-work (client computes hash challenge); device fingerprinting; rate-limit per account + IP; ban known bot ASNs.
💥
Redis crashes mid-sale
Inventory key lost. Remaining stock unknown. Oversell risk if we guess wrong.
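→ Mitigation: Redis with AOF persistence (fsync every second). On crash-restart, replay AOF to recover the inventory count; because up to about a second of decrements can be missing, the recovered count may be too high. Belt: the Postgres order count provides the authoritative bound (total units minus confirmed orders), so never sell more than Postgres says remains.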
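
The reconciliation against Postgres could be as small as the function below. It is a sketch that assumes the sale is paused while the counter is re-seeded and that `ledger_orders` is the count of confirmed orders in Postgres; the function and parameter names are hypothetical.

```python
from typing import Optional


def recovered_stock(
    total_units: int, ledger_orders: int, redis_stock: Optional[int]
) -> int:
    """Pick a safe inventory value to resume the sale with.

    redis_stock is the count replayed from AOF, or None if the key was lost.
    """
    ledger_remaining = max(total_units - ledger_orders, 0)
    if redis_stock is None:
        return ledger_remaining
    # Trust whichever source says fewer units are left.
    return max(min(redis_stock, ledger_remaining), 0)


print(recovered_stock(1000, 400, 612))   # 600: replayed count was stale
print(recovered_stock(1000, 400, None))  # 600: key lost, rebuild from ledger
print(recovered_stock(1000, 400, 590))   # 590: in-flight orders not yet in ledger
```

Taking the minimum errs toward underselling, which is recoverable; overselling is not.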
🔁
User retries purchase on timeout
Network hiccup between checkout svc and user. User retries. Charged twice.
→ Mitigation: ticket-based idempotency. Each ticket can purchase exactly once. Retry with same ticket returns the same order_id.
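
A minimal in-memory sketch of ticket-based idempotency. It assumes a dictionary keyed by ticket as the stand-in for a persistent store (for example, a uniqueness constraint on the ticket in the orders table); `PurchaseService` and its `captures` counter are hypothetical.

```python
import threading
import uuid
from typing import Dict


class PurchaseService:
    """Each ticket maps to at most one order; retries replay the stored result."""

    def __init__(self, stock: int) -> None:
        self._stock = stock
        self._orders: Dict[str, str] = {}  # ticket -> order_id
        self._lock = threading.Lock()
        self.captures = 0  # counts payment captures, for illustration

    def purchase(self, ticket: str) -> dict:
        with self._lock:
            if ticket in self._orders:
                # Retry after a timeout: same order, no second charge.
                return {"status": "success", "order_id": self._orders[ticket]}
            if self._stock <= 0:
                return {"status": "sold_out", "order_id": None}
            self._stock -= 1
            order_id = uuid.uuid4().hex
            self._orders[ticket] = order_id
            self.captures += 1  # stands in for one capture of the pre-auth
            return {"status": "success", "order_id": order_id}


svc = PurchaseService(stock=1)
first = svc.purchase("ticket-1")
retry = svc.purchase("ticket-1")  # client timed out and retried
print(first == retry, svc.captures)        # True 1
print(svc.purchase("ticket-2")["status"])  # sold_out
```

The retry neither consumes a second unit nor triggers a second capture, because the ticket lookup happens before any side effect.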
Clock skew — sale starts early on some servers
Server A's clock is 2 seconds ahead of Server B. Users on A enter queue 2 seconds early → unfair advantage.
→ Mitigation: queue-open time enforced by a single Redis key (admin sets flag at T=0 via one write). Servers check Redis, not local clock, for "is queue open?"
📉
Admitted user abandons checkout
User is admitted but never calls /purchase. Slot wasted; users behind them wait longer.
→ Mitigation: admission ticket expires in 60 seconds. If not used, admission is revoked and next user in queue is admitted.
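
Expiry and refill can be sketched as a periodic tick. The model below assumes a fixed cap on concurrently admitted users (`slots`), which is a simplification added for illustration rather than something stated above; the class and method names are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
from collections import deque
from typing import Deque, Dict, List

ADMISSION_TTL_SEC = 60  # the 60-second window described above


class AdmissionController:
    def __init__(self, waiting: List[str], slots: int) -> None:
        self._waiting: Deque[str] = deque(waiting)  # already in FIFO order
        self._slots = slots                         # assumed concurrency cap
        self._expires_at: Dict[str, float] = {}     # admitted user -> deadline

    def tick(self, now: float) -> List[str]:
        """Revoke expired admissions, then refill free slots from the queue."""
        for user, deadline in list(self._expires_at.items()):
            if now >= deadline:
                del self._expires_at[user]
        newly_admitted: List[str] = []
        while self._waiting and len(self._expires_at) < self._slots:
            user = self._waiting.popleft()
            self._expires_at[user] = now + ADMISSION_TTL_SEC
            newly_admitted.append(user)
        return newly_admitted

    def complete(self, user: str, now: float) -> bool:
        """Called on purchase; False if the admission expired or never existed."""
        deadline = self._expires_at.pop(user, None)
        return deadline is not None and now < deadline


ac = AdmissionController(["a", "b", "c"], slots=2)
print(ac.tick(now=0))   # ['a', 'b']
print(ac.tick(now=30))  # []
print(ac.tick(now=60))  # ['c']
```

At `now=60` both idle admissions lapse and the next waiting user takes a freed slot, so an abandoned checkout delays the queue by at most one TTL.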
09

Interview Tips

  1. Lead with the waiting room. "100K users → 500/sec admission rate → smooth backend load." This is the insight that separates good from great.
  2. Redis Lua for atomic inventory. Name it explicitly — "single Redis key, Lua DECR-if-positive, no race." Shows you've actually built these systems.
  3. Pre-auth payment in queue. This UX detail (checkout = one click, not "enter card") is the kind of production nuance interviewers love.
  4. Bot mitigation is load-bearing. Without it, bots take all 1K items. CAPTCHA + proof-of-work + device fingerprinting. Don't skip this.
  5. Distinguish from Ticketmaster. Ticketmaster = seat selection + hold pattern. Flash sale = single SKU + atomic counter. Simpler inventory model, harder contention model.
11

Evolution

1

MVP — Postgres UPDATE with row lock

Single DB, UPDATE products SET stock = stock - 1 WHERE id = X AND stock > 0. Works to ~100 concurrent users. Falls over at 1K.

2

Redis atomic counter + DB sync

Redis DECR for the hot path. Async write confirmed orders to Postgres. Handles ~10K/sec.

3

Waiting room + admission control

Queue absorbs the herd. Controlled drain rate keeps backend healthy. Fair FIFO ordering.

4

Pre-auth + one-click checkout

Payment authorized during queue wait. Winners checkout in < 500 ms. Losers' auths auto-void.

5

Bot mitigation + regional queues

CAPTCHA + proof-of-work + device fingerprinting. Per-region queue shards for global sales (Xiaomi India vs Xiaomi China).
