100K users hit "Buy" at the exact same second for 1K limited items. No overselling. The hard parts:
a waiting room that absorbs the thundering herd without crashing your backend,
atomic inventory decrement that never goes negative even under 100K concurrent requests,
and a payment pipeline that pre-authorizes cards before entering the queue so checkout is instant for winners.
Amazon Prime Day, Xiaomi phone drops, Supreme releases — same pattern, different merch.
flash-sale items; sometimes as low as 10 for hype drops
Admission rate: ~500/sec (controlled drain from waiting room to checkout)
Checkout latency: < 500 ms (pre-authed payment → atomic decrement → confirm)
Bot traffic: ~60% of total requests during hype drops; must be filtered
Sale duration: ~2–30 min (popular items sell out in seconds; long-tail in minutes)
04
API Design
POST /api/flash-sales/{sale_id}/enter-queue
User enters waiting room. Returns {queue_ticket, position, estimated_wait_sec}. Ticket is a signed JWT with enqueue timestamp — proves FIFO position. Requires CAPTCHA token.
GET /api/flash-sales/{sale_id}/status?ticket=TOKEN
Poll queue status. Returns {status: waiting|admitted|sold_out, position, inventory_remaining}. Client polls every 2 s or receives SSE push.
Pre-authorize payment while in queue. Returns {auth_id}. If user doesn't win, auth voided automatically after 30 min.
GET /api/flash-sales/{sale_id}/inventory
Live inventory count. Served from Redis; eventually consistent (~1 s lag). Used for countdown UI.
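The enter-queue endpoint above returns a signed ticket that carries the enqueue timestamp. A minimal sketch of signing and verifying such a ticket follows, using only the standard library's HMAC-SHA256 in a simplified JWT-like layout. `SECRET`, `issue_ticket`, `verify_ticket`, and the claim names `sub`, `sale`, and `enq` are hypothetical, and a production service would more likely use an established JWT library.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret-change-me"  # hypothetical signing key, for illustration only


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    """Inverse of _b64: restore padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())


def issue_ticket(user_id, sale_id, enqueued_at=None):
    """Return '<payload>.<signature>' with the enqueue timestamp in the payload."""
    claims = {
        "sub": user_id,
        "sale": sale_id,
        "enq": time.time() if enqueued_at is None else enqueued_at,
    }
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    return f"{payload}.{_sign(payload)}"


def verify_ticket(ticket):
    """Return the claims if the signature checks out, else None."""
    try:
        payload, sig = ticket.split(".")
    except ValueError:
        return None  # not in '<payload>.<signature>' form
    if not hmac.compare_digest(sig, _sign(payload)):
        return None  # payload was altered or signed with another key
    return json.loads(_unb64(payload))
```

Because the timestamp sits inside the signed payload, a client cannot edit it to claim an earlier position without invalidating the signature.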
05
Architecture
Three tiers isolated by concern: queue tier (absorb + order the herd), checkout tier (atomic purchase), payment tier (pre-auth + capture). A CDN serves the static product page; only the queue-enter and purchase calls hit the backend.
The single hardest problem: 100K users all call POST /purchase in 1 second. If each checks inventory, sees "999 remaining," and then decrements — you sell 100K items instead of 1K. Classic lost-update / check-then-act race.
Solution: Redis Lua atomic decrement.
-- Redis Lua script: atomic decrement-if-positive
-- KEYS[1] = inventory key; a missing or non-numeric key is treated as sold out
local stock = tonumber(redis.call('GET', KEYS[1]))
if stock and stock > 0 then
  redis.call('DECR', KEYS[1])
  return 1 -- success
else
  return 0 -- sold out
end
This runs atomically inside Redis (single-threaded). No race. No oversell. Each call either gets a unit or doesn't. The caller gets a definitive answer in < 1 ms.
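To make the difference concrete, here is a small simulation contrasting check-then-act with an atomic decrement-if-positive. It does not talk to Redis: the `Inventory` class is a hypothetical in-memory stand-in whose lock plays the role of Redis executing one script at a time, and `naive_sale` models the worst-case interleaving where every buyer reads before anyone writes.

```python
import threading


class Inventory:
    """In-memory stand-in for a single Redis stock key (not a Redis client)."""

    def __init__(self, stock):
        self.stock = stock
        self._lock = threading.Lock()  # stands in for Redis running one script at a time

    def decr_if_positive(self):
        """Same contract as the Lua script: 1 if a unit was taken, 0 if sold out."""
        with self._lock:
            if self.stock > 0:
                self.stock -= 1
                return 1
            return 0


def naive_sale(stock, buyers):
    """Check-then-act, worst case: all buyers read the stock, then all act on it."""
    reads = [stock for _ in range(buyers)]       # everyone sees the same value
    return sum(1 for seen in reads if seen > 0)  # each one believes it got a unit


def atomic_sale(stock, buyers):
    """Run `buyers` threads against the atomic counter; return (units sold, stock left)."""
    inv = Inventory(stock)
    results = []
    results_lock = threading.Lock()

    def buy():
        won = inv.decr_if_positive()
        with results_lock:
            results.append(won)

    threads = [threading.Thread(target=buy) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), inv.stock
```

With 10 units and 200 buyers, `naive_sale(10, 200)` reports 200 "sales", while `atomic_sale(10, 200)` sells exactly 10 and leaves the counter at 0.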
Waiting room. Without it, 100K users slam the checkout tier simultaneously — even if inventory check is atomic, the backend can't handle 100K DB writes/sec. Solution: a queue that controls admission rate.
User enters queue. Redis ZADD with score = enqueue timestamp. Returns ticket (signed JWT) + position.
Admission controller (a cron or timer) pops the N earliest users per second (e.g., 500/sec) from the sorted set, lowest score first. Marks them "admitted" in Redis.
Admitted user's next poll returns status=admitted. Client-side JS transitions to checkout page.
User calls POST /purchase with ticket. Checkout service verifies ticket is admitted, calls Redis Lua decrement, on success writes order to Postgres + captures pre-authed payment.
When inventory hits 0, admission controller stops admitting. Kafka event fires. SSE/WS pushes "sold out" to all remaining queue connections.
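The steps above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: `WaitingRoom` is a hypothetical stand-in for the Redis sorted set (a heap ordered by enqueue timestamp), `admit_batch` approximates popping the lowest-scored members, and `run_admission` simplifies by assuming every admitted user buys one unit immediately.

```python
import heapq


class WaitingRoom:
    """In-memory stand-in for the Redis sorted set plus the 'admitted' markers."""

    def __init__(self):
        self._heap = []       # (enqueue_ts, seq, user_id), ordered like score = timestamp
        self._seq = 0         # tie-breaker for identical timestamps
        self.admitted = set()

    def enter(self, user_id, enqueue_ts):
        heapq.heappush(self._heap, (enqueue_ts, self._seq, user_id))
        self._seq += 1

    def waiting(self):
        return len(self._heap)

    def admit_batch(self, n):
        """Pop up to n earliest entries and mark them admitted."""
        batch = []
        while self._heap and len(batch) < n:
            _, _, user_id = heapq.heappop(self._heap)
            batch.append(user_id)
        self.admitted.update(batch)
        return batch

    def status(self, user_id, stock_left):
        if user_id in self.admitted:
            return "admitted"
        return "sold_out" if stock_left <= 0 else "waiting"


def run_admission(room, stock, rate_per_tick):
    """One loop iteration per tick; stop admitting once stock reaches 0.

    Simplification: each admitted user is assumed to buy one unit right away.
    """
    ticks = []
    while stock > 0 and room.waiting() > 0:
        batch = room.admit_batch(rate_per_tick)
        stock = max(0, stock - len(batch))
        ticks.append(batch)
    return ticks, stock
```

Users are admitted in timestamp order regardless of the order their requests arrived, and anyone still in the room when stock reaches 0 sees `sold_out` on the next status check.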
Purchase Sequence: Queue to Checkout
sequenceDiagram
participant U as User
participant Q as Queue svc
participant AC as Admission ctrl
participant C as Checkout svc
participant R as Redis (inventory)
participant P as Payment svc
U->>Q: POST /enter-queue (CAPTCHA)
Q-->>U: ticket + position=4521
U->>Q: GET /status (poll)
Q-->>U: waiting, position=312
AC->>Q: pop top 500 from ZSET
Q-->>AC: [user_tokens...]
AC->>Q: mark admitted
U->>Q: GET /status
Q-->>U: admitted
U->>C: POST /purchase (ticket)
C->>R: Lua DECR-if-positive
R-->>C: 1 (success)
C->>P: capture pre-auth
P-->>C: captured
C-->>U: order confirmed
Pre-authorization. While user is in queue, client silently calls POST /pre-auth with their saved card. If they win a slot, capture is instant — no "enter your card" delay at checkout. If they lose, auth auto-voids after 30 min. UX: "You're in line. We'll charge your card only if you get one."
Interview answer
"CDN serves the product page. Users enter a FIFO waiting room (Redis sorted set by enqueue timestamp). An admission controller drains N users/sec into checkout. Inventory is a single Redis key decremented atomically via a Lua script — no oversell possible. Payment is pre-authorized during queue wait; on admission + successful decrement, payment captured instantly. When inventory hits 0, 'sold out' pushed to all remaining queue users via SSE. Total purchase latency for admitted user: < 500 ms."
⚠
Anti-patterns
🚫
SELECT stock FROM products WHERE id=X; if stock > 0 then UPDATE stock = stock - 1
Classic check-then-act race. 100K users all see stock=999 simultaneously; all decrement. You sell 100K items.
✓ Better: Redis Lua atomic DECR-if-positive. Single-threaded, no race, sub-ms.
🚫
Let all 100K users hit checkout simultaneously — "the server will handle it"
No server handles 100K concurrent DB writes/sec gracefully. You get 503s, timeouts, and angry customers.
✓ Better: Waiting room with controlled admission rate (500/sec). Backend sees a smooth stream, not a spike.
🚫
Checkout asks user to enter card info after winning the slot
User takes 30 seconds to type card number. Slot is wasted. Others in queue wait longer. Conversion drops.
✓ Better: Pre-authorize payment while in queue. Checkout = one-click capture. Sub-second.
07
Tradeoffs & Design Choices
Redis atomic vs Postgres FOR UPDATE. Redis Lua: ~50K ops/sec on single key, sub-ms. Postgres pessimistic lock: ~5K/sec, 2–5 ms. Redis wins overwhelmingly for the hot path. Postgres is the authoritative ledger written after Redis succeeds.
Waiting room (fair, controlled) vs direct race to checkout (unfair, chaotic). Without a queue, users with lower latency (closer server, faster network) win. Queue equalizes: everyone gets a ticket; admission is FIFO by enqueue time. Supreme and Nike SNKRS both adopted queues for this reason.
Per-item vs per-SKU inventory. Flash sales typically have one SKU. Single Redis key works. Multi-SKU (sizes, colors) needs one key per variant — same Lua pattern, multiple keys.
Optimistic (check-and-set) vs pessimistic (lock). At 100K contenders, optimistic concurrency lets one writer commit per round while the other ~99K retry. Pessimistic locking makes them proceed one at a time behind the lock. Redis Lua is effectively "pessimistic within a single thread": no retries, no contention, because there's only one thread.
Pre-auth cost. Pre-authorizing 100K cards when only 1K win = 99K void auths. Payment processors charge ~$0.01 per auth. 100K × $0.01 = $1K per sale — acceptable for high-value items; questionable for $5 items.
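The pre-auth arithmetic above can be checked with a short calculation. This is a sketch using the figures quoted in the text (100K entrants, 1K winners, roughly $0.01 per auth); `preauth_cost` is a hypothetical helper, amounts are in integer cents to avoid float rounding, and the $500 item price is an illustrative assumption.

```python
def preauth_cost(entrants, winners, fee_cents_per_auth, item_price_cents):
    """Return (total auth fees in cents, voided auths, fees as a fraction of revenue)."""
    total_fee_cents = entrants * fee_cents_per_auth
    voided = entrants - winners
    revenue_cents = winners * item_price_cents
    return total_fee_cents, voided, total_fee_cents / revenue_cents


# $5 item: fees are $1,000 against $5,000 revenue.
cheap = preauth_cost(100_000, 1_000, 1, 500)

# $500 item (assumed price): the same $1,000 in fees against $500,000 revenue.
pricey = preauth_cost(100_000, 1_000, 1, 50_000)
```

For the $5 item the auth fees come to about 20% of revenue, while for the $500 item they are about 0.2%, which is why the tradeoff depends on item value.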
08
Failure Modes
🤖
Bot army floods the queue
Scalper bots create 10K queue entries in 1 second with different IPs / accounts.
→ Mitigation: CAPTCHA on queue entry; proof-of-work (client computes hash challenge); device fingerprinting; rate-limit per account + IP; ban known bot ASNs.
💥
Redis crashes mid-sale
Inventory key lost. Remaining stock unknown. Oversell risk if we guess wrong.
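→ Mitigation: Redis with AOF persistence (fsync every second). On crash-restart, Redis replays the AOF to recover the inventory count, but up to about a second of decrements may be missing, so the recovered count can be too high. Belt: before resuming the sale, reconcile against Postgres (initial stock minus recorded orders) and never sell more than Postgres says remains.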
🔁
User retries purchase on timeout
Network hiccup between checkout svc and user. User retries. Charged twice.
→ Mitigation: ticket-based idempotency. Each ticket can purchase exactly once. Retry with same ticket returns the same order_id.
⏰
Clock skew — sale starts early on some servers
Server A's clock is 2 seconds ahead of Server B. Users on A enter queue 2 seconds early → unfair advantage.
→ Mitigation: queue-open time enforced by a single Redis key (admin sets flag at T=0 via one write). Servers check Redis, not local clock, for "is queue open?"
📉
Admitted user abandons checkout
User is admitted but never calls /purchase. Slot wasted; users behind them wait longer.
→ Mitigation: admission ticket expires in 60 seconds. If not used, admission is revoked and next user in queue is admitted.
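The ticket-based idempotency mitigation for purchase retries can be sketched as below. This is a single-threaded, in-memory illustration under stated assumptions: the `Checkout` class is hypothetical, `stock` stands in for the Redis counter, `orders` stands in for the Postgres ledger, and `captures` counts payment captures. A real service would need the one-order-per-ticket rule enforced atomically, for example with a unique constraint on the ticket ID.

```python
import itertools


class Checkout:
    """In-memory stand-in: `stock` for the Redis counter, `orders` for the ledger."""

    def __init__(self, stock):
        self.stock = stock
        self.orders = {}    # ticket_id -> order_id; at most one order per ticket
        self.captures = 0   # number of payment captures performed
        self._ids = itertools.count(1)

    def purchase(self, ticket_id):
        # Retry path: the same ticket returns the same order,
        # with no second decrement and no second capture.
        if ticket_id in self.orders:
            return {"status": "confirmed", "order_id": self.orders[ticket_id]}

        if self.stock <= 0:
            return {"status": "sold_out", "order_id": None}

        self.stock -= 1                        # the atomic decrement in the real system
        order_id = f"order-{next(self._ids)}"
        self.orders[ticket_id] = order_id      # record the order against the ticket
        self.captures += 1                     # capture the pre-authorized payment once
        return {"status": "confirmed", "order_id": order_id}
```

Calling `purchase("t1")` twice returns the same `order_id` and leaves `captures` at 1, so a client retry after a timeout does not produce a second charge.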
09
Interview Tips
Lead with the waiting room. "100K users → 500/sec admission rate → smooth backend load." This is the insight that separates good from great.
Redis Lua for atomic inventory. Name it explicitly — "single Redis key, Lua DECR-if-positive, no race." Shows you've actually built these systems.
Pre-auth payment in queue. This UX detail (checkout = one click, not "enter card") is the kind of production nuance interviewers love.
Bot mitigation is load-bearing. Without it, bots take all 1K items. CAPTCHA + proof-of-work + device fingerprinting. Don't skip this.
Distinguish from Ticketmaster. Ticketmaster = seat selection + hold pattern. Flash sale = single SKU + atomic counter. Simpler inventory model, harder contention model.