System Design — 06

E-Commerce Platform

Design a large-scale platform like Amazon or Flipkart — from product search to checkout — where millions browse concurrently but inventory must never oversell.

Distributed Transactions · Inventory Concurrency · CQRS · Saga Pattern · Flash Sales
01

Problem Statement

Design the core buyer purchase flow of a large-scale e-commerce platform: a user searches for products, views a product page, adds items to a shopping cart, and checks out with payment and order confirmation. The system serves 50 million daily active users with a 1000:1 read-to-write ratio, must handle flash sale traffic spikes of 100x, and must guarantee that inventory is never oversold.

Core question: How do you serve millions of concurrent browsers with low-latency reads while guaranteeing that two customers never successfully buy the last unit of the same item?

In Scope

  • Product search & browse — keyword search with faceted filters, category navigation
  • Product detail page — the most visited page, mixing cached content with real-time inventory
  • Shopping cart — persistent across sessions and devices, no premature inventory lock
  • Checkout & ordering — inventory reservation, payment processing, order creation (the distributed transaction)
  • Order tracking — event-driven state machine from Confirmed to Delivered

Out of Scope

Seller onboarding, recommendation engine, reviews & ratings, returns/refunds, delivery logistics, pricing engine, coupons. Each is acknowledged as a separate system but excluded for focus.

02

Requirements

Functional Requirements

  • Users can search products by keyword with filters (price, brand, rating, category) and browse by category tree
  • Product detail pages display info, images, real-time stock status ("Only 3 left!"), and delivery estimate
  • Shopping cart persists across sessions/devices; adding to cart does not reserve inventory
  • Checkout atomically reserves inventory → processes payment → creates order — never overselling
  • Users track order status via an event-driven state machine (Confirmed → Shipped → Delivered)

Non-Functional Requirements

  • Read path: 99.99% available, < 300ms search, < 200ms product page — eventually consistent
  • Write path: 99.95% available, < 2s checkout — strongly consistent for inventory & payment
  • Read:write ratio ≈ 1000:1 — drives separate scaling strategies for each path
  • Flash sales: Handle 10,000+ concurrent buys on a single product without overselling or degraded UX
  • Data durability: Zero order loss. Idempotent payments. Cart loss tolerable.

Key insight: Different parts of the system need different consistency models. The read path (search, PDP) is eventually consistent and cached. The write path (inventory, payment) is strongly consistent and transactional. This two-zone architecture is the fundamental design pattern.

03

Scale Estimation

Assumptions: 500M registered users, 50M DAU, ~20 page views per session, 2% conversion rate, 500M products in catalog.

Avg page reads: ~17K/sec
Peak (flash sale): ~170K/sec
Avg orders: ~12/sec
Read:Write ratio: 1000:1

Derivation

50M DAU × 1.5 sessions × 20 pages = 1.5B page views/day ≈ 17K reads/sec on average. Ordinary daily peaks run 3–5x that; the ~170K/sec flash-sale peak is about 10x. Orders: 50M × 2% = 1M/day ≈ 12/sec. Flash-sale hot key: 10K concurrent DECR operations on a single SKU, which is the number that breaks naive database locking.
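The derivation can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the figures stated above; the variable names are illustrative. The raw ratio comes out near 1,500:1, which the text rounds to the order of magnitude 1000:1.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

dau = 50_000_000
sessions_per_user = 1.5
pages_per_session = 20
conversion_rate = 0.02

# Read side: every page view is one read.
page_views_per_day = dau * sessions_per_user * pages_per_session
avg_reads_per_sec = page_views_per_day / SECONDS_PER_DAY

# Write side: one order per converting user per day.
orders_per_day = dau * conversion_rate
avg_orders_per_sec = orders_per_day / SECONDS_PER_DAY

read_write_ratio = avg_reads_per_sec / avg_orders_per_sec

print(f"page views/day: {page_views_per_day:,.0f}")
print(f"avg reads/sec : {avg_reads_per_sec:,.0f}")
print(f"avg orders/sec: {avg_orders_per_sec:.1f}")
print(f"read:write    : {read_write_ratio:,.0f}:1")
```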

Storage | Size | Store
Product metadata | ~2 TB | PostgreSQL
Search index | ~6–8 TB | Elasticsearch
Product images | ~4.8 PB | S3 + CDN
Cart data (active) | ~4 GB | Redis
Order data (5 yr) | ~1.8 TB | PostgreSQL (sharded)
Inventory records | ~16 GB | Redis + PostgreSQL
04

API Design

REST for cacheability, cursor-based pagination for deep search pages, idempotency keys on order placement to prevent double-charging, and a two-step checkout flow (session → confirm) to allow inventory reservation with TTL.

Search — Elasticsearch backed
GET /api/v1/search?q=wireless+headphones&category=audio
                  &price_min=50&price_max=300&sort=relevance
                  &cursor=eyJzY29yZSI6MC44NX0&limit=20

→ 200  { results[], facets{ brands[], price_ranges[] },
         pagination{ cursor, has_next }, metadata{ search_time_ms } }
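
The example cursor in the request above happens to decode to {"score":0.85}, which suggests an opaque cursor built as base64url-encoded JSON of the last hit's sort position. A minimal sketch under that assumption follows; the function names are hypothetical, and a production cursor would likely also carry a tiebreaker such as the product ID so that equal scores paginate deterministically.

```python
import base64
import json


def encode_cursor(position: dict) -> str:
    """Serialize the last hit's sort position into an opaque, URL-safe token."""
    raw = json.dumps(position, separators=(",", ":")).encode("utf-8")
    # Strip '=' padding so the token needs no escaping in a query string.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_cursor(cursor: str) -> dict:
    """Restore the padding that encode_cursor removed, then parse the JSON."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


print(decode_cursor("eyJzY29yZSI6MC44NX0"))
print(encode_cursor({"score": 0.85}))
```

The server would pass the decoded position to the search backend as the "start after" point instead of an offset, which keeps deep pages as cheap as the first page.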
Checkout initiation — reserves inventory with TTL
POST /api/v1/orders/checkout
Body: { shipping_address_id, coupon_code }

→ 200  { checkout_session_id, items[], pricing{ subtotal, tax,
         discount, total }, reservation_expires_at }
→ 409  { error: "INVENTORY_CONFLICT", items[{ available }] }
Place order — the point of no return
POST /api/v1/orders
Headers: Idempotency-Key: 550e8400-e29b-41d4-...
Body: { checkout_session_id, payment_method_id }

→ 201  { order_id, status: "CONFIRMED", payment{ transaction_id } }
→ 402  { error: "PAYMENT_FAILED", session_still_valid: true }
→ 410  { error: "SESSION_EXPIRED", items_released: true }
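
How the Idempotency-Key header prevents a double charge can be sketched with an in-memory model. Everything here is a stand-in: the class, the dict acting as the key store, and the list acting as the payment provider are assumptions for illustration, and a real service would need the lookup-and-store step to be atomic (for example via a unique constraint).

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OrderService:
    # Hypothetical stand-ins for the payment provider and a durable key store.
    charges: list = field(default_factory=list)
    responses_by_key: dict = field(default_factory=dict)

    def place_order(self, idempotency_key: str, checkout_session_id: str,
                    amount_cents: int) -> dict:
        # Replay: return the stored response without charging again.
        if idempotency_key in self.responses_by_key:
            return self.responses_by_key[idempotency_key]
        self.charges.append((checkout_session_id, amount_cents))
        response = {"order_id": str(uuid.uuid4()), "status": "CONFIRMED"}
        self.responses_by_key[idempotency_key] = response
        return response


service = OrderService()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry
first = service.place_order(key, "cs_123", 4999)
retry = service.place_order(key, "cs_123", 4999)  # e.g. the first response was lost
print(first == retry, len(service.charges))
```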
Internal — Atomic inventory reservation (Lua in Redis)
POST /internal/inventory/reserve
Body: { reservation_id, items[{ product_id, quantity }], ttl_seconds: 600 }

→ 200  { status: "RESERVED", expires_at }
→ 409  { status: "PARTIAL_FAILURE", reserved_items[] }
05

High-Level Architecture

The architecture is split into a fast, eventually-consistent read plane (CDN → cache → search) and a strongly-consistent write plane (inventory → payment → order). The 1000:1 ratio justifies investing heavily in the read path while keeping the write path careful and transactional.

[Architecture diagram. Clients (Web / Mobile / Tablet) → CDN (CloudFront) → API Gateway (Auth · Rate Limit · Route). Services: Product Svc (Catalog + PDP), Search Svc (Full-text + Facets), Cart Svc (Add / Remove), Order Svc (Saga Orchestrator), Inventory Svc (Atomic Reserve), Payment Svc (Stripe / Razorpay). Data stores: Redis (Cache + Cart + Inv), Elasticsearch (Search Index), PostgreSQL (Catalog + Users), PostgreSQL (Orders, Sharded), DynamoDB (Cart Backup), S3 (Product Images). Kafka (events: order.created, product.updated) feeds Notification (Email · SMS · Push) and Search Indexer (ES Sync Consumer). Edge labels: Static, Reserve, Charge, Outbox.]
Request Flow
Participants: Client, API Gateway, Order Svc, Inventory (Redis), Payment Svc, Order DB, Kafka, Consumers
06

Deep Dive — Inventory Under Concurrency

The single hardest problem in e-commerce: 10,000 users click "Buy" on an item with 5 units left — within the same second. How do exactly 5 succeed and 9,995 get a clean "sold out"?

Why Naive Approaches Fail

Pessimistic Lock

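SELECT ... FOR UPDATE serializes all 10K requests through one row lock. At 5 ms per transaction, the last request waits 10,000 × 5 ms = 50 seconds. Long before that, the waiting requests exhaust the connection pool and stall every other query that shares the database.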

Optimistic Lock

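Version-based check-and-retry. All 10K requests read the same version, 1 write succeeds, and 9,999 retry. If every loser keeps retrying, the rounds sum to 10,000 + 9,999 + … + 1 ≈ 50M total attempts, which is worse than pessimistic locking.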
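
Both figures follow from simple arithmetic. The sketch below assumes the worst case for optimistic locking, where exactly one writer wins each round and every loser retries; in practice requests stop once stock reaches zero, so the 50M figure is an upper bound rather than a prediction.

```python
requests = 10_000
txn_seconds = 0.005  # 5 ms per transaction, from the text

# Pessimistic: every request waits its turn on one row lock.
pessimistic_total_seconds = requests * txn_seconds

# Optimistic, worst case: round 1 has 10,000 attempts, round 2 has 9,999, and so on.
optimistic_attempts = sum(range(1, requests + 1))

print(f"pessimistic drain time: {pessimistic_total_seconds:.0f} s")
print(f"optimistic attempts   : {optimistic_attempts:,}")
```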

The Production Solution: Redis Atomic DECR

Redis executes commands on a single thread, so each command runs to completion with no interleaving. A Lua script is executed as one atomic unit: it checks availability and decrements in a single step, handling 50,000+ ops/sec, a 250x improvement over the ~200/sec of row-level database locking.

Reserve with TTL — Redis Lua Script
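-- KEYS[1]: stock counter for the SKU; KEYS[2]: hash that records the reservation
-- ARGV[1]: requested quantity; ARGV[2]: hash field (reservation or product ID); ARGV[3]: TTL in seconds
local available = tonumber(redis.call('GET', KEYS[1]))
local requested = tonumber(ARGV[1])
-- Reject non-positive quantities, which would otherwise make DECRBY add stock
if requested == nil or requested <= 0 then
    return {0, "INVALID_QUANTITY", 0}
end
-- A missing key and an empty counter both mean nothing is left to sell
if available == nil or available <= 0 then
    return {0, "OUT_OF_STOCK", 0}
end
if available < requested then
    return {0, "INSUFFICIENT_STOCK", available}
end
-- Check and decrement run inside one script, so no other client can interleave
redis.call('DECRBY', KEYS[1], requested)
-- Record the hold and let it expire if checkout is abandoned
redis.call('HSET', KEYS[2], ARGV[2], ARGV[1])
redis.call('EXPIRE', KEYS[2], tonumber(ARGV[3]))
return {1, "RESERVED", available - requested}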
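
The behaviour the script guarantees can be modelled without Redis. In this sketch a lock stands in for Redis running each script to completion, and the class and method names are hypothetical. It replays the scenario from the start of this section at full size: 10,000 buyers, 5 units.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InventoryCounter:
    """In-memory stand-in for one Redis stock key and its reservation hash."""

    def __init__(self, stock):
        self._stock = stock
        self._reservations = {}
        self._lock = threading.Lock()  # models Redis's one-script-at-a-time execution

    def reserve(self, reservation_id, quantity):
        with self._lock:
            if self._stock <= 0:
                return (False, "OUT_OF_STOCK", 0)
            if self._stock < quantity:
                return (False, "INSUFFICIENT_STOCK", self._stock)
            self._stock -= quantity
            self._reservations[reservation_id] = quantity
            return (True, "RESERVED", self._stock)


counter = InventoryCounter(stock=5)
with ThreadPoolExecutor(max_workers=64) as pool:
    results = list(pool.map(lambda i: counter.reserve(f"r{i}", 1), range(10_000)))

succeeded = sum(1 for ok, _, _ in results if ok)
rejected = len(results) - succeeded
print(succeeded, rejected)
```

Removing the lock turns the check and the decrement into two separate steps, which is exactly the race the Lua script exists to close.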

Two-Phase Reservation Lifecycle

sequenceDiagram
    participant U as User
    participant O as Order Svc
    participant I as Inventory (Redis)
    participant P as Payment Svc
    participant DB as Order DB
    U->>O: POST /orders/checkout
    O->>I: EVAL reserve.lua (DECR + TTL)
    I-->>O: RESERVED (10 min TTL)
    O-->>U: checkout_session_id + pricing
    U->>O: POST /orders (Idempotency-Key)
    O->>P: Charge card (idempotent)
    P-->>O: Payment SUCCESS
    O->>DB: INSERT order + outbox (single txn)
    O->>I: Confirm reservation (HDEL)
    O-->>U: 201 Order Confirmed
    Note over I: If TTL expires before payment:
    I->>I: Sweeper: INCRBY (release stock)

Soft vs. hard expiry: The user sees a 10-minute timer (soft). The actual Redis TTL is 15 minutes (hard). The 5-minute buffer means a payment started just before the soft deadline can still complete before the stock is released.

Reconciliation — The Safety Net

Redis is volatile; PostgreSQL is the durable source of truth. A reconciliation job runs every 5 minutes (paused during flash sales), compares Redis counters with PostgreSQL, and corrects drift. An orphaned-reservation sweeper releases reservations that exist in PostgreSQL but are missing from Redis (e.g., after a Redis failover).
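
The comparison step of the reconciliation job might look like the sketch below. It assumes PostgreSQL records on-hand units and active reservations per SKU, so that the expected Redis counter is on-hand minus reserved; that bookkeeping model and all names are assumptions, not something the design above specifies.

```python
def reconcile(redis_available: dict, db_on_hand: dict, db_active_reserved: dict) -> dict:
    """Return {sku: corrected_available} for every SKU whose Redis counter drifted."""
    corrections = {}
    for sku, on_hand in db_on_hand.items():
        # PostgreSQL is the source of truth: units not held by a live reservation.
        expected = on_hand - db_active_reserved.get(sku, 0)
        if redis_available.get(sku) != expected:
            corrections[sku] = expected
    return corrections


drift = reconcile(
    redis_available={"sku-A": 3, "sku-B": 10},
    db_on_hand={"sku-A": 5, "sku-B": 10},
    db_active_reserved={"sku-A": 1},
)
print(drift)
```

The job would then write each corrected value back to Redis; doing so while reservations are in flight risks overwriting a fresh decrement, which is one reason to pause it during flash sales.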

Approach | Throughput | Correct? | Best For
Pessimistic Lock | ~200/sec | Yes | Low traffic (<100 concurrent)
Optimistic Lock | ~500/sec* | Yes | Moderate contention
Atomic SQL UPDATE | ~2,000/sec | Yes | Single-DB architectures
Redis Lua (2-phase) | ~50,000/sec | Yes | Flash sales, production
07

Key Design Decisions & Tradeoffs

Inventory Counter: Redis vs. Database Locking

✓ Chosen

Redis Atomic DECR (Lua)

50K+ ops/sec on a single instance. No lock contention, no connection pool exhaustion. Cost: dual-system complexity and reconciliation job.

✗ Alternative

PostgreSQL FOR UPDATE

~200 ops/sec. Correct but serializes under contention. Single source of truth (simpler), but melts under flash sale load.

Checkout Orchestration: Saga vs. 2PC

✓ Chosen

Saga with Compensating Transactions

Each service commits independently. Orchestrator handles rollback. Works across Redis, Stripe, PostgreSQL. Higher availability — one slow service doesn't block all.

✗ Alternative

Two-Phase Commit (2PC)

True atomicity across databases. But it requires all participants to be available simultaneously (reducing availability), holds locks during the prepare phase, and cannot include Redis or Stripe, since neither exposes a prepare/commit interface.
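
The chosen saga approach can be sketched as an orchestrator that runs steps in order and, on failure, runs the compensations of the completed steps in reverse. The step names and the run_saga helper are illustrative. Note that this sketch compensates immediately, whereas a production flow may first retry the failed step before refunding.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"{name}: ok")
            completed.append(step)
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            # Undo already-committed steps, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return False, log
    return True, log


def fail_order_write() -> None:
    raise RuntimeError("db unavailable")


events: List[str] = []
ok, log = run_saga([
    ("reserve_inventory", lambda: events.append("DECRBY"), lambda: events.append("INCRBY")),
    ("charge_payment", lambda: events.append("charge"), lambda: events.append("refund")),
    ("create_order", fail_order_write, lambda: None),
])
print(ok, events)
```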

Search: Elasticsearch vs. PostgreSQL Full-Text

✓ Chosen

Elasticsearch (Async Indexed)

Purpose-built: fuzzy matching, faceted aggregations, relevance ranking, 20–50ms queries on 500M docs. Accepts 1–5 second index lag.

✗ Alternative

PostgreSQL tsvector + GIN

Single source of truth, no sync lag. But tsvector matching has no typo tolerance and no built-in facet aggregations, and queries take 100–500ms on 500M rows. Good fallback for an ES outage.

Consistency Model: Eventual (Reads) vs. Strong (Writes)

✓ Chosen

Two-Zone Architecture

Read path: aggressive caching, CDN, eventual consistency (seconds of staleness OK). Write path: strong consistency for inventory and payments. With a 1000:1 read-to-write ratio, about 99.9% of traffic hits the fast zone.

✗ Alternative

Strong Consistency Everywhere

Every read hits the source of truth. No stale data. But can't serve 50K reads/sec without a massive database fleet, and latency doubles.

Side Effects: Kafka Events vs. Synchronous Calls

✓ Chosen

Kafka (Async Events + Outbox)

One "order.created" event, 5+ consumers. If notification service is down, events queue — no orders fail. Supports replay for data recovery.

✗ Alternative

Synchronous HTTP Calls

Simpler. But if notification service is down, checkout either fails (bad) or silently drops notifications (also bad). No replay capability.
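
The outbox half of the chosen approach can be illustrated with SQLite standing in for PostgreSQL and a plain callback standing in for the Kafka producer. Table and function names are hypothetical. The point is that the order row and the event row commit in one transaction, and a separate relay publishes later.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, total_cents INTEGER)")
db.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "topic TEXT, payload TEXT, published INTEGER DEFAULT 0)"
)


def create_order(order_id: str, total_cents: int) -> None:
    # One transaction: both rows commit or neither does.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.created", json.dumps({"order_id": order_id})),
        )


def relay(publish) -> int:
    """Publish pending rows in order; mark each row only after publish returns."""
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []
create_order("o-1", 4999)
first_pass = relay(lambda topic, payload: sent.append((topic, payload)))
second_pass = relay(lambda topic, payload: sent.append((topic, payload)))
print(first_pass, second_pass, sent)
```

If the relay crashes between publishing and marking a row, the row is published again on the next pass, so delivery is at-least-once and consumers should deduplicate.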

08

What Can Go Wrong

Redis Crashes During Flash Sale

Recent counter updates are lost: replication is asynchronous, so the promoted replica can lag the crashed primary. Recovery: Sentinel promotes replica in 10–30s. Reconciliation job corrects drift within 5 min. Orphaned-reservation sweeper releases stuck stock. Worst case: oversell a few units → cancel + apologize + 15% discount code.

Payment Succeeds but Order Write Fails

User charged with no order record — the scariest failure. Recovery: Payment intent pre-logged in PostgreSQL. Recovery worker finds charges without orders, retries order creation. If unrecoverable after 30 min, issues automatic refund. Idempotency key prevents double-charging on retries.

Elasticsearch Cluster Down

Search stops. Graceful degradation: L1 — serve cached results for popular queries. L2 — fall back to PostgreSQL tsvector (basic but functional). L3 — show category browsing + trending products. Browsing, cart, checkout all unaffected.
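
The degradation ladder can be expressed as an ordered list of tiers tried in turn. The tier functions below are stubs invented for the example; only the ordering (Elasticsearch, cached popular queries, PostgreSQL full-text, category browsing) comes from the text.

```python
def search_with_fallback(query, tiers):
    """Try each (name, fetch) tier in order; return the first one that succeeds."""
    for name, fetch in tiers:
        try:
            return name, fetch(query)
        except Exception:
            continue  # this tier is down or missed; degrade to the next one
    return "unavailable", []


def elasticsearch(query):
    raise ConnectionError("cluster down")


POPULAR_CACHE = {"headphones": ["sku-1", "sku-2"]}


def cached_popular(query):
    return POPULAR_CACHE[query]  # KeyError on a cache miss


def postgres_fts(query):
    return [f"pg-hit-for-{query}"]


def category_browse(query):
    return ["trending-1", "trending-2"]


TIERS = [
    ("elasticsearch", elasticsearch),
    ("cache", cached_popular),
    ("postgres_fts", postgres_fts),
    ("category_browse", category_browse),
]

hit = search_with_fallback("headphones", TIERS)
miss = search_with_fallback("usb hub", TIERS)
print(hit, miss)
```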

Cascading Failure from Slow Service

Inventory Service responds in 5s → Order Service thread pool fills → API Gateway queue grows → all endpoints affected. Defense in depth: 2s timeouts, circuit breakers (trip after 5 failures), bulkhead pattern (separate thread pools per workload), load shedding (drop non-critical requests at 90%+ load).
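
A circuit breaker of the kind listed above can be sketched in a few lines. The threshold of 5 failures is from the text; the 30-second cool-down, the class name, and the half-open behaviour are assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Fail fast after repeated failures instead of tying up threads on a slow dependency."""

    def __init__(self, failure_threshold=5, reset_after_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds  # assumed cool-down, not from the text
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_seconds:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


attempts = []


def slow_inventory():
    attempts.append(1)
    raise TimeoutError("inventory took > 2s")


breaker = CircuitBreaker()
outcomes = []
for _ in range(7):
    try:
        breaker.call(slow_inventory)
    except TimeoutError:
        outcomes.append("timeout")
    except RuntimeError:
        outcomes.append("fast-fail")
print(outcomes, len(attempts))
```

After the fifth timeout the breaker stops calling the Inventory Service at all, so the Order Service's threads are released immediately rather than waiting out each timeout.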

Kafka Unavailable

Orders still placed (critical path is synchronous). But emails don't send, search index doesn't update, warehouse not notified. Recovery: Outbox table queues all events in PostgreSQL. When Kafka recovers, relay drains the backlog. Delay = Kafka downtime. No events lost.

Stale Cache Serving Wrong Prices

Seller changes price; cache shows old price for up to 15 min. Fix: Write-through cache (immediate update) + event-driven invalidation (safety net via Kafka) + TTL (ultimate backstop). Checkout always verifies prices against source of truth before charging.

09

Interview Tips

💡
Lead with the 1000:1 ratio
Derive it in 30 seconds, then explain its consequence: separate read and write paths with different consistency models. This one insight drives the entire architecture and immediately signals data-driven thinking.
"Add-to-cart does NOT reserve inventory"
State this proactively. Most candidates get it wrong. Explain why: 100K users holding items in abandoned carts would create phantom stock-outs. Reservation happens at checkout initiation, not cart addition.
🎯
Build the diagram layer by layer
Start: Client → Server → DB. Then add CDN (static assets), then Redis (cache), then Elasticsearch (search), then split into services, then add Kafka. Each layer is a response to a specific bottleneck, not a pattern from memory.
🔑
Idempotency key on order placement
Mention it before the interviewer asks about double-charging. It signals production experience. Explain: client generates a UUID, server deduplicates — even if the response is lost and the client retries, only one charge occurs.
📦
Know the outbox pattern
"The order and the event are written in the same database transaction. A relay process publishes to Kafka." This one sentence shows you understand reliable event publishing in microservices — a production-level concern most candidates miss.
🧠
Frame tradeoffs, not answers
Don't say "use Redis." Say "We chose Redis over database locking because the flash sale requirement demands 50K ops/sec on a single hot key. If we didn't have flash sales, the simpler atomic SQL UPDATE would suffice." This shows architectural maturity.
11

Evolution

How this design grows from MVP to planet-scale. Each stage is triggered by a specific bottleneck — complexity is added only when the numbers demand it.

1

MVP Monolith (0–10K DAU)

Django/Rails monolith + PostgreSQL + local disk for images. Search is SQL ILIKE. Cart is a DB table. Checkout is a single database transaction. Simple, deployable, sufficient.

2

Cache + CDN (10K–100K DAU)

Add Redis as cache layer (90%+ hit rate). Move images to S3 + CloudFront. Add background job framework (Celery) for async tasks. Still a monolith, still one DB.

3

Search + First Split (100K–1M DAU)

Add Elasticsearch for product search (async sync from PostgreSQL). Extract Payment Service for PCI isolation. Add monitoring (APM, Sentry, log aggregation).

4

Full Microservices + Events (1M–10M DAU)

Split into 7 services. Add Kafka for event-driven side effects. Redis atomic inventory with 2-phase reservation. Saga pattern for checkout. Kubernetes for orchestration. Outbox pattern for reliable publishing.

5

Sharding + CQRS (10M–50M DAU)

Shard orders DB by user_id. CQRS: write to transactional DB, read from data warehouse (BigQuery) for analytics. Redis Cluster for HA. Read replicas per service.
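
Routing an order to its shard by user_id could be as simple as the sketch below. The shard count of 16 and the function name are assumptions. A stable hash is used because Python's built-in hash() for strings is salted per process.

```python
import hashlib


def shard_for_user(user_id: str, shard_count: int = 16) -> int:
    """Map a user to a shard so that all of that user's orders live together."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


assignments = {uid: shard_for_user(uid) for uid in ("user-1", "user-2", "user-3")}
print(assignments)
```

Plain modulo routing moves most keys when the shard count changes, so a growing deployment would typically layer a lookup table or consistent hashing on top.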

6

Multi-Region / Planet-Scale (50M+ DAU)

Regional read replicas (catalog, search, CDN) in US, EU, Asia. Active-passive writes or region-specific inventory. Global load balancer (Route 53). Chaos engineering. You're Amazon now.
