E-Commerce Platform

01

Problem Statement

Design the core buyer purchase flow of a large-scale e-commerce platform: a user searches for products, views a product page, adds items to a shopping cart, and checks out with payment and order confirmation. The system serves 50 million daily active users with a 1000:1 read-to-write ratio, must handle flash sale traffic spikes of 100x, and must guarantee that inventory is never oversold.

Core question: How do you serve millions of concurrent browsers with low-latency reads while guaranteeing that two customers never successfully buy the last unit of the same item?

In Scope

Product search & browse — keyword search with faceted filters, category navigation
Product detail page — the most visited page, mixing cached content with real-time inventory
Shopping cart — persistent across sessions and devices, no premature inventory lock
Checkout & ordering — inventory reservation, payment processing, order creation (the distributed transaction)
Order tracking — event-driven state machine from Confirmed to Delivered

Out of Scope

Seller onboarding, recommendation engine, reviews & ratings, returns/refunds, delivery logistics, pricing engine, coupons. Each is acknowledged as a separate system but excluded for focus.

02

Requirements

Functional Requirements

Users can search products by keyword with filters (price, brand, rating, category) and browse by category tree
Product detail pages display info, images, real-time stock status ("Only 3 left!"), and delivery estimate
Shopping cart persists across sessions/devices; adding to cart does not reserve inventory
Checkout atomically reserves inventory → processes payment → creates order — never overselling
Users track order status via an event-driven state machine (Confirmed → Shipped → Delivered)
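
The order-tracking requirement can be pictured as a small transition table. This is a minimal sketch, assuming only the three states named above; the names `OrderStatus`, `ALLOWED_TRANSITIONS`, and `transition` are illustrative and not taken from the platform.

```python
from enum import Enum


class OrderStatus(Enum):
    CONFIRMED = "Confirmed"
    SHIPPED = "Shipped"
    DELIVERED = "Delivered"


# Each status maps to the set of statuses it may legally move to.
ALLOWED_TRANSITIONS = {
    OrderStatus.CONFIRMED: {OrderStatus.SHIPPED},
    OrderStatus.SHIPPED: {OrderStatus.DELIVERED},
    OrderStatus.DELIVERED: set(),
}


def transition(current: OrderStatus, target: OrderStatus) -> OrderStatus:
    """Return the new status, or raise if the event would skip or reverse a step."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


status = OrderStatus.CONFIRMED
status = transition(status, OrderStatus.SHIPPED)
status = transition(status, OrderStatus.DELIVERED)
print(status.value)  # Delivered
```

Rejecting illegal transitions matters in an event-driven setup, because events can arrive late or more than once.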

Non-Functional Requirements

Read path: 99.99% available, < 300ms search, < 200ms product page — eventually consistent
Write path: 99.95% available, < 2s checkout — strongly consistent for inventory & payment
Read:write ratio ≈ 1000:1 — drives separate scaling strategies for each path
Flash sales: Handle 10,000+ concurrent buys on a single product without overselling or degraded UX
Data durability: Zero order loss. Idempotent payments. Cart loss tolerable.

Key insight: Different parts of the system need different consistency models. The read path (search, PDP) is eventually consistent and cached. The write path (inventory, payment) is strongly consistent and transactional. This two-zone architecture is the fundamental design pattern.

03

Scale Estimation

Assumptions: 500M registered users, 50M DAU, ~20 page views per session, 2% conversion rate, 500M products in catalog.

Metric	Value
Avg page reads	~17K/sec
Peak (flash sale)	~170K/sec
Avg orders	~12/sec
Read:Write ratio	1000:1

Derivation

50M DAU × 1.5 sessions × 20 pages = 1.5B page views/day ≈ 17K reads/sec (avg). Ordinary daily peaks run 3–5x that; the ~170K/sec flash-sale figure is roughly 10x. Orders: 50M × 2% = 1M/day ≈ 12/sec. Flash-sale hot key: 10K concurrent DECR operations on a single SKU, the number that breaks naive database locking.
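
The arithmetic can be reproduced in a few lines. This is a sketch using only the stated assumptions; the variable names are illustrative, and the 10x flash-sale multiplier is inferred from the ~170K/sec figure rather than stated outright.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

dau = 50_000_000
sessions_per_user = 1.5
pages_per_session = 20
conversion_rate = 0.02

page_views_per_day = dau * sessions_per_user * pages_per_session
avg_reads_per_sec = page_views_per_day / SECONDS_PER_DAY

orders_per_day = dau * conversion_rate
avg_orders_per_sec = orders_per_day / SECONDS_PER_DAY

# Assumption: the flash-sale peak is 10x the average read rate.
flash_sale_reads_per_sec = avg_reads_per_sec * 10

print(f"page views/day:        {page_views_per_day:,.0f}")
print(f"avg reads/sec:         {avg_reads_per_sec:,.0f}")
print(f"flash-sale reads/sec:  {flash_sale_reads_per_sec:,.0f}")
print(f"avg orders/sec:        {avg_orders_per_sec:.1f}")
print(f"page reads per order:  {avg_reads_per_sec / avg_orders_per_sec:,.0f}")
```

The last line comes out near 1,500 page reads per order, the same order of magnitude as the 1000:1 planning ratio.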

Storage	Size	Store
Product metadata	~2 TB	PostgreSQL
Search index	~6–8 TB	Elasticsearch
Product images	~4.8 PB	S3 + CDN
Cart data (active)	~4 GB	Redis
Order data (5 yr)	~1.8 TB	PostgreSQL (sharded)
Inventory records	~16 GB	Redis + PostgreSQL

04

API Design

REST for cacheability, cursor-based pagination for deep search pages, idempotency keys on order placement to prevent double-charging, and a two-step checkout flow (session → confirm) to allow inventory reservation with TTL.

Search — Elasticsearch backed

GET /api/v1/search?q=wireless+headphones&category=audio
                  &price_min=50&price_max=300&sort=relevance
                  &cursor=eyJzY29yZSI6MC44NX0&limit=20

→ 200  { results[], facets{ brands[], price_ranges[] },
         pagination{ cursor, has_next }, metadata{ search_time_ms } }
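
The `cursor` value in the example request happens to be unpadded base64 for the JSON `{"score":0.85}`. Below is a minimal sketch of how such an opaque cursor could be produced and parsed; whether the platform uses exactly this encoding is an assumption, and `encode_cursor` / `decode_cursor` are hypothetical helper names.

```python
import base64
import json


def encode_cursor(position: dict) -> str:
    """Serialize the last-seen sort position into an opaque, URL-safe token."""
    raw = json.dumps(position, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_cursor(cursor: str) -> dict:
    """Restore the padding stripped by encode_cursor, then parse the JSON."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


print(encode_cursor({"score": 0.85}))          # eyJzY29yZSI6MC44NX0
print(decode_cursor("eyJzY29yZSI6MC44NX0"))    # {'score': 0.85}
```

A cursor built on the relevance score alone is ambiguous when several documents tie, so a real cursor would probably also carry a unique tiebreaker such as the product ID.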

Checkout initiation — reserves inventory with TTL

POST /api/v1/orders/checkout
Body: { shipping_address_id, coupon_code }

→ 200  { checkout_session_id, items[], pricing{ subtotal, tax,
         discount, total }, reservation_expires_at }
→ 409  { error: "INVENTORY_CONFLICT", items[{ available }] }

Place order — the point of no return

POST /api/v1/orders
Headers: Idempotency-Key: 550e8400-e29b-41d4-...
Body: { checkout_session_id, payment_method_id }

→ 201  { order_id, status: "CONFIRMED", payment{ transaction_id } }
→ 402  { error: "PAYMENT_FAILED", session_still_valid: true }
→ 410  { error: "SESSION_EXPIRED", items_released: true }
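
The effect of the `Idempotency-Key` header can be sketched with an in-memory store. Everything here is illustrative: `OrderService`, its fields, and the `cs_123` / `pm_456` identifiers are hypothetical stand-ins for the real order service, its idempotency table, and the payment provider.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OrderService:
    # Hypothetical in-memory stand-ins for the idempotency table and the card charges.
    responses_by_key: dict = field(default_factory=dict)
    charges: list = field(default_factory=list)

    def place_order(self, idempotency_key: str, checkout_session_id: str,
                    payment_method_id: str) -> dict:
        # A retried request replays the stored response without charging again.
        if idempotency_key in self.responses_by_key:
            return self.responses_by_key[idempotency_key]
        self.charges.append((checkout_session_id, payment_method_id))
        response = {"order_id": f"ord_{len(self.charges)}", "status": "CONFIRMED"}
        self.responses_by_key[idempotency_key] = response
        return response


service = OrderService()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry
first = service.place_order(key, "cs_123", "pm_456")
retry = service.place_order(key, "cs_123", "pm_456")  # e.g. the first response was lost
print(first == retry, len(service.charges))  # True 1
```

In a multi-instance deployment the check-then-store step would have to be atomic, for example a unique constraint on the key column, so that two simultaneous retries cannot both reach the charge.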

Internal — Atomic inventory reservation (Lua in Redis)

POST /internal/inventory/reserve
Body: { reservation_id, items[{ product_id, quantity }], ttl_seconds: 600 }

→ 200  { status: "RESERVED", expires_at }
→ 409  { status: "PARTIAL_FAILURE", reserved_items[] }

05

High-Level Architecture

The architecture is split into a fast, eventually-consistent read plane (CDN → cache → search) and a strongly-consistent write plane (inventory → payment → order). The 1000:1 ratio justifies investing heavily in the read path while keeping the write path careful and transactional.

Request Flow — Step Through

Client → API Gateway → Order Svc → Inventory (Redis) → Payment Svc → Order DB → Kafka → Consumers

06

Deep Dive — Inventory Under Concurrency

The single hardest problem in e-commerce: 10,000 users click "Buy" on an item with 5 units left — within the same second. How do exactly 5 succeed and 9,995 get a clean "sold out"?

Why Naive Approaches Fail

Pessimistic Lock

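SELECT ... FOR UPDATE serializes all 10K requests through one row lock. At 5 ms per transaction, the last buyer waits 10,000 × 5 ms = 50 seconds. Long before that, the waiting requests exhaust the connection pool and the database stops serving every other query.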

Optimistic Lock

Version-based check-and-retry. All 10K requests read the same version, one commits, and 9,999 retry. If every loser keeps retrying, the attempts add up to roughly 10,000 + 9,999 + ... + 1 ≈ 50M, which is worse than pessimistic locking.
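
Both figures follow from simple arithmetic. The sketch below is a worst-case model, assuming every request keeps retrying until it commits; with only 5 units in stock, losers would stop once the counter reaches zero, so real numbers would be lower.

```python
concurrent_buyers = 10_000
txn_seconds = 0.005  # 5 ms per locked transaction, as stated above

# Pessimistic lock: requests queue behind one row lock and run one at a time.
pessimistic_total_seconds = concurrent_buyers * txn_seconds
pessimistic_throughput = 1 / txn_seconds

# Optimistic lock, worst case: in each round every remaining request attempts
# a write and exactly one commits, so rounds cost 10,000 + 9,999 + ... + 1 attempts.
optimistic_attempts = sum(range(1, concurrent_buyers + 1))

print(f"pessimistic: {pessimistic_total_seconds:.0f} s total, "
      f"{pessimistic_throughput:.0f} txn/sec")
print(f"optimistic:  {optimistic_attempts:,} attempts")
```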

The Production Solution: Redis Atomic DECR

Redis executes commands on a single thread, so commands run one at a time with no interleaving, and a Lua script runs as one atomic unit. The script checks availability and decrements in a single step, handling 50,000+ ops/sec, a 250x improvement over the ~200/sec of row locking.

Reserve with TTL — Redis Lua Script

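-- Assumed argument layout:
--   KEYS[1] = stock counter for the SKU, KEYS[2] = hash of active reservations
--   ARGV[1] = requested quantity, ARGV[2] = reservation id, ARGV[3] = TTL in seconds
local available = tonumber(redis.call('GET', KEYS[1]))
if available == nil or available <= 0 then
    return {0, "OUT_OF_STOCK", 0}
end
local requested = tonumber(ARGV[1])
if available < requested then
    return {0, "INSUFFICIENT_STOCK", available}
end
-- The check and the decrement run inside one script, so no other command can interleave.
redis.call('DECRBY', KEYS[1], requested)
redis.call('HSET', KEYS[2], ARGV[2], ARGV[1])
-- EXPIRE sets the TTL on the whole reservations hash, not on a single field.
redis.call('EXPIRE', KEYS[2], tonumber(ARGV[3]))
return {1, "RESERVED", available - requested}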
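
The script's behavior under contention can be simulated without a Redis server. This is a sketch only: `FakeInventory` is a hypothetical in-process stand-in, and its lock plays the role of Redis running one script at a time. It replays the scenario from the start of this section, with 10,000 buyers and 5 units.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeInventory:
    """In-process stand-in for Redis; the lock mimics one-script-at-a-time execution."""

    def __init__(self, stock: dict):
        self.stock = dict(stock)
        self.reservations = {}
        self._lock = threading.Lock()

    def reserve(self, sku: str, reservation_id: str, quantity: int):
        # Same branches as the Lua script: out of stock, insufficient, reserved.
        with self._lock:
            available = self.stock.get(sku)
            if available is None or available <= 0:
                return (0, "OUT_OF_STOCK", 0)
            if available < quantity:
                return (0, "INSUFFICIENT_STOCK", available)
            self.stock[sku] = available - quantity
            self.reservations[reservation_id] = quantity
            return (1, "RESERVED", available - quantity)


inventory = FakeInventory({"sku-1": 5})

with ThreadPoolExecutor(max_workers=32) as pool:
    outcomes = list(pool.map(
        lambda i: inventory.reserve("sku-1", f"res-{i}", 1), range(10_000)))

reserved = sum(1 for ok, _, _ in outcomes if ok == 1)
sold_out = sum(1 for _, status, _ in outcomes if status == "OUT_OF_STOCK")
print(reserved, sold_out, inventory.stock["sku-1"])  # 5 9995 0
```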

Two-Phase Reservation Lifecycle

sequenceDiagram
    participant U as User
    participant O as Order Svc
    participant I as Inventory (Redis)
    participant P as Payment Svc
    participant DB as Order DB
    U->>O: POST /orders/checkout
    O->>I: EVAL reserve.lua (DECR + TTL)
    I-->>O: RESERVED (10 min TTL)
    O-->>U: checkout_session_id + pricing
    U->>O: POST /orders (Idempotency-Key)
    O->>P: Charge card (idempotent)
    P-->>O: Payment SUCCESS
    O->>DB: INSERT order + outbox (single txn)
    O->>I: Confirm reservation (HDEL)
    O-->>U: 201 Order Confirmed
    Note over I: If TTL expires before payment:
    I->>I: Sweeper: INCRBY (release stock)

Soft vs. hard expiry: The user sees a 10-minute timer (soft). The actual Redis TTL is 15 minutes (hard). The 5-minute buffer ensures reservations never expire while payment is in-flight.
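
The two expiry thresholds and the sweeper's release step can be sketched as follows. This is an illustration under stated assumptions: the `Reservation` record, the helper names, and the use of plain seconds as timestamps are all hypothetical, and only the 10-minute and 15-minute values come from the text.

```python
from dataclasses import dataclass

SOFT_TTL_SECONDS = 10 * 60   # timer shown to the user
HARD_TTL_SECONDS = 15 * 60   # point at which stock is actually released


@dataclass
class Reservation:
    reservation_id: str
    quantity: int
    created_at: float  # seconds; a real system would use a clock timestamp


def can_start_payment(res: Reservation, now: float) -> bool:
    """New payment attempts are accepted only inside the soft window."""
    return now - res.created_at < SOFT_TTL_SECONDS


def sweep(reservations: dict, stock: int, now: float) -> int:
    """Release reservations past the hard TTL and return the updated stock."""
    expired = [rid for rid, res in reservations.items()
               if now - res.created_at >= HARD_TTL_SECONDS]
    for rid in expired:
        stock += reservations.pop(rid).quantity
    return stock


reservations = {"res-1": Reservation("res-1", quantity=2, created_at=0.0)}
stock = 3  # 5 units minus the 2 held by res-1

stock_at_12_min = sweep(reservations, stock, now=12 * 60)            # still held
stock_at_15_min = sweep(reservations, stock_at_12_min, now=15 * 60)  # released
print(stock_at_12_min, stock_at_15_min)  # 3 5
```

Between minute 10 and minute 15 the user can no longer start a payment, but a charge that began at minute 9 still finds its units held.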

Reconciliation — The Safety Net

Redis is volatile; PostgreSQL is the durable source of truth. A reconciliation job runs every 5 minutes (paused during flash sales), compares Redis counters with PostgreSQL, and corrects drift. An orphaned-reservation sweeper releases reservations that exist in PostgreSQL but are missing from Redis (e.g., after a Redis failover).
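
A drift check of this kind might look like the sketch below. It assumes PostgreSQL can report, per SKU, the unsold units on hand and the quantity held by active reservations, so that the expected Redis counter is their difference; that formula and the `reconcile` function are illustrative assumptions, and plain dicts stand in for both stores.

```python
def reconcile(redis_counters: dict, on_hand: dict, active_reserved: dict) -> dict:
    """Fix drifted counters in place and return {sku: (found, expected)}."""
    corrections = {}
    for sku, units in on_hand.items():
        # Assumption: available = unsold units minus units held by live reservations.
        expected = units - active_reserved.get(sku, 0)
        found = redis_counters.get(sku)
        if found != expected:
            corrections[sku] = (found, expected)
            redis_counters[sku] = expected
    return corrections


redis_counters = {"sku-a": 5, "sku-b": 2}           # sku-c lost in a failover
on_hand = {"sku-a": 10, "sku-b": 2, "sku-c": 7}     # durable source of truth
active_reserved = {"sku-a": 3}

print(reconcile(redis_counters, on_hand, active_reserved))
# {'sku-a': (5, 7), 'sku-c': (None, 7)}
```

Overwriting a live counter races with in-flight reservations, which is one reason to pause the job during flash sales.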

Approach	Throughput	Correct?	Best For
Pessimistic Lock	~200/sec	Yes	Low traffic (<100 concurrent)
Optimistic Lock	~500/sec*	Yes	Moderate contention
Atomic SQL UPDATE	~2,000/sec	Yes	Single-DB architectures
Redis Lua (2-phase)	~50,000/sec	Yes	Flash sales, production

07

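Key Design Decisions & Tradeoffs

Side Effects: Kafka Events vs. Synchronous Calls

Synchronous HTTP Calls

Simpler to build, but if the notification service is down, checkout either fails (bad) or silently drops notifications (also bad). There is no replay capability.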

08

What Can Go Wrong

Redis Crashes During Flash Sale

Inventory counter lost. Recovery: Sentinel promotes replica in 10–30s. Reconciliation job corrects drift within 5 min. Orphaned-reservation sweeper releases stuck stock. Worst case: oversell a few units → cancel + apologize + 15% discount code.

Payment Succeeds but Order Write Fails

User charged with no order record — the scariest failure. Recovery: Payment intent pre-logged in PostgreSQL. Recovery worker finds charges without orders, retries order creation. If unrecoverable after 30 min, issues automatic refund. Idempotency key prevents double-charging on retries.

Elasticsearch Cluster Down

Search stops. Graceful degradation: L1 — serve cached results for popular queries. L2 — fall back to PostgreSQL tsvector (basic but functional). L3 — show category browsing + trending products. Browsing, cart, checkout all unaffected.
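
The degradation ladder can be expressed as an ordered chain of fallbacks. In this sketch the four backends are passed in as plain callables, and `SearchUnavailable` and `search_with_fallback` are hypothetical names; real clients would raise their own error types.

```python
class SearchUnavailable(Exception):
    """Raised by a backend that cannot serve the query right now."""


def search_with_fallback(query, elasticsearch, cache_lookup, postgres_fts, browse_page):
    """Try each level in order and report which one answered."""
    try:
        return "elasticsearch", elasticsearch(query)
    except SearchUnavailable:
        pass
    cached = cache_lookup(query)              # L1: cached results for popular queries
    if cached is not None:
        return "cache", cached
    try:
        return "postgres", postgres_fts(query)    # L2: basic full-text search
    except SearchUnavailable:
        return "browse", browse_page()        # L3: category browsing + trending


def es_down(query):
    raise SearchUnavailable("cluster unavailable")


popular = {"headphones": ["sku-1", "sku-2"]}

print(search_with_fallback("headphones", es_down, popular.get,
                           lambda q: ["sku-9"], lambda: ["trending-1"]))
# ('cache', ['sku-1', 'sku-2'])
print(search_with_fallback("rare widget", es_down, popular.get,
                           lambda q: ["sku-9"], lambda: ["trending-1"]))
# ('postgres', ['sku-9'])
```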

Cascading Failure from Slow Service

Inventory Service responds in 5s → Order Service thread pool fills → API Gateway queue grows → all endpoints affected. Defense in depth: 2s timeouts, circuit breakers (trip after 5 failures), bulkhead pattern (separate thread pools per workload), load shedding (drop non-critical requests at 90%+ load).
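
A circuit breaker that trips after 5 failures can be sketched in a few dozen lines. The 30-second reset window, the half-open trial call, and the `CircuitBreaker` API are assumptions added for illustration; only the failure threshold comes from the text.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency that is known to be failing."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds  # assumed value, not from the text
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_seconds:
                raise CircuitOpenError("failing fast; dependency is marked unhealthy")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result


def slow_inventory_call():
    raise TimeoutError("inventory did not answer within 2s")


breaker = CircuitBreaker()
for _ in range(5):
    try:
        breaker.call(slow_inventory_call)
    except TimeoutError:
        pass

try:
    breaker.call(slow_inventory_call)
except CircuitOpenError as exc:
    print(exc)  # the sixth call never reaches the inventory service
```

Failing fast is what keeps the Order Service thread pool from filling up with requests that wait on a dependency already known to be slow.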

Kafka Unavailable

Orders still placed (critical path is synchronous). But emails don't send, search index doesn't update, warehouse not notified. Recovery: Outbox table queues all events in PostgreSQL. When Kafka recovers, relay drains the backlog. Delay = Kafka downtime. No events lost.
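
The outbox recovery path can be demonstrated with SQLite standing in for PostgreSQL and a plain callable standing in for the Kafka producer. The table layout, the `order.confirmed` topic name, and both function names are assumptions made for this sketch.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, user_id TEXT, total_cents INTEGER);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def create_order(order_id: str, user_id: str, total_cents: int) -> None:
    # `with conn` commits both inserts together or rolls both back.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, user_id, total_cents))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("order.confirmed", json.dumps({"order_id": order_id})))


def drain_outbox(publish) -> int:
    """Publish pending events in insertion order; stop at the first broker error."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id").fetchall()
    sent = 0
    for row_id, topic, payload in rows:
        publish(topic, payload)  # raises while the broker is down
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent


def kafka_down(topic, payload):
    raise ConnectionError("broker unavailable")


create_order("ord_1", "user_1", 4999)
create_order("ord_2", "user_2", 1250)

try:
    drain_outbox(kafka_down)
except ConnectionError:
    pass  # orders are committed; their events wait in the outbox

delivered = []
sent = drain_outbox(lambda topic, payload: delivered.append((topic, json.loads(payload))))
print(sent, [event["order_id"] for _, event in delivered])  # 2 ['ord_1', 'ord_2']
```

Because a row is marked published only after the send succeeds, a crash between the two steps republishes the event, so delivery is at-least-once and consumers need to tolerate duplicates.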

Stale Cache Serving Wrong Prices

Seller changes price; cache shows old price for up to 15 min. Fix: Write-through cache (immediate update) + event-driven invalidation (safety net via Kafka) + TTL (ultimate backstop). Checkout always verifies prices against source of truth before charging.

09

Interview Tips

Lead with the 1000:1 ratio
Derive it in 30 seconds, then explain its consequence: separate read and write paths with different consistency models. This one insight drives the entire architecture and immediately signals data-driven thinking.

"Add-to-cart does NOT reserve inventory"
State this proactively. Most candidates get it wrong. Explain why: 100K users holding items in abandoned carts would create phantom stock-outs. Reservation happens at checkout initiation, not cart addition.

Build the diagram layer by layer
Start: Client → Server → DB. Then add CDN (static assets), then Redis (cache), then Elasticsearch (search), then split into services, then add Kafka. Each layer is a response to a specific bottleneck, not a pattern from memory.

Idempotency key on order placement
Mention it before the interviewer asks about double-charging. It signals production experience. Explain: client generates a UUID, server deduplicates — even if the response is lost and the client retries, only one charge occurs.

Know the outbox pattern
"The order and the event are written in the same database transaction. A relay process publishes to Kafka." This one sentence shows you understand reliable event publishing in microservices — a production-level concern most candidates miss.

Frame tradeoffs, not answers
Don't say "use Redis." Say "We chose Redis over database locking because the flash sale requirement demands 50K ops/sec on a single hot key. If we didn't have flash sales, the simpler atomic SQL UPDATE would suffice." This shows architectural maturity.

10

Evolution

How this design grows from MVP to planet-scale. Each stage is triggered by a specific bottleneck — complexity is added only when the numbers demand it.

1

MVP Monolith (0–10K DAU)

Django/Rails monolith + PostgreSQL + local disk for images. Search is SQL ILIKE. Cart is a DB table. Checkout is a single database transaction. Simple, deployable, sufficient.

2

Cache + CDN (10K–100K DAU)

Add Redis as cache layer (90%+ hit rate). Move images to S3 + CloudFront. Add background job framework (Celery) for async tasks. Still a monolith, still one DB.

3

Search + First Split (100K–1M DAU)

Add Elasticsearch for product search (async sync from PostgreSQL). Extract Payment Service for PCI isolation. Add monitoring (APM, Sentry, log aggregation).

4

Full Microservices + Events (1M–10M DAU)

Split into 7 services. Add Kafka for event-driven side effects. Redis atomic inventory with 2-phase reservation. Saga pattern for checkout. Kubernetes for orchestration. Outbox pattern for reliable publishing.

5

Sharding + CQRS (10M–50M DAU)

Shard orders DB by user_id. CQRS: write to transactional DB, read from data warehouse (BigQuery) for analytics. Redis Cluster for HA. Read replicas per service.
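
Routing by `user_id` needs a hash that is stable across processes and restarts. The sketch below uses SHA-256 with a simple modulo; the function name and the shard count of 16 are illustrative assumptions.

```python
import hashlib


def shard_for_user(user_id: str, shard_count: int) -> int:
    """Map a user to a shard with a hash that is identical on every host."""
    # Python's built-in hash() is randomized per process for strings,
    # so a cryptographic digest is used to get a stable value.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


SHARD_COUNT = 16
for user_id in ("user_1", "user_2", "user_3"):
    print(user_id, "-> orders shard", shard_for_user(user_id, SHARD_COUNT))
```

All of one user's orders land on the same shard, so order history is a single-shard query. Plain modulo remaps most users when the shard count changes, which is why a lookup table or consistent hashing is often used instead.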

6

Multi-Region / Planet-Scale (50M+ DAU)

Regional read replicas (catalog, search, CDN) in US, EU, Asia. Active-passive writes or region-specific inventory. Global load balancer (Route 53). Chaos engineering. You're Amazon now.

