Shopping Cart

500M carts updated daily across every device. Never lose a cart. The hard parts: a hybrid storage layer that keeps hot carts in Redis for sub-ms reads while persisting to DynamoDB for durability, guest-to-authenticated cart merge that unions items without losing anything on login, and inventory soft-reservation with TTL that holds stock for 15 minutes without hard-locking millions of abandoned items. Amazon, Walmart, Shopify — every cart looks simple until it runs at 50K ops/sec.

Core: Add/Remove/Merge + Soft Hold + Price Lock | ~500M carts/day | ~50K ops/sec | ~500M active carts | Cross-device sync
02

Requirements

Functional
  • Add, remove, update items in cart with quantity and variant selection
  • Cart persists across sessions and devices — close laptop, open phone, cart is there
  • Guest cart merges into logged-in cart on authentication — no items lost
  • Inventory soft-reservation with 15-min TTL when item added to cart
  • Price-lock: show the price at time of add, refresh before checkout
  • Abandoned cart recovery: detect idle carts, trigger email recovery flow
Non-Functional
  • Cart read latency < 10 ms from hot cache (Redis)
  • Handle 50K cart-update ops/sec sustained, 200K peak
  • Zero cart data loss — DynamoDB persistence is source of truth
  • Support 500M simultaneously active carts globally
  • Eventual consistency between Redis and DynamoDB < 500 ms
  • Graceful degradation: if Redis is down, serve from DynamoDB (slower but alive)
03

Scale Estimation

Carts created/day
~500M
including guest carts, most abandoned within 1 hour
Cart update ops/sec
~50K
add/remove/update; 200K peak during sales events
Active carts
~500M
carts with at least 1 item, not yet checked out or expired
Avg items per cart
~5
cart document ~2 KB; 500M carts = ~1 TB total
Guest-to-auth merge rate
~15%
of guest sessions eventually log in and trigger a merge
Abandon rate
~70%
of carts abandoned; recovery emails recapture ~5-10%
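
The figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: it takes the table's numbers as given, assumes decimal units (1 KB = 1,000 bytes), and the variable names are illustrative.

```python
# Back-of-envelope check of the scale estimates (decimal units: 1 KB = 1,000 bytes).
CARTS_PER_DAY = 500_000_000
ACTIVE_CARTS = 500_000_000
CART_DOC_BYTES = 2_000            # ~2 KB per cart document (~5 items)
SUSTAINED_OPS_PER_SEC = 50_000
PEAK_OPS_PER_SEC = 200_000
ABANDON_PCT = 70
RECOVERY_PCT_RANGE = (5, 10)      # recovery emails recapture ~5-10%

storage_tb = ACTIVE_CARTS * CART_DOC_BYTES / 1e12
ops_per_day = SUSTAINED_OPS_PER_SEC * 86_400
ops_per_cart = ops_per_day / CARTS_PER_DAY
peak_multiplier = PEAK_OPS_PER_SEC / SUSTAINED_OPS_PER_SEC
abandoned_per_day = CARTS_PER_DAY * ABANDON_PCT // 100
recovered_per_day = tuple(abandoned_per_day * p // 100 for p in RECOVERY_PCT_RANGE)

print(f"hot-cart storage: {storage_tb:.1f} TB")
print(f"ops/day at sustained rate: {ops_per_day:,}")
print(f"updates per cart per day: {ops_per_cart:.2f}")
print(f"peak vs sustained: {peak_multiplier:.0f}x")
print(f"abandoned carts/day: {abandoned_per_day:,}")
print(f"recovered carts/day: {recovered_per_day[0]:,} to {recovered_per_day[1]:,}")
```

At the sustained rate this works out to roughly 8.6 updates per cart per day, and the 1 TB working set is what the Redis tier has to hold if every active cart stays hot.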
04

API Design

POST/api/cart/items

Add item to cart. Body: {sku, quantity, variant_id}. Triggers inventory soft-hold (Redis DECR + 15-min TTL). Returns updated cart with price_at_add snapshot. Upserts on sku: re-adding an existing SKU increments its quantity instead of creating a duplicate line, so a blindly retried request is not idempotent.

PATCH/api/cart/items/{item_id}

Update item quantity or variant. Body: {quantity, variant_id} (send whichever field changes). Adjusts soft-hold accordingly (INCRBY/DECRBY by the delta). Returns updated cart. Setting quantity to 0 is equivalent to DELETE.

DELETE/api/cart/items/{item_id}

Remove item from cart. Releases inventory soft-hold (Redis INCR restores reserved units). Returns updated cart.

GET/api/cart

Fetch full cart. Served from Redis (hot) with DynamoDB fallback (cold). Returns {user_id, items: [{sku, qty, price_at_add, current_price, added_at}], updated_at}. Prices refreshed on read.

POST/api/cart/merge

Merge guest cart into authenticated user cart. Body: {guest_cart_id}. Union strategy: items in both keep higher quantity; prices refreshed. Guest cart deleted after merge.
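
The add, update, and remove semantics described above can be sketched in memory. This is an illustration under assumptions, not the service's actual code: the `Cart` and `CartItem` types and the function names are hypothetical, the SKU doubles as the `item_id`, and inventory holds are left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CartItem:
    sku: str
    qty: int
    variant_id: str
    price_at_add: float


@dataclass
class Cart:
    user_id: str
    items: Dict[str, CartItem] = field(default_factory=dict)  # keyed by sku


def add_item(cart: Cart, sku: str, quantity: int, variant_id: str, price: float) -> Cart:
    """POST /api/cart/items: re-adding an existing SKU increments its quantity."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    existing = cart.items.get(sku)
    if existing is not None:
        existing.qty += quantity  # price_at_add keeps the first snapshot
    else:
        cart.items[sku] = CartItem(sku, quantity, variant_id, price)
    return cart


def remove_item(cart: Cart, sku: str) -> Cart:
    """DELETE /api/cart/items/{item_id}."""
    cart.items.pop(sku, None)
    return cart


def update_item(cart: Cart, sku: str, quantity: int) -> Cart:
    """PATCH /api/cart/items/{item_id}: quantity 0 behaves like DELETE."""
    if sku not in cart.items:
        raise KeyError(sku)
    if quantity < 0:
        raise ValueError("quantity cannot be negative")
    if quantity == 0:
        return remove_item(cart, sku)
    cart.items[sku].qty = quantity
    return cart
```

Because add increments, a client that retries a timed-out POST can double the quantity; PATCH with an absolute quantity is the safe call to retry.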

05

Architecture

Four services behind the API gateway: Cart Service (read/write cart), Inventory Service (soft holds), Price Service (cached catalog prices), Merge Service (guest-to-auth union). Redis serves hot carts; DynamoDB is the durable source of truth. Kafka powers async events for abandon recovery.

Shopping Cart Architecture (diagram)
Components: Client (web / mobile); API Gateway (auth + rate-limit); Cart Service (add / remove / get, async write-behind to DynamoDB); Redis hot cart (sub-ms reads, TTL 24h); DynamoDB (durable persistence); Inventory Service (soft hold, DECR + TTL); Price Service (cached catalog prices); Merge Service (guest-to-auth union); Redis inventory (DECR + 15-min TTL); Kafka (cart events stream); Email Service (abandon recovery).
Abandon recovery: a Kafka event on cart idle > 1 hour triggers the email "you left items in your cart".
Request flow: Client (web / mobile) -> API Gateway (auth + rate-limit) -> Cart Service (CRUD operations) -> Redis (hot cart cache) -> DynamoDB (durable persistence) -> Inventory svc (soft hold DECR + TTL) -> Merge Service (guest-to-auth union)
06

Deep Dive — Cart Data Model + Merge + Soft Hold

(a) Cart data model. Per-user document stored in DynamoDB (partition key: user_id) and cached in Redis as a JSON hash.

// DynamoDB / Redis document
{
  "user_id": "u_abc123",        // or "guest_xyz" for anonymous
  "items": [
    { "sku": "B08N5WRWNW", "qty": 2, "price_at_add": 29.99,
      "variant_id": "color_black", "added_at": "2026-04-13T10:05:00Z" }
  ],
  "updated_at": "2026-04-13T10:05:00Z",
  "ttl": 1778666700             // DynamoDB TTL (epoch seconds): updated_at + 30 days, auto-delete when idle
}

Write-behind: Cart Service writes Redis first (fast ack to client), then writes DynamoDB asynchronously. On a Redis miss, it loads the cart from DynamoDB and backfills Redis. DynamoDB TTL auto-deletes carts after 30 days idle.
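
The read path and the TTL attribute can be sketched as follows. This is a minimal illustration under assumptions: `FakeRedis` and the plain dict used as the table are in-memory stand-ins rather than real client libraries, and `get_cart` and `ttl_epoch` are hypothetical helper names.

```python
import json
from datetime import datetime, timedelta

CART_IDLE_DAYS = 30


class FakeRedis:
    """In-memory stand-in for the hot cache; `up` simulates an outage."""

    def __init__(self):
        self.data = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise ConnectionError("redis unavailable")
        return self.data.get(key)

    def set(self, key, value):
        if not self.up:
            raise ConnectionError("redis unavailable")
        self.data[key] = value


def ttl_epoch(updated_at_iso):
    """Epoch seconds at which an idle cart becomes eligible for deletion."""
    updated = datetime.fromisoformat(updated_at_iso.replace("Z", "+00:00"))
    return int((updated + timedelta(days=CART_IDLE_DAYS)).timestamp())


def get_cart(user_id, cache, table):
    """Return (cart, source): Redis when hot, the durable table otherwise."""
    key = f"cart:{user_id}"
    try:
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached), "redis"
    except ConnectionError:
        pass  # degrade gracefully: fall through to the durable store
    doc = table.get(user_id)
    if doc is None:
        return None, "miss"
    try:
        cache.set(key, json.dumps(doc))  # backfill so the next read is hot
    except ConnectionError:
        pass
    return doc, "dynamodb"
```

The first read after a cold start returns source "dynamodb" and warms the cache; later reads return "redis". With the cache marked down, reads still succeed from the table, which is the graceful-degradation requirement.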

(b) Guest-to-auth merge. When a guest user logs in, the Merge Service unions the guest cart with the saved authenticated cart:

  1. Read both carts (guest + auth) from Redis/DynamoDB.
  2. For items in only one cart: include as-is.
  3. For items in both carts (same SKU): keep the higher quantity. The user clearly wants at least that many.
  4. Refresh all prices from Price Service — guest cart may be hours old.
  5. Write merged cart under the authenticated user_id. Delete guest cart.
  6. Adjust inventory soft-holds: release guest holds, acquire auth holds for the merged quantities.
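
Steps 2 to 4 above can be sketched as a pure function. This assumes items are plain dicts keyed by `sku` as in the data model, that a tie keeps the authenticated cart's entry, and that `current_prices` is a lookup already fetched from the Price Service; `merge_carts` is a hypothetical name, and the hold transfer in step 6 is out of scope here.

```python
def merge_carts(guest_items, auth_items, current_prices):
    """Union two carts by SKU, keeping the higher quantity on conflict."""
    merged = {}
    # Visit the authenticated cart first so its entry wins a quantity tie.
    for items in (auth_items, guest_items):
        for item in items:
            sku = item["sku"]
            if sku not in merged or item["qty"] > merged[sku]["qty"]:
                merged[sku] = dict(item)  # copy, so the inputs stay untouched
    # Refresh prices: the guest cart may be hours old.
    for sku, item in merged.items():
        item["current_price"] = current_prices.get(sku, item["price_at_add"])
    return list(merged.values())


guest = [
    {"sku": "A", "qty": 3, "price_at_add": 10.0},
    {"sku": "B", "qty": 1, "price_at_add": 5.0},
]
auth = [
    {"sku": "A", "qty": 1, "price_at_add": 9.0},
    {"sku": "C", "qty": 2, "price_at_add": 7.0},
]
merged = merge_carts(guest, auth, {"A": 11.0, "B": 5.0, "C": 7.5})
```

Keeping the function free of I/O makes the rule deterministic and easy to test, which matters because a merge bug silently drops or doubles items.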

(c) Inventory soft-hold with TTL. When an item is added to cart, hold that unit for 15 minutes so it doesn't sell out from under the user.

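-- Redis: soft-hold on cart-add (pseudocode; "--" marks a comment)
-- Key: inventory:{sku}:available
DECR inventory:{sku}:available       -- reserve 1 unit; if the result is < 0, INCR it back and reject the add
SET  hold:{user_id}:{sku} 1 EX 900  -- hold key stores the held qty, 15-min TTL

-- On TTL expiry (keyspace notification):
-- The expired event carries only the key name, not the value, and delivery is
-- not guaranteed, so for holds larger than 1 unit the qty must be recoverable elsewhere.
INCR inventory:{sku}:available       -- restore stock

-- On checkout (hard commit):
DEL hold:{user_id}:{sku}             -- prevent TTL restore
-- Inventory already decremented; now it's permanent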

If the user doesn't checkout within 15 minutes, the TTL expires, Redis INCR restores the stock, and the item becomes available to others. If they do checkout, the hold key is deleted (preventing the TTL callback) and the decrement becomes permanent.
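
The hold lifecycle (reserve, expire, commit) can be simulated in memory. This is a sketch under assumptions, not the Redis implementation: `SoftHoldStore` is a hypothetical class, time is passed in explicitly as `now` in seconds, and an explicit `expire` sweep stands in for Redis key expiry and keyspace notifications.

```python
HOLD_TTL_SECONDS = 900  # 15 minutes


class SoftHoldStore:
    def __init__(self, stock):
        self.available = dict(stock)  # sku -> units neither held nor sold
        self.holds = {}               # (user_id, sku) -> (qty, expires_at)

    def expire(self, now):
        """Return stock for every hold whose TTL has passed."""
        for key, (qty, expires_at) in list(self.holds.items()):
            if expires_at <= now:
                del self.holds[key]
                self.available[key[1]] += qty

    def reserve(self, user_id, sku, qty, now):
        """Soft-hold qty units; False if stock is short or a hold already exists."""
        self.expire(now)
        if (user_id, sku) in self.holds:
            return False  # one hold per user per SKU
        if self.available.get(sku, 0) < qty:
            return False
        self.available[sku] -= qty
        self.holds[(user_id, sku)] = (qty, now + HOLD_TTL_SECONDS)
        return True

    def commit(self, user_id, sku, now):
        """Checkout: drop the hold so the decrement becomes permanent."""
        self.expire(now)
        return self.holds.pop((user_id, sku), None) is not None


store = SoftHoldStore({"A": 2})
first = store.reserve("u1", "A", 2, now=0)          # u1 holds both units
blocked = store.reserve("u2", "A", 1, now=10)       # nothing left for u2
after_expiry = store.reserve("u2", "A", 1, now=900) # u1's hold lapsed
late_commit = store.commit("u1", "A", now=901)      # u1 is too late
on_time_commit = store.commit("u2", "A", now=901)   # u2 checks out
```

In a single process each method runs without interleaving; in Redis the same check-and-mutate steps need to be atomic, which is why a script or transaction is used there.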

(d) Abandon cart recovery. Cart Service publishes a Kafka event on every cart update. A Flink consumer tracks "time since last update" per cart. When a cart has been idle for > 1 hour and contains items, it fires an event to the Email Service: "You left items in your cart." Recovery emails recapture ~5-10% of abandoned carts — worth billions at Amazon scale.

Recovery timing matters. Sending the email too early (15 min) annoys users still browsing. Too late (24 hours) and they've forgotten or bought elsewhere. The sweet spot is 1 hour for high-intent carts (high-value items), 4 hours for low-value carts. ML models optimize send time per user based on historical open rates and conversion patterns.
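
The timing rule can be sketched as a batch check. This is an illustration only: the source describes a streaming Flink job, whereas this is a plain function over a list; `HIGH_VALUE_THRESHOLD` is a made-up cutoff, and `CartActivity` and `due_for_recovery` are hypothetical names.

```python
from dataclasses import dataclass

HIGH_VALUE_THRESHOLD = 200.00     # hypothetical cutoff for "high-intent" carts
HIGH_INTENT_DELAY_S = 1 * 3600    # 1 hour for high-value carts
LOW_VALUE_DELAY_S = 4 * 3600      # 4 hours for low-value carts


@dataclass
class CartActivity:
    cart_id: str
    last_update: int      # epoch seconds of the last cart-updated event
    item_count: int
    cart_value: float
    email_sent: bool = False


def recovery_delay(cart):
    """Idle time to wait before sending the recovery email."""
    if cart.cart_value >= HIGH_VALUE_THRESHOLD:
        return HIGH_INTENT_DELAY_S
    return LOW_VALUE_DELAY_S


def due_for_recovery(carts, now):
    """Cart ids that are non-empty, not yet emailed, and idle past their delay."""
    due = []
    for cart in carts:
        if cart.item_count == 0 or cart.email_sent:
            continue
        if now - cart.last_update > recovery_delay(cart):
            due.append(cart.cart_id)
    return due


carts = [
    CartActivity("high", last_update=0, item_count=1, cart_value=500.0),
    CartActivity("low", last_update=0, item_count=2, cart_value=20.0),
    CartActivity("empty", last_update=0, item_count=0, cart_value=0.0),
]
```

A per-user ML model, as described above, would replace `recovery_delay` with a learned value while leaving the idle check unchanged.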

Write-through vs write-behind detail. The Cart Service uses a write-behind pattern with Kafka as a durable buffer:

  1. Client calls POST /cart/items. Cart Service writes to Redis immediately — client gets response in < 5 ms.
  2. Cart Service publishes a cart-updated event to Kafka (async, non-blocking).
  3. A DynamoDB writer consumer reads from Kafka and writes to DynamoDB.
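  4. If the Cart Service crashes after the Redis write but before the Kafka publish, DynamoDB never sees the update even though the client already got a success response; the write is lost for good if Redis later loses or evicts that cart. Risk window: ~1-2 ms. Acceptable for a shopping cart (not for payments).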
  5. If DynamoDB writer fails, Kafka retains the event. Consumer retries with exponential backoff. Eventually consistent.
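
The five steps above can be simulated without any infrastructure. This is a sketch under assumptions: a `deque` stands in for the Kafka topic, `FlakyTable` is a hypothetical DynamoDB stand-in that fails a fixed number of times, and `sleep` is injectable so the backoff can be observed without waiting.

```python
from collections import deque


class FlakyTable:
    """Stand-in for DynamoDB that rejects the first `failures` writes."""

    def __init__(self, failures):
        self.failures = failures
        self.rows = {}

    def put(self, key, doc):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("throttled")
        self.rows[key] = doc


def write_cart(cache, log, user_id, doc):
    """Steps 1-2: write the cache, enqueue the event, ack the client."""
    cache[user_id] = doc
    log.append((user_id, doc))
    return "ok"


def drain(log, table, base_delay=0.1, max_attempts=5, sleep=lambda seconds: None):
    """Steps 3 and 5: apply events in order, retrying with exponential backoff."""
    delays = []
    while log:
        key, doc = log[0]  # peek: the event stays queued until it is written
        for attempt in range(max_attempts):
            try:
                table.put(key, doc)
                break
            except RuntimeError:
                delay = base_delay * 2 ** attempt
                delays.append(delay)
                sleep(delay)
        else:
            return delays  # give up for now; the event is retained for a later run
        log.popleft()
    return delays


cache, log, table = {}, deque(), FlakyTable(failures=2)
ack = write_cart(cache, log, "u1", {"items": [{"sku": "A", "qty": 2}]})
delays = drain(log, table)
```

The client is acknowledged before the table write happens, and an event leaves the queue only after a successful write, which mirrors committing a consumer offset after processing.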
Cart Lifecycle: Add to Merge to Checkout (Mermaid)
sequenceDiagram
    participant U as User (guest)
    participant CS as Cart Service
    participant R as Redis
    participant D as DynamoDB
    participant IS as Inventory svc
    participant MS as Merge Service
    participant CO as Checkout
    U->>CS: POST /cart/items {sku, qty:2}
    CS->>R: HSET cart:guest_xyz (item)
    CS->>D: PutItem (async write-behind)
    CS->>IS: DECR inventory:{sku} by 2
    IS-->>CS: hold confirmed (15-min TTL)
    CS-->>U: cart updated, price_at_add=$29.99
    Note over U: User logs in
    U->>MS: POST /cart/merge {guest_cart_id}
    MS->>R: GET cart:guest_xyz + cart:u_abc123
    MS->>MS: union items, keep higher qty
    MS->>R: HSET cart:u_abc123 (merged)
    MS->>D: PutItem (merged, async)
    MS->>IS: release guest holds, acquire auth holds
    MS-->>U: merged cart returned
    U->>CO: POST /checkout
    CO->>IS: hard-commit inventory (DEL hold keys)
    CO-->>U: order confirmed

Soft-hold edge cases. The 15-min TTL soft-hold has several edge cases that need careful handling:

  • User updates quantity from 2 to 5: Cart Service computes the delta (+3) and calls DECRBY 3. If there is not enough stock for the delta, reject the update and return "only N available."
  • User removes item: Cart Service calls INCRBY with the held quantity. Stock is immediately available to others.
  • TTL race on checkout: User's hold expires at T=15:00, checkout request arrives at T=14:59. The hold key might expire between "check hold exists" and "delete hold key." Solution: use a Lua script that atomically checks and deletes the hold, then hard-commits the inventory decrement.
  • Multiple tabs / devices: User adds item on phone (hold created), adds same item on laptop (second hold attempted). Deduplicate by user_id + sku — only one hold per user per SKU.
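
The quantity-delta, remove, and multi-device cases can be sketched with one function. This is an illustration under assumptions: plain dicts stand in for the Redis counters and hold keys, TTL handling is omitted, `adjust_hold` is a hypothetical name, and the "only N available" message reports the most this user could hold (current hold plus free stock).

```python
def adjust_hold(available, holds, user_id, sku, new_qty):
    """Move a user's hold on a SKU to new_qty; returns (ok, message)."""
    if new_qty < 0:
        raise ValueError("quantity cannot be negative")
    key = (user_id, sku)          # one hold per user per SKU, across devices
    held = holds.get(key, 0)
    delta = new_qty - held
    free = available.get(sku, 0)
    if delta > 0:
        if free < delta:
            return False, f"only {held + free} available"
        available[sku] = free - delta   # reserve only the extra units
    elif delta < 0:
        available[sku] = free - delta   # delta is negative: return units to stock
    if new_qty == 0:
        holds.pop(key, None)            # item removed from the cart
    else:
        holds[key] = new_qty
    return True, "ok"


available = {"A": 4}
holds = {}
added = adjust_hold(available, holds, "u1", "A", 2)      # phone adds 2
too_many = adjust_hold(available, holds, "u1", "A", 5)   # needs +3, only 2 free
same_again = adjust_hold(available, holds, "u1", "A", 2) # laptop repeats: no-op
removed = adjust_hold(available, holds, "u1", "A", 0)    # remove releases both
```

Keying the hold by (user_id, sku) makes a repeated request from a second device a no-op instead of a second reservation.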

Price service integration. Every GET /cart call refreshes prices from the Price Service cache (a Redis cluster with catalog prices updated every 5 minutes from the product catalog DB). The response includes both price_at_add (historical) and current_price. If they differ by more than 5%, the UI shows a "price changed" badge. Before checkout, a final price validation ensures the user pays the current price — never the stale one.
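
The refresh-and-badge rule can be sketched as follows. This assumes the 5% threshold is relative to `price_at_add`, uses floats for brevity (a real system would more likely use integer cents or decimals), and treats `catalog` as a lookup already fetched from the Price Service; all function names are hypothetical.

```python
PRICE_CHANGE_THRESHOLD = 0.05  # show the badge when prices differ by more than 5%


def price_changed(price_at_add, current_price, threshold=PRICE_CHANGE_THRESHOLD):
    """True when the current price moved more than `threshold` from the snapshot."""
    if price_at_add <= 0:
        return current_price != price_at_add
    return abs(current_price - price_at_add) / price_at_add > threshold


def refresh_prices(items, catalog):
    """Attach current_price and a price_changed flag to each cart item."""
    refreshed = []
    for item in items:
        current = catalog.get(item["sku"], item["price_at_add"])
        refreshed.append({
            **item,
            "current_price": current,
            "price_changed": price_changed(item["price_at_add"], current),
        })
    return refreshed


def checkout_total(refreshed_items):
    """Checkout always charges the current price, never the snapshot."""
    return sum(i["current_price"] * i["qty"] for i in refreshed_items)


items = [
    {"sku": "A", "qty": 2, "price_at_add": 30.0},
    {"sku": "B", "qty": 1, "price_at_add": 100.0},
]
refreshed = refresh_prices(items, {"A": 25.0, "B": 104.0})
total = checkout_total(refreshed)
```

Item A dropped by about 17% and gets the badge; item B moved 4% and does not, but both are charged at the current price.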

Interview answer

"Cart is a per-user JSON document in DynamoDB (durable) with a Redis cache (fast). Writes go through Redis first, then async to DynamoDB via Kafka write-behind. Guest carts merge on login via union strategy — same SKU keeps higher quantity, prices refreshed. Inventory soft-hold uses Redis DECR with a 15-minute TTL key: if checkout happens, delete the TTL key and hard-commit; if abandoned, TTL expires and INCR restores stock. Kafka events on cart updates feed a Flink consumer that triggers abandon-recovery emails after 1 hour idle."

07

Interview Tips

  1. Lead with the hybrid storage model. "Redis for hot reads at sub-ms, DynamoDB for durable persistence, write-behind via Kafka." This shows you understand the latency-durability tradeoff at scale.
  2. Guest merge is the differentiator. Most candidates forget that 40% of e-commerce traffic is unauthenticated. Explain the union strategy (keep higher quantity) and the inventory hold transfer — this is production-level detail.
  3. Soft hold with TTL is the key insight. "DECR on add, TTL key at 15 min, INCR on expiry. Checkout deletes the TTL key to make the hold permanent." This single pattern prevents both oversell and dead-stock from abandoned carts.
  4. Don't forget abandon recovery. 70% of carts are abandoned. At Amazon scale that's 350M carts/day. Recovering even 5% via email is worth billions annually. Kafka event stream + idle detection is the architecture.
  5. Distinguish from flash-sale inventory. Flash sale = single atomic counter, extreme contention. Shopping cart = distributed soft holds across millions of SKUs, long-lived state. Different problem, different solution.
08

Anti-patterns

-
Store the cart in an HTTP session cookie

4 KB cookie size limit. Lost when user switches devices. Lost when cookie expires. No server-side visibility for analytics or abandon recovery. Cannot merge.

Better: Server-side cart keyed by user_id (or guest token). Accessible from any device. Full server-side control.
-
Hard-lock inventory when item is added to cart

70% of carts are abandoned. 1M abandoned carts = 1M items locked forever. Real buyers see "out of stock" for items nobody will buy. Revenue lost.

Better: Soft hold with 15-min TTL. Stock auto-releases if user doesn't checkout. Fair to everyone.
-
Never refresh prices — show the price from 3 days ago at checkout

User adds item at $29.99, price drops to $24.99 — they're overcharged. Or price rises — you eat the margin. Either way: disputes, chargebacks, angry users.

Better: Store price_at_add for display. Refresh from Price Service on cart GET and before checkout. Show "price changed" banner if different.
09

Tradeoffs & Design Choices

  • Server-side cart (consistent, cross-device) vs client-side (fast offline, no server cost). Server-side wins for Amazon-scale: cross-device sync is table stakes, and you need server visibility for abandon recovery + analytics. Client-side is viable for simple single-device apps.
  • Soft hold (fair but complex) vs no hold (oversell risk at checkout). Soft hold prevents the "added to cart but sold out at checkout" frustration. The complexity of TTL management is worth it — checkout conversion improves 5-10% when users trust "in my cart = reserved for me."
  • DynamoDB (durable, infinite scale) vs Redis-only (fast but volatile). Redis-only loses carts on crash. DynamoDB-only is too slow for hot reads (5-10 ms vs sub-ms). Hybrid: Redis for speed, DynamoDB for durability. Best of both.
  • Write-through vs write-behind. Write-through (Redis then sync DynamoDB) adds latency. Write-behind (Redis then async DynamoDB) risks data loss in Redis crash window. We choose write-behind with Kafka as a durable buffer — cart writes ack from Redis in < 1 ms, Kafka ensures DynamoDB eventually gets it.
  • Merge strategy: keep-higher-qty vs keep-latest vs prompt user. Keep-higher-qty is the safest default — never lose items. Prompting the user adds friction at login. Keep-latest risks losing items from the other cart.
  • Cart TTL (30 days) vs infinite retention. Keeping carts forever means 500M carts/day accumulate indefinitely — storage grows without bound. 30-day TTL via DynamoDB auto-delete keeps costs manageable. Power users who return after 60 days see an empty cart — acceptable tradeoff. Wishlists solve the "save for later" use case separately.
  • Single cart per user vs multiple saved carts. Amazon allows one active cart but offers "Save for Later" and wishlists. Multiple named carts (e.g., "birthday party supplies") add UI complexity. Start with one cart + save-for-later; evolve to named carts if user research demands it.
10

Failure Modes

--
Cart-inventory desync (TTL expired but item still in cart)
User's 15-min hold expires. Item still shows in their cart, but stock was released and bought by someone else. User clicks checkout — item is actually gone.
Mitigation: on checkout, re-check inventory. If unavailable, show "this item sold out since you added it" with alternatives. Refresh soft-hold on cart interaction (extend TTL on any cart activity).
--
Merge conflict on login
Guest cart has SKU-A qty=3, auth cart has SKU-A qty=1. Which wins? If merge logic has a bug, items are lost or doubled.
Mitigation: deterministic merge rule (max qty). Log both carts before merge for audit. Expose "recently removed" so user can recover if merge drops something unexpectedly.
--
Price change between add and checkout
User adds item at $49.99, price changes to $59.99 before checkout. User feels bait-and-switched. Or price drops and you overcharge.
Mitigation: refresh prices on every cart GET. Show clear "price changed" banner. Let user decide to proceed or remove. Never silently charge a different price.
--
Redis crash loses hot carts
Redis cluster node fails. All carts cached on that shard are gone. Users see empty carts until DynamoDB backfill.
Mitigation: Redis with AOF persistence + replicas. On cache miss, load from DynamoDB (adds ~5 ms latency, not catastrophic). Kafka write-behind buffer ensures DynamoDB is never more than seconds behind Redis.
--
DynamoDB throttling during peak
Black Friday traffic spike. DynamoDB provisioned capacity exceeded. Write-behind Kafka consumer can't keep up. Cart persistence falls behind.
Mitigation: DynamoDB on-demand capacity mode (auto-scales). Kafka consumer lag monitoring with alerts. If lag exceeds 60 seconds, add consumers to the group (up to the topic's partition count; beyond that, add partitions). Redis continues serving reads, so users are unaffected even if DynamoDB is behind.
--
Inventory service returns stale hold status
Inventory Redis and Cart Redis are separate clusters. Network partition between them. Cart thinks hold is active; inventory has already released it.
Mitigation: checkout always re-validates inventory with a synchronous call to the inventory service. Cart-side hold is a UX hint, not a guarantee. The guarantee comes at checkout time.
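
The checkout re-validation that several mitigations above rely on can be sketched as a single check. This is an illustration under assumptions: dicts stand in for the inventory service's view of holds and free stock, and `validate_checkout` is a hypothetical name.

```python
def validate_checkout(cart_items, user_id, holds, available):
    """Return the SKUs that can no longer be fulfilled for this cart."""
    sold_out = []
    for item in cart_items:
        sku, qty = item["sku"], item["qty"]
        held = holds.get((user_id, sku), 0)   # 0 if the hold already expired
        shortfall = max(qty - held, 0)        # units not covered by a live hold
        if shortfall > available.get(sku, 0):
            sold_out.append(sku)
    return sold_out


cart = [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]
holds = {("u1", "A"): 2}             # hold on A is live, hold on B expired
result_when_gone = validate_checkout(cart, "u1", holds, {"A": 0, "B": 0})
result_when_restocked = validate_checkout(cart, "u1", holds, {"A": 0, "B": 1})
```

An expired hold is not fatal by itself: the item only fails validation when the free stock cannot cover the units the hold no longer protects.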
12

Evolution

1

Session cookie cart

Cart stored in browser cookie or server session. Simple but lost on device switch, limited to 4 KB, no abandon tracking.

2

Server-side DB cart

Cart in MySQL/Postgres keyed by user_id. Cross-device. But every read hits the DB — slow under load. No guest merge.

3

Redis hot + DB cold hybrid

Redis caches active carts for sub-ms reads. DynamoDB persists for durability. Write-behind via Kafka. 50K ops/sec handled.

4

Soft inventory hold + price lock

Redis DECR with 15-min TTL reserves stock on cart-add. Price snapshot stored per item. Checkout friction drops dramatically.

5

ML-driven abandon recovery + recommendations

Flink tracks cart idle time. ML model picks optimal email send time per user based on historical open/conversion rates. Cart page shows "frequently bought together" powered by collaborative filtering on cart contents. A/B testing on recovery email timing, subject lines, and discount offers drives continuous conversion improvement.
