DoorDash / Uber Eats

A three-sided marketplace: eaters order, merchants prepare, Dashers deliver. The hard parts: a dispatch engine that assigns the right Dasher to the right order in real-time optimizing for ETA, efficiency, and driver earnings; a live ETA system updating predictions every few seconds based on traffic, restaurant prep time, and Dasher position; and a transactional order lifecycle that doesn't lose your $47 burrito when the payment processor burps. DoorDash does ~2B orders/year; ~6M orders/day.

⚡ Core: Dispatch + ETA + Transactions~6M orders/day~300k daily DashersMulti-party SLALocation-heavy
02

Requirements

Functional
  • Eater browses local restaurant menus; places an order with customizations
  • Order routes to restaurant; kitchen accepts/rejects/adjusts prep time
  • Dispatch engine picks a nearby Dasher and offers the order; Dasher accepts/declines
  • Live order tracking map showing Dasher location + ETA updates
  • Payment charged on order; tips added/modified post-delivery; refunds for issues
  • Ratings + tips; dispute resolution; scheduled orders
Non-Functional
  • Order placement < 500 ms p99; live tracking < 2 s update latency
  • Dispatch decision < 3 seconds — balance of quality vs. latency
  • ETA accuracy within ~3 min on 90th percentile
  • Strong consistency on orders + payments; eventual on search / browse
  • 99.99% availability on order placement (revenue-critical)
  • Scale to ~6M orders/day = ~70/sec avg, ~600/sec peak (Friday dinner)
03

Scale Estimation

Orders / sec (peak)
~600
6M/day × ~10× peak-to-avg; concentrated 6–8 pm local time
Active Dashers / sec
~300K
sending location pings every 4 s = ~75K pings/sec baseline
Geospatial queries / sec
~10K
"nearby Dashers", "nearby restaurants" — hot QPS on the geo index
Dispatch latency budget
~3 s
eater waits for "assigning Dasher"; longer feels broken
Order storage
~60 GB / day
6M orders × ~10 KB each incl. line items + events; ~22 TB/yr hot
Live tracking sessions
~500K
concurrent during dinner peak; each gets Dasher location pushed every ~4 s
04

API Design

GET/restaurants?lat=X&lng=Y&radius=3km

Geo-indexed restaurant search. Returns {restaurants: [{id, name, cuisine, eta_range, fee, rating}], hot_tiles}. Served from a read replica with restaurants clustered by H3/geohash cell.

POST/orders

Create order with idempotency key. Body: {restaurant_id, items[], address, payment_token, tip_amount}. Returns {order_id, total, estimated_eta, status: 'pending_restaurant'}. Strongly consistent; wrapped in a transaction with payment auth.

GET/orders/{id}/track

Live order status + Dasher location. Eater clients poll this every 4 s (or WebSocket push). Returns current {state, dasher_location, eta, elapsed}.

POST/dasher/location

Dasher app posts GPS ping every 4 s when active. Goes to the geo index + the live-tracking push bus. Fire-and-forget; retry on failure.

POST/dasher/offers/{id}/respond

Dasher accepts/declines a dispatched offer. Response window is ~30 s; server auto-rejects and re-dispatches after timeout.

POST/merchants/{id}/orders/{oid}/prep-time

Restaurant POS integration or tablet app updates ready-by-time. Critical signal for dispatch timing.

POST/orders/{id}/tips

Post-delivery tip modification. Hits payment service; updates order ledger. Idempotent.

05

Architecture

Three primary domains: Marketplace (restaurants, menus, browse), Order (checkout, lifecycle, payments), and Logistics (dispatch, ETAs, live tracking). Each is an independent service with its own datastore. They communicate via events on Kafka.

Marketplace / Order / Logistics Services SVG
Eater app browse + order Merchant POS tablet / API Dasher app GPS + offers API Gateway auth + routing Marketplace svc restaurants, menus Order svc checkout + lifecycle Payment svc auth/capture/refund Dispatch svc Dasher ↔ order match ETA / Routing svc ML + map tiles Location svc H3 geo-index Kafka order events Kafka dispatch events Postgres (orders) sharded by user_id Postgres (catalog) menus + restaurants Redis (sessions) live tracking state Cassandra location history Elasticsearch restaurant search S3 ML features + logs
Request Flow — Step Through
Eater · taps place orderOrder svc · saga: createPayment · auth before confirmMerchant · accept + prep timeDispatch · 2s batched windowDasher · push offer / acceptTracking · GPS ping + ETA
Click Next Step to walk through the request flow.
06

Deep Dive — Dispatch Engine

Dispatch is the heart of the system. An order ready (or soon-to-be ready) needs matching to a Dasher with the best tradeoff across: pickup distance, estimated total trip time, driver earnings fairness, batching potential (combine with nearby orders), and restaurant prep synchronization. This is a combinatorial assignment problem at scale.

Naive approach: nearest Dasher wins. Good enough for small markets, terrible at scale — you'd leave Dashers idle on one side of the map while orders pile up on the other.

Modern dispatch algorithm (simplified):

  1. Candidate generation. Query H3 geo-index for Dashers within ~3 km of restaurant. Typically 10–50 candidates.
  2. Feature extraction. For each (Dasher, order) pair compute: pickup distance, Dasher current queue length, restaurant prep time, historical Dasher acceptance rate, current traffic on route.
  3. Score each pair with an ML model predicting "delivery quality" — likelihood of on-time + Dasher acceptance + CS-call-free outcome.
  4. Solve as assignment problem. For N open orders × M candidate Dashers, run the Hungarian algorithm (or approximation) to maximize total score subject to "each Dasher gets at most 1 order." Batch every ~2 s to give multiple orders a chance to co-optimize.
  5. Offer to winning Dasher. Send push notification; 30 s response window. If declined, re-run assignment with remaining candidates.
  6. Consider batching. Two orders at the same restaurant or two orders on the same route → offer both to one Dasher; cut cost per delivery ~30%.
Dispatch Flow — Order Placed → Dasher Assigned Mermaid
sequenceDiagram participant E as Eater participant O as Order svc participant M as Merchant participant D as Dispatch participant GEO as Location svc participant DSR as Dasher E->>O: POST /orders O->>M: forward + prep-time request M-->>O: accept + 12 min O->>D: OrderReady @ T+10 min event D->>GEO: nearby Dashers within 3 km GEO-->>D: candidates [D1..D25] Note over D: score candidates
ML + Hungarian D->>DSR: offer to best Dasher alt accepted DSR-->>D: accept D->>O: AssignmentCommitted O-->>E: ETA + Dasher info else declined / timeout 30s D->>D: remove + reassign end

Why not run dispatch on every order immediately? Because the optimal assignment often looks 60–120 s into the future. Running a batched window lets "order A places at T, order B places at T+30 s near the same restaurant" be offered as a batch. Pure first-come-first-served leaves money on the table.

Location service. Every Dasher pings location every ~4 s. The service maintains a live H3 geo-index: H3 cell → set of Dasher IDs. Lookups are ring queries — "give me Dashers in this cell and its 18 neighbors." Stored in Redis for speed; bulk location history archived to Cassandra for ML training.

ETA. A separate ML service with features: distance, time-of-day traffic, historical restaurant prep time, Dasher speed profile, weather. Runs a GBDT model every time an eater loads the tracking page. P90 error target: ~3 minutes.

Interview answer

"Dispatch runs in ~2 s windows as a batched assignment problem. Query H3 geo-index for Dashers within 3 km, score each (order, Dasher) pair with an ML model on distance + prep time + historical signals, solve with Hungarian algorithm to maximize global score. Batch two nearby orders to one Dasher when possible. Offer sent via push with 30 s response window; on reject, re-dispatch. ETA served by a separate GBDT model refreshed on every tracking-page open. Location service uses H3 cells + ring queries, hot data in Redis, history in Cassandra."

07

Tradeoffs & Design Choices

  • Batched dispatch vs instant. Batching (every 2 s) improves global efficiency but adds latency per order. Instant feels faster but is locally greedy and wasteful. The 2-second window is the sweet spot — imperceptible to eater, big marketplace-efficiency win.
  • Order-as-saga, not a single transaction. Create order → auth payment → confirm with restaurant → dispatch Dasher — each step can fail. Saga pattern with compensating actions (auto-refund on restaurant decline, re-dispatch on Dasher timeout) beats trying to do it as one distributed transaction.
  • ETA accuracy vs pessimism. A promised 25 min ETA delivered in 23 min delights the user; a promised 25 min delivered in 28 min makes them angry even though it's "close." ETAs are intentionally padded ~2 min for this UX asymmetry.
  • H3 cells vs quadtree vs geohash. H3 (Uber's hex grid) beats square grids because hex neighbors are all equidistant, simplifying ring queries. Geohash has edge-of-cell artifacts. DoorDash adopted H3 (originally Uber's library) for dispatch lookups.
  • Strong consistency where it matters. Order state transitions are sagas with idempotent steps in Postgres; payments are 2-phase (auth + capture); dispatch assignments are Cassandra-style writes with dedup. Browse/search can be eventually consistent (cached menus are fine for 60 s).
08

Failure Modes

🍔
Restaurant accepts, then realizes it's out of a dish
Eater waits 10 min; then gets "item unavailable" and has to pick a substitute or cancel. Broken promise.
→ Mitigation: menu inventory integration with POS so out-of-stock items are auto-hidden; pre-order availability check before confirm; in-app substitution flow with partial refund.
🚗
Dasher GPS ping goes dark mid-delivery
Tunnel, dead battery, bad signal → eater's tracking map freezes; CS complaints spike.
→ Mitigation: client-side UI gracefully shows "Dasher is approaching" without exact position after N seconds of stale data. Don't show the last-known dot forever — it's creepy and misleading.
💳
Payment auth fails after order committed
Order saved; payment processor declines. Restaurant started cooking.
→ Mitigation: payment auth is the first step of the saga, not last. No order state written before auth success. If capture fails later (rare), order proceeds and we collect on retry or debt channel.
🌧️
Weather surge — 3× order volume, half the Dashers
Snow storm. Orders spike, Dashers stay home. Wait times balloon; dispatch queue backs up.
→ Mitigation: surge pricing incentives to Dashers; ETA inflation visible to eater at checkout so expectations match reality; temporary caps on ordering in the most impacted areas to prevent total collapse.
🧨
Dispatch service regional outage
Dispatch goes down in a region. Orders pile up in "pending assignment"; restaurants cook food with no one to deliver it.
→ Mitigation: automatic fallback to nearest-Dasher heuristic in-region (no ML model required); circuit-breaker from Order svc to Dispatch so eaters see realistic ETA extensions; graceful degradation better than full failure.
🔁
Duplicate order on network retry
Eater taps "place order" twice during a spinner; phone retries on 504 → two orders.
→ Mitigation: client-generated idempotency key per tap; Order service rejects duplicate keys within 10 minutes; charge once.

Anti-patterns

🚫
Dispatch every order instantly on arrival

Greedy assignment misses batching opportunities; Dashers deadhead.

✓ Better: 2-second batched window; Hungarian assignment with batching bonus.
🚫
Poll for order status every second

Millions of clients × 1/sec = DDoS on your own API.

✓ Better: WebSocket push or long-poll; status delta events; client pulls on wake.
🚫
One `orders` table with everything

Merchant / Dasher / Rider / Payments all fight over this hot row.

✓ Better: Split into Marketplace / Order / Logistics / Payment services with their own stores.
09

Interview Tips

  1. Lead with dispatch, not with search. The marketplace is boring; dispatch is where the interesting engineering is. Interviewers expect you to pick dispatch as the deep-dive.
  2. Explicitly call out "three-sided." Eaters, merchants, Dashers each have their own app, their own SLA, their own failure modes. Many candidates forget the merchant surface entirely.
  3. Saga over distributed transactions. Don't try to "2PC across restaurant + payment + dispatch." Saga with idempotent compensations is the production pattern.
  4. H3 over geohash. Mention H3 by name — Uber popularized it, DoorDash uses it, Yelp uses it. Shows domain fluency.
  5. ETA is ML, not "nearest(restaurant, address)/avg_speed." Frame it as a model trained on millions of historical deliveries. Distance is one feature among many.
11

Evolution

1

MVP — single city, one PostgreSQL, human dispatcher

Orders into Postgres; ops team manually calls Dashers. Works for ~1000 orders/day. DoorDash's first year-ish.

2

Rule-based auto-dispatch + Dasher app

Nearest-available with basic filters. Sharded DB by region. Carries to ~100K orders/day.

3

Batched assignment + H3 geo-index

Shift to 2-second dispatch windows with Hungarian-algorithm assignment. H3 cells for fast nearest-Dasher queries. ~1M orders/day.

4

ML-driven dispatch + ETAs

GBDT models on historical delivery data replace heuristics for both assignment scoring and ETA prediction. Acceptance rates up, CS contacts down.

5

Multi-order batching + vertical expansion

One Dasher carries multiple orders on overlapping routes; cost per delivery drops ~30%. Platform extends to groceries, convenience, returns.

Next up