Mock Interview: Design Uber — Mock Transcript — Mock Interview

Interviewer

Design Uber. Riders request a ride, drivers accept, and we need to match them efficiently. Assume we're operating in 600 cities worldwide with around 5 million rides per day.

Candidate

I want to start with the core dispatch problem, because that's where the real engineering challenge is. Dispatch isn't just "find the nearest driver" — it's a batched assignment problem. Every 2 seconds, we collect all pending ride requests and all available drivers in a region, then solve an optimization: minimize total pickup time across all pairs while respecting constraints like driver preferences, vehicle type requirements, and predicted trip profitability. In production, Uber uses a variant of the Hungarian algorithm in 2-second windows to solve this bipartite matching. This batched approach outperforms greedy nearest-driver matching by 10-20% on average pickup time.

📝 Annotation

Saying "Hungarian algorithm in 2-second windows" shows production awareness. Most candidates describe a simple nearest-driver search. Framing dispatch as batched optimization immediately signals that the candidate understands this is a matching problem, not a proximity lookup.

Interviewer

How do you efficiently find nearby drivers for this matching?

Candidate

I'd use H3, Uber's open-source hierarchical hexagonal grid system. The Earth's surface is divided into hexagonal cells at multiple resolutions. For driver matching, resolution 7 gives us hexagons of about 5km² — a good granularity for urban areas. Each driver's current location maps to an H3 cell index. When a ride request comes in, we compute the H3 cell for the pickup point and query drivers in that cell plus the 6 neighboring cells (the k-ring). This gives us O(1) spatial lookups instead of computing distances against every driver. The H3 indices are stored in a Redis sorted set keyed by cell ID, with driver_id as the member and last-update timestamp as the score. Stale entries (drivers who haven't pinged in 30 seconds) are pruned by a background job.

📝 Annotation

Naming H3 by name and explaining the k-ring neighbor query shows domain fluency. The specific resolution choice (7) and the Redis sorted-set implementation with TTL pruning are details that distinguish a senior-level answer.

Interviewer

Walk me through the location tracking system. Drivers are sending GPS updates constantly.

Candidate

Drivers send GPS pings every 4 seconds via a persistent WebSocket connection to the nearest edge server. That's roughly 1.25 million pings per second at peak (5M active drivers). The edge server publishes each ping to a Kafka topic partitioned by city_id. A stream processor — Flink or Kafka Streams — consumes these events and does three things: (1) updates the driver's position in the H3 spatial index in Redis, (2) writes to a time-series store (like Apache Druid or TimescaleDB) for trip replay and analytics, and (3) pushes the position to the rider's app via a separate WebSocket channel if they're on an active trip. The key design decision is that the spatial index is eventually consistent — a 1-2 second lag is acceptable for matching, but the rider-facing position stream is real-time via direct WebSocket relay.

Interviewer

How do you compute ETAs? That seems like a hard problem.

Candidate

ETA computation happens at two levels. For the initial ride offer, we need a rough ETA within 100ms. We precompute a road-network graph using OpenStreetMap data, partitioned into regional subgraphs. We run a modified Dijkstra (actually Contraction Hierarchies, which preprocesses the graph to make queries 1000x faster than naive Dijkstra) to get travel time estimates. But raw graph distances aren't enough — we layer on real-time traffic data. We aggregate GPS traces from active drivers to compute per-road-segment speeds, updated every 2 minutes. The ETA model combines the graph distance with a gradient-boosted decision tree that adjusts for time-of-day, weather, and local events. During an active trip, we continuously refine the ETA using the driver's actual speed and reroute if they deviate from the predicted path.

📝 Annotation

Mentioning Contraction Hierarchies by name is a strong signal. This is the actual algorithm used by OSRM and similar routing engines. The two-layer approach (graph + ML adjustment) is how real ETA systems work.

Interviewer

What if the dispatch service goes down? How do you handle that failure?

Candidate

The dispatch service is the most critical component, so it needs careful failure handling. First, it runs as a stateless service behind a load balancer — the matching state lives in Redis, not in the service instances. If an instance crashes, requests route to another instance that reads the same Redis cluster. For a full dispatch outage, we have a degraded fallback mode: ride requests go into a Kafka dead-letter queue, and a simplified matcher runs that does basic nearest-driver assignment without the optimization — essentially greedy matching. This is worse for efficiency but keeps rides flowing. The system detects dispatch health via a circuit breaker pattern: if the optimization solver exceeds its 2-second SLA for 3 consecutive windows, we trip the breaker and activate greedy mode. Recovery is automatic once the solver latency returns to normal. We also geo-partition dispatch by city, so a failure in the NYC dispatch shard doesn't affect London.

📝 Annotation

The geo-partitioned blast radius is excellent. Combining circuit breaker with a graceful degradation to greedy matching shows the candidate thinks about failure as a spectrum, not a binary. The dead-letter queue ensures no ride request is silently dropped.

Interviewer

Tell me about surge pricing. How does that work architecturally?

Candidate

Surge pricing is a supply-demand balancing mechanism. We compute a surge multiplier per H3 cell at resolution 6 (larger hexagons, about 36km²) every 30 seconds. The inputs are: number of pending ride requests in the cell, number of available drivers, historical demand patterns for this cell/time, and nearby cell spillover. A simple model: if demand/supply ratio exceeds 1.5, we start surging. The multiplier follows a curve — 1.2x at ratio 1.5, up to 3x at ratio 4.0. The surge service publishes multipliers to a Redis hash that both the pricing service and the rider app read. Critically, the displayed surge price must be locked in at request time and honored for 5 minutes — we store the quoted surge in the ride record so that a multiplier change mid-request doesn't affect the rider. Drivers see a heat map of surge zones to incentivize movement toward high-demand areas.

Interviewer

How do you handle the payment flow?

Candidate

Payment is a critical path that must be reliable and auditable. When the ride starts, we place a hold (auth) on the rider's payment method for the estimated fare plus a buffer. When the ride ends, we compute the actual fare based on distance, time, surge multiplier, and any promotions. We then capture the exact amount against the hold. If the actual exceeds the hold, we do a separate charge for the difference. All of this goes through an idempotent payment service with a unique ride_id as the idempotency key. The payment service writes to a double-entry ledger: every transaction creates both a debit (rider) and credit (driver's earnings account) entry. Driver payouts happen in a batch settlement process — daily or weekly depending on the driver's preference. We reconcile the ledger daily against the payment processor's records to catch discrepancies.

📝 Annotation

The auth-then-capture pattern with idempotency keys and double-entry ledger shows financial systems awareness. Many candidates overlook the payment dimension of ride-sharing entirely.

Interviewer

Any final architectural thoughts?

Candidate

Two things. First, the trip lifecycle is a state machine: REQUESTED → MATCHED → DRIVER_EN_ROUTE → ARRIVED → IN_PROGRESS → COMPLETED (or CANCELLED at any stage). Each transition is an event published to Kafka, enabling downstream consumers for analytics, fraud detection, and customer support tooling. Second, the entire system is multi-region active-active. Each city's data lives in the region closest to it, but we replicate ride and payment records cross-region for disaster recovery. Failover happens at the city level — if us-east-1 goes down, NYC rides can failover to us-west-2 within 60 seconds using pre-warmed standby dispatch instances that sync from the replicated Redis cluster.

📝 Annotation

Framing the trip as a state machine is clean and shows the candidate thinks in terms of event-driven architecture. The city-level failover granularity is a thoughtful detail that avoids the complexity of global active-active while still providing resilience.

Mock Interview: Design Uber — Mock Transcript

Problem statement

Transcript

Key takeaways