10
Deep Dive — ETA Calculation
When the rider sees "3 min away," that's not a guess — it's a prediction pipeline with four layers, each trading accuracy for speed.
Why Not Just Use Google Maps?
Cost + Latency Kill It
At 5 ETA calculations per app open × 1M opens/day, that's 5M API calls/day per city. At $5 per 1K calls, that's $25,000/day per city, or $12.5M/day across 500 cities. Each call also adds 100–300ms of latency. So Uber built its own routing engine.
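Here is the cost arithmetic as a quick script. The inputs are the text's illustrative figures. The $5 per 1,000 calls is an assumed price, not a quoted rate.

```python
# Back-of-envelope cost of outsourcing every ETA to a paid maps API.
ETAS_PER_APP_OPEN = 5
APP_OPENS_PER_DAY = 1_000_000      # per city
PRICE_PER_1K_CALLS_USD = 5.0       # assumed price
CITIES = 500


def daily_api_cost(etas_per_open, opens_per_day, price_per_1k, cities):
    """Return (calls/day per city, USD/day per city, USD/day across all cities)."""
    calls = etas_per_open * opens_per_day
    per_city = calls / 1000 * price_per_1k
    return calls, per_city, per_city * cities


calls, per_city, total = daily_api_cost(
    ETAS_PER_APP_OPEN, APP_OPENS_PER_DAY, PRICE_PER_1K_CALLS_USD, CITIES)
print(f"{calls:,} calls/day/city, ${per_city:,.0f}/day/city, ${total:,.0f}/day total")
# prints: 5,000,000 calls/day/city, $25,000/day/city, $12,500,000/day total
```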
The Four-Layer ETA Pipeline
| Layer | When Used | Latency | Accuracy (typical error) |
|---|---|---|---|
| Haversine + Detour Factor | Rider browsing, quick filter | <1ms | ±40% |
| Pre-Computed Zone Matrix | "X min away" on nearby drivers | <1ms | ±20% |
| Contraction Hierarchies | After match: "arriving in X" | 5–50ms | ±10% |
| ML Correction | Final correction layer | +5ms | ±5–7% |
Layer 1 — Haversine + Detour Factor
straight_line = haversine(driver_lat, driver_lng, rider_lat, rider_lng)
estimated_dist = straight_line × detour_factor // Manhattan: 1.4, London: 1.7
eta = estimated_dist / avg_speed_at_time_of_day // 8am rush: 15km/h, 3am: 40km/h
Sub-millisecond. Just arithmetic. Good enough for "is the nearest driver ~3 minutes or ~15 minutes away?" — the coarse filter.
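The following is a minimal runnable sketch of the Layer 1 pseudocode. The detour factors come from the text. The hour-to-speed table (`AVG_SPEED_KMH`) and its 25 km/h fallback are hypothetical placeholders for a real time-of-day speed profile.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Detour factors from the text; the speed table is a hypothetical time-of-day profile.
DETOUR_FACTOR = {"manhattan": 1.4, "london": 1.7}
AVG_SPEED_KMH = {8: 15.0, 3: 40.0}   # hour of day -> average speed (km/h)
DEFAULT_SPEED_KMH = 25.0             # assumed fallback for hours not in the table


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def layer1_eta_minutes(driver, rider, city, hour):
    """Straight-line distance, inflated by the city's detour factor, divided by typical speed."""
    straight_line = haversine_km(*driver, *rider)
    estimated_dist = straight_line * DETOUR_FACTOR[city]
    speed = AVG_SPEED_KMH.get(hour, DEFAULT_SPEED_KMH)
    return estimated_dist / speed * 60


# Two points ~1.07 km apart in Midtown Manhattan during the 8am rush.
eta = layer1_eta_minutes((40.7580, -73.9855), (40.7484, -73.9857), "manhattan", 8)
print(f"{eta:.1f} min")
```

Everything here is constant-time arithmetic. That is why a layer like this can score every nearby driver on every map refresh.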
Layer 2 — Pre-Computed Zone Matrix
Divide the city into ~500m hexagonal zones. Pre-compute travel time between zone pairs within 10–15km (sparse matrix: ~1.5M entries, 6 MB in memory). Updated every 5–15 min from crowdsourced trip data. Lookup: O(1) hash map.
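The sketch below shows the zone matrix under stated assumptions. A production system would key on a hexagonal index (Uber's open-source H3 is one example). To stay dependency-free, this sketch swaps in a flat square grid. `ZoneMatrix`, `zone_of`, and the mean-of-trips refresh are hypothetical.

```python
import math
from collections import defaultdict

CELL_M = 500.0            # ~500 m zones, per the text (square cells here, not hexagons)
MAX_PAIR_CELLS = 30       # only store pairs within ~15 km, keeping the matrix sparse
M_PER_DEG_LAT = 111_320.0


def zone_of(lat, lng, ref_lat):
    """Map a point to a (row, col) grid cell; a stand-in for a hexagonal index."""
    m_per_deg_lng = M_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    return (math.floor(lat * M_PER_DEG_LAT / CELL_M),
            math.floor(lng * m_per_deg_lng / CELL_M))


class ZoneMatrix:
    """Sparse zone-pair travel-time table backed by a dict, so lookups are O(1)."""

    def __init__(self, ref_lat):
        self.ref_lat = ref_lat
        self.seconds = {}                       # (origin_zone, dest_zone) -> travel time (s)

    def refresh(self, trips):
        """Batch update (e.g., every 5-15 min): mean observed time per zone pair.
        trips: iterable of ((o_lat, o_lng), (d_lat, d_lng), seconds).
        Pairs with no new trips keep their previous value."""
        total, count = defaultdict(float), defaultdict(int)
        for origin, dest, secs in trips:
            o, d = zone_of(*origin, self.ref_lat), zone_of(*dest, self.ref_lat)
            if max(abs(o[0] - d[0]), abs(o[1] - d[1])) > MAX_PAIR_CELLS:
                continue                        # too far apart for the zone matrix
            total[(o, d)] += secs
            count[(o, d)] += 1
        self.seconds.update({k: total[k] / count[k] for k in total})

    def eta_seconds(self, origin, dest):
        key = (zone_of(*origin, self.ref_lat), zone_of(*dest, self.ref_lat))
        return self.seconds.get(key)            # None -> caller falls back to Layer 1


zm = ZoneMatrix(ref_lat=40.75)
zm.refresh([((40.7580, -73.9855), (40.7484, -73.9857), 420.0),
            ((40.7581, -73.9855), (40.7485, -73.9857), 480.0)])
eta = zm.eta_seconds((40.7579, -73.9855), (40.7483, -73.9857))
print(eta)  # prints 450.0
```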
Layer 3 — Contraction Hierarchies (The Big One)
Full graph-based shortest path on the road network. Nodes = intersections. Edges = road segments. Weights = travel time (not distance — a 1km highway takes 30s, a 1km local road takes 3 min).
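The example below shows why weights are travel times rather than distances. It runs a plain Dijkstra search on a hypothetical four-node network and uses the text's figures: 30 s/km on highway and 180 s/km on local roads. The 4 km highway detour beats the 2 km local street.

```python
import heapq


def dijkstra(graph, source, target):
    """Plain Dijkstra over a dict-of-dicts graph whose weights are travel times (s)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


# Hypothetical network. Edge weight = length_km * seconds_per_km for that road class.
SECONDS_PER_KM = {"highway": 30.0, "local": 180.0}
roads = [("A", "B", 2.0, "local"),        # direct 2 km local street: 360 s
         ("A", "H1", 0.5, "local"),       # ramp onto highway
         ("H1", "H2", 3.0, "highway"),
         ("H2", "B", 0.5, "local")]       # ramp off
graph = {}
for u, v, km, cls in roads:
    t = km * SECONDS_PER_KM[cls]
    graph.setdefault(u, {})[v] = t
    graph.setdefault(v, {})[u] = t        # treat roads as two-way for simplicity

print(dijkstra(graph, "A", "B"))          # prints 270.0 (via the highway)
```

On a city-scale graph, plain Dijkstra like this settles a large share of all nodes before reaching a distant target. That cost is what the next technique is designed to avoid.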
How Contraction Hierarchies Work — The Highway Analogy
You drive across a city the same way: "get on Highway 1, take it across, exit at 5th Ave." You mentally skip unimportant streets. CH formalizes this. Offline, it ranks nodes by importance (highways > residential streets) and adds "shortcut edges" that skip over unimportant nodes while preserving shortest-path lengths. Online, it runs two searches that only move toward more important nodes, one from the source and one from the target. They meet at an important node, such as a highway. The search explores ~1,000 nodes instead of 1,000,000, so a query drops from ~100ms to ~0.1ms, roughly 1,000× faster.
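The following is a toy sketch of both CH phases on an undirected graph. It assumes the node order is simply given, with residential streets first and highway nodes last. Real implementations differ in several ways that this sketch does not show: they choose the order heuristically, handle one-way streets with separate forward and backward graphs, and unpack shortcuts to recover the actual route. `build_ch`, `witness_dist`, and `ch_query` are hypothetical names.

```python
import heapq
from itertools import combinations

INF = float("inf")


def witness_dist(adj, src, dst, skip, limit):
    """Dijkstra from src to dst that ignores node `skip` and gives up beyond `limit`."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > limit:
            break
        if u == dst:
            return d
        if d > dist[u]:
            continue                                  # stale heap entry
        for v, w in adj[u].items():
            if v == skip:
                continue
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return INF


def build_ch(edges, order):
    """Offline phase: contract nodes from least to most important.
    Returns the upward graph: node -> {more important neighbour: travel time}."""
    adj = {n: {} for n in order}                      # graph of not-yet-contracted nodes
    for u, v, w in edges:
        if w < adj[u].get(v, INF):
            adj[u][v] = adj[v][u] = w
    up = {}
    for x in order:
        nbrs = dict(adj[x])                           # every remaining neighbour outranks x
        up[x] = nbrs
        for (u, wu), (v, wv) in combinations(nbrs.items(), 2):
            via = wu + wv
            # No equally short u-v path avoids x, so keep u-x-v alive as a shortcut edge.
            if witness_dist(adj, u, v, skip=x, limit=via) > via:
                adj[u][v] = adj[v][u] = via
        for u in nbrs:
            del adj[u][x]
        del adj[x]
    return up


def ch_query(up, s, t):
    """Online phase: two searches that only climb to more important nodes.
    The shortest path is the best sum over nodes reached by both."""
    def upward(src):
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, w in up[u].items():
                nd = d + w
                if nd < dist.get(v, INF):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    fwd, bwd = upward(s), upward(t)
    return min((fwd[n] + bwd[n] for n in fwd.keys() & bwd.keys()), default=INF)


# Hypothetical network (weights in seconds): a residential street a-b-c-d-e,
# plus a fast highway b-H1-H2-d that bypasses c.
edges = [("a", "b", 180), ("b", "c", 180), ("c", "d", 180), ("d", "e", 180),
         ("b", "H1", 60), ("H1", "H2", 90), ("H2", "d", 60)]
order = ["a", "e", "b", "c", "d", "H1", "H2"]         # least to most important
up = build_ch(edges, order)
print(ch_query(up, "a", "e"))                         # prints 570.0
```

On this graph, contracting `b` adds a 240 s shortcut between `c` and `H1`, because no other path between them is that short. The a-to-e query meets at highway node `H2`. A toy graph cannot show the 1,000× speedup, only the mechanics.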
Real-time traffic: Edge weights are updated every 1–5 min using GPS data from active drivers (millions of real-time speed observations). This uses Customizable Contraction Hierarchies (CCH). The hierarchy structure (node order and shortcut edges) is pre-computed once and does not depend on the weights. A fast customization pass then swaps in new weights in ~1 second.
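The following toy sketch shows the CCH split. It makes several simplifications: roads are undirected, the node order is hand-picked (real CCH typically derives it from a nested dissection of the road graph), and the function names are hypothetical. The structure is built once. A traffic update only reruns `customize`, which relaxes each "lower triangle" bottom-up and never adds or removes edges.

```python
import heapq
from itertools import combinations

INF = float("inf")


def build_structure(edges, order):
    """Metric-independent phase (run once): contract in `order` with NO witness search,
    linking every pair of higher neighbours. The result never depends on weights."""
    rank = {n: i for i, n in enumerate(order)}
    nbrs = {n: set() for n in order}
    for u, v, _ in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    upper = {}
    for x in order:
        upper[x] = {y for y in nbrs[x] if rank[y] > rank[x]}
        for u, v in combinations(upper[x], 2):
            nbrs[u].add(v)
            nbrs[v].add(u)
    return upper


def customize(upper, order, weights):
    """Customization phase (rerun on every traffic update).
    weights: {frozenset({u, v}): seconds} for real road segments only."""
    w = {(x, y): weights.get(frozenset((x, y)), INF)   # pure shortcuts start at INF
         for x in order for y in upper[x]}
    for x in order:                                     # least important first
        for u, v in combinations(upper[x], 2):
            key = (u, v) if (u, v) in w else (v, u)     # arc is stored lower -> higher
            # Lower triangle {x, u, v}: going u -> x -> v may beat the u-v arc.
            w[key] = min(w[key], w[(x, u)] + w[(x, v)])
    return w


def query(upper, w, s, t):
    """Upward searches from s and t; the answer is the best meeting node."""
    def upward(src):
        dist, heap = {src: 0.0}, [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v in upper[u]:
                nd = d + w[(u, v)]
                if nd < dist.get(v, INF):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    fwd, bwd = upward(s), upward(t)
    return min((fwd[n] + bwd[n] for n in fwd.keys() & bwd.keys()), default=INF)


# Hypothetical network (seconds): residential a-b-c-d-e plus highway b-H1-H2-d.
edges = [("a", "b", 180), ("b", "c", 180), ("c", "d", 180), ("d", "e", 180),
         ("b", "H1", 60), ("H1", "H2", 90), ("H2", "d", 60)]
order = ["a", "e", "b", "c", "d", "H1", "H2"]
upper = build_structure(edges, order)                   # once, offline

weights = {frozenset((u, v)): t for u, v, t in edges}
before = query(upper, customize(upper, order, weights), "a", "e")

weights[frozenset(("H1", "H2"))] = 600                  # jam on the highway
after = query(upper, customize(upper, order, weights), "a", "e")
print(before, after)                                    # prints 570.0 720.0
```

Jamming the highway flips the best route from the highway (570 s) to the residential street (720 s) without rebuilding `upper`.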
Layer 4 — ML Correction
Fixes systematic biases the graph misses: traffic lights, left turns, school zones, pickup complexity. Input: graph ETA + time/day/weather/zone. Trained on billions of historical trips. Raises the share of ETAs that land within ±2 minutes of the actual arrival from ~85% to ~90–93%.
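The text doesn't specify the model. The class below is a deliberately simple stand-in that captures the core idea: learn how far actual trip times drift from the graph ETA in each bucket, then scale future graph ETAs by that drift. `BucketCorrection`, its (zone, hour) buckets, and the `prior_weight` shrinkage are illustrative assumptions. A real model would also consume day, weather, and richer features, and would learn interactions between them.

```python
from collections import defaultdict


class BucketCorrection:
    """Stand-in for the ML layer: a learned multiplicative fix to the graph ETA per
    (zone, hour) bucket. Captures systematic bias, not feature interactions."""

    def __init__(self, prior_weight=20.0):
        self.prior_weight = prior_weight       # pseudo-trips pulling sparse buckets toward 1.0
        self.sum_ratio = defaultdict(float)
        self.count = defaultdict(int)

    def fit(self, trips):
        """trips: iterable of (zone, hour, graph_eta_s, actual_s) from completed rides."""
        for zone, hour, graph_eta, actual in trips:
            self.sum_ratio[(zone, hour)] += actual / graph_eta
            self.count[(zone, hour)] += 1

    def factor(self, zone, hour):
        """Shrunk mean of actual/predicted; unseen buckets get exactly 1.0 (no correction)."""
        n = self.count.get((zone, hour), 0)
        s = self.sum_ratio.get((zone, hour), 0.0)
        return (s + self.prior_weight * 1.0) / (n + self.prior_weight)

    def predict(self, zone, hour, graph_eta_s):
        return graph_eta_s * self.factor(zone, hour)


# Hypothetical history: pickups in zone "z42" at 8am run 25% slower than the graph says.
model = BucketCorrection()
model.fit([("z42", 8, 600.0, 750.0)] * 80)
corrected = model.predict("z42", 8, 600.0)
print(round(corrected))  # prints 720
```

With 80 trips, the factor is shrunk from the raw 1.25 to 1.2. As trips accumulate, it approaches the bucket's observed bias.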
The Data Flywheel
flowchart LR
A["Trip Completed"] --> B["Actual vs Predicted"]
B --> C["Zone Matrix Update"]
B --> D["Road Segment Weights"]
B --> E["ML Model Retrain"]
C --> F["Better ETAs"]
D --> F
E --> F
F --> G["More Trips"]
G --> A
Competitive moat: Every completed trip makes future predictions better. Uber has billions of historical trips. A new ride-hailing startup has zero trip data to train on.
Edge Cases
GPS Tunnel Problem
Driver enters tunnel, GPS lost for 2 min. System predicts progress based on route + average tunnel traversal time. Synthetic ETA updates without GPS.
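The function below is a minimal sketch of the synthetic countdown. It assumes the ETA at the tunnel entrance and a historical average traversal time for the tunnel are both available. `synthetic_eta_s` and its parameters are hypothetical names. It advances the driver at the average pace. If GPS stays dark longer than average, it holds the ETA at the tunnel-exit value rather than counting past it.

```python
def synthetic_eta_s(eta_at_entry_s, avg_tunnel_s, secs_without_gps):
    """ETA (s) and assumed fraction of the tunnel completed while GPS is unavailable."""
    if avg_tunnel_s <= 0:
        raise ValueError("avg_tunnel_s must be positive")
    # Assume average pace, but never assume the driver has left the tunnel.
    assumed_elapsed = min(secs_without_gps, avg_tunnel_s)
    progress = assumed_elapsed / avg_tunnel_s
    return eta_at_entry_s - assumed_elapsed, progress


# Entered a 2-minute tunnel with a 10-minute ETA.
for t in (0, 60, 120, 150):
    print(t, synthetic_eta_s(600, 120, t))
# prints:
# 0 (600, 0.0)
# 60 (540, 0.5)
# 120 (480, 1.0)
# 150 (480, 1.0)
```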
Driver Goes Off-Route
Wrong turn or shortcut. Recalculate from current position every 30–60s. If deviation is significant, ETA adjusts automatically.
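The code below sketches the "significant deviation" check. It assumes the planned route is available as a list of (lat, lng) points. The 50 m threshold and the function names are hypothetical. Distances use a local flat-earth (equirectangular) approximation, which is adequate over the few kilometres of a pickup route. A check like this can run on each GPS update, with the 30–60 s periodic recalculation as a backstop.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
OFF_ROUTE_THRESHOLD_M = 50.0   # hypothetical; would be tuned to local GPS noise


def _to_xy(lat, lng, ref_lat):
    """Equirectangular projection to metres around ref_lat."""
    x = math.radians(lng) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def distance_to_route_m(point, route):
    """Minimum distance (m) from point to a polyline given as [(lat, lng), ...]."""
    ref_lat = point[0]
    px, py = _to_xy(*point, ref_lat)
    best = math.inf
    for (a_lat, a_lng), (b_lat, b_lng) in zip(route, route[1:]):
        ax, ay = _to_xy(a_lat, a_lng, ref_lat)
        bx, by = _to_xy(b_lat, b_lng, ref_lat)
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Clamp the projection parameter to [0, 1] so the foot point stays on the segment.
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
        best = min(best, math.hypot(px - (ax + t * dx), py - (ay + t * dy)))
    return best


def needs_reroute(point, route, threshold_m=OFF_ROUTE_THRESHOLD_M):
    """True when the driver is farther from the planned route than the threshold."""
    return distance_to_route_m(point, route) > threshold_m


route = [(40.7480, -73.9855), (40.7580, -73.9855)]      # straight run north
print(needs_reroute((40.7530, -73.9853), route))         # ~17 m off: False
print(needs_reroute((40.7530, -73.9845), route))         # ~84 m off: True
```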