"Find all drivers within 2km of (37.77°N, 122.42°W)." A standard index on (lat, lng) doesn't help — you'd scan every row checking distance. Geospatial indexes let you answer proximity queries in milliseconds over millions of moving points. Without them, Uber, Yelp, Google Maps, and ride-sharing wouldn't exist.
Interview-critical if the problem involves "near me" or "within X distance."
02
Why lat/lng indexes don't work
Naive approach: index (lat, lng). Query: WHERE lat BETWEEN 37.76 AND 37.78 AND lng BETWEEN -122.43 AND -122.41. The DB uses the lat portion of the index to narrow down candidates, then filters by lng. For a small radius this works — for a complex region or curved distance, it's wrong (Euclidean ≠ Earth-surface distance).
Also: range queries on two dimensions don't compose well in a B-tree. You need a spatial data structure that understands proximity.
03
Four approaches
Index
Idea
Strength
Weakness
Geohash
Encode lat/lng as a Base32 string via recursive rectangle subdivision
String prefixes = nearby cells. Works with any KV store or B-tree.
Edge cells — points near cell boundaries may be stored "far" by prefix even if geographically close
Quadtree
Recursive 2D subdivision into 4 quadrants until each leaf has ≤ k points
Adapts to density. Dense areas subdivided more.
More complex to implement; mutation-heavy as points move
R-tree
Spatial analog of B-tree. Each node covers a bounding box; children are sub-boxes.
Handles regions (not just points). Good for "which polygons contain this point?"
Overlap between sibling boxes complicates queries
S2 (Google)
Project Earth onto 6 cube faces; each face subdivided via Hilbert curve
Precise spherical math. Fast neighbor queries. Globally consistent cells.
Library-heavy; harder to understand than geohash
04
Geohash — simplest and most common
Encode (lat, lng) as an interleaved bit sequence, then Base32. "dr5ru" maps to a ~4km cell; "dr5ruj" to a ~600m cell inside it; "dr5ruj4" to a ~80m cell; and so on — each extra character cuts the cell size ~4×.
Prefix property: two points in the same cell share a prefix. Query: WHERE geohash LIKE 'dr5ru%' finds everything in that 4km region. Drop one character → search the 16km parent region. This turns 2D proximity into 1D string prefix — works with any B-tree or sorted KV store.
Edge problem: two points could be 100m apart but in different parent cells (one is in dr5ruj, the other dr5rum just across the boundary). Fix: query the point's cell plus 8 neighbor cells. Union the results. You never miss a nearby point.
05
Deep dive — the Uber driver lookup
Uber's real-time "drivers near this rider" query at 1M+ writes/sec is the archetypal geospatial problem. The design:
Each driver has a cell key. Compute geohash-6 (~600m cell) from driver's current lat/lng. Update every 4 seconds as they move.
Drivers indexed by cell. Redis sorted set per cell: cell:dr5ruj → {driver_42, driver_58, ...}. Score = last-update timestamp.
When a rider requests, compute rider's cell + 8 neighbors. SUNION those 9 sets. Filter by driver availability. Sort by Euclidean distance to rider.
Stale drivers aged out. Any driver whose score (last-update) < now − 30s is removed from their cell. They've gone offline or their app is stuck.
Query latency: < 20ms for a city-sized density. Cell writes sharded across Redis instances by cell hash. The cell hierarchy keeps dense cities from becoming hotspots — you can shard sub-cells when density exceeds a threshold.
Uber uses a variant called H3 (hexagonal cells they open-sourced) instead of geohash, because hexagons have better neighbor-equidistance properties than squares. Same core idea.
9-Cell Neighbor Query — Rider in Center, Driver Across BoundarySVG
9
cells per neighbor query
~600 m
geohash-6 cell width
~150 m
geohash-7 cell width
< 20 ms
9-SUNION at city density
06
Real-world
Redis GEO commands
Geohash under the hood
GEOADD, GEOSEARCH. Built-in to Redis. Fast, practical. Used by many "near me" features.
PostGIS
R-tree on Postgres
Industrial-strength geospatial. Complex queries (polygon containment, spatial joins). Most mapping backends use it.
Google S2
The library
Open-source, cubic-projection geospatial cells. Used by Google Maps, Pokemon GO. More accurate than geohash at high latitudes.
Uber H3
Hexagonal cells
Open-sourced from Uber. Preferred for uniform-radius neighbor queries. Hex cells eliminate the "corner" problem of square grids.
07
Used in problems
Uber uses H3 + Redis for driver matching. Yelp/Google Places uses geohash for business proximity. Google Maps uses S2 for tile indexing and route computation. Any "near me" feature is backed by one of these.