Geospatial Indexes

01

Why this matters

"Find all drivers within 2km of (37.77°N, 122.42°W)." A standard index on (lat, lng) doesn't help — you'd scan every row checking distance. Geospatial indexes let you answer proximity queries in milliseconds over millions of moving points. Without them, Uber, Yelp, Google Maps, and ride-sharing wouldn't exist.

Interview-critical if the problem involves "near me" or "within X distance."

02

Why lat/lng indexes don't work

Naive approach: index (lat, lng). Query: WHERE lat BETWEEN 37.76 AND 37.78 AND lng BETWEEN -122.43 AND -122.41. The DB uses the lat portion of the index to narrow down candidates, then filters by lng. For a small radius this works — for a complex region or curved distance, it's wrong (Euclidean ≠ Earth-surface distance).

Also: range queries on two dimensions don't compose well in a B-tree. You need a spatial data structure that understands proximity.

03

Four approaches

Index	Idea	Strength	Weakness
Geohash	Encode lat/lng as a Base32 string via recursive rectangle subdivision	String prefixes = nearby cells. Works with any KV store or B-tree.	Edge cells — points near cell boundaries may be stored "far" by prefix even if geographically close
Quadtree	Recursive 2D subdivision into 4 quadrants until each leaf has ≤ k points	Adapts to density. Dense areas subdivided more.	More complex to implement; mutation-heavy as points move
R-tree	Spatial analog of B-tree. Each node covers a bounding box; children are sub-boxes.	Handles regions (not just points). Good for "which polygons contain this point?"	Overlap between sibling boxes complicates queries
S2 (Google)	Project Earth onto 6 cube faces; each face subdivided via Hilbert curve	Precise spherical math. Fast neighbor queries. Globally consistent cells.	Library-heavy; harder to understand than geohash

04

Geohash — simplest and most common

Encode (lat, lng) as an interleaved bit sequence, then Base32. "dr5ru" maps to a ~4km cell; "dr5ruj" to a ~600m cell inside it; "dr5ruj4" to a ~80m cell; and so on — each extra character cuts the cell size ~4×.

Prefix property: two points in the same cell share a prefix. Query: WHERE geohash LIKE 'dr5ru%' finds everything in that 4km region. Drop one character → search the 16km parent region. This turns 2D proximity into 1D string prefix — works with any B-tree or sorted KV store.

Edge problem: two points could be 100m apart but in different parent cells (one is in dr5ruj, the other dr5rum just across the boundary). Fix: query the point's cell plus 8 neighbor cells. Union the results. You never miss a nearby point.

05

Deep dive — the Uber driver lookup

Uber's real-time "drivers near this rider" query at 1M+ writes/sec is the archetypal geospatial problem. The design:

Each driver has a cell key. Compute geohash-6 (~600m cell) from driver's current lat/lng. Update every 4 seconds as they move.
Drivers indexed by cell. Redis sorted set per cell: cell:dr5ruj → {driver_42, driver_58, ...}. Score = last-update timestamp.
When a rider requests, compute rider's cell + 8 neighbors. SUNION those 9 sets. Filter by driver availability. Sort by Euclidean distance to rider.
Stale drivers aged out. Any driver whose score (last-update) < now − 30s is removed from their cell. They've gone offline or their app is stuck.

Query latency: < 20ms for a city-sized density. Cell writes sharded across Redis instances by cell hash. The cell hierarchy keeps dense cities from becoming hotspots — you can shard sub-cells when density exceeds a threshold.

Uber uses a variant called H3 (hexagonal cells they open-sourced) instead of geohash, because hexagons have better neighbor-equidistance properties than squares. Same core idea.

9-Cell Neighbor Query — Rider in Center, Driver Across Boundary SVG

9

cells per neighbor query

~600 m

geohash-6 cell width

~150 m

geohash-7 cell width

< 20 ms

9-SUNION at city density

06

Real-world

Redis GEO commands

Geohash under the hood

GEOADD, GEOSEARCH. Built-in to Redis. Fast, practical. Used by many "near me" features.

PostGIS

R-tree on Postgres

Industrial-strength geospatial. Complex queries (polygon containment, spatial joins). Most mapping backends use it.

Google S2

The library

Open-source, cubic-projection geospatial cells. Used by Google Maps, Pokemon GO. More accurate than geohash at high latitudes.

Uber H3

Hexagonal cells

Open-sourced from Uber. Preferred for uniform-radius neighbor queries. Hex cells eliminate the "corner" problem of square grids.

07

Used in problems

Uber uses H3 + Redis for driver matching. Yelp/Google Places uses geohash for business proximity. Google Maps uses S2 for tile indexing and route computation. Any "near me" feature is backed by one of these.

📺