Concept · Machine Learning Systems

Feature Store

01

Why this matters

"A user's click count over the last 7 days" is a feature. Your training pipeline computes it from click logs in Spark; your serving pipeline computes it from a Kafka stream. The two implementations drift: one bug here, one rounding difference there. A model trained on the offline values then mispredicts when it is served slightly different online values. This training-serving skew is one of the most common reasons ML systems fail in production, often more so than a weak model.

A feature store is the single source of truth for features: one implementation, two access paths, batch (training) and online (serving). Many large ML organizations have built their own, including Uber (Michelangelo), Airbnb (Zipline), Instacart, and Lyft. Feast is the best-known open-source option and Tecton the best-known commercial one.

02

The dual-store model

Every feature has two homes:

  • Offline store — historical feature values for every entity at every point in time. Stored in a data warehouse (BigQuery, Snowflake) or lakehouse (Iceberg, Delta). Used to construct training data by joining labels with feature values as of the prediction time.
  • Online store — latest feature values per entity, sub-millisecond lookup. Stored in a KV store (Redis, DynamoDB, Cassandra). Used at serving time when the model needs features for a fresh prediction.

Both populated by the same feature transformation code. Same logic, two destinations. That's the magic.
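
A minimal sketch of the "same logic, two destinations" idea, using in-memory stand-ins for the two stores. The names `DualStore` and `clicks_7d` are hypothetical and not taken from any particular feature store; a real system would write to a warehouse table and a KV store instead of a list and a dict.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DualStore:
    """Toy stand-in for an offline table plus an online key-value store."""
    offline: list = field(default_factory=list)  # append-only history
    online: dict = field(default_factory=dict)   # latest value per entity

    def write(self, entity_id, feature, value, event_time):
        # Offline: keep every version so training can look back in time.
        self.offline.append((entity_id, feature, value, event_time))
        # Online: keep only the freshest value for fast lookup.
        key = (entity_id, feature)
        current = self.online.get(key)
        if current is None or event_time >= current[1]:
            self.online[key] = (value, event_time)


def clicks_7d(click_times, as_of):
    """The single transformation: clicks in the 7 days up to `as_of`."""
    window_start = as_of - timedelta(days=7)
    return sum(1 for t in click_times if window_start < t <= as_of)


store = DualStore()
clicks = [datetime(2024, 1, d) for d in (1, 3, 6, 8, 9)]
for day in (5, 10):
    as_of = datetime(2024, 1, day)
    # One computation, written to both destinations.
    store.write("user_42", "clicks_7d", clicks_7d(clicks, as_of), as_of)
```

After the loop, the offline list holds both historical values while the online dict holds only the most recent one, which is the asymmetry the two bullets above describe.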

03

Point-in-time correctness

The killer feature (pun) of a feature store is point-in-time joins. To build training data without leaking the future:

  1. Take the labels: e.g. (user_id, did_purchase, timestamp) rows.
  2. For each row, join the feature values as they were at that timestamp. Not the latest value — the historical value at that exact moment.
  3. Train on the result.

Without this, your training data has features computed from data that didn't exist yet at prediction time. Model looks great offline; flops in production. Feature stores enforce correctness via SQL window functions (LAST_VALUE … RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) or temporal joins.
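
The three steps above can be sketched in plain Python as a backward "as-of" lookup. This is an illustration under simplifying assumptions (one feature, integer timestamps, everything in memory); the function name `point_in_time_join` is hypothetical. In a dataframe setting, pandas offers `merge_asof` for the same kind of backward join.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_rows):
    """Attach to each label the latest feature value at or before its timestamp.

    labels:       iterable of (entity_id, label, ts)
    feature_rows: iterable of (entity_id, value, ts)
    """
    # Group feature history per entity and sort it by time.
    history = defaultdict(list)
    for entity_id, value, ts in feature_rows:
        history[entity_id].append((ts, value))
    for rows in history.values():
        rows.sort(key=lambda r: r[0])

    joined = []
    for entity_id, label, ts in labels:
        rows = history.get(entity_id, [])
        times = [r[0] for r in rows]
        idx = bisect_right(times, ts)  # number of rows with time <= ts
        # No earlier row means the feature did not exist yet: emit None,
        # never a later value.
        value = rows[idx - 1][1] if idx > 0 else None
        joined.append((entity_id, ts, value, label))
    return joined


features = [("u1", 2, 100), ("u1", 5, 200), ("u1", 9, 300)]
labels = [("u1", 1, 250), ("u1", 0, 50), ("u1", 1, 300)]
training_rows = point_in_time_join(labels, features)
```

The label at timestamp 250 receives the value written at 200, not the later one written at 300, and the label at timestamp 50 receives no value at all. Whether a feature written at exactly the label's timestamp should count is a design choice; this sketch includes it.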

Feature Store Architecture (Mermaid)

flowchart LR
    R["Raw events<br/>Kafka / S3"] --> T["Transformation<br/>same code"]
    T --> O[("Offline store<br/>BigQuery / Iceberg")]
    T --> N[("Online store<br/>Redis / DynamoDB")]
    O -->|point-in-time join| TR[Training pipeline]
    TR --> M[Model training]
    N -->|sub-ms lookup| S[Serving service]
    M -->|deploy| S
    S --> P[Prediction]

  • Online lookup p99: < 5 ms
  • Features per request: 100s-1000s
  • Typical batch refresh interval: 15 min
  • Streaming feature freshness: ~seconds

04

Three feature classes

Class                 | Update cadence | Source                | Examples
Batch features        | Hourly / daily | Spark over warehouse  | "7-day rolling spend", "cohort age"
Streaming features    | Seconds        | Flink / Kafka Streams | "clicks last 5 min", "session length"
Request-time features | Per-request    | Computed online       | "current geo distance", "device type"

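
To make the three classes concrete, here is a rough sketch of how a serving path might assemble one feature vector from all of them. Everything here is illustrative: `SlidingWindowCounter`, `haversine_km`, `build_feature_vector`, and the dictionary layout of `online_store` are assumptions for the example, and the counter assumes events arrive in timestamp order.

```python
import math
from collections import deque


class SlidingWindowCounter:
    """Streaming-style feature: count events in the last `window_s` seconds."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.events = deque()  # timestamps, oldest first

    def add(self, ts):
        self.events.append(ts)

    def count(self, now):
        # Evict events that have fallen out of the window.
        while self.events and self.events[0] <= now - self.window_s:
            self.events.popleft()
        return len(self.events)


def haversine_km(lat1, lon1, lat2, lon2):
    """Request-time feature: great-circle distance between two points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def build_feature_vector(user_id, request, online_store, click_counter, now):
    return {
        # Batch: precomputed earlier and simply looked up.
        "spend_7d": online_store[user_id]["spend_7d"],
        # Streaming: maintained incrementally as events arrive.
        "clicks_5m": click_counter.count(now),
        # Request-time: only computable once the request is known.
        "geo_distance_km": haversine_km(
            request["lat"], request["lon"],
            request["store_lat"], request["store_lon"],
        ),
    }
```

Only the first two values ever live in the feature store; the request-time value is computed on the spot, so its logic has to be shared with training by some other route, for example by logging it at serving time.
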
05

Deep dive — why most teams fail at feature stores

Feature stores look simple in slides ("just two stores!"). They die in production for three reasons:

1. Schema sprawl. Teams add features ad-hoc; nobody owns the catalog. After a year, you have 5,000 features, 30% duplicates with subtly different names. Discoverability dies. Mitigation: feature registry with mandatory metadata (owner, refresh, lineage), feature-of-the-month review.
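
A registry that enforces the mandatory metadata could look roughly like the sketch below. `FeatureSpec` and `FeatureRegistry` are hypothetical names, and comparing normalised definition strings is a deliberately crude way to catch duplicates; it is an assumption for the example, not how any specific product does it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    owner: str
    refresh: str      # e.g. "daily", "15m", "streaming"
    lineage: tuple    # upstream tables or topics
    definition: str   # canonical expression, used to spot duplicates


class FeatureRegistry:
    def __init__(self):
        self._by_name = {}
        self._by_definition = {}

    def register(self, spec):
        # Mandatory metadata: reject specs with an empty owner, refresh or lineage.
        missing = [f for f in ("owner", "refresh", "lineage") if not getattr(spec, f)]
        if missing:
            raise ValueError(f"{spec.name}: missing metadata {missing}")
        if spec.name in self._by_name:
            raise ValueError(f"{spec.name}: name already registered")
        # Normalise case and whitespace so trivially different copies collide.
        key = " ".join(spec.definition.lower().split())
        if key in self._by_definition:
            raise ValueError(f"{spec.name}: duplicates {self._by_definition[key]}")
        self._by_name[spec.name] = spec
        self._by_definition[key] = spec.name
        return spec
```

Registering `clicks_7d` and then `click_count_7_day` with the same underlying definition fails on the second call, which is the moment to point the second team at the existing feature.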

2. Online/offline drift. The "same code" promise breaks the moment someone hot-fixes the streaming job without backporting the fix to batch. CI must replay the same input through both pipelines and assert numeric equivalence (within a float epsilon). Managed platforms such as Tecton aim to reduce this risk by deriving both pipelines from one feature definition; if you roll your own, you will skip the check once and pay forever.
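
The replay check can be as small as the sketch below. The two functions are hypothetical stand-ins for a batch job and a streaming job that compute the same trailing-mean feature in different ways, and the tolerances are assumptions to tune for your data.

```python
import math
from collections import deque


def batch_avg_spend(amounts, window):
    """Batch-style: recompute the trailing mean from scratch at each position."""
    out = []
    for i in range(len(amounts)):
        chunk = amounts[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def streaming_avg_spend(amounts, window):
    """Streaming-style: maintain a running sum incrementally."""
    buf, total, out = deque(), 0.0, []
    for x in amounts:
        buf.append(x)
        total += x
        if len(buf) > window:
            total -= buf.popleft()
        out.append(total / len(buf))
    return out


def assert_parity(inputs, window, rel_tol=1e-9, abs_tol=1e-12):
    """Replay one input through both paths and compare row by row."""
    batch = batch_avg_spend(inputs, window)
    stream = streaming_avg_spend(inputs, window)
    for i, (b, s) in enumerate(zip(batch, stream)):
        if not math.isclose(b, s, rel_tol=rel_tol, abs_tol=abs_tol):
            raise AssertionError(f"row {i}: batch={b!r} stream={s!r}")
    return len(batch)  # number of rows compared
```

Exact equality would be too strict here: the incremental sum and the from-scratch sum accumulate floating-point error differently, which is why the comparison uses `math.isclose`.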

3. Freshness vs cost. Features refreshed by a 15-minute batch job are cheap; keeping the same features fresh to within 30 seconds via streaming can cost on the order of 100× more compute. Most teams over-stream. Rule: only stream the features for which the model demonstrably benefits from sub-minute freshness.

Interview answer

"We use a feature store with point-in-time joins for training and a sub-millisecond online store for serving. Same transformation code feeds both, so no training-serving skew. Most features are batch-refreshed; a few critical ones stream via Flink. Feature registry catalogs ownership and lineage."

06

Real-world

Uber Michelangelo

The original

Popularized the pattern: built internally from around 2015 and described publicly in 2017. Powers ETA, ride pricing, driver matching. Internal-only but inspired everything that followed.

Feast

Open-source standard

Python SDK with pluggable offline and online backends (BigQuery + Redis is a typical pairing); early versions were Kubernetes-native. Adopted by Robinhood, Twitter, Salesforce.

Tecton

Commercial

Founded by ex-Uber Michelangelo team. SaaS feature store with Spark + streaming + online serving baked in.

Airbnb Zipline

Internal

Stream-first feature store. Evolved into Chronon, which Airbnb has open-sourced and which runs in production at Airbnb and Stripe.

07

Used in problems

A recommendation system uses a feature store to share user and item features between training and serving. A news feed ranker pulls real-time engagement features. A leaderboard recomputes scoring features in a streaming pipeline.
