"User's last 7-day click count" is a feature. Your training pipeline computes it from click logs in Spark; your serving pipeline computes it from a Kafka stream. The two implementations drift — one bug here, one rounding there. Models trained on offline features mispredict on different online features. Training-serving skew kills more ML systems than bad models.
A feature store is the single source of truth for features. One implementation, two access paths: batch (training) and online (serving). Many large ML organizations run on one: Uber (Michelangelo), Airbnb (Zipline), Instacart, and Lyft all built their own. Feast is the most widely used open-source option and Tecton a leading commercial one.
02
The dual-store model
Every feature has two homes:
Offline store: historical feature values for every entity at every point in time. Stored in a data warehouse (BigQuery, Snowflake) or lakehouse (Iceberg, Delta). Used to construct training data by joining labels with feature values as of the prediction time.
Online store: only the latest feature values per entity, served by key lookup in a few milliseconds at p99 (sub-millisecond on an in-memory store such as Redis). Stored in a KV store (Redis, DynamoDB, Cassandra). Used at serving time when the model needs features for a fresh prediction.
Both populated by the same feature transformation code. Same logic, two destinations. That's the magic.
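A minimal sketch of "same logic, two destinations", assuming an in-memory list stands in for the warehouse and a dict for the KV store. The names `DualStore`, `materialize`, and `clicks_last_7d` are hypothetical and chosen for illustration; they are not the API of Feast, Tecton, or any other product.

```python
from datetime import datetime, timedelta


def clicks_last_7d(click_times, as_of):
    """The single feature definition: clicks in the window (as_of - 7 days, as_of]."""
    start = as_of - timedelta(days=7)
    return sum(1 for t in click_times if start < t <= as_of)


class DualStore:
    """Toy stand-in for an offline store (full history) plus an online store (latest only)."""

    def __init__(self):
        self.offline = []  # append-only rows: (entity_id, as_of, value)
        self.online = {}   # entity_id -> (as_of, value)

    def materialize(self, entity_id, click_times, as_of):
        # One computation feeds both destinations.
        value = clicks_last_7d(click_times, as_of)
        self.offline.append((entity_id, as_of, value))
        current = self.online.get(entity_id)
        # Never let a late-arriving older computation overwrite a newer value.
        if current is None or as_of >= current[0]:
            self.online[entity_id] = (as_of, value)
        return value


clicks = [
    datetime(2024, 1, 1),
    datetime(2024, 1, 5),
    datetime(2024, 1, 9),
    datetime(2024, 1, 9, 12),
]
store = DualStore()
store.materialize("u1", clicks, datetime(2024, 1, 6))
store.materialize("u1", clicks, datetime(2024, 1, 10))
```

After the two runs, `store.offline` holds both historical rows for training, while `store.online["u1"]` holds only the most recent one for serving.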
03
Point-in-time correctness
The killer feature (pun) of a feature store is point-in-time joins. To build training data without leaking the future:
1. Take the labels: e.g. (user_id, did_purchase, timestamp) rows.
2. For each row, join the feature values as they were at that timestamp. Not the latest value, but the historical value at that exact moment.
3. Train on the result.
Without this, your training data has features computed from data that didn't exist yet at prediction time. Model looks great offline; flops in production. Feature stores enforce correctness via SQL window functions (LAST_VALUE … RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) or temporal joins.
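The join itself can be sketched in plain Python, assuming label rows shaped like the `(user_id, did_purchase, timestamp)` example above and feature rows shaped as `(user_id, timestamp, value)`. The function name `point_in_time_join` and the sample data are hypothetical. For each label it picks the most recent feature value at or before the label's timestamp, which is the same "backward as-of" behavior that `pandas.merge_asof` provides for dataframes.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def point_in_time_join(labels, feature_rows):
    """Attach to each label the latest feature value with feature_ts <= label_ts."""
    history = defaultdict(list)
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        history[entity_id].append((ts, value))

    joined = []
    for entity_id, label, ts in labels:
        rows = history.get(entity_id, [])
        times = [t for t, _ in rows]
        idx = bisect_right(times, ts)  # count of feature rows at or before ts
        value = rows[idx - 1][1] if idx > 0 else None  # None: no history yet
        joined.append((entity_id, ts, value, label))
    return joined


labels = [
    ("u1", 0, datetime(2024, 1, 3)),
    ("u1", 1, datetime(2024, 1, 10)),
    ("u2", 0, datetime(2024, 1, 5)),
    ("u3", 1, datetime(2024, 1, 5)),
]
feature_rows = [
    ("u1", datetime(2024, 1, 1), 3),
    ("u1", datetime(2024, 1, 8), 9),
    ("u2", datetime(2024, 1, 4), 1),
    ("u2", datetime(2024, 1, 6), 5),  # written after u2's label, so it must not leak
]
training_rows = point_in_time_join(labels, feature_rows)
```

Here `u2`'s label at January 5 receives the value written on January 4, not the later value from January 6, and `u3` receives `None` because no feature history exists for it.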
Feature Store Architecture (Mermaid)
flowchart LR
R[Raw events<br/>Kafka / S3] --> T[Transformation<br/>same code]
T --> O[(Offline store<br/>BigQuery / Iceberg)]
T --> N[(Online store<br/>Redis / DynamoDB)]
O -->|point-in-time join| TR[Training pipeline]
TR --> M[Model training]
N -->|sub-ms lookup| S[Serving service]
M -->|deploy| S
S --> P[Prediction]
Online lookup p99: < 5 ms
Features per request: 100s-1000s
Typical batch refresh interval: 15 min
Streaming feature freshness: ~seconds
04
Three feature classes
| Class | Update cadence | Source | Examples |
|---|---|---|---|
| Batch features | Hourly / daily | Spark over warehouse | "7-day rolling spend", "cohort age" |
| Streaming features | Seconds | Flink / Kafka Streams | "clicks last 5 min", "session length" |
| Request-time features | Per-request | Computed online | "current geo distance", "device type" |
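At serving time the three classes meet in one feature vector. The sketch below assumes batch and streaming values have already been written to the online store (a plain dict here), while request-time features are computed from the request itself. The function names, the request fields, and the feature keys `spend_7d` and `clicks_5m` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_feature_vector(user_id, request, online_store):
    stored = online_store.get(user_id, {})  # one key lookup covers batch + streaming
    return {
        # Batch feature: refreshed hourly or daily by an offline job.
        "spend_7d": stored.get("spend_7d", 0.0),
        # Streaming feature: refreshed within seconds by a stream processor.
        "clicks_5m": stored.get("clicks_5m", 0),
        # Request-time features: never stored, computed per request.
        "geo_distance_km": haversine_km(
            request["user_lat"], request["user_lon"],
            request["store_lat"], request["store_lon"],
        ),
        "device_type": request["device_type"],
    }


online_store = {"u1": {"spend_7d": 182.5, "clicks_5m": 4}}
request = {
    "device_type": "ios",
    "user_lat": 0.0, "user_lon": 0.0,
    "store_lat": 0.0, "store_lon": 1.0,
}
vector = build_feature_vector("u1", request, online_store)
```

The defaults for a missing user are a design choice in this sketch; a real system would decide per feature whether a default, a null, or a rejected request is the right cold-start behavior.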
05
Deep dive — why most teams fail at feature stores
Feature stores look simple in slides ("just two stores!"). They die in production for three reasons:
1. Schema sprawl. Teams add features ad hoc; nobody owns the catalog. After a year, you have 5,000 features, 30% of them duplicates with subtly different names. Discoverability dies. Mitigation: a feature registry with mandatory metadata (owner, refresh cadence, lineage) and a feature-of-the-month review.
2. Online/offline drift. The "same code" promise breaks the moment someone hot-fixes the streaming job without backporting the fix to batch. CI must replay the same input through both pipelines and assert numeric equivalence (within float epsilon). Managed platforms such as Tecton reduce the risk by deriving both pipelines from one feature definition; if you roll your own, you will skip the check once and pay forever.
3. Freshness vs cost. 15-minute batch features are cheap; 30-second streaming features can cost on the order of 100× more compute. Most teams over-stream. Rule: stream only the features for which the model measurably benefits from sub-minute freshness.
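The parity check from point 2 can be sketched as a replay test, assuming a hypothetical "7-day spend" feature with a batch implementation (recompute from the full log) and a streaming implementation (running total with eviction). All names here are illustrative. The two paths add and subtract floats in different orders, which is why the comparison uses a tolerance instead of exact equality.

```python
import math
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)


def batch_spend_7d(events, as_of):
    """Batch path: recompute each user's total from the full event log."""
    totals = defaultdict(float)
    for user_id, ts, amount in events:
        if as_of - WINDOW < ts <= as_of:
            totals[user_id] += amount
    return dict(totals)


class StreamingSpend7d:
    """Streaming path: keep a running total and evict expired events."""

    def __init__(self):
        self._events = defaultdict(deque)
        self._totals = defaultdict(float)

    def on_event(self, user_id, ts, amount):
        self._events[user_id].append((ts, amount))
        self._totals[user_id] += amount

    def value(self, user_id, as_of):
        window = self._events[user_id]
        while window and window[0][0] <= as_of - WINDOW:
            _, expired = window.popleft()
            self._totals[user_id] -= expired
        return self._totals[user_id]


def find_mismatches(events, as_of, rel_tol=1e-9, abs_tol=1e-9):
    """Replay the same events through both paths; return users whose values differ."""
    batch = batch_spend_7d(events, as_of)
    stream = StreamingSpend7d()
    for user_id, ts, amount in sorted(events, key=lambda e: e[1]):
        if ts <= as_of:
            stream.on_event(user_id, ts, amount)
    mismatches = []
    for user_id in sorted({e[0] for e in events}):
        b = batch.get(user_id, 0.0)
        s = stream.value(user_id, as_of)
        if not math.isclose(b, s, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((user_id, b, s))
    return mismatches


events = [
    ("u1", datetime(2024, 1, 1), 10.10),  # outside the window at as_of
    ("u1", datetime(2024, 1, 5), 20.20),
    ("u1", datetime(2024, 1, 9), 0.30),
    ("u2", datetime(2024, 1, 8), 5.00),
]
mismatches = find_mismatches(events, datetime(2024, 1, 10))
```

Run in CI against a recorded sample of production events, a non-empty `mismatches` list would fail the build before a one-sided hot-fix reaches serving.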
Interview answer
"We use a feature store with point-in-time joins for training and a sub-millisecond online store for serving. Same transformation code feeds both, so no training-serving skew. Most features are batch-refreshed; a few critical ones stream via Flink. Feature registry catalogs ownership and lineage."
06
Real-world
Uber Michelangelo
The original
Pioneered the pattern in 2016. Powers ETA, ride pricing, driver matching. Internal-only but inspired everything that followed.
Tecton
Commercial
Founded by members of the original Uber Michelangelo team. SaaS feature store with Spark, streaming, and online serving built in.
Airbnb Zipline
Internal
Stream-first feature store. Later renamed Chronon and open-sourced; it runs at Airbnb and Stripe scale.
07
Used in problems
The recommendation algorithm uses a feature store to share user/item features between training and serving. The news feed ranker pulls real-time engagement features. The leaderboard recomputes scoring features in a streaming pipeline.