Your recommendation model was trained on last month's data. New trends emerged. New users appeared. The model is already stale, and predictions degrade by the hour. Do you retrain nightly (offline), continuously update weights as events stream in (online), or something in between? The answers differ by orders of magnitude in infrastructure cost and produce dramatically different freshness/correctness tradeoffs.
Most production ML systems are offline-trained, online-served. A small but important class genuinely needs online learning. Knowing which is which is the interview-grade insight.
02
The training cadence ladder
| Mode | Update cadence | Where trained | Best for |
| --- | --- | --- | --- |
| Static (one-shot) | Once | Notebook → deploy | Stable patterns: spam classifier, language detection |
| Periodic offline | Daily / weekly | Spark / GPU cluster | Most production ML: recs, ranking, fraud baseline |
| Continual / incremental | Hours | GPU cluster, warm-start from previous | Evolving distributions (news, trends) |
| Online learning | Per event | Streaming SGD updates | Personalization with rapidly-shifting tastes; bandit problems |
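To make the bottom rung of the ladder concrete, here is a minimal sketch of "streaming SGD updates": a logistic regression whose weights move slightly after every labeled event. The class name, learning rate, and toy event stream are assumptions for illustration, not a production design.

```python
import math
from typing import List, Sequence


class OnlineLogisticRegression:
    """Logistic regression updated with one SGD step per labeled event."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # illustrative value, not tuned

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x: Sequence[float], label: int) -> None:
        # For log loss, the gradient with respect to the logit is (prediction - label).
        error = self.predict_proba(x) - label
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=2)

# Hypothetical event stream: (features, clicked) pairs arriving one at a time.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200
for features, clicked in stream:
    model.update(features, clicked)

p_first = model.predict_proba([1.0, 0.0])   # pattern that was always clicked
p_second = model.predict_proba([0.0, 1.0])  # pattern that was never clicked
```

Note that there is no fixed training snapshot here: the model state after event N depends on the order of all earlier events, which is why this mode is harder to reproduce and debug than the rungs above it.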
03
Why offline dominates
Despite the hype around online learning, the large majority of production ML (roughly 95%, as a rule of thumb) is periodic offline training. Reasons:
Reproducibility. Re-train on a fixed snapshot with a fixed seed and deterministic ops → the same weights, so any model can be rebuilt and inspected. Online updates make every model state ephemeral and hard to debug.
Hyperparameter stability. Tuning a model is an offline experiment. Online updates with the wrong learning rate destroy what you trained.
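Feature monitoring. Offline pipelines run validation, distribution-shift alerts, and A/B comparisons before a model ships. Online updates bypass those gates unless you rebuild them for streaming.
Rollback. Offline training produces a checkpoint per run, so you can roll back instantly. With online updates, recovering "the state from 3 hours ago" may be impossible unless you snapshot it continuously.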
Sufficient freshness. Daily-trained model + real-time features (via feature store) is often as good as continuous training and dramatically simpler.
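The "sufficient freshness" point can be sketched as follows: the weights stay frozen between nightly runs, while a feature store serves values that change with every event. The feature names, weights, and the in-memory store are hypothetical stand-ins, and a real feature store would also expire old events from a time window, which is omitted here.

```python
from collections import defaultdict
from typing import Dict

# Weights from last night's training run; frozen until the next run.
# Names and values are made up for illustration.
NIGHTLY_WEIGHTS: Dict[str, float] = {
    "historical_affinity": 1.5,  # batch feature, computed offline
    "recent_clicks": 0.4,        # real-time feature, updated per event
}


class InMemoryFeatureStore:
    """Stand-in for a real feature store: latest feature values per user."""

    def __init__(self) -> None:
        self._features: Dict[str, Dict[str, float]] = defaultdict(dict)

    def set_feature(self, user_id: str, name: str, value: float) -> None:
        self._features[user_id][name] = value

    def increment(self, user_id: str, name: str, amount: float = 1.0) -> None:
        current = self._features[user_id].get(name, 0.0)
        self._features[user_id][name] = current + amount

    def get_features(self, user_id: str) -> Dict[str, float]:
        return dict(self._features[user_id])


def score(weights: Dict[str, float], features: Dict[str, float]) -> float:
    # Features missing from the store contribute nothing to the score.
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


store = InMemoryFeatureStore()
store.set_feature("user_42", "historical_affinity", 0.2)
score_before = score(NIGHTLY_WEIGHTS, store.get_features("user_42"))

# Three clicks arrive on the event stream. Only the feature value changes.
for _ in range(3):
    store.increment("user_42", "recent_clicks")
score_after = score(NIGHTLY_WEIGHTS, store.get_features("user_42"))
```

The score reacts to the new clicks even though no training happened, which is the sense in which fresh features can substitute for fresh weights.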
04
When online learning genuinely wins
Online learning earns its complexity in three scenarios:
1. Multi-armed bandits (recommendation exploration). Show ad A or ad B? Don't wait a day to learn: every impression provides signal. Thompson sampling or UCB updates its estimates after each impression, click or no click.
2. Drift-heavy problems. Twitter trending — what was hot 2 hours ago is dead. Fashion recs in fast-fashion ecommerce. News recommendation.
3. Cold-start personalization. A new user arrives. Static model has no signal. Bandit-style online updates from their first 5 actions converge fast.
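The bandit scenario in item 1 can be sketched with Beta-Bernoulli Thompson sampling. This is a minimal simulation under stated assumptions: the ad names, the "true" click-through rates, and the seeds are invented for the example, and rewards are treated as simple click / no-click outcomes.

```python
import random
from typing import Dict, List


class BetaBernoulliBandit:
    """Thompson sampling with a Beta posterior per arm."""

    def __init__(self, arms: List[str], seed: int = 0) -> None:
        # Beta(1, 1) is a uniform prior over each arm's click-through rate.
        self.successes: Dict[str, int] = {arm: 1 for arm in arms}
        self.failures: Dict[str, int] = {arm: 1 for arm in arms}
        self.rng = random.Random(seed)

    def choose(self) -> str:
        # Draw one plausible CTR per arm from its posterior and play the best draw.
        samples = {
            arm: self.rng.betavariate(self.successes[arm], self.failures[arm])
            for arm in self.successes
        }
        return max(samples, key=samples.get)

    def update(self, arm: str, clicked: bool) -> None:
        # Every impression updates the posterior, whether or not it was clicked.
        if clicked:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


# Simulated environment; these CTRs are hypothetical and unknown to the bandit.
TRUE_CTR = {"ad_A": 0.05, "ad_B": 0.15}
bandit = BetaBernoulliBandit(list(TRUE_CTR))
environment = random.Random(1)
pulls = {arm: 0 for arm in TRUE_CTR}

for _ in range(5000):
    arm = bandit.choose()
    clicked = environment.random() < TRUE_CTR[arm]
    bandit.update(arm, clicked)
    pulls[arm] += 1
```

As the posteriors sharpen, the better ad should receive most of the traffic while the weaker one still gets an occasional exploratory impression. The state is just two counters per arm, which is part of why bandits are a comparatively cheap form of online learning.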
Architecture: keep a base model trained offline (the "warm prior"). Apply online deltas on top — usually as a simple linear model or shallow correction layer. If online state is lost, fall back to base model. The two layers compose.
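A minimal sketch of that layering, assuming a linear base model and a linear online correction: only the delta weights move between offline runs, and clearing them restores the base model exactly. The class, method, and feature names are hypothetical.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class HybridScorer:
    """Offline base weights plus a small online delta layer on top."""

    def __init__(self, base_weights: Dict[str, float], learning_rate: float = 0.1) -> None:
        self.base_weights = dict(base_weights)  # replaced only by the offline job
        self.delta: Dict[str, float] = {}       # online state, safe to lose
        self.learning_rate = learning_rate

    def predict(self, features: Dict[str, float]) -> float:
        logit = sum(
            (self.base_weights.get(name, 0.0) + self.delta.get(name, 0.0)) * value
            for name, value in features.items()
        )
        return sigmoid(logit)

    def online_update(self, features: Dict[str, float], label: int) -> None:
        # One SGD step on log loss; the base weights are never touched.
        error = self.predict(features) - label
        for name, value in features.items():
            self.delta[name] = self.delta.get(name, 0.0) - self.learning_rate * error * value

    def reset_online_state(self) -> None:
        # Simulates lost online state or a deliberate rollback.
        self.delta = {}


scorer = HybridScorer(base_weights={"likes_sports": 1.0})
session = {"likes_sports": 1.0, "clicked_cooking_today": 1.0}

base_only = scorer.predict(session)
for _ in range(20):
    scorer.online_update(session, label=1)  # user keeps engaging in this session
adapted = scorer.predict(session)

scorer.reset_online_state()
after_fallback = scorer.predict(session)
```

Keeping the delta separate from the base weights is the design choice that makes the fallback trivial: a bad online update can be discarded without retraining anything.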
Offline + Online: The Hybrid Pattern
flowchart LR
H[Historical data warehouse] -->|nightly job| OT[Offline trainer]
OT -->|model weights| M[Base model]
M --> S[Serving service]
E[Event stream clicks, purchases] -->|Kafka| OL[Online learner bandit / SGD]
OL -->|small delta layer| S
S -->|prediction + A/B logging| Log[Outcomes log]
Log -->|feeds back| H
Log -->|feeds back| E
Interview answer
"Default to nightly offline training with online feature freshness. Add an online bandit layer only where exploration matters (ad selection, fast-changing trends). Keep the offline model as a fallback so a bad online update can't break the system."
05
Operational pitfalls
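Training-serving skew. Features are computed one way in the training pipeline and another way at serving time, so the model sees inputs it was never trained on. Mitigated by sharing feature transformation code via a feature store.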
Distribution drift. The world the model was trained on isn't the world it serves. Monitor input feature distributions vs training distributions; alert on KL divergence above threshold.
Label leakage. Training data accidentally contains future signal. Always use point-in-time joins.
Ground truth lag. The answer to "Did the user buy?" arrives hours or days later. Online models must handle delayed labels, for example by joining each outcome back to the features logged at prediction time.
Catastrophic forgetting (online learning specific). Sustained updates on a new distribution erase old knowledge. Mitigate with replay buffers or elastic weight consolidation.
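The distribution-drift check from the list above can be sketched as a KL divergence between binned feature histograms. The direction used here, KL(serving ‖ training), the bin edges, the smoothing constant, and the 0.1 alert threshold are all assumptions to tune per feature, not recommended values.

```python
import math
from typing import List, Sequence

DRIFT_THRESHOLD = 0.1  # hypothetical alert threshold


def binned_proportions(
    values: Sequence[float], bin_edges: Sequence[float], smoothing: float = 1.0
) -> List[float]:
    """Histogram proportions with additive smoothing so no bin is exactly zero."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            # Values beyond the last edge are clamped into the last bin.
            if v < bin_edges[i + 1] or i == last:
                counts[i] += 1
                break
    total = len(values) + smoothing * len(counts)
    return [(c + smoothing) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def drift_alert(training: Sequence[float], serving: Sequence[float],
                bin_edges: Sequence[float]) -> bool:
    p_serving = binned_proportions(serving, bin_edges)
    q_training = binned_proportions(training, bin_edges)
    return kl_divergence(p_serving, q_training) > DRIFT_THRESHOLD


# Toy data: training values are spread evenly over 0..9.
edges = [0, 2, 4, 6, 8, 10]
training_values = [i % 10 for i in range(1000)]
serving_same = [i % 10 for i in range(500)]         # same shape as training
serving_shifted = [8 + (i % 2) for i in range(500)]  # mass piled into the top bin

alert_same = drift_alert(training_values, serving_same, edges)
alert_shifted = drift_alert(training_values, serving_shifted, edges)
```

Smoothing matters because KL divergence is undefined when a bin is empty on one side. In practice this check would run per feature on a schedule, with the training histogram stored alongside the model checkpoint.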
06
Real-world
Netflix recommendations
Hybrid
Daily offline retrain of the deep ranker. Lightweight online update layer for "things you just clicked on." A/B test framework compares both side by side.
Google Ads
Genuinely online
Click prediction model updates continuously. Bandit-style exploration for new ads. Massive online infrastructure — one of the few places that needs it.
Spotify Discover Weekly
Pure offline batch
Weekly playlist generation. Nothing online. Works because user taste changes slowly.
Stripe Radar
Daily retrain + online features
Fraud model retrained daily. Real-time features (current session, IP velocity) update online via Kafka — but model weights are static between training runs.
07
Used in problems
The recommendation algorithm uses offline training plus online feature freshness. The news feed retrains its ranker daily but applies fresh engagement features online. The leaderboard's scoring weights rarely change, so it is purely static.