Concept · Machine Learning Systems

Online vs Offline Training

01

Why this matters

Your recommendation model was trained on last month's data. New trends emerged. New users appeared. The model is already stale, and its predictions keep degrading as the data drifts. Do you re-train nightly (offline), continuously update weights as events stream in (online), or something in between? The options differ by orders of magnitude in infrastructure cost and produce dramatically different freshness/correctness tradeoffs.

Most production ML systems are offline-trained, online-served. A small but important class genuinely needs online learning. Knowing which is which is the interview-grade insight.

02

The training cadence ladder

| Mode | Update cadence | Where trained | Best for |
| --- | --- | --- | --- |
| Static (one-shot) | Once | Notebook → deploy | Stable patterns: spam classifier, language detection |
| Periodic offline | Daily / weekly | Spark / GPU cluster | Most production ML: recs, ranking, fraud baseline |
| Continual / incremental | Hours | GPU cluster, warm-start from previous | Evolving distributions (news, trends) |
| Online learning | Per event | Streaming SGD updates | Personalization with rapidly-shifting tastes; bandit problems |
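To make the "per event" end of the ladder concrete, here is a minimal sketch of streaming SGD on a logistic regression, written in plain Python. The class name `OnlineLogisticRegression`, the learning rate, and the toy event stream are all assumptions for illustration; a production online learner would add regularization, learning-rate schedules, and delayed-label handling.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression updated one event at a time with plain SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # assumed value, not tuned

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-35.0, min(35.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One SGD step on log loss for a single (features, label) event."""
        error = self.predict_proba(x) - y  # gradient of log loss w.r.t. the logit
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


# Hypothetical event stream: (features, clicked) pairs arriving one by one.
events = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200

model = OnlineLogisticRegression(n_features=2)
for features, clicked in events:
    model.update(features, clicked)  # weights change after every event
```

Every call to `update` moves the weights, which is exactly why the model state is hard to reproduce: there is no fixed snapshot to re-train from.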
03

Why offline dominates

Despite all the hype, the large majority of production ML (roughly 95%, as a rule of thumb) is periodic offline training. Reasons:

  • Reproducibility. Re-training on a fixed snapshot with fixed seeds yields the same weights (up to hardware nondeterminism). Online updates make every model state ephemeral and hard to debug.
  • Hyperparameter stability. Tuning a model is an offline experiment. Online updates with a badly chosen learning rate can destroy what you trained.
  • Feature monitoring. Offline pipelines run validation, distribution-shift alerts, and A/B comparisons before a model ships. Online updates bypass those gates unless you rebuild them for the stream.
  • Rollback. Offline training leaves a checkpoint per run, so you can roll back instantly. With online updates, recovering the state from 3 hours ago may be impossible unless you snapshot it yourself.
  • Sufficient freshness. A daily-trained model plus real-time features (via a feature store) is often as good as continuous training and dramatically simpler.
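The "sufficient freshness" point can be sketched as follows: the weights stay frozen between nightly runs, while the features read at serving time are fresh. Everything here is hypothetical, including the `InMemoryFeatureStore` class (a toy stand-in for a real feature store), the feature names, and the weight values.

```python
import math

# Weights frozen by last night's offline training run (hypothetical values).
NIGHTLY_WEIGHTS = {
    "bias": -2.0,
    "clicks_last_10_min": 0.8,
    "is_returning_user": 0.5,
}


class InMemoryFeatureStore:
    """Toy stand-in for a feature store: latest feature values per user."""

    def __init__(self):
        self._features = {}

    def write(self, user_id, name, value):
        self._features.setdefault(user_id, {})[name] = value

    def read(self, user_id):
        return dict(self._features.get(user_id, {}))


def score(weights, features):
    """Logistic score using static weights and whatever features are current."""
    z = weights["bias"] + sum(
        weights[name] * value
        for name, value in features.items()
        if name in weights
    )
    return 1.0 / (1.0 + math.exp(-z))


store = InMemoryFeatureStore()
store.write("user_42", "is_returning_user", 1.0)
store.write("user_42", "clicks_last_10_min", 0.0)
before = score(NIGHTLY_WEIGHTS, store.read("user_42"))

# A streaming job refreshes the feature; the model weights are untouched.
store.write("user_42", "clicks_last_10_min", 3.0)
after = score(NIGHTLY_WEIGHTS, store.read("user_42"))
```

The prediction moves from `before` to `after` without any training step, which is the sense in which fresh features can substitute for fresh weights.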
04

When online learning genuinely wins

Online learning earns its complexity in three scenarios:

1. Multi-armed bandits (recommendation exploration). Show ad A or ad B? Don't wait a day to learn — every impression provides signal. Thompson sampling or UCB updates per click.

2. Drift-heavy problems. Twitter trending — what was hot 2 hours ago is dead. Fashion recs in fast-fashion ecommerce. News recommendation.

3. Cold-start personalization. A new user arrives. A static model has no signal for them. Bandit-style online updates can start adapting within their first handful of actions (say, 5).
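As an illustration of the per-click bandit update mentioned in scenario 1, here is a minimal Thompson sampling sketch with a Beta-Bernoulli model. The class name `BetaBernoulliBandit`, the arm names, the click-through rates, and the simulated environment are assumptions made up for this example.

```python
import random


class BetaBernoulliBandit:
    """Thompson sampling over arms with click / no-click rewards."""

    def __init__(self, arms, seed=None):
        self._rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior; each entry is [alpha, beta].
        self.posteriors = {arm: [1.0, 1.0] for arm in arms}

    def choose(self):
        """Sample a plausible CTR per arm and show the arm with the best sample."""
        samples = {
            arm: self._rng.betavariate(alpha, beta)
            for arm, (alpha, beta) in self.posteriors.items()
        }
        return max(samples, key=samples.get)

    def update(self, arm, clicked):
        """Per-impression update: a click bumps alpha, a miss bumps beta."""
        if clicked:
            self.posteriors[arm][0] += 1.0
        else:
            self.posteriors[arm][1] += 1.0


# Simulated environment with hypothetical true click-through rates.
true_ctr = {"ad_A": 0.05, "ad_B": 0.15}
environment = random.Random(0)

bandit = BetaBernoulliBandit(true_ctr, seed=1)
shown = {arm: 0 for arm in true_ctr}
for _ in range(5000):
    arm = bandit.choose()
    shown[arm] += 1
    bandit.update(arm, environment.random() < true_ctr[arm])
```

Because the posterior is updated after every impression, traffic shifts toward the better ad during the day rather than after the next nightly retrain.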

Architecture: keep a base model trained offline (the "warm prior"). Apply online deltas on top — usually as a simple linear model or shallow correction layer. If online state is lost, fall back to base model. The two layers compose.
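One way to picture this composition is a small linear correction added to the base model's logit. This is a sketch under stated assumptions, not a reference design: `HybridScorer`, `base_logit`, and the learning rate are hypothetical, and the base model is reduced to a single function.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class HybridScorer:
    """Offline base model plus a small online linear delta on its logit."""

    def __init__(self, base_logit_fn, n_features, learning_rate=0.05):
        self.base_logit_fn = base_logit_fn  # frozen between offline runs
        self.delta_weights = [0.0] * n_features
        self.learning_rate = learning_rate  # assumed value

    def predict(self, x):
        delta = sum(w * xi for w, xi in zip(self.delta_weights, x))
        return sigmoid(self.base_logit_fn(x) + delta)

    def online_update(self, x, y):
        """SGD step on the delta layer only; the base model is never touched."""
        error = self.predict(x) - y
        for i, xi in enumerate(x):
            self.delta_weights[i] -= self.learning_rate * error * xi

    def reset_online_state(self):
        """Fallback path: with a zero delta, predictions equal the base model."""
        self.delta_weights = [0.0] * len(self.delta_weights)


def base_logit(x):
    # Hypothetical stand-in for the nightly-trained model.
    return -1.0 + 0.5 * x[0]


scorer = HybridScorer(base_logit, n_features=2)
x = [1.0, 1.0]
base_only = scorer.predict(x)

for _ in range(50):
    scorer.online_update(x, 1)  # a burst of positive events
with_delta = scorer.predict(x)

scorer.reset_online_state()  # simulate losing or discarding online state
after_reset = scorer.predict(x)
```

Keeping the delta additive and small means that discarding it is always safe: the worst case is the base model's prediction.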

Offline + Online: The Hybrid Pattern (Mermaid)
flowchart LR
    H[Historical data warehouse] -->|nightly job| OT[Offline trainer]
    OT -->|model weights| M[Base model]
    M --> S[Serving service]
    E[Event stream<br/>clicks, purchases] -->|Kafka| OL[Online learner<br/>bandit / SGD]
    OL -->|small delta layer| S
    S -->|prediction + A/B logging| Log[Outcomes log]
    Log -->|feeds back| H
    Log -->|feeds back| E
Interview answer

"Default to nightly offline training with online feature freshness. Add an online bandit layer only where exploration matters (ad selection, fast-changing trends). Keep the offline model as a fallback so a bad online update can't break the system."

05

Operational pitfalls

  • Training-serving skew. Features are computed one way in the training pipeline and another way at serving time, so the model sees inputs it was never trained on. Mitigate by sharing feature transformation code via a feature store.
  • Distribution drift. The world the model was trained on isn't the world it serves. Monitor input feature distributions against training distributions; alert when KL divergence exceeds a threshold.
  • Label leakage. Training data accidentally contains future signal. Always use point-in-time joins, which attach only the feature values that were available at prediction time.
  • Ground truth lag. The answer to "Did the user buy?" arrives hours or days later. Online models must handle delayed labels.
  • Catastrophic forgetting (online learning specific). Sustained updates on a new distribution erase old knowledge. Mitigate with replay buffers or elastic weight consolidation.
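The drift check from the list above can be sketched as a histogram comparison with a KL-divergence alert. The function names, the bin edges, the smoothing constant, and the threshold of 0.1 are all assumptions; a suitable threshold depends on the feature and has to be calibrated on real traffic.

```python
import math


def histogram(values, edges):
    """Fraction of values per bin; out-of-range values land in the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]


def kl_divergence(p, q, eps=1e-6):
    """KL(p || q) for discrete distributions, smoothed to avoid log(0)."""
    p_smooth = [pi + eps for pi in p]
    q_smooth = [qi + eps for qi in q]
    p_total, q_total = sum(p_smooth), sum(q_smooth)
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p_smooth, q_smooth)
    )


def drift_alert(training_values, serving_values, edges, threshold=0.1):
    """Return (kl, should_alert), measuring serving against training."""
    serving_dist = histogram(serving_values, edges)
    training_dist = histogram(training_values, edges)
    kl = kl_divergence(serving_dist, training_dist)
    return kl, kl > threshold


# Hypothetical feature values: training is uniform on [0, 1).
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
training = [i / 100 for i in range(100)]
shifted = [0.75 + i / 400 for i in range(100)]  # serving mass moved to the top bin

kl_same, alert_same = drift_alert(training, training, edges)
kl_shift, alert_shift = drift_alert(training, shifted, edges)
```

In this toy setup the identical distribution produces no alert and the shifted one does; in practice the check would run per feature on a schedule.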
06

Real-world

Netflix recommendations

Hybrid

Daily offline retrain of the deep ranker. Lightweight online update layer for "things you just clicked on." A/B test framework compares both side by side.

Google Ads

Genuinely online

Click prediction model updates continuously. Bandit-style exploration for new ads. Massive online infrastructure — one of the few places that needs it.

Spotify Discover Weekly

Pure offline batch

Weekly playlist generation. Nothing online. Works because user taste changes slowly.

Stripe Radar

Daily retrain + online features

Fraud model retrained daily. Real-time features (current session, IP velocity) update online via Kafka — but model weights are static between training runs.
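A real-time feature such as "IP velocity" can be thought of as a count of events per key inside a trailing time window. The sketch below is a generic illustration and not a description of Stripe's implementation: `VelocityCounter`, the 60-second window, and the sample timestamps are hypothetical, and it assumes timestamps arrive in order for each key.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Counts events per key within a trailing time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def record(self, key, timestamp):
        self._events[key].append(timestamp)

    def count(self, key, now):
        """Number of events for key in the window (now - window, now]."""
        timestamps = self._events[key]
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()  # evict events that fell out of the window
        return len(timestamps)


counter = VelocityCounter(window_seconds=60)
for t in (0, 10, 20, 65):
    counter.record("203.0.113.7", t)

velocity = counter.count("203.0.113.7", now=70)  # events at t=20 and t=65 remain
unseen = counter.count("198.51.100.1", now=70)   # no events recorded for this key
```

The counter changes with every event, but it is an input to the model rather than part of it, so the weights stay static between training runs.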

07

Used in problems

Recommendation algorithm uses offline training + online feature freshness. News feed retrains ranker daily but applies fresh engagement features online. Leaderboard's scoring weights rarely change — pure static.
