Online vs Offline Training

01

Why this matters

Your recommendation model was trained on last month's data. New trends emerged. New users appeared. The model is already stale — predictions degrade by the hour. Do you re-train nightly (offline), continuously update weights as events stream in (online), or something in between? Different answers cost orders of magnitude in infrastructure and produce dramatically different freshness/correctness tradeoffs.

Most production ML systems are offline-trained, online-served. A small but important class genuinely needs online learning. Knowing which is which is the interview-grade insight.

02

The training cadence ladder

Mode	Update cadence	Where trained	Best for
Static (one-shot)	Once	Notebook → deploy	Stable patterns: spam classifier, language detection
Periodic offline	Daily / weekly	Spark / GPU cluster	Most production ML — recs, ranking, fraud baseline
Continual / incremental	Hours	GPU cluster, warm-start from previous	Evolving distributions (news, trends)
Online learning	Per event	Streaming SGD updates	Personalization with rapidly-shifting tastes; bandit problems
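
To make the last row of the ladder concrete, here is a minimal sketch of a per-event streaming SGD update, assuming a logistic-regression click model. The class name, learning rate, and toy event stream are illustrative assumptions, not taken from any particular production system.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression updated one event at a time with plain SGD."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        """One SGD step on the log loss for a single (features, label) event."""
        error = self.predict_proba(x) - label  # gradient of log loss w.r.t. the logit
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=2)

# Hypothetical event stream of (features, clicked) pairs.
events = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200
for features, clicked in events:
    model.update(features, clicked)  # weights change after every single event
```

There is no training job and no snapshot here: the model state after event N exists only until event N+1 arrives, which is exactly what makes this mode hard to reproduce and roll back.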

03

Why offline dominates

Despite all the hype, the vast majority of production ML (roughly 95%, as a rule of thumb rather than a measured figure) is periodic offline training. Reasons:

Reproducibility. Re-train on a fixed snapshot with fixed seeds and a pinned environment → the same model (bit-identical weights also need deterministic kernels). Online updates make every model state ephemeral and hard to debug.
Hyperparameter stability. Tuning a model is an offline experiment. Online updates with the wrong learning rate destroy what you trained.
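Feature monitoring. Offline pipelines run validation, distribution-shift alerts, and A/B comparisons before a model ships. Per-event online updates go live without passing those gates unless you build equivalent streaming checks.
Rollback. Offline = checkpoint per training run, so you can roll back instantly. Online = the state from 3 hours ago may never have been saved, so restoring it might be impossible.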
Sufficient freshness. Daily-trained model + real-time features (via feature store) is often as good as continuous training and dramatically simpler.
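
The "daily-trained model + real-time features" combination can be sketched as follows. This is a simplified illustration under stated assumptions: the sliding-window counter stands in for a feature store, and `FROZEN_WEIGHTS`, `VelocityFeature`, and `fraud_score` are hypothetical names with made-up values.

```python
import math
from collections import defaultdict, deque


class VelocityFeature:
    """Counts events per key over a sliding time window (a real-time feature)."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)

    def record(self, key: str, timestamp: float) -> None:
        self._events[key].append(timestamp)

    def value(self, key: str, now: float) -> int:
        timestamps = self._events[key]
        # Drop events that have aged out of the window.
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()
        return len(timestamps)


# Weights fixed by the (hypothetical) nightly training run; never touched at serve time.
FROZEN_WEIGHTS = {"bias": -3.0, "ip_velocity": 0.8}


def fraud_score(ip_velocity: int) -> float:
    z = FROZEN_WEIGHTS["bias"] + FROZEN_WEIGHTS["ip_velocity"] * ip_velocity
    return 1.0 / (1.0 + math.exp(-z))


velocity = VelocityFeature(window_seconds=60)
for t in (0, 5, 10, 15, 20, 25):
    velocity.record("203.0.113.7", t)  # six requests from one IP in 25 seconds

quiet = fraud_score(velocity.value("198.51.100.1", now=30))  # IP with no recent traffic
busy = fraud_score(velocity.value("203.0.113.7", now=30))    # IP with a burst of traffic
```

The score reacts to the burst within seconds even though no weight changed. Freshness comes from the inputs, not from the model.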

04

When online learning genuinely wins

Online learning earns its complexity in three scenarios:

1. Multi-armed bandits (recommendation exploration). Show ad A or ad B? Don't wait a day to learn: every impression provides signal. Thompson sampling or UCB updates its estimates after every impression, click or no click.

2. Drift-heavy problems. Twitter trending — what was hot 2 hours ago is dead. Fashion recs in fast-fashion ecommerce. News recommendation.

3. Cold-start personalization. A new user arrives. Static model has no signal. Bandit-style online updates from their first 5 actions converge fast.
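
As a minimal sketch of the bandit case in scenario 1, here is Beta-Bernoulli Thompson sampling over two ads. The arm names, the true click-through rates, and the simulated environment are assumptions made up for the example.

```python
import random


class BetaBernoulliBandit:
    """Thompson sampling with a Beta posterior over each arm's click rate."""

    def __init__(self, arms, seed=None):
        # [alpha, beta] per arm, starting from a uniform Beta(1, 1) prior.
        self.posteriors = {arm: [1, 1] for arm in arms}
        self._rng = random.Random(seed)

    def choose(self):
        # Draw one plausible click rate per arm, then play the best draw.
        samples = {
            arm: self._rng.betavariate(alpha, beta)
            for arm, (alpha, beta) in self.posteriors.items()
        }
        return max(samples, key=samples.get)

    def update(self, arm, clicked):
        # A click increments alpha, a non-click increments beta.
        self.posteriors[arm][0 if clicked else 1] += 1


# Hypothetical true click-through rates, unknown to the bandit.
TRUE_CTR = {"ad_a": 0.05, "ad_b": 0.15}

bandit = BetaBernoulliBandit(TRUE_CTR, seed=0)
environment = random.Random(1)
shown = {arm: 0 for arm in TRUE_CTR}

for _ in range(5000):
    arm = bandit.choose()
    shown[arm] += 1
    bandit.update(arm, clicked=environment.random() < TRUE_CTR[arm])
```

Every impression updates the posterior, so traffic shifts toward the better ad during the day instead of after the next nightly retrain. Uncertain arms still get sampled occasionally, which is the exploration.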

Architecture: keep a base model trained offline (the "warm prior"). Apply online deltas on top — usually as a simple linear model or shallow correction layer. If online state is lost, fall back to base model. The two layers compose.
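
One way to picture the two layers composing is a frozen base scorer with a linear online correction added to its logit. This is a sketch under stated assumptions: `HybridScorer`, its learning rate, and the example weights are hypothetical, and a real system would persist and monitor the delta.

```python
import math


def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


class HybridScorer:
    """Frozen offline weights plus an online linear delta on the same features."""

    def __init__(self, base_weights, learning_rate: float = 0.1):
        self.base_weights = tuple(base_weights)      # the offline "warm prior", immutable
        self.delta = [0.0] * len(self.base_weights)  # online correction, starts at zero
        self.learning_rate = learning_rate

    def predict(self, x, use_online: bool = True) -> float:
        z = sum(w * xi for w, xi in zip(self.base_weights, x))
        if use_online:
            z += sum(d * xi for d, xi in zip(self.delta, x))
        return sigmoid(z)

    def update(self, x, label) -> None:
        """Per-event SGD step that only ever touches the delta layer."""
        error = self.predict(x) - label
        for i, xi in enumerate(x):
            self.delta[i] -= self.learning_rate * error * xi

    def reset_online_state(self) -> None:
        """Fallback path: lost or corrupted online state degrades to the base model."""
        self.delta = [0.0] * len(self.base_weights)


scorer = HybridScorer(base_weights=[0.5, -0.5])
x = [1.0, 1.0]

base_only = scorer.predict(x)
for _ in range(50):
    scorer.update(x, label=1)  # the user keeps engaging with this kind of item
adapted = scorer.predict(x)

scorer.reset_online_state()
after_reset = scorer.predict(x)
```

Because the base weights are never mutated, a bad run of online updates can be undone by zeroing the delta, without reloading or retraining anything.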

[Diagram: Offline + Online, the Hybrid Pattern]

Interview answer

"Default to nightly offline training with online feature freshness. Add an online bandit layer only where exploration matters (ad selection, fast-changing trends). Keep the offline model as a fallback so a bad online update can't break the system."

05

Operational pitfalls

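Training-serving skew. Features are computed one way in the training pipeline and another way at serving time, so the model sees inputs it was never trained on. Mitigated by sharing feature transformation code via a feature store.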
Distribution drift. The world the model was trained on isn't the world it serves. Monitor input feature distributions vs training distributions; alert on KL divergence above threshold.
Label leakage. Training data accidentally contains future signal. Always use point-in-time joins, so each training row sees only feature values that existed at prediction time.
Ground truth lag. The answer to "Did the user buy?" arrives hours or days later. Online models must handle delayed labels.
Catastrophic forgetting (online learning specific). Sustained updates on a new distribution erase old knowledge. Mitigate with replay buffers or elastic weight consolidation.
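
The drift check above (alert on KL divergence above threshold) can be sketched for a single numeric feature. The bin edges, the smoothing constant, the 0.1 threshold, and the choice of direction KL(serving || training) are all assumptions for illustration; real thresholds are tuned per feature.

```python
import bisect
import math


def binned_distribution(values, bin_edges, smoothing: float = 1e-6):
    """Histogram as probabilities; out-of-range values fall into the end bins."""
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = bisect.bisect_right(bin_edges, v) - 1
        counts[max(0, min(n_bins - 1, idx))] += 1
    # Additive smoothing keeps every bin non-zero so the log below is defined.
    total = len(values) + smoothing * n_bins
    return [(c + smoothing) / total for c in counts]


def kl_divergence(p, q) -> float:
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


BIN_EDGES = [0.0, 0.25, 0.5, 0.75, 1.0]
DRIFT_THRESHOLD = 0.1  # hypothetical alert threshold

training_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] * 10
serving_stable = list(training_values)
serving_shifted = [0.8, 0.9, 0.95] * 30  # traffic concentrated in the top bin

train_dist = binned_distribution(training_values, BIN_EDGES)
kl_stable = kl_divergence(binned_distribution(serving_stable, BIN_EDGES), train_dist)
kl_shifted = kl_divergence(binned_distribution(serving_shifted, BIN_EDGES), train_dist)

alerts = {
    "stable": kl_stable > DRIFT_THRESHOLD,
    "shifted": kl_shifted > DRIFT_THRESHOLD,
}
```

KL divergence is asymmetric, so pick one direction and keep it fixed across dashboards. Otherwise the same threshold means different things on different features.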

06

Real-world

Netflix recommendations

Hybrid

Daily offline retrain of the deep ranker. Lightweight online update layer for "things you just clicked on." A/B test framework compares both side by side.

Google Ads

Genuinely online

Click prediction model updates continuously. Bandit-style exploration for new ads. Massive online infrastructure — one of the few places that needs it.

Spotify Discover Weekly

Pure offline batch

Weekly playlist generation. Nothing online. Works because user taste changes slowly.

Stripe Radar

Daily retrain + online features

Fraud model retrained daily. Real-time features (current session, IP velocity) update online via Kafka — but model weights are static between training runs.

07

Used in problems

The recommendation algorithm problem uses offline training + online feature freshness. The news feed problem retrains its ranker daily but applies fresh engagement features online. The leaderboard's scoring weights rarely change, so it is purely static.
