06
Deep Dive — Candidate Generation Strategies
Candidate generation is where recommendation systems are won or lost. The ranking model can only score what the candidate generator retrieves. Four approaches, in increasing sophistication:
1. Item-Based Collaborative Filtering
Compute item-item similarity from co-occurrence in the interaction matrix. Simple, fast (Redis lookup), and naturally explainable: "Customers who bought X also bought Y." But suffers from popularity bias (popular items are similar to everything) and sparsity (99.99% of item pairs have zero co-occurrence at 50 M items).
2. Content-Based Filtering
Match item attributes (genre, brand, price) to user preference vector. Solves cold start for new items (metadata exists on day one). But over-specialises — no serendipity — and requires clean metadata.
3. Matrix Factorisation
Decompose the sparse user-item matrix into dense user and item embeddings in a shared latent space. Score = dot product. Discovers latent patterns that co-occurrence misses. The bridge to modern methods — but can't incorporate side features.
4. Two-Tower Neural Model (Production Standard)
Two separate neural networks produce user and item embeddings. User tower ingests demographics, session context, click history. Item tower ingests category, price, text embeddings. Score = dot product. The towers are decoupled — item embeddings are precomputed and indexed for ANN search. One user tower forward pass (~2 ms) + one ANN lookup (~5 ms) = candidate retrieval from 50 M items in 7 ms.
Key insight: Decoupling the towers is what makes the system serve at 20 K RPS. You do one neural net forward pass plus one ANN lookup instead of 50 M forward passes.
sequenceDiagram
participant C as Client
participant S as Serving Service
participant Cache as Rec Cache
participant ANN as ANN Index
participant CF as Item-CF Tables
participant FS as Feature Store
participant R as Ranking Model
C->>S: GET /recommendations
S->>Cache: Check precomputed recs
Cache-->>S: Cache HIT (70%) → return
Note over S: Cache MISS or stale ↓
S->>ANN: User embedding → top 500
S->>CF: Recent items → top 200
Note over S: + Content-based 100 + Trending 100
S->>S: Merge & dedup → ~1000 candidates
S->>FS: Fetch features for 1000 items
FS-->>S: User + item + cross features
S->>R: Score 1000 candidates
R-->>S: Ranked scores
S->>S: Re-rank (diversity, rules, explore)
S-->>C: Top 50 with explanations
ANN Index Internals (FAISS IVF+PQ)
Brute-force search across 50 M items: ~130 ms on CPU. Too slow. IVF clusters items into ~4096 cells via K-means, then searches only the nearest nprobe=64 clusters (~780 K items). PQ compresses each 128-d vector from 512 bytes to 32 bytes. Combined: ~3-5 ms for top-1000 at ~95% recall, using ~2 GB RAM.
| nprobe | Items scanned | Latency | Recall@1000 |
| 8 | ~98 K | 0.3 ms | ~75% |
| 32 | ~390 K | 1 ms | ~90% |
| 64 | ~780 K | 2 ms | ~95% |
| 128 | ~1.5 M | 4 ms | ~98% |
Ranking Model (Stage 2)
A Deep & Cross Network scores the 1000 candidates with rich cross-features: user × item interactions (price vs. user average, time since last category purchase, position bias correction). Multi-objective: P(click), P(purchase), P(return). Scoring 1000 items: ~20-30 ms on CPU.
Production Systems Combine All Four
Multiple generators run in parallel — two-tower ANN (500 candidates), item-based CF (200), content-based (100), trending (100), personal history (100) — merge, dedup, and send ~1000 candidates to the ranker. Each generator covers the others' blind spots.