Hot-path in < 100 ms. The scoring service orchestrates:
- Rule engine (1–5 ms). Short-circuit for obvious blocks — known-bad IP, sanctioned country, velocity thresholds exceeded. Rules are fast and deterministic; ML isn't needed to say "decline a card transaction from 10 countries in 10 minutes."
- Feature fetch (10–30 ms). Parallel reads: online features (Redis), profile store (card history, user account age), graph store (device ↔ card ↔ IP clusters). All KVs keyed by entity_id.
- Feature engineering (2–5 ms). Combine raw features into model inputs — ratios, deltas, categorical encoding. Done in-process in the scoring service.
- Model inference (2–10 ms). GBDT (XGBoost/LightGBM) models dominate production fraud because they're fast, explainable, and perform well on tabular features. Neural nets are occasionally used for specific signals (transaction-text embeddings, sequence models), but they run alongside the GBDT rather than replacing it.
- Decision (1 ms). Compare score to thresholds (per-merchant). Return {score, decision, reason_codes}.
- Log everything async (not on critical path): tx + features + score + decision to Kafka → offline store for training.
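The hot-path steps above can be sketched roughly as follows. This is a minimal illustration, not the service's real code: the three fetch functions, the toy scoring formula, the threshold values, the `review` decision tier, and every name in the block are assumptions made up for the example. It shows the rule short-circuit, the parallel fetch under a timeout, in-process feature engineering, and the per-merchant threshold lookup.

```python
import asyncio

# Hypothetical stand-ins for the three stores; real clients would do network I/O.
async def fetch_online(entity_id):
    await asyncio.sleep(0.001)
    return {"tx_count_1m": 2, "tx_count_1h": 3}

async def fetch_profile(entity_id):
    await asyncio.sleep(0.001)
    return {"account_age_days": 400, "avg_amount": 50.0}

async def fetch_graph(entity_id):
    await asyncio.sleep(0.001)
    return {"cards_on_device": 1}

BLOCKED_IPS = {"203.0.113.7"}                      # illustrative rule data
DEFAULT_THRESHOLDS = {"review": 0.5, "block": 0.9}  # assumed values
MERCHANT_THRESHOLDS = {"m_high_risk": {"review": 0.3, "block": 0.7}}

def hard_rules(tx):
    """Return a reason code if a deterministic rule blocks the tx, else None."""
    if tx["ip"] in BLOCKED_IPS:
        return "KNOWN_BAD_IP"
    return None

def engineer(tx, raw):
    """Combine raw features into model inputs (ratios here)."""
    return {
        "amount_ratio": tx["amount"] / max(raw["avg_amount"], 1e-9),
        "velocity_ratio": raw["tx_count_1m"] / max(raw["tx_count_1h"], 1),
        "cards_on_device": raw["cards_on_device"],
    }

def toy_model(x):
    """Placeholder for GBDT inference; returns a score in [0, 1]."""
    return min(1.0, 0.1 * x["amount_ratio"] + 0.2 * x["velocity_ratio"])

def decide(score, merchant_id):
    """Compare the score to per-merchant thresholds, falling back to defaults."""
    t = MERCHANT_THRESHOLDS.get(merchant_id, DEFAULT_THRESHOLDS)
    if score >= t["block"]:
        return "block"
    if score >= t["review"]:
        return "review"
    return "approve"

async def score_tx(tx, fetch_timeout_s=0.03):
    reason = hard_rules(tx)
    if reason:
        # Short-circuit: no feature fetch and no model call for hard blocks.
        return {"score": None, "decision": "block", "reason_codes": [reason]}
    # The three reads run concurrently, so the cost is the slowest one.
    parts = await asyncio.wait_for(
        asyncio.gather(
            fetch_online(tx["card_id"]),
            fetch_profile(tx["card_id"]),
            fetch_graph(tx["card_id"]),
        ),
        timeout=fetch_timeout_s,
    )
    raw = {k: v for part in parts for k, v in part.items()}
    score = toy_model(engineer(tx, raw))
    return {"score": score, "decision": decide(score, tx["merchant_id"]), "reason_codes": []}
```

Calling `asyncio.run(score_tx(tx))` with a dict holding `ip`, `card_id`, `merchant_id`, and `amount` returns the `{score, decision, reason_codes}` shape described above. The async Kafka logging step is left out of the sketch.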
Feature freshness — the velocity problem. Fraudsters exploit time gaps. If a stolen card has been used 5 times in the last minute and we only see "lifetime card velocity" computed nightly, we miss it. Online features need sub-second updates.
Pattern: stream processor reads every transaction event; maintains sliding-window counts per entity (user, card, IP, device) at multiple granularities (1 min, 5 min, 1 hr). Written to Redis keyed by {entity_id}:{window}:{stat}. The scoring service reads these in parallel with the user/merchant profile.
Exact counts at entity granularity are fine (a user has few recent transactions). Aggregates across entities need bounded-memory sketches instead. A distinct count such as "how many distinct cards used this IP in the last hour" fits HyperLogLog, while per-key frequency counts over a very large key space fit a Count-Min Sketch.
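For the bounded-memory idea, here is a minimal Count-Min Sketch, written as an illustration with assumed width and depth. It estimates how often a key has been seen (for example, events per IP) and never underestimates; it does not count distinct values, which is a separate problem usually handled with HyperLogLog.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        # Memory is fixed at width * depth counters regardless of key count.
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        """Yield one (row, column) per hash row, using a per-row salt."""
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, n=1):
        for row, col in self._cells(key):
            self.table[row][col] += n

    def estimate(self, key):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(key))

cms = CountMinSketch()
for _ in range(5):
    cms.add("ip:198.51.100.1")
cms.add("ip:198.51.100.2", 3)
```

`cms.estimate("ip:198.51.100.1")` is at least 5 and at most the total of all increments (8 here); wider tables make overestimates rarer.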
Scoring Sequence — where the 100 ms budget goes
Mermaid
sequenceDiagram
participant C as Caller (payments)
participant S as Scoring svc
participant R as Rule engine
participant F as Feature store
participant M as Model svc
C->>S: POST /v1/score {tx}
S->>R: evaluate hard rules (~3 ms)
alt hard block
R-->>S: BLOCK + reason
S-->>C: {decision: block}
else no hard block
par parallel fetch
S->>F: online features (~15 ms)
S->>F: profile + graph (~10 ms)
end
S->>S: engineer inputs (~3 ms)
S->>M: infer (~5 ms)
M-->>S: score 0.0–1.0
S->>S: decision by threshold
S-->>C: {score, decision, reason_codes}
end
S->>S: async emit to Kafka (off critical path)
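The budget arithmetic implied by the diagram and the stage estimates can be checked in a few lines. The numbers are the ones quoted in this section; the only added idea is that parallel fetches cost the slower of the two rather than their sum.

```python
BUDGET_MS = 100

# Typical path from the sequence diagram: the two fetches run in parallel.
diagram_path_ms = 3 + max(15, 10) + 3 + 5  # rules + fetch + engineering + inference

# (low, high) ranges from the stage list.
STAGES_MS = {
    "rule_engine": (1, 5),
    "feature_fetch": (10, 30),
    "feature_engineering": (2, 5),
    "model_inference": (2, 10),
    "decision": (1, 1),
}
best_ms = sum(lo for lo, _ in STAGES_MS.values())
worst_ms = sum(hi for _, hi in STAGES_MS.values())
headroom_ms = BUDGET_MS - worst_ms

print(diagram_path_ms, best_ms, worst_ms, headroom_ms)  # 26 16 51 49
```

Even the worst-case stage sum leaves roughly half of the 100 ms budget unspent, which presumably absorbs network hops, serialization, and tail latency.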
Graph features — finding rings. Single transactions look innocent; the ring doesn't. Graph features capture network-level signals: "how many other accounts share this device fingerprint?" "Has this IP been used by accounts that later charged back?" Stored in a graph DB or a materialized graph (adjacency lists) in Bigtable. Updated by a graph updater consuming the event stream.
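The two example questions can be answered from a materialized adjacency list. The sketch below is an in-memory stand-in with hypothetical names and an assumed `acct:` / `dev:` / `ip:` node-ID prefix scheme; a real deployment would back it with the graph DB or Bigtable rows mentioned above.

```python
from collections import defaultdict

class EntityGraph:
    def __init__(self):
        self.adj = defaultdict(set)  # node -> neighbouring nodes
        self.charged_back = set()    # accounts with a confirmed chargeback

    def link(self, a, b):
        """Add an undirected edge, e.g. account <-> device or account <-> IP."""
        self.adj[a].add(b)
        self.adj[b].add(a)

    def accounts_sharing_device(self, account, device):
        """How many other accounts share this device fingerprint?"""
        neighbours = self.adj.get(device, set())
        return len({n for n in neighbours if n.startswith("acct:")} - {account})

    def chargeback_accounts_on_ip(self, ip):
        """How many accounts seen on this IP later charged back?"""
        return sum(1 for n in self.adj.get(ip, set()) if n in self.charged_back)

g = EntityGraph()
g.link("acct:a", "dev:1")
g.link("acct:b", "dev:1")
g.link("acct:c", "dev:1")
g.link("acct:b", "ip:9")
g.link("acct:d", "ip:9")
g.charged_back.add("acct:b")
```

Here `g.accounts_sharing_device("acct:a", "dev:1")` is 2 and `g.chargeback_accounts_on_ip("ip:9")` is 1; both values would be read as features at scoring time.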
Feedback loop. Three label sources:
- Chargebacks (1–90 days later): highest signal, but the slowest to arrive. They come from payment network reports.
- User reports: "I didn't make this transaction." Fast (hours), but noisy (users sometimes forget legitimate purchases).
- Analyst review: an internal team labels borderline cases flagged by the model.
All labels flow into a join table (tx_id, features, label). A weekly training job pulls the last N days of labeled data, retrains GBDT, shadow-tests on recent traffic, then canary-deploys to a small % of tx. If metrics hold, promote to 100%.
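One way to build the join table is sketched below. The precedence order among label sources when they disagree (chargeback over analyst over user report) is an assumption for illustration, as are the function and field names; the text above only says that all three sources feed the table.

```python
from datetime import date, timedelta

# Assumed precedence when sources disagree on the same transaction.
SOURCE_PRIORITY = {"chargeback": 3, "analyst": 2, "user_report": 1}

def build_training_rows(scored, labels, today, n_days):
    """Join logged (tx_date, features) with the best label per tx_id.

    scored: {tx_id: (tx_date, features)} from the offline store
    labels: iterable of (tx_id, source, is_fraud)
    Returns (tx_id, features, label) rows for the last n_days.
    """
    cutoff = today - timedelta(days=n_days)
    best = {}
    for tx_id, source, is_fraud in labels:
        current = best.get(tx_id)
        if current is None or SOURCE_PRIORITY[source] > SOURCE_PRIORITY[current[0]]:
            best[tx_id] = (source, is_fraud)
    rows = []
    for tx_id, (tx_date, features) in scored.items():
        # Keep only recent transactions that have at least one label.
        if tx_date >= cutoff and tx_id in best:
            rows.append((tx_id, features, best[tx_id][1]))
    return rows

scored = {
    "t1": (date(2024, 5, 20), {"amount_ratio": 2.0}),
    "t2": (date(2024, 5, 25), {"amount_ratio": 0.9}),
    "t3": (date(2024, 1, 1), {"amount_ratio": 5.0}),
}
labels = [("t1", "user_report", False), ("t1", "chargeback", True), ("t3", "analyst", True)]
rows = build_training_rows(scored, labels, today=date(2024, 6, 1), n_days=30)
```

In this example only `t1` survives: its chargeback label overrides the user report, `t2` has no label yet, and `t3` falls outside the 30-day window.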
The 1–90 day label delay is fundamental. You can't retrain on today's data today. Partial mitigation: heuristic labels (e.g., "3 chargebacks on this card in 30 days = probably fraud") applied to recent transactions that have no confirmed label yet. Unsupervised anomaly detection supplements the supervised model on the most recent window, where confirmed labels are scarcest.
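The heuristic quoted above is simple enough to write down directly. This sketch uses hypothetical names and takes the "3 in 30 days" numbers as defaults; how the resulting provisional labels are weighted in training is left open.

```python
from datetime import date, timedelta

def heuristic_fraud_cards(chargebacks, today, min_count=3, window_days=30):
    """Return card IDs with at least min_count chargebacks in the last window_days.

    chargebacks: iterable of (card_id, chargeback_date)
    """
    cutoff = today - timedelta(days=window_days)
    counts = {}
    for card_id, cb_date in chargebacks:
        if cutoff < cb_date <= today:
            counts[card_id] = counts.get(card_id, 0) + 1
    return {card for card, n in counts.items() if n >= min_count}

today = date(2024, 6, 1)
chargebacks = [
    ("card:1", date(2024, 5, 10)),
    ("card:1", date(2024, 5, 20)),
    ("card:1", date(2024, 5, 30)),
    ("card:2", date(2024, 5, 15)),
    ("card:2", date(2024, 3, 1)),   # outside the 30-day window
]
flagged = heuristic_fraud_cards(chargebacks, today)
```

`flagged` contains only `card:1`. Recent, still-unlabeled transactions on a flagged card could then receive a provisional fraud label until chargeback data confirms or refutes it.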
Interview answer
"Scoring service orchestrates: rule engine short-circuits hard blocks, parallel feature fetch (online features + profile + graph) < 30 ms, feature engineering, GBDT model inference < 10 ms, threshold decision. Features kept fresh by stream processor (Flink) updating Redis-backed windowed stats on every event — sub-second freshness. Feedback from chargebacks + user reports + analyst review feeds weekly retrain via shadow/canary/promote. Graph features capture ring structure. Explainability via SHAP for reviewers. Degrade gracefully to rule-only on model outage."