Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.
Out of 10 points45 min whiteboardReference solution →
01
Prompt
A system that scores every incoming payment / account action in under 100 ms and decides: approve, review, or block. The hard parts: a low-latency feature store that can answer "how many card swipes has this user made in the last 5 minutes?" in < 10 ms; an online ML model co-located with features returning a risk score in single-digit milliseconds; and a feedback loop that closes the gap between a human-confirmed chargeback (days later) and retraining the model, without letting the fraudsters get a week head-start. Stripe Radar, PayPal, Visa all run systems like this at 10K+ tx/sec.
Time budget: 45 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.
02
Hints (progressive — click to reveal)
Hint 1
Rules + ML hybrid. Pure-ML is wrong. Mention rule engine short-circuit as a core design piece.
Hint 2
Online + offline feature store. Same features for training + serving. Training-serving skew is a classic production bug; show you know about it.
Hint 3
Label delay is fundamental. Acknowledge chargebacks take days-to-weeks. Don't propose "train on today's data, serve tomorrow" without caveats.
03
Rubric — 10 points
+2 Rules + ML hybrid. Pure-ML is wrong. Mention rule engine short-circuit as a core design piece.
+2 Online + offline feature store. Same features for training + serving. Training-serving skew is a classic production bug; show you know about it.
+2 Label delay is fundamental. Acknowledge chargebacks take days-to-weeks. Don't propose "train on today's data, serve tomorrow" without caveats.
+2 Graceful degrade. Payment auth can't hang. Fall back to rules + conservative threshold on any dependency failure. Payments > precision.
+1 Graph features matter. Single-tx features miss rings. Shared-device-fingerprint clusters are a cheap and effective signal.
+1 Explainability isn't nice-to-have. Compliance requires it. SHAP or tree-derived feature contributions are the standard.
Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.
04
Red flags (things that tank the interview)
No back-of-envelope estimation — jumps straight into components without quantifying scale for Real-time Fraud Detection
Single point of failure — no replication, failover, or redundancy discussed
Ignores data model and storage choices — hand-waves the database layer