Exercise · Financial & Trading

Real-time Fraud Detection

Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.

Out of 10 points45 min whiteboardReference solution →

Prompt

A system that scores every incoming payment / account action in under 100 ms and decides: approve, review, or block. The hard parts: a low-latency feature store that can answer "how many card swipes has this user made in the last 5 minutes?" in < 10 ms; an online ML model co-located with features returning a risk score in single-digit milliseconds; and a feedback loop that closes the gap between a human-confirmed chargeback (days later) and retraining the model, without letting the fraudsters get a week head-start. Stripe Radar, PayPal, Visa all run systems like this at 10K+ tx/sec.

Time budget: 45 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.

Hints (progressive — click to reveal)

Hint 1

Rules + ML hybrid. Pure-ML is wrong. Mention rule engine short-circuit as a core design piece.

Hint 2

Online + offline feature store. Same features for training + serving. Training-serving skew is a classic production bug; show you know about it.

Hint 3

Label delay is fundamental. Acknowledge chargebacks take days-to-weeks. Don't propose "train on today's data, serve tomorrow" without caveats.

Rubric — 10 points

+2 Rules + ML hybrid. Pure-ML is wrong. Mention rule engine short-circuit as a core design piece.
+2 Online + offline feature store. Same features for training + serving. Training-serving skew is a classic production bug; show you know about it.
+2 Label delay is fundamental. Acknowledge chargebacks take days-to-weeks. Don't propose "train on today's data, serve tomorrow" without caveats.
+2 Graceful degrade. Payment auth can't hang. Fall back to rules + conservative threshold on any dependency failure. Payments > precision.
+1 Graph features matter. Single-tx features miss rings. Shared-device-fingerprint clusters are a cheap and effective signal.
+1 Explainability isn't nice-to-have. Compliance requires it. SHAP or tree-derived feature contributions are the standard.

Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.

Red flags (things that tank the interview)

No back-of-envelope estimation — jumps straight into components without quantifying scale for Real-time Fraud Detection
Single point of failure — no replication, failover, or redundancy discussed
Ignores data model and storage choices — hand-waves the database layer