Problem Statement
Design a payment gateway that sits between a customer clicking "Pay Now" and money actually moving between bank accounts. This isn't just "process a credit card": it's a multi-party financial relay across six participants (customer, merchant, gateway, processor, card network, issuing bank) where every handoff is a network call that can fail, time out, or return an ambiguous result.
The system must process millions of financial transactions per day while guaranteeing exactly-once semantics at the business level. Most systems tolerate inconsistency: a tweet appearing late is fine, and a slightly stale leaderboard is acceptable. Payments are different, because every failure mode has a direct financial consequence.
Core question: How do you guarantee that money is never lost, duplicated, or stuck in limbo — even when networks fail, services crash, and third-party providers go down?
What Makes This Unique
Idempotency Is Existential
A network timeout after calling Visa means you don't know if the charge went through. Without bulletproof idempotency, retries cause double charges — not a bug, a lawsuit.
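The retry problem above can be sketched with an idempotency key. This is a minimal illustration under stated assumptions, not a production design: `FakeProcessor`, `IdempotentGateway`, and `pay` are hypothetical names, and the in-memory dict stands in for what would need to be a durable store with a uniqueness guarantee.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FakeProcessor:
    """Hypothetical stand-in for the downstream processor; records every real charge."""
    charges: list = field(default_factory=list)

    def charge(self, amount_cents: int) -> str:
        charge_id = f"ch_{uuid.uuid4().hex[:8]}"
        self.charges.append((charge_id, amount_cents))
        return charge_id


class IdempotentGateway:
    """Returns the stored result when the same idempotency key is seen again."""

    def __init__(self, processor: FakeProcessor):
        self.processor = processor
        # Assumption: a dict replaces a durable table keyed by idempotency key.
        self._results = {}  # key -> (amount_cents, charge_id)

    def pay(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self._results:
            stored_amount, charge_id = self._results[idempotency_key]
            # Reusing a key with a different request is a client error, not a retry.
            if stored_amount != amount_cents:
                raise ValueError("idempotency key reused with a different request")
            return charge_id  # retry: no second charge is made
        charge_id = self.processor.charge(amount_cents)
        self._results[idempotency_key] = (amount_cents, charge_id)
        return charge_id
```

Calling `pay("order-1", 500)` twice returns the same charge ID and leaves one entry in `processor.charges`. The sketch covers only retries from the client to the gateway. Surviving a timeout between the gateway and the card network would also require recording the key before the outbound call and passing it downstream, which is omitted here.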
State Machine Is the Product
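A payment isn't a single action but a lifecycle: INITIATED → AUTHORIZED → CAPTURED → SETTLED (or FAILED, REFUNDED, DISPUTED). Every transition must be auditable, and any reversal must itself be an explicit, recorded transition (for example, a refund) rather than an edit to history.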
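One way to enforce such a lifecycle is a transition table plus an append-only history. The sketch below is illustrative only: the state names come from the text, but the `ALLOWED` edges are an assumption (the text does not say which transitions are legal), and `Payment` and `IllegalTransition` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    INITIATED = "INITIATED"
    AUTHORIZED = "AUTHORIZED"
    CAPTURED = "CAPTURED"
    SETTLED = "SETTLED"
    FAILED = "FAILED"
    REFUNDED = "REFUNDED"
    DISPUTED = "DISPUTED"


# Assumed edges, chosen for illustration; a real gateway would define its own.
ALLOWED = {
    State.INITIATED: {State.AUTHORIZED, State.FAILED},
    State.AUTHORIZED: {State.CAPTURED, State.FAILED},
    State.CAPTURED: {State.SETTLED, State.REFUNDED, State.FAILED},
    State.SETTLED: {State.REFUNDED, State.DISPUTED},
    State.FAILED: set(),
    State.REFUNDED: set(),
    State.DISPUTED: set(),
}


class IllegalTransition(Exception):
    """Raised when a requested transition is not in the ALLOWED table."""


class Payment:
    def __init__(self, payment_id: str):
        self.payment_id = payment_id
        self.state = State.INITIATED
        # Append-only audit trail of (from_state, to_state) pairs.
        self.history = [(None, State.INITIATED)]

    def transition(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise IllegalTransition(f"{self.state.name} -> {new_state.name}")
        self.history.append((self.state, new_state))
        self.state = new_state
```

Because illegal moves raise instead of silently overwriting the state, a bug such as capturing a payment that was never authorized surfaces immediately, and `history` preserves the full path a payment took.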
Distributed Saga Across Organizations
You can't wrap Visa, your DB, and Kafka in a single transaction. You need sagas, compensation, and reconciliation across systems you don't control.
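The saga idea can be sketched as a list of steps, each paired with a compensating action that runs if a later step fails. This is a simplified, single-process illustration: `run_saga` and the step names are hypothetical, and a real saga would persist its progress so that it can resume after a crash.

```python
def run_saga(steps):
    """Run (name, action, compensate) steps in order.

    On the first failure, run the compensations of the already completed
    steps in reverse order. Returns (succeeded, log).
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo completed work, most recent step first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log
```

For example, if the steps are "write ledger entry", "authorize with network", and "publish event", and publishing fails, the authorization is voided first and the ledger entry is reversed second. The sketch assumes compensations always succeed. In practice they can fail too, which is why reconciliation against the external systems is still needed.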
Compliance Shapes Architecture
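PCI DSS applies to every system that stores, processes, or transmits card data, so card data is confined to a segmented network to keep that scope small. The tokenization vault runs in an isolated environment. Compliance constraints drive technical decisions.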