System Design — 24

Crypto Exchange

A cryptocurrency exchange like Coinbase or Binance: order-book trading across hundreds of coin pairs, wallets for hundreds of coins, deposits and withdrawals on public blockchains, 24/7 operation. The hard parts: a matching engine with stricter guarantees than a stock exchange's (never miss a trade, never let the same funds be spent twice); multi-chain wallet management with hot/cold segregation, because exchanges that keep everything in hot wallets get catastrophically hacked (Mt. Gox is the canonical example); and deposit detection from the blockchain, which means tracking 20+ chains' finality rules simultaneously without crediting transactions that can still be reorged away. Binance handles on the order of $50B in daily volume; peak trading is 100K+ trades/sec.

⚡ Core: Matching + Wallets + Blockchain Ingress · ~100K trades/sec · 24/7 operation · 200+ coins · Custodial wallets
02

Requirements

Functional
  • Order book trading — market + limit + stop orders for pairs (BTC/USD, ETH/BTC, ...)
  • Real-time market data: order book depth, trades, candles
  • Deposit any supported coin from any blockchain address
  • Withdraw to user-specified external address
  • Per-user wallet balances across all supported assets
  • KYC + AML compliance; account tiers unlock higher limits
  • Fiat on/off-ramp (USD, EUR) via bank / card rails
Non-Functional
  • Matching latency < 1 ms p99 per order (tight since pros trade here)
  • No lost orders, no phantom trades — strong consistency is sacred
  • 24/7/365 uptime; downtime limited to planned maintenance windows
  • Support 20+ blockchains with different finality rules
  • Never expose > 5% of assets in hot wallets (industry norm post-Mt. Gox)
  • Scale to 100K+ trades/sec peak across all pairs (a single pair tops out in the tens of thousands per second)
03

Scale Estimation

  • Trades / sec (peak): ~100K. Binance published 1.4M orders/sec peak matching engine capacity.
  • Order book depth: ~1M price levels per active pair; most compact near mid-price; tail is long.
  • Market data fanout: ~1M WS conns. Traders + bots subscribing to order book updates + trade feed.
  • Chains monitored: ~30. Bitcoin, Ethereum, Solana, etc.; each needs its own node + indexer.
  • Hot wallet %: < 5%. Rest in cold storage (multi-sig HSM / air-gapped); daily rebalance.
  • Finality wait: varies. BTC: ~6 confirmations (~60 min); ETH: ~12 blocks (~3 min); Solana: finalized slot (~13 sec).
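
The finality waits above are roughly confirmations multiplied by the average block interval. A quick check of that arithmetic, assuming average block intervals of about 600 s for Bitcoin and 12 s for Ethereum (the intervals are assumptions used for illustration, not figures from the estimate above):

```python
# Approximate finality wait = confirmations x average block interval.
# Block intervals are assumed averages; real block times vary.
BLOCK_INTERVAL_SEC = {"BTC": 600, "ETH": 12}
CONFIRMATIONS = {"BTC": 6, "ETH": 12}


def finality_wait_minutes(chain: str) -> float:
    """Return the expected wait in minutes before a deposit is treated as final."""
    return CONFIRMATIONS[chain] * BLOCK_INTERVAL_SEC[chain] / 60


if __name__ == "__main__":
    for chain in ("BTC", "ETH"):
        print(chain, finality_wait_minutes(chain), "min")
```

Under these assumptions Bitcoin comes out at 60 minutes and Ethereum at about 2.4 minutes, which the estimate rounds to ~3 min.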
04

API Design

POST /api/v1/orders

Place an order. Body: {pair: "BTC-USD", side: "buy", type: "limit", price, quantity, client_order_id}. Idempotency via client_order_id. Returns {order_id, status}.
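
One way to read "idempotency via client_order_id" is that a retried request returns the original acknowledgement instead of creating a second order. A minimal in-memory sketch of that idea follows; `OrderGateway`, `OrderAck`, and the `"accepted"` status are hypothetical names for illustration, and a real service would keep the key in durable storage.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderAck:
    order_id: int
    status: str


class OrderGateway:
    """In-memory stand-in for the order endpoint's idempotency layer (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._seen = {}  # (user_id, client_order_id) -> OrderAck

    def place_order(self, user_id, client_order_id, order):
        key = (user_id, client_order_id)
        if key in self._seen:
            # Retry of a request we already accepted: return the same ack,
            # do not create a second order.
            return self._seen[key]
        ack = OrderAck(order_id=next(self._ids), status="accepted")
        self._seen[key] = ack
        return ack
```

Scoping the key by user keeps one client's IDs from colliding with another's.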

DELETE /api/v1/orders/{id}

Cancel resting order. Returns final status. Idempotent (cancelling already-filled order returns current state, not error).

GET /api/v1/orderbook?pair=BTC-USD&depth=100

Snapshot of order book top-N levels per side. Served from cached view for speed. Most pros use WS for incremental updates instead.

WS wss://ws.example.com/stream

Subscribe to channels: {orderbook: BTC-USD, trades: BTC-USD, ticker: *}. Server streams delta updates. Target latency < 10 ms per update.

GET /api/v1/account/balances

Returns all asset balances: {BTC: {available, reserved}, USD: {…}, …}. Reserved = held for open orders.

POST /api/v1/deposits/address

Get a unique deposit address for a coin. Backend either picks from pre-generated address pool or derives new one from master HD wallet. Address tied to user_id.

POST /api/v1/withdrawals

Request withdrawal. Body: {coin, amount, address, 2fa_code}. Goes through risk check, withdrawal queue, signing. External tx submitted after approval. Returns pending withdrawal_id.

05

Architecture

Three critical subsystems, isolated for failure-domain + security: (1) the trading engine — matching, balance holds, trade settlement — strongly consistent, low-latency; (2) the blockchain interface — nodes watching chains for deposits + signing withdrawals — security-critical; (3) the custody layer — hot wallet (online, minimal balance) + cold wallet (air-gapped, majority of funds).

Figure: Trading + Blockchain + Custody (architecture diagram). Components shown:
  • Trader / Bot (REST + WS); User app (deposit/withdraw); API Gateway (auth + rate-limit)
  • Trading engine (low-latency): Risk svc (balance hold, pre-trade check); Matching engine (per-pair process, priority queue); Trade settler (DB commit); Market data (pub/sub fanout)
  • Data plane: Postgres (orders + trades); Ledger DB (double-entry); Redis (balances cache); Kafka (trade events)
  • Blockchain + Custody: Chain watchers (BTC / ETH / SOL / ...); Deposit detector (finality per chain); Withdrawal queue (risk + manual review); Signer svc (HSM-backed); Hot wallet (< 5% funds); Cold wallet (air-gapped / multisig); Treasury / rebalance automation (daily sweep hot → cold when hot > threshold; drip cold → hot for outflows)
  • Compliance + Ops: KYC / AML (tiered limits); Travel-rule (counterparty disclosure); Surveillance (wash / spoofing); Risk engine (circuit breakers); Proof-of-reserves (Merkle commitments)
Request Flow — Step Through
  1. Trader · place order
  2. Risk svc · balance hold
  3. Matching · per-pair process
  4. Trade settler · ledger commit
  5. Market data · WS fanout
  6. Chain watcher · deposit finality
  7. Signer + hot · withdrawal tx
06

Deep Dive — Matching + Deposit Detection + Custody

Matching engine. One process per trading pair, running a price-time priority order book in memory. Orders flow in via a deterministic sequencer (FIFO). Process:

  1. Pre-trade risk check holds funds: for buy 1 BTC @ $60k → reserve $60k in USD balance (plus fee). If insufficient, reject.
  2. Matching engine ingests the order; matches against opposite side at or better than limit. Emits fills.
  3. Trade settler consumes fills, writes to ledger (double-entry: buyer's BTC +, buyer's USD -, seller inverse), releases reserved balance delta.
  4. Market-data service fans out trade + order-book-delta to WS subscribers with minimal latency.
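
Step 2 above, price-time priority matching, can be sketched with two heaps. This is a simplified illustration, not a production engine: it handles limit orders only, has no cancels, and the `Order` and `OrderBook` names and the fill tuple layout are assumptions made for the example.

```python
import heapq
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Order:
    order_id: int
    side: str        # "buy" or "sell"
    price: Decimal   # limit price
    qty: Decimal     # remaining quantity


class OrderBook:
    """Single-threaded price-time priority book for one pair (illustrative)."""

    def __init__(self):
        self._seq = 0    # arrival counter: breaks ties at the same price
        self.bids = []   # heap of (-price, seq, order): highest price first
        self.asks = []   # heap of (price, seq, order): lowest price first

    def submit(self, order):
        """Match a limit order, rest any remainder, and return the fills.

        Each fill is (resting_order_id, incoming_order_id, price, qty).
        """
        fills = []
        if order.side == "buy":
            opposite, own, key = self.asks, self.bids, -order.price
            crosses = lambda best: best.price <= order.price
        else:
            opposite, own, key = self.bids, self.asks, order.price
            crosses = lambda best: best.price >= order.price

        while order.qty > 0 and opposite and crosses(opposite[0][2]):
            resting = opposite[0][2]
            qty = min(order.qty, resting.qty)
            fills.append((resting.order_id, order.order_id, resting.price, qty))
            order.qty -= qty
            resting.qty -= qty
            if resting.qty == 0:
                heapq.heappop(opposite)  # fully filled: leave the book

        if order.qty > 0:
            self._seq += 1
            heapq.heappush(own, (key, self._seq, order))
        return fills
```

Trades execute at the resting order's price, and the sequence number gives earlier orders priority at the same price level.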

Critical detail: the matching engine is single-threaded per pair. Determinism + recoverability beat parallelism. State is journaled; on crash, replay from journal. This matches standard securities exchange design; the only crypto-specific piece is fee calculation + decimal precision on tiny fractions.
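
The journal-then-apply discipline can be illustrated with a toy engine whose only state is the open quantity per order. The sketch assumes a Python list standing in for an append-only file and a made-up event shape (`type`, `order_id`, `qty`); the point is only that recovery is a deterministic replay of the same events through the same code path.

```python
import json


class JournaledEngine:
    """Toy engine: state is the open quantity per order id (illustrative)."""

    def __init__(self, journal):
        self.journal = journal   # a list stands in for an append-only log file
        self.open_qty = {}

    def _apply(self, event):
        # Pure state transition: same events in the same order give the same state.
        if event["type"] == "new":
            self.open_qty[event["order_id"]] = event["qty"]
        elif event["type"] == "cancel":
            self.open_qty.pop(event["order_id"], None)

    def handle(self, event):
        self.journal.append(json.dumps(event))  # 1. journal first
        self._apply(event)                      # 2. then mutate in-memory state

    @classmethod
    def recover(cls, journal):
        """Rebuild state after a crash by replaying the journal from the start."""
        engine = cls(journal)
        for line in journal:
            engine._apply(json.loads(line))
        return engine
```

A real engine would also snapshot periodically so that replay starts from the last snapshot rather than from the beginning of the log.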

Deposit detection. Chain watchers run full nodes for each supported blockchain. For each block:

  1. Scan transactions for outputs paying to our known deposit addresses.
  2. Compare against user → address map; identify which user each incoming tx belongs to.
  3. Wait for chain-specific finality before crediting user's balance — Bitcoin needs ~6 confirmations (~60 min), Ethereum ~12 (~3 min), Solana waits for finalized slot (~13 s), Polygon uses separate checkpoint rules.
  4. On finality, credit balance via ledger write: (user_balance += amount).
  5. Notify user; surface in UI + WebSocket feed.

Missing finality rules is how exchanges get rekt. Crediting before finality → attacker double-spends → exchange eats the loss.
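
The crediting rule can be sketched as follows for chains that use a confirmation count. The class and method names are hypothetical, only the BTC and ETH thresholds quoted above are included, and chains with other finality signals (such as Solana's finalized slots) would need a different check.

```python
from decimal import Decimal

# Confirmation thresholds quoted in the text; other chains need their own rule.
REQUIRED_CONFIRMATIONS = {"BTC": 6, "ETH": 12}


class DepositDetector:
    """Credits a deposit only once it has enough confirmations (illustrative)."""

    def __init__(self, address_to_user):
        self.address_to_user = address_to_user
        self.pending = {}     # tx_id -> (chain, user_id, amount, block_height)
        self.balances = {}    # (user_id, chain) -> Decimal
        self.credited = set() # tx_ids already credited, to avoid double credit

    def on_transaction(self, chain, tx_id, to_address, amount, block_height):
        user_id = self.address_to_user.get(to_address)
        if user_id is not None and tx_id not in self.credited:
            self.pending[tx_id] = (chain, user_id, amount, block_height)

    def on_new_tip(self, chain, tip_height):
        for tx_id, (c, user_id, amount, height) in list(self.pending.items()):
            # The block containing the tx counts as the first confirmation.
            confirmations = tip_height - height + 1
            if c == chain and confirmations >= REQUIRED_CONFIRMATIONS[chain]:
                key = (user_id, chain)
                self.balances[key] = self.balances.get(key, Decimal(0)) + amount
                self.credited.add(tx_id)
                del self.pending[tx_id]
```

In this sketch the balance lives in a dict; in the design above the credit would be a ledger write.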

Withdrawal Flow (security-critical), Mermaid source:
sequenceDiagram
    participant U as User
    participant R as Risk svc
    participant Q as Withdrawal queue
    participant OP as Ops review
    participant SG as Signer (HSM)
    participant H as Hot wallet
    participant C as Chain
    U->>R: POST /withdrawals (addr, amt, 2fa)
    R->>R: MFA check + risk score
    alt low risk + within hot cap
        R->>Q: enqueue pending
        Q->>SG: sign tx from hot wallet
        SG->>H: built + signed tx
        H->>C: broadcast to blockchain
        C-->>H: tx confirmed (N blocks)
        H-->>U: withdrawal success
    else high risk or large amt
        R->>OP: flag for manual approval
        OP-->>R: approve
        R->>Q: enqueue
    else triggers cold-wallet draw
        R->>Q: enqueue, ops initiates cold-to-hot transfer
        Note over Q: may be hours
    end
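
The three branches of the withdrawal flow reduce to a small routing decision. A sketch under stated assumptions: the function name, the route labels, and both thresholds (`review_amount`, `review_score`) are hypothetical placeholders, not values from this design.

```python
from decimal import Decimal


def route_withdrawal(amount, risk_score, hot_balance,
                     review_amount=Decimal("10"), review_score=0.7):
    """Pick a path for a withdrawal request (thresholds are placeholders)."""
    if risk_score >= review_score or amount >= review_amount:
        return "manual_review"          # high risk or large amount
    if amount > hot_balance:
        return "cold_to_hot_transfer"   # hot wallet cannot cover it
    return "auto_sign"                  # low risk and within the hot cap
```

Checking risk before hot-wallet capacity means a suspicious request never triggers a cold-wallet draw on its own.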

Custody split: hot vs cold. Hot wallet holds online keys controlled by the signer service (HSM-backed) — fast to withdraw from, but if compromised, funds drain immediately. Cold wallet keeps the majority offline — multi-sig + hardware + air-gapped signing — practically impossible to drain remotely. Post-Mt. Gox norm: ≤ 5% hot, ≥ 95% cold. Daily treasury automation rebalances as outflows drain the hot side.
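
The treasury rebalance rule can be sketched as a pure function. The 5% ceiling comes from the text; the 3% target and 1% floor are assumptions chosen for illustration, as are the function name and action labels.

```python
from decimal import Decimal


def rebalance(hot, cold,
              target_hot=Decimal("0.03"),  # assumed target share
              max_hot=Decimal("0.05"),     # ceiling from the <= 5% hot norm
              min_hot=Decimal("0.01")):    # assumed floor
    """Return (action, amount) needed to bring the hot share back to target."""
    total = hot + cold
    if total == 0:
        return ("none", Decimal(0))
    share = hot / total
    target = total * target_hot
    if share > max_hot:
        return ("sweep_hot_to_cold", hot - target)
    if share < min_hot:
        return ("drip_cold_to_hot", target - hot)
    return ("none", Decimal(0))
```

For example, with 80 units hot and 920 cold, the hot share is 8%, so the function asks for a sweep of 50 units to return to the 3% target.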

Proof of reserves. After FTX collapsed (2022), customers and regulators expect exchanges to cryptographically prove they hold users' assets. Standard approach: Merkle tree of customer balances + on-chain proof of wallet holdings. Anyone can verify their balance is included without exposing others'. Not a technical design choice — a trust / political one.
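
The inclusion-proof half of this can be sketched with a plain SHA-256 Merkle tree. The leaf encoding (`user_id:balance`) and the rule of duplicating the last node on odd levels are assumptions made for the example; published schemes differ in encoding, and many also commit to the sum of balances, which this sketch does not do.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(user_id: str, balance: str) -> bytes:
    """Hash one customer's record (encoding is an assumption of this sketch)."""
    return _h(f"{user_id}:{balance}".encode())


def build_levels(leaves):
    """Return all tree levels, leaves first, root last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]  # duplicate last node on odd-sized levels
        levels.append([_h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def prove(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf_hash, proof, root):
    """Recompute the root from one leaf and its proof."""
    node = leaf_hash
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root
```

A customer needs only their own leaf, the proof path, and the published root; the other leaves appear only as hashes.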

Interview answer

"Per-pair single-threaded matching engine with journaled state. Pre-trade risk holds balances; trade settler writes double-entry ledger. Chain watchers run full nodes for each supported blockchain; deposit crediting waits for chain-specific finality. Custody is hot/cold with ≤ 5% hot; withdrawals flow through risk scoring → signing via HSM → broadcast. Large or suspicious withdrawals require ops approval. Proof-of-reserves via Merkle tree of balances + on-chain wallet proofs."

07

Tradeoffs & Design Choices

  • Custodial vs non-custodial. Custodial exchanges (Coinbase, Binance) hold user funds and are responsible for securing them. Non-custodial venues (Uniswap and other DEXes) hold nothing; users sign from their own wallets. Custodial gives much easier UX; non-custodial is more straightforward on regulation and transparency. This design is custodial.
  • Single-process per pair vs sharded. Single process keeps price-time priority trivially correct. Sharding would require distributed matching, hard to get exactly right. Binance / Coinbase both use per-pair processes. The practical scale ceiling is high (tens of thousands/sec per pair) so single-process isn't the bottleneck.
  • Hot wallet size vs withdrawal speed. Larger hot wallet → faster withdrawals, bigger blast radius if compromised. Smaller hot wallet → safer, but large withdrawals delayed by cold-to-hot transfers. Most exchanges publish an approximate % to manage user expectations.
  • Finality vs UX. Waiting for 6 BTC confirmations takes ~60 min, which is terrible UX. Some exchanges show the deposit as "pending" immediately (the user can see it and use it within the exchange's own products) but block external withdrawal until finality. This balances UX against double-spend risk.
  • ACID database vs in-memory matching. Matching is in-memory for speed. Settlement lands in an ACID database (Postgres + ledger tables) for durability. Don't conflate — matching isn't a DB operation.
08

Failure Modes

🔑
Hot wallet key compromise
Attacker obtains the signer service's keys and drains the hot wallet. Breaches of this kind are among the most damaging in exchange history (e.g., Bitfinex 2016, ~$70M).
→ Mitigation: (a) HSM-backed keys (key material never in software); (b) multi-sig requiring M-of-N approvals for signing; (c) per-transaction rate limits; (d) cold-wallet majority so even a full hot-wallet drain is survivable; (e) an insurance fund (e.g., Binance's SAFU) to cover losses.
⛓️
Chain reorganization invalidates deposit
Exchange credited user's deposit after N confs; a reorg orphans that block; deposit "un-happens." User already traded it.
→ Mitigation: require enough confirmations per chain that the probability of a deeper reorg is negligible in practice; if a reorg deeper than expected does happen, pause deposits for that chain and reconcile manually. Monitor reorg depth continuously.
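
Continuous reorg monitoring can be sketched by checking each new block's parent hash against the tracked tip. The `ChainTracker` name and the `(height, hash)` representation are assumptions for illustration, and the sketch assumes the new block's parent is still inside the tracked window.

```python
class ChainTracker:
    """Tracks the canonical chain as seen by one node and reports reorg depth."""

    def __init__(self):
        self.chain = []  # list of (height, block_hash), oldest first

    def on_block(self, height, block_hash, parent_hash):
        """Append a block; return how many tracked blocks it orphaned.

        0 means the block simply extended the current tip.
        """
        depth = 0
        while self.chain and self.chain[-1][1] != parent_hash:
            self.chain.pop()  # this block is no longer on the canonical chain
            depth += 1
        self.chain.append((height, block_hash))
        return depth
```

A watcher could compare the returned depth with the chain's confirmation threshold and pause deposits when the depth approaches or exceeds it.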
🌀
Flash crash on an illiquid pair
Market order sweeps through thin order book; executes across 1000 price levels; final trade at 10% below mid.
→ Mitigation: per-order price protection (reject market orders that cross X% of mid); circuit breakers pause a pair if trades move > Y% in Z seconds; post-trade review for obviously-erroneous trades (rare but done).
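
Both guards are simple to state in code. In the sketch below the X, Y, and Z of the mitigation become `max_deviation`, `max_move`, and `window_sec`; the default values are placeholders, and the reference price for the breaker is taken as the oldest trade still inside the window, which is one of several reasonable choices.

```python
from collections import deque
from decimal import Decimal


def within_price_band(exec_price, mid_price, max_deviation=Decimal("0.05")):
    """Price protection: is the execution price within X% of mid?"""
    return abs(exec_price - mid_price) / mid_price <= max_deviation


class CircuitBreaker:
    """Halts a pair if price moves more than Y% within Z seconds (illustrative)."""

    def __init__(self, max_move=Decimal("0.10"), window_sec=60):
        self.max_move = max_move
        self.window_sec = window_sec
        self.trades = deque()  # (timestamp, price), oldest first
        self.halted = False

    def on_trade(self, ts, price):
        self.trades.append((ts, price))
        # Drop trades that have fallen out of the time window.
        while self.trades[0][0] < ts - self.window_sec:
            self.trades.popleft()
        reference = self.trades[0][1]
        if abs(price - reference) / reference > self.max_move:
            self.halted = True
        return self.halted
```

The band check rejects an order before it trades; the breaker reacts after trades have printed, so the two cover different moments.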
🏃
Matching engine process crashes mid-fill
Process dies between matching + settlement. Order book + ledger could diverge.
→ Mitigation: all matching-engine mutations journaled (append-only log) before applying. On restart, replay journal deterministically from last-settled point. Settlement is idempotent by trade_id.
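
Idempotent settlement by trade_id can be sketched with a uniqueness constraint on the ledger. The example uses SQLite from the standard library as a stand-in for the Postgres ledger; the table layout and function names are hypothetical, amounts are integers in the smallest unit of each asset, and it assumes buyer and seller are different accounts.

```python
import sqlite3


def open_ledger():
    db = sqlite3.connect(":memory:")  # stand-in for the real ledger database
    db.execute(
        "CREATE TABLE entries ("
        " trade_id TEXT, account TEXT, asset TEXT, amount INTEGER,"
        " PRIMARY KEY (trade_id, account, asset))"
    )
    return db


def settle(db, trade_id, buyer, seller, base, quote, qty, notional):
    """Write the four double-entry legs of a trade; replays are no-ops.

    Assumes buyer != seller (self-trades are prevented upstream).
    Returns True if the trade was written, False if it was already settled.
    """
    rows = [
        (trade_id, buyer, base, qty),
        (trade_id, buyer, quote, -notional),
        (trade_id, seller, base, -qty),
        (trade_id, seller, quote, notional),
    ]
    try:
        with db:  # one transaction: all four legs commit, or none do
            db.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False  # primary key hit: this trade_id was settled before
```

Because each asset's legs sum to zero, a replayed journal can call `settle` again for every fill without changing any balance.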
💰
Double-spend via concurrent orders
A user with $1k submits two $1k market-buy orders at the same time. Both pass the pre-check because neither hold has been applied yet.
→ Mitigation: make the pre-trade balance hold atomic. Reserve funds on the account row in a single step (a DB transaction with a conditional update, or a Redis Lua script that checks and decrements together; a bare INCR does no check). The second order is rejected at the risk check because the funds are already reserved, so there is no race between check and matching.
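
The essence of the fix is that check and reserve happen as one indivisible step. A minimal in-process sketch with a lock follows; the `Account` class is hypothetical, and in the real system the same guarantee would come from the database or cache rather than a Python lock.

```python
import threading
from decimal import Decimal


class Account:
    """Balance with an atomic check-and-reserve operation (illustrative)."""

    def __init__(self, available):
        self.available = Decimal(available)
        self.reserved = Decimal(0)
        self._lock = threading.Lock()

    def try_hold(self, amount):
        """Reserve funds for an order; return False if they are not available."""
        with self._lock:  # check and update cannot interleave with another hold
            if amount > self.available:
                return False
            self.available -= amount
            self.reserved += amount
            return True


if __name__ == "__main__":
    acct = Account("1000")
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(acct.try_hold(Decimal("1000"))))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(results))  # exactly one of the two holds succeeds
```

Whichever order arrives second sees zero available funds and is rejected before it reaches the matching engine.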
🕵️
Sanctioned address sends deposit
Compliance-sanctioned address (OFAC list) deposits to a user. Exchange credits → regulatory nightmare.
→ Mitigation: every incoming tx runs through address-screening (Chainalysis / TRM); funds from sanctioned sources auto-quarantine + compliance review before crediting. Proactive — not after-the-fact.
09

Interview Tips

  1. Frame custody as the hard part. Matching is "stock exchange with different fee math." Custody + chain ingress are the crypto-specific complexity.
  2. Name finality per chain. BTC 6 confs, ETH 12 blocks, Solana finalized slots. Shows domain fluency.
  3. Hot / cold split. Mention the ≤ 5% hot heuristic. Interviewers know this; omitting it suggests you've never operated one.
  4. Per-pair single-threaded matching. Don't propose "distributed matching engine for throughput." It's wrong + candidates who say it fail hard here.
  5. Journaling + replay. Every mutation goes to a log before state update. Crash recovery is log replay. Universal pattern.
  6. Proof of reserves is current. Post-FTX reality; interviewers expect you to know it. Merkle tree of balances + on-chain wallet proofs.
11

Evolution

1

MVP — single DB matching + single hot wallet

Python matching against Postgres row updates. Single hot wallet with backup. Works up to ~1k trades/day. Comparable to early Mt. Gox or Coinbase circa 2012.

2

In-memory matching + hot/cold split

Per-pair in-memory order book with journal. Majority of funds in cold storage with manual signing. ~10k trades/sec/pair.

3

Multi-chain support + automated custody

30+ chains; chain watchers for deposits. HSM-backed signer service. Automated daily cold-to-hot rebalance. Global exchange scale.

4

Regulatory + compliance layer

KYC/AML integration, sanction screening, travel-rule compliance. Surveillance system for market manipulation. Geo-restricted features per jurisdiction.

5

Proof-of-reserves + staking + derivatives

Published Merkle-tree proofs of reserves (post-FTX). Staking services for held coins. Futures + options product lines with own engines + risk systems.
