Post-mortem · Deploy / operator error

Knight Capital

A stale deployment on 1 of 8 trading servers caused Knight's algorithm to submit millions of unintended orders in 45 minutes. Loss: $440 million. The firm was effectively insolvent by end of day and was absorbed by a competitor within months.

Stale deploy · Trading · $440M loss · Firm-ending
01

TL;DR

Engineers deployed a new trading feature ("RLP", the Retail Liquidity Program) to Knight's 8 SMARS trading servers. One server was missed. The new RLP code reused a flag that had once activated PowerPeg, an old test routine from 2003 that had been dormant for 9 years. When the market opened, the 7 updated servers ran RLP; the 1 stale server still had the old code behind that flag and ran PowerPeg, which buys high and sells low as fast as possible, as in a test scenario. There was no kill switch. It took 45 minutes to recognize and stop the problem. Result: a $440M net loss on ~4M trades.
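The flag-reuse failure can be shown in miniature. The sketch below is purely illustrative: the flag name `FLAG_7`, the handlers, and the dispatch function are invented for this example and are not taken from Knight's SMARS code. It only shows how one flag value can select different code paths depending on which build a server is running.

```python
# Hypothetical illustration: every name here is invented, not Knight's code.

def rlp_handler(order):
    return f"route {order} to the Retail Liquidity Program"

def power_peg_handler(order):
    return f"run dormant PowerPeg logic on {order}"

# Each build binds the SAME flag to a different code path.
NEW_BUILD = {"FLAG_7": rlp_handler}
STALE_BUILD = {"FLAG_7": power_peg_handler}

def process(build, order, flags):
    """Dispatch an order to whatever handler this build binds to each set flag."""
    return [build[flag](order) for flag in sorted(flags) if flag in build]

# Seven servers got the new build; one was missed.
servers = [NEW_BUILD] * 7 + [STALE_BUILD]
results = [process(build, "order-1", {"FLAG_7"}) for build in servers]

for index, outcome in enumerate(results, start=1):
    print(index, outcome[0])
```

The incoming message is identical on all eight servers; only the binding behind the flag differs, which is why nothing looked wrong from the sender's side.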

02

Timeline

  • Late July 2012 — Engineers begin deploying new RLP (Retail Liquidity Program) feature.
  • July 31 evening — Deploy to 7 of 8 trading servers. 8th missed. No automated check catches it.
  • Aug 1, 09:30 ET — US market opens. RLP flag turns on via exchange message. On the 7 updated servers: RLP activates. On the stale 8th: the same flag triggers the old PowerPeg test code.
  • 09:30–10:15 ET — PowerPeg rapidly sends millions of marketable orders into ~150 NYSE securities. Each trade nets a small loss. Knight accumulates $440M of losses.
  • ~10:15 ET — Engineers identify the stale server and shut it down. Knight now holds massive unwanted positions, which it unwinds over the next days.
  • Aug 2 — Knight announces $440M pre-tax loss. Stock drops ~75% in 2 days.
  • Dec 2012 — Knight agrees to merge with Getco in a rescue deal; the merger completes in 2013.
03

Root cause

Multiple compounding failures:

  1. Dead code not deleted. The PowerPeg test function from 2003 was still in the codebase 9 years later. When its flag name got reused, the dead code reanimated.
  2. Manual deploy, no automated verification. The deploy was eight separate actions; missing one went unnoticed. No post-deploy "all 8 servers on same build" check.
  3. No kill switch for algos. When the runaway trading started, there was no "halt all trading" button. Engineers had to identify and kill the process manually.
  4. No position limits. The rogue algo could take arbitrarily large positions without hitting a firm-level risk limit.
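Failures 3 and 4 are the kind of gap a pre-trade risk gate is meant to close. The following is a minimal sketch under stated assumptions, not a description of any real trading system: the class names, the limit fields, and the one-second rate window are all hypothetical, and exposure is tracked as a simple gross sum rather than a netted position.

```python
from dataclasses import dataclass

@dataclass
class RiskLimits:
    # All thresholds are illustrative placeholders.
    max_notional: float        # gross dollar exposure allowed
    max_orders_per_sec: int    # orders allowed per one-second window
    max_drawdown: float        # dollars of loss tolerated (positive number)

class RiskGate:
    """Every order must pass allow(); one breach halts all further trading."""

    def __init__(self, limits):
        self.limits = limits
        self.halted = False
        self.reason = None
        self.notional = 0.0
        self.pnl = 0.0
        self._window_start = 0.0
        self._orders_in_window = 0

    def halt(self, reason):
        # The kill switch: callable by a limit breach or by a human operator.
        self.halted = True
        self.reason = reason

    def record_pnl(self, delta):
        self.pnl += delta

    def allow(self, order_notional, now):
        if self.halted:
            return False
        # Fixed one-second window for the order-rate limit.
        if now - self._window_start >= 1.0:
            self._window_start = now
            self._orders_in_window = 0
        self._orders_in_window += 1
        if self._orders_in_window > self.limits.max_orders_per_sec:
            self.halt("order rate")
            return False
        if self.notional + abs(order_notional) > self.limits.max_notional:
            self.halt("notional")
            return False
        if self.pnl < -self.limits.max_drawdown:
            self.halt("drawdown")
            return False
        self.notional += abs(order_notional)
        return True

gate = RiskGate(RiskLimits(max_notional=1000.0, max_orders_per_sec=100, max_drawdown=50.0))
first = gate.allow(600.0, now=0.0)     # within limits
second = gate.allow(600.0, now=0.1)    # would breach the notional limit
third = gate.allow(1.0, now=5.0)       # gate stays halted until reset by a human
```

The design choice worth noting is that the gate latches: once halted it rejects everything, so a runaway sender cannot resume simply because a window rolled over.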
04

Blast radius

$440M loss in 45 minutes. Knight Capital, a major HFT and market-making firm ($400M+ revenue/year), was left effectively insolvent the same day. It was forced to raise $400M in emergency capital on distressed terms and agreed to merge with Getco within months. 1,400 employees were affected. Market-wide impact was smaller (no system failure, just one participant's P&L), but regulatory scrutiny of HFT intensified. The SEC later fined Knight $12M for violating the Market Access Rule (Rule 15c3-5).

05

Lessons

  1. Delete dead code. Not "mark as deprecated" — delete. Dead code is a loaded gun with an unlabeled safety.
  2. Automate deploys; verify symmetry. If you have N servers, the deploy pipeline should verify that all N are running the same build before traffic returns. Don't trust humans with multi-server sequences.
  3. Kill switches for algorithmic trading. Now required by regulation for most US venues. Set firm-level limits on notional exposure, order rate, and P&L drawdown; if any limit is exceeded, the algo halts.
  4. Reuse of config flag names is dangerous. Namespace feature flags. Never reuse a flag name that once referenced a different code path.
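Lesson 2 can be sketched as a post-deploy gate. This is a minimal illustration, assuming each server can report a build identifier through some version query; the host names, build ids, and function names below are hypothetical.

```python
# Hypothetical sketch: host names and build ids are invented for illustration.

def find_mismatched_hosts(reported_builds, expected_build):
    """Return {host: build} for every host not running expected_build.

    reported_builds maps host name -> build id, with None for a host
    that did not answer the version query.
    """
    return {
        host: build
        for host, build in reported_builds.items()
        if build != expected_build
    }

def safe_to_enable_traffic(reported_builds, expected_build, expected_hosts):
    """True only if every expected host reported the expected build."""
    missing = set(expected_hosts) - set(reported_builds)
    mismatched = find_mismatched_hosts(reported_builds, expected_build)
    return not missing and not mismatched

hosts = [f"trade-{i}" for i in range(1, 9)]
fleet = {host: "rlp-build" for host in hosts[:7]}
fleet["trade-8"] = "old-build"   # the server the deploy missed

print(find_mismatched_hosts(fleet, "rlp-build"))          # {'trade-8': 'old-build'}
print(safe_to_enable_traffic(fleet, "rlp-build", hosts))  # False
```

Checking against an explicit list of expected hosts matters: a server that never reports at all should block the rollout just as a server on the wrong build does.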
06

Concepts in play