Concept · Observability & Security

Tokenization & PCI Compliance

01

Why this matters

Your e-commerce app needs to store credit card numbers so customers don't re-enter them every time. The moment you do, your entire stack (every server that touches that data, every database that stores it, every backup it lands in) falls under PCI DSS (Payment Card Industry Data Security Standard). That means annual audits, penetration tests, encryption requirements, and network segmentation, at a cost that can reach hundreds of thousands of dollars per year.

Tokenization is the architectural pattern that lets you keep the convenience without most of the compliance burden. Replace card numbers with meaningless tokens; the real numbers live in a vault that you don't operate. Stripe, Braintree, and Adyen all run that vault for you. Your stack handles only tokens, and your PCI scope shrinks to a small fraction of what it would otherwise be.

02

The mechanics

  1. Customer enters card in a form.
  2. Form posts directly to payment processor (Stripe Elements, Braintree Hosted Fields). Card number never touches your server. Browser → Stripe.
  3. Processor returns a token like tok_1OABC123 or a PaymentMethod ID. This is a meaningless string that maps to the card in the processor's vault.
  4. Your server stores the token. Tokens are not card data; tokens are PCI-out-of-scope by design.
  5. To charge: POST to the processor with the token. Processor looks up real card; charges; returns success/fail. Your server still doesn't see the number.
  6. Customer comes back next month: charge the stored token. Same flow.

Net effect: your servers never store, log, or process card numbers. PCI scope reduces from "the entire infrastructure" to "the form field that posts to Stripe."
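
The six steps above can be sketched with an in-memory stand-in for the processor. Everything here is hypothetical: `FakeProcessor`, `MerchantServer`, and their methods are illustrative names, not Stripe's or Braintree's API, and in a real integration the tokenize call happens in the browser through the processor's hosted fields.

```python
import secrets


class FakeProcessor:
    """Hypothetical stand-in for the processor's vault (not a real API)."""

    def __init__(self):
        self._vault = {}  # token -> card number, lives only on the processor side

    def tokenize(self, card_number: str) -> str:
        # Steps 2-3: the browser sends the card here and gets an opaque token back.
        token = "tok_" + secrets.token_hex(8)
        self._vault[token] = card_number
        return token

    def charge(self, token: str, amount_cents: int) -> dict:
        # Step 5: the processor resolves the token internally and charges the card.
        card = self._vault.get(token)
        if card is None:
            return {"status": "failed", "reason": "unknown token"}
        return {"status": "succeeded", "amount": amount_cents, "last4": card[-4:]}


class MerchantServer:
    """The merchant's side: stores and uses tokens only."""

    def __init__(self, processor: FakeProcessor):
        self.processor = processor
        self.saved_tokens = {}  # customer_id -> token (step 4)

    def save_payment_method(self, customer_id: str, token: str) -> None:
        self.saved_tokens[customer_id] = token

    def charge_customer(self, customer_id: str, amount_cents: int) -> dict:
        # Steps 5-6: every charge, first or recurring, sends only the token.
        return self.processor.charge(self.saved_tokens[customer_id], amount_cents)


processor = FakeProcessor()
server = MerchantServer(processor)

# Browser -> processor; the merchant server is not involved in this call.
token = processor.tokenize("4242424242424242")
server.save_payment_method("cust_1", token)

first = server.charge_customer("cust_1", 1999)
renewal = server.charge_customer("cust_1", 1999)
```

The only thing `MerchantServer` ever holds is the token string, which is the property that keeps its database and logs out of card-data scope.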

03

PCI scope reduction levels

| Approach | PCI scope | Effort |
|---|---|---|
| Store card data yourself | Full PCI DSS (SAQ D or an on-site assessment); your entire infrastructure | $$$$: annual audits, penetration tests, segmentation |
| Store hashed cards (own derivation) | Still in scope; hashes are considered card data | Common myth; doesn't help |
| Tokenization via processor (server-side proxy) | SAQ D; substantial | $$: server still touches card data briefly |
| Tokenization via processor (Stripe Elements / iframe) | SAQ A; minimal | $: annual self-attestation, no full audit |

SAQ A vs SAQ D is the difference between "I sign a form once a year" and "I spend $200K and 6 months on compliance." The choice of integration pattern matters enormously.
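
Why the hashed-card approach stays in scope can be seen with a short experiment. The sketch below assumes an attacker holds an unsalted SHA-256 hash plus the BIN and last four digits (values that are often kept in plaintext for receipts). Only six digits are unknown, so at most a million guesses recover the number. The card number is a made-up test value and `recover_pan` is a hypothetical helper.

```python
import hashlib


def sha256_hex(pan: str) -> str:
    return hashlib.sha256(pan.encode("ascii")).hexdigest()


def recover_pan(target_hash: str, bin6: str, last4: str):
    """Try every 6-digit middle section; return the matching PAN or None."""
    for middle in range(1_000_000):
        candidate = f"{bin6}{middle:06d}{last4}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# What a merchant might have stored, believing the hash protects the card.
stored_hash = sha256_hex("4111111123456785")

recovered = recover_pan(stored_hash, bin6="411111", last4="6785")
print(recovered)
```

Salting or keying the hash raises the attacker's cost, but the stored value is still derived from the card number, so it does not buy the scope reduction that a processor-held token does.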

04

Deep dive — vault tokenization vs format-preserving

Two flavors of tokens you'll encounter:

Vault tokens: the token is an opaque ID (tok_1OABC123). The mapping to the real card lives in the vault, so using the token means calling the vault. Tokens are useless to attackers without vault access.

Format-preserving tokens (typically produced with format-preserving encryption, FPE): the token looks like a card number (4111-1111-2345-6789). It has the same length, can be made to pass the Luhn checksum, and can even preserve the BIN (bank identification number). This flavor is used when downstream legacy systems require something card-shaped (mainframe billing, third-party integrations). The tokens are still meaningless without the FPE key.
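
Whether a card-shaped token survives downstream validation usually comes down to the Luhn checksum. A minimal sketch of that check follows; `luhn_valid` is an illustrative helper, and the numbers are test-style values, not real cards.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 0 and total % 10 == 0


print(luhn_valid("4111-1111-1111-1111"))
print(luhn_valid("4111-1111-2345-6789"))
print(luhn_valid("4111-1111-2345-6785"))
```

The card-shaped example 4111-1111-2345-6789 fails this check as written; a token generator that needs to satisfy Luhn-validating legacy systems would have to pick the final digit accordingly (5 in this case).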

Most modern stacks use vault tokens. FPE shows up in older payment infrastructure where ripping out card-shape assumptions is impossible.

Detokenization: when you genuinely need the real card number (settlement, dispute resolution, regulatory request), the vault returns it under audit log + access control. Should be rare; most flows never detokenize.
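
A rough sketch of what "under audit log + access control" can look like, assuming a simple role allowlist. `AuditedVault`, the role names, and the log fields are all hypothetical; a production vault would also authenticate the caller and write the log to tamper-resistant storage.

```python
import secrets
from datetime import datetime, timezone


class AuditedVault:
    """Hypothetical vault: detokenization is role-gated and always logged."""

    ALLOWED_ROLES = {"settlement", "disputes", "compliance"}

    def __init__(self):
        self._cards = {}     # token -> card number
        self.audit_log = []  # one entry per detokenization attempt

    def tokenize(self, card_number: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._cards[token] = card_number
        return token

    def detokenize(self, token: str, caller: str, role: str, reason: str) -> str:
        allowed = role in self.ALLOWED_ROLES
        # Log before returning so denied attempts are recorded too.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "caller": caller,
            "role": role,
            "reason": reason,
            "token": token,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"role {role!r} may not detokenize")
        return self._cards[token]


vault = AuditedVault()
token = vault.tokenize("4242424242424242")

card = vault.detokenize(token, caller="alice", role="disputes", reason="chargeback review")

try:
    vault.detokenize(token, caller="bob", role="marketing", reason="export")
    denied = False
except PermissionError:
    denied = True
```

Logging the attempt before the permission check fails means the audit trail shows who tried and was refused, not only who succeeded.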

Interview answer

"For payments we use Stripe Elements — card numbers post directly from browser to Stripe; we receive only tokens. Our infrastructure is PCI SAQ A — minimal scope, annual self-attestation. We never see, store, or log raw card data. For recurring charges, we store the token and charge it via Stripe's API."

05

Beyond payments — generalizing the pattern

Tokenization works for any sensitive data:

  • SSNs / national IDs. Vault stores; token in your DB.
  • Bank account numbers. Plaid issues tokens.
  • Health records. HIPAA tokenization vaults exist.
  • PII for analytics. Pseudonymization tokens let you analyze user behavior without storing names/emails in the data warehouse.
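
For the analytics case, one common way to build pseudonymization tokens is a keyed hash (HMAC) of the identifier. The sketch below is an assumption about how such a pipeline might look, not a description of any particular product; `pseudonymize`, the sample events, and the key are illustrative.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Derive a stable, keyed pseudonym for `value` (e.g. an email address)."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()[:16]


# Assumption: in practice the key lives in a secrets manager, outside the warehouse.
key = b"example-key-not-for-production"

events = [
    {"email": "ana@example.com", "action": "view"},
    {"email": "Ana@Example.com ", "action": "purchase"},
    {"email": "li@example.com", "action": "view"},
]

# The warehouse rows carry only the pseudonym, never the email.
rows = [{"user": pseudonymize(e["email"], key), "action": e["action"]} for e in events]
```

Because the pseudonym is deterministic, the two events for the same person still join on `user`, while anyone without the key cannot recompute the mapping from a list of known emails.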

The principle: centralize sensitive data in one heavily-protected vault; decentralize everything else with tokens that are useless on their own. Reduces blast radius, simplifies compliance, makes data deletion (see GDPR) tractable — delete the vault row and every token everywhere is now meaningless.
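
The deletion property can be illustrated with a toy vault. `PiiVault` and the downstream lists are hypothetical stand-ins; the sample value is a placeholder, not a real identifier.

```python
import secrets


class PiiVault:
    """Hypothetical single source of truth for one sensitive field."""

    def __init__(self):
        self._rows = {}  # token -> sensitive value

    def tokenize(self, value: str) -> str:
        token = "pii_" + secrets.token_hex(8)
        self._rows[token] = value
        return token

    def resolve(self, token: str):
        # Returns None once the row has been deleted.
        return self._rows.get(token)

    def forget(self, token: str) -> None:
        self._rows.pop(token, None)


vault = PiiVault()
token = vault.tokenize("123-45-6789")

# The same token is copied into several downstream systems.
orders_db = [{"order": 1, "ssn_token": token}]
warehouse = [{"event": "signup", "ssn_token": token}]
backup = list(orders_db)

vault.forget(token)  # one delete, in one place
```

After `forget`, the token strings still sit in `orders_db`, `warehouse`, and `backup`, but `vault.resolve(token)` returns `None`, so none of those copies can be turned back into the original value.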

06

Real-world

Stripe

The default tokenization layer

Powers tokenization for hundreds of thousands of merchants. Stripe Elements iframe captures card; merchant's server only sees tokens.

Braintree / Adyen

Alternatives

Same model. Hosted fields or SDK. Multi-acquirer support for global merchants.

Apple Pay / Google Pay

Network-level tokenization

The card networks (Visa, Mastercard) issue device-specific tokens (DPANs, device primary account numbers) that stand in for the real card number. Even the merchant doesn't see the underlying PAN.

Plaid (banking)

Same model for ACH

Bank account numbers tokenized via Plaid; merchant sees a Plaid token, never a routing/account number.

07

Used in problems

A payment gateway is fundamentally a tokenization service. An e-commerce checkout integrates a tokenization layer (Stripe / Braintree) to stay at PCI SAQ A. Bidding platforms and crypto exchanges use similar patterns for sensitive financial identifiers.
