Concept · Observability & Security

GDPR — Right to Be Forgotten

01

Why this matters

EU GDPR Article 17: a user can request that their personal data be deleted. You have about 30 days: Article 12(3) requires action without undue delay and within one month, extendable by two further months for complex requests. "Delete a row" sounds simple. The problem: that user's data is in the live DB, in 7 read replicas, in 30 days of database backups, in Kafka topic retention, in the data warehouse, in Elasticsearch search indices, in S3 logs, in cache layers, in third-party services (Stripe, Mailchimp, Datadog), and in offline BigQuery snapshots used for ML training.

Architectures that can delete this completely look very different from those that just UPDATE users SET deleted = true. GDPR (and CCPA, LGPD, and similar laws spreading globally) forces an architectural reckoning that most teams discover only after the first regulator letter.

02

Soft delete is not enough

The naive approach: UPDATE users SET deleted = true. Hide from queries via WHERE NOT deleted. Done?

No. The data is still there. Backups still have it. Read replicas still have it. The user's name in the orders table still exists. Your search index returns hits. The data warehouse still trains ML models on it. None of this satisfies GDPR.
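
A minimal sketch of the problem, using Python's built-in sqlite3 module. The two-table schema and the column names are assumptions made up for illustration. After the soft delete the user disappears from filtered queries, yet the PII is still stored in both tables.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                         deleted INTEGER DEFAULT 0);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER,
                         customer_name TEXT);  -- denormalized copy of the name
    INSERT INTO users  (id, name, email)
        VALUES (1, 'Alice Example', 'alice@example.com');
    INSERT INTO orders (id, user_id, customer_name)
        VALUES (10, 1, 'Alice Example');
""")

# The naive "deletion": flip a flag and filter it out of application queries.
db.execute("UPDATE users SET deleted = 1 WHERE id = 1")

# Application queries no longer see the user...
visible_users = db.execute(
    "SELECT COUNT(*) FROM users WHERE NOT deleted").fetchone()[0]

# ...but the personal data is still physically present in both tables.
stored_email = db.execute(
    "SELECT email FROM users WHERE id = 1").fetchone()[0]
name_on_order = db.execute(
    "SELECT customer_name FROM orders WHERE user_id = 1").fetchone()[0]
```

Here `visible_users` is 0 while `stored_email` and `name_on_order` still hold the original values, and every replica or backup taken from this database carries them too.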

GDPR requires actual deletion — the data must be gone in a reasonable timeframe (typically 30 days) from every system, with documented evidence.

03

Three architectural patterns

| Pattern | How deletion works | Tradeoff |
| --- | --- | --- |
| Hard delete propagated | DELETE from primary; CDC propagates DELETE to every downstream | Conceptually clean; many places to coordinate; backups still have data |
| Crypto-shredding | Each user's PII encrypted with a per-user key; deleting the key makes all ciphertext useless | Most elegant; data persists but is unreadable; works across backups + warehouses |
| Pseudonymization | Replace PII with anonymous IDs in analytics/warehouse; mapping table the only place with real data; delete from mapping table | Excellent for analytics; doesn't help with operational data |
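
The pseudonymization pattern can be sketched in a few lines. This is an illustration under assumptions, not a reference design: `Pseudonymizer`, `pseudonym_for`, `forget`, and the `warehouse` rows are hypothetical names. The pseudonym is random rather than a hash of the email, because a hash could be recomputed from the original value.

```python
import secrets


class Pseudonymizer:
    """Hypothetical mapping table: the only place that links a real
    identity to the anonymous ID used in analytics."""

    def __init__(self):
        self._by_user = {}  # real identifier -> pseudonym

    def pseudonym_for(self, user_id):
        # Random pseudonym, so it cannot be re-derived from the identifier.
        if user_id not in self._by_user:
            self._by_user[user_id] = "anon_" + secrets.token_hex(8)
        return self._by_user[user_id]

    def forget(self, user_id):
        # Deleting the mapping row is the deletion event for analytics data.
        return self._by_user.pop(user_id, None) is not None


pseudo = Pseudonymizer()
warehouse = [
    {"user": pseudo.pseudonym_for("alice@example.com"), "amount": 40},
    {"user": pseudo.pseudonym_for("alice@example.com"), "amount": 60},
    {"user": pseudo.pseudonym_for("bob@example.com"), "amount": 25},
]

forgotten = pseudo.forget("alice@example.com")

# Analytics still work on the remaining rows; nothing links them to Alice.
total_revenue = sum(row["amount"] for row in warehouse)
```

The warehouse rows are never rewritten, which is what makes this attractive for analytics. Whether the leftover rows count as anonymous depends on whether other columns could still re-identify the person.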

04

Deep dive — crypto-shredding, the elegant answer

The pattern: every user's PII fields are encrypted with a per-user symmetric key. A KMS either stores the per-user keys or holds a master key that wraps them (envelope encryption). To "forget" the user, delete their key. The encrypted bytes still exist everywhere (in the DB, in backups, in the warehouse, in S3 archives), but once every copy of the key is gone, nobody can decrypt them again. They're computationally indistinguishable from noise.
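
A minimal sketch of the mechanism, assuming an in-memory `KeyStore` in place of a real KMS. All names here (`KeyStore`, `KeyShredded`, `encrypt`, `decrypt`) are hypothetical. To stay on the standard library, the cipher is an unauthenticated HMAC-SHA256 keystream used purely as a stand-in; real code would use a vetted AEAD such as AES-GCM from a maintained crypto library.

```python
import hashlib
import hmac
import secrets


def _keystream(key, nonce, length):
    # Stand-in cipher: HMAC-SHA256 in counter mode. Illustration only.
    out, counter = b"", 0
    while len(out) < length:
        block_input = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key, plaintext):
    nonce = secrets.token_bytes(16)  # fresh nonce per message
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class KeyShredded(Exception):
    """Raised when a key was destroyed by a deletion request."""


class KeyStore:
    """Hypothetical per-user key store standing in for a KMS."""

    def __init__(self):
        self._keys = {}
        self._shredded = set()

    def create(self, user_id):
        if user_id in self._shredded:
            raise KeyShredded(user_id)  # never silently mint a new key
        return self._keys.setdefault(user_id, secrets.token_bytes(32))

    def get(self, user_id):
        if user_id in self._shredded:
            raise KeyShredded(user_id)
        return self._keys[user_id]

    def shred(self, user_id):
        # The "forget" operation: destroy the key, leave the ciphertext alone.
        self._keys.pop(user_id, None)
        self._shredded.add(user_id)


store = KeyStore()
blob = encrypt(store.create("user-42"), b"alice@example.com")
backup_copy = blob  # e.g. a nightly backup that is never rewritten

readable_before = decrypt(store.get("user-42"), backup_copy)
store.shred("user-42")
```

After `shred`, `store.get("user-42")` raises `KeyShredded`, and `backup_copy` can sit in any archive without being readable. The tombstone set matters: without it, a later `create` would quietly mint a fresh key for a user who asked to be forgotten.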

Why this works:

  • You don't have to chase data across 30 systems. Backups can keep the ciphertext forever — it's gibberish.
  • Deletion completes in seconds (one KMS key delete) regardless of how much data you have for the user.
  • Audit-friendly: KMS logs the key deletion; you have proof.
  • Reversible operationally if the user changes their mind within a grace window (don't actually delete the key for 30 days).
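
The grace window from the last bullet could be handled as a scheduled key destruction that can be cancelled until it comes due. The sketch below is an assumption about how one might structure it: `DeletionScheduler`, `request`, `cancel`, and `sweep` are hypothetical names, and `keys` is a plain dict standing in for the key store.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(days=30)  # the window used in the text; tune to your policy


class DeletionScheduler:
    def __init__(self, keys):
        self.keys = keys    # user_id -> key bytes (stand-in for a KMS)
        self.pending = {}   # user_id -> time the key may be destroyed

    def request(self, user_id, now):
        # Soft-mark only: the key survives until the grace window ends.
        self.pending[user_id] = now + GRACE

    def cancel(self, user_id, now):
        due = self.pending.get(user_id)
        if due is None or now >= due:
            return False    # nothing pending, or already too late
        del self.pending[user_id]
        return True

    def sweep(self, now):
        # Run periodically: destroy every key whose window has elapsed.
        due_users = [u for u, due in self.pending.items() if now >= due]
        for user_id in due_users:
            self.keys.pop(user_id, None)
            del self.pending[user_id]
        return due_users


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
keys = {"u1": b"key-1", "u2": b"key-2"}
scheduler = DeletionScheduler(keys)
scheduler.request("u1", t0)
scheduler.request("u2", t0)

changed_mind = scheduler.cancel("u1", t0 + timedelta(days=10))
swept_early = scheduler.sweep(t0 + timedelta(days=29))
swept_due = scheduler.sweep(t0 + timedelta(days=30))
```

Passing `now` explicitly keeps the logic testable without waiting for real time to pass.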

Catches:

  • Search indices: encrypted values aren't searchable. Use blind indexes for lookups; those need explicit invalidation too.
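  • References to the user inside other users' records (e.g., an order or message thread in user A's account that names user B) need pseudonymization, not crypto-shredding: the record has to stay readable for A after B is forgotten.
  • Aggregates across many users ("revenue last quarter was $1.5M") aren't PII. Crypto-shredding doesn't unwind them; that's fine. A per-user total ("user X spent $1500") still relates to one person, so key it by a pseudonym or delete it.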
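
A blind index, as mentioned in the first catch, can be sketched as a keyed hash of the normalized value. The names `blind_index`, `SearchIndex`, and `purge_user` are hypothetical, and the index key is a placeholder that would live in a secrets manager. In this design the index key is shared across users, so destroying one user's data key leaves their tokens in place, which is why the purge is a separate step.

```python
import hashlib
import hmac


def blind_index(value, index_key):
    # Keyed hash of the normalized value: supports equality lookups only.
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(index_key, normalized, hashlib.sha256).hexdigest()


class SearchIndex:
    """Hypothetical lookup table: blind-index token -> user ID."""

    def __init__(self, index_key):
        self._key = index_key
        self._tokens = {}

    def add(self, user_id, email):
        self._tokens[blind_index(email, self._key)] = user_id

    def lookup(self, email):
        return self._tokens.get(blind_index(email, self._key))

    def purge_user(self, user_id):
        # Explicit invalidation: key deletion alone leaves these tokens behind.
        stale = [t for t, u in self._tokens.items() if u == user_id]
        for token in stale:
            del self._tokens[token]
        return len(stale)


index = SearchIndex(index_key=b"placeholder-index-key")
index.add("u1", "Alice@Example.com")

found_before = index.lookup("alice@example.com")
purged = index.purge_user("u1")
found_after = index.lookup("alice@example.com")
```

Until `purge_user` runs, anyone holding the index key can still test whether a given email address is present.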

The interview answer

"For GDPR-grade deletion we use crypto-shredding. Every user's PII fields are encrypted with a per-user key managed in KMS. On a deletion request we delete the key: backups, the warehouse, and archives keep the ciphertext, but it's now permanently undecryptable. Blind indexes and third-party copies are purged separately. One key deletion covers every encrypted copy, with audit log proof."

05

The realistic implementation checklist

What a GDPR-compliant deletion actually involves:

  1. Catalog every system that touches user data. Primary DB, replicas, caches, queues, warehouse, search, logs, third parties (Stripe, Mailchimp). Most companies discover 20+ systems.
  2. Tag PII vs non-PII fields in each system. "User name" — PII. "Aggregated revenue" — not.
  3. Retention policies per system. Logs auto-rotate; backups expire. Set retention so old data falls off naturally.
  4. Deletion workflow. User requests deletion → workflow fans out: delete from primary (cascades via CDC) + invalidate caches + crypto-shred backups + DELETE from warehouse + delete from third parties via their APIs + invalidate search indices.
  5. 30-day grace period. User can change their mind. Soft-mark for deletion; actual hard-delete on day 30.
  6. Audit log. Track each step. Be ready to prove to regulators.
  7. Test: have a real "delete me" button. Verify across all systems. Don't discover gaps in production.
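
Steps 4 and 6 of the checklist can be sketched as a small orchestrator that fans out to one deletion step per system and records an audit entry for each. This is an illustration under assumptions: `DeletionWorkflow`, `AuditEntry`, and the three example systems are hypothetical, and each step is assumed to be idempotent so that failures can be retried.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    user_id: str
    system: str
    ok: bool
    detail: str
    at: datetime


@dataclass
class DeletionWorkflow:
    steps: dict = field(default_factory=dict)  # system name -> callable(user_id)
    audit: list = field(default_factory=list)

    def register(self, system, step):
        self.steps[system] = step

    def run(self, user_id):
        """Run every step, log each outcome, return systems that failed."""
        failed = []
        for system, step in self.steps.items():
            try:
                step(user_id)
                ok, detail = True, "deleted"
            except Exception as exc:  # one failing system must not stop the rest
                ok, detail = False, repr(exc)
                failed.append(system)
            self.audit.append(AuditEntry(
                user_id, system, ok, detail, datetime.now(timezone.utc)))
        return failed


# Stand-ins for real systems.
primary_db = {"u1": {"email": "alice@example.com"}}
cache = {"u1": "cached-profile"}


def delete_from_mail_provider(user_id):
    # Simulates a third-party API that is currently unavailable.
    raise TimeoutError("mail provider unavailable")


workflow = DeletionWorkflow()
workflow.register("primary_db", lambda uid: primary_db.pop(uid, None))
workflow.register("cache", lambda uid: cache.pop(uid, None))
workflow.register("mail_provider", delete_from_mail_provider)

failed = workflow.run("u1")
```

The returned list (`["mail_provider"]` here) is what a retry job would pick up, and `workflow.audit` is the per-system evidence trail that step 6 asks for.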

06

Real-world

Stripe

Crypto-shredding for PII

Documented their crypto-shredding pattern publicly. Per-account encryption keys; key deletion is the deletion event. Audit-trailed.

Slack

Workspace-level pseudonymization

For deleted accounts, name + email replaced with pseudonyms in messages. Original PII deleted from primary; messages remain readable to other workspace members.

Google Takeout / Account Delete

Cross-product orchestration

Account deletion flows through every Google product (Mail, Photos, YouTube, Drive). Documented pipeline; regulatory-required.

Most startups

Soft-delete with manual sweep

Realistic state. Soft-delete in primary; quarterly cleanup job on warehouse + caches. Often gets a regulator warning before they upgrade.

07

Used in problems

News feed must support user deletion across feed-cache + post DB + interaction logs + ML training pipelines. E-commerce deletes customer PII while preserving anonymized order analytics. WhatsApp's E2EE makes "forget" automatic for content but profile data still needs explicit handling.
