EU GDPR Article 17 gives a user the right to have all of their personal data deleted. Under Article 12(3) you must act without undue delay and within one month of the request, which most teams plan for as 30 days. "Delete a row" sounds simple. The problem: that user's data is in the live DB, in 7 read replicas, in 30 days of database backups, in Kafka topic retention, in the data warehouse, in Elasticsearch search indices, in S3 logs, in cache layers, in third-party services (Stripe, Mailchimp, Datadog), and in offline BigQuery snapshots used for ML training.
Architectures that can delete this completely look very different from those that just UPDATE users SET deleted = true. GDPR (and CCPA, LGPD, and similar laws spreading globally) forces an architectural reckoning that most teams discover only after the first regulator letter.
02
Soft delete is not enough
The naive approach: UPDATE users SET deleted = true. Hide from queries via WHERE NOT deleted. Done?
No. The data is still there. Backups still have it. Read replicas still have it. The user's name in the orders table still exists. Your search index returns hits. The data warehouse still trains ML models on it. None of this satisfies GDPR.
GDPR requires actual deletion: the data must be gone from every system within the deadline (plan for 30 days), and you need documented evidence that it happened.
03
Three architectural patterns
| Pattern | How deletion works | Tradeoff |
| --- | --- | --- |
| Hard delete propagated | DELETE from primary; CDC propagates DELETE to every downstream | Conceptually clean; many places to coordinate; backups still have data |
| Crypto-shredding | Each user's PII encrypted with a per-user key; deleting the key makes all ciphertext useless | Most elegant; data persists but is unreadable; works across backups + warehouses |
| Pseudonymization | Replace PII with anonymous IDs in analytics/warehouse; mapping table the only place with real data; delete from mapping table | Excellent for analytics; doesn't help with operational data |
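To make the pseudonymization pattern concrete, here is a minimal in-memory sketch. The `PseudonymMap` class, the `warehouse` list, and the sample users are hypothetical stand-ins for a real mapping table and fact table. The pseudonym is a random token rather than a hash of the user ID, on the assumption that you do not want it to be recomputable once the mapping row is gone.

```python
import secrets


class PseudonymMap:
    """Hypothetical mapping table: the only place linking a pseudonym to a real user."""

    def __init__(self):
        self._by_user = {}  # user_id -> pseudonym

    def pseudonym_for(self, user_id: str) -> str:
        # Random token, not derived from the user ID, so it cannot be
        # recomputed from the ID after the mapping row is deleted.
        if user_id not in self._by_user:
            self._by_user[user_id] = "p_" + secrets.token_hex(8)
        return self._by_user[user_id]

    def resolve(self, pseudonym: str):
        """Reverse lookup; returns None once the user has been forgotten."""
        for user_id, p in self._by_user.items():
            if p == pseudonym:
                return user_id
        return None

    def forget(self, user_id: str) -> bool:
        """Delete the mapping row; returns True if a row existed."""
        return self._by_user.pop(user_id, None) is not None


mapping = PseudonymMap()
warehouse = []  # stand-in for an analytics fact table

events = [("alice@example.com", 40), ("bob@example.com", 15), ("alice@example.com", 60)]
for user_id, amount in events:
    # Only the pseudonym ever reaches the warehouse.
    warehouse.append({"user": mapping.pseudonym_for(user_id), "amount": amount})

alice_pseudonym = mapping.pseudonym_for("alice@example.com")
mapping.forget("alice@example.com")

# Aggregates keyed by pseudonym still work after the mapping row is deleted.
alice_total = sum(r["amount"] for r in warehouse if r["user"] == alice_pseudonym)
```

After `forget`, the warehouse rows are untouched but can no longer be tied back to the user through the mapping. Whether the remaining rows count as anonymous still depends on what other attributes sit next to the pseudonym, so this is a sketch of the mechanism, not a compliance guarantee.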
04
Deep dive — crypto-shredding, the elegant answer
The pattern: every user's PII fields are encrypted with a per-user symmetric key. A KMS either stores the per-user keys or wraps them under a master key. To "forget" the user, delete their key. The encrypted bytes still exist everywhere (in the DB, in backups, in the warehouse, in S3 archives), but nobody can decrypt them again; without the key they are indistinguishable from noise. This only holds if the keys themselves are kept out of long-lived backups, so the key store needs its own short retention.
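The mechanics can be sketched in a few lines. Everything named here is an assumption for illustration: `KeyStore` is a hypothetical in-memory stand-in for a KMS, and `encrypt`/`decrypt` are a toy HMAC-SHA256 keystream cipher used only so the sketch runs on the standard library. A production system would use an AEAD such as AES-GCM from a vetted library and a real key management service.

```python
import hashlib
import hmac
import secrets


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and MAC keys from the per-user key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY cipher for illustration only: HMAC-SHA256 in counter mode.
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    return nonce + tag + body


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or corrupted ciphertext")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class KeyStore:
    """Hypothetical stand-in for a KMS holding one key per user."""

    def __init__(self):
        self._keys = {}

    def create(self, user_id: str) -> None:
        self._keys[user_id] = secrets.token_bytes(32)

    def get(self, user_id: str) -> bytes:
        return self._keys[user_id]  # raises KeyError after shredding

    def shred(self, user_id: str) -> None:
        del self._keys[user_id]


kms = KeyStore()
kms.create("u42")

primary_db = {"u42": encrypt(kms.get("u42"), b"Ada Lovelace")}
backup = dict(primary_db)  # backups only ever see ciphertext

recovered_before = decrypt(kms.get("u42"), backup["u42"])

kms.shred("u42")  # the deletion event
try:
    decrypt(kms.get("u42"), backup["u42"])
    readable_after = True
except KeyError:
    readable_after = False
```

The backup still holds the row after `shred`, but there is no key left to open it. The sketch also shows why the key store must stay small and separately managed: if `KeyStore` itself were copied into the backup, the shred would be undone by a restore.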
Why this works:
You don't have to chase data across 30 systems. Backups can keep the ciphertext forever — it's gibberish.
Deletion completes in seconds (one KMS key delete) regardless of how much data you have for the user.
Audit-friendly: KMS logs the key deletion; you have proof.
Operationally reversible within a grace window: schedule the key deletion (for example, 30 days out) and cancel it if the user changes their mind.
Catches:
Search indices: encrypted values aren't searchable. Use blind indexes for lookups; those need explicit invalidation too.
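Records shared with other users (e.g., an order that belongs to user A but was placed by user B) must stay readable for the remaining user, so they need pseudonymization rather than crypto-shredding.
Aggregates that no longer identify an individual (e.g., user X's $1,500 folded into monthly revenue) aren't PII. Crypto-shredding doesn't unwind them, and that's fine.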
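The blind-index catch is easier to see in code. Below is a minimal sketch, assuming a keyed hash (HMAC-SHA256) over a normalized email as the index token; the `BlindIndex` class, its in-memory dict, and the `normalize` rule are hypothetical. Because the index key is shared across users, deleting a per-user encryption key does nothing to the index rows, which is why `forget` has to remove them explicitly.

```python
import hashlib
import hmac
import secrets


def normalize(email: str) -> str:
    # Assumed normalization rule: trim whitespace and lowercase.
    return email.strip().lower()


class BlindIndex:
    """Hypothetical lookup table mapping HMAC(index_key, value) to a user ID."""

    def __init__(self, index_key: bytes):
        self._key = index_key
        self._rows = {}  # hex token -> user_id

    def _token(self, value: str) -> str:
        return hmac.new(self._key, normalize(value).encode(), hashlib.sha256).hexdigest()

    def add(self, value: str, user_id: str) -> None:
        self._rows[self._token(value)] = user_id

    def lookup(self, value: str):
        # Exact-match lookup without storing the plaintext value.
        return self._rows.get(self._token(value))

    def forget(self, user_id: str) -> int:
        """Remove every index row for the user; returns how many were removed."""
        stale = [t for t, uid in self._rows.items() if uid == user_id]
        for t in stale:
            del self._rows[t]
        return len(stale)


index = BlindIndex(secrets.token_bytes(32))
index.add("Ada@Example.com", "u42")
index.add("bob@example.com", "u7")

found_before = index.lookup(" ada@example.com ")
removed = index.forget("u42")
found_after = index.lookup("ada@example.com")
```

If the row were left in place, anyone holding the index key could still confirm that a guessed email once belonged to a user, even though the encrypted fields themselves are unreadable.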
The interview answer
"For GDPR-grade deletion we use crypto-shredding. Every user's PII fields are encrypted with a per-user key managed in KMS. On a deletion request we delete the key — backups, warehouse, search indices keep the ciphertext, but it's now permanently undecryptable. One operation, complete deletion, with audit log proof."
05
The realistic implementation checklist
What a GDPR-compliant deletion actually involves:
Catalog every system that touches user data. Primary DB, replicas, caches, queues, warehouse, search, logs, third parties (Stripe, Mailchimp). Most companies discover 20+ systems.
Tag PII vs non-PII fields in each system. "User name" — PII. "Aggregated revenue" — not.
Retention policies per system. Logs auto-rotate; backups expire. Set retention so old data falls off naturally.
Deletion workflow. User requests deletion → workflow fans out: delete from primary (cascades via CDC) + invalidate caches + crypto-shred backups + DELETE from warehouse + delete from third parties via their APIs + invalidate search indices.
30-day grace period. User can change their mind. Soft-mark for deletion; actual hard-delete on day 30.
Audit log. Track each step. Be ready to prove to regulators.
Test: have a real "delete me" button. Verify across all systems. Don't discover gaps in production.
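The fan-out and audit steps in this checklist can be sketched as a small orchestrator. This is an illustration under assumptions, not a reference design: `DeletionWorkflow`, `AuditEntry`, the system names, and the three delete functions are all hypothetical, and real steps would call a database, a cache, and vendor APIs. Each step is assumed to be idempotent so that a failed run can be retried safely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AuditEntry:
    user_id: str
    system: str
    ok: bool
    detail: str
    at: datetime


@dataclass
class DeletionWorkflow:
    steps: dict[str, Callable[[str], None]]  # system name -> delete function
    audit: list[AuditEntry] = field(default_factory=list)

    def run(self, user_id: str) -> list[str]:
        """Run every step and return the systems that still need a retry."""
        failed = []
        for system, delete in self.steps.items():
            try:
                delete(user_id)
                ok, detail = True, "deleted"
            except Exception as exc:
                # One failing system must not stop the others.
                ok, detail = False, repr(exc)
                failed.append(system)
            # Record every attempt, successful or not, with a UTC timestamp.
            self.audit.append(
                AuditEntry(user_id, system, ok, detail, datetime.now(timezone.utc))
            )
        return failed


# Stand-ins for real systems.
primary = {"u42": {"name": "Ada"}}
cache = {"u42": "Ada"}


def delete_primary(uid: str) -> None:
    primary.pop(uid, None)  # idempotent: no error if already gone


def delete_cache(uid: str) -> None:
    cache.pop(uid, None)


def delete_email_vendor(uid: str) -> None:
    raise TimeoutError("vendor API unavailable")  # simulated outage


workflow = DeletionWorkflow(
    {"primary_db": delete_primary, "cache": delete_cache, "email_vendor": delete_email_vendor}
)
pending = workflow.run("u42")
```

Here `pending` lists the systems to retry, and `workflow.audit` holds one entry per attempted step. Recording failures alongside successes is what lets you show a regulator both what was deleted and when the stragglers were finally cleared.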
06
Real-world
Stripe
Crypto-shredding for PII
Documented their crypto-shredding pattern publicly. Per-account encryption keys; key deletion is the deletion event. Audit-trailed.
Slack
Workspace-level pseudonymization
For deleted accounts, name + email replaced with pseudonyms in messages. Original PII deleted from primary; messages remain readable to other workspace members.
Google Takeout / Account Delete
Cross-product orchestration
Account deletion flows through every Google product (Mail, Photos, YouTube, Drive). Documented pipeline; regulatory-required.
Most startups
Soft-delete with manual sweep
Realistic state. Soft-delete in primary; quarterly cleanup job on warehouse + caches. Often gets a regulator warning before they upgrade.
07
Used in problems
News feed must support user deletion across feed-cache + post DB + interaction logs + ML training pipelines.
E-commerce deletes customer PII while preserving anonymized order analytics.
WhatsApp's E2EE makes "forget" automatic for content but profile data still needs explicit handling.