Notification System

01

Problem Statement

Every major app — Uber, Amazon, WhatsApp, YouTube — has a notification system behind it. But the notification system is rarely the product itself. It's infrastructure that every other team depends on. The payments team needs it for OTPs. Marketing needs it for promos. Social needs it for "X liked your post."

Think of it like a postal service inside your company. Any team can hand you a letter (notification request), and your job is to figure out: who gets it, through which channel, how urgently, and to confirm it was delivered.

Core question: How do you build a single system that handles a 2-second OTP delivery and a 50-million-user promo blast with the same infrastructure, without one starving the other?

Real-world context: Facebook sends billions of notifications daily. Uber's OTP and ride-status notifications are on the critical path — if they fail, the business fails. Amazon's "your package shipped" emails are high-volume but tolerant of a few minutes' delay. All three coexist in a single notification platform.

02

Requirements

Functional Requirements

Multi-channel delivery — Push (FCM/APNs), SMS (Twilio), Email (SES/SendGrid), In-app (stored feed)
Priority levels — Critical (OTP), High (ride update), Medium (social), Low (promos)
User preferences — Per-channel opt-in/out, quiet hours, frequency caps
Template system — Reusable templates with variable substitution: Hi {{name}}, your order {{id}} shipped
Delivery tracking — Per-notification lifecycle: created → queued → sent → delivered → read → failed
Scheduling — Future-dated and recurring notifications (weekly digest every Sunday 9 AM)
Bulk fan-out — One event triggers notifications to millions (e.g., all Premium users in a region)
In-app notification feed — Paginated, chronologically ordered list per user

Non-Functional Requirements

High availability — Tier-0 infrastructure. If notifications go down, OTPs fail, payments stall, rides can't start.
At-least-once delivery — We'd rather send a duplicate than lose an OTP. Clients handle dedup.
Latency-tiered — Critical: <3 seconds. High: <30s. Low: minutes are fine.
Massive throughput — Millions of notifications per minute during peak events.
Backpressure-aware — FCM, APNs, Twilio, SES all have rate limits. Throttle per-provider without blocking urgent traffic.
Ordering for in-app — In-app feed chronologically ordered per user. Cross-channel ordering doesn't matter.

Key NFR Tensions

Latency vs Throughput

A 10M-user promo blast and a single OTP hit the same system. If promo messages flood the queue, the OTP waits. Forces us toward priority queues.

Reliability vs Cost

SMS is reliable but costs $0.01–0.05/msg. Push is free but requires app install. Channel selection becomes a routing problem.

Consistency vs Availability

Preference updates need strong consistency. Sending a promo after opt-out is a GDPR violation.

Simplicity vs Flexibility

Multi-channel fallback, conditional logic, A/B testing on copy — each addition makes the routing layer more complex.

03

Scale Estimation

Starting assumptions: 500M registered users, 100M DAU (20% daily active), 6 notifications per DAU per day across transactional, social, and marketing.

600M

Notifications / Day

1.16B

Delivery Attempts / Day

~35K/s

Peak Creation QPS

~67K/s

Peak Delivery QPS

~1 KB

Per Notification

~600 GB

Daily Storage

54 TB

Active Feed (90 days)

~4.2 TB

Daily Outbound Bandwidth

Channel Breakdown

Each notification may hit multiple channels. In-app: 100% (600M). Push: 70% (420M). Email: 20% (120M). SMS: 3% (18M, critical only). Total delivery attempts: ~1.16B/day.

What the Numbers Force

35K/s peak → async queue-based architecture (can't do sync). 54 TB feed → Cassandra/DynamoDB, not PostgreSQL. 50M bulk sends → separate fan-out pipeline with throttling. Provider rate limits → per-channel worker pools with independent rate limiters.

04

API Design

Two distinct consumer sets: internal services (send notifications, service-to-service auth) and end users (read feed, manage preferences, user JWT auth).

Send Single Notification — Internal

POST /v1/notifications
Headers: X-Service-Id, X-Api-Key, X-Idempotency-Key

{
  "user_id": "user_abc123",
  "priority": "critical",
  "channels": ["push", "sms"],
  "fallback_chain": true,
  "fallback_timeout_sec": 30,
  "template_id": "otp_verification",
  "template_vars": { "otp_code": "482901", "expiry_minutes": "5" },
  "metadata": { "deep_link": "app://verify?code=482901" },
  "ttl_sec": 300
}

→ 202 Accepted { "notification_id": "notif_7f8a9b2c", "status": "queued" }

Why 202 not 201: We're not delivering synchronously. The request is accepted into the queue. Idempotency key prevents duplicate OTPs on retry. Fallback chain: try push first, SMS after 30s if push fails. TTL: expired OTPs are dropped from queue.

Send Bulk Notification — Internal

POST /v1/notifications/bulk

{
  "segment": { "type": "query", "filters": { "subscription": "premium", "country": "AE" } },
  "priority": "low",
  "channels": ["push", "email"],
  "template_id": "promo_sale_v2",
  "template_vars": { "sale_name": "January Mega Sale", "discount": "30%" },
  "scheduling": {
    "send_at": "2025-01-20T09:00:00Z",
    "timezone_aware": true,
    "throttle_per_sec": 10000
  },
  "respect_preferences": true,
  "dedup_key": "uae-sale-jan2025"
}

→ 202 Accepted { "bulk_id": "bulk_3e4f", "estimated_recipients": 1200000 }

Get Notification Feed — User

GET /v1/users/me/notifications?cursor=notif_abc&limit=20&unread_only=false
Authorization: Bearer <jwt>

→ 200 OK {
  "notifications": [ { "notification_id", "title", "body", "deep_link",
                        "priority", "is_read", "created_at" }, ... ],
  "cursor": "notif_6a5b4c",
  "has_more": true,
  "unread_count": 7
}

User Preferences

GET /PUT /v1/users/me/notification-preferences

{
  "categories": {
    "transactional": { "push": true, "email": true, "sms": true, "mutable": false },
    "marketing":     { "push": true, "email": true, "sms": false, "mutable": true }
  },
  "quiet_hours": { "enabled": true, "start": "23:00", "end": "07:00",
                   "timezone": "Asia/Dubai", "override_for_critical": true },
  "frequency_caps": { "marketing_push_per_day": 3 }
}

Endpoint	Method	Consumer	Purpose
/v1/notifications	POST	Internal	Send single notification
/v1/notifications/bulk	POST	Internal	Send to segment
/v1/users/me/notifications	GET	User	Read feed (cursor-paginated)
/v1/users/me/notifications/read	PATCH	User	Mark as read
/v1/users/me/notification-preferences	GET/PUT	User	Manage preferences
/v1/templates	CRUD	Internal	Template management
Webhook callback	POST	System→Service	Delivery status updates

05

High-Level Architecture

The notification system is a pipeline: a notification enters at one end and exits through a provider at the other. Everything in between is routing, filtering, prioritizing, and tracking.

Request Flow — Step Through

Payments Svc→Ingestion API→Redis Dedup→Kafka Critical→Processing→Push Worker→FCM→Delivery Tracker

Click Next Step to walk through the request flow.

06

Deep Dive — Fan-Out, Push vs Pull & Delivery Guarantees

Fan-Out: Staged Bulk Sends

The naive approach — enqueuing 50M messages at once — floods queues, blows memory during segment resolution, and causes a downstream provider stampede. The production approach uses three stages:

Stage 1 — Job Creation (instant): Write a single bulk_job row to the database, return 202.
Stage 2 — Segment Resolution (streaming): Fan-out service streams users in 5K batches via cursor, pre-filters by preferences (eliminates ~20% before queuing), throttles enqueue at caller-specified rate, and checkpoints progress for crash recovery.
Stage 3 — Delivery: Individual notifications flow through the normal pipeline at a steady, manageable rate.

Push vs Pull for In-App

Pull: User opens feed → query Cassandra (partition key = user_id, cluster key = created_at DESC). Single-partition read, ~2ms. Redis caches page 1 (last 50 notifications) for ~90% cache hit rate — 60× cheaper than storing the full feed in Redis RAM.

Push: Real-time badge updates via WebSocket Gateway. Connection registry in Redis maps user_id → gateway instance. On new notification, publish to the specific gateway holding the user's connection. At 10M concurrent connections, ~25 GB RAM per gateway instance across 20 pods.

Delivery Guarantees: At-Least-Once + Idempotency

Exactly-once across a network boundary with external providers is impossible (Two Generals' Problem). We always retry, with idempotency at 4 layers:

sequenceDiagram participant PS as Payments Service participant API as Ingestion API participant RD as Redis (Dedup) participant KF as Kafka Critical participant PW as Processing Worker participant PU as Push Worker participant FCM as FCM (Google) participant DT as Delivery Tracker participant SMS as SMS Worker PS->>API: POST /v1/notifications (OTP) API->>RD: Check idempotency key RD-->>API: Not seen ✓ API->>KF: Enqueue to notifications.critical API-->>PS: 202 Accepted KF->>PW: Consume message PW->>RD: Check processed set (SETNX) RD-->>PW: New ✓ PW->>PW: Resolve template + check prefs PW->>KF: Enqueue to delivery.push KF->>PU: Consume push message PU->>RD: Check delivery log RD-->>PU: Not sent ✓ PU->>FCM: Send push notification FCM-->>PU: 200 OK PU->>DT: Status → "sent" Note over DT: Start 30s fallback timer FCM-->>DT: Delivery receipt DT->>DT: Status → "delivered" DT->>DT: Cancel SMS fallback ✓ DT->>PS: Webhook: OTP delivered via push Note over DT,SMS: If push fails or times out after 30s: DT->>SMS: Trigger SMS fallback SMS->>SMS: Deliver via Twilio

Layer 1 — Ingestion

Redis SETNX on idempotency key (24h TTL). Duplicate requests return same notification_id without re-queuing.

Layer 2 — Processing

Atomic SETNX on processed set. Kafka consumer commits offset after processing, not before.

Layer 3 — Delivery

Check delivery log before calling provider. FCM collapse_key reduces visible duplicates on device.

Layer 4 — Client

Client maintains local set of recent notification_ids (~200). Last line of defense against server-side dedup failures.

Retry Strategy

Exponential backoff with retry Kafka topics: delivery.push.retry.1 (2s), .retry.2 (8s), .retry.3 (32s), .dlq (dead letter). Critical: 6 retries (~10 min window). Low: 2 retries (~10s), then silent drop.

07

Simpler (one check location). But wastes queue capacity and worker resources on notifications we'll ultimately drop.

08

What Can Go Wrong

🔴 Provider Outage (FCM / Twilio / SES)

Impact: One channel completely blocked. Retry storms fill retry topics, new messages pile up. Mitigation: Circuit breaker pattern — after 50 failures in 60s, stop sending to that provider. Critical traffic routes to fallback channel. Non-critical held in deferred topic until recovery.

🔴 Thundering Herd from Bulk Send

Impact: 50M promo blast + organic traffic overwhelms push workers. FCM returns 429s. Mitigation: Physical isolation — separate worker pools for critical vs low priority with independent FCM API quotas. Adaptive throttling on fan-out based on consumer lag.

🟠 Preference Desync — Send After Opt-Out

Impact: User opts out of marketing at 14:00:01, cached preference is stale, marketing push sent at 14:00:03. GDPR violation risk. Mitigation: Cache invalidation on write. Marketing notifications bypass cache — synchronous PostgreSQL read for compliance.

🟠 SMS Cost Explosion

Impact: Delayed FCM delivery receipts trigger SMS fallbacks unnecessarily. 1% false fallback × 420M pushes = 4.2M wasted SMS = $126K/day. Mitigation: Conservative fallback timeouts per priority (30s critical, 2min high, none for medium/low). Re-check push delivery status before sending SMS.

🟠 Device Token Rot

Impact: 10–30% of stored tokens become stale over time. 84M wasted FCM calls/day. Mitigation: Reactive cleanup on "NotRegistered" response. Weekly proactive health check with FCM dry_run. Token refresh on every app open.

🟡 Kafka Broker Failure

Impact: Writes fail for affected partitions. Mitigation: Replication factor = 3, min.insync.replicas = 2. Leadership transfer in ~30s. Redis fallback buffer for critical notifications during brief outages.

🟡 Duplicate Notifications Despite Idempotency

Impact: Worker crashes between processing and Redis write → reprocesses on restart. Mitigation: Atomic SETNX instead of check-then-set. If Redis is down, fail open (accept brief duplicate spike). Client-side dedup as ultimate safety net.

🟡 Queue Starvation (Low Priority Never Processed)

Impact: Marketing campaigns that should deliver in 1 hour take 12 hours. Mitigation: Dynamic worker rebalancing based on consumer lag. Minimum 2 workers per priority level always. SLA alerts per priority topic.

⚠

Anti-patterns

🚫

Send every notification to every device immediately

10M notifications/sec ingest; APNs/FCM rate-limits throttle you.

✓ Better: Queue + batch per-device-token; respect provider rate limits; dedup collapsed notifications.

🚫

Send SMS for every in-app notification

Cost explodes; users unsubscribe.

✓ Better: Channel preferences per user; in-app default; SMS only for critical (payment, 2FA).

🚫

Write notification to DB synchronously before acking API

Notification service becomes part of the hot path.

✓ Better: Fire-and-forget event to Kafka; notify service consumes + delivers async.

09

Interview Tips

💡

Scope first — ask 5 questions
What channels? Platform or feature? Scale (DAU)? Priority levels? Delivery guarantees? This narrows a dozen-direction problem into a specific design. Proposing numbers yourself ("Let me assume 100M DAU") signals confidence.

⚡

Draw the pipeline, not the boxes
Start with the data flow: Ingestion → Priority Queues → Processing → Channel Queues → Providers. This gives the interviewer the full picture in 60 seconds. Then add detail where they want to go deep.

🎯

Lead with the OTP-vs-promo tension
This is the defining constraint. Explain priority isolation (separate Kafka topics, dedicated consumer pools). This single point demonstrates you understand both the business requirement and the technical solution.

🚫

Never say "exactly-once delivery"
It's impossible across a network boundary with external providers. Say "at-least-once with idempotency at every layer" and explain the 4-layer dedup strategy. This signals real distributed systems understanding.

🏗️

Proactively discuss failures
Don't wait for "what could go wrong?" Bring up provider outages and circuit breakers yourself. Mention the thundering herd from bulk sends. This signals senior-level thinking that goes beyond the happy path.

📐

45-minute structure
Scoping (5 min) → Scale estimation (3 min) → Architecture pipeline (10 min) → API design (5 min) → Deep dive on ONE topic for 15 min (priority queues OR delivery guarantees OR fan-out) → Tradeoffs + failures (5 min) → Q&A (2 min).

10

Evolution

How this design grows from MVP to planet-scale. Architecture is derived from scale, not pattern-matched from blog posts.

1

MVP — Single Server, One Channel (10K users)

Flask + PostgreSQL + Celery + FCM. One channel, one priority, one server. Preferences are a boolean column. Templates are hardcoded strings. Deployable in a day. Breaks at ~1M notifications/day when the single Celery worker becomes a bottleneck.

2

Multi-Channel with Queues (1M DAU)

Add email + SMS channels. Replace Celery with Kafka for durability and replay. Two priority topics (critical, normal). User preference table with per-channel opt-in/out. Separate notification feed table partitioned by month.

3

Platform Scale (50M DAU)

Template CRUD API. 4 priority levels with dedicated consumer pools. Bulk fan-out service with throttling. Cassandra for feed store. Fallback chains with delivery tracker. Webhook system for calling services. WebSocket gateway for real-time updates.

4

Planet Scale (500M DAU)

Multi-region deployment. ML-powered channel selection (predicts best channel per user). Notification aggregation ("Ahmed and 47 others liked your post"). Batch provider APIs. Send-time optimization. Per-team SMS cost governance. GDPR compliance audit trails.

📺

References & Videos

Design a Notification Service

ByteByteGo · 10 min

Notification System Design

Gaurav Sen · 15 min

Design a Notification System

AlgoMaster

Design Notification Service

GeeksforGeeks

Problem Statement

Requirements

Functional Requirements

Non-Functional Requirements

Key NFR Tensions

Latency vs Throughput

Reliability vs Cost

Consistency vs Availability

Simplicity vs Flexibility

Scale Estimation

Channel Breakdown

What the Numbers Force

API Design

High-Level Architecture

Deep Dive — Fan-Out, Push vs Pull & Delivery Guarantees

Fan-Out: Staged Bulk Sends

Push vs Pull for In-App

Delivery Guarantees: At-Least-Once + Idempotency

Layer 1 — Ingestion

Layer 2 — Processing

Layer 3 — Delivery

Layer 4 — Client

Retry Strategy

Key Design Decisions & Tradeoffs

Queue Architecture

Separate Kafka Topics per Priority

Single Topic with Priority Field

Feed Store

Cassandra

Redis Sorted Sets

Real-Time Updates

WebSocket + Pull Fallback

Push Notification Piggyback

Delivery Guarantee

At-Least-Once + Idempotency

Exactly-Once

API Model

Async (202 Accepted)

Synchronous (200 with result)

Template Resolution

Write Time (Eager)

Read Time (Lazy)

Preference Check Location

Coarse at Fan-Out + Fine at Processing

Processing Only

What Can Go Wrong

🔴 Provider Outage (FCM / Twilio / SES)

🔴 Thundering Herd from Bulk Send

🟠 Preference Desync — Send After Opt-Out

🟠 SMS Cost Explosion

🟠 Device Token Rot

🟡 Kafka Broker Failure

🟡 Duplicate Notifications Despite Idempotency

🟡 Queue Starvation (Low Priority Never Processed)

Anti-patterns

Interview Tips

Similar Problems

WhatsApp / Chat System

Twitter / News Feed

Distributed Message Queue

Rate Limiter

Email Delivery Service

Evolution

MVP — Single Server, One Channel (10K users)

Multi-Channel with Queues (1M DAU)

Platform Scale (50M DAU)

Planet Scale (500M DAU)

References & Videos

WhatsApp / Chat System

Twitter / News Feed

Availability — The Nines