Reminder Alert

Schedule a reminder for "Thursday 9am in America/New_York" and have it arrive on the user's phone at exactly that local time — even when DST changes, the user moves timezones, or the delivery channel is flaky. The hard parts: time-bucketed scanning that doesn't miss or duplicate reminders across millions of users, timezone materialization that handles DST transitions correctly, and at-least-once delivery with client-side dedup so a reminder arrives once, not zero or three times. Google Calendar reminders, Slack scheduled messages, and medical appointment alerts all solve this.

⚡ Core: Time Buckets + Timezone + At-Least-Once · ~100M reminders/day · DST-safe · Multi-channel (push/SMS/email) · Recurring support
02

Requirements

Functional
  • Create a one-time reminder for a specific datetime + timezone
  • Create a recurring reminder (daily, weekly, custom RRULE)
  • Deliver via push notification, SMS, or email — user configures channel
  • Snooze (reschedule N minutes/hours from now)
  • Cancel or edit a pending reminder
  • Support all IANA timezones including DST transitions
Non-Functional
  • Delivery within ±30 seconds of the scheduled time
  • At-least-once delivery — never silently drop a reminder
  • Client-side dedup so user sees it once even if delivered twice
  • Scale to ~100M reminders/day = ~1200/sec avg, ~5K/sec peak
  • 99.99% availability on the scheduling path
  • Recurring reminders must handle DST, leap years, "last day of month"
03

Scale Estimation

Reminders / day
~100M
one-time + recurring instances combined
Fire rate (peak)
~5K/sec
morning hours across top-3 timezones concentrate load
Scan window
1 minute
scanner wakes every 60s; processes all due in [now, now+60s), half-open so a reminder on the boundary is not scanned twice
Recurring %
~40%
daily standup, weekly report, medication reminders
Delivery channels
3
push (APNs/FCM), SMS (Twilio), email (SES)
Storage per reminder
~500 B
user_id, fire_at_utc, tz, title, channel, recurrence_rule
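
The figures above can be cross-checked with a few lines of arithmetic. This is a rough sketch: the inputs are the estimates from this section, and the derived storage number ignores replication and indexes, which is an assumption made here for simplicity.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Inputs taken from the estimates in this section.
reminders_per_day = 100_000_000
bytes_per_reminder = 500
peak_per_sec = 5_000
scan_window_sec = 60

avg_per_sec = reminders_per_day / SECONDS_PER_DAY
storage_per_day_gb = reminders_per_day * bytes_per_reminder / 1e9  # raw, unreplicated
peak_bucket_rows = peak_per_sec * scan_window_sec

print(f"average fire rate: {avg_per_sec:,.0f}/sec")
print(f"raw storage added per day: {storage_per_day_gb:.0f} GB")
print(f"rows in a peak one-minute bucket: {peak_bucket_rows:,}")
```

The average works out to about 1,157/sec, which the requirements round to ~1200/sec. The peak bucket size (300,000 rows in one minute) is the number that drives the scanner sharding discussion later on.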
04

API Design

POST /api/reminders

Create reminder. Body: {user_id, title, body, fire_at: "2026-04-17T09:00", timezone: "America/New_York", channel: "push", recurrence?: "RRULE:FREQ=DAILY"}. Server stores fire_at_utc computed from local time + IANA zone. Returns {reminder_id}.

GET /api/reminders?user_id=X&status=pending

List user's pending reminders. Paginated by fire_at_utc. Includes next fire time for recurring.

POST /api/reminders/{id}/snooze

Snooze: reschedule to now + N minutes. Creates a new one-time instance; original recurrence continues separately.

DELETE /api/reminders/{id}

Cancel pending reminder. For recurring: cancels the series; individual instance cancel via PATCH.

05

Architecture

Two flows: scheduling (user creates → store → index by fire_at_utc) and firing (scanner reads due reminders → dispatch to delivery channels). A recurrence expander materializes the next N instances of recurring reminders.

Reminder Pipeline (diagram)
  • User / App: create reminder
  • Reminder svc: CRUD + TZ convert
  • Reminder DB: Cassandra, by fire_at bucket
  • Scanner: every 60s, read bucket [now, now+60s]
  • Dispatcher: route to channel (APNs/FCM, SMS, Email (SES))
  • Recurrence expander: RRULE → next N instances
  • Dedup store: Redis SET (reminder_id)
  • Delivery log: Postgres audit trail
Request Flow
  1. User: create reminder
  2. Reminder svc: TZ → UTC convert
  3. Reminder DB: fire_minute bucket
  4. Scanner: read due bucket
  5. Dedup (Redis): SETNX check
  6. Dispatcher: route to channel
  7. Delivery: push / SMS / email
06

Deep Dive — Time Buckets + Timezone + At-Least-Once

Time-bucketed scanning. Reminders stored in Cassandra with partition key = fire_minute_utc (truncated to the minute). E.g., all reminders due at 2026-04-17T13:00 UTC are in partition 202604171300. Scanner wakes every 60 s, reads the current-minute partition, dispatches each reminder.

Why this works: partition key gives O(1) lookup of "everything due now." No full-table scan. Scanner is a single leader process (elected via distributed lock) per shard of the time space. Multiple scanners shard by minute-range to parallelize.
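
A minimal sketch of the bucketing idea, assuming an in-memory dict as a stand-in for the Cassandra table. The names `bucket_key`, `schedule`, and `read_due` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_key(fire_at_utc: datetime) -> str:
    """Truncate a UTC instant to the minute, e.g. '202604171300'."""
    if fire_at_utc.tzinfo is None:
        raise ValueError("fire_at_utc must be timezone-aware")
    return fire_at_utc.astimezone(timezone.utc).strftime("%Y%m%d%H%M")


# Stand-in for the table: partition key -> rows stored in that partition.
buckets = defaultdict(list)


def schedule(reminder_id: str, fire_at_utc: datetime) -> None:
    """Write path: index the reminder under its fire-minute partition."""
    buckets[bucket_key(fire_at_utc)].append(
        {"reminder_id": reminder_id, "fire_at_utc": fire_at_utc}
    )


def read_due(now_utc: datetime) -> list:
    """Scanner tick: one partition read, no scan over other minutes."""
    return list(buckets.get(bucket_key(now_utc), []))
```

The point of the sketch is that the scanner's cost per tick depends on the size of one partition, not on the total number of stored reminders.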

Timezone materialization. User says "9am in America/New_York." Server must compute the UTC equivalent at creation time. But: DST changes mean "9am" maps to different UTC offsets on different dates. For one-time reminders, compute once at creation. For recurring: compute the NEXT instance's UTC at recurrence-expand time, re-compute after each fire.
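
One way the creation-time conversion could look, using Python's standard `zoneinfo` module. The function name `local_to_utc` and the gap policy are assumptions made for this sketch: a wall-clock time that does not exist (inside a spring-forward gap) is interpreted with the pre-transition offset, which lands it after the gap, and the caller is told that an adjustment happened.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_to_utc(local_iso: str, zone_name: str):
    """Return (fire_at_utc, adjusted) for a naive local time in an IANA zone.

    `adjusted` is True when the wall-clock time does not exist in that zone
    (spring-forward gap) and was therefore moved to a valid instant.
    """
    zone = ZoneInfo(zone_name)
    wall = datetime.fromisoformat(local_iso)
    if wall.tzinfo is not None:
        raise ValueError("expected a naive local time; the zone is passed separately")
    fire_at_utc = wall.replace(tzinfo=zone).astimezone(timezone.utc)
    # Round-trip back to local time: a mismatch means the wall time was in a gap.
    round_trip = fire_at_utc.astimezone(zone).replace(tzinfo=None)
    return fire_at_utc, round_trip != wall


utc, adjusted = local_to_utc("2026-04-17T09:00", "America/New_York")
print(utc.isoformat(), adjusted)  # 2026-04-17T13:00:00+00:00 False
```

The example time maps to 13:00 UTC, matching the 202604171300 partition used above. A request for 02:30 on 2026-03-08 in the same zone would come back flagged as adjusted, since clocks jump from 02:00 to 03:00 that morning; whether to shift, reject, or ask the user is a product decision this sketch does not settle.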

Critical rule: store the IANA zone name (America/New_York), NOT the UTC offset (-05:00). UTC offsets change with DST. If you stored offset, a daily 9am reminder set in January (-05:00) would fire at 10am local time in March when clocks spring forward (-04:00).
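
The bug described above is easy to reproduce. The sketch below compares a stored fixed offset with a stored IANA zone for the same "9am daily" reminder on a summer date; the helper names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")
WINTER_OFFSET = timezone(timedelta(hours=-5))  # what a stored "-05:00" amounts to


def fire_utc_with_offset(day: date) -> datetime:
    """9am interpreted with the offset captured in January."""
    return datetime(day.year, day.month, day.day, 9, 0, tzinfo=WINTER_OFFSET).astimezone(timezone.utc)


def fire_utc_with_zone(day: date) -> datetime:
    """9am interpreted with the zone's rules for that specific date."""
    return datetime(day.year, day.month, day.day, 9, 0, tzinfo=NEW_YORK).astimezone(timezone.utc)


summer_day = date(2026, 7, 15)
wrong_local = fire_utc_with_offset(summer_day).astimezone(NEW_YORK)
right_local = fire_utc_with_zone(summer_day).astimezone(NEW_YORK)
print(wrong_local.strftime("%H:%M"), right_local.strftime("%H:%M"))  # 10:00 09:00
```

With the fixed offset the reminder is computed as 14:00 UTC, which is 10:00 on a New York clock in July; with the zone name it is 13:00 UTC, which is the intended 09:00.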

Reminder Fire Sequence (Mermaid)
sequenceDiagram
    participant SC as Scanner
    participant DB as Reminder DB
    participant DD as Dedup (Redis)
    participant D as Dispatcher
    participant CH as Push / SMS / Email
    participant RE as Recurrence expander
    SC->>DB: read partition 202604171300
    DB-->>SC: [reminder_1, reminder_2, ...]
    loop each reminder
        SC->>DD: SETNX reminder_id (dedup)
        alt new (not seen)
            DD-->>SC: 1 (acquired)
            SC->>D: dispatch reminder
            D->>CH: deliver via configured channel
            CH-->>D: ack / fail
            alt recurring
                D->>RE: compute next instance
                RE->>DB: insert next fire_minute_utc partition
            end
            D->>DB: mark delivered
        else already processed
            DD-->>SC: 0 (skip)
        end
    end

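At-least-once delivery. Scanner reads due reminders → dispatches → marks delivered. If the scanner crashes after dispatch but before marking, the next scanner run sees the same reminder and re-dispatches it; hence at-least-once. The dedup store (Redis SETNX with TTL) prevents most duplicates; client-side dedup (by reminder_id) catches the rest. One caveat: because SETNX runs before dispatch, a crash between the two leaves a key that makes the retry skip a reminder that was never sent. To preserve at-least-once, treat the DB "delivered" flag as the source of truth and re-dispatch anything still unmarked once the dedup key expires.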
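
The two dedup layers can be sketched as follows. `DedupStore` is an in-memory stand-in for a Redis set-if-absent with a TTL, and `process_bucket` and `Client` are hypothetical names; none of this is a real client library. Releasing the claim when dispatch raises is an assumption added here so that a failed send can be retried.

```python
import time


class DedupStore:
    """In-memory stand-in for a Redis set-if-absent with expiry."""

    def __init__(self):
        self._expires_at = {}

    def set_nx(self, key, ttl_sec, now=None):
        now = time.monotonic() if now is None else now
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # already claimed within the TTL window
        self._expires_at[key] = now + ttl_sec
        return True

    def delete(self, key):
        self._expires_at.pop(key, None)


def process_bucket(reminders, dedup, dispatch, ttl_sec=600):
    """Dispatch each reminder at most once per TTL window; return dispatched ids."""
    dispatched = []
    for reminder in reminders:
        key = reminder["reminder_id"]
        if not dedup.set_nx(key, ttl_sec):
            continue
        try:
            dispatch(reminder)
        except Exception:
            dedup.delete(key)  # release the claim so a retry can send it
            raise
        dispatched.append(key)
    return dispatched


class Client:
    """Client-side dedup: show each reminder_id once, even if delivered twice."""

    def __init__(self):
        self._seen = set()
        self.shown = []

    def receive(self, reminder):
        if reminder["reminder_id"] in self._seen:
            return
        self._seen.add(reminder["reminder_id"])
        self.shown.append(reminder["title"])
```

Running `process_bucket` twice over the same bucket dispatches only on the first pass. If a duplicate still slips through, for example after the key's TTL has expired, the client's seen-set absorbs it.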

Recurring expansion. After a recurring reminder fires, the recurrence expander computes the next instance from the RRULE + IANA timezone, converts to UTC, and inserts into the appropriate time-bucket partition. Horizon: expand only 1–2 instances ahead (not "every Monday forever").
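
As an illustration of "expand one ahead", here is a sketch limited to the FREQ=DAILY case, written with the standard library only. A real system would hand the full RRULE to a library; `next_daily_fire_utc` is a hypothetical helper that shows only the zone-aware step.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def next_daily_fire_utc(after_utc: datetime, wall_time: time, zone_name: str) -> datetime:
    """Next UTC instant strictly after `after_utc` at which the zone's clock reads `wall_time`."""
    zone = ZoneInfo(zone_name)
    local_date = after_utc.astimezone(zone).date()
    candidate = datetime.combine(local_date, wall_time, tzinfo=zone)
    if candidate.astimezone(timezone.utc) <= after_utc:
        # Today's occurrence has already passed: move one calendar day ahead
        # and look up the offset again for that new date.
        candidate = datetime.combine(local_date + timedelta(days=1), wall_time, tzinfo=zone)
    return candidate.astimezone(timezone.utc)


# Fired at 9am EST on 2026-03-07; clocks spring forward overnight.
fired = datetime(2026, 3, 7, 14, 0, tzinfo=timezone.utc)
nxt = next_daily_fire_utc(fired, time(9, 0), "America/New_York")
print(nxt.isoformat())  # 2026-03-08T13:00:00+00:00
```

The next instance is 23 hours after the previous one in UTC, yet still 09:00 on the user's clock. Working from the stored wall time and calendar date, rather than adding 24 hours to the last UTC instant, is what keeps the reminder DST-safe.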

Interview answer

"Store reminders in Cassandra partitioned by fire_minute_utc. Scanner (leader-elected per shard) wakes every 60 s, reads the current-minute partition, dedup-checks via Redis SETNX, and dispatches to the configured channel (push/SMS/email). At-least-once by design — scanner re-processes unacked reminders. Client dedup by reminder_id. Timezones stored as IANA zone names, not offsets; UTC conversion happens at creation + recurrence expansion. Recurring reminders expand one instance ahead after each fire."

Anti-patterns

🚫
Store timezone as UTC offset (-05:00) instead of IANA name (America/New_York)

DST changes the offset twice a year. A daily 9am reminder created in winter fires at 10am local time in summer.

✓ Better: Store IANA zone + local time. Convert to UTC at computation time using a tz library.
🚫
Full-table scan every minute looking for due reminders

At 100M reminders, scanning the whole table every 60s is ~1.7M rows/sec just for the scan. Unscalable.

✓ Better: Partition by fire_minute_utc. Scanner reads only the one partition that's due now.
🚫
Expand all future instances of "every Monday forever" at creation time

Infinite storage. And if the user edits the recurrence, you have to delete + recreate all future instances.

✓ Better: Expand only the next 1–2 instances. After each fires, compute + insert the next one.
07

Tradeoffs & Design Choices

  • Polling (scanner every N sec) vs timer-wheel / DelayQueue. Polling is simpler and works well for minute-level precision. Timer wheels (Kafka-style) are better for sub-second precision but harder to distribute. For reminders (30 s tolerance): minute-bucket polling wins on simplicity.
  • At-least-once vs exactly-once delivery. Exactly-once across distributed systems is infeasible without 2PC (slow) or app-level dedup. At-least-once + dedup is the pragmatic answer. Client-side dedup is cheap (check reminder_id before displaying).
  • Cassandra vs Postgres for time buckets. Cassandra: partition by fire_minute_utc gives free sharding + fast partition reads. Postgres: range query on fire_at index, simpler ops. At 100M/day, Cassandra is the better fit. At 1M/day, Postgres is fine.
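  • Single scanner leader vs sharded scanners. Single: simpler, no coordination needed, but SPOF. Sharded: each scanner owns a range of minutes (scanner_0 handles even minutes, scanner_1 handles odd, etc.). Leader election per shard via Redis lock. Minute-range sharding gives each scanner more time per bucket but does not split a single hot minute; for that, add a shard suffix to the partition key (for example fire_minute_utc plus a hash of user_id) so several scanners can read the same minute in parallel.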
  • Push vs pull for delivery confirmation. Push to APNs/FCM is fire-and-forget (no delivery guarantee from the platform). SMS has delivery receipts. Email has no reliable read-receipt. Accept: delivery = "we sent it"; display confirmation is the client's problem.
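
The even/odd minute split from the sharded-scanner bullet generalizes to a modulo over the minute number. The function below is a hypothetical sketch of that assignment, not a prescribed scheme.

```python
from datetime import datetime, timezone


def owning_scanner(fire_minute_utc: datetime, scanner_count: int) -> int:
    """Map a fire minute to the index of the scanner that owns it."""
    if scanner_count < 1:
        raise ValueError("scanner_count must be at least 1")
    minutes_since_epoch = int(fire_minute_utc.timestamp()) // 60
    return minutes_since_epoch % scanner_count


first = owning_scanner(datetime(2026, 4, 17, 13, 0, tzinfo=timezone.utc), 2)
second = owning_scanner(datetime(2026, 4, 17, 13, 1, tzinfo=timezone.utc), 2)
print(first, second)  # 0 1
```

With two scanners, consecutive minutes alternate owners, so each scanner has up to two minutes of wall-clock time to drain its bucket.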
08

Failure Modes

Scanner leader dies mid-scan
Scanner crashes after dispatching 50 of 200 reminders in a minute bucket. 150 not fired.
→ Mitigation: leader lock has TTL (e.g., 90 s). On expiry, another scanner takes over and re-reads the partition. Already-dispatched reminders caught by dedup (Redis SETNX). 150 un-dispatched fire on retry.
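
A lease with a TTL can be sketched in a few lines. `Lease` is an in-memory stand-in for the distributed lock, and time is passed in explicitly so the takeover is easy to follow; the 90 s value is the example TTL from the mitigation above.

```python
class Lease:
    """In-memory stand-in for a distributed lock that expires after ttl_sec."""

    def __init__(self, ttl_sec: float = 90.0):
        self.ttl_sec = ttl_sec
        self.owner = None
        self.expires_at = 0.0

    def try_acquire(self, candidate: str, now: float) -> bool:
        """Acquire or renew the lease; succeed only if it is free, expired, or already ours."""
        expired = now >= self.expires_at
        if self.owner is None or self.owner == candidate or expired:
            self.owner = candidate
            self.expires_at = now + self.ttl_sec
            return True
        return False


lease = Lease(ttl_sec=90.0)
print(lease.try_acquire("scanner-a", now=0.0))   # True: lease was free
print(lease.try_acquire("scanner-b", now=30.0))  # False: scanner-a still holds it
print(lease.try_acquire("scanner-b", now=90.0))  # True: scanner-a stopped renewing
```

After the takeover, the new leader re-reads the partition from the start and relies on the dedup check to skip the reminders the old leader had already dispatched.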
🌍
IANA timezone database update
A country changes its DST rules (happens ~yearly somewhere). Existing reminders computed with old rules fire at wrong local time.
→ Mitigation: store local time + IANA zone; recompute UTC on each recurrence expansion. Batch re-index job when tz database updates: scan all reminders in affected zone, recompute fire_at_utc.
📱
Push notification not delivered (phone offline)
APNs/FCM queue the push; phone was off for 3 hours; by then the reminder is irrelevant.
→ Mitigation: set TTL on push (e.g., 1 hour). Fallback channel: if the client app has not acknowledged the push within 5 min, send SMS (APNs/FCM themselves give no delivery guarantee). Channel escalation ladder: push → SMS → email.
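
The escalation ladder can be sketched as an ordered list of channels tried in turn. Each `send` callable here is hypothetical: it is assumed to return True only when delivery was confirmed within that channel's wait window (for push, the 5 min mentioned above).

```python
def deliver_with_escalation(reminder, ladder):
    """Try channels in order; return (channel_that_confirmed_or_None, channels_tried)."""
    tried = []
    for name, send in ladder:
        tried.append(name)
        try:
            confirmed = send(reminder)
        except Exception:
            confirmed = False  # a failing channel just moves us down the ladder
        if confirmed:
            return name, tried
    return None, tried


# Stub senders standing in for push, SMS, and email providers.
ladder = [
    ("push", lambda r: False),  # phone offline: no confirmation in time
    ("sms", lambda r: True),
    ("email", lambda r: True),
]
print(deliver_with_escalation({"reminder_id": "r1"}, ladder))  # ('sms', ['push', 'sms'])
```

A `None` result means every channel was exhausted, which is the case worth recording in the delivery log for follow-up.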
🔁
Duplicate delivery on scanner restart
Scanner re-reads the same minute bucket after crash recovery. User gets the reminder twice.
→ Mitigation: Redis SETNX per reminder_id with 10-min TTL. Client-side dedup: reminder_id already displayed → skip. Belt + suspenders.
🗓️
"Last day of month" recurring reminder
User sets "remind me on the last day of every month." Feb has 28/29, Apr has 30, May has 31. RRULE BYMONTHDAY=-1 handles this — but naive implementations break.
→ Mitigation: use a proper RRULE library (e.g., python-dateutil, rrule.js) that handles BYMONTHDAY=-1 correctly. Never hand-roll date math.
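
When testing a RRULE library's output for FREQ=MONTHLY;BYMONTHDAY=-1, it helps to have independent expected values. The sketch below derives them from the standard library's `calendar.monthrange`; it is meant as a test oracle for that one rule, not as a replacement for a RRULE engine, and `last_days_of_month` is a hypothetical helper name.

```python
import calendar
from datetime import date


def last_days_of_month(year: int) -> list:
    """Last calendar day of each month in `year`, January through December."""
    return [
        date(year, month, calendar.monthrange(year, month)[1])
        for month in range(1, 13)
    ]


print(last_days_of_month(2026)[1])  # 2026-02-28
print(last_days_of_month(2028)[1])  # 2028-02-29 (leap year)
```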
09

Interview Tips

  1. Lead with time-bucketed partitioning. "Cassandra partitioned by fire_minute_utc. Scanner reads one partition per tick." This is the insight.
  2. IANA zone, not UTC offset. Say it explicitly: "we store America/New_York, not -05:00." This is the DST answer and shows you've hit this bug.
  3. At-least-once + dedup is the pattern. Don't propose exactly-once. Acknowledge the duplicate risk; show the mitigation (Redis SETNX + client dedup).
  4. Recurring = expand one ahead. Never pre-generate all future instances. Expand after each fire.
  5. Channel escalation. Push fails silently; SMS is more reliable; email is least timely. Escalation ladder shows product thinking.
11

Evolution

1

MVP — cron job + single Postgres

Cron scans WHERE fire_at <= NOW() AND status = 'pending' every minute. Sends email. Works to ~10K reminders/day.
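
A sketch of what that cron job's query might look like, using an in-memory SQLite database as a stand-in for Postgres so the example is self-contained. The table layout and the `claim_due` helper are assumptions; timestamps are stored as ISO-8601 UTC strings so that string comparison matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reminders (id TEXT PRIMARY KEY, fire_at TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO reminders VALUES (?, ?, ?)",
    [
        ("a", "2026-04-17T13:00:00", "pending"),
        ("b", "2026-04-17T14:00:00", "pending"),
        ("c", "2026-04-17T12:00:00", "sent"),
    ],
)


def claim_due(conn, now_iso: str) -> list:
    """Select pending reminders due at or before now, then mark them sent."""
    rows = conn.execute(
        "SELECT id FROM reminders WHERE fire_at <= ? AND status = 'pending' ORDER BY fire_at",
        (now_iso,),
    ).fetchall()
    ids = [row[0] for row in rows]
    conn.executemany("UPDATE reminders SET status = 'sent' WHERE id = ?", [(i,) for i in ids])
    conn.commit()
    return ids


print(claim_due(conn, "2026-04-17T13:00:30"))  # ['a']
```

This works while the table is small; the query touches an index over every pending row's fire time, which is the cost the time-bucketed design in stage 2 removes.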

2

Time-bucketed Cassandra + scanner leader

Partition by fire_minute_utc. Scanner leader per shard. Multi-channel delivery (push + SMS). ~1M/day.

3

RRULE engine + IANA timezone handling

Proper recurring support. Zone-aware UTC conversion. DST-safe. ~10M/day.

4

Channel escalation + delivery tracking

Push fails → escalate to SMS after 5 min. Delivery log for audit. At-least-once with Redis dedup. ~100M/day.

5

Smart reminders + ML-suggested timing

ML predicts when user is most likely to act on the reminder. "You usually check email at 8:15am; should we remind at 8:10?" Context-aware delivery.
