Count Unique Active Users

01

Problem Statement

"How many unique users were active on our platform yesterday?" A seemingly simple question that becomes an engineering challenge at scale. At 200M DAU generating 6 billion events/day, a naive SELECT COUNT(DISTINCT user_id) would scan billions of rows and take your database down.

"Active" must be defined — we use any user who triggered at least one meaningful event (page view, click, stream, search). "Unique" must be deduplicated across multiple devices, sessions, and servers. Time windows (DAU/WAU/MAU) must be composable — MAU is not the sum of 30 DAUs.

The core tension: exact counting requires O(n) memory — 150 TB for all segments at scale. HyperLogLog reduces this to 108 MB with 0.81% error. The architecture must serve both paths.

Core question: How do you count unique active users across multiple time windows, at massive scale, with tunable accuracy-vs-cost tradeoffs, serving both real-time dashboards and exact batch reports?

02

Requirements

Functional Requirements

Count unique users within a time window — DAU, WAU, MAU, or any custom date range with deduplication across devices and sessions
Support multiple granularities — hourly, daily, weekly, monthly; and arbitrary ranges like "March 10–25"
Segment by dimensions — country, platform (iOS/Android/Web), feature, user cohort — each multiplying the counter count
Expose via API and dashboard — REST API for programmatic access (ad pricing, alerts) and real-time visual dashboard for PMs/execs
Historical trend queries — "Show DAU trend for last 90 days" or "Compare this WAU to same week last year"

Non-Functional Requirements

Scale: Ingest 6B events/day (~70K/sec avg, 210K/sec peak) from 200M DAU
Low-latency reads: Dashboard queries return in < 1 second via pre-computed results
Tunable accuracy: Approximate (HLL, ±0.81%) for dashboards; exact (batch) for financial/investor reports
Fault tolerance: No event loss — Kafka replication factor 3, Flink checkpointing, idempotent producers
Idempotency: Duplicate events must not inflate counts — HLL naturally deduplicates by user_id
Freshness: Real-time path reflects activity within 1–5 minutes
Cost efficiency: Memory must scale sub-linearly — 150 TB exact vs 108 MB HLL for all segments

Key NFR conflict: Accuracy vs Freshness is irreconcilable in a single path — this forces the Lambda Architecture with separate real-time (approximate) and batch (exact) pipelines.

03

Scale Estimation

Every number is derived from stated assumptions. The DAU/MAU ratio of 40% and 30 events/user/day are industry benchmarks for engagement-driven platforms.

200M

Daily Active Users

70K/sec

Avg Events Ingested

210K/sec

Peak Events

3 TB/day

Raw Event Storage

108 MB

HLL Memory (all segments)

150 TB

Exact Set Memory

~1 PB/yr

Annual Event Archive

1.4M×

Memory Reduction (HLL)

Derivation

500M MAU × 40% DAU/MAU = 200M DAU. Each active user generates ~30 events/day → 6B events/day ÷ 86,400 = ~70K/sec (peak 3× = 210K/sec). At 500 bytes/event → 3 TB/day raw storage. HLL: 3 windows × 3,000 segments × 12 KB = 108 MB. Exact: 200M users × 48 bytes × 3 windows × 3,000 segments = 150 TB. The 1.4-million-fold gap forces probabilistic counting for real-time.

Metric	Value	Architectural Signal
Kafka brokers	6+ (RF=3)	~10 TB/broker for 7-day retention
Flink task slots	7–10	~30K events/sec per slot
Redis memory	~25 GB	30 days of hourly HLL sketches
Spark batch (DAU)	3 TB scan	~2 hour job window
Spark batch (MAU)	90 TB scan	Incremental computation required

04

API Design

Event Ingestion — POST /v1/events

        {
  "events": [{
    "event_id":     "evt_8f3a2b91c4d7",   // Client UUID — idempotency key
    "user_id":      "usr_204871",          // Canonical authenticated user ID
    "anonymous_id": "anon_d84f2c9e",       // Pre-login device ID
    "event_type":   "page_view",           // page_view | click | search | stream
    "timestamp":    "2026-04-06T14:23:01Z",// Client-side event time
    "properties": {
      "platform": "ios",
      "country":  "IN",
      "feature":  "search",
      "session_id": "sess_7a2b3c"
    }
  }]
}
// Response: 202 Accepted (async — written to Kafka, not yet counted)
// { "accepted": 98, "rejected": 2, "errors": [...] }
      

Events are batched (50–100 per request) to reduce TCP overhead from 70K to ~1K requests/sec. 202 Accepted (not 200 OK) because processing is async. Client-generated event_id enables idempotent retry. Both user_id and anonymous_id support identity stitching across devices.

Query Unique Counts — GET /v1/metrics/unique-users

        GET /v1/metrics/unique-users
  ?metric=dau
  &start=2026-04-01&end=2026-04-06
  &granularity=daily
  &filters[country]=IN,US
  &filters[platform]=ios
  &mode=approximate        // approximate (HLL) | exact (batch)
  &compare_to=previous_period

// Response:
{
  "mode": "approximate",
  "accuracy": { "standard_error": "0.81%", "method": "hyperloglog" },
  "data": [
    { "date": "2026-04-01", "unique_users": 14832901,
      "comparison": { "date": "2026-03-25", "change_pct": "+4.38%" } },
    ...
  ],
  "freshness_lag": "2m 14s"
}
      

The mode parameter exposes the Lambda dual-path directly. accuracy metadata lets consumers decide if precision is sufficient. freshness_lag shows pipeline delay. The granularity parameter leverages HLL's mergeability — only hourly sketches are stored; daily/weekly/monthly are derived via PFMERGE.

Anomaly Alerts — POST /v1/alerts/rules

        {
  "name": "DAU drop alert",
  "metric": "dau",
  "condition": {
    "type": "percentage_change",
    "threshold": -15,
    "compared_to": "same_day_last_week",
    "window": "1h"
  },
  "notify": { "webhook": "https://hooks.slack.com/..." }
}
      

Alerting is first-class because silent data loss — the most dangerous failure — produces plausible but wrong numbers that nobody investigates unless automated detection catches it.

05

High-Level Architecture

A Lambda Architecture with three paths: real-time (speed), batch (truth), and serving (unified query). Every component exists because a specific scale constraint demanded it.

Request Flow — Step Through

Client SDK→Ingestion API→Kafka→Flink (HLL)→Redis→S3 Archive→Spark (Batch)→Query API→Dashboard

Click Next Step to walk through the request flow.

06

Deep Dive — HyperLogLog vs Exact Counting

The defining data structure choice. HyperLogLog lets you count 500M unique elements in 12 KB of memory with 0.81% standard error — a 1.4-million-fold reduction vs exact sets. Here's why it works.

The Coin-Flip Intuition

Imagine each user flips a coin until they get heads, then reports the number of tails. You only track the maximum tails reported. If the max is R, you've probably seen ~2^R unique people. A run of 20 tails (probability 1/2²⁰ ≈ 1 in a million) means ~1M unique visitors. One number, O(1) memory.

In practice: hash each user_id to a binary string, count leading zeros (equivalent to tails before heads). A single estimator has ~70% error. HLL fixes this with stochastic averaging — split elements into 16,384 buckets (using the first 14 bits of the hash), track max leading zeros per bucket, and combine using the harmonic mean.

Memory

16,384 registers × 6 bits each = 12 KB. Constant regardless of cardinality — works for 100 or 500M elements.

Error

Standard error = 1.04 / √16384 = 0.81%. For 200M DAU: ±1.62M (95% confidence ±3.24M).

The Killer Feature: Mergeability

Two HLL sketches merge by taking the register-wise maximum. The merged sketch equals a single sketch built over the union of both sets, with identical error bounds. This enables:

Hourly → Daily: Merge 24 hourly sketches → DAU
Daily → Monthly: Merge 30 daily sketches → MAU
Cross-partition: Merge 10 Flink task slot sketches → global count
Arbitrary ranges: "Users from March 10–25" → merge 16 daily sketches

Limitation: HLL supports union (merge) but NOT intersection or difference. "Users on iOS AND Android" requires Theta Sketches (Apache DataSketches) which support all set operations at ~32 KB, 2.7× more memory.

sequenceDiagram participant SDK as Client SDK participant API as Ingestion API participant K as Kafka participant F as Flink participant R as Redis SDK->>API: POST /v1/events (batch of 50) API->>K: Produce to partition hash(user_id) % 64 K->>F: Consume event F->>F: hash(user_id) → bucket #6746 F->>F: Leading zeros = 5 → Register[6746] = max(old, 5) Note over F: Repeat for each segment:
global, country:IN, platform:ios F->>R: PFMERGE hourly HLL sketches (every 30s) Note over R: PFCOUNT → approximate DAU
PFMERGE 24 hourly → daily

Comparison Table

Structure	Memory (200M, 1 ctr)	Memory (9K ctrs)	Accuracy	Merge	Intersect
HashSet	9.6 GB	150 TB	Exact	Yes (slow)	Yes
Roaring Bitmap	200 MB–1 GB	~9 TB	Exact	Yes (fast)	Yes
HyperLogLog	12 KB	108 MB	±0.81%	Yes (trivial)	No
Theta Sketch	32 KB	288 MB	±1.4%	Yes	Yes
Bloom Filter	250 MB	2.25 TB	N/A (not a counter)	Yes	≈ Yes

Redis HLL in Practice

        # Add user to hourly sketch
PFADD hll:hourly:2026040614:country:IN "usr_204871"
→ (integer) 1   # sketch modified — new element

# Same user again — no change
PFADD hll:hourly:2026040614:country:IN "usr_204871"
→ (integer) 0   # already counted!

# Merge 24 hourly sketches → daily
PFMERGE hll:daily:20260406:country:IN \
    hll:hourly:2026040600:country:IN \
    ... \
    hll:hourly:2026040623:country:IN

# Get DAU
PFCOUNT hll:daily:20260406:country:IN
→ (integer) 14832901   # ±0.81%
      

07

Simpler pipeline. But unresolved anonymous IDs permanently overcount. No correction path. Error compounds over time.

08

What Can Go Wrong

🔴 Silent Data Loss — The #1 Threat

No crash, no error, but events silently stop flowing. A broken SDK, misconfigured load balancer, or Kafka retention misconfiguration causes DAU to drift down without alerts. Defense: End-to-end canary events (synthetic test users verified at the output), volume-based anomaly detection per platform/country, cross-path consistency checks (real-time vs batch divergence > 3% triggers investigation).

Kafka Broker Crash

Leader partitions become unavailable for 5–30s during election. At 1,100 events/sec per partition, ~55K–330K events queue during failover. Defense: Replication factor 3 ensures no data loss. Idempotent producer (enable.idempotence=true) prevents duplicates on retry. Auto-recovery via leader election.

Flink Task Slot Crash

In-memory HLL state for current hour is lost. ~10% of events stop flowing into sketches. Defense: Checkpointing to S3 every 1–2 minutes. On crash, Flink restores from checkpoint and replays events from Kafka. HLL idempotency means replayed events don't double-count. Recovery time: 30–120 seconds.

Redis Cluster Failure

Dashboard queries fail. No permanent data loss (Flink still processes and checkpoints). Defense: Redis replicas with auto-failover (< 30s). Query API falls back to batch results from PostgreSQL with "data_source": "batch_fallback" flag. Flink buffers HLL updates and replays on Redis recovery.

Identity Stitching Failure

Service returns stale mappings or times out. Users counted under both anonymous and authenticated IDs → DAU inflated 5–15%. Dangerous because DAU going UP looks like good news. Defense: Circuit breaker at 5% error rate. Fallback to user_id-only counting (undercounts, which is safer). Batch reconciliation catches within 24h.

Clock Skew / Timestamp Corruption

Misconfigured device clocks send events timestamped in 2024 or 2030. With event-time windowing, these corrupt wrong time windows. Defense: Reject events where |client_timestamp - server_received_at| > 7 days. Watermark allowed lateness of 24h. Monitor clock skew distribution — alert if >1% of events have skew > 1 hour.

Spark Batch Job Failure

Executor OOM, S3 timeout, or logic bug produces incorrect exact counts. Defense: Sanity checks before commit: DAU within 20% of previous day, within 5% of HLL estimate, MAU ≥ WAU ≥ DAU. Failed checks write to staging table, not production. Immutable Parquet input enables safe re-runs.

09

Interview Tips

💡

Lead with "What does active mean?"
This clarifying question signals seniority. The definition (login vs event vs session duration) changes the entire data pipeline. Junior candidates jump straight to HyperLogLog; senior candidates nail the requirements first.

⚡

Derive HLL from the memory calculation
Proactively compute: "200M users × 48 bytes = 9.6 GB per set. With 3,000 segments × 3 windows = 150 TB. HLL brings this to 108 MB — a 1.4M× reduction at 0.81% error." This back-of-envelope is the most impressive moment in the interview.

🎯

Name Lambda, then critique it
Say: "This is a Lambda Architecture." Then immediately: "The cost is two codepaths and divergent counts. If exact isn't needed, I'd simplify to Kappa." Naming + critiquing + offering alternatives = senior-level signal.

🔑

HLL mergeability is your trump card
"I store hourly HLL sketches because HLL merge is lossless — 24 hourly sketches merge to DAU, 720 to MAU, with identical accuracy. Only hourly granularity is computed; all coarser windows are derived."

🛡️

Preempt the intersection limitation
Before the interviewer asks: "HLL only supports union, not intersection. For 'users on iOS AND Android,' I'd use Theta Sketches from Apache DataSketches — 32 KB, support union + intersect + difference."

🚨

Silent data loss is your differentiator
"The most dangerous failure isn't a crash — it's silent data loss. A broken SDK drops 15% of events and nobody notices. My defense: end-to-end canary events from synthetic test users, verified at the output."

10

Evolution

How this design grows from MVP to planet-scale. Add complexity only when a specific constraint forces it.

1

MVP — PostgreSQL + Cron (0–1M DAU)

SELECT COUNT(DISTINCT user_id) FROM events WHERE timestamp >= CURRENT_DATE. Runs hourly on the production DB. Simple, exact, and fast enough. Breaks when analytics queries compete with production traffic.

2

Read Replica + Redis HLL (1M–10M DAU)

Dedicated read replica for analytics. Redis PFADD per event for real-time approximate DAU on the dashboard. First introduction of HyperLogLog. Total HLL memory: a few hundred KB.

3

Streaming Pipeline (10M–50M DAU)

Kafka + Flink replaces direct Redis writes. S3 archival + weekly Spark job for exact counts. Birth of Lambda Architecture. First "dashboard says X, report says Y" moment — add mode=approximate|exact API parameter.

4

Full Production (50M–500M DAU)

Identity stitching service. 3,000+ segments. Anomaly detection with canary events. Redis Cluster for 25 GB of hourly HLL sketches. Incremental MAU computation with Roaring Bitmaps. Cross-path consistency checks.

5

Planet Scale — Multi-Region (500M+ DAU)

Regional Kafka + Flink clusters. Global HLL merge aggregator (associative + commutative — order doesn't matter). Tiered storage (S3 Standard → Glacier). HLL precision tuning per segment. Evolving toward streaming-first with batch as audit-only.

📺

References & Videos

HyperLogLog Explained

ByteByteGo · 8 min

HyperLogLog in Redis

Redis Docs

Problem Statement

Requirements

Functional Requirements

Non-Functional Requirements

Scale Estimation

Derivation

API Design

High-Level Architecture

Deep Dive — HyperLogLog vs Exact Counting

The Coin-Flip Intuition

Memory

Error

The Killer Feature: Mergeability

Comparison Table

Redis HLL in Practice

Key Design Decisions & Tradeoffs

Lambda vs Kappa Architecture

Lambda (Dual-Path)

Kappa (Single Stream)

HyperLogLog vs Theta Sketch

HLL (Primary) + Theta (Secondary)

Theta Sketch Everywhere

Event-Time vs Processing-Time Windows

Event-Time (Client Timestamp)

Processing-Time (Server Clock)

Kafka Partition Key

Partition by user_id

Random (Round-Robin)

Redis Update Model

Push from Flink (Batched)

Direct PFADD per Event

Identity Resolution Strategy

Best-Effort Real-Time + Batch Reconciliation

Real-Time Only

What Can Go Wrong

🔴 Silent Data Loss — The #1 Threat

Kafka Broker Crash

Flink Task Slot Crash

Redis Cluster Failure

Identity Stitching Failure

Clock Skew / Timestamp Corruption

Spark Batch Job Failure

Interview Tips

Similar Problems

Real-Time Analytics

Ad Impression Dedup

Rate Limiter

Distributed Logging

Leaderboard / Top-K

Message Queue (Kafka)

Evolution

MVP — PostgreSQL + Cron (0–1M DAU)

Read Replica + Redis HLL (1M–10M DAU)

Streaming Pipeline (10M–50M DAU)

Full Production (50M–500M DAU)

Planet Scale — Multi-Region (500M+ DAU)

References & Videos

Real-Time Analytics

Ad Impression Dedup

Gossip Protocols