Reddit

1.7 billion monthly visits across millions of subreddit communities. The hard parts: a Hot ranking algorithm that decays old posts and surfaces fresh content without scanning every post on every page load, nested comment trees stored efficiently for fast subtree retrieval and sorted by Wilson score, and a vote counting pipeline that handles 50K votes/sec without hot-row contention or data loss. Upvotes, downvotes, karma, moderation, subreddit-scoped feeds -- same bones as any community platform, but at massive scale.

Core: Hot Ranking + Comment Trees + Vote Pipeline · 1.7B monthly visits · 100K posts/day · 2M comments/day · 50K vote QPS peak
02

Requirements

Functional
  • Users create subreddit-scoped posts (text, link, image, video)
  • Upvote / downvote on posts and comments; net score drives ranking
  • Nested comment trees with sorting (best, top, new, controversial)
  • Home feed = posts from subscribed subreddits; Popular feed = global trending
  • User karma accumulated from votes on their content
  • Subreddit moderation tools: remove posts, ban users, set rules, automod
  • Full-text search across posts and comments
Non-Functional
  • Feed latency < 200 ms for hot/top pages
  • Vote ingestion at 50K QPS peak with no lost votes
  • Comment tree renders in < 100 ms for threads with 10K+ comments
  • Eventual consistency acceptable for vote counts (< 5 s lag)
  • Search index lag < 30 seconds after post creation
  • High availability: 99.9% uptime across all read paths
03

Scale Estimation

  • Monthly active visits: ~1.7B (~430M unique users; heavy read:write ratio, ~100:1)
  • Posts created / day: ~100K (across millions of subreddits; power-law distribution)
  • Comments / day: ~2M (avg ~20 comments per post; viral threads reach 50K+)
  • Vote QPS (peak): ~50K (bursty during front-page events; avg ~10K QPS)
  • Feed reads / sec: ~200K (home + popular + subreddit feeds combined)
  • Storage (posts + comments): ~50 TB (text-heavy; images/video stored in object storage)
  • Active subreddits: ~100K of ~3M total (power-law: top 1K subs drive 80% of traffic)
  • Avg subscriptions / user: ~50 (determines fan-out factor for home feed construction)
04

API Design

POST /r/{subreddit}/posts

Create a new post. Body: {title, type: text|link|image, body, url}. Returns {post_id, created_at}. Subreddit rules validated server-side.

POST /posts/{id}/vote

Upvote or downvote. Body: {direction: 1|-1|0}. Idempotent per user+post. Returns {new_score}. Direction 0 = remove vote.

GET /r/{subreddit}/hot?after=CURSOR

Paginated hot feed for a subreddit. Returns {posts[], next_cursor}. Served from pre-computed feed cache. Cursor-based pagination for stable ordering.

GET /posts/{id}/comments?sort=best

Fetch comment tree for a post. Sort by best (Wilson score), top, new, controversial. Returns nested {comments[{id, body, score, children[]}]}. Depth-limited to ~10 levels; deeper threads require "continue this thread" link.

POST /posts/{id}/comments

Add a comment. Body: {parent_id, body}. If parent_id is null, top-level comment. Returns {comment_id, path}. Depth capped at 10 levels.

GET /home/feed?after=CURSOR

Home feed for authenticated user. Merges top posts from subscribed subreddits ranked by hot score. Cursor-based pagination. Returns {posts[], next_cursor}.

GET /search?q=QUERY&subreddit=SUB&sort=relevance

Full-text search across posts and comments. Optional subreddit scoping. Sort by relevance (BM25 + engagement) or new. Returns {results[], total_count}.

POST /r/{subreddit}/mod/remove

Moderator removes a post or comment. Body: {target_id, reason}. Removes from feed cache immediately. Logged to public mod log.

05

Architecture

Read-heavy architecture with service-per-domain: Post Service owns post CRUD and writes to Postgres sharded by subreddit for feed query locality. Vote Service handles vote ingestion via Redis (INCR for fast counts, SET for dedup) and publishes events to Kafka. Comment Service manages tree storage using materialized paths in Postgres (or Cassandra for write-heavy threads). Feed Service maintains pre-computed Redis sorted sets per subreddit, rebuilt on vote aggregation events. Search Service consumes from Kafka and indexes into Elasticsearch. Kafka is the central nervous system -- it decouples the write-heavy vote path from the read-heavy feed path, enables replay on failure, and drives async indexing.

[Figure: Reddit architecture. Entry: Clients (web + mobile), API Gateway (auth + rate-limit). Services: Post Service (CRUD + validation), Vote Service (dedup + count), Comment Service (tree storage), Feed Service (ranked feeds), Search Service (full-text). Stores: Post DB (Postgres, sharded), Redis votes (INCR + SET dedup), Comment DB (materialized path), Feed Cache (Redis sorted sets), Elasticsearch (posts + comments). Async path: Kafka (votes + feed fan-out), Vote Aggregator (Kafka consumer), Postgres votes (durable ledger).]
[Figure: request flow, in order. Client (web / mobile) · API Gateway (auth + routing) · Post Service (CRUD + rules) · Vote Service (dedup + INCR) · Comment Service (tree storage) · Feed Service (ranked feeds) · Search (Elasticsearch)]
06

Deep Dive — Hot Ranking, Comment Trees, Vote Pipeline

(a) Hot Ranking Algorithm. Reddit's classic formula:

hot_score = log10(max(|ups - downs|, 1)) * sign(ups - downs)
            + (post_created_utc - epoch) / 45000

The key insight: the time component is absolute, not relative. A post from today starts with a higher baseline than one from yesterday. No need to re-score old posts -- they naturally sink as newer posts arrive with higher time baselines. Dividing by 45000 seconds means every 12.5 hours of recency is worth the same as a 10x increase in net votes. The log of votes means the first 10 upvotes matter as much as the next 90 -- this prevents runaway posts from dominating forever. The sign function means net-downvoted posts rank below zero-vote posts of the same age.

This makes the algorithm O(1) per post at write time (recompute the score on each vote) with no ongoing maintenance -- no background recalculation, no scanning all posts. The score is stored in a Redis sorted set per subreddit. Feed reads are ZREVRANGE -- O(log N + K), where N is the number of posts in the set and K is the page size (25).

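# Compute the hot score on each vote (and at post creation)
import math

REDDIT_EPOCH = 1134028003  # Unix seconds; reference point for the time term

def hot_score(ups, downs, created_utc):
    score = ups - downs
    order = math.log10(max(abs(score), 1))  # log-damped vote magnitude
    sign = 1 if score > 0 else -1 if score < 0 else 0
    seconds = created_utc - REDDIT_EPOCH  # absolute, so it never needs refreshing
    return round(sign * order + seconds / 45000, 7)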
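To make the 45000-second divisor concrete, here is a small worked example. It restates the function so the block runs on its own; the timestamps and vote counts are made-up values chosen so the arithmetic is exact.

```python
import math

REDDIT_EPOCH = 1134028003  # same constant as in the formula above


def hot_score(ups: int, downs: int, created_utc: int) -> float:
    score = ups - downs
    order = math.log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    return round(sign * order + (created_utc - REDDIT_EPOCH) / 45000, 7)


# Illustrative timestamps (assumptions, not real posts).
older = REDDIT_EPOCH + 45000 * 10000  # an arbitrary creation time
newer = older + 2 * 45000             # created 25 hours later

# 25 hours of recency equals two orders of magnitude of net votes:
# a 1000-point post ties with a 10-point post created 25 hours later.
old_viral = hot_score(1000, 0, older)
new_small = hot_score(10, 0, newer)
print(old_viral, new_small)

# With equal votes, the newer post always wins without any re-scoring.
print(hot_score(50, 0, newer) > hot_score(50, 0, older))

# A net-negative post ranks below an untouched post of the same age.
print(hot_score(0, 5, older) < hot_score(0, 0, older))
```

Both scores in the first comparison come out to 10003.0, which is why an old post needs ten times the votes for every additional 12.5 hours it wants to stay level with fresh content.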

(b) Comment Tree Storage — Materialized Path. Each comment stores its full ancestry path, e.g., /root/abc/def/self.

-- Fetch entire subtree of comment 'abc' in one query:
SELECT * FROM comments
WHERE post_id = :post_id
  AND path LIKE '/root/abc/%'
ORDER BY path;

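This avoids the N+1 recursive queries of an adjacency list. The lookup is served by a B-tree index on (post_id, path); in Postgres, a LIKE 'prefix%' predicate can use a B-tree index only if the column uses the C collation or the index is declared with text_pattern_ops. ORDER BY path returns rows in depth-first order; siblings at each depth level are then re-sorted in the application by the Wilson score confidence interval (lower bound of the 95% CI on upvote proportion):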

-- Wilson score lower bound (simplified)
-- n = total votes, p = fraction of upvotes, z = 1.96 (95% CI)
wilson = (p + z*z/(2*n) - z * sqrt((p*(1-p) + z*z/(4*n))/n))
         / (1 + z*z/n)

This correctly handles the "1 upvote / 0 downvote vs 100 upvotes / 50 downvotes" ranking problem. A comment with few but all-positive votes ranks appropriately against a controversial comment with many total votes. The score is computed at vote time and stored alongside the comment for fast retrieval.
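The two ideas above can be combined in a short sketch: compute the Wilson lower bound, then nest the flat rows returned by the path query and sort siblings by that bound. The `Comment` class, the `build_tree` helper, and the convention of scoring zero-vote comments as 0.0 are assumptions made for illustration, not Reddit's actual code.

```python
import math
from dataclasses import dataclass, field


def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson interval on the upvote proportion."""
    n = ups + downs
    if n == 0:
        return 0.0  # assumed convention: unvoted comments sort last
    p = ups / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


@dataclass
class Comment:  # hypothetical row shape: path plus raw vote counts
    path: str
    ups: int
    downs: int
    children: list = field(default_factory=list)


def build_tree(rows: list) -> list:
    """Nest flat rows by materialized path, then sort each sibling group."""
    by_path = {}
    roots = []
    # A parent's path is a prefix of its child's, so it sorts first.
    for row in sorted(rows, key=lambda r: r.path):
        by_path[row.path] = row
        parent = by_path.get(row.path.rsplit("/", 1)[0])
        (parent.children if parent else roots).append(row)

    def sort_level(nodes: list) -> None:
        nodes.sort(key=lambda c: wilson_lower_bound(c.ups, c.downs), reverse=True)
        for node in nodes:
            sort_level(node.children)

    sort_level(roots)
    return roots


rows = [
    Comment("/root/a", ups=1, downs=0),
    Comment("/root/b", ups=100, downs=50),
    Comment("/root/b/c", ups=2, downs=0),
]
tree = build_tree(rows)
print([c.path for c in tree])
```

With these inputs the 100/50 comment (bound of roughly 0.59) sorts ahead of the 1/0 comment (roughly 0.21), which is the ordering the paragraph above describes.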

(c) Vote Counting Pipeline. 50K votes/sec cannot hit Postgres directly (hot-row contention). Three-layer approach:

  1. Dedup: Redis SET per post (SADD post:{id}:voters {user_id}). If already a member, reject the duplicate vote. For high-cardinality posts, a Bloom filter can replace the set, at the cost of occasionally rejecting a legitimate vote on a false positive.
  2. Fast counter: Redis INCR post:{id}:score (DECR for a downvote). Sub-ms. This is the counter used for hot ranking and display.
  3. Durability: Vote event published to Kafka. A consumer aggregates batches (every 5 s or 1000 votes) and flushes to Postgres. On Redis crash, Postgres is the recovery source.
Vote Pipeline Sequence (Mermaid)
sequenceDiagram
    participant U as User
    participant VS as Vote Service
    participant R as Redis
    participant K as Kafka
    participant A as Aggregator
    participant PG as Postgres
    participant FS as Feed Service
    U->>VS: POST /posts/{id}/vote {dir:1}
    VS->>R: SADD post:123:voters user42
    R-->>VS: 1 (new vote)
    VS->>R: INCR post:123:score
    R-->>VS: 1547
    VS->>K: VoteEvent{post:123, user:42, dir:1}
    VS-->>U: {new_score: 1547}
    K->>A: batch of vote events
    A->>PG: INSERT votes + UPDATE post score
    A->>FS: trigger feed re-rank for affected subreddit
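The API allows a vote to be flipped or removed (direction 1, -1, or 0), and a plain set of voter IDs cannot tell an upvote from a downvote. One way to handle this, sketched below, is to remember each user's previous direction and apply only the difference to the counter. The `VoteStore` class is a hypothetical in-memory stand-in: its dictionaries play the role of a per-post Redis hash and counter, and its `events` list plays the role of the Kafka topic.

```python
from collections import defaultdict


class VoteStore:
    """In-memory stand-in for the Redis + Kafka layers (illustration only)."""

    def __init__(self) -> None:
        self.directions = defaultdict(dict)  # post_id -> {user_id: 1 or -1}
        self.scores = defaultdict(int)       # post_id -> net score
        self.events = []                     # stand-in for the vote topic

    def vote(self, post_id: str, user_id: str, direction: int) -> int:
        if direction not in (1, -1, 0):
            raise ValueError("direction must be 1, -1, or 0")
        previous = self.directions[post_id].get(user_id, 0)
        delta = direction - previous
        if delta == 0:
            # Repeat of the current state: idempotent, nothing to publish.
            return self.scores[post_id]
        if direction == 0:
            del self.directions[post_id][user_id]  # vote removed
        else:
            self.directions[post_id][user_id] = direction
        self.scores[post_id] += delta
        self.events.append({"post": post_id, "user": user_id, "dir": direction})
        return self.scores[post_id]


store = VoteStore()
print(store.vote("p1", "u42", 1))   # first upvote
print(store.vote("p1", "u42", 1))   # duplicate, score unchanged
print(store.vote("p1", "u42", -1))  # flip to downvote moves the score by 2
print(store.vote("p1", "u42", 0))   # removal returns the score to zero
```

In a real deployment the read of the previous direction and the counter update would need to be atomic (for example a server-side script or transaction); this sketch ignores concurrency.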

Moderation Pipeline. Each subreddit has moderators with tools to remove posts, ban users, and configure AutoMod rules (regex-based filters). Moderation actions are synchronous -- a removed post disappears from the feed immediately (delete from Redis sorted set). AutoMod runs as a pre-save hook on Post Service: if a post matches any rule, it is flagged or auto-removed before entering the feed pipeline. Mod actions are audit-logged to a separate Postgres table for transparency (mod log is public on Reddit).
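As a rough sketch of the pre-save hook described above, the function below runs a list of regex rules against a post and returns the first matching action. The `Rule` shape, the rule names, and the patterns are invented for illustration; real AutoMod configuration is richer than this.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:  # hypothetical rule shape
    name: str
    pattern: str
    action: str  # "flag" or "remove"


def automod_check(title: str, body: str, rules: list) -> tuple:
    """Return (action, rule_name) for the first matching rule, else ("approve", None)."""
    text = f"{title}\n{body}"
    for rule in rules:
        if re.search(rule.pattern, text, flags=re.IGNORECASE):
            return rule.action, rule.name
    return "approve", None


# Example rules (assumptions for this sketch).
RULES = [
    Rule("no-shorteners", r"\bbit\.ly/", "remove"),
    Rule("spam-phrase", r"free\s+crypto", "flag"),
]

print(automod_check("Hello", "see bit.ly/abc", RULES))
print(automod_check("FREE   Crypto now", "", RULES))
print(automod_check("Weekly thread", "What are you working on?", RULES))
```

Because the check runs before the post is saved, a post that returns "remove" never reaches the feed sorted set, so there is nothing to delete afterwards.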

Home Feed Construction. A user subscribed to 50 subreddits needs a merged, ranked feed. Two strategies:

  • Fan-out on write: when a post is created, push its ID to every subscriber's feed (Redis list). Fast reads, but a subreddit with 10M subscribers = 10M writes per post. Only viable for small subreddits.
  • Fan-out on read: at read time, fetch the top N posts from each subscribed subreddit's sorted set, merge-sort by hot score, return top 25. Slower reads, but no write amplification. Used for large subreddits.

Reddit's hybrid: small subreddits (< 100K subscribers) fan-out on write; large subreddits fan-out on read. The Feed Service keeps a per-subreddit flag (for example, one bit per subreddit ID in a bitmap) to decide which strategy applies.
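The fan-out-on-read path is a k-way merge of lists that are already sorted. A minimal sketch, assuming each subreddit's top posts arrive as (hot_score, post_id) pairs in descending order (the shape a ZREVRANGE with scores would give); `merge_feeds` and the sample data are hypothetical.

```python
import heapq
from itertools import islice


def merge_feeds(per_subreddit: dict, limit: int = 25) -> list:
    """Merge per-subreddit lists (each sorted by hot score, descending)."""
    merged = heapq.merge(
        *per_subreddit.values(),
        key=lambda item: item[0],  # compare on hot score only
        reverse=True,              # inputs and output are descending
    )
    # heapq.merge is lazy, so only the first `limit` items are ever pulled.
    return [post_id for _, post_id in islice(merged, limit)]


feeds = {
    "r/python": [(9.0, "py1"), (3.0, "py2")],
    "r/golang": [(7.0, "go1"), (5.0, "go2")],
    "r/rust": [(8.0, "rs1")],
}
print(merge_feeds(feeds, limit=3))
```

Since the merge stops after `limit` items, the cost depends on the page size and the number of subscribed subreddits, not on how many posts each subreddit holds.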

Search. Posts and comments are indexed into Elasticsearch via Kafka consumers. Index structure: one index per month (time-based rollover for efficient retention). Queries support subreddit scoping, author filtering, date ranges, and flair-based filtering. Autocomplete uses an edge-ngram tokenizer on post titles. Search ranking combines text relevance (BM25) with engagement signals (vote count, comment count) to surface popular results over obscure matches.
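One possible way to express "BM25 plus engagement" is an Elasticsearch function_score query, built here as a plain Python dict. The field names (`title`, `body`, `subreddit`, `ups`) and the choice of a log1p modifier are assumptions for this sketch; the engagement field is a non-negative upvote count because field_value_factor cannot produce negative scores.

```python
def build_search_query(text: str, subreddit: str = None, size: int = 25) -> dict:
    """Build a query body that adds a damped engagement boost to text relevance."""
    filters = []
    if subreddit is not None:
        # Filters restrict results without affecting the relevance score.
        filters.append({"term": {"subreddit": subreddit}})
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": [
                            {"multi_match": {"query": text, "fields": ["title^2", "body"]}}
                        ],
                        "filter": filters,
                    }
                },
                # log1p keeps a 100K-upvote post from drowning out relevance.
                "field_value_factor": {"field": "ups", "modifier": "log1p", "missing": 0},
                "boost_mode": "sum",  # final score = BM25 score + log1p(ups)
            }
        },
    }


query = build_search_query("sorted set pagination", subreddit="redis")
print(query["query"]["function_score"]["boost_mode"])
```

The relative weight of relevance versus engagement is a tuning decision; a sum with a log-damped factor is only one reasonable starting point.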

Karma Calculation. A user's karma = sum of net votes on all their posts and comments, with diminishing returns per post (first 10 upvotes count fully, next 100 count ~50%, etc.). Karma is computed async by a Kafka consumer that listens to vote events. The running total is cached in a Redis hash per user (HINCRBY user:42:karma post_karma 1). Displayed karma may lag actual karma by up to 60 seconds -- acceptable for a vanity metric. Reddit also separates post karma from comment karma, stored as two fields in the same hash.

Data Model Summary:

posts:    {id, subreddit_id, author_id, title, body, type, score, hot_score, created_at}
comments: {id, post_id, author_id, body, path, wilson_score, ups, downs, created_at}
votes:    {user_id, target_id, target_type, direction, created_at}  -- PK: (user_id, target_type, target_id), since post and comment IDs may collide
users:    {id, username, post_karma, comment_karma, created_at}
Interview answer

"Posts are ranked by Reddit's Hot formula: log10(votes) + time_created/45000. Time is absolute so old posts decay without re-scoring. Comments use materialized path storage -- one indexed LIKE query fetches an entire subtree, sorted by Wilson score. Votes go through Redis for dedup (SET) and counting (INCR), then Kafka to a batch aggregator that flushes to Postgres. Redis is the fast path for reads; Postgres is the durable source of truth. Feed is a pre-computed Redis sorted set per subreddit, updated when vote aggregation completes. Home feed uses a hybrid fan-out: write-based for small subs, read-based merge-sort for large subs."

Anti-patterns

1
Recalculate Hot ranking by scanning all posts on every feed load

O(N) per page view. With millions of posts per subreddit, this means full table scans on every request. At 200K feed reads/sec, your DB is dead.

Better: Pre-compute hot score at write time (vote or post creation). Store in a Redis sorted set per subreddit. Feed read = ZREVRANGE -- O(log N + K).
2
Store comment tree as adjacency list + recursive query (parent_id FK)

Fetching a 500-comment thread = 500 queries (N+1 problem). A recursive CTE reduces this to one query, but it still performs one join pass per nesting level and gets slow on large, deeply nested threads.

Better: Materialized path ("/root/abc/def"). One LIKE query fetches entire subtree. Index on (post_id, path) makes it fast.
3
Single vote counter row per post in Postgres (UPDATE posts SET score = score + 1)

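Hot-row contention. Thousands of concurrent UPDATEs on one viral post's row form a row-lock convoy: each transaction waits for the previous one to commit. Latency spikes to seconds and lock-wait timeouts become common.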

Better: Redis INCR for the hot path (single-threaded, no contention). Kafka + batch aggregator writes to Postgres periodically.
4
Build home feed by querying each subscribed subreddit's DB shard at read time

User subscribed to 50 subreddits = 50 cross-shard queries per feed load. At 200K feed reads/sec, this is 10M shard queries/sec. Latency explodes.

Better: Pre-compute home feed using fan-out on write for small subs + cached merge-sort for large subs. One Redis read per feed request.
07

Tradeoffs & Design Choices

  • Pre-computed feed (fast read, stale) vs real-time query (fresh, slow). Pre-computed: feed reads are a single O(log N + K) range read from a Redis sorted set, but a new viral post may take 5-10 s to appear. Real-time: always fresh, but requires scoring all posts on every read. Reddit chose pre-computed -- staleness of a few seconds is acceptable for a feed; latency is not.
  • Materialized path vs nested set vs adjacency list for comments. Materialized path: easy subtree queries, easy inserts, path length grows with depth (cap at ~10 levels). Nested set: fast reads but O(N) on insert (must re-number). Adjacency list: simplest schema but N+1 reads. Materialized path wins for Reddit's read-heavy + moderate-write pattern.
  • Redis vote counter (fast, lossy on crash) vs Postgres (durable, slower). Hybrid: Redis for the hot path + Kafka for durability. On Redis failure, replay Kafka or recount from Postgres. Accept ~5 s staleness in exchange for 50K QPS throughput.
  • Fan-out on write (push to all subscriber feeds) vs fan-out on read (pull at read time). Reddit uses a mix: popular subreddits fan-out on read (too many subscribers); niche subreddits fan-out on write. Same hybrid approach as Twitter.
  • Wilson score vs raw vote count for comment sorting. Raw count favors old comments (more time to accumulate votes). Wilson score confidence interval normalizes for sample size -- a comment with 5/5 upvotes ranks higher than 100/200. Fairer for new comments.
  • Sharding strategy: by subreddit vs by post ID. Subreddit sharding keeps all posts for a feed query on one shard (locality). Post ID sharding distributes writes evenly but requires scatter-gather for feed reads. Reddit shards posts by subreddit for read locality; votes by post ID for write distribution.
  • Comment pagination: depth-first vs breadth-first. Depth-first (show full thread chains) is Reddit's default -- users follow conversation threads naturally. Breadth-first (show all top-level first) works for flat comment sections like YouTube. Reddit's "best" sort uses depth-first with Wilson-scored ordering at each level.
  • Subreddit feed TTL: short (5 s) vs long (60 s). Short TTL means fresher content but higher cache miss rate. Long TTL reduces load but users see stale feeds. Reddit uses ~60 s for most subreddits, ~5 s for extremely active ones (r/popular during major events).
  • Vote dedup: Redis SET vs Bloom filter. Redis SET (SADD post:123:voters user_id) is exact but uses ~64 bytes per entry. For a post with 1M votes, that is 64 MB. A Bloom filter needs ~1.2 MB for 1M entries at a 1% false positive rate. Tradeoffs: a false positive rejects a legitimate first vote, and a Bloom filter cannot support vote removal or changes (changing an upvote to a downvote requires knowing the previous state). Reddit uses Redis SET for active posts and archives to compressed Postgres for old posts.
  • Single Elasticsearch cluster vs per-subreddit sharding. Single cluster is simpler to operate but creates hot shards for popular subreddits. Index-per-month with subreddit-aware routing balances load while keeping operational complexity manageable. Reddit uses time-based indices with custom routing.
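The Bloom filter size quoted in the vote dedup bullet follows from the standard sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The short calculation below reproduces it; the 64 bytes per set entry is the estimate from the text, not a measured value.

```python
import math


def bloom_size(n_items: int, false_positive_rate: float) -> tuple:
    """Return (bits, hash_count) for an optimally sized Bloom filter."""
    bits = math.ceil(-n_items * math.log(false_positive_rate) / math.log(2) ** 2)
    hash_count = round(bits / n_items * math.log(2))
    return bits, hash_count


voters = 1_000_000
bits, hashes = bloom_size(voters, 0.01)
bloom_bytes = bits / 8
set_bytes = voters * 64  # per-entry estimate taken from the text

print(f"Bloom filter: {bloom_bytes / 1e6:.2f} MB with {hashes} hash functions")
print(f"Redis SET:    {set_bytes / 1e6:.0f} MB")
print(f"Ratio:        {set_bytes / bloom_bytes:.0f}x")
```

At a 1% error rate the filter costs about 9.6 bits per voter regardless of how long the user IDs are, which is where the roughly 50x saving over the exact set comes from.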
08

Failure Modes

1
Vote manipulation (bot armies)
Coordinated bots upvote a post thousands of times to game the front page. Undermines content integrity.
Mitigation: rate-limit votes per account + IP; shadow-ban suspicious accounts (votes are accepted but not counted); ML anomaly detection on voting patterns; account age + karma thresholds for vote weight.
2
Hot post thundering herd on feed cache
A viral post causes millions of users to load the same subreddit feed simultaneously. Cache stampede when the feed cache TTL expires.
Mitigation: staggered TTL (jitter); cache-aside with single-flight (only one goroutine/thread rebuilds the cache, others wait); serve stale-while-revalidate.
3
Comment tree depth explosion
Users create deeply nested reply chains (100+ levels). Materialized path strings grow huge; queries slow down; UI becomes unusable.
Mitigation: cap nesting at ~10 levels (Reddit's actual limit). Deeper replies attach to the 10th-level parent. "Continue this thread" link loads the subtree separately.
4
Karma gaming
Users farm karma via repost bots, then sell high-karma accounts for astroturfing. Erodes trust in the platform.
Mitigation: diminishing karma returns per subreddit; repost detection (perceptual hashing); karma-weighted voting; account age requirements for certain actions.
5
Redis vote counter drift after crash
Redis restarts and loses in-memory vote counts accumulated since last RDB snapshot. Displayed scores are now wrong until recount.
Mitigation: Kafka consumer tracks offset; on Redis recovery, replay Kafka events from last committed offset to rebuild counters. Postgres serves as fallback source of truth for full recount if Kafka retention has expired.
6
Elasticsearch index lag during viral event
Kafka consumer falls behind during a spike in post creation. Search results are minutes stale; users cannot find trending content.
Mitigation: add topic partitions and scale the indexing consumer group horizontally to match; priority queue for posts with high early engagement; fallback to database search for very recent posts (< 30 s old).
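The single-flight mitigation for the feed cache stampede (failure mode 2) can be sketched with one lock per cache key: the first caller rebuilds the entry while concurrent callers for the same key wait and then reuse the result. `SingleFlightCache` is a hypothetical helper; TTLs, jitter, and stale-while-revalidate are left out to keep the sketch short.

```python
import threading


class SingleFlightCache:
    """Cache where only one thread per key runs the expensive loader."""

    def __init__(self, loader) -> None:
        self._loader = loader
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the per-key lock table

    def get(self, key: str):
        if key in self._values:
            return self._values[key]  # fast path: cache hit
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled it while we waited.
            if key not in self._values:
                self._values[key] = self._loader(key)
            return self._values[key]


load_calls = []


def rebuild_feed(key: str) -> str:
    load_calls.append(key)  # record how many rebuilds actually ran
    return f"feed-for-{key}"


cache = SingleFlightCache(rebuild_feed)
workers = [threading.Thread(target=cache.get, args=("r/popular",)) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(load_calls), cache.get("r/popular"))
```

Eight concurrent readers trigger a single rebuild here; in a multi-process deployment the same idea needs a shared lock (for example a short-lived key in the cache itself) rather than an in-process one.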
09

Interview Tips

  1. Lead with the Hot ranking formula. "log10(votes) + time/45000 -- old posts decay automatically, no re-scoring needed." This shows you understand the elegance of the time-based decay approach.
  2. Name materialized path explicitly for comments. "Each comment stores /root/parent/self as its path. One LIKE prefix query fetches the whole subtree." This immediately distinguishes you from candidates who say "recursive CTE."
  3. Vote pipeline = Redis + Kafka + Postgres. Three layers: dedup (SET), fast count (INCR), durability (Kafka to Postgres batch). Shows you understand the tension between speed and durability.
  4. Explain hybrid fan-out. "Small subs fan-out on write, large subs fan-out on read." Same insight as Twitter's celebrity problem. Interviewers love seeing you apply cross-system patterns.
  5. Mention Wilson score for comment sorting. "Raw vote count is biased toward old comments. Wilson score confidence interval normalizes for sample size." This statistical nuance is a strong signal.
  6. Cap comment depth at ~10. "Materialized paths grow linearly with depth. Reddit caps at 10 levels; deeper replies collapse behind a link." Shows production awareness.
  7. Distinguish hot vs controversial vs best. Hot = votes + time decay (for feeds). Best = Wilson score (for comments). Controversial = high total votes but near 50/50 split. Showing you know each algorithm signals depth.
  8. Address the celebrity problem for feeds. "When r/AskReddit (40M subscribers) gets a new post, we don't push to 40M feeds. Fan-out on read for large subs, fan-out on write for small subs." This is the same insight Twitter had with celebrity tweets.
  9. Mention cursor-based pagination. "Offset pagination breaks when new posts are inserted (users see duplicates or miss posts). Cursor pagination (after=post_id) gives stable page boundaries." Simple but often missed.
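The pagination point in tip 9 is easy to demonstrate. In the sketch below the feed is a list of (hot_score, post_id) pairs sorted descending, and the cursor is the last pair the client saw; `page_after` and the sample data are hypothetical, and the sketch assumes scores do not change between page loads.

```python
def page_after(feed: list, cursor: tuple = None, limit: int = 25) -> tuple:
    """Return (page, next_cursor) for a feed sorted by (hot_score, post_id) descending."""
    if cursor is None:
        start = 0
    else:
        # First entry that sorts strictly after the cursor in descending order.
        start = next((i for i, item in enumerate(feed) if item < cursor), len(feed))
    page = feed[start:start + limit]
    next_cursor = page[-1] if start + limit < len(feed) else None
    return page, next_cursor


feed = [(5.0, "e"), (4.0, "d"), (3.0, "c"), (2.0, "b"), (1.0, "a")]
first_page, cursor = page_after(feed, limit=2)

# A new post lands at the top before the client asks for page two.
feed.insert(0, (6.0, "f"))

cursor_page, _ = page_after(feed, cursor, limit=2)
offset_page = feed[2:4]  # what OFFSET 2 LIMIT 2 would return

print(first_page, cursor_page, offset_page)
```

The cursor-based second page continues with "c" and "b", while the offset-based page repeats "d" because every row shifted down by one when the new post arrived.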
11

Evolution

1

Monolith + Postgres

Single Python app (Reddit was Python/Pylons). All data in one Postgres instance. Comments stored as adjacency list with parent_id FK. Hot ranking computed at query time. Works to ~1M users before DB becomes the bottleneck.

2

Memcached feed cache + Cassandra comments

Feed reads served from Memcached to eliminate repeated sort queries. Comments migrated to Cassandra for write throughput and wide-row storage model. Vote counts still in Postgres with row-level locking -- starting to see contention on hot posts.

3

Kafka vote pipeline + Redis counters

Votes flow through Redis INCR for real-time counts, eliminating Postgres hot-row contention. Kafka decouples vote ingestion from durable storage. Feed pre-computed into Redis sorted sets keyed by subreddit. Comment storage migrated to materialized path model.

4

ML ranking (beyond Hot formula)

Personalized feed ranking using engagement signals (dwell time, click-through, save rate), user history, and collaborative filtering. Hot formula becomes one feature among many in a gradient-boosted model. A/B testing framework to measure ranking quality.

5

Real-time features: chat, live threads, predictions

WebSocket layer for live comment streaming during AMAs and sporting events. Reddit Talk (audio rooms), prediction tournaments, and community points. Architecture evolves toward event-driven streaming with CQRS patterns.
