Reddit — Exercise — System Design Portfolio

Exercise · Social & Feed

Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.

Out of 10 points · 60 min whiteboard

Prompt

Design Reddit: 1.7 billion monthly visits across millions of subreddit communities. The hard parts: a Hot ranking algorithm that decays old posts and surfaces fresh content without scanning every post on every page load; nested comment trees stored for fast subtree retrieval and sorted by Wilson score; and a vote-counting pipeline that handles 50K votes/sec without hot-row contention or data loss. Upvotes, downvotes, karma, moderation, subreddit-scoped feeds -- same bones as any community platform, but at massive scale.

Time budget: 60 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.

Hints (progressive)

Hint 1

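Lead with the Hot ranking formula. "log10(net votes) + submission time in seconds / 45000 -- newer posts start with a larger time term, so old posts sink on their own, no periodic re-scoring needed." Since 45,000 s is 12.5 hours, a post needs 10x the votes to tie with one submitted 12.5 hours later. This shows you understand the elegance of the time-based decay approach.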
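A minimal Python sketch of the formula in this hint may make the decay concrete. The names `hot_score`, `REFERENCE_EPOCH`, and `DECAY_SECONDS` are illustrative assumptions, as is the handling of negative and zero net votes; only the `log10(votes) + time/45000` shape comes from the hint.

```python
import math

# Hypothetical reference point (Unix seconds). Any fixed constant works,
# because only the difference between two posts affects their ordering.
REFERENCE_EPOCH = 1_134_028_003
DECAY_SECONDS = 45_000  # from the hint: 12.5 hours per factor of 10 in votes


def hot_score(ups: int, downs: int, created_at: float) -> float:
    """Return a hot score from net votes and submission time (Unix seconds)."""
    net = ups - downs
    # Clamp to 1 so log10 is defined for posts with zero net votes.
    magnitude = math.log10(max(abs(net), 1))
    # Assumption: net-negative posts are pushed down rather than up.
    sign = (net > 0) - (net < 0)
    return sign * magnitude + (created_at - REFERENCE_EPOCH) / DECAY_SECONDS
```

Under these assumptions the score depends only on votes and submission time, so it can be stored in an indexed column and recomputed only when a post's vote count changes; a feed page is then an ordered read of that column rather than a scan.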

Hint 2

Name materialized path explicitly for comments. "Each comment stores /root/parent/self as its path. One LIKE prefix query fetches the whole subtree." This immediately distinguishes you from candidates who say "recursive CTE."
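The materialized-path idea in this hint can be sketched with Python's built-in `sqlite3` module. The table layout and the helper names (`make_db`, `add_comment`, `subtree`) are assumptions for illustration, not a production schema.

```python
import sqlite3
from typing import List, Optional


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, path TEXT NOT NULL, body TEXT NOT NULL)"
    )
    # Whether a LIKE prefix scan can use this index depends on the
    # database and its collation settings.
    conn.execute("CREATE INDEX idx_comments_path ON comments (path)")
    return conn


def add_comment(conn: sqlite3.Connection, parent_path: Optional[str], body: str) -> str:
    """Insert a comment and store its path as <parent path>/<own id>."""
    cur = conn.execute("INSERT INTO comments (path, body) VALUES ('', ?)", (body,))
    comment_id = cur.lastrowid
    path = f"{parent_path or ''}/{comment_id}"
    conn.execute("UPDATE comments SET path = ? WHERE id = ?", (path, comment_id))
    return path


def subtree(conn: sqlite3.Connection, root_path: str) -> List[str]:
    """Fetch a comment and all of its descendants with one prefix query."""
    rows = conn.execute(
        "SELECT body FROM comments WHERE path = ? OR path LIKE ? ORDER BY id",
        # The trailing slash keeps '/1' from matching '/10', '/11', ...
        (root_path, root_path + "/%"),
    )
    return [row[0] for row in rows]
```

The trailing slash in the LIKE pattern is the detail worth saying out loud on a whiteboard: without it, the subtree of comment 1 would also pick up comments 10 through 19.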

Hint 3

Vote pipeline = Redis + Kafka + Postgres. Three layers: dedup (SET), fast count (INCR), durability (Kafka to Postgres batch). Shows you understand the tension between speed and durability.
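The three layers in this hint can be modelled in plain Python to show how they fit together. This is a toy under loud assumptions: the sets, counters, and list below are in-memory stand-ins for Redis SETs, Redis INCR counters, a Kafka topic, and a Postgres score column, and the class name `VotePipeline` is hypothetical. No real client library is used.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class VotePipeline:
    def __init__(self) -> None:
        # Stand-in for one Redis SET of voter ids per post (dedup layer).
        self.voters: Dict[str, Set[str]] = defaultdict(set)
        # Stand-in for Redis INCR/DECR counters (fast count layer).
        self.live_counts: Dict[str, int] = defaultdict(int)
        # Stand-in for an append-only Kafka topic (durability layer).
        self.log: List[Tuple[str, str, int]] = []
        # Stand-in for the score column in Postgres.
        self.durable_scores: Dict[str, int] = defaultdict(int)
        self.flushed = 0  # offset of the next log entry to persist

    def vote(self, post_id: str, user_id: str, direction: int) -> bool:
        """Record one vote (+1 or -1); return False for a duplicate."""
        if direction not in (1, -1):
            raise ValueError("direction must be +1 or -1")
        if user_id in self.voters[post_id]:
            return False
        self.voters[post_id].add(user_id)
        self.live_counts[post_id] += direction
        self.log.append((post_id, user_id, direction))
        return True

    def flush(self) -> int:
        """Persist unflushed votes; return how many post rows were written."""
        deltas: Dict[str, int] = defaultdict(int)
        for post_id, _user_id, direction in self.log[self.flushed:]:
            deltas[post_id] += direction
        for post_id, delta in deltas.items():
            # One aggregated write per post per batch, not one per vote.
            self.durable_scores[post_id] += delta
        self.flushed = len(self.log)
        return len(deltas)
```

The point of the batch step is that a post receiving thousands of votes between flushes costs one row update rather than thousands, which is what relieves the hot-row contention named in the prompt. Vote changes and retractions are left out of this sketch.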

Rubric — 10 points

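+2 Lead with the Hot ranking formula. "log10(net votes) + submission time in seconds / 45000 -- newer posts start with a larger time term, so old posts sink on their own, no periodic re-scoring needed." Since 45,000 s is 12.5 hours, a post needs 10x the votes to tie with one submitted 12.5 hours later. This shows you understand the elegance of the time-based decay approach.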
+2 Name materialized path explicitly for comments. "Each comment stores /root/parent/self as its path. One LIKE prefix query fetches the whole subtree." This immediately distinguishes you from candidates who say "recursive CTE."
+1 Vote pipeline = Redis + Kafka + Postgres. Three layers: dedup (SET), fast count (INCR), durability (Kafka to Postgres batch). Shows you understand the tension between speed and durability.
+1 Explain hybrid fan-out. "Small subs fan-out on write, large subs fan-out on read." Same insight as Twitter's celebrity problem. Interviewers love seeing you apply cross-system patterns.
+1 Mention Wilson score for comment sorting. "Raw vote count is biased toward old comments. Ranking by the lower bound of the Wilson score confidence interval accounts for sample size, so 1 upvote out of 1 does not outrank 90 out of 100." This statistical nuance is a strong signal.
+1 Cap comment depth at ~10. "Materialized paths grow linearly with depth. Reddit caps at 10 levels; deeper replies collapse behind a link." Shows production awareness.
+1 Distinguish hot vs controversial vs best. Hot = votes + time decay (for feeds). Best = Wilson score (for comments). Controversial = high total votes but near 50/50 split. Showing you know each algorithm signals depth.
+1 Address the celebrity problem for feeds. "When r/AskReddit (40M subscribers) gets a new post, we don't push to 40M feeds. Fan-out on read for large subs, fan-out on write for small subs." This is the same insight Twitter had with celebrity tweets.
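The Best and Controversial sorts from the rubric can be sketched as follows. The Wilson lower bound is the standard formula for a binomial proportion; the `controversy` function is only one possible way to reward "high total votes but near 50/50 split" and is an assumption, as are both function names and the 95% default for `z`.

```python
import math


def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


def controversy(ups: int, downs: int) -> float:
    """Hypothetical score: grows with total votes, peaks at an even split."""
    if ups <= 0 or downs <= 0:
        return 0.0
    balance = min(ups, downs) / max(ups, downs)  # 1.0 means exactly 50/50
    return float(ups + downs) ** balance
```

With these definitions a comment at 90 up / 10 down ranks above one at 1 up / 0 down under Best, even though the latter has a perfect ratio, and a 500/500 post is far more controversial than a 990/10 one with the same total.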

Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.

Red flags (things that tank the interview)

Recalculate Hot ranking by scanning all posts on every feed load
Store comment tree as adjacency list + recursive query (parent_id FK)
Single vote counter row per post in Postgres (UPDATE posts SET score = score + 1)
Build home feed by querying each subscribed subreddit's DB shard at read time
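As a counterpoint to the last red flag, the hybrid fan-out named in the rubric can be sketched in memory. Everything here is an illustrative assumption: the `FeedService` class, the `fanout_threshold` parameter, and the use of plain dictionaries in place of per-user feed caches and per-subreddit post lists.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Post = Tuple[float, str]  # (hot score, post id)


class FeedService:
    def __init__(self, fanout_threshold: int) -> None:
        self.fanout_threshold = fanout_threshold
        self.subscribers: Dict[str, Set[str]] = defaultdict(set)
        self.subscriptions: Dict[str, Set[str]] = defaultdict(set)
        # Precomputed per-user feeds, filled at write time by small subs.
        self.inboxes: Dict[str, List[Post]] = defaultdict(list)
        # Per-subreddit post lists, merged at read time for large subs.
        self.large_sub_posts: Dict[str, List[Post]] = defaultdict(list)

    def subscribe(self, user: str, subreddit: str) -> None:
        self.subscribers[subreddit].add(user)
        self.subscriptions[user].add(subreddit)

    def publish(self, subreddit: str, post_id: str, score: float) -> None:
        if len(self.subscribers[subreddit]) <= self.fanout_threshold:
            # Fan-out on write: push into every subscriber's inbox.
            for user in self.subscribers[subreddit]:
                self.inboxes[user].append((score, post_id))
        else:
            # Fan-out on read: store once, merge when a feed is requested.
            self.large_sub_posts[subreddit].append((score, post_id))

    def home_feed(self, user: str, limit: int = 25) -> List[str]:
        candidates = list(self.inboxes[user])
        for subreddit in self.subscriptions[user]:
            candidates.extend(self.large_sub_posts.get(subreddit, []))
        candidates.sort(reverse=True)  # highest score first
        return [post_id for _score, post_id in candidates[:limit]]
```

The read path touches the user's own inbox plus one shared list per large subreddit, rather than every subscribed subreddit. The sketch ignores subreddits that cross the threshold over time and users who subscribe after a post was pushed.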