Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.
Out of 10 points · 45 min whiteboard
01
Prompt
Surface the top-10 trending topics — hashtags, phrases, named entities — from the firehose of all tweets, refreshed every few minutes, per geo and per user. The hard parts: approximate top-K over a high-velocity stream where storing exact counts for every term is impossible; distinguishing a trend from steady background noise (#love is always common, so it's not trending); and filtering spam + bot amplification before a coordinated campaign games the list. X/Twitter does ~500M tweets/day, ~6K/sec; trending updates for 400+ geos every ~5 minutes.
Time budget: 45 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.
02
Hints (progressive)
Hint 1
Lead with streaming top-K backed by a Count-Min Sketch (CMS). Candidates who say "store counts in a DB and sort" fail here. The probabilistic data structure answer is expected.
Hint 2
Clarify "trending" early. Deviation from baseline, not raw volume. This single sentence separates good from great answers.
Hint 3
Anti-spam / anti-bot is load-bearing. Often skipped. Mention user-reputation weighting + author-diversity constraint.
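To make Hint 1 concrete, here is a minimal sketch of streaming top-K over a Count-Min Sketch. The class names (`CountMinSketch`, `TopK`), the table size, and the salted BLAKE2b hashing are illustrative assumptions, not a reference implementation; a production system would shard by geo and age counters out per window.

```python
import hashlib


class CountMinSketch:
    """Fixed-memory frequency estimator: it may overcount, never undercount."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        # width and depth are assumed values; real sizing follows the error budget.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, term: str, row: int) -> int:
        # Salting one hash with the row number stands in for `depth`
        # independent hash functions.
        digest = hashlib.blake2b(
            term.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, term: str, count: int = 1) -> int:
        """Increment one cell per row and return the new estimate (row minimum)."""
        estimate = None
        for row in range(self.depth):
            i = self._index(term, row)
            self.rows[row][i] += count
            cell = self.rows[row][i]
            estimate = cell if estimate is None else min(estimate, cell)
        return estimate


class TopK:
    """Tracks the K terms with the highest sketch estimates seen so far."""

    def __init__(self, k: int, sketch: CountMinSketch) -> None:
        self.k = k
        self.sketch = sketch
        self.candidates: dict[str, int] = {}

    def observe(self, term: str) -> None:
        estimate = self.sketch.add(term)
        if term in self.candidates or len(self.candidates) < self.k:
            self.candidates[term] = estimate
            return
        # K is tiny (10 in the prompt), so a linear scan for the weakest is fine.
        weakest = min(self.candidates, key=self.candidates.get)
        if estimate > self.candidates[weakest]:
            del self.candidates[weakest]
            self.candidates[term] = estimate

    def top(self) -> list[tuple[str, int]]:
        return sorted(self.candidates.items(), key=lambda kv: (-kv[1], kv[0]))
```

Memory stays at `width * depth` counters no matter how many distinct terms the stream contains, which is the property the hint is asking for. The estimates are upper bounds on the true counts, so the ranking is approximate by design.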
03
Rubric — 10 points
+2 Lead with streaming top-K backed by a Count-Min Sketch (CMS). Candidates who say "store counts in a DB and sort" fail here. The probabilistic data structure answer is expected.
+2 Clarify "trending" early. Deviation from baseline, not raw volume. This single sentence separates good from great answers.
+2 Anti-spam / anti-bot is load-bearing. Often skipped. Mention user-reputation weighting + author-diversity constraint.
+2 Geo scoping + personalization are separate concerns. Geo trends are precomputed; personalization is a runtime rerank on top of the geo's top-K. Don't conflate them.
+2 LLM-generated trend summaries are the modern touch. Mentioning them shows you know the 2024+ state of the art; a generic trend-picking design sounds dated.
Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.
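The "deviation from baseline" and "author-diversity" rubric points can be sketched as a single scoring function. This is one possible formulation under stated assumptions: the `WindowStats` fields, the Poisson-style z-score, and the `min_authors` threshold are all hypothetical, and user-reputation weighting is left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowStats:
    term: str
    current_count: int     # mentions in the current window
    baseline_mean: float   # typical mentions per window for this term
    distinct_authors: int  # unique accounts behind current_count


def trend_score(stats: WindowStats, min_authors: int = 20) -> float:
    """Score a term by how far it sits above its own baseline."""
    # Author-diversity constraint: a handful of accounts cannot trend a term.
    if stats.distinct_authors < min_authors:
        return 0.0
    # Poisson-style z-score: variance is approximated by the mean,
    # and the +1 keeps brand-new terms from dividing by zero.
    z = (stats.current_count - stats.baseline_mean) / math.sqrt(
        stats.baseline_mean + 1.0
    )
    # Discount volume that comes from few accounts posting many times.
    diversity = min(1.0, stats.distinct_authors / max(stats.current_count, 1))
    return max(z, 0.0) * diversity


love = WindowStats("#love", 50_000, 49_000.0, 45_000)
breaking = WindowStats("#earthquake", 8_000, 50.0, 7_000)
botnet = WindowStats("#buy-now", 9_000, 10.0, 12)

ranked = sorted([love, breaking, botnet], key=trend_score, reverse=True)
print([s.term for s in ranked])
```

In this toy data `#love` has the highest raw volume but barely exceeds its baseline, so it ranks far below the breaking term, while the botnet term scores zero because only 12 accounts are behind it.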
04
Red flags (things that tank the interview)
No back-of-envelope estimation — jumps straight into components without quantifying scale for Twitter Trending
Single point of failure — no replication, failover, or redundancy discussed
Ignores data model and storage choices — hand-waves the database layer
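To avoid the first red flag, the estimate can start from the numbers in the prompt. The sketch below does that arithmetic; the tweet volume, geo count, and refresh window come from the prompt, while the Count-Min Sketch dimensions and counter size are assumptions chosen only to show the order of magnitude.

```python
# Figures taken from the prompt.
TWEETS_PER_DAY = 500_000_000
GEOS = 400
WINDOW_SECONDS = 5 * 60

# Assumed sketch sizing (hypothetical, for illustration only).
SKETCH_WIDTH = 65_536
SKETCH_DEPTH = 4
COUNTER_BYTES = 4

avg_tweets_per_sec = TWEETS_PER_DAY / 86_400
tweets_per_window = avg_tweets_per_sec * WINDOW_SECONDS

sketch_bytes_per_geo = SKETCH_WIDTH * SKETCH_DEPTH * COUNTER_BYTES
total_sketch_mib = sketch_bytes_per_geo * GEOS / 2**20

print(f"average rate:      {avg_tweets_per_sec:,.0f} tweets/sec")
print(f"per 5-min window:  {tweets_per_window:,.0f} tweets")
print(f"sketch per geo:    {sketch_bytes_per_geo / 2**20:.1f} MiB")
print(f"sketches, all geo: {total_sketch_mib:.1f} MiB")
```

The average works out to roughly 5.8K tweets/sec, consistent with the prompt's ~6K/sec, and under these assumed dimensions one sketch per geo totals about 400 MiB, which fits in memory on a single machine. Peak traffic and term fan-out per tweet would push the real numbers higher.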