Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.
Out of 10 points | 45 min whiteboard
01
Prompt
How do you reliably execute millions of scheduled tasks at precisely the right time across a fleet of unreliable machines, ensuring no job is missed and no job runs twice?
Time budget: 45 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.
02
Hints (progressive)
Hint 1
Start with requirements: functional vs non-functional. Clarify the scale (users, QPS, storage).
Hint 2
Think about the data model first. What entities exist? What are the access patterns?
Hint 3
Identify the hardest sub-problem and deep-dive into it. Show you can go beyond boxes and arrows.
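One possible starting point for the data-model hint, as a minimal sketch rather than a reference answer: a job record plus an index keyed by time bucket, so the dominant access pattern ("which jobs are due in this window?") reads one partition instead of scanning everything. The names `ScheduledJob`, `JobIndex`, and `BUCKET_SECONDS`, and the 60-second bucket width, are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum

BUCKET_SECONDS = 60  # assumed partition width; tune to the required precision


class JobStatus(Enum):
    PENDING = "pending"
    CLAIMED = "claimed"
    DONE = "done"


@dataclass
class ScheduledJob:
    job_id: str
    run_at: int  # Unix epoch seconds at which the job should fire
    payload: str
    status: JobStatus = JobStatus.PENDING


def bucket_for(run_at: int) -> int:
    """Return the start of the time bucket that contains run_at."""
    return run_at - (run_at % BUCKET_SECONDS)


class JobIndex:
    """In-memory stand-in for a store partitioned by time bucket."""

    def __init__(self) -> None:
        self._buckets: dict[int, list[ScheduledJob]] = defaultdict(list)

    def add(self, job: ScheduledJob) -> None:
        self._buckets[bucket_for(job.run_at)].append(job)

    def due_in_bucket(self, bucket_start: int, now: int) -> list[ScheduledJob]:
        """Pending jobs in one bucket whose run_at has already passed."""
        return [
            job
            for job in self._buckets.get(bucket_start, [])
            if job.status is JobStatus.PENDING and job.run_at <= now
        ]
```

In a real design the bucket would typically become the partition key, and a scheduler would walk buckets in order so that overdue work in earlier buckets is not skipped.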
03
Rubric — 10 points
+2 Back-of-envelope estimation with concrete numbers
+2 Clear API design with key endpoints
+2 Sensible data model and storage choices
+2 Addresses scalability (sharding, caching, CDN)
+2 Discusses failure modes and mitigations
Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.
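To make the estimation item concrete, here is a small worked calculation. Every input is an assumption chosen for illustration (100 million jobs per day, 1,000 bytes per job record, a 10x peak-to-average ratio, 30 days of retention); none of these figures comes from the prompt, and in an interview you would state your own.

```python
SECONDS_PER_DAY = 86_400


def estimate(jobs_per_day: int, bytes_per_job: int,
             peak_factor: float, retention_days: int) -> dict:
    """Back-of-envelope throughput and storage from assumed inputs."""
    avg_per_sec = jobs_per_day / SECONDS_PER_DAY
    return {
        "avg_triggers_per_sec": avg_per_sec,
        "peak_triggers_per_sec": avg_per_sec * peak_factor,
        "storage_bytes": jobs_per_day * bytes_per_job * retention_days,
    }


# All four inputs below are illustrative assumptions, not given requirements.
numbers = estimate(
    jobs_per_day=100_000_000,
    bytes_per_job=1_000,
    peak_factor=10,
    retention_days=30,
)
```

Under these assumed inputs the average works out to roughly 1,157 triggers per second, the peak to roughly 11,574 per second, and 30 days of job records to about 3 TB.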
04
Red flags (things that tank the interview)
No back-of-envelope estimation: jumps straight into components without quantifying the scale of the distributed job scheduler
Single point of failure: no replication, failover, or redundancy discussed
Ignores data model and storage choices: hand-waves the database layer
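One common way to discuss failover without running jobs twice is a time-bounded lease: a worker claims a job only if nobody holds an unexpired lease on it, and a crashed worker's claim simply times out. The sketch below is a minimal in-memory illustration, assuming a hypothetical `LeaseStore` class; the `threading.Lock` stands in for whatever atomic conditional write the real datastore provides.

```python
import threading


class LeaseStore:
    """In-memory stand-in for a store with atomic conditional writes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # job_id -> (worker_id, lease expiry in epoch seconds)
        self._leases: dict[str, tuple[str, int]] = {}

    def try_claim(self, job_id: str, worker_id: str, now: int, ttl: int) -> bool:
        """Claim job_id unless another unexpired lease exists."""
        with self._lock:
            holder = self._leases.get(job_id)
            if holder is not None and holder[1] > now:
                return False  # someone still holds a live lease
            self._leases[job_id] = (worker_id, now + ttl)
            return True


store = LeaseStore()
first = store.try_claim("job-1", "worker-a", now=0, ttl=30)          # lease granted
second = store.try_claim("job-1", "worker-b", now=10, ttl=30)        # blocked: lease live
after_expiry = store.try_claim("job-1", "worker-b", now=31, ttl=30)  # lease timed out
```

A lease alone gives at-least-once execution: a slow worker can outlive its lease and overlap with the worker that takes over. That is why this approach is usually paired with idempotent job handlers or a fencing token checked at write time.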