Exercise · Storage & Data

Dropbox Sync

Whiteboard exercise. Try the problem cold, then use the rubric to score yourself.

Out of 10 points · 45 min whiteboard
01

Prompt

A two-way file sync engine: every file in a user's Dropbox folder should appear identically on all of their devices within seconds of any change. The hard parts are a content-addressable block store that deduplicates chunks across users, yielding large storage savings; a metadata service that tells clients exactly which blocks to fetch or push for each file version; and a sync loop that reconciles local file-system changes with remote state while handling conflicts, offline edits, and flaky networks. For scale: Dropbox stores exabytes of user data, and its clients sync on the order of billions of file events per day.

Time budget: 45 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.
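One way to turn the prompt's scale figures into numbers you can design against is the small calculation below. The daily event volume (2 billion), the 3x peak-to-average factor, the 1 EB unique-data corpus, and the 4 MiB block size are illustrative assumptions, not Dropbox's published figures.

```python
# Back-of-envelope sketch. Every input below is an illustrative assumption.
SECONDS_PER_DAY = 86_400
BLOCK_SIZE = 4 * 1024**2  # assumed 4 MiB blocks
HASH_BYTES = 32           # size of one SHA-256 digest


def events_per_second(events_per_day: float, peak_factor: float = 3.0) -> tuple[float, float]:
    """Return (average, peak) event rates for a given daily volume."""
    avg = events_per_day / SECONDS_PER_DAY
    return avg, avg * peak_factor


def block_count(total_bytes: float, block_size: int = BLOCK_SIZE) -> float:
    """How many fixed-size blocks (and therefore block-index entries) a corpus needs."""
    return total_bytes / block_size


if __name__ == "__main__":
    avg, peak = events_per_second(2e9)  # assume ~2 billion file events/day
    print(f"file events: ~{avg:,.0f}/s average, ~{peak:,.0f}/s peak")
    blocks = block_count(1e18)           # assume ~1 EB of unique data
    print(f"unique blocks: ~{blocks:.2e}; "
          f"SHA-256 keys alone: ~{blocks * HASH_BYTES / 1e12:.1f} TB")
```

Even with rough inputs, the result lands at tens of thousands of events per second and terabytes of raw hash keys before any location metadata is stored, which suggests the block index cannot live on a single machine.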

02

Hints (progressive)

Hint 1

Lead with chunking + content-addressable storage. This is THE answer to "how do you make this efficient?" Most candidates miss this and just "upload the file to S3."
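A minimal sketch of chunking plus content-addressable storage, assuming fixed-size 4 MiB blocks keyed by their SHA-256 digest. Both choices are assumptions for illustration (content-defined chunking is a common alternative that copes better with insertions), `chunk`, `BlockStore`, and `upload_file` are hypothetical names, and the dict stands in for an object store.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumption: fixed 4 MiB blocks


def chunk(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a file's bytes into fixed-size blocks (the last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block: bytes) -> str:
    """The block's address is derived from its content."""
    return hashlib.sha256(block).hexdigest()


class BlockStore:
    """Toy content-addressable store: the key is the hash of the value."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, block: bytes) -> tuple[str, bool]:
        """Store a block; return (hash, stored_new). Duplicates cost no space."""
        h = block_hash(block)
        if h in self._blobs:
            return h, False
        self._blobs[h] = block
        return h, True

    def get(self, h: str) -> bytes:
        return self._blobs[h]

    def __len__(self) -> int:
        return len(self._blobs)


def upload_file(store: BlockStore, data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return the file's ordered list of block hashes; only unseen blocks use space."""
    return [store.put(b)[0] for b in chunk(data, block_size)]
```

Because the key is computed from the bytes, a second upload of an identical block is a no-op, which is where deduplication across files and users comes from. The ordered hash list is all the metadata tier needs to reassemble the file.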

Hint 2

Separate metadata from blocks. Metadata is transactional (needs DB); blocks are immutable byte bags (object store). Don't conflate them.
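To make the split concrete, here is a hedged sketch in which file versions live in a transactional table (an in-memory SQLite database standing in for the real metadata DB) while block bytes live in a plain dict standing in for the object store. `MetadataService`, `latest`, and `commit` are hypothetical. The commit refuses to record a version until every referenced block exists and returns the missing hashes, so the client knows exactly what to upload.

```python
import sqlite3


class MetadataService:
    """Hypothetical metadata tier: versions in a DB, bytes in a separate block store."""

    def __init__(self, block_store: dict[str, bytes]) -> None:
        self.blocks = block_store  # stand-in for an object store keyed by hash
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE file_versions ("
            " path TEXT NOT NULL, version INTEGER NOT NULL,"
            " blocklist TEXT NOT NULL, PRIMARY KEY (path, version))"
        )

    def latest(self, path: str) -> tuple[int, list[str]]:
        """Return (version, block hashes) of the newest version, or (0, [])."""
        row = self.db.execute(
            "SELECT version, blocklist FROM file_versions WHERE path = ?"
            " ORDER BY version DESC LIMIT 1",
            (path,),
        ).fetchone()
        if row is None:
            return 0, []
        return row[0], (row[1].split(",") if row[1] else [])

    def commit(self, path: str, blocklist: list[str]) -> list[str]:
        """Record a new version. Returns the hashes the block store lacks;
        nothing is written unless that list is empty."""
        missing = [h for h in blocklist if h not in self.blocks]
        if missing:
            return missing
        with self.db:  # commits on success, rolls back on error
            version, _ = self.latest(path)
            self.db.execute(
                "INSERT INTO file_versions VALUES (?, ?, ?)",
                (path, version + 1, ",".join(blocklist)),
            )
        return []
```

The `(path, version)` primary key also means two writers racing to create the same next version cannot both succeed; the loser gets an `sqlite3.IntegrityError` and has to re-read before retrying.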

Hint 3

Conflict resolution is "save both." Not OT/CRDT. Name this explicitly — it shows you understand the scope of a sync engine vs a collab editor.
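A sketch of "save both" under simple assumptions: every local edit carries the server version it was based on, the server is a hypothetical dict of `ServerFile` records, and `conflicted_copy_name` is a hypothetical approximation of Dropbox-style "conflicted copy" names. No attempt is made to merge contents; that is the whole point.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class ServerFile:
    version: int
    content: bytes


def conflicted_copy_name(path: str, device: str) -> str:
    """Hypothetical naming scheme for the losing side of a conflict."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem} ({device}'s conflicted copy){p.suffix}"))


def apply_local_edit(server: dict[str, ServerFile], path: str, base_version: int,
                     content: bytes, device: str) -> str:
    """Commit an edit made on top of base_version and return where it was saved.
    If the server moved on meanwhile, keep the server's file and save ours beside it."""
    current = server.get(path)
    if current is not None and current.content == content:
        return path  # identical edits are not a conflict
    current_version = current.version if current else 0
    if current_version == base_version:
        server[path] = ServerFile(current_version + 1, content)
        return path
    copy = conflicted_copy_name(path, device)
    server[copy] = ServerFile(1, content)
    return copy
```

Neither edit is lost and no content is merged; the user resolves the conflict by looking at two ordinary files.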

03

Rubric — 10 points

  • +2 Lead with chunking + content-addressable storage. This is THE answer to "how do you make this efficient?" Most candidates miss this and just "upload the file to S3."
  • +2 Separate metadata from blocks. Metadata is transactional (needs DB); blocks are immutable byte bags (object store). Don't conflate them.
  • +2 Conflict resolution is "save both." Not OT/CRDT. Name this explicitly — it shows you understand the scope of a sync engine vs a collab editor.
  • +2 Delta sync via chunk hashes. Many candidates reach for an "rsync-style diff"; the expected Dropbox answer is "chunked CAS: ask the server which hashes it is missing, upload only those."
  • +2 Offline-first is the reason the sync loop exists. Without a client-side SQLite database holding pending ops, plus a reconcile step that replays them, your design assumes the client is always online.
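The last two rubric items can be sketched together as one client loop, assuming a hypothetical `FakeServer` that stands in for both the block store and the metadata service. Local changes are written to a client-side SQLite `pending` table without touching the network; `reconcile` later drains that table in order, asks the server which block hashes it lacks, uploads only those, commits the new block list, and deletes the op. For brevity the queue stores file contents and blocks are 4 bytes; a real client would store paths, rehash from disk, and use much larger blocks.

```python
import hashlib
import sqlite3

BLOCK = 4  # tiny block size for the demo (assumption; real blocks are MiB-sized)


def hashes(data: bytes) -> list[str]:
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]


class FakeServer:
    """Hypothetical stand-in for the block store plus metadata service."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.files: dict[str, list[str]] = {}
        self.online = True
        self.uploaded = 0

    def _check(self) -> None:
        if not self.online:
            raise ConnectionError("server unreachable")

    def missing(self, hs: list[str]) -> list[str]:
        self._check()
        return [h for h in hs if h not in self.blocks]

    def upload(self, h: str, block: bytes) -> None:
        self._check()
        self.blocks[h] = block
        self.uploaded += 1

    def commit(self, path: str, hs: list[str]) -> None:
        self._check()
        self.files[path] = hs


class SyncClient:
    def __init__(self, server: FakeServer) -> None:
        self.server = server
        self.db = sqlite3.connect(":memory:")  # an on-disk file in a real client
        self.db.execute("CREATE TABLE pending (id INTEGER PRIMARY KEY AUTOINCREMENT,"
                        " path TEXT NOT NULL, data BLOB NOT NULL)")

    def on_local_change(self, path: str, data: bytes) -> None:
        """File-watcher callback: record the op durably; never touch the network."""
        with self.db:
            self.db.execute("INSERT INTO pending (path, data) VALUES (?, ?)", (path, data))

    def reconcile(self) -> int:
        """Drain pending ops oldest-first; stop and keep the rest if the network drops."""
        done = 0
        rows = self.db.execute("SELECT id, path, data FROM pending ORDER BY id").fetchall()
        for op_id, path, data in rows:
            hs = hashes(data)
            by_hash = {h: data[i * BLOCK:(i + 1) * BLOCK] for i, h in enumerate(hs)}
            try:
                for h in set(self.server.missing(hs)):  # delta sync: only absent blocks
                    self.server.upload(h, by_hash[h])
                self.server.commit(path, hs)
            except ConnectionError:
                break
            with self.db:
                self.db.execute("DELETE FROM pending WHERE id = ?", (op_id,))
            done += 1
        return done
```

If the process dies between `commit` and the local `DELETE`, the op is replayed on the next run; that is safe in this sketch because committing the same block list twice is idempotent.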

Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.

04

Red flags (things that tank the interview)

  • No back-of-envelope estimation: jumping straight into components without quantifying the scale (stored data, file events per second)
  • Single point of failure: no replication, failover, or redundancy discussed
  • Ignoring data model and storage choices: hand-waving the database layer