Exercise · Storage & Data

Google Drive / Dropbox

Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.

Out of 10 points · 45 min whiteboard
01

Prompt

Design Google Drive. Files up to 100 GB, folder hierarchy, sharing ACLs, cross-device sync, 1B+ users.

Time budget: 45 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.

02

Hints (progressive)

Hint 1

Chunked content-addressable storage (4 MB blocks, SHA-256 keyed). Delta sync = upload only missing blocks.
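
To make the hint concrete, here is a minimal sketch of fixed-size chunking and the "upload only missing blocks" check. The function names (`split_into_blocks`, `block_hashes`, `missing_blocks`) and the in-memory `server_has` set are illustrative assumptions, not part of any real Drive or Dropbox API.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as in the hint


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut the byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Content address of each block: the hex SHA-256 of its bytes."""
    return [hashlib.sha256(b).hexdigest() for b in split_into_blocks(data, block_size)]


def missing_blocks(client_hashes: list, server_has: set) -> list:
    """Hashes the client must upload, in order, without duplicates.

    `server_has` is a hypothetical stand-in for a lookup in the block index.
    """
    seen = set()
    out = []
    for h in client_hashes:
        if h not in server_has and h not in seen:
            seen.add(h)
            out.append(h)
    return out


# A 100 GB file (binary units) splits into this many 4 MB blocks.
MAX_FILE_BLOCKS = (100 * 1024**3) // BLOCK_SIZE
```

At 32 bytes per SHA-256 digest, the hash list for a maximum-size file is under 1 MB, so sending it before the upload is cheap. Note that fixed-size blocks only save work for in-place edits and appends; an insertion near the start of a file shifts every later block boundary.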

Hint 2

Metadata (tree, versions, ACLs) in transactional DB. Bytes in blob store. Don't conflate.
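
One way to picture the split is the sketch below: a file version in the metadata store is just an ordered list of block hashes, and the bytes live elsewhere under those hashes. It uses an in-memory SQLite database as a stand-in for the transactional DB and a dict as a stand-in for the blob store; the table names, columns, and helper functions are all assumptions made for illustration.

```python
import hashlib
import sqlite3

# Stand-in for an S3-like object store: hash -> bytes.
blob_store = {}


def put_block(data: bytes) -> str:
    """Store a block under its SHA-256; identical blocks are stored once."""
    key = hashlib.sha256(data).hexdigest()
    blob_store.setdefault(key, data)
    return key


def open_metadata_db() -> sqlite3.Connection:
    """Hypothetical metadata schema: tree nodes, versions, and block lists."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE nodes (
            id INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES nodes(id),
            owner_id INTEGER NOT NULL,
            name TEXT NOT NULL,
            is_folder INTEGER NOT NULL
        );
        CREATE TABLE versions (
            node_id INTEGER NOT NULL REFERENCES nodes(id),
            version INTEGER NOT NULL,
            PRIMARY KEY (node_id, version)
        );
        CREATE TABLE version_blocks (
            node_id INTEGER NOT NULL,
            version INTEGER NOT NULL,
            seq INTEGER NOT NULL,
            block_hash TEXT NOT NULL,
            PRIMARY KEY (node_id, version, seq)
        );
    """)
    return db


def commit_version(db: sqlite3.Connection, node_id: int, hashes: list) -> int:
    """Record a new version and its block list in a single transaction."""
    with db:  # commits on success, rolls back if any statement raises
        row = db.execute(
            "SELECT COALESCE(MAX(version), 0) FROM versions WHERE node_id = ?",
            (node_id,),
        ).fetchone()
        version = row[0] + 1
        db.execute("INSERT INTO versions VALUES (?, ?)", (node_id, version))
        db.executemany(
            "INSERT INTO version_blocks VALUES (?, ?, ?, ?)",
            [(node_id, version, i, h) for i, h in enumerate(hashes)],
        )
    return version
```

The metadata rows stay small regardless of file size, which is what lets the tree, versions, and ACLs live in a transactional store while the bytes scale independently.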

Hint 3

Offline-first client: local SQLite + operation queue; sync engine reconciles.
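
A rough sketch of the local operation queue might look like the following. The `op_queue` table, the `enqueue` and `drain` helpers, and the `send` callback (which would wrap the real network call) are hypothetical names chosen for this example.

```python
import json
import sqlite3


def open_local_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the client's local database and make sure the queue table exists."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS op_queue (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT NOT NULL,
            path TEXT NOT NULL,
            payload TEXT NOT NULL
        )
    """)
    return db


def enqueue(db: sqlite3.Connection, op: str, path: str, payload=None) -> None:
    """Record a local change durably before any network call is attempted."""
    with db:
        db.execute(
            "INSERT INTO op_queue (op, path, payload) VALUES (?, ?, ?)",
            (op, path, json.dumps(payload or {})),
        )


def drain(db: sqlite3.Connection, send) -> int:
    """Replay queued operations in order; stop at the first failure.

    `send(op, path, payload)` returns True once the server has accepted the
    operation. Returns the number of operations sent and removed.
    """
    sent = 0
    rows = db.execute(
        "SELECT seq, op, path, payload FROM op_queue ORDER BY seq"
    ).fetchall()
    for seq, op, path, payload in rows:
        if not send(op, path, json.loads(payload)):
            break  # still offline or rejected: keep ordering, retry later
        with db:
            db.execute("DELETE FROM op_queue WHERE seq = ?", (seq,))
        sent += 1
    return sent
```

Deleting a row only after the server acknowledges it means a crash can cause a resend but not a lost edit, so the server side would need to treat operations as idempotent.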

03

Rubric: 10 points

  • +2 Chunked CAS: files split into 4 MB blocks, SHA-256-keyed; cross-user dedup of identical blocks
  • +2 Metadata in Postgres (sharded by user_id): file tree, versions, and ACLs; transactional
  • +2 Blocks in object store (S3-like); deduplicated by hash; erasure-coded for durability
  • +1 Delta sync: /blocks/check returns missing hashes; client uploads only those
  • +1 Offline-first client: SQLite local + operation queue; reconciles via vector clocks
  • +1 Conflict resolution: keep both versions as foo.txt and foo (conflicted copy).txt; do not attempt a merge
  • +1 Sharing via ACLs on the metadata node, not on file content; changes propagate via a permission event
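
The vector-clock and conflicted-copy items above can be sketched together. In this illustration a clock is a plain dict mapping a device ID to a counter; `compare`, `conflicted_copy_name`, and `reconcile` are hypothetical helpers, and a real sync engine would track far more state.

```python
from pathlib import PurePosixPath


def compare(a: dict, b: dict) -> str:
    """Order two vector clocks: 'before', 'after', 'equal', or 'concurrent'."""
    keys = a.keys() | b.keys()
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "concurrent"  # neither edit has seen the other
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def conflicted_copy_name(name: str) -> str:
    """foo.txt -> 'foo (conflicted copy).txt', keeping the extension."""
    p = PurePosixPath(name)
    return f"{p.stem} (conflicted copy){p.suffix}"


def reconcile(name: str, local_clock: dict, remote_clock: dict) -> list:
    """File names that should exist after sync.

    Concurrent edits keep both versions side by side; no content merge.
    """
    if compare(local_clock, remote_clock) == "concurrent":
        return [name, conflicted_copy_name(name)]
    return [name]
```

When one clock dominates the other, the newer version simply wins; only genuinely concurrent edits produce the extra copy.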

Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.

04

Red flags (things that tank the interview)

  • Uploads whole file on every change (no chunking)
  • Tries CRDT / OT merge on file content (file systems don't have semantic awareness)
  • Stores byte content in Postgres
  • Checks the ACL only once, at upload (permissions don't propagate when a folder is shared)
  • No client-side sync engine; assumes always-online
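
To illustrate why a single ACL check at upload is not enough, the sketch below resolves permissions at access time by walking up the folder tree, so a share added to a parent folder later still applies to files already inside it. This is a deliberately simple read-time alternative, not the event-based propagation named in the rubric; the `Node` class, the two-level `RANK` table, and the helper names are assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

RANK = {"read": 1, "write": 2}  # assumed two-level permission model


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    acl: dict = field(default_factory=dict)  # user -> "read" | "write"


def effective_permission(node: Node, user: str) -> Optional[str]:
    """Strongest grant for `user` on the node or any of its ancestors."""
    best = None
    cur = node
    while cur is not None:
        grant = cur.acl.get(user)
        if grant is not None and (best is None or RANK[grant] > RANK[best]):
            best = grant
        cur = cur.parent
    return best


def can(node: Node, user: str, needed: str) -> bool:
    """True if the user's effective permission is at least `needed`."""
    perm = effective_permission(node, user)
    return perm is not None and RANK[perm] >= RANK[needed]
```

Walking ancestors on every access costs one lookup per tree level, which is the tradeoff against precomputing effective permissions and refreshing them when a permission event fires.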