Exercise · Storage & Data
Google Drive / Dropbox
Whiteboard exercise. Try the problem cold, then reveal the rubric to self-score.
Out of 10 points · 45 min whiteboard · Reference solution
01
Prompt
Design Google Drive. Files up to 100 GB, folder hierarchy, sharing ACLs, cross-device sync, 1B+ users.
Time budget: 45 min whiteboard. Draw architecture, estimate numbers, discuss tradeoffs.
02
Hints (progressive)
Hint 1
Chunked content-addressable storage (4 MB blocks, SHA-256 keyed). Delta sync = upload only missing blocks.
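A minimal sketch of what this hint describes, assuming fixed-size blocks and an in-memory stand-in for the server. `BlockStore`, `check`, `put`, and `upload` are hypothetical names for illustration, not an actual Drive or Dropbox API.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as in the hint


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield fixed-size blocks; the last block may be shorter."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def block_hash(block: bytes) -> str:
    """Content address of a block: its SHA-256 digest in hex."""
    return hashlib.sha256(block).hexdigest()


class BlockStore:
    """Hypothetical in-memory stand-in for the server-side block store."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def check(self, hashes: List[str]) -> List[str]:
        """Return each hash the store does not hold yet (no duplicates)."""
        missing: List[str] = []
        seen = set()
        for h in hashes:
            if h not in self._blocks and h not in seen:
                seen.add(h)
                missing.append(h)
        return missing

    def put(self, block: bytes) -> str:
        """Store a block under its own hash and return that hash."""
        key = block_hash(block)
        self._blocks[key] = block
        return key


def upload(store: BlockStore, data: bytes,
           block_size: int = BLOCK_SIZE) -> Tuple[List[str], int]:
    """Upload only missing blocks; return (manifest, blocks actually sent)."""
    blocks = list(split_blocks(data, block_size))
    manifest = [block_hash(b) for b in blocks]
    missing = set(store.check(manifest))
    sent = 0
    for block, h in zip(blocks, manifest):
        if h in missing:
            store.put(block)
            missing.discard(h)  # a repeated block is sent only once
            sent += 1
    return manifest, sent
```

For scale, taking binary units, a 100 GB file at 4 MB per block is 25,600 blocks, so a manifest of 32-byte SHA-256 digests is about 800 KiB. Re-uploading after a one-block edit sends that manifest plus a single block.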
Hint 2
Metadata (tree, versions, ACLs) in transactional DB. Bytes in blob store. Don't conflate.
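One way to picture the split is a schema where a file version is only an ordered list of block hashes. The sketch below uses SQLite as a stand-in for the transactional database; the table names, columns, and the `commit_version` and `blocks_of` helpers are all assumptions made for illustration, and ACLs are left out for brevity.

```python
import sqlite3
from typing import List

# Hypothetical schema: the tree and versions live here, the bytes do not.
SCHEMA = """
CREATE TABLE node (
    id           INTEGER PRIMARY KEY,
    parent_id    INTEGER REFERENCES node(id),
    name         TEXT NOT NULL,
    is_folder    INTEGER NOT NULL,
    head_version INTEGER NOT NULL DEFAULT 0,
    UNIQUE (parent_id, name)
);
CREATE TABLE version_block (
    node_id    INTEGER NOT NULL REFERENCES node(id),
    version    INTEGER NOT NULL,
    seq        INTEGER NOT NULL,
    block_hash TEXT NOT NULL,
    PRIMARY KEY (node_id, version, seq)
);
"""


def commit_version(db: sqlite3.Connection, node_id: int,
                   block_hashes: List[str]) -> int:
    """Record a new version as an ordered list of block hashes."""
    with db:  # commits on success, rolls back if any statement fails
        (head,) = db.execute(
            "SELECT head_version FROM node WHERE id = ?", (node_id,)
        ).fetchone()
        new_version = head + 1
        db.executemany(
            "INSERT INTO version_block VALUES (?, ?, ?, ?)",
            [(node_id, new_version, i, h) for i, h in enumerate(block_hashes)],
        )
        db.execute(
            "UPDATE node SET head_version = ? WHERE id = ?",
            (new_version, node_id),
        )
    return new_version


def blocks_of(db: sqlite3.Connection, node_id: int, version: int) -> List[str]:
    """Return the block hashes of one version, in file order."""
    rows = db.execute(
        "SELECT block_hash FROM version_block "
        "WHERE node_id = ? AND version = ? ORDER BY seq",
        (node_id, version),
    ).fetchall()
    return [h for (h,) in rows]
```

Because the version rows and the head pointer change in one transaction, a reader sees either the old version or the new one, never a half-written block list.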
Hint 3
Offline-first client: local SQLite + operation queue; sync engine reconciles.
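The operation queue in this hint could look roughly like the following. This is a sketch under assumptions: the `op_queue` table, the `enqueue`, `pending`, and `drain` helpers, and the `send` callback (standing in for the network call) are hypothetical names.

```python
import json
import sqlite3
from typing import Callable, List, Tuple


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    """Open the local database and make sure the queue table exists."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS op_queue ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "op TEXT NOT NULL, args TEXT NOT NULL)"
    )
    db.commit()
    return db


def enqueue(db: sqlite3.Connection, op: str, **args) -> None:
    """Persist a local change so it survives restarts while offline."""
    with db:
        db.execute(
            "INSERT INTO op_queue (op, args) VALUES (?, ?)",
            (op, json.dumps(args)),
        )


def pending(db: sqlite3.Connection) -> List[Tuple[int, str, dict]]:
    """Return queued operations in the order they were made."""
    rows = db.execute(
        "SELECT seq, op, args FROM op_queue ORDER BY seq"
    ).fetchall()
    return [(seq, op, json.loads(args)) for seq, op, args in rows]


def drain(db: sqlite3.Connection, send: Callable[[str, dict], bool]) -> int:
    """Replay queued ops in order; stop at the first failure to keep order."""
    sent = 0
    for seq, op, args in pending(db):
        if not send(op, args):
            break  # still offline or rejected: retry on the next drain
        with db:
            db.execute("DELETE FROM op_queue WHERE seq = ?", (seq,))
        sent += 1
    return sent
```

An operation is deleted only after `send` reports success, so a crash between the two steps replays it. Under this design the server side would need to treat operations as idempotent.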
03
Rubric: 10 points
- +2 Chunked CAS: files split into 4 MB blocks, SHA-256-keyed; cross-user dedup of identical blocks
- +2 Metadata in Postgres (sharded by user_id): file tree + versions + ACLs; transactional
- +2 Blocks in object store (S3-like); deduplicated by hash; erasure-coded for durability
- +1 Delta sync: /blocks/check returns missing hashes; client uploads only those
- +1 Offline-first client: SQLite local + operation queue; reconciles via vector clocks
- +1 Conflict resolution: keep both versions, as foo.txt and foo (conflicted copy).txt; do not attempt to merge content
- +1 Sharing via ACLs attached to the metadata node, not to file content; changes propagate via a permission event
Self-score: tally the points you would have mentioned unprompted. 7+ is interview-ready on this problem.
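The vector-clock and conflicted-copy rubric items can be sketched together. This is an illustrative sketch only: the clock representation (a dict from device ID to counter) and the `compare`, `conflicted_copy_name`, and `reconcile` helpers are assumptions, not a description of any real client.

```python
import os
from typing import Dict, List

VectorClock = Dict[str, int]  # device id -> number of edits seen from it


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: before, after, equal, or concurrent."""
    keys = set(a) | set(b)
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def conflicted_copy_name(name: str) -> str:
    """Insert the conflict marker before the file extension."""
    stem, ext = os.path.splitext(name)
    return f"{stem} (conflicted copy){ext}"


def reconcile(name: str, local: VectorClock, remote: VectorClock) -> List[str]:
    """Return the file names that should exist after reconciliation."""
    if compare(local, remote) == "concurrent":
        # Neither edit saw the other: keep both, do not merge content.
        return [name, conflicted_copy_name(name)]
    # One side strictly supersedes the other (or they match): keep one file.
    return [name]
```

If a laptop and a phone both edit `foo.txt` while offline, their clocks are concurrent and `reconcile` yields `foo.txt` plus `foo (conflicted copy).txt`. If only one device edited, its clock dominates and a single file remains.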
04
Red flags (things that tank the interview)
- Uploads whole file on every change (no chunking)
- Tries CRDT / OT merge on file content (the sync layer sees opaque bytes and has no semantic awareness of file formats)
- Stores byte content in Postgres
- Checks the ACL only once, at upload (access doesn't propagate when a parent folder is shared later)
- No client-side sync engine; assumes always-online
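To illustrate the ACL red flag: one possible way to make a folder share reach files uploaded earlier is to resolve access at read time by walking up the metadata tree, and to emit a permission event when a share changes. The sketch below assumes that approach; `Node`, `can_read`, `share`, and the event shape are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


@dataclass
class Node:
    """Hypothetical metadata node: a file or folder in the tree."""
    name: str
    parent: Optional["Node"] = None
    readers: Set[str] = field(default_factory=set)


def can_read(user: str, node: Node) -> bool:
    """Grant access if the user is a reader of the node or any ancestor."""
    current: Optional[Node] = node
    while current is not None:
        if user in current.readers:
            return True
        current = current.parent
    return False


def share(node: Node, user: str, notify: Callable[[dict], None]) -> None:
    """Attach the grant to the metadata node and emit a permission event."""
    node.readers.add(user)
    notify({"type": "permission_changed", "node": node.name, "user": user})
```

A file uploaded before its folder was shared becomes readable as soon as `share` is called on the folder, because the check consults ancestors instead of a decision frozen at upload time. The event lets subscribed clients refresh what they display.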