A cloud file storage and synchronisation service that lets users upload any file, access it from any device, share it with others, and keep it in sync across all their machines — even after offline edits. The hard part is not storing files. It's storing them efficiently, keeping them consistent, and syncing them reliably at petabyte scale.
Changes sync automatically across all connected devices
Users can organize files into folders, rename, move
Users can share files with viewer or editor permissions
Full version history — view and restore previous versions
Conflict resolution — never lose data on concurrent edits
Non-Functional
11-nine durability — a file uploaded is never lost
99.99% availability for metadata operations
Eventual consistency acceptable for sync — strong consistency for permissions
Metadata ops under 100ms p99
Chunk existence checks under 10ms
Horizontally scalable — no single bottleneck
Resumable uploads — no re-upload on connection drop
The Key Distinction
Durability and availability are different things. Durability means the file exists and is not corrupted — ever. Availability means you can access it right now. S3 gives 11 nines of durability by replicating every chunk across 3 availability zones automatically. The metadata service might go down temporarily — that's an availability issue. The bytes in S3 are essentially indestructible.
03
Scale Estimation
Metric
Calculation
Result
Registered users
Industry benchmark
500M
Daily Active Users
~10% of registered
50M
Storage per user (quota)
Google Drive free tier
15 GB
Total raw storage ceiling
500M × 15 GB
7.5 Exabytes
After deduplication (~70% saved)
7.5 EB × 0.30
~2.25 Exabytes actual
Daily uploads
50M DAU × 5 files × 500 KB avg
~125 TB/day
Upload RPS
250M writes ÷ 86,400
~2,900 RPS
Download RPS
500M reads ÷ 86,400
~5,800 RPS
Sustained bandwidth (upload)
125 TB ÷ 86,400
~1.4 GB/s
Sustained bandwidth (download)
250 TB ÷ 86,400
~2.9 GB/s
Key Insight — 2:1 Read/Write
Unlike Twitter (10:1) or a URL shortener (100:1), Drive is nearly balanced at ~2:1 reads to writes. Files are uploaded once but accessed across multiple devices. This means caching is helpful but not the dominant strategy — durability and efficient storage are. The deduplication saving of ~70% is far more impactful than any cache hit rate.
05
API Design
POST/v1/files/upload/initInitiate a chunked upload
Handles file/folder CRUD, permissions checks, and version tracking. Reads from PostgreSQL with a Redis cache in front. Sharded by user_id so all of one user's data lives on one shard — one query, one hop.
Upload Service
Orchestrates the chunked upload flow. Checks Redis for existing chunk hashes, generates pre-signed S3 URLs for missing chunks only, and commits the file record once S3 confirms. Never touches file bytes itself.
Sync Service
Maintains persistent WebSocket connections with all active clients. When a file changes, pushes a lightweight notification event. Clients then pull the delta — which chunks changed — and download only what's new.
S3 + Redis + PostgreSQL
Each storage system does exactly what it's best at. S3 for exabyte-scale blobs. PostgreSQL for rich relational queries. Redis for sub-10ms chunk existence checks — a hash map lookup, O(1), no disk involved.
07
Deep Dive — Chunking, Deduplication & Delta Sync
The Core Insight
The entire efficiency of Google Drive comes from one idea: never transfer or store what you already have. Chunking, hashing, deduplication, and delta sync are all expressions of this single principle. The client does the heavy lifting so the server only sees what's new.
Upload Sequence — DetailedMermaid.js
sequenceDiagram
participant C as Client
participant US as Upload Service
participant R as Redis
participant S3 as S3
C->>C: Split file into 4MB chunks, SHA-256 each
C->>US: POST /upload/init {filename, chunk_hashes[]}
US->>R: SMISMEMBER chunk_hashes (batch check)
R-->>US: [exists, missing, exists, missing]
US-->>C: {upload_id, presigned_urls for missing only}
C->>S3: PUT chunk (direct, bypasses API)
S3-->>C: 200 OK
C->>US: POST /upload/complete {upload_id, chunk_order[]}
US->>R: SADD new chunk hashes
US-->>C: {file_id, version: 1}
Note over US: Async: notify Sync Service → push to other devices
Why chunking at all? A 2GB file is an all-or-nothing proposition without chunking. One dropped connection means starting over. With 4MB chunks, a resume is trivial — the client checks which hashes exist and picks up exactly where it left off. Chunk size is a tunable: too small means too many round trips; too large means resuming wastes more bandwidth. 4–8MB is the Dropbox-era sweet spot.
Content-addressable storage. The chunk hash is not just a fingerprint — it is the storage address. The path in S3 is derived from the hash. This means the same 4MB block of zeros appears in a million different files, but S3 stores it exactly once. Deduplication is free — it falls out naturally from using the hash as the key.
Delta sync. When you edit a 2GB video and change one frame, only 1–2 chunks — maybe 8MB — get new hashes. All other chunks are unchanged. The sync client computes a diff of chunk hashes between local and server state and uploads only the delta. This is why Dropbox felt fast even on slow connections in 2008.
Pre-signed URLs keep your API servers thin. The Upload Service never touches file bytes. It only checks Redis, generates a short-lived S3 URL, and records metadata. The actual data transfer is a direct HTTP PUT from client to S3 — no API server in the middle consuming bandwidth or memory.
When two clients edit the same file concurrently and both try to sync, save both versions. The second syncer gets a "conflicted copy" alongside the winning version. A human merges manually. Data is never lost.
✓ Never lose data — strongest guarantee
Approach B
Conflict: Last write wins
Whoever syncs last silently overwrites. Simple to implement. Someone loses their work with no warning — completely unacceptable for a storage product. Only defensible for systems where data loss is tolerable.
✗ Silent data loss
Approach A — Chosen
Sync: Push via WebSocket
Server holds open persistent connections to all active clients. When a file changes, it pushes a lightweight notification instantly. Client reacts and pulls delta. Idle connections are cheap — typically just a few KB of memory each.
✓ Real-time, no wasted RPS
Approach B
Sync: Poll every N seconds
Simple to implement. But 50M clients polling every 5 seconds = 10M requests/second to servers returning "nothing changed" 99.9% of the time. Enormous wasted infrastructure. Latency is also bounded by poll interval.
✗ 10M RPS of empty work
Approach A — Chosen
Storage: Separate S3 + PostgreSQL
File bytes in S3 (optimised for large sequential blobs). Metadata in PostgreSQL (optimised for relational queries, joins, permissions). Each system does exactly what it's built for. Deduplication and versioning are clean.
✓ Right tool for each job
Approach B
Storage: Everything in one DB
Storing binary blobs in PostgreSQL via BYTEA columns works at small scale. But it kills DB performance, makes backups enormous, and prevents the content-addressable deduplication pattern. Hits a wall at ~1TB.
✗ Breaks at scale
Approach A — Chosen
Folder Tree: Closure Table
Store every ancestor-descendant pair explicitly. "Get all files under this folder" is a single index scan regardless of nesting depth. Reads are O(1). Moves are O(n) where n = descendants — acceptable since moves are rare.
✓ Fast reads, acceptable write cost
Approach B
Folder Tree: Parent Pointer
Simple parent_folder_id on each row. Reads require recursive CTEs — one round trip per nesting level. A folder 10 levels deep means 10 sequential queries. Read performance degrades with depth.
~ Fine for shallow trees
09
What Can Go Wrong
📡
Partial Upload — Connection Drop Mid-Transfer
A user uploads a 2GB file chunked into 512 blocks. The connection drops at chunk 300. Without a resume mechanism, the client re-uploads all 512 chunks on reconnect — wasting 300 chunks of bandwidth that already reached S3.
→ Fix: Resumable uploads — chunk hashes in Redis act as resume state. On reconnect, client re-checks which hashes exist and uploads only the remainder from chunk 301.
🔀
Metadata / Storage Out of Sync
Upload succeeds — chunks land in S3 — but the metadata write to PostgreSQL fails. The result: orphaned chunks in S3 that no file record points to. Users can't see the file. Storage is consumed invisibly and grows without bound.
→ Fix: Two-phase commit — file is marked "pending" before upload, "complete" after. A nightly cleanup job purges stale pending files older than 24 hours and reconciles S3 against the chunk manifest.
⚡
Storage Quota Race Condition
User at 14.8 GB of a 15 GB quota simultaneously uploads two 500 MB files from two devices. Both read the quota, both see 200 MB free, both proceed. User ends up 800 MB over quota with no rejection.
→ Fix: Atomic quota check using SELECT ... FOR UPDATE row-level locking. The second concurrent transaction blocks, re-reads the updated quota, and correctly rejects.
🌊
Sync Storm on Reconnect
A laptop was offline for a week. 500 files changed across shared folders. On reconnect, the client fires all 500 sync requests simultaneously — a burst that can saturate the sync service and degrade performance for other users.
→ Fix: Client-side sync queue with exponential backoff + jitter. Server-side per-user rate limiting. Priority queue gives precedence to files the user is currently viewing.
🔥
Hot Shard — Celebrity Account
PostgreSQL is sharded by user_id. A celebrity account with 10M shared-folder members means every read hits the same shard regardless of total shard count. One machine becomes the bottleneck for a massive percentage of global reads.
→ Fix: Hot account detection — accounts above a threshold get metadata replicated to a read-optimized store with a much longer Redis TTL, effectively caching all reads away from PostgreSQL entirely.
File systems don't know file semantics; a .docx CRDT would corrupt docs.
✓ Better: Save both versions as `foo.txt` and `foo (conflicted copy).txt`; let user pick.
🚫
Notify all N users' devices by polling every 30s
Scales linearly with user count × device count.
✓ Better: Long-poll / WebSocket notify; clients pull delta since last cursor.
10
Interview Tips
01
Separate consistency tiers immediately. Not everything needs the same consistency guarantee. Say it explicitly: "File listings can be eventually consistent — a slight delay is tolerable. Permissions must be strongly consistent — you should never see a file you're not authorised to." This signals mature architectural thinking from the first minute.
02
The client is part of your system — talk about it. Most candidates draw a "Client" box and move on. The Desktop sync client in Dropbox/Drive is doing serious work: chunking, hashing, maintaining a local SQLite state database, diffing against server state. Mentioning client intelligence instantly differentiates you.
03
Be ready for: "How do you handle shared folders at scale?" If 10,000 people share a folder and someone uploads a file, do you notify all 10,000 simultaneously? That's a fan-out problem identical to Twitter's celebrity problem. Answer: tiered notification — online users get pushed notifications instantly, offline users pull on reconnect, very large shared folders batch their notifications.
04
Justify the shard key explicitly. When you say "shard PostgreSQL by user_id," explain why: a user's files, folders, versions, and permissions are almost always accessed together — co-locating them on one shard means one query, one network hop. Any other shard key breaks this data locality.
05
Answer "why not just S3 directly?" proactively. S3 gives you storage — not a product. You need metadata, permissions, versioning, conflict resolution, quota management, sharing semantics, and sync logic on top. S3 is one component of the solution. Framing this proactively shows you understand the layers of the problem.
06
Deduplication is the most impressive number to cite. Dropbox reportedly stores only ~30% of the actual data users think they've uploaded, purely through content-addressable chunk deduplication. Citing this tells the interviewer you've actually studied how these systems are built — not just pattern-matched to a generic design.
11
Similar Problems
These problems share the core patterns of content storage, unique-ID addressing, or sync semantics. Mastering Drive gives you a head start on all of them.
One API server, one PostgreSQL instance, S3 for file storage from day one. Use S3 early — there's no good reason to store files locally. No chunking yet — just upload the whole file. No Redis. Ship fast, optimise later. This is the right call for almost every early-stage startup.
Phase 2 — 10K to 1M users
Add Chunking + Redis
Implement client-side chunking and delta sync — this is your biggest reliability and bandwidth win. Add Redis for chunk existence checks. Add a read replica on PostgreSQL. Add CDN in front of S3 downloads. Most successful products live here their entire lives.
Phase 3 — 1M to 50M users
Shard PostgreSQL + Microservices
Shard PostgreSQL by user_id. Split the monolith into metadata, upload, sync, and share services. Introduce a message queue for async operations (S3 completion events, notification fan-out). Each service scales independently. Introduce closure table for folder trees at this stage.
Phase 4 — 50M+ users
Multi-Region + Block-Level Deduplication
Multi-region active-active for global latency. Block-level deduplication across users (not just within a user's account). Tiered storage — hot files on SSD-backed S3, cold files on S3 Glacier. Separate CDN edge nodes per continent. Most engineers never operate here — but knowing this evolution exists signals architectural depth.