Google Drive

02

Requirements

Functional

Users can upload any file — any type, any size
Users can download files from any device
Changes sync automatically across all connected devices
Users can organize files into folders, rename, move
Users can share files with viewer or editor permissions
Full version history — view and restore previous versions
Conflict resolution — never lose data on concurrent edits

Non-Functional

11-nine durability — a file uploaded is never lost
99.99% availability for metadata operations
Eventual consistency acceptable for sync — strong consistency for permissions
Metadata ops under 100ms p99
Chunk existence checks under 10ms
Horizontally scalable — no single bottleneck
Resumable uploads — no re-upload on connection drop

The Key Distinction

Durability and availability are different things. Durability means the file exists and is not corrupted — ever. Availability means you can access it right now. S3 gives 11 nines of durability by replicating every chunk across 3 availability zones automatically. The metadata service might go down temporarily — that's an availability issue. The bytes in S3 are essentially indestructible.

03

Scale Estimation

Metric	Calculation	Result
Registered users	Industry benchmark	500M
Daily Active Users	~10% of registered	50M
Storage per user (quota)	Google Drive free tier	15 GB
Total raw storage ceiling	500M × 15 GB	7.5 Exabytes
After deduplication (~70% saved)	7.5 EB × 0.30	~2.25 Exabytes actual
Daily uploads	50M DAU × 5 files × 500 KB avg	~125 TB/day
Upload RPS	250M writes ÷ 86,400	~2,900 RPS
Download RPS	500M reads ÷ 86,400	~5,800 RPS
Sustained bandwidth (upload)	125 TB ÷ 86,400	~1.4 GB/s
Sustained bandwidth (download)	250 TB ÷ 86,400	~2.9 GB/s

Key Insight — 2:1 Read/Write

Unlike Twitter (10:1) or a URL shortener (100:1), Drive is nearly balanced at ~2:1 reads to writes. Files are uploaded once but accessed across multiple devices. This means caching is helpful but not the dominant strategy — durability and efficient storage are. The deduplication saving of ~70% is far more impactful than any cache hit rate.

05

API Design

POST /v1/files/upload/init Initiate a chunked upload

// Request — client sends hashes of all chunks { "filename": "report.pdf", "size": 2147483648, "folder_id": "abc", "chunks": ["a3f9...", "b812...", "c291..."] } // Response 200 — server returns only missing chunk URLs { "upload_id": "up_xyz", "missing_chunks": [{ "hash": "b812...", "presigned_url": "https://s3..." }] }

POST /v1/files/upload/complete Commit file after S3 upload

// Client calls after uploading all missing chunks to S3 { "upload_id": "up_xyz", "chunk_order": ["a3f9...", "b812...", "c291..."] } // Response 201 — file is now visible { "file_id": "f_abc", "version": 1, "created_at": "2025-01-01T00:00:00Z" }

GET /v1/files/:id/download Get download URL for a file

// Response 200 — pre-signed URL, valid 15 min { "url": "https://cdn.example.com/f_abc?token=...", "expires_in": 900, "version": 3, "size": 2147483648 }

GET /v1/folders/:id/contents List folder contents

// Response 200 { "folder_id": "f_root", "items": [ { "id": "f1", "type": "file", "name": "report.pdf", "size": 2048, "version": 3 }, { "id": "d1", "type": "folder", "name": "Work" } ], "next_cursor": "cur_xyz" }

POST /v1/files/:id/share Grant access to another user

{ "user_email": "colleague@example.com", "role": "editor" } // Response 200: { "permission_id": "p_xyz", "role": "editor" }

06

High-Level Architecture

Architecture — Service Map SVG Diagram

Metadata Service

Handles file/folder CRUD, permissions checks, and version tracking. Reads from PostgreSQL with a Redis cache in front. Sharded by user_id so all of one user's data lives on one shard — one query, one hop.

Upload Service

Orchestrates the chunked upload flow. Checks Redis for existing chunk hashes, generates pre-signed S3 URLs for missing chunks only, and commits the file record once S3 confirms. Never touches file bytes itself.

Sync Service

Maintains persistent WebSocket connections with all active clients. When a file changes, pushes a lightweight notification event. Clients then pull the delta — which chunks changed — and download only what's new.

S3 + Redis + PostgreSQL

Each storage system does exactly what it's best at. S3 for exabyte-scale blobs. PostgreSQL for rich relational queries. Redis for sub-10ms chunk existence checks — a hash map lookup, O(1), no disk involved.

07

Deep Dive — Chunking, Deduplication & Delta Sync

The Core Insight

The entire efficiency of Google Drive comes from one idea: never transfer or store what you already have. Chunking, hashing, deduplication, and delta sync are all expressions of this single principle. The client does the heavy lifting so the server only sees what's new.

Upload Sequence — Detailed Mermaid.js

sequenceDiagram participant C as Client participant US as Upload Service participant R as Redis participant S3 as S3 C->>C: Split file into 4MB chunks, SHA-256 each C->>US: POST /upload/init {filename, chunk_hashes[]} US->>R: SMISMEMBER chunk_hashes (batch check) R-->>US: [exists, missing, exists, missing] US-->>C: {upload_id, presigned_urls for missing only} C->>S3: PUT chunk (direct, bypasses API) S3-->>C: 200 OK C->>US: POST /upload/complete {upload_id, chunk_order[]} US->>R: SADD new chunk hashes US-->>C: {file_id, version: 1} Note over US: Async: notify Sync Service → push to other devices

Why chunking at all? A 2GB file is an all-or-nothing proposition without chunking. One dropped connection means starting over. With 4MB chunks, a resume is trivial — the client checks which hashes exist and picks up exactly where it left off. Chunk size is a tunable: too small means too many round trips; too large means resuming wastes more bandwidth. 4–8MB is the Dropbox-era sweet spot.

Content-addressable storage. The chunk hash is not just a fingerprint — it is the storage address. The path in S3 is derived from the hash. This means the same 4MB block of zeros appears in a million different files, but S3 stores it exactly once. Deduplication is free — it falls out naturally from using the hash as the key.

Delta sync. When you edit a 2GB video and change one frame, only 1–2 chunks — maybe 8MB — get new hashes. All other chunks are unchanged. The sync client computes a diff of chunk hashes between local and server state and uploads only the delta. This is why Dropbox felt fast even on slow connections in 2008.

Pre-signed URLs keep your API servers thin. The Upload Service never touches file bytes. It only checks Redis, generates a short-lived S3 URL, and records metadata. The actual data transfer is a direct HTTP PUT from client to S3 — no API server in the middle consuming bandwidth or memory.

Request Flow — Step Through

Client · Chunks + hashes→Upload Svc · API Server→Redis · Chunk index→S3 · Object store→Sync Svc · WebSocket push

Click Next Step to walk through the request flow.

08

Key Design Decisions & Tradeoffs

Approach A — Chosen

Conflict: Fork into two files

When two clients edit the same file concurrently and both try to sync, save both versions. The second syncer gets a "conflicted copy" alongside the winning version. A human merges manually. Data is never lost.

✓ Never lose data — strongest guarantee

Approach B

Conflict: Last write wins

Whoever syncs last silently overwrites. Simple to implement. Someone loses their work with no warning — completely unacceptable for a storage product. Only defensible for systems where data loss is tolerable.

✗ Silent data loss

Approach A — Chosen

Sync: Push via WebSocket

Server holds open persistent connections to all active clients. When a file changes, it pushes a lightweight notification instantly. Client reacts and pulls delta. Idle connections are cheap — typically just a few KB of memory each.

✓ Real-time, no wasted RPS

Approach B

Sync: Poll every N seconds

Simple to implement. But 50M clients polling every 5 seconds = 10M requests/second to servers returning "nothing changed" 99.9% of the time. Enormous wasted infrastructure. Latency is also bounded by poll interval.

✗ 10M RPS of empty work

Approach A — Chosen

Storage: Separate S3 + PostgreSQL

File bytes in S3 (optimised for large sequential blobs). Metadata in PostgreSQL (optimised for relational queries, joins, permissions). Each system does exactly what it's built for. Deduplication and versioning are clean.

✓ Right tool for each job

Approach B

Storage: Everything in one DB

Storing binary blobs in PostgreSQL via BYTEA columns works at small scale. But it kills DB performance, makes backups enormous, and prevents the content-addressable deduplication pattern. Hits a wall at ~1TB.

✗ Breaks at scale

Approach A — Chosen

Folder Tree: Closure Table

Store every ancestor-descendant pair explicitly. "Get all files under this folder" is a single index scan regardless of nesting depth. Reads are O(1). Moves are O(n) where n = descendants — acceptable since moves are rare.

✓ Fast reads, acceptable write cost

Approach B

Folder Tree: Parent Pointer

Simple parent_folder_id on each row. Reads require recursive CTEs — one round trip per nesting level. A folder 10 levels deep means 10 sequential queries. Read performance degrades with depth.

~ Fine for shallow trees

09

What Can Go Wrong

📡

Partial Upload — Connection Drop Mid-Transfer

A user uploads a 2GB file chunked into 512 blocks. The connection drops at chunk 300. Without a resume mechanism, the client re-uploads all 512 chunks on reconnect — wasting 300 chunks of bandwidth that already reached S3.

→ Fix: Resumable uploads — chunk hashes in Redis act as resume state. On reconnect, client re-checks which hashes exist and uploads only the remainder from chunk 301.

🔀

Metadata / Storage Out of Sync

Upload succeeds — chunks land in S3 — but the metadata write to PostgreSQL fails. The result: orphaned chunks in S3 that no file record points to. Users can't see the file. Storage is consumed invisibly and grows without bound.

→ Fix: Two-phase commit — file is marked "pending" before upload, "complete" after. A nightly cleanup job purges stale pending files older than 24 hours and reconciles S3 against the chunk manifest.

⚡

Storage Quota Race Condition

User at 14.8 GB of a 15 GB quota simultaneously uploads two 500 MB files from two devices. Both read the quota, both see 200 MB free, both proceed. User ends up 800 MB over quota with no rejection.

→ Fix: Atomic quota check using SELECT ... FOR UPDATE row-level locking. The second concurrent transaction blocks, re-reads the updated quota, and correctly rejects.

🌊

Sync Storm on Reconnect

A laptop was offline for a week. 500 files changed across shared folders. On reconnect, the client fires all 500 sync requests simultaneously — a burst that can saturate the sync service and degrade performance for other users.

→ Fix: Client-side sync queue with exponential backoff + jitter. Server-side per-user rate limiting. Priority queue gives precedence to files the user is currently viewing.

🔥

Hot Shard — Celebrity Account

PostgreSQL is sharded by user_id. A celebrity account with 10M shared-folder members means every read hits the same shard regardless of total shard count. One machine becomes the bottleneck for a massive percentage of global reads.

→ Fix: Hot account detection — accounts above a threshold get metadata replicated to a read-optimized store with a much longer Redis TTL, effectively caching all reads away from PostgreSQL entirely.

⚠

Anti-patterns

🚫

Upload full file on every change

Changing one byte of a 2GB video uploads 2GB.

✓ Better: Chunked content-addressable storage (4MB blocks, SHA-256 keyed); dedup + delta sync.

🚫

Resolve all conflicts via OT / CRDT

File systems don't know file semantics; a .docx CRDT would corrupt docs.

✓ Better: Save both versions as `foo.txt` and `foo (conflicted copy).txt`; let user pick.

🚫

Notify all N users' devices by polling every 30s

Scales linearly with user count × device count.

✓ Better: Long-poll / WebSocket notify; clients pull delta since last cursor.

10

Interview Tips

01

Separate consistency tiers immediately. Not everything needs the same consistency guarantee. Say it explicitly: "File listings can be eventually consistent — a slight delay is tolerable. Permissions must be strongly consistent — you should never see a file you're not authorised to." This signals mature architectural thinking from the first minute.

02

The client is part of your system — talk about it. Most candidates draw a "Client" box and move on. The Desktop sync client in Dropbox/Drive is doing serious work: chunking, hashing, maintaining a local SQLite state database, diffing against server state. Mentioning client intelligence instantly differentiates you.

03

Be ready for: "How do you handle shared folders at scale?" If 10,000 people share a folder and someone uploads a file, do you notify all 10,000 simultaneously? That's a fan-out problem identical to Twitter's celebrity problem. Answer: tiered notification — online users get pushed notifications instantly, offline users pull on reconnect, very large shared folders batch their notifications.

04

Justify the shard key explicitly. When you say "shard PostgreSQL by user_id," explain why: a user's files, folders, versions, and permissions are almost always accessed together — co-locating them on one shard means one query, one network hop. Any other shard key breaks this data locality.

05

Answer "why not just S3 directly?" proactively. S3 gives you storage — not a product. You need metadata, permissions, versioning, conflict resolution, quota management, sharing semantics, and sync logic on top. S3 is one component of the solution. Framing this proactively shows you understand the layers of the problem.

06

Deduplication is the most impressive number to cite. Dropbox reportedly stores only ~30% of the actual data users think they've uploaded, purely through content-addressable chunk deduplication. Citing this tells the interviewer you've actually studied how these systems are built — not just pattern-matched to a generic design.

11

How the Design Evolves

Phase 1 — 0 to 10K users

Single Server + S3

One API server, one PostgreSQL instance, S3 for file storage from day one. Use S3 early — there's no good reason to store files locally. No chunking yet — just upload the whole file. No Redis. Ship fast, optimise later. This is the right call for almost every early-stage startup.

Phase 2 — 10K to 1M users

Add Chunking + Redis

Implement client-side chunking and delta sync — this is your biggest reliability and bandwidth win. Add Redis for chunk existence checks. Add a read replica on PostgreSQL. Add CDN in front of S3 downloads. Most successful products live here their entire lives.

Phase 3 — 1M to 50M users

Shard PostgreSQL + Microservices

Shard PostgreSQL by user_id. Split the monolith into metadata, upload, sync, and share services. Introduce a message queue for async operations (S3 completion events, notification fan-out). Each service scales independently. Introduce closure table for folder trees at this stage.

Phase 4 — 50M+ users

Multi-Region + Block-Level Deduplication

Multi-region active-active for global latency. Block-level deduplication across users (not just within a user's account). Tiered storage — hot files on SSD-backed S3, cold files on S3 Glacier. Separate CDN edge nodes per continent. Most engineers never operate here — but knowing this evolution exists signals architectural depth.

📺

References & Videos

Design Google Drive

ByteByteGo · 11 min

Google Drive System Design

Requirements

Scale Estimation

API Design

High-Level Architecture

Deep Dive — Chunking, Deduplication & Delta Sync

Key Design Decisions & Tradeoffs

What Can Go Wrong

Anti-patterns

Interview Tips

Similar Problems

How the Design Evolves

References & Videos

Back-of-Envelope Estimation