Google Docs / Live Collaborative Editor

01

Problem Statement

Build a system like Google Docs — a web-based document editor that supports real-time collaboration. Multiple users open the same document, type at the same time, and every user's screen shows the same content within milliseconds. The challenge isn't storage or CRUD — it's making concurrent edits converge to one consistent document state.

Think of three people at a whiteboard. Person A erases a word, Person B inserts text nearby, Person C types at the bottom. The whiteboard must look identical for all three — instantly, without anyone losing work.

Core question: How do you let multiple users edit the same document simultaneously and guarantee all users converge to the same final state — without locking, without losing edits, and with low latency?

02

Requirements

Functional Requirements

Create, edit, and delete documents — standard CRUD operations
Real-time collaborative editing — multiple cursors, live character-by-character updates visible to all participants
Sharing & permissions — share documents with view, comment, or edit access
Version history — browse past states of the document, revert if needed
Offline editing — keep working when disconnected, sync changes on reconnect

Non-Functional Requirements

Low-latency collaboration — edits visible to other users within ~100–200ms
Strong eventual consistency — all users must converge to the exact same document state
High availability — users expect the editor to always be accessible
Durability — never lose a document or edit, even during server crashes

03

Scale Estimation

These estimates are grounded in realistic assumptions for a large-scale collaborative editor; they are not necessarily Google's exact numbers.

Monthly Active Users: 100M
Peak Concurrent Editors: 500K
Peak Operations: 500K/s
Total Documents: 5B
Document Storage: 250 TB
With Version History (5×): ~1.25 PB

Derivation

100M MAU → 10M DAU (10% daily engagement). At peak, 5% of DAU are editing → 500K concurrent editors. Each editor produces ~1 operation/second (keystroke, formatting change, cursor move) → 500K ops/sec. 100M users × an average of 50 docs/user = 5B documents; 5B documents × 50KB each = 250TB. Version history multiplies storage ~5× → ~1.25PB.
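The arithmetic above can be double-checked in a few lines of Python. This is only a restatement of the stated assumptions, not new data; the constant names are illustrative, and decimal units (1 TB = 10^9 KB) are assumed.

```python
# Assumptions restated from the derivation above; names are illustrative.
MAU = 100_000_000
DAILY_ENGAGEMENT_PCT = 10       # share of MAU active on a given day
PEAK_EDITING_PCT = 5            # share of DAU editing at peak
OPS_PER_EDITOR_PER_SEC = 1
DOCS_PER_USER = 50
DOC_SIZE_KB = 50
HISTORY_MULTIPLIER = 5

dau = MAU * DAILY_ENGAGEMENT_PCT // 100
peak_editors = dau * PEAK_EDITING_PCT // 100
peak_ops_per_sec = peak_editors * OPS_PER_EDITOR_PER_SEC
total_docs = MAU * DOCS_PER_USER
doc_storage_tb = total_docs * DOC_SIZE_KB / 1e9        # decimal units: 1 TB = 1e9 KB
with_history_pb = doc_storage_tb * HISTORY_MULTIPLIER / 1000

print(f"DAU: {dau:,}")
print(f"Peak concurrent editors: {peak_editors:,}")
print(f"Peak ops/sec: {peak_ops_per_sec:,}")
print(f"Total documents: {total_docs:,}")
print(f"Document storage: {doc_storage_tb:.0f} TB")
print(f"With version history: {with_history_pb:.2f} PB")
```

Changing any one assumption (say, 20% daily engagement) shows how quickly the peak operation rate moves, which is the number the collaboration tier has to be sized for.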

Key insight: Those 500K ops/sec aren't database writes — they're operations that need to be broadcast to collaborators and merged correctly. That's a fundamentally different kind of load than a typical CRUD app.

04

API Design

Two communication channels: REST for document management and WebSocket for real-time editing.

REST — Document Management

Create Document

    POST /api/documents
    Body: { "title": "My Doc" }
    Response: { "id": "doc_abc123", "title": "My Doc", "created_at": "..." }

Get Document

    GET /api/documents/{id}
    Response: { "id": "doc_abc123", "title": "...", "content": "...",
                "permissions": "edit", "revision": 42 }

Share Document

    POST /api/documents/{id}/share
    Body: { "email": "bob@co.com", "permission": "edit" }
    Response: { "status": "shared" }

Version History

    GET /api/documents/{id}/history
    Response: { "versions": [
      { "revision": 42, "timestamp": "...", "author": "alice" },
      { "revision": 38, "timestamp": "...", "author": "bob" }
    ]}

WebSocket — Real-Time Collaboration

Connection

    CONNECT ws://collab.service/documents/{id}
      → Server sends: current document state + revision number

Client → Server Messages

    // Edit operation: "I edited based on revision 5"
    { "type": "operation", "revision": 5,
      "ops": [{ "type": "insert", "pos": 12, "text": "hello" }] }

    // Cursor position update
    { "type": "cursor", "pos": 24, "selection_end": 30 }

Server → Client Messages

    // Acknowledge your operation, assign server revision
    { "type": "ack", "revision": 9 }

    // Broadcast another user's operation (already transformed)
    { "type": "operation", "user": "bob", "revision": 9,
      "ops": [{ "type": "delete", "pos": 5, "len": 3 }] }

    // Presence update
    { "type": "presence", "user": "bob", "cursor": 18, "color": "#4A90D9" }

Key detail: Every operation carries a revision number. The client says "I made this edit based on revision 5." The server, now at revision 8, transforms the operation against revisions 6, 7, and 8 before applying it. This is how OT stays in sync.
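A minimal sketch of that catch-up step, assuming insert-only operations and an in-memory history. `DocumentSession`, `Insert`, and `shift_insert` are hypothetical names chosen for illustration, not part of any real API, and a production server would also handle deletes and formatting.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def shift_insert(op: Insert, applied: Insert) -> Insert:
    """Shift `op` right if `applied` (already in the document) landed at or before it."""
    if applied.pos <= op.pos:
        return Insert(op.pos + len(applied.text), op.text)
    return op


@dataclass
class DocumentSession:
    text: str = ""
    # history[i] is the operation that produced revision i + 1
    history: List[Insert] = field(default_factory=list)

    @property
    def revision(self) -> int:
        return len(self.history)

    def receive(self, op: Insert, base_revision: int) -> Tuple[int, Insert]:
        """Transform `op` against everything applied since `base_revision`, then apply it."""
        if not 0 <= base_revision <= self.revision:
            raise ValueError("unknown base revision")
        for applied in self.history[base_revision:]:
            op = shift_insert(op, applied)
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.history.append(op)
        return self.revision, op


session = DocumentSession()
for ch in "hello":                       # revisions 1..5
    session.receive(Insert(len(session.text), ch), session.revision)
for ch in "abc":                         # revisions 6, 7, 8: other users prepend text
    session.receive(Insert(0, ch), session.revision)

# A client that last saw revision 5 appends "!" after "hello" (position 5).
new_revision, transformed = session.receive(Insert(5, "!"), base_revision=5)
print(new_revision, transformed.pos, session.text)
```

The client's position 5 is shifted once for each of revisions 6, 7, and 8, so the edit still lands after "hello" and the server acknowledges it as revision 9.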

05

High-Level Architecture

The architecture has two planes: a standard REST plane for document CRUD and a specialised real-time plane for collaborative editing via WebSockets. The collaboration server is the brain: it is the only service that keeps live document state in memory.

Why is the Collaboration Server stateful?

It holds the current document state in memory so it can apply OT transforms instantly — no database round-trip per keystroke. This is the key architectural trade: speed at the cost of statefulness. If the server dies, we recover by replaying the operation log onto a new instance.
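Recovery by replay can be sketched as follows, assuming the log stores entries in the same shape as the WebSocket messages (`revision` plus a list of `ops`) and that a snapshot is just the text at a known revision. The function names and the log layout are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def apply_op(text: str, op: Dict) -> str:
    """Apply one insert or delete in the message format used by the WebSocket API."""
    if op["type"] == "insert":
        return text[:op["pos"]] + op["text"] + text[op["pos"]:]
    if op["type"] == "delete":
        return text[:op["pos"]] + text[op["pos"] + op["len"]:]
    raise ValueError(f"unknown op type: {op['type']}")


def recover(snapshot_text: str, snapshot_revision: int, op_log: List[Dict]) -> Tuple[str, int]:
    """Rebuild in-memory state from a snapshot plus the log entries written after it."""
    text, revision = snapshot_text, snapshot_revision
    for entry in sorted(op_log, key=lambda e: e["revision"]):
        if entry["revision"] <= snapshot_revision:
            continue                      # already reflected in the snapshot
        if entry["revision"] != revision + 1:
            raise RuntimeError("gap in operation log")
        for op in entry["ops"]:
            text = apply_op(text, op)
        revision = entry["revision"]
    return text, revision


op_log = [
    {"revision": 5, "ops": [{"type": "insert", "pos": 4, "text": "o"}]},
    {"revision": 6, "ops": [{"type": "insert", "pos": 5, "text": " world"}]},
    {"revision": 7, "ops": [{"type": "delete", "pos": 0, "len": 1}]},
]
recovered_text, recovered_revision = recover("hello", 5, op_log)
print(recovered_revision, repr(recovered_text))
```

Because the logged operations are the already-transformed ones, replay needs no OT at all: it is a plain fold over the log.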

Request Flow — Step Through

Alice Types → WebSocket → Session Router → Collab Server → OT Transform → Op Log → Broadcast → Bob Sees Edit

06

Deep Dive — OT vs CRDTs

This is the hard part of collaborative editing — and the concept that makes this problem unique. Two approaches exist, and understanding both is essential.

Why is concurrent editing hard?

The simplest example. Document says: AB

The Conflict Scenario

Alice inserts X at position 0 and expects XAB.

Bob inserts Y at position 2 and expects ABY.

Both edits are sent at the same time.

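If the server applies Alice's edit first (XAB) and then applies Bob's "insert Y at position 2" unchanged, the result is XAYB instead of XABY. Bob's position is stale because Alice's insert shifted everything right by one. Bob meant to append Y after B, so his intent is lost, and his own screen (which shows XABY once Alice's edit arrives) no longer matches the server's copy.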

Solution 1: OT — "Fix the Positions"

Think of a post office with a smart clerk. Every edit arrives at the server (the clerk). The clerk checks: "Has anything changed since this person sent this?" If yes, the clerk adjusts the position before applying.

Alice inserts X at 0 → applied first. Bob inserts Y at 2 → clerk sees Alice pushed everything right by 1, so transforms Bob's position from 2 to 3. Result: XABY. Both users converge.

sequenceDiagram
    participant A as Alice (Browser)
    participant S as Collab Server
    participant B as Bob (Browser)
    Note over A,B: Document: "AB" (Revision 1)
    A->>S: Insert(X, pos=0) based on rev 1
    B->>S: Insert(Y, pos=2) based on rev 1
    Note over S: Apply Alice → "XAB" (rev 2)
    Note over S: Transform Bob: pos 2→3 (Alice shifted +1)
    Note over S: Apply Bob → "XABY" (rev 3)
    S-->>A: Bob's op (transformed): Insert(Y, pos=3)
    S-->>B: Ack + Alice's op: Insert(X, pos=0)
    Note over A,B: Both see "XABY" ✓
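The position adjustment in the Alice/Bob example can be sketched as a tiny transform function. This covers insert-vs-insert only and breaks ties by user name; both choices are simplifying assumptions for illustration, and real OT engines also cover deletes, formatting, and more. `Insert`, `apply`, and `transform` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    user: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other`; both were made on the same state."""
    other_goes_first = other.pos < op.pos or (other.pos == op.pos and other.user < op.user)
    if other_goes_first:
        return Insert(op.pos + len(other.text), op.text, op.user)
    return op


doc = "AB"
alice = Insert(0, "X", "alice")
bob = Insert(2, "Y", "bob")

naive = apply(apply(doc, alice), bob)                          # stale position
server = apply(apply(doc, alice), transform(bob, alice))       # Alice first, Bob adjusted
bob_side = apply(apply(doc, bob), transform(alice, bob))       # Bob first, Alice adjusted
print(naive, server, bob_side)
```

The naive result reproduces the XAYB problem, while both transformed orders end at XABY. The tie-break matters only when two users insert at the same position; without it the two sides could order the characters differently.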

OT in one sentence

The server is a single referee that assigns a global order to operations and adjusts positions so that everyone's edits produce the same document — regardless of what order they arrived in.

Solution 2: CRDTs — "Don't Use Positions"

Completely different philosophy. Positions cause all the problems — so get rid of them. Give every character a permanent unique ID that never changes. Edits reference these stable IDs, not numeric positions.

The analogy: a train where every car has a unique name. You say "add a dining car between 'sleeper' and 'engine'." Someone else says "add a luggage car between 'engine' and 'caboose'." Both instructions work regardless of order because they reference names, not position numbers.
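A toy sketch of the stable-ID idea, loosely following the RGA style of sequence CRDT. It assumes IDs are `(counter, site)` pairs where a new counter is larger than any the author has seen, that an insert arrives after the character it references, and it leaves out deletes (real designs keep tombstones). `Replica` and the ID layout are hypothetical, not the API of Yjs or Automerge.

```python
from typing import List, Optional, Tuple

CharId = Tuple[int, str]          # (counter, site): unique and never reused


class Replica:
    def __init__(self, initial: List[Tuple[CharId, str]]):
        self.chars: List[Tuple[CharId, str]] = list(initial)

    def insert(self, after: Optional[CharId], new_id: CharId, ch: str) -> None:
        """Insert `ch` after the character with ID `after` (None means document start)."""
        if after is None:
            i = 0
        else:
            i = next(k for k, (cid, _) in enumerate(self.chars) if cid == after) + 1
        # Concurrent inserts at the same spot: the larger ID stays closer to the anchor.
        while i < len(self.chars) and self.chars[i][0] > new_id:
            i += 1
        self.chars.insert(i, (new_id, ch))

    def text(self) -> str:
        return "".join(ch for _, ch in self.chars)


base = [((1, "seed"), "A"), ((2, "seed"), "B")]
alice_op = (None, (3, "alice"), "X")          # X at the start
bob_op = ((2, "seed"), (3, "bob"), "Y")       # Y after the character "B"

r1, r2 = Replica(base), Replica(base)
r1.insert(*alice_op); r1.insert(*bob_op)      # Alice's edit arrives first
r2.insert(*bob_op); r2.insert(*alice_op)      # Bob's edit arrives first
print(r1.text(), r2.text())
```

Neither operation mentions a numeric position, so delivery order does not matter here: Bob's edit says "after B", and B's ID is the same on every replica.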

Aspect	OT (Operational Transformation)	CRDTs
Core idea	Fix the positions after the fact	Don't use positions at all
Needs central server?	Yes (the referee)	No (works peer-to-peer)
Memory overhead	Light	Heavier (metadata per character)
Offline editing	Hard (large batch transforms)	Natural (just sync on reconnect)
Complexity	Transform functions are bug-prone	Data structure design is complex
Used by	Google Docs	Yjs, Automerge, Figma (CRDT-inspired)

Our choice: OT. We already need a server for auth, storage, and permissions. OT's lower memory footprint fits better, and it's the expected answer in interviews. But the industry trend is moving toward CRDTs — worth mentioning as the modern alternative.

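07

Key Design Decisions & Tradeoffs

1. Real-time transport: WebSockets vs. HTTP long polling
2. Server per document: sticky routing (one server per doc) vs. distributed collaboration across servers
3. Durability strategy: operation log + periodic snapshots vs. writing the full document on every edit
4. Permission checking: check on WebSocket connect and cache for the session vs. check every operation
5. Conflict resolution strategy: OT (Operational Transformation) vs. CRDTs

CRDTs: No central server is needed, offline editing is natural, and convergence is mathematically guaranteed. But memory overhead is higher and the approach is less mature at massive scale. The industry is trending in this direction.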

08

What Can Go Wrong

Collaboration server crashes mid-session

All connected users lose their WebSocket connections. Clients detect this, buffer unacknowledged operations locally, and reconnect. A new server spins up, loads the latest snapshot, replays the operation log, and resumes. Users resend buffered ops. Briefly jarring, but no data is lost because the op log is durable.

User goes offline mid-edit

The user keeps editing locally. On reconnect, their local edits are sent as a batch. The server transforms them against all operations that happened while they were away. This is where OT gets tricky — transforming a large batch against hundreds of operations is expensive. This is actually one reason CRDTs are gaining popularity: offline sync is much more natural.

Hot document — 10,000 concurrent editors

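A single collaboration server struggles here, and raw connection count is not the main reason: every operation must be transformed in sequence and then broadcast to every other participant, so 10K ops/sec across 10K editors means on the order of 100M outbound messages per second. Solutions: cap concurrent editors (Google Docs limits to ~100), or split the document into sections with independent collaboration servers per section.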

Operation log grows forever

A frequently edited document's log could reach millions of entries. After taking a snapshot, compact the log — delete operations before the snapshot point. Keep periodic snapshots for version history, but the granular op log can be pruned.
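Compaction can be sketched as folding the old part of the log into a new snapshot and keeping only the tail. This assumes the log is sorted by revision and uses the same entry shape as the WebSocket messages; `replay` and `compact` are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def apply_op(text: str, op: Dict) -> str:
    if op["type"] == "insert":
        return text[:op["pos"]] + op["text"] + text[op["pos"]:]
    if op["type"] == "delete":
        return text[:op["pos"]] + text[op["pos"] + op["len"]:]
    raise ValueError(f"unknown op type: {op['type']}")


def replay(text: str, revision: int, log: List[Dict]) -> Tuple[str, int]:
    """Apply every log entry newer than `revision` (log assumed sorted by revision)."""
    for entry in log:
        if entry["revision"] > revision:
            for op in entry["ops"]:
                text = apply_op(text, op)
            revision = entry["revision"]
    return text, revision


def compact(text: str, revision: int, log: List[Dict], snapshot_at: int):
    """Fold entries up to `snapshot_at` into a new snapshot and drop them from the log."""
    head = [e for e in log if e["revision"] <= snapshot_at]
    tail = [e for e in log if e["revision"] > snapshot_at]
    new_text, new_revision = replay(text, revision, head)
    return new_text, new_revision, tail


log = [
    {"revision": 1, "ops": [{"type": "insert", "pos": 0, "text": "AB"}]},
    {"revision": 2, "ops": [{"type": "insert", "pos": 0, "text": "X"}]},
    {"revision": 3, "ops": [{"type": "insert", "pos": 3, "text": "Y"}]},
    {"revision": 4, "ops": [{"type": "delete", "pos": 1, "len": 1}]},
]
full_state = replay("", 0, log)
snap_text, snap_revision, pruned_log = compact("", 0, log, snapshot_at=3)
compacted_state = replay(snap_text, snap_revision, pruned_log)
print(full_state, (snap_text, snap_revision), len(pruned_log), compacted_state)
```

The check worth keeping in a real system is the last one: replaying the pruned log over the new snapshot must give exactly the state the full log gives.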

Split brain — two servers think they own the same document

Consistent hashing routes a document to a new server but the old one hasn't released it yet — two referees for the same game. Solve with a distributed lock (ZooKeeper/etcd). A server must acquire the lock before serving a document. Old server's lock expires or is released on graceful shutdown.
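One caveat with expiring locks is that an old server may not notice its lease has lapsed. A common guard, added here as an assumption rather than something the design above specifies, is a fencing token: each successful acquire returns a higher epoch, and the op log rejects writes carrying an older one. The sketch below is an in-memory stand-in; `LeaseStore` and `OpLog` are hypothetical classes and do not mirror the ZooKeeper or etcd client APIs.

```python
from typing import Dict, List, Optional, Tuple


class LeaseStore:
    """In-memory stand-in for a lock service; time is passed in to keep it deterministic."""

    def __init__(self) -> None:
        self._leases: Dict[str, Tuple[str, float, int]] = {}   # doc -> (owner, expires_at, epoch)

    def acquire(self, doc_id: str, owner: str, now: float, ttl: float = 10.0) -> Optional[int]:
        current = self._leases.get(doc_id)
        if current and current[0] != owner and current[1] > now:
            return None                       # another server holds a live lease
        epoch = current[2] + 1 if current else 1
        self._leases[doc_id] = (owner, now + ttl, epoch)
        return epoch


class OpLog:
    """Append-only log that refuses writes from a server with a stale epoch."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, int, dict]] = []
        self._highest_epoch: Dict[str, int] = {}

    def append(self, doc_id: str, epoch: int, op: dict) -> None:
        if epoch < self._highest_epoch.get(doc_id, 0):
            raise PermissionError("stale lease epoch")
        self._highest_epoch[doc_id] = epoch
        self.entries.append((doc_id, epoch, op))


store, log = LeaseStore(), OpLog()
old_epoch = store.acquire("doc_abc123", "server-a", now=0.0)
blocked = store.acquire("doc_abc123", "server-b", now=5.0)      # lease still live
new_epoch = store.acquire("doc_abc123", "server-b", now=11.0)   # lease has expired
log.append("doc_abc123", new_epoch, {"type": "insert", "pos": 0, "text": "hi"})
try:
    log.append("doc_abc123", old_epoch, {"type": "insert", "pos": 0, "text": "late"})
    rejected = False
except PermissionError:
    rejected = True
print(old_epoch, blocked, new_epoch, rejected)
```

With this in place, even if the old server keeps running after its lease expires, its late writes never reach the log, so there is still only one referee per document.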

09

Interview Tips

Start with a single-user editor
Don't jump straight into OT and CRDTs. Say "let me first design a simple document editor for one user, then add collaboration." This shows structured thinking and gives you a baseline to build complexity on top of.

Name the core challenge in the first 2 minutes
Say: "The hard part of this problem isn't storage or APIs — it's concurrent edit resolution." Interviewers want to see you identify what makes a problem unique instead of giving a generic architecture.

Don't implement the OT transform function
Explain the concept with the simple two-user, two-edit, position-adjustment example. Don't try to write the algorithm — it's notoriously complex and not what they're testing. They want architectural thinking, not code.

Mention OT vs CRDT even if you only deep-dive one
Even if you pick OT, saying "the modern alternative is CRDTs, and Figma uses a CRDT-inspired design" shows breadth. Name real systems: Google Docs = OT, Yjs and Automerge = CRDTs, Figma = CRDT-inspired.

Address the stateful server head-on
The collaboration server breaks the "stateless servers" rule. Don't hide it — say "this component is intentionally stateful for performance, and here's how I handle failover: sticky routing, operation log for recovery, session draining on deploy."

10

Evolution

How this design grows from MVP to planet-scale.

1

MVP — Single-User Editor

REST API, save-on-click, PostgreSQL for storage. No real-time, no collaboration. One server, one database. Validate the editing experience before adding complexity.

2

Add Collaboration

WebSocket server with OT engine, append-only operation log, one collaboration server per document with sticky routing. This gets you to "Google Docs circa 2010" — functional real-time editing for small teams.

3

Scale & Harden

Multiple collaboration server instances with consistent hashing, periodic snapshots + log compaction, CDN for document loading, read replicas for metadata. Add presence (cursors, avatars), commenting, suggestion mode.

4

Planet Scale

Document sharding by section for hot documents, CRDT migration for better offline support, edge servers for low latency worldwide, ML-powered features (autocomplete, grammar, smart compose).
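The consistent hashing mentioned in the Scale & Harden step can be sketched as a hash ring with virtual nodes. The server names, the vnode count, and the `HashRing` class are assumptions made for illustration; the point is that removing a server only remaps the documents that server owned.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    # First 8 bytes of SHA-256 as an integer position on the ring.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers: List[str], vnodes: int = 100):
        if not servers:
            raise ValueError("at least one server is required")
        # Each server appears `vnodes` times on the ring to even out the load.
        self._ring = sorted((_hash(f"{s}#{v}"), s) for s in servers for v in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def server_for(self, doc_id: str) -> str:
        """Return the first server clockwise from the document's hash."""
        i = bisect.bisect_right(self._keys, _hash(doc_id)) % len(self._ring)
        return self._ring[i][1]


docs = [f"doc_{n}" for n in range(1000)]
before_ring = HashRing(["collab-a", "collab-b", "collab-c"])
after_ring = HashRing(["collab-a", "collab-b"])          # collab-c is drained

before = {d: before_ring.server_for(d) for d in docs}
after = {d: after_ring.server_for(d) for d in docs}
moved = [d for d in docs if before[d] != after[d]]
print(f"{len(moved)} of {len(docs)} documents moved")
```

Every document that moves was previously on the drained server, which keeps sessions on the surviving servers undisturbed during a deploy or a failure.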
