Collaborative Whiteboard

Infinite canvas. Shapes, text, freehand drawing. Multiple users see each other's cursors in real time. Undo/redo, zoom/pan, offline edits that sync when reconnected. Like Miro or FigJam. The hard parts: a CRDT for canvas objects so concurrent edits merge without conflicts, spatial indexing so a canvas with 100K objects only sends the ~200 in your viewport, and per-user undo that reverses your operations without undoing other people's work. Scale: boards with 100K+ objects, 50+ simultaneous editors, sub-100ms collaboration latency.

Core: CRDT + Spatial Index + Real-Time Cursors + Per-User Undo · 100K+ canvas objects · 50+ concurrent users · Sub-100ms sync · Offline support

Requirements

Functional

Create/edit shapes (rect, circle, line), text blocks, and freehand drawings on an infinite canvas
See other users' cursors and selections in real time with name labels
Undo/redo per user — undoing your action does not undo others' actions
Zoom/pan across the infinite canvas; viewport-based loading
Offline mode: queue edits locally, sync and merge on reconnect
Board sharing: link-based access with viewer/editor/admin roles

Non-Functional

Collaboration latency < 100 ms (edit visible to others within 100 ms)
Canvas supports 100K+ objects without client performance degradation
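Concurrent edits never diverge or get silently dropped: edits to different objects or fields are all preserved, and edits to the same field resolve deterministically (last writer wins)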
Cursor updates at ~15 Hz — smooth but not wasteful
Offline edits merge cleanly on reconnect — no manual conflict resolution
Support 50+ simultaneous editors per board

Scale Estimation

Objects per board: ~100K (shapes, text, drawings; large boards accumulate over months)
Objects in viewport: ~200 (only these need rendering and real-time sync)
Concurrent editors: ~50 (per board during active workshops; spikes to 100+)
Cursor updates: ~15 Hz (per user; batched; only sent to overlapping viewports)
Object operations: ~5/sec/user (create, move, resize, edit text, delete)
Sync latency target: < 100 ms (edit on user A's screen visible on user B's within 100 ms)

API Design

WS /ws/boards/{board_id}

Primary collaboration channel. Client sends CRDT operations (create, update, delete objects) and cursor positions. Server broadcasts to other clients in same board. Binary protocol (CBOR) for efficiency.

GET /api/boards/{board_id}/objects?viewport=x,y,w,h

Fetch objects within viewport rectangle. Returns {objects: [{id, type, position, content, z_index}]}. Used on initial load and large pan jumps. R-tree spatial query on server.

POST /api/boards

Create new board. Body: {name, template_id?}. Returns {board_id, share_link}. Initializes empty CRDT document.

POST /api/boards/{board_id}/snapshot

Export the board as PNG/SVG/PDF. The server renders the full canvas and returns {download_url}. Used for sharing with non-editors.

POST /api/boards/{board_id}/sync

Offline sync endpoint. Client sends buffered CRDT operations accumulated offline. Server merges into board state, returns delta of changes made by others while offline.

Architecture

Three layers: real-time collaboration (WebSocket servers + CRDT merge), spatial query (R-tree index for viewport loading), persistence (CRDT state snapshots + operation log). Clients are thick — they hold the full CRDT state for objects in their viewport and apply operations locally before sending to server.

WebSocket servers are stateful (each holds connections for a set of boards) but horizontally scaled via consistent hashing on board_id. If a server dies, clients reconnect to a different server and replay from their local CRDT state. Persistence layer takes periodic snapshots (every 30 seconds or every 100 operations) to bound recovery time. The operation log is append-only and serves as the source of truth for replay, audit trails, and offline sync.
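
The board-to-server routing described above can be sketched as a consistent-hash ring. This is a minimal illustration, not the actual routing layer: the `HashRing` class, the virtual-node count, and server names such as `ws-1` are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Each server owns several points on the ring to even out load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def server_for(self, board_id: str) -> str:
        if not self._ring:
            raise LookupError("no servers in ring")
        # First ring point at or after the board's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(board_id),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["ws-1", "ws-2", "ws-3"])
print(ring.server_for("board_42"))
```

When a server is removed, only the boards that lived on it are reassigned; every other board keeps its server, which is the property that makes reconnect-after-crash cheap.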

Figure: Collaborative Whiteboard Architecture (SVG diagram)

Request Flow

Client Canvas (local CRDT state) → WebSocket Server (connection hub) → CRDT Engine (merge operations) → Spatial Index (R-tree query) → Cursor Relay (viewport-filtered) → Undo Service (per-user op stack) → Persistence (snapshots + op log)


Deep Dive — CRDTs, Spatial Indexing & Per-User Undo

(a) CRDT for canvas objects. Each shape/text/drawing is a CRDT map with fields: {id, type, position, size, content, z_index, style, deleted}. The key insight: each mutable field is an independent last-writer-wins (LWW) register, timestamped with a Lamport clock.

-- CRDT object structure
{
  id: "obj_abc123",          -- globally unique (UUID)
  type: "rect",
  position: LWW{x: 100, y: 200, ts: 42, author: "user_A"},
  size: LWW{w: 300, h: 150, ts: 42, author: "user_A"},
  content: LWW{text: "Hello", ts: 40, author: "user_B"},
  z_index: LWW{val: 5, ts: 38, author: "user_A"},
  deleted: LWW{val: false, ts: 35, author: "user_A"}
}
-- Merge rule: for each field, keep the value with the highest timestamp;
-- break timestamp ties by author id so every replica picks the same winner
-- Concurrent creates: both objects appear (add-wins set)
-- Concurrent moves: last-writer-wins on position (acceptable for whiteboard)
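
A minimal runnable sketch of the field-wise merge rule follows. The `LWW` class and `merge_object` helper are hypothetical names, and comparing `(ts, author)` pairs to break timestamp ties is an assumption of this sketch rather than something the structure above spells out.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class LWW:
    value: Any
    ts: int       # Lamport timestamp of the write
    author: str   # tie-breaker when two writes share a timestamp

    def merge(self, other: "LWW") -> "LWW":
        # Compare (ts, author) so every replica picks the same winner.
        if (other.ts, other.author) > (self.ts, self.author):
            return other
        return self


def merge_object(a: Dict[str, LWW], b: Dict[str, LWW]) -> Dict[str, LWW]:
    """Field-wise merge of two replicas of the same canvas object."""
    merged = dict(a)
    for name, reg in b.items():
        merged[name] = merged[name].merge(reg) if name in merged else reg
    return merged


replica_a = {
    "position": LWW((100, 200), ts=42, author="user_A"),
    "content": LWW("Hello", ts=40, author="user_B"),
}
replica_b = {
    "position": LWW((300, 50), ts=41, author="user_B"),
    "content": LWW("Hello!", ts=43, author="user_B"),
}
merged = merge_object(replica_a, replica_b)
print(merged["position"].value, merged["content"].value)
```

Because each field is merged independently, user A's move and user B's text edit both survive even though they touched the same object.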

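The board itself is an add-wins set (OR-Set) of objects. Creating an object adds it to the set. Deleting sets deleted: true (a tombstone) rather than removing the entry. If an add or restore races with a delete, the add wins (add-wins semantics), so a concurrent delete never silently discards an object that another user is creating or restoring.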

Why LWW works for whiteboards but not for text. In a text editor, concurrent inserts at the same position must both appear (interleaving). LWW would lose one insert, which is unacceptable. But for spatial objects, if two users move the same rectangle simultaneously, having it end up at one position or the other is fine: the "loser" just moves it again. This is why whiteboards can use LWW registers, while text editors need sequence CRDTs (e.g., RGA, or YATA as implemented in Yjs) or OT, which is what Google Docs uses.

Lamport clocks for ordering. Each client maintains a logical clock. On every operation, increment the clock. When receiving a remote operation, set clock = max(local, remote) + 1. Ordering operations by (clock, client id) then gives a total order that is consistent with causality, without relying on wall-clock time (which can skew between clients). The client id only breaks ties between concurrent operations that share a clock value.
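
As a small illustration of the clock rules, assuming a hypothetical `LamportClock` class that stamps each operation with a `(time, client_id)` pair:

```python
class LamportClock:
    """Logical clock; stamps are (time, client_id) so ties break by client id."""

    def __init__(self, client_id: str):
        self.client_id = client_id
        self.time = 0

    def tick(self):
        """Call before every local operation; returns the operation's stamp."""
        self.time += 1
        return (self.time, self.client_id)

    def observe(self, remote_time: int) -> None:
        """Call when a remote operation arrives."""
        self.time = max(self.time, remote_time) + 1


a = LamportClock("user_A")
b = LamportClock("user_B")
s1 = a.tick()      # A edits
s2 = b.tick()      # B edits concurrently, same clock value as s1
b.observe(s1[0])   # B receives A's operation
s3 = b.tick()      # B's next edit is ordered after both earlier ones
print(sorted([s3, s2, s1]))
```

Python compares tuples element by element, so sorting the stamps orders by clock value first and by client id only when the clock values are equal.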

(b) Spatial indexing with R-tree. A board with 100K objects cannot send all of them to every client on connect. Solution: server maintains an R-tree (rectangle tree) index of all object bounding boxes.

-- Viewport query
client_viewport = {x: 500, y: 200, width: 1200, height: 800}
visible_objects = rtree.query(client_viewport)
-- Returns ~200 objects that intersect the viewport
-- Client only receives and renders these

-- On pan/zoom: client sends new viewport
-- Server diffs: new_objects = query(new_vp) - query(old_vp)
-- Only send the delta — objects entering the viewport

The R-tree is updated on every object move, create, and delete. For 100K objects, a query costs O(log N + k) on average, where k is the result count (heavily overlapping bounding boxes can degrade this). That is sub-millisecond for typical viewports.
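
The viewport query and pan delta can be tried out with a much simpler structure than an R-tree. The sketch below swaps in a uniform grid index as a stand-in (it is not an R-tree and has different performance characteristics); `GridIndex`, `pan_delta`, and the 512-unit cell size are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def intersects(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


class GridIndex:
    """Uniform-grid spatial index; a simplified stand-in for an R-tree."""

    def __init__(self, cell: float = 512.0):
        self.cell = cell
        self.boxes: Dict[str, Rect] = {}
        self.cells: Dict[Tuple[int, int], Set[str]] = defaultdict(set)

    def _cells_for(self, r: Rect):
        x, y, w, h = r
        for cx in range(int(x // self.cell), int((x + w) // self.cell) + 1):
            for cy in range(int(y // self.cell), int((y + h) // self.cell) + 1):
                yield (cx, cy)

    def remove(self, obj_id: str) -> None:
        box = self.boxes.pop(obj_id, None)
        if box is not None:
            for c in self._cells_for(box):
                self.cells[c].discard(obj_id)

    def upsert(self, obj_id: str, box: Rect) -> None:
        self.remove(obj_id)  # a move is a remove plus an insert
        self.boxes[obj_id] = box
        for c in self._cells_for(box):
            self.cells[c].add(obj_id)

    def query(self, viewport: Rect) -> Set[str]:
        candidates: Set[str] = set()
        for c in self._cells_for(viewport):
            candidates |= self.cells.get(c, set())
        # Cells are coarse, so confirm with an exact rectangle test.
        return {i for i in candidates if intersects(self.boxes[i], viewport)}


def pan_delta(index: GridIndex, old_vp: Rect, new_vp: Rect):
    """Return (entering, leaving) object ids for a viewport change."""
    before, after = index.query(old_vp), index.query(new_vp)
    return after - before, before - after
```

In production a real R-tree (for example RBush on the client) would replace `GridIndex`; the query-then-diff shape of `pan_delta` stays the same.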

Viewport subscription model. Each WebSocket connection registers its current viewport with the server. When any object changes, the server checks which viewports intersect the object's bounding box and broadcasts only to those connections. Per operation, this replaces a naive broadcast to every connected user, O(users), with a lookup that reaches only the affected users.

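-- Viewport subscription (server-side)
on_object_change(obj, old_bounding_box):
    -- Query the old and new boxes so viewports the object just left are also notified
    affected_viewports = rtree_viewport_index.query(old_bounding_box)
                       + rtree_viewport_index.query(obj.bounding_box)
    for viewport in unique(affected_viewports):
        viewport.connection.send(obj.crdt_delta)

on_viewport_change(connection, new_viewport):
    old_viewport = connection.viewport
    leaving = rtree.query(old_viewport) - rtree.query(new_viewport)
    entering = rtree.query(new_viewport) - rtree.query(old_viewport)
    connection.send({remove: leaving.ids, add: entering.full_objects})
    connection.viewport = new_viewport
    rtree_viewport_index.update(connection, new_viewport)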
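
A runnable sketch of the subscription routing, assuming a hypothetical `ViewportRouter` that scans all viewports linearly (reasonable for 50 to 100 connections; a second spatial index over viewports would replace the scan at larger scale). Checking the object's old box as well as its new one is an assumption of this sketch, added so that a viewport the object just left still hears about the move.

```python
from typing import Dict, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def intersects(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


class ViewportRouter:
    """Tracks one viewport per connection and picks broadcast recipients."""

    def __init__(self):
        self.viewports: Dict[str, Rect] = {}

    def set_viewport(self, conn_id: str, viewport: Rect) -> None:
        self.viewports[conn_id] = viewport

    def disconnect(self, conn_id: str) -> None:
        self.viewports.pop(conn_id, None)

    def recipients(
        self,
        new_box: Rect,
        old_box: Optional[Rect] = None,
        exclude: Optional[str] = None,
    ) -> Set[str]:
        """Connections that can see the object before or after the change."""
        boxes = [new_box] if old_box is None else [new_box, old_box]
        return {
            conn_id
            for conn_id, vp in self.viewports.items()
            if conn_id != exclude and any(intersects(vp, b) for b in boxes)
        }


router = ViewportRouter()
router.set_viewport("A", (0, 0, 1200, 800))
router.set_viewport("B", (1000, 0, 1200, 800))
router.set_viewport("C", (5000, 3000, 1200, 800))
# User A drags an object out of A's view and into B's.
print(router.recipients((1500, 300, 100, 100), old_box=(100, 100, 50, 50), exclude="A"))
```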

(c) Cursor broadcasting. Each user's cursor position is sent via WebSocket at ~15 Hz (every ~67 ms). The server does not broadcast to all users on the board — only to users whose viewports overlap with the cursor position. If user A's cursor is at (5000, 3000) and user B is viewing (0, 0) to (1200, 800), user B does not receive A's cursor.

Client-side cursor interpolation. At 15 Hz, raw cursor positions appear jumpy. The client interpolates between received positions (linearly in the sketch below; a spline such as a cubic Bezier gives smoother curves), making the remote cursor appear smooth at 60 fps despite only receiving 15 updates/sec. This is the same technique used in multiplayer games for entity interpolation.

-- Client-side cursor interpolation
on_cursor_update(remote_user, new_pos):
    remote_user.prev_pos = remote_user.target_pos
    remote_user.target_pos = new_pos
    remote_user.interp_start = now()
    remote_user.interp_duration = 67ms  -- 1/15 Hz

on_render_frame():  -- called at 60 fps
    for user in remote_users:
        t = (now() - user.interp_start) / user.interp_duration
        t = clamp(t, 0, 1)
        draw_cursor(lerp(user.prev_pos, user.target_pos, t))
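
The same interpolation as a small runnable sketch. `RemoteCursor` is a hypothetical class, and time is passed in explicitly so the logic can be tested without a render loop. One deliberate difference from the pseudocode: each update starts from the currently drawn position rather than from the previous target, which is an assumption made here to avoid a visible jump when packets arrive early or late.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]
UPDATE_INTERVAL = 1 / 15  # seconds between cursor updates at 15 Hz


def lerp(a: Point, b: Point, t: float) -> Point:
    """Linear interpolation between two points for t in [0, 1]."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


@dataclass
class RemoteCursor:
    prev: Point
    target: Point
    start: float = 0.0  # time at which the current interpolation began

    def position(self, now: float) -> Point:
        """Position to draw on this frame; called at ~60 fps."""
        t = (now - self.start) / UPDATE_INTERVAL
        return lerp(self.prev, self.target, min(max(t, 0.0), 1.0))

    def on_update(self, new_pos: Point, now: float) -> None:
        # Start from where the cursor is drawn right now, so an update that
        # arrives early or late does not make it jump.
        self.prev = self.position(now)
        self.target = new_pos
        self.start = now


cursor = RemoteCursor(prev=(0.0, 0.0), target=(0.0, 0.0))
cursor.on_update((30.0, 60.0), now=0.0)
for frame in range(5):
    print(cursor.position(now=frame / 60))
```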

(d) Per-user undo. Global undo (undo the last operation on the board) is a disaster in multiplayer: you undo someone else's work. Per-user undo: each user has a stack of their own operations. Undoing operation A means applying A-inverse.

-- Per-user undo stack
user_A.undo_stack = [
  {op: "move obj_1 from (0,0) to (100,200)", inverse: "move obj_1 to (0,0)"},
  {op: "create obj_2", inverse: "delete obj_2"},
  {op: "change obj_1 color to red", inverse: "change obj_1 color to blue"}
]
-- Undo: pop stack, apply inverse operation
-- Inverse is itself a valid CRDT operation — it merges normally
-- Other users see the undo as a regular edit
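
A minimal sketch of the per-user stack, under simplifying assumptions: every operation is a single field write, an object that has never been created is modeled as `{"deleted": True}`, and the `Op` and `Board` names are hypothetical. Timestamps and merging are left out to keep the focus on the undo bookkeeping.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Op:
    obj_id: str
    field: str
    value: Any
    author: str


class Board:
    def __init__(self):
        self.objects: Dict[str, Dict[str, Any]] = {}
        self.undo_stacks: Dict[str, List[Op]] = {}

    def apply(self, op: Op, record: bool = True) -> None:
        # An object nobody has created yet behaves like a deleted one.
        obj = self.objects.setdefault(op.obj_id, {"deleted": True})
        if record:
            # The inverse restores the value this field had before the write.
            inverse = Op(op.obj_id, op.field, obj.get(op.field), op.author)
            self.undo_stacks.setdefault(op.author, []).append(inverse)
        obj[op.field] = op.value

    def undo(self, user: str) -> Optional[Op]:
        stack = self.undo_stacks.get(user)
        if not stack:
            return None
        inverse = stack.pop()
        # The inverse is sent and applied like any other edit.
        self.apply(inverse, record=False)
        return inverse


board = Board()
board.apply(Op("obj_1", "deleted", False, "user_A"))        # A creates obj_1
board.apply(Op("obj_1", "position", (0, 0), "user_A"))
board.apply(Op("obj_2", "deleted", False, "user_B"))        # B creates obj_2
board.apply(Op("obj_2", "position", (5, 5), "user_B"))
board.apply(Op("obj_1", "position", (100, 200), "user_A"))  # A moves obj_1
board.undo("user_A")                                        # only A's move is reversed
print(board.objects)
```

One caveat worth stating in an interview: the inverse restores the value the user last saw, so if another user has since changed the same field, a naive undo overwrites that change. A production system would likely detect this case and skip or adjust the undo.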

Real-Time Collaboration Sequence (Mermaid)

sequenceDiagram
    participant A as User A (Client)
    participant WS as WebSocket Server
    participant CR as CRDT Engine
    participant SI as Spatial Index
    participant B as User B (Client)
    A->>WS: op: move obj_1 to (100,200)
    WS->>CR: merge CRDT operation
    CR->>SI: update R-tree for obj_1
    CR-->>WS: merged state
    WS->>B: broadcast op (viewport overlap)
    B->>B: apply op locally
    A->>WS: cursor at (150,220)
    WS->>WS: check viewport overlap
    WS->>B: cursor A at (150,220)
    B->>WS: op: create obj_2 at (300,400)
    WS->>CR: merge CRDT operation
    CR->>SI: insert obj_2 into R-tree
    WS->>A: broadcast op (viewport overlap)

Offline sync protocol. When a client goes offline, it continues applying operations locally to its CRDT state. Operations are buffered in an ordered queue. On reconnect:

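Client sends the last server log position it has seen (a sequence number in the append-only operation log, or a version vector if several servers accept writes)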
Server sends all operations since that timestamp (delta sync)
Client merges server delta into local state (CRDT merge — automatic, no conflicts)
Client sends its buffered offline operations to server
Server merges offline ops and broadcasts to other clients

Because CRDT merges are commutative, associative, and idempotent, the order of merge does not matter and replaying an operation twice is harmless. The final state is identical regardless of whether operations arrive in-order or out-of-order. This is the fundamental advantage over OT.
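
The convergence claim can be checked directly. The sketch below, using a hypothetical flat op tuple `(obj_id, field, value, ts, author)`, replays the same operations in every possible order and with duplicates, and confirms the final state never changes.

```python
from itertools import permutations


def apply_op(state, op):
    """LWW apply: keep the write with the larger (ts, author) per field."""
    obj_id, field, value, ts, author = op
    key = (obj_id, field)
    current = state.get(key)
    if current is None or (ts, author) > current[:2]:
        state[key] = (ts, author, value)
    return state


def replay(ops):
    state = {}
    for op in ops:
        apply_op(state, op)
    return state


ops = [
    ("obj_1", "deleted", False, 35, "user_A"),
    ("obj_1", "content", "Hello", 40, "user_B"),
    ("obj_1", "position", (100, 200), 42, "user_A"),  # online edit
    ("obj_1", "position", (300, 50), 42, "user_B"),   # concurrent offline edit
]

states = [replay(order) for order in permutations(ops)]
print(all(s == states[0] for s in states))   # every order converges
print(replay(ops + ops) == states[0])        # duplicates are harmless
```

This is why step order in the reconnect protocol is a performance choice rather than a correctness requirement: the client could send its buffered ops before or after receiving the server delta and end up in the same state.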

Interview answer

"Each canvas object is a CRDT with LWW registers per field — concurrent edits merge automatically, no central coordinator needed. The board is an add-wins set (OR-Set) so concurrent creates both appear. Spatial indexing via R-tree means clients only receive the ~200 objects in their viewport, not all 100K. Cursors broadcast at 15 Hz only to users with overlapping viewports. Undo is per-user: each operation has an inverse, and applying the inverse is itself a valid CRDT op. Offline edits buffer locally and merge on reconnect — same CRDT semantics, just delayed."

Anti-patterns

Send all 100K canvas objects to every client on load

Client receives 20 MB of object data on connect. Rendering 100K objects crashes the browser. With only ~200 of 100K objects in the viewport, more than 99% of that data is off-screen and wasted bandwidth.

Better: Viewport-based lazy loading via R-tree spatial query. Send only the ~200 objects visible in the current viewport. Load more on pan/zoom via delta queries.

Use OT (Operational Transform) instead of CRDT

OT in its common form relies on a central server to order and transform operations, which makes that server a single point of failure. Peer-to-peer and long-offline editing are hard, because every operation must be transformed against all concurrent ops, and in practice that happens on the server.

Better: CRDT (Conflict-free Replicated Data Types). Each client can apply operations locally and merge later. Works offline, peer-to-peer, and with multiple servers. No central transform needed.

Global undo stack — undo reverses the last operation on the board by any user

User A presses Ctrl+Z and undoes user B's rectangle that B just carefully positioned. Chaos in multiplayer. Users learn not to press undo, which is terrible UX.

Better: Per-user undo stack. Each user's Ctrl+Z only reverses their own last operation. The inverse operation is applied as a new CRDT op visible to everyone.

Tradeoffs & Design Choices

CRDT vs OT. CRDT: no central server needed, works offline, peer-to-peer friendly, but larger metadata overhead (timestamps, tombstones). OT: smaller wire format, but requires a central transform server and strict ordering. For a whiteboard (spatial objects, not linear text), CRDT is the clear winner.
LWW (last-writer-wins) vs multi-value register. LWW loses one concurrent write — acceptable for position (two users drag the same object; one wins). Multi-value would show both positions and ask user to pick — bad UX for a whiteboard. LWW is the pragmatic choice.
Server-authoritative vs peer-to-peer. Server-authoritative: simpler, easier access control, single source of truth. P2P: lower latency, no server cost, but hard to enforce permissions. Hybrid: clients apply locally (optimistic), server is authoritative merge point.
Cursor frequency: 15 Hz vs 60 Hz. 60 Hz = pixel-perfect cursor tracking but 4x bandwidth. 15 Hz with client-side interpolation (smooth the cursor between updates) gives near-identical UX at 1/4 the cost.
Tombstones vs compaction. Deleted objects become tombstones (deleted: true). Over time, tombstones accumulate. Periodic compaction removes tombstones older than 30 days — but must coordinate so no offline client still references them.
Freehand drawing representation. Freehand strokes are polylines (arrays of points). Storing every sampled point is wasteful; Ramer-Douglas-Peucker simplification can cut the point count by around 80% at a tolerance where the visual difference is imperceptible (the exact saving depends on the stroke and the tolerance). Each stroke is one CRDT object, so concurrent drawing of different strokes creates independent objects (no conflict).
Z-index ordering. Z-index determines which objects appear on top. CRDT LWW on z_index means concurrent "bring to front" operations: last one wins. Alternative: use a fractional indexing scheme (e.g., "0.5" between "0" and "1") to allow insertions without renumbering all objects.
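
For the freehand simplification mentioned in the list above, here is a compact sketch of the textbook recursive Ramer-Douglas-Peucker algorithm. The function names and the `epsilon` tolerance (in canvas units) are illustrative choices; distance is measured to the infinite line through the endpoints, which is the common simple variant.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # a and b coincide
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def simplify(points: List[Point], epsilon: float) -> List[Point]:
    """Drop points that lie within epsilon of the simplified polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _line_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [first, last]  # everything in between is close enough
    # Keep the farthest point and simplify each half independently.
    left = simplify(points[: idx + 1], epsilon)
    right = simplify(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


stroke = [(0, 0), (1, 0.01), (2, -0.01), (3, 0), (3, 1), (3, 2)]
print(simplify(stroke, epsilon=0.1))
```

The recursion depth grows with the number of retained points in the worst case, so very long strokes may call for an iterative version with an explicit stack.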

Failure Modes

WebSocket server crashes mid-session

50 users lose their connection. Unsent operations may be lost. Users see "reconnecting..." spinner.

Mitigation: client buffers unacknowledged ops locally. On reconnect (possibly to a different server), client replays buffered ops. CRDT merge is idempotent — replaying is safe.

R-tree index becomes stale

Object moved but R-tree not updated. Viewport query returns wrong objects. User sees stale canvas or missing objects.

Mitigation: R-tree update is synchronous with CRDT merge — same transaction. Client requests full viewport refresh if it detects inconsistency (missing object referenced by another).

User reconnects after 2 days offline with 500 buffered ops

Board has diverged significantly. Merging 500 ops against a changed board could produce surprising results.

Mitigation: CRDT guarantees convergence — all ops merge deterministically. Show user a "changes merged" summary highlighting conflicts (e.g., "object you edited was deleted by another user").

Cursor storm: 100 users on same board

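100 cursors x 15 Hz = 1,500 cursor messages/sec arriving at the server. Broadcast naively to all 100 users, that becomes roughly 150,000 outbound messages/sec. The network saturates and real edits are delayed.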

Mitigation: viewport filtering (only send cursors in your viewport). Priority: cursor messages are lower priority than edit operations. Drop cursor messages under load; never drop edit ops.
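
One way to express "drop cursors, never drop edits" is a per-connection send queue with two lanes. The `OutboundQueue` class below is a hypothetical sketch; keeping only the latest position per user (coalescing) and the backlog limit are assumptions added here, not part of the mitigation as stated.

```python
from collections import deque


class OutboundQueue:
    """Per-connection send queue: edits are never dropped, cursors may be."""

    def __init__(self, max_cursor_backlog: int = 32):
        self.edits = deque()
        self.cursors = {}  # user_id -> latest position; older ones are superseded
        self.max_cursor_backlog = max_cursor_backlog

    def push_edit(self, op) -> None:
        self.edits.append(op)

    def push_cursor(self, user_id, pos) -> bool:
        """Queue a cursor update; returns False if it was dropped under load."""
        if user_id not in self.cursors and len(self.cursors) >= self.max_cursor_backlog:
            return False
        self.cursors[user_id] = pos  # overwrite any stale position
        return True

    def drain(self, budget: int):
        """Return up to `budget` messages for this send tick, edits first."""
        out = []
        while self.edits and len(out) < budget:
            out.append(("edit", self.edits.popleft()))
        while self.cursors and len(out) < budget:
            user_id = next(iter(self.cursors))
            out.append(("cursor", user_id, self.cursors.pop(user_id)))
        return out


q = OutboundQueue(max_cursor_backlog=2)
q.push_cursor("A", (1, 1))
q.push_cursor("A", (2, 2))   # replaces A's earlier position
q.push_edit("move obj_1")
print(q.drain(budget=2))
```

Because a newer cursor position makes the older one worthless, coalescing loses nothing the receiver would have drawn anyway.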

CRDT tombstone bloat

Board with 100K deleted objects — tombstones grow unbounded. Storage and sync overhead increases.

Mitigation: periodic compaction. Snapshot CRDT state, remove tombstones older than 30 days. New clients load from snapshot, not full history. Coordinate compaction with offline client grace period.
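
A rough sketch of the compaction pass, under stated assumptions: each object carries a hypothetical `deleted_at` wall-clock time (seconds) alongside its `deleted` flag, and a client that has been offline longer than the grace period is told to reload from the snapshot instead of delta-syncing. Both `compact` and `needs_full_reload` are illustrative names.

```python
GRACE_SECONDS = 30 * 24 * 3600  # 30-day tombstone retention


def compact(objects: dict, now: float, grace: float = GRACE_SECONDS) -> dict:
    """Return a snapshot without tombstones older than the grace period."""
    kept = {}
    for obj_id, obj in objects.items():
        if obj["deleted"] and now - obj["deleted_at"] > grace:
            continue  # old tombstone: safe to forget
        kept[obj_id] = obj
    return kept


def needs_full_reload(last_sync_at: float, now: float, grace: float = GRACE_SECONDS) -> bool:
    """A client offline longer than the grace period may reference purged
    tombstones, so it should load the snapshot rather than merge a delta."""
    return now - last_sync_at > grace


DAY = 24 * 3600
now = 100 * DAY
objects = {
    "a": {"deleted": False, "deleted_at": None},
    "b": {"deleted": True, "deleted_at": now - 31 * DAY},
    "c": {"deleted": True, "deleted_at": now - 5 * DAY},
}
print(sorted(compact(objects, now)))
```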

Interview Tips

Lead with CRDT, not OT. "Each canvas object is a CRDT register — concurrent edits merge without a central server." This immediately signals you know modern collaboration architecture.
Spatial indexing is the scalability key. "R-tree query returns only viewport objects" — this is what makes 100K-object boards feasible. Most candidates miss this entirely.
Per-user undo is non-obvious. "Undo applies the inverse operation as a new CRDT op" — this shows you've thought about multiplayer undo deeply, not just single-player Ctrl+Z.
Cursor optimization matters at scale. "Viewport-filtered, 15 Hz with client interpolation" — shows you understand the network cost of presence features.
Distinguish from Google Docs. Docs = linear text (sequence CRDT or OT). Whiteboard = spatial objects (map CRDT with LWW registers). Different data model, different indexing strategy.
Mention concrete libraries. "Yjs or Automerge for CRDT implementation; RBush for client-side R-tree; protobuf or CBOR for wire format." Naming real tools shows production awareness.
Address the "infinite" in infinite canvas. Coordinates are floating-point. There is no actual boundary. Zoom levels range from 10% (see the whole board) to 400% (pixel-level detail). At extreme zoom-out, the R-tree returns thousands of objects — client renders them as simplified thumbnails, not full detail.

Evolution

MVP — Single-user canvas with save/load

One user draws on canvas. State saved as JSON blob to server. No collaboration. Works for personal sketching.

Polling-based collaboration

Multiple users edit same board. Client polls server every 2 seconds for changes. Laggy but functional. Conflicts handled by last-write-wins at document level.

WebSocket + CRDT real-time sync

Sub-100ms collaboration via WebSocket. CRDT ensures conflict-free merges at field level. Cursors visible in real time. True multiplayer experience.

Spatial indexing + viewport loading

R-tree index enables boards with 100K+ objects. Only viewport objects sent to client. Pan/zoom triggers delta loading. Performance stays constant regardless of board size.

Offline support + per-user undo + compaction

Offline edits buffer and merge on reconnect. Per-user undo stacks. Periodic CRDT compaction removes ancient tombstones. Production-grade collaboration at scale.

AI-assisted + embedded content + version history

AI generates diagrams from text prompts, auto-layout for messy boards, smart grouping suggestions. Embed live data (Jira cards, Google Sheets). Full version history with named snapshots and branch/merge for board variants. Commenting and @mentions on specific objects.
