Designing a Collaborative Editing System: Google Docs Architecture, OT vs CRDT & Real-Time Sync

Google Docs lets many users edit the same document simultaneously in real time. Every keystroke should appear on all collaborators' screens with low perceived latency and without conflicts, even when users are concurrently editing the same paragraph. This problem, collaborative editing, is one of the most subtle and mathematically rich challenges in distributed systems, and it is addressed by two competing paradigms: Operational Transformation (OT) and CRDTs.

Md Sanwar Hossain April 6, 2026 19 min read System Design

TL;DR — Core Architecture Decisions

"A collaborative editor needs: (1) a conflict resolution algorithm — either Operational Transformation (OT) as used by Google Docs, or CRDTs (Logoot/LSEQ/Yjs) as used by Figma/Notion — to merge concurrent edits correctly, (2) a WebSocket server (or WebRTC for P2P) to propagate operations in real time, (3) a persistent operation log in the database for document reconstruction and undo history, (4) snapshot checkpointing to avoid replaying the full operation history on load, and (5) presence service for cursor/selection awareness with peer-to-peer efficiency."

The Core Problem: Concurrent Edit Conflicts
Operational Transformation (OT): How Google Docs Works
CRDTs: Conflict-Free Replicated Data Types
OT vs CRDT: When to Choose Which
System Architecture & Storage Design
Real-Time Sync Protocol & WebSocket Architecture
Cursor & Presence Awareness
Offline Editing & Reconnection
Undo/Redo in Collaborative Context
Snapshot Checkpointing & Persistence
Scale & Conclusion

1. The Core Problem: Concurrent Edit Conflicts

Imagine Alice and Bob are both editing the same document: "Hello". Alice inserts "World" at position 5 (after "Hello"), and simultaneously Bob deletes position 0 (the "H"). If both operations are applied naively:

Alice's intended result: "HelloWorld"
Bob's intended result: "ello"
If Bob's delete is applied first, then Alice's insert at position 5 hits "ello" (only 4 chars) → crash or wrong result.
If Alice's insert is applied first, then Bob's delete at position 0 deletes "H" from "HelloWorld" → "elloWorld" — correct! But only by luck.

This is the concurrent edit conflict problem. Operations that were valid when issued become invalid or produce wrong results when applied after other operations. The solution: transform operations before applying them to account for concurrently-applied operations — this is the essence of Operational Transformation.
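
The two orderings above can be reproduced in a few lines. This is a minimal sketch, assuming a plain Python string as the document; the `insert` and `delete` helpers are hypothetical names introduced only for this illustration.

```python
def insert(doc: str, pos: int, text: str) -> str:
    """Insert text at pos, rejecting positions outside the document."""
    if not 0 <= pos <= len(doc):
        raise IndexError(f"insert at {pos} is outside a document of length {len(doc)}")
    return doc[:pos] + text + doc[pos:]


def delete(doc: str, pos: int) -> str:
    """Delete the single character at pos."""
    if not 0 <= pos < len(doc):
        raise IndexError(f"delete at {pos} is outside a document of length {len(doc)}")
    return doc[:pos] + doc[pos + 1:]


base = "Hello"

# Order 1: Alice's insert first, then Bob's delete. Works, but only by luck.
alice_first = delete(insert(base, 5, "World"), 0)

# Order 2: Bob's delete first, then Alice's untransformed insert.
# Position 5 no longer exists in "ello", so the naive apply fails.
try:
    bob_first = insert(delete(base, 0), 5, "World")
except IndexError as exc:
    bob_first = None
    failure = str(exc)

print(alice_first)  # elloWorld
print(bob_first)    # None
```

A lenient editor that clamps the position instead of raising would hide the failure here, but the same stale index would silently land text in the wrong place for an insert in the middle of the document.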

Figure: Collaborative editing platform architecture: OT/CRDT engine, operation log, WebSocket gateway, snapshot store, and presence service. Source: mdsanwarhossain.me

2. Operational Transformation (OT): How Google Docs Works

Operational Transformation was introduced in 1989 (Ellis and Gibbs). Google Docs uses OT, as did Google Wave, whose protocol inspired the open-source ShareJS library. The core idea: before applying a remote operation, transform it against every already-applied operation it was concurrent with, adjusting its positions to account for their effects.

OT Transform Function

The transform function T(op1, op2) takes two concurrent operations and returns a transformed version of op1 that can be applied after op2 has already been applied:

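# Simplified OT transform for text Insert/Delete
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Insert:
    text: str
    position: int
    site: str = ""  # author id, used only to break ties at equal positions

@dataclass(frozen=True)
class Delete:
    position: int  # removes the single character at this index

def transform_insert_insert(op1, op2):
    # op1 = Insert("X", position=5)
    # op2 = Insert("Y", position=3)  [applied before op1]
    # op2 inserted before op1's position, so shift op1 right by op2's length.
    # On an exact tie only one side may shift, otherwise replicas diverge;
    # here the op from the smaller site id keeps the left slot.
    tie_lost = op2.position == op1.position and op2.site < op1.site
    if op2.position < op1.position or tie_lost:
        return replace(op1, position=op1.position + len(op2.text))
    return op1  # op2 is after op1 (or op1 won the tie), no shift needed

def transform_insert_delete(op1, op2):
    # op1 = Insert("X", position=5)
    # op2 = Delete(position=3)  [applied before op1]
    # op2 deleted a character before op1's position, so shift op1 left.
    if op2.position < op1.position:
        return replace(op1, position=op1.position - 1)
    return op1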
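
To see the transform resolve the Alice and Bob example from section 1, here is a self-contained sketch. The `Ins`, `Del`, `apply`, and `transform` names are hypothetical and cover only the insert/delete pairs needed for this example; ties between two inserts at the same position and two deletes of the same character are deliberately out of scope.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ins:
    pos: int
    text: str


@dataclass(frozen=True)
class Del:
    pos: int  # removes the single character at this index


def apply(doc, op):
    """Apply one operation to a string document."""
    if isinstance(op, Ins):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op, applied):
    """Rewrite `op` so it can run after the concurrent operation `applied`."""
    if isinstance(applied, Ins):
        # A delete shifts when the insert lands at or before its target;
        # an insert shifts only when the other insert is strictly before it.
        shifts = applied.pos <= op.pos if isinstance(op, Del) else applied.pos < op.pos
        return replace(op, pos=op.pos + len(applied.text)) if shifts else op
    # `applied` is a delete: anything strictly after it moves one step left.
    if applied.pos < op.pos:
        return replace(op, pos=op.pos - 1)
    return op


alice = Ins(5, "World")
bob = Del(0)

# Replica A applies Alice's op first, then Bob's op transformed against it.
site_a = apply(apply("Hello", alice), transform(bob, alice))
# Replica B applies Bob's op first, then Alice's op transformed against it.
site_b = apply(apply("Hello", bob), transform(alice, bob))

print(site_a, site_b)  # elloWorld elloWorld
```

Both replicas end at the same text regardless of arrival order, which is the convergence property the transform function exists to provide.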

Server-Mediated OT Architecture

Fully decentralized (peer-to-peer) OT is hard to get right: without a central ordering, the transform functions must satisfy an additional convergence property (often called TP2), and several published algorithms were later shown to violate it. Production systems therefore use a server-mediated approach in which the server is the single point that orders operations. Each operation carries a revision number indicating the document state it was based on. When the server, currently at revision R, receives an operation that the client created at revision C (C < R), it transforms the operation against operations C+1 through R before applying and broadcasting it. A client with no unacknowledged local edits can apply a broadcast operation directly; a client with pending edits first transforms the incoming operation against them.
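
The revision-based catch-up can be sketched as follows. This is an illustration under simplifying assumptions, not Google's implementation: operations are insert-only, the `DocumentServer` class and its `receive` method are hypothetical names, and ties go to whichever operation the server ordered first.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def transform(op: Insert, applied: Insert) -> Insert:
    """Shift `op` right if the already-ordered insert landed at or before it."""
    if applied.pos <= op.pos:
        return replace(op, pos=op.pos + len(applied.text))
    return op


@dataclass
class DocumentServer:
    text: str = ""
    log: list = field(default_factory=list)  # log[i] produced revision i + 1

    @property
    def revision(self) -> int:
        return len(self.log)

    def receive(self, op: Insert, client_revision: int):
        """Order one client op: transform past missed ops, apply, log it."""
        if not 0 <= client_revision <= self.revision:
            raise ValueError("client revision is ahead of the server")
        # The client had not seen revisions client_revision+1 .. current.
        for applied in self.log[client_revision:]:
            op = transform(op, applied)
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.log.append(op)
        return self.revision, op  # what would be broadcast to clients


server = DocumentServer("Hello")
# Alice and Bob both edit the revision-0 text "Hello" concurrently.
rev_a, op_a = server.receive(Insert(5, " World"), client_revision=0)
rev_b, op_b = server.receive(Insert(5, "!"), client_revision=0)

print(server.text)  # Hello World!
print(rev_b, op_b)  # 2 Insert(pos=11, text='!')
```

Bob's insert was written against revision 0, so the server shifts it past Alice's six characters before assigning it revision 2.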

3. CRDTs: Conflict-Free Replicated Data Types

CRDTs (Conflict-Free Replicated Data Types) take a different mathematical approach: design the data structure such that concurrent operations always commute — applying them in any order always produces the same result. No transformation is needed.

CRDT for Text: The Tombstone Approach

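The key insight in CRDT text types (Logoot, LSEQ, RGA, Yjs) is to give each character a globally unique, immutable identifier instead of addressing it by integer index. Logoot and LSEQ allocate dense position identifiers, so a new character gets an identifier that sorts between those of its two neighbors. RGA and Yjs instead record which existing character the new one was inserted after, and mark deletions as tombstones (deleted but retained in the structure). In the RGA-style model used below, when Alice inserts "W" at the end of "Hello" she creates a new ID and anchors it to the ID of "o"; when Bob deletes "H", he marks that element as a tombstone, so Alice's anchor stays valid.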

# CRDT text representation (simplified RGA/Yjs model)
# Each id is a (counter, site) pair; deletions are tombstones, not removals.
document = [
    {"id": (1, "alice"), "char": "H", "deleted": False},
    {"id": (2, "alice"), "char": "e", "deleted": False},
    {"id": (3, "alice"), "char": "l", "deleted": False},
    {"id": (4, "alice"), "char": "l", "deleted": False},
    {"id": (5, "alice"), "char": "o", "deleted": False},
]

# Alice inserts "W" after id=(5, "alice"):
alice_op = {"type": "insert", "after": (5, "alice"), "id": (6, "alice"), "char": "W"}

# Bob deletes id=(1, "alice") (the "H"):
bob_op = {"type": "delete", "id": (1, "alice")}

# Both ops can be applied in ANY order and produce an identical result:
# [{H, deleted: True}, e, l, l, o, W] → visible: "elloW"
# Alice's anchor (5, "alice") is NEVER invalidated by Bob's delete,
# because the tombstoned "H" stays in the structure.
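
A small apply function makes the commutativity claim checkable. This is a sketch in the spirit of RGA rather than a faithful implementation of RGA or Yjs: the `apply` and `visible` helpers are hypothetical, IDs are assumed to behave like Lamport timestamps, and operations are assumed to arrive after the elements they reference.

```python
def apply(doc, op):
    """Apply one insert or delete to a list of elements, returning a new list."""
    doc = [dict(el) for el in doc]  # copy so the caller's replica is untouched
    if op["type"] == "delete":
        for el in doc:
            if el["id"] == op["id"]:
                el["deleted"] = True  # tombstone: keep the element, hide it
        return doc
    # Insert: start just after the anchor element...
    idx = next(i for i, el in enumerate(doc) if el["id"] == op["after"]) + 1
    # ...then skip concurrent siblings with a larger id, so every replica
    # places same-anchor inserts in the same order (RGA-style tie-break).
    while idx < len(doc) and doc[idx]["id"] > op["id"]:
        idx += 1
    doc.insert(idx, {"id": op["id"], "char": op["char"], "deleted": False})
    return doc


def visible(doc):
    """Render the text a user sees, skipping tombstones."""
    return "".join(el["char"] for el in doc if not el["deleted"])


document = [
    {"id": (n, "alice"), "char": ch, "deleted": False}
    for n, ch in enumerate("Hello", start=1)
]
alice_op = {"type": "insert", "after": (5, "alice"), "id": (6, "alice"), "char": "W"}
bob_op = {"type": "delete", "id": (1, "alice")}

order_1 = apply(apply(document, alice_op), bob_op)
order_2 = apply(apply(document, bob_op), alice_op)

print(visible(order_1), visible(order_2))  # elloW elloW
```

The two replicas hold identical element lists, tombstone included, which is why no transformation step is needed.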

Figure: CRDT-based collaborative sync: operations commute in any order; tombstones enable offline merges without conflicts. Source: mdsanwarhossain.me

4. OT vs CRDT: When to Choose Which

Dimension	Operational Transformation	CRDT
Server requirement	Required (for ordering)	Optional (P2P possible)
Offline support	Complex (buffer + sync on reconnect)	Natural (ops merge on any reconnect)
Memory overhead	Low (no tombstones)	Higher (tombstones grow over time)
Algorithm complexity	High (correct OT is notoriously hard)	Moderate (well-specified math)
Ordering intent preservation	High (server imposes total order)	Lower (concurrent inserts may interleave)
Used by	Google Docs, Etherpad	Apple Notes, Figma (CRDT-inspired, server-ordered), apps built on Yjs or Automerge

Verdict: For centralized server architectures with a reliable network, OT keeps a single authoritative order and has lower storage overhead. For distributed, offline-first, or P2P architectures, CRDTs are the better fit. Yjs (a CRDT library) has become a popular choice for new collaborative applications in recent years due to its performance, offline support, and rich ecosystem.

5. System Architecture & Storage Design

A production collaborative editor requires several storage layers:

Data	Storage	Purpose
Document snapshots	PostgreSQL / S3 (large docs)	Fast document load without replaying all ops
Operation log	PostgreSQL (append-only)	Undo history, audit log, catch-up for reconnects
Active session state	Redis	Current document state, pending ops, presence info
Presence / cursor data	Redis Pub/Sub	Ephemeral; expires on disconnect
Document metadata (title, ACL)	PostgreSQL	Ownership, sharing permissions, version history

6. Real-Time Sync Protocol & WebSocket Architecture

Each open document is owned by one document server process, and every client editing it holds a WebSocket connection to that process (directly or through a gateway). The protocol:

Client connects: Opens WebSocket to document server. Server sends current document snapshot + revision number. Client renders document and subscribes to the operation stream.
Client edits: User types "X" at position 12. Client immediately applies the operation locally (optimistic UI — the edit appears instantly). Client sends operation to server with the last-known revision: {op: Insert("X", 12), clientRevision: 47}.
Server processes: Server receives operation. If clientRevision == serverRevision, no conflict: apply directly. If clientRevision < serverRevision, transform the operation against operations [clientRevision+1, serverRevision], then apply. Assign the operation the next revision (48 when the server was at 47), acknowledge it to the sender, and broadcast it to the other connected clients.
Clients receive broadcast: Each client applies the operation to their local document state, adjusting for any pending local operations via OT/CRDT.
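
The messages exchanged in these steps could be shaped roughly like this. Every field name here (`doc_id`, `client_revision`, `kind`, and so on) is an assumption made for illustration; the article only specifies that a client operation carries its last-known revision and that the server assigns a new one.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ClientOp:
    """Sent by a client right after it applies an edit locally."""
    doc_id: str
    client_id: str
    client_revision: int  # last server revision this client had applied
    kind: str             # "insert" or "delete"
    pos: int
    text: str = ""


@dataclass(frozen=True)
class ServerBroadcast:
    """Sent by the server once an op has been transformed and ordered."""
    doc_id: str
    revision: int         # revision assigned by the server
    author: str
    kind: str
    pos: int              # position after server-side transformation
    text: str = ""


MESSAGE_TYPES = {cls.__name__: cls for cls in (ClientOp, ServerBroadcast)}


def encode(msg) -> str:
    """Serialize a message as JSON text suitable for a WebSocket frame."""
    return json.dumps({"type": type(msg).__name__, **asdict(msg)})


def decode(raw: str):
    """Parse a JSON frame back into the matching message class."""
    data = json.loads(raw)
    return MESSAGE_TYPES[data.pop("type")](**data)


outgoing = ClientOp("doc-1", "alice", client_revision=47, kind="insert", pos=12, text="X")
frame = encode(outgoing)
broadcast = ServerBroadcast("doc-1", revision=48, author="alice", kind="insert", pos=12, text="X")

print(decode(frame) == outgoing)  # True
```

Keeping the client's base revision separate from the server-assigned revision is what lets the server decide which logged operations the incoming one must be transformed against.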

7. Cursor & Presence Awareness

Seeing other collaborators' cursors move in real time is a key part of the Google Docs experience. Cursor positions must be transmitted without interfering with document operations and must be adjusted when document operations shift text positions.

Cursor as an Operation

Cursor position is broadcast as a lightweight ephemeral event (not persisted). When Alice's cursor is at position 23 and Bob inserts 5 characters at position 10, Alice's cursor must shift to position 28 — exactly the same transformation logic applied to edits. The presence service sends cursor updates at most 30 times per second (throttled), transmitted via the same WebSocket channel. Cursor events are separated from document operations in the protocol and do not get an operation revision — they are purely ephemeral and can be dropped without affecting document correctness.
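
The cursor adjustment described above is a one-function transform. The `transform_cursor` helper below is a hypothetical sketch; whether an insert exactly at the cursor pushes the cursor right is a policy choice, and this version assumes it does.

```python
def transform_cursor(cursor: int, kind: str, pos: int, length: int) -> int:
    """Return the cursor index after a remote edit of `length` chars at `pos`."""
    if kind == "insert":
        # Text inserted at or before the cursor pushes it right.
        return cursor + length if pos <= cursor else cursor
    if kind == "delete":
        if pos >= cursor:
            return cursor  # the deletion is entirely after the cursor
        # Deletion starts before the cursor: pull it left, but never past the
        # start of the deleted range (the cursor may have been inside it).
        return max(pos, cursor - length)
    raise ValueError(f"unknown operation kind: {kind}")


# Alice's cursor is at 23; Bob inserts 5 characters at position 10.
print(transform_cursor(23, "insert", 10, 5))  # 28
# Bob deletes 5 characters starting at 20, which swallows the cursor.
print(transform_cursor(23, "delete", 20, 5))  # 20
```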

8. Offline Editing & Reconnection

Google Docs supports offline editing. When the network drops, the client continues accepting edits and buffers operations locally in IndexedDB (the browser's local database). When connectivity is restored:

Client reconnects to document server and reports its last-applied revision (e.g., revision 50).
Server sends all operations from revision 51 to current (e.g., operations 51–75 applied by other collaborators during the outage).
Client transforms its buffered local operations against the received server operations and sends them to the server.
Server integrates the client's operations via the same OT/CRDT pipeline.

For CRDT-based systems, this is even simpler: all buffered ops are simply merged with server state — no transformation needed since CRDT operations commute.
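
For the OT path, the reconnection steps amount to rebasing the offline buffer over the operations the client missed. The sketch below assumes insert-only operations and a rule that the server's operation wins position ties; `rebase`, `shift_local`, and `shift_server` are hypothetical names.

```python
from typing import NamedTuple


class Insert(NamedTuple):
    pos: int
    text: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def shift_local(local: Insert, server: Insert) -> Insert:
    """Transform a buffered op past a server op (server wins ties)."""
    if server.pos <= local.pos:
        return Insert(local.pos + len(server.text), local.text)
    return local


def shift_server(server: Insert, local: Insert) -> Insert:
    """Transform a server op past a buffered op, for the local replica."""
    if local.pos < server.pos:
        return Insert(server.pos + len(local.text), server.text)
    return server


def rebase(buffered, missed):
    """Return (ops to send to the server, ops to apply to the local document)."""
    pending, incoming = list(buffered), []
    for server_op in missed:
        rebased = []
        for local_op in pending:
            rebased.append(shift_local(local_op, server_op))
            server_op = shift_server(server_op, local_op)
        pending = rebased
        incoming.append(server_op)
    return pending, incoming


# Both sides started from "abc" at the client's last-applied revision.
buffered = [Insert(3, "X"), Insert(4, "Y")]  # offline edits: "abcXY"
missed = [Insert(0, "1"), Insert(4, "2")]    # server edits:  "1abc2"
local_doc, server_doc = "abcXY", "1abc2"

to_send, to_apply = rebase(buffered, missed)
for op in to_apply:
    local_doc = apply(local_doc, op)
for op in to_send:
    server_doc = apply(server_doc, op)

print(local_doc, server_doc)  # 1abc2XY 1abc2XY
```

Each missed server operation is walked through the whole buffer, so the buffer stays valid against the server state seen so far while the server operation becomes valid against the client's local text.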

9. Undo/Redo in Collaborative Context

Undo in a collaborative editor is conceptually harder than undo in a single-user editor. If Alice types "Hello", Bob then types "World", and Alice presses Undo, what should happen? A global undo would revert the most recent operation in the document, which is Bob's "World" and not anything Alice did. Rolling the document back to the state before Alice typed would discard Bob's work as well. What Alice expects is the document without her "Hello" but with Bob's "World" preserved.

The widely accepted answer is selective undo: undo only the requesting user's own operations, leaving other users' operations intact. This is implemented by inverting the user's operation (Insert(X, pos) → Delete(X, pos)) and transforming the inverse against all subsequent operations by any user. The transformed inverse is then applied as a new operation, which is itself broadcast and logged.
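
A minimal sketch of that invert-then-transform step, assuming string documents and hypothetical `Insert`/`Delete` types. It handles only later inserts that fall outside the text being undone; an insert inside that range would require splitting the delete, which this example does not attempt.

```python
from typing import NamedTuple


class Insert(NamedTuple):
    pos: int
    text: str


class Delete(NamedTuple):
    pos: int
    length: int


def apply(doc: str, op) -> str:
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]


def invert(op: Insert) -> Delete:
    """The inverse of inserting some text is deleting that same range."""
    return Delete(op.pos, len(op.text))


def transform_delete(op: Delete, later: Insert) -> Delete:
    """Move an inverse delete past an insert that was applied after the original op."""
    if later.pos <= op.pos:
        return Delete(op.pos + len(later.text), op.length)
    if later.pos < op.pos + op.length:
        raise NotImplementedError("insert inside the range being undone")
    return op


alice_op = Insert(0, "Hello")
bob_op = Insert(0, "World ")  # applied after Alice's op
doc = apply(apply("", alice_op), bob_op)  # "World Hello"

undo = invert(alice_op)  # Delete(pos=0, length=5), valid right after Alice's op
for later in [bob_op]:   # every op logged after Alice's
    undo = transform_delete(undo, later)

print(undo)              # Delete(pos=6, length=5)
print(apply(doc, undo))  # prints "World " (Bob's text survives)
```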

10. Snapshot Checkpointing & Persistence

A heavily edited document might accumulate millions of operations over its lifetime. Loading the document by replaying all operations from scratch would take seconds or minutes. The solution: periodic snapshot checkpointing.

Every N operations (e.g., N=100) or every T minutes, the document server writes the current full document state as a snapshot to PostgreSQL/S3.
The snapshot includes a revision number (e.g., snapshot at revision 5000).
On document load, the server fetches the latest snapshot (O(1)) and then replays only the operations since that snapshot (typically <100 operations).
Old snapshots and operations before the snapshot can eventually be deleted (with user-configurable retention for "version history").
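
The load path can be sketched with in-memory stand-ins for the snapshot table and operation log. `DocumentStore`, `SNAPSHOT_EVERY`, and the insert-only `(pos, text)` log entries are assumptions for illustration; a real system would back these with the PostgreSQL/S3 storage described above.

```python
from dataclasses import dataclass, field

SNAPSHOT_EVERY = 100  # the N from the text; an illustrative value


@dataclass
class DocumentStore:
    # revision -> full document text; revision 0 is the empty document
    snapshots: dict = field(default_factory=lambda: {0: ""})
    # log[i] is the (pos, text) insert that produced revision i + 1
    log: list = field(default_factory=list)

    def append(self, pos: int, text: str, current: str) -> str:
        """Log one insert and checkpoint the full text every N revisions."""
        updated = current[:pos] + text + current[pos:]
        self.log.append((pos, text))
        revision = len(self.log)
        if revision % SNAPSHOT_EVERY == 0:
            self.snapshots[revision] = updated
        return updated

    def load(self):
        """Return (text, revision, ops_replayed) from the latest snapshot."""
        base_revision = max(self.snapshots)
        text = self.snapshots[base_revision]
        tail = self.log[base_revision:]  # only ops newer than the snapshot
        for pos, inserted in tail:
            text = text[:pos] + inserted + text[pos:]
        return text, len(self.log), len(tail)


store = DocumentStore()
text = ""
for _ in range(250):
    text = store.append(len(text), "x", text)

loaded, revision, replayed = store.load()
print(revision, replayed)  # 250 50
```

With snapshots at revisions 100 and 200, loading revision 250 replays 50 operations instead of 250, and the replay cost stays bounded by N no matter how long the log grows.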

11. Scale & Conclusion

Google Docs Scale Estimation

Assumed scale (illustrative, not published figures): 1 billion+ documents; millions of concurrent editing sessions
Peak concurrent editors per document: typically 1–20; outliers (shared class docs) 100–1000
Operation rate: ~5 operations/second per active editor × 5 editors = 25 ops/sec per document
1 million active document sessions × 25 ops/sec = 25 million operations/sec peak
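Each operation: ~50 bytes → 1.25 GB/sec of inbound operation traffic at peak; fanning each operation out to the other 4 editors gives roughly 5 GB/sec of broadcast bandwidth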
Document servers: stateful (hold in-memory document state); shard by documentId
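
The arithmetic behind these estimates, written out so the inputs are easy to change. All inputs are the assumed figures from the list above, and the fan-out line assumes each operation is sent to every editor except its author.

```python
ops_per_editor_per_sec = 5
editors_per_doc = 5
active_sessions = 1_000_000
bytes_per_op = 50

ops_per_doc = ops_per_editor_per_sec * editors_per_doc  # 25 ops/sec
total_ops = ops_per_doc * active_sessions               # 25,000,000 ops/sec

# Inbound: each op reaches the server once.
ingest_bytes = total_ops * bytes_per_op                 # 1.25e9 bytes/sec
# Outbound: each op is fanned out to the other editors of its document.
egress_bytes = ingest_bytes * (editors_per_doc - 1)     # 5.0e9 bytes/sec

print(f"{total_ops:,} ops/sec")
print(f"{ingest_bytes / 1e9:.2f} GB/sec in, {egress_bytes / 1e9:.2f} GB/sec out")
```

Outbound traffic scales with the number of editors per document, so the outlier documents with hundreds of editors dominate broadcast cost well before they dominate operation count.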

Collaborative editing is one of the most intellectually demanding distributed systems problems. The combination of concurrent state management (OT/CRDT), real-time networking (WebSocket), offline-first architecture, and user experience constraints (sub-100ms perceived latency for typing) makes it uniquely challenging. Many newer systems have adopted CRDT-based approaches, and Yjs in particular has become a widely used library for them, while Google Docs continues to use its battle-tested OT implementation. Understanding both paradigms gives you deep insight into distributed state management that applies far beyond editors.
