Database Replication: Leader-Follower, CAP Theorem & Consistency Strategies for High Availability

Replication is the canonical strategy for high availability: data is copied to multiple database servers. If the primary server fails, a replica can be promoted to take over. But replication introduces complexity. Changes propagate asynchronously—a replica may lag behind the leader by seconds or minutes. Reads from a replica may return stale data. The system must handle failures: what if a replica crashes during a write? What if the network partitions and the leader is isolated? These questions are addressed by the CAP theorem: in a distributed system, you can guarantee at most two of Consistency, Availability, and Partition tolerance. This guide covers replication topologies, failure modes, and strategies for building highly available systems.

Md Sanwar Hossain

Software Engineer · Database · Distributed Systems

Database · March 18, 2026 · 22 min read · Database Management Series
Database replication and high availability architecture

Replication Topologies

Leader-Follower (Master-Slave)

One leader node accepts all writes. Writes are replicated to one or more follower nodes (read replicas). Reads can be distributed across followers, scaling read throughput. This is the most common topology.

Leader (Primary) → Followers (Read Replicas)
      ↓ writes replicated asynchronously
      Follower 1
      Follower 2
      Follower 3
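The routing logic behind this topology can be sketched in a few lines. This is a minimal in-memory illustration, not a real database driver; the class and node representation are assumptions for the example:

```python
# Leader-follower routing sketch: all writes go to the leader, reads are
# spread round-robin across followers to scale read throughput.
from itertools import cycle

class ReplicatedStore:
    def __init__(self, leader, followers):
        self.leader = leader                    # dict standing in for the primary
        self.followers = followers              # list of dicts (read replicas)
        self._next_follower = cycle(followers)  # round-robin read balancing

    def write(self, key, value):
        self.leader[key] = value
        # Replication to followers; here applied immediately for simplicity,
        # in reality this happens asynchronously (see "Replication Lag" below).
        for f in self.followers:
            f[key] = value

    def read(self, key):
        # Reads never touch the leader, so read load scales with follower count.
        return next(self._next_follower).get(key)

store = ReplicatedStore(leader={}, followers=[{}, {}, {}])
store.write("balance", 100)
```

A real deployment would replace the dicts with database connections, but the split is the same: one write path, many read paths.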

Multi-Leader (Master-Master)

Multiple leaders accept writes concurrently. Changes are replicated between leaders. This allows writes in multiple data centers but introduces complexity: conflicting writes must be resolved.

Leaderless (Peer-to-Peer)

All replicas are equal; any can accept writes. Changes are replicated to other replicas asynchronously. Used in systems like Cassandra and Dynamo. Provides high availability but requires conflict resolution.
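Leaderless systems typically stay consistent via quorums: with N replicas, a write must be acknowledged by W nodes and a read must query R nodes. If W + R > N, the read and write sets overlap, so at least one queried replica holds the latest value. A sketch of that condition:

```python
# Dynamo-style quorum condition: overlap between the write set (W nodes)
# and the read set (R nodes) guarantees a read sees the newest write.
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    return w + r > n

# Common configuration: N=3, W=2, R=2 -> overlap guaranteed.
assert quorum_overlaps(3, 2, 2)
# N=3, W=1, R=1 favours latency and availability but permits stale reads.
assert not quorum_overlaps(3, 1, 1)
```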

Replication Lag and Its Implications

Replication is asynchronous: the leader commits a write before it is replicated to followers. In the window between the write and replication, the leader and followers are inconsistent.

t=0: Client writes "balance = 100" to leader
t=0: Leader commits; returns success to client
t=10ms: Replication message reaches follower
t=10ms: Follower applies the write

// Between t=0 and t=10ms, a read from the follower returns stale data

Reading Your Own Writes

After writing, a user reads their own data. If the read hits a replica that has not yet received the replication message, the user sees stale data. Mitigations:

- Route reads of data the user themselves may have modified (e.g. their own profile) to the leader.
- Track the timestamp of each user's last write and serve that user's reads from the leader for a short window afterwards.
- Have the client remember the log position of its last write and only query replicas that have applied at least that position.
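One common mitigation can be sketched as session-aware routing: for a short window after a user's write, send that user's reads to the leader. The 10-second window and class name below are illustrative choices, not a standard:

```python
# Read-your-own-writes routing sketch: recent writers read from the leader,
# everyone else reads from a (possibly lagging) replica.
import time

class SessionRouter:
    WINDOW = 10.0  # seconds; should exceed the worst expected replication lag

    def __init__(self):
        self.last_write_at = {}  # user_id -> timestamp of the user's last write

    def record_write(self, user_id):
        self.last_write_at[user_id] = time.monotonic()

    def pick_target(self, user_id):
        wrote = self.last_write_at.get(user_id)
        if wrote is not None and time.monotonic() - wrote < self.WINDOW:
            return "leader"    # replicas may still be stale for this user
        return "replica"

router = SessionRouter()
router.record_write("alice")
```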

The CAP Theorem: The Fundamental Trade-Off

Any distributed data store can guarantee at most two of the following three properties: Consistency (every read sees the most recent write), Availability (every request receives a non-error response), and Partition tolerance (the system keeps operating despite dropped or delayed network messages).

CA Systems (Consistency + Availability)

Example: Traditional relational databases with synchronous replication. If a partition occurs, the system halts to preserve consistency; availability is sacrificed. In practice, network partitions cannot be prevented, so true CA holds only within a single node or network; real distributed systems must effectively choose between CP and AP.

CP Systems (Consistency + Partition Tolerance)

Example: Leader-follower with synchronous replication. If a network partition isolates the leader, the system halts writes (sacrificing availability) to preserve consistency.

AP Systems (Availability + Partition Tolerance)

Example: Dynamo, Cassandra. If a partition occurs, requests are still served from whichever partition the client is in. Consistency is not guaranteed; the system uses eventual consistency.

Synchronous vs Asynchronous Replication

Synchronous (Strong Consistency)

The leader waits for acknowledgment from all (or a quorum of) followers before committing a write. With all-replica acknowledgment, a single slow or failed follower blocks every write; with a quorum, writes proceed as long as a majority acknowledges.

Client → Leader: Write "balance = 100"
Leader → Followers: Replicate
Followers → Leader: Acknowledge (all must ack)
Leader → Client: Write committed (all replicas have it)

Pros: Strong consistency. Cons: Slow (every write waits on the slowest acknowledging replica); writes become unavailable when too few replicas respond.
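The commit rule can be sketched as follows. Followers are simulated as dicts, with `None` standing in for a crashed node; the quorum size and error type are illustrative:

```python
# Synchronous replication sketch: the write commits only if a quorum of
# followers acknowledges it; otherwise it is aborted.
def replicate_sync(followers, key, value, quorum):
    acks = 0
    for f in followers:
        if f is None:          # crashed follower: no acknowledgment
            continue
        f[key] = value
        acks += 1
    if acks < quorum:
        raise RuntimeError("write aborted: quorum not reached")
    return acks

# Two healthy followers out of three: a quorum of 2 still commits.
replicate_sync([{}, None, {}], "balance", 100, quorum=2)
```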

Asynchronous (Eventual Consistency)

The leader commits immediately and replicates in the background. If the leader crashes before replication, data is lost.

Client → Leader: Write "balance = 100"
Leader → Client: Write committed (immediately)
Leader → Followers: Replicate (in background)

Pros: Fast, available even if replicas fail. Cons: Data loss if leader crashes, stale reads on replicas.

Handling Failures and Failover

Follower Failure

If a follower crashes, the leader continues. Clients reading from that follower are redirected to another replica. When the follower recovers, it re-syncs from the leader using write-ahead logs (or snapshots).

Leader Failure

A more serious scenario. One of the followers must be promoted to become the new leader. Questions arise:

- How is the leader's failure detected, and distinguished from a slow network?
- Which follower should be promoted, given that followers may have applied different amounts of the replication log?
- How do clients and the remaining followers learn about the new leader?
- What happens to writes the old leader committed but never replicated, and how do we prevent "split brain," where the old leader recovers and keeps accepting writes?

Mitigation: Quorum-based leader election

A replica is promoted to leader only if it has the latest writes from a quorum (majority) of replicas. This reduces the risk of data loss.
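The promotion rule can be sketched as below. The replica names and log indices are illustrative, and this deliberately omits the term/voting machinery of a full election protocol such as Raft:

```python
# Quorum-based promotion sketch: poll replicas for their last applied log
# position and promote the most up-to-date one, but only if a majority of
# the cluster is reachable.
def elect_leader(replicas):
    # replicas: dict of name -> last applied log index (None if unreachable)
    reachable = {name: idx for name, idx in replicas.items() if idx is not None}
    quorum = len(replicas) // 2 + 1
    if len(reachable) < quorum:
        raise RuntimeError("no quorum: cannot elect a leader safely")
    # Promote the replica with the highest log index to minimise data loss.
    return max(reachable, key=reachable.get)

# f3 is unreachable; f1 has the most complete log and is promoted.
elect_leader({"f1": 120, "f2": 118, "f3": None})
```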

Replication Lag Monitoring

-- PostgreSQL: time since the last replayed transaction on a replica
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

-- MySQL 8.0.22+ (formerly SHOW SLAVE STATUS / Seconds_Behind_Master)
SHOW REPLICA STATUS;  -- check Seconds_Behind_Source

-- Alert if lag > 1 minute (typical threshold)
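A simple alerting check around such a query might look like this; `fetch_lag` is a placeholder for whatever function runs the query and returns the lag, and the one-minute threshold mirrors the comment above:

```python
# Replication-lag alert sketch: compare the measured lag against a threshold.
from datetime import timedelta

def check_replication_lag(fetch_lag, threshold=timedelta(minutes=1)):
    """fetch_lag() runs a lag query (as above) and returns a timedelta."""
    lag = fetch_lag()
    if lag > threshold:
        return f"ALERT: replication lag {lag} exceeds {threshold}"
    return None

# Healthy replica: 5 seconds of lag, no alert.
check_replication_lag(lambda: timedelta(seconds=5))
```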

Multi-Region Replication and Consistency

A system serving users on multiple continents requires replicating data across regions. A user in Europe writes to the European leader; the write is then replicated to the US leader (across the Atlantic, ~100 ms of latency).

Conflict Resolution

If the European and US leaders both accept conflicting writes concurrently, the replicas diverge. Resolution strategies:

- Last-write-wins (LWW): attach a timestamp to each write and keep the latest. Simple, but concurrent updates can be silently discarded, and it depends on reasonably synchronized clocks.
- Application-level merge: surface both versions and merge them explicitly in the application (or ask the user), as Dynamo does with shopping carts.
- CRDTs (conflict-free replicated data types): counters, sets, and registers whose concurrent updates merge deterministically.
- Avoidance: route all writes for a given record to a single "home" leader so conflicts never arise.
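As one example, last-write-wins resolution can be sketched as follows; the values and Unix-epoch timestamps are illustrative:

```python
# Last-write-wins (LWW) sketch: each write carries a timestamp and the later
# one survives. Note the losing write is silently discarded, which is why
# CRDTs or application-level merges are often preferred.
def lww_merge(a, b):
    # a, b: (value, timestamp) pairs for the same key from two leaders
    return a if a[1] >= b[1] else b

eu_write = ("balance = 120", 1_700_000_010)
us_write = ("balance = 90",  1_700_000_005)
winner = lww_merge(eu_write, us_write)  # the EU write has the later timestamp
```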


When to Use Different Replication Strategies

- Leader-follower: the default for read-heavy workloads in a single region; simple, with no write conflicts.
- Multi-leader: multiple data centers that must each accept local writes, accepting the cost of conflict resolution.
- Leaderless: workloads that prioritize availability and write latency over strong consistency (Cassandra- and Dynamo-style stores).
- Synchronous replication: when losing a committed write is unacceptable (e.g. financial ledgers) and higher write latency is tolerable.

Replication is not optional in modern systems; it is the foundation of reliability and scale. Master these patterns, and you build systems that survive failures and scale globally.

Key Takeaways

- Leader-follower replication scales reads and enables failover; multi-leader and leaderless topologies add write availability at the cost of conflict resolution.
- Asynchronous replication is fast but creates a lag window with stale reads and possible data loss on leader failure; synchronous replication trades latency and availability for strong consistency.
- The CAP theorem forces a choice under network partitions: preserve consistency (CP) or availability (AP).
- Mitigate stale reads with read-your-own-writes routing, and monitor replication lag continuously.
- Failover needs care: promote a follower via quorum-based election to minimize data loss and avoid split brain.

Tags:

database replication leader-follower consistency models CAP theorem failover strategies high availability


Last updated: March 2026 — Written by Md Sanwar Hossain