Database Replication: Leader-Follower, CAP Theorem & Consistency Strategies for High Availability
Replication is the canonical strategy for high availability: data is copied to multiple database servers, and if the primary fails, a replica can be promoted to take over. But replication introduces complexity. Changes typically propagate asynchronously, so a replica may lag behind the leader by seconds or minutes, and reads from a replica may return stale data. The system must also handle failures: what if a replica crashes during a write? What if the network partitions and the leader is isolated? These trade-offs are formalized by the CAP theorem: a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance, and since partitions cannot be prevented in practice, the real choice during a partition is between consistency and availability. This guide covers replication topologies, failure modes, and strategies for building highly available systems.
Software Engineer · Database · Distributed Systems
Replication Topologies
Leader-Follower (Master-Slave)
One leader node accepts all writes. Writes are replicated to one or more follower nodes (read replicas). Reads can be distributed across followers, scaling read throughput. This is the most common topology.
Leader (Primary) → Followers (Read Replicas)
↓ writes replicated asynchronously
Follower 1
Follower 2
Follower 3
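The routing rule above can be sketched in a few lines. This is a minimal illustration, not a real driver API; the `LeaderFollowerRouter` class and node names are hypothetical:

```python
import itertools

class LeaderFollowerRouter:
    """Routes writes to the single leader and spreads reads across
    followers round-robin, scaling read throughput."""

    def __init__(self, leader, followers):
        self.leader = leader
        self._followers = itertools.cycle(followers)

    def route_write(self):
        # All writes must go through the leader.
        return self.leader

    def route_read(self):
        # Reads are distributed across the read replicas.
        return next(self._followers)

router = LeaderFollowerRouter("leader-1", ["follower-1", "follower-2", "follower-3"])
print(router.route_write())  # leader-1
print(router.route_read())   # follower-1
print(router.route_read())   # follower-2
```

Real connection pools (e.g., a proxy like PgBouncer or an ORM's read-replica support) add health checks and load-aware balancing, but the split is the same: writes to one node, reads to many.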
Multi-Leader (Master-Master)
Multiple leaders accept writes concurrently. Changes are replicated between leaders. This allows writes in multiple data centers but introduces complexity: conflicting writes must be resolved.
Leaderless (Peer-to-Peer)
All replicas are equal; any can accept writes. Changes are replicated to other replicas asynchronously. Used in systems like Cassandra and Dynamo. Provides high availability but requires conflict resolution.
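Dynamo-style systems keep such leaderless clusters readable by tuning quorums: with N replicas, a write acknowledged by W nodes and a read querying R nodes are guaranteed to overlap in at least one up-to-date node when R + W > N. A tiny sketch of that condition (parameter names are the conventional N/W/R, not any particular product's API):

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """Dynamo-style quorum check: a read of R nodes is guaranteed to
    reach at least one of the W nodes that acknowledged the latest
    write iff R + W > N."""
    return r + w > n

# N=3 with W=2, R=2: every read overlaps the write set.
print(quorum_overlaps(3, 2, 2))  # True
# W=1, R=1 favors latency and availability but permits stale reads.
print(quorum_overlaps(3, 1, 1))  # False
```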
Replication Lag and Its Implications
Replication is asynchronous: the leader commits a write before it is replicated to followers. In the window between the write and replication, the leader and followers are inconsistent.
t=0: Client writes "balance = 100" to leader
t=0: Leader commits; returns success to client
t=10ms: Replication message reaches follower
t=10ms: Follower applies the write
// Between t=0 and t=10ms, a read from the follower returns stale data
Reading Your Own Writes
After writing, a user reads their own data. If the read hits a replica that has not yet received the replication message, the user sees stale data. Mitigation:
- Read-your-own-writes consistency: Route reads after writes to the leader for a brief window (e.g., 1 minute).
- Version numbers: Return the write timestamp to the client; if a read's timestamp is older, re-read from the leader.
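The version-number mitigation reduces to a simple comparison at read time. A minimal sketch, assuming timestamps are monotonically increasing write versions (the function and parameter names are illustrative, not a library API):

```python
def choose_read_node(last_write_ts: int, replica_applied_ts: int) -> str:
    """Version-based read-your-own-writes: if the replica has already
    applied a write at least as new as the client's last write, the
    replica is safe to read; otherwise fall back to the leader."""
    if replica_applied_ts >= last_write_ts:
        return "replica"
    return "leader"

# Replica has caught up past the client's write: safe to read from it.
print(choose_read_node(last_write_ts=100, replica_applied_ts=105))  # replica
# Replica is still behind the client's write: read from the leader.
print(choose_read_node(last_write_ts=100, replica_applied_ts=90))   # leader
```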
The CAP Theorem: The Fundamental Trade-Off
Any distributed data store can guarantee at most two of:
- Consistency: All clients see the same data.
- Availability: Every request returns a response (not an error).
- Partition tolerance: The system continues operating even if the network is partitioned.
CA Systems (Consistency + Availability)
Example: A traditional relational database on a single node, or a tightly coupled cluster where partitions are assumed never to happen. CA is only meaningful while that assumption holds; the moment the network can partition, the system is forced to give up either consistency or availability, which is why purely CA distributed systems do not exist in practice.
CP Systems (Consistency + Partition Tolerance)
Example: Leader-follower with synchronous replication. If a network partition isolates the leader, the system halts writes (sacrificing availability) to preserve consistency.
AP Systems (Availability + Partition Tolerance)
Example: Dynamo, Cassandra. If a partition occurs, requests are still served from whichever partition the client is in. Consistency is not guaranteed; the system uses eventual consistency.
Synchronous vs Asynchronous Replication
Synchronous (Strong Consistency)
The leader waits for acknowledgment from all (or a quorum of) followers before committing a write. If any follower fails, the write blocks.
Client → Leader: Write "balance = 100"
Leader → Followers: Replicate
Followers → Leader: Acknowledge (all must ack)
Leader → Client: Write committed (all replicas have it)
Pros: Strong consistency. Cons: Slow (every write waits on the slowest required replica); with all-ack replication, a single failed follower blocks all writes. Quorum acknowledgment tolerates minority failures at the cost of lagging replicas in the minority.
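The leader's commit rule under the two acknowledgment modes can be sketched as follows. This is an illustrative decision function, not any database's actual protocol; `acks` counts follower acknowledgments received so far:

```python
def commit_decision(acks: int, total_followers: int, mode: str) -> bool:
    """Decide whether the leader may commit a write.
    'all'    : every follower must acknowledge (blocks on any failure).
    'quorum' : a majority of the whole cluster (followers + leader)
               must have the write; the leader counts as one copy."""
    if mode == "all":
        return acks == total_followers
    if mode == "quorum":
        cluster_size = total_followers + 1   # include the leader itself
        majority = cluster_size // 2 + 1
        return acks + 1 >= majority          # leader's own copy counts
    raise ValueError(f"unknown mode: {mode}")

# 3 followers, 2 acks: 'all' still blocks, but a majority (3 of 4) is met.
print(commit_decision(2, 3, "all"))     # False
print(commit_decision(2, 3, "quorum"))  # True
```

This is why quorum-synchronous setups (e.g., PostgreSQL's `synchronous_standby_names` with `ANY n`) stay writable through minority failures while all-ack setups do not.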
Asynchronous (Eventual Consistency)
The leader commits immediately and replicates in the background. If the leader crashes before replication, data is lost.
Client → Leader: Write "balance = 100"
Leader → Client: Write committed (immediately)
Leader → Followers: Replicate (in background)
Pros: Fast, available even if replicas fail. Cons: Data loss if leader crashes, stale reads on replicas.
Handling Failures and Failover
Follower Failure
If a follower crashes, the leader continues. Clients reading from that follower are redirected to another replica. When the follower recovers, it re-syncs from the leader using write-ahead logs (or snapshots).
Leader Failure
A more serious scenario. One of the followers must be promoted to become the new leader. Questions arise:
- How long to wait before declaring the leader dead? Wait too long and unavailability grows; declare failure too quickly and you risk two simultaneous leaders (split-brain).
- Which follower to promote? Ideally, the one that is most up-to-date. But how do you know? The network may be partitioned.
- What about writes not yet replicated? These are lost. The new leader has a different state than the old leader.
Mitigation: Quorum-based leader election
A replica is promoted to leader only if it has the latest writes from a quorum (majority) of replicas. This reduces the risk of data loss.
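A minimal sketch of that election rule, assuming each reachable replica reports its replicated log position (node names and the dict-based interface are hypothetical):

```python
def pick_new_leader(log_positions: dict, cluster_size: int):
    """Quorum-based promotion: refuse to elect unless a majority of the
    cluster responded (avoiding split-brain under partition), then
    promote the responder with the highest replicated log position."""
    majority = cluster_size // 2 + 1
    if len(log_positions) < majority:
        return None  # too few replicas reachable; do not elect
    return max(log_positions, key=log_positions.get)

# 3 of 5 replicas reachable (a majority): promote the most up-to-date.
print(pick_new_leader({"f1": 120, "f2": 150, "f3": 110}, cluster_size=5))  # f2
# Only 2 of 5 reachable: no election; the minority side stays leaderless.
print(pick_new_leader({"f1": 120, "f2": 150}, cluster_size=5))             # None
```

Production systems (Raft, Paxos-based election) add terms/epochs and fencing on top of this idea so a deposed leader cannot keep accepting writes.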
Replication Lag Monitoring
-- PostgreSQL (run on the replica)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
-- MySQL 8.0.22+ (older versions: SHOW SLAVE STATUS / Seconds_Behind_Master)
SHOW REPLICA STATUS; -- check Seconds_Behind_Source
-- Alert if lag > 1 minute (typical threshold)
Multi-Region Replication and Consistency
A system serving users in multiple continents requires data replication across regions. A user in Europe writes to the European leader; the write is replicated to the US leader (across the Atlantic, ~100ms latency).
Conflict Resolution
If the European and US leaders concurrently accept writes to the same record, the writes conflict. Resolution strategies:
- Last-write-wins (LWW): The write with the later timestamp wins. Simple but lossy (older writes are discarded).
- Application-defined resolution: The application logic determines the winner. For banking, you might reject the conflicting write.
- Operational transformation (OT) or CRDTs: Transform or merge concurrent writes so that every replica converges to the same state regardless of delivery order.
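Last-write-wins is the simplest of these strategies. A sketch, assuming each write carries a (timestamp, value) pair and ties are broken deterministically so all replicas converge (the tuple encoding is illustrative, not any database's wire format):

```python
def last_write_wins(a: tuple, b: tuple) -> tuple:
    """LWW conflict resolution: keep the write with the later timestamp.
    Python tuple comparison checks the timestamp first, then the value,
    so ties resolve the same way on every replica."""
    return max(a, b)

eu_write = (1700000000, "balance=100")  # accepted by the EU leader
us_write = (1700000005, "balance=250")  # accepted 5s later by the US leader
print(last_write_wins(eu_write, us_write))  # (1700000005, 'balance=250')
```

Note the lossiness: the EU write vanishes silently, which is exactly why LWW is unacceptable for data like account balances and why clock skew between regions makes "later" a slippery notion.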
Monitoring Replication Health
- replication_lag_seconds: Lag between leader and each follower.
- follower_sync_status: Is the follower in sync (caught up) or lagging?
- replication_errors: Count of replication errors (network issues, schema mismatches).
- failover_duration_seconds: Time to promote a follower to leader (should be < 30 seconds).
When to Use Different Replication Strategies
- Asynchronous leader-follower: Scale reads, accept data loss if leader crashes. Use for analytics, caches.
- Synchronous replication: Preserve data, accept unavailability. Use for financial systems, critical data.
- Multi-leader: Multiple data centers, tolerate conflicts. Use for global systems (CRMs, analytics).
- Leaderless (Dynamo-style): High availability, tolerate eventual consistency. Use for real-time applications (feeds, notifications).
Replication is not optional in modern systems; it is the foundation of reliability and scale. Master these patterns and you can build systems that survive failures and scale globally.
Key Takeaways
- Leader-follower replication is the most common topology. Scale reads by distributing them across followers.
- Asynchronous replication is fast but risks data loss. Synchronous is safe but slow. Choose based on your requirements.
- Replication lag is inevitable. Design applications to handle eventual consistency using read-your-own-writes and other consistency patterns.
- The CAP theorem is unavoidable: choose between consistency and availability during network partitions.
- Failover is operationally complex and risky. Automate it carefully and test thoroughly in staging.
- Monitor replication lag, follower availability, and failover readiness continuously. Silent replication failures are dangerous.
Last updated: March 2026 — Written by Md Sanwar Hossain