Database Replication: Leader-Follower, CAP Theorem & Consistency Strategies for High Availability
Replication is the canonical strategy for high availability: data is copied to multiple database servers, and if the primary fails, a replica can be promoted to take over. But replication introduces complexity. Changes typically propagate asynchronously, so a replica may lag behind the leader by seconds or minutes, and reads from a replica may return stale data. The system must also handle failures: what if a replica crashes during a write? What if the network partitions and the leader is isolated? These trade-offs are formalized by the CAP theorem: a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance, and since partitions cannot be prevented in practice, the real choice during a partition is between consistency and availability. This guide covers replication topologies, failure modes, and strategies for building highly available systems.
Software Engineer · Database · Distributed Systems
Replication Topologies
Leader-Follower (Master-Slave)
One leader node accepts all writes. Writes are replicated to one or more follower nodes (read replicas). Reads can be distributed across followers, scaling read throughput. This is the most common topology.
Leader (Primary) → Followers (Read Replicas)
↓ writes replicated asynchronously
Follower 1
Follower 2
Follower 3
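The routing rule above can be sketched in a few lines. This is a minimal illustration, not a real driver API; the `LeaderFollowerRouter` class and node names are hypothetical:

```python
import itertools

class LeaderFollowerRouter:
    """Routes writes to the single leader and spreads reads across
    followers round-robin, scaling read throughput."""

    def __init__(self, leader, followers):
        self.leader = leader
        self._followers = itertools.cycle(followers)

    def route_write(self):
        # All writes must go through the leader.
        return self.leader

    def route_read(self):
        # Reads are distributed across the read replicas.
        return next(self._followers)

router = LeaderFollowerRouter("leader-1", ["follower-1", "follower-2", "follower-3"])
print(router.route_write())  # leader-1
print(router.route_read())   # follower-1
print(router.route_read())   # follower-2
```

Real connection pools (e.g., a proxy like PgBouncer or an ORM's read-replica support) add health checks and load-aware balancing, but the split is the same: writes to one node, reads to many.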
Multi-Leader (Master-Master)
Multiple leaders accept writes concurrently. Changes are replicated between leaders. This allows writes in multiple data centers but introduces complexity: conflicting writes must be resolved.
Leaderless (Peer-to-Peer)
All replicas are equal; any can accept writes. Changes are replicated to other replicas asynchronously. Used in systems like Cassandra and Dynamo. Provides high availability but requires conflict resolution.
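Dynamo-style systems keep such leaderless clusters readable by tuning quorums: with N replicas, a write acknowledged by W nodes and a read querying R nodes are guaranteed to overlap in at least one up-to-date node when R + W > N. A tiny sketch of that condition (parameter names are the conventional N/W/R, not any particular product's API):

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """Dynamo-style quorum check: a read of R nodes is guaranteed to
    reach at least one of the W nodes that acknowledged the latest
    write iff R + W > N."""
    return r + w > n

# N=3 with W=2, R=2: every read overlaps the write set.
print(quorum_overlaps(3, 2, 2))  # True
# W=1, R=1 favors latency and availability but permits stale reads.
print(quorum_overlaps(3, 1, 1))  # False
```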
Replication Lag and Its Implications
Replication is asynchronous: the leader commits a write before it is replicated to followers. In the window between the write and replication, the leader and followers are inconsistent.
t=0: Client writes "balance = 100" to leader
t=0: Leader commits; returns success to client
t=10ms: Replication message reaches follower
t=10ms: Follower applies the write
// Between t=0 and t=10ms, a read from the follower returns stale data
Reading Your Own Writes
After writing, a user reads their own data. If the read hits a replica that has not yet received the replication message, the user sees stale data. Mitigation:
- Read-your-own-writes consistency: Route reads after writes to the leader for a brief window (e.g., 1 minute).
- Version numbers: Return the write timestamp to the client; if a read's timestamp is older, re-read from the leader.
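The version-number mitigation reduces to a simple comparison at read time. A minimal sketch, assuming timestamps are monotonically increasing write versions (the function and parameter names are illustrative, not a library API):

```python
def choose_read_node(last_write_ts: int, replica_applied_ts: int) -> str:
    """Version-based read-your-own-writes: if the replica has already
    applied a write at least as new as the client's last write, the
    replica is safe to read; otherwise fall back to the leader."""
    if replica_applied_ts >= last_write_ts:
        return "replica"
    return "leader"

# Replica has caught up past the client's write: safe to read from it.
print(choose_read_node(last_write_ts=100, replica_applied_ts=105))  # replica
# Replica is still behind the client's write: read from the leader.
print(choose_read_node(last_write_ts=100, replica_applied_ts=90))   # leader
```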
The CAP Theorem: The Fundamental Trade-Off
Any distributed data store can guarantee at most two of:
- Consistency: All clients see the same data.
- Availability: Every request returns a response (not an error).
- Partition tolerance: The system continues operating even if the network is partitioned.
CA Systems (Consistency + Availability)
Example: A traditional relational database on a single node, or a tightly coupled cluster where partitions are assumed never to happen. CA is only meaningful while that assumption holds; the moment the network can partition, the system is forced to give up either consistency or availability, which is why purely CA distributed systems do not exist in practice.
CP Systems (Consistency + Partition Tolerance)
Example: Leader-follower with synchronous replication. If a network partition isolates the leader, the system halts writes (sacrificing availability) to preserve consistency.
AP Systems (Availability + Partition Tolerance)
Example: Dynamo, Cassandra. If a partition occurs, requests are still served from whichever partition the client is in. Consistency is not guaranteed; the system uses eventual consistency.
Synchronous vs Asynchronous Replication
Synchronous (Strong Consistency)
The leader waits for acknowledgment from all (or a quorum of) followers before committing a write. If any follower fails, the write blocks.
Client → Leader: Write "balance = 100"
Leader → Followers: Replicate
Followers → Leader: Acknowledge (all must ack)
Leader → Client: Write committed (all replicas have it)
Pros: Strong consistency. Cons: Slow (every write waits on the slowest required replica); with all-ack replication, a single failed follower blocks all writes. Quorum acknowledgment tolerates minority failures at the cost of lagging replicas in the minority.
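The leader's commit rule under the two acknowledgment modes can be sketched as follows. This is an illustrative decision function, not any database's actual protocol; `acks` counts follower acknowledgments received so far:

```python
def commit_decision(acks: int, total_followers: int, mode: str) -> bool:
    """Decide whether the leader may commit a write.
    'all'    : every follower must acknowledge (blocks on any failure).
    'quorum' : a majority of the whole cluster (followers + leader)
               must have the write; the leader counts as one copy."""
    if mode == "all":
        return acks == total_followers
    if mode == "quorum":
        cluster_size = total_followers + 1   # include the leader itself
        majority = cluster_size // 2 + 1
        return acks + 1 >= majority          # leader's own copy counts
    raise ValueError(f"unknown mode: {mode}")

# 3 followers, 2 acks: 'all' still blocks, but a majority (3 of 4) is met.
print(commit_decision(2, 3, "all"))     # False
print(commit_decision(2, 3, "quorum"))  # True
```

This is why quorum-synchronous setups (e.g., PostgreSQL's `synchronous_standby_names` with `ANY n`) stay writable through minority failures while all-ack setups do not.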
Asynchronous (Eventual Consistency)
The leader commits immediately and replicates in the background. If the leader crashes before replication, data is lost.
Client → Leader: Write "balance = 100"
Leader → Client: Write committed (immediately)
Leader → Followers: Replicate (in background)
Pros: Fast, available even if replicas fail. Cons: Data loss if leader crashes, stale reads on replicas.
Handling Failures and Failover
Follower Failure
If a follower crashes, the leader continues. Clients reading from that follower are redirected to another replica. When the follower recovers, it re-syncs from the leader using write-ahead logs (or snapshots).
Leader Failure
A more serious scenario. One of the followers must be promoted to become the new leader. Questions arise:
- How long to wait before declaring the leader dead? Wait too long and unavailability grows; declare failure too quickly and you risk two simultaneous leaders (split-brain).
- Which follower to promote? Ideally, the one that is most up-to-date. But how do you know? The network may be partitioned.
- What about writes not yet replicated? These are lost. The new leader has a different state than the old leader.
Mitigation: Quorum-based leader election
A replica is promoted to leader only if it has the latest writes from a quorum (majority) of replicas. This reduces the risk of data loss.
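A minimal sketch of that election rule, assuming each reachable replica reports its replicated log position (node names and the dict-based interface are hypothetical):

```python
def pick_new_leader(log_positions: dict, cluster_size: int):
    """Quorum-based promotion: refuse to elect unless a majority of the
    cluster responded (avoiding split-brain under partition), then
    promote the responder with the highest replicated log position."""
    majority = cluster_size // 2 + 1
    if len(log_positions) < majority:
        return None  # too few replicas reachable; do not elect
    return max(log_positions, key=log_positions.get)

# 3 of 5 replicas reachable (a majority): promote the most up-to-date.
print(pick_new_leader({"f1": 120, "f2": 150, "f3": 110}, cluster_size=5))  # f2
# Only 2 of 5 reachable: no election; the minority side stays leaderless.
print(pick_new_leader({"f1": 120, "f2": 150}, cluster_size=5))             # None
```

Production systems (Raft, Paxos-based election) add terms/epochs and fencing on top of this idea so a deposed leader cannot keep accepting writes.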
Replication Lag Monitoring
-- PostgreSQL (run on the replica)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
-- MySQL 8.0.22+ (older versions: SHOW SLAVE STATUS / Seconds_Behind_Master)
SHOW REPLICA STATUS; -- check Seconds_Behind_Source
-- Alert if lag > 1 minute (typical threshold)
Multi-Region Replication and Consistency
A system serving users in multiple continents requires data replication across regions. A user in Europe writes to the European leader; the write is replicated to the US leader (across the Atlantic, ~100ms latency).
Conflict Resolution
If the European and US leaders concurrently accept writes to the same record, the writes conflict. Resolution strategies:
- Last-write-wins (LWW): The write with the later timestamp wins. Simple but lossy (older writes are discarded).
- Application-defined resolution: The application logic determines the winner. For banking, you might reject the conflicting write.
- Operational transformation (OT) or CRDTs: Transform or merge concurrent writes so that every replica converges to the same state regardless of delivery order.
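Last-write-wins is the simplest of these strategies. A sketch, assuming each write carries a (timestamp, value) pair and ties are broken deterministically so all replicas converge (the tuple encoding is illustrative, not any database's wire format):

```python
def last_write_wins(a: tuple, b: tuple) -> tuple:
    """LWW conflict resolution: keep the write with the later timestamp.
    Python tuple comparison checks the timestamp first, then the value,
    so ties resolve the same way on every replica."""
    return max(a, b)

eu_write = (1700000000, "balance=100")  # accepted by the EU leader
us_write = (1700000005, "balance=250")  # accepted 5s later by the US leader
print(last_write_wins(eu_write, us_write))  # (1700000005, 'balance=250')
```

Note the lossiness: the EU write vanishes silently, which is exactly why LWW is unacceptable for data like account balances and why clock skew between regions makes "later" a slippery notion.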
Monitoring Replication Health
- replication_lag_seconds: Lag between leader and each follower.
- follower_sync_status: Is the follower in sync (caught up) or lagging?
- replication_errors: Count of replication errors (network issues, schema mismatches).
- failover_duration_seconds: Time to promote a follower to leader (should be < 30 seconds).
When to Use Different Replication Strategies
- Asynchronous leader-follower: Scale reads, accept data loss if leader crashes. Use for analytics, caches.
- Synchronous replication: Preserve data, accept unavailability. Use for financial systems, critical data.
- Multi-leader: Multiple data centers, tolerate conflicts. Use for global systems (CRMs, analytics).
- Leaderless (Dynamo-style): High availability, tolerate eventual consistency. Use for real-time applications (feeds, notifications).
Replication is not optional in modern systems; it is the foundation of reliability and scale. Master these patterns and you can build systems that survive failures and scale globally.
Key Takeaways
- Leader-follower replication is the most common topology. Scale reads by distributing them across followers.
- Asynchronous replication is fast but risks data loss. Synchronous is safe but slow. Choose based on your requirements.
- Replication lag is inevitable. Design applications to handle eventual consistency using read-your-own-writes and other consistency patterns.
- The CAP theorem is unavoidable: choose between consistency and availability during network partitions.
- Failover is operationally complex and risky. Automate it carefully and test thoroughly in staging.
- Monitor replication lag, follower availability, and failover readiness continuously. Silent replication failures are dangerous.
Last updated: March 2026 — Written by Md Sanwar Hossain