Read Replica Lag in Production: Patterns to Handle Eventual Consistency Without Breaking Your Application
Audience: Backend engineers, database architects, and distributed systems engineers dealing with read-heavy workloads and replica consistency challenges in production
Series: Distributed Systems Failure Handling Series | Category: System Design | Published: March 2026
The Silent Bug That Made Customers Call Support
Picture this: a customer places an order on your e-commerce platform. The payment goes through, the order confirmation page loads, and the customer immediately clicks "My Orders" to verify the purchase. The page loads — and the order isn't there. They refresh. Still nothing. They call support in a panic, convinced the payment was taken but no order placed.
Five minutes later, the order magically appears. Your support team explains it was a "temporary glitch." But you know the truth: your application read from a replica that was lagging 4 seconds behind the primary. The write went to primary, the read came from a stale replica, and for a brief window, the data was invisible.
This is read replica lag in production, and it is one of the most insidious classes of distributed systems bugs because the system is technically correct — it's eventually consistent — but the user experience is broken. This post covers the practical patterns engineers use to handle replica lag without regressing to always-read-from-primary.
What Causes Read Replica Lag?
Understanding the root causes of lag is essential before applying mitigations. In PostgreSQL, replication works through Write-Ahead Log (WAL) shipping. Every change on the primary is first written to the WAL, then streamed to standby replicas which replay those changes in order. Several factors introduce lag:
WAL Shipping Delay
The primary streams WAL to replicas asynchronously by default, so some delay is inherent. With streaming replication this delay is typically sub-second, while file-based WAL archiving ships whole 16MB segments and lags correspondingly more. In high-write scenarios, the replica's WAL receiver can fall behind if it cannot consume WAL as fast as the primary produces it, and the startup process that applies WAL also competes with query backends for I/O and CPU.
Network Latency
In multi-availability-zone or multi-region deployments, network round-trip time directly contributes to lag. A primary in us-east-1 replicating to a replica in eu-west-1 will exhibit lag proportional to the inter-region latency (typically 80–150ms minimum) plus any buffering. This makes cross-region replicas fundamentally different from local replicas.
Query Load on the Replica
A read replica serving heavy analytical queries can stall WAL apply. On a hot standby, long-running queries conflict with replay: for example, replaying vacuum cleanup records must wait (up to max_standby_streaming_delay) for queries whose snapshots still need the dead rows. This is why dedicated replicas for analytics are recommended, separate from replicas serving low-latency OLTP reads.
Vacuum and Maintenance Operations
Vacuum runs on the primary, not on replicas, but its effects replicate: an aggressive vacuum of a large table produces a burst of WAL whose replay competes with the replica's other I/O, and its cleanup records can conflict with standby queries. Monitoring pg_stat_replication.replay_lag will reveal these periodic spikes.
-- Check replica lag from the primary
SELECT
client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
-- On the replica itself (caveat: if the primary is idle, the last replayed
-- transaction timestamp goes stale and this query overstates lag)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
Pattern 1: Read-Your-Writes Consistency
Read-your-writes (also called read-your-own-writes) is the guarantee that after a write, the same client will always see that write on subsequent reads. It does not guarantee other clients will see it immediately — only the writing client.
The simplest implementation: route all reads for a session to the primary immediately after any write in that session. Most database proxy layers support this as a routing policy.
// Spring Boot sketch: route reads to primary after a write in the same request.
// RequestContext is an illustrative thread-local holder, not a Spring class.
@Service
public class OrderService {
    @Autowired
    private OrderPrimaryRepository orderPrimaryRepo;   // bound to the primary datasource
    @Autowired
    private OrderReplicaRepository orderReplicaRepo;   // bound to the replica datasource

    @Transactional
    public Order createOrder(CreateOrderRequest req) {
        // Write goes to primary
        Order order = orderPrimaryRepo.save(new Order(req));
        // Remember that this request performed a write
        RequestContext.setWroteThisRequest(true);
        return order;
    }

    public Order getOrder(Long orderId) {
        // If this request already wrote, read from primary to see our own write
        if (RequestContext.wroteThisRequest()) {
            return orderPrimaryRepo.findById(orderId).orElseThrow();
        }
        return orderReplicaRepo.findById(orderId).orElseThrow();
    }
}
This is effective but coarse-grained. It routes the entire request to primary after any write, which can negate the read-scaling benefit for requests that do many reads after one write.
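One way to regain read scaling is to pin only the keys that were actually written, rather than the whole request or session. The sketch below is illustrative (RecentWriteTracker and the "order:42"-style key format are our own names, not a framework API): after a write, record the entity key with an expiry; reads consult the map before choosing a datasource.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Finer-grained read-your-writes: instead of pinning the whole session to
 * the primary, remember WHICH keys were written recently and route only
 * reads of those keys to the primary until the pin expires.
 */
public class RecentWriteTracker {
    private final Map<String, Long> writeDeadlines = new ConcurrentHashMap<>();
    private final long pinMillis;

    public RecentWriteTracker(long pinMillis) { this.pinMillis = pinMillis; }

    /** Call after a successful write to the given key (e.g. "order:42"). */
    public void recordWrite(String key) {
        writeDeadlines.put(key, System.currentTimeMillis() + pinMillis);
    }

    /** True while reads of this key should still go to the primary. */
    public boolean mustReadPrimary(String key) {
        Long deadline = writeDeadlines.get(key);
        if (deadline == null) return false;
        if (System.currentTimeMillis() >= deadline) {
            writeDeadlines.remove(key, deadline); // pin expired; clean up
            return false;
        }
        return true;
    }
}
```

Set pinMillis comfortably above your observed p99 replica lag; entries self-expire on the next read, so the map stays small.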
Pattern 2: Session Consistency with LSN Tokens
A more precise approach uses the Log Sequence Number (LSN) from the primary after a write. The client receives this LSN as a consistency token. Before any subsequent read, the application checks whether the replica's applied LSN has caught up to the token. If yes, read from replica; if no, read from primary or wait.
// After write: capture the primary's current WAL position as a token
public OrderCreateResponse createOrder(CreateOrderRequest req) {
    Order order = orderRepository.save(new Order(req));
    String lsn = jdbcTemplate.queryForObject(
        "SELECT pg_current_wal_lsn()::text", String.class);
    return new OrderCreateResponse(order.getId(), lsn);
}

// Before read: check whether the replica has replayed past the token
public Order getOrderConsistent(Long orderId, String consistencyToken) {
    if (consistencyToken != null && isReplicaCaughtUp(consistencyToken)) {
        return replicaOrderRepo.findById(orderId).orElseThrow();
    }
    return primaryOrderRepo.findById(orderId).orElseThrow();
}

private boolean isReplicaCaughtUp(String requiredLsn) {
    // Compare server-side with pg_wal_lsn_diff: LSN text ("X/Y" in hex) is
    // NOT safe to compare as plain strings, because segment widths vary
    Boolean caughtUp = replicaJdbcTemplate.queryForObject(
        "SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), ?::pg_lsn) >= 0",
        Boolean.class, requiredLsn);
    return Boolean.TRUE.equals(caughtUp);
}
Managed systems implement similar mechanisms internally; Amazon Aurora, for example, tracks per-instance replica lag that routing layers can consult. The tradeoff of the token approach is that the client must carry the token (in a cookie, HTTP header, or session store), adding protocol complexity.
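If you prefer to compare LSNs client-side instead of round-tripping to the server, note that the text form "X/Y" is two hex numbers of varying width, so plain string comparison misorders them ("0/F" sorts after "0/10"). A small standalone sketch (the class name is ours):

```java
/**
 * Client-side comparison of PostgreSQL LSNs. The text form is "X/Y":
 * two hex numbers holding the high and low 32 bits of a 64-bit WAL
 * position. Parse to a long before comparing.
 */
public final class Lsn {
    private Lsn() {}

    /** Parse "X/Y" into the 64-bit WAL position it denotes. */
    public static long parse(String lsn) {
        int slash = lsn.indexOf('/');
        long hi = Long.parseLong(lsn.substring(0, slash), 16);
        long lo = Long.parseLong(lsn.substring(slash + 1), 16);
        return (hi << 32) | lo;
    }

    /** True if the replica has replayed at least up to requiredLsn. */
    public static boolean caughtUp(String replicaLsn, String requiredLsn) {
        return parse(replicaLsn) >= parse(requiredLsn);
    }
}
```

Server-side pg_wal_lsn_diff remains the simpler option when you are already querying the replica anyway.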
Pattern 3: Sticky Routing to Primary After Writes
Sticky routing is a middleware-level pattern: after a write, all reads from that user's session are pinned to the primary for a configurable TTL (e.g., 5–10 seconds). After the TTL expires, reads return to replicas.
This can be implemented in application-layer routing logic or in a query-routing proxy. Choose the proxy carefully: ProxySQL (for MySQL) supports query-based routing natively, while PgBouncer does not inspect queries or switch pools, so with PgBouncer the stickiness decision must live in the application or in a smarter proxy in front. The configuration below is purely conceptual, not real PgBouncer syntax:
# Conceptual routing-proxy rules (NOT actual PgBouncer configuration):
# after detecting a write query, pin the session to the primary pool
# until sticky_duration_ms has elapsed
sticky_write_routing = true
sticky_duration_ms = 10000
primary_pool = primary_pgbouncer:5432
replica_pool = replica_pgbouncer:5433
The TTL should be calibrated to your typical replica lag. If your lag is usually under 100ms, a 1-second TTL is generous. If you have cross-region replicas with 500ms lag, a 5-second TTL is more appropriate. Monitor actual lag and tune accordingly.
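The same TTL pinning can live in the application instead of a proxy. A minimal sketch of session-level sticky routing, assuming a stable session identifier per user (all names here are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Application-layer sticky routing: pin a session to the primary for a
 * fixed TTL after its last write; afterwards reads return to replicas.
 */
public class StickyRouter {
    public enum Target { PRIMARY, REPLICA }

    private final Map<String, Long> pinnedUntil = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public StickyRouter(long ttlMillis) { this.ttlMillis = ttlMillis; }

    /** Call from the write path. */
    public void onWrite(String sessionId) {
        pinnedUntil.put(sessionId, System.currentTimeMillis() + ttlMillis);
    }

    /** Call from the read path to pick a datasource. */
    public Target targetFor(String sessionId) {
        Long until = pinnedUntil.get(sessionId);
        if (until != null && System.currentTimeMillis() < until) return Target.PRIMARY;
        if (until != null) pinnedUntil.remove(sessionId, until); // pin expired
        return Target.REPLICA;
    }
}
```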
Pattern 4: Lag Detection and Monitoring
None of the above patterns work well without real-time lag visibility. Your application needs to know current replica lag, not just assume it. Expose replica lag as a metric and react to it dynamically.
@Component
public class ReplicaLagMonitor {
    private final JdbcTemplate replicaJdbc;
    private volatile long currentLagMs = 0;

    public ReplicaLagMonitor(JdbcTemplate replicaJdbc, MeterRegistry meterRegistry) {
        this.replicaJdbc = replicaJdbc;
        // Register the gauge ONCE against this bean; calling
        // meterRegistry.gauge(name, value) on every poll would pin the gauge
        // to the first boxed value and never update
        meterRegistry.gauge("db.replica.lag_ms", this, m -> m.currentLagMs);
    }

    @Scheduled(fixedDelay = 1000)
    public void pollReplicaLag() {
        try {
            Long lagMs = replicaJdbc.queryForObject(
                "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000",
                Long.class);
            currentLagMs = lagMs != null ? lagMs : 0;
        } catch (Exception e) {
            // Replica unreachable: treat as infinite lag so routing falls back
            currentLagMs = Long.MAX_VALUE;
        }
    }

    public long getCurrentLagMs() { return currentLagMs; }

    public boolean isLagAcceptable(long thresholdMs) {
        return currentLagMs <= thresholdMs;
    }
}
Expose this as a Prometheus metric and alert when lag exceeds your SLA threshold. For most OLTP systems, lag above 5 seconds warrants a PagerDuty alert. Cross-region replicas may have a higher threshold of 30–60 seconds.
Pattern 5: Circuit Breaker for Lagging Replicas
If a replica falls critically behind, routing reads to it causes more user-visible inconsistency than routing to the primary. Implement a circuit breaker that automatically removes lagging replicas from the read pool when their lag exceeds a threshold.
@Component
public class ReplicaRouter {
    private static final long MAX_ACCEPTABLE_LAG_MS = 5_000;

    @Autowired ReplicaLagMonitor lagMonitor;
    @Autowired MeterRegistry meterRegistry;
    @Autowired DataSource primaryDs;
    @Autowired DataSource replicaDs;

    public DataSource selectDataSource(boolean isReadOnly) {
        if (!isReadOnly) return primaryDs;
        if (lagMonitor.isLagAcceptable(MAX_ACCEPTABLE_LAG_MS)) {
            return replicaDs;
        }
        // Circuit open: fall back to primary and count the event
        meterRegistry.counter("db.replica.circuit_open").increment();
        return primaryDs;
    }
}
This is especially important during replica recovery from network partitions. A replica that just reconnected after 30 seconds of network failure will have 30 seconds of lag. Without a circuit breaker, your application will serve badly stale data for those 30 seconds.
For complex multi-threaded applications, this kind of concurrent read routing is a good fit for Java Structured Concurrency, where you can fan out reads to multiple replicas and take the fastest response from a non-lagging one in a structured scope.
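Pending adoption of structured-concurrency scopes, the same fan-out shape can be sketched with a plain ExecutorService: invokeAny returns the first successful result and cancels the rest. Only fan out to replicas that already pass the lag check, or the fastest answer may also be the stalest. Names below are illustrative:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Fan the same read out to several replicas and keep the first result. */
public class FanOutReader {
    /** Runs every read; returns the first successful result, cancels losers.
     *  Throws only if ALL replica reads fail. */
    public static <T> T readFromFastest(List<Callable<T>> replicaReads) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            return pool.invokeAny(replicaReads);
        } finally {
            pool.shutdownNow(); // interrupt the still-running losers
        }
    }
}
```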
Read Replica Failure Scenarios
Beyond lag, replicas can experience outright failures. Your routing logic must handle:
- Replica crash: Connection pool throws exceptions. Detect and remove from pool. Route all reads to primary or remaining replicas.
- Replica restart / recovery: Replica may replay WAL in catch-up mode. Lag can spike to minutes. Use lag-based circuit breaker.
- Network partition: Replica becomes unreachable. TCP connection timeouts. Set aggressive socket timeouts (e.g., 2s connect timeout, 5s socket timeout) to detect quickly.
- Replica promotion: During primary failover, a replica is promoted. Your connection strings must update. Use a connection proxy or DNS-based failover endpoint (e.g., Amazon RDS cluster endpoint).
# HikariCP settings for the replica pool with fast failure detection
spring.datasource.replica.connection-timeout=2000
spring.datasource.replica.validation-timeout=1000
spring.datasource.replica.keepalive-time=30000
spring.datasource.replica.max-lifetime=1800000
# Socket timeout is a PostgreSQL JDBC driver property (in seconds), not a
# HikariCP setting, so it goes through data-source-properties
spring.datasource.replica.data-source-properties.socketTimeout=5
When NOT to Use Read Replicas
Read replicas are not appropriate for every read workload. Be explicit about which read paths must have strong consistency:
Financial Writes and Balance Reads
Never read an account balance or transaction history from a replica before authorizing a debit. A stale balance can lead to overdrafts. Always read financial data from the primary within a transaction.
Inventory and Stock Level Checks
If you're checking whether an item is in stock before allowing a purchase, read from primary. A replica could show 1 unit in stock when the primary has already committed a sale that depleted inventory to 0.
Session and Authentication Data
Immediately after login, session tokens and permissions should be read from primary. A lagging replica might not have the new session yet, causing intermittent 401 errors right after login.
Idempotency Keys
When checking whether an idempotency key has already been used (to prevent duplicate processing), read from primary. A replica lag could allow a duplicate request to proceed.
Optimization Techniques
Connection Pooling Per Datasource
Use separate connection pools for primary and replica. This prevents a replica slowdown from starving primary connections and vice versa. HikariCP supports multiple DataSource beans in Spring Boot.
@Configuration
public class DataSourceConfig {
@Bean(name = "primaryDataSource")
@ConfigurationProperties("spring.datasource.primary")
public DataSource primaryDataSource() {
return DataSourceBuilder.create().type(HikariDataSource.class).build();
}
@Bean(name = "replicaDataSource")
@ConfigurationProperties("spring.datasource.replica")
public DataSource replicaDataSource() {
return DataSourceBuilder.create().type(HikariDataSource.class).build();
}
}
Routing Middleware with Spring's AbstractRoutingDataSource
Spring's AbstractRoutingDataSource routes transparently based on a context key. One reliable setup: an AOP aspect, ordered to run before the transaction advice, sets a thread-local key for @Transactional(readOnly = true) methods, and the routing datasource reads that key. (Reading TransactionSynchronizationManager.isCurrentTransactionReadOnly() inside determineCurrentLookupKey only works if the connection is acquired lazily, e.g. via LazyConnectionDataSourceProxy, because otherwise the routing decision happens before Spring records the read-only flag.)
public class RoutingDataSource extends AbstractRoutingDataSource {
    @Override
    protected Object determineCurrentLookupKey() {
        // Reads the thread-local key set by the aspect below
        return DataSourceContextHolder.isReplica()
            ? DataSourceType.REPLICA : DataSourceType.PRIMARY;
    }
}

// Aspect sets the routing key; must run BEFORE the transaction advice
@Aspect
@Component
@Order(Ordered.HIGHEST_PRECEDENCE)
public class ReadOnlyRoutingAspect {
    @Around("@annotation(transactional)")
    public Object route(ProceedingJoinPoint pjp, Transactional transactional) throws Throwable {
        if (transactional.readOnly()) {
            DataSourceContextHolder.setReplica();
        }
        try { return pjp.proceed(); }
        finally { DataSourceContextHolder.clear(); }
    }
}
Read Replica Pools with Health Checks
Implement a health check endpoint on your replica pool that returns HTTP 503 if lag exceeds threshold. Use this in your load balancer health checks to automatically remove lagging replicas from rotation without application-layer changes.
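A minimal sketch of such an endpoint using the JDK's built-in com.sun.net.httpserver. The path and payload format are our own choices; in practice the lag supplier would be backed by something like the ReplicaLagMonitor shown earlier:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.function.LongSupplier;

/**
 * Lag-aware health endpoint: answers 200 while lag is under the threshold,
 * 503 once it is not, so the load balancer can eject this replica.
 */
public class ReplicaHealthEndpoint {
    public static HttpServer start(int port, LongSupplier lagMillis, long thresholdMs)
            throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health/replica", exchange -> {
            long lag = lagMillis.getAsLong();
            byte[] body = ("lag_ms=" + lag).getBytes();
            int status = lag <= thresholdMs ? 200 : 503;
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        return server;
    }
}
```

Point the load balancer's health check at /health/replica with a failure threshold of two or three probes so a single lag spike does not flap the pool.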
Applications managing high concurrency across multiple data sources benefit greatly from structured concurrency primitives. See the deep dive on Java Structured Concurrency for patterns that make multi-datasource fan-out reads safer and more readable.
Summary: Replica Lag Pattern Decision Matrix
| Pattern | Consistency Level | Complexity | Best For |
|---|---|---|---|
| Always read primary | Strong | Low | Financial data, auth |
| Read-your-writes (session flag) | Session | Medium | Order confirmation |
| LSN token consistency | Causal | High | API responses |
| Sticky routing with TTL | Bounded eventual | Low–Medium | General OLTP |
| Circuit breaker on lag | Adaptive | Medium | High availability |
Key Takeaways
- Read replica lag is a fundamental property of async replication — not a bug, but a design consideration.
- Monitor pg_stat_replication.replay_lag and expose it as a metric. Alert on breaches.
- Apply read-your-writes consistency for write-then-read patterns (order placement, profile updates).
- LSN tokens provide the strongest consistency guarantee without always going to primary.
- Implement a lag-based circuit breaker to automatically demote lagging replicas from the read pool.
- Never read financial balances, idempotency keys, or inventory levels from replicas before critical writes.
- Sticky routing with a short TTL is the pragmatic choice for most OLTP applications.
- Use separate connection pools and routing middleware (AbstractRoutingDataSource) to keep routing logic out of business code.
Read More on the Blog
Explore more articles on distributed systems, Java performance, and backend architecture on the Md Sanwar Hossain Blog. Topics include Kafka consumer patterns, Kubernetes operator design, service mesh deep dives, and production incident retrospectives.
Last updated: March 2026 — Written by Md Sanwar Hossain