PostgreSQL Replication & High Availability: Streaming Replication, Patroni & Failover (2026)
A complete production guide to PostgreSQL high availability: physical streaming replication with WAL, synchronous vs asynchronous trade-offs, Patroni + etcd for automatic leader election and failover, PgBouncer connection pooling, read replica routing in Spring Boot with AbstractRoutingDataSource, pgBackRest backups, and a full DR runbook.
1. Why High Availability? RTO & RPO Explained
A single-node PostgreSQL instance is a single point of failure. Hardware failure, OS crash, or a botched migration can bring your entire application offline. Two metrics define your HA requirements:
- RTO (Recovery Time Objective): Maximum acceptable downtime. "We can be down for at most 30 seconds." Drives your failover automation strategy.
- RPO (Recovery Point Objective): Maximum acceptable data loss. "We cannot lose more than 0 committed transactions." Drives your replication mode (sync vs async).
| HA Level | RTO Target | RPO Target | Approach |
|---|---|---|---|
| Single node + backup | > 1 hour | Hours (last backup) | Restore from pgBackRest |
| Async streaming replication | 1–5 min (manual) | Seconds (lag-dependent) | Manual promotion or Patroni |
| Patroni + async replication | < 30 s | Seconds (last replicated LSN) | Automatic promotion via DCS |
| Patroni + sync replication | < 30 s | 0 (zero data loss) | synchronous_commit = on |
A single-node PostgreSQL fails at production scale for three reasons: no failover (any crash means downtime), no read scalability (every query hits one server), and nowhere safe to run backups (base backups and maintenance jobs compete with production traffic on the same box; a standby lets you offload them).
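The RTO/RPO framing above reduces to two simple inequalities. A minimal sketch (the numbers are illustrative assumptions, not measurements):

```python
# Toy SLA check: does a given HA setup meet its RTO/RPO targets?
# RTO bounds downtime (failover time); RPO bounds data loss (replication lag).

def meets_sla(failover_seconds: float, max_lag_seconds: float,
              rto_seconds: float, rpo_seconds: float) -> bool:
    """True iff both the downtime and the data-loss budgets are respected."""
    return failover_seconds <= rto_seconds and max_lag_seconds <= rpo_seconds

# Async replication: fast failover, but seconds of WAL may be lost.
print(meets_sla(failover_seconds=25, max_lag_seconds=3, rto_seconds=30, rpo_seconds=0))  # False
# Sync replication (RPO = 0): no committed transaction is lost on failover.
print(meets_sla(failover_seconds=25, max_lag_seconds=0, rto_seconds=30, rpo_seconds=0))  # True
```

The point of the sketch: the two targets are independent knobs — Patroni drives RTO down, while the replication mode (sync vs async) drives RPO.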
2. Streaming Replication Concepts: WAL & Standbys
PostgreSQL writes every change to the Write-Ahead Log (WAL) before applying it to heap data files. WAL records are binary, append-only, and crash-safe. Streaming replication ships these WAL bytes in real time to standby servers.
- WAL sender: A background process on the primary that reads WAL and streams it over a TCP connection to each standby's WAL receiver.
- WAL receiver: On the standby, writes received WAL to its own pg_wal directory and signals the startup process to apply (replay) it.
- LSN (Log Sequence Number): A 64-bit byte offset into the WAL stream, used to measure replication lag.
- Replication slot: Server-side state on the primary that prevents WAL removal until every slot's consumer has confirmed receipt. Required for logical replication; recommended for physical standbys too, since WAL is retained across standby downtime — but monitor your slots, because an abandoned slot retains WAL indefinitely and can fill the primary's disk.
- Hot standby: A standby in recovery mode that still accepts read-only SQL queries, enabling read scale-out.
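An LSN such as 16/B374D848 is just a 64-bit byte position: the part before the slash is the high 32 bits, the part after is the low 32 bits. A minimal Python sketch of the lag arithmetic that pg_wal_lsn_diff performs server-side:

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN ('16/B374D848') to its 64-bit byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Replication lag in bytes, as pg_wal_lsn_diff(sent, replay) would report."""
    return parse_lsn(sent_lsn) - parse_lsn(replay_lsn)

print(lag_bytes("16/B374D848", "16/B3740000"))  # 55368 — bytes of WAL not yet replayed
```

This is why lag is naturally measured in bytes: it is literally the distance between two positions in the WAL stream.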
wal_level = replica        # minimum for streaming replication
max_wal_senders = 10       # max concurrent WAL sender processes
wal_keep_size = 1GB        # retain this much WAL for slow standbys (no slot)
hot_standby = on           # allow reads on standby (set on standby too)
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'
host replication replicator 10.0.1.0/24 scram-sha-256
3. Synchronous vs Asynchronous Replication Trade-offs
| Mode | Durability | Write Latency | Data Loss Risk | Use Case |
|---|---|---|---|---|
| Asynchronous (default) | Primary only | Lowest | Seconds of transactions | Read scale-out, DR replicas |
| remote_write | WAL handed to standby OS | + 1 RTT | Very low (standby OS crash only) | Good balance for most apps |
| on (synchronous) | WAL flushed to standby disk | + 1 RTT | Zero (RPO = 0) | Fintech, healthcare, audit logs |
| remote_apply | WAL applied (visible) on standby | + 1 RTT + apply time | Zero (RPO = 0) | Readable standbys always current |
# Wait for the highest-priority connected standby (standby1 preferred) to flush WAL
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'
# Or wait for ANY 1 of 2 standbys to acknowledge (quorum):
# synchronous_standby_names = 'ANY 1 (standby1, standby2)'
# Per-transaction override (useful for non-critical writes):
# SET LOCAL synchronous_commit TO off;
Warning: With synchronous_commit = on, if all synchronous standbys disconnect, the primary blocks every commit until one reconnects — sessions hang at COMMIT. Use ANY 1 quorum mode with at least two synchronous standby candidates so a single standby failure doesn't stall writes.
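The difference between FIRST and ANY can be modeled in a few lines: FIRST k waits for the k highest-priority connected standbys, while ANY k waits for whichever k standbys acknowledge first. A toy sketch (the ack times are made-up numbers, not benchmarks):

```python
def first_k_latency(acks: dict, priority: list, k: int) -> float:
    """FIRST k: wait for the k highest-priority *connected* standbys to ack."""
    connected = [acks[name] for name in priority if name in acks]
    return max(connected[:k])

def any_k_latency(acks: dict, k: int) -> float:
    """ANY k: wait until any k standbys have acknowledged (quorum)."""
    return sorted(acks.values())[k - 1]

acks = {"standby1": 8.0, "standby2": 2.0}  # ms until each standby flushes WAL
print(first_k_latency(acks, ["standby1", "standby2"], 1))  # 8.0 — tied to standby1
print(any_k_latency(acks, 1))                              # 2.0 — fastest ack wins
```

This is why ANY 1 is the safer default: commit latency tracks your fastest healthy standby, and a slow or dead priority standby doesn't stall the primary.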
4. Setting Up Physical Streaming Replication
-- On primary:
CREATE USER replicator REPLICATION LOGIN PASSWORD 'str0ngPass!';

# On standby server (-Xs = stream WAL, -R = write recovery config):
pg_basebackup -h 10.0.1.10 -U replicator \
  -D /var/lib/postgresql/16/main -P -Xs -R --checkpoint=fast
# postgresql.auto.conf (written automatically by pg_basebackup -R; adjust if needed):
primary_conninfo = 'host=10.0.1.10 port=5432 user=replicator password=str0ngPass!'
primary_slot_name = 'standby1_slot'

# standby.signal puts the server in standby mode (-R also creates this file):
touch /var/lib/postgresql/16/main/standby.signal
pg_ctlcluster 16 main start
SELECT client_addr, state, sent_lsn, replay_lsn,
pg_size_pretty(sent_lsn - replay_lsn) AS lag
FROM pg_stat_replication;
pg_rewind: Re-sync a former primary as a standby without a full base backup: pg_rewind --target-pgdata=... --source-server="host=new-primary ...". Far faster than pg_basebackup on large databases; note it requires data checksums or wal_log_hints = on on the target cluster.
5. Logical Replication & Use Cases
Physical replication copies everything at the byte level and requires the standby to be on the same PostgreSQL major version. Logical replication replicates at the SQL row level, supports cross-version and cross-database replication, and can replicate a subset of tables.
- Use case 1: Zero-downtime major version upgrades (replicate from PG15 primary to PG16 standby, then switch over).
- Use case 2: Selective table replication to an analytics DB.
- Use case 3: Multi-master with conflict resolution (BDR/pglogical).
-- postgresql.conf: wal_level = logical
CREATE PUBLICATION orders_pub FOR TABLE orders, order_items;
-- Or replicate entire database:
-- CREATE PUBLICATION all_pub FOR ALL TABLES;
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=10.0.1.10 dbname=appdb user=replicator password=str0ngPass!'
  PUBLICATION orders_pub
  WITH (copy_data = true, create_slot = true);

-- Monitor subscription status
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;
6. Patroni + etcd: Automatic Leader Election & Failover
Patroni is a Python daemon that runs on each PostgreSQL node and uses a Distributed Configuration Store (DCS) — etcd, Consul, or ZooKeeper — for leader election. The leader holds a lock in etcd with a TTL; if the primary fails to renew the lock before TTL expiry, a standby acquires it and promotes itself.
scope: postgres-ha
name: pg-node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.1.10:8008

etcd3:
  hosts: 10.0.1.20:2379,10.0.1.21:2379,10.0.1.22:2379

bootstrap:
  dcs:
    ttl: 30                          # leader lock TTL in seconds
    loop_wait: 10                    # health-check interval
    maximum_lag_on_failover: 1048576 # don't promote if > 1MB behind
    synchronous_mode: false          # set true for zero-loss failover
  initdb: [{encoding: UTF8}, {data-checksums: true}]

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.1.10:5432
  data_dir: /var/lib/postgresql/16/main
  bin_dir: /usr/lib/postgresql/16/bin
  authentication:
    replication: {username: replicator, password: str0ngPass!}
    superuser: {username: postgres, password: s3cr3t!}
  parameters:
    wal_level: replica
    max_wal_senders: 10
    wal_keep_size: 1GB
# List cluster state (shows primary + replica lag)
patronictl -c /etc/patroni.yml list

# Planned switchover (graceful, zero data loss)
patronictl -c /etc/patroni.yml switchover --leader pg-node1 --candidate pg-node2

# Emergency failover (promotes best standby immediately)
patronictl -c /etc/patroni.yml failover postgres-ha

# Pause/resume auto-failover (for maintenance)
patronictl -c /etc/patroni.yml pause
patronictl -c /etc/patroni.yml resume
Failover flow: (1) Primary crashes → stops renewing etcd lock → TTL expires (up to 30s). (2) etcd notifies Patroni on standbys. (3) Standby with highest LSN acquires lock → runs pg_ctl promote. (4) Other standbys reconfigure to replicate from new primary. (5) HAProxy/PgBouncer health-checks detect the new primary and re-route connections.
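The lock-with-TTL mechanics can be sketched as a toy model — this is not Patroni's actual code, and etcd's compare-and-swap is reduced to a plain in-process object:

```python
import time

class ToyDCS:
    """Minimal stand-in for etcd: one leader key guarded by a TTL lease."""
    def __init__(self):
        self.leader, self.expires_at = None, 0.0

    def try_acquire(self, node: str, ttl: float) -> bool:
        now = time.monotonic()
        # Acquire if the key is free, the lease expired, or we already hold it (renew).
        if self.leader is None or now >= self.expires_at or self.leader == node:
            self.leader, self.expires_at = node, now + ttl
            return True
        return False  # someone else holds an unexpired lease

def elect(dcs: ToyDCS, candidates: dict, ttl: float = 30.0) -> str:
    """Promote the candidate with the highest replayed LSN, as Patroni does."""
    best = max(candidates, key=candidates.get)
    assert dcs.try_acquire(best, ttl)
    return best

dcs = ToyDCS()
assert dcs.try_acquire("pg-node1", ttl=0.01)  # primary holds the lock...
time.sleep(0.02)                              # ...then dies; lease expires
print(elect(dcs, {"pg-node2": 0xB374D848, "pg-node3": 0xB3740000}))  # pg-node2
```

The TTL is exactly the worst-case detection delay in step (1) of the flow above: with ttl: 30, a hard crash can go unnoticed for up to 30 seconds before a standby may legally take the lock.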
7. PgBouncer Connection Pooling in HA Setup
PostgreSQL creates a new OS process per client connection (up to ~3MB RAM each). At 500+ concurrent app connections, memory pressure becomes a bottleneck. PgBouncer maintains a small pool of actual backend connections and multiplexes thousands of client connections across them.
| Pool Mode | Connection Held Until | Transactions/Connection | Use Case |
|---|---|---|---|
| session | Client disconnects | All session txns | Compatibility, SET/PREPARE statements |
| transaction | Transaction ends | High multiplexing | Most Spring Boot apps (recommended) |
| statement | Statement ends | Maximum multiplexing | Autocommit-only workloads |
[databases]
; appdb points to the primary (or an HAProxy VIP); appdb_ro points to a replica
appdb = host=10.0.1.10 port=5432 dbname=appdb
appdb_ro = host=10.0.1.11 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 5000
; backend connections per db/user pair
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600
client_idle_timeout = 0
log_connections = 0
log_disconnections = 0
stats_period = 60
Configure Spring Boot's datasource URL to point to PgBouncer (jdbc:postgresql://pgbouncer-host:6432/appdb). Set HikariCP maximum-pool-size to ~10 per instance — PgBouncer multiplexes all app instances into a small backend pool.
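The multiplexing arithmetic, using the numbers from the config above (the ~3 MB per backend figure is a rough rule of thumb, not a measurement):

```python
MB = 1024 * 1024

def backend_savings(app_instances: int, hikari_pool: int,
                    pgbouncer_pool: int, conn_mem_bytes: int = 3 * MB):
    """Compare direct app connections vs PgBouncer's shared backend pool."""
    direct = app_instances * hikari_pool  # every app connection = one PG process
    pooled = pgbouncer_pool               # shared backend pool per db/user pair
    saved_mb = (direct - pooled) * conn_mem_bytes // MB
    return direct, pooled, saved_mb

print(backend_savings(app_instances=50, hikari_pool=10, pgbouncer_pool=25))
# (500, 25, 1425) → 500 backends shrink to 25, saving roughly 1.4 GB of RAM
```

The same arithmetic explains the sizing advice: many small HikariCP pools are fine, because PgBouncer in transaction mode collapses them into one small, busy backend pool.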
8. Read Replica Routing in Spring Boot (AbstractRoutingDataSource)
public class RoutingDataSource extends AbstractRoutingDataSource {
@Override
protected Object determineCurrentLookupKey() {
return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
? "REPLICA" : "PRIMARY";
}
}
@Configuration
public class DataSourceConfig {
@Bean @ConfigurationProperties("spring.datasource.primary")
public DataSource primaryDataSource() { return DataSourceBuilder.create().build(); }
@Bean @ConfigurationProperties("spring.datasource.replica")
public DataSource replicaDataSource() { return DataSourceBuilder.create().build(); }
@Bean @Primary
public DataSource routingDataSource(
    @Qualifier("primaryDataSource") DataSource primary,
    @Qualifier("replicaDataSource") DataSource replica) {
  RoutingDataSource routing = new RoutingDataSource();
  routing.setTargetDataSources(Map.of("PRIMARY", primary, "REPLICA", replica));
  routing.setDefaultTargetDataSource(primary);
  // Wrap in LazyConnectionDataSourceProxy (org.springframework.jdbc.datasource):
  // without it, Spring acquires the connection when the transaction begins,
  // before the readOnly flag is visible, so every query would route to PRIMARY.
  return new LazyConnectionDataSourceProxy(routing);
}
}
spring:
  datasource:
    primary:
      url: jdbc:postgresql://pgbouncer:6432/appdb
      username: appuser
      password: ${DB_PASSWORD}
      hikari:
        maximum-pool-size: 10
        pool-name: PrimaryPool
    replica:
      url: jdbc:postgresql://pgbouncer:6432/appdb_ro
      username: appuser
      password: ${DB_PASSWORD}
      hikari:
        maximum-pool-size: 20   # more reads than writes
        pool-name: ReplicaPool
@Service
public class ProductService {
@Transactional(readOnly = true) // ← routes to REPLICA
public List<Product> findAll() {
return productRepository.findAll();
}
@Transactional // ← routes to PRIMARY
public Product create(ProductRequest req) {
return productRepository.save(new Product(req));
}
}
9. Monitoring Replication Lag (pg_stat_replication)
SELECT application_name, client_addr, state,
pg_size_pretty(sent_lsn - replay_lsn) AS lag_size,
EXTRACT(EPOCH FROM replay_lag) AS replay_lag_sec,
sync_state
FROM pg_stat_replication ORDER BY replay_lag DESC;
SELECT slot_name, slot_type, active,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
-- retained_wal > 5GB = WARNING: drop the slot or restart its consumer!
- alert: PostgresReplicationLagHigh
  expr: pg_replication_lag > 30
  for: 2m
  labels: {severity: warning}
  annotations:
    summary: "Replication lag {{ $value }}s on {{ $labels.instance }}"
10. Backup with pgBackRest & PITR
pgBackRest is the gold standard for PostgreSQL backups: parallel compression, S3/GCS/Azure storage, WAL archiving for PITR, incremental backups, and built-in backup verification.
[global]
repo1-type=s3
repo1-s3-bucket=my-pg-backups
repo1-s3-endpoint=s3.amazonaws.com
repo1-s3-region=us-east-1
# keep 4 full backups
repo1-retention-full=4
repo1-retention-diff=14
repo1-cipher-type=aes-256-cbc
compress-type=lz4
# parallel compression
process-max=4

[main]
pg1-path=/var/lib/postgresql/16/main
# Full backup every Sunday 01:00
0 1 * * 0 postgres pgbackrest --stanza=main --type=full backup
# Incremental daily
0 1 * * 1-6 postgres pgbackrest --stanza=main --type=incr backup

# PITR: restore to timestamp (before a bad migration)
pg_ctlcluster 16 main stop
pgbackrest --stanza=main --delta --type=time \
  "--target=2026-04-11 14:25:00+00" --target-action=promote restore
pg_ctlcluster 16 main start
11. Disaster Recovery Runbook
Automated failover (Patroni): Patroni handles this automatically. Your runbook is: (1) Receive PagerDuty alert. (2) Run patronictl list to confirm new primary. (3) Verify application traffic is routed correctly via PgBouncer/HAProxy health checks. (4) Investigate and repair the failed node. (5) Rejoin it as a standby with patronictl reinit.
# 1. Confirm primary is truly dead
pg_isready -h 10.0.1.10 -p 5432 # should fail
# 2. Choose standby with highest replay_lsn
psql -h 10.0.1.11 -c "SELECT pg_last_wal_replay_lsn();"
# 3. Promote the best standby
pg_ctlcluster 16 main promote
# 4. Update HAProxy backend to new primary IP
echo "set server pg_backend/pg-node1 addr 10.0.1.11" | socat stdio /var/run/haproxy/admin.sock
# 5. Reconfigure remaining standbys: edit primary_conninfo, then:
psql -c "SELECT pg_reload_conf();"
# 6. Rejoin old primary as standby using pg_rewind (faster than basebackup)
pg_rewind --target-pgdata=/var/lib/postgresql/16/main \
--source-server="host=10.0.1.11 user=postgres"
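Step 2 of the runbook — picking the standby with the highest replayed LSN — is easy to script. A sketch that compares the textual values pg_last_wal_replay_lsn() returns (the hostnames and LSNs are illustrative):

```python
def lsn_to_int(lsn: str) -> int:
    """'16/B374D848' -> 64-bit byte offset (high 32 bits / low 32 bits)."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

# In practice, collect these via:
#   psql -h <standby> -Atc "SELECT pg_last_wal_replay_lsn();"
replay_lsns = {"10.0.1.11": "16/B374D848", "10.0.1.12": "16/B3740000"}

best = max(replay_lsns, key=lambda host: lsn_to_int(replay_lsns[host]))
print(best)  # 10.0.1.11 — promote this one; it has replayed the most WAL
```

Comparing the raw strings would be wrong (hex with a slash does not sort lexicographically), which is why the conversion to a 64-bit integer matters.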
# 1. Document current primary:
patronictl -c /etc/patroni.yml list
# 2. Simulate a crash on the primary node:
kill -9 $(pgrep -f patroni)
# 3. Measure failover time (target: < 30s):
watch patronictl -c /etc/patroni.yml list
# 4. Verify data consistency: SELECT COUNT(*) on critical tables
# 5. Rejoin the failed node:
patronictl -c /etc/patroni.yml reinit postgres-ha pg-node1
12. Production Checklist
- wal_level = replica on all nodes
- At least 2 standbys (1 sync + 1 async)
- Patroni + 3-node etcd cluster
- PgBouncer in transaction pool mode
- HAProxy health-check on Patroni :8008
- pg_stat_replication monitored & alerted
- Replication lag alert (>30 s warn / >120 s crit)
- No orphaned replication slots
- pgBackRest full weekly + incremental daily
- WAL archiving enabled (archive_mode=on)
- PITR tested within retention window
- pgbackrest verify run weekly
- AbstractRoutingDataSource read/write split
- max_connections tuned (RAM / 3MB)
- scram-sha-256 auth in pg_hba.conf
- data-checksums enabled
- DR runbook tested quarterly
- RTO < 30s and RPO defined per SLA