Dual Control Planes for Geo-Partitioned Event Sourcing: Surviving Region Blackouts
Audience: platform, SRE, and staff engineers running multi-region Kafka/Postgres event stores who need to survive full regional loss without breaking audit guarantees.
Table of Contents
- Introduction
- Story: When the East Coast Disappears
- Why Dual Control Planes in Event-Sourced Stacks
- Architecture Blueprint
- Event Store Topology
- Control Plane Mechanics and Failover
- Step-by-Step Recovery Flow
- Operational Commands and Config Snippets
- Failure Modes to Drill
- Observability and Runbooks
- Data Repair and Consistency
- Trade-offs and Costs
- Mistakes to Avoid
- Key Takeaways
Introduction
Event-sourced systems love audit trails, reliable replays, and append-only histories. They hate ambiguous failovers, partial replicas, and control planes that vanish mid-incident. In a single region, the control plane that owns schemas, topic policies, consumer offsets, and projection rollouts can be centralized. In a geo-partitioned deployment, centralization becomes a liability: if the region hosting the control plane disappears, operators are blind and the replay choreography that keeps projections coherent freezes.
This piece distills the pattern into concrete commands, configs, and a step-by-step recovery flow you can lift into a runbook.
Story: When the East Coast Disappears
It is 08:12 UTC on a Tuesday. An upstream cloud networking incident silently isolates us-east-1. Your Kafka control plane, Schema Registry primary, and GitOps controllers all live there. Producers in eu-west-1 keep writing locally. Consumers in ap-southeast-1 stall because their offset commits target a control plane that is now unreachable. Within 15 minutes, incident commanders say, “Fail east traffic to EU.” The runbook says nothing about projection rebuild order or who owns schema evolution. Without a second control plane, you are left with hand-edited configs and hope.
Why Dual Control Planes in Event-Sourced Stacks
Dual control planes are not about hot spares; they are about autonomy. Each region needs a control plane that can:
- Approve or reject schema changes and topic ACLs without waiting for a remote admin API.
- Coordinate projection rebuilds and idempotent replay guards locally.
- Own consumer offset commits and DLQ routing within the region.
- Synchronize policy, not command, across peers so that one region’s outage does not block others from acting.
Instead of a single orchestration brain, you operate two peer brains that exchange state through a low-frequency, signed policy mirror. During an outage, each region keeps processing with its last validated policy and queues diffs for later reconciliation.
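To make the policy mirror concrete, here is a minimal sketch of the verifying side, assuming bundles are shipped as a payload plus a detached RSA signature that matches the public key mounted from the policy-signing ConfigMap shown below; the class and method names are illustrative, and a production setup might use cosign or another signing toolchain instead.
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;
/** Verifies a signed policy bundle before the regional control plane applies it. */
public class PolicyBundleVerifier {
    private final PublicKey publicKey;
    public PolicyBundleVerifier(Path publicKeyPem) throws Exception {
        // Strip the PEM armor and decode the base64 body into an X.509 SubjectPublicKeyInfo.
        String pem = Files.readString(publicKeyPem)
                .replaceAll("-----(BEGIN|END) PUBLIC KEY-----", "")
                .replaceAll("\\s", "");
        byte[] der = Base64.getDecoder().decode(pem);
        this.publicKey = KeyFactory.getInstance("RSA")
                .generatePublic(new X509EncodedKeySpec(der));
    }
    /** Returns true only if the bundle bytes match the detached signature. */
    public boolean verify(byte[] bundleBytes, byte[] detachedSignature) throws Exception {
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initVerify(publicKey);
        sig.update(bundleBytes);
        return sig.verify(detachedSignature);
    }
}
During an outage, a region only applies bundles that pass this check; anything unsigned or stale is queued as a diff for post-incident reconciliation.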
Architecture Blueprint
The pattern builds four planes per region: data (Kafka + event store), control (GitOps + registries + orchestration), compute (services + projections), and observability (metrics/logs/traces). Duplicate the control plane in at least two regions and mirror policy as code through signed Git remotes. In practice:
# flux-kustomization.yaml (region-scoped)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: control-plane
spec:
  interval: 2m
  targetNamespace: platform-system
  path: ./clusters/us-east-1
  prune: true
  sourceRef:
    kind: GitRepository
    name: platform-config
  suspend: false
  postBuild:
    substitute:
      REGION: us-east-1
      KAFKA_CLUSTER: kafka-east
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: policy-signing
  namespace: platform-system
data:
  publicKey.pem: |
    -----BEGIN PUBLIC KEY-----
    -----END PUBLIC KEY-----
Each region runs its own Flux/ArgoCD instance pointed at the same Git mirror but with region-scoped paths, so a regional loss does not freeze reconciliation elsewhere. The orchestration code that fans out projection rebuilds should still enforce structured lifecycles: every replay job owns its child tasks and cancels them cleanly when switching failover states.
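A minimal sketch of that lifecycle discipline, assuming Java 21+ with structured concurrency (preview) enabled; ReplayJob and rebuildProjection are hypothetical placeholders for your replay machinery.
import java.util.List;
import java.util.concurrent.StructuredTaskScope;
/** Fans out projection rebuilds; if any child replay fails or the failover state
 *  changes, the scope cancels the remaining children instead of orphaning them. */
public class ProjectionRebuildOrchestrator {
    // Hypothetical replay task: rebuilds one projection from the event log.
    public record ReplayJob(String projection, String topic, long fromOffset) {}
    public void rebuildAll(List<ReplayJob> jobs) throws Exception {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            for (ReplayJob job : jobs) {
                scope.fork(() -> rebuildProjection(job)); // each child is owned by this scope
            }
            scope.join();          // wait for all replays
            scope.throwIfFailed(); // propagate the first failure after cancelling siblings
        }
    }
    private Void rebuildProjection(ReplayJob job) {
        // Placeholder: consume from job.topic() starting at job.fromOffset()
        // and apply events to the projection store idempotently.
        return null;
    }
}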
Event Store Topology
For geo-partitioned event sourcing, the typical topology is:
- Kafka clusters per region with MirrorMaker 2 (or Cluster Linking) replicating topics across peer regions.
- Region-pinned partitions for latency-sensitive aggregates, plus geo-replicated audit topics for global reads.
- Write-local, read-anywhere semantics: producers write to their local cluster; global consumers read from replicas with well-defined lag budgets.
Example topic creation that enforces region pinning and retention aligned to replay windows:
$ kafka-topics --bootstrap-server kafka-east:9092 \
--create --topic orders.us \
--partitions 12 --replication-factor 3 \
--config min.insync.replicas=2 \
--config retention.ms=1209600000 # 14 days
# single-quoted heredoc prevents shell expansion and keeps the MM2 config literal
$ cat > mm2.properties <<'EOF'
clusters = east,eu
east.bootstrap.servers = kafka-east:9092
eu.bootstrap.servers = kafka-eu:9092
east->eu.enabled = true
east->eu.topics = orders.us
sync.topic.acls.enabled = true
tasks.max = 4
EOF
$ connect-mirror-maker.sh mm2.properties
Control Plane Mechanics and Failover
Dual control planes operate in active-active mode with a treaty:
- Policy is mirrored through Git: schemas, ACLs, consumer group ownership, projection rollout manifests.
- Runtime state is local: offsets, DLQ topics, replay cursors.
- Leaders are regional: each control plane is authoritative for its region’s compute plane.
During normal operations, policy PRs can merge in either region; they are signed and mirrored to the peer. During a blackout, the surviving control plane freezes cross-region policy merges but keeps applying region-local manifests, then reconciles deterministically when the failed region returns.
Step-by-Step Recovery Flow
The runbook below assumes us-east-1 went dark while eu-west-1 survived:
- Freeze cross-region changes: Pause MirrorMaker/Cluster Linking for topics sourced from the failed region to prevent stale backfill.
- Promote standby schemas: In EU, set Schema Registry compatibility to BACKWARD and apply the last signed schema bundle.
- Re-point producers: Toggle DNS or service mesh to route US traffic to EU ingress; writes land in orders.eu.
- Replay critical projections: Kick off scoped replays (orders, payments) from a checkpointed offset snapshot, canceling on first failure.
- Switch read models: Update API read endpoints to point to EU projections.
- Drain DLQs locally: EU control plane owns DLQ processing; halt cross-region DLQ shipping until east recovers.
- Observe lag budgets: Alert if replication lag for orders.us mirrors exceeds your RPO (e.g., 5 minutes).
- Restore east: When east recovers, keep it read-only, reconcile schemas/ACLs, then re-enable mirrors from EU to US.
- Offset reconciliation and unfreeze: Export EU offsets, import to US once caught up, then reopen cross-region Git mirrors and automation.
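For the offset export/import step, a minimal sketch of the mechanics with the Kafka Admin API. It assumes the offsets being imported have already been translated for the target cluster (raw offsets are not portable across mirrors; see the MM2 offset-translation notes later), and the class name is illustrative.
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
/** Exports a consumer group's committed offsets from one cluster and imports them
 *  into another. Assumes the offsets have already been translated for the target. */
public class OffsetMigrator {
    public void migrate(String sourceBootstrap, String targetBootstrap, String groupId)
            throws Exception {
        Properties src = new Properties();
        src.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, sourceBootstrap);
        Properties dst = new Properties();
        dst.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, targetBootstrap);
        try (Admin source = Admin.create(src); Admin target = Admin.create(dst)) {
            // Export: committed offsets for the projection consumer group on the source cluster.
            Map<TopicPartition, OffsetAndMetadata> offsets = source
                    .listConsumerGroupOffsets(groupId)
                    .partitionsToOffsetAndMetadata()
                    .get(30, TimeUnit.SECONDS);
            // Import: the group must have no active members on the target cluster.
            target.alterConsumerGroupOffsets(groupId, offsets)
                    .all()
                    .get(30, TimeUnit.SECONDS);
        }
    }
}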
Operational Commands and Config Snippets
# Pause mirror links from failed region
$ kafka-cluster-links --bootstrap-server kafka-eu:9092 \
--alter --link east-to-eu --config "link.mode=paused"
# Promote EU schema bundle
$ curl -X PUT http://schema-eu:8081/config \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{"compatibility": "BACKWARD"}'
$ tar -xf schemas/signed-bundle-us-east.tar.gz -C /tmp/schemas
$ curl -X POST http://schema-eu:8081/subjects/orders-value/versions \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d @/tmp/schemas/orders-value.json
# Replay projections with checkpointed offsets
$ kubectl -n platform-system create job replay-orders \
--image=registry/replayer:1.4 \
-- \
--topic orders.eu --group projections.orders \
--offset-snapshot s3://backups/checkpoints/orders-eu.json
# Drain DLQ locally
$ kafka-console-consumer --bootstrap-server kafka-eu:9092 \
--topic orders.dlq --from-beginning --property print.headers=true
Configuring consumer failover with region affinity:
spring:
  kafka:
    bootstrap-servers:
      - kafka-eu:9092
      - kafka-east:9092
    properties:
      client.rack: eu-west-1
      partition.assignment.strategy: org.apache.kafka.clients.consumer.StickyAssignor
    consumer:
      group-id: projections.orders
      auto-offset-reset: latest
      enable-auto-commit: false
Failure Modes to Drill
- Schema divergence: One region merges a backward-incompatible schema before the mirror pauses. Drill rollbacks using the signed bundle mechanism.
- Mirror partition skew: After reconnection, some partitions are ahead in EU; rehearse selective catch-up with kafka-reassign-partitions.
- Projection poisoning: A bad event batch replays twice. Validate idempotency keys and ensure replay jobs stop on first duplicate detection (see the sketch after this list).
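A minimal sketch of such a dedupe fence, assuming every event carries an idempotency key; the in-memory set stands in for whatever store sits next to the projection state.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
/** A dedupe fence for projection replays: every event carries an idempotency key,
 *  and the projection refuses to apply the same key twice. */
public class ReplayDedupeFence {
    private final Set<String> appliedKeys = ConcurrentHashMap.newKeySet();
    private final boolean haltOnDuplicate;
    public ReplayDedupeFence(boolean haltOnDuplicate) {
        this.haltOnDuplicate = haltOnDuplicate;
    }
    /** Returns true if the event should be applied, false if it is a known duplicate. */
    public boolean admit(String idempotencyKey) {
        boolean firstTime = appliedKeys.add(idempotencyKey);
        if (!firstTime && haltOnDuplicate) {
            // Poisoned or double-replayed batch: fail fast so the operator can inspect
            // the replay cursor instead of silently double-applying side effects.
            throw new IllegalStateException("Duplicate event during replay: " + idempotencyKey);
        }
        return firstTime;
    }
}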
Observability and Runbooks
Blackouts blur visibility. Keep observability regional and federate asynchronously. Runbooks should express steps as structured workflows rather than ad-hoc commands; nested tasks that own their children prevent orphaned replays when failover decisions change. The orchestration discipline in structured concurrency maps directly to incident automations.
- Lag dashboards per link: Mirror lag, offset export/import durations, DLQ depth.
- Health budgets: Alert when control-plane reconciliation exceeds 5 minutes or when schema bundles are older than 24 hours.
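Those health budgets are easy to expose as regional metrics. A small sketch assuming Micrometer (an assumption; any metrics library works), tracking the age of the most recently applied signed schema bundle so the 24-hour alert can fire without a cross-region dependency.
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
/** Exposes control-plane health-budget signals as region-local metrics. */
public class ControlPlaneHealthMetrics {
    private final AtomicReference<Instant> lastSignedBundleApplied =
            new AtomicReference<>(Instant.EPOCH);
    public ControlPlaneHealthMetrics(MeterRegistry registry) {
        // Age of the newest applied signed schema bundle, in seconds; alert above 24 hours.
        Gauge.builder("control_plane.schema_bundle_age_seconds",
                        () -> Duration.between(lastSignedBundleApplied.get(), Instant.now())
                                .toSeconds())
                .register(registry);
    }
    /** Call when a signed bundle has been verified and applied. */
    public void bundleApplied() {
        lastSignedBundleApplied.set(Instant.now());
    }
}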
Data Repair and Consistency
Once the failed region returns:
- Read-only staging: Keep producers on the surviving region; mount recovered brokers as mirrors.
- Verify deltas: Compare partition checksums and event counts; only proceed when parity holds (a sketch of the count check follows this list).
- Import and restore: Import surviving offsets, run a dry-run replay to confirm idempotency, then shift traffic back gradually (10%, 25%, 50%, 100%).
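A minimal sketch of the event-count side of that delta check using the Kafka Admin API. Offset-based counts are approximate under compaction or retention, so treat parity here as necessary rather than sufficient and pair it with payload checksums; class and method names are illustrative.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;
/** Compares approximate per-partition event counts between two clusters. */
public class PartitionParityChecker {
    public boolean countsMatch(String survivorBootstrap, String recoveredBootstrap,
                               List<TopicPartition> partitions) throws Exception {
        return approximateCounts(survivorBootstrap, partitions)
                .equals(approximateCounts(recoveredBootstrap, partitions));
    }
    private Map<TopicPartition, Long> approximateCounts(String bootstrap,
                                                        List<TopicPartition> partitions) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        try (Admin admin = Admin.create(props)) {
            Map<TopicPartition, OffsetSpec> earliestReq = new HashMap<>();
            Map<TopicPartition, OffsetSpec> latestReq = new HashMap<>();
            for (TopicPartition tp : partitions) {
                earliestReq.put(tp, OffsetSpec.earliest());
                latestReq.put(tp, OffsetSpec.latest());
            }
            Map<TopicPartition, ListOffsetsResultInfo> earliest = admin.listOffsets(earliestReq).all().get();
            Map<TopicPartition, ListOffsetsResultInfo> latest = admin.listOffsets(latestReq).all().get();
            Map<TopicPartition, Long> counts = new HashMap<>();
            for (TopicPartition tp : partitions) {
                // Approximate event count: log end offset minus log start offset.
                counts.put(tp, latest.get(tp).offset() - earliest.get(tp).offset());
            }
            return counts;
        }
    }
}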
Trade-offs and Costs
Dual control planes add expense: duplicate GitOps controllers, schema registries, and CI runners per region, plus more storage for mirrored topics. The payoff is autonomy: regional outages become localized incidents, and you keep audit-grade guarantees because replay order and schema governance never rely on a single region.
Mistakes to Avoid
- Letting offset edits bypass signed policy bundles.
- Failing to pause mirrors before rerouting producers, causing backfill storms.
- Replaying projections without idempotency keys or dedupe fences.
- Running a single Schema Registry for all regions.
- Treating DLQs as trash cans instead of structured queues with ownership.
Key Takeaways
- Control planes must be regional citizens with their own autonomy, not distant overlords.
- Policy mirrors and signed bundles keep schemas, ACLs, and ownership aligned across blackouts.
- Replay discipline, lag budgets, and DLQ ownership matter more than raw replication speed.
- Runbooks should be executable scripts with clear cancellation semantics, not prose.
Conclusion
Surviving a regional blackout in an event-sourced world is less about “flip DNS” and more about choreography: pausing mirrors, rerouting producers, replaying projections in the right order, and reconciling offsets with proof. Dual control planes give you that choreography even when one side of the world goes dark. For the concurrency discipline behind those workflows, revisit structured concurrency and apply it to your incident automations.
For orchestration patterns behind these workflows, read the structured concurrency guide: https://mdsanwarhossain.me/blog-java-structured-concurrency.html.
Implementation Deep Dive: Kafka MirrorMaker 2 for Cross-Region Replication
MirrorMaker 2 (MM2) is the production-grade solution for cross-cluster Kafka replication, replacing the fragile single-consumer MirrorMaker 1 with a Kafka Connect-based architecture that supports bi-directional replication, offset translation, and consumer group migration. In a dual control plane setup, MM2 handles the data plane replication while your control plane manages topic policies, ACLs, and schema synchronization independently.
A complete MM2 configuration for a geo-partitioned deployment targeting us-east-1 and eu-west-1:
# mm2-config.properties — MirrorMaker 2 deployed as Kafka Connect cluster
clusters = east, eu
east.bootstrap.servers = kafka-east:9092
eu.bootstrap.servers = kafka-eu:9092
# Replication rules: east → eu (active mirror)
east->eu.enabled = true
east->eu.topics = orders\..*,inventory\..*,payments\..*
east->eu.topics.exclude = .*\.internal,.*\.dlq
# Offset synchronisation — allows consumer failover without full replay
east->eu.sync.consumer.groups.enabled = true
east->eu.emit.heartbeats.enabled = true
east->eu.emit.checkpoints.enabled = true
east->eu.sync.group.offsets.enabled = true
east->eu.sync.group.offsets.interval.seconds = 60
east->eu.emit.checkpoints.interval.seconds = 30
# Replication factor for mirror-internal topics
replication.factor = 3
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
# Exclude internal consumer groups
groups.exclude = connect-.*,__consumer_offsets.*
# Performance tuning
tasks.max = 4
producer.override.compression.type = lz4
producer.override.linger.ms = 5
consumer.poll.timeout.ms = 1000
Critical MM2 operational concerns that documentation understates:
- Topic renaming: By default MM2 prefixes mirrored topics with the source cluster alias (east.orders.created). Override with replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy if consumers in the target region must read the same topic name without code changes.
- Offset translation: Source offsets and target offsets are never identical. MM2 maintains a translation table in mm2-offset-syncs.eu.internal. When failing over consumers, use the MirrorCheckpointConnector checkpoints to find the translated offset before starting consumers on the target cluster; never assume raw offsets are portable (see the sketch after this list).
- Lag monitoring: MM2 exposes JMX metrics under kafka.connect:type=mirror-source-connector-metrics. The most actionable metric is replication-latency-ms-max; alert when it exceeds 30 seconds in steady state. Consumer group lag for the mirror connector's own internal consumer group is an early warning for pipeline backpressure long before partition lag surfaces.
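For the offset-translation step, a minimal sketch using RemoteClusterUtils from the connect-mirror-client artifact, assuming the cluster aliases and bootstrap servers from the configuration above; the wrapper class and method names are illustrative.
import java.time.Duration;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;
/** Seeds the EU copy of a consumer group from MM2 checkpoints instead of raw east offsets. */
public class CheckpointFailover {
    public void seedGroupOnTarget(Admin targetAdmin, String groupId) throws Exception {
        // Read MirrorCheckpointConnector checkpoints on the EU cluster and translate
        // the group's east offsets into EU offsets.
        Map<String, Object> props = Map.of("bootstrap.servers", "kafka-eu:9092");
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(props, "east", groupId, Duration.ofSeconds(30));
        // Commit the translated offsets so consumers start in the right place on EU.
        targetAdmin.alterConsumerGroupOffsets(groupId, translated).all().get();
    }
}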
Quorum Fencing in Practice: ZooKeeper, etcd, and Custom Implementations
Fencing tokens are the foundational primitive that prevents split-brain writes during failover. The invariant is simple: any operation that could cause data divergence must carry a monotonically increasing fencing token, and the resource being written must reject any operation whose token is lower than the highest it has seen. Without fencing, two control-plane instances that each believe themselves to be primary will issue conflicting schema changes or projection-restart commands, producing silent data divergence that surfaces hours later during replay.
A ZooKeeper-based leader election with fencing token generation using Apache Curator:
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.concurrent.atomic.AtomicLong;
public class ControlPlaneLeaderElector extends LeaderSelectorListenerAdapter {
    private static final Logger log = LoggerFactory.getLogger(ControlPlaneLeaderElector.class);
    private final LeaderSelector leaderSelector;
    private final AtomicLong fencingToken = new AtomicLong(0);
    public ControlPlaneLeaderElector(CuratorFramework client) {
        this.leaderSelector = new LeaderSelector(
                client, "/control-plane/leader", this);
        this.leaderSelector.autoRequeue(); // re-enter election after losing leadership
        this.leaderSelector.start();       // begin participating in the election
    }
    @Override
    public void takeLeadership(CuratorFramework client) throws Exception {
        // Increment fencing token on every leadership acquisition
        long token = fencingToken.incrementAndGet();
        // Publish the token; assumes /control-plane/fence was created at bootstrap
        client.setData()
                .withVersion(-1)
                .forPath("/control-plane/fence",
                        Long.toString(token).getBytes());
        log.info("Acquired leadership, fencing token={}", token);
        try {
            runControlPlane(token); // blocks until leadership is lost
        } finally {
            log.info("Lost leadership — fencing token {} invalidated", token);
        }
    }
    public boolean validateToken(long incoming) {
        return incoming >= fencingToken.get();
    }
}
// etcd-based election using jetcd (preferred for KRaft-mode Kafka deployments)
import io.etcd.jetcd.ByteSequence;
import io.etcd.jetcd.Client;
import io.etcd.jetcd.Election;
import io.etcd.jetcd.Lease;
import io.etcd.jetcd.election.CampaignResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import static java.nio.charset.StandardCharsets.UTF_8;
public class EtcdLeaderElector {
    private static final Logger log = LoggerFactory.getLogger(EtcdLeaderElector.class);
    private static final String ELECTION = "/control-plane-election";
    private final Election electionClient;
    private final Lease leaseClient;
    public EtcdLeaderElector(Client client) {
        this.electionClient = client.getElectionClient();
        this.leaseClient = client.getLeaseClient();
    }
    public void campaign(String instanceId) throws Exception {
        ByteSequence key = ByteSequence.from(ELECTION, UTF_8);
        ByteSequence val = ByteSequence.from(instanceId, UTF_8);
        // Leadership is tied to a 10-second lease; it expires if this instance stops renewing it
        long leaseId = leaseClient.grant(10).get().getID();
        // campaign() blocks until this instance wins the election
        CampaignResponse response = electionClient.campaign(key, leaseId, val).get();
        // The etcd revision is the fencing token: globally monotonically increasing
        long fenceToken = response.getLeader().getRevision();
        log.info("Won election, etcd revision {} is fencing token", fenceToken);
        runControlPlaneWithToken(fenceToken); // hand the token to control-plane logic (defined elsewhere)
    }
}
The etcd revision number is an excellent fencing token because etcd guarantees a globally monotonically increasing revision across all write operations in a cluster. When writing to any protected resource — Schema Registry, topic ACLs, or the GitOps policy repository — include the current revision as a header. A guard layer in each resource rejects writes whose fencing token is lower than the highest seen:
- Schema Registry guard: Wrap the Schema Registry admin API with a reverse proxy that checks an X-Fencing-Token header against the current etcd leader revision before forwarding PUT/POST requests. Any stale-token request receives 409 Conflict (a minimal guard sketch follows this list).
- GitOps policy guard: Use a Git pre-receive hook that validates the fencing token embedded in the push commit message against the live etcd leader revision. Reject pushes from any instance that is not the current leader.
- Kafka ACL guard: Implement a custom Kafka authorizer plugin that additionally validates fencing tokens on admin API operations (CreateTopics, AlterConfigs, AlterAcls) so a stale leader cannot mutate topic configurations after losing election.
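A minimal sketch of the first guard as a servlet filter, assuming a Jakarta Servlet environment sits in front of the registry (a standalone reverse proxy works the same way). The header name matches the one above; everything else is illustrative.
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
/** Rejects mutating requests whose X-Fencing-Token is older than the highest token seen. */
public class FencingTokenFilter implements Filter {
    private final LongSupplier currentLeaderRevision; // e.g. fed by an etcd watch
    private final AtomicLong highestSeen = new AtomicLong(0);
    public FencingTokenFilter(LongSupplier currentLeaderRevision) {
        this.currentLeaderRevision = currentLeaderRevision;
    }
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        // Reads pass through; only mutating requests must carry a fencing token.
        if ("GET".equals(request.getMethod()) || "HEAD".equals(request.getMethod())) {
            chain.doFilter(req, res);
            return;
        }
        String header = request.getHeader("X-Fencing-Token");
        long incoming = header != null ? Long.parseLong(header) : -1;
        long floor = Math.max(highestSeen.get(), currentLeaderRevision.getAsLong());
        if (incoming < floor) {
            response.sendError(HttpServletResponse.SC_CONFLICT, "stale fencing token");
            return;
        }
        highestSeen.accumulateAndGet(incoming, Math::max);
        chain.doFilter(req, res);
    }
}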
Testing Geo-Partition Failures: Chaos Engineering Approaches
Chaos engineering for geo-partitioned systems requires simulating network partitions between regions, not just killing individual pods. The failure modes that matter most are: (1) complete network isolation between two regions, (2) asymmetric partitions where region A can reach region B but not vice versa, (3) partial partitions where data-plane traffic continues but control-plane API calls fail, and (4) clock skew between regions causing fencing token comparisons to behave unpredictably.
Simulating a region partition with Linux tc and iptables on test Kubernetes nodes hosting east-region Kafka brokers:
# Block all traffic from eu-region (10.2.0.0/16) on the Kafka port
iptables -I INPUT -s 10.2.0.0/16 -p tcp --dport 9092 -j DROP
iptables -I OUTPUT -d 10.2.0.0/16 -p tcp --sport 9092 -j DROP
# Alternatively — add 500 ms latency with ±100 ms jitter to simulate a degraded link
tc qdisc add dev eth0 root handle 1: prio
tc qdisc add dev eth0 parent 1:3 handle 30: netem delay 500ms 100ms
tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 \
match ip dst 10.2.0.0/16 flowid 1:3
# Add 5 % packet loss to stress-test MM2 retry and exactly-once semantics
tc qdisc change dev eth0 parent 1:3 handle 30: netem delay 200ms loss 5%
# Restore
tc qdisc del dev eth0 root
iptables -D INPUT -s 10.2.0.0/16 -p tcp --dport 9092 -j DROP
iptables -D OUTPUT -d 10.2.0.0/16 -p tcp --sport 9092 -j DROP
For Kubernetes-native chaos, Chaos Mesh provides cleaner namespace-scoped experiments without requiring node-level access:
- NetworkChaos partition: Use NetworkChaos with action: partition targeting Kafka broker pods by label selector. This creates a clean in-cluster network partition without touching node iptables rules, making it safe to run in shared staging environments.
- PodChaos kill: Kill the Schema Registry primary pod while MM2 is actively syncing to validate that the mirror connector correctly buffers and retries rather than dropping events or corrupting offset state.
- TimeChaos skew: Introduce a 5-second clock skew between region nodes to expose fencing token comparisons that erroneously rely on wall-clock time instead of monotonic etcd revisions.
- IOChaos latency: Add I/O latency on Kafka log directories to simulate a degraded broker during high-throughput replication, verifying that producer backpressure and consumer lag recovery stay within SLO bounds.
Recovery validation checklist after each chaos experiment:
- MM2 automatically reconnects and catch-up lag returns to zero within the defined SLO window (typically 5 minutes); a sketch for automating this check follows the checklist.
- No duplicate events appear in the target region during reconnection — requires exactly-once semantics enabled in MM2 producer configuration.
- Consumer group offsets in the target region are consistent with translated MM2 checkpoints, not stale raw offsets from the source cluster.
- The surviving control plane promoted itself to primary within the configured etcd election TTL (recommend 10 seconds).
- Schema Registry in the target region accepted no new schema registrations during the simulated partition — confirming that the fencing guard correctly isolated write operations to the elected leader.
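A minimal sketch for automating the catch-up lag check with the Kafka Admin API; the SLO window, polling interval, and class name are illustrative.
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
/** Polls consumer group lag on the target cluster after a chaos experiment and
 *  reports whether it returns to zero within the SLO window. */
public class CatchUpValidator {
    public boolean lagReturnsToZero(Admin admin, String groupId, Duration slo) throws Exception {
        Instant deadline = Instant.now().plus(slo);
        while (Instant.now().isBefore(deadline)) {
            if (totalLag(admin, groupId) == 0) {
                return true;
            }
            Thread.sleep(5_000); // poll every 5 seconds
        }
        return false;
    }
    private long totalLag(Admin admin, String groupId) throws Exception {
        Map<TopicPartition, OffsetAndMetadata> committed = admin
                .listConsumerGroupOffsets(groupId)
                .partitionsToOffsetAndMetadata().get();
        Map<TopicPartition, OffsetSpec> request = new HashMap<>();
        committed.keySet().forEach(tp -> request.put(tp, OffsetSpec.latest()));
        Map<TopicPartition, ListOffsetsResultInfo> ends = admin.listOffsets(request).all().get();
        // Lag per partition: log end offset minus committed offset, summed over the group.
        return committed.entrySet().stream()
                .mapToLong(e -> ends.get(e.getKey()).offset() - e.getValue().offset())
                .sum();
    }
}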
Operational Runbook: Responding to Region Outages
A runbook without specific, timed steps is a wishlist. The following procedure targets an RTO of 15 minutes for traffic failover and an RPO of under 60 seconds of committed-but-not-yet-replicated events. These targets assume MM2 is running with checkpoint intervals of 30 seconds and heartbeat intervals of 1 second.
RTO / RPO targets and communication checklist:
- RTO target: 15 minutes from confirmed outage to restored traffic flow in the surviving region.
- RPO target: <60 seconds of committed-but-not-replicated events, based on a 30-second MM2 checkpoint interval plus a 30-second safety margin.
- T+5 communication: Incident channel created; stakeholders notified that an active region failover is in progress.
- T+15 communication: Status page updated to "Degraded — operating in single-region mode from eu-west-1."
- T+60 communication: Preliminary incident report shared with engineering leadership; customer-facing impact quantified.
- Post-mortem: Scheduled within 48 hours; the chaos experiment that reproduces the failure mode is added to the quarterly drill calendar so the next on-call rotation faces the same scenario in a safe environment before it hits production again.