
Designing a Payment Processing System at Scale: Idempotency, Double-Spend Prevention & Settlement

Payment systems are among the most unforgiving domains in distributed systems: a single duplicated charge or missed transaction can cause legal liability, chargebacks, and lost user trust. This guide covers the complete architecture of a production payment system, from idempotency keys to settlement reconciliation.

Md Sanwar Hossain · April 6, 2026 · 20 min read · System Design

TL;DR — Core Principles

"Every payment operation must be idempotent (safe to retry), consistent (exactly-once execution), and auditable (double-entry ledger). The architecture must tolerate network failures, PSP outages, and clock skew without ever double-charging or losing a transaction."

Table of Contents

  1. Requirements & Scale Estimation
  2. High-Level Architecture
  3. Idempotency — The Foundation of Safe Payments
  4. Double-Spend & Race Condition Prevention
  5. Checkout Saga — Distributed Transaction Pattern
  6. PSP Integration & Webhook Handling
  7. Ledger & Double-Entry Accounting
  8. Settlement & Reconciliation
  9. PCI DSS Compliance Architecture
  10. Scaling to Millions of Transactions
  11. Design Checklist & Conclusion

1. Requirements & Scale Estimation

Before designing, anchor on realistic numbers. The estimates below assume a mid-sized e-commerce platform.

Functional Requirements

Scale Estimates

Metric                  | Value                   | Notes
Peak TPS                | 5,000 transactions/sec  | Black Friday peaks
Daily transactions      | ~10M                    | ~115 average TPS
Idempotency key storage | ~5 GB/day (24h TTL)     | Redis
Ledger entries/year     | ~7 billion rows         | 2 entries per txn (debit + credit)
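The derived rows in the table follow from simple arithmetic. A quick Java check that reproduces them, assuming roughly 500 bytes per idempotency record (key plus serialized response); that per-record size is an assumption chosen to be consistent with the ~5 GB/day figure, not a measured value:

```java
// Back-of-envelope check of the scale estimates.
// Assumption: ~500 bytes per idempotency record (key + serialized response).
class ScaleEstimate {
    static final long DAILY_TXNS = 10_000_000L;
    static final long SECONDS_PER_DAY = 86_400L;
    static final long BYTES_PER_IDEMPOTENCY_RECORD = 500L; // hypothetical average size
    static final int LEDGER_ENTRIES_PER_TXN = 2;            // one debit + one credit

    // Average transactions per second over a full day
    static double averageTps() {
        return (double) DAILY_TXNS / SECONDS_PER_DAY;
    }

    // Ledger rows accumulated in one (non-leap) year
    static long ledgerRowsPerYear() {
        return DAILY_TXNS * LEDGER_ENTRIES_PER_TXN * 365;
    }

    // With a 24h TTL, roughly one day of keys is live at once; result in GB (10^9 bytes)
    static double idempotencyGbPerDay() {
        return DAILY_TXNS * BYTES_PER_IDEMPOTENCY_RECORD / 1e9;
    }

    // How far the peak sits above the daily average
    static double peakToAverage(long peakTps) {
        return peakTps / averageTps();
    }
}
```

At two rows per transaction the ledger grows by about 7.3 billion rows a year, and a 5,000 TPS peak is roughly 43× the daily average, which is why capacity planning targets the peak rather than the mean.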

2. High-Level Architecture

The payment system is composed of loosely coupled microservices, each owning a specific domain, communicating through a Kafka event bus for resilience and auditability.

[Figure: Payment System Architecture. Full service topology (API gateway, payment service, PSP, ledger, settlement and reconciliation) with the Kafka event bus, PSP integration, and downstream consumers. Source: mdsanwarhossain.me]

Core Services

3. Idempotency — The Foundation of Safe Payments

Idempotency is non-negotiable in payment systems. Network failures are common: mobile apps lose connectivity mid-request, clients time out and retry, and load balancers or proxies may re-send a request after an upstream error. Without idempotency, every retry is a potential duplicate charge.

How Idempotency Keys Work

The client generates a UUID v4 idempotency key once per payment intent and sends the same key with every attempt for that intent, including retries:

POST /v1/payments
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
Content-Type: application/json

{
  "amount": 4999,
  "currency": "USD",
  "customer_id": "cust_abc123",
  "payment_method_id": "pm_visa_xxx"
}

The server stores the key → response mapping in Redis with a 24-hour TTL. On every incoming request, it checks Redis first:

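// Pseudocode: idempotency check in Payment Service
// Scope the key per customer so two customers can never collide on the same UUID
String cacheKey = "idempotency:" + customerId + ":" + idempotencyKey;
String lockKey  = "lock:" + cacheKey;   // keep the lock separate from the cached response

String cachedResponse = redis.get(cacheKey);
if (cachedResponse != null) {
    return deserialize(cachedResponse); // return same result, no re-processing
}

// Acquire a distributed lock to prevent concurrent duplicate processing.
// The 30s lease must outlast the slowest processPayment call; if it expires early,
// the database-level optimistic lock (section 4) is the backstop.
try (DistributedLock lock = lockService.acquire(lockKey, 30_000)) {
    // Double-check after acquiring the lock: another instance may have just finished
    cachedResponse = redis.get(cacheKey);
    if (cachedResponse != null) return deserialize(cachedResponse);

    PaymentResult result = processPayment(request);
    redis.setex(cacheKey, 86400, serialize(result)); // 24h TTL
    return result;
}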
[Figure: Idempotency Pattern. How first requests and duplicate retries are handled differently. Source: mdsanwarhossain.me]
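The same flow can be tried without Redis. The sketch below is a single-JVM stand-in, assuming Java 17+ (records, HexFormat); the class and method names are hypothetical. `ConcurrentHashMap.computeIfAbsent` applies its function at most once per key, which plays the role of the lock plus double-check. It adds one safeguard the pseudocode lacks: it fingerprints the request body, so a key reused with different parameters is rejected instead of replaying an unrelated response (Stripe's API, for example, errors in that case).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// In-memory stand-in for the Redis idempotency cache (single JVM only).
class IdempotencyStore {
    // What we remember per key: a fingerprint of the original request and its result
    record Entry(String requestFingerprint, String response) {}

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();

    // SHA-256 of the request body, used to detect a key reused with different parameters
    static String fingerprint(String requestBody) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(sha256.digest(requestBody.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    String execute(String customerId, String idempotencyKey, String requestBody,
                   Supplier<String> processPayment) {
        String key = "idempotency:" + customerId + ":" + idempotencyKey;
        String fp = fingerprint(requestBody);
        // Runs processPayment at most once per key; concurrent callers with the same key
        // wait for that computation. If it throws, nothing is stored and a retry may run again.
        Entry entry = entries.computeIfAbsent(key, k -> new Entry(fp, processPayment.get()));
        if (!entry.requestFingerprint().equals(fp)) {
            // Same key, different payload: reject rather than replay an unrelated response
            throw new IllegalArgumentException("Idempotency-Key reused with a different request");
        }
        return entry.response();
    }
}
```

ConcurrentHashMap may block other updates while the function runs, so running a slow PSP call inside it is only acceptable in a sketch; in production the map becomes Redis with a TTL and the distributed lock takes over that role.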

Idempotency Key Design Rules

4. Double-Spend & Race Condition Prevention

Double-spend occurs when two concurrent requests for the same payment both succeed. This can happen when a client retries too aggressively while the original request is still processing.

Distributed Lock + Optimistic Locking

Two-layer defense: Redis distributed lock for cross-instance coordination, plus database-level optimistic locking (version column) as the final safety net:

-- Database check with optimistic locking
UPDATE payment_intents
SET status = 'processing', version = version + 1
WHERE id = :id
  AND status = 'pending'        -- only process once
  AND version = :expected_version;  -- optimistic lock

-- If 0 rows updated → concurrent request already processing → return 409
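The database does the real work, but the compare-and-set semantics of that UPDATE can be illustrated in memory. In this sketch (Java 17+, hypothetical names) an `AtomicReference` stands in for the row, and several threads race to move the intent out of `pending` with the same expected version. Exactly one gets an update count of 1; the others get 0 and would map it to a 409.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// In-memory stand-in for:
//   UPDATE payment_intents SET status = 'processing', version = version + 1
//   WHERE id = :id AND status = 'pending' AND version = :expected_version;
class OptimisticPaymentIntent {
    record Row(String status, long version) {}

    private final AtomicReference<Row> row = new AtomicReference<>(new Row("pending", 0));

    /** Returns the number of rows updated (1 or 0), like the SQL statement's update count. */
    int markProcessing(long expectedVersion) {
        Row current = row.get();
        if (!current.status().equals("pending") || current.version() != expectedVersion) {
            return 0; // WHERE clause does not match
        }
        // compareAndSet succeeds only if no other thread replaced the row since we read it
        return row.compareAndSet(current, new Row("processing", current.version() + 1)) ? 1 : 0;
    }

    Row current() {
        return row.get();
    }

    /** Starts N threads that all expect version 0; returns how many got an update count of 1. */
    static int countWinners(int threads) {
        OptimisticPaymentIntent intent = new OptimisticPaymentIntent();
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await(); // release all threads together to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (intent.markProcessing(0) == 1) winners.incrementAndGet(); // losers -> 409
            });
            workers.add(t);
            t.start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return winners.get();
    }
}
```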

Payment State Machine

Strict state transitions prevent invalid operations, such as charging a payment that has already succeeded.
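A minimal sketch of such a state machine in Java 17+, assuming five states (PENDING, PROCESSING, SUCCEEDED, FAILED, REFUNDED) inferred from the statuses and events used elsewhere in this guide; the exact state set and allowed transitions are hypothetical:

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical state set; terminal states allow no further transitions.
enum PaymentState {
    PENDING, PROCESSING, SUCCEEDED, FAILED, REFUNDED;

    // Allowed next states for each state (initialized after the constants above)
    private static final Map<PaymentState, Set<PaymentState>> ALLOWED = Map.of(
            PENDING, EnumSet.of(PROCESSING, FAILED),
            PROCESSING, EnumSet.of(SUCCEEDED, FAILED),
            SUCCEEDED, EnumSet.of(REFUNDED),
            FAILED, EnumSet.noneOf(PaymentState.class),
            REFUNDED, EnumSet.noneOf(PaymentState.class));

    boolean canTransitionTo(PaymentState next) {
        return ALLOWED.get(this).contains(next);
    }

    /** Returns the next state, or throws so the caller can map the error to HTTP 409. */
    PaymentState transitionTo(PaymentState next) {
        if (!canTransitionTo(next)) {
            throw new IllegalStateException("Invalid transition " + this + " -> " + next);
        }
        return next;
    }
}
```

Checking `transitionTo` before every status write, alongside the `status = 'pending'` guard in the UPDATE, keeps an invalid transition out of the database even if a bug elsewhere attempts one.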

5. Checkout Saga — Distributed Transaction Pattern

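A checkout involves multiple services: inventory, payment, and fulfillment. Two-phase commit (2PC) is impractical across microservices and external PSPs: a PSP will not join your transaction coordinator, and a stalled coordinator blocks every participant. Instead, use the Saga pattern with compensating transactions: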

Choreography Saga — Checkout Flow

  1. Order Service → create order (status: pending) → publish order.created
  2. Inventory Service → reserve stock → publish inventory.reserved or inventory.reservation_failed
  3. Payment Service → charge card → publish payment.succeeded or payment.failed
  4. Fulfillment Service → create shipment → publish shipment.created
  5. Notification Service → send confirmation email

Compensations: If payment fails → Inventory Service listens to payment.failed and releases the reservation. If fulfillment fails → Payment Service issues refund.
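To make the choreography concrete, here is a toy version in Java 17+, assuming a synchronous in-memory bus in place of Kafka and hypothetical class names; notification and the refund compensation are left out for brevity. Each service only reacts to events, and no coordinator tells it what to do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Toy choreography saga: a synchronous in-memory bus stands in for Kafka topics.
class CheckoutSaga {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    final List<String> eventLog = new ArrayList<>();     // every event published, in order
    final Set<String> reservedOrders = new HashSet<>();  // Inventory Service state

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String orderId) {
        eventLog.add(topic);
        subscribers.getOrDefault(topic, List.of()).forEach(h -> h.accept(orderId));
    }

    CheckoutSaga(boolean paymentWillFail) {
        // Inventory Service: reserve on order.created, compensate on payment.failed
        subscribe("order.created", id -> {
            reservedOrders.add(id);
            publish("inventory.reserved", id);
        });
        subscribe("payment.failed", id -> reservedOrders.remove(id)); // release the reservation

        // Payment Service: charge once stock is reserved (outcome simulated by the flag)
        subscribe("inventory.reserved", id ->
                publish(paymentWillFail ? "payment.failed" : "payment.succeeded", id));

        // Fulfillment Service: ship only after a successful charge
        subscribe("payment.succeeded", id -> publish("shipment.created", id));
    }

    // Order Service: create the order (pending) and start the saga
    void checkout(String orderId) {
        publish("order.created", orderId);
    }
}
```

Real consumers are asynchronous and at-least-once, so each handler (reserve, release, charge) must itself be idempotent, keyed by order ID.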

Outbox Pattern — Guaranteed Event Delivery

Never publish Kafka events directly from the payment handler — if the app crashes after the DB write but before the Kafka publish, the event is lost. Use the Outbox pattern:

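-- Inside a single DB transaction (atomic): the status change and the event commit together
BEGIN TRANSACTION;
  UPDATE payments SET status = 'succeeded' WHERE id = :id;
  INSERT INTO outbox_events (aggregate_id, event_type, payload)
    VALUES (:id, 'payment.succeeded', :json_payload);
COMMIT;

-- A separate Debezium CDC connector tails the database change log for rows inserted
-- into outbox_events and publishes them to Kafka. A polling relay is the alternative:
-- it reads unpublished rows, publishes them, then marks them as published.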
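Debezium needs a real database, but the polling-relay variant is easy to sketch. Assuming an in-memory list in place of the outbox_events table (class and method names hypothetical), the relay marks a row published only after the publish call returns, so a broker failure leaves the event in place for the next pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory model of the transactional outbox with a polling relay.
class OutboxDemo {
    static final class OutboxEvent {
        final String aggregateId;
        final String eventType;
        boolean published; // only needed by the polling-relay variant

        OutboxEvent(String aggregateId, String eventType) {
            this.aggregateId = aggregateId;
            this.eventType = eventType;
        }
    }

    private final Map<String, String> paymentStatus = new HashMap<>();
    private final List<OutboxEvent> outbox = new ArrayList<>();

    // Stands in for BEGIN ... COMMIT: status change and outbox row are written together
    synchronized void markSucceeded(String paymentId) {
        paymentStatus.put(paymentId, "succeeded");
        outbox.add(new OutboxEvent(paymentId, "payment.succeeded"));
    }

    /** Publishes unpublished events in order; returns how many were sent in this pass. */
    synchronized int relayOnce(Consumer<OutboxEvent> kafkaPublish) {
        int sent = 0;
        for (OutboxEvent e : outbox) {
            if (e.published) continue;
            kafkaPublish.accept(e); // may throw: the event stays unpublished for the next pass
            e.published = true;     // mark only after the broker accepted it
            sent++;
        }
        return sent;
    }

    synchronized String status(String paymentId) {
        return paymentStatus.get(paymentId);
    }
}
```

This yields at-least-once delivery: a crash between publishing and marking can resend an event, which is why downstream consumers must be idempotent.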

6. PSP Integration & Webhook Handling

Payment Service Providers (Stripe, Adyen, Braintree) are external dependencies with their own failure modes. Your integration must handle PSP timeouts, network errors, and asynchronous callbacks robustly.

PSP Call Resilience Pattern
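The production checklist at the end of this guide calls for timeout, retry, and circuit-breaker logic on every PSP call. A minimal, single-threaded sketch in Java 17+ with hypothetical names and thresholds; the timeout itself is assumed to be enforced inside `pspCall` (for example by the HTTP client's request timeout). Two details matter more than the numbers: every retry reuses the same idempotency key, so a request the PSP already processed is not charged twice (Stripe, for instance, accepts idempotency keys), and an open breaker fails fast instead of piling requests onto a struggling PSP. For simplicity the sketch retries every exception; a real integration would retry only timeouts and 5xx-style errors, never card declines.

```java
import java.util.function.Function;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Retry with exponential backoff behind a consecutive-failure circuit breaker.
// Hypothetical names and thresholds; not thread-safe.
class ResilientPspCaller {
    private final int maxAttempts;
    private final long baseBackoffMs;
    private final int failureThreshold;
    private final long openDurationMs;
    private final LongSupplier clockMs; // injectable clock, e.g. System::currentTimeMillis
    private final LongConsumer sleeper; // injectable sleep, so tests need not wait

    private int consecutiveFailures = 0;
    private long openedAtMs = -1; // -1 means the breaker is closed

    ResilientPspCaller(int maxAttempts, long baseBackoffMs, int failureThreshold,
                       long openDurationMs, LongSupplier clockMs, LongConsumer sleeper) {
        this.maxAttempts = maxAttempts;
        this.baseBackoffMs = baseBackoffMs;
        this.failureThreshold = failureThreshold;
        this.openDurationMs = openDurationMs;
        this.clockMs = clockMs;
        this.sleeper = sleeper;
    }

    /** pspCall receives the idempotency key and is assumed to enforce its own timeout. */
    String charge(String idempotencyKey, Function<String, String> pspCall) {
        if (openedAtMs >= 0 && clockMs.getAsLong() - openedAtMs < openDurationMs) {
            throw new IllegalStateException("circuit open: failing fast");
        }
        // Once the open period has elapsed, this call acts as a half-open trial
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                String result = pspCall.apply(idempotencyKey); // same key on every retry
                consecutiveFailures = 0;
                openedAtMs = -1; // success closes the breaker
                return result;
            } catch (RuntimeException e) {
                last = e;
                if (++consecutiveFailures >= failureThreshold) {
                    openedAtMs = clockMs.getAsLong(); // trip (or re-trip) the breaker
                    break;
                }
                if (attempt < maxAttempts) {
                    sleeper.accept(baseBackoffMs << (attempt - 1)); // base, 2x base, 4x base, ...
                }
            }
        }
        throw new IllegalStateException("PSP call failed after retries", last);
    }
}
```

For production code, a library such as Resilience4j provides thread-safe Retry and CircuitBreaker components that cover the same ground.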

Webhook Processing

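PSPs deliver webhooks asynchronously for events like charge.succeeded, charge.failed, and refund.created. Webhooks may arrive out of order and may be delivered multiple times, so a handler must verify the signature, process each event idempotently, and never let a late event move a payment backwards.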
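A sketch of those three defenses in Java 17+. The signature scheme here (lowercase hex HMAC-SHA256 over the raw body) is an assumption for illustration; each PSP defines its own (Stripe, for example, signs a timestamp plus the payload and sends it in a `Stripe-Signature` header), so follow your PSP's documentation. JSON parsing is omitted, and in-memory collections stand in for a processed-events table; all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.Map;
import java.util.Set;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Webhook handler sketch. Assumed scheme: lowercase hex HMAC-SHA256 over the raw body.
class WebhookHandler {
    private final byte[] secret;
    private final Set<String> processedEventIds = new HashSet<>();
    private final Map<String, String> lastEventByPayment = new HashMap<>();

    WebhookHandler(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    static String hmacHex(byte[] key, String body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return HexFormat.of().formatHex(mac.doFinal(body.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    // Lifecycle order used to drop late arrivals; unknown event types rank lowest
    private static int rank(String eventType) {
        return switch (eventType) {
            case "charge.succeeded", "charge.failed" -> 1;
            case "refund.created" -> 2;
            default -> 0;
        };
    }

    /** Returns "rejected", "duplicate", "stale", or "applied"; fields come pre-parsed from rawBody. */
    synchronized String handle(String rawBody, String signatureHex,
                               String eventId, String eventType, String paymentId) {
        // 1. Verify the signature first, using a constant-time comparison
        byte[] expected = hmacHex(secret, rawBody).getBytes(StandardCharsets.US_ASCII);
        if (!MessageDigest.isEqual(expected, signatureHex.getBytes(StandardCharsets.US_ASCII))) {
            return "rejected";
        }
        // 2. Deduplicate by event ID: the same event may be delivered several times
        if (!processedEventIds.add(eventId)) {
            return "duplicate";
        }
        // 3. Never move a payment backwards (e.g. a late charge.succeeded after refund.created)
        String last = lastEventByPayment.get(paymentId);
        if (last != null && rank(eventType) <= rank(last)) {
            return "stale";
        }
        lastEventByPayment.put(paymentId, eventType);
        return "applied";
    }
}
```

In production, recording the event ID and applying the state change belong in the same database transaction, so a crash between the two can neither drop nor double-apply an event.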

7. Ledger & Double-Entry Accounting

Every financial system requires an immutable audit trail. The double-entry ledger records every money movement as two equal and opposite entries, ensuring the books always balance.

-- Ledger schema (PostgreSQL; append-only, never updated or deleted)
CREATE TABLE ledger_entries (
    id           BIGSERIAL PRIMARY KEY,
    account_id   UUID NOT NULL,       -- customer, merchant, or system account
    txn_id       UUID NOT NULL,       -- links debit and credit entries
    entry_type   TEXT NOT NULL CHECK (entry_type IN ('debit', 'credit')),
    amount       BIGINT NOT NULL CHECK (amount > 0),  -- cents, never floats
    currency     CHAR(3) NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    metadata     JSONB
);

-- Example: customer pays merchant $49.99 (id and created_at take their defaults)
INSERT INTO ledger_entries (account_id, txn_id, entry_type, amount, currency) VALUES
  (:customer_acct, :txn_id, 'debit',  4999, 'USD'),  -- customer -$49.99
  (:merchant_acct, :txn_id, 'credit', 4999, 'USD');  -- merchant +$49.99
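The one invariant the schema cannot express by itself is that every txn_id balances. A minimal in-memory sketch in Java 17+ (hypothetical names) that enforces it at write time, rejecting any transaction whose debits and credits differ, and that exposes no update or delete operations:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Append-only, in-memory double-entry ledger; amounts in integer cents.
class Ledger {
    enum Type { DEBIT, CREDIT }

    record Entry(String accountId, String txnId, Type type, long amountCents, String currency) {}

    private final List<Entry> entries = new ArrayList<>();

    /** Appends every entry of one transaction, or none if the transaction does not balance. */
    void post(List<Entry> txnEntries) {
        if (txnEntries.size() < 2) {
            throw new IllegalArgumentException("a transaction needs at least one debit and one credit");
        }
        String txnId = txnEntries.get(0).txnId();
        String currency = txnEntries.get(0).currency();
        long debits = 0;
        long credits = 0;
        for (Entry e : txnEntries) {
            if (e.amountCents() <= 0) throw new IllegalArgumentException("amounts must be positive cents");
            if (!e.txnId().equals(txnId)) throw new IllegalArgumentException("mixed txn ids");
            if (!e.currency().equals(currency)) throw new IllegalArgumentException("mixed currencies");
            if (e.type() == Type.DEBIT) debits += e.amountCents();
            else credits += e.amountCents();
        }
        if (debits != credits) {
            throw new IllegalArgumentException("unbalanced: debits=" + debits + ", credits=" + credits);
        }
        entries.addAll(txnEntries); // validated first, so the append is all-or-nothing
    }

    /** Net position using this guide's sign convention: credits positive, debits negative. */
    long balance(String accountId) {
        long total = 0;
        for (Entry e : entries) {
            if (e.accountId().equals(accountId)) {
                total += e.type() == Type.CREDIT ? e.amountCents() : -e.amountCents();
            }
        }
        return total;
    }

    /** Read-only view: there are deliberately no update or delete operations. */
    List<Entry> entries() {
        return Collections.unmodifiableList(entries);
    }
}
```

Posting the $49.99 example leaves the customer at -4999 and the merchant at +4999, and the sum of all balances stays zero.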

Key Ledger Design Rules

8. Settlement & Reconciliation

PSPs batch-settle funds to merchant bank accounts on a schedule (T+1 or T+2). Settlement reconciliation ensures your ledger matches the PSP's settlement report — discrepancies indicate missing transactions, processing errors, or fraud.

Reconciliation Pipeline

  1. Download PSP settlement report (CSV/JSON via SFTP or API) at end of settlement period
  2. Parse and normalize PSP transaction IDs, amounts, fees, and statuses
  3. Match each PSP entry against your internal payment records by PSP transaction ID
  4. Identify discrepancies:
    • PSP charged but no internal record → potential ghost charge
    • Internal record but no PSP entry → payment may not have processed
    • Amount mismatch → fee calculation error or currency conversion issue
  5. Auto-resolve known patterns (e.g., authorization holds that expired)
  6. Flag unresolved discrepancies for manual finance team review
  7. Generate reconciliation report with match rate, total settled, and open items
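Steps 3 and 4 are the core of the pipeline. A minimal sketch in Java 17+, assuming both sides have already been normalized (step 2) into maps from PSP transaction ID to gross amount in cents; class and field names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Steps 3-4: match by PSP transaction ID and bucket the discrepancies.
class Reconciler {
    record Result(List<String> matched, List<String> pspOnly,
                  List<String> internalOnly, List<String> amountMismatch) {
        /** Share of PSP report rows that matched an internal record exactly. */
        double matchRate() {
            int pspRows = matched.size() + pspOnly.size() + amountMismatch.size();
            return pspRows == 0 ? 1.0 : (double) matched.size() / pspRows;
        }
    }

    /** Both maps: PSP transaction ID -> gross amount in cents (already normalized). */
    static Result reconcile(Map<String, Long> internal, Map<String, Long> pspReport) {
        List<String> matched = new ArrayList<>();
        List<String> pspOnly = new ArrayList<>();
        List<String> internalOnly = new ArrayList<>();
        List<String> amountMismatch = new ArrayList<>();
        for (Map.Entry<String, Long> row : pspReport.entrySet()) {
            Long ours = internal.get(row.getKey());
            if (ours == null) {
                pspOnly.add(row.getKey());        // PSP charged, no internal record: potential ghost charge
            } else if (!ours.equals(row.getValue())) {
                amountMismatch.add(row.getKey()); // fee calculation or currency conversion issue
            } else {
                matched.add(row.getKey());
            }
        }
        for (String id : internal.keySet()) {
            if (!pspReport.containsKey(id)) {
                internalOnly.add(id);             // internal record, no PSP entry: may not have processed
            }
        }
        // Sort for a stable, diff-friendly report (map iteration order is unspecified)
        Collections.sort(matched);
        Collections.sort(pspOnly);
        Collections.sort(internalOnly);
        Collections.sort(amountMismatch);
        return new Result(matched, pspOnly, internalOnly, amountMismatch);
    }
}
```

Auto-resolution rules (step 5) would then run over the three discrepancy lists before anything is escalated to the finance team.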

Handling PSP Fees

PSPs deduct processing fees that bundle card-network interchange (commonly around 1.5–3% plus a fixed per-transaction fee). Your ledger must record fees separately: the gross amount minus the PSP fee equals the net settlement. Store the fee amount per transaction to enable accurate revenue reporting.
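Using the $49.99 payment from the ledger example and an illustrative (assumed) rate of 2.9% plus 30 cents, the split works out as below. The fee is computed in integer basis points to respect the integer-cents rule; the half-up rounding is an assumption, since each PSP documents its own.

```java
// gross - fee = net, all in integer cents.
// Assumption: illustrative pricing of 2.9% + 30 cents, percentage part rounded half-up.
class SettlementMath {
    record Breakdown(long grossCents, long feeCents, long netCents) {}

    static Breakdown settle(long grossCents, long feeBasisPoints, long fixedFeeCents) {
        // 1 basis point = 0.01%, so 290 bp = 2.9%; adding 5_000 before dividing by 10_000 rounds half-up
        long percentFee = (grossCents * feeBasisPoints + 5_000) / 10_000;
        long feeCents = percentFee + fixedFeeCents;
        return new Breakdown(grossCents, feeCents, grossCents - feeCents);
    }
}
```

`settle(4999, 290, 30)` gives a 175-cent fee ($1.75) and a 4824-cent net settlement ($48.24); storing all three numbers per transaction lets reconciliation compare the PSP's net figure directly.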

9. PCI DSS Compliance Architecture

PCI DSS (Payment Card Industry Data Security Standard) mandates how cardholder data is stored, transmitted, and processed. The primary strategy is to avoid storing cardholder data at all.

Tokenization Strategy
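With tokenization, card details are collected by the PSP's hosted fields or client SDK, and your backend only ever receives a token such as pm_visa_xxx. One supporting safeguard, offered here as an illustrative assumption rather than a PCI requirement, is to redact anything that looks like a real card number (a Luhn-valid run of 13 to 19 digits) before a payload is logged or stored. A sketch in Java 17+; the class name and regex are hypothetical choices:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Safety net against raw PANs leaking into logs or storage (hypothetical helper).
class PanGuard {
    // 13-19 digits, optionally separated by single spaces or dashes
    private static final Pattern CANDIDATE = Pattern.compile("\\b(?:\\d[ -]?){12,18}\\d\\b");

    // Luhn checksum: double every second digit from the right, subtract 9 if > 9, sum mod 10
    static boolean luhnValid(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    /** Replaces every Luhn-valid candidate with a marker; other digit runs are kept as-is. */
    static String redact(String text) {
        Matcher m = CANDIDATE.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String digits = m.group().replaceAll("[ -]", "");
            String replacement = luhnValid(digits) ? "[REDACTED_PAN]" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```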

10. Scaling to Millions of Transactions

Database Scaling

Fraud Detection at Scale

Real-time fraud scoring must add minimal latency (<100ms) to the payment flow.
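One way to honor that budget without letting the scorer block payments is to race the scoring call against a deadline. A sketch in Java 17+ using `CompletableFuture.completeOnTimeout`; the 0 to 1 score scale, the thresholds, and the policy of letting a payment proceed under review when no score arrives in time are illustrative assumptions:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical fraud gate: score within a latency budget, never block on a slow model.
class FraudGate {
    // REVIEW = let the payment proceed but queue it for asynchronous review (assumed policy)
    enum Decision { APPROVE, REVIEW, DECLINE }

    static final double DECLINE_THRESHOLD = 0.9; // illustrative thresholds on a 0..1 risk score
    static final double REVIEW_THRESHOLD = 0.5;

    static Decision decide(Supplier<Double> scorer, long budgetMs, Executor executor) {
        double score = CompletableFuture.supplyAsync(scorer, executor)
                // If scoring misses the budget, complete with NaN instead of waiting
                .completeOnTimeout(Double.NaN, budgetMs, TimeUnit.MILLISECONDS)
                // A scorer failure is treated the same as a timeout
                .exceptionally(ex -> Double.NaN)
                .join();
        if (Double.isNaN(score)) return Decision.REVIEW;
        if (score >= DECLINE_THRESHOLD) return Decision.DECLINE;
        if (score >= REVIEW_THRESHOLD) return Decision.REVIEW;
        return Decision.APPROVE;
    }
}
```

Running the scorer on a dedicated executor keeps a slow model from starving the threads that process payments.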

11. Design Checklist & Conclusion

Payment systems reward defensive engineering. Every shortcut in reliability manifests as a chargeback, a regulatory audit, or an angry customer. Before going live, validate:

Payment System Production Checklist

  • ☐ Every write endpoint implements idempotency key checking with Redis
  • ☐ Payment state machine enforces valid transitions (e.g., an already-succeeded payment can never be charged again)
  • ☐ Outbox pattern used for all Kafka event publishing (no lost events)
  • ☐ PSP calls have timeout, retry, and circuit breaker logic
  • ☐ Webhook handlers verify signatures and process idempotently
  • ☐ Ledger is append-only and uses integer cents for all amounts
  • ☐ Daily reconciliation pipeline runs and alerts on unmatched transactions
  • ☐ No raw card data (PAN, CVV) stored anywhere in your system
  • ☐ Fraud scoring runs in <100ms and doesn't block payment processing
  • ☐ Load-tested at 2× peak TPS, including idempotency behavior under a concurrent retry storm

A production payment system is one of the most complex distributed systems you'll build — not because of algorithmic complexity, but because the failure modes are financial and legal. Start with a single PSP integration and add resilience layers incrementally. Use managed services (Stripe, Adyen) for PSP integration to avoid reinventing payment rails. Focus your engineering effort on idempotency, the ledger, reconciliation, and fraud — these are the layers that differentiate reliable payment infrastructure.

Md Sanwar Hossain

Software Engineer · Java · Spring Boot · Microservices · System Design

Last updated: April 6, 2026