Idempotency Patterns in Distributed Systems: Building Exactly-Once Processing
In distributed systems, network failures are not exceptional — they are the normal operating condition. Every retry is a potential duplicate. Idempotency patterns — idempotency keys, database deduplication, the Outbox Pattern, and Kafka exactly-once semantics — are the engineering discipline that converts "at-least-once" delivery into "exactly-once" behavior, protecting users from double charges, duplicate orders, and phantom records.
Part of the Distributed Systems Failure Handling Series.
Introduction
Distributed systems communicate over unreliable networks. Packets are lost, TCP connections time out mid-flight, load balancers reset connections, and pods are evicted mid-request. In every one of these scenarios, the caller faces an impossible question: did the operation succeed before the connection dropped, or did it fail? Without idempotency, the only safe answer is to retry — and without idempotency on the server side, that retry creates a duplicate.
Idempotency is not a feature you can add as an afterthought. It must be designed into every state-mutating operation from the start. The cost of ignoring it shows up in production at the worst possible moment: during a payment surge, during a network partition, during a deployment rollout when pods are being gracefully terminated mid-request. This post covers four concrete patterns — each addressing a different layer of the stack — that together form a complete exactly-once processing strategy for production microservices.
The Real-World Problem: Double Charging in Production
At 9:47 PM on a Friday, your payment service processes a $349.99 subscription renewal for a customer. The payment service calls the upstream payment gateway, which successfully charges the card and prepares a 200 OK response. At exactly that moment, a network switch briefly saturates. The TCP ACK carrying the 200 OK is dropped. The payment service's HTTP client times out after 5 seconds waiting for a response.
From the payment service's perspective, the request timed out — outcome unknown. The retry policy, correctly configured to handle transient failures, fires automatically. The same payment request is sent again to the gateway. The gateway has no record of seeing this request before. It charges the card a second time. Returns 200 OK. This time the response arrives successfully. The payment service records a successful payment and returns to the caller.
The customer now has two charges of $349.99 on their credit card statement. Support tickets start arriving. The customer files a chargeback. Your fraud detection system flags the account. Engineering gets paged at 10:15 PM for an "incident" that was not caused by any code bug — it was caused by the absence of idempotency design. This scenario is not hypothetical. Stripe, PayPal, and every major payment processor have detailed documentation on idempotency keys precisely because this failure mode is universal and inevitable at scale.
The fundamental problem: in a distributed system, "request timed out" does not mean "request failed." It means "I don't know." Any retry logic applied to non-idempotent operations in the presence of this uncertainty is a latent duplicate-processing bug.
What Is Idempotency?
An operation is idempotent if executing it multiple times produces the same result as executing it once. HTTP GET is naturally idempotent — reading a resource does not change it. HTTP PUT is designed to be idempotent — setting a resource to a specific value is the same whether done once or ten times. HTTP POST is not naturally idempotent — creating a resource multiple times creates multiple resources.
In distributed systems, idempotency becomes a property you must explicitly engineer into state-mutating operations. The central challenge arises from at-least-once delivery: every reliable messaging system and retry-enabled HTTP client guarantees that a message or request will be delivered at least once, but not exactly once. Network failures, timeouts, and partial failures all require retries to achieve reliability. Retries require idempotency to avoid duplicates. This is the fundamental tension: reliability requires retries, and retries require idempotency.
- At-least-once semantics: the operation might execute one or more times. Simple to implement, safe for idempotent operations, dangerous for non-idempotent ones.
- At-most-once semantics: the operation executes zero or one time. Simple to implement (fire and forget, no retries), but loses messages on failure.
- Exactly-once semantics: the operation executes precisely one time despite retries and failures. Expensive to implement — it requires coordination between sender and receiver — but it is the correct model for payment processing, inventory reservation, and any operation with financial or inventory consequences.
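The tension can be shown in a few lines: retrying a non-idempotent operation duplicates state, while a server-side deduplication set restores the exactly-once effect. This is a minimal illustrative sketch (class and method names are mine, not a real API):

```java
import java.util.HashSet;
import java.util.Set;

public class RetryDemo {
    static int balance = 0;
    static final Set<String> seen = new HashSet<>();

    // Non-idempotent: every delivery mutates state
    static void chargeNaive(String requestId, int amount) {
        balance += amount;
    }

    // Idempotent: duplicate deliveries of the same requestId are no-ops
    static void chargeIdempotent(String requestId, int amount) {
        if (!seen.add(requestId)) return; // already processed — skip
        balance += amount;
    }

    public static void main(String[] args) {
        // Simulate at-least-once delivery: the same request arrives twice
        chargeNaive("req-1", 100);
        chargeNaive("req-1", 100);
        System.out.println(balance); // 200 — the double charge

        balance = 0;
        chargeIdempotent("req-2", 100);
        chargeIdempotent("req-2", 100);
        System.out.println(balance); // 100 — exactly-once effect
    }
}
```

Every pattern in this post is, at its core, a production-grade version of that `seen` set: durable, concurrent-safe, and bounded in size.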
Pattern 1: Idempotency Keys
The idempotency key pattern is the most widely applicable solution. The client generates a unique key (typically a UUID v4) and includes it in every request — usually as an HTTP header (Idempotency-Key: <uuid>). The server uses this key to deduplicate requests: if a request with a given key has already been processed successfully, the server returns the previously computed response without re-executing the operation.
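On the client side, the crucial discipline is generating the key once per logical operation and reusing it verbatim on every retry. A sketch using the JDK HTTP client (the endpoint URL is hypothetical):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

public class IdempotentClient {

    // Build a payment request carrying an Idempotency-Key header.
    // The key is generated ONCE here; retries must resend this same
    // request object (or at least the same key), never a fresh UUID.
    public static HttpRequest buildPaymentRequest(String jsonBody) {
        String idempotencyKey = UUID.randomUUID().toString();
        return HttpRequest.newBuilder()
                .uri(URI.create("https://payments.example.com/api/v1/payments")) // hypothetical
                .header("Idempotency-Key", idempotencyKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

A common client-side bug is regenerating the UUID inside the retry loop — that defeats the entire pattern, because the server sees every retry as a new operation.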
The server-side implementation uses a fast key-value store — Redis is the standard choice — to store the result of each processed request keyed by its idempotency key, with a TTL that covers the client's maximum retry window (typically 24 hours for payment operations).
@Service
@RequiredArgsConstructor
@Slf4j
public class IdempotentPaymentService {
private final RedissonClient redisson;
private final PaymentRepository paymentRepository;
private static final Duration IDEMPOTENCY_TTL = Duration.ofHours(24);
public PaymentResponse processPayment(String idempotencyKey, PaymentRequest request) {
String redisKey = "idempotency:payment:" + idempotencyKey;
RBucket<PaymentResponse> bucket = redisson.getBucket(redisKey);
// Check cache first — return stored result if key already seen
PaymentResponse cached = bucket.get();
if (cached != null) {
log.info("Returning cached result for idempotency key: {}", idempotencyKey);
return cached;
}
// Use SET NX (set if not exists) to claim processing rights atomically
// This prevents two concurrent requests with the same key from both processing
RBucket<String> lockBucket = redisson.getBucket("idempotency:lock:" + idempotencyKey);
boolean claimed = lockBucket.setIfAbsent("processing", Duration.ofSeconds(30));
if (!claimed) {
// Another thread is currently processing this key — wait and retry cache
throw new IdempotencyConflictException(
"Request with key " + idempotencyKey + " is currently being processed");
}
try {
PaymentResponse result = executePayment(request);
// Store result before releasing lock — TTL covers client retry window
bucket.set(result, IDEMPOTENCY_TTL);
return result;
} catch (Exception ex) {
// Do NOT cache failures — allow client to retry with same key
log.error("Payment failed for idempotency key {}: {}", idempotencyKey, ex.getMessage());
throw ex;
} finally {
lockBucket.delete();
}
}
private PaymentResponse executePayment(PaymentRequest request) {
// Actual payment gateway call
return paymentRepository.charge(request);
}
}
The SET NX (set if not exists) operation on the lock key is critical for race condition handling. Without it, two concurrent requests with the same idempotency key — which can happen when a client retries very quickly — could both pass the cache check simultaneously, both execute the payment, and both store their results. The atomic SET NX ensures exactly one request wins the right to process, while the other receives a conflict response and can retry the cache check momentarily. The 30-second lock TTL prevents deadlock if the processing pod crashes before deleting the lock.
One important nuance: failed operations should not be cached. If the payment gateway returns an error, the client should be able to retry with the same idempotency key and have the operation re-attempted. Only successful results should be stored. This matches Stripe's documented behavior: a key can be retried if the original request failed, but not if it succeeded.
Pattern 2: Database Deduplication
When Redis is not available, or when you need durability guarantees that survive a Redis restart, database-level deduplication using a unique constraint on the idempotency key column provides a simpler and more durable alternative.
-- Payments table with idempotency enforcement at the database level
CREATE TABLE payments (
id BIGSERIAL PRIMARY KEY,
idempotency_key VARCHAR(64) NOT NULL,
order_id BIGINT NOT NULL,
amount NUMERIC(12, 2) NOT NULL,
currency CHAR(3) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
gateway_txn_id VARCHAR(128),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT uq_payments_idempotency_key UNIQUE (idempotency_key)
);
// Repository with deduplication via ON CONFLICT DO NOTHING
@Repository
public interface PaymentRepository extends JpaRepository<Payment, Long> {
@Modifying
@Query(value = """
INSERT INTO payments
(idempotency_key, order_id, amount, currency, status)
VALUES
(:idempotencyKey, :orderId, :amount, :currency, 'PENDING')
ON CONFLICT (idempotency_key) DO NOTHING
""", nativeQuery = true)
int insertIgnoreDuplicate(
@Param("idempotencyKey") String idempotencyKey,
@Param("orderId") Long orderId,
@Param("amount") BigDecimal amount,
@Param("currency") String currency
);
Optional<Payment> findByIdempotencyKey(String idempotencyKey);
@Modifying
@Query(value = """
UPDATE payments SET status = :status, gateway_txn_id = :txnId
WHERE idempotency_key = :idempotencyKey
""", nativeQuery = true)
int updateStatusAndGatewayTxn(
@Param("idempotencyKey") String idempotencyKey,
@Param("status") String status,
@Param("txnId") String txnId);
}
@Transactional
public PaymentResponse processPayment(String idempotencyKey, PaymentRequest request) {
int inserted = paymentRepository.insertIgnoreDuplicate(
idempotencyKey, request.getOrderId(),
request.getAmount(), request.getCurrency());
if (inserted == 0) {
// Duplicate — return the previously recorded result
return paymentRepository.findByIdempotencyKey(idempotencyKey)
.map(PaymentResponse::from)
.orElseThrow(() -> new IllegalStateException(
"Idempotency key conflict but record not found: " + idempotencyKey));
}
// First time seeing this key — proceed with payment gateway call
PaymentResponse response = gatewayClient.charge(request);
paymentRepository.updateStatusAndGatewayTxn(
idempotencyKey, response.getStatus(), response.getTransactionId());
return response;
}
The ON CONFLICT DO NOTHING clause (PostgreSQL) or INSERT IGNORE (MySQL) makes the insert a no-op if a row with the same idempotency_key already exists. The return value (rows affected) tells you whether this is a first execution or a duplicate. The key advantage over Redis is durability — the deduplication record survives pod restarts, Redis evictions, and cache flushes. The key disadvantage is latency: every idempotency check requires a synchronous database round-trip, and the idempotency table can become a hotspot under high write throughput. For most services, this trade-off favors the database approach for financial operations, where durability outweighs the latency cost.
Pattern 3: The Outbox Pattern for Events
When a service must both update its database and publish an event to Kafka atomically — without the risk of publishing an event for a transaction that later rolled back, or failing to publish an event for a transaction that committed — the Outbox Pattern is the standard solution. It exploits the fact that writing to two tables in the same database transaction is atomic; publishing to Kafka in the same transaction is not.
-- Outbox table: events staged for publication within the same DB transaction
CREATE TABLE outbox_events (
id BIGSERIAL PRIMARY KEY,
aggregate_id VARCHAR(64) NOT NULL,
aggregate_type VARCHAR(64) NOT NULL,
event_type VARCHAR(128) NOT NULL,
payload JSONB NOT NULL,
idempotency_key VARCHAR(64) NOT NULL,
published BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
published_at TIMESTAMPTZ,
CONSTRAINT uq_outbox_idempotency_key UNIQUE (idempotency_key)
);
-- Consumer deduplication table: tracks processed event IDs
CREATE TABLE processed_events (
event_id VARCHAR(64) PRIMARY KEY,
processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
@Transactional
public void createOrder(CreateOrderCommand command) {
// Step 1: Write business state — order record
Order order = orderRepository.save(Order.from(command));
// Step 2: Write outbox event — in the SAME transaction as Step 1
OutboxEvent event = OutboxEvent.builder()
.aggregateId(order.getId().toString())
.aggregateType("Order")
.eventType("OrderCreated")
.idempotencyKey(command.getIdempotencyKey())
.payload(objectMapper.valueToTree(OrderCreatedEvent.from(order)))
.build();
outboxRepository.save(event);
// Transaction commits here — both order and outbox event are persisted atomically.
// If the transaction rolls back (e.g., constraint violation), neither is persisted.
// No Kafka call happens inside this transaction.
}
A separate outbox publisher — either a polling thread or a Change Data Capture (CDC) connector — reads unpublished rows from the outbox table and publishes them to Kafka. Debezium is the standard CDC tool: it streams the PostgreSQL WAL (Write-Ahead Log) and publishes row-level changes to Kafka topics without any polling overhead.
# Debezium PostgreSQL connector configuration
{
"name": "outbox-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "postgres-primary",
"database.port": "5432",
"database.user": "debezium",
"database.password": "${DEBEZIUM_PASSWORD}",
"database.dbname": "orders_db",
"table.include.list": "public.outbox_events",
"transforms": "outbox",
"transforms.outbox.type":
"io.debezium.transforms.outbox.EventRouter",
"transforms.outbox.table.field.event.id": "idempotency_key",
"transforms.outbox.route.by.field": "aggregate_type",
"transforms.outbox.table.field.event.payload": "payload",
"slot.name": "outbox_debezium_slot"
}
}
On the consumer side, idempotency is enforced using the processed_events table. Before processing any event, the consumer checks whether the event's idempotency key already exists. If it does, the event is acknowledged and skipped. If it does not, the consumer processes the event and inserts the key — in the same database transaction as the business logic update — ensuring that a consumer crash after processing but before acknowledging the Kafka offset cannot cause double processing on restart.
@KafkaListener(topics = "order-events", groupId = "inventory-service")
@Transactional
public void handleOrderEvent(OrderCreatedEvent event) {
String eventId = event.getIdempotencyKey();
// Check deduplication table within the same transaction
if (processedEventRepository.existsById(eventId)) {
log.info("Skipping already-processed event: {}", eventId);
return;
}
// Process business logic
inventoryService.reserveItems(event.getOrderId(), event.getItems());
// Record processed event in same transaction — atomically with business logic
processedEventRepository.save(new ProcessedEvent(eventId));
}
Pattern 4: Kafka Exactly-Once Semantics
Kafka's exactly-once semantics (EOS) is a producer-level and consumer-level feature that prevents both duplicate messages and message loss at the Kafka broker layer, independently of application-level deduplication. It is particularly valuable for stream processing pipelines built with Kafka Streams or scenarios where the Outbox Pattern is not feasible.
EOS in Kafka consists of three cooperating features: idempotent producers, transactional producers, and read_committed consumer isolation. Producer idempotence prevents duplicate messages caused by producer retries. When the broker persists a message but the ACK is lost in transit, the producer retries and sends a duplicate. With idempotent producers, the broker tracks sequence numbers per producer session and discards duplicate batches.
// Kafka producer configuration for exactly-once semantics
@Bean
public ProducerFactory<String, Object> exactlyOnceProducerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
// Enable producer idempotence — prevents duplicate messages on retry
config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
// Required for idempotence: acks=all ensures broker-side durability
config.put(ProducerConfig.ACKS_CONFIG, "all");
// With idempotence enabled, ordering is preserved with up to 5 in-flight
// requests per connection — forcing this down to 1 is unnecessary and hurts throughput
config.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
DefaultKafkaProducerFactory<String, Object> factory =
new DefaultKafkaProducerFactory<>(config);
// Spring Kafka enables transactions when a transactional ID prefix is set on the
// factory; each producer instance derives a unique transactional.id from this prefix
factory.setTransactionIdPrefix("payment-service-tx-");
return factory;
}
@Bean
public KafkaTemplate<String, Object> exactlyOnceKafkaTemplate() {
return new KafkaTemplate<>(exactlyOnceProducerFactory());
}
// Transactional publish — either all messages in the transaction commit or none do
@Autowired
private KafkaTemplate<String, Object> kafkaTemplate;
public void publishPaymentEvent(PaymentCompletedEvent event) {
kafkaTemplate.executeInTransaction(ops -> {
ops.send("payment-completed", event.getOrderId(), event);
ops.send("audit-log", event.getOrderId(), AuditEvent.from(event));
return true;
// Both messages commit atomically, or neither does
});
}
On the consumer side, read_committed isolation ensures consumers only see messages from committed producer transactions — messages from aborted transactions are never visible.
@Bean
public ConsumerFactory<String, Object> exactlyOnceConsumerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
config.put(ConsumerConfig.GROUP_ID_CONFIG, "payment-processor-group");
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, JsonDeserializer.class);
// Only read messages from committed transactions
config.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
// Disable auto-commit — manage offsets within Kafka transactions
config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
return new DefaultKafkaConsumerFactory<>(config);
}
The trade-off is real: Kafka EOS adds approximately 5–15% throughput overhead due to the transaction coordinator round-trips and acks=all requirement. For high-throughput pipelines (millions of messages/second), this cost is significant. Use EOS for financial streams, audit logs, and inventory systems where exactly-once processing has direct business value. For analytics pipelines where occasional duplicates are acceptable, at-least-once with idempotent consumers is a better cost/correctness trade-off.
Implementing Idempotency in Spring Boot with AOP
Rather than duplicating idempotency check logic in every service method, an AOP-based interceptor driven by a custom @Idempotent annotation keeps the implementation DRY and enforces a consistent pattern across all idempotent endpoints.
// Custom annotation to mark idempotent methods
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface Idempotent {
/** Header name from which to extract the idempotency key */
String keyHeader() default "Idempotency-Key";
/** Redis TTL for the cached response */
long ttlHours() default 24;
}
// AOP advice that intercepts @Idempotent methods
@Aspect
@Component
@RequiredArgsConstructor
@Slf4j
public class IdempotencyAspect {
private final RedissonClient redisson;
private final HttpServletRequest httpRequest;
@Around("@annotation(idempotent)")
public Object enforceIdempotency(ProceedingJoinPoint joinPoint,
Idempotent idempotent) throws Throwable {
String idempotencyKey = httpRequest.getHeader(idempotent.keyHeader());
if (idempotencyKey == null || idempotencyKey.isBlank()) {
throw new MissingIdempotencyKeyException(
"Header '" + idempotent.keyHeader() + "' is required for this operation");
}
String redisKey = buildRedisKey(joinPoint, idempotencyKey);
RBucket<Object> bucket = redisson.getBucket(redisKey);
// Return cached result if present
Object cached = bucket.get();
if (cached != null) {
log.debug("Idempotent cache hit for key: {}", idempotencyKey);
return cached;
}
// Acquire processing lock with SET NX
RBucket<String> lock = redisson.getBucket(redisKey + ":lock");
boolean acquired = lock.setIfAbsent("1", Duration.ofSeconds(30));
if (!acquired) {
throw new IdempotencyConflictException(
"Concurrent request with key: " + idempotencyKey);
}
try {
Object result = joinPoint.proceed();
// Cache only on success
bucket.set(result, Duration.ofHours(idempotent.ttlHours()));
return result;
} finally {
lock.delete();
}
}
private String buildRedisKey(ProceedingJoinPoint jp, String idempotencyKey) {
String className = jp.getTarget().getClass().getSimpleName();
String methodName = jp.getSignature().getName();
return String.format("idempotency:%s:%s:%s", className, methodName, idempotencyKey);
}
}
// Usage: annotate any controller method requiring idempotency
@RestController
@RequestMapping("/api/v1/payments")
@RequiredArgsConstructor
public class PaymentController {
private final PaymentService paymentService;
@PostMapping
@Idempotent(keyHeader = "Idempotency-Key", ttlHours = 24)
public ResponseEntity<PaymentResponse> processPayment(
@RequestBody @Valid PaymentRequest request) {
PaymentResponse response = paymentService.charge(request);
return ResponseEntity.status(HttpStatus.CREATED).body(response);
}
}
The AOP approach means idempotency is enforced declaratively. Any developer adding a new payment or reservation endpoint simply annotates it with @Idempotent — there is no risk of forgetting to add the Redis check manually. The aspect handles key extraction, cache lookup, lock acquisition, result caching, and lock release uniformly. Integration tests can verify the AOP interceptor independently of any specific service implementation.
Architecture Overview
The following describes the complete request flow combining all four patterns into a cohesive exactly-once architecture:
Client
│
│ POST /api/v1/payments
│ Headers: Idempotency-Key: a1b2c3d4-...
▼
API Gateway
│ Validates Idempotency-Key header presence
│ Rate limiting per key (prevents abuse)
▼
Payment Service (Spring Boot)
│
├─► Redis Cache Check (Redisson SET NX)
│ Hit? ──► Return cached response immediately
│ Miss? ──► Acquire lock, proceed
│
├─► Database Transaction (PostgreSQL)
│ ├─ INSERT payment record (ON CONFLICT DO NOTHING)
│ └─ INSERT outbox_events row (same transaction)
│ Commit ──► Both writes durable, atomic
│
├─► Store result in Redis (TTL 24h)
│
▼
Debezium CDC (reads PostgreSQL WAL)
│ Streams outbox_events INSERT to Kafka
▼
Kafka Topic: payment-events
(producer.enable.idempotence=true, acks=all)
│
▼
Downstream Consumer (Notification Service)
│ isolation.level=read_committed
├─► Check processed_events table
│ Already processed? ──► Skip, ack offset
│ New event? ──► Process + INSERT processed_events (same TX)
└─► Commit Kafka offset
Each layer provides a different guarantee. Redis provides fast deduplication for the common case (retry within 24 hours, Redis available). The database unique constraint provides durability for the idempotency record. The Outbox Pattern decouples the database write from the Kafka publish atomically. Debezium CDC ensures no outbox event is missed even on service restart. The consumer's processed_events table handles the rare case of Kafka consumer rebalance or offset reset causing message redelivery.
Failure Scenarios
Idempotency key collision: Two different clients independently generate the same UUID (astronomically unlikely with UUID v4, but theoretically possible). The second client's request receives the first client's cached response — silently wrong. Mitigate by scoping keys to a user ID prefix: {user_id}:{uuid}. This makes cross-client collision impossible.
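The user-scoped key format described above is trivial to implement server-side, before the Redis or database lookup. A sketch (helper name is mine, not from any library):

```java
public class ScopedKey {

    // Prefix the client-supplied key with the authenticated user's ID so that
    // two different clients generating the same UUID can never collide —
    // collisions are now only possible within a single user's own requests.
    public static String scope(long userId, String clientKey) {
        return userId + ":" + clientKey;
    }
}
```

The important detail is that the user ID must come from the authenticated session, never from the request body — otherwise a malicious client could probe other users' cached responses.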
Redis failure during processing: If Redis goes down after the lock is acquired but before the result is cached, the result is never stored — subsequent retries miss the cache and the system falls back to database-level deduplication transparently. The reverse is more dangerous: if Redis returns a stale cache hit for a key whose database record was deleted (e.g., by a data-corrective action), the service returns a cached response for an operation that no longer exists in the database. For high-stakes operations — anywhere the cached result drives a debit or other irreversible action — add a secondary database lookup before trusting the cached response.
TTL expiry before client retry: If a client retries with a valid idempotency key after the 24-hour TTL has expired, Redis returns a cache miss and the operation executes again — a duplicate. For payment operations, set the TTL to match your business SLA (typically 7 days for payment reconciliation windows). For short-lived operations, 24 hours is usually sufficient.
Partial writes in the Outbox: If the service crashes after the database transaction commits (writing both the payment record and the outbox event) but before Debezium reads the outbox row, the row remains with published=FALSE indefinitely. Debezium's polling or WAL streaming will pick it up on next startup. If Debezium's CDC slot falls too far behind (WAL lag), PostgreSQL will refuse to reclaim WAL disk space. Monitor pg_replication_slots lag and set wal_keep_size appropriately.
Trade-offs
Storage cost: Every successful idempotent operation stores a result in Redis and/or a row in the idempotency table. At 1,000 payments/second with 24-hour TTL, Redis holds ~86 million keys. Redis memory usage depends on the result object size — if you cache the full response, serialize it carefully. Consider caching only a status code and minimal metadata rather than the full response body.
Latency overhead: Every request with idempotency checking incurs at least one additional network round-trip (Redis or database). For Redis at sub-millisecond latency on the same cluster, this is typically <1ms. For database deduplication, it adds a synchronous DB query on the critical path. Measure P99 impact on your specific deployment before committing to a deduplication strategy for your highest-throughput endpoints.
Complexity: The Outbox Pattern adds three new infrastructure components (Debezium, Kafka connector, CDC slot) and two new tables to every service that publishes events. This is significant operational overhead. Evaluate whether your correctness requirements justify this complexity — for internal event notifications where occasional duplicates are tolerable, a simpler at-least-once approach with idempotent consumers may be sufficient.
TTL choices: A short TTL (1 hour) reduces storage and Redis memory pressure but creates a window where a slow client retry (caused by a long network outage) misses the cache and causes a duplicate. A long TTL (7 days) protects against slow retries but consumes proportionally more memory. Base your TTL on the maximum observed retry window in your client timeout policies, with a safety factor of 10×.
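The TTL rule of thumb above reduces to a one-line calculation. A sketch (helper and names are illustrative; the 10× factor is the safety factor suggested in the text):

```java
import java.time.Duration;

public class TtlPolicy {

    // TTL = maximum observed client retry window × safety factor.
    // E.g., clients retrying for up to 12 hours with a 10x factor -> 120h (5 days).
    public static Duration idempotencyTtl(Duration maxRetryWindow, int safetyFactor) {
        return maxRetryWindow.multipliedBy(safetyFactor);
    }
}
```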
When NOT to Use Idempotency Patterns
Read-only operations: HTTP GET, database SELECT, cache reads — these do not mutate state and are naturally idempotent. Adding idempotency key infrastructure to read operations is pure overhead with no benefit. Any operation that only reads data can be safely retried without any deduplication mechanism.
Truly stateless operations: Pure computation endpoints that take input and return output without writing to any external system — hash computation, format conversion, validation — do not need idempotency keys. Every call is inherently idempotent because there is no state to duplicate.
Operations with natural deduplication: If your business logic already enforces uniqueness — for example, a "set user email" operation that uses an upsert (INSERT ON CONFLICT DO UPDATE), or a state machine that rejects transitions already in the target state — the operation is already idempotent by design. Adding an explicit idempotency key layer would be redundant.
Fire-and-forget analytics events: For high-volume analytics event ingestion where occasional duplicates are acceptable (and de-duplicated downstream in the analytics warehouse), the overhead of idempotency enforcement outweighs the cost of handling occasional duplicates in the reporting layer. Use at-least-once delivery and handle duplicates analytically with COUNT(DISTINCT event_id) rather than preventing them at ingestion.
Key Takeaways
- Timeouts do not mean failure: In distributed systems, a timed-out request may have succeeded on the server. Any retry of a non-idempotent operation without an idempotency key is a latent duplicate-processing bug. Design idempotency in from the start.
- Use UUID v4 idempotency keys scoped to the user: Clients generate keys, servers store results. Scope keys as {user_id}:{uuid} to eliminate cross-client collision risk. Include the key in all retry attempts for the same logical operation.
- Redis SET NX handles race conditions: The atomic SET NX (set if not exists) on the lock key prevents two concurrent requests with the same idempotency key from both processing. Always set a lock TTL to prevent deadlock on pod crash.
- The Outbox Pattern is the correct solution for event publishing: Writing to the DB and publishing to Kafka in the same "transaction" without the Outbox Pattern leads to either dual-write failures (event published but DB rolled back) or missed events (DB committed but service crashed before publishing). The Outbox Pattern atomically stages events for CDC publication.
- Kafka EOS has a throughput cost: Exactly-once semantics in Kafka adds ~5–15% overhead. Use it for financial and inventory streams where exactly-once has direct business value. Use at-least-once with idempotent consumers for analytics and audit trails where occasional duplicates are tolerable.
- AOP + @Idempotent keeps the implementation DRY: Centralise Redis check, lock acquisition, result caching, and lock release in a single AOP aspect. Declarative annotation usage prevents developers from accidentally omitting idempotency checks on new endpoints.
- TTL must exceed your maximum retry window: If your HTTP client retries for up to 24 hours, your idempotency TTL must be longer. Base TTL on business reconciliation windows (typically 7 days for payments) rather than technical retry intervals.