Event-Driven Architecture: Design, Patterns, and Production Best Practices

[Figure: event streams flowing between producer and consumer services]

Event-driven architecture decouples producers and consumers, enables independent scaling, and makes business workflows observable through their event streams. But EDA introduces significant complexity around ordering, idempotency, and distributed consistency that must be designed for explicitly. This guide covers the patterns that make EDA work in production.

What Makes an Architecture "Event-Driven"

In an event-driven architecture, services communicate by producing and consuming events — immutable records of something that has already happened. "OrderPlaced", "PaymentProcessed", "ItemShipped" are events. An event is a statement of fact, in the past tense, with a complete description of what occurred. Unlike an RPC call (which says "do this now"), an event says "this happened" — and any interested consumer can react to it independently.

This distinction has profound architectural implications. The order service that publishes "OrderPlaced" does not know which other services will consume it. Today it might be the inventory service and the notification service. Next quarter, the fraud detection service might subscribe. Adding a new consumer does not require changing the order service. This is the fundamental decoupling that makes event-driven systems extensible — new capabilities can be added to the system without modifying the components that produce the events.

Domain Events: The Heart of EDA

Domain events represent significant occurrences within a bounded context. They are the primary integration mechanism between services in a Domain-Driven Design system. Designing domain events well — choosing the right level of granularity, including the right payload, and versioning them for backward compatibility — is as important as designing API contracts.

// Well-designed domain event — immutable, timestamped, versioned
public record OrderPlacedEvent(
    @JsonProperty("event_id")     String eventId,        // unique event identifier
    @JsonProperty("event_type")   String eventType,      // discriminator for consumers
    @JsonProperty("event_version") int eventVersion,     // schema version
    @JsonProperty("occurred_at")  Instant occurredAt,    // when it happened
    @JsonProperty("order_id")     String orderId,        // aggregate ID
    @JsonProperty("customer_id")  String customerId,
    @JsonProperty("items")        List<OrderItemDto> items,
    @JsonProperty("total_amount") BigDecimal totalAmount,
    @JsonProperty("currency")     String currency
) {
    public static OrderPlacedEvent from(Order order) {
        return new OrderPlacedEvent(
            UUID.randomUUID().toString(),
            "ORDER_PLACED",
            1,
            Instant.now(),
            order.getId(),
            order.getCustomerId(),
            order.getItems().stream().map(OrderItemDto::from).toList(),
            order.getTotalAmount(),
            order.getCurrency()
        );
    }
}

Event Sourcing: The Event Log as the Source of Truth

In event sourcing, instead of storing the current state of an entity in a database row, you store the sequence of events that led to the current state. The current state is derived by replaying all events from the beginning (or from a snapshot). This approach has powerful properties: complete audit trail (every state change is recorded); temporal queries (reconstruct state at any point in time); and debugging by replaying events to reproduce bugs.

Event sourcing is not appropriate for every entity. Use it for entities where auditability and temporal queries are requirements, and where the event volume per entity is manageable. Financial accounts, inventory items, and document revision history are natural fits. User profile data with frequent updates is a poor fit — the event log grows without bound and replay becomes expensive.
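A minimal sketch of the replay idea, using a hypothetical bank account (the class and event names here are illustrative, not from any specific event-sourcing framework): state is never stored directly, only derived by folding over the log.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical event-sourced account: current state is derived, not stored
sealed interface AccountEvent permits Deposited, Withdrawn {}
record Deposited(BigDecimal amount) implements AccountEvent {}
record Withdrawn(BigDecimal amount) implements AccountEvent {}

class Account {
    private BigDecimal balance = BigDecimal.ZERO;

    // Apply one event to the in-memory state
    void apply(AccountEvent event) {
        if (event instanceof Deposited d) balance = balance.add(d.amount());
        else if (event instanceof Withdrawn w) balance = balance.subtract(w.amount());
    }

    // Replay the full log (or the events since a snapshot) to rebuild state
    static Account replay(List<AccountEvent> log) {
        Account account = new Account();
        log.forEach(account::apply);
        return account;
    }

    BigDecimal balance() { return balance; }
}
```

Snapshots fit naturally into this shape: persist the folded state every N events, then replay only the events recorded after the snapshot.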

CQRS: Separating Reads from Writes

Command Query Responsibility Segregation (CQRS) separates the write model (commands that change state, producing events) from the read model (queries that return data, consuming events). The write model is optimized for transactional correctness; the read model is optimized for query performance, potentially using denormalized projections, search indexes, or materialized views that are updated by consuming events from the write model.

// CQRS: Command handler writes event; projection updates read model
@Service
public class PlaceOrderCommandHandler {
    private final OrderRepository orderRepository;  // write store
    private final EventPublisher eventPublisher;

    public PlaceOrderCommandHandler(OrderRepository orderRepository,
                                    EventPublisher eventPublisher) {
        this.orderRepository = orderRepository;
        this.eventPublisher = eventPublisher;
    }

    public String handle(PlaceOrderCommand cmd) {
        Order order = Order.create(cmd.customerId(), cmd.items());
        orderRepository.save(order);
        eventPublisher.publish(OrderPlacedEvent.from(order));
        return order.getId();  // return aggregate ID only
    }
}

@Component
public class OrderSummaryProjection {
    private final OrderSummaryRepository readRepository;  // read-optimized store

    public OrderSummaryProjection(OrderSummaryRepository readRepository) {
        this.readRepository = readRepository;
    }

    @EventHandler
    public void on(OrderPlacedEvent event) {
        // Upsert a denormalized read model optimized for "list my orders" queries
        readRepository.save(new OrderSummary(
            event.orderId(),
            event.customerId(),
            event.totalAmount(),
            event.currency(),
            OrderStatus.PLACED,
            event.occurredAt()
        ));
    }
}

CQRS with event-driven projections creates eventual consistency: the read model reflects the write model after a small lag (typically milliseconds). Design your UI and API to acknowledge this — show a "processing" state after a command rather than immediately querying the read model.

The Saga Pattern: Distributed Transactions Without 2PC

In a microservices architecture, a business transaction often spans multiple services (place order → reserve inventory → process payment → confirm order). Traditional two-phase commit (2PC) across services is impractical — it creates tight coupling and availability dependency. The Saga pattern replaces 2PC with a sequence of local transactions, each publishing an event that triggers the next step. If any step fails, compensating transactions undo the preceding steps.

Choreography-Based Saga

Each service listens for events and decides independently what to do next. No central coordinator. This is simpler to implement but harder to trace and debug — the business workflow is implicit in the event routing rather than explicitly described anywhere.
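The trade-off can be made concrete with a toy in-memory bus (illustrative only, not a real broker API): each "service" registers its own reaction, the publisher never knows who is listening, and no single place describes the whole workflow.

```java
import java.util.*;
import java.util.function.Consumer;

// Minimal in-memory bus illustrating choreography: services subscribe
// independently; the workflow exists only in the sum of the wiring.
class EventBus {
    private final Map<Class<?>, List<Consumer<Object>>> handlers = new HashMap<>();

    <E> void subscribe(Class<E> type, Consumer<E> handler) {
        handlers.computeIfAbsent(type, k -> new ArrayList<>())
                .add(e -> handler.accept(type.cast(e)));
    }

    void publish(Object event) {
        handlers.getOrDefault(event.getClass(), List.of())
                .forEach(h -> h.accept(event));
    }
}

record OrderPlaced(String orderId) {}
record InventoryReserved(String orderId) {}
```

Wiring two steps then looks like this: the "inventory service" subscribes to `OrderPlaced` and publishes `InventoryReserved`; the "payment service" subscribes to `InventoryReserved`. Publishing one `OrderPlaced` ripples through both, yet neither subscription mentions the other.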

Orchestration-Based Saga

A central saga orchestrator sends commands to each service and listens for their responses. The workflow is explicit and visible in the orchestrator's code. Failures trigger compensating commands. This is more complex to implement but much easier to monitor and debug, making it the preferred pattern for complex multi-step workflows.

// Orchestration-based saga for order placement
@Component
public class PlaceOrderSaga {
    @Autowired
    private transient CommandGateway commandGateway;  // transient: not serialized with saga state
    @StartSaga
    @SagaEventHandler(associationProperty = "orderId")
    public void handle(OrderPlacedEvent event) {
        // Step 1: Reserve inventory
        commandGateway.send(new ReserveInventoryCommand(
            event.orderId(), event.items()
        ));
    }
    @SagaEventHandler(associationProperty = "orderId")
    public void handle(InventoryReservedEvent event) {
        // Step 2: Process payment
        commandGateway.send(new ProcessPaymentCommand(
            event.orderId(), event.totalAmount()
        ));
    }
    @SagaEventHandler(associationProperty = "orderId")
    public void handle(PaymentFailedEvent event) {
        // Compensate: release inventory reservation
        commandGateway.send(new ReleaseInventoryCommand(event.orderId()));
        commandGateway.send(new CancelOrderCommand(
            event.orderId(), "Payment failed"
        ));
    }
    @EndSaga
    @SagaEventHandler(associationProperty = "orderId")
    public void handle(PaymentProcessedEvent event) {
        commandGateway.send(new ConfirmOrderCommand(event.orderId()));
    }
}

Production Challenges and How to Address Them

Idempotency: Consumers may receive the same event more than once (Kafka, for example, provides at-least-once delivery by default). Every consumer must be idempotent — processing the same event twice must produce the same result as processing it once. Use the event ID as an idempotency key, storing processed event IDs in a deduplication table.
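A sketch of the dedup approach (in production the processed-ID set would be a database table written in the same transaction as the business side effect, not an in-memory set as here):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Idempotent consumer: skip any event whose ID was already processed.
// The in-memory set stands in for a deduplication table; a counter stands
// in for the real business side effect.
class IdempotentConsumer {
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private int sideEffectCount = 0;

    void handle(String eventId) {
        // add() returns false if the ID was already present: a redelivery
        if (!processedIds.add(eventId)) {
            return;  // duplicate from at-least-once delivery; do nothing
        }
        sideEffectCount++;  // the real side effect runs once per unique event
    }

    int sideEffectCount() { return sideEffectCount; }
}
```

The crucial production detail is atomicity: the dedup record and the side effect must commit together, otherwise a crash between them reintroduces duplicates or drops events.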

Ordering: Events in a Kafka topic partition are ordered, but events across partitions are not. Design your partitioning key so events that must be processed in order (all events for the same aggregate) land in the same partition.
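The guarantee rests on a deterministic key-to-partition mapping. Kafka's default partitioner hashes the serialized record key with murmur2; the sketch below uses `hashCode` purely to illustrate the property that matters, namely that the same aggregate ID always maps to the same partition.

```java
// All events for one aggregate land in one partition, so consumers see them
// in order. Any deterministic hash of the key gives this property; Kafka's
// default partitioner uses murmur2 rather than hashCode.
class AggregatePartitioner {
    static int partitionFor(String aggregateId, int numPartitions) {
        return Math.floorMod(aggregateId.hashCode(), numPartitions);
    }
}
```

With the Kafka client itself, the same effect comes from using the aggregate ID as the record key, e.g. `new ProducerRecord<>("orders", order.getId(), event)` — the key, not the payload, determines the partition. Beware that increasing the partition count changes the mapping and breaks per-aggregate ordering across the resize.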

Schema evolution: Consumer and producer evolve independently. Use a Schema Registry (Confluent or Apicurio) with Avro or Protobuf to enforce compatibility rules. Backward-compatible changes (adding optional fields) are safe; breaking changes (removing or renaming fields) require a new event version.
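When a new version is unavoidable, consumers commonly "upcast" old events into the newest shape at the deserialization boundary so handler code targets only the current version. A hypothetical sketch (the field names and the `"WEB"` default are illustrative assumptions, not from any specific registry or framework):

```java
import java.util.Map;

// Hypothetical upcaster: rewrite older event versions into the newest shape
// before handlers see them, so consumer code targets only v2.
record OrderPlacedV2(String orderId, String customerId, String salesChannel) {}

class OrderPlacedUpcaster {
    static OrderPlacedV2 upcast(int version, Map<String, String> payload) {
        return switch (version) {
            // v1 had no sales_channel; substitute the documented default
            case 1 -> new OrderPlacedV2(payload.get("order_id"),
                                        payload.get("customer_id"), "WEB");
            case 2 -> new OrderPlacedV2(payload.get("order_id"),
                                        payload.get("customer_id"),
                                        payload.get("sales_channel"));
            default -> throw new IllegalArgumentException(
                "Unknown event version: " + version);
        };
    }
}
```

This is also why events carry an explicit `event_version` field, as in the `OrderPlacedEvent` record earlier: the version is what the upcaster dispatches on.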

Observability: Distributed event chains are hard to trace. Propagate a correlation ID through all events. Use distributed tracing (OpenTelemetry with Kafka instrumentation) to trace the full journey of a business transaction across services and events.
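The propagation rule itself is simple and worth encoding in the event envelope (a sketch with illustrative names; in a Kafka deployment the ID typically travels in a record header): the first event in a chain mints the ID, and every downstream event copies it.

```java
import java.util.UUID;

// Correlation-ID propagation: mint once at the start of a business
// transaction, copy into every event it causes. Filtering logs or traces
// by this one ID then reconstructs the whole chain.
record Envelope(String correlationId, String eventType, String payload) {

    // Start of a new chain: mint a fresh correlation ID
    static Envelope newChain(String eventType, String payload) {
        return new Envelope(UUID.randomUUID().toString(), eventType, payload);
    }

    // Downstream event: inherit the triggering event's correlation ID
    Envelope causes(String eventType, String payload) {
        return new Envelope(this.correlationId, eventType, payload);
    }
}
```

A separate causation ID (the ID of the immediately triggering event) is a common companion field, turning the flat chain into a traversable tree of cause and effect.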

"Events are the API contract of the future — but unlike REST contracts, they are asynchronous, durable, and replayable. Design them with the same care you give to your synchronous APIs."

Key Takeaways

  • Domain events are immutable facts about what happened — design them to be self-describing, versioned, and backward-compatible.
  • Event sourcing suits entities requiring full audit trails; CQRS separates the write model from read-optimized projections.
  • The Saga pattern replaces 2PC for distributed transactions — prefer orchestration-based sagas for complex multi-step workflows.
  • Every consumer must be idempotent; partition events by aggregate ID to ensure ordering within a stream.
  • Use a Schema Registry to manage event schema evolution safely across independently deployed services.
