Microservices Data Management: Database per Service Pattern & Saga Orchestration (2026)
Data management is where microservices architecture is hardest. The freedom to deploy services independently collides directly with the need to maintain consistency across services that each own a slice of the overall system state. Two-phase commit is theoretically correct but operationally catastrophic at scale. Eventual consistency is pragmatically necessary but demands careful engineering to be safe. This guide navigates the full landscape — from the shared database anti-pattern through saga orchestration, CQRS, and API composition — with production-grade Spring Boot implementations.
The Shared Database Anti-Pattern
The most common failure mode when migrating from a monolith to microservices is extracting services while retaining a shared database. Teams decompose the application logic into separate services but point all services at the same PostgreSQL database, often sharing the same schema or at best different schemas in the same instance. This preserves the appearance of microservices while retaining most of the coupling that made the monolith difficult to change.
Schema coupling is the most damaging form. When the Order service and Inventory service both read the products table, any schema change — adding a column, renaming a field, changing a data type — must be coordinated across every service that touches that table. The "independent deployability" promise of microservices evaporates: a database migration becomes a multi-service coordination exercise requiring simultaneous deployment windows. Teams end up with the operational complexity of microservices and the coupling of a monolith.
Deployment coupling follows inevitably. If the Inventory service's schema migration requires a table lock on products, the Order service experiences degraded performance during the migration window — even though Order has nothing to do with the inventory change. A Blue-Green deployment for the Inventory service becomes entangled with Order service's health. Performance coupling compounds the problem: a reporting service running expensive analytical queries against the shared database directly competes for I/O bandwidth with the transactional Order service. One team's query optimization becomes another team's production incident. The shared database anti-pattern must be eliminated as a prerequisite for achieving genuine microservices independence.
Database per Service: The Right Approach
The database per service pattern mandates that each microservice exclusively owns its data store — no other service may directly query or write to it. The only way to access a service's data is through its published API. This is the architectural boundary that makes services genuinely independent: you can change, migrate, or replace a service's database without any impact on other services, because no other service depends on its internal data model.
This pattern also enables polyglot persistence — choosing the database technology that best fits each service's access pattern. A typical e-commerce platform might use: PostgreSQL for the Order service (complex relational queries, ACID transactions, foreign key integrity); MongoDB for the Product Catalog service (flexible schema for products with varying attributes across categories); Redis for the Session service (sub-millisecond key-value access, automatic TTL-based expiry); Cassandra for the Analytics service (time-series write-heavy workloads, wide-column partitioned by user + time); Elasticsearch for the Search service (full-text search, faceted filtering). Each team selects the tool that fits their access patterns rather than conforming to one shared database's constraints.
The challenge introduced by database per service is cross-service queries. A customer service dashboard that shows a customer's profile (Customer service), recent orders (Order service), open support tickets (Support service), and total spend (Billing service) cannot be answered by a single SQL JOIN. The solutions — API Composition, CQRS with projections, and GraphQL Federation — are discussed in later sections. These patterns add complexity, but that complexity is the honest cost of genuine service independence, previously hidden by shared database joins.
Distributed Transactions: Why 2PC Fails
Two-Phase Commit (2PC) is the traditional academic solution for distributed transactions. In Phase 1 (Prepare), a coordinator sends "prepare to commit" to all participant services. Each participant acquires locks on affected data and responds "ready" or "abort". In Phase 2 (Commit), if all participants responded "ready", the coordinator sends "commit" to all; if any responded "abort", it sends "rollback" to all. 2PC provides ACID guarantees across distributed participants — in theory.
In practice, 2PC has a fatal flaw: coordinator single point of failure. If the coordinator crashes after sending "prepare" but before sending "commit" or "rollback", all participant services are stuck in the prepared state — holding locks on their data indefinitely. No participant knows whether to commit or roll back, because only the coordinator knew the decision. The system is blocked until the coordinator recovers. In a microservices architecture where services are running on ephemeral pods in Kubernetes, coordinator crashes are not rare edge cases — they are normal operating conditions during rolling deployments, pod evictions, and node failures. 2PC's blocking behavior makes it unsuitable for any system that requires continuous availability.
The deeper problem is that 2PC requires all participant services to hold locks across a network round trip. In a monolith, acquiring a lock and committing takes microseconds. In a distributed system, the network round trip for the prepare-commit cycle takes tens to hundreds of milliseconds. During this time, all locked rows are unavailable to other transactions. Under load, this creates severe contention and dramatically reduces throughput. The saga pattern was developed specifically to address these failures.
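To make the blocking window concrete, here is a minimal sketch in plain Java (hypothetical classes, not a real transaction manager) of what happens when the coordinator crashes between prepare and commit: every participant is left holding locks with no way to decide.

```java
import java.util.List;

// Minimal sketch of 2PC's blocking window. Participants acquire locks at
// prepare time and can only release them when the coordinator delivers its
// commit/rollback decision.
class TwoPhaseCommitSketch {

    static class Participant {
        boolean prepared = false;   // holding locks while true
        boolean committed = false;

        boolean prepare() { prepared = true; return true; } // lock rows, vote "ready"
        void commit()     { committed = true; prepared = false; } // release locks
        void rollback()   { prepared = false; }                   // release locks
    }

    // Phase 1 + Phase 2. If the coordinator crashes after prepare, Phase 2
    // never runs and every participant stays locked indefinitely.
    static boolean run(List<Participant> participants,
                       boolean coordinatorCrashesAfterPrepare) {
        for (Participant p : participants) {
            if (!p.prepare()) {                 // any "abort" vote: roll back all
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        if (coordinatorCrashesAfterPrepare) {
            // The decision dies with the coordinator; participants block,
            // holding locks, until it recovers.
            return false;
        }
        participants.forEach(Participant::commit);
        return true;
    }
}
```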
Saga Pattern: The Solution for Distributed Transactions
A saga is a sequence of local transactions, each in a separate service. If any step fails, compensating transactions undo the work of previously completed steps. Unlike 2PC, sagas never hold locks across service boundaries — each local transaction commits immediately and independently. Consistency is eventual rather than atomic, but availability is maintained even when individual services fail.
There are two coordination styles. Choreography: each service publishes events when a local transaction completes; other services subscribe to these events and react by executing their own local transactions. No central coordinator — the workflow emerges from event reactions. Simple for short, linear flows, but becomes a debugging nightmare for complex multi-service workflows: "which service processed this order?" requires tracing events across multiple Kafka topics. Orchestration: a dedicated saga orchestrator (a service or workflow engine) directs each participant what to do, receives results, and determines the next step. The workflow is explicit and traceable, but the orchestrator becomes a central component that must be resilient.
Here is a Spring Boot order placement saga using orchestration with Kafka:
// Saga orchestrator — manages the order placement workflow
@Service
public class OrderPlacementSaga {

    private final SagaRepository sagaRepository;
    private final KafkaTemplate<String, Object> kafkaTemplate;

    public OrderPlacementSaga(SagaRepository sagaRepository,
                              KafkaTemplate<String, Object> kafkaTemplate) {
        this.sagaRepository = sagaRepository;
        this.kafkaTemplate = kafkaTemplate;
    }

    @Transactional
    public void startSaga(OrderPlacedEvent event) {
        // Step 1: Reserve inventory
        SagaState state = SagaState.builder()
            .sagaId(UUID.randomUUID().toString())
            .orderId(event.getOrderId())
            .status(SagaStatus.INVENTORY_RESERVATION_PENDING)
            .build();
        sagaRepository.save(state);
        kafkaTemplate.send("inventory-commands",
            new ReserveInventoryCommand(
                state.getSagaId(),
                event.getOrderId(),
                event.getItems()));
    }

    @KafkaListener(topics = "inventory-events")
    @Transactional
    public void handleInventoryResult(InventoryResultEvent event) {
        SagaState state = sagaRepository.findBySagaId(event.getSagaId());
        if (event.isSuccess()) {
            state.setStatus(SagaStatus.PAYMENT_PENDING);
            sagaRepository.save(state);
            kafkaTemplate.send("payment-commands",
                new ChargePaymentCommand(
                    state.getSagaId(),
                    state.getOrderId(),
                    event.getTotalAmount()));
        } else {
            // Inventory reservation failed — compensate by cancelling order
            state.setStatus(SagaStatus.FAILED);
            sagaRepository.save(state);
            kafkaTemplate.send("order-commands",
                new CancelOrderCommand(state.getOrderId(), "Inventory unavailable"));
        }
    }

    @KafkaListener(topics = "payment-events")
    @Transactional
    public void handlePaymentResult(PaymentResultEvent event) {
        SagaState state = sagaRepository.findBySagaId(event.getSagaId());
        if (event.isSuccess()) {
            state.setStatus(SagaStatus.COMPLETED);
            sagaRepository.save(state);
            kafkaTemplate.send("order-commands",
                new ConfirmOrderCommand(state.getOrderId()));
        } else {
            // Payment failed — compensate by releasing inventory
            state.setStatus(SagaStatus.COMPENSATING);
            sagaRepository.save(state);
            kafkaTemplate.send("inventory-commands",
                new ReleaseInventoryCommand(state.getSagaId(), state.getOrderId()));
            kafkaTemplate.send("order-commands",
                new CancelOrderCommand(state.getOrderId(), "Payment failed"));
        }
    }
}
The saga state is persisted to a database at every step — if the orchestrator crashes, it can resume from the last persisted state after restart. Each command must be idempotent (the receiving service must handle receiving the same command twice without double-executing), because Kafka at-least-once delivery guarantees that any message may be delivered more than once under failure conditions.
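The resume-after-crash behavior can be sketched as a replay of the last persisted status. The in-memory SagaState record and command names below are hypothetical stand-ins for the orchestrator's repository and Kafka commands; because participants are idempotent, re-sending an already delivered command is safe.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of orchestrator crash recovery: on restart, re-issue the command
// that corresponds to each pending saga's last persisted status.
class SagaRecoverySketch {

    record SagaState(String sagaId, String orderId, String status) {}

    // Commands that would be re-published to Kafka, captured for inspection.
    static List<String> resume(List<SagaState> pendingSagas) {
        List<String> reissued = new ArrayList<>();
        for (SagaState s : pendingSagas) {
            switch (s.status()) {
                case "INVENTORY_RESERVATION_PENDING" ->
                    reissued.add("ReserveInventoryCommand:" + s.orderId());
                case "PAYMENT_PENDING" ->
                    reissued.add("ChargePaymentCommand:" + s.orderId());
                case "COMPENSATING" ->
                    reissued.add("ReleaseInventoryCommand:" + s.orderId());
                default -> { /* COMPLETED / FAILED: nothing to resume */ }
            }
        }
        return reissued;
    }
}
```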
Eventual Consistency in Practice
Eventual consistency is not "data might be wrong eventually" — it is "data will be consistent, but not necessarily at this exact moment." Services operating under eventual consistency must be designed explicitly for the consistency windows and failure modes they will encounter. Four concerns must be addressed:
Duplicate message handling (idempotency): Kafka and most message brokers guarantee at-least-once delivery. A service processing an OrderPlaced event must check whether it has already processed this event before acting. Store processed event IDs with a unique constraint: INSERT INTO processed_events (event_id, processed_at) VALUES (?, NOW()) ON CONFLICT DO NOTHING. If the insert succeeds, process normally. If it fails (duplicate), skip processing. This makes the consumer idempotent at the database level.
@Service
public class InventoryConsumer {

    private static final Logger log =
        LoggerFactory.getLogger(InventoryConsumer.class);

    private final ProcessedEventRepository processedEventRepository;
    private final InventoryService inventoryService;

    public InventoryConsumer(ProcessedEventRepository processedEventRepository,
                             InventoryService inventoryService) {
        this.processedEventRepository = processedEventRepository;
        this.inventoryService = inventoryService;
    }

    @KafkaListener(topics = "order-events")
    @Transactional
    public void handleOrderPlaced(OrderPlacedEvent event) {
        // Idempotency check — skip if already processed
        if (processedEventRepository.existsByEventId(event.getEventId())) {
            log.info("Duplicate event {}, skipping", event.getEventId());
            return;
        }
        // Process atomically with idempotency record
        inventoryService.reserve(event.getOrderId(), event.getItems());
        processedEventRepository.save(
            ProcessedEvent.of(event.getEventId(), Instant.now()));
        // Both inserts commit atomically — if this method retries,
        // the second attempt is caught by the idempotency check above
    }
}
Out-of-order message handling: An OrderCancelled event may arrive before the corresponding OrderPlaced event if messages end up on different Kafka partitions or if a retry delivers a later message first. Design consumers to handle this gracefully: check entity state before applying transitions, and use Kafka partition keys (order ID as the key) to guarantee ordering within a logical entity.
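A state guard for this situation might look like the following sketch, with hypothetical status strings: a cancellation observed first acts as a tombstone that a late-arriving OrderPlaced cannot overwrite.

```java
// Sketch of a state guard for out-of-order delivery: transitions are checked
// against the current entity state instead of applied blindly.
class OrderStateGuard {

    // Returns the new status; invalid transitions leave the state unchanged.
    static String apply(String current, String eventType) {
        return switch (eventType) {
            case "OrderPlaced" ->
                // Ignore a late OrderPlaced if the order was already cancelled.
                "CANCELLED".equals(current) ? "CANCELLED" : "PLACED";
            case "OrderCancelled" ->
                "CANCELLED"; // valid from any state, including not-yet-seen
            default -> current;
        };
    }
}
```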
Compensating transactions: These must be designed as first-class operations, not afterthoughts. Every step in a saga that modifies state must have a corresponding compensating transaction. The compensating transaction for ReserveInventory is ReleaseInventory. Compensations must also be idempotent — they may be called multiple times during retry storms.
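An idempotent compensation can be enforced by keying the release on the reservation ID, as in this sketch (in-memory maps stand in for the inventory tables): releasing the same reservation twice must not double-credit stock.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an idempotent ReleaseInventory compensation. The reservation ID
// acts as the idempotency key: the second release of the same ID is a no-op.
class InventoryCompensation {

    final Map<String, Integer> stock = new HashMap<>();        // sku -> available
    final Map<String, Integer> reservations = new HashMap<>(); // reservationId -> qty

    void reserve(String reservationId, String sku, int qty) {
        stock.merge(sku, -qty, Integer::sum);
        reservations.put(reservationId, qty);
    }

    // Compensating transaction: safe to call any number of times.
    void release(String reservationId, String sku) {
        Integer qty = reservations.remove(reservationId); // null on repeat call
        if (qty != null) {
            stock.merge(sku, qty, Integer::sum);
        }
    }
}
```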
Dead Letter Queues (DLQ): When a consumer fails to process a message after the configured retry count (e.g., 3 retries with exponential backoff), the message moves to a DLQ. Operations teams must monitor DLQ depth and have procedures for reprocessing or manually resolving DLQ messages. An unmonitored DLQ is data loss waiting to happen.
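The retry-then-park behavior can be sketched in plain Java (an in-memory list, not Spring Kafka's DefaultErrorHandler, though that component implements the same policy): after the retry budget is exhausted, the message lands on the DLQ instead of being retried forever or silently dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of retry-then-DLQ routing for a poison message.
class DlqRouter {

    final List<String> deadLetters = new ArrayList<>();
    final int maxRetries;

    DlqRouter(int maxRetries) { this.maxRetries = maxRetries; }

    void deliver(String message, Consumer<String> handler) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                handler.accept(message);
                return; // processed successfully
            } catch (RuntimeException e) {
                // in production: exponential backoff between attempts
            }
        }
        deadLetters.add(message); // retries exhausted: park for operators
    }
}
```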
CQRS for Read-Heavy Microservices
Command Query Responsibility Segregation separates the write model (commands that change state) from the read model (queries that read state). In a microservices context, this means maintaining a separate, denormalized read store that is purpose-built for query patterns — updated asynchronously by consuming domain events from the write-side services.
// Projection service: consumes events and maintains a denormalized read model
@Service
public class OrderProjectionService {

    private final OrderViewRepository orderViewRepository;

    public OrderProjectionService(OrderViewRepository orderViewRepository) {
        this.orderViewRepository = orderViewRepository;
    }

    @KafkaListener(topics = {"order-events", "customer-events", "product-events"})
    @Transactional
    public void project(DomainEvent event) {
        switch (event) {
            case OrderPlacedEvent e -> {
                OrderView view = OrderView.builder()
                    .orderId(e.getOrderId())
                    .customerId(e.getCustomerId())
                    .status("PLACED")
                    .items(e.getItems())
                    .totalAmount(e.getTotalAmount())
                    .placedAt(e.getOccurredAt())
                    .build();
                orderViewRepository.save(view); // Writes to Elasticsearch
            }
            case OrderConfirmedEvent e -> {
                orderViewRepository.updateStatus(e.getOrderId(), "CONFIRMED");
            }
            case CustomerUpdatedEvent e -> {
                // Denormalize customer name into all open order views
                orderViewRepository.updateCustomerName(
                    e.getCustomerId(), e.getNewName());
            }
            default -> { /* events this projection does not handle */ }
        }
    }
}
// Query service: reads from the denormalized read model (Elasticsearch)
@RestController
@RequestMapping("/api/orders")
public class OrderQueryController {

    private final OrderViewRepository orderViewRepository;

    public OrderQueryController(OrderViewRepository orderViewRepository) {
        this.orderViewRepository = orderViewRepository;
    }

    @GetMapping("/customer/{customerId}")
    public List<OrderView> getCustomerOrders(
            @PathVariable Long customerId,
            @RequestParam(defaultValue = "0") int page,
            @RequestParam String status) {
        // This query spans data from Order, Customer, and Product services
        // but executes as a single fast Elasticsearch query
        return orderViewRepository.findByCustomerIdAndStatus(
            customerId, status, PageRequest.of(page, 20));
    }
}
The read model in this architecture is Elasticsearch, which enables full-text search, faceted filtering, and complex aggregations across the denormalized order data that would require expensive multi-service API calls or complex SQL in the write model. The consistency window between a command executing and the read model reflecting it is typically 50–200ms (Kafka propagation + projection processing) — acceptable for most user-facing query patterns.
Architecture diagram (ASCII):
Write Path:
  Client → Order Service (PostgreSQL)  → Kafka (order-events)
  Client → Customer Service (MongoDB)  → Kafka (customer-events)
                                               │
Read Path:                                     ▼
  Kafka topics → Projection Service → Elasticsearch (OrderView)
                                               ▲
  Client → Order Query Service ────────────────┘
           (fast queries: ms latency, complex filters)
API Composition Pattern
Many client queries require data from multiple services. The API Composition pattern solves this at the API Gateway or a dedicated BFF (Backend for Frontend) layer: the gateway calls multiple services in parallel, aggregates the results, and returns a single response to the client. This avoids exposing the internal service decomposition to clients and reduces the number of client round trips.
@RestController
public class CustomerDashboardController {

    // WebClient-backed clients for each downstream service
    private final CustomerServiceClient customerService;
    private final OrderServiceClient orderService;
    private final SupportServiceClient supportService;
    private final BillingServiceClient billingService;

    public CustomerDashboardController(CustomerServiceClient customerService,
                                       OrderServiceClient orderService,
                                       SupportServiceClient supportService,
                                       BillingServiceClient billingService) {
        this.customerService = customerService;
        this.orderService = orderService;
        this.supportService = supportService;
        this.billingService = billingService;
    }

    @GetMapping("/api/dashboard/customer/{customerId}")
    public Mono<CustomerDashboard> getDashboard(@PathVariable Long customerId) {
        Mono<CustomerProfile> profile =
            customerService.getProfile(customerId);
        Mono<List<Order>> recentOrders =
            orderService.getRecentOrders(customerId, 5);
        Mono<List<Ticket>> openTickets =
            supportService.getOpenTickets(customerId);
        Mono<BillingSummary> billing =
            billingService.getSummary(customerId);
        // All 4 calls execute in parallel — total latency = slowest call
        return Mono.zip(profile, recentOrders, openTickets, billing)
            .map(tuple -> CustomerDashboard.builder()
                .profile(tuple.getT1())
                .recentOrders(tuple.getT2())
                .openTickets(tuple.getT3())
                .billingSummary(tuple.getT4())
                .build());
    }
}
The API composition N+1 problem occurs when the gateway fetches a list of entities from one service, then calls a second service once per entity. A dashboard showing 20 orders that each require a separate call to the Customer service for the customer name results in 21 service calls where 2 would suffice. Mitigate by designing batch-aware service APIs (GET /customers?ids=1,2,3,...) and using Flux.flatMap with concurrency limits to batch parallel lookups. GraphQL Federation is the structural solution — it allows the schema to declare relationships across services and resolves them with automatic batching via DataLoader.
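Batching can be sketched as follows. Here batchFetch is a hypothetical stand-in for a batch-aware endpoint such as GET /customers?ids=..., and the point is that it is invoked exactly once regardless of how many orders are on the page.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch of collapsing N per-order customer lookups into one batch call.
class BatchComposer {

    record Order(String orderId, String customerId) {}

    static Map<String, String> resolveCustomerNames(
            List<Order> orders,
            Function<Set<String>, Map<String, String>> batchFetch) {
        Set<String> ids = new LinkedHashSet<>();
        for (Order o : orders) ids.add(o.customerId()); // dedupe first
        return batchFetch.apply(ids);                   // single downstream call
    }
}
```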
Data Migration Strategies
Zero-downtime database schema migrations in a microservices environment require the expand/contract pattern (also called parallel change). Rather than a single migration that changes a column in-place — which requires a coordinated service deployment and database migration window — the expand/contract pattern splits schema changes into three phases that can each be deployed independently.
-- Phase 1: EXPAND — add new column, keep old column, write to both
ALTER TABLE orders ADD COLUMN customer_reference_id VARCHAR(36);
-- Application v1: writes to both old customer_id (INT) and new customer_reference_id (UUID)

-- Phase 2: MIGRATE — backfill existing rows
-- (automated migration job runs between deployments)
UPDATE orders SET customer_reference_id = uuid_for(customer_id)
WHERE customer_reference_id IS NULL;

-- Phase 3: CONTRACT — remove old column (after all reads migrated to new column)
ALTER TABLE orders DROP COLUMN customer_id;
Flyway integrates naturally with microservices when each service manages its own migration scripts in its own db/migration directory. In Spring Boot, configure spring.flyway.out-of-order=false and spring.flyway.validate-on-migrate=true so that skipped versions and modified scripts fail fast instead of silently corrupting migration state. For blue-green deployments, use feature flags to control which column version the new service version reads — allowing rollback without a database rollback if the new version has issues.
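The flag-controlled read can be sketched like this (the flag and row type are hypothetical): while both columns exist, a runtime flag decides which one the service reads, so a bad deploy is rolled back by flipping the flag, not by touching the schema.

```java
// Sketch of a dual-read during the expand phase of expand/contract.
class CustomerColumnReader {

    record OrderRow(Integer customerId, String customerReferenceId) {}

    static String customerKey(OrderRow row, boolean readNewColumn) {
        if (readNewColumn && row.customerReferenceId() != null) {
            return row.customerReferenceId();        // new UUID column
        }
        return String.valueOf(row.customerId());     // legacy INT column
    }
}
```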
FAQs: Microservices Data Management
Q: If I can't do JOIN across services, how do I generate reports that span multiple services?
A: Use a dedicated reporting service that consumes events from all relevant services and maintains a denormalized reporting database (a data warehouse or analytical PostgreSQL schema) purpose-built for the queries your reports require. This is the CQRS pattern applied at the organizational level — the reporting read model is maintained separately from operational write models. Tools like Apache Kafka Connect with JDBC sink can automate event-to-database projection.
Q: Should I use choreography or orchestration for my saga?
A: Choreography for simple, linear flows with 2–3 steps where services are loosely coupled and the event flow is easy to trace. Orchestration for complex workflows with branching logic, long-running processes, and clear compensation requirements. Orchestration provides better observability — you can query the saga orchestrator to find the current state of any in-flight transaction. In practice, most production teams start with choreography and migrate to orchestration as saga complexity grows.
Q: How do I handle the case where a consumer processes an event and then the database commit fails?
A: This is the dual-write problem. Use the transactional outbox pattern: within the same database transaction, write the state change and an outbox record. A separate relay service reads the outbox and publishes to Kafka, then marks the outbox record as sent. The commit succeeds atomically for both the state change and the outbox record — the relay handles Kafka publishing separately with retry. Debezium's Change Data Capture (CDC) is a popular implementation: it watches the database's transaction log and forwards changes to Kafka automatically.
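The outbox flow can be sketched with in-memory lists standing in for the database tables and the broker: the order row and the outbox row commit together, and a relay drains the outbox afterwards, so a crash before publishing loses nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the transactional outbox pattern.
class OutboxSketch {

    final List<String> ordersTable = new ArrayList<>();
    final List<String> outboxTable = new ArrayList<>();
    final List<String> publishedToKafka = new ArrayList<>();

    // One local transaction: both writes succeed or neither does.
    void placeOrder(String orderId) {
        ordersTable.add(orderId);
        outboxTable.add("OrderPlaced:" + orderId);
    }

    // Relay (or Debezium CDC): reads the outbox, publishes, marks sent.
    void runRelay() {
        for (String event : new ArrayList<>(outboxTable)) {
            publishedToKafka.add(event); // with retry in production
            outboxTable.remove(event);
        }
    }
}
```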
Q: What consistency guarantees does the saga pattern provide?
A: Sagas provide ACD — Atomicity (the overall business transaction either completes or is fully compensated), Consistency (within each service's local transaction), and Durability. They do NOT provide Isolation — other transactions can see intermediate states. For example, during a payment saga, an inventory check might see the inventory as "reserved" but the order as not yet confirmed. Services must be designed to tolerate these intermediate states gracefully.
Q: How do I test saga compensation logic?
A: Write integration tests that inject failures at each saga step using Mockito fault injection or Testcontainers with Kafka, and verify that the compensating transactions restore the system to a consistent pre-saga state. Use contract testing (Pact) to verify that each service's Kafka event schema matches what the saga orchestrator expects. Test the idempotency of each step by calling it twice with the same input and verifying the outcome is the same.
Key Takeaways
- Database per service is non-negotiable: Shared databases couple microservices at the schema level, eliminating independent deployability. Each service must exclusively own its data — accessible only through its published API.
- 2PC is unsuitable for microservices: Coordinator failure causes indefinite blocking. The saga pattern with compensating transactions provides atomicity for distributed business transactions without cross-service locking.
- Idempotency is mandatory for saga participants: At-least-once Kafka delivery means every event handler must check for duplicate processing. Use unique constraint-based idempotency records committed in the same transaction as the state change.
- CQRS enables complex queries without coupling: Maintain a separate denormalized read model (Elasticsearch, Redis, a dedicated PostgreSQL schema) updated by consuming domain events. Queries that previously required cross-service JOINs become single-store lookups.
- Expand/contract for zero-downtime migrations: Never perform breaking schema changes in a single migration. Always expand (add new), migrate data, then contract (remove old) across independent deployments.
- Design for intermediate states: Services must be prepared to see partially-completed sagas in their data. State machines with explicit, named intermediate states are more robust than boolean flags that cannot represent in-progress transitions.