Advanced Design Patterns in Microservices & Spring Boot: Production Engineering Guide
1. Why GoF Patterns Are Not Enough for Microservices
The original 23 GoF patterns were designed for object-oriented design within a single process. They assume shared memory, synchronous execution, and transactional consistency. When you break a monolith into microservices, all three of these assumptions vanish simultaneously.
Microservices introduce a fundamentally different problem space that requires a separate catalog of distributed systems patterns:
- Network failures: Every remote call can fail, time out, or return partial results.
- Partial outages: One failing service must not cascade and take down the entire system.
- Eventual consistency: Data is no longer immediately consistent across services after a write.
- Distributed transactions: ACID transactions don't span service boundaries; 2-Phase Commit is impractical at scale.
- Observability gaps: A single request may traverse 10 services; tracing and correlation are required.
| Problem in Microservices | Pattern That Addresses It |
|---|---|
| Cascading failures from slow downstream | Circuit Breaker |
| Multi-service transaction without 2PC | Saga (Choreography / Orchestration) |
| Read/write performance mismatch | CQRS |
| Dual-write inconsistency (DB + Kafka) | Outbox Pattern |
| Thread starvation from slow dependency | Bulkhead |
| Migrating legacy monolith | Strangler Fig |
| Cross-cutting concerns (mTLS, metrics) | Sidecar |
2. Circuit Breaker Pattern — Failing Fast in Distributed Systems
The Circuit Breaker pattern prevents a service from repeatedly calling a downstream dependency that is failing or timing out. Named after electrical circuit breakers, it has three states:
- Closed: Normal operation. Requests flow through. Failures are counted.
- Open: Failure threshold exceeded. Requests immediately fail (fast-fail) without calling downstream. A timer starts.
- Half-Open: Timer expires. A probe request is allowed through. If it succeeds, the circuit closes; if it fails, it reopens.
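The three states above can be sketched as a minimal state machine in plain Java. This is an illustrative toy, not a replacement for Resilience4j; all names are made up:

```java
import java.time.Duration;
import java.time.Instant;

// Toy circuit breaker: counts consecutive failures, fails fast while OPEN,
// and allows a probe after the wait duration has elapsed (HALF_OPEN).
class ToyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final Duration openDuration;
    private State state = State.CLOSED;
    private int failures = 0;
    private Instant openedAt;

    ToyCircuitBreaker(int failureThreshold, Duration openDuration) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
    }

    synchronized State state() {
        // OPEN -> HALF_OPEN once the open-state timer expires
        if (state == State.OPEN && Instant.now().isAfter(openedAt.plus(openDuration))) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    synchronized boolean allowRequest() {
        return state() != State.OPEN; // fail fast while OPEN
    }

    synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED; // a successful probe closes the circuit
    }

    synchronized void recordFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN; // trip: start the open-state timer
            openedAt = Instant.now();
            failures = 0;
        }
    }
}
```

Real implementations add sliding windows, slow-call detection, and metrics; the toy only shows the state transitions.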
// BAD: No circuit breaker — payment service failure cascades to order service
@Service
public class OrderService {
private final RestTemplate restTemplate;
public PaymentResult processPayment(PaymentRequest req) {
// If payment service is down, this thread blocks for 30s then throws exception
// 100 concurrent users = 100 threads blocked = thread pool exhaustion
return restTemplate.postForObject(
"http://payment-service/api/payments", req, PaymentResult.class);
}
}
// GOOD: Resilience4j @CircuitBreaker with fallback
@Slf4j
@Service
@RequiredArgsConstructor
public class OrderService {
private final PaymentClient paymentClient;
private final PaymentQueueService paymentQueueService;
@CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallback")
@TimeLimiter(name = "paymentService")
public CompletableFuture<PaymentResult> processPayment(PaymentRequest req) {
return CompletableFuture.supplyAsync(() -> paymentClient.charge(req));
}
public CompletableFuture<PaymentResult> paymentFallback(
PaymentRequest req, Throwable ex) {
log.warn("Payment service unavailable, queuing for retry: {}", ex.getMessage());
paymentQueueService.enqueue(req);
return CompletableFuture.completedFuture(PaymentResult.queued(req.orderId()));
}
}
# application.yml — Resilience4j Circuit Breaker configuration
resilience4j:
circuitbreaker:
instances:
paymentService:
failure-rate-threshold: 50 # Open when 50% of calls fail
wait-duration-in-open-state: 30s # Stay open for 30 seconds
sliding-window-size: 10 # Evaluate last 10 requests
permitted-number-of-calls-in-half-open-state: 3
timelimiter:
instances:
paymentService:
timeout-duration: 3s
Production tip: monitor circuit breaker events via the Actuator endpoint (/actuator/circuitbreakerevents) and alert when a circuit opens in production. An open circuit is a symptom of a downstream problem — it should page an on-call engineer.
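The events endpoint is not exposed by default. A hedged sketch of the Actuator configuration, assuming resilience4j-spring-boot and spring-boot-starter-actuator are on the classpath:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health, circuitbreakers, circuitbreakerevents
  health:
    circuitbreakers:
      enabled: true
resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        register-health-indicator: true   # surface circuit state in /actuator/health
```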
3. Saga Pattern — Distributed Transactions Without 2PC
Two-Phase Commit (2PC) requires all participating services to hold locks for the duration of the transaction. In a microservices environment with potentially dozens of participants, this causes severe lock contention, makes the protocol fragile under network failures, and turns the coordinator into a single point of failure. Sagas replace 2PC with a sequence of local transactions, each publishing an event to trigger the next step.
There are two styles: Choreography (services react to events autonomously, no central coordinator) and Orchestration (a central Saga orchestrator drives the workflow and decides the next step).
// BAD: Distributed transaction attempt — locks across services, fails on partial commit
@Transactional
public void placeOrder(PlaceOrderCommand cmd) {
orderService.createOrder(cmd); // Service 1 TX
inventoryService.reserveStock(cmd); // Service 2 TX — if this fails...
paymentService.chargeCustomer(cmd); // Service 3 TX — Service 1 may already be committed
// No clean rollback across service boundaries without 2PC!
}
// GOOD: Orchestration Saga with compensation steps
@Component
@RequiredArgsConstructor
public class PlaceOrderSaga {
private final OrderRepository orderRepository;
private final InventoryClient inventoryClient;
private final PaymentClient paymentClient;
private final ApplicationEventPublisher events;
public SagaResult execute(PlaceOrderCommand cmd) {
Order order = null;
String reservationId = null;
try {
// Step 1: Create order
order = orderRepository.save(Order.pending(cmd));
// Step 2: Reserve inventory (compensate: release reservation)
reservationId = inventoryClient.reserve(cmd.items());
// Step 3: Charge payment (compensate: refund)
PaymentResult payment = paymentClient.charge(cmd.customerId(), cmd.total());
// All steps succeeded — confirm
order.confirm(payment.transactionId());
orderRepository.save(order);
events.publishEvent(new OrderConfirmedEvent(order.getId()));
return SagaResult.success(order.getId());
} catch (InventoryException e) {
// Compensate step 1
if (order != null) orderRepository.delete(order);
return SagaResult.failed("Inventory unavailable: " + e.getMessage());
} catch (PaymentException e) {
// Compensate steps 1 and 2
if (reservationId != null) inventoryClient.release(reservationId);
if (order != null) orderRepository.delete(order);
return SagaResult.failed("Payment declined: " + e.getMessage());
}
}
}
4. CQRS Pattern — Separating Reads and Writes
Command Query Responsibility Segregation (CQRS) separates the model for reading data from the model for writing data. The write model (commands) is optimized for consistency and business rule enforcement; the read model (queries) is optimized for query performance, often using denormalized projections.
// BAD: Same JPA entity for reads and writes — leads to N+1 queries for list views
@RestController
public class OrderController {
@GetMapping("/orders")
public List<Order> getOrders() {
// Loads full Order with all lazy associations — N+1 on OrderItems, Customer, Address
return orderRepository.findAll();
}
@PostMapping("/orders")
public Order createOrder(@RequestBody CreateOrderCommand cmd) {
return orderService.createOrder(cmd); // writes need full entity for validation
}
}
// GOOD: Separate Command model and Query projection model
// Command side: full JPA entity with business logic
@Entity
@Table(name = "orders")
public class Order {
@Id private String id;
@Enumerated(EnumType.STRING) private OrderStatus status;
@OneToMany(cascade = CascadeType.ALL) private List<OrderItem> items;
@Embedded private ShippingAddress shippingAddress;
// Domain methods: confirm(), ship(), cancel()
}
// Query side: flat projection DTO, no lazy loading
public interface OrderSummary {
String getId();
String getCustomerName();
BigDecimal getTotal();
String getStatus();
LocalDateTime getCreatedAt();
}
// Spring Data projection query — single optimized SQL
public interface OrderQueryRepository extends JpaRepository<Order, String> {
@Query("SELECT o.id AS id, c.fullName AS customerName, " +
"o.total AS total, o.status AS status, o.createdAt AS createdAt " +
"FROM Order o JOIN o.customer c WHERE o.customerId = :customerId")
List<OrderSummary> findSummariesByCustomerId(@Param("customerId") String customerId);
}
// Separate handlers per concern
@RestController
@RequiredArgsConstructor
public class OrderController {
private final OrderCommandService commandService;
private final OrderQueryRepository queryRepository;
@PostMapping("/orders")
public ResponseEntity<String> createOrder(@Valid @RequestBody CreateOrderCommand cmd) {
String orderId = commandService.handle(cmd);
return ResponseEntity.created(URI.create("/orders/" + orderId)).body(orderId);
}
@GetMapping("/orders")
public List<OrderSummary> getOrders(@RequestParam String customerId) {
return queryRepository.findSummariesByCustomerId(customerId);
}
}
5. Outbox Pattern — Guaranteed Event Publishing
The dual-write problem: when you need to both save data to a database and publish an event to Kafka, doing them as two separate operations means one can succeed and the other can fail, leaving your system in an inconsistent state. The Outbox pattern solves this by writing the event to an outbox table in the same database transaction, then reading from that table asynchronously to publish to Kafka.
// BAD: Dual-write — Kafka publish after DB save. If Kafka is down, event is permanently lost
@Transactional
public Order createOrder(CreateOrderCommand cmd) {
Order order = orderRepository.save(Order.from(cmd));
kafkaTemplate.send("orders", new OrderCreatedEvent(order.getId())); // LOST if Kafka is down
return order;
}
// GOOD: Outbox pattern — event goes into same DB transaction as domain data
@Entity
@Table(name = "outbox_events")
@Getter @Setter // Lombok accessors — OrderCommandService below relies on the setters
public class OutboxEvent {
@Id @GeneratedValue private Long id;
private String aggregateType;
private String aggregateId;
private String eventType;
@Column(columnDefinition = "TEXT") private String payload;
private LocalDateTime createdAt;
private boolean processed;
}
@Service
@RequiredArgsConstructor
public class OrderCommandService {
private final OrderRepository orderRepository;
private final OutboxEventRepository outboxRepository;
private final ObjectMapper objectMapper;
@Transactional // Both saves are atomic
public String createOrder(CreateOrderCommand cmd) throws JsonProcessingException {
Order order = orderRepository.save(Order.from(cmd));
OutboxEvent outboxEvent = new OutboxEvent();
outboxEvent.setAggregateType("Order");
outboxEvent.setAggregateId(order.getId());
outboxEvent.setEventType("OrderCreated");
outboxEvent.setPayload(objectMapper.writeValueAsString(
new OrderCreatedEvent(order.getId(), order.getCustomerId(), order.getTotal())));
outboxEvent.setCreatedAt(LocalDateTime.now());
outboxRepository.save(outboxEvent); // same TX as order save
return order.getId();
}
}
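The write side above needs a relay that polls the outbox table, publishes each event to Kafka, and marks it processed. Below is a framework-free sketch of one polling pass; in Spring this would typically be an @Scheduled method using KafkaTemplate (or Debezium tailing the table via CDC), and all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Minimal outbox row: id + serialized payload + processed flag
class OutboxRecord {
    final long id;
    final String payload;
    boolean processed;
    OutboxRecord(long id, String payload) { this.id = id; this.payload = payload; }
}

class OutboxRelay {
    // One polling pass: attempt to publish each unprocessed record in insertion
    // order. publish returns true on success; failed records stay unprocessed
    // and are retried on the next pass (at-least-once delivery).
    static List<Long> relayOnce(List<OutboxRecord> outbox, Predicate<String> publish) {
        List<Long> published = new ArrayList<>();
        for (OutboxRecord rec : outbox) {
            if (rec.processed) continue;
            if (publish.test(rec.payload)) {   // e.g. kafkaTemplate.send(...).get()
                rec.processed = true;          // in SQL: UPDATE outbox_events SET processed = true
                published.add(rec.id);
            }
        }
        return published;
    }
}
```

Because a crash between the publish and the processed-flag update causes redelivery, downstream consumers must be idempotent.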
// Simpler alternative: @TransactionalEventListener (Spring-native)
@Service
@RequiredArgsConstructor
public class OrderEventRelay {
private final KafkaTemplate<String, Object> kafkaTemplate;
@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
public void onOrderCreated(OrderCreatedEvent event) {
// This runs AFTER the DB transaction commits successfully
kafkaTemplate.send("order-events", event.orderId(), event);
}
}
6. Bulkhead Pattern — Isolating Failure Domains
Named after the watertight compartments in a ship's hull, the Bulkhead pattern isolates failure domains by assigning separate thread pools or connection pools to different downstream dependencies. This ensures that a slow or failing dependency consumes only its allocated resources and cannot starve other parts of the system.
// BAD: Shared Tomcat thread pool — slow DB query starves all HTTP requests
@Service
public class ProductService {
public Product getProduct(String id) {
return productRepository.findById(id) // slow query blocks a Tomcat thread
.orElseThrow(() -> new ProductNotFoundException(id));
}
public List<Recommendation> getRecommendations(String userId) {
return recommendationClient.get(userId); // 3rd-party API also blocks a Tomcat thread
// If recommendations API is slow, ALL Tomcat threads may get consumed here
}
}
// GOOD: Resilience4j Bulkhead with separate thread pool per downstream
@Service
@RequiredArgsConstructor
public class ProductService {
private final ProductRepository productRepository;
private final RecommendationClient recommendationClient;
@Bulkhead(name = "productDb", type = Bulkhead.Type.THREADPOOL)
public CompletableFuture<Product> getProduct(String id) {
// With Type.THREADPOOL the annotated method already runs on the bulkhead's
// dedicated pool, so return a completed future rather than hopping to the
// common ForkJoinPool via supplyAsync
return CompletableFuture.completedFuture(
productRepository.findById(id)
.orElseThrow(() -> new ProductNotFoundException(id)));
}
@Bulkhead(name = "recommendationApi", type = Bulkhead.Type.THREADPOOL)
@CircuitBreaker(name = "recommendationApi", fallbackMethod = "emptyRecommendations")
public CompletableFuture<List<Recommendation>> getRecommendations(String userId) {
return CompletableFuture.completedFuture(recommendationClient.get(userId));
}
public CompletableFuture<List<Recommendation>> emptyRecommendations(
String userId, Throwable t) {
return CompletableFuture.completedFuture(Collections.emptyList());
}
}
# application.yml — Bulkhead thread pool configuration
resilience4j:
thread-pool-bulkhead:
instances:
productDb:
max-thread-pool-size: 10
core-thread-pool-size: 5
queue-capacity: 20
recommendationApi:
max-thread-pool-size: 5
core-thread-pool-size: 2
queue-capacity: 10
7. Strangler Fig Pattern — Migrating Legacy Monoliths
Named after the strangler fig tree that gradually envelops its host, this pattern enables incremental migration of a monolith to microservices without a big-bang rewrite. A proxy layer intercepts requests and routes them: new features go to new microservices, existing features still go to the monolith. Over time, the monolith is replaced module by module.
Real scenario: A team at a fintech company migrated an 8-year-old Spring MVC monolith to microservices over 18 months. They identified 6 bounded contexts, started with the most-changed module (payment processing), deployed it as a service, and routed traffic via Spring Cloud Gateway. No downtime, no big-bang risk.
// Spring Cloud Gateway routing: new services intercept specific paths
@Configuration
public class GatewayConfig {
@Bean
public RouteLocator routeLocator(RouteLocatorBuilder builder) {
return builder.routes()
// New payment microservice handles /api/payments
.route("payment-service", r -> r
.path("/api/payments/**")
.filters(f -> f.rewritePath("/api/payments/(?<segment>.*)", "/${segment}"))
.uri("lb://payment-service"))
// New inventory microservice handles /api/inventory
.route("inventory-service", r -> r
.path("/api/inventory/**")
.uri("lb://inventory-service"))
// Everything else still goes to the monolith
.route("legacy-monolith", r -> r
.path("/**")
.uri("http://legacy-monolith:8080"))
.build();
}
}
8. Sidecar Pattern — Cross-Cutting Concerns in Kubernetes
The Sidecar pattern deploys a helper container alongside the main application container in the same Kubernetes Pod. The sidecar handles cross-cutting concerns (mTLS, log collection, metrics scraping, retries) without modifying the application code.
Envoy proxy as sidecar (used in Istio Service Mesh) intercepts all inbound and outbound network traffic and implements: TLS termination, circuit breaking, retries, tracing header injection, and metrics collection at the infrastructure layer.
- When to use infrastructure-level patterns (Sidecar/Istio): mTLS between services, distributed tracing headers, L4/L7 retries, canary traffic splitting. These are cross-cutting concerns that should not pollute business code.
- When to use application-level patterns (Resilience4j): Business-specific fallback logic, domain-aware retry strategies (e.g., don't retry non-idempotent operations), custom circuit breaker thresholds per use case.
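A minimal Pod spec showing the sidecar layout. This is illustrative only: image tags, ports, and volume names are placeholders, and in an Istio mesh the Envoy sidecar is injected automatically rather than declared by hand:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: order-service
spec:
  containers:
    - name: app                       # main application container
      image: example/order-service:1.0
      ports:
        - containerPort: 8080
    - name: envoy-proxy               # sidecar: intercepts inbound/outbound traffic
      image: envoyproxy/envoy:v1.28.0
      ports:
        - containerPort: 15001
      volumeMounts:
        - name: envoy-config
          mountPath: /etc/envoy
  volumes:
    - name: envoy-config
      configMap:
        name: order-service-envoy-config
```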
9. Anti-Patterns in Microservices
Recognizing anti-patterns is as valuable as knowing the patterns. These are the five most common mistakes teams make when building microservices:
1. Distributed Monolith
Microservices that still share a single database. Services are independently deployable in theory, but any schema change requires coordinating deployments of all services. You get the complexity of microservices without the benefits. Fix: Each service owns its own database schema. Use the API to share data, never direct DB access.
2. Chatty Services
20 API calls to render one page. Synchronous call chains increase latency proportionally with depth. Fix: Use API aggregation (BFF pattern), GraphQL for flexible queries, or asynchronous event-driven communication for non-blocking operations.
3. Synchronous Chain Without Timeouts
Service A calls B calls C calls D synchronously. If D takes 5 seconds, A's total latency is at minimum 5 seconds, and all threads in A, B, and C are blocked waiting. Fix: Set aggressive timeouts at every hop. Use async patterns for non-critical paths. Apply Circuit Breakers and Bulkheads.
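Setting a timeout at every hop can be sketched with the JDK's built-in HttpClient; the same idea applies to RestTemplate or WebClient, and the URL below is a placeholder:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class TimeoutDemo {
    // Connect timeout bounds TCP connection setup; request timeout bounds the
    // whole exchange. Without both, a stalled downstream blocks this thread indefinitely.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(3))   // HttpTimeoutException if exceeded
                .GET()
                .build();
    }
}
```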
4. Shared Mutable Database
Multiple services writing to the same database tables. This couples services at the data layer — a change to the schema by one team breaks all other services. Fix: Database per service. Expose data via events or read APIs, not direct DB access.
5. Missing API Gateway
Exposing all services directly to clients. Clients must know about all service URLs, handle their own authentication for each service, and deal with version mismatches. Fix: Spring Cloud Gateway or Kong as a single entry point handling auth, rate limiting, routing, and SSL termination.
10. Pattern Selection Guide
| Problem | Pattern | Spring Boot Tool |
|---|---|---|
| Downstream service failing or slow | Circuit Breaker | Resilience4j @CircuitBreaker |
| Multi-service distributed transaction | Saga | Temporal.io / Apache Camel |
| Read and write optimization mismatch | CQRS | Spring Data projections + read replica |
| Guaranteed event delivery to Kafka | Outbox Pattern | Debezium / @TransactionalEventListener |
| Thread starvation from slow dependency | Bulkhead | Resilience4j @Bulkhead |
| Incremental monolith migration | Strangler Fig | Spring Cloud Gateway routing |
| Cross-cutting infra concerns (mTLS, tracing) | Sidecar | Istio / Envoy proxy |
11. Interview Insights & FAQ
Q: What's the difference between Choreography and Orchestration sagas?
Choreography: Each service listens for events and autonomously decides what to do next. No central coordinator. Simple for small flows but hard to trace and debug for complex workflows. Orchestration: A central Saga orchestrator drives the workflow, calling each service in sequence and handling compensation. Easier to reason about and trace, but introduces a single orchestrator component that must be resilient. For complex multi-step workflows (>3 steps), prefer orchestration.
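The choreography style can be made concrete with a toy in-memory event bus. This is illustrative only: in production the bus would be Kafka and each handler a separate service, and all names below are made up:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Choreography in miniature: no central coordinator. Each "service" subscribes
// to the event that concerns it and emits the next event itself.
class ToyEventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
    final List<String> log = new ArrayList<>(); // observed event flow, for inspection

    void subscribe(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        log.add(eventType + ":" + orderId);
        for (Consumer<String> h : handlers.getOrDefault(eventType, List.of())) {
            h.accept(orderId);
        }
    }
}

public class ChoreographyDemo {
    public static void main(String[] args) {
        ToyEventBus bus = new ToyEventBus();
        // "inventory service" reacts to OrderCreated; "payment service" to StockReserved
        bus.subscribe("OrderCreated", id -> bus.publish("StockReserved", id));
        bus.subscribe("StockReserved", id -> bus.publish("PaymentCharged", id));
        bus.publish("OrderCreated", "o-1");
        System.out.println(bus.log); // the full choreographed chain
    }
}
```

Notice that the end-to-end flow exists only implicitly in the subscriptions, which is exactly why choreography becomes hard to trace as the number of steps grows.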
Q: When should you use CQRS?
Use CQRS when your read and write loads have significantly different characteristics: high-volume reads that need denormalized data, or complex domain models that need consistency on writes. Don't apply CQRS to every service by default — it adds significant complexity (two models, eventual consistency between them). A good heuristic: if you're seeing N+1 query problems or your JPA entities are getting dozens of annotations to serve different views, CQRS will help.
Q: How does @TransactionalEventListener differ from @EventListener for the Outbox pattern?
@EventListener fires during the transaction, before commit. If the event listener throws an exception, the transaction rolls back. @TransactionalEventListener with AFTER_COMMIT fires only after the transaction successfully commits. This makes it much safer for publishing to Kafka: you only publish the event if the database commit succeeded. The remaining risk is that the listener itself can fail after the commit, so combine with retry logic or persistent outbox for guaranteed delivery.
Q: How is Circuit Breaker different from Retry?
Retry handles transient failures by retrying the same operation. Circuit Breaker handles systematic failures by stopping attempts altogether when failure rate exceeds a threshold. They are complementary: use Retry for transient glitches (network hiccup) and Circuit Breaker for persistent failures (downstream service down). Always configure retry with exponential backoff and jitter to avoid thundering herd when the downstream recovers.
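A hedged sketch of combining both in Resilience4j configuration (property names as documented for resilience4j-spring-boot; jitter can additionally be added programmatically via IntervalFunction.ofExponentialRandomBackoff):

```yaml
resilience4j:
  retry:
    instances:
      paymentService:
        max-attempts: 3
        wait-duration: 500ms
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2   # waits roughly 500ms, then 1s, then gives up
        retry-exceptions:
          - java.net.SocketTimeoutException
```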
FAQ
Q: Can I use CQRS without Event Sourcing?
Absolutely. CQRS and Event Sourcing are independent patterns that complement each other but are not required together. Most production CQRS implementations use a relational database for the write model and a read replica or materialized view for the query model.
Q: Is the Outbox pattern overkill for small systems?
For systems with fewer than 1,000 events per day and where losing a few events is acceptable, @TransactionalEventListener alone may suffice. The full Outbox pattern with Debezium is necessary when you need guaranteed at-least-once delivery and cannot tolerate event loss (financial transactions, inventory changes, etc.).
Q: Should every microservice implement Circuit Breaker?
Every synchronous outbound call to an external service should be protected by a Circuit Breaker and a timeout. This is non-negotiable for production systems. The Bulkhead is important when one slow dependency could starve resources needed by other, healthier dependencies.
Q: What's the minimum pattern set for a new microservice?
At minimum: Circuit Breaker + timeout on all outbound calls, structured logging with correlation ID, health check endpoint, and graceful shutdown. These four practices prevent the most common production incidents in microservices.
Key Takeaways
- GoF patterns solve OOP design problems; microservices patterns solve distributed systems problems.
- Protect every outbound call with Circuit Breaker + timeout — this prevents cascading failures.
- Use Saga for multi-service workflows; design compensation logic as carefully as the happy path.
- The Outbox pattern is the only reliable solution to the dual-write problem.
- CQRS is not required everywhere — apply it where read and write loads genuinely diverge.
- Strangler Fig is the safest migration strategy for legacy monoliths — avoid big-bang rewrites.