Software Engineer · Java · Spring Boot · Microservices
Strangler Fig Pattern: Migrating Monolith to Microservices Step by Step
Rewriting a production monolith from scratch is a bet most engineering teams lose. Features keep changing, deadlines close in, and the "big bang" migration ships 18 months late with half the original functionality. The Strangler Fig Pattern — named after a tropical fig tree that grows around a host until the host disappears — offers a proven alternative: incrementally extract bounded contexts into microservices while the monolith continues serving live traffic. In this deep dive we walk through the full migration playbook, from identifying seam boundaries with DDD context maps to routing traffic with Spring Cloud Gateway, decomposing shared databases, and safely cutting over production traffic with feature flags and canary releases.
Table of Contents
- Why Monoliths Break Under Scale
- What Is the Strangler Fig Pattern?
- Step-by-Step Migration Playbook
- Spring Boot Implementation: Facade Router Example
- Domain Boundary Identification: Using DDD Context Maps
- Database Decomposition: Shared DB → Database per Service
- Traffic Cutover Strategies: Feature Flags, Canary, Dark Launch
- Production Pitfalls and Anti-Patterns
- Key Takeaways
- Conclusion
1. Why Monoliths Break Under Scale
Every monolith starts as a reasonable design decision. A single deployable unit is easy to develop locally, test end-to-end, and reason about. The trouble begins when the team doubles, the feature surface explodes, and the business demands that OrderService, UserService, and PaymentService all evolve at different rates with different reliability requirements. Four distinct failure modes consistently emerge in production monoliths under scale pressure:
Deployment coupling is the most painful. A bug fix in UserService that should take 20 minutes to ship instead requires a full regression cycle across order management, inventory, and payment flows — because they all live in the same JAR. Teams serialize their release trains, velocity drops, and engineers start fighting over deployment slots. A single failing test in the notification module blocks a critical payment hotfix from reaching production for 48 hours.
Scaling inefficiency follows deployment coupling. Your OrderService needs 20 instances to handle Black Friday throughput while your UserService needs only 3. In a monolith you scale everything or nothing — to get 20 instances' worth of OrderService capacity you must run 20 copies of the entire application, every other service included. Cloud bills balloon while resource utilization stays low.
Technology lock-in compounds over time. The monolith is Spring Boot 2.4 on Java 11. The team wants to adopt virtual threads (Java 21), reactive WebFlux for the high-throughput notification pipeline, and a graph database for the recommendation engine. None of these are practical upgrades inside a 600-KLOC monolith with 4,000 integration tests — the blast radius of each library version bump is enormous.
Reliability blast radius is the final pressure point. A memory leak in the ReportingService — which processes heavy analytical queries — causes GC pressure that degrades response times across the entire application, including the payment checkout flow. In a well-bounded microservices architecture, a degraded reporting instance is an isolated problem. In the monolith, it is a 4 AM P0 incident for the entire platform.
2. What Is the Strangler Fig Pattern?
Martin Fowler coined the term in 2004, inspired by the strangler fig tree (Ficus aurea) that germinates in the canopy of a rainforest host tree, sends roots down to the soil, and over decades completely encases the host — which eventually dies and rots away, leaving the fig tree as a self-supporting hollow shell around the original structure. The metaphor is precise: the new microservices system gradually wraps around the monolith, intercepting more and more traffic, until the monolith can be safely decommissioned.
The core principle is deceptively simple: introduce a facade (proxy/router) in front of the monolith that can redirect specific request paths to new microservices while passing everything else through to the monolith unchanged. From the client's perspective nothing changes — the same DNS name, the same API contracts, the same authentication headers. Under the hood, a growing percentage of requests are handled by new, independently deployable services.
The migration unfolds in three phases, visualized below as a text diagram:
Phase 1: Facade introduced — all traffic still hits monolith
┌─────────────┐ ┌──────────────────────────────────┐
│ Client │──────▶│ Strangler Facade / API Gateway │
└─────────────┘ └──────────────────┬───────────────┘
│ (all routes)
▼
┌──────────────────┐
│ Monolith │
└──────────────────┘
Phase 2: First service extracted — OrderService migrated
┌─────────────┐ ┌──────────────────────────────────┐
│ Client │──────▶│ Strangler Facade / API Gateway │
└─────────────┘ └────────┬─────────────────┬───────┘
/orders/* │ │ (everything else)
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ OrderService │ │ Monolith │
│ (microservice) │ └──────────────────┘
└──────────────────┘
Phase 3: Migration complete — monolith decommissioned
┌─────────────┐ ┌──────────────────────────────────┐
│ Client │──────▶│ Strangler Facade / API Gateway │
└─────────────┘ └──┬──────────┬──────────┬─────────┘
/orders/* │ /users/* │ /pay/* │ ...
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│OrderSvc │ │UserSvc │ │PaymentSvc│
└──────────┘ └──────────┘ └──────────┘
The key guarantee is that at every point in the migration, the system is fully functional and can serve production traffic. There is no "migration weekend" where the entire platform goes down. You can pause the migration at Phase 2 indefinitely if business priorities change, and the platform continues working correctly.
3. Step-by-Step Migration Playbook
The migration follows four concrete phases. Skipping any phase dramatically increases risk — each phase validates assumptions that the next phase depends on.
Phase 1 — Identify Seams. A seam is a boundary in the monolith along which you can cut without breaking other parts. Seams typically align with bounded contexts in Domain-Driven Design: OrderManagement, UserProfile, PaymentProcessing, InventoryControl. Start by mapping which modules have low coupling to the rest of the codebase — these are your low-risk first extractions. Draw a dependency graph using static analysis tools like Structure101 or jQAssistant, or by examining Spring bean injection graphs. Modules with fewer than 5 cross-boundary method calls per class are good extraction candidates.
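To make the coupling heuristic concrete, here is a toy sketch — plain Java, with hypothetical module names and a simplified edge list standing in for a real jQAssistant or Structure101 export — that ranks modules by outbound cross-boundary calls. Lower counts mean lower-risk first extractions:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Toy seam analysis: ranks modules by outbound cross-boundary calls
 * taken from a dependency edge list (e.g. a static-analysis export).
 * Module names and the edge format are illustrative, not a real API.
 */
public class SeamAnalyzer {

    /** A call from a class in one module to a class in another module. */
    public record CrossBoundaryCall(String fromModule, String toModule) {}

    /** Counts outbound cross-boundary calls per module; fewer = better extraction candidate. */
    public static Map<String, Long> outboundCouplingByModule(List<CrossBoundaryCall> calls) {
        return calls.stream()
                .filter(c -> !c.fromModule().equals(c.toModule()))   // ignore intra-module calls
                .collect(Collectors.groupingBy(CrossBoundaryCall::fromModule,
                                               Collectors.counting()));
    }
}
```

A module that never appears as a `fromModule` (here, a hypothetical `payment` module) makes no outbound cross-boundary calls at all — the leaf-service profile that Section 8 recommends extracting first.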
Phase 2 — Deploy the Strangler Facade. Before extracting any service, deploy a reverse proxy or API gateway in front of the monolith. Initially it passes 100% of traffic through. This establishes the routing infrastructure and gives you operational confidence with zero business risk. Spring Cloud Gateway is the natural choice for Spring Boot shops — it integrates with Spring Security, Micrometer observability, and Resilience4j circuit breakers out of the box. Alternatively, NGINX, Envoy, or AWS API Gateway serve this role for polyglot environments.
Phase 3 — Incremental Extraction. Extract one bounded context at a time, in ascending order of coupling complexity. Start with a read-heavy, low-write service like UserProfileService where the data model is stable. Build the new microservice, deploy it, and configure the facade to route /api/users/** to the new service. Run the monolith and the new service in parallel for at least two weeks — compare response payloads, latency distributions, and error rates to validate correctness before cutting over. This "dark launch" pattern (covered in Section 7) is essential for catching behavioral differences you didn't anticipate.
Phase 4 — Traffic Cutover and Monolith Decommission. Once a new service is validated, update the facade routing to direct 100% of traffic to it. Delete the corresponding module from the monolith on the next sprint cycle. Over multiple sprints, the monolith shrinks to a thin shell, and eventually you can shut down the last monolith instance. The decommission is an anticlimactic non-event — which is exactly how it should feel.
The following table summarises what each migration stage looks like in practice for a typical e-commerce platform:

Stage                      | Facade routing                              | Risk profile
---------------------------|---------------------------------------------|--------------------------------------
1. Identify seams          | Facade not yet deployed                     | None — analysis only
2. Deploy facade           | 100% pass-through to the monolith           | Infrastructure change, no behaviour change
3. Incremental extraction  | /api/users/** → UserService, rest → monolith| Per-service, reversible via routing config
4. Cutover & decommission  | 100% → microservices, monolith retired      | Low — validated by dark launch and canary
4. Spring Boot Implementation: Facade Router Example
The strangler facade can be implemented as a standalone Spring Boot application using Spring Cloud Gateway. The gateway configuration maps URL patterns to either the new microservices or the legacy monolith, and includes a StranglerFigFilter that handles feature-flag-controlled routing decisions at runtime without requiring a gateway redeploy.
First, the gateway application.yml that defines static routing rules:
# application.yml — Spring Cloud Gateway as Strangler Facade
spring:
  cloud:
    gateway:
      routes:
        # Extracted: UserService (fully migrated)
        - id: user-service
          uri: lb://user-service
          predicates:
            - Path=/api/users/**
          filters:
            - name: CircuitBreaker
              args:
                name: userServiceCB
                fallbackUri: forward:/fallback/users
        # Extracted: OrderService (partially migrated — feature-flag controlled)
        - id: order-service
          uri: lb://order-service
          predicates:
            - Path=/api/orders/**
            - Header=X-Strangler-Route, new
          filters:
            # $\{segment} is escaped so Spring's property resolver leaves it for the filter
            - RewritePath=/api/orders/(?<segment>.*), /orders/$\{segment}
        # Fallback: Legacy Monolith catches all unmatched routes
        - id: monolith-fallback
          uri: http://legacy-monolith:8080
          predicates:
            - Path=/**
          filters:
            - name: Retry
              args:
                retries: 2
                statuses: SERVICE_UNAVAILABLE
For runtime routing decisions based on feature flags (rather than static YAML predicates), implement a custom GlobalFilter. This is the core of the Strangler Fig pattern — it inspects every incoming request and injects a routing header based on the current feature flag state:
import java.time.Instant;

import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.http.server.reactive.ServerHttpRequest;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;

import lombok.extern.slf4j.Slf4j;
import reactor.core.publisher.Mono;

@Component
@Slf4j
public class StranglerFigRoutingFilter implements GlobalFilter, Ordered {

    private final FeatureFlagService featureFlags;

    public StranglerFigRoutingFilter(FeatureFlagService featureFlags) {
        this.featureFlags = featureFlags;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String path = exchange.getRequest().getPath().value();
        String userId = exchange.getRequest().getHeaders()
                .getFirst("X-User-Id");

        // Route /api/orders/** to new OrderService only when the flag is enabled
        if (path.startsWith("/api/orders")
                && featureFlags.isEnabled("order-service-migration", userId)) {
            log.debug("Routing orders request to new OrderService for userId={}", userId);
            ServerHttpRequest mutatedRequest = exchange.getRequest().mutate()
                    .header("X-Strangler-Route", "new")
                    .header("X-Migrated-At", Instant.now().toString())
                    .build();
            return chain.filter(exchange.mutate().request(mutatedRequest).build());
        }

        // All other traffic: pass through to monolith unchanged
        return chain.filter(exchange);
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;
    }
}
The FeatureFlagService is a thin abstraction over LaunchDarkly, Unleash, or a simple database-backed feature flag store. The key design decision is that the routing logic lives in the facade, not in the monolith — this means you can change routing behaviour without touching the monolith codebase at all:
import java.util.Set;
import java.util.concurrent.TimeUnit;

import org.springframework.stereotype.Service;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

import lombok.extern.slf4j.Slf4j;

@Service
@Slf4j
public class FeatureFlagService {

    private final FeatureFlagRepository repository;
    private final LoadingCache<String, Set<String>> flagCache;

    public FeatureFlagService(FeatureFlagRepository repository) {
        this.repository = repository;
        this.flagCache = Caffeine.newBuilder()
                .expireAfterWrite(30, TimeUnit.SECONDS)
                .build(flagName -> repository.findEnabledUserIds(flagName));
    }

    /**
     * Returns true if the named feature flag is enabled for this user.
     * Supports three modes:
     *   - "all"       : flag enabled for everyone
     *   - "canary"    : flag enabled for a % of users based on ID hash
     *   - "whitelist" : flag enabled for specific user IDs
     */
    public boolean isEnabled(String flagName, String userId) {
        try {
            FeatureFlag flag = repository.findByName(flagName)
                    .orElse(FeatureFlag.disabled());
            return switch (flag.getMode()) {
                case ALL -> true;
                case DISABLED -> false;
                case CANARY -> isInCanaryCohort(userId, flag.getPercentage());
                case WHITELIST -> flagCache.get(flagName).contains(userId);
            };
        } catch (Exception e) {
            log.warn("Feature flag lookup failed for {}, defaulting to false", flagName, e);
            return false; // fail-safe: route to monolith on flag lookup failure
        }
    }

    private boolean isInCanaryCohort(String userId, int percentage) {
        // Stable per-user bucketing: the same user always lands in the same cohort
        int hash = Math.abs(userId.hashCode() % 100);
        return hash < percentage;
    }
}
Note the fail-safe default in the feature flag lookup: when the flag service is unavailable, traffic falls back to the monolith. This is a critical safety property of the strangler facade — a failure in the migration infrastructure should never cause a customer-facing outage. The monolith is always the safe fallback.
5. Domain Boundary Identification: Using DDD Context Maps
The single biggest mistake in monolith decomposition is drawing service boundaries along technical layers (controller → service → repository) rather than domain boundaries. Technical boundaries produce "nano-services" that must be called sequentially to complete any business operation — you've added network latency and operational complexity without gaining the independence that microservices promise.
Domain-Driven Design's Context Map is the correct tool for boundary identification. A bounded context is a region of the domain where a specific ubiquitous language applies consistently. Within the OrderManagement bounded context, the term "Customer" means "the entity who placed the order with a specific delivery address." Within the Loyalty bounded context, "Customer" means "the account holder with a points balance." These are different models — forcing them into a single shared Customer class is where monolith corruption begins.
To build the context map for your monolith, follow this process: first, run event-storming workshops with domain experts to identify the core domain events. For an e-commerce platform these might be: OrderPlaced, PaymentAuthorized, InventoryReserved, ShipmentDispatched, LoyaltyPointsAwarded. Group events by the team or subdomain that owns them — these groupings naturally reveal bounded context boundaries.
Second, map the relationships between bounded contexts using DDD's integration patterns. An Upstream/Downstream relationship means one context's model is imposed on another. A Shared Kernel means two contexts share a small common model (dangerous during migration — minimize this). An Anti-Corruption Layer (ACL) is a translation boundary that keeps bounded contexts genuinely independent. In Spring Boot, an ACL is typically implemented as a mapper class that translates between the external model and the internal domain model:
import java.time.Instant;

import org.springframework.stereotype.Component;

/**
 * Anti-Corruption Layer: translates the monolith's UserDTO model
 * into the UserService's bounded context model.
 * Keeps UserService completely decoupled from monolith domain objects.
 */
@Component
public class UserContextTranslator {

    // Converts monolith representation → UserService domain model
    public UserProfile fromMonolithDto(MonolithUserDTO dto) {
        return UserProfile.builder()
                .userId(UserId.of(dto.getId()))
                .displayName(dto.getFirstName() + " " + dto.getLastName())
                .emailAddress(EmailAddress.of(dto.getEmail()))
                .registeredAt(dto.getCreatedAt().toInstant())
                // Note: monolith "userType" maps to our RoleEnum differently
                .role(mapRole(dto.getUserType()))
                .build();
    }

    // Converts UserService domain model → monolith event payload
    public UserUpdatedEvent toMonolithEvent(UserProfile profile) {
        return UserUpdatedEvent.builder()
                .legacyUserId(profile.getUserId().value())
                .fullName(profile.getDisplayName())
                .email(profile.getEmailAddress().value())
                .eventTimestamp(Instant.now())
                .build();
    }

    private RoleEnum mapRole(String monolithUserType) {
        return switch (monolithUserType) {
            case "ADMIN", "SUPER_ADMIN" -> RoleEnum.ADMINISTRATOR;
            case "SELLER" -> RoleEnum.MERCHANT;
            default -> RoleEnum.CUSTOMER;
        };
    }
}
The ACL is not boilerplate — it is a conscious architectural decision to accept translation overhead in exchange for genuine bounded context independence. Once UserService is fully extracted and the monolith's user model evolves, only the ACL needs to change, not the UserService's domain logic.
6. Database Decomposition: Shared DB → Database per Service
Database decomposition is where most monolith migrations stall. It is tempting to leave the new microservices reading from the monolith's PostgreSQL database with a different schema prefix — this is the "Shared Database" anti-pattern, and it destroys the independence you're trying to achieve. Any schema migration in the monolith can break the microservice, and you can never independently scale the database tier.
The safest decomposition path uses the dual-write, then migrate, then decommission sequence:
Step 1 — Introduce a logical schema boundary. Before the microservice is extracted, move the target tables into a dedicated PostgreSQL schema (e.g., user_svc) within the shared database. Add access control so only the monolith's user-related code touches those tables. This creates the logical boundary while keeping the physical database shared.
Step 2 — Dual-write with the Outbox Pattern. When the new UserService is deployed, configure the monolith to write to both the original tables and publish domain events via the Transactional Outbox pattern. The UserService consumes these events and builds its own read model. This ensures the new service's database is a consistent replica of the monolith's data during the migration window:
import java.time.Instant;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Monolith: Outbox-based dual-write during migration
@Service
@Transactional
public class MonolithUserService {

    private final UserRepository userRepository;
    private final OutboxEventRepository outboxRepository;

    public MonolithUserService(UserRepository userRepository,
                               OutboxEventRepository outboxRepository) {
        this.userRepository = userRepository;
        this.outboxRepository = outboxRepository;
    }

    public void updateUserProfile(UpdateUserCommand cmd) {
        // 1. Update monolith's own table
        User user = userRepository.findById(cmd.userId())
                .orElseThrow(() -> new UserNotFoundException(cmd.userId()));
        user.updateProfile(cmd.displayName(), cmd.email());
        userRepository.save(user);

        // 2. Write event to Outbox table IN THE SAME TRANSACTION —
        //    Debezium CDC will tail this table and publish to Kafka
        OutboxEvent event = OutboxEvent.builder()
                .aggregateType("User")
                .aggregateId(cmd.userId())
                .eventType("UserProfileUpdated")
                .payload(toJson(UserProfileUpdatedPayload.from(user))) // JSON serialization helper
                .createdAt(Instant.now())
                .build();
        outboxRepository.save(event);
        // Transaction commits both atomically — no dual-write inconsistency
    }
}
UserService consumes from Kafka and applies events to its own PostgreSQL schema. This eliminates dual-write inconsistency at the application level — the monolith writes once, atomically, to its own DB, and the event propagation is handled by infrastructure. At the end of the migration window, point all writes to UserService and retire the monolith's user tables.
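Because Kafka delivers at-least-once, the UserService projector must tolerate duplicate events. The sketch below models that idempotency requirement in plain Java — the event record, in-memory store, and names are simplified stand-ins for the real Kafka consumer and JPA read model:

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the consumer side of the Outbox/CDC bridge: events may be
 * redelivered, so the read-model projector deduplicates by event ID.
 * The in-memory maps stand in for the service's own database.
 */
public class UserReadModelProjector {

    public record UserProfileUpdated(String eventId, String userId, String displayName) {}

    private final Map<String, String> displayNamesByUserId = new LinkedHashMap<>();
    private final Set<String> processedEventIds = new HashSet<>();

    /** Applies the event once; duplicate deliveries are no-ops. Returns true if applied. */
    public boolean apply(UserProfileUpdated event) {
        if (!processedEventIds.add(event.eventId())) {
            return false; // already processed — idempotent skip
        }
        displayNamesByUserId.put(event.userId(), event.displayName());
        return true;
    }

    public String displayNameOf(String userId) {
        return displayNamesByUserId.get(userId);
    }
}
```

In the real service the processed-event-ID set would live in the same database transaction as the read-model update, so a crash between "record event" and "apply event" cannot desynchronise them.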
Step 3 — Validate consistency, then switch the write path. Run both databases in parallel for at least two weeks. Write a reconciliation job that queries both databases every 15 minutes and alerts on any row-count discrepancies or value mismatches. Once the discrepancy rate is zero for 7 consecutive days, update the strangler facade to route write requests to the new UserService. The monolith's user tables become read-only. After another week of zero-discrepancy validation, decommission the monolith's user tables entirely.
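The core of the reconciliation job can be sketched as a pure comparison function — here with in-memory maps standing in for paged queries against the two databases, and with hypothetical row IDs:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the reconciliation check: compares snapshots of the same
 * logical rows from the monolith DB and the new service DB. A production
 * job would page through both databases; plain maps stand in here.
 */
public class ReconciliationJob {

    /** Returns human-readable discrepancies; an empty list means the stores agree. */
    public static List<String> reconcile(Map<String, String> monolithRows,
                                         Map<String, String> serviceRows) {
        List<String> diffs = new ArrayList<>();
        Set<String> allIds = new HashSet<>(monolithRows.keySet());
        allIds.addAll(serviceRows.keySet());
        for (String id : allIds) {
            String monolithValue = monolithRows.get(id);
            String serviceValue = serviceRows.get(id);
            if (monolithValue == null)      diffs.add("missing in monolith: " + id);
            else if (serviceValue == null)  diffs.add("missing in service: " + id);
            else if (!monolithValue.equals(serviceValue)) diffs.add("value mismatch for " + id);
        }
        return diffs;
    }
}
```

Wiring this into a scheduled task that alerts when the returned list is non-empty gives you the 15-minute discrepancy monitor described above.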
7. Traffic Cutover Strategies: Feature Flags, Canary, Dark Launch
Cutting over traffic from the monolith to a new microservice is never binary. The most resilient migrations use a progression of increasingly bold cutover strategies, each validating the new service's correctness before the next step.
Dark Launch (Shadow Traffic). The safest first step: mirror a percentage of production traffic to the new service without returning its response to the client. The client always gets the monolith's response. Capture the new service's response, compare it to the monolith's, and log any discrepancies. This validates correctness without any customer impact. Implement this in the strangler facade with a WebClient fire-and-forget call:
import java.net.URI;

import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.stereotype.Component;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.server.ServerWebExchange;

import io.micrometer.core.instrument.MeterRegistry;
import lombok.extern.slf4j.Slf4j;
import reactor.core.publisher.Mono;

@Component
@Slf4j
public class DarkLaunchFilter implements GlobalFilter, Ordered {

    private final WebClient newOrderServiceClient;
    private final ResponseComparator comparator;
    private final MeterRegistry metrics;

    public DarkLaunchFilter(WebClient newOrderServiceClient,
                            ResponseComparator comparator,
                            MeterRegistry metrics) {
        this.newOrderServiceClient = newOrderServiceClient;
        this.comparator = comparator;
        this.metrics = metrics;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        if (!isDarkLaunchCandidate(exchange)) {
            return chain.filter(exchange);
        }

        // Fire shadow request to new OrderService — do not block primary flow
        Mono<String> shadowRequest = newOrderServiceClient
                .method(exchange.getRequest().getMethod())
                .uri(buildShadowUri(exchange))
                .headers(h -> h.addAll(exchange.getRequest().getHeaders()))
                .retrieve()
                .bodyToMono(String.class)
                .doOnNext(shadowResponse -> {
                    // Compare shadow response with monolith's response asynchronously.
                    // (Capturing the monolith's body requires a response-caching
                    // decorator, omitted here for brevity.)
                    comparator.compare(exchange, shadowResponse)
                            .ifPresent(diff -> {
                                log.warn("Dark launch divergence detected: {}", diff);
                                metrics.counter("strangler.dark_launch.divergence").increment();
                            });
                })
                .onErrorResume(ex -> {
                    // Shadow failure never affects the primary response
                    log.error("Dark launch shadow request failed", ex);
                    metrics.counter("strangler.dark_launch.error").increment();
                    return Mono.empty();
                });

        // Subscribe asynchronously — do not await shadow response
        shadowRequest.subscribe();

        // Return the monolith's response to the client as usual
        return chain.filter(exchange);
    }

    private boolean isDarkLaunchCandidate(ServerWebExchange exchange) {
        // Shadow only the order routes during the OrderService dark launch
        return exchange.getRequest().getPath().value().startsWith("/api/orders");
    }

    private String buildShadowUri(ServerWebExchange exchange) {
        // Relative path + query, resolved against the WebClient's OrderService base URL
        URI uri = exchange.getRequest().getURI();
        return uri.getRawPath() + (uri.getRawQuery() != null ? "?" + uri.getRawQuery() : "");
    }

    @Override
    public int getOrder() { return -90; }
}
Canary Release. After dark launch validation, route a small percentage of live traffic (1%, then 5%, then 25%, then 100%) to the new service. Use the feature flag's canary mode from Section 4 — route users whose ID hash modulo 100 falls below the canary percentage. Monitor error rates, p99 latency, and business KPIs (order completion rates, payment success rates) at each step. Automate rollback via an alert threshold on the error rate metric: if the new service's error rate exceeds 0.5% within a 5-minute window, the feature flag automatically disables the canary cohort.
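The automated-rollback decision described above can be modelled in a few lines — this is a plain-Java sketch of the threshold logic only (the class name and window handling are illustrative; in production you would drive this from Prometheus alerts or Resilience4j metrics rather than an in-process counter):

```java
/**
 * Sketch of the automated-rollback guard for a canary cohort: tracks
 * request outcomes within a window and trips once the error rate exceeds
 * a threshold, subject to a minimum sample size to avoid noisy trips.
 */
public class CanaryRollbackGuard {

    private final double errorRateThreshold; // e.g. 0.005 for 0.5%
    private final int minSamples;            // don't decide on tiny windows
    private long total;
    private long errors;

    public CanaryRollbackGuard(double errorRateThreshold, int minSamples) {
        this.errorRateThreshold = errorRateThreshold;
        this.minSamples = minSamples;
    }

    /** Records one request outcome; returns true when the canary should be rolled back. */
    public synchronized boolean record(boolean success) {
        total++;
        if (!success) errors++;
        return total >= minSamples && (double) errors / total > errorRateThreshold;
    }

    /** Call at each window boundary (e.g. every 5 minutes) to start fresh. */
    public synchronized void resetWindow() {
        total = 0;
        errors = 0;
    }
}
```

When `record` returns true, the controller flips the feature flag's canary percentage back to 0, sending the cohort back to the monolith within one flag-cache refresh.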
Blue/Green with Instant Rollback. For the final 0% → 100% flip, use a blue/green deployment at the gateway level. Both the monolith and the new service are live. A single configuration change in the gateway routes all traffic to the new service. Rollback is a single config change back. Keep the monolith running in standby for 48 hours after the cutover, then decommission it once you're confident the new service is stable under full production load.
8. Production Pitfalls and Anti-Patterns
The Distributed Monolith. The most dangerous outcome is creating microservices that are logically independent but operationally coupled. If OrderService makes a synchronous REST call to UserService during the critical order placement path, you've replaced in-process coupling with network coupling — and added latency, timeout risk, and the need for circuit breakers. Before extracting a service, audit every synchronous call it would need to make at runtime. If the list is long, the service is not genuinely bounded.
Extracting Before Refactoring. Pulling a poorly designed module out of a monolith and wrapping it in a Docker container does not make it a good microservice — it makes it a bad microservice that is harder to change. Always refactor the module's internal design first (single responsibility, clean API, testable domain logic), then extract it into an independent deployable. The strangler facade approach gives you the time to do this refactoring incrementally without blocking the migration.
Ignoring Distributed Transactions. The monolith uses ACID database transactions to keep orders, payments, and inventory consistent. Once these are in separate services with separate databases, a 2-phase commit is impractical. You must adopt the Saga pattern — either orchestration-based (Temporal, Spring State Machine) or choreography-based (event-driven Sagas via Kafka). Plan the transactional boundary changes before extraction, not after. Retrofitting Saga patterns onto already-extracted services is painful and error-prone.
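The orchestration-based variant can be sketched in plain Java — this is the bare control flow only (step names are illustrative, and real implementations such as Temporal add the persistence, retries, and timeouts that this deliberately omits):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/**
 * Minimal orchestration-style Saga sketch: run steps in order; if one
 * fails, run the compensations of the already-completed steps in
 * reverse order, restoring consistency without a distributed transaction.
 */
public class SagaOrchestrator {

    public record SagaStep(String name, Runnable action, Runnable compensation) {}

    /** Returns true if all steps committed, false if the saga was compensated. */
    public static boolean execute(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step);                     // remember for potential undo
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run(); // undo in reverse order
                }
                return false;
            }
        }
        return true;
    }
}
```

For an order saga this means: if `chargePayment` fails after `reserveInventory` succeeded, the inventory reservation is released — the business outcome ACID used to guarantee, achieved through explicit compensation.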
Premature Extraction Order. Extracting payment processing before user management creates a dependency problem: PaymentService needs to validate user accounts, but UserService hasn't been extracted yet — so it reads directly from the monolith's user tables. This creates a hidden coupling that surfaces as a production incident the moment user table schemas change. Always extract in dependency order: leaf services (no outbound calls to other bounded contexts) first, core services last.
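Dependency-ordered extraction is just a topological sort of the context-to-context call graph. Here is a sketch using Kahn's algorithm — context names and the `deps` input shape are illustrative — that emits leaf contexts (no outbound calls) before the contexts that depend on them:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of extraction planning: Kahn's algorithm over the "calls" graph,
 * emitting leaf bounded contexts first and callers only after everything
 * they depend on has been extracted. A cycle means the contexts are not
 * cleanly bounded and need refactoring before extraction.
 */
public class ExtractionPlanner {

    /** deps maps each context to the contexts it calls at runtime (every context is a key). */
    public static List<String> extractionOrder(Map<String, Set<String>> deps) {
        Map<String, Integer> outDegree = new HashMap<>();
        Map<String, Set<String>> callers = new HashMap<>();
        deps.forEach((svc, targets) -> {
            outDegree.put(svc, targets.size());
            for (String target : targets) {
                callers.computeIfAbsent(target, k -> new HashSet<>()).add(svc);
            }
        });
        Deque<String> ready = new ArrayDeque<>();
        outDegree.forEach((svc, d) -> { if (d == 0) ready.add(svc); }); // leaves first
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String svc = ready.poll();
            order.add(svc);
            for (String caller : callers.getOrDefault(svc, Set.of())) {
                if (outDegree.merge(caller, -1, Integer::sum) == 0) ready.add(caller);
            }
        }
        if (order.size() != deps.size()) {
            throw new IllegalStateException("dependency cycle — contexts are not cleanly bounded");
        }
        return order;
    }
}
```

For the example in the paragraph above, this guarantees `user` is extracted before `payment`, so PaymentService never needs a back-door read of the monolith's user tables.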
Inadequate Observability. A monolith produces a single set of logs, metrics, and traces. After extraction, each service produces its own telemetry. Without a distributed tracing strategy in place from Day 1 of the facade deployment, debugging cross-service failures becomes a manual log correlation exercise across multiple Kibana dashboards. Deploy OpenTelemetry instrumentation and a Jaeger or Tempo backend before the first service extraction — not as an afterthought.
"The Strangler Fig application is a way to gradually migrate a legacy system by replacing specific pieces of functionality with new applications and services. As features from the legacy system are replaced, the new system eventually replaces all of the old system's features, strangling the old system and allowing you to decommission it."
— Strangler Fig pattern, Azure Architecture Center (pattern named and described by Martin Fowler, 2004)
Key Takeaways
- Never rewrite from scratch — the Strangler Fig pattern enables incremental migration with zero downtime by introducing a facade that routes traffic to both monolith and new services simultaneously.
- Deploy the facade before the first extraction — the routing infrastructure must be operational and proven before any service is pulled out of the monolith, establishing safe operational habits early.
- Use DDD context maps to identify seams — boundaries drawn along domain events and ubiquitous language produce genuinely independent services; boundaries drawn along technical layers produce distributed monoliths.
- Database decomposition follows the Dual-Write + CDC pattern — use Debezium to propagate monolith writes to the new service's database atomically, eliminating dual-write inconsistency at the application level.
- Dark launch before canary before full cutover — validate new service correctness against real production traffic without any customer impact before routing live responses through the new service.
- Extract in dependency order — leaf services (no cross-context dependencies) first, core services last, to avoid creating hidden inter-service couplings during the migration window.
- Plan distributed transactions before extraction — Saga patterns must be designed into the extraction, not retrofitted after; ACID guarantees disappear the moment shared database transactions span service boundaries.
- Observability is a prerequisite, not an afterthought — deploy OpenTelemetry distributed tracing before the first service extraction; debugging cross-service failures without traces is an operational nightmare.
Conclusion
The Strangler Fig Pattern is not a silver bullet — it is a discipline. It requires more upfront investment in routing infrastructure, observability tooling, and careful dependency analysis than a naive "rewrite in parallel" approach. But it pays those costs back with interest: every sprint you execute the playbook, you are shipping production microservices that are serving real traffic, not aspirational code that exists only in a feature branch.
For Spring Boot teams, the combination of Spring Cloud Gateway (facade router), Debezium CDC (database bridge), Kafka (event bus), and feature flags (canary controller) provides a complete, production-proven toolkit for the migration. Start with the simplest bounded context in your monolith — perhaps a read-heavy reference data service — deploy the facade in front of it, and execute the dark launch validation cycle. The operational muscle memory you build on the first, low-risk extraction will serve you well when you reach the complex, high-coupling extractions of PaymentService and OrderService.
Monolith decomposition is ultimately a sociotechnical challenge as much as a technical one. The strangler fig metaphor is apt in another way: the process is slow, patient, and inevitable. Teams that commit to the discipline — seam identification, facade-first, incremental extraction, observability from Day 1 — consistently reach the decommission milestone. Teams that skip steps in pursuit of speed consistently find themselves with a bigger, more complex problem than the monolith they started with.