Spring Cloud Gateway in Production: Routing, Rate Limiting, JWT Auth & Resilience (2026)
A complete production guide to Spring Cloud Gateway: predicate-based routing, custom global filters, JWT authentication, Redis token bucket rate limiting, Resilience4j circuit breaker integration, CORS, observability, and Netty tuning.
1. Why Spring Cloud Gateway?
| Feature | Zuul 1 | Spring Cloud Gateway | Kong |
|---|---|---|---|
| I/O Model | Blocking (Servlet) | ✅ Non-blocking (Netty) | ✅ Non-blocking (Nginx) |
| Spring Boot native | ✅ Yes | ✅ Yes | ❌ Separate process |
| Resilience4j native | ❌ No | ✅ Yes | ❌ Plugin needed |
| Redis rate limiting | ❌ Manual | ✅ Built-in | ✅ Plugin |
| Custom Java filters | ✅ Yes | ✅ Yes (reactive) | ❌ No (plugins in Lua, Go, Python, or JavaScript) |
2. Architecture: Predicates, Filters & Route Pipeline
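Every request in SCG flows through a pipeline: Route Matching (predicates) → Global and Route-Specific Filters (pre phase) → Downstream Service → the same filters in reverse (post phase) → Response. Predicates are evaluated by the handler mapping before any GlobalFilter or GatewayFilter runs; global and route filters are then merged into a single chain sorted by order.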
- Predicates: Path, Host, Method, Header, Query, After/Before/Between, Weight (canary)
- Built-in filters: AddRequestHeader, AddResponseHeader, RewritePath, StripPrefix, CircuitBreaker, RequestRateLimiter, Retry, DedupeResponseHeader
- Execution order: Pre-filters run in ascending order; post-filters run in descending order (onion model)
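The onion model in the last bullet can be illustrated without any Spring types. The sketch below is a simplified stand-in, not SCG's implementation: `NamedFilter` and `OnionOrderDemo` are hypothetical names, and each filter only records when its pre and post logic would run.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of ordered filters; not a Spring Cloud Gateway type. */
final class OnionOrderDemo {

    /** A filter reduced to its name and order value (lower order runs first). */
    record NamedFilter(String name, int order) {}

    /** Runs pre logic in ascending order, then unwinds post logic in reverse. */
    static List<String> run(List<NamedFilter> filters) {
        List<NamedFilter> sorted = new ArrayList<>(filters);
        sorted.sort(Comparator.comparingInt(NamedFilter::order));
        List<String> trace = new ArrayList<>();
        invoke(sorted, 0, trace);
        return trace;
    }

    private static void invoke(List<NamedFilter> sorted, int index, List<String> trace) {
        if (index == sorted.size()) {
            trace.add("downstream"); // the proxied call sits at the centre of the onion
            return;
        }
        NamedFilter current = sorted.get(index);
        trace.add("pre:" + current.name());
        invoke(sorted, index + 1, trace); // equivalent of chain.filter(exchange)
        trace.add("post:" + current.name());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(
                new NamedFilter("logging", 5),
                new NamedFilter("auth", 0),
                new NamedFilter("rateLimit", 10))));
    }
}
```

The filter with the lowest order value sees the request first and the response last, which is why an authentication filter is usually given the highest precedence.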
3. Routing & Service Discovery
spring:
  cloud:
    gateway:
      routes:
        - id: user-service
          uri: http://192.168.1.10:8081 # Hardcoded IP: breaks on any redeployment
          predicates:
            - Path=/api/users/**
spring:
  cloud:
    gateway:
      discovery:
        locator:
          enabled: true
          lower-case-service-id: true
      routes:
        - id: user-service
          uri: lb://user-service # lb:// resolves via Eureka/Consul and Spring Cloud LoadBalancer
          predicates:
            - Path=/api/v1/users/**
          filters:
            - StripPrefix=2 # /api/v1/users/1 -> /users/1
            - AddRequestHeader=X-Gateway-Version, 2.0
            - name: CircuitBreaker
              args:
                name: user-service-cb
                fallbackUri: forward:/fallback/user-service
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 100
                redis-rate-limiter.burstCapacity: 200
                key-resolver: "#{@userIdKeyResolver}"
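To make the StripPrefix comment concrete, here is a minimal sketch of the path arithmetic. It is an illustration only, assuming segment-wise stripping as described in the comment above; `StripPrefixSketch` is a hypothetical helper, not the gateway's own filter code, and it does not preserve trailing slashes.

```java
import java.util.Arrays;

/** Hypothetical helper that mimics the effect of StripPrefix=N on a request path. */
final class StripPrefixSketch {

    /** Drops the first {@code parts} non-empty path segments. */
    static String stripPrefix(String path, int parts) {
        String[] segments = Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
        if (parts >= segments.length) {
            return "/"; // nothing left after stripping
        }
        return "/" + String.join("/", Arrays.copyOfRange(segments, parts, segments.length));
    }

    public static void main(String[] args) {
        System.out.println(stripPrefix("/api/v1/users/1", 2));
    }
}
```

With two segments stripped, the downstream service never sees the /api/v1 prefix, so it can keep its own controller mappings unversioned.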
4. Custom Global Filter: Request Logging & Correlation IDs
@Component
@Order(Ordered.HIGHEST_PRECEDENCE + 1)
public class CorrelationIdGlobalFilter implements GlobalFilter {
    private static final String CORRELATION_HEADER = "X-Correlation-Id";

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String incoming = exchange.getRequest().getHeaders().getFirst(CORRELATION_HEADER);
        // Reuse the caller's ID when present; otherwise mint a new one
        final String correlationId = (incoming != null && !incoming.isBlank())
                ? incoming
                : UUID.randomUUID().toString();
        ServerHttpRequest mutatedRequest = exchange.getRequest().mutate()
                .header(CORRELATION_HEADER, correlationId)
                .build();
        // Set the response header before delegating. Once the response is committed its
        // headers are read-only, so adding the header in a then() callback is too late.
        exchange.getResponse().getHeaders().set(CORRELATION_HEADER, correlationId);
        return chain.filter(exchange.mutate().request(mutatedRequest).build());
    }
}
5. Rate Limiting with Redis Token Bucket
@Configuration
public class GatewayConfig {
    // Rate limit by authenticated user ID (from JWT claim).
    // The decoder is injected and used reactively; a blocking decode would stall the event loop.
    @Bean
    public KeyResolver userIdKeyResolver(ReactiveJwtDecoder jwtDecoder) {
        return exchange -> {
            String auth = exchange.getRequest().getHeaders().getFirst("Authorization");
            if (auth != null && auth.startsWith("Bearer ")) {
                return jwtDecoder.decode(auth.substring(7))
                        .map(Jwt::getSubject) // user ID from JWT
                        .onErrorResume(e -> Mono.just("anonymous-" + clientIp(exchange)));
            }
            // Fallback: rate limit by IP for unauthenticated requests
            return Mono.just(clientIp(exchange));
        };
    }

    // getRemoteAddress() can be null, so guard before dereferencing
    private static String clientIp(ServerWebExchange exchange) {
        InetSocketAddress remote = exchange.getRequest().getRemoteAddress();
        return (remote != null && remote.getAddress() != null)
                ? remote.getAddress().getHostAddress()
                : "unknown";
    }

    @Bean
    public RedisRateLimiter redisRateLimiter() {
        return new RedisRateLimiter(100, 200, 1); // replenish=100/s, burst=200, tokens per request=1
    }
}
# application.yml: Redis for rate limiting
spring:
  data:
    redis:
      host: redis-cluster.internal
      port: 6379
      lettuce:
        pool:
          max-active: 20
          min-idle: 5
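The token bucket behind replenishRate and burstCapacity can be sketched in a few lines of plain Java. This is an in-memory illustration of the algorithm only: the gateway keeps this state in Redis so that all instances share it, and `TokenBucketSketch` and its injectable clock are hypothetical names introduced for the example.

```java
import java.util.function.LongSupplier;

/** In-memory sketch of a token bucket; the real limiter stores this state in Redis. */
final class TokenBucketSketch {
    private final long replenishRate;        // tokens added per second
    private final long burstCapacity;        // maximum tokens the bucket can hold
    private final LongSupplier clockSeconds; // injectable clock, in whole seconds
    private long tokens;
    private long lastRefillSecond;

    TokenBucketSketch(long replenishRate, long burstCapacity, LongSupplier clockSeconds) {
        this.replenishRate = replenishRate;
        this.burstCapacity = burstCapacity;
        this.clockSeconds = clockSeconds;
        this.tokens = burstCapacity; // the bucket starts full
        this.lastRefillSecond = clockSeconds.getAsLong();
    }

    /** Refills for the elapsed time, then takes the requested tokens if available. */
    synchronized boolean tryConsume(long requestedTokens) {
        long now = clockSeconds.getAsLong();
        long elapsed = Math.max(0, now - lastRefillSecond);
        tokens = Math.min(burstCapacity, tokens + elapsed * replenishRate);
        lastRefillSecond = now;
        if (tokens < requestedTokens) {
            return false; // over the limit: the gateway would answer 429
        }
        tokens -= requestedTokens;
        return true;
    }

    public static void main(String[] args) {
        long[] now = {0};
        TokenBucketSketch bucket = new TokenBucketSketch(100, 200, () -> now[0]);
        int allowed = 0;
        for (int i = 0; i < 250; i++) {
            if (bucket.tryConsume(1)) allowed++;
        }
        System.out.println("allowed in first burst: " + allowed);
    }
}
```

With replenish=100 and burst=200, a client can spend up to 200 tokens at once but sustains only 100 requests per second afterwards.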
6. JWT Authentication GlobalFilter
@Component
@Order(Ordered.HIGHEST_PRECEDENCE)
public class JwtAuthenticationFilter implements GlobalFilter {
    private static final Set<String> PUBLIC_PATHS = Set.of(
            "/api/v1/auth/login", "/api/v1/auth/register",
            "/actuator/health", "/actuator/info"
    );

    private final ReactiveJwtDecoder jwtDecoder;

    public JwtAuthenticationFilter(ReactiveJwtDecoder jwtDecoder) {
        this.jwtDecoder = jwtDecoder;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String path = exchange.getRequest().getPath().value();
        if (PUBLIC_PATHS.stream().anyMatch(path::startsWith)) {
            return chain.filter(exchange); // skip auth for public paths
        }
        String authHeader = exchange.getRequest().getHeaders().getFirst("Authorization");
        if (authHeader == null || !authHeader.startsWith("Bearer ")) {
            return unauthorized(exchange);
        }
        return jwtDecoder.decode(authHeader.substring(7))
                .flatMap(jwt -> {
                    // Forward validated claims as trusted headers to downstream services.
                    // header(...) overrides any client-supplied value of the same name.
                    List<String> roles = jwt.getClaimAsStringList("roles");
                    String email = jwt.getClaimAsString("email");
                    ServerHttpRequest mutated = exchange.getRequest().mutate()
                            .header("X-User-Id", jwt.getSubject())
                            .header("X-User-Roles", roles == null ? "" : String.join(",", roles))
                            .header("X-User-Email", email == null ? "" : email)
                            .build();
                    return chain.filter(exchange.mutate().request(mutated).build());
                })
                // Only token failures become 401; downstream errors keep propagating
                .onErrorResume(JwtException.class, e -> unauthorized(exchange));
    }

    private Mono<Void> unauthorized(ServerWebExchange exchange) {
        exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
        return exchange.getResponse().setComplete();
    }
}
7. Circuit Breaker with Resilience4j
resilience4j:
  circuitbreaker:
    instances:
      user-service-cb:
        sliding-window-type: COUNT_BASED
        sliding-window-size: 10
        failure-rate-threshold: 50 # open CB when 50% of last 10 calls fail
        wait-duration-in-open-state: 30s # try again after 30s
        permitted-number-of-calls-in-half-open-state: 3
        record-exceptions:
          - java.net.ConnectException
          - java.util.concurrent.TimeoutException
  timelimiter:
    instances:
      user-service-cb:
        timeout-duration: 3s # request timeout before CB counts it as failure
// FallbackController — meaningful degraded response
@RestController
public class FallbackController {
@RequestMapping("/fallback/user-service")
public ResponseEntity<Map<String, Object>> userServiceFallback() {
return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
.body(Map.of(
"error", "User service is temporarily unavailable",
"code", "SERVICE_UNAVAILABLE",
"retryAfter", 30
));
}
}
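The count-based sliding window configured above can be approximated in plain Java. The sketch is deliberately simplified and is not Resilience4j's implementation: it models only the CLOSED to OPEN transition, assumes the failure rate is evaluated once the window is full, and omits the wait duration and half-open state. `CountBasedBreakerSketch` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified count-based breaker: tracks the last N outcomes and opens at a failure rate. */
final class CountBasedBreakerSketch {
    enum State { CLOSED, OPEN }

    private final int windowSize;
    private final double failureRateThreshold;                // percent, e.g. 50
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;

    CountBasedBreakerSketch(int windowSize, double failureRateThreshold) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
    }

    /** Records one call outcome and returns the resulting state. */
    State record(boolean failed) {
        if (state == State.OPEN) {
            return state; // calls are short-circuited to the fallback while open
        }
        window.addLast(failed);
        if (window.size() > windowSize) {
            window.removeFirst(); // keep only the most recent windowSize outcomes
        }
        if (window.size() == windowSize) {
            long failures = window.stream().filter(f -> f).count();
            double rate = 100.0 * failures / windowSize;
            if (rate >= failureRateThreshold) {
                state = State.OPEN;
            }
        }
        return state;
    }

    public static void main(String[] args) {
        CountBasedBreakerSketch breaker = new CountBasedBreakerSketch(10, 50);
        for (int i = 0; i < 5; i++) breaker.record(false);
        for (int i = 0; i < 5; i++) breaker.record(true);
        System.out.println(breaker.record(false));
    }
}
```

Once open, the gateway stops calling the downstream service and forwards to the fallback URI until the wait duration elapses.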
8. CORS & Security Headers
@Configuration
public class SecurityConfig {
    @Bean
    public CorsWebFilter corsWebFilter() {
        CorsConfiguration config = new CorsConfiguration();
        config.setAllowedOriginPatterns(List.of("https://*.myapp.com", "https://myapp.com"));
        config.setAllowedMethods(List.of("GET", "POST", "PUT", "DELETE", "OPTIONS", "PATCH"));
        config.setAllowedHeaders(List.of("*"));
        config.setAllowCredentials(true);
        config.setMaxAge(3600L);
        UrlBasedCorsConfigurationSource source = new UrlBasedCorsConfigurationSource();
        source.registerCorsConfiguration("/**", config);
        return new CorsWebFilter(source);
    }

    @Bean
    public SecurityWebFilterChain securityFilterChain(ServerHttpSecurity http) {
        return http
                .headers(h -> h
                        // The reactive FrameOptionsSpec takes a mode; deny() exists only on the servlet API
                        .frameOptions(f -> f.mode(XFrameOptionsServerHttpHeadersWriter.Mode.DENY))
                        .contentTypeOptions(Customizer.withDefaults())
                        // The reactive HstsSpec takes a Duration rather than maxAgeInSeconds
                        .hsts(hsts -> {
                            hsts.maxAge(Duration.ofDays(365));
                            hsts.includeSubdomains(true);
                        })
                        .xssProtection(Customizer.withDefaults()))
                // Safe to disable only while tokens travel in the Authorization header, not in cookies
                .csrf(ServerHttpSecurity.CsrfSpec::disable)
                .build();
    }
}
9. Observability: Logging, Metrics & Tracing
- Micrometer metrics: spring.cloud.gateway.requests (timer tagged by route ID, status, and outcome) and spring.cloud.gateway.routes.count (gauge of configured routes)
- Distributed tracing: Spring Boot 3 Micrometer Tracing with OpenTelemetry auto-instruments all WebFlux/Gateway requests; the trace ID is propagated to downstream services via the W3C traceparent header
- Access logs: Custom GlobalFilter captures method, path, status code, duration per request to structured log (JSON)
- Correlation ID: CorrelationIdGlobalFilter ensures every request has a trace-able ID from gateway to all downstream services
10. Production Configuration & Netty Tuning
| Config (under spring.cloud.gateway.httpclient) | Default | Recommended | Why |
|---|---|---|---|
| connect-timeout | 30s | 5s | Fail fast on unreachable services |
| response-timeout | None | 10s | Stop hung upstream calls from holding connections indefinitely |
| pool.max-connections (with pool.type: fixed) | 500 | 5000 | High traffic throughput |
| pool.acquire-timeout (with pool.type: fixed) | 45s | 3s | Shed load quickly |
11. Interview Questions & Decision Matrix
A: Use the Weight predicate to split traffic. Put Weight=group1, 90 on the v1 route and Weight=group1, 10 on the v2 route: about 90% of requests go to v1 and about 10% to v2. Gradually shift the weights from 90/10 to 0/100 as confidence grows. Combine this with a circuit breaker on the v2 route whose fallback forwards to v1, so that failures in v2 fall through to v1.
- JWT validation centralized at gateway
- Redis rate limiting per user/IP
- Circuit breaker on all downstream routes
- Correlation ID filter for tracing
- Centralized CORS policy
- Security headers (HSTS, CSP, X-Frame)
- connection-timeout and response-timeout set
- Multiple gateway instances behind ALB
- Prometheus metrics exposed
- Access log to structured JSON
12. At BRAC IT: Our API Gateway Journey
When I joined the microfinance platform team at BRAC IT in Bangladesh, the API layer was a patchwork of individual Spring Boot services each exposing their own authentication, CORS, and logging logic. There was no single entry point. Every microservice validated JWTs independently, duplicated CORS configuration, and maintained its own access logs. The team decided to centralize this with Spring Cloud Gateway — and the journey taught me more about production API design than any tutorial ever could.
Today the gateway handles 8 million daily requests across 20+ microservices for our microfinance platform. It runs in three replicas behind an AWS Application Load Balancer, each replica on a dedicated t3.large EC2 instance in separate availability zones. On peak days — typically around month-end when loan repayments flood in — we see burst traffic of 2,000+ requests per second with sub-20ms gateway overhead.
Challenge 1: Route Configuration Sprawl
We started with YAML-only routing. Within three months we had 40+ route definitions crammed into application.yml. The problem: some routes needed conditional logic impossible to express in YAML — for instance, routing to different loan-service versions based on the X-Client-Version header. We migrated complex routes to programmatic RouteLocator beans, keeping simple static routes in YAML for readability.
@Configuration
public class LoanServiceRouteConfig {
    // JwtAuthenticationFilter is a GlobalFilter bean, so it already applies to every route.
    // It is not a GatewayFilter and cannot be passed to filters(f -> f.filter(...)).
    @Bean
    public RouteLocator loanServiceRoutes(RouteLocatorBuilder builder) {
        return builder.routes()
                // Route new mobile clients (v3+) to loan-service-v2.
                // This route is declared first so it wins over the catch-all v1 route.
                .route("loan-service-v2", r -> r
                        .path("/api/v1/loans/**")
                        .and().header("X-Client-Version", "3\\..*") // regex: 3.x.x
                        .filters(f -> f
                                .stripPrefix(2)
                                .addRequestHeader("X-Routed-By", "gateway-v2")
                                .circuitBreaker(c -> c
                                        .setName("loan-service-cb")
                                        .setFallbackUri("forward:/fallback/loans")))
                        .uri("lb://loan-service-v2"))
                // All other clients go to stable loan-service-v1
                .route("loan-service-v1", r -> r
                        .path("/api/v1/loans/**")
                        .filters(f -> f
                                .stripPrefix(2)
                                .circuitBreaker(c -> c
                                        .setName("loan-service-cb")
                                        .setFallbackUri("forward:/fallback/loans")))
                        .uri("lb://loan-service-v1"))
                .build();
    }
}
Challenge 2: JWT Validation Adding 35ms Latency
Our initial JWT filter fetched the JWKS (JSON Web Key Set) from our Keycloak server on every request to validate signatures. Under load this caused 30–40ms of blocking I/O per request and created a hard dependency on Keycloak availability. The fix was two-pronged. First, use Spring Security's ReactiveJwtDecoder, which keeps the fetched JWKS in memory instead of calling Keycloak per request. Second, add a Caffeine-based application-level cache of decoded JWT claims, so the same token is not parsed and verified repeatedly within its validity window.
@Configuration
public class JwtConfig {
    @Bean
    public ReactiveJwtDecoder reactiveJwtDecoder(
            @Value("${spring.security.oauth2.resourceserver.jwt.jwk-set-uri}") String jwkUri) {
        // NimbusReactiveJwtDecoder keeps the fetched JWKS in memory, so signature checks
        // do not call Keycloak on every request. The reactive builder has no cache(...) option.
        return NimbusReactiveJwtDecoder.withJwkSetUri(jwkUri).build();
    }

    @Bean
    public Cache<String, JwtClaimsSet> jwtClaimsCache() {
        // Cache decoded JWT claims keyed by a token fingerprint. The fingerprint must be
        // derived from the whole token (for example a SHA-256 hash), not a prefix, because
        // every token starts with the same encoded header.
        // Tokens expire in 15 min; entries are evicted after 14 min, and callers should
        // still check the exp claim on each hit.
        return Caffeine.newBuilder()
                .maximumSize(50_000)
                .expireAfterWrite(Duration.ofMinutes(14))
                .recordStats()
                .build();
    }
}
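// In the GlobalFilter: check the cache before decoding
public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
    String token = extractToken(exchange);
    // Key on a hash of the whole token. The first 32 characters are just the shared JWT
    // header, so a prefix key would serve one user's cached claims to another user.
    String cacheKey = sha256Hex(token);
    JwtClaimsSet cached = claimsCache.getIfPresent(cacheKey);
    // Re-check expiry on a hit so an expired token is never accepted from the cache
    if (cached != null && cached.getExpiresAt() != null
            && Instant.now().isBefore(cached.getExpiresAt())) {
        return proceedWithClaims(exchange, chain, cached); // cache hit: no decode or signature check
    }
    return jwtDecoder.decode(token)
            .flatMap(jwt -> {
                JwtClaimsSet claims = buildClaimsSet(jwt);
                claimsCache.put(cacheKey, claims);
                return proceedWithClaims(exchange, chain, claims);
            });
}

private static String sha256Hex(String token) {
    try {
        byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(token.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(hash);
    } catch (NoSuchAlgorithmException e) {
        throw new IllegalStateException(e);
    }
}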
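The choice of cache key deserves a closer look. The first characters of a JWT are its Base64url-encoded header, which is identical for every token signed with the same algorithm, so a key built from a token prefix can map different users to the same cache entry. The sketch below contrasts a prefix key with a SHA-256 hash of the full token; the class name and the sample tokens are made up for illustration and are not real credentials.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical comparison of two cache-key strategies for bearer tokens. */
final class TokenFingerprintSketch {

    /** Unsafe: returns at most the first 32 characters, which lie inside the JWT header. */
    static String prefixKey(String token) {
        return token.substring(0, Math.min(32, token.length()));
    }

    /** Safer: hex-encoded SHA-256 of the complete token. */
    static String sha256Key(String token) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(token.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String header = "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9"; // {"alg":"RS256","typ":"JWT"}
        String alice = header + ".payload-for-alice.signature-a";  // made-up token
        String bob = header + ".payload-for-bob.signature-b";      // made-up token
        System.out.println("prefix keys collide: " + prefixKey(alice).equals(prefixKey(bob)));
        System.out.println("hash keys collide:   " + sha256Key(alice).equals(sha256Key(bob)));
    }
}
```

Hashing costs a few microseconds per request, which is small next to signature verification, and it removes the collision risk entirely for practical purposes.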
Challenge 3: Rate Limiting Not Surviving Pod Restarts
We initially tried Bucket4j with an in-memory token bucket. Works great on one pod — but when we scaled to three replicas, each pod maintained its own counter. A user effectively got 3× their rate limit, since each pod independently allowed the full quota. Worse, on pod restart the counter reset, granting a burst. The solution was Spring Cloud Gateway's built-in Redis-backed RequestRateLimiterGatewayFilterFactory. Redis stores token state atomically via a Lua script — all gateway pods share the same counter. When Redis goes down, we fail open (allow traffic) rather than fail closed, since dropping legitimate financial transactions is worse than a brief rate limit bypass.
spring:
  cloud:
    gateway:
      routes:
        - id: loan-application-api
          uri: lb://loan-service
          predicates:
            - Path=/api/v1/loans/apply
          filters:
            - name: RequestRateLimiter
              args:
                # Per-user: 10 applications/minute with a burst of 3.
                # replenishRate is tokens per second, so each request costs 6 tokens:
                # 1 token/s / 6 tokens = 10 requests/min; 18 tokens / 6 = burst of 3
                redis-rate-limiter.replenishRate: 1
                redis-rate-limiter.burstCapacity: 18
                redis-rate-limiter.requestedTokens: 6
                key-resolver: "#{@userIdKeyResolver}"
                # Let requests through when the KeyResolver returns no key.
                # RedisRateLimiter itself already allows traffic when Redis calls fail.
                deny-empty-key: false
  data:
    redis:
      host: ${REDIS_HOST:redis-cluster.internal}
      port: 6379
      timeout: 200ms # fast timeout: fail open if Redis is slow
      lettuce:
        pool:
          max-active: 20
          min-idle: 5
          max-wait: 100ms
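Because replenishRate is expressed in tokens per second, per-minute limits have to be derived with requestedTokens. The arithmetic is sketched below under that assumption; `RateLimitMath` is a hypothetical helper, not part of the gateway.

```java
/** Hypothetical helper for reasoning about RequestRateLimiter settings. */
final class RateLimitMath {

    /** Sustained requests per minute: tokens refilled per minute divided by the cost of one request. */
    static double sustainedPerMinute(int replenishRate, int requestedTokens) {
        return 60.0 * replenishRate / requestedTokens;
    }

    /** Number of back-to-back requests a full bucket can absorb. */
    static int burstRequests(int burstCapacity, int requestedTokens) {
        return burstCapacity / requestedTokens;
    }

    public static void main(String[] args) {
        // replenishRate=10 at 1 token per request is 600 requests/min, not 10
        System.out.println(sustainedPerMinute(10, 1));
        // replenishRate=1 at 6 tokens per request is 10 requests/min
        System.out.println(sustainedPerMinute(1, 6));
        System.out.println(burstRequests(18, 6));
    }
}
```

Raising requestedTokens is the lever for limits slower than one request per second, since replenishRate itself cannot go below 1.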
After these three fixes, our gateway overhead dropped from an average of 52ms to 6ms per request. We also saw our Keycloak server load drop by 85% since JWT claims were now served from the Caffeine cache on nearly every request. The programmatic routes gave us the flexibility to run A/B tests on backend services without touching YAML configuration files.
13. Advanced Filter Chains: A Real Example
A production gateway filter needs to do more than just proxy requests. For compliance and debugging, our BRAC IT gateway filter performs four tasks in a single pass: generates a UUID trace ID, propagates it downstream, logs the request and response with timings, and sanitizes sensitive headers so Authorization tokens never appear in log aggregation systems (Elasticsearch in our case).
@Component
@Order(Ordered.HIGHEST_PRECEDENCE + 5)
@Slf4j
public class RequestTracingAndLoggingFilter implements GlobalFilter {
    private static final String REQUEST_START_TIME = "requestStartTime";
    private static final String TRACE_ID_HEADER = "X-Request-ID";

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // 1. Generate or propagate trace ID
        String traceId = Optional.ofNullable(
                        exchange.getRequest().getHeaders().getFirst(TRACE_ID_HEADER))
                .orElse(UUID.randomUUID().toString());
        // 2. Store start time for duration calculation
        exchange.getAttributes().put(REQUEST_START_TIME, System.currentTimeMillis());
        // 3. Attach trace ID to outbound request headers
        ServerHttpRequest mutatedRequest = exchange.getRequest().mutate()
                .header(TRACE_ID_HEADER, traceId)
                .build();
        // Echo the trace ID to the client now: response headers are read-only once the
        // response is committed, so adding it after chain.filter() completes is too late
        exchange.getResponse().getHeaders().set(TRACE_ID_HEADER, traceId);
        return chain.filter(exchange.mutate().request(mutatedRequest).build())
                .then(Mono.fromRunnable(() -> {
                    long duration = System.currentTimeMillis() -
                            (Long) exchange.getAttributes().get(REQUEST_START_TIME);
                    String method = exchange.getRequest().getMethod().name();
                    String path = exchange.getRequest().getPath().value();
                    int status = exchange.getResponse().getStatusCode() != null
                            ? exchange.getResponse().getStatusCode().value() : 0;
                    // 4. Sanitize: never log the raw Authorization header
                    String authHeader = exchange.getRequest().getHeaders()
                            .getFirst("Authorization");
                    String sanitizedAuth = sanitize(authHeader);
                    log.info("method={} path={} status={} duration={}ms traceId={} auth={}",
                            method, path, status, duration, traceId, sanitizedAuth);
                }));
    }

    /**
     * Masks the JWT payload and signature. The header segment (alg/typ) stays visible
     * for debugging; everything after the first dot is replaced with asterisks.
     * Example: "Bearer eyJhbGciOiJSUzI1NiJ9.***"
     */
    private String sanitize(String authHeader) {
        if (authHeader == null) return "none";
        if (!authHeader.startsWith("Bearer ")) return "[non-bearer]";
        String token = authHeader.substring(7);
        int firstDot = token.indexOf('.');
        if (firstDot < 0) return "Bearer [malformed]";
        return "Bearer " + token.substring(0, firstDot + 1) + "***";
    }
}
The filter runs at HIGHEST_PRECEDENCE + 5, placing it after the JWT authentication filter (order HIGHEST_PRECEDENCE) but before business filters. This order ensures the trace ID is set before downstream filters see the request, and the logging happens after the response status code is available. The sanitize() method is critical for compliance: Bangladesh Bank's guidelines for digital financial services require that authentication credentials never appear in plaintext in operational logs.
We complement this with an error-response logging filter that fires only for responses with status >= 400. Logging every response body would be prohibitively expensive at 8M requests/day, but logging error responses helps diagnose failed loan applications and 4xx client errors from mobile apps. The filter shown here records the status, path, and trace ID:
@Component
@Slf4j
public class ErrorResponseLoggingFilter implements GlobalFilter, Ordered {
    @Override
    public int getOrder() { return Ordered.LOWEST_PRECEDENCE - 10; }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        return chain.filter(exchange).then(Mono.defer(() -> {
            // getStatusCode() returns HttpStatusCode in Spring Framework 6; casting it to
            // HttpStatus fails for non-standard status codes
            HttpStatusCode status = exchange.getResponse().getStatusCode();
            if (status != null && status.isError()) {
                String traceId = exchange.getRequest().getHeaders()
                        .getFirst("X-Request-ID");
                log.warn("Error response: status={} path={} traceId={}",
                        status.value(),
                        exchange.getRequest().getPath().value(),
                        traceId);
            }
            return Mono.empty();
        }));
    }
}
14. Route Predicates Deep Dive
Spring Cloud Gateway ships with a rich library of built-in predicates. Knowing them prevents you from reinventing the wheel with custom code. The table below covers every predicate you'll encounter in production, along with real-world usage guidance:
| Predicate | Example Config | Matches When | Production Use |
|---|---|---|---|
| Path | - Path=/api/v1/loans/** | URL path matches glob pattern | Primary routing; used on every route |
| Host | - Host=api.myapp.com | Host header matches (supports wildcards) | Multi-tenant: route by domain |
| Header | - Header=X-Client-Version, 3\..* | Request header exists and matches regex | API version routing; feature flags |
| Method | - Method=GET,POST | HTTP method is one of the listed values | Route read-only traffic to read replicas |
| Query | - Query=debug, true | Query param exists (with optional regex) | Debug mode routing; A/B test groups |
| Weight | - Weight=canary, 10 | Random % split within a named group | Canary deployments; blue-green rollout |
| RemoteAddr | - RemoteAddr=10.0.0.0/8 | Client IP falls within CIDR range | Admin routes restricted to internal IPs |
| Cookie | - Cookie=session, abc.* | Cookie exists and value matches regex | Sticky sessions for legacy services |
| Before / After / Between | - Before=2026-12-31T23:59:59Z | Current datetime in specified range | Scheduled maintenance windows; sunset APIs |
Predicates are composable with and() in the fluent Java API, and with multiple list entries in YAML (all conditions must pass — implicit AND). For OR logic you define multiple route entries and rely on the first-match wins behavior of the route list. The Weight predicate is particularly powerful: SCG groups routes by the weight group name and randomly distributes traffic according to the declared weights, all evaluated atomically without any external coordination.
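The weighted split can be pictured as a cumulative-range lookup. The sketch below is an illustration of that idea, not SCG's implementation: `WeightedRouteSketch` and the route IDs are hypothetical, and the random roll is passed in as a parameter so the behaviour is deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of picking a route inside one weight group. */
final class WeightedRouteSketch {

    /**
     * Picks a route ID for a roll in [0, 1). Each route owns a slice of the
     * unit interval proportional to its weight, in iteration order.
     */
    static String pick(Map<String, Integer> weights, double roll) {
        int total = weights.values().stream().mapToInt(Integer::intValue).sum();
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += (double) entry.getValue() / total;
            last = entry.getKey();
            if (roll < cumulative) {
                return last;
            }
        }
        return last; // guards against floating-point rounding at the upper edge
    }

    public static void main(String[] args) {
        Map<String, Integer> group = new LinkedHashMap<>();
        group.put("loan-service-v1", 90);
        group.put("loan-service-v2", 10);
        System.out.println(pick(group, Math.random()));
    }
}
```

Because each gateway instance rolls independently, the observed split converges on the declared weights only over a large number of requests.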
Custom Predicate Implementation
Built-in predicates cover 90% of cases, but occasionally you need logic that doesn't fit any of them. At BRAC IT we built a PremiumUser predicate that routes premium loan accounts (loan value > 500,000 BDT) to a high-priority service instance with more CPU allocation:
@Component
public class PremiumUserRoutePredicateFactory
        extends AbstractRoutePredicateFactory<PremiumUserRoutePredicateFactory.Config> {
    @Autowired private UserTierService userTierService;

    public PremiumUserRoutePredicateFactory() {
        super(Config.class);
    }

    @Override
    public Predicate<ServerWebExchange> apply(Config config) {
        return exchange -> {
            // Predicates run during route matching, before any GlobalFilter. X-User-Id must
            // therefore already be on the request (set by a WebFilter or a trusted upstream
            // proxy); a header added by the JWT GlobalFilter is not visible here.
            String userId = exchange.getRequest().getHeaders()
                    .getFirst("X-User-Id");
            if (userId == null) return false;
            // Check user tier: cached in Redis, <1ms lookup. This call runs on the
            // event loop, so it must stay fast and must not block.
            return userTierService.isPremium(userId);
        };
    }

    @Validated
    public static class Config {
        // No config params needed for this predicate
    }
}
// Usage in YAML:
// predicates:
// - Path=/api/v1/loans/**
// - PremiumUser=
// uri: lb://loan-service-premium
15. Detailed Production Checklist
After running Spring Cloud Gateway in production at BRAC IT for over a year, handling financial transactions that directly affect loan applicants' livelihoods, I've consolidated our operational learnings into this checklist. Each item includes why it matters — not just what to do.
| Area | Check | Why It Matters |
|---|---|---|
| Security | Expose only /actuator/health and /actuator/prometheus, on a separate management port; keep the gateway, env, and threaddump endpoints off the public listener | Over-exposed actuator endpoints leak route config, env vars, and thread dumps to anyone who can reach the gateway |
| Security | Strip internal headers (X-User-Id, X-User-Roles) from inbound requests before your JWT filter runs | Prevents clients from spoofing trusted headers that microservices rely on for authorization decisions |
| Performance | Configure Netty connection pool: max-connections=500, pending-acquire-timeout=3s, acquire-timeout=5s | Default pool is often too small for high-throughput APIs, causing queuing in the gateway rather than upstream |
| Performance | Cache JWKS with 5-minute TTL; add application-level JWT claim cache (Caffeine, 14-minute TTL) | Eliminates per-request network calls to identity provider; 10–40ms latency savings per request under load |
| Reliability | Attach a CircuitBreaker filter to every upstream route; define per-service fallback URIs | Without circuit breakers, a slow upstream service causes connection exhaustion that cascades to all routes |
| Reliability | Set connect-timeout: 2000 and response-timeout: 10000 in each route's metadata (per-route values are in milliseconds); never rely on gateway-level defaults | Upstream services that hang indefinitely tie up pooled connections; timeouts are the last line of defense against hangs |
| Observability | Enable Micrometer + Prometheus; add spring.cloud.gateway.metrics.enabled=true | Exposes per-route request counts, error rates, and latency histograms, the foundation of any SLO dashboard |
| Observability | Emit structured JSON access logs with trace ID, user ID, path, status, and duration on every request | Structured logs enable alerting on error rate spikes and trace correlation across services in your log aggregator |
| Operations | Configure graceful shutdown: server.shutdown=graceful + spring.lifecycle.timeout-per-shutdown-phase=30s | Without graceful shutdown, in-flight requests are dropped on every deployment, which is especially harmful for financial transactions |
| Scalability | Use Redis Cluster (not standalone) for rate limiting state; test Redis failover behavior explicitly | A single Redis node is a SPOF for all rate limiting; cluster mode survives node failures without resetting counters |
| Scalability | Deploy 3+ gateway replicas across availability zones; configure ALB with HTTP/2 and keep-alive | A single gateway is a critical SPOF; 3 replicas allow one replica to restart during a deploy without losing availability |
Beyond the table above, one lesson I'll emphasize from running this in production at BRAC IT: test your fallback routes under load. We discovered during a load test that our Resilience4j circuit breaker fallback handler was itself making a downstream call (to a health-check service) — which meant that when the primary service was under stress, the fallback added more load. Fallback handlers should be entirely local: return a static JSON response, read from a cache, or serve a degraded-but-safe response. Never let a fallback call out to another service.
# application.yml: graceful shutdown
server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s
# Kubernetes preStop hook (give in-flight requests time to complete)
# spec.containers.lifecycle:
#   preStop:
#     exec:
#       command: ["/bin/sh", "-c", "sleep 5"]
---
// FallbackController: entirely local, zero downstream calls
@RestController
public class GatewayFallbackController {
    // @RequestMapping (not @GetMapping) so forwarded POST/PUT requests also reach the fallback
    @RequestMapping("/fallback/loans")
    public ResponseEntity<Map<String, Object>> loansFallback(
            ServerWebExchange exchange) {
        String traceId = exchange.getRequest().getHeaders()
                .getFirst("X-Request-ID");
        // Avoid writing a null header value when the request carried no trace ID
        String safeTraceId = traceId != null ? traceId : "unknown";
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                .header("X-Request-ID", safeTraceId)
                .header("Retry-After", "30")
                .body(Map.of(
                        "error", "LOAN_SERVICE_UNAVAILABLE",
                        "message", "The loan service is temporarily unavailable. Please retry in 30 seconds.",
                        "traceId", safeTraceId,
                        "timestamp", Instant.now().toString()
                ));
    }

    @RequestMapping("/fallback/user-service")
    public ResponseEntity<Map<String, Object>> userServiceFallback(
            ServerWebExchange exchange) {
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                .body(Map.of(
                        "error", "USER_SERVICE_UNAVAILABLE",
                        "message", "User service is temporarily unavailable.",
                        "timestamp", Instant.now().toString()
                ));
    }
}