Backpressure & Flow Control in Reactive Microservices Under Load

A reactive system that ignores backpressure is not reactive — it is a time bomb. When a fast producer floods a slow consumer, memory fills, queues overflow, latency spikes, and the entire system collapses in a cascade that is far worse than the original overload. Backpressure is the mechanism that prevents this: a signal from consumer to producer saying "slow down, I cannot keep up."

Part of the Distributed Systems Failure Handling Series.

Introduction

Reactive programming promises non-blocking, event-driven systems that handle high concurrency efficiently. The Reactive Streams specification (implemented by Project Reactor, RxJava, and Akka Streams) codifies backpressure as a first-class protocol: publishers cannot push data to subscribers at will. Subscribers request items explicitly, and publishers may only emit as many items as requested. This demand-driven model is what separates true reactive systems from systems that are merely asynchronous.

In practice, backpressure breaks down at service boundaries. When data flows from Kafka topics to Spring WebFlux endpoints to downstream HTTP services, each boundary has different throughput characteristics. The Reactive Streams contract governs behavior within a single JVM. Across service boundaries — HTTP, gRPC, message brokers — backpressure must be re-implemented using protocol-level mechanisms: HTTP/2 flow control, gRPC streaming, Kafka consumer fetch configuration, and RSocket's request-n protocol.

This post covers the complete backpressure story: from Project Reactor internals to Kafka consumer lag management, to cross-service flow control patterns used in production systems at scale.

Real-World Problem: The OOM Cascade

An e-commerce platform's order processing service consumed events from a Kafka topic at up to 50,000 events/second during flash sales. Each event triggered an enrichment call to a product catalog service and a customer profile service. During a Black Friday event, the catalog service slowed to 800 ms per request due to database contention. The order service, using Spring WebFlux with a reactive Kafka consumer, continued consuming events at full speed, accumulating in-flight enrichment operations as unbounded queues in memory. Within 14 minutes, the service was OOMKilled. The cascade: Kafka lag grew to 12 million messages, downstream services received retry storms, and the incident lasted 3 hours. The root cause was a single missing backpressure configuration: the reactive Kafka consumer's maxPollRecords was 5000, and the in-flight requests were never bounded.

Deep Dive: The Reactive Streams Contract

The Reactive Streams spec defines four interfaces: Publisher<T> (emits items), Subscriber<T> (consumes items), Subscription (the channel for requesting items and cancelling), and Processor<T,R> (transforms items). The key rule: a Publisher may emit at most as many items as the Subscriber has requested via Subscription.request(n). Any excess emission is a protocol violation.
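
The same four interfaces ship in the JDK as java.util.concurrent.Flow (since Java 9), which makes the demand-driven contract easy to see without any framework. A minimal sketch, using the stdlib SubmissionPublisher and a subscriber that only ever holds demand for one item (the class and method names are illustrative, not from any library):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class DemandDrivenSubscriber {

    /** Collects every published item while only ever holding demand for one item. */
    static List<Integer> collectOneAtATime(List<Integer> input) throws InterruptedException {
        List<Integer> received = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        Flow.Subscriber<Integer> oneAtATime = new Flow.Subscriber<>() {
            private Flow.Subscription subscription;

            @Override public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                s.request(1);               // initial demand: exactly one item
            }
            @Override public void onNext(Integer item) {
                received.add(item);         // consume, then signal fresh demand
                subscription.request(1);
            }
            @Override public void onError(Throwable t) { done.countDown(); }
            @Override public void onComplete() { done.countDown(); }
        };

        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(oneAtATime);
            input.forEach(publisher::submit); // submit() blocks when subscriber demand lags
        }                                     // close() completes the stream after delivery

        done.await();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(collectOneAtATime(List.of(1, 2, 3, 4, 5)));
    }
}
```

Note that SubmissionPublisher.submit() itself applies backpressure: it blocks the producer when the subscriber's buffer is saturated, which is exactly the "slow down" signal the contract exists to carry.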

Project Reactor implements this with Flux and Mono types. Most Flux operators are naturally backpressure-aware — flatMap, concatMap, buffer, and window all pass backpressure signals upstream. The danger is operators that break the contract: Flux.create() with a hot source that ignores backpressure, timer-based sources, and any integration with non-reactive APIs that must be bridged via Flux.fromIterable() or Schedulers.

Solution Approach: Layered Backpressure Strategy

Layer 1 — Bound In-Flight Operations

The most critical control for reactive microservices is bounding the concurrency of downstream calls. In Project Reactor, flatMap has an implicit concurrency limit: Flux.flatMap(fn, concurrency) where concurrency defaults to 256. This means up to 256 downstream requests can be in-flight simultaneously. During slow consumer scenarios, this limit prevents unbounded memory growth. Set it explicitly rather than relying on the default: flux.flatMap(item -> callDownstream(item), maxConcurrency, prefetch) where maxConcurrency is tuned to match the downstream service's comfortable throughput and prefetch controls the upstream buffer size.
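
What flatMap's concurrency argument does can be sketched in plain Java with a Semaphore as the concurrency gate. This is a conceptual illustration, not Reactor code; MAX_CONCURRENCY, callDownstream, and the doubled results are all invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedInFlight {
    static final int MAX_CONCURRENCY = 4;          // analogous to flatMap's concurrency argument
    static final Semaphore permits = new Semaphore(MAX_CONCURRENCY);
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxObserved = new AtomicInteger();

    /** Simulated downstream call; records how many calls run at once. */
    static int callDownstream(int item) {
        int now = inFlight.incrementAndGet();
        maxObserved.accumulateAndGet(now, Math::max);
        try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        inFlight.decrementAndGet();
        return item * 2;
    }

    static List<Integer> processAll(List<Integer> items) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(16); // more threads than permits
        List<Future<Integer>> futures = new ArrayList<>();
        for (int item : items) {
            permits.acquire();                     // backpressure: block until a slot frees up
            futures.add(pool.submit(() -> {
                try { return callDownstream(item); } finally { permits.release(); }
            }));
        }
        List<Integer> results = new ArrayList<>();
        for (Future<Integer> f : futures) results.add(f.get());
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) throws Exception {
        processAll(java.util.stream.IntStream.rangeClosed(1, 40).boxed().toList());
        System.out.println("max concurrent downstream calls: " + maxObserved.get());
    }
}
```

The key property is that acquiring the permit happens on the producing side, so a slow downstream automatically throttles the producer instead of letting work pile up in memory.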

Layer 2 — Reactive Kafka Consumer Configuration

Spring Kafka's reactive consumer (ReactiveKafkaConsumerTemplate) exposes several key backpressure levers: maxPollRecords (how many records to fetch per poll — reduce from the default 500 to match your processing rate) and maxPollInterval (the maximum time between polls before the consumer is evicted from the group); the concurrency parameter on @KafkaListener applies only to the non-reactive listener container. For reactive consumers, apply downstream concurrency limits immediately: kafkaReceiver.receive().flatMap(record -> process(record), MAX_CONCURRENT).subscribe(). Without an explicit concurrency argument, flatMap subscribes to up to its default of 256 inner publishers at once — far more in-flight work than most downstream services can absorb.
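
As a reference point, these levers map onto standard Kafka consumer configuration keys. A sketch of a conservative configuration, built with plain java.util.Properties — the values are illustrative starting points, not recommendations for any particular workload:

```java
import java.util.Properties;

public class ConsumerBackpressureConfig {

    /** Illustrative backpressure-oriented consumer settings; tune to your processing rate. */
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("max.poll.records", "50");         // fetch small batches (broker default: 500)
        props.setProperty("max.poll.interval.ms", "300000"); // max gap between polls before rebalance
        props.setProperty("fetch.max.bytes", "1048576");     // cap bytes pulled per fetch request
        props.setProperty("enable.auto.commit", "false");    // commit only after processing completes
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps());
    }
}
```

Shrinking max.poll.records is the coarse knob: it bounds how much work a single poll can inject into the pipeline, so a slow downstream shows up as consumer lag on the broker rather than as heap growth in the JVM.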

Layer 3 — Overflow Strategies

When the producer outpaces the consumer and the internal buffer fills, Project Reactor applies an overflow strategy to determine what happens to excess items. The strategies are: BUFFER (default, unbounded — the source of OOM), DROP (discard excess items — acceptable for metrics/telemetry, not for transactional data), LATEST (keep only the most recent item — useful for sensor/state updates), ERROR (signal an OverflowException — triggers the error handling path), and IGNORE (ignore downstream demand entirely — only valid when the subscriber is known to handle any rate, otherwise it fails downstream). For financial events, ERROR with retry is safer than silently dropping. Apply with flux.onBackpressureBuffer(1000, BufferOverflowStrategy.ERROR).
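
The semantics of DROP, LATEST, and ERROR are easy to pin down with a toy model. This is a simplified, single-threaded sketch of the idea — not Reactor's implementation, which handles concurrency and demand signaling as well:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified, single-threaded model of overflow strategies on a bounded buffer. */
public class BoundedBuffer<T> {
    enum Overflow { DROP, LATEST, ERROR }

    private final Deque<T> buffer = new ArrayDeque<>();
    private final int capacity;
    private final Overflow strategy;

    BoundedBuffer(int capacity, Overflow strategy) {
        this.capacity = capacity;
        this.strategy = strategy;
    }

    /** Returns true if the item is retained; throws under ERROR when full. */
    boolean offer(T item) {
        if (buffer.size() < capacity) {
            buffer.addLast(item);
            return true;
        }
        return switch (strategy) {
            case DROP -> false;                      // discard the new item silently
            case LATEST -> {                         // replace the newest buffered item
                buffer.removeLast();
                buffer.addLast(item);
                yield true;
            }
            case ERROR -> throw new IllegalStateException("buffer overflow");
        };
    }

    T poll() { return buffer.pollFirst(); }
    int size() { return buffer.size(); }
}
```

The model makes the trade-off concrete: DROP loses the new data, LATEST loses the old, and ERROR loses neither but forces the error path to deal with the overload explicitly.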

Layer 4 — Cross-Service Backpressure with HTTP/2

HTTP/1.1 has no application-level backpressure mechanism — the server returns HTTP 429 Too Many Requests and the client must implement retry with exponential backoff. HTTP/2 has built-in flow control: stream-level and connection-level windows limit how much data can be in flight. Spring WebFlux's WebClient with HTTP/2 transport respects these flow control windows, automatically slowing emission when the server's window is full. For gRPC streaming, use bidirectional streaming with explicit flow control: the consumer calls request(n) (via CallStreamObserver in grpc-java) to control how many messages the producer sends.
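
The HTTP/1.1 fallback — honoring 429 with exponential backoff — looks roughly like this. A minimal sketch where the endpoint is abstracted as a status-code supplier so the retry logic stands alone; callWithBackoff and the delay schedule are invented for the example:

```java
import java.util.function.IntSupplier;

public class RetryWith429 {

    /**
     * Calls the endpoint, backing off exponentially while it answers 429.
     * `endpoint` stands in for a real HTTP call and returns a status code.
     * Returns the final status (still 429 if the budget is exhausted).
     */
    static int callWithBackoff(IntSupplier endpoint, int maxAttempts, long baseDelayMs)
            throws InterruptedException {
        int status = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            status = endpoint.getAsInt();
            if (status != 429) return status;      // success, or a non-retryable failure
            long delay = baseDelayMs << attempt;   // base, 2x base, 4x base, ...
            Thread.sleep(delay);                   // honor the throttle before retrying
        }
        return status;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] responses = {429, 429, 200};         // simulated: throttled twice, then OK
        int[] i = {0};
        int status = callWithBackoff(() -> responses[i[0]++], 5, 10);
        System.out.println(status);                // 200
    }
}
```

Production clients usually add jitter to the delay and honor a Retry-After header when the server sends one, so that a fleet of throttled clients does not retry in lockstep.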

Layer 5 — RSocket for True Reactive Cross-Service Backpressure

RSocket is a binary, multiplexed, backpressure-aware protocol designed specifically for reactive service communication. Unlike HTTP, RSocket carries the Reactive Streams demand signal across the wire. A Spring Boot service using RSocket's requestStream interaction model sends demand requests (request(n)) to the remote service, which emits at most n items before waiting for the next demand signal. This provides true end-to-end backpressure for streaming workloads. Spring Boot supports RSocket via spring-boot-starter-rsocket.

Failure Scenarios

Backpressure blindness in legacy integrations: When bridging between a reactive pipeline and a blocking API (JDBC, legacy REST client), use Schedulers.boundedElastic() to isolate blocking calls. Without isolation, blocking calls on reactive threads cause thread starvation — all event-loop threads block, and the entire service becomes unresponsive. Wrapper: Mono.fromCallable(() -> blockingOperation()).subscribeOn(Schedulers.boundedElastic()).
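
The isolation pattern can be shown with stdlib primitives: a dedicated, size-bounded pool absorbs the blocking work, and callers receive a future instead of blocking an event-loop thread. A plain-Java stand-in for the Mono.fromCallable(...).subscribeOn(boundedElastic) wrapper (the pool size and blockingOperation body are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingIsolation {
    // Dedicated pool for blocking work — a stdlib stand-in for Schedulers.boundedElastic().
    // Event-loop threads never execute blockingOperation(), so they stay free for I/O.
    static final ExecutorService blockingPool = Executors.newFixedThreadPool(8);

    /** Stand-in for a JDBC query or legacy REST call that blocks its thread. */
    static String blockingOperation() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result from blocking client";
    }

    /** Wraps the blocking call so callers receive a non-blocking future instead. */
    static CompletableFuture<String> offloaded() {
        return CompletableFuture.supplyAsync(BlockingIsolation::blockingOperation, blockingPool);
    }

    public static void main(String[] args) {
        System.out.println(offloaded().join()); // only a caller that join()s actually waits
        blockingPool.shutdown();
    }
}
```

The fixed pool size doubles as a backpressure bound: at most eight blocking calls run at once, and further submissions queue rather than consuming more threads.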

Reactor Context pollution: Reactive pipelines use Reactor Context instead of ThreadLocal for propagating request-scoped data (MDC logging, security context, tracing spans). When switching schedulers or bridging to CompletableFuture, the context can be lost. Use Mono.deferContextual() and explicitly propagate context across scheduler boundaries. Micrometer's ObservationRegistry integrates with Reactor Context natively for tracing.
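
The underlying failure mode — ThreadLocal state not following a task across a scheduler hop — is easy to reproduce with a plain executor. This stdlib sketch shows both the loss and the manual capture-and-restore fix that context propagation libraries automate (requestId and the "req-42" value are invented for the example):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ContextPropagation {
    static final ThreadLocal<String> requestId = new ThreadLocal<>();
    static final ExecutorService pool = Executors.newSingleThreadExecutor();

    /** Naive hop: the pool thread has its own, empty ThreadLocal slot. */
    static String lostAcrossHop() throws Exception {
        requestId.set("req-42");
        return pool.submit(() -> String.valueOf(requestId.get())).get();
    }

    /** Explicit capture: snapshot on the caller thread, restore inside the task. */
    static String propagatedAcrossHop() throws Exception {
        requestId.set("req-42");
        String captured = requestId.get();        // capture before crossing threads
        return pool.submit(() -> {
            requestId.set(captured);              // restore on the worker thread
            try {
                return requestId.get();
            } finally {
                requestId.remove();               // avoid leaking into pooled threads
            }
        }).get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lostAcrossHop());       // "null" — context did not follow the task
        System.out.println(propagatedAcrossHop()); // "req-42"
        pool.shutdown();
    }
}
```

Reactor Context solves this differently — it travels with the subscription rather than the thread — but the sketch shows why thread-bound state cannot be trusted once a pipeline hops schedulers.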

Hot observable memory accumulation: A Flux.create() or ConnectableFlux that emits at a fixed rate regardless of subscriber demand creates a hot observable that will fill any bounded buffer and then apply the overflow strategy. Ensure that hot sources (event bus emissions, WebSocket frames, timer ticks) have explicit backpressure handling via onBackpressureBuffer, onBackpressureDrop, or onBackpressureLatest.
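
The drop-on-backpressure behavior for a hot source reduces to one decision: when the bounded buffer is full, the new item is discarded instead of buffered. A minimal sketch using a bounded queue with no consumer draining it (the tick counts are arbitrary):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class HotSourceDrop {

    /** Emits `count` ticks into a bounded queue; excess ticks are dropped, never buffered. */
    static int emitWithDrop(BlockingQueue<Integer> queue, int count) {
        int dropped = 0;
        for (int tick = 0; tick < count; tick++) {
            if (!queue.offer(tick)) dropped++;   // offer() is non-blocking: full queue -> drop
        }
        return dropped;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10); // bounded, no consumer
        int dropped = emitWithDrop(queue, 100);
        System.out.println("dropped " + dropped + " of 100 ticks");  // dropped 90 of 100 ticks
    }
}
```

The same structure underlies onBackpressureDrop: memory stays bounded at the queue's capacity no matter how fast the hot source emits, at the cost of discarded items.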

Architecture: Backpressure-Aware Order Processing

Production architecture: Kafka topic → ReactiveKafkaConsumerTemplate (maxPollRecords=50) → Flux.flatMap(concurrency=20) → parallel enrichment calls (catalog + profile) with 10s timeout each → Flux.zip() to join responses → downstream persistence. The concurrency=20 limit means at most 20 orders are enriched simultaneously, creating natural backpressure to the Kafka consumer. If enrichment takes longer, Kafka fetch slows automatically because maxPollInterval is respected. Consumer lag acts as the buffer — Kafka absorbs the producer excess rather than JVM heap.

Monitoring Backpressure Health

Key metrics:

  • Kafka consumer group lag (kafka_consumer_group_lag) — healthy if stable or decreasing, alarming if growing monotonically
  • JVM heap used (jvm_memory_used_bytes{area="heap"}) — sudden growth often indicates backpressure failure
  • Active reactive operator subscriptions — Micrometer's reactor.flow.active (requires Reactor 3.5+ with metrics enabled)
  • HTTP 429 rate from downstream services — indicates the downstream backpressure signal is being transmitted correctly
  • Thread pool queue depth for the boundedElastic scheduler — indicates blocking operation backlog

Trade-offs

Aggressive backpressure means slower throughput — the system processes at the rate of its slowest component. This is correct behavior, not a bug. The alternative is queue overflow and OOM. The engineering trade-off is between throughput (relax backpressure, use larger buffers) and stability (strict backpressure, smaller buffers, shed load early). For financial transactions where every event must be processed exactly once, stability wins. For analytics pipelines where dropping 0.1% of events is acceptable, throughput wins. Choose your overflow strategy accordingly.

Key Takeaways

  • Backpressure is the demand signal from consumer to producer — without it, reactive systems are just asynchronous systems waiting to OOM
  • Bound flatMap concurrency explicitly — the default of 256 is rarely appropriate for downstream service calls
  • Reactive Kafka consumers require explicit concurrency limits at the consumer level, not just in the downstream pipeline
  • Cross-service backpressure requires protocol-level support — HTTP/2 flow control, gRPC streaming, or RSocket
  • Use onBackpressureError() for transactional data; reserve onBackpressureDrop() for non-critical telemetry
  • Monitor Kafka consumer lag as a backpressure health signal — it should be your primary indicator

Conclusion

Backpressure is the mechanism that transforms a collection of asynchronous operations into a truly reactive, resilient system. Without it, reactive microservices fail under load in ways that are worse than synchronous systems — cascading memory exhaustion, queue overflow, and retry storms that amplify rather than absorb the original load spike. With proper layered backpressure — bounded concurrency in operators, protocol-level flow control at service boundaries, and conscious overflow strategies — reactive microservices can handle load that exceeds their nominal capacity gracefully, degrading throughput rather than stability.
