Software Engineer · Java · Spring Boot · Microservices
API Composition and Data Aggregation in Microservices: BFF Pattern, GraphQL & Avoiding Distributed Joins
One of the most persistent pain points in microservices architectures is the data aggregation problem: a single UI screen requires data from five different services, and without a composition layer, the client makes five sequential HTTP calls adding hundreds of milliseconds of unnecessary latency. The distributed N+1 query problem is the microservices equivalent of the ORM anti-pattern that has plagued monolithic applications for decades. This guide examines every production-viable composition strategy: the Backend for Frontend pattern, GraphQL federation, API Gateway aggregation, CQRS read models, and high-performance parallel composition with caching and circuit breakers.
Table of Contents
- The N+1 Problem in Microservices
- API Composition Patterns Overview
- The BFF (Backend for Frontend) Pattern
- GraphQL Federation for Data Aggregation
- API Gateway Aggregation Layer
- CQRS Read Models for Pre-Composed Data
- Saga-Based Data Consistency vs Aggregation
- Performance Optimization: Parallel Composition and Caching
- Key Takeaways
- Conclusion
1. The N+1 Problem in Microservices
Consider a concrete scenario that plays out in almost every microservices migration: the order details page. This screen needs to display comprehensive information about an order, but that information lives across five different services. The Order Service holds the order header, items list, and total. The Customer Service holds the buyer's name, email, and loyalty tier. The Product Service holds the name, description, image, and current price for each item in the order — one call per distinct product. The Shipping Service holds the current tracking status and estimated delivery date. The Payment Service holds the transaction ID, payment method, and authorization status.
A naive implementation fetches these sequentially:
// ANTI-PATTERN: Sequential service calls — the distributed N+1 problem
public OrderDetailsDto getOrderDetails(String orderId) {
    // 1. Fetch order (20ms)
    Order order = orderServiceClient.getOrder(orderId);
    // 2. Fetch customer (20ms) — sequential, waits for order
    Customer customer = customerServiceClient.getCustomer(order.getCustomerId());
    // 3. Fetch each product — N sequential calls (20ms each, N=20 items)
    List<Product> products = new ArrayList<>();
    for (String productId : order.getProductIds()) {
        products.add(productServiceClient.getProduct(productId)); // 20ms × 20 = 400ms
    }
    // 4. Fetch shipping status (20ms)
    ShippingStatus shipping = shippingServiceClient.getStatus(orderId);
    // 5. Fetch payment info (20ms)
    PaymentInfo payment = paymentServiceClient.getPayment(order.getPaymentId());
    // Total: 20 + 20 + (20 × 20) + 20 + 20 = 480ms minimum
    // P99 with network jitter and retries: easily 1.5-2 seconds
    return buildDto(order, customer, products, shipping, payment);
}
For an order with 20 line items, this adds up to 24 sequential HTTP calls: 1 (order) + 1 (customer) + 20 (products) + 1 (shipping) + 1 (payment) = 24 calls at 20ms each = 480ms minimum round-trip time, before accounting for network jitter, garbage collection pauses, or service queue depth under load. At P99 in production, this easily exceeds 1.5 seconds — an eternity for a page load that a user expects in under 200ms.
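The arithmetic behind these numbers can be checked with a small model. The flat 20ms per call and the 25ms batch call are the same illustrative assumptions used throughout this section, not measurements:

```java
// Minimal latency model for sequential vs. parallel composition.
// All per-call latencies are the illustrative assumptions from the text.
public class CompositionLatencyModel {

    static final int CALL_MS = 20;

    // Sequential: order + customer + N product calls + shipping + payment
    static int sequentialMs(int itemCount) {
        return CALL_MS + CALL_MS + itemCount * CALL_MS + CALL_MS + CALL_MS;
    }

    // Parallel fan-out after the order call: total = order + slowest fan-out call.
    // Assumes the batch product call is the slowest of the parallel calls.
    static int parallelMs(int batchCallMs) {
        return CALL_MS + Math.max(batchCallMs, CALL_MS);
    }

    public static void main(String[] args) {
        System.out.println(sequentialMs(20)); // 480
        System.out.println(parallelMs(25));   // 45
    }
}
```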
This is the distributed N+1 problem. In ORM terms, it's the equivalent of loading a list of orders and then executing a separate SQL query for each order to load its customer — a pattern that every experienced developer knows to fix with eager loading or a JOIN. In microservices, the equivalent fixes are service composition patterns: BFF, GraphQL federation, CQRS read models, and parallel HTTP composition. The choice among these patterns depends on team structure, query complexity, and performance requirements — which we address in the following sections.
2. API Composition Patterns Overview
Five distinct composition patterns exist, each with a different trade-off profile. Understanding which pattern fits which situation is as important as knowing how to implement them.
Client-side composition has the frontend call each service directly. The browser or mobile app makes parallel API calls and assembles the UI from the responses. This pattern is used by some simpler SPAs but has significant drawbacks at scale: it exposes internal service boundaries to external clients, makes mobile clients over cellular networks particularly chatty, creates tight coupling between UI code and service APIs, and duplicates composition logic across web and mobile clients. It is appropriate only for simple use cases with two or three services and is generally considered an anti-pattern for complex data requirements.
API Gateway composition places a composition layer at the network edge. The gateway aggregates responses from multiple upstream services before returning a combined response to the client. This works well for simple 2-3 service aggregations (order + customer, product + inventory) and can be configured declaratively in some gateway products. Its limitation is that gateways are infrastructure components — they should not contain complex business logic, and complex multi-hop aggregations push beyond what declarative gateway configuration can express without becoming unmaintainable.
Backend for Frontend (BFF) is a dedicated aggregation service per client type. Each BFF is a purpose-built service owned by the client team, containing the composition logic tailored precisely to what each client needs. A Mobile BFF returns a compact payload optimized for limited bandwidth and small screens. A Web BFF returns richer data. A Partner API BFF returns a stable, versioned interface for external integrations. BFF is the most flexible and team-aligned pattern.
GraphQL federation enables schema-driven composition where each service owns its portion of a unified graph. Clients query a single federated endpoint and get exactly the fields they need — no over-fetching, no under-fetching. The federation layer (Apollo Router) intelligently plans which subgraph services to call and parallelizes independent subgraph queries. This is the most powerful pattern for complex data graphs but carries significant operational complexity.
CQRS read models are a pre-computation approach: rather than composing at query time, the read model is continuously updated by consuming events from all relevant services and materializing a denormalized, pre-joined document. Query latency drops to a single database read (<5ms), but write latency increases and the read model lags the source of truth by the event propagation time (typically 100ms-2s).
3. The BFF (Backend for Frontend) Pattern
The Backend for Frontend pattern, popularized by Sam Newman in 2015, is the most widely adopted composition pattern in production microservices architectures. The core principle is simple: create one aggregation backend per client type, owned and operated by the same team that builds the client. The Mobile BFF is built by the mobile team. The Web BFF is built by the web team. The Partner API BFF is built by the platform team. Each BFF is deeply tailored to what its client actually needs.
A production Java Spring Boot Mobile BFF for the order details use case replaces the 480ms sequential chain with parallel composition using CompletableFuture:
@Service
public class OrderDetailsBff {

    private final OrderServiceClient orderClient;
    private final CustomerServiceClient customerClient;
    private final ProductServiceClient productClient;
    private final ShippingServiceClient shippingClient;
    private final PaymentServiceClient paymentClient;
    private final Executor compositionExecutor; // dedicated thread pool

    public MobileOrderDetailsDto getOrderDetails(String orderId) throws Exception {
        // Step 1: Fetch order (required — it drives other calls)
        Order order = orderClient.getOrder(orderId); // 20ms

        // Step 2: Fan out all remaining calls in parallel
        CompletableFuture<Customer> customerFuture = CompletableFuture.supplyAsync(
            () -> customerClient.getCustomer(order.getCustomerId()), compositionExecutor);
        // Batch product fetch: single call for all product IDs (if Product Service supports it)
        CompletableFuture<List<Product>> productsFuture = CompletableFuture.supplyAsync(
            () -> productClient.getProductsBatch(order.getProductIds()), compositionExecutor);
        CompletableFuture<ShippingStatus> shippingFuture = CompletableFuture.supplyAsync(
            () -> shippingClient.getStatus(orderId), compositionExecutor);
        CompletableFuture<PaymentInfo> paymentFuture = CompletableFuture.supplyAsync(
            () -> paymentClient.getPayment(order.getPaymentId()), compositionExecutor);

        // Wait for all parallel calls to complete (max wait = slowest service)
        CompletableFuture.allOf(customerFuture, productsFuture,
            shippingFuture, paymentFuture).get(500, TimeUnit.MILLISECONDS);
        // Total time ≈ 20ms (order) + max(customer, products, shipping, payment)
        // ≈ 20ms + 25ms (batch product call is slightly slower than a single fetch)
        // = ~45ms instead of 480ms

        // Return mobile-optimized payload (subset of full data)
        return MobileOrderDetailsDto.builder()
            .orderId(orderId)
            .orderStatus(order.getStatus())
            .customerName(customerFuture.get().getDisplayName()) // only name, not full profile
            .items(buildMobileItems(order.getItems(), productsFuture.get()))
            .trackingStatus(shippingFuture.get().getTrackingStatus())
            .paymentLast4(paymentFuture.get().getLast4()) // only last 4 digits for mobile
            .totalAmount(order.getTotal())
            .build();
    }
}
The parallel composition reduces total latency from 480ms to approximately 45ms — a 10× improvement driven by two techniques: (1) parallel fan-out using CompletableFuture.allOf() so all service calls execute simultaneously, and (2) replacing N individual product calls with a single batch endpoint that fetches all product IDs in one request. The batch endpoint is a critical optimization — it requires the Product Service to expose a bulk fetch API (POST /products/batch or GET /products?ids=p1,p2,p3), which is a service design discipline worth enforcing across your platform.
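Batch endpoints usually cap how many IDs a single request may carry, so the composition layer still needs to split very large ID lists into a handful of batch calls. A minimal sketch of that chunking, with the per-request cap passed in as a parameter (any concrete cap, such as 100, would be an assumption about the Product Service's contract, not something stated above):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a large ID list into chunks of at most maxPerRequest IDs,
// one chunk per call to the bulk fetch endpoint.
public class BatchChunker {

    static <T> List<List<T>> chunk(List<T> ids, int maxPerRequest) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += maxPerRequest) {
            chunks.add(ids.subList(i, Math.min(i + maxPerRequest, ids.size())));
        }
        return chunks;
    }
}
```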
The BFF pattern has two well-documented anti-patterns to avoid. The shared BFF defeats the entire purpose: if Web, Mobile, and Partner teams share a single BFF, each team's requirements pollute the shared codebase, response payloads become over-engineered to satisfy everyone, and the BFF becomes a new monolith. Business logic creep is the second anti-pattern: the BFF should only compose and transform data, never contain business rules like pricing calculations, discount logic, or eligibility checks. Those belong in the domain services. A BFF that calculates loyalty point accrual is a domain service that needs to be extracted.
4. GraphQL Federation for Data Aggregation
GraphQL Federation (Apollo Federation v2) takes a schema-driven approach to composition: each microservice defines its portion of a unified GraphQL schema, and an Apollo Router stitches them together into a single queryable graph. Clients send a single GraphQL query to the router, the router's query planner determines which subgraph services to call and in what order, executes independent subgraph queries in parallel, and merges the results.
Each service declares its contribution to the graph using the @key directive to define its entity's primary key, enabling cross-service entity resolution:
# OrderService GraphQL schema (subgraph)
type Order @key(fields: "id") {
  id: ID!
  customerId: ID!
  customer: Customer # resolved by CustomerService via federation
  items: [OrderItem!]!
  total: Float!
  status: OrderStatus!
  createdAt: String!
}

type OrderItem {
  productId: ID!
  product: Product # resolved by ProductService via federation
  quantity: Int!
  price: Float!
}

# Entity stubs: the owning subgraphs define these types' fields;
# this subgraph only references them by key
type Customer @key(fields: "id", resolvable: false) {
  id: ID!
}
type Product @key(fields: "id", resolvable: false) {
  id: ID!
}

type Query {
  order(id: ID!): Order
  ordersByCustomer(customerId: ID!, limit: Int = 20): [Order!]!
}

# CustomerService GraphQL schema (subgraph)
type Customer @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
  loyaltyTier: String!
  # CustomerService supplies the reference resolver that materializes
  # Customer entities requested through other subgraphs
}

# ProductService GraphQL schema (subgraph)
type Product @key(fields: "id") {
  id: ID!
  name: String!
  description: String!
  imageUrl: String!
  price: Float!
  inStock: Boolean!
}

# Client query — single request to Apollo Router:
query GetOrderDetails($orderId: ID!) {
  order(id: $orderId) {
    id
    status
    total
    customer {          # resolved by CustomerService
      name
      loyaltyTier
    }
    items {
      quantity
      price
      product {         # resolved by ProductService (batched via DataLoader)
        name
        imageUrl
      }
    }
  }
}
The Apollo Router's query planner generates an execution plan for this query that closely resembles what our BFF implemented manually: fetch the Order first (to get customerId and productIds), then in parallel fetch Customer and all Products. The federation layer automates what the BFF hand-codes. Critically, the N+1 problem within GraphQL resolvers is solved by DataLoader: rather than each product field resolver making an individual HTTP call to ProductService, DataLoader batches all product ID lookups within a single GraphQL execution into one batch request — exactly the same optimization we applied in the BFF, but automatic.
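The batching mechanism DataLoader relies on reduces to a few lines: individual load(id) calls within one execution are queued, and a dispatch step resolves the entire queue with a single batch call. A simplified, synchronous sketch of the idea (the real java-dataloader library adds per-request caching and asynchronous dispatch scheduling, omitted here):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Minimal DataLoader-style batching: load(key) queues a future; dispatch()
// resolves all queued keys with ONE call to the batch function.
public class SimpleBatchLoader<K, V> {

    private final Function<List<K>, Map<K, V>> batchFn;
    private final Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();

    public SimpleBatchLoader(Function<List<K>, Map<K, V>> batchFn) {
        this.batchFn = batchFn;
    }

    // Repeated loads of the same key share one future (and one batch slot)
    public CompletableFuture<V> load(K key) {
        return pending.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // One batch call resolves every queued key
    public void dispatch() {
        if (pending.isEmpty()) return;
        Map<K, V> results = batchFn.apply(new ArrayList<>(pending.keySet()));
        pending.forEach((k, f) -> f.complete(results.get(k)));
        pending.clear();
    }
}
```

This is exactly the shape of the BFF's batch-endpoint optimization, applied automatically per GraphQL execution.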
GraphQL Federation's strengths shine in organizations with many teams and complex data graphs. Each team owns their subgraph schema independently — the Order team owns the Order type, the Product team owns the Product type. A frontend developer can explore the entire graph in GraphQL Playground, discover what data is available without reading service documentation, and request exactly the fields needed. There is no over-fetching (the client specifies exactly which fields to return) and no under-fetching (the router fetches from all relevant services automatically). The operational complexity of maintaining the Apollo Router, managing schema registry, and coordinating subgraph schema changes is the primary cost.
5. API Gateway Aggregation Layer
API Gateway aggregation is the right tool for simple, declarative composition requirements — combining two or three service responses where the composition logic can be expressed in configuration rather than code. Spring Cloud Gateway with a custom aggregating filter is a common implementation; some commercial gateway products ship comparable response-aggregation plugins.
A Spring Cloud Gateway custom filter that aggregates order and customer data illustrates the pattern:
@Component
public class OrderCustomerAggregatorFilter extends AbstractGatewayFilterFactory<Object> {

    private final WebClient webClient = WebClient.builder().build();
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public GatewayFilter apply(Object config) {
        return (exchange, chain) -> {
            // The Path predicate is /api/orders/{orderId}/details — read the URI
            // template variable, not the last path segment (which would be "details")
            String orderId = ServerWebExchangeUtils.getUriTemplateVariables(exchange)
                .get("orderId");

            // Fetch order, then the dependent customer call, via reactive composition
            Mono<Order> orderMono = webClient.get()
                .uri("http://order-service/orders/" + orderId)
                .retrieve()
                .bodyToMono(Order.class)
                .timeout(Duration.ofMillis(300));

            return orderMono
                .flatMap(order -> webClient.get()
                    .uri("http://customer-service/customers/" + order.getCustomerId())
                    .retrieve()
                    .bodyToMono(Customer.class)
                    .timeout(Duration.ofMillis(300))
                    // Merge order + customer into a single response
                    .map(customer -> AggregatedOrderResponse.builder()
                        .order(order)
                        .customerName(customer.getName())
                        .customerTier(customer.getLoyaltyTier())
                        .build()))
                .flatMap(aggregated -> Mono
                    // fromCallable handles the checked JsonProcessingException
                    .fromCallable(() -> objectMapper.writeValueAsBytes(aggregated))
                    .flatMap(body -> {
                        exchange.getResponse().getHeaders().setContentType(MediaType.APPLICATION_JSON);
                        exchange.getResponse().getHeaders().setContentLength(body.length);
                        DataBuffer buffer = exchange.getResponse().bufferFactory().wrap(body);
                        return exchange.getResponse().writeWith(Mono.just(buffer));
                    }));
        };
    }
}
# Spring Cloud Gateway route configuration (YAML)
spring:
  cloud:
    gateway:
      routes:
        - id: order-details-aggregated
          uri: lb://order-service
          predicates:
            - Path=/api/orders/{orderId}/details
          filters:
            - OrderCustomerAggregatorFilter
            - name: CircuitBreaker
              args:
                name: orderAggregator
                fallbackUri: forward:/fallback/order-details
Gateway aggregation is appropriate for simple joins (order + customer, product + inventory level) where the composition logic fits in a few dozen lines of code. It degrades quickly for complex scenarios: multi-hop composition where the result of one service call determines which other services to call, conditional aggregation based on order state, or aggregation requiring business logic like price recalculation. These cases should use a BFF instead. A gateway bloated with complex aggregation filters becomes an operational nightmare — impossible to test, difficult to reason about, and a single point of failure for multiple client types.
6. CQRS Read Models for Pre-Composed Data
CQRS (Command Query Responsibility Segregation) read models represent the ultimate optimization for query performance: instead of composing data at query time, we pre-compute and materialize the composed view by handling domain events as they arrive. Every event from every relevant service updates a denormalized, pre-joined document store. Queries hit this document store with a single read — no inter-service calls at all.
The event handler that builds and maintains the order details read model:
// Read model: pre-composed order details document in MongoDB
// Single document contains all data needed for the order details screen
// Event handlers that maintain the read model
@Component
public class OrderDetailsReadModelUpdater {

    private final MongoCollection<Document> orderDetailsCollection;

    public OrderDetailsReadModelUpdater(MongoCollection<Document> orderDetailsCollection) {
        this.orderDetailsCollection = orderDetailsCollection;
    }

    @KafkaListener(topics = "order-events")
    public void onOrderEvent(OrderEvent event) {
        if (event instanceof OrderCreatedEvent e) {
            Document doc = new Document()
                .append("_id", e.getOrderId())
                .append("status", "CREATED")
                .append("customerId", e.getCustomerId())
                .append("items", buildItemDocuments(e.getItems()))
                .append("total", e.getTotal())
                .append("createdAt", Instant.now())
                .append("customerName", null)   // populated when CustomerUpdated arrives
                .append("shippingStatus", null) // populated when ShippingCreated arrives
                .append("paymentLast4", null);  // populated when PaymentProcessed arrives
            orderDetailsCollection.insertOne(doc);
        } else if (event instanceof OrderStatusChangedEvent e) {
            orderDetailsCollection.updateOne(
                Filters.eq("_id", e.getOrderId()),
                Updates.set("status", e.getNewStatus()));
        }
    }

    @KafkaListener(topics = "customer-events")
    public void onCustomerEvent(CustomerEvent event) {
        if (event instanceof CustomerProfileUpdatedEvent e) {
            // Update all order documents for this customer with the new name
            orderDetailsCollection.updateMany(
                Filters.eq("customerId", e.getCustomerId()),
                Updates.set("customerName", e.getDisplayName()));
        }
    }

    @KafkaListener(topics = "shipping-events")
    public void onShippingEvent(ShippingEvent event) {
        if (event instanceof ShippingStatusUpdatedEvent e) {
            orderDetailsCollection.updateOne(
                Filters.eq("_id", e.getOrderId()),
                Updates.set("shippingStatus", e.getTrackingStatus()));
        }
    }
}
// Query: single MongoDB document fetch — no inter-service calls
public OrderDetailsDto getOrderDetails(String orderId) {
    Document doc = orderDetailsCollection.find(Filters.eq("_id", orderId)).first();
    if (doc == null) throw new OrderNotFoundException(orderId);
    return mapToDto(doc); // <5ms, single DB read
}
The performance comparison is dramatic: CQRS read model query latency is under 5ms (single MongoDB document fetch), versus BFF parallel composition at 40-80ms (parallel HTTP calls to 4-5 services), versus sequential composition at 480ms (the anti-pattern). The cost is complexity: you must maintain the event consumer infrastructure, handle event ordering and deduplication, manage schema evolution as events change, and accept eventual consistency — the read model may lag behind the source of truth by 100ms to 2 seconds during high write load.
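Of the listed costs, deduplication is the most mechanical to address: every event carries a unique event ID, and an update is applied only when that ID has not been seen before. A sketch of the idea, using an in-memory set for brevity; a production updater would persist processed IDs in the read database, ideally in the same transaction as the document update:

```java
import java.util.HashSet;
import java.util.Set;

// At-least-once delivery means Kafka may redeliver an event; the read-model
// updater must apply each event exactly once. Dedup by unique event ID.
public class DeduplicatingHandler {

    private final Set<String> processedEventIds = new HashSet<>();

    // Returns true if the update was applied, false if the event was a duplicate
    public boolean handleOnce(String eventId, Runnable update) {
        if (!processedEventIds.add(eventId)) {
            return false; // already processed — redelivery, skip
        }
        update.run();
        return true;
    }
}
```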
CQRS read models are the right choice for high-read, high-traffic screens (order history lists, dashboard summaries) where the acceptable consistency window is a few seconds. They are the wrong choice for screens that require real-time accuracy immediately after a write (post-checkout order confirmation, where the user must see the just-placed order immediately — the BFF is better here).
7. Saga-Based Data Consistency vs Aggregation
It is important to distinguish between two related but different problems: aggregation (reading data from multiple services into a single response) and coordination (writing or transacting across multiple services consistently). Sagas solve the coordination problem, not the aggregation problem.
When a developer encounters the need to "join" data across services on a write path — for example, creating an order while simultaneously checking inventory, reserving stock, and charging the customer — the temptation is to make synchronous calls across services in a single operation. This is the distributed transaction anti-pattern. The saga pattern provides the correct solution: a sequence of local transactions coordinated through events or choreography, with compensating transactions for rollback on failure.
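The compensation mechanism can be sketched in a few lines: each completed local transaction registers its compensating action, and a failure replays the registered compensations in reverse order. The step names used here are illustrative, not a real service API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Sketch of orchestrated saga execution with compensating transactions.
// Each completed step pushes its compensation; on failure, compensations
// run in reverse (LIFO) order and the saga reports failure.
public class OrderSagaSketch {

    record Step(String name, Runnable action, Runnable compensation) {}

    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException e) {
                // Roll back every completed step, newest first
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}
```

Note that a failed step's own compensation never runs: only steps that committed locally are compensated.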
The connection to aggregation is through local projections. A well-designed saga produces events that can feed the CQRS read model described above. When the Order Created saga completes — after inventory reservation, payment authorization, and order confirmation — each step publishes events that update the read model. The read model then serves as the aggregation layer for all subsequent reads. You maintain a local copy of the data you need from other services, rather than querying them at read time.
The saga pattern also clarifies what belongs in a BFF: BFF aggregates reads across services; sagas coordinate writes across services. These are separate concerns that should never be mixed. A BFF that initiates a saga (by calling an order creation endpoint that internally orchestrates a saga) is correct. A BFF that directly calls inventory-reserve, payment-charge, and order-create sequentially as part of a "transaction" is an anti-pattern that will produce inconsistent data when any step fails.
Eventual consistency trade-offs for aggregated read models are real but manageable. The typical lag between a write and the read model reflecting that write is 100ms to 2 seconds under normal load. For most business use cases — order history lists, dashboard analytics, product catalog — this is entirely acceptable. The key discipline is explicitly documenting the consistency window for each read model and designing the client experience accordingly. Users tolerate a 1-second delay for data to "refresh" on a history page; they do not tolerate it on a checkout confirmation page.
8. Performance Optimization: Parallel Composition and Caching
For BFF-based composition, parallel execution and caching are the two most impactful performance levers. We have already seen how CompletableFuture.allOf() collapses sequential 480ms latency to parallel 45ms latency. Adding caching at the BFF layer further reduces latency for repeated requests and shields downstream services from traffic spikes.
@Service
public class OrderDetailsBffWithCaching {

    // L1: In-process Caffeine cache (sub-millisecond reads)
    private final Cache<String, MobileOrderDetailsDto> localCache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofSeconds(30))
        .recordStats()
        .build();

    // L2: Distributed Redis cache (shared across BFF instances)
    private final RedisTemplate<String, MobileOrderDetailsDto> redisTemplate;

    // Circuit breakers per downstream service
    private final CircuitBreaker productServiceCB =
        CircuitBreaker.of("productService", CircuitBreakerConfig.custom()
            .failureRateThreshold(50)
            .waitDurationInOpenState(Duration.ofSeconds(10))
            .slidingWindowSize(20)
            .build());

    public MobileOrderDetailsDto getOrderDetails(String orderId) {
        // L1 cache check
        MobileOrderDetailsDto cached = localCache.getIfPresent(orderId);
        if (cached != null) return cached; // ~0.1ms

        // L2 cache check
        cached = redisTemplate.opsForValue().get("order-details:" + orderId);
        if (cached != null) {
            localCache.put(orderId, cached);
            return cached; // ~1-3ms
        }

        // Cache miss: compose from services
        Order order = orderClient.getOrder(orderId);

        // Parallel calls with circuit breakers and timeouts
        CompletableFuture<Customer> customerFuture = CompletableFuture.supplyAsync(() ->
            customerClient.getCustomer(order.getCustomerId()), compositionExecutor);
        CompletableFuture<List<Product>> productsFuture = CompletableFuture.supplyAsync(() ->
            CircuitBreaker.decorateSupplier(productServiceCB,
                () -> productClient.getProductsBatch(order.getProductIds())
            ).get(), compositionExecutor);
        CompletableFuture<ShippingStatus> shippingFuture = CompletableFuture.supplyAsync(() -> {
            try {
                return shippingClient.getStatus(orderId);
            } catch (Exception e) {
                // Graceful degradation: return null, UI shows "Status unavailable"
                log.warn("Shipping service unavailable for order {}", orderId, e);
                return null;
            }
        }, compositionExecutor);

        try {
            CompletableFuture.allOf(customerFuture, productsFuture, shippingFuture)
                .get(500, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            // Return partial response with available data rather than failing entirely
            log.warn("Composition incomplete for order {}, returning partial response", orderId);
        }

        // buildDto reads each future non-blockingly (e.g. getNow(null)), so an
        // incomplete call simply leaves its section of the payload empty
        MobileOrderDetailsDto result = buildDto(order, customerFuture, productsFuture, shippingFuture);

        // Populate both cache levels; entries are also evicted by the
        // domain-event Kafka consumer below
        localCache.put(orderId, result);
        redisTemplate.opsForValue().set("order-details:" + orderId, result, Duration.ofSeconds(60));
        return result;
    }

    // Cache invalidation via domain events
    @KafkaListener(topics = {"order-events", "shipping-events"})
    public void onDomainEvent(DomainEvent event) {
        // Invalidate both cache levels when order data changes
        String orderId = event.getOrderId();
        localCache.invalidate(orderId);
        redisTemplate.delete("order-details:" + orderId);
    }
}
The two-layer cache strategy — Caffeine (in-process) plus Redis (distributed) — provides sub-millisecond response for hot items (the same order viewed repeatedly by a customer service agent) while sharing cache state across all BFF instances. Cache invalidation via domain events ensures correctness: when an order status changes, the Kafka consumer in the BFF immediately evicts the stale cache entry. This is event-driven cache invalidation — far more precise than time-based TTL alone, which would serve stale data for up to 60 seconds after a status update.
Resilience4j circuit breakers around each downstream service call prevent cascade failures. If the Product Service starts timing out, the circuit breaker opens once half of the last 20 calls have failed (failureRateThreshold of 50% over a sliding window of 20) and returns a fallback (cached product data or a "product unavailable" placeholder) for the next 10 seconds instead of letting 500ms timeout calls pile up. The combination of parallel composition, multi-level caching, circuit breakers, and graceful degradation (partial responses on timeout) produces a BFF that is fast, resilient, and observable — the three non-functional requirements that distinguish production systems from prototypes.
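The open-circuit condition configured above (failureRateThreshold 50, slidingWindowSize 20) can be illustrated with a stripped-down state check. Resilience4j additionally handles half-open probing, minimum-call thresholds, and slow-call detection, all omitted in this sketch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified circuit-breaker semantics: the circuit is open when the failure
// rate over the last `windowSize` recorded calls reaches the threshold.
public class MiniCircuitBreaker {

    private final int windowSize;
    private final double failureRateThreshold; // e.g. 0.5 for 50%
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failure

    public MiniCircuitBreaker(int windowSize, double failureRateThreshold) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
    }

    public void record(boolean failed) {
        outcomes.addLast(failed);
        if (outcomes.size() > windowSize) outcomes.removeFirst(); // slide the window
    }

    public boolean isOpen() {
        if (outcomes.size() < windowSize) return false; // window not yet full
        long failures = outcomes.stream().filter(f -> f).count();
        return (double) failures / outcomes.size() >= failureRateThreshold;
    }
}
```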
Key Takeaways
- Sequential service calls are the root cause — The distributed N+1 problem turns milliseconds into seconds. Always measure end-to-end API latency with distributed tracing before and after adding a composition layer.
- BFF is the most team-aligned pattern — One BFF per client type, owned by the client team, eliminates cross-team coordination for UI changes and prevents the shared-BFF anti-pattern.
- Batch APIs are prerequisites for efficient BFF — Expose bulk fetch endpoints (POST /products/batch) across all services to eliminate per-item service calls in composition layers.
- CompletableFuture.allOf() is the parallel composition primitive — Fan out all independent service calls simultaneously; total latency becomes the latency of the slowest call, not the sum.
- GraphQL Federation for complex data graphs — When many teams contribute to a shared data graph and clients need fine-grained field selection, federation's schema-driven approach scales teams better than manual BFF code.
- Gateway aggregation for simple joins only — Two or three services, declarative configuration; any more complexity than that belongs in a BFF.
- CQRS read models for high-traffic, high-read screens — Pre-computed documents achieve under 5ms query latency at the cost of eventual consistency; use for history and dashboard screens.
- Circuit breakers + graceful degradation — Protect composition layers from cascade failures; a partial response with available data is always better than a 500 error waiting for a timeout.
Conclusion
API composition in microservices is not a single solved problem — it is a family of patterns, each optimal for different situations along the axes of complexity, team ownership, consistency requirements, and performance targets. The BFF pattern covers the majority of real-world composition requirements: it is straightforward to implement, test, and own, and with parallel composition and caching it delivers latency that satisfies the most demanding UI requirements. GraphQL Federation is the right escalation for organizations with complex, team-spanning data graphs that benefit from schema-driven discovery and fine-grained field selection. CQRS read models are the optimization ceiling for read-heavy, latency-critical screens that can accept eventual consistency.
The universal prescriptions are: always measure before optimizing (distributed tracing reveals the actual bottlenecks), always expose batch APIs from your services (they are the multiplier that makes every composition pattern better), and always implement circuit breakers with graceful degradation (composition layers that fail completely on any downstream hiccup are not production-grade). The distributed N+1 problem is solvable — with the right pattern, the right tooling, and the measurement discipline to verify that it is actually solved.
| Aspect | BFF Pattern | GraphQL Federation | Gateway Aggregation |
|---|---|---|---|
| Flexibility | High (code-first) | High (schema-driven) | Low (config-driven) |
| Client-specific optimization | Excellent (per-client BFF) | Good (field selection) | Limited |
| Schema / contract | REST / OpenAPI | GraphQL SDL | REST / OpenAPI |
| Team ownership | Client team | Shared (platform + domain) | Platform / infra team |
| N+1 prevention | Manual (batch APIs) | Automatic (DataLoader) | Manual |
| Caching strategy | Caffeine + Redis | Persisted queries + CDN | Gateway cache headers |
| Learning curve | Low (standard Java) | High (Federation concepts) | Low (config) |
| Best for | Most teams, varied clients | Complex graphs, many teams | Simple 2-3 service joins |
"The distributed N+1 problem is just the ORM N+1 problem wearing a different hat. The solution is always the same: stop making N calls when one batch call will do, and stop calling sequentially when you can call in parallel."
— Microservices performance engineering principle