GraphQL Federation: Building Distributed Supergraph APIs for Microservices

In a microservices architecture, frontend teams face a hard reality: to render a single product page, they must orchestrate calls to the product, user, inventory, review, and recommendation services. GraphQL Federation addresses this with a unified supergraph: one endpoint that composes every service's schema, so frontend teams no longer have to coordinate with each service team for every new view.

The Problem: API Explosion in Microservices

A mid-size e-commerce platform migrated to microservices and immediately created a problem for its frontend team. Loading the order detail page required:

  • GET /users/{id} — user name, email, preferences
  • GET /orders/{id} — order details, line items
  • GET /products/{ids} — product details for each line item (N+1!)
  • GET /inventory/{productIds} — stock levels
  • GET /reviews?productId={id} — reviews for each product
  • GET /recommendations?userId={id} — personalised upsells

Six API calls with waterfall dependencies. Mobile clients on 3G connections saw 2–4 second page loads. The BFF (Backend for Frontend) pattern was considered — but creating a dedicated BFF for every client type meant duplicating business logic and adding another service for every team to maintain.

GraphQL Federation solved this with a single query that the router decomposes and fans out to the appropriate subgraphs in parallel, returning a single composed response.

Federation v2 Architecture: Router, Subgraphs, Supergraph

┌─────────────────────────────────────────────────────────────┐
│                     CLIENTS                                 │
│         Web App    Mobile App    Partner API                │
└───────────────────────────┬─────────────────────────────────┘
                            │ Single GraphQL query
┌───────────────────────────▼─────────────────────────────────┐
│                  APOLLO ROUTER                              │
│  - Query planning (decomposes supergraph query)             │
│  - Subgraph fan-out (parallel where possible)               │
│  - Response composition                                     │
│  - Auth, rate limiting, caching, tracing                    │
└────┬──────────────┬────────────────┬────────────┬───────────┘
     │              │                │            │
┌────▼────┐  ┌──────▼──────┐  ┌─────▼────┐  ┌───▼─────────┐
│  User   │  │   Order     │  │ Product  │  │  Review     │
│ Service │  │  Service    │  │ Service  │  │  Service    │
│(subgraph│  │ (subgraph)  │  │(subgraph)│  │ (subgraph)  │
└─────────┘  └─────────────┘  └──────────┘  └─────────────┘

The supergraph is the composed schema: the union of all subgraph schemas, plus the metadata that records which subgraph resolves each field. Clients query the supergraph. For each incoming operation the Router builds a query plan, a directed execution graph that determines which subgraphs to call, in what order, and with what inputs. The Router never touches a database; it only orchestrates subgraph queries.
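
To make the idea of a query plan concrete, here is a minimal Java sketch of a two-stage plan. The helpers fetchOrder, fetchUserName, and fetchProductName are hypothetical stand-ins for HTTP calls to subgraphs, and the class is not how Apollo Router is implemented; it only illustrates running a dependent fetch first and independent fetches in parallel.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a router executing a two-stage query plan.
class QueryPlanSketch {

    // Each fetch simulates a call to one subgraph (names and data are illustrative).
    static CompletableFuture<Map<String, Object>> fetchOrder(String orderId) {
        return CompletableFuture.supplyAsync(() -> {
            Map<String, Object> order = new LinkedHashMap<>();
            order.put("id", orderId);
            order.put("userId", "u1");
            order.put("productId", "p1");
            return order;
        });
    }

    static CompletableFuture<String> fetchUserName(String userId) {
        return CompletableFuture.supplyAsync(() -> "name-of-" + userId);
    }

    static CompletableFuture<String> fetchProductName(String productId) {
        return CompletableFuture.supplyAsync(() -> "name-of-" + productId);
    }

    static Map<String, Object> execute(String orderId) {
        // Stage 1: the order fetch must finish first because it yields the keys for stage 2.
        Map<String, Object> order = fetchOrder(orderId).join();

        // Stage 2: the user and product fetches are independent, so both start before either is awaited.
        CompletableFuture<String> user = fetchUserName((String) order.get("userId"));
        CompletableFuture<String> product = fetchProductName((String) order.get("productId"));

        // Composition: merge the subgraph results into one response.
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("id", order.get("id"));
        response.put("user", user.join());
        response.put("product", product.join());
        return response;
    }
}
```

Calling QueryPlanSketch.execute("o1") returns one map containing the order id plus the user and product names, which mirrors the single composed response a client receives.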

Entities and @key: How Services Share Types

The most important concept in Federation is the entity. An entity is a type that can be referenced and extended across multiple subgraphs. It is identified by a @key directive specifying its unique identifier fields.

User Service Subgraph (entity owner)

type User @key(fields: "id") {
    id: ID!
    name: String!
    email: String!
    profilePictureUrl: String
    createdAt: DateTime!
}

type Query {
    user(id: ID!): User
    me: User
}

Order Service Subgraph (entity extension)

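# Federation 1 style syntax; Federation 2 also accepts a plain "type User" with no @external on the key
extend type User @key(fields: "id") {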
    id: ID! @external           # Owned by user-service
    orders(first: Int = 10): [Order!]!
    orderCount: Int!
}

type Order @key(fields: "id") {
    id: ID!
    userId: ID!
    user: User!                 # Returned as a stub with id; other fields come from user-service
    status: OrderStatus!
    lineItems: [LineItem!]!
    totalAmount: Float!
    createdAt: DateTime!
}

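type LineItem {
    productId: ID!
    quantity: Int!
    unitPrice: Float!
    product: Product   # Resolved via product-service entity
}

# Stub so this subgraph can reference Product without resolving its fields
type Product @key(fields: "id", resolvable: false) {
    id: ID!
}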

type Query {
    order(id: ID!): Order
    orders(userId: ID!): [Order!]!
}

With this schema, a client can query:

query OrderDetailPage($orderId: ID!) {
    order(id: $orderId) {
        id
        status
        totalAmount
        user {            # Resolved by user-service
            name
            email
        }
        lineItems {
            quantity
            product {     # Resolved by product-service
                name
                imageUrl
                inventory { # Resolved by inventory-service
                    stockLevel
                }
                reviews(first: 3) { # Resolved by review-service
                    rating
                    body
                }
            }
        }
    }
}

The Router decomposes this query: it calls the order subgraph first, then fans out to the user, product, inventory, and review subgraphs, in parallel where dependencies allow, and returns a single composed response. What was six HTTP calls with waterfall dependencies becomes one round-trip from the client's perspective.
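
Behind the scenes, the Router asks a subgraph for entity fields through the _entities field defined by the Federation subgraph specification, passing a list of representations that each carry __typename plus the @key fields. The sketch below is a simplified plain-Java model of that dispatch, assuming a hypothetical resolver registry; frameworks such as DGS do this wiring for you.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Simplified model of how a subgraph answers an _entities request.
class EntityDispatchSketch {

    // One resolver per entity type, keyed by __typename.
    private final Map<String, Function<Map<String, Object>, Object>> resolvers = new HashMap<>();

    void register(String typename, Function<Map<String, Object>, Object> resolver) {
        resolvers.put(typename, resolver);
    }

    // Results are returned in the same order as the incoming representations.
    List<Object> resolveEntities(List<Map<String, Object>> representations) {
        List<Object> results = new ArrayList<>();
        for (Map<String, Object> representation : representations) {
            String typename = (String) representation.get("__typename");
            Function<Map<String, Object>, Object> resolver = resolvers.get(typename);
            if (resolver == null) {
                throw new IllegalArgumentException("No entity resolver for " + typename);
            }
            results.add(resolver.apply(representation));
        }
        return results;
    }
}
```

A subgraph that registers a resolver for "User" and receives the representation {"__typename": "User", "id": "42"} hands that map to the resolver, which only needs the key to build the object it extends.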

@requires and @provides for Computed Fields

Sometimes a field in service B requires a field from service A that the Router would not normally fetch. @requires solves this:

# In the review service subgraph
extend type Product @key(fields: "id") {
    id: ID! @external
    category: String @external        # Owned by product-service

    # This field requires category from product-service to compute
    similarProductReviews: [Review!]! @requires(fields: "category")
}

# The Router will fetch Product.category from product-service
# before calling review-service to resolve similarProductReviews
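
From the review service's point of view, the effect of @requires is that the Product representation it receives carries category alongside the key. The following is a minimal sketch of a resolver body under that assumption; the class name, method name, and placeholder return value are hypothetical.

```java
import java.util.List;
import java.util.Map;

class RequiresSketch {

    // Hypothetical resolver body for Product.similarProductReviews in review-service.
    static List<String> similarProductReviews(Map<String, Object> productRepresentation) {
        Object category = productRepresentation.get("category");
        if (category == null) {
            // Without @requires the Router would send only the key fields.
            throw new IllegalStateException("category missing from representation");
        }
        // A real service would query its review store by category; this returns a placeholder.
        return List.of("reviews-for-category:" + category);
    }
}
```

The resolver never calls product-service itself; it relies on the Router having fetched category first.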

@provides covers a related case: it tells the Router that, on this particular field path, the subgraph can resolve certain fields of an entity it does not own, which avoids an extra round-trip to the owning service:

type Order @key(fields: "id") {
    id: ID!
    lineItems: [LineItem!]!
}

type LineItem {
    # Order service provides product name alongside line items
    # Router doesn't need to call product-service for just the name
    product: Product @provides(fields: "name")
}

extend type Product @key(fields: "id") {
    id: ID! @external
    name: String @external
}

Netflix DGS Framework: Spring Boot Implementation

Netflix DGS (Domain Graph Service) is a widely used GraphQL framework for Spring Boot with built-in Apollo Federation support:

<!-- pom.xml -->
<dependency>
    <groupId>com.netflix.graphql.dgs</groupId>
    <artifactId>graphql-dgs-spring-boot-starter</artifactId>
    <version>8.4.0</version>
</dependency>
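<!-- The starter already includes federation support; check that this extra
     artifact resolves for your DGS version before adding it -->
<dependency>
    <groupId>com.netflix.graphql.dgs</groupId>
    <artifactId>graphql-dgs-federation-graphql-java-support</artifactId>
    <version>8.4.0</version>
</dependency>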
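
// Order service DGS data fetcher with Federation entity resolver
import com.netflix.graphql.dgs.DgsComponent;
import com.netflix.graphql.dgs.DgsData;
import com.netflix.graphql.dgs.DgsDataFetchingEnvironment;
import com.netflix.graphql.dgs.DgsEntityFetcher;
import com.netflix.graphql.dgs.DgsQuery;
import com.netflix.graphql.dgs.InputArgument;
import com.netflix.graphql.dgs.exceptions.DgsEntityNotFoundException;
import org.dataloader.DataLoader;

import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

@DgsComponent
public class OrderDataFetcher {

    private final OrderRepository orderRepository;

    // Constructor injection; Spring supplies the repository
    public OrderDataFetcher(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }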

    @DgsQuery
    public Order order(@InputArgument String id) {
        return orderRepository.findById(UUID.fromString(id))
            .orElseThrow(() -> new DgsEntityNotFoundException("Order not found: " + id));
    }

    // Entity resolver: called by Router to resolve User.orders
    // The @key field (user.id) is provided by the Router
    @DgsEntityFetcher(name = "User")
    public User resolveUser(Map<String, Object> values) {
        // We only need to return a stub User with id for further resolution
        return new User((String) values.get("id"));
    }

    @DgsData(parentType = "User", field = "orders")
    public CompletableFuture<List<Order>> userOrders(DgsDataFetchingEnvironment dfe) {
        User user = dfe.getSource();
        // Use DataLoader to batch multiple user.orders requests
        DataLoader<String, List<Order>> loader = dfe.getDataLoader("ordersForUser");
        return loader.load(user.getId());
    }
}

N+1 Problem in Federated GraphQL: DataLoader Pattern

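Federation introduces a new N+1 vector. When an order has 50 line items, the Router sends all 50 product representations to product-service in a single _entities request, but the subgraph resolves each representation separately, so a naive resolver issues 50 individual database queries. DataLoader batches these into a single lookup: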

@DgsDataLoader(name = "products")
public class ProductBatchLoader implements MappedBatchLoader<String, Product> {

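    private final ProductRepository productRepository;

    // Constructor injection; Spring supplies the repository
    public ProductBatchLoader(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }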

    @Override
    public CompletionStage<Map<String, Product>> load(Set<String> productIds) {
        return CompletableFuture.supplyAsync(() -> {
            // Single query for all product IDs
            List<Product> products = productRepository.findAllById(
                productIds.stream().map(UUID::fromString).toList()
            );
            return products.stream()
                .collect(Collectors.toMap(p -> p.getId().toString(), p -> p));
        });
    }
}

@DgsData(parentType = "LineItem", field = "product")
public CompletableFuture<Product> lineItemProduct(DgsDataFetchingEnvironment dfe) {
    LineItem lineItem = dfe.getSource();
    DataLoader<String, Product> loader = dfe.getDataLoader("products");
    return loader.load(lineItem.getProductId()); // batched automatically
}
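
To show what "batched automatically" means, here is a toy batching loader in plain Java. It is deliberately not the java-dataloader API that DGS uses; the class name BatchingSketch and the batchCalls counter are hypothetical, and the sketch only illustrates the core idea of queueing keys and resolving them with one backend call.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Toy batching loader: NOT the java-dataloader API, only an illustration of the idea.
class BatchingSketch<K, V> {

    private final Function<Set<K>, Map<K, V>> batchFunction;
    private final Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();
    int batchCalls = 0; // exposed so the effect of batching is observable

    BatchingSketch(Function<Set<K>, Map<K, V>> batchFunction) {
        this.batchFunction = batchFunction;
    }

    // Queue a key; duplicate keys share one future.
    CompletableFuture<V> load(K key) {
        return pending.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Resolve every queued key with a single call to the batch function.
    void dispatch() {
        if (pending.isEmpty()) {
            return;
        }
        Map<K, CompletableFuture<V>> batch = new LinkedHashMap<>(pending);
        pending.clear();
        batchCalls++;
        Map<K, V> values = batchFunction.apply(new LinkedHashSet<>(batch.keySet()));
        // Keys missing from the result complete with null.
        batch.forEach((key, future) -> future.complete(values.get(key)));
    }
}
```

Fifty load calls followed by one dispatch produce a single invocation of the batch function, which is where the one-query-for-all-IDs repository call belongs. In DGS the framework triggers the dispatch for you as part of query execution.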

Apollo Router vs Apollo Gateway: Performance Comparison

Apollo Router (Rust-based, v1.0+ stable) vs Apollo Gateway (Node.js) is a critical production choice:

  • Apollo Router: written in Rust, 5–10x lower memory footprint, sub-millisecond query planning overhead, supports Rhai scripts for custom logic, recommended for all new deployments
  • Apollo Gateway: Node.js, higher memory usage (~200MB vs ~20MB for Router), slower cold start, but supports JavaScript plugins for teams with existing Node.js expertise

At 10,000 queries/second, Apollo Router consumes ~120MB RAM and adds ~0.5ms planning overhead. Apollo Gateway at the same load consumes ~800MB and adds ~3–5ms. These figures vary with schema size and query shape, so benchmark with your own traffic before sizing. For latency-sensitive APIs, the Router is the clear choice.

Schema Composition and Breaking Change Detection

# CI pipeline schema validation with Rover CLI
# Install Rover
curl -sSL https://rover.apollo.dev/nix/latest | sh

# Check subgraph composition (run in CI before every subgraph deployment)
rover subgraph check my-graph@production \
  --name order-service \
  --schema ./src/main/resources/graphql/schema.graphqls

# Publish updated subgraph schema to Apollo Registry
rover subgraph publish my-graph@production \
  --name order-service \
  --schema ./src/main/resources/graphql/schema.graphqls \
  --routing-url https://order-service.internal/graphql

Breaking changes detected by Rover include: removing a field, changing a field's return type from non-nullable to nullable, making an optional argument required, removing an entity's @key, and renaming arguments. Non-breaking additions (new fields, new types) pass automatically.
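
The nullability rule is easy to get backwards, so here is a toy Java checker for output fields only. It is a sketch under stated assumptions, not Rover's algorithm: the class name is hypothetical, each schema is reduced to a map of field name to type string, and only two cases are covered. For output fields, loosening a type from non-null to nullable is the risky direction, because clients may rely on the value never being null; tightening to non-null is safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy checker for output fields only; real schema checks cover far more cases.
class BreakingChangeSketch {

    // Maps are field name -> GraphQL type string, e.g. "totalAmount" -> "Float!".
    static List<String> breakingChanges(Map<String, String> oldFields, Map<String, String> newFields) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : oldFields.entrySet()) {
            String field = entry.getKey();
            String oldType = entry.getValue();
            String newType = newFields.get(field);
            if (newType == null) {
                problems.add("removed field: " + field);
            } else if (oldType.endsWith("!") && !newType.endsWith("!")) {
                // Clients may rely on the value never being null.
                problems.add("now nullable: " + field);
            }
        }
        // Fields that exist only in newFields are additions and are not reported.
        return problems;
    }
}
```

Comparing {status: "OrderStatus!", totalAmount: "Float!"} against {status: "OrderStatus", extra: "Int"} reports status as now nullable and totalAmount as removed, while the added extra field passes.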

Failure Scenarios

Subgraph Down

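When a subgraph becomes unavailable, the Router returns partial data for fields resolvable from healthy subgraphs. Fields that depend on the unavailable subgraph return null, with a matching entry in the response's errors array; if such a field is non-nullable, the null propagates to the nearest nullable parent. Set per-subgraph timeouts so a slow subgraph fails fast, and enable the Router's health check endpoint: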

# router.yaml
traffic_shaping:
  all:
    timeout: 30s
  subgraphs:
    order-service:
      timeout: 5s

health_check:
  enabled: true
  listen: 0.0.0.0:8088
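
The partial-data behaviour can be sketched in plain Java. This is an illustration under assumptions, not Router code: the class and method names are hypothetical, each subgraph call is modelled as a CompletableFuture, and the timeouts are arbitrary values chosen for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PartialResponseSketch {

    // Wait up to timeoutMillis for a subgraph; on failure record an error and yield null.
    static Object fetchOrNull(String path, CompletableFuture<Object> call,
                              long timeoutMillis, List<String> errors) {
        try {
            return call.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            errors.add(path + ": interrupted while waiting for subgraph");
        } catch (ExecutionException | TimeoutException e) {
            errors.add(path + ": subgraph unavailable (" + e.getClass().getSimpleName() + ")");
        }
        return null;
    }

    static Map<String, Object> respond(CompletableFuture<Object> orderCall,
                                       CompletableFuture<Object> reviewCall) {
        List<String> errors = new ArrayList<>();
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("order", fetchOrNull("order", orderCall, 5_000, errors));
        data.put("reviews", fetchOrNull("reviews", reviewCall, 100, errors));

        // Healthy fields keep their data; failed fields are null with an error entry.
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("data", data);
        response.put("errors", errors);
        return response;
    }
}
```

If the review call never completes, the response still contains the order, with reviews set to null and one entry in errors, so clients can render what is available.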

Schema Composition Failure

If a subgraph publishes a schema that is incompatible with the supergraph (e.g., two services define conflicting types), composition fails. The Router continues serving the last valid supergraph. Composition errors are surfaced in Apollo Studio. Prevention: always run rover subgraph check in CI before publishing.
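
The "keep serving the last valid supergraph" behaviour can be modelled with a small Java class. This is a toy under heavy assumptions: the class name is hypothetical, each subgraph is reduced to a map of field name to type, and "composition" only checks that two subgraphs do not define the same field with different types. Real composition applies many more rules.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a failed composition leaves the previously active schema in place.
class LastGoodSupergraphSketch {

    private Map<String, String> active = Map.of(); // field -> type of the schema being served

    // Try to compose; on conflict return false and keep the previous result.
    boolean publish(List<Map<String, String>> subgraphs) {
        Map<String, String> composed = new HashMap<>();
        for (Map<String, String> subgraph : subgraphs) {
            for (Map.Entry<String, String> field : subgraph.entrySet()) {
                String existing = composed.putIfAbsent(field.getKey(), field.getValue());
                if (existing != null && !existing.equals(field.getValue())) {
                    return false; // conflicting definitions: composition fails
                }
            }
        }
        active = Map.copyOf(composed);
        return true;
    }

    Map<String, String> active() {
        return active;
    }
}
```

A publish that declares User.name as String! in one subgraph and Int in another returns false, and active() still returns the schema from the last successful publish.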

When NOT to Use GraphQL Federation

  • Simple REST APIs with <5 services: the operational overhead of Router deployment, schema composition, and Apollo Registry is not justified
  • Teams without GraphQL expertise: Federation adds substantial complexity on top of base GraphQL — learn base GraphQL first
  • Write-heavy APIs: GraphQL mutations across federated services introduce distributed transaction complexity; prefer REST + Saga pattern for heavy write workloads
  • Microservices with very different SLAs: if one subgraph has 99.0% availability, the supergraph's effective availability for queries crossing that subgraph cannot exceed 99.0%
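
The availability ceiling in the last point follows from simple arithmetic. The sketch below multiplies the availabilities of every component a query crosses; it assumes failures are independent, which real systems only approximate, and the 99.9% figure used in the usage note is an illustrative assumption rather than a measured value.

```java
class AvailabilitySketch {

    // Probability that every component on the query path is up, assuming independent failures.
    static double composed(double... availabilities) {
        double result = 1.0;
        for (double availability : availabilities) {
            result *= availability;
        }
        return result;
    }
}
```

For example, AvailabilitySketch.composed(0.999, 0.99) models a query that crosses a 99.9% available subgraph and a 99.0% available one. The result is roughly 0.989, below the weaker of the two, which is why the least reliable subgraph caps every query that touches it.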

Key Takeaways

  • GraphQL Federation eliminates the frontend API orchestration problem by moving it into the Router — a single, purpose-built, high-performance layer
  • Entities with @key are the glue of the supergraph — design them carefully, as changing key fields is a breaking change
  • DataLoader is mandatory in federated GraphQL; without it, entity resolution becomes an N+1 avalanche
  • Apollo Router (Rust) has a 5–10x lower memory footprint and lower query planning overhead than Apollo Gateway (Node.js); use it for all new production deployments
  • Run rover subgraph check in every CI pipeline; never allow breaking schema changes to reach the production supergraph undetected
  • Netflix DGS provides built-in Federation v2 support for Spring Boot teams and handles the federation boilerplate so you can focus on resolvers

Conclusion

GraphQL Federation is not a simple technology — it introduces a new coordination layer, a new failure domain (the Router), and new concepts (entities, subgraphs, query planning) that every team member must understand. But the return on that investment is real: frontend teams gain complete autonomy over data fetching without requiring backend coordination, microservice teams gain complete autonomy over their subgraph schemas, and the entire organisation benefits from a single, consistent, discoverable API graph. For organisations with 5+ microservices and multiple client teams, Federation is the most powerful API architecture available today.

