Service Communication Patterns in Microservices: REST, gRPC, Messaging, and GraphQL Federation

Choosing the right inter-service communication mechanism is one of the most impactful architectural decisions in a microservices system. The wrong choice creates tight coupling, cascading failures, and brittle contracts. This guide covers the four dominant patterns and when to use each.

Md Sanwar Hossain | March 2026 | 18 min read

Figure: Network services communicating in a microservices architecture

The Communication Spectrum
REST over HTTP/JSON: The Universal Interface
gRPC: High-Performance Internal APIs
Asynchronous Messaging with Apache Kafka
GraphQL Federation: Unified APIs Across Services
Choosing the Right Pattern

The Communication Spectrum

Figure: Service communication patterns (mdsanwarhossain.me)

Inter-service communication spans a spectrum from tight synchronous coupling to loose asynchronous decoupling. At one end: a synchronous HTTP REST call where the caller blocks until it receives a response. At the other end: a Kafka event where the producer publishes and immediately continues, and the consumer processes at its own pace. Each point on this spectrum involves different trade-offs between latency, resilience, complexity, and consistency.

No single communication mechanism is universally best. Production microservices systems use different mechanisms for different interaction types, choosing based on the nature of the interaction: does the caller need the result before it can proceed? Is the interaction time-sensitive? Can the caller tolerate eventual consistency? These questions drive the decision.

REST over HTTP/JSON: The Universal Interface

REST is the default choice for public-facing APIs and for service-to-service calls where simplicity and debuggability are priorities. Its advantages are significant: every language and framework has HTTP client libraries; JSON is human-readable and trivially debuggable; REST APIs are easily documented with OpenAPI/Swagger; and browser-based clients can consume REST APIs directly.

Designing REST APIs for Microservices

REST APIs between services benefit from the same design discipline as public APIs. Use resource-oriented URLs, appropriate HTTP verbs, and standard HTTP status codes. Versioning matters — use URL path versioning (/v1/users) or content negotiation. Define response schemas strictly with OpenAPI contracts, and use contract testing (Pact) to verify that producers and consumers remain compatible as both evolve independently.

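// Spring Boot REST controller: 404 for a missing user, 201 with a Location header on create
import java.net.URI;
import java.util.UUID;
import jakarta.validation.Valid;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.servlet.support.ServletUriComponentsBuilder;

@RestController
@RequestMapping("/v1/users")
public class UserController {
    private final GetUserUseCase getUserUseCase;
    // Assumed type: the original snippet used createUserUseCase without declaring it
    private final CreateUserUseCase createUserUseCase;

    // Constructor injection keeps the controller free of field-level wiring
    public UserController(GetUserUseCase getUserUseCase, CreateUserUseCase createUserUseCase) {
        this.getUserUseCase = getUserUseCase;
        this.createUserUseCase = createUserUseCase;
    }

    @GetMapping("/{userId}")
    public ResponseEntity<UserResponse> getUser(@PathVariable UUID userId) {
        return getUserUseCase.execute(userId)
            .map(user -> ResponseEntity.ok(UserResponse.from(user)))
            .orElse(ResponseEntity.notFound().build());
    }

    @PostMapping
    public ResponseEntity<UserResponse> createUser(@Valid @RequestBody CreateUserRequest req) {
        User created = createUserUseCase.execute(req.toDomain());
        // Point the Location header at the newly created resource
        URI location = ServletUriComponentsBuilder.fromCurrentRequest()
            .path("/{id}").buildAndExpand(created.getId()).toUri();
        return ResponseEntity.created(location).body(UserResponse.from(created));
    }
}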

REST Resilience: Circuit Breakers and Retries

Synchronous REST calls between services create availability dependencies. If Service B is slow, Service A's thread pool fills with waiting requests, eventually causing cascading failure. Apply the circuit breaker pattern with Resilience4j: after a threshold of failures, the circuit opens and requests fail fast without hitting the unavailable downstream service. Combine with retries (with exponential backoff) for transient failures, bulkhead isolation for different downstream dependencies, and timeouts on every outgoing request.
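To make the retry part of that advice concrete, here is a minimal plain-Java sketch of retries with exponential backoff. It is not how Resilience4j implements retries; the class name BackoffRetry and its parameters are hypothetical, and it omits jitter, which a production setup would normally add.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Minimal retry helper with exponential backoff. A sketch, not Resilience4j's implementation. */
final class BackoffRetry {

    /** Delay before retry number `attempt` (1-based): base * 2^(attempt-1), capped at max. */
    static Duration delayFor(int attempt, Duration base, Duration max) {
        int shift = Math.max(0, Math.min(attempt - 1, 20)); // bound the shift to avoid overflow
        Duration delay = base.multipliedBy(1L << shift);
        return delay.compareTo(max) > 0 ? max : delay;
    }

    /** Runs the task up to maxAttempts times, sleeping between failed attempts. */
    static <T> T call(Supplier<T> task, int maxAttempts, Duration base, Duration max) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delayFor(attempt, base, max));
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    private static void sleep(Duration d) {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("transient failure");
            return "ok";
        }, 5, Duration.ofMillis(10), Duration.ofMillis(100));
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Retries of this kind are only safe for operations that are idempotent or protected by an idempotency key; otherwise a retry can repeat a side effect that already happened.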

gRPC: High-Performance Internal APIs

Figure: API communication design (mdsanwarhossain.me)

gRPC is Google's open-source RPC framework built on HTTP/2 and Protocol Buffers. For internal service-to-service communication where throughput and latency are critical, gRPC offers significant advantages: strongly-typed contracts defined in .proto files (eliminating many of the type mismatch bugs common with JSON APIs); binary serialization with Protocol Buffers (commonly cited as 3–10x smaller payloads and faster serialization than JSON, though the exact ratio depends on the data); HTTP/2 multiplexing (multiple streams share a single connection without HTTP-level head-of-line blocking, although TCP-level head-of-line blocking still applies); and bidirectional streaming (clients and servers can stream data in both directions simultaneously).

// user.proto — service contract
syntax = "proto3";
package com.example.user.v1;
service UserService {
  rpc GetUser (GetUserRequest) returns (UserResponse);
  rpc StreamUserEvents (StreamRequest) returns (stream UserEvent);
}
message GetUserRequest {
  string user_id = 1;
}
message UserResponse {
  string user_id = 1;
  string email    = 2;
  string name     = 3;
  int64  created_at = 4;
}

The .proto file serves as the canonical contract. Both client and server generate their code from it, ensuring type safety across service boundaries. In Spring Boot, the spring-grpc project (stable from Spring 2025) provides idiomatic gRPC server and client support with Spring's DI, security, and observability integrations.

When to choose gRPC over REST: High-throughput internal APIs; real-time streaming; mobile/backend communication where payload size matters; polyglot environments where strongly-typed contracts prevent integration bugs.
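The payload-size argument can be illustrated by hand-encoding a single field. The sketch below encodes the created_at field (field number 4, type int64) from the contract above as a Protocol Buffers varint and compares it with the same field as JSON text. It is a hand-rolled illustration, not the protobuf library; the class name VarintSizeDemo and the sample timestamp are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hand-rolled varint encoding to compare wire sizes. Illustration only, not the protobuf runtime. */
final class VarintSizeDemo {

    /** Base-128 varint: 7 payload bits per byte, least significant group first, high bit = "more follows". */
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    /** One varint-typed field: tag = (fieldNumber << 3) | wire type 0, followed by the value. */
    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(encodeVarint((long) fieldNumber << 3));
        out.writeBytes(encodeVarint(value));
        return out.toByteArray();
    }

    /** Size of the same field as a JSON member, e.g. "created_at":1700000000 */
    static int jsonSize(String name, long value) {
        return ("\"" + name + "\":" + value).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        long createdAt = 1_700_000_000L; // sample epoch-seconds value, chosen for illustration
        System.out.println("protobuf bytes: " + encodeVarintField(4, createdAt).length);
        System.out.println("json bytes:     " + jsonSize("created_at", createdAt));
    }
}
```

For this one field the binary form needs 6 bytes against 23 for the JSON member, mostly because the field name is replaced by a one-byte tag. Real payloads mix strings and nested messages, so the overall ratio varies.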

Asynchronous Messaging with Apache Kafka

Kafka is the dominant choice for event-driven inter-service communication. A service publishes an event to a Kafka topic; any number of consumers subscribe and process it independently, at their own pace, retrying on failure without affecting other consumers. Kafka provides durable, ordered, replayable event logs — published events are not lost if a consumer is temporarily unavailable.

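// Spring Boot Kafka producer (Spring Kafka 3.x, where send() returns a CompletableFuture)
import java.time.Instant;
import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class OrderEventPublisher {
    private static final Logger log = LoggerFactory.getLogger(OrderEventPublisher.class);
    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;

    public OrderEventPublisher(KafkaTemplate<String, OrderEvent> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publishOrderPlaced(Order order) {
        OrderEvent event = OrderEvent.builder()
            .eventId(UUID.randomUUID().toString()) // unique ID lets consumers detect duplicates
            .eventType("ORDER_PLACED")
            .orderId(order.getId())
            .customerId(order.getCustomerId())
            .totalAmount(order.getTotalAmount())
            .occurredAt(Instant.now())
            .build();
        // Keyed by order ID (assumed to be a String) so events for one order stay on one partition, in order
        kafkaTemplate.send("order-events", order.getId(), event)
            .whenComplete((result, ex) -> {
                if (ex != null) log.error("Failed to publish order event: {}", order.getId(), ex);
                else log.info("Order event published: partition={}, offset={}",
                    result.getRecordMetadata().partition(),
                    result.getRecordMetadata().offset());
            });
    }
}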

When to choose Kafka over synchronous communication: When the producer does not need the consumer's response to continue; when consumers need to process at their own rate; when the event log needs to be replayable; when multiple consumers need to independently process the same events; and when the producer and consumers should be independently deployable and scalable.
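The properties in that list (the producer does not wait, consumers read at their own rate, the log can be replayed) follow from one idea: an append-only log with a separate read position per consumer group. The sketch below models a single partition in memory to show that idea. It is not the Kafka client API, and EventLogSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for one topic partition: an append-only log plus one offset per consumer group. */
final class EventLogSketch {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> committedOffsets = new HashMap<>();

    /** Producer side: append and return the record's offset without waiting for any consumer. */
    int publish(String event) {
        log.add(event);
        return log.size() - 1;
    }

    /** Consumer side: read everything after the group's committed offset, then commit the new position. */
    List<String> poll(String group) {
        int from = committedOffsets.getOrDefault(group, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        committedOffsets.put(group, log.size());
        return batch;
    }

    /** Replay: rewind one group's offset. Other groups keep their own positions. */
    void seekToBeginning(String group) {
        committedOffsets.put(group, 0);
    }

    public static void main(String[] args) {
        EventLogSketch topic = new EventLogSketch();
        topic.publish("ORDER_PLACED:1");
        topic.publish("ORDER_PLACED:2");
        System.out.println("billing reads   " + topic.poll("billing"));
        topic.publish("ORDER_PLACED:3");
        System.out.println("analytics reads " + topic.poll("analytics")); // a slower group still sees every event
        topic.seekToBeginning("billing");
        System.out.println("billing replays " + topic.poll("billing"));
    }
}
```

Because each group tracks its own offset, a slow or temporarily offline consumer never blocks the producer or the other groups; it simply catches up from where it left off.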

GraphQL Federation: Unified APIs Across Services

GraphQL Federation allows multiple microservices to each own a slice of a unified GraphQL schema. A gateway (for example Apollo Router or Apollo Gateway) composes the service schemas into a single graph and routes each part of a query to the service that owns it; the services themselves can be built with frameworks such as Netflix's DGS. This is particularly valuable for frontend teams building complex UIs that aggregate data across many services: instead of making N REST calls and assembling the result in the client, the client makes one GraphQL query and the federation gateway handles the fan-out.

Each service defines its entity types and the fields it owns. Other services can extend those entities with fields they own. The gateway resolves queries by fetching entity references from one service and extending them with fields from another, transparently to the client.
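The resolution step described above can be sketched without any GraphQL library: fetch the entity from the service that owns it, then merge in the fields that other services contribute for the same key. Everything in this example is hypothetical (FederationSketch, the two subgraph functions, the sample data), and a real gateway plans its fetches from the query's selection set rather than merging every field.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model of federated entity resolution. Not Apollo's or DGS's API. */
final class FederationSketch {

    /** Hypothetical user service: owns the User entity and its base fields. */
    static Map<String, Object> userSubgraph(String userId) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("id", userId);
        user.put("name", "Ada");
        return user;
    }

    /** Hypothetical order service: extends User with an "orders" field, looked up by the user's key. */
    static Map<String, Object> orderSubgraph(String userId) {
        Map<String, Object> extension = new LinkedHashMap<>();
        extension.put("orders", List.of("order-1", "order-2"));
        return extension;
    }

    /** Gateway: resolve the entity from its owner, then merge in fields from each extending service. */
    static Map<String, Object> resolveUser(
            String userId, List<Function<String, Map<String, Object>>> extensions) {
        Map<String, Object> result = new LinkedHashMap<>(userSubgraph(userId));
        String key = (String) result.get("id"); // the entity reference passed to other services
        for (Function<String, Map<String, Object>> extension : extensions) {
            result.putAll(extension.apply(key));
        }
        return result;
    }

    public static void main(String[] args) {
        Function<String, Map<String, Object>> orders = FederationSketch::orderSubgraph;
        System.out.println(resolveUser("u-1", List.of(orders)));
    }
}
```

The client sees one User object; it has no way to tell that the name came from one service and the orders from another.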

Choosing the Right Pattern

Use this decision framework: REST for CRUD operations, public APIs, and cases where debuggability is more important than raw performance. gRPC for high-throughput internal APIs between known services, especially with streaming requirements. Kafka for business events that need to trigger downstream processing, for workflows where steps can be parallel, and for data integration between services. GraphQL Federation for frontend-to-backend aggregation across multiple services when client flexibility and query efficiency are priorities.

Most production systems use all four. The discipline is applying each in the right context rather than defaulting to one for everything.

"The communication pattern is the contract. Choose it as carefully as you choose your API design — because changing it in production requires coordinated migration across all consumers."

Key Takeaways

REST is the universal default for public APIs and CRUD operations; always apply circuit breakers for service-to-service REST calls.
gRPC's strongly-typed contracts and binary serialization make it ideal for high-throughput internal communication.
Kafka enables loose coupling, independent scaling, and replayable event logs for business event communication.
GraphQL Federation unifies fragmented microservices data behind a single client-friendly API.
Production systems use all four patterns; the skill is matching the pattern to the interaction type.

Communication Pattern Decision Matrix

Pattern	Use When	Avoid When
REST/HTTP	Public APIs, CRUD, debuggability matters	Sub-millisecond internal latency needed
gRPC	High-throughput internal APIs, streaming	External clients, browser-facing APIs
Kafka	Business events, fan-out, replayable audit	Request/reply with immediate response needed
GraphQL Federation	Aggregating data across services for clients	Simple CRUD, machine-to-machine integrations

Conclusion

Service communication is the connective tissue of every microservices system. Choosing the wrong pattern is not just a performance issue — it creates organisational friction, difficult migrations, and hidden failure modes. REST provides universal reach and debuggability. gRPC delivers typed, high-throughput internal contracts. Kafka decouples producers and consumers for resilient event-driven workflows. GraphQL Federation unifies fragmented data behind a flexible client API.

At BRAC IT: Our Service Communication Evolution

In 2022, our entire platform used synchronous REST for all inter-service communication. Service A called Service B, which called Service C, which called D, E, and F. This design created a latency stack: a slow downstream service made every upstream service slow. During a payment gateway degradation in late 2022, one service chain with 8 hops had a P99 latency of 14 seconds. The payment gateway itself was responding in 3 seconds, but that 3-second delay was incurred 3 times along the call chain, on top of each hop's own latency. Users gave up or tried to submit their payments a second time.

Over 18 months, we progressively migrated to a hybrid communication model. The principle: REST for synchronous queries that need an immediate response, Kafka for state-change notifications that trigger downstream processing. Today our architecture uses:

REST: client-facing APIs, queries that return data the caller immediately displays
Kafka: loan state transitions, audit events, analytics data, triggering background jobs
gRPC: high-frequency internal service calls between our reporting and data aggregation services
SSE (Server-Sent Events): real-time status updates pushed to browser clients

The result: our synchronous call chains dropped from 8 hops to 3 hops at most. P99 API latency for loan applications dropped from 4.2 seconds to 820 milliseconds. The change failure rate dropped from 18% to 4% because services can now be deployed independently without coordinating with their downstream consumers.

Circuit Breaking and Resilience with Resilience4j

Even with a hybrid communication model, synchronous REST calls remain in every system. Circuit breakers are essential for preventing a slow downstream service from taking down your entire call chain. Resilience4j is the go-to library for Spring Boot:

resilience4j:
  circuitbreaker:
    instances:
      paymentGateway:
        sliding-window-type: COUNT_BASED
        sliding-window-size: 10
        failure-rate-threshold: 50       # open if 5/10 calls fail
        wait-duration-in-open-state: 30s # stay open 30s, then try half-open
        permitted-number-of-calls-in-half-open-state: 3
        slow-call-duration-threshold: 2s # calls > 2s count as slow
        slow-call-rate-threshold: 60     # open if 60% of calls are slow

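import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import io.github.resilience4j.timelimiter.annotation.TimeLimiter;
import java.util.concurrent.CompletableFuture;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    // Collaborator types are assumed: the original snippet used these fields without declaring them
    private final PaymentGatewayClient gatewayClient;
    private final PaymentRetryQueue retryQueue;

    public PaymentService(PaymentGatewayClient gatewayClient, PaymentRetryQueue retryQueue) {
        this.gatewayClient = gatewayClient;
        this.retryQueue = retryQueue;
    }

    @CircuitBreaker(name = "paymentGateway", fallbackMethod = "paymentFallback")
    @Retry(name = "paymentGateway")
    @TimeLimiter(name = "paymentGateway")
    public CompletableFuture<PaymentResult> processPayment(PaymentRequest req) {
        // Async return type is required for @TimeLimiter to enforce the timeout
        return CompletableFuture.supplyAsync(() -> gatewayClient.pay(req));
    }

    // Fallback takes the original arguments plus the exception and returns the same type
    public CompletableFuture<PaymentResult> paymentFallback(
            PaymentRequest req, Exception ex) {
        // Queue for retry, return pending status to caller
        retryQueue.enqueue(req);
        return CompletableFuture.completedFuture(
            PaymentResult.pending("Payment queued for retry"));
    }
}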
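To show what the count-based settings in the configuration above mean, here is a simplified, single-threaded state machine that uses the same numbers (window of 10, 50% failure threshold, 30 s open wait, 3 half-open calls). It is a teaching sketch under those assumptions, not Resilience4j's implementation: CountBasedBreaker is a hypothetical class, it ignores the slow-call settings, and it is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Simplified count-based circuit breaker. Illustrative only. */
final class CountBasedBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;
    private final double failureRateThreshold; // percent
    private final long openWaitMillis;
    private final int halfOpenCalls;
    private final LongSupplier clock;          // injectable so tests can control time
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;
    private int halfOpenSeen;
    private int halfOpenFailures;

    CountBasedBreaker(int windowSize, double failureRateThreshold,
                      long openWaitMillis, int halfOpenCalls, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openWaitMillis = openWaitMillis;
        this.halfOpenCalls = halfOpenCalls;
        this.clock = clock;
    }

    State state() { return state; }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openWaitMillis) {
                return fallback.get();   // fail fast: the downstream is not called at all
            }
            state = State.HALF_OPEN;     // wait elapsed: let a few trial calls through
            halfOpenSeen = 0;
            halfOpenFailures = 0;
        }
        try {
            T result = action.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            return fallback.get();
        }
    }

    private void record(boolean failed) {
        if (state == State.HALF_OPEN) {
            halfOpenSeen++;
            if (failed) halfOpenFailures++;
            if (halfOpenSeen == halfOpenCalls) {
                if (100.0 * halfOpenFailures / halfOpenCalls >= failureRateThreshold) {
                    open();
                } else {
                    state = State.CLOSED;
                    window.clear();
                }
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > windowSize) window.removeFirst();
        if (window.size() == windowSize) { // evaluate only once the window is full
            long failures = window.stream().filter(f -> f).count();
            if (100.0 * failures / windowSize >= failureRateThreshold) open();
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }

    public static void main(String[] args) {
        CountBasedBreaker breaker = new CountBasedBreaker(10, 50.0, 30_000, 3, System::currentTimeMillis);
        Supplier<String> failing = () -> { throw new IllegalStateException("gateway down"); };
        for (int i = 0; i < 10; i++) breaker.call(failing, () -> "pending");
        System.out.println("state after 10 failures: " + breaker.state());
    }
}
```

The key behaviour is in the first branch of call: while the breaker is open, the downstream service receives no traffic at all, which gives it room to recover.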

Monitor circuit breaker state with Actuator + Micrometer. Add a Grafana alert: if any circuit breaker is OPEN for more than 60 seconds, page on-call. A breaker stuck open means a downstream dependency has not recovered — it needs human investigation, not just automatic retries.

API Versioning Strategies for Long-Lived Services

APIs outlive the teams that built them. Getting versioning wrong early means painful migrations later. Three strategies and their tradeoffs:

Strategy	Example	Pros	Cons
URL versioning	`/api/v1/loans`	Simple, visible, easy to cache	Clutters URLs; the version is not really part of the resource's identity
Header versioning	`Accept: application/vnd.bracit.v2+json`	Clean URLs, RESTful	Less visible, harder to test in browser
Query param	`/api/loans?version=2`	Easy to add to existing APIs	Complicates caching, easy to omit, feels like a hack

Our recommendation at BRAC IT: URL versioning for all client-facing public APIs (mobile apps, third-party integrations), header versioning for internal service-to-service APIs. The versioning contract we enforce: support N and N-1 simultaneously. When releasing v2, v1 enters deprecation. After 6 months with Sunset response headers warning consumers, v1 is retired. This gives integrators time to migrate without requiring your team to maintain versions indefinitely.
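The N and N-1 rule is simple enough to express as a small policy function. The sketch below parses the version out of a vendor media type in the format shown in the table and decides whether to serve, serve with a deprecation signal, or reject. The class VersionPolicy and its Decision enum are hypothetical, and the parser handles only a single media type, not a full Accept header with multiple entries or q-values.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of an "N and N-1" version policy for header-versioned APIs. */
final class VersionPolicy {
    // Matches vendor media types such as application/vnd.bracit.v2+json
    private static final Pattern VENDOR_TYPE =
        Pattern.compile("application/vnd\\.bracit\\.v(\\d{1,3})\\+json");

    enum Decision { SERVE, SERVE_DEPRECATED, REJECT }

    /** Extracts the version number, or empty if the header is missing or not a vendor type. */
    static Optional<Integer> parseVersion(String acceptHeader) {
        if (acceptHeader == null) return Optional.empty();
        Matcher m = VENDOR_TYPE.matcher(acceptHeader.trim());
        return m.matches() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }

    /** N is served normally, N-1 is served with a deprecation signal, anything else is rejected. */
    static Decision decide(int requested, int current) {
        if (requested == current) return Decision.SERVE;
        if (requested == current - 1) return Decision.SERVE_DEPRECATED;
        return Decision.REJECT;
    }

    public static void main(String[] args) {
        int current = 2; // assumed current version for the demo
        String header = "application/vnd.bracit.v1+json";
        Decision decision = parseVersion(header)
            .map(v -> decide(v, current))
            .orElse(Decision.REJECT);
        System.out.println(header + " -> " + decision);
    }
}
```

A handler that receives SERVE_DEPRECATED would attach the Sunset header with the planned retirement date before returning the response.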

Service Communication Production Checklist

Use this checklist before any service communication pattern reaches production:

For REST APIs: Use OpenAPI 3.x contract-first design; validate request/response against the schema in CI; add correlation ID header to every request; set read timeouts (never rely on OS defaults); implement circuit breakers with Resilience4j on all outbound synchronous calls; version your API URL path and document the deprecation schedule.

For gRPC: Define services in .proto files committed to a shared schema repository; use deadlines (not just timeouts) on every RPC call; implement retries with idempotency keys; use server-side reflection in development but disable it in production; add gRPC status codes to your Prometheus metrics.

For Kafka: Partition by aggregate ID for ordering guarantees; implement idempotent consumers; configure a DLQ for unprocessable messages; monitor consumer group lag and alert before it becomes a business problem; set message retention period based on regulatory requirements, not convenience defaults.

For all patterns: Every service-to-service call must propagate trace context (W3C TraceContext headers); log correlation IDs on every request; document your service's communication contracts in Backstage; test failure scenarios — what happens when your downstream is unavailable? What is the fallback?
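Two of the Kafka items in the checklist, idempotent consumers and a DLQ, can be sketched together in plain Java. The example deduplicates on an event ID (such as the eventId the producer attaches), retries a failing handler a bounded number of times, and parks the message in a dead-letter list when retries are exhausted. IdempotentConsumer is a hypothetical class with in-memory state; a production version would need a durable store for processed IDs, updated atomically with the handler's side effects.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** In-memory sketch of an idempotent consumer with a dead-letter queue. */
final class IdempotentConsumer {
    private final Set<String> processedIds = new HashSet<>();
    private final List<String> deadLetters = new ArrayList<>();
    private final Consumer<String> handler;
    private final int maxAttempts;

    IdempotentConsumer(Consumer<String> handler, int maxAttempts) {
        this.handler = handler;
        this.maxAttempts = maxAttempts;
    }

    /** Returns true if the handler ran successfully; false for a duplicate or a dead-lettered message. */
    boolean onMessage(String eventId, String payload) {
        if (processedIds.contains(eventId)) {
            return false; // redelivery of an event that was already handled: skip it
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(payload);
                processedIds.add(eventId);
                return true;
            } catch (RuntimeException e) {
                // handler failed: try again until attempts run out
            }
        }
        deadLetters.add(eventId); // park it so the rest of the partition is not blocked
        return false;
    }

    List<String> deadLetters() {
        return List.copyOf(deadLetters);
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer(payload -> {
            if (payload.isEmpty()) throw new IllegalArgumentException("unprocessable");
            System.out.println("handled " + payload);
        }, 3);
        consumer.onMessage("evt-1", "ORDER_PLACED");
        consumer.onMessage("evt-1", "ORDER_PLACED"); // duplicate delivery, ignored
        consumer.onMessage("evt-2", "");             // fails every attempt, goes to the DLQ
        System.out.println("dead letters: " + consumer.deadLetters());
    }
}
```

Deduplication matters because at-least-once delivery and producer retries both mean the same event can arrive more than once.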

Service communication is the silent reliability multiplier. Get it right and your system degrades gracefully under load. Get it wrong and a single slow service brings down everything connected to it. The discipline of designing for failure — circuit breakers, retries with backoff, DLQs, idempotency — is what separates systems that handle reality from systems that only work in ideal conditions.
