Md Sanwar Hossain
Senior Software Engineer · Career & Interview Series
Microservices · April 5, 2026 · 30 min read

Microservices Interview Questions 2026: Architecture, Patterns & Scenario-Based

"Microservices interview questions" draws 50K+ searches per month, and senior microservices roles demand deep knowledge of communication patterns, distributed data management, resilience strategies, and production operational realities. This guide covers 35+ scenario-based questions with detailed answers on Circuit Breaker, Saga, CQRS, Event Sourcing, service mesh, API gateway design, and the distributed-systems trade-offs that interviewers care about most in 2026.

Table of Contents

  1. Architecture & Design Principles
  2. Communication Patterns: Sync vs Async
  3. Resilience Patterns: Circuit Breaker, Bulkhead, Retry
  4. Distributed Data: Saga, CQRS, Outbox, Event Sourcing
  5. Service Mesh & API Gateway
  6. Observability: Tracing, Metrics, Logging
  7. Security: JWT, mTLS, OAuth2 in Microservices
  8. Production Scenario Questions

1. Architecture & Design Principles


Q1: How do you determine the right service boundaries when decomposing a monolith into microservices?

Answer: Use Domain-Driven Design (DDD) bounded contexts as the primary guide. A bounded context defines a coherent model with its own ubiquitous language — each bounded context becomes a candidate service. Practical heuristics: decompose around business capabilities (not technical layers), ensure each service can be deployed independently, and target single ownership per service. Avoid fine-grained decomposition that creates a distributed monolith — if two "services" always deploy together or share a database, they belong in one service.

The Strangler Fig Pattern is the safest decomposition approach: incrementally route traffic to new microservices while the monolith handles the rest, gradually strangling it. Start with the domain that changes most frequently or causes the most deployment pain.

Interview Red Flag to Avoid: Do not decompose by technical layer (auth service, logging service, email service). Each call crosses a service boundary with added latency and failure risk. Domain-based decomposition maximizes cohesion within services and minimizes coupling between them.
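Strangler Fig migrations are typically enforced at the edge. A hypothetical Spring Cloud Gateway config (service names, ports, and paths are illustrative) that routes the already-extracted orders domain to the new microservice while everything else still reaches the monolith:

```yaml
spring:
  cloud:
    gateway:
      routes:
        - id: orders-extracted            # slice already strangled out
          uri: http://order-service:8080
          order: 0                        # match before the catch-all
          predicates:
            - Path=/api/orders/**
        - id: monolith-catch-all          # everything not yet migrated
          uri: http://legacy-monolith:8080
          order: 1
          predicates:
            - Path=/**
```

As each domain is extracted, another specific route is added above the catch-all, until the monolith route can finally be deleted.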

Q2: What is the "distributed monolith" anti-pattern and how do you identify it in a codebase?

Answer: A distributed monolith looks like microservices architecturally (multiple deployed units) but behaves like a monolith operationally — tight coupling means services cannot be deployed independently. Identification signals: (1) synchronous chains where Service A calls B calls C calls D — a single user request crosses 4+ service hops; (2) shared databases across "separate" services; (3) a deploy of Service A requires deploying Service B; (4) a single failing service cascades and brings down all services; (5) all services version their APIs in lockstep. The fix is async event-driven communication, database-per-service, and independent deployability as a hard requirement.

2. Communication Patterns: Sync vs Async

Q3: When do you choose Kafka over REST for inter-service communication, and what are the trade-offs?

Answer:

| Dimension | REST (Sync) | Kafka (Async) |
|---|---|---|
| Coupling | Temporal coupling — both services must be up | Decoupled — consumer can be down, message waits |
| Latency | Low for a single call; compounds in chains | Higher (ms to seconds), eventual consistency |
| Use case | Query data needed for the current response | Events, notifications, state-change propagation |
| Failure | Caller gets an error; retry logic needed | Message retained; consumer retries independently |
| Observability | HTTP status codes, distributed tracing | Consumer lag, dead-letter queues, event replay |

Choose Kafka for: order-of-operations workflows (order → inventory → payment → shipping), audit trails, fan-out to multiple consumers, and event sourcing. Use REST/gRPC for: real-time query results needed synchronously (user authentication, inventory check during checkout).

Q4: How does gRPC differ from REST, and when would you use it for microservices communication?

Answer: gRPC uses Protocol Buffers (binary serialization — ~5-10x smaller than JSON), HTTP/2 (multiplexed streams, header compression), and generated strongly-typed clients in any language. Key advantages over REST: lower latency from binary protocol, bidirectional streaming support, client/server code generation eliminates manual SDK maintenance, and built-in deadline propagation. Use gRPC for: high-throughput internal service-to-service communication (payment processing, inventory updates), streaming APIs (real-time price feeds, log streaming), and polyglot environments where services are written in different languages. Avoid for: public-facing APIs where JSON readability matters, browsers (gRPC-Web required), or teams unfamiliar with Protobuf tooling.
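Contract-first is the core workflow difference. A hypothetical inventory contract (service and message names are illustrative) from which gRPC generates typed clients and server stubs in each language:

```protobuf
syntax = "proto3";

package inventory.v1;

// One unary RPC plus a server-streaming RPC for live updates.
service InventoryService {
  rpc Check (CheckRequest) returns (CheckResponse);
  rpc WatchStock (CheckRequest) returns (stream CheckResponse);
}

message CheckRequest  { string sku = 1; }
message CheckResponse { string sku = 1; int32 available = 2; }
```

Field numbers, not names, are the wire contract — never reuse or renumber them once a message is in production.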

3. Resilience Patterns: Circuit Breaker, Bulkhead, Retry

Microservices resilience patterns circuit breaker saga bulkhead interview | mdsanwarhossain.me
Microservices Resilience Patterns — Circuit Breaker, Saga, Bulkhead — mdsanwarhossain.me

Q5: Explain the Bulkhead pattern — why is it needed, and how does it prevent cascade failures?

Answer: Named after ship bulkheads that divide a hull into isolated compartments, the Bulkhead pattern isolates resources used by different dependencies. Without it, a slow downstream service (e.g., inventory API with 3-second latency) exhausts all shared threads, blocking requests to healthy services (payment, user profile) — cascade failure. With bulkheads, each dependency gets its own dedicated thread pool (or semaphore limit). If inventory threads fill up, payment threads are unaffected.

# application.yml — Resilience4j thread-pool bulkhead
resilience4j:
  thread-pool-bulkhead:
    instances:
      inventoryService:
        maxThreadPoolSize: 10
        coreThreadPoolSize: 5
        queueCapacity: 20
        keepAliveDuration: 20ms
      paymentService:
        maxThreadPoolSize: 15
        coreThreadPoolSize: 8
        queueCapacity: 30

// Service method — bulkhead applied via annotation
@Bulkhead(name = "inventoryService", type = Bulkhead.Type.THREADPOOL,
           fallbackMethod = "inventoryFallback")
public CompletableFuture<Inventory> checkInventory(String sku) {
    return CompletableFuture.supplyAsync(() -> inventoryApi.check(sku));
}
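Under the hood the idea is simply bounded concurrency per dependency. A minimal plain-Java sketch (an illustration of the concept, not Resilience4j's implementation) using a semaphore bulkhead that fails fast to a fallback instead of queuing behind a slow dependency:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// At most maxConcurrent callers may be inside the guarded call at once;
// excess callers get the fallback immediately instead of blocking a thread.
class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) return fallback.get(); // bulkhead full
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }
}

public class BulkheadDemo {
    public static void main(String[] args) {
        SemaphoreBulkhead bulkhead = new SemaphoreBulkhead(1);
        // The outer call holds the only permit, so the nested call is rejected.
        String result = bulkhead.execute(
            () -> bulkhead.execute(() -> "inner", () -> "inner-rejected"),
            () -> "outer-rejected");
        System.out.println(result); // inner-rejected
    }
}
```

The thread-pool variant in the YAML above additionally isolates *which* threads run the call; the semaphore variant only caps concurrency on the caller's thread.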

Q6: How do you design a retry strategy that avoids the thundering herd problem?

Answer: The thundering herd occurs when many services retry a failed dependency simultaneously — the retries arrive at the recovering service in synchronized waves, causing it to fail again. The solution is exponential backoff with jitter. Each retry waits initialDelay * 2^retryCount milliseconds with a random jitter of ±20-50% added. This spreads retry attempts over time, giving the recovering service a chance to stabilize.
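The delay formula can be sketched in plain Java (an illustration of the math, not Resilience4j's internals):

```java
import java.util.concurrent.ThreadLocalRandom;

// delay = initial * 2^attempt, then randomized by ±jitterFactor so that
// synchronized clients spread their retries over time.
public class BackoffWithJitter {
    static long delayMillis(long initialMillis, int attempt, double jitterFactor) {
        double base = initialMillis * Math.pow(2, attempt);
        double jitter = jitterFactor <= 0 ? 1
            : 1 + ThreadLocalRandom.current().nextDouble(-jitterFactor, jitterFactor);
        return (long) (base * jitter);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 3; attempt++) {
            System.out.println("retry " + attempt + " waits ~"
                + delayMillis(500, attempt, 0.5) + "ms");
        }
    }
}
```

With a 500ms initial delay this yields base waits of roughly 500ms, 1s, and 2s, each randomized by ±50% so the recovering service never sees a synchronized wave.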

# Resilience4j Retry with exponential backoff + jitter
resilience4j:
  retry:
    instances:
      paymentService:
        maxAttempts: 3
        waitDuration: 500ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
        enableRandomizedWait: true
        randomizedWaitFactor: 0.5   # jitter ±50%
        retryExceptions:
          - java.net.ConnectException
          - java.util.concurrent.TimeoutException
        ignoreExceptions:
          - com.example.BusinessValidationException  # don't retry business errors

Important: Only retry idempotent operations. Retrying a payment charge that actually succeeded but returned a timeout response will double-charge the customer. Always design payment APIs with idempotency keys.
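An idempotency key makes the duplicate delivery harmless. A toy in-memory sketch (a real service would persist keys in a database with a TTL; names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The first request with a given idempotency key performs the charge;
// duplicates (e.g. client retries after a timeout) get the stored result
// back instead of charging again.
public class IdempotentCharges {
    private final Map<String, String> results = new ConcurrentHashMap<>();

    public String charge(String idempotencyKey, long amountCents) {
        return results.computeIfAbsent(idempotencyKey,
            k -> "charged:" + amountCents); // real impl: call the processor once
    }

    public static void main(String[] args) {
        IdempotentCharges api = new IdempotentCharges();
        String first = api.charge("order-42-attempt-1", 1999);
        String retry = api.charge("order-42-attempt-1", 1999); // duplicate delivery
        System.out.println(first.equals(retry)); // true — no double charge
    }
}
```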

4. Distributed Data: Saga, CQRS, Outbox, Event Sourcing

Q7: Compare Saga Orchestration vs Choreography — when do you use each?

Answer:

Choreography: Each service publishes an event, and other services react to it. No central coordinator. Example: Order service publishes OrderCreated → Inventory service reacts, reserves stock, publishes StockReserved → Payment service reacts, charges, publishes PaymentCompleted → Shipping reacts. Simple to implement, low coupling. Problem: hard to understand the overall workflow, difficult to detect saga failures, and compensating transaction logic is scattered across services.

Orchestration: A dedicated Saga Orchestrator (e.g., Spring State Machine or Temporal Workflow Engine) drives the workflow by sending commands to services and receiving results. The orchestrator knows the full saga state and can trigger compensating transactions precisely. Easier to debug (workflow state visible in one place) and handles failures explicitly. Trade-off: introduces a central coordinator service that can become a bottleneck or single point of failure if not designed for HA.

Choose Choreography for simple workflows (2-3 steps). Choose Orchestration for complex multi-step business workflows (5+ steps) where visibility and compensation tracking matter.
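The orchestration side can be reduced to a toy sketch (illustrative only — a real orchestrator like Temporal persists state durably between steps): run each step in order, and on failure run the compensations of the completed steps in reverse.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class SagaOrchestrator {
    record Step(String name, boolean succeeds) {}

    // Appends do:/fail:/compensate: entries to the log and returns it.
    static List<String> run(List<Step> steps, List<String> log) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            if (step.succeeds()) {
                log.add("do:" + step.name());
                completed.push(step);           // remember for compensation
            } else {
                log.add("fail:" + step.name());
                while (!completed.isEmpty())    // undo in reverse order
                    log.add("compensate:" + completed.pop().name());
                return log;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> log = run(List.of(
                new Step("reserveStock", true),
                new Step("chargePayment", false)),
            new java.util.ArrayList<>());
        System.out.println(log);
        // [do:reserveStock, fail:chargePayment, compensate:reserveStock]
    }
}
```

The point interviewers look for: the compensation order is the reverse of execution, and the orchestrator — not the individual services — owns that knowledge.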

Q8: Explain CQRS — what problems does it solve and what does it introduce?

Answer: Command Query Responsibility Segregation separates the write model (Commands: PlaceOrder, UpdateInventory) from the read model (Queries: GetOrderHistory, GetProductCatalog). Problems solved: (1) write models are normalized for consistency but perform poorly for complex reads — separate read stores can be denormalized for fast queries; (2) write and read load can scale independently; (3) read projections can be optimized per use case (ElasticSearch for full-text, Redis for hot data, PostgreSQL for reports).

What it introduces: Eventual consistency between write and read stores (the projection update is async — milliseconds to seconds of lag), increased operational complexity (two data models to maintain), and more code. Use CQRS when read/write loads are dramatically different in scale or when multiple read projections are needed for different consumers.

// Command side — write to PostgreSQL
@CommandHandler
public void handle(PlaceOrderCommand cmd) {
    Order order = new Order(cmd.getCustomerId(), cmd.getItems());
    orderRepository.save(order);
    eventBus.publish(new OrderPlacedEvent(order.getId(), order.getItems()));
}

// Query side — read from denormalized Redis projection
@QueryHandler
public OrderSummary handle(GetOrderSummaryQuery query) {
    return redisTemplate.opsForValue().get("order-summary:" + query.getOrderId());
}

// Projection updater — consumes OrderPlacedEvent and updates Redis
@EventHandler
public void on(OrderPlacedEvent event) {
    OrderSummary summary = buildSummary(event);
    redisTemplate.opsForValue().set("order-summary:" + event.getOrderId(), summary);
}

5. Service Mesh & API Gateway

Q9: What is a service mesh (Istio/Linkerd) and how does it differ from an API gateway?

Answer: An API gateway sits at the north-south boundary — it handles external client traffic into the cluster: authentication, rate limiting, SSL termination, routing to internal services. A service mesh handles east-west traffic — communication between microservices inside the cluster. The service mesh injects a sidecar proxy (Envoy in Istio) next to each service pod. This proxy intercepts all traffic and provides: mutual TLS encryption between services, circuit breaking, retry policies, distributed tracing header injection, and traffic routing (canary, blue/green) — all without changing application code.

| Feature | API Gateway | Service Mesh |
|---|---|---|
| Traffic direction | External → Internal (north-south) | Service → Service (east-west) |
| mTLS | SSL termination only | Automatic mTLS between every pod |
| Examples | Kong, AWS API Gateway, Spring Cloud Gateway | Istio, Linkerd, Consul Connect |
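As an example of mesh-level traffic routing, a hypothetical Istio VirtualService (the service name and the v1/v2 subsets are illustrative; the subsets would be defined in a companion DestinationRule) splitting traffic for a canary release:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment
spec:
  hosts:
    - payment          # in-cluster service name
  http:
    - route:
        - destination:
            host: payment
            subset: v1
          weight: 90   # 90% stays on the stable version
        - destination:
            host: payment
            subset: v2
          weight: 10   # 10% canary — shift gradually while watching metrics
```

The application code never changes — the sidecar proxies enforce the split.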

6. Observability: Tracing, Metrics, Logging

Q10: Explain the three pillars of observability and how you implement them in a microservices system.

Answer:

  • Metrics (What is happening?): Prometheus-format counters, gauges, histograms exposed by every service via Micrometer. Alert on SLO breaches: p99 latency > 500ms, error rate > 0.1%. Grafana dashboards aggregate across services.
  • Traces (Why is it happening?): OpenTelemetry SDK auto-instruments Spring Boot, Kafka consumers, and JDBC calls. TraceId propagated via W3C Trace Context headers across all service hops. Jaeger or Grafana Tempo stores traces for root-cause analysis.
  • Logs (What happened exactly?): Structured JSON logs (Logback + logstash-logback-encoder) with traceId and spanId injected from MDC. ELK Stack or Loki aggregates logs across all pods. Correlate a trace in Jaeger with the exact log lines that produced it using the traceId as a filter.

Alert on the four golden signals: Latency, Traffic, Errors, Saturation (the related RED and USE methodologies cover rate/errors/duration and utilization/saturation/errors). Each service should have a Grafana dashboard showing these four signals.
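Trace propagation concretely means reading and forwarding the W3C `traceparent` header on every hop. A minimal parser sketch (OpenTelemetry does this for you; this only illustrates the header format — `version-traceid-spanid-flags`):

```java
// traceparent: 2-hex version, 32-hex trace-id, 16-hex parent span-id,
// 2-hex flags, dash-separated. A service continues the caller's trace by
// reusing the trace-id and emitting spans parented on the span-id.
public class TraceParent {
    final String traceId;
    final String spanId;

    TraceParent(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    static TraceParent parse(String header) {
        String[] parts = header.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16)
            throw new IllegalArgumentException("malformed traceparent: " + header);
        return new TraceParent(parts[1], parts[2]);
    }

    public static void main(String[] args) {
        TraceParent tp = parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        System.out.println(tp.traceId); // 4bf92f3577b34da6a3ce929d0e0e4736
    }
}
```

The same traceId is what you inject into MDC so log lines can be correlated with the trace in Jaeger.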

7. Security: JWT, mTLS, OAuth2 in Microservices

Q11: How do you handle authentication and authorization across microservices? Explain the difference between JWT validation at the gateway vs at each service.

Answer: Two common approaches:

Gateway-only validation: The API gateway validates the JWT, strips it, and passes X-User-Id and X-User-Roles headers to downstream services. Services trust these headers without re-validating. Simpler for services, but services are vulnerable if bypassed (internal attacker can forge headers). Suitable for internal trusted networks.

Per-service validation: Each service validates the JWT independently using the IdP's public key (fetched from JWKS endpoint). Provides defense in depth — even if the gateway is compromised, individual services reject invalid tokens. Slightly more overhead. Recommended for zero-trust architectures. Combine with mTLS (service mesh) to guarantee the caller is an authorized service, not just any internal pod.

Authorization: JWT claims carry coarse-grained roles. Fine-grained authorization (e.g., "can user X modify resource Y?") is enforced at the service level using Spring Security's @PreAuthorize with custom PermissionEvaluator, or an external policy engine like OPA (Open Policy Agent).
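To make the JWT structure concrete, a sketch that decodes the claims segment (illustration only — it deliberately does NOT verify the signature, which per-service validation must do against the IdP's public key before trusting any claim):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// A JWT is three Base64URL segments: header.payload.signature.
// The payload carries the claims (sub, roles, exp, ...) that services
// authorize on — but only after signature verification.
public class JwtPayload {
    static String decodePayload(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) throw new IllegalArgumentException("not a JWT");
        return new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a fake token with hypothetical claims for demonstration.
        String claims = Base64.getUrlEncoder().withoutPadding().encodeToString(
            "{\"sub\":\"user-1\",\"roles\":[\"ORDER_READ\"]}".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodePayload("eyJhbGciOiJSUzI1NiJ9." + claims + ".sig"));
    }
}
```

Gateway-only validation trusts this decode step done once at the edge; per-service validation repeats decode plus signature check on every hop.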

8. Production Scenario Questions

Q12: An order processing saga is stuck in a half-committed state — inventory reserved but payment not charged after a service restart. How do you recover?

Answer: This is the classic saga incomplete state problem. Prevention and recovery approach:

  1. Prevention: Persist saga state (step, status) in a durable store (database or Redis) before each step. After a restart, the orchestrator queries incomplete sagas and resumes from the last completed step.
  2. Detection: A scheduled job scans for sagas in states like INVENTORY_RESERVED that have been unchanged for more than the expected payment timeout (e.g., 5 minutes).
  3. Recovery: Replay the next step (trigger payment command again — must be idempotent with idempotency key). If payment service confirms the charge was already processed, mark saga as complete. If payment failed, execute compensating transaction: call inventory service to release the reservation, move saga to COMPENSATED state, and notify the customer.

Key Design Rule: All saga command handlers must be idempotent. Use a saga execution ID + step ID as the idempotency key. Duplicate command delivery from retries must produce the same result, not a double-charge or double-reservation.
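The detection step (2) can be sketched as a simple sweep over persisted saga records (illustrative names; in production this would be a scheduled job querying the saga store):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Flag sagas stuck in an intermediate state longer than the step timeout,
// so the orchestrator can resume or compensate them.
public class StuckSagaDetector {
    record SagaRecord(String id, String state, Instant updatedAt) {}

    static List<SagaRecord> findStuck(List<SagaRecord> sagas, Duration timeout, Instant now) {
        return sagas.stream()
            .filter(s -> s.state().equals("INVENTORY_RESERVED"))
            .filter(s -> Duration.between(s.updatedAt(), now).compareTo(timeout) > 0)
            .toList();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2026-04-05T12:00:00Z");
        List<SagaRecord> stuck = findStuck(List.of(
                new SagaRecord("saga-1", "INVENTORY_RESERVED", now.minusSeconds(600)),
                new SagaRecord("saga-2", "COMPLETED", now.minusSeconds(600))),
            Duration.ofMinutes(5), now);
        System.out.println(stuck.size()); // 1 — only saga-1 needs recovery
    }
}
```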

Q13: Kafka consumer lag is growing in production — your consumer group is falling behind. Walk through your diagnosis and scaling approach.

Answer:

  1. Check consumer lag metrics: kafka.consumer.fetch-manager-metrics.records-lag-max in Micrometer, or kafka-consumer-groups.sh --describe CLI. Identify which partition(s) have growing lag.
  2. Check consumer throughput: Is processing time per message increasing? Check if downstream DB or HTTP calls are slow (look at traces for the consumer thread).
  3. Scale horizontally: Add consumer instances up to the number of partitions. If you have 12 partitions and 3 consumers, add 9 more consumers to process in parallel. Beyond the partition count, extra consumers sit idle.
  4. Increase partitions: If you're already at partition count = consumer count, increase partitions (note: this is a one-way operation; Kafka doesn't reduce partitions).
  5. Tune fetch settings: Increase max.poll.records (default 500) and process in parallel batches within each consumer using a bounded thread pool.
  6. Check for poison pill messages: A single malformed message causing repeated processing errors can block the partition. Route to a dead-letter topic via Spring Kafka's DeadLetterPublishingRecoverer.
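The horizontal-scaling arithmetic from steps 3-4 can be made explicit as a back-of-envelope calculation (the rates are illustrative numbers, not Kafka defaults):

```java
// Consumers needed to keep up with the produce rate, capped at the
// partition count — beyond that, extra consumers sit idle.
public class ConsumerSizing {
    static int consumersNeeded(double produceRatePerSec,
                               double perConsumerRatePerSec,
                               int partitions) {
        int needed = (int) Math.ceil(produceRatePerSec / perConsumerRatePerSec);
        return Math.min(Math.max(needed, 1), partitions);
    }

    public static void main(String[] args) {
        // 6,000 msg/s incoming, ~550 msg/s per consumer, 12 partitions
        System.out.println(consumersNeeded(6000, 550, 12)); // 11
    }
}
```

If the result hits the partition cap and lag still grows, that is the signal to add partitions (step 4) or speed up per-message processing (steps 2 and 5).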

Senior Interview Formula for Microservices: For every scenario, answer: Problem → Root Cause → Immediate Fix → Long-term Prevention → Monitoring. Interviewers specifically want to hear about idempotency, compensating transactions, consumer lag, and dead-letter queues as signals of production experience.

Md Sanwar Hossain

Software Engineer · Java · Spring Boot · Microservices

Last updated: April 5, 2026