Distributed Tracing with OpenTelemetry & Spring Boot: Complete Production Guide (2026)

A complete guide to implementing distributed tracing across Spring Boot microservices: OpenTelemetry Java agent vs SDK, Micrometer Tracing auto-instrumentation, custom spans, trace context propagation via W3C traceparent, Jaeger and Zipkin backends, sampling strategies, and Grafana Tempo integration.

TL;DR: Spring Boot 3 + Micrometer Tracing + OTel traces incoming and outgoing HTTP calls without code changes, and can cover Kafka, JDBC, and Redis with little or no code (some integrations need an extra property or dependency). Add custom spans for business operations; use W3C traceparent for end-to-end trace propagation; send to Jaeger/Zipkin/Tempo via OTLP.

1. Core Concepts: Traces, Spans, Propagation

  • Trace: A complete record of one request as it flows through your entire system. Every trace has a globally unique traceId (128 bits, written as 32 lowercase hex characters).
  • Span: A single unit of work within a trace (e.g., HTTP request, DB query, Kafka publish). Each span has a spanId, start/end time, parent span ID, status, and key-value attributes.
  • Parent-child relationship: When Service A calls Service B, Service B creates a child span with Service A's span as the parent. This forms the trace tree (waterfall view).
  • Trace context propagation: The traceId and parent spanId are forwarded to all downstream services via HTTP headers (W3C traceparent) or message headers (Kafka). Automatic in Spring Boot 3.
  • Attributes vs Events: Attributes are metadata on the span (userId, orderId, HTTP status). Events are time-stamped annotations within a span (e.g., "cache miss", "retry attempt 2").
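
The parent-child relationship described above can be sketched with plain Java records. This is a toy model for illustration only, not the OTel SDK's data types; SpanRecord and TraceTree are hypothetical names, and the sketch assumes one root span per trace.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy span model: parentSpanId is null for the root span.
record SpanRecord(String traceId, String spanId, String parentSpanId,
                  String name, long startMs, long endMs) {
    long durationMs() { return endMs - startMs; }
}

final class TraceTree {
    // Renders the spans of one trace as an indented waterfall,
    // with children ordered by start time under their parent.
    static List<String> render(List<SpanRecord> spans) {
        Map<String, List<SpanRecord>> childrenByParent = new HashMap<>();
        SpanRecord root = null;
        for (SpanRecord s : spans) {
            if (s.parentSpanId() == null) {
                root = s;
            } else {
                childrenByParent.computeIfAbsent(s.parentSpanId(), k -> new ArrayList<>()).add(s);
            }
        }
        List<String> lines = new ArrayList<>();
        if (root != null) {
            walk(root, 0, childrenByParent, lines);
        }
        return lines;
    }

    private static void walk(SpanRecord span, int depth,
                             Map<String, List<SpanRecord>> childrenByParent, List<String> out) {
        out.add("  ".repeat(depth) + span.name() + " (" + span.durationMs() + " ms)");
        List<SpanRecord> children =
            new ArrayList<>(childrenByParent.getOrDefault(span.spanId(), List.of()));
        children.sort(Comparator.comparingLong(SpanRecord::startMs));
        for (SpanRecord child : children) {
            walk(child, depth + 1, childrenByParent, out);
        }
    }
}
```

Calling TraceTree.render with a root span and its descendants returns one line per span, indented by depth, which is roughly what a trace backend's waterfall view shows.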

2. Tooling: OTel Java Agent vs Micrometer Tracing

| Approach | How | Instrumentation | Best For |
|---|---|---|---|
| OTel Java Agent | JVM -javaagent flag | Auto (bytecode) | Legacy apps, zero code change |
| Micrometer Tracing (Spring Boot 3) | Spring Boot starter | Auto + @Observed API | Spring Boot microservices (recommended) |
| OTel SDK direct | SDK dependency | Manual Span API | Full control, non-Spring apps |

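Recommendation: Use Micrometer Tracing for Spring Boot 3+ apps. It bridges to the OTel SDK under the hood, integrates with Spring's Observation API, and instruments RestTemplate and WebClient instances built from the auto-configured builders. Instrumentation for Feign, Spring Data, Kafka, and Redis is available too, but some of it needs an extra property or dependency.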

3. Spring Boot 3 Setup: Zero-Code Auto-Instrumentation

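<!-- pom.xml: Micrometer Tracing with OTel bridge + OTLP export (versions managed by the Spring Boot BOM) -->
<!-- Actuator carries the tracing auto-configuration -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Micrometer Tracing bridge to the OTel SDK -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<!-- OTel OTLP exporter (sends to Jaeger/Tempo/any OTLP-compatible backend) -->
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>

# application.yml: tracing config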
management:
  tracing:
    sampling:
      probability: 1.0   # 100% for dev; 0.1 for prod (10%)
    propagation:
      type: w3c           # W3C traceparent (recommended)
  otlp:
    tracing:
      endpoint: http://jaeger:4318/v1/traces
  zipkin:
    tracing:
      endpoint: http://zipkin:9411/api/v2/spans  # if using Zipkin

spring:
  application:
    name: order-service   # appears as service name in trace backend

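With this config (and spring-boot-starter-actuator on the classpath), Spring Boot 3 traces incoming HTTP requests and outbound RestTemplate/WebClient calls without code changes, provided the clients are built from the auto-configured RestTemplateBuilder or WebClient.Builder. Spring Kafka observation is off by default; enable it with spring.kafka.template.observation-enabled=true and spring.kafka.listener.observation-enabled=true. Coverage for Feign, Spring Data (JPA/MongoDB/Redis), JDBC, and @Scheduled tasks depends on the library versions and extra dependencies in use, so confirm each integration in your trace backend rather than assuming it.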

4. Custom Spans: @Observed & Tracer API

// ❌ BAD: No business context in traces — only technical spans
// You see: "POST /api/orders" taking 3s — but WHY? Which sub-operation is slow?
// ✅ GOOD: Custom spans with business attributes + @Observed annotation
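// Option 1: @Observed annotation (declarative, AOP-based)
// Needs spring-boot-starter-aop and an ObservedAspect bean (declared manually or auto-configured,
// depending on the Boot version); without the aspect the annotation does nothing.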
@Observed(name = "order.payment", contextualName = "processPayment",
          lowCardinalityKeyValues = {"payment.provider", "stripe"})
public PaymentResult processPayment(Order order) {
    return stripeService.charge(order);
}

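// Option 2: Tracer API for fine-grained control
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.stereotype.Service;

@Service
public class InventoryService {
    private final Tracer tracer;
    // ProductRepository is an assumed name for the application's repository type
    private final ProductRepository productRepository;

    public InventoryService(Tracer tracer, ProductRepository productRepository) {
        this.tracer = tracer;
        this.productRepository = productRepository;
    }

    public void deductInventory(String productId, int qty) {
        // Create and start the span exactly once; nextSpan() parents it to the current span
        Span span = tracer.nextSpan()
            .name("inventory.deduct")
            .tag("product.id", productId)
            .tag("quantity", String.valueOf(qty))
            .start();
        // Put the already-started span in scope so nested work attaches to it
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            Product p = productRepository.findById(productId).orElseThrow();
            if (p.getStock() < qty) {
                span.tag("error", "insufficient_stock");
                span.event("insufficient_stock_detected");
                throw new InsufficientStockException(productId);
            }
            p.setStock(p.getStock() - qty);
            productRepository.save(p);
            span.tag("new.stock", String.valueOf(p.getStock()));
        } catch (Exception ex) {
            span.error(ex);  // record the exception on the span
            throw ex;
        } finally {
            span.end();  // always end the span
        }
    }
}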

5. Trace Context Propagation

Spring Boot 3 with Micrometer Tracing propagates the W3C traceparent header automatically for:

  • RestTemplate / WebClient / Feign: Auto-injects traceparent on all outgoing HTTP calls
  • Spring Kafka: Injects/extracts trace context in Kafka message headers
  • Incoming requests: Extracts traceparent from incoming HTTP headers to continue the trace

W3C traceparent format: 00-{traceId}-{parentSpanId}-{flags}. Example: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
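
To make the header layout concrete, here is a minimal parsing sketch using only the JDK. TraceParent is a hypothetical helper, not part of Micrometer or OTel (both handle this for you), and it accepts only the four-field layout shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parsed form of a W3C traceparent header (four-field layout).
record TraceParent(String version, String traceId, String parentSpanId, boolean sampled) {

    private static final Pattern FORMAT =
        Pattern.compile("^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    // Returns null for malformed headers, the forbidden version "ff",
    // and all-zero trace or span IDs.
    static TraceParent parse(String header) {
        if (header == null) return null;
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) return null;
        String version = m.group(1);
        String traceId = m.group(2);
        String spanId = m.group(3);
        if (version.equals("ff")) return null;
        if (traceId.equals("0".repeat(32)) || spanId.equals("0".repeat(16))) return null;
        int flags = Integer.parseInt(m.group(4), 16);
        // Bit 0 of the flags byte is the "sampled" flag.
        return new TraceParent(version, traceId, spanId, (flags & 0x01) != 0);
    }

    // Header value to send downstream: same trace ID, with the calling
    // service's own span ID in the parent position.
    String forChild(String currentSpanId) {
        return version + "-" + traceId + "-" + currentSpanId + "-" + (sampled ? "01" : "00");
    }
}
```

Parsing the example header above yields the trace ID 4bf92f3577b34da6a3ce929d0e0e4736, the parent span ID 00f067aa0ba902b7, and sampled = true. The sketch preserves only the sampled bit when building the outgoing header.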

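// ✅ GOOD: Baggage for cross-service business context propagation
// Baggage travels in the W3C "baggage" header, separate from traceparent.
// Fields must be allow-listed, e.g. management.tracing.baggage.remote-fields: tenant.id,user.id
// In order-service: set business baggage that downstream services can read
@PostMapping("/orders")
public ResponseEntity<Order> createOrder(@RequestBody OrderRequest request) {
    // tracer is an injected io.micrometer.tracing.Tracer;
    // createBaggageInScope is available in Micrometer Tracing 1.1+
    try (BaggageInScope tenant = tracer.createBaggageInScope("tenant.id", request.getTenantId());
         BaggageInScope user = tracer.createBaggageInScope("user.id", request.getUserId())) {
        // Outbound calls made inside this block carry both baggage entries
        return ResponseEntity.ok(orderService.create(request));
    }
}

// In inventory-service: read baggage without any code coupling
@Service
public class InventoryService {
    private final Tracer tracer;

    public InventoryService(Tracer tracer) {
        this.tracer = tracer;
    }

    public void deductStock(String productId, int qty) {
        Baggage tenant = tracer.getBaggage("tenant.id");
        String tenantId = tenant != null ? tenant.get() : null;
        // tenantId arrives via the propagated baggage header, not via method parameters
        log.info("Deducting stock for tenant={} product={}", tenantId, productId);
    }
}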
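
Baggage rides in its own W3C baggage header as comma-separated key=value pairs, next to traceparent rather than inside it. The sketch below shows that wire format with JDK classes only. BaggageHeader is a hypothetical helper; in a real service the tracing library writes and reads this header. Keys are assumed to be plain tokens that need no encoding.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class BaggageHeader {
    // Builds a header value such as "tenant.id=acme,user.id=u%2042".
    static String format(Map<String, String> entries) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> e : entries.entrySet()) {
            // URLEncoder emits "+" for spaces; the header expects percent-encoding.
            String value = URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8).replace("+", "%20");
            joiner.add(e.getKey() + "=" + value);
        }
        return joiner.toString();
    }

    // Parses a header value back into key/value pairs, ignoring optional
    // ";property" suffixes and entries without a key.
    static Map<String, String> parse(String header) {
        Map<String, String> result = new LinkedHashMap<>();
        if (header == null || header.isBlank()) return result;
        for (String member : header.split(",")) {
            String keyValue = member.split(";", 2)[0];
            int eq = keyValue.indexOf('=');
            if (eq <= 0) continue;
            String key = keyValue.substring(0, eq).trim();
            // Protect literal "+" so URLDecoder does not turn it into a space.
            String raw = keyValue.substring(eq + 1).trim().replace("+", "%2B");
            result.put(key, URLDecoder.decode(raw, StandardCharsets.UTF_8));
        }
        return result;
    }
}
```

Because every downstream request carries these pairs, baggage is best kept to a few small, non-sensitive values such as a tenant ID.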

6. Tracing Through Kafka Messages

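// ✅ GOOD: Kafka trace propagation with Spring Kafka + Micrometer instrumentation
// Requires spring.kafka.template.observation-enabled=true and spring.kafka.listener.observation-enabled=true
// Producer: with observation enabled, Spring Kafka + Micrometer Tracing inject the trace headers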
@Service
public class OrderEventPublisher {
    @Autowired private KafkaTemplate<String, OrderCreatedEvent> kafkaTemplate;

    public void publish(OrderCreatedEvent event) {
        // traceparent header is auto-added to Kafka message headers — no manual code!
        kafkaTemplate.send("order-created", event.getOrderId(), event);
    }
}

// Consumer: trace automatically continued from message headers
@KafkaListener(topics = "order-created", groupId = "inventory-group")
@Observed(name = "kafka.order.inventory.process")
public void handleOrderCreated(OrderCreatedEvent event) {
    // This span is automatically a child of the producer's span — full trace!
    inventoryService.deductInventory(event.getProductId(), event.getQuantity());
}

7. Backends: Jaeger, Zipkin & Grafana Tempo

| Backend | Protocol | Storage | Best For |
|---|---|---|---|
| Jaeger | OTLP / Thrift UDP | Cassandra, Elasticsearch, Badger | Self-hosted, mature UI, Kubernetes-native |
| Zipkin | HTTP JSON / OTLP | In-memory, MySQL, Elasticsearch | Lightweight, simple setup, dev environments |
| Grafana Tempo | OTLP | Object storage (S3, GCS) | Production scale, correlate with Loki logs & Prometheus |

8. Sampling Strategies

| Strategy | Decision Point | Pros | Cons |
|---|---|---|---|
| Head-based (probabilistic) | At trace start | Low overhead | Discards interesting error traces at low rates |
| Tail-based (in OTel Collector) | After trace complete | Samples all errors and slow traces | Higher memory in collector |
| Always-on for errors | Status code check | Never miss error traces | Requires custom sampler |
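
The difference between the head-based and tail-based decisions can be sketched in a few lines of Java. This is an illustration of the idea only, not the OTel SDK sampler or the Collector's tail-sampling implementation; Sampling and its parameters are hypothetical names.

```java
final class Sampling {
    // Head-based: decide at trace start from the trace ID alone, so every
    // service that sees the same trace ID reaches the same decision.
    static boolean headSample(String traceIdHex, double probability) {
        if (probability >= 1.0) return true;
        if (probability <= 0.0) return false;
        // Treat the low 64 bits of the 128-bit trace ID as a random number
        // and shift to a non-negative 63-bit value.
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        long value = low >>> 1;
        long threshold = (long) (probability * Long.MAX_VALUE);
        return value < threshold;
    }

    // Tail-based: decide after the trace has finished, when the outcome is known.
    // Errors and slow traces are always kept; the rest fall back to a baseline rate.
    static boolean tailSample(boolean hasError, long durationMs, long slowThresholdMs,
                              String traceIdHex, double baselineProbability) {
        if (hasError || durationMs >= slowThresholdMs) return true;
        return headSample(traceIdHex, baselineProbability);
    }
}
```

The head-based method cannot see hasError or durationMs because they do not exist yet when the decision is made, which is why low head-sampling rates lose error traces. The tail-based method needs the whole trace buffered somewhere first, which is where the collector's extra memory goes.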

9. Correlating Logs with Traces: MDC Integration

# application.yml — auto-inject traceId/spanId into every log line
logging:
  pattern:
    console: "%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} [%X{traceId},%X{spanId}] - %msg%n"
  level:
    io.micrometer.tracing: DEBUG   # see trace propagation in logs during debugging

# Result: every log line contains the traceId
# 09:15:42.001 [nio-8080-exec-1] INFO  OrderService [4bf92f3577b34da6a...] - Processing order 123
# Click traceId in Grafana/Jaeger to see the full request waterfall!

10. Production Observability Stack

The modern production observability stack for Spring Boot microservices in 2026:

  • Metrics: Micrometer + Prometheus + Grafana (JVM, business metrics)
  • Logs: Logback/Log4j2 → Loki (Grafana) or Elasticsearch (ELK)
  • Traces: Micrometer Tracing → OTel Collector → Grafana Tempo (all services)
  • Correlation: traceId in all three systems — click a log line to see the trace, click a slow trace to see related logs
  • Alerting: Prometheus AlertManager for metric-based alerts; Grafana for cross-signal alerts

11. Interview Questions & Observability Checklist

Q: A request takes 5 seconds but your health check says all services are healthy. How do you debug it?

A: Open the trace for that specific request in Jaeger/Tempo. The waterfall view shows which span takes 5 seconds — whether it's a database query, an external API call, or a specific microservice. Drill into that span's attributes (SQL query, endpoint URL). Cross-reference with logs for that traceId to get application-level context. Without distributed tracing, this investigation takes hours; with it, minutes.
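
The "find the span that takes 5 seconds" step can be sketched as a self-time calculation over exported span data. TimedSpan and SlowSpanFinder are hypothetical names for illustration; the sketch assumes a span's direct children run sequentially, so overlapping children would need interval merging instead of a simple sum.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal span shape: parentSpanId is null for the root span.
record TimedSpan(String spanId, String parentSpanId, String name, long startMs, long endMs) {
    long durationMs() { return endMs - startMs; }
}

final class SlowSpanFinder {
    // Returns the span with the largest self time, meaning its own duration
    // minus the summed durations of its direct children.
    static TimedSpan slowestBySelfTime(List<TimedSpan> spans) {
        Map<String, Long> childTimeByParent = new HashMap<>();
        for (TimedSpan s : spans) {
            if (s.parentSpanId() != null) {
                childTimeByParent.merge(s.parentSpanId(), s.durationMs(), Long::sum);
            }
        }
        TimedSpan slowest = null;
        long worstSelfTime = Long.MIN_VALUE;
        for (TimedSpan s : spans) {
            long selfTime = s.durationMs() - childTimeByParent.getOrDefault(s.spanId(), 0L);
            if (selfTime > worstSelfTime) {
                worstSelfTime = selfTime;
                slowest = s;
            }
        }
        return slowest;
    }
}
```

Self time matters because the root span always has the longest total duration; subtracting the children's time points at the span where the time is actually spent.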

✅ Distributed Tracing Production Checklist
  • Micrometer Tracing + OTel bridge in all services
  • Service name set per service (spring.application.name)
  • W3C traceparent propagation enabled
  • Custom spans for critical business operations
  • Business attributes on spans (orderId, userId)
  • traceId in log pattern (MDC)
  • Sampling 10% in prod; 100% for errors (keeping every error trace needs tail-based sampling in the OTel Collector)
  • Trace IDs in error API responses
  • Grafana Tempo linked to Loki logs
  • OTel Collector as sidecar (buffer + retry)
Last updated: April 11, 2026