Distributed Tracing with OpenTelemetry & Spring Boot: Complete Production Guide (2026)
A complete guide to implementing distributed tracing across Spring Boot microservices: OpenTelemetry Java agent vs SDK, Micrometer Tracing auto-instrumentation, custom spans, trace context propagation via W3C traceparent, Jaeger and Zipkin backends, sampling strategies, and Grafana Tempo integration.
1. Core Concepts: Traces, Spans, Propagation
- Trace: A complete record of one request as it flows through your entire system. Every trace has a globally unique traceId (a 128-bit ID, rendered as a 32-character hex string).
- Span: A single unit of work within a trace (e.g., HTTP request, DB query, Kafka publish). Each span has a spanId, start/end time, parent span ID, status, and key-value attributes.
- Parent-child relationship: When Service A calls Service B, Service B creates a child span with Service A's span as the parent. This forms the trace tree (waterfall view).
- Trace context propagation: The traceId and parent spanId are forwarded to downstream services via HTTP headers (W3C traceparent) or message headers (Kafka). Spring Boot 3 does this automatically.
- Attributes vs Events: Attributes are metadata on the span (userId, orderId, HTTP status). Events are time-stamped annotations within a span (e.g., "cache miss", "retry attempt 2").
2. Tooling: OTel Java Agent vs Micrometer Tracing
| Approach | How | Instrumentation | Best For |
|---|---|---|---|
| OTel Java Agent | JVM -javaagent flag | Auto (bytecode) | Legacy apps, zero code change |
| Micrometer Tracing (Spring Boot 3) | Spring Boot starter | Auto + @Observed API | Spring Boot microservices (recommended) |
| OTel SDK direct | SDK dependency | Manual Span API | Full control, non-Spring apps |
Recommendation: Use Micrometer Tracing for Spring Boot 3+ apps — it bridges to the OTel SDK under the hood, auto-instruments RestTemplate, WebClient, Feign, Spring Data, Kafka, and Redis, and integrates with Spring's observation API.
3. Spring Boot 3 Setup: Zero-Code Auto-Instrumentation
<!-- Micrometer Tracing core -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<!-- OTel OTLP exporter (sends to Jaeger/Tempo/any OTLP-compatible backend) -->
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
management:
  tracing:
    sampling:
      probability: 1.0   # 100% for dev; 0.1 (10%) for prod
    propagation:
      type: w3c          # W3C traceparent (recommended)
  otlp:
    tracing:
      endpoint: http://jaeger:4318/v1/traces
  zipkin:
    tracing:
      endpoint: http://zipkin:9411/api/v2/spans  # if using Zipkin instead
spring:
  application:
    name: order-service  # appears as the service name in the trace backend
With this config, Spring Boot 3 automatically instruments: all HTTP incoming requests, RestTemplate/WebClient/Feign outbound calls, Spring Data (JPA/MongoDB/Redis), Spring Kafka, and @Scheduled tasks. No code changes needed.
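To sanity-check context propagation, you can send a request carrying a pre-made W3C traceparent header; with auto-instrumentation enabled, the service should continue that trace rather than start a new one. The port and endpoint below are illustrative, not from the original:

```shell
# Send a request with an explicit traceparent header; the server span that appears
# in your trace backend should carry this traceId (4bf9...4736)
curl -H 'traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' \
     http://localhost:8080/api/orders
```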
4. Custom Spans: @Observed & Tracer API
// You see: "POST /api/orders" taking 3s — but WHY? Which sub-operation is slow?
// Option 1: @Observed annotation (declarative, AOP-based)
@Observed(name = "order.payment", contextualName = "processPayment",
          lowCardinalityKeyValues = {"payment.provider", "stripe"})
public PaymentResult processPayment(Order order) {
    return stripeService.charge(order);
}
// Option 2: Tracer API for fine-grained control
@Service
public class InventoryService {

    @Autowired private Tracer tracer;

    public void deductInventory(String productId, int qty) {
        Span span = tracer.nextSpan()
                .name("inventory.deduct")
                .tag("product.id", productId)
                .tag("quantity", String.valueOf(qty))
                .start();                                    // start exactly once
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            // Business logic
            Product p = productRepository.findById(productId).orElseThrow();
            if (p.getStock() < qty) {
                span.tag("error", "insufficient_stock");
                span.event("insufficient_stock_detected");
                throw new InsufficientStockException(productId);
            }
            p.setStock(p.getStock() - qty);
            productRepository.save(p);
            span.tag("new.stock", String.valueOf(p.getStock()));
        } catch (Exception ex) {
            span.error(ex);
            throw ex;
        } finally {
            span.end(); // always end the span
        }
    }
}
5. Trace Context Propagation
Spring Boot 3 with Micrometer Tracing propagates the W3C traceparent header automatically for:
- RestTemplate / WebClient / Feign: Auto-injects traceparent on all outgoing HTTP calls
- Spring Kafka: Injects/extracts trace context in Kafka message headers
- Incoming requests: Extracts traceparent from incoming HTTP headers to continue the trace
W3C traceparent format: {version}-{traceId}-{parentSpanId}-{flags}. Example: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 (version 00; the trailing 01 flag means "sampled").
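To make the header layout concrete, here is a tiny standalone parser. It is a hypothetical helper for illustration, not part of any tracing library:

```java
// Splits a W3C traceparent header into its four dash-separated fields:
// version, traceId (32 hex chars), parentSpanId (16 hex chars), flags.
public class TraceparentParser {

    public static String[] parse(String traceparent) {
        String[] parts = traceparent.split("-");
        if (parts.length != 4) {
            throw new IllegalArgumentException("malformed traceparent: " + traceparent);
        }
        return parts; // [version, traceId, parentSpanId, flags]
    }

    public static void main(String[] args) {
        String[] f = parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        System.out.println("traceId=" + f[1] + " parentSpanId=" + f[2] + " flags=" + f[3]);
    }
}
```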
// In order-service: set business baggage that downstream services can read
// (assumes an injected io.micrometer.tracing.Tracer)
@PostMapping("/orders")
public ResponseEntity<Order> createOrder(@RequestBody OrderRequest request) {
    // Baggage travels in the W3C "baggage" header (separate from traceparent), and
    // only fields listed in management.tracing.baggage.remote-fields are forwarded
    try (BaggageInScope tenant = tracer.createBaggageInScope("tenant.id", request.getTenantId());
         BaggageInScope user = tracer.createBaggageInScope("user.id", request.getUserId())) {
        return ResponseEntity.ok(orderService.create(request));
    }
}
// In inventory-service: read baggage without any code coupling
@Service
public class InventoryService {

    @Autowired private Tracer tracer;

    public void deductStock(String productId, int qty) {
        String tenantId = tracer.getBaggage("tenant.id").get();
        // tenantId is automatically available here — propagated via the baggage header!
        log.info("Deducting stock for tenant={} product={}", tenantId, productId);
    }
}
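Baggage fields are not forwarded by default: with Spring Boot 3 and Micrometer Tracing, fields that should cross service boundaries (and optionally appear in the logging MDC) must be declared explicitly. A sketch, assuming the standard Actuator tracing properties:

```yaml
management:
  tracing:
    baggage:
      enabled: true
      remote-fields: tenant.id,user.id   # propagated over the wire in the baggage header
      correlation:
        fields: tenant.id                # also copied into the logging MDC
```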
6. Tracing Through Kafka Messages
// Producer: trace headers injected AUTOMATICALLY by Spring Kafka + Micrometer Tracing
@Service
public class OrderEventPublisher {

    @Autowired private KafkaTemplate<String, OrderCreatedEvent> kafkaTemplate;

    public void publish(OrderCreatedEvent event) {
        // traceparent header is auto-added to Kafka message headers — no manual code!
        kafkaTemplate.send("order-created", event.getOrderId(), event);
    }
}
// Consumer: trace automatically continued from message headers
@KafkaListener(topics = "order-created", groupId = "inventory-group")
@Observed(name = "kafka.order.inventory.process")
public void handleOrderCreated(OrderCreatedEvent event) {
    // This span is automatically a child of the producer's span — full trace!
    inventoryService.deductInventory(event.getProductId(), event.getQuantity());
}
7. Backends: Jaeger, Zipkin & Grafana Tempo
| Backend | Protocol | Storage | Best For |
|---|---|---|---|
| Jaeger | OTLP / Thrift UDP | Cassandra, Elasticsearch, Badger | Self-hosted, mature UI, Kubernetes-native |
| Zipkin | HTTP JSON / OTLP | In-memory, MySQL, Elasticsearch | Lightweight, simple setup, dev environments |
| Grafana Tempo | OTLP | Object storage (S3, GCS) | Production scale, correlate with Loki logs & Prometheus |
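For local experiments, Jaeger's all-in-one image accepts OTLP directly on the ports used in the config above. The image tag and environment variable are illustrative; check the Jaeger docs for your version:

```shell
# 16686 = Jaeger UI, 4318 = OTLP/HTTP ingest
docker run --rm \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 -p 4318:4318 \
  jaegertracing/all-in-one:latest
```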
8. Sampling Strategies
| Strategy | Decision Point | Pros | Cons |
|---|---|---|---|
| Head-based (probabilistic) | At trace start | Low overhead | Discards interesting error traces at low rates |
| Tail-based (in OTel Collector) | After trace complete | ✅ Sample ALL errors, slow traces | Higher memory in collector |
| Always-on for errors | Status code check | Never miss error traces | Requires custom sampler |
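Tail-based sampling lives in the OTel Collector (contrib distribution). A sketch of a tail_sampling processor that keeps all errors, all slow traces, and a 10% baseline; policy names and thresholds are illustrative:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s                 # buffer spans this long before deciding
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow
        type: latency
        latency: {threshold_ms: 1000}
      - name: baseline
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
```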
9. Correlating Logs with Traces: MDC Integration
logging:
  pattern:
    console: "%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} [%X{traceId},%X{spanId}] - %msg%n"
  level:
    io.micrometer.tracing: DEBUG   # see trace propagation in logs during debugging
# Result: every log line contains the traceId
# 09:15:42.001 [nio-8080-exec-1] INFO OrderService [4bf92f3577b34da6a...] - Processing order 123
# Click traceId in Grafana/Jaeger to see the full request waterfall!
10. Production Observability Stack
The modern production observability stack for Spring Boot microservices in 2026:
- Metrics: Micrometer + Prometheus + Grafana (JVM, business metrics)
- Logs: Logback/Log4j2 → Loki (Grafana) or Elasticsearch (ELK)
- Traces: Micrometer Tracing → OTel Collector → Grafana Tempo (all services)
- Correlation: traceId in all three systems — click a log line to see the trace, click a slow trace to see related logs
- Alerting: Prometheus AlertManager for metric-based alerts; Grafana for cross-signal alerts
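The "Micrometer Tracing → OTel Collector → Grafana Tempo" hop above can be sketched as a minimal Collector pipeline; the Tempo endpoint and TLS settings are illustrative for a local setup:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
exporters:
  otlp:
    endpoint: tempo:4317   # hypothetical Tempo gRPC ingest address
    tls:
      insecure: true       # local/demo only; use TLS in production
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```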
11. Interview Questions & Observability Checklist
Q: A user reports that one specific request took 5 seconds. How do you find the root cause across your microservices?
A: Open the trace for that specific request in Jaeger/Tempo. The waterfall view shows which span takes 5 seconds — whether it's a database query, an external API call, or a specific microservice. Drill into that span's attributes (SQL query, endpoint URL). Cross-reference with logs for that traceId to get application-level context. Without distributed tracing, this investigation takes hours; with it, minutes.
- Micrometer Tracing + OTel bridge in all services
- Service name set per service (spring.application.name)
- W3C traceparent propagation enabled
- Custom spans for critical business operations
- Business attributes on spans (orderId, userId)
- traceId in log pattern (MDC)
- Sampling 10% in prod; 100% for errors
- Trace IDs in error API responses
- Grafana Tempo linked to Loki logs
- OTel Collector as sidecar (buffer + retry)
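The checklist item "Trace IDs in error API responses" can be sketched with a controller advice that attaches the current traceId to every error body, so support tickets can be matched to traces. Class name and response shape are illustrative:

```java
import java.util.Map;
import io.micrometer.tracing.Tracer;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

@RestControllerAdvice
public class TraceIdErrorAdvice {

    private final Tracer tracer;

    public TraceIdErrorAdvice(Tracer tracer) {
        this.tracer = tracer;
    }

    @ExceptionHandler(Exception.class)
    ResponseEntity<Map<String, String>> handle(Exception ex) {
        // currentSpan() can be null (e.g., unsampled request), so guard it
        String traceId = tracer.currentSpan() != null
                ? tracer.currentSpan().context().traceId()
                : "unavailable";
        return ResponseEntity.internalServerError()
                .body(Map.of("error", String.valueOf(ex.getMessage()),
                             "traceId", traceId));
    }
}
```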