eBPF for Production Observability: Kernel-Level Tracing Without Instrumentation
Traditional observability requires you to instrument code, deploy agents, and restart services. eBPF breaks that constraint entirely. With eBPF, you can trace every system call, every network packet, every CPU scheduling event, and every memory allocation — across all processes on a host — without touching a single line of application code, without restarting, and with overhead typically measured in low single-digit percentages.
Part of the DevOps Reliability Engineering Series.
Introduction
Extended Berkeley Packet Filter (eBPF) is arguably the most significant infrastructure technology of the past decade. Originally a Linux kernel mechanism for fast packet filtering, eBPF has evolved into a general-purpose programmable kernel extension system. Engineers write small programs in a restricted C subset, compile them to eBPF bytecode, and load them into the kernel — where they execute in response to a wide range of kernel events: system calls, network packet processing, scheduler decisions, hardware performance counters, and user-space probes.
The killer property is safety: eBPF programs are verified by the kernel before loading. The verifier checks that the program terminates, does not access memory out of bounds, and does not interfere with kernel stability. This is what makes eBPF suitable for production — you can load an eBPF program on a live production host handling millions of requests per second, and the verifier's checks ensure it cannot crash the kernel.
For DevOps and SRE teams, eBPF unlocks a class of observability that was previously impossible or prohibitively expensive: system-call-level latency attribution, per-process network topology, cross-language CPU flamegraphs (Java, Go, Rust — without language-specific profilers), and real-time security event detection.
Real-World Problem: Unexplained p99 Latency Spikes
A payments platform was experiencing intermittent p99 latency spikes of 300ms on their checkout service. Application-level tracing (Jaeger) showed the time was being spent in the database query spans. But the database team confirmed queries were fast — p99 under 5ms. The discrepancy suggested the latency was not in query execution but in the network or connection pool. Standard monitoring could not explain it. An eBPF-based network tracer revealed the answer in 20 minutes: when the Linux kernel's TCP SYN backlog filled under burst traffic, new connection establishment was delayed by 250–300ms at the socket layer. No application instrumentation would have captured this — it was happening in the kernel's network stack, below the JDBC driver's visibility.
eBPF Architecture
The eBPF system has four components: eBPF programs (bytecode executing in kernel context), eBPF maps (shared memory structures for passing data between kernel and user space), hook points (the kernel events that trigger program execution), and user-space consumers (programs that read from eBPF maps to display, aggregate, or forward the data).
Hook points include: kprobes (arbitrary kernel function entry/exit), tracepoints (kernel trace events with a stable, maintained interface across kernel versions), uprobes (user-space function entry/exit), perf events (CPU hardware counters), XDP (eXpress Data Path — packet processing before the kernel network stack), TC (Traffic Control — packet processing at the qdisc ingress/egress layer), socket filters, and LSM hooks (Linux Security Module — security policy enforcement).
The toolchain, from lowest to highest abstraction: raw eBPF bytecode (maximum control) → libbpf + BTF (portable eBPF, CO-RE for kernel version independence) → BCC (BPF Compiler Collection, Python/Lua frontends for kernel tracing) → bpftrace (high-level tracing language, DTrace-like one-liners) → Cilium / Pixie / Tetragon / Parca (production-grade eBPF platforms).
Key Use Cases for Production Observability
1. CPU Flamegraphs Without Language-Specific Profilers
The profile BCC tool samples CPU call stacks across all processes at a configurable frequency and, with -f, emits the folded-stack format flamegraph.pl consumes: profile -F 99 -af 30 | flamegraph.pl > cpu.svg (99 Hz sampling, annotated kernel frames, 30-second duration). This produces a flamegraph covering native code, JVM JIT-compiled Java, Go goroutines, and kernel context in a single unified view. For Java services, combine with frame pointer restoration (-XX:+PreserveFramePointer) to get accurate JVM stack frames in the eBPF profiler output.
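To make the folded-stack format concrete, here is a small user-space analogue in Python — a sketch only, with made-up stacks; the real work is done in-kernel by the profile tool:

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse sampled call stacks into flamegraph.pl's folded format:
    one line per unique stack, frames joined by ';' (root first),
    followed by a space and the sample count."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical samples: each inner list is one sampled stack, root -> leaf.
samples = [
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "db_query"],
]
for line in fold_stacks(samples):
    print(line)
# main;handle_request;db_query 1
# main;handle_request;parse_json 2
```

Each output line becomes one column of the flamegraph; the count determines its width.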
2. Network Latency Attribution
bpftrace one-liner to measure TCP connection establishment latency: bpftrace -e 'kprobe:tcp_v4_connect { @start[tid] = nsecs; } kretprobe:tcp_v4_connect /@start[tid]/ { @latency_ns = hist(nsecs - @start[tid]); delete(@start[tid]); }'. This times each connect call (keyed by thread ID while in flight) and aggregates the results into a single latency histogram — without touching the application or the network infrastructure.
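bpftrace's hist() buckets values into power-of-two ranges. A minimal Python analogue of that bucketing, to show what the histogram output represents (illustrative only; bucket labels simplified, and values below 1 are clamped into the first bucket):

```python
from collections import Counter

def log2_hist(values):
    """Bucket values into power-of-two ranges, like bpftrace's hist().
    Returns {(low, high): count} where low <= value < high."""
    buckets = Counter()
    for v in values:
        b = max(v, 1).bit_length() - 1  # floor(log2(v)) for v >= 1
        buckets[(1 << b, 1 << (b + 1))] += 1
    return dict(buckets)

# Hypothetical connect latencies in nanoseconds: most fast, one slow outlier.
latencies = [900, 1_100, 1_500, 250_000_000]
print(log2_hist(latencies))
```

The log-scale buckets are what make kernel-side aggregation cheap: a fixed number of counters covers nanoseconds to seconds, and outliers stay visible instead of being averaged away.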
3. System Call Latency Analysis
funclatency -u 'sys_read' measures the latency distribution of every read() system call on the host, with -u reporting in microseconds (on recent kernels the symbol may be prefixed, e.g. __x64_sys_read). Cross-reference with application-level I/O latency to identify where time is being spent — in the application, in the kernel buffer cache, or in the actual device I/O path.
4. File I/O Bottleneck Detection
biolatency (block I/O latency) and ext4slower (filesystem operations above a latency threshold) surface I/O bottlenecks at the device and filesystem layer. For services experiencing slow logging or slow config file reads, these tools pinpoint whether the issue is disk pressure, filesystem fragmentation, or kernel buffer cache thrashing.
5. Zero-Instrumentation Distributed Tracing with Pixie
Pixie uses eBPF to automatically capture and trace HTTP/gRPC/database requests across all pods in a Kubernetes cluster — with no agent installation, no code changes, and no sidecar injection. It parses protocol-level payloads from kernel network buffers, reconstructs request/response spans, and provides a full distributed trace view. This is transformative for polyglot microservices environments where getting every team to instrument every service is unrealistic.
6. Security Threat Detection with Tetragon
Cilium's Tetragon project uses eBPF LSM hooks to enforce security policies and detect threats at the kernel level: privilege escalation attempts, unexpected setuid calls, container escape attempts, and network connections to unexpected external IPs. Because enforcement happens in the kernel before system calls complete, Tetragon can block attacks rather than merely logging them after the fact.
Deploying eBPF on Kubernetes
eBPF tools typically run as privileged DaemonSets: one pod per node with access to the host's network namespace, PID namespace, and kernel interfaces. The minimum required capabilities are CAP_BPF, CAP_PERFMON, and CAP_NET_ADMIN. Avoid CAP_SYS_ADMIN (overly broad) in favor of the more specific capabilities available in Linux 5.8+.
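A sketch of what the pod spec for such a DaemonSet might contain — names and image are placeholders; note that Kubernetes capability names drop the CAP_ prefix:

```yaml
# Excerpt from a hypothetical eBPF-agent DaemonSet pod template.
spec:
  hostPID: true        # see host processes
  hostNetwork: true    # see the host network namespace
  containers:
    - name: ebpf-agent                        # placeholder name
      image: example.com/ebpf-agent:latest    # placeholder image
      securityContext:
        capabilities:
          add: ["BPF", "PERFMON", "NET_ADMIN"]  # Linux 5.8+; avoids SYS_ADMIN
      volumeMounts:
        - name: sys-kernel-debug
          mountPath: /sys/kernel/debug        # tracefs access for probes
  volumes:
    - name: sys-kernel-debug
      hostPath:
        path: /sys/kernel/debug
```

On kernels older than 5.8 the fine-grained capabilities do not exist and such agents fall back to SYS_ADMIN, which is why the kernel version check below matters.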
Kernel version requirements: eBPF is stable from Linux 4.15+; BTF and CO-RE (Compile Once, Run Everywhere) require 5.2+; eBPF LSM requires 5.7+. Most managed Kubernetes services (EKS, GKE, AKS) now run kernels that support the full eBPF feature set. Verify with uname -r and check kernel config for CONFIG_BPF=y and CONFIG_BPF_SYSCALL=y.
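A small helper mapping a kernel release string to the feature thresholds above — a sketch; the thresholds come from the text, and the parsing is simplified (it ignores vendor suffixes):

```python
import re

# Minimum kernel versions per feature, per the thresholds cited above.
FEATURE_MIN = {
    "ebpf_stable": (4, 15),
    "btf_core": (5, 2),
    "bpf_lsm": (5, 7),
    "cap_bpf": (5, 8),
}

def available_features(release: str):
    """Given a `uname -r` string like '5.10.0-21-amd64', return the
    set of eBPF features this kernel version is new enough for."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        return set()
    ver = (int(m.group(1)), int(m.group(2)))
    return {feat for feat, minv in FEATURE_MIN.items() if ver >= minv}

print(available_features("5.10.0-21-amd64"))
```

Version alone is necessary but not sufficient — the kernel must also be built with CONFIG_BPF=y and CONFIG_BPF_SYSCALL=y, as noted above.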
Failure Scenarios and Limitations
eBPF verifier rejection: Complex programs with loops, recursive calls, or large stack frames can be rejected by the kernel verifier. Use bounded loops (for (int i = 0; i < MAX_ITER; i++) with a compile-time constant), avoid deep call chains, and stay within the 512-byte stack limit. The verifier error messages are terse — compile with debug info and consult the kernel source for the specific check that failed.
Map overflow: eBPF maps have fixed sizes defined at load time. If a hash map fills up, new insertions silently fail. Monitor map utilization via bpftool map show and right-size maps for your workload. For high-cardinality data (per-connection tracing on a host with thousands of connections), use LRU maps that evict the least-recently-used entry when full.
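The behavioral difference between a fixed-size hash map (insertions fail when full) and an LRU map (oldest entry evicted) can be sketched in user space — an illustrative Python analogue, not the kernel implementation:

```python
from collections import OrderedDict

class LruMap:
    """User-space analogue of BPF_MAP_TYPE_LRU_HASH: a bounded map
    that evicts the least-recently-used entry instead of failing."""
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.data = OrderedDict()

    def update(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)     # touching a key makes it "recent"
        elif len(self.data) >= self.max_entries:
            self.data.popitem(last=False)  # evict least-recently-used entry
        self.data[key] = value

    def lookup(self, key):
        return self.data.get(key)

m = LruMap(max_entries=2)
m.update("conn-1", 10)
m.update("conn-2", 20)
m.update("conn-3", 30)        # map full: conn-1, the oldest, is evicted
print(m.lookup("conn-1"))     # None: evicted
print(m.lookup("conn-3"))     # 30
```

For per-connection tracing this is usually the right trade: losing the stalest connection's state beats silently dropping new connections.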
Overhead at very high event rates: eBPF programs execute synchronously on the hot path (for kprobes) or via perf buffers (for sampling). For system calls that fire millions of times per second (like read() on a busy I/O-intensive service), attaching a kprobe with non-trivial logic can add measurable overhead. Use sampling instead of tracing every event, and prefer tracepoints (lighter weight) over kprobes where available.
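A back-of-the-envelope estimate makes the sampling argument concrete. The per-event probe cost and event rate below are hypothetical assumptions, not measurements:

```python
def trace_overhead(events_per_sec, probe_cost_ns, sample_fraction=1.0):
    """Fraction of one CPU consumed by probe logic:
    events/sec * cost-per-event * fraction of events traced."""
    return events_per_sec * probe_cost_ns * sample_fraction / 1e9

# Hypothetical: 2M read() calls/sec, 500 ns of probe logic per call.
full = trace_overhead(2_000_000, 500)              # trace every event
sampled = trace_overhead(2_000_000, 500, 1 / 100)  # sample 1 in 100
print(f"full tracing: {full:.0%} of a CPU, sampled: {sampled:.1%}")
# full tracing: 100% of a CPU, sampled: 1.0%
```

With these numbers, tracing every event burns an entire core while 1-in-100 sampling costs 1%, which is why high-frequency probes should sample or use lighter-weight tracepoints.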
Kernel version fragility: kprobes attach to specific kernel function names that can change between kernel versions. Use BTF-based CO-RE programs with libbpf for portable programs, or use tracepoints (which have stable ABI guarantees) rather than kprobes for production monitoring that must survive kernel upgrades.
Architecture Diagram
The eBPF observability pipeline: Kernel Events (syscalls, network packets, scheduler) → eBPF Programs (loaded into kernel, verified, executed at hook points) → eBPF Maps (ring buffers, perf event arrays, hash maps) → User-Space Collector (BCC/libbpf daemon) → Prometheus Exporter or OpenTelemetry Collector → Grafana / Jaeger for visualization. The entire left half of this pipeline operates inside the kernel with no data leaving kernel space until it reaches the eBPF maps boundary.
Tool Selection Guide
Quick ad-hoc investigation: bpftrace one-liners. Structured scripts with Python integration: BCC. Production continuous profiling: Parca (open-source) or Polar Signals (commercial), both using eBPF for zero-instrumentation profiling. Kubernetes-native distributed tracing: Pixie. Network observability + security: Cilium with Hubble. Security enforcement: Tetragon. Full-featured commercial platform: Datadog eBPF agent, Dynatrace OneAgent (both use eBPF under the hood in recent versions).
Trade-offs
eBPF has a steep learning curve — understanding kernel data structures, the verifier, and CO-RE requires significant investment. For most teams, using a managed eBPF platform (Pixie, Cilium, commercial offerings) provides 80% of the value without the kernel programming complexity. The operational model of privileged DaemonSets requires careful security review. However, the alternative — deploying language-specific agents, sidecars, and instrumentation libraries across every service — has its own operational complexity and gaps. eBPF's coverage of the full host, regardless of language or framework, makes it the highest-leverage observability investment for polyglot production environments.
Key Takeaways
- eBPF enables kernel-level tracing of any process on a host without code changes, restarts, or sidecar injection
- Safety is guaranteed by the kernel verifier — eBPF programs cannot crash or destabilize the kernel
- The toolchain ranges from raw eBPF to high-level platforms (Pixie, Cilium, Tetragon) for different use cases
- Deploy in production as a DaemonSet with specific capabilities (CAP_BPF, CAP_PERFMON) rather than CAP_SYS_ADMIN
- Use CO-RE (Compile Once, Run Everywhere) for kernel-version-portable programs in production
- For most teams, managed eBPF platforms provide the best ROI over raw eBPF programming
Conclusion
eBPF represents a fundamental shift in how production systems are observed and secured. The ability to trace any system behavior — from CPU scheduling to network packets to security events — without application modification is the observability holy grail that engineers have pursued for decades. As the ecosystem matures and kernel support becomes ubiquitous, eBPF will become the default substrate for production monitoring, replacing the fragmented approach of language-specific agents, network taps, and sidecar containers. Teams that invest in eBPF fluency now will have a significant capability advantage as production systems grow more complex and more heterogeneous.