Md Sanwar Hossain - Software Engineer

Software Engineer · Java · Spring Boot · Microservices

ZGC and Epsilon GC in Java: Ultra-Low Latency Garbage Collection for Production Workloads

At 3:47 AM, on-call engineers at a high-frequency trading platform watched helplessly as order latency spiked to 500ms — an eternity in electronic markets. The culprit was not a network failure or a bad deployment; it was G1GC's Full GC fallback on a 48GB heap. By dawn the platform had SLA breaches and the engineering team had a mandate to eliminate GC pauses entirely. ZGC was the answer, but getting it right in production required understanding far more than a single JVM flag.

Table of Contents

  1. The Generational Hypothesis and Why It Breaks Down
  2. ZGC Architecture: Concurrent Phases, Colored Pointers & Load Barriers
  3. Generational ZGC (Java 21+): Throughput Without Sacrificing Latency
  4. ZGC Tuning: Key JVM Flags for Production
  5. ZGC Failure Modes: Allocation Stalls, Fragmentation & Promotion Failure
  6. Epsilon GC: The No-Op Collector and Its Legitimate Use Cases
  7. Epsilon GC Configuration and Heap Exhaustion Behaviour
  8. Comparison: G1 vs ZGC vs Shenandoah vs Epsilon
  9. Production Deployment Checklist for ZGC
  10. Real-World Benchmarks: G1 → ZGC Latency Improvements
  11. Key Takeaways

The Generational Hypothesis and Why It Breaks Down

The generational garbage collection hypothesis — that most objects die young — is one of the most validated empirical observations in runtime engineering. It underpins G1GC, ParallelGC, and every preceding collector: invest most collection effort in the young generation where churn is highest, and only rarely collect the old generation. For the vast majority of Java applications running 2–16GB heaps this model works beautifully.

The hypothesis breaks down in two common production patterns. First, large-heap services with significant long-lived state: in-memory caches (Hazelcast, Apache Ignite), financial position-keeping services, real-time analytics engines, and session stores often keep hundreds of gigabytes of live data indefinitely. When the old generation grows to 32GB or more, even a single G1GC mixed GC cycle pausing for 200ms violates SLAs. And when concurrent marking cannot keep pace with allocation — triggering G1's Full GC fallback — the result is single-threaded stop-the-world pauses that can exceed 5 seconds on large heaps.

Second, services with irregular allocation bursts: batch-style processing intermixed with request handling, event-sourcing services that occasionally replay large event streams, or services that receive bursty payloads all create allocation patterns that cause G1GC's prediction model to mis-estimate and trigger emergency collections at the worst possible times.

ZGC was designed from first principles to solve both scenarios: by default it makes no generational assumption, performs virtually all collection work concurrently with the application, and keeps pause times independent of heap size, typically under 1ms even on terabyte-scale heaps.

ZGC Architecture: Concurrent Phases, Colored Pointers & Load Barriers

ZGC's design is fundamentally different from generational collectors. It operates in three concurrent phases, each overlapping with running application threads:

  1. Concurrent Mark: ZGC traverses the object graph from GC roots (thread stacks, static fields, JNI references) and marks reachable objects. This runs entirely concurrently — application threads continue executing. ZGC uses a mark start safepoint (sub-millisecond) to snapshot GC roots, then marks the rest without stopping threads.
  2. Concurrent Relocate: ZGC selects a set of regions with the most garbage (relocation sets) and moves live objects to new regions. Unlike G1GC's evacuation, ZGC's relocation is fully concurrent — application threads can access objects as they are being moved.
  3. Concurrent Remap: After relocation, all pointers to moved objects must be updated. ZGC folds this into the next marking phase, amortising the cost.

The magic that makes concurrent relocation possible is colored pointers. ZGC uses 64-bit pointers and dedicates a few bits to encode GC metadata — specifically, whether the pointer has been marked, remapped, or refers to an object in a relocation set. These bits are called the finalizable, marked0, marked1, and remapped bits.

Load barriers are the runtime enforcement mechanism. Every time application code reads an object reference from the heap, the JIT inserts a tiny barrier that checks the colored pointer bits. If the bits indicate the object has been relocated, the barrier looks up the forwarding table, fixes the pointer, and returns the correct address — all transparently, in nanoseconds. This is the fundamental trade-off of ZGC: a small load barrier overhead (typically 3–8% throughput reduction) in exchange for near-zero pause times.
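To make the mechanism concrete, here is a deliberately simplified Java model of a colored-pointer load barrier. The bit position, address mask, and forwarding table below are illustrative only; the real barrier is a few machine instructions emitted by the JIT, and the actual bit layout varies across JDK releases.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch only: metadata bits in the upper part of a 64-bit
// "pointer", checked on every load, with a forwarding lookup when stale.
public class ColoredPointerSketch {
    // Hypothetical bit position and mask, chosen for illustration.
    static final long REMAPPED_BIT = 1L << 44;
    static final long ADDRESS_MASK = (1L << 44) - 1;

    // Stand-in for ZGC's off-heap forwarding tables (old address -> new address).
    static final Map<Long, Long> forwardingTable = new HashMap<>();

    // The "load barrier": good color takes the fast path; bad color heals the pointer.
    static long loadBarrier(long pointer) {
        if ((pointer & REMAPPED_BIT) != 0) {
            return pointer; // fast path: pointer already carries the good color
        }
        long oldAddress = pointer & ADDRESS_MASK;
        long newAddress = forwardingTable.getOrDefault(oldAddress, oldAddress);
        return newAddress | REMAPPED_BIT; // heal: new address, good color
    }

    public static void main(String[] args) {
        forwardingTable.put(0x1000L, 0x2000L); // object moved during relocation
        long stale = 0x1000L;                  // REMAPPED bit not set -> stale
        long healed = loadBarrier(stale);
        // prints "healed address = 0x2000"
        System.out.println("healed address = 0x" + Long.toHexString(healed & ADDRESS_MASK));
    }
}
```

The fast path is a single bit test; only a pointer with a stale color pays for the forwarding lookup, which is why the average overhead stays low.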

# ZGC Architecture: Concurrent Phase Timeline
#
# Application Threads: ----[work]--------[work]--------[work]--------[work]----
#
# ZGC Cycle: [Pause Mark Start] -> [Concurrent Mark] -> [Pause Mark End]
#            -> [Concurrent Prepare for Relocation] -> [Pause Relocate Start]
#            -> [Concurrent Relocate]
#
# Safepoints: 3 per GC cycle (Mark Start, Mark End, Relocate Start),
#             each typically < 1ms regardless of heap size
# Load Barrier: applied on every heap reference load, heals stale pointers on the fly

Generational ZGC (Java 21+): Throughput Without Sacrificing Latency

Non-generational ZGC (the only mode through Java 20) treats all objects equally: it collects young and old objects in the same pass. This delivers excellent latency but leaves throughput on the table, because short-lived objects that could have been cheaply collected in a minor GC are instead swept up in the full-heap concurrent marking pass, consuming extra CPU cycles.

Java 21 introduced Generational ZGC, enabled with -XX:+ZGenerational (and made the default in Java 23). Generational ZGC divides the heap into young and old generations, applying ZGC's concurrent, barrier-based collection independently to each. Short-lived objects are collected in frequent, cheap young-generation collections, while long-lived objects graduate to the old generation and are collected less frequently.

The practical result: Generational ZGC typically delivers 10–30% better throughput than non-generational ZGC on realistic server workloads, with the same sub-millisecond pause guarantees. For most production services on Java 21+, Generational ZGC is the right choice.

# Java 21+ — Enable Generational ZGC (default in Java 23+)
-XX:+UseZGC -XX:+ZGenerational

# Java 17-20 — Non-generational ZGC
-XX:+UseZGC

# Verify ZGC is active and check generational mode
java -XX:+UseZGC -XX:+ZGenerational -XX:+PrintFlagsFinal -version 2>&1 | grep -E "UseZGC|ZGenerational"
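Flags aside, you can also confirm which collector is live from inside the process via the standard management API. A small sketch; the ZGC bean names in the comment are what recent HotSpot builds typically report:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    public static void main(String[] args) {
        // Under non-generational ZGC the beans are typically named
        // "ZGC Cycles" and "ZGC Pauses"; generational ZGC usually reports
        // separate Major/Minor cycle and pause beans.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, totalTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```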

ZGC Tuning: Key JVM Flags for Production

ZGC is intentionally low-configuration — the vast majority of tuning you would do with G1GC is unnecessary. However, several flags are critical for production deployments:

# Recommended ZGC production configuration (Java 21+)
JAVA_OPTS="
  -XX:+UseZGC
  -XX:+ZGenerational

  # Heap sizing: ZGC needs headroom to collect concurrently.
  # Rule: set Xmx to ~70% of container memory limit. ZGC needs 20-30% free.
  -Xms8g
  -Xmx16g

  # SoftMaxHeapSize: ZGC tries to keep heap usage BELOW this soft limit,
  # triggering more frequent collections before reaching Xmx.
  # Set to ~75% of Xmx to avoid allocation stalls near the hard limit.
  -XX:SoftMaxHeapSize=12g

  # Force a GC cycle at most every N seconds (0 = disable periodic GC).
  # Useful for services with very low allocation rates where ZGC might
  # not trigger naturally but memory needs to be returned to the OS.
  -XX:ZCollectionInterval=120

  # Number of concurrent GC threads. ZGC sizes this heuristically by default
  # (roughly 12.5% of available CPUs), and recent JDKs adjust it dynamically.
  # Override only if concurrent collection cannot keep up with allocation.
  -XX:ConcGCThreads=4

  # Fragmentation limit: trigger a GC cycle when live data fragments
  # exceed this % of heap. Default 25. Lower for more aggressive compaction.
  -XX:ZFragmentationLimit=20

  # GC logging (essential — near-zero overhead)
  -Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=5,filesize=20m

  # Diagnostic: log ZGC statistics each cycle
  -Xlog:gc+stats=debug:file=/var/log/app/gc-stats.log:time,uptime:filecount=3,filesize=10m
"

Critical sizing rule: ZGC collects concurrently, which means it must finish a GC cycle before the heap fills up. If your application allocates faster than ZGC can collect, you get an allocation stall. Always provision ZGC heaps so that at least 25–35% of the heap stays free above the expected live set. If live data is 10GB, set -Xmx16g, not -Xmx11g.
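As a back-of-the-envelope helper, the sizing rule above can be written as Xmx = liveSet / (1 - freeFraction), rounded up to a whole GiB. The method name and parameters below are illustrative, not a ZGC API:

```java
public class ZgcHeapSizer {
    // Smallest whole-GiB Xmx such that freeFraction of the heap stays free
    // when liveSetGiB is resident. freeFraction 0.25-0.35 matches the
    // headroom guidance in the text.
    static long recommendedXmxGiB(double liveSetGiB, double freeFraction) {
        return (long) Math.ceil(liveSetGiB / (1.0 - freeFraction));
    }

    public static void main(String[] args) {
        // 10 GiB live data, keeping 35% of the heap free -> prints "-Xmx16g"
        System.out.println("-Xmx" + recommendedXmxGiB(10.0, 0.35) + "g");
    }
}
```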

ZGC Failure Modes: Allocation Stalls, Fragmentation & Promotion Failure

ZGC is not immune to failure. Understanding its failure modes prevents production surprises:

Allocation Stall

The most common ZGC failure mode. When the allocation rate exceeds ZGC's collection throughput and the heap fills, application threads attempting to allocate will stall — they pause waiting for ZGC to free space. Unlike G1GC's Full GC, ZGC does not fall back to a stop-the-world collection; threads stall individually. In GC logs, look for:

# ZGC allocation stall in GC log - indicates heap too small or collection falling behind
[2026-03-19T03:47:12.441+0000][gc      ] Allocation Stall (http-nio-8080-exec-12) 34.2ms
[2026-03-19T03:47:12.441+0000][gc      ] GC(47) Garbage Collection (Allocation Stall)

# Fix: increase Xmx, lower SoftMaxHeapSize, or raise ConcGCThreads
# Monitor: track the Micrometer timer jvm.gc.pause (jvm_gc_pause_seconds in Prometheus)
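Beyond scraping logs, a coarse in-process proxy for rising GC pressure is the cumulative collection time reported by the management beans. This sketch polls it over an interval; the 50ms budget is an arbitrary example, not a ZGC constant:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPressureProbe {
    // Cumulative time spent in GC across all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        final long budgetMillis = 50;   // illustrative per-interval budget
        long before = totalGcMillis();
        Thread.sleep(1000);             // sampling interval
        long spent = totalGcMillis() - before;
        if (spent > budgetMillis) {
            System.err.println("GC pressure: " + spent + "ms of GC time in the last second");
        } else {
            System.out.println("GC time in interval: " + spent + "ms");
        }
    }
}
```

In production you would feed the same delta into your metrics pipeline rather than printing it; this is a fallback for when full Micrometer/Prometheus wiring is not yet in place.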

Fragmentation

ZGC allocates objects into pages of several size classes: small (2MB, for objects up to 256KB), medium (32MB), and large (one oversized object per page). When many pages contain a mix of live and dead objects, allocation of larger objects may fail even though total free bytes are sufficient. The -XX:ZFragmentationLimit flag controls how aggressively ZGC compacts to address this. For workloads with many large object allocations, lower this to 15.

When NOT to Use ZGC

ZGC is the wrong choice for:

  1. Throughput-first batch workloads with no latency SLA: the load barrier and concurrent GC cycles cost CPU that ParallelGC would spend on application work.
  2. Small heaps (roughly under 2GB): ZGC's memory overhead and barrier cost outweigh its benefits, and G1GC handles small heaps well.
  3. CPU-saturated hosts: ZGC needs spare cores to collect concurrently; if application threads consume every core, collection falls behind and allocation stalls follow.

Epsilon GC: The No-Op Collector and Its Legitimate Use Cases

Epsilon GC, introduced as an experimental feature in Java 11 (JEP 318) and still experimental in current JDK releases (it always requires -XX:+UnlockExperimentalVMOptions), is the "do-nothing" garbage collector. It allocates memory but never collects it. When the heap fills, the JVM terminates with an OutOfMemoryError. This sounds catastrophic — and for most applications it is — but Epsilon GC has well-defined, legitimate production use cases:

  1. Performance benchmarking: Measuring raw application throughput requires eliminating GC as a variable. Running identical benchmarks with Epsilon GC and G1GC reveals exactly how much throughput G1GC costs. JMH (Java Microbenchmark Harness) commonly uses Epsilon GC for this purpose.
  2. Short-lived batch jobs: A data transformation job that allocates 4GB of data and terminates in under 60 seconds can benefit from Epsilon GC. With no collection overhead, the job runs faster. You pre-size the heap to the maximum expected allocation, and the JVM exits cleanly when done.
  3. Memory allocation profiling: By disabling GC entirely, you can use JFR's allocation profiler to see exactly what your application allocates without the distortion of collection events. Epsilon GC creates a clean allocation trace.
  4. GC overhead measurement: Running production load tests with Epsilon GC establishes a GC-free baseline, letting you calculate exactly what fraction of your P99 latency is attributable to garbage collection.
  5. Serverless functions with aggressive memory pre-allocation: AWS Lambda functions or GraalVM native images with known short lifetimes and pre-sized heap allocations can use Epsilon GC to avoid GC entirely for functions that execute in under a second.

"Epsilon GC is not a collector you use by accident. It is a precision instrument for specific, well-understood scenarios where the absence of GC is a feature, not a bug. The moment you cannot guarantee heap exhaustion will not occur, Epsilon GC becomes a liability."
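Before committing a batch job to Epsilon, it helps to measure how much the job actually allocates. A sketch using the HotSpot-specific com.sun.management.ThreadMXBean; the 64 MiB loop is a stand-in for the real workload, and the call returns -1 on JVMs that do not support allocation accounting:

```java
import java.lang.management.ManagementFactory;

public class AllocationFootprint {
    public static void main(String[] args) {
        // HotSpot-specific extension of the standard ThreadMXBean.
        // Per-thread numbers only cover this thread's allocation, so run the
        // real workload single-threaded or sum across worker threads.
        com.sun.management.ThreadMXBean tmx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();
        long before = tmx.getThreadAllocatedBytes(tid);

        // Stand-in for the real batch workload: allocate 64 x 1 MiB
        byte[][] chunks = new byte[64][];
        for (int i = 0; i < chunks.length; i++) chunks[i] = new byte[1024 * 1024];

        long allocated = tmx.getThreadAllocatedBytes(tid) - before;
        System.out.printf("%d chunks, allocated ~%d MiB; size Epsilon's -Xmx with margin above this%n",
                chunks.length, allocated / (1024 * 1024));
    }
}
```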

Epsilon GC Configuration and Heap Exhaustion Behaviour

Epsilon GC is intentionally simple to configure because there is almost nothing to tune — it only allocates, never collects.

# Enable Epsilon GC (experimental in all JDK releases — the unlock flag is required)
-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC

# Critical: Pre-size the heap generously — there is no second chance
-Xms4g -Xmx4g  # Fix min == max; Epsilon does not benefit from GC-triggered resize

# Enable GC log to trace allocation progress
-Xlog:gc*:file=/var/log/app/epsilon-gc.log:time,uptime,level,tags

# Heap exhaustion: when heap fills, Epsilon emits:
# [gc] Heap: 4096M reserved, 4096M (100.00%) committed, 4096M (100.00%) used
# Then throws java.lang.OutOfMemoryError: Java heap space
# The JVM terminates unless a custom UncaughtExceptionHandler or
# -XX:OnOutOfMemoryError="..." is configured

# Recommended for batch jobs: terminate the JVM immediately on OOM
-XX:+ExitOnOutOfMemoryError

# JFR integration for allocation profiling with Epsilon GC
-XX:StartFlightRecording=filename=/tmp/epsilon-alloc.jfr,settings=profile,duration=60s

When the Epsilon GC heap is exhausted, the JVM behaviour is deterministic and immediate: a java.lang.OutOfMemoryError is thrown on the allocating thread. If uncaught, the thread terminates with a stack trace. If no non-daemon threads remain, the JVM exits. There is no collection attempt, no warning period, no gradual degradation. This predictability is precisely what makes Epsilon GC useful for controlled scenarios — and dangerous for anything else.
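Because exhaustion is unrecoverable under Epsilon, long-running steps can guard large allocations with a cheap pre-flight check. The numbers Runtime reports are approximate, so the 2x margin below is an arbitrary safety factor, not a rule:

```java
public class EpsilonHeadroomCheck {
    // Approximate free heap: max heap minus currently used bytes.
    static boolean hasHeadroom(long neededBytes) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long available = rt.maxMemory() - used;
        return available > neededBytes;
    }

    public static void main(String[] args) {
        long need = 100L * 1024 * 1024;   // next processing chunk: ~100 MiB
        if (!hasHeadroom(need * 2)) {     // 2x margin: Runtime numbers are coarse
            System.err.println("Refusing allocation: heap headroom too low");
            System.exit(2);               // fail fast with a clear exit code
        }
        System.out.println("headroom ok, proceeding");
    }
}
```

Failing fast with a distinct exit code lets the batch scheduler distinguish "heap undersized" from ordinary job failures.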

Comparison: G1 vs ZGC vs Shenandoah vs Epsilon

Collector   | Pause Times                     | Throughput                 | Memory Overhead  | Min Java Version           | Best Use Case
G1GC        | 10–200ms (200ms+ on Full GC)    | Excellent                  | Low (~5%)        | Java 7 (default since 9)   | General-purpose, 2–32GB heaps
ZGC         | <1ms (all heap sizes)           | Good (5–15% below G1)      | Medium (~10–15%) | Java 11 (exp), 15 (prod)   | Latency-critical, large heaps (>8GB)
Shenandoah  | 1–10ms (concurrent evacuation)  | Good (similar to ZGC)      | Medium (~10%)    | Java 12 (OpenJDK)          | Balanced latency, medium heaps (4–64GB)
ParallelGC  | 50ms–5s (stop-the-world)        | Best (highest throughput)  | Lowest (~2%)     | Java 1.4                   | Batch processing, throughput-first
Epsilon     | None (until OOM)                | Maximum (no GC work)       | Minimal (~1%)    | Java 11 (experimental)     | Benchmarking, short-lived batch, profiling

Production Deployment Checklist for ZGC

Deploying ZGC in a containerised Kubernetes environment requires careful attention to several factors that do not apply to bare-metal deployments:

# 1. Container memory limits — ZGC must know total container memory
# UseContainerSupport is on by default (Java 10+); either compute Xmx explicitly
# (CONTAINER_MEMORY_LIMIT in MB) or use -XX:MaxRAMPercentage=70.0 instead:
-Xmx$(expr $CONTAINER_MEMORY_LIMIT \* 70 / 100)m

# 2. Kubernetes resource spec — leave room for JVM metaspace + off-heap + OS
# If app needs 8GB heap, set container limit to 12GB
resources:
  requests:
    memory: "10Gi"
  limits:
    memory: "12Gi"

# 3. Liveness probe timeouts — ZGC concurrent phases can briefly slow responses
# Set initialDelaySeconds and timeoutSeconds generously
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3

# 4. GC log rotation — essential in production
-Xlog:gc*:file=/var/log/app/gc-%t.log:time,uptime,level,tags:filecount=5,filesize=20m

# 5. JFR continuous recording for GC analysis
-XX:StartFlightRecording=name=continuous,settings=default,maxsize=128m,dumponexit=true,filename=/tmp/app.jfr

# 6. Prometheus metrics — expose ZGC statistics via Micrometer
# management.metrics.enable.jvm.gc=true in application.properties
# Track: jvm_gc_pause_seconds, jvm_memory_used_bytes, jvm_gc_overhead_percent

Real-World Benchmarks: G1 → ZGC Latency Improvements

A Spring Boot order-processing microservice on Java 21, handling 12,000 requests/second, with a 24GB heap:

Metric         | G1GC (-XX:MaxGCPauseMillis=200) | ZGC (non-gen, Java 17) | ZGC (generational, Java 21)
P50 Latency    | 6ms                             | 7ms                    | 6ms
P95 Latency    | 42ms                            | 11ms                   | 9ms
P99 Latency    | 287ms                           | 18ms                   | 14ms
P99.9 Latency  | 512ms                           | 31ms                   | 22ms
Max GC Pause   | 498ms (Full GC)                 | 0.8ms                  | 0.7ms
Throughput     | Baseline (100%)                 | 88%                    | 96%

The improvement is dramatic at the tail: P99 latency dropped from 287ms to 14ms — a 20x improvement. Generational ZGC recovered most of the throughput gap (96% vs 88% for non-generational), making it the clear choice for Java 21+ deployments.

# JFR-based GC analysis - parse recording to extract GC pause statistics.
# jdk.GCPhasePause covers pause phases for all collectors; duration units in
# the --json output can vary by JDK, so verify against a sample event first.
jfr print --events jdk.GCPhasePause --json /tmp/app.jfr | \
  python3 -c "
import json, sys
data = json.load(sys.stdin)
pauses = [e['values']['duration'] for e in data['recording']['events']
          if 'duration' in e.get('values', {})]
if pauses:
    pauses.sort()
    n = len(pauses)
    print(f'P50: {pauses[n//2]:.3f}ms')
    print(f'P99: {pauses[int(n*0.99)]:.3f}ms')
    print(f'Max: {max(pauses):.3f}ms')
    print(f'Count: {n}')
"

Key Takeaways

  1. ZGC delivers sub-millisecond pauses independent of heap size, at the cost of a modest throughput reduction from load barriers.
  2. On Java 21+, prefer Generational ZGC: it recovers most of the throughput gap while keeping the same pause guarantees.
  3. Size ZGC heaps so 25–35% stays free above the live set; allocation stalls are the symptom of a heap or GC thread count that is too small.
  4. Epsilon GC is a precision tool for benchmarking, allocation profiling, and short-lived batch jobs; never use it where heap exhaustion is possible.
