Md Sanwar Hossain - Software Engineer

Software Engineer · Java · Spring Boot · Microservices

Java Serialization Performance: Jackson vs Protobuf vs Kryo in High-Throughput Microservices

Serialization is the invisible tax on every inter-service call. At low request rates it is negligible; at 50,000+ events per second it can consume a quarter of your available CPU. This guide benchmarks Jackson JSON, Google Protobuf, and Kryo side by side, explains when each wins, and shows you how to squeeze maximum throughput out of Jackson before ever touching a binary format.

Table of Contents

  1. The Production Problem
  2. Why Serialization Matters at Scale
  3. Jackson JSON: The Default Choice
  4. Protobuf: Binary Efficiency
  5. Kryo: Java-Specific Speed
  6. JMH Benchmark Results
  7. Jackson Optimization Techniques
  8. Choosing the Right Format
  9. When NOT to Switch Away from JSON
  10. Key Takeaways

The Production Problem

A high-frequency trading platform running on Spring Boot microservices communicates market data between services via an internal event bus. At peak load of 50,000 order events per second, a profiling session revealed that JSON serialization via Jackson was consuming 35% of available CPU. P99 latency for a single serialization + deserialization round trip was 2.3ms — a seemingly small number that, multiplied across thousands of concurrent events, was creating measurable downstream latency spikes.

The engineering team switched the internal event bus from Jackson JSON to Google Protobuf. The result: CPU utilization for serialization dropped from 35% to 12%, and round-trip serialization latency fell from 2.3ms to 0.6ms. External-facing REST APIs remained JSON for client compatibility — nothing visible to downstream consumers changed. Only the internal wire format between services was migrated.

To understand why the difference is so dramatic, consider the payload comparison for a single order event:

# Jackson JSON payload (Order event, 847 bytes in full; abbreviated here):
{"orderId":"ord-8f4a2c1d-9e3b","symbol":"AAPL","side":"BUY","quantity":1000,"price":182.45,"timestamp":1742634892341,"status":"PENDING","traderId":"trader-891","accountId":"acc-2291","venue":"NYSE","currency":"USD","flags":{"marginCall":false,"icebergOrder":true,"visibleQuantity":100}}

# Protobuf binary payload (same data, 127 bytes — 85% smaller)
# Kryo binary payload (same data, 201 bytes — 76% smaller)

The JSON payload is 847 bytes. The same data encoded in Protobuf binary is 127 bytes — 85% smaller. Kryo binary produces 201 bytes — 76% smaller. At 50,000 events per second, the difference in bytes transmitted and processed each second becomes massive: JSON moves 42.35 MB/s of payload vs. Protobuf's 6.35 MB/s. Less data means less CPU work to encode, decode, and buffer — and that is before accounting for the fact that Protobuf avoids reflection entirely.

Why Serialization Matters at Scale

At low throughput, serialization overhead is genuinely negligible. A service handling 100 requests per second spends a fraction of a millisecond per second on serialization — well below the noise floor of any reasonable performance measurement. The engineering instinct to "just use Jackson" is correct for the vast majority of services.

The inflection point arrives when throughput climbs into the tens of thousands of events per second. Serialization involves several CPU-bound operations per object: reflection to discover field names and types (Jackson's default mode), memory allocation for intermediate output buffers, string encoding for field names and string values, and complete object graph traversal regardless of whether most fields have changed. These costs compound with throughput.
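
Of these costs, reflection is the easiest to see in isolation. The toy sketch below — the Order class is hypothetical and this is not Jackson's actual code path — contrasts direct field access with the kind of reflective access Jackson's default mode performs per property:

```java
import java.lang.reflect.Field;

public class ReflectionCostDemo {
    public static class Order {
        public final long quantity = 1000;
        public final String symbol = "AAPL";
    }

    // What reflective serializers do per property: look the field up by name,
    // bypass access checks, and box the primitive result.
    public static long readQuantityReflectively(Order order) throws Exception {
        Field f = Order.class.getDeclaredField("quantity");
        f.setAccessible(true);
        return (Long) f.get(order);
    }

    public static void main(String[] args) throws Exception {
        Order order = new Order();
        long direct = order.quantity;                      // resolved at compile time, JIT-inlined
        long reflective = readQuantityReflectively(order); // lookup + access check + boxing per call
        System.out.println(direct == reflective);          // true — same value, far more work
    }
}
```

Both reads return the same value; the reflective path simply does more work per call, and that work repeats for every field of every object at full event rate.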

The CPU impact can be estimated with a simple formula:

serialization_cpu_share = (events_per_second × bytes_per_event × cpu_ns_per_byte) / (available_cpu_ns_per_second)

For Jackson at 50,000 events/second with 800-byte payloads at approximately 15 ns/byte of processing cost:

50,000 × 800 × 15ns = 600,000,000 ns = 600ms of CPU work per second

On a single core, that is 60% CPU dedicated to serialization alone. On a 4-core machine with all cores available for serialization work, that translates to 15% CPU from serialization — and this is before any network I/O, business logic, or database work is counted. Real-world measurements at the trading platform confirmed this model closely: the profiler reported 35% CPU for serialization on a 2-core instance where both cores were handling event processing in parallel.
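
The estimate above can be turned into a small helper (the 15 ns/byte constant is this article's working assumption, not a universal figure):

```java
public class SerializationBudget {

    // Share of ONE core's CPU consumed by serialization:
    // (events/sec × bytes/event × CPU-ns-per-byte) / 1e9 ns available per core per second
    public static double cpuShareOfOneCore(long eventsPerSecond,
                                           long bytesPerEvent,
                                           double cpuNsPerByte) {
        double cpuNsPerSecond = eventsPerSecond * bytesPerEvent * cpuNsPerByte;
        return cpuNsPerSecond / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // The scenario above: 50,000 events/sec, 800-byte payloads, ~15 ns/byte.
        double share = cpuShareOfOneCore(50_000, 800, 15.0);
        System.out.printf("%.0f%% of one core%n", share * 100);             // 60% of one core
        System.out.printf("%.0f%% of a 4-core machine%n", share / 4 * 100); // 15% across 4 cores
    }
}
```

Plugging in your own profiler-derived ns/byte figure makes this a quick sanity check during capacity planning.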

The important insight is that this overhead is proportional to throughput. A service that runs fine at 5,000 events/second will hit a serialization wall at 50,000 events/second without any change to the code. Capacity planning must account for serialization cost when projecting throughput limits.

Jackson JSON: The Default Choice

Jackson is the de facto standard for Java JSON serialization. It is embedded in Spring Boot via spring-boot-starter-web and powers every @RestController response by default. Its strengths are compelling: human-readable output, no schema required, universal client support, rich annotation ecosystem, and deep Spring integration. For most services — external REST APIs, configuration processing, data files — Jackson is the right answer and switching away is rarely necessary.

The core API is straightforward:

ObjectMapper mapper = new ObjectMapper();

// Serialize to String
String json = mapper.writeValueAsString(order);

// Serialize to byte array (preferred for network I/O — avoids String intermediate)
byte[] bytes = mapper.writeValueAsBytes(order);

// Deserialize
Order order = mapper.readValue(json, Order.class);

Jackson's default mode uses reflection to discover field names and access field values at runtime, and that reflection work is the dominant per-object cost in the serialization hot path. The Jackson Blackbird module (the successor to Afterburner, designed for Java 11+) replaces reflection with generated bytecode accessors, reducing per-call overhead by 30–50%:

<!-- pom.xml -->
<dependency>
    <groupId>com.fasterxml.jackson.module</groupId>
    <artifactId>jackson-module-blackbird</artifactId>
</dependency>

ObjectMapper mapper = new ObjectMapper()
    .registerModule(new BlackbirdModule())
    .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    .configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false)
    .registerModule(new JavaTimeModule());

Blackbird generates bytecode accessors at first use (similar to how JVM JIT works) so the warmup period is slightly longer, but steady-state throughput improves measurably. On the JMH benchmarks below, Blackbird reduced serialization time from 2,834 ns/op to 1,823 ns/op — a 36% improvement using only a dependency and a two-line configuration change.

Protobuf: Binary Efficiency

Google's Protocol Buffers define message structure in a .proto schema file, which the Protobuf compiler (protoc) uses to generate Java classes with efficient serialization built in. There is no reflection — the generated code directly reads and writes fields by position. Field names are not encoded in the wire format; instead, each field is identified by a compact integer tag.

A .proto schema for the order event looks like this:

syntax = "proto3";
package com.example.trading;

message OrderEvent {
  string order_id = 1;
  string symbol = 2;
  string side = 3;
  int64 quantity = 4;
  double price = 5;
  int64 timestamp = 6;
  string status = 7;
}

The field numbers (1, 2, 3…) are what appear in the binary wire format — not the field names. This is why Protobuf payloads are dramatically smaller than JSON. The generated Java usage is clean and builder-pattern based:

// Serialize
OrderEvent event = OrderEvent.newBuilder()
    .setOrderId("ord-8f4a2c1d")
    .setSymbol("AAPL")
    .setQuantity(1000)
    .setPrice(182.45)
    .build();
byte[] bytes = event.toByteArray();  // 85-90% smaller than JSON

// Deserialize
OrderEvent parsed = OrderEvent.parseFrom(bytes);
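
To see why tag-plus-varint encoding is so compact, here is a pure-Java sketch of the wire format — illustrative only, not the real Protobuf library. Field 4 (quantity) with value 1000 costs three bytes on the wire, versus 15 bytes for "quantity":1000 in JSON:

```java
import java.io.ByteArrayOutputStream;

public class VarintDemo {

    // Protobuf-style varint: 7 payload bits per byte, MSB set = more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits + continuation flag
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A field on the wire is a tag varint (fieldNumber << 3 | wireType) then the value.
    // Wire type 0 covers varint-encoded values such as int64.
    public static byte[] encodeInt64Field(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (long) fieldNumber << 3); // wire type 0 = varint
        writeVarint(out, value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = encodeInt64Field(4, 1000);  // field 4 = quantity
        // Three bytes: 0x20 (tag: field 4, wire type 0), then 0xE8 0x07 (varint 1000)
        System.out.println(wire.length);          // 3
    }
}
```

The field name never appears on the wire; only the schema knows that tag 4 means quantity — which is exactly why the schema must be shared.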

Protobuf's advantages are significant for internal service communication: payloads are 5–10× smaller than JSON, serialization is 2–3× faster, and the schema provides strong typing with backward and forward compatibility built in. Adding a new field with a new field number does not break existing clients that do not know about it — they simply ignore unknown field numbers. Removing a field leaves a gap in field numbers; clients that still send that field have it silently ignored by the receiver.
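
As an illustration of that evolution story, a hypothetical v2 of the schema adds a field; proto3's reserved statement can guard any retired field numbers against accidental reuse:

```proto
syntax = "proto3";
package com.example.trading;

message OrderEvent {
  string order_id = 1;
  string symbol = 2;
  string side = 3;
  int64 quantity = 4;
  double price = 5;
  int64 timestamp = 6;
  string status = 7;
  string venue = 8;   // new in v2 — v1 readers skip the unknown tag safely
  // If a field is ever removed, reserve its number so it is never reassigned:
  // reserved 9;
}
```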

The disadvantages are equally real: a schema (.proto files) must be maintained and distributed across all services that share a message type, the binary wire format is not human-readable (you cannot curl a Protobuf endpoint and read the response), and protoc code generation must be integrated into the build pipeline. In a large microservices mesh, managing schema versions and distribution becomes a non-trivial operational concern that requires a schema registry or a shared internal library.

Kryo: Java-Specific Speed

Kryo is a Java-specific binary serialization framework with no schema requirement. It can serialize any Java object — including classes that do not implement Serializable — and produces compact binary output faster than Java's built-in serialization by an order of magnitude. Kryo's sweet spot is Java-to-Java communication where cross-language support is not needed, such as internal cache serialization (Redis, Hazelcast) or intra-cluster messaging.

For maximum performance, Kryo benefits from class registration: assigning a compact integer ID to each class avoids writing the full class name in the binary output. Without registration, Kryo falls back to writing the fully qualified class name, which bloats output and costs performance.

Kryo kryo = new Kryo();
kryo.register(OrderEvent.class, 1);        // register with ID for compact encoding
kryo.register(java.util.ArrayList.class, 2);

// Serialize
Output output = new Output(1024, -1);      // 1KB initial buffer, auto-expanding
kryo.writeObject(output, order);
byte[] bytes = output.toBytes();

// Deserialize
Input input = new Input(bytes);
OrderEvent parsed = kryo.readObject(input, OrderEvent.class);
input.close();

Kryo's advantages: no schema to define or distribute, any Java class is serializable without modification, and it is the fastest Java-to-Java serializer available (outperforming even Protobuf in the benchmarks below). Its disadvantages are significant: Kryo is Java-only — there is no Python, Go, or JavaScript Kryo library. Class registration IDs must be identical between the serializing and deserializing ends; if a new deployment adds a class registration or changes an ID, deserialization of messages produced by the old version will fail silently or throw exceptions. This registration fragility makes Kryo unsuitable for any communication path where schema evolution needs to be managed gracefully. It is also completely unsuitable for external APIs.

In practice, Kryo is the right choice for serializing Java objects into a cache (Redis, Memcached, Hazelcast) where both the writer and reader are the same application, or for ephemeral intra-cluster messaging where all nodes always run the same version.

JMH Benchmark Results

The following benchmarks were run using JMH (Java Microbenchmark Harness) on a single-threaded workload with the order event payload described above (847 bytes as JSON, representing a realistic trading domain object). JVM: Java 21, 4GB heap, G1GC, 5 warmup iterations, 10 measurement iterations, 3 forks. All times in nanoseconds per operation (lower is better).

Benchmark                                           Mode    Score    Error  Units
SerializationBenchmark.jacksonSerialize             avgt   2834.4 ±  45.2  ns/op
SerializationBenchmark.jacksonBlackbirdSerialize    avgt   1823.1 ±  31.7  ns/op
SerializationBenchmark.protobufSerialize            avgt    876.3 ±  12.4  ns/op
SerializationBenchmark.kryoRegisteredSerialize      avgt    634.7 ±   9.8  ns/op

SerializationBenchmark.jacksonDeserialize           avgt   3241.6 ±  52.1  ns/op
SerializationBenchmark.jacksonBlackbirdDeserialize  avgt   2104.3 ±  38.9  ns/op
SerializationBenchmark.protobufDeserialize          avgt   1043.2 ±  18.3  ns/op
SerializationBenchmark.kryoRegisteredDeserialize    avgt    891.4 ±  14.2  ns/op

Payload size (bytes):
  Jackson JSON:             847
  Jackson JSON (gzip):      312
  Protobuf:                 127
  Kryo (registered):        201

Key findings: Kryo is 4.5× faster than default Jackson for serialization (634 ns vs. 2,834 ns) and 3.6× faster for deserialization. Protobuf is 3.2× faster than default Jackson for serialization and 3.1× faster for deserialization. The Jackson Blackbird module closes the gap substantially — Blackbird Jackson is only 2.1× slower than Protobuf for serialization, not 3.2×, with zero code changes other than adding a dependency and registering the module.

Critically: at 5,000 events per second, the absolute difference between Jackson (2,834 ns × 5,000 = 14.2ms/sec of CPU) and Kryo (634 ns × 5,000 = 3.2ms/sec) is only 11ms of CPU per second — roughly 1% of a single core. The argument for switching serialization formats only becomes compelling above approximately 10,000–20,000 events per second, where the difference starts appearing in profiler output.
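
The gzip row in the payload-size listing can be sanity-checked with the JDK's own GZIPOutputStream; a sketch (payload abbreviated, so sizes will differ from the table):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipCheck {
    // Gzip a payload in memory and return the compressed size in bytes.
    public static int gzipSize(String payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        // Illustrative order-event JSON; the full 847-byte payload compresses further.
        String json = "{\"orderId\":\"ord-8f4a2c1d-9e3b\",\"symbol\":\"AAPL\",\"side\":\"BUY\","
                + "\"quantity\":1000,\"price\":182.45,\"timestamp\":1742634892341,\"status\":\"PENDING\"}";
        System.out.println("raw:  " + json.getBytes(StandardCharsets.UTF_8).length + " bytes");
        System.out.println("gzip: " + gzipSize(json) + " bytes");
    }
}
```

Note that gzip trades CPU for bytes: it shrinks what crosses the wire but adds compression work per event, so it addresses bandwidth, not serialization throughput.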

Jackson Optimization Techniques

Before considering a switch to binary formats, exhaust Jackson's optimization options. In most production services, proper Jackson configuration eliminates the serialization bottleneck entirely:

Use ObjectMapper as a singleton. ObjectMapper is expensive to construct (it scans classpath modules, initializes reflection caches, registers default serializers). Creating a new ObjectMapper per request is a common mistake that adds milliseconds of avoidable overhead per call. After initial configuration, ObjectMapper is fully thread-safe and should be injected as a Spring bean.

Annotate with @JsonInclude(JsonInclude.Include.NON_NULL) at the class level to skip null fields. On objects with many optional fields, this reduces output size by 15–25% and eliminates null field serialization work.

Use the streaming API (JsonParser / JsonGenerator) for very large payloads. Streaming avoids building an intermediate in-memory object tree and can reduce GC pressure significantly for payloads over 100KB.

Disable annotation scanning (set MapperFeature.USE_ANNOTATIONS to false) if your POJO classes use no Jackson annotations — this skips annotation lookup on every class inspection, saving measurable overhead when deserializing many different types.

A complete optimized ObjectMapper Spring bean configuration:

@Configuration
public class JacksonConfig {

    @Bean
    @Primary
    public ObjectMapper objectMapper() {
        return new ObjectMapper()
            // Blackbird: replaces reflection with bytecode generation (30-50% faster)
            .registerModule(new BlackbirdModule())
            // Java 8 date/time support
            .registerModule(new JavaTimeModule())
            // Don't fail on unknown properties — forward-compatible deserialization
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
            // Serialize dates as ISO-8601 strings, not timestamps
            .configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false)
            // Skip null fields in output (~20% smaller payloads for sparse objects)
            .setSerializationInclusion(JsonInclude.Include.NON_NULL)
            // Disable annotation scanning if POJOs use no Jackson annotations
            // .configure(MapperFeature.USE_ANNOTATIONS, false)
            // Accept single quotes from non-standard clients (disabled by default)
            // .configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, true)
            ;
    }
}

With this configuration and the Blackbird module, Jackson throughput in the benchmarks above improves from 2,834 ns/op to 1,823 ns/op — a 36% improvement before touching a single line of business logic or switching any wire format.

Choosing the Right Format

Serialization format selection is a decision matrix, not a one-size-fits-all answer. The right format depends on who is communicating, what tooling they need, and how often the schema changes:

External REST APIs → Jackson JSON. Client compatibility, browser support, curl-ability, Postman/Swagger tooling, and human readability are non-negotiable for external APIs. Never expose Protobuf or Kryo on an external endpoint unless you control all clients.

Internal microservice event bus → Protobuf. When services you control are communicating at high throughput and may be written in different languages, Protobuf offers the best combination of performance, schema safety, cross-language support, and backward/forward compatibility. Schema management overhead is real but manageable with a shared proto library.

Java-to-Java cache serialization (Redis, Hazelcast) → Kryo. When both the writer and the reader are the same Java application (or always-same-version deployments), Kryo's speed advantage and lack of schema requirement make it the ideal choice. Hazelcast uses Kryo internally for exactly this reason.

Kafka event streaming → Avro with Schema Registry or Protobuf. Kafka topics are long-lived; consumers may read events produced months ago. Schema evolution is critical. Both Avro (with Confluent Schema Registry) and Protobuf provide managed schema evolution guarantees that raw JSON and Kryo do not.

Configuration and data files → JSON or YAML. Human readability is the priority for files that humans edit. Binary formats for configuration files create unnecessary friction.

When NOT to Switch Away from JSON

The benchmark numbers are compelling, but switching to binary serialization carries real costs that the benchmarks do not capture. Before making the switch, consider carefully:

Only switch when profiler data proves serialization is a bottleneck. If serialization does not appear in your CPU profile above 5%, the performance gain from switching is less than the operational cost of managing binary format schemas. This threshold means roughly 10,000+ events/second on modern hardware. Below that, Jackson with proper configuration is sufficient.

Binary formats break debuggability. When a production incident occurs, the first tool every engineer reaches for is curl, Wireshark, or a log search. JSON responses are immediately readable. Protobuf and Kryo payloads require a schema and a decoding tool to inspect. Debugging an incident is measurably harder when wire data is opaque binary.

Schema management has real operational overhead. Distributing and versioning .proto files across dozens of microservices requires a dedicated schema repository, CI pipeline integration, and discipline around field number assignment. Teams that underestimate this overhead discover it only after a breaking schema change reaches production.

Tooling gaps compound over time. Postman, Swagger UI, REST clients, API gateways, and most observability tools speak JSON natively. Protobuf support exists but requires additional setup everywhere. For internal APIs that engineers and on-call teams interact with regularly, the convenience of JSON has compounding value.

The pragmatic path: measure first with a profiler (async-profiler or Java Flight Recorder), apply Jackson optimizations (Blackbird module + singleton ObjectMapper + NON_NULL inclusion), re-measure, and only proceed to binary formats if serialization remains a top-5 CPU consumer after optimization.

Key Takeaways

Serialization cost scales linearly with throughput. It is negligible at 5,000 events/second and can become a top CPU consumer at 50,000+, so capacity planning must account for it.

Exhaust Jackson's options first: the Blackbird module, a singleton ObjectMapper, and NON_NULL inclusion recover 30–50% of serialization cost with no wire-format change.

Protobuf fits internal, high-throughput, potentially cross-language communication: payloads around 85% smaller and serialization roughly 3× faster than default Jackson, with schema-managed backward and forward compatibility.

Kryo is the fastest Java-to-Java serializer, but its registration fragility limits it to cache serialization and same-version intra-cluster messaging — never external APIs.

Keep JSON on external endpoints, and move internal traffic to a binary format only after profiler data shows serialization still consuming a significant share of CPU post-optimization.
