Spring Boot Multi-Level Caching with Caffeine and Redis
Md Sanwar Hossain - Software Engineer

Software Engineer · Java · Spring Boot · Microservices

Core Java · March 22, 2026 · 15 min read · Spring Boot Production Engineering Series

Spring Boot Multi-Level Caching: Caffeine L1 + Redis L2 for High-Throughput APIs

A product catalog API at a major retail platform was serving 50,000 requests per second during peak flash sales. The database was holding up fine — until it wasn't. A minor Redis hiccup caused 8,000 requests per second to fall through to PostgreSQL simultaneously, overwhelmed the connection pool, and cascaded into a full service outage. The root cause: a single-layer cache strategy with no local fallback. A two-tier caching architecture with Caffeine as L1 and Redis as L2 would have absorbed the Redis failure gracefully. This post shows you exactly how to build it.

Table of Contents

  1. The Cache Miss Cascade: When @Cacheable Fails at Scale
  2. L1 Caffeine Cache: In-Process Caching
  3. L2 Redis Cache: Distributed Caching Configuration
  4. Two-Tier Cache Hierarchy: Caffeine + Redis
  5. Cache-Aside vs Read-Through vs Write-Behind
  6. Cache Eviction Strategies: TTL, LRU, Manual Invalidation
  7. Cache Stampede Prevention with Locks and Jitter
  8. Production Pitfalls: Serialization, Stale Data, Cache Poisoning
  9. Key Takeaways

1. The Cache Miss Cascade: When @Cacheable Fails at Scale

Spring's @Cacheable annotation is deceptively simple — add it to a method, point it at a cache name, done. For low-traffic services or development, this works beautifully. Under production load, naive usage leads to the thundering herd problem: when a cached item expires, all concurrent requests that need it hit the database simultaneously, generating a spike far beyond what the database would see without any cache at all.
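Spring ships a first line of defense for the in-process case: `sync = true` on `@Cacheable` collapses concurrent misses for the same key into a single load. A minimal sketch (the service and repository names are illustrative, not from a real codebase):

```java
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

    private final ProductRepository productRepository;

    public ProductService(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }

    // sync = true: on a miss, only one thread per JVM computes the value
    // for a given key; other callers block briefly and reuse the result.
    // This tames the herd per instance, but not across instances; the
    // distributed lock in section 7 covers the cross-instance case.
    @Cacheable(value = "products", key = "#id", sync = true)
    public Product findProduct(long id) {
        return productRepository.findById(id).orElseThrow();
    }
}
```

Note that `sync = true` is a hint that requires provider support; Caffeine's Spring adapter honors it.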

The problem compounds with Redis as a single cache tier. Redis is fast (sub-millisecond), but it's still a network hop. Under extreme load — 50k+ RPS — even a 0.5ms Redis read adds up. And when Redis becomes temporarily unavailable or experiences high latency, all traffic that can't hit Redis has nowhere to go but the origin database.

Real scenario: A product detail page was cached in Redis with a 60-second TTL. The key expired at 12:00:00 exactly. 3,400 concurrent users were viewing that product. All 3,400 requests got a Redis miss simultaneously, launched 3,400 database queries in the same millisecond window, exhausted the 50-connection HikariCP pool in under 100ms, and triggered a cascading timeout that took 45 seconds to recover from.

2. L1 Caffeine Cache: In-Process Caching

Caffeine is a high-performance, near-optimal caching library for Java. It uses the W-TinyLFU algorithm — a combination of a frequency sketch and an LRU window — which delivers significantly higher hit rates than simple LRU for typical access patterns. Caffeine caches live in the JVM heap, so reads are pure in-memory operations — nanoseconds, not microseconds.

# application.yml — Caffeine as Spring Cache provider
spring:
  cache:
    type: caffeine
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=30s,recordStats

# build.gradle
implementation 'com.github.ben-manes.caffeine:caffeine:3.1.8'

// CaffeineCacheConfig.java — register Caffeine as the Spring CacheManager
@Configuration
@EnableCaching
public class CaffeineCacheConfig {

    @Bean
    public CacheManager caffeineCacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager();
        manager.setCaffeine(Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(30))
            .recordStats());  // enables hit/miss ratio metrics
        return manager;
    }
}

The recordStats() call is important — it enables Micrometer metrics integration so you can track cache hit ratios in Grafana and get alerted when L1 hit rate drops below a threshold (indicating sizing or TTL misconfiguration).
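One way to wire those stats into Micrometer is the `CaffeineCacheMetrics` binder. A hedged sketch (the cache name and wiring are illustrative; this is for caches built outside Spring's cache abstraction):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.cache.CaffeineCacheMetrics;

import java.time.Duration;

public class CacheMetricsWiring {

    // Binds hit/miss counts, size, and evictions to the registry so they
    // can be graphed in Grafana and alerted on when the hit ratio drops.
    public static Cache<String, Object> monitoredCache(MeterRegistry registry) {
        Cache<String, Object> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(30))
            .recordStats()  // required; the binder reads cache.stats()
            .build();
        return CaffeineCacheMetrics.monitor(registry, cache, "products");
    }
}
```

When Spring Boot Actuator is on the classpath, caches created through a `CacheManager` are bound automatically; manual binding like this is only needed for standalone Caffeine caches.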

3. L2 Redis Cache: Distributed Caching Configuration

Redis as L2 provides a shared cache across all service instances — critical in horizontally scaled deployments where different pods serve different users. Without L2, instance A might cache a product while instance B fetches it from the database again.

@Configuration
public class RedisCacheConfig {

    @Bean
    public RedisCacheManager redisCacheManager(RedisConnectionFactory connectionFactory) {
        // Configure per-cache TTLs using entryTtl override
        Map<String, RedisCacheConfiguration> configs = Map.of(
            "products",       defaultConfig().entryTtl(Duration.ofMinutes(10)),
            "user-sessions",  defaultConfig().entryTtl(Duration.ofMinutes(30)),
            "pricing",        defaultConfig().entryTtl(Duration.ofSeconds(60))
        );

        return RedisCacheManager.builder(connectionFactory)
            .cacheDefaults(defaultConfig())
            .withInitialCacheConfigurations(configs)
            .build();
    }

    private RedisCacheConfiguration defaultConfig() {
        return RedisCacheConfiguration.defaultCacheConfig()
            .serializeKeysWith(RedisSerializationContext.SerializationPair
                .fromSerializer(new StringRedisSerializer()))
            .serializeValuesWith(RedisSerializationContext.SerializationPair
                .fromSerializer(new GenericJackson2JsonRedisSerializer()))
            .disableCachingNullValues()
            .prefixCacheNameWith("myapp:v1:");  // namespace prevents key collisions across services
    }
}

Key: Always prefix cache names with a service name and version. Without prefixes, two microservices caching under the same key name ("products") in shared Redis will overwrite each other's data — a subtle bug that's extremely hard to diagnose.

4. Two-Tier Cache Hierarchy: Caffeine + Redis

Spring's CompositeCacheManager doesn't implement a tiered hierarchy at all — it simply returns the cache from the first CacheManager that knows the cache name, so lookups never fall through from L1 to L2, and nothing writes back to L1 on an L2 hit. For a true L1 → L2 → database hierarchy, implement a custom Cache wrapper:

public class TwoLevelCache implements Cache {
    private final Cache l1;   // Caffeine
    private final Cache l2;   // Redis
    private final String name;

    public TwoLevelCache(String name, Cache l1, Cache l2) {
        this.name = name; this.l1 = l1; this.l2 = l2;
    }

    @Override
    public ValueWrapper get(Object key) {
        // L1 hit — fastest path
        ValueWrapper l1Value = l1.get(key);
        if (l1Value != null) return l1Value;

        // L2 hit — populate L1 for future requests on this instance
        ValueWrapper l2Value = l2.get(key);
        if (l2Value != null) {
            l1.put(key, l2Value.get());  // warm L1
            return l2Value;
        }

        return null;  // full miss — caller will load from database
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T get(Object key, Class<T> type) {
        ValueWrapper wrapper = get(key);
        return wrapper != null ? (T) wrapper.get() : null;
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T get(Object key, Callable<T> valueLoader) {
        ValueWrapper cached = get(key);
        if (cached != null) return (T) cached.get();
        try {
            T loaded = valueLoader.call();  // load-on-miss, then fill both tiers
            put(key, loaded);
            return loaded;
        } catch (Exception e) {
            throw new ValueRetrievalException(key, valueLoader, e);
        }
    }

    @Override
    public void put(Object key, Object value) {
        l1.put(key, value);
        l2.put(key, value);
    }

    @Override
    public void evict(Object key) {
        l1.evict(key);
        l2.evict(key);
    }

    @Override
    public void clear() {
        l1.clear();
        l2.clear();
    }

    @Override
    public String getName() { return name; }
    @Override
    public Object getNativeCache() { return this; }
}

With this implementation, a Redis outage degrades gracefully: L1 hits continue serving requests from in-process memory. Only new items or expired L1 entries need the database. This is exactly the resilience pattern that would have prevented the retail platform outage described in the introduction. For more resilience patterns, see Cache Stampede Prevention in High-Traffic Microservices.
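For @Cacheable to use the wrapper, the two delegate caches need to be assembled by a CacheManager. A minimal sketch (class name and wiring are illustrative, building on the TwoLevelCache above):

```java
import java.util.Collection;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;

// Glue code: resolve each cache name against both underlying managers
// and hand back the two-tier wrapper.
public class TwoLevelCacheManager implements CacheManager {
    private final CacheManager l1Manager;  // CaffeineCacheManager
    private final CacheManager l2Manager;  // RedisCacheManager

    public TwoLevelCacheManager(CacheManager l1Manager, CacheManager l2Manager) {
        this.l1Manager = l1Manager;
        this.l2Manager = l2Manager;
    }

    @Override
    public Cache getCache(String name) {
        Cache l1 = l1Manager.getCache(name);
        Cache l2 = l2Manager.getCache(name);
        if (l1 == null || l2 == null) {
            return l1 != null ? l1 : l2;  // degrade to whichever tier exists
        }
        return new TwoLevelCache(name, l1, l2);
    }

    @Override
    public Collection<String> getCacheNames() {
        return l1Manager.getCacheNames();
    }
}
```

Expose an instance of this as the @Primary CacheManager bean so @Cacheable resolves through both tiers.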

5. Cache-Aside vs Read-Through vs Write-Behind

Cache-Aside (lazy population) is what Spring's @Cacheable implements: check the cache, miss → load from database → store in cache → return. The application manages cache population explicitly. This is the most flexible pattern because the application controls the cache key and serialization.
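The same flow written out by hand makes the steps explicit. A minimal sketch, with a ConcurrentHashMap standing in for Caffeine/Redis and a generic loader standing in for the repository:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside: the application, not the cache, owns the miss-handling logic.
public class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;  // e.g. repository::findById

    public CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        V cached = cache.get(key);          // 1. check the cache
        if (cached != null) return cached;  // 2. hit: return immediately
        V loaded = loader.apply(key);       // 3. miss: load from the source
        cache.put(key, loaded);             // 4. populate the cache
        return loaded;                      // 5. return to the caller
    }

    public void invalidate(K key) {         // explicit eviction on writes
        cache.remove(key);
    }
}
```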

Read-Through places the cache between the application and the database — the cache itself loads from the database on miss. Caffeine's LoadingCache is the Java implementation:

LoadingCache<ProductId, Product> productCache = Caffeine.newBuilder()
    .maximumSize(5000)
    .expireAfterWrite(Duration.ofMinutes(5))
    .build(productId -> productRepository.findById(productId)
        .orElseThrow(() -> new ProductNotFoundException(productId)));

// Usage — always hits the cache; cache loads from DB if needed
Product product = productCache.get(productId);

Write-Behind (async write) lets the cache absorb writes and flush them to the database asynchronously, dramatically reducing write latency for high-write workloads. Appropriate for counters, metrics, and non-critical state. Use with caution — a JVM crash before flush means data loss.
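A stripped-down sketch of the write-behind idea, with a plain map standing in for the database (in a real service a ScheduledExecutorService would call flush() on an interval):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Write-behind: writes land in memory immediately and are flushed to the
// backing store in batches. A JVM crash between write and flush loses the
// buffered entries — hence "use with caution".
public class WriteBehindCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<K> dirty = new ConcurrentLinkedQueue<>();
    private final Map<K, V> backingStore;  // stands in for the database

    public WriteBehindCache(Map<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    public void put(K key, V value) {  // fast path: memory only
        cache.put(key, value);
        dirty.add(key);
    }

    public V get(K key) { return cache.get(key); }

    public void flush() {  // called by a background scheduler in practice
        K key;
        while ((key = dirty.poll()) != null) {
            backingStore.put(key, cache.get(key));
        }
    }
}
```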

6. Cache Eviction Strategies: TTL, LRU, Manual Invalidation

Choosing the right TTL is more art than science and requires understanding your data's change frequency. A product's price might change every few minutes during flash sales; a shipping zone configuration might change monthly. Use different TTLs per cache name rather than a global default, as the per-cache entryTtl overrides in the RedisCacheConfig above demonstrate.
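On the Caffeine side, per-cache policies can be registered with registerCustomCache, mirroring the Redis per-cache entryTtl map from section 3. A sketch with illustrative cache names:

```java
@Bean
public CacheManager perCacheTtlCaffeineManager() {
    CaffeineCacheManager manager = new CaffeineCacheManager();
    // Fast-changing data: short TTL, larger capacity
    manager.registerCustomCache("pricing", Caffeine.newBuilder()
        .maximumSize(5_000)
        .expireAfterWrite(Duration.ofSeconds(30))
        .build());
    // Rarely-changing configuration: long TTL, small capacity
    manager.registerCustomCache("shipping-zones", Caffeine.newBuilder()
        .maximumSize(500)
        .expireAfterWrite(Duration.ofHours(6))
        .build());
    return manager;
}
```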

For event-driven invalidation, use @CacheEvict combined with a Redis Pub/Sub channel that broadcasts invalidation events to all instances, causing their L1 caches to evict stale entries:

@CacheEvict(value = "products", key = "#product.id")
public Product updateProduct(Product product) {
    Product saved = productRepository.save(product);
    // Publish eviction event so other instances evict their L1
    redisTemplate.convertAndSend("cache.evict.products", product.getId().toString());
    return saved;
}
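The publishing half above needs a matching subscriber on every instance. A sketch using Spring Data Redis's RedisMessageListenerContainer (bean names are illustrative; note the published string must match the type of the key under which L1 stored the entry):

```java
import java.nio.charset.StandardCharsets;

import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.listener.ChannelTopic;
import org.springframework.data.redis.listener.RedisMessageListenerContainer;

@Configuration
public class CacheEvictionSubscriberConfig {

    @Bean
    public RedisMessageListenerContainer evictionSubscriber(
            RedisConnectionFactory connectionFactory,
            CacheManager cacheManager) {
        RedisMessageListenerContainer container = new RedisMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        // Every instance receives the broadcast and drops its local L1 entry.
        container.addMessageListener((message, pattern) -> {
            String productId = new String(message.getBody(), StandardCharsets.UTF_8);
            Cache products = cacheManager.getCache("products");
            if (products != null) {
                products.evict(productId);
            }
        }, new ChannelTopic("cache.evict.products"));
        return container;
    }
}
```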

7. Cache Stampede Prevention with Locks and Jitter

Two techniques prevent stampedes when a popular cache entry expires. Mutex locking: only one thread loads from the database; all others wait on the lock and then read the freshly-populated cache entry. Caffeine's LoadingCache implements this automatically. For Redis, use a distributed lock:

public Product getProduct(ProductId id) {
    String cacheKey = "product:" + id;
    Product cached = (Product) redisTemplate.opsForValue().get(cacheKey);
    if (cached != null) return cached;

    // Distributed lock — only one instance loads from DB
    String lockKey = "lock:" + cacheKey;
    Boolean locked = redisTemplate.opsForValue()
        .setIfAbsent(lockKey, "1", Duration.ofSeconds(5));

    if (Boolean.TRUE.equals(locked)) {
        try {
            Product product = productRepository.findById(id).orElseThrow();
            // Jitter: randomize TTL ±10% to stagger expirations
            long ttl = 300 + ThreadLocalRandom.current().nextInt(-30, 31);
            redisTemplate.opsForValue().set(cacheKey, product, Duration.ofSeconds(ttl));
            return product;
        } finally {
            redisTemplate.delete(lockKey);
        }
    }
    // Another instance is loading — brief wait then retry
    try {
        Thread.sleep(50);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for cache fill", e);
    }
    return getProduct(id);
}

8. Production Pitfalls: Serialization, Stale Data, Cache Poisoning

Serialization mismatch: When you add a new field to a cached class and deploy, existing Redis entries serialized without that field will either throw deserialization errors or silently return null for the new field. Mitigation: use versioned cache key prefixes (v1:, v2:) and a graceful miss fallback. Never deserialize without a try/catch that evicts on failure.
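The "evict on failure" rule can be captured in a small reusable wrapper. A sketch using plain functional interfaces (the class name is hypothetical; the supplier would wrap a Redis read that may throw, e.g. Spring Data Redis's SerializationException):

```java
import java.util.function.Supplier;

// Defensive read: any deserialization failure is treated as a cache miss,
// and the corrupt entry is evicted so it cannot fail again on the next read.
public final class SafeCacheReader {

    private SafeCacheReader() {}

    public static <T> T readOrEvict(Supplier<T> deserializingRead, Runnable evict) {
        try {
            return deserializingRead.get();
        } catch (RuntimeException e) {
            evict.run();   // remove the incompatible entry
            return null;   // caller falls back to the database
        }
    }
}
```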

Stale-while-revalidate pattern: Instead of evicting and causing a miss, some systems serve the stale entry while a background thread refreshes it. This trades consistency for availability — acceptable for product listings, dangerous for inventory or pricing data.
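For the L1 tier, Caffeine ships this behavior as refreshAfterWrite: after the refresh interval, the first read returns the stale value and triggers an asynchronous reload. A sketch (the repository method is illustrative):

```java
LoadingCache<ProductId, Product> listings = Caffeine.newBuilder()
    .maximumSize(10_000)
    .refreshAfterWrite(Duration.ofMinutes(1))   // serve stale, reload in background
    .expireAfterWrite(Duration.ofMinutes(10))   // hard upper bound on staleness
    .build(productRepository::loadListing);
```

Combining both settings gives availability within the refresh window while still capping how stale an entry can ever get.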

Cache poisoning: If user-controlled input can influence cache keys and you don't sanitize it, an attacker can store arbitrary data under crafted keys, polluting the cache for legitimate users. Always normalize and validate cache key components before use.
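Normalization is most robust as a strict allow-list applied before the key is built. A minimal sketch (class name and key format are illustrative):

```java
import java.util.regex.Pattern;

// Normalize user-supplied key components before they reach the cache:
// reject anything outside a strict allow-list so crafted input cannot
// collide with other keys or abuse delimiter characters.
public final class CacheKeys {

    private static final Pattern SAFE = Pattern.compile("[a-z0-9-]{1,64}");

    private CacheKeys() {}

    public static String productKey(String rawId) {
        String normalized = rawId.trim().toLowerCase();
        if (!SAFE.matcher(normalized).matches()) {
            throw new IllegalArgumentException("Invalid cache key component");
        }
        return "product:" + normalized;
    }
}
```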

9. Key Takeaways

  - A single cache tier is a single point of failure: pair an in-process L1 (Caffeine) with a distributed L2 (Redis) so a Redis outage degrades gracefully instead of cascading to the database.
  - Prevent stampedes with per-key locking (sync = true locally, a distributed lock across instances) and TTL jitter to stagger expirations.
  - Set TTLs per cache name based on how often the data changes; never rely on one global default.
  - Namespace and version cache key prefixes to avoid cross-service collisions and serialization mismatches after deploys.
  - Broadcast invalidation events over Redis Pub/Sub so every instance evicts stale L1 entries.

Explore more Spring Boot production engineering articles at:

mdsanwarhossain.me

Related Posts

  - Cache Stampede Prevention (Microservices): Preventing thundering herd problems in high-traffic microservices.
  - Spring Boot Performance Tuning (Core Java): End-to-end Spring Boot performance optimization strategies for production.
  - Rate Limiting & Caching (System Design): Protecting services with rate limiting and caching at scale.

Last updated: March 2026 — Written by Md Sanwar Hossain