Spring Boot Multi-Level Caching with Caffeine and Redis
Md Sanwar Hossain - Software Engineer

Software Engineer · Java · Spring Boot · Microservices

Core Java · March 22, 2026 · 15 min read · Spring Boot Production Engineering Series

Spring Boot Multi-Level Caching: Caffeine L1 + Redis L2 for High-Throughput APIs

A product catalog API at a major retail platform was serving 50,000 requests per second during peak flash sales. The database was holding up fine — until it wasn't. A minor Redis hiccup caused 8,000 requests per second to fall through to PostgreSQL simultaneously, overwhelmed the connection pool, and cascaded into a full service outage. The root cause: a single-layer cache strategy with no local fallback. A two-tier caching architecture with Caffeine as L1 and Redis as L2 would have absorbed the Redis failure gracefully. This post shows you exactly how to build it.

Table of Contents

  1. The Cache Miss Cascade: When @Cacheable Fails at Scale
  2. L1 Caffeine Cache: In-Process Caching
  3. L2 Redis Cache: Distributed Caching Configuration
  4. Two-Tier Cache Hierarchy: Caffeine + Redis
  5. Cache-Aside vs Read-Through vs Write-Behind
  6. Cache Eviction Strategies: TTL, LRU, Manual Invalidation
  7. Cache Stampede Prevention with Locks and Jitter
  8. Production Pitfalls: Serialization, Stale Data, Cache Poisoning
  9. Key Takeaways

1. The Cache Miss Cascade: When @Cacheable Fails at Scale

Spring's @Cacheable annotation is deceptively simple — add it to a method, point it at a cache name, done. For low-traffic services or development, this works beautifully. Under production load, naive usage leads to the thundering herd problem: when a cached item expires, all concurrent requests that need it hit the database simultaneously, generating a spike far beyond what the database would see without any cache at all.
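Spring ships a first line of defense for the in-process case: `sync = true` on `@Cacheable` collapses concurrent misses for the same key into a single load. A minimal sketch (the service and repository names are illustrative, not from a real codebase):

```java
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

    private final ProductRepository productRepository;

    public ProductService(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }

    // sync = true: on a miss, only one thread per JVM computes the value
    // for a given key; other callers block briefly and reuse the result.
    // This tames the herd per instance, but not across instances; the
    // distributed lock in section 7 covers the cross-instance case.
    @Cacheable(value = "products", key = "#id", sync = true)
    public Product findProduct(long id) {
        return productRepository.findById(id).orElseThrow();
    }
}
```

Note that `sync = true` is a hint that requires provider support; Caffeine's Spring adapter honors it.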

The problem compounds with Redis as a single cache tier. Redis is fast (sub-millisecond), but it's still a network hop. Under extreme load — 50k+ RPS — even a 0.5ms Redis read adds up. And when Redis becomes temporarily unavailable or experiences high latency, all traffic that can't hit Redis has nowhere to go but the origin database.

Real scenario: A product detail page was cached in Redis with a 60-second TTL. The key expired at 12:00:00 exactly. 3,400 concurrent users were viewing that product. All 3,400 requests got a Redis miss simultaneously, launched 3,400 database queries in the same millisecond window, exhausted the 50-connection HikariCP pool in under 100ms, and triggered a cascading timeout that took 45 seconds to recover from.

2. L1 Caffeine Cache: In-Process Caching

Caffeine is a high-performance, near-optimal caching library for Java. It uses the W-TinyLFU algorithm — a combination of a frequency sketch and an LRU window — which delivers significantly higher hit rates than simple LRU for typical access patterns. Caffeine caches live in the JVM heap, so reads are pure in-memory operations — nanoseconds, not microseconds.

# application.yml — Caffeine as Spring Cache provider
spring:
  cache:
    type: caffeine
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=30s,recordStats

# build.gradle
implementation 'com.github.ben-manes.caffeine:caffeine:3.1.8'

// CaffeineCacheConfig.java — register Caffeine as the Spring CacheManager
@Configuration
@EnableCaching
public class CaffeineCacheConfig {

    @Bean
    public CacheManager caffeineCacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager();
        manager.setCaffeine(Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(30))
            .recordStats());  // enables hit/miss ratio metrics
        return manager;
    }
}

The recordStats() call is important — it enables Micrometer metrics integration so you can track cache hit ratios in Grafana and get alerted when L1 hit rate drops below a threshold (indicating sizing or TTL misconfiguration).
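One way to wire those stats into Micrometer is the `CaffeineCacheMetrics` binder. A hedged sketch (the cache name and wiring are illustrative; this is for caches built outside Spring's cache abstraction):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.cache.CaffeineCacheMetrics;

import java.time.Duration;

public class CacheMetricsWiring {

    // Binds hit/miss counts, size, and evictions to the registry so they
    // can be graphed in Grafana and alerted on when the hit ratio drops.
    public static Cache<String, Object> monitoredCache(MeterRegistry registry) {
        Cache<String, Object> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(30))
            .recordStats()  // required; the binder reads cache.stats()
            .build();
        return CaffeineCacheMetrics.monitor(registry, cache, "products");
    }
}
```

When Spring Boot Actuator is on the classpath, caches created through a `CacheManager` are bound automatically; manual binding like this is only needed for standalone Caffeine caches.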

3. L2 Redis Cache: Distributed Caching Configuration

Redis as L2 provides a shared cache across all service instances — critical in horizontally scaled deployments where different pods serve different users. Without L2, instance A might cache a product while instance B fetches it from the database again.

@Configuration
public class RedisCacheConfig {

    @Bean
    public RedisCacheManager redisCacheManager(RedisConnectionFactory connectionFactory) {
        // Configure per-cache TTLs using entryTtl override
        Map<String, RedisCacheConfiguration> configs = Map.of(
            "products",       defaultConfig().entryTtl(Duration.ofMinutes(10)),
            "user-sessions",  defaultConfig().entryTtl(Duration.ofMinutes(30)),
            "pricing",        defaultConfig().entryTtl(Duration.ofSeconds(60))
        );

        return RedisCacheManager.builder(connectionFactory)
            .cacheDefaults(defaultConfig())
            .withInitialCacheConfigurations(configs)
            .build();
    }

    private RedisCacheConfiguration defaultConfig() {
        return RedisCacheConfiguration.defaultCacheConfig()
            .serializeKeysWith(RedisSerializationContext.SerializationPair
                .fromSerializer(new StringRedisSerializer()))
            .serializeValuesWith(RedisSerializationContext.SerializationPair
                .fromSerializer(new GenericJackson2JsonRedisSerializer()))
            .disableCachingNullValues()
            .prefixCacheNameWith("myapp:v1:");  // namespace prevents key collisions across services
    }
}

Key: Always prefix cache names with a service name and version. Without prefixes, two microservices caching under the same key name ("products") in shared Redis will overwrite each other's data — a subtle bug that's extremely hard to diagnose.

4. Two-Tier Cache Hierarchy: Caffeine + Redis

Spring's CompositeCacheManager doesn't implement a tiered hierarchy at all — it simply returns the cache from the first CacheManager that knows the cache name, so lookups never fall through from L1 to L2, and nothing writes back to L1 on an L2 hit. For a true L1 → L2 → database hierarchy, implement a custom Cache wrapper:

public class TwoLevelCache implements Cache {
    private final Cache l1;   // Caffeine
    private final Cache l2;   // Redis
    private final String name;

    public TwoLevelCache(String name, Cache l1, Cache l2) {
        this.name = name; this.l1 = l1; this.l2 = l2;
    }

    @Override
    public ValueWrapper get(Object key) {
        // L1 hit — fastest path
        ValueWrapper l1Value = l1.get(key);
        if (l1Value != null) return l1Value;

        // L2 hit — populate L1 for future requests on this instance
        ValueWrapper l2Value = l2.get(key);
        if (l2Value != null) {
            l1.put(key, l2Value.get());  // warm L1
            return l2Value;
        }

        return null;  // full miss — caller will load from database
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T get(Object key, Class<T> type) {
        ValueWrapper wrapper = get(key);
        return wrapper != null ? (T) wrapper.get() : null;
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T get(Object key, Callable<T> valueLoader) {
        ValueWrapper cached = get(key);
        if (cached != null) return (T) cached.get();
        try {
            T loaded = valueLoader.call();  // load-on-miss, then fill both tiers
            put(key, loaded);
            return loaded;
        } catch (Exception e) {
            throw new ValueRetrievalException(key, valueLoader, e);
        }
    }

    @Override
    public void put(Object key, Object value) {
        l1.put(key, value);
        l2.put(key, value);
    }

    @Override
    public void evict(Object key) {
        l1.evict(key);
        l2.evict(key);
    }

    @Override
    public void clear() {
        l1.clear();
        l2.clear();
    }

    @Override
    public String getName() { return name; }
    @Override
    public Object getNativeCache() { return this; }
}

With this implementation, a Redis outage degrades gracefully: L1 hits continue serving requests from in-process memory. Only new items or expired L1 entries need the database. This is exactly the resilience pattern that would have prevented the retail platform outage described in the introduction. For more resilience patterns, see Cache Stampede Prevention in High-Traffic Microservices.
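For @Cacheable to use the wrapper, the two delegate caches need to be assembled by a CacheManager. A minimal sketch (class name and wiring are illustrative, building on the TwoLevelCache above):

```java
import java.util.Collection;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;

// Glue code: resolve each cache name against both underlying managers
// and hand back the two-tier wrapper.
public class TwoLevelCacheManager implements CacheManager {
    private final CacheManager l1Manager;  // CaffeineCacheManager
    private final CacheManager l2Manager;  // RedisCacheManager

    public TwoLevelCacheManager(CacheManager l1Manager, CacheManager l2Manager) {
        this.l1Manager = l1Manager;
        this.l2Manager = l2Manager;
    }

    @Override
    public Cache getCache(String name) {
        Cache l1 = l1Manager.getCache(name);
        Cache l2 = l2Manager.getCache(name);
        if (l1 == null || l2 == null) {
            return l1 != null ? l1 : l2;  // degrade to whichever tier exists
        }
        return new TwoLevelCache(name, l1, l2);
    }

    @Override
    public Collection<String> getCacheNames() {
        return l1Manager.getCacheNames();
    }
}
```

Expose an instance of this as the @Primary CacheManager bean so @Cacheable resolves through both tiers.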

5. Cache-Aside vs Read-Through vs Write-Behind

Cache-Aside (lazy population) is what Spring's @Cacheable implements: check the cache, miss → load from database → store in cache → return. The application manages cache population explicitly. This is the most flexible pattern because the application controls the cache key and serialization.
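The same flow written out by hand makes the steps explicit. A minimal sketch, with a ConcurrentHashMap standing in for Caffeine/Redis and a generic loader standing in for the repository:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside: the application, not the cache, owns the miss-handling logic.
public class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;  // e.g. repository::findById

    public CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        V cached = cache.get(key);          // 1. check the cache
        if (cached != null) return cached;  // 2. hit: return immediately
        V loaded = loader.apply(key);       // 3. miss: load from the source
        cache.put(key, loaded);             // 4. populate the cache
        return loaded;                      // 5. return to the caller
    }

    public void invalidate(K key) {         // explicit eviction on writes
        cache.remove(key);
    }
}
```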

Read-Through places the cache between the application and the database — the cache itself loads from the database on miss. Caffeine's LoadingCache is the Java implementation:

LoadingCache<ProductId, Product> productCache = Caffeine.newBuilder()
    .maximumSize(5000)
    .expireAfterWrite(Duration.ofMinutes(5))
    .build(productId -> productRepository.findById(productId)
        .orElseThrow(() -> new ProductNotFoundException(productId)));

// Usage — always hits the cache; cache loads from DB if needed
Product product = productCache.get(productId);

Write-Behind (async write) lets the cache absorb writes and flush them to the database asynchronously, dramatically reducing write latency for high-write workloads. Appropriate for counters, metrics, and non-critical state. Use with caution — a JVM crash before flush means data loss.
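A stripped-down sketch of the write-behind idea, with a plain map standing in for the database (in a real service a ScheduledExecutorService would call flush() on an interval):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Write-behind: writes land in memory immediately and are flushed to the
// backing store in batches. A JVM crash between write and flush loses the
// buffered entries — hence "use with caution".
public class WriteBehindCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<K> dirty = new ConcurrentLinkedQueue<>();
    private final Map<K, V> backingStore;  // stands in for the database

    public WriteBehindCache(Map<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    public void put(K key, V value) {  // fast path: memory only
        cache.put(key, value);
        dirty.add(key);
    }

    public V get(K key) { return cache.get(key); }

    public void flush() {  // called by a background scheduler in practice
        K key;
        while ((key = dirty.poll()) != null) {
            backingStore.put(key, cache.get(key));
        }
    }
}
```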

6. Cache Eviction Strategies: TTL, LRU, Manual Invalidation

Choosing the right TTL is more art than science and requires understanding your data's change frequency. A product's price might change every few minutes during flash sales; a shipping zone configuration might change monthly. Use different TTLs per cache name rather than a global default, as the per-cache entryTtl overrides in the RedisCacheConfig above demonstrate.
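On the Caffeine side, per-cache policies can be registered with registerCustomCache, mirroring the Redis per-cache entryTtl map from section 3. A sketch with illustrative cache names:

```java
@Bean
public CacheManager perCacheTtlCaffeineManager() {
    CaffeineCacheManager manager = new CaffeineCacheManager();
    // Fast-changing data: short TTL, larger capacity
    manager.registerCustomCache("pricing", Caffeine.newBuilder()
        .maximumSize(5_000)
        .expireAfterWrite(Duration.ofSeconds(30))
        .build());
    // Rarely-changing configuration: long TTL, small capacity
    manager.registerCustomCache("shipping-zones", Caffeine.newBuilder()
        .maximumSize(500)
        .expireAfterWrite(Duration.ofHours(6))
        .build());
    return manager;
}
```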

For event-driven invalidation, use @CacheEvict combined with a Redis Pub/Sub channel that broadcasts invalidation events to all instances, causing their L1 caches to evict stale entries:

@CacheEvict(value = "products", key = "#product.id")
public Product updateProduct(Product product) {
    Product saved = productRepository.save(product);
    // Publish eviction event so other instances evict their L1
    redisTemplate.convertAndSend("cache.evict.products", product.getId().toString());
    return saved;
}
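The publishing half above needs a matching subscriber on every instance. A sketch using Spring Data Redis's RedisMessageListenerContainer (bean names are illustrative; note the published string must match the type of the key under which L1 stored the entry):

```java
import java.nio.charset.StandardCharsets;

import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.listener.ChannelTopic;
import org.springframework.data.redis.listener.RedisMessageListenerContainer;

@Configuration
public class CacheEvictionSubscriberConfig {

    @Bean
    public RedisMessageListenerContainer evictionSubscriber(
            RedisConnectionFactory connectionFactory,
            CacheManager cacheManager) {
        RedisMessageListenerContainer container = new RedisMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        // Every instance receives the broadcast and drops its local L1 entry.
        container.addMessageListener((message, pattern) -> {
            String productId = new String(message.getBody(), StandardCharsets.UTF_8);
            Cache products = cacheManager.getCache("products");
            if (products != null) {
                products.evict(productId);
            }
        }, new ChannelTopic("cache.evict.products"));
        return container;
    }
}
```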

7. Cache Stampede Prevention with Locks and Jitter

Two techniques prevent stampedes when a popular cache entry expires. Mutex locking: only one thread loads from the database; all others wait on the lock and then read the freshly-populated cache entry. Caffeine's LoadingCache implements this automatically. For Redis, use a distributed lock:

public Product getProduct(ProductId id) {
    String cacheKey = "product:" + id;
    Product cached = (Product) redisTemplate.opsForValue().get(cacheKey);
    if (cached != null) return cached;

    // Distributed lock — only one instance loads from DB
    String lockKey = "lock:" + cacheKey;
    Boolean locked = redisTemplate.opsForValue()
        .setIfAbsent(lockKey, "1", Duration.ofSeconds(5));

    if (Boolean.TRUE.equals(locked)) {
        try {
            Product product = productRepository.findById(id).orElseThrow();
            // Jitter: randomize TTL ±10% to stagger expirations
            long ttl = 300 + ThreadLocalRandom.current().nextInt(-30, 31);
            redisTemplate.opsForValue().set(cacheKey, product, Duration.ofSeconds(ttl));
            return product;
        } finally {
            redisTemplate.delete(lockKey);
        }
    }
    // Another instance is loading — brief wait then retry
    try {
        Thread.sleep(50);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for cache fill", e);
    }
    return getProduct(id);
}

8. Production Pitfalls: Serialization, Stale Data, Cache Poisoning

Serialization mismatch: When you add a new field to a cached class and deploy, existing Redis entries serialized without that field will either throw deserialization errors or silently return null for the new field. Mitigation: use versioned cache key prefixes (v1:, v2:) and a graceful miss fallback. Never deserialize without a try/catch that evicts on failure.
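The "evict on failure" rule can be captured in a small reusable wrapper. A sketch using plain functional interfaces (the class name is hypothetical; the supplier would wrap a Redis read that may throw, e.g. Spring Data Redis's SerializationException):

```java
import java.util.function.Supplier;

// Defensive read: any deserialization failure is treated as a cache miss,
// and the corrupt entry is evicted so it cannot fail again on the next read.
public final class SafeCacheReader {

    private SafeCacheReader() {}

    public static <T> T readOrEvict(Supplier<T> deserializingRead, Runnable evict) {
        try {
            return deserializingRead.get();
        } catch (RuntimeException e) {
            evict.run();   // remove the incompatible entry
            return null;   // caller falls back to the database
        }
    }
}
```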

Stale-while-revalidate pattern: Instead of evicting and causing a miss, some systems serve the stale entry while a background thread refreshes it. This trades consistency for availability — acceptable for product listings, dangerous for inventory or pricing data.
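For the L1 tier, Caffeine ships this behavior as refreshAfterWrite: after the refresh interval, the first read returns the stale value and triggers an asynchronous reload. A sketch (the repository method is illustrative):

```java
LoadingCache<ProductId, Product> listings = Caffeine.newBuilder()
    .maximumSize(10_000)
    .refreshAfterWrite(Duration.ofMinutes(1))   // serve stale, reload in background
    .expireAfterWrite(Duration.ofMinutes(10))   // hard upper bound on staleness
    .build(productRepository::loadListing);
```

Combining both settings gives availability within the refresh window while still capping how stale an entry can ever get.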

Cache poisoning: If user-controlled input can influence cache keys and you don't sanitize it, an attacker can store arbitrary data under crafted keys, polluting the cache for legitimate users. Always normalize and validate cache key components before use.
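Normalization is most robust as a strict allow-list applied before the key is built. A minimal sketch (class name and key format are illustrative):

```java
import java.util.regex.Pattern;

// Normalize user-supplied key components before they reach the cache:
// reject anything outside a strict allow-list so crafted input cannot
// collide with other keys or abuse delimiter characters.
public final class CacheKeys {

    private static final Pattern SAFE = Pattern.compile("[a-z0-9-]{1,64}");

    private CacheKeys() {}

    public static String productKey(String rawId) {
        String normalized = rawId.trim().toLowerCase();
        if (!SAFE.matcher(normalized).matches()) {
            throw new IllegalArgumentException("Invalid cache key component");
        }
        return "product:" + normalized;
    }
}
```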

9. Key Takeaways

  - A single cache tier is a single point of failure: pair an in-process L1 (Caffeine) with a distributed L2 (Redis) so a Redis outage degrades gracefully instead of cascading to the database.
  - Prevent stampedes with per-key locking (sync = true locally, a distributed lock across instances) and TTL jitter to stagger expirations.
  - Set TTLs per cache name based on how often the data changes; never rely on one global default.
  - Namespace and version cache key prefixes to avoid cross-service collisions and serialization mismatches after deploys.
  - Broadcast invalidation events over Redis Pub/Sub so every instance evicts stale L1 entries.

Explore more Spring Boot production engineering articles at:

mdsanwarhossain.me

Related Posts

  - Cache Stampede Prevention (Microservices): Preventing thundering herd problems in high-traffic microservices.
  - Spring Boot Performance Tuning (Core Java): End-to-end Spring Boot performance optimization strategies for production.
  - Rate Limiting & Caching (System Design): Protecting services with rate limiting and caching at scale.

Last updated: March 2026 — Written by Md Sanwar Hossain