Software Engineer · Java · Spring Boot · Microservices
Spring Boot Multi-Level Caching: Caffeine L1 + Redis L2 for High-Throughput APIs
A product catalog API at a major retail platform was serving 50,000 requests per second during peak flash sales. The database was holding up fine — until it wasn't. A minor Redis hiccup caused 8,000 requests per second to fall through to PostgreSQL simultaneously, overwhelmed the connection pool, and cascaded into a full service outage. The root cause: a single-layer cache strategy with no local fallback. A two-tier caching architecture with Caffeine as L1 and Redis as L2 would have absorbed the Redis failure gracefully. This post shows you exactly how to build it.
Table of Contents
- The Cache Miss Cascade: When @Cacheable Fails at Scale
- L1 Caffeine Cache: In-Process Caching
- L2 Redis Cache: Distributed Caching Configuration
- Two-Tier Cache Hierarchy: Caffeine + Redis
- Cache-Aside vs Read-Through vs Write-Behind
- Cache Eviction Strategies: TTL, LRU, Manual Invalidation
- Cache Stampede Prevention with Locks and Jitter
- Production Pitfalls: Serialization, Stale Data, Cache Poisoning
- Key Takeaways
1. The Cache Miss Cascade: When @Cacheable Fails at Scale
Spring's @Cacheable annotation is deceptively simple — add it to a method, point it at a cache name, done. For low-traffic services or development, this works beautifully. Under production load, naive usage leads to the thundering herd problem: when a cached item expires, all concurrent requests that need it hit the database simultaneously, generating a spike far beyond what the database would see without any cache at all.
The problem compounds with Redis as a single cache tier. Redis is fast (sub-millisecond), but it's still a network hop. Under extreme load — 50k+ RPS — even a 0.5ms Redis read adds up. And when Redis becomes temporarily unavailable or experiences high latency, all traffic that can't hit Redis has nowhere to go but the origin database.
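As a minimal illustration (service, repository, and cache names are hypothetical), the naive annotation below has no stampede protection; Spring's `sync = true` attribute serializes concurrent loads per key, though only within a single JVM:

```java
@Service
public class ProductService {

    private final ProductRepository productRepository;

    public ProductService(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }

    // Naive: every concurrent miss for the same key hits the database.
    @Cacheable(cacheNames = "products", key = "#id")
    public Product findProduct(long id) {
        return productRepository.findById(id).orElseThrow();
    }

    // Better: sync = true lets only one thread per JVM compute the value;
    // other callers block until the entry is populated.
    @Cacheable(cacheNames = "products", key = "#id", sync = true)
    public Product findProductSynchronized(long id) {
        return productRepository.findById(id).orElseThrow();
    }
}
```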
2. L1 Caffeine Cache: In-Process Caching
Caffeine is a high-performance, near-optimal caching library for Java. It uses the W-TinyLFU algorithm — a combination of a frequency sketch and an LRU window — which delivers significantly higher hit rates than simple LRU for typical access patterns. Caffeine caches live in the JVM heap, so reads are pure in-memory operations — nanoseconds, not microseconds.
```yaml
# application.yml — Caffeine as Spring Cache provider
spring:
  cache:
    type: caffeine
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=30s,recordStats
```

```groovy
// build.gradle
implementation 'com.github.ben-manes.caffeine:caffeine:3.1.8'
```
```java
@Configuration
@EnableCaching
public class CaffeineCacheConfig {

    @Bean
    public CacheManager caffeineCacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager();
        // Note: when both expireAfterWrite and expireAfterAccess are set, the
        // shorter one always wins, so a 30s write TTL makes a longer access
        // TTL inert — configure only the one you actually want.
        manager.setCaffeine(Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(Duration.ofSeconds(30))
                .recordStats()); // enables hit/miss ratio metrics
        return manager;
    }
}
```
The recordStats() call is important — it enables Micrometer metrics integration so you can track cache hit ratios in Grafana and get alerted when L1 hit rate drops below a threshold (indicating sizing or TTL misconfiguration).
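One way to surface those stats (a sketch assuming Micrometer is on the classpath; Spring Boot's actuator can also auto-bind cache metrics) is to bind each native Caffeine cache to the meter registry:

```java
@Component
public class CaffeineMetricsBinder {

    public CaffeineMetricsBinder(CacheManager caffeineCacheManager, MeterRegistry registry) {
        // Exposes cache.gets{result=hit|miss}, cache.size, and eviction counters per cache.
        for (String name : caffeineCacheManager.getCacheNames()) {
            CaffeineCache cache = (CaffeineCache) caffeineCacheManager.getCache(name);
            CaffeineCacheMetrics.monitor(registry, cache.getNativeCache(), name);
        }
    }
}
```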
3. L2 Redis Cache: Distributed Caching Configuration
Redis as L2 provides a shared cache across all service instances — critical in horizontally scaled deployments where different pods serve different users. Without L2, instance A might cache a product while instance B fetches it from the database again.
```java
@Configuration
public class RedisCacheConfig {

    @Bean
    public RedisCacheManager redisCacheManager(RedisConnectionFactory connectionFactory) {
        // Configure per-cache TTLs using entryTtl override
        Map<String, RedisCacheConfiguration> configs = Map.of(
                "products", defaultConfig().entryTtl(Duration.ofMinutes(10)),
                "user-sessions", defaultConfig().entryTtl(Duration.ofMinutes(30)),
                "pricing", defaultConfig().entryTtl(Duration.ofSeconds(60))
        );
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(defaultConfig())
                .withInitialCacheConfigurations(configs)
                .build();
    }

    private RedisCacheConfiguration defaultConfig() {
        return RedisCacheConfiguration.defaultCacheConfig()
                .serializeKeysWith(RedisSerializationContext.SerializationPair
                        .fromSerializer(new StringRedisSerializer()))
                .serializeValuesWith(RedisSerializationContext.SerializationPair
                        .fromSerializer(new GenericJackson2JsonRedisSerializer()))
                .disableCachingNullValues()
                .prefixCacheNameWith("myapp:v1:"); // namespace prevents key collisions across services
    }
}
```
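With that configuration in place, a service method can target one of the named caches (the repository and entity here are hypothetical). With the `myapp:v1:` prefix above, the entry lands in Redis under a key like `myapp:v1:pricing::<sku>` and expires after 60 seconds:

```java
@Service
public class PricingService {

    private final PriceRepository priceRepository;

    public PricingService(PriceRepository priceRepository) {
        this.priceRepository = priceRepository;
    }

    // Misses fall through to the repository; hits are served from Redis.
    @Cacheable(cacheNames = "pricing", key = "#sku")
    public Price currentPrice(String sku) {
        return priceRepository.findBySku(sku);
    }
}
```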
4. Two-Tier Cache Hierarchy: Caffeine + Redis
Spring's CompositeCacheManager doesn't truly implement a tiered hierarchy — it returns the first cache that finds a value, but doesn't write through to L2 on L1 miss automatically. For a true L1 → L2 → database hierarchy, implement a custom Cache wrapper:
```java
public class TwoLevelCache implements Cache {

    private final Cache l1; // Caffeine
    private final Cache l2; // Redis
    private final String name;

    public TwoLevelCache(String name, Cache l1, Cache l2) {
        this.name = name;
        this.l1 = l1;
        this.l2 = l2;
    }

    @Override
    public ValueWrapper get(Object key) {
        // L1 hit — fastest path
        ValueWrapper l1Value = l1.get(key);
        if (l1Value != null) return l1Value;
        // L2 hit — populate L1 for future requests on this instance
        ValueWrapper l2Value = l2.get(key);
        if (l2Value != null) {
            l1.put(key, l2Value.get()); // warm L1
            return l2Value;
        }
        return null; // full miss — caller will load from database
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T get(Object key, Class<T> type) {
        ValueWrapper wrapper = get(key);
        return wrapper != null ? (T) wrapper.get() : null;
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T get(Object key, Callable<T> valueLoader) {
        ValueWrapper wrapper = get(key);
        if (wrapper != null) return (T) wrapper.get();
        try {
            T value = valueLoader.call();
            put(key, value); // populate both tiers on a full miss
            return value;
        } catch (Exception e) {
            throw new ValueRetrievalException(key, valueLoader, e);
        }
    }

    @Override
    public void put(Object key, Object value) {
        l1.put(key, value);
        l2.put(key, value);
    }

    @Override
    public void evict(Object key) {
        l1.evict(key);
        l2.evict(key);
    }

    @Override
    public void clear() {
        l1.clear();
        l2.clear();
    }

    @Override
    public String getName() { return name; }

    @Override
    public Object getNativeCache() { return this; }
}
```
With this implementation, a Redis outage degrades gracefully: L1 hits continue serving requests from in-process memory. Only new items or expired L1 entries need the database. This is exactly the resilience pattern that would have prevented the retail platform outage described in the introduction. For more resilience patterns, see Cache Stampede Prevention in High-Traffic Microservices.
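One way to wire TwoLevelCache into Spring (a sketch — the bean names and on-demand cache creation are assumptions) is a small CacheManager that composes the two underlying managers per cache name:

```java
@Configuration
@EnableCaching
public class TwoLevelCacheConfig {

    @Bean
    @Primary
    public CacheManager twoLevelCacheManager(CacheManager caffeineCacheManager,
                                             RedisCacheManager redisCacheManager) {
        return new CacheManager() {
            private final Map<String, Cache> caches = new ConcurrentHashMap<>();

            @Override
            public Cache getCache(String name) {
                // Lazily build a two-tier cache backed by the L1 and L2 managers.
                return caches.computeIfAbsent(name, n ->
                        new TwoLevelCache(n,
                                caffeineCacheManager.getCache(n),
                                redisCacheManager.getCache(n)));
            }

            @Override
            public Collection<String> getCacheNames() {
                return caches.keySet();
            }
        };
    }
}
```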
5. Cache-Aside vs Read-Through vs Write-Behind
Cache-Aside (lazy population) is what Spring's @Cacheable implements: check the cache, miss → load from database → store in cache → return. The application manages cache population explicitly. This is the most flexible pattern because the application controls the cache key and serialization.
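Written out without annotations (names hypothetical), cache-aside is just three explicit steps:

```java
public Product getProduct(long id) {
    Cache cache = cacheManager.getCache("products");

    // 1. Check the cache first.
    Product cached = cache.get(id, Product.class);
    if (cached != null) return cached;

    // 2. Miss: load from the database.
    Product product = productRepository.findById(id).orElseThrow();

    // 3. Populate the cache for subsequent callers.
    cache.put(id, product);
    return product;
}
```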
Read-Through places the cache between the application and the database — the cache itself loads from the database on miss. Caffeine's LoadingCache is the Java implementation:
```java
LoadingCache<ProductId, Product> productCache = Caffeine.newBuilder()
        .maximumSize(5000)
        .expireAfterWrite(Duration.ofMinutes(5))
        .build(productId -> productRepository.findById(productId)
                .orElseThrow(() -> new ProductNotFoundException(productId)));

// Usage — always hits the cache; cache loads from DB if needed
Product product = productCache.get(productId);
```
Write-Behind (async write) lets the cache absorb writes and flush them to the database asynchronously, dramatically reducing write latency for high-write workloads. Appropriate for counters, metrics, and non-critical state. Use with caution — a JVM crash before flush means data loss.
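A minimal write-behind sketch in plain Java (all names invented here): writes coalesce in an in-memory buffer, and a scheduler periodically flushes batches to a sink, which in production would be a batch database write.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class WriteBehindBuffer<K, V> {

    private final ConcurrentHashMap<K, V> pending = new ConcurrentHashMap<>();
    private final Consumer<Map<K, V>> sink; // e.g. a batch UPDATE against the database

    public WriteBehindBuffer(Consumer<Map<K, V>> sink,
                             ScheduledExecutorService scheduler,
                             Duration interval) {
        this.sink = sink;
        scheduler.scheduleAtFixedRate(this::flush,
                interval.toMillis(), interval.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Latest value wins; repeated writes to the same key coalesce into one flush.
    public void write(K key, V value) {
        pending.put(key, value);
    }

    // Drain the buffer and hand the batch to the sink. Anything buffered here is
    // lost on a JVM crash — the trade-off noted above.
    public void flush() {
        if (pending.isEmpty()) return;
        Map<K, V> batch = new HashMap<>(pending);
        batch.forEach(pending::remove); // removes only if unchanged, so newer writes survive
        sink.accept(batch);
    }
}
```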
6. Cache Eviction Strategies: TTL, LRU, Manual Invalidation
Choosing the right TTL is more art than science and requires understanding your data's change frequency. A product's price might change every few minutes during flash sales; a shipping zone configuration might change monthly. Use different TTLs per cache name rather than a global default:
- Volatile, high-change data (prices, inventory): TTL 30–60 seconds in both L1 and L2
- Semi-static reference data (product descriptions, categories): TTL 5–15 minutes in L2, 60 seconds in L1
- Static configuration (shipping zones, tax rules): TTL 1 hour in L2; refresh on deployment via manual eviction
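On the L1 side, the TTLs above can be applied per cache name via CaffeineCacheManager's registerCustomCache (a sketch; the cache names mirror the Redis configuration, the sizes are illustrative):

```java
@Bean
public CacheManager caffeineCacheManager() {
    CaffeineCacheManager manager = new CaffeineCacheManager();
    // Volatile data: short L1 TTL so instances converge on new prices quickly.
    manager.registerCustomCache("pricing", Caffeine.newBuilder()
            .maximumSize(5_000)
            .expireAfterWrite(Duration.ofSeconds(30))
            .build());
    // Semi-static data: L1 capped at 60s even though L2 holds it for minutes.
    manager.registerCustomCache("products", Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(60))
            .build());
    return manager;
}
```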
For event-driven invalidation, use @CacheEvict combined with a Redis Pub/Sub channel that broadcasts invalidation events to all instances, causing their L1 caches to evict stale entries:
```java
@CacheEvict(value = "products", key = "#product.id")
public Product updateProduct(Product product) {
    Product saved = productRepository.save(product);
    // Publish eviction event so other instances evict their L1
    redisTemplate.convertAndSend("cache.evict.products", product.getId().toString());
    return saved;
}
```
7. Cache Stampede Prevention with Locks and Jitter
Two techniques prevent stampedes when a popular cache entry expires. Mutex locking: only one thread loads from the database; all others wait on the lock and then read the freshly-populated cache entry. Caffeine's LoadingCache implements this automatically. For Redis, use a distributed lock:
```java
public Product getProduct(ProductId id) {
    String cacheKey = "product:" + id;
    Product cached = (Product) redisTemplate.opsForValue().get(cacheKey);
    if (cached != null) return cached;

    // Distributed lock — only one instance loads from DB
    String lockKey = "lock:" + cacheKey;
    Boolean locked = redisTemplate.opsForValue()
            .setIfAbsent(lockKey, "1", Duration.ofSeconds(5));
    if (Boolean.TRUE.equals(locked)) {
        try {
            Product product = productRepository.findById(id).orElseThrow();
            // Jitter: randomize TTL ±10% to stagger expirations (upper bound is exclusive)
            long ttl = 300 + ThreadLocalRandom.current().nextInt(-30, 31);
            redisTemplate.opsForValue().set(cacheKey, product, Duration.ofSeconds(ttl));
            return product;
        } finally {
            redisTemplate.delete(lockKey);
        }
    }

    // Another instance is loading — brief wait then retry
    try {
        Thread.sleep(50);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for cache fill", e);
    }
    return getProduct(id);
}
```
8. Production Pitfalls: Serialization, Stale Data, Cache Poisoning
Serialization mismatch: When you add a new field to a cached class and deploy, existing Redis entries serialized without that field will either throw deserialization errors or silently return null for the new field. Mitigation: use versioned cache key prefixes (v1:, v2:) and a graceful miss fallback. Never deserialize without a try/catch that evicts on failure.
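A defensive read along those lines might look like this (a sketch; SerializationException is Spring Data Redis's wrapper around Jackson failures, and the type names are illustrative):

```java
public Product getCached(String key) {
    try {
        return (Product) redisTemplate.opsForValue().get(key);
    } catch (SerializationException | ClassCastException e) {
        // Entry was written by an older class version — evict it and treat as a miss.
        redisTemplate.delete(key);
        return null;
    }
}
```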
Stale-while-revalidate pattern: Instead of evicting and causing a miss, some systems serve the stale entry while a background thread refreshes it. This trades consistency for availability — acceptable for product listings, dangerous for inventory or pricing data.
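Caffeine ships this pattern as refreshAfterWrite: once the refresh interval elapses, the stale value keeps being served while a single background load replaces it (the repository call is illustrative):

```java
LoadingCache<ProductId, Product> listings = Caffeine.newBuilder()
        .maximumSize(10_000)
        .refreshAfterWrite(Duration.ofMinutes(1))  // serve stale, reload asynchronously
        .expireAfterWrite(Duration.ofMinutes(10))  // hard ceiling on staleness
        .build(id -> productRepository.findById(id).orElseThrow());
```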
Cache poisoning: If user-controlled input can influence cache keys and you don't sanitize it, an attacker can store arbitrary data under crafted keys, polluting the cache for legitimate users. Always normalize and validate cache key components before use.
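A simple guard in plain Java (the allowed character set and length limit are assumptions for illustration) that normalizes and validates a user-supplied key component before it reaches the cache:

```java
import java.util.Locale;
import java.util.regex.Pattern;

public final class CacheKeys {

    // Only lowercase alphanumerics and hyphens survive; everything else is rejected.
    private static final Pattern SAFE = Pattern.compile("[a-z0-9-]{1,64}");

    private CacheKeys() {}

    public static String sanitize(String raw) {
        String normalized = raw.trim().toLowerCase(Locale.ROOT);
        if (!SAFE.matcher(normalized).matches()) {
            throw new IllegalArgumentException("Unsafe cache key component: " + raw);
        }
        return normalized;
    }
}
```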
9. Key Takeaways
- A single Redis cache tier has no graceful degradation — Caffeine L1 absorbs traffic during Redis outages
- Caffeine's W-TinyLFU delivers higher hit rates than simple LRU for real-world access patterns
- Always prefix Redis cache keys with service name + version to prevent cross-service collisions
- Use per-cache TTLs matching data volatility — not a single global TTL for all cache names
- Add ±10% TTL jitter to prevent synchronized stampedes when many items expire together
- Enable recordStats() on Caffeine and export metrics — unexplained latency spikes often trace back to cache hit-rate drops
- Version your cached classes — serialization mismatches after deployments are a common silent failure mode
Related Posts
Cache Stampede Prevention
Preventing thundering herd problems in high-traffic microservices.
Spring Boot Performance Tuning
End-to-end Spring Boot performance optimization strategies for production.
Rate Limiting & Caching
Protecting services with rate limiting and caching at scale.
Last updated: March 2026 — Written by Md Sanwar Hossain