Distributed Caching Patterns: Invalidation, Cache Stampedes & CQRS Integration
Caching is the oldest trick in computer science: remember expensive computations so you do not repeat them. But distributed caching, where many servers read and write a shared cache, introduces subtle problems that do not exist in single-machine systems. Cache invalidation, the famous joke says, is one of the two hardest problems in computer science. When you combine that with thousands of servers pounding a cache, cache stampedes appear: a single expired cache entry causes 1,000 simultaneous database hits. This article dissects these patterns, their failure modes, and the strategies production systems use to handle them.
Software Engineer · System Design · Performance
The Cache Stampede Incident
A video streaming platform cached popular movie metadata (title, duration, recommendations) with a 1-hour TTL. The entry was populated at 15:00 UTC and served millions of viewers from cache. At 16:00 UTC, it expired. In the next 100 milliseconds, 50,000 concurrent requests asked "what is the metadata for movie X?" All found the cache empty, and all queried the database simultaneously. The database, sized for a few hundred metadata queries per second, received 50,000 in 100ms and collapsed. The entire platform was down for the next 30 minutes. The fix was request coalescing: on a cache miss, the first request computes the value while subsequent requests wait for its result.
Cache Patterns: Aside, Through, Behind
Cache-Aside Pattern (Lazy Loading)
Application code is responsible for populating the cache. On a cache miss, the application loads from the source (the database) and writes the result to the cache; subsequent requests for the same key are served directly from the cache.
public Product getProduct(String id) {
// Check cache first
Product cached = cache.get("product:" + id);
if (cached != null) return cached;
// Cache miss: fetch from database
Product product = database.getProduct(id);
// Populate cache with 1-hour TTL
cache.set("product:" + id, product, Duration.ofHours(1));
return product;
}
Pros: Simple, works with any database, only caches accessed data.
Cons: Cache misses cause database hits, vulnerable to stampedes, requires cache invalidation logic in application code.
Write-Through Pattern
On writes, data is written to both the cache and the database simultaneously. Reads hit the cache (which is kept warm).
public void saveProduct(Product product) {
// Write the database first, then the cache, so the cache
// never holds data the database rejected
database.saveProduct(product);
cache.set("product:" + product.id, product);
}
Pros: Cache stays consistent with the database, reads are fast because the cache is kept warm, expiry-driven stampedes are rare.
Cons: Writes are slower (dual write), and the cache fills with data that may never be read (cache pollution).
Write-Behind Pattern (Write-Back)
Writes go to the cache immediately (fast) and are asynchronously flushed to the database later (eventual consistency).
public void saveProduct(Product product) {
// Write only to cache (instant)
cache.set("product:" + product.id, product);
// Async flush to database after 5 seconds or when batch reaches 100 items
flushService.enqueue(product);
}
Pros: Writes are extremely fast, and batching amortizes database load across many writes.
Cons: Data is lost if the cache crashes before flushing, eventual consistency can surface user-facing anomalies, and failure handling is complex.
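The `flushService` referenced above is assumed rather than shown. A minimal sketch of how such a batching flusher might drain pending writes, honoring the "batch of 100 or a 5-second wait" policy from the comment; the class and method names are illustrative, not a real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class FlushService {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public void enqueue(String product) {
        queue.add(product);
    }

    // Drain one batch: block up to 5 seconds for the first item,
    // then grab whatever else is already queued, capped at 100 items.
    public List<String> drainBatch() {
        List<String> batch = new ArrayList<>();
        String first;
        try {
            first = queue.poll(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return batch;
        }
        if (first == null) return batch; // nothing to flush this round
        batch.add(first);
        queue.drainTo(batch, 99); // up to 100 items total per batch
        return batch;
    }

    public static void main(String[] args) {
        FlushService svc = new FlushService();
        for (int i = 0; i < 150; i++) svc.enqueue("product-" + i);
        System.out.println(svc.drainBatch().size()); // 100
        System.out.println(svc.drainBatch().size()); // 50
    }
}
```

A background thread would loop on `drainBatch()` and issue one batched `INSERT`/`UPDATE` per batch; anything still queued at crash time is the data-loss window described above.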
Solving the Cache Stampede
Solution 1: Request Coalescing
When multiple threads detect a cache miss simultaneously, only one computes the value while others wait for it. This serializes the computation and prevents multiple database queries.
private final Map<String, CompletableFuture<Product>> inFlight = new ConcurrentHashMap<>();
public Product getProduct(String id) {
String cacheKey = "product:" + id;
// Check cache
Product cached = cache.get(cacheKey);
if (cached != null) return cached;
// Coalesce: only one thread computes
CompletableFuture<Product> future = inFlight.computeIfAbsent(cacheKey, key ->
CompletableFuture.supplyAsync(() -> {
Product product = database.getProduct(id);
cache.set(cacheKey, product, Duration.ofHours(1));
return product;
}));
try {
return future.join(); // Wait for the single in-flight computation
} finally {
inFlight.remove(cacheKey);
}
}
Solution 2: Probabilistic Early Expiration (Xfetch)
Instead of waiting for a fixed TTL to expire, start refreshing before expiry with some probability. If a key is accessed within 5 minutes of expiry and a random draw passes, refresh it from the database in the background. (The full XFetch algorithm scales this probability with recomputation cost and time remaining.) This spreads refreshes over time and prevents thundering herds.
public Product getProduct(String id) {
CachedValue cached = cache.get("product:" + id);
if (cached == null) {
return fetch(id); // Complete miss
}
long secondsUntilExpiry = cached.expiresAt - now(); // both in epoch seconds
// Proactively refresh if close to expiry and probability check passes
if (secondsUntilExpiry < 300 && Math.random() < 0.01) {
CompletableFuture.runAsync(() -> fetch(id)); // Non-blocking refresh
}
return cached.value;
}
Solution 3: Stale-While-Revalidate
Serve stale data from the cache while recomputing in the background. Users see data instantly; eventual consistency is tolerated for non-critical data.
public Product getProduct(String id) {
CachedValue cached = cache.get("product:" + id);
if (cached != null && !cached.isExpired()) {
return cached.value; // Fresh
}
if (cached != null && cached.isStale()) {
// Return stale but trigger refresh
CompletableFuture.runAsync(() -> refresh(id));
return cached.value; // Return stale immediately
}
// Complete miss: must compute
return fetch(id);
}
Cache Invalidation Strategies
Time-Based Expiry (TTL)
Entries expire after a fixed duration. Simple, but data can be stale for up to the full TTL window.
// Cache entries expire after 1 hour
cache.set("user:123", user, Duration.ofHours(1));
Event-Based Invalidation
When data changes (e.g., a user updates their profile), publish an event. Cache subscribers hear the event and invalidate the entry.
@Service
public class UserService {
public void updateUser(User user) {
database.saveUser(user);
// Publish invalidation event
eventBus.publish(new UserUpdatedEvent(user.id));
}
}
@Component
public class CacheInvalidator {
@EventListener
public void onUserUpdated(UserUpdatedEvent event) {
cache.delete("user:" + event.userId);
}
}
Pattern: Active-Active Invalidation
In a distributed system, different cache nodes may have the same entry. A single write should invalidate the entry on all nodes. This requires:
- A message bus (Kafka, RabbitMQ) that broadcasts invalidation messages
- All cache clients subscribe to invalidation topics
- Tolerance for eventual consistency (stale reads remain possible for a few milliseconds while invalidations propagate)
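The broadcast mechanics can be sketched with an in-process stand-in for the message bus; in production the `Bus` below would be a Kafka or RabbitMQ topic, and these class names are illustrative only:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class ActiveActiveInvalidation {
    // Stand-in for a broadcast topic: every cache node subscribes,
    // and a published key is delivered to all of them.
    static class Bus {
        private final List<CacheNode> subscribers = new CopyOnWriteArrayList<>();
        void subscribe(CacheNode node) { subscribers.add(node); }
        void publishInvalidation(String key) {
            for (CacheNode node : subscribers) node.invalidate(key);
        }
    }

    // A node's local cache; invalidation simply drops the local copy.
    static class CacheNode {
        final Map<String, String> local = new ConcurrentHashMap<>();
        void put(String k, String v) { local.put(k, v); }
        void invalidate(String k) { local.remove(k); }
    }

    public static void main(String[] args) {
        Bus bus = new Bus();
        CacheNode a = new CacheNode(), b = new CacheNode();
        bus.subscribe(a);
        bus.subscribe(b);
        a.put("user:1", "v1");
        b.put("user:1", "v1");
        bus.publishInvalidation("user:1"); // one write invalidates everywhere
        System.out.println(a.local.containsKey("user:1")); // false
        System.out.println(b.local.containsKey("user:1")); // false
    }
}
```

With a real bus the delivery is asynchronous, which is exactly where the few-milliseconds staleness window comes from.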
CQRS Integration: Separating Reads and Writes
Command Query Responsibility Segregation (CQRS) splits your data model into write and read models. The write model is optimized for transactions; the read model is optimized for caching and queries.
// Write model: normalized, transactional
@Entity
public class User {
@Id Long id;
String email;
String name;
}
// Read model: denormalized, cached
@Data
public class UserReadModel {
Long id;
String email;
String name;
List<Order> recentOrders; // Denormalized from Order table
int orderCount; // Precomputed aggregate
String membershipTier; // Denormalized
}
// Reads hit the read model (which is in Redis)
@Service
public class UserQuery {
public UserReadModel getUserProfile(Long id) {
return redisCache.get("user:profile:" + id, UserReadModel.class);
}
}
// Writes go to the write model; changes are published to update read models
@Service
public class UserCommand {
public void updateUserEmail(Long id, String email) {
// Load and update the transactional write model
User user = writeDatabase.findById(id);
user.setEmail(email);
writeDatabase.save(user);
// Publish a domain event
eventBus.publish(new UserEmailChangedEvent(id, email));
}
}
// Read model is kept in sync via event subscribers
@Component
public class UserReadModelUpdater {
@EventListener
public void onUserEmailChanged(UserEmailChangedEvent event) {
UserReadModel model = readModel.get(event.userId);
if (model == null) return; // read model not yet built for this user
model.setEmail(event.newEmail);
redisCache.set("user:profile:" + event.userId, model);
}
}
Consistency Models in Distributed Caches
Strong Consistency
All readers see the same value immediately after a write. Achieved via synchronous replication or write-through patterns. Slow but correct.
Eventual Consistency
After a write, readers may see stale data for a brief period. Data converges to the correct value after the update propagates. Fast but temporarily inconsistent. Acceptable for most non-critical data (user preferences, recommendations, product metadata).
Causal Consistency
Operations that are causally related appear in order. If A writes X and then B reads X and writes Y, all readers see X before Y. Complex to implement but valuable for data with dependencies.
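One narrow slice of causal consistency, the monotonic-reads session guarantee, can be sketched with version numbers: a client remembers the highest version it has seen and refuses an older value from a lagging replica. This is a toy simplification of full causal consistency, with all names hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CausalSession {
    record Versioned(String value, long version) {}

    private final Map<String, Versioned> replica = new ConcurrentHashMap<>();
    private long lastSeenVersion = 0; // highest version this session has read

    void write(String key, String value, long version) {
        replica.put(key, new Versioned(value, version));
    }

    // Returns null when the replica is behind what this session already saw,
    // forcing the caller to retry elsewhere rather than go back in time.
    String read(String key) {
        Versioned v = replica.get(key);
        if (v == null || v.version() < lastSeenVersion) return null;
        lastSeenVersion = Math.max(lastSeenVersion, v.version());
        return v.value();
    }

    public static void main(String[] args) {
        CausalSession session = new CausalSession();
        session.write("x", "v2", 2);
        System.out.println(session.read("x")); // v2
        session.write("x", "v1", 1);           // a stale replica regresses
        System.out.println(session.read("x")); // null: stale read refused
    }
}
```

Full causal consistency additionally tracks cross-key dependencies (typically via vector clocks), which is where the implementation complexity lives.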
TTL Optimization: Balancing Freshness and Load
A short TTL (5 minutes) means stale data is bounded but cache misses are frequent. A long TTL (24 hours) reduces misses but increases staleness. In production, TTL should vary by data type:
- User profiles: 1 hour (infrequently changed, staleness tolerated)
- Product inventory: 5 minutes (frequent changes, near-real-time required)
- Recommendations: 24 hours (change infrequently, staleness acceptable)
- Real-time feeds: 30 seconds (highly dynamic, freshness critical)
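The per-type policy above can live in a small lookup table so TTLs are tuned in one place; the type keys and fallback below are illustrative choices, not from the original:

```java
import java.time.Duration;
import java.util.Map;

public class TtlPolicy {
    // One TTL per data type, mirroring the list above
    private static final Map<String, Duration> TTLS = Map.of(
        "user-profile",    Duration.ofHours(1),
        "inventory",       Duration.ofMinutes(5),
        "recommendations", Duration.ofHours(24),
        "feed",            Duration.ofSeconds(30)
    );

    static Duration ttlFor(String dataType) {
        // Conservative short default for unknown types
        return TTLS.getOrDefault(dataType, Duration.ofMinutes(5));
    }

    public static void main(String[] args) {
        System.out.println(ttlFor("inventory").toMinutes());     // 5
        System.out.println(ttlFor("recommendations").toHours()); // 24
    }
}
```

Callers then write `cache.set(key, value, TtlPolicy.ttlFor("inventory"))` instead of scattering magic durations through the codebase.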
Monitoring Cache Health
Observe:
- Hit rate: % of requests served from cache. Should be >80% for well-tuned caches.
- Eviction rate: How often entries are evicted due to memory pressure. High eviction means cache is too small.
- Staleness: Age of cached data relative to TTL. Should be skewed toward fresh (median age < 25% of TTL).
- Stampede incidents: Count of cache misses for the same key in rapid succession. Should be <5 per minute.
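The hit-rate metric above reduces to two counters and a ratio; a minimal sketch, assuming these counters would feed a metrics system such as Micrometer or Prometheus rather than be read directly:

```java
import java.util.concurrent.atomic.LongAdder;

public class CacheMetrics {
    // LongAdder scales better than AtomicLong under heavy contention
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit()  { hits.increment(); }
    void recordMiss() { misses.increment(); }

    // Fraction of requests served from cache; 0.0 before any traffic
    double hitRate() {
        long h = hits.sum(), m = misses.sum();
        long total = h + m;
        return total == 0 ? 0.0 : (double) h / total;
    }

    public static void main(String[] args) {
        CacheMetrics metrics = new CacheMetrics();
        for (int i = 0; i < 90; i++) metrics.recordHit();
        for (int i = 0; i < 10; i++) metrics.recordMiss();
        System.out.println(metrics.hitRate()); // 0.9
    }
}
```

Instrument the cache-aside read path: `recordHit()` when the cache returns a value, `recordMiss()` before falling through to the database, and alert when the rolling ratio drops below the 80% target.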
Caching is not optional in modern systems; it is essential for performance. But caching incorrectly is worse than no caching at all. Master these patterns, and your system will handle millions of concurrent requests.
Key Takeaways
- Cache-aside is simple but vulnerable to stampedes; write-through is safer but slower; write-behind is fast but risky.
- Request coalescing prevents thundering herds by serializing cache misses.
- Probabilistic early expiration and stale-while-revalidate spread load without sacrificing availability.
- Event-based invalidation is more accurate than TTL-based but requires discipline in application code.
- CQRS separates read and write models, allowing each to be optimized independently.
- Monitor hit rates, eviction rates, and staleness to detect cache inefficiencies early.
Read More
Explore related deep-dives on system design and performance:
- Database Sharding at Scale — complement caching with horizontal scaling
- Database Replication and Consistency — understand replication lag with caching
- Transaction Isolation Levels — ensure consistency with cached data
Last updated: March 2026 — Written by Md Sanwar Hossain