Designing a Global CDN Architecture: Edge Caching, Cache Invalidation, and Origin Shield Patterns
A Content Delivery Network is not simply a cache sitting in front of your origin server — it is a distributed system with its own topology, failure modes, routing strategies, and consistency trade-offs. Getting CDN architecture right is the difference between a platform that absorbs a 10× traffic spike without the origin even noticing, and one that collapses under coordinated cache misses at launch time. This post walks through every layer of a production-grade global CDN: from PoP topology and cache hierarchy, through smart invalidation and origin shield design, to TLS termination, failover semantics, and a real incident post-mortem that explains exactly how a cache stampede took down a video platform for eight minutes.
Table of Contents
- CDN Architecture Fundamentals: PoPs, Edge Nodes, and the Origin Shield
- Cache Hierarchy: Edge → Regional → Origin Shield → Origin
- Routing Strategies: Anycast, GeoDNS, and Latency-Based Routing
- Cache Key Design and Vary Headers: Getting Cache Hits Right
- Cache Invalidation Strategies: Purge, Soft Purge, and Surrogate Keys
- TLS Termination at the Edge: Certificate Management and OCSP Stapling
- Origin Shield: Protecting the Origin from Cache Misses at Scale
- Failover and Degraded Mode: Serving Stale Content During Origin Outages
- Production Incident: The Cache Stampede That Took Down a Video Platform
1. CDN Architecture Fundamentals: PoPs, Edge Nodes, and the Origin Shield
A global CDN is built around Points of Presence (PoPs) — physical data centers, typically collocated with internet exchange points (IXPs) in major cities. A well-scaled CDN operates 50 to 300+ PoPs worldwide, strategically placed in regions like Frankfurt, Singapore, São Paulo, Mumbai, and Ashburn (Virginia) where major internet traffic concentrates. Each PoP contains multiple edge servers — anywhere from 10 to several hundred commodity machines, each running the CDN software stack responsible for TLS termination, cache lookup, request routing, and response delivery.
The value proposition is pure physics: a user in Tokyo requesting a static asset from an origin in us-east-1 experiences roughly 150–180ms of round-trip latency, most of it an irreducible consequence of physical distance (the speed-of-light floor through trans-Pacific fiber alone is on the order of 100ms). A PoP in Tokyo serves that same asset from local SSD in under 5ms. For dynamic content that cannot be cached, the CDN still provides value by maintaining persistent, optimized TCP connections to the origin over dedicated backbone links — reducing handshake overhead for every user request.
Between the edge PoPs and the origin sits the origin shield — a dedicated cluster (usually a single geographic region chosen for low latency to the origin) whose sole job is to coalesce cache misses from all PoPs into a minimal number of upstream requests. The origin shield is the CDN's last line of defense before traffic reaches your application servers. Without it, every PoP independently fetches uncached content from the origin, multiplying your origin load by the number of PoPs you operate.
2. Cache Hierarchy: Edge → Regional → Origin Shield → Origin
Modern CDNs implement a multi-tier cache hierarchy. Understanding each tier's capacity, TTL characteristics, and eviction policy is critical for tuning hit rates.
L1 — Edge Cache: Each edge server within a PoP maintains a local SSD-backed cache, typically 50–200 GB per node depending on hardware generation. TTLs at the edge are short — typically 1 to 15 minutes for dynamic content, up to 24 hours for static assets. The edge cache uses LRU (Least Recently Used) or LFU (Least Frequently Used) eviction. Because edge nodes are closest to users, maximizing the L1 hit rate has the highest latency impact. A cache miss at the edge is measured in tens of milliseconds for L2 retrieval; a miss all the way to origin can be hundreds of milliseconds.
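The LRU eviction behavior at the edge can be sketched in a few lines of Python (a toy single-node model, not any vendor's implementation): the least recently touched entry is evicted once capacity is exceeded.

```python
from collections import OrderedDict

class LRUEdgeCache:
    """Toy LRU cache modeling a single edge node's eviction policy."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # cache key -> response body

    def get(self, key):
        if key not in self.entries:
            return None  # cache miss -> would fall through to L2 / origin shield
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

cache = LRUEdgeCache(capacity=2)
cache.put("/a.js", "A")
cache.put("/b.js", "B")
cache.get("/a.js")       # touch /a.js so it becomes most recently used
cache.put("/c.js", "C")  # capacity exceeded: evicts /b.js, not /a.js
```

A production edge cache also weighs object size and fetch cost, but the recency ordering shown here is the core of the policy.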
L2 — Regional Cache: Some CDN architectures insert a regional aggregation layer between edge PoPs and the origin shield. Regional nodes are fewer in number (perhaps 10–20 globally) and carry significantly larger cache capacity — 5 to 20 TB of NVMe storage per node. TTLs are longer at this tier, ranging from a few hours to several days. The regional tier absorbs misses from multiple edge PoPs in the same geographic area before they propagate to the origin shield.
L3 — Origin Shield: The origin shield operates as an in-memory and NVMe-backed cache, but its most important function is not storage capacity — it is request coalescing. When a popular piece of content expires simultaneously across hundreds of edge nodes (a phenomenon called a cache stampede or thundering herd), those nodes all send cache-miss requests upstream at the same moment. Without coalescing, every one of those requests propagates to the origin. The origin shield serializes them: it detects that 500 edge nodes are all requesting the same URL within the same 50ms window, issues a single upstream request to the origin, and fans the response back to all 500 waiting edge nodes.
# Cache hierarchy summary
L1 Edge Cache: 50–200 GB SSD per node, TTL: 1–15 min (dynamic), up to 24h (static)
L2 Regional Cache: 5–20 TB NVMe per node, TTL: hours to days
L3 Origin Shield: In-memory LRU + coalescing, TTL: mirrors origin Cache-Control headers
Request flow on a cache miss:
User → Edge (L1 miss) → Regional (L2 miss) → Origin Shield (L3 miss or coalesced)
→ Origin (single upstream request)
← Response cached at all tiers
User ← Edge ← Regional ← Origin Shield ← Origin
Request coalescing is the single most impactful CDN feature for origin protection. Fastly calls this "request collapsing." Varnish implements it natively. The mechanism works by holding the first miss request open (waiting for the origin response) and queuing all subsequent requests for the same cache key. When the origin responds, the result is atomically written to the cache and all queued requests are served the cached copy simultaneously. The net effect: 1 origin request satisfies N simultaneous edge misses.
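A minimal single-process version of this collapsing logic (often called the "singleflight" pattern) can be sketched with Python threading. This is an illustration of the mechanism, not a CDN's actual implementation; the episode URL and timings are invented.

```python
import threading
import time

class RequestCoalescer:
    """Collapse concurrent misses for the same cache key into one origin fetch."""

    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin
        self.lock = threading.Lock()
        self.in_flight = {}   # cache key -> (done_event, result_holder)
        self.origin_calls = 0

    def get(self, key):
        with self.lock:
            if key in self.in_flight:
                done, holder = self.in_flight[key]  # join the wait queue
                leader = False
            else:
                done, holder = threading.Event(), {}
                self.in_flight[key] = (done, holder)
                self.origin_calls += 1              # only the leader fetches
                leader = True
        if leader:
            holder["value"] = self.fetch_origin(key)  # the single upstream request
            with self.lock:
                del self.in_flight[key]
            done.set()                                # fan out to all waiters
        else:
            done.wait()
        return holder["value"]

def slow_origin(key):
    time.sleep(0.25)  # simulate origin latency
    return f"body-for-{key}"

coalescer = RequestCoalescer(slow_origin)
results = []
barrier = threading.Barrier(50)

def edge_miss():
    barrier.wait()  # release all 50 simulated edge misses at once
    results.append(coalescer.get("/api/v2/episode/1"))

threads = [threading.Thread(target=edge_miss) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
# 50 simultaneous misses collapse into a single origin fetch
```

The same idea applied per cache key at the shield tier is what turns N simultaneous PoP misses into one origin request.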
3. Routing Strategies: Anycast, GeoDNS, and Latency-Based Routing
How a user's request gets routed to the nearest (or best) PoP is itself a non-trivial distributed systems problem. Three primary strategies are used in production, often in combination.
Anycast: The CDN announces the same IP address from every PoP simultaneously via BGP. The internet's routing infrastructure naturally delivers each user's packets to the BGP-topologically closest PoP — the one with the shortest AS path. Anycast is fast to set up, requires no DNS logic, and provides automatic failover: if a PoP disappears, BGP withdraws its routes and traffic flows to the next closest PoP within seconds to minutes as routes reconverge. Cloudflare's entire network runs on anycast. The downside: BGP shortest-path is not the same as geographically closest or lowest latency. A user in Los Angeles might have better latency to the Dallas PoP if the AS path to Dallas is shorter than to the LA PoP due to peering agreements.
GeoDNS: The CDN's authoritative DNS server inspects the client's IP address (or the recursive resolver's IP as a proxy), maps it to a geographic region using an IP geolocation database (MaxMind, IP2Location), and returns the A/CNAME record pointing to the nearest PoP cluster. GeoDNS allows fine-grained control: you can pin specific countries to specific PoPs, implement traffic splitting by geography, and respond differently to ISP-level subnets. The limitation is DNS TTL latency — a DNS record change takes up to the TTL (typically 60–300 seconds) to propagate globally, making GeoDNS failover slower than anycast.
Latency-based routing: Some CDNs (notably AWS CloudFront) continuously measure actual round-trip time from clients to PoPs using passive and active probing, then route new connections to the PoP with the lowest measured latency rather than the geographically or topologically closest. This accounts for real network conditions — congestion, cross-ocean asymmetric routing, and satellite internet paths — that pure geographic or BGP-based routing misses.
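As a toy illustration of the latency-based strategy (the PoP names and RTT figures below are invented), the routing decision reduces to picking the lowest measured RTT among healthy PoPs — which also gives failover for free when a PoP is withdrawn:

```python
# Hypothetical measured RTTs (ms) from one client network to each PoP
measured_rtt_ms = {
    "tokyo": 12.0,
    "singapore": 48.0,
    "los-angeles": 105.0,
    "frankfurt": 230.0,
}

def pick_pop(rtts, healthy):
    """Route to the lowest-latency PoP among those currently healthy."""
    candidates = {pop: rtt for pop, rtt in rtts.items() if pop in healthy}
    return min(candidates, key=candidates.get)

best = pick_pop(measured_rtt_ms,
                healthy={"tokyo", "singapore", "los-angeles", "frankfurt"})
# If Tokyo is withdrawn (failure), traffic falls to the next-best PoP:
fallback = pick_pop(measured_rtt_ms,
                    healthy={"singapore", "los-angeles", "frankfurt"})
```

Real systems aggregate probes per client subnet and dampen flapping, but the selection criterion is this simple at its core.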
4. Cache Key Design and Vary Headers: Getting Cache Hits Right
Cache hit rate is the most important CDN performance metric, and cache hit rate is directly controlled by how you construct the cache key — the string that uniquely identifies a cacheable resource. The naive cache key is the full request URL including all query parameters:
# Naive cache key (destroys hit rate)
https://cdn.example.com/product/123?color=blue&size=M&utm_source=newsletter&utm_campaign=spring2026&fbclid=abc123
# These are all the same resource but generate DIFFERENT cache keys:
https://cdn.example.com/product/123?size=M&color=blue&utm_source=newsletter
https://cdn.example.com/product/123?COLOR=blue&SIZE=M
https://cdn.example.com/product/123?color=blue&size=M
Each variation above generates a cache miss and a separate origin request, despite all returning identical content. The fix requires a systematic cache key normalization strategy applied at the edge:
# Cache key normalization rules (applied by edge VCL/worker script)
1. Strip tracking parameters (allowlist approach — keep only known functional params):
Remove: utm_source, utm_medium, utm_campaign, utm_content, utm_term
Remove: fbclid, gclid, msclkid, twclid (ad-network click IDs)
Keep: color, size, page, sort, filter (params that affect content)
2. Sort remaining query parameters alphabetically:
?size=M&color=blue → ?color=blue&size=M
3. Lowercase the entire URL path (for case-insensitive origins):
/Product/123 → /product/123
4. Normalize scheme (always https in cache key even if request was http):
http://cdn.example.com/... → https://cdn.example.com/...
# Result: all 3 URLs above collapse to the same cache key:
https://cdn.example.com/product/123?color=blue&size=M
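The normalization rules above can be sketched as a small helper (Python here purely for illustration; in production this logic lives in VCL or an edge worker script, and the allowlist below is an example, not a complete one):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Params known to affect the response; everything else is stripped (allowlist).
FUNCTIONAL_PARAMS = {"color", "size", "page", "sort", "filter"}

def normalize_cache_key(url: str) -> str:
    scheme, netloc, path, query, _fragment = urlsplit(url)
    params = [
        (k.lower(), v)
        for k, v in parse_qsl(query, keep_blank_values=True)
        if k.lower() in FUNCTIONAL_PARAMS   # rule 1: allowlist functional params
    ]
    params.sort()                            # rule 2: sort alphabetically
    return urlunsplit((
        "https",                             # rule 4: normalize scheme
        netloc.lower(),
        path.lower(),                        # rule 3: lowercase the path
        urlencode(params),
        "",                                  # drop any fragment
    ))

key = normalize_cache_key(
    "http://cdn.example.com/Product/123?size=M&color=blue&utm_source=newsletter"
)
# -> "https://cdn.example.com/product/123?color=blue&size=M"
```

All of the URL variants shown earlier collapse to a single cache key, turning what would have been several origin fetches into one cached entry.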
The HTTP Vary header introduces cache fragmentation at the CDN level. When your origin responds with Vary: Accept-Encoding, Accept-Language, the CDN must store a separate cached copy for each combination of Accept-Encoding and Accept-Language values. A single URL with 3 encoding variants (gzip, br, identity) and 10 language variants explodes into 30 cache entries. This is rarely intentional.
The production pattern is to handle content negotiation at the edge — the edge node selects the appropriate pre-compressed variant based on Accept-Encoding, stores compressed responses keyed by content type, and strips the Vary header before caching. For Accept-Language, the preferred approach is to encode the language in the URL path (/en/product/123 vs /de/product/123) rather than relying on Vary: Accept-Language, which is invisible in URLs and creates cache pollution that is extremely difficult to debug.
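A sketch of that edge-side negotiation (deliberately simplified: a real parser honors q-values in Accept-Encoding, and the stored variants here are placeholder bytes):

```python
# Hypothetical pre-compressed variants stored at the edge, keyed by (url, encoding).
PRECOMPRESSED = {
    ("/app.js", "br"): b"<brotli bytes>",
    ("/app.js", "gzip"): b"<gzip bytes>",
    ("/app.js", "identity"): b"<plain bytes>",
}

def select_variant(url, accept_encoding):
    """Negotiate encoding at the edge. The cache key becomes (url, encoding),
    so the Vary header can be stripped before caching."""
    offered = {t.strip().split(";")[0]
               for t in accept_encoding.split(",") if t.strip()}
    for encoding in ("br", "gzip"):   # the edge's preference order
        if encoding in offered and (url, encoding) in PRECOMPRESSED:
            return encoding, PRECOMPRESSED[(url, encoding)]
    return "identity", PRECOMPRESSED[(url, "identity")]

enc, body = select_variant("/app.js", "gzip, deflate, br")  # picks "br"
```

Because the variant choice is folded into the cache key explicitly, the cache never fragments on raw header strings the way Vary-based storage does.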
5. Cache Invalidation Strategies: Purge, Soft Purge, and Surrogate Keys
Phil Karlton's famous quip — "There are only two hard things in Computer Science: cache invalidation and naming things" — becomes a production reality the moment you need to push a price change to 200 PoPs within 5 seconds. Three invalidation strategies cover the vast majority of production use cases.
Instant purge by URL: The simplest strategy — send a DELETE or PURGE request to the CDN API specifying the exact URL. The CDN propagates a purge command to all PoPs, and within 5–30 seconds (depending on the CDN's internal gossip or control-plane propagation speed), the cached copy is evicted globally. Any subsequent request triggers a fresh origin fetch. The limitation is atomicity at scale: purging a product catalog with 500,000 SKUs one URL at a time is neither fast nor practical, and the window between first and last purge means some PoPs serve stale content during propagation.
Soft purge (mark stale): Instead of deleting the cached copy, a soft purge marks it as stale while keeping it in the cache. Subsequent requests for that resource serve the stale copy immediately (no user-facing latency spike) while the edge asynchronously revalidates with the origin via a conditional GET. If the origin returns a 304 Not Modified, the existing copy is refreshed in-place. This pattern is ideal for high-availability scenarios where a brief period of staleness is acceptable and a thundering herd of synchronous origin requests (triggered by a hard purge) would be dangerous.
Surrogate keys (cache tags): The most powerful and scalable invalidation mechanism. Your origin attaches logical tags to responses via the Surrogate-Key header (Fastly) or Cache-Tag header (Cloudflare). The CDN indexes each cached object by all its tags. A single purge API call targeting a tag instantly invalidates every cached object bearing that tag, across all PoPs, regardless of URL structure.
# Origin response headers: tag this resource with multiple logical keys
HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: public, max-age=300, stale-while-revalidate=60, stale-if-error=86400
Surrogate-Key: product:123 category:shoes user-segment:premium
# A single API call purges ALL content tagged with product:123 across ALL PoPs
curl -X POST "https://api.fastly.com/service/SVC_ID/purge/product:123" \
-H "Fastly-Key: $API_TOKEN"
# Effect: every URL that served content tagged product:123 is simultaneously invalidated.
# This includes:
# /product/123 (product detail page)
# /api/product/123/ (JSON API response)
# /category/shoes (listing page that includes product 123)
# /search?q=running (search results that contained product 123)
The surrogate key pattern effectively solves the URL-explosion problem of individual purges. When a product's price is updated in the database, the e-commerce backend fires a single purge call to the CDN for the tag product:123. Every page that renders data from that product — detail pages, category listings, search results, API responses, promotional widgets — is invalidated atomically. Both Fastly and Cloudflare support this pattern natively; Akamai calls it "Fast Purge with cache tags."
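The tag index behind surrogate-key purging can be modeled as an inverted index from tag to cache keys. This is a sketch of the idea, not any vendor's internals (real CDNs maintain this index distributed across PoPs):

```python
from collections import defaultdict

class TaggedCache:
    """Cache with an inverted tag index: purge every object carrying a tag."""

    def __init__(self):
        self.objects = {}                  # url -> cached response body
        self.tag_index = defaultdict(set)  # tag -> set of urls carrying it

    def store(self, url, body, tags):
        self.objects[url] = body
        for tag in tags:
            self.tag_index[tag].add(url)

    def purge_tag(self, tag):
        """Invalidate all objects tagged with `tag`; returns the count purged.
        Stale entries left in other tags' sets are harmless: pop() tolerates
        already-removed urls."""
        urls = self.tag_index.pop(tag, set())
        for url in urls:
            self.objects.pop(url, None)
        return len(urls)

cache = TaggedCache()
cache.store("/product/123", "detail page", ["product:123", "category:shoes"])
cache.store("/category/shoes", "listing", ["category:shoes", "product:123", "product:456"])
cache.store("/product/456", "other detail", ["product:456"])
purged = cache.purge_tag("product:123")  # invalidates both pages referencing product 123
```

One purge call removes every cached object that rendered product 123, while unrelated entries (here, /product/456) stay cached.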
6. TLS Termination at the Edge: Certificate Management and OCSP Stapling
When a user's browser establishes an HTTPS connection to your CDN-fronted website, TLS is terminated at the edge PoP — the edge server holds the private key, performs the TLS handshake, and decrypts the request. The subsequent connection from the edge to your origin (within the CDN's private backbone) can be plain HTTP within the datacenter or TLS re-encryption depending on your security posture.
Edge TLS benefits: The CPU overhead of TLS decryption is distributed across hundreds of edge nodes rather than concentrated on origin servers. TLS session resumption is handled at the edge using session tickets or session IDs, meaning returning users skip the full handshake overhead. HTTP/2 and HTTP/3 (QUIC) multiplexing is negotiated at the edge, further reducing connection overhead for users on high-latency links.
OCSP stapling: During a TLS handshake, browsers traditionally make a real-time request to the Certificate Authority's OCSP (Online Certificate Status Protocol) endpoint to verify the server's certificate hasn't been revoked. This adds a DNS lookup + TCP connection + HTTP round trip — easily 100–300ms — to every new TLS connection. OCSP stapling eliminates this: the edge server periodically fetches the OCSP response from the CA and caches it locally, then "staples" the pre-fetched response directly into the TLS handshake. The browser receives the certificate plus the CA's freshness token in a single round trip, with no additional OCSP lookup required.
# Verify OCSP stapling is active on an edge endpoint
openssl s_client -connect cdn.example.com:443 -status -servername cdn.example.com 2>/dev/null \
| grep -A 7 "OCSP Response"
# Expected output when OCSP stapling is working:
OCSP Response Status: successful (0x0)
Response verify OK
Cert Status: good
This Update: Mar 19 12:00:00 2026 GMT
Next Update: Mar 26 12:00:00 2026 GMT
Certificate rotation and zero-trust edge-to-origin: CDN providers handle certificate issuance and renewal via ACME protocol integration (Let's Encrypt) or through proprietary certificate management platforms. Certificates are typically rotated 30 days before expiry, with gradual rollout across PoPs to detect compatibility issues before global propagation. For a zero-trust posture between edge and origin, the edge-to-origin leg is re-encrypted with TLS and the origin authenticates the CDN — typically via mutual TLS client certificates or a shared-secret header — so that only CDN nodes, not arbitrary internet clients, can reach the origin directly.
7. Origin Shield: Protecting the Origin from Cache Misses at Scale
The origin shield's value becomes viscerally clear with simple arithmetic. Without an origin shield, every one of your 200 edge PoPs independently fetches uncached content from your origin. If a popular article expires simultaneously across all PoPs (triggered by its TTL), your origin receives 200 requests for that single URL within a 100ms window — before any PoP has had time to cache the response and serve it to queued local requests.
Now add request coalescing per PoP: if each PoP has 1,000 users simultaneously requesting the same expired resource, and each PoP coalesces locally to 1 upstream request, you still have 200 requests hitting the origin. With an origin shield coalescing across all PoPs, those 200 PoP-level requests collapse to a single shield-to-origin request. The shield serves the response to all 200 waiting PoPs from a single origin fetch. This is the canonical "thundering herd prevention" pattern at CDN scale.
# Without origin shield:
200 PoPs × 1 coalesced miss per PoP = 200 origin requests per popular URL expiry
# With origin shield (single shield cluster, e.g., in us-east-1):
200 PoPs → origin shield → 1 origin request
Origin shield serves response to all 200 PoPs simultaneously
# --- Configuring origin shield in AWS CloudFront ---
# Enable "Origin Shield" in the distribution origin settings:
# Region: us-east-1 (choose lowest latency to your origin)
# Origin-facing requests now arrive from the shield region's CloudFront IPs
# Origin can identify shield requests via custom header:
# X-CDN-Origin-Shield: true (set in origin custom headers config)
# --- Configuring shield in Fastly ---
# In the Fastly UI or API, set "shielding" on the origin:
# Shield: IAD (Washington DC PoP, close to us-east-1)
# Fastly marks shield-to-edge hops with the Fastly-FF request header,
# which VCL can inspect to distinguish shield from edge position
# --- Cloudflare Tiered Cache (equivalent to origin shield) ---
# Enable Smart Tiered Cache in Caching → Tiered Cache settings
# Cloudflare automatically selects the upper-tier PoP closest to your origin
Choosing the origin shield location is a latency-versus-redundancy trade-off. The shield should be as close as possible to your origin to minimize the latency added to cache misses. A shield in the same AWS region as your origin adds 1–3ms; a shield on the wrong continent adds 80–150ms to every cache miss, negating much of the CDN's benefit for time-sensitive content. Most CDN providers recommend a shield in the same cloud region or the nearest major IXP to your origin data center.
8. Failover and Degraded Mode: Serving Stale Content During Origin Outages
Your origin will experience outages — database failovers, deployment rollbacks, regional AWS incidents, network partitions. The CDN is your buffer between those outages and your users. Two HTTP cache directives are the production tools for configuring degraded-mode behavior: stale-while-revalidate and stale-if-error.
# Production Cache-Control header combining all three directives:
Cache-Control: public, max-age=300, stale-while-revalidate=60, stale-if-error=86400
# Semantics:
# max-age=300 → serve from cache without revalidation for up to 5 minutes
# stale-while-revalidate=60 → after 5 min, serve the stale copy immediately while
# asynchronously refreshing in the background (up to 1 min grace)
# stale-if-error=86400 → if origin returns 5xx (or is unreachable), serve the stale
# copy for up to 24 hours rather than propagating the error to users
stale-while-revalidate is the workhorse for high-availability under normal conditions. When content expires (max-age exceeded), instead of blocking the current request while fetching a fresh copy from the origin, the CDN immediately serves the stale cached copy and dispatches a background revalidation request. The user experiences zero latency penalty. The background request updates the cache for subsequent users. The trade-off is a brief window (up to the stale-while-revalidate duration) during which users may receive slightly stale content — typically acceptable for product listings, article pages, and non-financial data.
stale-if-error is your disaster recovery mechanism. When the origin returns a 5xx error or is completely unreachable (connection refused, timeout, DNS resolution failure), the CDN falls back to the most recent cached copy and serves it for up to the stale-if-error duration. From the user's perspective, the page loads normally. The origin outage is invisible.
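The decision tree an edge node follows under these directives can be sketched as a pure function (simplified: ages are in seconds, origin_ok flags whether the origin is currently reachable and healthy, and real CDNs also handle revalidation races):

```python
def serve_decision(age, max_age, swr, sie, origin_ok):
    """What the edge does for a cached object of a given age (seconds)."""
    if age <= max_age:
        return "serve-fresh"                   # within max-age: no origin contact
    if age <= max_age + swr:
        return "serve-stale-revalidate-async"  # stale-while-revalidate window
    if not origin_ok and age <= max_age + sie:
        return "serve-stale-origin-down"       # stale-if-error window
    if origin_ok:
        return "fetch-from-origin"             # blocking revalidation
    return "error-to-user"                     # nothing left to serve

# Cache-Control: public, max-age=300, stale-while-revalidate=60, stale-if-error=86400
decision = serve_decision(age=330, max_age=300, swr=60, sie=86400, origin_ok=True)
# -> "serve-stale-revalidate-async": user sees no latency penalty
```

Walking a few ages through this function makes the directive interactions concrete: an object 4,000 seconds old blocks on the origin when it is healthy, but serves stale for up to a day when the origin is down.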
Consider an e-commerce deployment whose origin went down completely: with stale-if-error=86400 configured on all product and category pages, the CDN served cached content throughout the outage. Zero HTTP 5xx errors reached end users. The only observable impact was that prices updated 90 seconds before the incident were served from pre-failover cache during the outage window — a known and accepted trade-off documented in their SLA. Recovery was transparent: once the origin resumed serving 200 responses, the CDN's background revalidation repopulated the cache within minutes.
9. Production Incident: The Cache Stampede That Took Down a Video Platform
At 8:00 PM on a Thursday, a video streaming platform with 50 million subscribers launched a new season of its most popular original series. The engineering team had prepared for months: origin servers were scaled to 5× normal capacity, the CDN had been pre-warmed with static episode thumbnails and promotional assets, and load tests had been run at 200% of expected peak concurrency.
What the team hadn't accounted for was the interaction between their CDN's TTL configuration and the launch timing. Episode metadata — the JSON payload containing episode titles, playback URLs, subtitle track lists, and quality-level manifests — was configured with a 5-minute TTL and no stale-while-revalidate or stale-if-error directives. The content had been published and cached at 7:55 PM during pre-launch warmup.
At exactly 8:00 PM, 2.1 million users simultaneously opened the platform and navigated to the episode page. The 5-minute TTL on the metadata payload expired at exactly 8:00 PM across all edge nodes that had cached it during the 7:55 PM warmup window. Every edge node simultaneously experienced a cache miss for the episode metadata endpoint.
The CDN's origin shield had request coalescing enabled — but a configuration audit 3 weeks earlier had disabled coalescing specifically for the /api/v2/episode/ path family as a debugging measure and it had never been re-enabled. Without coalescing, the origin shield passed every miss through directly to the origin application servers. The origin received approximately 40,000 requests per second within the first 3 seconds after 8:00 PM — roughly 80× its normal sustained load and 16× the load the scaled capacity had been tested against.
- T+0s: Cache miss storm begins. Origin receives 40,000 req/s.
- T+3s: Origin database connection pool (max 500 connections) exhausted. New requests queue in application layer.
- T+8s: Application request queue fills. New metadata requests receive HTTP 503.
- T+12s: CDN begins caching the 503 responses (no Cache-Control: no-store on error responses — another misconfiguration). The 503s spread to all PoPs.
- T+45s: Incident declared. Engineering team begins emergency response.
- T+4m: Request coalescing re-enabled on the episode metadata path. Origin load drops to 200 req/s. But cached 503s continue serving from PoPs for the remaining TTL duration.
- T+6m: Emergency purge issued for all episode metadata URLs. Fresh origin fetches succeed.
- T+8m: Full service restoration. 8 minutes of degraded service. 400,000 users experienced errors during the peak launch window.
Root cause: Three independent misconfigurations compounded into a catastrophic failure: (1) request coalescing disabled on the highest-traffic path, (2) no stale-while-revalidate on metadata TTL, and (3) 5xx responses not explicitly marked Cache-Control: no-store, allowing the CDN to cache error responses and amplify the incident duration.
Remediations implemented: Request coalescing was made a required, audited configuration setting with automated compliance checks in the CDN configuration pipeline. All API endpoints were updated to include stale-while-revalidate=120, providing a 2-minute grace window during which stale content serves while the origin revalidates. Error responses were globally patched to return Cache-Control: no-store, preventing the CDN from ever caching 5xx responses. A synthetic canary that validates CDN configuration correctness was added to the deployment pipeline, running checks for coalescing state, error caching behavior, and stale directives before any production change is approved.
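Such a compliance check can be sketched as a pure function over a hypothetical configuration model (the field names below are invented for illustration, not any vendor's schema):

```python
def audit_cdn_config(config):
    """Return a list of violations; an empty list means the config passes."""
    violations = []
    for path, rules in config["paths"].items():
        if not rules.get("request_coalescing", False):
            violations.append(f"{path}: request coalescing disabled")
        cache_control = rules.get("cache_control", "")
        if "stale-while-revalidate" not in cache_control:
            violations.append(f"{path}: missing stale-while-revalidate")
    if config.get("cache_5xx", True):
        violations.append("global: 5xx responses cacheable (must be no-store)")
    return violations

config = {
    "cache_5xx": False,
    "paths": {
        "/api/v2/episode/": {
            "request_coalescing": True,
            "cache_control": "public, max-age=300, stale-while-revalidate=120",
        },
        "/api/v2/search/": {
            "request_coalescing": False,  # exactly the kind of drift that caused the incident
            "cache_control": "public, max-age=60",
        },
    },
}
problems = audit_cdn_config(config)  # flags the /api/v2/search/ path twice
```

Run against the pre-incident configuration, a check like this would have flagged the disabled coalescing rule three weeks before launch.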
Key Takeaways
- Request coalescing at the origin shield is non-negotiable at scale — without it, a single popular URL expiry can generate N×M origin requests where N is PoP count and M is concurrent users per PoP, collapsing your origin within seconds.
- Cache key normalization is the highest-leverage hit-rate improvement — strip tracking parameters, sort query params alphabetically, and lowercase paths to collapse URL variants that serve identical content into a single cache entry.
- stale-if-error is your CDN disaster recovery layer — a 24-hour stale-if-error window means your users are insulated from origin outages up to a day long, provided content was cached before the failure. Configure it on every cacheable endpoint.
- Surrogate keys replace URL-by-URL purge at any meaningful scale — tag every response with logical entity identifiers, and invalidate entire content graphs with a single API call instead of iterating millions of URLs.
- Never cache 5xx responses — always include Cache-Control: no-store or Cache-Control: no-cache on error responses. A cached 503 propagating to all PoPs turns a 30-second origin outage into an 8-minute user-facing incident.
- Treat CDN configuration as code — version-control VCL/worker scripts, enforce coalescing and stale directive requirements via automated compliance checks in your CI/CD pipeline, and test CDN behavior in staging environments with traffic replay before production changes.
Conclusion
A CDN is not a plug-and-play performance layer — it is a distributed system with its own correctness requirements, failure modes, and operational complexity. The decisions you make about cache key construction, TTL values, coalescing configuration, surrogate key tagging, and stale directives directly determine whether your origin survives a traffic spike or collapses under it. The architecture choices compound: a missing stale-if-error directive, combined with a disabled coalescing rule, combined with no no-store on errors, is not three minor misconfigurations — it is a recipe for the exact eight-minute outage described above.
The teams that operate CDNs reliably treat every CDN configuration setting with the same rigor as application code: version-controlled, tested, reviewed, and monitored. They instrument cache hit rates per endpoint, alert on sudden hit-rate drops (which signal either a cache key regression or a traffic pattern change), and regularly audit coalescing and stale configuration against a compliance baseline. The CDN is the first and most powerful line of defense for your origin's availability — invest in understanding it as deeply as you understand your application tier, and it will silently absorb the traffic spikes that would otherwise define your incident history.
Last updated: March 2026 — Written by Md Sanwar Hossain