Why a CDN has 3 cache layers, not 1

A single CDN edge layer creates a bottleneck; three layers distribute load by balancing latency, cost, and origin protection.

Content Delivery Networks exist to move static assets closer to the user, but a single tier of edge caches cannot serve global traffic without sending every cache miss to the origin server. When a user requests an image, the system must decide whether to serve it from a local cache, a regional cache, or the origin. The decision follows a strict hierarchy designed to minimize latency while protecting the origin from excess load and bandwidth costs.

A common architecture relies on three distinct layers. The Edge POP serves the user directly. The Regional cache aggregates traffic between the Edge and the Origin. The Shield layer sits immediately in front of the origin server to absorb the “thundering herd” problem. Each layer serves a different share of requests, at a different cost and latency.

The math behind this hierarchy is visible in the distribution of traffic across the layers.

The three-layer architecture

Layer	Role	Typical Share of Requests Served	Latency	Cost Factor (per GB)
Edge POP	Direct user connection	85%	< 50ms	1.0x
Regional Cache	Interconnect between Edge and Origin	10%	< 100ms	1.5x
Origin Shield	Buffer against origin overload	5%	> 200ms	4.0x

The table shows how requests are distributed across the layers. The Edge POP handles the vast majority of requests because it is physically closest to the user. Akamai Technologies operates roughly 400,000 servers in about 130 countries, placing assets within a few hops of most end users. When a request hits an Edge POP that has the content, the response is served without any upstream fetch. This is the 85% figure.

The Regional cache handles the next 10% of requests. These are items popular enough to be cached regionally but not present at the specific Edge POP that received the request. This layer aggregates traffic. Instead of 100 Edge POPs each querying the origin for the same file, they query the Regional cache, which fetches the file from upstream once. Cloudflare operates data centers in over 300 cities across 100+ countries and uses this kind of aggregation to reduce upstream bandwidth.

The Origin Shield handles the final 5%. These are the requests that miss at both the Edge and Regional layers and fall through toward the origin. The Shield layer sits between the CDN network and the origin server. Its job is not to serve the user, but to protect the origin. Even if 10,000 Edge POPs miss the cache simultaneously, the Shield collapses their requests into a single origin fetch instead of 10,000.
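The fall-through described above can be sketched as a chain of caches, each delegating to its parent on a miss and filling itself on the way back. This is a minimal illustration, not any vendor's implementation: the class names, the single regional cache, and the sample path are assumptions, and eviction and expiry are left out.

```python
class CacheTier:
    """One cache layer that falls through to a parent on a miss."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent  # another CacheTier, or the Origin
        self.store = {}
        self.hits = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        value = self.parent.get(key)  # miss: ask the next layer up
        self.store[key] = value       # fill on the way back down
        return value


class Origin:
    """Stand-in for the origin server; counts how often it is reached."""

    def __init__(self):
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return f"body-of-{key}"


origin = Origin()
shield = CacheTier("shield", origin)
regional = CacheTier("regional", shield)
edges = [CacheTier(f"edge-{i}", regional) for i in range(100)]

# Every edge POP requests the same object with a cold cache.
for edge in edges:
    edge.get("/img/logo.png")

print(origin.fetches)  # 1
print(regional.hits)   # 99
```

With 100 cold Edge POPs, only the first request travels all the way to the origin; the other 99 are absorbed by the Regional cache.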

The cost factor column reflects the economic reality of bandwidth. Serving from Edge is the baseline cost. Serving from Regional often incurs internal transfer fees between data centers. Serving through the Shield from the origin incurs the highest cost because it involves egress fees and consumes the origin’s compute resources. Amazon CloudFront pricing reflects this layering: requests routed through its Origin Shield are billed as an incremental charge on top of edge delivery.
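As a worked example, the blended cost per GB is the sum of each layer's share of requests multiplied by its cost factor. The sketch below uses the figures from the table; the "degraded" split is a hypothetical scenario added only for comparison.

```python
# (share of requests, cost factor) per layer, taken from the table
TABLE_SPLIT = {
    "edge": (0.85, 1.0),
    "regional": (0.10, 1.5),
    "shield": (0.05, 4.0),
}

# Hypothetical: the edge hit rate drops and the misses fall through to the origin tier.
DEGRADED_SPLIT = {
    "edge": (0.60, 1.0),
    "regional": (0.10, 1.5),
    "shield": (0.30, 4.0),
}


def blended_cost(layers):
    """Weighted average cost factor across all layers."""
    return sum(share * cost for share, cost in layers.values())


print(round(blended_cost(TABLE_SPLIT), 3))     # 1.2
print(round(blended_cost(DEGRADED_SPLIT), 3))  # 1.95
```

Under the table's split the average gigabyte costs 1.2x the edge baseline, and a 25-point drop in edge share pushes that to 1.95x in this hypothetical.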

The tradeoff between latency and load

The three-layer model exists because a single layer cannot optimize for both speed and stability. If a CDN only had Edge POPs, every cache miss would hit the origin directly. During a flash crowd event, the origin would see a spike in traffic proportional to the number of Edge POPs. This is the “thundering herd” problem.

HTTP caching is defined by IETF RFC 9111, which obsoleted RFC 7234 in 2022 and governs how each layer stores and reuses responses. The stale-while-revalidate extension in RFC 5861 lets a cache keep serving a “stale” copy of content while it fetches a fresh version in the background. Combined with the edge and regional hit rates, this lets the system serve 95% of traffic without touching the origin, even if the content is slightly out of date.
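The stale-serving decision can be sketched as a comparison of a cached response's age against its freshness lifetime and its stale-while-revalidate window, for a header such as `Cache-Control: max-age=60, stale-while-revalidate=300`. The function and enum names here are hypothetical, and the sketch ignores validators, `Age` header arithmetic, and other directives.

```python
from enum import Enum


class Action(Enum):
    SERVE_FRESH = "serve from cache"
    SERVE_STALE_AND_REVALIDATE = "serve stale copy, refresh in background"
    FETCH_UPSTREAM = "wait for the next layer up"


def decide(age_s, max_age_s, swr_s):
    """Pick an action for a cached response that is age_s seconds old."""
    if age_s < max_age_s:
        return Action.SERVE_FRESH
    if age_s < max_age_s + swr_s:
        # Stale, but still inside the stale-while-revalidate window.
        return Action.SERVE_STALE_AND_REVALIDATE
    return Action.FETCH_UPSTREAM


print(decide(30, 60, 300).name)   # SERVE_FRESH
print(decide(120, 60, 300).name)  # SERVE_STALE_AND_REVALIDATE
print(decide(400, 60, 300).name)  # FETCH_UPSTREAM
```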

The 5% origin load is the critical number. It represents the steady-state stress the origin server must be sized to handle. If the CDN architecture collapses to two layers, that 5% can spike to as much as 50% during a failure. The Shield layer absorbs this spike, acting as a dam. When the Edge POPs experience a cache flush, their misses do not all reach the origin at once; they reach the Shield, which fetches the content once and distributes it back to the Edge.
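The "fetch once, distribute to everyone" behavior is often called request coalescing. A minimal sketch, assuming a thread-per-request model and a hypothetical `slow_origin` function standing in for the origin: the first request for a key becomes the leader and fetches, while concurrent requests for the same key wait for its result. Origin errors and expiry are deliberately not handled here.

```python
import threading
import time


class CoalescingShield:
    """Collapses concurrent misses for the same key into one origin fetch."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._cache = {}
        self._inflight = {}  # key -> Event, set when the leader finishes

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            done = self._inflight.get(key)
            leader = done is None
            if leader:
                done = threading.Event()
                self._inflight[key] = done
        if leader:
            try:
                value = self._fetch(key)
                with self._lock:
                    self._cache[key] = value
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()  # wake every waiting follower
            return value
        done.wait()
        with self._lock:
            return self._cache[key]


origin_calls = []


def slow_origin(key):
    origin_calls.append(key)
    time.sleep(0.05)  # simulate origin latency
    return f"body-of-{key}"


shield = CoalescingShield(slow_origin)
threads = [threading.Thread(target=shield.get, args=("/video.mp4",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(origin_calls))  # 1
```

Fifty simultaneous misses produce one origin request; without the in-flight table, each thread that arrived before the cache was filled would have triggered its own fetch.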

The closer

A misconfigured cache header is not a small mistake. It moves traffic from the 1.0x edge tier to the 4.0x origin tier, and from sub-50ms latency to over-200ms latency. The three-layer architecture only delivers its promise when the cache headers match the content. Akamai operates roughly 400,000 edge servers and Cloudflare operates POPs in 300+ cities not because more layers are always better, but because three layers are enough to hold the 5% origin load below the spike threshold while keeping the 85% edge hit rate within reach. The 85/10/5 split is an equilibrium, not a guarantee, and the origin server’s sizing is the bill the operator pays for getting the headers wrong.
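To make the header point concrete, here is a deliberately simplified check of how long a shared cache such as a CDN layer may reuse a response, based on its `Cache-Control` value. The function name is hypothetical, and the logic covers only a handful of directives from RFC 9111; real caches also apply heuristics, `Expires`, `Vary`, and more.

```python
def shared_cache_ttl(cache_control):
    """Seconds a shared cache may reuse the response without going upstream.

    Returns 0 when the response must not be stored or must be revalidated.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    # These directives keep a shared cache from reusing the response as-is.
    if any(d in directives for d in ("no-store", "private", "no-cache")):
        return 0

    # For shared caches, s-maxage takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        if name in directives:
            try:
                return max(0, int(directives[name]))
            except ValueError:
                return 0
    return 0  # conservative assumption: no explicit lifetime, no reuse


print(shared_cache_ttl("public, max-age=86400"))     # 86400
print(shared_cache_ttl("private, max-age=600"))      # 0
print(shared_cache_ttl("max-age=60, s-maxage=3600")) # 3600
```

A static image shipped with `private` or `no-store` by mistake gets a lifetime of 0 here, which is the case where every request for it falls through toward the origin tier.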