Content delivery networks (CDNs) are an essential part of the Internet’s infrastructure. However, few users fully understand what they are or what happens behind the scenes. You will often hear people recommend a CDN, or brag about using one, without any in-depth idea of how it works. Like any other technology, a content delivery network isn’t magic, and the way it works is fairly straightforward.
A user’s request for information through a web browser first passes through the Domain Name System (DNS). It is similar to looking up a contact in a phone book: the browser provides the domain name, and the DNS returns the IP address. Once the browser has the address, it contacts the web server directly for all subsequent requests. A small e-commerce site or blog, for example, typically has one domain name that maps to a single IP address. Large Internet applications, however, may have one domain name that maps to several IP addresses.
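The lookup step can be tried out with a few lines of Python. This is a minimal sketch that asks the operating system’s resolver for every address behind a name; the helper name `resolve_all` is made up for this example, and `localhost` is used only so that it runs without network access.

```python
import socket


def resolve_all(hostname: str) -> list[str]:
    """Return the unique IP addresses the local resolver gives for hostname."""
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]  # first element of the socket address is the IP
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# Substitute any domain name here; a large site may return several addresses.
print(resolve_all("localhost"))
```

Running it against the domain of a large site will often print more than one address, which is the "one domain name, several IP addresses" case described above.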
The laws of physics limit how fast one device can communicate with another. If you’re in China and connect to a server in the US, the connection will take longer than it would from within the US, simply because the data has farther to travel. For this reason, large companies place servers holding duplicates of their data in strategic locations around the world to lower transmission costs and improve the user experience. This arrangement is what we know as a content delivery network, and the servers closest to the end-user are called edge servers.
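A rough calculation shows why distance matters. The sketch below assumes that light in optical fiber travels at about 200,000 km/s (roughly two thirds of its speed in a vacuum) and uses round, illustrative distances rather than measured ones. Real round trips are slower still, because routes are not straight lines and every hop adds processing time.

```python
# Assumption: signal speed in optical fiber, about two thirds of c.
FIBER_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, ignoring routing and processing."""
    return 2 * distance_km * 1000 / FIBER_SPEED_KM_PER_S


# Illustrative distances, not measurements.
far_server_km = 10_000   # e.g. a server on another continent
edge_server_km = 100     # e.g. an edge server in a nearby city

print(f"far server:  at least {min_round_trip_ms(far_server_km):.0f} ms per round trip")
print(f"edge server: at least {min_round_trip_ms(edge_server_km):.0f} ms per round trip")
```

Under these assumptions the distant server costs at least 100 ms per round trip and the nearby edge server about 1 ms, and a single page load usually needs many round trips.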
The CDN handles the DNS request that the web browser makes for a domain name. It looks for the nearest set of servers that can handle the incoming request, and the DNS server then returns the IP address of the nearest edge server. So if a user in Virginia requests data, the DNS server picks the nearest server on the East Coast. Likewise, a user in California will receive the IP address of an edge server on the West Coast. However, the CDN usually sees the location of the user’s DNS resolver rather than of the user, so the assigned server may not be in the requestor’s geographic area.
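The selection step can be sketched as "pick the edge with the smallest distance to the client". The edge names and coordinates below are hypothetical, and plain geographic distance is only a stand-in here; a production CDN would likely weigh network measurements as well.

```python
import math

# Hypothetical edge locations as (latitude, longitude) in degrees.
EDGES = {
    "east-coast": (39.0, -77.5),
    "west-coast": (37.4, -122.1),
}


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def nearest_edge(client_location):
    """Name of the edge server closest to the client."""
    return min(EDGES, key=lambda name: distance_km(client_location, EDGES[name]))


print(nearest_edge((37.5, -77.4)))    # a client in Virginia
print(nearest_edge((37.8, -122.4)))   # a client in California
```

If the location passed in belongs to the DNS resolver instead of the user, the same function returns the edge nearest to the resolver, which is how a requestor can end up with a server outside their own region.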
After the initial step of finding the nearest edge server, some companies apply further CDN optimizations. For example, when the nearest server approaches full capacity, the CDN can redirect requests to an idle server or to one that is cheaper to operate. Either way, the CDN system still returns the IP address of the server that can process the request most efficiently.
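One way to picture this optimization is a selection rule that prefers the nearest server but skips servers that are close to capacity. The `Edge` record, the `pick_edge` function, and the 90% threshold are all assumptions made for this sketch, not a description of any particular CDN.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    name: str
    distance_km: float  # distance from the requestor
    load: float         # fraction of capacity in use, 0.0 to 1.0


def pick_edge(edges, max_load=0.9):
    """Nearest edge below max_load; if every edge is busy, the least loaded one."""
    available = [e for e in edges if e.load < max_load]
    if available:
        return min(available, key=lambda e: e.distance_km)
    return min(edges, key=lambda e: e.load)


edges = [
    Edge("nearby-but-busy", distance_km=50, load=0.97),
    Edge("farther-but-idle", distance_km=400, load=0.10),
]
print(pick_edge(edges).name)  # the busy nearby server is skipped
```

A cost-aware variant could sort the available servers by operating cost instead of distance; the structure stays the same.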
Accessing the Content
An edge server is a proxy cache, analogous to a browser cache. When a request reaches it, it checks whether the desired content exists in its cache. If it does, the server then checks whether the entry has expired. If the entry is still fresh, the server sends the content to the requestor.
On the other hand, if the content doesn’t exist in the cache or the entry has expired, the edge server sends a request to the origin server to fetch the data. Once it receives the information, it stores the content in the cache and then serves it to the requestor.
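The hit, expiry, and miss paths described above fit in a small class. This is a toy sketch under stated assumptions: `EdgeCache` and its parameters are invented names, the origin is any function that maps a path to content, and a single fixed lifetime is used for every entry.

```python
import time


class EdgeCache:
    """Toy proxy cache: serve fresh entries, otherwise fetch from the origin."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> content
        self.ttl_seconds = ttl_seconds
        self.clock = clock                          # injectable for testing
        self.entries = {}                           # path -> (content, expires_at)

    def get(self, path):
        entry = self.entries.get(path)
        if entry is not None:
            content, expires_at = entry
            if self.clock() < expires_at:
                return content  # cache hit and the entry is still fresh
        # Miss or expired entry: fetch from the origin, store, then serve.
        content = self.fetch_from_origin(path)
        self.entries[path] = (content, self.clock() + self.ttl_seconds)
        return content


def origin(path):
    print(f"origin hit for {path}")
    return f"contents of {path}"


cache = EdgeCache(origin, ttl_seconds=60)
cache.get("/logo.png")  # goes to the origin
cache.get("/logo.png")  # served from the edge cache
```

Only the first call reaches the origin; the second is answered from the cache until the entry expires.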
Yahoo! open-sourced the Apache Traffic Server, which it uses to manage its CDN traffic. For instance, it serves the YUI library files through its CDN with the help of a combo handler. A request containing the filenames is directed to a nearby edge server. If the content doesn’t exist on the edge, the origin server sends it to the edge, which then passes it on to the requestor.
A common misconception is that a CDN works like an FTP repository for static files. In fact, an edge server acts as a proxy for the origin server, and the origin decides what content is sent to the requestor. The origin server may run Node.js, Ruby, Java, or any other web stack, so it can do anything a web server can. The edge server merely requests the content from the origin server and then serves it to the end-user. Therefore, in our YUI example, the combo handler doesn’t live on any edge server, only on the origin server.
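The division of labor can be illustrated with a toy combo handler. Everything here is hypothetical: the file names, the `&`-separated query format, and the in-memory `FILES` table are stand-ins, and the sketch assumes that a combo handler concatenates the requested files into one response. The point is only that the edge treats the URL as an opaque cache key while the origin does the actual work.

```python
# Hypothetical in-memory stand-in for files stored on the origin server.
FILES = {
    "yui/yui-min.js": "/* yui */",
    "oop/oop-min.js": "/* oop */",
}


def combo_handler(query_string: str) -> str:
    """Origin side: concatenate the files named in an '&'-separated query."""
    names = [name for name in query_string.split("&") if name]
    return "\n".join(FILES[name] for name in names)


def edge_handle(url: str, cache: dict) -> str:
    """Edge side: cache by full URL; the combining logic lives at the origin."""
    if url not in cache:
        _path, _sep, query = url.partition("?")
        # Stands in for an HTTP request from the edge to the origin.
        cache[url] = combo_handler(query)
    return cache[url]


edge_cache = {}
print(edge_handle("/combo?yui/yui-min.js&oop/oop-min.js", edge_cache))
```

The edge never needs to know how to combine files, so the origin can change that logic without touching any edge server.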
Yahoo!’s performance guidelines specify that static files must have far-future Expires headers, for two reasons:
- It enables the caching of resources by the browser for a prolonged duration
- It facilitates the caching of content by the CDN for an extended time
What does this mean? It means that content providers can’t reuse a filename once the content changes. Copies of a file will be cached in at least two places, the browser and the edge server, anywhere in the world, and users will get the cached version instead of requesting the file from the origin server.
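To make "far-future" concrete, here is a minimal sketch that builds such headers. The one-year lifetime and the function name are assumptions for illustration; the `Cache-Control: max-age` header is included alongside `Expires` because caches commonly honor it as well, although the guideline quoted above mentions only `Expires`.

```python
from email.utils import formatdate

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # assumed "far future" lifetime


def far_future_headers(now_epoch: float, lifetime: int = ONE_YEAR_SECONDS) -> dict:
    """Build caching headers that let browsers and edge servers keep a file."""
    return {
        # formatdate(..., usegmt=True) produces an HTTP-style date ending in "GMT".
        "Expires": formatdate(now_epoch + lifetime, usegmt=True),
        "Cache-Control": f"public, max-age={lifetime}",
    }


import time

for name, value in far_future_headers(time.time()).items():
    print(f"{name}: {value}")
```

With headers like these, neither the browser nor the edge server will ask the origin for the file again until the lifetime runs out, which is exactly why a changed file needs a new name.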
The YUI library handles this by putting its version number in the directory path, so each release gets its own set of file URLs. Another common approach is to append an identifier to the filename, such as a version-control revision number or an MD5 hash of the content. Both techniques ensure that requestors receive the latest version of a file while every response keeps its far-future Expires header.
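The hash-based naming scheme is easy to sketch. The function below is an illustration, not any specific build tool’s behavior: the name `fingerprint`, the eight-character digest length, and the `name.hash.ext` layout are all choices made for this example.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short MD5 digest of the content before the file extension."""
    digest = hashlib.md5(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_name = fingerprint("js/app.js", b"console.log('v1');")
new_name = fingerprint("js/app.js", b"console.log('v2');")
print(old_name)
print(new_name)
print(old_name != new_name)  # changed content yields a new, uncached URL
```

Because the name changes whenever the content does, pages that reference the new name bypass every cached copy of the old file, and unchanged files keep their names and stay cached.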
A CDN plays a significant part on the Internet, and you can expect that role to grow. Today, companies are exploring ways to move more functionality to the edge server to give end-users a better experience. Edge Side Includes (ESI) is one such technique, designed to serve parts of a page from the cache. A good understanding of what a CDN is and how it works can unlock better performance for your users. If you want your website visitors to have the best possible experience, consider placing your content on a content delivery network today.