How to Combine GZip + CDN for Fastest Page Loads
This is Article #3 of a 4-part series. For a good primer, check out the first two articles listed below. Otherwise, jump right in!
- The Importance of Page Load Speed
- Improve Page Load Speed (by 80%) by Improving Component Load Speed
- How to Combine GZip + CDN for Fastest Page Loads
  - Pitfall of Amazon S3 + CloudFront as a CDN: No GZipping
  - Possible Solutions (that don’t work)
    - Intercept CloudFront requests with app server and rewrite
    - Detect requests with Rails and write asset URLs accordingly
  - The Solution: Hybrid Gzipping/CloudFront Depending on Asset-type
In my last post, I discussed the three techniques used to improve asset load speed. In this post, I will discuss how to combine the use of GZipping and a Content Delivery Network (CDN) for the fastest possible page loads.
Pitfall of Amazon S3 + CloudFront
Everyone’s favorite CDN these days is Amazon’s CloudFront service, which serves files directly from Amazon’s scalable “Simple Storage Service”, Amazon S3. It is very easy to work with, has widespread support in Ruby gems and plugins (and countless other libraries), and is very inexpensive with its pay-as-you-go billing.
However, there is one large pitfall to using Amazon S3 + CloudFront: neither S3 nor CloudFront detects whether a request accepts GZip encoding, and neither will compress a response for you. It would seem that we now need to decide whether we’ll do without GZipping or do without a CDN. Not so! There is another way.
Possible Solutions (that don’t work)
Amazon S3 and CloudFront servers do not detect whether incoming requests accept GZip encoding, so they cannot GZip and serve components on the fly. We can, however, upload both a compressed and an uncompressed copy of each component ourselves. Then, it’s simply a matter of figuring out whether we should link to the compressed or the uncompressed copy when the user visits the page.
Detect requests with application and write asset URLs accordingly
This solution is similar to the “intercept and rewrite” approach, except that it attacks the problem one step earlier in the workflow. Instead of linking to the asset through our own server, we’ll link directly to CloudFront. This time, however, we’ll have our application (whether it be Ruby, PHP, Python, or whatever) detect whether the request’s Accept-Encoding header allows GZip, and write the asset tag to point at the compressed or the uncompressed copy accordingly.
I won’t go into detail about how to actually accomplish this, because the truth is, this won’t work either.
Why this doesn’t work
This will only work as long as your code is run dynamically every time a user loads the page. That means, once you implement this strategy, you no longer have the option to cache the page. Ever.
Sure, you could probably come up with some system that creates two versions of each cached page (one with gzipped links and one without), but that will add a lot of complexity to your server setup and filesystem, and it’s just too much trouble. So, let’s move on to another solution.
Intercept CloudFront requests with app server and rewrite
Now this solution may seem clever, but let’s see if you can figure out why it won’t work. The idea here is that rather than link to a stylesheet, for example, directly on CloudFront, we’ll instead link to our own server, which will read the request and redirect to either the compressed or the uncompressed stylesheet on CloudFront as appropriate.
And then the Apache configuration for the compressed.yourdomain.com virtual host would look like this:
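<VirtualHost *:80>
  # The *:80 listen address is an assumption; adjust it to match your server.
  ServerName compressed.yourdomain.com
  DocumentRoot /home/user/yourapp/current/public
  RewriteEngine On
  # Redirect only when the browser advertises gzip support...
  RewriteCond %{HTTP:Accept-Encoding} gzip
  # ...and the requested file exists locally and is not empty.
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} -s
  # Send the browser to the pre-compressed (.gz) copy on CloudFront.
  RewriteRule ^(.+) http://xxxxxx.cloudfront.net$1.gz
</VirtualHost>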
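To make the decision behind those rewrite rules explicit, here is a minimal Python sketch of the same idea. It is only an illustration, not part of the Apache setup: the function names are made up for this example, the host is the same placeholder used in the configuration, and it skips the check that the file exists locally. It is also slightly stricter than the Apache condition, which simply matches the text "gzip" anywhere in the header.

```python
# Placeholder CDN host, copied from the Apache rule above.
CDN_HOST = "http://xxxxxx.cloudfront.net"


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with a non-zero q-value."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        # gzip is listed; honor an explicit q-value if one is present.
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(value) > 0
                except ValueError:
                    return False
        return True
    return False


def redirect_target(path: str, accept_encoding: str) -> str:
    """Pick the CloudFront URL to redirect to for a requested asset path."""
    suffix = ".gz" if accepts_gzip(accept_encoding) else ""
    return f"{CDN_HOST}{path}{suffix}"
```

A request for a hypothetical /stylesheets/style.css from a browser sending "gzip, deflate" would be redirected to the .gz copy, while a browser that sends no Accept-Encoding header would be pointed at the uncompressed file.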
Why this doesn’t work
Remember in the last article, one of the added benefits of off-loading your assets to a CDN is that your server no longer must listen for and respond to asset requests. This solution rescinds that benefit; even though the asset download still takes place between the CDN server and the user’s browser, the initial request must still go through your server to be resolved to the CDN.
Furthermore, each component request now requires two DNS lookups instead of one, which adds to the request latency (though the request latency is small compared to the download time in the request/download cycle).
But the real reason this won’t work well is that it disrupts the magic that makes CDNs fast. A CDN is beneficial primarily because it serves files faster by choosing, for each request, the server (or “service node”) that is closest in proximity to the user. CDNs are able to estimate the closest server in the CDN using a variety of techniques, including reactive probing, proactive probing, and connection monitoring. (See Content Networking Techniques for more info)
By inserting your server (acting as a proxy) into the request cycle between the user’s computer and the CDN, you may cause the CDN to choose a sub-optimal service node for the delivery of content directly to the user. If the CDN probes the network from the request side, it will most likely choose the edge node location closest to your server rather than to the user’s computer, completely negating the benefit of using the CDN in the first place.
To illustrate this point, consider the typical request/download cycle for a javascript file served from your application’s server:
Below is a simplified diagram of the typical request/response cycle for a javascript file when using a CDN to serve the component.
This final diagram depicts the request/response cycle when delivering components through the CDN with your application server acting as a proxy (so that your app server can read the request and tell the CDN whether to serve the unzipped or the zipped component).
Notice in the diagram above, the CDN should have chosen the service node closest to the User, so that the javascript file would have less distance to travel and would thus download the fastest. Instead, it chose the node closer to the application server that proxied the request to the CDN.
The graph below compares download times for the user from my server (located in St. Louis, Missouri), from a server in Amazon CloudFront’s CDN, and from CloudFront with my server acting as a proxy. I performed this comparison from my own computer here in Ann Arbor, MI, while my buddy, Dave Leal, downloaded the file from his computer in Portugal.
The Solution: Hybrid Gzipping/CloudFront Depending on Asset-type
At this point, it may seem like we can choose between two alternatives:
- Move all assets to Amazon CloudFront and forget about GZipping
- Keep components self-hosted and configure our server to detect incoming requests and perform on-the-fly GZipping as appropriate
In our last post, we saw that Gzipping our components can compress them down to ~25% of their original size, which means they’ll transfer 4X faster. And in this post, we see that serving components from Amazon CloudFront can transfer component files ~2X faster*.
The following is the simple formula for Download Time. You can see that File Size is directly proportional to Download Time (so cutting File Size in half cuts Download Time in half). And Download Speed is inversely proportional to Download Time (so doubling Download Speed also cuts Download Time in half):
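Written out, the relationship described here looks like the following. This is a reconstruction from the description, and the symbols T, S, and v are just labels chosen for this sketch. The two worked cases plug in the rough figures from above (~25% file size from GZipping, ~2X speed from CloudFront), with T_0 standing for the download time of the uncompressed file from our own server.

```latex
% T = download time, S = file size, v = download speed
T = \frac{S}{v}

% GZipped, served from our own server (file size ~25% of the original):
T_{\text{gzip}} \approx \frac{0.25\,S}{v} = 0.25\,T_0

% Uncompressed, served from the CDN (download speed ~2X):
T_{\text{cdn}} \approx \frac{S}{2\,v} = 0.5\,T_0
```

Under these rough numbers, GZipping alone gets a compressible file to the user in about half the time that the CDN alone would.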
Ideally we’d be able to do both (and some other more expensive Content Delivery Networks actually will allow you to). But if we must choose, compressible file types gain much more from being served compressed than from being served uncompressed from a CDN edge location. So, we will serve compressible file types (stylesheets, javascripts, and static HTML files) from our own server, GZipped.
However, images are already compressed by their image encoding, so GZipping them on our server does not meaningfully reduce their file size. So, we may as well allow images to benefit from the 2X speed improvement by serving them straight from our Amazon CloudFront CDN.
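The rule itself is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not the Rails implementation promised for Part 4: the host names, the extension list, and the asset_url helper are all assumptions made up for this example.

```python
import posixpath

# Hypothetical hosts: our own (GZip-enabled) server and the CloudFront distribution.
OWN_HOST = "http://www.yourdomain.com"
CDN_HOST = "http://xxxxxx.cloudfront.net"

# Text-based file types that shrink a lot when GZipped (assumed list).
COMPRESSIBLE_EXTENSIONS = {".css", ".js", ".html", ".htm"}


def asset_url(path: str) -> str:
    """Serve compressible assets from our own server and everything else from the CDN."""
    extension = posixpath.splitext(path)[1].lower()
    host = OWN_HOST if extension in COMPRESSIBLE_EXTENSIONS else CDN_HOST
    return f"{host}{path}"
```

With this helper, a stylesheet path resolves to our own server, where it can be GZipped on the fly, while an image path resolves to CloudFront.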
Using this solution for hosting/serving components, we’ve been able to reduce page load time by 75% on several of our sites.
If you have a Ruby on Rails application, implementing this solution is easy, and won’t take you more than an hour or so. Stay tuned for Part 4: Caching, Zipping, and (Amazon CloudFront) CDN For A Rails App.
Tags: Amazon CloudFront, Amazon S3, assets, caching, CDN, Components, GZipping

June 11th, 2010 at 7:04 pm
You could do it with javascript…
June 11th, 2010 at 7:41 pm
Missing from the “3 ways to reduce page load time” is one of the most important:
Reduce the number of assets on each page
For small assets it takes as long to connect to the server as it does to download the file (especially if your ping times to the server are long; yes, pipelining makes a difference but that’s a crap shoot and often doesn’t work). Many “modern” pages load 30 or more JS and CSS assets and take an eternity to render as a result (also, depending on where in the page the loads are triggered, some browsers may not render anything until all the assets are loaded). So load your JS and CSS from links that concatenate as many files as possible into one request.
You should use a CDN but the nonsense about redirecting through your own origin wasn’t worth mentioning even if to be dismissed.
June 11th, 2010 at 8:28 pm
@Bozo: Yes, absolutely reducing the number of assets, or components, on each page is HUGE, and has been written about constantly all over the web. That is something I was going to touch on in the next article, though I didn’t think it would need more than a sentence or two. Perhaps, I should have at least mentioned it off-hand in this or one of the previous articles.
However, I do disagree with your assertion that redirecting through your own origin was not worth mentioning. If you think about it, it actually could be a very worthwhile method, and with some other CDNs it could actually work. I’ll quickly address the reasons it doesn’t work, and show you how it almost could work (and thus why I felt it was worth addressing).
Issue 1: If you keep the request proxying through your own origin, you haven’t removed the bottleneck (your own app server) from the chain.
-Why it could work: True, your own server is still in the request path, but it’s no longer involved in the actual file download. If you look at the amount of time required to forward a request (as in forward the request from your origin server to the CDN), it’s actually very small compared to the amount of time it takes to download the file. And yes, that small amount of time does add up if you have 50 different requests to 50 different assets, but if not, this could still work.
Issue 2: It causes the CDN to choose a sub-optimal edge location from which to serve the file.
-Why it could work: CDNs actually use various methods and algorithms for choosing the best service node from which to serve the file. Now of course the CDN will choose a sub-optimal node if it pings backward through the request path, which has to go through your proxying server.
However, hypothetically a CDN could use something as simple as IP geocoding to pick the closest location. Now when your server proxies the request to the CDN, it actually does pass through the IP address of the original request. For this reason, it’s entirely plausible that this method would not mess up the CDN’s service-node decision. This is why I had to perform benchmarks to see if this actually affected Amazon CloudFront (hence the graph). Sure enough, it does mess up the way Amazon CF works. For other CDNs, it may not.
June 11th, 2010 at 8:30 pm
@W. Andrew Loe III: Yes, you could do it with javascript. Of course, that will only work if the user’s browser supports javascript (and has it enabled). Though if it doesn’t accept Gzip encoding, there is a good chance it doesn’t fully support javascript either.
June 12th, 2010 at 4:31 am
I always enjoy learning what other people think about Amazon Web Services and how they use them. Check out my very own tool CloudBerry Explorer that helps to manage S3 on Windows. It helps automate setting up CloudFront distributions and Gzip headers among other things.
July 7th, 2010 at 1:47 am
[quote]
This will only work as long as your code is run dynamically every time a user loads the page. That means, once you implement this strategy, you no longer have the option to cache the page. Ever.
[/quote]
That is not quite true — you can still cache dynamically generated pages, with links pointing out to either compressed or uncompressed assets, simply by setting Vary: Accept-Encoding header in the response.
That way browsers and intermediate proxies will cache and deliver different page versions based on the value of the Accept-Encoding header, which is exactly what you want. Best of all, it doesn’t really add any complexity to the setup — it simply uses standard features of the HTTP protocol.
July 7th, 2010 at 9:00 am
@Aleks: I think what you are saying is that you can cache two versions of each page, one for browsers that accept encoding, and one for browsers that don’t. And yes, you absolutely can. Then you just need to configure your server, such as Apache, to serve the appropriate cached page.
This does add to the complexity of the setup though in two ways.