Caching, Zipping, and (Amazon CloudFront) CDN For A Rails App

18 June 2010

This is Article #4 of a 4-part series. For a good primer, check out the first two articles listed below. For the reasoning and analysis behind the "Recommended" option in this article, check out Part 3, How to Combine GZip + CDN for Fastest Page Loads. Otherwise, jump right in!

In this article, we're going to speed up our Rails application by up to 75%, simply by optimizing our Rails asset host. We're going to serve our components (stylesheets, javascripts, images, etc.) from a combination of our app's server and Amazon CloudFront (Option A, recommended), or entirely from CloudFront (Option B - easier).

The best option for you may depend on your specific needs, but I'll cover both processes below. For an in-depth analysis of why Option A is recommended over Option B, see the previous article in this series, How to Combine GZip + CDN for Fastest Page Loads.

Prerequisites

Cached Stylesheets and Javascripts

Another way to reduce page load time is to combine all of your components into as few files as possible. In other words, combine all of your stylesheets into a single CSS file, and likewise with your javascripts. Recall from the previous article that each request takes 50-150ms, not including the response and download time. If you have 10 separate javascripts requested one after another, this equates to 0.5-1.5 seconds just to request the files (not to mention the time to download them). If you combine all of the files into one, you need just one request to get the same amount of data.
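The arithmetic above can be sketched in a few lines of Ruby. This is only a back-of-the-envelope estimate: it assumes the requests happen one after another and uses the 50-150ms per-request figure quoted above; the method name is made up for illustration.

```ruby
# Rough request-overhead estimate, using the 50-150 ms per-request figure above.
REQUEST_OVERHEAD_MS = (50..150)

# Returns [best_case, worst_case] overhead in seconds for file_count
# requests made one after another (an assumption of this sketch).
def request_overhead_seconds(file_count)
  [REQUEST_OVERHEAD_MS.min, REQUEST_OVERHEAD_MS.max].map do |ms|
    file_count * ms / 1000.0
  end
end

separate = request_overhead_seconds(10)
combined = request_overhead_seconds(1)

puts "10 files: #{separate.first}-#{separate.last} s"
puts "1 file:   #{combined.first}-#{combined.last} s"
```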

Luckily, this is easy in Rails: simply add :cache => 'cached-file-name' to your stylesheet_link_tag and javascript_include_tag in your application layout. For example:
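As a minimal sketch, assuming a Rails 2.x layout at app/views/layouts/application.html.erb, the tags might look like the following. The file names and cache names here are placeholders (they match the lists used later in this article), so substitute your own:

```erb
<%# Placeholder file names; :cache names the combined file Rails will write %>
<%= stylesheet_link_tag 'reset', 'application', :cache => 'all-app-stylesheets' %>
<%= javascript_include_tag 'jquery', 'jquery-ui', 'application', :cache => 'all-app-javascripts' %>
```

With caching enabled, this should produce a single link to all-app-stylesheets.css and a single script tag for all-app-javascripts.js instead of one tag per file.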

Now, as long as the following line is set to true in your environment.rb, or more likely in production.rb, Rails will either load your combined files in the layout, or create and load them if they don't already exist.

config.action_controller.perform_caching = true

Simply packing all stylesheets and javascripts into one file each reduced the page load time of one of our production applications from 10.1 to 8.3 seconds, an 18% reduction from this change alone.

Creating an Amazon AWS Account

If you do not yet have an Amazon AWS account, you will need to create one and enable the S3 and CloudFront services. See this writeup on creating and setting up an Amazon S3 account for helpful instructions.

Set Up S3 Buckets

Once you've signed up for your Amazon AWS account and activated S3 and CloudFront, you'll want to set up 4 S3 buckets for your application, using Amazon's S3 management console.

We're going to set up 4 buckets and CDN distributions because some old browsers still impose an artificial limit of 2 concurrent connections per domain, meaning our components will take longer to download from Amazon if they can only be downloaded 2 at a time. By creating 4 different domains pointing to 4 different buckets/distributions, we allow our components to download up to 8 at a time (4 domains x 2 connections) in the browsers that still enforce this limit.

When naming your S3 buckets, avoid using periods if you would like the option of accessing your components directly from S3 over HTTPS. Amazon has a trusted wildcard SSL certificate for *.s3.amazonaws.com, but a wildcard covers only one subdomain level.

If you name your bucket cdn0.yourapp.com, then your components will have the URL https://cdn0.yourapp.com.s3.amazonaws.com/stylesheet.css. This will give you a warning message saying the connection is not trusted, because the browser treats the periods in your bucket name as subdomain separators. In this case, com.s3.amazonaws.com alone would be trusted, but the additional levels in front of it (yourapp and cdn0) are not covered by the certificate.
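To see why the periods matter, here is a simplified sketch of the wildcard rule in Ruby. It only models the "one label per asterisk" behavior and is not how a browser actually validates certificates; the bucket name yourapp-cdn0 is a hypothetical period-free alternative.

```ruby
# Simplified wildcard-certificate check: "*" stands for exactly one DNS label.
# (Real TLS validation does much more; this only illustrates the label rule.)
def wildcard_match?(pattern, hostname)
  pattern_labels = pattern.downcase.split(".")
  host_labels    = hostname.downcase.split(".")
  return false unless pattern_labels.length == host_labels.length

  pattern_labels.zip(host_labels).all? do |pattern_label, host_label|
    pattern_label == "*" || pattern_label == host_label
  end
end

CERT_PATTERN = "*.s3.amazonaws.com"

# One hypothetical bucket name without periods, and the one from the text.
%w[yourapp-cdn0 cdn0.yourapp.com].each do |bucket|
  host = "#{bucket}.s3.amazonaws.com"
  verdict = wildcard_match?(CERT_PATTERN, host) ? "trusted" : "not trusted"
  puts "#{host}: #{verdict}"
end
```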

Set Up CloudFront Distributions

Once your S3 buckets are created, click over to the CloudFront tab and create one distribution for each S3 bucket. You can type any comment to help you quickly identify each distribution.

Create CNAME Records (optional)

Once you've created your 4 CloudFront distributions, you may create a CNAME record for each distribution. This allows you to serve files from CloudFront using your own asset subdomains, like cdn0.yourapp.com, instead of raNDomString1234.cloudfront.net. We'll use the following format of cdn%d.yourapp.com, where %d stands for digits 0-3:

cdn0.yourapp.com
cdn1.yourapp.com
cdn2.yourapp.com
cdn3.yourapp.com

Install Rails S3 Synch Plugin

This plugin adds some Capistrano recipes to synch our application's public directory with our four S3 buckets automatically every time we deploy our app. See Spatten Design's documentation for more information. I've made some updates to their original plugin to properly set the Cache-Control and Expires headers for our assets on S3, as well as to properly set the Content-Encoding header for gzipped assets.

Update: I've updated the S3 Synch Plugin further; it can now handle unique S3 buckets for different Rails environments (e.g. one set of buckets for production and another for staging). Be sure to update your synch_s3_asset_host.yml file as shown below.

Installing AWS-S3 Gem

The synch_s3_asset_host plugin requires the AWS-S3 gem, so add the following to your environment.rb:
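Assuming the Rails 2.x config.gem mechanism, the dependency line would look something like this inside the Rails::Initializer.run block. The :lib option is there because the gem is named aws-s3 but is required as aws/s3:

```ruby
# config/environment.rb (fragment; goes inside the Rails::Initializer.run block)
Rails::Initializer.run do |config|
  # ... your existing settings ...
  config.gem "aws-s3", :lib => "aws/s3"
end
```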

...and then run the following from the terminal to install the S3 Synch plugin's gem dependency:

sudo rake gems:install

Configure S3 Synch Plugin

Create a config/synch_s3_asset_host.yml file like this:

Update: The "production" part in the file above has been added for my latest update of the S3 Asset Synch Plugin.

Add S3 Synch to Deployment

Now, in your Capistrano deploy.rb script, add the following line to the :deploy namespace:

namespace :deploy do
  ...
  before "deploy:symlink", "s3_asset_host:synch_public"
  ...
end

...and then add the :asset_host_syncher => true flag to the :web role:
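A sketch of that role declaration, where your.server.com is a placeholder for your own web server's hostname:

```ruby
# config/deploy.rb (fragment); the hostname is a placeholder
role :web, "your.server.com", :asset_host_syncher => true
```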

Option A: Compressible Assets from App Server, Images from CloudFront (recommended)

For more detail about why this method is recommended, see the previous article in this series.

Configure Rails Asset Host

Use the following configuration in your production.rb file to configure the way Rails writes the URLs for asset_tags:

If you did not configure custom CNAME records earlier, your Rails asset_host configuration would be a bit simpler:

Notice the source.hash % 4 code above. This ensures that the same component is always served from the same subdomain to take full advantage of client-side caching for that component, rather than randomly selecting from which subdomain to serve each component on each page load.
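The host-selection idea can be sketched in plain Ruby, outside of Rails. This is an illustration under a few assumptions: only .css and .js files are treated as compressible, the subdomains are the cache.yourapp.com and cdn0-3.yourapp.com names used in this article, and Zlib.crc32 is swapped in for source.hash. The swap matters on Ruby 1.9 and later, where String#hash is seeded differently in every process, so source.hash % 4 would not pick the same subdomain across server processes or restarts.

```ruby
require "zlib"

# Extensions we assume should be gzipped and served by the app server.
COMPRESSIBLE = %w[.css .js].freeze

# Chooses an asset host for a given asset path.
ASSET_HOST = lambda do |source|
  # Drop any query string so File.extname sees the real extension.
  path = source.split("?").first
  if COMPRESSIBLE.include?(File.extname(path))
    "http://cache.yourapp.com"
  else
    # Zlib.crc32 stands in for source.hash so the choice is stable across processes.
    "http://cdn#{Zlib.crc32(path) % 4}.yourapp.com"
  end
end

puts ASSET_HOST.call("/stylesheets/all-app-stylesheets.css")
puts ASSET_HOST.call("/images/logo.png")
```

In a Rails 2.x app, a proc like this would be assigned to config.action_controller.asset_host in production.rb.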

For more information on configuring Rails's asset_host, see the documentation for Base.asset_host

Create A Record

We will also need to create a DNS A record for the cache.yourapp.com subdomain, which points to your application server's IP address.

Configure Apache

Now we need to configure Apache to accept incoming requests to our "cache" subdomain, setting the appropriate far-future Expires and Cache-Control headers. We also need to tell Apache to automatically compress and serve any compressible filetype on the fly. Add this to your site's Apache conf file:

...
   # gzip html, css, and js on the fly (requires mod_deflate)
   AddOutputFilterByType DEFLATE text/html text/css application/x-javascript application/javascript

   <VirtualHost *:80>
      ServerName cache.yourapp.com
      DocumentRoot /path/to/yourapp/public
      # far-future Expires and Cache-Control headers for static assets (requires mod_expires)
      <FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
         ExpiresActive On
         ExpiresDefault "access plus 1 year"
      </FilesMatch>
      # disable ETags for this subdomain
      FileETag None
   </VirtualHost>
...

Also note that we turned off the ETag functionality for this subdomain. ETags ("entity tags") are supposed to be a more flexible mechanism for validating and invalidating cached assets than the file's last-modified date. See Yahoo's ETag description for more info.

However, an ETag's uniqueness usually depends not just on the file, but also on the server it's being served from. Suppose your assets are copied to several asset domains on different servers. If a file is downloaded and cached from one server, and the next page then requests the same asset from another asset domain, the new ETag will not match the ETag of the cached file, so the browser will re-download the file instead of using its cached copy.

Furthermore, Rails does a very good job of appending each asset's last-modified timestamp to its URL (using the asset tag helpers), which effectively caches and invalidates the assets for you as necessary. So, we're much better off just turning ETags off for our Rails app.
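To illustrate the timestamp idea, here is a small plain-Ruby sketch. It is not the Rails implementation, just the same principle: the URL carries the file's modification time, so a changed file gets a new URL and the browser fetches it fresh. The method name and paths are made up for the example.

```ruby
require "tempfile"

# Appends the file's modification time (as an integer) to its public URL,
# similar in spirit to what the Rails asset tag helpers do.
def timestamped_url(public_path, file_on_disk)
  "#{public_path}?#{File.mtime(file_on_disk).to_i}"
end

Tempfile.create(["application", ".css"]) do |file|
  file.write("body { margin: 0; }")
  file.flush

  before = timestamped_url("/stylesheets/application.css", file.path)

  # Simulate a deploy that changes the file one hour later.
  later = File.mtime(file.path) + 3600
  File.utime(later, later, file.path)
  after = timestamped_url("/stylesheets/application.css", file.path)

  puts before
  puts after
  puts "URL changed: #{before != after}"
end
```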

Now we need to make sure the appropriate Apache modules are enabled and restart Apache.

sudo a2enmod deflate
sudo a2enmod expires
sudo /etc/init.d/apache2 force-reload

Option B: Serve Everything from CloudFront (easier, but not recommended)

For more detail about why this is not recommended, see the previous article in this series. Basically, though, it's because it requires you to make one of the following compromises:

That being said, if this is acceptable for you, this method is simpler to set up and configure.

Configure Rails Asset Host

Add the following to your production.rb:
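A minimal sketch of that setting, assuming the cdn0-3.yourapp.com CNAME records created earlier. When the asset host string contains %d, Rails replaces it with a number from 0 to 3 for each asset:

```ruby
# config/environments/production.rb (fragment)
# %d is expanded by Rails to 0-3, spreading assets across the four subdomains.
config.action_controller.asset_host = "http://cdn%d.yourapp.com"
```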

Again, if you did not configure custom CNAME records earlier, your Rails asset_host will be a bit simpler:

Pre-compile Cached Stylesheet and Javascript File

If you're serving every component from CloudFront, you will need to pre-compile your stylesheets and javascripts on every deploy. Otherwise, Rails will try to compile and save the files to your application server, but try to serve them from S3 (where they won't exist).

To solve this, we'll add some Capistrano scripts to our deploy.rb to compile our files for us before the synch_s3_asset_host plugin copies our public directory over to our S3 buckets. But this means we'd have to copy the list of asset files to be compiled into our Capistrano script, as well as listing them in our application.html.erb layout. To DRY things up a little, we're going to create some project-wide constants:

lib/assets_for_cache.rb

module AssetsForCache
   JAVASCRIPT_FILES = ['jquery', 'jquery-ui', 'application']
   STYLESHEET_FILES = ['reset', 'application']
   JAVASCRIPT_CACHE_FILE = 'all-app-javascripts'
   STYLESHEET_CACHE_FILE = 'all-app-stylesheets'
end

And then replace your javascript_include_tag and stylesheet_link_tag in your application layout with the following:
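One way to write those tags, as a sketch that assumes lib/assets_for_cache.rb is loaded by the app (for example with a require in environment.rb). The file list and the :cache option are combined into a single array before splatting, which also works on Ruby 1.8:

```erb
<%# Builds the same argument list as before, but from the shared constants %>
<%= stylesheet_link_tag(*(AssetsForCache::STYLESHEET_FILES + [{ :cache => AssetsForCache::STYLESHEET_CACHE_FILE }])) %>
<%= javascript_include_tag(*(AssetsForCache::JAVASCRIPT_FILES + [{ :cache => AssetsForCache::JAVASCRIPT_CACHE_FILE }])) %>
```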

Add this to your deploy.rb script:

namespace :assets do
  # Reuse the same file lists that the application layout uses
  require File.dirname(__FILE__) + '/../lib/assets_for_cache.rb'
  set :stylesheets, AssetsForCache::STYLESHEET_FILES
  set :javascripts, AssetsForCache::JAVASCRIPT_FILES

  task :package_cached_assets do
    package_stylesheets
    package_javascripts
  end

  # Concatenate the stylesheets into one cached file, then write a gzipped copy
  task :package_stylesheets, :roles => :web do
    sudo %{rm -f #{release_path}/public/stylesheets/#{AssetsForCache::STYLESHEET_CACHE_FILE}.css}
    stylesheets.each do |stylesheet|
      run %{cat #{release_path}/public/stylesheets/#{stylesheet}.css >> \
            #{release_path}/public/stylesheets/#{AssetsForCache::STYLESHEET_CACHE_FILE}.css}
    end
    run %{gzip -c #{release_path}/public/stylesheets/#{AssetsForCache::STYLESHEET_CACHE_FILE}.css > #{release_path}/public/stylesheets/#{AssetsForCache::STYLESHEET_CACHE_FILE}.css.gz}
  end

  # Same steps for the javascripts
  task :package_javascripts, :roles => :web do
    sudo %{rm -f #{release_path}/public/javascripts/#{AssetsForCache::JAVASCRIPT_CACHE_FILE}.js}
    javascripts.each do |javascript|
      run %{cat #{release_path}/public/javascripts/#{javascript}.js >> \
            #{release_path}/public/javascripts/#{AssetsForCache::JAVASCRIPT_CACHE_FILE}.js}
    end
    run %{gzip -c #{release_path}/public/javascripts/#{AssetsForCache::JAVASCRIPT_CACHE_FILE}.js > #{release_path}/public/javascripts/#{AssetsForCache::JAVASCRIPT_CACHE_FILE}.js.gz}
  end
end
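If you'd like to try the packaging steps locally before wiring them into a deploy, the same "concatenate, then gzip" idea can be sketched in plain Ruby. This is a stand-alone illustration, not part of the Capistrano recipe: the package_assets method is made up for the example, and it runs against throwaway files in a temp directory rather than your real public folder.

```ruby
require "tmpdir"
require "zlib"

# Concatenates the named files (in order) into one cache file
# and writes a gzipped copy beside it. Returns the cache file path.
def package_assets(dir, names, cache_name, ext)
  cache_file = File.join(dir, "#{cache_name}.#{ext}")
  combined = names.map { |name| File.read(File.join(dir, "#{name}.#{ext}")) }.join
  File.write(cache_file, combined)
  Zlib::GzipWriter.open("#{cache_file}.gz") { |gz| gz.write(combined) }
  cache_file
end

Dir.mktmpdir do |dir|
  # Stand-in stylesheets; a real app would already have these in public/stylesheets.
  File.write(File.join(dir, "reset.css"), "* { margin: 0; }\n")
  File.write(File.join(dir, "application.css"), "body { color: #333; }\n")

  cache_file = package_assets(dir, %w[reset application], "all-app-stylesheets", "css")
  puts File.read(cache_file)
  puts "gzipped copy exists: #{File.exist?("#{cache_file}.gz")}"
end
```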

...and then add this to the :deploy namespace in your deploy.rb file, before calling the s3_asset_host sync script:

namespace :deploy do
  ...
  before "deploy:symlink", "assets:package_cached_assets"
  before "deploy:symlink", "s3_asset_host:synch_public"
  ...
end

Conclusion

Now simply save your project and deploy it! The first deploy will take quite a while, as your entire /public directory will be copied to all 4 buckets on Amazon S3, one at a time. But after that, it's a painless process.

If you have any files or directories in your public folder that are not assets to be copied to S3 (like a WordPress blog or whatever), you can add them to the --exclude list in the synch_s3_asset_host plugin on line 186 of vendor/plugins/synch_s3_asset_host/recipes/synch_s3_asset_host.rb.

Whether you chose the "recommended" or the "easier" option, you should immediately notice a significant increase in the performance of your Rails app. Thanks for sticking with me through this 4-part series! Please let me know if you have any thoughts, questions, or feedback in the comments.

About the author:

Steve Schwartz // Owner of Alfa Jango Web-based Software, creator of RateMyStudentRental & LeadNuke, engineer, hacker, rubyist, guitarist, aspiring racecar driverist.


