Alfa Jango Blog Engineering, Software, and Entrepreneurship

Posts Tagged ‘Optimization’

Caching, Zipping, and (Amazon CloudFront) CDN For A Rails App

Friday, June 18th, 2010

This is Article #4 of a 4-part series. For a good primer, check out the first two articles listed below. For the reasoning and analysis behind the “Recommended” option in this article, check out Part 3, How to Combine GZip + CDN for Fastest Page Loads. Otherwise, jump right in!

  • The Importance of Page Load Speed
  • Improve Page Load Speed (by 80%) by Improving Component Load Speed
  • How to Combine GZip + CDN for Fastest Page Loads
  • Caching, Zipping, and (Amazon CloudFront) CDN For A Rails App
    • Prerequisites
      1. Cached stylesheets and javascripts
      2. Creating an Amazon AWS Account
    • Setup S3 Buckets
    • Setup CloudFront Distributions
    • Create CNAME records (optional)
    • Install Rails S3 Synch Plugin
      1. Installing AWS-S3 Gem
      2. Configure S3 Synch Plugin
      3. Add S3 Synch to Deployment
    • Option A: Compressible Assets from App Server, Images from CloudFront (recommended)
      1. Configure Rails Asset Host
      2. Create A-name Record
      3. Configure Apache
    • Option B: Serve Everything from CloudFront (easier, but not recommended)
      1. Configure Rails Asset Host
      2. Pre-compile Cached Stylesheet and Javascript File
    • Conclusion

In this article, we’re going to speed up our Rails application by up to 75%, simply by optimizing our Rails asset host. We’re going to serve our components (stylesheets, javascripts, images, etc.) from a combination of our app’s server and Amazon CloudFront (Option A, recommended), or entirely from CloudFront (Option B – easier).

The best option for you may depend on your specific needs, but I’ll cover both processes below. For an in-depth analysis of why Option A is recommended over Option B, see the previous article in this series, How to Combine GZip + CDN for Fastest Page Loads.

Prerequisites

Cached Stylesheets and Javascripts

Another way to reduce page load time is to combine all of your components into as few files as possible. In other words, combine all of your stylesheets into a single css file, and likewise with your javascripts. Recall from the previous article that each request takes 50-150ms, not including the response and download time. If you have 10 separate javascripts, this equates to 0.5-1.5 seconds just to request the files (not to mention all the time to download them). If you can combine all of the files into one, you need just one request to get the same amount of data.
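
As a rough back-of-the-envelope check of those numbers, the sketch below multiplies the number of files by the 50-150ms per-request range. The method name request_overhead_ms is made up for illustration, and it assumes the requests happen one after another, which is a simplification.

```ruby
# Estimated per-request latency range in milliseconds, as quoted above.
REQUEST_LATENCY_MS = (50..150)

# Returns [best_case, worst_case] total request overhead in milliseconds,
# assuming the requests are issued sequentially (a simplification).
def request_overhead_ms(file_count, latency = REQUEST_LATENCY_MS)
  [file_count * latency.min, file_count * latency.max]
end

separate = request_overhead_ms(10) # ten separate javascript files
combined = request_overhead_ms(1)  # one combined file

puts "10 files: #{separate[0]}-#{separate[1]}ms"
puts "1 file:   #{combined[0]}-#{combined[1]}ms"
```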

Luckily, this is easy in Rails. Simply add :cache => 'cached-file-name' to your stylesheet_link_tag and javascript_include_tag in your application layout. For example:

<%= stylesheet_link_tag 'reset', 'application', :cache => 'all-app-stylesheets' %>
<%= javascript_include_tag 'jquery', 'jquery-ui', 'application', :cache => 'all-app-javascripts' %>

Now, as long as the following line is set to true in your environment.rb, or more likely in production.rb, Rails will either load your combined files in the layout, or create and load them if they don’t already exist.

config.action_controller.perform_caching = true

Simply packing all stylesheets and javascripts into one file each reduced page load time of one of our production applications from 10.1 to 8.3 seconds (an 18% reduction in load time alone).
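
To make the idea concrete, here is a rough sketch of what combining files amounts to: concatenating the sources, in order, into one output file. This is not Rails’s actual implementation. The method name combine_assets and its arguments are assumptions for illustration only.

```ruby
# Concatenates the named assets (given without extension) from dir into a
# single file, in the order listed, and returns the combined file's path.
# Hypothetical helper for illustration; Rails does this for you via :cache.
def combine_assets(dir, names, cache_name, ext)
  target = File.join(dir, "#{cache_name}.#{ext}")
  File.open(target, 'w') do |out|
    names.each do |name|
      out.write(File.read(File.join(dir, "#{name}.#{ext}")))
      out.write("\n") # keep adjacent files from running together
    end
  end
  target
end

# Example call (assumes the files exist under public/stylesheets):
# combine_assets('public/stylesheets', ['reset', 'application'], 'all-app-stylesheets', 'css')
```

Order matters here for the same reason it matters in the layout: later stylesheets override earlier ones, and later javascripts may depend on earlier ones.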

Creating an Amazon AWS Account

If you do not yet have an Amazon AWS account, you will need to create one and enable the S3 and CloudFront services. See this writeup on creating and setting up an Amazon S3 account for helpful instructions.

Setup S3 Buckets

Once you’ve signed up for your Amazon AWS account and activated S3 and CloudFront, you’ll want to set up 4 S3 buckets for your application, using Amazon’s S3 management console.

We’re going to set up 4 buckets and CDN distributions because some old browsers still have an artificial limitation that allows only 2 concurrent connections to each domain, meaning our components will take longer to download from Amazon if they can only be downloaded 2 at a time. By creating 4 different domains pointing to 4 different buckets/distributions, we’re allowing our components to download up to 8 at a time in those browsers that still enforce this limitation.

When naming your S3 buckets, avoid using periods if you would like the option of accessing your components directly from S3 over HTTPS. Amazon has a trusted SSL wildcard certificate for *.s3.amazonaws.com.

If you name your bucket cdn0.yourapp.com, then your components will have the URL https://cdn0.yourapp.com.s3.amazonaws.com/stylesheet.css. This will give you a warning message saying the connection is not trusted, because the browser treats the periods in your bucket name as additional subdomain levels, and the wildcard in *.s3.amazonaws.com covers only a single level (so yourapp-com-cdn0.s3.amazonaws.com would be trusted, but cdn0.yourapp.com.s3.amazonaws.com would not).
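
The single-level rule can be sketched in a few lines of Ruby. This is a simplified model of wildcard matching, not what browsers literally run, and the method name wildcard_cert_covers? is invented for this example.

```ruby
# Returns true if hostname is covered by a wildcard certificate pattern
# such as "*.s3.amazonaws.com". The "*" stands in for exactly one DNS
# label, so it cannot match a name that contains additional periods.
# Simplified model: partial wildcards and other corner cases are ignored.
def wildcard_cert_covers?(pattern, hostname)
  pattern_labels = pattern.downcase.split('.')
  host_labels    = hostname.downcase.split('.')
  return false unless pattern_labels.length == host_labels.length

  pattern_labels.zip(host_labels).all? do |pat, label|
    pat == '*' ? !label.empty? : pat == label
  end
end

# Bucket named without periods: covered by the wildcard certificate.
puts wildcard_cert_covers?('*.s3.amazonaws.com', 'yourapp-com-cdn0.s3.amazonaws.com')
# Bucket named with periods: extra labels, so not covered.
puts wildcard_cert_covers?('*.s3.amazonaws.com', 'cdn0.yourapp.com.s3.amazonaws.com')
```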

Setup CloudFront Distributions

Once your S3 buckets are created, click over to the CloudFront tab and create one distribution for each S3 bucket. You can type any comment to help you quickly identify each distribution.

Create CNAME Records (optional)

Once you’ve created your 4 CloudFront distributions, you may create a CNAME record for each distribution. This allows you to serve files from CloudFront using your own asset subdomains, like cdn0.yourapp.com, instead of raNDomString1234.cloudfront.net. We’ll use the following format of cdn%d.yourapp.com, where %d stands for digits 0-3:

cdn0.yourapp.com
cdn1.yourapp.com
cdn2.yourapp.com
cdn3.yourapp.com
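
The %d placeholder is ordinary format-string syntax, so the same pattern can be expanded in Ruby with String#%. A minimal sketch, where the method name asset_hosts is just an illustrative choice:

```ruby
# Format shared by all four asset hosts; %d is replaced by the host index.
ASSET_HOST_FORMAT = 'cdn%d.yourapp.com'

# Expands the format into one hostname per index, starting at 0.
def asset_hosts(format = ASSET_HOST_FORMAT, count = 4)
  (0...count).map { |i| format % i }
end

puts asset_hosts
```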

Install Rails S3 Synch Plugin

This plugin adds some Capistrano recipes to synch our application’s public directory with our four S3 buckets automatically every time we deploy our app. See Spatten Design’s documentation for more information. I’ve made some updates to their original plugin to properly set the Cache-control and Expires headers for our assets on S3, as well as to properly set the Content-encoding header for Gzipped assets.

Update: I’ve updated the S3 Synch Plugin further; it can now handle unique S3 buckets for different Rails environments (e.g. one set of buckets for production and another for staging). Be sure to update your synch_s3_asset_host.yml file as shown below.

./script/plugin install git://github.com/JangoSteve/synch_s3_asset_host.git

Installing AWS-S3 Gem

The synch_s3_asset_host plugin requires the AWS-S3 gem, so add the following to your environment.rb:

config.gem "aws-s3", :lib => "aws/s3"

…and then run the following from the terminal to install the S3 Synch plugin’s gem dependency:

sudo rake gems:install

Configure S3 Synch Plugin

Create a config/synch_s3_asset_host.yml file like this:

AWS_ACCESS_KEY_ID: 'YOURKEYHERE'
AWS_SECRET_ACCESS_KEY: 'YourSecretAccessKeyHere'
production:
  asset_host_name: "yourapp-com-cdn%d" # This is whatever you named your S3 buckets, using %d in place of the numbers 0-3
# dry_run: false # Set to true if you want to test the asset_host uploading without doing anything on Amazon S3

Update: The “production” part in the file above has been added for my latest update of the S3 Asset Synch Plugin.

Add S3 Synch to Deployment

Now, in your Capistrano deploy.rb script, add the following line to the :deploy namespace:

namespace :deploy do
  ...
  before "deploy:symlink", "s3_asset_host:synch_public"
  ...
end

…and then add the :asset_host_syncher => true flag to the :web role:

...
role :web, "yourapp.com", :asset_host_syncher => true
...

Option A: Compressible Assets from App Server, Images from CloudFront (recommended)

For more detail about why this method is recommended, see the previous article in this series.

Configure Rails Asset Host

Use the following configuration in your production.rb file to configure the way Rails writes the URLs for asset_tags:

# Enable serving of images, stylesheets, and javascripts from an asset server
# config.action_controller.asset_host = "http://assets.example.com"
ActionController::Base.asset_host = Proc.new { |source, request|
  # Route /images assets to Amazon S3 + CloudFront (set up with CNAMEs as domains cdn0-cdn3),
  # and route anything else (js, css, html) to the "cache" subdomain on the app's own server,
  # so that those files can be gzipped and served
  if source.starts_with?('/images')
    unless request.ssl? # Plain HTTP: use the custom CNAME subdomains
      "http://cdn#{source.hash % 4}.yourapp.com"
    else # For SSL we want the certificate to match the hosting domain, so use the cloudfront.net domains directly
      [ "https://yourcloudfrontdist0.cloudfront.net",
        "https://yourcloudfrontdist1.cloudfront.net",
        "https://yourcloudfrontdist2.cloudfront.net",
        "https://yourcloudfrontdist3.cloudfront.net" ][source.hash % 4]
    end
  else
    # use the cached and zipped subdomain for assets that can be zipped (i.e. non-binary filetypes)
    # => text/html text/css application/x-javascript application/javascript
    "#{request.protocol}cache.yourapp.com"
  end
}

If you did not configure custom CNAME records earlier, your Rails asset_host configuration would be a bit simpler:

ActionController::Base.asset_host = Proc.new { |source, request|
  if source.starts_with?('/images')
    [ "#{request.protocol}yourcloudfrontdist0.cloudfront.net",
      "#{request.protocol}yourcloudfrontdist1.cloudfront.net",
      "#{request.protocol}yourcloudfrontdist2.cloudfront.net",
      "#{request.protocol}yourcloudfrontdist3.cloudfront.net" ][source.hash % 4]
  else
    # use the cached and zipped subdomain for assets that can be zipped (i.e. non-binary filetypes)
    # => text/html text/css application/x-javascript application/javascript
    "#{request.protocol}cache.yourapp.com"
  end
}

Notice the source.hash % 4 code above. This ensures that the same component is always served from the same subdomain to take full advantage of client-side caching for that component, rather than randomly selecting which subdomain serves each component on each page load.
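
One caveat worth hedging: on newer Ruby versions (1.9 and later), String#hash is seeded differently in each process, so the same source may map to a different subdomain after a restart or on another app server. If that matters for your setup, one alternative (not what the configuration above does) is to derive the index from a checksum such as CRC32. The method names below are hypothetical.

```ruby
require 'zlib'

ASSET_HOST_COUNT = 4

# Maps an asset path to a host index in 0...ASSET_HOST_COUNT using CRC32,
# which gives the same result in every process and on every server.
def asset_host_index(source, host_count = ASSET_HOST_COUNT)
  Zlib.crc32(source) % host_count
end

# Builds the CNAME-based asset host for a given source path.
def cdn_host_for(source)
  "http://cdn#{asset_host_index(source)}.yourapp.com"
end

puts cdn_host_for('/images/logo.png')
```

Inside the asset_host Proc, asset_host_index(source) would take the place of source.hash % 4.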

For more information on configuring Rails’s asset_host, see the documentation for Base.asset_host.

Create A-name Record

We will also need to create an A-name record for the cache.yourapp.com subdomain, which points to your application server’s IP address.

Configure Apache

Now we need to configure Apache to accept incoming requests to our “cache” subdomain, setting the appropriate far-future Expires and Cache-control headers. We also need to tell Apache to automatically compress and serve any compressible filetype on the fly. Add this to your site’s Apache conf file:

...
   # gzip html, css, and js
   AddOutputFilterByType DEFLATE text/html text/css application/x-javascript application/javascript

   <VirtualHost *:80>
      ServerName cache.yourapp.com
      DocumentRoot /path/to/yourapp/public
      # far-future expiry for static assets
      <FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
         ExpiresActive On
         ExpiresDefault "access plus 1 year"
      </FilesMatch>
      FileETag None
   </VirtualHost>
...

Also note here that we turned off the ETag functionality for this subdomain. ETags (“entity tags”) are supposed to be a more flexible mechanism to query and invalidate cached assets, rather than using the last-modified date of the file. See Yahoo’s ETag description for more info.

However, the ETag’s uniqueness depends not just on the file, but usually on the server it’s being served from as well. This means that if your assets are copied to several asset domains on different servers, and a file is downloaded and cached from one server while the next page tries to pull it from another asset domain, the file’s ETag will not match the ETag of the cached copy, so the browser will re-download the file instead of serving it from cache.

Furthermore, Rails does a very good job of appending the last-modified date to the asset URLs (using the asset tag helpers), which effectively serves, caches, and invalidates the assets for you as necessary. So, we’re much better off just turning ETags off for our Rails app.
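
The timestamp idea can be approximated as follows. This is a hedged sketch of the concept rather than the Rails implementation, and asset_path_with_timestamp is a hypothetical helper name.

```ruby
# Appends the file's modification time to its public path as a query string,
# e.g. "/stylesheets/application.css?1276869600". When the file changes, the
# URL changes, so browsers fetch the new version despite far-future Expires.
# Hypothetical helper that approximates the idea; not the Rails source.
def asset_path_with_timestamp(public_dir, source)
  file = File.join(public_dir, source)
  return source unless File.exist?(file) # leave unknown paths untouched

  "#{source}?#{File.mtime(file).to_i}"
end

# Example call (assumes the file exists under public/):
# asset_path_with_timestamp('public', '/stylesheets/application.css')
```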

Now we need to make sure the appropriate Apache modules are enabled and restart Apache.

sudo a2enmod deflate
sudo a2enmod expires
sudo /etc/init.d/apache2 force-reload

Option B: Serve Everything from CloudFront (easier, but not recommended)

For more detail about why this is not recommended, see the previous article in this series. In short, it requires you to make one of the following compromises:

  • a) Serve all files uncompressed, resulting in file sizes up to 4x bigger than necessary.
  • b) Serve Gzipped assets from CloudFront without first detecting whether or not the visitor’s browser supports Gzip encoding.

That being said, if this is acceptable for you, this method is simpler to set up and configure.
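
For context, the detection that compromise (b) skips is the check of the browser’s Accept-Encoding request header, which a web server such as Apache performs before compressing a response. A simplified sketch of that check, with a made-up method name and without the full HTTP rules:

```ruby
# Returns true if the Accept-Encoding request header allows gzip.
# Simplified: handles comma-separated codings and "q=0" (explicitly refused),
# but ignores wildcards and other corner cases of the HTTP spec.
def accepts_gzip?(accept_encoding)
  return false if accept_encoding.nil?

  accept_encoding.split(',').any? do |entry|
    coding, *params = entry.strip.downcase.split(';').map(&:strip)
    refused = params.any? { |p| p =~ /\Aq=0(\.0*)?\z/ }
    coding == 'gzip' && !refused
  end
end

puts accepts_gzip?('gzip, deflate') # a typical browser header
puts accepts_gzip?('deflate')       # gzip not offered
```

Files served straight from S3 and CloudFront in this setup are stored either compressed or uncompressed, so no such per-request decision is made.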

Configure Rails Asset Host

Add the following to your production.rb:

# Enable serving of images, stylesheets, and javascripts from an asset server
# config.action_controller.asset_host = "http://assets.example.com"
ActionController::Base.asset_host = Proc.new { |source, request|
  unless request.ssl? # Plain HTTP: use the custom CNAME subdomains
    "http://cdn#{source.hash % 4}.yourapp.com"
  else # For SSL we want the certificate to match the hosting domain, so use the cloudfront.net domains directly
    [ "https://yourcloudfrontdist0.cloudfront.net",
      "https://yourcloudfrontdist1.cloudfront.net",
      "https://yourcloudfrontdist2.cloudfront.net",
      "https://yourcloudfrontdist3.cloudfront.net" ][source.hash % 4]
  end
}

Again, if you did not configure custom CNAME records earlier, your Rails asset_host will be a bit simpler:

ActionController::Base.asset_host = Proc.new { |source, request|
  [ "#{request.protocol}yourcloudfrontdist0.cloudfront.net",
    "#{request.protocol}yourcloudfrontdist1.cloudfront.net",
    "#{request.protocol}yourcloudfrontdist2.cloudfront.net",
    "#{request.protocol}yourcloudfrontdist3.cloudfront.net" ][source.hash % 4]
}

Pre-compile Cached Stylesheet and Javascript File

If you’re serving every component from CloudFront, you will need to pre-compile your stylesheets and javascripts on every deploy. Otherwise, Rails will try to compile and save the files to your application server, but try to serve them from S3 (where they won’t exist).

To solve this, we’ll add some Capistrano tasks to our deploy.rb to compile our files for us before the synch_s3_asset_host plugin copies our public directory over to our S3 buckets. But this means we’d have to copy the list of asset files to be compiled into our Capistrano script, as well as having them listed in our application.html.erb layout. To DRY things up a little, we’re going to create some project-wide constants:

lib/assets_for_cache.rb

module AssetsForCache
   JAVASCRIPT_FILES = ['jquery', 'jquery-ui', 'application']
   STYLESHEET_FILES = ['reset', 'application']
   JAVASCRIPT_CACHE_FILE = 'all-app-javascripts'
   STYLESHEET_CACHE_FILE = 'all-app-stylesheets'
end

And then replace your javascript_include_tag and stylesheet_link_tag in your application layout with the following:

<%= javascript_include_tag AssetsForCache::JAVASCRIPT_FILES, :cache => AssetsForCache::JAVASCRIPT_CACHE_FILE %>
<%= stylesheet_link_tag AssetsForCache::STYLESHEET_FILES, :cache => AssetsForCache::STYLESHEET_CACHE_FILE %>

Add this to your deploy.rb script:

namespace :assets do
  require File.dirname(__FILE__) + '/../lib/assets_for_cache.rb'
  set :stylesheets, AssetsForCache::STYLESHEET_FILES
  set :javascripts, AssetsForCache::JAVASCRIPT_FILES

  # Build both combined files in the new release before it is synched to S3
  task :package_cached_assets do
    package_stylesheets
    package_javascripts
  end

  task :package_stylesheets, :roles => :web do
    # Remove any stale combined file, then append each stylesheet in order
    sudo %{rm -f #{release_path}/public/stylesheets/#{AssetsForCache::STYLESHEET_CACHE_FILE}.css}
    stylesheets.each do |stylesheet|
      run %{cat #{release_path}/public/stylesheets/#{stylesheet}.css >> \
            #{release_path}/public/stylesheets/#{AssetsForCache::STYLESHEET_CACHE_FILE}.css}
    end
    # Write a gzipped copy alongside the combined file
    run %{gzip -c #{release_path}/public/stylesheets/#{AssetsForCache::STYLESHEET_CACHE_FILE}.css > #{release_path}/public/stylesheets/#{AssetsForCache::STYLESHEET_CACHE_FILE}.css.gz}
  end

  task :package_javascripts, :roles => :web do
    # Remove any stale combined file, then append each javascript in order
    sudo %{rm -f #{release_path}/public/javascripts/#{AssetsForCache::JAVASCRIPT_CACHE_FILE}.js}
    javascripts.each do |javascript|
      run %{cat #{release_path}/public/javascripts/#{javascript}.js >> \
            #{release_path}/public/javascripts/#{AssetsForCache::JAVASCRIPT_CACHE_FILE}.js}
    end
    # Write a gzipped copy alongside the combined file
    run %{gzip -c #{release_path}/public/javascripts/#{AssetsForCache::JAVASCRIPT_CACHE_FILE}.js > #{release_path}/public/javascripts/#{AssetsForCache::JAVASCRIPT_CACHE_FILE}.js.gz}
  end
end

…and then add this to the :deploy namespace in your deploy.rb file, before calling the s3_asset_host sync script:

namespace :deploy do
  ...
  before "deploy:symlink", "assets:package_cached_assets"
  before "deploy:symlink", "s3_asset_host:synch_public"
  ...
end

Conclusion

Now simply save your project and deploy it! The first deploy will take quite a while, as your entire /public directory will be copied to all 4 buckets on Amazon S3, one at a time. But after that, it’s a painless process.

If you have any files or directories in your public folder that are not assets to be copied to S3 (like a WordPress blog or whatever), you can add them to the --exclude list in the synch_s3_asset_host plugin on line 186 of vendor/plugins/synch_s3_asset_host/recipes/synch_s3_asset_host.rb.

Whether you chose the “recommended” or the “easier” option, you should immediately notice a significant increase in the performance of your Rails app. Thanks for sticking with me through this 4-part series! Please let me know if you have any thoughts, questions, or feedback in the comments.

Improve Page Load Speed (by 80%) by Improving Component Load Speed

Tuesday, June 1st, 2010

This is Article #2 of a 4-part series. This article (along with Article #1) serves as a primer for the next two entries in this series, which discuss the most efficient way to put these concepts into practice in your web application. For a more in-depth look at these concepts, see Yahoo!’s Best Practices for Speeding Up Your Web Site and Google’s Speed Tracer tutorial.

  • The Importance of Page Load Speed
  • Improve Page Load Speed by Improving Asset Load Speed
    • 3 Techniques to Speed Up Asset Loading
      • Better Caching with Expires header
      • Zipping
      • Content Delivery Network (CDN)

In my last post, I discussed why it’s becoming increasingly important to ensure your website loads quickly for users (and Googlebots). At the end of the article, I mentioned a quote from Yahoo!’s article, Best Practices for Speeding Up Your Web Site:

80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, etc. This is the Performance Golden Rule.

In other words, you can get the most “bang for your buck” when it comes to page load optimization by speeding up the loading of your site components (a.k.a. “assets”, such as javascripts, CSS stylesheets, images, etc.). In this post, I’ll discuss the 3 main techniques used to improve asset load speed.

Asset load optimization often results in a reduction in asset-load-time by 75-90%. Since asset-load-time accounts for 80-90% of your total page-load-time, this equates to an overall reduction in page-load-time by up to 80%!
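
The arithmetic behind that estimate is just the product of the two percentages. A quick sketch, with a method name chosen only for illustration:

```ruby
# Overall page-load reduction = (share of load time spent on assets)
#                             * (reduction achieved in asset load time)
def overall_reduction(asset_share, asset_reduction)
  asset_share * asset_reduction
end

best  = overall_reduction(0.90, 0.90) # top of both quoted ranges
worst = overall_reduction(0.80, 0.75) # bottom of both quoted ranges

puts "overall reduction: #{(worst * 100).round}%-#{(best * 100).round}%"
```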

Below are graphs showing the load time of RateMyStudentRental.com before and after (respectively) implementing the asset-optimization shown here. Using Google’s Speed Tracer extension, we can see that total load time decreased from 40 milliseconds down to 15ms.

(more…)

Is Your Site Too Slow?
(The Importance of Page Load Speed)

Friday, May 14th, 2010

This is Article #1 of a 4-part series. This article (along with Article #2) serves as a primer for the last two entries in this series, which discuss the most efficient way to put these concepts into practice in your web application. For a more in-depth look at these concepts, see Yahoo!’s Best Practices for Speeding Up Your Web Site and Google’s Speed Tracer tutorial.

Page load speed is becoming increasingly important as rich web applications become more interactive. It’s not just about usability anymore; it can now directly affect your placement in search engine results, now that Google uses page load speed in their ranking algorithm. Are you ready for a reality check? Get Google Webmaster Tools for your site, and go to the Labs >> Site Performance to view your average page load time, as seen by Google’s web crawlers. That’s right, Google is already tracking your site’s performance history.

Google Webmaster Tools is even kind enough to tell you how you stack up against the rest of the web. Here is what Webmaster Tools had to say about one of our sites before optimizing it for quick page loading:

On average, pages in your site take 4.5 seconds to load (updated on Feb 21, 2010). This is slower than 70% of sites. These estimates are of low accuracy (fewer than 100 data points). The chart below shows how your site’s average page load time has changed over the last few months. For your reference, it also shows the 20th percentile value across all sites, separating slow and fast load times.

Ouch. Did I mention this would be a painful reality check?

Now to be fair, there’s a very reasonable explanation for this. Google claims that the majority of users will click “back” to the search results page if a link takes too long to load. So, if a webpage is too slow for the visitor to read it, the relevance of the content is…well, irrelevant. I should point out, however, that it’s unknown precisely how much page load speed affects your placement in search results.

What is Included in Page Load Time?

At first, you may think that Google crawlers only index the initial load-time of the HTML. You’d be wrong. They actually include the time it takes to load all Javascript files, CSS stylesheets, images, etc. Now it’s time to see how your site performs. For this, you’ll need either the Page Speed extension for Firefox, or for an even better look into your site’s performance, get Speed Tracer for Chrome. And of course, there’s always the ever-popular YSlow extension for Firefox from Yahoo!

Watch Your Assets

Your website’s assets include all of the files the visitor’s browser must download to render your webpage. This includes Javascript files, CSS stylesheets, and images. And according to Yahoo!:

80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, etc.

Continue to Article #2, which focuses on reducing the time it takes visitors to download your site’s assets by up to 90%. That means your site will load up to 2-3x faster.



