CivicActions has been working with Google's “Make the Web Faster” project team on some last-minute improvements that make Drupal 7 faster.

What do we mean by “faster”? The word can have many meanings, but here we simply mean the response time for end users, also sometimes called front-end performance: after a user presses a mouse or keyboard button, how long does it take for the page to be generated, downloaded and displayed? This response time metric can also be applied to AJAX and other in-browser interactions - however, it is distinct from (for example) server resource usage, scalability or availability concerns.

Most work to improve end-user response times breaks down into several broad strategies:

  1. Reducing the time taken to generate/serve requests
  2. Reducing the number of requests needed to display the page
  3. Reducing the size of requests
  4. Reducing the time it takes for browsers to parse, render or interact with downloaded pages (not discussed here)

This post will give a brief overview of each topic, the state of Drupal 7, and some of the recent/ongoing work with some benchmarks of these changes. A summary of this is available as a guest post over on the Google Code blog.

Reducing the time taken to generate/serve requests

It is fairly hard in Drupal core to make radical, end-user scale improvements in the page generation time - assuming there aren't any huge blunders that everyone has missed. Of course this work is still very valuable in terms of scalability – a 5ms improvement won't be noticed by end users (on its own), but adds up quickly on a server dealing with millions of requests.

The exception to this is high level caching, such as page caching – this can often shave off a significant portion of the page generation time, especially for anonymous users. Drupal 7 has made major improvements to the native caching layer, such as easier pluggability (e.g. for memcache integration). Another change enables a site to be configured to deliver HTTP headers that allow anonymous users to cache entire pages locally (for most pages: whenever no PHP session is needed), as well as improving compatibility with reverse proxies and content distribution networks (CDNs), which can cache requests in multiple geographic locations.
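
As a concrete illustration, this is roughly what the relevant settings.php configuration looks like. This is a minimal sketch only: the memcache include path and class name assume the contributed memcache module at its usual location, and the 5 minute lifetime is just an example value.

  <?php
  // Serve cached pages to anonymous users.
  $conf['cache'] = 1;

  // Let browsers, reverse proxies and CDNs reuse anonymous pages for up to
  // 5 minutes (delivered via the Cache-Control max-age header).
  $conf['page_cache_maximum_age'] = 300;

  // Pluggable caching: swap the default database cache for memcache
  // (assumes the contributed memcache module is installed at this path).
  $conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
  $conf['cache_default_class'] = 'MemCacheDrupal';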

Apart from the HTML, there are static page assets, such as CSS files, JavaScript files and images – here it is also hard to make major (noticeable to an end user) improvements in the speed with which they can be served to the network: most web servers are fast enough that, for the majority of sites, page assets are served far faster than end users' limited bandwidth can consume them. The one setting that can have an effect here is KeepAlive - if this is enabled it sidesteps some of the network overhead by allowing HTTP connections to be reused for multiple requests.

In addition to serving valid requests, it is also worth considering how long it takes to serve error responses. By default Drupal generates a full, user-friendly page for “404 not found” errors. This is obviously a really nice feature when the user actually sees the page, but for page assets such as CSS, JavaScript and images there is no benefit to these friendly pages. What’s more, the time needed for Drupal to generate the 404 page, and the additional time needed to download the larger response, can actually slow down the original page the user is trying to load: depending on the type of 404 (CSS has a larger effect than images, for example) this can add 0.5 to 1 second to the page load. After a very long period of discussion, the patch that improves this situation is finally RTBC: it serves a much smaller response earlier in the bootstrap for 404s on specific file extensions, and also offers the option to almost completely bypass the bootstrap (which has some potential side effects).
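
To give a flavor of how this works, here is a rough sketch of the kind of settings.php configuration involved. The variable names and patterns follow the patch under discussion and may change before release, so treat them as illustrative rather than final.

  <?php
  // File extensions that should get a lightweight 404 early in the bootstrap,
  // rather than a fully themed "page not found" page.
  $conf['404_fast_paths'] = '/\.(?:txt|png|gif|jpe?g|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/i';

  // Paths (e.g. generated image derivatives) that should still get the full
  // Drupal 404 handling.
  $conf['404_fast_paths_exclude'] = '/\/(?:styles)\//';

  // The minimal markup returned for fast 404s.
  $conf['404_fast_html'] = '<html><head><title>404 Not Found</title></head><body><h1>Not Found</h1></body></html>';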

Reducing the number of requests needed to display the page

Before a page can be rendered most of the page assets need to be downloaded – each HTTP request needs to travel from the browser to the server, and the response back to the client. The time taken for this round trip is known as network latency – this is often in the order of 100-200 ms, occasionally less, but sometimes significantly more (mobile/3G connections can be more like 600 ms), and unlike bandwidth it is not showing signs of improving anytime soon. Some of this is due to the intrinsic speed of light, some is due to network delays, and some due to connection/parsing overheads. Browsers can request several page assets simultaneously (often 6 or so), but it is common to max out the limit of simultaneous connections and cause queuing. The browser may also need to make several requests before it can determine the complete list of assets it needs. Given that it is not uncommon for unoptimized web pages to include many dozens of assets, the effect of latency on total page load times can be in the order of seconds.

The bottom line of all this is that you generally want to limit the number of HTTP requests per page as much as possible. There are several ways of doing this:

Reduce the number of assets in your pages

In other words, just take away some page assets completely. This is certainly something worth considering when working on the design of your sites. However, there are clearly some limits to this approach – there is a need to balance site performance with a client's need for branding, and users' need for usable sites that convey more than just text.

Allow clients to cache and expire assets locally

This one is really important – without proper cache headers on page assets, many client/server combinations will check the validity of their cache for each individual asset (normally receiving a “304 Not Modified” response). While this saves the extra time needed to re-download the assets, the latency for each of these requests is still present and can really slow down page loads.

Drupal's default .htaccess file includes rules to tell clients that all page assets should only expire after 2 weeks (by default) – this means that clients can use these resources for that time without checking that the cached version is valid, which avoids these requests on subsequent pages. However, for these rules to function, the mod_expires Apache module needs to be enabled. This should be enabled by default on most Red Hat based distributions; on Debian or Ubuntu you can enable it by running “sudo a2enmod expires” and reloading Apache as directed. For web servers other than Apache you should read the documentation to see how to enable cache expiration headers for these assets.

Aggregate multiple assets into a single request

This involves bundling several assets together. Drupal has had the ability to do this for CSS since Drupal 5 and for JavaScript since Drupal 6 - part of the “preprocess” functionality (called “optimize” in the user interface). Here is a rough sketch of how this works:

[Diagram: Drupal aggregation – six individual CSS files concatenated into two aggregate files, one per CSS media type]

As you can see, by concatenating the 6 files into 2 (one for each CSS media type) we avoid 4 HTTP requests. The aggregate file name is a hash generated from the names of all the files included in that aggregate. On production sites (with additional modules and themes) the saving is often even greater, hopefully keeping the total number of CSS and JavaScript requests small enough to avoid browsers needing to queue these requests.

Of course, the actual aggregation process is a fair bit more complex than the diagram above implies. We don’t want to aggregate all files, as a large volume of CSS and JavaScript applies only to admin users and is unnecessary for the majority of regular visitors - so the API needs to allow module authors to select this behavior on a per-file basis. We also need to make sure that files stay in the correct order, which may involve having multiple aggregate files before and after a non-aggregated file. For CSS we also need to ensure any @import rules are moved to the top of the file. Finally, we need to ensure that we can force caching browsers to fetch an updated aggregate if the CSS or JavaScript changes.
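
For example, a module author can opt each file in or out of aggregation as it is added to the page. The following is a hedged sketch only: the module name and file paths are made up, and the options-array form of drupal_add_js() assumed here is the Drupal 7 style API.

  <?php
  // A script most visitors will need: opt it in to aggregation ("preprocess").
  drupal_add_js(drupal_get_path('module', 'mymodule') . '/mymodule.js',
    array('preprocess' => TRUE));

  // An admin-only behavior: keep it out of the aggregate so regular visitors
  // never download it.
  drupal_add_js(drupal_get_path('module', 'mymodule') . '/mymodule.admin.js',
    array('preprocess' => FALSE));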

While this worked quite well, there were several places where it needed improvement.

Enable aggregation by default

One issue is that aggregation is disabled by default in the interface. This means that sites built by inexperienced Drupal users are much less likely to have aggregation enabled, even when the site is launched. When aggregation is disabled this can add anything from 0 to 1.5 seconds to page loads, depending on the number of CSS/JS requests and visitor connection speeds/latencies. This might seem simple to fix - just change the default - however it is not quite as easy as that. While we want all visitors to Drupal sites to have the best experience possible, we also need to ensure that Drupal is an accessible platform for brand new site builders and developers. It is quite normal when learning a new platform to spend time experimenting, so it is useful for Drupal to stay as “hackable” as possible - for example, if you edit a CSS file you should see the effects without needing to dig through settings pages.

There is a simple solution that gets the best of both worlds: if we include the modification time (“mtime”) of each file with the list of filenames we hash to generate the aggregate filename, then the aggregate file will automatically refresh whenever a source file is edited. We could then have aggregation enabled by default - a patch is in progress that does just this. One complexity that needs some consideration is that there is a slight decrease in server-side performance (a few ms) from checking all the files' mtimes. While pretty minimal, this is something that large, busy sites would want to disable, so an option to allow aggregation without checking mtimes is necessary.
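
Conceptually the change is tiny. The following is an illustrative sketch (not core's actual code) of how modification times can be folded into the aggregate filename:

  <?php
  // Build the aggregate filename from a hash of the source file names *and*
  // their mtimes, so editing any file changes the URL and caching browsers
  // automatically fetch a fresh aggregate.
  $key = '';
  foreach ($files as $file) {
    $key .= $file . ':' . filemtime($file) . ';';
  }
  $aggregate_filename = 'css_' . md5($key) . '.css';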

Prevent aggregates with duplicate contents

Another major (but complex) issue is that the flag that enables preprocessing when a module adds a file to the page previously defaulted to TRUE. This might sound OK, but in reality it is a really bad idea. It means that if a module author added a file conditionally (e.g. only on a specific page or pages), users would end up downloading the same CSS or JavaScript code multiple times, unless the developer explicitly disabled preprocessing for that file.

For example a user visits 3 pages:

  1. Page X sends an aggregate of files a, b, c and d
  2. Page Y sends an aggregate of files a, b, c and e
  3. Page Z sends an aggregate of files a, c, d and e

In this case the user has downloaded the code for files “a” and “c” 3 times, and for “b”, “d” and “e” twice each. Files “b”, “d” and “e” have been added conditionally. Not only is this a waste of bandwidth, but it defeats the browser's native ability to cache things properly. This is not unlikely to occur, even with skilled core developers - some tests show that in Drupal 7 core alone we were generating over 3MB (yes, megabytes!) of duplicated CSS/JS code in aggregate files across different page loads, plus over 30 additional HTTP requests. Add a few contributed modules and this issue would be even larger.

To demonstrate an improved approach, let's say that files “b” and “d” are files that regular visitors would likely need over the course of their visit, and file “e” is a file that only administrators would ever need, or that only appears on pages that visitors would rarely use. The same three page views then look like this:

  1. Page X sends an aggregate of files a, b, c and d
  2. Page Y sends an aggregate of files a, b, c and d, file e sent individually
  3. Page Z sends an aggregate of files a, b, c and d, file e sent individually

The fix then involves several elements:

  1. Setting preprocess to default to FALSE - this is the safe default.
  2. Checking that all files that typical visitors would need (on a typical visit) are preprocessed and added unconditionally - normally in the module's hook_init() implementation (see the sketch after this list).
  3. Checking that all other files (for administrators, or for infrequently visited site pages) are not set to preprocess and are added conditionally - only when required for a specific page.
  4. Fixing the documentation so that the usage of this is clear to module developers.
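
To make points 2 and 3 concrete, here is a rough sketch of the pattern from a module author's point of view. The module name, paths and page callback are hypothetical, and the options array assumes the Drupal 7 style drupal_add_css() API.

  <?php
  /**
   * Implements hook_init().
   *
   * Files that typical visitors need during a visit are added on every page
   * and opted in to preprocessing, so they always land in the same aggregate.
   */
  function mymodule_init() {
    drupal_add_css(drupal_get_path('module', 'mymodule') . '/mymodule.css',
      array('preprocess' => TRUE));
  }

  /**
   * Page callback for a rarely visited admin screen (hypothetical).
   *
   * Admin-only files are added conditionally and kept out of the aggregate.
   */
  function mymodule_admin_page() {
    drupal_add_css(drupal_get_path('module', 'mymodule') . '/mymodule.admin.css',
      array('preprocess' => FALSE));
    // ... build and return the page content here.
  }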

Happily, a patch that does just this was recently committed. To fully understand the value of this fix, consider that in the first scenario the user had to completely re-download the aggregate files on pages Y and Z. On production sites it is normal for CSS and JavaScript aggregates to be 50KB, often 200KB or even more. A little benchmarking shows that 50KB adds something like 0.5-1 seconds to a page load, and 200KB more like 1-4 seconds (depending on latency and connection speed, of course). Compare this to pages Y and Z in the second scenario: if the site is using mod_expires then the locally cached aggregate file can be used with no HTTP request at all; page Y adds an individual request for file “e” which, if this is (for example) 2KB, would add perhaps 0.1-0.5 seconds to that page load, but would then be cached for page Z. Adding all this up, in the first scenario users would be unnecessarily waiting something like 1.9-7.5 extra seconds over the course of the 3 page loads, versus the second scenario. Benchmarking on core alone (which of course has much smaller, non-typical aggregate sizes) shows that avoiding duplicate aggregates can still save over a second across 5 page loads. Further work is ongoing to allow files to be added to the aggregate via the module “.info” metadata files, which should make this simpler and more robust.

Consider CSS sprites

Aside from CSS and JavaScript, images can also be aggregated using CSS sprites, where a single image file contains multiple images, each selected for display by positioning the CSS background image so it shows through a “window”. Drupal core does not have any facility for managing these (perhaps Drupal 8!), but does use the technique in the new admin theme “Seven”. Several contributed modules are available that support CSS sprites, as well as other approaches such as embedding images into CSS files (CSS3 only; in base64 format).

Reducing the size of requests

This can be done by reducing the amount of code, compressing the code using format-specific compression, or using general purpose compression at the HTTP level.

Reducing the amount of code is something that is quite hard for a general purpose web framework like Drupal. Making sure that site builders can accomplish as much as possible using CSS alone means you inevitably end up with lots of extra wrapper divs. Some great work is going on in contrib, though, to allow users to opt to trim this back to a more minimal semantic structure and a smaller HTML request size. In terms of CSS and JavaScript, core tends to have enough eyes that we avoid much of the cruft that tends to develop on production sites, but features are still being added and the total volume of code has grown significantly since Drupal 6 (although much of it is targeted only at admin users).

Drupal does have basic format-specific compression for aggregated CSS, in the form of whitespace and comment removal. This can shave a reasonable fraction off file sizes - in the sketch above you can see that the 2 aggregated files weigh in at 24KB total, relative to 32KB for the individual files. Core does not have built-in JavaScript compression (such as Minify), although several of the libraries it includes (e.g. jQuery) are distributed in a compressed form. A good JavaScript compressor is available in contrib, however. For images, the popular imagecache functionality is now in core, which can help ensure images are sized and compressed appropriately (although not always as well as manual tuning!); there is also a PNG optimization contrib module available.
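
To give a feel for what this kind of compression does, here is an illustrative sketch - not core's actual implementation, and a real minifier needs to handle more edge cases (such as punctuation inside quoted strings):

  <?php
  // Roughly what whitespace and comment removal amounts to.
  function example_minify_css($css) {
    // Strip /* ... */ comments.
    $css = preg_replace('!/\*.*?\*/!s', '', $css);
    // Collapse runs of whitespace, then drop spaces around CSS punctuation.
    $css = preg_replace('/\s+/', ' ', $css);
    $css = preg_replace('/\s*([{};:,])\s*/', '$1', $css);
    return trim($css);
  }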

Drupal has supported gzip compression of HTML output (for cached pages, delivered to anonymous users) for a long time. For CSS and JavaScript, however, the files are delivered directly by the web server, so Drupal has less control. There are web server based compressors such as Apache’s mod_deflate, but sites are not always able to turn these on, and they can bring some performance concerns, since files are (re)compressed on every request with no caching. A patch is RTBC that attempts to address these issues by storing compressed versions of aggregated files on write (avoiding the server-side performance penalty) and using rewrite and header directives in .htaccess that allow these files to be served correctly. Benchmarks show that this patch can make pages load 20-60% faster - a really nice improvement.
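
The idea on the write side is simple: whenever an aggregate is generated, store a pre-compressed copy next to it, and let the .htaccess rewrite rules hand that copy to gzip-capable browsers with the appropriate Content-Encoding header. A simplified sketch follows - the function name is made up, and this is not the exact patch:

  <?php
  // Write both the plain aggregate and a gzipped copy, so the web server can
  // serve the compressed version without recompressing on every request.
  function example_write_aggregate($path, $data) {
    file_put_contents($path, $data);
    // Maximum compression is fine here: the cost is paid once, at write time.
    file_put_contents($path . '.gz', gzencode($data, 9));
  }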

Conclusion

Front-end performance is a critical area for increasing the usability and effectiveness of web sites, and hence also of the tools used to build the web, among which Drupal is now a major contender. One of the many statistics available on this topic shows that a 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and a 7% loss in conversions. Several of the patch benchmarks highlighted in this post each have the potential to avoid a second (or more) of page load delay.

Improving the front-end performance of Drupal can be a complex task, balancing different use cases and users while trying to find the best overall solution. Dozens of people have contributed to this effort, including: Khalid Baheyeldin (kbahey), Owen Barton, Alex Bronstein (effulgentsia), Mike Carper (mikeytown2), Daniel F. Kudwien (sun) (who also helped review this post) and David Rothstein.

Going forward, no doubt many performance optimizations will continue to appear and be refined in the contributed module/theme space, as well as in site building best practices and documentation. Getting Drupal 7 to final release is of course critical to bring the above improvements to the wider web. In Drupal 8 we will hopefully see further improvements in the CSS/JS file aggregation system, increased high-level caching effectiveness, and more tools to help site builders reduce file sizes. Whatever your skill set, contributions are always welcome - get involved!

Owen Barton is Director of Engineering at CivicActions. He has been developing elegant solutions in Drupal for over 12 years and is widely credited with building one of the most reputable and experienced Drupal engineering teams on the planet.