Tuning Drupal for a Weblog
Choosing Drupal (or a similar system such as Joomla) to build a community-managed website is generally a fairly easy process. Content management systems have become so mature and configurable that there is little they can’t do when it comes to a content-oriented presence.
For RealJenius.com I chose Drupal for my relaunch. I’ve been following Drupal for a long time, and its 5.0 release was simply too enticing for me to keep ‘rolling my own’. I’ve dug into a lot of Drupal’s bits and pieces over the past two months while ramping up my new installation, and I have quickly learned what matters and what doesn’t when it comes to keeping Drupal fast and happy. If you are considering Drupal for the first time, I hope the points below help you decide where to start.
Find a Competent Host
This one seems obvious, but your host can easily make or break a Drupal installation. Drupal has a few simple needs: fast PHP execution and a fast connection to MySQL. With most modules, the second is especially important, because without caching Drupal issues several queries on every page load.
I’ve chosen Dreamhost, and in general I have been very happy. There was a period, however, when my server cluster was badly bogged down. Connections to MySQL (which Dreamhost runs on separate hardware) were slow, and my PHP execution was abysmal. At the time I didn’t realize that this performance indicated a problem with the host, rather than just inefficient Drupal on a slow server.
I started blaming Drupal because small PHP/MySQL test pages ran acceptably (albeit not fast); in reality, Drupal was simply issuing so many queries that it exposed a problem with my host. Later, I was relieved to discover that my host was having significant issues, and that the poor performance was not caused by my software stack.
Nonetheless, even during that ‘slow’ period, I was able to make my website run acceptably well for the average visitor; I wouldn’t call it fast, but I was satisfied.
Caching
Drupal 5 has a very capable caching mechanism that operates at the page level. You can find it under Administer->Site Configuration->Performance, in the ‘Page cache’ section.

The caching options include the ability to set a minimum time-to-live for cached pages, as well as the ability to choose between ‘Normal’ and ‘Aggressive’ caching.
Minimum Time-to-Live
Caching in Drupal is done at the page level, so conceptually you can think of all of a page’s content (the node, comments, and blocks) as being cached together. This is a good thing, because it is the most effective approach from a performance standpoint. However, Drupal loses track of which content went into each cached page, which makes the cache volatile in the face of changes.
Currently, if you change any content on a live Drupal site with a high number of cached pages, all of the cached pages will be evicted (resetting the entire cache). This may seem excessive, but consider that a new comment, for example, may appear as a new line item in a block on the side of your page. Or consider a block that renders the number of nodes tagged with a certain tag (e.g. ‘Java (36)’): if you add a new piece of content tagged with ‘Java’, you would expect it to now say ‘Java (37)’. Either Drupal needs to be smart enough to figure out that the block is affected, or it has to blindly dump the cache. Drupal doesn’t currently have a reliable way to determine which pages are affected by changes to other nodes, so it uses the ‘clear all’ approach.
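A minimal sketch of that ‘clear all’ behavior follows. It is written in Python purely for illustration; Drupal itself is PHP, and this is not its code. The hypothetical PageCache class stores fully rendered HTML per URL path. Its hypothetical on_content_write() method empties the whole cache, because the cache cannot tell which pages a change touches.

```python
from typing import Callable, Dict


class PageCache:
    """Hypothetical page-level cache: fully rendered HTML stored per URL path."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render            # expensive step: runs queries, builds HTML
        self._pages: Dict[str, str] = {}
        self.render_count = 0            # how many times a page had to be rendered

    def get_page(self, path: str) -> str:
        if path in self._pages:
            return self._pages[path]     # hit: no rendering, no queries
        self.render_count += 1           # miss: render node, comments and blocks
        html = self._render(path)
        self._pages[path] = html
        return html

    def on_content_write(self) -> None:
        # The cache does not know which pages contain which content (a
        # 'Java (36)' tag block may appear on every page), so it evicts everything.
        self._pages.clear()


def render(path: str) -> str:
    return f"<html>{path}</html>"


cache = PageCache(render)
cache.get_page("/node/1001")  # miss: rendered
cache.get_page("/node/1001")  # hit
cache.get_page("/about")      # miss: rendered
cache.on_content_write()      # e.g. a new comment is posted on node 1001
cache.get_page("/about")      # rendered again, though /about may be unchanged
print(cache.render_count)     # 3
```

In the demo, /about is rendered a second time after the write, even though nothing on it may have changed. That is the price of not tracking which content each page depends on.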
A minimum time-to-live ensures that entries in the cache are used for at least a certain time frame (an hour, a day, a week, or some other period you select). The longer the lifetime, the more stable the cache, but the more likely visitors are to see stale information on certain pages. Once a page is cached, no matter what happens afterwards, changes won’t appear on that cached page until the minimum lifetime has passed.
Leaving this setting at ‘None’ means the cache is reset whenever Drupal detects that content could be stale (in other words, on any write).
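Here is a hypothetical Python sketch of how a minimum lifetime changes eviction. A write only records that content changed. An entry is re-rendered once it is both older than that change and at least min_lifetime seconds old. The class name, the injectable clock, and this exact staleness rule are assumptions chosen to mirror the behavior described above; they are not Drupal’s implementation.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class MinLifetimePageCache:
    """Hypothetical page cache honoring a minimum lifetime (seconds)."""

    def __init__(self, render: Callable[[str], str], min_lifetime: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render
        self._min_lifetime = min_lifetime   # 0 behaves like the 'None' setting
        self._clock = clock
        self._pages: Dict[str, Tuple[float, str]] = {}  # path -> (created_at, html)
        self._last_write: Optional[float] = None        # time of latest content change

    def on_content_write(self) -> None:
        # Don't evict yet; just remember that content changed.
        self._last_write = self._clock()

    def _is_stale(self, created_at: float) -> bool:
        changed = self._last_write is not None and self._last_write >= created_at
        old_enough = self._clock() - created_at >= self._min_lifetime
        return changed and old_enough

    def get_page(self, path: str) -> str:
        entry = self._pages.get(path)
        if entry is not None and not self._is_stale(entry[0]):
            return entry[1]
        html = self._render(path)
        self._pages[path] = (self._clock(), html)
        return html


# Demo with a fake clock so the timeline is explicit.
now = [0.0]
versions = [0]


def render(path: str) -> str:
    versions[0] += 1
    return f"{path} v{versions[0]}"


cache = MinLifetimePageCache(render, min_lifetime=3600, clock=lambda: now[0])
print(cache.get_page("/node/1001"))  # /node/1001 v1  (t = 0: rendered)
now[0] = 60.0
cache.on_content_write()             # t = 60: a comment is posted
print(cache.get_page("/node/1001"))  # /node/1001 v1  (still under an hour old)
now[0] = 3600.0
print(cache.get_page("/node/1001"))  # /node/1001 v2  (lifetime passed: re-rendered)
```

With min_lifetime=0, any write makes every existing entry stale on its next request, which matches the ‘None’ setting.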
Normal Caching vs Aggressive Caching
The two caching levels implemented in Drupal have a subtle but key difference. Normal caching initializes a request like any other, but when it comes time to actually run the code that renders the page (blocks, nodes, all that good stuff), it skips that work and returns pre-rendered HTML instead.
Aggressive caching, on the other hand, bypasses the module hooks entirely, even during initialization. This means that as soon as it gets a request for node 1001, it looks up a cached result for node 1001 and tries to return it immediately, rather than first telling all of the installed modules that it is going to render node 1001.
This matters because several modules use the initialization hook to set themselves up to respond to the request. None of this setup is typically expensive, but it can add up if you have hundreds of page loads a second.
The catch is that various modules perform activities such as usage tracking during the initialization and exit calls (that’s the most common use case, anyway). Because aggressive caching skips these calls, those features won’t work properly for cached pages. On the flip side, statistics collection is expensive, and this is a sure-fire way to avoid it on page loads.
The Performance page itself carries a disclaimer about the side effects of aggressive caching:
The normal cache mode is suitable for most sites and does not cause any side effects. The aggressive cache mode causes Drupal to skip the loading (init) and unloading (exit) of enabled modules when serving a cached page. This results in an additional performance boost but can cause unwanted side effects.
I have normal caching enabled on RealJenius.com, and it makes a significant enough difference that I don’t feel the need to consider aggressive caching. Given the small number of modules I have installed, the most significant performance cost is capturing the statistics in the first place, and I’d rather those keep working correctly so I can track site trends.
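The following simplified Python model of a request pipeline illustrates the difference between normal and aggressive caching. Module, serve(), and the mode strings are hypothetical stand-ins for Drupal’s module hooks, not its real API. The model also reflects the fact that cached pages are only served to anonymous visitors.

```python
from typing import Callable, Dict, List, Optional


class Module:
    """Hypothetical module with init/exit hooks, e.g. a statistics module."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.page_views = 0

    def init(self, path: str) -> None:
        pass  # per-request set-up work would go here

    def exit(self, path: str) -> None:
        self.page_views += 1  # usage tracking is commonly done at exit


def serve(path: str, cache: Dict[str, str], modules: List[Module],
          render: Callable[[str], str], mode: str = "normal",
          logged_in: bool = False) -> str:
    # Cached pages are only used for anonymous visitors.
    cached: Optional[str] = None if logged_in else cache.get(path)
    if cached is not None and mode == "aggressive":
        return cached  # skip init and exit hooks entirely
    for module in modules:
        module.init(path)
    if cached is not None:
        html = cached  # normal mode: hooks run, rendering is skipped
    else:
        html = render(path)
        if not logged_in:
            cache[path] = html
    for module in modules:
        module.exit(path)
    return html


def render(path: str) -> str:
    return f"<html>{path}</html>"


stats = Module("statistics")
page_cache: Dict[str, str] = {}
serve("/node/1001", page_cache, [stats], render)                     # miss: rendered
serve("/node/1001", page_cache, [stats], render)                     # normal hit
serve("/node/1001", page_cache, [stats], render, mode="aggressive")  # aggressive hit
print(stats.page_views)  # 2: the aggressive hit was never counted
```

The normal-mode hit still pays for the hook calls but skips rendering. The aggressive hit skips both, which is exactly why the statistics module misses it.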
Advantages
- Page caching dramatically reduces the amount of work to render mostly static pages.
- On a weblog, where visitors tend to read the same popular pages, the cache will see a high number of hits and a low number of misses.
- Not only does end-user experience improve, but your hosting provider will like you too, because CPU and memory usage will drop, as will DB traffic.
Disadvantages
- Page caching currently doesn’t work at all for logged-in users.
- The entire page cache is currently dropped whenever any content on the site changes.
Aggregate Your CSS Files
The other key ‘out-of-the-box’ performance option is CSS aggregation, which is available on the same Performance page.

The aggregation feature effectively does two things: 1. it combines the styles for all of the included modules into one file instead of several, which is much easier on the browser (not to mention the network connection), since it only has to make one request; and 2. it compresses the CSS by removing white-space.
Both of these improve perceived performance; overall, they come down to avoiding unnecessary network traffic.
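Here is a rough Python sketch of those two steps. The function names and file paths are hypothetical. The regex-based compression is deliberately naive: it also strips /* comments */, and it would mangle string values that contain braces or semicolons. Treat it as an illustration, not as a description of how Drupal does it.

```python
import re
from typing import Dict, List


def compress_css(css: str) -> str:
    """Naive compression: drop comments and redundant white-space."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # remove /* ... */
    css = re.sub(r"\s+", " ", css)                          # collapse white-space runs
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)             # trim around { } ; ,
    return css.strip()


def aggregate_css(paths: List[str], sources: Dict[str, str]) -> str:
    """Concatenate stylesheets in order, then compress the single result."""
    return compress_css("\n".join(sources[p] for p in paths))


sources = {
    "modules/node/node.css": "/* node styles */\n.node { margin: 0 0 1em; }\n",
    "themes/mytheme/style.css": "body {\n  font-family: Arial, sans-serif;\n}\n",
}
print(aggregate_css(list(sources), sources))
# .node{margin: 0 0 1em;}body{font-family: Arial,sans-serif;}
```

Spaces around colons are left alone on purpose, because in a selector `a :hover` and `a:hover` mean different things.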
Advantages
- Reducing concurrent requests and network traffic is always a good thing.
- Improving CSS delivery is very ‘close to the client’; it will quickly result in improved perceived performance because the browser is doing much less work.
Disadvantages
- Aggregation writes out a new, cached CSS file, so later edits to module and theme CSS may not show up; in short, leave this off while you are working on your theme or developing a module.
- The aggregation code isn’t fool-proof; nested imports will prevent it from aggregating as effectively.
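The following Python sketch shows one plausible way the stale-file problem arises. It rests on an explicit assumption: the aggregate’s file name is derived only from the list of stylesheet paths, and the file is built once if it does not already exist. CssAggregateCache and its methods are hypothetical.

```python
import hashlib
from typing import Dict, List


class CssAggregateCache:
    """Hypothetical cache of aggregated CSS files, keyed by the list of paths."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}  # aggregate file name -> contents

    def get_aggregate(self, paths: List[str], sources: Dict[str, str]) -> str:
        # Assumption: the name depends only on WHICH files are included,
        # not on what they contain.
        name = hashlib.sha256("\n".join(paths).encode("utf-8")).hexdigest() + ".css"
        if name not in self.files:
            # Built once; later edits to the source files go unnoticed.
            self.files[name] = "\n".join(sources[p] for p in paths)
        return name

    def clear(self) -> None:
        # Emptying the cache forces a rebuild on the next request.
        self.files.clear()


sources = {"themes/mytheme/style.css": "body { color: black; }"}
agg = CssAggregateCache()
name = agg.get_aggregate(["themes/mytheme/style.css"], sources)
sources["themes/mytheme/style.css"] = "body { color: navy; }"  # theme edit
agg.get_aggregate(["themes/mytheme/style.css"], sources)
print(agg.files[name])  # body { color: black; }  (stale)
agg.clear()
name = agg.get_aggregate(["themes/mytheme/style.css"], sources)
print(agg.files[name])  # body { color: navy; }
```

Under this assumption, a theme edit changes nothing the cache looks at, so the old aggregate keeps being served until it is cleared.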
Other Options
- Block Cache - The Block Cache module is theoretically very valuable, but it didn’t apply to me for two reasons: 1.) I didn’t need the added complexity, since I don’t have logged-in users who could benefit from block caching, and 2.) it isn’t yet solid on Drupal 5.0 (only development snapshots were available as of 4/30/07).
- eAccelerator (or a comparable PHP opcode cache) - These are a major win-win. Installing them on your host isn’t always possible, but if you can manage it, it will almost certainly improve performance, and it rarely has significant side effects.
- MySQL Tuning - MySQL offers a huge number of tuning options, but again, it’s entirely up to your host whether you have access to settings such as buffer sizes, the query cache, and more.
Summary
What I’ve covered here is just a quick overview of what worked for me, and what I avoided dealing with. Everybody’s situation is different. My site’s traffic is largely anonymous and read-only (I’m the only one who logs in, and the only one regularly maintaining content), which makes me the only visitor who misses out on the major performance benefits described above.

Comments
Really nice post. Thanks.
Maybe you could submit it to drupal.org.