Drupal 6 vs. Drupal 7 Performance (and Comments vs. Nodes)

Nathaniel Catchpole


May 19, 2009

I've been doing quite a bit of work recently trying to improve performance in Drupal 7. This has included both reducing the raw number of queries run per page, particularly those generated by node_load() and drupal_lookup_path(), and trying to track down some PHP bottlenecks. In January, I compared nodes vs. comments in HEAD. However, there haven't been any comparisons between Drupal 6 and 7 recently.

To maintain the comments vs. nodes comparison, I benchmarked the default front page with 50 and 300 nodes listed, and node/n with 0, 50 and 300 comments listed (and one node, of course). This isn't a very realistic scenario, but it's useful for seeing the impact of certain changes. There's been discussion of running continuous automated benchmarks of both HEAD and Drupal 6, so we can catch performance regressions in a similar way to how the automated testing suite catches bugs now. Hopefully as part of that we'll see more realistic tests being developed.

In the meantime, here's what I found. The good news is that we're spending a lot less time in the database for both nodes and comments in Drupal 7. The bad news is that a lot of this is being canceled out by more time spent in PHP. This is particularly obvious when viewing a single node, where HEAD is a fair bit slower than Drupal 6, but it carries over to long listings too. Viewing 300 nodes takes just over 1 second in both releases on my laptop. In Drupal 6, over half the time is spent executing database queries (700ms); in Drupal 7, we cut that by 500ms, but benchmarks come out about the same.

Times are in requests per second via ab (more requests is better); I've also included devel query log results. Note I cheated a bit and applied this patch to HEAD before doing the comparisons; without that there's an extra query per node.

The extra PHP time is in large part due to much more flexibility in certain parts of Drupal 7, like passing node contents through drupal_render(). This is good in that we've traded performance for flexibility rather than just introduced bottlenecks without a trade-off, but still, there's work to be done.
| Scenario | Drupal 6 | Drupal 7 |
|---|---|---|
| 1 node, 0 comments | 15.84 [#/sec], 42 queries in 26.76 ms | 9.29 [#/sec], 39 queries in 25.2 ms |
| 1 node, 50 comments | 4.17 [#/sec], 243 queries in 131.15 ms | 2.60 [#/sec], 91 queries in 50.76 ms |
| 50 nodes | 1.87 [#/sec], 400 queries in 248.76 ms | 2.46 [#/sec], 87 queries in 47.55 ms |
| 1 node, 300 comments | 0.97 [#/sec], 1244 queries in 808.31 ms | 0.87 [#/sec], 341 queries in 152.71 ms |
| 300 nodes | 0.46 [#/sec], 2134 queries in 788.24 ms | 0.50 [#/sec], 337 queries in 271.23 ms |
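
To make the "less database time, more PHP time" trade-off in these figures easier to see, here is a rough sketch in Python that turns each requests-per-second figure into a per-request wall time and subtracts the devel query time. It assumes ab was run with a concurrency of 1 and that the devel query log timing is comparable to the ab runs; neither is stated in the post. The function and variable names are made up for this illustration.

```python
# Figures copied from the benchmark results above:
# (scenario, release, requests per second, query count, query time in ms)
ROWS = [
    ("1 node, 0 comments", "Drupal 6", 15.84, 42, 26.76),
    ("1 node, 0 comments", "Drupal 7", 9.29, 39, 25.2),
    ("1 node, 50 comments", "Drupal 6", 4.17, 243, 131.15),
    ("1 node, 50 comments", "Drupal 7", 2.60, 91, 50.76),
    ("50 nodes", "Drupal 6", 1.87, 400, 248.76),
    ("50 nodes", "Drupal 7", 2.46, 87, 47.55),
    ("1 node, 300 comments", "Drupal 6", 0.97, 1244, 808.31),
    ("1 node, 300 comments", "Drupal 7", 0.87, 341, 152.71),
    ("300 nodes", "Drupal 6", 0.46, 2134, 788.24),
    ("300 nodes", "Drupal 7", 0.50, 337, 271.23),
]


def non_db_ms(req_per_sec, query_ms):
    """Estimate time per request spent outside database queries.

    Assumption: one request at a time, so 1000 / req_per_sec is the
    wall time of a single request in milliseconds.
    """
    return 1000.0 / req_per_sec - query_ms


def summarize(rows):
    """Map (scenario, release) to the estimated non-database time in ms."""
    return {
        (scenario, release): non_db_ms(rps, query_ms)
        for scenario, release, rps, _queries, query_ms in rows
    }


summary = summarize(ROWS)
for (scenario, release), ms in summary.items():
    print(f"{scenario:22s} {release}: ~{ms:.0f} ms outside the database")
```

Under those assumptions, the estimated non-database time is higher in Drupal 7 than in Drupal 6 for every scenario, even where the query time drops sharply.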
If you're interested in improving this situation further, take a look at http://drupal.org/project/issues/search/drupal?issue_tags=Performance for things currently being worked on.

@anonymous: check_markup() is cached and currently run on every formatted textarea, so it's one cache_get() for every textarea displayed. If you run memcache or similar, then you skip that query. For database caching I'm hoping a combination of http://drupal.org/node/369011 and http://drupal.org/node/333171 will reduce this to a single query for any list of fieldable stuff with a text field attached, i.e. if body-as-field lands, the hook_field_sanitize() patch would cut 2 queries per node shown on a page. @greggles: Yeah, I think it's not too bad for large sites, especially if we manage to successfully push down memory usage per Apache process via the registry (although that's not done yet). Better than the other way around. But still...
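
The difference between one cache lookup per textarea and a single lookup for a whole list can be sketched roughly as below. This is a toy illustration in Python, not Drupal's cache API: the CountingCache class and both render functions are hypothetical names, and the "cache" is just a dictionary that counts how many round trips it receives.

```python
class CountingCache:
    """Hypothetical stand-in for a database-backed cache.

    Each call to get() or get_multiple() counts as one round trip,
    which is the cost being compared here.
    """

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, cid):
        # One round trip per cache ID.
        self.round_trips += 1
        return self._data.get(cid)

    def get_multiple(self, cids):
        # One round trip for the whole list of cache IDs.
        self.round_trips += 1
        return {cid: self._data[cid] for cid in cids if cid in self._data}


def render_one_by_one(cache, cids):
    """Look up each formatted text separately (one round trip each)."""
    return [cache.get(cid) for cid in cids]


def render_batched(cache, cids):
    """Look up all formatted texts in a single round trip."""
    found = cache.get_multiple(cids)
    return [found.get(cid) for cid in cids]


# 50 comments, each with one cached piece of formatted text.
cids = [f"comment:{n}" for n in range(50)]
data = {cid: f"<p>text for {cid}</p>" for cid in cids}

single = CountingCache(data)
batched = CountingCache(data)
one_by_one_result = render_one_by_one(single, cids)
batched_result = render_batched(batched, cids)

print("one by one:", single.round_trips, "round trips")
print("batched:", batched.round_trips, "round trip")
```

Both approaches return the same rendered text; only the number of round trips to the cache backend differs, which is why the per-comment query count grows with the listing in the first case and stays flat in the second.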
If we push the bottleneck to PHP, that could be good news: it's easier to create a cluster of webservers than a cluster of databases. So people who need to scale beyond 1 server will have an easier time doing this if the bottleneck is in PHP. Of course, it would be nice if clustering MySQL behind Drupal were easier, and it's probably getting there, but this doesn't seem like terrible news to me.
Just out of curiosity. Why does the number of queries rise with the number of comments (iow why is there an extra query for every comment)?
Some of the reasons include Fields in Core, changes to drupal_render, theming and so on. There are afaik some bottlenecks that are currently being discussed. There are also some additional abstraction layers (like the old DB API) that slow things down a bit, I think, and will be removed (hopefully; I'm working on that...) before D7 is released. What would imho be really interesting is a performance test with ~50 real modules with hooks and stuff to see the influence of the Registry. But that obviously can't be done at the moment... catch, can you post which modules were enabled in D6 and D7? I'll try to come up with some CacheGrind data...
In Drupal 6, the default modules + path + devel (also generated path aliases for the nodes). In Drupal 7, same again + the field modules (which are optional, but won't be soon). I did some profiling as well, nothing very conclusive though, but I will try to get some better results and post those too. Most interesting is going to be node/n with no comments (or a few), I think, where the database gains don't make much difference. I think PDO is responsible for a fair bit at that level, but I've not been able to get reliable results yet.
It's great to see the number of queries come down like this, but I admit I'm surprised to see the overall PHP runtime not come down with it. Would the inclusion of APC at the PHP layer significantly cut down on this processing to increase the net gain over Drupal 6? Either way, keep up the fantastic work, catch. You might be the sole reason why my 2009 prediction won't come true.
I should have mentioned - I run APC on localhost and never switch it off unless I'm specifically testing something without it (like this: http://drupal.org/node/345118). So no, it wouldn't help here - things might yet be much, much worse without it though!