Two recent projects at CivicActions had some queries that were causing performance issues. What is interesting to note that in both cases, we found that adding a correct index to a table or two helped mitigate some of the issues. I have been thinking about this, and I wonder how far Drupal has gone to deal with database optimization. Focusing on indexes alone is enough to get started. I have this vague sense of Well if it's in core, then I know lots of people have looked at these tables and indexes are created as needed. With contributed modules, my vague sense is You better check every table created by every module to make sure indexes exist where you need them. That is probably a bit too harsh. In general, you want to create indexes only when you need them. You don't want too many and you don't want too few, so it's a matter of it being "just right". I decided to investigate contributed modules some more. What about the biggies? Views? Part of CCK are already in core, so I'm not worry about CCK too much. Certainly they would have proper table indexing, right? Let's see. Views looks good. I see a couple of indexes beyond primary keys to support views tables. Even just seeing these indexes in the schema adds a sense of relief. What about others? The two issues we came across recently had to do with i18n. The current tables in i18n look good to me, so what happened? One issue is that in Drupal 5x the i18n_node table does not have indexes where you really need them. In particular, there are queries which rely on the trid column like this example from translation_node_get_translation():
SELECT n.nid, n.title, n.status, a.language FROM node n INNER JOIN i18n_node a ON n.nid = a.nid INNER JOIN i18n_node b ON a.trid = b.trid WHERE b.nid = '11' AND n.nid != 11 AND b.trid != 0
There is an INNER JOIN and a WHERE clause utilizing the i18n.trid column. Running EXPLAIN before adding indexes, shows:
EXPLAIN SELECT n.nid, n.title, n.status, a.language FROM node n INNER JOIN i18n_node a ON n.nid = a.nid INNER JOIN i18n_node b ON a.trid = b.trid WHERE b.nid = '11' AND n.nid != 11 AND b.trid != 0;
+----+-------------+-------+-------+---------------+---------+---------+-----------+------+----+
| id | select_type | table | type  | possible_keys | key     | key_len | ref       | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-----------+------+----+
|  1 | SIMPLE      | b     | const | PRIMARY       | PRIMARY | 4       | const     |    1 |             |
|  1 | SIMPLE      | a     | ALL   | PRIMARY       | NULL    | NULL    | NULL      |  205 | Using where |
|  1 | SIMPLE      | n     | ref   | PRIMARY,nid   | nid     | 4       | mnm.a.nid |    1 |             |
+----+-------------+-------+-------+---------------+---------+---------+-----------+------+----+
3 rows in set (0.02 sec)
I add an index like this:
CREATE INDEX trid ON i18n_node (trid);
Now when running EXPLAIN, look at the results:
EXPLAIN SELECT n.nid, n.title, n.status, a.language FROM node n INNER JOIN i18n_node a ON n.nid = a.nid INNER JOIN i18n_node b ON a.trid = b.trid WHERE b.nid = '11' AND n.nid != 11 AND b.trid != 0;
+----+-------------+-------+-------+---------------+---------+---------+-----------+------+----+
| id | select_type | table | type  | possible_keys | key     | key_len | ref       | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-----------+------+----+
|  1 | SIMPLE      | b     | const | PRIMARY,trid  | PRIMARY | 4       | const     |    1 |             |
|  1 | SIMPLE      | a     | ref   | PRIMARY,trid  | trid    | 4       | const     |    2 | Using where |
|  1 | SIMPLE      | n     | ref   | PRIMARY,nid   | nid     | 4       | mnm.a.nid |    1 |             |
+----+-------------+-------+-------+---------------+---------+---------+-----------+------+----+
3 rows in set (0.00 sec)
Do you see the changes in the second row? The type is now 'ref' not 'ALL', the ref value is 'const', and there is a reduction in the number of rows MySQL thinks it will review. That's what we're looking for! The other issue was with Drupal6 and it turns out that adding a new module, country_code necessitated an index on the language column in the node table. This makes more sense when you look at one of the queries:
SELECT COALESCE(node2.nid, node.nid)
FROM node node LEFT JOIN node node2
ON node.tnid = node2.tnid AND node2.language = 'en'
WHERE node.language = '' AND
node.tnid = 1;
That node.language = '' desires an index. So adding an index to that column alone improved performance:
CREATE INDEX language ON node (language);
What does this mean? It means you need to watch out when adding new modules to your high traffic and heavy sites. I think it is worthwhile to check at least any new tables a module is creating and see if you can identify a column or two that may need an index. With a little more time, running the devel module and looking for slow queries is helpful too. The point of this post is not to show you what I know or to blame developers. It is a reminder to me (and I hope for you too) to use available tools and to watch for performance changes when adding new modules.