Choosing which database indexes to add

By Nathan Donaldson in Development on August 20, 2009

Image 0476 full 2x

When writing a Rails application, how do you decide on the best indexes to add to your database? It might seem obvious, especially if you work on a project from scratch. The problem is a little harder when you come to optimize an existing codebase.

New Relic

Recently I’ve been using two methods to work out where to put indexes. Firstly I’d strongly recommend using New Relic RPM in development mode. When running your application you can visit /newrelic to get all kinds of useful information. Here you can see the most recent rails calls:

Picture 4

And you can drill down to see the database calls:

Picture 8

And then you can drill down further to see a statement analysis. This will give you information like so:

Query SELECT * FROM `projects` WHERE (`projects`.account_id = 3) ORDER BY position
Select type SIMPLE
Possible Keys
Key
Extra Using where; Using filesort

So this will tell you that there are no keys used, and no possible keys to use. If you now create a migration as follows:

class CreateIndexes < ActiveRecord::Migration
  def self.up
    add_index :projects, :account_id
  end
end

Now run the query again after migrating:

Query SELECT * FROM `projects` WHERE (`projects`.account_id = 3) ORDER BY position
Select type SIMPLE
Possible Keys index_projects_on_account_id
Key index_projects_on_account_id
Extra Using where; Using filesort

But we could do better. Notice that in the extra field it says “Using filesort”. This means that MySQL is doing a normal sort using the row data. If projects are usually selected by account_id and ordered by position then we have a good case for sorting by index. We can’t just add an index on the position column as this won’t be used (try it and you’ll see exactly the same results). Instead we have to add an index that covers both account_id and position:

add_index :projects, [:account_id, :position]

Now looking in RPM again you’ll see things change to:

Query SELECT * FROM `projects` WHERE (`projects`.account_id = 3) ORDER BY position
Select type SIMPLE
Possible Keys index_projects_on_account_id_and_position
Key index_projects_on_account_id_and_position
Extra Using where

Overriding execute

I talked about using two tools to work out where to put indexes. New Relic is great and gives lots of useful information, but I wanted to see what queries were running without indexes as the application was used, in order decide if they required optimizing. I placed the following (ugly) snippet of code into the development or staging environment file:

config.after_initialize do
  unless ActiveRecord::Base.connection.respond_to?(:old_execute)
    ActiveRecord::Base.connection.class.class_eval do
      alias_method :old_execute, :execute

      def log_explain(str)
        if str =~ /^SELECT.*WHERE/i
          results = old_execute('EXPLAIN %s' % str)

          while row = results.fetch_row
            if row[5].blank?
              File.open(RAILS_ROOT + '/log/sql.log', 'a') do |f|
                f.puts str
                f.puts row.inspect
              end
            end
          end
        end
      end

      def execute(*args)
        log_explain(args.first)
        old_execute(*args)
      end
    end
  end
end

This performs a similar task to RPM in that it runs MySQL’s EXPLAIN statement on each SELECT statement that runs. However, I’ve set it up to filter out any statements that are already using keys (if row[5].blank?). Now by tailing the file log/sql.log I’ll be able to see any unindexed queries that are running on the database. I can use this information to quickly run through the whole application to find any problem areas.

What not to index

Most of the database calls from a rails application are going to be quite generic, the same thing over and over, and indexing is good in this situation. Some database calls are going to be a little different – think search, statistical count, etc. You’ll need to think careful if these require indexes or not, considering how often they’re run, and how fast they need to be. Caching can be a good replacement in these areas. Also take into account that indexes slow down database writes, and indexes on string columns are going to be much more intensive that indexes on integer columns.

Load testing

Of course you might not find all problem areas, or you might not be optimizing the correct queries. These methods should give you a good idea, but they’re no replacement for doing good load testing on your application. We’ll be posting more about load testing a rails application in the future.