The hidden cost of indexes

Published by Aaron Price on March 25, 2016

If indexing is so amazing at speeding up database queries, why not index everything?

To answer this question, let's dive in to what are indexes and how they work.

Think of your database as a city

Credit: Matthew Wiebe

Each table is its own building in the city. Every time you add a new row to the table, you add a new floor to the building.

Think of queries as people

The building, by default, allows you to traverse each floor using stairs, which is a great solution if you only have one or two floors. But with each floor you add, the slower it becomes too traverse the entire building.

Adding an index to a table is like putting in an elevator

It's a quick, convenient way to get to the floor you need, but elevators cost a lot of money, which in database terms translates to memory, and the cost of a single elevator/index grows with the addition of each new floor/row.

So if you only have a couple rows in a table that is hardly used, there's absolutely no need to add an index. In fact, I would go so far as to say, you should only add indexes after you've confirmed that the benefit you'll receive justifies the cost.

Here's why:

Adding an index is easy, but removing one is hard

If there were a building with 4 floors and its residents were accustomed to using an elevator, it would be a hard sell to remove it. Similarly, with an index on a database table, you would have to go through all the queries on that table to see if it makes sense to remove that particular index. This process becomes more difficult as you scale and queries on a single table may exist in many places.

One way to determine if you need to index is to use a tool like NewRelic or Skylight.io. These tools tell you which queries are performing the slowest.

Only add one index at a time. In a building, a person can only use one elevator at a time. Similarly, a database will only use one index per table per query.

Questions to ask before adding an index

Ask yourself these questions to determine the best place to add an index:

  • Which query condition is used most?
  • Which query condition best filters the search result to the smallest possible set?
  • Which column has the highest amount of unique values?
  • Which column has the smallest byte size?

Index the column or combination of columns that best meet the above criteria.

SPOILER ALERT: ID columns usually score very high on this test, and text fields (i.e.: textarea fields) and boolean columns usually score very low.

Let's look at some real world examples

I'm going to use the below table to show various scenarios of how best to think about indexing. I want to take this opportunity to reiterate, you should only add one index at a time, and only after you've deemed it necessary.

Consider the following table:


                                     Table "public.articles"
    Column    |            Type             |                      Modifiers
--------------+-----------------------------+------------------------------------------------------
 id           | integer                     | not null default nextval('articles_id_seq'::regclass)
 subject_type | character varying           | not null
 subject_id   | integer                     | not null
 slug         | character varying           | not null
 title        | character varying           | not null
 body         | text                        | not null
 published_at | timestamp without time zone |
 created_at   | timestamp without time zone |
 updated_at   | timestamp without time zone |
Indexes:
    "articles_pkey" PRIMARY KEY, btree (id)

If you're using the to_param method to have the article's slug act like an ID column, then putting a single index on a slug might make sense.


add_index :articles, :slug

Composite indexes

A composite index is comprised of more than one column. In the above table you can see that an article has a polymorphic relationship to a subject. In all cases of polymorphism, it's better to create a composite index rather that two separate indexes.


add_index :articles, [:subject_type, :subject_id]

The reason is that, the composite index best satifies the "highest number of unique values" requirement.

Datatype mismatching

There may come a time in the development of your app that you want to group your data by day. E.g.: I'd like to see a graph showing articles published by day. If the query seems slow, you might want to add an index to pubished_at column to speed it up, but published_at is a timestamp column, and the data you need is of type date.

In cases like this, it's better to create a new column called published_on which contains only the date the article was published, and add an index to that field if necessary.


add_column :articles, :published_on
execute "UPDATE articles SET published_on = published_at::date WHERE published_at IS NOT NULL"
add_index :articles, :published_on

Conclusion

Indexes are a great way to speed up database queries, but they come a the cost of memory. They're easy to setup but hard to remove, so only add indexes as you need them. Using the right type of index is important (single column vs. composite) and make sure you're indexing the data you're actually using.

About me...

My name is Aaron Price. I'm a ruby developer based in Toronto and I'm passionate about simplicity.

Feel free to reach out using the following links: