If indexing is so amazing at speeding up database queries, why not index everything?
To answer this question, let's dive in to what are indexes and how they work.
Each table is its own building in the city. Every time you add a new row to the table, you add a new floor to the building.
The building, by default, allows you to traverse each floor using stairs, which is a great solution if you only have one or two floors. But with each floor you add, the slower it becomes too traverse the entire building.
It's a quick, convenient way to get to the floor you need, but elevators cost a lot of money, which in database terms translates to memory, and the cost of a single elevator/index grows with the addition of each new floor/row.
So if you only have a couple rows in a table that is hardly used, there's absolutely no need to add an index. In fact, I would go so far as to say, you should only add indexes after you've confirmed that the benefit you'll receive justifies the cost.
If there were a building with 4 floors and its residents were accustomed to using an elevator, it would be a hard sell to remove it. Similarly, with an index on a database table, you would have to go through all the queries on that table to see if it makes sense to remove that particular index. This process becomes more difficult as you scale and queries on a single table may exist in many places.
One way to determine if you need to index is to use a tool like NewRelic or Skylight.io. These tools tell you which queries are performing the slowest.
Only add one index at a time. In a building, a person can only use one elevator at a time. Similarly, a database will only use one index per table per query.
Ask yourself these questions to determine the best place to add an index:
Index the column or combination of columns that best meet the above criteria.
SPOILER ALERT: ID columns usually score very high on this test, and text fields (i.e.: textarea fields) and boolean columns usually score very low.
I'm going to use the below table to show various scenarios of how best to think about indexing. I want to take this opportunity to reiterate, you should only add one index at a time, and only after you've deemed it necessary.
Consider the following table:
Table "public.articles" Column | Type | Modifiers --------------+-----------------------------+------------------------------------------------------ id | integer | not null default nextval('articles_id_seq'::regclass) subject_type | character varying | not null subject_id | integer | not null slug | character varying | not null title | character varying | not null body | text | not null published_at | timestamp without time zone | created_at | timestamp without time zone | updated_at | timestamp without time zone | Indexes: "articles_pkey" PRIMARY KEY, btree (id)
If you're using the
to_param method to have the article's slug act like an ID column, then putting a single index on a slug might make sense.
add_index :articles, :slug
A composite index is comprised of more than one column. In the above table you can see that an article has a polymorphic relationship to a subject. In all cases of polymorphism, it's better to create a composite index rather that two separate indexes.
add_index :articles, [:subject_type, :subject_id]
The reason is that, the composite index best satifies the "highest number of unique values" requirement.
There may come a time in the development of your app that you want to group your data by day. E.g.: I'd like to see a graph showing articles published by day. If the query seems slow, you might want to add an index to
pubished_at column to speed it up, but
published_at is a timestamp column, and the data you need is of type date.
In cases like this, it's better to create a new column called
published_on which contains only the date the article was published, and add an index to that field if necessary.
add_column :articles, :published_on execute "UPDATE articles SET published_on = published_at::date WHERE published_at IS NOT NULL" add_index :articles, :published_on
Indexes are a great way to speed up database queries, but they come a the cost of memory. They're easy to setup but hard to remove, so only add indexes as you need them. Using the right type of index is important (single column vs. composite) and make sure you're indexing the data you're actually using.