Once upon a time I have stumbled into a performance issue with one of the queries in the app I have been working on.
Here I want to share my findings during the issue investigation.
Let’s start simple. Imagine that we have a database with such a table:
CREATE TABLE sample(
id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
some_name VARCHAR(200) NOT NULL,
some_time TIMESTAMP NOT NULL,
some_status VARCHAR(20) NOT NULL,
some_number INTEGER NOT NULL
);
The table has about 6 million rows, and we need to query it fairly often. We were using the PostgresSQL as our database, so we were expecting that everything should work just fine on such scale. Since we know that it will be read intensively, we have put an index on it:
CREATE INDEX some_name_some_time_some_status_index on sample (some_name, some_time, some_status);
The index covers all the fields we were planning to filter by, so we could assume that everything will be fine from the performance perspective, and it actually was. The query like this one:
SELECT *
FROM sample
WHERE some_name
NOT IN ('8c9ff4808ce85067d7c8c43907482e62', '02f371da71e4f315995cb6195cb04100', 'a78ef4eedebb04cd80555668b054997f')
AND some_time > now() - interval '6 month'
AND some_status = 'PROCESSED' LIMIT 500;
behaved correctly, and we were getting good result with it:
In some time after the initial implementation we figured out that we need the results to be sorted in some way:
SELECT *
FROM sample
WHERE some_name
NOT IN ('8c9ff4808ce85067d7c8c43907482e62', '02f371da71e4f315995cb6195cb04100', 'a78ef4eedebb04cd80555668b054997f')
AND some_time < now() - interval '6 month'
AND some_status = 'PROCESSED'
ORDER BY some_name DESC, some_time LIMIT 500;
The query looked almost the same with only addition of the ORDER BY
clause, however the
execution time has grown significantly.
We have figured out that our index is not so performant in such situation
and even was fully ignored sometimes:
The first thought was to look for some internal DB configurations which could help us to avoid the issue, however we have found nothing suitable.
Then, after some more time we have looked more thoroughly at the Postgres indexes documentation and bumped into https://www.postgresql.org/docs/9.4/indexes-ordering.html which has opened our eyes. The issue we were fighting with could have been resolved with simple index tweak. We have added another index to the DB (ordered this time) and this was the right thing.
The index:
CREATE INDEX some_name_some_time_some_status_ordered_index on sample (some_name DESC, some_time ASC, some_status);
has improved the query execution speed dramatically (10x increase).
On the project I have worked on the improvement was even more impressing (200x for 10kk entries).
This case has taught me the importance of indexes, and the right choice of them.
The sources with a script for the sample DB setup is hosted on GitHub.
Thanks for reading.