sql - Which column to index in multicolumn indexing?

UserID	First	Middle	Last	Type	CreatedAt
123	John	Henry	Doe	Mage	03-28-2025

UserID	First	Middle	Last	Type	CreatedAt
123	John	Henry	Doe	Mage	03-28-2025

Let's say I have the following table above. I would like to create index to help speed up my queries.

All queries would like along the lines of this:

Select *
from users
where Type = 'SomeType'
  and First = 'SomeName1'
Order by CreatedAt DESC;

Select *
from users
where Type = 'SomeType'
  and Middle = 'SomeName2'
Order by CreatedAt DESC;

Select *
from users
where Type = 'SomeType'
  and Last = 'SomeName3'
Order by CreatedAt DESC;

How would I index the columns to make the queries efficient? Would CreatedAt be the first of the indexed column?

I'm thinking

CREATE INDEX idx_users on users(CreatedAt, Type, First, Middle, Last)

CreatedAt and type would always be used, whereas first, middle, and last varies.

Share Improve this question edited Mar 28 at 7:56 jarlh 44.8k8 gold badges50 silver badges67 bronze badges asked Mar 28 at 6:20 Alan Chen 412 bronze badges

CREATE INDEX idx_users on users(Type, First, Middle, Last); or CREATE INDEX idx_users on users(Type, First, Middle, Last)include(createdat,userid); if it fits and you're after index-only scans. If createdat and userid aren't in where, they don't need to be index keys, only an optionally included payload. If the type you're searching for is constant, you can make that a partial index by removing it from the keys and adding a where(type='sometype') to the definition. You might be interested in index types that support pattern matching (gin_trgm_ops) for the names. – Zegarek Commented Mar 28 at 6:50
How much research effort is expected of Stack Overflow users? minimal reproducible example Tips for asking a good SQL question Asking query performance questions How to Optimize Queries in a Database - The Basics What are your most common sql optimizations? When and why are database joins expensive? – philipxy Commented Mar 28 at 20:52

Add a comment |

3 Answers 3

Sorted by: Reset to default 3

The index you proposed might be unhelpful:

CREATE INDEX idx_users on users(CreatedAt, Type, First, Middle, Last)

The fact that you're selecting and ordering by CreatedAt doesn't mean it's needed as an index key. In my test on 400k samples, none of the three queries you showed ended up using it. They all ended up running a sequential scan that took above 150ms.

If you simply remove createdat from the index, they're happy to use it for a Bitmap Index Scan taking below 5ms. You can add more tweaks depending on how much input latency you're ready to tolerate and how much space you're willing to spend on accommodating the indexes.

Here's a comparison of storage space they take up on 200k rows, with query times in ms averaged on 45 calls, plus the index size and how much more space you need for it, in % compared to the base table size:
_{demo at db<>fiddle}

variant	query1	query2	query3	indexes size	%size
`(Type,First,Middle,Last) include(createdat,userid);`	0.159	0.148	0.148	35 MB	104
`(Type,First,CreatedAt) INCLUDE(Middle,Last,UserID);` `(Type,Middle,CreatedAt) INCLUDE(First,Last,UserID);` `(Type,Last,CreatedAt) INCLUDE (First,Middle,UserID);` _charlieface	0.172	0.165	0.161	108 MB	325
`(Type,First,Middle,Last);`	0.197	0.187	0.182	33 MB	98
`(Type,first);` `(Type,middle);` `(Type,last);`	0.196	0.191	0.179	4912 kB	14
`(first,type,createdat desc);` `(middle,type,createdat desc);` `(last,type,createdat desc);` _{Thorsten Kettner}	0.214	0.210	0.202	53 MB	161
`(Type) include(First,Middle,Last, UserID,CreatedAt);`	2.712	2.689	2.399	34 MB	103
`(Type);`	5.098	5.024	4.627	1456 kB	4
`(CreatedAt,Type,First,Middle,Last);`	77.073	76.153	76.333	32 MB	97

Indexing is about selectivity. If there was an index, and a query selected 50% of the rows in the table for instance, then it would be a bad idea for the DBMS to use the index, as this would be a lot of work compared to just reading all table rows sequentially.

There are two columns in each where clause. Which of the two is more selective? Let's look at the first query: I would expect there to be more distinct first names then types, so the first name would be the first column in the index and the type only be the second one. So, an index for the first query would look like this:

create index idx_1 on users (first, type);

As you are using equality comparison only (Type = ... , First = ...) you will hit one position in the index where you will find all matches. This allows us to add the CreatedAt column to the index, so all those matches would already be sorted:

create index idx_1 on users (first, type, createdat desc);

For the other two queries you'd have accordingly:

create index idx_2 on users (middle, type, createdat desc);
create index idx_3 on users (last, type, createdat desc);

So, you need three different indexes for the three queries.

If you only wanted to provide a single index for all queries instead, then you'd index the type, because this is the column in the where clause all three queries have in common. You could add the other columns, so as to provide them before the table is read, but I doubt the DBMS would use the index, anyway, because I don't assume the type to be selective enough. One query might benefit, namely the one where the second indexed column would match the column in the where clause, but that's it. So, providing one index only isn't a promising option. You must provide three indexes, one per query.

That index is not useful, as CreatedAt is the first column, which means all the other columns cannot be seeked.

Furthermore:

As it doesn't fully "cover" the query, which uses select *. You would need all other columns in an include.
Middle and Last are not helping you by being in the index key, as they are after First. So this index (even if it was covering) will only help for the first query.

So you need three separate indexes.

(Type, First, CreatedAt) INCLUDE (Middle, Last, UserID)
(Type, Middle, CreatedAt) INCLUDE (First, Last, UserID)
(Type, Last, CreatedAt) INCLUDE (First, Middle, UserID)

You can see in the fiddle that this gets you an index-only scan in all three queries.

db<>fiddle

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

sql - Which column to index in multicolumn indexing? - Stack Overflow

3 Answers 3

与本文相关的文章

评论列表(0)