最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - Which column to index in multicolumn indexing? - Stack Overflow

programmeradmin6浏览0评论
UserID First Middle Last Type CreatedAt
123 John Henry Doe Mage 03-28-2025
UserID First Middle Last Type CreatedAt
123 John Henry Doe Mage 03-28-2025

Let's say I have the following table above. I would like to create index to help speed up my queries.

All queries would like along the lines of this:

Select *
from users
where Type = 'SomeType'
  and First = 'SomeName1'
Order by CreatedAt DESC;
Select *
from users
where Type = 'SomeType'
  and Middle = 'SomeName2'
Order by CreatedAt DESC;
Select *
from users
where Type = 'SomeType'
  and Last = 'SomeName3'
Order by CreatedAt DESC;

How would I index the columns to make the queries efficient? Would CreatedAt be the first of the indexed column?

I'm thinking

CREATE INDEX idx_users on users(CreatedAt, Type, First, Middle, Last)

CreatedAt and type would always be used, whereas first, middle, and last varies.

Share Improve this question edited Mar 28 at 7:56 jarlh 44.8k8 gold badges50 silver badges67 bronze badges asked Mar 28 at 6:20 Alan ChenAlan Chen 412 bronze badges 2
  • CREATE INDEX idx_users on users(Type, First, Middle, Last); or CREATE INDEX idx_users on users(Type, First, Middle, Last)include(createdat,userid); if it fits and you're after index-only scans. If createdat and userid aren't in where, they don't need to be index keys, only an optionally included payload. If the type you're searching for is constant, you can make that a partial index by removing it from the keys and adding a where(type='sometype') to the definition. You might be interested in index types that support pattern matching (gin_trgm_ops) for the names. – Zegarek Commented Mar 28 at 6:50
  • How much research effort is expected of Stack Overflow users? minimal reproducible example Tips for asking a good SQL question Asking query performance questions How to Optimize Queries in a Database - The Basics What are your most common sql optimizations? When and why are database joins expensive? – philipxy Commented Mar 28 at 20:52
Add a comment  | 

3 Answers 3

Reset to default 3

The index you proposed might be unhelpful:

CREATE INDEX idx_users on users(CreatedAt, Type, First, Middle, Last)

The fact that you're selecting and ordering by CreatedAt doesn't mean it's needed as an index key. In my test on 400k samples, none of the three queries you showed ended up using it. They all ended up running a sequential scan that took above 150ms.

If you simply remove createdat from the index, they're happy to use it for a Bitmap Index Scan taking below 5ms. You can add more tweaks depending on how much input latency you're ready to tolerate and how much space you're willing to spend on accommodating the indexes.

Here's a comparison of storage space they take up on 200k rows, with query times in ms averaged on 45 calls, plus the index size and how much more space you need for it, in % compared to the base table size:
demo at db<>fiddle

variant query1 query2 query3 indexes size %size
(Type,First,Middle,Last) include(createdat,userid); 0.159 0.148 0.148 35 MB 104
(Type,First,CreatedAt) INCLUDE(Middle,Last,UserID);
(Type,Middle,CreatedAt) INCLUDE(First,Last,UserID);
(Type,Last,CreatedAt) INCLUDE (First,Middle,UserID);
charlieface
0.172 0.165 0.161 108 MB 325
(Type,First,Middle,Last); 0.197 0.187 0.182 33 MB 98
(Type,first);
(Type,middle);
(Type,last);
0.196 0.191 0.179 4912 kB 14
(first,type,createdat desc);
(middle,type,createdat desc);
(last,type,createdat desc);
Thorsten Kettner
0.214 0.210 0.202 53 MB 161
(Type) include(First,Middle,Last, UserID,CreatedAt); 2.712 2.689 2.399 34 MB 103
(Type); 5.098 5.024 4.627 1456 kB 4
(CreatedAt,Type,First,Middle,Last); 77.073 76.153 76.333 32 MB 97

Indexing is about selectivity. If there was an index, and a query selected 50% of the rows in the table for instance, then it would be a bad idea for the DBMS to use the index, as this would be a lot of work compared to just reading all table rows sequentially.

There are two columns in each where clause. Which of the two is more selective? Let's look at the first query: I would expect there to be more distinct first names then types, so the first name would be the first column in the index and the type only be the second one. So, an index for the first query would look like this:

create index idx_1 on users (first, type);

As you are using equality comparison only (Type = ... , First = ...) you will hit one position in the index where you will find all matches. This allows us to add the CreatedAt column to the index, so all those matches would already be sorted:

create index idx_1 on users (first, type, createdat desc);

For the other two queries you'd have accordingly:

create index idx_2 on users (middle, type, createdat desc);
create index idx_3 on users (last, type, createdat desc);

So, you need three different indexes for the three queries.

If you only wanted to provide a single index for all queries instead, then you'd index the type, because this is the column in the where clause all three queries have in common. You could add the other columns, so as to provide them before the table is read, but I doubt the DBMS would use the index, anyway, because I don't assume the type to be selective enough. One query might benefit, namely the one where the second indexed column would match the column in the where clause, but that's it. So, providing one index only isn't a promising option. You must provide three indexes, one per query.

That index is not useful, as CreatedAt is the first column, which means all the other columns cannot be seeked.

Furthermore:

  • As it doesn't fully "cover" the query, which uses select *. You would need all other columns in an include.
  • Middle and Last are not helping you by being in the index key, as they are after First. So this index (even if it was covering) will only help for the first query.

So you need three separate indexes.

(Type, First, CreatedAt) INCLUDE (Middle, Last, UserID)
(Type, Middle, CreatedAt) INCLUDE (First, Last, UserID)
(Type, Last, CreatedAt) INCLUDE (First, Middle, UserID)

You can see in the fiddle that this gets you an index-only scan in all three queries.

db<>fiddle

发布评论

评论列表(0)

  1. 暂无评论