Thread: Index on multiple columns VS multiple index

Index on multiple columns VS multiple index

From
Brice André
Date:
Hello everyone,

I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
  • One index on 'a' and one on 'b'
  • One index on both columns 'a' and 'b'
  • A combination of both solutions ?

Thanks in advance,

Brice

Re: Index on multiple columns VS multiple index

From
"Jonathan S. Katz"
Date:
On Jan 2, 2014, at 2:17 PM, Brice André wrote:

Hello everyone,

I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
  • One index on 'a' and one on 'b'
  • One index on both columns 'a' and 'b'
  • A combination of both solutions ?
Could you clarify your question a bit?  Are you saying your queries are predominantly

SELECT ... FROM table WHERE a = ?

With some queries that are

SELECT ... FROM table WHERE a = ? AND b = ?

Thanks,

Jonathan

Re: Index on multiple columns VS multiple index

From
"Jonathan S. Katz"
Date:
Hi Brice,

On Jan 2, 2014, at 2:24 PM, Brice André wrote:

Yes, except that on b, it's a inequality operator.


2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>
On Jan 2, 2014, at 2:17 PM, Brice André wrote:

Hello everyone,

I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
  • One index on 'a' and one on 'b'
  • One index on both columns 'a' and 'b'
  • A combination of both solutions ?
Could you clarify your question a bit?  Are you saying your queries are predominantly

SELECT ... FROM table WHERE a = ?

With some queries that are

SELECT ... FROM table WHERE a = ? AND b = ?

Thanks,

Jonathan

Moving your reply to the list.

Assuming the data type you are using supports B-tree indexes, I can't think of any cases where inequality (specifically <> or !=) would use an index, so a single index on 'a' is what you are looking for.

However, if you are doing anything with equality (<, <=, =, >=, >) then you would wnat a multi-column index on (a,b), in that column order.

Best,

Jonathan

Re: Index on multiple columns VS multiple index

From
Erik Darling
Date:

This is a good article on the subject. I know it's not PG specific, but it expands a bit on Jonathan's point.

http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/

On Jan 2, 2014 2:34 PM, "Jonathan S. Katz" <jonathan.katz@excoventures.com> wrote:
Hi Brice,

On Jan 2, 2014, at 2:24 PM, Brice André wrote:

Yes, except that on b, it's a inequality operator.


2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>
On Jan 2, 2014, at 2:17 PM, Brice André wrote:

Hello everyone,

I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
  • One index on 'a' and one on 'b'
  • One index on both columns 'a' and 'b'
  • A combination of both solutions ?
Could you clarify your question a bit?  Are you saying your queries are predominantly

SELECT ... FROM table WHERE a = ?

With some queries that are

SELECT ... FROM table WHERE a = ? AND b = ?

Thanks,

Jonathan

Moving your reply to the list.

Assuming the data type you are using supports B-tree indexes, I can't think of any cases where inequality (specifically <> or !=) would use an index, so a single index on 'a' is what you are looking for.

However, if you are doing anything with equality (<, <=, =, >=, >) then you would wnat a multi-column index on (a,b), in that column order.

Best,

Jonathan

Re: Index on multiple columns VS multiple index

From
Brice André
Date:
Hello Jonathan,

Sorry, it's probably my bad english that is confusing. When I sayed inequality, I ment operators like <= or >=. In fact, 'a' is an integer' which is a foreign key on a primary key of another table and b is a timestamp.

So, if I understand you, making an index on both ('a', 'b') will be faster than two separate indices ?

But, if yes, does this index can be useful for a search on 'a' only ? Or do I need a separate index for this ?

I generally also have a ORDER clause on 'b'. I suppose that the index will be a good point for it too ?

And, last question, I also have time-consuming queries that are of the form :

SELECT .. FROM table WHERE 'a'=x AND 'c'=y AND 'b' >= z

where 'c' is an integer, but that is not a foreign key. Does it makes sense to create an additional multi-column index on ('a', 'b', 'c') ? Does the order of declaration of columns in the index creation makes a difference ? (for example ('a', 'c', 'b')) ? And is this index useful for a search on 'a' and 'b' only ?

I am sorry for bothering you with all those questions, but I tried to get information from the doc, but I have big difficulties understanding multi-column index stuff...

Thanks for your help,

Brice


2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>
Hi Brice,

On Jan 2, 2014, at 2:24 PM, Brice André wrote:

Yes, except that on b, it's a inequality operator.


2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>
On Jan 2, 2014, at 2:17 PM, Brice André wrote:

Hello everyone,

I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
  • One index on 'a' and one on 'b'
  • One index on both columns 'a' and 'b'
  • A combination of both solutions ?
Could you clarify your question a bit?  Are you saying your queries are predominantly

SELECT ... FROM table WHERE a = ?

With some queries that are

SELECT ... FROM table WHERE a = ? AND b = ?

Thanks,

Jonathan

Moving your reply to the list.

Assuming the data type you are using supports B-tree indexes, I can't think of any cases where inequality (specifically <> or !=) would use an index, so a single index on 'a' is what you are looking for.

However, if you are doing anything with equality (<, <=, =, >=, >) then you would wnat a multi-column index on (a,b), in that column order.

Best,

Jonathan


Re: Index on multiple columns VS multiple index

From
Brice André
Date:
Hello Erik,

Thanks for this very useful link.

I think that I found the answers to nearly all my questions. I must still think a little about it but it was very interesting.

But I have still one question that remains :

suppose I define an index on ('a', 'b') columns, will it be useful for a search on 'a' column only, or will it be ignore by postgresl ?

Regards,
Brice

PS : from what I read, I think that I should consider two indices :

('a', 'b') and
('a', 'c', 'b')

because, from what I understand, multi-column index is faster, and, as there is no more seeking after an inequality, 'b' should be at the right of all columns of index (and as 'a' is the most selective column in my case, it should be at the left of the index definition).

but if the answer to the above question is that postgresl does not use a multi-columns index on a single column search, I should probably also consider a third index on 'a' only... But this is still not clear for me...


2014/1/2 Erik Darling <edarling80@gmail.com>

This is a good article on the subject. I know it's not PG specific, but it expands a bit on Jonathan's point.

http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/

On Jan 2, 2014 2:34 PM, "Jonathan S. Katz" <jonathan.katz@excoventures.com> wrote:
Hi Brice,

On Jan 2, 2014, at 2:24 PM, Brice André wrote:

Yes, except that on b, it's a inequality operator.


2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>
On Jan 2, 2014, at 2:17 PM, Brice André wrote:

Hello everyone,

I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
  • One index on 'a' and one on 'b'
  • One index on both columns 'a' and 'b'
  • A combination of both solutions ?
Could you clarify your question a bit?  Are you saying your queries are predominantly

SELECT ... FROM table WHERE a = ?

With some queries that are

SELECT ... FROM table WHERE a = ? AND b = ?

Thanks,

Jonathan

Moving your reply to the list.

Assuming the data type you are using supports B-tree indexes, I can't think of any cases where inequality (specifically <> or !=) would use an index, so a single index on 'a' is what you are looking for.

However, if you are doing anything with equality (<, <=, =, >=, >) then you would wnat a multi-column index on (a,b), in that column order.

Best,

Jonathan


Re: Index on multiple columns VS multiple index

From
"Jonathan S. Katz"
Date:
Hi Brice,

On Jan 2, 2014, at 2:45 PM, Brice André wrote:

> Sorry, it's probably my bad english that is confusing. When I sayed inequality, I ment operators like <= or >=. In
fact,'a' is an integer' which is a foreign key on a primary key of another table and b is a timestamp. 
>
> So, if I understand you, making an index on both ('a', 'b') will be faster than two separate indices ?
>
> But, if yes, does this index can be useful for a search on 'a' only ? Or do I need a separate index for this ?
>
> I generally also have a ORDER clause on 'b'. I suppose that the index will be a good point for it too ?

It should be more space-efficient to use a multi-column index, but you would have similar performance on SELECTs as
havingtwo indexes. 

But now understanding your data a bit more, it probably would be better just to have an index on "a" - unless you will
besearching only over "b" or even after filtering your data by "a" you will have a lot of rows that need to be filtered
by"b" having a multi-column index or an index on "b" would be probably be overkill.  

Of course, I use "a lot" because it really depends on what your actual data is - if you can you should probably run a
fewscenarios with your data set (if you can) and using EXPLAIN ANALYZE to see which index or indexes actually are used. 

Keep in mind that you have to consider what happens when you have writes (INSERT/UPDATE/DELETE) on the table with your
indexes,your write queries will have to wait for those indexes to be updated, thus putting more I/O load on the system. 

> And, last question, I also have time-consuming queries that are of the form :
>
> SELECT .. FROM table WHERE 'a'=x AND 'c'=y AND 'b' >= z
>
> where 'c' is an integer, but that is not a foreign key. Does it makes sense to create an additional multi-column
indexon ('a', 'b', 'c') ? Does the order of declaration of columns in the index creation makes a difference ? (for
example('a', 'c', 'b')) ? And is this index useful for a search on 'a' and 'b' only ? 

Probably not, unless you have a lot of rows to further filter from "c" - too much indexing could actually impede
performance,which is why you need to experiment a bit :-) 

Best,

Jonathan


Re: Index on multiple columns VS multiple index

From
"Jonathan S. Katz"
Date:
On Jan 2, 2014, at 3:00 PM, Brice André wrote:

> But I have still one question that remains :
>
> suppose I define an index on ('a', 'b') columns, will it be useful for a search on 'a' column only, or will it be
ignoreby postgresl ? 

If you define the index on ('a', 'b') in that order and are just searching for 'a' in your query, the query planner
willuse the index. 

Jonathan




Re: Index on multiple columns VS multiple index

From
Brice André
Date:
this was what I was guessing. Thank you for the confirmation.

So I think that I now have a clear idea on the best index strategy for my project.

Thanks for your help,

Regard,
Brice


2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>
On Jan 2, 2014, at 3:00 PM, Brice André wrote:

> But I have still one question that remains :
>
> suppose I define an index on ('a', 'b') columns, will it be useful for a search on 'a' column only, or will it be ignore by postgresl ?

If you define the index on ('a', 'b') in that order and are just searching for 'a' in your query, the query planner will use the index.

Jonathan