Thread: Index on multiple columns VS multiple index
Hello everyone,
I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy: - One index on 'a' and one on 'b'
- One index on both columns 'a' and 'b'
- A combination of both solutions ?
Thanks in advance,
Brice
On Jan 2, 2014, at 2:17 PM, Brice André wrote:
Hello everyone,I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
- One index on 'a' and one on 'b'
- One index on both columns 'a' and 'b'
- A combination of both solutions ?
Could you clarify your question a bit? Are you saying your queries are predominantly
SELECT ... FROM table WHERE a = ?
With some queries that are
SELECT ... FROM table WHERE a = ? AND b = ?
Thanks,
Jonathan
Hi Brice,
On Jan 2, 2014, at 2:24 PM, Brice André wrote:
Yes, except that on b, it's a inequality operator.2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>On Jan 2, 2014, at 2:17 PM, Brice André wrote:Hello everyone,I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
- One index on 'a' and one on 'b'
- One index on both columns 'a' and 'b'
- A combination of both solutions ?
Could you clarify your question a bit? Are you saying your queries are predominantlySELECT ... FROM table WHERE a = ?With some queries that areSELECT ... FROM table WHERE a = ? AND b = ?Thanks,Jonathan
Moving your reply to the list.
Assuming the data type you are using supports B-tree indexes, I can't think of any cases where inequality (specifically <> or !=) would use an index, so a single index on 'a' is what you are looking for.
However, if you are doing anything with equality (<, <=, =, >=, >) then you would wnat a multi-column index on (a,b), in that column order.
Best,
Jonathan
This is a good article on the subject. I know it's not PG specific, but it expands a bit on Jonathan's point.
http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/
On Jan 2, 2014 2:34 PM, "Jonathan S. Katz" <jonathan.katz@excoventures.com> wrote:
Hi Brice,On Jan 2, 2014, at 2:24 PM, Brice André wrote:Yes, except that on b, it's a inequality operator.2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>On Jan 2, 2014, at 2:17 PM, Brice André wrote:Hello everyone,I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
- One index on 'a' and one on 'b'
- One index on both columns 'a' and 'b'
- A combination of both solutions ?
Could you clarify your question a bit? Are you saying your queries are predominantlySELECT ... FROM table WHERE a = ?With some queries that areSELECT ... FROM table WHERE a = ? AND b = ?Thanks,JonathanMoving your reply to the list.Assuming the data type you are using supports B-tree indexes, I can't think of any cases where inequality (specifically <> or !=) would use an index, so a single index on 'a' is what you are looking for.However, if you are doing anything with equality (<, <=, =, >=, >) then you would wnat a multi-column index on (a,b), in that column order.Best,Jonathan
Hello Jonathan,
Sorry, it's probably my bad english that is confusing. When I sayed inequality, I ment operators like <= or >=. In fact, 'a' is an integer' which is a foreign key on a primary key of another table and b is a timestamp.Thanks for your help,
2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>
Hi Brice,On Jan 2, 2014, at 2:24 PM, Brice André wrote:Yes, except that on b, it's a inequality operator.2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>On Jan 2, 2014, at 2:17 PM, Brice André wrote:Hello everyone,I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
- One index on 'a' and one on 'b'
- One index on both columns 'a' and 'b'
- A combination of both solutions ?
Could you clarify your question a bit? Are you saying your queries are predominantlySELECT ... FROM table WHERE a = ?With some queries that areSELECT ... FROM table WHERE a = ? AND b = ?Thanks,JonathanMoving your reply to the list.Assuming the data type you are using supports B-tree indexes, I can't think of any cases where inequality (specifically <> or !=) would use an index, so a single index on 'a' is what you are looking for.However, if you are doing anything with equality (<, <=, =, >=, >) then you would wnat a multi-column index on (a,b), in that column order.Best,Jonathan
Hello Erik,
Thanks for this very useful link.because, from what I understand, multi-column index is faster, and, as there is no more seeking after an inequality, 'b' should be at the right of all columns of index (and as 'a' is the most selective column in my case, it should be at the left of the index definition).
2014/1/2 Erik Darling <edarling80@gmail.com>
This is a good article on the subject. I know it's not PG specific, but it expands a bit on Jonathan's point.
http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/
On Jan 2, 2014 2:34 PM, "Jonathan S. Katz" <jonathan.katz@excoventures.com> wrote:Hi Brice,On Jan 2, 2014, at 2:24 PM, Brice André wrote:Yes, except that on b, it's a inequality operator.2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>On Jan 2, 2014, at 2:17 PM, Brice André wrote:Hello everyone,I have a question concerning index : suppose I have a table with fields 'a' and 'b' and that all requests perform WHERE clauses on 'a' field, and some requests also perform WHERE clauses on 'b' fields. What is the best approach for indexing strategy:
- One index on 'a' and one on 'b'
- One index on both columns 'a' and 'b'
- A combination of both solutions ?
Could you clarify your question a bit? Are you saying your queries are predominantlySELECT ... FROM table WHERE a = ?With some queries that areSELECT ... FROM table WHERE a = ? AND b = ?Thanks,JonathanMoving your reply to the list.Assuming the data type you are using supports B-tree indexes, I can't think of any cases where inequality (specifically <> or !=) would use an index, so a single index on 'a' is what you are looking for.However, if you are doing anything with equality (<, <=, =, >=, >) then you would wnat a multi-column index on (a,b), in that column order.Best,Jonathan
Hi Brice, On Jan 2, 2014, at 2:45 PM, Brice André wrote: > Sorry, it's probably my bad english that is confusing. When I sayed inequality, I ment operators like <= or >=. In fact,'a' is an integer' which is a foreign key on a primary key of another table and b is a timestamp. > > So, if I understand you, making an index on both ('a', 'b') will be faster than two separate indices ? > > But, if yes, does this index can be useful for a search on 'a' only ? Or do I need a separate index for this ? > > I generally also have a ORDER clause on 'b'. I suppose that the index will be a good point for it too ? It should be more space-efficient to use a multi-column index, but you would have similar performance on SELECTs as havingtwo indexes. But now understanding your data a bit more, it probably would be better just to have an index on "a" - unless you will besearching only over "b" or even after filtering your data by "a" you will have a lot of rows that need to be filtered by"b" having a multi-column index or an index on "b" would be probably be overkill. Of course, I use "a lot" because it really depends on what your actual data is - if you can you should probably run a fewscenarios with your data set (if you can) and using EXPLAIN ANALYZE to see which index or indexes actually are used. Keep in mind that you have to consider what happens when you have writes (INSERT/UPDATE/DELETE) on the table with your indexes,your write queries will have to wait for those indexes to be updated, thus putting more I/O load on the system. > And, last question, I also have time-consuming queries that are of the form : > > SELECT .. FROM table WHERE 'a'=x AND 'c'=y AND 'b' >= z > > where 'c' is an integer, but that is not a foreign key. Does it makes sense to create an additional multi-column indexon ('a', 'b', 'c') ? Does the order of declaration of columns in the index creation makes a difference ? (for example('a', 'c', 'b')) ? And is this index useful for a search on 'a' and 'b' only ? Probably not, unless you have a lot of rows to further filter from "c" - too much indexing could actually impede performance,which is why you need to experiment a bit :-) Best, Jonathan
On Jan 2, 2014, at 3:00 PM, Brice André wrote: > But I have still one question that remains : > > suppose I define an index on ('a', 'b') columns, will it be useful for a search on 'a' column only, or will it be ignoreby postgresl ? If you define the index on ('a', 'b') in that order and are just searching for 'a' in your query, the query planner willuse the index. Jonathan
this was what I was guessing. Thank you for the confirmation.
So I think that I now have a clear idea on the best index strategy for my project.
Thanks for your help,So I think that I now have a clear idea on the best index strategy for my project.
Regard,
2014/1/2 Jonathan S. Katz <jonathan.katz@excoventures.com>
On Jan 2, 2014, at 3:00 PM, Brice André wrote:If you define the index on ('a', 'b') in that order and are just searching for 'a' in your query, the query planner will use the index.
> But I have still one question that remains :
>
> suppose I define an index on ('a', 'b') columns, will it be useful for a search on 'a' column only, or will it be ignore by postgresl ?
Jonathan