Thread: Quesion on the use of indexes

Quesion on the use of indexes

From

"Benjamin Krajmalnik"

Date:

17 August 2010, 00:26:17

A little background – I have various multi-column indexes whenever I have queries which restrict the output based on the values of the 2 fields (for example, a client code and the date of a transaction).

Is there a performance gain using this approach as opposed to using 2 separate indexes, one on the first column and one on the second column?

The reason I am asking is that my coding convetion goes back to the days where I used ISAM tables, so the systems did not know how to use more than a single index.

In some cases, I may have an index on (columna, columnb) and one on (columnb, columna) due to the data access patterns. If there are no performance gains in having these multi-part indexes, and performance will be the same as having one index solely on columna and one solely on columnb, then I can reduce the disk usage significantly in some cases.

Re: Quesion on the use of indexes

From

Tom Lane

Date:

17 August 2010, 00:33:37

"Benjamin Krajmalnik" <kraj@servoyant.com> writes:
> A little background - I have various multi-column indexes whenever I
> have queries which restrict the output based on the values of the 2
> fields (for example, a client code and the date of a transaction).

> Is there a performance gain using this approach as opposed to using 2
> separate indexes, one on the first column and one on the second column?

Maybe, maybe not ... it's going to depend on a bunch of factors, one of
which is what your update load is like compared to the queries that read
the indexes.  There's a bit of coverage of this in the fine manual: see
http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html
and the next few pages.

The short of it is that there's no substitute for doing your own
experiments for your own application ...

            regards, tom lane

Re: Quesion on the use of indexes

From

Alvaro Herrera

Date:

17 August 2010, 12:08:01

Excerpts from Tom Lane's message of lun ago 16 23:33:29 -0400 2010:
> "Benjamin Krajmalnik" <kraj@servoyant.com> writes:
> > A little background - I have various multi-column indexes whenever I
> > have queries which restrict the output based on the values of the 2
> > fields (for example, a client code and the date of a transaction).
>
> > Is there a performance gain using this approach as opposed to using 2
> > separate indexes, one on the first column and one on the second column?
>
> Maybe, maybe not ... it's going to depend on a bunch of factors, one of
> which is what your update load is like compared to the queries that read
> the indexes.  There's a bit of coverage of this in the fine manual: see
> http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html
> and the next few pages.

Another important factor is how selective is each clause in isolation
compared to how selective they are together.  We have found that doing
BitmapAnd of two bitmap-scanned indexes is sometimes much too slow
compared to a two-column index.  (I have yet to see a case where indexes
beyond two columns are useful; at this point, combined bitmap indexscans
are enough.)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Quesion on the use of indexes

From

Date:

17 August 2010, 12:28:05

Here's a quote from the docs:

To combine multiple indexes, the system scans each needed index and prepares a bitmap  in memory giving the locations
oftable rows that are reported as matching that index's conditions. The bitmaps are then ANDed and ORed together as
neededby the query. Finally, the actual table rows are visited and returned. The table rows are visited in physical
order,because that is how the bitmap is laid out; this means that any ordering of the original indexes is lost, and so
aseparate sort step will be needed if the query has an ORDER BY  clause. For this reason, and because each additional
indexscan adds extra time, the planner will sometimes choose to use a simple index scan even though additional indexes
areavailable that could have been used as well 

So, if you have a multi-column index, supply values from the major component on down, not skipping any (but not
necessarilysupplying all components, leaving the tail components), and the index is clustered, then you will get the
bestperformance on a range scan.  For equality scans, who knows?  For high selectivity (meaning here, few hits) of
singleindexes the cost of preparing the bitmaps and such may be less than traversing the multi-index and visiting the
table. For non-clustered multi-column, my bet would be on the single indexes up to some small number of indexes. 

And, as the docs say, the optimizer may well decide that it isn't worth the effort to use more than the most selective
singleindex. 

Robert

---- Original message ----
>Date: Tue, 17 Aug 2010 11:07:39 -0400
>From: pgsql-performance-owner@postgresql.org (on behalf of Alvaro Herrera <alvherre@commandprompt.com>)
>Subject: Re: [PERFORM] Quesion on the use of indexes
>To: Tom Lane <tgl@sss.pgh.pa.us>
>Cc: Benjamin Krajmalnik <kraj@servoyant.com>,pgsql-performance <pgsql-performance@postgresql.org>
>
>Excerpts from Tom Lane's message of lun ago 16 23:33:29 -0400 2010:
>> "Benjamin Krajmalnik" <kraj@servoyant.com> writes:
>> > A little background - I have various multi-column indexes whenever I
>> > have queries which restrict the output based on the values of the 2
>> > fields (for example, a client code and the date of a transaction).
>>
>> > Is there a performance gain using this approach as opposed to using 2
>> > separate indexes, one on the first column and one on the second column?
>>
>> Maybe, maybe not ... it's going to depend on a bunch of factors, one of
>> which is what your update load is like compared to the queries that read
>> the indexes.  There's a bit of coverage of this in the fine manual: see
>> http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html
>> and the next few pages.
>
>Another important factor is how selective is each clause in isolation
>compared to how selective they are together.  We have found that doing
>BitmapAnd of two bitmap-scanned indexes is sometimes much too slow
>compared to a two-column index.  (I have yet to see a case where indexes
>beyond two columns are useful; at this point, combined bitmap indexscans
>are enough.)
>
>--
>Álvaro Herrera <alvherre@commandprompt.com>
>The PostgreSQL Company - Command Prompt, Inc.
>PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>
>--
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance