Re: Compound Indexes - Mailing list pgsql-general

From Lew
Subject Re: Compound Indexes
Date
Msg-id le-dnQgcTt0stF_bnZ2dnUVZ_jOdnZ2d@comcast.com
Whole thread Raw
In response to Re: Compound Indexes  (Chris Browne <cbbrowne@acm.org>)
List pgsql-general
"Phoenix Kiula" writes:
>> I have a table with ten columns. My queries basically one column as
>> the first WHERE condition, so an index on that column is certain. But
>> the columns after that one vary depending on end-user's choice (this
>> is a reporting application) and so does the sorting order.
>>
>> In MySQL world, I had sort_buffer in the config file, and I made a
>> compound index with the columns most often used in these types of
>> queries. So my index looked like:
>>
>>   INDEX idx_trades(id, t_id, c_id, s_id,  t_brief, created_on);
>>
>> This has five columns in it. While reading the pgsql documentation, I
>> gather than anything beyond three columns offers diminishing benefits.
>>
>> My queries will look like these:
>>
>>    SELECT * from trades where id = 99999
>>    and c_id = 9999
>>    ORDER by s_id;
>>
>>    SELECT * from trades where id = 99999
>>    and s_id = 99990
>>    ORDER by created_on desc ;
>>
>>    SELECT * from trades where id = 99999
>>    and s_id = 99990
>>    and t_brief ~* 'more|than|one|word'
>>    ORDER by created_on desc ;


The documentation at
<http://www.postgresql.org/docs/8.2/interactive/indexes-multicolumn.html>
also points out that your multi-column index will not help much with the
columns after "id", certainly not with the columns after "t_id" since "t_id"
isn't involved in any of the conditions or ordering.

> The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column
thatdoes not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints
oncolumns to the right of these columns are checked in the index, so they save visits to the table proper, but they do
notreduce the portion of the index that has to be scanned. 

and goes on to say,
> Multicolumn indexes should be used sparingly. In most situations, an index on a single column is sufficient and saves
spaceand time. 

The planner can make use of single-column indexes in combination (if your PG
is recent enough).
<http://www.postgresql.org/docs/8.2/interactive/indexes-bitmap-scans.html>
> Sometimes multicolumn indexes are best, but sometimes it's better to create separate indexes and rely on the
index-combinationfeature. 

The selectivity of each column is also relevant.  If you have ten million rows
with "s_id" values of only either 99990 or 99991, an index on "s_id" is not
going to help much.

--
Lew

pgsql-general by date:

Previous
From: ptjm@interlog.com (Patrick TJ McPhee)
Date:
Subject: Re: memory optimization
Next
From: Torsten Zühlsdorff
Date:
Subject: Re: Persistent connections in PHP