Thread: Compound Indexes

Compound Indexes

From
"Phoenix Kiula"
Date:
I have a table with ten columns. My queries basically one column as
the first WHERE condition, so an index on that column is certain. But
the columns after that one vary depending on end-user's choice (this
is a reporting application) and so does the sorting order.

In MySQL world, I had sort_buffer in the config file, and I made a
compound index with the columns most often used in these types of
queries. So my index looked like:

  INDEX idx_trades(id, t_id, c_id, s_id,  t_brief, created_on);

This has five columns in it. While reading the pgsql documentation, I
gather than anything beyond three columns offers diminishing benefits.

My queries will look like these:

   SELECT * from trades where id = 99999
   and c_id = 9999
   ORDER by s_id;

   SELECT * from trades where id = 99999
   and s_id = 99990
   ORDER by created_on desc ;

   SELECT * from trades where id = 99999
   and s_id = 99990
   and t_brief ~* 'more|than|one|word'
   ORDER by created_on desc ;

So my question: how does PGSQL optimize its sorts? If I were to index
merely the columns that are most commonly used in the reporting WHERE
clause, would that be ok? Some ofthese columns may be "TEXT" type --
how should I include these in the index (in MySQL, I included only the
first 100 words in the index).

TIA!

Re: Compound Indexes

From
Chris Browne
Date:
phoenix.kiula@gmail.com ("Phoenix Kiula") writes:
> I have a table with ten columns. My queries basically one column as
> the first WHERE condition, so an index on that column is certain. But
> the columns after that one vary depending on end-user's choice (this
> is a reporting application) and so does the sorting order.
>
> In MySQL world, I had sort_buffer in the config file, and I made a
> compound index with the columns most often used in these types of
> queries. So my index looked like:
>
>   INDEX idx_trades(id, t_id, c_id, s_id,  t_brief, created_on);
>
> This has five columns in it. While reading the pgsql documentation, I
> gather than anything beyond three columns offers diminishing benefits.
>
> My queries will look like these:
>
>    SELECT * from trades where id = 99999
>    and c_id = 9999
>    ORDER by s_id;
>
>    SELECT * from trades where id = 99999
>    and s_id = 99990
>    ORDER by created_on desc ;
>
>    SELECT * from trades where id = 99999
>    and s_id = 99990
>    and t_brief ~* 'more|than|one|word'
>    ORDER by created_on desc ;
>
> So my question: how does PGSQL optimize its sorts? If I were to index
> merely the columns that are most commonly used in the reporting WHERE
> clause, would that be ok? Some ofthese columns may be "TEXT" type --
> how should I include these in the index (in MySQL, I included only the
> first 100 words in the index).

If you have only these three sorts of queries, then I would speculate
that the following indices *might* be useful:

 create idx1 on trades (id);
 create idx2 on trades (c_id);
 create idx3 on trades (s_id);
 create idx4 on trades (created_on);
 create idx5 on trades (created_on) where t_brief ~* 'more|than|one|word';
 create idx6 on trades (id, s_id) where t_brief ~* 'more|than|one|word';

(I'm assuming with idx5 and idx6 that you were actually searching for
'more|than|one|word'; if what is searched for can vary, then idx5/idx6
are worthless.)

You could try adding them all, and check out which of them are
actually used by the query planner.  And eventually drop out the
irrelevant ones.

PostgreSQL has a rather sophisticated query planner (pretty much
"rocket science," compared to MySQL), and it is even possible that it
would use multiple of those indices simultaneously for some of the
queries.  Which indexes, if any, it will use will vary from query to
query based on the parameters in the query.

You can determine the query plan by prefixing the query with the
keyword "EXPLAIN."

Suppose you add the above 6 indexes, you could get query plans via
running the following:

>    EXPLAIN SELECT * from trades where id = 99999
>    and c_id = 9999
>    ORDER by s_id;
>
>    EXPLAIN SELECT * from trades where id = 99999
>    and s_id = 99990
>    ORDER by created_on desc ;
>
>    EXPLAIN SELECT * from trades where id = 99999
>    and s_id = 99990
>    and t_brief ~* 'more|than|one|word'
>    ORDER by created_on desc ;

You may want to post the output to the list; learning to read query
planner output is a bit of an art, and you won't necessarily make the
right sense of the results on day #1...
--
output = ("cbbrowne" "@" "cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
Consciousness - that annoying time between naps.

Re: Compound Indexes

From
"Phoenix Kiula"
Date:
On 15/08/07, Chris Browne <cbbrowne@acm.org> wrote:
> phoenix.kiula@gmail.com ("Phoenix Kiula") writes:
> > I have a table with ten columns. My queries basically one column as
> > the first WHERE condition, so an index on that column is certain. But
> > the columns after that one vary depending on end-user's choice (this
> > is a reporting application) and so does the sorting order.
> >
> > In MySQL world, I had sort_buffer in the config file, and I made a
> > compound index with the columns most often used in these types of
> > queries. So my index looked like:
> >
> >   INDEX idx_trades(id, t_id, c_id, s_id,  t_brief, created_on);
> >
> > This has five columns in it. While reading the pgsql documentation, I
> > gather than anything beyond three columns offers diminishing benefits.
> >
> > My queries will look like these:
> >
> >    SELECT * from trades where id = 99999
> >    and c_id = 9999
> >    ORDER by s_id;
> >
> >    SELECT * from trades where id = 99999
> >    and s_id = 99990
> >    ORDER by created_on desc ;
> >
> >    SELECT * from trades where id = 99999
> >    and s_id = 99990
> >    and t_brief ~* 'more|than|one|word'
> >    ORDER by created_on desc ;
> >
> > So my question: how does PGSQL optimize its sorts? If I were to index
> > merely the columns that are most commonly used in the reporting WHERE
> > clause, would that be ok? Some ofthese columns may be "TEXT" type --
> > how should I include these in the index (in MySQL, I included only the
> > first 100 words in the index).
>
> If you have only these three sorts of queries, then I would speculate
> that the following indices *might* be useful:
>
>  create idx1 on trades (id);
>  create idx2 on trades (c_id);
>  create idx3 on trades (s_id);
>  create idx4 on trades (created_on);
>  create idx5 on trades (created_on) where t_brief ~* 'more|than|one|word';
>  create idx6 on trades (id, s_id) where t_brief ~* 'more|than|one|word';
>
> (I'm assuming with idx5 and idx6 that you were actually searching for
> 'more|than|one|word'; if what is searched for can vary, then idx5/idx6
> are worthless.)
>
> You could try adding them all, and check out which of them are
> actually used by the query planner.  And eventually drop out the
> irrelevant ones.
>
> PostgreSQL has a rather sophisticated query planner (pretty much
> "rocket science," compared to MySQL), and it is even possible that it
> would use multiple of those indices simultaneously for some of the
> queries.  Which indexes, if any, it will use will vary from query to
> query based on the parameters in the query.




Thank you so much! My only concern, probably a hangover from the MySQL
world, is that if I have 5 of 6 indices, what would that do to INSERT
and UPDATE performance if all these indices have to be updated? Is
updating individual indices faster than one large compound index?

Re: Compound Indexes

From
Tom Lane
Date:
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
> Thank you so much! My only concern, probably a hangover from the MySQL
> world, is that if I have 5 of 6 indices, what would that do to INSERT
> and UPDATE performance if all these indices have to be updated? Is
> updating individual indices faster than one large compound index?

Probably slower, but the compound index is enough more specialized that
it's not as useful as the individual indexes.  (I suspect that that
compound index in MySQL didn't do you nearly as much good as you
thought, either.)  You should read
http://www.postgresql.org/docs/8.2/static/indexes-bitmap-scans.html
and nearby sections of the manual.

            regards, tom lane

Re: Compound Indexes

From
Gregory Stark
Date:
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes:

>> >    SELECT * from trades where id = 99999
>> >    and c_id = 9999
>> >    ORDER by s_id;
>> >
>> >    SELECT * from trades where id = 99999
>> >    and s_id = 99990
>> >    ORDER by created_on desc ;
>> >
>> >    SELECT * from trades where id = 99999
>> >    and s_id = 99990
>> >    and t_brief ~* 'more|than|one|word'
>> >    ORDER by created_on desc ;

Well I would start with testing:

trades(id, c_id, s_id)
trades(id, s_id, created_on)

However you may (and actually probably will, i expect) find that the third
column is not helping. That's especially true if the result of matching id and
either c_id or s_id is always a small enough set of records that sorting them
is quick (I would expect the point where an extra column in the index would
start to save you anything to be somewhere around 100-1,000, possibly even as
much as 10,000 or more).

Note that in released versions getting an index which is useful for ORDER BY
created_on *DESC* is actually quite difficult. So unless these queries are
returning thousands of records I would suggest ignoring the ORDER BY clauses
and just looking at the WHERE clauses.

If id,s_id and id,c_id are selective enough to return only a few records I
would actually expect you to end up with just

trades(id, s_id)
trades(id, c_id)

You might also be able to build some kind of index to help the ~* clause. If
you do a lot of queries like that and the id,s_id restriction isn't very
selective you might look into tsearch2 which can index that type of query.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Compound Indexes

From
"Phoenix Kiula"
Date:
> you do a lot of queries like that and the id,s_id restriction isn't very
> selective you might look into tsearch2 which can index that type of query.
>


Thanks. Does tsearch2 come installed with 8.2.3? I am not techie
enough to do all the compiling stuff so I'm hoping it does! How can I
check?

Re: Compound Indexes

From
Lew
Date:
"Phoenix Kiula" writes:
>> I have a table with ten columns. My queries basically one column as
>> the first WHERE condition, so an index on that column is certain. But
>> the columns after that one vary depending on end-user's choice (this
>> is a reporting application) and so does the sorting order.
>>
>> In MySQL world, I had sort_buffer in the config file, and I made a
>> compound index with the columns most often used in these types of
>> queries. So my index looked like:
>>
>>   INDEX idx_trades(id, t_id, c_id, s_id,  t_brief, created_on);
>>
>> This has five columns in it. While reading the pgsql documentation, I
>> gather than anything beyond three columns offers diminishing benefits.
>>
>> My queries will look like these:
>>
>>    SELECT * from trades where id = 99999
>>    and c_id = 9999
>>    ORDER by s_id;
>>
>>    SELECT * from trades where id = 99999
>>    and s_id = 99990
>>    ORDER by created_on desc ;
>>
>>    SELECT * from trades where id = 99999
>>    and s_id = 99990
>>    and t_brief ~* 'more|than|one|word'
>>    ORDER by created_on desc ;


The documentation at
<http://www.postgresql.org/docs/8.2/interactive/indexes-multicolumn.html>
also points out that your multi-column index will not help much with the
columns after "id", certainly not with the columns after "t_id" since "t_id"
isn't involved in any of the conditions or ordering.

> The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column
thatdoes not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints
oncolumns to the right of these columns are checked in the index, so they save visits to the table proper, but they do
notreduce the portion of the index that has to be scanned. 

and goes on to say,
> Multicolumn indexes should be used sparingly. In most situations, an index on a single column is sufficient and saves
spaceand time. 

The planner can make use of single-column indexes in combination (if your PG
is recent enough).
<http://www.postgresql.org/docs/8.2/interactive/indexes-bitmap-scans.html>
> Sometimes multicolumn indexes are best, but sometimes it's better to create separate indexes and rely on the
index-combinationfeature. 

The selectivity of each column is also relevant.  If you have ten million rows
with "s_id" values of only either 99990 or 99991, an index on "s_id" is not
going to help much.

--
Lew