Home > mailing lists

Use of multipart index with "IN" - Mailing list pgsql-performance

From	Rob Messer
Subject	Use of multipart index with "IN"
Date	October 24, 2003 15:01:29
Msg-id	20031023181831.40095.qmail@web41215.mail.yahoo.com Whole thread Raw
In response to	Re: Moving postgresql.conf tunables into 2003... (Sean Chittenden <sean@chittenden.org>)
Responses	Re: Use of multipart index with "IN"
List	pgsql-performance

Tree view

I have a reporting system that does regular queries on a table with a
multipart index.  I am running version 7.3.4.  Here is the table
definition:

               Table "public.ds_rec_fld"
    Column     |          Type           | Modifiers
---------------+-------------------------+-----------
 dsid          | character varying(20)   | not null
 recid         | integer                 | not null
 field_name    | character varying(20)   | not null
 option_tag    | character varying(10)   | not null
 option_value  | integer                 |
 field_text    | character varying(2000) |
 field_type_cd | character varying(8)    |
Indexes: ds_rf_ndx1 btree (recid, field_name, option_value)

Normally queries are done using recid and field_name, so Postgresql
returns rows very quickly as expected.  Here is a sample explain
analyze output for a typical query:

db=> explain analyze
db-> select field_name, option_tag from ds_rec_fld where recid = 3000
and field_name = 'Q3A1';
                                                       QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------
 Index Scan using ds_rf_ndx1 on ds_rec_fld  (cost=0.00..163.09 rows=40
width=38) (actual time=0.06..0.07 rows=1 loops=1)
   Index Cond: ((recid = 3001) AND (field_name = 'Q3A1'::character
varying))
 Total runtime: 0.12 msec
(3 rows)

The problem comes in when we are selecting multiple field_name values
in one query.  The normal SQL syntax we have been using is like this:

select field_name, option_tag from ds_rec_fld where recid = 3001 and
field_name in ('Q3A1', 'Q3A9');

This is just a simplified example, at times there can be a lot of
field_name values in one query in the "in" clause.  Here postgresql
refuses to use the full index, instead doing a filter based on part of
the first recid part of index.  Here is the explain analyze output:

 Index Scan using ds_rf_ndx1 on ds_rec_fld  (cost=0.00..30425.51
rows=80 width=38) (actual time=0.18..1.08 rows=2 loops=1)
   Index Cond: (recid = 3001)
   Filter: ((field_name = 'Q3A1'::character varying) OR (field_name =
'Q3A9'::character varying))
 Total runtime: 1.12 msec
(4 rows)

So, 10 times longer.  This is an issue because at times we are
iterating through thousands of recid values.

I did a vacuum analyze, adjusted random_page_cost, etc. all to no
avail.

I also noticed that the problem goes away when I reformat the query
like this:

select field_name, option_tag from ds_rec_fld where
(recid = 3001 and field_name = 'Q3A1') or
(recid = 3001 and field_name = 'Q3A9')

Here is the explain analyze output for this:

  Index Scan using ds_rf_ndx1, ds_rf_ndx1 on ds_rec_fld
(cost=0.00..326.57 rows=80 width=38) (actual time=0.07..0.10 rows=2
loops=1)
   Index Cond: (((recid = 3001) AND (field_name = 'Q3A1'::character
varying)) OR ((recid = 3001) AND (field_name = 'Q3A9'::character
varying)))
 Total runtime: 0.16 msec
(3 rows)

Much better.  So I have partially solved my own problem, but there are
other places that this is not this simple to fix.

Therefore, my question is, is there some way to force postgresql to use
the full index and still stick with the shorter "field_name in ('...',
'...')" syntax?

If anyone has any thoughts please let me know.  Also it strikes me that
perhaps the optimizer could be tweaked to treat the first case like the
second one.  Thanks in advance,

Rob

__________________________________
Do you Yahoo!?
The New Yahoo! Shopping - with improved product search
http://shopping.yahoo.com

pgsql-performance by date:

From: "John Pagakis"
Date: 24 October 2003, 15:00:33
Subject: Performance Concern

From: William Yu
Date: 24 October 2003, 15:01:40
Subject: Re: Tuning for mid-size server

Use of multipart index with "IN" - Mailing list pgsql-performance

Previous

Next