Use of multipart index with "IN" - Mailing list pgsql-performance
From | Rob Messer |
---|---|
Subject | Use of multipart index with "IN" |
Date | |
Msg-id | 20031023181831.40095.qmail@web41215.mail.yahoo.com Whole thread Raw |
In response to | Re: Moving postgresql.conf tunables into 2003... (Sean Chittenden <sean@chittenden.org>) |
Responses |
Re: Use of multipart index with "IN"
|
List | pgsql-performance |
I have a reporting system that does regular queries on a table with a multipart index. I am running version 7.3.4. Here is the table definition: Table "public.ds_rec_fld" Column | Type | Modifiers ---------------+-------------------------+----------- dsid | character varying(20) | not null recid | integer | not null field_name | character varying(20) | not null option_tag | character varying(10) | not null option_value | integer | field_text | character varying(2000) | field_type_cd | character varying(8) | Indexes: ds_rf_ndx1 btree (recid, field_name, option_value) Normally queries are done using recid and field_name, so Postgresql returns rows very quickly as expected. Here is a sample explain analyze output for a typical query: db=> explain analyze db-> select field_name, option_tag from ds_rec_fld where recid = 3000 and field_name = 'Q3A1'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Index Scan using ds_rf_ndx1 on ds_rec_fld (cost=0.00..163.09 rows=40 width=38) (actual time=0.06..0.07 rows=1 loops=1) Index Cond: ((recid = 3001) AND (field_name = 'Q3A1'::character varying)) Total runtime: 0.12 msec (3 rows) The problem comes in when we are selecting multiple field_name values in one query. The normal SQL syntax we have been using is like this: select field_name, option_tag from ds_rec_fld where recid = 3001 and field_name in ('Q3A1', 'Q3A9'); This is just a simplified example, at times there can be a lot of field_name values in one query in the "in" clause. Here postgresql refuses to use the full index, instead doing a filter based on part of the first recid part of index. Here is the explain analyze output: Index Scan using ds_rf_ndx1 on ds_rec_fld (cost=0.00..30425.51 rows=80 width=38) (actual time=0.18..1.08 rows=2 loops=1) Index Cond: (recid = 3001) Filter: ((field_name = 'Q3A1'::character varying) OR (field_name = 'Q3A9'::character varying)) Total runtime: 1.12 msec (4 rows) So, 10 times longer. This is an issue because at times we are iterating through thousands of recid values. I did a vacuum analyze, adjusted random_page_cost, etc. all to no avail. I also noticed that the problem goes away when I reformat the query like this: select field_name, option_tag from ds_rec_fld where (recid = 3001 and field_name = 'Q3A1') or (recid = 3001 and field_name = 'Q3A9') Here is the explain analyze output for this: Index Scan using ds_rf_ndx1, ds_rf_ndx1 on ds_rec_fld (cost=0.00..326.57 rows=80 width=38) (actual time=0.07..0.10 rows=2 loops=1) Index Cond: (((recid = 3001) AND (field_name = 'Q3A1'::character varying)) OR ((recid = 3001) AND (field_name = 'Q3A9'::character varying))) Total runtime: 0.16 msec (3 rows) Much better. So I have partially solved my own problem, but there are other places that this is not this simple to fix. Therefore, my question is, is there some way to force postgresql to use the full index and still stick with the shorter "field_name in ('...', '...')" syntax? If anyone has any thoughts please let me know. Also it strikes me that perhaps the optimizer could be tweaked to treat the first case like the second one. Thanks in advance, Rob __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com
pgsql-performance by date: