Thread: Index problem
Hi, I have a table containing columns: "END_DATE" timestamptz NOT NULL "REO_ID" int4 NOT NULL and i am indexed "REO_ID" coulumn. I have a query: select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where "REO_ID" IN ('112851' ,'112859' ,'112871' ,'112883' ,'112891' ,'112904' ,'112915' ,'112924' ,'112939' ,'112947' ,'112960' ,'112984' ,'112999' ,'113013' ,'113032' ,'113059' ,'113067' ,'113084' ,'113096' ,'113103' ,'113110' ,'113117' ,'113125' ,'113132' ,'113139' ,'113146' ,'113153' ,'113160' ,'113167' ,'113174' ,'113181' ,'113188' ,'113195' ,'113204' ,'113268' ,'113279' ,'113294' ,'113302' ,'113317' ,'113340' ,'113358' ,'113385' ,'113404' ,'113412' ,'113419' ,'113429' ,'113436' ,'113443' ,'113571' ,'113636' ,'113649' ,'113689' ,'113705' ,'113744' ,'113755' ,'113724' ,'113737' ,'113812' ,'113828' ,'113762' ,'113842' ,'113869' ,'113925' ,'113976' ,'114035' ,'114044' ,'114057' ,'114070' ,'114084' ,'114094' ,'114119' ) and it is _not_ using that index But following query (notice there are less id-s in WHERE clause, but rest is same) select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where "REO_ID" IN ('112851' ,'112859' ,'112871' ,'112883' ,'112891' ,'112904' ,'112915' ,'112924' ,'112939' ,'112947' ,'112960' ,'112984' ,'112999' ,'113013' ,'113032' ,'113059' ,'113067' ,'113084' ,'113096' ,'113103' ,'113110' ,'113117' ,'113125' ,'113132' ,'113139' ,'113146' ,'113153' ,'113160' ,'113167' ,'113174' ,'113181' ,'113188' ,'113195' ,'113204' ,'113268' ,'113279' ,'113294' ,'113302' ,'113317' ,'113340' ,'113358' ,'113385' ,'113404' ,'113412' ,'113419' ,'113429' ,'113436' ,'113443' ,'113571' ,'113636' ,'113649' ,'113689' ,'113705' ,'113744' ,'113755' ,'113724' ,'113737' ) will _is_ using index: Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS (cost=0.00..394.06 rows=102 width=12) What causes this behaviour? is there any workaround? Suggestions? best, Rigmor Ukuhe --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.515 / Virus Database: 313 - Release Date: 01.09.2003
> Hi, > > I have a table containing columns: > > "END_DATE" timestamptz NOT NULL > "REO_ID" int4 NOT NULL > > and i am indexed "REO_ID" coulumn. > I have a query: > > select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where "REO_ID" IN > ('112851' ,'112859' ,'112871' ,'112883' ,'112891' ,'112904' ,'112915' > ,'112924' ,'112939' ,'112947' ,'112960' ,'112984' ,'112999' ,'113013' > ,'113032' ,'113059' ,'113067' ,'113084' ,'113096' ,'113103' ,'113110' > ,'113117' ,'113125' ,'113132' ,'113139' ,'113146' ,'113153' ,'113160' > ,'113167' ,'113174' ,'113181' ,'113188' ,'113195' ,'113204' ,'113268' > ,'113279' ,'113294' ,'113302' ,'113317' ,'113340' ,'113358' ,'113385' > ,'113404' ,'113412' ,'113419' ,'113429' ,'113436' ,'113443' ,'113571' > ,'113636' ,'113649' ,'113689' ,'113705' ,'113744' ,'113755' ,'113724' > ,'113737' ,'113812' ,'113828' ,'113762' ,'113842' ,'113869' ,'113925' > ,'113976' ,'114035' ,'114044' ,'114057' ,'114070' ,'114084' ,'114094' > ,'114119' ) > > and it is _not_ using that index > > But following query (notice there are less id-s in WHERE clause, but rest is > same) > > select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where "REO_ID" IN > ('112851' ,'112859' ,'112871' ,'112883' ,'112891' ,'112904' ,'112915' > ,'112924' ,'112939' ,'112947' ,'112960' ,'112984' ,'112999' ,'113013' > ,'113032' ,'113059' ,'113067' ,'113084' ,'113096' ,'113103' ,'113110' > ,'113117' ,'113125' ,'113132' ,'113139' ,'113146' ,'113153' ,'113160' > ,'113167' ,'113174' ,'113181' ,'113188' ,'113195' ,'113204' ,'113268' > ,'113279' ,'113294' ,'113302' ,'113317' ,'113340' ,'113358' ,'113385' > ,'113404' ,'113412' ,'113419' ,'113429' ,'113436' ,'113443' ,'113571' > ,'113636' ,'113649' ,'113689' ,'113705' ,'113744' ,'113755' ,'113724' > ,'113737' ) > > will _is_ using index: Why not. It's just because the second query is more selective. Probably you don't have too many rows in your table and Postgres thinks it's better (faster) to use sequential scan than index one. Regards, Tomasz Myrta
> What causes this behaviour? is there any workaround? Suggestions? > How many rows are there in the table, and can you post the 'explain analyze' for both queries after doing a 'vacuum verboseanalyze [tablename]'? Cheers Matt
> > What causes this behaviour? is there any workaround? Suggestions? > > > > How many rows are there in the table, and can you post the > 'explain analyze' for both queries after doing a 'vacuum verbose analyze > [tablename]'? There are about 2500 rows in that table. 1st query explain analyze: Seq Scan on PRIORITY_STATISTICS (cost=0.00..491.44 rows=127 width=12) (actual time=98.58..98.58 rows=0 loops=1) Total runtime: 98.74 msec 2nd query explain analyze: NOTICE: QUERY PLAN: Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS (cost=0.00..394.06 rows=102 width=12) (actual time=20.93..20.93 rows=0 loops=1) Total runtime: 21.59 msec Any help? Rigmor > > Cheers > > Matt > > > > --- > Incoming mail is certified Virus Free. > Checked by AVG anti-virus system (http://www.grisoft.com). > Version: 6.0.515 / Virus Database: 313 - Release Date: 01.09.2003 > --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.515 / Virus Database: 313 - Release Date: 01.09.2003
> There are about 2500 rows in that table. > > 1st query explain analyze: Seq Scan on PRIORITY_STATISTICS > (cost=0.00..491.44 rows=127 width=12) (actual time=98.58..98.58 rows=0 > loops=1) > Total runtime: 98.74 msec > > 2nd query explain analyze: NOTICE: QUERY PLAN: > > Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id, [snip] > PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS (cost=0.00..394.06 > rows=102 width=12) (actual time=20.93..20.93 rows=0 loops=1) > Total runtime: 21.59 msec With only 2500 rows the planner could be deciding that it's going to have to read every disk block to do an index scan anyway,so it might as well do a sequential scan. If the pages are in fact in the kernel cache then the compute time will dominate, notthe IO time, so it ends up looking like a bad plan, but it's probably not really such a bad plan... Is your effective_cache_size set to something sensibly large? You could also try decreasing cpu_index_tuple_cost and cpu_tuple_cost. These will affect all your queries though, so whatyou gain on one might be lost on another. Matt
"Rigmor Ukuhe" <rigmor.ukuhe@finestmedia.com> writes: >>> What causes this behaviour? is there any workaround? Suggestions? At some point the planner is going to decide that one seqscan is cheaper than repeated indexscans. At some point it'll be right ... but in this case it seems its relative cost estimates are off a bit. You might try reducing random_page_cost to bring them more into line with reality. (But keep in mind that the reality you are measuring appears to be small-table-already-fully-cached reality. On a large table you might find that small random_page_cost isn't such a hot idea after all.) regards, tom lane