Thread: Planner issue
I get the following output from explain analyze on a certain subset of a large query I'm doing. From the looks of it, I need to increase how often postgres uses an index over a seq scan, but I'm not sure how to do that. I looked through the run-time configuration docs on the website, but didn't see anything pertaining to index selectivity. Thanks, Alex Turner netEconomist trendmls=# explain analyze select listnum from propmain where listprice<=300000 and listprice>=220000; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Seq Scan on propmain (cost=0.00..15556.05 rows=6228 width=4) (actual time=0.093..506.730 rows=5671 loops=1) Filter: ((listprice <= 300000::numeric) AND (listprice >= 220000::numeric)) Total runtime: 510.482 ms (3 rows) trendmls=# explain analyze select listnum from propmain where listprice<=300000 and listprice>=250000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Index Scan using propmain_listprice_i on propmain (cost=0.00..12578.65 rows=3486 width=4) (actual time=0.103..16.418 rows=3440 loops=1) Index Cond: ((listprice <= 300000::numeric) AND (listprice >= 250000::numeric)) Total runtime: 18.528 ms (3 rows)
Alex Turner wrote: >I get the following output from explain analyze on a certain subset of >a large query I'm doing. > > > Try increases the statistics on the listprice column with alter table and then re-run analyze. alter table foo alter column set statistics <n> Sincerely, Joshua D. Drake >From the looks of it, I need to increase how often postgres uses an >index over a seq scan, but I'm not sure how to do that. I looked >through the run-time configuration docs on the website, but didn't see >anything pertaining to index selectivity. > >Thanks, > >Alex Turner >netEconomist > > >trendmls=# explain analyze select listnum from propmain where >listprice<=300000 and listprice>=220000; > QUERY PLAN >-------------------------------------------------------------------------------------------------------------- > Seq Scan on propmain (cost=0.00..15556.05 rows=6228 width=4) (actual >time=0.093..506.730 rows=5671 loops=1) > Filter: ((listprice <= 300000::numeric) AND (listprice >= 220000::numeric)) > Total runtime: 510.482 ms >(3 rows) > >trendmls=# explain analyze select listnum from propmain where >listprice<=300000 and listprice>=250000; > QUERY PLAN >------------------------------------------------------------------------------------------------------------------------------------------ > Index Scan using propmain_listprice_i on propmain >(cost=0.00..12578.65 rows=3486 width=4) (actual time=0.103..16.418 >rows=3440 loops=1) > Index Cond: ((listprice <= 300000::numeric) AND (listprice >= >250000::numeric)) > Total runtime: 18.528 ms >(3 rows) > >---------------------------(end of broadcast)--------------------------- >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
Attachment
This helps a bit when I set it to 1000 - but it's still pretty bad: I will use an index 220-300, but not 200-300. Alex trendmls=# explain analyze select listnum from propmain where listprice<=300000 and listprice>=200000; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Seq Scan on propmain (cost=0.00..15517.56 rows=6842 width=4) (actual time=0.039..239.760 rows=6847 loops=1) Filter: ((listprice <= 300000::numeric) AND (listprice >= 200000::numeric)) Total runtime: 244.301 ms (3 rows) trendmls=# set enable_seqscan=off; SET trendmls=# explain analyze select listnum from propmain where listprice<=300000 and listprice>=200000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Index Scan using propmain_listprice_i on propmain (cost=0.00..22395.95 rows=6842 width=4) (actual time=0.084..25.751 rows=6847 loops=1) Index Cond: ((listprice <= 300000::numeric) AND (listprice >= 200000::numeric)) Total runtime: 30.193 ms (3 rows) trendmls=# On Tue, 22 Mar 2005 08:22:59 -0800, Joshua D. Drake <jd@commandprompt.com> wrote: > Alex Turner wrote: > > >I get the following output from explain analyze on a certain subset of > >a large query I'm doing. > > > > > > > Try increases the statistics on the listprice column with alter > table and then re-run analyze. > > alter table foo alter column set statistics <n> > > Sincerely, > > Joshua D. Drake > > > >>From the looks of it, I need to increase how often postgres uses an > >index over a seq scan, but I'm not sure how to do that. I looked > >through the run-time configuration docs on the website, but didn't see > >anything pertaining to index selectivity. > > > >Thanks, > > > >Alex Turner > >netEconomist > > > > > >trendmls=# explain analyze select listnum from propmain where > >listprice<=300000 and listprice>=220000; > > QUERY PLAN > >-------------------------------------------------------------------------------------------------------------- > > Seq Scan on propmain (cost=0.00..15556.05 rows=6228 width=4) (actual > >time=0.093..506.730 rows=5671 loops=1) > > Filter: ((listprice <= 300000::numeric) AND (listprice >= 220000::numeric)) > > Total runtime: 510.482 ms > >(3 rows) > > > >trendmls=# explain analyze select listnum from propmain where > >listprice<=300000 and listprice>=250000; > > QUERY PLAN > >------------------------------------------------------------------------------------------------------------------------------------------ > > Index Scan using propmain_listprice_i on propmain > >(cost=0.00..12578.65 rows=3486 width=4) (actual time=0.103..16.418 > >rows=3440 loops=1) > > Index Cond: ((listprice <= 300000::numeric) AND (listprice >= > >250000::numeric)) > > Total runtime: 18.528 ms > >(3 rows) > > > >---------------------------(end of broadcast)--------------------------- > >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > > -- > Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC > Postgresql support, programming shared hosting and dedicated hosting. > +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com > PostgreSQL Replicator -- production quality replication for PostgreSQL > > >
On Tue, 2005-03-22 at 14:36 -0500, Alex Turner wrote: > I will use an index 220-300, but not 200-300. > ... > Seq Scan on propmain (cost=0.00..15517.56 rows=6842 width=4) (actual > time=0.039..239.760 rows=6847 loops=1) > ... > Index Scan using propmain_listprice_i on propmain > (cost=0.00..22395.95 rows=6842 width=4) (actual time=0.084..25.751 > rows=6847 loops=1) the rows estimates are accurate, so it is not a question of statistics anymore. first make sure effective_cache_size is correctly set, and then if that is not enough, you might try to lower random_page_cost a bit gnari
I'm guessing your data is actually more "clustered" than the "correlation" stastic thinks it is. Alex Turner wrote: > trendmls=# explain analyze select listnum from propmain where > listprice<=300000 and listprice>=200000; Is that a database of properties like land/houses? If your table is clustered geographically (by zip code, etc), the index scan might do quite well because all houses in a neighborhood may have similar prices (and therefore live on just a few disk pages). However since high-priced neighborhoods are scattered across the country, the optimizer would see a very low "correlation" and not notice this clustering. If this is the cause, one thing you could do is CLUSTER your table on propmain_listprice_i. I'm quite confident it'll fix this particular query - but might slow down other queries.
I'm guessing your data is actually more "clustered" than the "correlation" statistic thinks it is. Alex Turner wrote: > trendmls=# explain analyze select listnum from propmain where > listprice<=300000 and listprice>=200000; Is that a database of properties like land/houses? If your table is clustered geographically (by zip code, etc), the index scan might do quite well because all houses in a neighborhood may have similar prices (and therefore live on just a few disk pages). However since high-priced neighborhoods are scattered across the country, the optimizer would see a very low "correlation" and not notice this clustering. If this is the cause, one thing you could do is CLUSTER your table on propmain_listprice_i. I'm quite confident it'll fix this particular query - but might slow down other queries.