Thread: Planner issue

Planner issue

From
Alex Turner
Date:
I get the following output from explain analyze on a certain subset of
a large query I'm doing.

From the looks of it, I need to increase how often postgres uses an
index over a seq scan, but I'm not sure how to do that.  I looked
through the run-time configuration docs on the website, but didn't see
anything pertaining to index selectivity.

Thanks,

Alex Turner
netEconomist


trendmls=# explain analyze select listnum from propmain where
listprice<=300000 and listprice>=220000;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on propmain  (cost=0.00..15556.05 rows=6228 width=4) (actual
time=0.093..506.730 rows=5671 loops=1)
   Filter: ((listprice <= 300000::numeric) AND (listprice >= 220000::numeric))
 Total runtime: 510.482 ms
(3 rows)

trendmls=# explain analyze select listnum from propmain where
listprice<=300000 and listprice>=250000;
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using propmain_listprice_i on propmain
(cost=0.00..12578.65 rows=3486 width=4) (actual time=0.103..16.418
rows=3440 loops=1)
   Index Cond: ((listprice <= 300000::numeric) AND (listprice >=
250000::numeric))
 Total runtime: 18.528 ms
(3 rows)

Re: Planner issue

From
"Joshua D. Drake"
Date:
Alex Turner wrote:

>I get the following output from explain analyze on a certain subset of
>a large query I'm doing.
>
>
>
Try increases the statistics on the listprice column with alter
table and then re-run analyze.

alter table foo alter column set statistics <n>

Sincerely,

Joshua D. Drake


>From the looks of it, I need to increase how often postgres uses an
>index over a seq scan, but I'm not sure how to do that.  I looked
>through the run-time configuration docs on the website, but didn't see
>anything pertaining to index selectivity.
>
>Thanks,
>
>Alex Turner
>netEconomist
>
>
>trendmls=# explain analyze select listnum from propmain where
>listprice<=300000 and listprice>=220000;
>                                                  QUERY PLAN
>--------------------------------------------------------------------------------------------------------------
> Seq Scan on propmain  (cost=0.00..15556.05 rows=6228 width=4) (actual
>time=0.093..506.730 rows=5671 loops=1)
>   Filter: ((listprice <= 300000::numeric) AND (listprice >= 220000::numeric))
> Total runtime: 510.482 ms
>(3 rows)
>
>trendmls=# explain analyze select listnum from propmain where
>listprice<=300000 and listprice>=250000;
>                                                                QUERY PLAN

>------------------------------------------------------------------------------------------------------------------------------------------
> Index Scan using propmain_listprice_i on propmain
>(cost=0.00..12578.65 rows=3486 width=4) (actual time=0.103..16.418
>rows=3440 loops=1)
>   Index Cond: ((listprice <= 300000::numeric) AND (listprice >=
>250000::numeric))
> Total runtime: 18.528 ms
>(3 rows)
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


Attachment

Re: Planner issue

From
Alex Turner
Date:
This helps a bit when I set it to 1000 - but it's still pretty bad:

I will use an index 220-300, but not 200-300.

Alex

trendmls=# explain analyze select listnum from propmain where
listprice<=300000 and listprice>=200000;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on propmain  (cost=0.00..15517.56 rows=6842 width=4) (actual
time=0.039..239.760 rows=6847 loops=1)
   Filter: ((listprice <= 300000::numeric) AND (listprice >= 200000::numeric))
 Total runtime: 244.301 ms
(3 rows)

trendmls=# set enable_seqscan=off;
SET
trendmls=# explain analyze select listnum from propmain where
listprice<=300000 and listprice>=200000;
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using propmain_listprice_i on propmain
(cost=0.00..22395.95 rows=6842 width=4) (actual time=0.084..25.751
rows=6847 loops=1)
   Index Cond: ((listprice <= 300000::numeric) AND (listprice >=
200000::numeric))
 Total runtime: 30.193 ms
(3 rows)

trendmls=#



On Tue, 22 Mar 2005 08:22:59 -0800, Joshua D. Drake
<jd@commandprompt.com> wrote:
> Alex Turner wrote:
>
> >I get the following output from explain analyze on a certain subset of
> >a large query I'm doing.
> >
> >
> >
> Try increases the statistics on the listprice column with alter
> table and then re-run analyze.
>
> alter table foo alter column set statistics <n>
>
> Sincerely,
>
> Joshua D. Drake
>
>
> >>From the looks of it, I need to increase how often postgres uses an
> >index over a seq scan, but I'm not sure how to do that.  I looked
> >through the run-time configuration docs on the website, but didn't see
> >anything pertaining to index selectivity.
> >
> >Thanks,
> >
> >Alex Turner
> >netEconomist
> >
> >
> >trendmls=# explain analyze select listnum from propmain where
> >listprice<=300000 and listprice>=220000;
> >                                                  QUERY PLAN
> >--------------------------------------------------------------------------------------------------------------
> > Seq Scan on propmain  (cost=0.00..15556.05 rows=6228 width=4) (actual
> >time=0.093..506.730 rows=5671 loops=1)
> >   Filter: ((listprice <= 300000::numeric) AND (listprice >= 220000::numeric))
> > Total runtime: 510.482 ms
> >(3 rows)
> >
> >trendmls=# explain analyze select listnum from propmain where
> >listprice<=300000 and listprice>=250000;
> >                                                                QUERY PLAN
>
>------------------------------------------------------------------------------------------------------------------------------------------
> > Index Scan using propmain_listprice_i on propmain
> >(cost=0.00..12578.65 rows=3486 width=4) (actual time=0.103..16.418
> >rows=3440 loops=1)
> >   Index Cond: ((listprice <= 300000::numeric) AND (listprice >=
> >250000::numeric))
> > Total runtime: 18.528 ms
> >(3 rows)
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> >
> >
>
> --
> Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
> Postgresql support, programming shared hosting and dedicated hosting.
> +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
> PostgreSQL Replicator -- production quality replication for PostgreSQL
>
>
>

Re: Planner issue

From
Ragnar Hafstað
Date:
On Tue, 2005-03-22 at 14:36 -0500, Alex Turner wrote:

> I will use an index 220-300, but not 200-300.
> ...
>  Seq Scan on propmain  (cost=0.00..15517.56 rows=6842 width=4) (actual
> time=0.039..239.760 rows=6847 loops=1)
> ...
>  Index Scan using propmain_listprice_i on propmain
> (cost=0.00..22395.95 rows=6842 width=4) (actual time=0.084..25.751
> rows=6847 loops=1)

the rows estimates are accurate, so it is not a question of statistics
anymore.

first make sure effective_cache_size is correctly set, and then
if that is not enough, you might try to lower random_page_cost a bit


gnari



Re: Planner issue

From
Ron Mayer
Date:
I'm guessing your data is actually more "clustered" than the
"correlation" stastic thinks it is.

Alex Turner wrote:
 >   trendmls=# explain analyze select listnum from propmain where
 >                      listprice<=300000 and listprice>=200000;


Is that a database of properties like land/houses?

If your table is clustered geographically (by zip code, etc),
the index scan might do quite well because all houses in a
neighborhood may have similar prices (and therefore live on
just a few disk pages).   However since high-priced neighborhoods
are scattered across the country, the optimizer would see
a very low "correlation" and not notice this clustering.


If this is the cause, one thing you could do is
CLUSTER your table on propmain_listprice_i.    I'm quite
confident it'll fix this particular query - but might
slow down other queries.



Re: Planner issue

From
Ron Mayer
Date:
I'm guessing your data is actually more "clustered" than the
"correlation" statistic thinks it is.

Alex Turner wrote:
 >   trendmls=# explain analyze select listnum from propmain where
 >                      listprice<=300000 and listprice>=200000;


Is that a database of properties like land/houses?

If your table is clustered geographically (by zip code, etc),
the index scan might do quite well because all houses in a
neighborhood may have similar prices (and therefore live on
just a few disk pages).   However since high-priced neighborhoods
are scattered across the country, the optimizer would see
a very low "correlation" and not notice this clustering.


If this is the cause, one thing you could do is
CLUSTER your table on propmain_listprice_i.    I'm quite
confident it'll fix this particular query - but might
slow down other queries.