Re: Optimizer(?) off by factor of 3 ... ? - Mailing list pgsql-hackers

From Marc G. Fournier
Subject Re: Optimizer(?) off by factor of 3 ... ?
Date
Msg-id 20020212092343.V59276-100000@mail1.hub.org
Whole thread Raw
In response to Re: Optimizer(?) off by factor of 3 ... ?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, 11 Feb 2002, Tom Lane wrote:

> "Marc G. Fournier" <scrappy@hub.org> writes:
> > [ bogus optimizer choices in 7.2 ]
>
> Well, I guess the good news is we seem to be past the old bugaboo of bad
> statistics: the estimated row counts are all in the right ballpark.  Now
> we get to have fun with the cost models :-).
>
> It looks to me like there are a couple of problems here.  One is that
> the default value of effective_cache_size is way too small --- it's set
> at 1000, which is probably silly when you have NBuffers set to 32768.
> (In hindsight maybe we should have expressed it as a multiple of
> NBuffers rather than an absolute size.)  You could tweak that with a
> postgresql.conf change, but I'm not sure that that alone will help much.
>
> The more difficult issue is that nestloops with inner indexscan are
> being seriously misestimated.  We're computing the cost as though each
> iteration of the inner scan were completely independent and being done
> from a standing start --- which is wrong, because in practice scans
> after the first will tend to find buffer cache hits for pages already
> read in by prior scans.  You can bet, for example, that the btree
> metapage and root page aren't going to need to be re-read on each
> iteration.
>
> I am thinking that the right way to do this is to cost the entire inner
> indexscan (all iterations put together) as if it were a single
> indexscan, at least for the purposes of applying the Mackert & Lohman
> formula embedded in cost_index.  That would give us a more realistic
> result for the total cost of the main-table accesses driven by the
> index.  Not sure how to adjust the cost estimate for reading the index,
> but clearly we need to make some adjustment for repeated hits on the
> upper index pages.
>
> This is probably a bigger change than we can hope to make in 7.2.* ...
>
> BTW, what do you get if you EXPLAIN ANALYZE that orient/clubs join
> with seqscan enabled and hashjoin disabled?  If it's a mergejoin,
> how about if you also disable mergejoin?  It seems to me that a seqscan
> on clubs would be a much better way to do the nestloop join than an
> indexscan --- but it's being forced into an indexscan because you
> disabled seqscan.

iwantu=# set enable_seqscan=true;
iwantu=# set enable_hashjoin=false;
iwantu=# explain analyze SELECT o.uid,headline,pictures,voice FROM orient o JOIN clubs c ON (o.uid = c.uid AND c.club =
1AND ( c.hide ='1' OR c.hide='2' ) AND (o.female) );
 
NOTICE:  QUERY PLAN:

Merge Join  (cost=97750.86..100011.74 rows=78391 width=72) (actual time=17041.33..23771.57 rows=50745 loops=1) ->  Sort
(cost=53412.61..53412.61 rows=422145 width=8) (actual time=12996.56..15563.59 rows=418951 loops=1)       ->  Seq Scan
onorient o  (cost=0.00..7718.69 rows=422145 width=8) (actual time=0.02..3237.46 rows=418951 loops=1) ->  Sort
(cost=44338.25..44338.25rows=90251 width=64) (actual time=4044.65..4531.18 rows=76954 loops=1)       ->  Seq Scan on
clubsc  (cost=0.00..34057.19 rows=90251 width=64) (actual time=0.04..1399.83 rows=76954 loops=1)
 
Total runtime: 24082.76 msec

iwantu=# set enable_mergejoin=false;
iwantu=# explain analyze SELECT o.uid,headline,pictures,voice FROM orient o JOIN clubs c ON (o.uid = c.uid AND c.club =
1AND ( c.hide ='1' OR c.hide='2' ) AND (o.female) );
 
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..363373.00 rows=78391 width=72) (actual time=0.54..5488.15 rows=50745 loops=1) ->  Seq Scan on
clubsc  (cost=0.00..34057.19 rows=90251 width=64) (actual time=0.03..1434.97 rows=76954 loops=1) ->  Index Scan using
orient_pkeyon orient o  (cost=0.00..3.64 rows=1 width=8) (actual time=0.03..0.03 rows=1 loops=76954)
 
Total runtime: 5769.21 msec





pgsql-hackers by date:

Previous
From: Daniel Kalchev
Date:
Subject: again on index usage (7.1.3)
Next
From: Tom Lane
Date:
Subject: Re: RTLD_LAZY considered harmful (Re: pltlc and pltlcu