Re: planner with index scan cost way off actual cost, advices to tweak cost constants? - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: planner with index scan cost way off actual cost, advices to tweak cost constants?
Date
Msg-id 20060321115835.GN15742@pervasive.com
Whole thread Raw
In response to Re: planner with index scan cost way off actual cost, advices to tweak cost constants?  (Guillaume Cottenceau <gc@mnc.ch>)
Responses Re: planner with index scan cost way off actual cost, advices to tweak cost constants?  (Guillaume Cottenceau <gc@mnc.ch>)
List pgsql-performance
On Tue, Mar 21, 2006 at 11:13:06AM +0100, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> [...]
>
> > > My point is that the planner's cost estimate is way above the
> > > actual cost of the query, so the planner doesn't use the best
> > > plan. Even though the index returns so many rows, the actual cost
> > > of the query is such that an index scan (even in the worst case,
> > > with all disk cache flushed) is still better than a seq scan, but
> > > the planner chooses the seq scan.
> >
> > Yes. The cost estimator for an index scan supposedly does a linear
> > interpolation between a minimum cost and a maximum cost depending on the
> > correlation of the first field in the index. The problem is that while
> > the comment states it's a linear interpolation, the actual formula
> > squares the correlation before interpolating. This means that unless the
> > correlation is very high, you're going to get an unrealistically high
> > cost for an index scan. I have data that supports this at
> > http://stats.distributed.net/~decibel/, but I've never been able to get
> > around to testing a patch to see if it improves things.
>
> Interesting.
>
> It would be nice to investigate the arguments behind the choice
> you describe for the formula used to perform the interpolation. I
> have absolutely no knowledge on pg internals so this is rather
> new/fresh for me, I have no idea how smart that choice is (but
> based on my general feeling about pg, I'm suspecting this is
> actually smart but I am not smart enough to see why ;p).

If you feel like running some tests, you need to change

    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

in src/backend/optimizer/path/costsize.c to something like

    run_cost += max_IO_cost + fabs(indexCorrelation) * (min_IO_cost - max_IO_cost);

(fabs rather than abs, since indexCorrelation is a double).

That might not produce a perfect cost estimate, but I'll wager that it
will be substantially better than what's in there now. FYI, see also
http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
