Re: Fuzzy cost comparison to eliminate redundant planning - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Fuzzy cost comparison to eliminate redundant planning
Date
Msg-id 200403291500.i2TF0mm20947@candle.pha.pa.us
Whole thread Raw
In response to Re: Fuzzy cost comparison to eliminate redundant planning  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Fuzzy cost comparison to eliminate redundant planning
List pgsql-hackers
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Right.  There are potentially some ranges of LIMIT for which it could
> >> win, I believe.
> 
> > What if we take the total cost and divide it by the number of rows returned ---
> > then we have a per-row cost for each plan. Then we subtract the two, and
> > that difference compared to the difference in startup costs tell us how
> > many rows could potentially use this plan.
> 
> Here, plan B loses everywhere: to A at lower percentages and to C at
> higher ones.  But I don't see how to eliminate B on the basis of
> one-at-a-time comparisons.  It seems like getting rid of B would require
> turning add_path into an O(N^3) instead of O(N^2) algorithm ... which
> would almost certainly eat more cycles than it'd save.

Nice charts.  :-)

I agree we don't want anything that is O(high), but I was thinking of
something that would be more agressive than 1%, which works well for
lots of self joins, but I am not sure how well for other cases. 
Consider these plans that return 10 rows:
total    startup    total per row retrieved
plan1    1    1    .1
plan2    5    0.5    .5
plan3    10    0    1

Now, the difference between plan1 and plan2 total is 500%, yet it is a
useless plan.  If you want to retrieve one row, you pick plan3, if you
want 2 or more rows, you pick plan1.

If the per-row total cost plus the startup cost is less than another's,
we can throw it away. In fact, when we go check for cheapest startup
cost to keep, do we at least assume we have one row fetched?

What if instead of doing total cost 1% difference, we compute
total-per-row + startup being 1% different?  Does that catch more
similar cost plans?  Seems it would.

In your example, how many rows were returned?  I can see how this would
have handled that case.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


pgsql-hackers by date:

Previous
From: Teodor Sigaev
Date:
Subject: Re: GIST code doesn't build on strict 64-bit machines
Next
From: Chris Bowlby
Date:
Subject: Row sampling..