Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins) - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins)
Date
Msg-id 199902061751.MAA03085@candle.pha.pa.us
Whole thread Raw
In response to Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> >> Why are we maintaining this huge Path List for every RelOptInfo
> >> structure if we only need the cheapest?
> 
> I think Bruce is onto something here... keep only the best-so-far
> not everything ever generated.  That'd get rid of the O(N^2)
> comparison behavior.
> 
> The only thing I can think of that we might possibly *need* the
> whole list for is if it is used as a recursion stopper.
> ("Oh, I already investigated that alternative, no need to go down
> that path again.")  It did not look to me like the optimizer does
> such a thing, but I don't claim to understand the code.
> 
> It seems to me that the search space of possible paths is
> well-structured and can be enumerated without generation of duplicates.
> You've got a known set of tables involved in a query, a fixed set of
> possible access methods for each table, and only so many ways to
> combine them.  So the real question here is why does the optimizer
> even need to check for duplicates --- should it not never generate
> any to begin with?  The profiles I ran a few days ago show that indeed
> most of the generated paths are not duplicates, but a small fraction
> are duplicates.  Why is that?  I'd feel a lot closer to understanding
> what's going on if we could explain where the duplicates come from.

Here is my guess, and I think it is valid.  There are cases where a join
of two tables may be more expensive than another plan, but the order of
the result may be cheaper to use for a later join than the cheaper plan.
That is why I suspect they are doing in better_path():
       if (samekeys(path->keys, new_path->keys) &&           equal_path_ordering(&path->p_ordering,
         &new_path->p_ordering))       {           old_path = path;           break;       }
 
So I don't think we can grab the cheapest path right away, but I think
the system is keeping around non-cheapest paths with the same result
ordering.  I also would like to throw away plans that are so expensive
that sorting another plan to the desired ordering would be cheaper than
using the plan.

I am only guessing on this.  How often does this 'keep non-cheapest path
around' really help?

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [SQL] Functional Indexes
Next
From: Michael Meskes
Date:
Subject: Re: [HACKERS] DEC OSF1 Compilation problems