Re: How to retain lesser paths at add_path()? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: How to retain lesser paths at add_path()?
Msg-id 20190801102454.qwvfkloue4w67pyk@development
In response to Re: How to retain lesser paths at add_path()?  (Kohei KaiGai <kaigai@heterodb.com>)
Responses Re: How to retain lesser paths at add_path()?
List pgsql-hackers
On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
>On Thu, Aug 1, 2019 at 16:19 Richard Guo <riguo@pivotal.io> wrote:
>>
>> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
>>>
>>> On Thu, Aug 1, 2019 at 1:41 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> >
>>> > Robert Haas <robertmhaas@gmail.com> writes:
>>> > > Yeah, but I have to admit that this whole design makes me kinda
>>> > > uncomfortable.  Every time somebody comes up with a new figure of
>>> > > merit, it increases not only the number of paths retained but also the
>>> > > cost of comparing two paths to possibly reject one of them. A few
>>> > > years ago, you came up with the (good) idea of rejecting some join
>>> > > paths before actually creating the paths, and I wonder if we ought to
>>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan has
>>> > > been saying, we ought to think about planning top-down with
>>> > > memoization instead of bottom up (yeah, I know that's a huge change).
>>> > > It just feels like the whole idea of a list of paths ordered by cost
>>> > > breaks down when there are so many ways that a not-cheapest path can
>>> > > still be worth keeping. Not sure exactly what would be better, though.
>>> >
>>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
>>> > know what to do instead though.  Changing to a top-down design
>>> > sounds like it would solve some problems while introducing others
>>> > (not to mention the amount of work and breakage involved).
>>> >
>>> Hmm... It looks like the problem we ought to revisit in path construction
>>> is much larger than I expected, and I am uncertain how much work
>>> is needed.
>>>
>>> Although it might only be a workaround until a fundamental reconstruction,
>>> how about having a margin on the estimated cost when rejecting paths?
>>> The current add_path() immediately rejects a lesser path if its cost is
>>> even slightly higher than that of the compared one. On the other hand,
>>
>>
>> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
>> the costs of two paths, although the fuzz factor (1%) is hard-coded and
>> not user-controllable.
>>
>Ah, sorry, I overlooked this logic...
>

FWIW I think adding a larger "fuzz factor" is unlikely to be a reliable
solution, because how would you know which value is the right one? Why would
10% be the right threshold, for example? In my experience these
hard-coded coefficients imply behavior that's difficult to predict and
explain to users.

>>> I understand it is not an essential re-design of the path-construction logic,
>>> and it may have limitations. However, the amount of work is reasonable and
>>> there is no side-effect (current behavior = 0% threshold).
>>> What do you think?
>>>
>>
>> How about Tom's suggestion of adding another dimension to be considered
>> in add_path(), just like how it considers paths with a better sort order
>> or parallel safety?
>>
>Robert also mentioned that it makes the comparison operation more complicated.
>If we try to add another dimension here, a callback function in the Path node
>may be able to tell the core optimizer whether a "dominated path" should be
>dropped or not, without further complexity. It is just an idea.
>

I think adding a hook to add_path() allowing extensions to override the
decision should be OK. The chance of getting that committed in the near future
seems much higher than for a patch that completely reworks add_path().

There's one caveat, though - AFAICS various places in the planner use
things like cheapest_total_path, cheapest_startup_path and even
get_cheapest_path_for_pathkeys(), which kinda assume add_path() only
considers startup/total cost. It might happen that even after keeping
additional paths, the planner still won't use them :-(

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



