Re: a path towards replacing GEQO with something better - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: a path towards replacing GEQO with something better
Date
Msg-id c4224a3c-569a-679c-1b3c-208b575794b1@enterprisedb.com
Whole thread Raw
In response to Re: a path towards replacing GEQO with something better  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: a path towards replacing GEQO with something better
Re: a path towards replacing GEQO with something better
List pgsql-hackers

On 6/14/21 1:16 PM, John Naylor wrote:
> On Sun, Jun 13, 2021 at 9:50 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
> wrote:
> 
>> > 2) We can still keep GEQO around (with some huge limit by default) for a
>> > few years as an escape hatch, while we refine the replacement. If there
>> > is some bug that prevents finding a plan, we can emit a WARNING and fall
>> > back to GEQO. Or if a user encounters a regression in a big query, they
>> > can lower the limit to restore the plan they had in an earlier version.
>> >
>>
>> Not sure. Keeping significant amounts of code may not be free - both for
>> maintenance and new features. It'd be a bit sad if someone proposed
>> improvements to join planning, but had to do 2x the work to support it
>> in both the DP and GEQO branches, or risk incompatibility.
> 
> Looking back again at the commit history, we did modify geqo to support
> partial paths and partition-wise join, so that's a fair concern.

Right. I think the question is how complex those changes were. If it was
mostly mechanical, it's not a big deal and we can keep both, but if it
requires deeper knowledge of the GEQO inner workings it may be an issue
(planner changes are already challenging enough).

> My concern is the risk of plan regressions after an upgrade, even if
> for a small number of cases.
> 

I don't know. My impression/experience with GEQO is that getting a good
join order for queries with many joins is often a matter of luck, and
the risk of getting poor plan just forces me to increase geqo_threshold
or disable it altogether. Or just rephrase the the query to nudge the
planner to use a good join order (those large queries are often star or
chain shaped, so it's not that difficult).

So I'm not sure "GEQO accidentally produces a better plan for this one
query" is a good argument to keep it around. We should probably evaluate
the overall behavior, and then make a decision.

FWIW I think we're facing this very dilemma for every optimizer change,
to some extent. Every time we make the planner smarter by adding a new
plan variant or heuristics, we're also increasing the opportunity for
errors. And every time we look (or should look) at average behavior and
worst case behavior ...

>> OTOH maybe this concern is unfounded in practice - I don't think we've
>> done very many big changes to geqo in the last few years.
> 
> Yeah, I get the feeling that it's already de facto obsolete, and we
> could make it a policy not to consider improvements aside from bug fixes
> where it can't find a valid plan, or forced API changes. Which I guess
> is another way of saying "deprecated".
> 
> (I briefly considered turning it into a contrib module, but that seems
> like the worst of both worlds.)
> 

True. I'm fine with deprecating / not improving geqo. What would worry
me is incompatibility, i.e. if geqo could not support some features. I'm
thinking of storage engines in MySQL not supporting some features,
leading to a mine field for users. Producing poor plans is fine, IMO.

>> This reminds me the proposals to have a GUC that'd determine how much
>> effort should the planner invest into various optimizations. For OLTP it
>> might be quite low, for large OLAP queries it'd be economical to spend
>> more time trying some more expensive optimizations.
>>
>> The challenge of course is how / in what units / to define the budget,
>> so that it's meaningful and understandable for users. Not sure if
>> "number of join rels generated" will be clear enough for users. But it
>> seems good enough for PoC / development, and hopefully people won't have
>> to tweak it very often.
> 
> I'm also in favor of having some type of "planner effort" or "OLTP to
> OLAP spectrum" guc, but I'm not yet sure whether it'd be better to have
> it separate or to couple the joinrel budget to it. If we go that route,
> I imagine there'll be many things that planner_effort changes that we
> don't want to give a separate knob for. And, I hope with graceful
> degradation and a good enough heuristic search, it won't be necessary to
> change in most cases.
> 

Yeah. I'm really worried about having a zillion separate GUC knobs for
tiny parts of the code. That's impossible to tune, and it also exposes
details about the implementation. And some people just can't resist
touching all the available options ;-)

The thing I like about JIT tunables is that it's specified in "cost"
which the users are fairly familiar with. Having another GUC with
entirely different unit is not great.

But as I said - it seems perfectly fine for PoC / development, and we
can revisit that later. Adding some sort of "planner effort" or multiple
optimization passes is a huge project on it's own.

>> I haven't read the [Kossmann & Stocker, 2000] paper yet, but the
>> [Neumann, 2018] paper seems to build on it, and it seems to work with
>> much larger subtrees of the join tree than k=5.
> 
> Right, in particular it builds on "IDP-2" from Kossmann & Stocker. Okay,
> so Neumann's favorite algorithm stack "Adaptive" is complex, and I
> believe you are referring to cases where they can iteratively improve up
> to 100 rels at a time because of linearization. That's a separate
> algorithm (IKKBZ) that complicates the cost model and also cannot have
> outer joins. If it has outer joins, they use regular DP on subsets of
> size up to 10. It's not substantively different from IDP-2, and that's
> the one I'd like to try to gracefully fall back to. Or something similar.
> 

Yes, that's what I was referring to. You're right it's complex and we
don't need to implement all of that - certainly not on day one. The
linearization / IKKBZ seems interesting (even if just for inner joins),
but better to start with something generic.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Next
From: Tomas Vondra
Date:
Subject: Re: Use extended statistics to estimate (Var op Var) clauses