Re: a path towards replacing GEQO with something better - Mailing list pgsql-hackers

From: John Naylor
Subject: Re: a path towards replacing GEQO with something better
Date:
Msg-id: CAFBsxsGCEZiqZZiBN1mMD26y4xUyqEkV00_6Eth8GEnBnko1_w@mail.gmail.com
In response to: Re: a path towards replacing GEQO with something better (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: a path towards replacing GEQO with something better
List: pgsql-hackers
On Sun, Jun 13, 2021 at 9:50 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> > 2) We can still keep GEQO around (with some huge limit by default) for a
> > few years as an escape hatch, while we refine the replacement. If there
> > is some bug that prevents finding a plan, we can emit a WARNING and fall
> > back to GEQO. Or if a user encounters a regression in a big query, they
> > can lower the limit to restore the plan they had in an earlier version.
> >
>
> Not sure. Keeping significant amounts of code may not be free - both for
> maintenance and new features. It'd be a bit sad if someone proposed
> improvements to join planning, but had to do 2x the work to support it
> in both the DP and GEQO branches, or risk incompatibility.
Looking back again at the commit history, we did modify geqo to support partial paths and partition-wise join, so that's a fair point. My concern is the risk of plan regressions after an upgrade, even if only for a small number of cases.
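
To sketch the escape hatch I was imagining upthread (names are made up: budgeted_join_search() and join_search_budget don't exist; geqo() and the surrounding variables are the ones make_rel_from_joinlist() already uses), the dispatch could end up looking roughly like this:

    RelOptInfo *rel;

    /*
     * Hypothetical: try the budgeted search first; if it gives up before
     * producing a complete plan, warn and fall back to GEQO so the query
     * still gets planned.
     */
    rel = budgeted_join_search(root, levels_needed, initial_rels,
                               join_search_budget);
    if (rel == NULL)
    {
        ereport(WARNING,
                (errmsg("join search budget exceeded, falling back to GEQO")));
        rel = geqo(root, levels_needed, initial_rels);
    }
    return rel;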
> OTOH maybe this concern is unfounded in practice - I don't think we've
> done very many big changes to geqo in the last few years.
Yeah, I get the feeling that it's already de facto obsolete, and we could make it a policy not to consider improvements to it, aside from bug fixes for cases where it can't find a valid plan, or from forced API changes. Which I guess is another way of saying "deprecated".
(I briefly considered turning it into a contrib module, but that seems like the worst of both worlds.)
> This reminds me the proposals to have a GUC that'd determine how much
> effort should the planner invest into various optimizations. For OLTP it
> might be quite low, for large OLAP queries it'd be economical to spend
> more time trying some more expensive optimizations.
>
> The challenge of course is how / in what units / to define the budget,
> so that it's meaningful and understandable for users. Not sure if
> "number of join rels generated" will be clear enough for users. But it
> seems good enough for PoC / development, and hopefully people won't have
> to tweak it very often.
I'm also in favor of having some type of "planner effort" or "OLTP-to-OLAP spectrum" GUC, but I'm not yet sure whether it'd be better to keep it separate or to couple the joinrel budget to it. If we go that route, I imagine there'll be many things that planner_effort changes that we don't want to give a separate knob for. And I hope that with graceful degradation and a good enough heuristic search, it won't be necessary to change it in most cases.
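
Purely for illustration, if the joinrel budget did end up user-visible, I'd picture it as an ordinary integer GUC alongside geqo_threshold. The entry below is modeled on the existing ones in guc.c, but the name, group, wording, and default are placeholders, not a proposal:

    /*
     * Hypothetical addition to ConfigureNamesInt[] in guc.c; assumes an
     * "int join_search_budget" backing variable declared elsewhere.
     */
    {
        {"join_search_budget", PGC_USERSET, QUERY_TUNING_OTHER,
            gettext_noop("Sets the maximum number of join relations the "
                         "exhaustive join search may generate before it "
                         "degrades to a cheaper strategy."),
            NULL
        },
        &join_search_budget,
        100000, 0, INT_MAX,
        NULL, NULL, NULL
    },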
> I haven't read the [Kossmann & Stocker, 2000] paper yet, but the
> [Neumann, 2018] paper seems to build on it, and it seems to work with
> much larger subtrees of the join tree than k=5.
Right, in particular it builds on "IDP-2" from Kossmann & Stocker. Neumann's preferred algorithm stack, "Adaptive", is complex, and I believe you are referring to the cases where they can iteratively improve up to 100 rels at a time thanks to linearization. That relies on a separate algorithm (IKKBZ) that complicates the cost model and cannot handle outer joins. If the query has outer joins, they use regular DP on subsets of size up to 10. That's not substantively different from IDP-2, which (or something close to it) is what I'd like to try to gracefully fall back to.
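
To make the fallback shape concrete, here's a throwaway standalone toy in plain C. It has nothing to do with the real planner data structures and it is not IDP-2 itself: it builds a left-deep join order greedily over invented cardinalities and selectivities, then repeatedly re-optimizes bounded windows of that order by brute force until a sweep finds no improvement. IDP-2 does the bounded re-optimization with real DP over subplans rather than by permuting a window, but the overall shape of cheap construction plus bounded local re-optimization is the part I mean:

/*
 * Toy standalone sketch, not planner code and not IDP-2 itself: build a
 * left-deep join order greedily, then repeatedly re-optimize bounded windows
 * of it by brute force.  The cardinalities and selectivities are invented,
 * and independence is assumed throughout.
 */
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define NREL   10
#define WINDOW 4   /* like "k": the largest piece re-optimized exhaustively */
#define SWAP(a, b) do { int t_ = (a); (a) = (b); (b) = t_; } while (0)

static double card[NREL];
static double sel[NREL][NREL];

/* cost of a left-deep order = sum of intermediate result sizes */
static double order_cost(const int *ord) {
    double rows = card[ord[0]], cost = 0.0;
    for (int i = 1; i < NREL; i++) {
        rows *= card[ord[i]];
        for (int j = 0; j < i; j++)
            rows *= sel[ord[i]][ord[j]];
        cost += rows;
    }
    return cost;
}

/* greedy construction: add the relation giving the smallest next intermediate */
static void greedy_order(int *ord) {
    bool used[NREL] = {false};
    for (int pos = 0; pos < NREL; pos++) {
        int best = -1;
        double bestrows = 0.0;
        for (int cand = 0; cand < NREL; cand++) {
            if (used[cand])
                continue;
            double rows = card[cand];
            for (int j = 0; j < pos; j++)
                rows *= sel[cand][ord[j]];
            if (best < 0 || rows < bestrows) {
                best = cand;
                bestrows = rows;
            }
        }
        ord[pos] = best;
        used[best] = true;
    }
}

/* try every permutation of ord[lo .. lo+WINDOW-1], remembering the cheapest */
static void permute(int *ord, int lo, int depth, double *bestcost, int *bestord) {
    if (depth == WINDOW) {
        double c = order_cost(ord);
        if (c < *bestcost) {
            *bestcost = c;
            memcpy(bestord, ord, sizeof(int) * NREL);
        }
        return;
    }
    for (int i = lo + depth; i < lo + WINDOW; i++) {
        SWAP(ord[lo + depth], ord[i]);
        permute(ord, lo, depth + 1, bestcost, bestord);
        SWAP(ord[lo + depth], ord[i]);
    }
}

int main(void) {
    int ord[NREL], bestord[NREL];

    /* invented statistics */
    for (int i = 0; i < NREL; i++) {
        card[i] = 1000.0 * (i + 1) * (i + 1);
        for (int j = 0; j < NREL; j++)
            sel[i][j] = (i == j) ? 1.0 : 1.0 / (1.0 + (i + j) % 7);
    }

    greedy_order(ord);
    double cost = order_cost(ord);
    printf("greedy cost:  %g\n", cost);

    /* sweep bounded windows until a full pass finds no improvement */
    for (bool improved = true; improved;) {
        improved = false;
        for (int lo = 0; lo + WINDOW <= NREL; lo++) {
            double best = cost;
            memcpy(bestord, ord, sizeof(ord));
            permute(ord, lo, 0, &best, bestord);
            if (best < cost) {
                memcpy(ord, bestord, sizeof(ord));
                cost = best;
                improved = true;
            }
        }
    }
    printf("refined cost: %g\n", cost);
    return 0;
}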
> What I find fairly interesting is the section in [Neumann, 2018] about
> cardinality estimates, and quality of query plans when the estimates are
> off. The last paragraph in the 6.5 section essentially says that despite
> poor estimates, the proposed algorithm performs better than the simple
> (and cheap) heuristics. I'm not sure what to think about that, because
> my "intuitive" understanding is that the more elaborate the planning is,
> the more errors it can make when the estimates are off.
Yeah, I'm not sure this part is very useful; it seems almost like an afterthought. In Table 3, all of the poorly performing entries are "pure" greedy algorithms without an iterative refinement step, so it makes some sense that poor estimates would hurt them more. But they don't compare those against versions with *just* a refinement step added. I also don't know how realistic their "estimate fuzzing" is.
--
John Naylor
EDB: http://www.enterprisedb.com