Thread: a path towards replacing GEQO with something better

a path towards replacing GEQO with something better

From
John Naylor
Date:
Hi,

On occasion it comes up that the genetic query optimizer (GEQO) doesn't produce particularly great plans, and is slow ([1] for example). The only alternative that has gotten as far as a prototype patch (as far as I know) is simulated annealing some years ago, which didn't seem to get far.

The join problem as it pertains to Postgres has been described within the community in
[Gustafsson, 2017] and [Stroffek & Kovarik, 2007].

The fact that there is much more interest than code in this area indicates that this is a hard problem. I hadn't given it much thought myself until by chance I came across [Neumann, 2018], which describes a number of interesting ideas. The key takeaway is that you want a graceful transition between exhaustive search and heuristic search. In other words, if the space of possible join orderings is just slightly larger than the maximum allowed exhaustive search, then the search should be *almost* exhaustive. This not only increases the chances of finding a good plan, but also has three engineering advantages I can think of:

1) It's natural to re-use data structures etc. already used by the existing dynamic programming (DP) algorithm. Framing the problem as extending DP greatly lowers the barrier to making realistic progress. If the problem is framed as "find an algorithm as a complete drop-in replacement for GEQO", it's a riskier project in my view.

2) We can still keep GEQO around (with some huge limit by default) for a few years as an escape hatch, while we refine the replacement. If there is some bug that prevents finding a plan, we can emit a WARNING and fall back to GEQO. Or if a user encounters a regression in a big query, they can lower the limit to restore the plan they had in an earlier version.

3) It actually improves the existing exhaustive search, because the complexity of the join order problem depends on the query shape: a "chain" shape (linear) vs. a "star" shape (as in star schema), for the most common examples. The size of the DP table grows like this (for n >= 4):

Chain: (n^3 - n) / 6   (including bushy plans)
Star:  (n - 1) * 2^(n - 2)

 n  chain       star
--------------------
 4     10         12
 5     20         32
 6     35         80
 7     56        192
 8     84        448
 9    120       1024
10    165       2304
11    220       5120
12    286      11264
13    364      24576
14    455      53248
15    560     114688
...
64  43680     290536219160925437952

The math behind this is described in detail in [Ono & Lohman, 1990]. I verified this in Postgres by instrumenting the planner to count how many times it calls make_join_rel().
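
For reference, here's a tiny standalone Python sketch (toy code, nothing taken from the planner) that reproduces the two formulas and the table above:

# Standalone check of the DP-table growth formulas above
# (chain includes bushy plans, per [Ono & Lohman, 1990]).
def chain(n):
    return (n**3 - n) // 6

def star(n):
    return (n - 1) * 2**(n - 2)

for n in list(range(4, 16)) + [64]:
    print(f"{n:2d}  {chain(n):6d}  {star(n)}")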

Imagine having a "join enumeration budget" that, if exceeded, prevents advancing to the next join level. Given the above numbers and a query with some combination of chain and star shapes, a budget of 400 can guarantee an exhaustive search when there are up to 8 relations. For a pure chain join, we can do an exhaustive search on up to 13 relations, for a similar cost of time and space. Out of curiosity I tested HEAD with a 64-table chain query found in the SQLite tests [2] and found exhaustive search to take only twice as long as GEQO. If we have some 30-way join, and some (> 400) budget, it's actually possible that we will complete the exhaustive search and get the optimal plan. This is a bottom-up way of determining the complexity. Rather than configuring a number-of-relations threshold and possibly having exponential behavior blow up in their faces, users can configure something that somewhat resembles the runtime cost.
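
To make that concrete, here's a rough standalone Python sketch of what such a budget check around level-at-a-time DP might look like. None of these names are planner symbols; has_join_clause() stands in for the real join-clause and join-order restrictions, and everything make_join_rel() actually does is waved away:

# Toy sketch of a "join enumeration budget" around level-at-a-time DP.
def dp_with_budget(nrels, budget, has_join_clause):
    levels = {1: {frozenset([r]) for r in range(nrels)}}
    joins_tried = 0
    for level in range(2, nrels + 1):
        if joins_tried > budget:
            return levels, level - 1          # stop; hand off to a greedy step
        pairs = set()
        for k in range(1, level):
            for lhs in levels[k]:
                for rhs in levels[level - k]:
                    if not (lhs & rhs) and has_join_clause(lhs, rhs):
                        pairs.add(frozenset([lhs, rhs]))   # ignores commutativity
        joins_tried += len(pairs)             # roughly the make_join_rel() calls
        levels[level] = {lhs | rhs for lhs, rhs in pairs}
    return levels, nrels                      # exhaustive search completed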

Now, I'll walk through one way that a greedy heuristic can integrate with DP. In our 30-way join example, if we use up our budget and don't have a valid plan, we'll break out of DP at the last join level we completed. Since we already have built a number of joinrels, we build upon that work as we proceed. The approach I have in mind is described in [Kossmann & Stocker, 2000], which the authors call "iterative dynamic programming" (IDP). I'll describe one of the possible variants here. Let's say we only got as far as join level 8, so we've created up to 8-way joinrels. We pick the best few (maybe 5%) of these 8-way joinrels by some measure (doesn't have to be the full cost model) and on top of each of them create a full plan quickly: At each join level, we only pick one base relation (again by some measure) to create one new joinrel and then move to the next join level. This is very fast, even with hundreds of relations.
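
Here's an equally rough sketch of that greedy completion step, in the same toy style; cheap_measure() and has_join_clause() are stand-ins for whatever we end up using, not existing planner functions:

# Greedy completion: starting from one of the best partial joinrels, add
# one base relation at a time (picked by some cheap measure) until the
# join is complete.
def greedy_complete(start_rel, nrels, cheap_measure, has_join_clause):
    current = frozenset(start_rel)
    remaining = set(range(nrels)) - current
    steps = [current]
    while remaining:
        candidates = [r for r in remaining
                      if has_join_clause(current, frozenset([r]))]
        if not candidates:
            candidates = list(remaining)      # fall back to a cross product
        best = min(candidates, key=lambda r: cheap_measure(current, r))
        current = current | frozenset([best])
        remaining.discard(best)
        steps.append(current)
    return current, steps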

Once we have one valid, complete plan, we can technically stop at any time (coding this much is also a good proof of concept). How much additional effort we expend to find a good plan could be another budget we have. With a complete plan obtained quickly, we also have an upper bound on the measure of the cheapest overall plan, so with that we can prune any more expensive plans as we iterate through the 8-way joinrels. Once we have a set of complete plans, we pick some of them and improve the part of the plan that was built during the greedy step. For some size k (typically between 4 and 7), we divide the greedy-step part of the join into k-sized sections. So with our 30-table join where we started with an 8-way joinrel, we have 22 remaining tables. If k=5, we run standard dynamic programming (including the standard cost model) on four 5-table sections and then once more on the final 2-table section.

You can also think of it like this: We quickly find 5 tables that likely would be good to join next, find the optimal join order among the 5, then add that to our joinrel. We keep doing that until we get a valid plan. The only difference is that performing the greedy step to completion first allows us to prune subsequent bad intermediate steps.
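
The refinement pass over the greedy part of the plan could then look roughly like this, where dp_optimal() is a stand-in for the regular bottom-up search with the full cost model (again, just a toy sketch):

# IDP-style refinement: re-run standard DP over the greedy tail in
# k-sized sections.
def refine_greedy_tail(start_rel, greedy_tail, k, dp_optimal):
    current = start_rel
    for i in range(0, len(greedy_tail), k):
        section = greedy_tail[i:i + k]
        # re-plan the current joinrel plus this section with the full cost model
        current = dp_optimal([current] + section)
    return current

# With the 30-table join above that got as far as an 8-way joinrel,
# greedy_tail holds the other 22 relations; k=5 gives four 5-table
# sections and one final 2-table section.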

By "some measure" above I'm being a bit hand-wavy, but at least in the literature I've read, fast heuristic algorithms seem to use simpler and cheaper-to-compute metrics like intermediate result size or selectivity, rather than a full cost function. That's a detail to be worked out. Also, it must be said that in choosing among intermediate steps we need to be careful to include things like:

- interesting sort orders
- partition-wise joins
- parameterized paths
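
As a toy example of the kind of cheap measure meant above (purely hypothetical, and written directly in terms of the inputs' estimated sizes rather than the rels themselves):

# Hypothetical cheap measure: estimated intermediate result size.  No
# path construction or full costing involved; smaller is better.
def cheap_measure(lhs_rows, rhs_rows, join_selectivity):
    return lhs_rows * rhs_rows * join_selectivity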

Further along the lines of extending DP, and somewhat orthogonal to the above, is the possibility of doing pruning during the initial DP step. Looking again at how quickly the join enumeration for star queries grows with increasing "n", it makes sense that a large number of those are bad plans. In [Das & Haritsa, 2006], the authors demonstrate a method of extending the reach of DP by pruning joinrels at each join level by two criteria:

1) Whether the joinrel contains a hub relation (i.e. is the center of a star)
2) A skyline function taking into account cost, cardinality, and selectivity

This way, the worst joinrels of star queries are pruned and the initial join budget I mentioned above goes a lot farther.
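
Just to illustrate criterion 2, here's a standalone sketch of skyline-style pruning at one join level; the metrics tuple and the rest are made up for illustration, not taken from the paper's actual formulation:

# Skyline pruning: drop a joinrel if some other joinrel at the same join
# level is no worse on every metric and strictly better on at least one.
# metrics() might return e.g. (cost estimate, cardinality, selectivity).
def skyline_prune(joinrels, metrics):
    def dominates(a, b):
        ma, mb = metrics(a), metrics(b)
        return (all(x <= y for x, y in zip(ma, mb))
                and any(x < y for x, y in zip(ma, mb)))

    return [rel for rel in joinrels
            if not any(dominates(other, rel)
                       for other in joinrels if other is not rel)]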

There are quite a few details and variations I left out (testing, for one), but this is enough to show the idea. I plan on working on this during the PG15 cycle. I'd appreciate any feedback on the above.
--

[1] https://www.postgresql.org/message-id/15658.1241278636@sss.pgh.pa.us

[Stroffek & Kovarik, 2007] https://www.pgcon.org/2007/schedule/attachments/28-Execution_Plan_Optimization_Techniques_Stroffek_Kovarik.pdf

[Gustafsson, 2017]  https://www.postgresql.eu/events/pgconfeu2017/sessions/session/1586/slides/26/Through_the_Joining_Glass-PGConfeu-DanielGustafsson.pdf

[Neumann, 2018] Adaptive Optimization of Very Large Join Queries. https://dl.acm.org/doi/10.1145/3183713.3183733

[Ono & Lohman, 1990] Measuring the Complexity of Join Enumeration in Query Optimization. https://www.csd.uoc.gr/~hy460/pdf/MeasuringtheComplexityofJoinEnumerationinQueryOptimization.PDF

[2] https://www.sqlite.org/sqllogictest/file?name=test/select5.test
(Note: there are no explicit join clauses so "from" limits didn't have an effect in my quick test.)

[Kossmann & Stocker, 2000] Iterative dynamic programming: a new class of query optimization algorithms. https://doi.org/10.1145/352958.352982

[Das & Haritsa, 2006] Robust Heuristics for Scalable Optimization of Complex SQL Queries. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.549.4331&rep=rep1&type=pdf

--

Re: a path towards replacing GEQO with something better

From
Tomas Vondra
Date:
Hi,

On 6/10/21 3:21 AM, John Naylor wrote:
> Hi,
> 
> On occasion it comes up that the genetic query optimizer (GEQO) doesn't
> produce particularly great plans, and is slow ([1] for example). The
> only alternative that has gotten as far as a prototype patch (as far as
> I know) is simulated annealing some years ago, which didn't seem to get far.
> 
> The join problem as it pertains to Postgres has been described within
> the community in
> [Gustaffson, 2017] and [Stroffek & Kovarik, 2007].
> 
> The fact that there is much more interest than code in this area
> indicates that this is a hard problem. I hadn't given it much thought
> myself until by chance I came across [Neumann, 2018], which describes a
> number of interesting ideas. The key takeaway is that you want a
> graceful transition between exhaustive search and heuristic search. In
> other words, if the space of possible join orderings is just slightly
> larger than the maximum allowed exhaustive search, then the search
> should be *almost* exhaustive.

Yeah, I think this is one of the places with the annoying "cliff edge"
behavior in our code base, so an alternative that would degrade a bit
more gracefully would be welcome.

I only quickly read the [Neumann, 2018] paper over the weekend, and
overall it seems like a very interesting/promising approach. Of course,
the question is how well it can be combined with the rest of our code,
and various other details from real-world queries (papers often ignore
some of those bits for simplicity).

> This not only increases the chances of finding a good plan, but also
> has three engineering advantages I can think of:
> 
> 1) It's natural to re-use data structures etc. already used by the
> existing dynamic programming (DP) algorithm. Framing the problem as
> extending DP greatly lowers the barrier to making realistic progress. If
> the problem is framed as "find an algorithm as a complete drop-in
> replacement for GEQO", it's a riskier project in my view.
> 

True.

> 2) We can still keep GEQO around (with some huge limit by default) for a
> few years as an escape hatch, while we refine the replacement. If there
> is some bug that prevents finding a plan, we can emit a WARNING and fall
> back to GEQO. Or if a user encounters a regression in a big query, they
> can lower the limit to restore the plan they had in an earlier version.
> 

Not sure. Keeping significant amounts of code may not be free - both for
maintenance and new features. It'd be a bit sad if someone proposed
improvements to join planning, but had to do 2x the work to support it
in both the DP and GEQO branches, or risk incompatibility.

OTOH maybe this concern is unfounded in practice - I don't think we've
done very many big changes to geqo in the last few years.

> 3) It actually improves the existing exhaustive search, because the
> complexity of the join order problem depends on the query shape: a
> "chain" shape (linear) vs. a "star" shape (as in star schema), for the
> most common examples. The size of the DP table grows like this (for n >= 4):
> 
> Chain: (n^3 - n) / 6   (including bushy plans)
> Star:  (n - 1) * 2^(n - 2)
> 
>  n  chain       star
> --------------------
>  4     10         12
>  5     20         32
>  6     35         80
>  7     56        192
>  8     84        448
>  9    120       1024
> 10    165       2304
> 11    220       5120
> 12    286      11264
> 13    364      24576
> 14    455      53248
> 15    560     114688
> ...
> 64  43680     290536219160925437952
> 
> The math behind this is described in detail in [Ono & Lohman, 1990]. I
> verified this in Postgres by instrumenting the planner to count how many
> times it calls make_join_rel().
> 

So, did you verify it for star query with 64 relations? ;-)

> Imagine having a "join enumeration budget" that, if exceeded, prevents
> advancing to the next join level. Given the above numbers and a query
> with some combination of chain and star shapes, a budget of 400 can
> guarantee an exhaustive search when there are up to 8 relations. For a
> pure chain join, we can do an exhaustive search on up to 13 relations,
> for a similar cost of time and space. Out of curiosity I tested HEAD
> with a chain query having 64 tables found in the SQLite tests [2] and
> found exhaustive search to take only twice as long as GEQO. If we have
> some 30-way join, and some (> 400) budget, it's actually possible that
> we will complete the exhaustive search and get the optimal plan. This is
> a bottom-up way of determining the complexity. Rather than configuring a
> number-of-relations threshold and possibly have exponential behavior
> blow up in their faces, users can configure something that somewhat
> resembles the runtime cost.
> 

Sounds reasonable in principle, I think.

This reminds me of the proposals to have a GUC that'd determine how much
effort the planner should invest in various optimizations. For OLTP it
might be quite low, for large OLAP queries it'd be economical to spend
more time trying some more expensive optimizations.

The challenge of course is how / in what units / to define the budget,
so that it's meaningful and understandable for users. Not sure if
"number of join rels generated" will be clear enough for users. But it
seems good enough for PoC / development, and hopefully people won't have
to tweak it very often.

For JIT we used the query cost, which is a term users are familiar with,
but that's possible because we make the decision after the plan is
built/costed. That doesn't work for join order search :-(

> Now, I'll walk through one way that a greedy heuristic can integrate
> with DP. In our 30-way join example, if we use up our budget and don't
> have a valid plan, we'll break out of DP at the last join level
> we completed. Since we already have built a number of joinrels, we build
> upon that work as we proceed. The approach I have in mind is described
> in [Kossmann & Stocker, 2000], which the authors call "iterative dynamic
> programming" (IDP). I'll describe one of the possible variants here.
> Let's say we only got as far as join level 8, so we've created up to
> 8-way joinrels. We pick the best few (maybe 5%) of these 8-way joinrels
> by some measure (doesn't have to be the full cost model) and on top of
> each of them create a full plan quickly: At each join level, we only
> pick one base relation (again by some measure) to create one new joinrel
> and then move to the next join level. This is very fast, even with
> hundreds of relations.
> 
> Once we have one valid, complete plan, we can technically stop at any
> time (Coding this much is also a good proof-of-concept). How much
> additional effort we expend to find a good plan could be another budget
> we have.  With a complete plan obtained quickly, we also have an upper
> bound on the measure of the cheapest overall plan, so with that we can
> prune any more expensive plans as we iterate through the 8-way joinrels.
> Once we have a set of complete plans, we pick some of them to improve
> the part of the plan picked during the greedy step. For some size k
> (typically between 4 and 7), we divide the greedy-step part of the join
> into k-sized sections. So with our 30-table join where we started with
> an 8-way joinrel, we have 22 tables. If k=5, we run standard dynamic
> programming (including the standard cost model) on four 5-table sections
> and then once the last 2-table section.
> 
> You can also think of it like this: We quickly find 5 tables that likely
> would be good to join next, find the optimal join order among the 5,
> then add that to our joinrel. We keep doing that until we get a valid
> plan. The only difference is, performing the greedy step to completion
> allows us to prune subsequent bad intermediate steps.
> 

I haven't read the [Kossmann & Stocker, 2000] paper yet, but the
[Neumann, 2018] paper seems to build on it, and it seems to work with
much larger subtrees of the join tree than k=5.

> By "some measure" above I'm being a bit hand-wavy, but at least in the
> literature I've read, fast heuristic algorithms seem to use simpler and
> cheaper-to-compute metrics like intermediate result size or selectivity,
> rather than a full cost function. That's a detail to be worked out.
> Also, it must be said that in choosing among intermediate steps we need
> to be careful to include things like:
> 
> - interesting sort orders
> - patition-wise joins
> - parameterized paths
> 

Yeah. I think this is going to be somewhat tricky - the paper seems to
have very few details about dealing with criteria like this. OTOH the
query plans for TPC-H/TPC-DS etc. seem to be quite good.

What I find fairly interesting is the section in [Neumann, 2018] about
cardinality estimates, and quality of query plans when the estimates are
off. The last paragraph in the 6.5 section essentially says that despite
poor estimates, the proposed algorithm performs better than the simple
(and cheap) heuristics. I'm not sure what to think about that, because
my "intuitive" understanding is that the more elaborate the planning is,
the more errors it can make when the estimates are off.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a path towards replacing GEQO with something better

From
John Naylor
Date:
On Sun, Jun 13, 2021 at 9:50 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> > 2) We can still keep GEQO around (with some huge limit by default) for a
> > few years as an escape hatch, while we refine the replacement. If there
> > is some bug that prevents finding a plan, we can emit a WARNING and fall
> > back to GEQO. Or if a user encounters a regression in a big query, they
> > can lower the limit to restore the plan they had in an earlier version.
> >
>
> Not sure. Keeping significant amounts of code may not be free - both for
> maintenance and new features. It'd be a bit sad if someone proposed
> improvements to join planning, but had to do 2x the work to support it
> in both the DP and GEQO branches, or risk incompatibility.

Looking back again at the commit history, we did modify geqo to support partial paths and partition-wise join, so that's a fair concern. My concern is the risk of plan regressions after an upgrade, even if for a small number of cases.

> OTOH maybe this concern is unfounded in practice - I don't think we've
> done very many big changes to geqo in the last few years.

Yeah, I get the feeling that it's already de facto obsolete, and we could make it a policy not to consider improvements aside from bug fixes where it can't find a valid plan, or forced API changes. Which I guess is another way of saying "deprecated".

(I briefly considered turning it into a contrib module, but that seems like the worst of both worlds.)

> This reminds me the proposals to have a GUC that'd determine how much
> effort should the planner invest into various optimizations. For OLTP it
> might be quite low, for large OLAP queries it'd be economical to spend
> more time trying some more expensive optimizations.
>
> The challenge of course is how / in what units / to define the budget,
> so that it's meaningful and understandable for users. Not sure if
> "number of join rels generated" will be clear enough for users. But it
> seems good enough for PoC / development, and hopefully people won't have
> to tweak it very often.

I'm also in favor of having some type of "planner effort" or "OLTP to OLAP spectrum" guc, but I'm not yet sure whether it'd be better to have it separate or to couple the joinrel budget to it. If we go that route, I imagine there'll be many things that planner_effort changes that we don't want to give a separate knob for. And, I hope with graceful degradation and a good enough heuristic search, it won't be necessary to change in most cases.

> I haven't read the [Kossmann & Stocker, 2000] paper yet, but the
> [Neumann, 2018] paper seems to build on it, and it seems to work with
> much larger subtrees of the join tree than k=5.

Right, in particular it builds on "IDP-2" from Kossmann & Stocker. Okay, so Neumann's favorite algorithm stack "Adaptive" is complex, and I believe you are referring to cases where they can iteratively improve up to 100 rels at a time because of linearization. That's a separate algorithm (IKKBZ) that complicates the cost model and also cannot handle outer joins. If the query has outer joins, they use regular DP on subsets of size up to 10. It's not substantively different from IDP-2, and that's the one I'd like to try to gracefully fall back to. Or something similar.

> What I find fairly interesting is the section in [Neumann, 2018] about
> cardinality estimates, and quality of query plans when the estimates are
> off. The last paragraph in the 6.5 section essentially says that despite
> poor estimates, the proposed algorithm performs better than the simple
> (and cheap) heuristics. I'm not sure what to think about that, because
> my "intuitive" understanding is that the more elaborate the planning is,
> the more errors it can make when the estimates are off.

Yeah, I'm not sure this part is very useful and seems almost like an afterthought. In table 3, all those poor examples are "pure" greedy algorithms and don't have iterative refinement added, so it kind of makes sense that poor estimates would hurt them more. But they don't compare those with *just* a refinement step added. I also don't know how realistic their "estimate fuzzing" is.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: a path towards replacing GEQO with something better

From
Tomas Vondra
Date:

On 6/14/21 1:16 PM, John Naylor wrote:
> On Sun, Jun 13, 2021 at 9:50 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
> wrote:
> 
>> > 2) We can still keep GEQO around (with some huge limit by default) for a
>> > few years as an escape hatch, while we refine the replacement. If there
>> > is some bug that prevents finding a plan, we can emit a WARNING and fall
>> > back to GEQO. Or if a user encounters a regression in a big query, they
>> > can lower the limit to restore the plan they had in an earlier version.
>> >
>>
>> Not sure. Keeping significant amounts of code may not be free - both for
>> maintenance and new features. It'd be a bit sad if someone proposed
>> improvements to join planning, but had to do 2x the work to support it
>> in both the DP and GEQO branches, or risk incompatibility.
> 
> Looking back again at the commit history, we did modify geqo to support
> partial paths and partition-wise join, so that's a fair concern.

Right. I think the question is how complex those changes were. If it was
mostly mechanical, it's not a big deal and we can keep both, but if it
requires deeper knowledge of the GEQO inner workings it may be an issue
(planner changes are already challenging enough).

> My concern is the risk of plan regressions after an upgrade, even if
> for a small number of cases.
> 

I don't know. My impression/experience with GEQO is that getting a good
join order for queries with many joins is often a matter of luck, and
the risk of getting a poor plan just forces me to increase geqo_threshold
or disable it altogether. Or just rephrase the query to nudge the
planner to use a good join order (those large queries are often star or
chain shaped, so it's not that difficult).

So I'm not sure "GEQO accidentally produces a better plan for this one
query" is a good argument to keep it around. We should probably evaluate
the overall behavior, and then make a decision.

FWIW I think we're facing this very dilemma for every optimizer change,
to some extent. Every time we make the planner smarter by adding a new
plan variant or heuristics, we're also increasing the opportunity for
errors. And every time we look (or should look) at average behavior and
worst case behavior ...

>> OTOH maybe this concern is unfounded in practice - I don't think we've
>> done very many big changes to geqo in the last few years.
> 
> Yeah, I get the feeling that it's already de facto obsolete, and we
> could make it a policy not to consider improvements aside from bug fixes
> where it can't find a valid plan, or forced API changes. Which I guess
> is another way of saying "deprecated".
> 
> (I briefly considered turning it into a contrib module, but that seems
> like the worst of both worlds.)
> 

True. I'm fine with deprecating / not improving geqo. What would worry
me is incompatibility, i.e. if geqo could not support some features. I'm
thinking of storage engines in MySQL not supporting some features,
leading to a minefield for users. Producing poor plans is fine, IMO.

>> This reminds me the proposals to have a GUC that'd determine how much
>> effort should the planner invest into various optimizations. For OLTP it
>> might be quite low, for large OLAP queries it'd be economical to spend
>> more time trying some more expensive optimizations.
>>
>> The challenge of course is how / in what units / to define the budget,
>> so that it's meaningful and understandable for users. Not sure if
>> "number of join rels generated" will be clear enough for users. But it
>> seems good enough for PoC / development, and hopefully people won't have
>> to tweak it very often.
> 
> I'm also in favor of having some type of "planner effort" or "OLTP to
> OLAP spectrum" guc, but I'm not yet sure whether it'd be better to have
> it separate or to couple the joinrel budget to it. If we go that route,
> I imagine there'll be many things that planner_effort changes that we
> don't want to give a separate knob for. And, I hope with graceful
> degradation and a good enough heuristic search, it won't be necessary to
> change in most cases.
> 

Yeah. I'm really worried about having a zillion separate GUC knobs for
tiny parts of the code. That's impossible to tune, and it also exposes
details about the implementation. And some people just can't resist
touching all the available options ;-)

The thing I like about JIT tunables is that it's specified in "cost"
which the users are fairly familiar with. Having another GUC with an
entirely different unit is not great.

But as I said - it seems perfectly fine for PoC / development, and we
can revisit that later. Adding some sort of "planner effort" or multiple
optimization passes is a huge project on its own.

>> I haven't read the [Kossmann & Stocker, 2000] paper yet, but the
>> [Neumann, 2018] paper seems to build on it, and it seems to work with
>> much larger subtrees of the join tree than k=5.
> 
> Right, in particular it builds on "IDP-2" from Kossmann & Stocker. Okay,
> so Neumann's favorite algorithm stack "Adaptive" is complex, and I
> believe you are referring to cases where they can iteratively improve up
> to 100 rels at a time because of linearization. That's a separate
> algorithm (IKKBZ) that complicates the cost model and also cannot have
> outer joins. If it has outer joins, they use regular DP on subsets of
> size up to 10. It's not substantively different from IDP-2, and that's
> the one I'd like to try to gracefully fall back to. Or something similar.
> 

Yes, that's what I was referring to. You're right it's complex and we
don't need to implement all of that - certainly not on day one. The
linearization / IKKBZ seems interesting (even if just for inner joins),
but better to start with something generic.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a path towards replacing GEQO with something better

From
Bruce Momjian
Date:
On Mon, Jun 14, 2021 at 06:10:28PM +0200, Tomas Vondra wrote:
> On 6/14/21 1:16 PM, John Naylor wrote:
> > On Sun, Jun 13, 2021 at 9:50 AM Tomas Vondra
> > <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
> > wrote:
> > 
> >> > 2) We can still keep GEQO around (with some huge limit by default) for a
> >> > few years as an escape hatch, while we refine the replacement. If there
> >> > is some bug that prevents finding a plan, we can emit a WARNING and fall
> >> > back to GEQO. Or if a user encounters a regression in a big query, they
> >> > can lower the limit to restore the plan they had in an earlier version.
> >> >
> >>
> >> Not sure. Keeping significant amounts of code may not be free - both for
> >> maintenance and new features. It'd be a bit sad if someone proposed
> >> improvements to join planning, but had to do 2x the work to support it
> >> in both the DP and GEQO branches, or risk incompatibility.
> > 
> > Looking back again at the commit history, we did modify geqo to support
> > partial paths and partition-wise join, so that's a fair concern.
> 
> Right. I think the question is how complex those changes were. If it was
> mostly mechanical, it's not a big deal and we can keep both, but if it
> requires deeper knowledge of the GEQO inner workings it may be an issue
> (planner changes are already challenging enough).

The random plan nature of GEQO, along with its "cliff", makes it
something I would be glad to get rid of if we can get an improved
approach to large planning needs.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: a path towards replacing GEQO with something better

From
Robert Haas
Date:
On Wed, Jun 9, 2021 at 9:24 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> 3) It actually improves the existing exhaustive search, because the complexity of the join order problem depends on
> the query shape: a "chain" shape (linear) vs. a "star" shape (as in star schema), for the most common examples. The
> size of the DP table grows like this (for n >= 4):
>
> Chain: (n^3 - n) / 6   (including bushy plans)
> Star:  (n - 1) * 2^(n - 2)
>
>  n  chain       star
> --------------------
>  4     10         12
>  5     20         32
>  6     35         80
>  7     56        192
>  8     84        448
>  9    120       1024
> 10    165       2304
> 11    220       5120
> 12    286      11264
> 13    364      24576
> 14    455      53248
> 15    560     114688
> ...
> 64  43680     290536219160925437952

I don't quite understand the difference between the "chain" case and
the "star" case. Can you show sample queries for each one? e.g. SELECT
... FROM a_1, a_2, ..., a_n WHERE <something>?

One idea I just ran across in
https://15721.courses.cs.cmu.edu/spring2020/papers/22-costmodels/p204-leis.pdf
is to try to economize by skipping consideration of bushy plans. We
could start doing that when some budget is exceeded, similar to what
you are proposing here, but probably the budget for skipping
consideration of bushy plans would be smaller than the budget for
switching to IDP. The idea of skipping bushy plan generation in some
cases makes sense to me intuitively because most of the plans
PostgreSQL generates are mostly left-deep, and even when we do
generate bushy plans, they're not always a whole lot better than the
nearest equivalent left-deep plan. The paper argues that considering
bushy plans makes measurable gains, but also that failure to consider
such plans isn't catastrophically bad.
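
Just to pin down what skipping bushy plans would mean inside the DP loop, a minimal sketch in the same toy style as upthread, with a hypothetical bushy_budget knob:

# Once a (smaller) bushy budget is exceeded, only allow joins where at
# least one input is a base relation, keeping the search left/right-deep
# from that point on.
def allow_join(lhs, rhs, joins_tried, bushy_budget):
    is_bushy = len(lhs) > 1 and len(rhs) > 1
    return (not is_bushy) or joins_tried <= bushy_budget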

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: a path towards replacing GEQO with something better

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> One idea I just ran across in
> https://15721.courses.cs.cmu.edu/spring2020/papers/22-costmodels/p204-leis.pdf
> is to try to economize by skipping consideration of bushy plans. We
> could start doing that when some budget is exceeded, similar to what
> you are proposing here, but probably the budget for skipping
> consideration of bushy plans would be smaller than the budget for
> switching to IDP. The idea of skipping bushy plan generation in some
> cases makes sense to me intuitively because most of the plans
> PostgreSQL generates are mostly left-deep, and even when we do
> generate bushy plans, they're not always a whole lot better than the
> nearest equivalent left-deep plan. The paper argues that considering
> bushy plans makes measurable gains, but also that failure to consider
> such plans isn't catastrophically bad.

It's not catastrophically bad until you hit a query where the only
correct plans are bushy.  These do exist, and I think they're not
that uncommon.

Still, I take your point that maybe we could ratchet down the cost of
exhaustive search by skimping on this part.  Maybe it'd work to skip
bushy so long as we'd found at least one left-deep or right-deep path
for the current rel.

            regards, tom lane



Re: a path towards replacing GEQO with something better

From
Robert Haas
Date:
On Tue, Jun 15, 2021 at 1:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Still, I take your point that maybe we could ratchet down the cost of
> exhaustive search by skimping on this part.  Maybe it'd work to skip
> bushy so long as we'd found at least one left-deep or right-deep path
> for the current rel.

Yes, that sounds better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: a path towards replacing GEQO with something better

From
John Naylor
Date:
On Tue, Jun 15, 2021 at 12:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 9:24 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> > 3) It actually improves the existing exhaustive search, because the complexity of the join order problem depends on the query shape: a "chain" shape (linear) vs. a "star" shape (as in star schema), for the most common examples. The size of the DP table grows like this (for n >= 4):
...
> I don't quite understand the difference between the "chain" case and
> the "star" case. Can you show sample queries for each one? e.g. SELECT
> ... FROM a_1, a_2, ..., a_n WHERE <something>?

There's a very simple example in the optimizer README:

--
SELECT  *
FROM    tab1, tab2, tab3, tab4
WHERE   tab1.col = tab2.col AND
    tab2.col = tab3.col AND
    tab3.col = tab4.col

Tables 1, 2, 3, and 4 are joined as:
{1 2},{2 3},{3 4}
{1 2 3},{2 3 4}
{1 2 3 4}
(other possibilities will be excluded for lack of join clauses)

SELECT  *
FROM    tab1, tab2, tab3, tab4
WHERE   tab1.col = tab2.col AND
    tab1.col = tab3.col AND
    tab1.col = tab4.col

Tables 1, 2, 3, and 4 are joined as:
{1 2},{1 3},{1 4}
{1 2 3},{1 3 4},{1 2 4}
{1 2 3 4}
--

The first one is chain, and the second is star. Four is the smallest set where we have a difference. I should now point out an imprecision in my language: By "size of the DP table", the numbers in my first email refer to the number of joinrels times the number of possible joins (not paths, and ignoring commutativity). Here are the steps laid out, with cumulative counts:

join_level, # joins,  cumulative # joins:

linear, n=4
 2               3               3
 3               4               7
 4               3              10

star, n=4
 2               3               3
 3               6               9
 4               3              12

And of course, the chain query also has three at the last level, because it tries two left- (or right-) deep joins and one bushy join.
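
In case it's useful, a throwaway Python enumeration (standalone toy code, not planner code) that reproduces these per-level counts for the two 4-table queries:

# Count, per join level, the unordered (lhs, rhs) pairs that have a join
# clause between them -- roughly what the make_join_rel() instrumentation
# counted, ignoring commutativity.
def count_joins(nrels, edges):
    def connected(lhs, rhs):
        return any(frozenset([a, b]) in edges for a in lhs for b in rhs)

    levels = {1: {frozenset([r]) for r in range(nrels)}}
    per_level = {}
    for level in range(2, nrels + 1):
        pairs = set()
        for k in range(1, level):
            for lhs in levels[k]:
                for rhs in levels[level - k]:
                    if not (lhs & rhs) and connected(lhs, rhs):
                        pairs.add(frozenset([lhs, rhs]))
        levels[level] = {lhs | rhs for lhs, rhs in pairs}
        per_level[level] = len(pairs)
    return per_level

chain_edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
star_edges = {frozenset(e) for e in [(0, 1), (0, 2), (0, 3)]}

print(count_joins(4, chain_edges))   # {2: 3, 3: 4, 4: 3} -> 10 joins total
print(count_joins(4, star_edges))    # {2: 3, 3: 6, 4: 3} -> 12 joins total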

> One idea I just ran across in
> https://15721.courses.cs.cmu.edu/spring2020/papers/22-costmodels/p204-leis.pdf
> is to try to economize by skipping consideration of bushy plans.

That's a good paper, and it did influence my thinking.

You likely already know this, but for the archives: If only chain queries could have bushy plans, it wouldn't matter because they are so cheap to enumerate. But, since star queries can introduce a large number of extra joins via equivalence (same join column or FK), making them resemble "clique" queries, bushy joins get excessively numerous.

> We
> could start doing that when some budget is exceeded, similar to what
> you are proposing here, but probably the budget for skipping
> consideration of bushy plans would be smaller than the budget for
> switching to IDP. The idea of skipping bushy plan generation in some
> cases makes sense to me intuitively because most of the plans
> PostgreSQL generates are mostly left-deep, and even when we do
> generate bushy plans, they're not always a whole lot better than the
> nearest equivalent left-deep plan. The paper argues that considering
> bushy plans makes measurable gains, but also that failure to consider
> such plans isn't catastrophically bad.

I think that makes sense. There are a few things we could do within the "grey zone" -- too many rels to finish exhaustive search, but not enough to justify starting directly with the greedy step -- to increase our chances of completing, and that's a very simple one.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: a path towards replacing GEQO with something better

From
Robert Haas
Date:
On Tue, Jun 15, 2021 at 2:16 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
> > I don't quite understand the difference between the "chain" case and
> > the "star" case. Can you show sample queries for each one? e.g. SELECT
> > ... FROM a_1, a_2, ..., a_n WHERE <something>?
>
> SELECT  *
> FROM    tab1, tab2, tab3, tab4
> WHERE   tab1.col = tab2.col AND
>     tab2.col = tab3.col AND
>     tab3.col = tab4.col
>
> SELECT  *
> FROM    tab1, tab2, tab3, tab4
> WHERE   tab1.col = tab2.col AND
>     tab1.col = tab3.col AND
>     tab1.col = tab4.col

I feel like these are completely equivalent. Either way, the planner
is going to deduce that all the ".col" columns are equal to each other
via the equivalence class machinery, and then the subsequent planning
will be absolutely identical. Or am I missing something?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: a path towards replacing GEQO with something better

From
John Naylor
Date:


On Wed, Jun 16, 2021 at 12:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Jun 15, 2021 at 2:16 PM John Naylor
> <john.naylor@enterprisedb.com> wrote:
> > > I don't quite understand the difference between the "chain" case and
> > > the "star" case. Can you show sample queries for each one? e.g. SELECT
> > > ... FROM a_1, a_2, ..., a_n WHERE <something>?
> >
> > SELECT  *
> > FROM    tab1, tab2, tab3, tab4
> > WHERE   tab1.col = tab2.col AND
> >     tab2.col = tab3.col AND
> >     tab3.col = tab4.col
> >
> > SELECT  *
> > FROM    tab1, tab2, tab3, tab4
> > WHERE   tab1.col = tab2.col AND
> >     tab1.col = tab3.col AND
> >     tab1.col = tab4.col
>
> I feel like these are completely equivalent. Either way, the planner
> is going to deduce that all the ".col" columns are equal to each other
> via the equivalence class machinery, and then the subsequent planning
> will be absolutely identical. Or am I missing something?

I believe the intention of the example is that ".col" is a placeholder for some column (all different). Otherwise, with enough ECs it would result in an even bigger set of joinrels than what we see here. If ECs don't actually cause additional joinrels to be created, then I'm missing something fundamental.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: a path towards replacing GEQO with something better

From
Tom Lane
Date:
John Naylor <john.naylor@enterprisedb.com> writes:
> On Wed, Jun 16, 2021 at 12:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> I feel like these are completely equivalent. Either way, the planner
>> is going to deduce that all the ".col" columns are equal to each other
>> via the equivalence class machinery, and then the subsequent planning
>> will be absolutely identical. Or am I missing something?

> I believe the intention of the example is that ".col" is a place holder for
> some column (all different). Otherwise, with enough ECs it would result in
> an even bigger set of joinrels than what we see here. If ECs don't actually
> cause additional joinrels to be created, then I'm missing something
> fundamental.

Yeah, I'm not sure I believe this distinction either.  IMV a typical star
schema is going to involve joins of dimension-table ID columns to
*different* referencing columns of the fact table(s), so that you won't
get large equivalence classes.

There certainly are cases where a query produces large equivalence classes
that will lead us to investigate a lot of join paths that we wouldn't have
considered were it not for the EC-deduced join clauses.  But I don't
think that scenario has much to do with star schemas.

            regards, tom lane



Re: a path towards replacing GEQO with something better

From
John Naylor
Date:
On Wed, Jun 16, 2021 at 12:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> I feel like these are completely equivalent. Either way, the planner
> is going to deduce that all the ".col" columns are equal to each other
> via the equivalence class machinery, and then the subsequent planning
> will be absolutely identical. Or am I missing something?

Ok, I've modified the examples so they reflect the distinction:

A chain has join predicates linking relations in a linear sequence:

SELECT  *
FROM    tab1, tab2, tab3, tab4
WHERE   tab1.a = tab2.b AND
        tab2.i = tab3.j AND
        tab3.x = tab4.y

A star has a hub with join predicates to multiple spokes:

SELECT  *
FROM    tab1, tab2, tab3, tab4
WHERE   tab1.f1 = tab2.d1 AND
        tab1.f2 = tab3.d2 AND
        tab1.f3 = tab4.d3

--
John Naylor
EDB: http://www.enterprisedb.com

Re: a path towards replacing GEQO with something better

From
AJG
Date:
Hi,

I stumbled across this, which may be of interest to this topic and the
search for a GEQO alternative.

Ryan Marcus, the main creator/author of Neo and Bao (ML for query
optimization), who is finishing a postdoc and looking for a job, recently
posted [1] about Bao for distributed systems.

But what was interesting were the links he had to a 2020 Neo YouTube talk
[2], which discussed how better cardinality estimation (90% fewer errors
vs. Postgres 10) only improved query latency by 2-3%, and how other ML
approaches made things worse in other scenarios.

Other interesting takeaways from the video (summary):

PostgreSQL Query Optimizer – 40k LOC. PG10 70% worse/slower than Oracle. PG
has 3 major flaws in QO, 1 fixed in PG11. Neo 10-25% better than PG QO after
30hr training (using GPU). Neo drops to 10% better if 3 flaws were / could
be fixed.

MS SQL – 1 million LOC.

Oracle – 45-55 FTEs working on it. No LOC given by Oracle. Appear to focus
on TPC-DS. NEO better than Oracle after 60hr training (using GPU).

Humans and hand tuning will always beat ML. I.e. Neo (and Bao) good for
those who cannot afford a fulltime DBA doing query optimizing.


Bao – follow-on work from Neo.
“This is a prototype implementation of Bao for PostgreSQL. Bao is a learned
query optimizer that learns to "steer" the PostgreSQL optimizer by issuing
coarse-grained query hints. For more information about Bao”

The Bao GitHub repo is here [3] and is AGPLv3-licensed (not sure if that’s good or bad).

Bao drawbacks… (but may not matter from a GEQO perspective??)

“Of course, Bao does come with some drawbacks. Bao causes query optimization
to take a little bit more time (~300ms), requiring quite a bit more
computation. We studied this overhead in our SIGMOD paper. For data
warehouse workloads, which largely consists of long-running, resource
intensive queries, Bao’s increased overhead is hardly noticeable. However,
for workloads with a lot of short running queries, like OLTP workloads, this
might not be the case. We are currently working on new approaches to
mitigate that problem – so stay tuned!”


[1] https://rmarcus.info/blog/2021/06/17/bao-distributed.html
[2] https://www.youtube.com/watch?v=lMb1yNbIopc
Cardinality errors impact on latency - Starting at 8:00, interesting at
10:10 approx.
[3] https://github.com/learnedsystems/baoforpostgresql




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: a path towards replacing GEQO with something better

From
John Naylor
Date:

On Mon, Jun 14, 2021 at 12:10 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> >> I haven't read the [Kossmann & Stocker, 2000] paper yet, but the
> >> [Neumann, 2018] paper seems to build on it, and it seems to work with
> >> much larger subtrees of the join tree than k=5.
> >
> > Right, in particular it builds on "IDP-2" from Kossmann & Stocker. Okay,
> > so Neumann's favorite algorithm stack "Adaptive" is complex, and I
> > believe you are referring to cases where they can iteratively improve up
> > to 100 rels at a time because of linearization. That's a separate
> > algorithm (IKKBZ) that complicates the cost model and also cannot have
> > outer joins. If it has outer joins, they use regular DP on subsets of
> > size up to 10. It's not substantively different from IDP-2, and that's
> > the one I'd like to try to gracefully fall back to. Or something similar.
> >
>
> Yes, that's what I was referring to. You're right it's complex and we
> don't need to implement all of that - certainly not on day one. The
> linearization / IKKBZ seems interesting (even if just for inner joins),
> but better to start with something generic.

Update for future reference: The authors published a follow-up in 2019 in which they describe a way to allow non-inner joins to be considered during linearization. Their scheme also allows for incorporating a limited number of cross products into the search in a safe way. Unsurprisingly, these features add complexity, and I don't quite understand it yet, but it might be worth evaluating in the future.