Re: a path towards replacing GEQO with something better - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: a path towards replacing GEQO with something better
Date
Msg-id 2e930f7d-a3ea-793b-830b-54247f1a39e5@enterprisedb.com
In response to a path towards replacing GEQO with something better  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: a path towards replacing GEQO with something better
List pgsql-hackers
Hi,

On 6/10/21 3:21 AM, John Naylor wrote:
> Hi,
> 
> On occasion it comes up that the genetic query optimizer (GEQO) doesn't
> produce particularly great plans, and is slow ([1] for example). The
> only alternative that has gotten as far as a prototype patch (as far as
> I know) is simulated annealing some years ago, which didn't seem to get far.
> 
> The join problem as it pertains to Postgres has been described within
> the community in
> [Gustaffson, 2017] and [Stroffek & Kovarik, 2007].
> 
> The fact that there is much more interest than code in this area
> indicates that this is a hard problem. I hadn't given it much thought
> myself until by chance I came across [Neumann, 2018], which describes a
> number of interesting ideas. The key takeaway is that you want a
> graceful transition between exhaustive search and heuristic search. In
> other words, if the space of possible join orderings is just slightly
> larger than the maximum allowed exhaustive search, then the search
> should be *almost* exhaustive.

Yeah, I think this is one of the places with the annoying "cliff edge"
behavior in our code base, so an alternative that would degrade a bit
more gracefully would be welcome.

I only quickly read the [Neumann, 2018] paper over the weekend, and
overall it seems like a very interesting/promising approach. Of course,
the question is how well it can be combined with the rest of our code,
and how it handles various other details of real-world queries (papers
often ignore some of those bits for simplicity).

> This not only increases the chances of finding a good plan, but also
> has three engineering advantages I can think of:
> 
> 1) It's natural to re-use data structures etc. already used by the
> existing dynamic programming (DP) algorithm. Framing the problem as
> extending DP greatly lowers the barrier to making realistic progress. If
> the problem is framed as "find an algorithm as a complete drop-in
> replacement for GEQO", it's a riskier project in my view.
> 

True.

> 2) We can still keep GEQO around (with some huge limit by default) for a
> few years as an escape hatch, while we refine the replacement. If there
> is some bug that prevents finding a plan, we can emit a WARNING and fall
> back to GEQO. Or if a user encounters a regression in a big query, they
> can lower the limit to restore the plan they had in an earlier version.
> 

Not sure. Keeping significant amounts of code may not be free - both for
maintenance and new features. It'd be a bit sad if someone proposed
improvements to join planning, but had to do 2x the work to support it
in both the DP and GEQO branches, or risk incompatibility.

OTOH maybe this concern is unfounded in practice - I don't think we've
done very many big changes to geqo in the last few years.

> 3) It actually improves the existing exhaustive search, because the
> complexity of the join order problem depends on the query shape: a
> "chain" shape (linear) vs. a "star" shape (as in star schema), for the
> most common examples. The size of the DP table grows like this (for n >= 4):
> 
> Chain: (n^3 - n) / 6   (including bushy plans)
> Star:  (n - 1) * 2^(n - 2)
> 
>  n  chain       star
> --------------------
>  4     10         12
>  5     20         32
>  6     35         80
>  7     56        192
>  8     84        448
>  9    120       1024
> 10    165       2304
> 11    220       5120
> 12    286      11264
> 13    364      24576
> 14    455      53248
> 15    560     114688
> ...
> 64  43680     290536219160925437952
> 
> The math behind this is described in detail in [Ono & Lohman, 1990]. I
> verified this in Postgres by instrumenting the planner to count how many
> times it calls make_join_rel().
> 

So, did you verify it for a star query with 64 relations? ;-)
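
FWIW it's easy to play with these numbers outside the planner - a trivial
standalone snippet (not planner code, just the two formulas quoted above)
reproduces the table up to n=15 (the n=64 star value needs more than 64
bits):

/* counts.c - chain/star make_join_rel() counts per [Ono & Lohman, 1990]
 * build: cc -o counts counts.c
 */
#include <stdio.h>

int
main(void)
{
    printf(" n  chain       star\n");
    printf("--------------------\n");
    /* stop at 15 - star(64) = 63 * 2^62 does not fit into 64 bits */
    for (unsigned long long n = 4; n <= 15; n++)
    {
        unsigned long long chain = (n * n * n - n) / 6;
        unsigned long long star = (n - 1) << (n - 2);   /* (n-1) * 2^(n-2) */

        printf("%2llu %6llu %10llu\n", n, chain, star);
    }
    return 0;
}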

> Imagine having a "join enumeration budget" that, if exceeded, prevents
> advancing to the next join level. Given the above numbers and a query
> with some combination of chain and star shapes, a budget of 400 can
> guarantee an exhaustive search when there are up to 8 relations. For a
> pure chain join, we can do an exhaustive search on up to 13 relations,
> for a similar cost of time and space. Out of curiosity I tested HEAD
> with a chain query having 64 tables found in the SQLite tests [2] and
> found exhaustive search to take only twice as long as GEQO. If we have
> some 30-way join, and some (> 400) budget, it's actually possible that
> we will complete the exhaustive search and get the optimal plan. This is
> a bottom-up way of determining the complexity. Rather than configuring a
> number-of-relations threshold and possibly having exponential behavior
> blow up in their faces, users can configure something that somewhat
> resembles the runtime cost.
> 

Sounds reasonable in principle, I think.

This reminds me of the proposals to have a GUC that'd determine how much
effort the planner should invest into various optimizations. For OLTP it
might be quite low; for large OLAP queries it'd be economical to spend
more time trying some more expensive optimizations.

The challenge of course is how / in what units / to define the budget,
so that it's meaningful and understandable for users. I'm not sure
"number of join rels generated" will be clear enough, but it seems good
enough for a PoC / development, and hopefully people won't have to tweak
it very often.

For JIT we used the query cost, which is a term users are familiar with,
but that's possible because we make the decision after the plan is
built/costed. That doesn't work for join order search :-(
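
Anyway, to make the budget mechanics a bit more concrete, here's a toy
model I put together (standalone, not planner code - the per-level counts
are just my own decomposition of the totals quoted above, for a pure chain
or pure star query). It charges each join level's make_join_rel() calls
against the budget and reports the last level the exhaustive search would
still complete:

/* budget_toy.c - toy model of a join enumeration budget, NOT planner code
 * build: cc -o budget_toy budget_toy.c
 * run:   ./budget_toy <nrels> <budget> [star]
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* make_join_rel() calls needed to complete level k of an n-way chain */
static unsigned long long
chain_level(int n, int k)
{
    return (unsigned long long) (n - k + 1) * (k - 1);
}

/* same for an n-way star: C(n-1, k-1) joinrels, each buildable k-1 ways */
static unsigned long long
star_level(int n, int k)
{
    unsigned long long binom = 1;

    for (int i = 1; i < k; i++)
        binom = binom * (n - i) / i;    /* binom = C(n-1, i) after step i */
    return binom * (k - 1);
}

int
main(int argc, char **argv)
{
    int         n = (argc > 1) ? atoi(argv[1]) : 30;
    unsigned long long budget = (argc > 2) ? strtoull(argv[2], NULL, 10) : 400;
    bool        star = (argc > 3 && strcmp(argv[3], "star") == 0);
    unsigned long long spent = 0;
    int         last_level = 1;

    for (int k = 2; k <= n; k++)
    {
        unsigned long long calls = star ? star_level(n, k) : chain_level(n, k);

        if (spent + calls > budget)
            break;              /* budget exhausted - stop DP, go greedy */
        spent += calls;
        last_level = k;
    }
    printf("n=%d budget=%llu (%s): exhaustive DP completes join level %d "
           "after %llu joinrels\n",
           n, budget, star ? "star" : "chain", last_level, spent);
    return 0;
}

In the planner itself the counter would presumably just get bumped
somewhere around make_join_rel(); the toy only demonstrates the "stop at
the last completed level" behavior.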

> Now, I'll walk through one way that a greedy heuristic can integrate
> with DP. In our 30-way join example, if we use up our budget and don't
> have a valid plan, we'll break out of DP at the last join level
> we completed. Since we already have built a number of joinrels, we build
> upon that work as we proceed. The approach I have in mind is described
> in [Kossmann & Stocker, 2000], which the authors call "iterative dynamic
> programming" (IDP). I'll describe one of the possible variants here.
> Let's say we only got as far as join level 8, so we've created up to
> 8-way joinrels. We pick the best few (maybe 5%) of these 8-way joinrels
> by some measure (doesn't have to be the full cost model) and on top of
> each of them create a full plan quickly: At each join level, we only
> pick one base relation (again by some measure) to create one new joinrel
> and then move to the next join level. This is very fast, even with
> hundreds of relations.
> 
> Once we have one valid, complete plan, we can technically stop at any
> time (Coding this much is also a good proof-of-concept). How much
> additional effort we expend to find a good plan could be another budget
> we have.  With a complete plan obtained quickly, we also have an upper
> bound on the measure of the cheapest overall plan, so with that we can
> prune any more expensive plans as we iterate through the 8-way joinrels.
> Once we have a set of complete plans, we pick some of them to improve
> the part of the plan picked during the greedy step. For some size k
> (typically between 4 and 7), we divide the greedy-step part of the join
> into k-sized sections. So with our 30-table join where we started with
> an 8-way joinrel, we have 22 tables. If k=5, we run standard dynamic
> programming (including the standard cost model) on four 5-table sections
> and then once on the last 2-table section.
> 
> You can also think of it like this: We quickly find 5 tables that likely
> would be good to join next, find the optimal join order among the 5,
> then add that to our joinrel. We keep doing that until we get a valid
> plan. The only difference is, performing the greedy step to completion
> allows us to prune subsequent bad intermediate steps.
> 

I haven't read the [Kossmann & Stocker, 2000] paper yet, but the
[Neumann, 2018] paper seems to build on it, and it seems to work with
much larger subtrees of the join tree than k=5.
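
FWIW this is how I understand the greedy completion step - roughly
something like this toy sketch (standalone, made-up numbers; I'm taking
the "some measure" from the quoted text to be the estimated size of the
intermediate result, which may or may not be what you had in mind):

/* greedy_toy.c - toy model of the greedy completion step, NOT planner code
 * build: cc -o greedy_toy greedy_toy.c
 */
#include <stdbool.h>
#include <stdio.h>

#define NREL 6                  /* base relations not yet joined */

int
main(void)
{
    /* made-up row count estimates for the remaining base relations */
    double      rows[NREL] = {1e6, 500, 2e4, 80, 3e5, 1200};
    /* made-up selectivity of joining each of them to the current joinrel */
    double      sel[NREL] = {1e-6, 1e-3, 5e-5, 1e-2, 1e-6, 1e-3};
    bool        used[NREL] = {false};
    double      cur_rows = 1e5; /* cardinality of the partial (e.g. 8-way) joinrel */

    /* one base relation per join level, picked by the cheap measure */
    for (int step = 0; step < NREL; step++)
    {
        int         best = -1;
        double      best_rows = 0;

        for (int i = 0; i < NREL; i++)
        {
            double      out;

            if (used[i])
                continue;
            /* measure: size of the intermediate result if we join rel i next */
            out = cur_rows * rows[i] * sel[i];
            if (best < 0 || out < best_rows)
            {
                best = i;
                best_rows = out;
            }
        }
        used[best] = true;
        cur_rows = best_rows;
        printf("level %d: add base rel %d -> ~%.0f rows\n",
               step + 1, best, cur_rows);
    }
    return 0;
}

The second phase (re-running full DP on the k-sized sections of the order
picked here) is then just the existing standard_join_search() machinery
applied to each section, IIUC.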

> By "some measure" above I'm being a bit hand-wavy, but at least in the
> literature I've read, fast heuristic algorithms seem to use simpler and
> cheaper-to-compute metrics like intermediate result size or selectivity,
> rather than a full cost function. That's a detail to be worked out.
> Also, it must be said that in choosing among intermediate steps we need
> to be careful to include things like:
> 
> - interesting sort orders
> - partition-wise joins
> - parameterized paths
> 

Yeah. I think this is going to be somewhat tricky - the paper seems to
have very few details about dealing with criteria like this. OTOH the
query plans for TPC-H/TPC-DS etc. seem to be quite good.

What I find fairly interesting is the section in [Neumann, 2018] about
cardinality estimates and the quality of query plans when the estimates
are off. The last paragraph of section 6.5 essentially says that despite
poor estimates, the proposed algorithm performs better than the simple
(and cheap) heuristics. I'm not sure what to think about that, because my
"intuitive" understanding is that the more elaborate the planning is, the
more errors it can make when the estimates are off.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


