Re: should we have a fast-path planning for OLTP starjoins? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: should we have a fast-path planning for OLTP starjoins?
Date
Msg-id 4e74529d-1a9f-4ff6-80a3-70638665b330@vondra.me
Whole thread Raw
In response to Re: should we have a fast-path planning for OLTP starjoins?  (James Hunter <james.hunter.pg@gmail.com>)
Responses Re: should we have a fast-path planning for OLTP starjoins?
List pgsql-hackers

On 2/7/25 20:11, James Hunter wrote:
> On Wed, Feb 5, 2025 at 4:23 AM Tomas Vondra <tomas@vondra.me> wrote:
>>
>> If the requirement is that all "dimensions" only join to the fact table
>> (which in this example would be "A" I think) through a FK, then why
>> would these joins be illegal?
>>
>> ...
>> Essentially, this defines a "dimension" as a relation that is joined
>> through a PK, without any other restrictions, both of which seems fairly
>> simple to check, and it's a "local" feature. And we'd simply mark those
>> as "join at the very end, in arbitrary order". Easy enough, I guess.
> 
> As I understand your proposal, you want to detect queries that join a
> large number, N, of tables -- which means doing an exhaustive search
> of all possible join orders is expensive -- where N - 1 of the tables
> do not join to each other, but join only to the Nth table.
> 
Yes. Essentially, it reduces the size of the problem by ignoring joins
for which we know the optimal order. We know the dimensions can be
joined last, and it does not matter in which exact order we join them.

The starjoins are a bit of the "worst case" for our heuristics, because
there are no dependencies between the dimensions, and we end up
exploring the n! possible join orders, more or less. For other joins we
quickly prune the space.

> PostgreSQL already falls back on geqo when it hits some heuristic that
> says exhaustive search is too expensive, but you're proposing an
> additional, better heuristic.

True, but most people will never actually hit the GEQO, because the
default threshold are set like this:

join_collapse_limit = 8
geqo_threshold = 12

So the planner will not "create" join search problems with more than 8
relations, but geqo only kicks in at 12. Most systems run with the
default values for these GUCs, so they don't really use GEQO.

FWIW I don't know a lot about the GEQO internals, but I heard it doesn't
work all that well for the join order problem anyway. Not sure.


> Say we have F JOIN D1 JOIN D2 ... JOIN D(N-1). In the example you
> gave, the single-table predicate on F makes it small enough, I think,
> that F will be the "build" side of any Hash Join, right? You're
> assuming, I think, that the cardinality |F| = 1, after applying the
> filter to F. And so, |F JOIN Dk| will be approximately 1, for any 1 <=
> k < N. So then the join order does not matter. I think this is what
> you mean by "OLTP star join."
> 

I don't think it matters very much on which side of the join the F will
end up (or if it's a hash join, it can easily be NL). It will definitely
be in the first join, though, because all other dimensions join to it
(assuming this is just a starjoin, with only fact + dimensions).

It also doesn't really matter what's the exact cardinality of |F|. The
example used a PK lookup, so that would be 1 row, but the point is that
this is (much) cheaper than the planning. E.g. the planning might take
3ms while the execution only takes 1ms, etc. In the OLAP cases this is
usually not the case, because the queries are processing a lot of data
from the fact table, and the planning is negligible.

> For *OLAP* star joins, Oracle's Star Transformation [1] works
> reasonably well, where Oracle scans D1, ..., D(N-1) first, constructs
> Bloom filters, etc., and then "pushes" the N-1 joins down into the Seq
> Scan on F.
> 

I don't care about OLAP star joins, at least no in this patch. It's a
completely different / separate use case, and it affects very different
parts of the planner (and also the executor, which this patch does not
need to touch at all).

> So, I suggest:
> 1. Add an *OLTP* optimization similar to what you described, but
> instead of classifying the largest table as fact F, look for the "hub"
> of the star and classify it as F. And then enable your optimization if
> and only if the estimated nrows for F is very small.
> 

Right. I believe this is mostly what looking for FKs (as suggested by
Tom) would end up doing. It doesn't need to consider the cardinality of
F at all.

> 2. For an *OLAP* optimization, do something like Oracle's Star
> Transformation.
> 

I consider that well outside the scope of this patch.

> Re "OLTP" vs. "OLAP": the join order does not matter for *OLTP* star
> queries, because the fact table F is *small* (post-filtering). And
> because F is small, it doesn't matter so much in what order you join
> the dimension tables, because the result is "likely" to be small as
> well.
> 

I don't think that's quite true. The order of dimension joins does not
matter because the joins do not affect the join size at all. The size of
|F| has nothing to do with that, I think. We'll do the same number of
lookups against the dimensions no matter in what order we join them. And
we know it's best to join them as late as possible, after all the joins
that reduce the size (and before joins that "add" rows, I think).

> Tom correctly points out that you really need foreign key constraints
> to ensure the previous sentence's "likely," but since your
> optimization is just intended to avoid considering unnecessary join
> orders, you may be able to get away with asking the optimizer what it
> thinks the cardinality |(... (F JOIN D1) ... JOIN Dk)| would be, and
> just fall back on the existing join-search logic when the optimizer
> thinks that Dk will create lots of rows (and so the join order
> matters...).
> 

Possibly, but TBH the join cardinality estimates can be quite dubious
pretty easily. The FK is a much more reliable (definitive) information.

> So much for the OLTP case. For completeness, some discussion about the
> OLAP case; fwiw, let me start by plugging my "credentials" [2].
> 

Thanks ;-)

> The OLAP case is more complicated than the OLTP case, in that the bad
> thing about *OLAP* star joins is that joins are pairwise. With OLAP
> star joins, you assume that |F| is always much larger than |Dk|, and
> by extension |(... (F JOIN D1) ... JOIN Dk)| is generally larger than
> |D(k+1)|. And the problem for OLAP is that while every Dk potentially
> filters rows out from F, you have to join to the Dk's one at a time,
> so you never get as much filtering as you'd like.
> 
> For OLAP, you can take the Cartesian product of D1, ..., DN , and then
> scan F to aggregate into the resulting cube; see [3] . (Link [2] is
> related to transformation.)
> 
> Or, you can scan D1, ..., DN first, without joining anything,
> constructing Hash tables and Bloom filters from your scans; then push
> the Bloom filters down to the scan of F; and finally join the
> (Bloom-filtered) F back to D1, ..., DN. This is what link [1]
> describes. Note that [1] came out before [3].
> 
> However... for OLAP, you see from the above discussion that it's not
> compilation that takes too long, but rather execution. So the
> optimizations require significant changes to the SQL executor.
> 

Agreed. I'm not against improving the OLAP case too, but it's not what
this thread was about. It seems it'll need changes in very different
places, etc.

> What you're proposing, IIUC, is a nice optimization to compilation
> times, which is why (I think) you're focused on the OLTP use case. In
> that case, I suggest focusing on an OLTP-specific solution, maybe a
> straw man like:
> 1. I see a query where N-1 relations join to the Nth relation, but not
> to each other (except transitively, of course).
> 2. Estimated cardinality for F, after pushing down single table
> predicates, is very small.
> 3. OK, let's start joining tables D1, ..., D(N-1) in order, since
> we're assuming (thanks to (1) and (2)) that the join order won't
> matter.
> 4. Continue joining tables in this fixed (arbitrary) order, unless we
> come to a Dk where the optimizer thinks joining to Dk will generate a
> significant number of rows.
> 5. Either we join all tables in order (fast compilation!); or we hit
> the case in (4), so we just fall back on the existing join logic.
> 

Yes, I think that's pretty much the idea. Except that I don't think we
need to look at the |F| at all - it will have more impact for small |F|,
of course, but it doesn't hurt for large |F|.

I think it'll probably need to consider which joins increase/decrease
the cardinality, and "inject" the dimension joins in between those.


regards

-- 
Tomas Vondra




pgsql-hackers by date:

Previous
From: Matheus Alcantara
Date:
Subject: Re: explain analyze rows=%.0f
Next
From: Daniel Gustafsson
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER