Re: should we have a fast-path planning for OLTP starjoins? - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: should we have a fast-path planning for OLTP starjoins?
Date
Msg-id aUcn9DAbN3RttmIl@momjian.us
Whole thread Raw
In response to Re: should we have a fast-path planning for OLTP starjoins?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: should we have a fast-path planning for OLTP starjoins?
List pgsql-hackers
On Fri, Nov 28, 2025 at 05:50:34PM -0500, Robert Haas wrote:
> I'm guessing that the reason you're limiting this to relations without
> restrictions is so that you don't have to reason about the join
> causing cardinality changes. But I don't quite understand how you
> avoid having that problem anyway. For example, suppose that we have a
> fact table F which is joined to relations S1, S2, D1, D2, I1, and I2.
> The joins to S1 and S2 are joins on FKs to unconstrained dimension
> tables; i.e. cardinality stays the same. The joins to D1 and D2 are
> joins on FKs to constrained dimension tables, i.e. cardinality
> decreases. The joins to I1 and I2 on average match more than once,
> i.e. cardinality increases. The optimal join order is presumably
> something like F-D1-D2-S1-S2-I1-I2, so that the number of rows flowing
> through each level of the join tree is kept as small as possible, but
> how do you achieve this if the joins to S1 and S2 are set aside? In
> the presence of row-count-inflating joins, it's wrong to suppose that
> S1 and S2 should be joined at the end.

I spend some time thinking about this email.  I have a few questions. 
As I see it, we are regrouping joins of a single table into three
groups:

1.  restriction joins
2.  neutral joins
3.  expansion joins

It seems we have to identify these joins _before_ we actually start the
main optimizer.  We can identify restriction joins since we see the
restriction in the query, and we can identify neutral joins because of
foreign keys.  How do we identify expansion joins?  Is it all the joins
which are not the previous types?

> What seems more likely to be true is that we should perform the join
> to S1 and the join to S2 consecutively. It's very common in my
> experience for complex reporting queries to have a bunch of joins the
> results of which matter very little: each one changes the cardinality
> very little or not at all, and so any ordering of those joins produces
> essentially the same result, and probably it's best to do them all at
> whatever point in the join sequence the cardinality of the other input
> is at lowest. However, I'm not sure that even this is guaranteed. For
> instance, suppose that in the above example there's no index on the
> column of F that joins to S2. Could it be that the only reasonable way
> to join to S2 is a merge join? If so, it might be best to start by
> join F to S2 using an index scan on each, and then continue by joining
> to D1, D2, S1, I1, I2 in that order.

So, this got me thinking.  As the optimizer runs, it doesn't choose just
the cheapest path for each stage, but retains paths with higher costs if
they have sort orders (pathkeys) that might be useful in later stages.

In the group execution order we are considering:

1.  restriction joins
2.  neutral joins
3.  expansion joins

do we generate only the cheapest path for each group or return multiple
paths that can be considered by later group executions?  If not, if we
join the same column in stages 1 & 2, would it make sense to move the
stage 2 join into stage 1 so we can potentially use sort order
(pathkeys) for both joins?  Similarly if we join the same column in
stage 2 & 3 can we move the stage 2 join to stage 3?  I don't think it
is worth moving stage 1 to 3 or 3 to 1 since it seems too risky.

I am sorry if this was already discussed in the patch comments.

What Robert is saying in the above paragraph is even more complex ---
that we might want to use an index on the fact table for a merge join of
a stage 2 or 3 join, and then do the other joins.  How would we detect
this?  Could we run each join on its own and see which ones choose merge
join, and move them to stage 1?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Inline non-SQL SRFs using SupportRequestSimplify
Next
From: Alena Rybakina
Date:
Subject: Re: Vacuum statistics