Home > mailing lists

Re: should we have a fast-path planning for OLTP starjoins? - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: should we have a fast-path planning for OLTP starjoins?
Date	July 28 22:44:11
Msg-id	0e60b314-0761-47ae-8e41-f8666110cbc9@vondra.me Whole thread Raw
In response to	Re: should we have a fast-path planning for OLTP starjoins? (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

Tree view

On 2/4/25 22:55, Tom Lane wrote:
> Tomas Vondra <tomas@vondra.me> writes:
>>> The interesting thing about this is we pretty much have all the
>>> infrastructure for detecting such FK-related join conditions
>>> already.  Possibly the join order forcing could be done with
>>> existing infrastructure too (by manipulating the joinlist).
>
>> Maybe, interesting. I've ruled out relying on the FKeys early in the
>> coding, but I'm sure there's infrastructure the patch could use.
>
> It would be very sad to do that work twice in a patch that purports
> to reduce planning time.  If it's done too late to suit you now,
> could we move it to happen earlier?
>
>> What kind of "manipulation" of the joinlist you have in mind?
>
> Right now, if we have four tables to join, we have a joinlist
> (A B C D).  (Really they're integer relids, but let's use names here.)
> If we decide to force C to be joined last, it should be sufficient to
> convert this to ((A B D) C).  If C and D both look like candidates for
> this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
> This is pretty much the same thing that happens if you set
> join_collapse_limit to 1 and use JOIN syntax to force a join order.
> In fact, IIRC we start out with nested joinlists and there is some
> code that normally flattens them until it decides it'd be creating
> too large a sub-problem.  I'm suggesting selectively reversing the
> flattening.
>
>             regards, tom lane

Here's a patch trying to do it more like this - by manipulating the
lists describing the join problems, before it's passed the the actual
join search algorithm (which is where the PoC patch did this).

I wonder if this is roughly the place where you imagined this would be
done, or if you envision some other issue with this approach. The patch
is definitely incomplete, there's plenty of XXX comments about places
missing some code, etc.

I initially tried to manipulate the joinlist much earlier - pretty much
right at the end of deconstruct_jointree. But that turned out to be way
too early. To identify dimensions etc. we need to check stuff about
foreign keys, join clauses, ... and that's not available that early.

So I think this needs to happen much later in query_planner(), and the
patch does it right before the make_one_rel() call. Maybe that's too
late, but it needs to happen after match_foreign_keys_to_quals(), as it
relies on some of the FK info built by that call. Maybe we could call
match_foreign_keys_to_quals() earlier, but I don't quite see any
benefits of doing that ...

On 2/8/25 02:49, Tomas Vondra wrote:
> On 2/8/25 01:23, Tom Lane wrote:
>> Tomas Vondra <tomas@vondra.me> writes:
>>> Instead, I was thinking about the "other" joins (if there are any), that
>>> may add or remove rows. AFAIK we want to join the dimensions at the
>>> place with the lowest cardinality - the discussion mostly assumed the
>>> joins would only reduce the cardinality, in which case we'd just leave
>>> the dimensions until the very end.
>>
>>> But ISTM that may not be necessarily true. Let's say there's a join that
>>> "multiplies" each row. It'll probably be done at the end, and the
>>> dimension joins should probably happen right before it ... not sure.
>>
>> I thought the idea here was to get rid of as much join order searching
>> as we could.  Insisting that we get the best possible plan anyway
>> seems counterproductive, not to mention very messy to implement.
>> So I'd just push all these joins to the end and be done with it.
>>
> 
> Right, cutting down on the join order searching is the point. But most
> of the savings comes (I think) from not considering different ordering
> of the dimensions, because those are all essentially the same.
> 
> Consider a join with 16 relations, 10 of which are dimensions. There are
> 10! possible orders of the dimensions, but all of them behave pretty
> much exactly the same. In a way, this behaves almost like a join with 7
> relations, one of which represents all the 10 dimensions.
> 
> I think this "join group" abstraction (a relation representing a bunch
> of relations in a particular order) would make this reasonably clean to
> implement. I haven't tried yet, though.
> 
> Yes, this means we'd explore more orderings, compared to just pushing
> all the dimensions to the end. In the example above, that'd be 7!/6!, so
> up to ~7x orderings.
> 
> I don't know if this is worth the extra complexity, of course.
> 

I'm still concerned about regressions when happen to postpone some
expensive dimension joins after a join that increases the cardinality.
And the "join group" would address that. But It probably is not a very
common query pattern, and it'd require changes to join_search_one_level.

I'm not sure what could / should count as 'dimension'. The patch doesn't
implement this yet, but I think it can mostly copy how we match the FK
to the join in get_foreign_key_join_selectivity. There's probably more
to think about, though. For example, don't we need to check NOT NULL /
nullfrac on the referencing side? Also, it probably interacts with
LEFT/OUTER joins. I plan to start strict and then relax and handle some
additional cases.

I'm however struggling with the concept of join order restrictions a
bit. I suspect we need to worry about that when walking the relation
list and figuring out what can be a dimension, but I've never worked
with this, so my mental model of how this works is not great.

Another question is if this planning shortcut (which for the dimensions
mostly picks a random join order) could have some unexpected impact on
the rest of the planning. For example, could we "miss" some join
producing tuples in an interesting order? Or could we fail to consider a
partition-wise join?

Could this "shortcut" restrict the rest of the plan in some undesirable
way? For example, if some of the tables are partitioned, could this mean
we no longer can do partition-wise joins with the (mostly arbitrary)
join order we picked?

There's also a "patch" directory, with some SQL scripts creating two
simple examples of schemas/queries that would benefit from this. The
"create-1/select-1" examples are the simple starjoins, this thread
focuses on. It might be modified to do "snowflake" join, which is
fundamentally a variant of this query type.

The second example (create-2/select-2) is quite different, in that it's
nor a starjoin schema. It still joins one "main" table with multiple
"dimensions", but the FKs go in the other direction (to a single column
in the main table). But it has a very similar bottleneck - the order of
the joins is expensive, but it often does not matter very much, because
the query matches just one row anyway. And even if it returns more rows,
isn't the join order determined just by the selectivity of each join?
Maybe the starjoin optimization could be made to work for this type too?

regards

-- 
Tomas Vondra

Attachment

v2-0001-Simplify-planning-of-starjoin-queries.patch

pgsql-hackers by date:

From: Masahiko Sawada
Date: 28 July, 22:33:28
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations

From: "Euler Taveira"
Date: 28 July, 22:53:26
Subject: Re: refactor backend type lists

Re: should we have a fast-path planning for OLTP starjoins? - Mailing list pgsql-hackers

Attachment

Previous

Next