Re: should we have a fast-path planning for OLTP starjoins? - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: should we have a fast-path planning for OLTP starjoins? |
Date | |
Msg-id | bcc47569-f3a5-48af-8d57-446ed045584d@vondra.me Whole thread Raw |
In response to | Re: should we have a fast-path planning for OLTP starjoins? (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: should we have a fast-path planning for OLTP starjoins?
|
List | pgsql-hackers |
On 2/5/25 09:27, Richard Guo wrote: > On Wed, Feb 5, 2025 at 5:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Right now, if we have four tables to join, we have a joinlist >> (A B C D). (Really they're integer relids, but let's use names here.) >> If we decide to force C to be joined last, it should be sufficient to >> convert this to ((A B D) C). If C and D both look like candidates for >> this treatment, we can make it be (((A B) C) D) or (((A B) D) C). >> This is pretty much the same thing that happens if you set >> join_collapse_limit to 1 and use JOIN syntax to force a join order. >> In fact, IIRC we start out with nested joinlists and there is some >> code that normally flattens them until it decides it'd be creating >> too large a sub-problem. I'm suggesting selectively reversing the >> flattening. > > This should not be too difficult to implement. Outer joins seem to > add some complexity, though. We need to ensure that the resulting > joins in each sub-list are legal given the query's join order > constraints. For example, if we make the joinlist be (((A B) C) D), > we need to ensure that the A/B join and the (A/B)/C join are legal. > If the requirement is that all "dimensions" only join to the fact table (which in this example would be "A" I think) through a FK, then why would these joins be illegal? We'd also need to require either an outer (left) join, or "NOT NULL" on the fact table side, right? IIRC we already do that when using the FKeys for join estimates. Essentially, this defines a "dimension" as a relation that is joined through a PK, without any other restrictions, both of which seems fairly simple to check, and it's a "local" feature. And we'd simply mark those as "join at the very end, in arbitrary order". Easy enough, I guess. I'm thinking about some more complex cases: (a) Query with multiple starjoins (a special case of that is snowflake schema) - but I guess this is not too difficult, it just needs to consider the FKs as "transitive" (a bit like Dijkstra's algorithm). In the worst case we might need to "split" the whole query into multiple smaller subproblems. (b) Joining additional stuff to the dimensions (not through a FK, possibly to multiple dimensions, ...). Imagine a "diamond join" with some summary table, etc. IMHO this is a fairly rare case / expensive enough to make the planning part less important. I'm also wondering how this should interact with join_collapse_limit. It would seems ideal to do this optimization before splitting the join list into subproblems (so that the "dimensions" are do not even count against the limit, pretty much). But that would mean join_collapse_limit can no longer be used to enforce a join order like today ... regards -- Tomas Vondra
pgsql-hackers by date: