Re: should we have a fast-path planning for OLTP starjoins? - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: should we have a fast-path planning for OLTP starjoins? |
Date | |
Msg-id | 0e60b314-0761-47ae-8e41-f8666110cbc9@vondra.me Whole thread Raw |
In response to | Re: should we have a fast-path planning for OLTP starjoins? (Jeff Davis <pgsql@j-davis.com>) |
List | pgsql-hackers |
On 2/4/25 22:55, Tom Lane wrote: > Tomas Vondra <tomas@vondra.me> writes: >>> The interesting thing about this is we pretty much have all the >>> infrastructure for detecting such FK-related join conditions >>> already. Possibly the join order forcing could be done with >>> existing infrastructure too (by manipulating the joinlist). > >> Maybe, interesting. I've ruled out relying on the FKeys early in the >> coding, but I'm sure there's infrastructure the patch could use. > > It would be very sad to do that work twice in a patch that purports > to reduce planning time. If it's done too late to suit you now, > could we move it to happen earlier? > >> What kind of "manipulation" of the joinlist you have in mind? > > Right now, if we have four tables to join, we have a joinlist > (A B C D). (Really they're integer relids, but let's use names here.) > If we decide to force C to be joined last, it should be sufficient to > convert this to ((A B D) C). If C and D both look like candidates for > this treatment, we can make it be (((A B) C) D) or (((A B) D) C). > This is pretty much the same thing that happens if you set > join_collapse_limit to 1 and use JOIN syntax to force a join order. > In fact, IIRC we start out with nested joinlists and there is some > code that normally flattens them until it decides it'd be creating > too large a sub-problem. I'm suggesting selectively reversing the > flattening. > > regards, tom lane Here's a patch trying to do it more like this - by manipulating the lists describing the join problems, before it's passed the the actual join search algorithm (which is where the PoC patch did this). I wonder if this is roughly the place where you imagined this would be done, or if you envision some other issue with this approach. The patch is definitely incomplete, there's plenty of XXX comments about places missing some code, etc. I initially tried to manipulate the joinlist much earlier - pretty much right at the end of deconstruct_jointree. But that turned out to be way too early. To identify dimensions etc. we need to check stuff about foreign keys, join clauses, ... and that's not available that early. So I think this needs to happen much later in query_planner(), and the patch does it right before the make_one_rel() call. Maybe that's too late, but it needs to happen after match_foreign_keys_to_quals(), as it relies on some of the FK info built by that call. Maybe we could call match_foreign_keys_to_quals() earlier, but I don't quite see any benefits of doing that ... On 2/8/25 02:49, Tomas Vondra wrote: > On 2/8/25 01:23, Tom Lane wrote: >> Tomas Vondra <tomas@vondra.me> writes: >>> Instead, I was thinking about the "other" joins (if there are any), that >>> may add or remove rows. AFAIK we want to join the dimensions at the >>> place with the lowest cardinality - the discussion mostly assumed the >>> joins would only reduce the cardinality, in which case we'd just leave >>> the dimensions until the very end. >> >>> But ISTM that may not be necessarily true. Let's say there's a join that >>> "multiplies" each row. It'll probably be done at the end, and the >>> dimension joins should probably happen right before it ... not sure. >> >> I thought the idea here was to get rid of as much join order searching >> as we could. Insisting that we get the best possible plan anyway >> seems counterproductive, not to mention very messy to implement. >> So I'd just push all these joins to the end and be done with it. >> > > Right, cutting down on the join order searching is the point. But most > of the savings comes (I think) from not considering different ordering > of the dimensions, because those are all essentially the same. > > Consider a join with 16 relations, 10 of which are dimensions. There are > 10! possible orders of the dimensions, but all of them behave pretty > much exactly the same. In a way, this behaves almost like a join with 7 > relations, one of which represents all the 10 dimensions. > > I think this "join group" abstraction (a relation representing a bunch > of relations in a particular order) would make this reasonably clean to > implement. I haven't tried yet, though. > > Yes, this means we'd explore more orderings, compared to just pushing > all the dimensions to the end. In the example above, that'd be 7!/6!, so > up to ~7x orderings. > > I don't know if this is worth the extra complexity, of course. > I'm still concerned about regressions when happen to postpone some expensive dimension joins after a join that increases the cardinality. And the "join group" would address that. But It probably is not a very common query pattern, and it'd require changes to join_search_one_level. I'm not sure what could / should count as 'dimension'. The patch doesn't implement this yet, but I think it can mostly copy how we match the FK to the join in get_foreign_key_join_selectivity. There's probably more to think about, though. For example, don't we need to check NOT NULL / nullfrac on the referencing side? Also, it probably interacts with LEFT/OUTER joins. I plan to start strict and then relax and handle some additional cases. I'm however struggling with the concept of join order restrictions a bit. I suspect we need to worry about that when walking the relation list and figuring out what can be a dimension, but I've never worked with this, so my mental model of how this works is not great. Another question is if this planning shortcut (which for the dimensions mostly picks a random join order) could have some unexpected impact on the rest of the planning. For example, could we "miss" some join producing tuples in an interesting order? Or could we fail to consider a partition-wise join? Could this "shortcut" restrict the rest of the plan in some undesirable way? For example, if some of the tables are partitioned, could this mean we no longer can do partition-wise joins with the (mostly arbitrary) join order we picked? There's also a "patch" directory, with some SQL scripts creating two simple examples of schemas/queries that would benefit from this. The "create-1/select-1" examples are the simple starjoins, this thread focuses on. It might be modified to do "snowflake" join, which is fundamentally a variant of this query type. The second example (create-2/select-2) is quite different, in that it's nor a starjoin schema. It still joins one "main" table with multiple "dimensions", but the FKs go in the other direction (to a single column in the main table). But it has a very similar bottleneck - the order of the joins is expensive, but it often does not matter very much, because the query matches just one row anyway. And even if it returns more rows, isn't the join order determined just by the selectivity of each join? Maybe the starjoin optimization could be made to work for this type too? regards -- Tomas Vondra
Attachment
pgsql-hackers by date: