Re: Removing unneeded self joins - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: Removing unneeded self joins |
Date | |
Msg-id | CAEZATCUwwvoaRuK+UuFF+S49KzKA3uMibjj7qUCOw_iSdtrZnA@mail.gmail.com Whole thread Raw |
In response to | Re: Removing unneeded self joins (Alexander Korotkov <aekorotkov@gmail.com>) |
List | pgsql-hackers |
On Sat, 20 Jul 2024 at 12:39, Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > The new function replace_relid() looks to be the same as adjust_relid_set(). > > They are similar, not the same. replace_relid() has handling for > negative newId, while adjust_relid_set() hasn't. One thing I'd like > to borrow from adjust_relid_set() to replace_relid() is the usage of > IS_SPECIAL_VARNO() macro. Ah, that makes sense. In that case, I'd say that replace_relid() should go in analyzejoins.c (and be a local function there), since that's the only place that requires this special negative newId handling. > It would be probably nice to move this logic into bms_replace_member() > residing at bitmapset.c. What do you think? Maybe. It feels a little specialised though, so maybe it's not worth the effort. I have been reviewing more of the patch, mainly focusing on the logic in analyzejoins.c that decides when to apply SJE. I understand broadly what the code is doing, but I still find it somewhat hard to follow. One thing that makes it hard is that in analyzejoins.c, "inner" and "outer" get swapped round at various points. For example generate_join_implied_equalities() is defined like this: List * generate_join_implied_equalities(PlannerInfo *root, Relids join_relids, Relids outer_relids, RelOptInfo *inner_rel, SpecialJoinInfo *sjinfo); but remove_self_joins_one_group() calls it like this: restrictlist = generate_join_implied_equalities(root, joinrelids, inner->relids, outer, NULL); So you have to remember that "inner" is "outer" and "outer" is "inner" when going into generate_join_implied_equalities() from remove_self_joins_one_group(). And the same thing happens when calling innerrel_is_unique_ext() and match_unique_clauses(). I think all that could be resolved by swapping "inner" and "outer" in the variable names and comments in remove_self_joins_one_group(). Another thing I noticed in remove_self_joins_one_group() was this: /* * To enable SJE for the only degenerate case without any self * join clauses at all, add baserestrictinfo to this list. The * degenerate case works only if both sides have the same clause. * So doesn't matter which side to add. */ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo); That appears to be pointless, because is_innerrel_unique_for() will filter the restrictlist it is given, removing those baserestrictinfo clauses (because I think they'll always have can_join = false). And then relation_has_unique_index_ext() will re-add them: /* * Examine the rel's restriction clauses for usable var = const clauses * that we can add to the restrictlist. */ foreach(ic, rel->baserestrictinfo) { ... add suitable clauses } where "rel" is "innerrel" from is_innerrel_unique_for(), which is "outer" from remove_self_joins_one_group(), so it's the same set of baserestrictinfo clauses. Something else that looks a little messy is this in innerrel_is_unique_ext(): /* * innerrel_is_unique_ext * Do the same as innerrel_is_unique(), but also set to '*extra_clauses' * additional clauses from a baserestrictinfo list that were used to prove * uniqueness. A non NULL 'extra_clauses' indicates that we're checking * for self-join and correspondingly dealing with filtered clauses. */ bool innerrel_is_unique_ext(PlannerInfo *root, ... List **extra_clauses) { bool self_join = (extra_clauses != NULL); [logic depending on self_join] } This presumes that any caller interested in knowing the extra baserestrictinfo clauses used to prove uniqueness must be looking at a self join. That may be true today, but it doesn't seem like a good API design choice. I think it would be better to just add "self_join" as an extra parameter, and also maybe have the function return the UniqueRelInfo containing the "extra_clauses", or NULL if it's not unique. That way, it would be more extensible, if we wanted it to return more information in the future. Instead of adding relation_has_unique_index_ext(), maybe it would be OK to just change the signature of relation_has_unique_index_for(). It looks like it's only called from one other place outside analyzejoins.c. Perhaps the same is true for innerrel_is_unique_ext(). Should match_unique_clauses() be comparing mergeopfamilies or opnos to ensure that the clauses are using the same equality operator? Regards, Dean
pgsql-hackers by date: