Re: Clean up remove_rel_from_query() after self-join elimination commit - Mailing list pgsql-hackers
| From | Tender Wang |
|---|---|
| Subject | Re: Clean up remove_rel_from_query() after self-join elimination commit |
| Date | |
| Msg-id | CAHewXNmrGtEwhCFeJMtx-e=3TcD2Q0CUt6CVNbr6Hu0zbTg+tQ@mail.gmail.com Whole thread Raw |
| In response to | Clean up remove_rel_from_query() after self-join elimination commit (Richard Guo <guofenglinux@gmail.com>) |
| List | pgsql-hackers |
Richard Guo <guofenglinux@gmail.com> 于2026年4月6日周一 16:11写道:
>
> While working on reusing remove_rel_from_query() for inner-join
> removal, I noticed that the function is in pretty rough shape. The
> self-join elimination (SJE) commit grafted self-join removal onto
> remove_rel_from_query(), which was originally written for left-join
> removal only.
Yes, I noticed this a few days ago when I tried to remove a redundant
filter added during SJE.
>This resulted in several issues:
> 1. Comments throughout remove_rel_from_query() still assumed only
> left-join removal, making the code misleading. For example:
>
> * This is relevant in case the outer join we're deleting is nested inside
> * other outer joins:
>
The current logic of remove_rel_from_query() is not easy to read. For example,
It distinguishes between left-join removal and SJE based on whether
sjinfo is NULL.
> 2. This was called even for left-join removal, with subst=-1. This is
> pointless and confusing.
>
> ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
> replace_relid_callback);
>
Yes, it is.
> 3. phinfo->ph_lateral was adjusted for left-join removal, which is
> also confusing ...
>
> phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
>
> /*
> * ph_lateral might contain rels mentioned in ph_eval_at after the
> * replacement, remove them.
> */
> phinfo->ph_lateral = bms_difference(phinfo->ph_lateral,
> phinfo->ph_eval_at);
>
> ... since you can find this Assert just above:
>
> Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
>
> 4. The comment about attr_needed reconstruction was in
> remove_rel_from_query(), but the actual rebuild is performed by
> the callers.
>
> 5. The EquivalenceClass processing in remove_rel_from_query():
>
> /*
> * Likewise remove references from EquivalenceClasses.
> */
> foreach(l, root->eq_classes)
> {
> EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
>
> if (bms_is_member(relid, ec->ec_relids) ||
> (sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
> remove_rel_from_eclass(ec, sjinfo, relid, subst);
> }
>
> The condition always evaluates to true for self-join elimination
> (i.e., sjinfo == NULL), meaning every EquivalenceClass gets adjusted.
> But this is redundant because the caller remove_self_join_rel()
> already handles ECs via update_eclasses().
>
> 6. In remove_self_join_rel(), I notice this code:
>
> /* At last, replace varno in root targetlist and HAVING clause */
> ChangeVarNodesExtended((Node *) root->processed_tlist, toRemove->relid,
> toKeep->relid, 0, replace_relid_callback);
> ChangeVarNodesExtended((Node *) root->processed_groupClause,
> toRemove->relid, toKeep->relid, 0,
> replace_relid_callback);
>
> The comment mentions "HAVING clause", but neither processed_tlist nor
> processed_groupClause has anything to do with the HAVING clause.
> Furthermore, processed_groupClause contains SortGroupClause nodes,
> which have no Var nodes to rewrite, so calling ChangeVarNodesExtended
> on it is pointless.
I didn't find as many as you listed here. I agree that we should do
something for
current logic.
>
> The attached patch is an attempt to fix all these issues. It also
> aims to leave the code better structured for adding new types of join
> removal (such as inner-join removal) in the future.
I looked through the attached patch. LGTM.
--
Thanks,
Tender Wang
pgsql-hackers by date: