Re: Clean up remove_rel_from_query() after self-join elimination commit - Mailing list pgsql-hackers

From Tender Wang
Subject Re: Clean up remove_rel_from_query() after self-join elimination commit
Date
Msg-id CAHewXNmrGtEwhCFeJMtx-e=3TcD2Q0CUt6CVNbr6Hu0zbTg+tQ@mail.gmail.com
Whole thread Raw
In response to Clean up remove_rel_from_query() after self-join elimination commit  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
Richard Guo <guofenglinux@gmail.com> 于2026年4月6日周一 16:11写道:
>
> While working on reusing remove_rel_from_query() for inner-join
> removal, I noticed that the function is in pretty rough shape.  The
> self-join elimination (SJE) commit grafted self-join removal onto
> remove_rel_from_query(), which was originally written for left-join
> removal only.

Yes, I noticed this a few days ago when I tried to remove a redundant
filter added during SJE.

>This resulted in several issues:
> 1. Comments throughout remove_rel_from_query() still assumed only
> left-join removal, making the code misleading.  For example:
>
>   * This is relevant in case the outer join we're deleting is nested inside
>   * other outer joins:
>

The current logic of remove_rel_from_query() is not easy to read. For example,
It distinguishes between left-join removal and SJE based on whether
sjinfo is NULL.

> 2. This was called even for left-join removal, with subst=-1.  This is
> pointless and confusing.
>
>       ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
>                              replace_relid_callback);
>
Yes, it is.
> 3. phinfo->ph_lateral was adjusted for left-join removal, which is
> also confusing ...
>
>        phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
>
>        /*
>         * ph_lateral might contain rels mentioned in ph_eval_at after the
>         * replacement, remove them.
>         */
>        phinfo->ph_lateral = bms_difference(phinfo->ph_lateral,
> phinfo->ph_eval_at);
>
> ... since you can find this Assert just above:
>
>        Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
>
> 4. The comment about attr_needed reconstruction was in
> remove_rel_from_query(), but the actual rebuild is performed by
> the callers.
>
> 5. The EquivalenceClass processing in remove_rel_from_query():
>
>     /*
>      * Likewise remove references from EquivalenceClasses.
>      */
>     foreach(l, root->eq_classes)
>     {
>         EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
>
>         if (bms_is_member(relid, ec->ec_relids) ||
>             (sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
>             remove_rel_from_eclass(ec, sjinfo, relid, subst);
>     }
>
> The condition always evaluates to true for self-join elimination
> (i.e., sjinfo == NULL), meaning every EquivalenceClass gets adjusted.
> But this is redundant because the caller remove_self_join_rel()
> already handles ECs via update_eclasses().
>
> 6. In remove_self_join_rel(), I notice this code:
>
>     /* At last, replace varno in root targetlist and HAVING clause */
>     ChangeVarNodesExtended((Node *) root->processed_tlist, toRemove->relid,
>                            toKeep->relid, 0, replace_relid_callback);
>     ChangeVarNodesExtended((Node *) root->processed_groupClause,
>                            toRemove->relid, toKeep->relid, 0,
>                            replace_relid_callback);
>
> The comment mentions "HAVING clause", but neither processed_tlist nor
> processed_groupClause has anything to do with the HAVING clause.
> Furthermore, processed_groupClause contains SortGroupClause nodes,
> which have no Var nodes to rewrite, so calling ChangeVarNodesExtended
> on it is pointless.

I didn't find as many as you listed here. I agree that we should do
something for
current logic.
>
> The attached patch is an attempt to fix all these issues.  It also
> aims to leave the code better structured for adding new types of join
> removal (such as inner-join removal) in the future.

I looked through the attached patch. LGTM.



--
Thanks,
Tender Wang



pgsql-hackers by date:

Previous
From: Josh Kupershmidt
Date:
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Next
From: Chao Li
Date:
Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3