Re: Removing unneeded self joins - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Removing unneeded self joins
Date
Msg-id CAPpHfdsj1WdqoCg+g4oq1XQ_=KOj0U4pXqhauBc7e+0msHu66A@mail.gmail.com
Whole thread Raw
In response to Re: Removing unneeded self joins  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: Removing unneeded self joins
Re: Removing unneeded self joins
List pgsql-hackers
On Fri, Jul 19, 2024 at 11:30 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On Wed, 17 Jul 2024 at 01:45, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > Jian He gave a try to ChangeVarNodes() [1].  That gives some
> > improvement, but the vast majority of complexity is still here. I
> > think the reason for complexity of SJE is that it's the first time we
> > remove relation, which is actually *used* and therefore might has
> > references in awful a lot of places.  In previous cases we removed
> > relations, which were actually unused.
>
> I had a quick look at this, and I have a couple of comments on the
> rewriter changes.
>
> The new function replace_relid() looks to be the same as adjust_relid_set().

They are similar, not the same.  replace_relid() has handling for
negative newId, while adjust_relid_set() hasn't.  One thing I'd like
to borrow from adjust_relid_set() to replace_relid() is the usage of
IS_SPECIAL_VARNO() macro.

It would be probably nice to move this logic into bms_replace_member()
residing at bitmapset.c.  What do you think?

> The changes to ChangeVarNodes() look a little messy. There's a lot of
> code duplicated between ChangeVarNodesExtended() and ChangeVarNodes(),
> which could be avoided by having one call the other. Also, it would be
> better for ChangeVarNodesExtended() to have a "flags" parameter
> instead of an extra boolean parameter, to make it more extensible in
> the future. However,...

I certainly didn't mean to have duplicate functions
ChangeVarNodesExtended() and ChangeVarNodes().  I mean
ChangeVarNodes() should just call ChangeVarNodesExtended().

> I question whether ChangeVarNodesExtended() and the changes to
> ChangeVarNodes() are really the right way to go about this.
> ChangeVarNodes() in particular gains a lot more logic to handle
> RestrictInfo nodes that doesn't really feel like it belongs there --
> e.g., building  NullTest nodes is really specific to SJE, and doesn't
> seem like it's something ChangeVarNodes() should be doing.
>
> A better solution might be to add a new walker function to
> analyzejoins.c that does just what SJE needs, which is different from
> ChangeVarNodes() in a number of ways. For Var nodes, it might
> ultimately be necessary to do more than just change the varno, to
> solve the RETURNING/EPQ problems. For RestrictInfo nodes, there's a
> lot of SJE-specific logic. The SJE code wants to ignore RangeTblRef
> nodes, but it could delegate to ChangeVarNodes() for various other
> node types to avoid code duplication. At the top level, the stuff that
> ChangeVarNodes() does to fields of the Query struct would be different
> for SJE, I think.

We initially didn't use ChangeVarNodes() in SJE at all.  See the last
patch version without it [1].  We're trying to address Tom Lane's
proposal to re-use more of existing tree-manipulation infrastructure
[2].  I agree with you that the case with ChangeVarNodes() looks
questionable.  Do you have other ideas how we can re-use some more of
existing tree-manipulation infrastructure in SJE?

Links
1. https://www.postgresql.org/message-id/55f680bc-756d-4dd3-ab27-3c6e663b0e4c%40postgrespro.ru
2. https://www.postgresql.org/message-id/3622801.1715010885%40sss.pgh.pa.us

------
Regards,
Alexander Korotkov
Supabase

Attachment

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: race condition in pg_class
Next
From: "Andrey M. Borodin"
Date:
Subject: Re: UUID v7