Re: Removing unneeded self joins - Mailing list pgsql-hackers

From jian he
Subject Re: Removing unneeded self joins
Date
Msg-id CACJufxG3sqJKe1OskHhn7OCdtrEeeRFcD8R4TTQE+LGJEQaL9w@mail.gmail.com
Whole thread Raw
In response to Re: Removing unneeded self joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Removing unneeded self joins
Re: Removing unneeded self joins
List pgsql-hackers
On Mon, May 6, 2024 at 11:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > I want to go on record right now as disagreeing with the plan proposed
> > in the commit message for the revert commit, namely, committing this
> > again early in the v18 cycle. I don't think Tom would have proposed
> > reverting this feature unless he believed that it had more serious
> > problems than could be easily fixed in a short period of time. I think
> > that concern is well-founded, given the number of fixes that were
> > committed. It seems likely that the patch needs significant rework and
> > stabilization before it gets committed again, and I think it shouldn't
> > be committed again without explicit agreement from Tom or one of the
> > other committers who have significant experience with the query
> > planner.
>
> FWIW I accept some of the blame here, for not having paid any
> attention to the SJE work earlier.  I had other things on my mind
> for most of last year, and not enough bandwidth to help.
>
> The main thing I'd like to understand before we try this again is
> why SJE needed so much new query-tree-manipulation infrastructure.
> I would have expected it to be very similar to the left-join
> elimination we do already, and therefore to mostly just share the
> existing infrastructure.  (I also harbor suspicions that some of
> the new code existed just because someone didn't research what
> was already there --- for instance, the now-removed replace_varno
> sure looks like ChangeVarNodes should have been used instead.)
>

i have looked around the code.
about replace_varno and ChangeVarNodes:

ChangeVarNodes
have
````
if (IsA(node, RangeTblRef))
{
RangeTblRef *rtr = (RangeTblRef *) node;

if (context->sublevels_up == 0 &&
rtr->rtindex == context->rt_index)
rtr->rtindex = context->new_index;
/* the subquery itself is visited separately */
return false;
}
````
if ChangeVarNodes executed the above code in remove_useless_self_joins and
remove_self_joins_recurse. the joinlist(RangeTblRef) will change from (1,2)
to (2,2). then later, remove_rel_from_joinlist cannot remove the 1,
*nremoved will be zero.
then the below code error branch will be executed.
````
joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
if (nremoved != 1)
elog(ERROR, "failed to find relation %d in joinlist", relid);
```

---------------------------------------------------------------------
replace_varno and replace_varno_walker didn't replace
Query->resultRelation, Query->mergeTargetRelation
as ChangeVarNodes did.

then replace_varno will have problems with DELETE, UPDATE, MERGE
someway.
ChangeVarNodes solved this problem.


> Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
> result relation).  While I don't claim that that destroyed the entire
> use case for SJE, it certainly knocked its usefulness down by many
> notches, maybe even to the point where it's not worth putting in the
> effort needed to get it to re-committability.  So I think we need to
> look harder at finding a way around that.  Is the concern that
> RETURNING should return either old or new values depending on which
> RTE is mentioned?  If so, maybe the feature Dean has proposed to
> allow RETURNING to access old values [1] is a prerequisite to moving
> forward.  Alternatively, perhaps it'd be good enough to forbid SJE
> only when the non-target relation is actually mentioned in RETURNING.
>
>                         regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com

if only SELECT, no worth to make it being committed,
do you think support DML but no support RETURNING worth the effort?

excerpt from [1] latest patch:
+/* Returning behavior for Vars in RETURNING list */
+typedef enum VarReturningType
+{
+ VAR_RETURNING_DEFAULT, /* return OLD for DELETE, else return NEW */
+ VAR_RETURNING_OLD, /* return OLD for DELETE/UPDATE, else NULL */
+ VAR_RETURNING_NEW, /* return NEW for INSERT/UPDATE, else NULL */
+} VarReturningType;
+

 typedef struct Var
 {
  Expr xpr;
@@ -265,6 +278,9 @@ typedef struct Var
  */
  Index varlevelsup;

+ /* returning type of this var (see above) */
+ VarReturningType varreturningtype;
--------------------------------------------
example. e.g.
explain(costs off)
WITH t1 AS (SELECT * FROM emp1) UPDATE emp1 SET code = t1.code + 1
FROM t1 WHERE t1.id = emp1.id  RETURNING emp1.code, t1.code;

the returning (emp1.code,t1.code) these two var the VarReturningType
is VAR_RETURNING_DEFAULT.
That means the patch (support-returning-old-new-v9.patch in [1]) see
the RETURNING (emp1.code, t1.code) are two different table.column
references.
but here we need to transform it to
"RETURNING new.code, old.code", i think.
that would be way more harder.

>Alternatively, perhaps it'd be good enough to forbid SJE
> only when the non-target relation is actually mentioned in RETURNING.
i will try to explore this area. in this case would be
allow SJE apply to " RETURNING t1.code, t1.code".



I've attached 2 patches, based on the latest patch in this thread.
0001 mainly about replacing all replace_varno to ChangeVarNodes.
0002 makes SJE support for DML without RETURNING clause.


now SJE also works with updatable view. for example:
+CREATE TABLE sj_target (tid integer primary key, balance integer)
WITH (autovacuum_enabled=off);
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5,
50), (6, 60);
+create view rw_sj_target as select * from sj_target where tid >= 2;

+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON
t.tid = s.tid
+    WHEN MATCHED AND t.balance = 20
+    THEN update set balance = t.balance + 2;
+                   QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+   ->  Bitmap Heap Scan on sj_target
+         Recheck Cond: (tid >= 2)
+         ->  Bitmap Index Scan on sj_target_pkey
+               Index Cond: (tid >= 2)
+(5 rows)


> [1] https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: ecdh support causes unnecessary roundtrips
Next
From: Thomas Munro
Date:
Subject: Re: Using LibPq in TAP tests via FFI