Re: Removing unneeded self joins - Mailing list pgsql-hackers
From | jian he |
---|---|
Subject | Re: Removing unneeded self joins |
Date | |
Msg-id | CACJufxG3sqJKe1OskHhn7OCdtrEeeRFcD8R4TTQE+LGJEQaL9w@mail.gmail.com Whole thread Raw |
In response to | Re: Removing unneeded self joins (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Removing unneeded self joins
Re: Removing unneeded self joins |
List | pgsql-hackers |
On Mon, May 6, 2024 at 11:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > I want to go on record right now as disagreeing with the plan proposed > > in the commit message for the revert commit, namely, committing this > > again early in the v18 cycle. I don't think Tom would have proposed > > reverting this feature unless he believed that it had more serious > > problems than could be easily fixed in a short period of time. I think > > that concern is well-founded, given the number of fixes that were > > committed. It seems likely that the patch needs significant rework and > > stabilization before it gets committed again, and I think it shouldn't > > be committed again without explicit agreement from Tom or one of the > > other committers who have significant experience with the query > > planner. > > FWIW I accept some of the blame here, for not having paid any > attention to the SJE work earlier. I had other things on my mind > for most of last year, and not enough bandwidth to help. > > The main thing I'd like to understand before we try this again is > why SJE needed so much new query-tree-manipulation infrastructure. > I would have expected it to be very similar to the left-join > elimination we do already, and therefore to mostly just share the > existing infrastructure. (I also harbor suspicions that some of > the new code existed just because someone didn't research what > was already there --- for instance, the now-removed replace_varno > sure looks like ChangeVarNodes should have been used instead.) > i have looked around the code. about replace_varno and ChangeVarNodes: ChangeVarNodes have ```` if (IsA(node, RangeTblRef)) { RangeTblRef *rtr = (RangeTblRef *) node; if (context->sublevels_up == 0 && rtr->rtindex == context->rt_index) rtr->rtindex = context->new_index; /* the subquery itself is visited separately */ return false; } ```` if ChangeVarNodes executed the above code in remove_useless_self_joins and remove_self_joins_recurse. the joinlist(RangeTblRef) will change from (1,2) to (2,2). then later, remove_rel_from_joinlist cannot remove the 1, *nremoved will be zero. then the below code error branch will be executed. ```` joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved); if (nremoved != 1) elog(ERROR, "failed to find relation %d in joinlist", relid); ``` --------------------------------------------------------------------- replace_varno and replace_varno_walker didn't replace Query->resultRelation, Query->mergeTargetRelation as ChangeVarNodes did. then replace_varno will have problems with DELETE, UPDATE, MERGE someway. ChangeVarNodes solved this problem. > Another thing that made me pretty sad was 8c441c082 (Forbid SJE with > result relation). While I don't claim that that destroyed the entire > use case for SJE, it certainly knocked its usefulness down by many > notches, maybe even to the point where it's not worth putting in the > effort needed to get it to re-committability. So I think we need to > look harder at finding a way around that. Is the concern that > RETURNING should return either old or new values depending on which > RTE is mentioned? If so, maybe the feature Dean has proposed to > allow RETURNING to access old values [1] is a prerequisite to moving > forward. Alternatively, perhaps it'd be good enough to forbid SJE > only when the non-target relation is actually mentioned in RETURNING. > > regards, tom lane > > [1] https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com if only SELECT, no worth to make it being committed, do you think support DML but no support RETURNING worth the effort? excerpt from [1] latest patch: +/* Returning behavior for Vars in RETURNING list */ +typedef enum VarReturningType +{ + VAR_RETURNING_DEFAULT, /* return OLD for DELETE, else return NEW */ + VAR_RETURNING_OLD, /* return OLD for DELETE/UPDATE, else NULL */ + VAR_RETURNING_NEW, /* return NEW for INSERT/UPDATE, else NULL */ +} VarReturningType; + typedef struct Var { Expr xpr; @@ -265,6 +278,9 @@ typedef struct Var */ Index varlevelsup; + /* returning type of this var (see above) */ + VarReturningType varreturningtype; -------------------------------------------- example. e.g. explain(costs off) WITH t1 AS (SELECT * FROM emp1) UPDATE emp1 SET code = t1.code + 1 FROM t1 WHERE t1.id = emp1.id RETURNING emp1.code, t1.code; the returning (emp1.code,t1.code) these two var the VarReturningType is VAR_RETURNING_DEFAULT. That means the patch (support-returning-old-new-v9.patch in [1]) see the RETURNING (emp1.code, t1.code) are two different table.column references. but here we need to transform it to "RETURNING new.code, old.code", i think. that would be way more harder. >Alternatively, perhaps it'd be good enough to forbid SJE > only when the non-target relation is actually mentioned in RETURNING. i will try to explore this area. in this case would be allow SJE apply to " RETURNING t1.code, t1.code". I've attached 2 patches, based on the latest patch in this thread. 0001 mainly about replacing all replace_varno to ChangeVarNodes. 0002 makes SJE support for DML without RETURNING clause. now SJE also works with updatable view. for example: +CREATE TABLE sj_target (tid integer primary key, balance integer) WITH (autovacuum_enabled=off); +INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60); +create view rw_sj_target as select * from sj_target where tid >= 2; +EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid + WHEN MATCHED AND t.balance = 20 + THEN update set balance = t.balance + 2; + QUERY PLAN +------------------------------------------------- + Merge on sj_target + -> Bitmap Heap Scan on sj_target + Recheck Cond: (tid >= 2) + -> Bitmap Index Scan on sj_target_pkey + Index Cond: (tid >= 2) +(5 rows) > [1] https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
Attachment
pgsql-hackers by date: