Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CA+HiwqEr4z_5iWd2L_nOxG=8S=imCYFum+_RmgE36UyK7grhKg@mail.gmail.com
Whole thread Raw
In response to RE: Parallel INSERT (INTO ... SELECT ...)  ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
Responses RE: Parallel INSERT (INTO ... SELECT ...)
List pgsql-hackers
Hi,

While reviewing the v14 set of patches (will send my comments
shortly), I too had some reservations on how 0001 decided to go about
setting hasModifyingCTE.

On Fri, Feb 5, 2021 at 1:51 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > > I took a look into the hasModifyingCTE bugfix recently, and found a
> > > possible bug case without the parallel insert patch.
> > >
> > > ---------------------------------
> > > drop table if exists test_data1;
> > > create table test_data1(a int, b int) ; insert into test_data1 select
> > > generate_series(1,1000), generate_series(1,1000); set
> > > force_parallel_mode=on;
> > >
> > > CREATE TEMP TABLE bug6051 AS
> > >   select i from generate_series(1,3) as i;
> > >
> > > SELECT * FROM bug6051;
> > > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as
> > > i from test_data1;
> > >
> > > WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051
> > > SELECT * FROM t1;
> > >
> > > *******
> > > ***ERROR:  cannot assign XIDs during a parallel operation
> > > *******
> > > ---------------------------------
> > >
> > > I debugged it and it did have modifycte in the parsetree after rewrite.
> > > I think if we can properly set the hasModifyingCTE, we can avoid this
> > error by not consider parallel for this.
> > >
> >
> > Thanks. You've identified that the bug exists for SELECT too. I've verified
> > that the issue is fixed by the bugfix included in the Parallel INSERT patch.
> > Are you able to review my bugfix?
> > Since the problem exists for SELECT in the current Postgres code, I'd like
> > to pull that bugfix out and provide it as a separate fix.

+1, a separate patch for this seems better.

> > My concern is that there may well be a better way to fix the issue - for
> > example, during the re-writing, rather than after the query has been
> > re-written.
> Hi,
>
> I took a look at the fix and have some thoughts on it.
> (Please correct me if you have tried this idea and found something is wrong)
>
> IMO, the main reason for the hasModifyingCTE=false is that:
> the Rewriter did not update the  hasModifyingCTE when copying the existsing 'cteList' to the rewrited one.
>
> It seems there is only one place where ctelist will be copied separately.
> -------
> static Query *
> rewriteRuleAction(Query *parsetree,
> ...
>                 /* OK, it's safe to combine the CTE lists */
>                 sub_action->cteList = list_concat(sub_action->cteList,
>                                                                                   copyObject(parsetree->cteList));
> +               sub_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> --------
>
> Based on the above, if we update the hasModifyingCTE here, we may solve this problem.
>
> And there is another point here, the sub_action may be not the final parsetree.
> If defined the rule like "DO INSTEAD insert into xx select xx from xx", Rewriter will
> Put the ctelist into subquery in parsetree's rtable.
> In this case, we may also need to update the final parsetree too.
> (I think you know this case, I found same logic in the latest patch)
>
> --------
> static Query *
> rewriteRuleAction(Query *parsetree,
> ...
>                 if (sub_action_ptr)
> +               {
>                         *sub_action_ptr = sub_action;
> +                       rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> +               }
> --------
>
> And the Basic test passed.
> What do you think ?

That is very close to what I was going to suggest, which is this:

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..3c4417af98 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
                checkExprHasSubLink((Node *) rule_action->returningList);
    }

+   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
+
    return rule_action;
 }

-- 
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Hou, Zhijie"
Date:
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Next
From: Amit Kapila
Date:
Subject: Re: Single transaction in the tablesync worker?