RE: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Hou, Zhijie
Subject RE: Parallel Inserts in CREATE TABLE AS
Date
Msg-id 3129fda75d9c4e44ba13acd5f0ce8e88@G08CNEXMBPEKD05.g08.fujitsu.local
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi,

> > I took a deep look at the projection logic.
> > In most cases, you are right that Gather node does not need projection.
> >
> > In some rare cases, such as Subplan (or initplan I guess).
> > The projection will happen in Gather node.
> >
> > The example:
> >
> > Create table test(i int);
> > Create table test2(a int, b int);
> > insert into test values(generate_series(1,10000000,1));
> > insert into test2 values(generate_series(1,1000,1),
> > generate_series(1,1000,1));
> >
> > postgres=# explain(verbose, costs off) select test.i,(select i from
> (select * from test2) as tt limit 1) from test where test.i < 2000;
> >                QUERY PLAN
> > ----------------------------------------
> >  Gather
> >    Output: test.i, (SubPlan 1)
> >    Workers Planned: 2
> >    ->  Parallel Seq Scan on public.test
> >          Output: test.i
> >          Filter: (test.i < 2000)
> >    SubPlan 1
> >      ->  Limit
> >            Output: (test.i)
> >            ->  Seq Scan on public.test2
> >                  Output: test.i
> >
> > In this case, projection is necessary, because the subplan will be
> > executed in projection.
> >
> > If skipped, the table created will loss some data.
> >
> 
> Thanks a lot for the use case. Yes with the current patch table will lose
> data related to the subplan. On analyzing further, I think we can not allow
> parallel inserts in the cases when the Gather node has some projections
> to do. Because the workers can not perform that projection. So, having
> ps_ProjInfo in the Gather node is an indication for us to disable parallel
> inserts and only the leader can do the insertions after the Gather node
> does the required projections.
> 
> Thoughts?
> 

Agreed.


2.
@@ -166,6 +228,16 @@ ExecGather(PlanState *pstate)
         {
             ParallelContext *pcxt;
 
+            /*
+             * Take the necessary information to be passed to workers for
+             * parallel inserts in CTAS.
+             */
+            if (ISCTAS(node->ps.intoclause))
+            {
+                node->ps.lefttree->intoclause = node->ps.intoclause;
+                node->ps.lefttree->objectid = node->ps.objectid;
+            }
+
             /* Initialize, or re-initialize, shared state needed by workers. */
             if (!node->pei)
                 node->pei = ExecInitParallelPlan(node->ps.lefttree,

I found the code pass intoclause and objectid to Gather node's lefttree.
Is it necessary? It seems only Gather node will use the information.


Best regards,
houzj




pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Disable WAL logging to speed up data loading
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Disable WAL logging to speed up data loading