RE: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Hou, Zhijie
Subject RE: Parallel Inserts in CREATE TABLE AS
Date
Msg-id 30fa39b09b614f799df364f59307e57e@G08CNEXMBPEKD05.g08.fujitsu.local
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi ,

> On Thu, Nov 26, 2020 at 7:47 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
> wrote:
> >
> > Hi,
> >
> > I have an issue about the following code:
> >
> >         econtext = node->ps.ps_ExprContext;
> >         ResetExprContext(econtext);
> >
> > +       if (ISCTAS(node->ps.intoclause))
> > +       {
> > +               ExecParallelInsertInCTAS(node);
> > +               return NULL;
> > +       }
> >
> >         /* If no projection is required, we're done. */
> >         if (node->ps.ps_ProjInfo == NULL)
> >                 return slot;
> >
> >         /*
> >          * Form the result tuple using ExecProject(), and return it.
> >          */
> >         econtext->ecxt_outertuple = slot;
> >         return ExecProject(node->ps.ps_ProjInfo);
> >
> > It seems the projection will be skipped.
> > Is this because projection is not required in this case ?
> > (I'm not very familiar with where the projection will be.)
> >
> 
> For parallel inserts in CTAS, I don't think we need to project the tuples
> being returned from the underlying plan nodes, and also we have nothing
> to project from the Gather node further up. The required projection will
> happen while the tuples are being returned from the underlying nodes and
> the projected tuples are being directly fed to CTAS's dest receiver
> intorel_receive(), from there into the created table. We don't need
> ExecProject again in ExecParallelInsertInCTAS().
> 
> For instance, projection will always be done when the tuple is being returned
> from an underlying sequential scan node(see ExecScan() -->
> ExecProject() and this is true for both leader and workers. In both leader
> and workers, we are just calling CTAS's dest receiver intorel_receive().
> 
> Thoughts?

I took a deep look at the projection logic.
In most cases, you are right that Gather node does not need projection.

In some rare cases, such as Subplan (or initplan I guess).
The projection will happen in Gather node.

The example:

Create table test(i int);
Create table test2(a int, b int);
insert into test values(generate_series(1,10000000,1));
insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1));

postgres=# explain(verbose, costs off) select test.i,(select i from (select * from test2) as tt limit 1) from test
wheretest.i < 2000;
 
               QUERY PLAN
----------------------------------------
 Gather
   Output: test.i, (SubPlan 1)
   Workers Planned: 2
   ->  Parallel Seq Scan on public.test
         Output: test.i
         Filter: (test.i < 2000)
   SubPlan 1
     ->  Limit
           Output: (test.i)
           ->  Seq Scan on public.test2
                 Output: test.i

In this case, projection is necessary,
because the subplan will be executed in projection.

If skipped, the table created will loss some data.



Best regards,
houzj



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: proposal: possibility to read dumped table's name from file
Next
From: Amit Khandekar
Date:
Subject: Re: Improving spin-lock implementation on ARM.