Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CAFiTN-tiM+A5fL6fXuX9xq7MTsni1zb+8aRNkJq15BbFPpA8fw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Tue, Dec 15, 2020 at 2:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Currently with the patch, we can allow parallel CTAS when topnode is Gather.
> > When top-node is Append and Gather is the sub-node of Append, I think we can still enable
> > Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as:
> >
> > Append
> > ------>Gather
> > --------->Create table
> > ------------->Seqscan
> > ------>Gather
> > --------->create table
> > ------------->Seqscan
> >
> > And the use case seems common to me, such as:
> > select * from A where xxx union all select * from B where xxx;
>
> Thanks for the append use case.
>
> Here's my analysis on pushing parallel inserts down even in case the
> top node is Append.
>
> For union cases which need to remove duplicate tuples, we can't push
> the inserts or CTAS dest receiver down. If I'm not wrong, Append node
> is not doing duplicate removal(??), I saw that it's the HashAggregate
> node (which is the top node that removes the duplicate tuples). And
> also for except/except all/intersect/intersect all cases we receive
> HashSetOp nodes on top of Append. So for both cases, our check for
> Gather or Append at the top node is enough to detect this to not allow
> parallel inserts.
>
> For union all:
> case 1: We can push the CTAS dest receiver to each Gather node
>  Append
>      ->Gather
>          ->Parallel Seq Scan
>      ->Gather
>          ->Parallel Seq Scan
>       ->Gather
>          ->Parallel Seq Scan
>
> case 2: We can still push the CTAS dest receiver to each Gather node.
> Non-Gather nodes will do inserts as they do now i.e. by sending tuples
> to Append and from there to CTAS dest receiver.
>  Append
>      ->Gather
>          ->Parallel Seq Scan
>      ->Seq Scan / Join / any other non-Gather node
>      ->Gather
>          ->Parallel Seq Scan
>      ->Seq Scan / Join / any other non-Gather node
>
> case 3:  We can push the CTAS dest receiver to Gather
>  Gather
>      ->Parallel Append
>          ->Parallel Seq Scan
>          ->Parallel Seq Scan
>
> case 4: We can push the CTAS dest receiver to Gather
>  Gather
>      ->Parallel Append
>          ->Parallel Seq Scan
>          ->Parallel Seq Scan
>          ->Seq Scan / Join / any other non-Gather node
>
> Please let me know if I'm missing any other possible use case.
>
> Thoughts?

Your analysis looks right to me.

> > I attach a WIP patch which just show the possibility of this feature.
> > The patch is based on the latest v11-patch.
> >
> > What do you think?
>
> As suggested by Amit earlier, I kept the 0001 patch(so far) such that
> it doesn't have the code to influence the planner to consider parallel
> tuple cost as 0. It works on the plan whatever gets generated and
> decides to allow parallel inserts or not. And in the 0002 patch, I
> added the code for influencing the planner to consider parallel tuple
> cost as 0. Maybe we can have a 0003 patch for tests alone.

Yeah, that makes sense and it will be easy for the review.

> Once we are okay with the above analysis and use cases, we can
> incorporate the Append changes to respective patches.
>
> Hope that's okay.

Make sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Greg Nancarrow
Date:
Subject: Re: libpq debug log
Next
From: Peter Smith
Date:
Subject: Re: Single transaction in the tablesync worker?