Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Parallel Inserts in CREATE TABLE AS |
Date | |
Msg-id | CAFiTN-tiM+A5fL6fXuX9xq7MTsni1zb+8aRNkJq15BbFPpA8fw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
List | pgsql-hackers |
On Tue, Dec 15, 2020 at 2:06 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > Currently with the patch, we can allow parallel CTAS when topnode is Gather. > > When top-node is Append and Gather is the sub-node of Append, I think we can still enable > > Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as: > > > > Append > > ------>Gather > > --------->Create table > > ------------->Seqscan > > ------>Gather > > --------->create table > > ------------->Seqscan > > > > And the use case seems common to me, such as: > > select * from A where xxx union all select * from B where xxx; > > Thanks for the append use case. > > Here's my analysis on pushing parallel inserts down even in case the > top node is Append. > > For union cases which need to remove duplicate tuples, we can't push > the inserts or CTAS dest receiver down. If I'm not wrong, Append node > is not doing duplicate removal(??), I saw that it's the HashAggregate > node (which is the top node that removes the duplicate tuples). And > also for except/except all/intersect/intersect all cases we receive > HashSetOp nodes on top of Append. So for both cases, our check for > Gather or Append at the top node is enough to detect this to not allow > parallel inserts. > > For union all: > case 1: We can push the CTAS dest receiver to each Gather node > Append > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > > case 2: We can still push the CTAS dest receiver to each Gather node. > Non-Gather nodes will do inserts as they do now i.e. by sending tuples > to Append and from there to CTAS dest receiver. > Append > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > case 3: We can push the CTAS dest receiver to Gather > Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > > case 4: We can push the CTAS dest receiver to Gather > Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > Please let me know if I'm missing any other possible use case. > > Thoughts? Your analysis looks right to me. > > I attach a WIP patch which just show the possibility of this feature. > > The patch is based on the latest v11-patch. > > > > What do you think? > > As suggested by Amit earlier, I kept the 0001 patch(so far) such that > it doesn't have the code to influence the planner to consider parallel > tuple cost as 0. It works on the plan whatever gets generated and > decides to allow parallel inserts or not. And in the 0002 patch, I > added the code for influencing the planner to consider parallel tuple > cost as 0. Maybe we can have a 0003 patch for tests alone. Yeah, that makes sense and it will be easy for the review. > Once we are okay with the above analysis and use cases, we can > incorporate the Append changes to respective patches. > > Hope that's okay. Make sense to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: