Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Parallel Inserts in CREATE TABLE AS |
Date | |
Msg-id | CALj2ACWLA775yaSCo06FM+qh6is3W3Ht5SvqUHBzu=x0-jPa0g@mail.gmail.com Whole thread Raw |
In response to | RE: Parallel Inserts in CREATE TABLE AS ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>) |
Responses |
Re: Parallel Inserts in CREATE TABLE AS
RE: Parallel Inserts in CREATE TABLE AS |
List | pgsql-hackers |
On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > Currently with the patch, we can allow parallel CTAS when topnode is Gather. > When top-node is Append and Gather is the sub-node of Append, I think we can still enable > Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as: > > Append > ------>Gather > --------->Create table > ------------->Seqscan > ------>Gather > --------->create table > ------------->Seqscan > > And the use case seems common to me, such as: > select * from A where xxx union all select * from B where xxx; Thanks for the append use case. Here's my analysis on pushing parallel inserts down even in case the top node is Append. For union cases which need to remove duplicate tuples, we can't push the inserts or CTAS dest receiver down. If I'm not wrong, Append node is not doing duplicate removal(??), I saw that it's the HashAggregate node (which is the top node that removes the duplicate tuples). And also for except/except all/intersect/intersect all cases we receive HashSetOp nodes on top of Append. So for both cases, our check for Gather or Append at the top node is enough to detect this to not allow parallel inserts. For union all: case 1: We can push the CTAS dest receiver to each Gather node Append ->Gather ->Parallel Seq Scan ->Gather ->Parallel Seq Scan ->Gather ->Parallel Seq Scan case 2: We can still push the CTAS dest receiver to each Gather node. Non-Gather nodes will do inserts as they do now i.e. by sending tuples to Append and from there to CTAS dest receiver. Append ->Gather ->Parallel Seq Scan ->Seq Scan / Join / any other non-Gather node ->Gather ->Parallel Seq Scan ->Seq Scan / Join / any other non-Gather node case 3: We can push the CTAS dest receiver to Gather Gather ->Parallel Append ->Parallel Seq Scan ->Parallel Seq Scan case 4: We can push the CTAS dest receiver to Gather Gather ->Parallel Append ->Parallel Seq Scan ->Parallel Seq Scan ->Seq Scan / Join / any other non-Gather node Please let me know if I'm missing any other possible use case. Thoughts? > I attach a WIP patch which just show the possibility of this feature. > The patch is based on the latest v11-patch. > > What do you think? As suggested by Amit earlier, I kept the 0001 patch(so far) such that it doesn't have the code to influence the planner to consider parallel tuple cost as 0. It works on the plan whatever gets generated and decides to allow parallel inserts or not. And in the 0002 patch, I added the code for influencing the planner to consider parallel tuple cost as 0. Maybe we can have a 0003 patch for tests alone. Once we are okay with the above analysis and use cases, we can incorporate the Append changes to respective patches. Hope that's okay. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: