RE: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
From | Hou, Zhijie |
---|---|
Subject | RE: Parallel Inserts in CREATE TABLE AS |
Date | |
Msg-id | bc400813bc8a4a0194d21098c1b8276f@G08CNEXMBPEKD05.g08.fujitsu.local Whole thread Raw |
In response to | RE: Parallel Inserts in CREATE TABLE AS ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>) |
Responses |
Re: Parallel Inserts in CREATE TABLE AS
|
List | pgsql-hackers |
> From: Hou, Zhijie [mailto:houzj.fnst@cn.fujitsu.com] > Sent: Tuesday, December 15, 2020 7:30 PM > To: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> > Cc: Amit Kapila <amit.kapila16@gmail.com>; Luc Vlaming <luc@swarm64.com>; > PostgreSQL-development <pgsql-hackers@postgresql.org>; Zhihong Yu > <zyu@yugabyte.com>; Dilip Kumar <dilipbalaut@gmail.com> > Subject: RE: Parallel Inserts in CREATE TABLE AS > > > Thanks for the append use case. > > > > Here's my analysis on pushing parallel inserts down even in case the > > top node is Append. > > > > For union cases which need to remove duplicate tuples, we can't push > > the inserts or CTAS dest receiver down. If I'm not wrong, Append node > > is not doing duplicate removal(??), I saw that it's the HashAggregate > > node (which is the top node that removes the duplicate tuples). And > > also for except/except all/intersect/intersect all cases we receive > > HashSetOp nodes on top of Append. So for both cases, our check for > > Gather or Append at the top node is enough to detect this to not allow > parallel inserts. > > > > For union all: > > case 1: We can push the CTAS dest receiver to each Gather node Append > > ->Gather > > ->Parallel Seq Scan > > ->Gather > > ->Parallel Seq Scan > > ->Gather > > ->Parallel Seq Scan > > > > case 2: We can still push the CTAS dest receiver to each Gather node. > > Non-Gather nodes will do inserts as they do now i.e. by sending tuples > > to Append and from there to CTAS dest receiver. > > Append > > ->Gather > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > ->Gather > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > > > case 3: We can push the CTAS dest receiver to Gather Gather > > ->Parallel Append > > ->Parallel Seq Scan > > ->Parallel Seq Scan > > > > case 4: We can push the CTAS dest receiver to Gather Gather > > ->Parallel Append > > ->Parallel Seq Scan > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > > > Please let me know if I'm missing any other possible use case. > > > > Thoughts? > > > Yes, The analysis looks right to me. > > > > As suggested by Amit earlier, I kept the 0001 patch(so far) such that > > it doesn't have the code to influence the planner to consider parallel > > tuple cost as 0. It works on the plan whatever gets generated and > > decides to allow parallel inserts or not. And in the 0002 patch, I > > added the code for influencing the planner to consider parallel tuple > > cost as 0. Maybe we can have a 0003 patch for tests alone. > > > > Once we are okay with the above analysis and use cases, we can > > incorporate the Append changes to respective patches. > > > > Hope that's okay. > > A little explanation about how to push down the ctas info in append. > > 1. about how to ignore tuple cost in this case. > IMO, it create gather path under append like the following: > query_planner > -make_one_rel > --set_base_rel_sizes > ---set_rel_size > ----set_append_rel_size (*) > -----set_rel_size > ------set_subquery_pathlist > -------subquery_planner > --------grouping_planner > ---------apply_scanjoin_target_to_paths > ----------generate_useful_gather_paths > > set_append_rel_size seems the right place where we can check and set a flag > to ignore tuple cost later. > We can set the flag for two cases when there is no parent path will be > created(such as : limit,sort,distinct...): > i) query_level is 1 > ii) query_level > 1 and we have set the flag in the parent_root. > > The case ii) is to check append under append: > Append > ->Append > ->Gather > ->Other plan > > 2.about how to push ctas info down. > > We traversing the whole plans tree, and we only care Append and Gather type. > Gather: It set the ctas dest info and returned true at once if the gathernode > does not have projection. > Append: It will recursively traversing the subplan of Appendnode and will > reture true if one of the subplan can be parallel. > > +PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps) { > + bool parallel = false; > + > + if(ps == NULL) > + return parallel; > + > + if(IsA(ps, AppendState)) > + { > + AppendState *aps = (AppendState *) ps; > + for(int i = 0; i < aps->as_nplans; i++) > + { > + parallel |= > PushDownCTASParallelInsertState(dest, aps->appendplans[i]); > + } > + } > + else if(IsA(ps, GatherState) && !ps->ps_ProjInfo) > + { > + GatherState *gstate = (GatherState *) ps; > + parallel = true; > + > + ((DR_intorel *) dest)->is_parallel = true; > + gstate->dest = dest; > + ps->plan->plan_rows = 0; > + } > + > + return parallel; > +} So sorry for my miss, my last patch has some mistakes. Attatch the new one. Best regards, houzj
Attachment
pgsql-hackers by date: