Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Parallel Inserts in CREATE TABLE AS |
Date | |
Msg-id | CAA4eK1L7T+OzQh-zZB+MXPWzxVvd5ywn9ooWnmQyADN_WZXnEA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
List | pgsql-hackers |
On Tue, Dec 8, 2020 at 6:36 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > > > > I'm not quite sure how to address this. Can we not allow the planner > > > > to consider that the select is for CTAS and check only after the > > > > planning is done for the Gather node and other checks? > > > > > > > > > > IIUC, you are saying that we should not influence the cost of gather node > > > even when the insertion would be done by workers? I think that should be > > > our fallback option anyway but that might miss some paths to be considered > > > parallel where the cost becomes more due to parallel_tuple_cost (aka tuple > > > transfer cost). I think the idea is we can avoid the tuple transfer cost > > > only when Gather is the top node because only at that time we can push > > > insertion down, right? How about if we have some way to detect the same > > > before calling generate_useful_gather_paths()? I think when we are calling > > > apply_scanjoin_target_to_paths() in grouping_planner(), if the > > > query_level is 1, it is for CTAS, and it doesn't have a chance to create > > > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we can > > > probably assume that the Gather will be top_node. I am not sure about this > > > but I think it is worth exploring. > > > > > > > I took a look at the parallel insert patch and have the same idea. > > https://commitfest.postgresql.org/31/2844/ > > > > * Consider generating Gather or Gather Merge paths. We must only do this > > * if the relation is parallel safe, and we don't do it for child rels to > > * avoid creating multiple Gather nodes within the same plan. We must do > > * this after all paths have been generated and before set_cheapest, since > > * one of the generated paths may turn out to be the cheapest one. > > */ > > if (rel->consider_parallel && !IS_OTHER_REL(rel)) > > generate_useful_gather_paths(root, rel, false); > > > > IMO Gatherpath created here seems the right one which can possible ignore parallel cost if in CTAS. > > But We need check the following parse option which will create path to be the parent of Gatherpath here. > > > > if (root->parse->rowMarks) > > if (limit_needed(root->parse)) > > if (root->parse->sortClause) > > if (root->parse->distinctClause) > > if (root->parse->hasWindowFuncs) > > if (root->parse->groupClause || root->parse->groupingSets || root->parse->hasAggs || root->root->hasHavingQual) > > > > Thanks Amit and Hou. I will look into these areas and get back soon. > It might be better to split the patch for this such that in the base patch, we won't consider anything special for gather costing w.r.t CTAS and in the next patch, we consider all the checks discussed above. -- With Regards, Amit Kapila.
pgsql-hackers by date: