Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Parallel Inserts in CREATE TABLE AS |
Date | |
Msg-id | CAFiTN-uvhELLvJeEneqvRYMff+=d2rOAyNjA04VwC9KidF1s5A@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
List | pgsql-hackers |
On Mon, Dec 7, 2020 at 7:04 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Dec 7, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Dec 7, 2020 at 4:20 PM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > On Mon, Dec 7, 2020 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > What is the need of checking query_level when 'isForCTAS' is set only > > > > when Gather is a top-node? > > > > > > > > > > isForCTAS is getting set before pg_plan_query() which is being used in > > > cost_gather(). We will not have a Gather node by then and hence will > > > not pass queryDesc to IsParallelInsertInCTASAllowed(into, NULL) while > > > setting isForCTAS to true. > > > > > > > IsParallelInsertInCTASAllowed() seems to be returning false if > > queryDesc is NULL, so won't isForCTAS be always set to false? I think > > I am missing something here. > > > > My bad. I utterly missed this, sorry for the confusion. > > My intention to have IsParallelInsertInCTASAllowed() is for two > purposes. 1. when called before planning without queryDesc, it should > return true if IS_CTAS(into) is true and is not a temporary table. 2. > when called after planning with a non-null queryDesc, along with 1) > checks, it should also perform the Gather State checks and return > accordingly. > > I have corrected it in v9 patch. Please have a look. > > > > > > > The isForCTAS will be true because [create table as], the > > > > query_level is always 1 because there is no subquery. > > > > So even if gather is not the top node, parallel cost will still be ignored. > > > > > > > > Is that works as expected ? > > > > > > > > > > I don't think that is expected and is not the case without this patch. > > > The cost shouldn't be changed for existing cases where the write is > > > not pushed to workers. > > > > > > > Thanks for pointing that out. Yes it should not change for the cases > > where parallel inserts will not be picked later. > > > > Any better suggestions on how to make the planner consider that the > > CTAS might choose parallel inserts later at the same time avoiding the > > above issue in case it doesn't? > > > > I'm not quite sure how to address this. Can we not allow the planner > to consider that the select is for CTAS and check only after the > planning is done for the Gather node and other checks? This is simple > to do, but we might miss some parallel plans for the SELECTs because > the planner would have already considered the tuple transfer cost from > workers to Gather wrongly because of which that parallel plan would > have become costlier compared to non parallel plans. IMO, we can do > this since it also keeps the existing behaviour of the planner i.e. > when the planner is planning for SELECTs it doesn't know that it is > doing it for CTAS. Thoughts? > I have done some initial review and I have a few comments. @@ -328,6 +316,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt, query = linitial_node(Query, rewritten); Assert(query->commandType == CMD_SELECT); + /* + * Flag to let the planner know that the SELECT query is for CTAS. This + * is used to calculate the tuple transfer cost from workers to gather + * node(in case parallelism kicks in for the SELECT part of the CTAS), + * to zero as each worker will insert its share of tuples in parallel. + */ + if (IsParallelInsertInCTASAllowed(into, NULL)) + query->isForCTAS = true; + /* plan the query */ plan = pg_plan_query(query, pstate->p_sourcetext, CURSOR_OPT_PARALLEL_OK, params); @@ -350,6 +347,15 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt, /* call ExecutorStart to prepare the plan for execution */ ExecutorStart(queryDesc, GetIntoRelEFlags(into)); + /* + * If SELECT part of the CTAS is parallelizable, then make each + * parallel worker insert the tuples that are resulted in its execution + * into the target table. We need plan state to be initialized by the + * executor to decide whether to allow parallel inserts or not. + */ + if (IsParallelInsertInCTASAllowed(into, queryDesc)) + SetCTASParallelInsertState(queryDesc); Once we have called IsParallelInsertInCTASAllowed and set the query->isForCTAS flag then why we are calling this again? —— --- + */ + if (!(root->parse->isForCTAS && + root->query_level == 1)) + run_cost += parallel_tuple_cost * path->path.rows; From this check, it appeared that the lower level gather will also get influenced by this, consider this -> NLJ -> Gather -> Parallel Seq Scan -> Index Scan This condition is only checking that it should be a top-level query and it should be under CTAS then this will impact all the gather nodes as shown in the above example. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: