Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
From | Luc Vlaming |
---|---|
Subject | Re: Parallel Inserts in CREATE TABLE AS |
Date | |
Msg-id | 04c52d1d-7e67-90dd-1cdc-a84b09229c9b@swarm64.com Whole thread Raw |
In response to | Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
RE: Parallel Inserts in CREATE TABLE AS
Re: Parallel Inserts in CREATE TABLE AS |
List | pgsql-hackers |
On 30-12-2020 04:55, Bharath Rupireddy wrote: > On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu <zyu@yugabyte.com> wrote: >> w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch >> >> + * Push the dest receiver to Gather node when it is either at the top of the >> + * plan or under top Append node unless it does not have any projections to do. >> >> I think the 'unless' should be 'if'. As can be seen from the body of the method: >> >> + if (!ps->ps_ProjInfo) >> + { >> + GatherState *gstate = (GatherState *) ps; >> + >> + parallel = true; > > Thanks. Modified it in the 0004 patch. Attaching v18 patch set. Note > that no change in 0001 to 0003 patches from v17. > > Please consider v18 patch set for further review. > > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com > Hi, Sorry it took so long to get back to reviewing this. wrt v18-0001....patch: + /* + * If the worker is for parallel insert in CTAS, then use the proper + * dest receiver. + */ + intoclause = (IntoClause *) stringToNode(intoclausestr); + receiver = CreateIntoRelDestReceiver(intoclause); + ((DR_intorel *)receiver)->is_parallel_worker = true; + ((DR_intorel *)receiver)->object_id = fpes->objectid; I would move this into a function called e.g. GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in createas.c. I would then also split up intorel_startup into intorel_leader_startup and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set self->pub.rStartup to intorel_worker_startup. + volatile pg_atomic_uint64 *processed; why is it volatile? + if (isctas) + { + intoclause = ((DR_intorel *) node->dest)->into; + objectid = ((DR_intorel *) node->dest)->object_id; + } Given that you extract them each once and then pass them directly into the parallel-worker, can't you instead pass in the destreceiver and leave that logic to ExecInitParallelPlan? + if (IS_PARALLEL_CTAS_DEST(gstate->dest) && + ((DR_intorel *) gstate->dest)->into->rel && + ((DR_intorel *) gstate->dest)->into->rel->relname) why would rel and relname not be there? if no rows have been inserted? because it seems from the intorel_startup function that that would be set as soon as startup was done, which i assume (wrongly?) is always done? + * In case if no workers were launched, allow the leader to insert entire + * tuples. what does "entire tuples" mean? should it maybe be "all tuples"? ================ wrt v18-0002....patch: It looks like this introduces a state machine that goes like: - starts at CTAS_PARALLEL_INS_UNDEF - possibly moves to CTAS_PARALLEL_INS_SELECT - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added - if both were added at some stage, we can go to CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs what i'm wondering is why you opted to put logic around generate_useful_gather_paths and in cost_gather when to me it seems more logical to put it in create_gather_path? i'm probably missing something there? ================ wrt v18-0003....patch: not sure if it is needed, but i was wondering if we would want more tests with multiple gather nodes existing? caused e.g. by using CTE's, valid subquery's (like the one test you have, but without the group by/having)? Kind regards, Luc
pgsql-hackers by date: