Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Luc Vlaming
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id 04c52d1d-7e67-90dd-1cdc-a84b09229c9b@swarm64.com
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses RE: Parallel Inserts in CREATE TABLE AS
Re: Parallel Inserts in CREATE TABLE AS
List pgsql-hackers
On 30-12-2020 04:55, Bharath Rupireddy wrote:
> On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>> w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
>>
>> + * Push the dest receiver to Gather node when it is either at the top of the
>> + * plan or under top Append node unless it does not have any projections to do.
>>
>> I think the 'unless' should be 'if'. As can be seen from the body of the method:
>>
>> +       if (!ps->ps_ProjInfo)
>> +       {
>> +           GatherState *gstate = (GatherState *) ps;
>> +
>> +           parallel = true;
> 
> Thanks. Modified it in the 0004 patch. Attaching v18 patch set. Note
> that no change in 0001 to 0003 patches from v17.
> 
> Please consider v18 patch set for further review.
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Hi,

Sorry it took so long to get back to reviewing this.

wrt v18-0001....patch:

+        /*
+         * If the worker is for parallel insert in CTAS, then use the proper
+         * dest receiver.
+         */
+        intoclause = (IntoClause *) stringToNode(intoclausestr);
+        receiver = CreateIntoRelDestReceiver(intoclause);
+        ((DR_intorel *)receiver)->is_parallel_worker = true;
+        ((DR_intorel *)receiver)->object_id = fpes->objectid;
I would move this into a function called e.g. 
GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in 
createas.c.
I would then also split up intorel_startup into intorel_leader_startup
and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set 
self->pub.rStartup to intorel_worker_startup.


+    volatile pg_atomic_uint64    *processed;
why is it volatile?


+            if (isctas)
+            {
+                intoclause = ((DR_intorel *) node->dest)->into;
+                objectid = ((DR_intorel *) node->dest)->object_id;
+            }
Given that you extract them each once and then pass them directly into 
the parallel-worker, can't you instead pass in the destreceiver and 
leave that logic to ExecInitParallelPlan?


+                    if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+                        ((DR_intorel *) gstate->dest)->into->rel &&
+                        ((DR_intorel *) gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted? 
because it seems from the intorel_startup function that that would be 
set as soon as startup was done, which i assume (wrongly?) is always done?


+     * In case if no workers were launched, allow the leader to insert entire
+     * tuples.
what does "entire tuples" mean? should it maybe be "all tuples"?


================
wrt v18-0002....patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to 
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

what i'm wondering is why you opted to put logic around 
generate_useful_gather_paths and in cost_gather when to me it seems more 
logical to put it in create_gather_path? i'm probably missing something 
there?


================
wrt v18-0003....patch:

not sure if it is needed, but i was wondering if we would want more 
tests with multiple gather nodes existing? caused e.g. by using CTE's, 
valid subquery's (like the one test you have, but without the group 
by/having)?


Kind regards,
Luc



pgsql-hackers by date:

Previous
From: Daniil Zakhlystov
Date:
Subject: Re: zstd compression for pg_dump
Next
From: Masahiko Sawada
Date:
Subject: Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements