Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Luc Vlaming
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id 9a31a93b-ea03-9cf6-e075-61766bfe2720@swarm64.com
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On 05-01-2021 13:57, Bharath Rupireddy wrote:
> On Tue, Jan 5, 2021 at 1:00 PM Luc Vlaming <luc@swarm64.com> wrote:
>> Reviewing further v20-0001:
>>
>> I would still opt for moving the code for the parallel worker into a
>> separate function, and then setting rStartup of the dest receiver to
>> that function in ExecParallelGetInsReceiver, as its completely
>> independent code. Just a matter of style I guess.
> 
> If we were to have a intorel_startup_worker and assign it to
> self->pub.rStartup, 1) we can do it in the CreateIntoRelDestReceiver,
> we have to pass a parameter to CreateIntoRelDestReceiver as an
> indication of parallel worker, which requires code changes in places
> wherever CreateIntoRelDestReceiver is used. 2) we can also assign
> intorel_startup_worker after CreateIntoRelDestReceiver in
> ExecParallelGetInsReceiver, but that doesn't look good to me. 3) we
> can duplicate CreateIntoRelDestReceiver and have a
> CreateIntoRelParallelDestReceiver with the only change being that
> self->pub.rStartup = intorel_startup_worker;
> 
> IMHO, the way it is currently, looks good. Anyways, I'm open to
> changing that if we agree on any of the above 3 ways.

The current way is good enough, it was a suggestion as personally I find 
it hard to read to have two completely separate code paths in the same 
function. If any I would opt for something like 3) where there's a 
CreateIntoRelParallelDestReceiver which calls CreateIntoRelDestReceiver 
and then overrides rStartup to intorel_startup_worker. Then no callsites 
have to change except the ones that are for parallel workers.
> 
> If we were to do any of the above, then we might have to do the same
> thing for other commands Refresh Materialized View or Copy To where we
> can parallelize.
> 
> Thoughts?
> 
>> Maybe I'm not completely following why but afaics we want parallel
>> inserts in various scenarios, not just CTAS? I'm asking because code like
>> +       if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
>> +               pg_atomic_add_fetch_u64(&fpes->processed,
>> queryDesc->estate->es_processed);
>> seems very specific to CTAS. For now that seems fine but I suppose that
>> would be generalized soon after? Basically I would have expected the if
>> to compare against PARALLEL_INSERT_CMD_UNDEF.
> 
> After this patch is reviewed and goes for commit, then the next thing
> I plan to do is to allow parallel inserts in Refresh Materialized View
> and it can be used for that. I think the processed variable can also
> be used for parallel inserts in INSERT INTO SELECT [1] as well.
> Currently, I'm keeping it for CTAS, maybe later (after this is
> committed) it can be generalized.
> 
> Thoughts?

Sounds good

> 
> [1] -
https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com
> 
>> Apart from these small things v20-0001 looks (very) good to me.
>> v20-0003 and v20-0004:
>> looks good to me.
> 
> Thanks.
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Kind regards,
Luc



pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: When (and whether) should we improve the chapter on parallel query to accommodate parallel data updates?
Next
From: Andrey Borodin
Date:
Subject: Re: archive status ".ready" files may be created too early