Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Luc Vlaming
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id ef02fe4e-b476-40e6-3fd6-3407caf82fb5@swarm64.com
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On 04-01-2021 14:32, Bharath Rupireddy wrote:
> On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc@swarm64.com 
> <mailto:luc@swarm64.com>> wrote:
>  > Sorry it took so long to get back to reviewing this.
> 
> Thanks for the comments.
> 
>  > wrt v18-0001....patch:
>  >
>  > +               /*
>  > +                * If the worker is for parallel insert in CTAS, then 
> use the proper
>  > +                * dest receiver.
>  > +                */
>  > +               intoclause = (IntoClause *) stringToNode(intoclausestr);
>  > +               receiver = CreateIntoRelDestReceiver(intoclause);
>  > +               ((DR_intorel *)receiver)->is_parallel_worker = true;
>  > +               ((DR_intorel *)receiver)->object_id = fpes->objectid;
>  > I would move this into a function called e.g.
>  > GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
>  > createas.c.
>  > I would then also split up intorel_startup into intorel_leader_startup
>  > and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
>  > self->pub.rStartup to intorel_worker_startup.
> 
> My intention was to not add any new APIs to the dest receiver. I simply 
> made the changes in intorel_startup, in which for workers it just does 
> the minimalistic work and exit from it. In the leader most of the table 
> creation and sanity check is kept untouched. Please have a look at the 
> v19 patch posted upthread [1].
> 

Looks much better, really nicely abstracted away in the v20 patch.

>  > +       volatile pg_atomic_uint64       *processed;
>  > why is it volatile?
> 
> Intention is to always read from the actual memory location. I referred 
> it from the way pg_atomic_fetch_add_u64_impl, 
> pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their 
> u32 counterparts use pass the parameter as volatile pg_atomic_uint64 *ptr.
> 

Okay I had not seen this syntax before for atomics with the volatile 
keyword but its apparently how the atomics abstraction works in postgresql.

>  > +                       if (isctas)
>  > +                       {
>  > +                               intoclause = ((DR_intorel *) 
> node->dest)->into;
>  > +                               objectid = ((DR_intorel *) 
> node->dest)->object_id;
>  > +                       }
>  > Given that you extract them each once and then pass them directly into
>  > the parallel-worker, can't you instead pass in the destreceiver and
>  > leave that logic to ExecInitParallelPlan?
> 
> That's changed entirely in the v19 patch set posted upthread [1]. Please 
> have a look. I didn't pass the dest receiver, to keep the API generic, I 
> passed parallel insert command type and a void * ptr which points to 
> insertion command because the information we pass to workers depends on 
> the insertion command (for instance, the information needed by workers 
> is for CTAS into clause and object id and for Refresh Mat View object id).
> 
>  >
>  > +                                       if 
> (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
>  > +                                               ((DR_intorel *) 
> gstate->dest)->into->rel &&
>  > +                                               ((DR_intorel *) 
> gstate->dest)->into->rel->relname)
>  > why would rel and relname not be there? if no rows have been inserted?
>  > because it seems from the intorel_startup function that that would be
>  > set as soon as startup was done, which i assume (wrongly?) is always 
> done?
> 
> Actually, that into clause rel variable is always being set in the 
> gram.y for CTAS, Create Materialized View and SELECT INTO (because 
> qualified_name non-terminal is not optional). My bad. I just added it as 
> a sanity check. Actually, it's not required.
> 
> create_as_target:
> *qualified_name* opt_column_list table_access_method_clause
>              OptWith OnCommitOption OptTableSpace
>                  {
>                      $$ = makeNode(IntoClause);
> *                    $$->rel = $1;*
> create_mv_target:
> *qualified_name* opt_column_list table_access_method_clause 
> opt_reloptions OptTableSpace
>              {
>                  $$ = makeNode(IntoClause);
> *                $$->rel = $1;*
> into_clause:
>              INTO OptTempTableName
>              {
>                  $$ = makeNode(IntoClause);
> *               $$->rel = $2;*
> 
> I will change the below code:
> +                    if (GetParallelInsertCmdType(gstate->dest) ==
> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> +                        ((DR_intorel *) gstate->dest)->into &&
> +                        ((DR_intorel *) gstate->dest)->into->rel &&
> +                        ((DR_intorel *) gstate->dest)->into->rel->relname)
> +                    {
> 
> to:
> +                    if (GetParallelInsertCmdType(gstate->dest) ==
> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +                    {
> 
> I will update this in the next version of the patch set.
> 

Thanks
>  >
>  > +        * In case if no workers were launched, allow the leader to 
> insert entire
>  > +        * tuples.
>  > what does "entire tuples" mean? should it maybe be "all tuples"?
> 
> Yeah, noticed that while working on the v19 patch set. Please have a 
> look at the v19 patch posted upthread [1].
> 
>  > ================
>  > wrt v18-0003....patch:
>  >
>  > not sure if it is needed, but i was wondering if we would want more
>  > tests with multiple gather nodes existing? caused e.g. by using CTE's,
>  > valid subquery's (like the one test you have, but without the group
>  > by/having)?
> 
> I'm not sure if we can have CTAS/CMV/SELECT INTO in CTEs like WITH, WITH 
> RECURSIVE and I don't see that any of the WITH clause processing hits 
> createas.c functions. So, IMHO, we don't need to add them. Please let me 
> know if there are any specific use cases you have in mind.
> 
> For instance, I tried to cover Init/Sub Plan and Subquery cases with:
> 
> below case has multiple Gather, Init Plan:
> +-- parallel inserts must occur, as there is init plan that gets executed by
> +-- each parallel worker
> +select explain_pictas(
> +'create table parallel_write as select two col1,
> +    (select two from (select * from tenk2) as tt limit 1) col2
> +    from tenk1  where tenk1.four = 3;');
> 
> below case has Gather, Sub Plan:
> +-- parallel inserts must not occur, as there is sub plan that gets 
> executed by
> +-- the Gather node in leader
> +select explain_pictas(
> +'create table parallel_write as select two col1,
> +    (select tenk1.two from generate_series(1,1)) col2
> +    from tenk1  where tenk1.four = 3;');
> 
> For multiple Gather node cases, I covered them with the Union All/Append 
> cases in the 0004 patch. Please have a look.
> 

Right, had not reviewed part 4 yet. My bad.

> [1] - 
> https://www.postgresql.org/message-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk%2Bzmg%40mail.gmail.com 
> <https://www.postgresql.org/message-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk%2Bzmg%40mail.gmail.com>
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com>

Kind regards,
Luc



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Track replica origin progress for Rollback Prepared
Next
From: Michael Paquier
Date:
Subject: Re: pg_rewind restore_command issue in PG12