Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAA4eK1JMpPZcWknaa1-afbJ=bof3JmV1HWHbVRkThbkwah4_Dg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Antonin Houska <ah@cybertec.at>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
List pgsql-hackers
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
..
>
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
> -------------------------------------------------------------------
>
> @@ -1021,12 +1039,15 @@ IsInParallelMode(void)
>   * Prepare for entering parallel mode, based on command-type.
>   */
>  void
> -PrepareParallelMode(CmdType commandType)
> +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
>  {
>         Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
>
>         if (IsModifySupportedInParallelMode(commandType))
>         {
> +               if (isParallelModifyLeader)
> +                       (void) GetCurrentCommandId(true);
>
> I miss a comment here. I suppose this is to set currentCommandIdUsed, so that
> the leader process gets a new commandId for the following statements in the
> same transaction, and thus it can see the rows inserted by the parallel
> workers?
>

oh no, leader backend and worker backends must use the same commandId.
I am also not sure if we need this because for Insert statements we
already call GetCurrentCommandId(true) is standard_ExecutorStart. We
don't want the rows visibility behavior for parallel-inserts any
different than non-parallel ones.

> If my understanding is correct, I think that the leader should not participate
> in the execution of the Insert node, else it would use higher commandId than
> the workers. That would be weird, although probably not data corruption.
>

Yeah, exactly this is the reason both leader and backends must use the
same commandId.

> I
> wonder if parallel_leader_participation should be considered false for the
> "Gather -> Insert -> ..." plans.
>

If what I said above is correct then this is moot.

>
>
> @@ -208,7 +236,7 @@ ExecGather(PlanState *pstate)
>                 }
>
>                 /* Run plan locally if no workers or enabled and not single-copy. */
> -               node->need_to_scan_locally = (node->nreaders == 0)
> +               node->need_to_scan_locally = (node->nworkers_launched <= 0)
>                         || (!gather->single_copy && parallel_leader_participation);
>                 node->initialized = true;
>         }
>
> Is this change needed? The code just before this test indicates that nreaders
> should be equal to nworkers_launched.
>

This change is required because we don't need to set up readers for
parallel-insert unless there is a returning clause. See the below
check a few lines before this change:

- if (pcxt->nworkers_launched > 0)
+ if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader &&
!isParallelModifyWithReturning))
  {

I think this check could be simplified to if (pcxt->nworkers_launched
> 0 && isParallelModifyWithReturning) or something like that.

>
> In grouping_planner(), this branch
>
> +       /* Consider a supported parallel table-modification command */
> +       if (IsModifySupportedInParallelMode(parse->commandType) &&
> +               !inheritance_update &&
> +               final_rel->consider_parallel &&
> +               parse->rowMarks == NIL)
> +       {
>
> is very similar to creation of the non-parallel ModifyTablePaths - perhaps an
> opportunity to move the common code into a new function.
>

+1.

>
> @@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
>                 }
>         }
>
> +       if (parallel_modify_partial_path_count > 0)
> +       {
> +               final_rel->rows = current_rel->rows;    /* ??? why hasn't this been
> +                                                                                                * set above
somewhere???? */
 
> +               generate_useful_gather_paths(root, final_rel, false);
> +       }
> +
>         extra.limit_needed = limit_needed(parse);
>         extra.limit_tuples = limit_tuples;
>         extra.count_est = count_est;
>
> A boolean variable (e.g. have_parallel_modify_paths) would suffice, there's no
> need to count the paths using parallel_modify_partial_path_count.
>

Sounds sensible.

>
> @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         PlannerGlobal *glob = root->glob;
>         int                     rtoffset = list_length(glob->finalrtable);
>         ListCell   *lc;
> +       Plan       *finalPlan;
>
>         /*
>          * Add all the query's RTEs to the flattened rangetable.  The live ones
> @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         }
>
>         /* Now fix the Plan tree */
> -       return set_plan_refs(root, plan, rtoffset);
> +       finalPlan = set_plan_refs(root, plan, rtoffset);
> +       if (finalPlan != NULL && IsA(finalPlan, Gather))
> +       {
> +               Plan       *subplan = outerPlan(finalPlan);
> +
> +               if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
> +               {
> +                       finalPlan->targetlist = copyObject(subplan->targetlist);
> +               }
> +       }
> +       return finalPlan;
>  }
>
> I'm not sure if the problem of missing targetlist should be handled here (BTW,
> NIL is the constant for an empty list, not NULL). Obviously this is a
> consequence of the fact that the ModifyTable node has no regular targetlist.
>

I think it is better to add comments along with this change. In this
form, this looks quite hacky to me.

> Actually I don't quite understand why (in the current master branch) the
> targetlist initialized in set_plan_refs()
>
>         /*
>          * Set up the visible plan targetlist as being the same as
>          * the first RETURNING list. This is for the use of
>          * EXPLAIN; the executor won't pay any attention to the
>          * targetlist.  We postpone this step until here so that
>          * we don't have to do set_returning_clause_references()
>          * twice on identical targetlists.
>          */
>         splan->plan.targetlist = copyObject(linitial(newRL));
>
> is not used. Instead, ExecInitModifyTable() picks the first returning list
> again:
>
>         /*
>          * Initialize result tuple slot and assign its rowtype using the first
>          * RETURNING list.  We assume the rest will look the same.
>          */
>         mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>
> So if you set the targetlist in create_modifytable_plan() (according to
> best_path->returningLists), or even in create_modifytable_path(), and ensure
> that it gets propagated to the Gather node (generate_gather_pahs currently
> uses rel->reltarget), then you should no longer need to tweak
> setrefs.c.

This sounds worth investigating.

> Moreover, ExecInitModifyTable() would no longer need to set the
> targetlist.
>

I am not sure if we need to do anything about ExecInitModifyTable. If
we want to unify what setrefs.c does with ExecInitModifyTable, then we
can start a separate thread.

Thanks for all the reviews. I would like to emphasize what I said
earlier in this thread that it is better to first focus on
Parallelising Selects for Insert (aka what
v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT does) as that
in itself is a step towards achieving parallel inserts, doing both
0001 and 0003 at the same time can take much more time as both touches
quite intricate parts of the code.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Single transaction in the tablesync worker?
Next
From: Peter Eisentraut
Date:
Subject: Re: Improper use about DatumGetInt32