Home > mailing lists
Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From	Antonin Houska
Subject	Re: Parallel INSERT (INTO ... SELECT ...)
Date	January 6, 2021 08:42:10
Msg-id	75182.1609922530@antos Whole thread Raw
In response to	Re: Parallel INSERT (INTO ... SELECT ...) (Greg Nancarrow <gregn4422@gmail.com>)
Responses	Re: Parallel INSERT (INTO ... SELECT ...) Re: Parallel INSERT (INTO ... SELECT ...) Re: Parallel INSERT (INTO ... SELECT ...)
List	pgsql-hackers
Tree view
Greg Nancarrow <gregn4422@gmail.com> wrote:

> Posting an updated set of patches to address recent feedback:

Following is my review.

v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
--------------------------------------------------------------

@@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
         /* all the cheap tests pass, so scan the query tree */
         glob->maxParallelHazard = max_parallel_hazard(parse);
         glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+        /*
+         * Additional parallel-mode safety checks are required in order to
+         * allow an underlying parallel query to be used for a
+         * table-modification command that is supported in parallel-mode.
+         */
+        if (glob->parallelModeOK &&
+            IsModifySupportedInParallelMode(parse->commandType))
+        {
+            glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
+            glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+        }

Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
functions should not be called by parallel worker.


@@ -1015,6 +1016,27 @@ IsInParallelMode(void)
 }

 /*
+ *    PrepareParallelMode
+ *
+ * Prepare for entering parallel mode, based on command-type.
+ */
+void
+PrepareParallelMode(CmdType commandType)
+{
+    Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);

Isn't the test of force_parallel_mode just a hack to make regression tests
pass? When I removed this part and ran the regression tests with
force_parallel_mode=regress, the assertion fired when executing a subquery
because the executor was already in parallel mode due to the main query
execution. I think the function should be implemented such that it does not
mind repeated execution by the same backend.

As an alternative, have you considered allocation of the XID even in parallel
mode? I imagine that the first parallel worker that needs the XID for
insertions allocates it and shares it with the other workers as well as with
the leader process.

One problem of the current patch version is that the "INSERT INTO ... SELECT
..." statement consumes XID even if the SELECT eventually does not return any
row. However, if the same query is processed w/o parallelism, the XID is only
allocated if at least one tuple needs to be inserted.


v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
-------------------------------------------------------------------

@@ -1021,12 +1039,15 @@ IsInParallelMode(void)
  * Prepare for entering parallel mode, based on command-type.
  */
 void
-PrepareParallelMode(CmdType commandType)
+PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
 {
        Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);

        if (IsModifySupportedInParallelMode(commandType))
        {
+               if (isParallelModifyLeader)
+                       (void) GetCurrentCommandId(true);

I miss a comment here. I suppose this is to set currentCommandIdUsed, so that
the leader process gets a new commandId for the following statements in the
same transaction, and thus it can see the rows inserted by the parallel
workers?

If my understanding is correct, I think that the leader should not participate
in the execution of the Insert node, else it would use higher commandId than
the workers. That would be weird, although probably not data corruption. I
wonder if parallel_leader_participation should be considered false for the
"Gather -> Insert -> ..." plans.


@@ -144,9 +148,19 @@ ExecGather(PlanState *pstate)
     GatherState *node = castNode(GatherState, pstate);
     TupleTableSlot *slot;
     ExprContext *econtext;
+    ModifyTableState *nodeModifyTableState = NULL;
+    bool        isParallelModifyLeader = false;
+    bool        isParallelModifyWithReturning = false;

The variable names are quite long. Since this code deals with the Gather node,
I think that both "Parallel" and "Leader" components can be removed.


@@ -418,14 +446,35 @@ ExecShutdownGatherWorkers(GatherState *node)
 void
 ExecShutdownGather(GatherState *node)
 {
-    ExecShutdownGatherWorkers(node);
+    bool        isParallelModifyLeader;

Likewise, the variable name.


@@ -208,7 +236,7 @@ ExecGather(PlanState *pstate)
         }

         /* Run plan locally if no workers or enabled and not single-copy. */
-        node->need_to_scan_locally = (node->nreaders == 0)
+        node->need_to_scan_locally = (node->nworkers_launched <= 0)
             || (!gather->single_copy && parallel_leader_participation);
         node->initialized = true;
     }

Is this change needed? The code just before this test indicates that nreaders
should be equal to nworkers_launched.


In grouping_planner(), this branch

+    /* Consider a supported parallel table-modification command */
+    if (IsModifySupportedInParallelMode(parse->commandType) &&
+        !inheritance_update &&
+        final_rel->consider_parallel &&
+        parse->rowMarks == NIL)
+    {

is very similar to creation of the non-parallel ModifyTablePaths - perhaps an
opportunity to move the common code into a new function.


@@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
         }
     }

+    if (parallel_modify_partial_path_count > 0)
+    {
+        final_rel->rows = current_rel->rows;    /* ??? why hasn't this been
+                                                 * set above somewhere ???? */
+        generate_useful_gather_paths(root, final_rel, false);
+    }
+
     extra.limit_needed = limit_needed(parse);
     extra.limit_tuples = limit_tuples;
     extra.count_est = count_est;

A boolean variable (e.g. have_parallel_modify_paths) would suffice, there's no
need to count the paths using parallel_modify_partial_path_count.


@@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
     PlannerGlobal *glob = root->glob;
     int            rtoffset = list_length(glob->finalrtable);
     ListCell   *lc;
+    Plan       *finalPlan;

     /*
      * Add all the query's RTEs to the flattened rangetable.  The live ones
@@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
     }

     /* Now fix the Plan tree */
-    return set_plan_refs(root, plan, rtoffset);
+    finalPlan = set_plan_refs(root, plan, rtoffset);
+    if (finalPlan != NULL && IsA(finalPlan, Gather))
+    {
+        Plan       *subplan = outerPlan(finalPlan);
+
+        if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
+        {
+            finalPlan->targetlist = copyObject(subplan->targetlist);
+        }
+    }
+    return finalPlan;
 }

I'm not sure if the problem of missing targetlist should be handled here (BTW,
NIL is the constant for an empty list, not NULL). Obviously this is a
consequence of the fact that the ModifyTable node has no regular targetlist.

Actually I don't quite understand why (in the current master branch) the
targetlist initialized in set_plan_refs()

    /*
     * Set up the visible plan targetlist as being the same as
     * the first RETURNING list. This is for the use of
     * EXPLAIN; the executor won't pay any attention to the
     * targetlist.  We postpone this step until here so that
     * we don't have to do set_returning_clause_references()
     * twice on identical targetlists.
     */
    splan->plan.targetlist = copyObject(linitial(newRL));

is not used. Instead, ExecInitModifyTable() picks the first returning list
again:

    /*
     * Initialize result tuple slot and assign its rowtype using the first
     * RETURNING list.  We assume the rest will look the same.
     */
    mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);

So if you set the targetlist in create_modifytable_plan() (according to
best_path->returningLists), or even in create_modifytable_path(), and ensure
that it gets propagated to the Gather node (generate_gather_pahs currently
uses rel->reltarget), then you should no longer need to tweak
setrefs.c. Moreover, ExecInitModifyTable() would no longer need to set the
targetlist. However I don't guarantee that this is the best approach - some
planner expert should speak up.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
pgsql-hackers by date:
From: Kyotaro Horiguchi
Date: 06 January 2021, 08:33:27
Subject: Re: Corruption during WAL replay
From: Peter Smith
Date: 06 January 2021, 08:43:37
Subject: Re: Single transaction in the tablesync worker?
Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

Previous

Next