RE: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id TYAPR01MB29903F0BEB8D41D287697612FEBE0@TYAPR01MB2990.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
List pgsql-hackers
Hello Greg-san,


Second group of comments (I'll reply to (1) - (4) later):


(5)
@@ -790,7 +790,8 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
... 
-    if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
+    if ((plannedstmt->commandType != CMD_SELECT &&
+         !IsModifySupportedInParallelMode(plannedstmt->commandType)) || plannedstmt->hasModifyingCTE)
         PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
 }

Now that we're trying to allow parallel writes (INSERT), we should:

* use ExecCheckXactReadOnly() solely for checking read-only transactions, as the function name represents.  That is,
movethe call to PreventCommandIfParallelMode() up to standard_ExecutorStart().
 

* Update the comment  above the call to ExecCheckXactReadOnly().


(6)
@@ -764,6 +777,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
...
+    else
+    {
+        pei->processed_count = NULL;
+    }

The braces can be deleted.


(7)
@@ -1400,6 +1439,16 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
                                          true);
     queryDesc = ExecParallelGetQueryDesc(toc, receiver, instrument_options);
 
+    Assert(queryDesc->operation == CMD_SELECT || IsModifySupportedInParallelMode(queryDesc->operation));
+    if (IsModifySupportedInParallelMode(queryDesc->operation))
+    {
+        /*
+         * Record that the CurrentCommandId is used, at the start of the
+         * parallel operation.
+         */
+        SetCurrentCommandIdUsedForWorker();
+    }
+
     /* Setting debug_query_string for individual workers */
     debug_query_string = queryDesc->sourceText;

@@ -765,12 +779,16 @@ GetCurrentCommandId(bool used)
     if (used)
     {
         /*
-         * Forbid setting currentCommandIdUsed in a parallel worker, because
-         * we have no provision for communicating this back to the leader.  We
-         * could relax this restriction when currentCommandIdUsed was already
-         * true at the start of the parallel operation.
+         * If in a parallel worker, only allow setting currentCommandIdUsed if
+         * currentCommandIdUsed was already true at the start of the parallel
+         * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+         * setting currentCommandIdUsed because we have no provision for
+         * communicating this back to the leader. Once currentCommandIdUsed is
+         * set, the commandId used by leader and workers can't be changed,
+         * because CommandCounterIncrement() then prevents any attempted
+         * increment of the current commandId.
          */
-        Assert(!IsParallelWorker());
+        Assert(!(IsParallelWorker() && !currentCommandIdUsed));
         currentCommandIdUsed = true;
     }
     return currentCommandId;

What happens without these changes?  If this kind of change is really necessary, it seems more natural to pass
currentCommandIdUsedtogether with currentCommandId through SerializeTransactionState() and
StartParallelWorkerTransaction(),instead of the above changes.
 

As an aside, SetCurrentCommandIdUsed() in the comment should be SetCurrentCommandIdUsedForWorker().


(8)
+        /*
+         * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+         * the relation, and this would result in creation of new CommandIds
+         * on insert/update/delete and this isn't supported in a parallel
+         * worker (but is safe in the parallel leader).
+         */
+        trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+        if (trigtype == RI_TRIGGER_FK)
+        {
+            if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
+                return true;
+        }

Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs.  See
RI_FKey_check()which is called in RI_TRIGGER_FK case.  In there, ri_PerformCheck() is called with the detectNewRows
argumentset to false, which causes CommandCounterIncrement() to not be called.
 

Plus, tables that have RI_TRIGGER_PK should allow parallel INSERT in a parallel-safe manner, because those triggers
onlyfire for UPDATE and DELETE.  So, for the future parallel UPDATE/DELETE support, the above check should be performed
inUPDATE and DELETE cases.
 

(In a data warehouse, fact tables, which store large amounts of historical data, typically have foreign keys to smaller
dimensiontables.  Thus, it's important to allow parallel INSERTs on tables with foreign keys.)
 


Regards
Takayuki Tsunakawa


pgsql-hackers by date:

Previous
From: Mark Rofail
Date:
Subject: Re: [HACKERS] GSoC 2017: Foreign Key Arrays
Next
From: Masahiro Ikeda
Date:
Subject: Re: About to add WAL write/fsync statistics to pg_stat_wal view