Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From vignesh C
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CALDaNm2GK8+0uicSvP3hKYqREymNus5GoxtwhvfTx7Mx2O=qWA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
List pgsql-hackers

>
> See attached patches.
>

Thanks for providing the patches.
I had reviewed v6-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch, please find my comments:
-> commandType is not used, we can remove it.
+ * Prepare for entering parallel mode by assigning a FullTransactionId, to be
+ * included in the transaction state that is serialized in the parallel DSM.
+ */
+void PrepareParallelModeForModify(CmdType commandType)
+{
+       Assert(!IsInParallelMode());
+
+       (void)GetCurrentTransactionId();
+}

-> As we support insertion of data from the workers, this comments "but as of now, only the leader backend writes into a completely new table.  In the future, we can extend it to allow workers to write into the table" must be updated accordingly:
+        * modify any data using a CTE, or if this is a cursor operation, or if
+        * GUCs are set to values that don't permit parallelism, or if
+        * parallel-unsafe functions are present in the query tree.
         *
-        * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
+        * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and CREATE
         * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
         * backend writes into a completely new table.  In the future, we can
         * extend it to allow workers to write into the table.  However, to allow

-> Also should we specify insert as "insert into select"
 
-> We could include a small writeup of the design may be in the commit message. It will be useful for review.

-> I felt the below two assignment statements can be in the else condition:
                glob->maxParallelHazard = max_parallel_hazard(parse);
                glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+               /*
+                * Additional parallel-mode safety checks are required in order to
+                * allow an underlying parallel query to be used for a
+                * table-modification command that is supported in parallel-mode.
+                */
+               if (glob->parallelModeOK &&
+                       IsModifySupportedInParallelMode(parse->commandType))
+               {
+                       glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
+                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+               }

something like:
/*
* Additional parallel-mode safety checks are required in order to
* allow an underlying parallel query to be used for a
* table-modification command that is supported in parallel-mode.
*/
if (glob->parallelModeOK &&
IsModifySupportedInParallelMode(parse->commandType))
glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
else
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);

-> Comments need slight adjustment, maybe you could run pgindent for the modified code.
+               /*
+                * Supported table-modification commands may require additional steps
+                * prior to entering parallel mode, such as assigning a FullTransactionId.
+                */

-> In the below, max_parallel_hazard_test will return true for PROPARALLEL_RESTRICTED also, Is break intentional in that case? As in case of RI_TRIGGER_FK for PROPARALLEL_RESTRICTED we continue.
+               if (max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
+                       break;
+
+               /*
+                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+                * the relation, and this would result in creation of new CommandIds
+                * on insert/update/delete and this isn't supported in a parallel
+                * worker (but is safe in the parallel leader).
+                */
+               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+               if (trigtype == RI_TRIGGER_FK)
+               {
+                       context->max_hazard = PROPARALLEL_RESTRICTED;
+                       /*
+                        * As we're looking for the max parallel hazard, we don't break
+                        * here; examine any further triggers ...
+                        */
+               }

-> Should we switch to non-parallel mode in this case, instead of throwing error?
+                       val = SysCacheGetAttr(CONSTROID, tup,
+                                               Anum_pg_constraint_conbin, &isnull);
+                       if (isnull)
+                               elog(ERROR, "null conbin for constraint %u", con->oid);
+                       conbin = TextDatumGetCString(val);

-> We could include a few tests for this in regression.

-> We might need some documentation update like in parallel-query.html/parallel-plans.html, etc

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Fix typo in xlogreader.h and procarray.c
Next
From: Noah Misch
Date:
Subject: Re: public schema default ACL