Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAJcOf-f=UX1uKbPjDXf+8gJOoEPz9VCzh7pKnknfU4sG4LXj0A@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (vignesh C <vignesh21@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi Vignesh,

Thanks for reviewing the patches.

On Tue, Nov 3, 2020 at 5:25 PM vignesh C <vignesh21@gmail.com> wrote:
>
> -> commandType is not used, we can remove it.
> + * Prepare for entering parallel mode by assigning a FullTransactionId, to be
> + * included in the transaction state that is serialized in the parallel DSM.
> + */
> +void PrepareParallelModeForModify(CmdType commandType)
> +{
> +       Assert(!IsInParallelMode());
> +
> +       (void)GetCurrentTransactionId();
> +}

Thanks, at least for INSERT, it's not needed, so I'll remove it.

>
> -> As we support insertion of data from the workers, this comments "but as of now, only the leader backend writes
intoa completely new table.  In the future, we can extend it to allow workers to write into the table" must be updated
accordingly:
> +        * modify any data using a CTE, or if this is a cursor operation, or if
> +        * GUCs are set to values that don't permit parallelism, or if
> +        * parallel-unsafe functions are present in the query tree.
>          *
> -        * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
> +        * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and CREATE
>          * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
>          * backend writes into a completely new table.  In the future, we can
>          * extend it to allow workers to write into the table.  However, to allow
>
> -> Also should we specify insert as "insert into select"
>

I'll update it, appropriate to each patch.

> -> We could include a small writeup of the design may be in the commit message. It will be useful for review.
>

Will do so for the next patch version.

> -> I felt the below two assignment statements can be in the else condition:
>                 glob->maxParallelHazard = max_parallel_hazard(parse);
>                 glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +
> +               /*
> +                * Additional parallel-mode safety checks are required in order to
> +                * allow an underlying parallel query to be used for a
> +                * table-modification command that is supported in parallel-mode.
> +                */
> +               if (glob->parallelModeOK &&
> +                       IsModifySupportedInParallelMode(parse->commandType))
> +               {
> +                       glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
> +                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +               }
>
> something like:
> /*
> * Additional parallel-mode safety checks are required in order to
> * allow an underlying parallel query to be used for a
> * table-modification command that is supported in parallel-mode.
> */
> if (glob->parallelModeOK &&
> IsModifySupportedInParallelMode(parse->commandType))
> glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
> else
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
>

That won't work. As the comment is trying to point out, additional
parallel-safety checks (i.e. in addition to those done by
max_parallel_hazard()) are required to determine if INSERT can be
safely run in parallel-mode with an underlying parallel query.
Also, the max_parallel_hazard found from first calling
max_parallel_hazard() then needs to be fed into
MaxParallelHazardForModify(), in case it finds a worse parallel
hazard.
For example, max_parallel_hazard() may find something parallel
RESTRICTED, but then the additional parallel-safety checks done by
MaxParallelHazardForModify() find something parallel UNSAFE.

> -> Comments need slight adjustment, maybe you could run pgindent for the modified code.
> +               /*
> +                * Supported table-modification commands may require additional steps
> +                * prior to entering parallel mode, such as assigning a FullTransactionId.
> +                */
>

OK, will run pgindent.

> -> In the below, max_parallel_hazard_test will return true for PROPARALLEL_RESTRICTED also, Is break intentional in
thatcase? As in case of RI_TRIGGER_FK for PROPARALLEL_RESTRICTED we continue.
 
> +               if (max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
> +                       break;
> +
> +               /*
> +                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> +                * the relation, and this would result in creation of new CommandIds
> +                * on insert/update/delete and this isn't supported in a parallel
> +                * worker (but is safe in the parallel leader).
> +                */
> +               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> +               if (trigtype == RI_TRIGGER_FK)
> +               {
> +                       context->max_hazard = PROPARALLEL_RESTRICTED;
> +                       /*
> +                        * As we're looking for the max parallel hazard, we don't break
> +                        * here; examine any further triggers ...
> +                        */
> +               }
>

max_parallel_hazard_test won't return true for PROPARALLEL_RESTRICTED.
max_parallel_hazard_test only returns true when
"context.max_interesting" is found, and that is set to
PROPARALLEL_UNSAFE in max_parallel_hazard_for_modify().

> -> Should we switch to non-parallel mode in this case, instead of throwing error?
> +                       val = SysCacheGetAttr(CONSTROID, tup,
> +                                               Anum_pg_constraint_conbin, &isnull);
> +                       if (isnull)
> +                               elog(ERROR, "null conbin for constraint %u", con->oid);
> +                       conbin = TextDatumGetCString(val);
>

I didn't invent that error check, it's found in several other places
in the Postgres code (that error should only ever occur if the
database has been corrupted or intentionally invalidated).
Having said that, I agree that perhaps it's best to switch to
non-parallel mode in this case, but this wouldn't stop it erroring out
when the plan is actually run.

> -> We could include a few tests for this in regression.
>

Looking at adding relevant test cases.

> -> We might need some documentation update like in parallel-query.html/parallel-plans.html, etc
>

Looking at doc updates.


Regards,
Greg Nancarrow
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Soumyadeep Chakraborty
Date:
Subject: Re: Delay of standby shutdown
Next
From: Michael Paquier
Date:
Subject: Re: Move OpenSSL random under USE_OPENSSL_RANDOM