Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAA4eK1LSu3_dA777+90CQ8bYKSzxjHgj3yiKb-DR_2GE8psNfA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)
List pgsql-hackers
On Fri, Oct 9, 2020 at 3:51 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 0001-InsertParallelSelect
> > 1.
> > ParallelContext *pcxt;
> >
> > + /*
> > + * We need to avoid an attempt on INSERT to assign a
> > + * FullTransactionId whilst in parallel mode (which is in
> > + * effect due to the underlying parallel query) - so the
> > + * FullTransactionId is assigned here. Parallel mode must
> > + * be temporarily escaped in order for this to be possible.
> > + * The FullTransactionId will be included in the transaction
> > + * state that is serialized in the parallel DSM.
> > + */
> > + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> > + {
> > + Assert(IsInParallelMode());
> > + ExitParallelMode();
> > + GetCurrentFullTransactionId();
> > + EnterParallelMode();
> > + }
> > +
> >
> > This looks like a hack to me. I think you are doing this to avoid the
> > parallel mode checks in GetNewTransactionId(), right?
>
> Yes, agreed, is a hack to avoid that (mind you, it's not exactly great
> that ExecutePlan() sets parallel-mode for the entire plan execution).
> Also, did not expect that to necessarily remain in a final patch.
>
> >If so, I have
> > already mentioned above [1] that we can change it so that we disallow
> > assigning xids for parallel workers only. The same is true for the
> > check in ExecGatherMerge. Do you see any problem with that suggestion?
> >
>
> No, should be OK I guess, but will update and test to be sure.
>
> > 2.
> > @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
> > *query_string, int cursorOptions,
> >   */
> >   if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> >   IsUnderPostmaster &&
> > - parse->commandType == CMD_SELECT &&
> > + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
> >   !parse->hasModifyingCTE &&
> >   max_parallel_workers_per_gather > 0 &&
> >   !IsParallelWorker())
> >
> > I think the comments above this need to be updated especially the part
> > where we says:"Note that we do allow CREATE TABLE AS, SELECT INTO, and
> > CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
> > the leader backend writes into a completely new table.". Don't we need
> > to include Insert also?
>
> Yes, Insert needs to be mentioned somewhere there.
>
> >
> > 3.
> > @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
> > *query_string, int cursorOptions,
> >   * parallel-unsafe, or else the query planner itself has a bug.
> >   */
> >   glob->parallelModeNeeded = glob->parallelModeOK &&
> > + (parse->commandType == CMD_SELECT) &&
> >   (force_parallel_mode != FORCE_PARALLEL_OFF);
> >
> > Why do you need this change? The comments above this code should be
> > updated to reflect this change. I think for the same reason the below
> > code seems to be modified but I don't understand the reason for the
> > below change as well, also it is better to update the comments for
> > this as well.
> >
>
> OK, I will update the comments for this.
> Basically, up to now, the "force_parallel_mode" has only ever operated
> on a SELECT.
> But since we are now allowing CMD_INSERT to be assessed for parallel
> mode too, we need to prevent the force_parallel_mode logic from
> sticking a Gather node over the top of arbitrary INSERTs and causing
> them to be run in parallel. Not all INSERTs are suitable for parallel
> operation, and also there are further considerations for
> parallel-safety for INSERTs compared to SELECT. INSERTs can also
> trigger UPDATEs.
>

Sure but in that case 'top_plan->parallel_safe' should be false and it
should stick Gather node atop Insert node. For the purpose of this
patch, the scan beneath Insert should be considered as parallel_safe.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Next
From: Amit Kapila
Date:
Subject: Re: Parallel INSERT (INTO ... SELECT ...)