Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
| From | Amit Langote | 
|---|---|
| Subject | Re: Parallel INSERT (INTO ... SELECT ...) | 
| Date | |
| Msg-id | CA+HiwqH1Zzm6Hyjj75Uk2fR8L0yRmAbQuWdsn1tEvv30NbdydA@mail.gmail.com Whole thread Raw | 
| In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Greg Nancarrow <gregn4422@gmail.com>) | 
| Responses | RE: Parallel INSERT (INTO ... SELECT ...) Re: Parallel INSERT (INTO ... SELECT ...) | 
| List | pgsql-hackers | 
On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > > The code check that you have identified above ensures that the INSERT
> > > has an underlying SELECT, because the planner won't (and shouldn't
> > > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > > point doing any parallel-safety checks in this case.
> > > It just so happens that the problem test case uses INSERT...VALUES -
> > > and it shouldn't have triggered the parallel-safety checks for
> > > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > > (and shouldn't) be parallelized.
> >
> > AFAICS, max_parallel_hazard() path never bails from doing further
> > safety checks based on anything other than finding a query component
> > whose hazard level crosses context->max_interesting.
>
> It's parallel UNSAFE because it's not safe or even possible to have a
> parallel plan for this.
> (as UNSAFE is the max hazard level, no point in referencing
> context->max_interesting).
> And there are existing cases of bailing out and not doing further
> safety checks (even your v15_delta.diff patch placed code - for
> bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> existing case in max_parallel_hazard_walker()):
>
>     else if (IsA(node, Query))
>     {
>         Query      *query = (Query *) node;
>
>         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
>         if (query->rowMarks != NULL)
>         {
>             context->max_hazard = PROPARALLEL_UNSAFE;
>             return true;
>         }
In my understanding, the max_parallel_hazard() query tree walk is to
find constructs that are parallel unsafe in that they call code that
can't run in parallel mode.  For example, FOR UPDATE/SHARE on
traditional heap AM tuples calls AssignTransactionId() which doesn't
support being called in parallel mode.  Likewise ON CONFLICT ... DO
UPDATE calls heap_update() which doesn't support parallelism.  I'm not
aware of any such hazards downstream of ExecValuesScan().
> >You're trying to
> > add something that bails based on second-guessing that a parallel path
> > would not be chosen, which I find somewhat objectionable.
> >
> > If the main goal of bailing out is to avoid doing the potentially
> > expensive modification safety check on the target relation, maybe we
> > should try to somehow make the check less expensive.  I remember
> > reading somewhere in the thread about caching the result of this check
> > in relcache, but haven't closely studied the feasibility of doing so.
> >
>
> There's no "second-guessing" involved here.
> There is no underlying way of dividing up the VALUES data of
> "INSERT...VALUES" amongst the parallel workers, even if the planner
> was updated to produce a parallel-plan for the "INSERT...VALUES" case
> (apart from the fact that spawning off parallel workers to insert that
> data would almost always result in worse performance than a
> non-parallel plan...)
> The division of work for parallel workers is part of the table AM
> (scan) implementation, which is not invoked for "INSERT...VALUES".
I don't disagree that the planner would not normally assign a parallel
path simply to pull values out of a VALUES list mentioned in the
INSERT command, but deciding something based on the certainty of it in
an earlier planning phase seems odd to me.  Maybe that's just me
though.
--
Amit Langote
EDB: http://www.enterprisedb.com
		
	pgsql-hackers by date: