Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From: Amit Langote
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date:
Msg-id: CA+HiwqH1Zzm6Hyjj75Uk2fR8L0yRmAbQuWdsn1tEvv30NbdydA@mail.gmail.com
In response to: Re: Parallel INSERT (INTO ... SELECT ...) (Greg Nancarrow <gregn4422@gmail.com>)
Responses: RE: Parallel INSERT (INTO ... SELECT ...); Re: Parallel INSERT (INTO ... SELECT ...)
List: pgsql-hackers
On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > The code check that you have identified above ensures that the INSERT
> > > has an underlying SELECT, because the planner won't (and shouldn't
> > > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > > point doing any parallel-safety checks in this case.
> > > It just so happens that the problem test case uses INSERT...VALUES -
> > > and it shouldn't have triggered the parallel-safety checks for
> > > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > > (and shouldn't) be parallelized.
> >
> > AFAICS, max_parallel_hazard() path never bails from doing further
> > safety checks based on anything other than finding a query component
> > whose hazard level crosses context->max_interesting.
>
> It's parallel UNSAFE because it's not safe or even possible to have a
> parallel plan for this.
> (as UNSAFE is the max hazard level, no point in referencing
> context->max_interesting).
> And there are existing cases of bailing out and not doing further
> safety checks (even your v15_delta.diff patch placed code - for
> bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> existing case in max_parallel_hazard_walker()):
>
>     else if (IsA(node, Query))
>     {
>         Query *query = (Query *) node;
>
>         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
>         if (query->rowMarks != NULL)
>         {
>             context->max_hazard = PROPARALLEL_UNSAFE;
>             return true;
>         }

In my understanding, the max_parallel_hazard() query tree walk is to
find constructs that are parallel unsafe in that they call code that
can't run in parallel mode.  For example, FOR UPDATE/SHARE on
traditional heap AM tuples calls AssignTransactionId() which doesn't
support being called in parallel mode.  Likewise ON CONFLICT ... DO
UPDATE calls heap_update() which doesn't support parallelism.
I'm not aware of any such hazards downstream of ExecValuesScan().

> > You're trying to
> > add something that bails based on second-guessing that a parallel path
> > would not be chosen, which I find somewhat objectionable.
> >
> > If the main goal of bailing out is to avoid doing the potentially
> > expensive modification safety check on the target relation, maybe we
> > should try to somehow make the check less expensive.  I remember
> > reading somewhere in the thread about caching the result of this check
> > in relcache, but haven't closely studied the feasibility of doing so.
>
> There's no "second-guessing" involved here.
> There is no underlying way of dividing up the VALUES data of
> "INSERT...VALUES" amongst the parallel workers, even if the planner
> was updated to produce a parallel-plan for the "INSERT...VALUES" case
> (apart from the fact that spawning off parallel workers to insert that
> data would almost always result in worse performance than a
> non-parallel plan...)
> The division of work for parallel workers is part of the table AM
> (scan) implementation, which is not invoked for "INSERT...VALUES".

I don't disagree that the planner would not normally assign a parallel
path simply to pull values out of a VALUES list mentioned in the
INSERT command, but deciding something based on the certainty of it in
an earlier planning phase seems odd to me.  Maybe that's just me
though.

--
Amit Langote
EDB: http://www.enterprisedb.com