Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Greg Nancarrow |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CAJcOf-fhdbhP7PeKf-sgRuZG4DLPyrbQUbub=ho=A9dyNhhrnw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Fri, Oct 30, 2020 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > So then to avoid > > errors from when parallel-mode is forced on and such unsafe INSERTs > > are run, the only real choice is to only allow parallelModeNeeded to > > be true for SELECT only (not INSERT), and this is kind of cheating and > > also not picking up cases where parallel-safe INSERT is run but > > invokes parallel-mode-unsafe features. > > My conclusion, at least for the moment, is to leave the check where it is. > > > > Okay, then can we integrate the functionality of > MaxParallelHazardForModify in max_parallel_hazard? Calling it > separately looks bit awkward. > Looking into that. > > > > > Few other comments on latest patch: > > > =============================== > > > 1. > > > MaxRelParallelHazardForModify() > > > { > > > .. > > > + if (commandType == CMD_INSERT || commandType == CMD_UPDATE) > > > + { > > > + /* > > > .. > > > > > > Why to check CMD_UPDATE here? > > > > > > > That was a bit of forward-thinking, for when/if UPDATE/DELETE is > > supported in parallel-mode. > > Column default expressions and check-constraints are only applicable > > to INSERT and UPDATE. > > Note however that currently this function can only ever be called with > > commandType == CMD_INSERT. > > > > I feel then for other command types there should be an Assert rather > than try to handle something which is not yet implemented nor it is > clear what all is required for that. It confuses the reader, at least > it confused me. Probably we can write a comment but I don't think we > should have any check for Update at this stage of work. > OK, for now I'll restrict the checks to INSERT, but I'll add comments to assist with potential future UPDATE support. > > > 2. > > > +void PrepareParallelModeForModify(CmdType commandType, bool > > > isParallelModifyLeader) > > > +{ > > > + Assert(!IsInParallelMode()); > > > + > > > + if (isParallelModifyLeader) > > > + (void)GetCurrentCommandId(true); > > > + > > > + (void)GetCurrentFullTransactionId(); > > > > > > Here, we should use GetCurrentTransactionId() similar to heap_insert > > > or other heap operations. I am not sure why you have used > > > GetCurrentFullTransactionId? > > > > > > > GetCurrentTransactionId() and GetCurrentFullTransactionId() actually > > have the same functionality, just a different return value (which is > > not being used here). > > > > Sure but lets use what is required. > > > But anyway I've changed it to use GetCurrentTransactionId(). > > > > But comments in ExecutePlan and PrepareParallelModeForModify still > refer to FullTransactionId. > I believe those comments are technically correct. GetCurrentTransactionId() calls AssignTransactionId() to do all the work - and the comment for that function says "Assigns a new permanent FullTransactionId to the given TransactionState". > > > > > > > 4. Have you checked the overhead of this on the planner for different > > > kinds of statements like inserts into tables having 100 or 500 > > > partitions? Similarly, it is good to check the overhead of domain > > > related checks added in the patch. > > > > > > > Checking that now and will post results soon. > > > I am seeing a fair bit of overhead in the planning for the INSERT parallel-safety checks (mind you, compared to the overall performance gain, it's not too bad). Some representative timings for a parallel INSERT of a millions rows into 100,250 and 500 partitions are shown below. (1) Without patch # Partitions Planning Time (ms) Execution Time (ms) 100 1.014 4176.435 250 0.404 3842.414 500 0.529 4440.633 (2) With Parallel INSERT patch # Partitions Planning Time (ms) Execution Time (ms) 100 11.420 2131.148 250 23.269 3472.259 500 36.531 3238.868 I'm looking into how this can be improved by better integration into the current code, and addressing locking concerns that you've previously mentioned. Regards, Greg Nancarrow Fujitsu Australia
pgsql-hackers by date: