Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CAA4eK1LSu3_dA777+90CQ8bYKSzxjHgj3yiKb-DR_2GE8psNfA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Greg Nancarrow <gregn4422@gmail.com>) |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
|
List | pgsql-hackers |
On Fri, Oct 9, 2020 at 3:51 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 0001-InsertParallelSelect > > 1. > > ParallelContext *pcxt; > > > > + /* > > + * We need to avoid an attempt on INSERT to assign a > > + * FullTransactionId whilst in parallel mode (which is in > > + * effect due to the underlying parallel query) - so the > > + * FullTransactionId is assigned here. Parallel mode must > > + * be temporarily escaped in order for this to be possible. > > + * The FullTransactionId will be included in the transaction > > + * state that is serialized in the parallel DSM. > > + */ > > + if (estate->es_plannedstmt->commandType == CMD_INSERT) > > + { > > + Assert(IsInParallelMode()); > > + ExitParallelMode(); > > + GetCurrentFullTransactionId(); > > + EnterParallelMode(); > > + } > > + > > > > This looks like a hack to me. I think you are doing this to avoid the > > parallel mode checks in GetNewTransactionId(), right? > > Yes, agreed, is a hack to avoid that (mind you, it's not exactly great > that ExecutePlan() sets parallel-mode for the entire plan execution). > Also, did not expect that to necessarily remain in a final patch. > > >If so, I have > > already mentioned above [1] that we can change it so that we disallow > > assigning xids for parallel workers only. The same is true for the > > check in ExecGatherMerge. Do you see any problem with that suggestion? > > > > No, should be OK I guess, but will update and test to be sure. > > > 2. > > @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char > > *query_string, int cursorOptions, > > */ > > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && > > IsUnderPostmaster && > > - parse->commandType == CMD_SELECT && > > + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) && > > !parse->hasModifyingCTE && > > max_parallel_workers_per_gather > 0 && > > !IsParallelWorker()) > > > > I think the comments above this need to be updated especially the part > > where we says:"Note that we do allow CREATE TABLE AS, SELECT INTO, and > > CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only > > the leader backend writes into a completely new table.". Don't we need > > to include Insert also? > > Yes, Insert needs to be mentioned somewhere there. > > > > > 3. > > @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char > > *query_string, int cursorOptions, > > * parallel-unsafe, or else the query planner itself has a bug. > > */ > > glob->parallelModeNeeded = glob->parallelModeOK && > > + (parse->commandType == CMD_SELECT) && > > (force_parallel_mode != FORCE_PARALLEL_OFF); > > > > Why do you need this change? The comments above this code should be > > updated to reflect this change. I think for the same reason the below > > code seems to be modified but I don't understand the reason for the > > below change as well, also it is better to update the comments for > > this as well. > > > > OK, I will update the comments for this. > Basically, up to now, the "force_parallel_mode" has only ever operated > on a SELECT. > But since we are now allowing CMD_INSERT to be assessed for parallel > mode too, we need to prevent the force_parallel_mode logic from > sticking a Gather node over the top of arbitrary INSERTs and causing > them to be run in parallel. Not all INSERTs are suitable for parallel > operation, and also there are further considerations for > parallel-safety for INSERTs compared to SELECT. INSERTs can also > trigger UPDATEs. > Sure but in that case 'top_plan->parallel_safe' should be false and it should stick Gather node atop Insert node. For the purpose of this patch, the scan beneath Insert should be considered as parallel_safe. -- With Regards, Amit Kapila.
pgsql-hackers by date: