Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CAA4eK1KZMEGEu5Q9iWH=hUY0p4VjOAdOjWfuOKhoNZVXK3LmtA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Antonin Houska <ah@cybertec.at>) |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
|
List | pgsql-hackers |
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote: > > Greg Nancarrow <gregn4422@gmail.com> wrote: > > > Posting an updated set of patches to address recent feedback: > > Following is my review. > > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch > -------------------------------------------------------------- > > @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions, > /* all the cheap tests pass, so scan the query tree */ > glob->maxParallelHazard = max_parallel_hazard(parse); > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > + > + /* > + * Additional parallel-mode safety checks are required in order to > + * allow an underlying parallel query to be used for a > + * table-modification command that is supported in parallel-mode. > + */ > + if (glob->parallelModeOK && > + IsModifySupportedInParallelMode(parse->commandType)) > + { > + glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard); > + glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > + } > > Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these > functions should not be called by parallel worker. > What in the above change indicates that the parallel_restricted will be allowed in parallel workers. This just sets paralleModeOK to allow parallel plans for Selects if the Insert can be performed safely in a leader backend. > > @@ -1015,6 +1016,27 @@ IsInParallelMode(void) > } > > /* > + * PrepareParallelMode > + * > + * Prepare for entering parallel mode, based on command-type. > + */ > +void > +PrepareParallelMode(CmdType commandType) > +{ > + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF); > > Isn't the test of force_parallel_mode just a hack to make regression tests > pass? When I removed this part and ran the regression tests with > force_parallel_mode=regress, the assertion fired when executing a subquery > because the executor was already in parallel mode due to the main query > execution. > I think this Assert is bogus. We are allowed to enter in parallel-mode if we are already in parallel-mode, see EnterParallelMode. But we shouldn't be allowed allocate xid in parallel-mode. So the Assert(!IsInParallelMode()) should be moved inside the check if (IsModifySupportedInParallelMode(commandType)) in this function. Can you check if it still fails after such a modification? > As an alternative, have you considered allocation of the XID even in parallel > mode? I imagine that the first parallel worker that needs the XID for > insertions allocates it and shares it with the other workers as well as with > the leader process. > As a matter of this patch (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we never need to allocate xids by workers because Insert is always performed by leader backend. Even, if we want to do what you are suggesting it would be tricky because currently, we don't have such an infrastructure where we can pass information among workers. > One problem of the current patch version is that the "INSERT INTO ... SELECT > ..." statement consumes XID even if the SELECT eventually does not return any > row. However, if the same query is processed w/o parallelism, the XID is only > allocated if at least one tuple needs to be inserted. > Yeah, that is true but I think this can happen w/o parallelism for updates and deletes where by the time we try to modify the row, it got modified by a concurrent session and the first session will needlessly allocate XID. -- With Regards, Amit Kapila.
pgsql-hackers by date: