Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAA4eK1KZMEGEu5Q9iWH=hUY0p4VjOAdOjWfuOKhoNZVXK3LmtA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Antonin Houska <ah@cybertec.at>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
> v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
> --------------------------------------------------------------
>
> @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
>                 /* all the cheap tests pass, so scan the query tree */
>                 glob->maxParallelHazard = max_parallel_hazard(parse);
>                 glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +
> +               /*
> +                * Additional parallel-mode safety checks are required in order to
> +                * allow an underlying parallel query to be used for a
> +                * table-modification command that is supported in parallel-mode.
> +                */
> +               if (glob->parallelModeOK &&
> +                       IsModifySupportedInParallelMode(parse->commandType))
> +               {
> +                       glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
> +                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +               }
>
> Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
> functions should not be called by parallel worker.
>

What in the above change indicates that the parallel_restricted will
be allowed in parallel workers. This just sets paralleModeOK to allow
parallel plans for Selects if the Insert can be performed safely in a
leader backend.

>
> @@ -1015,6 +1016,27 @@ IsInParallelMode(void)
>  }
>
>  /*
> + *     PrepareParallelMode
> + *
> + * Prepare for entering parallel mode, based on command-type.
> + */
> +void
> +PrepareParallelMode(CmdType commandType)
> +{
> +       Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
>
> Isn't the test of force_parallel_mode just a hack to make regression tests
> pass? When I removed this part and ran the regression tests with
> force_parallel_mode=regress, the assertion fired when executing a subquery
> because the executor was already in parallel mode due to the main query
> execution.
>

I think this Assert is bogus. We are allowed to enter in parallel-mode
if we are already in parallel-mode, see EnterParallelMode. But we
shouldn't be allowed allocate xid in parallel-mode. So the
Assert(!IsInParallelMode()) should be moved inside the check if
(IsModifySupportedInParallelMode(commandType)) in this function. Can
you check if it still fails after such a modification?

> As an alternative, have you considered allocation of the XID even in parallel
> mode? I imagine that the first parallel worker that needs the XID for
> insertions allocates it and shares it with the other workers as well as with
> the leader process.
>

As a matter of this patch
(v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we
never need to allocate xids by workers because Insert is always
performed by leader backend. Even, if we want to do what you are
suggesting it would be tricky because currently, we don't have such an
infrastructure where we can pass information among workers.

> One problem of the current patch version is that the "INSERT INTO ... SELECT
> ..." statement consumes XID even if the SELECT eventually does not return any
> row. However, if the same query is processed w/o parallelism, the XID is only
> allocated if at least one tuple needs to be inserted.
>

Yeah, that is true but I think this can happen w/o parallelism for
updates and deletes where by the time we try to modify the row, it got
modified by a concurrent session and the first session will needlessly
allocate XID.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW
Next
From: Peter Eisentraut
Date:
Subject: Re: Prevent printing "next step instructions" in initdb and pg_upgrade