Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAJcOf-e-sjgbk0H4EdaXSWTWvhgBN8nfdCbNN0Umr5OY1Yj0yQ@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)
Re: Parallel INSERT (INTO ... SELECT ...)
List pgsql-hackers
On Tue, Oct 27, 2020 at 8:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> IIUC, below is code for this workaround:
>
> +MaxRelParallelHazardForModify(Oid relid,
> + CmdType commandType,
> + max_parallel_hazard_context *context)
> +{
> + Relation        rel;
> + TupleDesc tupdesc;
> + int attnum;
> +
> + LOCKMODE lockmode = AccessShareLock;
> +
> + /*
> + * It's possible that this relation is locked for exclusive access
> + * in another concurrent transaction (e.g. as a result of a
> + * ALTER TABLE ... operation) until that transaction completes.
> + * If a share-lock can't be acquired on it now, we have to assume this
> + * could be the worst-case, so to avoid blocking here until that
> + * transaction completes, conditionally try to acquire the lock and
> + * assume and return UNSAFE on failure.
> + */
> + if (ConditionalLockRelationOid(relid, lockmode))
> + {
> + rel = table_open(relid, NoLock);
> + }
> + else
> + {
> + context->max_hazard = PROPARALLEL_UNSAFE;
> + return context->max_hazard;
> + }
>
> Do we need this workaround if we lock just the parent table instead of
> locking all the tables? Basically, can we safely identify the
> parallel-safety of partitioned relation if we just have a lock on
> parent relation?

I believe the workaround is still needed in this case, because the
workaround was added because of a test in which the parent table was
exclusively locked in another concurrent transaction (as a result of
ALTER TABLE ... ATTACH PARTITION ...) so we could not even get a
ShareLock on the parent table without hanging (and then ending up
failing the test because of it).
So at the moment the workaround is needed, even if just trying to lock
the parent table.
I'll do some more testing to determine the secondary issue of whether
locks on the partition tables are needed, but at the moment I believe
they are.

>One more thing I have noticed is that for scan
> relations (Select query), we do such checks much later based on
> RelOptInfo (see set_rel_consider_parallel) which seems to have most of
> the information required to perform parallel-safety checks but I guess
> for ModifyTable (aka the Insert table) the equivalent doesn't seem
> feasible but have you thought of doing at the later stage in planner?
>

Yes, and in fact I tried putting the checks in a later stage of the
planner, and it's almost successful, except it then makes setting
"parallelModeNeeded" very tricky indeed, because that is expected to
be set based on whether the SQL is safe to run in parallel mode
(paralleModeOK == true) and whether force_parallel_mode is not off.
With parallel safety checks delayed to a later stage in the planner,
it's then not known whether there are certain types of parallel-unsafe
INSERTs (such as INSERT INTO ... VALUES ... ON CONFLICT DO UPDATE
...), because processing for those doesn't reach those later stages of
the planner where parallelism is being considered. So then to avoid
errors from when parallel-mode is forced on and such unsafe INSERTs
are run, the only real choice is to only allow parallelModeNeeded to
be true for SELECT only (not INSERT), and this is kind of cheating and
also not picking up cases where parallel-safe INSERT is run but
invokes parallel-mode-unsafe features.
My conclusion, at least for the moment, is to leave the check where it is.


> Few other comments on latest patch:
> ===============================
> 1.
> MaxRelParallelHazardForModify()
> {
> ..
> + if (commandType == CMD_INSERT || commandType == CMD_UPDATE)
> + {
> + /*
> ..
>
> Why to check CMD_UPDATE here?
>

That was a bit of forward-thinking, for when/if UPDATE/DELETE is
supported in parallel-mode.
Column default expressions and check-constraints are only applicable
to INSERT and UPDATE.
Note however that currently this function can only ever be called with
commandType == CMD_INSERT.

> 2.
> +void PrepareParallelModeForModify(CmdType commandType, bool
> isParallelModifyLeader)
> +{
> + Assert(!IsInParallelMode());
> +
> + if (isParallelModifyLeader)
> + (void)GetCurrentCommandId(true);
> +
> + (void)GetCurrentFullTransactionId();
>
> Here, we should use GetCurrentTransactionId() similar to heap_insert
> or other heap operations. I am not sure why you have used
> GetCurrentFullTransactionId?
>

GetCurrentTransactionId() and GetCurrentFullTransactionId() actually
have the same functionality, just a different return value (which is
not being used here).
But anyway I've changed it to use GetCurrentTransactionId().


> 3. Can we have a test to show why we need to check all the partitions
> for parallel-safety? I think it would be possible when there is a
> trigger on only one of the partitions and that trigger has
> corresponding parallel_unsafe function. But it is good to verify that
> once.
>

I can't imagine how you could check parallel-safety properly without
checking all of the partitions.
We don't know which partition that data will get inserted into until
runtime (e.g. range/list partitioning).
Each partition can have its own column default expressions,
check-constraints, triggers etc. (which may or may not be
parallel-safe) and a partition may itself be a partitioned table.


> 4. Have you checked the overhead of this on the planner for different
> kinds of statements like inserts into tables having 100 or 500
> partitions? Similarly, it is good to check the overhead of domain
> related checks added in the patch.
>

Checking that now and will post results soon.

> 5. Can we have a separate patch for parallel-selects for Insert? It
> will make review easier.
>

See attached patches.


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

pgsql-hackers by date:

Previous
From: Greg Sabino Mullane
Date:
Subject: Re: psql \df choose functions by their arguments
Next
From: Tomas Vondra
Date:
Subject: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits