Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Greg Nancarrow |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CAJcOf-e-sjgbk0H4EdaXSWTWvhgBN8nfdCbNN0Umr5OY1Yj0yQ@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
Re: Parallel INSERT (INTO ... SELECT ...) |
List | pgsql-hackers |
On Tue, Oct 27, 2020 at 8:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > IIUC, below is code for this workaround: > > +MaxRelParallelHazardForModify(Oid relid, > + CmdType commandType, > + max_parallel_hazard_context *context) > +{ > + Relation rel; > + TupleDesc tupdesc; > + int attnum; > + > + LOCKMODE lockmode = AccessShareLock; > + > + /* > + * It's possible that this relation is locked for exclusive access > + * in another concurrent transaction (e.g. as a result of a > + * ALTER TABLE ... operation) until that transaction completes. > + * If a share-lock can't be acquired on it now, we have to assume this > + * could be the worst-case, so to avoid blocking here until that > + * transaction completes, conditionally try to acquire the lock and > + * assume and return UNSAFE on failure. > + */ > + if (ConditionalLockRelationOid(relid, lockmode)) > + { > + rel = table_open(relid, NoLock); > + } > + else > + { > + context->max_hazard = PROPARALLEL_UNSAFE; > + return context->max_hazard; > + } > > Do we need this workaround if we lock just the parent table instead of > locking all the tables? Basically, can we safely identify the > parallel-safety of partitioned relation if we just have a lock on > parent relation? I believe the workaround is still needed in this case, because the workaround was added because of a test in which the parent table was exclusively locked in another concurrent transaction (as a result of ALTER TABLE ... ATTACH PARTITION ...) so we could not even get a ShareLock on the parent table without hanging (and then ending up failing the test because of it). So at the moment the workaround is needed, even if just trying to lock the parent table. I'll do some more testing to determine the secondary issue of whether locks on the partition tables are needed, but at the moment I believe they are. >One more thing I have noticed is that for scan > relations (Select query), we do such checks much later based on > RelOptInfo (see set_rel_consider_parallel) which seems to have most of > the information required to perform parallel-safety checks but I guess > for ModifyTable (aka the Insert table) the equivalent doesn't seem > feasible but have you thought of doing at the later stage in planner? > Yes, and in fact I tried putting the checks in a later stage of the planner, and it's almost successful, except it then makes setting "parallelModeNeeded" very tricky indeed, because that is expected to be set based on whether the SQL is safe to run in parallel mode (paralleModeOK == true) and whether force_parallel_mode is not off. With parallel safety checks delayed to a later stage in the planner, it's then not known whether there are certain types of parallel-unsafe INSERTs (such as INSERT INTO ... VALUES ... ON CONFLICT DO UPDATE ...), because processing for those doesn't reach those later stages of the planner where parallelism is being considered. So then to avoid errors from when parallel-mode is forced on and such unsafe INSERTs are run, the only real choice is to only allow parallelModeNeeded to be true for SELECT only (not INSERT), and this is kind of cheating and also not picking up cases where parallel-safe INSERT is run but invokes parallel-mode-unsafe features. My conclusion, at least for the moment, is to leave the check where it is. > Few other comments on latest patch: > =============================== > 1. > MaxRelParallelHazardForModify() > { > .. > + if (commandType == CMD_INSERT || commandType == CMD_UPDATE) > + { > + /* > .. > > Why to check CMD_UPDATE here? > That was a bit of forward-thinking, for when/if UPDATE/DELETE is supported in parallel-mode. Column default expressions and check-constraints are only applicable to INSERT and UPDATE. Note however that currently this function can only ever be called with commandType == CMD_INSERT. > 2. > +void PrepareParallelModeForModify(CmdType commandType, bool > isParallelModifyLeader) > +{ > + Assert(!IsInParallelMode()); > + > + if (isParallelModifyLeader) > + (void)GetCurrentCommandId(true); > + > + (void)GetCurrentFullTransactionId(); > > Here, we should use GetCurrentTransactionId() similar to heap_insert > or other heap operations. I am not sure why you have used > GetCurrentFullTransactionId? > GetCurrentTransactionId() and GetCurrentFullTransactionId() actually have the same functionality, just a different return value (which is not being used here). But anyway I've changed it to use GetCurrentTransactionId(). > 3. Can we have a test to show why we need to check all the partitions > for parallel-safety? I think it would be possible when there is a > trigger on only one of the partitions and that trigger has > corresponding parallel_unsafe function. But it is good to verify that > once. > I can't imagine how you could check parallel-safety properly without checking all of the partitions. We don't know which partition that data will get inserted into until runtime (e.g. range/list partitioning). Each partition can have its own column default expressions, check-constraints, triggers etc. (which may or may not be parallel-safe) and a partition may itself be a partitioned table. > 4. Have you checked the overhead of this on the planner for different > kinds of statements like inserts into tables having 100 or 500 > partitions? Similarly, it is good to check the overhead of domain > related checks added in the patch. > Checking that now and will post results soon. > 5. Can we have a separate patch for parallel-selects for Insert? It > will make review easier. > See attached patches. Regards, Greg Nancarrow Fujitsu Australia
Attachment
pgsql-hackers by date: