Thread: Parallel INSERT (INTO ... SELECT ...)
Hi Hackers,

Following on from Dilip Kumar's POC patch for allowing parallelism of the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC patch for allowing parallelism of both the INSERT and SELECT parts, where it can be allowed.
For cases where it can't be allowed (e.g. INSERT into a table with foreign keys, or "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE ...") it at least allows parallelism of the SELECT part.
Obviously I've had to update the planner, executor and parallel-worker code to make this happen, hopefully not breaking too many things along the way.

Examples with the patch applied:

(1) non-parallel:

test=# explain analyze insert into primary_tbl select * from third_tbl;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Insert on primary_tbl  (cost=0.00..154.99 rows=9999 width=12) (actual time=108.445..108.446 rows=0 loops=1)
   ->  Seq Scan on third_tbl  (cost=0.00..154.99 rows=9999 width=12) (actual time=0.009..5.282 rows=9999 loops=1)
 Planning Time: 0.132 ms
 Execution Time: 108.596 ms
(4 rows)

(2) parallel:

test=# explain analyze insert into primary_tbl select * from third_tbl;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..16.00 rows=9999 width=12) (actual time=69.870..70.310 rows=0 loops=1)
   Workers Planned: 5
   Workers Launched: 5
   ->  Parallel Insert on primary_tbl  (cost=0.00..16.00 rows=500 width=12) (actual time=59.948..59.949 rows=0 loops=6)
         ->  Parallel Seq Scan on third_tbl  (cost=0.00..80.00 rows=2500 width=12) (actual time=0.014..0.922 rows=1666 loops=6)
 Planning Time: 0.121 ms
 Execution Time: 70.438 ms
(7 rows)

(3) parallel SELECT only (insert into a table with a foreign key):

test=# explain analyze insert into secondary_tbl select * from third_tbl;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Insert on secondary_tbl  (cost=0.00..80.00 rows=9999 width=12) (actual time=33.864..33.926 rows=0 loops=1)
   ->  Gather  (cost=0.00..80.00 rows=9999 width=12) (actual time=0.451..5.201 rows=9999 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Parallel Seq Scan on third_tbl  (cost=0.00..80.00 rows=2500 width=12) (actual time=0.013..0.717 rows=2000 loops=5)
 Planning Time: 0.127 ms
 Trigger for constraint secondary_tbl_index_fkey: time=331.834 calls=9999
 Execution Time: 367.342 ms
(8 rows)

Known issues/TODOs:
- Currently only for "INSERT INTO ... SELECT ...". To support "INSERT INTO ... VALUES ..." would need additional Table AM functions for dividing up the INSERT work amongst the workers (such functions currently only exist for scans).
- When INSERTs are made parallel, the reported row-count in the "INSERT 0 <row-count>" status currently only reflects the rows that the leader has processed (not the workers) - so it is obviously less than the actual number of rows inserted.
- Functions relating to computing the number of parallel workers for an INSERT, and the cost of an INSERT, need work.
- "force_parallel_mode" handling was updated so that it only affects SELECT (not INSERT). It can't be allowed for INSERT because we're only supporting "INSERT INTO ... SELECT ..." and don't support other types of INSERTs, and also can't allow attempted parallel UPDATEs resulting from "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE" etc.

Thoughts and feedback?

Regards,
Greg Nancarrow
Fujitsu Australia
> - When INSERTs are made parallel, currently the reported row-count in
> the "INSERT 0 <row-count>" status only reflects the rows that the
> leader has processed (not the workers) - so it is obviously less than
> the actual number of rows inserted.

Attached an updated patch which fixes this issue (for parallel INSERTs, each worker's processed tuple count is communicated in shared memory back to the leader, where it is added to the global "es_processed" count).
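To illustrate the shape of that fix, here is a minimal sketch in C of per-worker count accumulation in shared memory; all names below are invented for the example and are not the patch's actual identifiers:

    #include "postgres.h"
    #include "nodes/execnodes.h"
    #include "port/atomics.h"

    /*
     * Invented shared-memory struct; assume it lives in the parallel DSM
     * and its counter is initialized to zero when the DSM is set up.
     */
    typedef struct ParallelInsertShared
    {
        pg_atomic_uint64 processed;     /* tuples inserted by all workers */
    } ParallelInsertShared;

    /* Each worker publishes its local count when it finishes. */
    static void
    parallel_insert_report_processed(ParallelInsertShared *shared,
                                     uint64 worker_processed)
    {
        pg_atomic_add_fetch_u64(&shared->processed, worker_processed);
    }

    /*
     * After the workers exit, the leader folds their total into the
     * executor state backing the "INSERT 0 <row-count>" status.
     */
    static void
    parallel_insert_accumulate(ParallelInsertShared *shared, EState *estate)
    {
        estate->es_processed += pg_atomic_read_u64(&shared->processed);
    }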
On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Hi Hackers,
>
> Following on from Dilip Kumar's POC patch for allowing parallelism of
> the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC
> patch for allowing parallelism of both the INSERT and SELECT parts,
> where it can be allowed.
> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.
> Obviously I've had to update the planner, executor and
> parallel-worker code to make this happen, hopefully not breaking too
> many things along the way.

I feel this will be a very good performance improvement. +1 for this.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Hi,

On 2020-09-22 14:55:21 +1000, Greg Nancarrow wrote:
> Following on from Dilip Kumar's POC patch for allowing parallelism of
> the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC
> patch for allowing parallelism of both the INSERT and SELECT parts,
> where it can be allowed.

Cool!

I think it'd be good if you outlined what your approach is to make this safe.

> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.

I think it'd be good to do this part separately and first, independent of whether the insert part can be parallelized.

> Obviously I've had to update the planner, executor and
> parallel-worker code to make this happen, hopefully not breaking too
> many things along the way.

Hm, it looks like you've removed a fair bit of checks; it's not clear to me why that's safe in each instance.

> - Currently only for "INSERT INTO ... SELECT ...". To support "INSERT
> INTO ... VALUES ..." would need additional Table AM functions for
> dividing up the INSERT work amongst the workers (such functions
> currently only exist for scans).

Hm, not entirely following. What precisely are you thinking of here? I doubt it's really worth adding parallelism support for INSERT ... VALUES; the cost of spawning workers will almost always be higher than the benefit.

> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
>      TupleDesc   toasttupDesc;
>      Datum       t_values[3];
>      bool        t_isnull[3];
> -    CommandId   mycid = GetCurrentCommandId(true);
> +    CommandId   mycid = GetCurrentCommandId(!IsParallelWorker());
>      struct varlena *result;
>      struct varatt_external toast_pointer;
>      union

Hm? Why do we need this in the various places you have made this change?

> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 1585861..94c8507 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
>       * inserts in general except for the cases where inserts generate a new
>       * CommandId (eg. inserts into a table having a foreign key column).
>       */
> -    if (IsParallelWorker())
> -        ereport(ERROR,
> -                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> -                 errmsg("cannot insert tuples in a parallel worker")));
> -

I'm afraid that this weakens our checks more than I'd like. What if this ends up being invoked from inside C code?

> @@ -822,19 +822,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
>                  isdead = false;
>                  break;
>              case HEAPTUPLE_INSERT_IN_PROGRESS:
> -
>                  /*
>                   * Since we hold exclusive lock on the relation, normally the
>                   * only way to see this is if it was inserted earlier in our
>                   * own transaction.  However, it can happen in system
>                   * catalogs, since we tend to release write lock before commit
> -                 * there.  Give a warning if neither case applies; but in any
> -                 * case we had better copy it.
> +                 * there.  In any case we had better copy it.
>                   */
> -                if (!is_system_catalog &&
> -                    !TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
> -                    elog(WARNING, "concurrent insert in progress within table \"%s\"",
> -                         RelationGetRelationName(OldHeap));
> +
>                  /* treat as live */
>                  isdead = false;
>                  break;
> @@ -1434,16 +1429,11 @@ heapam_index_build_range_scan(Relation heapRelation,
>                       * the only way to see this is if it was inserted earlier
>                       * in our own transaction.  However, it can happen in
>                       * system catalogs, since we tend to release write lock
> -                     * before commit there.  Give a warning if neither case
> -                     * applies.
> +                     * before commit there.
>                       */
>                      xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
>                      if (!TransactionIdIsCurrentTransactionId(xwait))
>                      {
> -                        if (!is_system_catalog)
> -                            elog(WARNING, "concurrent insert in progress within table \"%s\"",
> -                                 RelationGetRelationName(heapRelation));
> -
>                          /*
>                           * If we are performing uniqueness checks, indexing
>                           * such a tuple could lead to a bogus uniqueness

Huh, I don't think this should be necessary?

> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
> index a4944fa..9d3f100 100644
> --- a/src/backend/access/transam/varsup.c
> +++ b/src/backend/access/transam/varsup.c
> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
>      TransactionId xid;
>
>      /*
> -     * Workers synchronize transaction state at the beginning of each parallel
> -     * operation, so we can't account for new XIDs after that point.
> -     */
> -    if (IsInParallelMode())
> -        elog(ERROR, "cannot assign TransactionIds during a parallel operation");
> -
> -    /*
>       * During bootstrap initialization, we return the special bootstrap
>       * transaction id.
>       */

Same thing, this code cannot just be allowed to be reachable. What prevents you from assigning two different xids from different workers etc?

> @@ -577,13 +608,6 @@ AssignTransactionId(TransactionState s)
>      Assert(s->state == TRANS_INPROGRESS);
>
>      /*
> -     * Workers synchronize transaction state at the beginning of each parallel
> -     * operation, so we can't account for new XIDs at this point.
> -     */
> -    if (IsInParallelMode() || IsParallelWorker())
> -        elog(ERROR, "cannot assign XIDs during a parallel operation");
> -
> -    /*
>       * Ensure parent(s) have XIDs, so that a child always has an XID later
>       * than its parent.  Mustn't recurse here, or we might get a stack
>       * overflow if we're at the bottom of a huge stack of subtransactions
>       */

Ditto.

Greetings,

Andres Freund
On Tue, Sep 22, 2020 at 4:56 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>  Gather  (cost=0.00..16.00 rows=9999 width=12) (actual
> time=69.870..70.310 rows=0 loops=1)
>    Workers Planned: 5
>    Workers Launched: 5
>    ->  Parallel Insert on primary_tbl  (cost=0.00..16.00 rows=500
> width=12) (actual time=59.948..59.949 rows=0 loops=6)

Nice. I took it for a quick spin. I was initially surprised to see Gather. I suppose I thought that Parallel {Insert|Update|Delete} might be a top level node itself, because in such a plan there is no need to gather tuples per se. I understand exactly why you have it that way though: Gather is needed to control workers and handle their errors etc, and we don't want to have to terminate parallelism anyway (thinking of some kind of plan with multiple write subqueries).
On Thu, Sep 24, 2020 at 7:57 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> Nice. I took it for a quick spin. I was initially surprised to see
> Gather. I suppose I thought that Parallel {Insert|Update|Delete}
> might be a top level node itself, because in such a plan there is no
> need to gather tuples per se. I understand exactly why you have it
> that way though: Gather is needed to control workers and handle their
> errors etc, and we don't want to have to terminate parallelism anyway
> (thinking of some kind of plan with multiple write subqueries).

I have not checked the patch, but I guess if we parallelise INSERTs with RETURNING then isn't it better to have the Gather node above the Parallel Inserts?

--
With Regards,
Amit Kapila.
On Thu, Sep 24, 2020 at 12:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I have not checked the patch, but I guess if we parallelise INSERTs
> with RETURNING then isn't it better to have the Gather node above the
> Parallel Inserts?

This is indeed the case with the patch applied. For example:

test=# explain insert into primary_tbl select * from third_tbl returning index, height;
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Gather  (cost=0.00..28.15 rows=9999 width=12)
   Workers Planned: 3
   ->  Parallel Insert on primary_tbl  (cost=0.00..28.15 rows=1040 width=12)
         ->  Parallel Seq Scan on third_tbl  (cost=0.00..87.25 rows=3225 width=12)
(4 rows)

test=# insert into primary_tbl select * from third_tbl returning index, height;
 index | height
-------+--------
     1 |    1.2
     2 |    1.2
     3 |    1.2
     4 |    1.2
     5 |    1.2
     6 |    1.2
     7 |    1.2
...
  9435 |    1.2
  9619 |    1.2
  9620 |    1.2
(9999 rows)

INSERT 0 9999

Regards,
Greg Nancarrow
Fujitsu Australia
On Thu, Sep 24, 2020 at 7:51 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2020-09-22 14:55:21 +1000, Greg Nancarrow wrote:
>
> > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> > index 1585861..94c8507 100644
> > --- a/src/backend/access/heap/heapam.c
> > +++ b/src/backend/access/heap/heapam.c
> > @@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
> >       * inserts in general except for the cases where inserts generate a new
> >       * CommandId (eg. inserts into a table having a foreign key column).
> >       */
> > -    if (IsParallelWorker())
> > -        ereport(ERROR,
> > -                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> > -                 errmsg("cannot insert tuples in a parallel worker")));
> > -
>
> I'm afraid that this weakens our checks more than I'd like.

I think we need to change/remove this check to allow inserts by parallel workers. I am not sure, but maybe we can add an Assert to ensure that it is safe to perform an insert via a parallel worker.

> What if this
> ends up being invoked from inside C code?

I think it shouldn't be a problem unless one is trying to do something like insert into a foreign key table. So, probably we can have an Assert to catch it if possible. Do you have any other idea?

--
With Regards,
Amit Kapila.
> > What if this
> > ends up being invoked from inside C code?
>
> I think it shouldn't be a problem unless one is trying to do something
> like insert into a foreign key table. So, probably we can have an Assert
> to catch it if possible. Do you have any other idea?

Note that the planner code updated by the patch does avoid creating a Parallel INSERT plan in the case of inserting into a table with a foreign key (so commandIds won't be created in the parallel-worker code).
I'm not sure how to distinguish the "invoked from inside C code" case though.

Regards,
Greg Nancarrow
Fujitsu Australia
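As a sketch of what such an Assert might look like inside heap_prepare_insert() - the helper name below is invented, since the thread has not settled on an actual test:

    static void
    heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
                        CommandId cid, int options)
    {
        /*
         * Hypothetical replacement for the removed hard error: allow
         * parallel-worker inserts, but assert that the target cannot
         * generate new command IDs.  TargetRelRequiresNewCommandId() is
         * an invented name standing in for whatever check proves
         * practical (e.g. "does this relation have foreign-key
         * triggers?").
         */
        Assert(!IsParallelWorker() ||
               !TargetRelRequiresNewCommandId(relation));

        /* ... existing body ... */
    }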
Hi Andres,

On Thu, Sep 24, 2020 at 12:21 PM Andres Freund <andres@anarazel.de> wrote:
>
> I think it'd be good if you outlined what your approach is to make this
> safe.

Some prior work has already been done to establish the necessary infrastructure to allow parallel INSERTs, in general, to be safe, except for cases where new commandIds would be generated in the parallel-worker code (such as inserts into a table having a foreign key) - these cases need to be avoided. See the following commits:

85f6b49 Allow relation extension lock to conflict among parallel group members
3ba59cc Allow page lock to conflict among parallel group members

The planner code updated by the patch avoids creating a Parallel INSERT plan in the case of inserting into a table that has a foreign key.

>> For cases where it can't be allowed (e.g. INSERT into a table with
>> foreign keys, or "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
>> ...") it at least allows parallelism of the SELECT part.
>
> I think it'd be good to do this part separately and first, independent
> of whether the insert part can be parallelized.

OK then, I'll try to extract that as a separate patch.

>> Obviously I've had to update the planner, executor and
>> parallel-worker code to make this happen, hopefully not breaking too
>> many things along the way.
>
> Hm, it looks like you've removed a fair bit of checks; it's not clear to
> me why that's safe in each instance.

It should be safe for Parallel INSERT - but you are right, these are brute-force removals (for the purpose of a POC patch) that should be tightened up wherever possible to disallow unsafe paths into that code. Problem is, currently there's not a lot of context information available to easily allow that, so some work needs to be done.

>> - Currently only for "INSERT INTO ... SELECT ...". To support "INSERT
>> INTO ... VALUES ..." would need additional Table AM functions for
>> dividing up the INSERT work amongst the workers (such functions
>> currently only exist for scans).
>
> Hm, not entirely following. What precisely are you thinking of here?

All I was saying is that for SELECTs, the work done by each parallel worker is effectively divided up by parallel-worker-related functions in tableam.c and indexam.c, and no such technology currently exists for dividing up work for the "INSERT ... VALUES" case.

> I doubt it's really worth adding parallelism support for INSERT
> ... VALUES; the cost of spawning workers will almost always be higher
> than the benefit.

You're probably right in doubting any benefit, but I wasn't entirely sure.

>> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
>>      TupleDesc   toasttupDesc;
>>      Datum       t_values[3];
>>      bool        t_isnull[3];
>> -    CommandId   mycid = GetCurrentCommandId(true);
>> +    CommandId   mycid = GetCurrentCommandId(!IsParallelWorker());
>>      struct varlena *result;
>>      struct varatt_external toast_pointer;
>>      union
>
> Hm? Why do we need this in the various places you have made this change?

It's because for Parallel INSERT, we're assigning the same command-id to each worker up-front during worker initialization (the commandId has been retrieved by the leader and passed through to each worker) and "currentCommandIdUsed" has been set true. See the AssignCommandIdForWorker() function in the patch.
If you look at the code of GetCurrentCommandId(), you'll see it Assert that it's not being run by a parallel worker if the parameter is true. I didn't want to remove yet another check, without being able to know the context of the caller, because only for Parallel INSERT do I know that "currentCommandIdUsed was already true at the start of the parallel operation". See the comment in that function. Anyway, that's why I'm passing "false" to the relevant GetCurrentCommandId() calls if they're being run by a parallel (INSERT) worker.

>> @@ -822,19 +822,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
>>                  isdead = false;
>>                  break;
>>              case HEAPTUPLE_INSERT_IN_PROGRESS:
>> -
>>                  /*
>>                   * Since we hold exclusive lock on the relation, normally the
>>                   * only way to see this is if it was inserted earlier in our
>>                   * own transaction.  However, it can happen in system
>>                   * catalogs, since we tend to release write lock before commit
>> -                 * there.  Give a warning if neither case applies; but in any
>> -                 * case we had better copy it.
>> +                 * there.  In any case we had better copy it.
>>                   */
>> -                if (!is_system_catalog &&
>> -                    !TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
>> -                    elog(WARNING, "concurrent insert in progress within table \"%s\"",
>> -                         RelationGetRelationName(OldHeap));
>> +
>>                  /* treat as live */
>>                  isdead = false;
>>                  break;
>> @@ -1434,16 +1429,11 @@ heapam_index_build_range_scan(Relation heapRelation,
>>                       * the only way to see this is if it was inserted earlier
>>                       * in our own transaction.  However, it can happen in
>>                       * system catalogs, since we tend to release write lock
>> -                     * before commit there.  Give a warning if neither case
>> -                     * applies.
>> +                     * before commit there.
>>                       */
>>                      xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
>>                      if (!TransactionIdIsCurrentTransactionId(xwait))
>>                      {
>> -                        if (!is_system_catalog)
>> -                            elog(WARNING, "concurrent insert in progress within table \"%s\"",
>> -                                 RelationGetRelationName(heapRelation));
>> -
>>                          /*
>>                           * If we are performing uniqueness checks, indexing
>>                           * such a tuple could lead to a bogus uniqueness
>
> Huh, I don't think this should be necessary?

Yes, I think you're right, I perhaps got carried away removing checks on concurrent inserts. I will revert those changes.

>> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
>> index a4944fa..9d3f100 100644
>> --- a/src/backend/access/transam/varsup.c
>> +++ b/src/backend/access/transam/varsup.c
>> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
>>      TransactionId xid;
>>
>>      /*
>> -     * Workers synchronize transaction state at the beginning of each parallel
>> -     * operation, so we can't account for new XIDs after that point.
>> -     */
>> -    if (IsInParallelMode())
>> -        elog(ERROR, "cannot assign TransactionIds during a parallel operation");
>> -
>> -    /*
>>       * During bootstrap initialization, we return the special bootstrap
>>       * transaction id.
>>       */
>
> Same thing, this code cannot just be allowed to be reachable. What
> prevents you from assigning two different xids from different workers
> etc?

At least in the case of Parallel INSERT, the leader for the Parallel INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed through and assigned to each of the workers during their initialization (so they are assigned the same xid).

Regards,
Greg Nancarrow
Fujitsu Australia
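For readers following along, the function being discussed looks like this upstream (paraphrased from src/backend/access/transam/xact.c at the time of this thread):

    CommandId
    GetCurrentCommandId(bool used)
    {
        /* this is global to a transaction, not subtransaction-local */
        if (used)
        {
            /*
             * Forbid setting currentCommandIdUsed in a parallel worker,
             * because we have no provision for communicating this back
             * to the leader.  We could relax this restriction when
             * currentCommandIdUsed was already true at the start of the
             * parallel operation.
             */
            Assert(!IsParallelWorker());
            currentCommandIdUsed = true;
        }
        return currentCommandId;
    }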
On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.

Thanks Greg for the patch.

I have a few points (inspired from the parallel copy feature work) to mention:

1. What if the target table is a foreign table or partitioned table?
2. What happens if the target table has triggers (before statement, after statement, before row, after row) that are parallel unsafe?
3. Will each worker be doing single-row insertions or multi-inserts? If single-row insertions, will the buffer lock contention be more?
4. How does it behave with toast column values?
5. How does it behave if we have a RETURNING clause with INSERT INTO SELECT?

I'm looking forward to seeing some initial numbers on execution times with and without the patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Sep 25, 2020 at 7:01 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> I have a few points (inspired from the parallel copy feature work) to mention:
>
> 1. What if the target table is a foreign table or partitioned table?
> 2. What happens if the target table has triggers (before statement,
> after statement, before row, after row) that are parallel unsafe?
> 3. Will each worker be doing single-row insertions or multi-inserts?
> If single-row insertions, will the buffer lock contention be more?
> 4. How does it behave with toast column values?
> 5. How does it behave if we have a RETURNING clause with INSERT INTO SELECT?

Hi Bharath,

Thanks for pointing out more cases I need to exclude and things I need to investigate further. I have taken note of them, and will do more testing and improvement.

At least the RETURNING clause with INSERT INTO SELECT is working!

Regards,
Greg Nancarrow
Fujitsu Australia
On Fri, Sep 25, 2020 at 10:02 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Hi Andres,
>
> On Thu, Sep 24, 2020 at 12:21 PM Andres Freund <andres@anarazel.de> wrote:
> >
> >> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
> >>      TupleDesc   toasttupDesc;
> >>      Datum       t_values[3];
> >>      bool        t_isnull[3];
> >> -    CommandId   mycid = GetCurrentCommandId(true);
> >> +    CommandId   mycid = GetCurrentCommandId(!IsParallelWorker());
> >>      struct varlena *result;
> >>      struct varatt_external toast_pointer;
> >>      union
> >
> > Hm? Why do we need this in the various places you have made this change?
>
> It's because for Parallel INSERT, we're assigning the same command-id
> to each worker up-front during worker initialization (the commandId
> has been retrieved by the leader and passed through to each worker)
> and "currentCommandIdUsed" has been set true. See the
> AssignCommandIdForWorker() function in the patch.
> If you look at the code of GetCurrentCommandId(), you'll see it Assert
> that it's not being run by a parallel worker if the parameter is true.
> I didn't want to remove yet another check, without being able to know
> the context of the caller, because only for Parallel INSERT do I know
> that "currentCommandIdUsed was already true at the start of the
> parallel operation". See the comment in that function. Anyway, that's
> why I'm passing "false" to the relevant GetCurrentCommandId() calls if
> they're being run by a parallel (INSERT) worker.
>

But we can tighten the condition in GetCurrentCommandId() such that it Asserts for a parallel worker only when currentCommandIdUsed was not set before the start of the parallel operation. I also find these changes in the callers of GetCurrentCommandId() quite ad hoc and ugly, even if they are correct. Also, why don't we face a similar problem for parallel copy?

> >> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
> >> index a4944fa..9d3f100 100644
> >> --- a/src/backend/access/transam/varsup.c
> >> +++ b/src/backend/access/transam/varsup.c
> >> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
> >>      TransactionId xid;
> >>
> >>      /*
> >> -     * Workers synchronize transaction state at the beginning of each parallel
> >> -     * operation, so we can't account for new XIDs after that point.
> >> -     */
> >> -    if (IsInParallelMode())
> >> -        elog(ERROR, "cannot assign TransactionIds during a parallel operation");
> >> -
> >> -    /*
> >>       * During bootstrap initialization, we return the special bootstrap
> >>       * transaction id.
> >>       */
> >
> > Same thing, this code cannot just be allowed to be reachable. What
> > prevents you from assigning two different xids from different workers
> > etc?
>
> At least in the case of Parallel INSERT, the leader for the Parallel
> INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed
> through and assigned to each of the workers during their
> initialization (so they are assigned the same xid).
>

So are you facing problems in this area because we EnterParallelMode before even assigning the xid in the leader? Because I don't think we should ever reach this code in the worker. If so, there are two possibilities that come to my mind: (a) assign the xid in the leader before entering parallel mode, or (b) change the check so that we don't assign a new xid in workers. In this case, I am again wondering how parallel copy deals with this?

--
With Regards,
Amit Kapila.
On Fri, Sep 25, 2020 at 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > At least in the case of Parallel INSERT, the leader for the Parallel
> > INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed
> > through and assigned to each of the workers during their
> > initialization (so they are assigned the same xid).
>
> So are you facing problems in this area because we EnterParallelMode
> before even assigning the xid in the leader? Because I don't think we
> should ever reach this code in the worker. If so, there are two
> possibilities that come to my mind: (a) assign the xid in the leader
> before entering parallel mode, or (b) change the check so that we
> don't assign a new xid in workers. In this case, I am again wondering
> how parallel copy deals with this?

In parallel copy, we are doing option (a), i.e. the leader gets the full txn id before entering parallel mode and passes it to all workers.

In the leader:
    full_transaction_id = GetCurrentFullTransactionId();
    EnterParallelMode();
    shared_info_ptr->full_transaction_id = full_transaction_id;

In the workers:
    AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);

Hence the below part of the code doesn't get hit:

    if (IsInParallelMode() || IsParallelWorker())
        elog(ERROR, "cannot assign XIDs during a parallel operation");

We also deal with the commandId similarly, i.e. the leader gets the command id, and workers use it while inserting.

In the leader:
    shared_info_ptr->mycid = GetCurrentCommandId(true);

In the workers:
    AssignCommandIdForWorker(pcshared_info->mycid, true);

[1]
void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
{
    TransactionState s = CurrentTransactionState;

    Assert((IsInParallelMode() || IsParallelWorker()));
    s->fullTransactionId = fullTransactionId;
}

void AssignCommandIdForWorker(CommandId commandId, bool used)
{
    Assert((IsInParallelMode() || IsParallelWorker()));

    /* this is global to a transaction, not subtransaction-local */
    if (used)
        currentCommandIdUsed = true;
    currentCommandId = commandId;
}

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> But we can tighten the condition in GetCurrentCommandId() such that it
> Asserts for a parallel worker only when currentCommandIdUsed was not set
> before the start of the parallel operation. I also find these changes in
> the callers of GetCurrentCommandId() quite ad hoc and ugly, even if they
> are correct. Also, why don't we face a similar problem for parallel copy?

For Parallel INSERT, GetCurrentCommandId(true) is called as part of query plan execution for the INSERT statement. Parallel Copy of course doesn't have to deal with this. That's why there's a difference. And also, Parallel Copy has its own parallel entry point (ParallelCopyMain), so it's in full control; it's not trying to fit in with the infrastructure for plan execution.

> So are you facing problems in this area because we EnterParallelMode
> before even assigning the xid in the leader? Because I don't think we
> should ever reach this code in the worker. If so, there are two
> possibilities that come to my mind: (a) assign the xid in the leader
> before entering parallel mode, or (b) change the check so that we
> don't assign a new xid in workers. In this case, I am again wondering
> how parallel copy deals with this?

Again, there's a fundamental difference in the Parallel INSERT case. Right at the top of ExecutePlan() it calls EnterParallelMode(). For Parallel Copy, there is no such problem: EnterParallelMode() is only called just before ParallelCopyMain() is called, so it can easily acquire the xid before this, because parallel mode is not yet set.

As it turns out, I think I have solved the commandId issue (and almost the xid issue) by realising that both the xid and cid are ALREADY included as part of the serialized transaction state in the Parallel DSM. So actually I don't believe there is any need for separately passing them in the DSM and having to use those AssignXXXXForWorker() functions in the worker code - not even in the Parallel Copy case (? - need to check). GetCurrentCommandId(true) and GetCurrentFullTransactionId() need to be called prior to Parallel DSM initialization, so that they are included in the serialized transaction state.

I just needed to add a function to set currentCommandIdUsed=true in the worker initialization (for the INSERT case) and make a small tweak to the Assert in GetCurrentCommandId() to ensure that currentCommandIdUsed, in a parallel worker, never gets set to true when it is false. This is in line with the comment in that function, because we know that "currentCommandIdUsed was already true at the start of the parallel operation". With this in place, I don't need to change any of the original calls to GetCurrentCommandId(), so this addresses that issue raised by Andres.

I am not sure yet how to get past the issue of parallel mode being set at the top of ExecutePlan(). With that in place, it doesn't allow an xid to be acquired for the leader without removing/changing that parallel-mode check in GetNewTransactionId().

Regards,
Greg Nancarrow
Fujitsu Australia
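Reconstructed from that description (not copied from the patch), the two xact.c changes might look like this:

    /*
     * Sketch of the worker-startup helper described above: a parallel
     * INSERT worker marks its restored command ID as used, since the
     * leader had already done so before the parallel operation began.
     */
    void
    SetCurrentCommandIdUsedForWorker(void)
    {
        Assert(IsParallelWorker() && !currentCommandIdUsed &&
               currentCommandId != InvalidCommandId);
        currentCommandIdUsed = true;
    }

with GetCurrentCommandId()'s check relaxed along the lines of

    /* fire only if a parallel worker would flip the flag false->true */
    Assert(!(IsParallelWorker() && !currentCommandIdUsed));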
On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> As it turns out, I think I have solved the commandId issue (and almost
> the xid issue) by realising that both the xid and cid are ALREADY
> included as part of the serialized transaction state in the
> Parallel DSM. So actually I don't believe there is any need for
> separately passing them in the DSM, and having to use those
> AssignXXXXForWorker() functions in the worker code - not even in the
> Parallel Copy case (? - need to check). GetCurrentCommandId(true) and
> GetCurrentFullTransactionId() need to be called prior to Parallel DSM
> initialization, so that they are included in the serialized transaction
> state.
> I just needed to add a function to set currentCommandIdUsed=true in
> the worker initialization (for the INSERT case) and make a small tweak to
> the Assert in GetCurrentCommandId() to ensure that
> currentCommandIdUsed, in a parallel worker, never gets set to true
> when it is false. This is in line with the comment in that function,
> because we know that "currentCommandIdUsed was already true at the start
> of the parallel operation". With this in place, I don't need to change
> any of the original calls to GetCurrentCommandId(), so this addresses
> that issue raised by Andres.
>
> I am not sure yet how to get past the issue of parallel mode being
> set at the top of ExecutePlan(). With that in place, it doesn't allow
> an xid to be acquired for the leader, without removing/changing that
> parallel-mode check in GetNewTransactionId().

I think now there is no fundamental problem in allocating the xid in the leader and then sharing it with the workers, who can use it to perform the insert. So we can probably tweak that check so that it is true only for parallel workers.

--
With Regards,
Amit Kapila.
On Fri, Sep 25, 2020 at 2:31 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > For cases where it can't be allowed (e.g. INSERT into a table with
> > foreign keys, or "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> > ...") it at least allows parallelism of the SELECT part.
>
> Thanks Greg for the patch.
>
> 2. What happens if the target table has triggers (before statement,
> after statement, before row, after row) that are parallel unsafe?

In such a case, the parallel insert shouldn't be selected. However, we should still be able to execute the SELECT part in parallel.

> 3. Will each worker be doing single-row insertions or multi-inserts?
> If single-row insertions, will the buffer lock contention be more?

I don't think the purpose of this patch is to change the basic flow of how INSERT works, and I am not sure it is worth the effort either. I have answered this earlier in a bit more detail [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1Ks8Sqs29VHPS6koNj5E9YQdkGCzgGsSrQMeUbQfe28yg%40mail.gmail.com

--
With Regards,
Amit Kapila.
On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Again, there's a fundamental difference in the Parallel INSERT case.
> Right at the top of ExecutePlan() it calls EnterParallelMode().
> For Parallel Copy, there is no such problem: EnterParallelMode() is
> only called just before ParallelCopyMain() is called, so it can easily
> acquire the xid before this, because parallel mode is not yet set.
>
> As it turns out, I think I have solved the commandId issue (and almost
> the xid issue) by realising that both the xid and cid are ALREADY
> included as part of the serialized transaction state in the
> Parallel DSM. So actually I don't believe there is any need for
> separately passing them in the DSM, and having to use those
> AssignXXXXForWorker() functions in the worker code - not even in the
> Parallel Copy case (? - need to check).

Thanks Greg for the detailed points.

I further checked on the full txn id and command id. Yes, these are getting passed to workers via InitializeParallelDSM() -> SerializeTransactionState(). I tried to summarize what we need to do in case of parallel inserts in general, i.e. parallel COPY, parallel inserts in INSERT INTO, and parallel inserts in CTAS.

In the leader:
    GetCurrentFullTransactionId()
    GetCurrentCommandId(true)
    EnterParallelMode();
    InitializeParallelDSM() --> calls SerializeTransactionState()
    (both full txn id and command id are serialized into the parallel DSM)

In the workers:
    ParallelWorkerMain() --> calls StartParallelWorkerTransaction()
    (both full txn id and command id are restored into the workers'
    CurrentTransactionState->fullTransactionId and currentCommandId)

If the parallel workers are meant for insertions, then we need to set currentCommandIdUsed = true. Maybe we can lift the Assert in GetCurrentCommandId(); if we don't want to touch that function, then we can have a new function GetCurrentCommandIdInWorker() whose functionality will be the same as GetCurrentCommandId() without the Assert(!IsParallelWorker()).

Am I missing something?

If the above points are true, we might have to update the parallel copy patch set, test the use cases, and post separately in the parallel copy thread in the coming days.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
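Expressed as code, the leader-side ordering in that summary might look like the following sketch. The worker entry-point name and the local declarations are invented for the example, and - as the next reply notes - the INSERT patch itself cannot follow this ordering directly, because it enters parallel mode earlier:

    /* Sketch of the COPY-style leader sequence summarized above. */
    FullTransactionId fxid;     /* kept in CurrentTransactionState */
    ParallelContext *pcxt;
    int         nworkers = 4;   /* example value */

    /* Acquire the xid and command ID before entering parallel mode... */
    fxid = GetCurrentFullTransactionId();
    (void) GetCurrentCommandId(true);
    (void) fxid;

    /*
     * ...so that InitializeParallelDSM()'s SerializeTransactionState()
     * captures them for the workers to restore at startup.
     * "ParallelInsertWorkerMain" is a hypothetical entry point.
     */
    EnterParallelMode();
    pcxt = CreateParallelContext("postgres", "ParallelInsertWorkerMain",
                                 nworkers);
    InitializeParallelDSM(pcxt);
    LaunchParallelWorkers(pcxt);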
On Sat, Sep 26, 2020 at 11:00 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I further checked on the full txn id and command id. Yes, these are
> getting passed to workers via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general, i.e. parallel COPY, parallel
> inserts in INSERT INTO, and parallel inserts in CTAS.
>
> In the leader:
>     GetCurrentFullTransactionId()
>     GetCurrentCommandId(true)
>     EnterParallelMode();
>     InitializeParallelDSM() --> calls SerializeTransactionState()
>     (both full txn id and command id are serialized into the parallel DSM)

This won't be true for the Parallel INSERT patch, as explained by Greg as well, because we enter parallel mode much before we assign the xid.

--
With Regards,
Amit Kapila.
On Fri, Sep 25, 2020 at 9:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > > What if this
> > > ends up being invoked from inside C code?
> >
> > I think it shouldn't be a problem unless one is trying to do something
> > like insert into a foreign key table. So, probably we can have an Assert
> > to catch it if possible. Do you have any other idea?
>
> Note that the planner code updated by the patch does avoid creating a
> Parallel INSERT plan in the case of inserting into a table with a
> foreign key (so commandIds won't be created in the parallel-worker
> code).
> I'm not sure how to distinguish the "invoked from inside C code" case though.

I think, if possible, we can have an Assert to check whether it is a parallel worker and the relation has a foreign key. Similarly, we can enhance the check for any other unsafe use. This will prevent illegal usage of inserts via parallel workers.

--
With Regards,
Amit Kapila.
On Wed, Sep 23, 2020 at 2:21 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > - When INSERTs are made parallel, currently the reported row-count in
> > the "INSERT 0 <row-count>" status only reflects the rows that the
> > leader has processed (not the workers) - so it is obviously less than
> > the actual number of rows inserted.
>
> Attached an updated patch which fixes this issue (for parallel
> INSERTs, each worker's processed tuple count is communicated in shared
> memory back to the leader, where it is added to the global
> "es_processed" count).

I noticed that we do not have any check for skipping temporary table insertion. The patch's planner comment reads:

    /* Check if the target relation has foreign keys; if so, avoid
     * creating a parallel Insert plan (because inserting into
     * such tables would result in creation of new CommandIds, and
     * this isn't supported by parallel workers).
     * Similarly, avoid creating a parallel Insert plan if ON
     * CONFLICT ... DO UPDATE ... has been specified, because
     * parallel UPDATE is not supported.
     * However, do allow any underlying query to be run by parallel
     * workers in these cases.
     */

You should also include a temporary tables check here, as parallel workers might not have access to temporary tables.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Sun, Sep 27, 2020 at 2:03 AM vignesh C <vignesh21@gmail.com> wrote:
>
> I noticed that we do not have any check for skipping temporary
> table insertion.
>
> You should also include a temporary tables check here, as parallel
> workers might not have access to temporary tables.

Thanks Vignesh, you are right, I need to test this and add it to the list of further exclusions that the patch needs to check for.
Hopefully I can provide an updated patch soon that caters for these additional identified cases.

Regards,
Greg Nancarrow
Fujitsu Australia
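For illustration, the kind of exclusion being requested might sit next to the patch's existing foreign-key test in the planner; a sketch under that assumption, where "rel" stands for the target relation's relcache entry and the flag name is invented:

    /*
     * Hypothetical addition to the parallel-insert safety checks:
     * parallel workers cannot access the leader's temporary tables
     * (their pages live in backend-local buffers), so a temporary
     * target relation must fall back to a non-parallel INSERT plan.
     */
    if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
        parallel_insert_safe = false;   /* invented flag name */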
On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> I further checked on the full txn id and command id. Yes, these are
> getting passed to workers via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general, i.e. parallel COPY, parallel
> inserts in INSERT INTO, and parallel inserts in CTAS.
> [...]
>
> Am I missing something?
>
> If the above points are true, we might have to update the parallel
> copy patch set, test the use cases, and post separately in the parallel
> copy thread in the coming days.

Hi Bharath,

I pretty much agree with your above points.

I've attached an updated Parallel INSERT...SELECT patch, that:
- Only uses the existing transaction-state serialization support for transfer of the xid and cid.
- Adds a "SetCurrentCommandIdUsedForWorker" function, for setting currentCommandIdUsed=true at the start of a parallel operation (used for the Parallel INSERT case, where we know the currentCommandId has been assigned to the worker at the start of the parallel operation).
- Tweaks the Assert condition within the "used=true" parameter case in GetCurrentCommandId(), so that it only fires if in a parallel worker and currentCommandIdUsed is false - refer to the updated comment in that function.
- Does not modify any existing GetCurrentCommandId() calls.
- Does not remove any existing parallel-related asserts/checks, except for the "cannot insert tuples in a parallel worker" error in heap_prepare_insert(). I am still considering what to do with the original error-check here.
[- Does not yet cater for the other exclusion cases that you and Vignesh have pointed out]

This patch is mostly a lot cleaner, but does contain a possible ugly hack, in that where it needs to call GetCurrentFullTransactionId(), it must temporarily escape parallel mode (recalling that parallel mode is set true right at the top of ExecutePlan() in the cases of Parallel INSERT/SELECT).

Regards,
Greg Nancarrow
Fujitsu Australia
On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > [...]
>
> I've attached an updated Parallel INSERT...SELECT patch, that:
> - Only uses the existing transaction-state serialization support for
> transfer of the xid and cid.
> - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> currentCommandIdUsed=true at the start of a parallel operation (used
> for the Parallel INSERT case, where we know the currentCommandId has
> been assigned to the worker at the start of the parallel operation).
> - Tweaks the Assert condition within the "used=true" parameter case in
> GetCurrentCommandId(), so that it only fires if in a parallel worker
> and currentCommandIdUsed is false - refer to the updated comment in
> that function.
> - Does not modify any existing GetCurrentCommandId() calls.
> - Does not remove any existing parallel-related asserts/checks, except
> for the "cannot insert tuples in a parallel worker" error in
> heap_prepare_insert(). I am still considering what to do with the
> original error-check here.
>
> This patch is mostly a lot cleaner, but does contain a possible ugly
> hack, in that where it needs to call GetCurrentFullTransactionId(), it
> must temporarily escape parallel mode (recalling that parallel mode is
> set true right at the top of ExecutePlan() in the cases of Parallel
> INSERT/SELECT).

Thanks Greg.

In general, I see a few things common to all parallel insert cases (CTAS [1], COPY [2], INSERT INTO SELECT):
1. Removal of the "cannot insert tuples in a parallel worker" restriction from heap_prepare_insert().
2. Each worker should be able to set currentCommandIdUsed to true.
3. The change you proposed to make in GetCurrentCommandId()'s Assert condition.

Please add anything I have missed.

Common solutions to each of the above points would be beneficial to all the parallel insert cases. How about having a common thread, discussion and a common patch for all 3 points?

@Amit Kapila @Greg Nancarrow @vignesh C Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACWj%2B3H5TQqwxANZmdePEnSNxk-YAeT1c5WE184Gf75XUw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1%2BkpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV%3Dg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 28, 2020 at 4:06 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> In general, I see a few things common to all parallel insert
> cases (CTAS [1], COPY [2], INSERT INTO SELECT):
> 1. Removal of the "cannot insert tuples in a parallel worker"
> restriction from heap_prepare_insert().
> 2. Each worker should be able to set currentCommandIdUsed to true.
> 3. The change you proposed to make in GetCurrentCommandId()'s Assert
> condition.
>
> Please add anything I have missed.
>
> Common solutions to each of the above points would be beneficial to
> all the parallel insert cases. How about having a common thread,
> discussion and a common patch for all 3 points?

I am not sure that is required at this stage; let's first sort out the other parts of the design, because there could be other, bigger problems which we have not thought about yet. I have already shared some thoughts on those points in this thread, so let's first get that done and have the basic patch ready; then, if required, we can discuss these points in detail in another thread.

--
With Regards,
Amit Kapila.
On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> I've attached an updated Parallel INSERT...SELECT patch, that:
> [...]
>
> This patch is mostly a lot cleaner, but does contain a possible ugly
> hack, in that where it needs to call GetCurrentFullTransactionId(), it
> must temporarily escape parallel mode (recalling that parallel mode is
> set true right at the top of ExecutePlan() in the cases of Parallel
> INSERT/SELECT).

I think you still need to work on the costing part. Basically, if we are parallelizing the whole insert then the plan is like below:

-> Gather
    -> Parallel Insert
        -> Parallel Seq Scan

That means the tuples we are selecting via the scan are not sent back to the Gather node, so in cost_gather() we need to check whether it is for an INSERT; if so, no rows are transferred through the parallel tuple queue, which means we need not pay any parallel tuple cost.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Sep 29, 2020 at 8:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > I further checked on full txn id and command id. Yes, these are > > > getting passed to workers via InitializeParallelDSM() -> > > > SerializeTransactionState(). I tried to summarize what we need to do > > > in case of parallel inserts in general i.e. parallel COPY, parallel > > > inserts in INSERT INTO and parallel inserts in CTAS. > > > > > > In the leader: > > > GetCurrentFullTransactionId() > > > GetCurrentCommandId(true) > > > EnterParallelMode(); > > > InitializeParallelDSM() --> calls SerializeTransactionState() > > > (both full txn id and command id are serialized into parallel DSM) > > > > > > In the workers: > > > ParallelWorkerMain() --> calls StartParallelWorkerTransaction() (both > > > full txn id and command id are restored into workers' > > > CurrentTransactionState->fullTransactionId and currentCommandId) > > > If the parallel workers are meant for insertions, then we need to set > > > currentCommandIdUsed = true; Maybe we can lift the assert in > > > GetCurrentCommandId(), if we don't want to touch that function, then > > > we can have a new function GetCurrentCommandidInWorker() whose > > > functionality will be same as GetCurrentCommandId() without the > > > Assert(!IsParallelWorker());. > > > > > > Am I missing something? > > > > > > If the above points are true, we might have to update the parallel > > > copy patch set, test the use cases and post separately in the parallel > > > copy thread in coming days. > > > > > > > Hi Bharath, > > > > I pretty much agree with your above points. > > > > I've attached an updated Parallel INSERT...SELECT patch, that: > > - Only uses existing transaction state serialization support for > > transfer of xid and cid. > > - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting > > currentCommandIdUsed=true at the start of a parallel operation (used > > for Parallel INSERT case, where we know the currentCommandId has been > > assigned to the worker at the start of the parallel operation). > > - Tweaks the Assert condition within "used=true" parameter case in > > GetCurrentCommandId(), so that it only fires if in a parallel worker > > and currentCommandId is false - refer to the updated comment in that > > function. > > - Does not modify any existing GetCurrentCommandId() calls. > > - Does not remove any existing parallel-related asserts/checks, except > > for the "cannot insert tuples in a parallel worker" error in > > heap_prepare_insert(). I am still considering what to do with the > > original error-check here. > > [- Does not yet cater for other exclusion cases that you and Vignesh > > have pointed out] > > > > This patch is mostly a lot cleaner, but does contain a possible ugly > > hack, in that where it needs to call GetCurrentFullTransactionId(), it > > must temporarily escape parallel-mode (recalling that parallel-mode is > > set true right at the top of ExectePlan() in the cases of Parallel > > INSERT/SELECT). 
> > I think you still need to work on the costing part, basically if we > are parallelizing whole insert then plan is like below > > -> Gather > -> Parallel Insert > -> Parallel Seq Scan > > That means the tuple we are selecting via scan are not sent back to > the gather node, so in cost_gather we need to see if it is for the > INSERT then there is no row transferred through the parallel queue > that mean we need not to pay any parallel tuple cost. I just looked into the parallel CTAS[1] patch for the same thing, and I can see in that patch it is being handled. [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
> > > > I think you still need to work on the costing part, basically if we > > are parallelizing whole insert then plan is like below > > > > -> Gather > > -> Parallel Insert > > -> Parallel Seq Scan > > > > That means the tuple we are selecting via scan are not sent back to > > the gather node, so in cost_gather we need to see if it is for the > > INSERT then there is no row transferred through the parallel queue > > that mean we need not to pay any parallel tuple cost. > > I just looked into the parallel CTAS[1] patch for the same thing, and > I can see in that patch it is being handled. > > [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com > Hi Dilip, You're right, the costing for Parallel Insert is not yet finished; I'm still working on it, and haven't posted an updated patch for it yet. As far as the cost_gather() method is concerned, for Parallel INSERT it can probably use the same costing approach as the CTAS patch, except in the case of a specified RETURNING clause. Regards, Greg Nancarrow Fujitsu Australia
On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > > > I think you still need to work on the costing part, basically if we > > > are parallelizing whole insert then plan is like below > > > > > > -> Gather > > > -> Parallel Insert > > > -> Parallel Seq Scan > > > > > > That means the tuple we are selecting via scan are not sent back to > > > the gather node, so in cost_gather we need to see if it is for the > > > INSERT then there is no row transferred through the parallel queue > > > that mean we need not to pay any parallel tuple cost. > > > > I just looked into the parallel CTAS[1] patch for the same thing, and > > I can see in that patch it is being handled. > > > > [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com > > > > Hi Dilip, > > You're right, the costing for Parallel Insert is not done and > finished, I'm still working on the costing, and haven't posted an > updated patch for it yet. Okay. > As far as cost_gather() method is concerned, for Parallel INSERT, it > can probably use the same costing approach as the CTAS patch except in > the case of a specified RETURNING clause. Yeah right. I did not think about the returning part. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
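For illustration, here is a rough C sketch of the cost_gather() idea agreed on above. The "tuples_returned_to_leader" field and the function name are hypothetical; the real cost_gather() also handles row estimates for parameterized paths.

#include "postgres.h"
#include "nodes/pathnodes.h"
#include "optimizer/cost.h"

static void
cost_gather_sketch(GatherPath *path)
{
    Cost        startup_cost;
    Cost        run_cost;

    /* Start from the subpath's costs, as the real cost_gather() does */
    startup_cost = path->subpath->startup_cost;
    run_cost = path->subpath->total_cost - path->subpath->startup_cost;

    /* One-off cost of launching the parallel workers */
    startup_cost += parallel_setup_cost;

    /*
     * Charge for pushing tuples through the parallel queue only when
     * tuples actually flow back to the leader: always for a parallel
     * SELECT, but for a Parallel Insert only with a RETURNING clause.
     */
    if (path->tuples_returned_to_leader)    /* hypothetical field */
        run_cost += parallel_tuple_cost * path->path.rows;

    path->path.startup_cost = startup_cost;
    path->path.total_cost = startup_cost + run_cost;
}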
On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > > > I think you still need to work on the costing part, basically if we > > > are parallelizing whole insert then plan is like below > > > > > > -> Gather > > > -> Parallel Insert > > > -> Parallel Seq Scan > > > > > > That means the tuple we are selecting via scan are not sent back to > > > the gather node, so in cost_gather we need to see if it is for the > > > INSERT then there is no row transferred through the parallel queue > > > that mean we need not to pay any parallel tuple cost. > > > > I just looked into the parallel CTAS[1] patch for the same thing, and > > I can see in that patch it is being handled. > > > > [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com > > > > Hi Dilip, > > You're right, the costing for Parallel Insert is not done and > finished, I'm still working on the costing, and haven't posted an > updated patch for it yet. > As far as cost_gather() method is concerned, for Parallel INSERT, it > can probably use the same costing approach as the CTAS patch except in > the case of a specified RETURNING clause. > I have one question which is common to both this patch and parallel inserts in CTAS[1]: do we need to skip creating the tuple queues (ExecParallelSetupTupleQueues), as we don't have any tuples being shared from workers to the leader? Put another way, do we use the tuple queue for sharing any info other than tuples from workers to the leader? [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 5, 2020 at 4:26 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > > > > > > I think you still need to work on the costing part, basically if we > > > > are parallelizing whole insert then plan is like below > > > > > > > > -> Gather > > > > -> Parallel Insert > > > > -> Parallel Seq Scan > > > > > > > > That means the tuple we are selecting via scan are not sent back to > > > > the gather node, so in cost_gather we need to see if it is for the > > > > INSERT then there is no row transferred through the parallel queue > > > > that mean we need not to pay any parallel tuple cost. > > > > > > I just looked into the parallel CTAS[1] patch for the same thing, and > > > I can see in that patch it is being handled. > > > > > > [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com > > > > > > > Hi Dilip, > > > > You're right, the costing for Parallel Insert is not done and > > finished, I'm still working on the costing, and haven't posted an > > updated patch for it yet. > > As far as cost_gather() method is concerned, for Parallel INSERT, it > > can probably use the same costing approach as the CTAS patch except in > > the case of a specified RETURNING clause. > > > > I have one question which is common to both this patch and parallel > inserts in CTAS[1], do we need to skip creating tuple > queues(ExecParallelSetupTupleQueues) as we don't have any tuples > that's being shared from workers to leader? Put it another way, do we > use the tuple queue for sharing any info other than tuples from > workers to leader? Ideally, we don't need the tuple queue unless we want to transfer the tuple to the gather node. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 5, 2020 at 4:26 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > I have one question which is common to both this patch and parallel > inserts in CTAS[1], do we need to skip creating tuple > queues(ExecParallelSetupTupleQueues) as we don't have any tuples > that's being shared from workers to leader? > As far as this patch is concerned, we might need to return tuples when there is a RETURNING clause. I think for the cases where we don't need to return tuples, we might want to skip creating these queues if it is feasible without too many changes. -- With Regards, Amit Kapila.
On Mon, Oct 5, 2020 at 4:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 5, 2020 at 4:26 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > > > I have one question which is common to both this patch and parallel > > inserts in CTAS[1], do we need to skip creating tuple > > queues(ExecParallelSetupTupleQueues) as we don't have any tuples > > that's being shared from workers to leader? > > > > As far as this patch is concerned we might need to return tuples when > there is a Returning clause. I think for the cases where we don't need > to return tuples we might want to skip creating these queues if it is > feasible without too many changes. +1 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
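The check being converged on here is simple enough to sketch as a hypothetical helper (the actual decision would be taken in ExecInitParallelPlan(), before ExecParallelSetupTupleQueues() is called):

#include "postgres.h"
#include "nodes/nodes.h"

/* Do any tuples flow from the workers back to the leader? */
static bool
parallel_plan_needs_tuple_queues(CmdType operation, bool hasReturning)
{
    /* A parallel SELECT always returns tuples to the Gather node */
    if (operation == CMD_SELECT)
        return true;

    /* A parallel INSERT returns tuples only with a RETURNING clause */
    return hasReturning;
}

When this returns false, the shm_mq tuple queues (and the matching TupleQueueReader setup on the Gather side) can simply be skipped.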
On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I have one question which is common to both this patch and parallel > > > inserts in CTAS[1], do we need to skip creating tuple > > > queues(ExecParallelSetupTupleQueues) as we don't have any tuples > > > that's being shared from workers to leader? > > > > > > > As far as this patch is concerned we might need to return tuples when > > there is a Returning clause. I think for the cases where we don't need > > to return tuples we might want to skip creating these queues if it is > > feasible without too many changes. > Hi Dilip, You're right. I've included that in my latest version of the patch (so Gather should only start tuple queues in the case of parallel SELECT or parallel INSERT with a RETURNING clause). Other functionality updated includes:

- Added more necessary exclusions for Parallel INSERT INTO ... SELECT ... (but allowing the underlying query to still be parallel), covering:
  - non-parallel-safe triggers
  - non-parallel-safe default and check expressions
  - foreign tables
  - temporary tables
  (a rough sketch of the table-level checks follows below)
- Added support for before/after statement-level INSERT triggers (can't allow parallel workers to execute these)
- Adjusted the cost of the Gather node for when a RETURNING clause is not specified

I have not found issues with partitioned tables (yet) or toast column values. Also, I have attached a separate patch (requested by Andres Freund) that just allows the underlying SELECT part of "INSERT INTO ... SELECT ..." to be parallel. Regards, Greg Nancarrow Fujitsu Australia
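For illustration, the table-level part of those exclusions might look like the following. This is a hypothetical helper; the real patch must additionally walk the trigger functions and the default/check expressions, testing each with the planner's parallel-safety machinery.

#include "postgres.h"
#include "catalog/pg_class.h"
#include "utils/rel.h"

static bool
insert_target_rel_allows_parallel(Relation rel)
{
    /* Foreign tables: the FDW cannot be assumed parallel-safe */
    if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
        return false;

    /* Temporary tables live in backend-local buffers */
    if (RelationUsesLocalBuffers(rel))
        return false;

    return true;
}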
Attachment
On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > I have not found issues with partition tables (yet) or toast column values. > I think for toast column values there may not be a problem, as each parallel worker inserts toast column values individually. But a problem may arise if a partitioned table has a foreign table as a partition; I think we cannot allow parallelism for this case either, but it's hard to determine ahead of time whether a table has a foreign partition. (See [1] in copy.c) > > - Added support for before/after statement-level INSERT triggers > (can't allow parallel workers to execute these) > I think we can allow parallelism for before statement-level triggers. The leader can execute this trigger and then go for parallel inserts. How about before-row, after-row, instead-of-row, and new-table (transition table) triggers?

[1]
else
{
    /*
     * For partitioned tables, we may still be able to perform bulk
     * inserts. However, the possibility of this depends on which types
     * of triggers exist on the partition. We must disable bulk inserts
     * if the partition is a foreign table or it has any before row insert
     * or insert instead triggers (same as we checked above for the parent
     * table). Since the partition's resultRelInfos are initialized only
     * when we actually need to insert the first tuple into them, we must
     * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
     * flag that we must later determine if we can use bulk-inserts for
     * the partition being inserted into.
     */
    if (proute)
        insertMethod = CIM_MULTI_CONDITIONAL;

With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 6, 2020 at 9:10 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > But the problem may arise if a partitioned table has foreign table as > a partition, I think we can not allow parallelism for this case too, > but it's hard to determine ahead of time whether a table has a foreign > partition.(See [1] in copy.c) > Thanks, I had seen that as a potential issue when scanning the code, but had forgotten to note it. I'll check your code again. > > > > - Added support for before/after statement-level INSERT triggers > > (can't allow parallel workers to execute these) > > > > I think we can allow parallelism for before statement level-triggers. > Leader can execute this trigger and go for parallel inserts. > My attached patch implements the before/after statement-level trigger invocation. (For INSERT INTO ... SELECT... case, it needs to account for parallel and non-parallel INSERT, and also the fact that, as the patch currently stands, the leader also participates in a parallel INSERT - so I found it necessary to invoke those triggers at the Gather node level in that case). > How about before row, after row, instead row, new table type triggers? > My attached patch does not allow parallel INSERT if there are any row-level triggers (as the trigger functions could see a different and unpredictable table state compared to non-parallel INSERT, even if otherwise parallel-safe). Regards, Greg Nancarrow Fujitsu Australia
On Tue, Oct 6, 2020 at 4:13 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Tue, Oct 6, 2020 at 9:10 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > But the problem may arise if a partitioned table has foreign table as > > a partition, I think we can not allow parallelism for this case too, > > but it's hard to determine ahead of time whether a table has a foreign > > partition.(See [1] in copy.c) > > > > Thanks, I had seen that as a potential issue when scanning the code, > but had forgotten to note it. I'll check your code again. > In parallel COPY, we are not doing anything (for the same reason explained in the above comment) to find whether there is a foreign partition or not while deciding to go with parallel/non-parallel copy; we just throw an error during the first tuple insertion into the partition:

errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
errhint("Try COPY without PARALLEL option")));

> > > > > > - Added support for before/after statement-level INSERT triggers > > > (can't allow parallel workers to execute these) > > > > > > > I think we can allow parallelism for before statement level-triggers. > > Leader can execute this trigger and go for parallel inserts. > > > > My attached patch implements the before/after statement-level trigger > invocation. > (For INSERT INTO ... SELECT... case, it needs to account for parallel > and non-parallel INSERT, and also the fact that, as the patch > currently stands, the leader also participates in a parallel INSERT - > so I found it necessary to invoke those triggers at the Gather node > level in that case). > Allowing the leader to execute before-statement triggers at the Gather node level, before invoking the parallel plan and the parallel inserts, makes sense. But if there are any after-statement triggers, transition tables may come into play (see Amit's findings under Case-1 in [1]), and we must disable parallelism in that case. [1] - https://www.postgresql.org/message-id/flat/CAA4eK1%2BANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > In parallel, we are not doing anything(due to the same reason > explained in above comment) to find whether there is a foreign > partition or not while deciding to go with parallel/non-parallel copy, > we are just throwing an error during the first tuple insertion into > the partition. > > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD > OF triggers, or if the partition is foreign partition"), > errhint("Try COPY without PARALLEL option"))); > I may well need to do something similar for parallel INSERT, but I'm kind of surprised it can't be detected earlier (?). Will need to further test this. > > Allowing the leader to execute before statement triggers at Gather > node level before invoking the parallel plan and then parallel inserts > makes sense. But if there are any after statement triggers, there may > come transition tables, see Amit's findings under Case-1 in [1] and we > must disable parallelism in that case. > > [1] - https://www.postgresql.org/message-id/flat/CAA4eK1%2BANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA%40mail.gmail.com > The patch I last posted for parallel INSERT does detect use of transition tables in this case (trigdesc->trig_insert_new_table) and disables INSERT parallelism (I tested it against Amit's example), yet still otherwise allows AFTER STATEMENT triggers for parallel INSERT. Regards, Greg Nancarrow Fujitsu Australia
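The trigger rules described in these two messages might be sketched as the following hypothetical check: statement-level INSERT triggers are acceptable because the leader fires them, while row-level triggers and transition tables are not.

#include "postgres.h"
#include "utils/reltrigger.h"

static bool
insert_triggers_allow_parallel(TriggerDesc *trigdesc)
{
    if (trigdesc == NULL)
        return true;            /* no triggers at all */

    /* Transition tables need all new tuples collected in one backend */
    if (trigdesc->trig_insert_new_table)
        return false;

    /* Row-level triggers could observe unpredictable table state */
    if (trigdesc->trig_insert_before_row ||
        trigdesc->trig_insert_after_row ||
        trigdesc->trig_insert_instead_row)
        return false;

    /* BEFORE/AFTER STATEMENT triggers remain fine: the leader fires them */
    return true;
}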
On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > In parallel, we are not doing anything(due to the same reason > explained in above comment) to find whether there is a foreign > partition or not while deciding to go with parallel/non-parallel copy, > we are just throwing an error during the first tuple insertion into > the partition. > > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD > OF triggers, or if the partition is foreign partition"), > errhint("Try COPY without PARALLEL option"))); > I'm wondering whether code similar to the following can safely be used to detect a foreign partition:

if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
    int i;
    PartitionDesc pd = RelationGetPartitionDesc(rel);

    for (i = 0; i < pd->nparts; i++)
    {
        if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE)
        {
            table_close(rel, NoLock);
            return false;
        }
    }
}

Thoughts? Regards, Greg Nancarrow Fujitsu Australia
On Wed, Oct 7, 2020 at 7:25 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > In parallel, we are not doing anything(due to the same reason > > explained in above comment) to find whether there is a foreign > > partition or not while deciding to go with parallel/non-parallel copy, > > we are just throwing an error during the first tuple insertion into > > the partition. > > > > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD > > OF triggers, or if the partition is foreign partition"), > > errhint("Try COPY without PARALLEL option"))); > > > > I'm wondering whether code similar to the following can safely be used > to detect a foreign partition: > > if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > { > int i; > PartitionDesc pd = RelationGetPartitionDesc(rel); > for (i = 0; i < pd->nparts; i++) > { > if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE) > { > table_close(rel, NoLock); > return false; > } > } > } > Actually, the addition of this kind of check is still not good enough. Partitions can have their own constraints, triggers, column default expressions etc. and a partition itself can be partitioned. I've written code to recursively walk the partitions and do all the various checks for parallel-insert-safety as before, but it's doing a fair bit of work. Any other idea of dealing with this? Seems it can't be avoided if you want to support partitioned tables and partitions. Regards, Greg Nancarrow Fujitsu Australia
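For illustration, the recursive walk might be shaped like this. It is a simplified sketch: the real checks must also examine each partition's triggers, constraints and column default expressions, not just its relkind, and the locking strategy needs more thought.

#include "postgres.h"
#include "access/table.h"
#include "catalog/pg_class.h"
#include "partitioning/partdesc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"

static bool
partitions_allow_parallel_insert(Relation rel)
{
    if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
    {
        PartitionDesc pd = RelationGetPartitionDesc(rel);
        int         i;

        for (i = 0; i < pd->nparts; i++)
        {
            char        relkind = get_rel_relkind(pd->oids[i]);

            if (relkind == RELKIND_FOREIGN_TABLE)
                return false;   /* foreign partition: not safe */

            if (relkind == RELKIND_PARTITIONED_TABLE)
            {
                /* sub-partitioned: open the child and recurse */
                Relation    child = table_open(pd->oids[i],
                                               AccessShareLock);
                bool        ok = partitions_allow_parallel_insert(child);

                table_close(child, AccessShareLock);
                if (!ok)
                    return false;
            }
        }
    }
    return true;
}

Walking a wide or deep partition tree this way means a fair amount of catalog access at planning time, which is exactly the "fair bit of work" mentioned above.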
On Thu, Oct 8, 2020 at 1:42 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Wed, Oct 7, 2020 at 7:25 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > In parallel, we are not doing anything(due to the same reason > > > explained in above comment) to find whether there is a foreign > > > partition or not while deciding to go with parallel/non-parallel copy, > > > we are just throwing an error during the first tuple insertion into > > > the partition. > > > > > > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD > > > OF triggers, or if the partition is foreign partition"), > > > errhint("Try COPY without PARALLEL option"))); > > > > > > > I'm wondering whether code similar to the following can safely be used > > to detect a foreign partition: > > > > if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > > { > > int i; > > PartitionDesc pd = RelationGetPartitionDesc(rel); > > for (i = 0; i < pd->nparts; i++) > > { > > if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE) > > { > > table_close(rel, NoLock); > > return false; > > } > > } > > } > > > > Actually, the addition of this kind of check is still not good enough. > Partitions can have their own constraints, triggers, column default > expressions etc. and a partition itself can be partitioned. > I've written code to recursively walk the partitions and do all the > various checks for parallel-insert-safety as before, but it's doing a > fair bit of work. > Any other idea of dealing with this? Seems it can't be avoided if you > want to support partitioned tables and partitions. > IMHO, it's good not to do all of this recursive checking right now, which may complicate the code or may restrict the performance gain. Having said that, in future we may have to do something about it. Others may have better opinions on this point. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 6, 2020 at 10:38 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

+ if (estate->es_plannedstmt->commandType == CMD_INSERT)
...
+ if ((XactReadOnly || (IsInParallelMode() && queryDesc->plannedstmt->commandType != CMD_INSERT)) &&
...
+ isParallelInsertLeader = nodeModifyTableState->operation == CMD_INSERT;
...

One thing I noticed is that you have logic, variable names and assertions all over the tree that assume that we can only do parallel *inserts*. I agree 100% with your plan to make Parallel Insert work first, it is an excellent goal and if we get it in it'll be a headline feature of PG14 (along with COPY etc). That said, I wonder if it would make sense to use more general naming (isParallelModifyLeader?), be more liberal where you really mean "is it DML", and find a way to centralise the logic about which DML command types are currently allowed (ie insert only for now) for assertions and error checks etc, so that in future we don't have to go around and change all these places and rename things again and again.

While contemplating that, I couldn't resist taking a swing at the main (?) show stopper for Parallel Update and Parallel Delete, judging by various clues left in code comments by Robert: combo command IDs created by other processes. Here's a rapid prototype to make that work (though perhaps not as efficiently as we'd want, not sure). With that in place, I wonder what else we'd need to extend your patch to cover all three operations... it can't be much! Of course I don't want to derail your work on Parallel Insert, I'm just providing some motivation for my comments on the (IMHO) shortsightedness of some of the coding.

PS Why not use git format-patch to create patches?
Attachment
On Fri, Oct 9, 2020 at 8:41 AM Thomas Munro <thomas.munro@gmail.com> wrote: > One thing I noticed is that you have logic, variable names and > assertions all over the tree that assume that we can only do parallel > *inserts*. I agree 100% with your plan to make Parallel Insert work > first, it is an excellent goal and if we get it in it'll be a headline > feature of PG14 (along with COPY etc). That said, I wonder if it > would make sense to use more general naming (isParallelModifyLeader?), > be more liberal where you really mean "is it DML", and find a way to > centralise the logic about which DML commands types are currently > allowed (ie insert only for now) for assertions and error checks etc, > so that in future we don't have to go around and change all these > places and rename things again and again. > Fair points. I agree, it would make more sense to generalise the naming and centralise the DML-command-type checks, rather than everything being insert-specific. It was getting a bit ugly. I'll work on that. > While contemplating that, I couldn't resist taking a swing at the main > (?) show stopper for Parallel Update and Parallel Delete, judging by > various clues left in code comments by Robert: combo command IDs > created by other processes. Here's a rapid prototype to make that > work (though perhaps not as efficiently as we'd want, not sure). With > that in place, I wonder what else we'd need to extend your patch to > cover all three operations... it can't be much! Of course I don't > want to derail your work on Parallel Insert, I'm just providing some > motivation for my comments on the (IMHO) shortsightedness of some of > the coding. > Thanks for your prototype code for coordination of combo command IDs with the workers. It does give me the incentive to look beyond that issue and see whether parallel Update and parallel Delete are indeed possible. I'll be sure to give it a go! > PS Why not use git format-patch to create patches? Guess I was being a bit lazy - will use git format-patch in future. Regards, Greg Nancarrow Fujitsu Australia
On Fri, Oct 9, 2020 at 3:48 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > It does give me the incentive to look beyond that issue and see > whether parallel Update and parallel Delete are indeed possible. I'll > be sure to give it a go! Cool! A couple more observations:

+ pathnode->path.parallel_aware = parallel_workers > 0 ? true : false;

Hmm, I think this may be bogus window dressing only affecting EXPLAIN. If you change it to assign false always, it works just the same, except EXPLAIN says:

 Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
   Workers Planned: 2
   ->  Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
         ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)

... instead of:

 Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
   Workers Planned: 2
   ->  Parallel Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
         ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)

AFAICS it's not parallel-aware, it just happens to be running in parallel with a partial input and partial output (and in this case, effect in terms of writes). Parallel-aware is our term for nodes that actually know they are running in parallel and do some special coordination with their twins in other processes.

The estimated row count also looks wrong; at a guess, the parallel divisor is applied twice. Let me try that with parallel_leader_participation=off (which disables some funky maths in the row estimation and makes it straight division by number of processes):

 Gather  (cost=17629.00..18645.50 rows=1000000 width=4)
   Workers Planned: 2
   ->  Insert on s  (cost=17629.00..18645.50 rows=250000 width=4)
         ->  Parallel Hash Join  (cost=17629.00..37291.00 rows=500000 width=4)
 [more nodes omitted]

Yeah, that was a join that spat out a million rows, and we correctly estimated 500k per process, and then Insert (still with my hack to turn off the bogus "Parallel" display in this case, but it doesn't affect the estimation) estimated 250k per process, which is wrong.
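The double-division can be shown as a tiny sketch (hypothetical function name; the point is simply that the subpath's row count is already a per-process estimate):

/*
 * The subpath's rows were already divided by the parallel divisor
 * (1,000,000 / 2 = 500,000 per process above, with
 * parallel_leader_participation = off), so the Insert path should
 * inherit that estimate rather than dividing a second time.
 */
static void
set_parallel_insert_rows(Path *insert_path, Path *subpath)
{
    /* WRONG: divides what is already per-process, giving 250,000 */
    /* insert_path->rows = subpath->rows / parallel_divisor; */

    /* RIGHT: the Insert emits (at most) what its subpath feeds it */
    insert_path->rows = subpath->rows;
}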
On Fri, Oct 9, 2020 at 6:31 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > A couple more observations: > > + pathnode->path.parallel_aware = parallel_workers > 0 ? true : false; > > Hmm, I think this may be bogus window dressing only affecting EXPLAIN. > If you change it to assign false always, it works just the same, > except EXPLAIN says: > > Gather (cost=15428.00..16101.14 rows=1000000 width=4) > Workers Planned: 2 > -> Insert on s (cost=15428.00..16101.14 rows=208334 width=4) > -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=4) > > ... instead of: > > Gather (cost=15428.00..16101.14 rows=1000000 width=4) > Workers Planned: 2 > -> Parallel Insert on s (cost=15428.00..16101.14 rows=208334 width=4) > -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=4) > > AFAICS it's not parallel-aware, it just happens to be running in > parallel with a partial input and partial output (and in this case, > effect in terms of writes). Parallel-aware is our term for nodes that > actually know they are running in parallel and do some special > coordination with their twins in other processes. > Ah, thanks, I see the distinction now. I'll fix that, to restore parallel_aware=false for the ModifyTable node. > The estimated row count also looks wrong; at a guess, the parallel > divisor is applied twice. Let me try that with > parallel_leader_particiation=off (which disables some funky maths in > the row estimation and makes it straight division by number of > processes): > > Gather (cost=17629.00..18645.50 rows=1000000 width=4) > Workers Planned: 2 > -> Insert on s (cost=17629.00..18645.50 rows=250000 width=4) > -> Parallel Hash Join (cost=17629.00..37291.00 rows=500000 width=4) > [more nodes omitted] > > Yeah, that was a join that spat out a million rows, and we correctly > estimated 500k per process, and then Insert (still with my hack to > turn off the bogus "Parallel" display in this case, but it doesn't > affect the estimation) estimated 250k per process, which is wrong. Thanks, I did suspect the current costing was wrong for ModifyTable (workers>0 case), as I'd thrown it in (moving current costing code into costsize.c) without a lot of checking or great thought, and it was on my TODO list of things to check. At least I created a placeholder for it. Looks like I've applied a parallel-divisor again (not allowing for that of the underlying query), as you said. Speaking of costing, I'm not sure I really agree with the current costing of a Gather node. Just considering a simple Parallel SeqScan case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of Gather cost always completely drowns out any other path costs when a large number of rows are involved (at least with default parallel-related GUC values), such that Parallel SeqScan would never be the cheapest path. This linear relationship in the costing based on the rows and a parallel_tuple_cost doesn't make sense to me. Surely after a certain number of rows, the overhead of launching workers will be outweighed by the benefit of their parallel work, such that the more rows, the more likely a Parallel SeqScan will benefit. That seems to suggest something like a logarithmic formula (or similar) would better match reality than what we have now. Am I wrong on this? Every time I use default GUC values, the planner doesn't want to generate a parallel plan. Lowering parallel-related GUCs like parallel_tuple_cost (which I normally do for testing) influences it of course, but the linear relationship still seems wrong.
Regards, Greg Nancarrow Fujitsu Australia
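To see how the linear term dominates with default settings, take a rough worked example (the numbers are assumed for illustration only). Defaults: parallel_tuple_cost = 0.1, parallel_setup_cost = 1000, cpu_tuple_cost = 0.01, seq_page_cost = 1.0. A plan that returns 10 million rows through Gather is charged 0.1 * 10,000,000 = 1,000,000 for tuple transfer alone, plus 1,000 setup. A serial Seq Scan of the same table (say ~50,000 pages) costs only about 50,000 + 0.01 * 10,000,000 = 150,000 in total. So whenever most of the scanned rows are returned through the Gather node, the parallel path can never look cheaper, no matter how many workers share the scan; it only wins when the vast majority of rows are filtered out below the Gather.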
On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > Also, I have attached a separate patch (requested by Andres Freund) > that just allows the underlying SELECT part of "INSERT INTO ... SELECT > ..." to be parallel. > It might be a good idea to first just get this patch committed, if possible. So, I have reviewed the latest version of this patch:

0001-InsertParallelSelect

1.
ParallelContext *pcxt;

+	/*
+	 * We need to avoid an attempt on INSERT to assign a
+	 * FullTransactionId whilst in parallel mode (which is in
+	 * effect due to the underlying parallel query) - so the
+	 * FullTransactionId is assigned here. Parallel mode must
+	 * be temporarily escaped in order for this to be possible.
+	 * The FullTransactionId will be included in the transaction
+	 * state that is serialized in the parallel DSM.
+	 */
+	if (estate->es_plannedstmt->commandType == CMD_INSERT)
+	{
+		Assert(IsInParallelMode());
+		ExitParallelMode();
+		GetCurrentFullTransactionId();
+		EnterParallelMode();
+	}
+

This looks like a hack to me. I think you are doing this to avoid the parallel mode checks in GetNewTransactionId(), right? If so, I have already mentioned above [1] that we can change it so that we disallow assigning xids for parallel workers only. The same is true for the check in ExecGatherMerge. Do you see any problem with that suggestion?

2.
@@ -337,7 +337,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	 */
 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 		IsUnderPostmaster &&
-		parse->commandType == CMD_SELECT &&
+		(parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
 		!IsParallelWorker())

I think the comments above this need to be updated, especially the part where it says: "Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only the leader backend writes into a completely new table.". Don't we need to include Insert also?

3.
@@ -371,6 +371,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	 * parallel-unsafe, or else the query planner itself has a bug.
 	 */
 	glob->parallelModeNeeded = glob->parallelModeOK &&
+		(parse->commandType == CMD_SELECT) &&
 		(force_parallel_mode != FORCE_PARALLEL_OFF);

Why do you need this change? The comments above this code should be updated to reflect this change. I think for the same reason the below code seems to be modified, but I don't understand the reason for the below change either; also, it is better to update the comments for this as well.

@@ -425,7 +426,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	 * Optionally add a Gather node for testing purposes, provided this is
 	 * actually a safe thing to do.
 	 */
-	if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
+	if (force_parallel_mode != FORCE_PARALLEL_OFF && parse->commandType == CMD_SELECT && top_plan->parallel_safe)
 	{
 		Gather	   *gather = makeNode(Gather);

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BE-pM0U6qw7EOF0yO0giTxdErxoJV9xTqN%2BLo9zdotFQ%40mail.gmail.com

-- With Regards, Amit Kapila.
On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > Speaking of costing, I'm not sure I really agree with the current > costing of a Gather node. Just considering a simple Parallel SeqScan > case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of > Gather cost always completely drowns out any other path costs when a > large number of rows are involved (at least with default > parallel-related GUC values), such that Parallel SeqScan would never > be the cheapest path. This linear relationship in the costing based on > the rows and a parallel_tuple_cost doesn't make sense to me. Surely > after a certain amount of rows, the overhead of launching workers will > be out-weighed by the benefit of their parallel work, such that the > more rows, the more likely a Parallel SeqScan will benefit. > That will be true for the number of rows/pages we need to scan, not for the number of tuples we need to return as a result. The formula here considers the number of rows the parallel scan will return: the more rows each parallel node needs to pass via shared memory to the Gather node, the more costly it will be. We do consider the total pages we need to scan in compute_parallel_worker(), where we use a logarithmic formula to determine the number of workers. -- With Regards, Amit Kapila.
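The logarithmic worker-count scaling referred to here can be sketched as follows (simplified from compute_parallel_worker()'s page-based logic; the real function also considers index pages, the parallel_workers rel option, and caps the result at max_parallel_workers_per_gather):

#include "postgres.h"
#include "storage/block.h"

static int
workers_for_heap_pages(BlockNumber heap_pages)
{
    /* min_parallel_table_scan_size default: 8MB, i.e. 1024 8kB pages */
    BlockNumber threshold = 1024;
    int         parallel_workers;

    if (heap_pages < threshold)
        return 0;               /* table too small to bother */

    /* One worker, plus one more for each tripling of the table size */
    parallel_workers = 1;
    while (heap_pages >= threshold * 3)
    {
        parallel_workers++;
        threshold *= 3;
    }
    return parallel_workers;
}

So the worker count grows only with the logarithm of the table size, while the Gather transfer cost above grows linearly with the number of rows returned; the two formulas address different quantities (pages scanned versus tuples returned).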
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > 0001-InsertParallelSelect > 1. > ParallelContext *pcxt; > > + /* > + * We need to avoid an attempt on INSERT to assign a > + * FullTransactionId whilst in parallel mode (which is in > + * effect due to the underlying parallel query) - so the > + * FullTransactionId is assigned here. Parallel mode must > + * be temporarily escaped in order for this to be possible. > + * The FullTransactionId will be included in the transaction > + * state that is serialized in the parallel DSM. > + */ > + if (estate->es_plannedstmt->commandType == CMD_INSERT) > + { > + Assert(IsInParallelMode()); > + ExitParallelMode(); > + GetCurrentFullTransactionId(); > + EnterParallelMode(); > + } > + > > This looks like a hack to me. I think you are doing this to avoid the > parallel mode checks in GetNewTransactionId(), right? Yes, agreed, is a hack to avoid that (mind you, it's not exactly great that ExecutePlan() sets parallel-mode for the entire plan execution). Also, did not expect that to necessarily remain in a final patch. >If so, I have > already mentioned above [1] that we can change it so that we disallow > assigning xids for parallel workers only. The same is true for the > check in ExecGatherMerge. Do you see any problem with that suggestion? > No, should be OK I guess, but will update and test to be sure. > 2. > @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char > *query_string, int cursorOptions, > */ > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && > IsUnderPostmaster && > - parse->commandType == CMD_SELECT && > + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) && > !parse->hasModifyingCTE && > max_parallel_workers_per_gather > 0 && > !IsParallelWorker()) > > I think the comments above this need to be updated especially the part > where we says:"Note that we do allow CREATE TABLE AS, SELECT INTO, and > CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only > the leader backend writes into a completely new table.". Don't we need > to include Insert also? Yes, Insert needs to be mentioned somewhere there. > > 3. > @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char > *query_string, int cursorOptions, > * parallel-unsafe, or else the query planner itself has a bug. > */ > glob->parallelModeNeeded = glob->parallelModeOK && > + (parse->commandType == CMD_SELECT) && > (force_parallel_mode != FORCE_PARALLEL_OFF); > > Why do you need this change? The comments above this code should be > updated to reflect this change. I think for the same reason the below > code seems to be modified but I don't understand the reason for the > below change as well, also it is better to update the comments for > this as well. > OK, I will update the comments for this. Basically, up to now, the "force_parallel_mode" has only ever operated on a SELECT. But since we are now allowing CMD_INSERT to be assessed for parallel mode too, we need to prevent the force_parallel_mode logic from sticking a Gather node over the top of arbitrary INSERTs and causing them to be run in parallel. Not all INSERTs are suitable for parallel operation, and also there are further considerations for parallel-safety for INSERTs compared to SELECT. INSERTs can also trigger UPDATEs. If we need to support force_parallel_mode for INSERT, more work will need to be done. 
> @@ -425,7 +426,7 @@ standard_planner(Query *parse, const char > *query_string, int cursorOptions, > * Optionally add a Gather node for testing purposes, provided this is > * actually a safe thing to do. > */ > - if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe) > + if (force_parallel_mode != FORCE_PARALLEL_OFF && parse->commandType > == CMD_SELECT && top_plan->parallel_safe) > { > Gather *gather = makeNode(Gather); > > [1] - https://www.postgresql.org/message-id/CAA4eK1%2BE-pM0U6qw7EOF0yO0giTxdErxoJV9xTqN%2BLo9zdotFQ%40mail.gmail.com > Regards, Greg Nancarrow Fujitsu Australia
On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > Speaking of costing, I'm not sure I really agree with the current > > costing of a Gather node. Just considering a simple Parallel SeqScan > > case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of > > Gather cost always completely drowns out any other path costs when a > > large number of rows are involved (at least with default > > parallel-related GUC values), such that Parallel SeqScan would never > > be the cheapest path. This linear relationship in the costing based on > > the rows and a parallel_tuple_cost doesn't make sense to me. Surely > > after a certain amount of rows, the overhead of launching workers will > > be out-weighed by the benefit of their parallel work, such that the > > more rows, the more likely a Parallel SeqScan will benefit. > > > > That will be true for the number of rows/pages we need to scan not for > the number of tuples we need to return as a result. The formula here > considers the number of rows the parallel scan will return and the > more the number of rows each parallel node needs to pass via shared > memory to gather node the more costly it will be. > > We do consider the total pages we need to scan in > compute_parallel_worker() where we use a logarithmic formula to > determine the number of workers. > Despite all the best intentions, the current costing seems to be geared towards selecting a non-parallel plan over a parallel plan as the number of rows in the table grows. Yet the performance of a parallel plan appears to get better, relative to a non-parallel plan, as the number of rows grows. This doesn't seem right to me. Is there a rationale behind this costing model? I have pointed out the part of the parallel_tuple_cost calculation that seems to drown out all other costs (causing the cost value to be huge) as the row count increases. Regards, Greg Nancarrow Fujitsu Australia
On Fri, Oct 9, 2020 at 4:28 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > > Speaking of costing, I'm not sure I really agree with the current > > > costing of a Gather node. Just considering a simple Parallel SeqScan > > > case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of > > > Gather cost always completely drowns out any other path costs when a > > > large number of rows are involved (at least with default > > > parallel-related GUC values), such that Parallel SeqScan would never > > > be the cheapest path. This linear relationship in the costing based on > > > the rows and a parallel_tuple_cost doesn't make sense to me. Surely > > > after a certain amount of rows, the overhead of launching workers will > > > be out-weighed by the benefit of their parallel work, such that the > > > more rows, the more likely a Parallel SeqScan will benefit. > > > > > > > That will be true for the number of rows/pages we need to scan not for > > the number of tuples we need to return as a result. The formula here > > considers the number of rows the parallel scan will return and the > > more the number of rows each parallel node needs to pass via shared > > memory to gather node the more costly it will be. > > > > We do consider the total pages we need to scan in > > compute_parallel_worker() where we use a logarithmic formula to > > determine the number of workers. > > > > Despite all the best intentions, the current costings seem to be > geared towards selection of a non-parallel plan over a parallel plan, > the more rows there are in the table. Yet the performance of a > parallel plan appears to be better than non-parallel-plan the more > rows there are in the table. > This doesn't seem right to me. Is there a rationale behind this costing model? > Yes, AFAIK, there is no proof that we can get any (much) gain by dividing the I/O among workers. It is primarily the CPU effort which gives the benefit. So, parallel plans show the greatest benefit when we have to scan a large table and then return far fewer rows. -- With Regards, Amit Kapila.
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > + /* > + * We need to avoid an attempt on INSERT to assign a > + * FullTransactionId whilst in parallel mode (which is in > + * effect due to the underlying parallel query) - so the > + * FullTransactionId is assigned here. Parallel mode must > + * be temporarily escaped in order for this to be possible. > + * The FullTransactionId will be included in the transaction > + * state that is serialized in the parallel DSM. > + */ > + if (estate->es_plannedstmt->commandType == CMD_INSERT) > + { > + Assert(IsInParallelMode()); > + ExitParallelMode(); > + GetCurrentFullTransactionId(); > + EnterParallelMode(); > + } > + > > This looks like a hack to me. I think you are doing this to avoid the > parallel mode checks in GetNewTransactionId(), right? If so, I have > already mentioned above [1] that we can change it so that we disallow > assigning xids for parallel workers only. The same is true for the > check in ExecGatherMerge. Do you see any problem with that suggestion? > Actually, there is a problem. If I remove that "hack", and change the code in GetNewTransactionId() to disallow xid assignment for parallel workers only, then there is also similar code in AssignTransactionId() which gets called. If I change that code too, in the same way, then on a parallel INSERT, that code gets called by a parallel worker (from GetCurrentTransactionId()) and the ERROR "cannot assign XIDs in a parallel worker" results. GetCurrentFullTransactionId() must be called in the leader, somewhere (and will be included in the transaction state that is serialized in the parallel DSM). If not done here, then where? Regards, Greg Nancarrow Fujitsu Australia
On Fri, Oct 9, 2020 at 5:54 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > + /* > > + * We need to avoid an attempt on INSERT to assign a > > + * FullTransactionId whilst in parallel mode (which is in > > + * effect due to the underlying parallel query) - so the > > + * FullTransactionId is assigned here. Parallel mode must > > + * be temporarily escaped in order for this to be possible. > > + * The FullTransactionId will be included in the transaction > > + * state that is serialized in the parallel DSM. > > + */ > > + if (estate->es_plannedstmt->commandType == CMD_INSERT) > > + { > > + Assert(IsInParallelMode()); > > + ExitParallelMode(); > > + GetCurrentFullTransactionId(); > > + EnterParallelMode(); > > + } > > + > > > > This looks like a hack to me. I think you are doing this to avoid the > > parallel mode checks in GetNewTransactionId(), right? If so, I have > > already mentioned above [1] that we can change it so that we disallow > > assigning xids for parallel workers only. The same is true for the > > check in ExecGatherMerge. Do you see any problem with that suggestion? > > > > Actually, there is a problem. > If I remove that "hack", and change the code in GetNewTransactionId() > to disallow xid assignment for parallel workers only, then there is > also similar code in AssignTransactionId() which gets called. > I don't think workers need to call AssignTransactionId(); before that, the transaction id passed from the leader should be set in CurrentTransactionState. Why would GetCurrentTransactionId()/GetCurrentFullTransactionId() need to call AssignTransactionId() when called from a worker? > GetCurrentFullTransactionId() must be called in the leader, somewhere > (and will be included in the transaction state that is serialized in > the parallel DSM). > Yes, it should have been done in the leader, and then it should have been set in the workers via StartParallelWorkerTransaction before we do any actual operation. If that happens then GetCurrentTransactionId() won't need to call AssignTransactionId(). -- With Regards, Amit Kapila.
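From the worker side, the sequence Amit describes can be sketched like this (a hypothetical wrapper; in reality this logic lives in ParallelWorkerMain(), and SetCurrentCommandIdUsedForWorker() is the function added by the patch):

#include "postgres.h"
#include "access/xact.h"

static void
restore_leader_xact_state(char *tstatespace)
{
    /* Restore the leader's full transaction id and command id */
    StartParallelWorkerTransaction(tstatespace);

    /*
     * The cid was already marked "used" in the leader before the workers
     * were launched, so record that here too; GetCurrentTransactionId()
     * and GetCurrentCommandId(true) will then return the restored values
     * without attempting any new assignment.
     */
    SetCurrentCommandIdUsedForWorker();
}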
On Fri, Oct 9, 2020 at 3:51 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 0001-InsertParallelSelect > > 1. > > ParallelContext *pcxt; > > > > + /* > > + * We need to avoid an attempt on INSERT to assign a > > + * FullTransactionId whilst in parallel mode (which is in > > + * effect due to the underlying parallel query) - so the > > + * FullTransactionId is assigned here. Parallel mode must > > + * be temporarily escaped in order for this to be possible. > > + * The FullTransactionId will be included in the transaction > > + * state that is serialized in the parallel DSM. > > + */ > > + if (estate->es_plannedstmt->commandType == CMD_INSERT) > > + { > > + Assert(IsInParallelMode()); > > + ExitParallelMode(); > > + GetCurrentFullTransactionId(); > > + EnterParallelMode(); > > + } > > + > > > > This looks like a hack to me. I think you are doing this to avoid the > > parallel mode checks in GetNewTransactionId(), right? > > Yes, agreed, is a hack to avoid that (mind you, it's not exactly great > that ExecutePlan() sets parallel-mode for the entire plan execution). > Also, did not expect that to necessarily remain in a final patch. > > >If so, I have > > already mentioned above [1] that we can change it so that we disallow > > assigning xids for parallel workers only. The same is true for the > > check in ExecGatherMerge. Do you see any problem with that suggestion? > > > > No, should be OK I guess, but will update and test to be sure. > > > 2. > > @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char > > *query_string, int cursorOptions, > > */ > > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && > > IsUnderPostmaster && > > - parse->commandType == CMD_SELECT && > > + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) && > > !parse->hasModifyingCTE && > > max_parallel_workers_per_gather > 0 && > > !IsParallelWorker()) > > > > I think the comments above this need to be updated especially the part > > where we says:"Note that we do allow CREATE TABLE AS, SELECT INTO, and > > CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only > > the leader backend writes into a completely new table.". Don't we need > > to include Insert also? > > Yes, Insert needs to be mentioned somewhere there. > > > > > 3. > > @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char > > *query_string, int cursorOptions, > > * parallel-unsafe, or else the query planner itself has a bug. > > */ > > glob->parallelModeNeeded = glob->parallelModeOK && > > + (parse->commandType == CMD_SELECT) && > > (force_parallel_mode != FORCE_PARALLEL_OFF); > > > > Why do you need this change? The comments above this code should be > > updated to reflect this change. I think for the same reason the below > > code seems to be modified but I don't understand the reason for the > > below change as well, also it is better to update the comments for > > this as well. > > > > OK, I will update the comments for this. > Basically, up to now, the "force_parallel_mode" has only ever operated > on a SELECT. > But since we are now allowing CMD_INSERT to be assessed for parallel > mode too, we need to prevent the force_parallel_mode logic from > sticking a Gather node over the top of arbitrary INSERTs and causing > them to be run in parallel. Not all INSERTs are suitable for parallel > operation, and also there are further considerations for > parallel-safety for INSERTs compared to SELECT. 
INSERTs can also > trigger UPDATEs. > Sure but in that case 'top_plan->parallel_safe' should be false and it should stick Gather node atop Insert node. For the purpose of this patch, the scan beneath Insert should be considered as parallel_safe. -- With Regards, Amit Kapila.
On Fri, Oct 9, 2020 at 6:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 3:51 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > OK, I will update the comments for this. > > Basically, up to now, the "force_parallel_mode" has only ever operated > > on a SELECT. > > But since we are now allowing CMD_INSERT to be assessed for parallel > > mode too, we need to prevent the force_parallel_mode logic from > > sticking a Gather node over the top of arbitrary INSERTs and causing > > them to be run in parallel. Not all INSERTs are suitable for parallel > > operation, and also there are further considerations for > > parallel-safety for INSERTs compared to SELECT. INSERTs can also > > trigger UPDATEs. > > > > Sure but in that case 'top_plan->parallel_safe' should be false and it > should stick Gather node atop Insert node. > /should/should not. -- With Regards, Amit Kapila.
On Fri, Oct 9, 2020 at 11:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > Sure but in that case 'top_plan->parallel_safe' should be false and it > > should stick Gather node atop Insert node. > > > > /should/should not. > OK, for the minimal patch, just allowing INSERT with parallel SELECT, you're right, neither of those additional "commandType == CMD_SELECT" checks are needed, so I'll remove them. (In the main patch, I think the first check can be removed, once the XID handling is fixed; the second check is definitely needed though). Regards, Greg Nancarrow Fujitsu Australia
On Fri, Oct 9, 2020 at 11:58 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > That will be true for the number of rows/pages we need to scan not for > > the number of tuples we need to return as a result. The formula here > > considers the number of rows the parallel scan will return and the > > more the number of rows each parallel node needs to pass via shared > > memory to gather node the more costly it will be. > > > > We do consider the total pages we need to scan in > > compute_parallel_worker() where we use a logarithmic formula to > > determine the number of workers. > > Despite all the best intentions, the current costings seem to be > geared towards selection of a non-parallel plan over a parallel plan, > the more rows there are in the table. Yet the performance of a > parallel plan appears to be better than non-parallel-plan the more > rows there are in the table. Right, but as Amit said, we still have to account for the cost of schlepping tuples between processes. Hmm... could the problem be that we're incorrectly estimating that Insert (without RETURNING) will send a bazillion tuples, even though that isn't true? I didn't look at the code but that's what the plan seems to imply when it says stuff like "Gather (cost=15428.00..16101.14 rows=1000000 width=4)". I suppose the row estimates for ModifyTable paths are based on what they write, not what they emit, and in the past that distinction didn't matter much because it wasn't something that was used for comparing alternative plans. Now it is.
On Fri, Oct 9, 2020 at 7:32 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 11:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > Sure but in that case 'top_plan->parallel_safe' should be false and it > > > should stick Gather node atop Insert node. > > > > > > > /should/should not. > > > > OK, for the minimal patch, just allowing INSERT with parallel SELECT, > you're right, neither of those additional "commandType == CMD_SELECT" > checks are needed, so I'll remove them. > Okay, that makes sense. > (In the main patch, I think > the first check can be removed, once the XID handling is fixed; the > second check is definitely needed though). > Okay, then move that check but please do add some comments to state the reason. -- With Regards, Amit Kapila.
On Fri, Oct 9, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > Also, I have attached a separate patch (requested by Andres Freund) > > that just allows the underlying SELECT part of "INSERT INTO ... SELECT > > ..." to be parallel. > > > > It might be a good idea to first just get this patch committed, if > possible. So, I have reviewed the latest version of this patch: > Few initial comments on 0004-ParallelInsertSelect: 1. @@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, * inserts in general except for the cases where inserts generate a new * CommandId (eg. inserts into a table having a foreign key column). */ - if (IsParallelWorker()) - ereport(ERROR, - (errcode(ERRCODE_INVALID_TRANSACTION_STATE), - errmsg("cannot insert tuples in a parallel worker"))); - I speculated above [1] about whether we can change this Assert condition instead of just removing it. Have you considered that suggestion? 2. @@ -764,12 +778,13 @@ GetCurrentCommandId(bool used) if (used) { /* - * Forbid setting currentCommandIdUsed in a parallel worker, because - * we have no provision for communicating this back to the leader. We - * could relax this restriction when currentCommandIdUsed was already - * true at the start of the parallel operation. + * If in a parallel worker, only allow setting currentCommandIdUsed + * if currentCommandIdUsed was already true at the start of the + * parallel operation (by way of SetCurrentCommandIdUsed()), otherwise + * forbid setting currentCommandIdUsed because we have no provision + * for communicating this back to the leader. */ - Assert(!IsParallelWorker()); + Assert(!(IsParallelWorker() && !currentCommandIdUsed)); currentCommandIdUsed = true; } Once we allow this, won't the next CommandCounterIncrement() in the worker increment the commandId, which will lead to different commandIds being used in the worker and leader? Is that prevented in some way, and if so, how? Can we document the same? 3. @@ -173,7 +173,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) * against performing unsafe operations in parallel mode, but this gives a * more user-friendly error message. */ - if ((XactReadOnly || IsInParallelMode()) && + if ((XactReadOnly || (IsInParallelMode() && queryDesc->plannedstmt->commandType != CMD_INSERT)) && !(eflags & EXEC_FLAG_EXPLAIN_ONLY)) ExecCheckXactReadOnly(queryDesc->plannedstmt); I don't think the above change is correct. We need to extend the below check in ExecCheckXactReadOnly() because otherwise it can allow Insert operations even when XactReadOnly is set, which we don't want. ExecCheckXactReadOnly() { .. if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE) PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt)); .. } 4. @@ -173,18 +175,20 @@ ExecSerializePlan(Plan *plan, EState *estate) * PlannedStmt to start the executor. - pstmt->hasReturning = false; - pstmt->hasModifyingCTE = false; + pstmt->hasReturning = estate->es_plannedstmt->hasReturning; + pstmt->hasModifyingCTE = estate->es_plannedstmt->hasModifyingCTE; Why change hasModifyingCTE? 5. + if (isParallelInsertLeader) + { + /* For Parallel INSERT, if there are BEFORE STATEMENT triggers, + * these must be fired by the leader, not the parallel workers. + */ The multi-line comment should start from the second line. 
I see a similar problem at other places in the patch as well. 6. @@ -178,6 +214,25 @@ ExecGather(PlanState *pstate) node->pei, gather->initParam); + if (isParallelInsertLeader) + { + /* For Parallel INSERT, if there are BEFORE STATEMENT triggers, + * these must be fired by the leader, not the parallel workers. + */ + if (nodeModifyTableState->fireBSTriggers) + { + fireBSTriggers(nodeModifyTableState); + nodeModifyTableState->fireBSTriggers = false; + + /* + * Disable firing of AFTER STATEMENT triggers by local + * plan execution (ModifyTable processing). These will be + * fired at end of Gather processing. + */ + nodeModifyTableState->fireASTriggers = false; + } + } Can we encapsulate this in a separate function? It seems a bit odd to do this directly in ExecGather. 7. @@ -418,14 +476,25 @@ ExecShutdownGatherWorkers(GatherState *node) void ExecShutdownGather(GatherState *node) { - ExecShutdownGatherWorkers(node); + if (node->pei == NULL) + return; - /* Now destroy the parallel context. */ - if (node->pei != NULL) So after this patch if "node->pei == NULL" then we won't shut down workers here? Why so? 8. You have made changes related to trigger execution for Gather node, don't we need similar changes for GatherMerge node? 9. @@ -383,7 +444,21 @@ cost_gather(GatherPath *path, PlannerInfo *root, /* Parallel setup and communication cost. */ startup_cost += parallel_setup_cost; - run_cost += parallel_tuple_cost * path->path.rows; + + /* + * For Parallel INSERT, provided no tuples are returned from workers + * to gather/leader node, don't add a cost-per-row, as each worker + * parallelly inserts the tuples that result from its chunk of plan + * execution. This change may make the parallel plan cheap among all + * other plans, and influence the planner to consider this parallel + * plan. + */ + if (!(IsA(path->subpath, ModifyTablePath) && + castNode(ModifyTablePath, path->subpath)->operation == CMD_INSERT && + castNode(ModifyTablePath, path->subpath)->returningLists != NULL)) + { + run_cost += parallel_tuple_cost * path->path.rows; + } Shouldn't the last condition in the above check, "castNode(ModifyTablePath, path->subpath)->returningLists != NULL", be "castNode(ModifyTablePath, path->subpath)->returningLists == NULL" instead? Because otherwise, when there is a returning list, it won't count the cost for passing tuples via the Gather node. This might be the reason for what Thomas has seen in his recent email [2]. 10. Don't we need a change similar to cost_gather in cost_gather_merge? It seems you have made only partial changes for GatherMerge node. [1] - https://www.postgresql.org/message-id/CAA4eK1KyftVDgovvRQmdV1b%3DnN0R-KqdWZqiu7jZ1GYQ7SO9OA%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CA%2BhUKGLZB%3D1Q%2BAQQEEmffr3bUMAh%2BJD%2BJ%2B7axv%2BK10Kea0U9TQ%40mail.gmail.com -- With Regards, Amit Kapila.
On Sat, Oct 10, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > 8. You have made changes related to trigger execution for Gather node, > don't we need similar changes for GatherMerge node? > .. > > 10. Don't we need a change similar to cost_gather in > cost_gather_merge? It seems you have made only partial changes for > GatherMerge node. > Please ignore these two comments, as we can't push the Insert to workers when GatherMerge is involved, since the leader backend does the final phase (merging the results produced by the workers). So, we can only push the Select part of the statement. -- With Regards, Amit Kapila.
On Sun, Oct 11, 2020 at 12:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > + /* > + * For Parallel INSERT, provided no tuples are returned from workers > + * to gather/leader node, don't add a cost-per-row, as each worker > + * parallelly inserts the tuples that result from its chunk of plan > + * execution. This change may make the parallel plan cheap among all > + * other plans, and influence the planner to consider this parallel > + * plan. > + */ > + if (!(IsA(path->subpath, ModifyTablePath) && > + castNode(ModifyTablePath, path->subpath)->operation == CMD_INSERT && > + castNode(ModifyTablePath, path->subpath)->returningLists != NULL)) > + { > + run_cost += parallel_tuple_cost * path->path.rows; > + } > > Shouldn't the last condition in the above check, "castNode(ModifyTablePath, > path->subpath)->returningLists != NULL", be > "castNode(ModifyTablePath, path->subpath)->returningLists == NULL" > instead? Because otherwise, when there is a returning list, it won't count > the cost for passing tuples via the Gather node. This might be the reason > for what Thomas has seen in his recent email [2]. Yeah, I think this is trying to fix the problem too late. Instead, we should fix the incorrect row estimates so we don't have to fudge it later like that. For example, this should be estimating rows=0: postgres=# explain analyze insert into s select * from t t1 join t t2 using (i); ... Insert on s (cost=30839.08..70744.45 rows=1000226 width=4) (actual time=2940.560..2940.562 rows=0 loops=1) I think that should be done with something like this: --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -3583,16 +3583,11 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel, if (lc == list_head(subpaths)) /* first node? */ pathnode->path.startup_cost = subpath->startup_cost; pathnode->path.total_cost += subpath->total_cost; - pathnode->path.rows += subpath->rows; + if (returningLists != NIL) + pathnode->path.rows += subpath->rows; total_size += subpath->pathtarget->width * subpath->rows; } - /* - * Set width to the average width of the subpath outputs. XXX this is - * totally wrong: we should report zero if no RETURNING, else an average - * of the RETURNING tlist widths. But it's what happened historically, - * and improving it is a task for another day. - */ if (pathnode->path.rows > 0) total_size /= pathnode->path.rows; pathnode->path.pathtarget->width = rint(total_size);
On Sat, Oct 10, 2020 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > OK, for the minimal patch, just allowing INSERT with parallel SELECT, > > you're right, neither of those additional "commandType == CMD_SELECT" > > checks are needed, so I'll remove them. > > > Okay, that makes sense. > For the minimal patch (just allowing INSERT with parallel SELECT), there are issues with parallel-mode and various parallel-mode-related checks in the code. Initially, I thought it was only a couple of XID-related checks (which could perhaps just be tweaked to check for IsParallelWorker() instead, as you suggested), but I now realise that there are a lot more cases. This stems from the fact that just having a parallel SELECT (as part of a non-parallel INSERT) causes parallel-mode to be set for the WHOLE plan. I'm not sure why parallel-mode is set globally like this, for the whole plan. Couldn't it just be set for the scope of Gather/GatherMerge? Otherwise, errors from these checks seem to be misleading when outside the scope of Gather/GatherMerge, as technically they are not occurring within the scope of parallel-leader and parallel-worker(s). The global parallel-mode wouldn't have been an issue before, because up to now INSERT has never had underlying parallel operations. For example, when running the tests under "force_parallel_mode=regress", the test failures show that there are a lot more cases affected: "cannot assign TransactionIds during a parallel operation" "cannot assign XIDs during a parallel operation" "cannot start commands during a parallel operation" "cannot modify commandid in active snapshot during a parallel operation" "cannot execute nextval() during a parallel operation" "cannot execute INSERT during a parallel operation" "cannot execute ANALYZE during a parallel operation" "cannot update tuples during a parallel operation" (and there are more not currently detected by the tests, found by searching the code). As an example, with the minimal patch applied, if you had a trigger on INSERT that, say, attempted a table creation or UPDATE/DELETE, and you ran an "INSERT INTO ... SELECT...", it would treat the trigger operations as being attempted in parallel-mode, and so an error would result. Let me know your thoughts on how to deal with these issues. Can you see a problem with only having parallel-mode set for the scope of Gather/GatherMerge, or do you have some other idea? Regards, Greg Nancarrow Fujitsu Australia
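To make that failure mode concrete, here is a hypothetical repro sketch (all object names invented): a trigger that performs a further INSERT, fired by the leader while it is in parallel-mode for the underlying parallel SELECT, would trip one of the checks listed above:

create table audit_log (detail text);
create table trig_tbl (a int);
create or replace function log_insert() returns trigger
    language plpgsql as
$$ begin
    insert into audit_log values ('row inserted');  -- a further INSERT
    return new;
end; $$;
create trigger trig_tbl_log before insert on trig_tbl
    for each row execute function log_insert();
-- With the minimal patch, if the SELECT below runs in parallel, the
-- trigger's INSERT (executed by the leader, but in parallel-mode)
-- could fail with "cannot execute INSERT during a parallel operation":
insert into trig_tbl select s from generate_series(1, 10000) s;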
On Sun, Oct 11, 2020 at 1:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Oct 10, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 8. You have made changes related to trigger execution for Gather node, > > don't we need similar changes for GatherMerge node? > > > .. > > > > 10. Don't we need a change similar to cost_gather in > > cost_gather_merge? It seems you have made only partial changes for > > GatherMerge node. > > > > Please ignore these two comments, as we can't push the Insert to workers > when GatherMerge is involved, since the leader backend does the final phase > (merging the results produced by the workers). So, we can only push the > Select part of the statement. > Precisely, that's why I didn't make those changes for GatherMerge. Regards, Greg Nancarrow Fujitsu Australia
On Sun, Oct 11, 2020 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Yeah, I think this is trying to fix the problem too late. Instead, we > should fix the incorrect row estimates so we don't have to fudge it > later like that. For example, this should be estimating rows=0: > > postgres=# explain analyze insert into s select * from t t1 join t t2 using (i); > ... > Insert on s (cost=30839.08..70744.45 rows=1000226 width=4) (actual > time=2940.560..2940.562 rows=0 loops=1) > > I think that should be done with something like this: > > --- a/src/backend/optimizer/util/pathnode.c > +++ b/src/backend/optimizer/util/pathnode.c > @@ -3583,16 +3583,11 @@ create_modifytable_path(PlannerInfo *root, > RelOptInfo *rel, > if (lc == list_head(subpaths)) /* first node? */ > pathnode->path.startup_cost = subpath->startup_cost; > pathnode->path.total_cost += subpath->total_cost; > - pathnode->path.rows += subpath->rows; > + if (returningLists != NIL) > + pathnode->path.rows += subpath->rows; > total_size += subpath->pathtarget->width * subpath->rows; > } > > - /* > - * Set width to the average width of the subpath outputs. XXX this is > - * totally wrong: we should report zero if no RETURNING, else an average > - * of the RETURNING tlist widths. But it's what happened historically, > - * and improving it is a task for another day. > - */ > if (pathnode->path.rows > 0) > total_size /= pathnode->path.rows; > pathnode->path.pathtarget->width = rint(total_size); Agree, thanks (bug in existing Postgres code, right?) Regards, Greg Nancarrow Fujitsu Australia
On Mon, Oct 12, 2020 at 3:42 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > On Sun, Oct 11, 2020 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > pathnode->path.total_cost += subpath->total_cost; > > - pathnode->path.rows += subpath->rows; > > + if (returningLists != NIL) > > + pathnode->path.rows += subpath->rows; > > total_size += subpath->pathtarget->width * subpath->rows; > > } Erm, except the condition should of course cover total_size too. > Agree, thanks (bug in existing Postgres code, right?) Yeah, I think we should go ahead and fix that up front. Here's a version with a commit message.
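For reference, a hypothetical sketch of the expected behaviour once that fix is in (the plan shape and numbers here are illustrative, not from an actual run): an INSERT ... SELECT without a RETURNING clause should now report rows=0 on the ModifyTable node, e.g.

postgres=# explain insert into s select * from t;
                          QUERY PLAN
---------------------------------------------------------------
 Insert on s  (cost=0.00..144.00 rows=0 width=0)
   ->  Seq Scan on t  (cost=0.00..144.00 rows=10000 width=4)

so a Gather sitting above such an Insert would no longer be charged parallel_tuple_cost for tuples it never actually transfers.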
On Mon, Oct 12, 2020 at 6:51 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Sat, Oct 10, 2020 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > OK, for the minimal patch, just allowing INSERT with parallel SELECT, > > > you're right, neither of those additional "commandType == CMD_SELECT" > > > checks are needed, so I'll remove them. > > > > > Okay, that makes sense. > > > > For the minimal patch (just allowing INSERT with parallel SELECT), > there are issues with parallel-mode and various parallel-mode-related > checks in the code. > Initially, I thought it was only a couple of XID-related checks (which > could perhaps just be tweaked to check for IsParallelWorker() instead, > as you suggested), but I now realise that there are a lot more cases. > This stems from the fact that just having a parallel SELECT (as part > of a non-parallel INSERT) causes parallel-mode to be set for the WHOLE > plan. I'm not sure why parallel-mode is set globally like this, for > the whole plan. Couldn't it just be set for the scope of > Gather/GatherMerge? Otherwise, errors from these checks seem to be > misleading when outside the scope of Gather/GatherMerge, as > technically they are not occurring within the scope of parallel-leader > and parallel-worker(s). The global parallel-mode wouldn't have been an > issue before, because up to now INSERT has never had underlying > parallel operations. > That is right, but there is another operation which works like that. For example, a statement like "create table test_new As select * from test_parallel where c1 < 1000;" will use a parallel select, but the write operation will be performed in the leader. I agree that the code flow of Insert is different, so we will have a different set of challenges in that case, but there shouldn't be any fundamental problem in making it work. > For example, when running the tests under > "force_parallel_mode=regress", the test failures show that there are a > lot more cases affected: > > "cannot assign TransactionIds during a parallel operation" > "cannot assign XIDs during a parallel operation" > "cannot start commands during a parallel operation" > "cannot modify commandid in active snapshot during a parallel operation" > "cannot execute nextval() during a parallel operation" > "cannot execute INSERT during a parallel operation" > "cannot execute ANALYZE during a parallel operation" > "cannot update tuples during a parallel operation" > > (and there are more not currently detected by the tests, found by > searching the code). > Did you get these after applying your patch? If so, can you share the version you are using, or if you have already posted it, then point me to it? > As an example, with the minimal patch applied, if you had a trigger on > INSERT that, say, attempted a table creation or UPDATE/DELETE, and you > ran an "INSERT INTO ... SELECT...", it would treat the trigger > operations as being attempted in parallel-mode, and so an error would > result. > Oh, I guess this happens because you need to execute Insert in parallel-mode even though Insert is happening in the leader, right? And probably we are not facing this with "Create Table As .." because there is no trigger execution involved there. > Let me know your thoughts on how to deal with these issues. > Can you see a problem with only having parallel-mode set for the scope of > Gather/GatherMerge, or do you have some other idea? > I have not thought about this yet but I don't understand your proposal. 
How will you set it only for the scope of Gather (Merge)? The execution of the Gather node will be interleaved with the Insert node, basically, you fetch a tuple from Gather, and then you need to Insert it. Can you be a bit more specific on what you have in mind for this? -- With Regards, Amit Kapila.
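For reference, the existing behaviour Amit describes can be seen with something like the following hypothetical session (the plan shape is assumed, not from an actual run); the parallel workers perform only the scan, while the leader writes the new table's rows as they arrive at the Gather node:

test=# explain (costs off)
       create table test_new as select * from test_parallel where c1 < 1000;
              QUERY PLAN
--------------------------------------------
 Gather
   Workers Planned: 2
   ->  Parallel Seq Scan on test_parallel
         Filter: (c1 < 1000)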
On Mon, Oct 12, 2020 at 2:11 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Mon, Oct 12, 2020 at 3:42 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Sun, Oct 11, 2020 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > pathnode->path.total_cost += subpath->total_cost; > > > - pathnode->path.rows += subpath->rows; > > > + if (returningLists != NIL) > > > + pathnode->path.rows += subpath->rows; > > > total_size += subpath->pathtarget->width * subpath->rows; > > > } > > Erm, except the condition should of course cover total_size too. > > > Agree, thanks (bug in existing Postgres code, right?) > > Yeah, I think we should go ahead and fix that up front. Here's a > version with a commit message. I've checked it and tested it, and it looks fine to me. Also, it seems to align with the gripe in the old comment about width ("XXX this is totally wrong: we should report zero if no RETURNING ..."). I'm happy for you to commit it. Regards, Greg Nancarrow Fujitsu Australia
On Mon, Oct 12, 2020 at 9:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 12, 2020 at 6:51 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > Let me know your thoughts on how to deal with these issues. > > Can you see a problem with only having parallel-mode set for the scope of > > Gather/GatherMerge, or do you have some other idea? > > > > I have not thought about this yet but I don't understand your > proposal. How will you set it only for the scope of Gather (Merge)? > The execution of the Gather node will be interleaved with the Insert > node, basically, you fetch a tuple from Gather, and then you need to > Insert it. Can you be a bit more specific on what you have in mind for > this? > One more thing I would like to add here is that we can't exit parallel-mode while the workers are still running (performing the scan or whatever other operation they have been assigned) and the shared memory has not yet been destroyed. Otherwise, the leader can perform unsafe things like assigning new commandIds, or workers can send some error for which the leader should still be in parallel-mode. So, considering this, I think we need quite similar checks (similar to parallel inserts) to make even the Select part parallel for Inserts. If we do that, then you won't face many of the problems you mentioned above, like executing triggers that contain parallel-unsafe stuff. I still feel it will be beneficial to do this, as it will cover cases like Insert with GatherMerge underneath it, which would otherwise not be possible. -- With Regards, Amit Kapila.
On Mon, Oct 12, 2020 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > I have not thought about this yet but I don't understand your > > proposal. How will you set it only for the scope of Gather (Merge)? > > The execution of the Gather node will be interleaved with the Insert > > node, basically, you fetch a tuple from Gather, and then you need to > > Insert it. Can you be a bit more specific on what you have in mind for > > this? > > > > One more thing I would like to add here is that we can't exit > parallel-mode while the workers are still running (performing the scan or > whatever other operation they have been assigned) and the shared memory > has not yet been destroyed. Otherwise, the leader can perform unsafe > things like assigning new commandIds, or workers can send some error for > which the leader should still be in parallel-mode. So, considering > this, I think we need quite similar checks (similar to parallel > inserts) to make even the Select part parallel for Inserts. If we do > that, then you won't face many of the problems you mentioned above, like > executing triggers that contain parallel-unsafe stuff. I still feel it > will be beneficial to do this, as it will cover cases like Insert with > GatherMerge underneath it, which would otherwise not be possible. > Yes, I see what you mean, exiting parallel-mode can't be done safely where I had hoped it could, so it looks like, even for making just the Select part of Insert parallel, I need to add checks (along the same lines as for Parallel Insert) to avoid the parallel Select in certain potentially-unsafe cases. Regards, Greg Nancarrow Fujitsu Australia
On Mon, Oct 12, 2020 at 12:38 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Mon, Oct 12, 2020 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I have not thought about this yet but I don't understand your > > > proposal. How will you set it only for the scope of Gather (Merge)? > > > The execution of the Gather node will be interleaved with the Insert > > > node, basically, you fetch a tuple from Gather, and then you need to > > > Insert it. Can you be a bit more specific on what you have in mind for > > > this? > > > > > > > One more thing I would like to add here is that we can't exit > > parallel-mode while the workers are still running (performing the scan or > > whatever other operation they have been assigned) and the shared memory > > has not yet been destroyed. Otherwise, the leader can perform unsafe > > things like assigning new commandIds, or workers can send some error for > > which the leader should still be in parallel-mode. So, considering > > this, I think we need quite similar checks (similar to parallel > > inserts) to make even the Select part parallel for Inserts. If we do > > that, then you won't face many of the problems you mentioned above, like > > executing triggers that contain parallel-unsafe stuff. I still feel it > > will be beneficial to do this, as it will cover cases like Insert with > > GatherMerge underneath it, which would otherwise not be possible. > > > > Yes, I see what you mean, exiting parallel-mode can't be done safely > where I had hoped it could, so it looks like, even for making just the > Select part of Insert parallel, I need to add checks (along the same > lines as for Parallel Insert) to avoid the parallel Select in certain > potentially-unsafe cases. > Right, after we take care of that, we can think of assigning the xid or things like that before we enter parallel mode. Say we have a function like PrepareParallelMode (or PrepareEnterParallelMode) or something like that, where we can check whether we need to perform a parallel-safe write operation (as of now, Insert) and then do the required preparation, like assigning the xid, etc. I think this might not be ideal, because it is possible that we don't fetch even a single row that needs to be inserted (say, due to a filter condition), and then we will waste an xid, but such cases might not occur often enough to worry about. -- With Regards, Amit Kapila.
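To make that suggestion concrete, here is a minimal, hypothetical sketch of such a function (the name and exact placement are assumptions based on the discussion above, not settled patch code):

/*
 * Hypothetical sketch only: do any preparation that is forbidden once
 * parallel mode is entered, for a statement whose underlying SELECT
 * will run in parallel.
 */
void
PrepareParallelModeForModify(CmdType commandType)
{
    Assert(!IsInParallelMode());

    /*
     * Assign the transaction's xid now, since it cannot be assigned
     * after we enter parallel mode. As noted above, this may waste an
     * xid if no rows end up being inserted, but that should be rare.
     */
    if (commandType == CMD_INSERT)
        (void) GetCurrentTransactionId();
}

A caller would invoke this in the leader, just before EnterParallelMode().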
On Mon, Oct 12, 2020 at 6:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > On Mon, Oct 12, 2020 at 2:11 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Yeah, I think we should go ahead and fix that up front. Here's a > > version with a commit message. > > I've checked it and tested it, and it looks fine to me. > Also, it seems to align with the gripe in the old comment about width > ("XXX this is totally wrong: we should report zero if no RETURNING > ..."). > I'm happy for you to commit it. Pushed, though I left most of that comment there because the width estimate still needs work when you do use RETURNING. At least we now have rows=0 for queries without RETURNING, which was a bigger problem for your patch.
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > It might be a good idea to first just get this patch committed, if > possible. So, I have reviewed the latest version of this patch: > > 0001-InsertParallelSelect I've attached an updated InsertParallelSelect patch (v2) - allowing underlying parallel SELECT for "INSERT INTO ... SELECT ...". I think I've addressed most of the issues identified in the previous version of the patch. I'm still seeing a couple of errors in the tests when "force_parallel_mode=regress" is in effect, and those need to be addressed (extra checks required to avoid parallel SELECT in certain cases). Also, I'm seeing a partition-related error when running installcheck-world - I'm investigating that. Regards, Greg Nancarrow Fujitsu Australia
On Thu, Oct 15, 2020 at 9:56 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > It might be a good idea to first just get this patch committed, if > > possible. So, I have reviewed the latest version of this patch: > > > > 0001-InsertParallelSelect > > I've attached an updated InsertParallelSelect patch (v2) - allowing > underlying parallel SELECT for "INSERT INTO ... SELECT ...". > I think I've addressed most of the issues identified in the previous > version of the patch. > I'm still seeing a couple of errors in the tests when > "force_parallel_mode=regress" is in effect, and those need to be > addressed (extra checks required to avoid parallel SELECT in certain > cases). > I am getting the below error with force_parallel_mode: @@ -1087,9 +1087,14 @@ ERROR: value for domain inotnull violates check constraint "inotnull_check" create table dom_table (x inotnull); insert into dom_table values ('1'); +ERROR: cannot start commands during a parallel operation +CONTEXT: SQL function "sql_is_distinct_from" It happened with the below test: create function sql_is_distinct_from(anyelement, anyelement) returns boolean language sql as 'select $1 is distinct from $2 limit 1'; create domain inotnull int check (sql_is_distinct_from(value, null)); create table dom_table (x inotnull); insert into dom_table values ('1'); So it is clear that this is happening because we have allowed an insert that is parallel-unsafe. The attribute is of a domain type which has a parallel-unsafe constraint. As per your current code, we need to detect it in IsRelParallelModeSafeForModify. The idea would be to check the type of each attribute and, if it is a domain type, check whether it has a constraint (see function ExecGrant_Type for how to detect a domain type, and then refer to functions AlterTypeNamespaceInternal and AlterConstraintNamespaces to see how to determine the constraint for a domain type). Once you find a constraint, you already have code in your patch to determine if it contains a parallel-unsafe expression. > Also, I'm seeing a partition-related error when running > installcheck-world - I'm investigating that. > Okay. Few more comments: ================== 1. + /* + * Can't support execution of row-level or transition-table triggers + * during parallel-mode, since such triggers may query the table + * into which the data is being inserted, and the content returned + * would vary unpredictably according to the order of retrieval by + * the workers and the rows already inserted. + */ + if (trigdesc != NULL && + (trigdesc->trig_insert_instead_row || + trigdesc->trig_insert_before_row || + trigdesc->trig_insert_after_row || + trigdesc->trig_insert_new_table)) + { + return false; + } I don't think it is a good idea to prohibit all before/after/instead row triggers, because for the case you are referring to, the trigger functions should be marked as parallel-unsafe. We might want to have an Assert somewhere to detect illegal usage, but I don't see the need to directly prohibit them. 2. @@ -56,8 +57,8 @@ GetNewTransactionId(bool isSubXact) * Workers synchronize transaction state at the beginning of each parallel * operation, so we can't account for new XIDs after that point. 
*/ - if (IsInParallelMode()) - elog(ERROR, "cannot assign TransactionIds during a parallel operation"); + if (IsParallelWorker()) + elog(ERROR, "cannot assign TransactionIds in a parallel worker"); /* * During bootstrap initialization, we return the special bootstrap diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index af6afce..ef423fb 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -580,8 +580,8 @@ AssignTransactionId(TransactionState s) * Workers synchronize transaction state at the beginning of each parallel * operation, so we can't account for new XIDs at this point. */ - if (IsInParallelMode() || IsParallelWorker()) - elog(ERROR, "cannot assign XIDs during a parallel operation"); + if (IsParallelWorker()) + elog(ERROR, "cannot assign XIDs in a parallel worker"); I think we don't need these changes, at least for the purpose of this patch, if you follow the suggestion related to having a new function like PrepareParallelMode in the email above [1]. One problem I see with removing these checks is how we ensure that the leader won't assign a new transactionid once we start executing a parallel node. It could do so via sub-transactions; maybe that is already protected at some previous point, but I would like to see if we can retain these checks. [1] - https://www.postgresql.org/message-id/CAA4eK1JogfXUa%3D3wMPO%2BK%3DUiOLgHgCO7-fj1wCHsSxdaXsfVbw%40mail.gmail.com -- With Regards, Amit Kapila.
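For what it's worth, a simpler route than scanning the catalogs directly might be the typcache: below is a purely hypothetical sketch of such a check using GetDomainConstraints(), where max_parallel_hazard_context and max_parallel_hazard_walker() are the patch's existing internals and the function name is invented:

/*
 * Hypothetical sketch only: check the CHECK constraints (if any) of a
 * domain type for parallel-unsafe expressions.
 */
static bool
domain_max_parallel_hazard(Oid typid, max_parallel_hazard_context *context)
{
    List       *constraints = GetDomainConstraints(typid);
    ListCell   *lc;

    foreach(lc, constraints)
    {
        DomainConstraintState *con = (DomainConstraintState *) lfirst(lc);

        /* Walk the constraint expression, looking for hazards */
        if (max_parallel_hazard_walker((Node *) con->check_expr, context))
            return true;
    }
    return false;
}

The caller would apply this to each attribute whose type is a domain, e.g. when get_typtype(att->atttypid) == TYPTYPE_DOMAIN.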
On Thu, Oct 15, 2020 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 15, 2020 at 9:56 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > Also, I'm seeing a partition-related error when running > > installcheck-world - I'm investigating that. > > > > Okay. > The attached patch fixes this partition case for me. Basically, we need to check the parallel-safety of the PartitionKey. I have only checked for partsupfunc, but I think we need to check the parallel-safety of partexprs as well. Also, I noticed that you have allowed parallelism only when all expressions/functions involved with the Insert are parallel-safe; can't we allow the parallel-restricted case, since for this patch the Inserts have to be performed by the leader anyway? -- With Regards, Amit Kapila.
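As a rough illustration of that idea, here is a hypothetical sketch of a partition-key check (the function name is invented, and max_parallel_hazard_test()/max_parallel_hazard_walker() stand in for the patch's helpers that fold a parallel-hazard level into the context):

/*
 * Hypothetical sketch only: check a partitioned table's partition key
 * support functions and key expressions for parallel hazards.
 */
static bool
partition_key_max_parallel_hazard(Relation rel,
                                  max_parallel_hazard_context *context)
{
    int         i;
    PartitionKey pkey = RelationGetPartitionKey(rel);

    /* Check the parallel-safety of the partition support functions */
    for (i = 0; i < pkey->partnatts; i++)
    {
        Oid         funcid = pkey->partsupfunc[i].fn_oid;

        if (OidIsValid(funcid) &&
            max_parallel_hazard_test(func_parallel(funcid), context))
            return true;
    }

    /* Check the parallel-safety of any partition key expressions */
    return max_parallel_hazard_walker((Node *) pkey->partexprs, context);
}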
On Fri, Oct 16, 2020 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 15, 2020 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Oct 15, 2020 at 9:56 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > > Also, I'm seeing a partition-related error when running > > > installcheck-world - I'm investigating that. > > > > > > > Okay. > > > > The attached patch fixes this partition case for me. Basically, we > need to check the parallel-safety of the PartitionKey. I have only checked > for partsupfunc, but I think we need to check the parallel-safety of > partexprs as well. Thanks, I had already added the parallel-safety check for partexprs when I saw this, so your patch hopefully completes all the partition-related checks required. > Also, I noticed that you have allowed > parallelism only when all expressions/functions involved with the Insert > are parallel-safe; can't we allow the parallel-restricted case, since for > this patch the Inserts have to be performed by the leader anyway? > Yes, I think you're right. "A parallel restricted operation is one which cannot be performed in a parallel worker, but which can be performed in the leader while parallel query is in use." I'll make the change and test that everything works OK. Regards, Greg Nancarrow Fujitsu Australia
On Fri, Oct 16, 2020 at 2:16 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Oct 16, 2020 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > Also, I noticed that you have allowed for > > parallelism only when all expressions/functions involved with Insert > > are parallel-safe, can't we allow parallel-restricted case because > > anyway Inserts have to be performed by the leader for this patch. > > > > Yes, I think you're right. > "A parallel restricted operation is one which cannot be performed in a > parallel worker, but which can be performed in the leader while > parallel query is in use." > I'll make the change and test that everything works OK. > Cool, let me try to explain my thoughts a bit more. The idea is first (in standard_planner) we check if there is any 'parallel_unsafe' function/expression (via max_parallel_hazard) in the query tree. If we don't find anything 'parallel_unsafe' then we mark parallelModeOk. At this stage, the patch is checking whether there is any 'parallel_unsafe' or 'parallel_restricted' expression/function in the target relation and if there is none then we mark parallelModeOK as true. So, if there is anything 'parallel_restricted' then we will mark parallelModeOK as false which doesn't seem right to me. Then later in the planner during set_rel_consider_parallel, we determine if a particular relation can be scanned from within a worker, then we consider that relation for parallelism. Here, we determine if certain things are parallel-restricted then we don't consider this for parallelism. Then we create partial paths for the relations that are considered for parallelism. I think we don't need to change anything for the current patch in these later stages because we anyway are not considering Insert to be pushed into workers. However, in the second patch where we are thinking to push Inserts in workers, we might need to do something to filter parallel-restricted cases during this stage of the planner. -- With Regards, Amit Kapila.
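To illustrate the distinction with a hypothetical example (all object names here are invented): a parallel-restricted function in, say, a column default must be evaluated in the leader, but it should not prevent parallelModeOK from being set, since only the SELECT part runs in the workers:

create function restricted_default() returns int
    language sql parallel restricted as 'select 42';
create table target_tbl (a int default restricted_default(), b int);
-- The Insert (including evaluation of the default for column "a") is
-- performed entirely by the leader, so the underlying SELECT could
-- still use a parallel plan:
explain (costs off) insert into target_tbl (b) select c1 from source_tbl;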
On Fri, Oct 16, 2020 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Cool, let me try to explain my thoughts a bit more. The idea is first > (in standard_planner) we check if there is any 'parallel_unsafe' > function/expression (via max_parallel_hazard) in the query tree. If we > don't find anything 'parallel_unsafe' then we mark parallelModeOk. At > this stage, the patch is checking whether there is any > 'parallel_unsafe' or 'parallel_restricted' expression/function in the > target relation and if there is none then we mark parallelModeOK as > true. So, if there is anything 'parallel_restricted' then we will mark > parallelModeOK as false which doesn't seem right to me. > > Then later in the planner during set_rel_consider_parallel, we > determine if a particular relation can be scanned from within a > worker, then we consider that relation for parallelism. Here, we > determine if certain things are parallel-restricted then we don't > consider this for parallelism. Then we create partial paths for the > relations that are considered for parallelism. I think we don't need > to change anything for the current patch in these later stages because > we anyway are not considering Insert to be pushed into workers. > However, in the second patch where we are thinking to push Inserts in > workers, we might need to do something to filter parallel-restricted > cases during this stage of the planner. > Posting an update to the smaller patch (Parallel SELECT for INSERT INTO...SELECT...). Most of this patch feeds into the larger Parallel INSERT patch, for which I'll also be posting an update soon. Patch updates include: - Removed explicit trigger-type checks (instead rely on declared trigger parallel safety) - Restored parallel-related XID checks that previous patch altered; now assign XID prior to entering parallel-mode - Now considers parallel-SELECT for parallel RESTRICTED cases (not just parallel SAFE cases) - Added parallel-safety checks for partition key expressions and support functions - Workaround added for test failure in "partition-concurrent-attach" test; since ALTER TABLE operations may exclusively lock a relation until end-of-transaction, we now assume and return UNSAFE if a share-lock can't be acquired on the relation, rather than blocking until the possible end of the other concurrent transaction in which the exclusive lock is held. Examples of when a relation is exclusively locked (AccessExclusiveLock) until end-of-transaction include: ALTER TABLE DROP COLUMN ALTER TABLE ... RENAME ALTER TABLE ... ATTACH PARTITION (locks default partition) Regards, Greg Nancarrow Fujitsu Australia
On Fri, Oct 16, 2020 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Cool, let me try to explain my thoughts a bit more. The idea is first > (in standard_planner) we check if there is any 'parallel_unsafe' > function/expression (via max_parallel_hazard) in the query tree. If we > don't find anything 'parallel_unsafe' then we mark parallelModeOk. At > this stage, the patch is checking whether there is any > 'parallel_unsafe' or 'parallel_restricted' expression/function in the > target relation and if there is none then we mark parallelModeOK as > true. So, if there is anything 'parallel_restricted' then we will mark > parallelModeOK as false which doesn't seem right to me. > > Then later in the planner during set_rel_consider_parallel, we > determine if a particular relation can be scanned from within a > worker, then we consider that relation for parallelism. Here, we > determine if certain things are parallel-restricted then we don't > consider this for parallelism. Then we create partial paths for the > relations that are considered for parallelism. I think we don't need > to change anything for the current patch in these later stages because > we anyway are not considering Insert to be pushed into workers. > However, in the second patch where we are thinking to push Inserts in > workers, we might need to do something to filter parallel-restricted > cases during this stage of the planner. > Posting an updated Parallel INSERT patch which (mostly) addresses previously-identified issues and suggestions. More work needs to be done in order to support parallel UPDATE and DELETE (even after application of Thomas Munro's combo-cid parallel-support patch), but it is getting closer. Regards, Greg Nancarrow Fujitsu Australia
On Thu, Oct 22, 2020 at 9:47 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Oct 16, 2020 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Posting an update to the smaller patch (Parallel SELECT for INSERT > INTO...SELECT...). > > Most of this patch feeds into the larger Parallel INSERT patch, for > which I'll also be posting an update soon. > > Patch updates include: > - Removed explicit trigger-type checks (instead rely on declared > trigger parallel safety) > - Restored parallel-related XID checks that previous patch altered; > now assign XID prior to entering parallel-mode > - Now considers parallel-SELECT for parallel RESTRICTED cases (not > just parallel SAFE cases) > - Added parallel-safety checks for partition key expressions and > support functions > - Workaround added for test failure in "partition-concurrent-attach" > test; > IIUC, below is code for this workaround: +MaxRelParallelHazardForModify(Oid relid, + CmdType commandType, + max_parallel_hazard_context *context) +{ + Relation rel; + TupleDesc tupdesc; + int attnum; + + LOCKMODE lockmode = AccessShareLock; + + /* + * It's possible that this relation is locked for exclusive access + * in another concurrent transaction (e.g. as a result of a + * ALTER TABLE ... operation) until that transaction completes. + * If a share-lock can't be acquired on it now, we have to assume this + * could be the worst-case, so to avoid blocking here until that + * transaction completes, conditionally try to acquire the lock and + * assume and return UNSAFE on failure. + */ + if (ConditionalLockRelationOid(relid, lockmode)) + { + rel = table_open(relid, NoLock); + } + else + { + context->max_hazard = PROPARALLEL_UNSAFE; + return context->max_hazard; + } Do we need this workaround if we lock just the parent table instead of locking all the tables? Basically, can we safely identify the parallel-safety of partitioned relation if we just have a lock on parent relation? One more thing I have noticed is that for scan relations (Select query), we do such checks much later based on RelOptInfo (see set_rel_consider_parallel) which seems to have most of the information required to perform parallel-safety checks but I guess for ModifyTable (aka the Insert table) the equivalent doesn't seem feasible but have you thought of doing at the later stage in planner? Few other comments on latest patch: =============================== 1. MaxRelParallelHazardForModify() { .. + if (commandType == CMD_INSERT || commandType == CMD_UPDATE) + { + /* .. Why to check CMD_UPDATE here? 2. +void PrepareParallelModeForModify(CmdType commandType, bool isParallelModifyLeader) +{ + Assert(!IsInParallelMode()); + + if (isParallelModifyLeader) + (void)GetCurrentCommandId(true); + + (void)GetCurrentFullTransactionId(); Here, we should use GetCurrentTransactionId() similar to heap_insert or other heap operations. I am not sure why you have used GetCurrentFullTransactionId? 3. Can we have a test to show why we need to check all the partitions for parallel-safety? I think it would be possible when there is a trigger on only one of the partitions and that trigger has corresponding parallel_unsafe function. But it is good to verify that once. 4. Have you checked the overhead of this on the planner for different kinds of statements like inserts into tables having 100 or 500 partitions? Similarly, it is good to check the overhead of domain related checks added in the patch. 5. Can we have a separate patch for parallel-selects for Insert? It will make review easier. 
-- With Regards, Amit Kapila.
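Regarding comment 3, a hypothetical sketch of such a test might look like this (all names invented); a parallel-unsafe trigger on just one partition should be enough to rule out a parallel plan for an insert into the parent:

create table ptarget (a int) partition by list (a);
create table ptarget_p1 partition of ptarget for values in (1);
create table ptarget_p2 partition of ptarget for values in (2);
create function p2_trig_fn() returns trigger
    language plpgsql parallel unsafe as
$$ begin return new; end; $$;
create trigger p2_trig before insert on ptarget_p2
    for each row execute function p2_trig_fn();
-- Expectation: no Gather under the Insert here, because one of the
-- target's partitions (ptarget_p2) has a parallel-unsafe trigger:
explain (costs off)
    insert into ptarget select 1 from generate_series(1, 1000);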
On Tue, Oct 27, 2020 at 8:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > IIUC, below is code for this workaround: > > +MaxRelParallelHazardForModify(Oid relid, > + CmdType commandType, > + max_parallel_hazard_context *context) > +{ > + Relation rel; > + TupleDesc tupdesc; > + int attnum; > + > + LOCKMODE lockmode = AccessShareLock; > + > + /* > + * It's possible that this relation is locked for exclusive access > + * in another concurrent transaction (e.g. as a result of a > + * ALTER TABLE ... operation) until that transaction completes. > + * If a share-lock can't be acquired on it now, we have to assume this > + * could be the worst-case, so to avoid blocking here until that > + * transaction completes, conditionally try to acquire the lock and > + * assume and return UNSAFE on failure. > + */ > + if (ConditionalLockRelationOid(relid, lockmode)) > + { > + rel = table_open(relid, NoLock); > + } > + else > + { > + context->max_hazard = PROPARALLEL_UNSAFE; > + return context->max_hazard; > + } > > Do we need this workaround if we lock just the parent table instead of > locking all the tables? Basically, can we safely identify the > parallel-safety of partitioned relation if we just have a lock on > parent relation? I believe the workaround is still needed in this case: it was added because of a test in which the parent table was exclusively locked in another concurrent transaction (as a result of ALTER TABLE ... ATTACH PARTITION ...), so we could not even get a ShareLock on the parent table without hanging (and then failing the test because of it). So at the moment the workaround is needed, even if just trying to lock the parent table. I'll do some more testing to determine the secondary issue of whether locks on the partition tables are needed, but at the moment I believe they are. >One more thing I have noticed is that for scan > relations (Select query), we do such checks much later based on > RelOptInfo (see set_rel_consider_parallel) which seems to have most of > the information required to perform parallel-safety checks but I guess > for ModifyTable (aka the Insert table) the equivalent doesn't seem > feasible but have you thought of doing at the later stage in planner? > Yes, and in fact I tried putting the checks in a later stage of the planner, and it's almost successful, except it then makes setting "parallelModeNeeded" very tricky indeed, because that is expected to be set based on whether the SQL is safe to run in parallel mode (parallelModeOK == true) and whether force_parallel_mode is not off. With parallel safety checks delayed to a later stage in the planner, it's then not known whether there are certain types of parallel-unsafe INSERTs (such as INSERT INTO ... VALUES ... ON CONFLICT DO UPDATE ...), because processing for those doesn't reach those later stages of the planner where parallelism is being considered. So then to avoid errors from when parallel-mode is forced on and such unsafe INSERTs are run, the only real choice is to only allow parallelModeNeeded to be true for SELECT only (not INSERT), and this is kind of cheating and also not picking up cases where parallel-safe INSERT is run but invokes parallel-mode-unsafe features. My conclusion, at least for the moment, is to leave the check where it is. > Few other comments on latest patch: > =============================== > 1. > MaxRelParallelHazardForModify() > { > .. > + if (commandType == CMD_INSERT || commandType == CMD_UPDATE) > + { > + /* > .. 
> > Why to check CMD_UPDATE here? > That was a bit of forward-thinking, for when/if UPDATE/DELETE is supported in parallel-mode. Column default expressions and check-constraints are only applicable to INSERT and UPDATE. Note however that currently this function can only ever be called with commandType == CMD_INSERT. > 2. > +void PrepareParallelModeForModify(CmdType commandType, bool > isParallelModifyLeader) > +{ > + Assert(!IsInParallelMode()); > + > + if (isParallelModifyLeader) > + (void)GetCurrentCommandId(true); > + > + (void)GetCurrentFullTransactionId(); > > Here, we should use GetCurrentTransactionId() similar to heap_insert > or other heap operations. I am not sure why you have used > GetCurrentFullTransactionId? > GetCurrentTransactionId() and GetCurrentFullTransactionId() actually have the same functionality, just a different return value (which is not being used here). But anyway I've changed it to use GetCurrentTransactionId(). > 3. Can we have a test to show why we need to check all the partitions > for parallel-safety? I think it would be possible when there is a > trigger on only one of the partitions and that trigger has > corresponding parallel_unsafe function. But it is good to verify that > once. > I can't imagine how you could check parallel-safety properly without checking all of the partitions. We don't know which partition that data will get inserted into until runtime (e.g. range/list partitioning). Each partition can have its own column default expressions, check-constraints, triggers etc. (which may or may not be parallel-safe) and a partition may itself be a partitioned table. > 4. Have you checked the overhead of this on the planner for different > kinds of statements like inserts into tables having 100 or 500 > partitions? Similarly, it is good to check the overhead of domain > related checks added in the patch. > Checking that now and will post results soon. > 5. Can we have a separate patch for parallel-selects for Insert? It > will make review easier. > See attached patches. Regards, Greg Nancarrow Fujitsu Australia
On Fri, Oct 30, 2020 at 6:09 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Tue, Oct 27, 2020 at 8:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > IIUC, below is code for this workaround: > > > > +MaxRelParallelHazardForModify(Oid relid, > > + CmdType commandType, > > + max_parallel_hazard_context *context) > > +{ > > + Relation rel; > > + TupleDesc tupdesc; > > + int attnum; > > + > > + LOCKMODE lockmode = AccessShareLock; > > + > > + /* > > + * It's possible that this relation is locked for exclusive access > > + * in another concurrent transaction (e.g. as a result of a > > + * ALTER TABLE ... operation) until that transaction completes. > > + * If a share-lock can't be acquired on it now, we have to assume this > > + * could be the worst-case, so to avoid blocking here until that > > + * transaction completes, conditionally try to acquire the lock and > > + * assume and return UNSAFE on failure. > > + */ > > + if (ConditionalLockRelationOid(relid, lockmode)) > > + { > > + rel = table_open(relid, NoLock); > > + } > > + else > > + { > > + context->max_hazard = PROPARALLEL_UNSAFE; > > + return context->max_hazard; > > + } > > > > Do we need this workaround if we lock just the parent table instead of > > locking all the tables? Basically, can we safely identify the > > parallel-safety of partitioned relation if we just have a lock on > > parent relation? > > I believe the workaround is still needed in this case: it was added > because of a test in which the parent table was exclusively locked in > another concurrent transaction (as a result of ALTER TABLE ... ATTACH > PARTITION ...), so we could not even get a ShareLock on the parent > table without hanging (and then failing the test because of it). > Don't you think the test case design is flawed in that case? Because even a simple "select * from tpart;" will hang in the planner while taking a share lock (the code flow is: add_other_rels_to_query->expand_inherited_rtentry->expand_partitioned_rtentry) once you take an exclusive lock on the table in a concurrent session. Currently we never need to acquire any lock for Inserts in the planner, but I am not sure we can design a test case based on that assumption, as we can see it fails in this basic case. > So at the moment the workaround is needed, even if just trying to lock > the parent table. > I am not convinced; rather, I think that the test case is not well designed, unless there is some other way (without taking a share lock on the relation) to determine the parallel-safety of Inserts which neither of us has thought of. I understand that you don't want to change that test case as part of this patch, so you are using this workaround. > I'll do some more testing to determine the secondary issue of whether > locks on the partition tables are needed, but at the moment I believe > they are. > Fair enough, but let's determine that by some testing and analysis. I feel we should even add a comment if we are required to lock all partition tables. I see that we are already doing it for SELECT in the above-mentioned code path, so maybe it is okay to do so for Inserts as well. > >One more thing I have noticed is that for scan > > relations (Select query), we do such checks much later based on > > RelOptInfo (see set_rel_consider_parallel) which seems to have most of > > the information required to perform parallel-safety checks but I guess > > for ModifyTable (aka the Insert table) the equivalent doesn't seem > > feasible but have you thought of doing at the later stage in planner? 
> > > > Yes, and in fact I tried putting the checks in a later stage of the > planner, and it's almost successful, except it then makes setting > "parallelModeNeeded" very tricky indeed, because that is expected to > be set based on whether the SQL is safe to run in parallel mode > (parallelModeOK == true) and whether force_parallel_mode is not off. > With parallel safety checks delayed to a later stage in the planner, > it's then not known whether there are certain types of parallel-unsafe > INSERTs (such as INSERT INTO ... VALUES ... ON CONFLICT DO UPDATE > ...), because processing for those doesn't reach those later stages of > the planner where parallelism is being considered. > I guess if that is the only case then you can have that check in the earlier stage of the planner (we should be able to do that, as the information is present in the Query) and the other checks in the later stage. However, I guess that is not the only case: we need to determine the parallel-safety of index expressions, trigger functions (if any), any other CHECK expressions on each attribute, etc. > So then to avoid > errors from when parallel-mode is forced on and such unsafe INSERTs > are run, the only real choice is to only allow parallelModeNeeded to > be true for SELECT only (not INSERT), and this is kind of cheating and > also not picking up cases where parallel-safe INSERT is run but > invokes parallel-mode-unsafe features. > My conclusion, at least for the moment, is to leave the check where it is. > Okay, then can we integrate the functionality of MaxParallelHazardForModify in max_parallel_hazard? Calling it separately looks a bit awkward. > > > Few other comments on latest patch: > > =============================== > > 1. > > MaxRelParallelHazardForModify() > > { > > .. > > + if (commandType == CMD_INSERT || commandType == CMD_UPDATE) > > + { > > + /* > > .. > > > > Why to check CMD_UPDATE here? > > > > That was a bit of forward-thinking, for when/if UPDATE/DELETE is > supported in parallel-mode. > Column default expressions and check-constraints are only applicable > to INSERT and UPDATE. > Note however that currently this function can only ever be called with > commandType == CMD_INSERT. > I feel that for other command types there should be an Assert, rather than trying to handle something which is not yet implemented and for which it is not clear what is required. It confuses the reader; at least, it confused me. Probably we can write a comment, but I don't think we should have any check for Update at this stage of work. > > 2. > > +void PrepareParallelModeForModify(CmdType commandType, bool > > isParallelModifyLeader) > > +{ > > + Assert(!IsInParallelMode()); > > + > > + if (isParallelModifyLeader) > > + (void)GetCurrentCommandId(true); > > + > > + (void)GetCurrentFullTransactionId(); > > > > Here, we should use GetCurrentTransactionId() similar to heap_insert > > or other heap operations. I am not sure why you have used > > GetCurrentFullTransactionId? > > > > GetCurrentTransactionId() and GetCurrentFullTransactionId() actually > have the same functionality, just a different return value (which is > not being used here). > Sure, but let's use what is required. > > But anyway I've changed it to use GetCurrentTransactionId(). > But comments in ExecutePlan and PrepareParallelModeForModify still refer to FullTransactionId. > > > > 4. Have you checked the overhead of this on the planner for different > > kinds of statements like inserts into tables having 100 or 500 > > partitions? 
Similarly, it is good to check the overhead of domain > > related checks added in the patch. > > > > Checking that now and will post results soon. > Thanks. -- With Regards, Amit Kapila.
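A minimal sketch of the Assert-based restriction being suggested above (the surrounding context in MaxRelParallelHazardForModify() and the comment wording are assumptions, not the actual patch):

/*
 * Sketch only: per the suggestion, restrict the function to INSERT and
 * assert that, rather than partially handling CMD_UPDATE. (A comment
 * could note that column default expressions and check-constraints
 * would also matter for future UPDATE support.)
 */
Assert(commandType == CMD_INSERT);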
On Fri, Oct 30, 2020 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > So then to avoid > > errors from when parallel-mode is forced on and such unsafe INSERTs > > are run, the only real choice is to only allow parallelModeNeeded to > > be true for SELECT only (not INSERT), and this is kind of cheating and > > also not picking up cases where parallel-safe INSERT is run but > > invokes parallel-mode-unsafe features. > > My conclusion, at least for the moment, is to leave the check where it is. > > > > Okay, then can we integrate the functionality of > MaxParallelHazardForModify in max_parallel_hazard? Calling it > separately looks a bit awkward. > Looking into that. > > > > > A few other comments on the latest patch: > > > =============================== > > > 1. > > > MaxRelParallelHazardForModify() > > > { > > > .. > > > + if (commandType == CMD_INSERT || commandType == CMD_UPDATE) > > > + { > > > + /* > > > .. > > > > > > Why check CMD_UPDATE here? > > > > > > > That was a bit of forward-thinking, for when/if UPDATE/DELETE is > > supported in parallel-mode. > > Column default expressions and check-constraints are only applicable > > to INSERT and UPDATE. > > Note however that currently this function can only ever be called with > > commandType == CMD_INSERT. > > > > I feel that for other command types there should then be an Assert, rather > than trying to handle something which is not yet implemented, nor is it > clear what all is required for that. It confuses the reader; at least, > it confused me. Probably we can write a comment, but I don't think we > should have any check for Update at this stage of work. > OK, for now I'll restrict the checks to INSERT, but I'll add comments to assist with potential future UPDATE support. > > > 2. > > > +void PrepareParallelModeForModify(CmdType commandType, bool > > > isParallelModifyLeader) > > > +{ > > > + Assert(!IsInParallelMode()); > > > + > > > + if (isParallelModifyLeader) > > > + (void)GetCurrentCommandId(true); > > > + > > > + (void)GetCurrentFullTransactionId(); > > > > > > Here, we should use GetCurrentTransactionId() similar to heap_insert > > > or other heap operations. I am not sure why you have used > > > GetCurrentFullTransactionId? > > > > > > > GetCurrentTransactionId() and GetCurrentFullTransactionId() actually > > have the same functionality, just a different return value (which is > > not being used here). > > > > Sure, but let's use what is required. > > > But anyway I've changed it to use GetCurrentTransactionId(). > > > > But comments in ExecutePlan and PrepareParallelModeForModify still > refer to FullTransactionId. > I believe those comments are technically correct. GetCurrentTransactionId() calls AssignTransactionId() to do all the work - and the comment for that function says "Assigns a new permanent FullTransactionId to the given TransactionState". > > > > > > > 4. Have you checked the overhead of this on the planner for different > > > kinds of statements like inserts into tables having 100 or 500 > > > partitions? Similarly, it is good to check the overhead of domain > > > related checks added in the patch. > > > > > > > Checking that now and will post results soon. > > > I am seeing a fair bit of overhead in the planning for the INSERT parallel-safety checks (mind you, compared to the overall performance gain, it's not too bad). Some representative timings for a parallel INSERT of a million rows into 100, 250 and 500 partitions are shown below. 
(1) Without patch

# Partitions    Planning Time (ms)    Execution Time (ms)
100             1.014                 4176.435
250             0.404                 3842.414
500             0.529                 4440.633

(2) With Parallel INSERT patch

# Partitions    Planning Time (ms)    Execution Time (ms)
100             11.420                2131.148
250             23.269                3472.259
500             36.531                3238.868

I'm looking into how this can be improved by better integration into the current code, and addressing locking concerns that you've previously mentioned.

Regards,
Greg Nancarrow
Fujitsu Australia
>
> See attached patches.
>
Thanks for providing the patches.
I have reviewed v6-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch; please find my comments:
-> commandType is not used, we can remove it.
+ * Prepare for entering parallel mode by assigning a FullTransactionId, to be
+ * included in the transaction state that is serialized in the parallel DSM.
+ */
+void PrepareParallelModeForModify(CmdType commandType)
+{
+ Assert(!IsInParallelMode());
+
+ (void)GetCurrentTransactionId();
+}
-> As we support insertion of data from the workers, this comment "but as of now, only the leader backend writes into a completely new table. In the future, we can extend it to allow workers to write into the table" must be updated accordingly:
+ * modify any data using a CTE, or if this is a cursor operation, or if
+ * GUCs are set to values that don't permit parallelism, or if
+ * parallel-unsafe functions are present in the query tree.
*
- * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
+ * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and CREATE
* MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
* backend writes into a completely new table. In the future, we can
* extend it to allow workers to write into the table. However, to allow
-> Also, should we specify insert as "insert into select"?
-> We could include a small writeup of the design, maybe in the commit message. It will be useful for review.
-> I felt the below two assignment statements can be in the else condition:
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+ /*
+ * Additional parallel-mode safety checks are required in order to
+ * allow an underlying parallel query to be used for a
+ * table-modification command that is supported in parallel-mode.
+ */
+ if (glob->parallelModeOK &&
+ IsModifySupportedInParallelMode(parse->commandType))
+ {
+ glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
+ glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+ }
something like:
/*
* Additional parallel-mode safety checks are required in order to
* allow an underlying parallel query to be used for a
* table-modification command that is supported in parallel-mode.
*/
if (glob->parallelModeOK &&
IsModifySupportedInParallelMode(parse->commandType))
glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
else
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
-> Comments need slight adjustment, maybe you could run pgindent for the modified code.
+ /*
+ * Supported table-modification commands may require additional steps
+ * prior to entering parallel mode, such as assigning a FullTransactionId.
+ */
-> In the below, max_parallel_hazard_test will return true for PROPARALLEL_RESTRICTED also. Is the break intentional in that case? As in the case of RI_TRIGGER_FK for PROPARALLEL_RESTRICTED we continue.
+ if (max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
+ break;
+
+ /*
+ * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+ * the relation, and this would result in creation of new CommandIds
+ * on insert/update/delete and this isn't supported in a parallel
+ * worker (but is safe in the parallel leader).
+ */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_FK)
+ {
+ context->max_hazard = PROPARALLEL_RESTRICTED;
+ /*
+ * As we're looking for the max parallel hazard, we don't break
+ * here; examine any further triggers ...
+ */
+ }
-> Should we switch to non-parallel mode in this case, instead of throwing an error?
+ val = SysCacheGetAttr(CONSTROID, tup,
+ Anum_pg_constraint_conbin, &isnull);
+ if (isnull)
+ elog(ERROR, "null conbin for constraint %u", con->oid);
+ conbin = TextDatumGetCString(val);
-> We could include a few tests for this in regression.
-> We might need some documentation update like in parallel-query.html/parallel-plans.html, etc
Hi Vignesh, Thanks for reviewing the patches. On Tue, Nov 3, 2020 at 5:25 PM vignesh C <vignesh21@gmail.com> wrote: > > -> commandType is not used, we can remove it. > + * Prepare for entering parallel mode by assigning a FullTransactionId, to be > + * included in the transaction state that is serialized in the parallel DSM. > + */ > +void PrepareParallelModeForModify(CmdType commandType) > +{ > + Assert(!IsInParallelMode()); > + > + (void)GetCurrentTransactionId(); > +} Thanks, at least for INSERT, it's not needed, so I'll remove it. > > -> As we support insertion of data from the workers, this comment "but as of now, only the leader backend writes into a completely new table. In the future, we can extend it to allow workers to write into the table" must be updated accordingly: > + * modify any data using a CTE, or if this is a cursor operation, or if > + * GUCs are set to values that don't permit parallelism, or if > + * parallel-unsafe functions are present in the query tree. > * > - * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE > + * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and CREATE > * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader > * backend writes into a completely new table. In the future, we can > * extend it to allow workers to write into the table. However, to allow > > -> Also, should we specify insert as "insert into select"? > I'll update it as appropriate for each patch. > -> We could include a small writeup of the design, maybe in the commit message. It will be useful for review. > Will do so for the next patch version. > -> I felt the below two assignment statements can be in the else condition: > glob->maxParallelHazard = max_parallel_hazard(parse); > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > + > + /* > + * Additional parallel-mode safety checks are required in order to > + * allow an underlying parallel query to be used for a > + * table-modification command that is supported in parallel-mode. > + */ > + if (glob->parallelModeOK && > + IsModifySupportedInParallelMode(parse->commandType)) > + { > + glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard); > + glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > + } > > something like: > /* > * Additional parallel-mode safety checks are required in order to > * allow an underlying parallel query to be used for a > * table-modification command that is supported in parallel-mode. > */ > if (glob->parallelModeOK && > IsModifySupportedInParallelMode(parse->commandType)) > glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard); > else > /* all the cheap tests pass, so scan the query tree */ > glob->maxParallelHazard = max_parallel_hazard(parse); > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > That won't work. As the comment is trying to point out, additional parallel-safety checks (i.e. in addition to those done by max_parallel_hazard()) are required to determine if INSERT can be safely run in parallel-mode with an underlying parallel query. Also, the max_parallel_hazard found from first calling max_parallel_hazard() then needs to be fed into MaxParallelHazardForModify(), in case it finds a worse parallel hazard. For example, max_parallel_hazard() may find something parallel RESTRICTED, but then the additional parallel-safety checks done by MaxParallelHazardForModify() find something parallel UNSAFE. 
> -> Comments need slight adjustment, maybe you could run pgindent for the modified code. > + /* > + * Supported table-modification commands may require additional steps > + * prior to entering parallel mode, such as assigning a FullTransactionId. > + */ > OK, will run pgindent. > -> In the below, max_parallel_hazard_test will return true for PROPARALLEL_RESTRICTED also. Is the break intentional in that case? As in the case of RI_TRIGGER_FK for PROPARALLEL_RESTRICTED we continue. > + if (max_parallel_hazard_test(func_parallel(trigger->tgfoid), context)) > + break; > + > + /* > + * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in > + * the relation, and this would result in creation of new CommandIds > + * on insert/update/delete and this isn't supported in a parallel > + * worker (but is safe in the parallel leader). > + */ > + trigtype = RI_FKey_trigger_type(trigger->tgfoid); > + if (trigtype == RI_TRIGGER_FK) > + { > + context->max_hazard = PROPARALLEL_RESTRICTED; > + /* > + * As we're looking for the max parallel hazard, we don't break > + * here; examine any further triggers ... > + */ > + } > max_parallel_hazard_test won't return true for PROPARALLEL_RESTRICTED. max_parallel_hazard_test only returns true when "context.max_interesting" is found, and that is set to PROPARALLEL_UNSAFE in max_parallel_hazard_for_modify(). > -> Should we switch to non-parallel mode in this case, instead of throwing an error? > + val = SysCacheGetAttr(CONSTROID, tup, > + Anum_pg_constraint_conbin, &isnull); > + if (isnull) > + elog(ERROR, "null conbin for constraint %u", con->oid); > + conbin = TextDatumGetCString(val); > I didn't invent that error check; it's found in several other places in the Postgres code (that error should only ever occur if the database has been corrupted or intentionally invalidated). Having said that, I agree that perhaps it's best to switch to non-parallel mode in this case, but this wouldn't stop it erroring out when the plan is actually run. > -> We could include a few tests for this in regression. > Looking at adding relevant test cases. > -> We might need some documentation update like in parallel-query.html/parallel-plans.html, etc > Looking at doc updates. Regards, Greg Nancarrow Fujitsu Australia
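For readers following along, a condensed sketch of how this behaves, modelled on the existing max_parallel_hazard_test() logic in src/backend/optimizer/util/clauses.c (error handling and the default case are omitted here):

static bool
max_parallel_hazard_test(char proparallel, max_parallel_hazard_context *context)
{
    switch (proparallel)
    {
        case PROPARALLEL_SAFE:
            /* nothing to record */
            break;
        case PROPARALLEL_RESTRICTED:
            context->max_hazard = proparallel;
            /* stop early only if RESTRICTED is the max level of interest */
            if (context->max_interesting == proparallel)
                return true;
            break;
        case PROPARALLEL_UNSAFE:
            context->max_hazard = proparallel;
            /* always done at the first unsafe construct */
            return true;
    }
    return false;
}

With max_interesting set to PROPARALLEL_UNSAFE (as for parallel INSERT), a RESTRICTED hazard is recorded but does not return true, which is why the break in the trigger loop is only taken for UNSAFE trigger functions.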
On Wed, Nov 4, 2020 at 6:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Tue, Nov 3, 2020 at 5:25 PM vignesh C <vignesh21@gmail.com> wrote: > > > > -> commandType is not used, we can remove it. > > + * Prepare for entering parallel mode by assigning a FullTransactionId, to be > > + * included in the transaction state that is serialized in the parallel DSM. > > + */ > > +void PrepareParallelModeForModify(CmdType commandType) > > +{ > > + Assert(!IsInParallelMode()); > > + > > + (void)GetCurrentTransactionId(); > > +} > > Thanks, at least for INSERT, it's not needed, so I'll remove it. > Or you might want to consider moving the check related to IsModifySupportedInParallelMode() inside PrepareParallelModeForModify(). That way the code might look a bit cleaner. -- With Regards, Amit Kapila.
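A sketch of the shape that refactoring might take (hypothetical; the names follow the patch excerpts quoted earlier in the thread):

void
PrepareParallelModeForModify(CmdType commandType)
{
    Assert(!IsInParallelMode());

    /* Nothing to prepare for command types not supported in parallel mode. */
    if (!IsModifySupportedInParallelMode(commandType))
        return;

    /*
     * Assign a TransactionId up front, to be included in the transaction
     * state that is serialized in the parallel DSM.
     */
    (void) GetCurrentTransactionId();
}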
On Wed, Nov 4, 2020 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Or you might want to consider moving the check related to > IsModifySupportedInParallelMode() inside > PrepareParallelModeForModify(). That way the code might look a bit > cleaner. > Posting an updated Parallel SELECT for "INSERT INTO ... SELECT ..." patch which addresses previously-identified issues and suggestions, and adds some tests and doc updates. I won't post an updated Parallel INSERT patch just yet (which just builds on the 1st patch), because there's at least a couple of issues in this 1st patch which need to be discussed first. Firstly, in order to perform parallel-safety checks in the case of partitions, the patch currently recursively locks/unlocks (AccessShareLock) each partition during such checks (as each partition may itself be a partitioned table). Is there a better way of performing the parallel-safety checks and reducing the locking requirements? Secondly, I found that when running "make check-world", the "partition-concurrent-attach" test fails, because it is expecting a partition constraint to be violated on insert, while an "alter table attach partition ..." is concurrently being executed in another transaction. Because of the partition locking done by the patch's parallel-safety checking code, the insert blocks on the exclusive lock held by the "alter table" in the other transaction until the transaction ends, so the insert ends up successfully completing (and thus fails the test) when the other transaction ends. To overcome this test failure, the patch code was updated to instead perform a conditional lock on the partition, and on failure (i.e. because of an exclusive lock held somewhere else), just assume it's parallel-unsafe because the parallel-safety can't be determined without blocking on the lock. This is not ideal, but I'm not sure of what other approach could be used and I am somewhat reluctant to change that test. If anybody is familiar with the "partition-concurrent-attach" test, any ideas or insights would be appreciated. Regards, Greg Nancarrow Fujitsu Australia
Attachment
On Fri, Nov 13, 2020 at 8:14 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Wed, Nov 4, 2020 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Or you might want to consider moving the check related to > > IsModifySupportedInParallelMode() inside > > PrepareParallelModeForModify(). That way the code might look a bit > > cleaner. > > > > Posting an updated Parallel SELECT for "INSERT INTO ... SELECT ..." > patch which addresses previously-identified issues and suggestions, > and adds some tests and doc updates. > I won't post an updated Parallel INSERT patch just yet (which just > builds on the 1st patch), because there's at least a couple of issues > in this 1st patch which need to be discussed first. > > Firstly, in order to perform parallel-safety checks in the case of > partitions, the patch currently recursively locks/unlocks > (AccessShareLock) each partition during such checks (as each partition > may itself be a partitioned table). Is there a better way of > performing the parallel-safety checks and reducing the locking > requirements? > > Secondly, I found that when running "make check-world", the > "partition-concurrent-attach" test fails, because it is expecting a > partition constraint to be violated on insert, while an "alter table > attach partition ..." is concurrently being executed in another > transaction. Because of the partition locking done by the patch's > parallel-safety checking code, the insert blocks on the exclusive lock > held by the "alter table" in the other transaction until the > transaction ends, so the insert ends up successfully completing (and > thus fails the test) when the other transaction ends. To overcome this > test failure, the patch code was updated to instead perform a > conditional lock on the partition, and on failure (i.e. because of an > exclusive lock held somewhere else), just assume it's parallel-unsafe > because the parallel-safety can't be determined without blocking on > the lock. This is not ideal, but I'm not sure of what other approach > could be used and I am somewhat reluctant to change that test. If > anybody is familiar with the "partition-concurrent-attach" test, any > ideas or insights would be appreciated. > Posting an updated set of patches, with some additional testing and documentation updates, and including the latest version of the Parallel Insert patch. Any feedback appreciated, especially on the two points mentioned in the previous post. Regards, Greg Nancarrow Fujitsu Australia
Attachment
On Fri, Nov 20, 2020 at 7:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > Posting an updated set of patches, with some additional testing and > documentation updates, and including the latest version of the > Parallel Insert patch. > Any feedback appreciated, especially on the two points mentioned in > the previous post. > Posting an updated set of patches, since a minor bug was found in the 1st patch that was causing a postgresql-cfbot build failure. Regards, Greg Nancarrow Fujitsu Australia
Attachment
On Mon, Dec 7, 2020 at 2:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Nov 20, 2020 at 7:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > Posting an updated set of patches, with some additional testing and > > documentation updates, and including the latest version of the > > Parallel Insert patch. > > Any feedback appreciated, especially on the two points mentioned in > > the previous post. > > > > Posting an updated set of patches, since a minor bug was found in the > 1st patch that was causing a postgresql-cfbot build failure. > Most of the code present in v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is applicable for parallel copy patch also. The patch in this thread handles the check for PROPARALLEL_UNSAFE, we could slightly make it generic by handling like the comments below, that way this parallel safety checks can be used based on the value set in max_parallel_hazard_context. There is nothing wrong with the changes, I'm providing these comments so that this patch can be generalized for parallel checks and the same can also be used by parallel copy. A few comments: 1) + trigtype = RI_FKey_trigger_type(trigger->tgfoid); + if (trigtype == RI_TRIGGER_FK) + { + context->max_hazard = PROPARALLEL_RESTRICTED; + + /* + * As we're looking for the max parallel hazard, we don't break + * here; examine any further triggers ... + */ + } Can we change this to something like: trigtype = RI_FKey_trigger_type(trigger->tgfoid); if (trigtype == RI_TRIGGER_FK) { if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) break; } The below line is not required, as it will be taken care of by max_parallel_hazard_test: context->max_hazard = PROPARALLEL_RESTRICTED; 2) + /* Recursively check each partition ... */ + pdesc = RelationGetPartitionDesc(rel); + for (i = 0; i < pdesc->nparts; i++) + { + if (rel_max_parallel_hazard_for_modify(pdesc->oids[i], + command_type, + context, + AccessShareLock) == PROPARALLEL_UNSAFE) + { + table_close(rel, lockmode); + return context->max_hazard; + } + } Can we change this to something like: /* Recursively check each partition ... */ pdesc = RelationGetPartitionDesc(rel); for (i = 0; i < pdesc->nparts; i++) { char max_hazard = rel_max_parallel_hazard_for_modify(pdesc->oids[i], command_type, context, AccessShareLock); if (max_parallel_hazard_test(max_hazard, context)) { table_close(rel, lockmode); return context->max_hazard; } } 3) Similarly for the below: + /* + * If there are any index expressions, check that they are parallel-mode + * safe. + */ + if (index_expr_max_parallel_hazard_for_modify(rel, context) == PROPARALLEL_UNSAFE) + { + table_close(rel, lockmode); + return context->max_hazard; + } + + /* + * If any triggers exist, check that they are parallel safe. + */ + if (rel->trigdesc != NULL && + trigger_max_parallel_hazard_for_modify(rel->trigdesc, context) == PROPARALLEL_UNSAFE) + { + table_close(rel, lockmode); + return context->max_hazard; + } 4) Similar change required for the below: + /* + * If the column is of a DOMAIN type, determine whether that + * domain has any CHECK expressions that are not parallel-mode + * safe. + */ + if (get_typtype(att->atttypid) == TYPTYPE_DOMAIN) + { + if (domain_max_parallel_hazard_for_modify(att->atttypid, context) == PROPARALLEL_UNSAFE) + { + table_close(rel, lockmode); + return context->max_hazard; + } + } Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote: > > Most of the code present in > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is > applicable for parallel copy patch also. The patch in this thread > handles the check for PROPARALLEL_UNSAFE, we could slightly make it > generic by handling like the comments below, that way this parallel > safety checks can be used based on the value set in > max_parallel_hazard_context. There is nothing wrong with the changes, > I'm providing these comments so that this patch can be generalized for > parallel checks and the same can also be used by parallel copy. Hi Vignesh, You are absolutely right in pointing that out, the code was taking short-cuts knowing that for Parallel Insert, "max_parallel_hazard_context.max_interesting" had been set to PROPARALLEL_UNSAFE, which doesn't allow that code to be generically re-used by other callers. I've attached a new set of patches that includes your suggested improvements. Regards, Greg Nancarrow Fujitsu Australia
Attachment
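To illustrate the generalization being discussed, the context setup involved might look like this (a sketch; the field names follow the max_parallel_hazard_context usage in the excerpts above):

max_parallel_hazard_context context;

context.max_hazard = PROPARALLEL_SAFE;

/*
 * Parallel INSERT only needs to stop at UNSAFE constructs, but another
 * caller (e.g. parallel COPY) could set a lower threshold such as
 * PROPARALLEL_RESTRICTED and reuse the same checks unchanged.
 */
context.max_interesting = PROPARALLEL_UNSAFE;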
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote: > > > > Most of the code present in > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is > > applicable for parallel copy patch also. The patch in this thread > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it > > generic by handling like the comments below, that way this parallel > > safety checks can be used based on the value set in > > max_parallel_hazard_context. There is nothing wrong with the changes, > > I'm providing these comments so that this patch can be generalized for > > parallel checks and the same can also be used by parallel copy. > > Hi Vignesh, > > You are absolutely right in pointing that out, the code was taking > short-cuts knowing that for Parallel Insert, > "max_parallel_hazard_context.max_interesting" had been set to > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically > re-used by other callers. > > I've attached a new set of patches that includes your suggested improvements. I was going through v10-0001 patch where we are parallelizing only the select part. + /* + * UPDATE is not currently supported in parallel-mode, so prohibit + * INSERT...ON CONFLICT...DO UPDATE... + */ + if (parse->onConflict != NULL && parse->onConflict->action == ONCONFLICT_UPDATE) + return PROPARALLEL_UNSAFE; I understand that we can now allow updates from the worker, but what is the problem if we allow the parallel select even if there is an update in the leader? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote: > > > > > > Most of the code present in > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is > > > applicable for parallel copy patch also. The patch in this thread > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it > > > generic by handling like the comments below, that way this parallel > > > safety checks can be used based on the value set in > > > max_parallel_hazard_context. There is nothing wrong with the changes, > > > I'm providing these comments so that this patch can be generalized for > > > parallel checks and the same can also be used by parallel copy. > > > > Hi Vignesh, > > > > You are absolutely right in pointing that out, the code was taking > > short-cuts knowing that for Parallel Insert, > > "max_parallel_hazard_context.max_interesting" had been set to > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically > > re-used by other callers. > > > > I've attached a new set of patches that includes your suggested improvements. > > I was going through v10-0001 patch where we are parallelizing only the > select part. > > + /* > + * UPDATE is not currently supported in parallel-mode, so prohibit > + * INSERT...ON CONFLICT...DO UPDATE... > + */ > + if (parse->onConflict != NULL && parse->onConflict->action == > ONCONFLICT_UPDATE) > + return PROPARALLEL_UNSAFE; > > I understand that we can now allow updates from the worker, but what > is the problem if we allow the parallel select even if there is an > update in the leader? > I think we can't allow update even in leader without having a mechanism for a shared combocid table. Right now, we share the ComboCids at the beginning of the parallel query and then never change it during the parallel query but if we allow updates in the leader backend which can generate a combocid then we need a mechanism to propagate that change. Does this make sense? -- With Regards, Amit Kapila.
On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote: > > > > > > > > Most of the code present in > > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is > > > > applicable for parallel copy patch also. The patch in this thread > > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it > > > > generic by handling like the comments below, that way this parallel > > > > safety checks can be used based on the value set in > > > > max_parallel_hazard_context. There is nothing wrong with the changes, > > > > I'm providing these comments so that this patch can be generalized for > > > > parallel checks and the same can also be used by parallel copy. > > > > > > Hi Vignesh, > > > > > > You are absolutely right in pointing that out, the code was taking > > > short-cuts knowing that for Parallel Insert, > > > "max_parallel_hazard_context.max_interesting" had been set to > > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically > > > re-used by other callers. > > > > > > I've attached a new set of patches that includes your suggested improvements. > > > > I was going through v10-0001 patch where we are parallelizing only the > > select part. > > > > + /* > > + * UPDATE is not currently supported in parallel-mode, so prohibit > > + * INSERT...ON CONFLICT...DO UPDATE... > > + */ > > + if (parse->onConflict != NULL && parse->onConflict->action == > > ONCONFLICT_UPDATE) > > + return PROPARALLEL_UNSAFE; > > > > I understand that we can now allow updates from the worker, but what > > is the problem if we allow the parallel select even if there is an > > update in the leader? > > > > I think we can't allow update even in leader without having a > > mechanism for a shared combocid table. Right now, we share the > > ComboCids at the beginning of the parallel query and then never change > > it during the parallel query but if we allow updates in the leader > > backend which can generate a combocid then we need a mechanism to > > propagate that change. Does this make sense? > > Okay, got it. Basically, ONCONFLICT_UPDATE might run inside some transaction block and there is a possibility that update may try to update the same tuple that was previously inserted by the same transaction and in that case, it will generate the combo cid. Thanks for clarifying. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote: > > > > Most of the code present in > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is > > applicable for parallel copy patch also. The patch in this thread > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it > > generic by handling like the comments below, that way this parallel > > safety checks can be used based on the value set in > > max_parallel_hazard_context. There is nothing wrong with the changes, > > I'm providing these comments so that this patch can be generalized for > > parallel checks and the same can also be used by parallel copy. > > Hi Vignesh, > > You are absolutely right in pointing that out, the code was taking > short-cuts knowing that for Parallel Insert, > "max_parallel_hazard_context.max_interesting" had been set to > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically > re-used by other callers. > In v10-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO, --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2049,10 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, * inserts in general except for the cases where inserts generate a new * CommandId (eg. inserts into a table having a foreign key column). */ - if (IsParallelWorker()) - ereport(ERROR, - (errcode(ERRCODE_INVALID_TRANSACTION_STATE), - errmsg("cannot insert tuples in a parallel worker"))); I think I have given a comment long back that here we can have an Assert to check if it is a parallel-worker and relation has a foreign-key and probably other conditions if possible. It is better to protect such cases from happening due to any bugs. Is there a reason you have not handled it? [1] - https://www.postgresql.org/message-id/CAA4eK1KyftVDgovvRQmdV1b%3DnN0R-KqdWZqiu7jZ1GYQ7SO9OA%40mail.gmail.com -- With Regards, Amit Kapila.
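One possible form of the assertion being requested (a sketch only; rel_generates_new_cid_on_insert() is a hypothetical helper, not an existing API):

/*
 * Sketch only: a parallel worker must never insert into a relation whose
 * inserts would generate a new CommandId (e.g. a table with foreign-key
 * check triggers); the planner's parallel-safety checks should have
 * marked such cases parallel-unsafe already, so assert it here.
 * rel_generates_new_cid_on_insert() is a hypothetical helper.
 */
Assert(!(IsParallelWorker() && rel_generates_new_cid_on_insert(relation)));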
On Wed, Dec 9, 2020 at 4:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > > > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote: > > > > > > > > > > Most of the code present in > > > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is > > > > > applicable for parallel copy patch also. The patch in this thread > > > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it > > > > > generic by handling like the comments below, that way this parallel > > > > > safety checks can be used based on the value set in > > > > > max_parallel_hazard_context. There is nothing wrong with the changes, > > > > > I'm providing these comments so that this patch can be generalized for > > > > > parallel checks and the same can also be used by parallel copy. > > > > > > > > Hi Vignesh, > > > > > > > > You are absolutely right in pointing that out, the code was taking > > > > short-cuts knowing that for Parallel Insert, > > > > "max_parallel_hazard_context.max_interesting" had been set to > > > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically > > > > re-used by other callers. > > > > > > > > I've attached a new set of patches that includes your suggested improvements. > > > > > > I was going through v10-0001 patch where we are parallelizing only the > > > select part. > > > > > > + /* > > > + * UPDATE is not currently supported in parallel-mode, so prohibit > > > + * INSERT...ON CONFLICT...DO UPDATE... > > > + */ > > > + if (parse->onConflict != NULL && parse->onConflict->action == > > > ONCONFLICT_UPDATE) > > > + return PROPARALLEL_UNSAFE; > > > > > > I understand that we can now allow updates from the worker, but what > > > is the problem if we allow the parallel select even if there is an > > > update in the leader? > > > > > > > I think we can't allow update even in leader without having a > > mechanism for a shared combocid table. Right now, we share the > > ComboCids at the beginning of the parallel query and then never change > > it during the parallel query but if we allow updates in the leader > > backend which can generate a combocid then we need a mechanism to > > propagate that change. Does this make sense? > > > > Okay, got it. Basically, ONCONFLICT_UPDATE might run inside some > transaction block and there is a possibility that update may try to > update the same tuple is previously inserted by the same transaction > and in that case, it will generate the combo cid. Thanks for > clarifying. > We can probably add a comment in the patch so that it is clear why we are not allowing this case. -- With Regards, Amit Kapila.
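A possible shape for that comment, summarizing the combo-CID reasoning above (the wording is an assumption, not the committed text):

/*
 * UPDATE is not currently supported in parallel-mode, so prohibit
 * INSERT...ON CONFLICT...DO UPDATE. Even in the leader, the DO UPDATE
 * path may update a tuple inserted earlier in the same transaction,
 * generating a combo CID; combo CIDs are shared with workers only once,
 * at the start of the parallel operation, so such a change could not
 * be propagated.
 */
if (parse->onConflict != NULL &&
    parse->onConflict->action == ONCONFLICT_UPDATE)
    return PROPARALLEL_UNSAFE;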
On Wed, 9 Dec 2020 at 5:41 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 9, 2020 at 4:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > >
> > > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> > > > >
> > > > > Most of the code present in
> > > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > > > applicable for parallel copy patch also. The patch in this thread
> > > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > > > generic by handling like the comments below, that way this parallel
> > > > > safety checks can be used based on the value set in
> > > > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > > > I'm providing these comments so that this patch can be generalized for
> > > > > parallel checks and the same can also be used by parallel copy.
> > > >
> > > > Hi Vignesh,
> > > >
> > > > You are absolutely right in pointing that out, the code was taking
> > > > short-cuts knowing that for Parallel Insert,
> > > > "max_parallel_hazard_context.max_interesting" had been set to
> > > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > > > re-used by other callers.
> > > >
> > > > I've attached a new set of patches that includes your suggested improvements.
> > >
> > > I was going through v10-0001 patch where we are parallelizing only the
> > > select part.
> > >
> > > + /*
> > > + * UPDATE is not currently supported in parallel-mode, so prohibit
> > > + * INSERT...ON CONFLICT...DO UPDATE...
> > > + */
> > > + if (parse->onConflict != NULL && parse->onConflict->action ==
> > > ONCONFLICT_UPDATE)
> > > + return PROPARALLEL_UNSAFE;
> > >
> > > I understand that we can now allow updates from the worker, but what
> > > is the problem if we allow the parallel select even if there is an
> > > update in the leader?
> > >
> >
> > I think we can't allow update even in leader without having a
> > mechanism for a shared combocid table. Right now, we share the
> > ComboCids at the beginning of the parallel query and then never change
> > it during the parallel query but if we allow updates in the leader
> > backend which can generate a combocid then we need a mechanism to
> > propagate that change. Does this make sense?
> >
>
> Okay, got it. Basically, ONCONFLICT_UPDATE might run inside some
> transaction block and there is a possibility that update may try to
> update the same tuple that was previously inserted by the same transaction
> and in that case, it will generate the combo cid. Thanks for
> clarifying.
>
We can probably add a comment in the patch so that it is clear why we
are not allowing this case.
+1
From: Greg Nancarrow <gregn4422@gmail.com> > Firstly, in order to perform parallel-safety checks in the case of partitions, the > patch currently recursively locks/unlocks > (AccessShareLock) each partition during such checks (as each partition may > itself be a partitioned table). Is there a better way of performing the > parallel-safety checks and reducing the locking requirements? First of all, as you demonstrated the planning time and execution time of parallel insert, I think the increased planning time is negligible when the parallel insert is intentionally used for loading a large amount of data. However, it's a problem if the overhead is imposed on OLTP transactions. Does the overhead occur with the default values of max_parallel_workers_per_gather = 2 and max_parallel_workers = 8? To avoid this heavy checking during planning, I'm wondering if we can have an attribute in pg_class, something like relhasindexes and relhastriggers. The concerning point is that we have to maintain the accuracy of the value when dropping ancillary objects around the table/partition. > Secondly, I found that when running "make check-world", the > "partition-concurrent-attach" test fails, because it is expecting a partition > constraint to be violated on insert, while an "alter table attach partition ..." is > concurrently being executed in another transaction. Because of the partition > locking done by the patch's parallel-safety checking code, the insert blocks on > the exclusive lock held by the "alter table" in the other transaction until the > transaction ends, so the insert ends up successfully completing (and thus fails > the test) when the other transaction ends. To overcome this test failure, the > patch code was updated to instead perform a conditional lock on the partition, > and on failure (i.e. because of an exclusive lock held somewhere else), just > assume it's parallel-unsafe because the parallel-safety can't be determined > without blocking on the lock. This is not ideal, but I'm not sure of what other > approach could be used and I am somewhat reluctant to change that test. If > anybody is familiar with the "partition-concurrent-attach" test, any ideas or > insights would be appreciated. That test looks sane. I think what we should do is to disable parallel operation during that test. It looks like some of the other existing test cases disable parallel query by setting max_parallel_workers_per_gather to 0. It's not strange that some tests fail with some configuration. autovacuum is disabled in many places of the regression test. Rather, I don't think we should introduce the trick to use ConditionalLockAcquire(). Otherwise, the insert would be executed in a serial fashion without the user knowing it -- "What? The insert suddenly slowed down multiple times today, and it didn't finish within the planned maintenance window. What's wrong?" Regards Takayuki Tsunakawa
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote: > > > > Most of the code present in > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is > > applicable for parallel copy patch also. The patch in this thread > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it > > generic by handling like the comments below, that way this parallel > > safety checks can be used based on the value set in > > max_parallel_hazard_context. There is nothing wrong with the changes, > > I'm providing these comments so that this patch can be generalized for > > parallel checks and the same can also be used by parallel copy. > > Hi Vignesh, > > You are absolutely right in pointing that out, the code was taking > short-cuts knowing that for Parallel Insert, > "max_parallel_hazard_context.max_interesting" had been set to > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically > re-used by other callers. > > I've attached a new set of patches that includes your suggested improvements. > Thanks for fixing and posting a new patch. Few comments: + Node *index_expr; + + if (index_expr_item == NULL) /* shouldn't happen */ + elog(ERROR, "too few entries in indexprs list"); + + index_expr = (Node *) lfirst(index_expr_item); We can change this elog to below to maintain consistency: if (index_expr_item == NULL) /* shouldn't happen */ { context->max_hazard = PROPARALLEL_UNSAFE; return context->max_hazard; } static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, CommandId cid, int options) { /* * To allow parallel inserts, we need to ensure that they are safe to be * performed in workers. We have the infrastructure to allow parallel * inserts in general except for the cases where inserts generate a new * CommandId (eg. inserts into a table having a foreign key column). */ I felt we could remove the above comments or maybe rephrase it. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 10, 2020 at 3:50 PM vignesh C <vignesh21@gmail.com> wrote: > Few comments: > + Node *index_expr; > + > + if (index_expr_item == NULL) > /* shouldn't happen */ > + elog(ERROR, "too few > entries in indexprs list"); > + > + index_expr = (Node *) > lfirst(index_expr_item); > > We can change this elog to below to maintain consistency: > if (index_expr_item == NULL) /* shouldn't happen */ > { > context->max_hazard = PROPARALLEL_UNSAFE; > return context->max_hazard; > } > Thanks. I think you pointed out something similar to this before, but somehow I must have missed updating this as well (I was just following existing error handling for this case in the Postgres code). I'll update it as you suggest, in the next version of the patch I post. > static HeapTuple > heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, > CommandId cid, int options) > { > /* > * To allow parallel inserts, we need to ensure that they are safe to be > * performed in workers. We have the infrastructure to allow parallel > * inserts in general except for the cases where inserts generate a new > * CommandId (eg. inserts into a table having a foreign key column). > */ > I felt we could remove the above comments or maybe rephrase it. > That is Amit's comment, and I'm reluctant to change it because it is still applicable even after application of this patch. Amit has previously suggested that I add an Assert here, to match the comment (to replace the original Parallel-worker error-check that I removed), so I am looking into that. Regards, Greg Nancarrow Fujitsu Australia
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote: > > > > Most of the code present in > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is > > applicable for parallel copy patch also. The patch in this thread > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it > > generic by handling like the comments below, that way this parallel > > safety checks can be used based on the value set in > > max_parallel_hazard_context. There is nothing wrong with the changes, > > I'm providing these comments so that this patch can be generalized for > > parallel checks and the same can also be used by parallel copy. > > Hi Vignesh, > > You are absolutely right in pointing that out, the code was taking > short-cuts knowing that for Parallel Insert, > "max_parallel_hazard_context.max_interesting" had been set to > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically > re-used by other callers. > > I've attached a new set of patches that includes your suggested improvements. > /* + * PrepareParallelMode + * + * Prepare for entering parallel mode, based on command-type. + */ +void +PrepareParallelMode(CmdType commandType) +{ + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF); + + if (IsModifySupportedInParallelMode(commandType)) + { + /* + * Prepare for entering parallel mode by assigning a + * FullTransactionId, to be included in the transaction state that is + * serialized in the parallel DSM. + */ + (void) GetCurrentTransactionId(); + } +} Why do we need to serialize the transaction ID for 0001? I mean in 0001 we are just allowing the SELECT to be executed in parallel so why we would need the transaction Id for that. I agree that we would need this once we try to perform the Insert also from the worker in the remaining patches. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 10, 2020 at 1:23 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Greg Nancarrow <gregn4422@gmail.com> > > Firstly, in order to perform parallel-safety checks in the case of partitions, the > > patch currently recursively locks/unlocks > > (AccessShareLock) each partition during such checks (as each partition may > > itself be a partitioned table). Is there a better way of performing the > > parallel-safety checks and reducing the locking requirements? > > First of all, as you demonstrated the planning time and execution time of parallel insert, I think the increased planning time is negligible when the parallel insert is intentionally used for loading a large amount of data. However, it's a problem if the overhead is imposed on OLTP transactions. Does the overhead occur with the default values of max_parallel_workers_per_gather = 2 and max_parallel_workers = 8? > > To avoid this heavy checking during planning, I'm wondering if we can have an attribute in pg_class, something like relhasindexes and relhastriggers. The concerning point is that we have to maintain the accuracy of the value when dropping ancillary objects around the table/partition. > Having information in another table that needs to be accessed is likely to also have locking requirements. Here the issue is specifically with partitions, because otherwise if the target relation is not a partitioned table, it will already be locked prior to planning as part of the parse/re-write phase (and you will notice that the initial lock-mode, used by the parallel-safety checking code for opening the table, is NoLock). > > > Secondly, I found that when running "make check-world", the > > "partition-concurrent-attach" test fails, because it is expecting a partition > > constraint to be violated on insert, while an "alter table attach partition ..." is > > concurrently being executed in another transaction. Because of the partition > > locking done by the patch's parallel-safety checking code, the insert blocks on > > the exclusive lock held by the "alter table" in the other transaction until the > > transaction ends, so the insert ends up successfully completing (and thus fails > > the test) when the other transaction ends. To overcome this test failure, the > > patch code was updated to instead perform a conditional lock on the partition, > > and on failure (i.e. because of an exclusive lock held somewhere else), just > > assume it's parallel-unsafe because the parallel-safety can't be determined > > without blocking on the lock. This is not ideal, but I'm not sure of what other > > approach could be used and I am somewhat reluctant to change that test. If > > anybody is familiar with the "partition-concurrent-attach" test, any ideas or > > insights would be appreciated. > > That test looks sane. I think what we should do is to disable parallel operation during that test. It looks like some of the other existing test cases disable parallel query by setting max_parallel_workers_per_gather to 0. It's not strange that some tests fail with some configuration. autovacuum is disabled in many places of the regression test. > > Rather, I don't think we should introduce the trick to use ConditionalLockAcquire(). Otherwise, the insert would be executed in a serial fashion without the user knowing it -- "What? The insert suddenly slowed down multiple times today, and it didn't finish within the planned maintenance window. What's wrong?" > > I think that's probably the best idea, to disable parallel operation during that test. 
However, that doesn't change the fact that, after removal of that "trick", the partition locking used in the parallel-safety checking code will block if a concurrent transaction has exclusively locked that partition (as in this test case), and thus there is no guarantee that a parallel insert will execute faster compared to serial execution (as such locks tend to be held until the end of the transaction). Regards, Greg Nancarrow Fujitsu Australia
On Thu, Dec 10, 2020 at 5:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > /* > + * PrepareParallelMode > + * > + * Prepare for entering parallel mode, based on command-type. > + */ > +void > +PrepareParallelMode(CmdType commandType) > +{ > + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF); > + > + if (IsModifySupportedInParallelMode(commandType)) > + { > + /* > + * Prepare for entering parallel mode by assigning a > + * FullTransactionId, to be included in the transaction state that is > + * serialized in the parallel DSM. > + */ > + (void) GetCurrentTransactionId(); > + } > +} > > Why do we need to serialize the transaction ID for 0001? I mean in > 0001 we are just allowing the SELECT to be executed in parallel so why > we would need the transaction Id for that. I agree that we would need > this once we try to perform the Insert also from the worker in the > remaining patches. > There's a very good reason. It's related to parallel-mode checks for Insert and how the XID is lazily acquired if required. When allowing SELECT to be executed in parallel, we're in parallel-mode and the leader interleaves Inserts with retrieval of the tuple data from the workers. You will notice that heap_insert() calls GetCurrentTransactionId() as the very first thing it does. If the FullTransactionId is not valid, AssignTransactionId() is then called, which then executes this code: /* * Workers synchronize transaction state at the beginning of each parallel * operation, so we can't account for new XIDs at this point. */ if (IsInParallelMode() || IsParallelWorker()) elog(ERROR, "cannot assign XIDs during a parallel operation"); So that code (currently) has no way of knowing that an XID is being (lazily) assigned at the beginning, or somewhere in the middle of, a parallel operation. This is the reason why PrepareParallelMode() is calling GetCurrentTransactionId() up-front, to ensure a FullTransactionId is assigned up-front, prior to parallel-mode (so then there won't be an attempted XID assignment). If you remove the GetCurrentTransactionId() call from PrepareParallelMode() and run "make installcheck-world" with "force_parallel_mode=regress" in effect, many tests will fail with: ERROR: cannot assign XIDs during a parallel operation Regards, Greg Nancarrow Fujitsu Australia
On Thu, Dec 10, 2020 at 1:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 5:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > /*
> > + * PrepareParallelMode
> > + *
> > + * Prepare for entering parallel mode, based on command-type.
> > + */
> > +void
> > +PrepareParallelMode(CmdType commandType)
> > +{
> > +	Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
> > +
> > +	if (IsModifySupportedInParallelMode(commandType))
> > +	{
> > +		/*
> > +		 * Prepare for entering parallel mode by assigning a
> > +		 * FullTransactionId, to be included in the transaction state that is
> > +		 * serialized in the parallel DSM.
> > +		 */
> > +		(void) GetCurrentTransactionId();
> > +	}
> > +}
> >
> > Why do we need to serialize the transaction ID for 0001? I mean in
> > 0001 we are just allowing the SELECT to be executed in parallel, so why
> > would we need the transaction ID for that? I agree that we would need
> > this once we try to perform the Insert also from the worker in the
> > remaining patches.
> >
>
> There's a very good reason. It's related to parallel-mode checks for
> Insert and how the XID is lazily acquired if required.
> When allowing SELECT to be executed in parallel, we're in
> parallel-mode and the leader interleaves Inserts with retrieval of the
> tuple data from the workers.
> You will notice that heap_insert() calls GetCurrentTransactionId() as the
> very first thing it does. If the FullTransactionId is not valid,
> AssignTransactionId() is then called, which then executes this code:
>
> /*
>  * Workers synchronize transaction state at the beginning of each parallel
>  * operation, so we can't account for new XIDs at this point.
>  */
> if (IsInParallelMode() || IsParallelWorker())
>     elog(ERROR, "cannot assign XIDs during a parallel operation");
>
> So that code (currently) has no way of knowing that a XID is being
> (lazily) assigned at the beginning of, or somewhere in the middle of, a
> parallel operation.
> This is the reason why PrepareParallelMode() is calling
> GetCurrentTransactionId() up-front, to ensure a FullTransactionId is assigned
> up-front, prior to parallel-mode (so then there won't be an attempted
> XID assignment).
>
> If you remove the GetCurrentTransactionId() call from PrepareParallelMode()
> and run "make installcheck-world" with "force_parallel_mode=regress"
> in effect, many tests will fail with:
> ERROR: cannot assign XIDs during a parallel operation

Yeah, got it. I missed the point that the goal is to avoid assigning the
transaction ID while we are in parallel mode. But IIUC, at least for the first
patch, we don't want to serialize the XID in the transaction state, right? Because
the workers don't need the xid, as they are only doing the select. So maybe we can
adjust the comment slightly in the below code:

> > + * Prepare for entering parallel mode by assigning a
> > + * FullTransactionId, to be included in the transaction state that is
> > + * serialized in the parallel DSM.
> > + */
> > + (void) GetCurrentTransactionId();

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
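To make that suggestion concrete, one possible rewording of the comment (an
illustrative sketch only, not the wording that was eventually adopted) might be:

	if (IsModifySupportedInParallelMode(commandType))
	{
		/*
		 * Assign a FullTransactionId up-front, while it is still legal to
		 * do so: no XID can be assigned once parallel mode is entered.
		 * For parallel SELECT the workers never use this XID themselves;
		 * it exists so that the leader's interleaved inserts don't trip
		 * the no-XID-assignment check in AssignTransactionId().
		 */
		(void) GetCurrentTransactionId();
	}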
Posting an updated set of patches to address recent feedback:

- Removed conditional-locking code used in the parallel-safety checking code
(Tsunakawa-san feedback). It turns out that for the problem test case, no
parallel-safety checking should be occurring that locks relations, because those
inserts are specifying VALUES, not an underlying SELECT, so the parallel-safety
checking code was updated to bail out early if no underlying SELECT is specified
for the INSERT. No change to the test code was required.
- Added comment to better explain the reason for treating "INSERT ... ON CONFLICT
... DO UPDATE" as parallel-unsafe (Dilip)
- Added assertion to heap_prepare_insert() (Amit)
- Updated error handling for NULL index_expr_item case (Vignesh)

Regards,
Greg Nancarrow
Fujitsu Australia
Hi

> Posting an updated set of patches to address recent feedback:
>
> - Removed conditional-locking code used in parallel-safety checking code
> (Tsunakawa-san feedback). It turns out that for the problem test case, no
> parallel-safety checking should be occurring that locks relations because
> those inserts are specifying VALUES, not an underlying SELECT, so the
> parallel-safety checking code was updated to bail out early if no underlying
> SELECT is specified for the INSERT. No change to the test code was required.
> - Added comment to better explain the reason for treating "INSERT ...
> ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
> - Added assertion to heap_prepare_insert() (Amit)
> - Updated error handling for NULL index_expr_item case (Vignesh)

+
+	index_oid_list = RelationGetIndexList(rel);
...

As mentioned in the comments for RelationGetIndexList:
 * we return a copy of the list palloc'd in the caller's context. The caller
 * may list_free() the returned list after scanning it.

Shall we list_free(index_oid_list) at the end of the function? Just to avoid a
potential memory leak.

Best regards,
houzj
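A minimal sketch of the suggested cleanup (using the names from the quoted hunk;
the loop body is a placeholder for the patch's per-index checks):

	List	   *index_oid_list = RelationGetIndexList(rel);
	ListCell   *lc;

	foreach(lc, index_oid_list)
	{
		Oid			index_oid = lfirst_oid(lc);

		/* ... perform the per-index parallel-safety checks ... */
	}

	/* RelationGetIndexList() returned a palloc'd copy, so free it */
	list_free(index_oid_list);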
Hi

I have an issue with the parallel-safety checks.

If the target table is a foreign table or temporary table,
rel_max_parallel_hazard_for_modify will return PROPARALLEL_UNSAFE, which not only
disables parallel insert but also disables the underlying parallel SELECT.

+create temporary table temp_names (like names);
+explain (costs off) insert into temp_names select * from names;
+       QUERY PLAN
+-------------------------
+ Insert on temp_names
+   ->  Seq Scan on names
+(2 rows)

I may be wrong, and if I missed something in previous mails, please give me some
hints. IMO, serial insertion with an underlying parallel SELECT can be considered
for a foreign table or temporary table, as the insertions only happen in the leader
process.

Are there any special considerations for this case?

Best regards,
houzj
On Wed, Dec 23, 2020 at 7:15 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > Hi > > I have an issue about the parallel-safety checks. > > If target table is foreign table or temporary table, > rel_max_parallel_hazard_for_modify will return PROPARALLEL_UNSAFE, > which not only disable parallel insert but also disable underlying parallel SELECT. > > +create temporary table temp_names (like names); > +explain (costs off) insert into temp_names select * from names; > + QUERY PLAN > +------------------------- > + Insert on temp_names > + -> Seq Scan on names > +(2 rows) > > I may be wrong, and if I miss sth in previous mails, please give me some hints. > IMO, serial insertion with underlying parallel SELECT can be considered for foreign table or temporary table, > as the insertions only happened in the leader process. > I don't think we support parallel scan for temporary tables. Can you please try once both of these operations without Insert being involved? If you are able to produce a parallel plan without Insert then we can see why it is not supported with Insert. -- With Regards, Amit Kapila.
Hi

> > I may be wrong, and if I miss sth in previous mails, please give me some
> > hints. IMO, serial insertion with underlying parallel SELECT can be
> > considered for foreign table or temporary table, as the insertions only
> > happened in the leader process.
>
> I don't think we support parallel scan for temporary tables. Can you please
> try once both of these operations without Insert being involved? If you
> are able to produce a parallel plan without Insert then we can see why it
> is not supported with Insert.

Sorry, maybe I did not express it clearly. I actually meant the case where the
insert's target table (not one in the SELECT part) is temporary. And you are right
that parallel select is not enabled when a temporary table is in the SELECT part.

I tested the cases where the insert's target table is temporary and where it is not.

--insert into non-temporary table--
postgres=# explain (costs off) insert into notemp select * from test where i < 600;
              QUERY PLAN
---------------------------------------
 Gather
   Workers Planned: 4
   ->  Insert on notemp
         ->  Parallel Seq Scan on test
               Filter: (i < 600)

--insert into temporary table--
postgres=# explain (costs off) insert into temp select * from test where i < 600;
         QUERY PLAN
---------------------------
 Insert on temp
   ->  Seq Scan on test
         Filter: (i < 600)

--without insert part--
postgres=# explain (costs off) select * from test where i < 600;
           QUERY PLAN
---------------------------------
 Gather
   Workers Planned: 4
   ->  Parallel Seq Scan on test
         Filter: (i < 600)

Best regards,
houzj
On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > Hi > > > > I may be wrong, and if I miss sth in previous mails, please give me some > > hints. > > > IMO, serial insertion with underlying parallel SELECT can be > > > considered for foreign table or temporary table, as the insertions only > > happened in the leader process. > > > > > > > I don't think we support parallel scan for temporary tables. Can you please > > try once both of these operations without Insert being involved? If you > > are able to produce a parallel plan without Insert then we can see why it > > is not supported with Insert. > > Sorry, may be I did not express it clearly, I actually means the case when insert's target(not in select part) table istemporary. > And you are right that parallel select is not enabled when temporary table is in select part. > I think Select can be parallel for this case and we should support this case. -- With Regards, Amit Kapila.
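A rough sketch of the direction agreed here, assuming a hypothetical placement
inside rel_max_parallel_hazard_for_modify(); treating this as PROPARALLEL_RESTRICTED
is an assumption about how "insert in the leader only" might be expressed, not the
patch's actual code:

	/*
	 * A foreign or temporary target table cannot be inserted into by
	 * parallel workers, but that need not disqualify a parallel SELECT
	 * underneath: treat the insert as parallel-restricted (assumption)
	 * rather than parallel-unsafe, so it still runs in the leader.
	 */
	if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
		RelationUsesLocalBuffers(rel))
	{
		context->max_hazard = PROPARALLEL_RESTRICTED;
		return context->max_hazard;
	}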
Greg Nancarrow <gregn4422@gmail.com> wrote:

> Posting an updated set of patches to address recent feedback:

Following is my review.

v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
--------------------------------------------------------------

@@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
 		glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+		/*
+		 * Additional parallel-mode safety checks are required in order to
+		 * allow an underlying parallel query to be used for a
+		 * table-modification command that is supported in parallel-mode.
+		 */
+		if (glob->parallelModeOK &&
+			IsModifySupportedInParallelMode(parse->commandType))
+		{
+			glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
+			glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+		}

Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these functions
should not be called by a parallel worker.

@@ -1015,6 +1016,27 @@ IsInParallelMode(void)
 }

 /*
+ * PrepareParallelMode
+ *
+ * Prepare for entering parallel mode, based on command-type.
+ */
+void
+PrepareParallelMode(CmdType commandType)
+{
+	Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);

Isn't the test of force_parallel_mode just a hack to make regression tests pass?
When I removed this part and ran the regression tests with
force_parallel_mode=regress, the assertion fired when executing a subquery because
the executor was already in parallel mode due to the main query execution.

I think the function should be implemented such that it does not mind repeated
execution by the same backend.

As an alternative, have you considered allocation of the XID even in parallel mode?
I imagine that the first parallel worker that needs the XID for insertions
allocates it and shares it with the other workers as well as with the leader
process.

One problem of the current patch version is that the "INSERT INTO ... SELECT ..."
statement consumes an XID even if the SELECT eventually does not return any row.
However, if the same query is processed w/o parallelism, the XID is only allocated
if at least one tuple needs to be inserted.

v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
-------------------------------------------------------------------

@@ -1021,12 +1039,15 @@ IsInParallelMode(void)
  * Prepare for entering parallel mode, based on command-type.
  */
 void
-PrepareParallelMode(CmdType commandType)
+PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
 {
 	Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);

 	if (IsModifySupportedInParallelMode(commandType))
 	{
+		if (isParallelModifyLeader)
+			(void) GetCurrentCommandId(true);

I miss a comment here. I suppose this is to set currentCommandIdUsed, so that the
leader process gets a new commandId for the following statements in the same
transaction, and thus it can see the rows inserted by the parallel workers?

If my understanding is correct, I think that the leader should not participate in
the execution of the Insert node, else it would use a higher commandId than the
workers. That would be weird, although probably not data corruption. I wonder if
parallel_leader_participation should be considered false for the
"Gather -> Insert -> ..." plans.
@@ -144,9 +148,19 @@ ExecGather(PlanState *pstate)
 	GatherState *node = castNode(GatherState, pstate);
 	TupleTableSlot *slot;
 	ExprContext *econtext;
+	ModifyTableState *nodeModifyTableState = NULL;
+	bool		isParallelModifyLeader = false;
+	bool		isParallelModifyWithReturning = false;

The variable names are quite long. Since this code deals with the Gather node, I
think that both "Parallel" and "Leader" components can be removed.

@@ -418,14 +446,35 @@ ExecShutdownGatherWorkers(GatherState *node)
 void
 ExecShutdownGather(GatherState *node)
 {
-	ExecShutdownGatherWorkers(node);
+	bool		isParallelModifyLeader;

Likewise, the variable name.

@@ -208,7 +236,7 @@ ExecGather(PlanState *pstate)
 	}

 	/* Run plan locally if no workers or enabled and not single-copy. */
-	node->need_to_scan_locally = (node->nreaders == 0)
+	node->need_to_scan_locally = (node->nworkers_launched <= 0)
 		|| (!gather->single_copy && parallel_leader_participation);
 	node->initialized = true;
 }

Is this change needed? The code just before this test indicates that nreaders
should be equal to nworkers_launched.

In grouping_planner(), this branch

+	/* Consider a supported parallel table-modification command */
+	if (IsModifySupportedInParallelMode(parse->commandType) &&
+		!inheritance_update &&
+		final_rel->consider_parallel &&
+		parse->rowMarks == NIL)
+	{

is very similar to the creation of the non-parallel ModifyTablePaths - perhaps an
opportunity to move the common code into a new function.

@@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 	}

+	if (parallel_modify_partial_path_count > 0)
+	{
+		final_rel->rows = current_rel->rows;	/* ??? why hasn't this been
+												 * set above somewhere ???? */
+		generate_useful_gather_paths(root, final_rel, false);
+	}
+
 	extra.limit_needed = limit_needed(parse);
 	extra.limit_tuples = limit_tuples;
 	extra.count_est = count_est;

A boolean variable (e.g. have_parallel_modify_paths) would suffice; there's no need
to count the paths using parallel_modify_partial_path_count.

@@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
 	PlannerGlobal *glob = root->glob;
 	int			rtoffset = list_length(glob->finalrtable);
 	ListCell   *lc;
+	Plan	   *finalPlan;

 	/*
 	 * Add all the query's RTEs to the flattened rangetable. The live ones
@@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
 	}

 	/* Now fix the Plan tree */
-	return set_plan_refs(root, plan, rtoffset);
+	finalPlan = set_plan_refs(root, plan, rtoffset);
+	if (finalPlan != NULL && IsA(finalPlan, Gather))
+	{
+		Plan	   *subplan = outerPlan(finalPlan);
+
+		if (IsA(subplan, ModifyTable) &&
+			castNode(ModifyTable, subplan)->returningLists != NULL)
+		{
+			finalPlan->targetlist = copyObject(subplan->targetlist);
+		}
+	}
+	return finalPlan;
 }

I'm not sure if the problem of missing targetlist should be handled here (BTW, NIL
is the constant for an empty list, not NULL). Obviously this is a consequence of
the fact that the ModifyTable node has no regular targetlist.

Actually I don't quite understand why (in the current master branch) the targetlist
initialized in set_plan_refs()

	/*
	 * Set up the visible plan targetlist as being the same as
	 * the first RETURNING list. This is for the use of
	 * EXPLAIN; the executor won't pay any attention to the
	 * targetlist. We postpone this step until here so that
	 * we don't have to do set_returning_clause_references()
	 * twice on identical targetlists.
	 */
	splan->plan.targetlist = copyObject(linitial(newRL));

is not used.
Instead, ExecInitModifyTable() picks the first returning list again:

	/*
	 * Initialize result tuple slot and assign its rowtype using the first
	 * RETURNING list. We assume the rest will look the same.
	 */
	mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);

So if you set the targetlist in create_modifytable_plan() (according to
best_path->returningLists), or even in create_modifytable_path(), and ensure that
it gets propagated to the Gather node (generate_gather_paths currently uses
rel->reltarget), then you should no longer need to tweak setrefs.c. Moreover,
ExecInitModifyTable() would no longer need to set the targetlist.

However I don't guarantee that this is the best approach - some planner expert
should speak up.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
> v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
> --------------------------------------------------------------
>
> @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
>  		/* all the cheap tests pass, so scan the query tree */
>  		glob->maxParallelHazard = max_parallel_hazard(parse);
>  		glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +
> +		/*
> +		 * Additional parallel-mode safety checks are required in order to
> +		 * allow an underlying parallel query to be used for a
> +		 * table-modification command that is supported in parallel-mode.
> +		 */
> +		if (glob->parallelModeOK &&
> +			IsModifySupportedInParallelMode(parse->commandType))
> +		{
> +			glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
> +			glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +		}
>
> Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
> functions should not be called by parallel worker.
>

What in the above change indicates that parallel-restricted functions will be
allowed in parallel workers? This just sets parallelModeOK to allow parallel plans
for Selects if the Insert can be performed safely in a leader backend.

>
> @@ -1015,6 +1016,27 @@ IsInParallelMode(void)
>  }
>
>  /*
> + * PrepareParallelMode
> + *
> + * Prepare for entering parallel mode, based on command-type.
> + */
> +void
> +PrepareParallelMode(CmdType commandType)
> +{
> +	Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
>
> Isn't the test of force_parallel_mode just a hack to make regression tests
> pass? When I removed this part and ran the regression tests with
> force_parallel_mode=regress, the assertion fired when executing a subquery
> because the executor was already in parallel mode due to the main query
> execution.
>

I think this Assert is bogus. We are allowed to enter parallel-mode if we are
already in parallel-mode, see EnterParallelMode. But we shouldn't be allowed to
allocate an xid in parallel-mode. So the Assert(!IsInParallelMode()) should be
moved inside the check if (IsModifySupportedInParallelMode(commandType)) in this
function. Can you check if it still fails after such a modification?

> As an alternative, have you considered allocation of the XID even in parallel
> mode? I imagine that the first parallel worker that needs the XID for
> insertions allocates it and shares it with the other workers as well as with
> the leader process.
>

As a matter of this patch
(v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we never need to
allocate xids by workers because Insert is always performed by the leader backend.
Even if we want to do what you are suggesting, it would be tricky because
currently, we don't have such an infrastructure where we can pass information among
workers.

> One problem of the current patch version is that the "INSERT INTO ... SELECT
> ..." statement consumes an XID even if the SELECT eventually does not return
> any row. However, if the same query is processed w/o parallelism, the XID is
> only allocated if at least one tuple needs to be inserted.
>
Yeah, that is true, but I think this can happen w/o parallelism for updates and
deletes, where by the time we try to modify the row it has been modified by a
concurrent session, and the first session will needlessly allocate an XID.

--
With Regards,
Amit Kapila.
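Putting Amit's suggestion together, the rearranged function would look roughly like
this (a sketch, not a tested patch):

void
PrepareParallelMode(CmdType commandType)
{
	if (IsModifySupportedInParallelMode(commandType))
	{
		/*
		 * Re-entering parallel mode is legal in general, but an XID must
		 * not be assigned once parallel mode has been entered, so check
		 * only in the branch that is about to assign one.
		 */
		Assert(!IsInParallelMode());

		/*
		 * Assign a FullTransactionId, to be included in the transaction
		 * state that is serialized in the parallel DSM.
		 */
		(void) GetCurrentTransactionId();
	}
}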
Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote: > > > > Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > > Posting an updated set of patches to address recent feedback: > > > > Following is my review. > > > > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch > > -------------------------------------------------------------- > > > > @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions, > > /* all the cheap tests pass, so scan the query tree */ > > glob->maxParallelHazard = max_parallel_hazard(parse); > > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > > + > > + /* > > + * Additional parallel-mode safety checks are required in order to > > + * allow an underlying parallel query to be used for a > > + * table-modification command that is supported in parallel-mode. > > + */ > > + if (glob->parallelModeOK && > > + IsModifySupportedInParallelMode(parse->commandType)) > > + { > > + glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard); > > + glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > > + } > > > > Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these > > functions should not be called by parallel worker. > > > > What in the above change indicates that the parallel_restricted will > be allowed in parallel workers. This just sets paralleModeOK to allow > parallel plans for Selects if the Insert can be performed safely in a > leader backend. Well, this is just the initial setting, while the distinction between "Gather -> Insert -> ..." and "Insert -> Gather -> ..." is made later. So I withdraw my objection. > > > > @@ -1015,6 +1016,27 @@ IsInParallelMode(void) > > } > > > > /* > > + * PrepareParallelMode > > + * > > + * Prepare for entering parallel mode, based on command-type. > > + */ > > +void > > +PrepareParallelMode(CmdType commandType) > > +{ > > + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF); > > > > Isn't the test of force_parallel_mode just a hack to make regression tests > > pass? When I removed this part and ran the regression tests with > > force_parallel_mode=regress, the assertion fired when executing a subquery > > because the executor was already in parallel mode due to the main query > > execution. > > > > I think this Assert is bogus. We are allowed to enter in parallel-mode > if we are already in parallel-mode, see EnterParallelMode. Right. > But we shouldn't be allowed allocate xid in parallel-mode. So the > Assert(!IsInParallelMode()) should be moved inside the check if > (IsModifySupportedInParallelMode(commandType)) in this function. Can you > check if it still fails after such a modification? Yes, this works. > > As an alternative, have you considered allocation of the XID even in parallel > > mode? I imagine that the first parallel worker that needs the XID for > > insertions allocates it and shares it with the other workers as well as with > > the leader process. > > > > As a matter of this patch > (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we > never need to allocate xids by workers because Insert is always > performed by leader backend. When writing this comment, I was actually thinking of v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather than v11-0001, see below. 
On the other hand, if we allowed XID allocation in the parallel mode (as a separate patch), even the 0001 patch would get a bit simpler. > Even, if we want to do what you are suggesting it would be tricky because > currently, we don't have such an infrastructure where we can pass > information among workers. How about barriers (storage/ipc/barrier.c)? What I imagine is that all the workers "meet" at the barrier when they want to insert the first tuple. Then one of them allocates the XID, makes it available to others (via shared memory) and all the workers can continue. > > One problem of the current patch version is that the "INSERT INTO ... SELECT > > ..." statement consumes XID even if the SELECT eventually does not return any > > row. However, if the same query is processed w/o parallelism, the XID is only > > allocated if at least one tuple needs to be inserted. > > > Yeah, that is true but I think this can happen w/o parallelism for > updates and deletes where by the time we try to modify the row, it got > modified by a concurrent session and the first session will needlessly > allocate XID. -- Antonin Houska Web: https://www.cybertec-postgresql.com
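To sketch the barrier idea a little more concretely: SharedXidState is a
hypothetical struct that would live in the parallel DSM, and this presumes XID
allocation by workers were made legal, which is exactly what is under discussion
(the 0 passed as wait_event_info stands in for a real wait event):

typedef struct SharedXidState
{
	Barrier		barrier;	/* set up with BarrierInit(&barrier, nparticipants) */
	FullTransactionId fxid;	/* published by the elected participant */
} SharedXidState;

/* Each participant would run this before its first insert: */
if (BarrierArriveAndWait(&state->barrier, 0))
	state->fxid = GetCurrentFullTransactionId();	/* true in exactly one backend */
BarrierArriveAndWait(&state->barrier, 0);	/* wait until fxid is published */
/* state->fxid is now valid in every participant */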
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
> ..
>
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
> -------------------------------------------------------------------
>
> @@ -1021,12 +1039,15 @@ IsInParallelMode(void)
>   * Prepare for entering parallel mode, based on command-type.
>   */
>  void
> -PrepareParallelMode(CmdType commandType)
> +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
>  {
>  	Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
>
>  	if (IsModifySupportedInParallelMode(commandType))
>  	{
> +		if (isParallelModifyLeader)
> +			(void) GetCurrentCommandId(true);
>
> I miss a comment here. I suppose this is to set currentCommandIdUsed, so that
> the leader process gets a new commandId for the following statements in the
> same transaction, and thus it can see the rows inserted by the parallel
> workers?
>

oh no, leader backend and worker backends must use the same commandId. I am also
not sure if we need this, because for Insert statements we already call
GetCurrentCommandId(true) in standard_ExecutorStart. We don't want the
row-visibility behavior for parallel inserts to be any different from non-parallel
ones.

> If my understanding is correct, I think that the leader should not participate
> in the execution of the Insert node, else it would use a higher commandId than
> the workers. That would be weird, although probably not data corruption.
>

Yeah, exactly this is the reason both leader and workers must use the same
commandId.

> I wonder if parallel_leader_participation should be considered false for the
> "Gather -> Insert -> ..." plans.
>

If what I said above is correct then this is moot.

> @@ -208,7 +236,7 @@ ExecGather(PlanState *pstate)
>  	}
>
>  	/* Run plan locally if no workers or enabled and not single-copy. */
> -	node->need_to_scan_locally = (node->nreaders == 0)
> +	node->need_to_scan_locally = (node->nworkers_launched <= 0)
>  		|| (!gather->single_copy && parallel_leader_participation);
>  	node->initialized = true;
>  }
>
> Is this change needed? The code just before this test indicates that nreaders
> should be equal to nworkers_launched.
>

This change is required because we don't need to set up readers for parallel-insert
unless there is a returning clause. See the below check a few lines before this
change:

-	if (pcxt->nworkers_launched > 0)
+	if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader && !isParallelModifyWithReturning))
	{

I think this check could be simplified to
if (pcxt->nworkers_launched > 0 && isParallelModifyWithReturning)
or something like that.

> In grouping_planner(), this branch
>
> +	/* Consider a supported parallel table-modification command */
> +	if (IsModifySupportedInParallelMode(parse->commandType) &&
> +		!inheritance_update &&
> +		final_rel->consider_parallel &&
> +		parse->rowMarks == NIL)
> +	{
>
> is very similar to the creation of the non-parallel ModifyTablePaths - perhaps
> an opportunity to move the common code into a new function.
>

+1.

> @@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
>  		}
>  	}
>
> +	if (parallel_modify_partial_path_count > 0)
> +	{
> +		final_rel->rows = current_rel->rows;	/* ??? why hasn't this been
> +												 * set above somewhere ???? */
> +		generate_useful_gather_paths(root, final_rel, false);
> +	}
> +
>  	extra.limit_needed = limit_needed(parse);
>  	extra.limit_tuples = limit_tuples;
>  	extra.count_est = count_est;
>
> A boolean variable (e.g. have_parallel_modify_paths) would suffice; there's no
> need to count the paths using parallel_modify_partial_path_count.
>

Sounds sensible.

> @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>  	PlannerGlobal *glob = root->glob;
>  	int			rtoffset = list_length(glob->finalrtable);
>  	ListCell   *lc;
> +	Plan	   *finalPlan;
>
>  /*
>   * Add all the query's RTEs to the flattened rangetable. The live ones
> @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>  	}
>
>  	/* Now fix the Plan tree */
> -	return set_plan_refs(root, plan, rtoffset);
> +	finalPlan = set_plan_refs(root, plan, rtoffset);
> +	if (finalPlan != NULL && IsA(finalPlan, Gather))
> +	{
> +		Plan	   *subplan = outerPlan(finalPlan);
> +
> +		if (IsA(subplan, ModifyTable) &&
> +			castNode(ModifyTable, subplan)->returningLists != NULL)
> +		{
> +			finalPlan->targetlist = copyObject(subplan->targetlist);
> +		}
> +	}
> +	return finalPlan;
> }
>
> I'm not sure if the problem of missing targetlist should be handled here (BTW,
> NIL is the constant for an empty list, not NULL). Obviously this is a
> consequence of the fact that the ModifyTable node has no regular targetlist.
>

I think it is better to add comments along with this change. In this form, this
looks quite hacky to me.

> Actually I don't quite understand why (in the current master branch) the
> targetlist initialized in set_plan_refs()
>
>	/*
>	 * Set up the visible plan targetlist as being the same as
>	 * the first RETURNING list. This is for the use of
>	 * EXPLAIN; the executor won't pay any attention to the
>	 * targetlist. We postpone this step until here so that
>	 * we don't have to do set_returning_clause_references()
>	 * twice on identical targetlists.
>	 */
>	splan->plan.targetlist = copyObject(linitial(newRL));
>
> is not used. Instead, ExecInitModifyTable() picks the first returning list
> again:
>
>	/*
>	 * Initialize result tuple slot and assign its rowtype using the first
>	 * RETURNING list. We assume the rest will look the same.
>	 */
>	mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>
> So if you set the targetlist in create_modifytable_plan() (according to
> best_path->returningLists), or even in create_modifytable_path(), and ensure
> that it gets propagated to the Gather node (generate_gather_paths currently
> uses rel->reltarget), then you should no longer need to tweak setrefs.c.

This sounds worth investigating.

> Moreover, ExecInitModifyTable() would no longer need to set the targetlist.
>

I am not sure if we need to do anything about ExecInitModifyTable. If we want to
unify what setrefs.c does with ExecInitModifyTable, then we can start a separate
thread.

Thanks for all the reviews. I would like to emphasize what I said earlier in this
thread: it is better to first focus on parallelising Selects for Insert (aka what
v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT does), as that in itself
is a step towards achieving parallel inserts; doing both 0001 and 0003 at the same
time can take much more time, as both touch quite intricate parts of the code.

--
With Regards,
Amit Kapila.
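Written out, the simplification suggested above might look like this (a sketch; to
stay equivalent to the original condition for plain Gather plans, the leader flag
still has to appear):

	/*
	 * Set up tuple queue readers only when the workers will actually send
	 * tuples back: i.e. for ordinary Gather, or for a parallel modify that
	 * has a RETURNING clause. This is logically equivalent to the patch's
	 * !(isParallelModifyLeader && !isParallelModifyWithReturning).
	 */
	if (pcxt->nworkers_launched > 0 &&
		(!isParallelModifyLeader || isParallelModifyWithReturning))
	{
		/* ... create one TupleQueueReader per launched worker ... */
	}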
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches to address recent feedback:
>
> - Removed conditional-locking code used in parallel-safety checking
> code (Tsunakawa-san feedback). It turns out that for the problem test
> case, no parallel-safety checking should be occurring that locks
> relations because those inserts are specifying VALUES, not an
> underlying SELECT, so the parallel-safety checking code was updated to
> bail out early if no underlying SELECT is specified for the INSERT. No
> change to the test code was required.
> - Added comment to better explain the reason for treating "INSERT ...
> ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
> - Added assertion to heap_prepare_insert() (Amit)
> - Updated error handling for NULL index_expr_item case (Vignesh)

Thanks Greg for fixing and posting a new patch.
Few comments:

+-- Test INSERT with underlying query.
+-- (should create plan with parallel SELECT, Gather parent node)
+--
+explain(costs off) insert into para_insert_p1 select unique1, stringu1 from tenk1;
+               QUERY PLAN
+----------------------------------------
+ Insert on para_insert_p1
+   ->  Gather
+         Workers Planned: 4
+         ->  Parallel Seq Scan on tenk1
+(4 rows)
+
+insert into para_insert_p1 select unique1, stringu1 from tenk1;
+select count(*), sum(unique1) from para_insert_p1;
+ count |   sum
+-------+----------
+ 10000 | 49995000
+(1 row)
+

For one of the tests you can validate that the same transaction has been used by
all the parallel workers; you could use something like the below to validate:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM para_insert_p1) as dt;

Few includes are not required:

 #include "executor/nodeGather.h"
+#include "executor/nodeModifyTable.h"
 #include "executor/nodeSubplan.h"
 #include "executor/tqueue.h"
 #include "miscadmin.h"
@@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
 	GatherState *gatherstate;
 	Plan	   *outerNode;
 	TupleDesc	tupDesc;
+	Index		varno;

This include is not required in nodeModifyTable.c

+#include "catalog/index.h"
+#include "catalog/indexing.h"
@@ -43,7 +49,11 @@
 #include "parser/parse_agg.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
+#include "parser/parsetree.h"
+#include "partitioning/partdesc.h"
+#include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"

The includes indexing.h, rewriteHandler.h & lmgr.h are not required in clauses.c

There are a few typos:

+ table and populate it can use a parallel plan. Another exeption is the command
+ <literal>INSERT INTO ... SELECT ...</literal> which can use a parallel plan for
+ the underlying <literal>SELECT</literal> part of the query.

exeption should be exception

+	/*
+	 * For the number of workers to use for a parallel
+	 * INSERT/UPDATE/DELETE, it seems resonable to use the same number
+	 * of workers as estimated for the underlying query.
+	 */
+	parallelModifyWorkers = path->parallel_workers;

resonable should be reasonable

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 8, 2021 at 12:21 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > As an alternative, have you considered allocation of the XID even in
> > > parallel mode? I imagine that the first parallel worker that needs the XID
> > > for insertions allocates it and shares it with the other workers as well
> > > as with the leader process.
> > >
> >
> > As a matter of this patch
> > (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we
> > never need to allocate xids by workers because Insert is always
> > performed by the leader backend.
>
> When writing this comment, I was actually thinking of
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather
> than v11-0001, see below. On the other hand, if we allowed XID allocation in
> the parallel mode (as a separate patch), even the 0001 patch would get a bit
> simpler.
>
> > Even if we want to do what you are suggesting, it would be tricky because
> > currently, we don't have such an infrastructure where we can pass
> > information among workers.
>
> How about barriers (storage/ipc/barrier.c)? What I imagine is that all the
> workers "meet" at the barrier when they want to insert the first tuple. Then
> one of them allocates the XID, makes it available to others (via shared
> memory) and all the workers can continue.
>

Even if we want to do this, I am not sure we need barriers, because there is no
intrinsic need for all workers to stop before allocating the XID. After allocation
of the XID, we just need some way for the other workers to use it, maybe something
like the way workers currently synchronize to get the next block number to process
in parallel sequential scans. But the question is whether it is really worth it,
because in many cases the XID would already be allocated by the time the parallel
DML operation is started, and we do share it at the beginning. I think if we really
want to allow allocation of xids in parallel-mode then we should also think about
allowing it for subtransaction xids, not only for main transactions, and that will
open up some other use cases. I feel it is better to tackle that problem
separately.

--
With Regards,
Amit Kapila.
Hi Greg, Amit
Cc: hackers

> > > 4. Have you checked the overhead of this on the planner for
> > > different kinds of statements like inserts into tables having 100
> > > or 500 partitions? Similarly, it is good to check the overhead of
> > > domain related checks added in the patch.
> >
> > Checking that now and will post results soon.
>
> I am seeing a fair bit of overhead in the planning for the INSERT
> parallel-safety checks (mind you, compared to the overall performance
> gain, it's not too bad).

Considering the 'real-world' use cases and extreme cases I can imagine, I took 3
kinds of measurements on partition tables for the latest patch (V11). The
measurements mainly focus on small row counts, because that makes it easier to
evaluate the check overhead relative to the parallelism optimization.
From the current results, the overhead looks acceptable compared to the benefits,
as Greg said.

%reg = (patched - master)/master; all time = Execution Time + Planning Time

Test 1: overhead of parallel insert into thousands of partitions, 1 row per partition.

           |              patched                   |               master                   |
-----------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
partitions |Execution Time(ms)| Planning Time(ms)   | Execution Time(ms) | Planning Time(ms) | %reg(Execution Time)| %reg(all time)  |
-----------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
      1000 |         2281.291 |              25.983 |           9752.145 |             0.208 |                -77% |            -76% |
      2000 |         2303.229 |              50.427 |           9446.221 |             0.227 |                -76% |            -75% |
      4000 |         2303.207 |             100.946 |           9948.743 |             0.211 |                -77% |            -76% |
      6000 |         2411.877 |             152.212 |           9953.114 |             0.210 |                -76% |            -74% |
     10000 |         2467.235 |             260.751 |          10917.494 |             0.284 |                -77% |            -75% |

Test 2: overhead of parallel insert into thousands of partitions, 100 rows per partition.

           |              patched                   |               master                   |
-----------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
partitions |Execution Time(ms)| Planning Time(ms)   | Execution Time(ms) | Planning Time(ms) | %reg(Execution Time)| %reg(all time)  |
-----------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
      1000 |         2366.620 |              25.787 |          14052.748 |             0.238 |                -83% |            -83% |
      2000 |         2325.171 |              48.780 |          10099.203 |             0.211 |                -77% |            -76% |
      4000 |         2599.344 |             110.978 |          10678.065 |             0.216 |                -76% |            -75% |
      6000 |         2764.070 |             152.929 |          10880.948 |             0.238 |                -75% |            -73% |
     10000 |         3043.658 |             265.297 |          11607.202 |             0.207 |                -74% |            -71% |

Test 3: overhead of parallel insert into varying numbers of partitions and inserted rows.

           |                 |              patched                   |               master                   |
-----------|-----------------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
partitions |total table rows |Execution Time(ms)| Planning Time(ms)   | Execution Time(ms) | Planning Time(ms) | %reg(Execution Time)| %reg(all time)  |
-----------|-----------------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
       100 |        10000000 |        11202.021 |               1.593 |          25668.560 |             0.212 |                -56% |            -56% |
       500 |        10000000 |        10290.368 |              12.722 |          25730.860 |             0.214 |                -60% |            -60% |
      1000 |        10000000 |         8946.627 |              24.851 |          26271.026 |             0.219 |                -66% |            -66% |
      2000 |        10000000 |        10615.643 |              50.111 |          25512.692 |             0.231 |                -58% |            -58% |
      4000 |        10000000 |         9056.334 |             105.644 |          26643.383 |             0.217 |                -66% |            -66% |
-----------|-----------------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
       100 |         1000000 |         2757.670 |               1.493 |          11136.357 |             0.208 |                -75% |            -75% |
       500 |         1000000 |         2810.980 |              12.696 |          11483.715 |             0.206 |                -76% |            -75% |
      1000 |         1000000 |         2773.342 |              24.746 |          13441.169 |             0.214 |                -79% |            -79% |
      2000 |         1000000 |         2856.915 |              51.737 |          10996.621 |             0.226 |                -74% |            -74% |
      4000 |         1000000 |         2942.478 |             100.235 |          11422.699 |             0.220 |                -74% |            -73% |
-----------|-----------------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
       100 |          100000 |         2257.134 |               1.682 |           9351.511 |             0.226 |                -76% |            -76% |
       500 |          100000 |         2197.570 |              12.452 |           9636.659 |             0.203 |                -77% |            -77% |
      1000 |          100000 |         2188.356 |              24.553 |           9647.583 |             0.202 |                -77% |            -77% |
      2000 |          100000 |         2293.287 |              49.167 |           9365.449 |             0.224 |                -76% |            -75% |
      4000 |          100000 |         2375.935 |             104.562 |          10125.190 |             0.219 |                -77% |            -76% |
-----------|-----------------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
       100 |           10000 |         2142.086 |               1.506 |           9500.491 |             0.206 |                -77% |            -77% |
       500 |           10000 |         2147.779 |              12.260 |          11746.766 |             0.202 |                -82% |            -82% |
      1000 |           10000 |         2153.286 |              23.900 |           9298.452 |             0.212 |                -77% |            -77% |
      2000 |           10000 |         2303.170 |              52.844 |           9772.971 |             0.217 |                -76% |            -76% |

However, as Amit and other hackers are concerned, if we want to leave the overhead
as it is, we should cover real use cases as much as possible, in case we find the
overhead can't be ignored (then we should consider reducing it). So if anyone has
real-world use cases (which I didn't include in my results above) that need testing
on this patch, please share the info with me; I'd like to do more tests on it.

Regards,
Tang
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches to address recent feedback:
>

Here is an additional review of
v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are quite a few
comments raised on the V11-0001* patch. I suggest first posting a revised version
of the V11-0001* patch addressing those comments, and then you can separately post
a revised version of v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.

Few comments:
==============
1.
+char
+max_parallel_hazard_for_modify(Query *parse, const char *initial_max_parallel_hazard)
{
..
+	return (rel_max_parallel_hazard_for_modify(rte->relid, parse->commandType, &context, NoLock));
..
}

rel_max_parallel_hazard_for_modify()
{
..
+	rel = table_open(relid, lockmode);
..
+	if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
..
+	/*
+	 * Column default expressions for columns in the target-list are
+	 * already being checked for parallel-safety in the
+	 * max_parallel_hazard() scan of the query tree in standard_planner().
+	 */
+
+	tupdesc = RelationGetDescr(rel);
}

Here, it seems we are accessing the relation descriptor without any lock on the
table, which is dangerous considering the table can be modified in a parallel
session. Is there a reason why you think this is safe? Did you check anywhere else
for such a coding pattern?

2.
+	/*
+	 * If there are any index expressions, check that they are parallel-mode
+	 * safe.
+	 */
+	max_hazard = index_expr_max_parallel_hazard_for_modify(rel, context);
+	if (max_parallel_hazard_test(max_hazard, context))
+	{
+		table_close(rel, lockmode);
+		return context->max_hazard;
+	}

Here and at all other similar places, the call to max_parallel_hazard_test seems
redundant because index_expr_max_parallel_hazard_for_modify would have already done
that. Why can't we just return true/false from
index_expr_max_parallel_hazard_for_modify? The context would have been already
updated for max_hazard.

3.
@@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 		/* all the cheap tests pass, so scan the query tree */
 		glob->maxParallelHazard = max_parallel_hazard(parse);
 		glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+		/*
+		 * Additional parallel-mode safety checks are required in order to
+		 * allow an underlying parallel query to be used for a
+		 * table-modification command that is supported in parallel-mode.
+		 */
+		if (glob->parallelModeOK &&
+			IsModifySupportedInParallelMode(parse->commandType))
+		{
+			glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
+			glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+		}
 	}

I don't like this way of checking the parallel hazard for modify. This not only
duplicates some code in max_parallel_hazard_for_modify from max_parallel_hazard,
but also appears quite awkward. Can we move max_parallel_hazard_for_modify inside
max_parallel_hazard? Basically, after calling max_parallel_hazard_walker, we can
check for a modify statement and call the new function.

4.
domain_max_parallel_hazard_for_modify()
{
..
+	if (isnull)
+	{
+		/*
+		 * This shouldn't ever happen, but if it does, log a WARNING
+		 * and return UNSAFE, rather than erroring out.
+		 */
+		elog(WARNING, "null conbin for constraint %u", con->oid);
+		context->max_hazard = PROPARALLEL_UNSAFE;
+		break;
+	}
..
}
index_expr_max_parallel_hazard_for_modify()
{
..
+	if (index_expr_item == NULL)	/* shouldn't happen */
+	{
+		index_close(index_rel, lockmode);
+		context->max_hazard = PROPARALLEL_UNSAFE;
+		return context->max_hazard;
+	}
..
}

It is not clear why the above two are "shouldn't happen" cases and, if so, why we
want to treat them as unsafe. Ideally, there should be an Assert if these can't
happen, but it is difficult to decide without knowing why you have considered them
unsafe.

--
With Regards,
Amit Kapila.
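For point 2, the suggested shape would be roughly as follows (a sketch using the
patch's names; the helper updates context->max_hazard as a side effect and returns
true once the hazard of interest is reached):

	/*
	 * If there are any index expressions, check that they are parallel-mode
	 * safe.
	 */
	if (index_expr_max_parallel_hazard_for_modify(rel, context))
	{
		/* context->max_hazard was already updated by the helper */
		table_close(rel, lockmode);
		return context->max_hazard;
	}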
On Thu, Jan 14, 2021 at 2:37 PM Tang, Haiying <tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Greg, Amit
> Cc: hackers
>
> > > > 4. Have you checked the overhead of this on the planner for
> > > > different kinds of statements like inserts into tables having 100
> > > > or 500 partitions? Similarly, it is good to check the overhead of
> > > > domain related checks added in the patch.
> > >
> > > Checking that now and will post results soon.
> >
> > I am seeing a fair bit of overhead in the planning for the INSERT
> > parallel-safety checks (mind you, compared to the overall performance
> > gain, it's not too bad).
>
> Considering the 'real-world' use cases and extreme cases I can imagine, I took
> 3 kinds of measurements on partition tables for the latest patch (V11).
> The measurements mainly focus on small row counts, because that makes it
> easier to evaluate the check overhead relative to the parallelism optimization.
> From the current results, the overhead looks acceptable compared to the
> benefits, as Greg said.
>

Can we test cases where we have few rows in the Select table (say 1000) and there
are 500 or 1000 partitions? In that case, we won't select parallelism, but we have
to pay the price of checking the parallel-safety of all partitions. Can you check
this with 100, 200, 500, and 1000 partition tables?

--
With Regards,
Amit Kapila.
Hello Tang-san,

From: Amit Kapila <amit.kapila16@gmail.com>
> Can we test cases where we have few rows in the Select table (say 1000)
> and there are 500 or 1000 partitions? In that case, we won't select
> parallelism, but we have to pay the price of checking the parallel-safety
> of all partitions. Can you check this with 100, 200, 500, and 1000
> partition tables?

I also wanted to see such an extreme(?) case. The 1,000 rows is not the count per
partition but the total count over all partitions, e.g., when # of partitions is
100, # of rows per partition is 10.

Regards
Takayuki Tsunakawa
On Fri, Jan 8, 2021 at 8:25 PM vignesh C <vignesh21@gmail.com> wrote: > > For one of the test you can validate that the same transaction has > been used by all the parallel workers, you could use something like > below to validate: > SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM para_insert_p1) as dt; > > Few includes are not required: > #include "executor/nodeGather.h" > > This include is not required in nodeModifyTable.c > > > The includes indexing.h, rewriteHandler.h & lmgr.h is not required in clauses.c > > exeption should be exception > > resonable should be reasonable > Thanks Vignesh, I'll be sure to make those updates in the next version of the patches. Regards, Greg
On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Here is an additional review of
> v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
> quite a few comments raised on the V11-0001* patch. I suggest first
> posting a revised version of the V11-0001* patch addressing those comments,
> and then you can separately post a revised version of
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.
>

1)
> Here, it seems we are accessing the relation descriptor without any
> lock on the table, which is dangerous considering the table can be
> modified in a parallel session. Is there a reason why you think this
> is safe? Did you check anywhere else for such a coding pattern?

Yes, there's a very good reason, and I certainly have checked for the same coding
pattern elsewhere, and not just randomly decided that locking can be ignored.
The table has ALREADY been locked (by the caller) during the parse/analyze phase.
(This is not the case for a partition, in which case the patch code uses
AccessShareLock, as you will see.)
And BTW, with asserts enabled, an attempt to table_open() with NoLock when you
haven't already locked the table will fire an assert - see the following code in
relation_open():

	/*
	 * If we didn't get the lock ourselves, assert that caller holds one,
	 * except in bootstrap mode where no locks are used.
	 */
	Assert(lockmode != NoLock ||
		   IsBootstrapProcessingMode() ||
		   CheckRelationLockedByMe(r, AccessShareLock, true));

2)
> Here and at all other similar places, the call to
> max_parallel_hazard_test seems redundant because
> index_expr_max_parallel_hazard_for_modify would have already done
> that. Why can't we just return true/false from
> index_expr_max_parallel_hazard_for_modify? The context would have been
> already updated for max_hazard.

Yes, you're right, it's redundant to call max_parallel_hazard_test() again. The
max_hazard can always be retrieved from the context if needed (rather than
explicitly returned), so I'll change this function (and any similar cases) to just
return true if the max_hazard of interest has been reached.

3)
> I don't like this way of checking the parallel hazard for modify. This not
> only duplicates some code in max_parallel_hazard_for_modify from
> max_parallel_hazard, but also appears quite awkward. Can we move
> max_parallel_hazard_for_modify inside max_parallel_hazard? Basically,
> after calling max_parallel_hazard_walker, we can check for a modify
> statement and call the new function.

Agree, I'll move it, as you suggest.

4)
> domain_max_parallel_hazard_for_modify()
> {
> ..
> +	if (isnull)
> +	{
> +		/*
> +		 * This shouldn't ever happen, but if it does, log a WARNING
> +		 * and return UNSAFE, rather than erroring out.
> +		 */
> +		elog(WARNING, "null conbin for constraint %u", con->oid);
> +		context->max_hazard = PROPARALLEL_UNSAFE;
> +		break;
> +	}
> ..
> }
> index_expr_max_parallel_hazard_for_modify()
> {
> ..
> +	if (index_expr_item == NULL)	/* shouldn't happen */
> +	{
> +		index_close(index_rel, lockmode);
> +		context->max_hazard = PROPARALLEL_UNSAFE;
> +		return context->max_hazard;
> +	}
> ..
> }
>
> It is not clear why the above two are "shouldn't happen" cases and, if so,
> why we want to treat them as unsafe. Ideally, there should be an Assert if
> these can't happen, but it is difficult to decide without knowing why you
> have considered them unsafe.

The checks being done here for "should never happen" cases are THE SAME as in other
parts of the Postgres code. For example, search the Postgres code for "null conbin"
and you'll find 6 other places which actually ERROR out if conbin (the binary
representation of the constraint) in a pg_constraint tuple is found to be null.
The cases that you point out in the patch used to also error out in the same way,
but Vignesh suggested changing them to just return parallel-unsafe instead of
erroring out, which I agree with.
Such cases could surely only ever happen if the DB had been corrupted, so they are
extremely rare IMHO and most likely to have caused an ERROR elsewhere before ever
reaching here...
I can add some Asserts to the current code, to better alert for such cases when
asserts are enabled, but otherwise I think that the current code is OK (cleaning up
other Postgres code is beyond the scope of the task here).

Regards,
Greg Nancarrow
Fujitsu Australia
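For illustration, the "Assert plus graceful fallback" that Greg describes could
look like this for the conbin case (a sketch; names taken from the quoted patch
hunk):

	Assert(!isnull);			/* a null conbin should never happen */
	if (isnull)
	{
		/* Corrupted catalog? Warn and assume parallel-unsafe. */
		elog(WARNING, "null conbin for constraint %u", con->oid);
		context->max_hazard = PROPARALLEL_UNSAFE;
		break;
	}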
> From: Amit Kapila <amit.kapila16@gmail.com>
> > Can we test cases where we have few rows in the Select table (say
> > 1000) and there are 500 or 1000 partitions? In that case, we won't
> > select parallelism, but we have to pay the price of checking the
> > parallel-safety of all partitions. Can you check this with 100, 200,
> > 500, and 1000 partition tables?
>
> I also wanted to see such an extreme(?) case. The 1,000 rows is not
> the count per partition but the total count over all partitions, e.g.,
> when # of partitions is 100, # of rows per partition is 10.

Below results are for a serial plan, where the Select table's total row count is
1,000. The Execution Time + Planning Time is still less than unpatched.
(Does this patch make some optimizations to serial insert? I'm a little confused
here, because the patched execution time is less than unpatched, but I didn't find
information about it in the commit messages. If I missed something, please kindly
let me know.)

           |              patched                   |               master                   |
-----------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
partitions |Execution Time(ms)| Planning Time(ms)   | Execution Time(ms) | Planning Time(ms) | %reg(Execution Time)| %reg(all time)  |
-----------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
       100 |            5.294 |               1.581 |              6.951 |             0.037 |                -24% |             -2% |
       200 |            9.666 |               3.068 |             13.681 |             0.043 |                -29% |             -7% |
       500 |           22.742 |              12.061 |             35.928 |             0.125 |                -37% |             -3% |
      1000 |           46.386 |              24.872 |             75.523 |             0.142 |                -39% |             -6% |

I did another test which made the check overhead obvious. This case is not a good
fit for partitioning purposes, but I put it here as an extreme case too.
The Select table's total row count is 1,000, and # of partitions is 2000, so only
the first 1000 partitions have 1 row per partition; the last 1000 partitions have
no data inserted.

           |              patched                   |               master                   |
-----------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
partitions |Execution Time(ms)| Planning Time(ms)   | Execution Time(ms) | Planning Time(ms) | %reg(Execution Time)| %reg(all time)  |
-----------|------------------|---------------------|--------------------|-------------------|---------------------|-----------------|
      2000 |           45.758 |              51.697 |             80.272 |             0.136 |                -43% |             21% |

Regards,
Tang
From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> (Does this patch make some optimizations to serial insert? I'm a little
> confused here, because the patched execution time is less than unpatched,
> but I didn't find information about it in the commit messages. If I missed
> something, please kindly let me know.)

I haven't thought of anything yet. Could you show us the output of EXPLAIN
(ANALYZE, BUFFERS, VERBOSE) for the 1,000 partitions case, for patched and
unpatched? If it doesn't show any difference, the output of perf may be necessary
next.
(BTW, were all the 1,000 rows stored in the target table?)

> I did another test which made the check overhead obvious. This case is not
> a good fit for partitioning purposes, but I put it here as an extreme case
> too.
> The Select table's total row count is 1,000, and # of partitions is 2000,
> so only the first 1000 partitions have 1 row per partition; the last 1000
> partitions have no data inserted.

Thank you, that's a good test.

Regards
Takayuki Tsunakawa
On Mon, Jan 18, 2021 at 1:02 PM Tang, Haiying <tanghy.fnst@cn.fujitsu.com> wrote:
>
> > From: Amit Kapila <amit.kapila16@gmail.com>
> > > Can we test cases when we have few rows in the Select table (say
> > > 1000) and there are 500 or 1000 partitions. In that case, we won't
> > > select parallelism but we have to pay the price of checking
> > > parallel-safety of all partitions. Can you check this with 100, 200,
> > > 500, 1000 partitions table?
> >
> > I also wanted to see such an extreme(?) case. The 1,000 rows is not
> > the count per partition but the total count of all partitions. e.g.,
> > when # of partitions is 100, # of rows per partition is 10.
>
> Below results are for a serial plan, where the select table's total row count is 1,000. The Execution Time + Planning Time is still less than unpatched.
> (Does this patch make some optimizations in serial insert? I'm a little confused here, because the patched execution time is less than unpatched, but I didn't find information in the commit messages about it. If I missed something, please kindly let me know.)
>

I don't think the patch should have any impact on the serial case. I think you can try to repeat each test 3 times both with and without the patch and take the median of the three.

--
With Regards,
Amit Kapila.
Hi Tsunakawa-san

> From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> > (Does this patch make some optimizations in serial insert? I'm a little
> > confused here, because the patched execution time is less than
> > unpatched, but I didn't find information in the commit messages about it.
> > If I missed something, please kindly let me know.)
>
> I haven't thought of anything yet. Could you show us the output of
> EXPLAIN (ANALYZE, BUFFERS, VERBOSE) of the 1,000 partitions case for the
> patched and unpatched? If it doesn't show any difference, the output
> of perf may be necessary next.

Execute EXPLAIN on Patched:
postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=44.139..44.140 rows=0 loops=1)
   Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
   ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.007..0.201 rows=1000 loops=1)
         Output: test_data1.a, test_data1.b
         Buffers: shared hit=5
 Planning:
   Buffers: shared hit=27011
 Planning Time: 24.526 ms
 Execution Time: 44.981 ms

Execute EXPLAIN on non-Patched:
postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=72.656..72.657 rows=0 loops=1)
   Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
   ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.010..0.175 rows=1000 loops=1)
         Output: test_data1.a, test_data1.b
         Buffers: shared hit=5
 Planning:
   Buffers: shared hit=72
 Planning Time: 0.135 ms
 Execution Time: 79.058 ms

> (BTW, were all the 1,000 rows stored in the target table?)

Yes, I checked that all the rows are stored in the target table.

postgres=# select count(*) from test_part;
 count
-------
  1000

Regards,
Tang
On Mon, Jan 18, 2021 at 10:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Here is an additional review of > > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are > > quite a few comments raised on the V11-0001* patch. I suggest first > > post a revised version of V11-0001* patch addressing those comments > > and then you can separately post a revised version of > > v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO. > > > > 1) > > >Here, it seems we are accessing the relation descriptor without any > >lock on the table which is dangerous considering the table can be > >modified in a parallel session. Is there a reason why you think this > >is safe? Did you check anywhere else such a coding pattern? > > Yes, there's a very good reason and I certainly have checked for the > same coding pattern elsewhere, and not just randomly decided that > locking can be ignored. > The table has ALREADY been locked (by the caller) during the > parse/analyze phase. > Fair enough. I suggest adding a comment saying the same so that the reader doesn't get confused about the same. > (This is not the case for a partition, in which case the patch code > uses AccessShareLock, as you will see). Okay, but I see you release the lock on partition rel immediately after checking parallel-safety. What if a user added some parallel-unsafe constraint (via Alter Table) after that check? > > 4) > > >domain_max_parallel_hazard_for_modify() > >{ > >.. > >+ if (isnull) > >+ { > >+ /* > >+ * This shouldn't ever happen, but if it does, log a WARNING > >+ * and return UNSAFE, rather than erroring out. > >+ */ > >+ elog(WARNING, "null conbin for constraint %u", con->oid); > >+ context->max_hazard = PROPARALLEL_UNSAFE; > >+ break; > >+ } > >.. > >} > >index_expr_max_parallel_hazard_for_modify() > >{ > >.. > >+ if (index_expr_item == NULL) /* shouldn't happen */ > >+ { > >+ index_close(index_rel, lockmode); > >+ context->max_hazard = PROPARALLEL_UNSAFE; > >+ return context->max_hazard; > >+ } > >.. > >} > > >It is not clear why the above two are shouldn't happen cases and if so > >why we want to treat them as unsafe. Ideally, there should be an > >Assert if these can't happen but it is difficult to decide without > >knowing why you have considered them unsafe? > > The checks being done here for "should never happen" cases are THE > SAME as other parts of the Postgres code. > For example, search Postgres code for "null conbin" and you'll find 6 > other places in the Postgres code which actually ERROR out if conbin > (binary representation of the constraint) in a pg_constraint tuple is > found to be null. > The cases that you point out in the patch used to also error out in > the same way, but Vignesh suggested changing them to just return > parallel-unsafe instead of erroring-out, which I agree with. > You have not raised a WARNING for the second case. But in the first place what is the reasoning for making this different from other parts of code? If we don't have a solid reason then I suggest keeping these checks and errors the same as in other parts of the code. -- With Regards, Amit Kapila.
Hi Amit > I don't think the patch should have any impact on the serial case. I > think you can try to repeat each test 3 times both with and without a > patch and take the median of the three. Actually, I repeated about 10 times, the execution time is always less than unpatched. Regards, Tang
On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying <tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Tsunakawa-san
>
> > From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> > > (Does this patch make some optimizations in serial insert? I'm a little
> > > confused here, because the patched execution time is less than
> > > unpatched, but I didn't find information in the commit messages about it.
> > > If I missed something, please kindly let me know.)
> >
> > I haven't thought of anything yet. Could you show us the output of
> > EXPLAIN (ANALYZE, BUFFERS, VERBOSE) of the 1,000 partitions case for the
> > patched and unpatched? If it doesn't show any difference, the output
> > of perf may be necessary next.
>
> Execute EXPLAIN on Patched:
> postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
>                                                         QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=44.139..44.140 rows=0 loops=1)
>    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.007..0.201 rows=1000 loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5
>  Planning:
>    Buffers: shared hit=27011
>  Planning Time: 24.526 ms
>  Execution Time: 44.981 ms
>
> Execute EXPLAIN on non-Patched:
> postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
>                                                         QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=72.656..72.657 rows=0 loops=1)
>    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.010..0.175 rows=1000 loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5
>  Planning:
>    Buffers: shared hit=72
>  Planning Time: 0.135 ms
>  Execution Time: 79.058 ms
>

So, the results indicate that after the patch we touch more buffers during planning, which I think is because of accessing the partition information; and during execution, the patch touches fewer buffers for the same reason. But why would this reduce the execution time with the patch? I think this needs some investigation.

--
With Regards,
Amit Kapila.
On Mon, Jan 18, 2021 at 2:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jan 18, 2021 at 10:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Here is an additional review of > > > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are > > > quite a few comments raised on the V11-0001* patch. I suggest first > > > post a revised version of V11-0001* patch addressing those comments > > > and then you can separately post a revised version of > > > v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO. > > > > > > > 1) > > > > >Here, it seems we are accessing the relation descriptor without any > > >lock on the table which is dangerous considering the table can be > > >modified in a parallel session. Is there a reason why you think this > > >is safe? Did you check anywhere else such a coding pattern? > > > > Yes, there's a very good reason and I certainly have checked for the > > same coding pattern elsewhere, and not just randomly decided that > > locking can be ignored. > > The table has ALREADY been locked (by the caller) during the > > parse/analyze phase. > > > > Fair enough. I suggest adding a comment saying the same so that the > reader doesn't get confused about the same. > > > (This is not the case for a partition, in which case the patch code > > uses AccessShareLock, as you will see). > > Okay, but I see you release the lock on partition rel immediately > after checking parallel-safety. What if a user added some > parallel-unsafe constraint (via Alter Table) after that check? > > > > > 4) > > > > >domain_max_parallel_hazard_for_modify() > > >{ > > >.. > > >+ if (isnull) > > >+ { > > >+ /* > > >+ * This shouldn't ever happen, but if it does, log a WARNING > > >+ * and return UNSAFE, rather than erroring out. > > >+ */ > > >+ elog(WARNING, "null conbin for constraint %u", con->oid); > > >+ context->max_hazard = PROPARALLEL_UNSAFE; > > >+ break; > > >+ } > > >.. > > >} > > >index_expr_max_parallel_hazard_for_modify() > > >{ > > >.. > > >+ if (index_expr_item == NULL) /* shouldn't happen */ > > >+ { > > >+ index_close(index_rel, lockmode); > > >+ context->max_hazard = PROPARALLEL_UNSAFE; > > >+ return context->max_hazard; > > >+ } > > >.. > > >} > > > > >It is not clear why the above two are shouldn't happen cases and if so > > >why we want to treat them as unsafe. Ideally, there should be an > > >Assert if these can't happen but it is difficult to decide without > > >knowing why you have considered them unsafe? > > > > The checks being done here for "should never happen" cases are THE > > SAME as other parts of the Postgres code. > > For example, search Postgres code for "null conbin" and you'll find 6 > > other places in the Postgres code which actually ERROR out if conbin > > (binary representation of the constraint) in a pg_constraint tuple is > > found to be null. > > The cases that you point out in the patch used to also error out in > > the same way, but Vignesh suggested changing them to just return > > parallel-unsafe instead of erroring-out, which I agree with. > > > > You have not raised a WARNING for the second case. But in the first > place what is the reasoning for making this different from other parts > of code? > On again, thinking about this, I see a reason why one wants to do like you have done currently in the patch. 
It helps us to avoid giving such errors when they are really not required, say when the condition occurs while checking parallel-safety for a particular partition that, in reality, we will never insert into; and there are probably similar other cases. I guess we should give a WARNING consistently in all such cases.

--
With Regards,
Amit Kapila.
On Mon, Jan 18, 2021 at 8:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > 1) > > > > >Here, it seems we are accessing the relation descriptor without any > > >lock on the table which is dangerous considering the table can be > > >modified in a parallel session. Is there a reason why you think this > > >is safe? Did you check anywhere else such a coding pattern? > > > > Yes, there's a very good reason and I certainly have checked for the > > same coding pattern elsewhere, and not just randomly decided that > > locking can be ignored. > > The table has ALREADY been locked (by the caller) during the > > parse/analyze phase. > > > > Fair enough. I suggest adding a comment saying the same so that the > reader doesn't get confused about the same. > OK, I'll add a comment. > > (This is not the case for a partition, in which case the patch code > > uses AccessShareLock, as you will see). > > Okay, but I see you release the lock on partition rel immediately > after checking parallel-safety. What if a user added some > parallel-unsafe constraint (via Alter Table) after that check? > > > I'm not sure. But there would be a similar concern for current Parallel SELECT functionality, right? My recollection is that ALTER TABLE obtains an exclusive lock on the table which it retains until the end of the transaction, so that will result in blocking at certain points, during parallel-checks and execution, but there may still be a window. > > 4) > > > > >domain_max_parallel_hazard_for_modify() > > >{ > > >.. > > >+ if (isnull) > > >+ { > > >+ /* > > >+ * This shouldn't ever happen, but if it does, log a WARNING > > >+ * and return UNSAFE, rather than erroring out. > > >+ */ > > >+ elog(WARNING, "null conbin for constraint %u", con->oid); > > >+ context->max_hazard = PROPARALLEL_UNSAFE; > > >+ break; > > >+ } > > >.. > > >} > > >index_expr_max_parallel_hazard_for_modify() > > >{ > > >.. > > >+ if (index_expr_item == NULL) /* shouldn't happen */ > > >+ { > > >+ index_close(index_rel, lockmode); > > >+ context->max_hazard = PROPARALLEL_UNSAFE; > > >+ return context->max_hazard; > > >+ } > > >.. > > >} > > > > >It is not clear why the above two are shouldn't happen cases and if so > > >why we want to treat them as unsafe. Ideally, there should be an > > >Assert if these can't happen but it is difficult to decide without > > >knowing why you have considered them unsafe? > > > > The checks being done here for "should never happen" cases are THE > > SAME as other parts of the Postgres code. > > For example, search Postgres code for "null conbin" and you'll find 6 > > other places in the Postgres code which actually ERROR out if conbin > > (binary representation of the constraint) in a pg_constraint tuple is > > found to be null. > > The cases that you point out in the patch used to also error out in > > the same way, but Vignesh suggested changing them to just return > > parallel-unsafe instead of erroring-out, which I agree with. > > > > You have not raised a WARNING for the second case. The same checks in current Postgres code also don't raise a WARNING for that case, so I'm just being consistent with existing Postgres code (which itself isn't consistent for those two cases). >But in the first > place what is the reasoning for making this different from other parts > of code? If we don't have a solid reason then I suggest keeping these > checks and errors the same as in other parts of the code. > The checks are the same as done in existing Postgres source - but instead of failing with an ERROR (i.e. 
whole backend dies), in the middle of parallel-safety-checking, it has been changed to regard the operation as parallel-unsafe, so that it will try to execute in non-parallel mode, where it will most likely fail too when those corrupted attributes are accessed - but it will fail in the way that it currently does in Postgres, should that very rare condition ever happen. This was suggested by Vignesh, and I agree with him. So in effect, it's just allowing it to use the existing error paths in the code, rather than introducing new ERROR points. Regards, Greg Nancarrow Fujitsu Australia
From: Greg Nancarrow <gregn4422@gmail.com>
> On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >Here, it seems we are accessing the relation descriptor without any
> >lock on the table which is dangerous considering the table can be
> >modified in a parallel session. Is there a reason why you think this
> >is safe? Did you check anywhere else such a coding pattern?
>
> Yes, there's a very good reason and I certainly have checked for the
> same coding pattern elsewhere, and not just randomly decided that
> locking can be ignored.
> The table has ALREADY been locked (by the caller) during the
> parse/analyze phase.

Isn't there any case where planning is done but parse analysis is not done immediately before? e.g.

* Alteration of objects invalidates cached query plans, and the next execution of the plan rebuilds it. (But it seems that parse analysis is done in this case in plancache.c.)

* Execute a prepared statement with a different parameter value, which builds a new custom plan or a generic plan.

Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a trigger with a parallel-unsafe trigger action?

Regards
Takayuki Tsunakawa
On Mon, Jan 18, 2021 at 3:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Mon, Jan 18, 2021 at 8:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > >It is not clear why the above two are shouldn't happen cases and if so > > > >why we want to treat them as unsafe. Ideally, there should be an > > > >Assert if these can't happen but it is difficult to decide without > > > >knowing why you have considered them unsafe? > > > > > > The checks being done here for "should never happen" cases are THE > > > SAME as other parts of the Postgres code. > > > For example, search Postgres code for "null conbin" and you'll find 6 > > > other places in the Postgres code which actually ERROR out if conbin > > > (binary representation of the constraint) in a pg_constraint tuple is > > > found to be null. > > > The cases that you point out in the patch used to also error out in > > > the same way, but Vignesh suggested changing them to just return > > > parallel-unsafe instead of erroring-out, which I agree with. > > > > > > > You have not raised a WARNING for the second case. > > The same checks in current Postgres code also don't raise a WARNING > for that case, so I'm just being consistent with existing Postgres > code (which itself isn't consistent for those two cases). > Search for the string "too few entries in indexprs list" and you will find a lot of places in code raising ERROR for the same condition. -- With Regards, Amit Kapila.
On Tue, Jan 19, 2021 at 2:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > You have not raised a WARNING for the second case. > > > > The same checks in current Postgres code also don't raise a WARNING > > for that case, so I'm just being consistent with existing Postgres > > code (which itself isn't consistent for those two cases). > > > > Search for the string "too few entries in indexprs list" and you will > find a lot of places in code raising ERROR for the same condition. > Yes, but raising an ERROR stops processing (not just logs an error message). Raising a WARNING logs a warning message and continues processing. It's a big difference. So, for the added parallel-safety-checking code, it was suggested by Vignesh (and agreed by me) that, for these rare and highly unlikely conditions, it would be best not to just copy the error-handling code verbatim from other cases in the Postgres code (as I had originally done) and just stop processing dead with an error, but to instead return PARALLEL_UNSAFE, so that processing continues as it would for current non-parallel processing, which would most likely error-out anyway along the current error-handling checks and paths when those bad attributes/fields are referenced. I will add some Asserts() and don't mind adding a WARNING message for the 2nd case. If you really feel strongly about this, I can just restore the original code, which will stop dead with an ERROR in the middle of parallel-safety checking should one of these rare conditions ever occur. Regards, Greg Nancarrow Fujitsu Australia
On Tue, Jan 19, 2021 at 1:37 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > > The table has ALREADY been locked (by the caller) during the > > parse/analyze phase. > > Isn't there any case where planning is done but parse analysis is not done immediately before? e.g. > > * Alteration of objects invalidates cached query plans, and the next execution of the plan rebuilds it. (But it seemsthat parse analysis is done in this case in plancache.c.) > > * Execute a prepared statement with a different parameter value, which builds a new custom plan or a generic plan. > I don't know, but since NoLock is used in other parts of the planner, I'd expect those to fail if such cases existed. > Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a triggerwith a parallel-unsafe trigger action? > Needs to be tested, but I'd expect the cached plan to get invalidated in this case - surely the same potential issue exists in Postgres for the current Parallel SELECT functionality - for example, for a column with a default value that is an expression (which could be altered from being parallel-safe to parallel-unsafe). Regards, Greg Nancarrow Fujitsu Australia
On Mon, Jan 18, 2021 at 3:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Mon, Jan 18, 2021 at 8:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > 1) > > > > > > >Here, it seems we are accessing the relation descriptor without any > > > >lock on the table which is dangerous considering the table can be > > > >modified in a parallel session. Is there a reason why you think this > > > >is safe? Did you check anywhere else such a coding pattern? > > > > > > Yes, there's a very good reason and I certainly have checked for the > > > same coding pattern elsewhere, and not just randomly decided that > > > locking can be ignored. > > > The table has ALREADY been locked (by the caller) during the > > > parse/analyze phase. > > > > > > > Fair enough. I suggest adding a comment saying the same so that the > > reader doesn't get confused about the same. > > > > OK, I'll add a comment. > > > > (This is not the case for a partition, in which case the patch code > > > uses AccessShareLock, as you will see). > > > > Okay, but I see you release the lock on partition rel immediately > > after checking parallel-safety. What if a user added some > > parallel-unsafe constraint (via Alter Table) after that check? > > > > > > > I'm not sure. But there would be a similar concern for current > Parallel SELECT functionality, right? > I don't think so because, for Selects, we take locks on the required partitions and don't release them immediately. We do parallel safety checks after acquiring those locks. From the code perspective, we lock individual partitions via expand_inherited_rtentry->expand_partitioned_rtentry and then check parallel-safety at a later point via set_append_rel_size->set_rel_consider_parallel. Also, I am not sure if there is anything we check for Selects at each partition relation level that can be changed by a concurrent session. Do you have a different understanding? Similarly, we do retain locks on indexes, see get_relation_info, which we are not doing in the patch. > My recollection is that ALTER TABLE obtains an exclusive lock on the > table which it retains until the end of the transaction, so that will > result in blocking at certain points, during parallel-checks and > execution, but there may still be a window. > Once the Select has acquired locks in the above code path, I don't think Alter for a particular partition would be able to proceed unless those locks are non-conflicting. -- With Regards, Amit Kapila.
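(To make the possible window concrete, here is a hypothetical two-session schedule; the object names and the parallel-unsafe function are assumptions used purely to illustrate the concern, not a reproduced failure:)

-- Session 1, planning: each partition's parallel-safety is checked under
-- AccessShareLock, and that lock is released straight away
INSERT INTO part_tbl SELECT * FROM src_tbl;   -- planned as parallel

-- Session 2, between Session 1's safety check and its execution: nothing
-- blocks this, since Session 1 no longer holds a lock on the partition
ALTER TABLE part_tbl_p1
    ADD CONSTRAINT c1 CHECK (parallel_unsafe_fn(a));

-- Session 1, execution: parallel workers route tuples into part_tbl_p1,
-- now firing a check constraint that was never verified as parallel-safe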
On Tue, Jan 19, 2021 at 9:19 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Tue, Jan 19, 2021 at 2:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > You have not raised a WARNING for the second case. > > > > > > The same checks in current Postgres code also don't raise a WARNING > > > for that case, so I'm just being consistent with existing Postgres > > > code (which itself isn't consistent for those two cases). > > > > > > > Search for the string "too few entries in indexprs list" and you will > > find a lot of places in code raising ERROR for the same condition. > > > > Yes, but raising an ERROR stops processing (not just logs an error > message). Raising a WARNING logs a warning message and continues > processing. It's a big difference. > So, for the added parallel-safety-checking code, it was suggested by > Vignesh (and agreed by me) that, for these rare and highly unlikely > conditions, it would be best not to just copy the error-handling code > verbatim from other cases in the Postgres code (as I had originally > done) and just stop processing dead with an error, but to instead > return PARALLEL_UNSAFE, so that processing continues as it would for > current non-parallel processing, which would most likely error-out > anyway along the current error-handling checks and paths when those > bad attributes/fields are referenced. > I will add some Asserts() and don't mind adding a WARNING message for > the 2nd case. > If you really feel strongly about this, I can just restore the > original code, which will stop dead with an ERROR in the middle of > parallel-safety checking should one of these rare conditions ever > occur. > I am expecting that either we raise a WARNING and return parallel_unsafe for all such checks (shouldn't reach cases) in the patch or simply raise an ERROR as we do in other parts of the patch. I personally prefer the latter alternative but I am fine with the former one as well. -- With Regards, Amit Kapila.
On Tue, Jan 19, 2021 at 10:32 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Tue, Jan 19, 2021 at 1:37 PM tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > > The table has ALREADY been locked (by the caller) during the > > > parse/analyze phase. > > > > Isn't there any case where planning is done but parse analysis is not done immediately before? e.g. > > > > * Alteration of objects invalidates cached query plans, and the next execution of the plan rebuilds it. (But it seemsthat parse analysis is done in this case in plancache.c.) > > > > * Execute a prepared statement with a different parameter value, which builds a new custom plan or a generic plan. > > > > I don't know, but since NoLock is used in other parts of the planner, > I'd expect those to fail if such cases existed. > I think I know how for both the above cases, we ensure that the locks are acquired before we reach the planner. It seems we will call GetCachedPlan during these scenarios which will acquire the required locks in RevalidateCachedQuery both when the cached plan is invalid and when it is valid. So, we should be fine even when the custom/generic plan needs to be formed due to a different parameter. > > Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a triggerwith a parallel-unsafe trigger action? > > > > Needs to be tested, > Yeah, it would be good to test it but I think even if the plan is invalidated, we will reacquire the required locks as mentioned above. Tsunakawa-San, does this address your concerns around locking the target relation in the required cases? It would be good to test but I don't see any problems in the scenarios you mentioned. -- With Regards, Amit Kapila.
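(A sketch of how this could be verified, with assumed object names; the expectation following from the above is that RevalidateCachedQuery re-runs parse analysis, re-acquires the locks and replans once the trigger is added:)

PREPARE ins AS INSERT INTO target_tbl SELECT * FROM src_tbl;
EXECUTE ins;    -- may be planned with a parallel SELECT (or parallel INSERT)

CREATE FUNCTION audit_row() RETURNS trigger
    LANGUAGE plpgsql PARALLEL UNSAFE AS
$$ BEGIN RETURN NEW; END; $$;

CREATE TRIGGER target_audit BEFORE INSERT ON target_tbl
    FOR EACH ROW EXECUTE FUNCTION audit_row();

EXECUTE ins;    -- the cached plan is invalidated; the rebuilt plan must now
                -- treat the INSERT as parallel-unsafe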
From: Amit Kapila <amit.kapila16@gmail.com> > Tsunakawa-San, does this address your concerns around locking the > target relation in the required cases? It would be good to test but I > don't see any problems in the scenarios you mentioned. Thank you, understood. RevalidateCachedQuery() does parse analysis, that's the trick. Regards Takayuki Tsunakawa
On Fri, Jan 8, 2021 at 8:25 PM vignesh C <vignesh21@gmail.com> wrote: > > > Few includes are not required: > #include "executor/nodeGather.h" > +#include "executor/nodeModifyTable.h" > #include "executor/nodeSubplan.h" > #include "executor/tqueue.h" > #include "miscadmin.h" > @@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags) > GatherState *gatherstate; > Plan *outerNode; > TupleDesc tupDesc; > + Index varno; > > This include is not required in nodeModifyTable.c > I think you meant nodeGather.c (not nodeModifyTable.c). However, the include file (executor/nodeModifyTable.h) is actually required here, otherwise there are build warnings. Regards, Greg Nancarrow Fujitsu Australia
From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> Execute EXPLAIN on Patched:
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=44.139..44.140 rows=0 loops=1)
>    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.007..0.201 rows=1000 loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5

> Execute EXPLAIN on non-Patched:
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=72.656..72.657 rows=0 loops=1)
>    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.010..0.175 rows=1000 loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5

I don't know if this is related to this issue, but I felt "shared hit=5" for the Seq Scan is strange. This test case reads 1,000 rows from 1,000 partitions, one row per partition, so I thought the shared hit should be 1,000 in the Seq Scan. I wonder if the 1,000 is included in the Insert node?

Regards
Takayuki Tsunakawa
On Fri, Jan 8, 2021 at 8:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > - if (pcxt->nworkers_launched > 0) > + if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader && > !isParallelModifyWithReturning)) > { > > I think this check could be simplified to if (pcxt->nworkers_launched > > 0 && isParallelModifyWithReturning) or something like that. > Not quite. The existing check is correct, because it needs to account for existing Parallel SELECT functionality (not just new Parallel INSERT). But I will re-write the test as an equivalent expression, so it's hopefully more readable (taking into account Antonin's suggested variable-name changes): if (pcxt->nworkers_launched > 0 && (!isModify || isModifyWithReturning)) > > > > @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan) > > PlannerGlobal *glob = root->glob; > > int rtoffset = list_length(glob->finalrtable); > > ListCell *lc; > > + Plan *finalPlan; > > > > /* > > * Add all the query's RTEs to the flattened rangetable. The live ones > > @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan) > > } > > > > /* Now fix the Plan tree */ > > - return set_plan_refs(root, plan, rtoffset); > > + finalPlan = set_plan_refs(root, plan, rtoffset); > > + if (finalPlan != NULL && IsA(finalPlan, Gather)) > > + { > > + Plan *subplan = outerPlan(finalPlan); > > + > > + if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL) > > + { > > + finalPlan->targetlist = copyObject(subplan->targetlist); > > + } > > + } > > + return finalPlan; > > } > > > > I'm not sure if the problem of missing targetlist should be handled here (BTW, > > NIL is the constant for an empty list, not NULL). Obviously this is a > > consequence of the fact that the ModifyTable node has no regular targetlist. > > > > I think it is better to add comments along with this change. In this > form, this looks quite hacky to me. > The targetlist on the ModifyTable node has been setup correctly, but it hasn't been propagated to the Gather node. Of course, I have previously tried to elegantly fix this, but struck various problems, using different approaches. Perhaps this code could just be moved into set_plan_refs(). For the moment, I'll just keep the current code, but I'll add a FIXME comment for this. I'll investigate Antonin's suggestions as a lower-priority side-task. Regards, Greg Nancarrow Fujitsu Australia
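(For context, this is the kind of plan shape the RETURNING handling is about - a sketch with assumed table names; the point is that the Gather node needs the ModifyTable target list so it can return the workers' RETURNING tuples:)

EXPLAIN (VERBOSE, COSTS OFF)
INSERT INTO target_tbl SELECT * FROM src_tbl RETURNING a;

--  Gather
--    Output: a                       <- target list copied from ModifyTable
--    Workers Planned: 4
--    ->  Parallel Insert on target_tbl
--          Output: a
--          ->  Parallel Seq Scan on src_tbl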
On Wed, Jan 20, 2021 at 3:27 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Jan 8, 2021 at 8:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I'm not sure if the problem of missing targetlist should be handled here (BTW, > > > NIL is the constant for an empty list, not NULL). Obviously this is a > > > consequence of the fact that the ModifyTable node has no regular targetlist. > > > > > > > I think it is better to add comments along with this change. In this > > form, this looks quite hacky to me. > > > > The targetlist on the ModifyTable node has been setup correctly, but > it hasn't been propagated to the Gather node. > Of course, I have previously tried to elegantly fix this, but struck > various problems, using different approaches. > Perhaps this code could just be moved into set_plan_refs(). > For the moment, I'll just keep the current code, but I'll add a FIXME > comment for this. > I'll investigate Antonin's suggestions as a lower-priority side-task. > +1, that sounds like a good approach because anyway this has to be dealt for the second patch and at this point, we should make the first patch ready. -- With Regards, Amit Kapila.
Thanks for the feedback. Posting an updated set of patches. Changes are based on feedback, as detailed below: There's a couple of potential issues currently being looked at: - locking issues in additional parallel-safety checks? - apparent uneven work distribution across the parallel workers, for large insert data [Antonin] - Fixed bad Assert in PrepareParallelMode() - Added missing comment to explain use of GetCurrentCommandId() in PrepareParallelMode() - Some variable name shortening in a few places - Created common function for creation of non-parallel and parallel ModifyTable paths - Path count variable changed to bool - Added FIXME comment to dubious code for creating Gather target-list from ModifyTable subplan - Fixed check on returningLists to use NIL instead of NULL [Amit] - Moved additional parallel-safety checks (for modify case) into max_parallel_hazard() - Removed redundant calls to max_parallel_hazard_test() - Added Asserts to "should never happen" null-attribute cases (and added WARNING log missing from one case) - Added comment for use of NoLock in max_parallel_hazard_for_modify() [Vignesh] - Fixed a couple of typos - Added a couple of test cases for testing that the same transaction is used by all parallel workers Regards, Greg Nancarrow Fujitsu Australia
>
> Thanks for the feedback.
> Posting an updated set of patches. Changes are based on feedback, as detailed
> below:

Hi

It seems there are some previous comments[1][2] not addressed in the current patch.
Just to make sure they're not missed.

[1] https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local

[2] https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com

Best regards,
houzj
Hi,
For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch:
is found from the additional parallel-safety checks, or from the existing
parallel-safety checks for SELECT that it currently performs.
existing and 'it currently performs' are redundant. You can omit 'that it currently performs'.
Minor. For index_expr_max_parallel_hazard_for_modify(),
+ if (keycol == 0)
+ {
+ /* Found an index expression */
You could check if keycol != 0 and continue with the loop. This would save some indentation.
+ if (index_expr_item == NULL) /* shouldn't happen */
+ {
+ elog(WARNING, "too few entries in indexprs list");
I think the warning should be an error, since there is an assertion ahead of the if statement.
+ Assert(!isnull);
+ if (isnull)
+ {
+ /*
+ * This shouldn't ever happen, but if it does, log a WARNING
+ * and return UNSAFE, rather than erroring out.
+ */
+ elog(WARNING, "null conbin for constraint %u", con->oid);
The above should be an error as well.
Cheers
Hi,
For v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch:
+ bool isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate), ModifyTableState);
Please wrap long line.
+ uint64 *processed_count_space;
If I read the code correctly, it seems it can be dropped (use pei->processed_count directly).
Cheers
On Thu, Jan 21, 2021 at 1:28 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch:
>
> is found from the additional parallel-safety checks, or from the existing
> parallel-safety checks for SELECT that it currently performs.
>
> existing and 'it currently performs' are redundant. You can omit 'that it currently performs'.
>

OK, but this is very minor.

> Minor. For index_expr_max_parallel_hazard_for_modify(),
>
> + if (keycol == 0)
> + {
> +   /* Found an index expression */
>
> You could check if keycol != 0 and continue with the loop. This would save some indentation.

Yes I know, but I don't really see any issue with the indentation (I'm using 4-space tabs).

> + if (index_expr_item == NULL)  /* shouldn't happen */
> + {
> +   elog(WARNING, "too few entries in indexprs list");
>
> I think the warning should be an error, since there is an assertion ahead of the if statement.
>

Assertions are normally for DEBUG builds, so the Assert would have no effect in a production (release) build.
Besides, as I have explained in my reply to previous feedback, the logging of a WARNING (rather than an ERROR) is intentional, because I want processing to continue (not stop) if ever this very rare condition was to occur - so that the issue can be dealt with by the current non-parallel processing (rather than stopping dead in the middle of the parallel-safety-checking code). For a DEBUG build, it is handy for the Assert to immediately alert us to the issue (which could really only be caused by a database corruption, not a bug in the code).
Note that Vignesh originally suggested changing it from "elog(ERROR,...)" to "elog(WARNING,...)", and I agree with him.

Regards,
Greg Nancarrow
Fujitsu Australia
On Thu, Jan 21, 2021 at 1:50 PM Zhihong Yu <zyu@yugabyte.com> wrote: > > For v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch: > > + bool isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate), ModifyTableState); > > Please wrap long line. > OK. I thought I ran pg_indent fairly recently, but maybe it chose not to wrap that line. > + uint64 *processed_count_space; > > If I read the code correctly, it seems it can be dropped (use pei->processed_count directly). > You're right. I'll change it in the next version. Regards, Greg Nancarrow Fujitsu Australia
On Thu, Jan 21, 2021 at 12:47 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > > Hi > > It seems there are some previous comments[1][2] not addressed in current patch. > Just to make sure it's not missed. > > [1] > https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local > > [2] > https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com > Thanks for alerting me to those, somehow I completely missed them, sorry about that. I'll be sure to investigate and address them in the next version of the patch I post. Regards, Greg Nancarrow Fujitsu Australia
On Wed, Dec 23, 2020 at 1:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > > Hi > > > > > > I may be wrong, and if I miss sth in previous mails, please give me some > > > hints. > > > > IMO, serial insertion with underlying parallel SELECT can be > > > > considered for foreign table or temporary table, as the insertions only > > > happened in the leader process. > > > > > > > > > > I don't think we support parallel scan for temporary tables. Can you please > > > try once both of these operations without Insert being involved? If you > > > are able to produce a parallel plan without Insert then we can see why it > > > is not supported with Insert. > > > > Sorry, may be I did not express it clearly, I actually means the case when insert's target(not in select part) tableis temporary. > > And you are right that parallel select is not enabled when temporary table is in select part. > > > > I think Select can be parallel for this case and we should support this case. > So I think we're saying that if the target table is a foreign table or temporary table, it can be regarded as PARALLEL_RESTRICTED, right? i.e. code-wise: /* - * We can't support table modification in parallel-mode if it's a foreign - * table/partition (no FDW API for supporting parallel access) or a + * We can't support table modification in a parallel worker if it's a + * foreign table/partition (no FDW API for supporting parallel access) or a * temporary table. */ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE || RelationUsesLocalBuffers(rel)) { - table_close(rel, lockmode); - context->max_hazard = PROPARALLEL_UNSAFE; - return true; + if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) + { + table_close(rel, lockmode); + return true; + } } Regards, Greg Nancarrow Fujitsu Australia
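(Expected behaviour under this change, as a sketch with assumed table names: the SELECT part can still be parallelised, while the INSERT into the temporary table runs only in the leader:)

CREATE TEMPORARY TABLE tmp_target (a int, b int);

EXPLAIN (COSTS OFF)
INSERT INTO tmp_target SELECT * FROM big_tbl;

--  Insert on tmp_target              <- leader only (PARALLEL_RESTRICTED)
--    ->  Gather
--          Workers Planned: 4
--          ->  Parallel Seq Scan on big_tbl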
On Mon, Dec 21, 2020 at 1:50 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> +
> + index_oid_list = RelationGetIndexList(rel);
> ...
>
> As mentioned in the comments of RelationGetIndexList:
>  * we return a copy of the list palloc'd in the caller's context. The caller
>  * may list_free() the returned list after scanning it.
>
> Shall we list_free(index_oid_list) at the end of the function?
> Just to avoid a potential memory leak.
>

I think that's a good idea, so I'll make that update in the next version of the patch.
I do notice, however, that there seem to be quite a few places in the Postgres code where RelationGetIndexList() is being called without a corresponding list_free() being called - obviously instead relying on it being deallocated when the caller's memory context is destroyed.

Regards,
Greg Nancarrow
Fujitsu Australia
On Thu, Jan 21, 2021 at 12:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Wed, Dec 23, 2020 at 1:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > > > > Hi > > > > > > > > I may be wrong, and if I miss sth in previous mails, please give me some > > > > hints. > > > > > IMO, serial insertion with underlying parallel SELECT can be > > > > > considered for foreign table or temporary table, as the insertions only > > > > happened in the leader process. > > > > > > > > > > > > > I don't think we support parallel scan for temporary tables. Can you please > > > > try once both of these operations without Insert being involved? If you > > > > are able to produce a parallel plan without Insert then we can see why it > > > > is not supported with Insert. > > > > > > Sorry, may be I did not express it clearly, I actually means the case when insert's target(not in select part) tableis temporary. > > > And you are right that parallel select is not enabled when temporary table is in select part. > > > > > > > I think Select can be parallel for this case and we should support this case. > > > > So I think we're saying that if the target table is a foreign table or > temporary table, it can be regarded as PARALLEL_RESTRICTED, right? > Yes. > i.e. code-wise: > > /* > - * We can't support table modification in parallel-mode if > it's a foreign > - * table/partition (no FDW API for supporting parallel access) or a > + * We can't support table modification in a parallel worker if it's a > + * foreign table/partition (no FDW API for supporting parallel > access) or a > * temporary table. > */ > if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE || > RelationUsesLocalBuffers(rel)) > { > - table_close(rel, lockmode); > - context->max_hazard = PROPARALLEL_UNSAFE; > - return true; > + if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) > + { > + table_close(rel, lockmode); > + return true; > + } > } > Yeah, these changes look correct to me. -- With Regards, Amit Kapila.
> > > > > I may be wrong, and if I miss sth in previous mails, please give me some hints.
> > > > > IMO, serial insertion with underlying parallel SELECT can be
> > > > > considered for foreign table or temporary table, as the insertions only
> > > > > happened in the leader process.
> > > >
> > > > I don't think we support parallel scan for temporary tables. Can
> > > > you please try once both of these operations without Insert being
> > > > involved? If you are able to produce a parallel plan without
> > > > Insert then we can see why it is not supported with Insert.
> > >
> > > Sorry, maybe I did not express it clearly, I actually mean the case when the insert's target (not in the select part) table is temporary.
> > > And you are right that parallel select is not enabled when a temporary table is in the select part.
> > >
> > > I think Select can be parallel for this case and we should support this case.
> >
> > So I think we're saying that if the target table is a foreign table or
> > temporary table, it can be regarded as PARALLEL_RESTRICTED, right?

Yes.

IMO, PARALLEL_RESTRICTED currently enables parallel select but disables parallel insert.
So, the INSERT only happens in the leader, which seems safe for inserting into a temporary/foreign table.

In addition, there are some other restrictions on parallel select which it seems could be removed:

1. Target table has a parallel-unsafe trigger, index expression, column default expression or check constraint
2. Target table is a partitioned table with a parallel-unsafe partition key expression or support function

If the Insert's target table is of a type listed above, is there some reason why we cannot support parallel select?
It seems only the leader will execute the trigger and key expression, which seems safe.
(If I miss something about it, please let me know)

Best regards,
houzj
> > So I think we're saying that if the target table is a foreign table or
> > temporary table, it can be regarded as PARALLEL_RESTRICTED, right?
>
> Yes.
>
> IMO, PARALLEL_RESTRICTED currently enables parallel select but disables
> parallel insert.
> So, the INSERT only happens in the leader, which seems safe for inserting
> into a temporary/foreign table.
>
> In addition, there are some other restrictions on parallel select which it
> seems could be removed:
>
> 1. Target table has a parallel-unsafe trigger, index expression, column
> default expression or check constraint
> 2. Target table is a partitioned table with a parallel-unsafe partition
> key expression or support function
>
> If the Insert's target table is of a type listed above, is there some reason
> why we cannot support parallel select?
> It seems only the leader will execute the trigger and key expression,
> which seems safe.
> (If I miss something about it, please let me know)

So sorry, please ignore the above; I had something wrong in my thinking.

Best regards,
houzj
> > > +
> > > + index_oid_list = RelationGetIndexList(rel);
> > > ...
> > >
> > > As mentioned in the comments of RelationGetIndexList:
> > >  * we return a copy of the list palloc'd in the caller's context. The caller
> > >  * may list_free() the returned list after scanning it.
> > >
> > > Shall we list_free(index_oid_list) at the end of the function?
> > > Just to avoid a potential memory leak.
> >
> > I think that's a good idea, so I'll make that update in the next version
> > of the patch.
> > I do notice, however, that there seem to be quite a few places in the Postgres
> > code where RelationGetIndexList() is being called without a corresponding
> > list_free() being called - obviously instead relying on it being deallocated
> > when the caller's memory context is destroyed.

Yes, it will be deallocated when the caller's memory context is destroyed.

Currently, the parallel-safety check checks each partition.
I am just a little worried that if there are lots of partitions here, it may cause high memory use.

And there is another place like this:

1.
+ conbin = TextDatumGetCString(val);
+ check_expr = stringToNode(conbin);

It seems we can free the conbin when it is no longer used (for the same reason as above).
What do you think?

Best regards,
houzj
On Fri, Jan 22, 2021 at 12:08 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > I think that's a good idea, so I'll make that update in the next version
> > of the patch.
> > I do notice, however, that there seem to be quite a few places in the Postgres
> > code where RelationGetIndexList() is being called without a corresponding
> > list_free() being called - obviously instead relying on it being deallocated
> > when the caller's memory context is destroyed.
>
> Yes, it will be deallocated when the caller's memory context is destroyed.
>
> Currently, the parallel-safety check checks each partition.
> I am just a little worried that if there are lots of partitions here, it may cause high memory use.
>
> And there is another place like this:
>
> 1.
> + conbin = TextDatumGetCString(val);
> + check_expr = stringToNode(conbin);
>
> It seems we can free the conbin when it is no longer used (for the same reason as above).
> What do you think?
>

Yes, I think you're right, we should pfree conbin after converting it to a Node, to minimize memory usage.
Again, it's interesting that existing Postgres code, when looping through all of the constraints, doesn't do this. Hmmm. I'm wondering if there is a performance reason behind this - avoiding multiple calls to pfree() and just relying on it all being deallocated in one hit, when the memory context is destroyed.
Anyway, perhaps the concern about many partitions and the recursive nature of these checks overrides that because, as you say, of the possible high memory usage.

Regards,
Greg Nancarrow
Fujitsu Australia
Hi

I took a look at the v12-0001 patch; here are some comments:

1.
+ /*
+  * Setup the context used in finding the max parallel-mode hazard.
+  */
+ Assert(initial_max_parallel_hazard == 0 ||
+        initial_max_parallel_hazard == PROPARALLEL_SAFE ||
+        initial_max_parallel_hazard == PROPARALLEL_RESTRICTED);
+ context.max_hazard = initial_max_parallel_hazard == 0 ?
+     PROPARALLEL_SAFE : initial_max_parallel_hazard;

I am not quite sure when "max_parallel_hazard == 0" will occur.
Does it mean the case where the max_parallel_hazard_context is not initialized?

2.
Some tiny code style suggestions

+ if (con->contype == CONSTRAINT_CHECK)
+ {
+   char    *conbin;
+   Datum    val;
+   bool     isnull;
+   Expr    *check_expr;

How about:

if (con->contype != CONSTRAINT_CHECK)
    continue;

3.
+ if (keycol == 0)
+ {
+   /* Found an index expression */
+
+   Node    *index_expr;

Like 2, how about:

if (keycol != 0)
    continue;

4.
+ ListCell   *index_expr_item = list_head(index_info->ii_Expressions);
...
+ index_expr = (Node *) lfirst(index_expr_item);
+ index_expr = (Node *) expression_planner((Expr *) index_expr);

It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions,
like the flow: BuildIndexInfo --> RelationGetIndexExpressions --> eval_const_expressions

So, IMO, we do not need to call expression_planner for the expr again.

And there seems to be another solution for this:

In the patch, we only use { ii_Expressions, ii_NumIndexAttrs, ii_IndexAttrNumbers } from the IndexInfo,
which it seems we can get from "Relation->rd_index".

Based on the above, maybe we do not need to call BuildIndexInfo to build the IndexInfo.
It can avoid some unnecessary cost.
And with this solution we do not need to remove expression_planner.

What do you think?

Best regards,
houzj
On Thu, Jan 21, 2021 at 7:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > i.e. code-wise: > > > > /* > > - * We can't support table modification in parallel-mode if > > it's a foreign > > - * table/partition (no FDW API for supporting parallel access) or a > > + * We can't support table modification in a parallel worker if it's a > > + * foreign table/partition (no FDW API for supporting parallel > > access) or a > > * temporary table. > > */ > > if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE || > > RelationUsesLocalBuffers(rel)) > > { > > - table_close(rel, lockmode); > > - context->max_hazard = PROPARALLEL_UNSAFE; > > - return true; > > + if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) > > + { > > + table_close(rel, lockmode); > > + return true; > > + } > > } > > > > Yeah, these changes look correct to me. > Unfortunately, this change results in a single test failure in the "with" tests when "force_parallel_mode=regress" is in effect. I have reproduced the problem, by extracting relevant SQL from those tests, as follows: CREATE TEMP TABLE bug6051 AS select i from generate_series(1,3) as i; SELECT * FROM bug6051; CREATE TEMP TABLE bug6051_2 (i int); CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD INSERT INTO bug6051_2 SELECT NEW.i; WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051 SELECT * FROM t1; ERROR: cannot delete tuples during a parallel operation Note that prior to the patch, all INSERTs were regarded as PARALLEL_UNSAFE, so this problem obviously didn't occur. I believe this INSERT should be regarded as PARALLEL_UNSAFE, because it contains a modifying CTE. However, for some reason, the INSERT is not regarded as having a modifying CTE, so instead of finding it PARALLEL_UNSAFE, it falls into the parallel-safety-checks and is found to be PARALLEL_RESTRICTED: The relevant code in standard_planner() is: if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster && (parse->commandType == CMD_SELECT || IsModifySupportedInParallelMode(parse->commandType)) && !parse->hasModifyingCTE && max_parallel_workers_per_gather > 0 && !IsParallelWorker()) { /* all the cheap tests pass, so scan the query tree */ glob->maxParallelHazard = max_parallel_hazard(parse); glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); } else { /* skip the query tree scan, just assume it's unsafe */ glob->maxParallelHazard = PROPARALLEL_UNSAFE; glob->parallelModeOK = false; } When I debugged this (transformWithClause()), the WITH clause was found to contain a modifying CTE and for the INSERT query->hasModifyingCTE was set true. But somehow in the re-writer code, this got lost. Bug? Ideas? Regards, Greg Nancarrow Fujitsu Australia
On Fri, Jan 22, 2021 at 1:16 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > Hi > > I took a look at v12-0001 patch, here are some comments: > > 1. > + /* > + * Setup the context used in finding the max parallel-mode hazard. > + */ > + Assert(initial_max_parallel_hazard == 0 || > + initial_max_parallel_hazard == PROPARALLEL_SAFE || > + initial_max_parallel_hazard == PROPARALLEL_RESTRICTED); > + context.max_hazard = initial_max_parallel_hazard == 0 ? > + PROPARALLEL_SAFE : initial_max_parallel_hazard; > > I am not quiet sure when will " max_parallel_hazard == 0" > Does it means the case max_parallel_hazard_context not initialized ? > That function doesn't accept a "max_parallel_hazard_context". It accepts an initial "max_parallel_hazard" value (char). The "0" value is just a way of specifying "use the default" (PROPARALLEL_SAFE). It is not currently used, since currently we just always pass the "context.max_parallel_hazard" value resulting from the previous parallel-safety checks for SELECT (and otherwise don't call that function anywhere else). > > 2. > Some tiny code style suggestions > > + if (con->contype == CONSTRAINT_CHECK) > + { > + char *conbin; > + Datum val; > + bool isnull; > + Expr *check_expr; > > How about : > > if (con->contype != CONSTRAINT_CHECK) > continue; > > 3. > + if (keycol == 0) > + { > + /* Found an index expression */ > + > + Node *index_expr; > > Like 2, how about: > > If (keycol != 0) > Continue; > This is really a programmer style preference (plenty of discussions on the internet about it), but it can be argued that use of "continue" here is not quite as clear as the explicit "if" condition, especially in this very simple one-condition case. I'm inclined to leave it as is. > > 4. > + ListCell *index_expr_item = list_head(index_info->ii_Expressions); > ... > + index_expr = (Node *) lfirst(index_expr_item); > + index_expr = (Node *) expression_planner((Expr *) index_expr); > > It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions, > Like the flow: BuildIndexInfo--> RelationGetIndexExpressions--> eval_const_expressions > > So, IMO, we do not need to call expression_planner for the expr again. > > > And there seems another solution for this: > > In the patch, We only use the { ii_Expressions , ii_NumIndexAttrs , ii_IndexAttrNumbers } from the IndexInfo, > which seems can get from "Relation-> rd_index". > > Based on above, May be we do not need to call BuildIndexInfo to build the IndexInfo. > It can avoid some unnecessary cost. > And in this solution we do not need to remove expression_planner. > OK, maybe this is a good idea, but I do recall trying to minimize this kind of processing before, but there were cases which broke it. Have you actually tried your idea and run all regression tests and verified that they passed? In any case, I'll look into it. Regards, Greg Nancarrow Fujitsu Australia
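(For reference, the calling pattern Greg describes - the SELECT-side result
seeding the INSERT-side checks - amounts to roughly the following sketch;
max_parallel_hazard_for_modify is an assumed name for the patch's top-level
INSERT-side check, and the exact call site is an assumption:

	/* First determine the max parallel-hazard of the underlying SELECT */
	char		max_hazard = max_parallel_hazard(parse);

	/* Then, for supported INSERTs, continue the scan from that value */
	if (IsModifySupportedInParallelMode(parse->commandType) &&
		max_hazard != PROPARALLEL_UNSAFE)
		max_hazard = max_parallel_hazard_for_modify(parse, max_hazard);

so the "0 means use-the-default" branch is currently never exercised.)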
On Fri, Jan 22, 2021 at 8:29 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Jan 21, 2021 at 7:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > i.e. code-wise:
> > >
> > > /*
> > > - * We can't support table modification in parallel-mode if
> > > it's a foreign
> > > - * table/partition (no FDW API for supporting parallel access) or a
> > > + * We can't support table modification in a parallel worker if it's a
> > > + * foreign table/partition (no FDW API for supporting parallel
> > > access) or a
> > > * temporary table.
> > > */
> > > if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
> > > RelationUsesLocalBuffers(rel))
> > > {
> > > - table_close(rel, lockmode);
> > > - context->max_hazard = PROPARALLEL_UNSAFE;
> > > - return true;
> > > + if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> > > + {
> > > + table_close(rel, lockmode);
> > > + return true;
> > > + }
> > > }
> > >
> >
> > Yeah, these changes look correct to me.
> >
>
> Unfortunately, this change results in a single test failure in the
> "with" tests when "force_parallel_mode=regress" is in effect.
>
> I have reproduced the problem, by extracting relevant SQL from those
> tests, as follows:
>
> CREATE TEMP TABLE bug6051 AS
> select i from generate_series(1,3) as i;
> SELECT * FROM bug6051;
> CREATE TEMP TABLE bug6051_2 (i int);
> CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD
> INSERT INTO bug6051_2
> SELECT NEW.i;
> WITH t1 AS ( DELETE FROM bug6051 RETURNING * )
> INSERT INTO bug6051 SELECT * FROM t1;
> ERROR: cannot delete tuples during a parallel operation
>
> Note that prior to the patch, all INSERTs were regarded as
> PARALLEL_UNSAFE, so this problem obviously didn't occur.
> I believe this INSERT should be regarded as PARALLEL_UNSAFE, because
> it contains a modifying CTE.
> However, for some reason, the INSERT is not regarded as having a
> modifying CTE, so instead of finding it PARALLEL_UNSAFE, it falls into
> the parallel-safety-checks and is found to be PARALLEL_RESTRICTED:
>
> The relevant code in standard_planner() is:
>
> if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> IsUnderPostmaster &&
> (parse->commandType == CMD_SELECT ||
> IsModifySupportedInParallelMode(parse->commandType)) &&
> !parse->hasModifyingCTE &&
> max_parallel_workers_per_gather > 0 &&
> !IsParallelWorker())
> {
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> }
> else
> {
> /* skip the query tree scan, just assume it's unsafe */
> glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> glob->parallelModeOK = false;
> }
>
> When I debugged this (transformWithClause()), the WITH clause was
> found to contain a modifying CTE and for the INSERT
> query->hasModifyingCTE was set true.
> But somehow in the re-writer code, this got lost.
> Bug?
> Ideas?
>

How does it behave when the table in the above test is a non-temp table
with your patch? If it leads to the same error then we can at least
conclude that this is a generic problem and nothing specific to temp
tables.

--
With Regards,
Amit Kapila.
On Fri, Jan 22, 2021 at 1:16 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > 4. > + ListCell *index_expr_item = list_head(index_info->ii_Expressions); > ... > + index_expr = (Node *) lfirst(index_expr_item); > + index_expr = (Node *) expression_planner((Expr *) index_expr); > > It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions, > Like the flow: BuildIndexInfo--> RelationGetIndexExpressions--> eval_const_expressions > > So, IMO, we do not need to call expression_planner for the expr again. > Thanks. You are right. I debugged it, and found that BuildIndexInfo--> RelationGetIndexExpressions executes the same expression evaluation code as expression_planner(). So I'll remove the redundant call to expression_planner() here. > > And there seems another solution for this: > > In the patch, We only use the { ii_Expressions , ii_NumIndexAttrs , ii_IndexAttrNumbers } from the IndexInfo, > which seems can get from "Relation-> rd_index". > > Based on above, May be we do not need to call BuildIndexInfo to build the IndexInfo. > It can avoid some unnecessary cost. > And in this solution we do not need to remove expression_planner. > Hmmm, when I debugged my simple test case, I found rel->rd_index was NULL, so it seems that the call to BuildIndexInfo is needed. (have I understood your suggestion correctly?) Regards, Greg Nancarrow Fujitsu Australia
On Fri, Jan 22, 2021 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > Unfortunately, this change results in a single test failure in the > > "with" tests when "force_parallel_mode=regress" is in effect. > > > > I have reproduced the problem, by extracting relevant SQL from those > > tests, as follows: > > > > CREATE TEMP TABLE bug6051 AS > > select i from generate_series(1,3) as i; > > SELECT * FROM bug6051; > > CREATE TEMP TABLE bug6051_2 (i int); > > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD > > INSERT INTO bug6051_2 > > SELECT NEW.i; > > WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) > > INSERT INTO bug6051 SELECT * FROM t1; > > ERROR: cannot delete tuples during a parallel operation > > > > Note that prior to the patch, all INSERTs were regarded as > > PARALLEL_UNSAFE, so this problem obviously didn't occur. > > I believe this INSERT should be regarded as PARALLEL_UNSAFE, because > > it contains a modifying CTE. > > However, for some reason, the INSERT is not regarded as having a > > modifying CTE, so instead of finding it PARALLEL_UNSAFE, it falls into > > the parallel-safety-checks and is found to be PARALLEL_RESTRICTED: > > > > The relevant code in standard_planner() is: > > > > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && > > IsUnderPostmaster && > > (parse->commandType == CMD_SELECT || > > IsModifySupportedInParallelMode(parse->commandType)) && > > !parse->hasModifyingCTE && > > max_parallel_workers_per_gather > 0 && > > !IsParallelWorker()) > > { > > /* all the cheap tests pass, so scan the query tree */ > > glob->maxParallelHazard = max_parallel_hazard(parse); > > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > > } > > else > > { > > /* skip the query tree scan, just assume it's unsafe */ > > glob->maxParallelHazard = PROPARALLEL_UNSAFE; > > glob->parallelModeOK = false; > > } > > > > When I debugged this (transformWithClause()), the WITH clause was > > found to contain a modifying CTE and for the INSERT > > query->hasModifyingCTE was set true. > > But somehow in the re-writer code, this got lost. > > Bug? > > Ideas? > > > > How it behaves when the table in the above test is a non-temp table > with your patch? If it leads to the same error then we can at least > conclude that this is a generic problem and nothing specific to temp > tables. > Oh, I don't believe that this has anything to do with TEMP tables - it's just that when I relaxed the parallel-safety level on TEMP tables, it exposed the CTE issue in this test case because it just happens to use a TEMP table. Having said that, when I changed that test code to not use a TEMP table, an Assert fired in the planner code and caused the backend to abort. It looks like I need to update the following Assert in the planner code (unchanged by the current patch) in order to test further - but this Assert only fired because the commandType was CMD_DELETE, which SHOULD have been excluded by the "hasModifyingCTE" test on the parent INSERT, which is what I'm saying is strangely NOT getting set. /* * Generate partial paths for final_rel, too, if outer query levels might * be able to make use of them. */ if (final_rel->consider_parallel && root->query_level > 1 && !limit_needed(parse)) { Assert(!parse->rowMarks && parse->commandType == CMD_SELECT); foreach(lc, current_rel->partial_pathlist) { Path *partial_path = (Path *) lfirst(lc); add_partial_path(final_rel, partial_path); } } Once the Assert above is changed to not test the commandType, the same error occurs as before. 
This appears to be some kind of bug in which hasModifyingCTE is not getting set, at least in this particular INSERT case - but in the current Postgres code it doesn't matter, because all INSERTs are considered parallel-unsafe and so won't be executed in parallel-mode anyway.
I notice that if I execute "SELECT * from t1" instead of "INSERT INTO bug6051 SELECT * from t1", then "hasModifyingCTE" is getting set to true for the query, and thus it is always considered parallel-unsafe.
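Just to illustrate the shape of a possible fix (purely a sketch - the
variable names "parsetree" and "rule_action" are assumptions, and this is
not necessarily where the actual fix belongs): when the rewriter
substitutes a rule action for the original query, the modifying-CTE flag
could be carried over from the original query:

	/* Preserve the modifying-CTE marking across query rewrite */
	if (parsetree->hasModifyingCTE)
		rule_action->hasModifyingCTE = true;

Regards,
Greg Nancarrow
Fujitsu Australia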
> > And there seems to be another solution for this:
> >
> > In the patch, we only use { ii_Expressions, ii_NumIndexAttrs,
> > ii_IndexAttrNumbers } from the IndexInfo, which it seems we can get from
> > "Relation->rd_index".
> >
> > Based on the above, maybe we do not need to call BuildIndexInfo to build
> > the IndexInfo.
> > It can avoid some unnecessary cost.
> > And in this solution we do not need to remove expression_planner.
> >
>
> Hmmm, when I debugged my simple test case, I found rel->rd_index was NULL,
> so it seems that the call to BuildIndexInfo is needed.
> (have I understood your suggestion correctly?)
>

Hi Greg,

Thanks for debugging this.

Maybe I missed something, but I am not sure about the case where rel->rd_index was NULL.
In function BuildIndexInfo, there does not seem to be a NULL-check for index->rd_index.
Like the following:
----
BuildIndexInfo(Relation index)
{
	IndexInfo  *ii;
	Form_pg_index indexStruct = index->rd_index;
	int			i;
	int			numAtts;

	/* check the number of keys, and copy attr numbers into the IndexInfo */
	numAtts = indexStruct->indnatts;
----

And the patch does not have a NULL-check for index->rd_index either.
So I thought we could assume index->rd_index is not null, but it seems I may have missed something ?

Can you please share the test case with me ?

I used the following code to replace the call of BuildIndexInfo,
and installcheck passed.

Example:
+		Form_pg_index indexStruct = index_rel->rd_index;
+		List	   *ii_Expressions = RelationGetIndexExpressions(index_rel);
+		int			ii_NumIndexAttrs = indexStruct->indnatts;
+		AttrNumber	ii_IndexAttrNumbers[INDEX_MAX_KEYS];

+		for (int i = 0; i < ii_NumIndexAttrs; i++)
+			ii_IndexAttrNumbers[i] = indexStruct->indkey.values[i];

Best regards,
houzj
On Fri, Jan 22, 2021 at 6:21 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > Hi greg, > > Thanks for debugging this. > > May be I missed something. I am not sure about the case when rel->rd_index was NULL. > Because, In function BuildIndexInfo, it seems does not have NULL-check for index->rd_index. > Like the following: > ---- > BuildIndexInfo(Relation index) > { > IndexInfo *ii; > Form_pg_index indexStruct = index->rd_index; > int i; > int numAtts; > > /* check the number of keys, and copy attr numbers into the IndexInfo */ > numAtts = indexStruct->indnatts; > ---- > > And the patch do not have NULL-check for index->rd_index too. > So I thought we can assume index->rd_index is not null, but it seems I may missed something ? > > Can you please share the test case with me ? > > I use the following code to replace the call of BuildIndexInfo. > And the installcheck passed. > > Example: > + Form_pg_index indexStruct = index_rel->rd_index; > + List *ii_Expressions = RelationGetIndexExpressions(index_rel); > + int ii_NumIndexAttrs = indexStruct->indnatts; > + AttrNumber ii_IndexAttrNumbers[INDEX_MAX_KEYS]; > > + for (int i = 0; i < ii_NumIndexAttrs; i++) > + ii_IndexAttrNumbers[i] = indexStruct->indkey.values[i]; Sorry, I was looking at rel->rd_index, not index_rel->rd_index, my fault. Your code looks OK. I've taken it and reduced some of the lines and got rid of the C99-only intermingled variable declarations (see https://www.postgresql.org/docs/13/source-conventions.html). The changes are below. The regression tests all pass, so should be OK (my test case was taken from insert_parallel regression tests). Thanks for your help. - Oid index_oid = lfirst_oid(lc); - Relation index_rel; - IndexInfo *index_info; + Relation index_rel; + Form_pg_index indexStruct; + List *ii_Expressions; + Oid index_oid = lfirst_oid(lc); index_rel = index_open(index_oid, lockmode); - index_info = BuildIndexInfo(index_rel); + indexStruct = index_rel->rd_index; + ii_Expressions = RelationGetIndexExpressions(index_rel); - if (index_info->ii_Expressions != NIL) + if (ii_Expressions != NIL) { int i; - ListCell *index_expr_item = list_head(index_info->ii_Expressions); + ListCell *index_expr_item = list_head(ii_Expressions); - for (i = 0; i < index_info->ii_NumIndexAttrs; i++) + for (i = 0; i < indexStruct->indnatts; i++) { - int keycol = index_info->ii_IndexAttrNumbers[i]; + int keycol = indexStruct->indkey.values[i]; if (keycol == 0) { @@ -912,7 +914,7 @@ index_expr_max_parallel_hazard_for_modify(Relation rel, return true; } - index_expr_item = lnext(index_info->ii_Expressions, index_expr_item); + index_expr_item = lnext(ii_Expressions, index_expr_item); } } Regards, Greg Nancarrow Fujitsu Australia
Hello Greg-san,

Initially, some minor comments:

(1)
- * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
- * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
- * backend writes into a completely new table. In the future, we can
- * extend it to allow workers to write into the table. However, to allow
- * parallel updates and deletes, we have to solve other problems,
- * especially around combo CIDs.)
+ * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
+ * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
+ * of now, only the leader backend writes into a completely new table. In

This can read "In the INSERT INTO...SELECT case, like the other existing cases, only the leader backend writes into a completely new table." The reality is that workers as well as the leader can write into an empty or non-empty table in parallel, isn't it?

(2)
 /*
  * RELATION_IS_LOCAL
- * If a rel is either temp or newly created in the current transaction,
- * it can be assumed to be accessible only to the current backend.
- * This is typically used to decide that we can skip acquiring locks.
+ * If a rel is temp, it can be assumed to be accessible only to the
+ * current backend. This is typically used to decide that we can
+ * skip acquiring locks.
  *
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_LOCAL(relation) \
- ((relation)->rd_islocaltemp || \
- (relation)->rd_createSubid != InvalidSubTransactionId)
+ ((relation)->rd_islocaltemp)

How is this correct? At least, this change would cause a transaction that creates a new relation to acquire an unnecessary lock. I'm not sure if that overhead is worth worrying about (perhaps not, I guess). But can we still check rd_createSubid in non-parallel mode? If we adopt the above change, the comments at call sites need modification - "new or temp relation" becomes "temp relations".

(3)
@@ -173,9 +175,11 @@ ExecSerializePlan(Plan *plan, EState *estate)
...
- pstmt->commandType = CMD_SELECT;
+ Assert(estate->es_plannedstmt->commandType == CMD_SELECT ||
+ IsModifySupportedInParallelMode(estate->es_plannedstmt->commandType));
+ pstmt->commandType = IsA(plan, ModifyTable) ? castNode(ModifyTable, plan)->operation : CMD_SELECT;

The last line can just be as follows, according to the Assert():

+ pstmt->commandType = estate->es_plannedstmt->commandType;

(4)
@@ -1527,7 +1528,9 @@ ExecutePlan(EState *estate,
 	estate->es_use_parallel_mode = use_parallel_mode;
 	if (use_parallel_mode)
 	{
- PrepareParallelMode(estate->es_plannedstmt->commandType);
+ bool isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate), ModifyTableState);
+
+ PrepareParallelMode(estate->es_plannedstmt->commandType, isParallelModifyLeader);
 		EnterParallelMode();
 	}

@@ -1021,12 +1039,25 @@ IsInParallelMode(void)
  * Prepare for entering parallel mode, based on command-type.
  */
 void
-PrepareParallelMode(CmdType commandType)
+PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
 {
 	if (IsModifySupportedInParallelMode(commandType))
 	{
 		Assert(!IsInParallelMode());

+ if (isParallelModifyLeader)
+ {
+ /*
+ * Set currentCommandIdUsed to true, to ensure that the current
+ * CommandId (which will be used by the parallel workers) won't
+ * change during this parallel operation, as starting new
+ * commands in parallel-mode is not currently supported.
+ * See related comments in GetCurrentCommandId and
+ * CommandCounterIncrement.
+ */
+ (void) GetCurrentCommandId(true);
+ }

I think we can eliminate the second argument of PrepareParallelMode() and the new code in ExecutePlan(). PrepareParallelMode() can use !IsParallelWorker() in the if condition, because the caller is either a would-be parallel leader or a parallel worker.

BTW, why do we want to add PrepareParallelMode() separately from EnterParallelMode()? Someone who will read other call sites of EnterParallelMode() (index build, VACUUM) may be worried that the PrepareParallelMode() call is missing there. Can we just add an argument to EnterParallelMode()? Other call sites can use CMD_UNKNOWN or CMD_UTILITY, if we want to use CMD_XX.

Regards
Takayuki Tsunakawa
On Fri, Jan 22, 2021 at 7:52 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>
> (1)
> - * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
> - * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
> - * backend writes into a completely new table. In the future, we can
> - * extend it to allow workers to write into the table. However, to allow
> - * parallel updates and deletes, we have to solve other problems,
> - * especially around combo CIDs.)
> + * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
> + * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
> + * of now, only the leader backend writes into a completely new table. In
>
> This can read "In the INSERT INTO...SELECT case, like the other existing cases, only the leader backend writes into a completely new table." The reality is that workers as well as the leader can write into an empty or non-empty table in parallel, isn't it?
>

Yes, you're right the wording is not right (and I don't really like the
wording used before the patch).

Perhaps it could say:

(Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
of now, other than in the case of INSERT INTO...SELECT, only the leader
backend writes into a completely new table. In the future, we can extend
it to allow workers for the other commands to write into the table.
However, to allow parallel updates and deletes, we have to solve other
problems, especially around combo CIDs.)

Of course, this will need further updating when parallel CREATE TABLE
AS etc. is implemented ...

>
> (2)
>  /*
>   * RELATION_IS_LOCAL
> - * If a rel is either temp or newly created in the current transaction,
> - * it can be assumed to be accessible only to the current backend.
> - * This is typically used to decide that we can skip acquiring locks.
> + * If a rel is temp, it can be assumed to be accessible only to the
> + * current backend. This is typically used to decide that we can
> + * skip acquiring locks.
>   *
>   * Beware of multiple eval of argument
>   */
>  #define RELATION_IS_LOCAL(relation) \
> - ((relation)->rd_islocaltemp || \
> - (relation)->rd_createSubid != InvalidSubTransactionId)
> + ((relation)->rd_islocaltemp)
>
> How is this correct? At least, this change would cause a transaction that creates a new relation to acquire an unnecessary lock. I'm not sure if that overhead is worth worrying about (perhaps not, I guess). But can we still check rd_createSubid in non-parallel mode? If we adopt the above change, the comments at call sites need modification - "new or temp relation" becomes "temp relations".
>

The problem is, with the introduction of parallel INSERT, it's no longer
the case that newly-created tables can't be accessed by anyone else in
the same transaction - now, a transaction can include parallel workers,
inserting into the table concurrently. Without changing that macro,
things fail with a very obscure message (e.g. ERROR: unexpected data
beyond EOF in block 5 of relation base/16384/16388) and it takes days to
debug what the cause of it is.
Maybe updating the macro to still check rd_createSubid in non-parallel
mode is a good idea - I'll need to try it. Other than that, each and
every usage of RELATION_IS_LOCAL would need to be closely examined, to
see if it could be within a parallel INSERT.

>
> (3)
> @@ -173,9 +175,11 @@ ExecSerializePlan(Plan *plan, EState *estate)
> ...
> - pstmt->commandType = CMD_SELECT;
> + Assert(estate->es_plannedstmt->commandType == CMD_SELECT ||
> + IsModifySupportedInParallelMode(estate->es_plannedstmt->commandType));
> + pstmt->commandType = IsA(plan, ModifyTable) ? castNode(ModifyTable, plan)->operation : CMD_SELECT;
>
> The last line can just be as follows, according to the Assert():
>
> + pstmt->commandType = estate->es_plannedstmt->commandType;
>

No, that's not right. I did that originally and it failed in some cases
(try changing it and then run the regression tests and you'll see).
The commandType of the es_plannedstmt might be CMD_INSERT but the one in
the plan might be CMD_SELECT (for the underlying SELECT).

>
> (4)
> @@ -1527,7 +1528,9 @@ ExecutePlan(EState *estate,
>  	estate->es_use_parallel_mode = use_parallel_mode;
>  	if (use_parallel_mode)
>  	{
> - PrepareParallelMode(estate->es_plannedstmt->commandType);
> + bool isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate), ModifyTableState);
> +
> + PrepareParallelMode(estate->es_plannedstmt->commandType, isParallelModifyLeader);
>  		EnterParallelMode();
>  	}
>
> @@ -1021,12 +1039,25 @@ IsInParallelMode(void)
>   * Prepare for entering parallel mode, based on command-type.
>   */
>  void
> -PrepareParallelMode(CmdType commandType)
> +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
>  {
>  	if (IsModifySupportedInParallelMode(commandType))
>  	{
>  		Assert(!IsInParallelMode());
>
> + if (isParallelModifyLeader)
> + {
> + /*
> + * Set currentCommandIdUsed to true, to ensure that the current
> + * CommandId (which will be used by the parallel workers) won't
> + * change during this parallel operation, as starting new
> + * commands in parallel-mode is not currently supported.
> + * See related comments in GetCurrentCommandId and
> + * CommandCounterIncrement.
> + */
> + (void) GetCurrentCommandId(true);
> + }
>
> I think we can eliminate the second argument of PrepareParallelMode() and the new code in ExecutePlan(). PrepareParallelMode() can use !IsParallelWorker() in the if condition, because the caller is either a would-be parallel leader or a parallel worker.

You could, but I'm not sure it would make the code easier to read,
especially for those who don't know !IsParallelWorker() means it's a
parallel leader.

>
> BTW, why do we want to add PrepareParallelMode() separately from EnterParallelMode()? Someone who will read other call sites of EnterParallelMode() (index build, VACUUM) may be worried that the PrepareParallelMode() call is missing there. Can we just add an argument to EnterParallelMode()? Other call sites can use CMD_UNKNOWN or CMD_UTILITY, if we want to use CMD_XX.
>

I really can't see a problem. PrepareParallelMode() is only needed prior
to execution of a parallel plan, so it's not needed for "other call
sites" using EnterParallelMode().
Perhaps the name can be changed to disassociate it from generic
EnterParallelMode() usage. So far, I've only thought of long names like:
PrepareParallelModePlanExec().
Ideas?
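As an aside, the "still check rd_createSubid in non-parallel mode" idea
from (2) would look something like this minimal sketch (untested, just to
show the shape of it):

	#define RELATION_IS_LOCAL(relation) \
		((relation)->rd_islocaltemp || \
		 (!IsInParallelMode() && \
		  (relation)->rd_createSubid != InvalidSubTransactionId))

Regards,
Greg Nancarrow
Fujitsu Australia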
Hello Greg-san,

Second group of comments (I'll reply to (1) - (4) later):

(5)
@@ -790,7 +790,8 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
...
- if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
+ if ((plannedstmt->commandType != CMD_SELECT &&
+ !IsModifySupportedInParallelMode(plannedstmt->commandType)) || plannedstmt->hasModifyingCTE)
 		PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
 }

Now that we're trying to allow parallel writes (INSERT), we should:

* use ExecCheckXactReadOnly() solely for checking read-only transactions, as the function name represents. That is, move the call to PreventCommandIfParallelMode() up to standard_ExecutorStart().

* Update the comment above the call to ExecCheckXactReadOnly().

(6)
@@ -764,6 +777,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
...
+ else
+ {
+ pei->processed_count = NULL;
+ }

The braces can be deleted.

(7)
@@ -1400,6 +1439,16 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 										true);
 	queryDesc = ExecParallelGetQueryDesc(toc, receiver, instrument_options);

+ Assert(queryDesc->operation == CMD_SELECT || IsModifySupportedInParallelMode(queryDesc->operation));
+ if (IsModifySupportedInParallelMode(queryDesc->operation))
+ {
+ /*
+ * Record that the CurrentCommandId is used, at the start of the
+ * parallel operation.
+ */
+ SetCurrentCommandIdUsedForWorker();
+ }
+
 	/* Setting debug_query_string for individual workers */
 	debug_query_string = queryDesc->sourceText;

@@ -765,12 +779,16 @@ GetCurrentCommandId(bool used)
 	if (used)
 	{
 		/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader. We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed if
+ * currentCommandIdUsed was already true at the start of the parallel
+ * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+ * setting currentCommandIdUsed because we have no provision for
+ * communicating this back to the leader. Once currentCommandIdUsed is
+ * set, the commandId used by leader and workers can't be changed,
+ * because CommandCounterIncrement() then prevents any attempted
+ * increment of the current commandId.
 		 */
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
 		currentCommandIdUsed = true;
 	}
 	return currentCommandId;

What happens without these changes? If this kind of change is really necessary, it seems more natural to pass currentCommandIdUsed together with currentCommandId through SerializeTransactionState() and StartParallelWorkerTransaction(), instead of the above changes.

As an aside, SetCurrentCommandIdUsed() in the comment should be SetCurrentCommandIdUsedForWorker().

(8)
+ /*
+ * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+ * the relation, and this would result in creation of new CommandIds
+ * on insert/update/delete and this isn't supported in a parallel
+ * worker (but is safe in the parallel leader).
+ */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_FK)
+ {
+ if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
+ return true;
+ }

Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs. See RI_FKey_check(), which is called in the RI_TRIGGER_FK case. In there, ri_PerformCheck() is called with the detectNewRows argument set to false, which causes CommandCounterIncrement() to not be called.

Plus, tables that have RI_TRIGGER_PK should allow parallel INSERT in a parallel-safe manner, because those triggers only fire for UPDATE and DELETE. So, for the future parallel UPDATE/DELETE support, the above check should be performed in the UPDATE and DELETE cases.

(In a data warehouse, fact tables, which store large amounts of historical data, typically have foreign keys to smaller dimension tables. Thus, it's important to allow parallel INSERTs on tables with foreign keys.)

Regards
Takayuki Tsunakawa
Hi,

After doing some tests to cover the code paths in PATCH 0001,
I have some suggestions for the 0002 testcases.

(1)
+ /* Check parallel-safety of any expressions in the partition key */
+ if (get_partition_col_attnum(pkey, i) == 0)
+ {
+ Node *check_expr = (Node *) lfirst(partexprs_item);
+
+ if (max_parallel_hazard_walker(check_expr, context))
+ {
+ table_close(rel, lockmode);
+ return true;
+ }

The testcase does not seem to cover the above code (i.e. test when the table has a parallel-unsafe expression in the partition key).
Personally, I use the following sql to cover this:

-----
create table partkey_unsafe_key_expr_t (a int4, b name) partition by range ((fullname_parallel_unsafe('',a::varchar)));
explain (costs off) insert into partkey_unsafe_key_expr_t select unique1, stringu1 from tenk1;
-----

(2)
I noticed that most of the testcases test both (parallel safe/unsafe/restricted).
But the index expression tests do not seem to cover parallel restricted.
How about adding a testcase like:

-----
create or replace function fullname_parallel_restricted(f text, l text) returns text as $$
begin
	return f || l;
end;
$$ language plpgsql immutable parallel restricted;

create table names4(index int, first_name text, last_name text);
create index names4_fullname_idx on names4 (fullname_parallel_restricted(first_name, last_name));

--
-- Test INSERT with parallel-restricted index expression
-- (should create a parallel plan)
--
explain (costs off) insert into names4 select * from names;
-----

(3)
+ /* Recursively check each partition ... */
+ pdesc = RelationGetPartitionDesc(rel);
+ for (i = 0; i < pdesc->nparts; i++)
+ {
+ if (rel_max_parallel_hazard_for_modify(pdesc->oids[i],
+ command_type,
+ context,
+ AccessShareLock))
+ {
+ table_close(rel, lockmode);
+ return true;
+ }
+ }

It seems we do not have a testcase to test this (e.g. some parallel-unsafe expression in a partition).
How about adding one testcase to test a parallel-unsafe partition ?

Best regards,
houzj
On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> Hello Greg-san,
>
>
> Second group of comments (I'll reply to (1) - (4) later):
>
>
> (5)
> @@ -790,7 +790,8 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
> ...
> - if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
> + if ((plannedstmt->commandType != CMD_SELECT &&
> + !IsModifySupportedInParallelMode(plannedstmt->commandType)) || plannedstmt->hasModifyingCTE)
> PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
> }
>
> Now that we're trying to allow parallel writes (INSERT), we should:
>
> * use ExecCheckXactReadOnly() solely for checking read-only transactions, as the function name represents. That is, move the call to PreventCommandIfParallelMode() up to standard_ExecutorStart().
>
> * Update the comment above the call to ExecCheckXactReadOnly().
>
>
Hmmm, I'm not so sure. The patch changes just make the existing test for calling PreventCommandIfParallelMode() a bit more restrictive, to exclude the Parallel INSERT case. The code previously wasn't just checking read-only transactions anyway, so it's not as if the patch has changed something fundamental in this function. And by moving the PreventCommandIfParallelMode() call to a higher level, you'd be changing the existing order of error-handling (as ExecCheckXactReadOnly() calls PreventCommandIfReadOnly() based on a few other range-table conditions, prior to testing whether to call PreventCommandIfParallelMode()). I don't want to introduce a bug by making the change that you're suggesting.
> (6)
> @@ -764,6 +777,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
> ...
> + else
> + {
> + pei->processed_count = NULL;
> + }
>
> The braces can be deleted.
>
Yes they can be deleted, and I guess I will, but for the record, I personally prefer the explicit brackets, even if just one line, because:
- if more code ever needs to be added to the else, you'll need to add brackets anyway (and newbies might add extra lines tabbed in, thinking it's part of the else block ...).
- I think it looks better and slightly easier to read, especially when there's a mix of cases with multiple code lines and single code lines
Of course, these kind of things could be debated forever, but I don't think it's such a big deal.
>
> (7)
> @@ -1400,6 +1439,16 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
> true);
> queryDesc = ExecParallelGetQueryDesc(toc, receiver, instrument_options);
>
> + Assert(queryDesc->operation == CMD_SELECT || IsModifySupportedInParallelMode(queryDesc->operation));
> + if (IsModifySupportedInParallelMode(queryDesc->operation))
> + {
> + /*
> + * Record that the CurrentCommandId is used, at the start of the
> + * parallel operation.
> + */
> + SetCurrentCommandIdUsedForWorker();
> + }
> +
> /* Setting debug_query_string for individual workers */
> debug_query_string = queryDesc->sourceText;
>
> @@ -765,12 +779,16 @@ GetCurrentCommandId(bool used)
> if (used)
> {
> /*
> - * Forbid setting currentCommandIdUsed in a parallel worker, because
> - * we have no provision for communicating this back to the leader. We
> - * could relax this restriction when currentCommandIdUsed was already
> - * true at the start of the parallel operation.
> + * If in a parallel worker, only allow setting currentCommandIdUsed if
> + * currentCommandIdUsed was already true at the start of the parallel
> + * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
> + * setting currentCommandIdUsed because we have no provision for
> + * communicating this back to the leader. Once currentCommandIdUsed is
> + * set, the commandId used by leader and workers can't be changed,
> + * because CommandCounterIncrement() then prevents any attempted
> + * increment of the current commandId.
> */
> - Assert(!IsParallelWorker());
> + Assert(!(IsParallelWorker() && !currentCommandIdUsed));
> currentCommandIdUsed = true;
> }
> return currentCommandId;
>
> What happens without these changes?
The change to the above comment explains why the change is needed.
Without these changes, a call in a parallel worker to GetCurrentCommandId() will result in an Assert being fired because (prior to the patch) currentCommandIdUsed is forbidden to be set in a parallel worker, and calling GetCurrentCommandId(true) (to signify the intent to use the returned CommandId to mark inserted/updated/deleted tuples) will result in currentCommandIdUsed being set to true.
So it is clear that this cannot remain the same, in order to support Parallel INSERT by workers.
So for each worker, the patch sets "currentCommandIdUsed" to true at the start of the parallel operation (using SetCurrentCommandIdUsedForWorker()), and the Assert condition in GetCurrentCommandId() is tweaked to fire if GetCurrentCommandId(true) is called in a parallel worker when currentCommandIdUsed is false.
To me, this makes perfect sense.
>If this kind of change is really necessary, it seems more natural to pass currentCommandIdUsed together with currentCommandId through SerializeTransactionState() and StartParallelWorkerTransaction(), instead of the above changes.
No, I don't agree with that. That approach doesn't sound right to me at all.
All the patch really changes is WHERE "currentCommandIdUsed" can be set for a parallel worker - now it is only allowed to be set to true at the start of the parallel operation for each worker, and the Assert (which is just a sanity check) is updated to ensure that, for workers, it can only be set true at that time. That's all it does. It's completely consistent with the old comment that said "We could relax this restriction when currentCommandIdUsed was already true at the start of the parallel operation" - that's what we are now doing with the patch.
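For reference, the worker-side function amounts to roughly the following
minimal sketch (treat the exact body as an assumption):

	/*
	 * Sketch: called once at the start of a parallel operation in each
	 * worker, before any GetCurrentCommandId(true) calls are made.
	 */
	void
	SetCurrentCommandIdUsedForWorker(void)
	{
		Assert(IsParallelWorker() && !currentCommandIdUsed &&
			   currentCommandId != InvalidCommandId);

		currentCommandIdUsed = true;
	}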
> As an aside, SetCurrentCommandIdUsed() in the comment should be SetCurrentCommandIdUsedForWorker().
>
Thanks, I'll fix that in the comments.
>
> (8)
> + /*
> + * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> + * the relation, and this would result in creation of new CommandIds
> + * on insert/update/delete and this isn't supported in a parallel
> + * worker (but is safe in the parallel leader).
> + */
> + trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> + if (trigtype == RI_TRIGGER_FK)
> + {
> + if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> + return true;
> + }
>
> Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs. See RI_FKey_check() which is called in RI_TRIGGER_FK case. In there, ri_PerformCheck() is called with the detectNewRows argument set to false, which causes CommandCounterIncrement() to not be called.
>
Hmmm, I'm not sure that you have read and interpreted the patch code correctly.
The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign key, and an insert into such a table will generate a new commandId (so we must avoid that, as we don't currently have the technology to support sharing of new command IDs across the participants in the parallel operation). This is what the code comment says; it does not say that such a trigger generates a new command ID.
See Amit's updated comment here: https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665
In addition, the 2nd patch has an explicit test case for this (testing insert into a table that has a FK).
If you have a test case that breaks the existing patch, please let me know.
Regards,
Greg Nancarrow
Fujitsu Australia
From: Greg Nancarrow <gregn4422@gmail.com>
> > (1)
> Yes, you're right the wording is not right (and I don't really like
> the wording used before the patch).
>
> Perhaps it could say:
>
> (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
> INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
> of now, other than in the case of INSERT INTO...SELECT, only the leader
> backend
> writes into a completely new table. In the future, we can extend it to
> allow workers for the
> other commands to write into the table. However, to allow parallel
> updates and deletes, we
> have to solve other problems, especially around combo CIDs.)

That looks good to me, thanks.

> > (4)
> You could, but I'm not sure it would make the code easier to read,
> especially for those who don't know !IsParallelWorker() means it's a
> parallel leader.
...
> I really can't see a problem. PrepareParallelMode() is only needed
> prior to execution of a parallel plan, so it's not needed for "other
> call sites" using EnterParallelMode().

My frank first impressions were (and are):

* Why do we have to call a separate function for preparation when the actual entering follows immediately? We can do the necessary preparation in the entering function.

* Those who read the parallel index build and parallel VACUUM code for the first time might be startled at the missing PrepareParallelMode() call: "Oh, EnterParallelMode() is called without preparation unlike the other site I saw the other day. Isn't this a bug?"

> Perhaps the name can be changed to disassociate it from generic
> EnterParallelMode() usage. So far, I've only thought of long names
> like: PrepareParallelModePlanExec().
> Ideas?

What PrepareParallelMode() handles is the XID and command ID, which are managed by the access/transam/ module and are not executor-specific. It's natural (or at least not unnatural) that EnterParallelMode() prepares them, because EnterParallelMode() is part of access/transam/.

Regards
Takayuki Tsunakawa
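(For concreteness, folding the preparation into EnterParallelMode() as
suggested might look roughly like the following sketch - the signature
change and placement are assumptions, not settled design:

	void
	EnterParallelMode(CmdType commandType)
	{
		TransactionState s = CurrentTransactionState;

		Assert(s->parallelModeLevel >= 0);

		/* Formerly PrepareParallelMode(): claim the current command ID */
		if (IsModifySupportedInParallelMode(commandType) &&
			!IsParallelWorker())
			(void) GetCurrentCommandId(true);

		++s->parallelModeLevel;
	}

Other call sites would then pass CMD_UNKNOWN or CMD_UTILITY, as mentioned
above.)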
On Mon, Jan 25, 2021 at 2:22 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> Hi,
>
> After doing some tests to cover the code paths in PATCH 0001,
> I have some suggestions for the 0002 testcases.
>
> (1)
> + /* Check parallel-safety of any expressions in the partition key */
> + if (get_partition_col_attnum(pkey, i) == 0)
> + {
> + Node *check_expr = (Node *) lfirst(partexprs_item);
> +
> + if (max_parallel_hazard_walker(check_expr, context))
> + {
> + table_close(rel, lockmode);
> + return true;
> + }
>
> The testcase does not seem to cover the above code (i.e. test when the table has a parallel-unsafe expression in the partition key).
> Personally, I use the following sql to cover this:
>
> -----
> create table partkey_unsafe_key_expr_t (a int4, b name) partition by range ((fullname_parallel_unsafe('',a::varchar)));
> explain (costs off) insert into partkey_unsafe_key_expr_t select unique1, stringu1 from tenk1;
> -----
Thanks. It looks like that test case was accidentally missed (since the comment said to test the index expressions, but it actually tested the support functions).
I'll update the test code (and comments) accordingly, using your suggestion.
> (2)
> I noticed that most of the testcases test both (parallel safe/unsafe/restricted).
> But the index expression tests do not seem to cover parallel restricted.
> How about adding a testcase like:
>
> -----
> create or replace function fullname_parallel_restricted(f text, l text) returns text as $$
> begin
> return f || l;
> end;
> $$ language plpgsql immutable parallel restricted;
>
> create table names4(index int, first_name text, last_name text);
> create index names4_fullname_idx on names4 (fullname_parallel_restricted(first_name, last_name));
>
> --
> -- Test INSERT with parallel-restricted index expression
> -- (should create a parallel plan)
> --
> explain (costs off) insert into names4 select * from names;
> -----
Thanks, looks like that test case is missing, I'll add it as you suggest.
> (3)
> + /* Recursively check each partition ... */
> + pdesc = RelationGetPartitionDesc(rel);
> + for (i = 0; i < pdesc->nparts; i++)
> + {
> + if (rel_max_parallel_hazard_for_modify(pdesc->oids[i],
> + command_type,
> + context,
> + AccessShareLock))
> + {
> + table_close(rel, lockmode);
> + return true;
> + }
> + }
>
> It seems we do not have a testcase to test this (e.g. some parallel-unsafe expression in a partition).
> How about adding one testcase to test a parallel-unsafe partition ?
OK, I'll have to create a more complex table to test those other potential parallel-safety issues of partitions (beyond what was tested before the recursive call, such as support functions and expressions in the index key), but since it's a recursive call, invoking code that's already been tested, I would not anticipate any problems.
Thanks,
Greg Nancarrow
Fujitsu Australia
On Mon, Jan 25, 2021 at 4:37 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> >
> > (4)
> > You could, but I'm not sure it would make the code easier to read,
> > especially for those who don't know !IsParallelWorker() means it's a
> > parallel leader.
> ...
> > I really can't see a problem. PrepareParallelMode() is only needed
> > prior to execution of a parallel plan, so it's not needed for "other
> > call sites" using EnterParallelMode().
>
> My frank first impressions were (and are):
>
> * Why do we have to call a separate function for preparation when the actual entering follows immediately? We can do the necessary preparation in the entering function.
>
> * Those who read the parallel index build and parallel VACUUM code for the first time might be startled at the missing PrepareParallelMode() call: "Oh, EnterParallelMode() is called without preparation unlike the other site I saw the other day. Isn't this a bug?"
>
>
> > Perhaps the name can be changed to disassociate it from generic
> > EnterParallelMode() usage. So far, I've only thought of long names
> > like: PrepareParallelModePlanExec().
> > Ideas?
>
> What PrepareParallelMode() handles is the XID and command ID, which are managed by the access/transam/ module and are not executor-specific. It's natural (or at least not unnatural) that EnterParallelMode() prepares them, because EnterParallelMode() is part of access/transam/.
>
>

EnterParallelMode() is part of a generic interface for execution of a
parallel operation, and EnterParallelMode() is called in several
different places to enter parallel mode prior to execution of different
parallel operations. At the moment it is assumed that EnterParallelMode()
just essentially sets a flag to prohibit certain unsafe operations when
doing the parallel operation.
If I move PrepareParallelMode() into EnterParallelMode() then I need to
pass in contextual information to distinguish who the caller is, and
possibly extra information needed by that caller - and change the
function call for each caller, and probably update the comments for each,
and in other places, etc. etc. I think that it just complicates things
doing this.
The other callers of EnterParallelMode() are obviously currently doing
their own "pre" parallel-mode code themselves, specific to whatever
parallel operation they are doing - but nobody has thought it necessary
to hook this code into EnterParallelMode().
I think the "PrepareParallelMode()" name can just be changed to something
specific to plan execution, so nobody gets confused with a name like
"PrepareParallelMode()", which as you point out sounds generic to all
callers of EnterParallelMode().

Regards,
Greg Nancarrow
Fujitsu Australia
On Fri, Jan 22, 2021 at 7:52 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>
> (1)
> - * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
> - * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
> - * backend writes into a completely new table. In the future, we can
> - * extend it to allow workers to write into the table. However, to allow
> - * parallel updates and deletes, we have to solve other problems,
> - * especially around combo CIDs.)
> + * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
> + * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
> + * of now, only the leader backend writes into a completely new table. In
>
> This can read "In the INSERT INTO...SELECT case, like the other existing cases, only the leader backend writes into a completely new table." The reality is that workers as well as the leader can write into an empty or non-empty table in parallel, isn't it?
>
>

Sorry, I've just realized that this is in reference to the 1st patch
(v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), which
implements parallel SELECT for INSERT. In that case, data is SELECTed in
parallel by the workers, but only INSERTed by the parallel leader. So the
patch comment is, in fact, correct.
In the 3rd patch (v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch),
which implements parallel INSERT, the wording of this comment is again
altered, to reflect the fact that parallel workers also write into the
table.

Regards,
Greg Nancarrow
Fujitsu Australia
Hi,

When reading the code of rel_max_parallel_hazard_for_modify in 0001,
I noticed that there are many places that call table_close().
Personally, it's a little confusing to me.

Do you think it's better to do the table_open/close outside of rel_max_parallel_hazard_for_modify ?

Like:

static bool rel_max_parallel_hazard_for_modify(Relation rel,
											   CmdType command_type,
											   max_parallel_hazard_context *context);
...
Relation relation = table_open(rte->relid, NoLock);
(void) rel_max_parallel_hazard_for_modify(relation, parse->commandType, &context);
table_close(relation, NoLock);

And it seems we would not need the lockmode param with the above definition.

Best regards,
houzj
On Mon, Jan 25, 2021 at 10:40 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> When reading the code of rel_max_parallel_hazard_for_modify in 0001,
> I noticed that there are many places that call table_close().
> Personally, it's a little confusing to me.
>
> Do you think it's better to do the table_open/close outside of rel_max_parallel_hazard_for_modify ?
>
> Like:
>
> static bool rel_max_parallel_hazard_for_modify(Relation rel,
>                                                CmdType command_type,
>                                                max_parallel_hazard_context *context);
> ...
> Relation relation = table_open(rte->relid, NoLock);
> (void) rel_max_parallel_hazard_for_modify(relation, parse->commandType, &context);
> table_close(relation, NoLock);
>
> And it seems we would not need the lockmode param with the above definition.
>

Yeah, the repeated cleanup at the point of return is a bit ugly. It could
be solved by changing the function to do the cleanup at a common return
point, but I agree with you that in this case it could simply be done
outside the function.
Thanks, I'll make that change.

Regards,
Greg Nancarrow
Fujitsu Australia
Hi,

I have an issue with the check of column default expressions.

+ if (command_type == CMD_INSERT)
+ {
+ /*
+ * Column default expressions for columns in the target-list are
+ * already being checked for parallel-safety in the
+ * max_parallel_hazard() scan of the query tree in standard_planner().
+ */
+
+ tupdesc = RelationGetDescr(rel);
+ for (attnum = 0; attnum < tupdesc->natts; attnum++)

IMO, max_parallel_hazard() only checks the parent table's default expressions,
but if the table has partitions and its partitions have their own default expressions,
max_parallel_hazard() does not seem to check those. And we do not seem to check them either.

I am not sure, should we allow parallel insert for this case ?

Example:
-------------------------
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;

create table origin(a int);
insert into origin values(generate_series(1,5000));

create or replace function bdefault_unsafe () returns int language plpgsql parallel unsafe as $$
begin
	RETURN 5;
end $$;

create table parttable1 (a int, b name) partition by range (a);
create table parttable1_1 partition of parttable1 for values from (0) to (5000);
create table parttable1_2 partition of parttable1 for values from (5000) to (10000);

alter table parttable1_1 ALTER COLUMN b SET DEFAULT bdefault_unsafe();

postgres=# explain insert into parttable1 select * from origin ;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Gather  (cost=0.00..41.92 rows=5865 width=0)
   Workers Planned: 3
   ->  Insert on parttable1  (cost=0.00..41.92 rows=0 width=0)
         ->  Parallel Seq Scan on origin  (cost=0.00..41.92 rows=1892 width=68)
(4 rows)

postgres=# explain insert into parttable1_1 select * from origin ;
                             QUERY PLAN
-------------------------------------------------------------------
 Insert on parttable1_1  (cost=0.00..1348.00 rows=0 width=0)
   ->  Seq Scan on origin  (cost=0.00..1348.00 rows=5000 width=68)
(2 rows)
-------------------------

Best regards,
houzj
From: Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
> IMO, max_parallel_hazard() only checks the parent table's default expressions,
> but if the table has partitions and its partitions have their own default expressions,
> max_parallel_hazard() does not seem to check those.
> And we do not seem to check them either.
>
> I am not sure, should we allow parallel insert for this case ?

I think we can allow parallel insert in this case, because the column value is determined according to the DEFAULT definition of the target table specified in the INSERT statement. This is described here:

https://www.postgresql.org/docs/devel/sql-createtable.html

"Defaults may be specified separately for each partition. But note that a partition's default value is not applied when inserting a tuple through a partitioned table."

So the parallel-unsafe function should not be called.

Regards
Takayuki Tsunakawa
> I think we can allow parallel insert in this case, because the column value > is determined according to the DEFAULT definition of the target table > specified in the INSERT statement. This is described here: > > https://www.postgresql.org/docs/devel/sql-createtable.html > > "Defaults may be specified separately for each partition. But note that > a partition's default value is not applied when inserting a tuple through > a partitioned table." > > So the parallel-unsafe function should not be called. Thanks for the explanation. I think you are right, I did miss it. Best regards, houzj
Hi,

When testing the patch with the following kinds of SQL:

---
Insert into part_table select 1;
Insert into part_table select generate_series(1,10000,1);
Insert into part_table select * from testfunc();
---

we usually use these statements to initialize a table or for testing purposes.

Personally, I think we do not need to do the parallel safety-checks for these cases, because there seems to be no chance of the SELECT part being considered for parallelism. I thought we aimed to not check the safety unless parallelism is possible, so I was wondering whether it is possible to avoid the checks in these cases.

I did a quick check of the code; an immature idea is to check whether there is an RTE_RELATION in the query, and if not, skip the safety-checks.

I am not sure whether it is worth doing that - any thoughts?

Best regards,
Houzj
On Wed, Jan 27, 2021 at 2:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> When testing the patch with the following kinds of SQL:
>
> ---
> Insert into part_table select 1;
> Insert into part_table select generate_series(1,10000,1);
> Insert into part_table select * from testfunc();
> ---
>
> we usually use these statements to initialize a table or for testing purposes.
>
> Personally, I think we do not need to do the parallel safety-checks for these cases,
> because there seems to be no chance of the SELECT part being considered for parallelism.
>
> I thought we aimed to not check the safety unless parallelism is possible,
> so I was wondering whether it is possible to avoid the checks in these cases.
>
> I did a quick check of the code; an immature idea is to check whether there is an
> RTE_RELATION in the query, and if not, skip the safety-checks.
>
> I am not sure whether it is worth doing that - any thoughts?
>

Yes, I think it's worth it.
It's surprising that there aren't really any optimizations for these with just the current Postgres parallel SELECT functionality (as there's currently no way to divide the work for these amongst the workers, even if the function/expression is parallel-safe).
For the additional parallel-safety checks for INSERT, currently we check that RTE_SUBQUERY is in the range-table. So I think we can additionally check that RTE_RELATION is in the subquery range-table (otherwise treat it as unsafe).
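To make that concrete, here is a rough sketch of the extra check (variable names are illustrative; the shape matches the code that ends up in the next patch version, quoted later in this thread):

    /*
     * Sketch only: having found an RTE_SUBQUERY entry in the range-table,
     * require the subquery itself to scan at least one relation, since
     * otherwise there is no scan work to divide among parallel workers.
     */
    bool        hasSubQueryOnRelation = false;
    ListCell   *lcSub;

    foreach(lcSub, rte->subquery->rtable)
    {
        RangeTblEntry *rteSub = lfirst_node(RangeTblEntry, lcSub);

        if (rteSub->rtekind == RTE_RELATION)
        {
            hasSubQueryOnRelation = true;
            break;
        }
    }

    if (!hasSubQueryOnRelation)
        return PROPARALLEL_UNSAFE;

Regards,
Greg Nancarrow
Fujitsu Australia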
Thanks for the feedback.

Posting an updated set of patches. Changes are based on feedback, as detailed below:

[Hou]
- Deallocate the list returned from RelationGetIndexList() using list_free() after finishing with it
- Regard foreign and temporary tables as parallel-restricted (instead of parallel-unsafe) for INSERT
- pfree() the conbin returned from TextDatumGetCString() after finishing with it
- Make parallel-safety checks of index expressions more efficient, removing some redundant function calls
- Add a few more test cases to cover certain untested parallel-safety check cases
- Remove repeated table_close() on return, by moving table_open() & table_close() to a higher level
- Reduce the INSERT parallel-safety checks required for some SQL, by noting that the subquery must operate on a relation (check for RTE_RELATION in the subquery range-table)

[Zhihong Yu]
- Minor change to patch comment
- Wrap long line
- Remove intermediate local variable

[Tsunakawa-san]
- Update RELATION_IS_LOCAL macro to reinstate a previously-removed check on the relation being newly created in the current transaction (and so assumed accessible only to the current backend), but for non-parallel-mode only (since now it may be accessible to parallel workers)
- Remove braces for one-line else
- Fix code comment
- Rename PrepareParallelMode() for plan execution, so that it's not misinterpreted as a general function for preparation of parallel-mode

[Misc]
- Fix bug in the query rewriter: hasModifyingCTE is not set in rewritten non-SELECT queries having a CTE

Regards,
Greg Nancarrow
Fujitsu Australia
Hi Greg,

Recently I have been evaluating the performance of this patch (1/28 V13).
I have found a regression in a test case doing a parallel insert with a bitmap heap scan: when the target table has a primary key or an index, the patched performance shows a 7%-19% decline compared to unpatched.

Could you please have a look at this?

I tried max_parallel_workers_per_gather=2/4/8, and I didn't tune other parameters (like GUCs or other parallel-enforcing parameters).

1. max_parallel_workers_per_gather=2 (default)
target_table        patched    master     %reg
------------------------------------------------------
without_PK_index     83.683    142.183    -41%
with_PK             382.824    321.101     19%
with_index          372.682    324.246     15%

2. max_parallel_workers_per_gather=4
target_table        patched    master     %reg
------------------------------------------------------
without_PK_index     73.189    141.879    -48%
with_PK             362.104    329.759     10%
with_index          372.237    333.718     12%

3. max_parallel_workers_per_gather=8 (also set max_parallel_workers=16, max_worker_processes=16)
target_table        patched    master     %reg
------------------------------------------------------
without_PK_index     75.072    146.100    -49%
with_PK             365.312    324.339     13%
with_index          362.636    338.366      7%

Attached test_bitmap.sql, which includes my test data and SQL, if you want to have a look.

Regards,
Tang
Hi,

When developing the reloption patch, I noticed some issues in the patch.

1).
> - Reduce the INSERT parallel-safety checks required for some SQL, by noting
>   that the subquery must operate on a relation (check for RTE_RELATION in
>   the subquery range-table)

+ foreach(lcSub, rte->subquery->rtable)
+ {
+     rteSub = lfirst_node(RangeTblEntry, lcSub);
+     if (rteSub->rtekind == RTE_RELATION)
+     {
+         hasSubQueryOnRelation = true;
+         break;
+     }
+ }

It seems we cannot search for RTE_RELATION only in the rtable, because an RTE_RELATION may exist elsewhere, for example:

---
--** explain insert into target select (select * from test);
In a subplan's subplan.

--** with cte as (select * from test) insert into target select * from cte;
In the query's ctelist.
---

Maybe we should use a walker function [1] to search the subquery and ctelist.

2).
+--
+-- Test INSERT into temporary table with underlying query.
+-- (should not use a parallel plan)
+--

Maybe the comment here needs some change, since we currently support a parallel plan for temp tables.

3).
Do you think we can add a testcase for foreign tables, to test a parallel query with serial insert on a foreign table?

[1]
static bool
relation_walker(Node *node)
{
    if (node == NULL)
        return false;

    else if (IsA(node, RangeTblEntry))
    {
        RangeTblEntry *rte = (RangeTblEntry *) node;

        if (rte->rtekind == RTE_RELATION)
            return true;

        return false;
    }

    else if (IsA(node, Query))
    {
        Query *query = (Query *) node;

        /* Recurse into subselects */
        return query_tree_walker(query, relation_walker,
                                 NULL, QTW_EXAMINE_RTES_BEFORE);
    }

    /* Recurse to check arguments */
    return expression_tree_walker(node,
                                  relation_walker,
                                  NULL);
}

Best regards,
houzj
On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> When developing the reloption patch, I noticed some issues in the patch.
>
> 1).
> > - Reduce the INSERT parallel-safety checks required for some SQL, by noting
> >   that the subquery must operate on a relation (check for RTE_RELATION in
> >   the subquery range-table)
>
> It seems we cannot search for RTE_RELATION only in the rtable,
> because an RTE_RELATION may exist elsewhere, for example:
>
> ---
> --** explain insert into target select (select * from test);
> In a subplan's subplan.
>
> --** with cte as (select * from test) insert into target select * from cte;
> In the query's ctelist.
> ---
>
> Maybe we should use a walker function [1] to
> search the subquery and ctelist.
>

Yes, the current checks are too simple; as you point out, there are more complex cases that they don't pick up. Unfortunately, expanding the testing for them does detract from the original intention of this code (which was to avoid extra parallel-safety check processing on queries that can't be run in parallel).
I guess the relation walker function should additionally check for SELECT queries only (commandType == CMD_SELECT), and exclude SELECT FOR UPDATE/SHARE (rowMarks != NIL) too - see the sketch below.
I'll need to look further into it, but will certainly update the code for the next version of the patch.

> 2).
>
> +--
> +-- Test INSERT into temporary table with underlying query.
> +-- (should not use a parallel plan)
> +--
>
> Maybe the comment here needs some change, since
> we currently support a parallel plan for temp tables.
>

Thanks, it should say something like "should create the plan with INSERT + parallel SELECT".

> 3).
> Do you think we can add a testcase for foreign tables,
> to test a parallel query with serial insert on a foreign table?
>

I have intended to do it, but as a lower-priority task.
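For reference, here is a rough, untested sketch of those two extra conditions added to the Query branch of the walker quoted above (whether the whole thing is cheap enough is exactly the open question):

    else if (IsA(node, Query))
    {
        Query *query = (Query *) node;

        /* Only ordinary SELECT subqueries are candidates for parallelism */
        if (query->commandType != CMD_SELECT)
            return false;

        /* SELECT FOR UPDATE/SHARE can't use parallelism anyway */
        if (query->rowMarks != NIL)
            return false;

        /* Recurse into subselects */
        return query_tree_walker(query, relation_walker,
                                 NULL, QTW_EXAMINE_RTES_BEFORE);
    }

Regards,
Greg Nancarrow
Fujitsu Australia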
On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> It seems we cannot search for RTE_RELATION only in the rtable,
> because an RTE_RELATION may exist elsewhere ...
>
> Maybe we should use a walker function [1] to
> search the subquery and ctelist.
> ...

I've had a further look at this, and this walker function is doing a lot of work recursing the parse tree, and I'm not sure that it reliably retrieves the information that we're looking for, for all cases of different SQL queries. Unless it can be made much more efficient and specific to our needs, I think we should not try to do this optimization, because there's too much overhead.
Also, keep in mind that for the current parallel SELECT functionality in Postgres, I don't see any similar optimization being attempted (and such optimization should be attempted at the SELECT level).
So I don't think we should be attempting such optimization in this patch (but it could be attempted in a separate patch, related just to the current parallel SELECT functionality).

Regards,
Greg Nancarrow
Fujitsu Australia
> I've had a further look at this, and this walker function is doing a lot
> of work recursing the parse tree, and I'm not sure that it reliably retrieves
> the information that we're looking for, for all cases of different SQL
> queries. Unless it can be made much more efficient and specific to our needs,
> I think we should not try to do this optimization, because there's too much
> overhead. Also, keep in mind that for the current parallel SELECT
> functionality in Postgres, I don't see any similar optimization being
> attempted (and such optimization should be attempted at the SELECT level).
> So I don't think we should be attempting such optimization in this patch
> (but it could be attempted in a separate patch, related just to the current
> parallel SELECT functionality).

Yes, agreed - I was worried about the overhead it may bring too, so we can remove this from the current patch.

Best regards,
houzj
On Tue, Feb 2, 2021 at 7:26 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Yes, agreed - I was worried about the overhead it may bring too,
> so we can remove this from the current patch.
>

Posting an updated set of patches. Changes are based on feedback, as detailed below:

[Hou]
- Corrected the code that tries to check for an underlying query on a relation (it must at least check for an underlying query, to rule out VALUES). More complex tests were not used, due to overhead.
- Fixed comment in tests for INSERT on temp table.

[Antonin]
- Moved the Gather node targetlist fix-up code to set_plan_refs() and updated the comment.
- Added extra tests for INSERT with RETURNING clause.

Hou: the parallel_dml patches will need slight rebasing.

Regards,
Greg Nancarrow
Fujitsu Australia
From: Amit Kapila <amit.kapila16@gmail.com>
> On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying <tanghy.fnst@cn.fujitsu.com> wrote:
> > Execute EXPLAIN on Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
> >                                                          QUERY PLAN
> > ------------------------------------------------------------------------------------------------------------------------
> >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=44.139..44.140 rows=0 loops=1)
> >    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> >          Output: test_data1.a, test_data1.b
> >          Buffers: shared hit=5
> >  Planning:
> >    Buffers: shared hit=27011
> >  Planning Time: 24.526 ms
> >  Execution Time: 44.981 ms
> >
> > Execute EXPLAIN on non-Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
> >                                                          QUERY PLAN
> > ------------------------------------------------------------------------------------------------------------------------
> >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=72.656..72.657 rows=0 loops=1)
> >    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> >          Output: test_data1.a, test_data1.b
> >          Buffers: shared hit=5
> >  Planning:
> >    Buffers: shared hit=72
> >  Planning Time: 0.135 ms
> >  Execution Time: 79.058 ms
> >
>
> So, the results indicate that after the patch we touch more buffers
> during planning, which I think is because of accessing the partition
> information, and during execution the patch touches fewer buffers for
> the same reason. But why can this reduce the time with the patch? I think
> this needs some investigation.

I guess another factor other than shared buffers is the relcache and catcache. The patched version loads those cached entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched version has to gradually build those cache entries during execution. How can we confirm this effect?

Regards
Takayuki Tsunakawa
On Thu, Feb 4, 2021 at 6:26 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> From: Amit Kapila <amit.kapila16@gmail.com>
> > So, the results indicate that after the patch we touch more buffers
> > during planning, which I think is because of accessing the partition
> > information, and during execution the patch touches fewer buffers for
> > the same reason. But why can this reduce the time with the patch? I think
> > this needs some investigation.
>
> I guess another factor other than shared buffers is the relcache and catcache.
> The patched version loads those cached entries for all partitions of the insert
> target table during the parallel-safety check in planning, while the unpatched
> version has to gradually build those cache entries during execution.
>

Right.

> How can we confirm this effect?
>

I am not sure, but if your theory is correct, then wouldn't both have the same performance in consecutive runs?

--
With Regards,
Amit Kapila.
On Thu, Feb 4, 2021 at 11:56 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> I guess another factor other than shared buffers is the relcache and catcache.
> The patched version loads those cached entries for all partitions of the insert
> target table during the parallel-safety check in planning, while the unpatched
> version has to gradually build those cache entries during execution.
> How can we confirm this effect?
>

I believe that we can confirm this effect by invalidating the relcache and catcache, in both the patched and unpatched versions, just after the parallel-safety checks are performed in the planner, and then running the tests and comparing the performance.
So that's exactly what I did (adding a call to InvalidateSystemCaches() just after the parallel-safety checks in the planner).
I found that the unpatched version then always performed better than the patched version for tests inserting 1000 records into a table with 100, 200, 500 and 1000 partitions.
Looking at the breakdown of the timing for each INSERT, the Planning Time was always significantly higher for the patched version (expected, because it does extra checks), but the Execution Time was very similar for both the patched and unpatched versions.
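For anyone wanting to reproduce the experiment, the hack amounts to something like the following in standard_planner() (placement is approximate and for testing only; InvalidateSystemCaches() is the same function that CLOBBER_CACHE_ALWAYS builds exercise):

    /* ... just after the parallel-safety determination ... */
    glob->maxParallelHazard = max_parallel_hazard(parse);
    glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);

    /*
     * Testing hack: discard the relcache/catcache entries built up by the
     * parallel-safety checks, so that patched and unpatched runs both
     * enter execution with cold caches.
     */
    InvalidateSystemCaches();

Regards,
Greg Nancarrow
Fujitsu Australia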
Hi,

I took a look into the hasModifyingCTE bugfix recently, and found a possible bug case without the parallel insert patch:

---------------------------------
drop table if exists test_data1;
create table test_data1(a int, b int) ;
insert into test_data1 select generate_series(1,1000), generate_series(1,1000);
set force_parallel_mode=on;

CREATE TEMP TABLE bug6051 AS
  select i from generate_series(1,3) as i;

SELECT * FROM bug6051;
CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as i from test_data1;

WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051 SELECT * FROM t1;

*******
***ERROR:  cannot assign XIDs during a parallel operation
*******
---------------------------------

I debugged it, and the parsetree did have a modifying CTE after the rewrite.
I think if we can properly set hasModifyingCTE, we can avoid this error by not considering parallelism for this case.

Thoughts?

Best regards,
houzj
On Fri, Feb 5, 2021 at 2:58 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> I took a look into the hasModifyingCTE bugfix recently, and found a possible
> bug case without the parallel insert patch:
>
> ---------------------------------
> drop table if exists test_data1;
> create table test_data1(a int, b int) ;
> insert into test_data1 select generate_series(1,1000), generate_series(1,1000);
> set force_parallel_mode=on;
>
> CREATE TEMP TABLE bug6051 AS
>   select i from generate_series(1,3) as i;
>
> SELECT * FROM bug6051;
> CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as i from test_data1;
>
> WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051 SELECT * FROM t1;
>
> *******
> ***ERROR:  cannot assign XIDs during a parallel operation
> *******
> ---------------------------------
>
> I debugged it, and the parsetree did have a modifying CTE after the rewrite.
> I think if we can properly set hasModifyingCTE, we can avoid this error by
> not considering parallelism for this case.
>

Thanks. You've identified that the bug exists for SELECT too. I've verified that the issue is fixed by the bugfix included in the Parallel INSERT patch.
Are you able to review my bugfix?
Since the problem exists for SELECT in the current Postgres code, I'd like to pull that bugfix out and provide it as a separate fix.
My concern is that there may well be a better way to fix the issue - for example, during the rewriting, rather than after the query has been rewritten.

Regards,
Greg Nancarrow
Fujitsu Australia
> > I took a look into the hasModifyingCTE bugfix recently, and found a
> > possible bug case without the parallel insert patch.
> > ...
> > I debugged it, and the parsetree did have a modifying CTE after the rewrite.
> > I think if we can properly set hasModifyingCTE, we can avoid this
> > error by not considering parallelism for this case.
>
> Thanks. You've identified that the bug exists for SELECT too. I've verified
> that the issue is fixed by the bugfix included in the Parallel INSERT patch.
> Are you able to review my bugfix?
> Since the problem exists for SELECT in the current Postgres code, I'd like
> to pull that bugfix out and provide it as a separate fix.
> My concern is that there may well be a better way to fix the issue - for
> example, during the rewriting, rather than after the query has been
> rewritten.

Hi,

I took a look at the fix and have some thoughts on it.
(Please correct me if you have already tried this idea and found something wrong with it.)

IMO, the main reason hasModifyingCTE ends up false is that the rewriter does not update hasModifyingCTE when copying the existing cteList into the rewritten query.

It seems there is only one place where the cteList is copied separately:

-------
static Query *
rewriteRuleAction(Query *parsetree,
...
		/* OK, it's safe to combine the CTE lists */
		sub_action->cteList = list_concat(sub_action->cteList,
										  copyObject(parsetree->cteList));
+		sub_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
--------

Based on the above, if we update hasModifyingCTE here, we may solve this problem.

And there is another point here: the sub_action may not be the final parsetree.
If the rule is defined like "DO INSTEAD insert into xx select xx from xx", the rewriter will put the cteList into a subquery in the parsetree's rtable.
In this case, we may also need to update the final parsetree too.
(I think you know this case - I found the same logic in the latest patch.)

--------
static Query *
rewriteRuleAction(Query *parsetree,
...
	if (sub_action_ptr)
+	{
		*sub_action_ptr = sub_action;
+		rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
+	}
--------

And the basic tests passed.
What do you think?

Best regards,
houzj
Hi,

While reviewing the v14 set of patches (will send my comments shortly), I too had some reservations about how 0001 decided to go about setting hasModifyingCTE.

On Fri, Feb 5, 2021 at 1:51 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Thanks. You've identified that the bug exists for SELECT too. I've verified
> > that the issue is fixed by the bugfix included in the Parallel INSERT patch.
> > Are you able to review my bugfix?
> > Since the problem exists for SELECT in the current Postgres code, I'd like
> > to pull that bugfix out and provide it as a separate fix.

+1, a separate patch for this seems better.

> > My concern is that there may well be a better way to fix the issue - for
> > example, during the rewriting, rather than after the query has been
> > rewritten.
>
> I took a look at the fix and have some thoughts on it.
>
> IMO, the main reason hasModifyingCTE ends up false is that the rewriter
> does not update hasModifyingCTE when copying the existing cteList into
> the rewritten query.
> ...
> Based on the above, if we update hasModifyingCTE here, we may solve this problem.
>
> And there is another point here: the sub_action may not be the final parsetree.
> ...
> And the basic tests passed.
> What do you think?
That is very close to what I was going to suggest, which is this:

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..3c4417af98 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
 			checkExprHasSubLink((Node *) rule_action->returningList);
 	}

+	rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
+
 	return rule_action;
 }

--
Amit Langote
EDB: http://www.enterprisedb.com
> > --------
> > static Query *
> > rewriteRuleAction(Query *parsetree,
> > ...
> > 	if (sub_action_ptr)
> > +	{
> > 		*sub_action_ptr = sub_action;
> > +		rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> > +	}
> > --------
> >
> > And the basic tests passed.
> > What do you think?
>
> That is very close to what I was going to suggest, which is this:
>
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> ...
> +	rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> +
> 	return rule_action;
> }

	if (parsetree->cteList != NIL && sub_action->commandType != CMD_UTILITY)
	{
		...
		sub_action->cteList = list_concat(sub_action->cteList,
	}

Is it possible for sub_action to be CMD_UTILITY here?
In that case the CTE will be copied to the new one; should we set the flag in that case?

Best regards,
houzj
> > Is it possible for sub_action to be CMD_UTILITY here?
> > In that case the CTE will be copied to the new one; should we set the flag
> > in that case?

Sorry, a typo in my words.
In that case the CTE will *not* be copied to the new one; should we set the flag in that case?

Best regards,
houzj
On Fri, Feb 5, 2021 at 4:25 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Sorry, a typo in my words.
> In that case the CTE will *not* be copied to the new one; should we set the
> flag in that case?
>

No, strictly speaking, we probably shouldn't, because the CTE wasn't copied in that case.
Also, I know the bitwise OR "works" in this case, but I think some will frown on the use of that for a bool.
IMHO it is better to use:

    if (parsetree->hasModifyingCTE)
        rule_action->hasModifyingCTE = true;

So the patch might be something like:

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..a989e02925 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -557,6 +557,8 @@ rewriteRuleAction(Query *parsetree,
 		/* OK, it's safe to combine the CTE lists */
 		sub_action->cteList = list_concat(sub_action->cteList,
 										  copyObject(parsetree->cteList));
+		if (parsetree->hasModifyingCTE)
+			sub_action->hasModifyingCTE = true;
 	}

 /*
@@ -594,6 +596,9 @@ rewriteRuleAction(Query *parsetree,
 		*sub_action_ptr = sub_action;
 	else
 		rule_action = sub_action;
+
+	if (parsetree->hasModifyingCTE)
+		sub_action->hasModifyingCTE = true;
 }

 /*

I'll do some further checks, because the rewriting is recursive and tricky, so I don't want to miss any cases ...

Regards,
Greg Nancarrow
Fujitsu Australia
On Fri, Feb 5, 2021 at 2:55 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 4:25 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Sorry, a typo in my words.
> > In that case the CTE will *not* be copied to the new one; should we set the
> > flag in that case?
>
> No, strictly speaking, we probably shouldn't, because the CTE wasn't
> copied in that case.

Right.

> Also, I know the bitwise OR "works" in this case, but I think some
> will frown on the use of that for a bool.
> IMHO it is better to use:
>
>     if (parsetree->hasModifyingCTE)
>         rule_action->hasModifyingCTE = true;
>
> So the patch might be something like:
> ...

That may be better.

BTW, the original query's cteList is copied into the sub_action query but not into rule_action, for reasons I haven't looked very closely into, even though we'd like to ultimately set the latter's hasModifyingCTE to reflect the original query's, right? So we should do the following at some point before returning:

    if (sub_action->hasModifyingCTE)
        rule_action->hasModifyingCTE = true;

> I'll do some further checks, because the rewriting is recursive and
> tricky, so I don't want to miss any cases ...

Always a good idea.

--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Feb 5, 2021 at 5:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> BTW, the original query's cteList is copied into the sub_action query but
> not into rule_action, for reasons I haven't looked very closely into,
> even though we'd like to ultimately set the latter's hasModifyingCTE
> to reflect the original query's, right? So we should do the following
> at some point before returning:
>
>     if (sub_action->hasModifyingCTE)
>         rule_action->hasModifyingCTE = true;
>

Actually, rule_action will usually point to sub_action (in which case there is no need to copy to rule_action), except if the rule action is an INSERT...SELECT, which seems to be handled by some "kludge", according to the following comment (and the KLUDGE ALERT comment in the function that is called):

	/*
	 * Adjust rule action and qual to offset its varnos, so that we can merge
	 * its rtable with the main parsetree's rtable.
	 *
	 * If the rule action is an INSERT...SELECT, the OLD/NEW rtable entries
	 * will be in the SELECT part, and we have to modify that rather than the
	 * top-level INSERT (kluge!).
	 */
	sub_action = getInsertSelectQuery(rule_action, &sub_action_ptr);

So in that case (sub_action_ptr != NULL), within rule_action there is a pointer to sub_action (the RTE for the subquery), so whenever sub_action is re-created, this pointer needs to be fixed up.
It looks like I might need to copy hasModifyingCTE back to rule_action in this case - but I'm not 100% sure of it yet - still checking that. All tests run so far pass without doing that, though.
This is one reason for my original approach (though I admit it was not optimal), because at least it was reliable and detected the modifying CTE after all the rewriting and kludgy code had finished.

Regards,
Greg Nancarrow
Fujitsu Australia
On Fri, Feb 5, 2021 at 4:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 5:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > BTW, the original query's cteList is copied into the sub_action query but
> > not into rule_action ...
>
> Actually, rule_action will usually point to sub_action (in which case
> there is no need to copy to rule_action), except if the rule action is an
> INSERT...SELECT, which seems to be handled by some "kludge" ...
>
> So in that case (sub_action_ptr != NULL), within rule_action there is
> a pointer to sub_action (the RTE for the subquery), so whenever sub_action
> is re-created, this pointer needs to be fixed up.
> It looks like I might need to copy hasModifyingCTE back to rule_action
> in this case - but I'm not 100% sure of it yet - still checking that. All
> tests run so far pass without doing that, though.

I guess we just don't have a test case where the rule_action query is actually parallelized, like the one that houzj shared a few emails ago.

> This is one reason for my original approach (though I admit it was
> not optimal), because at least it was reliable and detected the
> modifying CTE after all the rewriting and kludgy code had finished.

Yeah, it's hard to go through all of this highly recursive legacy code to be sure that hasModifyingCTE is consistent with reality in *all* cases, but let's try to do it. No other has* flags are set after-the-fact, so I wouldn't bet on a committer letting this one through.

--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Feb 5, 2021 at 8:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > This is one reason for my original approach (though I admit it was
> > not optimal), because at least it was reliable and detected the
> > modifying CTE after all the rewriting and kludgy code had finished.
>
> Yeah, it's hard to go through all of this highly recursive legacy code
> to be sure that hasModifyingCTE is consistent with reality in *all*
> cases, but let's try to do it. No other has* flags are set
> after-the-fact, so I wouldn't bet on a committer letting this one
> through.
>

I have debugged the code a bit more now, and the following patch seems to correctly fix the issue, at least for the known test cases (i.e. the SELECT case shared by houzj, and the INSERT...SELECT case, as in the "with" regression tests, for which I originally detected the issue).

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..8f695b32ec 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -557,6 +557,12 @@ rewriteRuleAction(Query *parsetree,
 		/* OK, it's safe to combine the CTE lists */
 		sub_action->cteList = list_concat(sub_action->cteList,
 										  copyObject(parsetree->cteList));
+		if (parsetree->hasModifyingCTE)
+		{
+			sub_action->hasModifyingCTE = true;
+			if (sub_action_ptr)
+				rule_action->hasModifyingCTE = true;
+		}
 	}

 /*

Regards,
Greg Nancarrow
Fujitsu Australia
On Fri, Feb 5, 2021 at 6:56 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> I have debugged the code a bit more now, and the following patch seems
> to correctly fix the issue, at least for the known test cases
> (i.e. the SELECT case shared by houzj, and the INSERT...SELECT case, as
> in the "with" regression tests, for which I originally detected the issue).
>
> +		if (parsetree->hasModifyingCTE)
> +		{
> +			sub_action->hasModifyingCTE = true;
> +			if (sub_action_ptr)
> +				rule_action->hasModifyingCTE = true;
> +		}

That seems good enough as far as I am concerned. Although either an Assert as follows or a comment on why the "if (sub_action_ptr)" is needed seems warranted:

    if (sub_action_ptr)
        rule_action->hasModifyingCTE = true;
    else
        Assert(sub_action == rule_action);

Does the Assert seem overly confident?

--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Feb 5, 2021 at 11:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> That seems good enough as far as I am concerned. Although either an
> Assert as follows or a comment on why the "if (sub_action_ptr)" is needed
> seems warranted:
>
>     if (sub_action_ptr)
>         rule_action->hasModifyingCTE = true;
>     else
>         Assert(sub_action == rule_action);
>
> Does the Assert seem overly confident?
>

No, the Assert is exactly right, and I'll add a comment too. See below.
I'll post the patch separately, if you can't see any further issues.

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..05b80bd347 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -557,6 +557,21 @@ rewriteRuleAction(Query *parsetree,
 		/* OK, it's safe to combine the CTE lists */
 		sub_action->cteList = list_concat(sub_action->cteList,
 										  copyObject(parsetree->cteList));
+
+		/*
+		 * If the hasModifyingCTE flag is set in the source parsetree from
+		 * which the CTE list is copied, the flag needs to be set in the
+		 * sub_action and, if applicable, in the rule_action (INSERT...SELECT
+		 * case).
+		 */
+		if (parsetree->hasModifyingCTE)
+		{
+			sub_action->hasModifyingCTE = true;
+			if (sub_action_ptr)
+				rule_action->hasModifyingCTE = true;
+			else
+				Assert(sub_action == rule_action);
+		}
 	}

 /*

Regards,
Greg Nancarrow
Fujitsu Australia
On Fri, Feb 5, 2021 at 11:01 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 11:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > That seems good enough as far as I am concerned. Although either an
> > Assert as follows or a comment on why the "if (sub_action_ptr)" is needed
> > seems warranted.
>
> No, the Assert is exactly right, and I'll add a comment too.
> See below.
> I'll post the patch separately, if you can't see any further issues.
> ...

LGTM, thank you.

--
Amit Langote
EDB: http://www.enterprisedb.com
Posting an updated set of patches.

The only update is to include an improved, but only temporary, fix for the query-rewriter hasModifyingCTE issue. (I separately posted a patch for this, but Tom Lane concluded that the issue is more complex than initially thought and no easy fix could be confidently attempted at this point in time, so nothing ended up getting pushed - the issue needs further investigation at another time.) See also the comment in patch 0001.

Regards,
Greg Nancarrow
Fujitsu Australia
> Posting an updated set of patches.

A minor comment about the doc.

+  <para>
+   Where the above target table features are determined to be, at worst,
+   parallel-restricted, rather than parallel-unsafe, at least a parallel table
+   scan may be used in the query plan for the <literal>INSERT</literal>
+   statement. For more information about Parallel Safety, see
+   <xref linkend="parallel-safety"/>.
+  </para>

It does not seem to mention that if the target table is a foreign or temporary table, a parallel table scan may still be used.

So how about:

+  <para>
+   Where the target table is a foreign or temporary table, or the above target table features
+   are determined to be, at worst, parallel-restricted, rather than parallel-unsafe,
+   at least a parallel table scan may be used in the query plan for the
+   <literal>INSERT</literal> statement. For more information about Parallel Safety,
+   see <xref linkend="parallel-safety"/>.
+  </para>

Best regards,
houzj
On Mon, Feb 1, 2021 at 7:20 PM Tang, Haiying <tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Greg,
>
> Recently I have been evaluating the performance of this patch (1/28 V13).
> I have found a regression in a test case doing a parallel insert with a bitmap heap scan:
> when the target table has a primary key or an index, the patched performance shows a
> 7%-19% decline compared to unpatched.
>
> Could you please have a look at this?
>
> I tried max_parallel_workers_per_gather=2/4/8, and I didn't tune other parameters
> (like GUCs or other parallel-enforcing parameters).
> ...
> Attached test_bitmap.sql, which includes my test data and SQL, if you want to have a look.
>

Hi,

Did it actually use a parallel plan in your testing?
When I ran these tests with the Parallel INSERT patch applied, it did not naturally choose a parallel plan for any of these cases.
So we can hardly blame parallel insert with bitmap heap scan for having worse performance when, based on costings, it doesn't actually choose to use a parallel plan in this case.

Regards,
Greg Nancarrow
Fujitsu Australia
On Mon, Feb 8, 2021 at 6:00 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > Posting an updated set of patches.
>
> A minor comment about the doc.
> ...
> It does not seem to mention that if the target table is a foreign or temporary table,
> a parallel table scan may still be used.
>

Thanks. You're right, I should probably update the docs to clarify those two cases.
(I had removed them from the list of parallel-unsafe things, but had not pointed out that a parallel table scan could still be used in these cases.)

Regards,
Greg Nancarrow
Fujitsu Australia
> Did it actually use a parallel plan in your testing?
> When I ran these tests with the Parallel INSERT patch applied, it did
> not naturally choose a parallel plan for any of these cases.

Yes, these cases pick a parallel plan naturally in my test environment.

postgres=# explain verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
 Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
   Workers Planned: 4
   ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0 width=0)
         ->  Parallel Bitmap Heap Scan on public.x1  (cost=3346.89..1272067.44 rows=20343 width=8)
               Output: x1.a, NULL::integer
               Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
               Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a > 199900000)))
               ->  BitmapOr  (cost=3346.89..3346.89 rows=178808 width=0)
                     ->  Bitmap Index Scan on x1_a_idx  (cost=0.00..1495.19 rows=80883 width=0)
                           Index Cond: (x1.a < 80000)
                     ->  Bitmap Index Scan on x1_a_idx  (cost=0.00..1811.01 rows=97925 width=0)
                           Index Cond: (x1.a > 199900000)

Attached (PSA) is my postgresql.conf file, in case you want to have a look. Besides that, I didn't do any parameter tuning in my test session.

Regards,
Tang
> > Did it actually use a parallel plan in your testing?
> > When I ran these tests with the Parallel INSERT patch applied, it did
> > not naturally choose a parallel plan for any of these cases.
>
> Yes, these cases pick a parallel plan naturally in my test environment.
> ...
> Attached (PSA) is my postgresql.conf file, in case you want to have a look.
> Besides that, I didn't do any parameter tuning in my test session.

I reproduced this on my machine.

I think we'd better do an "analyze" before the insert, which makes this easier to reproduce. Like:

-----
analyze;
explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
-----

Best regards,
houzj
Greg, all

Thanks a lot for your work on this.

On Mon, Feb 8, 2021 at 3:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> Posting an updated set of patches.

I've been looking at these patches, initially with an intention to review mainly any partitioning-related concerns, but I have some general thoughts as well, concerning mostly patches 0001 and 0002.

* I've seen review comments on this thread where I think it's been suggested that whatever max_parallel_hazard_for_modify() does had better be integrated into max_parallel_hazard(), such that there's no particular need for that function to exist. For example, the following:

+	/*
+	 * UPDATE is not currently supported in parallel-mode, so prohibit
+	 * INSERT...ON CONFLICT...DO UPDATE...
+	 * In order to support update, even if only in the leader, some
+	 * further work would need to be done. A mechanism would be needed
+	 * for sharing combo-cids between leader and workers during
+	 * parallel-mode, since for example, the leader might generate a
+	 * combo-cid and it needs to be propagated to the workers.
+	 */
+	if (parse->onConflict != NULL && parse->onConflict->action == ONCONFLICT_UPDATE)
+		return PROPARALLEL_UNSAFE;

could be placed in the following block in max_parallel_hazard():

	/*
	 * When we're first invoked on a completely unplanned tree, we must
	 * recurse into subqueries so as to locate parallel-unsafe constructs
	 * anywhere in the tree.
	 */
	else if (IsA(node, Query))
	{
		Query	   *query = (Query *) node;

		/* SELECT FOR UPDATE/SHARE must be treated as unsafe */
		if (query->rowMarks != NULL)
		{
			context->max_hazard = PROPARALLEL_UNSAFE;
			return true;
		}

Furthermore, the following:

+	rte = rt_fetch(parse->resultRelation, parse->rtable);
+
+	/*
+	 * The target table is already locked by the caller (this is done in the
+	 * parse/analyze phase).
+	 */
+	rel = table_open(rte->relid, NoLock);
+	(void) rel_max_parallel_hazard_for_modify(rel, parse->commandType, &context);
+	table_close(rel, NoLock);

can itself be wrapped in a function that's called from max_parallel_hazard() by adding a new block for RangeTblEntry nodes and passing QTW_EXAMINE_RTES_BEFORE to query_tree_walker().

That brings me to this part of the hunk:

+	/*
+	 * If there is no underlying SELECT, a parallel table-modification
+	 * operation is not possible (nor desirable).
+	 */
+	hasSubQuery = false;
+	foreach(lc, parse->rtable)
+	{
+		rte = lfirst_node(RangeTblEntry, lc);
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			hasSubQuery = true;
+			break;
+		}
+	}
+	if (!hasSubQuery)
+		return PROPARALLEL_UNSAFE;

The justification for this given in:

https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com

seems to be that the failure of a test case in the partition-concurrent-attach isolation suite is prevented if finding no subquery RTEs in the query is flagged as parallel-unsafe, which in turn stops max_parallel_hazard_for_modify() from locking partitions for safety checks in such cases. But it feels unprincipled to have this code work around a specific failing test case. I'd rather edit the failing test case to disable parallel execution, as Tsunakawa-san suggested.
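For illustration, here is a rough sketch of how the ON CONFLICT check from the first point above might look once moved into max_parallel_hazard()'s Query block (just a sketch; v15_delta.diff, attached below, contains the actual suggested changes, which may differ in detail):

	else if (IsA(node, Query))
	{
		Query	   *query = (Query *) node;

		/* SELECT FOR UPDATE/SHARE must be treated as unsafe */
		if (query->rowMarks != NULL)
		{
			context->max_hazard = PROPARALLEL_UNSAFE;
			return true;
		}

		/*
		 * Parallel UPDATE is not supported, so INSERT ... ON CONFLICT DO
		 * UPDATE must be treated as unsafe as well.
		 */
		if (query->commandType == CMD_INSERT &&
			query->onConflict != NULL &&
			query->onConflict->action == ONCONFLICT_UPDATE)
		{
			context->max_hazard = PROPARALLEL_UNSAFE;
			return true;
		}

		/* Recurse into subselects */
		return query_tree_walker(query,
								 max_parallel_hazard_walker,
								 context, 0);
	}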
* Regarding function names: +static bool trigger_max_parallel_hazard_for_modify(TriggerDesc *trigdesc, + max_parallel_hazard_context *context); +static bool index_expr_max_parallel_hazard_for_modify(Relation rel, + max_parallel_hazard_context *context); +static bool domain_max_parallel_hazard_for_modify(Oid typid, max_parallel_hazard_context *context); +static bool rel_max_parallel_hazard_for_modify(Relation rel, + CmdType command_type, + max_parallel_hazard_context *context) IMO, it would be better to name these target_rel_trigger_max_parallel_hazard(), target_rel_index_max_parallel_hazard(), etc. rather than have _for_modify at the end of these names to better connote that they check the parallel safety of applying the modify operation to a given target relation. Also, put these prototypes just below that of max_parallel_hazard() to have related things close by. Attached please see v15_delta.diff showing the changes suggested above. * I suspect that the following is broken in light of concurrent attachment of partitions. + + /* Recursively check each partition ... */ + pdesc = RelationGetPartitionDesc(rel); I think we'd need to use CreatePartitionDirectory() and retrieve the PartitionDesc using PartitionDirectoryLookup(). Something we already do when opening partitions for SELECT planning. * I think that the concerns raised by Tsunakawa-san in: https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com regarding how this interacts with plancache.c deserve a look. Specifically, a plan that uses parallel insert may fail to be invalidated when partitions are altered directly (that is without altering their root parent). That would be because we are not adding partition OIDs to PlannerGlobal.invalItems despite making a plan that's based on checking their properties. 
See this (tested with all patches applied!): create table rp (a int) partition by range (a); create table rp1 partition of rp for values from (minvalue) to (0); create table rp2 partition of rp for values from (0) to (maxvalue); create table foo (a) as select generate_series(1, 1000000); prepare q as insert into rp select * from foo where a%2 = 0; explain execute q; QUERY PLAN ------------------------------------------------------------------------------- Gather (cost=1000.00..13041.54 rows=5642 width=0) Workers Planned: 2 -> Insert on rp (cost=0.00..11477.34 rows=0 width=0) -> Parallel Seq Scan on foo (cost=0.00..11477.34 rows=2351 width=4) Filter: ((a % 2) = 0) (5 rows) -- create a parallel unsafe trigger (that's actually marked so) directly on a partition create or replace function make_table () returns trigger language plpgsql as $$ begin create table bar(); return null; end; $$ parallel unsafe; create trigger ai_rp2 after insert on rp2 for each row execute function make_table(); CREATE TRIGGER -- plan still parallel explain execute q; QUERY PLAN ------------------------------------------------------------------------------- Gather (cost=1000.00..13041.54 rows=5642 width=0) Workers Planned: 2 -> Insert on rp (cost=0.00..11477.34 rows=0 width=0) -> Parallel Seq Scan on foo (cost=0.00..11477.34 rows=2351 width=4) Filter: ((a % 2) = 0) (5 rows) -- and because it is execute q; ERROR: cannot start commands during a parallel operation CONTEXT: SQL statement "create table bar()" PL/pgSQL function make_table() line 1 at SQL statement -- OTOH, altering parent correctly discards the parallel plan create trigger ai_rp after insert on rp for each row execute function make_table(); explain execute q; QUERY PLAN ---------------------------------------------------------------- Insert on rp (cost=0.00..19425.00 rows=0 width=0) -> Seq Scan on foo (cost=0.00..19425.00 rows=5000 width=4) Filter: ((a % 2) = 0) (3 rows) It's fair to argue that it would rarely make sense to use PREPARE for bulk loads, but we need to tighten things up a bit here regardless. -- Amit Langote EDB: http://www.enterprisedb.com
On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote: > > Greg, all > > Thanks a lot for your work on this. > > On Mon, Feb 8, 2021 at 3:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > Posting an updated set of patches. > > I've been looking at these patches, initially with an intention to > review mainly any partitioning-related concerns, but have some general > thoughts as well concerning mostly the patches 0001 and 0002. > > * I've seen review comments on this thread where I think it's been > suggested that whatever max_parallel_hazard_for_modify() does had > better have been integrated into max_parallel_hazard() such that > there's no particular need for that function to exist. For example, > the following: > > + /* > + * UPDATE is not currently supported in parallel-mode, so prohibit > + * INSERT...ON CONFLICT...DO UPDATE... > + * In order to support update, even if only in the leader, some > + * further work would need to be done. A mechanism would be needed > + * for sharing combo-cids between leader and workers during > + * parallel-mode, since for example, the leader might generate a > + * combo-cid and it needs to be propagated to the workers. > + */ > + if (parse->onConflict != NULL && parse->onConflict->action == > ONCONFLICT_UPDATE) > + return PROPARALLEL_UNSAFE; > > could be placed in the following block in max_parallel_hazard(): > > /* > * When we're first invoked on a completely unplanned tree, we must > * recurse into subqueries so to as to locate parallel-unsafe constructs > * anywhere in the tree. > */ > else if (IsA(node, Query)) > { > Query *query = (Query *) node; > > /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ > if (query->rowMarks != NULL) > { > context->max_hazard = PROPARALLEL_UNSAFE; > return true; > } > > Furthermore, the following: > > + rte = rt_fetch(parse->resultRelation, parse->rtable); > + > + /* > + * The target table is already locked by the caller (this is done in the > + * parse/analyze phase). > + */ > + rel = table_open(rte->relid, NoLock); > + (void) rel_max_parallel_hazard_for_modify(rel, parse->commandType, > &context); > + table_close(rel, NoLock); > > can itself be wrapped in a function that's called from > max_parallel_hazard() by adding a new block for RangeTblEntry nodes > and passing QTW_EXAMINE_RTES_BEFORE to query_tree_walker(). > Thanks, I think those suggestions look good to me. > That brings me to to this part of the hunk: > > + /* > + * If there is no underlying SELECT, a parallel table-modification > + * operation is not possible (nor desirable). > + */ > + hasSubQuery = false; > + foreach(lc, parse->rtable) > + { > + rte = lfirst_node(RangeTblEntry, lc); > + if (rte->rtekind == RTE_SUBQUERY) > + { > + hasSubQuery = true; > + break; > + } > + } > + if (!hasSubQuery) > + return PROPARALLEL_UNSAFE; > > The justification for this given in: > > https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com > > seems to be that the failure of a test case in > partition-concurrent-attach isolation suite is prevented if finding no > subquery RTEs in the query is flagged as parallel unsafe, which in > turn stops max_parallel_hazard_modify() from locking partitions for > safety checks in such cases. But it feels unprincipled to have this > code to work around a specific test case that's failing. I'd rather > edit the failing test case to disable parallel execution as > Tsunakawa-san suggested. 
> The code was not changed because of the test case (though it was fortunate that the test case worked after the change). The code check that you have identified above ensures that the INSERT has an underlying SELECT, because the planner won't (and shouldn't anyway) generate a parallel plan for INSERT...VALUES, so there is no point doing any parallel-safety checks in this case. It just so happens that the problem test case uses INSERT...VALUES - and it shouldn't have triggered the parallel-safety checks for parallel INSERT for this case anyway, because INSERT...VALUES can't (and shouldn't) be parallelized. So I will need to keep that check in the code somewhere, to avoid overhead of parallel-safety checks in the case of INSERT...VALUES. > * Regarding function names: > > +static bool trigger_max_parallel_hazard_for_modify(TriggerDesc *trigdesc, > + > max_parallel_hazard_context *context); > +static bool index_expr_max_parallel_hazard_for_modify(Relation rel, > + > max_parallel_hazard_context *context); > +static bool domain_max_parallel_hazard_for_modify(Oid typid, > max_parallel_hazard_context *context); > +static bool rel_max_parallel_hazard_for_modify(Relation rel, > + CmdType command_type, > + > max_parallel_hazard_context *context) > > IMO, it would be better to name these > target_rel_trigger_max_parallel_hazard(), > target_rel_index_max_parallel_hazard(), etc. rather than have > _for_modify at the end of these names to better connote that they > check the parallel safety of applying the modify operation to a given > target relation. Also, put these prototypes just below that of > max_parallel_hazard() to have related things close by. > > Attached please see v15_delta.diff showing the changes suggested above. > OK, sounds reasonable. Thanks for the patch! > * I suspect that the following is broken in light of concurrent > attachment of partitions. > > + > + /* Recursively check each partition ... */ > + pdesc = RelationGetPartitionDesc(rel); > > I think we'd need to use CreatePartitionDirectory() and retrieve the > PartitionDesc using PartitionDirectoryLookup(). Something we already > do when opening partitions for SELECT planning. > > * I think that the concerns raised by Tsunakawa-san in: > > https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com > > regarding how this interacts with plancache.c deserve a look. > Specifically, a plan that uses parallel insert may fail to be > invalidated when partitions are altered directly (that is without > altering their root parent). That would be because we are not adding > partition OIDs to PlannerGlobal.invalItems despite making a plan > that's based on checking their properties. 
See this (tested with all > patches applied!): > > create table rp (a int) partition by range (a); > create table rp1 partition of rp for values from (minvalue) to (0); > create table rp2 partition of rp for values from (0) to (maxvalue); > create table foo (a) as select generate_series(1, 1000000); > prepare q as insert into rp select * from foo where a%2 = 0; > explain execute q; > QUERY PLAN > ------------------------------------------------------------------------------- > Gather (cost=1000.00..13041.54 rows=5642 width=0) > Workers Planned: 2 > -> Insert on rp (cost=0.00..11477.34 rows=0 width=0) > -> Parallel Seq Scan on foo (cost=0.00..11477.34 rows=2351 width=4) > Filter: ((a % 2) = 0) > (5 rows) > > -- create a parallel unsafe trigger (that's actually marked so) > directly on a partition > create or replace function make_table () returns trigger language > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > unsafe; > create trigger ai_rp2 after insert on rp2 for each row execute > function make_table();CREATE TRIGGER > > -- plan still parallel > explain execute q; > QUERY PLAN > ------------------------------------------------------------------------------- > Gather (cost=1000.00..13041.54 rows=5642 width=0) > Workers Planned: 2 > -> Insert on rp (cost=0.00..11477.34 rows=0 width=0) > -> Parallel Seq Scan on foo (cost=0.00..11477.34 rows=2351 width=4) > Filter: ((a % 2) = 0) > (5 rows) > > -- and because it is > execute q; > ERROR: cannot start commands during a parallel operation > CONTEXT: SQL statement "create table bar()" > PL/pgSQL function make_table() line 1 at SQL statement > > -- OTOH, altering parent correctly discards the parallel plan > create trigger ai_rp after insert on rp for each row execute function > make_table(); > explain execute q; > QUERY PLAN > ---------------------------------------------------------------- > Insert on rp (cost=0.00..19425.00 rows=0 width=0) > -> Seq Scan on foo (cost=0.00..19425.00 rows=5000 width=4) > Filter: ((a % 2) = 0) > (3 rows) > > It's fair to argue that it would rarely make sense to use PREPARE for > bulk loads, but we need to tighten things up a bit here regardless. > > Thanks, looks like you've identified some definite issues in the partition support and some missing test cases to help detect them. I'll look into it. Thanks very much for your review of this and suggestions. Regards, Greg Nancarrow Fujitsu Australia
On Mon, Feb 8, 2021 at 8:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > > Did it actually use a parallel plan in your testing? > > > When I ran these tests with the Parallel INSERT patch applied, it did > > > not naturally choose a parallel plan for any of these cases. > > > > Yes, these cases pick parallel plan naturally on my test environment. > > > > postgres=# explain verbose insert into testscan select a from x where > > a<80000 or (a%2=0 and a>199900000); > > QUERY PLAN > > ---------------------------------------------------------------------- > > ----------------------------- > > Gather (cost=4346.89..1281204.64 rows=81372 width=0) > > Workers Planned: 4 > > -> Insert on public.testscan (cost=3346.89..1272067.44 rows=0 > > width=0) > > -> Parallel Bitmap Heap Scan on public.x1 > > (cost=3346.89..1272067.44 rows=20343 width=8) > > Output: x1.a, NULL::integer > > Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000)) > > Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a > > > 199900000))) > > -> BitmapOr (cost=3346.89..3346.89 rows=178808 > > width=0) > > -> Bitmap Index Scan on x1_a_idx > > (cost=0.00..1495.19 rows=80883 width=0) > > Index Cond: (x1.a < 80000) > > -> Bitmap Index Scan on x1_a_idx > > (cost=0.00..1811.01 rows=97925 width=0) > > Index Cond: (x1.a > 199900000) > > > > PSA is my postgresql.conf file, maybe you can have a look. Besides, I didn't > > do any parameters tuning in my test session. > > I reproduced this on my machine. > > I think we'd better do "analyze" before the insert, which helps reproduce this more easily. > Like: > > ----- > analyze; > explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000); > ----- OK then. Can you check whether, if just the underlying SELECT is run (without the INSERT), there is any performance degradation compared to a non-parallel scan? Regards, Greg Nancarrow Fujitsu Australia
> > > postgres=# explain verbose insert into testscan select a from x > > > where > > > a<80000 or (a%2=0 and a>199900000); > > > QUERY PLAN > > > > -------------------------------------------------------------------- > > > -- > > > ----------------------------- > > > Gather (cost=4346.89..1281204.64 rows=81372 width=0) > > > Workers Planned: 4 > > > -> Insert on public.testscan (cost=3346.89..1272067.44 rows=0 > > > width=0) > > > -> Parallel Bitmap Heap Scan on public.x1 > > > (cost=3346.89..1272067.44 rows=20343 width=8) > > > Output: x1.a, NULL::integer > > > Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000)) > > > Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND > > > (x1.a > > > > 199900000))) > > > -> BitmapOr (cost=3346.89..3346.89 rows=178808 > > > width=0) > > > -> Bitmap Index Scan on x1_a_idx > > > (cost=0.00..1495.19 rows=80883 width=0) > > > Index Cond: (x1.a < 80000) > > > -> Bitmap Index Scan on x1_a_idx > > > (cost=0.00..1811.01 rows=97925 width=0) > > > Index Cond: (x1.a > 199900000) > > > > > > PSA is my postgresql.conf file, maybe you can have a look. Besides, > > > I didn't do any parameters tuning in my test session. > > > > I reproduced this on my machine. > > > > I think we'd better do "analyze" before the insert, which helps reproduce this > > more easily. > > Like: > > > > ----- > > analyze; > > explain analyze verbose insert into testscan select a from x where > > a<80000 or (a%2=0 and a>199900000); > > ----- > > OK then. > Can you check whether, if just the underlying SELECT is run (without the INSERT), > there is any performance degradation compared to a non-parallel scan? It seems there is no performance degradation without the insert. What I have found so far is this: with tang's conf, a parallel insert generates more WAL records than a serial insert (IMO, this is the main reason for the performance degradation). See the attachment for the plan info. I tried altering the target table to unlogged, and then the performance degradation no longer occurs. The additional WAL records seem related to the index on the target table. If the target table does not have any index, the WAL record count is the same for the parallel plan and the serial plan, and there is no performance degradation without the index either. I am still looking at this problem; if anyone has thoughts about it, I'd be grateful if you could share them. Best regards, houzj
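For reference, the with/without-index comparison described above can be driven directly from EXPLAIN's WAL option; a minimal sketch, reusing the source table and predicate from the earlier examples (testscan_noidx is a hypothetical copy of the target table, assumed here to have two int columns like testscan, but no index):
-----
-- hypothetical copy of the target table, minus the index
create table testscan_noidx (a int, b int);
explain (analyze, verbose, buffers, wal)
insert into testscan_noidx select a from x where a<80000 or (a%2=0 and a>199900000);
-- compare the reported "WAL: records=... bytes=..." line with the indexed runs
-----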
On Tue, Feb 9, 2021 at 5:49 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > > > postgres=# explain verbose insert into testscan select a from x > > > > where > > > > a<80000 or (a%2=0 and a>199900000); > > > > QUERY PLAN > > > > > > -------------------------------------------------------------------- > > > > -- > > > > ----------------------------- > > > > Gather (cost=4346.89..1281204.64 rows=81372 width=0) > > > > Workers Planned: 4 > > > > -> Insert on public.testscan (cost=3346.89..1272067.44 rows=0 > > > > width=0) > > > > -> Parallel Bitmap Heap Scan on public.x1 > > > > (cost=3346.89..1272067.44 rows=20343 width=8) > > > > Output: x1.a, NULL::integer > > > > Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000)) > > > > Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND > > > > (x1.a > > > > > 199900000))) > > > > -> BitmapOr (cost=3346.89..3346.89 rows=178808 > > > > width=0) > > > > -> Bitmap Index Scan on x1_a_idx > > > > (cost=0.00..1495.19 rows=80883 width=0) > > > > Index Cond: (x1.a < 80000) > > > > -> Bitmap Index Scan on x1_a_idx > > > > (cost=0.00..1811.01 rows=97925 width=0) > > > > Index Cond: (x1.a > 199900000) > > > > > > > > PSA is my postgresql.conf file, maybe you can have a look. Besides, > > > > I didn't do any parameters tuning in my test session. > > > > > > I reproduced this on my machine. > > > > > > I think we'd better do "analyze" before insert which helps reproduce this > > easier. > > > Like: > > > > > > ----- > > > analyze; > > > explain analyze verbose insert into testscan select a from x where > > > a<80000 or (a%2=0 and a>199900000); > > > ----- > > > > OK then. > > Can you check if just the underlying SELECTs are run (without INSERT), is > > there any performance degradation when compared to a non-parallel scan? > > It seems there is no performance degradation without insert. > > Till now, what I found is that: > With tang's conf, when doing parallel insert, the walrecord is more than serial insert > (IMO, this is the main reason why it has performance degradation) > See the attatchment for the plan info. > > I have tried alter the target table to unlogged and > then the performance degradation will not happen any more. > > And the additional walrecord seems related to the index on the target table. > I think you might want to see which exact WAL records are extra by using pg_waldump? -- With Regards, Amit Kapila.
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com> > What I have found so far is this: > with tang's conf, a parallel insert generates more WAL records than a serial > insert (IMO, this is the main reason for the performance degradation). See > the attachment for the plan info. > > I tried altering the target table to unlogged, and then the performance > degradation no longer occurs. > > The additional WAL records seem related to the index on the target table. > If the target table does not have any index, the WAL record count is the same > for the parallel plan and the serial plan, and there is no performance > degradation without the index either. [serial] Insert on public.testscan (cost=3272.20..3652841.26 rows=0 width=0) (actual time=360.474..360.476 rows=0 loops=1) Buffers: shared hit=392569 read=3 dirtied=934 written=933 WAL: records=260354 bytes=16259841 [parallel] -> Insert on public.testscan (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5) Buffers: shared hit=407094 read=4 dirtied=1085 written=1158 WAL: records=260498 bytes=17019359 More pages are dirtied and written in the parallel execution. Aren't the index and possibly the target table bigger with parallel execution than with serial execution? That may be due to the difference in how the index keys are inserted. Regards Takayuki Tsunakawa
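The page-split hypothesis could also be checked directly with the pgstattuple extension, by comparing btree leaf density after a serial insert vs. after a parallel insert; a rough sketch (the index name testscan_a_idx is hypothetical):
-----
create extension if not exists pgstattuple;
-- more leaf_pages and a lower avg_leaf_density after the parallel run
-- would indicate extra page splits
select leaf_pages, avg_leaf_density, leaf_fragmentation
from pgstatindex('testscan_a_idx');
-----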
> > What I have found so far is this: > > with tang's conf, a parallel insert generates more WAL records > > than a serial insert (IMO, this is the main reason for the > > performance degradation). See the attachment for the plan info. > > > > I tried altering the target table to unlogged, and then the > > performance degradation no longer occurs. > > > > The additional WAL records seem related to the index on the target > table. > > If the target table does not have any index, the WAL record count is the > > same for the parallel plan and the serial plan, and there is no > > performance degradation without the index either. > > > [serial] > Insert on public.testscan (cost=3272.20..3652841.26 rows=0 width=0) > (actual time=360.474..360.476 rows=0 loops=1) > Buffers: shared hit=392569 read=3 dirtied=934 written=933 > WAL: records=260354 bytes=16259841 > > [parallel] > -> Insert on public.testscan (cost=3272.20..1260119.35 rows=0 > width=0) (actual time=378.227..378.229 rows=0 loops=5) > Buffers: shared hit=407094 read=4 dirtied=1085 written=1158 > WAL: records=260498 bytes=17019359 > > > More pages are dirtied and written in the parallel execution. Aren't the > index and possibly the target table bigger with parallel execution than > with serial execution? That may be due to the difference in how the index > keys are inserted. Yes, the table size and index size are bigger with parallel execution. Table and index size after the parallel insert: ---------------------- postgres=# select pg_size_pretty(pg_indexes_size('testscan_index')); pg_size_pretty ---------------- 4048 kB (1 row) postgres=# postgres=# select pg_size_pretty(pg_relation_size('testscan_index')); pg_size_pretty ---------------- 4768 kB (1 row) ---------------------- Table and index size after the serial insert: ---------------------- postgres=# select pg_size_pretty(pg_indexes_size('testscan_index')); pg_size_pretty ---------------- 2864 kB (1 row) postgres=# select pg_size_pretty(pg_relation_size('testscan_index')); pg_size_pretty ---------------- 4608 kB ---------------------- To Amit: > I think you might want to see which exact WAL records are extra by using pg_waldump? Yes, thanks for the hint, I was doing that and the result is as follows: The heap WAL record count is the same for parallel and serial (129999, which matches the number of rows inserted). Parallel btree WAL records (130500 total): ---------------------- INSERT_LEAF:129500 INSERT_UPPER:497 SPLIT_L:172 SPLIT_R:328 INSERT_POST:0 DEDUP:0 VACUUM:0 DELETE:0 MARK_PAGE_HALFDEAD:0 UNLINK_PAGE:0 UNLINK_PAGE_META:0 NEWROOT:3 REUSE_PAGE:0 META_CLEANUP:0 ---------------------- Serial btree WAL records (130355 total): ---------------------- INSERT_LEAF:129644 INSERT_UPPER:354 SPLIT_L:0 SPLIT_R:355 INSERT_POST:0 DEDUP:0 VACUUM:0 DELETE:0 MARK_PAGE_HALFDEAD:0 UNLINK_PAGE:0 UNLINK_PAGE_META:0 NEWROOT:2 REUSE_PAGE:0 META_CLEANUP:0 ---------------------- IMO, because the inserts are distributed differently under parallel execution, the btree insert cost is higher than in the serial case. At the same time, parallelism does not give a huge performance gain with a bitmap scan, so the extra btree index cost results in the performance degradation. Does that make sense? Best regards, Houzj
On Tue, Feb 9, 2021 at 10:30 AM Greg Nancarrow <gregn4422@gmail.com> wrote: > On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote: > > That brings me to to this part of the hunk: > > > > + /* > > + * If there is no underlying SELECT, a parallel table-modification > > + * operation is not possible (nor desirable). > > + */ > > + hasSubQuery = false; > > + foreach(lc, parse->rtable) > > + { > > + rte = lfirst_node(RangeTblEntry, lc); > > + if (rte->rtekind == RTE_SUBQUERY) > > + { > > + hasSubQuery = true; > > + break; > > + } > > + } > > + if (!hasSubQuery) > > + return PROPARALLEL_UNSAFE; > > > > The justification for this given in: > > > > https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com > > > > seems to be that the failure of a test case in > > partition-concurrent-attach isolation suite is prevented if finding no > > subquery RTEs in the query is flagged as parallel unsafe, which in > > turn stops max_parallel_hazard_modify() from locking partitions for > > safety checks in such cases. But it feels unprincipled to have this > > code to work around a specific test case that's failing. I'd rather > > edit the failing test case to disable parallel execution as > > Tsunakawa-san suggested. > > > > The code was not changed because of the test case (though it was > fortunate that the test case worked after the change). Ah, I misread then, sorry. > The code check that you have identified above ensures that the INSERT > has an underlying SELECT, because the planner won't (and shouldn't > anyway) generate a parallel plan for INSERT...VALUES, so there is no > point doing any parallel-safety checks in this case. > It just so happens that the problem test case uses INSERT...VALUES - > and it shouldn't have triggered the parallel-safety checks for > parallel INSERT for this case anyway, because INSERT...VALUES can't > (and shouldn't) be parallelized. AFAICS, max_parallel_hazard() path never bails from doing further safety checks based on anything other than finding a query component whose hazard level crosses context->max_interesting. You're trying to add something that bails based on second-guessing that a parallel path would not be chosen, which I find somewhat objectionable. If the main goal of bailing out is to avoid doing the potentially expensive modification safety check on the target relation, maybe we should try to somehow make the check less expensive. I remember reading somewhere in the thread about caching the result of this check in relcache, but haven't closely studied the feasibility of doing so. -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Jan 6, 2021 at 7:39 PM Antonin Houska <ah@cybertec.at> wrote: > > > @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan) > PlannerGlobal *glob = root->glob; > int rtoffset = list_length(glob->finalrtable); > ListCell *lc; > + Plan *finalPlan; > > /* > * Add all the query's RTEs to the flattened rangetable. The live ones > @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan) > } > > /* Now fix the Plan tree */ > - return set_plan_refs(root, plan, rtoffset); > + finalPlan = set_plan_refs(root, plan, rtoffset); > + if (finalPlan != NULL && IsA(finalPlan, Gather)) > + { > + Plan *subplan = outerPlan(finalPlan); > + > + if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL) > + { > + finalPlan->targetlist = copyObject(subplan->targetlist); > + } > + } > + return finalPlan; > } > > I'm not sure if the problem of missing targetlist should be handled here (BTW, > NIL is the constant for an empty list, not NULL). Obviously this is a > consequence of the fact that the ModifyTable node has no regular targetlist. > > Actually I don't quite understand why (in the current master branch) the > targetlist initialized in set_plan_refs() > > /* > * Set up the visible plan targetlist as being the same as > * the first RETURNING list. This is for the use of > * EXPLAIN; the executor won't pay any attention to the > * targetlist. We postpone this step until here so that > * we don't have to do set_returning_clause_references() > * twice on identical targetlists. > */ > splan->plan.targetlist = copyObject(linitial(newRL)); > > is not used. Instead, ExecInitModifyTable() picks the first returning list > again: > > /* > * Initialize result tuple slot and assign its rowtype using the first > * RETURNING list. We assume the rest will look the same. > */ > mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists); > > So if you set the targetlist in create_modifytable_plan() (according to > best_path->returningLists), or even in create_modifytable_path(), and ensure > that it gets propagated to the Gather node (generate_gather_pahs currently > uses rel->reltarget), then you should no longer need to tweak > setrefs.c. Moreover, ExecInitModifyTable() would no longer need to set the > targetlist. However I don't guarantee that this is the best approach - some > planner expert should speak up. > I've had a bit closer look at this particular issue. I can see what you mean about the ModifyTable targetlist (that is set in set_plan_refs()) getting overwritten by ExecInitModifyTable() - which also contradicts the comment in set_plan_refs() that claims the targetlist being set is used by EXPLAIN (which it is not). So the current Postgres master branch does seem to be broken in that respect. I did try your suggestion (and also remove my tweak), but I found that in the T_Gather case of set_plan_refs() it does some processing (see set_upper_references()) of the current Gather targetlist and subplan's targetlist (and will then overwrite the Gather targetlist after that), but in doing that processing it produces the error: ERROR: variable not found in subplan target list I think that one of the fundamental problems is that, up to now, ModifyTable has always been the top node in a (non-parallel) plan, but now for Parallel INSERT we have a Gather node with ModifyTable in its subplan. So the expected order of processing and node configuration has changed. 
For the moment (until perhaps a planner expert DOES speak up) I've parked my temporary "fix" at the bottom of set_plan_refs(), simply pointing the Gather node's targetlist to the underlying ModifyTable's targetlist. if (IsA(plan, Gather)) { Plan *subplan = plan->lefttree; /* For parallel INSERT ... RETURNING, make the Gather node expose the same targetlist as the ModifyTable beneath it */ if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NIL) { plan->targetlist = subplan->targetlist; } } Regards, Greg Nancarrow Fujitsu Australia
On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > The code check that you have identified above ensures that the INSERT > > has an underlying SELECT, because the planner won't (and shouldn't > > anyway) generate a parallel plan for INSERT...VALUES, so there is no > > point doing any parallel-safety checks in this case. > > It just so happens that the problem test case uses INSERT...VALUES - > > and it shouldn't have triggered the parallel-safety checks for > > parallel INSERT for this case anyway, because INSERT...VALUES can't > > (and shouldn't) be parallelized. > > AFAICS, max_parallel_hazard() path never bails from doing further > safety checks based on anything other than finding a query component > whose hazard level crosses context->max_interesting. It's parallel UNSAFE because it's not safe or even possible to have a parallel plan for this. (as UNSAFE is the max hazard level, no point in referencing context->max_interesting). And there are existing cases of bailing out and not doing further safety checks (even your v15_delta.diff patch placed code - for bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such existing case in max_parallel_hazard_walker()): else if (IsA(node, Query)) { Query *query = (Query *) node; /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ if (query->rowMarks != NULL) { context->max_hazard = PROPARALLEL_UNSAFE; return true; } >You're trying to > add something that bails based on second-guessing that a parallel path > would not be chosen, which I find somewhat objectionable. > > If the main goal of bailing out is to avoid doing the potentially > expensive modification safety check on the target relation, maybe we > should try to somehow make the check less expensive. I remember > reading somewhere in the thread about caching the result of this check > in relcache, but haven't closely studied the feasibility of doing so. > There's no "second-guessing" involved here. There is no underlying way of dividing up the VALUES data of "INSERT...VALUES" amongst the parallel workers, even if the planner was updated to produce a parallel-plan for the "INSERT...VALUES" case (apart from the fact that spawning off parallel workers to insert that data would almost always result in worse performance than a non-parallel plan...) The division of work for parallel workers is part of the table AM (scan) implementation, which is not invoked for "INSERT...VALUES". Regards, Greg Nancarrow Fujitsu Australia
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com> > Table and index size after the parallel insert: > ---------------------- > postgres=# select pg_size_pretty(pg_indexes_size('testscan_index')); > pg_size_pretty > ---------------- > 4048 kB > (1 row) > > postgres=# select pg_size_pretty(pg_relation_size('testscan_index')); > pg_size_pretty > ---------------- > 4768 kB > (1 row) Which of the above shows the table size? What does pg_indexes_size() against an index (testscan_index) return? > IMO, because the inserts are distributed differently under parallel > execution, the btree insert cost is higher than in the serial case. > > At the same time, parallelism does not give a huge performance gain with > a bitmap scan, > so the extra btree index cost results in the performance degradation. How did you know that the parallelism didn't have a huge performance gain with bitmap scan? [serial] -> Bitmap Heap Scan on public.x (cost=3272.20..3652841.26 rows=79918 width=8) (actual time=8.096..41.005 rows=129999 loops=1) [parallel] -> Parallel Bitmap Heap Scan on public.x (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5) Regards Takayuki Tsunakawa
> > ---------------------- > > postgres=# select pg_size_pretty(pg_indexes_size('testscan_index')); > > pg_size_pretty > > ---------------- > > 4048 kB > > (1 row) > > > > postgres=# select pg_size_pretty(pg_relation_size('testscan_index')); > > pg_size_pretty > > ---------------- > > 4768 kB > > (1 row) > > Which of the above shows the table size? What does pg_indexes_size() > against an index (testscan_index) return? Sorry, maybe the table name is a little confusing, but 'testscan_index' is actually a table's name. pg_indexes_size() returns the size of the indexes attached to the table. pg_relation_size() returns the table's size. Did I miss something? > > IMO, because the inserts are distributed differently under parallel > > execution, the btree insert cost is higher than in the serial case. > > > > At the same time, parallelism does not give a huge performance gain > > with a bitmap scan, so the extra btree index cost results in > > the performance degradation. > > How did you know that the parallelism didn't have a huge performance gain > with bitmap scan? > > [serial] > -> Bitmap Heap Scan on public.x (cost=3272.20..3652841.26 rows=79918 > width=8) (actual time=8.096..41.005 rows=129999 loops=1) > > [parallel] > -> Parallel Bitmap Heap Scan on public.x > (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 > rows=26000 loops=5) I tested the case without the insert (just the query using a bitmap scan): [serial]: postgres=# explain analyze select a from x where a<80000 or (a%2=0 and a>199900000); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on x (cost=3258.59..3647578.53 rows=81338 width=4) (actual time=8.091..34.222 rows=129999 loops=1) Recheck Cond: ((a < 80000) OR (a > 199900000)) Filter: ((a < 80000) OR (((a % 2) = 0) AND (a > 199900000))) Rows Removed by Filter: 50000 Heap Blocks: exact=975 -> BitmapOr (cost=3258.59..3258.59 rows=173971 width=0) (actual time=7.964..7.965 rows=0 loops=1) -> Bitmap Index Scan on x_a_idx (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.451..3.451 rows=79999 loops=1) Index Cond: (a < 80000) -> Bitmap Index Scan on x_a_idx (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.513..4.513 rows=100000 loops=1) Index Cond: (a > 199900000) Planning Time: 0.108 ms Execution Time: 38.136 ms [parallel] postgres=# explain analyze select a from x where a<80000 or (a%2=0 and a>199900000); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=4258.59..1266704.42 rows=81338 width=4) (actual time=9.177..22.592 rows=129999 loops=1) Workers Planned: 4 Workers Launched: 4 -> Parallel Bitmap Heap Scan on x (cost=3258.59..1257570.62 rows=20334 width=4) (actual time=6.402..12.882 rows=26000 loops=5) Recheck Cond: ((a < 80000) OR (a > 199900000)) Filter: ((a < 80000) OR (((a % 2) = 0) AND (a > 199900000))) Rows Removed by Filter: 10000 Heap Blocks: exact=1 -> BitmapOr (cost=3258.59..3258.59 rows=173971 width=0) (actual time=8.785..8.786 rows=0 loops=1) -> Bitmap Index Scan on x_a_idx (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.871..3.871 rows=79999 loops=1) Index Cond: (a < 80000) -> Bitmap Index Scan on x_a_idx (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.914..4.914 rows=100000 loops=1) Index Cond: (a > 199900000) Planning Time: 0.158 ms Execution Time: 26.951 ms (15 rows) It did have a performance gain, but I think it's not huge enough to ignore the 
extra index cost. What do you think? Best regards, houzj
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com> > Sorry, maybe the table name is a little confusing, > but 'testscan_index' is actually a table's name. ... > Did I miss something? No, I don't think so. I just wanted to know the facts correctly. Your EXPLAIN output shows that the target table is testscan, as follows. How does testscan_index relate to testscan? -> Insert on public.testscan (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5) > It did have a performance gain, but I think it's not huge enough to ignore the > extra index cost. > What do you think? Yes... as you suspect, I'm afraid the benefit from the parallel bitmap scan may not compensate for the loss in the parallel insert operation. The loss is probably due to 1) more index page splits, 2) more buffer writes (table and index), and 3) internal locks for things such as relation extension and page content protection. To investigate 3), we should want something like [1], which tells us the wait event statistics (wait count and time for each wait event) per session or across the instance, as Oracle, MySQL and EDB provide. I want to continue this in the near future. [1] Add accumulated statistics for wait event https://commitfest.postgresql.org/28/2332/ Regards Takayuki Tsunakawa
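Until something like [1] is available, the lock contention in 3) can at least be approximated by polling pg_stat_activity from another session while the INSERT runs; a crude sketch (sampling misses short-lived waits, so it is only indicative):
-----
-- run repeatedly (e.g. with \watch) while the INSERT is in progress
select wait_event_type, wait_event, count(*)
from pg_stat_activity
where query like 'insert into testscan%' and pid <> pg_backend_pid()
group by 1, 2
order by 3 desc;
-----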
From: tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> > From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com> > > It did have a performance gain, but I think it's not huge enough to ignore the > > extra index cost. > > What do you think? > > Yes... as you suspect, I'm afraid the benefit from the parallel bitmap scan may not > compensate for the loss in the parallel insert operation. > > The loss is probably due to 1) more index page splits, 2) more buffer writes > (table and index), and 3) internal locks for things such as relation extension > and page content protection. To investigate 3), we should want something > like [1], which tells us the wait event statistics (wait count and time for each > wait event) per session or across the instance, as Oracle, MySQL and EDB > provide. I want to continue this in the near future. What would the result look like if you turn off parallel_leader_participation? If the leader is freed from reading/writing the table and index, the index page splits and internal lock contention may decrease enough to recover part of the loss. https://www.postgresql.org/docs/devel/parallel-plans.html "In a parallel bitmap heap scan, one process is chosen as the leader. That process performs a scan of one or more indexes and builds a bitmap indicating which table blocks need to be visited. These blocks are then divided among the cooperating processes as in a parallel sequential scan. In other words, the heap scan is performed in parallel, but the underlying index scan is not." BTW, the following sentences seem to need revisiting, because "the work to be done" is not the same for parallel INSERT as for serial INSERT - the order of rows stored, table and index sizes, and what else? https://www.postgresql.org/docs/devel/using-explain.html#USING-EXPLAIN-ANALYZE "It's worth noting that although the data-modifying node can take a considerable amount of run time (here, it's consuming the lion's share of the time), the planner does not currently add anything to the cost estimates to account for that work. That's because the work to be done is the same for every correct query plan, so it doesn't affect planning decisions." Regards Takayuki Tsunakawa
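For reference, a minimal way to run the suggested experiment (parallel_leader_participation is an existing GUC; the statement is taken from the earlier examples):
-----
set parallel_leader_participation = off;
explain (analyze, verbose, wal)
insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
reset parallel_leader_participation;
-----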
On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > > The code check that you have identified above ensures that the INSERT > > > has an underlying SELECT, because the planner won't (and shouldn't > > > anyway) generate a parallel plan for INSERT...VALUES, so there is no > > > point doing any parallel-safety checks in this case. > > > It just so happens that the problem test case uses INSERT...VALUES - > > > and it shouldn't have triggered the parallel-safety checks for > > > parallel INSERT for this case anyway, because INSERT...VALUES can't > > > (and shouldn't) be parallelized. > > > > AFAICS, max_parallel_hazard() path never bails from doing further > > safety checks based on anything other than finding a query component > > whose hazard level crosses context->max_interesting. > > It's parallel UNSAFE because it's not safe or even possible to have a > parallel plan for this. > (as UNSAFE is the max hazard level, no point in referencing > context->max_interesting). > And there are existing cases of bailing out and not doing further > safety checks (even your v15_delta.diff patch placed code - for > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such > existing case in max_parallel_hazard_walker()): > > else if (IsA(node, Query)) > { > Query *query = (Query *) node; > > /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ > if (query->rowMarks != NULL) > { > context->max_hazard = PROPARALLEL_UNSAFE; > return true; > } In my understanding, the max_parallel_hazard() query tree walk is to find constructs that are parallel unsafe in that they call code that can't run in parallel mode. For example, FOR UPDATE/SHARE on traditional heap AM tuples calls AssignTransactionId() which doesn't support being called in parallel mode. Likewise ON CONFLICT ... DO UPDATE calls heap_update() which doesn't support parallelism. I'm not aware of any such hazards downstream of ExecValuesScan(). > >You're trying to > > add something that bails based on second-guessing that a parallel path > > would not be chosen, which I find somewhat objectionable. > > > > If the main goal of bailing out is to avoid doing the potentially > > expensive modification safety check on the target relation, maybe we > > should try to somehow make the check less expensive. I remember > > reading somewhere in the thread about caching the result of this check > > in relcache, but haven't closely studied the feasibility of doing so. > > > > There's no "second-guessing" involved here. > There is no underlying way of dividing up the VALUES data of > "INSERT...VALUES" amongst the parallel workers, even if the planner > was updated to produce a parallel-plan for the "INSERT...VALUES" case > (apart from the fact that spawning off parallel workers to insert that > data would almost always result in worse performance than a > non-parallel plan...) > The division of work for parallel workers is part of the table AM > (scan) implementation, which is not invoked for "INSERT...VALUES". I don't disagree that the planner would not normally assign a parallel path simply to pull values out of a VALUES list mentioned in the INSERT command, but deciding something based on the certainty of it in an earlier planning phase seems odd to me. Maybe that's just me though. -- Amit Langote EDB: http://www.enterprisedb.com
From: Amit Langote <amitlangote09@gmail.com> > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> > wrote: > > There's no "second-guessing" involved here. > > There is no underlying way of dividing up the VALUES data of > > "INSERT...VALUES" amongst the parallel workers, even if the planner > > was updated to produce a parallel-plan for the "INSERT...VALUES" case > > (apart from the fact that spawning off parallel workers to insert that > > data would almost always result in worse performance than a > > non-parallel plan...) > > The division of work for parallel workers is part of the table AM > > (scan) implementation, which is not invoked for "INSERT...VALUES". > > I don't disagree that the planner would not normally assign a parallel > path simply to pull values out of a VALUES list mentioned in the > INSERT command, but deciding something based on the certainty of it in > an earlier planning phase seems odd to me. Maybe that's just me > though. In terms of competitiveness, Oracle does not run INSERT VALUES in parallel: https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-6626C70C-876C-47A4-8C01-9B66574062D8 "The INSERT VALUES statement is never executed in parallel." And SQL Server doesn't either: https://docs.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15 "Starting with SQL Server 2016 (13.x) and database compatibility level 130, the INSERT … SELECT statement can be executed in parallel when inserting into heaps or clustered columnstore indexes (CCI), and using the TABLOCK hint." Regards Takayuki Tsunakawa
On Wed, Feb 10, 2021 at 5:03 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > From: Amit Langote <amitlangote09@gmail.com> > > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> > > wrote: > > > There's no "second-guessing" involved here. > > > There is no underlying way of dividing up the VALUES data of > > > "INSERT...VALUES" amongst the parallel workers, even if the planner > > > was updated to produce a parallel-plan for the "INSERT...VALUES" case > > > (apart from the fact that spawning off parallel workers to insert that > > > data would almost always result in worse performance than a > > > non-parallel plan...) > > > The division of work for parallel workers is part of the table AM > > > (scan) implementation, which is not invoked for "INSERT...VALUES". > > > > I don't disagree that the planner would not normally assign a parallel > > path simply to pull values out of a VALUES list mentioned in the > > INSERT command, but deciding something based on the certainty of it in > > an earlier planning phase seems odd to me. Maybe that's just me > > though. > > In terms of competitiveness, Oracle does not run INSERT VALUES in parallel: > > https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-6626C70C-876C-47A4-8C01-9B66574062D8 > > "The INSERT VALUES statement is never executed in parallel." > > > And SQL Server doesn't either: > > https://docs.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15 > > "Starting with SQL Server 2016 (13.x) and database compatibility level 130, the INSERT … SELECT statement can be executed in parallel when inserting into heaps or clustered columnstore indexes (CCI), and using the TABLOCK hint." Just to be clear, I'm not suggesting that we should put effort into making INSERT ... VALUES run in parallel. I'm just raising my concern about embedding the assumption in max_parallel_hazard() that it will never make sense to do so. Although, maybe there are other more pressing concerns to resolve, so I will not insist too much on doing anything about this. -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > > > > The code check that you have identified above ensures that the INSERT > > > > has an underlying SELECT, because the planner won't (and shouldn't > > > > anyway) generate a parallel plan for INSERT...VALUES, so there is no > > > > point doing any parallel-safety checks in this case. > > > > It just so happens that the problem test case uses INSERT...VALUES - > > > > and it shouldn't have triggered the parallel-safety checks for > > > > parallel INSERT for this case anyway, because INSERT...VALUES can't > > > > (and shouldn't) be parallelized. > > > > > > AFAICS, max_parallel_hazard() path never bails from doing further > > > safety checks based on anything other than finding a query component > > > whose hazard level crosses context->max_interesting. > > > > It's parallel UNSAFE because it's not safe or even possible to have a > > parallel plan for this. > > (as UNSAFE is the max hazard level, no point in referencing > > context->max_interesting). > > And there are existing cases of bailing out and not doing further > > safety checks (even your v15_delta.diff patch placed code - for > > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such > > existing case in max_parallel_hazard_walker()): > > > > else if (IsA(node, Query)) > > { > > Query *query = (Query *) node; > > > > /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ > > if (query->rowMarks != NULL) > > { > > context->max_hazard = PROPARALLEL_UNSAFE; > > return true; > > } > > In my understanding, the max_parallel_hazard() query tree walk is to > find constructs that are parallel unsafe in that they call code that > can't run in parallel mode. For example, FOR UPDATE/SHARE on > traditional heap AM tuples calls AssignTransactionId() which doesn't > support being called in parallel mode. Likewise ON CONFLICT ... DO > UPDATE calls heap_update() which doesn't support parallelism. I'm not > aware of any such hazards downstream of ExecValuesScan(). > > > >You're trying to > > > add something that bails based on second-guessing that a parallel path > > > would not be chosen, which I find somewhat objectionable. > > > > > > If the main goal of bailing out is to avoid doing the potentially > > > expensive modification safety check on the target relation, maybe we > > > should try to somehow make the check less expensive. I remember > > > reading somewhere in the thread about caching the result of this check > > > in relcache, but haven't closely studied the feasibility of doing so. > > > > > > > There's no "second-guessing" involved here. > > There is no underlying way of dividing up the VALUES data of > > "INSERT...VALUES" amongst the parallel workers, even if the planner > > was updated to produce a parallel-plan for the "INSERT...VALUES" case > > (apart from the fact that spawning off parallel workers to insert that > > data would almost always result in worse performance than a > > non-parallel plan...) > > The division of work for parallel workers is part of the table AM > > (scan) implementation, which is not invoked for "INSERT...VALUES". 
> > I don't disagree that the planner would not normally assign a parallel > path simply to pull values out of a VALUES list mentioned in the > INSERT command, but deciding something based on the certainty of it in > an earlier planning phase seems odd to me. Maybe that's just me > though. > I think it is more of a case where there is neither a need for parallelism nor a desire to support it. The other possibility for such a check would be at some earlier phase, say in standard_planner() [1], where we do checks for other constructs for which we don't want parallelism (I think the check for 'parse->hasModifyingCTE' is quite similar). As you can see, in that check too we just assume other operations to be parallel-unsafe. I think we should rule out such cases earlier rather than later. Do you have better ideas than what Greg has done to avoid parallelism for such cases? [1] - standard_planner() { .. if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster && parse->commandType == CMD_SELECT && !parse->hasModifyingCTE && max_parallel_workers_per_gather > 0 && !IsParallelWorker()) { /* all the cheap tests pass, so scan the query tree */ glob->maxParallelHazard = max_parallel_hazard(parse); glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); } else { /* skip the query tree scan, just assume it's unsafe */ glob->maxParallelHazard = PROPARALLEL_UNSAFE; glob->parallelModeOK = false; } -- With Regards, Amit Kapila.
On Wed, Feb 10, 2021 at 11:13 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > > The loss is probably due to 1) more index page splits, 2) more buffer writes (table and index), and 3) internal locks for things such as relation extension and page content protection. To investigate 3), we should want something like [1], which tells us the wait event statistics (wait count and time for each wait event) per session or across the instance, as Oracle, MySQL and EDB provide. I want to continue this in the near future. > > I think we might mitigate such losses with a different patch that does bulk inserts for INSERT ... SELECT, something like what we are discussing in the other thread [1]. I wonder if these performance characteristics are due to the underlying bitmap heap scan. What are the results if you disable the bitmap heap scan (set enable_bitmapscan = off)? If that happens to be true, then we might also want to consider whether we can somehow teach parallel insert to cost more in such cases. Another thing we can try is to integrate the parallel-insert patch with the patch on the other thread [1] and see if that makes any difference, but I'm not sure we want to go there at this stage unless it is simple to try out. [1] - https://www.postgresql.org/message-id/20200508072545.GA9701%40telsasoft.com -- With Regards, Amit Kapila.
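A sketch of the suggested experiment; with bitmap scans disabled the planner should fall back to a (parallel) sequential scan for the SELECT part, so the timing and WAL numbers can be compared against the bitmap-scan runs:
-----
set enable_bitmapscan = off;
explain (analyze, verbose, wal)
insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
reset enable_bitmapscan;
-----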
On Wed, Feb 10, 2021 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > It's parallel UNSAFE because it's not safe or even possible to have a > > > parallel plan for this. > > > (as UNSAFE is the max hazard level, no point in referencing > > > context->max_interesting). > > > And there are existing cases of bailing out and not doing further > > > safety checks (even your v15_delta.diff patch placed code - for > > > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such > > > existing case in max_parallel_hazard_walker()): > > > > > > else if (IsA(node, Query)) > > > { > > > Query *query = (Query *) node; > > > > > > /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ > > > if (query->rowMarks != NULL) > > > { > > > context->max_hazard = PROPARALLEL_UNSAFE; > > > return true; > > > } > > > > In my understanding, the max_parallel_hazard() query tree walk is to > > find constructs that are parallel unsafe in that they call code that > > can't run in parallel mode. For example, FOR UPDATE/SHARE on > > traditional heap AM tuples calls AssignTransactionId() which doesn't > > support being called in parallel mode. Likewise ON CONFLICT ... DO > > UPDATE calls heap_update() which doesn't support parallelism. I'm not > > aware of any such hazards downstream of ExecValuesScan(). > > > > > >You're trying to > > > > add something that bails based on second-guessing that a parallel path > > > > would not be chosen, which I find somewhat objectionable. > > > > > > > > If the main goal of bailing out is to avoid doing the potentially > > > > expensive modification safety check on the target relation, maybe we > > > > should try to somehow make the check less expensive. I remember > > > > reading somewhere in the thread about caching the result of this check > > > > in relcache, but haven't closely studied the feasibility of doing so. > > > > > > > > > > There's no "second-guessing" involved here. > > > There is no underlying way of dividing up the VALUES data of > > > "INSERT...VALUES" amongst the parallel workers, even if the planner > > > was updated to produce a parallel-plan for the "INSERT...VALUES" case > > > (apart from the fact that spawning off parallel workers to insert that > > > data would almost always result in worse performance than a > > > non-parallel plan...) > > > The division of work for parallel workers is part of the table AM > > > (scan) implementation, which is not invoked for "INSERT...VALUES". > > > > I don't disagree that the planner would not normally assign a parallel > > path simply to pull values out of a VALUES list mentioned in the > > INSERT command, but deciding something based on the certainty of it in > > an earlier planning phase seems odd to me. Maybe that's just me > > though. > > > > I think it is more of a case where neither is a need for parallelism > nor we want to support parallelism of it. The other possibility for > such a check could be at some earlier phase say in standard_planner > [1] where we are doing checks for other constructs where we don't want > parallelism (I think the check for 'parse->hasModifyingCTE' is quite > similar). If you see in that check as well we just assume other > operations to be in the category of parallel-unsafe. I think we should > rule out such cases earlier than later. 
> Do you have better ideas than what Greg has done to avoid parallelism
> for such cases?
>
> [1] -
> standard_planner()
> {
> ..
> if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> IsUnderPostmaster &&
> parse->commandType == CMD_SELECT &&
> !parse->hasModifyingCTE &&
> max_parallel_workers_per_gather > 0 &&
> !IsParallelWorker())
> {
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> }
> else
> {
> /* skip the query tree scan, just assume it's unsafe */
> glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> glob->parallelModeOK = false;
> }

Yeah, maybe having the block I was commenting on, viz.:

+ /*
+ * If there is no underlying SELECT, a parallel table-modification
+ * operation is not possible (nor desirable).
+ */
+ hasSubQuery = false;
+ foreach(lc, parse->rtable)
+ {
+ rte = lfirst_node(RangeTblEntry, lc);
+ if (rte->rtekind == RTE_SUBQUERY)
+ {
+ hasSubQuery = true;
+ break;
+ }
+ }
+ if (!hasSubQuery)
+ return PROPARALLEL_UNSAFE;

before the standard_planner() block you quoted might be a good idea.
So something like this:

+ /*
+ * If there is no underlying SELECT, a parallel table-modification
+ * operation is not possible (nor desirable).
+ */
+ rangeTableHasSubQuery = false;
+ foreach(lc, parse->rtable)
+ {
+ rte = lfirst_node(RangeTblEntry, lc);
+ if (rte->rtekind == RTE_SUBQUERY)
+ {
+ rangeTableHasSubQuery = true;
+ break;
+ }
+ }

if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
IsUnderPostmaster &&
(parse->commandType == CMD_SELECT ||
(IsModifySupportedInParallelMode(parse->commandType) &&
rangeTableHasSubQuery)) &&
!parse->hasModifyingCTE &&
max_parallel_workers_per_gather > 0 &&
!IsParallelWorker())
{
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
}
else
{
/* skip the query tree scan, just assume it's unsafe */
glob->maxParallelHazard = PROPARALLEL_UNSAFE;
glob->parallelModeOK = false;
}

--
Amit Langote
EDB: http://www.enterprisedb.com
From: Amit Langote <amitlangote09@gmail.com>
> Just to be clear, I'm not suggesting that we should put effort into
> making INSERT ... VALUES run in parallel. I'm just raising my concern
> about embedding the assumption in max_parallel_hazard() that it will
> never make sense to do so.

I'm sorry I misunderstood your suggestion. So, you're suggesting that it may be better to place the VALUES existence check outside max_parallel_hazard(). (I may be a bit worried that I'm misunderstanding in a different way.)

The description of max_parallel_hazard() gave me the impression that this is the right place to check VALUES, because its role can be paraphrased in simpler words like "Tell you if the given Query is safe for parallel execution." In that regard, the standard_planner() if conditions that check Query->commandType and Query->hasModifyingCTE could be moved into max_parallel_hazard() too.

But looking closer at the description, it says "Returns the worst function hazard." Function hazard? Should this function only check functions? Or do we want to modify this description and get max_parallel_hazard() to provide more service?

/*
 * max_parallel_hazard
 *      Find the worst parallel-hazard level in the given query
 *
 * Returns the worst function hazard property (the earliest in this list:
 * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE) that can
 * be found in the given parsetree. We use this to find out whether the query
 * can be parallelized at all. The caller will also save the result in
 * PlannerGlobal so as to short-circuit checks of portions of the querytree
 * later, in the common case where everything is SAFE.
 */
char
max_parallel_hazard(Query *parse)

Regards
Takayuki Tsunakawa
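(For reference, if max_parallel_hazard() is to cover more than function hazards, its header comment could be generalized along these lines; the wording below is only a suggestion, not text from any posted patch.)

/*
 * max_parallel_hazard
 *      Find the worst parallel-hazard level in the given query
 *
 * Returns the worst parallel hazard property (the earliest in this list:
 * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE) that can
 * be found in the given parsetree, whether it comes from a function marked
 * with that property or from some other construct (such as a
 * table-modification operation) known to carry that hazard.  We use this
 * to find out whether the query can be parallelized at all.
 */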
> > > > > > else if (IsA(node, Query)) > > > { > > > Query *query = (Query *) node; > > > > > > /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ > > > if (query->rowMarks != NULL) > > > { > > > context->max_hazard = PROPARALLEL_UNSAFE; > > > return true; > > > } > > > > In my understanding, the max_parallel_hazard() query tree walk is to > > find constructs that are parallel unsafe in that they call code that > > can't run in parallel mode. For example, FOR UPDATE/SHARE on > > traditional heap AM tuples calls AssignTransactionId() which doesn't > > support being called in parallel mode. Likewise ON CONFLICT ... DO > > UPDATE calls heap_update() which doesn't support parallelism. I'm not > > aware of any such hazards downstream of ExecValuesScan(). > > > > > >You're trying to > > > > add something that bails based on second-guessing that a parallel > > > >path would not be chosen, which I find somewhat objectionable. > > > > > > > > If the main goal of bailing out is to avoid doing the potentially > > > > expensive modification safety check on the target relation, maybe > > > > we should try to somehow make the check less expensive. I > > > > remember reading somewhere in the thread about caching the result > > > > of this check in relcache, but haven't closely studied the feasibility > of doing so. > > > > > > > > > > There's no "second-guessing" involved here. > > > There is no underlying way of dividing up the VALUES data of > > > "INSERT...VALUES" amongst the parallel workers, even if the planner > > > was updated to produce a parallel-plan for the "INSERT...VALUES" > > > case (apart from the fact that spawning off parallel workers to > > > insert that data would almost always result in worse performance > > > than a non-parallel plan...) The division of work for parallel > > > workers is part of the table AM > > > (scan) implementation, which is not invoked for "INSERT...VALUES". > > > > I don't disagree that the planner would not normally assign a parallel > > path simply to pull values out of a VALUES list mentioned in the > > INSERT command, but deciding something based on the certainty of it in > > an earlier planning phase seems odd to me. Maybe that's just me > > though. > > > > I think it is more of a case where neither is a need for parallelism nor > we want to support parallelism of it. The other possibility for such a check > could be at some earlier phase say in standard_planner [1] where we are > doing checks for other constructs where we don't want parallelism (I think > the check for 'parse->hasModifyingCTE' is quite similar). If you see in > that check as well we just assume other operations to be in the category > of parallel-unsafe. I think we should rule out such cases earlier than later. > Do you have better ideas than what Greg has done to avoid parallelism for > such cases? > > [1] - > standard_planner() > { > .. > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster && > parse->commandType == CMD_SELECT && > !parse->hasModifyingCTE && > max_parallel_workers_per_gather > 0 && > !IsParallelWorker()) > { > /* all the cheap tests pass, so scan the query tree */ > glob->maxParallelHazard = max_parallel_hazard(parse); > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > } > else > { > /* skip the query tree scan, just assume it's unsafe */ > glob->maxParallelHazard = PROPARALLEL_UNSAFE; > glob->parallelModeOK = false; > } +1. In the current parallel_dml option patch,
I put this check and some high-level checks in a separate function called is_parallel_possible_for_modify:

- * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE
+ * Check at a high-level if parallel mode is able to be used for the specified
+ * table-modification statement.
+ * It's not possible in the following cases:
+ *
+ * 1) enable_parallel_dml is off
+ * 2) UPDATE or DELETE command
+ * 3) INSERT...ON CONFLICT...DO UPDATE
+ * 4) INSERT without SELECT on a relation
+ * 5) the reloption parallel_dml_enabled is not set for the target table
+ *
+ * (Note: we don't do in-depth parallel-safety checks here, we do only the
+ * cheaper tests that can quickly exclude obvious cases for which
+ * parallelism isn't supported, to avoid having to do further parallel-safety
+ * checks for these)
 */
+bool
+is_parallel_possible_for_modify(Query *parse)

And I put the function at an earlier place, like the following:

 if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
 IsUnderPostmaster &&
 (parse->commandType == CMD_SELECT ||
- (enable_parallel_dml &&
- IsModifySupportedInParallelMode(parse->commandType))) &&
+ is_parallel_possible_for_modify(parse)) &&
 !parse->hasModifyingCTE &&
 max_parallel_workers_per_gather > 0 &&
 !IsParallelWorker())

If this looks good, maybe we can merge this change.

Best regards,
houzj
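(To make the shape of that function concrete, here is a minimal sketch of how the five listed checks might fit together. It assumes the enable_parallel_dml GUC and the IsModifySupportedInParallelMode() macro from the parallel_dml patch, and uses a hypothetical RelParallelDMLEnabled() helper for the reloption check; it is an illustration only, not the patch's actual code.)

static bool
is_parallel_possible_for_modify(Query *parse)
{
	ListCell   *lc;
	bool		hasSubQuery = false;

	/* 1) and 2): the GUC must be on, and only INSERT is supported */
	if (!enable_parallel_dml ||
		!IsModifySupportedInParallelMode(parse->commandType))
		return false;

	/* 3) INSERT...ON CONFLICT...DO UPDATE can't be run in parallel mode */
	if (parse->onConflict != NULL &&
		parse->onConflict->action == ONCONFLICT_UPDATE)
		return false;

	/* 4) there must be an underlying SELECT (i.e. a subquery RTE) */
	foreach(lc, parse->rtable)
	{
		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

		if (rte->rtekind == RTE_SUBQUERY)
		{
			hasSubQuery = true;
			break;
		}
	}
	if (!hasSubQuery)
		return false;

	/* 5) the target table's reloption (hypothetical accessor) */
	return RelParallelDMLEnabled(rt_fetch(parse->resultRelation,
										  parse->rtable)->relid);
}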
On Wed, Feb 10, 2021 at 5:52 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
> From: Amit Langote <amitlangote09@gmail.com>
> > Just to be clear, I'm not suggesting that we should put effort into
> > making INSERT ... VALUES run in parallel. I'm just raising my concern
> > about embedding the assumption in max_parallel_hazard() that it will
> > never make sense to do so.
>
> I'm sorry I misunderstood your suggestion. So, you're suggesting that it may be better to place the VALUES existence check outside max_parallel_hazard(). (I may be a bit worried that I'm misunderstanding in a different way.)

To add context to my comments, here's the block of code in the patch I was commenting on:

+ /*
+ * If there is no underlying SELECT, a parallel table-modification
+ * operation is not possible (nor desirable).
+ */
+ hasSubQuery = false;
+ foreach(lc, parse->rtable)
+ {
+ rte = lfirst_node(RangeTblEntry, lc);
+ if (rte->rtekind == RTE_SUBQUERY)
+ {
+ hasSubQuery = true;
+ break;
+ }
+ }
+ if (!hasSubQuery)
+ return PROPARALLEL_UNSAFE;

For a modification query, this makes max_parallel_hazard() return that it is unsafe to parallelize the query because it doesn't contain a subquery RTE, or only contains a VALUES RTE. I was trying to say that inside max_parallel_hazard() seems to be the wrong place to reject parallelism for modification simply because there are no subquery RTEs in the query. Although now I'm thinking that maybe it's okay as long as it's appropriately placed. I shared one suggestion in my reply to Amit K's email.

> The description of max_parallel_hazard() gave me the impression that this is the right place to check VALUES, because its role can be paraphrased in simpler words like "Tell you if the given Query is safe for parallel execution."
>
> In that regard, the standard_planner() if conditions that check Query->commandType and Query->hasModifyingCTE could be moved into max_parallel_hazard() too.
>
> But looking closer at the description, it says "Returns the worst function hazard." Function hazard? Should this function only check functions? Or do we want to modify this description and get max_parallel_hazard() to provide more service?

Yeah, updating the description to be more general may make sense.

--
Amit Langote
EDB: http://www.enterprisedb.com
On Wed, Feb 10, 2021 at 5:50 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Feb 10, 2021 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > > It's parallel UNSAFE because it's not safe or even possible to have a > > > > parallel plan for this. > > > > (as UNSAFE is the max hazard level, no point in referencing > > > > context->max_interesting). > > > > And there are existing cases of bailing out and not doing further > > > > safety checks (even your v15_delta.diff patch placed code - for > > > > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such > > > > existing case in max_parallel_hazard_walker()): > > > > > > > > else if (IsA(node, Query)) > > > > { > > > > Query *query = (Query *) node; > > > > > > > > /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ > > > > if (query->rowMarks != NULL) > > > > { > > > > context->max_hazard = PROPARALLEL_UNSAFE; > > > > return true; > > > > } > > > > > > In my understanding, the max_parallel_hazard() query tree walk is to > > > find constructs that are parallel unsafe in that they call code that > > > can't run in parallel mode. For example, FOR UPDATE/SHARE on > > > traditional heap AM tuples calls AssignTransactionId() which doesn't > > > support being called in parallel mode. Likewise ON CONFLICT ... DO > > > UPDATE calls heap_update() which doesn't support parallelism. I'm not > > > aware of any such hazards downstream of ExecValuesScan(). > > > > > > > >You're trying to > > > > > add something that bails based on second-guessing that a parallel path > > > > > would not be chosen, which I find somewhat objectionable. > > > > > > > > > > If the main goal of bailing out is to avoid doing the potentially > > > > > expensive modification safety check on the target relation, maybe we > > > > > should try to somehow make the check less expensive. I remember > > > > > reading somewhere in the thread about caching the result of this check > > > > > in relcache, but haven't closely studied the feasibility of doing so. > > > > > > > > > > > > > There's no "second-guessing" involved here. > > > > There is no underlying way of dividing up the VALUES data of > > > > "INSERT...VALUES" amongst the parallel workers, even if the planner > > > > was updated to produce a parallel-plan for the "INSERT...VALUES" case > > > > (apart from the fact that spawning off parallel workers to insert that > > > > data would almost always result in worse performance than a > > > > non-parallel plan...) > > > > The division of work for parallel workers is part of the table AM > > > > (scan) implementation, which is not invoked for "INSERT...VALUES". > > > > > > I don't disagree that the planner would not normally assign a parallel > > > path simply to pull values out of a VALUES list mentioned in the > > > INSERT command, but deciding something based on the certainty of it in > > > an earlier planning phase seems odd to me. Maybe that's just me > > > though. > > > > > > > I think it is more of a case where neither is a need for parallelism > > nor we want to support parallelism of it. The other possibility for > > such a check could be at some earlier phase say in standard_planner > > [1] where we are doing checks for other constructs where we don't want > > parallelism (I think the check for 'parse->hasModifyingCTE' is quite > > similar). 
If you see in that check as well we just assume other > > operations to be in the category of parallel-unsafe. I think we should > > rule out such cases earlier than later. Do you have better ideas than > > what Greg has done to avoid parallelism for such cases? > > > > [1] - > > standard_planner() > > { > > .. > > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && > > IsUnderPostmaster && > > parse->commandType == CMD_SELECT && > > !parse->hasModifyingCTE && > > max_parallel_workers_per_gather > 0 && > > !IsParallelWorker()) > > { > > /* all the cheap tests pass, so scan the query tree */ > > glob->maxParallelHazard = max_parallel_hazard(parse); > > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > > } > > else > > { > > /* skip the query tree scan, just assume it's unsafe */ > > glob->maxParallelHazard = PROPARALLEL_UNSAFE; > > glob->parallelModeOK = false; > > } > > Yeah, maybe having the block I was commenting on, viz.: > > + /* > + * If there is no underlying SELECT, a parallel table-modification > + * operation is not possible (nor desirable). > + */ > + hasSubQuery = false; > + foreach(lc, parse->rtable) > + { > + rte = lfirst_node(RangeTblEntry, lc); > + if (rte->rtekind == RTE_SUBQUERY) > + { > + hasSubQuery = true; > + break; > + } > + } > + if (!hasSubQuery) > + return PROPARALLEL_UNSAFE; > > before the standard_planner() block you quoted might be a good idea. > So something like this: > > + /* > + * If there is no underlying SELECT, a parallel table-modification > + * operation is not possible (nor desirable). > + */ > + rangeTableHasSubQuery = false; > + foreach(lc, parse->rtable) > + { > + rte = lfirst_node(RangeTblEntry, lc); > + if (rte->rtekind == RTE_SUBQUERY) > + { > + rangeTableHasSubQuery = true; > + break; > + } > + } > > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && > IsUnderPostmaster && > (parse->commandType == CMD_SELECT || > (IsModifySupportedInParallelMode(parse->commandType) && > rangeTableHasSubQuery)) && > !parse->hasModifyingCTE && > max_parallel_workers_per_gather > 0 && > !IsParallelWorker()) > { > /* all the cheap tests pass, so scan the query tree */ > glob->maxParallelHazard = max_parallel_hazard(parse); > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > } > else > { > /* skip the query tree scan, just assume it's unsafe */ > glob->maxParallelHazard = PROPARALLEL_UNSAFE; > glob->parallelModeOK = false; > } On second thought, maybe we could even put the hasSubQuery-based short-circuit in the following block of max_parallel_hazard_walker(): /* * When we're first invoked on a completely unplanned tree, we must * recurse into subqueries so as to locate parallel-unsafe constructs * anywhere in the tree. */ else if (IsA(node, Query)) { Query *query = (Query *) node; /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ if (query->rowMarks != NULL) { context->max_hazard = PROPARALLEL_UNSAFE; return true; } Also, update the comment to mention we bail out if (!hasSubQuery) as a special case. -- Amit Langote EDB: http://www.enterprisedb.com
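(As a rough sketch of that suggestion, the Query branch of max_parallel_hazard_walker() might then look something like the following; query_has_subquery_rte() is a hypothetical helper, and the real patch may structure this differently.)

	else if (IsA(node, Query))
	{
		Query	   *query = (Query *) node;

		/* SELECT FOR UPDATE/SHARE must be treated as unsafe */
		if (query->rowMarks != NULL)
		{
			context->max_hazard = PROPARALLEL_UNSAFE;
			return true;
		}

		/*
		 * Proposed short-circuit: a supported table-modification command
		 * with no subquery RTE (e.g. INSERT ... VALUES) has no underlying
		 * SELECT to parallelize, so treat it as parallel-unsafe.
		 */
		if (IsModifySupportedInParallelMode(query->commandType) &&
			!query_has_subquery_rte(query))	/* hypothetical helper */
		{
			context->max_hazard = PROPARALLEL_UNSAFE;
			return true;
		}

		/* Recurse into subselects */
		return query_tree_walker(query,
								 max_parallel_hazard_walker,
								 context, 0);
	}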
> What are the results if you disable the bitmap heap scan (set enable_bitmapscan > = off)? If that happens to be true, then we might also want to consider > if in some way we can teach parallel insert to cost more in such cases. > Another thing we can try is to integrate a parallel-insert patch with the > patch on another thread [1] and see if that makes any difference but not > sure if we want to go there at this stage unless it is simple to try that > out? If we disable bitmapscan, the performance degradation does not seem to happen. [Parallel] postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..2090216.68 rows=81338 width=0) (actual time=0.226..5488.455 rows=0 loops=1) Workers Planned: 4 Workers Launched: 4 Buffers: shared hit=393364 read=1079535 dirtied=984 written=1027 WAL: records=260400 bytes=16549513 -> Insert on public.testscan (cost=0.00..2081082.88 rows=0 width=0) (actual time=5483.113..5483.114 rows=0 loops=4) Buffers: shared hit=393364 read=1079535 dirtied=984 written=1027 WAL: records=260400 bytes=16549513 Worker 0: actual time=5483.116..5483.117 rows=0 loops=1 Buffers: shared hit=36306 read=264288 dirtied=100 written=49 WAL: records=23895 bytes=1575860 Worker 1: actual time=5483.220..5483.222 rows=0 loops=1 Buffers: shared hit=39750 read=280476 dirtied=101 written=106 WAL: records=26141 bytes=1685083 Worker 2: actual time=5482.844..5482.845 rows=0 loops=1 Buffers: shared hit=38660 read=263713 dirtied=105 written=250 WAL: records=25318 bytes=1657396 Worker 3: actual time=5483.272..5483.274 rows=0 loops=1 Buffers: shared hit=278648 read=271058 dirtied=678 written=622 WAL: records=185046 bytes=11631174 -> Parallel Seq Scan on public.x (cost=0.00..2081082.88 rows=20334 width=8) (actual time=4001.641..5287.248 rows=32500 loops=4) Output: x.a, NULL::integer Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000))) Rows Removed by Filter: 49967500 Buffers: shared hit=1551 read=1079531 Worker 0: actual time=5335.456..5340.757 rows=11924 loops=1 Buffers: shared hit=281 read=264288 Worker 1: actual time=5335.559..5341.766 rows=13049 loops=1 Buffers: shared hit=281 read=280476 Worker 2: actual time=5335.534..5340.964 rows=12636 loops=1 Buffers: shared hit=278 read=263712 Worker 3: actual time=0.015..5125.503 rows=92390 loops=1 Buffers: shared hit=711 read=271055 Planning: Buffers: shared hit=19 Planning Time: 0.175 ms Execution Time: 5488.493 ms [Serial] postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Insert on public.testscan (cost=0.00..5081085.52 rows=0 width=0) (actual time=19311.642..19311.643 rows=0 loops=1) Buffers: shared hit=392485 read=1079694 dirtied=934 written=933 WAL: records=260354 bytes=16259841 -> Seq Scan on public.x (cost=0.00..5081085.52 rows=81338 width=8) (actual time=0.010..18997.317 rows=129999 loops=1) Output: x.a, NULL::integer Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000))) Rows Removed by Filter: 199870001 Buffers: shared hit=1391 read=1079691 Planning: Buffers: shared hit=10 Planning Time: 0.125 ms Execution Time: 19311.700 ms Best regards, houzj
> > > It did have performance gain, but I think it's not huge enough to
> > > ignore the extra index cost.
> > > What do you think ?
> >
> > Yes... as you suspect, I'm afraid the benefit from parallel bitmap
> > scan may not compensate for the loss of the parallel insert operation.
> >
> > The loss is probably due to 1) more index page splits, 2) more buffer
> > writes (table and index), and 3) internal locks for things such as
> > relation extension and page content protection. To investigate 3), we
> > would want something like [1], which tells us the wait event
> > statistics (wait count and time for each wait event) per session or
> > across the instance, like Oracle, MySQL and EDB provide. I want to
> > continue this in the near future.
>
> What would the result look like if you turn off
> parallel_leader_participation? If the leader is freed from
> reading/writing the table and index, the index page splits and internal
> lock contention may decrease enough to recover part of the loss.
>
> https://www.postgresql.org/docs/devel/parallel-plans.html
>
> "In a parallel bitmap heap scan, one process is chosen as the leader. That
> process performs a scan of one or more indexes and builds a bitmap indicating
> which table blocks need to be visited. These blocks are then divided among
> the cooperating processes as in a parallel sequential scan. In other words,
> the heap scan is performed in parallel, but the underlying index scan is
> not."

If I disable parallel_leader_participation:

For max_parallel_workers_per_gather = 4, it still has performance degradation.

For max_parallel_workers_per_gather = 2, the performance degradation will not happen in most cases. There is sometimes noise (performance degradation), but most results (about 80%) are good.

Best regards,
houzj
On Wed, Feb 10, 2021 at 8:59 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > > > > > > > else if (IsA(node, Query)) > > > > { > > > > Query *query = (Query *) node; > > > > > > > > /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ > > > > if (query->rowMarks != NULL) > > > > { > > > > context->max_hazard = PROPARALLEL_UNSAFE; > > > > return true; > > > > } > > > > > > In my understanding, the max_parallel_hazard() query tree walk is to > > > find constructs that are parallel unsafe in that they call code that > > > can't run in parallel mode. For example, FOR UPDATE/SHARE on > > > traditional heap AM tuples calls AssignTransactionId() which doesn't > > > support being called in parallel mode. Likewise ON CONFLICT ... DO > > > UPDATE calls heap_update() which doesn't support parallelism. I'm not > > > aware of any such hazards downstream of ExecValuesScan(). > > > > > > > >You're trying to > > > > > add something that bails based on second-guessing that a parallel > > > > >path would not be chosen, which I find somewhat objectionable. > > > > > > > > > > If the main goal of bailing out is to avoid doing the potentially > > > > > expensive modification safety check on the target relation, maybe > > > > > we should try to somehow make the check less expensive. I > > > > > remember reading somewhere in the thread about caching the result > > > > > of this check in relcache, but haven't closely studied the feasibility > > of doing so. > > > > > > > > > > > > > There's no "second-guessing" involved here. > > > > There is no underlying way of dividing up the VALUES data of > > > > "INSERT...VALUES" amongst the parallel workers, even if the planner > > > > was updated to produce a parallel-plan for the "INSERT...VALUES" > > > > case (apart from the fact that spawning off parallel workers to > > > > insert that data would almost always result in worse performance > > > > than a non-parallel plan...) The division of work for parallel > > > > workers is part of the table AM > > > > (scan) implementation, which is not invoked for "INSERT...VALUES". > > > > > > I don't disagree that the planner would not normally assign a parallel > > > path simply to pull values out of a VALUES list mentioned in the > > > INSERT command, but deciding something based on the certainty of it in > > > an earlier planning phase seems odd to me. Maybe that's just me > > > though. > > > > > > > I think it is more of a case where neither is a need for parallelism nor > > we want to support parallelism of it. The other possibility for such a check > > could be at some earlier phase say in standard_planner [1] where we are > > doing checks for other constructs where we don't want parallelism (I think > > the check for 'parse->hasModifyingCTE' is quite similar). If you see in > > that check as well we just assume other operations to be in the category > > of parallel-unsafe. I think we should rule out such cases earlier than later. > > Do you have better ideas than what Greg has done to avoid parallelism for > > such cases? > > > > [1] - > > standard_planner() > > { > > .. 
> > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster && > > parse->commandType == CMD_SELECT && > > !parse->hasModifyingCTE && > > max_parallel_workers_per_gather > 0 && > > !IsParallelWorker()) > > { > > /* all the cheap tests pass, so scan the query tree */ > > glob->maxParallelHazard = max_parallel_hazard(parse); > > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); > > } > > else > > { > > /* skip the query tree scan, just assume it's unsafe */ > > glob->maxParallelHazard = PROPARALLEL_UNSAFE; > > glob->parallelModeOK = false; > > } > > +1. > > In the current parallel_dml option patch, I put this check and some high-level checks in a separate function called is_parallel_possible_for_modify. > > > - * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE > + * Check at a high-level if parallel mode is able to be used for the specified > + * table-modification statement. > + * It's not possible in the following cases: > + * > + * 1) enable_parallel_dml is off > + * 2) UPDATE or DELETE command > + * 3) INSERT...ON CONFLICT...DO UPDATE > + * 4) INSERT without SELECT on a relation > + * 5) the reloption parallel_dml_enabled is not set for the target table > + * > + * (Note: we don't do in-depth parallel-safety checks here, we do only the > + * cheaper tests that can quickly exclude obvious cases for which > + * parallelism isn't supported, to avoid having to do further parallel-safety > + * checks for these) > */ > +bool > +is_parallel_possible_for_modify(Query *parse) > > > > > > And I put the function at an earlier place, like the following: > > > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && > IsUnderPostmaster && > (parse->commandType == CMD_SELECT || > - (enable_parallel_dml && > - IsModifySupportedInParallelMode(parse->commandType))) && > + is_parallel_possible_for_modify(parse)) && > !parse->hasModifyingCTE && > max_parallel_workers_per_gather > 0 && > !IsParallelWorker()) > > > If this looks good, maybe we can merge this change. > If I've understood correctly, you're suggesting to merge the "is_parallel_possible_for_modify()" code from your parallel_dml patch into the main parallel INSERT patch, right? If so, I think that's a good idea, as it will help simplify both patches (and then, if need be, we can still discuss where best to place certain checks ...). I'll post an update soon that includes that change (then the parallel_dml patch will need to be rebased accordingly). Regards, Greg Nancarrow Fujitsu Australia
On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote: > > * I think that the concerns raised by Tsunakawa-san in: > > https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com > > regarding how this interacts with plancache.c deserve a look. > Specifically, a plan that uses parallel insert may fail to be > invalidated when partitions are altered directly (that is without > altering their root parent). That would be because we are not adding > partition OIDs to PlannerGlobal.invalItems despite making a plan > that's based on checking their properties. See this (tested with all > patches applied!): > Does any current Postgres code add partition OIDs to PlannerGlobal.invalItems for a similar reason? I would have thought that, for example, partitions with a default column expression, using a function that is changed from SAFE to UNSAFE, would suffer the same plancache issue (for current parallel SELECT functionality) as we're talking about here - but so far I haven't seen any code handling this. (Currently invalItems seems to support PROCID and TYPEOID; relation OIDs seem to be handled through a different mechanism). Can you elaborate on what you believe is required here, so that the partition OID dependency is registered and the altered partition results in the plan being invalidated? Thanks in advance for any help you can provide here. Regards, Greg Nancarrow Fujitsu Australia
On Thu, Feb 11, 2021 at 5:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > > * I think that the concerns raised by Tsunakawa-san in: > > > > https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com > > > > regarding how this interacts with plancache.c deserve a look. > > Specifically, a plan that uses parallel insert may fail to be > > invalidated when partitions are altered directly (that is without > > altering their root parent). That would be because we are not adding > > partition OIDs to PlannerGlobal.invalItems despite making a plan > > that's based on checking their properties. See this (tested with all > > patches applied!): > > > > Does any current Postgres code add partition OIDs to > PlannerGlobal.invalItems for a similar reason? > I would have thought that, for example, partitions with a default > column expression, using a function that is changed from SAFE to > UNSAFE, would suffer the same plancache issue (for current parallel > SELECT functionality) as we're talking about here - but so far I > haven't seen any code handling this. > > (Currently invalItems seems to support PROCID and TYPEOID; relation > OIDs seem to be handled through a different mechanism).. > Can you elaborate on what you believe is required here, so that the > partition OID dependency is registered and the altered partition > results in the plan being invalidated? > Thanks in advance for any help you can provide here. > Actually, I tried adding the following in the loop that checks the parallel-safety of each partition and it seemed to work: glob->relationOids = lappend_oid(glob->relationOids, pdesc->oids[i]); Can you confirm, is that what you were referring to? (note that I've already updated the code to use CreatePartitionDirectory() and PartitionDirectoryLookup()) Regards, Greg Nancarrow Fujitsu Australia
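(Concretely, the idea is just to record each partition's OID as a plan dependency while iterating over the PartitionDesc obtained from PartitionDirectoryLookup(); a sketch, with the surrounding parallel-safety checks elided:)

	int			i;

	for (i = 0; i < pdesc->nparts; i++)
	{
		/* ... parallel-safety checks on partition pdesc->oids[i] ... */

		/*
		 * Record the partition as a plan dependency, so that a cached
		 * plan based on its properties is invalidated if the partition
		 * is altered directly.
		 */
		glob->relationOids = lappend_oid(glob->relationOids,
										 pdesc->oids[i]);
	}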
Posting an updated set of patches. Changes are based on feedback, as detailed below:

[Amit Langote]
- Integrate max_parallel_hazard_for_modify() with max_parallel_hazard()
- Some function name changes
- Fix partition-related problems (to handle concurrent attachment of partitions and altering of partitions, plan cache invalidation) and added some tests for this. (Method of fixing yet to be verified)

[Hou-san]
- Merge is_parallel_possible_for_modify() from the parallel_dml patch, which helps in placement of some short-circuits of parallel-safety checks
- Minor update to documentation for temp and foreign tables

[Greg]
- Temporary fix for query rewriter hasModifyingCTE bug (without changing query rewriter code - note that v15 patch put fix in query rewriter)

Hou-san: the parallel_dml patches will need slight rebasing.

Regards,
Greg Nancarrow
Fujitsu Australia
On Thu, Feb 11, 2021 at 11:17 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > Posting an updated set of patches. Changes are based on feedback, as > detailed below: > Oops, looks like I forgot "COSTS OFF" on some added EXPLAINs in the tests, and it caused some test failures in the PostgreSQL Patch Tester (cfbot). Also, I think that perhaps the localized temporary fix included in the patch for the hasModifyingCTE bug should be restricted to INSERT, even though the bug actually exists for SELECT too. Posting an updated set of patches to address these. Regards, Greg Nancarrow Fujitsu Australia
Hi,
For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch:
+ /* Assume original queries have hasModifyingCTE set correctly */
+ if (parsetree->hasModifyingCTE)
+ hasModifyingCTE = true;
Since hasModifyingCTE is false by the time the above is run, it can be simplified as:
hasModifyingCTE = parsetree->hasModifyingCTE;
+ if (!hasSubQuery)
+ return false;
+
+ return true;
The above can be simplified as:
return hasSubQuery;
Cheers
On Fri, Feb 12, 2021 at 2:33 PM Zhihong Yu <zyu@yugabyte.com> wrote: > > For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch : > > + /* Assume original queries have hasModifyingCTE set correctly */ > + if (parsetree->hasModifyingCTE) > + hasModifyingCTE = true; > > Since hasModifyingCTE is false by the time the above is run, it can be simplified as: > hasModifyingCTE = parsetree->hasModifyingCTE; > Actually, we should just return parsetree->hasModifyingCTE at this point, because if it's false, we shouldn't need to continue the search (as we're assuming it has been set correctly for the QSRC_ORIGINAL case). > + if (!hasSubQuery) > + return false; > + > + return true; > > The above can be simplified as: > return hasSubQuery; > Yes, absolutely right, silly miss on that one! Thanks. This was only ever meant to be a temporary fix for this bug that affects this patch. Regards, Greg Nancarrow Fujitsu Australia
Greg:
bq. we should just return parsetree->hasModifyingCTE at this point,
Maybe you can clarify a bit.
The if (parsetree->hasModifyingCTE) check is followed by if (!hasModifyingCTE).
When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true, resulting in the execution of the if (!hasModifyingCTE) block.
In your reply, did you mean that the if (!hasModifyingCTE) block is no longer needed ? (I guess not)
Cheers
On Fri, Feb 12, 2021 at 3:21 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Greg:
> bq. we should just return parsetree->hasModifyingCTE at this point,
>
> Maybe you can clarify a bit.
> The if (parsetree->hasModifyingCTE) check is followed by if (!hasModifyingCTE).
> When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true, resulting in the execution of the if (!hasModifyingCTE) block.
>
> In your reply, did you mean that the if (!hasModifyingCTE) block is no longer needed ? (I guess not)
>

Sorry for not making it clear. What I meant was that instead of:

if (parsetree->querySource == QSRC_ORIGINAL)
{
    /* Assume original queries have hasModifyingCTE set correctly */
    if (parsetree->hasModifyingCTE)
        hasModifyingCTE = true;
}

I thought I should be able to use the following (if the setting for QSRC_ORIGINAL can really be trusted):

if (parsetree->querySource == QSRC_ORIGINAL)
{
    /* Assume original queries have hasModifyingCTE set correctly */
    return parsetree->hasModifyingCTE;
}

(and then the "if (!hasModifyingCTE)" test on the code following immediately below it can be removed)

BUT - after testing that change, the problem test case (in the "with" tests) STILL fails.
I then checked if hasModifyingCTE is always false in the QSRC_ORIGINAL case (by adding an Assert), and it always is false.
So actually, there is no point in having the "if (parsetree->querySource == QSRC_ORIGINAL)" code block - even the so-called "original" query doesn't maintain the setting correctly (even though the actual original query sent into the query rewriter does). And then the "if (!hasModifyingCTE)" test on the code following immediately below it can also be removed.

Regards,
Greg Nancarrow
Fujitsu Australia
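(Taken together, the temporary workaround Greg describes reduces to a plain walk of the CTE list, with no QSRC_ORIGINAL special case. A minimal sketch, with a hypothetical helper name; the proper fix was expected to go into the query rewriter instead:)

static bool
query_has_modifying_cte(Query *parsetree)
{
	ListCell   *lc;

	/*
	 * parsetree->hasModifyingCTE cannot be trusted after rewriting (it
	 * is observed to be false even for QSRC_ORIGINAL), so re-scan the
	 * CTE list and look for any non-SELECT CTE.
	 */
	foreach(lc, parsetree->cteList)
	{
		CommonTableExpr *cte = lfirst_node(CommonTableExpr, lc);
		Query	   *ctequery = (Query *) cte->ctequery;

		if (ctequery != NULL && ctequery->commandType != CMD_SELECT)
			return true;
	}

	return false;
}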
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> > What would the result look like if you turn off
> > parallel_leader_participation? If the leader is freed from
> > reading/writing the table and index, the index page splits and
> > internal lock contention may decrease enough to recover part of the loss.
> >
> > https://www.postgresql.org/docs/devel/parallel-plans.html
> >
> > "In a parallel bitmap heap scan, one process is chosen as the leader. That
> > process performs a scan of one or more indexes and builds a bitmap indicating
> > which table blocks need to be visited. These blocks are then divided among
> > the cooperating processes as in a parallel sequential scan. In other words,
> > the heap scan is performed in parallel, but the underlying index scan is
> > not."
>
> If I disable parallel_leader_participation:
>
> For max_parallel_workers_per_gather = 4, it still has performance
> degradation.
>
> For max_parallel_workers_per_gather = 2, the performance degradation will
> not happen in most cases.
> There is sometimes noise (performance degradation), but most results
> (about 80%) are good.

Thank you. The results indicate that it depends on the degree of parallelism whether the gain from parallelism outweighs the loss of parallel insert operations, at least in the bitmap scan case.

But can we conclude that this is limited to bitmap scan? Even if that's the case, the planner does not have information about insert operations to choose other plans like serial execution or parallel sequential scan. Should we encourage the user in the manual to tune parameters and find the fastest plan?

Regards
Takayuki Tsunakawa
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> If we disable bitmapscan, the performance degradation does not seem to happen.

Yes, but that's because the hundreds-of-times-slower sequential scan hides the insert time. Furthermore, as an aside, Worker 3 does much of the work in the parallel sequential scan + parallel insert case, while the load is well balanced in the parallel bitmap scan + parallel insert case.

Oracle and SQL Server execute parallel DML by holding an exclusive lock on the target table. They might use some special path for parallel DML to mitigate contention.

[serial bitmap scan + serial insert]
-> Bitmap Heap Scan on public.x (cost=3272.20..3652841.26 rows=79918 width=8) (actual time=8.096..41.005 rows=129999 loops=1)
...
Execution Time: 360.547 ms

[parallel bitmap scan + parallel insert]
-> Parallel Bitmap Heap Scan on public.x (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5)
...
Execution Time: 382.776 ms

[serial sequential scan + serial insert]
-> Seq Scan on public.x (cost=0.00..5081085.52 rows=81338 width=8) (actual time=0.010..18997.317 rows=129999 loops=1)
...
Execution Time: 19311.700 ms

[parallel sequential scan + parallel insert]
-> Parallel Seq Scan on public.x (cost=0.00..2081082.88 rows=20334 width=8) (actual time=4001.641..5287.248 rows=32500 loops=4)
...
Execution Time: 5488.493 ms

Regards
Takayuki Tsunakawa
Greg:
Thanks for more debugging.
Cheers
On Fri, Feb 12, 2021 at 5:30 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> > If I disable parallel_leader_participation:
> >
> > For max_parallel_workers_per_gather = 4, it still has performance
> > degradation.
> >
> > For max_parallel_workers_per_gather = 2, the performance degradation
> > will not happen in most cases.
> > There is sometimes noise (performance degradation), but most results
> > (about 80%) are good.
>
> Thank you. The results indicate that it depends on the degree of parallelism whether the gain from parallelism outweighs the loss of parallel insert operations, at least in the bitmap scan case.
>

That seems to be the pattern for this particular query, but I think we'd need to test a variety to determine if that's always the case.

> But can we conclude that this is limited to bitmap scan? Even if that's the case, the planner does not have information about insert operations to choose other plans like serial execution or parallel sequential scan. Should we encourage the user in the manual to tune parameters and find the fastest plan?
>

It's all based on path costs, so we need to analyze and compare the costing calculations done in this particular case against other cases, and the values of the various parameters (costsize.c).
It's not difficult to determine for a parallel ModifyTablePath if it has a BitmapHeapPath subpath - perhaps total_cost needs adjustment (increase) for this case - and that will influence the planner to choose a cheaper path.
I was able to easily test the effect of doing this in the debugger - by increasing total_cost in cost_modifytable() for the parallel bitmap heap scan case, the planner then chose a serial Insert + bitmap heap scan, because it then had a cheaper cost.
Of course, we need to better understand the problem and the observed patterns in order to get a better feel for how those costs should be adjusted.

Regards,
Greg Nancarrow
Fujitsu Australia
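(As a very rough illustration of the kind of adjustment described above: the cost_modifytable() name follows the patch, but the condition, the single-subpath access and the penalty factor are assumptions for the sketch, not settled design.)

/* In the patch's costing code for a parallel ModifyTable path */
static void
cost_modifytable(ModifyTablePath *path)
{
	/* ... existing cost calculations ... */

	/*
	 * Observed regressions: parallel INSERT over a parallel bitmap heap
	 * scan can be slower than the serial plan, so penalize its total
	 * cost to steer the planner toward the serial alternative until the
	 * costing is better understood.
	 */
	if (path->path.parallel_aware &&
		IsA(linitial(path->subpaths), BitmapHeapPath))
		path->path.total_cost *= 2.0;	/* arbitrary penalty factor */
}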
On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > On Thu, Feb 11, 2021 at 5:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > > > > * I think that the concerns raised by Tsunakawa-san in: > > > > > > https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com > > > > > > regarding how this interacts with plancache.c deserve a look. > > > Specifically, a plan that uses parallel insert may fail to be > > > invalidated when partitions are altered directly (that is without > > > altering their root parent). That would be because we are not adding > > > partition OIDs to PlannerGlobal.invalItems despite making a plan > > > that's based on checking their properties. See this (tested with all > > > patches applied!): > > > > > > > Does any current Postgres code add partition OIDs to > > PlannerGlobal.invalItems for a similar reason? Currently, the planner opens partitions only for SELECT queries and also adds them to the query's range table. And because they are added to the range table, their OIDs do get added to PlannerGlobal.relationOids (not invalItems, sorry!) by way of CompleteCachedPlan() calling extract_query_dependencies(), which looks at Query.rtable to decide which tables/partitions to add. > > I would have thought that, for example, partitions with a default > > column expression, using a function that is changed from SAFE to > > UNSAFE, would suffer the same plancache issue (for current parallel > > SELECT functionality) as we're talking about here - but so far I > > haven't seen any code handling this. AFAIK, default column expressions don't affect plans for SELECT queries. OTOH, consider a check constraint expression as an example. The planner may use one to exclude a partition from the plan with its constraint exclusion algorithm (separate from "partition pruning"). If the check constraint is dropped, any cached plans that used it will be invalidated. create table rp (a int) partition by range (a); create table rp1 partition of rp for values from (minvalue) to (0); create table rp2 partition of rp for values from (0) to (maxvalue); alter table rp1 add constraint chk check (a >= -5); set constraint_exclusion to on; -- forces using a cached plan set plan_cache_mode to force_generic_plan ; prepare q as select * from rp where a < -5; -- planner excluded rp1 because of the contradictory constraint explain execute q; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) -- constraint dropped, plancache inval hook invoked alter table rp1 drop constraint chk ; -- old plan invalidated, new one made explain execute q; QUERY PLAN --------------------------------------------------------- Seq Scan on rp1 rp (cost=0.00..41.88 rows=850 width=4) Filter: (a < '-5'::integer) (2 rows) > > (Currently invalItems seems to support PROCID and TYPEOID; relation > > OIDs seem to be handled through a different mechanism).. > > > Can you elaborate on what you believe is required here, so that the > > partition OID dependency is registered and the altered partition > > results in the plan being invalidated? > > Thanks in advance for any help you can provide here. 
> > Actually, I tried adding the following in the loop that checks the > parallel-safety of each partition and it seemed to work: > > glob->relationOids = > lappend_oid(glob->relationOids, pdesc->oids[i]); > > Can you confirm, is that what you were referring to? Right. I had mistakenly mentioned PlannerGlobal.invalItems, sorry. Although it gets the job done, I'm not sure if manipulating relationOids from max_parallel_hazard() or its subroutines is okay, but I will let the committer decide that. As I mentioned above, the person who designed this decided for some reason that it is extract_query_dependencies()'s job to populate PlannerGlobal.relationOids/invalItems. > (note that I've already updated the code to use > CreatePartitionDirectory() and PartitionDirectoryLookup()) I will check your v16 to check if that indeed does the intended thing. -- Amit Langote EDB: http://www.enterprisedb.com
On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Thu, Feb 11, 2021 at 5:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > > > > > > * I think that the concerns raised by Tsunakawa-san in: > > > > > > > > https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com > > > > > > > > regarding how this interacts with plancache.c deserve a look. > > > > Specifically, a plan that uses parallel insert may fail to be > > > > invalidated when partitions are altered directly (that is without > > > > altering their root parent). That would be because we are not adding > > > > partition OIDs to PlannerGlobal.invalItems despite making a plan > > > > that's based on checking their properties. See this (tested with all > > > > patches applied!): > > > > > > > > > > Does any current Postgres code add partition OIDs to > > > PlannerGlobal.invalItems for a similar reason? > > Currently, the planner opens partitions only for SELECT queries and > also adds them to the query's range table. And because they are added > to the range table, their OIDs do get added to > PlannerGlobal.relationOids (not invalItems, sorry!) by way of > CompleteCachedPlan() calling extract_query_dependencies(), which looks > at Query.rtable to decide which tables/partitions to add. > > > > I would have thought that, for example, partitions with a default > > > column expression, using a function that is changed from SAFE to > > > UNSAFE, would suffer the same plancache issue (for current parallel > > > SELECT functionality) as we're talking about here - but so far I > > > haven't seen any code handling this. > > AFAIK, default column expressions don't affect plans for SELECT > queries. OTOH, consider a check constraint expression as an example. > The planner may use one to exclude a partition from the plan with its > constraint exclusion algorithm (separate from "partition pruning"). > If the check constraint is dropped, any cached plans that used it will > be invalidated. > Sorry, I got that wrong, default column expressions are relevant to INSERT, not SELECT. > > > > Actually, I tried adding the following in the loop that checks the > > parallel-safety of each partition and it seemed to work: > > > > glob->relationOids = > > lappend_oid(glob->relationOids, pdesc->oids[i]); > > > > Can you confirm, is that what you were referring to? > > Right. I had mistakenly mentioned PlannerGlobal.invalItems, sorry. > > Although it gets the job done, I'm not sure if manipulating > relationOids from max_parallel_hazard() or its subroutines is okay, > but I will let the committer decide that. As I mentioned above, the > person who designed this decided for some reason that it is > extract_query_dependencies()'s job to populate > PlannerGlobal.relationOids/invalItems. > Yes, it doesn't really seem right doing it within max_parallel_hazard(). I tried doing it in extract_query_dependencies() instead - see attached patch - and it seems to work, but I'm not sure if there might be any unintended side-effects. Regards, Greg Nancarrow Fujitsu Australia
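(The shape of that alternative, going by the idea rather than the attached patch, whose exact contents aren't shown here, would be to extend the RTE loop in extract_query_dependencies_walker() in setrefs.c; add_partition_oid_dependencies() below is a hypothetical helper.)

	/* Inside extract_query_dependencies_walker(), for a Query node */
	foreach(lc, query->rtable)
	{
		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

		if (rte->rtekind == RTE_RELATION)
		{
			context->glob->relationOids =
				lappend_oid(context->glob->relationOids, rte->relid);

			/*
			 * Hypothetical addition: if this is the partitioned target
			 * of an INSERT whose parallel safety depended on the
			 * partitions' properties, also record each partition's OID,
			 * so that altering a partition directly invalidates the
			 * cached plan.
			 */
			if (query->commandType == CMD_INSERT &&
				rte->relkind == RELKIND_PARTITIONED_TABLE)
				add_partition_oid_dependencies(context->glob, rte->relid);
		}
	}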
On Mon, Feb 8, 2021 at 8:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > > > Did it actually use a parallel plan in your testing? > > > When I ran these tests with the Parallel INSERT patch applied, it did > > > not naturally choose a parallel plan for any of these cases. > > > > Yes, these cases pick parallel plan naturally on my test environment. > > > > postgres=# explain verbose insert into testscan select a from x where > > a<80000 or (a%2=0 and a>199900000); > > QUERY PLAN > > ---------------------------------------------------------------------- > > ----------------------------- > > Gather (cost=4346.89..1281204.64 rows=81372 width=0) > > Workers Planned: 4 > > -> Insert on public.testscan (cost=3346.89..1272067.44 rows=0 > > width=0) > > -> Parallel Bitmap Heap Scan on public.x1 > > (cost=3346.89..1272067.44 rows=20343 width=8) > > Output: x1.a, NULL::integer > > Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000)) > > Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a > > > 199900000))) > > -> BitmapOr (cost=3346.89..3346.89 rows=178808 > > width=0) > > -> Bitmap Index Scan on x1_a_idx > > (cost=0.00..1495.19 rows=80883 width=0) > > Index Cond: (x1.a < 80000) > > -> Bitmap Index Scan on x1_a_idx > > (cost=0.00..1811.01 rows=97925 width=0) > > Index Cond: (x1.a > 199900000) > > > > PSA is my postgresql.conf file, maybe you can have a look. Besides, I didn't > > do any parameters tuning in my test session. > > I reproduced this on my machine. > > I think we'd better do "analyze" before insert which helps reproduce this easier. > Like: > > ----- > analyze; > explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000); > ----- > Thanks, I tried test_bimap.sql in my own environment, and added "analyze", and I also found it naturally chose a parallel INSERT with parallel bitmap heap scan for each of these cases. However, I didn't see any performance degradation when compared against serial INSERT with bitmap heap scan. The parallel plan in these cases seems to run a bit faster. 
(Note that I'm using a release build of Postgres, and using default postgresql.conf)

test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4193.29..1255440.94 rows=74267 width=0) (actual time=210.587..212.135 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Insert on public.testscan  (cost=3193.29..1247014.24 rows=0 width=0) (actual time=195.296..195.298 rows=0 loops=5)
         Worker 0: actual time=189.512..189.514 rows=0 loops=1
         Worker 1: actual time=194.843..194.844 rows=0 loops=1
         Worker 2: actual time=193.986..193.988 rows=0 loops=1
         Worker 3: actual time=188.035..188.037 rows=0 loops=1
         ->  Parallel Bitmap Heap Scan on public.x  (cost=3193.29..1247014.24 rows=18567 width=8) (actual time=7.992..25.837 rows=26000 loops=5)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
               Rows Removed by Filter: 10000
               Heap Blocks: exact=261
               Worker 0: actual time=1.473..14.458 rows=22465 loops=1
               Worker 1: actual time=7.370..31.359 rows=30525 loops=1
               Worker 2: actual time=8.765..19.838 rows=18549 loops=1
               Worker 3: actual time=0.279..17.269 rows=23864 loops=1
               ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0) (actual time=21.775..21.777 rows=0 loops=1)
                     ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94 rows=73783 width=0) (actual time=11.961..11.961 rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                     ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21 rows=96752 width=0) (actual time=9.809..9.809 rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
 Planning Time: 0.276 ms
 Execution Time: 212.189 ms
(25 rows)

test=# truncate testscan;
TRUNCATE TABLE
test=# set max_parallel_workers_per_gather=0;
SET
test=# explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan  (cost=3193.29..3625636.35 rows=0 width=0) (actual time=241.222..241.224 rows=0 loops=1)
   ->  Bitmap Heap Scan on public.x  (cost=3193.29..3625636.35 rows=74267 width=8) (actual time=16.945..92.392 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 50000
         Heap Blocks: exact=975
         ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0) (actual time=16.735..16.736 rows=0 loops=1)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94 rows=73783 width=0) (actual time=9.222..9.223 rows=79999 loops=1)
                     Index Cond: (x.a < 80000)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21 rows=96752 width=0) (actual time=7.511..7.511 rows=100000 loops=1)
                     Index Cond: (x.a > 199900000)
 Planning Time: 0.205 ms
 Execution Time: 241.274 ms
(14 rows)

============

test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan_pk select a from x where a<80000 or (a%2=0 and a>199900000);
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4193.29..1255440.94 rows=74267 width=0) (actual time=572.242..573.683 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Insert on public.testscan_pk  (cost=3193.29..1247014.24 rows=0 width=0) (actual time=566.303..566.308 rows=0 loops=5)
         Worker 0: actual time=566.756..566.757 rows=0 loops=1
         Worker 1: actual time=564.778..564.779 rows=0 loops=1
         Worker 2: actual time=564.402..564.419 rows=0 loops=1
         Worker 3: actual time=563.748..563.749 rows=0 loops=1
         ->  Parallel Bitmap Heap Scan on public.x  (cost=3193.29..1247014.24 rows=18567 width=8) (actual time=16.479..37.327 rows=26000 loops=5)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
               Rows Removed by Filter: 10000
               Heap Blocks: exact=204
               Worker 0: actual time=17.358..36.895 rows=24233 loops=1
               Worker 1: actual time=12.711..33.538 rows=25616 loops=1
               Worker 2: actual time=15.671..35.701 rows=24831 loops=1
               Worker 3: actual time=17.656..39.310 rows=26645 loops=1
               ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0) (actual time=18.541..18.542 rows=0 loops=1)
                     ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94 rows=73783 width=0) (actual time=8.549..8.549 rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                     ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21 rows=96752 width=0) (actual time=9.990..9.990 rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
 Planning Time: 0.240 ms
 Execution Time: 573.733 ms
(25 rows)

test=# set max_parallel_workers_per_gather=0;
SET
test=# truncate testscan_pk;
TRUNCATE TABLE
test=# explain analyze verbose insert into testscan_pk select a from x where a<80000 or (a%2=0 and a>199900000);
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan_pk  (cost=3193.29..3625636.35 rows=0 width=0) (actual time=598.997..598.998 rows=0 loops=1)
   ->  Bitmap Heap Scan on public.x  (cost=3193.29..3625636.35 rows=74267 width=8) (actual time=20.153..96.858 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 50000
         Heap Blocks: exact=975
         ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0) (actual time=19.840..19.841 rows=0 loops=1)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94 rows=73783 width=0) (actual time=9.276..9.276 rows=79999 loops=1)
                     Index Cond: (x.a < 80000)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21 rows=96752 width=0) (actual time=10.562..10.562 rows=100000 loops=1)
                     Index Cond: (x.a > 199900000)
 Planning Time: 0.204 ms
 Execution Time: 599.098 ms
(14 rows)

============

test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan_index select a from x where a<80000 or (a%2=0 and a>199900000);
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4193.29..1255440.94 rows=74267 width=0) (actual time=560.460..562.386 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Insert on public.testscan_index  (cost=3193.29..1247014.24 rows=0 width=0) (actual time=553.434..553.435 rows=0 loops=5)
         Worker 0: actual time=548.751..548.752 rows=0 loops=1
         Worker 1: actual time=552.008..552.009 rows=0 loops=1
         Worker 2: actual time=553.094..553.095 rows=0 loops=1
         Worker 3: actual time=553.389..553.390 rows=0 loops=1
         ->  Parallel Bitmap Heap Scan on public.x  (cost=3193.29..1247014.24 rows=18567 width=8) (actual time=13.759..34.487 rows=26000 loops=5)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
               Rows Removed by Filter: 10000
               Heap Blocks: exact=183
               Worker 0: actual time=8.698..29.924 rows=26173 loops=1
               Worker 1: actual time=12.865..33.889 rows=27421 loops=1
               Worker 2: actual time=13.088..32.823 rows=24591 loops=1
               Worker 3: actual time=14.075..36.349 rows=26571 loops=1
               ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0) (actual time=19.356..19.357 rows=0 loops=1)
                     ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94 rows=73783 width=0) (actual time=10.330..10.330 rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                     ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21 rows=96752 width=0) (actual time=9.024..9.024 rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
 Planning Time: 0.219 ms
 Execution Time: 562.442 ms
(25 rows)

test=# set max_parallel_workers_per_gather=0;
SET
test=# truncate testscan_index;
TRUNCATE TABLE
test=# explain analyze verbose insert into testscan_index select a from x where a<80000 or (a%2=0 and a>199900000);
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan_index  (cost=3193.29..3625636.35 rows=0 width=0) (actual time=607.619..607.621 rows=0 loops=1)
   ->  Bitmap Heap Scan on public.x  (cost=3193.29..3625636.35 rows=74267 width=8) (actual time=21.001..96.283 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 50000
         Heap Blocks: exact=975
         ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0) (actual time=20.690..20.691 rows=0 loops=1)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94 rows=73783 width=0) (actual time=9.097..9.097 rows=79999 loops=1)
                     Index Cond: (x.a < 80000)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21 rows=96752 width=0) (actual time=11.591..11.591 rows=100000 loops=1)
                     Index Cond: (x.a > 199900000)
 Planning Time: 0.205 ms
 Execution Time: 607.734 ms
(14 rows)

Even when I changed the queries to return more rows from the scan, to
the point where it chose not to use a parallel INSERT bitmap heap scan
(in favour of parallel seq scan), and then forced it to by disabling
seqscan, I found that it was still at least as fast as serial INSERT
with bitmap heap scan.

Regards,
Greg Nancarrow
Fujitsu Australia
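(For anyone wanting to reproduce these timings: the exact DDL was in the test script attached earlier in the thread, so the following is only a rough sketch of the setup implied by the plan output above. The column types, row count and the testscan_index definition are assumptions reconstructed from the plans, not the original script.)

-----
-- Rough sketch of the implied schema; types and sizes are assumptions.
CREATE TABLE x (a int);
CREATE INDEX x_a_idx ON x (a);
INSERT INTO x SELECT generate_series(1, 200000000);   -- large source table
CREATE TABLE testscan (a int);                        -- plain target
CREATE TABLE testscan_pk (a int PRIMARY KEY);         -- target with a PK
CREATE TABLE testscan_index (a int);                  -- target with a secondary index
CREATE INDEX testscan_index_idx ON testscan_index (a);
ANALYZE;                                              -- as suggested above
-----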
On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > Actually, I tried adding the following in the loop that checks the
> > > parallel-safety of each partition and it seemed to work:
> > >
> > >     glob->relationOids =
> > >         lappend_oid(glob->relationOids, pdesc->oids[i]);
> > >
> > > Can you confirm, is that what you were referring to?
> >
> > Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
> >
> > Although it gets the job done, I'm not sure if manipulating
> > relationOids from max_parallel_hazard() or its subroutines is okay,
> > but I will let the committer decide that.  As I mentioned above, the
> > person who designed this decided for some reason that it is
> > extract_query_dependencies()'s job to populate
> > PlannerGlobal.relationOids/invalItems.
>
> Yes, it doesn't really seem right doing it within max_parallel_hazard().
> I tried doing it in extract_query_dependencies() instead - see
> attached patch - and it seems to work, but I'm not sure if there might
> be any unintended side-effects.

One issue I see with the patch is that it fails to consider
multi-level partitioning, because it's looking up partitions only in
the target table's PartitionDesc and no other.

@@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
             RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

             if (rte->rtekind == RTE_RELATION)
-                context->glob->relationOids =
-                    lappend_oid(context->glob->relationOids, rte->relid);
+            {
+                PlannerGlobal *glob;
+
+                glob = context->glob;
+                glob->relationOids =
+                    lappend_oid(glob->relationOids, rte->relid);
+                if (query->commandType == CMD_INSERT &&
+                    rte->relkind == RELKIND_PARTITIONED_TABLE)

The RTE whose relkind is being checked here may not be the INSERT
target relation's RTE, even though that's perhaps always true today.
So, I suggest to pull the new block out of the loop over rtable and
perform its deeds on the result RTE explicitly fetched using
rt_fetch(), preferably using a separate recursive function.  I'm
thinking something like the attached revised version.

--
Amit Langote
EDB: http://www.enterprisedb.com
On Wed, Feb 17, 2021 at 12:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > > Actually, I tried adding the following in the loop that checks the
> > > > parallel-safety of each partition and it seemed to work:
> > > >
> > > >     glob->relationOids =
> > > >         lappend_oid(glob->relationOids, pdesc->oids[i]);
> > > >
> > > > Can you confirm, is that what you were referring to?
> > >
> > > Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
> > >
> > > Although it gets the job done, I'm not sure if manipulating
> > > relationOids from max_parallel_hazard() or its subroutines is okay,
> > > but I will let the committer decide that.  As I mentioned above, the
> > > person who designed this decided for some reason that it is
> > > extract_query_dependencies()'s job to populate
> > > PlannerGlobal.relationOids/invalItems.
> >
> > Yes, it doesn't really seem right doing it within max_parallel_hazard().
> > I tried doing it in extract_query_dependencies() instead - see
> > attached patch - and it seems to work, but I'm not sure if there might
> > be any unintended side-effects.
>
> One issue I see with the patch is that it fails to consider
> multi-level partitioning, because it's looking up partitions only in
> the target table's PartitionDesc and no other.
>
> @@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
>              RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
>
>              if (rte->rtekind == RTE_RELATION)
> -                context->glob->relationOids =
> -                    lappend_oid(context->glob->relationOids, rte->relid);
> +            {
> +                PlannerGlobal *glob;
> +
> +                glob = context->glob;
> +                glob->relationOids =
> +                    lappend_oid(glob->relationOids, rte->relid);
> +                if (query->commandType == CMD_INSERT &&
> +                    rte->relkind == RELKIND_PARTITIONED_TABLE)
>
> The RTE whose relkind is being checked here may not be the INSERT
> target relation's RTE, even though that's perhaps always true today.
> So, I suggest to pull the new block out of the loop over rtable and
> perform its deeds on the result RTE explicitly fetched using
> rt_fetch(), preferably using a separate recursive function.  I'm
> thinking something like the attached revised version.
>

Thanks. Yes, I'd forgotten about the fact a partition may itself be
partitioned, so it needs to be recursive (like in the parallel-safety
checks).
Your revised version seems OK, though I do have a concern:
Is the use of "table_close(rel, NoLock)" intentional? That will keep
the lock (lockmode) until end-of-transaction.

Regards,
Greg Nancarrow
Fujitsu Australia
On Wed, Feb 17, 2021 at 10:44 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Wed, Feb 17, 2021 at 12:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > > > Actually, I tried adding the following in the loop that checks the
> > > > > parallel-safety of each partition and it seemed to work:
> > > > >
> > > > >     glob->relationOids =
> > > > >         lappend_oid(glob->relationOids, pdesc->oids[i]);
> > > > >
> > > > > Can you confirm, is that what you were referring to?
> > > >
> > > > Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
> > > >
> > > > Although it gets the job done, I'm not sure if manipulating
> > > > relationOids from max_parallel_hazard() or its subroutines is okay,
> > > > but I will let the committer decide that.  As I mentioned above, the
> > > > person who designed this decided for some reason that it is
> > > > extract_query_dependencies()'s job to populate
> > > > PlannerGlobal.relationOids/invalItems.
> > >
> > > Yes, it doesn't really seem right doing it within max_parallel_hazard().
> > > I tried doing it in extract_query_dependencies() instead - see
> > > attached patch - and it seems to work, but I'm not sure if there might
> > > be any unintended side-effects.
> >
> > One issue I see with the patch is that it fails to consider
> > multi-level partitioning, because it's looking up partitions only in
> > the target table's PartitionDesc and no other.
> >
> > @@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
> >              RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
> >
> >              if (rte->rtekind == RTE_RELATION)
> > -                context->glob->relationOids =
> > -                    lappend_oid(context->glob->relationOids, rte->relid);
> > +            {
> > +                PlannerGlobal *glob;
> > +
> > +                glob = context->glob;
> > +                glob->relationOids =
> > +                    lappend_oid(glob->relationOids, rte->relid);
> > +                if (query->commandType == CMD_INSERT &&
> > +                    rte->relkind == RELKIND_PARTITIONED_TABLE)
> >
> > The RTE whose relkind is being checked here may not be the INSERT
> > target relation's RTE, even though that's perhaps always true today.
> > So, I suggest to pull the new block out of the loop over rtable and
> > perform its deeds on the result RTE explicitly fetched using
> > rt_fetch(), preferably using a separate recursive function.  I'm
> > thinking something like the attached revised version.
>
> Thanks. Yes, I'd forgotten about the fact a partition may itself be
> partitioned, so it needs to be recursive (like in the parallel-safety
> checks).
> Your revised version seems OK, though I do have a concern:
> Is the use of "table_close(rel, NoLock)" intentional? That will keep
> the lock (lockmode) until end-of-transaction.

I think we always keep any locks on relations that are involved in a
plan until end-of-transaction.  What if a partition is changed in an
unsafe manner between being considered safe for parallel insertion and
actually performing the parallel insert?

BTW, I just noticed that extract_query_dependencies() runs on a
rewritten, but not-yet-planned query tree; that is, I didn't know that
extract_query_dependencies() only populates the CachedPlanSource's
relationOids and not CachedPlan's.  The former is only for tracking
the dependencies of an unplanned Query, so partitions should never be
added to it.  Instead, they should be added to
PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan),
which is kind of what your earlier patch did.  Needless to say,
PlanCacheRelCallback checks both CachedPlanSource.relationOids and
PlannedStmt.relationOids, so if it receives a message about a
partition, its OID is matched from the latter.

All that is to say that we should move our code to add partition OIDs
as plan dependencies to somewhere in set_plan_references(), which
otherwise populates PlannedStmt.relationOids.  I updated the patch to
do that.

It also occurred to me that we can avoid pointlessly adding
partitions if the final plan won't use parallelism.  For that, the
patch adds a check of glob->parallelModeNeeded, which seems to do the
trick, though I don't know if that's the correct way of doing that.

--
Amit Langote
EDB: http://www.enterprisedb.com
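(To make the plan-cache dependency being discussed here concrete, the observable behaviour can be sketched in SQL alone. Object names below are hypothetical, and the plan shapes assume the Parallel INSERT patch is applied:)

-----
-- A cached plan for an INSERT into a partitioned table must be invalidated
-- when a partition (not just the parent) changes:
CREATE TABLE ptbl (a int) PARTITION BY RANGE (a);
CREATE TABLE ptbl_p1 PARTITION OF ptbl FOR VALUES FROM (0) TO (1000000);
CREATE TABLE stbl (a int);

PREPARE ins AS INSERT INTO ptbl SELECT * FROM stbl;
EXPLAIN EXECUTE ins;            -- with the patch, may choose a parallel plan

CREATE FUNCTION ptrig_fn() RETURNS trigger LANGUAGE plpgsql PARALLEL UNSAFE
AS $$ BEGIN RETURN NEW; END; $$;
CREATE TRIGGER ptrig BEFORE INSERT ON ptbl_p1
    FOR EACH ROW EXECUTE FUNCTION ptrig_fn();

-- The partition is now parallel-unsafe; only if ptbl_p1's OID was recorded
-- as a plan dependency does the invalidation force a re-plan here:
EXPLAIN EXECUTE ins;            -- expected: non-parallel plan
-----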
On Thu, Feb 18, 2021 at 12:34 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > Your revised version seems OK, though I do have a concern:
> > Is the use of "table_close(rel, NoLock)" intentional? That will keep
> > the lock (lockmode) until end-of-transaction.
>
> I think we always keep any locks on relations that are involved in a
> plan until end-of-transaction.  What if a partition is changed in an
> unsafe manner between being considered safe for parallel insertion and
> actually performing the parallel insert?
>
> BTW, I just noticed that extract_query_dependencies() runs on a
> rewritten, but not-yet-planned query tree; that is, I didn't know that
> extract_query_dependencies() only populates the CachedPlanSource's
> relationOids and not CachedPlan's.  The former is only for tracking
> the dependencies of an unplanned Query, so partitions should never be
> added to it.  Instead, they should be added to
> PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan),
> which is kind of what your earlier patch did.  Needless to say,
> PlanCacheRelCallback checks both CachedPlanSource.relationOids and
> PlannedStmt.relationOids, so if it receives a message about a
> partition, its OID is matched from the latter.
>
> All that is to say that we should move our code to add partition OIDs
> as plan dependencies to somewhere in set_plan_references(), which
> otherwise populates PlannedStmt.relationOids.  I updated the patch to
> do that.

OK, understood. Thanks for the detailed explanation.

> It also occurred to me that we can avoid pointlessly adding
> partitions if the final plan won't use parallelism.  For that, the
> patch adds a check of glob->parallelModeNeeded, which seems to do the
> trick, though I don't know if that's the correct way of doing that.

I'm not sure it's pointless adding partitions even in the case of NOT
using parallelism, because we may be relying on the result of
parallel-safety checks on partitions in both cases.
For example, insert_parallel.sql currently includes a test (that you
originally provided in a previous post) that checks that a non-parallel
plan is generated after a parallel-unsafe trigger is created on a
partition involved in the INSERT.
If I further add to that test by then dropping that trigger and then
again using EXPLAIN to see what plan is generated, then I'd expect a
parallel plan to be generated, but with the setrefs-v3.patch it still
generates a non-parallel plan. So I think the
"&& glob->parallelModeNeeded" part of the test needs to be removed.

Regards,
Greg Nancarrow
Fujitsu Australia
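(Continuing the hypothetical SQL sketch from a few messages up, this is the drop-trigger scenario being described:)

-----
-- After dropping the parallel-unsafe trigger, a fresh plan should be able
-- to go parallel again -- which the "&& glob->parallelModeNeeded" test
-- would prevent, since the dependency would never have been recorded.
DROP TRIGGER ptrig ON ptbl_p1;
EXPLAIN EXECUTE ins;            -- expected: parallel plan once more
-----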
On Thu, Feb 18, 2021 at 10:03 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Thu, Feb 18, 2021 at 12:34 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > All that is to say that we should move our code to add partition OIDs
> > as plan dependencies to somewhere in set_plan_references(), which
> > otherwise populates PlannedStmt.relationOids.  I updated the patch to
> > do that.
>
> OK, understood. Thanks for the detailed explanation.
>
> > It also occurred to me that we can avoid pointlessly adding
> > partitions if the final plan won't use parallelism.  For that, the
> > patch adds a check of glob->parallelModeNeeded, which seems to do the
> > trick, though I don't know if that's the correct way of doing that.
>
> I'm not sure it's pointless adding partitions even in the case of NOT
> using parallelism, because we may be relying on the result of
> parallel-safety checks on partitions in both cases.
> For example, insert_parallel.sql currently includes a test (that you
> originally provided in a previous post) that checks that a non-parallel
> plan is generated after a parallel-unsafe trigger is created on a
> partition involved in the INSERT.
> If I further add to that test by then dropping that trigger and then
> again using EXPLAIN to see what plan is generated, then I'd expect a
> parallel plan to be generated, but with the setrefs-v3.patch it still
> generates a non-parallel plan. So I think the
> "&& glob->parallelModeNeeded" part of the test needs to be removed.

Ah, okay, I didn't retest my case after making that change.

Looking at this again, I am a bit concerned about going over the whole
partition tree *twice* when making a parallel plan for insert into
partitioned tables.  Maybe we should do what you did in your first
attempt slightly differently -- add partition OIDs during the
max_parallel_hazard() initiated scan of the partition tree as you did.
Instead of adding them directly to PlannerGlobal.relationOids, add to,
say, PlannerInfo.targetPartitionOids and have set_plan_references() do
list_concat(glob->relationOids, list_copy(root->targetPartitionOids))
in the same place as setrefs-v3 does
add_target_partition_oids_recurse().  Thoughts?

--
Amit Langote
EDB: http://www.enterprisedb.com
On Thu, Feb 18, 2021 at 4:35 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Looking at this again, I am a bit concerned about going over the whole
> partition tree *twice* when making a parallel plan for insert into
> partitioned tables.  Maybe we should do what you did in your first
> attempt slightly differently -- add partition OIDs during the
> max_parallel_hazard() initiated scan of the partition tree as you did.
> Instead of adding them directly to PlannerGlobal.relationOids, add to,
> say, PlannerInfo.targetPartitionOids and have set_plan_references() do
> list_concat(glob->relationOids, list_copy(root->targetPartitionOids))
> in the same place as setrefs-v3 does
> add_target_partition_oids_recurse().  Thoughts?
>

Agreed, that might be a better approach, and that way we're also only
recording the partition OIDs that the parallel-safety checks are
relying on.
I'll give it a go and see if I can detect any issues with this method.

Regards,
Greg Nancarrow
Fujitsu Australia
Posting a new version of the patches, with the following updates:

- Moved the update of glob->relationOids (i.e. the addition of partition
  OIDs that the plan depends on, resulting from parallel-safety checks)
  from within max_parallel_hazard() to set_plan_references().
- Added an extra test for partition plan-cache invalidation.
- Simplified the query_has_modifying_cte() temporary bug-fix.
- Added a comment explaining why parallel-safety of partition column
  defaults is not checked.
- Minor simplification: hasSubQuery return to
  is_parallel_possible_for_modify().

Regards,
Greg Nancarrow
Fujitsu Australia
From: Greg Nancarrow <gregn4422@gmail.com>
--------------------------------------------------
On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> (8)
> +       /*
> +        * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> +        * the relation, and this would result in creation of new CommandIds
> +        * on insert/update/delete and this isn't supported in a parallel
> +        * worker (but is safe in the parallel leader).
> +        */
> +       trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> +       if (trigtype == RI_TRIGGER_FK)
> +       {
> +           if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> +               return true;
> +       }
>
> Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because
> RI_TRIGGER_FK triggers do not generate command IDs. See RI_FKey_check(),
> which is called in the RI_TRIGGER_FK case. In there, ri_PerformCheck() is
> called with the detectNewRows argument set to false, which causes
> CommandCounterIncrement() to not be called.

Hmmm, I'm not sure that you have read and interpreted the patch code
correctly.
The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign
key, and an insert into such a table will generate a new commandId (so we
must avoid that, as we don't currently have the technology to support
sharing of new command IDs across the participants in the parallel
operation). This is what the code comment says; it does not say that such
a trigger generates a new command ID.

See Amit's updated comment here:
https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665

In addition, the 2nd patch has an explicit test case for this (testing
insert into a table that has a FK).
--------------------------------------------------


First of all, I anticipate this parallel INSERT SELECT feature will
typically shine, and is expected to work, in ETL or ELT into a data
warehouse or an ODS for analytics. Bearing that in mind, let me list some
issues or questions below. But the current state of the patch would of
course be attractive in some workloads, so I don't think these are
necessarily blockers.


(1)
According to the classic book "The Data Warehouse Toolkit" and the website
[1] by its author, the fact table (large transaction history) in the data
warehouse has foreign keys referencing the dimension tables (small or
medium-sized master or reference data). So, parallel insert will be
effective if it works when loading data into the fact table with foreign
keys.

To answer the above question, I'm assuming:

CREATE TABLE some_dimension (key_col int PRIMARY KEY);
CREATE TABLE some_fact (some_key int REFERENCES some_dimension);
INSERT INTO some_fact SELECT ...;

My naive question is, "why should new command IDs be generated to check
foreign key constraints in this INSERT case? The check just reads the
parent (some_dimension table here)..."

Looking a bit deeper into the code, although ri_PerformCheck() itself
tries to avoid generating command IDs, it calls _SPI_execute_snapshot()
with the read_only argument always set to false. It in turn calls
_SPI_execute_plan() -> CommandCounterIncrement() as follows:

[_SPI_execute_plan()]
        /*
         * If not read-only mode, advance the command counter before each
         * command and update the snapshot.
         */
        if (!read_only && !plan->no_snapshots)
        {
            CommandCounterIncrement();
            UpdateActiveSnapshotCommandId();
        }

Can't we pass true to read_only from ri_PerformCheck() in some cases?


(2)
Likewise, dimension tables have surrogate keys that are typically
implemented as a sequence or an identity column. It is suggested that even
fact tables sometimes (or often?) have surrogate keys. But the current
patch does not parallelize the statement when the target table has a
sequence or an identity column.

I was looking at the sequence code, and my naive (again) idea is that the
parallel leader and workers allocate numbers from the sequence
independently, and set the largest number of them as the session's currval
at the end of the parallel operation. We would have to note in the
documentation that gaps in the sequence numbers will arise and not be used
in parallel DML.


(3)
As Hou-san demonstrated, the current patch causes the resulting table and
index to become larger when inserted in parallel than when inserted
serially. This could be a problem for analytics use cases where the table
is just inserted and read only afterwards. We could advise the user to run
REINDEX CONCURRENTLY after loading data, but what about tables?

BTW, I don't know if Oracle and SQL Server have similar issues. They may
have some reason for this, which is why they take an exclusive lock on the
target table.


(4)
When the target table is partitioned, is the INSERT parallelized among its
partitions? Some plan like:

Parallel Append on parent_table
  -> Insert on partition1
  -> Insert on partition2


[1] Fact Table Surrogate Key | Kimball Dimensional Modeling Techniques
https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/fact-surrogate-key/


Regards
Takayuki Tsunakawa
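(A sketch of the surrogate-key scenario from point (2) above; the DDL here is an assumption built from the description, not from the patch's tests:)

-----
-- Point (2): a surrogate key implemented as an identity column. Per the
-- description above, the current patch does not parallelize the statement
-- at all in this case, since the identity column's default calls
-- nextval(), which is a parallel-unsafe function.
CREATE TABLE some_dimension2 (
    key_col int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name    text
);
CREATE TABLE staging (name text);

EXPLAIN INSERT INTO some_dimension2 (name) SELECT name FROM staging;
-- expected with the current patch: a serial plan
-----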
On Thu, Feb 18, 2021 at 11:05 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > > It also occurred to me that we can avoid pointless adding of
> > > partitions if the final plan won't use parallelism.  For that, the
> > > patch adds checking glob->parallelModeNeeded, which seems to do the
> > > trick though I don't know if that's the correct way of doing that.
> >
> > I'm not sure it's pointless adding partitions even in the case of NOT
> > using parallelism, because we may be relying on the result of
> > parallel-safety checks on partitions in both cases.
> > For example, insert_parallel.sql currently includes a test (that you
> > originally provided in a previous post) that checks that a non-parallel
> > plan is generated after a parallel-unsafe trigger is created on a
> > partition involved in the INSERT.
> > If I further add to that test by then dropping that trigger and then
> > again using EXPLAIN to see what plan is generated, then I'd expect a
> > parallel plan to be generated, but with the setrefs-v3.patch it still
> > generates a non-parallel plan. So I think the
> > "&& glob->parallelModeNeeded" part of the test needs to be removed.
>
> Ah, okay, I didn't retest my case after making that change.
>

Greg has a point here, but I feel something along the previous lines
(having a test of glob->parallelModeNeeded) is better. We only want to
invalidate the plan if the prepared plan is unsafe to execute next
time. It is quite possible that there are unsafe triggers on different
partitions and only one of them is dropped, so next time planning will
again yield the same non-parallel plan. If we agree with that, I think
it is better to add this dependency in set_plan_refs (along with the
Gather node handling).

Also, if we agree that we don't have any cheap way to determine the
parallel-safety of partitioned relations, then shall we consider the
patch being discussed [1] to be combined here?

Amit L, do you agree with that line of thought, or do you have any
other ideas? I feel we should focus on getting the first patch of Greg
(v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT, along with
a test case patch) and Hou-San's patch discussed at [1] ready. Then we
can focus on v18-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.
Because even if we get just the first patch, that is good enough for
some users. What do you think?

[1] - https://www.postgresql.org/message-id/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
On Fri, Feb 19, 2021 at 10:13 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Greg Nancarrow <gregn4422@gmail.com>
> --------------------------------------------------
> On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
> > (8)
> > +       /*
> > +        * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> > +        * the relation, and this would result in creation of new CommandIds
> > +        * on insert/update/delete and this isn't supported in a parallel
> > +        * worker (but is safe in the parallel leader).
> > +        */
> > +       trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> > +       if (trigtype == RI_TRIGGER_FK)
> > +       {
> > +           if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> > +               return true;
> > +       }
> >
> > Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because
> > RI_TRIGGER_FK triggers do not generate command IDs. See RI_FKey_check(),
> > which is called in the RI_TRIGGER_FK case. In there, ri_PerformCheck() is
> > called with the detectNewRows argument set to false, which causes
> > CommandCounterIncrement() to not be called.
>
> Hmmm, I'm not sure that you have read and interpreted the patch code
> correctly.
> The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign
> key, and an insert into such a table will generate a new commandId (so we
> must avoid that, as we don't currently have the technology to support
> sharing of new command IDs across the participants in the parallel
> operation). This is what the code comment says; it does not say that such
> a trigger generates a new command ID.
>
> See Amit's updated comment here:
> https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665
>
> In addition, the 2nd patch has an explicit test case for this (testing
> insert into a table that has a FK).
> --------------------------------------------------
>
> First of all, I anticipate this parallel INSERT SELECT feature will
> typically shine, and is expected to work, in ETL or ELT into a data
> warehouse or an ODS for analytics. Bearing that in mind, let me list some
> issues or questions below. But the current state of the patch would of
> course be attractive in some workloads, so I don't think these are
> necessarily blockers.
>
> (1)
> According to the classic book "The Data Warehouse Toolkit" and the website
> [1] by its author, the fact table (large transaction history) in the data
> warehouse has foreign keys referencing the dimension tables (small or
> medium-sized master or reference data). So, parallel insert will be
> effective if it works when loading data into the fact table with foreign
> keys.
>
> To answer the above question, I'm assuming:
>
> CREATE TABLE some_dimension (key_col int PRIMARY KEY);
> CREATE TABLE some_fact (some_key int REFERENCES some_dimension);
> INSERT INTO some_fact SELECT ...;
>
> My naive question is, "why should new command IDs be generated to check
> foreign key constraints in this INSERT case? The check just reads the
> parent (some_dimension table here)..."
>

It is quite possible what you are saying is correct, but I feel that is
not this patch's fault. So, won't it be better to discuss this in a
separate thread?

> (2)
> Likewise, dimension tables have surrogate keys that are typically
> implemented as a sequence or an identity column. It is suggested that even
> fact tables sometimes (or often?) have surrogate keys. But the current
> patch does not parallelize the statement when the target table has a
> sequence or an identity column.
>
> I was looking at the sequence code, and my naive (again) idea is that the
> parallel leader and workers allocate numbers from the sequence
> independently, and set the largest number of them as the session's currval
> at the end of the parallel operation. We would have to note in the
> documentation that gaps in the sequence numbers will arise and not be used
> in parallel DML.
>

Good use case, but again, I think this can be done as a separate patch.

> (3)
> As Hou-san demonstrated, the current patch causes the resulting table and
> index to become larger when inserted in parallel than when inserted
> serially. This could be a problem for analytics use cases where the table
> is just inserted and read only afterwards. We could advise the user to run
> REINDEX CONCURRENTLY after loading data, but what about tables?
>

I think here you are talking about the third patch (Parallel Inserts). I
guess if one had run inserts in parallel from psql, a similar behavior
would have been observed. For tables, it might lead to better results
once we have the patch discussed at [1]. Actually, this needs more
investigation.

[1] - https://www.postgresql.org/message-id/20200508072545.GA9701%40telsasoft.com

--
With Regards,
Amit Kapila.
On Thu, Mar 18, 2021 at 8:30 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
> index ff1b642722..d5d356f2de 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> @@ -1338,8 +1338,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
>      If a table parameter value is set and the
>      equivalent <literal>toast.</literal> parameter is not, the TOAST table
>      will use the table's parameter value.
> -    Specifying these parameters for partitioned tables is not supported,
> -    but you may specify them for individual leaf partitions.
> +    These parameters, with the exception of
> +    <literal>parallel_insert_enabled</literal>,
> +    are not supported on partitioned tables, but may be specified for
> +    individual leaf partitions.
>     </para>
>

Your patch looks good to me. While checking this, I noticed a typo in
the previous patch:

- planner parameter <varname>parallel_workers</varname>.
+ planner parameter <varname>parallel_workers</varname> and
+ <varname>parallel_insert_enabled</varname>.

Here, it should be /planner parameter/planner parameters.

--
With Regards,
Amit Kapila.
> > If a table parameter value is set and the
> > equivalent <literal>toast.</literal> parameter is not, the TOAST table
> > will use the table's parameter value.
> > -    Specifying these parameters for partitioned tables is not supported,
> > -    but you may specify them for individual leaf partitions.
> > +    These parameters, with the exception of
> > +    <literal>parallel_insert_enabled</literal>,
> > +    are not supported on partitioned tables, but may be specified for
> > +    individual leaf partitions.
> >    </para>
>
> Your patch looks good to me. While checking this, I noticed a typo in
> the previous patch:
>
> - planner parameter <varname>parallel_workers</varname>.
> + planner parameter <varname>parallel_workers</varname> and
> + <varname>parallel_insert_enabled</varname>.
>
> Here, it should be /planner parameter/planner parameters.

Thanks Amit and Justin for pointing this out!
The changes look good to me.

Best regards,
houzj
On Thu, Mar 18, 2021 at 9:04 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> > > If a table parameter value is set and the
> > > equivalent <literal>toast.</literal> parameter is not, the TOAST table
> > > will use the table's parameter value.
> > > -    Specifying these parameters for partitioned tables is not supported,
> > > -    but you may specify them for individual leaf partitions.
> > > +    These parameters, with the exception of
> > > +    <literal>parallel_insert_enabled</literal>,
> > > +    are not supported on partitioned tables, but may be specified for
> > > +    individual leaf partitions.
> > >    </para>
> >
> > Your patch looks good to me. While checking this, I noticed a typo in
> > the previous patch:
> >
> > - planner parameter <varname>parallel_workers</varname>.
> > + planner parameter <varname>parallel_workers</varname> and
> > + <varname>parallel_insert_enabled</varname>.
> >
> > Here, it should be /planner parameter/planner parameters.
>
> Thanks Amit and Justin for pointing this out!
> The changes look good to me.
>

Pushed!

--
With Regards,
Amit Kapila.
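(For reference, the table-level option discussed in the doc patch above can be used like this. The table definition is just an example, and the GUC name is an assumption based on the committed GUC/reloption patch referred to below rather than anything quoted in this thread:)

-----
-- Table-level switch; per the doc change above, parallel_insert_enabled
-- may also be specified on partitioned tables, unlike other parameters.
CREATE TABLE measurements (ts timestamptz, v float8)
    WITH (parallel_insert_enabled = off);
ALTER TABLE measurements SET (parallel_insert_enabled = on);

-- Session-level switch (assumed GUC name):
SET enable_parallel_insert = off;
-----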
Since the "Parallel SELECT for INSERT" patch and related GUC/reloption patch have been committed, I have now rebased and attached the rest of the original patchset, which includes:
- Additional tests for "Parallel SELECT for INSERT"
- Parallel INSERT functionality
- Test and documentation updates for Parallel INSERT
Regards,
Greg Nancarrow
Fujitsu Australia
> Since the "Parallel SELECT for INSERT" patch and related GUC/reloption patch have been committed, I have now rebased andattached the rest of the original patchset, > which includes: >- Additional tests for "Parallel SELECT for INSERT" >- Parallel INSERT functionality >- Test and documentation updates for Parallel INSERT Hi, I noticed that some comments may need updated since we introduced parallel insert in this patch. 1) src/backend/executor/execMain.c * Don't allow writes in parallel mode. Supporting UPDATE and DELETE * would require (a) storing the combocid hash in shared memory, rather * than synchronizing it just once at the start of parallelism, and (b) an * alternative to heap_update()'s reliance on xmax for mutual exclusion. * INSERT may have no such troubles, but we forbid it to simplify the * checks. As we will allow INSERT in parallel mode, we'd better change the comment here. 2) src/backend/storage/lmgr/README dangers are modest. The leader and worker share the same transaction, snapshot, and combo CID hash, and neither can perform any DDL or, indeed, write any data at all. Thus, for either to read a table locked exclusively by The same as 1), parallel insert is the exception. 3) src/backend/storage/lmgr/README mutual exclusion method for such cases. Currently, the parallel mode is strictly read-only, but now we have the infrastructure to allow parallel inserts and parallel copy. May be we can say: +mutual exclusion method for such cases. Currently, we only allowed parallel +inserts, but we already have the infrastructure to allow parallel copy. Best regards, houzj
On Mon, Mar 22, 2021 at 2:30 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> I noticed that some comments may need to be updated since we introduced
> parallel insert in this patch.
>
> 1) src/backend/executor/execMain.c
>
>  * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
>  * would require (a) storing the combocid hash in shared memory, rather
>  * than synchronizing it just once at the start of parallelism, and (b) an
>  * alternative to heap_update()'s reliance on xmax for mutual exclusion.
>  * INSERT may have no such troubles, but we forbid it to simplify the
>  * checks.
>
> As we will allow INSERT in parallel mode, we'd better change the comment here.
>

Thanks, it does need to be updated for parallel INSERT.
I was thinking of the following change:

- * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
- * would require (a) storing the combocid hash in shared memory, rather
- * than synchronizing it just once at the start of parallelism, and (b) an
- * alternative to heap_update()'s reliance on xmax for mutual exclusion.
- * INSERT may have no such troubles, but we forbid it to simplify the
- * checks.
+ * Except for INSERT, don't allow writes in parallel mode.  Supporting
+ * UPDATE and DELETE would require (a) storing the combocid hash in shared
+ * memory, rather than synchronizing it just once at the start of
+ * parallelism, and (b) an alternative to heap_update()'s reliance on xmax
+ * for mutual exclusion.

> 2) src/backend/storage/lmgr/README
>
> dangers are modest.  The leader and worker share the same transaction,
> snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
> write any data at all.  Thus, for either to read a table locked exclusively by
>
> The same as 1), parallel insert is the exception.
>

I agree, it needs to be updated too, to account for parallel INSERT now
being supported.

-write any data at all. ...
+write any data at all (with the exception of parallel insert). ...

> 3) src/backend/storage/lmgr/README
>
> mutual exclusion method for such cases.  Currently, the parallel mode is
> strictly read-only, but now we have the infrastructure to allow parallel
> inserts and parallel copy.
>
> Maybe we can say:
> +mutual exclusion method for such cases.  Currently, we only allow parallel
> +inserts, but we already have the infrastructure to allow parallel copy.
>

Yes, agree, something like:

-mutual exclusion method for such cases.  Currently, the parallel mode is
-strictly read-only, but now we have the infrastructure to allow parallel
-inserts and parallel copy.
+mutual exclusion method for such cases.  Currently, only parallel insert is
+allowed, but we have the infrastructure to allow parallel copy.

Let me know if these changes seem OK to you.

Regards,
Greg Nancarrow
Fujitsu Australia
> > I noticed that some comments may need to be updated since we introduced
> > parallel insert in this patch.
> >
> > 1) src/backend/executor/execMain.c
> >
> >  * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
> >  * would require (a) storing the combocid hash in shared memory, rather
> >  * than synchronizing it just once at the start of parallelism, and (b) an
> >  * alternative to heap_update()'s reliance on xmax for mutual exclusion.
> >  * INSERT may have no such troubles, but we forbid it to simplify the
> >  * checks.
> >
> > As we will allow INSERT in parallel mode, we'd better change the comment
> > here.
>
> Thanks, it does need to be updated for parallel INSERT.
> I was thinking of the following change:
>
> - * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
> - * would require (a) storing the combocid hash in shared memory, rather
> - * than synchronizing it just once at the start of parallelism, and (b) an
> - * alternative to heap_update()'s reliance on xmax for mutual exclusion.
> - * INSERT may have no such troubles, but we forbid it to simplify the
> - * checks.
> + * Except for INSERT, don't allow writes in parallel mode.  Supporting
> + * UPDATE and DELETE would require (a) storing the combocid hash in shared
> + * memory, rather than synchronizing it just once at the start of
> + * parallelism, and (b) an alternative to heap_update()'s reliance on xmax
> + * for mutual exclusion.
>
> > 2) src/backend/storage/lmgr/README
> >
> > dangers are modest.  The leader and worker share the same transaction,
> > snapshot, and combo CID hash, and neither can perform any DDL or,
> > indeed, write any data at all.  Thus, for either to read a table
> > locked exclusively by
> >
> > The same as 1), parallel insert is the exception.
>
> I agree, it needs to be updated too, to account for parallel INSERT now
> being supported.
>
> -write any data at all. ...
> +write any data at all (with the exception of parallel insert). ...
>
> > 3) src/backend/storage/lmgr/README
> >
> > mutual exclusion method for such cases.  Currently, the parallel mode
> > is strictly read-only, but now we have the infrastructure to allow
> > parallel inserts and parallel copy.
> >
> > Maybe we can say:
> > +mutual exclusion method for such cases.  Currently, we only allow
> > +parallel inserts, but we already have the infrastructure to allow parallel copy.
>
> Yes, agree, something like:
>
> -mutual exclusion method for such cases.  Currently, the parallel mode is
> -strictly read-only, but now we have the infrastructure to allow parallel
> -inserts and parallel copy.
> +mutual exclusion method for such cases.  Currently, only parallel
> +insert is allowed, but we have the infrastructure to allow parallel copy.
>
> Let me know if these changes seem OK to you.

Yes, these changes look good to me.

Best regards,
houzj
On Mon, Mar 22, 2021 at 6:28 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> > Let me know if these changes seem OK to you.
>
> Yes, these changes look good to me.

Posting an updated set of patches with these changes...

Regards,
Greg Nancarrow
Fujitsu Australia
On Mon, Mar 22, 2021 at 3:57 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Mar 22, 2021 at 6:28 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > > Let me know if these changes seem OK to you.
> >
> > Yes, these changes look good to me.
>
> Posting an updated set of patches with these changes...
>

I have marked this as Returned with Feedback. There is a lot of work to
do for this patch as per the feedback given on pgsql-committers [1].

[1] - https://www.postgresql.org/message-id/E1lMiB9-0001c3-SY%40gemulon.postgresql.org

--
With Regards,
Amit Kapila.