Thread: Parallel INSERT (INTO ... SELECT ...)

Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Hi Hackers,

Following on from Dilip Kumar's POC patch for allowing parallelism of
the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC
patch for allowing parallelism of both the INSERT and SELECT parts,
where it can be allowed.
For cases where it can't be allowed (e.g. INSERT into a table with
foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
...) it at least allows parallelism of the SELECT part.
Obviously I've had to update the planner and executor and
parallel-worker code to make this happen, hopefully not breaking too
many things along the way.

Examples with patch applied:


(1) non-parallel:

test=# explain analyze insert into primary_tbl select * from third_tbl;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Insert on primary_tbl  (cost=0.00..154.99 rows=9999 width=12) (actual time=108.445..108.446 rows=0 loops=1)
   ->  Seq Scan on third_tbl  (cost=0.00..154.99 rows=9999 width=12) (actual time=0.009..5.282 rows=9999 loops=1)
 Planning Time: 0.132 ms
 Execution Time: 108.596 ms
(4 rows)


(2) parallel:

test=# explain analyze insert into primary_tbl select * from third_tbl;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..16.00 rows=9999 width=12) (actual time=69.870..70.310 rows=0 loops=1)
   Workers Planned: 5
   Workers Launched: 5
   ->  Parallel Insert on primary_tbl  (cost=0.00..16.00 rows=500 width=12) (actual time=59.948..59.949 rows=0 loops=6)
         ->  Parallel Seq Scan on third_tbl  (cost=0.00..80.00 rows=2500 width=12) (actual time=0.014..0.922 rows=1666 loops=6)
 Planning Time: 0.121 ms
 Execution Time: 70.438 ms
(7 rows)


(3) parallel select only (insert into table with foreign key)

test=# explain analyze insert into secondary_tbl select * from third_tbl;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Insert on secondary_tbl  (cost=0.00..80.00 rows=9999 width=12) (actual time=33.864..33.926 rows=0 loops=1)
   ->  Gather  (cost=0.00..80.00 rows=9999 width=12) (actual time=0.451..5.201 rows=9999 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Parallel Seq Scan on third_tbl  (cost=0.00..80.00 rows=2500 width=12) (actual time=0.013..0.717 rows=2000 loops=5)
 Planning Time: 0.127 ms
 Trigger for constraint secondary_tbl_index_fkey: time=331.834 calls=9999
 Execution Time: 367.342 ms
(8 rows)


Known issues/TODOs:
- Currently only for "INSERT INTO ... SELECT ...". To support "INSERT
INTO ... VALUES ..." would need additional Table AM functions for
dividing up the INSERT work amongst the workers (such functions
currently exist only for scans).
- When INSERTs are made parallel, currently the reported row-count in
the "INSERT 0 <row-count>" status only reflects the rows that the
leader has processed (not the workers) - so it is obviously less than
the actual number of rows inserted.
- Functions relating to computing the number of parallel workers for
an INSERT, and the cost of an INSERT, need work.
- "force_parallel_mode" handling was updated so that it only affects
SELECT (not INSERT) - it can't be allowed for INSERT because we're only
supporting "INSERT INTO ... SELECT ..." and don't support other types
of INSERTs, and also can't allow attempted parallel UPDATEs resulting
from "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE" etc.


Thoughts and feedback?

Regards,
Greg Nancarrow
Fujitsu Australia


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
> - When INSERTs are made parallel, currently the reported row-count in
> the "INSERT 0 <row-count>" status only reflects the rows that the
> leader has processed (not the workers) - so it is obviously less than
> the actual number of rows inserted.

Attached an updated patch which fixes this issue (for parallel
INSERTs, each worker's processed tuple count is communicated in shared
memory back to the leader, where it is added to the global
"es_processed" count).
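
Conceptually, the fix is a per-worker counter in the shared (DSM) area that
the leader folds into its own "es_processed" total once the workers finish.
A minimal self-contained C model of that bookkeeping (the struct and
function names here are illustrative, not the patch's actual structures):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORKERS 8

/* Stand-in for the shared-memory area described above: each parallel
 * worker writes its own processed-tuple count into its slot. */
typedef struct SharedInsertCounts
{
    uint64_t processed[MAX_WORKERS];    /* one slot per worker */
    int      nworkers;
} SharedInsertCounts;

/* Called by worker 'id' each time it inserts a tuple. */
static void
worker_count_tuple(SharedInsertCounts *shared, int id)
{
    shared->processed[id]++;
}

/* Called by the leader after parallel execution finishes: fold the
 * workers' counts into the leader's own processed-tuple total, giving
 * the correct "INSERT 0 <row-count>" figure. */
static uint64_t
leader_collect(const SharedInsertCounts *shared, uint64_t leader_processed)
{
    for (int i = 0; i < shared->nworkers; i++)
        leader_processed += shared->processed[i];
    return leader_processed;
}
```

In the real patch the counter would live in dynamic shared memory and be
written by each worker before it exits; the model above just shows the
accumulation step.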


Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:
On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Hi Hackers,
>
> Following on from Dilip Kumar's POC patch for allowing parallelism of
> the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC
> patch for allowing parallelism of both the INSERT and SELECT parts,
> where it can be allowed.
> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.
> Obviously I've had to update the planner and executor and
> parallel-worker code to make this happen, hopefully not breaking too
> many things along the way.

I feel this will be a very good performance improvement. +1 for this.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Andres Freund
Date:
Hi,

On 2020-09-22 14:55:21 +1000, Greg Nancarrow wrote:
> Following on from Dilip Kumar's POC patch for allowing parallelism of
> the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC
> patch for allowing parallelism of both the INSERT and SELECT parts,
> where it can be allowed.

Cool!

I think it'd be good if you outlined what your approach is to make this
safe.


> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.

I think it'd be good to do this part separately and first, independent
of whether the insert part can be parallelized.


> Obviously I've had to update the planner and executor and
> parallel-worker code to make this happen, hopefully not breaking too
> many things along the way.

Hm, it looks like you've removed a fair bit of checks, it's not clear to
me why that's safe in each instance.


> - Currently only for "INSERT INTO ... SELECT ...". To support "INSERT
> INTO ... VALUES ..." would need additional Table AM functions for
> dividing up the INSERT work amongst the workers (currently only exists
> for scans).

Hm, not entirely following. What precisely are you thinking of here?

I doubt it's really worth adding parallelism support for INSERT
... VALUES, the cost of spawning workers will almost always be higher
than the benefit.





> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
>      TupleDesc    toasttupDesc;
>      Datum        t_values[3];
>      bool        t_isnull[3];
> -    CommandId    mycid = GetCurrentCommandId(true);
> +    CommandId    mycid = GetCurrentCommandId(!IsParallelWorker());
>      struct varlena *result;
>      struct varatt_external toast_pointer;
>      union

Hm? Why do we need this in the various places you have made this change?


> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 1585861..94c8507 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
>       * inserts in general except for the cases where inserts generate a new
>       * CommandId (eg. inserts into a table having a foreign key column).
>       */
> -    if (IsParallelWorker())
> -        ereport(ERROR,
> -                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> -                 errmsg("cannot insert tuples in a parallel worker")));
> -

I'm afraid that this weakens our checks more than I'd like. What if this
ends up being invoked from inside C code?


> @@ -822,19 +822,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
>                  isdead = false;
>                  break;
>              case HEAPTUPLE_INSERT_IN_PROGRESS:
> -
>                  /*
>                   * Since we hold exclusive lock on the relation, normally the
>                   * only way to see this is if it was inserted earlier in our
>                   * own transaction.  However, it can happen in system
>                   * catalogs, since we tend to release write lock before commit
> -                 * there.  Give a warning if neither case applies; but in any
> -                 * case we had better copy it.
> +                 * there. In any case we had better copy it.
>                   */
> -                if (!is_system_catalog &&
> -                    !TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
> -                    elog(WARNING, "concurrent insert in progress within table \"%s\"",
> -                         RelationGetRelationName(OldHeap));
> +
>                  /* treat as live */
>                  isdead = false;
>                  break;
> @@ -1434,16 +1429,11 @@ heapam_index_build_range_scan(Relation heapRelation,
>                       * the only way to see this is if it was inserted earlier
>                       * in our own transaction.  However, it can happen in
>                       * system catalogs, since we tend to release write lock
> -                     * before commit there.  Give a warning if neither case
> -                     * applies.
> +                     * before commit there.
>                       */
>                      xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
>                      if (!TransactionIdIsCurrentTransactionId(xwait))
>                      {
> -                        if (!is_system_catalog)
> -                            elog(WARNING, "concurrent insert in progress within table \"%s\"",
> -                                 RelationGetRelationName(heapRelation));
> -
>                          /*
>                           * If we are performing uniqueness checks, indexing
>                           * such a tuple could lead to a bogus uniqueness

Huh, I don't think this should be necessary?


> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
> index a4944fa..9d3f100 100644
> --- a/src/backend/access/transam/varsup.c
> +++ b/src/backend/access/transam/varsup.c
> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
>      TransactionId xid;
>  
>      /*
> -     * Workers synchronize transaction state at the beginning of each parallel
> -     * operation, so we can't account for new XIDs after that point.
> -     */
> -    if (IsInParallelMode())
> -        elog(ERROR, "cannot assign TransactionIds during a parallel operation");
> -
> -    /*
>       * During bootstrap initialization, we return the special bootstrap
>       * transaction id.
>       */

Same thing, this code cannot just be allowed to be reachable. What
prevents you from assigning two different xids from different workers
etc?


> @@ -577,13 +608,6 @@ AssignTransactionId(TransactionState s)
>      Assert(s->state == TRANS_INPROGRESS);
>  
>      /*
> -     * Workers synchronize transaction state at the beginning of each parallel
> -     * operation, so we can't account for new XIDs at this point.
> -     */
> -    if (IsInParallelMode() || IsParallelWorker())
> -        elog(ERROR, "cannot assign XIDs during a parallel operation");
> -
> -    /*
>       * Ensure parent(s) have XIDs, so that a child always has an XID later
>       * than its parent.  Mustn't recurse here, or we might get a stack
>       * overflow if we're at the bottom of a huge stack of subtransactions none

Dito.


Greetings,

Andres Freund



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Tue, Sep 22, 2020 at 4:56 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>  Gather  (cost=0.00..16.00 rows=9999 width=12) (actual
> time=69.870..70.310 rows=0 loops=1)
>    Workers Planned: 5
>    Workers Launched: 5
>    ->  Parallel Insert on primary_tbl  (cost=0.00..16.00 rows=500
> width=12) (actual time=59.948..59.949 rows=0 loops=6)

Nice.  I took it for a quick spin.  I was initially surprised to see
Gather.  I suppose I thought that Parallel {Insert|Update|Delete}
might be a top level node itself, because in such a plan there is no
need to gather tuples per se.  I understand exactly why you have it
that way though: Gather is needed to control workers and handle their
errors etc, and we don't want to have to terminate parallelism anyway
(thinking of some kind of plan with multiple write subqueries).



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Sep 24, 2020 at 7:57 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Tue, Sep 22, 2020 at 4:56 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >  Gather  (cost=0.00..16.00 rows=9999 width=12) (actual
> > time=69.870..70.310 rows=0 loops=1)
> >    Workers Planned: 5
> >    Workers Launched: 5
> >    ->  Parallel Insert on primary_tbl  (cost=0.00..16.00 rows=500
> > width=12) (actual time=59.948..59.949 rows=0 loops=6)
>
> Nice.  I took it for a quick spin.  I was initially surprised to see
> Gather.  I suppose I thought that Parallel {Insert|Update|Delete}
> might be a top level node itself, because in such a plan there is no
> need to gather tuples per se.  I understand exactly why you have it
> that way though: Gather is needed to control workers and handle their
> errors etc, and we don't want to have to terminate parallelism anyway
> (thinking of some kind of plan with multiple write subqueries).
>

I have not checked the patch but I guess if we parallelise Inserts
with Returning then isn't it better to have Gather node above Parallel
Inserts?

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Sep 24, 2020 at 12:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I have not checked the patch but I guess if we parallelise Inserts
> with Returning then isn't it better to have Gather node above Parallel
> Inserts?
>

This is indeed the case with the patch applied.

For example:

test=# explain insert into primary_tbl select * from third_tbl
returning index, height;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Gather  (cost=0.00..28.15 rows=9999 width=12)
   Workers Planned: 3
   ->  Parallel Insert on primary_tbl  (cost=0.00..28.15 rows=1040 width=12)
         ->  Parallel Seq Scan on third_tbl  (cost=0.00..87.25
rows=3225 width=12)
(4 rows)

test=# insert into primary_tbl select * from third_tbl returning index, height;
 index | height
-------+--------
     1 |    1.2
     2 |    1.2
     3 |    1.2
     4 |    1.2
     5 |    1.2
     6 |    1.2
     7 |    1.2

...

  9435 |    1.2
  9619 |    1.2
  9620 |    1.2
(9999 rows)

INSERT 0 9999


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Sep 24, 2020 at 7:51 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2020-09-22 14:55:21 +1000, Greg Nancarrow wrote:
>
>
> > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> > index 1585861..94c8507 100644
> > --- a/src/backend/access/heap/heapam.c
> > +++ b/src/backend/access/heap/heapam.c
> > @@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
> >        * inserts in general except for the cases where inserts generate a new
> >        * CommandId (eg. inserts into a table having a foreign key column).
> >        */
> > -     if (IsParallelWorker())
> > -             ereport(ERROR,
> > -                             (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> > -                              errmsg("cannot insert tuples in a parallel worker")));
> > -
>
> I'm afraid that this weakens our checks more than I'd like.
>

I think we need to change/remove this check to allow inserts by
parallel workers. I am not sure but maybe we can add an Assert to
ensure that it is safe to perform insert via parallel worker.

> What if this
> ends up being invoked from inside C code?
>

I think it shouldn't be a problem unless one is trying to do something
like insert into foreign key table. So, probably we can have an Assert
to catch it if possible. Do you have any other idea?

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
> > What if this
> > ends up being invoked from inside C code?
> >
>
> I think it shouldn't be a problem unless one is trying to do something
> like insert into foreign key table. So, probably we can have an Assert
> to catch it if possible. Do you have any other idea?
>

Note that the planner code updated by the patch does avoid creating a
Parallel INSERT plan in the case of inserting into a table with a
foreign key (so commandIds won't be created in the parallel-worker
code).
I'm not sure how to distinguish the "invoked from inside C code" case though.
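
The kind of planner-side safety test being described could be sketched as
follows; the struct and field names are purely hypothetical, since the
actual patch would work from the target relation's catalog information:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the target-table properties the planner
 * would need to inspect before choosing a Parallel INSERT plan. */
typedef struct InsertTargetInfo
{
    bool has_foreign_keys;      /* FK checks generate new CommandIds
                                 * inside workers, which is unsafe */
    bool on_conflict_do_update; /* parallel UPDATE is not supported */
} InsertTargetInfo;

/* Parallel INSERT is only considered when nothing about the target
 * forces per-row work that is unsafe in a parallel worker; otherwise
 * the planner falls back to parallelising just the SELECT part. */
static bool
parallel_insert_safe(const InsertTargetInfo *info)
{
    return !info->has_foreign_keys && !info->on_conflict_do_update;
}
```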

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Hi Andres,

On Thu, Sep 24, 2020 at 12:21 PM Andres Freund <andres@anarazel.de> wrote:
>
>I think it'd be good if you outlined what your approach is to make this
>safe.

Some prior work has already been done to establish the necessary
infrastructure to allow parallel INSERTs, in general, to be safe,
except for cases where new commandIds would be generated in the
parallel-worker code (such as inserts into a table having a foreign
key) - these cases need to be avoided.
See the following commits.

85f6b49 Allow relation extension lock to conflict among parallel group members
3ba59cc Allow page lock to conflict among parallel group members

The planner code updated by the patch avoids creating a Parallel
INSERT plan in the case of inserting into a table that has a foreign
key.


>> For cases where it can't be allowed (e.g. INSERT into a table with
>> foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
>> ...") it at least allows parallelism of the SELECT part.
>
>I think it'd be good to do this part separately and first, independent
>of whether the insert part can be parallelized.

OK then, I'll try to extract that as a separate patch.


>> Obviously I've had to update the planner and executor and
>> parallel-worker code to make this happen, hopefully not breaking too
>> many things along the way.
>
>Hm, it looks like you've removed a fair bit of checks, it's not clear to
>me why that's safe in each instance.

It should be safe for Parallel INSERT - but you are right, these are
brute-force removals (for the purpose of a POC patch) that should be
tightened up wherever possible to disallow unsafe paths into that
code. The problem is that currently there's not a lot of context
information available to easily allow that, so some work needs to be done.


>> - Currently only for "INSERT INTO ... SELECT ...". To support "INSERT
>> INTO ... VALUES ..." would need additional Table AM functions for
>> dividing up the INSERT work amongst the workers (currently only exists
>> for scans).
>
>Hm, not entirely following. What precisely are you thinking of here?

All I was saying is that for SELECTs, the work done by each parallel
worker is effectively divided up by parallel-worker-related functions
in tableam.c and indexam.c, and no such technology currently exists
for dividing up work for the "INSERT ... VALUES" case.


>I doubt it's really worth adding parallelism support for INSERT
>... VALUES, the cost of spawning workers will almost always higher than
>the benefit.

You're probably right in doubting any benefit, but I wasn't entirely sure.


>> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
>>       TupleDesc       toasttupDesc;
>>       Datum           t_values[3];
>>       bool            t_isnull[3];
>> -     CommandId       mycid = GetCurrentCommandId(true);
>> +     CommandId       mycid = GetCurrentCommandId(!IsParallelWorker());
>>       struct varlena *result;
>>       struct varatt_external toast_pointer;
>>       union
>
>Hm? Why do we need this in the various places you have made this change?

It's because for Parallel INSERT, we're assigning the same command-id
to each worker up-front during worker initialization (the commandId
has been retrieved by the leader and passed through to each worker)
and "currentCommandIdUsed" has been set true. See the
AssignCommandIdForWorker() function in the patch.
If you see the code of GetCurrentCommandId(), you'll see it Assert
that it's not being run by a parallel worker if the parameter is true.
I didn't want to remove yet another check, without being able to know
the context of the caller, because only for Parallel INSERT do I know
that "currentCommandIdUsed was already true at the start of the
parallel operation". See the comment in that function. Anyway, that's
why I'm passing "false" to relevant GetCurrentCommandId() calls if
they're being run by a parallel (INSERT) worker.
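
A stripped-down model of the behaviour described above, mirroring the
shape of GetCurrentCommandId() and the patch's AssignCommandIdForWorker()
(global state and types simplified for illustration):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int CommandId;

static CommandId currentCommandId = 0;
static bool currentCommandIdUsed = false;
static bool is_parallel_worker = false;   /* stand-in for IsParallelWorker() */

/* Mirrors the shape of GetCurrentCommandId(): asking for the command id
 * "for use" marks it used, and asserts the caller is not a parallel
 * worker - hence the patch passes false from worker code paths. */
static CommandId
get_current_command_id(bool used)
{
    if (used)
    {
        assert(!is_parallel_worker);
        currentCommandIdUsed = true;
    }
    return currentCommandId;
}

/* What the patch's AssignCommandIdForWorker() effectively does: the
 * worker adopts the leader's command id up-front and marks it as
 * already used at the start of the parallel operation. */
static void
assign_command_id_for_worker(CommandId cid)
{
    currentCommandId = cid;
    currentCommandIdUsed = true;
}
```

With the command id assigned up-front this way, worker code can only call
get_current_command_id(false), which is why the patch changes those call
sites rather than relaxing the Assert.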


>> @@ -822,19 +822,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
>>                               isdead = false;
>>                               break;
>>                       case HEAPTUPLE_INSERT_IN_PROGRESS:
>> -
>>                               /*
>>                                * Since we hold exclusive lock on the relation, normally the
>>                                * only way to see this is if it was inserted earlier in our
>>                                * own transaction.  However, it can happen in system
>>                                * catalogs, since we tend to release write lock before >commit
>> -                              * there.  Give a warning if neither case applies; but in any
>> -                              * case we had better copy it.
>> +                              * there. In any case we had better copy it.
>>                                */
>> -                             if (!is_system_catalog &&
>> -                                     !TransactionIdIsCurrentTransactionId>(HeapTupleHeaderGetXmin(tuple->t_data)))
>> -                                     elog(WARNING, "concurrent insert in progress within >table \"%s\"",
>> -                                              RelationGetRelationName(OldHeap));
>> +
>>                               /* treat as live */
>>                               isdead = false;
>>                               break;
>> @@ -1434,16 +1429,11 @@ heapam_index_build_range_scan(Relation heapRelation,
>>                                        * the only way to see this is if it was inserted >earlier
>>                                        * in our own transaction.  However, it can happen in
>>                                        * system catalogs, since we tend to release write >lock
>> -                                      * before commit there.  Give a warning if neither >case
>> -                                      * applies.
>> +                                      * before commit there.
>>                                        */
>>                                       xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
>>                                       if (!TransactionIdIsCurrentTransactionId(xwait))
>>                                       {
>> -                                             if (!is_system_catalog)
>> -                                                     elog(WARNING, "concurrent insert in >progress within table
\"%s\"",
>> -                                                              RelationGetRelationName>(heapRelation));
>> -
>>                                               /*
>>                                                * If we are performing uniqueness checks, >>indexing
>>                                                * such a tuple could lead to a bogus >uniqueness
>
>Huh, I don't think this should be necessary?

Yes, I think you're right, I perhaps got carried away removing checks
on concurrent inserts. I will revert those changes.


>> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
>> index a4944fa..9d3f100 100644
>> --- a/src/backend/access/transam/varsup.c
>> +++ b/src/backend/access/transam/varsup.c
>> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
>>       TransactionId xid;
>>
>>       /*
>> -      * Workers synchronize transaction state at the beginning of each parallel
>> -      * operation, so we can't account for new XIDs after that point.
>> -      */
>> -     if (IsInParallelMode())
>> -             elog(ERROR, "cannot assign TransactionIds during a parallel operation");
>> -
>> -     /*
>>        * During bootstrap initialization, we return the special bootstrap
>>        * transaction id.
>>        */
>
>Same thing, this code cannot just be allowed to be reachable. What
>prevents you from assigning two different xids from different workers
>etc?

At least in the case of Parallel INSERT, the leader for the Parallel
INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed
through and assigned to each of the workers during their
initialization (so they are assigned the same xid).


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.
>

Thanks Greg for the patch.

I have a few points (inspired by the parallel copy feature work) to mention:

1. What if the target table is a foreign table or partitioned table?
2. What happens if the target table has triggers(before statement,
after statement, before row, after row) that are parallel unsafe?
3. Will each worker be doing single row insertions or multi inserts?
If single row insertions, will the buffer lock contentions be more?
4. How does it behave with toast column values?
5. How does it behave if we have a RETURNING clause with INSERT INTO SELECT?

I'm looking forward to seeing some initial numbers on execution times
with and without the patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Sep 25, 2020 at 7:01 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I have few points (inspired from parallel copy feature work) to mention:
>
> 1. What if the target table is a foreign table or partitioned table?
> 2. What happens if the target table has triggers(before statement,
> after statement, before row, after row) that are parallel unsafe?
> 3. Will each worker be doing single row insertions or multi inserts?
> If single row insertions, will the buffer lock contentions be more?
> 5. How does it behave with toast columns values?
> 6. How does it behave if we have a RETURNING clause with INSERT INTO SELECT?
>

Hi Bharath,

Thanks for pointing out more cases I need to exclude and things I need
to investigate further.
I have taken note of them, and will do more testing and improvement.
At least the RETURNING clause with INSERT INTO SELECT is working!

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Sep 25, 2020 at 10:02 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Hi Andres,
>
> On Thu, Sep 24, 2020 at 12:21 PM Andres Freund <andres@anarazel.de> wrote:
> >
>
>
> >> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
> >>       TupleDesc       toasttupDesc;
> >>       Datum           t_values[3];
> >>       bool            t_isnull[3];
> >> -     CommandId       mycid = GetCurrentCommandId(true);
> >> +     CommandId       mycid = GetCurrentCommandId(!IsParallelWorker());
> >>       struct varlena *result;
> >>       struct varatt_external toast_pointer;
> >>       union
> >
> >Hm? Why do we need this in the various places you have made this change?
>
> It's because for Parallel INSERT, we're assigning the same command-id
> to each worker up-front during worker initialization (the commandId
> has been retrieved by the leader and passed through to each worker)
> and "currentCommandIdUsed" has been set true. See the
> AssignCommandIdForWorker() function in the patch.
> If you see the code of GetCurrentCommandId(), you'll see it Assert
> that it's not being run by a parallel worker if the parameter is true.
> I didn't want to remove yet another check, without being able to know
> the context of the caller, because only for Parallel INSERT do I know
> that "currentCommandIdUsed was already true at the start of the
> parallel operation". See the comment in that function. Anyway, that's
> why I'm passing "false" to relevant GetCurrentCommandId() calls if
> they're being run by a parallel (INSERT) worker.
>

But we can tighten the condition in GetCurrentCommandId() such that it
Asserts for a parallel worker only when currentCommandIdUsed is not set
before the start of the parallel operation. I also find these changes
in the callers of GetCurrentCommandId() quite ad hoc and ugly even if
they are correct. Also, why don't we face similar problems for parallel copy?

>
> >> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
> >> index a4944fa..9d3f100 100644
> >> --- a/src/backend/access/transam/varsup.c
> >> +++ b/src/backend/access/transam/varsup.c
> >> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
> >>       TransactionId xid;
> >>
> >>       /*
> >> -      * Workers synchronize transaction state at the beginning of each parallel
> >> -      * operation, so we can't account for new XIDs after that point.
> >> -      */
> >> -     if (IsInParallelMode())
> >> -             elog(ERROR, "cannot assign TransactionIds during a parallel operation");
> >> -
> >> -     /*
> >>        * During bootstrap initialization, we return the special bootstrap
> >>        * transaction id.
> >>        */
> >
> >Same thing, this code cannot just be allowed to be reachable. What
> >prevents you from assigning two different xids from different workers
> >etc?
>
> At least in the case of Parallel INSERT, the leader for the Parallel
> INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed
> through and assigned to each of the workers during their
> initialization (so they are assigned the same xid).
>

So are you facing problems in this area because we EnterParallelMode
before even assigning the xid in the leader? Because I don't think we
should ever reach this code in the worker. If so, there are two
possibilities that come to my mind: (a) assign the xid in the leader
before entering parallel mode, or (b) change the check so that we don't
assign a new xid in workers. In this case, I am again wondering how
parallel copy deals with this.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Fri, Sep 25, 2020 at 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > At least in the case of Parallel INSERT, the leader for the Parallel
> > INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed
> > through and assigned to each of the workers during their
> > initialization (so they are assigned the same xid).
> >
>
> So are you facing problems in this area because we EnterParallelMode
> before even assigning the xid in the leader? Because I don't think we
> should ever reach this code in the worker. If so, there are two
> possibilities that come to my mind: (a) assign the xid in the leader before
> entering parallel mode, or (b) change the check so that we don't assign
> a new xid in workers. In this case, I am again wondering how parallel
> copy is dealing with this?
>

In parallel copy, we are doing option (a) i.e. the leader gets the
full txn id before entering parallel mode and passes it to all
workers.
In the leader:
    full_transaction_id = GetCurrentFullTransactionId();
    EnterParallelMode();
    shared_info_ptr->full_transaction_id = full_transaction_id;
In the workers:
    AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);

Hence the following part of the code doesn't get hit:
    if (IsInParallelMode() || IsParallelWorker())
        elog(ERROR, "cannot assign XIDs during a parallel operation");

We also deal with the command id similarly, i.e. the leader gets the
command id, and workers use it while inserting.
In the leader:
    shared_info_ptr->mycid = GetCurrentCommandId(true);
In the workers:
    AssignCommandIdForWorker(pcshared_info->mycid, true);

[1]
void
AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
{
    TransactionState s = CurrentTransactionState;

    Assert((IsInParallelMode() || IsParallelWorker()));
    s->fullTransactionId = fullTransactionId;
}

void
AssignCommandIdForWorker(CommandId commandId, bool used)
{
    Assert((IsInParallelMode() || IsParallelWorker()));

    /* this is global to a transaction, not subtransaction-local */
    if (used)
        currentCommandIdUsed = true;

    currentCommandId = commandId;
}

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> But we can tighten the condition in GetCurrentCommandId() such that it
> Asserts for a parallel worker only when currentCommandIdUsed is not set
> before the start of the parallel operation. I also find these changes in the
> callers of GetCurrentCommandId() quite ad hoc and ugly, even if they are
> correct. Also, why don't we face similar problems for parallel copy?
>

For Parallel Insert, as part of query plan execution,
GetCurrentCommandId(true) is being called as part of INSERT statement
execution.
Parallel Copy of course doesn't have to deal with this; that's why
there's a difference. Also, it has its own parallel entry point
(ParallelCopyMain), so it's in full control; it's not trying to fit in
with the infrastructure for plan execution.

> So are you facing problems in this area because we EnterParallelMode
> before even assigning the xid in the leader? Because I don't think we
> should ever reach this code in the worker. If so, there are two
> possibilities that come to my mind: (a) assign the xid in the leader before
> entering parallel mode, or (b) change the check so that we don't assign
> a new xid in workers. In this case, I am again wondering how parallel
> copy is dealing with this?
>

Again, there's a fundamental difference in the Parallel Insert case.
Right at the top of ExecutePlan() it calls EnterParallelMode().
For Parallel Copy, there is no such problem: EnterParallelMode() is
only called just before ParallelCopyMain() is called, so it can easily
acquire the xid before this, because parallel mode is not yet set.

As it turns out, I think I have solved the commandId issue (and almost
the xid issue) by realising that both the xid and cid are ALREADY
being included as part of the serialized transaction state in the
Parallel DSM. So actually I don't believe that there is any need for
separately passing them in the DSM, and having to use those
AssignXXXXForWorker() functions in the worker code - not even in the
Parallel Copy case (? - need to check). GetCurrentCommandId(true) and
GetFullTransactionId() need to be called prior to Parallel DSM
initialization, so they are included in the serialized transaction
state.
I just needed to add a function to set currentCommandIdUsed=true in
the worker initialization (for the INSERT case), and make a small tweak to
the Assert in GetCurrentCommandId() to ensure that
currentCommandIdUsed, in a parallel worker, never gets set to true
when it is false. This is in line with the comment in that function,
because we know that "currentCommandIdUsed was already true at the start
of the parallel operation". With this in place, I don't need to change
any of the original calls to GetCurrentCommandId(), so this addresses
the issue raised by Andres.
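If I've understood the description correctly, the worker-side helper and the tweaked Assert would look something like the sketch below. This is only an illustration of the approach described above, not the actual patch code, though it relies on the existing xact.c internals (currentCommandId, currentCommandIdUsed, IsParallelWorker()):

```c
/* Sketch only: worker-initialization helper for the Parallel INSERT case. */
void
SetCurrentCommandIdUsedForWorker(void)
{
	Assert(IsParallelWorker() && !currentCommandIdUsed &&
		   currentCommandId != InvalidCommandId);

	currentCommandIdUsed = true;
}

CommandId
GetCurrentCommandId(bool used)
{
	/* this is global to a transaction, not subtransaction-local */
	if (used)
	{
		/*
		 * In a parallel worker, only allow this if currentCommandIdUsed was
		 * already true at the start of the parallel operation; we must never
		 * flip it from false to true inside a worker.
		 */
		Assert(!(IsParallelWorker() && !currentCommandIdUsed));
		currentCommandIdUsed = true;
	}
	return currentCommandId;
}
```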

I am not sure yet how to get past the issue of parallel mode being
set at the top of ExecutePlan(). With that in place, it doesn't allow
an xid to be acquired by the leader without removing/changing that
parallel-mode check in GetNewTransactionId().

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@g
>
> As it turns out, I think I have solved the commandId issue (and almost
> the xid issue) by realising that both the xid and cid are ALREADY
> being included as part of the serialized transaction state in the
> Parallel DSM. So actually I don't believe that there is any need for
> separately passing them in the DSM, and having to use those
> AssignXXXXForWorker() functions in the worker code - not even in the
> Parallel Copy case (? - need to check). GetCurrentCommandId(true) and
> GetFullTransactionId() need to be called prior to Parallel DSM
> initialization, so they are included in the serialized transaction
> state.
> I just needed to add a function to set currentCommandIdUsed=true in
> the worker initialization (for INSERT case) and make a small tweak to
> the Assert in GetCurrentCommandId() to ensure that
> currentCommandIdUsed, in a parallel worker, never gets set to true
> when it is false. This is in line with the comment in that function,
> because we know that "currentCommandIdUsed was already true at the start
> of the parallel operation". With this in place, I don't need to change
> any of the original calls to GetCurrentCommandId(), so this addresses
> that issue raised by Andres.
>
> I am not sure yet how to get past the issue of the parallel mode being
> set at the top of ExecutePlan(). With that in place, it doesn't allow
> a xid to be acquired for the leader, without removing/changing that
> parallel-mode check in GetNewTransactionId().
>

I think now there is no fundamental problem in allocating the xid in the
leader and then sharing it with the workers, who can use it to perform the
insert. So we can probably tweak that check so that it is true
only for parallel workers.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Sep 25, 2020 at 2:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > For cases where it can't be allowed (e.g. INSERT into a table with
> > foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> > ...") it at least allows parallelism of the SELECT part.
> >
>
> Thanks Greg for the patch.
>
> 2. What happens if the target table has triggers(before statement,
> after statement, before row, after row) that are parallel unsafe?
>

In such a case, a parallel insert shouldn't be selected. However, we
should still be able to execute the SELECT part in parallel.

> 3. Will each worker be doing single row insertions or multi inserts?
> If single row insertions, will the buffer lock contentions be more?
>

I don't think the purpose of this patch is to change the basic flow of
how Insert works, and I am also not sure that would be worth the effort.
I have answered this earlier in a bit more detail [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1Ks8Sqs29VHPS6koNj5E9YQdkGCzgGsSrQMeUbQfe28yg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> Again, there's a fundamental difference in the Parallel Insert case.
> Right at the top of ExecutePlan it calls EnterParallelMode().
> For ParallelCopy(), there is no such problem. EnterParallelMode() is
> only called just before ParallelCopyMain() is called. So it can easily
> acquire the xid before this, because then parallel mode is not set.
>
> As it turns out, I think I have solved the commandId issue (and almost
> the xid issue) by realising that both the xid and cid are ALREADY
> being included as part of the serialized transaction state in the
> Parallel DSM. So actually I don't believe that there is any need for
> separately passing them in the DSM, and having to use those
> AssignXXXXForWorker() functions in the worker code - not even in the
> Parallel Copy case (? - need to check).
>

Thanks Greg for the detailed points.

I further checked on the full txn id and command id. Yes, these are
getting passed to workers via InitializeParallelDSM() ->
SerializeTransactionState(). I tried to summarize what we need to do
in the case of parallel inserts in general, i.e. parallel COPY, parallel
inserts in INSERT INTO, and parallel inserts in CTAS.

In the leader:
    GetCurrentFullTransactionId()
    GetCurrentCommandId(true)
    EnterParallelMode();
    InitializeParallelDSM() --> calls SerializeTransactionState()
(both full txn id and command id are serialized into parallel DSM)

In the workers:
ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
full txn id and command id are restored into workers'
CurrentTransactionState->fullTransactionId and currentCommandId)
If the parallel workers are meant for insertions, then we need to set
currentCommandIdUsed = true. Maybe we can lift the Assert in
GetCurrentCommandId(); if we don't want to touch that function, then
we can have a new function GetCurrentCommandidInWorker() whose
functionality will be the same as GetCurrentCommandId(), but without the
Assert(!IsParallelWorker());.
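The alternative helper mentioned above might be as simple as the following sketch. The name GetCurrentCommandidInWorker comes from the suggestion above; the body simply mirrors GetCurrentCommandId() minus the Assert, and this is illustrative rather than actual patch code:

```c
/* Sketch: GetCurrentCommandId() without Assert(!IsParallelWorker()). */
CommandId
GetCurrentCommandidInWorker(bool used)
{
	/* this is global to a transaction, not subtransaction-local */
	if (used)
		currentCommandIdUsed = true;

	return currentCommandId;
}
```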

Am I missing something?

If the above points are true, we might have to update the parallel
copy patch set, test the use cases and post separately in the parallel
copy thread in coming days.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Sat, Sep 26, 2020 at 11:00 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > Again, there's a fundamental difference in the Parallel Insert case.
> > Right at the top of ExecutePlan it calls EnterParallelMode().
> > For ParallelCopy(), there is no such problem. EnterParallelMode() is
> > only called just before ParallelCopyMain() is called. So it can easily
> > acquire the xid before this, because then parallel mode is not set.
> >
> > As it turns out, I think I have solved the commandId issue (and almost
> > the xid issue) by realising that both the xid and cid are ALREADY
> > being included as part of the serialized transaction state in the
> > Parallel DSM. So actually I don't believe that there is any need for
> > separately passing them in the DSM, and having to use those
> > AssignXXXXForWorker() functions in the worker code - not even in the
> > Parallel Copy case (? - need to check).
> >
>
> Thanks Greg for the detailed points.
>
> I further checked on full txn id and command id. Yes, these are
> getting passed to workers  via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general i.e. parallel COPY, parallel
> inserts in INSERT INTO and parallel inserts in CTAS.
>
> In the leader:
>     GetCurrentFullTransactionId()
>     GetCurrentCommandId(true)
>     EnterParallelMode();
>     InitializeParallelDSM() --> calls SerializeTransactionState()
> (both full txn id and command id are serialized into parallel DSM)
>

This won't be true for the Parallel Insert patch, as explained by Greg,
because we enter parallel mode well before we assign the xid.


-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Sep 25, 2020 at 9:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > > What if this
> > > ends up being invoked from inside C code?
> > >
> >
> > I think it shouldn't be a problem unless one is trying to do something
> > like insert into foreign key table. So, probably we can have an Assert
> > to catch it if possible. Do you have any other idea?
> >
>
> Note that the planner code updated by the patch does avoid creating a
> Parallel INSERT plan in the case of inserting into a table with a
> foreign key (so commandIds won't be created in the parallel-worker
> code).
> I'm not sure how to distinguish the "invoked from inside C code" case though.
>

I think, if possible, we can have an Assert to check if it is a
parallel worker and the relation has a foreign key. Similarly, we can
enhance the check for any other unsafe use. This will prevent
illegal usage of inserts via parallel workers.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:
On Wed, Sep 23, 2020 at 2:21 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > - When INSERTs are made parallel, currently the reported row-count in
> > the "INSERT 0 <row-count>" status only reflects the rows that the
> > leader has processed (not the workers) - so it is obviously less than
> > the actual number of rows inserted.
>
> Attached an updated patch which fixes this issue (for parallel
> INSERTs, each worker's processed tuple count is communicated in shared
> memory back to the leader, where it is added to the global
> "es_processed" count).

I noticed that we do not have any check for skipping temporary-table
insertion.

/*
 * Check if the target relation has foreign keys; if so, avoid
 * creating a parallel Insert plan (because inserting into
 * such tables would result in creation of new CommandIds, and
 * this isn't supported by parallel workers).
 * Similarly, avoid creating a parallel Insert plan if ON
 * CONFLICT ... DO UPDATE ... has been specified, because
 * parallel UPDATE is not supported.
 * However, do allow any underlying query to be run by parallel
 * workers in these cases.
 */

You should also include a temporary-table check here, as parallel
workers do not have access to the leader's temporary tables.
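Such a check could be sketched along the following lines. This is hypothetical (the flag name parallel_insert_safe is illustrative only), but RELPERSISTENCE_TEMP and relpersistence are the existing catalog fields:

```c
/*
 * Sketch: parallel workers cannot access the leader's temporary tables,
 * so a temporary target relation must also rule out a parallel Insert
 * plan (while still allowing a parallel plan for the underlying query).
 */
if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
	parallel_insert_safe = false;	/* illustrative flag name */
```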

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sun, Sep 27, 2020 at 2:03 AM vignesh C <vignesh21@gmail.com> wrote:
>
> I noticed that we are not having any check for skipping temporary
> table insertion.
>

> You should also include temporary tables check here, as parallel
> workers might not have access to temporary tables.
>

Thanks Vignesh, you are right, I need to test this and add it to the
list of further exclusions that the patch needs to check for.
Hopefully I can provide an updated patch soon that caters for these
additional identified cases.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> I further checked on full txn id and command id. Yes, these are
> getting passed to workers  via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general i.e. parallel COPY, parallel
> inserts in INSERT INTO and parallel inserts in CTAS.
>
> In the leader:
>     GetCurrentFullTransactionId()
>     GetCurrentCommandId(true)
>     EnterParallelMode();
>     InitializeParallelDSM() --> calls SerializeTransactionState()
> (both full txn id and command id are serialized into parallel DSM)
>
> In the workers:
> ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> full txn id and command id are restored into workers'
> CurrentTransactionState->fullTransactionId and currentCommandId)
> If the parallel workers are meant for insertions, then we need to set
> currentCommandIdUsed = true; Maybe we can lift the assert in
> GetCurrentCommandId(), if we don't want to touch that function, then
> we can have a new function GetCurrentCommandidInWorker() whose
> functionality will be same as GetCurrentCommandId() without the
> Assert(!IsParallelWorker());.
>
> Am I missing something?
>
> If the above points are true, we might have to update the parallel
> copy patch set, test the use cases and post separately in the parallel
> copy thread in coming days.
>

Hi Bharath,

I pretty much agree with your above points.

I've attached an updated Parallel INSERT...SELECT patch, that:
- Only uses existing transaction state serialization support for
transfer of xid and cid.
- Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
currentCommandIdUsed=true at the start of a parallel operation (used
for Parallel INSERT case, where we know the currentCommandId has been
assigned to the worker at the start of the parallel operation).
- Tweaks the Assert condition within the "used=true" parameter case in
GetCurrentCommandId(), so that it only fires if in a parallel worker
and currentCommandIdUsed is false - refer to the updated comment in that
function.
- Does not modify any existing GetCurrentCommandId() calls.
- Does not remove any existing parallel-related asserts/checks, except
for the "cannot insert tuples in a parallel worker" error in
heap_prepare_insert(). I am still considering what to do with the
original error-check here.
[- Does not yet cater for other exclusion cases that you and Vignesh
have pointed out]

This patch is mostly a lot cleaner, but does contain a possible ugly
hack, in that where it needs to call GetCurrentFullTransactionId(), it
must temporarily escape parallel mode (recalling that parallel mode is
set true right at the top of ExecutePlan() in the cases of Parallel
INSERT/SELECT).
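The hack in question would presumably look something like this hypothetical sketch (not the actual patch code; EnterParallelMode/ExitParallelMode and GetCurrentFullTransactionId are the existing backend functions):

```c
/*
 * ExecutePlan() has already called EnterParallelMode(), which would make
 * GetCurrentFullTransactionId() trip the parallel-mode check in
 * GetNewTransactionId(), so temporarily escape parallel mode around it.
 */
FullTransactionId fxid;
bool		was_in_parallel_mode = IsInParallelMode();

if (was_in_parallel_mode)
	ExitParallelMode();

fxid = GetCurrentFullTransactionId();

if (was_in_parallel_mode)
	EnterParallelMode();
```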

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > I further checked on full txn id and command id. Yes, these are
> > getting passed to workers  via InitializeParallelDSM() ->
> > SerializeTransactionState(). I tried to summarize what we need to do
> > in case of parallel inserts in general i.e. parallel COPY, parallel
> > inserts in INSERT INTO and parallel inserts in CTAS.
> >
> > In the leader:
> >     GetCurrentFullTransactionId()
> >     GetCurrentCommandId(true)
> >     EnterParallelMode();
> >     InitializeParallelDSM() --> calls SerializeTransactionState()
> > (both full txn id and command id are serialized into parallel DSM)
> >
> > In the workers:
> > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > full txn id and command id are restored into workers'
> > CurrentTransactionState->fullTransactionId and currentCommandId)
> > If the parallel workers are meant for insertions, then we need to set
> > currentCommandIdUsed = true; Maybe we can lift the assert in
> > GetCurrentCommandId(), if we don't want to touch that function, then
> > we can have a new function GetCurrentCommandidInWorker() whose
> > functionality will be same as GetCurrentCommandId() without the
> > Assert(!IsParallelWorker());.
> >
> > Am I missing something?
> >
> > If the above points are true, we might have to update the parallel
> > copy patch set, test the use cases and post separately in the parallel
> > copy thread in coming days.
> >
>
> Hi Bharath,
>
> I pretty much agree with your above points.
>
> I've attached an updated Parallel INSERT...SELECT patch, that:
> - Only uses existing transaction state serialization support for
> transfer of xid and cid.
> - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> currentCommandIdUsed=true at the start of a parallel operation (used
> for Parallel INSERT case, where we know the currentCommandId has been
> assigned to the worker at the start of the parallel operation).
> - Tweaks the Assert condition within "used=true" parameter case in
> GetCurrentCommandId(), so that it only fires if in a parallel worker
> and currentCommandId is false - refer to the updated comment in that
> function.
> - Does not modify any existing GetCurrentCommandId() calls.
> - Does not remove any existing parallel-related asserts/checks, except
> for the "cannot insert tuples in a parallel worker" error in
> heap_prepare_insert(). I am still considering what to do with the
> original error-check here.
> [- Does not yet cater for other exclusion cases that you and Vignesh
> have pointed out]
>
> This patch is mostly a lot cleaner, but does contain a possible ugly
> hack, in that where it needs to call GetCurrentFullTransactionId(), it
> must temporarily escape parallel-mode (recalling that parallel-mode is
> set true right at the top of ExecutePlan() in the cases of Parallel
> INSERT/SELECT).
>

Thanks Greg.

In general, I see a few things common to all parallel insert
cases (CTAS[1], COPY[2], INSERT INTO ... SELECT):
1. Removal of "cannot insert tuples in a parallel worker" restriction
from heap_prepare_insert()
2. Each worker should be able to set currentCommandIdUsed to true.
3. The change you proposed to make in GetCurrentCommandId()'s assert condition.

Please add if I miss any other common point.

Common solutions to each of the above points would be beneficial to
all the parallel insert cases. How about having a common thread,
discussion and a common patch for all the 3 points?

@Amit Kapila  @Greg Nancarrow @vignesh C Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACWj%2B3H5TQqwxANZmdePEnSNxk-YAeT1c5WE184Gf75XUw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1%2BkpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV%3Dg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Sep 28, 2020 at 4:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > I further checked on full txn id and command id. Yes, these are
> > > getting passed to workers  via InitializeParallelDSM() ->
> > > SerializeTransactionState(). I tried to summarize what we need to do
> > > in case of parallel inserts in general i.e. parallel COPY, parallel
> > > inserts in INSERT INTO and parallel inserts in CTAS.
> > >
> > > In the leader:
> > >     GetCurrentFullTransactionId()
> > >     GetCurrentCommandId(true)
> > >     EnterParallelMode();
> > >     InitializeParallelDSM() --> calls SerializeTransactionState()
> > > (both full txn id and command id are serialized into parallel DSM)
> > >
> > > In the workers:
> > > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > > full txn id and command id are restored into workers'
> > > CurrentTransactionState->fullTransactionId and currentCommandId)
> > > If the parallel workers are meant for insertions, then we need to set
> > > currentCommandIdUsed = true; Maybe we can lift the assert in
> > > GetCurrentCommandId(), if we don't want to touch that function, then
> > > we can have a new function GetCurrentCommandidInWorker() whose
> > > functionality will be same as GetCurrentCommandId() without the
> > > Assert(!IsParallelWorker());.
> > >
> > > Am I missing something?
> > >
> > > If the above points are true, we might have to update the parallel
> > > copy patch set, test the use cases and post separately in the parallel
> > > copy thread in coming days.
> > >
> >
> > Hi Bharath,
> >
> > I pretty much agree with your above points.
> >
> > I've attached an updated Parallel INSERT...SELECT patch, that:
> > - Only uses existing transaction state serialization support for
> > transfer of xid and cid.
> > - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> > currentCommandIdUsed=true at the start of a parallel operation (used
> > for Parallel INSERT case, where we know the currentCommandId has been
> > assigned to the worker at the start of the parallel operation).
> > - Tweaks the Assert condition within "used=true" parameter case in
> > GetCurrentCommandId(), so that it only fires if in a parallel worker
> > and currentCommandId is false - refer to the updated comment in that
> > function.
> > - Does not modify any existing GetCurrentCommandId() calls.
> > - Does not remove any existing parallel-related asserts/checks, except
> > for the "cannot insert tuples in a parallel worker" error in
> > heap_prepare_insert(). I am still considering what to do with the
> > original error-check here.
> > [- Does not yet cater for other exclusion cases that you and Vignesh
> > have pointed out]
> >
> > This patch is mostly a lot cleaner, but does contain a possible ugly
> > hack, in that where it needs to call GetCurrentFullTransactionId(), it
> > must temporarily escape parallel-mode (recalling that parallel-mode is
> > set true right at the top of ExecutePlan() in the cases of Parallel
> > INSERT/SELECT).
> >
>
> Thanks Greg.
>
> In general, see a few things common to all parallel insert
> cases(CTAS[1], COPY[2], INSERT INTO SELECTs):
> 1. Removal of "cannot insert tuples in a parallel worker" restriction
> from heap_prepare_insert()
> 2. Each worker should be able to set currentCommandIdUsed to true.
> 3. The change you proposed to make in GetCurrentCommandId()'s assert condition.
>
> Please add if I miss any other common point.
>
> Common solutions to each of the above points would be beneficial to
> all the parallel insert cases. How about having a common thread,
> discussion and a common patch for all the 3 points?
>

I am not sure that is required at this stage; let's first sort out
other parts of the design, because there could be other, bigger problems
which we have not thought about yet. I have already shared some
thoughts on those points in this thread; let's first get that done and
have the basic patch ready, then if required we can discuss these
points in detail in another thread.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > I further checked on full txn id and command id. Yes, these are
> > getting passed to workers  via InitializeParallelDSM() ->
> > SerializeTransactionState(). I tried to summarize what we need to do
> > in case of parallel inserts in general i.e. parallel COPY, parallel
> > inserts in INSERT INTO and parallel inserts in CTAS.
> >
> > In the leader:
> >     GetCurrentFullTransactionId()
> >     GetCurrentCommandId(true)
> >     EnterParallelMode();
> >     InitializeParallelDSM() --> calls SerializeTransactionState()
> > (both full txn id and command id are serialized into parallel DSM)
> >
> > In the workers:
> > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > full txn id and command id are restored into workers'
> > CurrentTransactionState->fullTransactionId and currentCommandId)
> > If the parallel workers are meant for insertions, then we need to set
> > currentCommandIdUsed = true; Maybe we can lift the assert in
> > GetCurrentCommandId(), if we don't want to touch that function, then
> > we can have a new function GetCurrentCommandidInWorker() whose
> > functionality will be same as GetCurrentCommandId() without the
> > Assert(!IsParallelWorker());.
> >
> > Am I missing something?
> >
> > If the above points are true, we might have to update the parallel
> > copy patch set, test the use cases and post separately in the parallel
> > copy thread in coming days.
> >
>
> Hi Bharath,
>
> I pretty much agree with your above points.
>
> I've attached an updated Parallel INSERT...SELECT patch, that:
> - Only uses existing transaction state serialization support for
> transfer of xid and cid.
> - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> currentCommandIdUsed=true at the start of a parallel operation (used
> for Parallel INSERT case, where we know the currentCommandId has been
> assigned to the worker at the start of the parallel operation).
> - Tweaks the Assert condition within "used=true" parameter case in
> GetCurrentCommandId(), so that it only fires if in a parallel worker
> and currentCommandId is false - refer to the updated comment in that
> function.
> - Does not modify any existing GetCurrentCommandId() calls.
> - Does not remove any existing parallel-related asserts/checks, except
> for the "cannot insert tuples in a parallel worker" error in
> heap_prepare_insert(). I am still considering what to do with the
> original error-check here.
> [- Does not yet cater for other exclusion cases that you and Vignesh
> have pointed out]
>
> This patch is mostly a lot cleaner, but does contain a possible ugly
> hack, in that where it needs to call GetCurrentFullTransactionId(), it
> must temporarily escape parallel-mode (recalling that parallel-mode is
> set true right at the top of ExecutePlan() in the cases of Parallel
> INSERT/SELECT).

I think you still need to work on the costing part; basically, if we
are parallelizing the whole insert then the plan is like below:

-> Gather
  -> Parallel Insert
      -> Parallel Seq Scan

That means the tuples we are selecting via the scan are not sent back to
the Gather node, so in cost_gather we need to check if it is for an
INSERT; if so, no rows are transferred through the parallel queue,
which means we need not pay any parallel tuple cost.
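In cost_gather() that might be sketched as follows. This is hypothetical (the condition subpath_is_parallel_insert is illustrative only), but parallel_tuple_cost is the existing costing GUC variable:

```c
/*
 * Sketch: don't charge parallel_tuple_cost for rows when the Gather's
 * subpath is a parallel Insert, since no tuples are sent back to the
 * leader through the parallel tuple queue.
 */
if (!subpath_is_parallel_insert)	/* illustrative condition */
	run_cost += parallel_tuple_cost * path->path.rows;
```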

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Tue, Sep 29, 2020 at 8:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > I further checked on full txn id and command id. Yes, these are
> > > getting passed to workers  via InitializeParallelDSM() ->
> > > SerializeTransactionState(). I tried to summarize what we need to do
> > > in case of parallel inserts in general i.e. parallel COPY, parallel
> > > inserts in INSERT INTO and parallel inserts in CTAS.
> > >
> > > In the leader:
> > >     GetCurrentFullTransactionId()
> > >     GetCurrentCommandId(true)
> > >     EnterParallelMode();
> > >     InitializeParallelDSM() --> calls SerializeTransactionState()
> > > (both full txn id and command id are serialized into parallel DSM)
> > >
> > > In the workers:
> > > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > > full txn id and command id are restored into workers'
> > > CurrentTransactionState->fullTransactionId and currentCommandId)
> > > If the parallel workers are meant for insertions, then we need to set
> > > currentCommandIdUsed = true; Maybe we can lift the assert in
> > > GetCurrentCommandId(), if we don't want to touch that function, then
> > > we can have a new function GetCurrentCommandidInWorker() whose
> > > functionality will be same as GetCurrentCommandId() without the
> > > Assert(!IsParallelWorker());.
> > >
> > > Am I missing something?
> > >
> > > If the above points are true, we might have to update the parallel
> > > copy patch set, test the use cases and post separately in the parallel
> > > copy thread in coming days.
> > >
> >
> > Hi Bharath,
> >
> > I pretty much agree with your above points.
> >
> > I've attached an updated Parallel INSERT...SELECT patch, that:
> > - Only uses existing transaction state serialization support for
> > transfer of xid and cid.
> > - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> > currentCommandIdUsed=true at the start of a parallel operation (used
> > for Parallel INSERT case, where we know the currentCommandId has been
> > assigned to the worker at the start of the parallel operation).
> > - Tweaks the Assert condition within "used=true" parameter case in
> > GetCurrentCommandId(), so that it only fires if in a parallel worker
> > and currentCommandId is false - refer to the updated comment in that
> > function.
> > - Does not modify any existing GetCurrentCommandId() calls.
> > - Does not remove any existing parallel-related asserts/checks, except
> > for the "cannot insert tuples in a parallel worker" error in
> > heap_prepare_insert(). I am still considering what to do with the
> > original error-check here.
> > [- Does not yet cater for other exclusion cases that you and Vignesh
> > have pointed out]
> >
> > This patch is mostly a lot cleaner, but does contain a possible ugly
> > hack, in that where it needs to call GetCurrentFullTransactionId(), it
> > must temporarily escape parallel-mode (recalling that parallel-mode is
> > set true right at the top of ExecutePlan() in the cases of Parallel
> > INSERT/SELECT).
>
> I think you still need to work on the costing part, basically if we
> are parallelizing whole insert then plan is like below
>
> -> Gather
>   -> Parallel Insert
>       -> Parallel Seq Scan
>
> That means the tuple we are selecting via scan are not sent back to
> the gather node, so in cost_gather we need to see if it is for the
> INSERT then there is no row transferred through the parallel queue
> that mean we need not to pay any parallel tuple cost.

I just looked into the parallel CTAS[1] patch for the same thing, and
I can see in that patch it is being handled.

[1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
> >
> > I think you still need to work on the costing part, basically if we
> > are parallelizing whole insert then plan is like below
> >
> > -> Gather
> >   -> Parallel Insert
> >       -> Parallel Seq Scan
> >
> > That means the tuple we are selecting via scan are not sent back to
> > the gather node, so in cost_gather we need to see if it is for the
> > INSERT then there is no row transferred through the parallel queue
> > that mean we need not to pay any parallel tuple cost.
>
> I just looked into the parallel CTAS[1] patch for the same thing, and
> I can see in that patch it is being handled.
>
> [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
>

Hi Dilip,

You're right, the costing for Parallel Insert is not yet finished;
I'm still working on it, and haven't posted an updated patch for it
yet.
As far as the cost_gather() method is concerned, Parallel INSERT can
probably use the same costing approach as the CTAS patch, except in
the case where a RETURNING clause is specified.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > >
> > > I think you still need to work on the costing part, basically if we
> > > are parallelizing whole insert then plan is like below
> > >
> > > -> Gather
> > >   -> Parallel Insert
> > >       -> Parallel Seq Scan
> > >
> > > That means the tuple we are selecting via scan are not sent back to
> > > the gather node, so in cost_gather we need to see if it is for the
> > > INSERT then there is no row transferred through the parallel queue
> > > that mean we need not to pay any parallel tuple cost.
> >
> > I just looked into the parallel CTAS[1] patch for the same thing, and
> > I can see in that patch it is being handled.
> >
> > [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
> >
>
> Hi Dilip,
>
> You're right, the costing for Parallel Insert is not done and
> finished, I'm still working on the costing, and haven't posted an
> updated patch for it yet.

Okay.

> As far as cost_gather() method is concerned, for Parallel INSERT, it
> can probably use the same costing approach as the CTAS patch except in
> the case of a specified RETURNING clause.

Yeah right.  I did not think about the returning part.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > >
> > > I think you still need to work on the costing part, basically if we
> > > are parallelizing whole insert then plan is like below
> > >
> > > -> Gather
> > >   -> Parallel Insert
> > >       -> Parallel Seq Scan
> > >
> > > That means the tuple we are selecting via scan are not sent back to
> > > the gather node, so in cost_gather we need to see if it is for the
> > > INSERT then there is no row transferred through the parallel queue
> > > that mean we need not to pay any parallel tuple cost.
> >
> > I just looked into the parallel CTAS[1] patch for the same thing, and
> > I can see in that patch it is being handled.
> >
> > [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
> >
>
> Hi Dilip,
>
> You're right, the costing for Parallel Insert is not done and
> finished, I'm still working on the costing, and haven't posted an
> updated patch for it yet.
> As far as cost_gather() method is concerned, for Parallel INSERT, it
> can probably use the same costing approach as the CTAS patch except in
> the case of a specified RETURNING clause.
>

I have one question which is common to both this patch and parallel
inserts in CTAS[1]: do we need to skip creating the tuple queues
(ExecParallelSetupTupleQueues), given that no tuples are being shared
from the workers to the leader? Put another way, do we use the tuple
queue for sharing any information other than tuples from workers to
leader?

[1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Mon, Oct 5, 2020 at 4:26 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > > >
> > > > I think you still need to work on the costing part, basically if we
> > > > are parallelizing whole insert then plan is like below
> > > >
> > > > -> Gather
> > > >   -> Parallel Insert
> > > >       -> Parallel Seq Scan
> > > >
> > > > That means the tuple we are selecting via scan are not sent back to
> > > > the gather node, so in cost_gather we need to see if it is for the
> > > > INSERT then there is no row transferred through the parallel queue
> > > > that mean we need not to pay any parallel tuple cost.
> > >
> > > I just looked into the parallel CTAS[1] patch for the same thing, and
> > > I can see in that patch it is being handled.
> > >
> > > [1]
https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
> > >
> >
> > Hi Dilip,
> >
> > You're right, the costing for Parallel Insert is not done and
> > finished, I'm still working on the costing, and haven't posted an
> > updated patch for it yet.
> > As far as cost_gather() method is concerned, for Parallel INSERT, it
> > can probably use the same costing approach as the CTAS patch except in
> > the case of a specified RETURNING clause.
> >
>
> I have one question which is common to both this patch and parallel
> inserts in CTAS[1], do we need to skip creating tuple
> queues(ExecParallelSetupTupleQueues) as we don't have any tuples
> that's being shared from workers to leader? Put it another way, do we
> use the tuple queue for sharing any info other than tuples from
> workers to leader?

Ideally, we don't need the tuple queue unless we want to transfer the
tuple to the gather node.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Oct 5, 2020 at 4:26 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
>
> I have one question which is common to both this patch and parallel
> inserts in CTAS[1], do we need to skip creating tuple
> queues(ExecParallelSetupTupleQueues) as we don't have any tuples
> that's being shared from workers to leader?
>

As far as this patch is concerned, we might need to return tuples when
there is a RETURNING clause. For the cases where we don't need to
return tuples, I think we might want to skip creating these queues, if
that is feasible without too many changes.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Mon, Oct 5, 2020 at 4:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 5, 2020 at 4:26 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> >
> > I have one question which is common to both this patch and parallel
> > inserts in CTAS[1], do we need to skip creating tuple
> > queues(ExecParallelSetupTupleQueues) as we don't have any tuples
> > that's being shared from workers to leader?
> >
>
> As far as this patch is concerned we might need to return tuples when
> there is a Returning clause. I think for the cases where we don't need
> to return tuples we might want to skip creating these queues if it is
> feasible without too many changes.

+1

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > > I have one question which is common to both this patch and parallel
> > > inserts in CTAS[1], do we need to skip creating tuple
> > > queues(ExecParallelSetupTupleQueues) as we don't have any tuples
> > > that's being shared from workers to leader?
> > >
> >
> > As far as this patch is concerned we might need to return tuples when
> > there is a Returning clause. I think for the cases where we don't need
> > to return tuples we might want to skip creating these queues if it is
> > feasible without too many changes.
>

Hi Dilip,

You're right. I've included that in my latest version of the patch (so
Gather should only start tuple queues in the case of parallel SELECT
or parallel INSERT with a RETURNING clause).
Other updated functionality includes:
- Added more of the necessary exclusions for Parallel INSERT INTO ...
SELECT ... (while still allowing the underlying query to be parallel):
  - non-parallel-safe triggers
  - non-parallel-safe default and check expressions
  - foreign tables
  - temporary tables
- Added support for before/after statement-level INSERT triggers
(parallel workers can't be allowed to execute these)
- Adjusted the cost of the Gather node for when no RETURNING clause is
specified
I have not (yet) found issues with partitioned tables or toast column
values.

Also, I have attached a separate patch (requested by Andres Freund)
that just allows the underlying SELECT part of "INSERT INTO ... SELECT
..." to be parallel.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> I have not found issues with partition tables (yet) or toast column values.
>

I think toast column values may not be a problem, as each parallel
worker inserts its toast column values individually.

But a problem may arise if a partitioned table has a foreign table as
a partition; I think we cannot allow parallelism in this case either,
but it's hard to determine ahead of time whether a table has a foreign
partition. (See [1] in copy.c)

>
> - Added support for before/after statement-level INSERT triggers
> (can't allow parallel workers to execute these)
>

I think we can allow parallelism for BEFORE statement-level triggers:
the leader can execute the trigger and then go ahead with the parallel
inserts.

How about BEFORE ROW, AFTER ROW, INSTEAD OF ROW, and transition-table
triggers?

[1]
    else
    {
        /*
         * For partitioned tables, we may still be able to perform bulk
         * inserts.  However, the possibility of this depends on which types
         * of triggers exist on the partition.  We must disable bulk inserts
         * if the partition is a foreign table or it has any before row insert
         * or insert instead triggers (same as we checked above for the parent
         * table).  Since the partition's resultRelInfos are initialized only
         * when we actually need to insert the first tuple into them, we must
         * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
         * flag that we must later determine if we can use bulk-inserts for
         * the partition being inserted into.
         */
        if (proute)
            insertMethod = CIM_MULTI_CONDITIONAL;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Oct 6, 2020 at 9:10 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> But the problem may arise if a partitioned table has foreign table as
> a partition, I think we can not allow parallelism for this case too,
> but it's hard to determine ahead of time whether a table has a foreign
> partition.(See [1] in copy.c)
>

Thanks, I had seen that as a potential issue when scanning the code,
but had forgotten to note it. I'll check your code again.

> >
> > - Added support for before/after statement-level INSERT triggers
> > (can't allow parallel workers to execute these)
> >
>
> I think we can allow parallelism for before statement level-triggers.
> Leader can execute this trigger and go for parallel inserts.
>

My attached patch implements the before/after statement-level trigger
invocation.
(For the INSERT INTO ... SELECT ... case, it needs to account for both
parallel and non-parallel INSERT, and also for the fact that, as the
patch currently stands, the leader also participates in a parallel
INSERT - so I found it necessary to invoke those triggers at the
Gather node level in that case.)

> How about before row, after row, instead row, new table type triggers?
>

My attached patch does not allow parallel INSERT if there are any
row-level triggers (as the trigger functions could see a different and
unpredictable table state compared to non-parallel INSERT, even if
otherwise parallel-safe).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Tue, Oct 6, 2020 at 4:13 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Oct 6, 2020 at 9:10 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > But the problem may arise if a partitioned table has foreign table as
> > a partition, I think we can not allow parallelism for this case too,
> > but it's hard to determine ahead of time whether a table has a foreign
> > partition.(See [1] in copy.c)
> >
>
> Thanks, I had seen that as a potential issue when scanning the code,
> but had forgotten to note it. I'll check your code again.
>

In parallel copy, we are not doing anything (for the same reason
explained in the above comment) to find out whether there is a foreign
partition while deciding between parallel and non-parallel copy; we
just throw an error during the first tuple insertion into the
partition:

errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
OF triggers, or if the partition is foreign partition"),
                            errhint("Try COPY without PARALLEL option")));

> > >
> > > - Added support for before/after statement-level INSERT triggers
> > > (can't allow parallel workers to execute these)
> > >
> >
> > I think we can allow parallelism for before statement level-triggers.
> > Leader can execute this trigger and go for parallel inserts.
> >
>
> My attached patch implements the before/after statement-level trigger
> invocation.
> (For INSERT INTO ... SELECT... case, it needs to account for parallel
> and non-parallel INSERT, and also the fact that, as the patch
> currently stands, the leader also participates in a parallel INSERT -
> so I found it necessary to invoke those triggers at the Gather node
> level in that case).
>

Allowing the leader to execute BEFORE statement triggers at the Gather
node level, before invoking the parallel plan and the parallel
inserts, makes sense. But if there are any AFTER statement triggers,
transition tables may be involved (see Amit's findings under Case-1 in
[1]), and we must disable parallelism in that case.

[1] -
https://www.postgresql.org/message-id/flat/CAA4eK1%2BANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> In parallel, we are not doing anything(due to the same reason
> explained in above comment) to find whether there is a foreign
> partition or not while deciding to go with parallel/non-parallel copy,
> we are just throwing an error during the first tuple insertion into
> the partition.
>
> errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
> OF triggers, or if the partition is foreign partition"),
>                             errhint("Try COPY without PARALLEL option")));
>

I may well need to do something similar for parallel INSERT, but I'm
kind of surprised it can't be detected earlier (?).
I'll need to test this further.

>
> Allowing the leader to execute before statement triggers at Gather
> node level before invoking the parallel plan and then parallel inserts
> makes sense. But if there are any after statement triggers, there may
> come transition tables, see Amit's findings under Case-1 in [1] and we
> must disable parallelism in that case.
>
> [1] -
https://www.postgresql.org/message-id/flat/CAA4eK1%2BANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA%40mail.gmail.com
>

The patch I last posted for parallel INSERT does detect use of
transition tables in this case (trigdesc->trig_insert_new_table) and
disables INSERT parallelism (I tested it against Amit's example), yet
still otherwise allows AFTER STATEMENT triggers for parallel INSERT.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> In parallel, we are not doing anything(due to the same reason
> explained in above comment) to find whether there is a foreign
> partition or not while deciding to go with parallel/non-parallel copy,
> we are just throwing an error during the first tuple insertion into
> the partition.
>
> errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
> OF triggers, or if the partition is foreign partition"),
>                             errhint("Try COPY without PARALLEL option")));
>

I'm wondering whether code similar to the following can safely be used
to detect a foreign partition:

    if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
    {
        int i;
        PartitionDesc pd = RelationGetPartitionDesc(rel);
        for (i = 0; i < pd->nparts; i++)
        {
            if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE)
            {
                table_close(rel, NoLock);
                return false;
            }
        }
    }

Thoughts?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Oct 7, 2020 at 7:25 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > In parallel, we are not doing anything(due to the same reason
> > explained in above comment) to find whether there is a foreign
> > partition or not while deciding to go with parallel/non-parallel copy,
> > we are just throwing an error during the first tuple insertion into
> > the partition.
> >
> > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
> > OF triggers, or if the partition is foreign partition"),
> >                             errhint("Try COPY without PARALLEL option")));
> >
>
> I'm wondering whether code similar to the following can safely be used
> to detect a foreign partition:
>
>     if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>     {
>         int i;
>         PartitionDesc pd = RelationGetPartitionDesc(rel);
>         for (i = 0; i < pd->nparts; i++)
>         {
>             if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE)
>             {
>                 table_close(rel, NoLock);
>                 return false;
>             }
>         }
>     }
>

Actually, the addition of this kind of check is still not good enough.
Partitions can have their own constraints, triggers, column default
expressions, etc., and a partition can itself be partitioned.
I've written code to recursively walk the partitions and do all the
various parallel-insert-safety checks as before, but it's doing a
fair bit of work.
Any other ideas for dealing with this? It seems it can't be avoided if
we want to support partitioned tables and partitions.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Thu, Oct 8, 2020 at 1:42 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Oct 7, 2020 at 7:25 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > In parallel, we are not doing anything(due to the same reason
> > > explained in above comment) to find whether there is a foreign
> > > partition or not while deciding to go with parallel/non-parallel copy,
> > > we are just throwing an error during the first tuple insertion into
> > > the partition.
> > >
> > > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
> > > OF triggers, or if the partition is foreign partition"),
> > >                             errhint("Try COPY without PARALLEL option")));
> > >
> >
> > I'm wondering whether code similar to the following can safely be used
> > to detect a foreign partition:
> >
> >     if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> >     {
> >         int i;
> >         PartitionDesc pd = RelationGetPartitionDesc(rel);
> >         for (i = 0; i < pd->nparts; i++)
> >         {
> >             if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE)
> >             {
> >                 table_close(rel, NoLock);
> >                 return false;
> >             }
> >         }
> >     }
> >
>
> Actually, the addition of this kind of check is still not good enough.
> Partitions can have their own constraints, triggers, column default
> expressions etc. and a partition itself can be partitioned.
> I've written code to recursively walk the partitions and do all the
> various checks for parallel-insert-safety as before, but it's doing a
> fair bit of work.
> Any other idea of dealing with this? Seems it can't be avoided if you
> want to support partitioned tables and partitions.
>

IMHO, it's better not to do all of this recursive checking right now,
as it may complicate the code or restrict the performance gain.
Having said that, we may have to do something about it in the future.

Others may have better opinions on this point.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Tue, Oct 6, 2020 at 10:38 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
+            if (estate->es_plannedstmt->commandType == CMD_INSERT)
...
+    if ((XactReadOnly || (IsInParallelMode() &&
queryDesc->plannedstmt->commandType != CMD_INSERT)) &&
...
+        isParallelInsertLeader = nodeModifyTableState->operation == CMD_INSERT;
...

One thing I noticed is that you have logic, variable names and
assertions all over the tree that assume that we can only do parallel
*inserts*.  I agree 100% with your plan to make Parallel Insert work
first, it is an excellent goal and if we get it in it'll be a headline
feature of PG14 (along with COPY etc).  That said, I wonder if it
would make sense to use more general naming (isParallelModifyLeader?),
be more liberal where you really mean "is it DML", and find a way to
centralise the logic about which DML command types are currently
allowed (ie insert only for now) for assertions and error checks etc,
so that in future we don't have to go around and change all these
places and rename things again and again.

While contemplating that, I couldn't resist taking a swing at the main
(?) show stopper for Parallel Update and Parallel Delete, judging by
various clues left in code comments by Robert: combo command IDs
created by other processes.  Here's a rapid prototype to make that
work (though perhaps not as efficiently as we'd want, not sure).  With
that in place, I wonder what else we'd need to extend your patch to
cover all three operations... it can't be much!  Of course I don't
want to derail your work on Parallel Insert, I'm just providing some
motivation for my comments on the (IMHO) shortsightedness of some of
the coding.

PS Why not use git format-patch to create patches?

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 8:41 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> One thing I noticed is that you have logic, variable names and
> assertions all over the tree that assume that we can only do parallel
> *inserts*.  I agree 100% with your plan to make Parallel Insert work
> first, it is an excellent goal and if we get it in it'll be a headline
> feature of PG14 (along with COPY etc).  That said, I wonder if it
> would make sense to use more general naming (isParallelModifyLeader?),
> be more liberal where you really mean "is it DML", and find a way to
> centralise the logic about which DML commands types are currently
> allowed (ie insert only for now) for assertions and error checks etc,
> so that in future we don't have to go around and change all these
> places and rename things again and again.
>

Fair points.
I agree, it would make more sense to generalise the naming and
centralise the DML-command-type checks, rather than everything being
insert-specific.
It was getting a bit ugly. I'll work on that.

> While contemplating that, I couldn't resist taking a swing at the main
> (?) show stopper for Parallel Update and Parallel Delete, judging by
> various clues left in code comments by Robert: combo command IDs
> created by other processes.  Here's a rapid prototype to make that
> work (though perhaps not as efficiently as we'd want, not sure).  With
> that in place, I wonder what else we'd need to extend your patch to
> cover all three operations... it can't be much!  Of course I don't
> want to derail your work on Parallel Insert, I'm just providing some
> motivation for my comments on the (IMHO) shortsightedness of some of
> the coding.
>

Thanks for your prototype code for coordination of combo command IDs
with the workers.
It does give me the incentive to look beyond that issue and see
whether parallel Update and parallel Delete are indeed possible. I'll
be sure to give it a go!

> PS Why not use git format-patch to create patches?

Guess I was being a bit lazy - will use git format-patch in future.


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Fri, Oct 9, 2020 at 3:48 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> It does give me the incentive to look beyond that issue and see
> whether parallel Update and parallel Delete are indeed possible. I'll
> be sure to give it a go!

Cool!

A couple more observations:

+       pathnode->path.parallel_aware = parallel_workers > 0 ? true : false;

Hmm, I think this may be bogus window dressing only affecting EXPLAIN.
If you change it to assign false always, it works just the same,
except EXPLAIN says:

 Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
   Workers Planned: 2
   ->  Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
         ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)

... instead of:

 Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
   Workers Planned: 2
   ->  Parallel Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
         ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)

AFAICS it's not parallel-aware, it just happens to be running in
parallel with a partial input and partial output (and in this case,
effect in terms of writes).  Parallel-aware is our term for nodes that
actually know they are running in parallel and do some special
coordination with their twins in other processes.

The estimated row count also looks wrong; at a guess, the parallel
divisor is applied twice.  Let me try that with
parallel_leader_participation=off (which disables some funky maths in
the row estimation and makes it straight division by number of
processes):

 Gather  (cost=17629.00..18645.50 rows=1000000 width=4)
   Workers Planned: 2
   ->  Insert on s  (cost=17629.00..18645.50 rows=250000 width=4)
         ->  Parallel Hash Join  (cost=17629.00..37291.00 rows=500000 width=4)
[more nodes omitted]

Yeah, that was a join that spat out a million rows, and we correctly
estimated 500k per process, and then Insert (still with my hack to
turn off the bogus "Parallel" display in this case, but it doesn't
affect the estimation) estimated 250k per process, which is wrong.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 6:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> A couple more observations:
>
> +       pathnode->path.parallel_aware = parallel_workers > 0 ? true : false;
>
> Hmm, I think this may be bogus window dressing only affecting EXPLAIN.
> If you change it to assign false always, it works just the same,
> except EXPLAIN says:
>
>  Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
>    Workers Planned: 2
>    ->  Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
>          ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)
>
> ... instead of:
>
>  Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
>    Workers Planned: 2
>    ->  Parallel Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
>          ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)
>
> AFAICS it's not parallel-aware, it just happens to be running in
> parallel with a partial input and partial output (and in this case,
> effect in terms of writes).  Parallel-aware is our term for nodes that
> actually know they are running in parallel and do some special
> coordination with their twins in other processes.
>

Ah, thanks, I see the distinction now. I'll fix that, to restore
parallel_aware=false for the ModifyTable node.

> The estimated row count also looks wrong; at a guess, the parallel
> divisor is applied twice.  Let me try that with
> parallel_leader_participation=off (which disables some funky maths in
> the row estimation and makes it straight division by number of
> processes):
>
>  Gather  (cost=17629.00..18645.50 rows=1000000 width=4)
>    Workers Planned: 2
>    ->  Insert on s  (cost=17629.00..18645.50 rows=250000 width=4)
>          ->  Parallel Hash Join  (cost=17629.00..37291.00 rows=500000 width=4)
> [more nodes omitted]
>
> Yeah, that was a join that spat out a million rows, and we correctly
> estimated 500k per process, and then Insert (still with my hack to
> turn off the bogus "Parallel" display in this case, but it doesn't
> affect the estimation) estimated 250k per process, which is wrong.

Thanks, I did suspect the current costing was wrong for ModifyTable
(the workers>0 case), as I'd thrown it in (moving the current costing
code into costsize.c) without a lot of checking or great thought, and
it was on my TODO list of things to check. At least I created a
placeholder for it. Looks like I've applied the parallel divisor again
(not accounting for the one already applied to the underlying query),
as you said.
Speaking of costing, I'm not sure I really agree with the current
costing of a Gather node. Just considering a simple Parallel SeqScan
case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
Gather cost always completely drowns out any other path costs when a
large number of rows are involved (at least with default
parallel-related GUC values), such that Parallel SeqScan would never
be the cheapest path. This linear relationship in the costing based on
the rows and a parallel_tuple_cost doesn't make sense to me. Surely
after a certain amount of rows, the overhead of launching workers will
be out-weighed by the benefit of their parallel work, such that the
more rows, the more likely a Parallel SeqScan will benefit. That seems
to suggest something like a logarithmic formula (or similar) would
better match reality than what we have now. Am I wrong on this? Every
time I use default GUC values, the planner doesn't want to generate a
parallel plan. Lowering parallel-related GUCs like parallel_tuple_cost
(which I normally do for testing) influences it of course, but the
linear relationship still seems wrong.
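The concern can be made concrete with a toy calculation (hypothetical helper names; the constants are the documented defaults for parallel_tuple_cost and cpu_tuple_cost, not PostgreSQL's actual cost functions):

```c
#include <assert.h>

/* Toy illustration: with default parallel_tuple_cost = 0.1, the per-tuple
 * transfer charge that cost_gather() adds grows linearly with the row
 * count and quickly dominates the per-tuple CPU cost of the scan itself
 * (cpu_tuple_cost = 0.01). */
#define PARALLEL_TUPLE_COST 0.1
#define CPU_TUPLE_COST      0.01

static double
gather_transfer_cost(double rows)
{
    /* models: run_cost += parallel_tuple_cost * path->path.rows */
    return PARALLEL_TUPLE_COST * rows;
}

static double
scan_cpu_cost(double rows)
{
    return CPU_TUPLE_COST * rows;
}
```

For a million returned rows, the transfer charge alone is roughly 100,000 cost units, ten times the CPU cost of processing those rows, so a parallel path that returns many tuples through the Gather node rarely comes out cheapest.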

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Also, I have attached a separate patch (requested by Andres Freund)
> that just allows the underlying SELECT part of "INSERT INTO ... SELECT
> ..." to be parallel.
>

It might be a good idea to first just get this patch committed, if
possible. So, I have reviewed the latest version of this patch:

0001-InsertParallelSelect
1.
ParallelContext *pcxt;

+ /*
+ * We need to avoid an attempt on INSERT to assign a
+ * FullTransactionId whilst in parallel mode (which is in
+ * effect due to the underlying parallel query) - so the
+ * FullTransactionId is assigned here. Parallel mode must
+ * be temporarily escaped in order for this to be possible.
+ * The FullTransactionId will be included in the transaction
+ * state that is serialized in the parallel DSM.
+ */
+ if (estate->es_plannedstmt->commandType == CMD_INSERT)
+ {
+ Assert(IsInParallelMode());
+ ExitParallelMode();
+ GetCurrentFullTransactionId();
+ EnterParallelMode();
+ }
+

This looks like a hack to me. I think you are doing this to avoid the
parallel mode checks in GetNewTransactionId(), right? If so, I have
already mentioned above [1] that we can change it so that we disallow
assigning xids for parallel workers only. The same is true for the
check in ExecGatherMerge. Do you see any problem with that suggestion?
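The suggested narrowing might look roughly like this (a hypothetical sketch of the guard's logic, not the actual GetNewTransactionId() code):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: instead of rejecting xid assignment whenever parallel mode is
 * active (which also rejects the leader, forcing the hack above), reject
 * it only when running inside a parallel worker. */
static bool
xid_assignment_allowed_current(bool in_parallel_mode, bool is_parallel_worker)
{
    (void) is_parallel_worker;
    return !in_parallel_mode;           /* leader in parallel mode: rejected */
}

static bool
xid_assignment_allowed_suggested(bool in_parallel_mode, bool is_parallel_worker)
{
    (void) in_parallel_mode;
    return !is_parallel_worker;         /* leader may assign; workers may not */
}
```

Under the suggested check, the leader can assign the xid without exiting and re-entering parallel mode, while workers are still forbidden from doing so.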

2.
@@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  */
  if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
  IsUnderPostmaster &&
- parse->commandType == CMD_SELECT &&
+ (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
  !parse->hasModifyingCTE &&
  max_parallel_workers_per_gather > 0 &&
  !IsParallelWorker())

I think the comments above this need to be updated, especially the part
where we say: "Note that we do allow CREATE TABLE AS, SELECT INTO, and
CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
the leader backend writes into a completely new table." Don't we need
to include Insert also?

3.
@@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  * parallel-unsafe, or else the query planner itself has a bug.
  */
  glob->parallelModeNeeded = glob->parallelModeOK &&
+ (parse->commandType == CMD_SELECT) &&
  (force_parallel_mode != FORCE_PARALLEL_OFF);

Why do you need this change? The comments above this code should be
updated to reflect it. I think the code below was modified for the
same reason, but I don't understand the reason for that change either;
it would be better to update the comments for it as well.

@@ -425,7 +426,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  * Optionally add a Gather node for testing purposes, provided this is
  * actually a safe thing to do.
  */
- if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
+ if (force_parallel_mode != FORCE_PARALLEL_OFF && parse->commandType
== CMD_SELECT && top_plan->parallel_safe)
  {
  Gather    *gather = makeNode(Gather);

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BE-pM0U6qw7EOF0yO0giTxdErxoJV9xTqN%2BLo9zdotFQ%40mail.gmail.com


-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Speaking of costing, I'm not sure I really agree with the current
> costing of a Gather node. Just considering a simple Parallel SeqScan
> case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
> Gather cost always completely drowns out any other path costs when a
> large number of rows are involved (at least with default
> parallel-related GUC values), such that Parallel SeqScan would never
> be the cheapest path. This linear relationship in the costing based on
> the rows and a parallel_tuple_cost doesn't make sense to me. Surely
> after a certain amount of rows, the overhead of launching workers will
> be out-weighed by the benefit of their parallel work, such that the
> more rows, the more likely a Parallel SeqScan will benefit.
>

That will be true for the number of rows/pages we need to scan, not
for the number of tuples we need to return as a result. The formula
here considers the number of rows the parallel scan will return: the
more rows each parallel node needs to pass via shared memory to the
Gather node, the more costly it will be.

We do consider the total pages we need to scan in
compute_parallel_worker() where we use a logarithmic formula to
determine the number of workers.
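That page-based heuristic can be sketched as follows (a simplified model of compute_parallel_worker(); the real function also considers index pages and per-relation options):

```c
#include <assert.h>

/* Simplified model: one worker once the table reaches the minimum scan
 * size, plus one more each time the page count triples, i.e. roughly
 * log3(pages / threshold) workers, capped by max_workers. */
static int
sketch_parallel_workers(long heap_pages, long threshold, int max_workers)
{
    int         workers = 0;

    if (heap_pages >= threshold)
    {
        workers = 1;
        while (heap_pages >= threshold * 3 && workers < max_workers)
        {
            workers++;
            threshold *= 3;
        }
    }
    return workers;
}
```

With the default min_parallel_table_scan_size of 8MB (1024 pages), a table of 1024 pages gets one worker, 3072 pages gets two, 9216 pages gets three, and so on logarithmically.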

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> 0001-InsertParallelSelect
> 1.
> ParallelContext *pcxt;
>
> + /*
> + * We need to avoid an attempt on INSERT to assign a
> + * FullTransactionId whilst in parallel mode (which is in
> + * effect due to the underlying parallel query) - so the
> + * FullTransactionId is assigned here. Parallel mode must
> + * be temporarily escaped in order for this to be possible.
> + * The FullTransactionId will be included in the transaction
> + * state that is serialized in the parallel DSM.
> + */
> + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> + {
> + Assert(IsInParallelMode());
> + ExitParallelMode();
> + GetCurrentFullTransactionId();
> + EnterParallelMode();
> + }
> +
>
> This looks like a hack to me. I think you are doing this to avoid the
> parallel mode checks in GetNewTransactionId(), right?

Yes, agreed, it is a hack to avoid that (mind you, it's not exactly
great that ExecutePlan() sets parallel mode for the entire plan
execution). Also, I did not expect that to necessarily remain in a
final patch.

>If so, I have
> already mentioned above [1] that we can change it so that we disallow
> assigning xids for parallel workers only. The same is true for the
> check in ExecGatherMerge. Do you see any problem with that suggestion?
>

No, should be OK I guess, but will update and test to be sure.

> 2.
> @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
>   */
>   if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
>   IsUnderPostmaster &&
> - parse->commandType == CMD_SELECT &&
> + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
>   !parse->hasModifyingCTE &&
>   max_parallel_workers_per_gather > 0 &&
>   !IsParallelWorker())
>
> I think the comments above this need to be updated, especially the part
> where we say: "Note that we do allow CREATE TABLE AS, SELECT INTO, and
> CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
> the leader backend writes into a completely new table." Don't we need
> to include Insert also?

Yes, Insert needs to be mentioned somewhere there.

>
> 3.
> @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
>   * parallel-unsafe, or else the query planner itself has a bug.
>   */
>   glob->parallelModeNeeded = glob->parallelModeOK &&
> + (parse->commandType == CMD_SELECT) &&
>   (force_parallel_mode != FORCE_PARALLEL_OFF);
>
> Why do you need this change? The comments above this code should be
> updated to reflect this change. I think for the same reason the below
> code seems to be modified but I don't understand the reason for the
> below change as well, also it is better to update the comments for
> this as well.
>

OK, I will update the comments for this.
Basically, up to now, the "force_parallel_mode" has only ever operated
on a SELECT.
But since we are now allowing CMD_INSERT to be assessed for parallel
mode too, we need to prevent the force_parallel_mode logic from
sticking a Gather node over the top of arbitrary INSERTs and causing
them to be run in parallel. Not all INSERTs are suitable for parallel
operation, and also there are further considerations for
parallel-safety for INSERTs compared to SELECT. INSERTs can also
trigger UPDATEs.
If we need to support force_parallel_mode for INSERT, more work will
need to be done.

> @@ -425,7 +426,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
>   * Optionally add a Gather node for testing purposes, provided this is
>   * actually a safe thing to do.
>   */
> - if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
> + if (force_parallel_mode != FORCE_PARALLEL_OFF && parse->commandType
> == CMD_SELECT && top_plan->parallel_safe)
>   {
>   Gather    *gather = makeNode(Gather);
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2BE-pM0U6qw7EOF0yO0giTxdErxoJV9xTqN%2BLo9zdotFQ%40mail.gmail.com
>

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > Speaking of costing, I'm not sure I really agree with the current
> > costing of a Gather node. Just considering a simple Parallel SeqScan
> > case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
> > Gather cost always completely drowns out any other path costs when a
> > large number of rows are involved (at least with default
> > parallel-related GUC values), such that Parallel SeqScan would never
> > be the cheapest path. This linear relationship in the costing based on
> > the rows and a parallel_tuple_cost doesn't make sense to me. Surely
> > after a certain amount of rows, the overhead of launching workers will
> > be out-weighed by the benefit of their parallel work, such that the
> > more rows, the more likely a Parallel SeqScan will benefit.
> >
>
> That will be true for the number of rows/pages we need to scan, not
> for the number of tuples we need to return as a result. The formula
> here considers the number of rows the parallel scan will return: the
> more rows each parallel node needs to pass via shared memory to the
> Gather node, the more costly it will be.
>
> We do consider the total pages we need to scan in
> compute_parallel_worker() where we use a logarithmic formula to
> determine the number of workers.
>

Despite all the best intentions, the current costings seem to be
geared towards selecting a non-parallel plan over a parallel plan,
the more rows there are in the table. Yet the performance of a
parallel plan appears to be better than that of a non-parallel plan,
the more rows there are in the table.
This doesn't seem right to me. Is there a rationale behind this costing model?
I have pointed out the part of the parallel_tuple_cost calculation
that seems to drown out all other costs (causing the cost value to be
huge), the more rows there are in the table.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 4:28 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > Speaking of costing, I'm not sure I really agree with the current
> > > costing of a Gather node. Just considering a simple Parallel SeqScan
> > > case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
> > > Gather cost always completely drowns out any other path costs when a
> > > large number of rows are involved (at least with default
> > > parallel-related GUC values), such that Parallel SeqScan would never
> > > be the cheapest path. This linear relationship in the costing based on
> > > the rows and a parallel_tuple_cost doesn't make sense to me. Surely
> > > after a certain amount of rows, the overhead of launching workers will
> > > be out-weighed by the benefit of their parallel work, such that the
> > > more rows, the more likely a Parallel SeqScan will benefit.
> > >
> >
> > That will be true for the number of rows/pages we need to scan, not
> > for the number of tuples we need to return as a result. The formula
> > here considers the number of rows the parallel scan will return: the
> > more rows each parallel node needs to pass via shared memory to the
> > Gather node, the more costly it will be.
> >
> > We do consider the total pages we need to scan in
> > compute_parallel_worker() where we use a logarithmic formula to
> > determine the number of workers.
> >
>
> Despite all the best intentions, the current costings seem to be
> geared towards selection of a non-parallel plan over a parallel plan,
> the more rows there are in the table. Yet the performance of a
> parallel plan appears to be better than a non-parallel plan the more
> rows there are in the table.
> This doesn't seem right to me. Is there a rationale behind this costing model?
>

Yes, AFAIK, there is no proof that we can get any (much) gain by
dividing the I/O among workers. It is primarily the CPU effort that
gives the benefit. So, parallel plans show the greatest benefit when
we have to scan a large table and then project far fewer rows.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> + /*
> + * We need to avoid an attempt on INSERT to assign a
> + * FullTransactionId whilst in parallel mode (which is in
> + * effect due to the underlying parallel query) - so the
> + * FullTransactionId is assigned here. Parallel mode must
> + * be temporarily escaped in order for this to be possible.
> + * The FullTransactionId will be included in the transaction
> + * state that is serialized in the parallel DSM.
> + */
> + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> + {
> + Assert(IsInParallelMode());
> + ExitParallelMode();
> + GetCurrentFullTransactionId();
> + EnterParallelMode();
> + }
> +
>
> This looks like a hack to me. I think you are doing this to avoid the
> parallel mode checks in GetNewTransactionId(), right? If so, I have
> already mentioned above [1] that we can change it so that we disallow
> assigning xids for parallel workers only. The same is true for the
> check in ExecGatherMerge. Do you see any problem with that suggestion?
>

Actually, there is a problem.
If I remove that "hack", and change the code in GetNewTransactionId()
to disallow xid assignment for parallel workers only, then there is
also similar code in AssignTransactionId() which gets called. If I
change that code too, in the same way, then on a parallel INSERT, that
code gets called by a parallel worker (from GetCurrentTransactionId())
and the ERROR "cannot assign XIDs in a parallel worker" results.
GetCurrentFullTransactionId() must be called in the leader, somewhere
(and will be included in the transaction state that is serialized in
the parallel DSM).
If not done here, then where?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 5:54 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > + /*
> > + * We need to avoid an attempt on INSERT to assign a
> > + * FullTransactionId whilst in parallel mode (which is in
> > + * effect due to the underlying parallel query) - so the
> > + * FullTransactionId is assigned here. Parallel mode must
> > + * be temporarily escaped in order for this to be possible.
> > + * The FullTransactionId will be included in the transaction
> > + * state that is serialized in the parallel DSM.
> > + */
> > + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> > + {
> > + Assert(IsInParallelMode());
> > + ExitParallelMode();
> > + GetCurrentFullTransactionId();
> > + EnterParallelMode();
> > + }
> > +
> >
> > This looks like a hack to me. I think you are doing this to avoid the
> > parallel mode checks in GetNewTransactionId(), right? If so, I have
> > already mentioned above [1] that we can change it so that we disallow
> > assigning xids for parallel workers only. The same is true for the
> > check in ExecGatherMerge. Do you see any problem with that suggestion?
> >
>
> Actually, there is a problem.
> If I remove that "hack", and change the code in GetNewTransactionId()
> to disallow xid assignment for parallel workers only, then there is
> also similar code in AssignTransactionId() which gets called.
>

I don't think workers need to call AssignTransactionId(); before
that, the transaction id passed from the leader should be set in
CurrentTransactionState. Why would
GetCurrentTransactionId()/GetCurrentFullTransactionId() need to call
AssignTransactionId() when called from a worker?

> GetCurrentFullTransactionId() must be called in the leader, somewhere
> (and will be included in the transaction state that is serialized in
> the parallel DSM).
>

Yes, it should have been done in the leader and then set in the
workers via StartParallelWorkerTransaction before we do any actual
operation. If that happens then GetCurrentTransactionId() won't need
to call AssignTransactionId().
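A minimal model of the hand-off being described (hypothetical struct and function names; the real mechanism serializes the leader's transaction state into the parallel DSM and restores it in each worker via StartParallelWorkerTransaction()):

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
    unsigned    xid;            /* 0 = not yet assigned */
    bool        is_worker;
} TxnState;

/* Leader-only assignment: workers must always find the xid already set. */
static unsigned
get_current_xid(TxnState *t)
{
    if (t->xid == 0)
    {
        assert(!t->is_worker);  /* "cannot assign XIDs in a parallel worker" */
        t->xid = 531;           /* pretend allocation from the xid counter */
    }
    return t->xid;
}

/* Models serializing the leader's state into the DSM and restoring it
 * in a worker before it does any actual operation. */
static void
start_worker_txn(const TxnState *leader, TxnState *worker)
{
    worker->xid = leader->xid;
    worker->is_worker = true;
}

static bool
worker_reuses_leader_xid(void)
{
    TxnState    leader = {0, false};
    TxnState    worker = {0, false};
    unsigned    xid = get_current_xid(&leader); /* leader assigns first */

    start_worker_txn(&leader, &worker);         /* state passed via "DSM" */
    return get_current_xid(&worker) == xid;     /* worker needs no assignment */
}
```

If the leader assigns the xid before launching workers, the worker-side lookup simply returns the inherited value and the assignment path is never reached in a worker.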

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 3:51 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 0001-InsertParallelSelect
> > 1.
> > ParallelContext *pcxt;
> >
> > + /*
> > + * We need to avoid an attempt on INSERT to assign a
> > + * FullTransactionId whilst in parallel mode (which is in
> > + * effect due to the underlying parallel query) - so the
> > + * FullTransactionId is assigned here. Parallel mode must
> > + * be temporarily escaped in order for this to be possible.
> > + * The FullTransactionId will be included in the transaction
> > + * state that is serialized in the parallel DSM.
> > + */
> > + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> > + {
> > + Assert(IsInParallelMode());
> > + ExitParallelMode();
> > + GetCurrentFullTransactionId();
> > + EnterParallelMode();
> > + }
> > +
> >
> > This looks like a hack to me. I think you are doing this to avoid the
> > parallel mode checks in GetNewTransactionId(), right?
>
> Yes, agreed, it is a hack to avoid that (mind you, it's not exactly great
> that ExecutePlan() sets parallel-mode for the entire plan execution).
> Also, did not expect that to necessarily remain in a final patch.
>
> >If so, I have
> > already mentioned above [1] that we can change it so that we disallow
> > assigning xids for parallel workers only. The same is true for the
> > check in ExecGatherMerge. Do you see any problem with that suggestion?
> >
>
> No, should be OK I guess, but will update and test to be sure.
>
> > 2.
> > @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
> > *query_string, int cursorOptions,
> >   */
> >   if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> >   IsUnderPostmaster &&
> > - parse->commandType == CMD_SELECT &&
> > + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
> >   !parse->hasModifyingCTE &&
> >   max_parallel_workers_per_gather > 0 &&
> >   !IsParallelWorker())
> >
> > I think the comments above this need to be updated, especially the part
> > where we say: "Note that we do allow CREATE TABLE AS, SELECT INTO, and
> > CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
> > the leader backend writes into a completely new table." Don't we need
> > to include Insert also?
>
> Yes, Insert needs to be mentioned somewhere there.
>
> >
> > 3.
> > @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
> > *query_string, int cursorOptions,
> >   * parallel-unsafe, or else the query planner itself has a bug.
> >   */
> >   glob->parallelModeNeeded = glob->parallelModeOK &&
> > + (parse->commandType == CMD_SELECT) &&
> >   (force_parallel_mode != FORCE_PARALLEL_OFF);
> >
> > Why do you need this change? The comments above this code should be
> > updated to reflect this change. I think for the same reason the below
> > code seems to be modified but I don't understand the reason for the
> > below change as well, also it is better to update the comments for
> > this as well.
> >
>
> OK, I will update the comments for this.
> Basically, up to now, the "force_parallel_mode" has only ever operated
> on a SELECT.
> But since we are now allowing CMD_INSERT to be assessed for parallel
> mode too, we need to prevent the force_parallel_mode logic from
> sticking a Gather node over the top of arbitrary INSERTs and causing
> them to be run in parallel. Not all INSERTs are suitable for parallel
> operation, and also there are further considerations for
> parallel-safety for INSERTs compared to SELECT. INSERTs can also
> trigger UPDATEs.
>

Sure but in that case 'top_plan->parallel_safe' should be false and it
should stick Gather node atop Insert node. For the purpose of this
patch, the scan beneath Insert should be considered as parallel_safe.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 6:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 3:51 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > OK, I will update the comments for this.
> > Basically, up to now, the "force_parallel_mode" has only ever operated
> > on a SELECT.
> > But since we are now allowing CMD_INSERT to be assessed for parallel
> > mode too, we need to prevent the force_parallel_mode logic from
> > sticking a Gather node over the top of arbitrary INSERTs and causing
> > them to be run in parallel. Not all INSERTs are suitable for parallel
> > operation, and also there are further considerations for
> > parallel-safety for INSERTs compared to SELECT. INSERTs can also
> > trigger UPDATEs.
> >
>
> Sure but in that case 'top_plan->parallel_safe' should be false and it
> should stick Gather node atop Insert node.
>

/should/should not.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 11:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > Sure but in that case 'top_plan->parallel_safe' should be false and it
> > should stick Gather node atop Insert node.
> >
>
> /should/should not.
>

OK, for the minimal patch, just allowing INSERT with parallel SELECT,
you're right, neither of those additional "commandType == CMD_SELECT"
checks are needed, so I'll remove them. (In the main patch, I think
the first check can be removed, once the XID handling is fixed; the
second check is definitely needed though).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Fri, Oct 9, 2020 at 11:58 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > That will be true for the number of rows/pages we need to scan, not
> > for the number of tuples we need to return as a result. The formula
> > here considers the number of rows the parallel scan will return: the
> > more rows each parallel node needs to pass via shared memory to the
> > Gather node, the more costly it will be.
> >
> > We do consider the total pages we need to scan in
> > compute_parallel_worker() where we use a logarithmic formula to
> > determine the number of workers.
>
> Despite all the best intentions, the current costings seem to be
> geared towards selection of a non-parallel plan over a parallel plan,
> the more rows there are in the table. Yet the performance of a
> parallel plan appears to be better than a non-parallel plan the more
> rows there are in the table.

Right, but as Amit said, we still have to account for the cost of
schlepping tuples between processes.  Hmm... could the problem be that
we're incorrectly estimating that Insert (without RETURNING) will send
a bazillion tuples, even though that isn't true?  I didn't look at the
code but that's what the plan seems to imply when it says stuff like
"Gather  (cost=15428.00..16101.14 rows=1000000 width=4)".  I suppose
the row estimates for ModifyTable paths are based on what they write,
not what they emit, and in the past that distinction didn't matter
much because it wasn't something that was used for comparing
alternative plans.  Now it is.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 7:32 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 11:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > Sure but in that case 'top_plan->parallel_safe' should be false and it
> > > should stick Gather node atop Insert node.
> > >
> >
> > /should/should not.
> >
>
> OK, for the minimal patch, just allowing INSERT with parallel SELECT,
> you're right, neither of those additional "commandType == CMD_SELECT"
> checks are needed, so I'll remove them.
>

Okay, that makes sense.

> (In the main patch, I think
> the first check can be removed, once the XID handling is fixed; the
> second check is definitely needed though).
>

Okay, then move that check but please do add some comments to state the reason.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Also, I have attached a separate patch (requested by Andres Freund)
> > that just allows the underlying SELECT part of "INSERT INTO ... SELECT
> > ..." to be parallel.
> >
>
> It might be a good idea to first just get this patch committed, if
> possible. So, I have reviewed the latest version of this patch:
>

A few initial comments on 0004-ParallelInsertSelect:

1.
@@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation,
HeapTuple tup, TransactionId xid,
  * inserts in general except for the cases where inserts generate a new
  * CommandId (eg. inserts into a table having a foreign key column).
  */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-

I speculated above [1] about whether we can change this check
instead of just removing it. Have you considered that suggestion?

2.
@@ -764,12 +778,13 @@ GetCurrentCommandId(bool used)
  if (used)
  {
  /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader.  We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed
+ * if currentCommandIdUsed was already true at the start of the
+ * parallel operation (by way of SetCurrentCommandIdUsed()), otherwise
+ * forbid setting currentCommandIdUsed because we have no provision
+ * for communicating this back to the leader.
  */
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
  currentCommandIdUsed = true;
  }

Once we allow this, won't the next CommandCounterIncrement() in the
worker increment the commandId, leading to different commandIds being
used in the worker and leader? Is that prevented in some way, and if
so, how? Can we document this?

3.
@@ -173,7 +173,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
  * against performing unsafe operations in parallel mode, but this gives a
  * more user-friendly error message.
  */
- if ((XactReadOnly || IsInParallelMode()) &&
+ if ((XactReadOnly || (IsInParallelMode() &&
queryDesc->plannedstmt->commandType != CMD_INSERT)) &&
  !(eflags & EXEC_FLAG_EXPLAIN_ONLY))
  ExecCheckXactReadOnly(queryDesc->plannedstmt);

I don't think the above change is correct. We need to extend the below
check in ExecCheckXactReadOnly() because otherwise it can allow
Insert operations even when XactReadOnly is set, which we don't want.

ExecCheckXactReadOnly()
{
..
if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
..
}

4.
@@ -173,18 +175,20 @@ ExecSerializePlan(Plan *plan, EState *estate)
  * PlannedStmt to start the executor.
- pstmt->hasReturning = false;
- pstmt->hasModifyingCTE = false;
+ pstmt->hasReturning = estate->es_plannedstmt->hasReturning;
+ pstmt->hasModifyingCTE = estate->es_plannedstmt->hasModifyingCTE;

Why change hasModifyingCTE?

5.
+ if (isParallelInsertLeader)
+ {
+ /* For Parallel INSERT, if there are BEFORE STATEMENT triggers,
+ * these must be fired by the leader, not the parallel workers.
+ */

The multi-line comment should start from the second line. I see a
similar problem at other places in the patch as well.

6.
@@ -178,6 +214,25 @@ ExecGather(PlanState *pstate)
  node->pei,
  gather->initParam);

+ if (isParallelInsertLeader)
+ {
+ /* For Parallel INSERT, if there are BEFORE STATEMENT triggers,
+ * these must be fired by the leader, not the parallel workers.
+ */
+ if (nodeModifyTableState->fireBSTriggers)
+ {
+ fireBSTriggers(nodeModifyTableState);
+ nodeModifyTableState->fireBSTriggers = false;
+
+ /*
+ * Disable firing of AFTER STATEMENT triggers by local
+ * plan execution (ModifyTable processing). These will be
+ * fired at end of Gather processing.
+ */
+ nodeModifyTableState->fireASTriggers = false;
+ }
+ }

Can we encapsulate this in a separate function? It seems a bit odd to
do this directly in ExecGather.

7.
@@ -418,14 +476,25 @@ ExecShutdownGatherWorkers(GatherState *node)
 void
 ExecShutdownGather(GatherState *node)
 {
- ExecShutdownGatherWorkers(node);
+ if (node->pei == NULL)
+ return;

- /* Now destroy the parallel context. */
- if (node->pei != NULL)

So after this patch, if "node->pei == NULL" then we won't shut down the
workers here? Why so?

8. You have made changes related to trigger execution for Gather node,
don't we need similar changes for GatherMerge node?

9.
@@ -383,7 +444,21 @@ cost_gather(GatherPath *path, PlannerInfo *root,

  /* Parallel setup and communication cost. */
  startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * For Parallel INSERT, provided no tuples are returned from workers
+ * to gather/leader node, don't add a cost-per-row, as each worker
+ * parallelly inserts the tuples that result from its chunk of plan
+ * execution. This change may make the parallel plan cheap among all
+ * other plans, and influence the planner to consider this parallel
+ * plan.
+ */
+ if (!(IsA(path->subpath, ModifyTablePath) &&
+ castNode(ModifyTablePath, path->subpath)->operation == CMD_INSERT &&
+ castNode(ModifyTablePath, path->subpath)->returningLists != NULL))
+ {
+ run_cost += parallel_tuple_cost * path->path.rows;
+ }

Shouldn't the last condition in the above check, "castNode(ModifyTablePath,
path->subpath)->returningLists != NULL", be
"castNode(ModifyTablePath, path->subpath)->returningLists == NULL"
instead? Otherwise, when there is a returning list, it won't count
the cost for passing tuples via the Gather node. This might be the
reason for what Thomas has seen in his recent email [2].
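The costing point can be modeled in isolation (illustrative code only,
not the actual cost_gather(); the function name and parameters are
simplified stand-ins): the per-tuple transfer cost should be charged
only for rows that actually travel from the workers to the leader, which
for a parallel INSERT means only when there is a RETURNING list.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative-only model of the Gather costing question under
 * discussion: a parallel INSERT without RETURNING sends no rows up to
 * the leader, so path_rows should contribute nothing to run_cost in
 * that case; with RETURNING (or a non-INSERT subpath), rows do flow
 * through Gather and must be charged for.
 */
static double
gather_run_cost(double subpath_run_cost, double path_rows,
				bool subpath_is_insert, bool has_returning,
				double parallel_tuple_cost)
{
	double		run_cost = subpath_run_cost;

	if (!(subpath_is_insert && !has_returning))
		run_cost += parallel_tuple_cost * path_rows;
	return run_cost;
}
```

With the "!= NULL" form of the condition, the RETURNING case would take
the cheap branch, which is the inversion being pointed out above.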

10. Don't we need a change similar to cost_gather in
cost_gather_merge? It seems you have made only partial changes for
GatherMerge node.


[1] - https://www.postgresql.org/message-id/CAA4eK1KyftVDgovvRQmdV1b%3DnN0R-KqdWZqiu7jZ1GYQ7SO9OA%40mail.gmail.com
[2] -
https://www.postgresql.org/message-id/CA%2BhUKGLZB%3D1Q%2BAQQEEmffr3bUMAh%2BJD%2BJ%2B7axv%2BK10Kea0U9TQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Sat, Oct 10, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> 8. You have made changes related to trigger execution for Gather node,
> don't we need similar changes for GatherMerge node?
>
..
>
> 10. Don't we need a change similar to cost_gather in
> cost_gather_merge? It seems you have made only partial changes for
> GatherMerge node.
>

Please ignore these two comments, as we can't push the Insert to workers
when GatherMerge is involved: the leader backend does the final phase
(merging the results from the workers). So, we can only push the Select
part of the statement.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Sun, Oct 11, 2020 at 12:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> + /*
> + * For Parallel INSERT, provided no tuples are returned from workers
> + * to gather/leader node, don't add a cost-per-row, as each worker
> + * parallelly inserts the tuples that result from its chunk of plan
> + * execution. This change may make the parallel plan cheap among all
> + * other plans, and influence the planner to consider this parallel
> + * plan.
> + */
> + if (!(IsA(path->subpath, ModifyTablePath) &&
> + castNode(ModifyTablePath, path->subpath)->operation == CMD_INSERT &&
> + castNode(ModifyTablePath, path->subpath)->returningLists != NULL))
> + {
> + run_cost += parallel_tuple_cost * path->path.rows;
> + }
>
> Isn't the last condition in above check "castNode(ModifyTablePath,
> path->subpath)->returningLists != NULL" should be
> "castNode(ModifyTablePath, path->subpath)->returningLists == NULL"
> instead? Because otherwise when there is returning list it won't count
> the cost for passing tuples via Gather node. This might be reason of
> what Thomas has seen in his recent email [2].

Yeah, I think this is trying to fix the problem too late.  Instead, we
should fix the incorrect row estimates so we don't have to fudge it
later like that.  For example, this should be estimating rows=0:

postgres=# explain analyze insert into s select * from t t1 join t t2 using (i);
...
 Insert on s  (cost=30839.08..70744.45 rows=1000226 width=4) (actual
time=2940.560..2940.562 rows=0 loops=1)

I think that should be done with something like this:

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3583,16 +3583,11 @@ create_modifytable_path(PlannerInfo *root,
RelOptInfo *rel,
                if (lc == list_head(subpaths))  /* first node? */
                        pathnode->path.startup_cost = subpath->startup_cost;
                pathnode->path.total_cost += subpath->total_cost;
-               pathnode->path.rows += subpath->rows;
+               if (returningLists != NIL)
+                       pathnode->path.rows += subpath->rows;
                total_size += subpath->pathtarget->width * subpath->rows;
        }

-       /*
-        * Set width to the average width of the subpath outputs.  XXX this is
-        * totally wrong: we should report zero if no RETURNING, else an average
-        * of the RETURNING tlist widths.  But it's what happened historically,
-        * and improving it is a task for another day.
-        */
        if (pathnode->path.rows > 0)
                total_size /= pathnode->path.rows;
        pathnode->path.pathtarget->width = rint(total_size);
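The effect of the proposed fix can be modeled standalone (illustrative
code only, not create_modifytable_path() itself): a ModifyTable (INSERT)
node returns rows to its parent only when there is a RETURNING list, so
the path's row estimate should accumulate subpath rows only in that
case, giving rows=0 for a plain INSERT ... SELECT.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Standalone sketch of the row-estimate fix: sum the subpaths' row
 * estimates only when a RETURNING list is present; otherwise the
 * INSERT emits no rows and the estimate should be zero.
 */
static double
modifytable_rows(const double *subpath_rows, int nsubpaths,
				 bool has_returning)
{
	double		rows = 0.0;

	if (has_returning)
	{
		for (int i = 0; i < nsubpaths; i++)
			rows += subpath_rows[i];
	}
	return rows;				/* 0 for INSERT without RETURNING */
}
```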



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sat, Oct 10, 2020 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > OK, for the minimal patch, just allowing INSERT with parallel SELECT,
> > you're right, neither of those additional "commandType == CMD_SELECT"
> > checks are needed, so I'll remove them.
> >
>
> Okay, that makes sense.
>

For the minimal patch (just allowing INSERT with parallel SELECT),
there are issues with parallel-mode and various parallel-mode-related
checks in the code.
Initially, I thought it was only a couple of XID-related checks (which
could perhaps just be tweaked to check for IsParallelWorker() instead,
as you suggested), but I now realise that there are a lot more cases.
This stems from the fact that just having a parallel SELECT (as part
of non-parallel INSERT) causes parallel-mode to be set for the WHOLE
plan. I'm not sure why parallel-mode is set globally like this, for
the whole plan. Couldn't it just be set for the scope of
Gather/GatherMerge? Otherwise, errors from these checks seem to be
misleading when outside the scope of Gather/GatherMerge, as
technically they are not occurring within the scope of parallel-leader
and parallel-worker(s). The global parallel-mode wouldn't have been an
issue before, because up to now INSERT has never had underlying
parallel operations.

For example, when running the tests under
"force_parallel_mode=regress", the test failures show that there are a
lot more cases affected:

"cannot assign TransactionIds during a parallel operation"
"cannot assign XIDs during a parallel operation"
"cannot start commands during a parallel operation"
"cannot modify commandid in active snapshot during a parallel operation"
"cannot execute nextval() during a parallel operation"
"cannot execute INSERT during a parallel operation"
"cannot execute ANALYZE during a parallel operation
"cannot update tuples during a parallel operation"

(and there are more not currently detected by the tests, found by
searching the code).

As an example, with the minimal patch applied, if you had a trigger on
INSERT that, say, attempted a table creation or UPDATE/DELETE, and you
ran an "INSERT INTO ... SELECT...", it would treat the trigger
operations as being attempted in parallel-mode, and so an error would
result.

Let me know your thoughts on how to deal with these issues.
Can you see a problem with only having parallel-mode set for scope of
Gather/GatherMerge, or do you have some other idea?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sun, Oct 11, 2020 at 1:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Oct 10, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 8. You have made changes related to trigger execution for Gather node,
> > don't we need similar changes for GatherMerge node?
> >
> ..
> >
> > 10. Don't we need a change similar to cost_gather in
> > cost_gather_merge? It seems you have made only partial changes for
> > GatherMerge node.
> >
>
> Please ignore these two comments as we can't push Insert to workers
> when GatherMerge is involved as a leader backend does the final phase
> (merge the results by workers). So, we can only push the Select part
> of the statement.
>

Precisely, that's why I didn't make those changes for GatherMerge.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sun, Oct 11, 2020 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> Yeah, I think this is trying to fix the problem too late.  Instead, we
> should fix the incorrect row estimates so we don't have to fudge it
> later like that.  For example, this should be estimating rows=0:
>
> postgres=# explain analyze insert into s select * from t t1 join t t2 using (i);
> ...
>  Insert on s  (cost=30839.08..70744.45 rows=1000226 width=4) (actual
> time=2940.560..2940.562 rows=0 loops=1)
>
> I think that should be done with something like this:
>
> --- a/src/backend/optimizer/util/pathnode.c
> +++ b/src/backend/optimizer/util/pathnode.c
> @@ -3583,16 +3583,11 @@ create_modifytable_path(PlannerInfo *root,
> RelOptInfo *rel,
>                 if (lc == list_head(subpaths))  /* first node? */
>                         pathnode->path.startup_cost = subpath->startup_cost;
>                 pathnode->path.total_cost += subpath->total_cost;
> -               pathnode->path.rows += subpath->rows;
> +               if (returningLists != NIL)
> +                       pathnode->path.rows += subpath->rows;
>                 total_size += subpath->pathtarget->width * subpath->rows;
>         }
>
> -       /*
> -        * Set width to the average width of the subpath outputs.  XXX this is
> -        * totally wrong: we should report zero if no RETURNING, else an average
> -        * of the RETURNING tlist widths.  But it's what happened historically,
> -        * and improving it is a task for another day.
> -        */
>         if (pathnode->path.rows > 0)
>                 total_size /= pathnode->path.rows;
>         pathnode->path.pathtarget->width = rint(total_size);

Agree, thanks (bug in existing Postgres code, right?)

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Mon, Oct 12, 2020 at 3:42 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Sun, Oct 11, 2020 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> >                 pathnode->path.total_cost += subpath->total_cost;
> > -               pathnode->path.rows += subpath->rows;
> > +               if (returningLists != NIL)
> > +                       pathnode->path.rows += subpath->rows;
> >                 total_size += subpath->pathtarget->width * subpath->rows;
> >         }

Erm, except the condition should of course cover total_size too.

> Agree, thanks (bug in existing Postgres code, right?)

Yeah, I think we should go ahead and fix that up front.  Here's a
version with a commit message.

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Oct 12, 2020 at 6:51 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Sat, Oct 10, 2020 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > OK, for the minimal patch, just allowing INSERT with parallel SELECT,
> > > you're right, neither of those additional "commandType == CMD_SELECT"
> > > checks are needed, so I'll remove them.
> > >
> >
> > Okay, that makes sense.
> >
>
> For the minimal patch (just allowing INSERT with parallel SELECT),
> there are issues with parallel-mode and various parallel-mode-related
> checks in the code.
> Initially, I thought it was only a couple of XID-related checks (which
> could perhaps just be tweaked to check for IsParallelWorker() instead,
> as you suggested), but I now realise that there are a lot more cases.
> This stems from the fact that just having a parallel SELECT (as part
> of non-parallel INSERT) causes parallel-mode to be set for the WHOLE
> plan. I'm not sure why parallel-mode is set globally like this, for
> the whole plan. Couldn't it just be set for the scope of
> Gather/GatherMerge? Otherwise, errors from these checks seem to be
> misleading when outside the scope of Gather/GatherMerge, as
> technically they are not occurring within the scope of parallel-leader
> and parallel-worker(s). The global parallel-mode wouldn't have been an
> issue before, because up to now INSERT has never had underlying
> parallel operations.
>

That is right, but there is another operation that already works like
that. For example, a statement like "create table test_new As select *
from test_parallel where c1 < 1000;" will use a parallel select, but the
write operation is performed in the leader. I agree that the code flow
of Insert is different, so we will have a different set of challenges in
that case, but there shouldn't be any fundamental problem in making it
work.

> For example, when running the tests under
> "force_parallel_mode=regress", the test failures show that there are a
> lot more cases affected:
>
> "cannot assign TransactionIds during a parallel operation"
> "cannot assign XIDs during a parallel operation"
> "cannot start commands during a parallel operation"
> "cannot modify commandid in active snapshot during a parallel operation"
> "cannot execute nextval() during a parallel operation"
> "cannot execute INSERT during a parallel operation"
> "cannot execute ANALYZE during a parallel operation
> "cannot update tuples during a parallel operation"
>
> (and there are more not currently detected by the tests, found by
> searching the code).
>

Did you get these after applying your patch? If so, can you share the
version you are using, or if you have already posted it, point me to
it?

> As an example, with the minimal patch applied, if you had a trigger on
> INSERT that, say, attempted a table creation or UPDATE/DELETE, and you
> ran an "INSERT INTO ... SELECT...", it would treat the trigger
> operations as being attempted in parallel-mode, and so an error would
> result.
>

Oh, I guess this happens because you need to execute Insert in
parallel-mode even though Insert is happening in the leader, right?
And probably we are not facing this with "Create Table As .." because
there is no trigger execution involved there.

> Let me know your thoughts on how to deal with these issues.
> Can you see a problem with only having parallel-mode set for scope of
> Gather/GatherMerge, or do you have some other idea?
>

I have not thought about this yet but I don't understand your
proposal. How will you set it only for the scope of Gather (Merge)?
The execution of the Gather node will be interleaved with the Insert
node, basically, you fetch a tuple from Gather, and then you need to
Insert it. Can you be a bit more specific on what you have in mind for
this?

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Oct 12, 2020 at 2:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Mon, Oct 12, 2020 at 3:42 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Sun, Oct 11, 2020 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > >                 pathnode->path.total_cost += subpath->total_cost;
> > > -               pathnode->path.rows += subpath->rows;
> > > +               if (returningLists != NIL)
> > > +                       pathnode->path.rows += subpath->rows;
> > >                 total_size += subpath->pathtarget->width * subpath->rows;
> > >         }
>
> Erm, except the condition should of course cover total_size too.
>
> > Agree, thanks (bug in existing Postgres code, right?)
>
> Yeah, I think we should go ahead and fix that up front.  Here's a
> version with a commit message.

I've checked it and tested it, and it looks fine to me.
Also, it seems to align with the gripe in the old comment about width
("XXX this is totally wrong: we should report zero if no RETURNING
...").
I'm happy for you to commit it.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Oct 12, 2020 at 9:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 12, 2020 at 6:51 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
>
> > Let me know your thoughts on how to deal with these issues.
> > Can you see a problem with only having parallel-mode set for scope of
> > Gather/GatherMerge, or do you have some other idea?
> >
>
> I have not thought about this yet but I don't understand your
> proposal. How will you set it only for the scope of Gather (Merge)?
> The execution of the Gather node will be interleaved with the Insert
> node, basically, you fetch a tuple from Gather, and then you need to
> Insert it. Can you be a bit more specific on what you have in mind for
> this?
>

One more thing I would like to add here is that we can't exit
parallel-mode while the workers are still running (performing the scan
or whatever other operation they are assigned) and the shared memory is
not yet destroyed. Otherwise, the leader could perform unsafe things
like assigning new commandIds, or the workers could send some error for
which the leader should still be in parallel-mode. So, considering this,
I think we need quite similar checks (similar to those for parallel
inserts) to make even the Select part parallel for Inserts. If we do
that, you won't face many of the problems you mentioned above, like
executing triggers that contain parallel-unsafe stuff. I still feel it
will be beneficial to do this, as it will cover cases like an Insert
with a GatherMerge underneath it, which would otherwise not be possible.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Oct 12, 2020 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > I have not thought about this yet but I don't understand your
> > proposal. How will you set it only for the scope of Gather (Merge)?
> > The execution of the Gather node will be interleaved with the Insert
> > node, basically, you fetch a tuple from Gather, and then you need to
> > Insert it. Can you be a bit more specific on what you have in mind for
> > this?
> >
>
> One more thing I would like to add here is that we can't exit
> parallel-mode till the workers are running (performing the scan or
> other operation it is assigned with) and shared memory is not
> destroyed. Otherwise, the leader can perform un-safe things like
> assigning new commandsids or probably workers can send some error for
> which the leader should still be in parallel-mode. So, considering
> this I think we need quite similar checks (similar to parallel
> inserts) to make even the Select part parallel for Inserts. If we do
> that then you won't face many of the problems you mentioned above like
> executing triggers that contain parallel-unsafe stuff. I feel still it
> will be beneficial to do this as it will cover cases like Insert with
> GatherMerge underneath it which would otherwise not possible.
>

Yes, I see what you mean; exiting parallel-mode can't be done safely
where I had hoped it could. So it looks like, even to make just the
Select part of the Insert parallel, I need to add checks (along the same
lines as for Parallel Insert) to avoid the parallel Select in certain
potentially-unsafe cases.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Oct 12, 2020 at 12:38 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Oct 12, 2020 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > I have not thought about this yet but I don't understand your
> > > proposal. How will you set it only for the scope of Gather (Merge)?
> > > The execution of the Gather node will be interleaved with the Insert
> > > node, basically, you fetch a tuple from Gather, and then you need to
> > > Insert it. Can you be a bit more specific on what you have in mind for
> > > this?
> > >
> >
> > One more thing I would like to add here is that we can't exit
> > parallel-mode till the workers are running (performing the scan or
> > other operation it is assigned with) and shared memory is not
> > destroyed. Otherwise, the leader can perform un-safe things like
> > assigning new commandsids or probably workers can send some error for
> > which the leader should still be in parallel-mode. So, considering
> > this I think we need quite similar checks (similar to parallel
> > inserts) to make even the Select part parallel for Inserts. If we do
> > that then you won't face many of the problems you mentioned above like
> > executing triggers that contain parallel-unsafe stuff. I feel still it
> > will be beneficial to do this as it will cover cases like Insert with
> > GatherMerge underneath it which would otherwise not possible.
> >
>
> Yes, I see what you mean, exiting parallel-mode can't be done safely
> where I had hoped it could, so looks like, even for making just the
> Select part of Insert parallel, I need to add checks (along the same
> lines as for Parallel Insert) to avoid the parallel Select in certain
> potentially-unsafe cases.
>

Right. After we take care of that, we can think of assigning the xid, or
things like that, before we enter parallel mode. Say we have a function
like PrepareParallelMode (or PrepareEnterParallelMode) or something
like that, where we can check whether we need to perform a
parallel-safe write operation (as of now, Insert) and then do the
required preparation, like assigning the xid, etc. I think this might
not be ideal, because it is possible that we don't fetch even a single
row that needs to be inserted (say due to a filter condition), in which
case we will waste an xid, but such cases might not occur often enough
to worry about.
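The PrepareParallelMode idea can be sketched with a toy model
(hypothetical code, not PostgreSQL source; PrepareParallelMode itself
does not exist yet and the names here are invented for illustration):
any operation that is unsafe inside parallel mode, such as xid
assignment for a pending Insert, is performed up front, before parallel
mode is entered.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy transaction state: just the two flags the sketch needs. */
typedef struct
{
	bool		in_parallel_mode;
	bool		xid_assigned;
} XactModel;

/* Assigning an xid is legal only outside parallel mode. */
static bool
assign_xid(XactModel *x)
{
	if (x->in_parallel_mode)
		return false;			/* would elog(ERROR) in the real code */
	x->xid_assigned = true;
	return true;
}

/*
 * Hypothetical PrepareParallelMode sketch: do the parallel-unsafe
 * preparation first, while it is still legal, and only then flip
 * into parallel mode.  If no row ever arrives to insert, the xid
 * is wasted, as noted above.
 */
static void
prepare_and_enter_parallel_mode(XactModel *x, bool will_perform_insert)
{
	if (will_perform_insert)
		assign_xid(x);
	x->in_parallel_mode = true;
}
```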

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Mon, Oct 12, 2020 at 6:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Mon, Oct 12, 2020 at 2:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Yeah, I think we should go ahead and fix that up front.  Here's a
> > version with a commit message.
>
> I've checked it and tested it, and it looks fine to me.
> Also, it seems to align with the gripe in the old comment about width
> ("XXX this is totally wrong: we should report zero if no RETURNING
> ...").
> I'm happy for you to commit it.

Pushed, though I left most of that comment there because the width
estimate still needs work when you do use RETURNING.  At least we now
have rows=0 for queries without RETURNING, which was a bigger problem
for your patch.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> It might be a good idea to first just get this patch committed, if
> possible. So, I have reviewed the latest version of this patch:
>
> 0001-InsertParallelSelect

I've attached an updated InsertParallelSelect patch (v2) - allowing
underlying parallel SELECT for "INSERT INTO ... SELECT ...".
I think I've addressed most of the issues identified in the previous
version of the patch.
I'm still seeing a couple of errors in the tests when
"force_parallel_mode=regress" is in effect, and those need to be
addressed (extra checks required to avoid parallel SELECT in certain
cases).
Also, I'm seeing a partition-related error when running
installcheck-world - I'm investigating that.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Oct 15, 2020 at 9:56 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > It might be a good idea to first just get this patch committed, if
> > possible. So, I have reviewed the latest version of this patch:
> >
> > 0001-InsertParallelSelect
>
> I've attached an updated InsertParallelSelect patch (v2) - allowing
> underlying parallel SELECT for "INSERT INTO ... SELECT ...".
> I think I've addressed most of the issues identified in the previous
> version of the patch.
> I'm still seeing a couple of errors in the tests when
> "force_parallel_mode=regress" is in effect, and those need to be
> addressed (extra checks required to avoid parallel SELECT in certain
> cases).
>

I am getting the below error with force_parallel_mode:
@@ -1087,9 +1087,14 @@
 ERROR:  value for domain inotnull violates check constraint "inotnull_check"
 create table dom_table (x inotnull);
 insert into dom_table values ('1');
+ERROR:  cannot start commands during a parallel operation
+CONTEXT:  SQL function "sql_is_distinct_from"

It happened with below test:

create function sql_is_distinct_from(anyelement, anyelement)
returns boolean language sql
as 'select $1 is distinct from $2 limit 1';

create domain inotnull int
  check (sql_is_distinct_from(value, null));

create table dom_table (x inotnull);
insert into dom_table values ('1');

So it is clear that this is happening because we have allowed an insert
that is parallel-unsafe. The attribute is of a domain type which has a
parallel-unsafe constraint. As per your current code, we need to
detect this in IsRelParallelModeSafeForModify. The idea would be to
check the type of each attribute, and if it is a domain type, check
whether it has a constraint (see function ExecGrant_Type for how to
detect a domain type, and then refer to functions
AlterTypeNamespaceInternal and AlterConstraintNamespaces for how to
determine the constraints of a domain type). Once you find a
constraint, you already have code in your patch to determine whether it
contains a parallel-unsafe expression.
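The shape of the suggested check can be shown with a standalone model
(illustrative only; the real code walks the catalogs via the functions
named above, while the struct and function names here are invented): a
target relation is parallel-safe for this purpose only if, for every
attribute of a domain type, every constraint expression attached to the
domain is itself parallel-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of one attribute's type, for the sketch only. */
typedef struct
{
	bool		is_domain;
	int			nconstraints;
	const bool *constraint_is_safe;		/* per-constraint verdicts */
} AttrTypeModel;

/*
 * Walk the attributes; any domain constraint that is parallel-unsafe
 * (e.g. the sql_is_distinct_from() case above) makes the whole
 * relation unsafe for parallel-mode Insert.
 */
static bool
rel_attrs_parallel_safe(const AttrTypeModel *attrs, int natts)
{
	for (int i = 0; i < natts; i++)
	{
		if (!attrs[i].is_domain)
			continue;
		for (int j = 0; j < attrs[i].nconstraints; j++)
		{
			if (!attrs[i].constraint_is_safe[j])
				return false;
		}
	}
	return true;
}
```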


> Also, I'm seeing a partition-related error when running
> installcheck-world - I'm investigating that.
>

Okay.

Few more comments:
==================
1.
+ /*
+ * Can't support execution of row-level or transition-table triggers
+ * during parallel-mode, since such triggers may query the table
+ * into which the data is being inserted, and the content returned
+ * would vary unpredictably according to the order of retrieval by
+ * the workers and the rows already inserted.
+ */
+ if (trigdesc != NULL &&
+ (trigdesc->trig_insert_instead_row ||
+   trigdesc->trig_insert_before_row ||
+   trigdesc->trig_insert_after_row ||
+   trigdesc->trig_insert_new_table))
+ {
+ return false;
+ }

I don't think it is a good idea to prohibit all before/after/instead
row triggers, because for the case you are referring to, the trigger
functions should be marked as parallel-unsafe. We might want to have an
Assert somewhere to detect illegal usage, but I don't see the need to
directly prohibit them.

2.
@@ -56,8 +57,8 @@ GetNewTransactionId(bool isSubXact)
  * Workers synchronize transaction state at the beginning of each parallel
  * operation, so we can't account for new XIDs after that point.
  */
- if (IsInParallelMode())
- elog(ERROR, "cannot assign TransactionIds during a parallel operation");
+ if (IsParallelWorker())
+ elog(ERROR, "cannot assign TransactionIds in a parallel worker");

  /*
  * During bootstrap initialization, we return the special bootstrap
diff --git a/src/backend/access/transam/xact.c
b/src/backend/access/transam/xact.c
index af6afce..ef423fb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -580,8 +580,8 @@ AssignTransactionId(TransactionState s)
  * Workers synchronize transaction state at the beginning of each parallel
  * operation, so we can't account for new XIDs at this point.
  */
- if (IsInParallelMode() || IsParallelWorker())
- elog(ERROR, "cannot assign XIDs during a parallel operation");
+ if (IsParallelWorker())
+ elog(ERROR, "cannot assign XIDs in a parallel worker");


I think we don't need these changes, at least for the purpose of this
patch, if you follow the suggestion in the email above [1] about having
a new function like PrepareParallelMode. One problem I see with
removing these checks is: how do we ensure that the leader won't
assign a new transactionid once we start executing a parallel node? It
could do so via sub-transactions; maybe that is already protected at
some earlier point, but I would like to see if we can retain these checks.

[1] - https://www.postgresql.org/message-id/CAA4eK1JogfXUa%3D3wMPO%2BK%3DUiOLgHgCO7-fj1wCHsSxdaXsfVbw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Oct 15, 2020 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 9:56 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > Also, I'm seeing a partition-related error when running
> > installcheck-world - I'm investigating that.
> >
>
> Okay.
>

The attached patch fixes this partition case for me. Basically, we
need to check the parallel-safety of PartitionKey. I have only checked
for partsupfunc but I think we need to check the parallel-safety of
partexprs as well. Also, I noticed that you have allowed
parallelism only when all expressions/functions involved with the
Insert are parallel-safe; can't we allow the parallel-restricted case,
since for this patch the Inserts have to be performed by the leader anyway?

-- 
With Regards,
Amit Kapila.

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 16, 2020 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Oct 15, 2020 at 9:56 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > Also, I'm seeing a partition-related error when running
> > > installcheck-world - I'm investigating that.
> > >
> >
> > Okay.
> >
>
> The attached patch fixes this partition case for me. Basically, we
> need to check the parallel-safety of PartitionKey. I have only checked
> for partsupfunc but I think we need to check the parallel-safety of
> partexprs as well.

Thanks, I had already added the parallel-safety check for partexprs
when I saw this, so your patch hopefully completes all the
partition-related checks required.

> Also, I noticed that you have allowed for
> parallelism only when all expressions/functions involved with Insert
> are parallel-safe, can't we allow parallel-restricted case because
> anyway Inserts have to be performed by the leader for this patch.
>

Yes, I think you're right.
"A parallel restricted operation is one which cannot be performed in a
parallel worker, but which can be performed in the leader while
parallel query is in use."
I'll make the change and test that everything works OK.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 16, 2020 at 2:16 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 16, 2020 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > Also, I noticed that you have allowed for
> > parallelism only when all expressions/functions involved with Insert
> > are parallel-safe, can't we allow parallel-restricted case because
> > anyway Inserts have to be performed by the leader for this patch.
> >
>
> Yes, I think you're right.
> "A parallel restricted operation is one which cannot be performed in a
> parallel worker, but which can be performed in the leader while
> parallel query is in use."
> I'll make the change and test that everything works OK.
>

Cool, let me try to explain my thoughts a bit more. The idea is first
(in standard_planner) we check if there is any 'parallel_unsafe'
function/expression (via max_parallel_hazard) in the query tree. If we
don't find anything 'parallel_unsafe' then we mark parallelModeOk. At
this stage, the patch is checking whether there is any
'parallel_unsafe' or 'parallel_restricted' expression/function in the
target relation and if there is none then we mark parallelModeOK as
true. So, if there is anything 'parallel_restricted' then we will mark
parallelModeOK as false which doesn't seem right to me.

Then later in the planner during set_rel_consider_parallel, we
determine if a particular relation can be scanned from within a
worker, then we consider that relation for parallelism. Here, we
determine if certain things are parallel-restricted then we don't
consider this for parallelism. Then we create partial paths for the
relations that are considered for parallelism. I think we don't need
to change anything for the current patch in these later stages because
we anyway are not considering Insert to be pushed into workers.
However, in the second patch where we are thinking to push Inserts in
workers, we might need to do something to filter parallel-restricted
cases during this stage of the planner.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 16, 2020 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> Cool, let me try to explain my thoughts a bit more. The idea is first
> (in standard_planner) we check if there is any 'parallel_unsafe'
> function/expression (via max_parallel_hazard) in the query tree. If we
> don't find anything 'parallel_unsafe' then we mark parallelModeOk. At
> this stage, the patch is checking whether there is any
> 'parallel_unsafe' or 'parallel_restricted' expression/function in the
> target relation and if there is none then we mark parallelModeOK as
> true. So, if there is anything 'parallel_restricted' then we will mark
> parallelModeOK as false which doesn't seem right to me.
>
> Then later in the planner during set_rel_consider_parallel, we
> determine if a particular relation can be scanned from within a
> worker, then we consider that relation for parallelism. Here, we
> determine if certain things are parallel-restricted then we don't
> consider this for parallelism. Then we create partial paths for the
> relations that are considered for parallelism. I think we don't need
> to change anything for the current patch in these later stages because
> we anyway are not considering Insert to be pushed into workers.
> However, in the second patch where we are thinking to push Inserts in
> workers, we might need to do something to filter parallel-restricted
> cases during this stage of the planner.
>

Posting an update to the smaller patch (Parallel SELECT for INSERT
INTO...SELECT...).

Most of this patch feeds into the larger Parallel INSERT patch, for
which I'll also be posting an update soon.

Patch updates include:
- Removed explicit trigger-type checks (instead rely on declared
trigger parallel safety)
- Restored parallel-related XID checks that previous patch altered;
now assign XID prior to entering parallel-mode
- Now considers parallel-SELECT for parallel RESTRICTED cases (not
just parallel SAFE cases)
- Added parallel-safety checks for partition key expressions and
support functions
- Workaround added for test failure in "partition-concurrent-attach"
test; since ALTER TABLE operations may exclusively lock a relation
until end-of-transaction, we now assume and return UNSAFE if a
share-lock can't be acquired on the relation, rather than blocking,
potentially until the end of the other concurrent transaction in
which the exclusive lock is held.
Examples of when a relation is exclusively locked
(AccessExclusiveLock) until end-of-transaction include:
    ALTER TABLE DROP COLUMN
    ALTER TABLE ... RENAME
    ALTER TABLE ... ATTACH PARTITION  (locks default partition)
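For instance, the problematic interleaving looks roughly like this (a hedged sketch; table names are illustrative):

```sql
-- Session 1: ATTACH PARTITION takes AccessExclusiveLock on the default
-- partition, held until the transaction ends.
BEGIN;
ALTER TABLE tpart ATTACH PARTITION tpart_2 FOR VALUES FROM (100) TO (200);
-- (transaction deliberately left open)

-- Session 2: the INSERT's parallel-safety check would block waiting for a
-- share-lock on the partition hierarchy; the workaround instead calls
-- ConditionalLockRelationOid() and assumes UNSAFE if the lock isn't
-- immediately available.
INSERT INTO tpart SELECT * FROM source_tbl;
```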

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 16, 2020 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> Cool, let me try to explain my thoughts a bit more. The idea is first
> (in standard_planner) we check if there is any 'parallel_unsafe'
> function/expression (via max_parallel_hazard) in the query tree. If we
> don't find anything 'parallel_unsafe' then we mark parallelModeOk. At
> this stage, the patch is checking whether there is any
> 'parallel_unsafe' or 'parallel_restricted' expression/function in the
> target relation and if there is none then we mark parallelModeOK as
> true. So, if there is anything 'parallel_restricted' then we will mark
> parallelModeOK as false which doesn't seem right to me.
>
> Then later in the planner during set_rel_consider_parallel, we
> determine if a particular relation can be scanned from within a
> worker, then we consider that relation for parallelism. Here, we
> determine if certain things are parallel-restricted then we don't
> consider this for parallelism. Then we create partial paths for the
> relations that are considered for parallelism. I think we don't need
> to change anything for the current patch in these later stages because
> we anyway are not considering Insert to be pushed into workers.
> However, in the second patch where we are thinking to push Inserts in
> workers, we might need to do something to filter parallel-restricted
> cases during this stage of the planner.
>

Posting an updated Parallel INSERT patch which (mostly) addresses
previously-identified issues and suggestions.

More work needs to be done in order to support parallel UPDATE and
DELETE (even after application of Thomas Munro's combo-cid
parallel-support patch), but it is getting closer.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Oct 22, 2020 at 9:47 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 16, 2020 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> Posting an update to the smaller patch (Parallel SELECT for INSERT
> INTO...SELECT...).
>
> Most of this patch feeds into the larger Parallel INSERT patch, for
> which I'll also be posting an update soon.
>
> Patch updates include:
> - Removed explicit trigger-type checks (instead rely on declared
> trigger parallel safety)
> - Restored parallel-related XID checks that previous patch altered;
> now assign XID prior to entering parallel-mode
> - Now considers parallel-SELECT for parallel RESTRICTED cases (not
> just parallel SAFE cases)
> - Added parallel-safety checks for partition key expressions and
> support functions
> - Workaround added for test failure in "partition-concurrent-attach"
> test;
>

IIUC, below is code for this workaround:

+MaxRelParallelHazardForModify(Oid relid,
+                              CmdType commandType,
+                              max_parallel_hazard_context *context)
+{
+    Relation    rel;
+    TupleDesc   tupdesc;
+    int         attnum;
+
+    LOCKMODE    lockmode = AccessShareLock;
+
+    /*
+     * It's possible that this relation is locked for exclusive access
+     * in another concurrent transaction (e.g. as a result of a
+     * ALTER TABLE ... operation) until that transaction completes.
+     * If a share-lock can't be acquired on it now, we have to assume this
+     * could be the worst-case, so to avoid blocking here until that
+     * transaction completes, conditionally try to acquire the lock and
+     * assume and return UNSAFE on failure.
+     */
+    if (ConditionalLockRelationOid(relid, lockmode))
+    {
+        rel = table_open(relid, NoLock);
+    }
+    else
+    {
+        context->max_hazard = PROPARALLEL_UNSAFE;
+        return context->max_hazard;
+    }

Do we need this workaround if we lock just the parent table instead of
locking all the tables? Basically, can we safely identify the
parallel-safety of partitioned relation if we just have a lock on
parent relation? One more thing I have noticed is that for scan
relations (Select query), we do such checks much later based on
RelOptInfo (see set_rel_consider_parallel) which seems to have most of
the information required to perform parallel-safety checks but I guess
for ModifyTable (aka the Insert table) the equivalent doesn't seem
feasible but have you thought of doing at the later stage in planner?

Few other comments on latest patch:
===============================
1.
MaxRelParallelHazardForModify()
{
..
+ if (commandType == CMD_INSERT || commandType == CMD_UPDATE)
+ {
+ /*
..

Why to check CMD_UPDATE here?

2.
+void PrepareParallelModeForModify(CmdType commandType, bool
isParallelModifyLeader)
+{
+ Assert(!IsInParallelMode());
+
+ if (isParallelModifyLeader)
+ (void)GetCurrentCommandId(true);
+
+ (void)GetCurrentFullTransactionId();

Here, we should use GetCurrentTransactionId() similar to heap_insert
or other heap operations. I am not sure why you have used
GetCurrentFullTransactionId?

3. Can we have a test to show why we need to check all the partitions
for parallel-safety? I think it would be possible when there is a
trigger on only one of the partitions and that trigger has
corresponding parallel_unsafe function. But it is good to verify that
once.

4. Have you checked the overhead of this on the planner for different
kinds of statements like inserts into tables having 100 or 500
partitions? Similarly, it is good to check the overhead of domain
related checks added in the patch.

5. Can we have a separate patch for parallel-selects for Insert? It
will make review easier.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Oct 27, 2020 at 8:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> IIUC, below is code for this workaround:
>
> +MaxRelParallelHazardForModify(Oid relid,
> + CmdType commandType,
> + max_parallel_hazard_context *context)
> +{
> + Relation        rel;
> + TupleDesc tupdesc;
> + int attnum;
> +
> + LOCKMODE lockmode = AccessShareLock;
> +
> + /*
> + * It's possible that this relation is locked for exclusive access
> + * in another concurrent transaction (e.g. as a result of a
> + * ALTER TABLE ... operation) until that transaction completes.
> + * If a share-lock can't be acquired on it now, we have to assume this
> + * could be the worst-case, so to avoid blocking here until that
> + * transaction completes, conditionally try to acquire the lock and
> + * assume and return UNSAFE on failure.
> + */
> + if (ConditionalLockRelationOid(relid, lockmode))
> + {
> + rel = table_open(relid, NoLock);
> + }
> + else
> + {
> + context->max_hazard = PROPARALLEL_UNSAFE;
> + return context->max_hazard;
> + }
>
> Do we need this workaround if we lock just the parent table instead of
> locking all the tables? Basically, can we safely identify the
> parallel-safety of partitioned relation if we just have a lock on
> parent relation?

I believe the workaround is still needed in this case, because the
workaround was added because of a test in which the parent table was
exclusively locked in another concurrent transaction (as a result of
ALTER TABLE ... ATTACH PARTITION ...) so we could not even get a
ShareLock on the parent table without hanging (and then ending up
failing the test because of it).
So at the moment the workaround is needed, even if just trying to lock
the parent table.
I'll do some more testing to determine the secondary issue of whether
locks on the partition tables are needed, but at the moment I believe
they are.

>One more thing I have noticed is that for scan
> relations (Select query), we do such checks much later based on
> RelOptInfo (see set_rel_consider_parallel) which seems to have most of
> the information required to perform parallel-safety checks but I guess
> for ModifyTable (aka the Insert table) the equivalent doesn't seem
> feasible but have you thought of doing at the later stage in planner?
>

Yes, and in fact I tried putting the checks in a later stage of the
planner, and it's almost successful, except it then makes setting
"parallelModeNeeded" very tricky indeed, because that is expected to
be set based on whether the SQL is safe to run in parallel mode
(parallelModeOK == true) and whether force_parallel_mode is not off.
With parallel safety checks delayed to a later stage in the planner,
it's then not known whether there are certain types of parallel-unsafe
INSERTs (such as INSERT INTO ... VALUES ... ON CONFLICT DO UPDATE
...), because processing for those doesn't reach those later stages of
the planner where parallelism is being considered. So then to avoid
errors from when parallel-mode is forced on and such unsafe INSERTs
are run, the only real choice is to only allow parallelModeNeeded to
be true for SELECT only (not INSERT), and this is kind of cheating and
also not picking up cases where parallel-safe INSERT is run but
invokes parallel-mode-unsafe features.
My conclusion, at least for the moment, is to leave the check where it is.


> Few other comments on latest patch:
> ===============================
> 1.
> MaxRelParallelHazardForModify()
> {
> ..
> + if (commandType == CMD_INSERT || commandType == CMD_UPDATE)
> + {
> + /*
> ..
>
> Why to check CMD_UPDATE here?
>

That was a bit of forward-thinking, for when/if UPDATE/DELETE is
supported in parallel-mode.
Column default expressions and check-constraints are only applicable
to INSERT and UPDATE.
Note however that currently this function can only ever be called with
commandType == CMD_INSERT.

> 2.
> +void PrepareParallelModeForModify(CmdType commandType, bool
> isParallelModifyLeader)
> +{
> + Assert(!IsInParallelMode());
> +
> + if (isParallelModifyLeader)
> + (void)GetCurrentCommandId(true);
> +
> + (void)GetCurrentFullTransactionId();
>
> Here, we should use GetCurrentTransactionId() similar to heap_insert
> or other heap operations. I am not sure why you have used
> GetCurrentFullTransactionId?
>

GetCurrentTransactionId() and GetCurrentFullTransactionId() actually
have the same functionality, just a different return value (which is
not being used here).
But anyway I've changed it to use GetCurrentTransactionId().


> 3. Can we have a test to show why we need to check all the partitions
> for parallel-safety? I think it would be possible when there is a
> trigger on only one of the partitions and that trigger has
> corresponding parallel_unsafe function. But it is good to verify that
> once.
>

I can't imagine how you could check parallel-safety properly without
checking all of the partitions.
We don't know which partition that data will get inserted into until
runtime (e.g. range/list partitioning).
Each partition can have its own column default expressions,
check-constraints, triggers etc. (which may or may not be
parallel-safe) and a partition may itself be a partitioned table.
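A minimal sketch of the kind of case in question (all names hypothetical): a parallel-unsafe trigger on just one partition should force the whole INSERT to be treated as unsafe, since any row might route to that partition at runtime:

```sql
CREATE TABLE ptbl (a int) PARTITION BY RANGE (a);
CREATE TABLE ptbl_p1 PARTITION OF ptbl FOR VALUES FROM (0) TO (100);
CREATE TABLE ptbl_p2 PARTITION OF ptbl FOR VALUES FROM (100) TO (200);

CREATE FUNCTION unsafe_trig_fn() RETURNS trigger
    LANGUAGE plpgsql PARALLEL UNSAFE
    AS $$ BEGIN RETURN NEW; END $$;

-- Trigger exists on only one partition; a parallel-safety check that
-- examined only the parent (or only ptbl_p1) would miss it.
CREATE TRIGGER unsafe_trig BEFORE INSERT ON ptbl_p2
    FOR EACH ROW EXECUTE FUNCTION unsafe_trig_fn();
```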


> 4. Have you checked the overhead of this on the planner for different
> kinds of statements like inserts into tables having 100 or 500
> partitions? Similarly, it is good to check the overhead of domain
> related checks added in the patch.
>

Checking that now and will post results soon.

> 5. Can we have a separate patch for parallel-selects for Insert? It
> will make review easier.
>

See attached patches.


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 30, 2020 at 6:09 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Oct 27, 2020 at 8:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > IIUC, below is code for this workaround:
> >
> > +MaxRelParallelHazardForModify(Oid relid,
> > + CmdType commandType,
> > + max_parallel_hazard_context *context)
> > +{
> > + Relation        rel;
> > + TupleDesc tupdesc;
> > + int attnum;
> > +
> > + LOCKMODE lockmode = AccessShareLock;
> > +
> > + /*
> > + * It's possible that this relation is locked for exclusive access
> > + * in another concurrent transaction (e.g. as a result of a
> > + * ALTER TABLE ... operation) until that transaction completes.
> > + * If a share-lock can't be acquired on it now, we have to assume this
> > + * could be the worst-case, so to avoid blocking here until that
> > + * transaction completes, conditionally try to acquire the lock and
> > + * assume and return UNSAFE on failure.
> > + */
> > + if (ConditionalLockRelationOid(relid, lockmode))
> > + {
> > + rel = table_open(relid, NoLock);
> > + }
> > + else
> > + {
> > + context->max_hazard = PROPARALLEL_UNSAFE;
> > + return context->max_hazard;
> > + }
> >
> > Do we need this workaround if we lock just the parent table instead of
> > locking all the tables? Basically, can we safely identify the
> > parallel-safety of partitioned relation if we just have a lock on
> > parent relation?
>
> I believe the workaround is still needed in this case, because the
> workaround was added because of a test in which the parent table was
> exclusively locked in another concurrent transaction (as a result of
> ALTER TABLE ... ATTACH PARTITION ...) so we could not even get a
> ShareLock on the parent table without hanging (and then ending up
> failing the test because of it).
>

Don't you think the test case design is flawed in that case? Because
even simple "select * from tpart;" will hang in planner while taking
share lock (the code flow is:
add_other_rels_to_query->expand_inherited_rtentry->expand_partitioned_rtentry)
once you take exclusive lock for a parallel session on the table.
Currently we never need to acquire any lock for Inserts in the planner
but not sure we can design a test case based on that assumption as we
can see it fails in this basic case.


> So at the moment the workaround is needed, even if just trying to lock
> the parent table.
>

I am not convinced, rather I think that the test case is not well
designed unless there is any other way (without taking share lock on
the relation) to determine parallel-safety of Inserts which neither of
us have thought of. I understand that you don't want to change that
test case as part of this patch so you are using this workaround.

> I'll do some more testing to determine the secondary issue of whether
> locks on the partition tables are needed, but at the moment I believe
> they are.
>

Fair enough but lets determine that by some testing and analysis. I
feel we should even add a comment if we require to lock all partition
tables. I see that we are already doing it for SELECT in the above
mentioned code path so maybe it is okay to do so for Inserts as well.

> >One more thing I have noticed is that for scan
> > relations (Select query), we do such checks much later based on
> > RelOptInfo (see set_rel_consider_parallel) which seems to have most of
> > the information required to perform parallel-safety checks but I guess
> > for ModifyTable (aka the Insert table) the equivalent doesn't seem
> > feasible but have you thought of doing at the later stage in planner?
> >
>
> Yes, and in fact I tried putting the checks in a later stage of the
> planner, and it's almost successful, except it then makes setting
> "parallelModeNeeded" very tricky indeed, because that is expected to
> be set based on whether the SQL is safe to run in parallel mode
> (parallelModeOK == true) and whether force_parallel_mode is not off.
> With parallel safety checks delayed to a later stage in the planner,
> it's then not known whether there are certain types of parallel-unsafe
> INSERTs (such as INSERT INTO ... VALUES ... ON CONFLICT DO UPDATE
> ...), because processing for those doesn't reach those later stages of
> the planner where parallelism is being considered.
>

I guess if that is the only case then you can have that check in the
earlier stage of planner (we should be able to do that as the
information is present in Query) and other checks in the later stage.
However, I guess that is not the only case, we need to determine
parallel-safety of index expressions, trigger functions if any, any
other CHECK expressions on each of attribute, etc.

> So then to avoid
> errors from when parallel-mode is forced on and such unsafe INSERTs
> are run, the only real choice is to only allow parallelModeNeeded to
> be true for SELECT only (not INSERT), and this is kind of cheating and
> also not picking up cases where parallel-safe INSERT is run but
> invokes parallel-mode-unsafe features.
> My conclusion, at least for the moment, is to leave the check where it is.
>

Okay, then can we integrate the functionality of
MaxParallelHazardForModify in max_parallel_hazard? Calling it
separately looks a bit awkward.

>
> > Few other comments on latest patch:
> > ===============================
> > 1.
> > MaxRelParallelHazardForModify()
> > {
> > ..
> > + if (commandType == CMD_INSERT || commandType == CMD_UPDATE)
> > + {
> > + /*
> > ..
> >
> > Why to check CMD_UPDATE here?
> >
>
> That was a bit of forward-thinking, for when/if UPDATE/DELETE is
> supported in parallel-mode.
> Column default expressions and check-constraints are only applicable
> to INSERT and UPDATE.
> Note however that currently this function can only ever be called with
> commandType == CMD_INSERT.
>

I feel then for other command types there should be an Assert rather
than try to handle something which is not yet implemented nor it is
clear what all is required for that. It confuses the reader, at least
it confused me. Probably we can write a comment but I don't think we
should have any check for Update at this stage of work.

> > 2.
> > +void PrepareParallelModeForModify(CmdType commandType, bool
> > isParallelModifyLeader)
> > +{
> > + Assert(!IsInParallelMode());
> > +
> > + if (isParallelModifyLeader)
> > + (void)GetCurrentCommandId(true);
> > +
> > + (void)GetCurrentFullTransactionId();
> >
> > Here, we should use GetCurrentTransactionId() similar to heap_insert
> > or other heap operations. I am not sure why you have used
> > GetCurrentFullTransactionId?
> >
>
> GetCurrentTransactionId() and GetCurrentFullTransactionId() actually
> have the same functionality, just a different return value (which is
> not being used here).
>

Sure but lets use what is required.

> But anyway I've changed it to use GetCurrentTransactionId().
>

But comments in ExecutePlan and PrepareParallelModeForModify still
refer to FullTransactionId.

>
>
> > 4. Have you checked the overhead of this on the planner for different
> > kinds of statements like inserts into tables having 100 or 500
> > partitions? Similarly, it is good to check the overhead of domain
> > related checks added in the patch.
> >
>
> Checking that now and will post results soon.
>

Thanks.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 30, 2020 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > So then to avoid
> > errors from when parallel-mode is forced on and such unsafe INSERTs
> > are run, the only real choice is to only allow parallelModeNeeded to
> > be true for SELECT only (not INSERT), and this is kind of cheating and
> > also not picking up cases where parallel-safe INSERT is run but
> > invokes parallel-mode-unsafe features.
> > My conclusion, at least for the moment, is to leave the check where it is.
> >
>
> Okay, then can we integrate the functionality of
> MaxParallelHazardForModify in max_parallel_hazard? Calling it
> separately looks a bit awkward.
>

Looking into that.

> >
> > > Few other comments on latest patch:
> > > ===============================
> > > 1.
> > > MaxRelParallelHazardForModify()
> > > {
> > > ..
> > > + if (commandType == CMD_INSERT || commandType == CMD_UPDATE)
> > > + {
> > > + /*
> > > ..
> > >
> > > Why to check CMD_UPDATE here?
> > >
> >
> > That was a bit of forward-thinking, for when/if UPDATE/DELETE is
> > supported in parallel-mode.
> > Column default expressions and check-constraints are only applicable
> > to INSERT and UPDATE.
> > Note however that currently this function can only ever be called with
> > commandType == CMD_INSERT.
> >
>
> I feel then for other command types there should be an Assert rather
> than try to handle something which is not yet implemented nor it is
> clear what all is required for that. It confuses the reader, at least
> it confused me. Probably we can write a comment but I don't think we
> should have any check for Update at this stage of work.
>

OK, for now I'll restrict the checks to INSERT, but I'll add comments
to assist with potential future UPDATE support.

> > > 2.
> > > +void PrepareParallelModeForModify(CmdType commandType, bool
> > > isParallelModifyLeader)
> > > +{
> > > + Assert(!IsInParallelMode());
> > > +
> > > + if (isParallelModifyLeader)
> > > + (void)GetCurrentCommandId(true);
> > > +
> > > + (void)GetCurrentFullTransactionId();
> > >
> > > Here, we should use GetCurrentTransactionId() similar to heap_insert
> > > or other heap operations. I am not sure why you have used
> > > GetCurrentFullTransactionId?
> > >
> >
> > GetCurrentTransactionId() and GetCurrentFullTransactionId() actually
> > have the same functionality, just a different return value (which is
> > not being used here).
> >
>
> Sure but lets use what is required.
>
> > But anyway I've changed it to use GetCurrentTransactionId().
> >
>
> But comments in ExecutePlan and PrepareParallelModeForModify still
> refer to FullTransactionId.
>

I believe those comments are technically correct.
GetCurrentTransactionId() calls AssignTransactionId() to do all the
work - and the comment for that function says "Assigns a new permanent
FullTransactionId to the given TransactionState".

> >
> >
> > > 4. Have you checked the overhead of this on the planner for different
> > > kinds of statements like inserts into tables having 100 or 500
> > > partitions? Similarly, it is good to check the overhead of domain
> > > related checks added in the patch.
> > >
> >
> > Checking that now and will post results soon.
> >
>

I am seeing a fair bit of overhead in the planning for the INSERT
parallel-safety checks (mind you, compared to the overall performance
gain, it's not too bad).
Some representative timings for a parallel INSERT of a million rows
into 100, 250 and 500 partitions are shown below.

(1) Without patch

# Partitions    Planning Time (ms)    Execution Time (ms)
100             1.014                 4176.435
250             0.404                 3842.414
500             0.529                 4440.633


(2) With Parallel INSERT patch

# Partitions    Planning Time (ms)    Execution Time (ms)
100             11.420                2131.148
250             23.269                3472.259
500             36.531                3238.868

I'm looking into how this can be improved by better integration into
the current code, and addressing locking concerns that you've
previously mentioned.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:

>
> See attached patches.
>

Thanks for providing the patches.
I have reviewed v6-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch; please find my comments:
-> commandType is not used, we can remove it.
+ * Prepare for entering parallel mode by assigning a FullTransactionId, to be
+ * included in the transaction state that is serialized in the parallel DSM.
+ */
+void PrepareParallelModeForModify(CmdType commandType)
+{
+       Assert(!IsInParallelMode());
+
+       (void)GetCurrentTransactionId();
+}

-> As we support insertion of data from the workers, this comment "but as of now, only the leader backend writes into a completely new table.  In the future, we can extend it to allow workers to write into the table" must be updated accordingly:
+        * modify any data using a CTE, or if this is a cursor operation, or if
+        * GUCs are set to values that don't permit parallelism, or if
+        * parallel-unsafe functions are present in the query tree.
         *
-        * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
+        * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and CREATE
         * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
         * backend writes into a completely new table.  In the future, we can
         * extend it to allow workers to write into the table.  However, to allow

-> Also should we specify insert as "insert into select"
 
-> We could include a small writeup of the design may be in the commit message. It will be useful for review.

-> I felt the below two assignment statements can be in the else condition:
                glob->maxParallelHazard = max_parallel_hazard(parse);
                glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+               /*
+                * Additional parallel-mode safety checks are required in order to
+                * allow an underlying parallel query to be used for a
+                * table-modification command that is supported in parallel-mode.
+                */
+               if (glob->parallelModeOK &&
+                       IsModifySupportedInParallelMode(parse->commandType))
+               {
+                       glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
+                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+               }

something like:
/*
* Additional parallel-mode safety checks are required in order to
* allow an underlying parallel query to be used for a
* table-modification command that is supported in parallel-mode.
*/
if (glob->parallelModeOK &&
IsModifySupportedInParallelMode(parse->commandType))
glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
else
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);

-> Comments need slight adjustment, maybe you could run pgindent for the modified code.
+               /*
+                * Supported table-modification commands may require additional steps
+                * prior to entering parallel mode, such as assigning a FullTransactionId.
+                */

-> In the below, max_parallel_hazard_test will return true for PROPARALLEL_RESTRICTED also. Is the break intentional in that case? For RI_TRIGGER_FK with PROPARALLEL_RESTRICTED we continue.
+               if (max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
+                       break;
+
+               /*
+                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+                * the relation, and this would result in creation of new CommandIds
+                * on insert/update/delete and this isn't supported in a parallel
+                * worker (but is safe in the parallel leader).
+                */
+               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+               if (trigtype == RI_TRIGGER_FK)
+               {
+                       context->max_hazard = PROPARALLEL_RESTRICTED;
+                       /*
+                        * As we're looking for the max parallel hazard, we don't break
+                        * here; examine any further triggers ...
+                        */
+               }

-> Should we switch to non-parallel mode in this case, instead of throwing error?
+                       val = SysCacheGetAttr(CONSTROID, tup,
+                                               Anum_pg_constraint_conbin, &isnull);
+                       if (isnull)
+                               elog(ERROR, "null conbin for constraint %u", con->oid);
+                       conbin = TextDatumGetCString(val);

-> We could include a few tests for this in regression.

-> We might need some documentation update like in parallel-query.html/parallel-plans.html, etc

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Hi Vignesh,

Thanks for reviewing the patches.

On Tue, Nov 3, 2020 at 5:25 PM vignesh C <vignesh21@gmail.com> wrote:
>
> -> commandType is not used, we can remove it.
> + * Prepare for entering parallel mode by assigning a FullTransactionId, to be
> + * included in the transaction state that is serialized in the parallel DSM.
> + */
> +void PrepareParallelModeForModify(CmdType commandType)
> +{
> +       Assert(!IsInParallelMode());
> +
> +       (void)GetCurrentTransactionId();
> +}

Thanks, at least for INSERT, it's not needed, so I'll remove it.

>
> -> As we support insertion of data from the workers, this comments "but as of now, only the leader backend writes
> into a completely new table.  In the future, we can extend it to allow workers to write into the table" must be
> updated accordingly:
> +        * modify any data using a CTE, or if this is a cursor operation, or if
> +        * GUCs are set to values that don't permit parallelism, or if
> +        * parallel-unsafe functions are present in the query tree.
>          *
> -        * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
> +        * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and CREATE
>          * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
>          * backend writes into a completely new table.  In the future, we can
>          * extend it to allow workers to write into the table.  However, to allow
>
> -> Also should we specify insert as "insert into select"
>

I'll update it, appropriate to each patch.

> -> We could include a small writeup of the design may be in the commit message. It will be useful for review.
>

Will do so for the next patch version.

> -> I felt the below two assignment statements can be in the else condition:
>                 glob->maxParallelHazard = max_parallel_hazard(parse);
>                 glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +
> +               /*
> +                * Additional parallel-mode safety checks are required in order to
> +                * allow an underlying parallel query to be used for a
> +                * table-modification command that is supported in parallel-mode.
> +                */
> +               if (glob->parallelModeOK &&
> +                       IsModifySupportedInParallelMode(parse->commandType))
> +               {
> +                       glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
> +                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +               }
>
> something like:
> /*
> * Additional parallel-mode safety checks are required in order to
> * allow an underlying parallel query to be used for a
> * table-modification command that is supported in parallel-mode.
> */
> if (glob->parallelModeOK &&
> IsModifySupportedInParallelMode(parse->commandType))
> glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
> else
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
>

That won't work. As the comment is trying to point out, additional
parallel-safety checks (i.e. in addition to those done by
max_parallel_hazard()) are required to determine if INSERT can be
safely run in parallel-mode with an underlying parallel query.
Also, the max_parallel_hazard found from first calling
max_parallel_hazard() then needs to be fed into
MaxParallelHazardForModify(), in case it finds a worse parallel
hazard.
For example, max_parallel_hazard() may find something parallel
RESTRICTED, but then the additional parallel-safety checks done by
MaxParallelHazardForModify() find something parallel UNSAFE.

> -> Comments need slight adjustment, maybe you could run pgindent for the modified code.
> +               /*
> +                * Supported table-modification commands may require additional steps
> +                * prior to entering parallel mode, such as assigning a FullTransactionId.
> +                */
>

OK, will run pgindent.

> -> In the below, max_parallel_hazard_test will return true for PROPARALLEL_RESTRICTED also, Is break intentional in
> that case? As in case of RI_TRIGGER_FK for PROPARALLEL_RESTRICTED we continue.
>
> +               if (max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
> +                       break;
> +
> +               /*
> +                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> +                * the relation, and this would result in creation of new CommandIds
> +                * on insert/update/delete and this isn't supported in a parallel
> +                * worker (but is safe in the parallel leader).
> +                */
> +               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> +               if (trigtype == RI_TRIGGER_FK)
> +               {
> +                       context->max_hazard = PROPARALLEL_RESTRICTED;
> +                       /*
> +                        * As we're looking for the max parallel hazard, we don't break
> +                        * here; examine any further triggers ...
> +                        */
> +               }
>

max_parallel_hazard_test won't return true for PROPARALLEL_RESTRICTED.
max_parallel_hazard_test only returns true when
"context.max_interesting" is found, and that is set to
PROPARALLEL_UNSAFE in max_parallel_hazard_for_modify().

> -> Should we switch to non-parallel mode in this case, instead of throwing error?
> +                       val = SysCacheGetAttr(CONSTROID, tup,
> +                                               Anum_pg_constraint_conbin, &isnull);
> +                       if (isnull)
> +                               elog(ERROR, "null conbin for constraint %u", con->oid);
> +                       conbin = TextDatumGetCString(val);
>

I didn't invent that error check, it's found in several other places
in the Postgres code (that error should only ever occur if the
database has been corrupted or intentionally invalidated).
Having said that, I agree that perhaps it's best to switch to
non-parallel mode in this case, but this wouldn't stop it erroring out
when the plan is actually run.

> -> We could include a few tests for this in regression.
>

Looking at adding relevant test cases.

> -> We might need some documentation update like in parallel-query.html/parallel-plans.html, etc
>

Looking at doc updates.


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Nov 4, 2020 at 6:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Nov 3, 2020 at 5:25 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > -> commandType is not used, we can remove it.
> > + * Prepare for entering parallel mode by assigning a FullTransactionId, to be
> > + * included in the transaction state that is serialized in the parallel DSM.
> > + */
> > +void PrepareParallelModeForModify(CmdType commandType)
> > +{
> > +       Assert(!IsInParallelMode());
> > +
> > +       (void)GetCurrentTransactionId();
> > +}
>
> Thanks, at least for INSERT, it's not needed, so I'll remove it.
>

Or you might want to consider moving the check related to
IsModifySupportedInParallelMode() inside
PrepareParallelModeForModify(). That way the code might look a bit
cleaner.


-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Nov 4, 2020 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Or you might want to consider moving the check related to
> IsModifySupportedInParallelMode() inside
> PrepareParallelModeForModify(). That way the code might look a bit
> cleaner.
>

Posting an updated Parallel SELECT for "INSERT INTO ... SELECT ..."
patch which addresses previously-identified issues and suggestions,
and adds some tests and doc updates.
I won't post an updated Parallel INSERT patch just yet (which just
builds on the 1st patch), because there's at least a couple of issues
in this 1st patch which need to be discussed first.

Firstly, in order to perform parallel-safety checks in the case of
partitions, the patch currently recursively locks/unlocks
(AccessShareLock) each partition during such checks (as each partition
may itself be a partitioned table). Is there a better way of
performing the parallel-safety checks and reducing the locking
requirements?

Secondly, I found that when running "make check-world", the
"partition-concurrent-attach" test fails, because it is expecting a
partition constraint to be violated on insert, while an "alter table
attach partition ..." is concurrently being executed in another
transaction. Because of the partition locking done by the patch's
parallel-safety checking code, the insert blocks on the exclusive lock
held by the "alter table" in the other transaction until the
transaction ends, so the insert ends up successfully completing (and
thus fails the test) when the other transaction ends. To overcome this
test failure, the patch code was updated to instead perform a
conditional lock on the partition, and on failure (i.e. because of an
exclusive lock held somewhere else), just assume it's parallel-unsafe
because the parallel-safety can't be determined without blocking on
the lock. This is not ideal, but I'm not sure of what other approach
could be used and I am somewhat reluctant to change that test. If
anybody is familiar with the "partition-concurrent-attach" test, any
ideas or insights would be appreciated.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Nov 13, 2020 at 8:14 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Or you might want to consider moving the check related to
> > IsModifySupportedInParallelMode() inside
> > PrepareParallelModeForModify(). That way the code might look a bit
> > cleaner.
> >
>
> Posting an updated Parallel SELECT for "INSERT INTO ... SELECT ..."
> patch which addresses previously-identified issues and suggestions,
> and adds some tests and doc updates.
> I won't post an updated Parallel INSERT patch just yet (which just
> builds on the 1st patch), because there's at least a couple of issues
> in this 1st patch which need to be discussed first.
>
> Firstly, in order to perform parallel-safety checks in the case of
> partitions, the patch currently recursively locks/unlocks
> (AccessShareLock) each partition during such checks (as each partition
> may itself be a partitioned table). Is there a better way of
> performing the parallel-safety checks and reducing the locking
> requirements?
>
> Secondly, I found that when running "make check-world", the
> "partition-concurrent-attach" test fails, because it is expecting a
> partition constraint to be violated on insert, while an "alter table
> attach partition ..." is concurrently being executed in another
> transaction. Because of the partition locking done by the patch's
> parallel-safety checking code, the insert blocks on the exclusive lock
> held by the "alter table" in the other transaction until the
> transaction ends, so the insert ends up successfully completing (and
> thus fails the test) when the other transaction ends. To overcome this
> test failure, the patch code was updated to instead perform a
> conditional lock on the partition, and on failure (i.e. because of an
> exclusive lock held somewhere else), just assume it's parallel-unsafe
> because the parallel-safety can't be determined without blocking on
> the lock. This is not ideal, but I'm not sure of what other approach
> could be used and I am somewhat reluctant to change that test. If
> anybody is familiar with the "partition-concurrent-attach" test, any
> ideas or insights would be appreciated.
>

Posting an updated set of patches, with some additional testing and
documentation updates, and including the latest version of the
Parallel Insert patch.
Any feedback appreciated, especially on the two points mentioned in
the previous post.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Nov 20, 2020 at 7:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches, with some additional testing and
> documentation updates, and including the latest version of the
> Parallel Insert patch.
> Any feedback appreciated, especially on the two points mentioned in
> the previous post.
>

Posting an updated set of patches, since a minor bug was found in the
1st patch that was causing a postgresql-cfbot build failure.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:
On Mon, Dec 7, 2020 at 2:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Nov 20, 2020 at 7:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > Posting an updated set of patches, with some additional testing and
> > documentation updates, and including the latest version of the
> > Parallel Insert patch.
> > Any feedback appreciated, especially on the two points mentioned in
> > the previous post.
> >
>
> Posting an updated set of patches, since a minor bug was found in the
> 1st patch that was causing a postgresql-cfbot build failure.
>

Most of the code present in
v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
applicable for parallel copy patch also. The patch in this thread
handles the check for PROPARALLEL_UNSAFE, we could slightly make it
generic by handling like the comments below, that way this parallel
safety checks can be used based on the value set in
max_parallel_hazard_context. There is nothing wrong with the changes,
I'm providing these comments so that this patch can be generalized for
parallel checks and the same can also be used by parallel copy.
Few comments:
1)
+               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+               if (trigtype == RI_TRIGGER_FK)
+               {
+                       context->max_hazard = PROPARALLEL_RESTRICTED;
+
+                       /*
+                        * As we're looking for the max parallel hazard, we don't break
+                        * here; examine any further triggers ...
+                        */
+               }

Can we change this to something like:
trigtype = RI_FKey_trigger_type(trigger->tgfoid);
if (trigtype == RI_TRIGGER_FK)
{
        if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
                break;
}

The line below is not required, as it will be taken care of by
max_parallel_hazard_test:
context->max_hazard = PROPARALLEL_RESTRICTED;

2)
+               /* Recursively check each partition ... */
+               pdesc = RelationGetPartitionDesc(rel);
+               for (i = 0; i < pdesc->nparts; i++)
+               {
+                       if (rel_max_parallel_hazard_for_modify(pdesc->oids[i],
+                                                              command_type,
+                                                              context,
+                                                              AccessShareLock) == PROPARALLEL_UNSAFE)
+                       {
+                               table_close(rel, lockmode);
+                               return context->max_hazard;
+                       }
+               }


Can we change this to something like:
/* Recursively check each partition ... */
pdesc = RelationGetPartitionDesc(rel);
for (i = 0; i < pdesc->nparts; i++)
{
        char max_hazard = rel_max_parallel_hazard_for_modify(pdesc->oids[i],
                                                             command_type,
                                                             context,
                                                             AccessShareLock);

        if (max_parallel_hazard_test(max_hazard, context))
        {
                table_close(rel, lockmode);
                return context->max_hazard;
        }
}

3)
Similarly for the below:
+       /*
+        * If there are any index expressions, check that they are parallel-mode
+        * safe.
+        */
+       if (index_expr_max_parallel_hazard_for_modify(rel, context) ==
+               PROPARALLEL_UNSAFE)
+       {
+               table_close(rel, lockmode);
+               return context->max_hazard;
+       }
+
+       /*
+        * If any triggers exist, check that they are parallel safe.
+        */
+       if (rel->trigdesc != NULL &&
+               trigger_max_parallel_hazard_for_modify(rel->trigdesc, context) ==
+               PROPARALLEL_UNSAFE)
+       {
+               table_close(rel, lockmode);
+               return context->max_hazard;
+       }


4) Similar change required for the below:
+                       /*
+                        * If the column is of a DOMAIN type, determine whether that
+                        * domain has any CHECK expressions that are not parallel-mode
+                        * safe.
+                        */
+                       if (get_typtype(att->atttypid) == TYPTYPE_DOMAIN)
+                       {
+                               if (domain_max_parallel_hazard_for_modify(att->atttypid, context) ==
+                                       PROPARALLEL_UNSAFE)
+                               {
+                                       table_close(rel, lockmode);
+                                       return context->max_hazard;
+                               }
+                       }

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
>
> Most of the code present in
> v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> applicable for parallel copy patch also. The patch in this thread
> handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> generic by handling like the comments below, that way this parallel
> safety checks can be used based on the value set in
> max_parallel_hazard_context. There is nothing wrong with the changes,
> I'm providing these comments so that this patch can be generalized for
> parallel checks and the same can also be used by parallel copy.

Hi Vignesh,

You are absolutely right in pointing that out, the code was taking
short-cuts knowing that for Parallel Insert,
"max_parallel_hazard_context.max_interesting" had been set to
PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
re-used by other callers.

I've attached a new set of patches that includes your suggested improvements.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>
> I've attached a new set of patches that includes your suggested improvements.

I was going through v10-0001 patch where we are parallelizing only the
select part.

+ /*
+ * UPDATE is not currently supported in parallel-mode, so prohibit
+ * INSERT...ON CONFLICT...DO UPDATE...
+ */
+ if (parse->onConflict != NULL && parse->onConflict->action == ONCONFLICT_UPDATE)
+ return PROPARALLEL_UNSAFE;

I understand that we can now allow updates from the worker, but what
is the problem if we allow the parallel select even if there is an
update in the leader?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > Most of the code present in
> > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > applicable for parallel copy patch also. The patch in this thread
> > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > generic by handling like the comments below, that way this parallel
> > > safety checks can be used based on the value set in
> > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > I'm providing these comments so that this patch can be generalized for
> > > parallel checks and the same can also be used by parallel copy.
> >
> > Hi Vignesh,
> >
> > You are absolutely right in pointing that out, the code was taking
> > short-cuts knowing that for Parallel Insert,
> > "max_parallel_hazard_context.max_interesting" had been set to
> > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > re-used by other callers.
> >
> > I've attached a new set of patches that includes your suggested improvements.
>
> I was going through v10-0001 patch where we are parallelizing only the
> select part.
>
> + /*
> + * UPDATE is not currently supported in parallel-mode, so prohibit
> + * INSERT...ON CONFLICT...DO UPDATE...
> + */
> + if (parse->onConflict != NULL && parse->onConflict->action ==
> ONCONFLICT_UPDATE)
> + return PROPARALLEL_UNSAFE;
>
> I understand that we can now allow updates from the worker, but what
> is the problem if we allow the parallel select even if there is an
> update in the leader?
>

I think we can't allow update even in leader without having a
mechanism for a shared combocid table. Right now, we share the
ComboCids at the beginning of the parallel query and then never change
it during the parallel query but if we allow updates in the leader
backend which can generate a combocid then we need a mechanism to
propagate that change. Does this make sense?

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > Most of the code present in
> > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > > applicable for parallel copy patch also. The patch in this thread
> > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > > generic by handling like the comments below, that way this parallel
> > > > safety checks can be used based on the value set in
> > > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > > I'm providing these comments so that this patch can be generalized for
> > > > parallel checks and the same can also be used by parallel copy.
> > >
> > > Hi Vignesh,
> > >
> > > You are absolutely right in pointing that out, the code was taking
> > > short-cuts knowing that for Parallel Insert,
> > > "max_parallel_hazard_context.max_interesting" had been set to
> > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > > re-used by other callers.
> > >
> > > I've attached a new set of patches that includes your suggested improvements.
> >
> > I was going through v10-0001 patch where we are parallelizing only the
> > select part.
> >
> > + /*
> > + * UPDATE is not currently supported in parallel-mode, so prohibit
> > + * INSERT...ON CONFLICT...DO UPDATE...
> > + */
> > + if (parse->onConflict != NULL && parse->onConflict->action ==
> > ONCONFLICT_UPDATE)
> > + return PROPARALLEL_UNSAFE;
> >
> > I understand that we can now allow updates from the worker, but what
> > is the problem if we allow the parallel select even if there is an
> > update in the leader?
> >
>
> I think we can't allow update even in leader without having a
> mechanism for a shared combocid table. Right now, we share the
> ComboCids at the beginning of the parallel query and then never change
> it during the parallel query but if we allow updates in the leader
> backend which can generate a combocid then we need a mechanism to
> propagate that change. Does this make sense?
>

Okay, got it.  Basically, ONCONFLICT_UPDATE might run inside some
transaction block, and there is a possibility that the update may try to
update the same tuple that was previously inserted by the same transaction,
and in that case it will generate a combo cid.  Thanks for
clarifying.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>

In v10-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO,
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2049,10 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
  * inserts in general except for the cases where inserts generate a new
  * CommandId (eg. inserts into a table having a foreign key column).
  */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));

I think I have given a comment long back that here we can have an
Assert to check if it is a parallel-worker and relation has a
foreign-key and probably other conditions if possible. It is better to
protect such cases from happening due to any bugs. Is there a reason
you have not handled it?

[1] - https://www.postgresql.org/message-id/CAA4eK1KyftVDgovvRQmdV1b%3DnN0R-KqdWZqiu7jZ1GYQ7SO9OA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Dec 9, 2020 at 4:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > >
> > > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> > > > >
> > > > > Most of the code present in
> > > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > > > applicable for parallel copy patch also. The patch in this thread
> > > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > > > generic by handling like the comments below, that way this parallel
> > > > > safety checks can be used based on the value set in
> > > > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > > > I'm providing these comments so that this patch can be generalized for
> > > > > parallel checks and the same can also be used by parallel copy.
> > > >
> > > > Hi Vignesh,
> > > >
> > > > You are absolutely right in pointing that out, the code was taking
> > > > short-cuts knowing that for Parallel Insert,
> > > > "max_parallel_hazard_context.max_interesting" had been set to
> > > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > > > re-used by other callers.
> > > >
> > > > I've attached a new set of patches that includes your suggested improvements.
> > >
> > > I was going through v10-0001 patch where we are parallelizing only the
> > > select part.
> > >
> > > + /*
> > > + * UPDATE is not currently supported in parallel-mode, so prohibit
> > > + * INSERT...ON CONFLICT...DO UPDATE...
> > > + */
> > > + if (parse->onConflict != NULL && parse->onConflict->action ==
> > > ONCONFLICT_UPDATE)
> > > + return PROPARALLEL_UNSAFE;
> > >
> > > I understand that we can now allow updates from the worker, but what
> > > is the problem if we allow the parallel select even if there is an
> > > update in the leader?
> > >
> >
> > I think we can't allow update even in leader without having a
> > mechanism for a shared combocid table. Right now, we share the
> > ComboCids at the beginning of the parallel query and then never change
> > it during the parallel query but if we allow updates in the leader
> > backend which can generate a combocid then we need a mechanism to
> > propagate that change. Does this make sense?
> >
>
> Okay, got it.  Basically, ONCONFLICT_UPDATE might run inside some
> transaction block and there is a possibility that update may try to
> update the same tuple is previously inserted by the same transaction
> and in that case, it will generate the combo cid.  Thanks for
> clarifying.
>

We can probably add a comment in the patch so that it is clear why we
are not allowing this case.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:


On Wed, 9 Dec 2020 at 5:41 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 9, 2020 at 4:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > >
> > > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> > > > >
> > > > > Most of the code present in
> > > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > > > applicable for parallel copy patch also. The patch in this thread
> > > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > > > generic by handling like the comments below, that way this parallel
> > > > > safety checks can be used based on the value set in
> > > > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > > > I'm providing these comments so that this patch can be generalized for
> > > > > parallel checks and the same can also be used by parallel copy.
> > > >
> > > > Hi Vignesh,
> > > >
> > > > You are absolutely right in pointing that out, the code was taking
> > > > short-cuts knowing that for Parallel Insert,
> > > > "max_parallel_hazard_context.max_interesting" had been set to
> > > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > > > re-used by other callers.
> > > >
> > > > I've attached a new set of patches that includes your suggested improvements.
> > >
> > > I was going through v10-0001 patch where we are parallelizing only the
> > > select part.
> > >
> > > + /*
> > > + * UPDATE is not currently supported in parallel-mode, so prohibit
> > > + * INSERT...ON CONFLICT...DO UPDATE...
> > > + */
> > > + if (parse->onConflict != NULL && parse->onConflict->action ==
> > > ONCONFLICT_UPDATE)
> > > + return PROPARALLEL_UNSAFE;
> > >
> > > I understand that we cannot allow updates from the worker, but what
> > > is the problem if we allow the parallel select even if there is an
> > > update in the leader?
> > >
> >
> > I think we can't allow update even in leader without having a
> > mechanism for a shared combocid table. Right now, we share the
> > ComboCids at the beginning of the parallel query and then never change
> > it during the parallel query but if we allow updates in the leader
> > backend which can generate a combocid then we need a mechanism to
> > propagate that change. Does this make sense?
> >
>
> Okay, got it.  Basically, ONCONFLICT_UPDATE might run inside some
> transaction block, and there is a possibility that the update may try to
> update the same tuple that was previously inserted by the same transaction,
> and in that case it will generate a combo CID.  Thanks for
> clarifying.
>

We can probably add a comment in the patch so that it is clear why we
are not allowing this case.

+1
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Greg Nancarrow <gregn4422@gmail.com>
> Firstly, in order to perform parallel-safety checks in the case of partitions, the
> patch currently recursively locks/unlocks
> (AccessShareLock) each partition during such checks (as each partition may
> itself be a partitioned table). Is there a better way of performing the
> parallel-safety checks and reducing the locking requirements?

First of all, as you demonstrated the planning time and execution time of parallel insert, I think the increased
planning time is negligible when parallel insert is intentionally used for loading a large amount of data.  However,
it's a problem if the overhead is imposed on OLTP transactions.  Does the overhead occur with the default values of
max_parallel_workers_per_gather = 2 and max_parallel_workers = 8?
 

To avoid this heavy checking during planning, I'm wondering if we can have an attribute in pg_class, something like
relhasindexes and relhastriggers.  The concerning point is that we have to maintain the accuracy of the value when
dropping ancillary objects around the table/partition.
 


> Secondly, I found that when running "make check-world", the
> "partition-concurrent-attach" test fails, because it is expecting a partition
> constraint to be violated on insert, while an "alter table attach partition ..." is
> concurrently being executed in another transaction. Because of the partition
> locking done by the patch's parallel-safety checking code, the insert blocks on
> the exclusive lock held by the "alter table" in the other transaction until the
> transaction ends, so the insert ends up successfully completing (and thus fails
> the test) when the other transaction ends. To overcome this test failure, the
> patch code was updated to instead perform a conditional lock on the partition,
> and on failure (i.e. because of an exclusive lock held somewhere else), just
> assume it's parallel-unsafe because the parallel-safety can't be determined
> without blocking on the lock. This is not ideal, but I'm not sure of what other
> approach could be used and I am somewhat reluctant to change that test. If
> anybody is familiar with the "partition-concurrent-attach" test, any ideas or
> insights would be appreciated.

That test looks sane.  I think what we should do is to disable parallel operation during that test.  It looks like some
of the other existing test cases disable parallel query by setting max_parallel_workers_per_gather to 0.  It's not strange
that some tests fail with some configuration.  autovacuum is disabled in many places of the regression test.
 

Rather, I don't think we should introduce the trick to use ConditionalLockAcquire().  Otherwise, the insert would be
executed in a serial fashion without the user knowing it -- "What?  The insert suddenly slowed down multiple times
today, and it didn't finish within the planned maintenance window.  What's wrong?"
 


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>
> I've attached a new set of patches that includes your suggested improvements.
>

Thanks for fixing and posting a new patch.
Few comments:
+                                       Node       *index_expr;
+
+                                       if (index_expr_item == NULL)
 /* shouldn't happen */
+                                               elog(ERROR, "too few
entries in indexprs list");
+
+                                       index_expr = (Node *)
lfirst(index_expr_item);

We can change this elog to below to maintain consistency:
if (index_expr_item == NULL)    /* shouldn't happen */
{
  context->max_hazard = PROPARALLEL_UNSAFE;
  return context->max_hazard;
}
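The generalized pattern being suggested here -- record the worst hazard seen so far, and stop walking once it is at least as severe as the caller's max_interesting level -- might be sketched standalone as (types and names invented for illustration; the real code works on expression trees via max_parallel_hazard_walker):

```c
#include <assert.h>
#include <stdbool.h>

/* ordered so that a larger value is a worse hazard */
typedef enum
{
    HAZARD_SAFE = 0,        /* cf. PROPARALLEL_SAFE */
    HAZARD_RESTRICTED = 1,  /* cf. PROPARALLEL_RESTRICTED */
    HAZARD_UNSAFE = 2       /* cf. PROPARALLEL_UNSAFE */
} Hazard;

typedef struct HazardContext
{
    Hazard max_hazard;      /* worst hazard found so far */
    Hazard max_interesting; /* caller stops caring beyond this level */
} HazardContext;

/* Record a newly found hazard; return true when the walk can stop early. */
static bool
max_hazard_found(HazardContext *ctx, Hazard h)
{
    if (h > ctx->max_hazard)
        ctx->max_hazard = h;
    return ctx->max_hazard >= ctx->max_interesting;
}

/* Walk a list of hazards, bailing out as soon as further checking
 * cannot change the answer. */
static Hazard
scan_hazards(HazardContext *ctx, const Hazard *items, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (max_hazard_found(ctx, items[i]))
            break;
    }
    return ctx->max_hazard;
}
```

With max_interesting set to HAZARD_UNSAFE (as for parallel insert), restricted items are recorded but do not stop the walk; a caller like parallel copy could pass a stricter level and reuse the same checks.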

static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
/*
* To allow parallel inserts, we need to ensure that they are safe to be
* performed in workers. We have the infrastructure to allow parallel
* inserts in general except for the cases where inserts generate a new
* CommandId (eg. inserts into a table having a foreign key column).
*/
I felt we could remove the above comments or maybe rephrase it.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Dec 10, 2020 at 3:50 PM vignesh C <vignesh21@gmail.com> wrote:
> Few comments:
> +                                       Node       *index_expr;
> +
> +                                       if (index_expr_item == NULL)
>  /* shouldn't happen */
> +                                               elog(ERROR, "too few
> entries in indexprs list");
> +
> +                                       index_expr = (Node *)
> lfirst(index_expr_item);
>
> We can change this elog to below to maintain consistency:
> if (index_expr_item == NULL)    /* shouldn't happen */
> {
>   context->max_hazard = PROPARALLEL_UNSAFE;
>   return context->max_hazard;
> }
>

Thanks. I think you pointed out something similar to this before, but
somehow I must have missed updating this as well (I was just following
existing error handling for this case in the Postgres code).
I'll update it as you suggest, in the next version of the patch I post.

> static HeapTuple
> heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
> CommandId cid, int options)
> {
> /*
> * To allow parallel inserts, we need to ensure that they are safe to be
> * performed in workers. We have the infrastructure to allow parallel
> * inserts in general except for the cases where inserts generate a new
> * CommandId (eg. inserts into a table having a foreign key column).
> */
> I felt we could remove the above comments or maybe rephrase it.
>

That is Amit's comment, and I'm reluctant to change it because it is
still applicable even after application of this patch.
Amit has previously suggested that I add an Assert here, to match the
comment (to replace the original Parallel-worker error-check that I
removed), so I am looking into that.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>
> I've attached a new set of patches that includes your suggested improvements.
>

 /*
+ * PrepareParallelMode
+ *
+ * Prepare for entering parallel mode, based on command-type.
+ */
+void
+PrepareParallelMode(CmdType commandType)
+{
+ Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
+
+ if (IsModifySupportedInParallelMode(commandType))
+ {
+ /*
+ * Prepare for entering parallel mode by assigning a
+ * FullTransactionId, to be included in the transaction state that is
+ * serialized in the parallel DSM.
+ */
+ (void) GetCurrentTransactionId();
+ }
+}

Why do we need to serialize the transaction ID for 0001?  I mean in
0001 we are just allowing the SELECT to be executed in parallel so why
we would need the transaction Id for that.  I agree that we would need
this once we try to perform the Insert also from the worker in the
remaining patches.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Dec 10, 2020 at 1:23 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Greg Nancarrow <gregn4422@gmail.com>
> > Firstly, in order to perform parallel-safety checks in the case of partitions, the
> > patch currently recursively locks/unlocks
> > (AccessShareLock) each partition during such checks (as each partition may
> > itself be a partitioned table). Is there a better way of performing the
> > parallel-safety checks and reducing the locking requirements?
>
> First of all, as you demonstrated the planning time and execution time of parallel insert, I think the increased
> planning time is negligible when parallel insert is intentionally used for loading a large amount of data.  However,
> it's a problem if the overhead is imposed on OLTP transactions.  Does the overhead occur with the default values of
> max_parallel_workers_per_gather = 2 and max_parallel_workers = 8?
>
> To avoid this heavy checking during planning, I'm wondering if we can have an attribute in pg_class, something like
> relhasindexes and relhastriggers.  The concerning point is that we have to maintain the accuracy of the value when
> dropping ancillary objects around the table/partition.
>

Having information in another table that needs to be accessed is
likely to also have locking requirements.
Here the issue is specifically with partitions, because otherwise if
the target relation is not a partitioned table, it will already be
locked prior to planning as part of the parse/re-write phase (and you
will notice that the initial lock-mode, used by the parallel-safety
checking code for opening the table, is NoLock).

>
> > Secondly, I found that when running "make check-world", the
> > "partition-concurrent-attach" test fails, because it is expecting a partition
> > constraint to be violated on insert, while an "alter table attach partition ..." is
> > concurrently being executed in another transaction. Because of the partition
> > locking done by the patch's parallel-safety checking code, the insert blocks on
> > the exclusive lock held by the "alter table" in the other transaction until the
> > transaction ends, so the insert ends up successfully completing (and thus fails
> > the test) when the other transaction ends. To overcome this test failure, the
> > patch code was updated to instead perform a conditional lock on the partition,
> > and on failure (i.e. because of an exclusive lock held somewhere else), just
> > assume it's parallel-unsafe because the parallel-safety can't be determined
> > without blocking on the lock. This is not ideal, but I'm not sure of what other
> > approach could be used and I am somewhat reluctant to change that test. If
> > anybody is familiar with the "partition-concurrent-attach" test, any ideas or
> > insights would be appreciated.
>
> That test looks sane.  I think what we should do is to disable parallel operation during that test.  It looks like
> some of the other existing test cases disable parallel query by setting max_parallel_workers_per_gather to 0.  It's
> not strange that some tests fail with some configuration.  autovacuum is disabled in many places of the regression test.
>
> Rather, I don't think we should introduce the trick to use ConditionalLockAcquire().  Otherwise, the insert would be
> executed in a serial fashion without the user knowing it -- "What?  The insert suddenly slowed down multiple times
> today, and it didn't finish within the planned maintenance window.  What's wrong?"
>
>

I think that's probably the best idea, to disable parallel operation
during that test.
However, that doesn't change the fact that, after removal of that
"trick", then the partition locking used in the parallel-safety
checking code will block, if a concurrent transaction has exclusively
locked that partition (as in this test case), and thus there is no
guarantee that a parallel insert will execute faster compared to
serial execution (as such locks tend to be held until the end of the
transaction).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Dec 10, 2020 at 5:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
>
>  /*
> + * PrepareParallelMode
> + *
> + * Prepare for entering parallel mode, based on command-type.
> + */
> +void
> +PrepareParallelMode(CmdType commandType)
> +{
> + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
> +
> + if (IsModifySupportedInParallelMode(commandType))
> + {
> + /*
> + * Prepare for entering parallel mode by assigning a
> + * FullTransactionId, to be included in the transaction state that is
> + * serialized in the parallel DSM.
> + */
> + (void) GetCurrentTransactionId();
> + }
> +}
>
> Why do we need to serialize the transaction ID for 0001?  I mean in
> 0001 we are just allowing the SELECT to be executed in parallel so why
> we would need the transaction Id for that.  I agree that we would need
> this once we try to perform the Insert also from the worker in the
> remaining patches.
>

There's a very good reason. It's related to parallel-mode checks for
Insert and how the XID is lazily acquired if required.
When allowing SELECT to be executed in parallel, we're in
parallel-mode and the leader interleaves Inserts with retrieval of the
tuple data from the workers.
You will notice that heap_insert() calls GetCurrentTransactionId() as the
very first thing it does. If the FullTransactionId is not valid,
AssignTransactionId() is then called, which executes this code:

    /*
     * Workers synchronize transaction state at the beginning of each parallel
     * operation, so we can't account for new XIDs at this point.
     */
    if (IsInParallelMode() || IsParallelWorker())
        elog(ERROR, "cannot assign XIDs during a parallel operation");

So that code (currently) has no way of knowing that an XID is being
(lazily) assigned at the beginning of, or somewhere in the middle of, a
parallel operation.
This is the reason why PrepareParallelMode() calls
GetCurrentTransactionId() up-front, to ensure a FullTransactionId is
assigned prior to parallel-mode (so there won't be an attempted
XID assignment).

If you remove the GetCurrentTransactionId() call from PrepareParallelMode()
and run "make installcheck-world" with "force_parallel_mode=regress"
in effect, many tests will fail with:
    ERROR:  cannot assign XIDs during a parallel operation
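The interaction described above might be sketched, in simplified standalone form (names invented for this sketch; the real logic is in AssignTransactionId() and the patch's PrepareParallelMode()): assignment is lazy, the lazy path refuses to run once parallel mode has begun, so the XID must be forced before entering parallel mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t FullTransactionId;
#define InvalidFullTransactionId 0

static FullTransactionId current_fxid = InvalidFullTransactionId;
static FullTransactionId next_fxid = 100;
static bool in_parallel_mode = false;

/* Lazily assign an XID, as AssignTransactionId() does.  Workers
 * synchronize transaction state once at startup, so a new XID cannot
 * be accounted for mid-operation -- hence the check. */
static FullTransactionId
get_current_transaction_id(void)
{
    if (current_fxid == InvalidFullTransactionId)
    {
        /* "cannot assign XIDs during a parallel operation" */
        assert(!in_parallel_mode);
        current_fxid = next_fxid++;
    }
    return current_fxid;
}

/* As in the patch's PrepareParallelMode(): force the assignment up
 * front so the later insert path finds the XID already valid. */
static void
prepare_parallel_mode(void)
{
    (void) get_current_transaction_id();
    in_parallel_mode = true;
}
```

Without the up-front call in prepare_parallel_mode(), the first insert performed after entering parallel mode would hit the assertion, which mirrors the regression failures mentioned above.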

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Thu, Dec 10, 2020 at 1:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 5:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >
> >  /*
> > + * PrepareParallelMode
> > + *
> > + * Prepare for entering parallel mode, based on command-type.
> > + */
> > +void
> > +PrepareParallelMode(CmdType commandType)
> > +{
> > + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
> > +
> > + if (IsModifySupportedInParallelMode(commandType))
> > + {
> > + /*
> > + * Prepare for entering parallel mode by assigning a
> > + * FullTransactionId, to be included in the transaction state that is
> > + * serialized in the parallel DSM.
> > + */
> > + (void) GetCurrentTransactionId();
> > + }
> > +}
> >
> > Why do we need to serialize the transaction ID for 0001?  I mean in
> > 0001 we are just allowing the SELECT to be executed in parallel so why
> > we would need the transaction Id for that.  I agree that we would need
> > this once we try to perform the Insert also from the worker in the
> > remaining patches.
> >
>
> There's a very good reason. It's related to parallel-mode checks for
> Insert and how the XID is lazily acquired if required.
> When allowing SELECT to be executed in parallel, we're in
> parallel-mode and the leader interleaves Inserts with retrieval of the
> tuple data from the workers.
> You will notice that heap_insert() calls GetCurrentTransactionId() as the
> very first thing it does. If the FullTransactionId is not valid,
> AssignTransactionId() is then called, which then executes this code:
>
>     /*
>      * Workers synchronize transaction state at the beginning of each parallel
>      * operation, so we can't account for new XIDs at this point.
>      */
>     if (IsInParallelMode() || IsParallelWorker())
>         elog(ERROR, "cannot assign XIDs during a parallel operation");
>
> So that code (currently) has no way of knowing that an XID is being
> (lazily) assigned at the beginning, or somewhere in the middle of, a
> parallel operation.
> This is the reason why PrepareParallelMode() is calling
> GetCurrentTransactionId() up-front, to ensure a FullTransactionId is assigned
> up-front, prior to parallel-mode (so then there won't be an attempted
> XID assignment).
>
> If you remove the GetCurrentTransactionId() call from PrepareParallelMode()
> and run "make installcheck-world" with "force_parallel_mode=regress"
> in effect, many tests will fail with:
>     ERROR:  cannot assign XIDs during a parallel operation

Yeah, got it. I missed the point that the goal is to avoid assigning
the transaction ID when we are in parallel mode.  But IIUC, at least
for the first patch we don't want to serialize the XID in the
transaction state, right, because the workers don't need the XID as
they are only doing the SELECT.  So maybe we can readjust the comment
slightly in the below code

> > + * Prepare for entering parallel mode by assigning a
> > + * FullTransactionId, to be included in the transaction state that is
> > + * serialized in the parallel DSM.
> > + */
> > + (void) GetCurrentTransactionId();

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Posting an updated set of patches to address recent feedback:

- Removed conditional-locking code used in parallel-safety checking
code (Tsunakawa-san feedback). It turns out that for the problem test
case, no parallel-safety checking should be occurring that locks
relations because those inserts are specifying VALUES, not an
underlying SELECT, so the parallel-safety checking code was updated to
bail out early if no underlying SELECT is specified for the INSERT. No
change to the test code was required.
- Added comment to better explain the reason for treating "INSERT ...
ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
- Added assertion to heap_prepare_insert() (Amit)
- Updated error handling for NULL index_expr_item case (Vignesh)


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi

> Posting an updated set of patches to address recent feedback:
> 
> - Removed conditional-locking code used in parallel-safety checking code
> (Tsunakawa-san feedback). It turns out that for the problem test case, no
> parallel-safety checking should be occurring that locks relations because
> those inserts are specifying VALUES, not an underlying SELECT, so the
> parallel-safety checking code was updated to bail out early if no underlying
> SELECT is specified for the INSERT. No change to the test code was required.
> - Added comment to better explain the reason for treating "INSERT ...
> ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
> - Added assertion to heap_prepare_insert() (Amit)
> - Updated error handling for NULL index_expr_item case (Vignesh)

+
+    index_oid_list = RelationGetIndexList(rel);
...

As mentioned in the comments of RelationGetIndexList:
* we return a copy of the list palloc'd in the caller's context.  The caller
* may list_free() the returned list after scanning it.

Shall we list_free(index_oid_list) at the end of the function?
Just to avoid a potential memory leak.

Best regards,
houzj




RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi

I have an issue about the parallel-safety checks.

If the target table is a foreign table or temporary table,
rel_max_parallel_hazard_for_modify will return PROPARALLEL_UNSAFE,
which not only disables parallel insert but also disables the underlying parallel SELECT.

+create temporary table temp_names (like names);
+explain (costs off) insert into temp_names select * from names;
+       QUERY PLAN        
+-------------------------
+ Insert on temp_names
+   ->  Seq Scan on names
+(2 rows)

I may be wrong, and if I missed something in previous mails, please give me some hints.
IMO, serial insertion with an underlying parallel SELECT can be considered for a foreign table or temporary table,
as the insertions only happen in the leader process.

Are there any special considerations for this case ?

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Dec 23, 2020 at 7:15 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> I have an issue about the parallel-safety checks.
>
> If the target table is a foreign table or temporary table,
> rel_max_parallel_hazard_for_modify will return PROPARALLEL_UNSAFE,
> which not only disables parallel insert but also disables the underlying parallel SELECT.
>
> +create temporary table temp_names (like names);
> +explain (costs off) insert into temp_names select * from names;
> +       QUERY PLAN
> +-------------------------
> + Insert on temp_names
> +   ->  Seq Scan on names
> +(2 rows)
>
> I may be wrong, and if I missed something in previous mails, please give me some hints.
> IMO, serial insertion with an underlying parallel SELECT can be considered for a foreign table or temporary table,
> as the insertions only happen in the leader process.
>

I don't think we support parallel scan for temporary tables. Can you
please try once both of these operations without Insert being
involved? If you are able to produce a parallel plan without Insert
then we can see why it is not supported with Insert.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi

> > I may be wrong, and if I missed something in previous mails, please give me some
> > hints.
> > IMO, serial insertion with an underlying parallel SELECT can be
> > considered for a foreign table or temporary table, as the insertions only
> > happen in the leader process.
> >
> 
> I don't think we support parallel scan for temporary tables. Can you please
> try once both of these operations without Insert being involved? If you
> are able to produce a parallel plan without Insert then we can see why it
> is not supported with Insert.

Sorry, maybe I did not express it clearly; I actually mean the case when the insert's target table (not the one in
the select part) is temporary.
And you are right that parallel select is not enabled when a temporary table is in the select part.

I tested both the case where the insert's target table is temporary and the case where it is not.
--insert into not temporary table---
postgres=# explain (costs off) insert into notemp select * from test where i < 600;
              QUERY PLAN               
---------------------------------------
 Gather
   Workers Planned: 4
   ->  Insert on notemp
         ->  Parallel Seq Scan on test
               Filter: (i < 600)

--insert into temporary table---
postgres=# explain (costs off) insert into temp select * from test where i < 600;
       QUERY PLAN         
---------------------------
 Insert on temp
   ->  Seq Scan on test
         Filter: (i < 600)

---without insert part---
postgres=# explain (costs off) select * from test where i < 600;
           QUERY PLAN            
---------------------------------
 Gather
   Workers Planned: 4
   ->  Parallel Seq Scan on test
         Filter: (i < 600)

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> > > I may be wrong, and if I missed something in previous mails, please give me some
> > > hints.
> > > IMO, serial insertion with an underlying parallel SELECT can be
> > > considered for a foreign table or temporary table, as the insertions only
> > > happen in the leader process.
> > >
> >
> > I don't think we support parallel scan for temporary tables. Can you please
> > try once both of these operations without Insert being involved? If you
> > are able to produce a parallel plan without Insert then we can see why it
> > is not supported with Insert.
>
> Sorry, maybe I did not express it clearly; I actually mean the case when the insert's target table (not the one in
> the select part) is temporary.
>
> And you are right that parallel select is not enabled when a temporary table is in the select part.
>

I think the Select can be parallel in this case, and we should support it.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Antonin Houska
Date:
Greg Nancarrow <gregn4422@gmail.com> wrote:

> Posting an updated set of patches to address recent feedback:

Following is my review.

v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
--------------------------------------------------------------

@@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
         /* all the cheap tests pass, so scan the query tree */
         glob->maxParallelHazard = max_parallel_hazard(parse);
         glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+        /*
+         * Additional parallel-mode safety checks are required in order to
+         * allow an underlying parallel query to be used for a
+         * table-modification command that is supported in parallel-mode.
+         */
+        if (glob->parallelModeOK &&
+            IsModifySupportedInParallelMode(parse->commandType))
+        {
+            glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
+            glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+        }

Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
functions should not be called by a parallel worker.


@@ -1015,6 +1016,27 @@ IsInParallelMode(void)
 }

 /*
+ *    PrepareParallelMode
+ *
+ * Prepare for entering parallel mode, based on command-type.
+ */
+void
+PrepareParallelMode(CmdType commandType)
+{
+    Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);

Isn't the test of force_parallel_mode just a hack to make regression tests
pass? When I removed this part and ran the regression tests with
force_parallel_mode=regress, the assertion fired when executing a subquery
because the executor was already in parallel mode due to the main query
execution. I think the function should be implemented such that it does not
mind repeated execution by the same backend.

As an alternative, have you considered allocation of the XID even in parallel
mode? I imagine that the first parallel worker that needs the XID for
insertions allocates it and shares it with the other workers as well as with
the leader process.

One problem of the current patch version is that the "INSERT INTO ... SELECT
..." statement consumes XID even if the SELECT eventually does not return any
row. However, if the same query is processed w/o parallelism, the XID is only
allocated if at least one tuple needs to be inserted.


v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
-------------------------------------------------------------------

@@ -1021,12 +1039,15 @@ IsInParallelMode(void)
  * Prepare for entering parallel mode, based on command-type.
  */
 void
-PrepareParallelMode(CmdType commandType)
+PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
 {
        Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);

        if (IsModifySupportedInParallelMode(commandType))
        {
+               if (isParallelModifyLeader)
+                       (void) GetCurrentCommandId(true);

I miss a comment here. I suppose this is to set currentCommandIdUsed, so that
the leader process gets a new commandId for the following statements in the
same transaction, and thus it can see the rows inserted by the parallel
workers?

If my understanding is correct, I think that the leader should not participate
in the execution of the Insert node, else it would use a higher commandId than
the workers. That would be weird, although probably not data corruption. I
wonder if parallel_leader_participation should be considered false for the
"Gather -> Insert -> ..." plans.


@@ -144,9 +148,19 @@ ExecGather(PlanState *pstate)
     GatherState *node = castNode(GatherState, pstate);
     TupleTableSlot *slot;
     ExprContext *econtext;
+    ModifyTableState *nodeModifyTableState = NULL;
+    bool        isParallelModifyLeader = false;
+    bool        isParallelModifyWithReturning = false;

The variable names are quite long. Since this code deals with the Gather node,
I think that both "Parallel" and "Leader" components can be removed.


@@ -418,14 +446,35 @@ ExecShutdownGatherWorkers(GatherState *node)
 void
 ExecShutdownGather(GatherState *node)
 {
-    ExecShutdownGatherWorkers(node);
+    bool        isParallelModifyLeader;

Likewise, the variable name.


@@ -208,7 +236,7 @@ ExecGather(PlanState *pstate)
         }

         /* Run plan locally if no workers or enabled and not single-copy. */
-        node->need_to_scan_locally = (node->nreaders == 0)
+        node->need_to_scan_locally = (node->nworkers_launched <= 0)
             || (!gather->single_copy && parallel_leader_participation);
         node->initialized = true;
     }

Is this change needed? The code just before this test indicates that nreaders
should be equal to nworkers_launched.


In grouping_planner(), this branch

+    /* Consider a supported parallel table-modification command */
+    if (IsModifySupportedInParallelMode(parse->commandType) &&
+        !inheritance_update &&
+        final_rel->consider_parallel &&
+        parse->rowMarks == NIL)
+    {

is very similar to creation of the non-parallel ModifyTablePaths - perhaps an
opportunity to move the common code into a new function.


@@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
         }
     }

+    if (parallel_modify_partial_path_count > 0)
+    {
+        final_rel->rows = current_rel->rows;    /* ??? why hasn't this been
+                                                 * set above somewhere ???? */
+        generate_useful_gather_paths(root, final_rel, false);
+    }
+
     extra.limit_needed = limit_needed(parse);
     extra.limit_tuples = limit_tuples;
     extra.count_est = count_est;

A boolean variable (e.g. have_parallel_modify_paths) would suffice, there's no
need to count the paths using parallel_modify_partial_path_count.


@@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
     PlannerGlobal *glob = root->glob;
     int            rtoffset = list_length(glob->finalrtable);
     ListCell   *lc;
+    Plan       *finalPlan;

     /*
      * Add all the query's RTEs to the flattened rangetable.  The live ones
@@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
     }

     /* Now fix the Plan tree */
-    return set_plan_refs(root, plan, rtoffset);
+    finalPlan = set_plan_refs(root, plan, rtoffset);
+    if (finalPlan != NULL && IsA(finalPlan, Gather))
+    {
+        Plan       *subplan = outerPlan(finalPlan);
+
+        if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
+        {
+            finalPlan->targetlist = copyObject(subplan->targetlist);
+        }
+    }
+    return finalPlan;
 }

I'm not sure if the problem of missing targetlist should be handled here (BTW,
NIL is the constant for an empty list, not NULL). Obviously this is a
consequence of the fact that the ModifyTable node has no regular targetlist.

Actually I don't quite understand why (in the current master branch) the
targetlist initialized in set_plan_refs()

    /*
     * Set up the visible plan targetlist as being the same as
     * the first RETURNING list. This is for the use of
     * EXPLAIN; the executor won't pay any attention to the
     * targetlist.  We postpone this step until here so that
     * we don't have to do set_returning_clause_references()
     * twice on identical targetlists.
     */
    splan->plan.targetlist = copyObject(linitial(newRL));

is not used. Instead, ExecInitModifyTable() picks the first returning list
again:

    /*
     * Initialize result tuple slot and assign its rowtype using the first
     * RETURNING list.  We assume the rest will look the same.
     */
    mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);

So if you set the targetlist in create_modifytable_plan() (according to
best_path->returningLists), or even in create_modifytable_path(), and ensure
that it gets propagated to the Gather node (generate_gather_paths currently
uses rel->reltarget), then you should no longer need to tweak
setrefs.c. Moreover, ExecInitModifyTable() would no longer need to set the
targetlist. However I don't guarantee that this is the best approach - some
planner expert should speak up.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
> v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
> --------------------------------------------------------------
>
> @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
>                 /* all the cheap tests pass, so scan the query tree */
>                 glob->maxParallelHazard = max_parallel_hazard(parse);
>                 glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +
> +               /*
> +                * Additional parallel-mode safety checks are required in order to
> +                * allow an underlying parallel query to be used for a
> +                * table-modification command that is supported in parallel-mode.
> +                */
> +               if (glob->parallelModeOK &&
> +                       IsModifySupportedInParallelMode(parse->commandType))
> +               {
> +                       glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
> +                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +               }
>
> Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
> functions should not be called by parallel worker.
>

What in the above change indicates that the parallel-restricted functions
will be allowed in parallel workers? This just sets parallelModeOK to allow
parallel plans for Selects if the Insert can be performed safely in a
leader backend.

>
> @@ -1015,6 +1016,27 @@ IsInParallelMode(void)
>  }
>
>  /*
> + *     PrepareParallelMode
> + *
> + * Prepare for entering parallel mode, based on command-type.
> + */
> +void
> +PrepareParallelMode(CmdType commandType)
> +{
> +       Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
>
> Isn't the test of force_parallel_mode just a hack to make regression tests
> pass? When I removed this part and ran the regression tests with
> force_parallel_mode=regress, the assertion fired when executing a subquery
> because the executor was already in parallel mode due to the main query
> execution.
>

I think this Assert is bogus. We are allowed to enter parallel-mode
even if we are already in parallel-mode, see EnterParallelMode. But we
shouldn't be allowed to allocate an xid in parallel-mode. So the
Assert(!IsInParallelMode()) should be moved inside the check if
(IsModifySupportedInParallelMode(commandType)) in this function. Can
you check if it still fails after such a modification?

> As an alternative, have you considered allocation of the XID even in parallel
> mode? I imagine that the first parallel worker that needs the XID for
> insertions allocates it and shares it with the other workers as well as with
> the leader process.
>

As far as this patch
(v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch) is
concerned, we never need workers to allocate xids because the Insert is
always performed by the leader backend. Even if we want to do what you are
suggesting, it would be tricky because currently we don't have the
infrastructure to pass such information among workers.

> One problem of the current patch version is that the "INSERT INTO ... SELECT
> ..." statement consumes XID even if the SELECT eventually does not return any
> row. However, if the same query is processed w/o parallelism, the XID is only
> allocated if at least one tuple needs to be inserted.
>

Yeah, that is true but I think this can happen w/o parallelism for
updates and deletes where by the time we try to modify the row, it got
modified by a concurrent session and the first session will needlessly
allocate XID.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Antonin Houska
Date:
Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
> >
> > Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > > Posting an updated set of patches to address recent feedback:
> >
> > Following is my review.
> >
> > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
> > --------------------------------------------------------------
> >
> > @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
> >                 /* all the cheap tests pass, so scan the query tree */
> >                 glob->maxParallelHazard = max_parallel_hazard(parse);
> >                 glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > +
> > +               /*
> > +                * Additional parallel-mode safety checks are required in order to
> > +                * allow an underlying parallel query to be used for a
> > +                * table-modification command that is supported in parallel-mode.
> > +                */
> > +               if (glob->parallelModeOK &&
> > +                       IsModifySupportedInParallelMode(parse->commandType))
> > +               {
> > +                       glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
> > +                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > +               }
> >
> > Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
> > functions should not be called by parallel worker.
> >
>
> What in the above change indicates that the parallel-restricted functions
> will be allowed in parallel workers? This just sets parallelModeOK to allow
> parallel plans for Selects if the Insert can be performed safely in a
> leader backend.

Well, this is just the initial setting, while the distinction between "Gather
-> Insert -> ..." and "Insert -> Gather -> ..." is made later. So I withdraw
my objection.

> >
> > @@ -1015,6 +1016,27 @@ IsInParallelMode(void)
> >  }
> >
> >  /*
> > + *     PrepareParallelMode
> > + *
> > + * Prepare for entering parallel mode, based on command-type.
> > + */
> > +void
> > +PrepareParallelMode(CmdType commandType)
> > +{
> > +       Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
> >
> > Isn't the test of force_parallel_mode just a hack to make regression tests
> > pass? When I removed this part and ran the regression tests with
> > force_parallel_mode=regress, the assertion fired when executing a subquery
> > because the executor was already in parallel mode due to the main query
> > execution.
> >
>
> I think this Assert is bogus. We are allowed to enter parallel-mode
> even if we are already in parallel-mode, see EnterParallelMode.

Right.

> But we shouldn't be allowed to allocate an xid in parallel-mode. So the
> Assert(!IsInParallelMode()) should be moved inside the check if
> (IsModifySupportedInParallelMode(commandType)) in this function. Can you
> check if it still fails after such a modification?

Yes, this works.


> > As an alternative, have you considered allocation of the XID even in parallel
> > mode? I imagine that the first parallel worker that needs the XID for
> > insertions allocates it and shares it with the other workers as well as with
> > the leader process.
> >
>
> As a matter of this patch
> (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we
> never need to allocate xids by workers because Insert is always
> performed by leader backend.

When writing this comment, I was actually thinking of
v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather
than v11-0001, see below. On the other hand, if we allowed XID allocation in
the parallel mode (as a separate patch), even the 0001 patch would get a bit
simpler.

> Even, if we want to do what you are suggesting it would be tricky because
> currently, we don't have such an infrastructure where we can pass
> information among workers.

How about barriers (storage/ipc/barrier.c)? What I imagine is that all the
workers "meet" at the barrier when they want to insert the first tuple. Then
one of them allocates the XID, makes it available to others (via shared
memory) and all the workers can continue.

> > One problem of the current patch version is that the "INSERT INTO ... SELECT
> > ..." statement consumes XID even if the SELECT eventually does not return any
> > row. However, if the same query is processed w/o parallelism, the XID is only
> > allocated if at least one tuple needs to be inserted.
> >

> Yeah, that is true but I think this can happen w/o parallelism for
> updates and deletes where by the time we try to modify the row, it got
> modified by a concurrent session and the first session will needlessly
> allocate XID.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
..
>
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
> -------------------------------------------------------------------
>
> @@ -1021,12 +1039,15 @@ IsInParallelMode(void)
>   * Prepare for entering parallel mode, based on command-type.
>   */
>  void
> -PrepareParallelMode(CmdType commandType)
> +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
>  {
>         Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
>
>         if (IsModifySupportedInParallelMode(commandType))
>         {
> +               if (isParallelModifyLeader)
> +                       (void) GetCurrentCommandId(true);
>
> I miss a comment here. I suppose this is to set currentCommandIdUsed, so that
> the leader process gets a new commandId for the following statements in the
> same transaction, and thus it can see the rows inserted by the parallel
> workers?
>

oh no, the leader backend and worker backends must use the same commandId.
I am also not sure if we need this, because for Insert statements we
already call GetCurrentCommandId(true) in standard_ExecutorStart. We
don't want the row-visibility behavior for parallel inserts to be any
different than for non-parallel ones.

> If my understanding is correct, I think that the leader should not participate
> in the execution of the Insert node, else it would use higher commandId than
> the workers. That would be weird, although probably not data corruption.
>

Yeah, exactly this is the reason both the leader and the workers must use
the same commandId.

> I
> wonder if parallel_leader_participation should be considered false for the
> "Gather -> Insert -> ..." plans.
>

If what I said above is correct then this is moot.

>
>
> @@ -208,7 +236,7 @@ ExecGather(PlanState *pstate)
>                 }
>
>                 /* Run plan locally if no workers or enabled and not single-copy. */
> -               node->need_to_scan_locally = (node->nreaders == 0)
> +               node->need_to_scan_locally = (node->nworkers_launched <= 0)
>                         || (!gather->single_copy && parallel_leader_participation);
>                 node->initialized = true;
>         }
>
> Is this change needed? The code just before this test indicates that nreaders
> should be equal to nworkers_launched.
>

This change is required because we don't need to set up readers for
parallel-insert unless there is a returning clause. See the below
check a few lines before this change:

- if (pcxt->nworkers_launched > 0)
+ if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader &&
!isParallelModifyWithReturning))
  {

I think this check could be simplified to
if (pcxt->nworkers_launched > 0 && isParallelModifyWithReturning)
or something like that.

>
> In grouping_planner(), this branch
>
> +       /* Consider a supported parallel table-modification command */
> +       if (IsModifySupportedInParallelMode(parse->commandType) &&
> +               !inheritance_update &&
> +               final_rel->consider_parallel &&
> +               parse->rowMarks == NIL)
> +       {
>
> is very similar to creation of the non-parallel ModifyTablePaths - perhaps an
> opportunity to move the common code into a new function.
>

+1.

>
> @@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
>                 }
>         }
>
> +       if (parallel_modify_partial_path_count > 0)
> +       {
> +               final_rel->rows = current_rel->rows;    /* ??? why hasn't this been
> +                                                         * set above somewhere ???? */
> +               generate_useful_gather_paths(root, final_rel, false);
> +       }
> +
>         extra.limit_needed = limit_needed(parse);
>         extra.limit_tuples = limit_tuples;
>         extra.count_est = count_est;
>
> A boolean variable (e.g. have_parallel_modify_paths) would suffice, there's no
> need to count the paths using parallel_modify_partial_path_count.
>

Sounds sensible.

>
> @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         PlannerGlobal *glob = root->glob;
>         int                     rtoffset = list_length(glob->finalrtable);
>         ListCell   *lc;
> +       Plan       *finalPlan;
>
>         /*
>          * Add all the query's RTEs to the flattened rangetable.  The live ones
> @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         }
>
>         /* Now fix the Plan tree */
> -       return set_plan_refs(root, plan, rtoffset);
> +       finalPlan = set_plan_refs(root, plan, rtoffset);
> +       if (finalPlan != NULL && IsA(finalPlan, Gather))
> +       {
> +               Plan       *subplan = outerPlan(finalPlan);
> +
> +               if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
> +               {
> +                       finalPlan->targetlist = copyObject(subplan->targetlist);
> +               }
> +       }
> +       return finalPlan;
>  }
>
> I'm not sure if the problem of missing targetlist should be handled here (BTW,
> NIL is the constant for an empty list, not NULL). Obviously this is a
> consequence of the fact that the ModifyTable node has no regular targetlist.
>

I think it is better to add comments along with this change. In this
form, this looks quite hacky to me.

> Actually I don't quite understand why (in the current master branch) the
> targetlist initialized in set_plan_refs()
>
>         /*
>          * Set up the visible plan targetlist as being the same as
>          * the first RETURNING list. This is for the use of
>          * EXPLAIN; the executor won't pay any attention to the
>          * targetlist.  We postpone this step until here so that
>          * we don't have to do set_returning_clause_references()
>          * twice on identical targetlists.
>          */
>         splan->plan.targetlist = copyObject(linitial(newRL));
>
> is not used. Instead, ExecInitModifyTable() picks the first returning list
> again:
>
>         /*
>          * Initialize result tuple slot and assign its rowtype using the first
>          * RETURNING list.  We assume the rest will look the same.
>          */
>         mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>
> So if you set the targetlist in create_modifytable_plan() (according to
> best_path->returningLists), or even in create_modifytable_path(), and ensure
> that it gets propagated to the Gather node (generate_gather_paths currently
> uses rel->reltarget), then you should no longer need to tweak
> setrefs.c.

This sounds worth investigating.

> Moreover, ExecInitModifyTable() would no longer need to set the
> targetlist.
>

I am not sure if we need to do anything about ExecInitModifyTable. If
we want to unify what setrefs.c does with ExecInitModifyTable, then we
can start a separate thread.

Thanks for all the reviews. I would like to emphasize what I said
earlier in this thread: it is better to first focus on parallelising
Selects for Insert (aka what
v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT does), as that
in itself is a step towards achieving parallel inserts. Doing both
0001 and 0003 at the same time could take much more time, as both touch
quite intricate parts of the code.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches to address recent feedback:
>
> - Removed conditional-locking code used in parallel-safety checking
> code (Tsunakawa-san feedback). It turns out that for the problem test
> case, no parallel-safety checking should be occurring that locks
> relations because those inserts are specifying VALUES, not an
> underlying SELECT, so the parallel-safety checking code was updated to
> bail out early if no underlying SELECT is specified for the INSERT. No
> change to the test code was required.
> - Added comment to better explain the reason for treating "INSERT ...
> ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
> - Added assertion to heap_prepare_insert() (Amit)
> - Updated error handling for NULL index_expr_item case (Vignesh)

Thanks Greg for fixing and posting a new patch.
A few comments:
+-- Test INSERT with underlying query.
+-- (should create plan with parallel SELECT, Gather parent node)
+--
+explain(costs off) insert into para_insert_p1 select unique1,
stringu1 from tenk1;
+               QUERY PLAN
+----------------------------------------
+ Insert on para_insert_p1
+   ->  Gather
+         Workers Planned: 4
+         ->  Parallel Seq Scan on tenk1
+(4 rows)
+
+insert into para_insert_p1 select unique1, stringu1 from tenk1;
+select count(*), sum(unique1) from para_insert_p1;
+ count |   sum
+-------+----------
+ 10000 | 49995000
+(1 row)
+

For one of the test you can validate that the same transaction has
been used by all the parallel workers, you could use something like
below to validate:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  para_insert_p1) as dt;

A few includes are not required:
 #include "executor/nodeGather.h"
+#include "executor/nodeModifyTable.h"
 #include "executor/nodeSubplan.h"
 #include "executor/tqueue.h"
 #include "miscadmin.h"
@@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
        GatherState *gatherstate;
        Plan       *outerNode;
        TupleDesc       tupDesc;
+       Index           varno;

These includes are not required in nodeModifyTable.c:

+#include "catalog/index.h"
+#include "catalog/indexing.h"

@@ -43,7 +49,11 @@
 #include "parser/parse_agg.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
+#include "parser/parsetree.h"
+#include "partitioning/partdesc.h"
+#include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"

The includes indexing.h, rewriteHandler.h & lmgr.h are not required in clauses.c

There are few typos:
+        table and populate it can use a parallel plan. Another
exeption is the command
+        <literal>INSERT INTO ... SELECT ...</literal> which can use a
parallel plan for
+        the underlying <literal>SELECT</literal> part of the query.

exeption should be exception

+ /*
+ * For the number of workers to use for a parallel
+ * INSERT/UPDATE/DELETE, it seems resonable to use the same number
+ * of workers as estimated for the underlying query.
+ */
+ parallelModifyWorkers = path->parallel_workers;
resonable should be reasonable

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Jan 8, 2021 at 12:21 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > As an alternative, have you considered allocation of the XID even in parallel
> > > mode? I imagine that the first parallel worker that needs the XID for
> > > insertions allocates it and shares it with the other workers as well as with
> > > the leader process.
> > >
> >
> > As a matter of this patch
> > (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we
> > never need to allocate xids by workers because Insert is always
> > performed by leader backend.
>
> When writing this comment, I was actually thinking of
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather
> than v11-0001, see below. On the other hand, if we allowed XID allocation in
> the parallel mode (as a separate patch), even the 0001 patch would get a bit
> simpler.
>
> > Even, if we want to do what you are suggesting it would be tricky because
> > currently, we don't have such an infrastructure where we can pass
> > information among workers.
>
> How about barriers (storage/ipc/barrier.c)? What I imagine is that all the
> workers "meet" at the barrier when they want to insert the first tuple. Then
> one of them allocates the XID, makes it available to others (via shared
> memory) and all the workers can continue.
>

Even if we want to do this, I am not sure we need barriers, because
there is no intrinsic need for all workers to stop before allocating the
XID. After allocation of the XID, we just need some way for the other
workers to use it, maybe something like the way all workers currently
synchronize to get the next block number to process in parallel sequence
scans. But the question is whether it is really worth it, because in
many cases the XID would already be allocated by the time the parallel
DML operation is started, and we do share it at the beginning. I think
if we really want to allow allocation of xids in parallel-mode then we
should also think about allowing it for subtransaction xids, not only
for main transactions, and that will open up some other use cases. I
feel it is better to tackle that problem separately.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
Hi Greg, Amit
Cc:hackers

> > > 4. Have you checked the overhead of this on the planner for 
> > > different kinds of statements like inserts into tables having 100 
> > > or 500 partitions? Similarly, it is good to check the overhead of 
> > > domain related checks added in the patch.
> > >
> >
> > Checking that now and will post results soon.
> >
>I am seeing a fair bit of overhead in the planning for the INSERT 
>parallel-safety checks (mind you, compared to the overall performance 
>gain, it's not too bad).

Considering the 'real-world' use cases and extreme cases I can imagine, I took 3 kinds of measurements on partitioned
tables for the latest patch (V11).

The measurements mainly focus on small rows, because that makes it easier to evaluate the check overhead relative to
the gain from the parallelism optimization.

From the current results, the overhead looks acceptable compared to the benefits, as Greg said.

Test 1: overhead of parallel insert into thousands of partitions, 1 row per partition.
%reg = (patched - master) / master
all time = Execution Time + Planning Time

partitions | patched Execution(ms) | patched Planning(ms) | master Execution(ms) | master Planning(ms) | %reg(Execution Time) | %reg(all time)
-----------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
1000       | 2281.291              | 25.983               | 9752.145             | 0.208               | -77%                 | -76%
2000       | 2303.229              | 50.427               | 9446.221             | 0.227               | -76%                 | -75%
4000       | 2303.207              | 100.946              | 9948.743             | 0.211               | -77%                 | -76%
6000       | 2411.877              | 152.212              | 9953.114             | 0.210               | -76%                 | -74%
10000      | 2467.235              | 260.751              | 10917.494            | 0.284               | -77%                 | -75%

Test 2: overhead of parallel insert into thousands of partitions, 100 rows per partition.

partitions | patched Execution(ms) | patched Planning(ms) | master Execution(ms) | master Planning(ms) | %reg(Execution Time) | %reg(all time)
-----------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
1000       | 2366.620              | 25.787               | 14052.748            | 0.238               | -83%                 | -83%
2000       | 2325.171              | 48.780               | 10099.203            | 0.211               | -77%                 | -76%
4000       | 2599.344              | 110.978              | 10678.065            | 0.216               | -76%                 | -75%
6000       | 2764.070              | 152.929              | 10880.948            | 0.238               | -75%                 | -73%
10000      | 3043.658              | 265.297              | 11607.202            | 0.207               | -74%                 | -71%

Test 3: overhead of parallel insert with varying numbers of partitions and inserted rows.

partitions | total table rows | patched Execution(ms) | patched Planning(ms) | master Execution(ms) | master Planning(ms) | %reg(Execution Time) | %reg(all time)
-----------|------------------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
100        | 10000000         | 11202.021             | 1.593                | 25668.560            | 0.212               | -56%                 | -56%
500        | 10000000         | 10290.368             | 12.722               | 25730.860            | 0.214               | -60%                 | -60%
1000       | 10000000         | 8946.627              | 24.851               | 26271.026            | 0.219               | -66%                 | -66%
2000       | 10000000         | 10615.643             | 50.111               | 25512.692            | 0.231               | -58%                 | -58%
4000       | 10000000         | 9056.334              | 105.644              | 26643.383            | 0.217               | -66%                 | -66%
-----------|------------------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
100        | 1000000          | 2757.670              | 1.493                | 11136.357            | 0.208               | -75%                 | -75%
500        | 1000000          | 2810.980              | 12.696               | 11483.715            | 0.206               | -76%                 | -75%
1000       | 1000000          | 2773.342              | 24.746               | 13441.169            | 0.214               | -79%                 | -79%
2000       | 1000000          | 2856.915              | 51.737               | 10996.621            | 0.226               | -74%                 | -74%
4000       | 1000000          | 2942.478              | 100.235              | 11422.699            | 0.220               | -74%                 | -73%
-----------|------------------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
100        | 100000           | 2257.134              | 1.682                | 9351.511             | 0.226               | -76%                 | -76%
500        | 100000           | 2197.570              | 12.452               | 9636.659             | 0.203               | -77%                 | -77%
1000       | 100000           | 2188.356              | 24.553               | 9647.583             | 0.202               | -77%                 | -77%
2000       | 100000           | 2293.287              | 49.167               | 9365.449             | 0.224               | -76%                 | -75%
4000       | 100000           | 2375.935              | 104.562              | 10125.190            | 0.219               | -77%                 | -76%
-----------|------------------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
100        | 10000            | 2142.086              | 1.506                | 9500.491             | 0.206               | -77%                 | -77%
500        | 10000            | 2147.779              | 12.260               | 11746.766            | 0.202               | -82%                 | -82%
1000       | 10000            | 2153.286              | 23.900               | 9298.452             | 0.212               | -77%                 | -77%
2000       | 10000            | 2303.170              | 52.844               | 9772.971             | 0.217               | -76%                 | -76%

However, as Amit and other hackers have noted, if we want to leave the overhead as it is, we should cover as many
real-world use cases as possible, in case we find the overhead can't be ignored (in which case we should consider
reducing it).
So if anyone has real-world use cases (which I didn't include in my results above) that need to be tested with this
patch, please share the details with me; I'd like to do more tests on it.
 

Regards,
Tang




Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches to address recent feedback:
>

Here is an additional review of
v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
quite a few comments raised on the V11-0001* patch. I suggest first
post a revised version of V11-0001* patch addressing those comments
and then you can separately post a revised version of
v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.

Few comments:
==============
1.
+char
+max_parallel_hazard_for_modify(Query *parse, const char
*initial_max_parallel_hazard)
{
..
+ return (rel_max_parallel_hazard_for_modify(rte->relid,
parse->commandType, &context, NoLock));
..
}

rel_max_parallel_hazard_for_modify()
{
..
+ rel = table_open(relid, lockmode);
..
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
..
+ /*
+ * Column default expressions for columns in the target-list are
+ * already being checked for parallel-safety in the
+ * max_parallel_hazard() scan of the query tree in standard_planner().
+ */
+
+ tupdesc = RelationGetDescr(rel);
}

Here, it seems we are accessing the relation descriptor without any
lock on the table which is dangerous considering the table can be
modified in a parallel session. Is there a reason why you think this
is safe? Did you check anywhere else such a coding pattern?

2.
+ /*
+ * If there are any index expressions, check that they are parallel-mode
+ * safe.
+ */
+ max_hazard = index_expr_max_parallel_hazard_for_modify(rel, context);
+ if (max_parallel_hazard_test(max_hazard, context))
+ {
+ table_close(rel, lockmode);
+ return context->max_hazard;
+ }

Here and at all other similar places, the call to
max_parallel_hazard_test seems redundant because
index_expr_max_parallel_hazard_for_modify would have already done
that. Why can't we just return true/false from
index_expr_max_parallel_hazard_for_modify? The context would have been
already updated for max_hazard.

3.
@@ -342,6 +343,18 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  /* all the cheap tests pass, so scan the query tree */
  glob->maxParallelHazard = max_parallel_hazard(parse);
  glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+ /*
+ * Additional parallel-mode safety checks are required in order to
+ * allow an underlying parallel query to be used for a
+ * table-modification command that is supported in parallel-mode.
+ */
+ if (glob->parallelModeOK &&
+ IsModifySupportedInParallelMode(parse->commandType))
+ {
+ glob->maxParallelHazard = max_parallel_hazard_for_modify(parse,
&glob->maxParallelHazard);
+ glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+ }
  }

I don't like this way of checking parallel_hazard for modify. This not
only duplicates some code in max_parallel_hazard_for_modify from
max_parallel_hazard but also appears quite awkward. Can we move
max_parallel_hazard_for_modify inside max_parallel_hazard? Basically,
after calling max_parallel_hazard_walker, we can check for modify
statement and call the new function.

4.
domain_max_parallel_hazard_for_modify()
{
..
+ if (isnull)
+ {
+ /*
+ * This shouldn't ever happen, but if it does, log a WARNING
+ * and return UNSAFE, rather than erroring out.
+ */
+ elog(WARNING, "null conbin for constraint %u", con->oid);
+ context->max_hazard = PROPARALLEL_UNSAFE;
+ break;
+ }
..
}
index_expr_max_parallel_hazard_for_modify()
{
..
+ if (index_expr_item == NULL) /* shouldn't happen */
+ {
+ index_close(index_rel, lockmode);
+ context->max_hazard = PROPARALLEL_UNSAFE;
+ return context->max_hazard;
+ }
..
}

It is not clear why the above two are shouldn't happen cases and if so
why we want to treat them as unsafe. Ideally, there should be an
Assert if these can't happen but it is difficult to decide without
knowing why you have considered them unsafe?

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Jan 14, 2021 at 2:37 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Greg, Amit
> Cc:hackers
>
> > > > 4. Have you checked the overhead of this on the planner for
> > > > different kinds of statements like inserts into tables having 100
> > > > or 500 partitions? Similarly, it is good to check the overhead of
> > > > domain related checks added in the patch.
> > > >
> > >
> > > Checking that now and will post results soon.
> > >
> >I am seeing a fair bit of overhead in the planning for the INSERT
> >parallel-safety checks (mind you, compared to the overall performance
> >gain, it's not too bad).
>
> Considering the 'real-world' use cases and extreme cases I can imagine, I took 3 kinds of measurements on a
> partitioned table for the latest patch (V11).
>
> The measurements mainly focus on small row counts, because that makes it easier to evaluate the check overhead
> relative to the gains from the parallelism optimization.
>
> From the current results, the overhead looks acceptable compared to the benefits, as Greg said.
>

Can we test cases where we have few rows in the SELECT table (say 1000)
and there are 500 or 1000 partitions? In that case, we won't select
parallelism, but we still have to pay the price of checking the
parallel-safety of all partitions. Can you check this with tables of
100, 200, 500, and 1000 partitions?

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
Hello Tang-san,

From: Amit Kapila <amit.kapila16@gmail.com>
> Can we test cases where we have few rows in the SELECT table (say 1000)
> and there are 500 or 1000 partitions? In that case, we won't select
> parallelism, but we still have to pay the price of checking the
> parallel-safety of all partitions. Can you check this with tables of
> 100, 200, 500, and 1000 partitions?

I also wanted to see such an extreme(?) case.  The 1,000 rows is not the count per partition but the total count
across all partitions; e.g., when the # of partitions is 100, the # of rows per partition is 10.
 



Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 8, 2021 at 8:25 PM vignesh C <vignesh21@gmail.com> wrote:
>

> For one of the test you can validate that the same transaction has
> been used by all the parallel workers, you could use something like
> below to validate:
> SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  para_insert_p1) as dt;
>

> Few includes are not required:
>  #include "executor/nodeGather.h"
>
> This include is not required in nodeModifyTable.c
>

>
> The includes indexing.h, rewriteHandler.h & lmgr.h is not required in clauses.c
>

> exeption should be exception
>
> resonable should be reasonable
>

Thanks Vignesh,
I'll be sure to make those updates in the next version of the patches.
Regards,
Greg



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Here is an additional review of
> v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
> quite a few comments raised on the V11-0001* patch. I suggest first
> post a revised version of V11-0001* patch addressing those comments
> and then you can separately post a revised version of
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.
>

1)

>Here, it seems we are accessing the relation descriptor without any
>lock on the table which is dangerous considering the table can be
>modified in a parallel session. Is there a reason why you think this
>is safe? Did you check anywhere else such a coding pattern?

Yes, there's a very good reason and I certainly have checked for the
same coding pattern elsewhere, and not just randomly decided that
locking can be ignored.
The table has ALREADY been locked (by the caller) during the
parse/analyze phase.
(This is not the case for a partition, in which case the patch code
uses AccessShareLock, as you will see).
And BTW, with asserts enabled, an attempt to table_open() with NoLock
when you haven't already locked the table will fire an assert - see
following code in relation_open():

    /*
     * If we didn't get the lock ourselves, assert that caller holds one,
     * except in bootstrap mode where no locks are used.
     */
    Assert(lockmode != NoLock ||
           IsBootstrapProcessingMode() ||
           CheckRelationLockedByMe(r, AccessShareLock, true));

2)

>+ /*
>+ * If there are any index expressions, check that they are parallel-mode
>+ * safe.
>+ */
>+ max_hazard = index_expr_max_parallel_hazard_for_modify(rel, context);
>+ if (max_parallel_hazard_test(max_hazard, context))
>+ {
>+ table_close(rel, lockmode);
>+ return context->max_hazard;
>+ }

>Here and at all other similar places, the call to
>max_parallel_hazard_test seems redundant because
>index_expr_max_parallel_hazard_for_modify would have already done
>that. Why can't we just return true/false from
>index_expr_max_parallel_hazard_for_modify? The context would have been
>already updated for max_hazard.

Yes, you're right, it's redundant to call max_parallel_hazard_test(_)
again. The max_hazard can always be retrieved from the context if
needed (rather than explicitly returned), so I'll change this function
(and any similar cases) to just return true if the max_hazard of
interest has been reached.

3)

>I don't like this way of checking parallel_hazard for modify. This not
>only duplicates some code in max_parallel_hazard_for_modify from
>max_parallel_hazard but also appears quite awkward. Can we move
>max_parallel_hazard_for_modify inside max_parallel_hazard? Basically,
>after calling max_parallel_hazard_walker, we can check for modify
>statement and call the new function.

Agree, I'll move it, as you suggest.

4)

>domain_max_parallel_hazard_for_modify()
>{
>..
>+ if (isnull)
>+ {
>+ /*
>+ * This shouldn't ever happen, but if it does, log a WARNING
>+ * and return UNSAFE, rather than erroring out.
>+ */
>+ elog(WARNING, "null conbin for constraint %u", con->oid);
>+ context->max_hazard = PROPARALLEL_UNSAFE;
>+ break;
>+ }
>..
>}
>index_expr_max_parallel_hazard_for_modify()
>{
>..
>+ if (index_expr_item == NULL) /* shouldn't happen */
>+ {
>+ index_close(index_rel, lockmode);
>+ context->max_hazard = PROPARALLEL_UNSAFE;
>+ return context->max_hazard;
>+ }
>..
>}

>It is not clear why the above two are shouldn't happen cases and if so
>why we want to treat them as unsafe. Ideally, there should be an
>Assert if these can't happen but it is difficult to decide without
>knowing why you have considered them unsafe?

The checks being done here for "should never happen" cases are THE
SAME as other parts of the Postgres code.
For example, search Postgres code for "null conbin" and you'll find 6
other places in the Postgres code which actually ERROR out if conbin
(binary representation of the constraint) in a pg_constraint tuple is
found to be null.
The cases that you point out in the patch used to also error out in
the same way, but Vignesh suggested changing them to just return
parallel-unsafe instead of erroring-out, which I agree with. Such
cases could surely only ever happen if the DB had been corrupted, so
they're extremely rare IMHO, and they would most likely have caused an
ERROR elsewhere before ever reaching here...
I can add some Asserts to the current code, to better alert for such
cases, for when asserts are enabled, but otherwise I think that the
current code is OK (cleaning up other Postgres code is beyond the
scope of the task here).


Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
> From: Amit Kapila <amit.kapila16@gmail.com>
> > Can we test cases where we have few rows in the SELECT table (say
> > 1000) and there are 500 or 1000 partitions? In that case, we won't
> > select parallelism, but we still have to pay the price of checking
> > the parallel-safety of all partitions. Can you check this with
> > tables of 100, 200, 500, and 1000 partitions?
> 
> I also wanted to see such an extreme(?) case.  The 1,000 rows is not
> the count per partition but the total count across all partitions;
> e.g., when the # of partitions is 100, the # of rows per partition is 10.

The results below are for a serial plan, where the SELECT table's total row count is 1,000. The Execution Time +
Planning Time is still less than unpatched.

(Does this patch make some optimizations for serial INSERT? I'm a little confused here, because the patched execution
time is less than unpatched, but I didn't find any information about this in the commit messages. If I missed
something, please kindly let me know.)
 
    
           |                patched                |                 master                 |                 %reg                  |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
partitions |Execution Time(ms)| Planning Time(ms)  | Execution Time(ms) | Planning Time(ms) | %reg(Execution Time)| %reg(all time)  |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
100        | 5.294            |  1.581             |  6.951             |  0.037            |   -24%              | -2%             |
200        | 9.666            |  3.068             |  13.681            |  0.043            |   -29%              | -7%             |
500        | 22.742           |  12.061            |  35.928            |  0.125            |   -37%              | -3%             |
1000       | 46.386           |  24.872            |  75.523            |  0.142            |   -39%              | -6%             |
 

I did another test which makes the check overhead obvious. This case doesn't fit the intended purpose of
partitioning, but I put it here as an extreme case too.

The SELECT table's total row count is 1,000, and the # of partitions is 2000. So only the first 1000 partitions have
1 row per partition; the last 1000 partitions have no data inserted.
 

           |                patched                |                 master                 |                 %reg                  |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
partitions |Execution Time(ms)| Planning Time(ms)  | Execution Time(ms) | Planning Time(ms) | %reg(Execution Time)| %reg(all time)  |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
2000       | 45.758           |  51.697            |  80.272            |  0.136            |   -43%              | 21%             |
 

Regards,
Tang



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> (Does this patch make some optimizations for serial INSERT? I'm a little confused
> here, because the patched execution time is less than unpatched, but I didn't
> find any information about this in the commit messages. If I missed something,
> please kindly let me know.)

I haven't thought of anything yet.  Could you show us the output of EXPLAIN (ANALYZE, BUFFERS, VERBOSE) of the 1,000
partitions case for the patched and unpatched versions?  If it doesn't show any difference, the output of perf may be
necessary next.

(BTW, were all the 1,000 rows stored in the target table?)


> I did another test which makes the check overhead obvious. This case doesn't fit
> the intended purpose of partitioning, but I put it here as an extreme case too.
> The SELECT table's total row count is 1,000, and the # of partitions is 2000. So
> only the first 1000 partitions have 1 row per partition; the last 1000 partitions
> have no data inserted.

Thank you, that's a good test.


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 1:02 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:
>
> > From: Amit Kapila <amit.kapila16@gmail.com>
> > > Can we test cases where we have few rows in the SELECT table (say
> > > 1000) and there are 500 or 1000 partitions? In that case, we won't
> > > select parallelism, but we still have to pay the price of checking
> > > the parallel-safety of all partitions. Can you check this with
> > > tables of 100, 200, 500, and 1000 partitions?
> >
> > I also wanted to see such an extreme(?) case.  The 1,000 rows is not
> > the count per partition but the total count across all partitions;
> > e.g., when the # of partitions is 100, the # of rows per partition is 10.
>
> The results below are for a serial plan, where the SELECT table's total row count is 1,000. The Execution Time +
> Planning Time is still less than unpatched.
>
> (Does this patch make some optimizations for serial INSERT? I'm a little confused here, because the patched
> execution time is less than unpatched, but I didn't find any information about this in the commit messages. If I
> missed something, please kindly let me know.)
 
>

I don't think the patch should have any impact on the serial case. I
think you can try to repeat each test 3 times both with and without a
patch and take the median of the three.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
Hi Tsunakawa-san

> From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> > (Does this patch make some optimizations for serial INSERT? I'm a little
> > confused here, because the patched execution time is less than
> > unpatched, but I didn't find any information about this in the commit messages.
> > If I missed something, please kindly let me know.)
> 
> I haven't thought of anything yet.  Could you show us the output of 
> EXPLAIN (ANALYZE, BUFFERS, VERBOSE) of 1,000 partitions case for the 
> patched and unpatched?  If it doesn't show any difference, the output 
> of perf may be necessary next.

Execute EXPLAIN on Patched:
postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=44.139..44.140 rows=0 loops=1)
   Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
   ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.007..0.201 rows=1000 loops=1)
         Output: test_data1.a, test_data1.b
         Buffers: shared hit=5
 Planning:
   Buffers: shared hit=27011
 Planning Time: 24.526 ms
 Execution Time: 44.981 ms

Execute EXPLAIN on non-Patched:
postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=72.656..72.657 rows=0 loops=1)
   Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
   ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.010..0.175 rows=1000 loops=1)
         Output: test_data1.a, test_data1.b
         Buffers: shared hit=5
 Planning:
   Buffers: shared hit=72
 Planning Time: 0.135 ms
 Execution Time: 79.058 ms

> (BTW, were all the 1,000 rows stored in the target table?)

Yes, I checked all rows stored in target table.
postgres=# select count(*) from test_part;  count
-------
  1000

Regards,
Tang



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 10:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Here is an additional review of
> > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
> > quite a few comments raised on the V11-0001* patch. I suggest first
> > post a revised version of V11-0001* patch addressing those comments
> > and then you can separately post a revised version of
> > v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.
> >
>
> 1)
>
> >Here, it seems we are accessing the relation descriptor without any
> >lock on the table which is dangerous considering the table can be
> >modified in a parallel session. Is there a reason why you think this
> >is safe? Did you check anywhere else such a coding pattern?
>
> Yes, there's a very good reason and I certainly have checked for the
> same coding pattern elsewhere, and not just randomly decided that
> locking can be ignored.
> The table has ALREADY been locked (by the caller) during the
> parse/analyze phase.
>

Fair enough. I suggest adding a comment saying the same so that the
reader doesn't get confused about the same.

> (This is not the case for a partition, in which case the patch code
> uses AccessShareLock, as you will see).

Okay, but I see you release the lock on partition rel immediately
after checking parallel-safety. What if a user added some
parallel-unsafe constraint (via Alter Table) after that check?

>
> 4)
>
> >domain_max_parallel_hazard_for_modify()
> >{
> >..
> >+ if (isnull)
> >+ {
> >+ /*
> >+ * This shouldn't ever happen, but if it does, log a WARNING
> >+ * and return UNSAFE, rather than erroring out.
> >+ */
> >+ elog(WARNING, "null conbin for constraint %u", con->oid);
> >+ context->max_hazard = PROPARALLEL_UNSAFE;
> >+ break;
> >+ }
> >..
> >}
> >index_expr_max_parallel_hazard_for_modify()
> >{
> >..
> >+ if (index_expr_item == NULL) /* shouldn't happen */
> >+ {
> >+ index_close(index_rel, lockmode);
> >+ context->max_hazard = PROPARALLEL_UNSAFE;
> >+ return context->max_hazard;
> >+ }
> >..
> >}
>
> >It is not clear why the above two are shouldn't happen cases and if so
> >why we want to treat them as unsafe. Ideally, there should be an
> >Assert if these can't happen but it is difficult to decide without
> >knowing why you have considered them unsafe?
>
> The checks being done here for "should never happen" cases are THE
> SAME as other parts of the Postgres code.
> For example, search Postgres code for "null conbin" and you'll find 6
> other places in the Postgres code which actually ERROR out if conbin
> (binary representation of the constraint) in a pg_constraint tuple is
> found to be null.
> The cases that you point out in the patch used to also error out in
> the same way, but Vignesh suggested changing them to just return
> parallel-unsafe instead of erroring-out, which I agree with.
>

You have not raised a WARNING for the second case. But in the first
place what is the reasoning for making this different from other parts
of code? If we don't have a solid reason then I suggest keeping these
checks and errors the same as in other parts of the code.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
Hi Amit

> I don't think the patch should have any impact on the serial case. I 
> think you can try to repeat each test 3 times both with and without a 
> patch and take the median of the three.

Actually, I repeated the tests about 10 times, and the execution time is always less than unpatched.

Regards,
Tang




Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Tsunakawa-san
>
> > From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> > > (Does this patch make some optimizations for serial INSERT? I'm a little
> > > confused here, because the patched execution time is less than
> > > unpatched, but I didn't find any information about this in the commit
> > > messages. If I missed something, please kindly let me know.)
> >
> > I haven't thought of anything yet.  Could you show us the output of
> > EXPLAIN (ANALYZE, BUFFERS, VERBOSE) of 1,000 partitions case for the
> > patched and unpatched?  If it doesn't show any difference, the output
> > of perf may be necessary next.
>
> Execute EXPLAIN on Patched:
> postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=44.139..44.140 rows=0 loops=1)
>    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.007..0.201 rows=1000
loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5
>  Planning:
>    Buffers: shared hit=27011
>  Planning Time: 24.526 ms
>  Execution Time: 44.981 ms
>
> Execute EXPLAIN on non-Patched:
> postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=72.656..72.657 rows=0 loops=1)
>    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.010..0.175 rows=1000
loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5
>  Planning:
>    Buffers: shared hit=72
>  Planning Time: 0.135 ms
>  Execution Time: 79.058 ms
>

So, the results indicate that after the patch we touch more buffers
during planning, which I think is because of accessing the partition
information, and during execution the patch touches fewer buffers for
the same reason. But why would this reduce the time with the patch? I
think this needs some investigation.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 2:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 18, 2021 at 10:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Here is an additional review of
> > > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
> > > quite a few comments raised on the V11-0001* patch. I suggest first
> > > post a revised version of V11-0001* patch addressing those comments
> > > and then you can separately post a revised version of
> > > v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.
> > >
> >
> > 1)
> >
> > >Here, it seems we are accessing the relation descriptor without any
> > >lock on the table which is dangerous considering the table can be
> > >modified in a parallel session. Is there a reason why you think this
> > >is safe? Did you check anywhere else such a coding pattern?
> >
> > Yes, there's a very good reason and I certainly have checked for the
> > same coding pattern elsewhere, and not just randomly decided that
> > locking can be ignored.
> > The table has ALREADY been locked (by the caller) during the
> > parse/analyze phase.
> >
>
> Fair enough. I suggest adding a comment saying the same so that the
> reader doesn't get confused about the same.
>
> > (This is not the case for a partition, in which case the patch code
> > uses AccessShareLock, as you will see).
>
> Okay, but I see you release the lock on partition rel immediately
> after checking parallel-safety. What if a user added some
> parallel-unsafe constraint (via Alter Table) after that check?
>
> >
> > 4)
> >
> > >domain_max_parallel_hazard_for_modify()
> > >{
> > >..
> > >+ if (isnull)
> > >+ {
> > >+ /*
> > >+ * This shouldn't ever happen, but if it does, log a WARNING
> > >+ * and return UNSAFE, rather than erroring out.
> > >+ */
> > >+ elog(WARNING, "null conbin for constraint %u", con->oid);
> > >+ context->max_hazard = PROPARALLEL_UNSAFE;
> > >+ break;
> > >+ }
> > >..
> > >}
> > >index_expr_max_parallel_hazard_for_modify()
> > >{
> > >..
> > >+ if (index_expr_item == NULL) /* shouldn't happen */
> > >+ {
> > >+ index_close(index_rel, lockmode);
> > >+ context->max_hazard = PROPARALLEL_UNSAFE;
> > >+ return context->max_hazard;
> > >+ }
> > >..
> > >}
> >
> > >It is not clear why the above two are shouldn't happen cases and if so
> > >why we want to treat them as unsafe. Ideally, there should be an
> > >Assert if these can't happen but it is difficult to decide without
> > >knowing why you have considered them unsafe?
> >
> > The checks being done here for "should never happen" cases are THE
> > SAME as other parts of the Postgres code.
> > For example, search Postgres code for "null conbin" and you'll find 6
> > other places in the Postgres code which actually ERROR out if conbin
> > (binary representation of the constraint) in a pg_constraint tuple is
> > found to be null.
> > The cases that you point out in the patch used to also error out in
> > the same way, but Vignesh suggested changing them to just return
> > parallel-unsafe instead of erroring-out, which I agree with.
> >
>
> You have not raised a WARNING for the second case. But in the first
> place what is the reasoning for making this different from other parts
> of code?
>

Thinking about this again, I see a reason why one would want to do what
you have done currently in the patch. It helps us avoid raising such
errors when they are really not required, say, when the condition
occurred while checking parallel-safety for a particular partition that
in reality we will never insert into; there are probably other similar
cases. I guess we should give a WARNING consistently in all such cases.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Jan 18, 2021 at 8:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > 1)
> >
> > >Here, it seems we are accessing the relation descriptor without any
> > >lock on the table which is dangerous considering the table can be
> > >modified in a parallel session. Is there a reason why you think this
> > >is safe? Did you check anywhere else such a coding pattern?
> >
> > Yes, there's a very good reason and I certainly have checked for the
> > same coding pattern elsewhere, and not just randomly decided that
> > locking can be ignored.
> > The table has ALREADY been locked (by the caller) during the
> > parse/analyze phase.
> >
>
> Fair enough. I suggest adding a comment saying the same so that the
> reader doesn't get confused about the same.
>

OK, I'll add a comment.

> > (This is not the case for a partition, in which case the patch code
> > uses AccessShareLock, as you will see).
>
> Okay, but I see you release the lock on partition rel immediately
> after checking parallel-safety. What if a user added some
> parallel-unsafe constraint (via Alter Table) after that check?
>
> >

I'm not sure. But there would be a similar concern for current
Parallel SELECT functionality, right?
My recollection is that ALTER TABLE obtains an exclusive lock on the
table which it retains until the end of the transaction, so that will
result in blocking at certain points, during parallel-safety checks and
execution, but there may still be a window.

> > 4)
> >
> > >domain_max_parallel_hazard_for_modify()
> > >{
> > >..
> > >+ if (isnull)
> > >+ {
> > >+ /*
> > >+ * This shouldn't ever happen, but if it does, log a WARNING
> > >+ * and return UNSAFE, rather than erroring out.
> > >+ */
> > >+ elog(WARNING, "null conbin for constraint %u", con->oid);
> > >+ context->max_hazard = PROPARALLEL_UNSAFE;
> > >+ break;
> > >+ }
> > >..
> > >}
> > >index_expr_max_parallel_hazard_for_modify()
> > >{
> > >..
> > >+ if (index_expr_item == NULL) /* shouldn't happen */
> > >+ {
> > >+ index_close(index_rel, lockmode);
> > >+ context->max_hazard = PROPARALLEL_UNSAFE;
> > >+ return context->max_hazard;
> > >+ }
> > >..
> > >}
> >
> > >It is not clear why the above two are shouldn't happen cases and if so
> > >why we want to treat them as unsafe. Ideally, there should be an
> > >Assert if these can't happen but it is difficult to decide without
> > >knowing why you have considered them unsafe?
> >
> > The checks being done here for "should never happen" cases are THE
> > SAME as other parts of the Postgres code.
> > For example, search Postgres code for "null conbin" and you'll find 6
> > other places in the Postgres code which actually ERROR out if conbin
> > (binary representation of the constraint) in a pg_constraint tuple is
> > found to be null.
> > The cases that you point out in the patch used to also error out in
> > the same way, but Vignesh suggested changing them to just return
> > parallel-unsafe instead of erroring-out, which I agree with.
> >
>
> You have not raised a WARNING for the second case.

The same checks in current Postgres code also don't raise a WARNING
for that case, so I'm just being consistent with existing Postgres
code (which itself isn't consistent for those two cases).

>But in the first
> place what is the reasoning for making this different from other parts
> of code? If we don't have a solid reason then I suggest keeping these
> checks and errors the same as in other parts of the code.
>

The checks are the same as done in existing Postgres source - but
instead of failing with an ERROR (i.e. aborting the current transaction), in the
middle of parallel-safety-checking, it has been changed to regard the
operation as parallel-unsafe, so that it will try to execute in
non-parallel mode, where it will most likely fail too when those
corrupted attributes are accessed - but it will fail in the way that
it currently does in Postgres, should that very rare condition ever
happen. This was suggested by Vignesh, and I agree with him. So in
effect, it's just allowing it to use the existing error paths in the
code, rather than introducing new ERROR points.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Greg Nancarrow <gregn4422@gmail.com>
> On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >Here, it seems we are accessing the relation descriptor without any
> >lock on the table which is dangerous considering the table can be
> >modified in a parallel session. Is there a reason why you think this
> >is safe? Did you check anywhere else such a coding pattern?
> 
> Yes, there's a very good reason and I certainly have checked for the
> same coding pattern elsewhere, and not just randomly decided that
> locking can be ignored.
> The table has ALREADY been locked (by the caller) during the
> parse/analyze phase.

Isn't there any case where planning is done but parse analysis is not done immediately before?  e.g.

* Alteration of objects invalidates cached query plans, and the next execution of the plan rebuilds it.  (But it seems that parse analysis is done in this case in plancache.c.)

* Execute a prepared statement with a different parameter value, which builds a new custom plan or a generic plan.

Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a trigger with a parallel-unsafe trigger action?


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 3:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Jan 18, 2021 at 8:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > >It is not clear why the above two are shouldn't happen cases and if so
> > > >why we want to treat them as unsafe. Ideally, there should be an
> > > >Assert if these can't happen but it is difficult to decide without
> > > >knowing why you have considered them unsafe?
> > >
> > > The checks being done here for "should never happen" cases are THE
> > > SAME as other parts of the Postgres code.
> > > For example, search Postgres code for "null conbin" and you'll find 6
> > > other places in the Postgres code which actually ERROR out if conbin
> > > (binary representation of the constraint) in a pg_constraint tuple is
> > > found to be null.
> > > The cases that you point out in the patch used to also error out in
> > > the same way, but Vignesh suggested changing them to just return
> > > parallel-unsafe instead of erroring-out, which I agree with.
> > >
> >
> > You have not raised a WARNING for the second case.
>
> The same checks in current Postgres code also don't raise a WARNING
> for that case, so I'm just being consistent with existing Postgres
> code (which itself isn't consistent for those two cases).
>

Search for the string "too few entries in indexprs list" and you will
find a lot of places in code raising ERROR for the same condition.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Jan 19, 2021 at 2:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > >
> > > You have not raised a WARNING for the second case.
> >
> > The same checks in current Postgres code also don't raise a WARNING
> > for that case, so I'm just being consistent with existing Postgres
> > code (which itself isn't consistent for those two cases).
> >
>
> Search for the string "too few entries in indexprs list" and you will
> find a lot of places in code raising ERROR for the same condition.
>

Yes, but raising an ERROR stops processing (not just logs an error
message). Raising a WARNING logs a warning message and continues
processing. It's a big difference.
So, for the added parallel-safety-checking code, it was suggested by
Vignesh (and agreed by me) that, for these rare and highly unlikely
conditions, it would be best not to just copy the error-handling code
verbatim from other cases in the Postgres code (as I had originally
done) and just stop processing dead with an error, but to instead
return PARALLEL_UNSAFE, so that processing continues as it would for
current non-parallel processing, which would most likely error-out
anyway along the current error-handling checks and paths when those
bad attributes/fields are referenced.
I will add some Asserts() and don't mind adding a WARNING message for
the 2nd case.
If you really feel strongly about this, I can just restore the
original code, which will stop dead with an ERROR in the middle of
parallel-safety checking should one of these rare conditions ever
occur.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Jan 19, 2021 at 1:37 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> > The table has ALREADY been locked (by the caller) during the
> > parse/analyze phase.
>
> Isn't there any case where planning is done but parse analysis is not done immediately before?  e.g.
>
> * Alteration of objects invalidates cached query plans, and the next execution of the plan rebuilds it.  (But it seems that parse analysis is done in this case in plancache.c.)
>
> * Execute a prepared statement with a different parameter value, which builds a new custom plan or a generic plan.
>

I don't know, but since NoLock is used in other parts of the planner,
I'd expect those to fail if such cases existed.

> Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a trigger with a parallel-unsafe trigger action?
>

Needs to be tested, but I'd expect the cached plan to get invalidated
in this case - surely the same potential issue exists in Postgres for
the current Parallel SELECT functionality - for example, for a column
with a default value that is an expression (which could be altered
from being parallel-safe to parallel-unsafe).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 3:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Jan 18, 2021 at 8:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > 1)
> > >
> > > >Here, it seems we are accessing the relation descriptor without any
> > > >lock on the table which is dangerous considering the table can be
> > > >modified in a parallel session. Is there a reason why you think this
> > > >is safe? Did you check anywhere else such a coding pattern?
> > >
> > > Yes, there's a very good reason and I certainly have checked for the
> > > same coding pattern elsewhere, and not just randomly decided that
> > > locking can be ignored.
> > > The table has ALREADY been locked (by the caller) during the
> > > parse/analyze phase.
> > >
> >
> > Fair enough. I suggest adding a comment saying the same so that the
> > reader doesn't get confused about the same.
> >
>
> OK, I'll add a comment.
>
> > > (This is not the case for a partition, in which case the patch code
> > > uses AccessShareLock, as you will see).
> >
> > Okay, but I see you release the lock on partition rel immediately
> > after checking parallel-safety. What if a user added some
> > parallel-unsafe constraint (via Alter Table) after that check?
> >
> > >
>
> I'm not sure. But there would be a similar concern for current
> Parallel SELECT functionality, right?
>

I don't think so because, for Selects, we take locks on the required
partitions and don't release them immediately. We do parallel safety
checks after acquiring those locks. From the code perspective, we lock
individual partitions via
expand_inherited_rtentry->expand_partitioned_rtentry and then check
parallel-safety at a later point via
set_append_rel_size->set_rel_consider_parallel. Also, I am not sure if
there is anything we check for Selects at each partition relation
level that can be changed by a concurrent session. Do you have a
different understanding?

Similarly, we do retain locks on indexes, see get_relation_info, which
we are not doing in the patch.

> My recollection is that ALTER TABLE obtains an exclusive lock on the
> table which it retains until the end of the transaction, so that will
> result in blocking at certain points, during parallel-checks and
> execution, but there may still be a window.
>

Once the Select has acquired locks in the above code path, I don't
think Alter for a particular partition would be able to proceed unless
those locks are non-conflicting.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Tue, Jan 19, 2021 at 9:19 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Jan 19, 2021 at 2:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > >
> > > > You have not raised a WARNING for the second case.
> > >
> > > The same checks in current Postgres code also don't raise a WARNING
> > > for that case, so I'm just being consistent with existing Postgres
> > > code (which itself isn't consistent for those two cases).
> > >
> >
> > Search for the string "too few entries in indexprs list" and you will
> > find a lot of places in code raising ERROR for the same condition.
> >
>
> Yes, but raising an ERROR stops processing (not just logs an error
> message). Raising a WARNING logs a warning message and continues
> processing. It's a big difference.
> So, for the added parallel-safety-checking code, it was suggested by
> Vignesh (and agreed by me) that, for these rare and highly unlikely
> conditions, it would be best not to just copy the error-handling code
> verbatim from other cases in the Postgres code (as I had originally
> done) and just stop processing dead with an error, but to instead
> return PARALLEL_UNSAFE, so that processing continues as it would for
> current non-parallel processing, which would most likely error-out
> anyway along the current error-handling checks and paths when those
> bad attributes/fields are referenced.
> I will add some Asserts() and don't mind adding a WARNING message for
> the 2nd case.
> If you really feel strongly about this, I can just restore the
> original code, which will stop dead with an ERROR in the middle of
> parallel-safety checking should one of these rare conditions ever
> occur.
>

I am expecting that either we raise a WARNING and return
parallel_unsafe for all such checks (shouldn't reach cases) in the
patch or simply raise an ERROR as we do in other parts of the patch. I
personally prefer the latter alternative but I am fine with the former
one as well.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Tue, Jan 19, 2021 at 10:32 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Jan 19, 2021 at 1:37 PM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
> >
> > > The table has ALREADY been locked (by the caller) during the
> > > parse/analyze phase.
> >
> > Isn't there any case where planning is done but parse analysis is not done immediately before?  e.g.
> >
> > * Alteration of objects invalidates cached query plans, and the next execution of the plan rebuilds it.  (But it seems that parse analysis is done in this case in plancache.c.)
> >
> > * Execute a prepared statement with a different parameter value, which builds a new custom plan or a generic plan.
> >
>
> I don't know, but since NoLock is used in other parts of the planner,
> I'd expect those to fail if such cases existed.
>

I think I know how, for both the above cases, we ensure that the locks
are acquired before we reach the planner. It seems we will call
GetCachedPlan during these scenarios which will acquire the required
locks in RevalidateCachedQuery both when the cached plan is invalid
and when it is valid. So, we should be fine even when the
custom/generic plan needs to be formed due to a different parameter.

> > Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a trigger with a parallel-unsafe trigger action?
> >
>
> Needs to be tested,
>

Yeah, it would be good to test it but I think even if the plan is
invalidated, we will reacquire the required locks as mentioned above.

Tsunakawa-San, does this address your concerns around locking the
target relation in the required cases? It would be good to test but I
don't see any problems in the scenarios you mentioned.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Amit Kapila <amit.kapila16@gmail.com>
> Tsunakawa-San, does this address your concerns around locking the
> target relation in the required cases? It would be good to test but I
> don't see any problems in the scenarios you mentioned.

Thank you, understood.  RevalidateCachedQuery() does parse analysis; that's the trick.


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 8, 2021 at 8:25 PM vignesh C <vignesh21@gmail.com> wrote:
>
>
> Few includes are not required:
>  #include "executor/nodeGather.h"
> +#include "executor/nodeModifyTable.h"
>  #include "executor/nodeSubplan.h"
>  #include "executor/tqueue.h"
>  #include "miscadmin.h"
> @@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
>         GatherState *gatherstate;
>         Plan       *outerNode;
>         TupleDesc       tupDesc;
> +       Index           varno;
>
> This include is not required in nodeModifyTable.c
>

I think you meant nodeGather.c (not nodeModifyTable.c).
However, the include file (executor/nodeModifyTable.h) is actually
required here, otherwise there are build warnings.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> Execute EXPLAIN on Patched:
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=44.139..44.140 rows=0 loops=1)
>    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.007..0.201 rows=1000 loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5

> Execute EXPLAIN on non-Patched:
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=72.656..72.657 rows=0 loops=1)
>    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.010..0.175 rows=1000 loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5

I don't know if this is related to this issue, but I felt "shared hit=5" for Seq Scan is strange.  This test case reads 1,000 rows from 1,000 partitions, one row per partition, so I thought the shared hit should be 1,000 in Seq Scan.  I wonder if the 1,000 is included in the Insert node?


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 8, 2021 at 8:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> - if (pcxt->nworkers_launched > 0)
> + if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader &&
> !isParallelModifyWithReturning))
>   {
>
> I think this check could be simplified to if (pcxt->nworkers_launched
> > 0 && isParallelModifyWithReturning) or something like that.
>

Not quite. The existing check is correct, because it needs to account
for existing Parallel SELECT functionality (not just new Parallel
INSERT).
But I will re-write the test as an equivalent expression, so it's
hopefully more readable (taking into account Antonin's suggested
variable-name changes):

    if (pcxt->nworkers_launched > 0 && (!isModify || isModifyWithReturning))


> >
> > @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> >         PlannerGlobal *glob = root->glob;
> >         int                     rtoffset = list_length(glob->finalrtable);
> >         ListCell   *lc;
> > +       Plan       *finalPlan;
> >
> >         /*
> >          * Add all the query's RTEs to the flattened rangetable.  The live ones
> > @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> >         }
> >
> >         /* Now fix the Plan tree */
> > -       return set_plan_refs(root, plan, rtoffset);
> > +       finalPlan = set_plan_refs(root, plan, rtoffset);
> > +       if (finalPlan != NULL && IsA(finalPlan, Gather))
> > +       {
> > +               Plan       *subplan = outerPlan(finalPlan);
> > +
> > +               if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
> > +               {
> > +                       finalPlan->targetlist = copyObject(subplan->targetlist);
> > +               }
> > +       }
> > +       return finalPlan;
> >  }
> >
> > I'm not sure if the problem of missing targetlist should be handled here (BTW,
> > NIL is the constant for an empty list, not NULL). Obviously this is a
> > consequence of the fact that the ModifyTable node has no regular targetlist.
> >
>
> I think it is better to add comments along with this change. In this
> form, this looks quite hacky to me.
>

The targetlist on the ModifyTable node has been setup correctly, but
it hasn't been propagated to the Gather node.
Of course, I have previously tried to elegantly fix this, but struck
various problems, using different approaches.
Perhaps this code could just be moved into set_plan_refs().
For the moment, I'll just keep the current code, but I'll add a FIXME
comment for this.
I'll investigate Antonin's suggestions as a lower-priority side-task.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Jan 20, 2021 at 3:27 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Jan 8, 2021 at 8:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I'm not sure if the problem of missing targetlist should be handled here (BTW,
> > > NIL is the constant for an empty list, not NULL). Obviously this is a
> > > consequence of the fact that the ModifyTable node has no regular targetlist.
> > >
> >
> > I think it is better to add comments along with this change. In this
> > form, this looks quite hacky to me.
> >
>
> The targetlist on the ModifyTable node has been setup correctly, but
> it hasn't been propagated to the Gather node.
> Of course, I have previously tried to elegantly fix this, but struck
> various problems, using different approaches.
> Perhaps this code could just be moved into set_plan_refs().
> For the moment, I'll just keep the current code, but I'll add a FIXME
> comment for this.
> I'll investigate Antonin's suggestions as a lower-priority side-task.
>

+1, that sounds like a good approach because anyway this has to be
dealt for the second patch and at this point, we should make the first
patch ready.


-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Thanks for the feedback.
Posting an updated set of patches. Changes are based on feedback, as
detailed below:

There are a couple of potential issues currently being looked at:
- locking issues in additional parallel-safety checks?
- apparent uneven work distribution across the parallel workers, for
large insert data


[Antonin]
- Fixed bad Assert in PrepareParallelMode()
- Added missing comment to explain use of GetCurrentCommandId() in
PrepareParallelMode()
- Some variable name shortening in a few places
- Created common function for creation of non-parallel and parallel
ModifyTable paths
- Path count variable changed to bool
- Added FIXME comment to dubious code for creating Gather target-list
from ModifyTable subplan
- Fixed check on returningLists to use NIL instead of NULL

[Amit]
- Moved additional parallel-safety checks (for modify case) into
max_parallel_hazard()
- Removed redundant calls to max_parallel_hazard_test()
- Added Asserts to "should never happen" null-attribute cases (and
added WARNING log missing from one case)
- Added comment for use of NoLock in max_parallel_hazard_for_modify()

[Vignesh]
- Fixed a couple of typos
- Added a couple of test cases for testing that the same transaction
is used by all parallel workers


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> 
> Thanks for the feedback.
> Posting an updated set of patches. Changes are based on feedback, as detailed
> below:
Hi

It seems there are some previous comments[1][2] not addressed in the current patch.
Just making sure they're not missed.

[1]
https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local

[2]
https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Zhihong Yu
Date:
Hi,
For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :

is found from the additional parallel-safety checks, or from the existing
parallel-safety checks for SELECT that it currently performs.

existing and 'it currently performs' are redundant. You can omit 'that it currently performs'.

Minor. For index_expr_max_parallel_hazard_for_modify(), 

+               if (keycol == 0)
+               {
+                   /* Found an index expression */

You can check if keycol != 0, continue with the loop. This would save some indent.

+                   if (index_expr_item == NULL)    /* shouldn't happen */
+                   {
+                       elog(WARNING, "too few entries in indexprs list");

I think the warning should be an error since there is assertion ahead of the if statement.

+           Assert(!isnull);
+           if (isnull)
+           {
+               /*
+                * This shouldn't ever happen, but if it does, log a WARNING
+                * and return UNSAFE, rather than erroring out.
+                */
+               elog(WARNING, "null conbin for constraint %u", con->oid);

The above should be error as well.

Cheers


Re: Parallel INSERT (INTO ... SELECT ...)

From
Zhihong Yu
Date:
Hi,
For v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch:

+       bool        isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate), ModifyTableState);

Please wrap long line.

+   uint64     *processed_count_space;

If I read the code correctly, it seems it can be dropped (use pei->processed_count directly).

Cheers




Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Jan 21, 2021 at 1:28 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
>
> is found from the additional parallel-safety checks, or from the existing
> parallel-safety checks for SELECT that it currently performs.
>
> existing and 'it currently performs' are redundant. You can omit 'that it currently performs'.
>

OK, but this is very minor.

> Minor. For index_expr_max_parallel_hazard_for_modify(),
>
> +               if (keycol == 0)
> +               {
> +                   /* Found an index expression */
>
> You can check if keycol != 0, continue with the loop. This would save some indent.

Yes I know, but I don't really see any issue with indent (I'm using
4-space tabs).

>
> +                   if (index_expr_item == NULL)    /* shouldn't happen */
> +                   {
> +                       elog(WARNING, "too few entries in indexprs list");
>
> I think the warning should be an error since there is assertion ahead of the if statement.
>

Assertions are normally for DEBUG builds, so the Assert would have no
effect in a production (release) build.
Besides, as I have explained in my reply to previous feedback, the
logging of a WARNING (rather than ERROR) is intentional, because I
want processing to continue (not stop) if ever this very rare
condition was to occur - so that the issue can be dealt with by the
current non-parallel processing (rather than stop dead in the middle
of parallel-safety-checking code). For a DEBUG build, it is handy for
the Assert to immediately alert us to the issue (which could really
only be caused by database corruption, not a bug in the code).
Note that Vignesh originally suggested changing it from
"elog(ERROR,...)" to "elog(WARNING,...)", and I agree with him.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Jan 21, 2021 at 1:50 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> For v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch:
>
> +       bool        isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate),
ModifyTableState);
>
> Please wrap long line.
>

OK.
I thought I ran pgindent fairly recently, but maybe it chose not to
wrap that line.


> +   uint64     *processed_count_space;
>
> If I read the code correctly, it seems it can be dropped (use pei->processed_count directly).
>

You're right. I'll change it in the next version.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Jan 21, 2021 at 12:47 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> >
> Hi
>
> It seems there are some previous comments[1][2] not addressed in current patch.
> Just to make sure it's not missed.
>
> [1]
> https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local
>
> [2]
> https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com
>

Thanks for alerting me to those, somehow I completely missed them,
sorry about that.
I'll be sure to investigate and address them in the next version of
the patch I post.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Dec 23, 2020 at 1:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> >
> > Hi
> >
> > > > I may be wrong, and if I miss sth in previous mails, please give me some
> > > hints.
> > > > IMO, serial insertion with underlying parallel SELECT can be
> > > > considered for foreign table or temporary table, as the insertions only
> > > happened in the leader process.
> > > >
> > >
> > > I don't think we support parallel scan for temporary tables. Can you please
> > > try once both of these operations without Insert being involved? If you
> > > are able to produce a parallel plan without Insert then we can see why it
> > > is not supported with Insert.
> >
> > > Sorry, may be I did not express it clearly, I actually means the case when insert's target (not in select part)
> > > table is temporary.
> > And you are right that parallel select is not enabled when temporary table is in select part.
> >
>
> I think Select can be parallel for this case and we should support this case.
>

So I think we're saying that if the target table is a foreign table or
temporary table, it can be regarded as PARALLEL_RESTRICTED, right?

i.e. code-wise:

        /*
-        * We can't support table modification in parallel-mode if
it's a foreign
-        * table/partition (no FDW API for supporting parallel access) or a
+        * We can't support table modification in a parallel worker if it's a
+        * foreign table/partition (no FDW API for supporting parallel
access) or a
         * temporary table.
         */
        if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
                RelationUsesLocalBuffers(rel))
        {
-               table_close(rel, lockmode);
-               context->max_hazard = PROPARALLEL_UNSAFE;
-               return true;
+               if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
+               {
+                       table_close(rel, lockmode);
+                       return true;
+               }
        }


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Dec 21, 2020 at 1:50 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> +
> +       index_oid_list = RelationGetIndexList(rel);
> ...
>
> As memtioned in the comments of RelationGetIndexList:
> * we return a copy of the list palloc'd in the caller's context.  The caller
> * may list_free() the returned list after scanning it.
>
> Shall we list_free(index_oid_list) at the end of function ?
> Just to avoid potential memory leak.
>

I think that's a good idea, so I'll make that update in the next
version of the patch.
I do notice, however, that there seems to be quite a few places in the
Postgres code where RelationGetIndexList() is being called without a
corresponding list_free() being called - obviously instead relying on
it being deallocated when the caller's memory-context is destroyed.
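The lifetime issue can be sketched in standalone C (plain malloc/free stand in for palloc/pfree here, and the list type is a toy stand-in for PostgreSQL's List, so this is an illustration of the pattern, not the real API):

```c
/* Each call to the stand-in for RelationGetIndexList() returns a fresh
 * copy of the list; freeing it after the scan prevents per-partition
 * copies from piling up until the memory context is destroyed. */
#include <stdlib.h>

typedef struct ListCell
{
    unsigned        oid;
    struct ListCell *next;
} ListCell;

/* stand-in for RelationGetIndexList(): returns a newly allocated copy */
static ListCell *
get_index_list(const unsigned *oids, int n)
{
    ListCell *head = NULL;
    for (int i = n - 1; i >= 0; i--)
    {
        ListCell *cell = malloc(sizeof(ListCell));
        cell->oid = oids[i];
        cell->next = head;
        head = cell;
    }
    return head;
}

/* stand-in for list_free(): releases the copy once it has been scanned */
static void
list_free_sketch(ListCell *head)
{
    while (head)
    {
        ListCell *next = head->next;
        free(head);
        head = next;
    }
}

/* scan the index list, then free it immediately rather than relying on
 * context teardown */
static int
count_indexes(const unsigned *oids, int n)
{
    ListCell *list = get_index_list(oids, n);
    int       count = 0;
    for (ListCell *lc = list; lc; lc = lc->next)
        count++;
    list_free_sketch(list);
    return count;
}
```

In the real code the copy is palloc'd in the caller's context, so omitting list_free() is not a leak in the strict sense, but when the safety checks loop over many partitions each unfreed copy lingers until the context is destroyed.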

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Jan 21, 2021 at 12:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 23, 2020 at 1:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > >
> > > Hi
> > >
> > > > > I may be wrong, and if I miss sth in previous mails, please give me some
> > > > hints.
> > > > > IMO, serial insertion with underlying parallel SELECT can be
> > > > > considered for foreign table or temporary table, as the insertions only
> > > > happened in the leader process.
> > > > >
> > > >
> > > > I don't think we support parallel scan for temporary tables. Can you please
> > > > try once both of these operations without Insert being involved? If you
> > > > are able to produce a parallel plan without Insert then we can see why it
> > > > is not supported with Insert.
> > >
> > > > Sorry, may be I did not express it clearly, I actually means the case when insert's target (not in select part)
> > > > table is temporary.
> > > And you are right that parallel select is not enabled when temporary table is in select part.
> > >
> >
> > I think Select can be parallel for this case and we should support this case.
> >
>
> So I think we're saying that if the target table is a foreign table or
> temporary table, it can be regarded as PARALLEL_RESTRICTED, right?
>

Yes.

> i.e. code-wise:
>
>         /*
> -        * We can't support table modification in parallel-mode if
> it's a foreign
> -        * table/partition (no FDW API for supporting parallel access) or a
> +        * We can't support table modification in a parallel worker if it's a
> +        * foreign table/partition (no FDW API for supporting parallel
> access) or a
>          * temporary table.
>          */
>         if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
>                 RelationUsesLocalBuffers(rel))
>         {
> -               table_close(rel, lockmode);
> -               context->max_hazard = PROPARALLEL_UNSAFE;
> -               return true;
> +               if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> +               {
> +                       table_close(rel, lockmode);
> +                       return true;
> +               }
>         }
>

Yeah, these changes look correct to me.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > > Hi
> > >
> > > > > I may be wrong, and if I miss sth in previous mails, please give
> > > > > me some
> > > > hints.
> > > > > IMO, serial insertion with underlying parallel SELECT can be
> > > > > considered for foreign table or temporary table, as the
> > > > > insertions only
> > > > happened in the leader process.
> > > > >
> > > >
> > > > I don't think we support parallel scan for temporary tables. Can
> > > > you please try once both of these operations without Insert being
> > > > involved? If you are able to produce a parallel plan without
> > > > Insert then we can see why it is not supported with Insert.
> > >
> > > Sorry, may be I did not express it clearly, I actually means the case
> when insert's target(not in select part) table is temporary.
> > > And you are right that parallel select is not enabled when temporary
> table is in select part.
> > >
> >
> > I think Select can be parallel for this case and we should support this
> case.
> >
> 
> So I think we're saying that if the target table is a foreign table or
> temporary table, it can be regarded as PARALLEL_RESTRICTED, right?
>

Yes

IMO, PARALLEL_RESTRICTED currently enables parallel select but disables parallel insert.
So, the INSERT only happens in the leader process, which seems safe for inserting into a temporary/foreign table.

In addition, there are some other restrictions on parallel select which it seems could be removed:

1.- Target table has a parallel-unsafe trigger, index expression, column default
    expression or check constraint
2.- Target table is a partitioned table with a parallel-unsafe partition key
    expression or support function

If the Insert's target table is one of the types listed above, is there some reason why we cannot support parallel select?
It seems only the leader process will execute the trigger and key expression, which seems safe.
(If I missed something here, please let me know)

Best regards,
houzj






RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > So I think we're saying that if the target table is a foreign table or
> > temporary table, it can be regarded as PARALLEL_RESTRICTED, right?
> >
> 
> Yes
> 
> IMO, PARALLEL_RESTRICTED currently enable parallel select but disable
> parallel insert.
> So, the INSERT only happen in leader worker which seems safe to insert into
> tempory/foreigh table.
> 
> In addition, there are some other restriction about parallel select which
> seems can be removed:
> 
> 1.- Target table has a parallel-unsafe trigger, index expression, column
> default
>     expression or check constraint
> 2.- Target table is a partitioned table with a parallel-unsafe partition
> key
>     expression or support function
> 
> If the Insert's target table is the type listed above, Is there some reasons
> why we can not support parallel select ?
> It seems only leader worker will execute the trigger and key-experssion
> which seems safe.
> (If I miss something about it, please let me know)


Sorry, please ignore the above; I got something wrong.

Best regards,
houzj



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> >
> > +
> > +       index_oid_list = RelationGetIndexList(rel);
> > ...
> >
> > As memtioned in the comments of RelationGetIndexList:
> > * we return a copy of the list palloc'd in the caller's context.  The
> > caller
> > * may list_free() the returned list after scanning it.
> >
> > Shall we list_free(index_oid_list) at the end of function ?
> > Just to avoid potential memory leak.
> >
> 
> I think that's a good idea, so I'll make that update in the next version
> of the patch.
> I do notice, however, that there seems to be quite a few places in the Postgres
> code where RelationGetIndexList() is being called without a corresponding
> list_free() being called - obviously instead relying on it being deallocated
> when the caller's memory-context is destroyed.
> 

Yes, it will be deallocated when the caller's memory-context is destroyed.

Currently, the parallel safety-check checks each partition.
I am just a little worried that, if there are lots of partitions here, it may cause high memory use.

And there is another place like this:

1.
+            conbin = TextDatumGetCString(val);
+            check_expr = stringToNode(conbin);

It seems we can free conbin when it is no longer used (for the same reason as above).
What do you think?


Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 12:08 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> >
> > I think that's a good idea, so I'll make that update in the next version
> > of the patch.
> > I do notice, however, that there seems to be quite a few places in the Postgres
> > code where RelationGetIndexList() is being called without a corresponding
> > list_free() being called - obviously instead relying on it being deallocated
> > when the caller's memory-context is destroyed.
> >
>
> Yes, it will be deallocated when the caller's memory-context is destroyed.
>
> Currently, parallel safety-check check each partition.
> I am just a little worried about if there are lots of partition here, it may cause high memory use.
>
> And there is another place like this:
>
> 1.
> +                       conbin = TextDatumGetCString(val);
> +                       check_expr = stringToNode(conbin);
>
> It seems we can free the cobin when not used(for the same reason above).
> What do you think ?
>
>

Yes, I think you're right, we should pfree conbin after converting to
Node, to minimize memory usage.
Again, it's interesting that existing Postgres code, when looping
through all of the constraints, doesn't do this.
Hmmm. I'm wondering if there is a performance reason behind this -
avoiding multiple calls to pfree() and just relying on it to be
deallocated in one hit, when the memory context is destroyed.
Anyway, perhaps the concern of many partitions and the recursive
nature of these checks overrides that, because of, as you say, the
possible high memory usage.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi

I took a look at v12-0001 patch, here are some comments:

1.
+    /*
+     * Setup the context used in finding the max parallel-mode hazard.
+     */
+    Assert(initial_max_parallel_hazard == 0 ||
+           initial_max_parallel_hazard == PROPARALLEL_SAFE ||
+           initial_max_parallel_hazard == PROPARALLEL_RESTRICTED);
+    context.max_hazard = initial_max_parallel_hazard == 0 ?
+        PROPARALLEL_SAFE : initial_max_parallel_hazard;

I am not quite sure when "max_parallel_hazard == 0" will occur.
Does it mean the case where the max_parallel_hazard_context is not initialized?


2.
Some tiny code style suggestions

+        if (con->contype == CONSTRAINT_CHECK)
+        {
+            char       *conbin;
+            Datum        val;
+            bool        isnull;
+            Expr       *check_expr;

How about :

if (con->contype != CONSTRAINT_CHECK)
    continue;

3.
+                if (keycol == 0)
+                {
+                    /* Found an index expression */
+
+                    Node       *index_expr;

Like 2, how about:

if (keycol != 0)
    continue;


4.
+            ListCell   *index_expr_item = list_head(index_info->ii_Expressions);
...
+                    index_expr = (Node *) lfirst(index_expr_item);
+                    index_expr = (Node *) expression_planner((Expr *) index_expr);

It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions,
via the flow: BuildIndexInfo --> RelationGetIndexExpressions --> eval_const_expressions

So, IMO, we do not need to call expression_planner for the expr again.


And there seems another solution for this:

In the patch, we only use { ii_Expressions, ii_NumIndexAttrs, ii_IndexAttrNumbers } from the IndexInfo,
which it seems we can get from "Relation->rd_index".

Based on the above, maybe we do not need to call BuildIndexInfo to build the IndexInfo.
It can avoid some unnecessary cost.
And with this solution we would not need to remove expression_planner.

What do you think ?

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Jan 21, 2021 at 7:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > i.e. code-wise:
> >
> >         /*
> > -        * We can't support table modification in parallel-mode if
> > it's a foreign
> > -        * table/partition (no FDW API for supporting parallel access) or a
> > +        * We can't support table modification in a parallel worker if it's a
> > +        * foreign table/partition (no FDW API for supporting parallel
> > access) or a
> >          * temporary table.
> >          */
> >         if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
> >                 RelationUsesLocalBuffers(rel))
> >         {
> > -               table_close(rel, lockmode);
> > -               context->max_hazard = PROPARALLEL_UNSAFE;
> > -               return true;
> > +               if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> > +               {
> > +                       table_close(rel, lockmode);
> > +                       return true;
> > +               }
> >         }
> >
>
> Yeah, these changes look correct to me.
>

Unfortunately, this change results in a single test failure in the
"with" tests when "force_parallel_mode=regress" is in effect.

I have reproduced the problem, by extracting relevant SQL from those
tests, as follows:

CREATE TEMP TABLE bug6051 AS
  select i from generate_series(1,3) as i;
SELECT * FROM bug6051;
CREATE TEMP TABLE bug6051_2 (i int);
CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD
 INSERT INTO bug6051_2
 SELECT NEW.i;
WITH t1 AS ( DELETE FROM bug6051 RETURNING * )
INSERT INTO bug6051 SELECT * FROM t1;
ERROR:  cannot delete tuples during a parallel operation

Note that prior to the patch, all INSERTs were regarded as
PARALLEL_UNSAFE, so this problem obviously didn't occur.
I believe this INSERT should be regarded as PARALLEL_UNSAFE, because
it contains a modifying CTE.
However, for some reason, the INSERT is not regarded as having a
modifying CTE, so instead of finding it PARALLEL_UNSAFE, it falls into
the parallel-safety-checks and is found to be PARALLEL_RESTRICTED:

The relevant code in standard_planner() is:

    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
        IsUnderPostmaster &&
        (parse->commandType == CMD_SELECT ||
         IsModifySupportedInParallelMode(parse->commandType)) &&
        !parse->hasModifyingCTE &&
        max_parallel_workers_per_gather > 0 &&
        !IsParallelWorker())
    {
        /* all the cheap tests pass, so scan the query tree */
        glob->maxParallelHazard = max_parallel_hazard(parse);
        glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
    }
    else
    {
        /* skip the query tree scan, just assume it's unsafe */
        glob->maxParallelHazard = PROPARALLEL_UNSAFE;
        glob->parallelModeOK = false;
    }
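As a standalone sketch (an assumption-laden simplification of the patch, not the actual source; the enum is abbreviated), the "cheap test" helper referenced in the code above reduces to checking for INSERT, the only modify command this patch series enables for parallel mode:

```c
/* Sketch of IsModifySupportedInParallelMode() as used in the cheap
 * tests in standard_planner(): with this patch, INSERT is the only
 * table-modification command considered for parallel mode, so a DELETE
 * (e.g. inside a modifying CTE) must be caught by the
 * parse->hasModifyingCTE check instead. */
#include <stdbool.h>

typedef enum CmdType
{
    CMD_UNKNOWN,
    CMD_SELECT,
    CMD_UPDATE,
    CMD_INSERT,
    CMD_DELETE
} CmdType;

static inline bool
IsModifySupportedInParallelMode(CmdType commandType)
{
    /* Only INSERT INTO ... SELECT ... is a candidate so far */
    return commandType == CMD_INSERT;
}
```

This is why losing hasModifyingCTE matters: the top-level INSERT passes this check, so nothing else in the cheap tests blocks a query whose CTE performs a DELETE.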

When I debugged this (transformWithClause()), the WITH clause was
found to contain a modifying CTE and for the INSERT
query->hasModifyingCTE was set true.
But somehow in the re-writer code, this got lost.
Bug?
Ideas?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 1:16 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> I took a look at v12-0001 patch, here are some comments:
>
> 1.
> +       /*
> +        * Setup the context used in finding the max parallel-mode hazard.
> +        */
> +       Assert(initial_max_parallel_hazard == 0 ||
> +                  initial_max_parallel_hazard == PROPARALLEL_SAFE ||
> +                  initial_max_parallel_hazard == PROPARALLEL_RESTRICTED);
> +       context.max_hazard = initial_max_parallel_hazard == 0 ?
> +               PROPARALLEL_SAFE : initial_max_parallel_hazard;
>
> I am not quiet sure when will " max_parallel_hazard == 0"
> Does it means the case max_parallel_hazard_context not initialized ?
>

That function doesn't accept a "max_parallel_hazard_context". It
accepts an initial "max_parallel_hazard" value (char).
The "0" value is just a way of specifying "use the default"
(PROPARALLEL_SAFE). It is not currently used, since we just always
pass the "context.max_parallel_hazard" value resulting from the
previous parallel-safety checks for SELECT (and don't call that
function anywhere else).

>
> 2.
> Some tiny code style suggestions
>
> +               if (con->contype == CONSTRAINT_CHECK)
> +               {
> +                       char       *conbin;
> +                       Datum           val;
> +                       bool            isnull;
> +                       Expr       *check_expr;
>
> How about :
>
> if (con->contype != CONSTRAINT_CHECK)
>         continue;
>
> 3.
> +                               if (keycol == 0)
> +                               {
> +                                       /* Found an index expression */
> +
> +                                       Node       *index_expr;
>
> Like 2, how about:
>
> If (keycol != 0)
>         Continue;
>

This is really a programmer style preference (plenty of discussions on
the internet about it), but it can be argued that use of "continue"
here is not quite as clear as the explicit "if" condition, especially
in this very simple one-condition case.
I'm inclined to leave it as is.

>
> 4.
> +                       ListCell   *index_expr_item = list_head(index_info->ii_Expressions);
> ...
> +                                       index_expr = (Node *) lfirst(index_expr_item);
> +                                       index_expr = (Node *) expression_planner((Expr *) index_expr);
>
> It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions,
> Like the flow: BuildIndexInfo--> RelationGetIndexExpressions--> eval_const_expressions
>
> So, IMO, we do not need to call expression_planner for the expr again.
>
>
> And there seems another solution for this:
>
> In the patch, We only use the { ii_Expressions , ii_NumIndexAttrs , ii_IndexAttrNumbers } from the IndexInfo,
> which seems can get from "Relation-> rd_index".
>
> Based on above, May be we do not need to call BuildIndexInfo to build the IndexInfo.
> It can avoid some unnecessary cost.
> And in this solution we do not need to remove expression_planner.
>

OK, maybe this is a good idea, but I do recall trying to minimize this
kind of processing before, but there were cases which broke it.
Have you actually tried your idea and run all regression tests and
verified that they passed?
In any case, I'll look into it.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Jan 22, 2021 at 8:29 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Jan 21, 2021 at 7:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > i.e. code-wise:
> > >
> > >         /*
> > > -        * We can't support table modification in parallel-mode if
> > > it's a foreign
> > > -        * table/partition (no FDW API for supporting parallel access) or a
> > > +        * We can't support table modification in a parallel worker if it's a
> > > +        * foreign table/partition (no FDW API for supporting parallel
> > > access) or a
> > >          * temporary table.
> > >          */
> > >         if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
> > >                 RelationUsesLocalBuffers(rel))
> > >         {
> > > -               table_close(rel, lockmode);
> > > -               context->max_hazard = PROPARALLEL_UNSAFE;
> > > -               return true;
> > > +               if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> > > +               {
> > > +                       table_close(rel, lockmode);
> > > +                       return true;
> > > +               }
> > >         }
> > >
> >
> > Yeah, these changes look correct to me.
> >
>
> Unfortunately, this change results in a single test failure in the
> "with" tests when "force_parallel_mode=regress" is in effect.
>
> I have reproduced the problem, by extracting relevant SQL from those
> tests, as follows:
>
> CREATE TEMP TABLE bug6051 AS
>   select i from generate_series(1,3) as i;
> SELECT * FROM bug6051;
> CREATE TEMP TABLE bug6051_2 (i int);
> CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD
>  INSERT INTO bug6051_2
>  SELECT NEW.i;
> WITH t1 AS ( DELETE FROM bug6051 RETURNING * )
> INSERT INTO bug6051 SELECT * FROM t1;
> ERROR:  cannot delete tuples during a parallel operation
>
> Note that prior to the patch, all INSERTs were regarded as
> PARALLEL_UNSAFE, so this problem obviously didn't occur.
> I believe this INSERT should be regarded as PARALLEL_UNSAFE, because
> it contains a modifying CTE.
> However, for some reason, the INSERT is not regarded as having a
> modifying CTE, so instead of finding it PARALLEL_UNSAFE, it falls into
> the parallel-safety-checks and is found to be PARALLEL_RESTRICTED:
>
> The relevant code in standard_planner() is:
>
>     if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
>         IsUnderPostmaster &&
>         (parse->commandType == CMD_SELECT ||
>          IsModifySupportedInParallelMode(parse->commandType)) &&
>         !parse->hasModifyingCTE &&
>         max_parallel_workers_per_gather > 0 &&
>         !IsParallelWorker())
>     {
>         /* all the cheap tests pass, so scan the query tree */
>         glob->maxParallelHazard = max_parallel_hazard(parse);
>         glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
>     }
>     else
>     {
>         /* skip the query tree scan, just assume it's unsafe */
>         glob->maxParallelHazard = PROPARALLEL_UNSAFE;
>         glob->parallelModeOK = false;
>     }
>
> When I debugged this (transformWithClause()), the WITH clause was
> found to contain a modifying CTE and for the INSERT
> query->hasModifyingCTE was set true.
> But somehow in the re-writer code, this got lost.
> Bug?
> Ideas?
>

How does it behave when the table in the above test is a non-temp
table, with your patch? If it leads to the same error then we can at
least conclude that this is a generic problem and nothing specific to
temp tables.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 1:16 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
>
> 4.
> +                       ListCell   *index_expr_item = list_head(index_info->ii_Expressions);
> ...
> +                                       index_expr = (Node *) lfirst(index_expr_item);
> +                                       index_expr = (Node *) expression_planner((Expr *) index_expr);
>
> It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions,
> Like the flow: BuildIndexInfo--> RelationGetIndexExpressions--> eval_const_expressions
>
> So, IMO, we do not need to call expression_planner for the expr again.
>

Thanks. You are right. I debugged it, and found that BuildIndexInfo-->
RelationGetIndexExpressions executes the same expression evaluation
code as expression_planner().
So I'll remove the redundant call to expression_planner() here.

>
> And there seems another solution for this:
>
> In the patch, We only use the { ii_Expressions , ii_NumIndexAttrs , ii_IndexAttrNumbers } from the IndexInfo,
> which seems can get from "Relation-> rd_index".
>
> Based on above, May be we do not need to call BuildIndexInfo to build the IndexInfo.
> It can avoid some unnecessary cost.
> And in this solution we do not need to remove expression_planner.
>

Hmmm, when I debugged my simple test case, I found rel->rd_index was
NULL, so it seems that the call to BuildIndexInfo is needed.
(have I understood your suggestion correctly?)

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > Unfortunately, this change results in a single test failure in the
> > "with" tests when "force_parallel_mode=regress" is in effect.
> >
> > I have reproduced the problem, by extracting relevant SQL from those
> > tests, as follows:
> >
> > CREATE TEMP TABLE bug6051 AS
> >   select i from generate_series(1,3) as i;
> > SELECT * FROM bug6051;
> > CREATE TEMP TABLE bug6051_2 (i int);
> > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD
> >  INSERT INTO bug6051_2
> >  SELECT NEW.i;
> > WITH t1 AS ( DELETE FROM bug6051 RETURNING * )
> > INSERT INTO bug6051 SELECT * FROM t1;
> > ERROR:  cannot delete tuples during a parallel operation
> >
> > Note that prior to the patch, all INSERTs were regarded as
> > PARALLEL_UNSAFE, so this problem obviously didn't occur.
> > I believe this INSERT should be regarded as PARALLEL_UNSAFE, because
> > it contains a modifying CTE.
> > However, for some reason, the INSERT is not regarded as having a
> > modifying CTE, so instead of finding it PARALLEL_UNSAFE, it falls into
> > the parallel-safety-checks and is found to be PARALLEL_RESTRICTED:
> >
> > The relevant code in standard_planner() is:
> >
> >     if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> >         IsUnderPostmaster &&
> >         (parse->commandType == CMD_SELECT ||
> >          IsModifySupportedInParallelMode(parse->commandType)) &&
> >         !parse->hasModifyingCTE &&
> >         max_parallel_workers_per_gather > 0 &&
> >         !IsParallelWorker())
> >     {
> >         /* all the cheap tests pass, so scan the query tree */
> >         glob->maxParallelHazard = max_parallel_hazard(parse);
> >         glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> >     }
> >     else
> >     {
> >         /* skip the query tree scan, just assume it's unsafe */
> >         glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> >         glob->parallelModeOK = false;
> >     }
> >
> > When I debugged this (transformWithClause()), the WITH clause was
> > found to contain a modifying CTE and for the INSERT
> > query->hasModifyingCTE was set true.
> > But somehow in the re-writer code, this got lost.
> > Bug?
> > Ideas?
> >
>
> How it behaves when the table in the above test is a non-temp table
> with your patch? If it leads to the same error then we can at least
> conclude that this is a generic problem and nothing specific to temp
> tables.
>

Oh, I don't believe that this has anything to do with TEMP tables -
it's just that when I relaxed the parallel-safety level on TEMP
tables, it exposed the CTE issue in this test case because it just
happens to use a TEMP table.
Having said that, when I changed that test code to not use a TEMP
table, an Assert fired in the planner code and caused the backend to
abort.
It looks like I need to update the following Assert in the planner
code (unchanged by the current patch) in order to test further - but
this Assert only fired because the commandType was CMD_DELETE, which
SHOULD have been excluded by the "hasModifyingCTE" test on the parent
INSERT, which is what I'm saying is strangely NOT getting set.

    /*
     * Generate partial paths for final_rel, too, if outer query levels might
     * be able to make use of them.
     */
    if (final_rel->consider_parallel && root->query_level > 1 &&
        !limit_needed(parse))
    {
        Assert(!parse->rowMarks && parse->commandType == CMD_SELECT);
        foreach(lc, current_rel->partial_pathlist)
        {
            Path       *partial_path = (Path *) lfirst(lc);

            add_partial_path(final_rel, partial_path);
        }
    }

Once the Assert above is changed to not test the commandType, the same
error occurs as before.

This appears to possibly be some kind of bug in which hasModifyingCTE
is not getting set, at least in this particular INSERT case, but in
the current Postgres code it doesn't matter because all INSERTs are
considered parallel-unsafe, so won't be executed in parallel-mode
anyway.
I notice that if I execute "SELECT * from t1" instead of "INSERT INTO
bug6051 SELECT * from t1", then "hasModifyingCTE" is getting set to
true for the query, and thus is always considered parallel-unsafe.


Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > And there seems to be another solution for this:
> >
> > In the patch, we only use { ii_Expressions, ii_NumIndexAttrs,
> > ii_IndexAttrNumbers } from the IndexInfo, which it seems we can get
> > from "Relation->rd_index".
> >
> > Based on the above, maybe we do not need to call BuildIndexInfo to
> > build the IndexInfo.
> > It can avoid some unnecessary cost.
> > And with this solution we do not need to remove expression_planner.
> >
> >
> 
> Hmmm, when I debugged my simple test case, I found rel->rd_index was NULL,
> so it seems that the call to BuildIndexInfo is needed.
> (have I understood your suggestion correctly?)
> 

Hi Greg,

Thanks for debugging this.

Maybe I missed something, but I am not sure about the case where rel->rd_index is NULL,
because the function BuildIndexInfo does not seem to have a NULL check for index->rd_index.
Like the following:
----
BuildIndexInfo(Relation index)
{
    IndexInfo  *ii;
    Form_pg_index indexStruct = index->rd_index;
    int            i;
    int            numAtts;

    /* check the number of keys, and copy attr numbers into the IndexInfo */
    numAtts = indexStruct->indnatts;
----

The patch does not have a NULL check for index->rd_index either,
so I thought we could assume index->rd_index is not NULL - but perhaps I have missed something?

Can you please share the test case with me?

I used the following code to replace the call to BuildIndexInfo,
and installcheck passed.

Example:
+                Form_pg_index indexStruct = index_rel->rd_index;
+                List *ii_Expressions = RelationGetIndexExpressions(index_rel);
+                int ii_NumIndexAttrs = indexStruct->indnatts;
+                AttrNumber      ii_IndexAttrNumbers[INDEX_MAX_KEYS];

+                for (int i = 0; i < ii_NumIndexAttrs; i++)
+                        ii_IndexAttrNumbers[i] = indexStruct->indkey.values[i];
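For what it's worth, the copying logic above can be sanity-checked outside the server with a tiny standalone C model. All the type and field names below are simplified stand-ins (not the real Form_pg_index or IndexInfo definitions), but the loop is the same shape as in the snippet:

```c
#include <assert.h>

#define INDEX_MAX_KEYS 32

/* Simplified stand-in for the cached pg_index tuple (hypothetical model,
 * not the real Form_pg_index). */
typedef struct IndexCatalogModel
{
    int         indnatts;               /* number of index attributes */
    int         indkey[INDEX_MAX_KEYS]; /* models indexStruct->indkey.values */
} IndexCatalogModel;

/* Copy the key column numbers straight out of the cached catalog tuple,
 * mirroring the loop that avoids calling BuildIndexInfo. */
static int
copy_index_attnums(const IndexCatalogModel *indexStruct,
                   int attnums[INDEX_MAX_KEYS])
{
    int         i;

    for (i = 0; i < indexStruct->indnatts; i++)
        attnums[i] = indexStruct->indkey[i];
    return indexStruct->indnatts;
}
```

A zero entry in the copied array would then indicate an expression column, just as keycol == 0 does in the patch.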

Best regards,
houzj





Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 6:21 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> Hi Greg,
>
> Thanks for debugging this.
>
> Maybe I missed something, but I am not sure about the case where rel->rd_index is NULL,
> because the function BuildIndexInfo does not seem to have a NULL check for index->rd_index.
> Like the following:
> ----
> BuildIndexInfo(Relation index)
> {
>         IndexInfo  *ii;
>         Form_pg_index indexStruct = index->rd_index;
>         int                     i;
>         int                     numAtts;
>
>         /* check the number of keys, and copy attr numbers into the IndexInfo */
>         numAtts = indexStruct->indnatts;
> ----
>
> The patch does not have a NULL check for index->rd_index either,
> so I thought we could assume index->rd_index is not NULL - but perhaps I have missed something?
>
> Can you please share the test case with me?
>
> I used the following code to replace the call to BuildIndexInfo,
> and installcheck passed.
>
> Example:
> +                Form_pg_index indexStruct = index_rel->rd_index;
> +                List *ii_Expressions = RelationGetIndexExpressions(index_rel);
> +                int ii_NumIndexAttrs = indexStruct->indnatts;
> +                AttrNumber      ii_IndexAttrNumbers[INDEX_MAX_KEYS];
>
> +                for (int i = 0; i < ii_NumIndexAttrs; i++)
> +                        ii_IndexAttrNumbers[i] = indexStruct->indkey.values[i];

Sorry, I was looking at rel->rd_index, not index_rel->rd_index - my fault.
Your code looks OK. I've taken it, reduced some of the lines, and
got rid of the C99-only intermingled variable declarations (see
https://www.postgresql.org/docs/13/source-conventions.html).
The changes are below.
The regression tests all pass, so it should be OK (my test case was taken
from the insert_parallel regression tests).
Thanks for your help.

-        Oid         index_oid = lfirst_oid(lc);
-        Relation    index_rel;
-        IndexInfo  *index_info;
+        Relation        index_rel;
+        Form_pg_index   indexStruct;
+        List            *ii_Expressions;
+        Oid             index_oid = lfirst_oid(lc);

         index_rel = index_open(index_oid, lockmode);

-        index_info = BuildIndexInfo(index_rel);
+        indexStruct = index_rel->rd_index;
+        ii_Expressions = RelationGetIndexExpressions(index_rel);

-        if (index_info->ii_Expressions != NIL)
+        if (ii_Expressions != NIL)
         {
             int         i;
-            ListCell   *index_expr_item =
list_head(index_info->ii_Expressions);
+            ListCell    *index_expr_item = list_head(ii_Expressions);

-            for (i = 0; i < index_info->ii_NumIndexAttrs; i++)
+            for (i = 0; i < indexStruct->indnatts; i++)
             {
-                int         keycol = index_info->ii_IndexAttrNumbers[i];
+                int         keycol = indexStruct->indkey.values[i];

                 if (keycol == 0)
                 {
@@ -912,7 +914,7 @@ index_expr_max_parallel_hazard_for_modify(Relation rel,
                         return true;
                     }

-                    index_expr_item =
lnext(index_info->ii_Expressions, index_expr_item);
+                    index_expr_item = lnext(ii_Expressions, index_expr_item);
                 }
             }


Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
Hello Greg-san,


Initially, some minor comments:


(1)
-     * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
-     * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
-     * backend writes into a completely new table.  In the future, we can
-     * extend it to allow workers to write into the table.  However, to allow
-     * parallel updates and deletes, we have to solve other problems,
-     * especially around combo CIDs.)
+     * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
+     * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
+     * of now, only the leader backend writes into a completely new table. In

This can read "In the INSERT INTO...SELECT case, like other existing cases, only the leader backend writes into a
completely new table."  The reality is that workers as well as the leader can write into an empty or non-empty
table in parallel, isn't it?


(2)
 /*
  * RELATION_IS_LOCAL
- *        If a rel is either temp or newly created in the current transaction,
- *        it can be assumed to be accessible only to the current backend.
- *        This is typically used to decide that we can skip acquiring locks.
+ *        If a rel is temp, it can be assumed to be accessible only to the
+ *        current backend. This is typically used to decide that we can
+ *        skip acquiring locks.
  *
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_LOCAL(relation) \
-    ((relation)->rd_islocaltemp || \
-     (relation)->rd_createSubid != InvalidSubTransactionId)
+    ((relation)->rd_islocaltemp)

How is this correct?  At least, this change would cause a transaction that creates a new relation to acquire an
unnecessary lock.  I'm not sure if that overhead is worth worrying about (perhaps not, I guess).  But can we still
check rd_createSubid in non-parallel mode?  If we adopt the above change, the comments at call sites need
modification - "new or temp relation" becomes "temp relation".


(3)
@@ -173,9 +175,11 @@ ExecSerializePlan(Plan *plan, EState *estate)
...
-    pstmt->commandType = CMD_SELECT;
+    Assert(estate->es_plannedstmt->commandType == CMD_SELECT ||
+           IsModifySupportedInParallelMode(estate->es_plannedstmt->commandType));
+    pstmt->commandType = IsA(plan, ModifyTable) ? castNode(ModifyTable, plan)->operation : CMD_SELECT;

The last line can just be as follows, according to the Assert():

+    pstmt->commandType = estate->es_plannedstmt->commandType;


(4)
@@ -1527,7 +1528,9 @@ ExecutePlan(EState *estate,
     estate->es_use_parallel_mode = use_parallel_mode;
     if (use_parallel_mode)
     {
-        PrepareParallelMode(estate->es_plannedstmt->commandType);
+        bool        isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate),
ModifyTableState);
+
+        PrepareParallelMode(estate->es_plannedstmt->commandType, isParallelModifyLeader);
         EnterParallelMode();
     }

@@ -1021,12 +1039,25 @@ IsInParallelMode(void)
  * Prepare for entering parallel mode, based on command-type.
  */
 void
-PrepareParallelMode(CmdType commandType)
+PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
 {
     if (IsModifySupportedInParallelMode(commandType))
     {
         Assert(!IsInParallelMode());
 
+        if (isParallelModifyLeader)
+        {
+            /*
+             * Set currentCommandIdUsed to true, to ensure that the current
+             * CommandId (which will be used by the parallel workers) won't
+             * change during this parallel operation, as starting new
+             * commands in parallel-mode is not currently supported.
+             * See related comments in GetCurrentCommandId and
+             * CommandCounterIncrement.
+             */
+            (void) GetCurrentCommandId(true);
+        }

I think we can eliminate the second argument of PrepareParallelMode() and the new code in ExecutePlan().
PrepareParallelMode() can use !IsParallelWorker() in the if condition, because the caller is either a would-be
parallel leader or a parallel worker.

BTW, why do we want to add PrepareParallelMode() separately from EnterParallelMode()?  Someone who will read other
call sites of EnterParallelMode() (index build, VACUUM) may be worried that the PrepareParallelMode() call is
missing there.  Can we just add an argument to EnterParallelMode()?  Other call sites can use CMD_UNKNOWN or
CMD_UTILITY, if we want to use CMD_XX.


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 7:52 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>
> (1)
> -        * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
> -        * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
> -        * backend writes into a completely new table.  In the future, we can
> -        * extend it to allow workers to write into the table.  However, to allow
> -        * parallel updates and deletes, we have to solve other problems,
> -        * especially around combo CIDs.)
> +        * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
> +        * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
> +        * of now, only the leader backend writes into a completely new table. In
>
> This can read "In the INSERT INTO...SELECT case, like other existing cases, only the leader backend writes into a
> completely new table."  The reality is that workers as well as the leader can write into an empty or non-empty
> table in parallel, isn't it?
>

Yes, you're right the wording is not right (and I don't really like
the wording used before the patch).

Perhaps it could say:

(Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
of now, other than in the case of INSERT INTO...SELECT, only the leader backend
writes into a completely new table. In the future, we can extend it to
allow workers for the
other commands to write into the table. However, to allow parallel
updates and deletes, we
have to solve other problems, especially around combo CIDs.)

Of course, this will need further updating when parallel CREATE TABLE
AS etc. is implemented ...

>
> (2)
>  /*
>   * RELATION_IS_LOCAL
> - *             If a rel is either temp or newly created in the current transaction,
> - *             it can be assumed to be accessible only to the current backend.
> - *             This is typically used to decide that we can skip acquiring locks.
> + *             If a rel is temp, it can be assumed to be accessible only to the
> + *             current backend. This is typically used to decide that we can
> + *             skip acquiring locks.
>   *
>   * Beware of multiple eval of argument
>   */
>  #define RELATION_IS_LOCAL(relation) \
> -       ((relation)->rd_islocaltemp || \
> -        (relation)->rd_createSubid != InvalidSubTransactionId)
> +       ((relation)->rd_islocaltemp)
>
> How is this correct?  At least, this change would cause a transaction that creates a new relation to acquire an
> unnecessary lock.  I'm not sure if that overhead is worth worrying about (perhaps not, I guess).  But can we still
> check rd_createSubid in non-parallel mode?  If we adopt the above change, the comments at call sites need
> modification - "new or temp relation" becomes "temp relation".
>

The problem is, with the introduction of parallel INSERT, it's no
longer the case that newly-created tables can't be accessed by anyone
else in the same transaction - now, a transaction can include parallel
workers, inserting into the table concurrently. Without changing that
macro, things fail with a very obscure message (e.g. ERROR:
unexpected data beyond EOF in block 5 of relation base/16384/16388)
and it takes days to debug what the cause of it is.
Maybe updating the macro to still check rd_createSubid in non-parallel
mode is a good idea - I'll need to try it.
Other than that, each and every usage of RELATION_IS_LOCAL would need
to be closely examined, to see if it could be within a parallel
INSERT.
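To make the "still check rd_createSubid in non-parallel mode" idea concrete, here is a minimal standalone sketch of what such a macro variant could look like, written as a function over simplified stand-in types (not the real Relation struct), and only under the assumption that a parallel-mode flag is available at the call site:

```c
#include <stdbool.h>

typedef unsigned int SubTransactionId;
#define InvalidSubTransactionId ((SubTransactionId) 0)

/* Minimal stand-in for the fields of Relation that the macro inspects. */
typedef struct RelationModel
{
    bool             rd_islocaltemp;
    SubTransactionId rd_createSubid;
} RelationModel;

/*
 * Sketch of the suggested middle ground: a newly created relation is
 * still treated as backend-local, but only when we are not in parallel
 * mode, since parallel workers may now be writing into it concurrently.
 */
static bool
relation_is_local(const RelationModel *rel, bool in_parallel_mode)
{
    if (rel->rd_islocaltemp)
        return true;
    return !in_parallel_mode &&
        rel->rd_createSubid != InvalidSubTransactionId;
}
```

This would keep the lock-skipping optimization for ordinary transactions while closing the hole that produced the "unexpected data beyond EOF" failures.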

>
> (3)
> @@ -173,9 +175,11 @@ ExecSerializePlan(Plan *plan, EState *estate)
> ...
> -       pstmt->commandType = CMD_SELECT;
> +       Assert(estate->es_plannedstmt->commandType == CMD_SELECT ||
> +                  IsModifySupportedInParallelMode(estate->es_plannedstmt->commandType));
> +       pstmt->commandType = IsA(plan, ModifyTable) ? castNode(ModifyTable, plan)->operation : CMD_SELECT;
>
> The last line can just be as follows, according to the Assert():
>
> +       pstmt->commandType = estate->es_plannedstmt->commandType;
>

No, that's not right. I did that originally and it failed in some
cases (try changing it and then run the regression tests and you'll
see).
The commandType of the es_plannedstmt might be CMD_INSERT but the one
in the plan might be CMD_SELECT (for the underlying SELECT).
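The distinction can be illustrated with a tiny standalone model (the enum and struct names are simplified stand-ins for the real node types, not the actual PostgreSQL definitions):

```c
typedef enum CmdTypeModel { CMD_SELECT_M, CMD_INSERT_M } CmdTypeModel;
typedef enum NodeTagModel { T_SEQSCAN_M, T_MODIFYTABLE_M } NodeTagModel;

/* Stand-in for the plan subtree handed to ExecSerializePlan(). */
typedef struct PlanModel
{
    NodeTagModel tag;
    CmdTypeModel operation;     /* meaningful only for ModifyTable */
} PlanModel;

/*
 * The command type recorded in the serialized plan depends on which
 * subtree the workers will run: a parallel INSERT ships the ModifyTable
 * node itself, while a "parallel SELECT part only" plan ships just the
 * scan, which must be marked CMD_SELECT even though the top-level
 * statement is an INSERT.
 */
static CmdTypeModel
serialized_command_type(const PlanModel *plan)
{
    return (plan->tag == T_MODIFYTABLE_M) ? plan->operation : CMD_SELECT_M;
}
```

So collapsing it to the statement-level commandType would mislabel the shipped subtree whenever only the SELECT part is parallelized.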

>
> (4)
> @@ -1527,7 +1528,9 @@ ExecutePlan(EState *estate,
>         estate->es_use_parallel_mode = use_parallel_mode;
>         if (use_parallel_mode)
>         {
> -               PrepareParallelMode(estate->es_plannedstmt->commandType);
> +               bool            isParallelModifyLeader = IsA(planstate, GatherState) &&
> IsA(outerPlanState(planstate), ModifyTableState);
> +
> +               PrepareParallelMode(estate->es_plannedstmt->commandType, isParallelModifyLeader);
>                 EnterParallelMode();
>         }
>
> @@ -1021,12 +1039,25 @@ IsInParallelMode(void)
>   * Prepare for entering parallel mode, based on command-type.
>   */
>  void
> -PrepareParallelMode(CmdType commandType)
> +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
>  {
>         if (IsModifySupportedInParallelMode(commandType))
>         {
>                 Assert(!IsInParallelMode());
>
> +               if (isParallelModifyLeader)
> +               {
> +                       /*
> +                        * Set currentCommandIdUsed to true, to ensure that the current
> +                        * CommandId (which will be used by the parallel workers) won't
> +                        * change during this parallel operation, as starting new
> +                        * commands in parallel-mode is not currently supported.
> +                        * See related comments in GetCurrentCommandId and
> +                        * CommandCounterIncrement.
> +                        */
> +                       (void) GetCurrentCommandId(true);
> +               }
>
> I think we can eliminate the second argument of PrepareParallelMode() and the new code in ExecutePlan().
> PrepareParallelMode() can use !IsParallelWorker() in the if condition, because the caller is either a would-be
> parallel leader or a parallel worker.

You could, but I'm not sure it would make the code easier to read,
especially for those who don't know that !IsParallelWorker() means it's a
parallel leader.

>
> BTW, why do we want to add PrepareParallelMode() separately from EnterParallelMode()?  Someone who will read
> other call sites of EnterParallelMode() (index build, VACUUM) may be worried that the PrepareParallelMode() call
> is missing there.  Can we just add an argument to EnterParallelMode()?  Other call sites can use CMD_UNKNOWN or
> CMD_UTILITY, if we want to use CMD_XX.
>

I really can't see a problem. PrepareParallelMode() is only needed
prior to execution of a parallel plan, so it's not needed for "other
call sites" using EnterParallelMode().
Perhaps the name can be changed to disassociate it from generic
EnterParallelMode() usage. So far, I've only thought of long names
like: PrepareParallelModePlanExec().
Ideas?

Regards,
Greg Nancarrow
Fujitsu Australia




RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
Hello Greg-san,


Second group of comments (I'll reply to (1) - (4) later):


(5)
@@ -790,7 +790,8 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
... 
-    if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
+    if ((plannedstmt->commandType != CMD_SELECT &&
+         !IsModifySupportedInParallelMode(plannedstmt->commandType)) || plannedstmt->hasModifyingCTE)
         PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
 }

Now that we're trying to allow parallel writes (INSERT), we should:

* use ExecCheckXactReadOnly() solely for checking read-only transactions, as the function name represents.  That is,
move the call to PreventCommandIfParallelMode() up to standard_ExecutorStart().

* Update the comment above the call to ExecCheckXactReadOnly().


(6)
@@ -764,6 +777,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
...
+    else
+    {
+        pei->processed_count = NULL;
+    }

The braces can be deleted.


(7)
@@ -1400,6 +1439,16 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
                                          true);
     queryDesc = ExecParallelGetQueryDesc(toc, receiver, instrument_options);
 
+    Assert(queryDesc->operation == CMD_SELECT || IsModifySupportedInParallelMode(queryDesc->operation));
+    if (IsModifySupportedInParallelMode(queryDesc->operation))
+    {
+        /*
+         * Record that the CurrentCommandId is used, at the start of the
+         * parallel operation.
+         */
+        SetCurrentCommandIdUsedForWorker();
+    }
+
     /* Setting debug_query_string for individual workers */
     debug_query_string = queryDesc->sourceText;

@@ -765,12 +779,16 @@ GetCurrentCommandId(bool used)
     if (used)
     {
         /*
-         * Forbid setting currentCommandIdUsed in a parallel worker, because
-         * we have no provision for communicating this back to the leader.  We
-         * could relax this restriction when currentCommandIdUsed was already
-         * true at the start of the parallel operation.
+         * If in a parallel worker, only allow setting currentCommandIdUsed if
+         * currentCommandIdUsed was already true at the start of the parallel
+         * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+         * setting currentCommandIdUsed because we have no provision for
+         * communicating this back to the leader. Once currentCommandIdUsed is
+         * set, the commandId used by leader and workers can't be changed,
+         * because CommandCounterIncrement() then prevents any attempted
+         * increment of the current commandId.
          */
-        Assert(!IsParallelWorker());
+        Assert(!(IsParallelWorker() && !currentCommandIdUsed));
         currentCommandIdUsed = true;
     }
     return currentCommandId;

What happens without these changes?  If this kind of change is really necessary, it seems more natural to pass
currentCommandIdUsed together with currentCommandId through SerializeTransactionState() and
StartParallelWorkerTransaction(), instead of the above changes.

As an aside, SetCurrentCommandIdUsed() in the comment should be SetCurrentCommandIdUsedForWorker().


(8)
+        /*
+         * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+         * the relation, and this would result in creation of new CommandIds
+         * on insert/update/delete and this isn't supported in a parallel
+         * worker (but is safe in the parallel leader).
+         */
+        trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+        if (trigtype == RI_TRIGGER_FK)
+        {
+            if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
+                return true;
+        }

Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs.
See RI_FKey_check(), which is called in the RI_TRIGGER_FK case.  In there, ri_PerformCheck() is called with the
detectNewRows argument set to false, which causes CommandCounterIncrement() to not be called.

Plus, tables that have RI_TRIGGER_PK should allow parallel INSERT in a parallel-safe manner, because those triggers
only fire for UPDATE and DELETE.  So, for the future parallel UPDATE/DELETE support, the above check should be
performed in the UPDATE and DELETE cases.

(In a data warehouse, fact tables, which store large amounts of historical data, typically have foreign keys to
smaller dimension tables.  Thus, it's important to allow parallel INSERTs on tables with foreign keys.)


Regards
Takayuki Tsunakawa


RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

After doing some tests to cover the code paths in patch 0001,
I have some suggestions for the 0002 test cases.


(1)
+            /* Check parallel-safety of any expressions in the partition key */
+            if (get_partition_col_attnum(pkey, i) == 0)
+            {
+                Node       *check_expr = (Node *) lfirst(partexprs_item);
+
+                if (max_parallel_hazard_walker(check_expr, context))
+                {
+                    table_close(rel, lockmode);
+                    return true;
+                }

The test cases do not seem to cover the above code (i.e. the case where the table has a parallel-unsafe expression
in the partition key).

Personally, I used the following SQL to cover this:
-----
create table partkey_unsafe_key_expr_t (a int4, b name) partition by range
((fullname_parallel_unsafe('',a::varchar)));
explain (costs off) insert into partkey_unsafe_key_expr_t select unique1, stringu1 from tenk1;
-----


(2)
I noticed that most of the test cases cover all three variants (parallel safe/unsafe/restricted),
but the index expression tests do not seem to cover parallel restricted.
How about adding a test case like:
-----
create or replace function fullname_parallel_restricted(f text, l text) returns text as $$
    begin
        return f || l;
    end;
$$ language plpgsql immutable parallel restricted;

create table names4(index int, first_name text, last_name text);
create index names4_fullname_idx on names4 (fullname_parallel_restricted(first_name, last_name));

--
-- Test INSERT with parallel-restricted index expression
-- (should create a parallel plan)
--
explain (costs off) insert into names4 select * from names;
-----

(3)
+        /* Recursively check each partition ... */
+        pdesc = RelationGetPartitionDesc(rel);
+        for (i = 0; i < pdesc->nparts; i++)
+        {
+            if (rel_max_parallel_hazard_for_modify(pdesc->oids[i],
+                                                        command_type,
+                                                        context,
+                                                        AccessShareLock))
+            {
+                table_close(rel, lockmode);
+                return true;
+            }
+        }

It seems we do not have a test case covering a parallel-unsafe expression (or similar) in a partition.
How about adding a test case with a parallel-unsafe partition?

Best regards,
houzj




Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:

On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> Hello Greg-san,
>
>
> Second group of comments (I'll reply to (1) - (4) later):
>
>
> (5)
> @@ -790,7 +790,8 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
> ...
> -       if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
> +       if ((plannedstmt->commandType != CMD_SELECT &&
> +                !IsModifySupportedInParallelMode(plannedstmt->commandType)) || plannedstmt->hasModifyingCTE)
>                 PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
>  }
>
> Now that we're trying to allow parallel writes (INSERT), we should:
>
> * use ExecCheckXactReadOnly() solely for checking read-only transactions, as the function name represents.  That is, move the call to PreventCommandIfParallelMode() up to standard_ExecutorStart().
>
> * Update the comment  above the call to ExecCheckXactReadOnly().
>
>

Hmmm, I'm not so sure. The patch changes just make the existing test for calling PreventCommandIfParallelMode() a bit more restrictive, to exclude the parallel INSERT case. The code previously wasn't checking only read-only transactions anyway, so it's not as if the patch has changed something fundamental in this function. And by moving the PreventCommandIfParallelMode() call to a higher level, you'd be changing the existing order of error-handling (ExecCheckXactReadOnly() calls PreventCommandIfReadOnly() based on a few other range-table conditions, prior to testing whether to call PreventCommandIfParallelMode()). I don't want to introduce a bug by making the change that you're suggesting.


> (6)
> @@ -764,6 +777,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
> ...
> +       else
> +       {
> +               pei->processed_count = NULL;
> +       }
>
> The braces can be deleted.
>

Yes, they can be deleted, and I guess I will, but for the record, I personally prefer the explicit braces, even if the block is just one line, because:
- if more code ever needs to be added to the else, you'll need to add braces anyway (and newbies might add extra lines tabbed in, thinking they're part of the else block ...).
- I think it looks better and is slightly easier to read, especially when there's a mix of cases with multiple code lines and single code lines.
Of course, these kinds of things could be debated forever, but I don't think it's such a big deal.

>
> (7)
> @@ -1400,6 +1439,16 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
>                                                                                  true);
>         queryDesc = ExecParallelGetQueryDesc(toc, receiver, instrument_options);
>
> +       Assert(queryDesc->operation == CMD_SELECT || IsModifySupportedInParallelMode(queryDesc->operation));
> +       if (IsModifySupportedInParallelMode(queryDesc->operation))
> +       {
> +               /*
> +                * Record that the CurrentCommandId is used, at the start of the
> +                * parallel operation.
> +                */
> +               SetCurrentCommandIdUsedForWorker();
> +       }
> +
>         /* Setting debug_query_string for individual workers */
>         debug_query_string = queryDesc->sourceText;
>
> @@ -765,12 +779,16 @@ GetCurrentCommandId(bool used)
>         if (used)
>         {
>                 /*
> -                * Forbid setting currentCommandIdUsed in a parallel worker, because
> -                * we have no provision for communicating this back to the leader.  We
> -                * could relax this restriction when currentCommandIdUsed was already
> -                * true at the start of the parallel operation.
> +                * If in a parallel worker, only allow setting currentCommandIdUsed if
> +                * currentCommandIdUsed was already true at the start of the parallel
> +                * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
> +                * setting currentCommandIdUsed because we have no provision for
> +                * communicating this back to the leader. Once currentCommandIdUsed is
> +                * set, the commandId used by leader and workers can't be changed,
> +                * because CommandCounterIncrement() then prevents any attempted
> +                * increment of the current commandId.
>                  */
> -               Assert(!IsParallelWorker());
> +               Assert(!(IsParallelWorker() && !currentCommandIdUsed));
>                 currentCommandIdUsed = true;
>         }
>         return currentCommandId;
>
> What happens without these changes?  

The change to the above comment explains why the change is needed.
Without these changes, a call in a parallel worker to GetCurrentCommandId(true) (signifying the intent to use the returned CommandId to mark inserted/updated/deleted tuples) will result in an Assert firing, because prior to the patch, currentCommandIdUsed was forbidden to be set in a parallel worker.
So it is clear that this cannot remain the same if we are to support parallel INSERT by workers.
So for each worker, the patch sets "currentCommandIdUsed" to true at the start of the parallel operation (using SetCurrentCommandIdUsedForWorker()), and the Assert condition in GetCurrentCommandId() is tweaked to fire if GetCurrentCommandId(true) is called in a parallel worker while currentCommandIdUsed is false.
To me, this makes perfect sense.


>If this kind of change is really necessary, it seems more natural to pass currentCommandIdUsed together with currentCommandId through SerializeTransactionState() and StartParallelWorkerTransaction(), instead of the above changes.

No, I don't agree with that. That approach doesn't sound right to me at all.
All the patch really changes is WHERE "currentCurrentIdUsed" can be set for a parallel worker - now it is only allowed to be set to true at the start of the parallel operation for each worker, and the Assert (which is just a sanity check) is updated to ensure that for workers, it can only be set true at that time. That's all it does. It's completely consistent with the old comment that said "We could relax this restriction when currentCommandIdUsed was already true at the start of the parallel operation" - that's what we are now doing with the patch.
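The relaxed rule can be sketched as a small standalone C model (the globals and function names are simplified stand-ins for the real transaction-state machinery, not the actual xact.c code):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified per-backend state; stand-ins for the real xact.c globals. */
static bool         is_parallel_worker = false;
static bool         current_command_id_used = false;
static unsigned int current_command_id = 5;

/* Models SetCurrentCommandIdUsedForWorker(): run at worker startup,
 * before the worker executes any of the plan. */
static void
set_command_id_used_for_worker(void)
{
    current_command_id_used = true;
}

/* Models the patched GetCurrentCommandId(): a worker may "use" the
 * command ID only if it was already marked used at the start of the
 * parallel operation. */
static unsigned int
get_current_command_id(bool used)
{
    if (used)
    {
        /* a worker must never be the first to mark the ID used */
        assert(!(is_parallel_worker && !current_command_id_used));
        current_command_id_used = true;
    }
    return current_command_id;
}
```

The invariant the Assert enforces is exactly the one the old comment anticipated: the relaxation is only safe because currentCommandIdUsed is guaranteed true before any worker-side use.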


> As an aside, SetCurrentCommandIdUsed() in the comment should be SetCurrentCommandIdUsedForWorker().
>

Thanks, I'll fix that in the comments.


>
> (8)
> +               /*
> +                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> +                * the relation, and this would result in creation of new CommandIds
> +                * on insert/update/delete and this isn't supported in a parallel
> +                * worker (but is safe in the parallel leader).
> +                */
> +               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> +               if (trigtype == RI_TRIGGER_FK)
> +               {
> +                       if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> +                               return true;
> +               }
>
> Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs.  See RI_FKey_check() which is called in RI_TRIGGER_FK case.  In there, ri_PerformCheck() is called with the detectNewRows argument set to false, which causes CommandCounterIncrement() to not be called.
>

Hmmm, I'm not sure that you have read and interpreted the patch code correctly.
The existence of an RI_TRIGGER_FK trigger indicates that the table has a foreign key, and an insert into such a table will generate a new CommandId (so we must avoid that, as we don't currently have the technology to support sharing of new command IDs across the participants in the parallel operation). This is what the code comment says; it does not say that such a trigger generates a new command ID.


In addition, the 2nd patch has an explicit test case for this (testing insert into a table that has a FK).

If you have a test case that breaks the existing patch, please let me know.


Regards,
Greg Nancarrow
Fujitsu Australia


RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Greg Nancarrow <gregn4422@gmail.com>
> > (1)
> Yes, you're right the wording is not right (and I don't really like
> the wording used before the patch).
> 
> Perhaps it could say:
> 
> (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
> INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
> of now, other than in the case of INSERT INTO...SELECT, only the leader
> backend
> writes into a completely new table. In the future, we can extend it to
> allow workers for the
> other commands to write into the table. However, to allow parallel
> updates and deletes, we
> have to solve other problems, especially around combo CIDs.)

That looks good to me, thanks.


> > (4)
> You could, but I'm not sure it would make the code easier to read,
> especially for those who don't know !isParallelWorker() means it's a
> parallel leader.
...
> I really can't see a problem. PrepareParallelMode() is only needed
> prior to execution of a parallel plan, so it's not needed for "other
> call sites" using EnterParallelMode().
My frank first impressions were (and are):

* Why do we have to call a separate function for preparation when the actual entering follows immediately?  We can do the necessary preparation in the entering function.
 

* Those who read the parallel index build and parallel VACUUM code for the first time might be startled at the missing PrepareParallelMode() call: "Oh, EnterParallelMode() is called without preparation, unlike the other site I saw the other day. Isn't this a bug?"
 


> Perhaps the name can be changed to disassociate it from generic
> EnterParallelMode() usage. So far, I've only thought of long names
> like: PrepareParallelModePlanExec().
> Ideas?

What PrepareParallelMode() handles is the XID and command ID, which are managed by the access/transam/ module and are not executor-specific. It's natural (or at least not unnatural) that EnterParallelMode() prepares them, because EnterParallelMode() is part of access/transam/.
 


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:

On Mon, Jan 25, 2021 at 2:22 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Hi,

After doing some test to cover the code path in the PATCH 0001.
I have some suggestions for the 0002 testcase.


(1)
+                       /* Check parallel-safety of any expressions in the partition key */
+                       if (get_partition_col_attnum(pkey, i) == 0)
+                       {
+                               Node       *check_expr = (Node *) lfirst(partexprs_item);
+
+                               if (max_parallel_hazard_walker(check_expr, context))
+                               {
+                                       table_close(rel, lockmode);
+                                       return true;
+                               }

The test cases do not seem to cover the above code (i.e. when the table has a parallel-unsafe expression in the partition key).

Personally, I use the following sql to cover this:
-----
create table partkey_unsafe_key_expr_t (a int4, b name) partition by range ((fullname_parallel_unsafe('',a::varchar)));
explain (costs off) insert into partkey_unsafe_key_expr_t select unique1, stringu1 from tenk1;
-----


Thanks. It looks like that test case was accidentally missed (since the comment said to test the index expressions, but it actually tested the support functions).
I'll update the test code (and comments) accordingly, using your suggestion. 
 

(2)
I noticed that most of the test cases cover all three cases (parallel safe/unsafe/restricted),
but the index expression tests seem to be missing the parallel-restricted case.
How about adding a test case like:
-----
create or replace function fullname_parallel_restricted(f text, l text) returns text as $$
    begin
        return f || l;
    end;
$$ language plpgsql immutable parallel restricted;

create table names4(index int, first_name text, last_name text);
create index names4_fullname_idx on names4 (fullname_parallel_restricted(first_name, last_name));

--
-- Test INSERT with parallel-restricted index expression
-- (should create a parallel plan)
--
explain (costs off) insert into names4 select * from names;
-----


Thanks, looks like that test case is missing, I'll add it as you suggest.
 
(3)
+               /* Recursively check each partition ... */
+               pdesc = RelationGetPartitionDesc(rel);
+               for (i = 0; i < pdesc->nparts; i++)
+               {
+                       if (rel_max_parallel_hazard_for_modify(pdesc->oids[i],
+                                                                                                               command_type,
+                                                                                                               context,
+                                                                                                               AccessShareLock))
+                       {
+                               table_close(rel, lockmode);
+                               return true;
+                       }
+               }

It seems we do not have a test case covering a parallel-unsafe expression (or similar) in a partition.
How about adding one test case for a parallel-unsafe partition?



OK. I'd have to create a more complex table to test those other potential parallel-safety issues of partitions (beyond what is tested before the recursive call, or the support functions and expressions in the index key), but since it's a recursive call, invoking code that has already been tested, I would not anticipate any problems.


Thanks,

Greg Nancarrow
Fujitsu Australia


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Jan 25, 2021 at 4:37 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>
> > > (4)
> > You could, but I'm not sure it would make the code easier to read,
> > especially for those who don't know !isParallelWorker() means it's a
> > parallel leader.
> ...
> > I really can't see a problem. PrepareParallelMode() is only needed
> > prior to execution of a parallel plan, so it's not needed for "other
> > call sites" using EnterParallelMode().
> My frank first impressions were (and are):
>
> * Why do we have to call a separate function for preparation when the actual entering follows immediately?  We can do the necessary preparation in the entering function.
>
> * Those who read the parallel index build and parallel VACUUM code for the first time might be startled at the missing PrepareParallelMode() call: "Oh, EnterParallelMode() is called without preparation, unlike the other site I saw the other day.  Isn't this a bug?"
>
>
> > Perhaps the name can be changed to disassociate it from generic
> > EnterParallelMode() usage. So far, I've only thought of long names
> > like: PrepareParallelModePlanExec().
> > Ideas?
>
> What PrepareParallelMode() handles is the XID and command ID, which are managed by the access/transam/ module and are not executor-specific. It's natural (or at least not unnatural) that EnterParallelMode() prepares them, because EnterParallelMode() is part of access/transam/.
>
>

EnterParallelMode() is part of a generic interface for execution of a
parallel operation, and EnterParallelMode() is called in several
different places to enter parallel mode prior to execution of
different parallel operations. At the moment it is assumed that
EnterParallelMode() just essentially sets a flag to prohibit certain
unsafe operations when doing the parallel operation. If I moved
PrepareParallelMode() into EnterParallelMode(), I would need to pass
in contextual information to distinguish who the caller is, possibly
along with extra information needed by that caller - and change the
function call for each caller, probably update the comments for each,
and so on. I think doing this just complicates things. The other callers
of EnterParallelMode() are obviously currently doing their own "pre"
parallel-mode code themselves, specific to whatever parallel operation
they are doing - but nobody has thought it necessary to have to hook
this code into EnterParallelMode().
I think the "PrepareParallelMode()" name can just be changed to
something specific to plan execution, so nobody gets confused with a
name like "PrepareParallelMode()", which as you point out sounds
generic to all callers of EnterParallelMode().
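For reference, EnterParallelMode() itself really is minimal - roughly the following (paraphrased from access/transam/xact.c; exact details may differ by version):

```c
/* Paraphrased sketch of EnterParallelMode() - it essentially just
 * increments a counter on the current transaction state, which is what
 * makes the parallel-mode restrictions take effect.
 */
void
EnterParallelMode(void)
{
    TransactionState s = CurrentTransactionState;

    Assert(s->parallelModeLevel >= 0);

    s->parallelModeLevel++;
}
```

which is why any caller-specific preparation currently lives outside it.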


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 7:52 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>
> (1)
> -        * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
> -        * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
> -        * backend writes into a completely new table.  In the future, we can
> -        * extend it to allow workers to write into the table.  However, to allow
> -        * parallel updates and deletes, we have to solve other problems,
> -        * especially around combo CIDs.)
> +        * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
> +        * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
> +        * of now, only the leader backend writes into a completely new table. In
>
> This can read "In the INSERT INTO...SELECT case, like other existing cases, only the leader backend writes into a completely new table."  The reality is that workers as well as the leader can write into an empty or non-empty table in parallel, isn't it?
>
>

Sorry, I've just realized that this is in reference to the 1st patch
(v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch),
which implements parallel SELECT for INSERT.
In that case, data is SELECTed in parallel by the workers, but only
INSERTed by the parallel leader.
So the patch comment is, in fact, correct.
In the 3rd patch
(v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch),
which implements parallel INSERT, the wording for this comment is
again altered, to reflect the fact that parallel workers also write
into the table.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

When reading the code of rel_max_parallel_hazard_for_modify in 0001.

I noticed there are many places that call table_close(), which I find
a little confusing.

Do you think it would be better to do the table_open/close outside of rel_max_parallel_hazard_for_modify()?

Like:

static bool rel_max_parallel_hazard_for_modify(Relation rel,
                                               CmdType command_type,
                                               max_parallel_hazard_context *context);
...
        Relation relation = table_open(rte->relid, NoLock);
        (void) rel_max_parallel_hazard_for_modify(relation, parse->commandType, &context);
        table_close(relation, NoLock);


And with the above definition, we don't seem to need the lockmode parameter.


Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Jan 25, 2021 at 10:40 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> When reading the code of rel_max_parallel_hazard_for_modify in 0001.
>
> I noticed there are many places that call table_close(), which I find
> a little confusing.
>
> Do you think it would be better to do the table_open/close outside of rel_max_parallel_hazard_for_modify()?
>
> Like:
>
> static bool rel_max_parallel_hazard_for_modify(Relation rel,
>                                                CmdType command_type,
>                                                max_parallel_hazard_context *context);
> ...
>         Relation relation = table_open(rte->relid, NoLock);
>         (void) rel_max_parallel_hazard_for_modify(relation, parse->commandType, &context);
>         table_close(relation, NoLock);
>
>
> And with the above definition, we don't seem to need the lockmode parameter.
>
>

Yeah, the repeated cleanup at the point of return is a bit ugly.
It could be solved by changing the function to do cleanup at a common
return point, but I agree with you that in this case it could simply
be done outside the function.
Thanks, I'll make that change.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

I have an issue of the check about column default expressions.

+    if (command_type == CMD_INSERT)
+    {
+        /*
+         * Column default expressions for columns in the target-list are
+         * already being checked for parallel-safety in the
+         * max_parallel_hazard() scan of the query tree in standard_planner().
+         */
+
+        tupdesc = RelationGetDescr(rel);
+        for (attnum = 0; attnum < tupdesc->natts; attnum++)


IMO, max_parallel_hazard() only checks the parent table's default expressions. But if the table has partitions, and a partition has its own default expressions, max_parallel_hazard() does not seem to check those.

And we do not seem to check that either.

I am not sure - should we allow parallel insert in this case?

Example:

-------------------------
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;

create table origin(a int);
insert into origin values(generate_series(1,5000));

create or replace function bdefault_unsafe () returns int language plpgsql parallel unsafe as $$ begin
    RETURN 5;
end $$;

create table parttable1 (a int, b name) partition by range (a); create table parttable1_1 partition of parttable1 for
valuesfrom (0) to (5000); create table parttable1_2 partition of parttable1 for values from (5000) to (10000);
 

alter table parttable1_1 ALTER COLUMN b SET DEFAULT bdefault_unsafe();

postgres=# explain insert into parttable1 select * from origin ;
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Gather  (cost=0.00..41.92 rows=5865 width=0)
   Workers Planned: 3
   ->  Insert on parttable1  (cost=0.00..41.92 rows=0 width=0)
         ->  Parallel Seq Scan on origin  (cost=0.00..41.92 rows=1892 width=68)
(4 rows)

postgres=# explain insert into parttable1_1 select * from origin ;
                            QUERY PLAN                             
-------------------------------------------------------------------
 Insert on parttable1_1  (cost=0.00..1348.00 rows=0 width=0)
   ->  Seq Scan on origin  (cost=0.00..1348.00 rows=5000 width=68)
(2 rows)

-------------------------

Best regards,
houzj



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
> IMO, max_parallel_hazard() only checks the parent table's default expressions.
> But if the table has partitions, and a partition has its own default expressions,
> max_parallel_hazard() does not seem to check those.
> And we do not seem to check that either.
> 
> I am not sure - should we allow parallel insert in this case?

I think we can allow parallel insert in this case, because the column value is determined according to the DEFAULT definition of the target table specified in the INSERT statement.  This is described here:

https://www.postgresql.org/docs/devel/sql-createtable.html

"Defaults may be specified separately for each partition. But note that a partition's default value is not applied when inserting a tuple through a partitioned table."

So the parallel-unsafe function should not be called.


Regards
Takayuki Tsunakawa


RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> I think we can allow parallel insert in this case, because the column value
> is determined according to the DEFAULT definition of the target table
> specified in the INSERT statement.  This is described here:
> 
> https://www.postgresql.org/docs/devel/sql-createtable.html
> 
> "Defaults may be specified separately for each partition. But note that
> a partition's default value is not applied when inserting a tuple through
> a partitioned table."
> 
> So the parallel-unsafe function should not be called.

Thanks for the explanation.
I think you are right, I did miss it.

Best regards,
houzj



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

When testing the patch with the following kind of sql.

---
Insert into part_table select 1;
Insert into part_table select generate_series(1,10000,1);
Insert into part_table select * from testfunc();
---

We usually use these kinds of SQL statements to initialize a table or for testing purposes.

Personally I think we do not need to do the parallel-safety check for these cases,
because there seems to be no chance for the select part to be considered for parallelism.

I thought we aimed to not check parallel-safety unless parallelism is possible,
so I was wondering: is it possible to avoid the check in these cases?

I did a quick check of the code. An immature idea is to check whether there is an RTE_RELATION in the query;
if not, we skip the safety check.

I am not sure whether it is worth doing - any thoughts?

Best regards,
Houzj




Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Jan 27, 2021 at 2:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> When testing the patch with the following kind of sql.
>
> ---
> Insert into part_table select 1;
> Insert into part_table select generate_series(1,10000,1);
> Insert into part_table select * from testfunc();
> ---
>
> We usually use these kinds of SQL statements to initialize a table or for testing purposes.
>
> Personally I think we do not need to do the parallel-safety check for these cases,
> because there seems to be no chance for the select part to be considered for parallelism.
>
> I thought we aimed to not check parallel-safety unless parallelism is possible,
> so I was wondering: is it possible to avoid the check in these cases?
>
> I did a quick check of the code. An immature idea is to check whether there is an RTE_RELATION in the query;
> if not, we skip the safety check.
>
> I am not sure whether it is worth doing - any thoughts?
>

Yes, I think it's worth it. It's surprising that there aren't really
any optimizations for these with just the current Postgres parallel
SELECT functionality (as there's currently no way to divide the work
for these amongst the workers, even if the function/expression is
parallel-safe).
For the additional parallel-safety checks for INSERT, currently we
check that RTE_SUBQUERY is in the range-table. So I think we can
additionally check that RTE_RELATION is in the subquery range-table
(otherwise treat it as unsafe).
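To sketch the idea with the kinds of statements discussed above (part_table, testfunc, and some_other_table are the hypothetical names from the earlier examples):

```sql
-- No RTE_RELATION in the subquery's range table: the SELECT part cannot
-- currently be parallelized, so the extra INSERT parallel-safety checks
-- could be skipped entirely.
insert into part_table select 1;
insert into part_table select generate_series(1, 10000, 1);
insert into part_table select * from testfunc();

-- An RTE_RELATION is present in the subquery's range table, so the
-- parallel-safety checks would still be performed.
insert into part_table select * from some_other_table;
```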

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Thanks for the feedback.
Posting an updated set of patches. Changes are based on feedback, as
detailed below:

[Hou]
- Deallocate list returned from RelationGetIndexList() using
list_free() after finished using list
- Regard foreign and temporary tables as parallel-restricted (instead
of parallel unsafe) for Insert
- pfree() conbin returned from TextDatumGetCString() after finished using it
- Make parallel-safety checks of index expressions more efficient,
remove some redundant function calls
- Add a few more test cases to cover certain untested parallel-safety
check cases
- Remove repeated table_close() on return, by moving table_open() &
table_close() to a higher level
- Reduce Insert parallel-safety checks required for some SQL, by
noting that the subquery must operate on a relation (check for
RTE_RELATION in subquery range-table)

[Zhihong Yu]
- Minor change to patch comment
- Wrap long line
- Remove intermediate local variable

[Tsunakawa-san]
- Update RELATION_IS_LOCAL macro to reinstate previously-removed check
on the relation being newly created in the current transaction (and so
assumed accessible only to the current backend), but for
non-parallel-mode only (since now it may be accessible to parallel
workers)
- Remove braces for one-line else
- Fix code comment
- Rename PrepareParallelMode() for plan execution, so that it's not
misinterpreted as a general function for preparation of parallel-mode

[Misc]
- Fix bug in query re-writer - hasModifyingCTE is not set in
re-written non-SELECT queries having a CTE


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
Hi Greg,

Recently, I was keeping evaluating performance of this patch(1/28 V13).
Here I find a regression test case which is parallel insert with bitmap heap scan.
When the target table has a primary key or index, the patched performance shows a 7%-19% decline compared to unpatched.

Could you please have a look about this?

I tried max_parallel_workers_per_gather=2/4/8, and I didn't tune other parameters (like GUCs or other parameters that force parallelism).

1. max_parallel_workers_per_gather=2(default)
target_table        patched       master      %reg
------------------------------------------------------
without_PK_index    83.683        142.183    -41%
with_PK             382.824       321.101    19%
with_index          372.682       324.246    15%

2. max_parallel_workers_per_gather=4
target_table        patched       master      %reg
------------------------------------------------------
without_PK_index    73.189        141.879     -48%
with_PK             362.104       329.759     10%
with_index          372.237       333.718     12%

3. max_parallel_workers_per_gather=8 (also set max_parallel_workers=16, max_worker_processes = 16)
target_table        patched       master      %reg
------------------------------------------------------
without_PK_index    75.072        146.100     -49%
with_PK             365.312       324.339     13%
with_index          362.636       338.366     7%

Attached test_bitmap.sql, which includes my test data and SQL, if you want to have a look.

Regards,
Tang




Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

When developing the reloption patch, I noticed some issues in the patch.

1).
> - Reduce Insert parallel-safety checks required for some SQL, by noting
> that the subquery must operate on a relation (check for RTE_RELATION in
> subquery range-table)

+            foreach(lcSub, rte->subquery->rtable)
+            {
+                rteSub = lfirst_node(RangeTblEntry, lcSub);
+                if (rteSub->rtekind == RTE_RELATION)
+                {
+                    hasSubQueryOnRelation = true;
+                    break;
+                }
+            }
It seems we cannot just search for RTE_RELATION in rtable,
because RTE_RELATION may exist in other places, such as:

---
--** explain insert into target select (select * from test);
    Subplan's subplan

--** with cte as (select * from test) insert into target select * from cte;
    In query's ctelist.
---

Maybe we should use a walker function [1] to
search the subquery and ctelist.

2).

+--
+-- Test INSERT into temporary table with underlying query.
+-- (should not use a parallel plan)
+--

Maybe the comment here needs some change, since
we currently support a parallel plan for temp tables.

3)
Do you think we could add a test case for foreign tables,
to test a parallel query with serial insert on a foreign table?


[1]
static bool
relation_walker(Node *node)
{
    if (node == NULL)
        return false;

    else if (IsA(node, RangeTblEntry))
    {
        RangeTblEntry *rte = (RangeTblEntry *) node;
        if (rte->rtekind == RTE_RELATION)
            return true;
        
        return false;
    }

    else if (IsA(node, Query))
    {
        Query       *query = (Query *) node;

        /* Recurse into subselects */
        return query_tree_walker(query, relation_walker,
                                 NULL, QTW_EXAMINE_RTES_BEFORE);
    }

    /* Recurse to check arguments */
    return expression_tree_walker(node,
                                  relation_walker,
                                  NULL);
}


Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
>
> When developing the reloption patch, I noticed some issues in the patch.
>
> 1).
> > - Reduce Insert parallel-safety checks required for some SQL, by noting
> > that the subquery must operate on a relation (check for RTE_RELATION in
> > subquery range-table)
>
> +                       foreach(lcSub, rte->subquery->rtable)
> +                       {
> +                               rteSub = lfirst_node(RangeTblEntry, lcSub);
> +                               if (rteSub->rtekind == RTE_RELATION)
> +                               {
> +                                       hasSubQueryOnRelation = true;
> +                                       break;
> +                               }
> +                       }
> It seems we cannot just search for RTE_RELATION in rtable,
> because RTE_RELATION may exist in other places, such as:
>
> ---
> --** explain insert into target select (select * from test);
>         Subplan's subplan
>
> --** with cte as (select * from test) insert into target select * from cte;
>         In query's ctelist.
> ---
>
> Maybe we should use a walker function [1] to
> search the subquery and ctelist.
>

Yes, the current checks are too simple, as you point out, there seem
to be more complex cases that it doesn't pick up. Unfortunately
expanding the testing for them does detract from the original
intention of this code (which was to avoid extra parallel-safety check
processing on code which can't be run in parallel). I guess the
relation walker function should additionally check for SELECT queries
only (commandType == CMD_SELECT), and exclude SELECT FOR UPDATE/SHARE
(rowMarks != NIL) too. I'll need to look further into it, but will
certainly update the code for the next version of the patch.
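A rough sketch (untested; built on the walker from [1], and assuming the usual PostgreSQL backend headers) of how those extra conditions might be folded in:

```c
/* Sketch only - not a drop-in implementation. Returns true if the node
 * tree contains a relation, or anything we conservatively treat as if a
 * relation were present (so the parallel-safety checks still run).
 */
static bool
relation_walker(Node *node)
{
    if (node == NULL)
        return false;

    if (IsA(node, RangeTblEntry))
    {
        RangeTblEntry *rte = (RangeTblEntry *) node;

        return (rte->rtekind == RTE_RELATION);
    }

    if (IsA(node, Query))
    {
        Query *query = (Query *) node;

        /* Only plain SELECT queries qualify; conservatively treat
         * anything else, including SELECT FOR UPDATE/SHARE, as unsafe
         * to skip.
         */
        if (query->commandType != CMD_SELECT || query->rowMarks != NIL)
            return true;

        /* Recurse into subselects */
        return query_tree_walker(query, relation_walker,
                                 NULL, QTW_EXAMINE_RTES_BEFORE);
    }

    /* Recurse to check arguments */
    return expression_tree_walker(node, relation_walker, NULL);
}
```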

> 2).
>
> +--
> +-- Test INSERT into temporary table with underlying query.
> +-- (should not use a parallel plan)
> +--
>
> Maybe the comment here needs some change, since
> we currently support a parallel plan for temp tables.
>

Thanks, it should say something like "should create the plan with
INSERT + parallel SELECT".

> 3)
> Do you think we could add a test case for foreign tables,
> to test a parallel query with serial insert on a foreign table?
>

I had intended to do it, but as a lower-priority task.

>
> [1]
> static bool
> relation_walker(Node *node)
> {
>         if (node == NULL)
>                 return false;
>
>         else if (IsA(node, RangeTblEntry))
>         {
>                 RangeTblEntry *rte = (RangeTblEntry *) node;
>                 if (rte->rtekind == RTE_RELATION)
>                         return true;
>
>                 return false;
>         }
>
>         else if (IsA(node, Query))
>         {
>                 Query      *query = (Query *) node;
>
>                 /* Recurse into subselects */
>                 return query_tree_walker(query, relation_walker,
>                                                                  NULL, QTW_EXAMINE_RTES_BEFORE);
>         }
>
>         /* Recurse to check arguments */
>         return expression_tree_walker(node,
>                                                                   relation_walker,
>                                                                   NULL);
> }
>

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> When developing the reloption patch, I noticed some issues in the patch.
>
> 1).
> > - Reduce Insert parallel-safety checks required for some SQL, by noting
> > that the subquery must operate on a relation (check for RTE_RELATION in
> > subquery range-table)
>
> +                       foreach(lcSub, rte->subquery->rtable)
> +                       {
> +                               rteSub = lfirst_node(RangeTblEntry, lcSub);
> +                               if (rteSub->rtekind == RTE_RELATION)
> +                               {
> +                                       hasSubQueryOnRelation = true;
> +                                       break;
> +                               }
> +                       }
> It seems we cannot just search for RTE_RELATION in rtable,
> because RTE_RELATION may exist in other places, such as:
>
> ---
> --** explain insert into target select (select * from test);
>         Subplan's subplan
>
> --** with cte as (select * from test) insert into target select * from cte;
>         In query's ctelist.
> ---
>
> Maybe we should use a walker function [1] to
> search the subquery and ctelist.
>
>
>
> [1]
> static bool
> relation_walker(Node *node)
> {
>         if (node == NULL)
>                 return false;
>
>         else if (IsA(node, RangeTblEntry))
>         {
>                 RangeTblEntry *rte = (RangeTblEntry *) node;
>                 if (rte->rtekind == RTE_RELATION)
>                         return true;
>
>                 return false;
>         }
>
>         else if (IsA(node, Query))
>         {
>                 Query      *query = (Query *) node;
>
>                 /* Recurse into subselects */
>                 return query_tree_walker(query, relation_walker,
>                                                                  NULL, QTW_EXAMINE_RTES_BEFORE);
>         }
>
>         /* Recurse to check arguments */
>         return expression_tree_walker(node,
>                                                                   relation_walker,
>                                                                   NULL);
> }
>

I've had a further look at this, and this walker function is doing a
lot of work recursing the parse tree, and I'm not sure that it
reliably retrieves the information that we're looking for, for all
cases of different SQL queries. Unless it can be made much more
efficient and specific to our needs, I think we should not try to do
this optimization, because there's too much overhead. Also, keep in
mind that for the current parallel SELECT functionality in Postgres, I
don't see any similar optimization being attempted (and such
optimization should be attempted at the SELECT level). So I don't
think we should be attempting such optimization in this patch (but
could be attempted in a separate patch, just related to current
parallel SELECT functionality).

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> 
> I've had a further look at this, and this walker function is doing a lot
> of work recursing the parse tree, and I'm not sure that it reliably retrieves
> the information that we're looking for, for all cases of different SQL
> queries. Unless it can be made much more efficient and specific to our needs,
> I think we should not try to do this optimization, because there's too much
> overhead. Also, keep in mind that for the current parallel SELECT
> functionality in Postgres, I don't see any similar optimization being
> attempted (and such optimization should be attempted at the SELECT level).
> So I don't think we should be attempting such optimization in this patch
> (but could be attempted in a separate patch, just related to current parallel
> SELECT functionality).

Yes, agreed.
I was worried about the overhead it may bring too;
we can remove this from the current patch.

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Feb 2, 2021 at 7:26 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> >
>
> Yes, I agree.
> I was worried about the overhead it may bring too;
> we can remove this from the current patch.
>

Posting an updated set of patches. Changes are based on feedback, as
detailed below:

[Hou]
- Corrected the code that checks for an underlying query on a
relation (we must at least check for an underlying query, to rule out
VALUES). More complex checks were not used, due to their overhead.
- Fixed a comment in the tests for INSERT on a temp table.
[Antonin]
- Moved the Gather node targetlist fix-up code to set_plan_refs() and
updated the comment.
- Added extra tests for INSERT with a RETURNING clause.

Hou: the parallel_dml patches will need slight rebasing

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Amit Kapila <amit.kapila16@gmail.com>
> On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
> <tanghy.fnst@cn.fujitsu.com> wrote:
> > Execute EXPLAIN on Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> >                                                        QUERY PLAN
> >
> ---------------------------------------------------------------------------
> ---------------------------------------------
> >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=44.139..44.140 rows=0 loops=1)
> >    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> >          Output: test_data1.a, test_data1.b
> >          Buffers: shared hit=5
> >  Planning:
> >    Buffers: shared hit=27011
> >  Planning Time: 24.526 ms
> >  Execution Time: 44.981 ms
> >
> > Execute EXPLAIN on non-Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> >                                                        QUERY PLAN
> >
> ---------------------------------------------------------------------------
> ---------------------------------------------
> >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=72.656..72.657 rows=0 loops=1)
> >    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> >          Output: test_data1.a, test_data1.b
> >          Buffers: shared hit=5
> >  Planning:
> >    Buffers: shared hit=72
> >  Planning Time: 0.135 ms
> >  Execution Time: 79.058 ms
> >
> 
> So, the results indicate that after the patch we touch more buffers
> during planning which I think is because of accessing the partition
> information, and during execution, the patch touches fewer buffers for
> the same reason. But why this can reduce the time with patch? I think
> this needs some investigation.

I guess another factor other than shared buffers is the relcache and catcache.  The patched version loads those cached
entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched
version has to gradually build those cache entries during execution.  How can we confirm its effect?


Regards
Takayuki Tsunakawa



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Feb 4, 2021 at 6:26 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>         From: Amit Kapila <amit.kapila16@gmail.com>
> > On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
> > <tanghy.fnst@cn.fujitsu.com> wrote:
> > > Execute EXPLAIN on Patched:
> > > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> > select * from test_data1;
> > >                                                        QUERY PLAN
> > >
> > ---------------------------------------------------------------------------
> > ---------------------------------------------
> > >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> > time=44.139..44.140 rows=0 loops=1)
> > >    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> > >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> > width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> > >          Output: test_data1.a, test_data1.b
> > >          Buffers: shared hit=5
> > >  Planning:
> > >    Buffers: shared hit=27011
> > >  Planning Time: 24.526 ms
> > >  Execution Time: 44.981 ms
> > >
> > > Execute EXPLAIN on non-Patched:
> > > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> > select * from test_data1;
> > >                                                        QUERY PLAN
> > >
> > ---------------------------------------------------------------------------
> > ---------------------------------------------
> > >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> > time=72.656..72.657 rows=0 loops=1)
> > >    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> > >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> > width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> > >          Output: test_data1.a, test_data1.b
> > >          Buffers: shared hit=5
> > >  Planning:
> > >    Buffers: shared hit=72
> > >  Planning Time: 0.135 ms
> > >  Execution Time: 79.058 ms
> > >
> >
> > So, the results indicate that after the patch we touch more buffers
> > during planning which I think is because of accessing the partition
> > information, and during execution, the patch touches fewer buffers for
> > the same reason. But why this can reduce the time with patch? I think
> > this needs some investigation.
>
> I guess another factor other than shared buffers is the relcache and catcache.  The patched version loads those cached
> entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched
> version has to gradually build those cache entries during execution.
>

Right.

>  How can we confirm its effect?
>

I am not sure, but if your theory is correct, then shouldn't both
have the same performance in consecutive runs?
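
For example, something along these lines (a sketch only; the table names
are the ones from Tang's test upthread) could check that:

```sql
-- Sketch only: run the identical INSERT twice in one session.  If the
-- relcache/catcache warm-up theory is right, the second run's Execution
-- Time should be similar for the patched and unpatched builds.
EXPLAIN (ANALYZE, BUFFERS) INSERT INTO test_part SELECT * FROM test_data1;
EXPLAIN (ANALYZE, BUFFERS) INSERT INTO test_part SELECT * FROM test_data1;
```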


--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Feb 4, 2021 at 11:56 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> >
> > So, the results indicate that after the patch we touch more buffers
> > during planning which I think is because of accessing the partition
> > information, and during execution, the patch touches fewer buffers for
> > the same reason. But why this can reduce the time with patch? I think
> > this needs some investigation.
>
> I guess another factor other than shared buffers is the relcache and catcache.  The patched version loads those cached
> entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched
> version has to gradually build those cache entries during execution.  How can we confirm its effect?
>

I believe that we can confirm its effect by invalidating relcache and
catcache, in both the patched and unpatched versions, just after the
parallel-safety checks are performed in the planner, and then running
tests and comparing the performance.

So that's exactly what I did (adding a call to
InvalidateSystemCaches() just after the parallel-safety checks in the
planner).
I found that the unpatched version then always performed better than
the patched version for tests inserting 1000 records into a table with
100, 200, 500, and 1000 partitions.
Looking at the breakdown of the timing for each Insert, the Planning
Time was always significantly more for the patched version (expected,
because it does extra checks), but the Execution Time was very similar
for both the patched and unpatched versions.
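
For reference, the kind of test I ran looks roughly like this (a sketch;
the partition layout and names here are illustrative, not the exact
script used):

```sql
-- Illustrative sketch: a range-partitioned table with 100 partitions,
-- inserting 1000 rows, timed with EXPLAIN ANALYZE.
DROP TABLE IF EXISTS test_part CASCADE;
CREATE TABLE test_part (a int, b int) PARTITION BY RANGE (a);

-- generate the partition DDL; \gexec is a psql meta-command that
-- executes each row of the result as a statement
SELECT format('CREATE TABLE test_part_%s PARTITION OF test_part FOR VALUES FROM (%s) TO (%s);',
              i, (i - 1) * 10, i * 10)
FROM generate_series(1, 100) AS i \gexec

EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO test_part SELECT i, i FROM generate_series(0, 999) AS i;
```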

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

I took a look into the hasModifyingCTE bugfix recently,
and found a possible bug case that occurs even without the parallel INSERT patch.

---------------------------------
drop table if exists test_data1;
create table test_data1(a int, b int) ;
insert into test_data1 select generate_series(1,1000), generate_series(1,1000);
set force_parallel_mode=on;

CREATE TEMP TABLE bug6051 AS
  select i from generate_series(1,3) as i;

SELECT * FROM bug6051;
CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as i from test_data1;

WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051 SELECT * FROM t1;

*******
***ERROR:  cannot assign XIDs during a parallel operation
*******
---------------------------------

I debugged it, and the parse tree did indeed contain a modifying CTE after rewriting.
I think if we can properly set hasModifyingCTE, we can avoid this error by not considering parallelism in this case.

Thoughts ?

Best regards,
houzj






Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 5, 2021 at 2:58 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> I took a look into the hasModifyingCTE bugfix recently,
> and found a possible bug case without the parallel insert patch.
>
> ---------------------------------
> drop table if exists test_data1;
> create table test_data1(a int, b int) ;
> insert into test_data1 select generate_series(1,1000), generate_series(1,1000);
> set force_parallel_mode=on;
>
> CREATE TEMP TABLE bug6051 AS
>   select i from generate_series(1,3) as i;
>
> SELECT * FROM bug6051;
> CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as i from test_data1;
>
> WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051 SELECT * FROM t1;
>
> *******
> ***ERROR:  cannot assign XIDs during a parallel operation
> *******
> ---------------------------------
>
> I debugged it, and the parse tree did indeed contain a modifying CTE after rewriting.
> I think if we can properly set hasModifyingCTE, we can avoid this error by not considering parallelism in this case.
>

Thanks. You've identified that the bug exists for SELECT too. I've
verified that the issue is fixed by the bugfix included in the
Parallel INSERT patch.
Are you able to review my bugfix?
Since the problem exists for SELECT in the current Postgres code, I'd
like to pull that bugfix out and provide it as a separate fix.
My concern is that there may well be a better way to fix the issue -
for example, during the re-writing, rather than after the query has
been re-written.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > I took a look into the hasModifyingCTE bugfix recently, and found a
> > possible bug case without the parallel insert patch.
> >
> > ---------------------------------
> > drop table if exists test_data1;
> > create table test_data1(a int, b int) ; insert into test_data1 select
> > generate_series(1,1000), generate_series(1,1000); set
> > force_parallel_mode=on;
> >
> > CREATE TEMP TABLE bug6051 AS
> >   select i from generate_series(1,3) as i;
> >
> > SELECT * FROM bug6051;
> > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as
> > i from test_data1;
> >
> > WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051
> > SELECT * FROM t1;
> >
> > *******
> > ***ERROR:  cannot assign XIDs during a parallel operation
> > *******
> > ---------------------------------
> >
> > I debugged it, and the parse tree did indeed contain a modifying CTE after rewriting.
> > I think if we can properly set hasModifyingCTE, we can avoid this
> > error by not considering parallelism in this case.
> >
> 
> Thanks. You've identified that the bug exists for SELECT too. I've verified
> that the issue is fixed by the bugfix included in the Parallel INSERT patch.
> Are you able to review my bugfix?
> Since the problem exists for SELECT in the current Postgres code, I'd like
> to pull that bugfix out and provide it as a separate fix.
> My concern is that there may well be a better way to fix the issue - for
> example, during the re-writing, rather than after the query has been
> re-written.
Hi,

I took a look at the fix and have some thoughts on it.
(Please correct me if you have already tried this idea and found something wrong with it.)

IMO, the main reason hasModifyingCTE ends up false is that:
the rewriter did not update hasModifyingCTE when copying the existing 'cteList' to the rewritten query.

It seems there is only one place where the cteList is copied separately.
-------
static Query *
rewriteRuleAction(Query *parsetree,
...
        /* OK, it's safe to combine the CTE lists */
        sub_action->cteList = list_concat(sub_action->cteList,
                                          copyObject(parsetree->cteList));
+        sub_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
--------

Based on the above, if we update the hasModifyingCTE here, we may solve this problem.

And there is another point here: sub_action may not be the final parse tree.
If the rule is defined like "DO INSTEAD INSERT INTO xx SELECT xx FROM xx", the rewriter will
put the cteList into a subquery in the parse tree's rtable.
In this case, we may also need to update the final parse tree too.
(I think you know this case; I found the same logic in the latest patch.)
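
To illustrate the case being described, a rule of that shape might look
like this (the table and rule names are hypothetical):

```sql
-- Hypothetical example of the "DO INSTEAD INSERT ... SELECT" case:
-- the rewriter places the OLD/NEW entries (and the copied CTE list) in
-- the SELECT subquery, so the top-level INSERT (rule_action) is a
-- different Query node from sub_action and needs its hasModifyingCTE
-- flag updated separately.
CREATE TABLE src (a int);
CREATE TABLE archive (a int);
CREATE RULE src_ins AS ON INSERT TO src
    DO INSTEAD INSERT INTO archive SELECT NEW.a;
```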

--------
static Query *
rewriteRuleAction(Query *parsetree,
...
        if (sub_action_ptr)
+        {
            *sub_action_ptr = sub_action;
+            rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
+        }
--------

And the basic tests passed.
What do you think?

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
Hi,

While reviewing the v14 set of patches (will send my comments
shortly), I too had some reservations on how 0001 decided to go about
setting hasModifyingCTE.

On Fri, Feb 5, 2021 at 1:51 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > > I took a look into the hasModifyingCTE bugfix recently, and found a
> > > possible bug case without the parallel insert patch.
> > >
> > > ---------------------------------
> > > drop table if exists test_data1;
> > > create table test_data1(a int, b int) ; insert into test_data1 select
> > > generate_series(1,1000), generate_series(1,1000); set
> > > force_parallel_mode=on;
> > >
> > > CREATE TEMP TABLE bug6051 AS
> > >   select i from generate_series(1,3) as i;
> > >
> > > SELECT * FROM bug6051;
> > > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as
> > > i from test_data1;
> > >
> > > WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051
> > > SELECT * FROM t1;
> > >
> > > *******
> > > ***ERROR:  cannot assign XIDs during a parallel operation
> > > *******
> > > ---------------------------------
> > >
> > > I debugged it, and the parse tree did indeed contain a modifying CTE after rewriting.
> > > I think if we can properly set hasModifyingCTE, we can avoid this
> > > error by not considering parallelism in this case.
> > >
> >
> > Thanks. You've identified that the bug exists for SELECT too. I've verified
> > that the issue is fixed by the bugfix included in the Parallel INSERT patch.
> > Are you able to review my bugfix?
> > Since the problem exists for SELECT in the current Postgres code, I'd like
> > to pull that bugfix out and provide it as a separate fix.

+1, a separate patch for this seems better.

> > My concern is that there may well be a better way to fix the issue - for
> > example, during the re-writing, rather than after the query has been
> > re-written.
> Hi,
>
> I took a look at the fix and have some thoughts on it.
> (Please correct me if you have tried this idea and found something is wrong)
>
> IMO, the main reason hasModifyingCTE ends up false is that:
> the rewriter did not update hasModifyingCTE when copying the existing 'cteList' to the rewritten query.
>
> It seems there is only one place where ctelist will be copied separately.
> -------
> static Query *
> rewriteRuleAction(Query *parsetree,
> ...
>                 /* OK, it's safe to combine the CTE lists */
>                 sub_action->cteList = list_concat(sub_action->cteList,
>                                                                                   copyObject(parsetree->cteList));
> +               sub_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> --------
>
> Based on the above, if we update the hasModifyingCTE here, we may solve this problem.
>
> And there is another point here: sub_action may not be the final parse tree.
> If the rule is defined like "DO INSTEAD INSERT INTO xx SELECT xx FROM xx", the rewriter will
> put the cteList into a subquery in the parse tree's rtable.
> In this case, we may also need to update the final parse tree too.
> (I think you know this case; I found the same logic in the latest patch.)
>
> --------
> static Query *
> rewriteRuleAction(Query *parsetree,
> ...
>                 if (sub_action_ptr)
> +               {
>                         *sub_action_ptr = sub_action;
> +                       rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> +               }
> --------
>
> And the basic tests passed.
> What do you think?

That is very close to what I was going to suggest, which is this:

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..3c4417af98 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
                checkExprHasSubLink((Node *) rule_action->returningList);
    }

+   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
+
    return rule_action;
 }

-- 
Amit Langote
EDB: http://www.enterprisedb.com



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > --------
> > static Query *
> > rewriteRuleAction(Query *parsetree,
> > ...
> >                 if (sub_action_ptr)
> > +               {
> >                         *sub_action_ptr = sub_action;
> > +                       rule_action->hasModifyingCTE |=
> parsetree->hasModifyingCTE;
> > +               }
> > --------
> >
> > And the basic tests passed.
> > What do you think?
> 
> That is very close to what I was going to suggest, which is this:
> 
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> index 0672f497c6..3c4417af98 100644
> --- a/src/backend/rewrite/rewriteHandler.c
> +++ b/src/backend/rewrite/rewriteHandler.c
> @@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
>                 checkExprHasSubLink((Node *)
> rule_action->returningList);
>     }
> 
> +   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> +
>     return rule_action;
>  }


    if (parsetree->cteList != NIL && sub_action->commandType != CMD_UTILITY)
    {
    ...
        sub_action->cteList = list_concat(sub_action->cteList,
    }

Is it possible for sub_action to be CMD_UTILITY?
In this case the CTE will be copied to the new one; should we set the flag in this case?

Best regard,
houzj




RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > > --------
> > > static Query *
> > > rewriteRuleAction(Query *parsetree,
> > > ...
> > >                 if (sub_action_ptr)
> > > +               {
> > >                         *sub_action_ptr = sub_action;
> > > +                       rule_action->hasModifyingCTE |=
> > parsetree->hasModifyingCTE;
> > > +               }
> > > --------
> > >
> > > And the basic tests passed.
> > > What do you think?
> >
> > That is very close to what I was going to suggest, which is this:
> >
> > diff --git a/src/backend/rewrite/rewriteHandler.c
> > b/src/backend/rewrite/rewriteHandler.c
> > index 0672f497c6..3c4417af98 100644
> > --- a/src/backend/rewrite/rewriteHandler.c
> > +++ b/src/backend/rewrite/rewriteHandler.c
> > @@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
> >                 checkExprHasSubLink((Node *)
> > rule_action->returningList);
> >     }
> >
> > +   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> > +
> >     return rule_action;
> >  }
> 
> 
>     if (parsetree->cteList != NIL && sub_action->commandType !=
> CMD_UTILITY)
>     {
>     ...
>         sub_action->cteList = list_concat(sub_action->cteList,
>     }
> 
> Is it possible for sub_action to be CMD_UTILITY?
> In this case the CTE will be copied to the new one; should we set the
> flag in this case?

Sorry, a typo in my previous mail.
In this case the CTE will not be copied to the new one; should we set the flag in this case?

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 5, 2021 at 4:25 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > >
> > > That is very close to what I was going to suggest, which is this:
> > >
> > > diff --git a/src/backend/rewrite/rewriteHandler.c
> > > b/src/backend/rewrite/rewriteHandler.c
> > > index 0672f497c6..3c4417af98 100644
> > > --- a/src/backend/rewrite/rewriteHandler.c
> > > +++ b/src/backend/rewrite/rewriteHandler.c
> > > @@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
> > >                 checkExprHasSubLink((Node *)
> > > rule_action->returningList);
> > >     }
> > >
> > > +   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> > > +
> > >     return rule_action;
> > >  }
> >
> >
> >       if (parsetree->cteList != NIL && sub_action->commandType !=
> > CMD_UTILITY)
> >       {
> >       ...
> >               sub_action->cteList = list_concat(sub_action->cteList,
> >       }
> >
> > Is it possible for sub_action to be CMD_UTILITY?
> > In this case the CTE will be copied to the new one; should we set the
> > flag in this case?
>
> Sorry, a typo in my previous mail.
> In this case the CTE will not be copied to the new one; should we set the flag in this case?
>

No, strictly speaking, we probably shouldn't, because the CTE wasn't
copied in that case.
Also, I know the bitwise OR "works" in this case, but I think some
will frown on using it for a bool.
IMHO better to use:

   if (parsetree->hasModifyingCTE)
       rule_action->hasModifyingCTE = true;

So patch might be something like:

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..a989e02925 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -557,6 +557,8 @@ rewriteRuleAction(Query *parsetree,
         /* OK, it's safe to combine the CTE lists */
         sub_action->cteList = list_concat(sub_action->cteList,
                                           copyObject(parsetree->cteList));
+        if (parsetree->hasModifyingCTE)
+            sub_action->hasModifyingCTE = true;
     }

     /*
@@ -594,6 +596,9 @@ rewriteRuleAction(Query *parsetree,
             *sub_action_ptr = sub_action;
         else
             rule_action = sub_action;
+
+        if (parsetree->hasModifyingCTE)
+            sub_action->hasModifyingCTE = true;
     }

     /*

I'll do some further checks, because the rewriting is recursive and
tricky, so don't want to miss any cases ...

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Fri, Feb 5, 2021 at 2:55 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 4:25 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > > > That is very close to what I was going to suggest, which is this:
> > > >
> > > > diff --git a/src/backend/rewrite/rewriteHandler.c
> > > > b/src/backend/rewrite/rewriteHandler.c
> > > > index 0672f497c6..3c4417af98 100644
> > > > --- a/src/backend/rewrite/rewriteHandler.c
> > > > +++ b/src/backend/rewrite/rewriteHandler.c
> > > > @@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
> > > >                 checkExprHasSubLink((Node *)
> > > > rule_action->returningList);
> > > >     }
> > > >
> > > > +   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> > > > +
> > > >     return rule_action;
> > > >  }
> > >
> > >
> > >       if (parsetree->cteList != NIL && sub_action->commandType !=
> > > CMD_UTILITY)
> > >       {
> > >       ...
> > >               sub_action->cteList = list_concat(sub_action->cteList,
> > >       }
> > >
> > > Is it possible for sub_action to be CMD_UTILITY?
> > > In this case the CTE will be copied to the new one; should we set the
> > > flag in this case?
> >
> > Sorry, a typo in my previous mail.
> > In this case the CTE will not be copied to the new one; should we set the flag in this case?
> >
>
> No, strictly speaking, we probably shouldn't, because the CTE wasn't
> copied in that case.

Right.

> Also, I know the bitwise OR "works" in this case, but I think some
> will frown on use of that for a bool.
> IMHO better to use:
>
>    if (parsetree->hasModifyingCTE)
>        rule_action->hasModifyingCTE = true;
>
> So patch might be something like:
>
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> index 0672f497c6..a989e02925 100644
> --- a/src/backend/rewrite/rewriteHandler.c
> +++ b/src/backend/rewrite/rewriteHandler.c
> @@ -557,6 +557,8 @@ rewriteRuleAction(Query *parsetree,
>          /* OK, it's safe to combine the CTE lists */
>          sub_action->cteList = list_concat(sub_action->cteList,
>                                            copyObject(parsetree->cteList));
> +        if (parsetree->hasModifyingCTE)
> +            sub_action->hasModifyingCTE = true;
>      }
>
>      /*
> @@ -594,6 +596,9 @@ rewriteRuleAction(Query *parsetree,
>              *sub_action_ptr = sub_action;
>          else
>              rule_action = sub_action;
> +
> +        if (parsetree->hasModifyingCTE)
> +            sub_action->hasModifyingCTE = true;
>      }

That may be better.

BTW, the original query's cteList is copied into sub_action query but
not into rule_action for reasons I haven't looked very closely into,
even though we'd like to ultimately set the latter's hasModifyingCTE
to reflect the original query's, right?  So we should do the following
at some point before returning:

if (sub_action->hasModifyingCTE)
    rule_action->hasModifyingCTE = true;

> I'll do some further checks, because the rewriting is recursive and
> tricky, so don't want to miss any cases ...

Always a good idea.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 5, 2021 at 5:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
>
> BTW, the original query's cteList is copied into sub_action query but
> not into rule_action for reasons I haven't looked very closely into,
> even though we'd like to ultimately set the latter's hasModifyingCTE
> to reflect the original query's, right?  So we should do the following
> at some point before returning:
>
> if (sub_action->hasModifyingCTE)
>     rule_action->hasModifyingCTE = true;
>

Actually, rule_action will usually point to sub_action (in which case,
no need to copy to rule_action), except if the rule action is an
INSERT...SELECT, which seems to be handled by some "kludge" according
to the following comment (and KLUDGE ALERT comment in the function
that is called):

    /*
     * Adjust rule action and qual to offset its varnos, so that we can merge
     * its rtable with the main parsetree's rtable.
     *
     * If the rule action is an INSERT...SELECT, the OLD/NEW rtable entries
     * will be in the SELECT part, and we have to modify that rather than the
     * top-level INSERT (kluge!).
     */
    sub_action = getInsertSelectQuery(rule_action, &sub_action_ptr);

So in that case (sub_action_ptr != NULL), within rule_action there is
a pointer to sub_action (RTE for the subquery), so whenever sub_action
is re-created, this pointer needs to be fixed-up.
It looks like I might need to copy hasModifyingCTE back to rule_action
in this case - but not 100% sure on it yet - still checking that. All
tests run so far pass without doing that though.
This is one reason for my original approach (though I admit, it was
not optimal) because at least it was reliable and detected the
modifyingCTE after all the rewriting and kludgy code had finished.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Fri, Feb 5, 2021 at 4:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 5:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > BTW, the original query's cteList is copied into sub_action query but
> > not into rule_action for reasons I haven't looked very closely into,
> > even though we'd like to ultimately set the latter's hasModifyingCTE
> > to reflect the original query's, right?  So we should do the following
> > at some point before returning:
> >
> > if (sub_action->hasModifyingCTE)
> >     rule_action->hasModifyingCTE = true;
>
> Actually, rule_action will usually point to sub_action (in which case,
> no need to copy to rule_action), except if the rule action is an
> INSERT...SELECT, which seems to be handled by some "kludge" according
> to the following comment (and KLUDGE ALERT comment in the function
> that is called):
>
>     /*
>      * Adjust rule action and qual to offset its varnos, so that we can merge
>      * its rtable with the main parsetree's rtable.
>      *
>      * If the rule action is an INSERT...SELECT, the OLD/NEW rtable entries
>      * will be in the SELECT part, and we have to modify that rather than the
>      * top-level INSERT (kluge!).
>      */
>     sub_action = getInsertSelectQuery(rule_action, &sub_action_ptr);
>
> So in that case (sub_action_ptr != NULL), within rule_action there is
> a pointer to sub_action (RTE for the subquery), so whenever sub_action
> is re-created, this pointer needs to be fixed-up.
> It looks like I might need to copy hasModifyingCTE back to rule_action
> in this case - but not 100% sure on it yet - still checking that. All
> tests run so far pass without doing that though.

I guess we just don't have a test case where the rule_action query is
actually parallelized, like the one houzj shared a few emails ago.

> This is one reason for my original approach (though I admit, it was
> not optimal) because at least it was reliable and detected the
> modifyingCTE after all the rewriting and kludgy code had finished.

Yeah it's hard to go through all of this highly recursive legacy code
to be sure that hasModifyingCTE is consistent with reality in *all*
cases, but let's try to do it.  No other has* flags are set
after-the-fact, so I wouldn't bet on a committer letting this one
through.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 5, 2021 at 8:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
>
> > This is one reason for my original approach (though I admit, it was
> > not optimal) because at least it was reliable and detected the
> > modifyingCTE after all the rewriting and kludgy code had finished.
>
> Yeah it's hard to go through all of this highly recursive legacy code
> to be sure that hasModifyingCTE is consistent with reality in *all*
> cases, but let's try to do it.  No other has* flags are set
> after-the-fact, so I wouldn't bet on a committer letting this one
> through.
>

I have debugged the code a bit more now, and the following patch seems
to correctly fix the issue, at least for the known test cases.
(i.e. SELECT case, shared by houzj, and the INSERT...SELECT case, as
in the "with" regression tests, for which I originally detected the
issue)

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..8f695b32ec 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -557,6 +557,12 @@ rewriteRuleAction(Query *parsetree,
         /* OK, it's safe to combine the CTE lists */
         sub_action->cteList = list_concat(sub_action->cteList,
                                           copyObject(parsetree->cteList));
+        if (parsetree->hasModifyingCTE)
+        {
+            sub_action->hasModifyingCTE = true;
+            if (sub_action_ptr)
+                rule_action->hasModifyingCTE = true;
+        }
     }

     /*

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Fri, Feb 5, 2021 at 6:56 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 8:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > This is one reason for my original approach (though I admit, it was
> > > not optimal) because at least it was reliable and detected the
> > > modifyingCTE after all the rewriting and kludgy code had finished.
> >
> > Yeah it's hard to go through all of this highly recursive legacy code
> > to be sure that hasModifyingCTE is consistent with reality in *all*
> > cases, but let's try to do it.  No other has* flags are set
> > after-the-fact, so I wouldn't bet on a committer letting this one
> > through.
>
> I have debugged the code a bit more now, and the following patch seems
> to correctly fix the issue, at least for the known test cases.
> (i.e. SELECT case, shared by houzj, and the INSERT...SELECT case, as
> in the "with" regression tests, for which I originally detected the
> issue)
>
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> index 0672f497c6..8f695b32ec 100644
> --- a/src/backend/rewrite/rewriteHandler.c
> +++ b/src/backend/rewrite/rewriteHandler.c
> @@ -557,6 +557,12 @@ rewriteRuleAction(Query *parsetree,
>          /* OK, it's safe to combine the CTE lists */
>          sub_action->cteList = list_concat(sub_action->cteList,
>                                            copyObject(parsetree->cteList));
> +        if (parsetree->hasModifyingCTE)
> +        {
> +            sub_action->hasModifyingCTE = true;
> +            if (sub_action_ptr)
> +                rule_action->hasModifyingCTE = true;
> +        }
>      }

That seems good enough as far as I am concerned.  Although either an
Assert as follows or a comment explaining why the if (sub_action_ptr)
check is needed seems warranted.

if (sub_action_ptr)
    rule_action->hasModifyingCTE = true;
else
    Assert(sub_action == rule_action);

Does the Assert seem overly confident?

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 5, 2021 at 11:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
>
> That seems good enough as far as I am concerned.   Although either an
> Assert as follows or a comment why the if (sub_action_ptr) is needed
> seems warranted.
>
> if (sub_action_ptr)
>     rule_action->hasModifyingCTE = true;
> else
>     Assert(sub_action == rule_action);
>
> Does the Assert seem overly confident?
>

No, the Assert is exactly right, and I'll add a comment too.
See below.
I'll post the patch separately, if you can't see any further issues.


diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..05b80bd347 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -557,6 +557,21 @@ rewriteRuleAction(Query *parsetree,
        /* OK, it's safe to combine the CTE lists */
        sub_action->cteList = list_concat(sub_action->cteList,
                                          copyObject(parsetree->cteList));
+
+       /*
+        * If the hasModifyingCTE flag is set in the source parsetree from
+        * which the CTE list is copied, the flag needs to be set in the
+        * sub_action and, if applicable, in the rule_action (INSERT...SELECT
+        * case).
+        */
+       if (parsetree->hasModifyingCTE)
+       {
+           sub_action->hasModifyingCTE = true;
+           if (sub_action_ptr)
+               rule_action->hasModifyingCTE = true;
+           else
+               Assert(sub_action == rule_action);
+       }
    }

    /*


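The effect of the patched logic above can be sketched in isolation. When the rule action's statement has been pushed down into a subquery (the INSERT...SELECT case), sub_action_ptr is non-NULL and rule_action is the outer query wrapping sub_action; otherwise the two are the same node. Below is a standalone mock of that propagation rule, using simplified stand-in structs rather than the actual PostgreSQL Query node:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for PostgreSQL's Query node (illustration only). */
typedef struct Query
{
    bool        hasModifyingCTE;
} Query;

/*
 * Mirror of the patch's logic: the flag is always propagated to the query
 * that received the copied CTE list (sub_action); when sub_action is a
 * subquery embedded inside rule_action (sub_action_ptr != NULL), the
 * outer rule_action must be flagged as well, since callers inspect it.
 */
static void
propagate_modifying_cte(const Query *parsetree, Query *rule_action,
                        Query *sub_action, Query **sub_action_ptr)
{
    if (parsetree->hasModifyingCTE)
    {
        sub_action->hasModifyingCTE = true;
        if (sub_action_ptr)
            rule_action->hasModifyingCTE = true;
        else
            assert(sub_action == rule_action);
    }
}
```

In the INSERT...SELECT case both the inner and outer queries end up flagged; in the simple case the single node is flagged once, and the Assert documents that the two pointers must then coincide.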
Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Fri, Feb 5, 2021 at 11:01 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 11:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > That seems good enough as far as I am concerned.   Although either an
> > Assert as follows or a comment why the if (sub_action_ptr) is needed
> > seems warranted.
> >
> > if (sub_action_ptr)
> >     rule_action->hasModifyingCTE = true;
> > else
> >     Assert(sub_action == rule_action);
> >
> > Does the Assert seem overly confident?
>
> No, the Assert is exactly right, and I'll add a comment too.
> See below.
> I'll post the patch separately, if you can't see any further issues.
>
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> index 0672f497c6..05b80bd347 100644
> --- a/src/backend/rewrite/rewriteHandler.c
> +++ b/src/backend/rewrite/rewriteHandler.c
> @@ -557,6 +557,21 @@ rewriteRuleAction(Query *parsetree,
>         /* OK, it's safe to combine the CTE lists */
>         sub_action->cteList = list_concat(sub_action->cteList,
>                                           copyObject(parsetree->cteList));
> +
> +       /*
> +        * If the hasModifyingCTE flag is set in the source parsetree from
> +        * which the CTE list is copied, the flag needs to be set in the
> +        * sub_action and, if applicable, in the rule_action (INSERT...SELECT
> +        * case).
> +        */
> +       if (parsetree->hasModifyingCTE)
> +       {
> +           sub_action->hasModifyingCTE = true;
> +           if (sub_action_ptr)
> +               rule_action->hasModifyingCTE = true;
> +           else
> +               Assert(sub_action == rule_action);
> +       }
>     }

LGTM, thank you.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Posting an updated set of patches.
The only update is an improved, but only temporary, fix for the query
rewriter hasModifyingCTE issue. (I separately posted a patch for this,
but Tom Lane concluded that the issue is more complex than initially
thought and no easy fix could be confidently attempted at this point
in time, so nothing ended up getting pushed; the issue needs further
investigation at another time.)
See also comment in patch 0001.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> Posting an updated set of patches.

A minor comment about doc.

+  <para>
+    Where the above target table features are determined to be, at worst,
+    parallel-restricted, rather than parallel-unsafe, at least a parallel table
+    scan may be used in the query plan for the <literal>INSERT</literal>
+    statement. For more information about Parallel Safety, see
+    <xref linkend="parallel-safety"/>.
+  </para>

It seems this does not mention that if the target table is a foreign or temporary table, a parallel table scan may still be used.

So how about:

+  <para>
+    Where the target table is a foreign/temporary table or the above target table features
+    are determined to be, at worst, parallel-restricted, rather than parallel-unsafe,
+    at least a parallel table scan may be used in the query plan for the
+    <literal>INSERT</literal> statement. For more information about Parallel Safety,
+    see <xref linkend="parallel-safety"/>.
+  </para>

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 1, 2021 at 7:20 PM Tang, Haiying <tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Greg,
>
> Recently, I have been evaluating the performance of this patch (1/28 V13).
> Here I found a regression in a test case involving parallel insert with a bitmap heap scan:
> when the target table has a primary key or index, the patched performance shows a 7%-19%
> decline compared to unpatched.
>
> Could you please have a look about this?
>
> I tried max_parallel_workers_per_gather=2/4/8, and I didn't tune other parameters (like GUCs
> or other settings that force parallelism).
>
> 1. max_parallel_workers_per_gather=2(default)
> target_table        patched       master      %reg
> ------------------------------------------------------
> without_PK_index    83.683        142.183    -41%
> with_PK             382.824       321.101    19%
> with_index          372.682       324.246    15%
>
> 2. max_parallel_workers_per_gather=4
> target_table        patched       master      %reg
> ------------------------------------------------------
> without_PK_index    73.189        141.879     -48%
> with_PK             362.104       329.759     10%
> with_index          372.237       333.718     12%
>
> 3. max_parallel_workers_per_gather=8 (also set max_parallel_workers=16, max_worker_processes = 16)
> target_table        patched       master      %reg
> ------------------------------------------------------
> without_PK_index    75.072        146.100     -49%
> with_PK             365.312       324.339     13%
> with_index          362.636       338.366     7%
>
> Attached test_bitmap.sql which includes my test data and sql if you want to have a look.
>

Hi,

Did it actually use a parallel plan in your testing?
When I ran these tests with the Parallel INSERT patch applied, it did
not naturally choose a parallel plan for any of these cases.
So we can hardly blame the parallel insert with bitmap heap scan for
having worse performance, when based on costings, it doesn't actually
choose to use a parallel plan in this case.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 8, 2021 at 6:00 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > Posting an updated set of patches.
>
> A minor comment about doc.
>
> +  <para>
> +    Where the above target table features are determined to be, at worst,
> +    parallel-restricted, rather than parallel-unsafe, at least a parallel table
> +    scan may be used in the query plan for the <literal>INSERT</literal>
> +    statement. For more information about Parallel Safety, see
> +    <xref linkend="parallel-safety"/>.
> +  </para>
>
> It seems does not mention that if target table is a foreign/temp table, a parallel table scan may be used.
>
> So how about:
>
> +  <para>
> +    Where the target table is a foreign/temporary table or the above target table features
> +    are determined to be, at worst, parallel-restricted, rather than parallel-unsafe,
> +    at least a parallel table scan may be used in the query plan for the
> +    <literal>INSERT</literal> statement. For more information about Parallel Safety,
> +    see <xref linkend="parallel-safety"/>.
> +  </para>
>

Thanks. You're right, I should probably update the docs to clarify
those two cases.
(I had removed them from the list of parallel-unsafe things, but not
pointed out that a parallel table scan could still be used in these
cases).

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
> Did it actually use a parallel plan in your testing?
> When I ran these tests with the Parallel INSERT patch applied, it did 
> not naturally choose a parallel plan for any of these cases.

Yes, these cases pick a parallel plan naturally in my test environment.

postgres=# explain verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
   Workers Planned: 4
   ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0 width=0)
         ->  Parallel Bitmap Heap Scan on public.x1  (cost=3346.89..1272067.44 rows=20343 width=8)
               Output: x1.a, NULL::integer
               Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
               Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a > 199900000)))
               ->  BitmapOr  (cost=3346.89..3346.89 rows=178808 width=0)
                     ->  Bitmap Index Scan on x1_a_idx  (cost=0.00..1495.19 rows=80883 width=0)
                           Index Cond: (x1.a < 80000)
                     ->  Bitmap Index Scan on x1_a_idx  (cost=0.00..1811.01 rows=97925 width=0)
                           Index Cond: (x1.a > 199900000)

PSA my postgresql.conf file, maybe you can have a look. Also, I didn't do any parameter
tuning in my test session.

Regards,
Tang



Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > Did it actually use a parallel plan in your testing?
> > When I ran these tests with the Parallel INSERT patch applied, it did
> > not naturally choose a parallel plan for any of these cases.
> 
> Yes, these cases pick a parallel plan naturally in my test environment.
> 
> postgres=# explain verbose insert into testscan select a from x where
> a<80000 or (a%2=0 and a>199900000);
>                                             QUERY PLAN
> ----------------------------------------------------------------------
> -----------------------------
>  Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
>    Workers Planned: 4
>    ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0
> width=0)
>          ->  Parallel Bitmap Heap Scan on public.x1
> (cost=3346.89..1272067.44 rows=20343 width=8)
>                Output: x1.a, NULL::integer
>                Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
>                Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a >
> 199900000)))
>                ->  BitmapOr  (cost=3346.89..3346.89 rows=178808
> width=0)
>                      ->  Bitmap Index Scan on x1_a_idx
> (cost=0.00..1495.19 rows=80883 width=0)
>                            Index Cond: (x1.a < 80000)
>                      ->  Bitmap Index Scan on x1_a_idx
> (cost=0.00..1811.01 rows=97925 width=0)
>                            Index Cond: (x1.a > 199900000)
> 
> PSA my postgresql.conf file, maybe you can have a look. Also, I didn't
> do any parameter tuning in my test session.

I reproduced this on my machine.

I think we'd better do "analyze" before the insert, which makes this easier to reproduce.
Like:

-----
analyze;
explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
-----

Best regards,
houzj




Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
Greg, all

Thanks a lot for your work on this.

On Mon, Feb 8, 2021 at 3:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> Posting an updated set of patches.

I've been looking at these patches, initially with an intention to
review mainly any partitioning-related concerns, but have some general
thoughts as well concerning mostly the patches 0001 and 0002.

* I've seen review comments on this thread where I think it's been
suggested that whatever max_parallel_hazard_for_modify() does had
better have been integrated into max_parallel_hazard() such that
there's no particular need for that function to exist.  For example,
the following:

+   /*
+    * UPDATE is not currently supported in parallel-mode, so prohibit
+    * INSERT...ON CONFLICT...DO UPDATE...
+    * In order to support update, even if only in the leader, some
+    * further work would need to be done. A mechanism would be needed
+    * for sharing combo-cids between leader and workers during
+    * parallel-mode, since for example, the leader might generate a
+    * combo-cid and it needs to be propagated to the workers.
+    */
+   if (parse->onConflict != NULL && parse->onConflict->action ==
ONCONFLICT_UPDATE)
+       return PROPARALLEL_UNSAFE;

could be placed in the following block in max_parallel_hazard():

    /*
     * When we're first invoked on a completely unplanned tree, we must
     * recurse into subqueries so as to locate parallel-unsafe constructs
     * anywhere in the tree.
     */
    else if (IsA(node, Query))
    {
        Query      *query = (Query *) node;

        /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
        if (query->rowMarks != NULL)
        {
            context->max_hazard = PROPARALLEL_UNSAFE;
            return true;
        }

Furthermore, the following:

+   rte = rt_fetch(parse->resultRelation, parse->rtable);
+
+   /*
+    * The target table is already locked by the caller (this is done in the
+    * parse/analyze phase).
+    */
+   rel = table_open(rte->relid, NoLock);
+   (void) rel_max_parallel_hazard_for_modify(rel, parse->commandType,
&context);
+   table_close(rel, NoLock);

can itself be wrapped in a function that's called from
max_parallel_hazard() by adding a new block for RangeTblEntry nodes
and passing QTW_EXAMINE_RTES_BEFORE to query_tree_walker().

That brings me to this part of the hunk:

+   /*
+    * If there is no underlying SELECT, a parallel table-modification
+    * operation is not possible (nor desirable).
+    */
+   hasSubQuery = false;
+   foreach(lc, parse->rtable)
+   {
+       rte = lfirst_node(RangeTblEntry, lc);
+       if (rte->rtekind == RTE_SUBQUERY)
+       {
+           hasSubQuery = true;
+           break;
+       }
+   }
+   if (!hasSubQuery)
+       return PROPARALLEL_UNSAFE;

The justification for this given in:

https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com

seems to be that the failure of a test case in
partition-concurrent-attach isolation suite is prevented if finding no
subquery RTEs in the query is flagged as parallel unsafe, which in
turn stops max_parallel_hazard_for_modify() from locking partitions for
safety checks in such cases.  But it feels unprincipled to have this
code to work around a specific test case that's failing.  I'd rather
edit the failing test case to disable parallel execution as
Tsunakawa-san suggested.

* Regarding function names:

+static bool trigger_max_parallel_hazard_for_modify(TriggerDesc *trigdesc,
+
max_parallel_hazard_context *context);
+static bool index_expr_max_parallel_hazard_for_modify(Relation rel,
+
max_parallel_hazard_context *context);
+static bool domain_max_parallel_hazard_for_modify(Oid typid,
max_parallel_hazard_context *context);
+static bool rel_max_parallel_hazard_for_modify(Relation rel,
+                                              CmdType command_type,
+
max_parallel_hazard_context *context)

IMO, it would be better to name these
target_rel_trigger_max_parallel_hazard(),
target_rel_index_max_parallel_hazard(), etc. rather than have
_for_modify at the end of these names to better connote that they
check the parallel safety of applying the modify operation to a given
target relation.  Also, put these prototypes just below that of
max_parallel_hazard() to have related things close by.

Attached please see v15_delta.diff showing the changes suggested above.

* I suspect that the following is broken in light of concurrent
attachment of partitions.

+
+       /* Recursively check each partition ... */
+       pdesc = RelationGetPartitionDesc(rel);

I think we'd need to use CreatePartitionDirectory() and retrieve the
PartitionDesc using PartitionDirectoryLookup().  Something we already
do when opening partitions for SELECT planning.

* I think that the concerns raised by Tsunakawa-san in:

https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com

regarding how this interacts with plancache.c deserve a look.
Specifically, a plan that uses parallel insert may fail to be
invalidated when partitions are altered directly (that is without
altering their root parent).  That would be because we are not adding
partition OIDs to PlannerGlobal.invalItems despite making a plan
that's based on checking their properties.  See this (tested with all
patches applied!):

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
create table foo (a) as select generate_series(1, 1000000);
prepare q as insert into rp select * from foo where a%2 = 0;
explain execute q;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather  (cost=1000.00..13041.54 rows=5642 width=0)
   Workers Planned: 2
   ->  Insert on rp  (cost=0.00..11477.34 rows=0 width=0)
         ->  Parallel Seq Scan on foo  (cost=0.00..11477.34 rows=2351 width=4)
               Filter: ((a % 2) = 0)
(5 rows)

-- create a parallel unsafe trigger (that's actually marked so)
directly on a partition
create or replace function make_table () returns trigger language
plpgsql as $$ begin create table bar(); return null; end; $$ parallel
unsafe;
create trigger ai_rp2 after insert on rp2 for each row execute
function make_table();
CREATE TRIGGER

-- plan still parallel
explain execute q;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather  (cost=1000.00..13041.54 rows=5642 width=0)
   Workers Planned: 2
   ->  Insert on rp  (cost=0.00..11477.34 rows=0 width=0)
         ->  Parallel Seq Scan on foo  (cost=0.00..11477.34 rows=2351 width=4)
               Filter: ((a % 2) = 0)
(5 rows)

-- and because it is
execute q;
ERROR:  cannot start commands during a parallel operation
CONTEXT:  SQL statement "create table bar()"
PL/pgSQL function make_table() line 1 at SQL statement

-- OTOH, altering parent correctly discards the parallel plan
create trigger ai_rp after insert on rp for each row execute function
make_table();
explain execute q;
                           QUERY PLAN
----------------------------------------------------------------
 Insert on rp  (cost=0.00..19425.00 rows=0 width=0)
   ->  Seq Scan on foo  (cost=0.00..19425.00 rows=5000 width=4)
         Filter: ((a % 2) = 0)
(3 rows)

It's fair to argue that it would rarely make sense to use PREPARE for
bulk loads, but we need to tighten things up a bit here regardless.


--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Greg, all
>
> Thanks a lot for your work on this.
>
> On Mon, Feb 8, 2021 at 3:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > Posting an updated set of patches.
>
> I've been looking at these patches, initially with an intention to
> review mainly any partitioning-related concerns, but have some general
> thoughts as well concerning mostly the patches 0001 and 0002.
>
> * I've seen review comments on this thread where I think it's been
> suggested that whatever max_parallel_hazard_for_modify() does had
> better have been integrated into max_parallel_hazard() such that
> there's no particular need for that function to exist.  For example,
> the following:
>
> +   /*
> +    * UPDATE is not currently supported in parallel-mode, so prohibit
> +    * INSERT...ON CONFLICT...DO UPDATE...
> +    * In order to support update, even if only in the leader, some
> +    * further work would need to be done. A mechanism would be needed
> +    * for sharing combo-cids between leader and workers during
> +    * parallel-mode, since for example, the leader might generate a
> +    * combo-cid and it needs to be propagated to the workers.
> +    */
> +   if (parse->onConflict != NULL && parse->onConflict->action ==
> ONCONFLICT_UPDATE)
> +       return PROPARALLEL_UNSAFE;
>
> could be placed in the following block in max_parallel_hazard():
>
>     /*
>      * When we're first invoked on a completely unplanned tree, we must
>      * recurse into subqueries so as to locate parallel-unsafe constructs
>      * anywhere in the tree.
>      */
>     else if (IsA(node, Query))
>     {
>         Query      *query = (Query *) node;
>
>         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
>         if (query->rowMarks != NULL)
>         {
>             context->max_hazard = PROPARALLEL_UNSAFE;
>             return true;
>         }
>
> Furthermore, the following:
>
> +   rte = rt_fetch(parse->resultRelation, parse->rtable);
> +
> +   /*
> +    * The target table is already locked by the caller (this is done in the
> +    * parse/analyze phase).
> +    */
> +   rel = table_open(rte->relid, NoLock);
> +   (void) rel_max_parallel_hazard_for_modify(rel, parse->commandType,
> &context);
> +   table_close(rel, NoLock);
>
> can itself be wrapped in a function that's called from
> max_parallel_hazard() by adding a new block for RangeTblEntry nodes
> and passing QTW_EXAMINE_RTES_BEFORE to query_tree_walker().
>

Thanks, I think those suggestions look good to me.

> That brings me to this part of the hunk:
>
> +   /*
> +    * If there is no underlying SELECT, a parallel table-modification
> +    * operation is not possible (nor desirable).
> +    */
> +   hasSubQuery = false;
> +   foreach(lc, parse->rtable)
> +   {
> +       rte = lfirst_node(RangeTblEntry, lc);
> +       if (rte->rtekind == RTE_SUBQUERY)
> +       {
> +           hasSubQuery = true;
> +           break;
> +       }
> +   }
> +   if (!hasSubQuery)
> +       return PROPARALLEL_UNSAFE;
>
> The justification for this given in:
>
> https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com
>
> seems to be that the failure of a test case in
> partition-concurrent-attach isolation suite is prevented if finding no
> subquery RTEs in the query is flagged as parallel unsafe, which in
> turn stops max_parallel_hazard_for_modify() from locking partitions for
> safety checks in such cases.  But it feels unprincipled to have this
> code to work around a specific test case that's failing.  I'd rather
> edit the failing test case to disable parallel execution as
> Tsunakawa-san suggested.
>

The code was not changed because of the test case (though it was
fortunate that the test case worked after the change).
The code check that you have identified above ensures that the INSERT
has an underlying SELECT, because the planner won't (and shouldn't
anyway) generate a parallel plan for INSERT...VALUES, so there is no
point doing any parallel-safety checks in this case.
It just so happens that the problem test case uses INSERT...VALUES -
and it shouldn't have triggered the parallel-safety checks for
parallel INSERT for this case anyway, because INSERT...VALUES can't
(and shouldn't) be parallelized.
So I will need to keep that check in the code somewhere, to avoid
overhead of parallel-safety checks in the case of INSERT...VALUES.
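A simplified, standalone sketch of that range-table scan (using mock types for illustration; the real code walks a List of RangeTblEntry nodes in the parse tree): an INSERT ... VALUES statement produces no RTE_SUBQUERY entry, so the check returns false and the safety analysis is skipped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL's range-table entries (illustration only). */
typedef enum RTEKind
{
    RTE_RELATION,               /* ordinary table reference */
    RTE_SUBQUERY,               /* subquery in FROM, e.g. the SELECT part */
    RTE_VALUES                  /* VALUES list */
} RTEKind;

typedef struct RangeTblEntry
{
    RTEKind     rtekind;
} RangeTblEntry;

/*
 * Return true only if some range-table entry is a subquery, i.e. the
 * INSERT has an underlying SELECT.  INSERT ... VALUES creates no
 * RTE_SUBQUERY entry, so for it this returns false and the (potentially
 * expensive) parallel-safety checks are skipped.
 */
static bool
insert_has_underlying_select(const RangeTblEntry *rtable, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
    {
        if (rtable[i].rtekind == RTE_SUBQUERY)
            return true;
    }
    return false;
}
```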

> * Regarding function names:
>
> +static bool trigger_max_parallel_hazard_for_modify(TriggerDesc *trigdesc,
> +
> max_parallel_hazard_context *context);
> +static bool index_expr_max_parallel_hazard_for_modify(Relation rel,
> +
> max_parallel_hazard_context *context);
> +static bool domain_max_parallel_hazard_for_modify(Oid typid,
> max_parallel_hazard_context *context);
> +static bool rel_max_parallel_hazard_for_modify(Relation rel,
> +                                              CmdType command_type,
> +
> max_parallel_hazard_context *context)
>
> IMO, it would be better to name these
> target_rel_trigger_max_parallel_hazard(),
> target_rel_index_max_parallel_hazard(), etc. rather than have
> _for_modify at the end of these names to better connote that they
> check the parallel safety of applying the modify operation to a given
> target relation.  Also, put these prototypes just below that of
> max_parallel_hazard() to have related things close by.
>
> Attached please see v15_delta.diff showing the changes suggested above.
>

OK, sounds reasonable. Thanks for the patch!

> * I suspect that the following is broken in light of concurrent
> attachment of partitions.
>
> +
> +       /* Recursively check each partition ... */
> +       pdesc = RelationGetPartitionDesc(rel);
>
> I think we'd need to use CreatePartitionDirectory() and retrieve the
> PartitionDesc using PartitionDirectoryLookup().  Something we already
> do when opening partitions for SELECT planning.
>
> * I think that the concerns raised by Tsunakawa-san in:
>
>
https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com
>
> regarding how this interacts with plancache.c deserve a look.
> Specifically, a plan that uses parallel insert may fail to be
> invalidated when partitions are altered directly (that is without
> altering their root parent).  That would be because we are not adding
> partition OIDs to PlannerGlobal.invalItems despite making a plan
> that's based on checking their properties.  See this (tested with all
> patches applied!):
>
> create table rp (a int) partition by range (a);
> create table rp1 partition of rp for values from (minvalue) to (0);
> create table rp2 partition of rp for values from (0) to (maxvalue);
> create table foo (a) as select generate_series(1, 1000000);
> prepare q as insert into rp select * from foo where a%2 = 0;
> explain execute q;
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Gather  (cost=1000.00..13041.54 rows=5642 width=0)
>    Workers Planned: 2
>    ->  Insert on rp  (cost=0.00..11477.34 rows=0 width=0)
>          ->  Parallel Seq Scan on foo  (cost=0.00..11477.34 rows=2351 width=4)
>                Filter: ((a % 2) = 0)
> (5 rows)
>
> -- create a parallel unsafe trigger (that's actually marked so)
> directly on a partition
> create or replace function make_table () returns trigger language
> plpgsql as $$ begin create table bar(); return null; end; $$ parallel
> unsafe;
> create trigger ai_rp2 after insert on rp2 for each row execute
> function make_table();
> CREATE TRIGGER
>
> -- plan still parallel
> explain execute q;
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Gather  (cost=1000.00..13041.54 rows=5642 width=0)
>    Workers Planned: 2
>    ->  Insert on rp  (cost=0.00..11477.34 rows=0 width=0)
>          ->  Parallel Seq Scan on foo  (cost=0.00..11477.34 rows=2351 width=4)
>                Filter: ((a % 2) = 0)
> (5 rows)
>
> -- and because it is
> execute q;
> ERROR:  cannot start commands during a parallel operation
> CONTEXT:  SQL statement "create table bar()"
> PL/pgSQL function make_table() line 1 at SQL statement
>
> -- OTOH, altering parent correctly discards the parallel plan
> create trigger ai_rp after insert on rp for each row execute function
> make_table();
> explain execute q;
>                            QUERY PLAN
> ----------------------------------------------------------------
>  Insert on rp  (cost=0.00..19425.00 rows=0 width=0)
>    ->  Seq Scan on foo  (cost=0.00..19425.00 rows=5000 width=4)
>          Filter: ((a % 2) = 0)
> (3 rows)
>
> It's fair to argue that it would rarely make sense to use PREPARE for
> bulk loads, but we need to tighten things up a bit here regardless.
>
>

Thanks, looks like you've identified some definite issues in the
partition support and some missing test cases to help detect them.
I'll look into it.

Thanks very much for your review of this and suggestions.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 8, 2021 at 8:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > Did it actually use a parallel plan in your testing?
> > > When I ran these tests with the Parallel INSERT patch applied, it did
> > > not naturally choose a parallel plan for any of these cases.
> >
> > Yes, these cases pick a parallel plan naturally in my test environment.
> >
> > postgres=# explain verbose insert into testscan select a from x where
> > a<80000 or (a%2=0 and a>199900000);
> >                                             QUERY PLAN
> > ----------------------------------------------------------------------
> > -----------------------------
> >  Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
> >    Workers Planned: 4
> >    ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0
> > width=0)
> >          ->  Parallel Bitmap Heap Scan on public.x1
> > (cost=3346.89..1272067.44 rows=20343 width=8)
> >                Output: x1.a, NULL::integer
> >                Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
> >                Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a >
> > 199900000)))
> >                ->  BitmapOr  (cost=3346.89..3346.89 rows=178808
> > width=0)
> >                      ->  Bitmap Index Scan on x1_a_idx
> > (cost=0.00..1495.19 rows=80883 width=0)
> >                            Index Cond: (x1.a < 80000)
> >                      ->  Bitmap Index Scan on x1_a_idx
> > (cost=0.00..1811.01 rows=97925 width=0)
> >                            Index Cond: (x1.a > 199900000)
> >
> > PSA is my postgresql.conf file, maybe you can have a look. Besides, I didn't
> > do any parameters tuning in my test session.
>
> I reproduced this on my machine.
>
> I think we'd better do "analyze" before insert which helps reproduce this easier.
> Like:
>
> -----
> analyze;
> explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
> -----

OK then.
Can you check whether, if just the underlying SELECT is run (without
the INSERT), there is any performance degradation compared to a
non-parallel scan?

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > > postgres=# explain verbose insert into testscan select a from x
> > > where
> > > a<80000 or (a%2=0 and a>199900000);
> > >                                             QUERY PLAN
> > >
> --------------------------------------------------------------------
> > > --
> > > -----------------------------
> > >  Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
> > >    Workers Planned: 4
> > >    ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0
> > > width=0)
> > >          ->  Parallel Bitmap Heap Scan on public.x1
> > > (cost=3346.89..1272067.44 rows=20343 width=8)
> > >                Output: x1.a, NULL::integer
> > >                Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
> > >                Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND
> > > (x1.a >
> > > 199900000)))
> > >                ->  BitmapOr  (cost=3346.89..3346.89 rows=178808
> > > width=0)
> > >                      ->  Bitmap Index Scan on x1_a_idx
> > > (cost=0.00..1495.19 rows=80883 width=0)
> > >                            Index Cond: (x1.a < 80000)
> > >                      ->  Bitmap Index Scan on x1_a_idx
> > > (cost=0.00..1811.01 rows=97925 width=0)
> > >                            Index Cond: (x1.a > 199900000)
> > >
> > > PSA is my postgresql.conf file, maybe you can have a look. Besides,
> > > I didn't do any parameters tuning in my test session.
> >
> > I reproduced this on my machine.
> >
> > I think we'd better do "analyze" before insert which helps reproduce this
> easier.
> > Like:
> >
> > -----
> > analyze;
> > explain analyze verbose insert into testscan select a from x where
> > a<80000 or (a%2=0 and a>199900000);
> > -----
> 
> OK then.
> Can you check if just the underlying SELECTs are run (without INSERT), is
> there any performance degradation when compared to a non-parallel scan?

It seems there is no performance degradation without the INSERT.

What I have found so far is that:
With Tang's conf, a parallel insert generates more WAL records than a serial insert
(IMO, this is the main reason for the performance degradation).
See the attachment for the plan info.

I tried altering the target table to unlogged, and
then the performance degradation no longer happens.

The additional WAL records seem related to the index on the target table.
If the target table does not have any index, the WAL records are the same between the parallel plan and the serial plan.
Also, there is no performance degradation without the index.

I am still looking at this problem; if anyone has any thoughts about it,
I'd be grateful if you could share them.
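For completeness, the WAL difference can also be measured directly with EXPLAIN's WAL option (available since PostgreSQL 13). This is only a sketch, reusing the testscan/x tables from the runs above:

```sql
-- Compare WAL volume for the same statement under the serial and
-- parallel plans (toggle e.g. max_parallel_workers_per_gather).
EXPLAIN (ANALYZE, BUFFERS, WAL)
INSERT INTO testscan
SELECT a FROM x WHERE a < 80000 OR (a % 2 = 0 AND a > 199900000);

-- Take WAL out of the picture entirely; if the degradation disappears,
-- the extra WAL (mostly index records) is the likely culprit.
ALTER TABLE testscan SET UNLOGGED;
```

The WAL option reports records and bytes per plan node, which is where the record/byte counts further down in this thread come from.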

Best regards,
houzj






Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Tue, Feb 9, 2021 at 5:49 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > > postgres=# explain verbose insert into testscan select a from x
> > > > where
> > > > a<80000 or (a%2=0 and a>199900000);
> > > >                                             QUERY PLAN
> > > >
> > --------------------------------------------------------------------
> > > > --
> > > > -----------------------------
> > > >  Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
> > > >    Workers Planned: 4
> > > >    ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0
> > > > width=0)
> > > >          ->  Parallel Bitmap Heap Scan on public.x1
> > > > (cost=3346.89..1272067.44 rows=20343 width=8)
> > > >                Output: x1.a, NULL::integer
> > > >                Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
> > > >                Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND
> > > > (x1.a >
> > > > 199900000)))
> > > >                ->  BitmapOr  (cost=3346.89..3346.89 rows=178808
> > > > width=0)
> > > >                      ->  Bitmap Index Scan on x1_a_idx
> > > > (cost=0.00..1495.19 rows=80883 width=0)
> > > >                            Index Cond: (x1.a < 80000)
> > > >                      ->  Bitmap Index Scan on x1_a_idx
> > > > (cost=0.00..1811.01 rows=97925 width=0)
> > > >                            Index Cond: (x1.a > 199900000)
> > > >
> > > > PSA is my postgresql.conf file, maybe you can have a look. Besides,
> > > > I didn't do any parameters tuning in my test session.
> > >
> > > I reproduced this on my machine.
> > >
> > > I think we'd better do "analyze" before insert which helps reproduce this
> > easier.
> > > Like:
> > >
> > > -----
> > > analyze;
> > > explain analyze verbose insert into testscan select a from x where
> > > a<80000 or (a%2=0 and a>199900000);
> > > -----
> >
> > OK then.
> > Can you check if just the underlying SELECTs are run (without INSERT), is
> > there any performance degradation when compared to a non-parallel scan?
>
> It seems there is no performance degradation without insert.
>
> Till now, what I found is that:
> With tang's conf, when doing parallel insert, the walrecord is more than serial insert
> (IMO, this is the main reason why it has performance degradation)
> See the attatchment for the plan info.
>
> I have tried alter the target table to unlogged and
> then the performance degradation will not happen any more.
>
> And the additional walrecord seems related to the index on the target table.
>

I think you might want to see which exact WAL records are extra by
using pg_waldump?
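For what it's worth, one possible way to aggregate that. The pg_waldump invocation itself is a sketch (the WAL segment name is hypothetical and the exact desc format depends on the server version); the counting pipeline is demonstrated on canned pg_waldump-style lines:

```shell
# Hypothetical invocation (adjust the WAL segment name to your cluster):
#   pg_waldump --rmgr=Btree "$PGDATA"/pg_wal/000000010000000000000010 \
#     | sed 's/.*desc: \([A-Z_]*\).*/\1/' | sort | uniq -c | sort -rn
#
# The aggregation step, shown on canned pg_waldump-like output:
printf '%s\n' \
  'rmgr: Btree len (rec/tot): 64/64, desc: INSERT_LEAF off 42' \
  'rmgr: Btree len (rec/tot): 64/64, desc: INSERT_LEAF off 43' \
  'rmgr: Btree len (rec/tot): 1024/1024, desc: SPLIT_R level 0' |
  sed 's/.*desc: \([A-Z_]*\).*/\1/' | sort | uniq -c | sort -rn
```

Running this once against the serial load's WAL range and once against the parallel load's makes the per-record-type difference easy to compare.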

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> Till now, what I found is that:
> With tang's conf, when doing parallel insert, the walrecord is more than serial
> insert (IMO, this is the main reason why it has performance degradation) See
> the attatchment for the plan info.
> 
> I have tried alter the target table to unlogged and then the performance
> degradation will not happen any more.
> 
> And the additional walrecord seems related to the index on the target table.
> If the target table does not have any index, the wal record is the same between
> parallel plan and serial plan.
> Also, it does not have performance degradation without index.


[serial]
 Insert on public.testscan  (cost=3272.20..3652841.26 rows=0 width=0) (actual time=360.474..360.476 rows=0 loops=1)
   Buffers: shared hit=392569 read=3 dirtied=934 written=933
   WAL: records=260354 bytes=16259841

[parallel]
   ->  Insert on public.testscan  (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5)
         Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
         WAL: records=260498 bytes=17019359


More pages are dirtied and written in the parallel execution.  Aren't the index and possibly the target table bigger
with parallel execution than with serial execution?  That may be due to the difference in how the index keys are inserted.


Regards
Takayuki Tsunakawa



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > Till now, what I found is that:
> > With tang's conf, when doing parallel insert, the walrecord is more
> > than serial insert (IMO, this is the main reason why it has
> > performance degradation) See the attatchment for the plan info.
> >
> > I have tried alter the target table to unlogged and then the
> > performance degradation will not happen any more.
> >
> > And the additional walrecord seems related to the index on the target
> table.
> > If the target table does not have any index, the wal record is the
> > same between parallel plan and serial plan.
> > Also, it does not have performance degradation without index.
> 
> 
> [serial]
>  Insert on public.testscan  (cost=3272.20..3652841.26 rows=0 width=0)
> (actual time=360.474..360.476 rows=0 loops=1)
>    Buffers: shared hit=392569 read=3 dirtied=934 written=933
>    WAL: records=260354 bytes=16259841
> 
> [parallel]
>    ->  Insert on public.testscan  (cost=3272.20..1260119.35 rows=0
> width=0) (actual time=378.227..378.229 rows=0 loops=5)
>          Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
>          WAL: records=260498 bytes=17019359
> 
> 
> More pages are dirtied and written in the parallel execution.  Aren't the
> index and possibly the target table bigger with parallel execution than
> with serial execution?  That may be due to the difference of inserts of
> index keys.

Yes, the table size and index size are bigger with parallel execution.

table and index's size after parallel insert
----------------------
postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
 pg_size_pretty
----------------
 4048 kB
(1 row)

postgres=#
postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
 pg_size_pretty
----------------
 4768 kB
(1 row)
----------------------

table and index's size after serial insert 
----------------------
postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
 pg_size_pretty
----------------
 2864 kB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
 pg_size_pretty
----------------
 4608 kB
----------------------


To Amit:
> I think you might want to see which exact WAL records are extra by using pg_waldump?

Yes, thanks for the hint. I was doing that, and the result is as follows:

The heap WAL record count is the same for parallel and serial (129999, which matches the row count of the query result).

Parallel btree WAL records (130500 records):
----------------------
INSERT_LEAF:129500
INSERT_UPPER:497
SPLIT_L:172
SPLIT_R:328
INSERT_POST:0
DEDUP:0
VACUUM:0
DELETE:0
MARK_PAGE_HALFDEAD:0
UNLINK_PAGE:0
UNLINK_PAGE_META:0
NEWROOT:3
REUSE_PAGE:0
META_CLEANUP:0
----------------------

Serial btree WAL records (130355 records):
----------------------
INSERT_LEAF:129644
INSERT_UPPER:354
SPLIT_L:0
SPLIT_R:355
INSERT_POST:0
DEDUP:0
VACUUM:0
DELETE:0
MARK_PAGE_HALFDEAD:0
UNLINK_PAGE:0
UNLINK_PAGE_META:0
NEWROOT:2
REUSE_PAGE:0
META_CLEANUP:0
----------------------

IMO, due to the different insert pattern with parallel execution,
the btree insert cost is higher than in the serial case.

At the same time, the parallel bitmap scan does not provide a large performance gain,
so the extra cost of the btree index inserts results in an overall performance degradation.
Does that make sense?
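That split profile (SPLIT_L appearing only under parallel insert) is what you would expect when several sorted streams of keys interleave: non-rightmost leaf pages overflow and split 50/50, while a single ascending stream only ever splits the rightmost page. The following toy model illustrates the effect; it is purely illustrative, not PostgreSQL's actual nbtree algorithm, and the page capacity and worker interleaving are made-up parameters:

```python
import bisect

def simulate(keys, capacity=100):
    """Toy model of b-tree leaf pages under a stream of inserts.

    A split of the rightmost page by a new maximum key moves only that
    key to a fresh page (mimicking the ascending-insert optimization,
    cf. SPLIT_R); any other overflow splits the page 50/50 (cf. SPLIT_L).
    Returns (page_count, rightmost_splits, middle_splits).
    """
    pages = [[]]                    # each page: a sorted list of keys
    right_splits = mid_splits = 0
    for k in keys:
        # find the first page whose current maximum is >= k
        i = 0
        while i < len(pages) - 1 and pages[i] and k > pages[i][-1]:
            i += 1
        page = pages[i]
        bisect.insort(page, k)
        if len(page) > capacity:
            if i == len(pages) - 1 and k == page[-1]:
                right_splits += 1   # rightmost split: left page stays dense
                pages[i:i + 1] = [page[:-1], page[-1:]]
            else:
                mid_splits += 1     # middle split: both halves half-full
                half = len(page) // 2
                pages[i:i + 1] = [page[:half], page[half:]]
    return len(pages), right_splits, mid_splits

# one ascending stream (serial insert) vs. five interleaved ascending
# streams (a crude stand-in for five parallel workers on disjoint ranges)
serial = list(range(10_000))
chunks = [list(range(j * 2_000, (j + 1) * 2_000)) for j in range(5)]
interleaved = [c[i] for i in range(2_000) for c in chunks]

print("serial      (pages, right, mid):", simulate(serial))
print("interleaved (pages, right, mid):", simulate(interleaved))
```

In this model the serial stream produces only rightmost splits and dense pages, while the interleaved stream produces many middle splits and roughly twice the page count, which is consistent with the larger index observed after the parallel insert.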

Best regards,
Houzj




Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Tue, Feb 9, 2021 at 10:30 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > That brings me to to this part of the hunk:
> >
> > +   /*
> > +    * If there is no underlying SELECT, a parallel table-modification
> > +    * operation is not possible (nor desirable).
> > +    */
> > +   hasSubQuery = false;
> > +   foreach(lc, parse->rtable)
> > +   {
> > +       rte = lfirst_node(RangeTblEntry, lc);
> > +       if (rte->rtekind == RTE_SUBQUERY)
> > +       {
> > +           hasSubQuery = true;
> > +           break;
> > +       }
> > +   }
> > +   if (!hasSubQuery)
> > +       return PROPARALLEL_UNSAFE;
> >
> > The justification for this given in:
> >
> > https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com
> >
> > seems to be that the failure of a test case in
> > partition-concurrent-attach isolation suite is prevented if finding no
> > subquery RTEs in the query is flagged as parallel unsafe, which in
> > turn stops max_parallel_hazard_modify() from locking partitions for
> > safety checks in such cases.  But it feels unprincipled to have this
> > code to work around a specific test case that's failing.  I'd rather
> > edit the failing test case to disable parallel execution as
> > Tsunakawa-san suggested.
> >
>
> The code was not changed because of the test case (though it was
> fortunate that the test case worked after the change).

Ah, I misread then, sorry.

> The code check that you have identified above ensures that the INSERT
> has an underlying SELECT, because the planner won't (and shouldn't
> anyway) generate a parallel plan for INSERT...VALUES, so there is no
> point doing any parallel-safety checks in this case.
> It just so happens that the problem test case uses INSERT...VALUES -
> and it shouldn't have triggered the parallel-safety checks for
> parallel INSERT for this case anyway, because INSERT...VALUES can't
> (and shouldn't) be parallelized.

AFAICS, max_parallel_hazard() path never bails from doing further
safety checks based on anything other than finding a query component
whose hazard level crosses context->max_interesting.  You're trying to
add something that bails based on second-guessing that a parallel path
would not be chosen, which I find somewhat objectionable.

If the main goal of bailing out is to avoid doing the potentially
expensive modification safety check on the target relation, maybe we
should try to somehow make the check less expensive.  I remember
reading somewhere in the thread about caching the result of this check
in relcache, but haven't closely studied the feasibility of doing so.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Jan 6, 2021 at 7:39 PM Antonin Houska <ah@cybertec.at> wrote:
>
>
> @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         PlannerGlobal *glob = root->glob;
>         int                     rtoffset = list_length(glob->finalrtable);
>         ListCell   *lc;
> +       Plan       *finalPlan;
>
>         /*
>          * Add all the query's RTEs to the flattened rangetable.  The live ones
> @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         }
>
>         /* Now fix the Plan tree */
> -       return set_plan_refs(root, plan, rtoffset);
> +       finalPlan = set_plan_refs(root, plan, rtoffset);
> +       if (finalPlan != NULL && IsA(finalPlan, Gather))
> +       {
> +               Plan       *subplan = outerPlan(finalPlan);
> +
> +               if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
> +               {
> +                       finalPlan->targetlist = copyObject(subplan->targetlist);
> +               }
> +       }
> +       return finalPlan;
>  }
>
> I'm not sure if the problem of missing targetlist should be handled here (BTW,
> NIL is the constant for an empty list, not NULL). Obviously this is a
> consequence of the fact that the ModifyTable node has no regular targetlist.
>
> Actually I don't quite understand why (in the current master branch) the
> targetlist initialized in set_plan_refs()
>
>         /*
>          * Set up the visible plan targetlist as being the same as
>          * the first RETURNING list. This is for the use of
>          * EXPLAIN; the executor won't pay any attention to the
>          * targetlist.  We postpone this step until here so that
>          * we don't have to do set_returning_clause_references()
>          * twice on identical targetlists.
>          */
>         splan->plan.targetlist = copyObject(linitial(newRL));
>
> is not used. Instead, ExecInitModifyTable() picks the first returning list
> again:
>
>         /*
>          * Initialize result tuple slot and assign its rowtype using the first
>          * RETURNING list.  We assume the rest will look the same.
>          */
>         mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>
> So if you set the targetlist in create_modifytable_plan() (according to
> best_path->returningLists), or even in create_modifytable_path(), and ensure
> that it gets propagated to the Gather node (generate_gather_pahs currently
> uses rel->reltarget), then you should no longer need to tweak
> setrefs.c. Moreover, ExecInitModifyTable() would no longer need to set the
> targetlist. However I don't guarantee that this is the best approach - some
> planner expert should speak up.
>


I've had a bit closer look at this particular issue.
I can see what you mean about the ModifyTable targetlist (that is set
in set_plan_refs()) getting overwritten by ExecInitModifyTable() -
which also contradicts the comment in set_plan_refs() that claims the
targetlist being set is used by EXPLAIN (which it is not). So the
current Postgres master branch does seem to be broken in that respect.

I did try your suggestion (and also remove my tweak), but I found that
in the T_Gather case of set_plan_refs() it does some processing (see
set_upper_references()) of the current Gather targetlist and subplan's
targetlist (and will then overwrite the Gather targetlist after that),
but in doing that processing it produces the error:
ERROR:  variable not found in subplan target list
I think that one of the fundamental problems is that, up to now,
ModifyTable has always been the top node in a (non-parallel) plan, but
now for Parallel INSERT we have a Gather node with ModifyTable in its
subplan. So the expected order of processing and node configuration
has changed.
For the moment (until perhaps a planner expert DOES speak up) I've
parked my temporary "fix" at the bottom of set_plan_refs(), simply
pointing the Gather node's targetlist to subplan's ModifyTable
targetlist.

    if (nodeTag(plan) == T_Gather)
    {
        Plan       *subplan = plan->lefttree;

        if (IsA(subplan, ModifyTable) &&
                        castNode(ModifyTable, subplan)->returningLists != NIL)
        {
            plan->targetlist = subplan->targetlist;
        }
    }

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > The code check that you have identified above ensures that the INSERT
> > has an underlying SELECT, because the planner won't (and shouldn't
> > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > point doing any parallel-safety checks in this case.
> > It just so happens that the problem test case uses INSERT...VALUES -
> > and it shouldn't have triggered the parallel-safety checks for
> > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > (and shouldn't) be parallelized.
>
> AFAICS, max_parallel_hazard() path never bails from doing further
> safety checks based on anything other than finding a query component
> whose hazard level crosses context->max_interesting.

It's parallel UNSAFE because it's not safe or even possible to have a
parallel plan for this.
(as UNSAFE is the max hazard level, no point in referencing
context->max_interesting).
And there are existing cases of bailing out and not doing further
safety checks (even your v15_delta.diff patch placed code - for
bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
existing case in max_parallel_hazard_walker()):

    else if (IsA(node, Query))
    {
        Query      *query = (Query *) node;

        /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
        if (query->rowMarks != NULL)
        {
            context->max_hazard = PROPARALLEL_UNSAFE;
            return true;
        }


>You're trying to
> add something that bails based on second-guessing that a parallel path
> would not be chosen, which I find somewhat objectionable.
>
> If the main goal of bailing out is to avoid doing the potentially
> expensive modification safety check on the target relation, maybe we
> should try to somehow make the check less expensive.  I remember
> reading somewhere in the thread about caching the result of this check
> in relcache, but haven't closely studied the feasibility of doing so.
>

There's no "second-guessing" involved here.
There is no underlying way of dividing up the VALUES data of
"INSERT...VALUES" amongst the parallel workers, even if the planner
was updated to produce a parallel-plan for the "INSERT...VALUES" case
(apart from the fact that spawning off parallel workers to insert that
data would almost always result in worse performance than a
non-parallel plan...)
The division of work for parallel workers is part of the table AM
(scan) implementation, which is not invoked for "INSERT...VALUES".

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> table and index's size after parallel insert
> ----------------------
> postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
>  pg_size_pretty
> ----------------
>  4048 kB
> (1 row)
> 
> postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
>  pg_size_pretty
> ----------------
>  4768 kB
> (1 row)

Which of the above shows the table size?  What does pg_indexes_size() against an index (testscan_index) return?


> IMO, due to the difference of inserts with parallel execution,
> the btree insert's cost is more than serial.
> 
> At the same time, the parallel does not have a huge performance gain with
> bitmapscan,
> So the extra cost of btree index will result in performance degradation.

How did you know that the parallelism didn't have a huge performance gain with bitmap scan?

[serial]
   ->  Bitmap Heap Scan on public.x  (cost=3272.20..3652841.26 rows=79918 width=8) (actual time=8.096..41.005 rows=129999 loops=1)

[parallel]
         ->  Parallel Bitmap Heap Scan on public.x  (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5)


Regards
Takayuki Tsunakawa



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > ----------------------
> > postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
> >  pg_size_pretty
> > ----------------
> >  4048 kB
> > (1 row)
> >
> > postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
> >  pg_size_pretty
> > ----------------
> >  4768 kB
> > (1 row)
> 
> Which of the above shows the table size?  What does pg_indexes_size()
> against an index (testscan_index) return?

Sorry, maybe the table name is a little confusing,
but 'testscan_index' is actually a table's name.

pg_indexes_size returns the total size of the indexes attached to the table.
pg_relation_size returns the table's size.

Did I miss something?



> > IMO, due to the difference of inserts with parallel execution, the
> > btree insert's cost is more than serial.
> >
> > At the same time, the parallel does not have a huge performance gain
> > with bitmapscan, So the extra cost of btree index will result in
> > performance degradation.
> 
> How did you know that the parallelism didn't have a huge performance gain
> with bitmap scan?
> 
> [serial]
>    ->  Bitmap Heap Scan on public.x  (cost=3272.20..3652841.26 rows=79918
> width=8) (actual time=8.096..41.005 rows=129999 loops=1)
> 
> [parallel]
>          ->  Parallel Bitmap Heap Scan on public.x
> (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787
> rows=26000 loops=5)

I tested the case without the INSERT (just the query using a bitmap scan):

[serial]:
postgres=# explain analyze select a from x where a<80000 or (a%2=0 and a>199900000);
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on x  (cost=3258.59..3647578.53 rows=81338 width=4) (actual time=8.091..34.222 rows=129999 loops=1)
   Recheck Cond: ((a < 80000) OR (a > 199900000))
   Filter: ((a < 80000) OR (((a % 2) = 0) AND (a > 199900000)))
   Rows Removed by Filter: 50000
   Heap Blocks: exact=975
   ->  BitmapOr  (cost=3258.59..3258.59 rows=173971 width=0) (actual time=7.964..7.965 rows=0 loops=1)
         ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.451..3.451 rows=79999 loops=1)
               Index Cond: (a < 80000)
         ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.513..4.513 rows=100000 loops=1)
               Index Cond: (a > 199900000)
 Planning Time: 0.108 ms
 Execution Time: 38.136 ms

[parallel]
postgres=# explain analyze select a from x where a<80000 or (a%2=0 and a>199900000);
                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4258.59..1266704.42 rows=81338 width=4) (actual time=9.177..22.592 rows=129999 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Bitmap Heap Scan on x  (cost=3258.59..1257570.62 rows=20334 width=4) (actual time=6.402..12.882 rows=26000 loops=5)
         Recheck Cond: ((a < 80000) OR (a > 199900000))
         Filter: ((a < 80000) OR (((a % 2) = 0) AND (a > 199900000)))
         Rows Removed by Filter: 10000
         Heap Blocks: exact=1
         ->  BitmapOr  (cost=3258.59..3258.59 rows=173971 width=0) (actual time=8.785..8.786 rows=0 loops=1)
         ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.871..3.871 rows=79999 loops=1)
               Index Cond: (a < 80000)
         ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.914..4.914 rows=100000 loops=1)
                     Index Cond: (a > 199900000)
 Planning Time: 0.158 ms
 Execution Time: 26.951 ms
(15 rows)

It does give a performance gain, but I don't think it's large enough to offset the extra index cost.
What do you think?
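Putting the numbers quoted in this thread side by side makes the trade-off visible. This is only rough back-of-the-envelope arithmetic; the values are copied from the EXPLAIN outputs and size queries above:

```python
# Values copied from the measurements earlier in the thread.
select_serial_ms = 38.136     # serial bitmap scan, SELECT only
select_parallel_ms = 26.951   # parallel bitmap scan, SELECT only

index_serial_kb = 2864        # pg_indexes_size after serial insert
index_parallel_kb = 4048      # pg_indexes_size after parallel insert

scan_speedup = select_serial_ms / select_parallel_ms
index_bloat = index_parallel_kb / index_serial_kb - 1

print(f"scan speedup from parallelism: {scan_speedup:.2f}x "
      f"({select_serial_ms - select_parallel_ms:.1f} ms saved)")
print(f"index size increase:           {index_bloat:.0%}")
```

So the scan-side win is roughly 1.4x (about 11 ms here), while the index ends up roughly 41% larger, which matches the intuition that the extra btree work can eat the scan-side gain.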

Best regards,
houzj



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> Sorry, Maybe the tablename is a little confused,
> but 'testscan_index' is actually a table's name.
...
> Did I miss something ?

No, I don't think so.  I just wanted to get the facts right.  Your EXPLAIN output shows that the target table is
testscan, as follows.  How does testscan_index relate to testscan?

   ->  Insert on public.testscan  (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5)


> It did have performance gain, but I think it's not huge enough to ignore the
> extra's index cost.
> What do you think ?

Yes... as you suspect, I'm afraid the benefit from the parallel bitmap scan may not compensate for the loss in the
parallel insert operation.

The loss is probably due to 1) more index page splits, 2) more buffer writes (table and index), and 3) internal locks
for things such as relation extension and page content protection.  To investigate 3), we would want something like
[1], which tells us the wait event statistics (wait count and time for each wait event) per session or across the
instance, like Oracle, MySQL and EDB provide.  I want to continue this in the near future.
 


[1]
Add accumulated statistics for wait event
https://commitfest.postgresql.org/28/2332/


Regards
Takayuki Tsunakawa



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com>
> From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> > It did have performance gain, but I think it's not huge enough to ignore the
> > extra's index cost.
> > What do you think ?
> 
> Yes... as you suspect, I'm afraid the benefit from parallel bitmap scan may not
> compensate for the loss of the parallel insert operation.
> 
> The loss is probably due to 1) more index page splits, 2) more buffer writes
> (table and index), and 3) internal locks for things such as relation extension
> and page content protection.  To investigate 3), we should want something
> like [1], which tells us the wait event statistics (wait count and time for each
> wait event) per session or across the instance like Oracke, MySQL and EDB
> provides.  I want to continue this in the near future.

What would the result look like if you turn off parallel_leader_participation?  If the leader is freed from
reading/writing the table and index, the index page splits and internal lock contention may decrease enough to recover
part of the loss.
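That experiment is a one-line settings change; as a sketch, again reusing the testscan/x tables from earlier in the thread:

```sql
SET parallel_leader_participation = off;
EXPLAIN (ANALYZE, BUFFERS, WAL)
INSERT INTO testscan
SELECT a FROM x WHERE a < 80000 OR (a % 2 = 0 AND a > 199900000);
```

With the setting off, the leader only gathers tuples, so any remaining degradation would point away from leader-side contention.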
 

https://www.postgresql.org/docs/devel/parallel-plans.html

"In a parallel bitmap heap scan, one process is chosen as the leader. That process performs a scan of one or more
indexes and builds a bitmap indicating which table blocks need to be visited. These blocks are then divided among the
cooperating processes as in a parallel sequential scan. In other words, the heap scan is performed in parallel, but the
underlying index scan is not."
 


BTW, the following sentences seem worth revisiting, because "the work to be done" is not the same for parallel INSERT as
for serial INSERT - the order of rows stored, table and index sizes, and what else?
 

https://www.postgresql.org/docs/devel/using-explain.html#USING-EXPLAIN-ANALYZE

"It's worth noting that although the data-modifying node can take a considerable amount of run time (here, it's
consuming the lion's share of the time), the planner does not currently add anything to the cost estimates to account
for that work. That's because the work to be done is the same for every correct query plan, so it doesn't affect
planning decisions."
 


Regards
Takayuki Tsunakawa



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > > The code check that you have identified above ensures that the INSERT
> > > has an underlying SELECT, because the planner won't (and shouldn't
> > > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > > point doing any parallel-safety checks in this case.
> > > It just so happens that the problem test case uses INSERT...VALUES -
> > > and it shouldn't have triggered the parallel-safety checks for
> > > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > > (and shouldn't) be parallelized.
> >
> > AFAICS, max_parallel_hazard() path never bails from doing further
> > safety checks based on anything other than finding a query component
> > whose hazard level crosses context->max_interesting.
>
> It's parallel UNSAFE because it's not safe or even possible to have a
> parallel plan for this.
> (as UNSAFE is the max hazard level, no point in referencing
> context->max_interesting).
> And there are existing cases of bailing out and not doing further
> safety checks (even your v15_delta.diff patch placed code - for
> bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> existing case in max_parallel_hazard_walker()):
>
>     else if (IsA(node, Query))
>     {
>         Query      *query = (Query *) node;
>
>         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
>         if (query->rowMarks != NULL)
>         {
>             context->max_hazard = PROPARALLEL_UNSAFE;
>             return true;
>         }

In my understanding, the max_parallel_hazard() query tree walk is to
find constructs that are parallel unsafe in that they call code that
can't run in parallel mode.  For example, FOR UPDATE/SHARE on
traditional heap AM tuples calls AssignTransactionId() which doesn't
support being called in parallel mode.  Likewise ON CONFLICT ... DO
UPDATE calls heap_update() which doesn't support parallelism.  I'm not
aware of any such hazards downstream of ExecValuesScan().

> >You're trying to
> > add something that bails based on second-guessing that a parallel path
> > would not be chosen, which I find somewhat objectionable.
> >
> > If the main goal of bailing out is to avoid doing the potentially
> > expensive modification safety check on the target relation, maybe we
> > should try to somehow make the check less expensive.  I remember
> > reading somewhere in the thread about caching the result of this check
> > in relcache, but haven't closely studied the feasibility of doing so.
> >
>
> There's no "second-guessing" involved here.
> There is no underlying way of dividing up the VALUES data of
> "INSERT...VALUES" amongst the parallel workers, even if the planner
> was updated to produce a parallel-plan for the "INSERT...VALUES" case
> (apart from the fact that spawning off parallel workers to insert that
> data would almost always result in worse performance than a
> non-parallel plan...)
> The division of work for parallel workers is part of the table AM
> (scan) implementation, which is not invoked for "INSERT...VALUES".

I don't disagree that the planner would not normally assign a parallel
path simply to pull values out of a VALUES list mentioned in the
INSERT command, but deciding something based on the certainty of it in
an earlier planning phase seems odd to me.  Maybe that's just me
though.

--
Amit Langote
EDB: http://www.enterprisedb.com



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Amit Langote <amitlangote09@gmail.com>
> On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com>
> wrote:
> > There's no "second-guessing" involved here.
> > There is no underlying way of dividing up the VALUES data of
> > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > (apart from the fact that spawning off parallel workers to insert that
> > data would almost always result in worse performance than a
> > non-parallel plan...)
> > The division of work for parallel workers is part of the table AM
> > (scan) implementation, which is not invoked for "INSERT...VALUES".
> 
> I don't disagree that the planner would not normally assign a parallel
> path simply to pull values out of a VALUES list mentioned in the
> INSERT command, but deciding something based on the certainty of it in
> an earlier planning phase seems odd to me.  Maybe that's just me
> though.

In terms of competitiveness, Oracle does not run INSERT VALUES in parallel:


https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-6626C70C-876C-47A4-8C01-9B66574062D8

"The INSERT VALUES statement is never executed in parallel."


And SQL Server doesn't either:

https://docs.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15

"Starting with SQL Server 2016 (13.x) and database compatibility level 130, the INSERT … SELECT statement can be
executed in parallel when inserting into heaps or clustered columnstore indexes (CCI), and using the TABLOCK hint."
 


Regards
Takayuki Tsunakawa



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 10, 2021 at 5:03 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> From: Amit Langote <amitlangote09@gmail.com>
> > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com>
> > wrote:
> > > There's no "second-guessing" involved here.
> > > There is no underlying way of dividing up the VALUES data of
> > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > > (apart from the fact that spawning off parallel workers to insert that
> > > data would almost always result in worse performance than a
> > > non-parallel plan...)
> > > The division of work for parallel workers is part of the table AM
> > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> >
> > I don't disagree that the planner would not normally assign a parallel
> > path simply to pull values out of a VALUES list mentioned in the
> > INSERT command, but deciding something based on the certainty of it in
> > an earlier planning phase seems odd to me.  Maybe that's just me
> > though.
>
> In terms of competitiveness, Oracle does not run INSERT VALUES in parallel:
>
>
https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-6626C70C-876C-47A4-8C01-9B66574062D8
>
> "The INSERT VALUES statement is never executed in parallel."
>
>
> And SQL Server doesn't either:
>
> https://docs.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15
>
> "Starting with SQL Server 2016 (13.x) and database compatibility level 130, the INSERT … SELECT statement can be
> executed in parallel when inserting into heaps or clustered columnstore indexes (CCI), and using the TABLOCK hint."

Just to be clear, I'm not suggesting that we should put effort into
making INSERT ... VALUES run in parallel.  I'm just raising my concern
about embedding the assumption in max_parallel_hazard() that it will
never make sense to do so.

Although, maybe there are other more pressing concerns to resolve, so
I will not insist too much on doing anything about this.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > > The code check that you have identified above ensures that the INSERT
> > > > has an underlying SELECT, because the planner won't (and shouldn't
> > > > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > > > point doing any parallel-safety checks in this case.
> > > > It just so happens that the problem test case uses INSERT...VALUES -
> > > > and it shouldn't have triggered the parallel-safety checks for
> > > > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > > > (and shouldn't) be parallelized.
> > >
> > > AFAICS, max_parallel_hazard() path never bails from doing further
> > > safety checks based on anything other than finding a query component
> > > whose hazard level crosses context->max_interesting.
> >
> > It's parallel UNSAFE because it's not safe or even possible to have a
> > parallel plan for this.
> > (as UNSAFE is the max hazard level, no point in referencing
> > context->max_interesting).
> > And there are existing cases of bailing out and not doing further
> > safety checks (even your v15_delta.diff patch placed code - for
> > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> > existing case in max_parallel_hazard_walker()):
> >
> >     else if (IsA(node, Query))
> >     {
> >         Query      *query = (Query *) node;
> >
> >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> >         if (query->rowMarks != NULL)
> >         {
> >             context->max_hazard = PROPARALLEL_UNSAFE;
> >             return true;
> >         }
>
> In my understanding, the max_parallel_hazard() query tree walk is to
> find constructs that are parallel unsafe in that they call code that
> can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> traditional heap AM tuples calls AssignTransactionId() which doesn't
> support being called in parallel mode.  Likewise ON CONFLICT ... DO
> UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> aware of any such hazards downstream of ExecValuesScan().
>
> > >You're trying to
> > > add something that bails based on second-guessing that a parallel path
> > > would not be chosen, which I find somewhat objectionable.
> > >
> > > If the main goal of bailing out is to avoid doing the potentially
> > > expensive modification safety check on the target relation, maybe we
> > > should try to somehow make the check less expensive.  I remember
> > > reading somewhere in the thread about caching the result of this check
> > > in relcache, but haven't closely studied the feasibility of doing so.
> > >
> >
> > There's no "second-guessing" involved here.
> > There is no underlying way of dividing up the VALUES data of
> > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > (apart from the fact that spawning off parallel workers to insert that
> > data would almost always result in worse performance than a
> > non-parallel plan...)
> > The division of work for parallel workers is part of the table AM
> > (scan) implementation, which is not invoked for "INSERT...VALUES".
>
> I don't disagree that the planner would not normally assign a parallel
> path simply to pull values out of a VALUES list mentioned in the
> INSERT command, but deciding something based on the certainty of it in
> an earlier planning phase seems odd to me.  Maybe that's just me
> though.
>

I think it is more of a case where there is neither a need for
parallelism nor a desire to support parallelism for it. The other
possibility for such a check could be at some earlier phase, say in
standard_planner [1], where we are doing checks for other constructs
for which we don't want parallelism (I think the check for
'parse->hasModifyingCTE' is quite similar). If you look at that check,
we likewise just assume other operations to be in the category of
parallel-unsafe. I think we should rule out such cases earlier rather
than later. Do you have better ideas than what Greg has done to avoid
parallelism for such cases?

[1] -
standard_planner()
{
..
if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
IsUnderPostmaster &&
parse->commandType == CMD_SELECT &&
!parse->hasModifyingCTE &&
max_parallel_workers_per_gather > 0 &&
!IsParallelWorker())
{
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
}
else
{
/* skip the query tree scan, just assume it's unsafe */
glob->maxParallelHazard = PROPARALLEL_UNSAFE;
glob->parallelModeOK = false;
}

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Feb 10, 2021 at 11:13 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>
> The loss is probably due to 1) more index page splits, 2) more buffer writes (table and index), and 3) internal locks
> for things such as relation extension and page content protection.  To investigate 3), we would want something like
> [1], which tells us the wait event statistics (wait count and time for each wait event) per session or across the
> instance, like Oracle, MySQL and EDB provide.  I want to continue this in the near future.
>
>

I think we might mitigate such losses with a different patch where we
can do a bulk insert for Insert .. Select something like we are
discussing in the other thread [1]. I wonder if these performance
characteristics are due to the underlying bitmap heap scan. What are
the results if we disable the bitmap heap scan (SET enable_bitmapscan =
off)? If that happens to be true, then we might
also want to consider if in some way we can teach parallel insert to
cost more in such cases. Another thing we can try is to integrate a
parallel-insert patch with the patch on another thread [1] and see if
that makes any difference but not sure if we want to go there at this
stage unless it is simple to try that out?

[1] - https://www.postgresql.org/message-id/20200508072545.GA9701%40telsasoft.com

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 10, 2021 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > It's parallel UNSAFE because it's not safe or even possible to have a
> > > parallel plan for this.
> > > (as UNSAFE is the max hazard level, no point in referencing
> > > context->max_interesting).
> > > And there are existing cases of bailing out and not doing further
> > > safety checks (even your v15_delta.diff patch placed code - for
> > > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> > > existing case in max_parallel_hazard_walker()):
> > >
> > >     else if (IsA(node, Query))
> > >     {
> > >         Query      *query = (Query *) node;
> > >
> > >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> > >         if (query->rowMarks != NULL)
> > >         {
> > >             context->max_hazard = PROPARALLEL_UNSAFE;
> > >             return true;
> > >         }
> >
> > In my understanding, the max_parallel_hazard() query tree walk is to
> > find constructs that are parallel unsafe in that they call code that
> > can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> > traditional heap AM tuples calls AssignTransactionId() which doesn't
> > support being called in parallel mode.  Likewise ON CONFLICT ... DO
> > UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> > aware of any such hazards downstream of ExecValuesScan().
> >
> > > >You're trying to
> > > > add something that bails based on second-guessing that a parallel path
> > > > would not be chosen, which I find somewhat objectionable.
> > > >
> > > > If the main goal of bailing out is to avoid doing the potentially
> > > > expensive modification safety check on the target relation, maybe we
> > > > should try to somehow make the check less expensive.  I remember
> > > > reading somewhere in the thread about caching the result of this check
> > > > in relcache, but haven't closely studied the feasibility of doing so.
> > > >
> > >
> > > There's no "second-guessing" involved here.
> > > There is no underlying way of dividing up the VALUES data of
> > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > > (apart from the fact that spawning off parallel workers to insert that
> > > data would almost always result in worse performance than a
> > > non-parallel plan...)
> > > The division of work for parallel workers is part of the table AM
> > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> >
> > I don't disagree that the planner would not normally assign a parallel
> > path simply to pull values out of a VALUES list mentioned in the
> > INSERT command, but deciding something based on the certainty of it in
> > an earlier planning phase seems odd to me.  Maybe that's just me
> > though.
> >
>
> I think it is more of a case where neither is a need for parallelism
> nor we want to support parallelism of it. The other possibility for
> such a check could be at some earlier phase say in standard_planner
> [1] where we are doing checks for other constructs where we don't want
> parallelism (I think the check for 'parse->hasModifyingCTE' is quite
> similar). If you see in that check as well we just assume other
> operations to be in the category of parallel-unsafe. I think we should
> rule out such cases earlier than later. Do you have better ideas than
> what Greg has done to avoid parallelism for such cases?
>
> [1] -
> standard_planner()
> {
> ..
> if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> IsUnderPostmaster &&
> parse->commandType == CMD_SELECT &&
> !parse->hasModifyingCTE &&
> max_parallel_workers_per_gather > 0 &&
> !IsParallelWorker())
> {
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> }
> else
> {
> /* skip the query tree scan, just assume it's unsafe */
> glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> glob->parallelModeOK = false;
> }

Yeah, maybe having the block I was commenting on, viz.:

+   /*
+    * If there is no underlying SELECT, a parallel table-modification
+    * operation is not possible (nor desirable).
+    */
+   hasSubQuery = false;
+   foreach(lc, parse->rtable)
+   {
+       rte = lfirst_node(RangeTblEntry, lc);
+       if (rte->rtekind == RTE_SUBQUERY)
+       {
+           hasSubQuery = true;
+           break;
+       }
+   }
+   if (!hasSubQuery)
+       return PROPARALLEL_UNSAFE;

before the standard_planner() block you quoted might be a good idea.
So something like this:

+   /*
+    * If there is no underlying SELECT, a parallel table-modification
+    * operation is not possible (nor desirable).
+    */
+   rangeTableHasSubQuery = false;
+   foreach(lc, parse->rtable)
+   {
+       rte = lfirst_node(RangeTblEntry, lc);
+       if (rte->rtekind == RTE_SUBQUERY)
+       {
+           rangeTableHasSubQuery = true;
+           break;
+       }
+   }

    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
        IsUnderPostmaster &&
        (parse->commandType == CMD_SELECT ||
         (IsModifySupportedInParallelMode(parse->commandType) &&
          rangeTableHasSubQuery)) &&
        !parse->hasModifyingCTE &&
        max_parallel_workers_per_gather > 0 &&
        !IsParallelWorker())
    {
        /* all the cheap tests pass, so scan the query tree */
        glob->maxParallelHazard = max_parallel_hazard(parse);
        glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
    }
    else
    {
        /* skip the query tree scan, just assume it's unsafe */
        glob->maxParallelHazard = PROPARALLEL_UNSAFE;
        glob->parallelModeOK = false;
    }

-- 
Amit Langote
EDB: http://www.enterprisedb.com



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Amit Langote <amitlangote09@gmail.com>
> Just to be clear, I'm not suggesting that we should put effort into
> making INSERT ... VALUES run in parallel.  I'm just raising my concern
> about embedding the assumption in max_parallel_hazard() that it will
> never make sense to do so.

I'm sorry I misunderstood your suggestion.  So, you're suggesting that it may be better to place the VALUES existence
check outside max_parallel_hazard().  (I'm a bit worried that I may be misunderstanding in yet a different way.)
 

The description of max_parallel_hazard() gave me an impression that this is the right place to check VALUES, because
its role can be paraphrased in simpler words like "Tell you if the given Query is safe for parallel execution."  In that
regard, the standard_planner()'s if conditions that check Query->commandType and Query->hasModifyingCTE can be moved
into max_parallel_hazard() too.
 

But looking closer at the description, it says "Returns the worst function hazard."  Function hazard?  Should this
function only check functions?  Or do we want to modify this description and get max_parallel_hazard() to provide more
service?


/*
 * max_parallel_hazard
 *      Find the worst parallel-hazard level in the given query
 *
 * Returns the worst function hazard property (the earliest in this list:
 * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE) that can
 * be found in the given parsetree.  We use this to find out whether the query
 * can be parallelized at all.  The caller will also save the result in
 * PlannerGlobal so as to short-circuit checks of portions of the querytree
 * later, in the common case where everything is SAFE.
 */
char
max_parallel_hazard(Query *parse)


Regards
Takayuki Tsunakawa



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > >
> > >     else if (IsA(node, Query))
> > >     {
> > >         Query      *query = (Query *) node;
> > >
> > >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> > >         if (query->rowMarks != NULL)
> > >         {
> > >             context->max_hazard = PROPARALLEL_UNSAFE;
> > >             return true;
> > >         }
> >
> > In my understanding, the max_parallel_hazard() query tree walk is to
> > find constructs that are parallel unsafe in that they call code that
> > can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> > traditional heap AM tuples calls AssignTransactionId() which doesn't
> > support being called in parallel mode.  Likewise ON CONFLICT ... DO
> > UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> > aware of any such hazards downstream of ExecValuesScan().
> >
> > > >You're trying to
> > > > add something that bails based on second-guessing that a parallel
> > > >path  would not be chosen, which I find somewhat objectionable.
> > > >
> > > > If the main goal of bailing out is to avoid doing the potentially
> > > > expensive modification safety check on the target relation, maybe
> > > > we should try to somehow make the check less expensive.  I
> > > > remember reading somewhere in the thread about caching the result
> > > > of this check in relcache, but haven't closely studied the feasibility
> of doing so.
> > > >
> > >
> > > There's no "second-guessing" involved here.
> > > There is no underlying way of dividing up the VALUES data of
> > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > was updated to produce a parallel-plan for the "INSERT...VALUES"
> > > case (apart from the fact that spawning off parallel workers to
> > > insert that data would almost always result in worse performance
> > > than a non-parallel plan...) The division of work for parallel
> > > workers is part of the table AM
> > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> >
> > I don't disagree that the planner would not normally assign a parallel
> > path simply to pull values out of a VALUES list mentioned in the
> > INSERT command, but deciding something based on the certainty of it in
> > an earlier planning phase seems odd to me.  Maybe that's just me
> > though.
> >
> 
> I think it is more of a case where neither is a need for parallelism nor
> we want to support parallelism of it. The other possibility for such a check
> could be at some earlier phase say in standard_planner [1] where we are
> doing checks for other constructs where we don't want parallelism (I think
> the check for 'parse->hasModifyingCTE' is quite similar). If you see in
> that check as well we just assume other operations to be in the category
> of parallel-unsafe. I think we should rule out such cases earlier than later.
> Do you have better ideas than what Greg has done to avoid parallelism for
> such cases?
> 
> [1] -
> standard_planner()
> {
> ..
> if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster &&
> parse->commandType == CMD_SELECT &&
> !parse->hasModifyingCTE &&
> max_parallel_workers_per_gather > 0 &&
> !IsParallelWorker())
> {
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> }
> else
> {
> /* skip the query tree scan, just assume it's unsafe */
> glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> glob->parallelModeOK = false;
> }

+1.

In the current parallel_dml option patch, I put this check and some other high-level checks in a separate function
called is_parallel_possible_for_modify.


- * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE
+ * Check at a high-level if parallel mode is able to be used for the specified
+ * table-modification statement.
+ * It's not possible in the following cases:
+ *
+ *  1) enable_parallel_dml is off
+ *  2) UPDATE or DELETE command
+ *  3) INSERT...ON CONFLICT...DO UPDATE
+ *  4) INSERT without SELECT on a relation
+ *  5) the reloption parallel_dml_enabled is not set for the target table
+ *
+ * (Note: we don't do in-depth parallel-safety checks here, we do only the
+ * cheaper tests that can quickly exclude obvious cases for which
+ * parallelism isn't supported, to avoid having to do further parallel-safety
+ * checks for these)
  */
+bool
+is_parallel_possible_for_modify(Query *parse)





And I put the function at earlier place like the following:


    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
         IsUnderPostmaster &&
         (parse->commandType == CMD_SELECT ||
-        (enable_parallel_dml &&
-        IsModifySupportedInParallelMode(parse->commandType))) &&
+        is_parallel_possible_for_modify(parse)) &&
         !parse->hasModifyingCTE &&
         max_parallel_workers_per_gather > 0 &&
         !IsParallelWorker())


If this looks good, maybe we can merge this change.

Best regards,
houzj







Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 10, 2021 at 5:52 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> From: Amit Langote <amitlangote09@gmail.com>
> > Just to be clear, I'm not suggesting that we should put effort into
> > making INSERT ... VALUES run in parallel.  I'm just raising my concern
> > about embedding the assumption in max_parallel_hazard() that it will
> > never make sense to do so.
>
> I'm sorry I misunderstood your suggestion.  So, you're suggesting that it may be better to place the VALUES existence
> check outside max_parallel_hazard().  (I'm a bit worried that I may be misunderstanding in yet a different way.)
 

To add context to my comments, here's the block of code in the patch I
was commenting on:

+   /*
+    * If there is no underlying SELECT, a parallel table-modification
+    * operation is not possible (nor desirable).
+    */
+   hasSubQuery = false;
+   foreach(lc, parse->rtable)
+   {
+       rte = lfirst_node(RangeTblEntry, lc);
+       if (rte->rtekind == RTE_SUBQUERY)
+       {
+           hasSubQuery = true;
+           break;
+       }
+   }
+   if (!hasSubQuery)
+       return PROPARALLEL_UNSAFE;

For a modification query, this makes max_parallel_hazard() return that
it is unsafe to parallelize the query because it doesn't contain a
subquery RTE, or only contains a VALUES RTE.

I was trying to say that inside max_parallel_hazard() seems to be a
wrong place to reject parallelism for modification if only because
there are no subquery RTEs in the query.  Although now I'm thinking
that maybe it's okay as long as it's appropriately placed.  I shared
one suggestion in my reply to Amit K's email.

> The description of max_parallel_hazard() gave me an impression that this is the right place to check VALUES, because
> its role can be paraphrased in simpler words like "Tell you if the given Query is safe for parallel execution."
 
>
> In that regard, the standard_planner()'s if conditions that check Query->commandType and Query->hasModifyingCTE can
> be moved into max_parallel_hazard() too.
 
>
> But looking closer at the description, it says "Returns the worst function hazard."  Function hazard?  Should this
> function only check functions?  Or do we want to modify this description and get max_parallel_hazard() to provide more
> service?

Yeah, updating the description to be more general may make sense.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 10, 2021 at 5:50 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Feb 10, 2021 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > > It's parallel UNSAFE because it's not safe or even possible to have a
> > > > parallel plan for this.
> > > > (as UNSAFE is the max hazard level, no point in referencing
> > > > context->max_interesting).
> > > > And there are existing cases of bailing out and not doing further
> > > > safety checks (even your v15_delta.diff patch placed code - for
> > > > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> > > > existing case in max_parallel_hazard_walker()):
> > > >
> > > >     else if (IsA(node, Query))
> > > >     {
> > > >         Query      *query = (Query *) node;
> > > >
> > > >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> > > >         if (query->rowMarks != NULL)
> > > >         {
> > > >             context->max_hazard = PROPARALLEL_UNSAFE;
> > > >             return true;
> > > >         }
> > >
> > > In my understanding, the max_parallel_hazard() query tree walk is to
> > > find constructs that are parallel unsafe in that they call code that
> > > can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> > > traditional heap AM tuples calls AssignTransactionId() which doesn't
> > > support being called in parallel mode.  Likewise ON CONFLICT ... DO
> > > UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> > > aware of any such hazards downstream of ExecValuesScan().
> > >
> > > > >You're trying to
> > > > > add something that bails based on second-guessing that a parallel path
> > > > > would not be chosen, which I find somewhat objectionable.
> > > > >
> > > > > If the main goal of bailing out is to avoid doing the potentially
> > > > > expensive modification safety check on the target relation, maybe we
> > > > > should try to somehow make the check less expensive.  I remember
> > > > > reading somewhere in the thread about caching the result of this check
> > > > > in relcache, but haven't closely studied the feasibility of doing so.
> > > > >
> > > >
> > > > There's no "second-guessing" involved here.
> > > > There is no underlying way of dividing up the VALUES data of
> > > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > > > (apart from the fact that spawning off parallel workers to insert that
> > > > data would almost always result in worse performance than a
> > > > non-parallel plan...)
> > > > The division of work for parallel workers is part of the table AM
> > > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> > >
> > > I don't disagree that the planner would not normally assign a parallel
> > > path simply to pull values out of a VALUES list mentioned in the
> > > INSERT command, but deciding something based on the certainty of it in
> > > an earlier planning phase seems odd to me.  Maybe that's just me
> > > though.
> > >
> >
> > I think it is more of a case where neither is a need for parallelism
> > nor we want to support parallelism of it. The other possibility for
> > such a check could be at some earlier phase say in standard_planner
> > [1] where we are doing checks for other constructs where we don't want
> > parallelism (I think the check for 'parse->hasModifyingCTE' is quite
> > similar). If you see in that check as well we just assume other
> > operations to be in the category of parallel-unsafe. I think we should
> > rule out such cases earlier than later. Do you have better ideas than
> > what Greg has done to avoid parallelism for such cases?
> >
> > [1] -
> > standard_planner()
> > {
> > ..
> > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> > IsUnderPostmaster &&
> > parse->commandType == CMD_SELECT &&
> > !parse->hasModifyingCTE &&
> > max_parallel_workers_per_gather > 0 &&
> > !IsParallelWorker())
> > {
> > /* all the cheap tests pass, so scan the query tree */
> > glob->maxParallelHazard = max_parallel_hazard(parse);
> > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > }
> > else
> > {
> > /* skip the query tree scan, just assume it's unsafe */
> > glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> > glob->parallelModeOK = false;
> > }
>
> Yeah, maybe having the block I was commenting on, viz.:
>
> +   /*
> +    * If there is no underlying SELECT, a parallel table-modification
> +    * operation is not possible (nor desirable).
> +    */
> +   hasSubQuery = false;
> +   foreach(lc, parse->rtable)
> +   {
> +       rte = lfirst_node(RangeTblEntry, lc);
> +       if (rte->rtekind == RTE_SUBQUERY)
> +       {
> +           hasSubQuery = true;
> +           break;
> +       }
> +   }
> +   if (!hasSubQuery)
> +       return PROPARALLEL_UNSAFE;
>
> before the standard_planner() block you quoted might be a good idea.
> So something like this:
>
> +   /*
> +    * If there is no underlying SELECT, a parallel table-modification
> +    * operation is not possible (nor desirable).
> +    */
> +   rangeTableHasSubQuery = false;
> +   foreach(lc, parse->rtable)
> +   {
> +       rte = lfirst_node(RangeTblEntry, lc);
> +       if (rte->rtekind == RTE_SUBQUERY)
> +       {
> +           rangeTableHasSubQuery = true;
> +           break;
> +       }
> +   }
>
>     if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
>         IsUnderPostmaster &&
>         (parse->commandType == CMD_SELECT ||
>          (IsModifySupportedInParallelMode(parse->commandType) &&
>           rangeTableHasSubQuery)) &&
>         !parse->hasModifyingCTE &&
>         max_parallel_workers_per_gather > 0 &&
>         !IsParallelWorker())
>     {
>         /* all the cheap tests pass, so scan the query tree */
>         glob->maxParallelHazard = max_parallel_hazard(parse);
>         glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
>     }
>     else
>     {
>         /* skip the query tree scan, just assume it's unsafe */
>         glob->maxParallelHazard = PROPARALLEL_UNSAFE;
>         glob->parallelModeOK = false;
>     }
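
As a cross-check of the proposed logic, the range-table scan can be modeled as a
standalone helper (a minimal sketch using stub types and a plain array in place
of PostgreSQL's List; the names are hypothetical, not the actual implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL's RTEKind and RangeTblEntry. */
typedef enum RTEKind
{
    RTE_RELATION,
    RTE_SUBQUERY,
    RTE_VALUES
} RTEKind;

typedef struct RangeTblEntry
{
    RTEKind rtekind;
} RangeTblEntry;

/*
 * Return true if any entry in the range table is a subquery RTE,
 * mirroring the foreach loop in the proposed patch (over a plain
 * array here rather than a PostgreSQL List).
 */
static bool
range_table_has_subquery(const RangeTblEntry *rtable, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
    {
        if (rtable[i].rtekind == RTE_SUBQUERY)
            return true;
    }
    return false;
}
```

For "INSERT ... VALUES" there is no subquery RTE, so the helper returns false
and parallelism would be ruled out up front.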

On second thought, maybe we could even put the hasSubQuery-based
short-circuit in the following block of max_parallel_hazard_walker():

    /*
     * When we're first invoked on a completely unplanned tree, we must
     * recurse into subqueries so as to locate parallel-unsafe constructs
     * anywhere in the tree.
     */
    else if (IsA(node, Query))
    {
        Query      *query = (Query *) node;

        /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
        if (query->rowMarks != NULL)
        {
            context->max_hazard = PROPARALLEL_UNSAFE;
            return true;
        }

Also, update the comment to mention we bail out if (!hasSubQuery) as a
special case.
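
A minimal sketch of how that Query-case short-circuit might look, using
hypothetical simplified stand-ins for the planner structures (not the real
PostgreSQL types or the real walker):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified stand-ins for the planner structures involved. */
typedef enum CmdType
{
    CMD_SELECT,
    CMD_INSERT
} CmdType;

typedef struct Query
{
    CmdType commandType;
    bool    hasRowMarks;    /* stands in for query->rowMarks != NULL */
    bool    hasSubQueryRTE; /* result of the range-table scan */
} Query;

#define PROPARALLEL_SAFE   's'
#define PROPARALLEL_UNSAFE 'u'

/*
 * Sketch of the proposed short-circuits at the top of the Query case in
 * max_parallel_hazard_walker(): treat SELECT FOR UPDATE/SHARE as unsafe,
 * and, as the suggested special case, treat a table-modification query
 * with no underlying SELECT (e.g. INSERT ... VALUES) as unsafe too.
 */
static char
query_parallel_hazard(const Query *query)
{
    if (query->hasRowMarks)
        return PROPARALLEL_UNSAFE;

    if (query->commandType != CMD_SELECT && !query->hasSubQueryRTE)
        return PROPARALLEL_UNSAFE;

    return PROPARALLEL_SAFE;    /* in reality, keep walking the tree */
}
```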

-- 
Amit Langote
EDB: http://www.enterprisedb.com



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> What are the results if you disable the bitmap heap scan (set enable_bitmapscan
> = off)? If that happens to be true, then we might also want to consider
> if in some way we can teach parallel insert to cost more in such cases.
> Another thing we can try is to integrate a parallel-insert patch with the
> patch on another thread [1] and see if that makes any difference but not
> sure if we want to go there at this stage unless it is simple to try that
> out?

If we disable bitmapscan, the performance degradation does not seem to happen.

[Parallel]
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL)  insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..2090216.68 rows=81338 width=0) (actual time=0.226..5488.455 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   Buffers: shared hit=393364 read=1079535 dirtied=984 written=1027
   WAL: records=260400 bytes=16549513
   ->  Insert on public.testscan  (cost=0.00..2081082.88 rows=0 width=0) (actual time=5483.113..5483.114 rows=0 loops=4)
         Buffers: shared hit=393364 read=1079535 dirtied=984 written=1027
         WAL: records=260400 bytes=16549513
         Worker 0:  actual time=5483.116..5483.117 rows=0 loops=1
           Buffers: shared hit=36306 read=264288 dirtied=100 written=49
           WAL: records=23895 bytes=1575860
         Worker 1:  actual time=5483.220..5483.222 rows=0 loops=1
           Buffers: shared hit=39750 read=280476 dirtied=101 written=106
           WAL: records=26141 bytes=1685083
         Worker 2:  actual time=5482.844..5482.845 rows=0 loops=1
           Buffers: shared hit=38660 read=263713 dirtied=105 written=250
           WAL: records=25318 bytes=1657396
         Worker 3:  actual time=5483.272..5483.274 rows=0 loops=1
           Buffers: shared hit=278648 read=271058 dirtied=678 written=622
           WAL: records=185046 bytes=11631174
         ->  Parallel Seq Scan on public.x  (cost=0.00..2081082.88 rows=20334 width=8) (actual time=4001.641..5287.248 rows=32500 loops=4)
               Output: x.a, NULL::integer
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
               Rows Removed by Filter: 49967500
               Buffers: shared hit=1551 read=1079531
               Worker 0:  actual time=5335.456..5340.757 rows=11924 loops=1
                 Buffers: shared hit=281 read=264288
               Worker 1:  actual time=5335.559..5341.766 rows=13049 loops=1
                 Buffers: shared hit=281 read=280476
               Worker 2:  actual time=5335.534..5340.964 rows=12636 loops=1
                 Buffers: shared hit=278 read=263712
               Worker 3:  actual time=0.015..5125.503 rows=92390 loops=1
                 Buffers: shared hit=711 read=271055
 Planning:
   Buffers: shared hit=19
 Planning Time: 0.175 ms
 Execution Time: 5488.493 ms

[Serial]
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL)  insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan  (cost=0.00..5081085.52 rows=0 width=0) (actual time=19311.642..19311.643 rows=0 loops=1)
   Buffers: shared hit=392485 read=1079694 dirtied=934 written=933
   WAL: records=260354 bytes=16259841
   ->  Seq Scan on public.x  (cost=0.00..5081085.52 rows=81338 width=8) (actual time=0.010..18997.317 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 199870001
         Buffers: shared hit=1391 read=1079691
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.125 ms
 Execution Time: 19311.700 ms

Best regards,
houzj



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > > It did have performance gain, but I think it's not huge enough to
> > > ignore the extra index cost.
> > > What do you think ?
> >
> > Yes... as you suspect, I'm afraid the benefit from parallel bitmap
> > scan may not compensate for the loss of the parallel insert operation.
> >
> > The loss is probably due to 1) more index page splits, 2) more buffer
> > writes (table and index), and 3) internal locks for things such as
> > relation extension and page content protection.  To investigate 3), we
> > would want something like [1], which tells us the wait event
> > statistics (wait count and time for each wait event) per session or
> > across the instance, as Oracle, MySQL and EDB provide.  I want to
> > continue this in the near future.
> 
> What would the result look like if you turn off
> parallel_leader_participation?  If the leader is freed from
> reading/writing the table and index, the index page splits and internal
> lock contention may decrease enough to recover part of the loss.
> 
> https://www.postgresql.org/docs/devel/parallel-plans.html
> 
> "In a parallel bitmap heap scan, one process is chosen as the leader. That
> process performs a scan of one or more indexes and builds a bitmap indicating
> which table blocks need to be visited. These blocks are then divided among
> the cooperating processes as in a parallel sequential scan. In other words,
> the heap scan is performed in parallel, but the underlying index scan is
> not."

If I disable parallel_leader_participation:

For max_parallel_workers_per_gather = 4, there is still performance degradation.

For max_parallel_workers_per_gather = 2, the performance degradation does not happen in most cases.
There is occasional noise (performance degradation), but most results (about 80%) are good.

Best regards,
houzj



Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Feb 10, 2021 at 8:59 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > >
> > > >     else if (IsA(node, Query))
> > > >     {
> > > >         Query      *query = (Query *) node;
> > > >
> > > >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> > > >         if (query->rowMarks != NULL)
> > > >         {
> > > >             context->max_hazard = PROPARALLEL_UNSAFE;
> > > >             return true;
> > > >         }
> > >
> > > In my understanding, the max_parallel_hazard() query tree walk is to
> > > find constructs that are parallel unsafe in that they call code that
> > > can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> > > traditional heap AM tuples calls AssignTransactionId() which doesn't
> > > support being called in parallel mode.  Likewise ON CONFLICT ... DO
> > > UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> > > aware of any such hazards downstream of ExecValuesScan().
> > >
> > > > >You're trying to
> > > > > add something that bails based on second-guessing that a parallel
> > > > >path  would not be chosen, which I find somewhat objectionable.
> > > > >
> > > > > If the main goal of bailing out is to avoid doing the potentially
> > > > > expensive modification safety check on the target relation, maybe
> > > > > we should try to somehow make the check less expensive.  I
> > > > > remember reading somewhere in the thread about caching the result
> > > > > of this check in relcache, but haven't closely studied the feasibility
> > of doing so.
> > > > >
> > > >
> > > > There's no "second-guessing" involved here.
> > > > There is no underlying way of dividing up the VALUES data of
> > > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > > was updated to produce a parallel-plan for the "INSERT...VALUES"
> > > > case (apart from the fact that spawning off parallel workers to
> > > > insert that data would almost always result in worse performance
> > > > than a non-parallel plan...) The division of work for parallel
> > > > workers is part of the table AM
> > > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> > >
> > > I don't disagree that the planner would not normally assign a parallel
> > > path simply to pull values out of a VALUES list mentioned in the
> > > INSERT command, but deciding something based on the certainty of it in
> > > an earlier planning phase seems odd to me.  Maybe that's just me
> > > though.
> > >
> >
> > I think it is more a case where there is neither a need for parallelism nor
> > a desire to support parallelism for it. The other possibility for such a check
> > could be at some earlier phase say in standard_planner [1] where we are
> > doing checks for other constructs where we don't want parallelism (I think
> > the check for 'parse->hasModifyingCTE' is quite similar). If you see in
> > that check as well we just assume other operations to be in the category
> > of parallel-unsafe. I think we should rule out such cases earlier than later.
> > Do you have better ideas than what Greg has done to avoid parallelism for
> > such cases?
> >
> > [1] -
> > standard_planner()
> > {
> > ..
> > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster &&
> > parse->commandType == CMD_SELECT &&
> > !parse->hasModifyingCTE &&
> > max_parallel_workers_per_gather > 0 &&
> > !IsParallelWorker())
> > {
> > /* all the cheap tests pass, so scan the query tree */
> > glob->maxParallelHazard = max_parallel_hazard(parse);
> > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > }
> > else
> > {
> > /* skip the query tree scan, just assume it's unsafe */
> > glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> > glob->parallelModeOK = false;
> > }
>
> +1.
>
> In the current parallel_dml option patch, I put this check and some other high-level checks in a separate function called
> is_parallel_possible_for_modify.
>
>
> - * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE
> + * Check at a high level whether parallel mode can be used for the specified
> + * table-modification statement.
> + * It's not possible in the following cases:
> + *
> + *  1) enable_parallel_dml is off
> + *  2) UPDATE or DELETE command
> + *  3) INSERT...ON CONFLICT...DO UPDATE
> + *  4) INSERT without SELECT on a relation
> + *  5) the reloption parallel_dml_enabled is not set for the target table
> + *
> + * (Note: we don't do in-depth parallel-safety checks here, we do only the
> + * cheaper tests that can quickly exclude obvious cases for which
> + * parallelism isn't supported, to avoid having to do further parallel-safety
> + * checks for these)
>   */
> +bool
> +is_parallel_possible_for_modify(Query *parse)
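
The five exclusions listed in the comment above could be sketched roughly as
follows (a hedged illustration with hypothetical simplified inputs; the real
function inspects the actual Query tree and catalog state):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified inputs for the five cheap checks listed above. */
typedef enum CmdType
{
    CMD_SELECT,
    CMD_INSERT,
    CMD_UPDATE,
    CMD_DELETE
} CmdType;

typedef struct Query
{
    CmdType commandType;
    bool    hasOnConflictDoUpdate;  /* INSERT ... ON CONFLICT ... DO UPDATE */
    bool    hasSubQueryRTE;         /* the INSERT has an underlying SELECT */
    bool    relOptionParallelDml;   /* reloption parallel_dml_enabled */
} Query;

static bool enable_parallel_dml = true;    /* the GUC from the patch */

/*
 * Sketch of the high-level gate described in the comment above: only the
 * cheap exclusions, no in-depth parallel-safety checks.
 */
static bool
is_parallel_possible_for_modify(const Query *parse)
{
    if (!enable_parallel_dml)
        return false;               /* (1) GUC is off */
    if (parse->commandType == CMD_UPDATE || parse->commandType == CMD_DELETE)
        return false;               /* (2) UPDATE/DELETE unsupported */
    if (parse->hasOnConflictDoUpdate)
        return false;               /* (3) ON CONFLICT ... DO UPDATE */
    if (!parse->hasSubQueryRTE)
        return false;               /* (4) INSERT without SELECT */
    if (!parse->relOptionParallelDml)
        return false;               /* (5) reloption not set */
    return true;
}
```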
>
>
>
>
>
> And I put the function at an earlier place, like the following:
>
>
>         if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
>                 IsUnderPostmaster &&
>                 (parse->commandType == CMD_SELECT ||
> -               (enable_parallel_dml &&
> -               IsModifySupportedInParallelMode(parse->commandType))) &&
> +               is_parallel_possible_for_modify(parse)) &&
>                 !parse->hasModifyingCTE &&
>                 max_parallel_workers_per_gather > 0 &&
>                 !IsParallelWorker())
>
>
> If this looks good, maybe we can merge this change.
>

If I've understood correctly, you're suggesting to merge the
"is_parallel_possible_for_modify()" code from your parallel_dml patch
into the main parallel INSERT patch, right?
If so, I think that's a good idea, as it will help simplify both
patches (and then, if need be, we can still discuss where best to
place certain checks ...).
I'll post an update soon that includes that change (then the
parallel_dml patch will need to be rebased accordingly).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> * I think that the concerns raised by Tsunakawa-san in:
>
>
https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com
>
> regarding how this interacts with plancache.c deserve a look.
> Specifically, a plan that uses parallel insert may fail to be
> invalidated when partitions are altered directly (that is without
> altering their root parent).  That would be because we are not adding
> partition OIDs to PlannerGlobal.invalItems despite making a plan
> that's based on checking their properties.  See this (tested with all
> patches applied!):
>

Does any current Postgres code add partition OIDs to
PlannerGlobal.invalItems for a similar reason?
I would have thought that, for example, partitions with a default
column expression, using a function that is changed from SAFE to
UNSAFE, would suffer the same plancache issue (for current parallel
SELECT functionality) as we're talking about here - but so far I
haven't seen any code handling this.

(Currently invalItems seems to support PROCID and TYPEOID; relation
OIDs seem to be handled through a different mechanism).
Can you elaborate on what you believe is required here, so that the
partition OID dependency is registered and the altered partition
results in the plan being invalidated?
Thanks in advance for any help you can provide here.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Feb 11, 2021 at 5:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > * I think that the concerns raised by Tsunakawa-san in:
> >
> >
https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com
> >
> > regarding how this interacts with plancache.c deserve a look.
> > Specifically, a plan that uses parallel insert may fail to be
> > invalidated when partitions are altered directly (that is without
> > altering their root parent).  That would be because we are not adding
> > partition OIDs to PlannerGlobal.invalItems despite making a plan
> > that's based on checking their properties.  See this (tested with all
> > patches applied!):
> >
>
> Does any current Postgres code add partition OIDs to
> PlannerGlobal.invalItems for a similar reason?
> I would have thought that, for example,  partitions with a default
> column expression, using a function that is changed from SAFE to
> UNSAFE, would suffer the same plancache issue (for current parallel
> SELECT functionality) as we're talking about here - but so far I
> haven't seen any code handling this.
>
> (Currently invalItems seems to support PROCID and TYPEOID; relation
> OIDs seem to be handled through a different mechanism)..
> Can you elaborate on what you believe is required here, so that the
> partition OID dependency is registered and the altered partition
> results in the plan being invalidated?
> Thanks in advance for any help you can provide here.
>

Actually, I tried adding the following in the loop that checks the
parallel-safety of each partition and it seemed to work:

            glob->relationOids =
                    lappend_oid(glob->relationOids, pdesc->oids[i]);

Can you confirm, is that what you were referring to?
(note that I've already updated the code to use
CreatePartitionDirectory() and PartitionDirectoryLookup())
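
To make the intent of that one-liner concrete, here is a toy model of the
plancache interaction (hypothetical helper and made-up OIDs, not PostgreSQL
code): the planner records the OID of every relation it examined, including
each partition checked for parallel safety, in PlannerGlobal.relationOids,
and a cached plan must be discarded whenever a relcache invalidation arrives
for any recorded OID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/*
 * Toy model: return false (i.e. the cached plan must be rebuilt) when the
 * altered relation appears in the plan's recorded dependency list.
 */
static bool
plan_survives_inval(const Oid *relationOids, size_t noids, Oid altered_relid)
{
    for (size_t i = 0; i < noids; i++)
    {
        if (relationOids[i] == altered_relid)
            return false;   /* plan depended on the altered relation */
    }
    return true;
}
```

Without the lappend_oid() call, a directly altered partition would never
appear in that list, so the stale plan would survive.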

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Posting an updated set of patches. Changes are based on feedback, as
detailed below:

[Amit Langote]
- Integrate max_parallel_hazard_for_modify() with max_parallel_hazard()
- Some function name changes
- Fix partition-related problems (to handle concurrent attachment of
partitions and altering of partitions, plan cache invalidation) and
added some tests for this.
(Method of fixing yet to be verified)
[Hou-san]
- Merge is_parallel_possible_for_modify() from the parallel_dml patch,
which helps in placement of some short-circuits of parallel-safety
checks
- Minor update to documentation for temp and foreign tables
[Greg]
- Temporary fix for query rewriter hasModifyingCTE bug (without
changing query rewriter code - note that v15 patch put fix in query
rewriter)

Hou-san: the parallel_dml patches will need slight rebasing.


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Feb 11, 2021 at 11:17 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches. Changes are based on feedback, as
> detailed below:
>

Oops, looks like I forgot "COSTS OFF" on some added EXPLAINs in the
tests, and it caused some test failures in the PostgreSQL Patch Tester
(cfbot).
Also, I think that perhaps the localized temporary fix included in the
patch for the hasModifyingCTE bug should be restricted to INSERT, even
though the bug actually exists for SELECT too.
Posting an updated set of patches to address these.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Zhihong Yu
Date:
Hi,
For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :

+       /* Assume original queries have hasModifyingCTE set correctly */
+       if (parsetree->hasModifyingCTE)
+           hasModifyingCTE = true;

Since hasModifyingCTE is false by the time the above is run, it can be simplified as:
    hasModifyingCTE = parsetree->hasModifyingCTE

+   if (!hasSubQuery)
+       return false;
+
+   return true;

The above can be simplified as:
    return hasSubQuery;

Cheers

On Thu, Feb 11, 2021 at 7:02 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
On Thu, Feb 11, 2021 at 11:17 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches. Changes are based on feedback, as
> detailed below:
>

Oops, looks like I forgot "COSTS OFF" on some added EXPLAINs in the
tests, and it caused some test failures in the PostgreSQL Patch Tester
(cfbot).
Also, I think that perhaps the localized temporary fix included in the
patch for the hasModifyingCTE bug should be restricted to INSERT, even
though the bug actually exists for SELECT too.
Posting an updated set of patches to address these.

Regards,
Greg Nancarrow
Fujitsu Australia

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 12, 2021 at 2:33 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
>
> +       /* Assume original queries have hasModifyingCTE set correctly */
> +       if (parsetree->hasModifyingCTE)
> +           hasModifyingCTE = true;
>
> Since hasModifyingCTE is false by the time the above is run, it can be simplified as:
>     hasModifyingCTE = parsetree->hasModifyingCTE
>

Actually, we should just return parsetree->hasModifyingCTE at this
point, because if it's false, we shouldn't need to continue the search
(as we're assuming it has been set correctly for QSRC_ORIGINAL case).

> +   if (!hasSubQuery)
> +       return false;
> +
> +   return true;
>
> The above can be simplified as:
>     return hasSubQuery;
>

Yes, absolutely right, silly miss on that one!
Thanks.

This was only ever meant to be a temporary fix for this bug that
affects this patch.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Zhihong Yu
Date:
Greg:
bq. we should just return parsetree->hasModifyingCTE at this point, 

Maybe you can clarify a bit.
The if (parsetree->hasModifyingCTE) check is followed by if (!hasModifyingCTE).
When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true, resulting in the execution of the if (!hasModifyingCTE) block.

In your reply, did you mean that the if (!hasModifyingCTE) block is no longer needed? (I guess not)

Cheers

On Thu, Feb 11, 2021 at 8:14 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:33 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
>
> +       /* Assume original queries have hasModifyingCTE set correctly */
> +       if (parsetree->hasModifyingCTE)
> +           hasModifyingCTE = true;
>
> Since hasModifyingCTE is false by the time the above is run, it can be simplified as:
>     hasModifyingCTE = parsetree->hasModifyingCTE
>

Actually, we should just return parsetree->hasModifyingCTE at this
point, because if it's false, we shouldn't need to continue the search
(as we're assuming it has been set correctly for QSRC_ORIGINAL case).

> +   if (!hasSubQuery)
> +       return false;
> +
> +   return true;
>
> The above can be simplified as:
>     return hasSubQuery;
>

Yes, absolutely right, silly miss on that one!
Thanks.

This was only ever meant to be a temporary fix for this bug that
affects this patch.

Regards,
Greg Nancarrow
Fujitsu Australia

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 12, 2021 at 3:21 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Greg:
> bq. we should just return parsetree->hasModifyingCTE at this point,
>
> Maybe you can clarify a bit.
> The if (parsetree->hasModifyingCTE) check is followed by if (!hasModifyingCTE).
> When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true, resulting in the execution of the if (!hasModifyingCTE) block.
>
> In your reply, did you mean that the if (!hasModifyingCTE) block is no longer needed ? (I guess not)
>

Sorry for not making it clear. What I meant was that instead of:

if (parsetree->querySource == QSRC_ORIGINAL)
{
  /* Assume original queries have hasModifyingCTE set correctly */
  if (parsetree->hasModifyingCTE)
    hasModifyingCTE = true;
}

I thought I should be able to use the following (if the setting for
QSRC_ORIGINAL can really be trusted):

if (parsetree->querySource == QSRC_ORIGINAL)
{
  /* Assume original queries have hasModifyingCTE set correctly */
  return parsetree->hasModifyingCTE;
}

(and then the "if (!hasModifyingCTE)" test on the code following
immediately below it can be removed)


BUT - after testing that change, the problem test case (in the "with"
tests) STILL fails.
I then checked if hasModifyingCTE is always false in the QSRC_ORIGINAL
case (by adding an Assert), and it always is false.
So actually, there is no point in having the "if
(parsetree->querySource == QSRC_ORIGINAL)" code block - even the
so-called "original" query doesn't maintain the setting correctly (even
though the actual original query sent into the query rewriter does).
And also then the "if (!hasModifyingCTE)" test on the code following
immediately below it can be removed.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> > What would the result look like if you turn off
> > parallel_leader_participation?  If the leader is freed from
> > reading/writing the table and index, the index page splits and
> > internal lock contention may decrease enough to recover part of the loss.
> >
> > https://www.postgresql.org/docs/devel/parallel-plans.html
> >
> > "In a parallel bitmap heap scan, one process is chosen as the leader.
> > That process performs a scan of one or more indexes and builds a
> > bitmap indicating which table blocks need to be visited. These blocks
> > are then divided among the cooperating processes as in a parallel
> > sequential scan. In other words, the heap scan is performed in
> > parallel, but the underlying index scan is not."
> 
> If I disable parallel_leader_participation:
> 
> For max_parallel_workers_per_gather = 4, there is still performance
> degradation.
> 
> For max_parallel_workers_per_gather = 2, the performance degradation does
> not happen in most cases.
> There is occasional noise (performance degradation), but most results
> (about 80%) are good.

Thank you.  The results indicate that it depends on the degree of parallelism whether the gain from parallelism outweighs the loss of parallel insert operations, at least in the bitmap scan case.
 

But can we conclude that this is limited to bitmap scan?  Even if that's the case, the planner does not have information about the insert operation to choose other plans like serial execution or parallel sequential scan.  Should we encourage the user in the manual to tune parameters and find the fastest plan?
 


Regards
Takayuki Tsunakawa



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> If we disable bitmapscan, the performance degradation does not seem to happen.

Yes, but that's because the hundreds-of-times-slower sequential scan hides the insert time.  Furthermore, as an aside, Worker 3 does much of the work in the parallel sequential scan + parallel insert case, while the load is well balanced in the parallel bitmap scan + parallel insert case.
 

Oracle and SQL Server execute parallel DML by holding an exclusive lock on the target table.  They might use some special path for parallel DML to mitigate contention.
 


[serial bitmap scan + serial insert]
   ->  Bitmap Heap Scan on public.x  (cost=3272.20..3652841.26 rows=79918 width=8) (actual time=8.096..41.005 rows=129999 loops=1)
 
...
 Execution Time: 360.547 ms

[parallel bitmap scan + parallel insert]
         ->  Parallel Bitmap Heap Scan on public.x  (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5)
 
...
 Execution Time: 382.776 ms


[serial sequential scan + serial insert]
   ->  Seq Scan on public.x  (cost=0.00..5081085.52 rows=81338 width=8) (actual time=0.010..18997.317 rows=129999 loops=1)
...
 Execution Time: 19311.700 ms

[parallel sequential scan + parallel insert]
         ->  Parallel Seq Scan on public.x  (cost=0.00..2081082.88 rows=20334 width=8) (actual time=4001.641..5287.248 rows=32500 loops=4)
 
...
 Execution Time: 5488.493 ms


Regards
Takayuki Tsunakawa



Re: Parallel INSERT (INTO ... SELECT ...)

From
Zhihong Yu
Date:
Greg:
Thanks for more debugging.

Cheers

On Thu, Feb 11, 2021 at 9:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
On Fri, Feb 12, 2021 at 3:21 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Greg:
> bq. we should just return parsetree->hasModifyingCTE at this point,
>
> Maybe you can clarify a bit.
> The if (parsetree->hasModifyingCTE) check is followed by if (!hasModifyingCTE).
> When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true, resulting in the execution of the if (!hasModifyingCTE) block.
>
> In your reply, did you mean that the if (!hasModifyingCTE) block is no longer needed ? (I guess not)
>

Sorry for not making it clear. What I meant was that instead of:

if (parsetree->querySource == QSRC_ORIGINAL)
{
  /* Assume original queries have hasModifyingCTE set correctly */
  if (parsetree->hasModifyingCTE)
    hasModifyingCTE = true;
}

I thought I should be able to use the following (if the setting for
QSRC_ORIGINAL can really be trusted):

if (parsetree->querySource == QSRC_ORIGINAL)
{
  /* Assume original queries have hasModifyingCTE set correctly */
  return parsetree->hasModifyingCTE;
}

(and then the "if (!hasModifyingCTE)" test on the code following
immediately below it can be removed)


BUT - after testing that change, the problem test case (in the "with"
tests) STILL fails.
I then checked if hasModifyingCTE is always false in the QSRC_ORIGINAL
case (by adding an Assert), and it always is false.
So actually, there is no point in having the "if
(parsetree->querySource == QSRC_ORIGINAL)" code block - even the
so-called "original" query doesn't maintain the setting correctly (even
though the actual original query sent into the query rewriter does).
And also then the "if (!hasModifyingCTE)" test on the code following
immediately below it can be removed.

Regards,
Greg Nancarrow
Fujitsu Australia

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 12, 2021 at 5:30 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> > If I disable parallel_leader_participation:
> >
> > For max_parallel_workers_per_gather = 4, there is still performance
> > degradation.
> >
> > For max_parallel_workers_per_gather = 2, the performance degradation
> > does not happen in most cases.
> > There is occasional noise (performance degradation), but most results
> > (about 80%) are good.
>
> Thank you.  The results indicate that it depends on the degree of parallelism whether the gain from parallelism
> outweighs the loss of parallel insert operations, at least in the bitmap scan case.
>

That seems to be the pattern for this particular query, but I think
we'd need to test a variety to determine if that's always the case.

> But can we conclude that this is limited to bitmap scan?  Even if that's the case, the planner does not have
> information about the insert operation to choose other plans like serial execution or parallel sequential scan.  Should we
> encourage the user in the manual to tune parameters and find the fastest plan?
>
>

It's all based on path costs, so we need to analyze and compare the
costing calculations done in this particular case against other cases,
and the values of the various parameters (costsize.c).
It's not difficult to determine for a parallel ModifyTablePath if it
has a BitmapHeapPath subpath - perhaps total_cost needs adjustment
(increase) for this case - and that will influence the planner to
choose a cheaper path. I was able to easily test the effect of doing
this in the debugger: by increasing total_cost in cost_modifytable()
for the parallel bitmap heap scan case, the planner then chose a
serial Insert + bitmap heap scan, because it then had a cheaper cost.
Of course, we need to better understand the problem and the observed
patterns in order to get a better feel for how those costs should be
adjusted.
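
For reference, a rough way to observe the competing cost estimates
without a debugger (using the testscan/x tables from the tests discussed
in this thread):

```sql
-- Estimated cost of the parallel plan:
EXPLAIN INSERT INTO testscan
    SELECT a FROM x WHERE a < 80000 OR (a % 2 = 0 AND a > 199900000);

-- Estimated cost of the serial alternative, for comparison; if a penalty
-- were added in cost_modifytable() for the parallel bitmap heap scan
-- case, the planner would switch to this plan once it became cheaper.
SET max_parallel_workers_per_gather = 0;
EXPLAIN INSERT INTO testscan
    SELECT a FROM x WHERE a < 80000 OR (a % 2 = 0 AND a > 199900000);
RESET max_parallel_workers_per_gather;
```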

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Thu, Feb 11, 2021 at 5:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > * I think that the concerns raised by Tsunakawa-san in:
> > >
> > >
https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com
> > >
> > > regarding how this interacts with plancache.c deserve a look.
> > > Specifically, a plan that uses parallel insert may fail to be
> > > invalidated when partitions are altered directly (that is without
> > > altering their root parent).  That would be because we are not adding
> > > partition OIDs to PlannerGlobal.invalItems despite making a plan
> > > that's based on checking their properties.  See this (tested with all
> > > patches applied!):
> > >
> >
> > Does any current Postgres code add partition OIDs to
> > PlannerGlobal.invalItems for a similar reason?

Currently, the planner opens partitions only for SELECT queries and
also adds them to the query's range table.  And because they are added
to the range table, their OIDs do get added to
PlannerGlobal.relationOids (not invalItems, sorry!) by way of
CompleteCachedPlan() calling extract_query_dependencies(), which looks
at Query.rtable to decide which tables/partitions to add.

> > I would have thought that, for example,  partitions with a default
> > column expression, using a function that is changed from SAFE to
> > UNSAFE, would suffer the same plancache issue (for current parallel
> > SELECT functionality) as we're talking about here - but so far I
> > haven't seen any code handling this.

AFAIK, default column expressions don't affect plans for SELECT
queries.  OTOH, consider a check constraint expression as an example.
The planner may use one to exclude a partition from the plan with its
constraint exclusion algorithm (separate from "partition pruning").
If the check constraint is dropped, any cached plans that used it will
be invalidated.

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
alter table rp1 add constraint chk check (a >= -5);
set constraint_exclusion to on;

-- forces using a cached plan
set plan_cache_mode to force_generic_plan ;
prepare q as select * from rp where a < -5;

-- planner excluded rp1 because of the contradictory constraint
explain execute q;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

-- constraint dropped, plancache inval hook invoked
alter table rp1 drop constraint chk ;

-- old plan invalidated, new one made
explain execute q;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on rp1 rp  (cost=0.00..41.88 rows=850 width=4)
   Filter: (a < '-5'::integer)
(2 rows)

> > (Currently invalItems seems to support PROCID and TYPEOID; relation
> > OIDs seem to be handled through a different mechanism)..
>
> > Can you elaborate on what you believe is required here, so that the
> > partition OID dependency is registered and the altered partition
> > results in the plan being invalidated?
> > Thanks in advance for any help you can provide here.
>
> Actually, I tried adding the following in the loop that checks the
> parallel-safety of each partition and it seemed to work:
>
>             glob->relationOids =
>                     lappend_oid(glob->relationOids, pdesc->oids[i]);
>
> Can you confirm, is that what you were referring to?

Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.

Although it gets the job done, I'm not sure if manipulating
relationOids from max_parallel_hazard() or its subroutines is okay,
but I will let the committer decide that.  As I mentioned above, the
person who designed this decided for some reason that it is
extract_query_dependencies()'s job to populate
PlannerGlobal.relationOids/invalItems.

> (note that I've already updated the code to use
> CreatePartitionDirectory() and PartitionDirectoryLookup())

I will check your v16 to check if that indeed does the intended thing.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Thu, Feb 11, 2021 at 5:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > >
> > > > * I think that the concerns raised by Tsunakawa-san in:
> > > >
> > > >
https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com
> > > >
> > > > regarding how this interacts with plancache.c deserve a look.
> > > > Specifically, a plan that uses parallel insert may fail to be
> > > > invalidated when partitions are altered directly (that is without
> > > > altering their root parent).  That would be because we are not adding
> > > > partition OIDs to PlannerGlobal.invalItems despite making a plan
> > > > that's based on checking their properties.  See this (tested with all
> > > > patches applied!):
> > > >
> > >
> > > Does any current Postgres code add partition OIDs to
> > > PlannerGlobal.invalItems for a similar reason?
>
> Currently, the planner opens partitions only for SELECT queries and
> also adds them to the query's range table.  And because they are added
> to the range table, their OIDs do get added to
> PlannerGlobal.relationOids (not invalItems, sorry!) by way of
> CompleteCachedPlan() calling extract_query_dependencies(), which looks
> at Query.rtable to decide which tables/partitions to add.
>
> > > I would have thought that, for example,  partitions with a default
> > > column expression, using a function that is changed from SAFE to
> > > UNSAFE, would suffer the same plancache issue (for current parallel
> > > SELECT functionality) as we're talking about here - but so far I
> > > haven't seen any code handling this.
>
> AFAIK, default column expressions don't affect plans for SELECT
> queries.  OTOH, consider a check constraint expression as an example.
> The planner may use one to exclude a partition from the plan with its
> constraint exclusion algorithm (separate from "partition pruning").
> If the check constraint is dropped, any cached plans that used it will
> be invalidated.
>

Sorry, I got that wrong; default column expressions are relevant to
INSERT, not SELECT.
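
For example (a hypothetical sketch, not from the test suite), a
parallel-unsafe default expression on one partition is exactly the kind
of property the INSERT parallel-safety check has to notice:

```sql
-- unsafe_seq_fn() stands in for any function marked PARALLEL UNSAFE
CREATE FUNCTION unsafe_seq_fn() RETURNS int
    LANGUAGE sql PARALLEL UNSAFE AS 'SELECT 1';

CREATE TABLE pt (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (0) TO (100);
ALTER TABLE pt1 ALTER COLUMN b SET DEFAULT unsafe_seq_fn();

-- This INSERT must not be parallelized while pt1 has that default; and
-- if the default is later dropped, any cached plan should be
-- invalidated, which is why the partition OIDs need to be recorded as
-- plan dependencies.
INSERT INTO pt (a) SELECT generate_series(0, 99);
```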

> >
> > Actually, I tried adding the following in the loop that checks the
> > parallel-safety of each partition and it seemed to work:
> >
> >             glob->relationOids =
> >                     lappend_oid(glob->relationOids, pdesc->oids[i]);
> >
> > Can you confirm, is that what you were referring to?
>
> Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
>
> Although it gets the job done, I'm not sure if manipulating
> relationOids from max_parallel_hazard() or its subroutines is okay,
> but I will let the committer decide that.  As I mentioned above, the
> person who designed this decided for some reason that it is
> extract_query_dependencies()'s job to populate
> PlannerGlobal.relationOids/invalItems.
>

Yes, it doesn't really seem right doing it within max_parallel_hazard().
I tried doing it in extract_query_dependencies() instead - see
attached patch - and it seems to work, but I'm not sure if there might
be any unintended side-effects.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 8, 2021 at 8:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > Did it actually use a parallel plan in your testing?
> > > When I ran these tests with the Parallel INSERT patch applied, it did
> > > not naturally choose a parallel plan for any of these cases.
> >
> > Yes, these cases pick parallel plan naturally on my test environment.
> >
> > postgres=# explain verbose insert into testscan select a from x where
> > a<80000 or (a%2=0 and a>199900000);
> >                                             QUERY PLAN
> > ----------------------------------------------------------------------
> > -----------------------------
> >  Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
> >    Workers Planned: 4
> >    ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0
> > width=0)
> >          ->  Parallel Bitmap Heap Scan on public.x1
> > (cost=3346.89..1272067.44 rows=20343 width=8)
> >                Output: x1.a, NULL::integer
> >                Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
> >                Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a >
> > 199900000)))
> >                ->  BitmapOr  (cost=3346.89..3346.89 rows=178808
> > width=0)
> >                      ->  Bitmap Index Scan on x1_a_idx
> > (cost=0.00..1495.19 rows=80883 width=0)
> >                            Index Cond: (x1.a < 80000)
> >                      ->  Bitmap Index Scan on x1_a_idx
> > (cost=0.00..1811.01 rows=97925 width=0)
> >                            Index Cond: (x1.a > 199900000)
> >
> > PSA is my postgresql.conf file, maybe you can have a look. Besides, I didn't
> > do any parameters tuning in my test session.
>
> I reproduced this on my machine.
>
> I think we'd better do "analyze" before the insert, which makes this easier to reproduce.
> Like:
>
> -----
> analyze;
> explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
> -----
>

Thanks, I tried test_bimap.sql in my own environment, and added
"analyze", and I also found it naturally chose a parallel INSERT with
parallel bitmap heap scan for each of these cases.
However, I didn't see any performance degradation when compared
against serial INSERT with bitmap heap scan.
The parallel plan in these cases seems to run a bit faster.
(Note that I'm using a release build of Postgres, and using default
postgresql.conf)


test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan select a from x
where a<80000 or (a%2=0 and a>199900000);

QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4193.29..1255440.94 rows=74267 width=0) (actual
time=210.587..212.135 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Insert on public.testscan  (cost=3193.29..1247014.24 rows=0
width=0) (actual time=195.296..195.298 rows=0 loops=5)
         Worker 0:  actual time=189.512..189.514 rows=0 loops=1
         Worker 1:  actual time=194.843..194.844 rows=0 loops=1
         Worker 2:  actual time=193.986..193.988 rows=0 loops=1
         Worker 3:  actual time=188.035..188.037 rows=0 loops=1
         ->  Parallel Bitmap Heap Scan on public.x
(cost=3193.29..1247014.24 rows=18567 width=8) (actual
time=7.992..25.837 rows=26000 loops=5)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
               Rows Removed by Filter: 10000
               Heap Blocks: exact=261
               Worker 0:  actual time=1.473..14.458 rows=22465 loops=1
               Worker 1:  actual time=7.370..31.359 rows=30525 loops=1
               Worker 2:  actual time=8.765..19.838 rows=18549 loops=1
               Worker 3:  actual time=0.279..17.269 rows=23864 loops=1
               ->  BitmapOr  (cost=3193.29..3193.29 rows=170535
width=0) (actual time=21.775..21.777 rows=0 loops=1)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1365.94 rows=73783 width=0) (actual time=11.961..11.961
rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1790.21 rows=96752 width=0) (actual time=9.809..9.809
rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
 Planning Time: 0.276 ms
 Execution Time: 212.189 ms
(25 rows)


test=# truncate testscan;
TRUNCATE TABLE
test=# set max_parallel_workers_per_gather=0;
SET
test=# explain analyze verbose insert into testscan select a from x
where a<80000 or (a%2=0 and a>199900000);
                                                               QUERY
PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan  (cost=3193.29..3625636.35 rows=0 width=0)
(actual time=241.222..241.224 rows=0 loops=1)
   ->  Bitmap Heap Scan on public.x  (cost=3193.29..3625636.35
rows=74267 width=8) (actual time=16.945..92.392 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 50000
         Heap Blocks: exact=975
         ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0)
(actual time=16.735..16.736 rows=0 loops=1)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94
rows=73783 width=0) (actual time=9.222..9.223 rows=79999 loops=1)
                     Index Cond: (x.a < 80000)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21
rows=96752 width=0) (actual time=7.511..7.511 rows=100000 loops=1)
                     Index Cond: (x.a > 199900000)
 Planning Time: 0.205 ms
 Execution Time: 241.274 ms
(14 rows)


============


test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan_pk select a from x
where a<80000 or (a%2=0 and a>199900000);

QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4193.29..1255440.94 rows=74267 width=0) (actual
time=572.242..573.683 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Insert on public.testscan_pk  (cost=3193.29..1247014.24 rows=0
width=0) (actual time=566.303..566.308 rows=0 loops=5)
         Worker 0:  actual time=566.756..566.757 rows=0 loops=1
         Worker 1:  actual time=564.778..564.779 rows=0 loops=1
         Worker 2:  actual time=564.402..564.419 rows=0 loops=1
         Worker 3:  actual time=563.748..563.749 rows=0 loops=1
         ->  Parallel Bitmap Heap Scan on public.x
(cost=3193.29..1247014.24 rows=18567 width=8) (actual
time=16.479..37.327 rows=26000 loops=5)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
               Rows Removed by Filter: 10000
               Heap Blocks: exact=204
               Worker 0:  actual time=17.358..36.895 rows=24233 loops=1
               Worker 1:  actual time=12.711..33.538 rows=25616 loops=1
               Worker 2:  actual time=15.671..35.701 rows=24831 loops=1
               Worker 3:  actual time=17.656..39.310 rows=26645 loops=1
               ->  BitmapOr  (cost=3193.29..3193.29 rows=170535
width=0) (actual time=18.541..18.542 rows=0 loops=1)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1365.94 rows=73783 width=0) (actual time=8.549..8.549
rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1790.21 rows=96752 width=0) (actual time=9.990..9.990
rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
 Planning Time: 0.240 ms
 Execution Time: 573.733 ms
(25 rows)



test=# set max_parallel_workers_per_gather=0;
SET
test=# truncate testscan_pk;
TRUNCATE TABLE
test=# explain analyze verbose insert into testscan_pk select a from x
where a<80000 or (a%2=0 and a>199900000);
                                                                QUERY
PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan_pk  (cost=3193.29..3625636.35 rows=0
width=0) (actual time=598.997..598.998 rows=0 loops=1)
   ->  Bitmap Heap Scan on public.x  (cost=3193.29..3625636.35
rows=74267 width=8) (actual time=20.153..96.858 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 50000
         Heap Blocks: exact=975
         ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0)
(actual time=19.840..19.841 rows=0 loops=1)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94
rows=73783 width=0) (actual time=9.276..9.276 rows=79999 loops=1)
                     Index Cond: (x.a < 80000)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21
rows=96752 width=0) (actual time=10.562..10.562 rows=100000 loops=1)
                     Index Cond: (x.a > 199900000)
 Planning Time: 0.204 ms
 Execution Time: 599.098 ms
(14 rows)


============


test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan_index select a
from x where a<80000 or (a%2=0 and a>199900000);

QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4193.29..1255440.94 rows=74267 width=0) (actual
time=560.460..562.386 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Insert on public.testscan_index  (cost=3193.29..1247014.24
rows=0 width=0) (actual time=553.434..553.435 rows=0 loops=5)
         Worker 0:  actual time=548.751..548.752 rows=0 loops=1
         Worker 1:  actual time=552.008..552.009 rows=0 loops=1
         Worker 2:  actual time=553.094..553.095 rows=0 loops=1
         Worker 3:  actual time=553.389..553.390 rows=0 loops=1
         ->  Parallel Bitmap Heap Scan on public.x
(cost=3193.29..1247014.24 rows=18567 width=8) (actual
time=13.759..34.487 rows=26000 loops=5)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
               Rows Removed by Filter: 10000
               Heap Blocks: exact=183
               Worker 0:  actual time=8.698..29.924 rows=26173 loops=1
               Worker 1:  actual time=12.865..33.889 rows=27421 loops=1
               Worker 2:  actual time=13.088..32.823 rows=24591 loops=1
               Worker 3:  actual time=14.075..36.349 rows=26571 loops=1
               ->  BitmapOr  (cost=3193.29..3193.29 rows=170535
width=0) (actual time=19.356..19.357 rows=0 loops=1)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1365.94 rows=73783 width=0) (actual time=10.330..10.330
rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1790.21 rows=96752 width=0) (actual time=9.024..9.024
rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
 Planning Time: 0.219 ms
 Execution Time: 562.442 ms
(25 rows)



test=# set max_parallel_workers_per_gather=0;
SET
test=# truncate testscan_index;
TRUNCATE TABLE
test=# explain analyze verbose insert into testscan_index select a
from x where a<80000 or (a%2=0 and a>199900000);
                                                                QUERY
PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan_index  (cost=3193.29..3625636.35 rows=0
width=0) (actual time=607.619..607.621 rows=0 loops=1)
   ->  Bitmap Heap Scan on public.x  (cost=3193.29..3625636.35
rows=74267 width=8) (actual time=21.001..96.283 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 50000
         Heap Blocks: exact=975
         ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0)
(actual time=20.690..20.691 rows=0 loops=1)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94
rows=73783 width=0) (actual time=9.097..9.097 rows=79999 loops=1)
                     Index Cond: (x.a < 80000)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21
rows=96752 width=0) (actual time=11.591..11.591 rows=100000 loops=1)
                     Index Cond: (x.a > 199900000)
 Planning Time: 0.205 ms
 Execution Time: 607.734 ms
(14 rows)


Even when I changed the queries to return more rows from the scan, to
the point where it chose not to use a parallel INSERT bitmap heap scan
(in favour of parallel seq scan), and then forced it to by disabling
seqscan, I found that it was still at least as fast as serial INSERT
with bitmap heap scan.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > Actually, I tried adding the following in the loop that checks the
> > > parallel-safety of each partition and it seemed to work:
> > >
> > >             glob->relationOids =
> > >                     lappend_oid(glob->relationOids, pdesc->oids[i]);
> > >
> > > Can you confirm, is that what you were referring to?
> >
> > Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
> >
> > Although it gets the job done, I'm not sure if manipulating
> > relationOids from max_parallel_hazard() or its subroutines is okay,
> > but I will let the committer decide that.  As I mentioned above, the
> > person who designed this decided for some reason that it is
> > extract_query_dependencies()'s job to populate
> > PlannerGlobal.relationOids/invalItems.
>
> Yes, it doesn't really seem right doing it within max_parallel_hazard().
> I tried doing it in extract_query_dependencies() instead - see
> attached patch - and it seems to work, but I'm not sure if there might
> be any unintended side-effects.

One issue I see with the patch is that it fails to consider
multi-level partitioning, because it's looking up partitions only in
the target table's PartitionDesc and no other.

@@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node,
PlannerInfo *context)
            RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

            if (rte->rtekind == RTE_RELATION)
-               context->glob->relationOids =
-                   lappend_oid(context->glob->relationOids, rte->relid);
+           {
+               PlannerGlobal   *glob;
+
+               glob = context->glob;
+               glob->relationOids =
+                   lappend_oid(glob->relationOids, rte->relid);
+               if (query->commandType == CMD_INSERT &&
+                                   rte->relkind == RELKIND_PARTITIONED_TABLE)

The RTE whose relkind is being checked here may not be the INSERT
target relation's RTE, even though that's perhaps always true today.
So, I suggest pulling the new block out of the loop over rtable and
performing its work on the result RTE, explicitly fetched using
rt_fetch(), preferably in a separate recursive function.  I'm
thinking something like the attached revised version.



--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Feb 17, 2021 at 12:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > > Actually, I tried adding the following in the loop that checks the
> > > > parallel-safety of each partition and it seemed to work:
> > > >
> > > >             glob->relationOids =
> > > >                     lappend_oid(glob->relationOids, pdesc->oids[i]);
> > > >
> > > > Can you confirm, is that what you were referring to?
> > >
> > > Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
> > >
> > > Although it gets the job done, I'm not sure if manipulating
> > > relationOids from max_parallel_hazard() or its subroutines is okay,
> > > but I will let the committer decide that.  As I mentioned above, the
> > > person who designed this decided for some reason that it is
> > > extract_query_dependencies()'s job to populate
> > > PlannerGlobal.relationOids/invalItems.
> >
> > Yes, it doesn't really seem right doing it within max_parallel_hazard().
> > I tried doing it in extract_query_dependencies() instead - see
> > attached patch - and it seems to work, but I'm not sure if there might
> > be any unintended side-effects.
>
> One issue I see with the patch is that it fails to consider
> multi-level partitioning, because it's looking up partitions only in
> the target table's PartitionDesc and no other.
>
> @@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node,
> PlannerInfo *context)
>             RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
>
>             if (rte->rtekind == RTE_RELATION)
> -               context->glob->relationOids =
> -                   lappend_oid(context->glob->relationOids, rte->relid);
> +           {
> +               PlannerGlobal   *glob;
> +
> +               glob = context->glob;
> +               glob->relationOids =
> +                   lappend_oid(glob->relationOids, rte->relid);
> +               if (query->commandType == CMD_INSERT &&
> +                                   rte->relkind == RELKIND_PARTITIONED_TABLE)
>
> The RTE whose relkind is being checked here may not be the INSERT
> target relation's RTE, even though that's perhaps always true today.
> So, I suggest to pull the new block out of the loop over rtable and
> perform its deeds on the result RTE explicitly fetched using
> rt_fetch(), preferably using a separate recursive function.  I'm
> thinking something like the attached revised version.
>
>

Thanks. Yes, I'd forgotten about the fact that a partition may itself
be partitioned, so it needs to be recursive (like in the
parallel-safety checks).
Your revised version seems OK, though I do have a concern:
Is the use of "table_close(rel, NoLock)" intentional? That will keep
the lock (lockmode) until end-of-transaction.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 17, 2021 at 10:44 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Wed, Feb 17, 2021 at 12:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > > > Actually, I tried adding the following in the loop that checks the
> > > > > parallel-safety of each partition and it seemed to work:
> > > > >
> > > > >             glob->relationOids =
> > > > >                     lappend_oid(glob->relationOids, pdesc->oids[i]);
> > > > >
> > > > > Can you confirm, is that what you were referring to?
> > > >
> > > > Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
> > > >
> > > > Although it gets the job done, I'm not sure if manipulating
> > > > relationOids from max_parallel_hazard() or its subroutines is okay,
> > > > but I will let the committer decide that.  As I mentioned above, the
> > > > person who designed this decided for some reason that it is
> > > > extract_query_dependencies()'s job to populate
> > > > PlannerGlobal.relationOids/invalItems.
> > >
> > > Yes, it doesn't really seem right doing it within max_parallel_hazard().
> > > I tried doing it in extract_query_dependencies() instead - see
> > > attached patch - and it seems to work, but I'm not sure if there might
> > > be any unintended side-effects.
> >
> > One issue I see with the patch is that it fails to consider
> > multi-level partitioning, because it's looking up partitions only in
> > the target table's PartitionDesc and no other.
> >
> > @@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node,
> > PlannerInfo *context)
> >             RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
> >
> >             if (rte->rtekind == RTE_RELATION)
> > -               context->glob->relationOids =
> > -                   lappend_oid(context->glob->relationOids, rte->relid);
> > +           {
> > +               PlannerGlobal   *glob;
> > +
> > +               glob = context->glob;
> > +               glob->relationOids =
> > +                   lappend_oid(glob->relationOids, rte->relid);
> > +               if (query->commandType == CMD_INSERT &&
> > +                                   rte->relkind == RELKIND_PARTITIONED_TABLE)
> >
> > The RTE whose relkind is being checked here may not be the INSERT
> > target relation's RTE, even though that's perhaps always true today.
> > So, I suggest to pull the new block out of the loop over rtable and
> > perform its deeds on the result RTE explicitly fetched using
> > rt_fetch(), preferably using a separate recursive function.  I'm
> > thinking something like the attached revised version.
>
> Thanks. Yes, I'd forgotten about the fact a partition may itself be
> partitioned, so it needs to be recursive (like in the parallel-safety
> checks).
> Your revised version seems OK, though I do have a concern:
> Is the use of "table_close(rel, NoLock)" intentional? That will keep
> the lock (lockmode) until end-of-transaction.

I think we always keep any locks on relations that are involved in a
plan until end-of-transaction.  What if a partition is changed in an
unsafe manner between being considered safe for parallel insertion and
actually performing the parallel insert?
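
A sketch of the hazard the retained lock guards against, reusing the rp
partitioned table from the earlier example (some_unsafe_trig_fn() is an
invented placeholder for any PARALLEL UNSAFE trigger function, and the
behavior described assumes the patch under discussion):

```sql
-- Session 1: plan the insert; under the patch, the parallel-safety check
-- locks the partitions, and those locks are held to end of transaction
BEGIN;
EXPLAIN INSERT INTO rp SELECT generate_series(-10, 10);

-- Session 2: trying to make a partition parallel-unsafe blocks until
-- session 1's transaction ends, because CREATE TRIGGER's lock conflicts
-- with the lock session 1 retained on rp2
CREATE TRIGGER t BEFORE INSERT ON rp2
    FOR EACH ROW EXECUTE FUNCTION some_unsafe_trig_fn();

-- Session 1: so this cannot run under a partition that has silently
-- become parallel-unsafe in the meantime
INSERT INTO rp SELECT generate_series(-10, 10);
COMMIT;
```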

BTW, I just noticed that extract_query_dependencies() runs on a
rewritten, but not-yet-planned query tree; that is, I didn't know that
extract_query_dependencies() only populates the CachedPlanSource's
relationOids and not CachedPlan's.  The former is only for tracking
the dependencies of an unplanned Query, so partitions should never be
added to it.  Instead, they should be added to
PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan),
which is kind of what your earlier patch did.  Needless to say,
PlanCacheRelCallback checks both CachedPlanSource.relationOids and
PlannedStmt.relationOids, so if it receives a message about a
partition, its OID is matched from the latter.

All that is to say that we should move our code to add partition OIDs
as plan dependencies to somewhere in set_plan_references(), which
otherwise populates PlannedStmt.relationOids.  I updated the patch to
do that.  It also occurred to me that we can avoid pointless adding of
partitions if the final plan won't use parallelism.  For that, the
patch adds checking glob->parallelModeNeeded, which seems to do the
trick though I don't know if that's the correct way of doing that.


--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Feb 18, 2021 at 12:34 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > Your revised version seems OK, though I do have a concern:
> > Is the use of "table_close(rel, NoLock)" intentional? That will keep
> > the lock (lockmode) until end-of-transaction.
>
> I think we always keep any locks on relations that are involved in a
> plan until end-of-transaction.  What if a partition is changed in an
> unsafe manner between being considered safe for parallel insertion and
> actually performing the parallel insert?
>
> BTW, I just noticed that extract_query_dependencies() runs on a
> rewritten, but not-yet-planned query tree, that is, I didn't know that
> extract_query_dependencies() only populates the CachedPlanSource's
> relationOids and not CachedPlan's.  The former is only for tracking
> the dependencies of an unplanned Query, so partitions should never be
> added to it.  Instead, they should be added to
> PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan),
> which is kind of what your earlier patch did.  Needless to say,
> PlanCacheRelCallback checks both CachedPlanSource.relationOids and
> PlannedStmt.relationOids, so if it receives a message about a
> partition, its OID is matched from the latter.
>
> All that is to say that we should move our code to add partition OIDs
> as plan dependencies to somewhere in set_plan_references(), which
> otherwise populates PlannedStmt.relationOids.  I updated the patch to
> do that.

OK, understood. Thanks for the detailed explanation.

> It also occurred to me that we can avoid pointless adding of
> partitions if the final plan won't use parallelism.  For that, the
> patch adds checking glob->parallelModeNeeded, which seems to do the
> trick though I don't know if that's the correct way of doing that.
>

I'm not sure it's pointless adding partitions even in the case of NOT
using parallelism, because we may be relying on the result of
parallel-safety checks on partitions in both cases.
For example, insert_parallel.sql currently includes a test (that you
originally provided in a previous post) that checks a non-parallel
plan is generated after a parallel-unsafe trigger is created on a
partition involved in the INSERT.
If I further add to that test by then dropping that trigger and then
again using EXPLAIN to see what plan is generated, then I'd expect a
parallel-plan to be generated, but with the setrefs-v3.patch it still
generates a non-parallel plan. So I think the "&&
glob->parallelModeNeeded" part of the test needs to be removed.
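
To spell out the scenario I mean, a minimal sketch (the table, trigger and function names here are made up for illustration, not taken from insert_parallel.sql):

```sql
-- A parallel-unsafe trigger on a partition should force a non-parallel plan.
CREATE TABLE part_tbl (a int) PARTITION BY RANGE (a);
CREATE TABLE part_tbl1 PARTITION OF part_tbl FOR VALUES FROM (0) TO (10000);

CREATE FUNCTION unsafe_trig_fn() RETURNS trigger
  LANGUAGE plpgsql PARALLEL UNSAFE
  AS $$ BEGIN RETURN NEW; END; $$;
CREATE TRIGGER part_tbl1_trig BEFORE INSERT ON part_tbl1
  FOR EACH ROW EXECUTE FUNCTION unsafe_trig_fn();

EXPLAIN (COSTS OFF) INSERT INTO part_tbl SELECT generate_series(1, 9999);
-- a non-parallel plan is expected here, due to the parallel-unsafe trigger

DROP TRIGGER part_tbl1_trig ON part_tbl1;
EXPLAIN (COSTS OFF) INSERT INTO part_tbl SELECT generate_series(1, 9999);
-- a parallel plan should be possible again, but with setrefs-v3 it isn't
```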


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Thu, Feb 18, 2021 at 10:03 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Thu, Feb 18, 2021 at 12:34 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > All that is to say that we should move our code to add partition OIDs
> > as plan dependencies to somewhere in set_plan_references(), which
> > otherwise populates PlannedStmt.relationOids.  I updated the patch to
> > do that.
>
> OK, understood. Thanks for the detailed explanation.
>
> > It also occurred to me that we can avoid pointless adding of
> > partitions if the final plan won't use parallelism.  For that, the
> > patch adds checking glob->parallelModeNeeded, which seems to do the
> > trick though I don't know if that's the correct way of doing that.
> >
>
> I'm not sure it's pointless adding partitions even in the case of NOT
> using parallelism, because we may be relying on the result of
> parallel-safety checks on partitions in both cases.
> For example, insert_parallel.sql currently includes a test (that you
> originally provided in a previous post) that checks a non-parallel
> plan is generated after a parallel-unsafe trigger is created on a
> partition involved in the INSERT.
> If I further add to that test by then dropping that trigger and then
> again using EXPLAIN to see what plan is generated, then I'd expect a
> parallel-plan to be generated, but with the setrefs-v3.patch it still
> generates a non-parallel plan. So I think the "&&
> glob->parallelModeNeeded" part of the test needs to be removed.

Ah, okay, I didn't retest my case after making that change.

Looking at this again, I am a bit concerned about going over the whole
partition tree *twice* when making a parallel plan for insert into
partitioned tables.  Maybe we should do what you did in your first
attempt slightly differently -- add partition OIDs during the
max_parallel_hazard() initiated scan of the partition tree as you did.
Instead of adding them directly to PlannerGlobal.relationOids, add to,
say, PlannerInfo.targetPartitionOids and have set_plan_references() do
list_concat(glob->relationOids, list_copy(root->targetPartitionOids))
in the same place as setrefs-v3 does
add_target_partition_oids_recurse().  Thoughts?
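
Roughly, an uncompiled sketch of what I'm imagining (the new field name
and exact placement are just assumptions at this point):

```c
/*
 * Sketch only, not compiled.  During max_parallel_hazard()'s scan of
 * the partition tree, remember each partition whose parallel safety
 * was checked, in a new PlannerInfo field:
 */
root->targetPartitionOids = lappend_oid(root->targetPartitionOids,
                                        RelationGetRelid(part_rel));

/*
 * Then in set_plan_references(), which otherwise populates
 * PlannedStmt.relationOids, record those OIDs as plan dependencies:
 */
glob->relationOids = list_concat(glob->relationOids,
                                 list_copy(root->targetPartitionOids));
```

That way we only record the partitions the safety checks actually
visited, and only walk the tree once.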


--
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Feb 18, 2021 at 4:35 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Looking at this again, I am a bit concerned about going over the whole
> partition tree *twice* when making a parallel plan for insert into
> partitioned tables.  Maybe we should do what you did in your first
> attempt slightly differently -- add partition OIDs during the
> max_parallel_hazard() initiated scan of the partition tree as you did.
> Instead of adding them directly to PlannerGlobal.relationOids, add to,
> say, PlannerInfo.targetPartitionOids and have set_plan_references() do
> list_concat(glob->relationOids, list_copy(root->targetPartitionOids))
> in the same place as setrefs-v3 does
> add_target_partition_oids_recurse().  Thoughts?
>

Agreed, that might be a better approach, and that way we're also only
recording the partition OIDs that the parallel-safety checks are
relying on.
I'll give it a go and see if I can detect any issues with this method.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Posting a new version of the patches, with the following updates:
- Moved the update of glob->relationOids (i.e. addition of partition
OIDs that plan depends on, resulting from parallel-safety checks) from
within max_parallel_hazard() to set_plan_references().
- Added an extra test for partition plan-cache invalidation.
- Simplified query_has_modifying_cte() temporary bug-fix.
- Added a comment explaining why parallel-safety of partition column
defaults is not checked.
- Minor simplification: moved the hasSubQuery return into is_parallel_possible_for_modify().


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Greg Nancarrow <gregn4422@gmail.com> 
--------------------------------------------------
On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
> (8)
> +               /*
> +                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> +                * the relation, and this would result in creation of new CommandIds
> +                * on insert/update/delete and this isn't supported in a parallel
> +                * worker (but is safe in the parallel leader).
> +                */
> +               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> +               if (trigtype == RI_TRIGGER_FK)
> +               {
> +                       if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> +                               return true;
> +               }
>
> Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs.  See
> RI_FKey_check() which is called in the RI_TRIGGER_FK case.  In there, ri_PerformCheck() is called with the detectNewRows
> argument set to false, which causes CommandCounterIncrement() to not be called.
 
>

Hmmm, I'm not sure that you have read and interpreted the patch code correctly.
The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign key, and an insert into such a table will
generate a new commandId (so we must avoid that, as we don't currently have the technology to support sharing of new
commandIds across the participants in the parallel operation). This is what the code comment says; it does not say that
such a trigger generates a new command ID.
 

See Amit's updated comment here: https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665

In addition, the 2nd patch has an explicit test case for this (testing insert into a table that has a FK).
--------------------------------------------------


First of all, I anticipate this parallel INSERT SELECT feature will typically shine, and is expected to work, in ETL
or ELT into a data warehouse or an ODS for analytics.  Bearing that in mind, let me list some issues or questions below.
But the current state of the patch would of course be attractive in some workloads, so I don't think these are
necessarily blockers.
 


(1)
According to the classic book "The Data Warehouse Toolkit" and the website [1] by its author, the fact table (large
transaction history) in the data warehouse has foreign keys referencing the dimension tables (small or medium-sized
master or reference data).  So, parallel insert will be effective if it works when loading data into the fact table with
foreign keys.
 

To answer the above question, I'm assuming:

CREATE TABLE some_dimension (key_col int PRIMARY KEY);
CREATE TABLE some_fact (some_key int REFERENCES some_dimension);
INSERT INTO some_fact SELECT ...;


My naive question is, "why should new command IDs be generated to check foreign key constraints in this INSERT case?
The check just reads the parent (some_dimension table here)..."
 

Looking a bit deeper into the code, although ri_PerformCheck() itself tries to avoid generating command IDs, it calls
_SPI_execute_snapshot() with the read_only argument always set to false.  It in turn calls _SPI_execute_plan() ->
CommandCounterIncrement() as follows:
 

[_SPI_execute_plan()]
            /*
             * If not read-only mode, advance the command counter before each
             * command and update the snapshot.
             */
            if (!read_only && !plan->no_snapshots)
            {
                CommandCounterIncrement();
                UpdateActiveSnapshotCommandId();
            }


Can't we pass true to read_only from ri_PerformCheck() in some cases?


(2)
Likewise, dimension tables have surrogate keys that are typically implemented as a sequence or an identity column.  It
is suggested that even fact tables sometimes (or often?) have surrogate keys.  But the current patch does not
parallelize the statement when the target table has a sequence or an identity column.
 

I was looking at the sequence code, and my naive (again) idea is that the parallel leader and workers allocate numbers
from the sequence independently, and set the largest of them as the session's currval at the end of the parallel
operation.  We have to note in the documentation that gaps in the sequence numbers will arise and not be used in parallel
DML.


(3)
As Hou-san demonstrated, the current patch causes the resulting table and index to become larger when rows are inserted
in parallel than when inserted serially.  This could be a problem for analytics use cases where the table is just loaded
and read only afterwards.  We could advise the user to run REINDEX CONCURRENTLY after loading data, but what about
tables?

BTW, I don't know if Oracle and SQL Server have similar issues.  They may have some reason why they take an
exclusive lock on the target table.
 


(4)
When the target table is partitioned, is the INSERT parallelized among its partitions?  Some plan like:

Parallel Append on parent_table
  -> Insert on partition1
  -> Insert on partition2



[1]
Fact Table Surrogate Key | Kimball Dimensional Modeling Techniques

https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/fact-surrogate-key/


Regards
Takayuki Tsunakawa



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Feb 18, 2021 at 11:05 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > > It also occurred to me that we can avoid pointless adding of
> > > partitions if the final plan won't use parallelism.  For that, the
> > > patch adds checking glob->parallelModeNeeded, which seems to do the
> > > trick though I don't know if that's the correct way of doing that.
> > >
> >
> > I'm not sure it's pointless adding partitions even in the case of NOT
> > using parallelism, because we may be relying on the result of
> > parallel-safety checks on partitions in both cases.
> > For example, insert_parallel.sql currently includes a test (that you
> > originally provided in a previous post) that checks a non-parallel
> > plan is generated after a parallel-unsafe trigger is created on a
> > partition involved in the INSERT.
> > If I further add to that test by then dropping that trigger and then
> > again using EXPLAIN to see what plan is generated, then I'd expect a
> > parallel-plan to be generated, but with the setrefs-v3.patch it still
> > generates a non-parallel plan. So I think the "&&
> > glob->parallelModeNeeded" part of the test needs to be removed.
>
> Ah, okay, I didn't retest my case after making that change.
>

Greg has a point here, but I feel something along the previous lines
(having a test of glob->parallelModeNeeded) is better. We only want to
invalidate the plan if the prepared plan is unsafe to execute next
time. It is quite possible that there are unsafe triggers on different
partitions and only one of them is dropped, so next time planning will
again yield the same non-parallel plan. If we agree with that, I
think it is better to add this dependency in set_plan_refs (along with
the Gather node handling).

Also, if we agree that we don't have any cheap way to determine
parallel-safety of partitioned relations then shall we consider the
patch being discussed [1] to be combined here?

Amit L, do you agree with that line of thought, or do you have any other ideas?

I feel we should focus on getting the first patch of Greg
(v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT, along with
a test case patch) and Hou-San's patch discussed at [1] ready. Then we
can focus on the
v18-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO. Because
even if we only get the first patch in, that is good enough for some users.

What do you think?

[1] - https://www.postgresql.org/message-id/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Feb 19, 2021 at 10:13 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Greg Nancarrow <gregn4422@gmail.com>
> --------------------------------------------------
> On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
> > (8)
> > +               /*
> > +                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> > +                * the relation, and this would result in creation of new CommandIds
> > +                * on insert/update/delete and this isn't supported in a parallel
> > +                * worker (but is safe in the parallel leader).
> > +                */
> > +               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> > +               if (trigtype == RI_TRIGGER_FK)
> > +               {
> > +                       if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> > +                               return true;
> > +               }
> >
> > Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs.
> > See RI_FKey_check() which is called in the RI_TRIGGER_FK case.  In there, ri_PerformCheck() is called with the detectNewRows
> > argument set to false, which causes CommandCounterIncrement() to not be called.
> >
>
> Hmmm, I'm not sure that you have read and interpreted the patch code correctly.
> The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign key, and an insert into such a table will
> generate a new commandId (so we must avoid that, as we don't currently have the technology to support sharing of new
> commandIds across the participants in the parallel operation). This is what the code comment says; it does not say that
> such a trigger generates a new command ID.
>
> See Amit's updated comment here: https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665
>
> In addition, the 2nd patch has an explicit test case for this (testing insert into a table that has a FK).
> --------------------------------------------------
>
>
> First of all, I anticipate this parallel INSERT SELECT feature will typically shine, and is expected to work, in ETL
> or ELT into a data warehouse or an ODS for analytics.  Bearing that in mind, let me list some issues or questions below.
> But the current state of the patch would of course be attractive in some workloads, so I don't think these are
> necessarily blockers.
>
>
> (1)
> According to the classic book "The Data Warehouse Toolkit" and the website [1] by its author, the fact table (large
> transaction history) in the data warehouse has foreign keys referencing the dimension tables (small or medium-sized
> master or reference data).  So, parallel insert will be effective if it works when loading data into the fact table with
> foreign keys.
>
> To answer the above question, I'm assuming:
>
> CREATE TABLE some_dimension (key_col int PRIMARY KEY);
> CREATE TABLE some_fact (some_key int REFERENCES some_dimension);
> INSERT INTO some_fact SELECT ...;
>
>
> My naive question is, "why should new command IDs be generated to check foreign key constraints in this INSERT case?
> The check just reads the parent (some_dimension table here)..."
>

It is quite possible that what you are saying is correct, but I feel that is
not this patch's fault. So, wouldn't it be better to discuss this in a
separate thread?

>
>
> (2)
> Likewise, dimension tables have surrogate keys that are typically implemented as a sequence or an identity column.
> It is suggested that even fact tables sometimes (or often?) have surrogate keys.  But the current patch does not
> parallelize the statement when the target table has a sequence or an identity column.
>
> I was looking at the sequence code, and my naive (again) idea is that the parallel leader and workers allocate
> numbers from the sequence independently, and set the largest of them as the session's currval at the end of the
> parallel operation.  We have to note in the documentation that gaps in the sequence numbers will arise and not be
> used in parallel DML.
>

Good use case but again, I think this can be done as a separate patch.

>
> (3)
> As Hou-san demonstrated, the current patch causes the resulting table and index to become larger when rows are inserted
> in parallel than when inserted serially.  This could be a problem for analytics use cases where the table is just loaded
> and read only afterwards.  We could advise the user to run REINDEX CONCURRENTLY after loading data, but what about
> tables?
>

I think here you are talking about the third patch (Parallel Inserts).
I guess if one ran inserts in parallel from psql then similar
behavior would have been observed. For tables, it might lead to better
results once we have the patch discussed at [1]. Actually, this needs
more investigation.

[1] - https://www.postgresql.org/message-id/20200508072545.GA9701%40telsasoft.com

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Mar 18, 2021 at 8:30 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
> index ff1b642722..d5d356f2de 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> @@ -1338,8 +1338,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
>      If a table parameter value is set and the
>      equivalent <literal>toast.</literal> parameter is not, the TOAST table
>      will use the table's parameter value.
> -    Specifying these parameters for partitioned tables is not supported,
> -    but you may specify them for individual leaf partitions.
> +    These parameters, with the exception of
> +    <literal>parallel_insert_enabled</literal>,
> +    are not supported on partitioned tables, but may be specified for
> +    individual leaf partitions.
>     </para>
>

Your patch looks good to me. While checking this, I noticed a typo in
the previous patch:
-      planner parameter <varname>parallel_workers</varname>.
+      planner parameter <varname>parallel_workers</varname> and
+      <varname>parallel_insert_enabled</varname>.

Here, it should be /planner parameter/planner parameters.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"houzj.fnst@fujitsu.com"
Date:
> >      If a table parameter value is set and the
> >      equivalent <literal>toast.</literal> parameter is not, the TOAST table
> >      will use the table's parameter value.
> > -    Specifying these parameters for partitioned tables is not supported,
> > -    but you may specify them for individual leaf partitions.
> > +    These parameters, with the exception of
> > +    <literal>parallel_insert_enabled</literal>,
> > +    are not supported on partitioned tables, but may be specified for
> > +    individual leaf partitions.
> >     </para>
> >
> 
> Your patch looks good to me. While checking this, I noticed a typo in the
> previous patch:
> -      planner parameter <varname>parallel_workers</varname>.
> +      planner parameter <varname>parallel_workers</varname> and
> +      <varname>parallel_insert_enabled</varname>.
> 
> Here, it should be /planner parameter/planner parameters.

Thanks Amit and Justin for pointing this out!
The changes look good to me.

Best regards,
houzj


Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Mar 18, 2021 at 9:04 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> > >      If a table parameter value is set and the
> > >      equivalent <literal>toast.</literal> parameter is not, the TOAST table
> > >      will use the table's parameter value.
> > > -    Specifying these parameters for partitioned tables is not supported,
> > > -    but you may specify them for individual leaf partitions.
> > > +    These parameters, with the exception of
> > > +    <literal>parallel_insert_enabled</literal>,
> > > +    are not supported on partitioned tables, but may be specified for
> > > +    individual leaf partitions.
> > >     </para>
> > >
> >
> > Your patch looks good to me. While checking this, I noticed a typo in the
> > previous patch:
> > -      planner parameter <varname>parallel_workers</varname>.
> > +      planner parameter <varname>parallel_workers</varname> and
> > +      <varname>parallel_insert_enabled</varname>.
> >
> > Here, it should be /planner parameter/planner parameters.
>
> Thanks Amit and Justin for pointing this out!
> The changes look good to me.
>

Pushed!

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:

Since the "Parallel SELECT for INSERT" patch and related GUC/reloption patch have been committed, I have now rebased and attached the rest of the original patchset, which includes:
- Additional tests for "Parallel SELECT for INSERT"
- Parallel INSERT functionality
- Test and documentation updates for Parallel INSERT

Regards,
Greg Nancarrow
Fujitsu Australia



Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"houzj.fnst@fujitsu.com"
Date:
> Since the "Parallel SELECT for INSERT" patch and related GUC/reloption patch have been committed, I have now rebased
> and attached the rest of the original patchset,
 
> which includes:
>- Additional tests for "Parallel SELECT for INSERT"
>- Parallel INSERT functionality
>- Test and documentation updates for Parallel INSERT
Hi,

I noticed that some comments may need to be updated since we introduced parallel insert in this patch.

1) src/backend/executor/execMain.c
     * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
     * would require (a) storing the combocid hash in shared memory, rather
     * than synchronizing it just once at the start of parallelism, and (b) an
     * alternative to heap_update()'s reliance on xmax for mutual exclusion.
     * INSERT may have no such troubles, but we forbid it to simplify the
     * checks.

As we will allow INSERT in parallel mode, we'd better change the comment here.

2) src/backend/storage/lmgr/README
dangers are modest.  The leader and worker share the same transaction,
snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
write any data at all.  Thus, for either to read a table locked exclusively by

The same as 1), parallel insert is the exception.

3) src/backend/storage/lmgr/README
mutual exclusion method for such cases.  Currently, the parallel mode is
strictly read-only, but now we have the infrastructure to allow parallel
inserts and parallel copy.

Maybe we can say:
+mutual exclusion method for such cases.  Currently, we only allowed parallel
+inserts, but we already have the infrastructure to allow parallel copy.


Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Mar 22, 2021 at 2:30 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> I noticed that some comments may need to be updated since we introduced parallel insert in this patch.
>
> 1) src/backend/executor/execMain.c
>          * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
>          * would require (a) storing the combocid hash in shared memory, rather
>          * than synchronizing it just once at the start of parallelism, and (b) an
>          * alternative to heap_update()'s reliance on xmax for mutual exclusion.
>          * INSERT may have no such troubles, but we forbid it to simplify the
>          * checks.
>
> As we will allow INSERT in parallel mode, we'd better change the comment here.
>

Thanks, it does need to be updated for parallel INSERT.
I was thinking of the following change:

-     * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
-     * would require (a) storing the combocid hash in shared memory, rather
-     * than synchronizing it just once at the start of parallelism, and (b) an
-     * alternative to heap_update()'s reliance on xmax for mutual exclusion.
-     * INSERT may have no such troubles, but we forbid it to simplify the
-     * checks.
+     * Except for INSERT, don't allow writes in parallel mode.  Supporting
+     * UPDATE and DELETE would require (a) storing the combocid hash in shared
+     * memory, rather than synchronizing it just once at the start of
+     * parallelism, and (b) an alternative to heap_update()'s reliance on xmax
+     * for mutual exclusion.



> 2) src/backend/storage/lmgr/README
> dangers are modest.  The leader and worker share the same transaction,
> snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
> write any data at all.  Thus, for either to read a table locked exclusively by
>
> The same as 1), parallel insert is the exception.
>

I agree, it needs to be updated too, to account for parallel INSERT
now being supported.

-write any data at all.  ...
+write any data at all (with the exception of parallel insert).  ...


> 3) src/backend/storage/lmgr/README
> mutual exclusion method for such cases.  Currently, the parallel mode is
> strictly read-only, but now we have the infrastructure to allow parallel
> inserts and parallel copy.
>
> Maybe we can say:
> +mutual exclusion method for such cases.  Currently, we only allowed parallel
> +inserts, but we already have the infrastructure to allow parallel copy.
>

Yes, agree, something like:

-mutual exclusion method for such cases.  Currently, the parallel mode is
-strictly read-only, but now we have the infrastructure to allow parallel
-inserts and parallel copy.
+mutual exclusion method for such cases.  Currently, only parallel insert is
+allowed, but we have the infrastructure to allow parallel copy.


Let me know if these changes seem OK to you.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"houzj.fnst@fujitsu.com"
Date:
> > I noticed that some comments may need to be updated since we introduced
> parallel insert in this patch.
> >
> > 1) src/backend/executor/execMain.c
> >          * Don't allow writes in parallel mode.  Supporting UPDATE and
> DELETE
> >          * would require (a) storing the combocid hash in shared memory,
> rather
> >          * than synchronizing it just once at the start of parallelism, and (b) an
> >          * alternative to heap_update()'s reliance on xmax for mutual
> exclusion.
> >          * INSERT may have no such troubles, but we forbid it to simplify the
> >          * checks.
> >
> > As we will allow INSERT in parallel mode, we'd better change the comment
> here.
> >
> 
> Thanks, it does need to be updated for parallel INSERT.
> I was thinking of the following change:
> 
> -     * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
> -     * would require (a) storing the combocid hash in shared memory, rather
> -     * than synchronizing it just once at the start of parallelism, and (b) an
> -     * alternative to heap_update()'s reliance on xmax for mutual exclusion.
> -     * INSERT may have no such troubles, but we forbid it to simplify the
> -     * checks.
> +     * Except for INSERT, don't allow writes in parallel mode.  Supporting
> +     * UPDATE and DELETE would require (a) storing the combocid hash in
> shared
> +     * memory, rather than synchronizing it just once at the start of
> +     * parallelism, and (b) an alternative to heap_update()'s reliance on xmax
> +     * for mutual exclusion.
> 
> 
> 
> > 2) src/backend/storage/lmgr/README
> > dangers are modest.  The leader and worker share the same transaction,
> > snapshot, and combo CID hash, and neither can perform any DDL or,
> > indeed, write any data at all.  Thus, for either to read a table
> > locked exclusively by
> >
> > The same as 1), parallel insert is the exception.
> >
> 
> I agree, it needs to be updated too, to account for parallel INSERT now being
> supported.
> 
> -write any data at all.  ...
> +write any data at all (with the exception of parallel insert).  ...
> 
> 
> > 3) src/backend/storage/lmgr/README
> > mutual exclusion method for such cases.  Currently, the parallel mode
> > is strictly read-only, but now we have the infrastructure to allow
> > parallel inserts and parallel copy.
> >
> > Maybe we can say:
> > +mutual exclusion method for such cases.  Currently, we only allowed
> > +parallel inserts, but we already have the infrastructure to allow parallel copy.
> >
> 
> Yes, agree, something like:
> 
> -mutual exclusion method for such cases.  Currently, the parallel mode is
> -strictly read-only, but now we have the infrastructure to allow parallel -inserts
> and parallel copy.
> +mutual exclusion method for such cases.  Currently, only parallel
> +insert is allowed, but we have the infrastructure to allow parallel copy.
> 
> 
> Let me know if these changes seem OK to you.

Yes, these changes look good to me.

Best regards,
houzj

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Mar 22, 2021 at 6:28 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> >
> > Let me know if these changes seem OK to you.
>
> Yes, these changes look good to me.

Posting an updated set of patches with these changes...

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Mar 22, 2021 at 3:57 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Mar 22, 2021 at 6:28 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > >
> > > Let me know if these changes seem OK to you.
> >
> > Yes, these changes look good to me.
>
> Posting an updated set of patches with these changes...
>

I have marked this as Returned with Feedback. There is a lot of work
to do for this patch as per the feedback given on pgsql-committers
[1].

[1] - https://www.postgresql.org/message-id/E1lMiB9-0001c3-SY%40gemulon.postgresql.org

-- 
With Regards,
Amit Kapila.