Thread: Parallel INSERT (INTO ... SELECT ...)

Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Hi Hackers,

Following on from Dilip Kumar's POC patch for allowing parallelism of
the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC
patch for allowing parallelism of both the INSERT and SELECT parts,
where it can be allowed.
For cases where it can't be allowed (e.g. INSERT into a table with
foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
...) it at least allows parallelism of the SELECT part.
Obviously I've had to update the planner and executor and
parallel-worker code to make this happen, hopefully not breaking too
many things along the way.

Examples with patch applied:


(1) non-parallel:

test=# explain analyze insert into primary_tbl select * from third_tbl;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Insert on primary_tbl  (cost=0.00..154.99 rows=9999 width=12) (actual time=108.445..108.446 rows=0 loops=1)
   ->  Seq Scan on third_tbl  (cost=0.00..154.99 rows=9999 width=12) (actual time=0.009..5.282 rows=9999 loops=1)
 Planning Time: 0.132 ms
 Execution Time: 108.596 ms
(4 rows)


(2) parallel:

test=# explain analyze insert into primary_tbl select * from third_tbl;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..16.00 rows=9999 width=12) (actual time=69.870..70.310 rows=0 loops=1)
   Workers Planned: 5
   Workers Launched: 5
   ->  Parallel Insert on primary_tbl  (cost=0.00..16.00 rows=500 width=12) (actual time=59.948..59.949 rows=0 loops=6)
         ->  Parallel Seq Scan on third_tbl  (cost=0.00..80.00 rows=2500 width=12) (actual time=0.014..0.922 rows=1666 loops=6)
 Planning Time: 0.121 ms
 Execution Time: 70.438 ms
(7 rows)


(3) parallel select only (insert into table with foreign key)

test=# explain analyze insert into secondary_tbl select * from third_tbl;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Insert on secondary_tbl  (cost=0.00..80.00 rows=9999 width=12) (actual time=33.864..33.926 rows=0 loops=1)
   ->  Gather  (cost=0.00..80.00 rows=9999 width=12) (actual time=0.451..5.201 rows=9999 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Parallel Seq Scan on third_tbl  (cost=0.00..80.00 rows=2500 width=12) (actual time=0.013..0.717 rows=2000 loops=5)
 Planning Time: 0.127 ms
 Trigger for constraint secondary_tbl_index_fkey: time=331.834 calls=9999
 Execution Time: 367.342 ms
(8 rows)


Known issues/TODOs:
- Currently only for "INSERT INTO ... SELECT ...". To support "INSERT
INTO ... VALUES ..." would need additional Table AM functions for
dividing up the INSERT work amongst the workers (such functions
currently exist only for scans).
- When INSERTs are made parallel, currently the reported row-count in
the "INSERT 0 <row-count>" status only reflects the rows that the
leader has processed (not the workers) - so it is obviously less than
the actual number of rows inserted.
- Functions relating to computing the number of parallel workers for
an INSERT, and the cost of an INSERT, need work.
- "force_parallel_mode" handling was updated so that it only affects
SELECT (not INSERT) - it can't be allowed for INSERT because we're only
supporting "INSERT INTO ... SELECT ..." and don't support other types
of INSERTs, and also can't allow attempted parallel UPDATEs resulting
from "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE" etc.


Thoughts and feedback?

Regards,
Greg Nancarrow
Fujitsu Australia


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
> - When INSERTs are made parallel, currently the reported row-count in
> the "INSERT 0 <row-count>" status only reflects the rows that the
> leader has processed (not the workers) - so it is obviously less than
> the actual number of rows inserted.

Attached an updated patch which fixes this issue (for parallel
INSERTs, each worker's processed tuple count is communicated in shared
memory back to the leader, where it is added to the global
"es_processed" count).
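
Conceptually, the fix is a per-worker counter in the shared (DSM) area that
the leader folds into its own "es_processed" total once the workers finish.
A minimal self-contained C model of that bookkeeping (the struct and
function names here are illustrative, not the patch's actual structures):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORKERS 8

/* Stand-in for the shared-memory area described above: each parallel
 * worker writes its own processed-tuple count into its slot. */
typedef struct SharedInsertCounts
{
    uint64_t processed[MAX_WORKERS];    /* one slot per worker */
    int      nworkers;
} SharedInsertCounts;

/* Called by worker 'id' each time it inserts a tuple. */
static void
worker_count_tuple(SharedInsertCounts *shared, int id)
{
    shared->processed[id]++;
}

/* Called by the leader after parallel execution finishes: fold the
 * workers' counts into the leader's own processed-tuple total, giving
 * the correct "INSERT 0 <row-count>" figure. */
static uint64_t
leader_collect(const SharedInsertCounts *shared, uint64_t leader_processed)
{
    for (int i = 0; i < shared->nworkers; i++)
        leader_processed += shared->processed[i];
    return leader_processed;
}
```

In the real patch the counter would live in dynamic shared memory and be
written by each worker before it exits; the model above just shows the
accumulation step.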


Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:
On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Hi Hackers,
>
> Following on from Dilip Kumar's POC patch for allowing parallelism of
> the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC
> patch for allowing parallelism of both the INSERT and SELECT parts,
> where it can be allowed.
> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.
> Obviously I've had to update the planner and executor and
> parallel-worker code to make this happen, hopefully not breaking too
> many things along the way.

I feel this will be a very good performance improvement. +1 for this.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Andres Freund
Date:
Hi,

On 2020-09-22 14:55:21 +1000, Greg Nancarrow wrote:
> Following on from Dilip Kumar's POC patch for allowing parallelism of
> the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC
> patch for allowing parallelism of both the INSERT and SELECT parts,
> where it can be allowed.

Cool!

I think it'd be good if you outlined what your approach is to make this
safe.


> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.

I think it'd be good to do this part separately and first, independent
of whether the insert part can be parallelized.


> Obviously I've had to update the planner and executor and
> parallel-worker code to make this happen, hopefully not breaking too
> many things along the way.

Hm, it looks like you've removed a fair bit of checks, it's not clear to
me why that's safe in each instance.


> - Currently only for "INSERT INTO ... SELECT ...". To support "INSERT
> INTO ... VALUES ..." would need additional Table AM functions for
> dividing up the INSERT work amongst the workers (currently only exists
> for scans).

Hm, not entirely following. What precisely are you thinking of here?

I doubt it's really worth adding parallelism support for INSERT
... VALUES, the cost of spawning workers will almost always be higher
than the benefit.





> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
>      TupleDesc    toasttupDesc;
>      Datum        t_values[3];
>      bool        t_isnull[3];
> -    CommandId    mycid = GetCurrentCommandId(true);
> +    CommandId    mycid = GetCurrentCommandId(!IsParallelWorker());
>      struct varlena *result;
>      struct varatt_external toast_pointer;
>      union

Hm? Why do we need this in the various places you have made this change?


> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 1585861..94c8507 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
>       * inserts in general except for the cases where inserts generate a new
>       * CommandId (eg. inserts into a table having a foreign key column).
>       */
> -    if (IsParallelWorker())
> -        ereport(ERROR,
> -                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> -                 errmsg("cannot insert tuples in a parallel worker")));
> -

I'm afraid that this weakens our checks more than I'd like. What if this
ends up being invoked from inside C code?


> @@ -822,19 +822,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
>                  isdead = false;
>                  break;
>              case HEAPTUPLE_INSERT_IN_PROGRESS:
> -
>                  /*
>                   * Since we hold exclusive lock on the relation, normally the
>                   * only way to see this is if it was inserted earlier in our
>                   * own transaction.  However, it can happen in system
>                   * catalogs, since we tend to release write lock before commit
> -                 * there.  Give a warning if neither case applies; but in any
> -                 * case we had better copy it.
> +                 * there. In any case we had better copy it.
>                   */
> -                if (!is_system_catalog &&
> -                    !TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
> -                    elog(WARNING, "concurrent insert in progress within table \"%s\"",
> -                         RelationGetRelationName(OldHeap));
> +
>                  /* treat as live */
>                  isdead = false;
>                  break;
> @@ -1434,16 +1429,11 @@ heapam_index_build_range_scan(Relation heapRelation,
>                       * the only way to see this is if it was inserted earlier
>                       * in our own transaction.  However, it can happen in
>                       * system catalogs, since we tend to release write lock
> -                     * before commit there.  Give a warning if neither case
> -                     * applies.
> +                     * before commit there.
>                       */
>                      xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
>                      if (!TransactionIdIsCurrentTransactionId(xwait))
>                      {
> -                        if (!is_system_catalog)
> -                            elog(WARNING, "concurrent insert in progress within table \"%s\"",
> -                                 RelationGetRelationName(heapRelation));
> -
>                          /*
>                           * If we are performing uniqueness checks, indexing
>                           * such a tuple could lead to a bogus uniqueness

Huh, I don't think this should be necessary?


> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
> index a4944fa..9d3f100 100644
> --- a/src/backend/access/transam/varsup.c
> +++ b/src/backend/access/transam/varsup.c
> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
>      TransactionId xid;
>  
>      /*
> -     * Workers synchronize transaction state at the beginning of each parallel
> -     * operation, so we can't account for new XIDs after that point.
> -     */
> -    if (IsInParallelMode())
> -        elog(ERROR, "cannot assign TransactionIds during a parallel operation");
> -
> -    /*
>       * During bootstrap initialization, we return the special bootstrap
>       * transaction id.
>       */

Same thing, this code cannot just be allowed to be reachable. What
prevents you from assigning two different xids from different workers
etc?


> @@ -577,13 +608,6 @@ AssignTransactionId(TransactionState s)
>      Assert(s->state == TRANS_INPROGRESS);
>  
>      /*
> -     * Workers synchronize transaction state at the beginning of each parallel
> -     * operation, so we can't account for new XIDs at this point.
> -     */
> -    if (IsInParallelMode() || IsParallelWorker())
> -        elog(ERROR, "cannot assign XIDs during a parallel operation");
> -
> -    /*
>       * Ensure parent(s) have XIDs, so that a child always has an XID later
>       * than its parent.  Mustn't recurse here, or we might get a stack
>       * overflow if we're at the bottom of a huge stack of subtransactions none

Dito.


Greetings,

Andres Freund



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Tue, Sep 22, 2020 at 4:56 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>  Gather  (cost=0.00..16.00 rows=9999 width=12) (actual
> time=69.870..70.310 rows=0 loops=1)
>    Workers Planned: 5
>    Workers Launched: 5
>    ->  Parallel Insert on primary_tbl  (cost=0.00..16.00 rows=500
> width=12) (actual time=59.948..59.949 rows=0 loops=6)

Nice.  I took it for a quick spin.  I was initially surprised to see
Gather.  I suppose I thought that Parallel {Insert|Update|Delete}
might be a top level node itself, because in such a plan there is no
need to gather tuples per se.  I understand exactly why you have it
that way though: Gather is needed to control workers and handle their
errors etc, and we don't want to have to terminate parallelism anyway
(thinking of some kind of plan with multiple write subqueries).



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Sep 24, 2020 at 7:57 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Tue, Sep 22, 2020 at 4:56 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >  Gather  (cost=0.00..16.00 rows=9999 width=12) (actual
> > time=69.870..70.310 rows=0 loops=1)
> >    Workers Planned: 5
> >    Workers Launched: 5
> >    ->  Parallel Insert on primary_tbl  (cost=0.00..16.00 rows=500
> > width=12) (actual time=59.948..59.949 rows=0 loops=6)
>
> Nice.  I took it for a quick spin.  I was initially surprised to see
> Gather.  I suppose I thought that Parallel {Insert|Update|Delete}
> might be a top level node itself, because in such a plan there is no
> need to gather tuples per se.  I understand exactly why you have it
> that way though: Gather is needed to control workers and handle their
> errors etc, and we don't want to have to terminate parallelism anyway
> (thinking of some kind of plan with multiple write subqueries).
>

I have not checked the patch but I guess if we parallelise Inserts
with Returning then isn't it better to have Gather node above Parallel
Inserts?

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Sep 24, 2020 at 12:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I have not checked the patch but I guess if we parallelise Inserts
> with Returning then isn't it better to have Gather node above Parallel
> Inserts?
>

This is indeed the case with the patch applied.

For example:

test=# explain insert into primary_tbl select * from third_tbl
returning index, height;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Gather  (cost=0.00..28.15 rows=9999 width=12)
   Workers Planned: 3
   ->  Parallel Insert on primary_tbl  (cost=0.00..28.15 rows=1040 width=12)
         ->  Parallel Seq Scan on third_tbl  (cost=0.00..87.25
rows=3225 width=12)
(4 rows)

test=# insert into primary_tbl select * from third_tbl returning index, height;
 index | height
-------+--------
     1 |    1.2
     2 |    1.2
     3 |    1.2
     4 |    1.2
     5 |    1.2
     6 |    1.2
     7 |    1.2

...

  9435 |    1.2
  9619 |    1.2
  9620 |    1.2
(9999 rows)

INSERT 0 9999


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Sep 24, 2020 at 7:51 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2020-09-22 14:55:21 +1000, Greg Nancarrow wrote:
>
>
> > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> > index 1585861..94c8507 100644
> > --- a/src/backend/access/heap/heapam.c
> > +++ b/src/backend/access/heap/heapam.c
> > @@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
> >        * inserts in general except for the cases where inserts generate a new
> >        * CommandId (eg. inserts into a table having a foreign key column).
> >        */
> > -     if (IsParallelWorker())
> > -             ereport(ERROR,
> > -                             (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> > -                              errmsg("cannot insert tuples in a parallel worker")));
> > -
>
> I'm afraid that this weakens our checks more than I'd like.
>

I think we need to change/remove this check to allow inserts by
parallel workers. I am not sure but maybe we can add an Assert to
ensure that it is safe to perform insert via parallel worker.

> What if this
> ends up being invoked from inside C code?
>

I think it shouldn't be a problem unless one is trying to do something
like insert into foreign key table. So, probably we can have an Assert
to catch it if possible. Do you have any other idea?

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
> > What if this
> > ends up being invoked from inside C code?
> >
>
> I think it shouldn't be a problem unless one is trying to do something
> like insert into foreign key table. So, probably we can have an Assert
> to catch it if possible. Do you have any other idea?
>

Note that the planner code updated by the patch does avoid creating a
Parallel INSERT plan in the case of inserting into a table with a
foreign key (so commandIds won't be created in the parallel-worker
code).
I'm not sure how to distinguish the "invoked from inside C code" case though.
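
The kind of planner-side safety test being described could be sketched as
follows; the struct and field names are purely hypothetical, since the
actual patch would work from the target relation's catalog information:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the target-table properties the planner
 * would need to inspect before choosing a Parallel INSERT plan. */
typedef struct InsertTargetInfo
{
    bool has_foreign_keys;      /* FK checks generate new CommandIds
                                 * inside workers, which is unsafe */
    bool on_conflict_do_update; /* parallel UPDATE is not supported */
} InsertTargetInfo;

/* Parallel INSERT is only considered when nothing about the target
 * forces per-row work that is unsafe in a parallel worker; otherwise
 * the planner falls back to parallelising just the SELECT part. */
static bool
parallel_insert_safe(const InsertTargetInfo *info)
{
    return !info->has_foreign_keys && !info->on_conflict_do_update;
}
```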

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Hi Andres,

On Thu, Sep 24, 2020 at 12:21 PM Andres Freund <andres@anarazel.de> wrote:
>
>I think it'd be good if you outlined what your approach is to make this
>safe.

Some prior work has already been done to establish the necessary
infrastructure to allow parallel INSERTs, in general, to be safe,
except for cases where new commandIds would be generated in the
parallel-worker code (such as inserts into a table having a foreign
key) - these cases need to be avoided.
See the following commits.

85f6b49 Allow relation extension lock to conflict among parallel group members
3ba59cc Allow page lock to conflict among parallel group members

The planner code updated by the patch avoids creating a Parallel
INSERT plan in the case of inserting into a table that has a foreign
key.


>> For cases where it can't be allowed (e.g. INSERT into a table with
>> foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
>> ...") it at least allows parallelism of the SELECT part.
>
>I think it'd be good to do this part separately and first, independent
>of whether the insert part can be parallelized.

OK then, I'll try to extract that as a separate patch.


>> Obviously I've had to update the planner and executor and
>> parallel-worker code to make this happen, hopefully not breaking too
>> many things along the way.
>
>Hm, it looks like you've removed a fair bit of checks, it's not clear to
>me why that's safe in each instance.

It should be safe for Parallel INSERT - but you are right, these are
brute-force removals (for the purpose of a POC patch) that should be
tightened up wherever possible to disallow unsafe paths into that
code. The problem is that currently there's not a lot of context
information available to easily allow that, so some work needs to be done.


>> - Currently only for "INSERT INTO ... SELECT ...". To support "INSERT
>> INTO ... VALUES ..." would need additional Table AM functions for
>> dividing up the INSERT work amongst the workers (currently only exists
>> for scans).
>
>Hm, not entirely following. What precisely are you thinking of here?

All I was saying is that for SELECTs, the work done by each parallel
worker is effectively divided up by parallel-worker-related functions
in tableam.c and indexam.c, and no such technology currently exists
for dividing up work for the "INSERT ... VALUES" case.


>I doubt it's really worth adding parallelism support for INSERT
>... VALUES, the cost of spawning workers will almost always higher than
>the benefit.

You're probably right in doubting any benefit, but I wasn't entirely sure.


>> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
>>       TupleDesc       toasttupDesc;
>>       Datum           t_values[3];
>>       bool            t_isnull[3];
>> -     CommandId       mycid = GetCurrentCommandId(true);
>> +     CommandId       mycid = GetCurrentCommandId(!IsParallelWorker());
>>       struct varlena *result;
>>       struct varatt_external toast_pointer;
>>       union
>
>Hm? Why do we need this in the various places you have made this change?

It's because for Parallel INSERT, we're assigning the same command-id
to each worker up-front during worker initialization (the commandId
has been retrieved by the leader and passed through to each worker)
and "currentCommandIdUsed" has been set true. See the
AssignCommandIdForWorker() function in the patch.
If you see the code of GetCurrentCommandId(), you'll see it Assert
that it's not being run by a parallel worker if the parameter is true.
I didn't want to remove yet another check, without being able to know
the context of the caller, because only for Parallel INSERT do I know
that "currentCommandIdUsed was already true at the start of the
parallel operation". See the comment in that function. Anyway, that's
why I'm passing "false" to relevant GetCurrentCommandId() calls if
they're being run by a parallel (INSERT) worker.
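
A stripped-down model of the behaviour described above, mirroring the
shape of GetCurrentCommandId() and the patch's AssignCommandIdForWorker()
(global state and types simplified for illustration):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int CommandId;

static CommandId currentCommandId = 0;
static bool currentCommandIdUsed = false;
static bool is_parallel_worker = false;   /* stand-in for IsParallelWorker() */

/* Mirrors the shape of GetCurrentCommandId(): asking for the command id
 * "for use" marks it used, and asserts the caller is not a parallel
 * worker - hence the patch passes false from worker code paths. */
static CommandId
get_current_command_id(bool used)
{
    if (used)
    {
        assert(!is_parallel_worker);
        currentCommandIdUsed = true;
    }
    return currentCommandId;
}

/* What the patch's AssignCommandIdForWorker() effectively does: the
 * worker adopts the leader's command id up-front and marks it as
 * already used at the start of the parallel operation. */
static void
assign_command_id_for_worker(CommandId cid)
{
    currentCommandId = cid;
    currentCommandIdUsed = true;
}
```

With the command id assigned up-front this way, worker code can only call
get_current_command_id(false), which is why the patch changes those call
sites rather than relaxing the Assert.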


>> @@ -822,19 +822,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
>>                               isdead = false;
>>                               break;
>>                       case HEAPTUPLE_INSERT_IN_PROGRESS:
>> -
>>                               /*
>>                                * Since we hold exclusive lock on the relation, normally the
>>                                * only way to see this is if it was inserted earlier in our
>>                                * own transaction.  However, it can happen in system
>>                                * catalogs, since we tend to release write lock before >commit
>> -                              * there.  Give a warning if neither case applies; but in any
>> -                              * case we had better copy it.
>> +                              * there. In any case we had better copy it.
>>                                */
>> -                             if (!is_system_catalog &&
>> -                                     !TransactionIdIsCurrentTransactionId>(HeapTupleHeaderGetXmin(tuple->t_data)))
>> -                                     elog(WARNING, "concurrent insert in progress within >table \"%s\"",
>> -                                              RelationGetRelationName(OldHeap));
>> +
>>                               /* treat as live */
>>                               isdead = false;
>>                               break;
>> @@ -1434,16 +1429,11 @@ heapam_index_build_range_scan(Relation heapRelation,
>>                                        * the only way to see this is if it was inserted >earlier
>>                                        * in our own transaction.  However, it can happen in
>>                                        * system catalogs, since we tend to release write >lock
>> -                                      * before commit there.  Give a warning if neither >case
>> -                                      * applies.
>> +                                      * before commit there.
>>                                        */
>>                                       xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
>>                                       if (!TransactionIdIsCurrentTransactionId(xwait))
>>                                       {
>> -                                             if (!is_system_catalog)
>> -                                                     elog(WARNING, "concurrent insert in >progress within table
\"%s\"",
>> -                                                              RelationGetRelationName>(heapRelation));
>> -
>>                                               /*
>>                                                * If we are performing uniqueness checks, >>indexing
>>                                                * such a tuple could lead to a bogus >uniqueness
>
>Huh, I don't think this should be necessary?

Yes, I think you're right, I perhaps got carried away removing checks
on concurrent inserts. I will revert those changes.


>> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
>> index a4944fa..9d3f100 100644
>> --- a/src/backend/access/transam/varsup.c
>> +++ b/src/backend/access/transam/varsup.c
>> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
>>       TransactionId xid;
>>
>>       /*
>> -      * Workers synchronize transaction state at the beginning of each parallel
>> -      * operation, so we can't account for new XIDs after that point.
>> -      */
>> -     if (IsInParallelMode())
>> -             elog(ERROR, "cannot assign TransactionIds during a parallel operation");
>> -
>> -     /*
>>        * During bootstrap initialization, we return the special bootstrap
>>        * transaction id.
>>        */
>
>Same thing, this code cannot just be allowed to be reachable. What
>prevents you from assigning two different xids from different workers
>etc?

At least in the case of Parallel INSERT, the leader for the Parallel
INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed
through and assigned to each of the workers during their
initialization (so they are assigned the same xid).


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.
>

Thanks Greg for the patch.

I have a few points (inspired by the parallel copy feature work) to mention:

1. What if the target table is a foreign table or partitioned table?
2. What happens if the target table has triggers(before statement,
after statement, before row, after row) that are parallel unsafe?
3. Will each worker be doing single row insertions or multi inserts?
If single row insertions, will the buffer lock contentions be more?
4. How does it behave with toast column values?
5. How does it behave if we have a RETURNING clause with INSERT INTO SELECT?

I'm looking forward to seeing some initial numbers on execution times
with and without the patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Sep 25, 2020 at 7:01 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I have few points (inspired from parallel copy feature work) to mention:
>
> 1. What if the target table is a foreign table or partitioned table?
> 2. What happens if the target table has triggers(before statement,
> after statement, before row, after row) that are parallel unsafe?
> 3. Will each worker be doing single row insertions or multi inserts?
> If single row insertions, will the buffer lock contentions be more?
> 5. How does it behave with toast columns values?
> 6. How does it behave if we have a RETURNING clause with INSERT INTO SELECT?
>

Hi Bharath,

Thanks for pointing out more cases I need to exclude and things I need
to investigate further.
I have taken note of them, and will do more testing and improvement.
At least the RETURNING clause with INSERT INTO SELECT is working!

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Sep 25, 2020 at 10:02 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Hi Andres,
>
> On Thu, Sep 24, 2020 at 12:21 PM Andres Freund <andres@anarazel.de> wrote:
> >
>
>
> >> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
> >>       TupleDesc       toasttupDesc;
> >>       Datum           t_values[3];
> >>       bool            t_isnull[3];
> >> -     CommandId       mycid = GetCurrentCommandId(true);
> >> +     CommandId       mycid = GetCurrentCommandId(!IsParallelWorker());
> >>       struct varlena *result;
> >>       struct varatt_external toast_pointer;
> >>       union
> >
> >Hm? Why do we need this in the various places you have made this change?
>
> It's because for Parallel INSERT, we're assigning the same command-id
> to each worker up-front during worker initialization (the commandId
> has been retrieved by the leader and passed through to each worker)
> and "currentCommandIdUsed" has been set true. See the
> AssignCommandIdForWorker() function in the patch.
> If you see the code of GetCurrentCommandId(), you'll see it Assert
> that it's not being run by a parallel worker if the parameter is true.
> I didn't want to remove yet another check, without being able to know
> the context of the caller, because only for Parallel INSERT do I know
> that "currentCommandIdUsed was already true at the start of the
> parallel operation". See the comment in that function. Anyway, that's
> why I'm passing "false" to relevant GetCurrentCommandId() calls if
> they're being run by a parallel (INSERT) worker.
>

But we can tighten the condition in GetCurrentCommandId() such that it
Asserts for a parallel worker only when currentCommandIdUsed is not set
before the start of the parallel operation. I also find these changes
in the callers of GetCurrentCommandId() quite ad hoc and ugly even if
they are correct. Also, why don't we face similar problems for parallel copy?

>
> >> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
> >> index a4944fa..9d3f100 100644
> >> --- a/src/backend/access/transam/varsup.c
> >> +++ b/src/backend/access/transam/varsup.c
> >> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
> >>       TransactionId xid;
> >>
> >>       /*
> >> -      * Workers synchronize transaction state at the beginning of each parallel
> >> -      * operation, so we can't account for new XIDs after that point.
> >> -      */
> >> -     if (IsInParallelMode())
> >> -             elog(ERROR, "cannot assign TransactionIds during a parallel operation");
> >> -
> >> -     /*
> >>        * During bootstrap initialization, we return the special bootstrap
> >>        * transaction id.
> >>        */
> >
> >Same thing, this code cannot just be allowed to be reachable. What
> >prevents you from assigning two different xids from different workers
> >etc?
>
> At least in the case of Parallel INSERT, the leader for the Parallel
> INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed
> through and assigned to each of the workers during their
> initialization (so they are assigned the same xid).
>

So are you facing problems in this area because we EnterParallelMode
before even assigning the xid in the leader? Because I don't think we
should ever reach this code in the worker. If so, there are two
possibilities that come to my mind: (a) assign the xid in the leader
before entering parallel mode, or (b) change the check so that we don't
assign a new xid in workers. In this case, I am again wondering how
parallel copy deals with this.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Fri, Sep 25, 2020 at 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > At least in the case of Parallel INSERT, the leader for the Parallel
> > INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed
> > through and assigned to each of the workers during their
> > initialization (so they are assigned the same xid).
> >
>
> So are you facing problems in this area because we EnterParallelMode
> before even assigning the xid in the leader? Because I don't think we
> should ever reach this code in the worker. If so, there are two
> possibilities that come to my mind: (a) assign the xid in the leader before
> entering parallel mode, or (b) change the check so that we don't assign
> a new xid in workers. In this case, I am again wondering how parallel
> copy is dealing with this?
>

In parallel copy, we are doing option (a) i.e. the leader gets the
full txn id before entering parallel mode and passes it to all
workers.
In the leader:
    full_transaction_id = GetCurrentFullTransactionId();
    EnterParallelMode();
    shared_info_ptr->full_transaction_id = full_transaction_id;
In the workers:
    AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);

Hence the following part of the code doesn't get hit:
    if (IsInParallelMode() || IsParallelWorker())
        elog(ERROR, "cannot assign XIDs during a parallel operation");

We also deal with the command id similarly, i.e. the leader gets the
command id, and workers use it while inserting.
In the leader:
    shared_info_ptr->mycid = GetCurrentCommandId(true);
In the workers:
    AssignCommandIdForWorker(pcshared_info->mycid, true);

[1]
void
AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
{
    TransactionState s = CurrentTransactionState;

    Assert((IsInParallelMode() || IsParallelWorker()));
    s->fullTransactionId = fullTransactionId;
}

void
AssignCommandIdForWorker(CommandId commandId, bool used)
{
    Assert((IsInParallelMode() || IsParallelWorker()));

    /* this is global to a transaction, not subtransaction-local */
    if (used)
        currentCommandIdUsed = true;

    currentCommandId = commandId;
}

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> But we can tighten the condition in GetCurrentCommandId() such that it
> Asserts for a parallel worker only when currentCommandIdUsed is not set
> before the start of the parallel operation. I also find these changes in the
> callers of GetCurrentCommandId() quite ad hoc and ugly, even if they are
> correct. Also, why don't we face similar problems for parallel copy?
>

For Parallel Insert, as part of query plan execution,
GetCurrentCommandId(true) is being called as part of INSERT statement
execution.
Parallel Copy of course doesn't have to deal with this; that's why
there's a difference. Also, it has its own parallel entry point
(ParallelCopyMain), so it's in full control; it's not trying to fit in
with the infrastructure for plan execution.

> So are you facing problems in this area because we EnterParallelMode
> before even assigning the xid in the leader? Because I don't think we
> should ever reach this code in the worker. If so, there are two
> possibilities that come to my mind: (a) assign the xid in the leader before
> entering parallel mode, or (b) change the check so that we don't assign
> a new xid in workers. In this case, I am again wondering how parallel
> copy is dealing with this?
>

Again, there's a fundamental difference in the Parallel Insert case.
Right at the top of ExecutePlan() it calls EnterParallelMode().
For Parallel Copy, there is no such problem: EnterParallelMode() is
only called just before ParallelCopyMain() is called, so it can easily
acquire the xid before this, because parallel mode is not yet set.

As it turns out, I think I have solved the commandId issue (and almost
the xid issue) by realising that both the xid and cid are ALREADY
being included as part of the serialized transaction state in the
Parallel DSM. So actually I don't believe that there is any need for
separately passing them in the DSM, and having to use those
AssignXXXXForWorker() functions in the worker code - not even in the
Parallel Copy case (? - need to check). GetCurrentCommandId(true) and
GetFullTransactionId() need to be called prior to Parallel DSM
initialization, so they are included in the serialized transaction
state.
I just needed to add a function to set currentCommandIdUsed=true in
the worker initialization (for the INSERT case), and make a small tweak to
the Assert in GetCurrentCommandId() to ensure that
currentCommandIdUsed, in a parallel worker, never gets set to true
when it is false. This is in line with the comment in that function,
because we know that "currentCommandIdUsed was already true at the start
of the parallel operation". With this in place, I don't need to change
any of the original calls to GetCurrentCommandId(), so this addresses
the issue raised by Andres.
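If I've understood the description correctly, the worker-side helper and the tweaked Assert would look something like the sketch below. This is only an illustration of the approach described above, not the actual patch code, though it relies on the existing xact.c internals (currentCommandId, currentCommandIdUsed, IsParallelWorker()):

```c
/* Sketch only: worker-initialization helper for the Parallel INSERT case. */
void
SetCurrentCommandIdUsedForWorker(void)
{
	Assert(IsParallelWorker() && !currentCommandIdUsed &&
		   currentCommandId != InvalidCommandId);

	currentCommandIdUsed = true;
}

CommandId
GetCurrentCommandId(bool used)
{
	/* this is global to a transaction, not subtransaction-local */
	if (used)
	{
		/*
		 * In a parallel worker, only allow this if currentCommandIdUsed was
		 * already true at the start of the parallel operation; we must never
		 * flip it from false to true inside a worker.
		 */
		Assert(!(IsParallelWorker() && !currentCommandIdUsed));
		currentCommandIdUsed = true;
	}
	return currentCommandId;
}
```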

I am not sure yet how to get past the issue of parallel mode being
set at the top of ExecutePlan(). With that in place, it doesn't allow
an xid to be acquired by the leader without removing/changing that
parallel-mode check in GetNewTransactionId().

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@g
>
> As it turns out, I think I have solved the commandId issue (and almost
> the xid issue) by realising that both the xid and cid are ALREADY
> being included as part of the serialized transaction state in the
> Parallel DSM. So actually I don't believe that there is any need for
> separately passing them in the DSM, and having to use those
> AssignXXXXForWorker() functions in the worker code - not even in the
> Parallel Copy case (? - need to check). GetCurrentCommandId(true) and
> GetFullTransactionId() need to be called prior to Parallel DSM
> initialization, so they are included in the serialized transaction
> state.
> I just needed to add a function to set currentCommandIdUsed=true in
> the worker initialization (for INSERT case) and make a small tweak to
> the Assert in GetCurrentCommandId() to ensure that
> currentCommandIdUsed, in a parallel worker, never gets set to true
> when it is false. This is in line with the comment in that function,
> because we know that "currentCommandIdUsed was already true at the start
> of the parallel operation". With this in place, I don't need to change
> any of the original calls to GetCurrentCommandId(), so this addresses
> that issue raised by Andres.
>
> I am not sure yet how to get past the issue of the parallel mode being
> set at the top of ExecutePlan(). With that in place, it doesn't allow
> a xid to be acquired for the leader, without removing/changing that
> parallel-mode check in GetNewTransactionId().
>

I think now there is no fundamental problem in allocating the xid in the
leader and then sharing it with the workers, who can use it to perform the
insert. So we can probably tweak that check so that it is true
only for parallel workers.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Sep 25, 2020 at 2:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > For cases where it can't be allowed (e.g. INSERT into a table with
> > foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> > ...") it at least allows parallelism of the SELECT part.
> >
>
> Thanks Greg for the patch.
>
> 2. What happens if the target table has triggers(before statement,
> after statement, before row, after row) that are parallel unsafe?
>

In such a case, a parallel insert shouldn't be selected. However, we
should still be able to execute the SELECT part in parallel.

> 3. Will each worker be doing single row insertions or multi inserts?
> If single row insertions, will the buffer lock contentions be more?
>

I don't think the purpose of this patch is to change the basic flow of
how Insert works, and I am also not sure that would be worth the effort.
I have answered this earlier in a bit more detail [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1Ks8Sqs29VHPS6koNj5E9YQdkGCzgGsSrQMeUbQfe28yg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> Again, there's a fundamental difference in the Parallel Insert case.
> Right at the top of ExecutePlan it calls EnterParallelMode().
> For ParallelCopy(), there is no such problem. EnterParallelMode() is
> only called just before ParallelCopyMain() is called. So it can easily
> acquire the xid before this, because then parallel mode is not set.
>
> As it turns out, I think I have solved the commandId issue (and almost
> the xid issue) by realising that both the xid and cid are ALREADY
> being included as part of the serialized transaction state in the
> Parallel DSM. So actually I don't believe that there is any need for
> separately passing them in the DSM, and having to use those
> AssignXXXXForWorker() functions in the worker code - not even in the
> Parallel Copy case (? - need to check).
>

Thanks Greg for the detailed points.

I further checked on the full txn id and command id. Yes, these are
getting passed to workers via InitializeParallelDSM() ->
SerializeTransactionState(). I tried to summarize what we need to do
in the case of parallel inserts in general, i.e. parallel COPY, parallel
inserts in INSERT INTO, and parallel inserts in CTAS.

In the leader:
    GetCurrentFullTransactionId()
    GetCurrentCommandId(true)
    EnterParallelMode();
    InitializeParallelDSM() --> calls SerializeTransactionState()
(both full txn id and command id are serialized into parallel DSM)

In the workers:
ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
full txn id and command id are restored into workers'
CurrentTransactionState->fullTransactionId and currentCommandId)
If the parallel workers are meant for insertions, then we need to set
currentCommandIdUsed = true. Maybe we can lift the Assert in
GetCurrentCommandId(); if we don't want to touch that function, then
we can have a new function GetCurrentCommandidInWorker() whose
functionality will be the same as GetCurrentCommandId(), but without the
Assert(!IsParallelWorker());.
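The alternative helper mentioned above might be as simple as the following sketch. The name GetCurrentCommandidInWorker comes from the suggestion above; the body simply mirrors GetCurrentCommandId() minus the Assert, and this is illustrative rather than actual patch code:

```c
/* Sketch: GetCurrentCommandId() without Assert(!IsParallelWorker()). */
CommandId
GetCurrentCommandidInWorker(bool used)
{
	/* this is global to a transaction, not subtransaction-local */
	if (used)
		currentCommandIdUsed = true;

	return currentCommandId;
}
```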

Am I missing something?

If the above points are true, we might have to update the parallel
copy patch set, test the use cases and post separately in the parallel
copy thread in coming days.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Sat, Sep 26, 2020 at 11:00 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > Again, there's a fundamental difference in the Parallel Insert case.
> > Right at the top of ExecutePlan it calls EnterParallelMode().
> > For ParallelCopy(), there is no such problem. EnterParallelMode() is
> > only called just before ParallelCopyMain() is called. So it can easily
> > acquire the xid before this, because then parallel mode is not set.
> >
> > As it turns out, I think I have solved the commandId issue (and almost
> > the xid issue) by realising that both the xid and cid are ALREADY
> > being included as part of the serialized transaction state in the
> > Parallel DSM. So actually I don't believe that there is any need for
> > separately passing them in the DSM, and having to use those
> > AssignXXXXForWorker() functions in the worker code - not even in the
> > Parallel Copy case (? - need to check).
> >
>
> Thanks Greg for the detailed points.
>
> I further checked on full txn id and command id. Yes, these are
> getting passed to workers  via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general i.e. parallel COPY, parallel
> inserts in INSERT INTO and parallel inserts in CTAS.
>
> In the leader:
>     GetCurrentFullTransactionId()
>     GetCurrentCommandId(true)
>     EnterParallelMode();
>     InitializeParallelDSM() --> calls SerializeTransactionState()
> (both full txn id and command id are serialized into parallel DSM)
>

This won't be true for the Parallel Insert patch, as explained by Greg,
because we enter parallel mode well before we assign the xid.


-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Sep 25, 2020 at 9:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > > What if this
> > > ends up being invoked from inside C code?
> > >
> >
> > I think it shouldn't be a problem unless one is trying to do something
> > like insert into foreign key table. So, probably we can have an Assert
> > to catch it if possible. Do you have any other idea?
> >
>
> Note that the planner code updated by the patch does avoid creating a
> Parallel INSERT plan in the case of inserting into a table with a
> foreign key (so commandIds won't be created in the parallel-worker
> code).
> I'm not sure how to distinguish the "invoked from inside C code" case though.
>

I think, if possible, we can have an Assert to check if it is a
parallel worker and the relation has a foreign key. Similarly, we can
enhance the check for any other unsafe use. This will prevent
illegal usage of inserts via parallel workers.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:
On Wed, Sep 23, 2020 at 2:21 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > - When INSERTs are made parallel, currently the reported row-count in
> > the "INSERT 0 <row-count>" status only reflects the rows that the
> > leader has processed (not the workers) - so it is obviously less than
> > the actual number of rows inserted.
>
> Attached an updated patch which fixes this issue (for parallel
> INSERTs, each worker's processed tuple count is communicated in shared
> memory back to the leader, where it is added to the global
> "es_processed" count).

I noticed that we do not have any check for skipping temporary-table
insertion.

/*
 * Check if the target relation has foreign keys; if so, avoid
 * creating a parallel Insert plan (because inserting into
 * such tables would result in creation of new CommandIds, and
 * this isn't supported by parallel workers).
 * Similarly, avoid creating a parallel Insert plan if ON
 * CONFLICT ... DO UPDATE ... has been specified, because
 * parallel UPDATE is not supported.
 * However, do allow any underlying query to be run by parallel
 * workers in these cases.
 */

You should also include a temporary-table check here, as parallel
workers do not have access to the leader's temporary tables.
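Such a check could be sketched along the following lines. This is hypothetical (the flag name parallel_insert_safe is illustrative only), but RELPERSISTENCE_TEMP and relpersistence are the existing catalog fields:

```c
/*
 * Sketch: parallel workers cannot access the leader's temporary tables,
 * so a temporary target relation must also rule out a parallel Insert
 * plan (while still allowing a parallel plan for the underlying query).
 */
if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
	parallel_insert_safe = false;	/* illustrative flag name */
```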

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sun, Sep 27, 2020 at 2:03 AM vignesh C <vignesh21@gmail.com> wrote:
>
> I noticed that we are not having any check for skipping temporary
> table insertion.
>

> You should also include temporary tables check here, as parallel
> workers might not have access to temporary tables.
>

Thanks Vignesh, you are right, I need to test this and add it to the
list of further exclusions that the patch needs to check for.
Hopefully I can provide an updated patch soon that caters for these
additional identified cases.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> I further checked on full txn id and command id. Yes, these are
> getting passed to workers  via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general i.e. parallel COPY, parallel
> inserts in INSERT INTO and parallel inserts in CTAS.
>
> In the leader:
>     GetCurrentFullTransactionId()
>     GetCurrentCommandId(true)
>     EnterParallelMode();
>     InitializeParallelDSM() --> calls SerializeTransactionState()
> (both full txn id and command id are serialized into parallel DSM)
>
> In the workers:
> ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> full txn id and command id are restored into workers'
> CurrentTransactionState->fullTransactionId and currentCommandId)
> If the parallel workers are meant for insertions, then we need to set
> currentCommandIdUsed = true; Maybe we can lift the assert in
> GetCurrentCommandId(), if we don't want to touch that function, then
> we can have a new function GetCurrentCommandidInWorker() whose
> functionality will be same as GetCurrentCommandId() without the
> Assert(!IsParallelWorker());.
>
> Am I missing something?
>
> If the above points are true, we might have to update the parallel
> copy patch set, test the use cases and post separately in the parallel
> copy thread in coming days.
>

Hi Bharath,

I pretty much agree with your above points.

I've attached an updated Parallel INSERT...SELECT patch, that:
- Only uses existing transaction state serialization support for
transfer of xid and cid.
- Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
currentCommandIdUsed=true at the start of a parallel operation (used
for Parallel INSERT case, where we know the currentCommandId has been
assigned to the worker at the start of the parallel operation).
- Tweaks the Assert condition within the "used=true" parameter case in
GetCurrentCommandId(), so that it only fires if in a parallel worker
and currentCommandIdUsed is false - refer to the updated comment in that
function.
- Does not modify any existing GetCurrentCommandId() calls.
- Does not remove any existing parallel-related asserts/checks, except
for the "cannot insert tuples in a parallel worker" error in
heap_prepare_insert(). I am still considering what to do with the
original error-check here.
[- Does not yet cater for other exclusion cases that you and Vignesh
have pointed out]

This patch is mostly a lot cleaner, but does contain a possible ugly
hack, in that where it needs to call GetCurrentFullTransactionId(), it
must temporarily escape parallel mode (recalling that parallel mode is
set true right at the top of ExecutePlan() in the cases of Parallel
INSERT/SELECT).
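The hack in question would presumably look something like this hypothetical sketch (not the actual patch code; EnterParallelMode/ExitParallelMode and GetCurrentFullTransactionId are the existing backend functions):

```c
/*
 * ExecutePlan() has already called EnterParallelMode(), which would make
 * GetCurrentFullTransactionId() trip the parallel-mode check in
 * GetNewTransactionId(), so temporarily escape parallel mode around it.
 */
FullTransactionId fxid;
bool		was_in_parallel_mode = IsInParallelMode();

if (was_in_parallel_mode)
	ExitParallelMode();

fxid = GetCurrentFullTransactionId();

if (was_in_parallel_mode)
	EnterParallelMode();
```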

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > I further checked on full txn id and command id. Yes, these are
> > getting passed to workers  via InitializeParallelDSM() ->
> > SerializeTransactionState(). I tried to summarize what we need to do
> > in case of parallel inserts in general i.e. parallel COPY, parallel
> > inserts in INSERT INTO and parallel inserts in CTAS.
> >
> > In the leader:
> >     GetCurrentFullTransactionId()
> >     GetCurrentCommandId(true)
> >     EnterParallelMode();
> >     InitializeParallelDSM() --> calls SerializeTransactionState()
> > (both full txn id and command id are serialized into parallel DSM)
> >
> > In the workers:
> > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > full txn id and command id are restored into workers'
> > CurrentTransactionState->fullTransactionId and currentCommandId)
> > If the parallel workers are meant for insertions, then we need to set
> > currentCommandIdUsed = true; Maybe we can lift the assert in
> > GetCurrentCommandId(), if we don't want to touch that function, then
> > we can have a new function GetCurrentCommandidInWorker() whose
> > functionality will be same as GetCurrentCommandId() without the
> > Assert(!IsParallelWorker());.
> >
> > Am I missing something?
> >
> > If the above points are true, we might have to update the parallel
> > copy patch set, test the use cases and post separately in the parallel
> > copy thread in coming days.
> >
>
> Hi Bharath,
>
> I pretty much agree with your above points.
>
> I've attached an updated Parallel INSERT...SELECT patch, that:
> - Only uses existing transaction state serialization support for
> transfer of xid and cid.
> - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> currentCommandIdUsed=true at the start of a parallel operation (used
> for Parallel INSERT case, where we know the currentCommandId has been
> assigned to the worker at the start of the parallel operation).
> - Tweaks the Assert condition within "used=true" parameter case in
> GetCurrentCommandId(), so that it only fires if in a parallel worker
> and currentCommandId is false - refer to the updated comment in that
> function.
> - Does not modify any existing GetCurrentCommandId() calls.
> - Does not remove any existing parallel-related asserts/checks, except
> for the "cannot insert tuples in a parallel worker" error in
> heap_prepare_insert(). I am still considering what to do with the
> original error-check here.
> [- Does not yet cater for other exclusion cases that you and Vignesh
> have pointed out]
>
> This patch is mostly a lot cleaner, but does contain a possible ugly
> hack, in that where it needs to call GetCurrentFullTransactionId(), it
> must temporarily escape parallel-mode (recalling that parallel-mode is
> set true right at the top of ExecutePlan() in the cases of Parallel
> INSERT/SELECT).
>

Thanks Greg.

In general, I see a few things common to all parallel insert
cases (CTAS[1], COPY[2], INSERT INTO ... SELECT):
1. Removal of "cannot insert tuples in a parallel worker" restriction
from heap_prepare_insert()
2. Each worker should be able to set currentCommandIdUsed to true.
3. The change you proposed to make in GetCurrentCommandId()'s assert condition.

Please add if I miss any other common point.

Common solutions to each of the above points would be beneficial to
all the parallel insert cases. How about having a common thread,
discussion and a common patch for all the 3 points?

@Amit Kapila  @Greg Nancarrow @vignesh C Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACWj%2B3H5TQqwxANZmdePEnSNxk-YAeT1c5WE184Gf75XUw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1%2BkpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV%3Dg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Sep 28, 2020 at 4:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > I further checked on full txn id and command id. Yes, these are
> > > getting passed to workers  via InitializeParallelDSM() ->
> > > SerializeTransactionState(). I tried to summarize what we need to do
> > > in case of parallel inserts in general i.e. parallel COPY, parallel
> > > inserts in INSERT INTO and parallel inserts in CTAS.
> > >
> > > In the leader:
> > >     GetCurrentFullTransactionId()
> > >     GetCurrentCommandId(true)
> > >     EnterParallelMode();
> > >     InitializeParallelDSM() --> calls SerializeTransactionState()
> > > (both full txn id and command id are serialized into parallel DSM)
> > >
> > > In the workers:
> > > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > > full txn id and command id are restored into workers'
> > > CurrentTransactionState->fullTransactionId and currentCommandId)
> > > If the parallel workers are meant for insertions, then we need to set
> > > currentCommandIdUsed = true; Maybe we can lift the assert in
> > > GetCurrentCommandId(), if we don't want to touch that function, then
> > > we can have a new function GetCurrentCommandidInWorker() whose
> > > functionality will be same as GetCurrentCommandId() without the
> > > Assert(!IsParallelWorker());.
> > >
> > > Am I missing something?
> > >
> > > If the above points are true, we might have to update the parallel
> > > copy patch set, test the use cases and post separately in the parallel
> > > copy thread in coming days.
> > >
> >
> > Hi Bharath,
> >
> > I pretty much agree with your above points.
> >
> > I've attached an updated Parallel INSERT...SELECT patch, that:
> > - Only uses existing transaction state serialization support for
> > transfer of xid and cid.
> > - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> > currentCommandIdUsed=true at the start of a parallel operation (used
> > for Parallel INSERT case, where we know the currentCommandId has been
> > assigned to the worker at the start of the parallel operation).
> > - Tweaks the Assert condition within "used=true" parameter case in
> > GetCurrentCommandId(), so that it only fires if in a parallel worker
> > and currentCommandId is false - refer to the updated comment in that
> > function.
> > - Does not modify any existing GetCurrentCommandId() calls.
> > - Does not remove any existing parallel-related asserts/checks, except
> > for the "cannot insert tuples in a parallel worker" error in
> > heap_prepare_insert(). I am still considering what to do with the
> > original error-check here.
> > [- Does not yet cater for other exclusion cases that you and Vignesh
> > have pointed out]
> >
> > This patch is mostly a lot cleaner, but does contain a possible ugly
> > hack, in that where it needs to call GetCurrentFullTransactionId(), it
> > must temporarily escape parallel-mode (recalling that parallel-mode is
> > set true right at the top of ExecutePlan() in the cases of Parallel
> > INSERT/SELECT).
> >
>
> Thanks Greg.
>
> In general, see a few things common to all parallel insert
> cases(CTAS[1], COPY[2], INSERT INTO SELECTs):
> 1. Removal of "cannot insert tuples in a parallel worker" restriction
> from heap_prepare_insert()
> 2. Each worker should be able to set currentCommandIdUsed to true.
> 3. The change you proposed to make in GetCurrentCommandId()'s assert condition.
>
> Please add if I miss any other common point.
>
> Common solutions to each of the above points would be beneficial to
> all the parallel insert cases. How about having a common thread,
> discussion and a common patch for all the 3 points?
>

I am not sure that is required at this stage; let's first sort out
other parts of the design, because there could be other, bigger problems
which we have not thought about yet. I have already shared some
thoughts on those points in this thread; let's first get that done and
have the basic patch ready, then if required we can discuss these
points in detail in another thread.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > I further checked on full txn id and command id. Yes, these are
> > getting passed to workers  via InitializeParallelDSM() ->
> > SerializeTransactionState(). I tried to summarize what we need to do
> > in case of parallel inserts in general i.e. parallel COPY, parallel
> > inserts in INSERT INTO and parallel inserts in CTAS.
> >
> > In the leader:
> >     GetCurrentFullTransactionId()
> >     GetCurrentCommandId(true)
> >     EnterParallelMode();
> >     InitializeParallelDSM() --> calls SerializeTransactionState()
> > (both full txn id and command id are serialized into parallel DSM)
> >
> > In the workers:
> > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > full txn id and command id are restored into workers'
> > CurrentTransactionState->fullTransactionId and currentCommandId)
> > If the parallel workers are meant for insertions, then we need to set
> > currentCommandIdUsed = true; Maybe we can lift the assert in
> > GetCurrentCommandId(), if we don't want to touch that function, then
> > we can have a new function GetCurrentCommandidInWorker() whose
> > functionality will be same as GetCurrentCommandId() without the
> > Assert(!IsParallelWorker());.
> >
> > Am I missing something?
> >
> > If the above points are true, we might have to update the parallel
> > copy patch set, test the use cases and post separately in the parallel
> > copy thread in coming days.
> >
>
> Hi Bharath,
>
> I pretty much agree with your above points.
>
> I've attached an updated Parallel INSERT...SELECT patch, that:
> - Only uses existing transaction state serialization support for
> transfer of xid and cid.
> - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> currentCommandIdUsed=true at the start of a parallel operation (used
> for Parallel INSERT case, where we know the currentCommandId has been
> assigned to the worker at the start of the parallel operation).
> - Tweaks the Assert condition within "used=true" parameter case in
> GetCurrentCommandId(), so that it only fires if in a parallel worker
> and currentCommandId is false - refer to the updated comment in that
> function.
> - Does not modify any existing GetCurrentCommandId() calls.
> - Does not remove any existing parallel-related asserts/checks, except
> for the "cannot insert tuples in a parallel worker" error in
> heap_prepare_insert(). I am still considering what to do with the
> original error-check here.
> [- Does not yet cater for other exclusion cases that you and Vignesh
> have pointed out]
>
> This patch is mostly a lot cleaner, but does contain a possible ugly
> hack, in that where it needs to call GetCurrentFullTransactionId(), it
> must temporarily escape parallel-mode (recalling that parallel-mode is
> set true right at the top of ExecutePlan() in the cases of Parallel
> INSERT/SELECT).

I think you still need to work on the costing part; basically, if we
are parallelizing the whole insert then the plan is like below:

-> Gather
  -> Parallel Insert
      -> Parallel Seq Scan

That means the tuples we are selecting via the scan are not sent back to
the Gather node, so in cost_gather we need to check if it is for an
INSERT; if so, no rows are transferred through the parallel queue,
which means we need not pay any parallel tuple cost.
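In cost_gather() that might be sketched as follows. This is hypothetical (the condition subpath_is_parallel_insert is illustrative only), but parallel_tuple_cost is the existing costing GUC variable:

```c
/*
 * Sketch: don't charge parallel_tuple_cost for rows when the Gather's
 * subpath is a parallel Insert, since no tuples are sent back to the
 * leader through the parallel tuple queue.
 */
if (!subpath_is_parallel_insert)	/* illustrative condition */
	run_cost += parallel_tuple_cost * path->path.rows;
```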

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Tue, Sep 29, 2020 at 8:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Sep 28, 2020 at 8:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > I further checked on full txn id and command id. Yes, these are
> > > getting passed to workers  via InitializeParallelDSM() ->
> > > SerializeTransactionState(). I tried to summarize what we need to do
> > > in case of parallel inserts in general i.e. parallel COPY, parallel
> > > inserts in INSERT INTO and parallel inserts in CTAS.
> > >
> > > In the leader:
> > >     GetCurrentFullTransactionId()
> > >     GetCurrentCommandId(true)
> > >     EnterParallelMode();
> > >     InitializeParallelDSM() --> calls SerializeTransactionState()
> > > (both full txn id and command id are serialized into parallel DSM)
> > >
> > > In the workers:
> > > ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> > > full txn id and command id are restored into workers'
> > > CurrentTransactionState->fullTransactionId and currentCommandId)
> > > If the parallel workers are meant for insertions, then we need to set
> > > currentCommandIdUsed = true; Maybe we can lift the assert in
> > > GetCurrentCommandId(), if we don't want to touch that function, then
> > > we can have a new function GetCurrentCommandidInWorker() whose
> > > functionality will be same as GetCurrentCommandId() without the
> > > Assert(!IsParallelWorker());.
> > >
> > > Am I missing something?
> > >
> > > If the above points are true, we might have to update the parallel
> > > copy patch set, test the use cases and post separately in the parallel
> > > copy thread in coming days.
> > >
> >
> > Hi Bharath,
> >
> > I pretty much agree with your above points.
> >
> > I've attached an updated Parallel INSERT...SELECT patch, that:
> > - Only uses existing transaction state serialization support for
> > transfer of xid and cid.
> > - Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
> > currentCommandIdUsed=true at the start of a parallel operation (used
> > for Parallel INSERT case, where we know the currentCommandId has been
> > assigned to the worker at the start of the parallel operation).
> > - Tweaks the Assert condition within "used=true" parameter case in
> > GetCurrentCommandId(), so that it only fires if in a parallel worker
> > and currentCommandId is false - refer to the updated comment in that
> > function.
> > - Does not modify any existing GetCurrentCommandId() calls.
> > - Does not remove any existing parallel-related asserts/checks, except
> > for the "cannot insert tuples in a parallel worker" error in
> > heap_prepare_insert(). I am still considering what to do with the
> > original error-check here.
> > [- Does not yet cater for other exclusion cases that you and Vignesh
> > have pointed out]
> >
> > This patch is mostly a lot cleaner, but does contain a possible ugly
> > hack, in that where it needs to call GetCurrentFullTransactionId(), it
> > must temporarily escape parallel-mode (recalling that parallel-mode is
> > set true right at the top of ExecutePlan() in the cases of Parallel
> > INSERT/SELECT).
>
> I think you still need to work on the costing part, basically if we
> are parallelizing whole insert then plan is like below
>
> -> Gather
>   -> Parallel Insert
>       -> Parallel Seq Scan
>
> That means the tuple we are selecting via scan are not sent back to
> the gather node, so in cost_gather we need to see if it is for the
> INSERT then there is no row transferred through the parallel queue
> that mean we need not to pay any parallel tuple cost.

I just looked into the parallel CTAS[1] patch for the same thing, and
I can see in that patch it is being handled.

[1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
> >
> > I think you still need to work on the costing part, basically if we
> > are parallelizing whole insert then plan is like below
> >
> > -> Gather
> >   -> Parallel Insert
> >       -> Parallel Seq Scan
> >
> > That means the tuple we are selecting via scan are not sent back to
> > the gather node, so in cost_gather we need to see if it is for the
> > INSERT then there is no row transferred through the parallel queue
> > that mean we need not to pay any parallel tuple cost.
>
> I just looked into the parallel CTAS[1] patch for the same thing, and
> I can see in that patch it is being handled.
>
> [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
>

Hi Dilip,

You're right, the costing for Parallel Insert is not yet finished;
I'm still working on it, and haven't posted an updated patch for it
yet.
As far as the cost_gather() method is concerned, Parallel INSERT can
probably use the same costing approach as the CTAS patch, except in
the case where a RETURNING clause is specified.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > >
> > > I think you still need to work on the costing part, basically if we
> > > are parallelizing whole insert then plan is like below
> > >
> > > -> Gather
> > >   -> Parallel Insert
> > >       -> Parallel Seq Scan
> > >
> > > That means the tuple we are selecting via scan are not sent back to
> > > the gather node, so in cost_gather we need to see if it is for the
> > > INSERT then there is no row transferred through the parallel queue
> > > that mean we need not to pay any parallel tuple cost.
> >
> > I just looked into the parallel CTAS[1] patch for the same thing, and
> > I can see in that patch it is being handled.
> >
> > [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
> >
>
> Hi Dilip,
>
> You're right, the costing for Parallel Insert is not done and
> finished, I'm still working on the costing, and haven't posted an
> updated patch for it yet.

Okay.

> As far as cost_gather() method is concerned, for Parallel INSERT, it
> can probably use the same costing approach as the CTAS patch except in
> the case of a specified RETURNING clause.

Yeah right.  I did not think about the returning part.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > >
> > > I think you still need to work on the costing part, basically if we
> > > are parallelizing whole insert then plan is like below
> > >
> > > -> Gather
> > >   -> Parallel Insert
> > >       -> Parallel Seq Scan
> > >
> > > That means the tuple we are selecting via scan are not sent back to
> > > the gather node, so in cost_gather we need to see if it is for the
> > > INSERT then there is no row transferred through the parallel queue
> > > that mean we need not to pay any parallel tuple cost.
> >
> > I just looked into the parallel CTAS[1] patch for the same thing, and
> > I can see in that patch it is being handled.
> >
> > [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
> >
>
> Hi Dilip,
>
> You're right, the costing for Parallel Insert is not done and
> finished, I'm still working on the costing, and haven't posted an
> updated patch for it yet.
> As far as cost_gather() method is concerned, for Parallel INSERT, it
> can probably use the same costing approach as the CTAS patch except in
> the case of a specified RETURNING clause.
>

I have one question which is common to both this patch and parallel
inserts in CTAS[1]: do we need to skip creating the tuple queues
(ExecParallelSetupTupleQueues), given that no tuples are being shared
from the workers to the leader? Put another way, do we use the tuple
queue for sharing any information other than tuples from workers to
leader?

[1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Mon, Oct 5, 2020 at 4:26 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > > >
> > > > I think you still need to work on the costing part, basically if we
> > > > are parallelizing whole insert then plan is like below
> > > >
> > > > -> Gather
> > > >   -> Parallel Insert
> > > >       -> Parallel Seq Scan
> > > >
> > > > That means the tuple we are selecting via scan are not sent back to
> > > > the gather node, so in cost_gather we need to see if it is for the
> > > > INSERT then there is no row transferred through the parallel queue
> > > > that mean we need not to pay any parallel tuple cost.
> > >
> > > I just looked into the parallel CTAS[1] patch for the same thing, and
> > > I can see in that patch it is being handled.
> > >
> > > [1]
https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
> > >
> >
> > Hi Dilip,
> >
> > You're right, the costing for Parallel Insert is not done and
> > finished, I'm still working on the costing, and haven't posted an
> > updated patch for it yet.
> > As far as cost_gather() method is concerned, for Parallel INSERT, it
> > can probably use the same costing approach as the CTAS patch except in
> > the case of a specified RETURNING clause.
> >
>
> I have one question which is common to both this patch and parallel
> inserts in CTAS[1], do we need to skip creating tuple
> queues(ExecParallelSetupTupleQueues) as we don't have any tuples
> that's being shared from workers to leader? Put it another way, do we
> use the tuple queue for sharing any info other than tuples from
> workers to leader?

Ideally, we don't need the tuple queue unless we want to transfer the
tuple to the gather node.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Oct 5, 2020 at 4:26 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
>
> I have one question which is common to both this patch and parallel
> inserts in CTAS[1], do we need to skip creating tuple
> queues(ExecParallelSetupTupleQueues) as we don't have any tuples
> that's being shared from workers to leader?
>

As far as this patch is concerned, we might need to return tuples when
there is a RETURNING clause. For the cases where we don't need to
return tuples, I think we might want to skip creating these queues, if
that is feasible without too many changes.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Mon, Oct 5, 2020 at 4:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 5, 2020 at 4:26 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> >
> > I have one question which is common to both this patch and parallel
> > inserts in CTAS[1], do we need to skip creating tuple
> > queues(ExecParallelSetupTupleQueues) as we don't have any tuples
> > that's being shared from workers to leader?
> >
>
> As far as this patch is concerned we might need to return tuples when
> there is a Returning clause. I think for the cases where we don't need
> to return tuples we might want to skip creating these queues if it is
> feasible without too many changes.

+1

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > > I have one question which is common to both this patch and parallel
> > > inserts in CTAS[1], do we need to skip creating tuple
> > > queues(ExecParallelSetupTupleQueues) as we don't have any tuples
> > > that's being shared from workers to leader?
> > >
> >
> > As far as this patch is concerned we might need to return tuples when
> > there is a Returning clause. I think for the cases where we don't need
> > to return tuples we might want to skip creating these queues if it is
> > feasible without too many changes.
>

Hi Dilip,

You're right. I've included that in my latest version of the patch (so
Gather should only start tuple queues in the case of parallel SELECT
or parallel INSERT with a RETURNING clause).
Other updated functionality includes:
- Added more of the necessary exclusions for Parallel INSERT INTO ...
SELECT ... (while still allowing the underlying query to be parallel):
  - non-parallel-safe triggers
  - non-parallel-safe default and check expressions
  - foreign tables
  - temporary tables
- Added support for before/after statement-level INSERT triggers
(parallel workers can't be allowed to execute these)
- Adjusted the cost of the Gather node for when no RETURNING clause is
specified
I have not (yet) found issues with partitioned tables or toast column
values.

Also, I have attached a separate patch (requested by Andres Freund)
that just allows the underlying SELECT part of "INSERT INTO ... SELECT
..." to be parallel.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> I have not found issues with partition tables (yet) or toast column values.
>

I think toast column values may not be a problem, as each parallel
worker inserts its toast column values individually.

But a problem may arise if a partitioned table has a foreign table as
a partition; I think we cannot allow parallelism in this case either,
but it's hard to determine ahead of time whether a table has a foreign
partition. (See [1] in copy.c)

>
> - Added support for before/after statement-level INSERT triggers
> (can't allow parallel workers to execute these)
>

I think we can allow parallelism for BEFORE statement-level triggers:
the leader can execute the trigger and then go ahead with the parallel
inserts.

How about BEFORE ROW, AFTER ROW, INSTEAD OF ROW, and transition-table
triggers?

[1]
    else
    {
        /*
         * For partitioned tables, we may still be able to perform bulk
         * inserts.  However, the possibility of this depends on which types
         * of triggers exist on the partition.  We must disable bulk inserts
         * if the partition is a foreign table or it has any before row insert
         * or insert instead triggers (same as we checked above for the parent
         * table).  Since the partition's resultRelInfos are initialized only
         * when we actually need to insert the first tuple into them, we must
         * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
         * flag that we must later determine if we can use bulk-inserts for
         * the partition being inserted into.
         */
        if (proute)
            insertMethod = CIM_MULTI_CONDITIONAL;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Oct 6, 2020 at 9:10 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> But the problem may arise if a partitioned table has foreign table as
> a partition, I think we can not allow parallelism for this case too,
> but it's hard to determine ahead of time whether a table has a foreign
> partition.(See [1] in copy.c)
>

Thanks, I had seen that as a potential issue when scanning the code,
but had forgotten to note it. I'll check your code again.

> >
> > - Added support for before/after statement-level INSERT triggers
> > (can't allow parallel workers to execute these)
> >
>
> I think we can allow parallelism for before statement level-triggers.
> Leader can execute this trigger and go for parallel inserts.
>

My attached patch implements the before/after statement-level trigger
invocation.
(For the INSERT INTO ... SELECT ... case, it needs to account for both
parallel and non-parallel INSERT, and also for the fact that, as the
patch currently stands, the leader also participates in a parallel
INSERT - so I found it necessary to invoke those triggers at the
Gather node level in that case.)

> How about before row, after row, instead row, new table type triggers?
>

My attached patch does not allow parallel INSERT if there are any
row-level triggers (as the trigger functions could see a different and
unpredictable table state compared to non-parallel INSERT, even if
otherwise parallel-safe).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Tue, Oct 6, 2020 at 4:13 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Oct 6, 2020 at 9:10 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > But the problem may arise if a partitioned table has foreign table as
> > a partition, I think we can not allow parallelism for this case too,
> > but it's hard to determine ahead of time whether a table has a foreign
> > partition.(See [1] in copy.c)
> >
>
> Thanks, I had seen that as a potential issue when scanning the code,
> but had forgotten to note it. I'll check your code again.
>

In parallel copy, we are not doing anything (for the same reason
explained in the above comment) to find out whether there is a foreign
partition while deciding between parallel and non-parallel copy; we
just throw an error during the first tuple insertion into the
partition:

errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
OF triggers, or if the partition is foreign partition"),
                            errhint("Try COPY without PARALLEL option")));

> > >
> > > - Added support for before/after statement-level INSERT triggers
> > > (can't allow parallel workers to execute these)
> > >
> >
> > I think we can allow parallelism for before statement level-triggers.
> > Leader can execute this trigger and go for parallel inserts.
> >
>
> My attached patch implements the before/after statement-level trigger
> invocation.
> (For INSERT INTO ... SELECT... case, it needs to account for parallel
> and non-parallel INSERT, and also the fact that, as the patch
> currently stands, the leader also participates in a parallel INSERT -
> so I found it necessary to invoke those triggers at the Gather node
> level in that case).
>

Allowing the leader to execute BEFORE statement triggers at the Gather
node level, before invoking the parallel plan and the parallel
inserts, makes sense. But if there are any AFTER statement triggers,
transition tables may be involved (see Amit's findings under Case-1 in
[1]), and we must disable parallelism in that case.

[1] -
https://www.postgresql.org/message-id/flat/CAA4eK1%2BANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> In parallel, we are not doing anything(due to the same reason
> explained in above comment) to find whether there is a foreign
> partition or not while deciding to go with parallel/non-parallel copy,
> we are just throwing an error during the first tuple insertion into
> the partition.
>
> errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
> OF triggers, or if the partition is foreign partition"),
>                             errhint("Try COPY without PARALLEL option")));
>

I may well need to do something similar for parallel INSERT, but I'm
kind of surprised it can't be detected earlier (?).
I'll need to test this further.

>
> Allowing the leader to execute before statement triggers at Gather
> node level before invoking the parallel plan and then parallel inserts
> makes sense. But if there are any after statement triggers, there may
> come transition tables, see Amit's findings under Case-1 in [1] and we
> must disable parallelism in that case.
>
> [1] -
https://www.postgresql.org/message-id/flat/CAA4eK1%2BANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA%40mail.gmail.com
>

The patch I last posted for parallel INSERT does detect use of
transition tables in this case (trigdesc->trig_insert_new_table) and
disables INSERT parallelism (I tested it against Amit's example), yet
still otherwise allows AFTER STATEMENT triggers for parallel INSERT.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> In parallel, we are not doing anything(due to the same reason
> explained in above comment) to find whether there is a foreign
> partition or not while deciding to go with parallel/non-parallel copy,
> we are just throwing an error during the first tuple insertion into
> the partition.
>
> errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
> OF triggers, or if the partition is foreign partition"),
>                             errhint("Try COPY without PARALLEL option")));
>

I'm wondering whether code similar to the following can safely be used
to detect a foreign partition:

    if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
    {
        int i;
        PartitionDesc pd = RelationGetPartitionDesc(rel);
        for (i = 0; i < pd->nparts; i++)
        {
            if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE)
            {
                table_close(rel, NoLock);
                return false;
            }
        }
    }

Thoughts?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Oct 7, 2020 at 7:25 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > In parallel, we are not doing anything(due to the same reason
> > explained in above comment) to find whether there is a foreign
> > partition or not while deciding to go with parallel/non-parallel copy,
> > we are just throwing an error during the first tuple insertion into
> > the partition.
> >
> > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
> > OF triggers, or if the partition is foreign partition"),
> >                             errhint("Try COPY without PARALLEL option")));
> >
>
> I'm wondering whether code similar to the following can safely be used
> to detect a foreign partition:
>
>     if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>     {
>         int i;
>         PartitionDesc pd = RelationGetPartitionDesc(rel);
>         for (i = 0; i < pd->nparts; i++)
>         {
>             if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE)
>             {
>                 table_close(rel, NoLock);
>                 return false;
>             }
>         }
>     }
>

Actually, the addition of this kind of check is still not good enough.
Partitions can have their own constraints, triggers, column default
expressions, etc., and a partition can itself be partitioned.
I've written code to recursively walk the partitions and do all the
various parallel-insert-safety checks as before, but it's doing a
fair bit of work.
Any other ideas for dealing with this? It seems it can't be avoided if
we want to support partitioned tables and partitions.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Bharath Rupireddy
Date:
On Thu, Oct 8, 2020 at 1:42 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Oct 7, 2020 at 7:25 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > In parallel, we are not doing anything(due to the same reason
> > > explained in above comment) to find whether there is a foreign
> > > partition or not while deciding to go with parallel/non-parallel copy,
> > > we are just throwing an error during the first tuple insertion into
> > > the partition.
> > >
> > > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
> > > OF triggers, or if the partition is foreign partition"),
> > >                             errhint("Try COPY without PARALLEL option")));
> > >
> >
> > I'm wondering whether code similar to the following can safely be used
> > to detect a foreign partition:
> >
> >     if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> >     {
> >         int i;
> >         PartitionDesc pd = RelationGetPartitionDesc(rel);
> >         for (i = 0; i < pd->nparts; i++)
> >         {
> >             if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE)
> >             {
> >                 table_close(rel, NoLock);
> >                 return false;
> >             }
> >         }
> >     }
> >
>
> Actually, the addition of this kind of check is still not good enough.
> Partitions can have their own constraints, triggers, column default
> expressions etc. and a partition itself can be partitioned.
> I've written code to recursively walk the partitions and do all the
> various checks for parallel-insert-safety as before, but it's doing a
> fair bit of work.
> Any other idea of dealing with this? Seems it can't be avoided if you
> want to support partitioned tables and partitions.
>

IMHO, it's better not to do all of this recursive checking right now,
as it may complicate the code or restrict the performance gain.
Having said that, we may have to do something about it in the future.

Others may have better opinions on this point.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Tue, Oct 6, 2020 at 10:38 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
+            if (estate->es_plannedstmt->commandType == CMD_INSERT)
...
+    if ((XactReadOnly || (IsInParallelMode() &&
queryDesc->plannedstmt->commandType != CMD_INSERT)) &&
...
+        isParallelInsertLeader = nodeModifyTableState->operation == CMD_INSERT;
...

One thing I noticed is that you have logic, variable names and
assertions all over the tree that assume that we can only do parallel
*inserts*.  I agree 100% with your plan to make Parallel Insert work
first, it is an excellent goal and if we get it in it'll be a headline
feature of PG14 (along with COPY etc).  That said, I wonder if it
would make sense to use more general naming (isParallelModifyLeader?),
be more liberal where you really mean "is it DML", and find a way to
centralise the logic about which DML command types are currently
allowed (ie insert only for now) for assertions and error checks etc,
so that in future we don't have to go around and change all these
places and rename things again and again.

While contemplating that, I couldn't resist taking a swing at the main
(?) show stopper for Parallel Update and Parallel Delete, judging by
various clues left in code comments by Robert: combo command IDs
created by other processes.  Here's a rapid prototype to make that
work (though perhaps not as efficiently as we'd want, not sure).  With
that in place, I wonder what else we'd need to extend your patch to
cover all three operations... it can't be much!  Of course I don't
want to derail your work on Parallel Insert, I'm just providing some
motivation for my comments on the (IMHO) shortsightedness of some of
the coding.

PS Why not use git format-patch to create patches?

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 8:41 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> One thing I noticed is that you have logic, variable names and
> assertions all over the tree that assume that we can only do parallel
> *inserts*.  I agree 100% with your plan to make Parallel Insert work
> first, it is an excellent goal and if we get it in it'll be a headline
> feature of PG14 (along with COPY etc).  That said, I wonder if it
> would make sense to use more general naming (isParallelModifyLeader?),
> be more liberal where you really mean "is it DML", and find a way to
> centralise the logic about which DML commands types are currently
> allowed (ie insert only for now) for assertions and error checks etc,
> so that in future we don't have to go around and change all these
> places and rename things again and again.
>

Fair points.
I agree, it would make more sense to generalise the naming and
centralise the DML-command-type checks, rather than everything being
insert-specific.
It was getting a bit ugly. I'll work on that.

> While contemplating that, I couldn't resist taking a swing at the main
> (?) show stopper for Parallel Update and Parallel Delete, judging by
> various clues left in code comments by Robert: combo command IDs
> created by other processes.  Here's a rapid prototype to make that
> work (though perhaps not as efficiently as we'd want, not sure).  With
> that in place, I wonder what else we'd need to extend your patch to
> cover all three operations... it can't be much!  Of course I don't
> want to derail your work on Parallel Insert, I'm just providing some
> motivation for my comments on the (IMHO) shortsightedness of some of
> the coding.
>

Thanks for your prototype code for coordination of combo command IDs
with the workers.
It does give me the incentive to look beyond that issue and see
whether parallel Update and parallel Delete are indeed possible. I'll
be sure to give it a go!

> PS Why not use git format-patch to create patches?

Guess I was being a bit lazy - will use git format-patch in future.


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Fri, Oct 9, 2020 at 3:48 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> It does give me the incentive to look beyond that issue and see
> whether parallel Update and parallel Delete are indeed possible. I'll
> be sure to give it a go!

Cool!

A couple more observations:

+       pathnode->path.parallel_aware = parallel_workers > 0 ? true : false;

Hmm, I think this may be bogus window dressing only affecting EXPLAIN.
If you change it to assign false always, it works just the same,
except EXPLAIN says:

 Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
   Workers Planned: 2
   ->  Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
         ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)

... instead of:

 Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
   Workers Planned: 2
   ->  Parallel Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
         ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)

AFAICS it's not parallel-aware, it just happens to be running in
parallel with a partial input and partial output (and in this case,
effect in terms of writes).  Parallel-aware is our term for nodes that
actually know they are running in parallel and do some special
coordination with their twins in other processes.

The estimated row count also looks wrong; at a guess, the parallel
divisor is applied twice.  Let me try that with
parallel_leader_participation=off (which disables some funky maths in
the row estimation and makes it straight division by number of
processes):

 Gather  (cost=17629.00..18645.50 rows=1000000 width=4)
   Workers Planned: 2
   ->  Insert on s  (cost=17629.00..18645.50 rows=250000 width=4)
         ->  Parallel Hash Join  (cost=17629.00..37291.00 rows=500000 width=4)
[more nodes omitted]

Yeah, that was a join that spat out a million rows, and we correctly
estimated 500k per process, and then Insert (still with my hack to
turn off the bogus "Parallel" display in this case, but it doesn't
affect the estimation) estimated 250k per process, which is wrong.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 6:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> A couple more observations:
>
> +       pathnode->path.parallel_aware = parallel_workers > 0 ? true : false;
>
> Hmm, I think this may be bogus window dressing only affecting EXPLAIN.
> If you change it to assign false always, it works just the same,
> except EXPLAIN says:
>
>  Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
>    Workers Planned: 2
>    ->  Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
>          ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)
>
> ... instead of:
>
>  Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
>    Workers Planned: 2
>    ->  Parallel Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
>          ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)
>
> AFAICS it's not parallel-aware, it just happens to be running in
> parallel with a partial input and partial output (and in this case,
> effect in terms of writes).  Parallel-aware is our term for nodes that
> actually know they are running in parallel and do some special
> coordination with their twins in other processes.
>

Ah, thanks, I see the distinction now. I'll fix that, to restore
parallel_aware=false for the ModifyTable node.

> The estimated row count also looks wrong; at a guess, the parallel
> divisor is applied twice.  Let me try that with
> parallel_leader_participation=off (which disables some funky maths in
> the row estimation and makes it straight division by number of
> processes):
>
>  Gather  (cost=17629.00..18645.50 rows=1000000 width=4)
>    Workers Planned: 2
>    ->  Insert on s  (cost=17629.00..18645.50 rows=250000 width=4)
>          ->  Parallel Hash Join  (cost=17629.00..37291.00 rows=500000 width=4)
> [more nodes omitted]
>
> Yeah, that was a join that spat out a million rows, and we correctly
> estimated 500k per process, and then Insert (still with my hack to
> turn off the bogus "Parallel" display in this case, but it doesn't
> affect the estimation) estimated 250k per process, which is wrong.

Thanks, I did suspect the current costing was wrong for ModifyTable
(the workers>0 case), as I'd thrown it in (moving the current costing
code into costsize.c) without a lot of checking or great thought, and
it was on my TODO list of things to check. At least I created a
placeholder for it. Looks like I've applied the parallel divisor again
(not accounting for the one already applied to the underlying query),
as you said.
Speaking of costing, I'm not sure I really agree with the current
costing of a Gather node. Just considering a simple Parallel SeqScan
case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
Gather cost always completely drowns out any other path costs when a
large number of rows are involved (at least with default
parallel-related GUC values), such that Parallel SeqScan would never
be the cheapest path. This linear relationship in the costing based on
the rows and a parallel_tuple_cost doesn't make sense to me. Surely
after a certain amount of rows, the overhead of launching workers will
be out-weighed by the benefit of their parallel work, such that the
more rows, the more likely a Parallel SeqScan will benefit. That seems
to suggest something like a logarithmic formula (or similar) would
better match reality than what we have now. Am I wrong on this? Every
time I use default GUC values, the planner doesn't want to generate a
parallel plan. Lowering parallel-related GUCs like parallel_tuple_cost
(which I normally do for testing) influences it of course, but the
linear relationship still seems wrong.
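The concern can be made concrete with a toy calculation (hypothetical helper names; the constants are the documented defaults for parallel_tuple_cost and cpu_tuple_cost, not PostgreSQL's actual cost functions):

```c
#include <assert.h>

/* Toy illustration: with default parallel_tuple_cost = 0.1, the per-tuple
 * transfer charge that cost_gather() adds grows linearly with the row
 * count and quickly dominates the per-tuple CPU cost of the scan itself
 * (cpu_tuple_cost = 0.01). */
#define PARALLEL_TUPLE_COST 0.1
#define CPU_TUPLE_COST      0.01

static double
gather_transfer_cost(double rows)
{
    /* models: run_cost += parallel_tuple_cost * path->path.rows */
    return PARALLEL_TUPLE_COST * rows;
}

static double
scan_cpu_cost(double rows)
{
    return CPU_TUPLE_COST * rows;
}
```

For a million returned rows, the transfer charge alone is roughly 100,000 cost units, ten times the CPU cost of processing those rows, so a parallel path that returns many tuples through the Gather node rarely comes out cheapest.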

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Also, I have attached a separate patch (requested by Andres Freund)
> that just allows the underlying SELECT part of "INSERT INTO ... SELECT
> ..." to be parallel.
>

It might be a good idea to first just get this patch committed, if
possible. So, I have reviewed the latest version of this patch:

0001-InsertParallelSelect
1.
ParallelContext *pcxt;

+ /*
+ * We need to avoid an attempt on INSERT to assign a
+ * FullTransactionId whilst in parallel mode (which is in
+ * effect due to the underlying parallel query) - so the
+ * FullTransactionId is assigned here. Parallel mode must
+ * be temporarily escaped in order for this to be possible.
+ * The FullTransactionId will be included in the transaction
+ * state that is serialized in the parallel DSM.
+ */
+ if (estate->es_plannedstmt->commandType == CMD_INSERT)
+ {
+ Assert(IsInParallelMode());
+ ExitParallelMode();
+ GetCurrentFullTransactionId();
+ EnterParallelMode();
+ }
+

This looks like a hack to me. I think you are doing this to avoid the
parallel mode checks in GetNewTransactionId(), right? If so, I have
already mentioned above [1] that we can change it so that we disallow
assigning xids for parallel workers only. The same is true for the
check in ExecGatherMerge. Do you see any problem with that suggestion?
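The suggested narrowing might look roughly like this (a hypothetical sketch of the guard's logic, not the actual GetNewTransactionId() code):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: instead of rejecting xid assignment whenever parallel mode is
 * active (which also rejects the leader, forcing the hack above), reject
 * it only when running inside a parallel worker. */
static bool
xid_assignment_allowed_current(bool in_parallel_mode, bool is_parallel_worker)
{
    (void) is_parallel_worker;
    return !in_parallel_mode;           /* leader in parallel mode: rejected */
}

static bool
xid_assignment_allowed_suggested(bool in_parallel_mode, bool is_parallel_worker)
{
    (void) in_parallel_mode;
    return !is_parallel_worker;         /* leader may assign; workers may not */
}
```

Under the suggested check, the leader can assign the xid without exiting and re-entering parallel mode, while workers are still forbidden from doing so.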

2.
@@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  */
  if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
  IsUnderPostmaster &&
- parse->commandType == CMD_SELECT &&
+ (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
  !parse->hasModifyingCTE &&
  max_parallel_workers_per_gather > 0 &&
  !IsParallelWorker())

I think the comments above this need to be updated, especially the part
where we say: "Note that we do allow CREATE TABLE AS, SELECT INTO, and
CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
the leader backend writes into a completely new table." Don't we need
to include Insert also?

3.
@@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  * parallel-unsafe, or else the query planner itself has a bug.
  */
  glob->parallelModeNeeded = glob->parallelModeOK &&
+ (parse->commandType == CMD_SELECT) &&
  (force_parallel_mode != FORCE_PARALLEL_OFF);

Why do you need this change? The comments above this code should be
updated to reflect it. I think the code below was modified for the
same reason, but I don't understand the reason for that change either;
it would be better to update the comments for it as well.

@@ -425,7 +426,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  * Optionally add a Gather node for testing purposes, provided this is
  * actually a safe thing to do.
  */
- if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
+ if (force_parallel_mode != FORCE_PARALLEL_OFF && parse->commandType
== CMD_SELECT && top_plan->parallel_safe)
  {
  Gather    *gather = makeNode(Gather);

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BE-pM0U6qw7EOF0yO0giTxdErxoJV9xTqN%2BLo9zdotFQ%40mail.gmail.com


-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Speaking of costing, I'm not sure I really agree with the current
> costing of a Gather node. Just considering a simple Parallel SeqScan
> case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
> Gather cost always completely drowns out any other path costs when a
> large number of rows are involved (at least with default
> parallel-related GUC values), such that Parallel SeqScan would never
> be the cheapest path. This linear relationship in the costing based on
> the rows and a parallel_tuple_cost doesn't make sense to me. Surely
> after a certain amount of rows, the overhead of launching workers will
> be out-weighed by the benefit of their parallel work, such that the
> more rows, the more likely a Parallel SeqScan will benefit.
>

That will be true for the number of rows/pages we need to scan, not
for the number of tuples we need to return as a result. The formula
here considers the number of rows the parallel scan will return: the
more rows each parallel node needs to pass via shared memory to the
Gather node, the more costly it will be.

We do consider the total pages we need to scan in
compute_parallel_worker() where we use a logarithmic formula to
determine the number of workers.
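That page-based heuristic can be sketched as follows (a simplified model of compute_parallel_worker(); the real function also considers index pages and per-relation options):

```c
#include <assert.h>

/* Simplified model: one worker once the table reaches the minimum scan
 * size, plus one more each time the page count triples, i.e. roughly
 * log3(pages / threshold) workers, capped by max_workers. */
static int
sketch_parallel_workers(long heap_pages, long threshold, int max_workers)
{
    int         workers = 0;

    if (heap_pages >= threshold)
    {
        workers = 1;
        while (heap_pages >= threshold * 3 && workers < max_workers)
        {
            workers++;
            threshold *= 3;
        }
    }
    return workers;
}
```

With the default min_parallel_table_scan_size of 8MB (1024 pages), a table of 1024 pages gets one worker, 3072 pages gets two, 9216 pages gets three, and so on logarithmically.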

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> 0001-InsertParallelSelect
> 1.
> ParallelContext *pcxt;
>
> + /*
> + * We need to avoid an attempt on INSERT to assign a
> + * FullTransactionId whilst in parallel mode (which is in
> + * effect due to the underlying parallel query) - so the
> + * FullTransactionId is assigned here. Parallel mode must
> + * be temporarily escaped in order for this to be possible.
> + * The FullTransactionId will be included in the transaction
> + * state that is serialized in the parallel DSM.
> + */
> + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> + {
> + Assert(IsInParallelMode());
> + ExitParallelMode();
> + GetCurrentFullTransactionId();
> + EnterParallelMode();
> + }
> +
>
> This looks like a hack to me. I think you are doing this to avoid the
> parallel mode checks in GetNewTransactionId(), right?

Yes, agreed, it is a hack to avoid that (mind you, it's not exactly
great that ExecutePlan() sets parallel mode for the entire plan
execution). Also, I did not expect that to necessarily remain in a
final patch.

>If so, I have
> already mentioned above [1] that we can change it so that we disallow
> assigning xids for parallel workers only. The same is true for the
> check in ExecGatherMerge. Do you see any problem with that suggestion?
>

No, should be OK I guess, but will update and test to be sure.

> 2.
> @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
>   */
>   if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
>   IsUnderPostmaster &&
> - parse->commandType == CMD_SELECT &&
> + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
>   !parse->hasModifyingCTE &&
>   max_parallel_workers_per_gather > 0 &&
>   !IsParallelWorker())
>
> I think the comments above this need to be updated, especially the part
> where we say: "Note that we do allow CREATE TABLE AS, SELECT INTO, and
> CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
> the leader backend writes into a completely new table." Don't we need
> to include Insert also?

Yes, Insert needs to be mentioned somewhere there.

>
> 3.
> @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
>   * parallel-unsafe, or else the query planner itself has a bug.
>   */
>   glob->parallelModeNeeded = glob->parallelModeOK &&
> + (parse->commandType == CMD_SELECT) &&
>   (force_parallel_mode != FORCE_PARALLEL_OFF);
>
> Why do you need this change? The comments above this code should be
> updated to reflect this change. I think for the same reason the below
> code seems to be modified but I don't understand the reason for the
> below change as well, also it is better to update the comments for
> this as well.
>

OK, I will update the comments for this.
Basically, up to now, the "force_parallel_mode" has only ever operated
on a SELECT.
But since we are now allowing CMD_INSERT to be assessed for parallel
mode too, we need to prevent the force_parallel_mode logic from
sticking a Gather node over the top of arbitrary INSERTs and causing
them to be run in parallel. Not all INSERTs are suitable for parallel
operation, and also there are further considerations for
parallel-safety for INSERTs compared to SELECT. INSERTs can also
trigger UPDATEs.
If we need to support force_parallel_mode for INSERT, more work will
need to be done.

> @@ -425,7 +426,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
>   * Optionally add a Gather node for testing purposes, provided this is
>   * actually a safe thing to do.
>   */
> - if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
> + if (force_parallel_mode != FORCE_PARALLEL_OFF && parse->commandType
> == CMD_SELECT && top_plan->parallel_safe)
>   {
>   Gather    *gather = makeNode(Gather);
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2BE-pM0U6qw7EOF0yO0giTxdErxoJV9xTqN%2BLo9zdotFQ%40mail.gmail.com
>

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > Speaking of costing, I'm not sure I really agree with the current
> > costing of a Gather node. Just considering a simple Parallel SeqScan
> > case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
> > Gather cost always completely drowns out any other path costs when a
> > large number of rows are involved (at least with default
> > parallel-related GUC values), such that Parallel SeqScan would never
> > be the cheapest path. This linear relationship in the costing based on
> > the rows and a parallel_tuple_cost doesn't make sense to me. Surely
> > after a certain amount of rows, the overhead of launching workers will
> > be out-weighed by the benefit of their parallel work, such that the
> > more rows, the more likely a Parallel SeqScan will benefit.
> >
>
> That will be true for the number of rows/pages we need to scan, not
> for the number of tuples we need to return as a result. The formula
> here considers the number of rows the parallel scan will return: the
> more rows each parallel node needs to pass via shared memory to the
> Gather node, the more costly it will be.
>
> We do consider the total pages we need to scan in
> compute_parallel_worker() where we use a logarithmic formula to
> determine the number of workers.
>

Despite all the best intentions, the current costings seem to be
geared towards selecting a non-parallel plan over a parallel plan,
the more rows there are in the table. Yet the performance of a
parallel plan appears to be better than that of a non-parallel plan,
the more rows there are in the table.
This doesn't seem right to me. Is there a rationale behind this costing model?
I have pointed out the part of the parallel_tuple_cost calculation
that seems to drown out all other costs (causing the cost value to be
huge), the more rows there are in the table.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 4:28 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > Speaking of costing, I'm not sure I really agree with the current
> > > costing of a Gather node. Just considering a simple Parallel SeqScan
> > > case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
> > > Gather cost always completely drowns out any other path costs when a
> > > large number of rows are involved (at least with default
> > > parallel-related GUC values), such that Parallel SeqScan would never
> > > be the cheapest path. This linear relationship in the costing based on
> > > the rows and a parallel_tuple_cost doesn't make sense to me. Surely
> > > after a certain amount of rows, the overhead of launching workers will
> > > be out-weighed by the benefit of their parallel work, such that the
> > > more rows, the more likely a Parallel SeqScan will benefit.
> > >
> >
> > That will be true for the number of rows/pages we need to scan, not
> > for the number of tuples we need to return as a result. The formula
> > here considers the number of rows the parallel scan will return: the
> > more rows each parallel node needs to pass via shared memory to the
> > Gather node, the more costly it will be.
> >
> > We do consider the total pages we need to scan in
> > compute_parallel_worker() where we use a logarithmic formula to
> > determine the number of workers.
> >
>
> Despite all the best intentions, the current costings seem to be
> geared towards selection of a non-parallel plan over a parallel plan,
> the more rows there are in the table. Yet the performance of a
> parallel plan appears to be better than a non-parallel plan the more
> rows there are in the table.
> This doesn't seem right to me. Is there a rationale behind this costing model?
>

Yes, AFAIK, there is no proof that we can get any (much) gain by
dividing the I/O among workers. It is primarily the CPU effort that
gives the benefit. So, parallel plans show the greatest benefit when
we have to scan a large table and then project far fewer rows.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> + /*
> + * We need to avoid an attempt on INSERT to assign a
> + * FullTransactionId whilst in parallel mode (which is in
> + * effect due to the underlying parallel query) - so the
> + * FullTransactionId is assigned here. Parallel mode must
> + * be temporarily escaped in order for this to be possible.
> + * The FullTransactionId will be included in the transaction
> + * state that is serialized in the parallel DSM.
> + */
> + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> + {
> + Assert(IsInParallelMode());
> + ExitParallelMode();
> + GetCurrentFullTransactionId();
> + EnterParallelMode();
> + }
> +
>
> This looks like a hack to me. I think you are doing this to avoid the
> parallel mode checks in GetNewTransactionId(), right? If so, I have
> already mentioned above [1] that we can change it so that we disallow
> assigning xids for parallel workers only. The same is true for the
> check in ExecGatherMerge. Do you see any problem with that suggestion?
>

Actually, there is a problem.
If I remove that "hack", and change the code in GetNewTransactionId()
to disallow xid assignment for parallel workers only, then there is
also similar code in AssignTransactionId() which gets called. If I
change that code too, in the same way, then on a parallel INSERT, that
code gets called by a parallel worker (from GetCurrentTransactionId())
and the ERROR "cannot assign XIDs in a parallel worker" results.
GetCurrentFullTransactionId() must be called in the leader, somewhere
(and will be included in the transaction state that is serialized in
the parallel DSM).
If not done here, then where?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 5:54 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > + /*
> > + * We need to avoid an attempt on INSERT to assign a
> > + * FullTransactionId whilst in parallel mode (which is in
> > + * effect due to the underlying parallel query) - so the
> > + * FullTransactionId is assigned here. Parallel mode must
> > + * be temporarily escaped in order for this to be possible.
> > + * The FullTransactionId will be included in the transaction
> > + * state that is serialized in the parallel DSM.
> > + */
> > + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> > + {
> > + Assert(IsInParallelMode());
> > + ExitParallelMode();
> > + GetCurrentFullTransactionId();
> > + EnterParallelMode();
> > + }
> > +
> >
> > This looks like a hack to me. I think you are doing this to avoid the
> > parallel mode checks in GetNewTransactionId(), right? If so, I have
> > already mentioned above [1] that we can change it so that we disallow
> > assigning xids for parallel workers only. The same is true for the
> > check in ExecGatherMerge. Do you see any problem with that suggestion?
> >
>
> Actually, there is a problem.
> If I remove that "hack", and change the code in GetNewTransactionId()
> to disallow xid assignment for parallel workers only, then there is
> also similar code in AssignTransactionId() which gets called.
>

I don't think workers need to call AssignTransactionId(); before
that, the transaction id passed from the leader should be set in
CurrentTransactionState. Why would
GetCurrentTransactionId()/GetCurrentFullTransactionId() need to call
AssignTransactionId() when called from a worker?

> GetCurrentFullTransactionId() must be called in the leader, somewhere
> (and will be included in the transaction state that is serialized in
> the parallel DSM).
>

Yes, it should have been done in the leader and then set in the
workers via StartParallelWorkerTransaction before we do any actual
operation. If that happens then GetCurrentTransactionId() won't need
to call AssignTransactionId().
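A minimal model of the hand-off being described (hypothetical struct and function names; the real mechanism serializes the leader's transaction state into the parallel DSM and restores it in each worker via StartParallelWorkerTransaction()):

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
    unsigned    xid;            /* 0 = not yet assigned */
    bool        is_worker;
} TxnState;

/* Leader-only assignment: workers must always find the xid already set. */
static unsigned
get_current_xid(TxnState *t)
{
    if (t->xid == 0)
    {
        assert(!t->is_worker);  /* "cannot assign XIDs in a parallel worker" */
        t->xid = 531;           /* pretend allocation from the xid counter */
    }
    return t->xid;
}

/* Models serializing the leader's state into the DSM and restoring it
 * in a worker before it does any actual operation. */
static void
start_worker_txn(const TxnState *leader, TxnState *worker)
{
    worker->xid = leader->xid;
    worker->is_worker = true;
}

static bool
worker_reuses_leader_xid(void)
{
    TxnState    leader = {0, false};
    TxnState    worker = {0, false};
    unsigned    xid = get_current_xid(&leader); /* leader assigns first */

    start_worker_txn(&leader, &worker);         /* state passed via "DSM" */
    return get_current_xid(&worker) == xid;     /* worker needs no assignment */
}
```

If the leader assigns the xid before launching workers, the worker-side lookup simply returns the inherited value and the assignment path is never reached in a worker.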

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 3:51 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 0001-InsertParallelSelect
> > 1.
> > ParallelContext *pcxt;
> >
> > + /*
> > + * We need to avoid an attempt on INSERT to assign a
> > + * FullTransactionId whilst in parallel mode (which is in
> > + * effect due to the underlying parallel query) - so the
> > + * FullTransactionId is assigned here. Parallel mode must
> > + * be temporarily escaped in order for this to be possible.
> > + * The FullTransactionId will be included in the transaction
> > + * state that is serialized in the parallel DSM.
> > + */
> > + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> > + {
> > + Assert(IsInParallelMode());
> > + ExitParallelMode();
> > + GetCurrentFullTransactionId();
> > + EnterParallelMode();
> > + }
> > +
> >
> > This looks like a hack to me. I think you are doing this to avoid the
> > parallel mode checks in GetNewTransactionId(), right?
>
> Yes, agreed, it is a hack to avoid that (mind you, it's not exactly great
> that ExecutePlan() sets parallel-mode for the entire plan execution).
> Also, did not expect that to necessarily remain in a final patch.
>
> >If so, I have
> > already mentioned above [1] that we can change it so that we disallow
> > assigning xids for parallel workers only. The same is true for the
> > check in ExecGatherMerge. Do you see any problem with that suggestion?
> >
>
> No, should be OK I guess, but will update and test to be sure.
>
> > 2.
> > @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
> > *query_string, int cursorOptions,
> >   */
> >   if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> >   IsUnderPostmaster &&
> > - parse->commandType == CMD_SELECT &&
> > + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
> >   !parse->hasModifyingCTE &&
> >   max_parallel_workers_per_gather > 0 &&
> >   !IsParallelWorker())
> >
> > I think the comments above this need to be updated, especially the part
> > where we say: "Note that we do allow CREATE TABLE AS, SELECT INTO, and
> > CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
> > the leader backend writes into a completely new table." Don't we need
> > to include Insert also?
>
> Yes, Insert needs to be mentioned somewhere there.
>
> >
> > 3.
> > @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
> > *query_string, int cursorOptions,
> >   * parallel-unsafe, or else the query planner itself has a bug.
> >   */
> >   glob->parallelModeNeeded = glob->parallelModeOK &&
> > + (parse->commandType == CMD_SELECT) &&
> >   (force_parallel_mode != FORCE_PARALLEL_OFF);
> >
> > Why do you need this change? The comments above this code should be
> > updated to reflect this change. I think for the same reason the below
> > code seems to be modified but I don't understand the reason for the
> > below change as well, also it is better to update the comments for
> > this as well.
> >
>
> OK, I will update the comments for this.
> Basically, up to now, the "force_parallel_mode" has only ever operated
> on a SELECT.
> But since we are now allowing CMD_INSERT to be assessed for parallel
> mode too, we need to prevent the force_parallel_mode logic from
> sticking a Gather node over the top of arbitrary INSERTs and causing
> them to be run in parallel. Not all INSERTs are suitable for parallel
> operation, and also there are further considerations for
> parallel-safety for INSERTs compared to SELECT. INSERTs can also
> trigger UPDATEs.
>

Sure but in that case 'top_plan->parallel_safe' should be false and it
should stick Gather node atop Insert node. For the purpose of this
patch, the scan beneath Insert should be considered as parallel_safe.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 6:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 3:51 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > OK, I will update the comments for this.
> > Basically, up to now, the "force_parallel_mode" has only ever operated
> > on a SELECT.
> > But since we are now allowing CMD_INSERT to be assessed for parallel
> > mode too, we need to prevent the force_parallel_mode logic from
> > sticking a Gather node over the top of arbitrary INSERTs and causing
> > them to be run in parallel. Not all INSERTs are suitable for parallel
> > operation, and also there are further considerations for
> > parallel-safety for INSERTs compared to SELECT. INSERTs can also
> > trigger UPDATEs.
> >
>
> Sure but in that case 'top_plan->parallel_safe' should be false and it
> should stick Gather node atop Insert node.
>

/should/should not.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 11:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > Sure but in that case 'top_plan->parallel_safe' should be false and it
> > should stick Gather node atop Insert node.
> >
>
> /should/should not.
>

OK, for the minimal patch, just allowing INSERT with parallel SELECT,
you're right, neither of those additional "commandType == CMD_SELECT"
checks are needed, so I'll remove them. (In the main patch, I think
the first check can be removed, once the XID handling is fixed; the
second check is definitely needed though).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Fri, Oct 9, 2020 at 11:58 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > That will be true for the number of rows/pages we need to scan, not
> > for the number of tuples we need to return as a result. The formula
> > here considers the number of rows the parallel scan will return: the
> > more rows each parallel node needs to pass via shared memory to the
> > Gather node, the more costly it will be.
> >
> > We do consider the total pages we need to scan in
> > compute_parallel_worker() where we use a logarithmic formula to
> > determine the number of workers.
>
> Despite all the best intentions, the current costings seem to be
> geared towards selection of a non-parallel plan over a parallel plan,
> the more rows there are in the table. Yet the performance of a
> parallel plan appears to be better than a non-parallel plan the more
> rows there are in the table.

Right, but as Amit said, we still have to account for the cost of
schlepping tuples between processes.  Hmm... could the problem be that
we're incorrectly estimating that Insert (without RETURNING) will send
a bazillion tuples, even though that isn't true?  I didn't look at the
code but that's what the plan seems to imply when it says stuff like
"Gather  (cost=15428.00..16101.14 rows=1000000 width=4)".  I suppose
the row estimates for ModifyTable paths are based on what they write,
not what they emit, and in the past that distinction didn't matter
much because it wasn't something that was used for comparing
alternative plans.  Now it is.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 7:32 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 11:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > Sure but in that case 'top_plan->parallel_safe' should be false and it
> > > should stick Gather node atop Insert node.
> > >
> >
> > /should/should not.
> >
>
> OK, for the minimal patch, just allowing INSERT with parallel SELECT,
> you're right, neither of those additional "commandType == CMD_SELECT"
> checks are needed, so I'll remove them.
>

Okay, that makes sense.

> (In the main patch, I think
> the first check can be removed, once the XID handling is fixed; the
> second check is definitely needed though).
>

Okay, then move that check but please do add some comments to state the reason.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 9, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Also, I have attached a separate patch (requested by Andres Freund)
> > that just allows the underlying SELECT part of "INSERT INTO ... SELECT
> > ..." to be parallel.
> >
>
> It might be a good idea to first just get this patch committed, if
> possible. So, I have reviewed the latest version of this patch:
>

A few initial comments on 0004-ParallelInsertSelect:

1.
@@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation,
HeapTuple tup, TransactionId xid,
  * inserts in general except for the cases where inserts generate a new
  * CommandId (eg. inserts into a table having a foreign key column).
  */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-

I speculated above [1] about whether we can change this check
instead of just removing it. Have you considered that suggestion?

2.
@@ -764,12 +778,13 @@ GetCurrentCommandId(bool used)
  if (used)
  {
  /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader.  We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed
+ * if currentCommandIdUsed was already true at the start of the
+ * parallel operation (by way of SetCurrentCommandIdUsed()), otherwise
+ * forbid setting currentCommandIdUsed because we have no provision
+ * for communicating this back to the leader.
  */
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
  currentCommandIdUsed = true;
  }

Once we allow this, won't the next CommandCounterIncrement() in the
worker increment the commandId, leading to different commandIds being
used in the worker and leader? Is that prevented in some way, and if
so, how? Can we document this?

3.
@@ -173,7 +173,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
  * against performing unsafe operations in parallel mode, but this gives a
  * more user-friendly error message.
  */
- if ((XactReadOnly || IsInParallelMode()) &&
+ if ((XactReadOnly || (IsInParallelMode() &&
queryDesc->plannedstmt->commandType != CMD_INSERT)) &&
  !(eflags & EXEC_FLAG_EXPLAIN_ONLY))
  ExecCheckXactReadOnly(queryDesc->plannedstmt);

I don't think the above change is correct. We need to extend the below
check in ExecCheckXactReadOnly() because otherwise it can allow
Insert operations even when XactReadOnly is set, which we don't want.

ExecCheckXactReadOnly()
{
..
if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
..
}

4.
@@ -173,18 +175,20 @@ ExecSerializePlan(Plan *plan, EState *estate)
  * PlannedStmt to start the executor.
- pstmt->hasReturning = false;
- pstmt->hasModifyingCTE = false;
+ pstmt->hasReturning = estate->es_plannedstmt->hasReturning;
+ pstmt->hasModifyingCTE = estate->es_plannedstmt->hasModifyingCTE;

Why change hasModifyingCTE?

5.
+ if (isParallelInsertLeader)
+ {
+ /* For Parallel INSERT, if there are BEFORE STATEMENT triggers,
+ * these must be fired by the leader, not the parallel workers.
+ */

The multi-line comment should start from the second line. I see a
similar problem at other places in the patch as well.

6.
@@ -178,6 +214,25 @@ ExecGather(PlanState *pstate)
  node->pei,
  gather->initParam);

+ if (isParallelInsertLeader)
+ {
+ /* For Parallel INSERT, if there are BEFORE STATEMENT triggers,
+ * these must be fired by the leader, not the parallel workers.
+ */
+ if (nodeModifyTableState->fireBSTriggers)
+ {
+ fireBSTriggers(nodeModifyTableState);
+ nodeModifyTableState->fireBSTriggers = false;
+
+ /*
+ * Disable firing of AFTER STATEMENT triggers by local
+ * plan execution (ModifyTable processing). These will be
+ * fired at end of Gather processing.
+ */
+ nodeModifyTableState->fireASTriggers = false;
+ }
+ }

Can we encapsulate this in a separate function? It seems a bit odd to
do this directly in ExecGather.

7.
@@ -418,14 +476,25 @@ ExecShutdownGatherWorkers(GatherState *node)
 void
 ExecShutdownGather(GatherState *node)
 {
- ExecShutdownGatherWorkers(node);
+ if (node->pei == NULL)
+ return;

- /* Now destroy the parallel context. */
- if (node->pei != NULL)

So after this patch, if "node->pei == NULL" then we won't shut down the
workers here? Why so?

8. You have made changes related to trigger execution for Gather node,
don't we need similar changes for GatherMerge node?

9.
@@ -383,7 +444,21 @@ cost_gather(GatherPath *path, PlannerInfo *root,

  /* Parallel setup and communication cost. */
  startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * For Parallel INSERT, provided no tuples are returned from workers
+ * to gather/leader node, don't add a cost-per-row, as each worker
+ * parallelly inserts the tuples that result from its chunk of plan
+ * execution. This change may make the parallel plan cheap among all
+ * other plans, and influence the planner to consider this parallel
+ * plan.
+ */
+ if (!(IsA(path->subpath, ModifyTablePath) &&
+ castNode(ModifyTablePath, path->subpath)->operation == CMD_INSERT &&
+ castNode(ModifyTablePath, path->subpath)->returningLists != NULL))
+ {
+ run_cost += parallel_tuple_cost * path->path.rows;
+ }

Shouldn't the last condition in the above check, "castNode(ModifyTablePath,
path->subpath)->returningLists != NULL", be
"castNode(ModifyTablePath, path->subpath)->returningLists == NULL"
instead? Otherwise, when there is a returning list, it won't count
the cost for passing tuples via the Gather node. This might be the
reason for what Thomas has seen in his recent email [2].
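The costing point can be modeled in isolation (illustrative code only,
not the actual cost_gather(); the function name and parameters are
simplified stand-ins): the per-tuple transfer cost should be charged
only for rows that actually travel from the workers to the leader, which
for a parallel INSERT means only when there is a RETURNING list.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative-only model of the Gather costing question under
 * discussion: a parallel INSERT without RETURNING sends no rows up to
 * the leader, so path_rows should contribute nothing to run_cost in
 * that case; with RETURNING (or a non-INSERT subpath), rows do flow
 * through Gather and must be charged for.
 */
static double
gather_run_cost(double subpath_run_cost, double path_rows,
				bool subpath_is_insert, bool has_returning,
				double parallel_tuple_cost)
{
	double		run_cost = subpath_run_cost;

	if (!(subpath_is_insert && !has_returning))
		run_cost += parallel_tuple_cost * path_rows;
	return run_cost;
}
```

With the "!= NULL" form of the condition, the RETURNING case would take
the cheap branch, which is the inversion being pointed out above.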

10. Don't we need a change similar to cost_gather in
cost_gather_merge? It seems you have made only partial changes for
GatherMerge node.


[1] - https://www.postgresql.org/message-id/CAA4eK1KyftVDgovvRQmdV1b%3DnN0R-KqdWZqiu7jZ1GYQ7SO9OA%40mail.gmail.com
[2] -
https://www.postgresql.org/message-id/CA%2BhUKGLZB%3D1Q%2BAQQEEmffr3bUMAh%2BJD%2BJ%2B7axv%2BK10Kea0U9TQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Sat, Oct 10, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> 8. You have made changes related to trigger execution for Gather node,
> don't we need similar changes for GatherMerge node?
>
..
>
> 10. Don't we need a change similar to cost_gather in
> cost_gather_merge? It seems you have made only partial changes for
> GatherMerge node.
>

Please ignore these two comments, as we can't push the Insert to workers
when GatherMerge is involved: the leader backend does the final phase
(merging the results from the workers). So, we can only push the Select
part of the statement.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Sun, Oct 11, 2020 at 12:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> + /*
> + * For Parallel INSERT, provided no tuples are returned from workers
> + * to gather/leader node, don't add a cost-per-row, as each worker
> + * parallelly inserts the tuples that result from its chunk of plan
> + * execution. This change may make the parallel plan cheap among all
> + * other plans, and influence the planner to consider this parallel
> + * plan.
> + */
> + if (!(IsA(path->subpath, ModifyTablePath) &&
> + castNode(ModifyTablePath, path->subpath)->operation == CMD_INSERT &&
> + castNode(ModifyTablePath, path->subpath)->returningLists != NULL))
> + {
> + run_cost += parallel_tuple_cost * path->path.rows;
> + }
>
> Isn't the last condition in above check "castNode(ModifyTablePath,
> path->subpath)->returningLists != NULL" should be
> "castNode(ModifyTablePath, path->subpath)->returningLists == NULL"
> instead? Because otherwise when there is returning list it won't count
> the cost for passing tuples via Gather node. This might be reason of
> what Thomas has seen in his recent email [2].

Yeah, I think this is trying to fix the problem too late.  Instead, we
should fix the incorrect row estimates so we don't have to fudge it
later like that.  For example, this should be estimating rows=0:

postgres=# explain analyze insert into s select * from t t1 join t t2 using (i);
...
 Insert on s  (cost=30839.08..70744.45 rows=1000226 width=4) (actual
time=2940.560..2940.562 rows=0 loops=1)

I think that should be done with something like this:

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3583,16 +3583,11 @@ create_modifytable_path(PlannerInfo *root,
RelOptInfo *rel,
                if (lc == list_head(subpaths))  /* first node? */
                        pathnode->path.startup_cost = subpath->startup_cost;
                pathnode->path.total_cost += subpath->total_cost;
-               pathnode->path.rows += subpath->rows;
+               if (returningLists != NIL)
+                       pathnode->path.rows += subpath->rows;
                total_size += subpath->pathtarget->width * subpath->rows;
        }

-       /*
-        * Set width to the average width of the subpath outputs.  XXX this is
-        * totally wrong: we should report zero if no RETURNING, else an average
-        * of the RETURNING tlist widths.  But it's what happened historically,
-        * and improving it is a task for another day.
-        */
        if (pathnode->path.rows > 0)
                total_size /= pathnode->path.rows;
        pathnode->path.pathtarget->width = rint(total_size);
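The effect of the proposed fix can be modeled standalone (illustrative
code only, not create_modifytable_path() itself): a ModifyTable (INSERT)
node returns rows to its parent only when there is a RETURNING list, so
the path's row estimate should accumulate subpath rows only in that
case, giving rows=0 for a plain INSERT ... SELECT.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Standalone sketch of the row-estimate fix: sum the subpaths' row
 * estimates only when a RETURNING list is present; otherwise the
 * INSERT emits no rows and the estimate should be zero.
 */
static double
modifytable_rows(const double *subpath_rows, int nsubpaths,
				 bool has_returning)
{
	double		rows = 0.0;

	if (has_returning)
	{
		for (int i = 0; i < nsubpaths; i++)
			rows += subpath_rows[i];
	}
	return rows;				/* 0 for INSERT without RETURNING */
}
```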



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sat, Oct 10, 2020 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > OK, for the minimal patch, just allowing INSERT with parallel SELECT,
> > you're right, neither of those additional "commandType == CMD_SELECT"
> > checks are needed, so I'll remove them.
> >
>
> Okay, that makes sense.
>

For the minimal patch (just allowing INSERT with parallel SELECT),
there are issues with parallel-mode and various parallel-mode-related
checks in the code.
Initially, I thought it was only a couple of XID-related checks (which
could perhaps just be tweaked to check for IsParallelWorker() instead,
as you suggested), but I now realise that there are a lot more cases.
This stems from the fact that just having a parallel SELECT (as part
of non-parallel INSERT) causes parallel-mode to be set for the WHOLE
plan. I'm not sure why parallel-mode is set globally like this, for
the whole plan. Couldn't it just be set for the scope of
Gather/GatherMerge? Otherwise, errors from these checks seem to be
misleading when outside the scope of Gather/GatherMerge, as
technically they are not occurring within the scope of parallel-leader
and parallel-worker(s). The global parallel-mode wouldn't have been an
issue before, because up to now INSERT has never had underlying
parallel operations.

For example, when running the tests under
"force_parallel_mode=regress", the test failures show that there are a
lot more cases affected:

"cannot assign TransactionIds during a parallel operation"
"cannot assign XIDs during a parallel operation"
"cannot start commands during a parallel operation"
"cannot modify commandid in active snapshot during a parallel operation"
"cannot execute nextval() during a parallel operation"
"cannot execute INSERT during a parallel operation"
"cannot execute ANALYZE during a parallel operation
"cannot update tuples during a parallel operation"

(and there are more not currently detected by the tests, found by
searching the code).

As an example, with the minimal patch applied, if you had a trigger on
INSERT that, say, attempted a table creation or UPDATE/DELETE, and you
ran an "INSERT INTO ... SELECT...", it would treat the trigger
operations as being attempted in parallel-mode, and so an error would
result.

Let me know your thoughts on how to deal with these issues.
Can you see a problem with only having parallel-mode set for scope of
Gather/GatherMerge, or do you have some other idea?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sun, Oct 11, 2020 at 1:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Oct 10, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 8. You have made changes related to trigger execution for Gather node,
> > don't we need similar changes for GatherMerge node?
> >
> ..
> >
> > 10. Don't we need a change similar to cost_gather in
> > cost_gather_merge? It seems you have made only partial changes for
> > GatherMerge node.
> >
>
> Please ignore these two comments as we can't push Insert to workers
> when GatherMerge is involved as a leader backend does the final phase
> (merge the results by workers). So, we can only push the Select part
> of the statement.
>

Precisely, that's why I didn't make those changes for GatherMerge.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sun, Oct 11, 2020 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> Yeah, I think this is trying to fix the problem too late.  Instead, we
> should fix the incorrect row estimates so we don't have to fudge it
> later like that.  For example, this should be estimating rows=0:
>
> postgres=# explain analyze insert into s select * from t t1 join t t2 using (i);
> ...
>  Insert on s  (cost=30839.08..70744.45 rows=1000226 width=4) (actual
> time=2940.560..2940.562 rows=0 loops=1)
>
> I think that should be done with something like this:
>
> --- a/src/backend/optimizer/util/pathnode.c
> +++ b/src/backend/optimizer/util/pathnode.c
> @@ -3583,16 +3583,11 @@ create_modifytable_path(PlannerInfo *root,
> RelOptInfo *rel,
>                 if (lc == list_head(subpaths))  /* first node? */
>                         pathnode->path.startup_cost = subpath->startup_cost;
>                 pathnode->path.total_cost += subpath->total_cost;
> -               pathnode->path.rows += subpath->rows;
> +               if (returningLists != NIL)
> +                       pathnode->path.rows += subpath->rows;
>                 total_size += subpath->pathtarget->width * subpath->rows;
>         }
>
> -       /*
> -        * Set width to the average width of the subpath outputs.  XXX this is
> -        * totally wrong: we should report zero if no RETURNING, else an average
> -        * of the RETURNING tlist widths.  But it's what happened historically,
> -        * and improving it is a task for another day.
> -        */
>         if (pathnode->path.rows > 0)
>                 total_size /= pathnode->path.rows;
>         pathnode->path.pathtarget->width = rint(total_size);

Agree, thanks (bug in existing Postgres code, right?)

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Mon, Oct 12, 2020 at 3:42 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Sun, Oct 11, 2020 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> >                 pathnode->path.total_cost += subpath->total_cost;
> > -               pathnode->path.rows += subpath->rows;
> > +               if (returningLists != NIL)
> > +                       pathnode->path.rows += subpath->rows;
> >                 total_size += subpath->pathtarget->width * subpath->rows;
> >         }

Erm, except the condition should of course cover total_size too.

> Agree, thanks (bug in existing Postgres code, right?)

Yeah, I think we should go ahead and fix that up front.  Here's a
version with a commit message.

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Oct 12, 2020 at 6:51 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Sat, Oct 10, 2020 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > OK, for the minimal patch, just allowing INSERT with parallel SELECT,
> > > you're right, neither of those additional "commandType == CMD_SELECT"
> > > checks are needed, so I'll remove them.
> > >
> >
> > Okay, that makes sense.
> >
>
> For the minimal patch (just allowing INSERT with parallel SELECT),
> there are issues with parallel-mode and various parallel-mode-related
> checks in the code.
> Initially, I thought it was only a couple of XID-related checks (which
> could perhaps just be tweaked to check for IsParallelWorker() instead,
> as you suggested), but I now realise that there are a lot more cases.
> This stems from the fact that just having a parallel SELECT (as part
> of non-parallel INSERT) causes parallel-mode to be set for the WHOLE
> plan. I'm not sure why parallel-mode is set globally like this, for
> the whole plan. Couldn't it just be set for the scope of
> Gather/GatherMerge? Otherwise, errors from these checks seem to be
> misleading when outside the scope of Gather/GatherMerge, as
> technically they are not occurring within the scope of parallel-leader
> and parallel-worker(s). The global parallel-mode wouldn't have been an
> issue before, because up to now INSERT has never had underlying
> parallel operations.
>

That is right, but there is another operation that already works like
that. For example, a statement like "create table test_new As select *
from test_parallel where c1 < 1000;" will use a parallel select, but the
write operation is performed in the leader. I agree that the code flow
of Insert is different, so we will have a different set of challenges in
that case, but there shouldn't be any fundamental problem in making it
work.

> For example, when running the tests under
> "force_parallel_mode=regress", the test failures show that there are a
> lot more cases affected:
>
> "cannot assign TransactionIds during a parallel operation"
> "cannot assign XIDs during a parallel operation"
> "cannot start commands during a parallel operation"
> "cannot modify commandid in active snapshot during a parallel operation"
> "cannot execute nextval() during a parallel operation"
> "cannot execute INSERT during a parallel operation"
> "cannot execute ANALYZE during a parallel operation
> "cannot update tuples during a parallel operation"
>
> (and there are more not currently detected by the tests, found by
> searching the code).
>

Did you get these after applying your patch? If so, can you share the
version you are using, or if you have already posted it, point me to
it?

> As an example, with the minimal patch applied, if you had a trigger on
> INSERT that, say, attempted a table creation or UPDATE/DELETE, and you
> ran an "INSERT INTO ... SELECT...", it would treat the trigger
> operations as being attempted in parallel-mode, and so an error would
> result.
>

Oh, I guess this happens because you need to execute Insert in
parallel-mode even though Insert is happening in the leader, right?
And probably we are not facing this with "Create Table As .." because
there is no trigger execution involved there.

> Let me know your thoughts on how to deal with these issues.
> Can you see a problem with only having parallel-mode set for scope of
> Gather/GatherMerge, or do you have some other idea?
>

I have not thought about this yet but I don't understand your
proposal. How will you set it only for the scope of Gather (Merge)?
The execution of the Gather node will be interleaved with the Insert
node, basically, you fetch a tuple from Gather, and then you need to
Insert it. Can you be a bit more specific on what you have in mind for
this?

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Oct 12, 2020 at 2:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Mon, Oct 12, 2020 at 3:42 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Sun, Oct 11, 2020 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > >                 pathnode->path.total_cost += subpath->total_cost;
> > > -               pathnode->path.rows += subpath->rows;
> > > +               if (returningLists != NIL)
> > > +                       pathnode->path.rows += subpath->rows;
> > >                 total_size += subpath->pathtarget->width * subpath->rows;
> > >         }
>
> Erm, except the condition should of course cover total_size too.
>
> > Agree, thanks (bug in existing Postgres code, right?)
>
> Yeah, I think we should go ahead and fix that up front.  Here's a
> version with a commit message.

I've checked it and tested it, and it looks fine to me.
Also, it seems to align with the gripe in the old comment about width
("XXX this is totally wrong: we should report zero if no RETURNING
...").
I'm happy for you to commit it.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Oct 12, 2020 at 9:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 12, 2020 at 6:51 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
>
> > Let me know your thoughts on how to deal with these issues.
> > Can you see a problem with only having parallel-mode set for scope of
> > Gather/GatherMerge, or do you have some other idea?
> >
>
> I have not thought about this yet but I don't understand your
> proposal. How will you set it only for the scope of Gather (Merge)?
> The execution of the Gather node will be interleaved with the Insert
> node, basically, you fetch a tuple from Gather, and then you need to
> Insert it. Can you be a bit more specific on what you have in mind for
> this?
>

One more thing I would like to add here is that we can't exit
parallel-mode while the workers are still running (performing the scan
or whatever other operation they are assigned) and the shared memory is
not yet destroyed. Otherwise, the leader could perform unsafe things
like assigning new commandIds, or the workers could send some error for
which the leader should still be in parallel-mode. So, considering this,
I think we need quite similar checks (similar to those for parallel
inserts) to make even the Select part parallel for Inserts. If we do
that, you won't face many of the problems you mentioned above, like
executing triggers that contain parallel-unsafe stuff. I still feel it
will be beneficial to do this, as it will cover cases like an Insert
with a GatherMerge underneath it, which would otherwise not be possible.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Oct 12, 2020 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > I have not thought about this yet but I don't understand your
> > proposal. How will you set it only for the scope of Gather (Merge)?
> > The execution of the Gather node will be interleaved with the Insert
> > node, basically, you fetch a tuple from Gather, and then you need to
> > Insert it. Can you be a bit more specific on what you have in mind for
> > this?
> >
>
> One more thing I would like to add here is that we can't exit
> parallel-mode till the workers are running (performing the scan or
> other operation it is assigned with) and shared memory is not
> destroyed. Otherwise, the leader can perform un-safe things like
> assigning new commandsids or probably workers can send some error for
> which the leader should still be in parallel-mode. So, considering
> this I think we need quite similar checks (similar to parallel
> inserts) to make even the Select part parallel for Inserts. If we do
> that then you won't face many of the problems you mentioned above like
> executing triggers that contain parallel-unsafe stuff. I feel still it
> will be beneficial to do this as it will cover cases like Insert with
> GatherMerge underneath it which would otherwise not possible.
>

Yes, I see what you mean; exiting parallel-mode can't be done safely
where I had hoped it could. So it looks like, even to make just the
Select part of the Insert parallel, I need to add checks (along the same
lines as for Parallel Insert) to avoid the parallel Select in certain
potentially-unsafe cases.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Oct 12, 2020 at 12:38 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Oct 12, 2020 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > I have not thought about this yet but I don't understand your
> > > proposal. How will you set it only for the scope of Gather (Merge)?
> > > The execution of the Gather node will be interleaved with the Insert
> > > node, basically, you fetch a tuple from Gather, and then you need to
> > > Insert it. Can you be a bit more specific on what you have in mind for
> > > this?
> > >
> >
> > One more thing I would like to add here is that we can't exit
> > parallel-mode till the workers are running (performing the scan or
> > other operation it is assigned with) and shared memory is not
> > destroyed. Otherwise, the leader can perform un-safe things like
> > assigning new commandsids or probably workers can send some error for
> > which the leader should still be in parallel-mode. So, considering
> > this I think we need quite similar checks (similar to parallel
> > inserts) to make even the Select part parallel for Inserts. If we do
> > that then you won't face many of the problems you mentioned above like
> > executing triggers that contain parallel-unsafe stuff. I feel still it
> > will be beneficial to do this as it will cover cases like Insert with
> > GatherMerge underneath it which would otherwise not possible.
> >
>
> Yes, I see what you mean, exiting parallel-mode can't be done safely
> where I had hoped it could, so looks like, even for making just the
> Select part of Insert parallel, I need to add checks (along the same
> lines as for Parallel Insert) to avoid the parallel Select in certain
> potentially-unsafe cases.
>

Right. After we take care of that, we can think of assigning the xid, or
things like that, before we enter parallel mode. Say we have a function
like PrepareParallelMode (or PrepareEnterParallelMode) or something
like that, where we can check whether we need to perform a
parallel-safe write operation (as of now, Insert) and then do the
required preparation, like assigning the xid, etc. I think this might
not be ideal, because it is possible that we don't fetch even a single
row that needs to be inserted (say due to a filter condition), in which
case we will waste an xid, but such cases might not occur often enough
to worry about.
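The PrepareParallelMode idea can be sketched with a toy model
(hypothetical code, not PostgreSQL source; PrepareParallelMode itself
does not exist yet and the names here are invented for illustration):
any operation that is unsafe inside parallel mode, such as xid
assignment for a pending Insert, is performed up front, before parallel
mode is entered.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy transaction state: just the two flags the sketch needs. */
typedef struct
{
	bool		in_parallel_mode;
	bool		xid_assigned;
} XactModel;

/* Assigning an xid is legal only outside parallel mode. */
static bool
assign_xid(XactModel *x)
{
	if (x->in_parallel_mode)
		return false;			/* would elog(ERROR) in the real code */
	x->xid_assigned = true;
	return true;
}

/*
 * Hypothetical PrepareParallelMode sketch: do the parallel-unsafe
 * preparation first, while it is still legal, and only then flip
 * into parallel mode.  If no row ever arrives to insert, the xid
 * is wasted, as noted above.
 */
static void
prepare_and_enter_parallel_mode(XactModel *x, bool will_perform_insert)
{
	if (will_perform_insert)
		assign_xid(x);
	x->in_parallel_mode = true;
}
```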

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Thomas Munro
Date:
On Mon, Oct 12, 2020 at 6:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Mon, Oct 12, 2020 at 2:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Yeah, I think we should go ahead and fix that up front.  Here's a
> > version with a commit message.
>
> I've checked it and tested it, and it looks fine to me.
> Also, it seems to align with the gripe in the old comment about width
> ("XXX this is totally wrong: we should report zero if no RETURNING
> ...").
> I'm happy for you to commit it.

Pushed, though I left most of that comment there because the width
estimate still needs work when you do use RETURNING.  At least we now
have rows=0 for queries without RETURNING, which was a bigger problem
for your patch.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> It might be a good idea to first just get this patch committed, if
> possible. So, I have reviewed the latest version of this patch:
>
> 0001-InsertParallelSelect

I've attached an updated InsertParallelSelect patch (v2) - allowing
underlying parallel SELECT for "INSERT INTO ... SELECT ...".
I think I've addressed most of the issues identified in the previous
version of the patch.
I'm still seeing a couple of errors in the tests when
"force_parallel_mode=regress" is in effect, and those need to be
addressed (extra checks required to avoid parallel SELECT in certain
cases).
Also, I'm seeing a partition-related error when running
installcheck-world - I'm investigating that.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Oct 15, 2020 at 9:56 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > It might be a good idea to first just get this patch committed, if
> > possible. So, I have reviewed the latest version of this patch:
> >
> > 0001-InsertParallelSelect
>
> I've attached an updated InsertParallelSelect patch (v2) - allowing
> underlying parallel SELECT for "INSERT INTO ... SELECT ...".
> I think I've addressed most of the issues identified in the previous
> version of the patch.
> I'm still seeing a couple of errors in the tests when
> "force_parallel_mode=regress" is in effect, and those need to be
> addressed (extra checks required to avoid parallel SELECT in certain
> cases).
>

I am getting the below error with force_parallel_mode:
@@ -1087,9 +1087,14 @@
 ERROR:  value for domain inotnull violates check constraint "inotnull_check"
 create table dom_table (x inotnull);
 insert into dom_table values ('1');
+ERROR:  cannot start commands during a parallel operation
+CONTEXT:  SQL function "sql_is_distinct_from"

It happened with below test:

create function sql_is_distinct_from(anyelement, anyelement)
returns boolean language sql
as 'select $1 is distinct from $2 limit 1';

create domain inotnull int
  check (sql_is_distinct_from(value, null));

create table dom_table (x inotnull);
insert into dom_table values ('1');

So it is clear that this is happening because we have allowed an insert
that is parallel-unsafe. The attribute is of a domain type which has a
parallel-unsafe constraint. As per your current code, we need to
detect this in IsRelParallelModeSafeForModify. The idea would be to
check the type of each attribute, and if it is a domain type, check
whether it has a constraint (see function ExecGrant_Type for how to
detect a domain type, and then refer to functions
AlterTypeNamespaceInternal and AlterConstraintNamespaces for how to
determine the constraints of a domain type). Once you find a
constraint, you already have code in your patch to determine whether it
contains a parallel-unsafe expression.
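The shape of the suggested check can be shown with a standalone model
(illustrative only; the real code walks the catalogs via the functions
named above, while the struct and function names here are invented): a
target relation is parallel-safe for this purpose only if, for every
attribute of a domain type, every constraint expression attached to the
domain is itself parallel-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of one attribute's type, for the sketch only. */
typedef struct
{
	bool		is_domain;
	int			nconstraints;
	const bool *constraint_is_safe;		/* per-constraint verdicts */
} AttrTypeModel;

/*
 * Walk the attributes; any domain constraint that is parallel-unsafe
 * (e.g. the sql_is_distinct_from() case above) makes the whole
 * relation unsafe for parallel-mode Insert.
 */
static bool
rel_attrs_parallel_safe(const AttrTypeModel *attrs, int natts)
{
	for (int i = 0; i < natts; i++)
	{
		if (!attrs[i].is_domain)
			continue;
		for (int j = 0; j < attrs[i].nconstraints; j++)
		{
			if (!attrs[i].constraint_is_safe[j])
				return false;
		}
	}
	return true;
}
```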


> Also, I'm seeing a partition-related error when running
> installcheck-world - I'm investigating that.
>

Okay.

Few more comments:
==================
1.
+ /*
+ * Can't support execution of row-level or transition-table triggers
+ * during parallel-mode, since such triggers may query the table
+ * into which the data is being inserted, and the content returned
+ * would vary unpredictably according to the order of retrieval by
+ * the workers and the rows already inserted.
+ */
+ if (trigdesc != NULL &&
+ (trigdesc->trig_insert_instead_row ||
+   trigdesc->trig_insert_before_row ||
+   trigdesc->trig_insert_after_row ||
+   trigdesc->trig_insert_new_table))
+ {
+ return false;
+ }

I don't think it is a good idea to prohibit all before/after/instead
row triggers, because for the case you are referring to, the trigger
functions should be marked as parallel-unsafe. We might want to have an
Assert somewhere to detect illegal usage, but I don't see the need to
directly prohibit them.

2.
@@ -56,8 +57,8 @@ GetNewTransactionId(bool isSubXact)
  * Workers synchronize transaction state at the beginning of each parallel
  * operation, so we can't account for new XIDs after that point.
  */
- if (IsInParallelMode())
- elog(ERROR, "cannot assign TransactionIds during a parallel operation");
+ if (IsParallelWorker())
+ elog(ERROR, "cannot assign TransactionIds in a parallel worker");

  /*
  * During bootstrap initialization, we return the special bootstrap
diff --git a/src/backend/access/transam/xact.c
b/src/backend/access/transam/xact.c
index af6afce..ef423fb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -580,8 +580,8 @@ AssignTransactionId(TransactionState s)
  * Workers synchronize transaction state at the beginning of each parallel
  * operation, so we can't account for new XIDs at this point.
  */
- if (IsInParallelMode() || IsParallelWorker())
- elog(ERROR, "cannot assign XIDs during a parallel operation");
+ if (IsParallelWorker())
+ elog(ERROR, "cannot assign XIDs in a parallel worker");


I think we don't need these changes, at least for the purpose of this
patch, if you follow the suggestion in the email above [1] about having
a new function like PrepareParallelMode. One problem I see with
removing these checks is: how do we ensure that the leader won't
assign a new transactionid once we start executing a parallel node? It
could do so via sub-transactions; maybe that is already protected at
some earlier point, but I would like to see if we can retain these checks.

[1] - https://www.postgresql.org/message-id/CAA4eK1JogfXUa%3D3wMPO%2BK%3DUiOLgHgCO7-fj1wCHsSxdaXsfVbw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Oct 15, 2020 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 9:56 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > Also, I'm seeing a partition-related error when running
> > installcheck-world - I'm investigating that.
> >
>
> Okay.
>

The attached patch fixes this partition case for me. Basically, we
need to check the parallel-safety of PartitionKey. I have only checked
for partsupfunc but I think we need to check the parallel-safety of
partexprs as well. Also, I noticed that you have allowed
parallelism only when all expressions/functions involved with the
Insert are parallel-safe; can't we allow the parallel-restricted case,
since for this patch the Inserts have to be performed by the leader anyway?

-- 
With Regards,
Amit Kapila.

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 16, 2020 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Oct 15, 2020 at 9:56 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > Also, I'm seeing a partition-related error when running
> > > installcheck-world - I'm investigating that.
> > >
> >
> > Okay.
> >
>
> The attached patch fixes this partition case for me. Basically, we
> need to check the parallel-safety of PartitionKey. I have only checked
> for partsupfunc but I think we need to check the parallel-safety of
> partexprs as well.

Thanks, I had already added the parallel-safety check for partexprs
when I saw this, so your patch hopefully completes all the
partition-related checks required.

> Also, I noticed that you have allowed for
> parallelism only when all expressions/functions involved with Insert
> are parallel-safe, can't we allow parallel-restricted case because
> anyway Inserts have to be performed by the leader for this patch.
>

Yes, I think you're right.
"A parallel restricted operation is one which cannot be performed in a
parallel worker, but which can be performed in the leader while
parallel query is in use."
I'll make the change and test that everything works OK.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 16, 2020 at 2:16 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 16, 2020 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > Also, I noticed that you have allowed for
> > parallelism only when all expressions/functions involved with Insert
> > are parallel-safe, can't we allow parallel-restricted case because
> > anyway Inserts have to be performed by the leader for this patch.
> >
>
> Yes, I think you're right.
> "A parallel restricted operation is one which cannot be performed in a
> parallel worker, but which can be performed in the leader while
> parallel query is in use."
> I'll make the change and test that everything works OK.
>

Cool, let me try to explain my thoughts a bit more. The idea is first
(in standard_planner) we check if there is any 'parallel_unsafe'
function/expression (via max_parallel_hazard) in the query tree. If we
don't find anything 'parallel_unsafe' then we mark parallelModeOk. At
this stage, the patch is checking whether there is any
'parallel_unsafe' or 'parallel_restricted' expression/function in the
target relation and if there is none then we mark parallelModeOK as
true. So, if there is anything 'parallel_restricted' then we will mark
parallelModeOK as false which doesn't seem right to me.

Then later in the planner during set_rel_consider_parallel, we
determine if a particular relation can be scanned from within a
worker, then we consider that relation for parallelism. Here, we
determine if certain things are parallel-restricted then we don't
consider this for parallelism. Then we create partial paths for the
relations that are considered for parallelism. I think we don't need
to change anything for the current patch in these later stages because
we anyway are not considering Insert to be pushed into workers.
However, in the second patch where we are thinking to push Inserts in
workers, we might need to do something to filter parallel-restricted
cases during this stage of the planner.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 16, 2020 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> Cool, let me try to explain my thoughts a bit more. The idea is first
> (in standard_planner) we check if there is any 'parallel_unsafe'
> function/expression (via max_parallel_hazard) in the query tree. If we
> don't find anything 'parallel_unsafe' then we mark parallelModeOk. At
> this stage, the patch is checking whether there is any
> 'parallel_unsafe' or 'parallel_restricted' expression/function in the
> target relation and if there is none then we mark parallelModeOK as
> true. So, if there is anything 'parallel_restricted' then we will mark
> parallelModeOK as false which doesn't seem right to me.
>
> Then later in the planner during set_rel_consider_parallel, we
> determine if a particular relation can be scanned from within a
> worker, then we consider that relation for parallelism. Here, we
> determine if certain things are parallel-restricted then we don't
> consider this for parallelism. Then we create partial paths for the
> relations that are considered for parallelism. I think we don't need
> to change anything for the current patch in these later stages because
> we anyway are not considering Insert to be pushed into workers.
> However, in the second patch where we are thinking to push Inserts in
> workers, we might need to do something to filter parallel-restricted
> cases during this stage of the planner.
>

Posting an update to the smaller patch (Parallel SELECT for INSERT
INTO...SELECT...).

Most of this patch feeds into the larger Parallel INSERT patch, for
which I'll also be posting an update soon.

Patch updates include:
- Removed explicit trigger-type checks (instead rely on declared
trigger parallel safety)
- Restored parallel-related XID checks that previous patch altered;
now assign XID prior to entering parallel-mode
- Now considers parallel-SELECT for parallel RESTRICTED cases (not
just parallel SAFE cases)
- Added parallel-safety checks for partition key expressions and
support functions
- Workaround added for test failure in "partition-concurrent-attach"
test; since ALTER TABLE operations may exclusively lock a relation
until end-of-transaction, we now assume and return UNSAFE if a
share-lock can't be acquired on the relation, rather than blocking,
potentially until the end of the other concurrent transaction in
which the exclusive lock is held.
Examples of when a relation is exclusively locked
(AccessExclusiveLock) until end-of-transaction include:
    ALTER TABLE DROP COLUMN
    ALTER TABLE ... RENAME
    ALTER TABLE ... ATTACH PARTITION  (locks default partition)
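For instance, the problematic interleaving looks roughly like this (a hedged sketch; table names are illustrative):

```sql
-- Session 1: ATTACH PARTITION takes AccessExclusiveLock on the default
-- partition, held until the transaction ends.
BEGIN;
ALTER TABLE tpart ATTACH PARTITION tpart_2 FOR VALUES FROM (100) TO (200);
-- (transaction deliberately left open)

-- Session 2: the INSERT's parallel-safety check would block waiting for a
-- share-lock on the partition hierarchy; the workaround instead calls
-- ConditionalLockRelationOid() and assumes UNSAFE if the lock isn't
-- immediately available.
INSERT INTO tpart SELECT * FROM source_tbl;
```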

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 16, 2020 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> Cool, let me try to explain my thoughts a bit more. The idea is first
> (in standard_planner) we check if there is any 'parallel_unsafe'
> function/expression (via max_parallel_hazard) in the query tree. If we
> don't find anything 'parallel_unsafe' then we mark parallelModeOk. At
> this stage, the patch is checking whether there is any
> 'parallel_unsafe' or 'parallel_restricted' expression/function in the
> target relation and if there is none then we mark parallelModeOK as
> true. So, if there is anything 'parallel_restricted' then we will mark
> parallelModeOK as false which doesn't seem right to me.
>
> Then later in the planner during set_rel_consider_parallel, we
> determine if a particular relation can be scanned from within a
> worker, then we consider that relation for parallelism. Here, we
> determine if certain things are parallel-restricted then we don't
> consider this for parallelism. Then we create partial paths for the
> relations that are considered for parallelism. I think we don't need
> to change anything for the current patch in these later stages because
> we anyway are not considering Insert to be pushed into workers.
> However, in the second patch where we are thinking to push Inserts in
> workers, we might need to do something to filter parallel-restricted
> cases during this stage of the planner.
>

Posting an updated Parallel INSERT patch which (mostly) addresses
previously-identified issues and suggestions.

More work needs to be done in order to support parallel UPDATE and
DELETE (even after application of Thomas Munro's combo-cid
parallel-support patch), but it is getting closer.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Oct 22, 2020 at 9:47 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Oct 16, 2020 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> Posting an update to the smaller patch (Parallel SELECT for INSERT
> INTO...SELECT...).
>
> Most of this patch feeds into the larger Parallel INSERT patch, for
> which I'll also be posting an update soon.
>
> Patch updates include:
> - Removed explicit trigger-type checks (instead rely on declared
> trigger parallel safety)
> - Restored parallel-related XID checks that previous patch altered;
> now assign XID prior to entering parallel-mode
> - Now considers parallel-SELECT for parallel RESTRICTED cases (not
> just parallel SAFE cases)
> - Added parallel-safety checks for partition key expressions and
> support functions
> - Workaround added for test failure in "partition-concurrent-attach"
> test;
>

IIUC, below is code for this workaround:

+MaxRelParallelHazardForModify(Oid relid,
+                              CmdType commandType,
+                              max_parallel_hazard_context *context)
+{
+    Relation    rel;
+    TupleDesc   tupdesc;
+    int         attnum;
+
+    LOCKMODE    lockmode = AccessShareLock;
+
+    /*
+     * It's possible that this relation is locked for exclusive access
+     * in another concurrent transaction (e.g. as a result of a
+     * ALTER TABLE ... operation) until that transaction completes.
+     * If a share-lock can't be acquired on it now, we have to assume this
+     * could be the worst-case, so to avoid blocking here until that
+     * transaction completes, conditionally try to acquire the lock and
+     * assume and return UNSAFE on failure.
+     */
+    if (ConditionalLockRelationOid(relid, lockmode))
+    {
+        rel = table_open(relid, NoLock);
+    }
+    else
+    {
+        context->max_hazard = PROPARALLEL_UNSAFE;
+        return context->max_hazard;
+    }

Do we need this workaround if we lock just the parent table instead of
locking all the tables? Basically, can we safely identify the
parallel-safety of partitioned relation if we just have a lock on
parent relation? One more thing I have noticed is that for scan
relations (Select query), we do such checks much later based on
RelOptInfo (see set_rel_consider_parallel) which seems to have most of
the information required to perform parallel-safety checks but I guess
for ModifyTable (aka the Insert table) the equivalent doesn't seem
feasible but have you thought of doing at the later stage in planner?

Few other comments on latest patch:
===============================
1.
MaxRelParallelHazardForModify()
{
..
+ if (commandType == CMD_INSERT || commandType == CMD_UPDATE)
+ {
+ /*
..

Why to check CMD_UPDATE here?

2.
+void PrepareParallelModeForModify(CmdType commandType, bool
isParallelModifyLeader)
+{
+ Assert(!IsInParallelMode());
+
+ if (isParallelModifyLeader)
+ (void)GetCurrentCommandId(true);
+
+ (void)GetCurrentFullTransactionId();

Here, we should use GetCurrentTransactionId() similar to heap_insert
or other heap operations. I am not sure why you have used
GetCurrentFullTransactionId?

3. Can we have a test to show why we need to check all the partitions
for parallel-safety? I think it would be possible when there is a
trigger on only one of the partitions and that trigger has
corresponding parallel_unsafe function. But it is good to verify that
once.

4. Have you checked the overhead of this on the planner for different
kinds of statements like inserts into tables having 100 or 500
partitions? Similarly, it is good to check the overhead of domain
related checks added in the patch.

5. Can we have a separate patch for parallel-selects for Insert? It
will make review easier.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Oct 27, 2020 at 8:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> IIUC, below is code for this workaround:
>
> +MaxRelParallelHazardForModify(Oid relid,
> + CmdType commandType,
> + max_parallel_hazard_context *context)
> +{
> + Relation        rel;
> + TupleDesc tupdesc;
> + int attnum;
> +
> + LOCKMODE lockmode = AccessShareLock;
> +
> + /*
> + * It's possible that this relation is locked for exclusive access
> + * in another concurrent transaction (e.g. as a result of a
> + * ALTER TABLE ... operation) until that transaction completes.
> + * If a share-lock can't be acquired on it now, we have to assume this
> + * could be the worst-case, so to avoid blocking here until that
> + * transaction completes, conditionally try to acquire the lock and
> + * assume and return UNSAFE on failure.
> + */
> + if (ConditionalLockRelationOid(relid, lockmode))
> + {
> + rel = table_open(relid, NoLock);
> + }
> + else
> + {
> + context->max_hazard = PROPARALLEL_UNSAFE;
> + return context->max_hazard;
> + }
>
> Do we need this workaround if we lock just the parent table instead of
> locking all the tables? Basically, can we safely identify the
> parallel-safety of partitioned relation if we just have a lock on
> parent relation?

I believe the workaround is still needed in this case, because the
workaround was added because of a test in which the parent table was
exclusively locked in another concurrent transaction (as a result of
ALTER TABLE ... ATTACH PARTITION ...) so we could not even get a
ShareLock on the parent table without hanging (and then ending up
failing the test because of it).
So at the moment the workaround is needed, even if just trying to lock
the parent table.
I'll do some more testing to determine the secondary issue of whether
locks on the partition tables are needed, but at the moment I believe
they are.

>One more thing I have noticed is that for scan
> relations (Select query), we do such checks much later based on
> RelOptInfo (see set_rel_consider_parallel) which seems to have most of
> the information required to perform parallel-safety checks but I guess
> for ModifyTable (aka the Insert table) the equivalent doesn't seem
> feasible but have you thought of doing at the later stage in planner?
>

Yes, and in fact I tried putting the checks in a later stage of the
planner, and it's almost successful, except it then makes setting
"parallelModeNeeded" very tricky indeed, because that is expected to
be set based on whether the SQL is safe to run in parallel mode
(parallelModeOK == true) and whether force_parallel_mode is not off.
With parallel safety checks delayed to a later stage in the planner,
it's then not known whether there are certain types of parallel-unsafe
INSERTs (such as INSERT INTO ... VALUES ... ON CONFLICT DO UPDATE
...), because processing for those doesn't reach those later stages of
the planner where parallelism is being considered. So then to avoid
errors from when parallel-mode is forced on and such unsafe INSERTs
are run, the only real choice is to only allow parallelModeNeeded to
be true for SELECT only (not INSERT), and this is kind of cheating and
also not picking up cases where parallel-safe INSERT is run but
invokes parallel-mode-unsafe features.
My conclusion, at least for the moment, is to leave the check where it is.


> Few other comments on latest patch:
> ===============================
> 1.
> MaxRelParallelHazardForModify()
> {
> ..
> + if (commandType == CMD_INSERT || commandType == CMD_UPDATE)
> + {
> + /*
> ..
>
> Why to check CMD_UPDATE here?
>

That was a bit of forward-thinking, for when/if UPDATE/DELETE is
supported in parallel-mode.
Column default expressions and check-constraints are only applicable
to INSERT and UPDATE.
Note however that currently this function can only ever be called with
commandType == CMD_INSERT.

> 2.
> +void PrepareParallelModeForModify(CmdType commandType, bool
> isParallelModifyLeader)
> +{
> + Assert(!IsInParallelMode());
> +
> + if (isParallelModifyLeader)
> + (void)GetCurrentCommandId(true);
> +
> + (void)GetCurrentFullTransactionId();
>
> Here, we should use GetCurrentTransactionId() similar to heap_insert
> or other heap operations. I am not sure why you have used
> GetCurrentFullTransactionId?
>

GetCurrentTransactionId() and GetCurrentFullTransactionId() actually
have the same functionality, just a different return value (which is
not being used here).
But anyway I've changed it to use GetCurrentTransactionId().


> 3. Can we have a test to show why we need to check all the partitions
> for parallel-safety? I think it would be possible when there is a
> trigger on only one of the partitions and that trigger has
> corresponding parallel_unsafe function. But it is good to verify that
> once.
>

I can't imagine how you could check parallel-safety properly without
checking all of the partitions.
We don't know which partition that data will get inserted into until
runtime (e.g. range/list partitioning).
Each partition can have its own column default expressions,
check-constraints, triggers etc. (which may or may not be
parallel-safe) and a partition may itself be a partitioned table.
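A minimal sketch of the kind of case in question (all names hypothetical): a parallel-unsafe trigger on just one partition should force the whole INSERT to be treated as unsafe, since any row might route to that partition at runtime:

```sql
CREATE TABLE ptbl (a int) PARTITION BY RANGE (a);
CREATE TABLE ptbl_p1 PARTITION OF ptbl FOR VALUES FROM (0) TO (100);
CREATE TABLE ptbl_p2 PARTITION OF ptbl FOR VALUES FROM (100) TO (200);

CREATE FUNCTION unsafe_trig_fn() RETURNS trigger
    LANGUAGE plpgsql PARALLEL UNSAFE
    AS $$ BEGIN RETURN NEW; END $$;

-- Trigger exists on only one partition; a parallel-safety check that
-- examined only the parent (or only ptbl_p1) would miss it.
CREATE TRIGGER unsafe_trig BEFORE INSERT ON ptbl_p2
    FOR EACH ROW EXECUTE FUNCTION unsafe_trig_fn();
```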


> 4. Have you checked the overhead of this on the planner for different
> kinds of statements like inserts into tables having 100 or 500
> partitions? Similarly, it is good to check the overhead of domain
> related checks added in the patch.
>

Checking that now and will post results soon.

> 5. Can we have a separate patch for parallel-selects for Insert? It
> will make review easier.
>

See attached patches.


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Oct 30, 2020 at 6:09 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Oct 27, 2020 at 8:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > IIUC, below is code for this workaround:
> >
> > +MaxRelParallelHazardForModify(Oid relid,
> > + CmdType commandType,
> > + max_parallel_hazard_context *context)
> > +{
> > + Relation        rel;
> > + TupleDesc tupdesc;
> > + int attnum;
> > +
> > + LOCKMODE lockmode = AccessShareLock;
> > +
> > + /*
> > + * It's possible that this relation is locked for exclusive access
> > + * in another concurrent transaction (e.g. as a result of a
> > + * ALTER TABLE ... operation) until that transaction completes.
> > + * If a share-lock can't be acquired on it now, we have to assume this
> > + * could be the worst-case, so to avoid blocking here until that
> > + * transaction completes, conditionally try to acquire the lock and
> > + * assume and return UNSAFE on failure.
> > + */
> > + if (ConditionalLockRelationOid(relid, lockmode))
> > + {
> > + rel = table_open(relid, NoLock);
> > + }
> > + else
> > + {
> > + context->max_hazard = PROPARALLEL_UNSAFE;
> > + return context->max_hazard;
> > + }
> >
> > Do we need this workaround if we lock just the parent table instead of
> > locking all the tables? Basically, can we safely identify the
> > parallel-safety of partitioned relation if we just have a lock on
> > parent relation?
>
> I believe the workaround is still needed in this case, because the
> workaround was added because of a test in which the parent table was
> exclusively locked in another concurrent transaction (as a result of
> ALTER TABLE ... ATTACH PARTITION ...) so we could not even get a
> ShareLock on the parent table without hanging (and then ending up
> failing the test because of it).
>

Don't you think the test case design is flawed in that case? Because
even simple "select * from tpart;" will hang in planner while taking
share lock (the code flow is:
add_other_rels_to_query->expand_inherited_rtentry->expand_partitioned_rtentry)
once you take exclusive lock for a parallel session on the table.
Currently we never need to acquire any lock for Inserts in the planner
but not sure we can design a test case based on that assumption as we
can see it fails in this basic case.


> So at the moment the workaround is needed, even if just trying to lock
> the parent table.
>

I am not convinced, rather I think that the test case is not well
designed unless there is any other way (without taking share lock on
the relation) to determine parallel-safety of Inserts which neither of
us have thought of. I understand that you don't want to change that
test case as part of this patch so you are using this workaround.

> I'll do some more testing to determine the secondary issue of whether
> locks on the partition tables are needed, but at the moment I believe
> they are.
>

Fair enough but lets determine that by some testing and analysis. I
feel we should even add a comment if we require to lock all partition
tables. I see that we are already doing it for SELECT in the above
mentioned code path so maybe it is okay to do so for Inserts as well.

> >One more thing I have noticed is that for scan
> > relations (Select query), we do such checks much later based on
> > RelOptInfo (see set_rel_consider_parallel) which seems to have most of
> > the information required to perform parallel-safety checks but I guess
> > for ModifyTable (aka the Insert table) the equivalent doesn't seem
> > feasible but have you thought of doing at the later stage in planner?
> >
>
> Yes, and in fact I tried putting the checks in a later stage of the
> planner, and it's almost successful, except it then makes setting
> "parallelModeNeeded" very tricky indeed, because that is expected to
> be set based on whether the SQL is safe to run in parallel mode
> (parallelModeOK == true) and whether force_parallel_mode is not off.
> With parallel safety checks delayed to a later stage in the planner,
> it's then not known whether there are certain types of parallel-unsafe
> INSERTs (such as INSERT INTO ... VALUES ... ON CONFLICT DO UPDATE
> ...), because processing for those doesn't reach those later stages of
> the planner where parallelism is being considered.
>

I guess if that is the only case then you can have that check in the
earlier stage of planner (we should be able to do that as the
information is present in Query) and other checks in the later stage.
However, I guess that is not the only case, we need to determine
parallel-safety of index expressions, trigger functions if any, any
other CHECK expressions on each of attribute, etc.

> So then to avoid
> errors from when parallel-mode is forced on and such unsafe INSERTs
> are run, the only real choice is to only allow parallelModeNeeded to
> be true for SELECT only (not INSERT), and this is kind of cheating and
> also not picking up cases where parallel-safe INSERT is run but
> invokes parallel-mode-unsafe features.
> My conclusion, at least for the moment, is to leave the check where it is.
>

Okay, then can we integrate the functionality of
MaxParallelHazardForModify in max_parallel_hazard? Calling it
separately looks a bit awkward.

>
> > Few other comments on latest patch:
> > ===============================
> > 1.
> > MaxRelParallelHazardForModify()
> > {
> > ..
> > + if (commandType == CMD_INSERT || commandType == CMD_UPDATE)
> > + {
> > + /*
> > ..
> >
> > Why to check CMD_UPDATE here?
> >
>
> That was a bit of forward-thinking, for when/if UPDATE/DELETE is
> supported in parallel-mode.
> Column default expressions and check-constraints are only applicable
> to INSERT and UPDATE.
> Note however that currently this function can only ever be called with
> commandType == CMD_INSERT.
>

I feel then for other command types there should be an Assert rather
than try to handle something which is not yet implemented nor it is
clear what all is required for that. It confuses the reader, at least
it confused me. Probably we can write a comment but I don't think we
should have any check for Update at this stage of work.

> > 2.
> > +void PrepareParallelModeForModify(CmdType commandType, bool
> > isParallelModifyLeader)
> > +{
> > + Assert(!IsInParallelMode());
> > +
> > + if (isParallelModifyLeader)
> > + (void)GetCurrentCommandId(true);
> > +
> > + (void)GetCurrentFullTransactionId();
> >
> > Here, we should use GetCurrentTransactionId() similar to heap_insert
> > or other heap operations. I am not sure why you have used
> > GetCurrentFullTransactionId?
> >
>
> GetCurrentTransactionId() and GetCurrentFullTransactionId() actually
> have the same functionality, just a different return value (which is
> not being used here).
>

Sure but lets use what is required.

> But anyway I've changed it to use GetCurrentTransactionId().
>

But comments in ExecutePlan and PrepareParallelModeForModify still
refer to FullTransactionId.

>
>
> > 4. Have you checked the overhead of this on the planner for different
> > kinds of statements like inserts into tables having 100 or 500
> > partitions? Similarly, it is good to check the overhead of domain
> > related checks added in the patch.
> >
>
> Checking that now and will post results soon.
>

Thanks.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Oct 30, 2020 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > So then to avoid
> > errors from when parallel-mode is forced on and such unsafe INSERTs
> > are run, the only real choice is to only allow parallelModeNeeded to
> > be true for SELECT only (not INSERT), and this is kind of cheating and
> > also not picking up cases where parallel-safe INSERT is run but
> > invokes parallel-mode-unsafe features.
> > My conclusion, at least for the moment, is to leave the check where it is.
> >
>
> Okay, then can we integrate the functionality of
> MaxParallelHazardForModify in max_parallel_hazard? Calling it
> separately looks a bit awkward.
>

Looking into that.

> >
> > > Few other comments on latest patch:
> > > ===============================
> > > 1.
> > > MaxRelParallelHazardForModify()
> > > {
> > > ..
> > > + if (commandType == CMD_INSERT || commandType == CMD_UPDATE)
> > > + {
> > > + /*
> > > ..
> > >
> > > Why to check CMD_UPDATE here?
> > >
> >
> > That was a bit of forward-thinking, for when/if UPDATE/DELETE is
> > supported in parallel-mode.
> > Column default expressions and check-constraints are only applicable
> > to INSERT and UPDATE.
> > Note however that currently this function can only ever be called with
> > commandType == CMD_INSERT.
> >
>
> I feel then for other command types there should be an Assert rather
> than try to handle something which is not yet implemented nor it is
> clear what all is required for that. It confuses the reader, at least
> it confused me. Probably we can write a comment but I don't think we
> should have any check for Update at this stage of work.
>

OK, for now I'll restrict the checks to INSERT, but I'll add comments
to assist with potential future UPDATE support.

> > > 2.
> > > +void PrepareParallelModeForModify(CmdType commandType, bool
> > > isParallelModifyLeader)
> > > +{
> > > + Assert(!IsInParallelMode());
> > > +
> > > + if (isParallelModifyLeader)
> > > + (void)GetCurrentCommandId(true);
> > > +
> > > + (void)GetCurrentFullTransactionId();
> > >
> > > Here, we should use GetCurrentTransactionId() similar to heap_insert
> > > or other heap operations. I am not sure why you have used
> > > GetCurrentFullTransactionId?
> > >
> >
> > GetCurrentTransactionId() and GetCurrentFullTransactionId() actually
> > have the same functionality, just a different return value (which is
> > not being used here).
> >
>
> Sure but lets use what is required.
>
> > But anyway I've changed it to use GetCurrentTransactionId().
> >
>
> But comments in ExecutePlan and PrepareParallelModeForModify still
> refer to FullTransactionId.
>

I believe those comments are technically correct.
GetCurrentTransactionId() calls AssignTransactionId() to do all the
work - and the comment for that function says "Assigns a new permanent
FullTransactionId to the given TransactionState".

> >
> >
> > > 4. Have you checked the overhead of this on the planner for different
> > > kinds of statements like inserts into tables having 100 or 500
> > > partitions? Similarly, it is good to check the overhead of domain
> > > related checks added in the patch.
> > >
> >
> > Checking that now and will post results soon.
> >
>

I am seeing a fair bit of overhead in the planning for the INSERT
parallel-safety checks (mind you, compared to the overall performance
gain, it's not too bad).
Some representative timings for a parallel INSERT of a million rows
into 100, 250 and 500 partitions are shown below.

(1) Without patch

# Partitions    Planning Time (ms)    Execution Time (ms)
100             1.014                 4176.435
250             0.404                 3842.414
500             0.529                 4440.633


(2) With Parallel INSERT patch

# Partitions    Planning Time (ms)    Execution Time (ms)
100             11.420                2131.148
250             23.269                3472.259
500             36.531                3238.868

I'm looking into how this can be improved by better integration into
the current code, and addressing locking concerns that you've
previously mentioned.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:

>
> See attached patches.
>

Thanks for providing the patches.
I have reviewed v6-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch; please find my comments:
-> commandType is not used, we can remove it.
+ * Prepare for entering parallel mode by assigning a FullTransactionId, to be
+ * included in the transaction state that is serialized in the parallel DSM.
+ */
+void PrepareParallelModeForModify(CmdType commandType)
+{
+       Assert(!IsInParallelMode());
+
+       (void)GetCurrentTransactionId();
+}

-> As we support insertion of data from the workers, this comment "but as of now, only the leader backend writes into a completely new table.  In the future, we can extend it to allow workers to write into the table" must be updated accordingly:
+        * modify any data using a CTE, or if this is a cursor operation, or if
+        * GUCs are set to values that don't permit parallelism, or if
+        * parallel-unsafe functions are present in the query tree.
         *
-        * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
+        * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and CREATE
         * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
         * backend writes into a completely new table.  In the future, we can
         * extend it to allow workers to write into the table.  However, to allow

-> Also should we specify insert as "insert into select"
 
-> We could include a small writeup of the design may be in the commit message. It will be useful for review.

-> I felt the below two assignment statements can be in the else condition:
                glob->maxParallelHazard = max_parallel_hazard(parse);
                glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+               /*
+                * Additional parallel-mode safety checks are required in order to
+                * allow an underlying parallel query to be used for a
+                * table-modification command that is supported in parallel-mode.
+                */
+               if (glob->parallelModeOK &&
+                       IsModifySupportedInParallelMode(parse->commandType))
+               {
+                       glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
+                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+               }

something like:
/*
* Additional parallel-mode safety checks are required in order to
* allow an underlying parallel query to be used for a
* table-modification command that is supported in parallel-mode.
*/
if (glob->parallelModeOK &&
IsModifySupportedInParallelMode(parse->commandType))
glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
else
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);

-> Comments need slight adjustment, maybe you could run pgindent for the modified code.
+               /*
+                * Supported table-modification commands may require additional steps
+                * prior to entering parallel mode, such as assigning a FullTransactionId.
+                */

-> In the below, max_parallel_hazard_test will return true for PROPARALLEL_RESTRICTED also. Is the break intentional in that case? For RI_TRIGGER_FK with PROPARALLEL_RESTRICTED we continue.
+               if (max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
+                       break;
+
+               /*
+                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+                * the relation, and this would result in creation of new CommandIds
+                * on insert/update/delete and this isn't supported in a parallel
+                * worker (but is safe in the parallel leader).
+                */
+               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+               if (trigtype == RI_TRIGGER_FK)
+               {
+                       context->max_hazard = PROPARALLEL_RESTRICTED;
+                       /*
+                        * As we're looking for the max parallel hazard, we don't break
+                        * here; examine any further triggers ...
+                        */
+               }

-> Should we switch to non-parallel mode in this case, instead of throwing error?
+                       val = SysCacheGetAttr(CONSTROID, tup,
+                                               Anum_pg_constraint_conbin, &isnull);
+                       if (isnull)
+                               elog(ERROR, "null conbin for constraint %u", con->oid);
+                       conbin = TextDatumGetCString(val);

-> We could include a few tests for this in regression.

-> We might need some documentation update like in parallel-query.html/parallel-plans.html, etc

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Hi Vignesh,

Thanks for reviewing the patches.

On Tue, Nov 3, 2020 at 5:25 PM vignesh C <vignesh21@gmail.com> wrote:
>
> -> commandType is not used, we can remove it.
> + * Prepare for entering parallel mode by assigning a FullTransactionId, to be
> + * included in the transaction state that is serialized in the parallel DSM.
> + */
> +void PrepareParallelModeForModify(CmdType commandType)
> +{
> +       Assert(!IsInParallelMode());
> +
> +       (void)GetCurrentTransactionId();
> +}

Thanks, at least for INSERT, it's not needed, so I'll remove it.

>
> -> As we support insertion of data from the workers, this comments "but as of now, only the leader backend writes
> into a completely new table.  In the future, we can extend it to allow workers to write into the table" must be
> updated accordingly:
> +        * modify any data using a CTE, or if this is a cursor operation, or if
> +        * GUCs are set to values that don't permit parallelism, or if
> +        * parallel-unsafe functions are present in the query tree.
>          *
> -        * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
> +        * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and CREATE
>          * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
>          * backend writes into a completely new table.  In the future, we can
>          * extend it to allow workers to write into the table.  However, to allow
>
> -> Also should we specify insert as "insert into select"
>

I'll update it, appropriate to each patch.

> -> We could include a small writeup of the design may be in the commit message. It will be useful for review.
>

Will do so for the next patch version.

> -> I felt the below two assignment statements can be in the else condition:
>                 glob->maxParallelHazard = max_parallel_hazard(parse);
>                 glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +
> +               /*
> +                * Additional parallel-mode safety checks are required in order to
> +                * allow an underlying parallel query to be used for a
> +                * table-modification command that is supported in parallel-mode.
> +                */
> +               if (glob->parallelModeOK &&
> +                       IsModifySupportedInParallelMode(parse->commandType))
> +               {
> +                       glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
> +                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +               }
>
> something like:
> /*
> * Additional parallel-mode safety checks are required in order to
> * allow an underlying parallel query to be used for a
> * table-modification command that is supported in parallel-mode.
> */
> if (glob->parallelModeOK &&
> IsModifySupportedInParallelMode(parse->commandType))
> glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
> else
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
>

That won't work. As the comment is trying to point out, additional
parallel-safety checks (i.e. in addition to those done by
max_parallel_hazard()) are required to determine if INSERT can be
safely run in parallel-mode with an underlying parallel query.
Also, the max_parallel_hazard found from first calling
max_parallel_hazard() then needs to be fed into
MaxParallelHazardForModify(), in case it finds a worse parallel
hazard.
For example, max_parallel_hazard() may find something parallel
RESTRICTED, but then the additional parallel-safety checks done by
MaxParallelHazardForModify() find something parallel UNSAFE.

> -> Comments need slight adjustment, maybe you could run pgindent for the modified code.
> +               /*
> +                * Supported table-modification commands may require additional steps
> +                * prior to entering parallel mode, such as assigning a FullTransactionId.
> +                */
>

OK, will run pgindent.

> -> In the below, max_parallel_hazard_test will return true for PROPARALLEL_RESTRICTED also, Is break intentional in
> that case? As in case of RI_TRIGGER_FK for PROPARALLEL_RESTRICTED we continue.
>
> +               if (max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
> +                       break;
> +
> +               /*
> +                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> +                * the relation, and this would result in creation of new CommandIds
> +                * on insert/update/delete and this isn't supported in a parallel
> +                * worker (but is safe in the parallel leader).
> +                */
> +               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> +               if (trigtype == RI_TRIGGER_FK)
> +               {
> +                       context->max_hazard = PROPARALLEL_RESTRICTED;
> +                       /*
> +                        * As we're looking for the max parallel hazard, we don't break
> +                        * here; examine any further triggers ...
> +                        */
> +               }
>

max_parallel_hazard_test won't return true for PROPARALLEL_RESTRICTED.
max_parallel_hazard_test only returns true when
"context.max_interesting" is found, and that is set to
PROPARALLEL_UNSAFE in max_parallel_hazard_for_modify().

> -> Should we switch to non-parallel mode in this case, instead of throwing error?
> +                       val = SysCacheGetAttr(CONSTROID, tup,
> +                                               Anum_pg_constraint_conbin, &isnull);
> +                       if (isnull)
> +                               elog(ERROR, "null conbin for constraint %u", con->oid);
> +                       conbin = TextDatumGetCString(val);
>

I didn't invent that error check, it's found in several other places
in the Postgres code (that error should only ever occur if the
database has been corrupted or intentionally invalidated).
Having said that, I agree that perhaps it's best to switch to
non-parallel mode in this case, but this wouldn't stop it erroring out
when the plan is actually run.

> -> We could include a few tests for this in regression.
>

Looking at adding relevant test cases.

> -> We might need some documentation update like in parallel-query.html/parallel-plans.html, etc
>

Looking at doc updates.


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Nov 4, 2020 at 6:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Nov 3, 2020 at 5:25 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > -> commandType is not used, we can remove it.
> > + * Prepare for entering parallel mode by assigning a FullTransactionId, to be
> > + * included in the transaction state that is serialized in the parallel DSM.
> > + */
> > +void PrepareParallelModeForModify(CmdType commandType)
> > +{
> > +       Assert(!IsInParallelMode());
> > +
> > +       (void)GetCurrentTransactionId();
> > +}
>
> Thanks, at least for INSERT, it's not needed, so I'll remove it.
>

Or you might want to consider moving the check related to
IsModifySupportedInParallelMode() inside
PrepareParallelModeForModify(). That way the code might look a bit
cleaner.


-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Nov 4, 2020 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Or you might want to consider moving the check related to
> IsModifySupportedInParallelMode() inside
> PrepareParallelModeForModify(). That way the code might look a bit
> cleaner.
>

Posting an updated Parallel SELECT for "INSERT INTO ... SELECT ..."
patch which addresses previously-identified issues and suggestions,
and adds some tests and doc updates.
I won't post an updated Parallel INSERT patch just yet (which just
builds on the 1st patch), because there's at least a couple of issues
in this 1st patch which need to be discussed first.

Firstly, in order to perform parallel-safety checks in the case of
partitions, the patch currently recursively locks/unlocks
(AccessShareLock) each partition during such checks (as each partition
may itself be a partitioned table). Is there a better way of
performing the parallel-safety checks and reducing the locking
requirements?

Secondly, I found that when running "make check-world", the
"partition-concurrent-attach" test fails, because it is expecting a
partition constraint to be violated on insert, while an "alter table
attach partition ..." is concurrently being executed in another
transaction. Because of the partition locking done by the patch's
parallel-safety checking code, the insert blocks on the exclusive lock
held by the "alter table" in the other transaction until the
transaction ends, so the insert ends up successfully completing (and
thus fails the test) when the other transaction ends. To overcome this
test failure, the patch code was updated to instead perform a
conditional lock on the partition, and on failure (i.e. because of an
exclusive lock held somewhere else), just assume it's parallel-unsafe
because the parallel-safety can't be determined without blocking on
the lock. This is not ideal, but I'm not sure of what other approach
could be used and I am somewhat reluctant to change that test. If
anybody is familiar with the "partition-concurrent-attach" test, any
ideas or insights would be appreciated.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Nov 13, 2020 at 8:14 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Or you might want to consider moving the check related to
> > IsModifySupportedInParallelMode() inside
> > PrepareParallelModeForModify(). That way the code might look a bit
> > cleaner.
> >
>
> Posting an updated Parallel SELECT for "INSERT INTO ... SELECT ..."
> patch which addresses previously-identified issues and suggestions,
> and adds some tests and doc updates.
> I won't post an updated Parallel INSERT patch just yet (which just
> builds on the 1st patch), because there's at least a couple of issues
> in this 1st patch which need to be discussed first.
>
> Firstly, in order to perform parallel-safety checks in the case of
> partitions, the patch currently recursively locks/unlocks
> (AccessShareLock) each partition during such checks (as each partition
> may itself be a partitioned table). Is there a better way of
> performing the parallel-safety checks and reducing the locking
> requirements?
>
> Secondly, I found that when running "make check-world", the
> "partition-concurrent-attach" test fails, because it is expecting a
> partition constraint to be violated on insert, while an "alter table
> attach partition ..." is concurrently being executed in another
> transaction. Because of the partition locking done by the patch's
> parallel-safety checking code, the insert blocks on the exclusive lock
> held by the "alter table" in the other transaction until the
> transaction ends, so the insert ends up successfully completing (and
> thus fails the test) when the other transaction ends. To overcome this
> test failure, the patch code was updated to instead perform a
> conditional lock on the partition, and on failure (i.e. because of an
> exclusive lock held somewhere else), just assume it's parallel-unsafe
> because the parallel-safety can't be determined without blocking on
> the lock. This is not ideal, but I'm not sure of what other approach
> could be used and I am somewhat reluctant to change that test. If
> anybody is familiar with the "partition-concurrent-attach" test, any
> ideas or insights would be appreciated.
>

Posting an updated set of patches, with some additional testing and
documentation updates, and including the latest version of the
Parallel Insert patch.
Any feedback appreciated, especially on the two points mentioned in
the previous post.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Nov 20, 2020 at 7:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches, with some additional testing and
> documentation updates, and including the latest version of the
> Parallel Insert patch.
> Any feedback appreciated, especially on the two points mentioned in
> the previous post.
>

Posting an updated set of patches, since a minor bug was found in the
1st patch that was causing a postgresql-cfbot build failure.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:
On Mon, Dec 7, 2020 at 2:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Nov 20, 2020 at 7:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > Posting an updated set of patches, with some additional testing and
> > documentation updates, and including the latest version of the
> > Parallel Insert patch.
> > Any feedback appreciated, especially on the two points mentioned in
> > the previous post.
> >
>
> Posting an updated set of patches, since a minor bug was found in the
> 1st patch that was causing a postgresql-cfbot build failure.
>

Most of the code present in
v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
applicable for parallel copy patch also. The patch in this thread
handles the check for PROPARALLEL_UNSAFE, we could slightly make it
generic by handling like the comments below, that way this parallel
safety checks can be used based on the value set in
max_parallel_hazard_context. There is nothing wrong with the changes,
I'm providing these comments so that this patch can be generalized for
parallel checks and the same can also be used by parallel copy.
Few comments:
1)
+               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+               if (trigtype == RI_TRIGGER_FK)
+               {
+                       context->max_hazard = PROPARALLEL_RESTRICTED;
+
+                       /*
+                        * As we're looking for the max parallel hazard, we don't break
+                        * here; examine any further triggers ...
+                        */
+               }

Can we change this to something like:
trigtype = RI_FKey_trigger_type(trigger->tgfoid);
if (trigtype == RI_TRIGGER_FK)
{
        if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
                break;
}

The line below is not required, as it will be taken care of by
max_parallel_hazard_test:
context->max_hazard = PROPARALLEL_RESTRICTED;

2)
+               /* Recursively check each partition ... */
+               pdesc = RelationGetPartitionDesc(rel);
+               for (i = 0; i < pdesc->nparts; i++)
+               {
+                       if (rel_max_parallel_hazard_for_modify(pdesc->oids[i],
+                                                              command_type,
+                                                              context,
+                                                              AccessShareLock) == PROPARALLEL_UNSAFE)
+                       {
+                               table_close(rel, lockmode);
+                               return context->max_hazard;
+                       }
+               }


Can we change this to something like:
/* Recursively check each partition ... */
pdesc = RelationGetPartitionDesc(rel);
for (i = 0; i < pdesc->nparts; i++)
{
        char max_hazard = rel_max_parallel_hazard_for_modify(pdesc->oids[i],
                                                             command_type,
                                                             context,
                                                             AccessShareLock);

        if (max_parallel_hazard_test(max_hazard, context))
        {
                table_close(rel, lockmode);
                return context->max_hazard;
        }
}

3)
Similarly for the below:
+       /*
+        * If there are any index expressions, check that they are parallel-mode
+        * safe.
+        */
+       if (index_expr_max_parallel_hazard_for_modify(rel, context) ==
+               PROPARALLEL_UNSAFE)
+       {
+               table_close(rel, lockmode);
+               return context->max_hazard;
+       }
+
+       /*
+        * If any triggers exist, check that they are parallel safe.
+        */
+       if (rel->trigdesc != NULL &&
+               trigger_max_parallel_hazard_for_modify(rel->trigdesc, context) ==
+               PROPARALLEL_UNSAFE)
+       {
+               table_close(rel, lockmode);
+               return context->max_hazard;
+       }


4) Similar change required for the below:
+                       /*
+                        * If the column is of a DOMAIN type, determine whether that
+                        * domain has any CHECK expressions that are not parallel-mode
+                        * safe.
+                        */
+                       if (get_typtype(att->atttypid) == TYPTYPE_DOMAIN)
+                       {
+                               if (domain_max_parallel_hazard_for_modify(att->atttypid, context) ==
+                                       PROPARALLEL_UNSAFE)
+                               {
+                                       table_close(rel, lockmode);
+                                       return context->max_hazard;
+                               }
+                       }

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
>
> Most of the code present in
> v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> applicable for parallel copy patch also. The patch in this thread
> handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> generic by handling like the comments below, that way this parallel
> safety checks can be used based on the value set in
> max_parallel_hazard_context. There is nothing wrong with the changes,
> I'm providing these comments so that this patch can be generalized for
> parallel checks and the same can also be used by parallel copy.

Hi Vignesh,

You are absolutely right in pointing that out, the code was taking
short-cuts knowing that for Parallel Insert,
"max_parallel_hazard_context.max_interesting" had been set to
PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
re-used by other callers.

I've attached a new set of patches that includes your suggested improvements.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>
> I've attached a new set of patches that includes your suggested improvements.

I was going through v10-0001 patch where we are parallelizing only the
select part.

+ /*
+ * UPDATE is not currently supported in parallel-mode, so prohibit
+ * INSERT...ON CONFLICT...DO UPDATE...
+ */
+ if (parse->onConflict != NULL && parse->onConflict->action == ONCONFLICT_UPDATE)
+ return PROPARALLEL_UNSAFE;

I understand that we can now allow updates from the worker, but what
is the problem if we allow the parallel select even if there is an
update in the leader?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > Most of the code present in
> > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > applicable for parallel copy patch also. The patch in this thread
> > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > generic by handling like the comments below, that way this parallel
> > > safety checks can be used based on the value set in
> > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > I'm providing these comments so that this patch can be generalized for
> > > parallel checks and the same can also be used by parallel copy.
> >
> > Hi Vignesh,
> >
> > You are absolutely right in pointing that out, the code was taking
> > short-cuts knowing that for Parallel Insert,
> > "max_parallel_hazard_context.max_interesting" had been set to
> > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > re-used by other callers.
> >
> > I've attached a new set of patches that includes your suggested improvements.
>
> I was going through v10-0001 patch where we are parallelizing only the
> select part.
>
> + /*
> + * UPDATE is not currently supported in parallel-mode, so prohibit
> + * INSERT...ON CONFLICT...DO UPDATE...
> + */
> + if (parse->onConflict != NULL && parse->onConflict->action ==
> ONCONFLICT_UPDATE)
> + return PROPARALLEL_UNSAFE;
>
> I understand that we can now allow updates from the worker, but what
> is the problem if we allow the parallel select even if there is an
> update in the leader?
>

I think we can't allow update even in leader without having a
mechanism for a shared combocid table. Right now, we share the
ComboCids at the beginning of the parallel query and then never change
it during the parallel query but if we allow updates in the leader
backend which can generate a combocid then we need a mechanism to
propagate that change. Does this make sense?

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > Most of the code present in
> > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > > applicable for parallel copy patch also. The patch in this thread
> > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > > generic by handling like the comments below, that way this parallel
> > > > safety checks can be used based on the value set in
> > > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > > I'm providing these comments so that this patch can be generalized for
> > > > parallel checks and the same can also be used by parallel copy.
> > >
> > > Hi Vignesh,
> > >
> > > You are absolutely right in pointing that out, the code was taking
> > > short-cuts knowing that for Parallel Insert,
> > > "max_parallel_hazard_context.max_interesting" had been set to
> > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > > re-used by other callers.
> > >
> > > I've attached a new set of patches that includes your suggested improvements.
> >
> > I was going through v10-0001 patch where we are parallelizing only the
> > select part.
> >
> > + /*
> > + * UPDATE is not currently supported in parallel-mode, so prohibit
> > + * INSERT...ON CONFLICT...DO UPDATE...
> > + */
> > + if (parse->onConflict != NULL && parse->onConflict->action ==
> > ONCONFLICT_UPDATE)
> > + return PROPARALLEL_UNSAFE;
> >
> > I understand that we can now allow updates from the worker, but what
> > is the problem if we allow the parallel select even if there is an
> > update in the leader?
> >
>
> I think we can't allow update even in leader without having a
> mechanism for a shared combocid table. Right now, we share the
> ComboCids at the beginning of the parallel query and then never change
> it during the parallel query but if we allow updates in the leader
> backend which can generate a combocid then we need a mechanism to
> propagate that change. Does this make sense?
>

Okay, got it.  Basically, ONCONFLICT_UPDATE might run inside some
transaction block, and there is a possibility that the update may try to
update the same tuple that was previously inserted by the same transaction,
and in that case it will generate a combo cid.  Thanks for
clarifying.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>

In v10-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO,
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2049,10 +2049,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
  * inserts in general except for the cases where inserts generate a new
  * CommandId (eg. inserts into a table having a foreign key column).
  */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));

I think I have given a comment long back that here we can have an
Assert to check if it is a parallel-worker and relation has a
foreign-key and probably other conditions if possible. It is better to
protect such cases from happening due to any bugs. Is there a reason
you have not handled it?

[1] - https://www.postgresql.org/message-id/CAA4eK1KyftVDgovvRQmdV1b%3DnN0R-KqdWZqiu7jZ1GYQ7SO9OA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Dec 9, 2020 at 4:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > >
> > > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> > > > >
> > > > > Most of the code present in
> > > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > > > applicable for parallel copy patch also. The patch in this thread
> > > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > > > generic by handling like the comments below, that way this parallel
> > > > > safety checks can be used based on the value set in
> > > > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > > > I'm providing these comments so that this patch can be generalized for
> > > > > parallel checks and the same can also be used by parallel copy.
> > > >
> > > > Hi Vignesh,
> > > >
> > > > You are absolutely right in pointing that out, the code was taking
> > > > short-cuts knowing that for Parallel Insert,
> > > > "max_parallel_hazard_context.max_interesting" had been set to
> > > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > > > re-used by other callers.
> > > >
> > > > I've attached a new set of patches that includes your suggested improvements.
> > >
> > > I was going through v10-0001 patch where we are parallelizing only the
> > > select part.
> > >
> > > + /*
> > > + * UPDATE is not currently supported in parallel-mode, so prohibit
> > > + * INSERT...ON CONFLICT...DO UPDATE...
> > > + */
> > > + if (parse->onConflict != NULL && parse->onConflict->action ==
> > > ONCONFLICT_UPDATE)
> > > + return PROPARALLEL_UNSAFE;
> > >
> > > I understand that we can now allow updates from the worker, but what
> > > is the problem if we allow the parallel select even if there is an
> > > update in the leader?
> > >
> >
> > I think we can't allow update even in leader without having a
> > mechanism for a shared combocid table. Right now, we share the
> > ComboCids at the beginning of the parallel query and then never change
> > it during the parallel query but if we allow updates in the leader
> > backend which can generate a combocid then we need a mechanism to
> > propagate that change. Does this make sense?
> >
>
> Okay, got it.  Basically, ONCONFLICT_UPDATE might run inside some
> transaction block and there is a possibility that update may try to
> update the same tuple is previously inserted by the same transaction
> and in that case, it will generate the combo cid.  Thanks for
> clarifying.
>

We can probably add a comment in the patch so that it is clear why we
are not allowing this case.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:


On Wed, 9 Dec 2020 at 5:41 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 9, 2020 at 4:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 9, 2020 at 2:38 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > >
> > > > On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> > > > >
> > > > > Most of the code present in
> > > > > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > > > > applicable for parallel copy patch also. The patch in this thread
> > > > > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > > > > generic by handling like the comments below, that way this parallel
> > > > > safety checks can be used based on the value set in
> > > > > max_parallel_hazard_context. There is nothing wrong with the changes,
> > > > > I'm providing these comments so that this patch can be generalized for
> > > > > parallel checks and the same can also be used by parallel copy.
> > > >
> > > > Hi Vignesh,
> > > >
> > > > You are absolutely right in pointing that out, the code was taking
> > > > short-cuts knowing that for Parallel Insert,
> > > > "max_parallel_hazard_context.max_interesting" had been set to
> > > > PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> > > > re-used by other callers.
> > > >
> > > > I've attached a new set of patches that includes your suggested improvements.
> > >
> > > I was going through v10-0001 patch where we are parallelizing only the
> > > select part.
> > >
> > > + /*
> > > + * UPDATE is not currently supported in parallel-mode, so prohibit
> > > + * INSERT...ON CONFLICT...DO UPDATE...
> > > + */
> > > + if (parse->onConflict != NULL && parse->onConflict->action ==
> > > ONCONFLICT_UPDATE)
> > > + return PROPARALLEL_UNSAFE;
> > >
> > > I understand that we cannot allow updates from the worker, but what
> > > is the problem if we allow the parallel select even if there is an
> > > update in the leader?
> > >
> >
> > I think we can't allow update even in leader without having a
> > mechanism for a shared combocid table. Right now, we share the
> > ComboCids at the beginning of the parallel query and then never change
> > it during the parallel query but if we allow updates in the leader
> > backend which can generate a combocid then we need a mechanism to
> > propagate that change. Does this make sense?
> >
>
> Okay, got it.  Basically, ONCONFLICT_UPDATE might run inside some
> transaction block, and there is a possibility that the update may try to
> update the same tuple that was previously inserted by the same transaction,
> and in that case it will generate a combo CID.  Thanks for
> clarifying.
>

We can probably add a comment in the patch so that it is clear why we
are not allowing this case.

+1
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Greg Nancarrow <gregn4422@gmail.com>
> Firstly, in order to perform parallel-safety checks in the case of partitions, the
> patch currently recursively locks/unlocks
> (AccessShareLock) each partition during such checks (as each partition may
> itself be a partitioned table). Is there a better way of performing the
> parallel-safety checks and reducing the locking requirements?

First of all, as you demonstrated the planning time and execution time of parallel insert, I think the increased
planning time is negligible when parallel insert is intentionally used for loading a large amount of data.  However,
it's a problem if the overhead is imposed on OLTP transactions.  Does the overhead occur with the default values of
max_parallel_workers_per_gather = 2 and max_parallel_workers = 8?
 

To avoid this heavy checking during planning, I'm wondering if we can have an attribute in pg_class, something like
relhasindexes and relhastriggers.  The concerning point is that we have to maintain the accuracy of the value when
dropping ancillary objects around the table/partition.
 


> Secondly, I found that when running "make check-world", the
> "partition-concurrent-attach" test fails, because it is expecting a partition
> constraint to be violated on insert, while an "alter table attach partition ..." is
> concurrently being executed in another transaction. Because of the partition
> locking done by the patch's parallel-safety checking code, the insert blocks on
> the exclusive lock held by the "alter table" in the other transaction until the
> transaction ends, so the insert ends up successfully completing (and thus fails
> the test) when the other transaction ends. To overcome this test failure, the
> patch code was updated to instead perform a conditional lock on the partition,
> and on failure (i.e. because of an exclusive lock held somewhere else), just
> assume it's parallel-unsafe because the parallel-safety can't be determined
> without blocking on the lock. This is not ideal, but I'm not sure of what other
> approach could be used and I am somewhat reluctant to change that test. If
> anybody is familiar with the "partition-concurrent-attach" test, any ideas or
> insights would be appreciated.

That test looks sane.  I think what we should do is to disable parallel operation during that test.  It looks like some
of the other existing test cases disable parallel query by setting max_parallel_workers_per_gather to 0.  It's not strange
that some tests fail with some configuration.  autovacuum is disabled in many places of the regression test.
 

Rather, I don't think we should introduce the trick to use ConditionalLockAcquire().  Otherwise, the insert would be
executed in a serial fashion without the user knowing it -- "What?  The insert suddenly slowed down multiple times
today, and it didn't finish within the planned maintenance window.  What's wrong?"
 


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>
> I've attached a new set of patches that includes your suggested improvements.
>

Thanks for fixing and posting a new patch.
Few comments:
+                                       Node       *index_expr;
+
+                                       if (index_expr_item == NULL)
 /* shouldn't happen */
+                                               elog(ERROR, "too few
entries in indexprs list");
+
+                                       index_expr = (Node *)
lfirst(index_expr_item);

We can change this elog to below to maintain consistency:
if (index_expr_item == NULL)    /* shouldn't happen */
{
  context->max_hazard = PROPARALLEL_UNSAFE;
  return context->max_hazard;
}
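The generalized pattern being suggested here -- record the worst hazard seen so far, and stop walking once it is at least as severe as the caller's max_interesting level -- might be sketched standalone as (types and names invented for illustration; the real code works on expression trees via max_parallel_hazard_walker):

```c
#include <assert.h>
#include <stdbool.h>

/* ordered so that a larger value is a worse hazard */
typedef enum
{
    HAZARD_SAFE = 0,        /* cf. PROPARALLEL_SAFE */
    HAZARD_RESTRICTED = 1,  /* cf. PROPARALLEL_RESTRICTED */
    HAZARD_UNSAFE = 2       /* cf. PROPARALLEL_UNSAFE */
} Hazard;

typedef struct HazardContext
{
    Hazard max_hazard;      /* worst hazard found so far */
    Hazard max_interesting; /* caller stops caring beyond this level */
} HazardContext;

/* Record a newly found hazard; return true when the walk can stop early. */
static bool
max_hazard_found(HazardContext *ctx, Hazard h)
{
    if (h > ctx->max_hazard)
        ctx->max_hazard = h;
    return ctx->max_hazard >= ctx->max_interesting;
}

/* Walk a list of hazards, bailing out as soon as further checking
 * cannot change the answer. */
static Hazard
scan_hazards(HazardContext *ctx, const Hazard *items, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (max_hazard_found(ctx, items[i]))
            break;
    }
    return ctx->max_hazard;
}
```

With max_interesting set to HAZARD_UNSAFE (as for parallel insert), restricted items are recorded but do not stop the walk; a caller like parallel copy could pass a stricter level and reuse the same checks.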

static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
/*
* To allow parallel inserts, we need to ensure that they are safe to be
* performed in workers. We have the infrastructure to allow parallel
* inserts in general except for the cases where inserts generate a new
* CommandId (eg. inserts into a table having a foreign key column).
*/
I felt we could remove the above comments or maybe rephrase it.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Dec 10, 2020 at 3:50 PM vignesh C <vignesh21@gmail.com> wrote:
> Few comments:
> +                                       Node       *index_expr;
> +
> +                                       if (index_expr_item == NULL)
>  /* shouldn't happen */
> +                                               elog(ERROR, "too few
> entries in indexprs list");
> +
> +                                       index_expr = (Node *)
> lfirst(index_expr_item);
>
> We can change this elog to below to maintain consistency:
> if (index_expr_item == NULL)    /* shouldn't happen */
> {
>   context->max_hazard = PROPARALLEL_UNSAFE;
>   return context->max_hazard;
> }
>

Thanks. I think you pointed out something similar to this before, but
somehow I must have missed updating this as well (I was just following
existing error handling for this case in the Postgres code).
I'll update it as you suggest, in the next version of the patch I post.

> static HeapTuple
> heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
> CommandId cid, int options)
> {
> /*
> * To allow parallel inserts, we need to ensure that they are safe to be
> * performed in workers. We have the infrastructure to allow parallel
> * inserts in general except for the cases where inserts generate a new
> * CommandId (eg. inserts into a table having a foreign key column).
> */
> I felt we could remove the above comments or maybe rephrase it.
>

That is Amit's comment, and I'm reluctant to change it because it is
still applicable even after application of this patch.
Amit has previously suggested that I add an Assert here, to match the
comment (to replace the original Parallel-worker error-check that I
removed), so I am looking into that.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>
> I've attached a new set of patches that includes your suggested improvements.
>

 /*
+ * PrepareParallelMode
+ *
+ * Prepare for entering parallel mode, based on command-type.
+ */
+void
+PrepareParallelMode(CmdType commandType)
+{
+ Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
+
+ if (IsModifySupportedInParallelMode(commandType))
+ {
+ /*
+ * Prepare for entering parallel mode by assigning a
+ * FullTransactionId, to be included in the transaction state that is
+ * serialized in the parallel DSM.
+ */
+ (void) GetCurrentTransactionId();
+ }
+}

Why do we need to serialize the transaction ID for 0001?  I mean in
0001 we are just allowing the SELECT to be executed in parallel so why
we would need the transaction Id for that.  I agree that we would need
this once we try to perform the Insert also from the worker in the
remaining patches.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Dec 10, 2020 at 1:23 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Greg Nancarrow <gregn4422@gmail.com>
> > Firstly, in order to perform parallel-safety checks in the case of partitions, the
> > patch currently recursively locks/unlocks
> > (AccessShareLock) each partition during such checks (as each partition may
> > itself be a partitioned table). Is there a better way of performing the
> > parallel-safety checks and reducing the locking requirements?
>
> First of all, as you demonstrated the planning time and execution time of parallel insert, I think the increased
> planning time is negligible when parallel insert is intentionally used for loading a large amount of data.  However,
> it's a problem if the overhead is imposed on OLTP transactions.  Does the overhead occur with the default values of
> max_parallel_workers_per_gather = 2 and max_parallel_workers = 8?
>
> To avoid this heavy checking during planning, I'm wondering if we can have an attribute in pg_class, something like
> relhasindexes and relhastriggers.  The concerning point is that we have to maintain the accuracy of the value when
> dropping ancillary objects around the table/partition.
>

Having information in another table that needs to be accessed is
likely to also have locking requirements.
Here the issue is specifically with partitions, because otherwise if
the target relation is not a partitioned table, it will already be
locked prior to planning as part of the parse/re-write phase (and you
will notice that the initial lock-mode, used by the parallel-safety
checking code for opening the table, is NoLock).

>
> > Secondly, I found that when running "make check-world", the
> > "partition-concurrent-attach" test fails, because it is expecting a partition
> > constraint to be violated on insert, while an "alter table attach partition ..." is
> > concurrently being executed in another transaction. Because of the partition
> > locking done by the patch's parallel-safety checking code, the insert blocks on
> > the exclusive lock held by the "alter table" in the other transaction until the
> > transaction ends, so the insert ends up successfully completing (and thus fails
> > the test) when the other transaction ends. To overcome this test failure, the
> > patch code was updated to instead perform a conditional lock on the partition,
> > and on failure (i.e. because of an exclusive lock held somewhere else), just
> > assume it's parallel-unsafe because the parallel-safety can't be determined
> > without blocking on the lock. This is not ideal, but I'm not sure of what other
> > approach could be used and I am somewhat reluctant to change that test. If
> > anybody is familiar with the "partition-concurrent-attach" test, any ideas or
> > insights would be appreciated.
>
> That test looks sane.  I think what we should do is to disable parallel operation during that test.  It looks like
> some of the other existing test cases disable parallel query by setting max_parallel_workers_per_gather to 0.  It's
> not strange that some tests fail with some configuration.  autovacuum is disabled in many places of the regression test.
>
> Rather, I don't think we should introduce the trick to use ConditionalLockAcquire().  Otherwise, the insert would be
> executed in a serial fashion without the user knowing it -- "What?  The insert suddenly slowed down multiple times
> today, and it didn't finish within the planned maintenance window.  What's wrong?"
>
>

I think that's probably the best idea, to disable parallel operation
during that test.
However, that doesn't change the fact that, after removal of that
"trick", then the partition locking used in the parallel-safety
checking code will block, if a concurrent transaction has exclusively
locked that partition (as in this test case), and thus there is no
guarantee that a parallel insert will execute faster compared to
serial execution (as such locks tend to be held until the end of the
transaction).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Dec 10, 2020 at 5:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
>
>  /*
> + * PrepareParallelMode
> + *
> + * Prepare for entering parallel mode, based on command-type.
> + */
> +void
> +PrepareParallelMode(CmdType commandType)
> +{
> + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
> +
> + if (IsModifySupportedInParallelMode(commandType))
> + {
> + /*
> + * Prepare for entering parallel mode by assigning a
> + * FullTransactionId, to be included in the transaction state that is
> + * serialized in the parallel DSM.
> + */
> + (void) GetCurrentTransactionId();
> + }
> +}
>
> Why do we need to serialize the transaction ID for 0001?  I mean in
> 0001 we are just allowing the SELECT to be executed in parallel so why
> we would need the transaction Id for that.  I agree that we would need
> this once we try to perform the Insert also from the worker in the
> remaining patches.
>

There's a very good reason. It's related to parallel-mode checks for
Insert and how the XID is lazily acquired if required.
When allowing SELECT to be executed in parallel, we're in
parallel-mode and the leader interleaves Inserts with retrieval of the
tuple data from the workers.
You will notice that heap_insert() calls GetCurrentTransactionId() as the
very first thing it does. If the FullTransactionId is not valid,
AssignTransactionId() is then called, which executes this code:

    /*
     * Workers synchronize transaction state at the beginning of each parallel
     * operation, so we can't account for new XIDs at this point.
     */
    if (IsInParallelMode() || IsParallelWorker())
        elog(ERROR, "cannot assign XIDs during a parallel operation");

So that code (currently) has no way of knowing that an XID is being
(lazily) assigned at the beginning of, or somewhere in the middle of, a
parallel operation.
This is the reason why PrepareParallelMode() calls
GetCurrentTransactionId() up-front, to ensure a FullTransactionId is
assigned prior to parallel-mode (so there won't be an attempted
XID assignment).

If you remove the GetCurrentTransactionId() call from PrepareParallelMode()
and run "make installcheck-world" with "force_parallel_mode=regress"
in effect, many tests will fail with:
    ERROR:  cannot assign XIDs during a parallel operation
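The interaction described above might be sketched, in simplified standalone form (names invented for this sketch; the real logic is in AssignTransactionId() and the patch's PrepareParallelMode()): assignment is lazy, the lazy path refuses to run once parallel mode has begun, so the XID must be forced before entering parallel mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t FullTransactionId;
#define InvalidFullTransactionId 0

static FullTransactionId current_fxid = InvalidFullTransactionId;
static FullTransactionId next_fxid = 100;
static bool in_parallel_mode = false;

/* Lazily assign an XID, as AssignTransactionId() does.  Workers
 * synchronize transaction state once at startup, so a new XID cannot
 * be accounted for mid-operation -- hence the check. */
static FullTransactionId
get_current_transaction_id(void)
{
    if (current_fxid == InvalidFullTransactionId)
    {
        /* "cannot assign XIDs during a parallel operation" */
        assert(!in_parallel_mode);
        current_fxid = next_fxid++;
    }
    return current_fxid;
}

/* As in the patch's PrepareParallelMode(): force the assignment up
 * front so the later insert path finds the XID already valid. */
static void
prepare_parallel_mode(void)
{
    (void) get_current_transaction_id();
    in_parallel_mode = true;
}
```

Without the up-front call in prepare_parallel_mode(), the first insert performed after entering parallel mode would hit the assertion, which mirrors the regression failures mentioned above.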

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Dilip Kumar
Date:
On Thu, Dec 10, 2020 at 1:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 5:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >
> >  /*
> > + * PrepareParallelMode
> > + *
> > + * Prepare for entering parallel mode, based on command-type.
> > + */
> > +void
> > +PrepareParallelMode(CmdType commandType)
> > +{
> > + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
> > +
> > + if (IsModifySupportedInParallelMode(commandType))
> > + {
> > + /*
> > + * Prepare for entering parallel mode by assigning a
> > + * FullTransactionId, to be included in the transaction state that is
> > + * serialized in the parallel DSM.
> > + */
> > + (void) GetCurrentTransactionId();
> > + }
> > +}
> >
> > Why do we need to serialize the transaction ID for 0001?  I mean in
> > 0001 we are just allowing the SELECT to be executed in parallel so why
> > we would need the transaction Id for that.  I agree that we would need
> > this once we try to perform the Insert also from the worker in the
> > remaining patches.
> >
>
> There's a very good reason. It's related to parallel-mode checks for
> Insert and how the XID is lazily acquired if required.
> When allowing SELECT to be executed in parallel, we're in
> parallel-mode and the leader interleaves Inserts with retrieval of the
> tuple data from the workers.
> You will notice that heap_insert() calls GetCurrentTransactionId() as the
> very first thing it does. If the FullTransactionId is not valid,
> AssignTransactionId() is then called, which then executes this code:
>
>     /*
>      * Workers synchronize transaction state at the beginning of each parallel
>      * operation, so we can't account for new XIDs at this point.
>      */
>     if (IsInParallelMode() || IsParallelWorker())
>         elog(ERROR, "cannot assign XIDs during a parallel operation");
>
> So that code (currently) has no way of knowing that an XID is being
> (lazily) assigned at the beginning, or somewhere in the middle of, a
> parallel operation.
> This is the reason why PrepareParallelMode() is calling
> GetCurrentTransactionId() up-front, to ensure a FullTransactionId is assigned
> up-front, prior to parallel-mode (so then there won't be an attempted
> XID assignment).
>
> If you remove the GetCurrentTransactionId() call from PrepareParallelMode()
> and run "make installcheck-world" with "force_parallel_mode=regress"
> in effect, many tests will fail with:
>     ERROR:  cannot assign XIDs during a parallel operation

Yeah, got it. I missed the point that the goal is to avoid assigning
the transaction ID when we are in parallel mode.  But IIUC, at least
for the first patch we don't want to serialize the XID in the
transaction state, right, because the workers don't need the XID as
they are only doing the SELECT.  So maybe we can readjust the comment
slightly in the below code

> > + * Prepare for entering parallel mode by assigning a
> > + * FullTransactionId, to be included in the transaction state that is
> > + * serialized in the parallel DSM.
> > + */
> > + (void) GetCurrentTransactionId();

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Posting an updated set of patches to address recent feedback:

- Removed conditional-locking code used in parallel-safety checking
code (Tsunakawa-san feedback). It turns out that for the problem test
case, no parallel-safety checking should be occurring that locks
relations because those inserts are specifying VALUES, not an
underlying SELECT, so the parallel-safety checking code was updated to
bail out early if no underlying SELECT is specified for the INSERT. No
change to the test code was required.
- Added comment to better explain the reason for treating "INSERT ...
ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
- Added assertion to heap_prepare_insert() (Amit)
- Updated error handling for NULL index_expr_item case (Vignesh)


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi

> Posting an updated set of patches to address recent feedback:
> 
> - Removed conditional-locking code used in parallel-safety checking code
> (Tsunakawa-san feedback). It turns out that for the problem test case, no
> parallel-safety checking should be occurring that locks relations because
> those inserts are specifying VALUES, not an underlying SELECT, so the
> parallel-safety checking code was updated to bail out early if no underlying
> SELECT is specified for the INSERT. No change to the test code was required.
> - Added comment to better explain the reason for treating "INSERT ...
> ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
> - Added assertion to heap_prepare_insert() (Amit)
> - Updated error handling for NULL index_expr_item case (Vignesh)

+
+    index_oid_list = RelationGetIndexList(rel);
...

As mentioned in the comments of RelationGetIndexList:
* we return a copy of the list palloc'd in the caller's context.  The caller
* may list_free() the returned list after scanning it.

Shall we list_free(index_oid_list) at the end of the function?
Just to avoid a potential memory leak.

Best regards,
houzj




RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi

I have an issue about the parallel-safety checks.

If the target table is a foreign table or temporary table,
rel_max_parallel_hazard_for_modify will return PROPARALLEL_UNSAFE,
which not only disables parallel insert but also disables the underlying parallel SELECT.

+create temporary table temp_names (like names);
+explain (costs off) insert into temp_names select * from names;
+       QUERY PLAN        
+-------------------------
+ Insert on temp_names
+   ->  Seq Scan on names
+(2 rows)

I may be wrong, and if I missed something in previous mails, please give me some hints.
IMO, serial insertion with an underlying parallel SELECT can be considered for a foreign table or temporary table,
as the insertions only happen in the leader process.

Are there any special considerations for this case ?

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Dec 23, 2020 at 7:15 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> I have an issue about the parallel-safety checks.
>
> If the target table is a foreign table or temporary table,
> rel_max_parallel_hazard_for_modify will return PROPARALLEL_UNSAFE,
> which not only disables parallel insert but also disables the underlying parallel SELECT.
>
> +create temporary table temp_names (like names);
> +explain (costs off) insert into temp_names select * from names;
> +       QUERY PLAN
> +-------------------------
> + Insert on temp_names
> +   ->  Seq Scan on names
> +(2 rows)
>
> I may be wrong, and if I missed something in previous mails, please give me some hints.
> IMO, serial insertion with an underlying parallel SELECT can be considered for a foreign table or temporary table,
> as the insertions only happen in the leader process.
>

I don't think we support parallel scan for temporary tables. Can you
please try once both of these operations without Insert being
involved? If you are able to produce a parallel plan without Insert
then we can see why it is not supported with Insert.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi

> > I may be wrong, and if I missed something in previous mails, please give me some
> > hints.
> > IMO, serial insertion with an underlying parallel SELECT can be
> > considered for a foreign table or temporary table, as the insertions only
> > happen in the leader process.
> >
> 
> I don't think we support parallel scan for temporary tables. Can you please
> try once both of these operations without Insert being involved? If you
> are able to produce a parallel plan without Insert then we can see why it
> is not supported with Insert.

Sorry, maybe I did not express it clearly; I actually mean the case when the insert's target table (not the one in
the select part) is temporary.
And you are right that parallel select is not enabled when a temporary table is in the select part.

I tested both the case where the insert's target table is temporary and the case where it is not.
--insert into not temporary table---
postgres=# explain (costs off) insert into notemp select * from test where i < 600;
              QUERY PLAN               
---------------------------------------
 Gather
   Workers Planned: 4
   ->  Insert on notemp
         ->  Parallel Seq Scan on test
               Filter: (i < 600)

--insert into temporary table---
postgres=# explain (costs off) insert into temp select * from test where i < 600;
       QUERY PLAN         
---------------------------
 Insert on temp
   ->  Seq Scan on test
         Filter: (i < 600)

---without insert part---
postgres=# explain (costs off) select * from test where i < 600;
           QUERY PLAN            
---------------------------------
 Gather
   Workers Planned: 4
   ->  Parallel Seq Scan on test
         Filter: (i < 600)

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> > > I may be wrong, and if I missed something in previous mails, please give me some
> > > hints.
> > > IMO, serial insertion with an underlying parallel SELECT can be
> > > considered for a foreign table or temporary table, as the insertions only
> > > happen in the leader process.
> > >
> >
> > I don't think we support parallel scan for temporary tables. Can you please
> > try once both of these operations without Insert being involved? If you
> > are able to produce a parallel plan without Insert then we can see why it
> > is not supported with Insert.
>
> Sorry, maybe I did not express it clearly; I actually mean the case when the insert's target table (not the one in
> the select part) is temporary.
>
> And you are right that parallel select is not enabled when a temporary table is in the select part.
>

I think the Select can be parallel in this case, and we should support it.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Antonin Houska
Date:
Greg Nancarrow <gregn4422@gmail.com> wrote:

> Posting an updated set of patches to address recent feedback:

Following is my review.

v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
--------------------------------------------------------------

@@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
         /* all the cheap tests pass, so scan the query tree */
         glob->maxParallelHazard = max_parallel_hazard(parse);
         glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+        /*
+         * Additional parallel-mode safety checks are required in order to
+         * allow an underlying parallel query to be used for a
+         * table-modification command that is supported in parallel-mode.
+         */
+        if (glob->parallelModeOK &&
+            IsModifySupportedInParallelMode(parse->commandType))
+        {
+            glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
+            glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+        }

Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
functions should not be called by a parallel worker.


@@ -1015,6 +1016,27 @@ IsInParallelMode(void)
 }

 /*
+ *    PrepareParallelMode
+ *
+ * Prepare for entering parallel mode, based on command-type.
+ */
+void
+PrepareParallelMode(CmdType commandType)
+{
+    Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);

Isn't the test of force_parallel_mode just a hack to make regression tests
pass? When I removed this part and ran the regression tests with
force_parallel_mode=regress, the assertion fired when executing a subquery
because the executor was already in parallel mode due to the main query
execution. I think the function should be implemented such that it does not
mind repeated execution by the same backend.

As an alternative, have you considered allocation of the XID even in parallel
mode? I imagine that the first parallel worker that needs the XID for
insertions allocates it and shares it with the other workers as well as with
the leader process.

One problem of the current patch version is that the "INSERT INTO ... SELECT
..." statement consumes XID even if the SELECT eventually does not return any
row. However, if the same query is processed w/o parallelism, the XID is only
allocated if at least one tuple needs to be inserted.


v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
-------------------------------------------------------------------

@@ -1021,12 +1039,15 @@ IsInParallelMode(void)
  * Prepare for entering parallel mode, based on command-type.
  */
 void
-PrepareParallelMode(CmdType commandType)
+PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
 {
        Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);

        if (IsModifySupportedInParallelMode(commandType))
        {
+               if (isParallelModifyLeader)
+                       (void) GetCurrentCommandId(true);

I miss a comment here. I suppose this is to set currentCommandIdUsed, so that
the leader process gets a new commandId for the following statements in the
same transaction, and thus it can see the rows inserted by the parallel
workers?

If my understanding is correct, I think that the leader should not participate
in the execution of the Insert node, else it would use a higher commandId than
the workers. That would be weird, although probably not data corruption. I
wonder if parallel_leader_participation should be considered false for the
"Gather -> Insert -> ..." plans.


@@ -144,9 +148,19 @@ ExecGather(PlanState *pstate)
     GatherState *node = castNode(GatherState, pstate);
     TupleTableSlot *slot;
     ExprContext *econtext;
+    ModifyTableState *nodeModifyTableState = NULL;
+    bool        isParallelModifyLeader = false;
+    bool        isParallelModifyWithReturning = false;

The variable names are quite long. Since this code deals with the Gather node,
I think that both "Parallel" and "Leader" components can be removed.


@@ -418,14 +446,35 @@ ExecShutdownGatherWorkers(GatherState *node)
 void
 ExecShutdownGather(GatherState *node)
 {
-    ExecShutdownGatherWorkers(node);
+    bool        isParallelModifyLeader;

Likewise, the variable name.


@@ -208,7 +236,7 @@ ExecGather(PlanState *pstate)
         }

         /* Run plan locally if no workers or enabled and not single-copy. */
-        node->need_to_scan_locally = (node->nreaders == 0)
+        node->need_to_scan_locally = (node->nworkers_launched <= 0)
             || (!gather->single_copy && parallel_leader_participation);
         node->initialized = true;
     }

Is this change needed? The code just before this test indicates that nreaders
should be equal to nworkers_launched.


In grouping_planner(), this branch

+    /* Consider a supported parallel table-modification command */
+    if (IsModifySupportedInParallelMode(parse->commandType) &&
+        !inheritance_update &&
+        final_rel->consider_parallel &&
+        parse->rowMarks == NIL)
+    {

is very similar to creation of the non-parallel ModifyTablePaths - perhaps an
opportunity to move the common code into a new function.


@@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
         }
     }

+    if (parallel_modify_partial_path_count > 0)
+    {
+        final_rel->rows = current_rel->rows;    /* ??? why hasn't this been
+                                                 * set above somewhere ???? */
+        generate_useful_gather_paths(root, final_rel, false);
+    }
+
     extra.limit_needed = limit_needed(parse);
     extra.limit_tuples = limit_tuples;
     extra.count_est = count_est;

A boolean variable (e.g. have_parallel_modify_paths) would suffice, there's no
need to count the paths using parallel_modify_partial_path_count.


@@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
     PlannerGlobal *glob = root->glob;
     int            rtoffset = list_length(glob->finalrtable);
     ListCell   *lc;
+    Plan       *finalPlan;

     /*
      * Add all the query's RTEs to the flattened rangetable.  The live ones
@@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
     }

     /* Now fix the Plan tree */
-    return set_plan_refs(root, plan, rtoffset);
+    finalPlan = set_plan_refs(root, plan, rtoffset);
+    if (finalPlan != NULL && IsA(finalPlan, Gather))
+    {
+        Plan       *subplan = outerPlan(finalPlan);
+
+        if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
+        {
+            finalPlan->targetlist = copyObject(subplan->targetlist);
+        }
+    }
+    return finalPlan;
 }

I'm not sure if the problem of missing targetlist should be handled here (BTW,
NIL is the constant for an empty list, not NULL). Obviously this is a
consequence of the fact that the ModifyTable node has no regular targetlist.

Actually I don't quite understand why (in the current master branch) the
targetlist initialized in set_plan_refs()

    /*
     * Set up the visible plan targetlist as being the same as
     * the first RETURNING list. This is for the use of
     * EXPLAIN; the executor won't pay any attention to the
     * targetlist.  We postpone this step until here so that
     * we don't have to do set_returning_clause_references()
     * twice on identical targetlists.
     */
    splan->plan.targetlist = copyObject(linitial(newRL));

is not used. Instead, ExecInitModifyTable() picks the first returning list
again:

    /*
     * Initialize result tuple slot and assign its rowtype using the first
     * RETURNING list.  We assume the rest will look the same.
     */
    mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);

So if you set the targetlist in create_modifytable_plan() (according to
best_path->returningLists), or even in create_modifytable_path(), and ensure
that it gets propagated to the Gather node (generate_gather_paths currently
uses rel->reltarget), then you should no longer need to tweak
setrefs.c. Moreover, ExecInitModifyTable() would no longer need to set the
targetlist. However I don't guarantee that this is the best approach - some
planner expert should speak up.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
> v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
> --------------------------------------------------------------
>
> @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
>                 /* all the cheap tests pass, so scan the query tree */
>                 glob->maxParallelHazard = max_parallel_hazard(parse);
>                 glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +
> +               /*
> +                * Additional parallel-mode safety checks are required in order to
> +                * allow an underlying parallel query to be used for a
> +                * table-modification command that is supported in parallel-mode.
> +                */
> +               if (glob->parallelModeOK &&
> +                       IsModifySupportedInParallelMode(parse->commandType))
> +               {
> +                       glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
> +                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +               }
>
> Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
> functions should not be called by parallel worker.
>

What in the above change indicates that the parallel-restricted functions
will be allowed in parallel workers? This just sets parallelModeOK to allow
parallel plans for Selects if the Insert can be performed safely in a
leader backend.

>
> @@ -1015,6 +1016,27 @@ IsInParallelMode(void)
>  }
>
>  /*
> + *     PrepareParallelMode
> + *
> + * Prepare for entering parallel mode, based on command-type.
> + */
> +void
> +PrepareParallelMode(CmdType commandType)
> +{
> +       Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
>
> Isn't the test of force_parallel_mode just a hack to make regression tests
> pass? When I removed this part and ran the regression tests with
> force_parallel_mode=regress, the assertion fired when executing a subquery
> because the executor was already in parallel mode due to the main query
> execution.
>

I think this Assert is bogus. We are allowed to enter parallel-mode
even if we are already in parallel-mode, see EnterParallelMode. But we
shouldn't be allowed to allocate an xid in parallel-mode. So the
Assert(!IsInParallelMode()) should be moved inside the check if
(IsModifySupportedInParallelMode(commandType)) in this function. Can
you check if it still fails after such a modification?

> As an alternative, have you considered allocation of the XID even in parallel
> mode? I imagine that the first parallel worker that needs the XID for
> insertions allocates it and shares it with the other workers as well as with
> the leader process.
>

As far as this patch
(v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch) is
concerned, we never need workers to allocate xids because the Insert is
always performed by the leader backend. Even if we want to do what you are
suggesting, it would be tricky because currently we don't have the
infrastructure to pass such information among workers.

> One problem of the current patch version is that the "INSERT INTO ... SELECT
> ..." statement consumes XID even if the SELECT eventually does not return any
> row. However, if the same query is processed w/o parallelism, the XID is only
> allocated if at least one tuple needs to be inserted.
>

Yeah, that is true but I think this can happen w/o parallelism for
updates and deletes where by the time we try to modify the row, it got
modified by a concurrent session and the first session will needlessly
allocate XID.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Antonin Houska
Date:
Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
> >
> > Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > > Posting an updated set of patches to address recent feedback:
> >
> > Following is my review.
> >
> > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
> > --------------------------------------------------------------
> >
> > @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
> >                 /* all the cheap tests pass, so scan the query tree */
> >                 glob->maxParallelHazard = max_parallel_hazard(parse);
> >                 glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > +
> > +               /*
> > +                * Additional parallel-mode safety checks are required in order to
> > +                * allow an underlying parallel query to be used for a
> > +                * table-modification command that is supported in parallel-mode.
> > +                */
> > +               if (glob->parallelModeOK &&
> > +                       IsModifySupportedInParallelMode(parse->commandType))
> > +               {
> > +                       glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
> > +                       glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > +               }
> >
> > Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
> > functions should not be called by parallel worker.
> >
>
> What in the above change indicates that the parallel-restricted functions
> will be allowed in parallel workers? This just sets parallelModeOK to allow
> parallel plans for Selects if the Insert can be performed safely in a
> leader backend.

Well, this is just the initial setting, while the distinction between "Gather
-> Insert -> ..." and "Insert -> Gather -> ..." is made later. So I withdraw
my objection.

> >
> > @@ -1015,6 +1016,27 @@ IsInParallelMode(void)
> >  }
> >
> >  /*
> > + *     PrepareParallelMode
> > + *
> > + * Prepare for entering parallel mode, based on command-type.
> > + */
> > +void
> > +PrepareParallelMode(CmdType commandType)
> > +{
> > +       Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
> >
> > Isn't the test of force_parallel_mode just a hack to make regression tests
> > pass? When I removed this part and ran the regression tests with
> > force_parallel_mode=regress, the assertion fired when executing a subquery
> > because the executor was already in parallel mode due to the main query
> > execution.
> >
>
> I think this Assert is bogus. We are allowed to enter parallel-mode
> even if we are already in parallel-mode, see EnterParallelMode.

Right.

> But we shouldn't be allowed to allocate an xid in parallel-mode. So the
> Assert(!IsInParallelMode()) should be moved inside the check if
> (IsModifySupportedInParallelMode(commandType)) in this function. Can you
> check if it still fails after such a modification?

Yes, this works.


> > As an alternative, have you considered allocation of the XID even in parallel
> > mode? I imagine that the first parallel worker that needs the XID for
> > insertions allocates it and shares it with the other workers as well as with
> > the leader process.
> >
>
> As a matter of this patch
> (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we
> never need to allocate xids by workers because Insert is always
> performed by leader backend.

When writing this comment, I was actually thinking of
v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather
than v11-0001, see below. On the other hand, if we allowed XID allocation in
the parallel mode (as a separate patch), even the 0001 patch would get a bit
simpler.

> Even, if we want to do what you are suggesting it would be tricky because
> currently, we don't have such an infrastructure where we can pass
> information among workers.

How about barriers (storage/ipc/barrier.c)? What I imagine is that all the
workers "meet" at the barrier when they want to insert the first tuple. Then
one of them allocates the XID, makes it available to others (via shared
memory) and all the workers can continue.

> > One problem of the current patch version is that the "INSERT INTO ... SELECT
> > ..." statement consumes XID even if the SELECT eventually does not return any
> > row. However, if the same query is processed w/o parallelism, the XID is only
> > allocated if at least one tuple needs to be inserted.
> >

> Yeah, that is true but I think this can happen w/o parallelism for
> updates and deletes where by the time we try to modify the row, it got
> modified by a concurrent session and the first session will needlessly
> allocate XID.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
..
>
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
> -------------------------------------------------------------------
>
> @@ -1021,12 +1039,15 @@ IsInParallelMode(void)
>   * Prepare for entering parallel mode, based on command-type.
>   */
>  void
> -PrepareParallelMode(CmdType commandType)
> +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
>  {
>         Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
>
>         if (IsModifySupportedInParallelMode(commandType))
>         {
> +               if (isParallelModifyLeader)
> +                       (void) GetCurrentCommandId(true);
>
> I miss a comment here. I suppose this is to set currentCommandIdUsed, so that
> the leader process gets a new commandId for the following statements in the
> same transaction, and thus it can see the rows inserted by the parallel
> workers?
>

oh no, the leader backend and worker backends must use the same commandId.
I am also not sure if we need this, because for Insert statements we
already call GetCurrentCommandId(true) in standard_ExecutorStart. We
don't want the row-visibility behavior for parallel inserts to be any
different than for non-parallel ones.

> If my understanding is correct, I think that the leader should not participate
> in the execution of the Insert node, else it would use higher commandId than
> the workers. That would be weird, although probably not data corruption.
>

Yeah, exactly this is the reason both the leader and the workers must use
the same commandId.

> I
> wonder if parallel_leader_participation should be considered false for the
> "Gather -> Insert -> ..." plans.
>

If what I said above is correct then this is moot.

>
>
> @@ -208,7 +236,7 @@ ExecGather(PlanState *pstate)
>                 }
>
>                 /* Run plan locally if no workers or enabled and not single-copy. */
> -               node->need_to_scan_locally = (node->nreaders == 0)
> +               node->need_to_scan_locally = (node->nworkers_launched <= 0)
>                         || (!gather->single_copy && parallel_leader_participation);
>                 node->initialized = true;
>         }
>
> Is this change needed? The code just before this test indicates that nreaders
> should be equal to nworkers_launched.
>

This change is required because we don't need to set up readers for
parallel-insert unless there is a returning clause. See the below
check a few lines before this change:

- if (pcxt->nworkers_launched > 0)
+ if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader &&
!isParallelModifyWithReturning))
  {

I think this check could be simplified to
if (pcxt->nworkers_launched > 0 && isParallelModifyWithReturning)
or something like that.

>
> In grouping_planner(), this branch
>
> +       /* Consider a supported parallel table-modification command */
> +       if (IsModifySupportedInParallelMode(parse->commandType) &&
> +               !inheritance_update &&
> +               final_rel->consider_parallel &&
> +               parse->rowMarks == NIL)
> +       {
>
> is very similar to creation of the non-parallel ModifyTablePaths - perhaps an
> opportunity to move the common code into a new function.
>

+1.

>
> @@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
>                 }
>         }
>
> +       if (parallel_modify_partial_path_count > 0)
> +       {
> +               final_rel->rows = current_rel->rows;    /* ??? why hasn't this been
> +                                                         * set above somewhere ???? */
> +               generate_useful_gather_paths(root, final_rel, false);
> +       }
> +
>         extra.limit_needed = limit_needed(parse);
>         extra.limit_tuples = limit_tuples;
>         extra.count_est = count_est;
>
> A boolean variable (e.g. have_parallel_modify_paths) would suffice, there's no
> need to count the paths using parallel_modify_partial_path_count.
>

Sounds sensible.

>
> @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         PlannerGlobal *glob = root->glob;
>         int                     rtoffset = list_length(glob->finalrtable);
>         ListCell   *lc;
> +       Plan       *finalPlan;
>
>         /*
>          * Add all the query's RTEs to the flattened rangetable.  The live ones
> @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         }
>
>         /* Now fix the Plan tree */
> -       return set_plan_refs(root, plan, rtoffset);
> +       finalPlan = set_plan_refs(root, plan, rtoffset);
> +       if (finalPlan != NULL && IsA(finalPlan, Gather))
> +       {
> +               Plan       *subplan = outerPlan(finalPlan);
> +
> +               if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
> +               {
> +                       finalPlan->targetlist = copyObject(subplan->targetlist);
> +               }
> +       }
> +       return finalPlan;
>  }
>
> I'm not sure if the problem of missing targetlist should be handled here (BTW,
> NIL is the constant for an empty list, not NULL). Obviously this is a
> consequence of the fact that the ModifyTable node has no regular targetlist.
>

I think it is better to add comments along with this change. In this
form, this looks quite hacky to me.

> Actually I don't quite understand why (in the current master branch) the
> targetlist initialized in set_plan_refs()
>
>         /*
>          * Set up the visible plan targetlist as being the same as
>          * the first RETURNING list. This is for the use of
>          * EXPLAIN; the executor won't pay any attention to the
>          * targetlist.  We postpone this step until here so that
>          * we don't have to do set_returning_clause_references()
>          * twice on identical targetlists.
>          */
>         splan->plan.targetlist = copyObject(linitial(newRL));
>
> is not used. Instead, ExecInitModifyTable() picks the first returning list
> again:
>
>         /*
>          * Initialize result tuple slot and assign its rowtype using the first
>          * RETURNING list.  We assume the rest will look the same.
>          */
>         mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>
> So if you set the targetlist in create_modifytable_plan() (according to
> best_path->returningLists), or even in create_modifytable_path(), and ensure
> that it gets propagated to the Gather node (generate_gather_paths currently
> uses rel->reltarget), then you should no longer need to tweak
> setrefs.c.

This sounds worth investigating.

> Moreover, ExecInitModifyTable() would no longer need to set the
> targetlist.
>

I am not sure if we need to do anything about ExecInitModifyTable. If
we want to unify what setrefs.c does with ExecInitModifyTable, then we
can start a separate thread.

Thanks for all the reviews. I would like to emphasize what I said
earlier in this thread: it is better to first focus on parallelising
Selects for Insert (aka what
v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT does), as that
in itself is a step towards achieving parallel inserts. Doing both
0001 and 0003 at the same time could take much more time, as both touch
quite intricate parts of the code.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
vignesh C
Date:
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches to address recent feedback:
>
> - Removed conditional-locking code used in parallel-safety checking
> code (Tsunakawa-san feedback). It turns out that for the problem test
> case, no parallel-safety checking should be occurring that locks
> relations because those inserts are specifying VALUES, not an
> underlying SELECT, so the parallel-safety checking code was updated to
> bail out early if no underlying SELECT is specified for the INSERT. No
> change to the test code was required.
> - Added comment to better explain the reason for treating "INSERT ...
> ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
> - Added assertion to heap_prepare_insert() (Amit)
> - Updated error handling for NULL index_expr_item case (Vignesh)

Thanks Greg for fixing and posting a new patch.
A few comments:
+-- Test INSERT with underlying query.
+-- (should create plan with parallel SELECT, Gather parent node)
+--
+explain(costs off) insert into para_insert_p1 select unique1,
stringu1 from tenk1;
+               QUERY PLAN
+----------------------------------------
+ Insert on para_insert_p1
+   ->  Gather
+         Workers Planned: 4
+         ->  Parallel Seq Scan on tenk1
+(4 rows)
+
+insert into para_insert_p1 select unique1, stringu1 from tenk1;
+select count(*), sum(unique1) from para_insert_p1;
+ count |   sum
+-------+----------
+ 10000 | 49995000
+(1 row)
+

For one of the test you can validate that the same transaction has
been used by all the parallel workers, you could use something like
below to validate:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  para_insert_p1) as dt;

A few includes are not required:
 #include "executor/nodeGather.h"
+#include "executor/nodeModifyTable.h"
 #include "executor/nodeSubplan.h"
 #include "executor/tqueue.h"
 #include "miscadmin.h"
@@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
        GatherState *gatherstate;
        Plan       *outerNode;
        TupleDesc       tupDesc;
+       Index           varno;

These includes are not required in nodeModifyTable.c:

+#include "catalog/index.h"
+#include "catalog/indexing.h"

@@ -43,7 +49,11 @@
 #include "parser/parse_agg.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
+#include "parser/parsetree.h"
+#include "partitioning/partdesc.h"
+#include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"

The includes indexing.h, rewriteHandler.h & lmgr.h are not required in clauses.c

There are few typos:
+        table and populate it can use a parallel plan. Another
exeption is the command
+        <literal>INSERT INTO ... SELECT ...</literal> which can use a
parallel plan for
+        the underlying <literal>SELECT</literal> part of the query.

exeption should be exception

+ /*
+ * For the number of workers to use for a parallel
+ * INSERT/UPDATE/DELETE, it seems resonable to use the same number
+ * of workers as estimated for the underlying query.
+ */
+ parallelModifyWorkers = path->parallel_workers;
resonable should be reasonable

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Jan 8, 2021 at 12:21 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > As an alternative, have you considered allocation of the XID even in parallel
> > > mode? I imagine that the first parallel worker that needs the XID for
> > > insertions allocates it and shares it with the other workers as well as with
> > > the leader process.
> > >
> >
> > As a matter of this patch
> > (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we
> > never need to allocate xids by workers because Insert is always
> > performed by leader backend.
>
> When writing this comment, I was actually thinking of
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather
> than v11-0001, see below. On the other hand, if we allowed XID allocation in
> the parallel mode (as a separate patch), even the 0001 patch would get a bit
> simpler.
>
> > Even, if we want to do what you are suggesting it would be tricky because
> > currently, we don't have such an infrastructure where we can pass
> > information among workers.
>
> How about barriers (storage/ipc/barrier.c)? What I imagine is that all the
> workers "meet" at the barrier when they want to insert the first tuple. Then
> one of them allocates the XID, makes it available to others (via shared
> memory) and all the workers can continue.
>

Even if we want to do this, I am not sure we need barriers, because
there is no intrinsic need for all workers to stop before allocating the
XID. After allocation of the XID, we just need some way for the other
workers to use it, maybe something like the way all workers currently
synchronize to get the next block number to process in parallel sequence
scans. But the question is whether it is really worth it, because in
many cases the XID would already be allocated by the time the parallel
DML operation is started, and we do share it at the beginning. I think
if we really want to allow allocation of xids in parallel-mode then we
should also think about allowing it for subtransaction xids, not only
for main transactions, and that will open up some other use cases. I
feel it is better to tackle that problem separately.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
Hi Greg, Amit
Cc:hackers

> > > 4. Have you checked the overhead of this on the planner for 
> > > different kinds of statements like inserts into tables having 100 
> > > or 500 partitions? Similarly, it is good to check the overhead of 
> > > domain related checks added in the patch.
> > >
> >
> > Checking that now and will post results soon.
> >
>I am seeing a fair bit of overhead in the planning for the INSERT 
>parallel-safety checks (mind you, compared to the overall performance 
>gain, it's not too bad).

Considering the 'real-world' use cases and extreme cases I can imagine, I took 3 kinds of measurements on partitioned
tables for the latest patch (V11).

The measurements mainly focus on small rows, because that makes it easier to evaluate the check overhead relative to
the gain from the parallelism optimization.

From the current results, the overhead looks acceptable compared to the benefits, as Greg said.

Test 1: overhead of parallel insert into thousands of partitions, 1 row per partition.
%reg = (patched - master) / master
all time = Execution Time + Planning Time

partitions | patched Execution(ms) | patched Planning(ms) | master Execution(ms) | master Planning(ms) | %reg(Execution Time) | %reg(all time)
-----------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
1000       | 2281.291              | 25.983               | 9752.145             | 0.208               | -77%                 | -76%
2000       | 2303.229              | 50.427               | 9446.221             | 0.227               | -76%                 | -75%
4000       | 2303.207              | 100.946              | 9948.743             | 0.211               | -77%                 | -76%
6000       | 2411.877              | 152.212              | 9953.114             | 0.210               | -76%                 | -74%
10000      | 2467.235              | 260.751              | 10917.494            | 0.284               | -77%                 | -75%

Test 2: overhead of parallel insert into thousands of partitions, 100 rows per partition.

partitions | patched Execution(ms) | patched Planning(ms) | master Execution(ms) | master Planning(ms) | %reg(Execution Time) | %reg(all time)
-----------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
1000       | 2366.620              | 25.787               | 14052.748            | 0.238               | -83%                 | -83%
2000       | 2325.171              | 48.780               | 10099.203            | 0.211               | -77%                 | -76%
4000       | 2599.344              | 110.978              | 10678.065            | 0.216               | -76%                 | -75%
6000       | 2764.070              | 152.929              | 10880.948            | 0.238               | -75%                 | -73%
10000      | 3043.658              | 265.297              | 11607.202            | 0.207               | -74%                 | -71%

Test 3: overhead of parallel insert with varying numbers of partitions and inserted rows.

partitions | total table rows | patched Execution(ms) | patched Planning(ms) | master Execution(ms) | master Planning(ms) | %reg(Execution Time) | %reg(all time)
-----------|------------------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
100        | 10000000         | 11202.021             | 1.593                | 25668.560            | 0.212               | -56%                 | -56%
500        | 10000000         | 10290.368             | 12.722               | 25730.860            | 0.214               | -60%                 | -60%
1000       | 10000000         | 8946.627              | 24.851               | 26271.026            | 0.219               | -66%                 | -66%
2000       | 10000000         | 10615.643             | 50.111               | 25512.692            | 0.231               | -58%                 | -58%
4000       | 10000000         | 9056.334              | 105.644              | 26643.383            | 0.217               | -66%                 | -66%
-----------|------------------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
100        | 1000000          | 2757.670              | 1.493                | 11136.357            | 0.208               | -75%                 | -75%
500        | 1000000          | 2810.980              | 12.696               | 11483.715            | 0.206               | -76%                 | -75%
1000       | 1000000          | 2773.342              | 24.746               | 13441.169            | 0.214               | -79%                 | -79%
2000       | 1000000          | 2856.915              | 51.737               | 10996.621            | 0.226               | -74%                 | -74%
4000       | 1000000          | 2942.478              | 100.235              | 11422.699            | 0.220               | -74%                 | -73%
-----------|------------------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
100        | 100000           | 2257.134              | 1.682                | 9351.511             | 0.226               | -76%                 | -76%
500        | 100000           | 2197.570              | 12.452               | 9636.659             | 0.203               | -77%                 | -77%
1000       | 100000           | 2188.356              | 24.553               | 9647.583             | 0.202               | -77%                 | -77%
2000       | 100000           | 2293.287              | 49.167               | 9365.449             | 0.224               | -76%                 | -75%
4000       | 100000           | 2375.935              | 104.562              | 10125.190            | 0.219               | -77%                 | -76%
-----------|------------------|-----------------------|----------------------|----------------------|---------------------|----------------------|---------------
100        | 10000            | 2142.086              | 1.506                | 9500.491             | 0.206               | -77%                 | -77%
500        | 10000            | 2147.779              | 12.260               | 11746.766            | 0.202               | -82%                 | -82%
1000       | 10000            | 2153.286              | 23.900               | 9298.452             | 0.212               | -77%                 | -77%
2000       | 10000            | 2303.170              | 52.844               | 9772.971             | 0.217               | -76%                 | -76%

However, as Amit and other hackers have noted, if we want to leave the overhead as it is, we should cover as many
real-world use cases as possible, in case we find the overhead can't be ignored (in which case we should consider
reducing it).
So if anyone has real-world use cases (which I didn't include in my results above) that need to be tested with this
patch, please share the details with me; I'd like to do more tests on it.
 

Regards,
Tang




Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches to address recent feedback:
>

Here is an additional review of
v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
quite a few comments raised on the V11-0001* patch. I suggest first
post a revised version of V11-0001* patch addressing those comments
and then you can separately post a revised version of
v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.

Few comments:
==============
1.
+char
+max_parallel_hazard_for_modify(Query *parse, const char
*initial_max_parallel_hazard)
{
..
+ return (rel_max_parallel_hazard_for_modify(rte->relid,
parse->commandType, &context, NoLock));
..
}

rel_max_parallel_hazard_for_modify()
{
..
+ rel = table_open(relid, lockmode);
..
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
..
+ /*
+ * Column default expressions for columns in the target-list are
+ * already being checked for parallel-safety in the
+ * max_parallel_hazard() scan of the query tree in standard_planner().
+ */
+
+ tupdesc = RelationGetDescr(rel);
}

Here, it seems we are accessing the relation descriptor without any
lock on the table which is dangerous considering the table can be
modified in a parallel session. Is there a reason why you think this
is safe? Did you check anywhere else such a coding pattern?

2.
+ /*
+ * If there are any index expressions, check that they are parallel-mode
+ * safe.
+ */
+ max_hazard = index_expr_max_parallel_hazard_for_modify(rel, context);
+ if (max_parallel_hazard_test(max_hazard, context))
+ {
+ table_close(rel, lockmode);
+ return context->max_hazard;
+ }

Here and at all other similar places, the call to
max_parallel_hazard_test seems redundant because
index_expr_max_parallel_hazard_for_modify would have already done
that. Why can't we just return true/false from
index_expr_max_parallel_hazard_for_modify? The context would have been
already updated for max_hazard.

3.
@@ -342,6 +343,18 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  /* all the cheap tests pass, so scan the query tree */
  glob->maxParallelHazard = max_parallel_hazard(parse);
  glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+ /*
+ * Additional parallel-mode safety checks are required in order to
+ * allow an underlying parallel query to be used for a
+ * table-modification command that is supported in parallel-mode.
+ */
+ if (glob->parallelModeOK &&
+ IsModifySupportedInParallelMode(parse->commandType))
+ {
+ glob->maxParallelHazard = max_parallel_hazard_for_modify(parse,
&glob->maxParallelHazard);
+ glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+ }
  }

I don't like this way of checking parallel_hazard for modify. This not
only duplicates some code in max_parallel_hazard_for_modify from
max_parallel_hazard but also appears quite awkward. Can we move
max_parallel_hazard_for_modify inside max_parallel_hazard? Basically,
after calling max_parallel_hazard_walker, we can check for modify
statement and call the new function.

4.
domain_max_parallel_hazard_for_modify()
{
..
+ if (isnull)
+ {
+ /*
+ * This shouldn't ever happen, but if it does, log a WARNING
+ * and return UNSAFE, rather than erroring out.
+ */
+ elog(WARNING, "null conbin for constraint %u", con->oid);
+ context->max_hazard = PROPARALLEL_UNSAFE;
+ break;
+ }
..
}
index_expr_max_parallel_hazard_for_modify()
{
..
+ if (index_expr_item == NULL) /* shouldn't happen */
+ {
+ index_close(index_rel, lockmode);
+ context->max_hazard = PROPARALLEL_UNSAFE;
+ return context->max_hazard;
+ }
..
}

It is not clear why the above two are shouldn't happen cases and if so
why we want to treat them as unsafe. Ideally, there should be an
Assert if these can't happen but it is difficult to decide without
knowing why you have considered them unsafe?

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Jan 14, 2021 at 2:37 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Greg, Amit
> Cc:hackers
>
> > > > 4. Have you checked the overhead of this on the planner for
> > > > different kinds of statements like inserts into tables having 100
> > > > or 500 partitions? Similarly, it is good to check the overhead of
> > > > domain related checks added in the patch.
> > > >
> > >
> > > Checking that now and will post results soon.
> > >
> >I am seeing a fair bit of overhead in the planning for the INSERT
> >parallel-safety checks (mind you, compared to the overall performance
> >gain, it's not too bad).
>
> Considering the 'real-world' use cases and extreme cases I can imagine, I took 3 kinds of measurements on a
> partitioned table for the latest patch (V11).
>
> The measurements mainly focus on small row counts, because that makes it easier to evaluate the check overhead
> relative to the gains from the parallelism optimization.
>
> From the current results, the overhead looks acceptable compared to the benefits, as Greg said.
>

Can we test cases where we have few rows in the SELECT table (say 1000)
and there are 500 or 1000 partitions? In that case, we won't select
parallelism, but we still have to pay the price of checking the
parallel-safety of all partitions. Can you check this with tables of
100, 200, 500, and 1000 partitions?

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
Hello Tang-san,

From: Amit Kapila <amit.kapila16@gmail.com>
> Can we test cases where we have few rows in the SELECT table (say 1000)
> and there are 500 or 1000 partitions? In that case, we won't select
> parallelism, but we still have to pay the price of checking the
> parallel-safety of all partitions. Can you check this with tables of
> 100, 200, 500, and 1000 partitions?

I also wanted to see such an extreme(?) case.  The 1,000 rows is not the count per partition but the total count
across all partitions; e.g., when the # of partitions is 100, the # of rows per partition is 10.
 



Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 8, 2021 at 8:25 PM vignesh C <vignesh21@gmail.com> wrote:
>

> For one of the test you can validate that the same transaction has
> been used by all the parallel workers, you could use something like
> below to validate:
> SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  para_insert_p1) as dt;
>

> Few includes are not required:
>  #include "executor/nodeGather.h"
>
> This include is not required in nodeModifyTable.c
>

>
> The includes indexing.h, rewriteHandler.h & lmgr.h is not required in clauses.c
>

> exeption should be exception
>
> resonable should be reasonable
>

Thanks Vignesh,
I'll be sure to make those updates in the next version of the patches.
Regards,
Greg



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Here is an additional review of
> v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
> quite a few comments raised on the V11-0001* patch. I suggest first
> post a revised version of V11-0001* patch addressing those comments
> and then you can separately post a revised version of
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.
>

1)

>Here, it seems we are accessing the relation descriptor without any
>lock on the table which is dangerous considering the table can be
>modified in a parallel session. Is there a reason why you think this
>is safe? Did you check anywhere else such a coding pattern?

Yes, there's a very good reason and I certainly have checked for the
same coding pattern elsewhere, and not just randomly decided that
locking can be ignored.
The table has ALREADY been locked (by the caller) during the
parse/analyze phase.
(This is not the case for a partition, in which case the patch code
uses AccessShareLock, as you will see).
And BTW, with asserts enabled, an attempt to table_open() with NoLock
when you haven't already locked the table will fire an assert - see
following code in relation_open():

    /*
     * If we didn't get the lock ourselves, assert that caller holds one,
     * except in bootstrap mode where no locks are used.
     */
    Assert(lockmode != NoLock ||
           IsBootstrapProcessingMode() ||
           CheckRelationLockedByMe(r, AccessShareLock, true));

2)

>+ /*
>+ * If there are any index expressions, check that they are parallel-mode
>+ * safe.
>+ */
>+ max_hazard = index_expr_max_parallel_hazard_for_modify(rel, context);
>+ if (max_parallel_hazard_test(max_hazard, context))
>+ {
>+ table_close(rel, lockmode);
>+ return context->max_hazard;
>+ }

>Here and at all other similar places, the call to
>max_parallel_hazard_test seems redundant because
>index_expr_max_parallel_hazard_for_modify would have already done
>that. Why can't we just return true/false from
>index_expr_max_parallel_hazard_for_modify? The context would have been
>already updated for max_hazard.

Yes, you're right, it's redundant to call max_parallel_hazard_test(_)
again. The max_hazard can always be retrieved from the context if
needed (rather than explicitly returned), so I'll change this function
(and any similar cases) to just return true if the max_hazard of
interest has been reached.

3)

>I don't like this way of checking parallel_hazard for modify. This not
>only duplicates some code in max_parallel_hazard_for_modify from
>max_parallel_hazard but also appears quite awkward. Can we move
>max_parallel_hazard_for_modify inside max_parallel_hazard? Basically,
>after calling max_parallel_hazard_walker, we can check for modify
>statement and call the new function.

Agree, I'll move it, as you suggest.

4)

>domain_max_parallel_hazard_for_modify()
>{
>..
>+ if (isnull)
>+ {
>+ /*
>+ * This shouldn't ever happen, but if it does, log a WARNING
>+ * and return UNSAFE, rather than erroring out.
>+ */
>+ elog(WARNING, "null conbin for constraint %u", con->oid);
>+ context->max_hazard = PROPARALLEL_UNSAFE;
>+ break;
>+ }
>..
>}
>index_expr_max_parallel_hazard_for_modify()
>{
>..
>+ if (index_expr_item == NULL) /* shouldn't happen */
>+ {
>+ index_close(index_rel, lockmode);
>+ context->max_hazard = PROPARALLEL_UNSAFE;
>+ return context->max_hazard;
>+ }
>..
>}

>It is not clear why the above two are shouldn't happen cases and if so
>why we want to treat them as unsafe. Ideally, there should be an
>Assert if these can't happen but it is difficult to decide without
>knowing why you have considered them unsafe?

The checks being done here for "should never happen" cases are THE
SAME as other parts of the Postgres code.
For example, search Postgres code for "null conbin" and you'll find 6
other places in the Postgres code which actually ERROR out if conbin
(binary representation of the constraint) in a pg_constraint tuple is
found to be null.
The cases that you point out in the patch used to also error out in
the same way, but Vignesh suggested changing them to just return
parallel-unsafe instead of erroring-out, which I agree with. Such
cases could surely only ever happen if the DB had been corrupted, so
they're extremely rare IMHO, and they would most likely have caused an
ERROR elsewhere before ever reaching here...
I can add some Asserts to the current code, to better alert for such
cases, for when asserts are enabled, but otherwise I think that the
current code is OK (cleaning up other Postgres code is beyond the
scope of the task here).


Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
> From: Amit Kapila <amit.kapila16@gmail.com>
> > Can we test cases where we have few rows in the SELECT table (say
> > 1000) and there are 500 or 1000 partitions? In that case, we won't
> > select parallelism, but we still have to pay the price of checking
> > the parallel-safety of all partitions. Can you check this with
> > tables of 100, 200, 500, and 1000 partitions?
> 
> I also wanted to see such an extreme(?) case.  The 1,000 rows is not
> the count per partition but the total count across all partitions;
> e.g., when the # of partitions is 100, the # of rows per partition is 10.

The results below are for a serial plan, where the SELECT table's total row count is 1,000. The Execution Time +
Planning Time is still less than unpatched.

(Does this patch make some optimizations for serial INSERT? I'm a little confused here, because the patched execution
time is less than unpatched, but I didn't find any information about this in the commit messages. If I missed
something, please kindly let me know.)
 
    
           |                patched                |                 master                 |                 %reg                  |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
partitions |Execution Time(ms)| Planning Time(ms)  | Execution Time(ms) | Planning Time(ms) | %reg(Execution Time)| %reg(all time)  |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
100        | 5.294            |  1.581             |  6.951             |  0.037            |   -24%              | -2%             |
200        | 9.666            |  3.068             |  13.681            |  0.043            |   -29%              | -7%             |
500        | 22.742           |  12.061            |  35.928            |  0.125            |   -37%              | -3%             |
1000       | 46.386           |  24.872            |  75.523            |  0.142            |   -39%              | -6%             |
 

I did another test which makes the check overhead obvious. This case doesn't fit the intended purpose of
partitioning, but I put it here as an extreme case too.

The SELECT table's total row count is 1,000, and the # of partitions is 2000. So only the first 1000 partitions have
1 row per partition; the last 1000 partitions have no data inserted.
 

           |                patched                |                 master                 |                 %reg                  |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
partitions |Execution Time(ms)| Planning Time(ms)  | Execution Time(ms) | Planning Time(ms) | %reg(Execution Time)| %reg(all time)  |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
2000       | 45.758           |  51.697            |  80.272            |  0.136            |   -43%              | 21%             |
 

Regards,
Tang



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> (Does this patch make some optimizations for serial INSERT? I'm a little confused
> here, because the patched execution time is less than unpatched, but I didn't
> find any information about this in the commit messages. If I missed something,
> please kindly let me know.)

I haven't thought of anything yet.  Could you show us the output of EXPLAIN (ANALYZE, BUFFERS, VERBOSE) of the 1,000
partitions case for the patched and unpatched versions?  If it doesn't show any difference, the output of perf may be
necessary next.

(BTW, were all the 1,000 rows stored in the target table?)


> I did another test which makes the check overhead obvious. This case doesn't fit
> the intended purpose of partitioning, but I put it here as an extreme case too.
> The SELECT table's total row count is 1,000, and the # of partitions is 2000. So
> only the first 1000 partitions have 1 row per partition; the last 1000 partitions
> have no data inserted.

Thank you, that's a good test.


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 1:02 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:
>
> > From: Amit Kapila <amit.kapila16@gmail.com>
> > > Can we test cases where we have few rows in the SELECT table (say
> > > 1000) and there are 500 or 1000 partitions? In that case, we won't
> > > select parallelism, but we still have to pay the price of checking
> > > the parallel-safety of all partitions. Can you check this with
> > > tables of 100, 200, 500, and 1000 partitions?
> >
> > I also wanted to see such an extreme(?) case.  The 1,000 rows is not
> > the count per partition but the total count across all partitions;
> > e.g., when the # of partitions is 100, the # of rows per partition is 10.
>
> The results below are for a serial plan, where the SELECT table's total row count is 1,000. The Execution Time +
> Planning Time is still less than unpatched.
>
> (Does this patch make some optimizations for serial INSERT? I'm a little confused here, because the patched
> execution time is less than unpatched, but I didn't find any information about this in the commit messages. If I
> missed something, please kindly let me know.)
 
>

I don't think the patch should have any impact on the serial case. I
think you can try to repeat each test 3 times both with and without a
patch and take the median of the three.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
Hi Tsunakawa-san

> From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> > (Does this patch make some optimizations for serial INSERT? I'm a little
> > confused here, because the patched execution time is less than
> > unpatched, but I didn't find any information about this in the commit messages.
> > If I missed something, please kindly let me know.)
> 
> I haven't thought of anything yet.  Could you show us the output of 
> EXPLAIN (ANALYZE, BUFFERS, VERBOSE) of 1,000 partitions case for the 
> patched and unpatched?  If it doesn't show any difference, the output 
> of perf may be necessary next.

Execute EXPLAIN on Patched:
postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=44.139..44.140 rows=0 loops=1)
   Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
   ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.007..0.201 rows=1000 loops=1)
         Output: test_data1.a, test_data1.b
         Buffers: shared hit=5
 Planning:
   Buffers: shared hit=27011
 Planning Time: 24.526 ms
 Execution Time: 44.981 ms

Execute EXPLAIN on non-Patched:
postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=72.656..72.657 rows=0 loops=1)
   Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
   ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.010..0.175 rows=1000 loops=1)
         Output: test_data1.a, test_data1.b
         Buffers: shared hit=5
 Planning:
   Buffers: shared hit=72
 Planning Time: 0.135 ms
 Execution Time: 79.058 ms

> (BTW, were all the 1,000 rows stored in the target table?)

Yes, I checked all rows stored in target table.
postgres=# select count(*) from test_part;  count
-------
  1000

Regards,
Tang



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 10:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Here is an additional review of
> > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
> > quite a few comments raised on the V11-0001* patch. I suggest first
> > post a revised version of V11-0001* patch addressing those comments
> > and then you can separately post a revised version of
> > v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.
> >
>
> 1)
>
> >Here, it seems we are accessing the relation descriptor without any
> >lock on the table which is dangerous considering the table can be
> >modified in a parallel session. Is there a reason why you think this
> >is safe? Did you check anywhere else such a coding pattern?
>
> Yes, there's a very good reason and I certainly have checked for the
> same coding pattern elsewhere, and not just randomly decided that
> locking can be ignored.
> The table has ALREADY been locked (by the caller) during the
> parse/analyze phase.
>

Fair enough. I suggest adding a comment saying the same so that the
reader doesn't get confused about the same.

> (This is not the case for a partition, in which case the patch code
> uses AccessShareLock, as you will see).

Okay, but I see you release the lock on partition rel immediately
after checking parallel-safety. What if a user added some
parallel-unsafe constraint (via Alter Table) after that check?

>
> 4)
>
> >domain_max_parallel_hazard_for_modify()
> >{
> >..
> >+ if (isnull)
> >+ {
> >+ /*
> >+ * This shouldn't ever happen, but if it does, log a WARNING
> >+ * and return UNSAFE, rather than erroring out.
> >+ */
> >+ elog(WARNING, "null conbin for constraint %u", con->oid);
> >+ context->max_hazard = PROPARALLEL_UNSAFE;
> >+ break;
> >+ }
> >..
> >}
> >index_expr_max_parallel_hazard_for_modify()
> >{
> >..
> >+ if (index_expr_item == NULL) /* shouldn't happen */
> >+ {
> >+ index_close(index_rel, lockmode);
> >+ context->max_hazard = PROPARALLEL_UNSAFE;
> >+ return context->max_hazard;
> >+ }
> >..
> >}
>
> >It is not clear why the above two are shouldn't happen cases and if so
> >why we want to treat them as unsafe. Ideally, there should be an
> >Assert if these can't happen but it is difficult to decide without
> >knowing why you have considered them unsafe?
>
> The checks being done here for "should never happen" cases are THE
> SAME as other parts of the Postgres code.
> For example, search Postgres code for "null conbin" and you'll find 6
> other places in the Postgres code which actually ERROR out if conbin
> (binary representation of the constraint) in a pg_constraint tuple is
> found to be null.
> The cases that you point out in the patch used to also error out in
> the same way, but Vignesh suggested changing them to just return
> parallel-unsafe instead of erroring-out, which I agree with.
>

You have not raised a WARNING for the second case. But in the first
place what is the reasoning for making this different from other parts
of code? If we don't have a solid reason then I suggest keeping these
checks and errors the same as in other parts of the code.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
Hi Amit

> I don't think the patch should have any impact on the serial case. I 
> think you can try to repeat each test 3 times both with and without a 
> patch and take the median of the three.

Actually, I repeated the tests about 10 times, and the execution time is always less than unpatched.

Regards,
Tang




Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Tsunakawa-san
>
> > From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> > > (Does this patch make some optimizations for serial INSERT? I'm a little
> > > confused here, because the patched execution time is less than
> > > unpatched, but I didn't find any information about this in the commit
> > > messages. If I missed something, please kindly let me know.)
> >
> > I haven't thought of anything yet.  Could you show us the output of
> > EXPLAIN (ANALYZE, BUFFERS, VERBOSE) of 1,000 partitions case for the
> > patched and unpatched?  If it doesn't show any difference, the output
> > of perf may be necessary next.
>
> Execute EXPLAIN on Patched:
> postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=44.139..44.140 rows=0 loops=1)
>    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.007..0.201 rows=1000
loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5
>  Planning:
>    Buffers: shared hit=27011
>  Planning Time: 24.526 ms
>  Execution Time: 44.981 ms
>
> Execute EXPLAIN on non-Patched:
> postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=72.656..72.657 rows=0 loops=1)
>    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.010..0.175 rows=1000
loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5
>  Planning:
>    Buffers: shared hit=72
>  Planning Time: 0.135 ms
>  Execution Time: 79.058 ms
>

So, the results indicate that after the patch we touch more buffers
during planning, which I think is because of accessing the partition
information, and during execution the patch touches fewer buffers for
the same reason. But why would this reduce the time with the patch? I
think this needs some investigation.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 2:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 18, 2021 at 10:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Here is an additional review of
> > > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
> > > quite a few comments raised on the V11-0001* patch. I suggest first
> > > post a revised version of V11-0001* patch addressing those comments
> > > and then you can separately post a revised version of
> > > v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.
> > >
> >
> > 1)
> >
> > >Here, it seems we are accessing the relation descriptor without any
> > >lock on the table which is dangerous considering the table can be
> > >modified in a parallel session. Is there a reason why you think this
> > >is safe? Did you check anywhere else such a coding pattern?
> >
> > Yes, there's a very good reason and I certainly have checked for the
> > same coding pattern elsewhere, and not just randomly decided that
> > locking can be ignored.
> > The table has ALREADY been locked (by the caller) during the
> > parse/analyze phase.
> >
>
> Fair enough. I suggest adding a comment saying the same so that the
> reader doesn't get confused about the same.
>
> > (This is not the case for a partition, in which case the patch code
> > uses AccessShareLock, as you will see).
>
> Okay, but I see you release the lock on partition rel immediately
> after checking parallel-safety. What if a user added some
> parallel-unsafe constraint (via Alter Table) after that check?
>
> >
> > 4)
> >
> > >domain_max_parallel_hazard_for_modify()
> > >{
> > >..
> > >+ if (isnull)
> > >+ {
> > >+ /*
> > >+ * This shouldn't ever happen, but if it does, log a WARNING
> > >+ * and return UNSAFE, rather than erroring out.
> > >+ */
> > >+ elog(WARNING, "null conbin for constraint %u", con->oid);
> > >+ context->max_hazard = PROPARALLEL_UNSAFE;
> > >+ break;
> > >+ }
> > >..
> > >}
> > >index_expr_max_parallel_hazard_for_modify()
> > >{
> > >..
> > >+ if (index_expr_item == NULL) /* shouldn't happen */
> > >+ {
> > >+ index_close(index_rel, lockmode);
> > >+ context->max_hazard = PROPARALLEL_UNSAFE;
> > >+ return context->max_hazard;
> > >+ }
> > >..
> > >}
> >
> > >It is not clear why the above two are shouldn't happen cases and if so
> > >why we want to treat them as unsafe. Ideally, there should be an
> > >Assert if these can't happen but it is difficult to decide without
> > >knowing why you have considered them unsafe?
> >
> > The checks being done here for "should never happen" cases are THE
> > SAME as other parts of the Postgres code.
> > For example, search Postgres code for "null conbin" and you'll find 6
> > other places in the Postgres code which actually ERROR out if conbin
> > (binary representation of the constraint) in a pg_constraint tuple is
> > found to be null.
> > The cases that you point out in the patch used to also error out in
> > the same way, but Vignesh suggested changing them to just return
> > parallel-unsafe instead of erroring-out, which I agree with.
> >
>
> You have not raised a WARNING for the second case. But in the first
> place what is the reasoning for making this different from other parts
> of code?
>

Thinking about this again, I see a reason why one would want to do what
you have done currently in the patch. It helps us avoid raising such
errors when they are really not required, say, when the condition
occurred while checking parallel-safety for a particular partition that
in reality we will never insert into; there are probably other similar
cases. I guess we should give a WARNING consistently in all such cases.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Jan 18, 2021 at 8:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > 1)
> >
> > >Here, it seems we are accessing the relation descriptor without any
> > >lock on the table which is dangerous considering the table can be
> > >modified in a parallel session. Is there a reason why you think this
> > >is safe? Did you check anywhere else such a coding pattern?
> >
> > Yes, there's a very good reason and I certainly have checked for the
> > same coding pattern elsewhere, and not just randomly decided that
> > locking can be ignored.
> > The table has ALREADY been locked (by the caller) during the
> > parse/analyze phase.
> >
>
> Fair enough. I suggest adding a comment saying the same so that the
> reader doesn't get confused about the same.
>

OK, I'll add a comment.

> > (This is not the case for a partition, in which case the patch code
> > uses AccessShareLock, as you will see).
>
> Okay, but I see you release the lock on partition rel immediately
> after checking parallel-safety. What if a user added some
> parallel-unsafe constraint (via Alter Table) after that check?
>
> >

I'm not sure. But there would be a similar concern for current
Parallel SELECT functionality, right?
My recollection is that ALTER TABLE obtains an exclusive lock on the
table which it retains until the end of the transaction, so that will
result in blocking at certain points, during parallel-safety checks and
execution, but there may still be a window.

> > 4)
> >
> > >domain_max_parallel_hazard_for_modify()
> > >{
> > >..
> > >+ if (isnull)
> > >+ {
> > >+ /*
> > >+ * This shouldn't ever happen, but if it does, log a WARNING
> > >+ * and return UNSAFE, rather than erroring out.
> > >+ */
> > >+ elog(WARNING, "null conbin for constraint %u", con->oid);
> > >+ context->max_hazard = PROPARALLEL_UNSAFE;
> > >+ break;
> > >+ }
> > >..
> > >}
> > >index_expr_max_parallel_hazard_for_modify()
> > >{
> > >..
> > >+ if (index_expr_item == NULL) /* shouldn't happen */
> > >+ {
> > >+ index_close(index_rel, lockmode);
> > >+ context->max_hazard = PROPARALLEL_UNSAFE;
> > >+ return context->max_hazard;
> > >+ }
> > >..
> > >}
> >
> > >It is not clear why the above two are shouldn't happen cases and if so
> > >why we want to treat them as unsafe. Ideally, there should be an
> > >Assert if these can't happen but it is difficult to decide without
> > >knowing why you have considered them unsafe?
> >
> > The checks being done here for "should never happen" cases are THE
> > SAME as other parts of the Postgres code.
> > For example, search Postgres code for "null conbin" and you'll find 6
> > other places in the Postgres code which actually ERROR out if conbin
> > (binary representation of the constraint) in a pg_constraint tuple is
> > found to be null.
> > The cases that you point out in the patch used to also error out in
> > the same way, but Vignesh suggested changing them to just return
> > parallel-unsafe instead of erroring-out, which I agree with.
> >
>
> You have not raised a WARNING for the second case.

The same checks in current Postgres code also don't raise a WARNING
for that case, so I'm just being consistent with existing Postgres
code (which itself isn't consistent for those two cases).

>But in the first
> place what is the reasoning for making this different from other parts
> of code? If we don't have a solid reason then I suggest keeping these
> checks and errors the same as in other parts of the code.
>

The checks are the same as done in existing Postgres source - but
instead of failing with an ERROR (i.e. aborting the current transaction), in the
middle of parallel-safety-checking, it has been changed to regard the
operation as parallel-unsafe, so that it will try to execute in
non-parallel mode, where it will most likely fail too when those
corrupted attributes are accessed - but it will fail in the way that
it currently does in Postgres, should that very rare condition ever
happen. This was suggested by Vignesh, and I agree with him. So in
effect, it's just allowing it to use the existing error paths in the
code, rather than introducing new ERROR points.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Greg Nancarrow <gregn4422@gmail.com>
> On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >Here, it seems we are accessing the relation descriptor without any
> >lock on the table which is dangerous considering the table can be
> >modified in a parallel session. Is there a reason why you think this
> >is safe? Did you check anywhere else such a coding pattern?
> 
> Yes, there's a very good reason and I certainly have checked for the
> same coding pattern elsewhere, and not just randomly decided that
> locking can be ignored.
> The table has ALREADY been locked (by the caller) during the
> parse/analyze phase.

Isn't there any case where planning is done but parse analysis is not done immediately before?  e.g.

* Alteration of objects invalidates cached query plans, and the next execution of the plan rebuilds it.  (But it seems that parse analysis is done in this case in plancache.c.)

* Execute a prepared statement with a different parameter value, which builds a new custom plan or a generic plan.

Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a trigger with a parallel-unsafe trigger action?


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 3:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Jan 18, 2021 at 8:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > >It is not clear why the above two are shouldn't happen cases and if so
> > > >why we want to treat them as unsafe. Ideally, there should be an
> > > >Assert if these can't happen but it is difficult to decide without
> > > >knowing why you have considered them unsafe?
> > >
> > > The checks being done here for "should never happen" cases are THE
> > > SAME as other parts of the Postgres code.
> > > For example, search Postgres code for "null conbin" and you'll find 6
> > > other places in the Postgres code which actually ERROR out if conbin
> > > (binary representation of the constraint) in a pg_constraint tuple is
> > > found to be null.
> > > The cases that you point out in the patch used to also error out in
> > > the same way, but Vignesh suggested changing them to just return
> > > parallel-unsafe instead of erroring-out, which I agree with.
> > >
> >
> > You have not raised a WARNING for the second case.
>
> The same checks in current Postgres code also don't raise a WARNING
> for that case, so I'm just being consistent with existing Postgres
> code (which itself isn't consistent for those two cases).
>

Search for the string "too few entries in indexprs list" and you will
find a lot of places in code raising ERROR for the same condition.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Jan 19, 2021 at 2:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > >
> > > You have not raised a WARNING for the second case.
> >
> > The same checks in current Postgres code also don't raise a WARNING
> > for that case, so I'm just being consistent with existing Postgres
> > code (which itself isn't consistent for those two cases).
> >
>
> Search for the string "too few entries in indexprs list" and you will
> find a lot of places in code raising ERROR for the same condition.
>

Yes, but raising an ERROR stops processing (not just logs an error
message). Raising a WARNING logs a warning message and continues
processing. It's a big difference.
So, for the added parallel-safety-checking code, it was suggested by
Vignesh (and agreed by me) that, for these rare and highly unlikely
conditions, it would be best not to just copy the error-handling code
verbatim from other cases in the Postgres code (as I had originally
done) and just stop processing dead with an error, but to instead
return PARALLEL_UNSAFE, so that processing continues as it would for
current non-parallel processing, which would most likely error-out
anyway along the current error-handling checks and paths when those
bad attributes/fields are referenced.
I will add some Asserts() and don't mind adding a WARNING message for
the 2nd case.
If you really feel strongly about this, I can just restore the
original code, which will stop dead with an ERROR in the middle of
parallel-safety checking should one of these rare conditions ever
occur.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Jan 19, 2021 at 1:37 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> > The table has ALREADY been locked (by the caller) during the
> > parse/analyze phase.
>
> Isn't there any case where planning is done but parse analysis is not done immediately before?  e.g.
>
> * Alteration of objects invalidates cached query plans, and the next execution of the plan rebuilds it.  (But it seems that parse analysis is done in this case in plancache.c.)
>
> * Execute a prepared statement with a different parameter value, which builds a new custom plan or a generic plan.
>

I don't know, but since NoLock is used in other parts of the planner,
I'd expect those to fail if such cases existed.

> Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a trigger with a parallel-unsafe trigger action?
>

Needs to be tested, but I'd expect the cached plan to get invalidated
in this case - surely the same potential issue exists in Postgres for
the current Parallel SELECT functionality - for example, for a column
with a default value that is an expression (which could be altered
from being parallel-safe to parallel-unsafe).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Jan 18, 2021 at 3:50 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Jan 18, 2021 at 8:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > 1)
> > >
> > > >Here, it seems we are accessing the relation descriptor without any
> > > >lock on the table which is dangerous considering the table can be
> > > >modified in a parallel session. Is there a reason why you think this
> > > >is safe? Did you check anywhere else such a coding pattern?
> > >
> > > Yes, there's a very good reason and I certainly have checked for the
> > > same coding pattern elsewhere, and not just randomly decided that
> > > locking can be ignored.
> > > The table has ALREADY been locked (by the caller) during the
> > > parse/analyze phase.
> > >
> >
> > Fair enough. I suggest adding a comment saying the same so that the
> > reader doesn't get confused about the same.
> >
>
> OK, I'll add a comment.
>
> > > (This is not the case for a partition, in which case the patch code
> > > uses AccessShareLock, as you will see).
> >
> > Okay, but I see you release the lock on partition rel immediately
> > after checking parallel-safety. What if a user added some
> > parallel-unsafe constraint (via Alter Table) after that check?
> >
> > >
>
> I'm not sure. But there would be a similar concern for current
> Parallel SELECT functionality, right?
>

I don't think so because, for Selects, we take locks on the required
partitions and don't release them immediately. We do parallel safety
checks after acquiring those locks. From the code perspective, we lock
individual partitions via
expand_inherited_rtentry->expand_partitioned_rtentry and then check
parallel-safety at a later point via
set_append_rel_size->set_rel_consider_parallel. Also, I am not sure if
there is anything we check for Selects at each partition relation
level that can be changed by a concurrent session. Do you have a
different understanding?

Similarly, we do retain locks on indexes, see get_relation_info, which
we are not doing in the patch.

> My recollection is that ALTER TABLE obtains an exclusive lock on the
> table which it retains until the end of the transaction, so that will
> result in blocking at certain points, during parallel-checks and
> execution, but there may still be a window.
>

Once the Select has acquired locks in the above code path, I don't
think Alter for a particular partition would be able to proceed unless
those locks are non-conflicting.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Tue, Jan 19, 2021 at 9:19 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Jan 19, 2021 at 2:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > >
> > > > You have not raised a WARNING for the second case.
> > >
> > > The same checks in current Postgres code also don't raise a WARNING
> > > for that case, so I'm just being consistent with existing Postgres
> > > code (which itself isn't consistent for those two cases).
> > >
> >
> > Search for the string "too few entries in indexprs list" and you will
> > find a lot of places in code raising ERROR for the same condition.
> >
>
> Yes, but raising an ERROR stops processing (not just logs an error
> message). Raising a WARNING logs a warning message and continues
> processing. It's a big difference.
> So, for the added parallel-safety-checking code, it was suggested by
> Vignesh (and agreed by me) that, for these rare and highly unlikely
> conditions, it would be best not to just copy the error-handling code
> verbatim from other cases in the Postgres code (as I had originally
> done) and just stop processing dead with an error, but to instead
> return PARALLEL_UNSAFE, so that processing continues as it would for
> current non-parallel processing, which would most likely error-out
> anyway along the current error-handling checks and paths when those
> bad attributes/fields are referenced.
> I will add some Asserts() and don't mind adding a WARNING message for
> the 2nd case.
> If you really feel strongly about this, I can just restore the
> original code, which will stop dead with an ERROR in the middle of
> parallel-safety checking should one of these rare conditions ever
> occur.
>

I am expecting that either we raise a WARNING and return
parallel_unsafe for all such checks (shouldn't reach cases) in the
patch or simply raise an ERROR as we do in other parts of the patch. I
personally prefer the latter alternative but I am fine with the former
one as well.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Tue, Jan 19, 2021 at 10:32 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Jan 19, 2021 at 1:37 PM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
> >
> > > The table has ALREADY been locked (by the caller) during the
> > > parse/analyze phase.
> >
> > Isn't there any case where planning is done but parse analysis is not done immediately before?  e.g.
> >
> > * Alteration of objects invalidates cached query plans, and the next execution of the plan rebuilds it.  (But it seems that parse analysis is done in this case in plancache.c.)
> >
> > * Execute a prepared statement with a different parameter value, which builds a new custom plan or a generic plan.
> >
>
> I don't know, but since NoLock is used in other parts of the planner,
> I'd expect those to fail if such cases existed.
>

I think I know how, for both the above cases, we ensure that the locks
are acquired before we reach the planner. It seems we will call
GetCachedPlan during these scenarios which will acquire the required
locks in RevalidateCachedQuery both when the cached plan is invalid
and when it is valid. So, we should be fine even when the
custom/generic plan needs to be formed due to a different parameter.

> > Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a trigger with a parallel-unsafe trigger action?
> >
>
> Needs to be tested,
>

Yeah, it would be good to test it but I think even if the plan is
invalidated, we will reacquire the required locks as mentioned above.

Tsunakawa-San, does this address your concerns around locking the
target relation in the required cases? It would be good to test but I
don't see any problems in the scenarios you mentioned.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Amit Kapila <amit.kapila16@gmail.com>
> Tsunakawa-San, does this address your concerns around locking the
> target relation in the required cases? It would be good to test but I
> don't see any problems in the scenarios you mentioned.

Thank you, understood.  RevalidateCachedQuery() does parse analysis; that's the trick.


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 8, 2021 at 8:25 PM vignesh C <vignesh21@gmail.com> wrote:
>
>
> Few includes are not required:
>  #include "executor/nodeGather.h"
> +#include "executor/nodeModifyTable.h"
>  #include "executor/nodeSubplan.h"
>  #include "executor/tqueue.h"
>  #include "miscadmin.h"
> @@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
>         GatherState *gatherstate;
>         Plan       *outerNode;
>         TupleDesc       tupDesc;
> +       Index           varno;
>
> This include is not required in nodeModifyTable.c
>

I think you meant nodeGather.c (not nodeModifyTable.c).
However, the include file (executor/nodeModifyTable.h) is actually
required here, otherwise there are build warnings.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> Execute EXPLAIN on Patched:
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=44.139..44.140 rows=0 loops=1)
>    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.007..0.201 rows=1000 loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5

> Execute EXPLAIN on non-Patched:
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=72.656..72.657 rows=0 loops=1)
>    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.010..0.175 rows=1000 loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5

I don't know if this is related to this issue, but I felt "shared hit=5" for Seq Scan is strange.  This test case reads 1,000 rows from 1,000 partitions, one row per partition, so I thought the shared hit should be 1,000 in Seq Scan.  I wonder if the 1,000 is included in the Insert node?


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 8, 2021 at 8:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> - if (pcxt->nworkers_launched > 0)
> + if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader &&
> !isParallelModifyWithReturning))
>   {
>
> I think this check could be simplified to if (pcxt->nworkers_launched
> > 0 && isParallelModifyWithReturning) or something like that.
>

Not quite. The existing check is correct, because it needs to account
for existing Parallel SELECT functionality (not just new Parallel
INSERT).
But I will re-write the test as an equivalent expression, so it's
hopefully more readable (taking into account Antonin's suggested
variable-name changes):

    if (pcxt->nworkers_launched > 0 && (!isModify || isModifyWithReturning))


> >
> > @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> >         PlannerGlobal *glob = root->glob;
> >         int                     rtoffset = list_length(glob->finalrtable);
> >         ListCell   *lc;
> > +       Plan       *finalPlan;
> >
> >         /*
> >          * Add all the query's RTEs to the flattened rangetable.  The live ones
> > @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> >         }
> >
> >         /* Now fix the Plan tree */
> > -       return set_plan_refs(root, plan, rtoffset);
> > +       finalPlan = set_plan_refs(root, plan, rtoffset);
> > +       if (finalPlan != NULL && IsA(finalPlan, Gather))
> > +       {
> > +               Plan       *subplan = outerPlan(finalPlan);
> > +
> > +               if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
> > +               {
> > +                       finalPlan->targetlist = copyObject(subplan->targetlist);
> > +               }
> > +       }
> > +       return finalPlan;
> >  }
> >
> > I'm not sure if the problem of missing targetlist should be handled here (BTW,
> > NIL is the constant for an empty list, not NULL). Obviously this is a
> > consequence of the fact that the ModifyTable node has no regular targetlist.
> >
>
> I think it is better to add comments along with this change. In this
> form, this looks quite hacky to me.
>

The targetlist on the ModifyTable node has been setup correctly, but
it hasn't been propagated to the Gather node.
Of course, I have previously tried to elegantly fix this, but struck
various problems, using different approaches.
Perhaps this code could just be moved into set_plan_refs().
For the moment, I'll just keep the current code, but I'll add a FIXME
comment for this.
I'll investigate Antonin's suggestions as a lower-priority side-task.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Jan 20, 2021 at 3:27 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Jan 8, 2021 at 8:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I'm not sure if the problem of missing targetlist should be handled here (BTW,
> > > NIL is the constant for an empty list, not NULL). Obviously this is a
> > > consequence of the fact that the ModifyTable node has no regular targetlist.
> > >
> >
> > I think it is better to add comments along with this change. In this
> > form, this looks quite hacky to me.
> >
>
> The targetlist on the ModifyTable node has been setup correctly, but
> it hasn't been propagated to the Gather node.
> Of course, I have previously tried to elegantly fix this, but struck
> various problems, using different approaches.
> Perhaps this code could just be moved into set_plan_refs().
> For the moment, I'll just keep the current code, but I'll add a FIXME
> comment for this.
> I'll investigate Antonin's suggestions as a lower-priority side-task.
>

+1, that sounds like a good approach because anyway this has to be
dealt for the second patch and at this point, we should make the first
patch ready.


-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Thanks for the feedback.
Posting an updated set of patches. Changes are based on feedback, as
detailed below:

There are a couple of potential issues currently being looked at:
- locking issues in additional parallel-safety checks?
- apparent uneven work distribution across the parallel workers, for
large insert data


[Antonin]
- Fixed bad Assert in PrepareParallelMode()
- Added missing comment to explain use of GetCurrentCommandId() in
PrepareParallelMode()
- Some variable name shortening in a few places
- Created common function for creation of non-parallel and parallel
ModifyTable paths
- Path count variable changed to bool
- Added FIXME comment to dubious code for creating Gather target-list
from ModifyTable subplan
- Fixed check on returningLists to use NIL instead of NULL

[Amit]
- Moved additional parallel-safety checks (for modify case) into
max_parallel_hazard()
- Removed redundant calls to max_parallel_hazard_test()
- Added Asserts to "should never happen" null-attribute cases (and
added WARNING log missing from one case)
- Added comment for use of NoLock in max_parallel_hazard_for_modify()

[Vignesh]
- Fixed a couple of typos
- Added a couple of test cases for testing that the same transaction
is used by all parallel workers


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> 
> Thanks for the feedback.
> Posting an updated set of patches. Changes are based on feedback, as detailed
> below:
Hi

It seems there are some previous comments[1][2] not addressed in the current patch.
Just making sure they're not missed.

[1]
https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local

[2]
https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Zhihong Yu
Date:
Hi,
For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :

is found from the additional parallel-safety checks, or from the existing
parallel-safety checks for SELECT that it currently performs.

existing and 'it currently performs' are redundant. You can omit 'that it currently performs'.

Minor. For index_expr_max_parallel_hazard_for_modify(), 

+               if (keycol == 0)
+               {
+                   /* Found an index expression */

You can check if keycol != 0, continue with the loop. This would save some indent.

+                   if (index_expr_item == NULL)    /* shouldn't happen */
+                   {
+                       elog(WARNING, "too few entries in indexprs list");

I think the warning should be an error since there is assertion ahead of the if statement.

+           Assert(!isnull);
+           if (isnull)
+           {
+               /*
+                * This shouldn't ever happen, but if it does, log a WARNING
+                * and return UNSAFE, rather than erroring out.
+                */
+               elog(WARNING, "null conbin for constraint %u", con->oid);

The above should be error as well.

Cheers


Re: Parallel INSERT (INTO ... SELECT ...)

From
Zhihong Yu
Date:
Hi,
For v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch:

+       bool        isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate), ModifyTableState);

Please wrap long line.

+   uint64     *processed_count_space;

If I read the code correctly, it seems it can be dropped (use pei->processed_count directly).

Cheers




Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Jan 21, 2021 at 1:28 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
>
> is found from the additional parallel-safety checks, or from the existing
> parallel-safety checks for SELECT that it currently performs.
>
> existing and 'it currently performs' are redundant. You can omit 'that it currently performs'.
>

OK, but this is very minor.

> Minor. For index_expr_max_parallel_hazard_for_modify(),
>
> +               if (keycol == 0)
> +               {
> +                   /* Found an index expression */
>
> You can check if keycol != 0, continue with the loop. This would save some indent.

Yes I know, but I don't really see any issue with indent (I'm using
4-space tabs).

>
> +                   if (index_expr_item == NULL)    /* shouldn't happen */
> +                   {
> +                       elog(WARNING, "too few entries in indexprs list");
>
> I think the warning should be an error since there is assertion ahead of the if statement.
>

Assertions are normally for DEBUG builds, so the Assert would have no
effect in a production (release) build.
Besides, as I have explained in my reply to previous feedback, the
logging of a WARNING (rather than ERROR) is intentional, because I
want processing to continue (not stop) if ever this very rare
condition was to occur - so that the issue can be dealt with by the
current non-parallel processing (rather than stop dead in the middle
of parallel-safety-checking code). For a DEBUG build, it is handy for
the Assert to immediately alert us to the issue (which could really
only be caused by database corruption, not a bug in the code).
Note that Vignesh originally suggested changing it from
"elog(ERROR,...)" to "elog(WARNING,...)", and I agree with him.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Jan 21, 2021 at 1:50 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> For v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch:
>
> +       bool        isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate),
ModifyTableState);
>
> Please wrap long line.
>

OK.
I thought I ran pgindent fairly recently, but maybe it chose not to
wrap that line.


> +   uint64     *processed_count_space;
>
> If I read the code correctly, it seems it can be dropped (use pei->processed_count directly).
>

You're right. I'll change it in the next version.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Jan 21, 2021 at 12:47 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> >
> Hi
>
> It seems there are some previous comments[1][2] not addressed in current patch.
> Just to make sure it's not missed.
>
> [1]
> https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local
>
> [2]
> https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com
>

Thanks for alerting me to those, somehow I completely missed them,
sorry about that.
I'll be sure to investigate and address them in the next version of
the patch I post.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Dec 23, 2020 at 1:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> >
> > Hi
> >
> > > > I may be wrong, and if I miss sth in previous mails, please give me some
> > > hints.
> > > > IMO, serial insertion with underlying parallel SELECT can be
> > > > considered for foreign table or temporary table, as the insertions only
> > > happened in the leader process.
> > > >
> > >
> > > I don't think we support parallel scan for temporary tables. Can you please
> > > try once both of these operations without Insert being involved? If you
> > > are able to produce a parallel plan without Insert then we can see why it
> > > is not supported with Insert.
> >
> > > Sorry, may be I did not express it clearly, I actually means the case when insert's target (not in select part)
> > > table is temporary.
> > And you are right that parallel select is not enabled when temporary table is in select part.
> >
>
> I think Select can be parallel for this case and we should support this case.
>

So I think we're saying that if the target table is a foreign table or
temporary table, it can be regarded as PARALLEL_RESTRICTED, right?

i.e. code-wise:

        /*
-        * We can't support table modification in parallel-mode if
it's a foreign
-        * table/partition (no FDW API for supporting parallel access) or a
+        * We can't support table modification in a parallel worker if it's a
+        * foreign table/partition (no FDW API for supporting parallel
access) or a
         * temporary table.
         */
        if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
                RelationUsesLocalBuffers(rel))
        {
-               table_close(rel, lockmode);
-               context->max_hazard = PROPARALLEL_UNSAFE;
-               return true;
+               if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
+               {
+                       table_close(rel, lockmode);
+                       return true;
+               }
        }


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Dec 21, 2020 at 1:50 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> +
> +       index_oid_list = RelationGetIndexList(rel);
> ...
>
> As memtioned in the comments of RelationGetIndexList:
> * we return a copy of the list palloc'd in the caller's context.  The caller
> * may list_free() the returned list after scanning it.
>
> Shall we list_free(index_oid_list) at the end of function ?
> Just to avoid potential memory leak.
>

I think that's a good idea, so I'll make that update in the next
version of the patch.
I do notice, however, that there seems to be quite a few places in the
Postgres code where RelationGetIndexList() is being called without a
corresponding list_free() being called - obviously instead relying on
it being deallocated when the caller's memory-context is destroyed.
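The lifetime issue can be sketched in standalone C (plain malloc/free stand in for palloc/pfree here, and the list type is a toy stand-in for PostgreSQL's List, so this is an illustration of the pattern, not the real API):

```c
/* Each call to the stand-in for RelationGetIndexList() returns a fresh
 * copy of the list; freeing it after the scan prevents per-partition
 * copies from piling up until the memory context is destroyed. */
#include <stdlib.h>

typedef struct ListCell
{
    unsigned        oid;
    struct ListCell *next;
} ListCell;

/* stand-in for RelationGetIndexList(): returns a newly allocated copy */
static ListCell *
get_index_list(const unsigned *oids, int n)
{
    ListCell *head = NULL;
    for (int i = n - 1; i >= 0; i--)
    {
        ListCell *cell = malloc(sizeof(ListCell));
        cell->oid = oids[i];
        cell->next = head;
        head = cell;
    }
    return head;
}

/* stand-in for list_free(): releases the copy once it has been scanned */
static void
list_free_sketch(ListCell *head)
{
    while (head)
    {
        ListCell *next = head->next;
        free(head);
        head = next;
    }
}

/* scan the index list, then free it immediately rather than relying on
 * context teardown */
static int
count_indexes(const unsigned *oids, int n)
{
    ListCell *list = get_index_list(oids, n);
    int       count = 0;
    for (ListCell *lc = list; lc; lc = lc->next)
        count++;
    list_free_sketch(list);
    return count;
}
```

In the real code the copy is palloc'd in the caller's context, so omitting list_free() is not a leak in the strict sense, but when the safety checks loop over many partitions each unfreed copy lingers until the context is destroyed.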

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Jan 21, 2021 at 12:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Dec 23, 2020 at 1:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > >
> > > Hi
> > >
> > > > > I may be wrong, and if I miss sth in previous mails, please give me some
> > > > hints.
> > > > > IMO, serial insertion with underlying parallel SELECT can be
> > > > > considered for foreign table or temporary table, as the insertions only
> > > > happened in the leader process.
> > > > >
> > > >
> > > > I don't think we support parallel scan for temporary tables. Can you please
> > > > try once both of these operations without Insert being involved? If you
> > > > are able to produce a parallel plan without Insert then we can see why it
> > > > is not supported with Insert.
> > >
> > > > Sorry, may be I did not express it clearly, I actually means the case when insert's target (not in select part)
> > > > table is temporary.
> > > And you are right that parallel select is not enabled when temporary table is in select part.
> > >
> >
> > I think Select can be parallel for this case and we should support this case.
> >
>
> So I think we're saying that if the target table is a foreign table or
> temporary table, it can be regarded as PARALLEL_RESTRICTED, right?
>

Yes.

> i.e. code-wise:
>
>         /*
> -        * We can't support table modification in parallel-mode if
> it's a foreign
> -        * table/partition (no FDW API for supporting parallel access) or a
> +        * We can't support table modification in a parallel worker if it's a
> +        * foreign table/partition (no FDW API for supporting parallel
> access) or a
>          * temporary table.
>          */
>         if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
>                 RelationUsesLocalBuffers(rel))
>         {
> -               table_close(rel, lockmode);
> -               context->max_hazard = PROPARALLEL_UNSAFE;
> -               return true;
> +               if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> +               {
> +                       table_close(rel, lockmode);
> +                       return true;
> +               }
>         }
>

Yeah, these changes look correct to me.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > > Hi
> > >
> > > > > I may be wrong, and if I miss sth in previous mails, please give
> > > > > me some
> > > > hints.
> > > > > IMO, serial insertion with underlying parallel SELECT can be
> > > > > considered for foreign table or temporary table, as the
> > > > > insertions only
> > > > happened in the leader process.
> > > > >
> > > >
> > > > I don't think we support parallel scan for temporary tables. Can
> > > > you please try once both of these operations without Insert being
> > > > involved? If you are able to produce a parallel plan without
> > > > Insert then we can see why it is not supported with Insert.
> > >
> > > Sorry, may be I did not express it clearly, I actually means the case
> when insert's target(not in select part) table is temporary.
> > > And you are right that parallel select is not enabled when temporary
> table is in select part.
> > >
> >
> > I think Select can be parallel for this case and we should support this
> case.
> >
> 
> So I think we're saying that if the target table is a foreign table or
> temporary table, it can be regarded as PARALLEL_RESTRICTED, right?
>

Yes

IMO, PARALLEL_RESTRICTED currently enables parallel select but disables parallel insert.
So, the INSERT only happens in the leader process, which seems safe for inserting into a temporary/foreign table.

In addition, there are some other restrictions on parallel select which it seems could be removed:

1.- Target table has a parallel-unsafe trigger, index expression, column default
    expression or check constraint
2.- Target table is a partitioned table with a parallel-unsafe partition key
    expression or support function

If the Insert's target table is one of the types listed above, is there some reason why we cannot support parallel select?
It seems only the leader process will execute the trigger and key expression, which seems safe.
(If I missed something here, please let me know)

Best regards,
houzj






RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > So I think we're saying that if the target table is a foreign table or
> > temporary table, it can be regarded as PARALLEL_RESTRICTED, right?
> >
> 
> Yes
> 
> IMO, PARALLEL_RESTRICTED currently enable parallel select but disable
> parallel insert.
> So, the INSERT only happen in leader worker which seems safe to insert into
> tempory/foreigh table.
> 
> In addition, there are some other restriction about parallel select which
> seems can be removed:
> 
> 1.- Target table has a parallel-unsafe trigger, index expression, column
> default
>     expression or check constraint
> 2.- Target table is a partitioned table with a parallel-unsafe partition
> key
>     expression or support function
> 
> If the Insert's target table is the type listed above, Is there some reasons
> why we can not support parallel select ?
> It seems only leader worker will execute the trigger and key-experssion
> which seems safe.
> (If I miss something about it, please let me know)


Sorry, please ignore the above; I got something wrong.

Best regards,
houzj



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> >
> > +
> > +       index_oid_list = RelationGetIndexList(rel);
> > ...
> >
> > As memtioned in the comments of RelationGetIndexList:
> > * we return a copy of the list palloc'd in the caller's context.  The
> > caller
> > * may list_free() the returned list after scanning it.
> >
> > Shall we list_free(index_oid_list) at the end of function ?
> > Just to avoid potential memory leak.
> >
> 
> I think that's a good idea, so I'll make that update in the next version
> of the patch.
> I do notice, however, that there seems to be quite a few places in the Postgres
> code where RelationGetIndexList() is being called without a corresponding
> list_free() being called - obviously instead relying on it being deallocated
> when the caller's memory-context is destroyed.
> 

Yes, it will be deallocated when the caller's memory-context is destroyed.

Currently, the parallel safety-check checks each partition.
I am just a little worried that, if there are lots of partitions here, it may cause high memory use.

And there is another place like this:

1.
+            conbin = TextDatumGetCString(val);
+            check_expr = stringToNode(conbin);

It seems we can free conbin when it is no longer used (for the same reason as above).
What do you think?


Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 12:08 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> >
> > I think that's a good idea, so I'll make that update in the next version
> > of the patch.
> > I do notice, however, that there seems to be quite a few places in the Postgres
> > code where RelationGetIndexList() is being called without a corresponding
> > list_free() being called - obviously instead relying on it being deallocated
> > when the caller's memory-context is destroyed.
> >
>
> Yes, it will be deallocated when the caller's memory-context is destroyed.
>
> Currently, parallel safety-check check each partition.
> I am just a little worried about if there are lots of partition here, it may cause high memory use.
>
> And there is another place like this:
>
> 1.
> +                       conbin = TextDatumGetCString(val);
> +                       check_expr = stringToNode(conbin);
>
> It seems we can free the cobin when not used(for the same reason above).
> What do you think ?
>
>

Yes, I think you're right, we should pfree conbin after converting to
Node, to minimize memory usage.
Again, it's interesting that existing Postgres code, when looping
through all of the constraints, doesn't do this.
Hmmm. I'm wondering if there is a performance reason behind this -
avoiding multiple calls to pfree() and just relying on it to be
deallocated in one hit, when the memory context is destroyed.
Anyway, perhaps the concern of many partitions and the recursive
nature of these checks overrides that, because of, as you say, the
possible high memory usage.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi

I took a look at v12-0001 patch, here are some comments:

1.
+    /*
+     * Setup the context used in finding the max parallel-mode hazard.
+     */
+    Assert(initial_max_parallel_hazard == 0 ||
+           initial_max_parallel_hazard == PROPARALLEL_SAFE ||
+           initial_max_parallel_hazard == PROPARALLEL_RESTRICTED);
+    context.max_hazard = initial_max_parallel_hazard == 0 ?
+        PROPARALLEL_SAFE : initial_max_parallel_hazard;

I am not quite sure when "max_parallel_hazard == 0" will occur.
Does it mean the case where the max_parallel_hazard_context is not initialized?


2.
Some tiny code style suggestions

+        if (con->contype == CONSTRAINT_CHECK)
+        {
+            char       *conbin;
+            Datum        val;
+            bool        isnull;
+            Expr       *check_expr;

How about :

if (con->contype != CONSTRAINT_CHECK)
    continue;

3.
+                if (keycol == 0)
+                {
+                    /* Found an index expression */
+
+                    Node       *index_expr;

Like 2, how about:

if (keycol != 0)
    continue;


4.
+            ListCell   *index_expr_item = list_head(index_info->ii_Expressions);
...
+                    index_expr = (Node *) lfirst(index_expr_item);
+                    index_expr = (Node *) expression_planner((Expr *) index_expr);

It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions,
via the flow: BuildIndexInfo --> RelationGetIndexExpressions --> eval_const_expressions

So, IMO, we do not need to call expression_planner for the expr again.


And there seems another solution for this:

In the patch, we only use { ii_Expressions, ii_NumIndexAttrs, ii_IndexAttrNumbers } from the IndexInfo,
which it seems we can get from "Relation->rd_index".

Based on the above, maybe we do not need to call BuildIndexInfo to build the IndexInfo.
It can avoid some unnecessary cost.
And with this solution we would not need to remove expression_planner.

What do you think ?

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Jan 21, 2021 at 7:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > i.e. code-wise:
> >
> >         /*
> > -        * We can't support table modification in parallel-mode if
> > it's a foreign
> > -        * table/partition (no FDW API for supporting parallel access) or a
> > +        * We can't support table modification in a parallel worker if it's a
> > +        * foreign table/partition (no FDW API for supporting parallel
> > access) or a
> >          * temporary table.
> >          */
> >         if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
> >                 RelationUsesLocalBuffers(rel))
> >         {
> > -               table_close(rel, lockmode);
> > -               context->max_hazard = PROPARALLEL_UNSAFE;
> > -               return true;
> > +               if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> > +               {
> > +                       table_close(rel, lockmode);
> > +                       return true;
> > +               }
> >         }
> >
>
> Yeah, these changes look correct to me.
>

Unfortunately, this change results in a single test failure in the
"with" tests when "force_parallel_mode=regress" is in effect.

I have reproduced the problem, by extracting relevant SQL from those
tests, as follows:

CREATE TEMP TABLE bug6051 AS
  select i from generate_series(1,3) as i;
SELECT * FROM bug6051;
CREATE TEMP TABLE bug6051_2 (i int);
CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD
 INSERT INTO bug6051_2
 SELECT NEW.i;
WITH t1 AS ( DELETE FROM bug6051 RETURNING * )
INSERT INTO bug6051 SELECT * FROM t1;
ERROR:  cannot delete tuples during a parallel operation

Note that prior to the patch, all INSERTs were regarded as
PARALLEL_UNSAFE, so this problem obviously didn't occur.
I believe this INSERT should be regarded as PARALLEL_UNSAFE, because
it contains a modifying CTE.
However, for some reason, the INSERT is not regarded as having a
modifying CTE, so instead of finding it PARALLEL_UNSAFE, it falls into
the parallel-safety-checks and is found to be PARALLEL_RESTRICTED:

The relevant code in standard_planner() is:

    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
        IsUnderPostmaster &&
        (parse->commandType == CMD_SELECT ||
         IsModifySupportedInParallelMode(parse->commandType)) &&
        !parse->hasModifyingCTE &&
        max_parallel_workers_per_gather > 0 &&
        !IsParallelWorker())
    {
        /* all the cheap tests pass, so scan the query tree */
        glob->maxParallelHazard = max_parallel_hazard(parse);
        glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
    }
    else
    {
        /* skip the query tree scan, just assume it's unsafe */
        glob->maxParallelHazard = PROPARALLEL_UNSAFE;
        glob->parallelModeOK = false;
    }
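As a standalone sketch (an assumption-laden simplification of the patch, not the actual source; the enum is abbreviated), the "cheap test" helper referenced in the code above reduces to checking for INSERT, the only modify command this patch series enables for parallel mode:

```c
/* Sketch of IsModifySupportedInParallelMode() as used in the cheap
 * tests in standard_planner(): with this patch, INSERT is the only
 * table-modification command considered for parallel mode, so a DELETE
 * (e.g. inside a modifying CTE) must be caught by the
 * parse->hasModifyingCTE check instead. */
#include <stdbool.h>

typedef enum CmdType
{
    CMD_UNKNOWN,
    CMD_SELECT,
    CMD_UPDATE,
    CMD_INSERT,
    CMD_DELETE
} CmdType;

static inline bool
IsModifySupportedInParallelMode(CmdType commandType)
{
    /* Only INSERT INTO ... SELECT ... is a candidate so far */
    return commandType == CMD_INSERT;
}
```

This is why losing hasModifyingCTE matters: the top-level INSERT passes this check, so nothing else in the cheap tests blocks a query whose CTE performs a DELETE.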

When I debugged this (transformWithClause()), the WITH clause was
found to contain a modifying CTE and for the INSERT
query->hasModifyingCTE was set true.
But somehow in the re-writer code, this got lost.
Bug?
Ideas?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 1:16 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> I took a look at v12-0001 patch, here are some comments:
>
> 1.
> +       /*
> +        * Setup the context used in finding the max parallel-mode hazard.
> +        */
> +       Assert(initial_max_parallel_hazard == 0 ||
> +                  initial_max_parallel_hazard == PROPARALLEL_SAFE ||
> +                  initial_max_parallel_hazard == PROPARALLEL_RESTRICTED);
> +       context.max_hazard = initial_max_parallel_hazard == 0 ?
> +               PROPARALLEL_SAFE : initial_max_parallel_hazard;
>
> I am not quiet sure when will " max_parallel_hazard == 0"
> Does it means the case max_parallel_hazard_context not initialized ?
>

That function doesn't accept a "max_parallel_hazard_context". It
accepts an initial "max_parallel_hazard" value (char).
The "0" value is just a way of specifying "use the default"
(PROPARALLEL_SAFE). It is not currently used, since we just always
pass the "context.max_parallel_hazard" value resulting from the
previous parallel-safety checks for SELECT (and don't call that
function anywhere else).

>
> 2.
> Some tiny code style suggestions
>
> +               if (con->contype == CONSTRAINT_CHECK)
> +               {
> +                       char       *conbin;
> +                       Datum           val;
> +                       bool            isnull;
> +                       Expr       *check_expr;
>
> How about :
>
> if (con->contype != CONSTRAINT_CHECK)
>         continue;
>
> 3.
> +                               if (keycol == 0)
> +                               {
> +                                       /* Found an index expression */
> +
> +                                       Node       *index_expr;
>
> Like 2, how about:
>
> If (keycol != 0)
>         Continue;
>

This is really a programmer style preference (plenty of discussions on
the internet about it), but it can be argued that use of "continue"
here is not quite as clear as the explicit "if" condition, especially
in this very simple one-condition case.
I'm inclined to leave it as is.

>
> 4.
> +                       ListCell   *index_expr_item = list_head(index_info->ii_Expressions);
> ...
> +                                       index_expr = (Node *) lfirst(index_expr_item);
> +                                       index_expr = (Node *) expression_planner((Expr *) index_expr);
>
> It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions,
> Like the flow: BuildIndexInfo--> RelationGetIndexExpressions--> eval_const_expressions
>
> So, IMO, we do not need to call expression_planner for the expr again.
>
>
> And there seems another solution for this:
>
> In the patch, We only use the { ii_Expressions , ii_NumIndexAttrs , ii_IndexAttrNumbers } from the IndexInfo,
> which seems can get from "Relation-> rd_index".
>
> Based on above, May be we do not need to call BuildIndexInfo to build the IndexInfo.
> It can avoid some unnecessary cost.
> And in this solution we do not need to remove expression_planner.
>

OK, maybe this is a good idea, but I do recall trying to minimize this
kind of processing before, but there were cases which broke it.
Have you actually tried your idea and run all regression tests and
verified that they passed?
In any case, I'll look into it.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Jan 22, 2021 at 8:29 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Jan 21, 2021 at 7:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > i.e. code-wise:
> > >
> > >         /*
> > > -        * We can't support table modification in parallel-mode if
> > > it's a foreign
> > > -        * table/partition (no FDW API for supporting parallel access) or a
> > > +        * We can't support table modification in a parallel worker if it's a
> > > +        * foreign table/partition (no FDW API for supporting parallel
> > > access) or a
> > >          * temporary table.
> > >          */
> > >         if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
> > >                 RelationUsesLocalBuffers(rel))
> > >         {
> > > -               table_close(rel, lockmode);
> > > -               context->max_hazard = PROPARALLEL_UNSAFE;
> > > -               return true;
> > > +               if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> > > +               {
> > > +                       table_close(rel, lockmode);
> > > +                       return true;
> > > +               }
> > >         }
> > >
> >
> > Yeah, these changes look correct to me.
> >
>
> Unfortunately, this change results in a single test failure in the
> "with" tests when "force_parallel_mode=regress" is in effect.
>
> I have reproduced the problem, by extracting relevant SQL from those
> tests, as follows:
>
> CREATE TEMP TABLE bug6051 AS
>   select i from generate_series(1,3) as i;
> SELECT * FROM bug6051;
> CREATE TEMP TABLE bug6051_2 (i int);
> CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD
>  INSERT INTO bug6051_2
>  SELECT NEW.i;
> WITH t1 AS ( DELETE FROM bug6051 RETURNING * )
> INSERT INTO bug6051 SELECT * FROM t1;
> ERROR:  cannot delete tuples during a parallel operation
>
> Note that prior to the patch, all INSERTs were regarded as
> PARALLEL_UNSAFE, so this problem obviously didn't occur.
> I believe this INSERT should be regarded as PARALLEL_UNSAFE, because
> it contains a modifying CTE.
> However, for some reason, the INSERT is not regarded as having a
> modifying CTE, so instead of finding it PARALLEL_UNSAFE, it falls into
> the parallel-safety-checks and is found to be PARALLEL_RESTRICTED:
>
> The relevant code in standard_planner() is:
>
>     if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
>         IsUnderPostmaster &&
>         (parse->commandType == CMD_SELECT ||
>          IsModifySupportedInParallelMode(parse->commandType)) &&
>         !parse->hasModifyingCTE &&
>         max_parallel_workers_per_gather > 0 &&
>         !IsParallelWorker())
>     {
>         /* all the cheap tests pass, so scan the query tree */
>         glob->maxParallelHazard = max_parallel_hazard(parse);
>         glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
>     }
>     else
>     {
>         /* skip the query tree scan, just assume it's unsafe */
>         glob->maxParallelHazard = PROPARALLEL_UNSAFE;
>         glob->parallelModeOK = false;
>     }
>
> When I debugged this (transformWithClause()), the WITH clause was
> found to contain a modifying CTE and for the INSERT
> query->hasModifyingCTE was set true.
> But somehow in the re-writer code, this got lost.
> Bug?
> Ideas?
>

How does it behave when the table in the above test is a non-temp
table, with your patch? If it leads to the same error then we can at
least conclude that this is a generic problem and nothing specific to
temp tables.

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 1:16 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
>
> 4.
> +                       ListCell   *index_expr_item = list_head(index_info->ii_Expressions);
> ...
> +                                       index_expr = (Node *) lfirst(index_expr_item);
> +                                       index_expr = (Node *) expression_planner((Expr *) index_expr);
>
> It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions,
> Like the flow: BuildIndexInfo--> RelationGetIndexExpressions--> eval_const_expressions
>
> So, IMO, we do not need to call expression_planner for the expr again.
>

Thanks. You are right. I debugged it, and found that BuildIndexInfo-->
RelationGetIndexExpressions executes the same expression evaluation
code as expression_planner().
So I'll remove the redundant call to expression_planner() here.

>
> And there seems another solution for this:
>
> In the patch, We only use the { ii_Expressions , ii_NumIndexAttrs , ii_IndexAttrNumbers } from the IndexInfo,
> which seems can get from "Relation-> rd_index".
>
> Based on above, May be we do not need to call BuildIndexInfo to build the IndexInfo.
> It can avoid some unnecessary cost.
> And in this solution we do not need to remove expression_planner.
>

Hmmm, when I debugged my simple test case, I found rel->rd_index was
NULL, so it seems that the call to BuildIndexInfo is needed.
(have I understood your suggestion correctly?)

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > Unfortunately, this change results in a single test failure in the
> > "with" tests when "force_parallel_mode=regress" is in effect.
> >
> > I have reproduced the problem, by extracting relevant SQL from those
> > tests, as follows:
> >
> > CREATE TEMP TABLE bug6051 AS
> >   select i from generate_series(1,3) as i;
> > SELECT * FROM bug6051;
> > CREATE TEMP TABLE bug6051_2 (i int);
> > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD
> >  INSERT INTO bug6051_2
> >  SELECT NEW.i;
> > WITH t1 AS ( DELETE FROM bug6051 RETURNING * )
> > INSERT INTO bug6051 SELECT * FROM t1;
> > ERROR:  cannot delete tuples during a parallel operation
> >
> > Note that prior to the patch, all INSERTs were regarded as
> > PARALLEL_UNSAFE, so this problem obviously didn't occur.
> > I believe this INSERT should be regarded as PARALLEL_UNSAFE, because
> > it contains a modifying CTE.
> > However, for some reason, the INSERT is not regarded as having a
> > modifying CTE, so instead of finding it PARALLEL_UNSAFE, it falls into
> > the parallel-safety-checks and is found to be PARALLEL_RESTRICTED:
> >
> > The relevant code in standard_planner() is:
> >
> >     if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> >         IsUnderPostmaster &&
> >         (parse->commandType == CMD_SELECT ||
> >          IsModifySupportedInParallelMode(parse->commandType)) &&
> >         !parse->hasModifyingCTE &&
> >         max_parallel_workers_per_gather > 0 &&
> >         !IsParallelWorker())
> >     {
> >         /* all the cheap tests pass, so scan the query tree */
> >         glob->maxParallelHazard = max_parallel_hazard(parse);
> >         glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> >     }
> >     else
> >     {
> >         /* skip the query tree scan, just assume it's unsafe */
> >         glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> >         glob->parallelModeOK = false;
> >     }
> >
> > When I debugged this (transformWithClause()), the WITH clause was
> > found to contain a modifying CTE and for the INSERT
> > query->hasModifyingCTE was set true.
> > But somehow in the re-writer code, this got lost.
> > Bug?
> > Ideas?
> >
>
> How it behaves when the table in the above test is a non-temp table
> with your patch? If it leads to the same error then we can at least
> conclude that this is a generic problem and nothing specific to temp
> tables.
>

Oh, I don't believe that this has anything to do with TEMP tables -
it's just that when I relaxed the parallel-safety level on TEMP
tables, it exposed the CTE issue in this test case because it just
happens to use a TEMP table.
Having said that, when I changed that test code to not use a TEMP
table, an Assert fired in the planner code and caused the backend to
abort.
It looks like I need to update the following Assert in the planner
code (unchanged by the current patch) in order to test further - but
this Assert only fired because the commandType was CMD_DELETE, which
SHOULD have been excluded by the "hasModifyingCTE" test on the parent
INSERT, which is what I'm saying is strangely NOT getting set.

    /*
     * Generate partial paths for final_rel, too, if outer query levels might
     * be able to make use of them.
     */
    if (final_rel->consider_parallel && root->query_level > 1 &&
        !limit_needed(parse))
    {
        Assert(!parse->rowMarks && parse->commandType == CMD_SELECT);
        foreach(lc, current_rel->partial_pathlist)
        {
            Path       *partial_path = (Path *) lfirst(lc);

            add_partial_path(final_rel, partial_path);
        }
    }

Once the Assert above is changed to not test the commandType, the same
error occurs as before.

This appears to possibly be some kind of bug in which hasModifyingCTE
is not getting set, at least in this particular INSERT case, but in
the current Postgres code it doesn't matter because all INSERTs are
considered parallel-unsafe, so won't be executed in parallel-mode
anyway.
I notice that if I execute "SELECT * from t1" instead of "INSERT INTO
bug6051 SELECT * from t1", then "hasModifyingCTE" is getting set to
true for the query, and thus is always considered parallel-unsafe.


Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > And there seems to be another solution for this:
> >
> > In the patch, we only use { ii_Expressions, ii_NumIndexAttrs,
> > ii_IndexAttrNumbers } from the IndexInfo, which it seems we can get
> > from "Relation->rd_index".
> >
> > Based on the above, maybe we do not need to call BuildIndexInfo to
> > build the IndexInfo.
> > It can avoid some unnecessary cost.
> > And with this solution we do not need to remove expression_planner.
> >
> >
> 
> Hmmm, when I debugged my simple test case, I found rel->rd_index was NULL,
> so it seems that the call to BuildIndexInfo is needed.
> (have I understood your suggestion correctly?)
> 

Hi Greg,

Thanks for debugging this.

Maybe I missed something, but I am not sure about the case where rel->rd_index is NULL,
because the function BuildIndexInfo does not seem to have a NULL check for index->rd_index.
Like the following:
----
BuildIndexInfo(Relation index)
{
    IndexInfo  *ii;
    Form_pg_index indexStruct = index->rd_index;
    int            i;
    int            numAtts;

    /* check the number of keys, and copy attr numbers into the IndexInfo */
    numAtts = indexStruct->indnatts;
----

The patch does not have a NULL check for index->rd_index either,
so I thought we could assume index->rd_index is not NULL - but perhaps I have missed something?

Can you please share the test case with me?

I used the following code to replace the call to BuildIndexInfo,
and installcheck passed.

Example:
+                Form_pg_index indexStruct = index_rel->rd_index;
+                List *ii_Expressions = RelationGetIndexExpressions(index_rel);
+                int ii_NumIndexAttrs = indexStruct->indnatts;
+                AttrNumber      ii_IndexAttrNumbers[INDEX_MAX_KEYS];

+                for (int i = 0; i < ii_NumIndexAttrs; i++)
+                        ii_IndexAttrNumbers[i] = indexStruct->indkey.values[i];
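For what it's worth, the copying logic above can be sanity-checked outside the server with a tiny standalone C model. All the type and field names below are simplified stand-ins (not the real Form_pg_index or IndexInfo definitions), but the loop is the same shape as in the snippet:

```c
#include <assert.h>

#define INDEX_MAX_KEYS 32

/* Simplified stand-in for the cached pg_index tuple (hypothetical model,
 * not the real Form_pg_index). */
typedef struct IndexCatalogModel
{
    int         indnatts;               /* number of index attributes */
    int         indkey[INDEX_MAX_KEYS]; /* models indexStruct->indkey.values */
} IndexCatalogModel;

/* Copy the key column numbers straight out of the cached catalog tuple,
 * mirroring the loop that avoids calling BuildIndexInfo. */
static int
copy_index_attnums(const IndexCatalogModel *indexStruct,
                   int attnums[INDEX_MAX_KEYS])
{
    int         i;

    for (i = 0; i < indexStruct->indnatts; i++)
        attnums[i] = indexStruct->indkey[i];
    return indexStruct->indnatts;
}
```

A zero entry in the copied array would then indicate an expression column, just as keycol == 0 does in the patch.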

Best regards,
houzj





Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 6:21 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> Hi Greg,
>
> Thanks for debugging this.
>
> Maybe I missed something, but I am not sure about the case where rel->rd_index is NULL,
> because the function BuildIndexInfo does not seem to have a NULL check for index->rd_index.
> Like the following:
> ----
> BuildIndexInfo(Relation index)
> {
>         IndexInfo  *ii;
>         Form_pg_index indexStruct = index->rd_index;
>         int                     i;
>         int                     numAtts;
>
>         /* check the number of keys, and copy attr numbers into the IndexInfo */
>         numAtts = indexStruct->indnatts;
> ----
>
> The patch does not have a NULL check for index->rd_index either,
> so I thought we could assume index->rd_index is not NULL - but perhaps I have missed something?
>
> Can you please share the test case with me?
>
> I used the following code to replace the call to BuildIndexInfo,
> and installcheck passed.
>
> Example:
> +                Form_pg_index indexStruct = index_rel->rd_index;
> +                List *ii_Expressions = RelationGetIndexExpressions(index_rel);
> +                int ii_NumIndexAttrs = indexStruct->indnatts;
> +                AttrNumber      ii_IndexAttrNumbers[INDEX_MAX_KEYS];
>
> +                for (int i = 0; i < ii_NumIndexAttrs; i++)
> +                        ii_IndexAttrNumbers[i] = indexStruct->indkey.values[i];

Sorry, I was looking at rel->rd_index, not index_rel->rd_index - my fault.
Your code looks OK. I've taken it, reduced some of the lines, and
got rid of the C99-only intermingled variable declarations (see
https://www.postgresql.org/docs/13/source-conventions.html).
The changes are below.
The regression tests all pass, so it should be OK (my test case was taken
from the insert_parallel regression tests).
Thanks for your help.

-        Oid         index_oid = lfirst_oid(lc);
-        Relation    index_rel;
-        IndexInfo  *index_info;
+        Relation        index_rel;
+        Form_pg_index   indexStruct;
+        List            *ii_Expressions;
+        Oid             index_oid = lfirst_oid(lc);

         index_rel = index_open(index_oid, lockmode);

-        index_info = BuildIndexInfo(index_rel);
+        indexStruct = index_rel->rd_index;
+        ii_Expressions = RelationGetIndexExpressions(index_rel);

-        if (index_info->ii_Expressions != NIL)
+        if (ii_Expressions != NIL)
         {
             int         i;
-            ListCell   *index_expr_item =
list_head(index_info->ii_Expressions);
+            ListCell    *index_expr_item = list_head(ii_Expressions);

-            for (i = 0; i < index_info->ii_NumIndexAttrs; i++)
+            for (i = 0; i < indexStruct->indnatts; i++)
             {
-                int         keycol = index_info->ii_IndexAttrNumbers[i];
+                int         keycol = indexStruct->indkey.values[i];

                 if (keycol == 0)
                 {
@@ -912,7 +914,7 @@ index_expr_max_parallel_hazard_for_modify(Relation rel,
                         return true;
                     }

-                    index_expr_item =
lnext(index_info->ii_Expressions, index_expr_item);
+                    index_expr_item = lnext(ii_Expressions, index_expr_item);
                 }
             }


Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
Hello Greg-san,


Initially, some minor comments:


(1)
-     * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
-     * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
-     * backend writes into a completely new table.  In the future, we can
-     * extend it to allow workers to write into the table.  However, to allow
-     * parallel updates and deletes, we have to solve other problems,
-     * especially around combo CIDs.)
+     * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
+     * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
+     * of now, only the leader backend writes into a completely new table. In

This can read "In the INSERT INTO...SELECT case, like other existing cases, only the leader backend writes into a
completely new table."  The reality is that workers as well as the leader can write into an empty or non-empty
table in parallel, isn't it?


(2)
 /*
  * RELATION_IS_LOCAL
- *        If a rel is either temp or newly created in the current transaction,
- *        it can be assumed to be accessible only to the current backend.
- *        This is typically used to decide that we can skip acquiring locks.
+ *        If a rel is temp, it can be assumed to be accessible only to the
+ *        current backend. This is typically used to decide that we can
+ *        skip acquiring locks.
  *
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_LOCAL(relation) \
-    ((relation)->rd_islocaltemp || \
-     (relation)->rd_createSubid != InvalidSubTransactionId)
+    ((relation)->rd_islocaltemp)

How is this correct?  At least, this change would cause a transaction that creates a new relation to acquire an
unnecessary lock.  I'm not sure if that overhead is worth worrying about (perhaps not, I guess).  But can we still
check rd_createSubid in non-parallel mode?  If we adopt the above change, the comments at call sites need
modification - "new or temp relation" becomes "temp relation".


(3)
@@ -173,9 +175,11 @@ ExecSerializePlan(Plan *plan, EState *estate)
...
-    pstmt->commandType = CMD_SELECT;
+    Assert(estate->es_plannedstmt->commandType == CMD_SELECT ||
+           IsModifySupportedInParallelMode(estate->es_plannedstmt->commandType));
+    pstmt->commandType = IsA(plan, ModifyTable) ? castNode(ModifyTable, plan)->operation : CMD_SELECT;

The last line can just be as follows, according to the Assert():

+    pstmt->commandType = estate->es_plannedstmt->commandType;


(4)
@@ -1527,7 +1528,9 @@ ExecutePlan(EState *estate,
     estate->es_use_parallel_mode = use_parallel_mode;
     if (use_parallel_mode)
     {
-        PrepareParallelMode(estate->es_plannedstmt->commandType);
+        bool        isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate),
ModifyTableState);
+
+        PrepareParallelMode(estate->es_plannedstmt->commandType, isParallelModifyLeader);
         EnterParallelMode();
     }

@@ -1021,12 +1039,25 @@ IsInParallelMode(void)
  * Prepare for entering parallel mode, based on command-type.
  */
 void
-PrepareParallelMode(CmdType commandType)
+PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
 {
     if (IsModifySupportedInParallelMode(commandType))
     {
         Assert(!IsInParallelMode());
 
+        if (isParallelModifyLeader)
+        {
+            /*
+             * Set currentCommandIdUsed to true, to ensure that the current
+             * CommandId (which will be used by the parallel workers) won't
+             * change during this parallel operation, as starting new
+             * commands in parallel-mode is not currently supported.
+             * See related comments in GetCurrentCommandId and
+             * CommandCounterIncrement.
+             */
+            (void) GetCurrentCommandId(true);
+        }

I think we can eliminate the second argument of PrepareParallelMode() and the new code in ExecutePlan().
PrepareParallelMode() can use !IsParallelWorker() in the if condition, because the caller is either a would-be
parallel leader or a parallel worker.

BTW, why do we want to add PrepareParallelMode() separately from EnterParallelMode()?  Someone who will read other
call sites of EnterParallelMode() (index build, VACUUM) may be worried that the PrepareParallelMode() call is
missing there.  Can we just add an argument to EnterParallelMode()?  Other call sites can use CMD_UNKNOWN or
CMD_UTILITY, if we want to use CMD_XX.


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 7:52 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>
> (1)
> -        * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
> -        * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
> -        * backend writes into a completely new table.  In the future, we can
> -        * extend it to allow workers to write into the table.  However, to allow
> -        * parallel updates and deletes, we have to solve other problems,
> -        * especially around combo CIDs.)
> +        * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
> +        * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
> +        * of now, only the leader backend writes into a completely new table. In
>
> This can read "In the INSERT INTO...SELECT case, like other existing cases, only the leader backend writes into a
> completely new table."  The reality is that workers as well as the leader can write into an empty or non-empty
> table in parallel, isn't it?
>

Yes, you're right the wording is not right (and I don't really like
the wording used before the patch).

Perhaps it could say:

(Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
of now, other than in the case of INSERT INTO...SELECT, only the leader backend
writes into a completely new table. In the future, we can extend it to
allow workers for the
other commands to write into the table. However, to allow parallel
updates and deletes, we
have to solve other problems, especially around combo CIDs.)

Of course, this will need further updating when parallel CREATE TABLE
AS etc. is implemented ...

>
> (2)
>  /*
>   * RELATION_IS_LOCAL
> - *             If a rel is either temp or newly created in the current transaction,
> - *             it can be assumed to be accessible only to the current backend.
> - *             This is typically used to decide that we can skip acquiring locks.
> + *             If a rel is temp, it can be assumed to be accessible only to the
> + *             current backend. This is typically used to decide that we can
> + *             skip acquiring locks.
>   *
>   * Beware of multiple eval of argument
>   */
>  #define RELATION_IS_LOCAL(relation) \
> -       ((relation)->rd_islocaltemp || \
> -        (relation)->rd_createSubid != InvalidSubTransactionId)
> +       ((relation)->rd_islocaltemp)
>
> How is this correct?  At least, this change would cause a transaction that creates a new relation to acquire an
> unnecessary lock.  I'm not sure if that overhead is worth worrying about (perhaps not, I guess).  But can we still
> check rd_createSubid in non-parallel mode?  If we adopt the above change, the comments at call sites need
> modification - "new or temp relation" becomes "temp relation".
>

The problem is, with the introduction of parallel INSERT, it's no
longer the case that newly-created tables can't be accessed by anyone
else in the same transaction - now, a transaction can include parallel
workers, inserting into the table concurrently. Without changing that
macro, things fail with a very obscure message (e.g. ERROR:
unexpected data beyond EOF in block 5 of relation base/16384/16388)
and it takes days to debug what the cause of it is.
Maybe updating the macro to still check rd_createSubid in non-parallel
mode is a good idea - I'll need to try it.
Other than that, each and every usage of RELATION_IS_LOCAL would need
to be closely examined, to see if it could be within a parallel
INSERT.
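To make the "still check rd_createSubid in non-parallel mode" idea concrete, here is a minimal standalone sketch of what such a macro variant could look like, written as a function over simplified stand-in types (not the real Relation struct), and only under the assumption that a parallel-mode flag is available at the call site:

```c
#include <stdbool.h>

typedef unsigned int SubTransactionId;
#define InvalidSubTransactionId ((SubTransactionId) 0)

/* Minimal stand-in for the fields of Relation that the macro inspects. */
typedef struct RelationModel
{
    bool             rd_islocaltemp;
    SubTransactionId rd_createSubid;
} RelationModel;

/*
 * Sketch of the suggested middle ground: a newly created relation is
 * still treated as backend-local, but only when we are not in parallel
 * mode, since parallel workers may now be writing into it concurrently.
 */
static bool
relation_is_local(const RelationModel *rel, bool in_parallel_mode)
{
    if (rel->rd_islocaltemp)
        return true;
    return !in_parallel_mode &&
        rel->rd_createSubid != InvalidSubTransactionId;
}
```

This would keep the lock-skipping optimization for ordinary transactions while closing the hole that produced the "unexpected data beyond EOF" failures.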

>
> (3)
> @@ -173,9 +175,11 @@ ExecSerializePlan(Plan *plan, EState *estate)
> ...
> -       pstmt->commandType = CMD_SELECT;
> +       Assert(estate->es_plannedstmt->commandType == CMD_SELECT ||
> +                  IsModifySupportedInParallelMode(estate->es_plannedstmt->commandType));
> +       pstmt->commandType = IsA(plan, ModifyTable) ? castNode(ModifyTable, plan)->operation : CMD_SELECT;
>
> The last line can just be as follows, according to the Assert():
>
> +       pstmt->commandType = estate->es_plannedstmt->commandType;
>

No, that's not right. I did that originally and it failed in some
cases (try changing it and then run the regression tests and you'll
see).
The commandType of the es_plannedstmt might be CMD_INSERT but the one
in the plan might be CMD_SELECT (for the underlying SELECT).
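The distinction can be illustrated with a tiny standalone model (the enum and struct names are simplified stand-ins for the real node types, not the actual PostgreSQL definitions):

```c
typedef enum CmdTypeModel { CMD_SELECT_M, CMD_INSERT_M } CmdTypeModel;
typedef enum NodeTagModel { T_SEQSCAN_M, T_MODIFYTABLE_M } NodeTagModel;

/* Stand-in for the plan subtree handed to ExecSerializePlan(). */
typedef struct PlanModel
{
    NodeTagModel tag;
    CmdTypeModel operation;     /* meaningful only for ModifyTable */
} PlanModel;

/*
 * The command type recorded in the serialized plan depends on which
 * subtree the workers will run: a parallel INSERT ships the ModifyTable
 * node itself, while a "parallel SELECT part only" plan ships just the
 * scan, which must be marked CMD_SELECT even though the top-level
 * statement is an INSERT.
 */
static CmdTypeModel
serialized_command_type(const PlanModel *plan)
{
    return (plan->tag == T_MODIFYTABLE_M) ? plan->operation : CMD_SELECT_M;
}
```

So collapsing it to the statement-level commandType would mislabel the shipped subtree whenever only the SELECT part is parallelized.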

>
> (4)
> @@ -1527,7 +1528,9 @@ ExecutePlan(EState *estate,
>         estate->es_use_parallel_mode = use_parallel_mode;
>         if (use_parallel_mode)
>         {
> -               PrepareParallelMode(estate->es_plannedstmt->commandType);
> +               bool            isParallelModifyLeader = IsA(planstate, GatherState) &&
> IsA(outerPlanState(planstate), ModifyTableState);
> +
> +               PrepareParallelMode(estate->es_plannedstmt->commandType, isParallelModifyLeader);
>                 EnterParallelMode();
>         }
>
> @@ -1021,12 +1039,25 @@ IsInParallelMode(void)
>   * Prepare for entering parallel mode, based on command-type.
>   */
>  void
> -PrepareParallelMode(CmdType commandType)
> +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
>  {
>         if (IsModifySupportedInParallelMode(commandType))
>         {
>                 Assert(!IsInParallelMode());
>
> +               if (isParallelModifyLeader)
> +               {
> +                       /*
> +                        * Set currentCommandIdUsed to true, to ensure that the current
> +                        * CommandId (which will be used by the parallel workers) won't
> +                        * change during this parallel operation, as starting new
> +                        * commands in parallel-mode is not currently supported.
> +                        * See related comments in GetCurrentCommandId and
> +                        * CommandCounterIncrement.
> +                        */
> +                       (void) GetCurrentCommandId(true);
> +               }
>
> I think we can eliminate the second argument of PrepareParallelMode() and the new code in ExecutePlan().
> PrepareParallelMode() can use !IsParallelWorker() in the if condition, because the caller is either a would-be
> parallel leader or a parallel worker.

You could, but I'm not sure it would make the code easier to read,
especially for those who don't know that !IsParallelWorker() means it's a
parallel leader.

>
> BTW, why do we want to add PrepareParallelMode() separately from EnterParallelMode()?  Someone who will read
> other call sites of EnterParallelMode() (index build, VACUUM) may be worried that the PrepareParallelMode() call
> is missing there.  Can we just add an argument to EnterParallelMode()?  Other call sites can use CMD_UNKNOWN or
> CMD_UTILITY, if we want to use CMD_XX.
>

I really can't see a problem. PrepareParallelMode() is only needed
prior to execution of a parallel plan, so it's not needed for "other
call sites" using EnterParallelMode().
Perhaps the name can be changed to disassociate it from generic
EnterParallelMode() usage. So far, I've only thought of long names
like: PrepareParallelModePlanExec().
Ideas?

Regards,
Greg Nancarrow
Fujitsu Australia




RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
Hello Greg-san,


Second group of comments (I'll reply to (1) - (4) later):


(5)
@@ -790,7 +790,8 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
... 
-    if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
+    if ((plannedstmt->commandType != CMD_SELECT &&
+         !IsModifySupportedInParallelMode(plannedstmt->commandType)) || plannedstmt->hasModifyingCTE)
         PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
 }

Now that we're trying to allow parallel writes (INSERT), we should:

* use ExecCheckXactReadOnly() solely for checking read-only transactions, as the function name represents.  That is,
move the call to PreventCommandIfParallelMode() up to standard_ExecutorStart().

* Update the comment above the call to ExecCheckXactReadOnly().


(6)
@@ -764,6 +777,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
...
+    else
+    {
+        pei->processed_count = NULL;
+    }

The braces can be deleted.


(7)
@@ -1400,6 +1439,16 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
                                          true);
     queryDesc = ExecParallelGetQueryDesc(toc, receiver, instrument_options);
 
+    Assert(queryDesc->operation == CMD_SELECT || IsModifySupportedInParallelMode(queryDesc->operation));
+    if (IsModifySupportedInParallelMode(queryDesc->operation))
+    {
+        /*
+         * Record that the CurrentCommandId is used, at the start of the
+         * parallel operation.
+         */
+        SetCurrentCommandIdUsedForWorker();
+    }
+
     /* Setting debug_query_string for individual workers */
     debug_query_string = queryDesc->sourceText;

@@ -765,12 +779,16 @@ GetCurrentCommandId(bool used)
     if (used)
     {
         /*
-         * Forbid setting currentCommandIdUsed in a parallel worker, because
-         * we have no provision for communicating this back to the leader.  We
-         * could relax this restriction when currentCommandIdUsed was already
-         * true at the start of the parallel operation.
+         * If in a parallel worker, only allow setting currentCommandIdUsed if
+         * currentCommandIdUsed was already true at the start of the parallel
+         * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+         * setting currentCommandIdUsed because we have no provision for
+         * communicating this back to the leader. Once currentCommandIdUsed is
+         * set, the commandId used by leader and workers can't be changed,
+         * because CommandCounterIncrement() then prevents any attempted
+         * increment of the current commandId.
          */
-        Assert(!IsParallelWorker());
+        Assert(!(IsParallelWorker() && !currentCommandIdUsed));
         currentCommandIdUsed = true;
     }
     return currentCommandId;

What happens without these changes?  If this kind of change is really necessary, it seems more natural to pass
currentCommandIdUsed together with currentCommandId through SerializeTransactionState() and
StartParallelWorkerTransaction(), instead of the above changes.

As an aside, SetCurrentCommandIdUsed() in the comment should be SetCurrentCommandIdUsedForWorker().


(8)
+        /*
+         * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+         * the relation, and this would result in creation of new CommandIds
+         * on insert/update/delete and this isn't supported in a parallel
+         * worker (but is safe in the parallel leader).
+         */
+        trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+        if (trigtype == RI_TRIGGER_FK)
+        {
+            if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
+                return true;
+        }

Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs.
See RI_FKey_check(), which is called in the RI_TRIGGER_FK case.  In there, ri_PerformCheck() is called with the
detectNewRows argument set to false, which causes CommandCounterIncrement() to not be called.

Plus, tables that have RI_TRIGGER_PK should allow parallel INSERT in a parallel-safe manner, because those triggers
only fire for UPDATE and DELETE.  So, for the future parallel UPDATE/DELETE support, the above check should be
performed in the UPDATE and DELETE cases.

(In a data warehouse, fact tables, which store large amounts of historical data, typically have foreign keys to
smaller dimension tables.  Thus, it's important to allow parallel INSERTs on tables with foreign keys.)


Regards
Takayuki Tsunakawa


RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

After doing some tests to cover the code paths in patch 0001,
I have some suggestions for the 0002 test cases.


(1)
+            /* Check parallel-safety of any expressions in the partition key */
+            if (get_partition_col_attnum(pkey, i) == 0)
+            {
+                Node       *check_expr = (Node *) lfirst(partexprs_item);
+
+                if (max_parallel_hazard_walker(check_expr, context))
+                {
+                    table_close(rel, lockmode);
+                    return true;
+                }

The test cases do not seem to cover the above code (i.e. the case where the table has a parallel-unsafe expression
in the partition key).

Personally, I used the following SQL to cover this:
-----
create table partkey_unsafe_key_expr_t (a int4, b name) partition by range
((fullname_parallel_unsafe('',a::varchar)));
explain (costs off) insert into partkey_unsafe_key_expr_t select unique1, stringu1 from tenk1;
-----


(2)
I noticed that most of the test cases cover all three variants (parallel safe/unsafe/restricted),
but the index expression tests do not seem to cover parallel restricted.
How about adding a test case like:
-----
create or replace function fullname_parallel_restricted(f text, l text) returns text as $$
    begin
        return f || l;
    end;
$$ language plpgsql immutable parallel restricted;

create table names4(index int, first_name text, last_name text);
create index names4_fullname_idx on names4 (fullname_parallel_restricted(first_name, last_name));

--
-- Test INSERT with parallel-restricted index expression
-- (should create a parallel plan)
--
explain (costs off) insert into names4 select * from names;
-----

(3)
+        /* Recursively check each partition ... */
+        pdesc = RelationGetPartitionDesc(rel);
+        for (i = 0; i < pdesc->nparts; i++)
+        {
+            if (rel_max_parallel_hazard_for_modify(pdesc->oids[i],
+                                                        command_type,
+                                                        context,
+                                                        AccessShareLock))
+            {
+                table_close(rel, lockmode);
+                return true;
+            }
+        }

It seems we do not have a test case covering a parallel-unsafe expression (or similar) in a partition.
How about adding a test case with a parallel-unsafe partition?

Best regards,
houzj




Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:

On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> Hello Greg-san,
>
>
> Second group of comments (I'll reply to (1) - (4) later):
>
>
> (5)
> @@ -790,7 +790,8 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
> ...
> -       if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
> +       if ((plannedstmt->commandType != CMD_SELECT &&
> +                !IsModifySupportedInParallelMode(plannedstmt->commandType)) || plannedstmt->hasModifyingCTE)
>                 PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
>  }
>
> Now that we're trying to allow parallel writes (INSERT), we should:
>
> * use ExecCheckXactReadOnly() solely for checking read-only transactions, as the function name represents.  That is, move the call to PreventCommandIfParallelMode() up to standard_ExecutorStart().
>
> * Update the comment  above the call to ExecCheckXactReadOnly().
>
>

Hmmm, I'm not so sure. The patch changes just make the existing test for calling PreventCommandIfParallelMode() a bit more restrictive, to exclude the parallel INSERT case. The code previously wasn't checking only read-only transactions anyway, so it's not as if the patch has changed something fundamental in this function. And by moving the PreventCommandIfParallelMode() call to a higher level, you'd be changing the existing order of error-handling (ExecCheckXactReadOnly() calls PreventCommandIfReadOnly() based on a few other range-table conditions, prior to testing whether to call PreventCommandIfParallelMode()). I don't want to introduce a bug by making the change that you're suggesting.


> (6)
> @@ -764,6 +777,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
> ...
> +       else
> +       {
> +               pei->processed_count = NULL;
> +       }
>
> The braces can be deleted.
>

Yes, they can be deleted, and I guess I will, but for the record, I personally prefer the explicit braces, even if the block is just one line, because:
- if more code ever needs to be added to the else, you'll need to add braces anyway (and newbies might add extra lines tabbed in, thinking they're part of the else block ...).
- I think it looks better and is slightly easier to read, especially when there's a mix of cases with multiple code lines and single code lines.
Of course, these kinds of things could be debated forever, but I don't think it's such a big deal.

>
> (7)
> @@ -1400,6 +1439,16 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
>                                                                                  true);
>         queryDesc = ExecParallelGetQueryDesc(toc, receiver, instrument_options);
>
> +       Assert(queryDesc->operation == CMD_SELECT || IsModifySupportedInParallelMode(queryDesc->operation));
> +       if (IsModifySupportedInParallelMode(queryDesc->operation))
> +       {
> +               /*
> +                * Record that the CurrentCommandId is used, at the start of the
> +                * parallel operation.
> +                */
> +               SetCurrentCommandIdUsedForWorker();
> +       }
> +
>         /* Setting debug_query_string for individual workers */
>         debug_query_string = queryDesc->sourceText;
>
> @@ -765,12 +779,16 @@ GetCurrentCommandId(bool used)
>         if (used)
>         {
>                 /*
> -                * Forbid setting currentCommandIdUsed in a parallel worker, because
> -                * we have no provision for communicating this back to the leader.  We
> -                * could relax this restriction when currentCommandIdUsed was already
> -                * true at the start of the parallel operation.
> +                * If in a parallel worker, only allow setting currentCommandIdUsed if
> +                * currentCommandIdUsed was already true at the start of the parallel
> +                * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
> +                * setting currentCommandIdUsed because we have no provision for
> +                * communicating this back to the leader. Once currentCommandIdUsed is
> +                * set, the commandId used by leader and workers can't be changed,
> +                * because CommandCounterIncrement() then prevents any attempted
> +                * increment of the current commandId.
>                  */
> -               Assert(!IsParallelWorker());
> +               Assert(!(IsParallelWorker() && !currentCommandIdUsed));
>                 currentCommandIdUsed = true;
>         }
>         return currentCommandId;
>
> What happens without these changes?  

The change to the above comment explains why the change is needed.
Without these changes, a call in a parallel worker to GetCurrentCommandId(true) (signifying the intent to use the returned CommandId to mark inserted/updated/deleted tuples) will result in an Assert firing, because prior to the patch, currentCommandIdUsed was forbidden to be set in a parallel worker.
So it is clear that this cannot remain the same if we are to support parallel INSERT by workers.
So for each worker, the patch sets "currentCommandIdUsed" to true at the start of the parallel operation (using SetCurrentCommandIdUsedForWorker()), and the Assert condition in GetCurrentCommandId() is tweaked to fire if GetCurrentCommandId(true) is called in a parallel worker while currentCommandIdUsed is false.
To me, this makes perfect sense.


>If this kind of change is really necessary, it seems more natural to pass currentCommandIdUsed together with currentCommandId through SerializeTransactionState() and StartParallelWorkerTransaction(), instead of the above changes.

No, I don't agree with that. That approach doesn't sound right to me at all.
All the patch really changes is WHERE "currentCurrentIdUsed" can be set for a parallel worker - now it is only allowed to be set to true at the start of the parallel operation for each worker, and the Assert (which is just a sanity check) is updated to ensure that for workers, it can only be set true at that time. That's all it does. It's completely consistent with the old comment that said "We could relax this restriction when currentCommandIdUsed was already true at the start of the parallel operation" - that's what we are now doing with the patch.
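The relaxed rule can be sketched as a small standalone C model (the globals and function names are simplified stand-ins for the real transaction-state machinery, not the actual xact.c code):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified per-backend state; stand-ins for the real xact.c globals. */
static bool         is_parallel_worker = false;
static bool         current_command_id_used = false;
static unsigned int current_command_id = 5;

/* Models SetCurrentCommandIdUsedForWorker(): run at worker startup,
 * before the worker executes any of the plan. */
static void
set_command_id_used_for_worker(void)
{
    current_command_id_used = true;
}

/* Models the patched GetCurrentCommandId(): a worker may "use" the
 * command ID only if it was already marked used at the start of the
 * parallel operation. */
static unsigned int
get_current_command_id(bool used)
{
    if (used)
    {
        /* a worker must never be the first to mark the ID used */
        assert(!(is_parallel_worker && !current_command_id_used));
        current_command_id_used = true;
    }
    return current_command_id;
}
```

The invariant the Assert enforces is exactly the one the old comment anticipated: the relaxation is only safe because currentCommandIdUsed is guaranteed true before any worker-side use.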


> As an aside, SetCurrentCommandIdUsed() in the comment should be SetCurrentCommandIdUsedForWorker().
>

Thanks, I'll fix that in the comments.


>
> (8)
> +               /*
> +                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> +                * the relation, and this would result in creation of new CommandIds
> +                * on insert/update/delete and this isn't supported in a parallel
> +                * worker (but is safe in the parallel leader).
> +                */
> +               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> +               if (trigtype == RI_TRIGGER_FK)
> +               {
> +                       if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> +                               return true;
> +               }
>
> Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs.  See RI_FKey_check() which is called in RI_TRIGGER_FK case.  In there, ri_PerformCheck() is called with the detectNewRows argument set to false, which causes CommandCounterIncrement() to not be called.
>

Hmmm, I'm not sure that you have read and interpreted the patch code correctly.
The existence of an RI_TRIGGER_FK trigger indicates that the table has a foreign key, and an insert into such a table will generate a new CommandId (so we must avoid that, as we don't currently have the technology to support sharing of new command IDs across the participants in the parallel operation). This is what the code comment says; it does not say that such a trigger generates a new command ID.


In addition, the 2nd patch has an explicit test case for this (testing insert into a table that has a FK).

If you have a test case that breaks the existing patch, please let me know.


Regards,
Greg Nancarrow
Fujitsu Australia


RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Greg Nancarrow <gregn4422@gmail.com>
> > (1)
> Yes, you're right the wording is not right (and I don't really like
> the wording used before the patch).
> 
> Perhaps it could say:
> 
> (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
> INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
> of now, other than in the case of INSERT INTO...SELECT, only the leader
> backend
> writes into a completely new table. In the future, we can extend it to
> allow workers for the
> other commands to write into the table. However, to allow parallel
> updates and deletes, we
> have to solve other problems, especially around combo CIDs.)

That looks good to me, thanks.


> > (4)
> You could, but I'm not sure it would make the code easier to read,
> especially for those who don't know !isParallelWorker() means it's a
> parallel leader.
...
> I really can't see a problem. PrepareParallelMode() is only needed
> prior to execution of a parallel plan, so it's not needed for "other
> call sites" using EnterParallelMode().
My frank first impressions were (and are):

* Why do we have to call a separate function for preparation when the actual entering follows immediately?  We can do the necessary preparation in the entering function.
 

* Those who read the parallel index build and parallel VACUUM code for the first time might be startled at the missing PrepareParallelMode() call: "Oh, EnterParallelMode() is called without preparation, unlike the other site I saw the other day. Isn't this a bug?"
 


> Perhaps the name can be changed to disassociate it from generic
> EnterParallelMode() usage. So far, I've only thought of long names
> like: PrepareParallelModePlanExec().
> Ideas?

What PrepareParallelMode() handles is the XID and command ID, which are managed by the access/transam/ module and are not executor-specific. It's natural (or at least not unnatural) that EnterParallelMode() prepares them, because EnterParallelMode() is part of access/transam/.
 


Regards
Takayuki Tsunakawa


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:

On Mon, Jan 25, 2021 at 2:22 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Hi,

After doing some test to cover the code path in the PATCH 0001.
I have some suggestions for the 0002 testcase.


(1)
+                       /* Check parallel-safety of any expressions in the partition key */
+                       if (get_partition_col_attnum(pkey, i) == 0)
+                       {
+                               Node       *check_expr = (Node *) lfirst(partexprs_item);
+
+                               if (max_parallel_hazard_walker(check_expr, context))
+                               {
+                                       table_close(rel, lockmode);
+                                       return true;
+                               }

The test cases do not seem to cover the above code (i.e. when the table has a parallel-unsafe expression in the partition key).

Personally, I use the following sql to cover this:
-----
create table partkey_unsafe_key_expr_t (a int4, b name) partition by range ((fullname_parallel_unsafe('',a::varchar)));
explain (costs off) insert into partkey_unsafe_key_expr_t select unique1, stringu1 from tenk1;
-----


Thanks. It looks like that test case was accidentally missed (since the comment said to test the index expressions, but it actually tested the support functions).
I'll update the test code (and comments) accordingly, using your suggestion. 
 

(2)
I noticed that most of the test cases cover all three cases (parallel safe/unsafe/restricted),
but the index expression tests seem to be missing the parallel-restricted case.
How about adding a test case like:
-----
create or replace function fullname_parallel_restricted(f text, l text) returns text as $$
    begin
        return f || l;
    end;
$$ language plpgsql immutable parallel restricted;

create table names4(index int, first_name text, last_name text);
create index names4_fullname_idx on names4 (fullname_parallel_restricted(first_name, last_name));

--
-- Test INSERT with parallel-restricted index expression
-- (should create a parallel plan)
--
explain (costs off) insert into names4 select * from names;
-----


Thanks, looks like that test case is missing, I'll add it as you suggest.
 
(3)
+               /* Recursively check each partition ... */
+               pdesc = RelationGetPartitionDesc(rel);
+               for (i = 0; i < pdesc->nparts; i++)
+               {
+                       if (rel_max_parallel_hazard_for_modify(pdesc->oids[i],
+                                                                                                               command_type,
+                                                                                                               context,
+                                                                                                               AccessShareLock))
+                       {
+                               table_close(rel, lockmode);
+                               return true;
+                       }
+               }

It seems we do not have a test case covering a parallel-unsafe expression (or similar) in a partition.
How about adding one test case for a parallel-unsafe partition?



OK. I'd have to create a more complex table to test those other potential parallel-safety issues of partitions (beyond what is tested before the recursive call, or the support functions and expressions in the index key), but since it's a recursive call, invoking code that has already been tested, I would not anticipate any problems.


Thanks,

Greg Nancarrow
Fujitsu Australia


Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Jan 25, 2021 at 4:37 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>
> > > (4)
> > You could, but I'm not sure it would make the code easier to read,
> > especially for those who don't know !isParallelWorker() means it's a
> > parallel leader.
> ...
> > I really can't see a problem. PrepareParallelMode() is only needed
> > prior to execution of a parallel plan, so it's not needed for "other
> > call sites" using EnterParallelMode().
> My frank first impressions were (and are):
>
> * Why do we have to call a separate function for preparation when the actual entering follows immediately?  We can do the necessary preparation in the entering function.
>
> * Those who read the parallel index build and parallel VACUUM code for the first time might be startled at the missing PrepareParallelMode() call: "Oh, EnterParallelMode() is called without preparation, unlike the other site I saw the other day.  Isn't this a bug?"
>
>
> > Perhaps the name can be changed to disassociate it from generic
> > EnterParallelMode() usage. So far, I've only thought of long names
> > like: PrepareParallelModePlanExec().
> > Ideas?
>
> What PrepareParallelMode() handles is the XID and command ID, which are managed by the access/transam/ module and are not executor-specific. It's natural (or at least not unnatural) that EnterParallelMode() prepares them, because EnterParallelMode() is part of access/transam/.
>
>

EnterParallelMode() is part of a generic interface for execution of a
parallel operation, and EnterParallelMode() is called in several
different places to enter parallel mode prior to execution of
different parallel operations. At the moment it is assumed that
EnterParallelMode() just essentially sets a flag to prohibit certain
unsafe operations when doing the parallel operation. If I moved
PrepareParallelMode() into EnterParallelMode(), I would need to pass
in contextual information to distinguish who the caller is, possibly
along with extra information needed by that caller - and change the
function call for each caller, probably update the comments for each,
and so on. I think doing this just complicates things. The other callers
of EnterParallelMode() are obviously currently doing their own "pre"
parallel-mode code themselves, specific to whatever parallel operation
they are doing - but nobody has thought it necessary to have to hook
this code into EnterParallelMode().
I think the "PrepareParallelMode()" name can just be changed to
something specific to plan execution, so nobody gets confused with a
name like "PrepareParallelMode()", which as you point out sounds
generic to all callers of EnterParallelMode().
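For reference, EnterParallelMode() itself really is minimal - roughly the following (paraphrased from access/transam/xact.c; exact details may differ by version):

```c
/* Paraphrased sketch of EnterParallelMode() - it essentially just
 * increments a counter on the current transaction state, which is what
 * makes the parallel-mode restrictions take effect.
 */
void
EnterParallelMode(void)
{
    TransactionState s = CurrentTransactionState;

    Assert(s->parallelModeLevel >= 0);

    s->parallelModeLevel++;
}
```

which is why any caller-specific preparation currently lives outside it.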


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Jan 22, 2021 at 7:52 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>
> (1)
> -        * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
> -        * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
> -        * backend writes into a completely new table.  In the future, we can
> -        * extend it to allow workers to write into the table.  However, to allow
> -        * parallel updates and deletes, we have to solve other problems,
> -        * especially around combo CIDs.)
> +        * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
> +        * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
> +        * of now, only the leader backend writes into a completely new table. In
>
> This can read "In the INSERT INTO...SELECT case, like other existing cases, only the leader backend writes into a completely new table."  The reality is that workers as well as the leader can write into an empty or non-empty table in parallel, isn't it?
>
>

Sorry, I've just realized that this is in reference to the 1st patch
(v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch),
which implements parallel SELECT for INSERT.
In that case, data is SELECTed in parallel by the workers, but only
INSERTed by the parallel leader.
So the patch comment is, in fact, correct.
In the 3rd patch
(v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch),
which implements parallel INSERT, the wording for this comment is
again altered, to reflect the fact that parallel workers also write
into the table.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

When reading the code of rel_max_parallel_hazard_for_modify in 0001.

I noticed there are many places that call table_close(), which I find
a little confusing.

Do you think it would be better to do the table_open/close outside of rel_max_parallel_hazard_for_modify()?

Like:

static bool rel_max_parallel_hazard_for_modify(Relation rel,
                                               CmdType command_type,
                                               max_parallel_hazard_context *context);
...
        Relation relation = table_open(rte->relid, NoLock);
        (void) rel_max_parallel_hazard_for_modify(relation, parse->commandType, &context);
        table_close(relation, NoLock);


And with the above definition, we don't seem to need the lockmode parameter.


Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Jan 25, 2021 at 10:40 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> When reading the code of rel_max_parallel_hazard_for_modify in 0001.
>
> I noticed there are many places that call table_close(), which I find
> a little confusing.
>
> Do you think it would be better to do the table_open/close outside of rel_max_parallel_hazard_for_modify()?
>
> Like:
>
> static bool rel_max_parallel_hazard_for_modify(Relation rel,
>                                                CmdType command_type,
>                                                max_parallel_hazard_context *context);
> ...
>         Relation relation = table_open(rte->relid, NoLock);
>         (void) rel_max_parallel_hazard_for_modify(relation, parse->commandType, &context);
>         table_close(relation, NoLock);
>
>
> And with the above definition, we don't seem to need the lockmode parameter.
>
>

Yeah, the repeated cleanup at the point of return is a bit ugly.
It could be solved by changing the function to do cleanup at a common
return point, but I agree with you that in this case it could simply
be done outside the function.
Thanks, I'll make that change.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

I have an issue of the check about column default expressions.

+    if (command_type == CMD_INSERT)
+    {
+        /*
+         * Column default expressions for columns in the target-list are
+         * already being checked for parallel-safety in the
+         * max_parallel_hazard() scan of the query tree in standard_planner().
+         */
+
+        tupdesc = RelationGetDescr(rel);
+        for (attnum = 0; attnum < tupdesc->natts; attnum++)


IMO, max_parallel_hazard() only checks the parent table's default expressions. But if the table has partitions, and a partition has its own default expressions, max_parallel_hazard() does not seem to check those.

And we do not seem to check that either.

I am not sure - should we allow parallel insert in this case?

Example:

-------------------------
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=4;

create table origin(a int);
insert into origin values(generate_series(1,5000));

create or replace function bdefault_unsafe () returns int language plpgsql parallel unsafe as $$ begin
    RETURN 5;
end $$;

create table parttable1 (a int, b name) partition by range (a); create table parttable1_1 partition of parttable1 for
valuesfrom (0) to (5000); create table parttable1_2 partition of parttable1 for values from (5000) to (10000);
 

alter table parttable1_1 ALTER COLUMN b SET DEFAULT bdefault_unsafe();

postgres=# explain insert into parttable1 select * from origin ;
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Gather  (cost=0.00..41.92 rows=5865 width=0)
   Workers Planned: 3
   ->  Insert on parttable1  (cost=0.00..41.92 rows=0 width=0)
         ->  Parallel Seq Scan on origin  (cost=0.00..41.92 rows=1892 width=68)
(4 rows)

postgres=# explain insert into parttable1_1 select * from origin ;
                            QUERY PLAN                             
-------------------------------------------------------------------
 Insert on parttable1_1  (cost=0.00..1348.00 rows=0 width=0)
   ->  Seq Scan on origin  (cost=0.00..1348.00 rows=5000 width=68)
(2 rows)

-------------------------

Best regards,
houzj



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
> IMO, max_parallel_hazard() only checks the parent table's default expressions.
> But if the table has partitions, and a partition has its own default expressions,
> max_parallel_hazard() does not seem to check those.
> And we do not seem to check that either.
> 
> I am not sure - should we allow parallel insert in this case?

I think we can allow parallel insert in this case, because the column value is determined according to the DEFAULT definition of the target table specified in the INSERT statement.  This is described here:

https://www.postgresql.org/docs/devel/sql-createtable.html

"Defaults may be specified separately for each partition. But note that a partition's default value is not applied when inserting a tuple through a partitioned table."

So the parallel-unsafe function should not be called.


Regards
Takayuki Tsunakawa


RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> I think we can allow parallel insert in this case, because the column value
> is determined according to the DEFAULT definition of the target table
> specified in the INSERT statement.  This is described here:
> 
> https://www.postgresql.org/docs/devel/sql-createtable.html
> 
> "Defaults may be specified separately for each partition. But note that
> a partition's default value is not applied when inserting a tuple through
> a partitioned table."
> 
> So the parallel-unsafe function should not be called.

Thanks for the explanation.
I think you are right, I did miss it.

Best regards,
houzj



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

When testing the patch with the following kind of sql.

---
Insert into part_table select 1;
Insert into part_table select generate_series(1,10000,1);
Insert into part_table select * from testfunc();
---

We usually use these kinds of SQL statements to initialize a table or for testing purposes.

Personally I think we do not need to do the parallel-safety check for these cases,
because there seems to be no chance for the select part to be considered for parallelism.

I thought we aimed to not check parallel-safety unless parallelism is possible,
so I was wondering: is it possible to avoid the check in these cases?

I did a quick check of the code. An immature idea is to check whether there is an RTE_RELATION in the query;
if not, we skip the safety check.

I am not sure whether it is worth doing - any thoughts?

Best regards,
Houzj




Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Jan 27, 2021 at 2:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> When testing the patch with the following kind of sql.
>
> ---
> Insert into part_table select 1;
> Insert into part_table select generate_series(1,10000,1);
> Insert into part_table select * from testfunc();
> ---
>
> We usually use these kinds of SQL statements to initialize a table or for testing purposes.
>
> Personally I think we do not need to do the parallel-safety check for these cases,
> because there seems to be no chance for the select part to be considered for parallelism.
>
> I thought we aimed to not check parallel-safety unless parallelism is possible,
> so I was wondering: is it possible to avoid the check in these cases?
>
> I did a quick check of the code. An immature idea is to check whether there is an RTE_RELATION in the query;
> if not, we skip the safety check.
>
> I am not sure whether it is worth doing - any thoughts?
>

Yes, I think it's worth it. It's surprising that there aren't really
any optimizations for these with just the current Postgres parallel
SELECT functionality (as there's currently no way to divide the work
for these amongst the workers, even if the function/expression is
parallel-safe).
For the additional parallel-safety checks for INSERT, currently we
check that RTE_SUBQUERY is in the range-table. So I think we can
additionally check that RTE_RELATION is in the subquery range-table
(otherwise treat it as unsafe).
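To sketch the idea with the kinds of statements discussed above (part_table, testfunc, and some_other_table are the hypothetical names from the earlier examples):

```sql
-- No RTE_RELATION in the subquery's range table: the SELECT part cannot
-- currently be parallelized, so the extra INSERT parallel-safety checks
-- could be skipped entirely.
insert into part_table select 1;
insert into part_table select generate_series(1, 10000, 1);
insert into part_table select * from testfunc();

-- An RTE_RELATION is present in the subquery's range table, so the
-- parallel-safety checks would still be performed.
insert into part_table select * from some_other_table;
```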

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Thanks for the feedback.
Posting an updated set of patches. Changes are based on feedback, as
detailed below:

[Hou]
- Deallocate list returned from RelationGetIndexList() using
list_free() after finished using list
- Regard foreign and temporary tables as parallel-restricted (instead
of parallel unsafe) for Insert
- pfree() conbin returned from TextDatumGetCString() after finished using it
- Make parallel-safety checks of index expressions more efficient,
remove some redundant function calls
- Add a few more test cases to cover certain untested parallel-safety
check cases
- Remove repeated table_close() on return, by moving table_open() &
table_close() to a higher level
- Reduce Insert parallel-safety checks required for some SQL, by
noting that the subquery must operate on a relation (check for
RTE_RELATION in subquery range-table)

[Zhihong Yu]
- Minor change to patch comment
- Wrap long line
- Remove intermediate local variable

[Tsunakawa-san]
- Update RELATION_IS_LOCAL macro to reinstate previously-removed check
on the relation being newly created in the current transaction (and so
assumed accessible only to the current backend), but for
non-parallel-mode only (since now it may be accessible to parallel
workers)
- Remove braces for one-line else
- Fix code comment
- Rename PrepareParallelMode() for plan execution, so that it's not
misinterpreted as a general function for preparation of parallel-mode

[Misc]
- Fix bug in query re-writer - hasModifyingCTE is not set in
re-written non-SELECT queries having a CTE


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
Hi Greg,

Recently, I was keeping evaluating performance of this patch(1/28 V13).
Here I find a regression test case which is parallel insert with bitmap heap scan.
When the target table has a primary key or index, the patched performance shows a 7%-19% decline compared to unpatched.

Could you please have a look about this?

I tried max_parallel_workers_per_gather=2/4/8, and I didn't tune other parameters (like GUCs or other parameters that force parallelism).

1. max_parallel_workers_per_gather=2(default)
target_table        patched       master      %reg
------------------------------------------------------
without_PK_index    83.683        142.183    -41%
with_PK             382.824       321.101    19%
with_index          372.682       324.246    15%

2. max_parallel_workers_per_gather=4
target_table        patched       master      %reg
------------------------------------------------------
without_PK_index    73.189        141.879     -48%
with_PK             362.104       329.759     10%
with_index          372.237       333.718     12%

3. max_parallel_workers_per_gather=8 (also set max_parallel_workers=16, max_worker_processes = 16)
target_table        patched       master      %reg
------------------------------------------------------
without_PK_index    75.072        146.100     -49%
with_PK             365.312       324.339     13%
with_index          362.636       338.366     7%

Attached test_bitmap.sql, which includes my test data and SQL, if you want to have a look.

Regards,
Tang




Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

When developing the reloption patch, I noticed some issues in the patch.

1).
> - Reduce Insert parallel-safety checks required for some SQL, by noting
> that the subquery must operate on a relation (check for RTE_RELATION in
> subquery range-table)

+            foreach(lcSub, rte->subquery->rtable)
+            {
+                rteSub = lfirst_node(RangeTblEntry, lcSub);
+                if (rteSub->rtekind == RTE_RELATION)
+                {
+                    hasSubQueryOnRelation = true;
+                    break;
+                }
+            }
It seems we cannot just search for RTE_RELATION in rtable,
because RTE_RELATION may exist in other places, such as:

---
--** explain insert into target select (select * from test);
    Subplan's subplan

--** with cte as (select * from test) insert into target select * from cte;
    In query's ctelist.
---

Maybe we should use a walker function [1] to
search the subquery and ctelist.

2).

+--
+-- Test INSERT into temporary table with underlying query.
+-- (should not use a parallel plan)
+--

Maybe the comment here needs some change, since
we currently support a parallel plan for temp tables.

3)
Do you think we could add a test case for foreign tables,
to test a parallel query with serial insert on a foreign table?


[1]
static bool
relation_walker(Node *node)
{
    if (node == NULL)
        return false;

    else if (IsA(node, RangeTblEntry))
    {
        RangeTblEntry *rte = (RangeTblEntry *) node;
        if (rte->rtekind == RTE_RELATION)
            return true;
        
        return false;
    }

    else if (IsA(node, Query))
    {
        Query       *query = (Query *) node;

        /* Recurse into subselects */
        return query_tree_walker(query, relation_walker,
                                 NULL, QTW_EXAMINE_RTES_BEFORE);
    }

    /* Recurse to check arguments */
    return expression_tree_walker(node,
                                  relation_walker,
                                  NULL);
}


Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
>
> When developing the reloption patch, I noticed some issues in the patch.
>
> 1).
> > - Reduce Insert parallel-safety checks required for some SQL, by noting
> > that the subquery must operate on a relation (check for RTE_RELATION in
> > subquery range-table)
>
> +                       foreach(lcSub, rte->subquery->rtable)
> +                       {
> +                               rteSub = lfirst_node(RangeTblEntry, lcSub);
> +                               if (rteSub->rtekind == RTE_RELATION)
> +                               {
> +                                       hasSubQueryOnRelation = true;
> +                                       break;
> +                               }
> +                       }
> It seems we cannot just search for RTE_RELATION in rtable,
> because RTE_RELATION may exist in other places, such as:
>
> ---
> --** explain insert into target select (select * from test);
>         Subplan's subplan
>
> --** with cte as (select * from test) insert into target select * from cte;
>         In query's ctelist.
> ---
>
> Maybe we should use a walker function [1] to
> search the subquery and ctelist.
>

Yes, the current checks are too simple, as you point out, there seem
to be more complex cases that it doesn't pick up. Unfortunately
expanding the testing for them does detract from the original
intention of this code (which was to avoid extra parallel-safety check
processing on code which can't be run in parallel). I guess the
relation walker function should additionally check for SELECT queries
only (commandType == CMD_SELECT), and exclude SELECT FOR UPDATE/SHARE
(rowMarks != NIL) too. I'll need to look further into it, but will
certainly update the code for the next version of the patch.
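A rough sketch (untested; built on the walker from [1], and assuming the usual PostgreSQL backend headers) of how those extra conditions might be folded in:

```c
/* Sketch only - not a drop-in implementation. Returns true if the node
 * tree contains a relation, or anything we conservatively treat as if a
 * relation were present (so the parallel-safety checks still run).
 */
static bool
relation_walker(Node *node)
{
    if (node == NULL)
        return false;

    if (IsA(node, RangeTblEntry))
    {
        RangeTblEntry *rte = (RangeTblEntry *) node;

        return (rte->rtekind == RTE_RELATION);
    }

    if (IsA(node, Query))
    {
        Query *query = (Query *) node;

        /* Only plain SELECT queries qualify; conservatively treat
         * anything else, including SELECT FOR UPDATE/SHARE, as unsafe
         * to skip.
         */
        if (query->commandType != CMD_SELECT || query->rowMarks != NIL)
            return true;

        /* Recurse into subselects */
        return query_tree_walker(query, relation_walker,
                                 NULL, QTW_EXAMINE_RTES_BEFORE);
    }

    /* Recurse to check arguments */
    return expression_tree_walker(node, relation_walker, NULL);
}
```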

> 2).
>
> +--
> +-- Test INSERT into temporary table with underlying query.
> +-- (should not use a parallel plan)
> +--
>
> Maybe the comment here needs some change, since
> we currently support a parallel plan for temp tables.
>

Thanks, it should say something like "should create the plan with
INSERT + parallel SELECT".

> 3)
> Do you think we could add a test case for foreign tables,
> to test a parallel query with serial insert on a foreign table?
>

I had intended to do it, but as a lower-priority task.

>
> [1]
> static bool
> relation_walker(Node *node)
> {
>         if (node == NULL)
>                 return false;
>
>         else if (IsA(node, RangeTblEntry))
>         {
>                 RangeTblEntry *rte = (RangeTblEntry *) node;
>                 if (rte->rtekind == RTE_RELATION)
>                         return true;
>
>                 return false;
>         }
>
>         else if (IsA(node, Query))
>         {
>                 Query      *query = (Query *) node;
>
>                 /* Recurse into subselects */
>                 return query_tree_walker(query, relation_walker,
>                                                                  NULL, QTW_EXAMINE_RTES_BEFORE);
>         }
>
>         /* Recurse to check arguments */
>         return expression_tree_walker(node,
>                                                                   relation_walker,
>                                                                   NULL);
> }
>

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> When developing the reloption patch, I noticed some issues in the patch.
>
> 1).
> > - Reduce Insert parallel-safety checks required for some SQL, by noting
> > that the subquery must operate on a relation (check for RTE_RELATION in
> > subquery range-table)
>
> +                       foreach(lcSub, rte->subquery->rtable)
> +                       {
> +                               rteSub = lfirst_node(RangeTblEntry, lcSub);
> +                               if (rteSub->rtekind == RTE_RELATION)
> +                               {
> +                                       hasSubQueryOnRelation = true;
> +                                       break;
> +                               }
> +                       }
> It seems we cannot just search for RTE_RELATION in rtable,
> because RTE_RELATION may exist in other places, such as:
>
> ---
> --** explain insert into target select (select * from test);
>         Subplan's subplan
>
> --** with cte as (select * from test) insert into target select * from cte;
>         In query's ctelist.
> ---
>
> Maybe we should use a walker function [1] to
> search the subquery and ctelist.
>
>
>
> [1]
> static bool
> relation_walker(Node *node)
> {
>         if (node == NULL)
>                 return false;
>
>         else if (IsA(node, RangeTblEntry))
>         {
>                 RangeTblEntry *rte = (RangeTblEntry *) node;
>                 if (rte->rtekind == RTE_RELATION)
>                         return true;
>
>                 return false;
>         }
>
>         else if (IsA(node, Query))
>         {
>                 Query      *query = (Query *) node;
>
>                 /* Recurse into subselects */
>                 return query_tree_walker(query, relation_walker,
>                                                                  NULL, QTW_EXAMINE_RTES_BEFORE);
>         }
>
>         /* Recurse to check arguments */
>         return expression_tree_walker(node,
>                                                                   relation_walker,
>                                                                   NULL);
> }
>

I've had a further look at this, and this walker function is doing a
lot of work recursing the parse tree, and I'm not sure that it
reliably retrieves the information that we're looking for, for all
cases of different SQL queries. Unless it can be made much more
efficient and specific to our needs, I think we should not try to do
this optimization, because there's too much overhead. Also, keep in
mind that for the current parallel SELECT functionality in Postgres, I
don't see any similar optimization being attempted (and such
optimization should be attempted at the SELECT level). So I don't
think we should be attempting such optimization in this patch (but
could be attempted in a separate patch, just related to current
parallel SELECT functionality).

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> 
> I've had a further look at this, and this walker function is doing a lot
> of work recursing the parse tree, and I'm not sure that it reliably retrieves
> the information that we're looking for, for all cases of different SQL
> queries. Unless it can be made much more efficient and specific to our needs,
> I think we should not try to do this optimization, because there's too much
> overhead. Also, keep in mind that for the current parallel SELECT
> functionality in Postgres, I don't see any similar optimization being
> attempted (and such optimization should be attempted at the SELECT level).
> So I don't think we should be attempting such optimization in this patch
> (but could be attempted in a separate patch, just related to current parallel
> SELECT functionality).

Yes, agreed.
I was worried about the overhead it may bring too;
we can remove this from the current patch.

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Feb 2, 2021 at 7:26 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> >
>
> Yes, I agree.
> I was worried about the overhead it may bring too;
> we can remove this from the current patch.
>

Posting an updated set of patches. Changes are based on feedback, as
detailed below:

[Hou]
- Corrected the code that checks for an underlying query on a
relation (we must at least check for an underlying query, to rule out
VALUES). More complex checks were not used, due to their overhead.
- Fixed a comment in the tests for INSERT on a temp table.
[Antonin]
- Moved the Gather node targetlist fix-up code to set_plan_refs() and
updated the comment.
- Added extra tests for INSERT with a RETURNING clause.

Hou: the parallel_dml patches will need slight rebasing

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Amit Kapila <amit.kapila16@gmail.com>
> On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
> <tanghy.fnst@cn.fujitsu.com> wrote:
> > Execute EXPLAIN on Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> >                                                        QUERY PLAN
> >
> ---------------------------------------------------------------------------
> ---------------------------------------------
> >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=44.139..44.140 rows=0 loops=1)
> >    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> >          Output: test_data1.a, test_data1.b
> >          Buffers: shared hit=5
> >  Planning:
> >    Buffers: shared hit=27011
> >  Planning Time: 24.526 ms
> >  Execution Time: 44.981 ms
> >
> > Execute EXPLAIN on non-Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> >                                                        QUERY PLAN
> >
> ---------------------------------------------------------------------------
> ---------------------------------------------
> >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=72.656..72.657 rows=0 loops=1)
> >    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> >          Output: test_data1.a, test_data1.b
> >          Buffers: shared hit=5
> >  Planning:
> >    Buffers: shared hit=72
> >  Planning Time: 0.135 ms
> >  Execution Time: 79.058 ms
> >
> 
> So, the results indicate that after the patch we touch more buffers
> during planning which I think is because of accessing the partition
> information, and during execution, the patch touches fewer buffers for
> the same reason. But why this can reduce the time with patch? I think
> this needs some investigation.

I guess another factor other than shared buffers is the relcache and catcache.  The patched version loads those cached
entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched
version has to gradually build those cache entries during execution.  How can we confirm its effect?


Regards
Takayuki Tsunakawa



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Feb 4, 2021 at 6:26 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>         From: Amit Kapila <amit.kapila16@gmail.com>
> > On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
> > <tanghy.fnst@cn.fujitsu.com> wrote:
> > > Execute EXPLAIN on Patched:
> > > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> > select * from test_data1;
> > >                                                        QUERY PLAN
> > >
> > ---------------------------------------------------------------------------
> > ---------------------------------------------
> > >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> > time=44.139..44.140 rows=0 loops=1)
> > >    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> > >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> > width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> > >          Output: test_data1.a, test_data1.b
> > >          Buffers: shared hit=5
> > >  Planning:
> > >    Buffers: shared hit=27011
> > >  Planning Time: 24.526 ms
> > >  Execution Time: 44.981 ms
> > >
> > > Execute EXPLAIN on non-Patched:
> > > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> > select * from test_data1;
> > >                                                        QUERY PLAN
> > >
> > ---------------------------------------------------------------------------
> > ---------------------------------------------
> > >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> > time=72.656..72.657 rows=0 loops=1)
> > >    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> > >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> > width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> > >          Output: test_data1.a, test_data1.b
> > >          Buffers: shared hit=5
> > >  Planning:
> > >    Buffers: shared hit=72
> > >  Planning Time: 0.135 ms
> > >  Execution Time: 79.058 ms
> > >
> >
> > So, the results indicate that after the patch we touch more buffers
> > during planning which I think is because of accessing the partition
> > information, and during execution, the patch touches fewer buffers for
> > the same reason. But why this can reduce the time with patch? I think
> > this needs some investigation.
>
> I guess another factor other than shared buffers is the relcache and catcache.  The patched version loads those cached
> entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched
> version has to gradually build those cache entries during execution.
>

Right.

>  How can we confirm its effect?
>

I am not sure, but if your theory is correct, then shouldn't both
have the same performance in consecutive runs?
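
For example, something along these lines (a sketch only; the table names
are the ones from Tang's test upthread) could check that:

```sql
-- Sketch only: run the identical INSERT twice in one session.  If the
-- relcache/catcache warm-up theory is right, the second run's Execution
-- Time should be similar for the patched and unpatched builds.
EXPLAIN (ANALYZE, BUFFERS) INSERT INTO test_part SELECT * FROM test_data1;
EXPLAIN (ANALYZE, BUFFERS) INSERT INTO test_part SELECT * FROM test_data1;
```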


--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Feb 4, 2021 at 11:56 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> >
> > So, the results indicate that after the patch we touch more buffers
> > during planning which I think is because of accessing the partition
> > information, and during execution, the patch touches fewer buffers for
> > the same reason. But why this can reduce the time with patch? I think
> > this needs some investigation.
>
> I guess another factor other than shared buffers is the relcache and catcache.  The patched version loads those cached
> entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched
> version has to gradually build those cache entries during execution.  How can we confirm its effect?
>

I believe that we can confirm its effect by invalidating relcache and
catcache, in both the patched and unpatched versions, just after the
parallel-safety checks are performed in the planner, and then running
tests and comparing the performance.

So that's exactly what I did (adding a call to
InvalidateSystemCaches() just after the parallel-safety checks in the
planner).
I found that the unpatched version then always performed better than
the patched version for tests inserting 1000 records into a table with
100, 200, 500, and 1000 partitions.
Looking at the breakdown of the timing for each Insert, the Planning
Time was always significantly more for the patched version (expected,
because it does extra checks), but the Execution Time was very similar
for both the patched and unpatched versions.
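
For reference, the kind of test I ran looks roughly like this (a sketch;
the partition layout and names here are illustrative, not the exact
script used):

```sql
-- Illustrative sketch: a range-partitioned table with 100 partitions,
-- inserting 1000 rows, timed with EXPLAIN ANALYZE.
DROP TABLE IF EXISTS test_part CASCADE;
CREATE TABLE test_part (a int, b int) PARTITION BY RANGE (a);

-- generate the partition DDL; \gexec is a psql meta-command that
-- executes each row of the result as a statement
SELECT format('CREATE TABLE test_part_%s PARTITION OF test_part FOR VALUES FROM (%s) TO (%s);',
              i, (i - 1) * 10, i * 10)
FROM generate_series(1, 100) AS i \gexec

EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO test_part SELECT i, i FROM generate_series(0, 999) AS i;
```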

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
Hi,

I took a look into the hasModifyingCTE bugfix recently,
and found a possible bug case that occurs even without the parallel INSERT patch.

---------------------------------
drop table if exists test_data1;
create table test_data1(a int, b int) ;
insert into test_data1 select generate_series(1,1000), generate_series(1,1000);
set force_parallel_mode=on;

CREATE TEMP TABLE bug6051 AS
  select i from generate_series(1,3) as i;

SELECT * FROM bug6051;
CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as i from test_data1;

WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051 SELECT * FROM t1;

*******
***ERROR:  cannot assign XIDs during a parallel operation
*******
---------------------------------

I debugged it, and the parse tree did indeed contain a modifying CTE after rewriting.
I think if we can properly set hasModifyingCTE, we can avoid this error by not considering parallelism in this case.

Thoughts ?

Best regards,
houzj






Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 5, 2021 at 2:58 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> I took a look into the hasModifyingCTE bugfix recently,
> and found a possible bug case without the parallel insert patch.
>
> ---------------------------------
> drop table if exists test_data1;
> create table test_data1(a int, b int) ;
> insert into test_data1 select generate_series(1,1000), generate_series(1,1000);
> set force_parallel_mode=on;
>
> CREATE TEMP TABLE bug6051 AS
>   select i from generate_series(1,3) as i;
>
> SELECT * FROM bug6051;
> CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as i from test_data1;
>
> WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051 SELECT * FROM t1;
>
> *******
> ***ERROR:  cannot assign XIDs during a parallel operation
> *******
> ---------------------------------
>
> I debugged it, and the parse tree did indeed contain a modifying CTE after rewriting.
> I think if we can properly set hasModifyingCTE, we can avoid this error by not considering parallelism in this case.
>

Thanks. You've identified that the bug exists for SELECT too. I've
verified that the issue is fixed by the bugfix included in the
Parallel INSERT patch.
Are you able to review my bugfix?
Since the problem exists for SELECT in the current Postgres code, I'd
like to pull that bugfix out and provide it as a separate fix.
My concern is that there may well be a better way to fix the issue -
for example, during the re-writing, rather than after the query has
been re-written.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > I took a look into the hasModifyingCTE bugfix recently, and found a
> > possible bug case without the parallel insert patch.
> >
> > ---------------------------------
> > drop table if exists test_data1;
> > create table test_data1(a int, b int) ; insert into test_data1 select
> > generate_series(1,1000), generate_series(1,1000); set
> > force_parallel_mode=on;
> >
> > CREATE TEMP TABLE bug6051 AS
> >   select i from generate_series(1,3) as i;
> >
> > SELECT * FROM bug6051;
> > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as
> > i from test_data1;
> >
> > WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051
> > SELECT * FROM t1;
> >
> > *******
> > ***ERROR:  cannot assign XIDs during a parallel operation
> > *******
> > ---------------------------------
> >
> > I debugged it, and the parse tree did indeed contain a modifying CTE after rewriting.
> > I think if we can properly set hasModifyingCTE, we can avoid this
> > error by not considering parallelism in this case.
> >
> 
> Thanks. You've identified that the bug exists for SELECT too. I've verified
> that the issue is fixed by the bugfix included in the Parallel INSERT patch.
> Are you able to review my bugfix?
> Since the problem exists for SELECT in the current Postgres code, I'd like
> to pull that bugfix out and provide it as a separate fix.
> My concern is that there may well be a better way to fix the issue - for
> example, during the re-writing, rather than after the query has been
> re-written.
Hi,

I took a look at the fix and have some thoughts on it.
(Please correct me if you have already tried this idea and found something wrong with it.)

IMO, the main reason hasModifyingCTE ends up false is that:
the rewriter did not update hasModifyingCTE when copying the existing 'cteList' to the rewritten query.

It seems there is only one place where the cteList is copied separately.
-------
static Query *
rewriteRuleAction(Query *parsetree,
...
        /* OK, it's safe to combine the CTE lists */
        sub_action->cteList = list_concat(sub_action->cteList,
                                          copyObject(parsetree->cteList));
+        sub_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
--------

Based on the above, if we update the hasModifyingCTE here, we may solve this problem.

And there is another point here: sub_action may not be the final parse tree.
If the rule is defined like "DO INSTEAD INSERT INTO xx SELECT xx FROM xx", the rewriter will
put the cteList into a subquery in the parse tree's rtable.
In this case, we may also need to update the final parse tree too.
(I think you know this case; I found the same logic in the latest patch.)
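
To illustrate the case being described, a rule of that shape might look
like this (the table and rule names are hypothetical):

```sql
-- Hypothetical example of the "DO INSTEAD INSERT ... SELECT" case:
-- the rewriter places the OLD/NEW entries (and the copied CTE list) in
-- the SELECT subquery, so the top-level INSERT (rule_action) is a
-- different Query node from sub_action and needs its hasModifyingCTE
-- flag updated separately.
CREATE TABLE src (a int);
CREATE TABLE archive (a int);
CREATE RULE src_ins AS ON INSERT TO src
    DO INSTEAD INSERT INTO archive SELECT NEW.a;
```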

--------
static Query *
rewriteRuleAction(Query *parsetree,
...
        if (sub_action_ptr)
+        {
            *sub_action_ptr = sub_action;
+            rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
+        }
--------

And the basic tests passed.
What do you think?

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
Hi,

While reviewing the v14 set of patches (will send my comments
shortly), I too had some reservations on how 0001 decided to go about
setting hasModifyingCTE.

On Fri, Feb 5, 2021 at 1:51 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > > I took a look into the hasModifyingCTE bugfix recently, and found a
> > > possible bug case without the parallel insert patch.
> > >
> > > ---------------------------------
> > > drop table if exists test_data1;
> > > create table test_data1(a int, b int) ; insert into test_data1 select
> > > generate_series(1,1000), generate_series(1,1000); set
> > > force_parallel_mode=on;
> > >
> > > CREATE TEMP TABLE bug6051 AS
> > >   select i from generate_series(1,3) as i;
> > >
> > > SELECT * FROM bug6051;
> > > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as
> > > i from test_data1;
> > >
> > > WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051
> > > SELECT * FROM t1;
> > >
> > > *******
> > > ***ERROR:  cannot assign XIDs during a parallel operation
> > > *******
> > > ---------------------------------
> > >
> > > I debugged it, and the parse tree did indeed contain a modifying CTE after rewriting.
> > > I think if we can properly set hasModifyingCTE, we can avoid this
> > > error by not considering parallelism in this case.
> > >
> >
> > Thanks. You've identified that the bug exists for SELECT too. I've verified
> > that the issue is fixed by the bugfix included in the Parallel INSERT patch.
> > Are you able to review my bugfix?
> > Since the problem exists for SELECT in the current Postgres code, I'd like
> > to pull that bugfix out and provide it as a separate fix.

+1, a separate patch for this seems better.

> > My concern is that there may well be a better way to fix the issue - for
> > example, during the re-writing, rather than after the query has been
> > re-written.
> Hi,
>
> I took a look at the fix and have some thoughts on it.
> (Please correct me if you have tried this idea and found something is wrong)
>
> IMO, the main reason hasModifyingCTE ends up false is that:
> the rewriter did not update hasModifyingCTE when copying the existing 'cteList' to the rewritten query.
>
> It seems there is only one place where ctelist will be copied separately.
> -------
> static Query *
> rewriteRuleAction(Query *parsetree,
> ...
>                 /* OK, it's safe to combine the CTE lists */
>                 sub_action->cteList = list_concat(sub_action->cteList,
>                                                                                   copyObject(parsetree->cteList));
> +               sub_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> --------
>
> Based on the above, if we update the hasModifyingCTE here, we may solve this problem.
>
> And there is another point here: sub_action may not be the final parse tree.
> If the rule is defined like "DO INSTEAD INSERT INTO xx SELECT xx FROM xx", the rewriter will
> put the cteList into a subquery in the parse tree's rtable.
> In this case, we may also need to update the final parse tree too.
> (I think you know this case; I found the same logic in the latest patch.)
>
> --------
> static Query *
> rewriteRuleAction(Query *parsetree,
> ...
>                 if (sub_action_ptr)
> +               {
>                         *sub_action_ptr = sub_action;
> +                       rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> +               }
> --------
>
> And the basic tests passed.
> What do you think?

That is very close to what I was going to suggest, which is this:

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..3c4417af98 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
                checkExprHasSubLink((Node *) rule_action->returningList);
    }

+   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
+
    return rule_action;
 }

-- 
Amit Langote
EDB: http://www.enterprisedb.com



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > --------
> > static Query *
> > rewriteRuleAction(Query *parsetree,
> > ...
> >                 if (sub_action_ptr)
> > +               {
> >                         *sub_action_ptr = sub_action;
> > +                       rule_action->hasModifyingCTE |=
> parsetree->hasModifyingCTE;
> > +               }
> > --------
> >
> > And the basic tests passed.
> > What do you think?
> 
> That is very close to what I was going to suggest, which is this:
> 
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> index 0672f497c6..3c4417af98 100644
> --- a/src/backend/rewrite/rewriteHandler.c
> +++ b/src/backend/rewrite/rewriteHandler.c
> @@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
>                 checkExprHasSubLink((Node *)
> rule_action->returningList);
>     }
> 
> +   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> +
>     return rule_action;
>  }


    if (parsetree->cteList != NIL && sub_action->commandType != CMD_UTILITY)
    {
    ...
        sub_action->cteList = list_concat(sub_action->cteList,
    }

Is it possible for sub_action to be CMD_UTILITY?
In this case the CTE will be copied to the new one; should we set the flag in this case?

Best regard,
houzj




RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > > --------
> > > static Query *
> > > rewriteRuleAction(Query *parsetree,
> > > ...
> > >                 if (sub_action_ptr)
> > > +               {
> > >                         *sub_action_ptr = sub_action;
> > > +                       rule_action->hasModifyingCTE |=
> > parsetree->hasModifyingCTE;
> > > +               }
> > > --------
> > >
> > > And the basic tests passed.
> > > What do you think?
> >
> > That is very close to what I was going to suggest, which is this:
> >
> > diff --git a/src/backend/rewrite/rewriteHandler.c
> > b/src/backend/rewrite/rewriteHandler.c
> > index 0672f497c6..3c4417af98 100644
> > --- a/src/backend/rewrite/rewriteHandler.c
> > +++ b/src/backend/rewrite/rewriteHandler.c
> > @@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
> >                 checkExprHasSubLink((Node *)
> > rule_action->returningList);
> >     }
> >
> > +   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> > +
> >     return rule_action;
> >  }
> 
> 
>     if (parsetree->cteList != NIL && sub_action->commandType !=
> CMD_UTILITY)
>     {
>     ...
>         sub_action->cteList = list_concat(sub_action->cteList,
>     }
> 
> Is it possible for sub_action to be CMD_UTILITY?
> In this case the CTE will be copied to the new one; should we set the
> flag in this case?

Sorry, a typo in my previous mail.
In this case the CTE will not be copied to the new one; should we set the flag in this case?

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 5, 2021 at 4:25 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > >
> > > That is very close to what I was going to suggest, which is this:
> > >
> > > diff --git a/src/backend/rewrite/rewriteHandler.c
> > > b/src/backend/rewrite/rewriteHandler.c
> > > index 0672f497c6..3c4417af98 100644
> > > --- a/src/backend/rewrite/rewriteHandler.c
> > > +++ b/src/backend/rewrite/rewriteHandler.c
> > > @@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
> > >                 checkExprHasSubLink((Node *)
> > > rule_action->returningList);
> > >     }
> > >
> > > +   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> > > +
> > >     return rule_action;
> > >  }
> >
> >
> >       if (parsetree->cteList != NIL && sub_action->commandType !=
> > CMD_UTILITY)
> >       {
> >       ...
> >               sub_action->cteList = list_concat(sub_action->cteList,
> >       }
> >
> > Is it possible for sub_action to be CMD_UTILITY?
> > In this case the CTE will be copied to the new one; should we set the
> > flag in this case?
>
> Sorry, a typo in my previous mail.
> In this case the CTE will not be copied to the new one; should we set the flag in this case?
>

No, strictly speaking, we probably shouldn't, because the CTE wasn't
copied in that case.
Also, I know the bitwise OR "works" in this case, but I think some
will frown on using it for a bool.
IMHO better to use:

   if (parsetree->hasModifyingCTE)
       rule_action->hasModifyingCTE = true;

So patch might be something like:

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..a989e02925 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -557,6 +557,8 @@ rewriteRuleAction(Query *parsetree,
         /* OK, it's safe to combine the CTE lists */
         sub_action->cteList = list_concat(sub_action->cteList,
                                           copyObject(parsetree->cteList));
+        if (parsetree->hasModifyingCTE)
+            sub_action->hasModifyingCTE = true;
     }

     /*
@@ -594,6 +596,9 @@ rewriteRuleAction(Query *parsetree,
             *sub_action_ptr = sub_action;
         else
             rule_action = sub_action;
+
+        if (parsetree->hasModifyingCTE)
+            sub_action->hasModifyingCTE = true;
     }

     /*

I'll do some further checks, because the rewriting is recursive and
tricky, so don't want to miss any cases ...

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Fri, Feb 5, 2021 at 2:55 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 4:25 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > > > That is very close to what I was going to suggest, which is this:
> > > >
> > > > diff --git a/src/backend/rewrite/rewriteHandler.c
> > > > b/src/backend/rewrite/rewriteHandler.c
> > > > index 0672f497c6..3c4417af98 100644
> > > > --- a/src/backend/rewrite/rewriteHandler.c
> > > > +++ b/src/backend/rewrite/rewriteHandler.c
> > > > @@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
> > > >                 checkExprHasSubLink((Node *)
> > > > rule_action->returningList);
> > > >     }
> > > >
> > > > +   rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> > > > +
> > > >     return rule_action;
> > > >  }
> > >
> > >
> > >       if (parsetree->cteList != NIL && sub_action->commandType !=
> > > CMD_UTILITY)
> > >       {
> > >       ...
> > >               sub_action->cteList = list_concat(sub_action->cteList,
> > >       }
> > >
> > > Is it possible for sub_action to be CMD_UTILITY?
> > > In this case the CTE will be copied to the new one; should we set the
> > > flag in this case?
> >
> > Sorry, a typo in my previous mail.
> > In this case the CTE will not be copied to the new one; should we set the flag in this case?
> >
>
> No, strictly speaking, we probably shouldn't, because the CTE wasn't
> copied in that case.

Right.

> Also, I know the bitwise OR "works" in this case, but I think some
> will frown on use of that for a bool.
> IMHO better to use:
>
>    if (parsetree->hasModifyingCTE)
>        rule_action->hasModifyingCTE = true;
>
> So patch might be something like:
>
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> index 0672f497c6..a989e02925 100644
> --- a/src/backend/rewrite/rewriteHandler.c
> +++ b/src/backend/rewrite/rewriteHandler.c
> @@ -557,6 +557,8 @@ rewriteRuleAction(Query *parsetree,
>          /* OK, it's safe to combine the CTE lists */
>          sub_action->cteList = list_concat(sub_action->cteList,
>                                            copyObject(parsetree->cteList));
> +        if (parsetree->hasModifyingCTE)
> +            sub_action->hasModifyingCTE = true;
>      }
>
>      /*
> @@ -594,6 +596,9 @@ rewriteRuleAction(Query *parsetree,
>              *sub_action_ptr = sub_action;
>          else
>              rule_action = sub_action;
> +
> +        if (parsetree->hasModifyingCTE)
> +            sub_action->hasModifyingCTE = true;
>      }

That may be better.

BTW, the original query's cteList is copied into sub_action query but
not into rule_action for reasons I haven't looked very closely into,
even though we'd like to ultimately set the latter's hasModifyingCTE
to reflect the original query's, right?  So we should do the following
at some point before returning:

if (sub_action->hasModifyingCTE)
    rule_action->hasModifyingCTE = true;

> I'll do some further checks, because the rewriting is recursive and
> tricky, so don't want to miss any cases ...

Always a good idea.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 5, 2021 at 5:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
>
> BTW, the original query's cteList is copied into sub_action query but
> not into rule_action for reasons I haven't looked very closely into,
> even though we'd like to ultimately set the latter's hasModifyingCTE
> to reflect the original query's, right?  So we should do the following
> at some point before returning:
>
> if (sub_action->hasModifyingCTE)
>     rule_action->hasModifyingCTE = true;
>

Actually, rule_action will usually point to sub_action (in which case,
no need to copy to rule_action), except if the rule action is an
INSERT...SELECT, which seems to be handled by some "kludge" according
to the following comment (and KLUDGE ALERT comment in the function
that is called):

    /*
     * Adjust rule action and qual to offset its varnos, so that we can merge
     * its rtable with the main parsetree's rtable.
     *
     * If the rule action is an INSERT...SELECT, the OLD/NEW rtable entries
     * will be in the SELECT part, and we have to modify that rather than the
     * top-level INSERT (kluge!).
     */
    sub_action = getInsertSelectQuery(rule_action, &sub_action_ptr);

So in that case (sub_action_ptr != NULL), within rule_action there is
a pointer to sub_action (RTE for the subquery), so whenever sub_action
is re-created, this pointer needs to be fixed-up.
It looks like I might need to copy hasModifyingCTE back to rule_action
in this case - but not 100% sure on it yet - still checking that. All
tests run so far pass without doing that though.
This is one reason for my original approach (though I admit, it was
not optimal) because at least it was reliable and detected the
modifyingCTE after all the rewriting and kludgy code had finished.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Fri, Feb 5, 2021 at 4:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 5:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > BTW, the original query's cteList is copied into sub_action query but
> > not into rule_action for reasons I haven't looked very closely into,
> > even though we'd like to ultimately set the latter's hasModifyingCTE
> > to reflect the original query's, right?  So we should do the following
> > at some point before returning:
> >
> > if (sub_action->hasModifyingCTE)
> >     rule_action->hasModifyingCTE = true;
>
> Actually, rule_action will usually point to sub_action (in which case,
> no need to copy to rule_action), except if the rule action is an
> INSERT...SELECT, which seems to be handled by some "kludge" according
> to the following comment (and KLUDGE ALERT comment in the function
> that is called):
>
>     /*
>      * Adjust rule action and qual to offset its varnos, so that we can merge
>      * its rtable with the main parsetree's rtable.
>      *
>      * If the rule action is an INSERT...SELECT, the OLD/NEW rtable entries
>      * will be in the SELECT part, and we have to modify that rather than the
>      * top-level INSERT (kluge!).
>      */
>     sub_action = getInsertSelectQuery(rule_action, &sub_action_ptr);
>
> So in that case (sub_action_ptr != NULL), within rule_action there is
> a pointer to sub_action (RTE for the subquery), so whenever sub_action
> is re-created, this pointer needs to be fixed-up.
> It looks like I might need to copy hasModifyingCTE back to rule_action
> in this case - but not 100% sure on it yet - still checking that. All
> tests run so far pass without doing that though.

I guess we just don't have a test case where the rule_action query is
actually parallelized, like the one houzj shared a few emails ago.

> This is one reason for my original approach (though I admit, it was
> not optimal) because at least it was reliable and detected the
> modifyingCTE after all the rewriting and kludgy code had finished.

Yeah it's hard to go through all of this highly recursive legacy code
to be sure that hasModifyingCTE is consistent with reality in *all*
cases, but let's try to do it.  No other has* flags are set
after-the-fact, so I wouldn't bet on a committer letting this one
through.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 5, 2021 at 8:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
>
> > This is one reason for my original approach (though I admit, it was
> > not optimal) because at least it was reliable and detected the
> > modifyingCTE after all the rewriting and kludgy code had finished.
>
> Yeah it's hard to go through all of this highly recursive legacy code
> to be sure that hasModifyingCTE is consistent with reality in *all*
> cases, but let's try to do it.  No other has* flags are set
> after-the-fact, so I wouldn't bet on a committer letting this one
> through.
>

I have debugged the code a bit more now, and the following patch seems
to correctly fix the issue, at least for the known test cases.
(i.e. SELECT case, shared by houzj, and the INSERT...SELECT case, as
in the "with" regression tests, for which I originally detected the
issue)

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..8f695b32ec 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -557,6 +557,12 @@ rewriteRuleAction(Query *parsetree,
         /* OK, it's safe to combine the CTE lists */
         sub_action->cteList = list_concat(sub_action->cteList,
                                           copyObject(parsetree->cteList));
+        if (parsetree->hasModifyingCTE)
+        {
+            sub_action->hasModifyingCTE = true;
+            if (sub_action_ptr)
+                rule_action->hasModifyingCTE = true;
+        }
     }

     /*

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Fri, Feb 5, 2021 at 6:56 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 8:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > This is one reason for my original approach (though I admit, it was
> > > not optimal) because at least it was reliable and detected the
> > > modifyingCTE after all the rewriting and kludgy code had finished.
> >
> > Yeah it's hard to go through all of this highly recursive legacy code
> > to be sure that hasModifyingCTE is consistent with reality in *all*
> > cases, but let's try to do it.  No other has* flags are set
> > after-the-fact, so I wouldn't bet on a committer letting this one
> > through.
>
> I have debugged the code a bit more now, and the following patch seems
> to correctly fix the issue, at least for the known test cases.
> (i.e. SELECT case, shared by houzj, and the INSERT...SELECT case, as
> in the "with" regression tests, for which I originally detected the
> issue)
>
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> index 0672f497c6..8f695b32ec 100644
> --- a/src/backend/rewrite/rewriteHandler.c
> +++ b/src/backend/rewrite/rewriteHandler.c
> @@ -557,6 +557,12 @@ rewriteRuleAction(Query *parsetree,
>          /* OK, it's safe to combine the CTE lists */
>          sub_action->cteList = list_concat(sub_action->cteList,
>                                            copyObject(parsetree->cteList));
> +        if (parsetree->hasModifyingCTE)
> +        {
> +            sub_action->hasModifyingCTE = true;
> +            if (sub_action_ptr)
> +                rule_action->hasModifyingCTE = true;
> +        }
>      }

That seems good enough as far as I am concerned.  Although either an
Assert as follows or a comment explaining why the if (sub_action_ptr)
check is needed seems warranted.

if (sub_action_ptr)
    rule_action->hasModifyingCTE = true;
else
    Assert(sub_action == rule_action);

Does the Assert seem overly confident?

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 5, 2021 at 11:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
>
> That seems good enough as far as I am concerned.   Although either an
> Assert as follows or a comment why the if (sub_action_ptr) is needed
> seems warranted.
>
> if (sub_action_ptr)
>     rule_action->hasModifyingCTE = true;
> else
>     Assert(sub_action == rule_action);
>
> Does the Assert seem overly confident?
>

No, the Assert is exactly right, and I'll add a comment too.
See below.
I'll post the patch separately, if you can't see any further issues.


diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..05b80bd347 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -557,6 +557,21 @@ rewriteRuleAction(Query *parsetree,
        /* OK, it's safe to combine the CTE lists */
        sub_action->cteList = list_concat(sub_action->cteList,
                                          copyObject(parsetree->cteList));
+
+       /*
+        * If the hasModifyingCTE flag is set in the source parsetree from
+        * which the CTE list is copied, the flag needs to be set in the
+        * sub_action and, if applicable, in the rule_action (INSERT...SELECT
+        * case).
+        */
+       if (parsetree->hasModifyingCTE)
+       {
+           sub_action->hasModifyingCTE = true;
+           if (sub_action_ptr)
+               rule_action->hasModifyingCTE = true;
+           else
+               Assert(sub_action == rule_action);
+       }
    }

    /*


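The effect of the patched logic above can be sketched in isolation. When the rule action's statement has been pushed down into a subquery (the INSERT...SELECT case), sub_action_ptr is non-NULL and rule_action is the outer query wrapping sub_action; otherwise the two are the same node. Below is a standalone mock of that propagation rule, using simplified stand-in structs rather than the actual PostgreSQL Query node:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for PostgreSQL's Query node (illustration only). */
typedef struct Query
{
    bool        hasModifyingCTE;
} Query;

/*
 * Mirror of the patch's logic: the flag is always propagated to the query
 * that received the copied CTE list (sub_action); when sub_action is a
 * subquery embedded inside rule_action (sub_action_ptr != NULL), the
 * outer rule_action must be flagged as well, since callers inspect it.
 */
static void
propagate_modifying_cte(const Query *parsetree, Query *rule_action,
                        Query *sub_action, Query **sub_action_ptr)
{
    if (parsetree->hasModifyingCTE)
    {
        sub_action->hasModifyingCTE = true;
        if (sub_action_ptr)
            rule_action->hasModifyingCTE = true;
        else
            assert(sub_action == rule_action);
    }
}
```

In the INSERT...SELECT case both the inner and outer queries end up flagged; in the simple case the single node is flagged once, and the Assert documents that the two pointers must then coincide.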
Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Fri, Feb 5, 2021 at 11:01 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Feb 5, 2021 at 11:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > That seems good enough as far as I am concerned.   Although either an
> > Assert as follows or a comment why the if (sub_action_ptr) is needed
> > seems warranted.
> >
> > if (sub_action_ptr)
> >     rule_action->hasModifyingCTE = true;
> > else
> >     Assert(sub_action == rule_action);
> >
> > Does the Assert seem overly confident?
>
> No, the Assert is exactly right, and I'll add a comment too.
> See below.
> I'll post the patch separately, if you can't see any further issues.
>
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> index 0672f497c6..05b80bd347 100644
> --- a/src/backend/rewrite/rewriteHandler.c
> +++ b/src/backend/rewrite/rewriteHandler.c
> @@ -557,6 +557,21 @@ rewriteRuleAction(Query *parsetree,
>         /* OK, it's safe to combine the CTE lists */
>         sub_action->cteList = list_concat(sub_action->cteList,
>                                           copyObject(parsetree->cteList));
> +
> +       /*
> +        * If the hasModifyingCTE flag is set in the source parsetree from
> +        * which the CTE list is copied, the flag needs to be set in the
> +        * sub_action and, if applicable, in the rule_action (INSERT...SELECT
> +        * case).
> +        */
> +       if (parsetree->hasModifyingCTE)
> +       {
> +           sub_action->hasModifyingCTE = true;
> +           if (sub_action_ptr)
> +               rule_action->hasModifyingCTE = true;
> +           else
> +               Assert(sub_action == rule_action);
> +       }
>     }

LGTM, thank you.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Posting an updated set of patches.
The only update is an improved, but only temporary, fix for the query
rewriter hasModifyingCTE issue. (I separately posted a patch for this,
but Tom Lane concluded that the issue is more complex than initially
thought and no easy fix could be confidently attempted at this point
in time, so nothing ended up getting pushed; the issue needs further
investigation at another time.)
See also comment in patch 0001.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> Posting an updated set of patches.

A minor comment about doc.

+  <para>
+    Where the above target table features are determined to be, at worst,
+    parallel-restricted, rather than parallel-unsafe, at least a parallel table
+    scan may be used in the query plan for the <literal>INSERT</literal>
+    statement. For more information about Parallel Safety, see
+    <xref linkend="parallel-safety"/>.
+  </para>

It seems this does not mention that if the target table is a foreign or temporary table, a parallel table scan may still be used.

So how about:

+  <para>
+    Where the target table is a foreign/temporary table or the above target table features
+    are determined to be, at worst, parallel-restricted, rather than parallel-unsafe,
+    at least a parallel table scan may be used in the query plan for the
+    <literal>INSERT</literal> statement. For more information about Parallel Safety,
+    see <xref linkend="parallel-safety"/>.
+  </para>

Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 1, 2021 at 7:20 PM Tang, Haiying <tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Greg,
>
> Recently, I have been evaluating the performance of this patch (1/28 V13).
> Here I found a regression in a test case involving parallel insert with a bitmap heap scan:
> when the target table has a primary key or index, the patched performance shows a 7%-19%
> decline compared to unpatched.
>
> Could you please have a look about this?
>
> I tried max_parallel_workers_per_gather=2/4/8, and I didn't tune other parameters (like GUCs
> or other settings that force parallelism).
>
> 1. max_parallel_workers_per_gather=2(default)
> target_table        patched       master      %reg
> ------------------------------------------------------
> without_PK_index    83.683        142.183    -41%
> with_PK             382.824       321.101    19%
> with_index          372.682       324.246    15%
>
> 2. max_parallel_workers_per_gather=4
> target_table        patched       master      %reg
> ------------------------------------------------------
> without_PK_index    73.189        141.879     -48%
> with_PK             362.104       329.759     10%
> with_index          372.237       333.718     12%
>
> 3. max_parallel_workers_per_gather=8 (also set max_parallel_workers=16, max_worker_processes = 16)
> target_table        patched       master      %reg
> ------------------------------------------------------
> without_PK_index    75.072        146.100     -49%
> with_PK             365.312       324.339     13%
> with_index          362.636       338.366     7%
>
> Attached test_bitmap.sql which includes my test data and sql if you want to have a look.
>

Hi,

Did it actually use a parallel plan in your testing?
When I ran these tests with the Parallel INSERT patch applied, it did
not naturally choose a parallel plan for any of these cases.
So we can hardly blame the parallel insert with bitmap heap scan for
having worse performance, when based on costings, it doesn't actually
choose to use a parallel plan in this case.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 8, 2021 at 6:00 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > Posting an updated set of patches.
>
> A minor comment about doc.
>
> +  <para>
> +    Where the above target table features are determined to be, at worst,
> +    parallel-restricted, rather than parallel-unsafe, at least a parallel table
> +    scan may be used in the query plan for the <literal>INSERT</literal>
> +    statement. For more information about Parallel Safety, see
> +    <xref linkend="parallel-safety"/>.
> +  </para>
>
> It seems does not mention that if target table is a foreign/temp table, a parallel table scan may be used.
>
> So how about:
>
> +  <para>
> +    Where the target table is a foreign/temporary table or the above target table features
> +    are determined to be, at worst, parallel-restricted, rather than parallel-unsafe,
> +    at least a parallel table scan may be used in the query plan for the
> +    <literal>INSERT</literal> statement. For more information about Parallel Safety,
> +    see <xref linkend="parallel-safety"/>.
> +  </para>
>

Thanks. You're right, I should probably update the docs to clarify
those two cases.
(I had removed them from the list of parallel-unsafe things, but not
pointed out that a parallel table scan could still be used in these
cases).

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Tang, Haiying"
Date:
> Did it actually use a parallel plan in your testing?
> When I ran these tests with the Parallel INSERT patch applied, it did 
> not naturally choose a parallel plan for any of these cases.

Yes, these cases pick a parallel plan naturally in my test environment.

postgres=# explain verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
   Workers Planned: 4
   ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0 width=0)
         ->  Parallel Bitmap Heap Scan on public.x1  (cost=3346.89..1272067.44 rows=20343 width=8)
               Output: x1.a, NULL::integer
               Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
               Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a > 199900000)))
               ->  BitmapOr  (cost=3346.89..3346.89 rows=178808 width=0)
                     ->  Bitmap Index Scan on x1_a_idx  (cost=0.00..1495.19 rows=80883 width=0)
                           Index Cond: (x1.a < 80000)
                     ->  Bitmap Index Scan on x1_a_idx  (cost=0.00..1811.01 rows=97925 width=0)
                           Index Cond: (x1.a > 199900000)

PSA my postgresql.conf file, maybe you can have a look. Also, I didn't do any parameter
tuning in my test session.

Regards,
Tang



Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > Did it actually use a parallel plan in your testing?
> > When I ran these tests with the Parallel INSERT patch applied, it did
> > not naturally choose a parallel plan for any of these cases.
> 
> Yes, these cases pick a parallel plan naturally in my test environment.
> 
> postgres=# explain verbose insert into testscan select a from x where
> a<80000 or (a%2=0 and a>199900000);
>                                             QUERY PLAN
> ----------------------------------------------------------------------
> -----------------------------
>  Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
>    Workers Planned: 4
>    ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0
> width=0)
>          ->  Parallel Bitmap Heap Scan on public.x1
> (cost=3346.89..1272067.44 rows=20343 width=8)
>                Output: x1.a, NULL::integer
>                Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
>                Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a >
> 199900000)))
>                ->  BitmapOr  (cost=3346.89..3346.89 rows=178808
> width=0)
>                      ->  Bitmap Index Scan on x1_a_idx
> (cost=0.00..1495.19 rows=80883 width=0)
>                            Index Cond: (x1.a < 80000)
>                      ->  Bitmap Index Scan on x1_a_idx
> (cost=0.00..1811.01 rows=97925 width=0)
>                            Index Cond: (x1.a > 199900000)
> 
> PSA my postgresql.conf file, maybe you can have a look. Also, I didn't
> do any parameter tuning in my test session.

I reproduced this on my machine.

I think we'd better do "analyze" before the insert, which makes this easier to reproduce.
Like:

-----
analyze;
explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
-----

Best regards,
houzj




Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
Greg, all

Thanks a lot for your work on this.

On Mon, Feb 8, 2021 at 3:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> Posting an updated set of patches.

I've been looking at these patches, initially with an intention to
review mainly any partitioning-related concerns, but have some general
thoughts as well concerning mostly the patches 0001 and 0002.

* I've seen review comments on this thread where I think it's been
suggested that whatever max_parallel_hazard_for_modify() does had
better have been integrated into max_parallel_hazard() such that
there's no particular need for that function to exist.  For example,
the following:

+   /*
+    * UPDATE is not currently supported in parallel-mode, so prohibit
+    * INSERT...ON CONFLICT...DO UPDATE...
+    * In order to support update, even if only in the leader, some
+    * further work would need to be done. A mechanism would be needed
+    * for sharing combo-cids between leader and workers during
+    * parallel-mode, since for example, the leader might generate a
+    * combo-cid and it needs to be propagated to the workers.
+    */
+   if (parse->onConflict != NULL && parse->onConflict->action ==
ONCONFLICT_UPDATE)
+       return PROPARALLEL_UNSAFE;

could be placed in the following block in max_parallel_hazard():

    /*
     * When we're first invoked on a completely unplanned tree, we must
     * recurse into subqueries so as to locate parallel-unsafe constructs
     * anywhere in the tree.
     */
    else if (IsA(node, Query))
    {
        Query      *query = (Query *) node;

        /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
        if (query->rowMarks != NULL)
        {
            context->max_hazard = PROPARALLEL_UNSAFE;
            return true;
        }

Furthermore, the following:

+   rte = rt_fetch(parse->resultRelation, parse->rtable);
+
+   /*
+    * The target table is already locked by the caller (this is done in the
+    * parse/analyze phase).
+    */
+   rel = table_open(rte->relid, NoLock);
+   (void) rel_max_parallel_hazard_for_modify(rel, parse->commandType,
&context);
+   table_close(rel, NoLock);

can itself be wrapped in a function that's called from
max_parallel_hazard() by adding a new block for RangeTblEntry nodes
and passing QTW_EXAMINE_RTES_BEFORE to query_tree_walker().

That brings me to this part of the hunk:

+   /*
+    * If there is no underlying SELECT, a parallel table-modification
+    * operation is not possible (nor desirable).
+    */
+   hasSubQuery = false;
+   foreach(lc, parse->rtable)
+   {
+       rte = lfirst_node(RangeTblEntry, lc);
+       if (rte->rtekind == RTE_SUBQUERY)
+       {
+           hasSubQuery = true;
+           break;
+       }
+   }
+   if (!hasSubQuery)
+       return PROPARALLEL_UNSAFE;

The justification for this given in:

https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com

seems to be that the failure of a test case in
partition-concurrent-attach isolation suite is prevented if finding no
subquery RTEs in the query is flagged as parallel unsafe, which in
turn stops max_parallel_hazard_for_modify() from locking partitions for
safety checks in such cases.  But it feels unprincipled to have this
code to work around a specific test case that's failing.  I'd rather
edit the failing test case to disable parallel execution as
Tsunakawa-san suggested.

* Regarding function names:

+static bool trigger_max_parallel_hazard_for_modify(TriggerDesc *trigdesc,
+
max_parallel_hazard_context *context);
+static bool index_expr_max_parallel_hazard_for_modify(Relation rel,
+
max_parallel_hazard_context *context);
+static bool domain_max_parallel_hazard_for_modify(Oid typid,
max_parallel_hazard_context *context);
+static bool rel_max_parallel_hazard_for_modify(Relation rel,
+                                              CmdType command_type,
+
max_parallel_hazard_context *context)

IMO, it would be better to name these
target_rel_trigger_max_parallel_hazard(),
target_rel_index_max_parallel_hazard(), etc. rather than have
_for_modify at the end of these names to better connote that they
check the parallel safety of applying the modify operation to a given
target relation.  Also, put these prototypes just below that of
max_parallel_hazard() to have related things close by.

Attached please see v15_delta.diff showing the changes suggested above.

* I suspect that the following is broken in light of concurrent
attachment of partitions.

+
+       /* Recursively check each partition ... */
+       pdesc = RelationGetPartitionDesc(rel);

I think we'd need to use CreatePartitionDirectory() and retrieve the
PartitionDesc using PartitionDirectoryLookup().  Something we already
do when opening partitions for SELECT planning.

* I think that the concerns raised by Tsunakawa-san in:

https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com

regarding how this interacts with plancache.c deserve a look.
Specifically, a plan that uses parallel insert may fail to be
invalidated when partitions are altered directly (that is without
altering their root parent).  That would be because we are not adding
partition OIDs to PlannerGlobal.invalItems despite making a plan
that's based on checking their properties.  See this (tested with all
patches applied!):

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
create table foo (a) as select generate_series(1, 1000000);
prepare q as insert into rp select * from foo where a%2 = 0;
explain execute q;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather  (cost=1000.00..13041.54 rows=5642 width=0)
   Workers Planned: 2
   ->  Insert on rp  (cost=0.00..11477.34 rows=0 width=0)
         ->  Parallel Seq Scan on foo  (cost=0.00..11477.34 rows=2351 width=4)
               Filter: ((a % 2) = 0)
(5 rows)

-- create a parallel unsafe trigger (that's actually marked so)
directly on a partition
create or replace function make_table () returns trigger language
plpgsql as $$ begin create table bar(); return null; end; $$ parallel
unsafe;
create trigger ai_rp2 after insert on rp2 for each row execute
function make_table();
CREATE TRIGGER

-- plan still parallel
explain execute q;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather  (cost=1000.00..13041.54 rows=5642 width=0)
   Workers Planned: 2
   ->  Insert on rp  (cost=0.00..11477.34 rows=0 width=0)
         ->  Parallel Seq Scan on foo  (cost=0.00..11477.34 rows=2351 width=4)
               Filter: ((a % 2) = 0)
(5 rows)

-- and because it is
execute q;
ERROR:  cannot start commands during a parallel operation
CONTEXT:  SQL statement "create table bar()"
PL/pgSQL function make_table() line 1 at SQL statement

-- OTOH, altering parent correctly discards the parallel plan
create trigger ai_rp after insert on rp for each row execute function
make_table();
explain execute q;
                           QUERY PLAN
----------------------------------------------------------------
 Insert on rp  (cost=0.00..19425.00 rows=0 width=0)
   ->  Seq Scan on foo  (cost=0.00..19425.00 rows=5000 width=4)
         Filter: ((a % 2) = 0)
(3 rows)

It's fair to argue that it would rarely make sense to use PREPARE for
bulk loads, but we need to tighten things up a bit here regardless.


--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Greg, all
>
> Thanks a lot for your work on this.
>
> On Mon, Feb 8, 2021 at 3:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > Posting an updated set of patches.
>
> I've been looking at these patches, initially with an intention to
> review mainly any partitioning-related concerns, but have some general
> thoughts as well concerning mostly the patches 0001 and 0002.
>
> * I've seen review comments on this thread where I think it's been
> suggested that whatever max_parallel_hazard_for_modify() does had
> better have been integrated into max_parallel_hazard() such that
> there's no particular need for that function to exist.  For example,
> the following:
>
> +   /*
> +    * UPDATE is not currently supported in parallel-mode, so prohibit
> +    * INSERT...ON CONFLICT...DO UPDATE...
> +    * In order to support update, even if only in the leader, some
> +    * further work would need to be done. A mechanism would be needed
> +    * for sharing combo-cids between leader and workers during
> +    * parallel-mode, since for example, the leader might generate a
> +    * combo-cid and it needs to be propagated to the workers.
> +    */
> +   if (parse->onConflict != NULL && parse->onConflict->action ==
> ONCONFLICT_UPDATE)
> +       return PROPARALLEL_UNSAFE;
>
> could be placed in the following block in max_parallel_hazard():
>
>     /*
>      * When we're first invoked on a completely unplanned tree, we must
>      * recurse into subqueries so as to locate parallel-unsafe constructs
>      * anywhere in the tree.
>      */
>     else if (IsA(node, Query))
>     {
>         Query      *query = (Query *) node;
>
>         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
>         if (query->rowMarks != NULL)
>         {
>             context->max_hazard = PROPARALLEL_UNSAFE;
>             return true;
>         }
>
> Furthermore, the following:
>
> +   rte = rt_fetch(parse->resultRelation, parse->rtable);
> +
> +   /*
> +    * The target table is already locked by the caller (this is done in the
> +    * parse/analyze phase).
> +    */
> +   rel = table_open(rte->relid, NoLock);
> +   (void) rel_max_parallel_hazard_for_modify(rel, parse->commandType,
> &context);
> +   table_close(rel, NoLock);
>
> can itself be wrapped in a function that's called from
> max_parallel_hazard() by adding a new block for RangeTblEntry nodes
> and passing QTW_EXAMINE_RTES_BEFORE to query_tree_walker().
>

Thanks, I think those suggestions look good to me.

> That brings me to this part of the hunk:
>
> +   /*
> +    * If there is no underlying SELECT, a parallel table-modification
> +    * operation is not possible (nor desirable).
> +    */
> +   hasSubQuery = false;
> +   foreach(lc, parse->rtable)
> +   {
> +       rte = lfirst_node(RangeTblEntry, lc);
> +       if (rte->rtekind == RTE_SUBQUERY)
> +       {
> +           hasSubQuery = true;
> +           break;
> +       }
> +   }
> +   if (!hasSubQuery)
> +       return PROPARALLEL_UNSAFE;
>
> The justification for this given in:
>
> https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com
>
> seems to be that the failure of a test case in
> partition-concurrent-attach isolation suite is prevented if finding no
> subquery RTEs in the query is flagged as parallel unsafe, which in
> turn stops max_parallel_hazard_for_modify() from locking partitions for
> safety checks in such cases.  But it feels unprincipled to have this
> code to work around a specific test case that's failing.  I'd rather
> edit the failing test case to disable parallel execution as
> Tsunakawa-san suggested.
>

The code was not changed because of the test case (though it was
fortunate that the test case worked after the change).
The code check that you have identified above ensures that the INSERT
has an underlying SELECT, because the planner won't (and shouldn't
anyway) generate a parallel plan for INSERT...VALUES, so there is no
point doing any parallel-safety checks in this case.
It just so happens that the problem test case uses INSERT...VALUES -
and it shouldn't have triggered the parallel-safety checks for
parallel INSERT for this case anyway, because INSERT...VALUES can't
(and shouldn't) be parallelized.
So I will need to keep that check in the code somewhere, to avoid
overhead of parallel-safety checks in the case of INSERT...VALUES.
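A simplified, standalone sketch of that range-table scan (using mock types for illustration; the real code walks a List of RangeTblEntry nodes in the parse tree): an INSERT ... VALUES statement produces no RTE_SUBQUERY entry, so the check returns false and the safety analysis is skipped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL's range-table entries (illustration only). */
typedef enum RTEKind
{
    RTE_RELATION,               /* ordinary table reference */
    RTE_SUBQUERY,               /* subquery in FROM, e.g. the SELECT part */
    RTE_VALUES                  /* VALUES list */
} RTEKind;

typedef struct RangeTblEntry
{
    RTEKind     rtekind;
} RangeTblEntry;

/*
 * Return true only if some range-table entry is a subquery, i.e. the
 * INSERT has an underlying SELECT.  INSERT ... VALUES creates no
 * RTE_SUBQUERY entry, so for it this returns false and the (potentially
 * expensive) parallel-safety checks are skipped.
 */
static bool
insert_has_underlying_select(const RangeTblEntry *rtable, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
    {
        if (rtable[i].rtekind == RTE_SUBQUERY)
            return true;
    }
    return false;
}
```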

> * Regarding function names:
>
> +static bool trigger_max_parallel_hazard_for_modify(TriggerDesc *trigdesc,
> +
> max_parallel_hazard_context *context);
> +static bool index_expr_max_parallel_hazard_for_modify(Relation rel,
> +
> max_parallel_hazard_context *context);
> +static bool domain_max_parallel_hazard_for_modify(Oid typid,
> max_parallel_hazard_context *context);
> +static bool rel_max_parallel_hazard_for_modify(Relation rel,
> +                                              CmdType command_type,
> +
> max_parallel_hazard_context *context)
>
> IMO, it would be better to name these
> target_rel_trigger_max_parallel_hazard(),
> target_rel_index_max_parallel_hazard(), etc. rather than have
> _for_modify at the end of these names to better connote that they
> check the parallel safety of applying the modify operation to a given
> target relation.  Also, put these prototypes just below that of
> max_parallel_hazard() to have related things close by.
>
> Attached please see v15_delta.diff showing the changes suggested above.
>

OK, sounds reasonable. Thanks for the patch!

> * I suspect that the following is broken in light of concurrent
> attachment of partitions.
>
> +
> +       /* Recursively check each partition ... */
> +       pdesc = RelationGetPartitionDesc(rel);
>
> I think we'd need to use CreatePartitionDirectory() and retrieve the
> PartitionDesc using PartitionDirectoryLookup().  Something we already
> do when opening partitions for SELECT planning.
>
> * I think that the concerns raised by Tsunakawa-san in:
>
>
https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com
>
> regarding how this interacts with plancache.c deserve a look.
> Specifically, a plan that uses parallel insert may fail to be
> invalidated when partitions are altered directly (that is without
> altering their root parent).  That would be because we are not adding
> partition OIDs to PlannerGlobal.invalItems despite making a plan
> that's based on checking their properties.  See this (tested with all
> patches applied!):
>
> create table rp (a int) partition by range (a);
> create table rp1 partition of rp for values from (minvalue) to (0);
> create table rp2 partition of rp for values from (0) to (maxvalue);
> create table foo (a) as select generate_series(1, 1000000);
> prepare q as insert into rp select * from foo where a%2 = 0;
> explain execute q;
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Gather  (cost=1000.00..13041.54 rows=5642 width=0)
>    Workers Planned: 2
>    ->  Insert on rp  (cost=0.00..11477.34 rows=0 width=0)
>          ->  Parallel Seq Scan on foo  (cost=0.00..11477.34 rows=2351 width=4)
>                Filter: ((a % 2) = 0)
> (5 rows)
>
> -- create a parallel unsafe trigger (that's actually marked so)
> directly on a partition
> create or replace function make_table () returns trigger language
> plpgsql as $$ begin create table bar(); return null; end; $$ parallel
> unsafe;
> create trigger ai_rp2 after insert on rp2 for each row execute
> function make_table();
> CREATE TRIGGER
>
> -- plan still parallel
> explain execute q;
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Gather  (cost=1000.00..13041.54 rows=5642 width=0)
>    Workers Planned: 2
>    ->  Insert on rp  (cost=0.00..11477.34 rows=0 width=0)
>          ->  Parallel Seq Scan on foo  (cost=0.00..11477.34 rows=2351 width=4)
>                Filter: ((a % 2) = 0)
> (5 rows)
>
> -- and because it is
> execute q;
> ERROR:  cannot start commands during a parallel operation
> CONTEXT:  SQL statement "create table bar()"
> PL/pgSQL function make_table() line 1 at SQL statement
>
> -- OTOH, altering parent correctly discards the parallel plan
> create trigger ai_rp after insert on rp for each row execute function
> make_table();
> explain execute q;
>                            QUERY PLAN
> ----------------------------------------------------------------
>  Insert on rp  (cost=0.00..19425.00 rows=0 width=0)
>    ->  Seq Scan on foo  (cost=0.00..19425.00 rows=5000 width=4)
>          Filter: ((a % 2) = 0)
> (3 rows)
>
> It's fair to argue that it would rarely make sense to use PREPARE for
> bulk loads, but we need to tighten things up a bit here regardless.
>
>

Thanks, looks like you've identified some definite issues in the
partition support and some missing test cases to help detect them.
I'll look into it.

Thanks very much for your review of this and suggestions.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 8, 2021 at 8:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > Did it actually use a parallel plan in your testing?
> > > When I ran these tests with the Parallel INSERT patch applied, it did
> > > not naturally choose a parallel plan for any of these cases.
> >
> > Yes, these cases pick a parallel plan naturally in my test environment.
> >
> > postgres=# explain verbose insert into testscan select a from x where
> > a<80000 or (a%2=0 and a>199900000);
> >                                             QUERY PLAN
> > ----------------------------------------------------------------------
> > -----------------------------
> >  Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
> >    Workers Planned: 4
> >    ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0
> > width=0)
> >          ->  Parallel Bitmap Heap Scan on public.x1
> > (cost=3346.89..1272067.44 rows=20343 width=8)
> >                Output: x1.a, NULL::integer
> >                Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
> >                Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a >
> > 199900000)))
> >                ->  BitmapOr  (cost=3346.89..3346.89 rows=178808
> > width=0)
> >                      ->  Bitmap Index Scan on x1_a_idx
> > (cost=0.00..1495.19 rows=80883 width=0)
> >                            Index Cond: (x1.a < 80000)
> >                      ->  Bitmap Index Scan on x1_a_idx
> > (cost=0.00..1811.01 rows=97925 width=0)
> >                            Index Cond: (x1.a > 199900000)
> >
> > PSA is my postgresql.conf file, maybe you can have a look. Besides, I didn't
> > do any parameters tuning in my test session.
>
> I reproduced this on my machine.
>
> I think we'd better do "analyze" before insert which helps reproduce this easier.
> Like:
>
> -----
> analyze;
> explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
> -----

OK then.
Can you check whether, if just the underlying SELECT is run (without
the INSERT), there is any performance degradation compared to a
non-parallel scan?

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > > postgres=# explain verbose insert into testscan select a from x
> > > where
> > > a<80000 or (a%2=0 and a>199900000);
> > >                                             QUERY PLAN
> > >
> --------------------------------------------------------------------
> > > --
> > > -----------------------------
> > >  Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
> > >    Workers Planned: 4
> > >    ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0
> > > width=0)
> > >          ->  Parallel Bitmap Heap Scan on public.x1
> > > (cost=3346.89..1272067.44 rows=20343 width=8)
> > >                Output: x1.a, NULL::integer
> > >                Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
> > >                Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND
> > > (x1.a >
> > > 199900000)))
> > >                ->  BitmapOr  (cost=3346.89..3346.89 rows=178808
> > > width=0)
> > >                      ->  Bitmap Index Scan on x1_a_idx
> > > (cost=0.00..1495.19 rows=80883 width=0)
> > >                            Index Cond: (x1.a < 80000)
> > >                      ->  Bitmap Index Scan on x1_a_idx
> > > (cost=0.00..1811.01 rows=97925 width=0)
> > >                            Index Cond: (x1.a > 199900000)
> > >
> > > PSA is my postgresql.conf file, maybe you can have a look. Besides,
> > > I didn't do any parameters tuning in my test session.
> >
> > I reproduced this on my machine.
> >
> > I think we'd better do "analyze" before insert which helps reproduce this
> easier.
> > Like:
> >
> > -----
> > analyze;
> > explain analyze verbose insert into testscan select a from x where
> > a<80000 or (a%2=0 and a>199900000);
> > -----
> 
> OK then.
> Can you check if just the underlying SELECTs are run (without INSERT), is
> there any performance degradation when compared to a non-parallel scan?

It seems there is no performance degradation without the INSERT.

What I have found so far is that:
With Tang's conf, a parallel insert generates more WAL records than a serial insert
(IMO, this is the main reason for the performance degradation).
See the attachment for the plan info.

I tried altering the target table to unlogged, and
then the performance degradation no longer happens.

The additional WAL records seem related to the index on the target table.
If the target table does not have any index, the WAL records are the same between the parallel plan and the serial plan.
Also, there is no performance degradation without the index.

I am still looking at this problem; if anyone has any thoughts about it,
I'd be grateful if you could share them.
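For completeness, the WAL difference can also be measured directly with EXPLAIN's WAL option (available since PostgreSQL 13). This is only a sketch, reusing the testscan/x tables from the runs above:

```sql
-- Compare WAL volume for the same statement under the serial and
-- parallel plans (toggle e.g. max_parallel_workers_per_gather).
EXPLAIN (ANALYZE, BUFFERS, WAL)
INSERT INTO testscan
SELECT a FROM x WHERE a < 80000 OR (a % 2 = 0 AND a > 199900000);

-- Take WAL out of the picture entirely; if the degradation disappears,
-- the extra WAL (mostly index records) is the likely culprit.
ALTER TABLE testscan SET UNLOGGED;
```

The WAL option reports records and bytes per plan node, which is where the record/byte counts further down in this thread come from.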

Best regards,
houzj






Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Tue, Feb 9, 2021 at 5:49 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > > postgres=# explain verbose insert into testscan select a from x
> > > > where
> > > > a<80000 or (a%2=0 and a>199900000);
> > > >                                             QUERY PLAN
> > > >
> > --------------------------------------------------------------------
> > > > --
> > > > -----------------------------
> > > >  Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
> > > >    Workers Planned: 4
> > > >    ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0
> > > > width=0)
> > > >          ->  Parallel Bitmap Heap Scan on public.x1
> > > > (cost=3346.89..1272067.44 rows=20343 width=8)
> > > >                Output: x1.a, NULL::integer
> > > >                Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
> > > >                Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND
> > > > (x1.a >
> > > > 199900000)))
> > > >                ->  BitmapOr  (cost=3346.89..3346.89 rows=178808
> > > > width=0)
> > > >                      ->  Bitmap Index Scan on x1_a_idx
> > > > (cost=0.00..1495.19 rows=80883 width=0)
> > > >                            Index Cond: (x1.a < 80000)
> > > >                      ->  Bitmap Index Scan on x1_a_idx
> > > > (cost=0.00..1811.01 rows=97925 width=0)
> > > >                            Index Cond: (x1.a > 199900000)
> > > >
> > > > PSA is my postgresql.conf file, maybe you can have a look. Besides,
> > > > I didn't do any parameters tuning in my test session.
> > >
> > > I reproduced this on my machine.
> > >
> > > I think we'd better do "analyze" before insert which helps reproduce this
> > easier.
> > > Like:
> > >
> > > -----
> > > analyze;
> > > explain analyze verbose insert into testscan select a from x where
> > > a<80000 or (a%2=0 and a>199900000);
> > > -----
> >
> > OK then.
> > Can you check if just the underlying SELECTs are run (without INSERT), is
> > there any performance degradation when compared to a non-parallel scan?
>
> It seems there is no performance degradation without insert.
>
> Till now, what I found is that:
> With tang's conf, when doing parallel insert, the walrecord is more than serial insert
> (IMO, this is the main reason why it has performance degradation)
> See the attatchment for the plan info.
>
> I have tried alter the target table to unlogged and
> then the performance degradation will not happen any more.
>
> And the additional walrecord seems related to the index on the target table.
>

I think you might want to see which exact WAL records are extra by
using pg_waldump?
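For what it's worth, one possible way to aggregate that. The pg_waldump invocation itself is a sketch (the WAL segment name is hypothetical and the exact desc format depends on the server version); the counting pipeline is demonstrated on canned pg_waldump-style lines:

```shell
# Hypothetical invocation (adjust the WAL segment name to your cluster):
#   pg_waldump --rmgr=Btree "$PGDATA"/pg_wal/000000010000000000000010 \
#     | sed 's/.*desc: \([A-Z_]*\).*/\1/' | sort | uniq -c | sort -rn
#
# The aggregation step, shown on canned pg_waldump-like output:
printf '%s\n' \
  'rmgr: Btree len (rec/tot): 64/64, desc: INSERT_LEAF off 42' \
  'rmgr: Btree len (rec/tot): 64/64, desc: INSERT_LEAF off 43' \
  'rmgr: Btree len (rec/tot): 1024/1024, desc: SPLIT_R level 0' |
  sed 's/.*desc: \([A-Z_]*\).*/\1/' | sort | uniq -c | sort -rn
```

Running this once against the serial load's WAL range and once against the parallel load's makes the per-record-type difference easy to compare.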

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> Till now, what I found is that:
> With tang's conf, when doing parallel insert, the walrecord is more than serial
> insert (IMO, this is the main reason why it has performance degradation) See
> the attatchment for the plan info.
> 
> I have tried alter the target table to unlogged and then the performance
> degradation will not happen any more.
> 
> And the additional walrecord seems related to the index on the target table.
> If the target table does not have any index, the wal record is the same between
> parallel plan and serial plan.
> Also, it does not have performance degradation without index.


[serial]
 Insert on public.testscan  (cost=3272.20..3652841.26 rows=0 width=0) (actual time=360.474..360.476 rows=0 loops=1)
   Buffers: shared hit=392569 read=3 dirtied=934 written=933
   WAL: records=260354 bytes=16259841

[parallel]
   ->  Insert on public.testscan  (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5)
         Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
         WAL: records=260498 bytes=17019359


More pages are dirtied and written in the parallel execution.  Aren't the index and possibly the target table bigger
with parallel execution than with serial execution?  That may be due to the difference in how the index keys are inserted.


Regards
Takayuki Tsunakawa



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > Till now, what I found is that:
> > With tang's conf, when doing parallel insert, the walrecord is more
> > than serial insert (IMO, this is the main reason why it has
> > performance degradation) See the attatchment for the plan info.
> >
> > I have tried alter the target table to unlogged and then the
> > performance degradation will not happen any more.
> >
> > And the additional walrecord seems related to the index on the target
> table.
> > If the target table does not have any index, the wal record is the
> > same between parallel plan and serial plan.
> > Also, it does not have performance degradation without index.
> 
> 
> [serial]
>  Insert on public.testscan  (cost=3272.20..3652841.26 rows=0 width=0)
> (actual time=360.474..360.476 rows=0 loops=1)
>    Buffers: shared hit=392569 read=3 dirtied=934 written=933
>    WAL: records=260354 bytes=16259841
> 
> [parallel]
>    ->  Insert on public.testscan  (cost=3272.20..1260119.35 rows=0
> width=0) (actual time=378.227..378.229 rows=0 loops=5)
>          Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
>          WAL: records=260498 bytes=17019359
> 
> 
> More pages are dirtied and written in the parallel execution.  Aren't the
> index and possibly the target table bigger with parallel execution than
> with serial execution?  That may be due to the difference of inserts of
> index keys.

Yes, the table size and index size are bigger with parallel execution.

table and index's size after parallel insert
----------------------
postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
 pg_size_pretty
----------------
 4048 kB
(1 row)

postgres=#
postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
 pg_size_pretty
----------------
 4768 kB
(1 row)
----------------------

table and index's size after serial insert 
----------------------
postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
 pg_size_pretty
----------------
 2864 kB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
 pg_size_pretty
----------------
 4608 kB
----------------------


To Amit:
> I think you might want to see which exact WAL records are extra by using pg_waldump?

Yes, thanks for the hint. I was doing that, and the result is as follows:

The heap WAL record count is the same for parallel and serial (129999, which matches the row count of the query result).

Parallel btree WAL records (130500 records):
----------------------
INSERT_LEAF:129500
INSERT_UPPER:497
SPLIT_L:172
SPLIT_R:328
INSERT_POST:0
DEDUP:0
VACUUM:0
DELETE:0
MARK_PAGE_HALFDEAD:0
UNLINK_PAGE:0
UNLINK_PAGE_META:0
NEWROOT:3
REUSE_PAGE:0
META_CLEANUP:0
----------------------

Serial btree WAL records (130355 records):
----------------------
INSERT_LEAF:129644
INSERT_UPPER:354
SPLIT_L:0
SPLIT_R:355
INSERT_POST:0
DEDUP:0
VACUUM:0
DELETE:0
MARK_PAGE_HALFDEAD:0
UNLINK_PAGE:0
UNLINK_PAGE_META:0
NEWROOT:2
REUSE_PAGE:0
META_CLEANUP:0
----------------------

IMO, due to the different insert pattern with parallel execution,
the btree insert cost is higher than in the serial case.

At the same time, the parallel bitmap scan does not provide a large performance gain,
so the extra cost of the btree index inserts results in an overall performance degradation.
Does that make sense?
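That split profile (SPLIT_L appearing only under parallel insert) is what you would expect when several sorted streams of keys interleave: non-rightmost leaf pages overflow and split 50/50, while a single ascending stream only ever splits the rightmost page. The following toy model illustrates the effect; it is purely illustrative, not PostgreSQL's actual nbtree algorithm, and the page capacity and worker interleaving are made-up parameters:

```python
import bisect

def simulate(keys, capacity=100):
    """Toy model of b-tree leaf pages under a stream of inserts.

    A split of the rightmost page by a new maximum key moves only that
    key to a fresh page (mimicking the ascending-insert optimization,
    cf. SPLIT_R); any other overflow splits the page 50/50 (cf. SPLIT_L).
    Returns (page_count, rightmost_splits, middle_splits).
    """
    pages = [[]]                    # each page: a sorted list of keys
    right_splits = mid_splits = 0
    for k in keys:
        # find the first page whose current maximum is >= k
        i = 0
        while i < len(pages) - 1 and pages[i] and k > pages[i][-1]:
            i += 1
        page = pages[i]
        bisect.insort(page, k)
        if len(page) > capacity:
            if i == len(pages) - 1 and k == page[-1]:
                right_splits += 1   # rightmost split: left page stays dense
                pages[i:i + 1] = [page[:-1], page[-1:]]
            else:
                mid_splits += 1     # middle split: both halves half-full
                half = len(page) // 2
                pages[i:i + 1] = [page[:half], page[half:]]
    return len(pages), right_splits, mid_splits

# one ascending stream (serial insert) vs. five interleaved ascending
# streams (a crude stand-in for five parallel workers on disjoint ranges)
serial = list(range(10_000))
chunks = [list(range(j * 2_000, (j + 1) * 2_000)) for j in range(5)]
interleaved = [c[i] for i in range(2_000) for c in chunks]

print("serial      (pages, right, mid):", simulate(serial))
print("interleaved (pages, right, mid):", simulate(interleaved))
```

In this model the serial stream produces only rightmost splits and dense pages, while the interleaved stream produces many middle splits and roughly twice the page count, which is consistent with the larger index observed after the parallel insert.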

Best regards,
Houzj




Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Tue, Feb 9, 2021 at 10:30 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > That brings me to to this part of the hunk:
> >
> > +   /*
> > +    * If there is no underlying SELECT, a parallel table-modification
> > +    * operation is not possible (nor desirable).
> > +    */
> > +   hasSubQuery = false;
> > +   foreach(lc, parse->rtable)
> > +   {
> > +       rte = lfirst_node(RangeTblEntry, lc);
> > +       if (rte->rtekind == RTE_SUBQUERY)
> > +       {
> > +           hasSubQuery = true;
> > +           break;
> > +       }
> > +   }
> > +   if (!hasSubQuery)
> > +       return PROPARALLEL_UNSAFE;
> >
> > The justification for this given in:
> >
> > https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com
> >
> > seems to be that the failure of a test case in
> > partition-concurrent-attach isolation suite is prevented if finding no
> > subquery RTEs in the query is flagged as parallel unsafe, which in
> > turn stops max_parallel_hazard_modify() from locking partitions for
> > safety checks in such cases.  But it feels unprincipled to have this
> > code to work around a specific test case that's failing.  I'd rather
> > edit the failing test case to disable parallel execution as
> > Tsunakawa-san suggested.
> >
>
> The code was not changed because of the test case (though it was
> fortunate that the test case worked after the change).

Ah, I misread then, sorry.

> The code check that you have identified above ensures that the INSERT
> has an underlying SELECT, because the planner won't (and shouldn't
> anyway) generate a parallel plan for INSERT...VALUES, so there is no
> point doing any parallel-safety checks in this case.
> It just so happens that the problem test case uses INSERT...VALUES -
> and it shouldn't have triggered the parallel-safety checks for
> parallel INSERT for this case anyway, because INSERT...VALUES can't
> (and shouldn't) be parallelized.

AFAICS, max_parallel_hazard() path never bails from doing further
safety checks based on anything other than finding a query component
whose hazard level crosses context->max_interesting.  You're trying to
add something that bails based on second-guessing that a parallel path
would not be chosen, which I find somewhat objectionable.

If the main goal of bailing out is to avoid doing the potentially
expensive modification safety check on the target relation, maybe we
should try to somehow make the check less expensive.  I remember
reading somewhere in the thread about caching the result of this check
in relcache, but haven't closely studied the feasibility of doing so.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Jan 6, 2021 at 7:39 PM Antonin Houska <ah@cybertec.at> wrote:
>
>
> @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         PlannerGlobal *glob = root->glob;
>         int                     rtoffset = list_length(glob->finalrtable);
>         ListCell   *lc;
> +       Plan       *finalPlan;
>
>         /*
>          * Add all the query's RTEs to the flattened rangetable.  The live ones
> @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         }
>
>         /* Now fix the Plan tree */
> -       return set_plan_refs(root, plan, rtoffset);
> +       finalPlan = set_plan_refs(root, plan, rtoffset);
> +       if (finalPlan != NULL && IsA(finalPlan, Gather))
> +       {
> +               Plan       *subplan = outerPlan(finalPlan);
> +
> +               if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
> +               {
> +                       finalPlan->targetlist = copyObject(subplan->targetlist);
> +               }
> +       }
> +       return finalPlan;
>  }
>
> I'm not sure if the problem of missing targetlist should be handled here (BTW,
> NIL is the constant for an empty list, not NULL). Obviously this is a
> consequence of the fact that the ModifyTable node has no regular targetlist.
>
> Actually I don't quite understand why (in the current master branch) the
> targetlist initialized in set_plan_refs()
>
>         /*
>          * Set up the visible plan targetlist as being the same as
>          * the first RETURNING list. This is for the use of
>          * EXPLAIN; the executor won't pay any attention to the
>          * targetlist.  We postpone this step until here so that
>          * we don't have to do set_returning_clause_references()
>          * twice on identical targetlists.
>          */
>         splan->plan.targetlist = copyObject(linitial(newRL));
>
> is not used. Instead, ExecInitModifyTable() picks the first returning list
> again:
>
>         /*
>          * Initialize result tuple slot and assign its rowtype using the first
>          * RETURNING list.  We assume the rest will look the same.
>          */
>         mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>
> So if you set the targetlist in create_modifytable_plan() (according to
> best_path->returningLists), or even in create_modifytable_path(), and ensure
> that it gets propagated to the Gather node (generate_gather_pahs currently
> uses rel->reltarget), then you should no longer need to tweak
> setrefs.c. Moreover, ExecInitModifyTable() would no longer need to set the
> targetlist. However I don't guarantee that this is the best approach - some
> planner expert should speak up.
>


I've had a bit closer look at this particular issue.
I can see what you mean about the ModifyTable targetlist (that is set
in set_plan_refs()) getting overwritten by ExecInitModifyTable() -
which also contradicts the comment in set_plan_refs() that claims the
targetlist being set is used by EXPLAIN (which it is not). So the
current Postgres master branch does seem to be broken in that respect.

I did try your suggestion (and also remove my tweak), but I found that
in the T_Gather case of set_plan_refs() it does some processing (see
set_upper_references()) of the current Gather targetlist and subplan's
targetlist (and will then overwrite the Gather targetlist after that),
but in doing that processing it produces the error:
ERROR:  variable not found in subplan target list
I think that one of the fundamental problems is that, up to now,
ModifyTable has always been the top node in a (non-parallel) plan, but
now for Parallel INSERT we have a Gather node with ModifyTable in its
subplan. So the expected order of processing and node configuration
has changed.
For the moment (until perhaps a planner expert DOES speak up) I've
parked my temporary "fix" at the bottom of set_plan_refs(), simply
pointing the Gather node's targetlist to subplan's ModifyTable
targetlist.

    if (nodeTag(plan) == T_Gather)
    {
        Plan       *subplan = plan->lefttree;

        if (IsA(subplan, ModifyTable) &&
                        castNode(ModifyTable, subplan)->returningLists != NIL)
        {
            plan->targetlist = subplan->targetlist;
        }
    }

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > The code check that you have identified above ensures that the INSERT
> > has an underlying SELECT, because the planner won't (and shouldn't
> > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > point doing any parallel-safety checks in this case.
> > It just so happens that the problem test case uses INSERT...VALUES -
> > and it shouldn't have triggered the parallel-safety checks for
> > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > (and shouldn't) be parallelized.
>
> AFAICS, max_parallel_hazard() path never bails from doing further
> safety checks based on anything other than finding a query component
> whose hazard level crosses context->max_interesting.

It's parallel UNSAFE because it's not safe or even possible to have a
parallel plan for this.
(as UNSAFE is the max hazard level, no point in referencing
context->max_interesting).
And there are existing cases of bailing out and not doing further
safety checks (even your v15_delta.diff patch placed code - for
bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
existing case in max_parallel_hazard_walker()):

    else if (IsA(node, Query))
    {
        Query      *query = (Query *) node;

        /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
        if (query->rowMarks != NULL)
        {
            context->max_hazard = PROPARALLEL_UNSAFE;
            return true;
        }


>You're trying to
> add something that bails based on second-guessing that a parallel path
> would not be chosen, which I find somewhat objectionable.
>
> If the main goal of bailing out is to avoid doing the potentially
> expensive modification safety check on the target relation, maybe we
> should try to somehow make the check less expensive.  I remember
> reading somewhere in the thread about caching the result of this check
> in relcache, but haven't closely studied the feasibility of doing so.
>

There's no "second-guessing" involved here.
There is no underlying way of dividing up the VALUES data of
"INSERT...VALUES" amongst the parallel workers, even if the planner
was updated to produce a parallel-plan for the "INSERT...VALUES" case
(apart from the fact that spawning off parallel workers to insert that
data would almost always result in worse performance than a
non-parallel plan...)
The division of work for parallel workers is part of the table AM
(scan) implementation, which is not invoked for "INSERT...VALUES".

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> table and index's size after parallel insert
> ----------------------
> postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
>  pg_size_pretty
> ----------------
>  4048 kB
> (1 row)
> 
> postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
>  pg_size_pretty
> ----------------
>  4768 kB
> (1 row)

Which of the above shows the table size?  What does pg_indexes_size() against an index (testscan_index) return?


> IMO, due to the difference of inserts with parallel execution,
> the btree insert's cost is more than serial.
> 
> At the same time, the parallel does not have a huge performance gain with
> bitmapscan,
> So the extra cost of btree index will result in performance degradation.

How did you know that the parallelism didn't have a huge performance gain with bitmap scan?

[serial]
   ->  Bitmap Heap Scan on public.x  (cost=3272.20..3652841.26 rows=79918 width=8) (actual time=8.096..41.005 rows=129999 loops=1)

[parallel]
         ->  Parallel Bitmap Heap Scan on public.x  (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5)


Regards
Takayuki Tsunakawa



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > ----------------------
> > postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
> >  pg_size_pretty
> > ----------------
> >  4048 kB
> > (1 row)
> >
> > postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
> >  pg_size_pretty
> > ----------------
> >  4768 kB
> > (1 row)
> 
> Which of the above shows the table size?  What does pg_indexes_size()
> against an index (testscan_index) return?

Sorry, maybe the table name is a little confusing,
but 'testscan_index' is actually a table's name.

pg_indexes_size returns the total size of the indexes attached to the table.
pg_relation_size returns the table's size.

Did I miss something?



> > IMO, due to the difference of inserts with parallel execution, the
> > btree insert's cost is more than serial.
> >
> > At the same time, the parallel does not have a huge performance gain
> > with bitmapscan, So the extra cost of btree index will result in
> > performance degradation.
> 
> How did you know that the parallelism didn't have a huge performance gain
> with bitmap scan?
> 
> [serial]
>    ->  Bitmap Heap Scan on public.x  (cost=3272.20..3652841.26 rows=79918
> width=8) (actual time=8.096..41.005 rows=129999 loops=1)
> 
> [parallel]
>          ->  Parallel Bitmap Heap Scan on public.x
> (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787
> rows=26000 loops=5)

I tested the case without the INSERT (just the query using a bitmap scan):

[serial]:
postgres=# explain analyze select a from x where a<80000 or (a%2=0 and a>199900000);
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on x  (cost=3258.59..3647578.53 rows=81338 width=4) (actual time=8.091..34.222 rows=129999 loops=1)
   Recheck Cond: ((a < 80000) OR (a > 199900000))
   Filter: ((a < 80000) OR (((a % 2) = 0) AND (a > 199900000)))
   Rows Removed by Filter: 50000
   Heap Blocks: exact=975
   ->  BitmapOr  (cost=3258.59..3258.59 rows=173971 width=0) (actual time=7.964..7.965 rows=0 loops=1)
         ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.451..3.451 rows=79999 loops=1)
               Index Cond: (a < 80000)
         ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.513..4.513 rows=100000 loops=1)
               Index Cond: (a > 199900000)
 Planning Time: 0.108 ms
 Execution Time: 38.136 ms

[parallel]
postgres=# explain analyze select a from x where a<80000 or (a%2=0 and a>199900000);
                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4258.59..1266704.42 rows=81338 width=4) (actual time=9.177..22.592 rows=129999 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Bitmap Heap Scan on x  (cost=3258.59..1257570.62 rows=20334 width=4) (actual time=6.402..12.882 rows=26000 loops=5)
         Recheck Cond: ((a < 80000) OR (a > 199900000))
         Filter: ((a < 80000) OR (((a % 2) = 0) AND (a > 199900000)))
         Rows Removed by Filter: 10000
         Heap Blocks: exact=1
         ->  BitmapOr  (cost=3258.59..3258.59 rows=173971 width=0) (actual time=8.785..8.786 rows=0 loops=1)
         ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.871..3.871 rows=79999 loops=1)
               Index Cond: (a < 80000)
         ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.914..4.914 rows=100000 loops=1)
                     Index Cond: (a > 199900000)
 Planning Time: 0.158 ms
 Execution Time: 26.951 ms
(15 rows)

It does give a performance gain, but I don't think it's large enough to offset the extra index cost.
What do you think?
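Putting the numbers quoted in this thread side by side makes the trade-off visible. This is only rough back-of-the-envelope arithmetic; the values are copied from the EXPLAIN outputs and size queries above:

```python
# Values copied from the measurements earlier in the thread.
select_serial_ms = 38.136     # serial bitmap scan, SELECT only
select_parallel_ms = 26.951   # parallel bitmap scan, SELECT only

index_serial_kb = 2864        # pg_indexes_size after serial insert
index_parallel_kb = 4048      # pg_indexes_size after parallel insert

scan_speedup = select_serial_ms / select_parallel_ms
index_bloat = index_parallel_kb / index_serial_kb - 1

print(f"scan speedup from parallelism: {scan_speedup:.2f}x "
      f"({select_serial_ms - select_parallel_ms:.1f} ms saved)")
print(f"index size increase:           {index_bloat:.0%}")
```

So the scan-side win is roughly 1.4x (about 11 ms here), while the index ends up roughly 41% larger, which matches the intuition that the extra btree work can eat the scan-side gain.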

Best regards,
houzj



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> Sorry, Maybe the tablename is a little confused,
> but 'testscan_index' is actually a table's name.
...
> Did I miss something ?

No, I don't think so.  I just wanted to get the facts right.  Your EXPLAIN output shows that the target table is
testscan, as follows.  How does testscan_index relate to testscan?

   ->  Insert on public.testscan  (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5)


> It did have performance gain, but I think it's not huge enough to ignore the
> extra's index cost.
> What do you think ?

Yes... as you suspect, I'm afraid the benefit from the parallel bitmap scan may not compensate for the loss in the
parallel insert operation.

The loss is probably due to 1) more index page splits, 2) more buffer writes (table and index), and 3) internal locks
for things such as relation extension and page content protection.  To investigate 3), we would want something like
[1], which tells us the wait event statistics (wait count and time for each wait event) per session or across the
instance, like Oracle, MySQL and EDB provide.  I want to continue this in the near future.
 


[1]
Add accumulated statistics for wait event
https://commitfest.postgresql.org/28/2332/


Regards
Takayuki Tsunakawa



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com>
> From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> > It did have performance gain, but I think it's not huge enough to ignore the
> > extra's index cost.
> > What do you think ?
> 
> Yes... as you suspect, I'm afraid the benefit from parallel bitmap scan may not
> compensate for the loss of the parallel insert operation.
> 
> The loss is probably due to 1) more index page splits, 2) more buffer writes
> (table and index), and 3) internal locks for things such as relation extension
> and page content protection.  To investigate 3), we should want something
> like [1], which tells us the wait event statistics (wait count and time for each
> wait event) per session or across the instance like Oracke, MySQL and EDB
> provides.  I want to continue this in the near future.

What would the result look like if you turn off parallel_leader_participation?  If the leader is freed from
reading/writing the table and index, the index page splits and internal lock contention may decrease enough to recover
part of the loss.
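That experiment is a one-line settings change; as a sketch, again reusing the testscan/x tables from earlier in the thread:

```sql
SET parallel_leader_participation = off;
EXPLAIN (ANALYZE, BUFFERS, WAL)
INSERT INTO testscan
SELECT a FROM x WHERE a < 80000 OR (a % 2 = 0 AND a > 199900000);
```

With the setting off, the leader only gathers tuples, so any remaining degradation would point away from leader-side contention.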
 

https://www.postgresql.org/docs/devel/parallel-plans.html

"In a parallel bitmap heap scan, one process is chosen as the leader. That process performs a scan of one or more
indexes and builds a bitmap indicating which table blocks need to be visited. These blocks are then divided among the
cooperating processes as in a parallel sequential scan. In other words, the heap scan is performed in parallel, but the
underlying index scan is not."
 


BTW, the following sentences seem worth revisiting, because "the work to be done" is not the same for parallel INSERT as
for serial INSERT - the order of rows stored, table and index sizes, and what else?
 

https://www.postgresql.org/docs/devel/using-explain.html#USING-EXPLAIN-ANALYZE

"It's worth noting that although the data-modifying node can take a considerable amount of run time (here, it's
consuming the lion's share of the time), the planner does not currently add anything to the cost estimates to account
for that work. That's because the work to be done is the same for every correct query plan, so it doesn't affect
planning decisions."
 


Regards
Takayuki Tsunakawa



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > > The code check that you have identified above ensures that the INSERT
> > > has an underlying SELECT, because the planner won't (and shouldn't
> > > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > > point doing any parallel-safety checks in this case.
> > > It just so happens that the problem test case uses INSERT...VALUES -
> > > and it shouldn't have triggered the parallel-safety checks for
> > > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > > (and shouldn't) be parallelized.
> >
> > AFAICS, max_parallel_hazard() path never bails from doing further
> > safety checks based on anything other than finding a query component
> > whose hazard level crosses context->max_interesting.
>
> It's parallel UNSAFE because it's not safe or even possible to have a
> parallel plan for this.
> (as UNSAFE is the max hazard level, no point in referencing
> context->max_interesting).
> And there are existing cases of bailing out and not doing further
> safety checks (even your v15_delta.diff patch placed code - for
> bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> existing case in max_parallel_hazard_walker()):
>
>     else if (IsA(node, Query))
>     {
>         Query      *query = (Query *) node;
>
>         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
>         if (query->rowMarks != NULL)
>         {
>             context->max_hazard = PROPARALLEL_UNSAFE;
>             return true;
>         }

In my understanding, the max_parallel_hazard() query tree walk is to
find constructs that are parallel unsafe in that they call code that
can't run in parallel mode.  For example, FOR UPDATE/SHARE on
traditional heap AM tuples calls AssignTransactionId() which doesn't
support being called in parallel mode.  Likewise ON CONFLICT ... DO
UPDATE calls heap_update() which doesn't support parallelism.  I'm not
aware of any such hazards downstream of ExecValuesScan().

> >You're trying to
> > add something that bails based on second-guessing that a parallel path
> > would not be chosen, which I find somewhat objectionable.
> >
> > If the main goal of bailing out is to avoid doing the potentially
> > expensive modification safety check on the target relation, maybe we
> > should try to somehow make the check less expensive.  I remember
> > reading somewhere in the thread about caching the result of this check
> > in relcache, but haven't closely studied the feasibility of doing so.
> >
>
> There's no "second-guessing" involved here.
> There is no underlying way of dividing up the VALUES data of
> "INSERT...VALUES" amongst the parallel workers, even if the planner
> was updated to produce a parallel-plan for the "INSERT...VALUES" case
> (apart from the fact that spawning off parallel workers to insert that
> data would almost always result in worse performance than a
> non-parallel plan...)
> The division of work for parallel workers is part of the table AM
> (scan) implementation, which is not invoked for "INSERT...VALUES".

I don't disagree that the planner would not normally assign a parallel
path simply to pull values out of a VALUES list mentioned in the
INSERT command, but deciding something based on the certainty of it in
an earlier planning phase seems odd to me.  Maybe that's just me
though.

--
Amit Langote
EDB: http://www.enterprisedb.com



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Amit Langote <amitlangote09@gmail.com>
> On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com>
> wrote:
> > There's no "second-guessing" involved here.
> > There is no underlying way of dividing up the VALUES data of
> > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > (apart from the fact that spawning off parallel workers to insert that
> > data would almost always result in worse performance than a
> > non-parallel plan...)
> > The division of work for parallel workers is part of the table AM
> > (scan) implementation, which is not invoked for "INSERT...VALUES".
> 
> I don't disagree that the planner would not normally assign a parallel
> path simply to pull values out of a VALUES list mentioned in the
> INSERT command, but deciding something based on the certainty of it in
> an earlier planning phase seems odd to me.  Maybe that's just me
> though.

In terms of competitiveness, Oracle does not run INSERT VALUES in parallel:


https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-6626C70C-876C-47A4-8C01-9B66574062D8

"The INSERT VALUES statement is never executed in parallel."


And SQL Server doesn't either:

https://docs.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15

"Starting with SQL Server 2016 (13.x) and database compatibility level 130, the INSERT … SELECT statement can be
executed in parallel when inserting into heaps or clustered columnstore indexes (CCI), and using the TABLOCK hint."
 


Regards
Takayuki Tsunakawa



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 10, 2021 at 5:03 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> From: Amit Langote <amitlangote09@gmail.com>
> > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com>
> > wrote:
> > > There's no "second-guessing" involved here.
> > > There is no underlying way of dividing up the VALUES data of
> > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > > (apart from the fact that spawning off parallel workers to insert that
> > > data would almost always result in worse performance than a
> > > non-parallel plan...)
> > > The division of work for parallel workers is part of the table AM
> > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> >
> > I don't disagree that the planner would not normally assign a parallel
> > path simply to pull values out of a VALUES list mentioned in the
> > INSERT command, but deciding something based on the certainty of it in
> > an earlier planning phase seems odd to me.  Maybe that's just me
> > though.
>
> In terms of competitiveness, Oracle does not run INSERT VALUES in parallel:
>
>
https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-6626C70C-876C-47A4-8C01-9B66574062D8
>
> "The INSERT VALUES statement is never executed in parallel."
>
>
> And SQL Server doesn't either:
>
> https://docs.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15
>
> "Starting with SQL Server 2016 (13.x) and database compatibility level 130, the INSERT … SELECT statement can be
> executed in parallel when inserting into heaps or clustered columnstore indexes (CCI), and using the TABLOCK hint."

Just to be clear, I'm not suggesting that we should put effort into
making INSERT ... VALUES run in parallel.  I'm just raising my concern
about embedding the assumption in max_parallel_hazard() that it will
never make sense to do so.

Although, maybe there are other more pressing concerns to resolve, so
I will not insist too much on doing anything about this.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > > The code check that you have identified above ensures that the INSERT
> > > > has an underlying SELECT, because the planner won't (and shouldn't
> > > > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > > > point doing any parallel-safety checks in this case.
> > > > It just so happens that the problem test case uses INSERT...VALUES -
> > > > and it shouldn't have triggered the parallel-safety checks for
> > > > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > > > (and shouldn't) be parallelized.
> > >
> > > AFAICS, max_parallel_hazard() path never bails from doing further
> > > safety checks based on anything other than finding a query component
> > > whose hazard level crosses context->max_interesting.
> >
> > It's parallel UNSAFE because it's not safe or even possible to have a
> > parallel plan for this.
> > (as UNSAFE is the max hazard level, no point in referencing
> > context->max_interesting).
> > And there are existing cases of bailing out and not doing further
> > safety checks (even your v15_delta.diff patch placed code - for
> > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> > existing case in max_parallel_hazard_walker()):
> >
> >     else if (IsA(node, Query))
> >     {
> >         Query      *query = (Query *) node;
> >
> >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> >         if (query->rowMarks != NULL)
> >         {
> >             context->max_hazard = PROPARALLEL_UNSAFE;
> >             return true;
> >         }
>
> In my understanding, the max_parallel_hazard() query tree walk is to
> find constructs that are parallel unsafe in that they call code that
> can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> traditional heap AM tuples calls AssignTransactionId() which doesn't
> support being called in parallel mode.  Likewise ON CONFLICT ... DO
> UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> aware of any such hazards downstream of ExecValuesScan().
>
> > >You're trying to
> > > add something that bails based on second-guessing that a parallel path
> > > would not be chosen, which I find somewhat objectionable.
> > >
> > > If the main goal of bailing out is to avoid doing the potentially
> > > expensive modification safety check on the target relation, maybe we
> > > should try to somehow make the check less expensive.  I remember
> > > reading somewhere in the thread about caching the result of this check
> > > in relcache, but haven't closely studied the feasibility of doing so.
> > >
> >
> > There's no "second-guessing" involved here.
> > There is no underlying way of dividing up the VALUES data of
> > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > (apart from the fact that spawning off parallel workers to insert that
> > data would almost always result in worse performance than a
> > non-parallel plan...)
> > The division of work for parallel workers is part of the table AM
> > (scan) implementation, which is not invoked for "INSERT...VALUES".
>
> I don't disagree that the planner would not normally assign a parallel
> path simply to pull values out of a VALUES list mentioned in the
> INSERT command, but deciding something based on the certainty of it in
> an earlier planning phase seems odd to me.  Maybe that's just me
> though.
>

I think it is more of a case where there is neither a need for
parallelism nor a desire to support parallelism for it. The other
possibility for such a check could be at some earlier phase, say in
standard_planner [1], where we are doing checks for other constructs
for which we don't want parallelism (I think the check for
'parse->hasModifyingCTE' is quite similar). If you look at that check,
we likewise just assume other operations to be in the category of
parallel-unsafe. I think we should rule out such cases earlier rather
than later. Do you have better ideas than what Greg has done to avoid
parallelism for such cases?

[1] -
standard_planner()
{
..
if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
IsUnderPostmaster &&
parse->commandType == CMD_SELECT &&
!parse->hasModifyingCTE &&
max_parallel_workers_per_gather > 0 &&
!IsParallelWorker())
{
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
}
else
{
/* skip the query tree scan, just assume it's unsafe */
glob->maxParallelHazard = PROPARALLEL_UNSAFE;
glob->parallelModeOK = false;
}

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Wed, Feb 10, 2021 at 11:13 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>
> The loss is probably due to 1) more index page splits, 2) more buffer writes (table and index), and 3) internal locks
> for things such as relation extension and page content protection.  To investigate 3), we would want something like
> [1], which tells us the wait event statistics (wait count and time for each wait event) per session or across the
> instance, like Oracle, MySQL and EDB provide.  I want to continue this in the near future.
>
>

I think we might mitigate such losses with a different patch where we
can do a bulk insert for Insert .. Select something like we are
discussing in the other thread [1]. I wonder if these performance
characteristics are due to the underlying bitmap heap scan. What are
the results if we disable the bitmap heap scan (SET enable_bitmapscan =
off)? If that happens to be true, then we might
also want to consider if in some way we can teach parallel insert to
cost more in such cases. Another thing we can try is to integrate a
parallel-insert patch with the patch on another thread [1] and see if
that makes any difference but not sure if we want to go there at this
stage unless it is simple to try that out?

[1] - https://www.postgresql.org/message-id/20200508072545.GA9701%40telsasoft.com

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 10, 2021 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > It's parallel UNSAFE because it's not safe or even possible to have a
> > > parallel plan for this.
> > > (as UNSAFE is the max hazard level, no point in referencing
> > > context->max_interesting).
> > > And there are existing cases of bailing out and not doing further
> > > safety checks (even your v15_delta.diff patch placed code - for
> > > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> > > existing case in max_parallel_hazard_walker()):
> > >
> > >     else if (IsA(node, Query))
> > >     {
> > >         Query      *query = (Query *) node;
> > >
> > >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> > >         if (query->rowMarks != NULL)
> > >         {
> > >             context->max_hazard = PROPARALLEL_UNSAFE;
> > >             return true;
> > >         }
> >
> > In my understanding, the max_parallel_hazard() query tree walk is to
> > find constructs that are parallel unsafe in that they call code that
> > can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> > traditional heap AM tuples calls AssignTransactionId() which doesn't
> > support being called in parallel mode.  Likewise ON CONFLICT ... DO
> > UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> > aware of any such hazards downstream of ExecValuesScan().
> >
> > > >You're trying to
> > > > add something that bails based on second-guessing that a parallel path
> > > > would not be chosen, which I find somewhat objectionable.
> > > >
> > > > If the main goal of bailing out is to avoid doing the potentially
> > > > expensive modification safety check on the target relation, maybe we
> > > > should try to somehow make the check less expensive.  I remember
> > > > reading somewhere in the thread about caching the result of this check
> > > > in relcache, but haven't closely studied the feasibility of doing so.
> > > >
> > >
> > > There's no "second-guessing" involved here.
> > > There is no underlying way of dividing up the VALUES data of
> > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > > (apart from the fact that spawning off parallel workers to insert that
> > > data would almost always result in worse performance than a
> > > non-parallel plan...)
> > > The division of work for parallel workers is part of the table AM
> > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> >
> > I don't disagree that the planner would not normally assign a parallel
> > path simply to pull values out of a VALUES list mentioned in the
> > INSERT command, but deciding something based on the certainty of it in
> > an earlier planning phase seems odd to me.  Maybe that's just me
> > though.
> >
>
> I think it is more of a case where neither is a need for parallelism
> nor we want to support parallelism of it. The other possibility for
> such a check could be at some earlier phase say in standard_planner
> [1] where we are doing checks for other constructs where we don't want
> parallelism (I think the check for 'parse->hasModifyingCTE' is quite
> similar). If you see in that check as well we just assume other
> operations to be in the category of parallel-unsafe. I think we should
> rule out such cases earlier than later. Do you have better ideas than
> what Greg has done to avoid parallelism for such cases?
>
> [1] -
> standard_planner()
> {
> ..
> if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> IsUnderPostmaster &&
> parse->commandType == CMD_SELECT &&
> !parse->hasModifyingCTE &&
> max_parallel_workers_per_gather > 0 &&
> !IsParallelWorker())
> {
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> }
> else
> {
> /* skip the query tree scan, just assume it's unsafe */
> glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> glob->parallelModeOK = false;
> }

Yeah, maybe having the block I was commenting on, viz.:

+   /*
+    * If there is no underlying SELECT, a parallel table-modification
+    * operation is not possible (nor desirable).
+    */
+   hasSubQuery = false;
+   foreach(lc, parse->rtable)
+   {
+       rte = lfirst_node(RangeTblEntry, lc);
+       if (rte->rtekind == RTE_SUBQUERY)
+       {
+           hasSubQuery = true;
+           break;
+       }
+   }
+   if (!hasSubQuery)
+       return PROPARALLEL_UNSAFE;

before the standard_planner() block you quoted might be a good idea.
So something like this:

+   /*
+    * If there is no underlying SELECT, a parallel table-modification
+    * operation is not possible (nor desirable).
+    */
+   rangeTableHasSubQuery = false;
+   foreach(lc, parse->rtable)
+   {
+       rte = lfirst_node(RangeTblEntry, lc);
+       if (rte->rtekind == RTE_SUBQUERY)
+       {
+           rangeTableHasSubQuery = true;
+           break;
+       }
+   }

    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
        IsUnderPostmaster &&
        (parse->commandType == CMD_SELECT ||
         (IsModifySupportedInParallelMode(parse->commandType) &&
          rangeTableHasSubQuery)) &&
        !parse->hasModifyingCTE &&
        max_parallel_workers_per_gather > 0 &&
        !IsParallelWorker())
    {
        /* all the cheap tests pass, so scan the query tree */
        glob->maxParallelHazard = max_parallel_hazard(parse);
        glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
    }
    else
    {
        /* skip the query tree scan, just assume it's unsafe */
        glob->maxParallelHazard = PROPARALLEL_UNSAFE;
        glob->parallelModeOK = false;
    }

-- 
Amit Langote
EDB: http://www.enterprisedb.com



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Amit Langote <amitlangote09@gmail.com>
> Just to be clear, I'm not suggesting that we should put effort into
> making INSERT ... VALUES run in parallel.  I'm just raising my concern
> about embedding the assumption in max_parallel_hazard() that it will
> never make sense to do so.

I'm sorry I misunderstood your suggestion.  So, you're suggesting that it may be better to place the VALUES existence
check outside max_parallel_hazard().  (I'm a bit worried that I may be misunderstanding in yet a different way.)
 

The description of max_parallel_hazard() gave me an impression that this is the right place to check VALUES, because
its role can be paraphrased in simpler words like "Tell you if the given Query is safe for parallel execution."  In that
regard, the standard_planner()'s if conditions that check Query->commandType and Query->hasModifyingCTE can be moved
into max_parallel_hazard() too.
 

But looking closer at the description, it says "Returns the worst function hazard."  Function hazard?  Should this
function only check functions?  Or do we want to modify this description and get max_parallel_hazard() to provide more
service?


/*
 * max_parallel_hazard
 *      Find the worst parallel-hazard level in the given query
 *
 * Returns the worst function hazard property (the earliest in this list:
 * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE) that can
 * be found in the given parsetree.  We use this to find out whether the query
 * can be parallelized at all.  The caller will also save the result in
 * PlannerGlobal so as to short-circuit checks of portions of the querytree
 * later, in the common case where everything is SAFE.
 */
char
max_parallel_hazard(Query *parse)


Regards
Takayuki Tsunakawa



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > >
> > >     else if (IsA(node, Query))
> > >     {
> > >         Query      *query = (Query *) node;
> > >
> > >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> > >         if (query->rowMarks != NULL)
> > >         {
> > >             context->max_hazard = PROPARALLEL_UNSAFE;
> > >             return true;
> > >         }
> >
> > In my understanding, the max_parallel_hazard() query tree walk is to
> > find constructs that are parallel unsafe in that they call code that
> > can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> > traditional heap AM tuples calls AssignTransactionId() which doesn't
> > support being called in parallel mode.  Likewise ON CONFLICT ... DO
> > UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> > aware of any such hazards downstream of ExecValuesScan().
> >
> > > >You're trying to
> > > > add something that bails based on second-guessing that a parallel
> > > >path  would not be chosen, which I find somewhat objectionable.
> > > >
> > > > If the main goal of bailing out is to avoid doing the potentially
> > > > expensive modification safety check on the target relation, maybe
> > > > we should try to somehow make the check less expensive.  I
> > > > remember reading somewhere in the thread about caching the result
> > > > of this check in relcache, but haven't closely studied the feasibility
> of doing so.
> > > >
> > >
> > > There's no "second-guessing" involved here.
> > > There is no underlying way of dividing up the VALUES data of
> > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > was updated to produce a parallel-plan for the "INSERT...VALUES"
> > > case (apart from the fact that spawning off parallel workers to
> > > insert that data would almost always result in worse performance
> > > than a non-parallel plan...) The division of work for parallel
> > > workers is part of the table AM
> > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> >
> > I don't disagree that the planner would not normally assign a parallel
> > path simply to pull values out of a VALUES list mentioned in the
> > INSERT command, but deciding something based on the certainty of it in
> > an earlier planning phase seems odd to me.  Maybe that's just me
> > though.
> >
> 
> I think it is more of a case where neither is a need for parallelism nor
> we want to support parallelism of it. The other possibility for such a check
> could be at some earlier phase say in standard_planner [1] where we are
> doing checks for other constructs where we don't want parallelism (I think
> the check for 'parse->hasModifyingCTE' is quite similar). If you see in
> that check as well we just assume other operations to be in the category
> of parallel-unsafe. I think we should rule out such cases earlier than later.
> Do you have better ideas than what Greg has done to avoid parallelism for
> such cases?
> 
> [1] -
> standard_planner()
> {
> ..
> if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster &&
> parse->commandType == CMD_SELECT &&
> !parse->hasModifyingCTE &&
> max_parallel_workers_per_gather > 0 &&
> !IsParallelWorker())
> {
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> }
> else
> {
> /* skip the query tree scan, just assume it's unsafe */
> glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> glob->parallelModeOK = false;
> }

+1.

In the current parallel_dml option patch, I put this check and some other high-level checks in a separate function
called is_parallel_possible_for_modify.


- * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE
+ * Check at a high-level if parallel mode is able to be used for the specified
+ * table-modification statement.
+ * It's not possible in the following cases:
+ *
+ *  1) enable_parallel_dml is off
+ *  2) UPDATE or DELETE command
+ *  3) INSERT...ON CONFLICT...DO UPDATE
+ *  4) INSERT without SELECT on a relation
+ *  5) the reloption parallel_dml_enabled is not set for the target table
+ *
+ * (Note: we don't do in-depth parallel-safety checks here, we do only the
+ * cheaper tests that can quickly exclude obvious cases for which
+ * parallelism isn't supported, to avoid having to do further parallel-safety
+ * checks for these)
  */
+bool
+is_parallel_possible_for_modify(Query *parse)





And I put the function at earlier place like the following:


    if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
         IsUnderPostmaster &&
         (parse->commandType == CMD_SELECT ||
-        (enable_parallel_dml &&
-        IsModifySupportedInParallelMode(parse->commandType))) &&
+        is_parallel_possible_for_modify(parse)) &&
         !parse->hasModifyingCTE &&
         max_parallel_workers_per_gather > 0 &&
         !IsParallelWorker())


If this looks good, maybe we can merge this change.

Best regards,
houzj







Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 10, 2021 at 5:52 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> From: Amit Langote <amitlangote09@gmail.com>
> > Just to be clear, I'm not suggesting that we should put effort into
> > making INSERT ... VALUES run in parallel.  I'm just raising my concern
> > about embedding the assumption in max_parallel_hazard() that it will
> > never make sense to do so.
>
> I'm sorry I misunderstood your suggestion.  So, you're suggesting that it may be better to place the VALUES existence
> check outside max_parallel_hazard().  (I'm a bit worried that I may be misunderstanding in yet a different way.)
 

To add context to my comments, here's the block of code in the patch I
was commenting on:

+   /*
+    * If there is no underlying SELECT, a parallel table-modification
+    * operation is not possible (nor desirable).
+    */
+   hasSubQuery = false;
+   foreach(lc, parse->rtable)
+   {
+       rte = lfirst_node(RangeTblEntry, lc);
+       if (rte->rtekind == RTE_SUBQUERY)
+       {
+           hasSubQuery = true;
+           break;
+       }
+   }
+   if (!hasSubQuery)
+       return PROPARALLEL_UNSAFE;

For a modification query, this makes max_parallel_hazard() return that
it is unsafe to parallelize the query because it doesn't contain a
subquery RTE, or only contains a VALUES RTE.

I was trying to say that inside max_parallel_hazard() seems to be a
wrong place to reject parallelism for modification if only because
there are no subquery RTEs in the query.  Although now I'm thinking
that maybe it's okay as long as it's appropriately placed.  I shared
one suggestion in my reply to Amit K's email.

> The description of max_parallel_hazard() gave me an impression that this is the right place to check VALUES, because
> its role can be paraphrased in simpler words like "Tell you if the given Query is safe for parallel execution."
 
>
> In that regard, the standard_planner()'s if conditions that check Query->commandType and Query->hasModifyingCTE can
> be moved into max_parallel_hazard() too.
 
>
> But looking closer at the description, it says "Returns the worst function hazard."  Function hazard?  Should this
> function only check functions?  Or do we want to modify this description and get max_parallel_hazard() to provide more
> service?

Yeah, updating the description to be more general may make sense.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 10, 2021 at 5:50 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Feb 10, 2021 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Feb 10, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > > It's parallel UNSAFE because it's not safe or even possible to have a
> > > > parallel plan for this.
> > > > (as UNSAFE is the max hazard level, no point in referencing
> > > > context->max_interesting).
> > > > And there are existing cases of bailing out and not doing further
> > > > safety checks (even your v15_delta.diff patch placed code - for
> > > > bailing out on "ON CONFLICT ... DO UPDATE" - underneath one such
> > > > existing case in max_parallel_hazard_walker()):
> > > >
> > > >     else if (IsA(node, Query))
> > > >     {
> > > >         Query      *query = (Query *) node;
> > > >
> > > >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> > > >         if (query->rowMarks != NULL)
> > > >         {
> > > >             context->max_hazard = PROPARALLEL_UNSAFE;
> > > >             return true;
> > > >         }
> > >
> > > In my understanding, the max_parallel_hazard() query tree walk is to
> > > find constructs that are parallel unsafe in that they call code that
> > > can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> > > traditional heap AM tuples calls AssignTransactionId() which doesn't
> > > support being called in parallel mode.  Likewise ON CONFLICT ... DO
> > > UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> > > aware of any such hazards downstream of ExecValuesScan().
> > >
> > > > >You're trying to
> > > > > add something that bails based on second-guessing that a parallel path
> > > > > would not be chosen, which I find somewhat objectionable.
> > > > >
> > > > > If the main goal of bailing out is to avoid doing the potentially
> > > > > expensive modification safety check on the target relation, maybe we
> > > > > should try to somehow make the check less expensive.  I remember
> > > > > reading somewhere in the thread about caching the result of this check
> > > > > in relcache, but haven't closely studied the feasibility of doing so.
> > > > >
> > > >
> > > > There's no "second-guessing" involved here.
> > > > There is no underlying way of dividing up the VALUES data of
> > > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > > > (apart from the fact that spawning off parallel workers to insert that
> > > > data would almost always result in worse performance than a
> > > > non-parallel plan...)
> > > > The division of work for parallel workers is part of the table AM
> > > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> > >
> > > I don't disagree that the planner would not normally assign a parallel
> > > path simply to pull values out of a VALUES list mentioned in the
> > > INSERT command, but deciding something based on the certainty of it in
> > > an earlier planning phase seems odd to me.  Maybe that's just me
> > > though.
> > >
> >
> > I think it is more of a case where neither is a need for parallelism
> > nor we want to support parallelism of it. The other possibility for
> > such a check could be at some earlier phase say in standard_planner
> > [1] where we are doing checks for other constructs where we don't want
> > parallelism (I think the check for 'parse->hasModifyingCTE' is quite
> > similar). If you see in that check as well we just assume other
> > operations to be in the category of parallel-unsafe. I think we should
> > rule out such cases earlier than later. Do you have better ideas than
> > what Greg has done to avoid parallelism for such cases?
> >
> > [1] -
> > standard_planner()
> > {
> > ..
> > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> > IsUnderPostmaster &&
> > parse->commandType == CMD_SELECT &&
> > !parse->hasModifyingCTE &&
> > max_parallel_workers_per_gather > 0 &&
> > !IsParallelWorker())
> > {
> > /* all the cheap tests pass, so scan the query tree */
> > glob->maxParallelHazard = max_parallel_hazard(parse);
> > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > }
> > else
> > {
> > /* skip the query tree scan, just assume it's unsafe */
> > glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> > glob->parallelModeOK = false;
> > }
>
> Yeah, maybe having the block I was commenting on, viz.:
>
> +   /*
> +    * If there is no underlying SELECT, a parallel table-modification
> +    * operation is not possible (nor desirable).
> +    */
> +   hasSubQuery = false;
> +   foreach(lc, parse->rtable)
> +   {
> +       rte = lfirst_node(RangeTblEntry, lc);
> +       if (rte->rtekind == RTE_SUBQUERY)
> +       {
> +           hasSubQuery = true;
> +           break;
> +       }
> +   }
> +   if (!hasSubQuery)
> +       return PROPARALLEL_UNSAFE;
>
> before the standard_planner() block you quoted might be a good idea.
> So something like this:
>
> +   /*
> +    * If there is no underlying SELECT, a parallel table-modification
> +    * operation is not possible (nor desirable).
> +    */
> +   rangeTableHasSubQuery = false;
> +   foreach(lc, parse->rtable)
> +   {
> +       rte = lfirst_node(RangeTblEntry, lc);
> +       if (rte->rtekind == RTE_SUBQUERY)
> +       {
> +           rangeTableHasSubQuery = true;
> +           break;
> +       }
> +   }
>
>     if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
>         IsUnderPostmaster &&
>         (parse->commandType == CMD_SELECT ||
>          (IsModifySupportedInParallelMode(parse->commandType) &&
>           rangeTableHasSubQuery)) &&
>         !parse->hasModifyingCTE &&
>         max_parallel_workers_per_gather > 0 &&
>         !IsParallelWorker())
>     {
>         /* all the cheap tests pass, so scan the query tree */
>         glob->maxParallelHazard = max_parallel_hazard(parse);
>         glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
>     }
>     else
>     {
>         /* skip the query tree scan, just assume it's unsafe */
>         glob->maxParallelHazard = PROPARALLEL_UNSAFE;
>         glob->parallelModeOK = false;
>     }
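
As a cross-check of the proposed logic, the range-table scan can be modeled as a
standalone helper (a minimal sketch using stub types and a plain array in place
of PostgreSQL's List; the names are hypothetical, not the actual implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL's RTEKind and RangeTblEntry. */
typedef enum RTEKind
{
    RTE_RELATION,
    RTE_SUBQUERY,
    RTE_VALUES
} RTEKind;

typedef struct RangeTblEntry
{
    RTEKind rtekind;
} RangeTblEntry;

/*
 * Return true if any entry in the range table is a subquery RTE,
 * mirroring the foreach loop in the proposed patch (over a plain
 * array here rather than a PostgreSQL List).
 */
static bool
range_table_has_subquery(const RangeTblEntry *rtable, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
    {
        if (rtable[i].rtekind == RTE_SUBQUERY)
            return true;
    }
    return false;
}
```

For "INSERT ... VALUES" there is no subquery RTE, so the helper returns false
and parallelism would be ruled out up front.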

On second thought, maybe we could even put the hasSubQuery-based
short-circuit in the following block of max_parallel_hazard_walker():

    /*
     * When we're first invoked on a completely unplanned tree, we must
     * recurse into subqueries so as to locate parallel-unsafe constructs
     * anywhere in the tree.
     */
    else if (IsA(node, Query))
    {
        Query      *query = (Query *) node;

        /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
        if (query->rowMarks != NULL)
        {
            context->max_hazard = PROPARALLEL_UNSAFE;
            return true;
        }

Also, update the comment to mention we bail out if (!hasSubQuery) as a
special case.
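
A minimal sketch of how that Query-case short-circuit might look, using
hypothetical simplified stand-ins for the planner structures (not the real
PostgreSQL types or the real walker):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified stand-ins for the planner structures involved. */
typedef enum CmdType
{
    CMD_SELECT,
    CMD_INSERT
} CmdType;

typedef struct Query
{
    CmdType commandType;
    bool    hasRowMarks;    /* stands in for query->rowMarks != NULL */
    bool    hasSubQueryRTE; /* result of the range-table scan */
} Query;

#define PROPARALLEL_SAFE   's'
#define PROPARALLEL_UNSAFE 'u'

/*
 * Sketch of the proposed short-circuits at the top of the Query case in
 * max_parallel_hazard_walker(): treat SELECT FOR UPDATE/SHARE as unsafe,
 * and, as the suggested special case, treat a table-modification query
 * with no underlying SELECT (e.g. INSERT ... VALUES) as unsafe too.
 */
static char
query_parallel_hazard(const Query *query)
{
    if (query->hasRowMarks)
        return PROPARALLEL_UNSAFE;

    if (query->commandType != CMD_SELECT && !query->hasSubQueryRTE)
        return PROPARALLEL_UNSAFE;

    return PROPARALLEL_SAFE;    /* in reality, keep walking the tree */
}
```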

-- 
Amit Langote
EDB: http://www.enterprisedb.com



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> What are the results if you disable the bitmap heap scan (set enable_bitmapscan
> = off)? If that happens to be true, then we might also want to consider
> if in some way we can teach parallel insert to cost more in such cases.
> Another thing we can try is to integrate a parallel-insert patch with the
> patch on another thread [1] and see if that makes any difference but not
> sure if we want to go there at this stage unless it is simple to try that
> out?

If we disable bitmapscan, the performance degradation does not seem to happen.

[Parallel]
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL)  insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..2090216.68 rows=81338 width=0) (actual time=0.226..5488.455 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   Buffers: shared hit=393364 read=1079535 dirtied=984 written=1027
   WAL: records=260400 bytes=16549513
   ->  Insert on public.testscan  (cost=0.00..2081082.88 rows=0 width=0) (actual time=5483.113..5483.114 rows=0 loops=4)
         Buffers: shared hit=393364 read=1079535 dirtied=984 written=1027
         WAL: records=260400 bytes=16549513
         Worker 0:  actual time=5483.116..5483.117 rows=0 loops=1
           Buffers: shared hit=36306 read=264288 dirtied=100 written=49
           WAL: records=23895 bytes=1575860
         Worker 1:  actual time=5483.220..5483.222 rows=0 loops=1
           Buffers: shared hit=39750 read=280476 dirtied=101 written=106
           WAL: records=26141 bytes=1685083
         Worker 2:  actual time=5482.844..5482.845 rows=0 loops=1
           Buffers: shared hit=38660 read=263713 dirtied=105 written=250
           WAL: records=25318 bytes=1657396
         Worker 3:  actual time=5483.272..5483.274 rows=0 loops=1
           Buffers: shared hit=278648 read=271058 dirtied=678 written=622
           WAL: records=185046 bytes=11631174
         ->  Parallel Seq Scan on public.x  (cost=0.00..2081082.88 rows=20334 width=8) (actual time=4001.641..5287.248 rows=32500 loops=4)
               Output: x.a, NULL::integer
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
               Rows Removed by Filter: 49967500
               Buffers: shared hit=1551 read=1079531
               Worker 0:  actual time=5335.456..5340.757 rows=11924 loops=1
                 Buffers: shared hit=281 read=264288
               Worker 1:  actual time=5335.559..5341.766 rows=13049 loops=1
                 Buffers: shared hit=281 read=280476
               Worker 2:  actual time=5335.534..5340.964 rows=12636 loops=1
                 Buffers: shared hit=278 read=263712
               Worker 3:  actual time=0.015..5125.503 rows=92390 loops=1
                 Buffers: shared hit=711 read=271055
 Planning:
   Buffers: shared hit=19
 Planning Time: 0.175 ms
 Execution Time: 5488.493 ms

[Serial]
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL)  insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan  (cost=0.00..5081085.52 rows=0 width=0) (actual time=19311.642..19311.643 rows=0 loops=1)
   Buffers: shared hit=392485 read=1079694 dirtied=934 written=933
   WAL: records=260354 bytes=16259841
   ->  Seq Scan on public.x  (cost=0.00..5081085.52 rows=81338 width=8) (actual time=0.010..18997.317 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 199870001
         Buffers: shared hit=1391 read=1079691
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.125 ms
 Execution Time: 19311.700 ms

Best regards,
houzj



RE: Parallel INSERT (INTO ... SELECT ...)

From
"Hou, Zhijie"
Date:
> > > It did have performance gain, but I think it's not huge enough to
> > > ignore the extra index cost.
> > > What do you think ?
> >
> > Yes... as you suspect, I'm afraid the benefit from parallel bitmap
> > scan may not compensate for the loss of the parallel insert operation.
> >
> > The loss is probably due to 1) more index page splits, 2) more buffer
> > writes (table and index), and 3) internal locks for things such as
> > relation extension and page content protection.  To investigate 3), we
> > would want something like [1], which tells us the wait event
> > statistics (wait count and time for each wait event) per session or
> > across the instance, as Oracle, MySQL and EDB provide.  I want to
> > continue this in the near future.
> 
> What would the result look like if you turn off
> parallel_leader_participation?  If the leader is freed from
> reading/writing the table and index, the index page splits and internal
> lock contention may decrease enough to recover part of the loss.
> 
> https://www.postgresql.org/docs/devel/parallel-plans.html
> 
> "In a parallel bitmap heap scan, one process is chosen as the leader. That
> process performs a scan of one or more indexes and builds a bitmap indicating
> which table blocks need to be visited. These blocks are then divided among
> the cooperating processes as in a parallel sequential scan. In other words,
> the heap scan is performed in parallel, but the underlying index scan is
> not."

If I disable parallel_leader_participation:

For max_parallel_workers_per_gather = 4, there is still performance degradation.

For max_parallel_workers_per_gather = 2, the performance degradation does not happen in most cases.
There is occasional noise (performance degradation), but most results (about 80%) are good.

Best regards,
houzj



Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Feb 10, 2021 at 8:59 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > >
> > > >     else if (IsA(node, Query))
> > > >     {
> > > >         Query      *query = (Query *) node;
> > > >
> > > >         /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> > > >         if (query->rowMarks != NULL)
> > > >         {
> > > >             context->max_hazard = PROPARALLEL_UNSAFE;
> > > >             return true;
> > > >         }
> > >
> > > In my understanding, the max_parallel_hazard() query tree walk is to
> > > find constructs that are parallel unsafe in that they call code that
> > > can't run in parallel mode.  For example, FOR UPDATE/SHARE on
> > > traditional heap AM tuples calls AssignTransactionId() which doesn't
> > > support being called in parallel mode.  Likewise ON CONFLICT ... DO
> > > UPDATE calls heap_update() which doesn't support parallelism.  I'm not
> > > aware of any such hazards downstream of ExecValuesScan().
> > >
> > > > >You're trying to
> > > > > add something that bails based on second-guessing that a parallel
> > > > >path  would not be chosen, which I find somewhat objectionable.
> > > > >
> > > > > If the main goal of bailing out is to avoid doing the potentially
> > > > > expensive modification safety check on the target relation, maybe
> > > > > we should try to somehow make the check less expensive.  I
> > > > > remember reading somewhere in the thread about caching the result
> > > > > of this check in relcache, but haven't closely studied the feasibility
> > of doing so.
> > > > >
> > > >
> > > > There's no "second-guessing" involved here.
> > > > There is no underlying way of dividing up the VALUES data of
> > > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > > was updated to produce a parallel-plan for the "INSERT...VALUES"
> > > > case (apart from the fact that spawning off parallel workers to
> > > > insert that data would almost always result in worse performance
> > > > than a non-parallel plan...) The division of work for parallel
> > > > workers is part of the table AM
> > > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> > >
> > > I don't disagree that the planner would not normally assign a parallel
> > > path simply to pull values out of a VALUES list mentioned in the
> > > INSERT command, but deciding something based on the certainty of it in
> > > an earlier planning phase seems odd to me.  Maybe that's just me
> > > though.
> > >
> >
> > I think it is more a case where there is neither a need for parallelism nor
> > a desire to support parallelism for it. The other possibility for such a check
> > could be at some earlier phase say in standard_planner [1] where we are
> > doing checks for other constructs where we don't want parallelism (I think
> > the check for 'parse->hasModifyingCTE' is quite similar). If you see in
> > that check as well we just assume other operations to be in the category
> > of parallel-unsafe. I think we should rule out such cases earlier than later.
> > Do you have better ideas than what Greg has done to avoid parallelism for
> > such cases?
> >
> > [1] -
> > standard_planner()
> > {
> > ..
> > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster &&
> > parse->commandType == CMD_SELECT &&
> > !parse->hasModifyingCTE &&
> > max_parallel_workers_per_gather > 0 &&
> > !IsParallelWorker())
> > {
> > /* all the cheap tests pass, so scan the query tree */
> > glob->maxParallelHazard = max_parallel_hazard(parse);
> > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > }
> > else
> > {
> > /* skip the query tree scan, just assume it's unsafe */
> > glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> > glob->parallelModeOK = false;
> > }
>
> +1.
>
> In the current parallel_dml option patch, I put this check and some other high-level checks in a separate function called
> is_parallel_possible_for_modify.
>
>
> - * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE
> + * Check at a high level whether parallel mode can be used for the specified
> + * table-modification statement.
> + * It's not possible in the following cases:
> + *
> + *  1) enable_parallel_dml is off
> + *  2) UPDATE or DELETE command
> + *  3) INSERT...ON CONFLICT...DO UPDATE
> + *  4) INSERT without SELECT on a relation
> + *  5) the reloption parallel_dml_enabled is not set for the target table
> + *
> + * (Note: we don't do in-depth parallel-safety checks here, we do only the
> + * cheaper tests that can quickly exclude obvious cases for which
> + * parallelism isn't supported, to avoid having to do further parallel-safety
> + * checks for these)
>   */
> +bool
> +is_parallel_possible_for_modify(Query *parse)
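
The five exclusions listed in the comment above could be sketched roughly as
follows (a hedged illustration with hypothetical simplified inputs; the real
function inspects the actual Query tree and catalog state):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified inputs for the five cheap checks listed above. */
typedef enum CmdType
{
    CMD_SELECT,
    CMD_INSERT,
    CMD_UPDATE,
    CMD_DELETE
} CmdType;

typedef struct Query
{
    CmdType commandType;
    bool    hasOnConflictDoUpdate;  /* INSERT ... ON CONFLICT ... DO UPDATE */
    bool    hasSubQueryRTE;         /* the INSERT has an underlying SELECT */
    bool    relOptionParallelDml;   /* reloption parallel_dml_enabled */
} Query;

static bool enable_parallel_dml = true;    /* the GUC from the patch */

/*
 * Sketch of the high-level gate described in the comment above: only the
 * cheap exclusions, no in-depth parallel-safety checks.
 */
static bool
is_parallel_possible_for_modify(const Query *parse)
{
    if (!enable_parallel_dml)
        return false;               /* (1) GUC is off */
    if (parse->commandType == CMD_UPDATE || parse->commandType == CMD_DELETE)
        return false;               /* (2) UPDATE/DELETE unsupported */
    if (parse->hasOnConflictDoUpdate)
        return false;               /* (3) ON CONFLICT ... DO UPDATE */
    if (!parse->hasSubQueryRTE)
        return false;               /* (4) INSERT without SELECT */
    if (!parse->relOptionParallelDml)
        return false;               /* (5) reloption not set */
    return true;
}
```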
>
>
>
>
>
> And I put the function at an earlier place, like the following:
>
>
>         if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
>                 IsUnderPostmaster &&
>                 (parse->commandType == CMD_SELECT ||
> -               (enable_parallel_dml &&
> -               IsModifySupportedInParallelMode(parse->commandType))) &&
> +               is_parallel_possible_for_modify(parse)) &&
>                 !parse->hasModifyingCTE &&
>                 max_parallel_workers_per_gather > 0 &&
>                 !IsParallelWorker())
>
>
> If this looks good, maybe we can merge this change.
>

If I've understood correctly, you're suggesting to merge the
"is_parallel_possible_for_modify()" code from your parallel_dml patch
into the main parallel INSERT patch, right?
If so, I think that's a good idea, as it will help simplify both
patches (and then, if need be, we can still discuss where best to
place certain checks ...).
I'll post an update soon that includes that change (then the
parallel_dml patch will need to be rebased accordingly).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> * I think that the concerns raised by Tsunakawa-san in:
>
>
https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com
>
> regarding how this interacts with plancache.c deserve a look.
> Specifically, a plan that uses parallel insert may fail to be
> invalidated when partitions are altered directly (that is without
> altering their root parent).  That would be because we are not adding
> partition OIDs to PlannerGlobal.invalItems despite making a plan
> that's based on checking their properties.  See this (tested with all
> patches applied!):
>

Does any current Postgres code add partition OIDs to
PlannerGlobal.invalItems for a similar reason?
I would have thought that, for example, partitions with a default
column expression, using a function that is changed from SAFE to
UNSAFE, would suffer the same plancache issue (for current parallel
SELECT functionality) as we're talking about here - but so far I
haven't seen any code handling this.

(Currently invalItems seems to support PROCID and TYPEOID; relation
OIDs seem to be handled through a different mechanism).
Can you elaborate on what you believe is required here, so that the
partition OID dependency is registered and the altered partition
results in the plan being invalidated?
Thanks in advance for any help you can provide here.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Feb 11, 2021 at 5:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > * I think that the concerns raised by Tsunakawa-san in:
> >
> >
https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com
> >
> > regarding how this interacts with plancache.c deserve a look.
> > Specifically, a plan that uses parallel insert may fail to be
> > invalidated when partitions are altered directly (that is without
> > altering their root parent).  That would be because we are not adding
> > partition OIDs to PlannerGlobal.invalItems despite making a plan
> > that's based on checking their properties.  See this (tested with all
> > patches applied!):
> >
>
> Does any current Postgres code add partition OIDs to
> PlannerGlobal.invalItems for a similar reason?
> I would have thought that, for example,  partitions with a default
> column expression, using a function that is changed from SAFE to
> UNSAFE, would suffer the same plancache issue (for current parallel
> SELECT functionality) as we're talking about here - but so far I
> haven't seen any code handling this.
>
> (Currently invalItems seems to support PROCID and TYPEOID; relation
> OIDs seem to be handled through a different mechanism)..
> Can you elaborate on what you believe is required here, so that the
> partition OID dependency is registered and the altered partition
> results in the plan being invalidated?
> Thanks in advance for any help you can provide here.
>

Actually, I tried adding the following in the loop that checks the
parallel-safety of each partition and it seemed to work:

            glob->relationOids =
                    lappend_oid(glob->relationOids, pdesc->oids[i]);

Can you confirm, is that what you were referring to?
(note that I've already updated the code to use
CreatePartitionDirectory() and PartitionDirectoryLookup())
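
To make the intent of that one-liner concrete, here is a toy model of the
plancache interaction (hypothetical helper and made-up OIDs, not PostgreSQL
code): the planner records the OID of every relation it examined, including
each partition checked for parallel safety, in PlannerGlobal.relationOids,
and a cached plan must be discarded whenever a relcache invalidation arrives
for any recorded OID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/*
 * Toy model: return false (i.e. the cached plan must be rebuilt) when the
 * altered relation appears in the plan's recorded dependency list.
 */
static bool
plan_survives_inval(const Oid *relationOids, size_t noids, Oid altered_relid)
{
    for (size_t i = 0; i < noids; i++)
    {
        if (relationOids[i] == altered_relid)
            return false;   /* plan depended on the altered relation */
    }
    return true;
}
```

Without the lappend_oid() call, a directly altered partition would never
appear in that list, so the stale plan would survive.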

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Posting an updated set of patches. Changes are based on feedback, as
detailed below:

[Amit Langote]
- Integrate max_parallel_hazard_for_modify() with max_parallel_hazard()
- Some function name changes
- Fix partition-related problems (to handle concurrent attachment of
partitions and altering of partitions, plan cache invalidation) and
added some tests for this.
(Method of fixing yet to be verified)
[Hou-san]
- Merge is_parallel_possible_for_modify() from the parallel_dml patch,
which helps in placement of some short-circuits of parallel-safety
checks
- Minor update to documentation for temp and foreign tables
[Greg]
- Temporary fix for query rewriter hasModifyingCTE bug (without
changing query rewriter code - note that v15 patch put fix in query
rewriter)

Hou-san: the parallel_dml patches will need slight rebasing.


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Feb 11, 2021 at 11:17 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches. Changes are based on feedback, as
> detailed below:
>

Oops, looks like I forgot "COSTS OFF" on some added EXPLAINs in the
tests, and it caused some test failures in the PostgreSQL Patch Tester
(cfbot).
Also, I think that perhaps the localized temporary fix included in the
patch for the hasModifyingCTE bug should be restricted to INSERT, even
though the bug actually exists for SELECT too.
Posting an updated set of patches to address these.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Zhihong Yu
Date:
Hi,
For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :

+       /* Assume original queries have hasModifyingCTE set correctly */
+       if (parsetree->hasModifyingCTE)
+           hasModifyingCTE = true;

Since hasModifyingCTE is false by the time the above is run, it can be simplified as:
    hasModifyingCTE = parsetree->hasModifyingCTE

+   if (!hasSubQuery)
+       return false;
+
+   return true;

The above can be simplified as:
    return hasSubQuery;

Cheers

On Thu, Feb 11, 2021 at 7:02 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
On Thu, Feb 11, 2021 at 11:17 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches. Changes are based on feedback, as
> detailed below:
>

Oops, looks like I forgot "COSTS OFF" on some added EXPLAINs in the
tests, and it caused some test failures in the PostgreSQL Patch Tester
(cfbot).
Also, I think that perhaps the localized temporary fix included in the
patch for the hasModifyingCTE bug should be restricted to INSERT, even
though the bug actually exists for SELECT too.
Posting an updated set of patches to address these.

Regards,
Greg Nancarrow
Fujitsu Australia

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 12, 2021 at 2:33 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
>
> +       /* Assume original queries have hasModifyingCTE set correctly */
> +       if (parsetree->hasModifyingCTE)
> +           hasModifyingCTE = true;
>
> Since hasModifyingCTE is false by the time the above is run, it can be simplified as:
>     hasModifyingCTE = parsetree->hasModifyingCTE
>

Actually, we should just return parsetree->hasModifyingCTE at this
point, because if it's false, we shouldn't need to continue the search
(as we're assuming it has been set correctly for QSRC_ORIGINAL case).

> +   if (!hasSubQuery)
> +       return false;
> +
> +   return true;
>
> The above can be simplified as:
>     return hasSubQuery;
>

Yes, absolutely right, silly miss on that one!
Thanks.

This was only ever meant to be a temporary fix for this bug that
affects this patch.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Zhihong Yu
Date:
Greg:
bq. we should just return parsetree->hasModifyingCTE at this point, 

Maybe you can clarify a bit.
The if (parsetree->hasModifyingCTE) check is followed by if (!hasModifyingCTE).
When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true, resulting in the execution of the if (!hasModifyingCTE) block.

In your reply, did you mean that the if (!hasModifyingCTE) block is no longer needed? (I guess not)

Cheers

On Thu, Feb 11, 2021 at 8:14 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
On Fri, Feb 12, 2021 at 2:33 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
>
> +       /* Assume original queries have hasModifyingCTE set correctly */
> +       if (parsetree->hasModifyingCTE)
> +           hasModifyingCTE = true;
>
> Since hasModifyingCTE is false by the time the above is run, it can be simplified as:
>     hasModifyingCTE = parsetree->hasModifyingCTE
>

Actually, we should just return parsetree->hasModifyingCTE at this
point, because if it's false, we shouldn't need to continue the search
(as we're assuming it has been set correctly for QSRC_ORIGINAL case).

> +   if (!hasSubQuery)
> +       return false;
> +
> +   return true;
>
> The above can be simplified as:
>     return hasSubQuery;
>

Yes, absolutely right, silly miss on that one!
Thanks.

This was only ever meant to be a temporary fix for this bug that
affects this patch.

Regards,
Greg Nancarrow
Fujitsu Australia

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 12, 2021 at 3:21 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Greg:
> bq. we should just return parsetree->hasModifyingCTE at this point,
>
> Maybe you can clarify a bit.
> The if (parsetree->hasModifyingCTE) check is followed by if (!hasModifyingCTE).
> When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true, resulting in the execution of the if (!hasModifyingCTE) block.
>
> In your reply, did you mean that the if (!hasModifyingCTE) block is no longer needed ? (I guess not)
>

Sorry for not making it clear. What I meant was that instead of:

if (parsetree->querySource == QSRC_ORIGINAL)
{
  /* Assume original queries have hasModifyingCTE set correctly */
  if (parsetree->hasModifyingCTE)
    hasModifyingCTE = true;
}

I thought I should be able to use the following (if the setting for
QSRC_ORIGINAL can really be trusted):

if (parsetree->querySource == QSRC_ORIGINAL)
{
  /* Assume original queries have hasModifyingCTE set correctly */
  return parsetree->hasModifyingCTE;
}

(and then the "if (!hasModifyingCTE)" test on the code following
immediately below it can be removed)


BUT - after testing that change, the problem test case (in the "with"
tests) STILL fails.
I then checked if hasModifyingCTE is always false in the QSRC_ORIGINAL
case (by adding an Assert), and it always is false.
So actually, there is no point in having the "if
(parsetree->querySource == QSRC_ORIGINAL)" code block - even the
so-called "original" query doesn't maintain the setting correctly (even
though the actual original query sent into the query rewriter does).
And also then the "if (!hasModifyingCTE)" test on the code following
immediately below it can be removed.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> > What would the result look like if you turn off
> > parallel_leader_participation?  If the leader is freed from
> > reading/writing the table and index, the index page splits and
> > internal lock contention may decrease enough to recover part of the loss.
> >
> > https://www.postgresql.org/docs/devel/parallel-plans.html
> >
> > "In a parallel bitmap heap scan, one process is chosen as the leader.
> > That process performs a scan of one or more indexes and builds a
> > bitmap indicating which table blocks need to be visited. These blocks
> > are then divided among the cooperating processes as in a parallel
> > sequential scan. In other words, the heap scan is performed in
> > parallel, but the underlying index scan is not."
> 
> If I disable parallel_leader_participation:
> 
> For max_parallel_workers_per_gather = 4, there is still performance
> degradation.
> 
> For max_parallel_workers_per_gather = 2, the performance degradation does
> not happen in most cases.
> There is occasional noise (performance degradation), but most results
> (about 80%) are good.

Thank you.  The results indicate that it depends on the degree of parallelism whether the gain from parallelism outweighs the loss of parallel insert operations, at least in the bitmap scan case.
 

But can we conclude that this is limited to bitmap scan?  Even if that's the case, the planner does not have information about the insert operation to choose other plans like serial execution or parallel sequential scan.  Should we encourage the user in the manual to tune parameters and find the fastest plan?
 


Regards
Takayuki Tsunakawa



RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> If we disable bitmapscan, the performance degradation does not seem to happen.

Yes, but that's because the hundreds-of-times-slower sequential scan hides the insert time.  Furthermore, as an aside, Worker 3 does much of the work in the parallel sequential scan + parallel insert case, while the load is well balanced in the parallel bitmap scan + parallel insert case.
 

Oracle and SQL Server execute parallel DML by holding an exclusive lock on the target table.  They might use some special path for parallel DML to mitigate contention.
 


[serial bitmap scan + serial insert]
   ->  Bitmap Heap Scan on public.x  (cost=3272.20..3652841.26 rows=79918 width=8) (actual time=8.096..41.005 rows=129999 loops=1)
 
...
 Execution Time: 360.547 ms

[parallel bitmap scan + parallel insert]
         ->  Parallel Bitmap Heap Scan on public.x  (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5)
 
...
 Execution Time: 382.776 ms


[serial sequential scan + serial insert]
   ->  Seq Scan on public.x  (cost=0.00..5081085.52 rows=81338 width=8) (actual time=0.010..18997.317 rows=129999 loops=1)
...
 Execution Time: 19311.700 ms

[parallel sequential scan + parallel insert]
         ->  Parallel Seq Scan on public.x  (cost=0.00..2081082.88 rows=20334 width=8) (actual time=4001.641..5287.248 rows=32500 loops=4)
 
...
 Execution Time: 5488.493 ms


Regards
Takayuki Tsunakawa



Re: Parallel INSERT (INTO ... SELECT ...)

From
Zhihong Yu
Date:
Greg:
Thanks for more debugging.

Cheers

On Thu, Feb 11, 2021 at 9:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
On Fri, Feb 12, 2021 at 3:21 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Greg:
> bq. we should just return parsetree->hasModifyingCTE at this point,
>
> Maybe you can clarify a bit.
> The if (parsetree->hasModifyingCTE) check is followed by if (!hasModifyingCTE).
> When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true, resulting in the execution of the if (!hasModifyingCTE) block.
>
> In your reply, did you mean that the if (!hasModifyingCTE) block is no longer needed ? (I guess not)
>

Sorry for not making it clear. What I meant was that instead of:

if (parsetree->querySource == QSRC_ORIGINAL)
{
  /* Assume original queries have hasModifyingCTE set correctly */
  if (parsetree->hasModifyingCTE)
    hasModifyingCTE = true;
}

I thought I should be able to use the following (if the setting for
QSRC_ORIGINAL can really be trusted):

if (parsetree->querySource == QSRC_ORIGINAL)
{
  /* Assume original queries have hasModifyingCTE set correctly */
  return parsetree->hasModifyingCTE;
}

(and then the "if (!hasModifyingCTE)" test on the code following
immediately below it can be removed)


BUT - after testing that change, the problem test case (in the "with"
tests) STILL fails.
I then checked if hasModifyingCTE is always false in the QSRC_ORIGINAL
case (by adding an Assert), and it always is false.
So actually, there is no point in having the "if
(parsetree->querySource == QSRC_ORIGINAL)" code block - even the
so-called "original" query doesn't maintain the setting correctly (even
though the actual original query sent into the query rewriter does).
And also then the "if (!hasModifyingCTE)" test on the code following
immediately below it can be removed.

Regards,
Greg Nancarrow
Fujitsu Australia

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Fri, Feb 12, 2021 at 5:30 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> > If I disable parallel_leader_participation:
> >
> > For max_parallel_workers_per_gather = 4, there is still performance
> > degradation.
> >
> > For max_parallel_workers_per_gather = 2, the performance degradation
> > does not happen in most cases.
> > There is occasional noise (performance degradation), but most results
> > (about 80%) are good.
>
> Thank you.  The results indicate that it depends on the degree of parallelism whether the gain from parallelism
> outweighs the loss of parallel insert operations, at least in the bitmap scan case.
>

That seems to be the pattern for this particular query, but I think
we'd need to test a variety to determine if that's always the case.

> But can we conclude that this is limited to bitmap scan?  Even if that's the case, the planner does not have
> information about the insert operation to choose other plans like serial execution or parallel sequential scan.  Should we
> encourage the user in the manual to tune parameters and find the fastest plan?
>
>

It's all based on path costs, so we need to analyze and compare the
costing calculations done in this particular case against other cases,
and the values of the various parameters (costsize.c).
It's not difficult to determine for a parallel ModifyTablePath if it
has a BitmapHeapPath subpath - perhaps total_cost needs adjustment
(increase) for this case - and that will influence the planner to
choose a cheaper path. I was able to easily test the effect of doing
this in the debugger: by increasing total_cost in cost_modifytable()
for the parallel bitmap heap scan case, the planner then chose a
serial Insert + bitmap heap scan, because it then had a cheaper cost.
Of course, we need to better understand the problem and the observed
patterns in order to get a better feel for how those costs should be
adjusted.
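
For reference, a rough way to observe the competing cost estimates
without a debugger (using the testscan/x tables from the tests discussed
in this thread):

```sql
-- Estimated cost of the parallel plan:
EXPLAIN INSERT INTO testscan
    SELECT a FROM x WHERE a < 80000 OR (a % 2 = 0 AND a > 199900000);

-- Estimated cost of the serial alternative, for comparison; if a penalty
-- were added in cost_modifytable() for the parallel bitmap heap scan
-- case, the planner would switch to this plan once it became cheaper.
SET max_parallel_workers_per_gather = 0;
EXPLAIN INSERT INTO testscan
    SELECT a FROM x WHERE a < 80000 OR (a % 2 = 0 AND a > 199900000);
RESET max_parallel_workers_per_gather;
```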

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Thu, Feb 11, 2021 at 5:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > * I think that the concerns raised by Tsunakawa-san in:
> > >
> > >
https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com
> > >
> > > regarding how this interacts with plancache.c deserve a look.
> > > Specifically, a plan that uses parallel insert may fail to be
> > > invalidated when partitions are altered directly (that is without
> > > altering their root parent).  That would be because we are not adding
> > > partition OIDs to PlannerGlobal.invalItems despite making a plan
> > > that's based on checking their properties.  See this (tested with all
> > > patches applied!):
> > >
> >
> > Does any current Postgres code add partition OIDs to
> > PlannerGlobal.invalItems for a similar reason?

Currently, the planner opens partitions only for SELECT queries and
also adds them to the query's range table.  And because they are added
to the range table, their OIDs do get added to
PlannerGlobal.relationOids (not invalItems, sorry!) by way of
CompleteCachedPlan() calling extract_query_dependencies(), which looks
at Query.rtable to decide which tables/partitions to add.

> > I would have thought that, for example,  partitions with a default
> > column expression, using a function that is changed from SAFE to
> > UNSAFE, would suffer the same plancache issue (for current parallel
> > SELECT functionality) as we're talking about here - but so far I
> > haven't seen any code handling this.

AFAIK, default column expressions don't affect plans for SELECT
queries.  OTOH, consider a check constraint expression as an example.
The planner may use one to exclude a partition from the plan with its
constraint exclusion algorithm (separate from "partition pruning").
If the check constraint is dropped, any cached plans that used it will
be invalidated.

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
alter table rp1 add constraint chk check (a >= -5);
set constraint_exclusion to on;

-- forces using a cached plan
set plan_cache_mode to force_generic_plan ;
prepare q as select * from rp where a < -5;

-- planner excluded rp1 because of the contradictory constraint
explain execute q;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

-- constraint dropped, plancache inval hook invoked
alter table rp1 drop constraint chk ;

-- old plan invalidated, new one made
explain execute q;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on rp1 rp  (cost=0.00..41.88 rows=850 width=4)
   Filter: (a < '-5'::integer)
(2 rows)

> > (Currently invalItems seems to support PROCID and TYPEOID; relation
> > OIDs seem to be handled through a different mechanism)..
>
> > Can you elaborate on what you believe is required here, so that the
> > partition OID dependency is registered and the altered partition
> > results in the plan being invalidated?
> > Thanks in advance for any help you can provide here.
>
> Actually, I tried adding the following in the loop that checks the
> parallel-safety of each partition and it seemed to work:
>
>             glob->relationOids =
>                     lappend_oid(glob->relationOids, pdesc->oids[i]);
>
> Can you confirm, is that what you were referring to?

Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.

Although it gets the job done, I'm not sure if manipulating
relationOids from max_parallel_hazard() or its subroutines is okay,
but I will let the committer decide that.  As I mentioned above, the
person who designed this decided for some reason that it is
extract_query_dependencies()'s job to populate
PlannerGlobal.relationOids/invalItems.

> (note that I've already updated the code to use
> CreatePartitionDirectory() and PartitionDirectoryLookup())

I will check your v16 to check if that indeed does the intended thing.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Thu, Feb 11, 2021 at 5:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > >
> > > > * I think that the concerns raised by Tsunakawa-san in:
> > > >
> > > >
https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com
> > > >
> > > > regarding how this interacts with plancache.c deserve a look.
> > > > Specifically, a plan that uses parallel insert may fail to be
> > > > invalidated when partitions are altered directly (that is without
> > > > altering their root parent).  That would be because we are not adding
> > > > partition OIDs to PlannerGlobal.invalItems despite making a plan
> > > > that's based on checking their properties.  See this (tested with all
> > > > patches applied!):
> > > >
> > >
> > > Does any current Postgres code add partition OIDs to
> > > PlannerGlobal.invalItems for a similar reason?
>
> Currently, the planner opens partitions only for SELECT queries and
> also adds them to the query's range table.  And because they are added
> to the range table, their OIDs do get added to
> PlannerGlobal.relationOids (not invalItems, sorry!) by way of
> CompleteCachedPlan() calling extract_query_dependencies(), which looks
> at Query.rtable to decide which tables/partitions to add.
>
> > > I would have thought that, for example,  partitions with a default
> > > column expression, using a function that is changed from SAFE to
> > > UNSAFE, would suffer the same plancache issue (for current parallel
> > > SELECT functionality) as we're talking about here - but so far I
> > > haven't seen any code handling this.
>
> AFAIK, default column expressions don't affect plans for SELECT
> queries.  OTOH, consider a check constraint expression as an example.
> The planner may use one to exclude a partition from the plan with its
> constraint exclusion algorithm (separate from "partition pruning").
> If the check constraint is dropped, any cached plans that used it will
> be invalidated.
>

Sorry, I got that wrong; default column expressions are relevant to
INSERT, not SELECT.
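
For example (a hypothetical sketch, not from the test suite), a
parallel-unsafe default expression on one partition is exactly the kind
of property the INSERT parallel-safety check has to notice:

```sql
-- unsafe_seq_fn() stands in for any function marked PARALLEL UNSAFE
CREATE FUNCTION unsafe_seq_fn() RETURNS int
    LANGUAGE sql PARALLEL UNSAFE AS 'SELECT 1';

CREATE TABLE pt (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (0) TO (100);
ALTER TABLE pt1 ALTER COLUMN b SET DEFAULT unsafe_seq_fn();

-- This INSERT must not be parallelized while pt1 has that default; and
-- if the default is later dropped, any cached plan should be
-- invalidated, which is why the partition OIDs need to be recorded as
-- plan dependencies.
INSERT INTO pt (a) SELECT generate_series(0, 99);
```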

> >
> > Actually, I tried adding the following in the loop that checks the
> > parallel-safety of each partition and it seemed to work:
> >
> >             glob->relationOids =
> >                     lappend_oid(glob->relationOids, pdesc->oids[i]);
> >
> > Can you confirm, is that what you were referring to?
>
> Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
>
> Although it gets the job done, I'm not sure if manipulating
> relationOids from max_parallel_hazard() or its subroutines is okay,
> but I will let the committer decide that.  As I mentioned above, the
> person who designed this decided for some reason that it is
> extract_query_dependencies()'s job to populate
> PlannerGlobal.relationOids/invalItems.
>

Yes, it doesn't really seem right doing it within max_parallel_hazard().
I tried doing it in extract_query_dependencies() instead - see
attached patch - and it seems to work, but I'm not sure if there might
be any unintended side-effects.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Feb 8, 2021 at 8:13 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > Did it actually use a parallel plan in your testing?
> > > When I ran these tests with the Parallel INSERT patch applied, it did
> > > not naturally choose a parallel plan for any of these cases.
> >
> > Yes, these cases pick parallel plan naturally on my test environment.
> >
> > postgres=# explain verbose insert into testscan select a from x where
> > a<80000 or (a%2=0 and a>199900000);
> >                                             QUERY PLAN
> > ----------------------------------------------------------------------
> > -----------------------------
> >  Gather  (cost=4346.89..1281204.64 rows=81372 width=0)
> >    Workers Planned: 4
> >    ->  Insert on public.testscan  (cost=3346.89..1272067.44 rows=0
> > width=0)
> >          ->  Parallel Bitmap Heap Scan on public.x1
> > (cost=3346.89..1272067.44 rows=20343 width=8)
> >                Output: x1.a, NULL::integer
> >                Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
> >                Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a >
> > 199900000)))
> >                ->  BitmapOr  (cost=3346.89..3346.89 rows=178808
> > width=0)
> >                      ->  Bitmap Index Scan on x1_a_idx
> > (cost=0.00..1495.19 rows=80883 width=0)
> >                            Index Cond: (x1.a < 80000)
> >                      ->  Bitmap Index Scan on x1_a_idx
> > (cost=0.00..1811.01 rows=97925 width=0)
> >                            Index Cond: (x1.a > 199900000)
> >
> > PSA is my postgresql.conf file, maybe you can have a look. Besides, I didn't
> > do any parameters tuning in my test session.
>
> I reproduced this on my machine.
>
> I think we'd better do "analyze" before the insert, which makes this easier to reproduce.
> Like:
>
> -----
> analyze;
> explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
> -----
>

Thanks, I tried test_bimap.sql in my own environment, and added
"analyze", and I also found it naturally chose a parallel INSERT with
parallel bitmap heap scan for each of these cases.
However, I didn't see any performance degradation when compared
against serial INSERT with bitmap heap scan.
The parallel plan in these cases seems to run a bit faster.
(Note that I'm using a release build of Postgres, and using default
postgresql.conf)


test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan select a from x
where a<80000 or (a%2=0 and a>199900000);

QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4193.29..1255440.94 rows=74267 width=0) (actual
time=210.587..212.135 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Insert on public.testscan  (cost=3193.29..1247014.24 rows=0
width=0) (actual time=195.296..195.298 rows=0 loops=5)
         Worker 0:  actual time=189.512..189.514 rows=0 loops=1
         Worker 1:  actual time=194.843..194.844 rows=0 loops=1
         Worker 2:  actual time=193.986..193.988 rows=0 loops=1
         Worker 3:  actual time=188.035..188.037 rows=0 loops=1
         ->  Parallel Bitmap Heap Scan on public.x
(cost=3193.29..1247014.24 rows=18567 width=8) (actual
time=7.992..25.837 rows=26000 loops=5)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
               Rows Removed by Filter: 10000
               Heap Blocks: exact=261
               Worker 0:  actual time=1.473..14.458 rows=22465 loops=1
               Worker 1:  actual time=7.370..31.359 rows=30525 loops=1
               Worker 2:  actual time=8.765..19.838 rows=18549 loops=1
               Worker 3:  actual time=0.279..17.269 rows=23864 loops=1
               ->  BitmapOr  (cost=3193.29..3193.29 rows=170535
width=0) (actual time=21.775..21.777 rows=0 loops=1)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1365.94 rows=73783 width=0) (actual time=11.961..11.961
rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1790.21 rows=96752 width=0) (actual time=9.809..9.809
rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
 Planning Time: 0.276 ms
 Execution Time: 212.189 ms
(25 rows)


test=# truncate testscan;
TRUNCATE TABLE
test=# set max_parallel_workers_per_gather=0;
SET
test=# explain analyze verbose insert into testscan select a from x
where a<80000 or (a%2=0 and a>199900000);
                                                               QUERY
PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan  (cost=3193.29..3625636.35 rows=0 width=0)
(actual time=241.222..241.224 rows=0 loops=1)
   ->  Bitmap Heap Scan on public.x  (cost=3193.29..3625636.35
rows=74267 width=8) (actual time=16.945..92.392 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 50000
         Heap Blocks: exact=975
         ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0)
(actual time=16.735..16.736 rows=0 loops=1)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94
rows=73783 width=0) (actual time=9.222..9.223 rows=79999 loops=1)
                     Index Cond: (x.a < 80000)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21
rows=96752 width=0) (actual time=7.511..7.511 rows=100000 loops=1)
                     Index Cond: (x.a > 199900000)
 Planning Time: 0.205 ms
 Execution Time: 241.274 ms
(14 rows)


============


test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan_pk select a from x
where a<80000 or (a%2=0 and a>199900000);

QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4193.29..1255440.94 rows=74267 width=0) (actual
time=572.242..573.683 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Insert on public.testscan_pk  (cost=3193.29..1247014.24 rows=0
width=0) (actual time=566.303..566.308 rows=0 loops=5)
         Worker 0:  actual time=566.756..566.757 rows=0 loops=1
         Worker 1:  actual time=564.778..564.779 rows=0 loops=1
         Worker 2:  actual time=564.402..564.419 rows=0 loops=1
         Worker 3:  actual time=563.748..563.749 rows=0 loops=1
         ->  Parallel Bitmap Heap Scan on public.x
(cost=3193.29..1247014.24 rows=18567 width=8) (actual
time=16.479..37.327 rows=26000 loops=5)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
               Rows Removed by Filter: 10000
               Heap Blocks: exact=204
               Worker 0:  actual time=17.358..36.895 rows=24233 loops=1
               Worker 1:  actual time=12.711..33.538 rows=25616 loops=1
               Worker 2:  actual time=15.671..35.701 rows=24831 loops=1
               Worker 3:  actual time=17.656..39.310 rows=26645 loops=1
               ->  BitmapOr  (cost=3193.29..3193.29 rows=170535
width=0) (actual time=18.541..18.542 rows=0 loops=1)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1365.94 rows=73783 width=0) (actual time=8.549..8.549
rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1790.21 rows=96752 width=0) (actual time=9.990..9.990
rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
 Planning Time: 0.240 ms
 Execution Time: 573.733 ms
(25 rows)



test=# set max_parallel_workers_per_gather=0;
SET
test=# truncate testscan_pk;
TRUNCATE TABLE
test=# explain analyze verbose insert into testscan_pk select a from x
where a<80000 or (a%2=0 and a>199900000);
                                                                QUERY
PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan_pk  (cost=3193.29..3625636.35 rows=0
width=0) (actual time=598.997..598.998 rows=0 loops=1)
   ->  Bitmap Heap Scan on public.x  (cost=3193.29..3625636.35
rows=74267 width=8) (actual time=20.153..96.858 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 50000
         Heap Blocks: exact=975
         ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0)
(actual time=19.840..19.841 rows=0 loops=1)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94
rows=73783 width=0) (actual time=9.276..9.276 rows=79999 loops=1)
                     Index Cond: (x.a < 80000)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21
rows=96752 width=0) (actual time=10.562..10.562 rows=100000 loops=1)
                     Index Cond: (x.a > 199900000)
 Planning Time: 0.204 ms
 Execution Time: 599.098 ms
(14 rows)


============


test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan_index select a
from x where a<80000 or (a%2=0 and a>199900000);

QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4193.29..1255440.94 rows=74267 width=0) (actual
time=560.460..562.386 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Insert on public.testscan_index  (cost=3193.29..1247014.24
rows=0 width=0) (actual time=553.434..553.435 rows=0 loops=5)
         Worker 0:  actual time=548.751..548.752 rows=0 loops=1
         Worker 1:  actual time=552.008..552.009 rows=0 loops=1
         Worker 2:  actual time=553.094..553.095 rows=0 loops=1
         Worker 3:  actual time=553.389..553.390 rows=0 loops=1
         ->  Parallel Bitmap Heap Scan on public.x
(cost=3193.29..1247014.24 rows=18567 width=8) (actual
time=13.759..34.487 rows=26000 loops=5)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
               Rows Removed by Filter: 10000
               Heap Blocks: exact=183
               Worker 0:  actual time=8.698..29.924 rows=26173 loops=1
               Worker 1:  actual time=12.865..33.889 rows=27421 loops=1
               Worker 2:  actual time=13.088..32.823 rows=24591 loops=1
               Worker 3:  actual time=14.075..36.349 rows=26571 loops=1
               ->  BitmapOr  (cost=3193.29..3193.29 rows=170535
width=0) (actual time=19.356..19.357 rows=0 loops=1)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1365.94 rows=73783 width=0) (actual time=10.330..10.330
rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                     ->  Bitmap Index Scan on x_a_idx
(cost=0.00..1790.21 rows=96752 width=0) (actual time=9.024..9.024
rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
 Planning Time: 0.219 ms
 Execution Time: 562.442 ms
(25 rows)



test=# set max_parallel_workers_per_gather=0;
SET
test=# truncate testscan_index;
TRUNCATE TABLE
test=# explain analyze verbose insert into testscan_index select a
from x where a<80000 or (a%2=0 and a>199900000);
                                                                QUERY
PLAN


--------------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan_index  (cost=3193.29..3625636.35 rows=0
width=0) (actual time=607.619..607.621 rows=0 loops=1)
   ->  Bitmap Heap Scan on public.x  (cost=3193.29..3625636.35
rows=74267 width=8) (actual time=21.001..96.283 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 50000
         Heap Blocks: exact=975
         ->  BitmapOr  (cost=3193.29..3193.29 rows=170535 width=0)
(actual time=20.690..20.691 rows=0 loops=1)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1365.94
rows=73783 width=0) (actual time=9.097..9.097 rows=79999 loops=1)
                     Index Cond: (x.a < 80000)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1790.21
rows=96752 width=0) (actual time=11.591..11.591 rows=100000 loops=1)
                     Index Cond: (x.a > 199900000)
 Planning Time: 0.205 ms
 Execution Time: 607.734 ms
(14 rows)


Even when I changed the queries to return more rows from the scan, to
the point where it chose not to use a parallel INSERT bitmap heap scan
(in favour of parallel seq scan), and then forced it to by disabling
seqscan, I found that it was still at least as fast as serial INSERT
with bitmap heap scan.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > Actually, I tried adding the following in the loop that checks the
> > > parallel-safety of each partition and it seemed to work:
> > >
> > >             glob->relationOids =
> > >                     lappend_oid(glob->relationOids, pdesc->oids[i]);
> > >
> > > Can you confirm, is that what you were referring to?
> >
> > Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
> >
> > Although it gets the job done, I'm not sure if manipulating
> > relationOids from max_parallel_hazard() or its subroutines is okay,
> > but I will let the committer decide that.  As I mentioned above, the
> > person who designed this decided for some reason that it is
> > extract_query_dependencies()'s job to populate
> > PlannerGlobal.relationOids/invalItems.
>
> Yes, it doesn't really seem right doing it within max_parallel_hazard().
> I tried doing it in extract_query_dependencies() instead - see
> attached patch - and it seems to work, but I'm not sure if there might
> be any unintended side-effects.

One issue I see with the patch is that it fails to consider
multi-level partitioning, because it's looking up partitions only in
the target table's PartitionDesc and no other.

@@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node,
PlannerInfo *context)
            RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

            if (rte->rtekind == RTE_RELATION)
-               context->glob->relationOids =
-                   lappend_oid(context->glob->relationOids, rte->relid);
+           {
+               PlannerGlobal   *glob;
+
+               glob = context->glob;
+               glob->relationOids =
+                   lappend_oid(glob->relationOids, rte->relid);
+               if (query->commandType == CMD_INSERT &&
+                                   rte->relkind == RELKIND_PARTITIONED_TABLE)

The RTE whose relkind is being checked here may not be the INSERT
target relation's RTE, even though that's perhaps always true today.
So, I suggest pulling the new block out of the loop over rtable and
performing its work on the result RTE, explicitly fetched using
rt_fetch(), preferably in a separate recursive function.  I'm
thinking something like the attached revised version.



--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Wed, Feb 17, 2021 at 12:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > > Actually, I tried adding the following in the loop that checks the
> > > > parallel-safety of each partition and it seemed to work:
> > > >
> > > >             glob->relationOids =
> > > >                     lappend_oid(glob->relationOids, pdesc->oids[i]);
> > > >
> > > > Can you confirm, is that what you were referring to?
> > >
> > > Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
> > >
> > > Although it gets the job done, I'm not sure if manipulating
> > > relationOids from max_parallel_hazard() or its subroutines is okay,
> > > but I will let the committer decide that.  As I mentioned above, the
> > > person who designed this decided for some reason that it is
> > > extract_query_dependencies()'s job to populate
> > > PlannerGlobal.relationOids/invalItems.
> >
> > Yes, it doesn't really seem right doing it within max_parallel_hazard().
> > I tried doing it in extract_query_dependencies() instead - see
> > attached patch - and it seems to work, but I'm not sure if there might
> > be any unintended side-effects.
>
> One issue I see with the patch is that it fails to consider
> multi-level partitioning, because it's looking up partitions only in
> the target table's PartitionDesc and no other.
>
> @@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node,
> PlannerInfo *context)
>             RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
>
>             if (rte->rtekind == RTE_RELATION)
> -               context->glob->relationOids =
> -                   lappend_oid(context->glob->relationOids, rte->relid);
> +           {
> +               PlannerGlobal   *glob;
> +
> +               glob = context->glob;
> +               glob->relationOids =
> +                   lappend_oid(glob->relationOids, rte->relid);
> +               if (query->commandType == CMD_INSERT &&
> +                                   rte->relkind == RELKIND_PARTITIONED_TABLE)
>
> The RTE whose relkind is being checked here may not be the INSERT
> target relation's RTE, even though that's perhaps always true today.
> So, I suggest to pull the new block out of the loop over rtable and
> perform its deeds on the result RTE explicitly fetched using
> rt_fetch(), preferably using a separate recursive function.  I'm
> thinking something like the attached revised version.
>
>

Thanks. Yes, I'd forgotten about the fact that a partition may itself
be partitioned, so it needs to be recursive (like in the
parallel-safety checks).
Your revised version seems OK, though I do have a concern:
Is the use of "table_close(rel, NoLock)" intentional? That will keep
the lock (lockmode) until end-of-transaction.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Wed, Feb 17, 2021 at 10:44 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Wed, Feb 17, 2021 at 12:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > > > Actually, I tried adding the following in the loop that checks the
> > > > > parallel-safety of each partition and it seemed to work:
> > > > >
> > > > >             glob->relationOids =
> > > > >                     lappend_oid(glob->relationOids, pdesc->oids[i]);
> > > > >
> > > > > Can you confirm, is that what you were referring to?
> > > >
> > > > Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
> > > >
> > > > Although it gets the job done, I'm not sure if manipulating
> > > > relationOids from max_parallel_hazard() or its subroutines is okay,
> > > > but I will let the committer decide that.  As I mentioned above, the
> > > > person who designed this decided for some reason that it is
> > > > extract_query_dependencies()'s job to populate
> > > > PlannerGlobal.relationOids/invalItems.
> > >
> > > Yes, it doesn't really seem right doing it within max_parallel_hazard().
> > > I tried doing it in extract_query_dependencies() instead - see
> > > attached patch - and it seems to work, but I'm not sure if there might
> > > be any unintended side-effects.
> >
> > One issue I see with the patch is that it fails to consider
> > multi-level partitioning, because it's looking up partitions only in
> > the target table's PartitionDesc and no other.
> >
> > @@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node,
> > PlannerInfo *context)
> >             RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
> >
> >             if (rte->rtekind == RTE_RELATION)
> > -               context->glob->relationOids =
> > -                   lappend_oid(context->glob->relationOids, rte->relid);
> > +           {
> > +               PlannerGlobal   *glob;
> > +
> > +               glob = context->glob;
> > +               glob->relationOids =
> > +                   lappend_oid(glob->relationOids, rte->relid);
> > +               if (query->commandType == CMD_INSERT &&
> > +                                   rte->relkind == RELKIND_PARTITIONED_TABLE)
> >
> > The RTE whose relkind is being checked here may not be the INSERT
> > target relation's RTE, even though that's perhaps always true today.
> > So, I suggest to pull the new block out of the loop over rtable and
> > perform its deeds on the result RTE explicitly fetched using
> > rt_fetch(), preferably using a separate recursive function.  I'm
> > thinking something like the attached revised version.
>
> Thanks. Yes, I'd forgotten about the fact a partition may itself be
> partitioned, so it needs to be recursive (like in the parallel-safety
> checks).
> Your revised version seems OK, though I do have a concern:
> Is the use of "table_close(rel, NoLock)" intentional? That will keep
> the lock (lockmode) until end-of-transaction.

I think we always keep any locks on relations that are involved in a
plan until end-of-transaction.  What if a partition is changed in an
unsafe manner between being considered safe for parallel insertion and
actually performing the parallel insert?
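
A sketch of the hazard the retained lock guards against, reusing the rp
partitioned table from the earlier example (some_unsafe_trig_fn() is an
invented placeholder for any PARALLEL UNSAFE trigger function, and the
behavior described assumes the patch under discussion):

```sql
-- Session 1: plan the insert; under the patch, the parallel-safety check
-- locks the partitions, and those locks are held to end of transaction
BEGIN;
EXPLAIN INSERT INTO rp SELECT generate_series(-10, 10);

-- Session 2: trying to make a partition parallel-unsafe blocks until
-- session 1's transaction ends, because CREATE TRIGGER's lock conflicts
-- with the lock session 1 retained on rp2
CREATE TRIGGER t BEFORE INSERT ON rp2
    FOR EACH ROW EXECUTE FUNCTION some_unsafe_trig_fn();

-- Session 1: so this cannot run under a partition that has silently
-- become parallel-unsafe in the meantime
INSERT INTO rp SELECT generate_series(-10, 10);
COMMIT;
```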

BTW, I just noticed that extract_query_dependencies() runs on a
rewritten, but not-yet-planned query tree; that is, I didn't know that
extract_query_dependencies() only populates the CachedPlanSource's
relationOids and not CachedPlan's.  The former is only for tracking
the dependencies of an unplanned Query, so partitions should never be
added to it.  Instead, they should be added to
PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan),
which is kind of what your earlier patch did.  Needless to say,
PlanCacheRelCallback checks both CachedPlanSource.relationOids and
PlannedStmt.relationOids, so if it receives a message about a
partition, its OID is matched from the latter.

All that is to say that we should move our code to add partition OIDs
as plan dependencies to somewhere in set_plan_references(), which
otherwise populates PlannedStmt.relationOids.  I updated the patch to
do that.  It also occurred to me that we can avoid pointless adding of
partitions if the final plan won't use parallelism.  For that, the
patch adds checking glob->parallelModeNeeded, which seems to do the
trick though I don't know if that's the correct way of doing that.


--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Feb 18, 2021 at 12:34 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > Your revised version seems OK, though I do have a concern:
> > Is the use of "table_close(rel, NoLock)" intentional? That will keep
> > the lock (lockmode) until end-of-transaction.
>
> I think we always keep any locks on relations that are involved in a
> plan until end-of-transaction.  What if a partition is changed in an
> unsafe manner between being considered safe for parallel insertion and
> actually performing the parallel insert?
>
> BTW, I just noticed that extract_query_dependencies() runs on a
> rewritten, but not-yet-planned query tree, that is, I didn't know that
> extract_query_dependencies() only populates the CachedPlanSource's
> relationOids and not CachedPlan's.  The former is only for tracking
> the dependencies of an unplanned Query, so partitions should never be
> added to it.  Instead, they should be added to
> PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan),
> which is kind of what your earlier patch did.  Needless to say,
> PlanCacheRelCallback checks both CachedPlanSource.relationOids and
> PlannedStmt.relationOids, so if it receives a message about a
> partition, its OID is matched from the latter.
>
> All that is to say that we should move our code to add partition OIDs
> as plan dependencies to somewhere in set_plan_references(), which
> otherwise populates PlannedStmt.relationOids.  I updated the patch to
> do that.

OK, understood. Thanks for the detailed explanation.

> It also occurred to me that we can avoid pointless adding of
> partitions if the final plan won't use parallelism.  For that, the
> patch adds checking glob->parallelModeNeeded, which seems to do the
> trick though I don't know if that's the correct way of doing that.
>

I'm not sure it's pointless adding partitions even in the case of NOT
using parallelism, because we may be relying on the result of
parallel-safety checks on partitions in both cases.
For example, insert_parallel.sql currently includes a test (that you
originally provided in a previous post) that checks a non-parallel
plan is generated after a parallel-unsafe trigger is created on a
partition involved in the INSERT.
If I further add to that test by then dropping that trigger and then
again using EXPLAIN to see what plan is generated, then I'd expect a
parallel-plan to be generated, but with the setrefs-v3.patch it still
generates a non-parallel plan. So I think the "&&
glob->parallelModeNeeded" part of the test needs to be removed.
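
To spell out the scenario I mean, a minimal sketch (the table, trigger and function names here are made up for illustration, not taken from insert_parallel.sql):

```sql
-- A parallel-unsafe trigger on a partition should force a non-parallel plan.
CREATE TABLE part_tbl (a int) PARTITION BY RANGE (a);
CREATE TABLE part_tbl1 PARTITION OF part_tbl FOR VALUES FROM (0) TO (10000);

CREATE FUNCTION unsafe_trig_fn() RETURNS trigger
  LANGUAGE plpgsql PARALLEL UNSAFE
  AS $$ BEGIN RETURN NEW; END; $$;
CREATE TRIGGER part_tbl1_trig BEFORE INSERT ON part_tbl1
  FOR EACH ROW EXECUTE FUNCTION unsafe_trig_fn();

EXPLAIN (COSTS OFF) INSERT INTO part_tbl SELECT generate_series(1, 9999);
-- a non-parallel plan is expected here, due to the parallel-unsafe trigger

DROP TRIGGER part_tbl1_trig ON part_tbl1;
EXPLAIN (COSTS OFF) INSERT INTO part_tbl SELECT generate_series(1, 9999);
-- a parallel plan should be possible again, but with setrefs-v3 it isn't
```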


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Langote
Date:
On Thu, Feb 18, 2021 at 10:03 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Thu, Feb 18, 2021 at 12:34 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > All that is to say that we should move our code to add partition OIDs
> > as plan dependencies to somewhere in set_plan_references(), which
> > otherwise populates PlannedStmt.relationOids.  I updated the patch to
> > do that.
>
> OK, understood. Thanks for the detailed explanation.
>
> > It also occurred to me that we can avoid pointless adding of
> > partitions if the final plan won't use parallelism.  For that, the
> > patch adds checking glob->parallelModeNeeded, which seems to do the
> > trick though I don't know if that's the correct way of doing that.
> >
>
> I'm not sure it's pointless adding partitions even in the case of NOT
> using parallelism, because we may be relying on the result of
> parallel-safety checks on partitions in both cases.
> For example, insert_parallel.sql currently includes a test (that you
> originally provided in a previous post) that checks a non-parallel
> plan is generated after a parallel-unsafe trigger is created on a
> partition involved in the INSERT.
> If I further add to that test by then dropping that trigger and then
> again using EXPLAIN to see what plan is generated, then I'd expect a
> parallel-plan to be generated, but with the setrefs-v3.patch it still
> generates a non-parallel plan. So I think the "&&
> glob->parallelModeNeeded" part of the test needs to be removed.

Ah, okay, I didn't retest my case after making that change.

Looking at this again, I am a bit concerned about going over the whole
partition tree *twice* when making a parallel plan for insert into
partitioned tables.  Maybe we should do what you did in your first
attempt slightly differently -- add partition OIDs during the
max_parallel_hazard() initiated scan of the partition tree as you did.
Instead of adding them directly to PlannerGlobal.relationOids, add to,
say, PlannerInfo.targetPartitionOids and have set_plan_references() do
list_concat(glob->relationOids, list_copy(root->targetPartitionOids))
in the same place as setrefs-v3 does
add_target_partition_oids_recurse().  Thoughts?
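
Roughly, an uncompiled sketch of what I'm imagining (the new field name
and exact placement are just assumptions at this point):

```c
/*
 * Sketch only, not compiled.  During max_parallel_hazard()'s scan of
 * the partition tree, remember each partition whose parallel safety
 * was checked, in a new PlannerInfo field:
 */
root->targetPartitionOids = lappend_oid(root->targetPartitionOids,
                                        RelationGetRelid(part_rel));

/*
 * Then in set_plan_references(), which otherwise populates
 * PlannedStmt.relationOids, record those OIDs as plan dependencies:
 */
glob->relationOids = list_concat(glob->relationOids,
                                 list_copy(root->targetPartitionOids));
```

That way we only record the partitions the safety checks actually
visited, and only walk the tree once.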


--
Amit Langote
EDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Thu, Feb 18, 2021 at 4:35 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Looking at this again, I am a bit concerned about going over the whole
> partition tree *twice* when making a parallel plan for insert into
> partitioned tables.  Maybe we should do what you did in your first
> attempt slightly differently -- add partition OIDs during the
> max_parallel_hazard() initiated scan of the partition tree as you did.
> Instead of adding them directly to PlannerGlobal.relationOids, add to,
> say, PlannerInfo.targetPartitionOids and have set_plan_references() do
> list_concat(glob->relationOids, list_copy(root->targetPartitionOids))
> in the same place as setrefs-v3 does
> add_target_partition_oids_recurse().  Thoughts?
>

Agreed, that might be a better approach, and that way we're also only
recording the partition OIDs that the parallel-safety checks are
relying on.
I'll give it a go and see if I can detect any issues with this method.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
Posting a new version of the patches, with the following updates:
- Moved the update of glob->relationOids (i.e. addition of partition
OIDs that plan depends on, resulting from parallel-safety checks) from
within max_parallel_hazard() to set_plan_references().
- Added an extra test for partition plan-cache invalidation.
- Simplified query_has_modifying_cte() temporary bug-fix.
- Added a comment explaining why parallel-safety of partition column
defaults is not checked.
- Minor simplification: moved the hasSubQuery return into is_parallel_possible_for_modify().


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Greg Nancarrow <gregn4422@gmail.com> 
--------------------------------------------------
On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
> (8)
> +               /*
> +                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> +                * the relation, and this would result in creation of new CommandIds
> +                * on insert/update/delete and this isn't supported in a parallel
> +                * worker (but is safe in the parallel leader).
> +                */
> +               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> +               if (trigtype == RI_TRIGGER_FK)
> +               {
> +                       if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> +                               return true;
> +               }
>
> Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs.  See
> RI_FKey_check() which is called in the RI_TRIGGER_FK case.  In there, ri_PerformCheck() is called with the detectNewRows
> argument set to false, which causes CommandCounterIncrement() to not be called.
 
>

Hmmm, I'm not sure that you have read and interpreted the patch code correctly.
The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign key, and an insert into such a table will
generate a new commandId (so we must avoid that, as we don't currently have the technology to support sharing of new
commandIds across the participants in the parallel operation). This is what the code comment says; it does not say that
such a trigger generates a new command ID.
 

See Amit's updated comment here: https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665

In addition, the 2nd patch has an explicit test case for this (testing insert into a table that has a FK).
--------------------------------------------------


First of all, I anticipate this parallel INSERT SELECT feature will typically shine, and is expected to work, in ETL
or ELT into a data warehouse or an ODS for analytics.  Bearing that in mind, let me list some issues or questions below.
But the current state of the patch would of course be attractive in some workloads, so I don't think these are
necessarily blockers.
 


(1)
According to the classic book "The Data Warehouse Toolkit" and the website [1] by its author, the fact table (large
transaction history) in the data warehouse has foreign keys referencing the dimension tables (small or medium-sized
master or reference data).  So, parallel insert will be effective if it works when loading data into the fact table with
foreign keys.
 

To answer the above question, I'm assuming:

CREATE TABLE some_dimension (key_col int PRIMARY KEY);
CREATE TABLE some_fact (some_key int REFERENCES some_dimension);
INSERT INTO some_fact SELECT ...;


My naive question is, "why should new command IDs be generated to check foreign key constraints in this INSERT case?
The check just reads the parent (some_dimension table here)..."
 

Looking a bit deeper into the code, although ri_PerformCheck() itself tries to avoid generating command IDs, it calls
_SPI_execute_snapshot() with the read_only argument always set to false.  It in turn calls _SPI_execute_plan() ->
CommandCounterIncrement() as follows:
 

[_SPI_execute_plan()]
            /*
             * If not read-only mode, advance the command counter before each
             * command and update the snapshot.
             */
            if (!read_only && !plan->no_snapshots)
            {
                CommandCounterIncrement();
                UpdateActiveSnapshotCommandId();
            }


Can't we pass true to read_only from ri_PerformCheck() in some cases?


(2)
Likewise, dimension tables have surrogate keys that are typically implemented as a sequence or an identity column.  It
is suggested that even fact tables sometimes (or often?) have surrogate keys.  But the current patch does not
parallelize the statement when the target table has a sequence or an identity column.
 

I was looking at the sequence code, and my naive (again) idea is that the parallel leader and workers allocate numbers
from the sequence independently, and set the largest of them as the session's currval at the end of the parallel
operation.  We have to note in the documentation that gaps in the sequence numbers will arise and not be used in parallel
DML.


(3)
As Hou-san demonstrated, the current patch causes the resulting table and index to become larger when rows are inserted
in parallel than when inserted serially.  This could be a problem for analytics use cases where the table is just loaded
and read only afterwards.  We could advise the user to run REINDEX CONCURRENTLY after loading data, but what about
tables?

BTW, I don't know if Oracle and SQL Server have similar issues.  They may have some reason why they take an
exclusive lock on the target table.
 


(4)
When the target table is partitioned, is the INSERT parallelized among its partitions?  Some plan like:

Parallel Append on parent_table
  -> Insert on partition1
  -> Insert on partition2



[1]
Fact Table Surrogate Key | Kimball Dimensional Modeling Techniques

https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/fact-surrogate-key/


Regards
Takayuki Tsunakawa



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Feb 18, 2021 at 11:05 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> > > It also occurred to me that we can avoid pointless adding of
> > > partitions if the final plan won't use parallelism.  For that, the
> > > patch adds checking glob->parallelModeNeeded, which seems to do the
> > > trick though I don't know if that's the correct way of doing that.
> > >
> >
> > I'm not sure it's pointless adding partitions even in the case of NOT
> > using parallelism, because we may be relying on the result of
> > parallel-safety checks on partitions in both cases.
> > For example, insert_parallel.sql currently includes a test (that you
> > originally provided in a previous post) that checks a non-parallel
> > plan is generated after a parallel-unsafe trigger is created on a
> > partition involved in the INSERT.
> > If I further add to that test by then dropping that trigger and then
> > again using EXPLAIN to see what plan is generated, then I'd expect a
> > parallel-plan to be generated, but with the setrefs-v3.patch it still
> > generates a non-parallel plan. So I think the "&&
> > glob->parallelModeNeeded" part of the test needs to be removed.
>
> Ah, okay, I didn't retest my case after making that change.
>

Greg has a point here, but I feel something along the previous lines
(having a test of glob->parallelModeNeeded) is better. We only want to
invalidate the plan if the prepared plan is unsafe to execute next
time. It is quite possible that there are unsafe triggers on different
partitions and only one of them is dropped, so next time planning will
again yield the same non-parallel plan. If we agree with that, I
think it is better to add this dependency in set_plan_refs (along with
the Gather node handling).

Also, if we agree that we don't have any cheap way to determine
parallel-safety of partitioned relations then shall we consider the
patch being discussed [1] to be combined here?

Amit L, do you agree with that line of thought, or do you have any other ideas?

I feel we should focus on getting the first patch of Greg
(v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT, along with
a test case patch) and Hou-San's patch discussed at [1] ready. Then we
can focus on the
v18-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO. Because
even if we only get the first patch in, that is good enough for some users.

What do you think?

[1] - https://www.postgresql.org/message-id/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Fri, Feb 19, 2021 at 10:13 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Greg Nancarrow <gregn4422@gmail.com>
> --------------------------------------------------
> On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
> > (8)
> > +               /*
> > +                * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
> > +                * the relation, and this would result in creation of new CommandIds
> > +                * on insert/update/delete and this isn't supported in a parallel
> > +                * worker (but is safe in the parallel leader).
> > +                */
> > +               trigtype = RI_FKey_trigger_type(trigger->tgfoid);
> > +               if (trigtype == RI_TRIGGER_FK)
> > +               {
> > +                       if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> > +                               return true;
> > +               }
> >
> > Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs.
> > See RI_FKey_check() which is called in the RI_TRIGGER_FK case.  In there, ri_PerformCheck() is called with the detectNewRows
> > argument set to false, which causes CommandCounterIncrement() to not be called.
> >
>
> Hmmm, I'm not sure that you have read and interpreted the patch code correctly.
> The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign key, and an insert into such a table will
> generate a new commandId (so we must avoid that, as we don't currently have the technology to support sharing of new
> commandIds across the participants in the parallel operation). This is what the code comment says; it does not say that
> such a trigger generates a new command ID.
>
> See Amit's updated comment here: https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665
>
> In addition, the 2nd patch has an explicit test case for this (testing insert into a table that has a FK).
> --------------------------------------------------
>
>
> First of all, I anticipate this parallel INSERT SELECT feature will typically shine, and is expected to work, in ETL
> or ELT into a data warehouse or an ODS for analytics.  Bearing that in mind, let me list some issues or questions below.
> But the current state of the patch would of course be attractive in some workloads, so I don't think these are
> necessarily blockers.
>
>
> (1)
> According to the classic book "The Data Warehouse Toolkit" and the website [1] by its author, the fact table (large
> transaction history) in the data warehouse has foreign keys referencing the dimension tables (small or medium-sized
> master or reference data).  So, parallel insert will be effective if it works when loading data into the fact table with
> foreign keys.
>
> To answer the above question, I'm assuming:
>
> CREATE TABLE some_dimension (key_col int PRIMARY KEY);
> CREATE TABLE some_fact (some_key int REFERENCES some_dimension);
> INSERT INTO some_fact SELECT ...;
>
>
> My naive question is, "why should new command IDs be generated to check foreign key constraints in this INSERT case?
> The check just reads the parent (some_dimension table here)..."
>

It is quite possible that what you are saying is correct, but I feel that is
not this patch's fault. So, wouldn't it be better to discuss this in a
separate thread?

>
>
> (2)
> Likewise, dimension tables have surrogate keys that are typically implemented as a sequence or an identity column.
> It is suggested that even fact tables sometimes (or often?) have surrogate keys.  But the current patch does not
> parallelize the statement when the target table has a sequence or an identity column.
>
> I was looking at the sequence code, and my naive (again) idea is that the parallel leader and workers allocate
> numbers from the sequence independently, and set the largest of them as the session's currval at the end of the
> parallel operation.  We have to note in the documentation that gaps in the sequence numbers will arise and not be
> used in parallel DML.
>

Good use case but again, I think this can be done as a separate patch.

>
> (3)
> As Hou-san demonstrated, the current patch causes the resulting table and index to become larger when rows are inserted
> in parallel than when inserted serially.  This could be a problem for analytics use cases where the table is just loaded
> and read only afterwards.  We could advise the user to run REINDEX CONCURRENTLY after loading data, but what about
> tables?
>

I think here you are talking about the third patch (Parallel Inserts).
I guess if one ran inserts in parallel from psql then similar
behavior would have been observed. For tables, it might lead to better
results once we have the patch discussed at [1]. Actually, this needs
more investigation.

[1] - https://www.postgresql.org/message-id/20200508072545.GA9701%40telsasoft.com

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Mar 18, 2021 at 8:30 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
> index ff1b642722..d5d356f2de 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> @@ -1338,8 +1338,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
>      If a table parameter value is set and the
>      equivalent <literal>toast.</literal> parameter is not, the TOAST table
>      will use the table's parameter value.
> -    Specifying these parameters for partitioned tables is not supported,
> -    but you may specify them for individual leaf partitions.
> +    These parameters, with the exception of
> +    <literal>parallel_insert_enabled</literal>,
> +    are not supported on partitioned tables, but may be specified for
> +    individual leaf partitions.
>     </para>
>

Your patch looks good to me. While checking this, I noticed a typo in
the previous patch:
-      planner parameter <varname>parallel_workers</varname>.
+      planner parameter <varname>parallel_workers</varname> and
+      <varname>parallel_insert_enabled</varname>.

Here, it should be /planner parameter/planner parameters.

-- 
With Regards,
Amit Kapila.



RE: Parallel INSERT (INTO ... SELECT ...)

From
"houzj.fnst@fujitsu.com"
Date:
> >      If a table parameter value is set and the
> >      equivalent <literal>toast.</literal> parameter is not, the TOAST table
> >      will use the table's parameter value.
> > -    Specifying these parameters for partitioned tables is not supported,
> > -    but you may specify them for individual leaf partitions.
> > +    These parameters, with the exception of
> > +    <literal>parallel_insert_enabled</literal>,
> > +    are not supported on partitioned tables, but may be specified for
> > +    individual leaf partitions.
> >     </para>
> >
> 
> Your patch looks good to me. While checking this, I noticed a typo in the
> previous patch:
> -      planner parameter <varname>parallel_workers</varname>.
> +      planner parameter <varname>parallel_workers</varname> and
> +      <varname>parallel_insert_enabled</varname>.
> 
> Here, it should be /planner parameter/planner parameters.

Thanks Amit and Justin for pointing this out!
The changes look good to me.

Best regards,
houzj


Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Thu, Mar 18, 2021 at 9:04 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> > >      If a table parameter value is set and the
> > >      equivalent <literal>toast.</literal> parameter is not, the TOAST table
> > >      will use the table's parameter value.
> > > -    Specifying these parameters for partitioned tables is not supported,
> > > -    but you may specify them for individual leaf partitions.
> > > +    These parameters, with the exception of
> > > +    <literal>parallel_insert_enabled</literal>,
> > > +    are not supported on partitioned tables, but may be specified for
> > > +    individual leaf partitions.
> > >     </para>
> > >
> >
> > Your patch looks good to me. While checking this, I noticed a typo in the
> > previous patch:
> > -      planner parameter <varname>parallel_workers</varname>.
> > +      planner parameter <varname>parallel_workers</varname> and
> > +      <varname>parallel_insert_enabled</varname>.
> >
> > Here, it should be /planner parameter/planner parameters.
>
> Thanks Amit and Justin for pointing this out!
> The changes look good to me.
>

Pushed!

-- 
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:

Since the "Parallel SELECT for INSERT" patch and related GUC/reloption patch have been committed, I have now rebased and attached the rest of the original patchset, which includes:
- Additional tests for "Parallel SELECT for INSERT"
- Parallel INSERT functionality
- Test and documentation updates for Parallel INSERT

Regards,
Greg Nancarrow
Fujitsu Australia



Attachment

RE: Parallel INSERT (INTO ... SELECT ...)

From
"houzj.fnst@fujitsu.com"
Date:
> Since the "Parallel SELECT for INSERT" patch and related GUC/reloption patch have been committed, I have now rebased
> and attached the rest of the original patchset,
 
> which includes:
>- Additional tests for "Parallel SELECT for INSERT"
>- Parallel INSERT functionality
>- Test and documentation updates for Parallel INSERT
Hi,

I noticed that some comments may need to be updated since we introduced parallel insert in this patch.

1) src/backend/executor/execMain.c
     * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
     * would require (a) storing the combocid hash in shared memory, rather
     * than synchronizing it just once at the start of parallelism, and (b) an
     * alternative to heap_update()'s reliance on xmax for mutual exclusion.
     * INSERT may have no such troubles, but we forbid it to simplify the
     * checks.

As we will allow INSERT in parallel mode, we'd better change the comment here.

2) src/backend/storage/lmgr/README
dangers are modest.  The leader and worker share the same transaction,
snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
write any data at all.  Thus, for either to read a table locked exclusively by

The same as 1), parallel insert is the exception.

3) src/backend/storage/lmgr/README
mutual exclusion method for such cases.  Currently, the parallel mode is
strictly read-only, but now we have the infrastructure to allow parallel
inserts and parallel copy.

Maybe we can say:
+mutual exclusion method for such cases.  Currently, we only allowed parallel
+inserts, but we already have the infrastructure to allow parallel copy.


Best regards,
houzj



Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Mar 22, 2021 at 2:30 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> I noticed that some comments may need to be updated since we introduced parallel insert in this patch.
>
> 1) src/backend/executor/execMain.c
>          * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
>          * would require (a) storing the combocid hash in shared memory, rather
>          * than synchronizing it just once at the start of parallelism, and (b) an
>          * alternative to heap_update()'s reliance on xmax for mutual exclusion.
>          * INSERT may have no such troubles, but we forbid it to simplify the
>          * checks.
>
> As we will allow INSERT in parallel mode, we'd better change the comment here.
>

Thanks, it does need to be updated for parallel INSERT.
I was thinking of the following change:

-     * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
-     * would require (a) storing the combocid hash in shared memory, rather
-     * than synchronizing it just once at the start of parallelism, and (b) an
-     * alternative to heap_update()'s reliance on xmax for mutual exclusion.
-     * INSERT may have no such troubles, but we forbid it to simplify the
-     * checks.
+     * Except for INSERT, don't allow writes in parallel mode.  Supporting
+     * UPDATE and DELETE would require (a) storing the combocid hash in shared
+     * memory, rather than synchronizing it just once at the start of
+     * parallelism, and (b) an alternative to heap_update()'s reliance on xmax
+     * for mutual exclusion.



> 2) src/backend/storage/lmgr/README
> dangers are modest.  The leader and worker share the same transaction,
> snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
> write any data at all.  Thus, for either to read a table locked exclusively by
>
> The same as 1), parallel insert is the exception.
>

I agree, it needs to be updated too, to account for parallel INSERT
now being supported.

-write any data at all.  ...
+write any data at all (with the exception of parallel insert).  ...


> 3) src/backend/storage/lmgr/README
> mutual exclusion method for such cases.  Currently, the parallel mode is
> strictly read-only, but now we have the infrastructure to allow parallel
> inserts and parallel copy.
>
> Maybe we can say:
> +mutual exclusion method for such cases.  Currently, we only allowed parallel
> +inserts, but we already have the infrastructure to allow parallel copy.
>

Yes, agree, something like:

-mutual exclusion method for such cases.  Currently, the parallel mode is
-strictly read-only, but now we have the infrastructure to allow parallel
-inserts and parallel copy.
+mutual exclusion method for such cases.  Currently, only parallel insert is
+allowed, but we have the infrastructure to allow parallel copy.


Let me know if these changes seem OK to you.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel INSERT (INTO ... SELECT ...)

From
"houzj.fnst@fujitsu.com"
Date:
> > I noticed that some comments may need to be updated since we introduced
> parallel insert in this patch.
> >
> > 1) src/backend/executor/execMain.c
> >          * Don't allow writes in parallel mode.  Supporting UPDATE and
> DELETE
> >          * would require (a) storing the combocid hash in shared memory,
> rather
> >          * than synchronizing it just once at the start of parallelism, and (b) an
> >          * alternative to heap_update()'s reliance on xmax for mutual
> exclusion.
> >          * INSERT may have no such troubles, but we forbid it to simplify the
> >          * checks.
> >
> > As we will allow INSERT in parallel mode, we'd better change the comment
> here.
> >
> 
> Thanks, it does need to be updated for parallel INSERT.
> I was thinking of the following change:
> 
> -     * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
> -     * would require (a) storing the combocid hash in shared memory, rather
> -     * than synchronizing it just once at the start of parallelism, and (b) an
> -     * alternative to heap_update()'s reliance on xmax for mutual exclusion.
> -     * INSERT may have no such troubles, but we forbid it to simplify the
> -     * checks.
> +     * Except for INSERT, don't allow writes in parallel mode.  Supporting
> +     * UPDATE and DELETE would require (a) storing the combocid hash in
> shared
> +     * memory, rather than synchronizing it just once at the start of
> +     * parallelism, and (b) an alternative to heap_update()'s reliance on xmax
> +     * for mutual exclusion.
> 
> 
> 
> > 2) src/backend/storage/lmgr/README
> > dangers are modest.  The leader and worker share the same transaction,
> > snapshot, and combo CID hash, and neither can perform any DDL or,
> > indeed, write any data at all.  Thus, for either to read a table
> > locked exclusively by
> >
> > The same as 1), parallel insert is the exception.
> >
> 
> I agree, it needs to be updated too, to account for parallel INSERT now being
> supported.
> 
> -write any data at all.  ...
> +write any data at all (with the exception of parallel insert).  ...
> 
> 
> > 3) src/backend/storage/lmgr/README
> > mutual exclusion method for such cases.  Currently, the parallel mode
> > is strictly read-only, but now we have the infrastructure to allow
> > parallel inserts and parallel copy.
> >
> > Maybe we can say:
> > +mutual exclusion method for such cases.  Currently, we only allowed
> > +parallel inserts, but we already have the infrastructure to allow parallel copy.
> >
> 
> Yes, agree, something like:
> 
> -mutual exclusion method for such cases.  Currently, the parallel mode is
> -strictly read-only, but now we have the infrastructure to allow parallel -inserts
> and parallel copy.
> +mutual exclusion method for such cases.  Currently, only parallel
> +insert is allowed, but we have the infrastructure to allow parallel copy.
> 
> 
> Let me know if these changes seem OK to you.

Yes, these changes look good to me.

Best regards,
houzj

Re: Parallel INSERT (INTO ... SELECT ...)

From
Greg Nancarrow
Date:
On Mon, Mar 22, 2021 at 6:28 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> >
> > Let me know if these changes seem OK to you.
>
> Yes, these changes look good to me.

Posting an updated set of patches with these changes...

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel INSERT (INTO ... SELECT ...)

From
Amit Kapila
Date:
On Mon, Mar 22, 2021 at 3:57 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Mon, Mar 22, 2021 at 6:28 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > >
> > > Let me know if these changes seem OK to you.
> >
> > Yes, these changes look good to me.
>
> Posting an updated set of patches with these changes...
>

I have marked this as Returned with Feedback. There is a lot of work
to do for this patch as per the feedback given on pgsql-committers
[1].

[1] - https://www.postgresql.org/message-id/E1lMiB9-0001c3-SY%40gemulon.postgresql.org

-- 
With Regards,
Amit Kapila.