Thread: Parallel Inserts in CREATE TABLE AS

Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
Hi,

The idea of this patch is to allow the leader and each worker to insert tuples in parallel if the SELECT part of the CTAS is parallelizable. Along with parallel inserts, if the CTAS code path is also allowed to use table_multi_insert() [1], the gain we achieve is as follows:

For a table with 2 integer columns and 100 million tuples (more test results are at [2]), the exec time on HEAD is 120 sec, whereas with the parallelism patch proposed here plus the multi insert patch [1], with 3 workers and leader participation, the exec time is 22 sec (5.45X). With the current CTAS code, which does single-tuple inserts (see intorel_receive()), the performance gain is limited to ~1.7X with parallelism. This is because the workers contend more for locks on buffer pages while extending the table. So, the maximum benefit for CTAS comes from combining parallelism with multi-tuple inserts.

The design:
Let the planner know that the SELECT is from CTAS in createas.c so that it can set the number of tuples transferred from the workers to the Gather node to 0. With this change, there are chances that the planner may choose a parallel plan. After planning, check in createas.c whether the upper plan node is a Gather and, if so, mark a parallelism flag in the CTAS dest receiver. Pass the into clause, object id, and command id from the leader to the workers, so that each worker can create its own CTAS dest receiver. The leader inserts its share of tuples if instructed to do so, as do the workers. Each worker atomically writes its number of inserted tuples into a shared memory variable; the leader combines this with its own count and reports the total to the client.
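
For the shared counter piece, here is a minimal sketch of what I have in mind (the struct and field names are illustrative, not necessarily what the patch uses; the atomics are the existing port/atomics.h ones):

    /* hypothetical shared state placed in the parallel DSM */
    typedef struct ParallelInsertCTASState
    {
        pg_atomic_uint64    processed;  /* tuples inserted by workers */
    } ParallelInsertCTASState;

    /* each worker, when its dest receiver shuts down */
    pg_atomic_add_fetch_u64(&pstate->processed, myState->processed);

    /* leader, after all workers have finished */
    estate->es_processed = leader_processed +
        pg_atomic_read_u64(&pstate->processed);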

The following things are still pending. Thoughts are most welcome:
1. How best can we lift the "cannot insert tuples in a parallel worker" restriction in heap_prepare_insert() for only the CTAS cases, or for that matter parallel copy? How about having a variable in one of the worker global contexts and using that (a rough sketch follows this list)? Of course, we can remove this restriction entirely if we fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.
2. How do we represent the parallel insert for CTAS in explain plans? EXPLAIN for CTAS shows the plan for only the SELECT part. How about having some textual info along with the Gather node?
     -----------------------------------------------------------------------------
     Gather  (cost=1000.00..108738.90 rows=0 width=8)
       Workers Planned: 2
       ->  Parallel Seq Scan on t_test  (cost=0.00..106748.00 rows=4954 width=8)
             Filter: (many < 10000)
     -----------------------------------------------------------------------------
3. Need to restrict parallel inserts if CTAS tries to create temp/global tables, as the workers will not have access to those tables. Need to analyze whether to allow parallelism if CTAS has prepared statements or WITH NO DATA.
4. Need to stop unnecessary parallel shared state, such as the tuple queue, from being created and shared with workers.
5. Addition of new test cases. Testing with more scenarios and different data sets, sizes, tablespaces, SELECT INTO. Analysis of the 2 mismatches in the write_parallel.sql regression test.
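
For point 1, a rough sketch of the worker-global-variable idea (the flag name here is made up; the error is the existing one in heap_prepare_insert()):

    /* hypothetical backend-local flag, set when a worker is launched
     * specifically to perform parallel inserts */
    bool    am_parallel_insert_worker = false;

    /* in heap_prepare_insert(), relax the existing restriction */
    if (IsParallelWorker() && !am_parallel_insert_worker)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot insert tuples in a parallel worker")));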

Thoughts?

Credits:
1. Thanks to Dilip Kumar for the main design idea and the discussions. Thanks to Vignesh for the discussions.
2. Patch development and testing are by me.
3. Thanks to the authors of table_multi_insert() in CTAS patch [1].

[1] - For table_multi_insert() in CTAS, I used an in-progress patch available at https://www.postgresql.org/message-id/CAEET0ZG31mD5SWjTYsAt0JTLReOejPvusJorZ3kGZ1%3DN1AC-Fw%40mail.gmail.com
[2] - Table with 2 integer columns, 100 million tuples, with leader participation, with the default postgresql.conf file. All readings are of the triplet form (workers, exec time in sec, improvement).
case 1: no multi inserts - (0,120,1X),(1,91,1.32X),(2,75,1.6X),(3,67,1.79X),(4,72,1.66X),(5,77,1.56X),(6,83,1.44X)
case 2: with multi inserts - (0,59,1X),(1,32,1.84X),(2,28,2.1X),(3,25,2.36X),(4,23,2.56X),(5,22,2.68X),(6,22,2.68X)
case 3: same table but unlogged with multi inserts - (0,50,1X),(1,28,1.78X),(2,25,2X),(3,22,2.27X),(4,21,2.38X),(5,21,2.38X),(6,20,2.5X)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Inserts in CREATE TABLE AS

From
vignesh C
Date:
>
> [1] - For table_multi_insert() in CTAS, I used an in-progress patch available at
> https://www.postgresql.org/message-id/CAEET0ZG31mD5SWjTYsAt0JTLReOejPvusJorZ3kGZ1%3DN1AC-Fw%40mail.gmail.com
> [2] - Table with 2 integer columns, 100 million tuples, with leader participation, with the default postgresql.conf file.
> All readings are of the triplet form (workers, exec time in sec, improvement).
>
> case 1: no multi inserts - (0,120,1X),(1,91,1.32X),(2,75,1.6X),(3,67,1.79X),(4,72,1.66X),(5,77,1.56X),(6,83,1.44X)
> case 2: with multi inserts - (0,59,1X),(1,32,1.84X),(2,28,2.1X),(3,25,2.36X),(4,23,2.56X),(5,22,2.68X),(6,22,2.68X)
> case 3: same table but unlogged with multi inserts - (0,50,1X),(1,28,1.78X),(2,25,2X),(3,22,2.27X),(4,21,2.38X),(5,21,2.38X),(6,20,2.5X)
>

I feel this enhancement could give a good improvement; +1 for this.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Andres Freund
Date:
Hi,

On 2020-09-23 17:20:20 +0530, Bharath Rupireddy wrote:
> The idea of this patch is to allow the leader and each worker insert the
> tuples in parallel if the SELECT part of the CTAS is parallelizable.

Cool!


> The design:

I think it'd be good if you could explain a bit more why you think this
is safe to do in the way you have done it.

E.g. from a quick scroll through the patch, there's not even a comment
explaining that the only reason there doesn't need to be code dealing
with xid assignment is that we already did the catalog changes to create
the table. But how does that work for SELECT INTO? Are you prohibiting
that? ...


> Pass the into clause, object id, command id from the leader to
> workers, so that each worker can create its own CTAS dest
> receiver. Leader inserts it's share of tuples if instructed to do, and
> so are workers. Each worker writes atomically it's number of inserted
> tuples into a shared memory variable, the leader combines this with
> it's own number of inserted tuples and shares to the client.
> 
> Below things are still pending. Thoughts are most welcome:
> 1. How better we can lift the "cannot insert tuples in a parallel worker"
> from heap_prepare_insert() for only CTAS cases or for that matter parallel
> copy? How about having a variable in any of the worker global contexts and
> use that? Of course, we can remove this restriction entirely in case we
> fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.

I have mentioned before that I think it'd be good if we changed the
insert APIs to have a more 'scan' like structure. I am thinking of
something like

TableInsertScan* table_begin_insert(Relation);
table_tuple_insert(TableInsertScan *is, other, args);
table_multi_insert(TableInsertScan *is, other, args);
table_end_insert(TableInsertScan *);

that'd then replace the BulkInsertStateData logic we have right now. But
more importantly it'd allow an AM to optimize operations across multiple
inserts, which is important for column stores.

And for the purpose of your question, we could then have a
table_insert_allow_parallel(TableInsertScan *);
or an additional arg to table_begin_insert().
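
For illustration, usage could then look roughly like this (everything
here is schematic and follows the proposal above; none of it exists
today, and the exact arguments are placeholders):

    TableInsertScan *iscan = table_begin_insert(rel);

    table_insert_allow_parallel(iscan);   /* opt in to parallel-safe inserts */

    while ((slot = get_next_tuple()) != NULL)
        table_tuple_insert(iscan, slot, cid, options);

    table_end_insert(iscan);              /* AM can flush/finalize here */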



> 3. Need to restrict parallel inserts, if CTAS tries to create temp/global
> tables as the workers will not have access to those tables. Need to analyze
> whether to allow parallelism if CTAS has prepared statements or with no
> data.

In which case does CTAS not create a table? You definitely need to
ensure that the table is created before your workers are started, and
it needs to be in a different CommandId.


Greetings,

Andres Freund



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
Thanks Andres for the comments.

On Thu, Sep 24, 2020 at 8:11 AM Andres Freund <andres@anarazel.de> wrote:
>
> > The design:
>
> I think it'd be good if you could explain a bit more why you think this
> safe to do in the way you have done it.
>
> E.g. from a quick scroll through the patch, there's not even a comment
> explaining that the only reason there doesn't need to be code dealing
> with xid assignment because we already did the catalog changes to create
> the table.
>

Yes, we do a bunch of catalog changes related to the newly created
table. We will have both the txn id and command id assigned when the
catalog changes are being made. But right after the table is created
in the leader, the command id is incremented (CommandCounterIncrement()
is called from create_ctas_internal()), whereas the txn id remains the
same. The new command id is fetched with GetCurrentCommandId(true) in
intorel_startup, and then parallel mode is entered. The txn id and
command id are serialized into the parallel DSM, and they are then
available to all parallel workers. This is discussed in [1].

A few changes I have to make in the parallel worker code: set
currentCommandIdUsed = true, maybe via a common API
SetCurrentCommandIdUsedForWorker() proposed in [1], and remove the
extra command id sharing from the leader to workers.

I will add a few comments in the upcoming patch related to the above info.
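
A minimal sketch of that flow (GetCurrentCommandId() and the DSM
serialization exist today; the worker-side helper is only the proposal
from [1]):

    /* leader, in intorel_startup(), before entering parallel mode */
    myState->output_cid = GetCurrentCommandId(true);

    /*
     * The txn id and command id then reach the workers through the
     * parallel DSM, via SerializeTransactionState() and
     * StartParallelWorkerTransaction().
     */

    /* worker side, hypothetical helper proposed in [1] */
    void
    SetCurrentCommandIdUsedForWorker(void)
    {
        currentCommandIdUsed = true;
    }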

>
> But how does that work for SELECT INTO? Are you prohibiting
> that? ...
>

In the case of SELECT INTO, a new table gets created as well, and I'm
not prohibiting the parallel inserts; I think we don't need to.
Thoughts?

>
> > Below things are still pending. Thoughts are most welcome:
> > 1. How better we can lift the "cannot insert tuples in a parallel worker"
> > from heap_prepare_insert() for only CTAS cases or for that matter parallel
> > copy? How about having a variable in any of the worker global contexts and
> > use that? Of course, we can remove this restriction entirely in case we
> > fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.
>
> And for the purpose of your question, we could then have a
> table_insert_allow_parallel(TableInsertScan *);
> or an additional arg to table_begin_insert().
>

Removing "cannot insert tuples in a parallel worker" restriction from
heap_prepare_insert() is a common problem for parallel inserts in
general, i.e. parallel inserts in CTAS, parallel INSERT INTO
SELECTs[1] and parallel copy[2]. It will be good if a common solution
is agreed.

>
> > 3. Need to restrict parallel inserts, if CTAS tries to create temp/global
> > tables as the workers will not have access to those tables. Need to analyze
> > whether to allow parallelism if CTAS has prepared statements or with no
> > data.
>
> In which case does CTAS not create a table?

AFAICS, the table gets created in all the cases, but the insertion of
the data gets skipped if the user specifies the WITH NO DATA option, in
which case the select part is not even planned, and so parallelism
will also not be picked.

>
> You definitely need to
> ensure that the table is created before your workers are started, and
> there needs to be in a different CommandId.
>

Yeah, this is already being done. The table gets created in the leader
in intorel_startup, which is called from dest->rStartup (dest in
standard_ExecutorRun()), before entering parallel mode.

[1] https://www.postgresql.org/message-id/CAJcOf-fn1nhEtaU91NvRuA3EbvbJGACMd4_c%2BUu3XU5VMv37Aw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1%2BkpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV%3Dg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Mon, Sep 28, 2020 at 3:58 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks Andres for the comments.
>
> On Thu, Sep 24, 2020 at 8:11 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > > The design:
> >
> > I think it'd be good if you could explain a bit more why you think this
> > safe to do in the way you have done it.
> >
> > E.g. from a quick scroll through the patch, there's not even a comment
> > explaining that the only reason there doesn't need to be code dealing
> > with xid assignment because we already did the catalog changes to create
> > the table.
> >
>
> Yes we do a bunch of catalog changes related to the created new table.
> We will have both the txn id and command id assigned when catalogue
> changes are being made. But, right after the table is created in the
> leader, the command id is incremented (CommandCounterIncrement() is
> called from create_ctas_internal()) whereas the txn id remains the
> same. The new command id is marked as GetCurrentCommandId(true); in
> intorel_startup, then the parallel mode is entered. The txn id and
> command id are serialized into parallel DSM, they are then available
> to all parallel workers. This is discussed in [1].
>
> Few changes I have to make in the parallel worker code: set
> currentCommandIdUsed = true;, may be via a common API
> SetCurrentCommandIdUsedForWorker() proposed in [1] and remove the
> extra command id sharing from the leader to workers.
>
> I will add a few comments in the upcoming patch related to the above info.
>

Yes, that would be good.

> >
> > But how does that work for SELECT INTO? Are you prohibiting
> > that? ...
> >
>
> In case of SELECT INTO, a new table gets created and I'm not
> prohibiting the parallel inserts and I think we don't need to.
>

So, in this case also, do we ensure that the table is created before we
launch the workers? If so, I think you can explain in the comments
about it and what you need to do to ensure the same.

While skimming through the patch, a small thing I noticed:
+ /*
+ * SELECT part of the CTAS is parallelizable, so we can make
+ * each parallel worker insert the tuples that are resulted
+ * in it's execution into the target table.
+ */
+ if (!is_matview &&
+ IsA(plan->planTree, Gather))
+ ((DR_intorel *) dest)->is_parallel = true;
+

I am not sure at this stage if this is the best way to make CTAS
parallel but if so, then probably you can expand the comments a bit to
say why you consider only the Gather node (and that too only when it is
the top-most node) and why not another parallel node like GatherMerge?

> Thoughts?
>
> >
> > > Below things are still pending. Thoughts are most welcome:
> > > 1. How better we can lift the "cannot insert tuples in a parallel worker"
> > > from heap_prepare_insert() for only CTAS cases or for that matter parallel
> > > copy? How about having a variable in any of the worker global contexts and
> > > use that? Of course, we can remove this restriction entirely in case we
> > > fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.
> >
> > And for the purpose of your question, we could then have a
> > table_insert_allow_parallel(TableInsertScan *);
> > or an additional arg to table_begin_insert().
> >
>
> Removing "cannot insert tuples in a parallel worker" restriction from
> heap_prepare_insert() is a common problem for parallel inserts in
> general, i.e. parallel inserts in CTAS, parallel INSERT INTO
> SELECTs[1] and parallel copy[2]. It will be good if a common solution
> is agreed.
>

Right, for now, I think you can simply remove that check from the code
instead of just commenting it out. We will see if there is a better
check/Assert we can add there.

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, Oct 6, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Yes we do a bunch of catalog changes related to the created new table.
> > We will have both the txn id and command id assigned when catalogue
> > changes are being made. But, right after the table is created in the
> > leader, the command id is incremented (CommandCounterIncrement() is
> > called from create_ctas_internal()) whereas the txn id remains the
> > same. The new command id is marked as GetCurrentCommandId(true); in
> > intorel_startup, then the parallel mode is entered. The txn id and
> > command id are serialized into parallel DSM, they are then available
> > to all parallel workers. This is discussed in [1].
> >
> > Few changes I have to make in the parallel worker code: set
> > currentCommandIdUsed = true;, may be via a common API
> > SetCurrentCommandIdUsedForWorker() proposed in [1] and remove the
> > extra command id sharing from the leader to workers.
> >
> > I will add a few comments in the upcoming patch related to the above info.
> >
>
> Yes, that would be good.
>

Added comments.

>
> > > But how does that work for SELECT INTO? Are you prohibiting
> > > that? ...
> > >
> >
> > In case of SELECT INTO, a new table gets created and I'm not
> > prohibiting the parallel inserts and I think we don't need to.
> >
>
> So, in this case, also do we ensure that table is created before we
> launch the workers. If so, I think you can explain in comments about
> it and what you need to do that to ensure the same.
>

For SELECT INTO, the table gets created by the leader in
create_ctas_internal(); then ExecInitParallelPlan() gets called, which
launches the workers, and then the leader (if asked to do so) and the
workers insert the rows. So, we don't need to do any extra work to
ensure the table gets created before the workers start inserting
tuples.

>
> While skimming through the patch, a small thing I noticed:
> + /*
> + * SELECT part of the CTAS is parallelizable, so we can make
> + * each parallel worker insert the tuples that are resulted
> + * in it's execution into the target table.
> + */
> + if (!is_matview &&
> + IsA(plan->planTree, Gather))
> + ((DR_intorel *) dest)->is_parallel = true;
> +
>
> I am not sure at this stage if this is the best way to make CTAS as
> parallel but if so, then probably you can expand the comments a bit to
> say why you consider only Gather node (and that too when it is the
> top-most node) and why not another parallel node like GatherMerge?
>

If somebody expects the order of the tuples coming from the GatherMerge
node of the select part in CTAS or SELECT INTO to be preserved while
inserting, then with parallelism allowed that may not be the case, i.e.
the order of insertion of tuples may vary. I'm not quite sure if
someone wants to use ORDER BY in the select part of CTAS or SELECT INTO
in a real-world use case. Thoughts?

>
> Right, for now, I think you can simply remove that check from the code
> instead of just commenting it. We will see if there is a better
> check/Assert we can add there.
>

Done.

I also worked on some of the open points I listed earlier in my mail.

>
> 3. Need to restrict parallel inserts, if CTAS tries to create temp/global tables as the workers will not have access
> to those tables.
>

Done.

>
> Need to analyze whether to allow parallelism if CTAS has prepared statements or with no data.
>

For prepared statements, parallelism will not be picked, and so neither
will parallel insertion.
For the CTAS WITH NO DATA case, the select part is not even planned,
and so parallelism will also not be picked.

>
> 4. Need to stop unnecessary parallel shared state such as tuple queue being created and shared to workers.
>

Done.

I'm listing the things that are still pending.

1. How to represent the parallel insert for CTAS in explain plans?
EXPLAIN for CTAS shows the plan for only the SELECT part. How about
having some textual info along with the Gather node? I'm not quite sure
on this point; any suggestions are welcome.
2. Addition of new test cases. Testing with more scenarios and
different data sets, sizes, tablespaces, SELECT INTO. Analysis of the
2 mismatches in the write_parallel.sql regression test.

Attaching v2 patch, thoughts and comments are welcome.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Wed, Oct 14, 2020 at 2:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Oct 6, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > While skimming through the patch, a small thing I noticed:
> > + /*
> > + * SELECT part of the CTAS is parallelizable, so we can make
> > + * each parallel worker insert the tuples that are resulted
> > + * in it's execution into the target table.
> > + */
> > + if (!is_matview &&
> > + IsA(plan->planTree, Gather))
> > + ((DR_intorel *) dest)->is_parallel = true;
> > +
> >
> > I am not sure at this stage if this is the best way to make CTAS as
> > parallel but if so, then probably you can expand the comments a bit to
> > say why you consider only Gather node (and that too when it is the
> > top-most node) and why not another parallel node like GatherMerge?
> >
>
> If somebody expects to preserve the order of the tuples that are
> coming from GatherMerge node of the select part in CTAS or SELECT INTO
> while inserting, now if parallelism is allowed, that may not be the
> case i.e. the order of insertion of tuples may vary. I'm not quite
> sure, if someone wants to use order by in the select parts of CTAS or
> SELECT INTO in a real world use case. Thoughts?
>

I think there is no reason why one can't use ORDER BY in the
statements we are talking about here. But I think the reason we can't
enable parallelism for GatherMerge is that for that node we always need
to fetch the data in the leader backend to perform the final merge
phase. So, I was expecting a small comment saying something along those
lines.

>
> >
> > Need to analyze whether to allow parallelism if CTAS has prepared statements or with no data.
> >
>
> For prepared statements, the parallelism will not be picked and so is
> parallel insertion.
>

Hmm, I am not sure what makes you say that. Parallelism has been
enabled for prepared statements since commit 57a6a72b6b.

>
> I'm listing the things that are still pending.
>
> 1. How to represent the parallel insert for CTAS in explain plans? The
> explain CTAS shows the plan for only the SELECT part. How about having
> some textual info along with the Gather node? I'm not quite sure on
> this point, any suggestions are welcome.
>

I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you make a proposal by showing some
example of what you have in mind?

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Oct 14, 2020 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > If somebody expects to preserve the order of the tuples that are
> > coming from GatherMerge node of the select part in CTAS or SELECT INTO
> > while inserting, now if parallelism is allowed, that may not be the
> > case i.e. the order of insertion of tuples may vary. I'm not quite
> > sure, if someone wants to use order by in the select parts of CTAS or
> > SELECT INTO in a real world use case. Thoughts?
> >
>
> I think there is no reason why one can't use ORDER BY in the
> statements we are talking about here. But, I think we can't enable
> parallelism for GatherMerge is because for that node we always need to
> fetch the data in the leader backend to perform the final merge phase.
> So, I was expecting a small comment saying something on those lines.
>

Sure, I will add comments in the upcoming patch.

>
> > For prepared statements, the parallelism will not be picked and so is
> > parallel insertion.
>
> Hmm, I am not sure what makes you say this statement. The parallelism
> is enabled for prepared statements since commit 57a6a72b6b.
>

Thanks for letting me know this. I had misunderstood parallelism for prepared statements. Now, I verified with a proper use case (see below), where I had a prepared statement and a CTAS having EXECUTE; in this case too, parallelism is picked and parallel insertion happens with the patch proposed in this thread. Do we have any problems if we allow parallel insertion for these cases?

PREPARE myselect AS SELECT * FROM t1;
EXPLAIN ANALYZE CREATE TABLE t1_test AS EXECUTE myselect;

I think commit 57a6a72b6b did not add any test cases; isn't it good to add one in prepare.sql or select_parallel.sql?

>
> > 1. How to represent the parallel insert for CTAS in explain plans? The
> > explain CTAS shows the plan for only the SELECT part. How about having
> > some textual info along with the Gather node? I'm not quite sure on
> > this point, any suggestions are welcome.
>
> I am also not sure about this point because we don't display anything
> for the DDL part in explain. Can you propose by showing some example
> of what you have in mind?
>

I thought we could have something like this.
 -----------------------------------------------------------------------------
     Gather  (cost=1000.00..108738.90 rows=0 width=8)
     Workers Planned: 2 Parallel Insert on t_test1
        ->  Parallel Seq Scan on t_test  (cost=0.00..106748.00 rows=4954 width=8)
             Filter: (many < 10000)
 -----------------------------------------------------------------------------

With Regards,
Bharath Rupireddy.

Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Thu, Oct 15, 2020 at 9:14 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Oct 14, 2020 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > For prepared statements, the parallelism will not be picked and so is
> > > parallel insertion.
> >
> > Hmm, I am not sure what makes you say this statement. The parallelism
> > is enabled for prepared statements since commit 57a6a72b6b.
> >
>
> Thanks for letting me know this. I misunderstood the parallelism for prepared statements. Now, I verified with a
> proper use case (see below), where I had a prepared statement, CTAS having EXECUTE, in this case too parallelism is
> picked and parallel insertion happened with the patch proposed in this thread. Do we have any problems if we allow
> parallel insertion for these cases?
>
> PREPARE myselect AS SELECT * FROM t1;
> EXPLAIN ANALYZE CREATE TABLE t1_test AS EXECUTE myselect;
>
> I think the commit 57a6a72b6b has not added any test cases, isn't it good to add one in prepare.sql or
> select_parallel.sql?
>

I am not sure if it is worth it, as this is not functionality that is
too complex or that has many chances of getting broken.

> >
> > > 1. How to represent the parallel insert for CTAS in explain plans? The
> > > explain CTAS shows the plan for only the SELECT part. How about having
> > > some textual info along with the Gather node? I'm not quite sure on
> > > this point, any suggestions are welcome.
> >
> > I am also not sure about this point because we don't display anything
> > for the DDL part in explain. Can you propose by showing some example
> > of what you have in mind?
> >
>
> I thought we could have something like this.
>  -----------------------------------------------------------------------------
>      Gather  (cost=1000.00..108738.90 rows=0 width=8)
>      Workers Planned: 2 Parallel Insert on t_test1
>         ->  Parallel Seq Scan on t_test  (cost=0.00..106748.00 rows=4954 width=8)
>              Filter: (many < 10000)
>  -----------------------------------------------------------------------------
>

maybe something like below:
Gather  (cost=1000.00..108738.90 rows=0 width=8)
   -> Create t_test1
       ->  Parallel Seq Scan on t_test

I don't know what is the best thing to do here. I think for the
temporary purpose you can keep something like the above; then once the
patch has matured, we can take a separate opinion on this.

--
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 14.10.20 11:16, Bharath Rupireddy wrote:
> On Tue, Oct 6, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>> Yes we do a bunch of catalog changes related to the created new table.
>>> We will have both the txn id and command id assigned when catalogue
>>> changes are being made. But, right after the table is created in the
>>> leader, the command id is incremented (CommandCounterIncrement() is
>>> called from create_ctas_internal()) whereas the txn id remains the
>>> same. The new command id is marked as GetCurrentCommandId(true); in
>>> intorel_startup, then the parallel mode is entered. The txn id and
>>> command id are serialized into parallel DSM, they are then available
>>> to all parallel workers. This is discussed in [1].
>>>
>>> Few changes I have to make in the parallel worker code: set
>>> currentCommandIdUsed = true;, may be via a common API
>>> SetCurrentCommandIdUsedForWorker() proposed in [1] and remove the
>>> extra command id sharing from the leader to workers.
>>>
>>> I will add a few comments in the upcoming patch related to the above info.
>>>
>>
>> Yes, that would be good.
>>
> 
> Added comments.
> 
>>
>>>> But how does that work for SELECT INTO? Are you prohibiting
>>>> that? ...
>>>>
>>>
>>> In case of SELECT INTO, a new table gets created and I'm not
>>> prohibiting the parallel inserts and I think we don't need to.
>>>
>>
>> So, in this case, also do we ensure that table is created before we
>> launch the workers. If so, I think you can explain in comments about
>> it and what you need to do that to ensure the same.
>>
> 
> For SELECT INTO, the table gets created by the leader in
> create_ctas_internal(), then ExecInitParallelPlan() gets called which
> launches the workers and then the leader(if asked to do so) and the
> workers insert the rows. So, we don't need to do any extra work to
> ensure the table gets created before the workers start inserting
> tuples.
> 
>>
>> While skimming through the patch, a small thing I noticed:
>> + /*
>> + * SELECT part of the CTAS is parallelizable, so we can make
>> + * each parallel worker insert the tuples that are resulted
>> + * in it's execution into the target table.
>> + */
>> + if (!is_matview &&
>> + IsA(plan->planTree, Gather))
>> + ((DR_intorel *) dest)->is_parallel = true;
>> +
>>
>> I am not sure at this stage if this is the best way to make CTAS as
>> parallel but if so, then probably you can expand the comments a bit to
>> say why you consider only Gather node (and that too when it is the
>> top-most node) and why not another parallel node like GatherMerge?
>>
> 
> If somebody expects to preserve the order of the tuples that are
> coming from GatherMerge node of the select part in CTAS or SELECT INTO
> while inserting, now if parallelism is allowed, that may not be the
> case i.e. the order of insertion of tuples may vary. I'm not quite
> sure, if someone wants to use order by in the select parts of CTAS or
> SELECT INTO in a real world use case. Thoughts?
> 
>>
>> Right, for now, I think you can simply remove that check from the code
>> instead of just commenting it. We will see if there is a better
>> check/Assert we can add there.
>>
> 
> Done.
> 
> I also worked on some of the open points I listed earlier in my mail.
> 
>>
>> 3. Need to restrict parallel inserts, if CTAS tries to create temp/global tables as the workers will not have access
>> to those tables.
>>
> 
> Done.
> 
>>
>> Need to analyze whether to allow parallelism if CTAS has prepared statements or with no data.
>>
> 
> For prepared statements, the parallelism will not be picked and so is
> parallel insertion.
> For CTAS with no data option case the select part is not even planned,
> and so the parallelism will also not be picked.
> 
>>
>> 4. Need to stop unnecessary parallel shared state such as tuple queue being created and shared to workers.
>>
> 
> Done.
> 
> I'm listing the things that are still pending.
> 
> 1. How to represent the parallel insert for CTAS in explain plans? The
> explain CTAS shows the plan for only the SELECT part. How about having
> some textual info along with the Gather node? I'm not quite sure on
> this point, any suggestions are welcome.
> 2. Addition of new test cases. Testing with more scenarios and
> different data sets, sizes, tablespaces, select into. Analysis on the
> 2 mismatches in write_parallel.sql regression test.
> 
> Attaching v2 patch, thoughts and comments are welcome.
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Hi,

Really looking forward to this ending up in postgres as I think it's a 
very nice improvement.

Whilst reviewing your patch I was wondering: is there a reason you did 
not introduce a batch insert in the destreceiver for the CTAS? For me 
this makes a huge difference in ingest speed, as otherwise the inserts 
do not really scale so well once lock contention starts to be a big 
problem. If you like, I can make a patch to introduce this on top?

Kind regards,
Luc
Swarm64



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Fri, Oct 16, 2020 at 11:33 AM Luc Vlaming <luc@swarm64.com> wrote:
>
> Really looking forward to this ending up in postgres as I think it's a
> very nice improvement.
>
> Whilst reviewing your patch I was wondering: is there a reason you did
> not introduce a batch insert in the destreceiver for the CTAS? For me
> this makes a huge difference in ingest speed as otherwise the inserts do
> not really scale so well as lock contention start to be a big problem.
> If you like I can make a patch to introduce this on top?
>

Thanks for your interest. You are right, we can get the maximum
improvement if we have multi inserts in the destreceiver for CTAS,
along similar lines to the COPY FROM command. I mentioned this point in
my first mail [1]. You may want to take a look at an already existing
patch [2] for multi inserts; I think there are some review comments to
be addressed in that patch. I would love to see the multi insert patch
revived.

[1] - https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAEET0ZG31mD5SWjTYsAt0JTLReOejPvusJorZ3kGZ1%3DN1AC-Fw%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 16.10.20 08:23, Bharath Rupireddy wrote:
> On Fri, Oct 16, 2020 at 11:33 AM Luc Vlaming <luc@swarm64.com> wrote:
>>
>> Really looking forward to this ending up in postgres as I think it's a
>> very nice improvement.
>>
>> Whilst reviewing your patch I was wondering: is there a reason you did
>> not introduce a batch insert in the destreceiver for the CTAS? For me
>> this makes a huge difference in ingest speed as otherwise the inserts do
>> not really scale so well as lock contention start to be a big problem.
>> If you like I can make a patch to introduce this on top?
>>
> 
> Thanks for your interest. You are right, we can get maximum
> improvement if we have multi inserts in destreceiver for the CTAS on
> the similar lines to COPY FROM command. I specified this point in my
> first mail [1]. You may want to take a look at an already existing
> patch [2] for multi inserts, I think there are some review comments to
> be addressed in that patch. I would love to see the multi insert patch
> getting revived.
> 
> [1] - https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
> [2] - https://www.postgresql.org/message-id/CAEET0ZG31mD5SWjTYsAt0JTLReOejPvusJorZ3kGZ1%3DN1AC-Fw%40mail.gmail.com
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Sorry, I had not seen that pointer in your first email.

I'll first finish some other patches I'm working on and then I'll try 
to revive that patch. Thanks for the pointers.

Kind regards,
Luc
Swarm64



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, Oct 15, 2020 at 3:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > > 1. How to represent the parallel insert for CTAS in explain plans? The
> > > > explain CTAS shows the plan for only the SELECT part. How about having
> > > > some textual info along with the Gather node? I'm not quite sure on
> > > > this point, any suggestions are welcome.
> > >
> > > I am also not sure about this point because we don't display anything
> > > for the DDL part in explain. Can you propose by showing some example
> > > of what you have in mind?
> >
> > I thought we could have something like this.
> >  -----------------------------------------------------------------------------
> >      Gather  (cost=1000.00..108738.90 rows=0 width=8)
> >      Workers Planned: 2 Parallel Insert on t_test1
> >         ->  Parallel Seq Scan on t_test  (cost=0.00..106748.00 rows=4954 width=8)
> >              Filter: (many < 10000)
> >  -----------------------------------------------------------------------------
>
> maybe something like below:
> Gather  (cost=1000.00..108738.90 rows=0 width=8)
>    -> Create t_test1
>        ->  Parallel Seq Scan on t_test
>
> I don't know what is the best thing to do here. I think for the
> temporary purpose you can keep something like above then once the
> patch is matured then we can take a separate opinion for this.
>

Agreed. Here's a snapshot of EXPLAIN with the suggested change.

postgres=# EXPLAIN (ANALYZE, COSTS OFF) CREATE TABLE t1_test AS SELECT * FROM t1;
                                   QUERY PLAN                            
---------------------------------------------------------------------------------
 Gather (actual time=970.524..972.913 rows=0 loops=1)
   ->  Create t1_test
     Workers Planned: 2
     Workers Launched: 2
     ->  Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
 Planning Time: 0.049 ms
 Execution Time: 973.733 ms

>
> I think there is no reason why one can't use ORDER BY in the
> statements we are talking about here. But, I think we can't enable
> parallelism for GatherMerge is because for that node we always need to
> fetch the data in the leader backend to perform the final merge phase.
> So, I was expecting a small comment saying something on those lines.
>

Added comments.

>
> 2. Addition of new test cases.
>

Added new test cases.

>
> Analysis on the 2 mismatches in write_parallel.sql regression test.
>

Done. It needed a small code change in costsize.c. Now both make check and make check-world pass.

Apart from the above, here are a couple of other things I have finished in the v3 patch.

1. Both make check and make check-world with force_parallel_mode = regress pass.
2. I enabled parallel inserts in the case of materialized views. Hope that's fine.

Attaching v3 patch herewith.

I'm done with all the open points in my list. Please review the v3 patch and provide comments.

With Regards,
Bharath Rupireddy.

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Oct 19, 2020 at 10:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Attaching v3 patch herewith.
>
> I'm done with all the open points in my list. Please review the v3 patch and provide comments.
>

Attaching v4 patch, rebased on the latest master 68b1a4877e. Also,
added this feature to commitfest -
https://commitfest.postgresql.org/31/2841/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
Hi,

I'm very interested in this feature,
and I'm looking at the patch; here are some comments.

1.
+            if (!TupIsNull(outerTupleSlot))
+            {
+                (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+                node->ps.state->es_processed++;
+            }
+
+            if(TupIsNull(outerTupleSlot))
+                break;
+        }

How about the following style:

        if (TupIsNull(outerTupleSlot))
            break;

        (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
        node->ps.state->es_processed++;

which looks cleaner.


2.
+
+    if (into != NULL &&
+        IsA(into, IntoClause))
+    {

The check can be replaced by ISCTAS(into).


3.
+    /*
+     * For parallelizing inserts in CTAS i.e. making each
+     * parallel worker inerst it's tuples, we must send
+     * information such as intoclause(for each worker

'inerst' looks like a typo (insert).


4.
+    /* Estimate space for into clause for CTAS. */
+    if (ISCTAS(planstate->intoclause))
+    {
+        intoclausestr = nodeToString(planstate->intoclause);
+        shm_toc_estimate_chunk(&pcxt->estimator, strlen(intoclausestr) + 1);
+        shm_toc_estimate_keys(&pcxt->estimator, 1);
+    }
...
+    if (intoclausestr != NULL)
+    {
+        char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+                                                strlen(intoclausestr) + 1);
+        strcpy(shmptr, intoclausestr);
+        shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+    }

The code here calls strlen(intoclausestr) two times.
The existing code in ExecInitParallelPlan stores the strlen in a
variable, so how about the following style, similar to that code:

    intoclause_len = strlen(intoclausestr);
    ...
    /* Store serialized intoclause. */
    intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
    memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
    shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);


5.
+    if (intoclausestr != NULL)
+    {
+        char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+                                                strlen(intoclausestr) + 1);
+        strcpy(shmptr, intoclausestr);
+        shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+    }
+
     /* Set up the tuple queues that the workers will write into. */
-    pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+    if (intoclausestr == NULL)
+        pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);

The two checks on intoclausestr can be combined like:

if (intoclausestr != NULL)
{
...
}
else
{
...
}
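
i.e., putting the fragments above together, roughly (names are the
patch's, per the excerpts above):

    if (intoclausestr != NULL)
    {
        /* Serialize the into clause for the workers. */
        intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
        memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
        shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
    }
    else
    {
        /* Set up the tuple queues that the workers will write into. */
        pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
    }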

Best regards,
houzj





Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, Nov 24, 2020 at 4:43 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> I'm very interested in this feature,
> and I'm looking at the patch, here are some comments.
>

Thanks for the review.

>
> How about the following style:
>
>                 if(TupIsNull(outerTupleSlot))
>                         Break;
>
>                 (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
>                 node->ps.state->es_processed++;
>
> Which looks cleaner.
>

Done.

>
> The check can be replaced by ISCTAS(into).
>

Done.

>
> 'inerst' looks like a typo (insert).
>

Corrected.

>
> The code here call strlen(intoclausestr) for two times,
> After checking the existing code in ExecInitParallelPlan,
> It used to store the strlen in a variable.
>
> So how about the following style:
>
>         intoclause_len = strlen(intoclausestr);
>         ...
>         /* Store serialized intoclause. */
>         intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
>         memcpy(shmptr, intoclausestr, intoclause_len + 1);
>         shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
>

Done.

>
> The two check about intoclausestr seems can be combined like:
>
> if (intoclausestr != NULL)
> {
> ...
> }
> else
> {
> ...
> }
>

Done.

Attaching v5 patch. Please consider it for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
Hi,

I have a question about the following code:

    econtext = node->ps.ps_ExprContext;
     ResetExprContext(econtext);
 
+    if (ISCTAS(node->ps.intoclause))
+    {
+        ExecParallelInsertInCTAS(node);
+        return NULL;
+    }

    /* If no projection is required, we're done. */
    if (node->ps.ps_ProjInfo == NULL)
        return slot;

    /*
     * Form the result tuple using ExecProject(), and return it.
     */
    econtext->ecxt_outertuple = slot;
    return ExecProject(node->ps.ps_ProjInfo);

It seems the projection will be skipped.
Is this because projection is not required in this case?
(I'm not very familiar with where the projection happens.)

If projection is not required here, shall we add some comments here?

Best regards,
houzj





Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, Nov 26, 2020 at 7:47 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> I have an issue about the following code:
>
>         econtext = node->ps.ps_ExprContext;
>         ResetExprContext(econtext);
>
> +       if (ISCTAS(node->ps.intoclause))
> +       {
> +               ExecParallelInsertInCTAS(node);
> +               return NULL;
> +       }
>
>         /* If no projection is required, we're done. */
>         if (node->ps.ps_ProjInfo == NULL)
>                 return slot;
>
>         /*
>          * Form the result tuple using ExecProject(), and return it.
>          */
>         econtext->ecxt_outertuple = slot;
>         return ExecProject(node->ps.ps_ProjInfo);
>
> It seems the projection will be skipped.
> Is this because projection is not required in this case ?
> (I'm not very familiar with where the projection will be.)
>

For parallel inserts in CTAS, I don't think we need to project the
tuples being returned from the underlying plan nodes, and we also have
nothing to project from the Gather node further up. The required
projection will happen while the tuples are being returned from the
underlying nodes, and the projected tuples are directly fed to CTAS's
dest receiver intorel_receive(), and from there into the created
table. We don't need ExecProject again in ExecParallelInsertInCTAS().

For instance, projection will always be done when the tuple is being
returned from an underlying sequential scan node (see ExecScan() -->
ExecProject()), and this is true for both the leader and the workers.
In both the leader and the workers, we are just calling CTAS's dest
receiver intorel_receive().

Thoughts?

>
> If projection is not required here, shall we add some comments here?
>

If the above point looks okay, I can add a comment.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
Hi,

> On Thu, Nov 26, 2020 at 7:47 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
> wrote:
> >
> > Hi,
> >
> > I have an issue about the following code:
> >
> >         econtext = node->ps.ps_ExprContext;
> >         ResetExprContext(econtext);
> >
> > +       if (ISCTAS(node->ps.intoclause))
> > +       {
> > +               ExecParallelInsertInCTAS(node);
> > +               return NULL;
> > +       }
> >
> >         /* If no projection is required, we're done. */
> >         if (node->ps.ps_ProjInfo == NULL)
> >                 return slot;
> >
> >         /*
> >          * Form the result tuple using ExecProject(), and return it.
> >          */
> >         econtext->ecxt_outertuple = slot;
> >         return ExecProject(node->ps.ps_ProjInfo);
> >
> > It seems the projection will be skipped.
> > Is this because projection is not required in this case ?
> > (I'm not very familiar with where the projection will be.)
> >
> 
> For parallel inserts in CTAS, I don't think we need to project the tuples
> being returned from the underlying plan nodes, and also we have nothing
> to project from the Gather node further up. The required projection will
> happen while the tuples are being returned from the underlying nodes and
> the projected tuples are being directly fed to CTAS's dest receiver
> intorel_receive(), from there into the created table. We don't need
> ExecProject again in ExecParallelInsertInCTAS().
> 
> For instance, projection will always be done when the tuple is being returned
> from an underlying sequential scan node(see ExecScan() -->
> ExecProject() and this is true for both leader and workers. In both leader
> and workers, we are just calling CTAS's dest receiver intorel_receive().
> 
> Thoughts?

I took a deep look at the projection logic.
In most cases, you are right that the Gather node does not need
projection.

In some rare cases, such as a SubPlan (or an InitPlan, I guess), the
projection will happen in the Gather node.

The example:

Create table test(i int);
Create table test2(a int, b int);
insert into test values(generate_series(1,10000000,1));
insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1));

postgres=# explain(verbose, costs off) select test.i,(select i from (select * from test2) as tt limit 1) from test
where test.i < 2000;
               QUERY PLAN
----------------------------------------
 Gather
   Output: test.i, (SubPlan 1)
   Workers Planned: 2
   ->  Parallel Seq Scan on public.test
         Output: test.i
         Filter: (test.i < 2000)
   SubPlan 1
     ->  Limit
           Output: (test.i)
           ->  Seq Scan on public.test2
                 Output: test.i

In this case, projection is necessary,
because the subplan will be executed during projection.

If it is skipped, the created table will lose some data.



Best regards,
houzj



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, Nov 26, 2020 at 12:15 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> I took a deep look at the projection logic.
> In most cases, you are right that Gather node does not need projection.
>
> In some rare cases, such as Subplan (or initplan I guess).
> The projection will happen in Gather node.
>
> The example:
>
> Create table test(i int);
> Create table test2(a int, b int);
> insert into test values(generate_series(1,10000000,1));
> insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1));
>
> postgres=# explain(verbose, costs off) select test.i,(select i from (select * from test2) as tt limit 1) from test
> where test.i < 2000;
>                QUERY PLAN
> ----------------------------------------
>  Gather
>    Output: test.i, (SubPlan 1)
>    Workers Planned: 2
>    ->  Parallel Seq Scan on public.test
>          Output: test.i
>          Filter: (test.i < 2000)
>    SubPlan 1
>      ->  Limit
>            Output: (test.i)
>            ->  Seq Scan on public.test2
>                  Output: test.i
>
> In this case, projection is necessary,
> because the subplan will be executed in projection.
>
> If skipped, the table created will loss some data.
>

Thanks a lot for the use case. Yes, with the current patch the table
will lose the data related to the subplan. On analyzing further, I
think we cannot allow parallel inserts in the cases where the Gather
node has some projection to do, because the workers cannot perform
that projection. So, having ps_ProjInfo in the Gather node is an
indication for us to disable parallel inserts; only the leader can do
the insertions, after the Gather node does the required projections.
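
Something along the following lines, perhaps (a sketch only; the
enclosing check would be the patch's IsParallelInsertInCTASAllowed(),
mentioned below, and ps_ProjInfo is the existing PlanState field):

    /* e.g. in IsParallelInsertInCTASAllowed() */
    if (gatherstate->ps.ps_ProjInfo != NULL)
    {
        /*
         * Gather has a projection to perform (e.g. a SubPlan in the
         * target list); workers cannot do it, so fall back to
         * leader-only inserts.
         */
        return false;
    }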

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
Hi,

> > I took a deep look at the projection logic.
> > In most cases, you are right that Gather node does not need projection.
> >
> > In some rare cases, such as Subplan (or initplan I guess).
> > The projection will happen in Gather node.
> >
> > The example:
> >
> > Create table test(i int);
> > Create table test2(a int, b int);
> > insert into test values(generate_series(1,10000000,1));
> > insert into test2 values(generate_series(1,1000,1),
> > generate_series(1,1000,1));
> >
> > postgres=# explain(verbose, costs off) select test.i,(select i from
> > (select * from test2) as tt limit 1) from test where test.i < 2000;
> >                QUERY PLAN
> > ----------------------------------------
> >  Gather
> >    Output: test.i, (SubPlan 1)
> >    Workers Planned: 2
> >    ->  Parallel Seq Scan on public.test
> >          Output: test.i
> >          Filter: (test.i < 2000)
> >    SubPlan 1
> >      ->  Limit
> >            Output: (test.i)
> >            ->  Seq Scan on public.test2
> >                  Output: test.i
> >
> > In this case, projection is necessary, because the subplan will be
> > executed in projection.
> >
> > If skipped, the table created will loss some data.
> >
> 
> Thanks a lot for the use case. Yes with the current patch table will lose
> data related to the subplan. On analyzing further, I think we can not allow
> parallel inserts in the cases when the Gather node has some projections
> to do. Because the workers can not perform that projection. So, having
> ps_ProjInfo in the Gather node is an indication for us to disable parallel
> inserts and only the leader can do the insertions after the Gather node
> does the required projections.
> 
> Thoughts?
> 

Agreed.


2.
@@ -166,6 +228,16 @@ ExecGather(PlanState *pstate)
         {
             ParallelContext *pcxt;
 
+            /*
+             * Take the necessary information to be passed to workers for
+             * parallel inserts in CTAS.
+             */
+            if (ISCTAS(node->ps.intoclause))
+            {
+                node->ps.lefttree->intoclause = node->ps.intoclause;
+                node->ps.lefttree->objectid = node->ps.objectid;
+            }
+
             /* Initialize, or re-initialize, shared state needed by workers. */
             if (!node->pei)
                 node->pei = ExecInitParallelPlan(node->ps.lefttree,

I found that the code passes intoclause and objectid to the Gather
node's lefttree. Is that necessary? It seems only the Gather node will
use this information.


Best regards,
houzj




Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Fri, Nov 27, 2020 at 11:57 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > Thanks a lot for the use case. Yes with the current patch table will lose
> > data related to the subplan. On analyzing further, I think we can not allow
> > parallel inserts in the cases when the Gather node has some projections
> > to do. Because the workers can not perform that projection. So, having
> > ps_ProjInfo in the Gather node is an indication for us to disable parallel
> > inserts and only the leader can do the insertions after the Gather node
> > does the required projections.
> >
> > Thoughts?
> >
>
> Agreed.
>

Thanks! I will add/modify IsParallelInsertInCTASAllowed() to return
false in this case.

>
> 2.
> @@ -166,6 +228,16 @@ ExecGather(PlanState *pstate)
>                 {
>                         ParallelContext *pcxt;
>
> +                       /*
> +                        * Take the necessary information to be passed to workers for
> +                        * parallel inserts in CTAS.
> +                        */
> +                       if (ISCTAS(node->ps.intoclause))
> +                       {
> +                               node->ps.lefttree->intoclause = node->ps.intoclause;
> +                               node->ps.lefttree->objectid = node->ps.objectid;
> +                       }
> +
>                         /* Initialize, or re-initialize, shared state needed by workers. */
>                         if (!node->pei)
>                                 node->pei = ExecInitParallelPlan(node->ps.lefttree,
>
> I found the code pass intoclause and objectid to Gather node's lefttree.
> Is it necessary? It seems only Gather node will use the information.
>

I am passing the required information from the upper layers down to
here through the PlanState structure. Since the Gather node's lefttree
is also a PlanState structure variable, I just assigned them here to
pass that information on to ExecInitParallelPlan().

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 25-11-2020 03:40, Bharath Rupireddy wrote:
> On Tue, Nov 24, 2020 at 4:43 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>>
>> I'm very interested in this feature,
>> and I'm looking at the patch, here are some comments.
>>
> 
> Thanks for the review.
> 
>>
>> How about the following style:
>>
>>                  if(TupIsNull(outerTupleSlot))
>>                          Break;
>>
>>                  (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
>>                  node->ps.state->es_processed++;
>>
>> Which looks cleaner.
>>
> 
> Done.
> 
>>
>> The check can be replaced by ISCTAS(into).
>>
> 
> Done.
> 
>>
>> 'inerst' looks like a typo (insert).
>>
> 
> Corrected.
> 
>>
>> The code here call strlen(intoclausestr) for two times,
>> After checking the existing code in ExecInitParallelPlan,
>> It used to store the strlen in a variable.
>>
>> So how about the following style:
>>
>>          intoclause_len = strlen(intoclausestr);
>>          ...
>>          /* Store serialized intoclause. */
>>          intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
>>          memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
>>          shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
>>
> 
> Done.
> 
>>
>> The two check about intoclausestr seems can be combined like:
>>
>> if (intoclausestr != NULL)
>> {
>> ...
>> }
>> else
>> {
>> ...
>> }
>>
> 
> Done.
> 
> Attaching v5 patch. Please consider it for further review.
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Disclaimer: I have by no means thoroughly reviewed all the involved parts 
and am probably missing quite a bit of context, so if I understood parts 
wrong or they have been discussed before then I'm sorry. Most notably, 
the whole situation around the command-id is still elusive to me and I 
really cannot yet judge anything related to that.

IMHO the patch makes the Gather node do most of the CTAS 
work, which seems unwanted. For the non-CTAS insert/update case, a 
ModifyTable node exists to actually do the work. What I'm 
wondering is whether it would not be better to introduce a CreateTable node 
as well?
This would have several merits:
- the rowcount of that node would be 0 for the parallel case, and 
non-zero for the serial case. Then the Gather node and the Query struct 
don't have to know about CTAS for the most part, removing e.g. the case 
distinctions in cost_gather.
- the inserted rows can now be accounted in this new node instead of the 
parallel executor state, and this node can also do its own DSM 
initializations
- the generation of a partial variant of the CreateTable node can now 
be done in the optimizer instead of ExecCreateTableAs, which IMHO is 
a more logical place to make these kinds of decisions, and which then also 
makes it potentially play nicer with costs and the like.
- the explain code can now be in its own place instead of part of the 
gather node
- IIUC it would allow the removal of the code to only launch parallel 
workers if it's not CTAS, which IMHO would be quite a big benefit.

Thoughts?

Some small things I noticed while going through the patch:
- Typo for the comment about "inintorel_startup" which should be 
intorel_startup
-   if (node->nworkers_launched == 0 && !node->need_to_scan_locally)

   can be changed into
   if (node->nworkers_launched == 0)
   because either way it'll be true.

Regards,
Luc
Swarm64



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Fri, Nov 27, 2020 at 1:07 PM Luc Vlaming <luc@swarm64.com> wrote:
>
> Disclaimer: I have by no means thoroughly reviewed all the involved parts
> and am probably missing quite a bit of context, so if I understood parts
> wrong or they have been discussed before then I'm sorry. Most notably,
> the whole situation around the command-id is still elusive to me and I
> really cannot yet judge anything related to that.
>
> IMHO the patch makes the Gather node do most of the CTAS
> work, which seems unwanted. For the non-CTAS insert/update case, a
> ModifyTable node exists to actually do the work. What I'm
> wondering is whether it would not be better to introduce a CreateTable node
> as well?
> This would have several merits:
> - the rowcount of that node would be 0 for the parallel case, and
> non-zero for the serial case. Then the Gather node and the Query struct
> don't have to know about CTAS for the most part, removing e.g. the case
> distinctions in cost_gather.
> - the inserted rows can now be accounted in this new node instead of the
> parallel executor state, and this node can also do its own DSM
> initializations
> - the generation of a partial variant of the CreateTable node can now
> be done in the optimizer instead of ExecCreateTableAs, which IMHO is
> a more logical place to make these kinds of decisions, and which then also
> makes it potentially play nicer with costs and the like.
> - the explain code can now be in its own place instead of part of the
> gather node
> - IIUC it would allow the removal of the code to only launch parallel
> workers if it's not CTAS, which IMHO would be quite a big benefit.
>
> Thoughts?
>

If I'm not wrong, I think currently we have no exec nodes for DDLs.
I'm not sure whether we would like to introduce one for this. Also
note that both CTAS and CREATE MATERIALIZED VIEW (CMV) are handled
with the same code, so if we have CreateTable as the new node, do
we also want another node, or a generic node name?

The main design idea of the patch proposed in this thread is to push
the dest receiver down to the workers if the SELECT part of
the CTAS or CMV is parallelizable. Also, for CTAS or CMV we do not
do any planning as such; the planner is just influenced to take
into consideration that there are no tuples to transfer from the
workers to the Gather node, which may make the planner choose parallelism
for the SELECT part. So, the planner work for CTAS or CMV is very minimal.
I also have the idea of extending this design (if accepted) to REFRESH
MATERIALIZED VIEW after some analysis.
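
To illustrate the "push the dest receiver down" part, here is a minimal
sketch of what each worker would do, assuming the into clause has already
been read back from shared memory (the call sites are illustrative, not
the patch's final code):

    /* Each worker builds its own CTAS dest receiver and inserts its share. */
    DestReceiver *receiver = CreateIntoRelDestReceiver(intoclause);

    receiver->rStartup(receiver, CMD_SELECT, queryDesc->tupDesc);
    /* ... the worker's ExecutorRun() feeds its tuples to this receiver ... */
    receiver->rShutdown(receiver);
    receiver->rDestroy(receiver);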

I may be wrong above, other hackers may have better opinions.

>
> Some small things I noticed while going through the patch:
> - Typo for the comment about "inintorel_startup" which should be
> intorel_startup
>

Corrected.

>
> -   if (node->nworkers_launched == 0 && !node->need_to_scan_locally)
>
>    can be changed into
>    if (node->nworkers_launched == 0
>    because either way it'll be true.
>

Yes, the !node->need_to_scan_locally check is not necessary; we need to set it
to true if there are no workers launched. I removed the
!node->need_to_scan_locally check from the if clause.

> On Fri, Nov 27, 2020 at 11:57 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> >
> > > Thanks a lot for the use case. Yes, with the current patch the table will lose
> > > data related to the subplan. On analyzing further, I think we cannot allow
> > > parallel inserts in the cases when the Gather node has some projections
> > > to do, because the workers cannot perform that projection. So, having
> > > ps_ProjInfo in the Gather node is an indication for us to disable parallel
> > > inserts and let only the leader do the insertions after the Gather node
> > > does the required projections.
> > >
> > > Thoughts?
> >
> > Agreed.
>
> Thanks! I will add/modify IsParallelInsertInCTASAllowed() to return
> false in this case.
>

Modified.

Attaching v6 patch that has the above review comments addressed.
Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Mon, Nov 30, 2020 at 10:43 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Nov 27, 2020 at 1:07 PM Luc Vlaming <luc@swarm64.com> wrote:
> >
> > Disclaimer: I have by no means thoroughly reviewed all the involved parts
> > and am probably missing quite a bit of context, so if I understood parts
> > wrong or they have been discussed before then I'm sorry. Most notably,
> > the whole situation around the command-id is still elusive to me and I
> > really cannot yet judge anything related to that.
> >
> > IMHO the patch makes the Gather node do most of the CTAS
> > work, which seems unwanted. For the non-CTAS insert/update case, a
> > ModifyTable node exists to actually do the work. What I'm
> > wondering is whether it would not be better to introduce a CreateTable node
> > as well?
> > This would have several merits:
> > - the rowcount of that node would be 0 for the parallel case, and
> > non-zero for the serial case. Then the Gather node and the Query struct
> > don't have to know about CTAS for the most part, removing e.g. the case
> > distinctions in cost_gather.
> > - the inserted rows can now be accounted in this new node instead of the
> > parallel executor state, and this node can also do its own DSM
> > initializations
> > - the generation of a partial variant of the CreateTable node can now
> > be done in the optimizer instead of ExecCreateTableAs, which IMHO is
> > a more logical place to make these kinds of decisions, and which then also
> > makes it potentially play nicer with costs and the like.
> > - the explain code can now be in its own place instead of part of the
> > gather node
> > - IIUC it would allow the removal of the code to only launch parallel
> > workers if it's not CTAS, which IMHO would be quite a big benefit.
> >
> > Thoughts?
> >
>
> If I'm not wrong, I think currently we have no exec nodes for DDLs.
> I'm not sure whether we would like to introduce one for this.
>

Yeah, I am also not in favor of having an executor node for CTAS but,
OTOH, I also don't like the way you have jammed the relevant
information into the generic PlanState. How about keeping it in GatherState
and initializing it in ExecCreateTableAs() after executor start?
You are already doing special treatment for the Gather node in
ExecCreateTableAs (via IsParallelInsertInCTASAllowed), so we can as
well initialize the required information in GatherState in
ExecCreateTableAs. I think that might help in reducing the special
treatment for intoclause at different places.
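
To make that concrete, here is a rough sketch of the suggestion; the exact
fields and the initialization site are assumptions, not settled code:

    /* In execnodes.h: keep the CTAS parallel-insert info on GatherState. */
    typedef struct GatherState
    {
        PlanState    ps;
        /* ... existing fields elided ... */
        IntoClause  *intoclause;    /* filled by ExecCreateTableAs() */
        Oid          objectid;
        DestReceiver *dest;
    } GatherState;

    /* In ExecCreateTableAs(), after ExecutorStart(): */
    GatherState *gstate = (GatherState *) queryDesc->planstate;

    gstate->intoclause = into;
    gstate->objectid = objectid;    /* OID of the just-created table */
    gstate->dest = dest;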

Few other assorted comments:
=========================
1.
+/*
+ * IsParallelInsertInCTASAllowed --- determine whether or not parallel
+ * insertion is possible.
+ */
+bool IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
+{
..
..
if (ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+ plannedstmt->parallelModeNeeded &&
+ plannedstmt->planTree &&
+ IsA(plannedstmt->planTree, Gather) &&
+ plannedstmt->planTree->lefttree &&
+ plannedstmt->planTree->lefttree->parallel_aware &&
+ plannedstmt->planTree->lefttree->parallel_safe)
+ {
+ /*
+ * Since there are no rows that are transferred from workers to
+ * Gather node, so we set it to 0 to be visible in explain
+ * plans. Note that we would have already accounted this for
+ * cost calculations in cost_gather().
+ */
+ plannedstmt->planTree->plan_rows = 0;

This looks a bit odd. The function name
'IsParallelInsertInCTASAllowed' suggests that it just checks whether
parallelism is allowed, but it is internally changing plan_rows. It
might be better to do this separately if parallelism is allowed.
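
For example, the side effect could move into a separate call, roughly like
this (the second function name is only illustrative):

    /* In ExecCreateTableAs(): pure check first, side effects second. */
    if (IsParallelInsertInCTASAllowed(into, queryDesc))
    {
        /* Only now adjust the plan, e.g. zero the Gather row count. */
        SetCTASParallelInsertState(queryDesc);
    }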

2.
 static void ExecShutdownGatherWorkers(GatherState *node);
-
+static void ExecParallelInsertInCTAS(GatherState *node);

Spurious line removal.

3.
/* Wait for the parallel workers to finish. */
+ if (node->nworkers_launched > 0)
+ {
+ ExecShutdownGatherWorkers(node);
+
+ /*
+ * Add up the total tuples inserted by all workers, to the tuples
+ * inserted by the leader(if any). This will be shared to client.
+ */
+ node->ps.state->es_processed += pg_atomic_read_u64(node->pei->processed);
+ }

The comment and code appear a bit misleading, as the function seems to
shut down the workers rather than wait for them to finish. How about
using something like below:

/*
* Next, accumulate buffer and WAL usage.  (This must wait for the workers
* to finish, or we might get incomplete data.)
*/
if (nworkers > 0)
{
int i;

/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);

for (i = 0; i < lps->pcxt->nworkers_launched; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}

This is how it works for parallel vacuum.
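
Adapted to the CTAS case, that pattern would look roughly like the sketch
below, assuming the patch's node->pei->processed counter:

    /*
     * Wait for the parallel workers to finish so that the insert counts
     * are complete, then accumulate them before shutting the workers down.
     */
    if (node->nworkers_launched > 0)
    {
        WaitForParallelWorkersToFinish(node->pei->pcxt);

        /* Add the workers' inserted-tuple counts to the leader's. */
        node->ps.state->es_processed +=
            pg_atomic_read_u64(node->pei->processed);

        ExecShutdownGatherWorkers(node);
    }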

4.
+
+ /*
+ * Make the number of tuples that are transferred from workers to gather
+ * node zero as each worker parallelly insert the tuples that are resulted
+ * from its chunk of plan execution. This change may make the parallel
+ * plan cheap among all other plans, and influence the planner to consider
+ * this parallel plan.
+ */
+ if (!(root->parse->isForCTAS &&
+ root->query_level == 1))
+ run_cost += parallel_tuple_cost * path->path.rows;

The above comment doesn't seem to convey what it intends to convey.
How about changing it slightly as: "We don't compute the
parallel_tuple_cost for CTAS because the number of tuples that are
transferred from workers to the gather node is zero as each worker
parallelly inserts the tuples that are resulted from its chunk of plan
execution. This change may make the parallel plan cheap among all
other plans, and influence the planner to consider this parallel
plan."

Then, we can also have an Assert for path->path.rows to zero for the CTAS case.

5.
+ /* Prallel inserts in CTAS related info is specified below. */
+ IntoClause *intoclause;
+ Oid objectid;
+ DestReceiver *dest;
 } PlanState;

Typo. /Prallel/Parallel

6.
Currently, it seems the plan looks like:
 Gather (actual time=970.524..972.913 rows=0 loops=1)
   ->  Create t1_test
     Workers Planned: 2
     Workers Launched: 2
     ->  Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)

I would prefer it to be:
Gather (actual time=970.524..972.913 rows=0 loops=1)
     Workers Planned: 2
     Workers Launched: 2
    ->  Create t1_test
     ->  Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)

This way it looks like the writing part is done below the Gather node
and also it will match the Parallel Insert patch of Greg.

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
Thanks Amit for the review comments.

On Sat, Dec 5, 2020 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > If I'm not wrong, I think currently we have no exec nodes for DDLs.
> > I'm not sure whether we would like to introduce one for this.
>
> Yeah, I am also not in favor of having an executor node for CTAS but,
> OTOH, I also don't like the way you have jammed the relevant
> information into the generic PlanState. How about keeping it in GatherState
> and initializing it in ExecCreateTableAs() after executor start?
> You are already doing special treatment for the Gather node in
> ExecCreateTableAs (via IsParallelInsertInCTASAllowed), so we can as
> well initialize the required information in GatherState in
> ExecCreateTableAs. I think that might help in reducing the special
> treatment for intoclause at different places.
>

Done. Added the required info to the GatherState node. While this reduced the
changes at many other places, I had to pass the into clause and
object id to ExecInitParallelPlan() as we do not send the GatherState node
to it. Hope that's okay.
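
In other words, roughly the following signature change; the first form is
the current in-tree one and the second is only a sketch of the extension:

    /* current */
    ParallelExecutorInfo *
    ExecInitParallelPlan(PlanState *planstate, EState *estate,
                         Bitmapset *sendParams, int nworkers,
                         int64 tuples_needed);

    /* sketch: the CTAS info travels as two extra arguments */
    ParallelExecutorInfo *
    ExecInitParallelPlan(PlanState *planstate, EState *estate,
                         Bitmapset *sendParams, int nworkers,
                         int64 tuples_needed, IntoClause *intoclause,
                         Oid objectid);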

>
> Few other assorted comments:
> =========================
> 1.
> This looks a bit odd. The function name
> 'IsParallelInsertInCTASAllowed' suggests that it just checks whether
> parallelism is allowed, but it is internally changing plan_rows. It
> might be better to do this separately if parallelism is allowed.
>

Changed.

>
> 2.
>  static void ExecShutdownGatherWorkers(GatherState *node);
> -
> +static void ExecParallelInsertInCTAS(GatherState *node);
>
> Spurious line removal.
>

Corrected.

>
> 3.
> The comment and code appear a bit misleading, as the function seems to
> shut down the workers rather than wait for them to finish. How about
> using something like below:
>
> /*
> * Next, accumulate buffer and WAL usage.  (This must wait for the workers
> * to finish, or we might get incomplete data.)
> */
> if (nworkers > 0)
> {
> int i;
>
> /* Wait for all vacuum workers to finish */
> WaitForParallelWorkersToFinish(lps->pcxt);
>
> for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
> }
>
> This is how it works for parallel vacuum.
>

Done.

>
> 4.
> The above comment doesn't seem to convey what it intends to convey.
> How about changing it slightly as: "We don't compute the
> parallel_tuple_cost for CTAS because the number of tuples that are
> transferred from workers to the gather node is zero as each worker
> parallelly inserts the tuples that are resulted from its chunk of plan
> execution. This change may make the parallel plan cheap among all
> other plans, and influence the planner to consider this parallel
> plan."
>

Changed.

>
> Then, we can also have an Assert for path->path.rows to zero for the CTAS case.
>

We cannot have Assert(path->path.rows == 0), because we are not
changing this parameter upstream in or before the planning phase. We
are just not taking it into account for CTAS. We would have to do
extra checks in different places if we had to force the planner's
path->path.rows to 0 for CTAS. IMHO, that's not necessary; we can just
skip taking this value into account in cost_gather. Thoughts?

>
> 5.
> + /* Prallel inserts in CTAS related info is specified below. */
> + IntoClause *intoclause;
> + Oid objectid;
> + DestReceiver *dest;
>  } PlanState;
>
> Typo. /Prallel/Parallel
>

Corrected.

>
> 6.
> Currently, it seems the plan looks like:
>  Gather (actual time=970.524..972.913 rows=0 loops=1)
>    ->  Create t1_test
>      Workers Planned: 2
>      Workers Launched: 2
>      ->  Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
>
> I would prefer it to be:
> Gather (actual time=970.524..972.913 rows=0 loops=1)
>      Workers Planned: 2
>      Workers Launched: 2
>     ->  Create t1_test
>      ->  Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333 loops=3)
>
> This way it looks like the writing part is done below the Gather node
> and also it will match the Parallel Insert patch of Greg.
>

Done.

Attaching v7 patch. Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Zhihong Yu
Date:
Hi, Bharath:

+       (void) SetCurrentCommandIdUsedForWorker();
+       myState->output_cid = GetCurrentCommandId(false);

SetCurrentCommandIdUsedForWorker already has void as return type. The '(void)' is not needed.

+            * rd_createSubid is marked invalid, otherwise, the table is
+            * not allowed to extend by the workers.

nit: to extend by the workers -> to be extended by the workers

For IsParallelInsertInCTASAllowed, the logic is inside the 'if (IS_CTAS(into))' block.
You can return false when (!IS_CTAS(into)) - this would save some indentation for the body.

+       if (rel && rel->relpersistence != RELPERSISTENCE_TEMP)
+           allowed = true;

Similarly, when the above condition doesn't hold, you can return false directly - reducing the next if condition to 'if (queryDesc)'.
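
Putting the two early returns together, the function body could be shaped
roughly like this (a sketch; 'rel' stands for the target relation looked up
earlier in the function, and gather_checks_pass() is a made-up helper for
the Gather-node checks):

    bool
    IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
    {
        if (!IS_CTAS(into))
            return false;

        /* Parallel workers cannot access temporary tables. */
        if (rel == NULL || rel->relpersistence == RELPERSISTENCE_TEMP)
            return false;

        /* With a queryDesc, additionally check the planned Gather node. */
        if (queryDesc)
            return gather_checks_pass(queryDesc);

        return true;
    }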

+           if (!(ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+               plannedstmt->parallelModeNeeded &&
+               plannedstmt->planTree &&
+               IsA(plannedstmt->planTree, Gather) &&
+               plannedstmt->planTree->lefttree &&
+               plannedstmt->planTree->lefttree->parallel_aware &&
+               plannedstmt->planTree->lefttree->parallel_safe))

The composite condition is negated. Maybe you can write without negation:

+           return (ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+               plannedstmt->parallelModeNeeded &&
+               plannedstmt->planTree &&
+               IsA(plannedstmt->planTree, Gather) &&
+               plannedstmt->planTree->lefttree &&
+               plannedstmt->planTree->lefttree->parallel_aware &&
+               plannedstmt->planTree->lefttree->parallel_safe);

+    * Write out the number of tuples this worker has inserted. Leader will use
+    * it to inform to the end client.

'inform to the end client' -> 'inform the end client' (without to)

Cheers


Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
Thanks for the comments.

On Mon, Dec 7, 2020 at 8:56 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
>
> +       (void) SetCurrentCommandIdUsedForWorker();
> +       myState->output_cid = GetCurrentCommandId(false);
>
> SetCurrentCommandIdUsedForWorker already has void as return type. The '(void)' is not needed.
>

Removed.

>
> +            * rd_createSubid is marked invalid, otherwise, the table is
> +            * not allowed to extend by the workers.
>
> nit: to extend by the workers -> to be extended by the workers
>

Changed.

>
> For IsParallelInsertInCTASAllowed, logic is inside 'if (IS_CTAS(into))' block.
> You can return false when (!IS_CTAS(into)) - this would save some indentation for the body.
>

Done.

>
> +       if (rel && rel->relpersistence != RELPERSISTENCE_TEMP)
> +           allowed = true;
>
> Similarly, when the above condition doesn't hold, you can return false
> directly - reducing the next if condition to 'if (queryDesc)'.
>

Done.

>
> The composite condition is negated. Maybe you can write without negation:
>

Done.

>
> +    * Write out the number of tuples this worker has inserted. Leader will use
> +    * it to inform to the end client.
>
> 'inform to the end client' -> 'inform the end client' (without to)
>

Changed.

Attaching v8 patch. Consider this for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
Hi

+    /*
+     * Flag to let the planner know that the SELECT query is for CTAS. This is
+     * used to calculate the tuple transfer cost from workers to gather node(in
+     * case parallelism kicks in for the SELECT part of the CTAS), to zero as
+     * each worker will insert its share of tuples in parallel.
+     */
+    if (IsParallelInsertInCTASAllowed(into, NULL))
+        query->isForCTAS = true;


+    /*
+     * We do not compute the parallel_tuple_cost for CTAS because the number of
+     * tuples that are transferred from workers to the gather node is zero as
+     * each worker, in parallel, inserts the tuples that are resulted from its
+     * chunk of plan execution. This change may make the parallel plan cheap
+     * among all other plans, and influence the planner to consider this
+     * parallel plan.
+     */
+    if (!(root->parse->isForCTAS &&
+        root->query_level == 1))
+        run_cost += parallel_tuple_cost * path->path.rows;

I noticed that the parallel_tuple_cost will still be ignored
when Gather is not the top node.

Example:
    Create table test(i int);
    insert into test values(generate_series(1,10000000,1));
    explain create table ntest3 as select * from test where i < 200 limit 10000;

                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Limit  (cost=1000.00..97331.33 rows=1000 width=4)
   ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
         Workers Planned: 2
         ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
               Filter: (i < 200)


isForCTAS will be true because of [create table as], and the
query_level is always 1 because there is no subquery.
So even if Gather is not the top node, the parallel cost will still be ignored.

Does that work as expected?

Best regards,
houzj






Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Mon, Dec 7, 2020 at 11:32 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> +       /*
> +        * Flag to let the planner know that the SELECT query is for CTAS. This is
> +        * used to calculate the tuple transfer cost from workers to gather node(in
> +        * case parallelism kicks in for the SELECT part of the CTAS), to zero as
> +        * each worker will insert its share of tuples in parallel.
> +        */
> +       if (IsParallelInsertInCTASAllowed(into, NULL))
> +               query->isForCTAS = true;
>
>
> +       /*
> +        * We do not compute the parallel_tuple_cost for CTAS because the number of
> +        * tuples that are transferred from workers to the gather node is zero as
> +        * each worker, in parallel, inserts the tuples that are resulted from its
> +        * chunk of plan execution. This change may make the parallel plan cheap
> +        * among all other plans, and influence the planner to consider this
> +        * parallel plan.
> +        */
> +       if (!(root->parse->isForCTAS &&
> +               root->query_level == 1))
> +               run_cost += parallel_tuple_cost * path->path.rows;
>
> I noticed that the parallel_tuple_cost will still be ignored
> when Gather is not the top node.
>
> Example:
>         Create table test(i int);
>         insert into test values(generate_series(1,10000000,1));
>         explain create table ntest3 as select * from test where i < 200 limit 10000;
>
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Limit  (cost=1000.00..97331.33 rows=1000 width=4)
>    ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
>          Workers Planned: 2
>          ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
>                Filter: (i < 200)
>
>
> isForCTAS will be true because of [create table as], and the
> query_level is always 1 because there is no subquery.
> So even if Gather is not the top node, the parallel cost will still be ignored.
>
> Does that work as expected?
>

I don't think that is expected, and it is not the case without this patch.
The cost shouldn't change for existing cases where the write is
not pushed to workers.

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Dec 7, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 11:32 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> >
> > Hi
> >
> > +       /*
> > +        * Flag to let the planner know that the SELECT query is for CTAS. This is
> > +        * used to calculate the tuple transfer cost from workers to gather node(in
> > +        * case parallelism kicks in for the SELECT part of the CTAS), to zero as
> > +        * each worker will insert its share of tuples in parallel.
> > +        */
> > +       if (IsParallelInsertInCTASAllowed(into, NULL))
> > +               query->isForCTAS = true;
> >
> >
> > +       /*
> > +        * We do not compute the parallel_tuple_cost for CTAS because the number of
> > +        * tuples that are transferred from workers to the gather node is zero as
> > +        * each worker, in parallel, inserts the tuples that are resulted from its
> > +        * chunk of plan execution. This change may make the parallel plan cheap
> > +        * among all other plans, and influence the planner to consider this
> > +        * parallel plan.
> > +        */
> > +       if (!(root->parse->isForCTAS &&
> > +               root->query_level == 1))
> > +               run_cost += parallel_tuple_cost * path->path.rows;
> >
> > I noticed that the parallel_tuple_cost will still be ignored
> > when Gather is not the top node.
> >
> > Example:
> >         Create table test(i int);
> >         insert into test values(generate_series(1,10000000,1));
> >         explain create table ntest3 as select * from test where i < 200 limit 10000;
> >
> >                                   QUERY PLAN
> > -------------------------------------------------------------------------------
> >  Limit  (cost=1000.00..97331.33 rows=1000 width=4)
> >    ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
> >          Workers Planned: 2
> >          ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
> >                Filter: (i < 200)
> >
> >
> > isForCTAS will be true because of [create table as], and the
> > query_level is always 1 because there is no subquery.
> > So even if Gather is not the top node, the parallel cost will still be ignored.
> >
> > Does that work as expected?
> >
>
> I don't think that is expected, and it is not the case without this patch.
> The cost shouldn't change for existing cases where the write is
> not pushed to workers.
>

Thanks for pointing that out. Yes, it should not change for the cases
where parallel inserts will not be picked later.

Any better suggestions on how to make the planner consider that the
CTAS might choose parallel inserts later, while at the same time avoiding the
above issue in case it doesn't?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Mon, Dec 7, 2020 at 3:44 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Dec 7, 2020 at 11:32 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > >
> > > +       if (!(root->parse->isForCTAS &&
> > > +               root->query_level == 1))
> > > +               run_cost += parallel_tuple_cost * path->path.rows;
> > >
> > > I noticed that the parallel_tuple_cost will still be ignored
> > > when Gather is not the top node.
> > >
> > > Example:
> > >         Create table test(i int);
> > >         insert into test values(generate_series(1,10000000,1));
> > >         explain create table ntest3 as select * from test where i < 200 limit 10000;
> > >
> > >                                   QUERY PLAN
> > > -------------------------------------------------------------------------------
> > >  Limit  (cost=1000.00..97331.33 rows=1000 width=4)
> > >    ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
> > >          Workers Planned: 2
> > >          ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
> > >                Filter: (i < 200)
> > >
> > >
> > > isForCTAS will be true because of [create table as], and the
> > > query_level is always 1 because there is no subquery.
> > > So even if Gather is not the top node, the parallel cost will still be ignored.
> > >
> > > Does that work as expected?
> > >
> >
> > I don't think that is expected, and it is not the case without this patch.
> > The cost shouldn't change for existing cases where the write is
> > not pushed to workers.
> >
>
> Thanks for pointing that out. Yes, it should not change for the cases
> where parallel inserts will not be picked later.
>
> Any better suggestions on how to make the planner consider that the
> CTAS might choose parallel inserts later, while at the same time avoiding the
> above issue in case it doesn't?
>

What is the need of checking query_level when 'isForCTAS' is set only
when Gather is a top-node?

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Dec 7, 2020 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> What is the need of checking query_level when 'isForCTAS' is set only
> when Gather is a top-node?
>

isForCTAS is getting set before pg_plan_query(), and it is used in
cost_gather(). We will not have a Gather node by then, and hence we
pass a NULL queryDesc to IsParallelInsertInCTASAllowed(into, NULL) while
setting isForCTAS to true. The intention of checking query_level == 1 in
cost_gather is to consider only the top-level query, not other
subqueries.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Mon, Dec 7, 2020 at 4:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > What is the need of checking query_level when 'isForCTAS' is set only
> > when Gather is a top-node?
> >
>
> isForCTAS is getting set before pg_plan_query(), and it is used in
> cost_gather(). We will not have a Gather node by then, and hence we
> pass a NULL queryDesc to IsParallelInsertInCTASAllowed(into, NULL) while
> setting isForCTAS to true.
>

IsParallelInsertInCTASAllowed() seems to be returning false if
queryDesc is NULL, so won't isForCTAS be always set to false? I think
I am missing something here.

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Dec 7, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 4:20 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Mon, Dec 7, 2020 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > What is the need of checking query_level when 'isForCTAS' is set only
> > > when Gather is a top-node?
> > >
> >
> > isForCTAS is getting set before pg_plan_query(), and it is used in
> > cost_gather(). We will not have a Gather node by then, and hence we
> > pass a NULL queryDesc to IsParallelInsertInCTASAllowed(into, NULL) while
> > setting isForCTAS to true.
> >
>
> IsParallelInsertInCTASAllowed() seems to be returning false if
> queryDesc is NULL, so won't isForCTAS be always set to false? I think
> I am missing something here.
>

My bad. I utterly missed this, sorry for the confusion.

My intention with IsParallelInsertInCTASAllowed() is twofold:
1. when called before planning without a queryDesc, it should
return true if IS_CTAS(into) is true and the target is not a temporary
table; 2. when called after planning with a non-null queryDesc, along
with the checks in 1), it should also perform the GatherState checks and
return accordingly.
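
In code form, the two modes would look roughly like this condensed sketch
(it omits some of the plan-tree checks quoted earlier in the thread):

    bool
    IsParallelInsertInCTASAllowed(IntoClause *into, QueryDesc *queryDesc)
    {
        if (!IS_CTAS(into))
            return false;

        /* Mode 1: pre-planning checks; workers cannot use temp tables. */
        if (into->rel && into->rel->relpersistence == RELPERSISTENCE_TEMP)
            return false;

        /* Called before planning: mode-1 checks are all we can do. */
        if (queryDesc == NULL)
            return true;

        /* Mode 2: after planning, also require a plain Gather on top. */
        return (queryDesc->planstate &&
                IsA(queryDesc->planstate, GatherState) &&
                !queryDesc->planstate->ps_ProjInfo);
    }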

I have corrected it in v9 patch. Please have a look.

>
> > > isForCTAS will be true because of [create table as], and the
> > > query_level is always 1 because there is no subquery.
> > > So even if Gather is not the top node, the parallel cost will still be ignored.
> > >
> > > Does that work as expected?
> > >
> >
> > I don't think that is expected, and it is not the case without this patch.
> > The cost shouldn't change for existing cases where the write is
> > not pushed to workers.
> >
>
> Thanks for pointing that out. Yes, it should not change for the cases
> where parallel inserts will not be picked later.
>
> Any better suggestions on how to make the planner consider that the
> CTAS might choose parallel inserts later, while at the same time avoiding the
> above issue in case it doesn't?
>

I'm not quite sure how to address this. Can we simply not let the planner
know that the select is for CTAS, and perform the Gather node and other
checks only after the planning is done? This is simple
to do, but we might miss some parallel plans for the SELECTs, because
the planner would have already wrongly accounted the tuple transfer cost
from workers to Gather, because of which such a parallel plan would
have become costlier compared to non-parallel plans. IMO, we can do
this since it also keeps the existing behaviour of the planner, i.e.
when the planner is planning for SELECTs it doesn't know that it is
doing it for CTAS. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Mon, Dec 7, 2020 at 7:04 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Dec 7, 2020 at 4:20 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Mon, Dec 7, 2020 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > What is the need of checking query_level when 'isForCTAS' is set only
> > > > when Gather is a top-node?
> > > >
> > >
> > > isForCTAS is getting set before pg_plan_query(), and it is used in
> > > cost_gather(). We will not have a Gather node by then, and hence we
> > > pass a NULL queryDesc to IsParallelInsertInCTASAllowed(into, NULL) while
> > > setting isForCTAS to true.
> > >
> >
> > IsParallelInsertInCTASAllowed() seems to be returning false if
> > queryDesc is NULL, so won't isForCTAS be always set to false? I think
> > I am missing something here.
> >
>
> My bad. I utterly missed this, sorry for the confusion.
>
> My intention with IsParallelInsertInCTASAllowed() is twofold:
> 1. when called before planning without a queryDesc, it should
> return true if IS_CTAS(into) is true and the target is not a temporary
> table; 2. when called after planning with a non-null queryDesc, along
> with the checks in 1), it should also perform the GatherState checks and
> return accordingly.
>
> I have corrected it in v9 patch. Please have a look.
>
> >
> > > > isForCTAS will be true because of [create table as], and the
> > > > query_level is always 1 because there is no subquery.
> > > > So even if Gather is not the top node, the parallel cost will still be ignored.
> > > >
> > > > Does that work as expected?
> > > >
> > >
> > > I don't think that is expected, and it is not the case without this patch.
> > > The cost shouldn't change for existing cases where the write is
> > > not pushed to workers.
> > >
> >
> > Thanks for pointing that out. Yes, it should not change for the cases
> > where parallel inserts will not be picked later.
> >
> > Any better suggestions on how to make the planner consider that the
> > CTAS might choose parallel inserts later, while at the same time avoiding the
> > above issue in case it doesn't?
> >
>
> I'm not quite sure how to address this. Can we simply not let the planner
> know that the select is for CTAS, and perform the Gather node and other
> checks only after the planning is done? This is simple
> to do, but we might miss some parallel plans for the SELECTs, because
> the planner would have already wrongly accounted the tuple transfer cost
> from workers to Gather, because of which such a parallel plan would
> have become costlier compared to non-parallel plans. IMO, we can do
> this since it also keeps the existing behaviour of the planner, i.e.
> when the planner is planning for SELECTs it doesn't know that it is
> doing it for CTAS. Thoughts?
>

I have done some initial review and I have a few comments.

@@ -328,6 +316,15 @@ ExecCreateTableAs(ParseState *pstate,
CreateTableAsStmt *stmt,
        query = linitial_node(Query, rewritten);
        Assert(query->commandType == CMD_SELECT);

+       /*
+        * Flag to let the planner know that the SELECT query is for CTAS. This
+        * is used to calculate the tuple transfer cost from workers to gather
+        * node(in case parallelism kicks in for the SELECT part of the CTAS),
+        * to zero as each worker will insert its share of tuples in parallel.
+        */
+       if (IsParallelInsertInCTASAllowed(into, NULL))
+           query->isForCTAS = true;
+
        /* plan the query */
        plan = pg_plan_query(query, pstate->p_sourcetext,
                             CURSOR_OPT_PARALLEL_OK, params);
@@ -350,6 +347,15 @@ ExecCreateTableAs(ParseState *pstate,
CreateTableAsStmt *stmt,
        /* call ExecutorStart to prepare the plan for execution */
        ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+       /*
+        * If SELECT part of the CTAS is parallelizable, then make each
+        * parallel worker insert the tuples that are resulted in its execution
+        * into the target table. We need plan state to be initialized by the
+        * executor to decide whether to allow parallel inserts or not.
+       */
+       if (IsParallelInsertInCTASAllowed(into, queryDesc))
+           SetCTASParallelInsertState(queryDesc);


Once we have called IsParallelInsertInCTASAllowed and set the
query->isForCTAS flag, why are we calling this again?

---

+    */
+   if (!(root->parse->isForCTAS &&
+       root->query_level == 1))
+       run_cost += parallel_tuple_cost * path->path.rows;

From this check, it appears that a lower-level Gather will also get
influenced by this; consider this:

-> NLJ
  -> Gather
     -> Parallel Seq Scan
  -> Index Scan

This condition only checks that it is a top-level query
and that it is under CTAS, so this will impact all the Gather nodes,
as shown in the above example.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Mon, Dec 7, 2020 at 7:04 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I'm not quite sure how to address this. Can we simply not let the planner
> know that the select is for CTAS, and perform the Gather node and other
> checks only after the planning is done?
>

IIUC, you are saying that we should not influence the cost of gather
node even when the insertion would be done by workers? I think that
should be our fallback option anyway but that might miss some paths to
be considered parallel where the cost becomes more due to
parallel_tuple_cost (aka tuple transfer cost). I think the idea is we
can avoid the tuple transfer cost only when Gather is the top node
because only at that time we can push insertion down, right? How about
if we have some way to detect the same before calling
generate_useful_gather_paths()? I think when we are calling
apply_scanjoin_target_to_paths() in grouping_planner(), if the
query_level is 1, it is for CTAS, and it doesn't have a chance to
create UPPER_REL (doesn't have grouping, order, limit, etc clause)
then we can probably assume that the Gather will be top_node. I am not
sure about this but I think it is worth exploring.

-- 
With Regards,
Amit Kapila.



RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
> > I'm not quite sure how to address this. Can we simply not let the planner
> > know that the select is for CTAS, and perform the Gather node and other
> > checks only after the planning is done?
> >
> 
> IIUC, you are saying that we should not influence the cost of gather node
> even when the insertion would be done by workers? I think that should be
> our fallback option anyway but that might miss some paths to be considered
> parallel where the cost becomes more due to parallel_tuple_cost (aka tuple
> transfer cost). I think the idea is we can avoid the tuple transfer cost
> only when Gather is the top node because only at that time we can push
> insertion down, right? How about if we have some way to detect the same
> before calling generate_useful_gather_paths()? I think when we are calling
> apply_scanjoin_target_to_paths() in grouping_planner(), if the
> query_level is 1, it is for CTAS, and it doesn't have a chance to create
> UPPER_REL (doesn't have grouping, order, limit, etc clause) then we can
> probably assume that the Gather will be top_node. I am not sure about this
> but I think it is worth exploring.
> 

I took a look at the parallel insert patch and have the same idea.
https://commitfest.postgresql.org/31/2844/

     * Consider generating Gather or Gather Merge paths.  We must only do this
     * if the relation is parallel safe, and we don't do it for child rels to
     * avoid creating multiple Gather nodes within the same plan. We must do
     * this after all paths have been generated and before set_cheapest, since
     * one of the generated paths may turn out to be the cheapest one.
     */
    if (rel->consider_parallel && !IS_OTHER_REL(rel))
        generate_useful_gather_paths(root, rel, false);

IMO the GatherPath created here seems to be the right one that can possibly ignore the parallel cost in the CTAS case.
But we need to check the following parse options, each of which creates a path that becomes the parent of the GatherPath here (combined in the sketch after the list):

if (root->parse->rowMarks)
if (limit_needed(root->parse))
if (root->parse->sortClause)
if (root->parse->distinctClause)
if (root->parse->hasWindowFuncs)
if (root->parse->groupClause || root->parse->groupingSets || root->parse->hasAggs || root->hasHavingQual)
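
For illustration, those checks could be folded into one helper, roughly
like this (a sketch; the helper name is made up):

    static bool
    ctas_gather_will_be_top_node(PlannerInfo *root)
    {
        Query      *parse = root->parse;

        /* Only a plain top-level query keeps Gather as the top node. */
        return (root->query_level == 1 &&
                parse->rowMarks == NIL &&
                !limit_needed(parse) &&
                parse->sortClause == NIL &&
                parse->distinctClause == NIL &&
                !parse->hasWindowFuncs &&
                parse->groupClause == NIL &&
                parse->groupingSets == NIL &&
                !parse->hasAggs &&
                !root->hasHavingQual);
    }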

Best regards,
houzj



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > I'm not quite sure how to address this. Can we simply not let the planner
> > > know that the select is for CTAS, and perform the Gather node and other
> > > checks only after the planning is done?
> > >
> >
> > IIUC, you are saying that we should not influence the cost of gather node
> > even when the insertion would be done by workers? I think that should be
> > our fallback option anyway but that might miss some paths to be considered
> > parallel where the cost becomes more due to parallel_tuple_cost (aka tuple
> > transfer cost). I think the idea is we can avoid the tuple transfer cost
> > only when Gather is the top node because only at that time we can push
> > insertion down, right? How about if we have some way to detect the same
> > before calling generate_useful_gather_paths()? I think when we are calling
> > apply_scanjoin_target_to_paths() in grouping_planner(), if the
> > query_level is 1, it is for CTAS, and it doesn't have a chance to create
> > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we can
> > probably assume that the Gather will be top_node. I am not sure about this
> > but I think it is worth exploring.
> >
>
> I took a look at the parallel insert patch and have the same idea.
> https://commitfest.postgresql.org/31/2844/
>
>          * Consider generating Gather or Gather Merge paths.  We must only do this
>          * if the relation is parallel safe, and we don't do it for child rels to
>          * avoid creating multiple Gather nodes within the same plan. We must do
>          * this after all paths have been generated and before set_cheapest, since
>          * one of the generated paths may turn out to be the cheapest one.
>          */
>         if (rel->consider_parallel && !IS_OTHER_REL(rel))
>                 generate_useful_gather_paths(root, rel, false);
>
> IMO the GatherPath created here seems to be the right one that can possibly ignore the parallel cost in the CTAS case.
> But we need to check the following parse options, each of which creates a path that becomes the parent of the GatherPath here.
>
> if (root->parse->rowMarks)
> if (limit_needed(root->parse))
> if (root->parse->sortClause)
> if (root->parse->distinctClause)
> if (root->parse->hasWindowFuncs)
> if (root->parse->groupClause || root->parse->groupingSets || root->parse->hasAggs || root->hasHavingQual)
>

Thanks Amit and Hou. I will look into these areas and get back soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Tue, Dec 8, 2020 at 6:36 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> >
> > > > I'm not quite sure how to address this. Can we simply not let the planner
> > > > know that the select is for CTAS, and perform the Gather node and other
> > > > checks only after the planning is done?
> > > >
> > >
> > > IIUC, you are saying that we should not influence the cost of gather node
> > > even when the insertion would be done by workers? I think that should be
> > > our fallback option anyway but that might miss some paths to be considered
> > > parallel where the cost becomes more due to parallel_tuple_cost (aka tuple
> > > transfer cost). I think the idea is we can avoid the tuple transfer cost
> > > only when Gather is the top node because only at that time we can push
> > > insertion down, right? How about if we have some way to detect the same
> > > before calling generate_useful_gather_paths()? I think when we are calling
> > > apply_scanjoin_target_to_paths() in grouping_planner(), if the
> > > query_level is 1, it is for CTAS, and it doesn't have a chance to create
> > > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we can
> > > probably assume that the Gather will be top_node. I am not sure about this
> > > but I think it is worth exploring.
> > >
> >
> > I took a look at the parallel insert patch and have the same idea.
> > https://commitfest.postgresql.org/31/2844/
> >
> >          * Consider generating Gather or Gather Merge paths.  We must only do this
> >          * if the relation is parallel safe, and we don't do it for child rels to
> >          * avoid creating multiple Gather nodes within the same plan. We must do
> >          * this after all paths have been generated and before set_cheapest, since
> >          * one of the generated paths may turn out to be the cheapest one.
> >          */
> >         if (rel->consider_parallel && !IS_OTHER_REL(rel))
> >                 generate_useful_gather_paths(root, rel, false);
> >
> > IMO the GatherPath created here seems to be the right one that can possibly ignore the parallel cost in the CTAS case.
> > But we need to check the following parse options, each of which creates a path that becomes the parent of the GatherPath here.
> >
> > if (root->parse->rowMarks)
> > if (limit_needed(root->parse))
> > if (root->parse->sortClause)
> > if (root->parse->distinctClause)
> > if (root->parse->hasWindowFuncs)
> > if (root->parse->groupClause || root->parse->groupingSets || root->parse->hasAggs || root->hasHavingQual)
> >
>
> Thanks Amit and Hou. I will look into these areas and get back soon.
>

It might be better to split the patch such that in the base
patch we don't consider anything special for gather costing w.r.t.
CTAS, and in the next patch we consider all the checks discussed
above.

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > I'm not quite sure how to address this. Can we simply not let the planner
> > > know that the select is for CTAS, and perform the Gather node and other
> > > checks only after the planning is done?
> > >
> >
> > IIUC, you are saying that we should not influence the cost of gather node
> > even when the insertion would be done by workers? I think that should be
> > our fallback option anyway but that might miss some paths to be considered
> > parallel where the cost becomes more due to parallel_tuple_cost (aka tuple
> > transfer cost). I think the idea is we can avoid the tuple transfer cost
> > only when Gather is the top node because only at that time we can push
> > insertion down, right? How about if we have some way to detect the same
> > before calling generate_useful_gather_paths()? I think when we are calling
> > apply_scanjoin_target_to_paths() in grouping_planner(), if the
> > query_level is 1, it is for CTAS, and it doesn't have a chance to create
> > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we can
> > probably assume that the Gather will be top_node. I am not sure about this
> > but I think it is worth exploring.
> >
>
> I took a look at the parallel insert patch and have the same idea.
> https://commitfest.postgresql.org/31/2844/
>
>          * Consider generating Gather or Gather Merge paths.  We must only do this
>          * if the relation is parallel safe, and we don't do it for child rels to
>          * avoid creating multiple Gather nodes within the same plan. We must do
>          * this after all paths have been generated and before set_cheapest, since
>          * one of the generated paths may turn out to be the cheapest one.
>          */
>         if (rel->consider_parallel && !IS_OTHER_REL(rel))
>                 generate_useful_gather_paths(root, rel, false);
>
> IMO the GatherPath created here seems to be the right one that can possibly ignore the parallel cost in the CTAS case.
> But we need to check the following parse options, each of which creates a path that becomes the parent of the GatherPath here.
>
> if (root->parse->rowMarks)
> if (limit_needed(root->parse))
> if (root->parse->sortClause)
> if (root->parse->distinctClause)
> if (root->parse->hasWindowFuncs)
> if (root->parse->groupClause || root->parse->groupingSets || root->parse->hasAggs || root->hasHavingQual)
>

Yeah, and as I pointed out earlier, along with this we also need to
consider that the RelOptInfo must be the final target (top-level rel).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Dec 9, 2020 at 10:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> >
> > > > I'm not quite sure how to address this. Can we simply not let the planner
> > > > know that the select is for CTAS, and perform the Gather node and other
> > > > checks only after the planning is done?
> > > >
> > >
> > > IIUC, you are saying that we should not influence the cost of gather node
> > > even when the insertion would be done by workers? I think that should be
> > > our fallback option anyway but that might miss some paths to be considered
> > > parallel where the cost becomes more due to parallel_tuple_cost (aka tuple
> > > transfer cost). I think the idea is we can avoid the tuple transfer cost
> > > only when Gather is the top node because only at that time we can push
> > > insertion down, right? How about if we have some way to detect the same
> > > before calling generate_useful_gather_paths()? I think when we are calling
> > > apply_scanjoin_target_to_paths() in grouping_planner(), if the
> > > query_level is 1, it is for CTAS, and it doesn't have a chance to create
> > > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we can
> > > probably assume that the Gather will be top_node. I am not sure about this
> > > but I think it is worth exploring.
> > >
> >
> > I took a look at the parallel insert patch and have the same idea.
> > https://commitfest.postgresql.org/31/2844/
> >
> >          * Consider generating Gather or Gather Merge paths.  We must only do this
> >          * if the relation is parallel safe, and we don't do it for child rels to
> >          * avoid creating multiple Gather nodes within the same plan. We must do
> >          * this after all paths have been generated and before set_cheapest, since
> >          * one of the generated paths may turn out to be the cheapest one.
> >          */
> >         if (rel->consider_parallel && !IS_OTHER_REL(rel))
> >                 generate_useful_gather_paths(root, rel, false);
> >
> > IMO the Gather path created here seems the right one, which can possibly ignore the parallel cost in the CTAS case.
> > But we need to check the following parse options, each of which would create a path above the Gather path here:
> >
> > if (root->parse->rowMarks)
> > if (limit_needed(root->parse))
> > if (root->parse->sortClause)
> > if (root->parse->distinctClause)
> > if (root->parse->hasWindowFuncs)
> > if (root->parse->groupClause || root->parse->groupingSets || root->parse->hasAggs || root->hasHavingQual)
> >
>
> Yeah, and as I pointed earlier, along with this we also need to
> consider that the RelOptInfo must be the final target(top level rel).
>

Attaching the v10 patch set that includes the change suggested above for
ignoring the parallel tuple cost, and also a few more test cases. I split
the patch as per Amit's suggestion: v10-0001 contains the parallel inserts
code, without the planner tuple cost changes, plus test cases; v10-0002 has
the changes required for ignoring the planner's tuple cost calculations.

Please review it further.

After the review and addressing all the comments, I plan to make some
code common so that it can be used for Parallel Inserts in REFRESH
MATERIALIZED VIEW. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Zhihong Yu
Date:
Hi,

+           if (!OidIsValid(col->collOid) &&
+               type_is_collatable(col->typeName->typeOid))
+               ereport(ERROR,
...
+           attrList = lappend(attrList, col);

Should attrList be freed when ereport is called?

+       query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;

Since CTAS_PARALLEL_INS_UNDEF is 0, isn't the above equivalent to assigning the value 0?

Cheers


Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, Dec 10, 2020 at 7:48 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> +           if (!OidIsValid(col->collOid) &&
> +               type_is_collatable(col->typeName->typeOid))
> +               ereport(ERROR,
> ...
> +           attrList = lappend(attrList, col);
>
> Should attrList be freed when ereport is called ?
>

I think that's not necessary, since we are going to throw an error
anyway: ereport(ERROR) unwinds the stack, and the list's memory is
reclaimed when the containing memory context is reset during error
cleanup. Also, this is not new code added as part of this feature; it
is existing code adjusted for parallel inserts. Looking further in the
code base, there are many places where we don't free up lists before
throwing errors:

            errmsg("column privileges are only valid for relations")));
            errmsg("check constraint \"%s\" already exists",
            errmsg("name or argument lists may not contain nulls")));
            elog(ERROR, "no tlist entry for key %d", keyresno);

> +       query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;
>
> Since CTAS_PARALLEL_INS_UNDEF is 0, isn't the above equivalent to assigning the value of 0 ?
>

Yeah, both are equivalent. For now I will keep it that way; I will
change it in the next version of the patch.
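
For reference, since CTAS_PARALLEL_INS_UNDEF is defined as 0, the two forms below behave identically; the direct assignment just states the intent more plainly:

query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;  /* x & 0 is always 0 */
query->CTASParallelInsInfo = CTAS_PARALLEL_INS_UNDEF;   /* equivalent, clearer */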

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
Hi

+        allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+                plannedstmt->parallelModeNeeded &&
+                plannedstmt->planTree &&
+                IsA(plannedstmt->planTree, Gather) &&
+                plannedstmt->planTree->lefttree &&
+                plannedstmt->planTree->lefttree->parallel_aware &&
+                plannedstmt->planTree->lefttree->parallel_safe;

I noticed it checks both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false?

I did some tests but did not find such a case.


Best regards,
houzj





Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Thu, Dec 10, 2020 at 3:59 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> +               allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
> +                               plannedstmt->parallelModeNeeded &&
> +                               plannedstmt->planTree &&
> +                               IsA(plannedstmt->planTree, Gather) &&
> +                               plannedstmt->planTree->lefttree &&
> +                               plannedstmt->planTree->lefttree->parallel_aware &&
> +                               plannedstmt->planTree->lefttree->parallel_safe;
>
> I noticed it check both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
> Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false ?
>
> I did some test but did not find a case like that.
>

This seems like an extra check.  Apart from that if we combine 0001
and 0002 there should be an additional protection so that it should
not happen that in cost_gather we have ignored the parallel tuple cost
and now we are rejecting the parallel insert. Probably we should add
an assert.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, Dec 10, 2020 at 4:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > +               allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
> > +                               plannedstmt->parallelModeNeeded &&
> > +                               plannedstmt->planTree &&
> > +                               IsA(plannedstmt->planTree, Gather) &&
> > +                               plannedstmt->planTree->lefttree &&
> > +                               plannedstmt->planTree->lefttree->parallel_aware &&
> > +                               plannedstmt->planTree->lefttree->parallel_safe;
> >
> > I noticed it check both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
> > Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false ?
> >
> > I did some test but did not find a case like that.
> >
>
> This seems like an extra check.  Apart from that if we combine 0001
> and 0002 there should be an additional protection so that it should
> not happen that in cost_gather we have ignored the parallel tuple cost
> and now we are rejecting the parallel insert. Probably we should add
> an assert.

Yeah it's an extra check. I don't think we need that extra check IsA(plannedstmt->planTree, Gather). GatherState check is enough. I verified it as follows: the gatherstate will be allocated and initialized with the plan tree in ExecInitGather which are the ones we are checking here. So, there is no chance that the plan state is GatherState and the plan tree will not be Gather.  I will remove IsA(plannedstmt->planTree, Gather) check in the next version of the patch set.

Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>, estate=0x1ca8, eflags=730035099) at nodeGather.c:61
(gdb) p gatherstate
$10 = (GatherState *) 0x5647fac83850
(gdb) p gatherstate->ps.plan
$11 = (Plan *) 0x5647fac918a0

Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580, queryDesc=0x5647fac835e0) at createas.c:663
663     {
(gdb) p ps
$13 = (PlanState *) 0x5647fac83850
(gdb) p ps->plan
$14 = (Plan *) 0x5647fac918a0

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Thu, Dec 10, 2020 at 5:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 4:49 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > +               allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
> > > +                               plannedstmt->parallelModeNeeded &&
> > > +                               plannedstmt->planTree &&
> > > +                               IsA(plannedstmt->planTree, Gather) &&
> > > +                               plannedstmt->planTree->lefttree &&
> > > +                               plannedstmt->planTree->lefttree->parallel_aware &&
> > > +                               plannedstmt->planTree->lefttree->parallel_safe;
> > >
> > > I noticed it check both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
> > > Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false ?
> > >
> > > I did some test but did not find a case like that.
> > >
> >
> > This seems like an extra check.  Apart from that if we combine 0001
> > and 0002 there should be an additional protection so that it should
> > not happen that in cost_gather we have ignored the parallel tuple cost
> > and now we are rejecting the parallel insert. Probably we should add
> > an assert.
>
> Yeah it's an extra check. I don't think we need that extra check
> IsA(plannedstmt->planTree, Gather). GatherState check is enough. I verified
> it as follows: the gatherstate will be allocated and initialized with the
> plan tree in ExecInitGather which are the ones we are checking here. So,
> there is no chance that the plan state is GatherState and the plan tree
> will not be Gather. I will remove IsA(plannedstmt->planTree, Gather) check
> in the next version of the patch set.
>
> Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>, estate=0x1ca8, eflags=730035099) at nodeGather.c:61
> (gdb) p gatherstate
> $10 = (GatherState *) 0x5647fac83850
> (gdb) p gatherstate->ps.plan
> $11 = (Plan *) 0x5647fac918a0
>
> Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580, queryDesc=0x5647fac835e0) at createas.c:663
> 663     {
> (gdb) p ps
> $13 = (PlanState *) 0x5647fac83850
> (gdb) p ps->plan
> $14 = (Plan *) 0x5647fac918a0
>

Hope you did not miss the second part of my comment
"
> Apart from that if we combine 0001
> and 0002 there should be additional protection so that it should
> not happen that in cost_gather we have ignored the parallel tuple cost
> and now we are rejecting the parallel insert. Probably we should add
> an assert.
"

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, Dec 10, 2020 at 5:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > +               allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
> > > > +                               plannedstmt->parallelModeNeeded &&
> > > > +                               plannedstmt->planTree &&
> > > > +                               IsA(plannedstmt->planTree, Gather) &&
> > > > +                               plannedstmt->planTree->lefttree &&
> > > > +                               plannedstmt->planTree->lefttree->parallel_aware &&
> > > > +                               plannedstmt->planTree->lefttree->parallel_safe;
> > > >
> > > > I noticed it check both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
> > > > Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false ?
> > > >
> > > > I did some test but did not find a case like that.
> > > >
> > > This seems like an extra check.  Apart from that if we combine 0001
> > > and 0002 there should be an additional protection so that it should
> > > not happen that in cost_gather we have ignored the parallel tuple cost
> > > and now we are rejecting the parallel insert. Probably we should add
> > > an assert.
> >
> > Yeah it's an extra check. I don't think we need that extra check
> > IsA(plannedstmt->planTree, Gather). GatherState check is enough. I verified
> > it as follows: the gatherstate will be allocated and initialized with the
> > plan tree in ExecInitGather which are the ones we are checking here. So,
> > there is no chance that the plan state is GatherState and the plan tree
> > will not be Gather. I will remove IsA(plannedstmt->planTree, Gather) check
> > in the next version of the patch set.
> >
> > Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>, estate=0x1ca8, eflags=730035099) at nodeGather.c:61
> > (gdb) p gatherstate
> > $10 = (GatherState *) 0x5647fac83850
> > (gdb) p gatherstate->ps.plan
> > $11 = (Plan *) 0x5647fac918a0
> >
> > Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580, queryDesc=0x5647fac835e0) at createas.c:663
> > 663     {
> > (gdb) p ps
> > $13 = (PlanState *) 0x5647fac83850
> > (gdb) p ps->plan
> > $14 = (Plan *) 0x5647fac918a0
> >
> Hope you did not miss the second part of my comment
> "
> > Apart from that if we combine 0001
> > and 0002 there should be additional protection so that it should
> > not happen that in cost_gather we have ignored the parallel tuple cost
> > and now we are rejecting the parallel insert. Probably we should add
> > an assert.
> "

IIUC, we need to set a flag in cost_gather() (in the 0002 patch) whenever we
ignore the parallel tuple cost, and while checking whether to allow or
disallow parallel inserts in IsParallelInsertInCTASAllowed(), we need to add
an assert, something like Assert(cost_ignored_in_cost_gather == allow), just
before the final return allow;

This assertion fails 1) if we have not ignored the cost but are allowing
parallel inserts, or 2) if we ignored the cost but are not allowing
parallel inserts.

Case 1) seems fine: we can go ahead and perform parallel inserts. Case 2)
is the concern that the planner would have wrongly chosen the parallel
plan; but in that case too, isn't it better to go ahead with the
parallel plan instead of failing the query?

+        /*
+         * We allow parallel inserts by the workers only if the Gather node
+         * has no projections to perform and if the upper node is Gather. In
+         * case the Gather node has projections, which is possible if there
+         * are any subplans in the query, the workers cannot do those
+         * projections. And when the upper node is GatherMerge, the leader
+         * has to perform the final phase, i.e. merge the results from the
+         * workers.
+         */
+        allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+                plannedstmt->parallelModeNeeded &&
+                plannedstmt->planTree &&
+                plannedstmt->planTree->lefttree &&
+                plannedstmt->planTree->lefttree->parallel_aware &&
+                plannedstmt->planTree->lefttree->parallel_safe;
+
+        return allow;
+    }
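
To make the intended cross-check concrete, here is a rough sketch; the flag name and the way it is read back are assumptions for this discussion, not the actual patch:

/* In cost_gather(), when the tuple transfer cost is ignored for CTAS
 * (flag name assumed for illustration): */
root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;

/* In IsParallelInsertInCTASAllowed(), just before the final return,
 * cross-check the planner's decision against the executor-side one: */
tuple_cost_ignored =
    (query->CTASParallelInsInfo & CTAS_PARALLEL_INS_TUP_COST_IGNORED) != 0;
Assert(tuple_cost_ignored == allow);
return allow;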

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, Dec 10, 2020 at 7:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Thu, Dec 10, 2020 at 5:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > +               allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
> > > > > +                               plannedstmt->parallelModeNeeded &&
> > > > > +                               plannedstmt->planTree &&
> > > > > +                               IsA(plannedstmt->planTree, Gather) &&
> > > > > +                               plannedstmt->planTree->lefttree &&
> > > > > +                               plannedstmt->planTree->lefttree->parallel_aware &&
> > > > > +                               plannedstmt->planTree->lefttree->parallel_safe;
> > > > >
> > > > > I noticed it check both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
> > > > > Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false?
> > > > >
> > > > > I did some test but did not find a case like that.
> > > > >
> > > > This seems like an extra check.  Apart from that if we combine 0001
> > > > and 0002 there should be an additional protection so that it should
> > > > not happen that in cost_gather we have ignored the parallel tuple cost
> > > > and now we are rejecting the parallel insert. Probably we should add
> > > > an assert.
> > >
> > > Yeah it's an extra check. I don't think we need that extra check
> > > IsA(plannedstmt->planTree, Gather). GatherState check is enough. I verified
> > > it as follows: the gatherstate will be allocated and initialized with the
> > > plan tree in ExecInitGather which are the ones we are checking here. So,
> > > there is no chance that the plan state is GatherState and the plan tree
> > > will not be Gather. I will remove IsA(plannedstmt->planTree, Gather) check
> > > in the next version of the patch set.
> > >
> > > Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>, estate=0x1ca8, eflags=730035099) at nodeGather.c:61
> > > (gdb) p gatherstate
> > > $10 = (GatherState *) 0x5647fac83850
> > > (gdb) p gatherstate->ps.plan
> > > $11 = (Plan *) 0x5647fac918a0
> > >
> > > Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580, queryDesc=0x5647fac835e0) at createas.c:663
> > > 663     {
> > > (gdb) p ps
> > > $13 = (PlanState *) 0x5647fac83850
> > > (gdb) p ps->plan
> > > $14 = (Plan *) 0x5647fac918a0
> > >
> > Hope you did not miss the second part of my comment
> > "
> > > Apart from that if we combine 0001
> > > and 0002 there should be additional protection so that it should
> > > not happen that in cost_gather we have ignored the parallel tuple cost
> > > and now we are rejecting the parallel insert. Probably we should add
> > > an assert.
> > "
>
> IIUC, we need to set a flag in cost_gather(in 0002 patch) whenever we
> ignore the parallel tuple cost and while checking to allow or disallow
> parallel inserts in IsParallelInsertInCTASAllowed(), we need to add an
> assert something like Assert(cost_ignored_in_cost_gather == allow)
> before return allow;
>
> This assertion fails 1) either if we have not ignored the cost but
> allowing parallel inserts 2) or we ignored the cost but not allowing
> parallel inserts.
>
> 1) seems to be fine, we can go ahead and perform parallel inserts. 2)
> is the concern that the planner would have wrongly chosen the parallel
> plan, but in this case also isn't it better to go ahead with the
> parallel plan instead of failing the query?
>
> +        /*
> +         * We allow parallel inserts by the workers only if the Gather node
> +         * has no projections to perform and if the upper node is Gather. In
> +         * case the Gather node has projections, which is possible if there
> +         * are any subplans in the query, the workers cannot do those
> +         * projections. And when the upper node is GatherMerge, the leader
> +         * has to perform the final phase, i.e. merge the results from the
> +         * workers.
> +         */
> +        allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
> +                plannedstmt->parallelModeNeeded &&
> +                plannedstmt->planTree &&
> +                plannedstmt->planTree->lefttree &&
> +                plannedstmt->planTree->lefttree->parallel_aware &&
> +                plannedstmt->planTree->lefttree->parallel_safe;
> +
> +        return allow;
> +    }

I added the assertion to the 0002 patch so that it fails when the
planner ignores the parallel tuple cost and may choose a parallel plan,
but later we don't allow parallel inserts. make check and make
check-world pass without any assertion failures.

Attaching v11 patch set. Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
Hi

> Attaching v11 patch set. Please review it further.

Currently with the patch, we can allow parallel CTAS only when the top node is Gather.
When the top node is Append and Gather is a sub-node of Append, I think we can still enable
parallel CTAS by pushing it down to the sub-node Gather, such as:

Append
------>Gather
--------->Create table
------------->Seqscan
------>Gather
--------->create table
------------->Seqscan

And the use case seems common to me, such as:
select * from A where xxx union all select * from B where xxx;

I attach a WIP patch which just shows the possibility of this feature.
The patch is based on the latest v11 patch.

What do you think?

Best regards,
houzj






Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Mon, Dec 14, 2020 at 4:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 7:20 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > On Thu, Dec 10, 2020 at 5:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > +               allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
> > > > > > +                               plannedstmt->parallelModeNeeded &&
> > > > > > +                               plannedstmt->planTree &&
> > > > > > +                               IsA(plannedstmt->planTree, Gather) &&
> > > > > > +                               plannedstmt->planTree->lefttree &&
> > > > > > +                               plannedstmt->planTree->lefttree->parallel_aware &&
> > > > > > +                               plannedstmt->planTree->lefttree->parallel_safe;
> > > > > >
> > > > > > I noticed it check both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
> > > > > > Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false?
> > > > > >
> > > > > > I did some test but did not find a case like that.
> > > > > >
> > > > > This seems like an extra check.  Apart from that if we combine 0001
> > > > > and 0002 there should be an additional protection so that it should
> > > > > not happen that in cost_gather we have ignored the parallel tuple cost
> > > > > and now we are rejecting the parallel insert. Probably we should add
> > > > > an assert.
> > > >
> > > > Yeah it's an extra check. I don't think we need that extra check
> > > > IsA(plannedstmt->planTree, Gather). GatherState check is enough. I verified
> > > > it as follows: the gatherstate will be allocated and initialized with the
> > > > plan tree in ExecInitGather which are the ones we are checking here. So,
> > > > there is no chance that the plan state is GatherState and the plan tree
> > > > will not be Gather. I will remove IsA(plannedstmt->planTree, Gather) check
> > > > in the next version of the patch set.
> > > >
> > > > Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>, estate=0x1ca8, eflags=730035099) at nodeGather.c:61
> > > > (gdb) p gatherstate
> > > > $10 = (GatherState *) 0x5647fac83850
> > > > (gdb) p gatherstate->ps.plan
> > > > $11 = (Plan *) 0x5647fac918a0
> > > >
> > > > Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580, queryDesc=0x5647fac835e0) at createas.c:663
> > > > 663     {
> > > > (gdb) p ps
> > > > $13 = (PlanState *) 0x5647fac83850
> > > > (gdb) p ps->plan
> > > > $14 = (Plan *) 0x5647fac918a0
> > > >
> > > Hope you did not miss the second part of my comment
> > > "
> > > > Apart from that if we combine 0001
> > > > and 0002 there should be additional protection so that it should
> > > > not happen that in cost_gather we have ignored the parallel tuple cost
> > > > and now we are rejecting the parallel insert. Probably we should add
> > > > an assert.
> > > "
> >
> > IIUC, we need to set a flag in cost_gather(in 0002 patch) whenever we
> > ignore the parallel tuple cost and while checking to allow or disallow
> > parallel inserts in IsParallelInsertInCTASAllowed(), we need to add an
> > assert something like Assert(cost_ignored_in_cost_gather == allow)
> > before return allow;
> >
> > This assertion fails 1) either if we have not ignored the cost but
> > allowing parallel inserts 2) or we ignored the cost but not allowing
> > parallel inserts.
> >
> > 1) seems to be fine, we can go ahead and perform parallel inserts. 2)
> > is the concern that the planner would have wrongly chosen the parallel
> > plan, but in this case also isn't it better to go ahead with the
> > parallel plan instead of failing the query?
> >
> > +        /*
> > +         * We allow parallel inserts by the workers only if the Gather node has
> > +         * no projections to perform and if the upper node is Gather. In case,
> > +         * the Gather node has projections, which is possible if there are any
> > +         * subplans in the query, the workers can not do those projections. And
> > +         * when the upper node is GatherMerge, then the leader has to perform
> > +         * the final phase i.e. merge the results by workers.
> > +         */
> > +        allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
> > +                plannedstmt->parallelModeNeeded &&
> > +                plannedstmt->planTree &&
> > +                plannedstmt->planTree->lefttree &&
> > +                plannedstmt->planTree->lefttree->parallel_aware &&
> > +                plannedstmt->planTree->lefttree->parallel_safe;
> > +
> > +        return allow;
> > +    }
>
> I added the assertion into the 0002 patch so that it fails when the
> planner ignores parallel tuple cost and may choose parallel plan but
> later we don't allow parallel inserts. make check and make check-world
> pass without any assertion failures.
>
> Attaching v11 patch set. Please review it further.

I can see a lot of unrelated changes in 0002; or rather, you have done a
lot of code refactoring, especially in the createas.c file. If it is
intended refactoring, then please move the refactoring to a separate
patch so that the main patch stays readable.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Zhihong Yu
Date:
For set_append_rel_size(), it seems this is the difference between query_level != 1 and query_level == 1:

+                   (root->parent_root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&

Maybe extract the common conditions into their own expression/variable so that the code is easier to read.
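
As a rough illustration of that suggestion (the surrounding common conditions are not shown in the excerpt, so the variable names and shape below are assumptions):

/* Hypothetical refactoring: pick the PlannerInfo that owns the flag once,
 * then test a single variable in the otherwise-identical condition. */
Query *flag_owner = (root->query_level == 1) ? root->parse
                                             : root->parent_root->parse;
bool   ign_append_tup_cost =
    (flag_owner->CTASParallelInsInfo & CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) != 0;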

Cheers

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> Currently with the patch, we can allow parallel CTAS when topnode is Gather.
> When top-node is Append and Gather is the sub-node of Append, I think we can still enable
> Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as:
>
> Append
> ------>Gather
> --------->Create table
> ------------->Seqscan
> ------>Gather
> --------->create table
> ------------->Seqscan
>
> And the use case seems common to me, such as:
> select * from A where xxx union all select * from B where xxx;

Thanks for the append use case.

Here's my analysis on pushing parallel inserts down even in case the
top node is Append.

For UNION cases, which need to remove duplicate tuples, we can't push
the inserts or the CTAS dest receiver down. If I'm not wrong, the Append
node does not do the duplicate removal; I saw that it's the HashAggregate
node (the top node) that removes the duplicate tuples. Likewise, for
EXCEPT/EXCEPT ALL/INTERSECT/INTERSECT ALL cases we receive HashSetOp
nodes on top of Append. So for both cases, our check for Gather or
Append at the top node is enough to detect these and disallow parallel
inserts.

For union all:
case 1: We can push the CTAS dest receiver to each Gather node
 Append
     ->Gather
         ->Parallel Seq Scan
     ->Gather
         ->Parallel Seq Scan
      ->Gather
         ->Parallel Seq Scan

case 2: We can still push the CTAS dest receiver to each Gather node.
Non-Gather nodes will do inserts as they do now i.e. by sending tuples
to Append and from there to CTAS dest receiver.
 Append
     ->Gather
         ->Parallel Seq Scan
     ->Seq Scan / Join / any other non-Gather node
     ->Gather
         ->Parallel Seq Scan
     ->Seq Scan / Join / any other non-Gather node

case 3:  We can push the CTAS dest receiver to Gather
 Gather
     ->Parallel Append
         ->Parallel Seq Scan
         ->Parallel Seq Scan

case 4: We can push the CTAS dest receiver to Gather
 Gather
     ->Parallel Append
         ->Parallel Seq Scan
         ->Parallel Seq Scan
         ->Seq Scan / Join / any other non-Gather node

Please let me know if I'm missing any other possible use case.

Thoughts?

> I attach a WIP patch which just show the possibility of this feature.
> The patch is based on the latest v11-patch.
>
> What do you think?

As suggested by Amit earlier, I kept the 0001 patch (so far) such that
it doesn't have the code to influence the planner to consider the
parallel tuple cost as 0; it works on whatever plan gets generated and
decides whether or not to allow parallel inserts. In the 0002 patch, I
added the code for influencing the planner to consider the parallel
tuple cost as 0. Maybe we can have a 0003 patch for tests alone.

Once we are okay with the above analysis and use cases, we can
incorporate the Append changes to respective patches.

Hope that's okay.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Tue, Dec 15, 2020 at 2:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Currently with the patch, we can allow parallel CTAS when topnode is Gather.
> > When top-node is Append and Gather is the sub-node of Append, I think we can still enable
> > Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as:
> >
> > Append
> > ------>Gather
> > --------->Create table
> > ------------->Seqscan
> > ------>Gather
> > --------->create table
> > ------------->Seqscan
> >
> > And the use case seems common to me, such as:
> > select * from A where xxx union all select * from B where xxx;
>
> Thanks for the append use case.
>
> Here's my analysis on pushing parallel inserts down even in case the
> top node is Append.
>
> For union cases which need to remove duplicate tuples, we can't push
> the inserts or CTAS dest receiver down. If I'm not wrong, Append node
> is not doing duplicate removal(??), I saw that it's the HashAggregate
> node (which is the top node that removes the duplicate tuples). And
> also for except/except all/intersect/intersect all cases we receive
> HashSetOp nodes on top of Append. So for both cases, our check for
> Gather or Append at the top node is enough to detect this to not allow
> parallel inserts.
>
> For union all:
> case 1: We can push the CTAS dest receiver to each Gather node
>  Append
>      ->Gather
>          ->Parallel Seq Scan
>      ->Gather
>          ->Parallel Seq Scan
>       ->Gather
>          ->Parallel Seq Scan
>
> case 2: We can still push the CTAS dest receiver to each Gather node.
> Non-Gather nodes will do inserts as they do now i.e. by sending tuples
> to Append and from there to CTAS dest receiver.
>  Append
>      ->Gather
>          ->Parallel Seq Scan
>      ->Seq Scan / Join / any other non-Gather node
>      ->Gather
>          ->Parallel Seq Scan
>      ->Seq Scan / Join / any other non-Gather node
>
> case 3:  We can push the CTAS dest receiver to Gather
>  Gather
>      ->Parallel Append
>          ->Parallel Seq Scan
>          ->Parallel Seq Scan
>
> case 4: We can push the CTAS dest receiver to Gather
>  Gather
>      ->Parallel Append
>          ->Parallel Seq Scan
>          ->Parallel Seq Scan
>          ->Seq Scan / Join / any other non-Gather node
>
> Please let me know if I'm missing any other possible use case.
>
> Thoughts?

Your analysis looks right to me.

> > I attach a WIP patch which just show the possibility of this feature.
> > The patch is based on the latest v11-patch.
> >
> > What do you think?
>
> As suggested by Amit earlier, I kept the 0001 patch(so far) such that
> it doesn't have the code to influence the planner to consider parallel
> tuple cost as 0. It works on the plan whatever gets generated and
> decides to allow parallel inserts or not. And in the 0002 patch, I
> added the code for influencing the planner to consider parallel tuple
> cost as 0. Maybe we can have a 0003 patch for tests alone.

Yeah, that makes sense, and it will make the review easier.

> Once we are okay with the above analysis and use cases, we can
> incorporate the Append changes to respective patches.
>
> Hope that's okay.

Makes sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
> Thanks for the append use case.
> 
> Here's my analysis on pushing parallel inserts down even in case the top
> node is Append.
> 
> For union cases which need to remove duplicate tuples, we can't push the
> inserts or CTAS dest receiver down. If I'm not wrong, Append node is not
> doing duplicate removal(??), I saw that it's the HashAggregate node (which
> is the top node that removes the duplicate tuples). And also for
> except/except all/intersect/intersect all cases we receive HashSetOp nodes
> on top of Append. So for both cases, our check for Gather or Append at the
> top node is enough to detect this to not allow parallel inserts.
> 
> For union all:
> case 1: We can push the CTAS dest receiver to each Gather node
>  Append
>      ->Gather
>          ->Parallel Seq Scan
>      ->Gather
>          ->Parallel Seq Scan
>       ->Gather
>          ->Parallel Seq Scan
> 
> case 2: We can still push the CTAS dest receiver to each Gather node.
> Non-Gather nodes will do inserts as they do now i.e. by sending tuples to
> Append and from there to CTAS dest receiver.
>  Append
>      ->Gather
>          ->Parallel Seq Scan
>      ->Seq Scan / Join / any other non-Gather node
>      ->Gather
>          ->Parallel Seq Scan
>      ->Seq Scan / Join / any other non-Gather node
> 
> case 3: We can push the CTAS dest receiver to Gather
>  Gather
>      ->Parallel Append
>          ->Parallel Seq Scan
>          ->Parallel Seq Scan
> 
> case 4: We can push the CTAS dest receiver to Gather
>  Gather
>      ->Parallel Append
>          ->Parallel Seq Scan
>          ->Parallel Seq Scan
>          ->Seq Scan / Join / any other non-Gather node
> 
> Please let me know if I'm missing any other possible use case.
> 
> Thoughts?


Yes, the analysis looks right to me.


> As suggested by Amit earlier, I kept the 0001 patch(so far) such that it
> doesn't have the code to influence the planner to consider parallel tuple
> cost as 0. It works on the plan whatever gets generated and decides to allow
> parallel inserts or not. And in the 0002 patch, I added the code for
> influencing the planner to consider parallel tuple cost as 0. Maybe we can
> have a 0003 patch for tests alone.
> 
> Once we are okay with the above analysis and use cases, we can incorporate
> the Append changes to respective patches.
> 
> Hope that's okay.

A little explanation about how to push the CTAS info down in the Append case.

1. About how to ignore the tuple cost in this case.
IMO, the Gather path under the Append is created via the following call chain:
query_planner
-make_one_rel
--set_base_rel_sizes
---set_rel_size
----set_append_rel_size (*)
-----set_rel_size
------set_subquery_pathlist
-------subquery_planner
--------grouping_planner
---------apply_scanjoin_target_to_paths
----------generate_useful_gather_paths

set_append_rel_size() seems the right place where we can check and set a flag to ignore the tuple cost later.
We can set the flag in two cases, when no parent path will be created above (such as limit, sort, distinct, ...):
i) query_level is 1
ii) query_level > 1 and we have already set the flag in the parent_root.

Case ii) handles Append under Append:
Append
   ->Append
       ->Gather
   ->Other plan

2. About how to push the CTAS info down.

We traverse the whole plan tree, and we only care about the Append and Gather node types.
Gather: it sets the CTAS dest info and returns true at once if the Gather node does not have a projection.
Append: it recursively traverses the subplans of the Append node and returns true if one of the subplans can be parallel.

+/*
+ * Push the CTAS dest receiver down to every projection-free Gather node,
+ * recursing through Append nodes.  Returns true if at least one subplan
+ * can perform parallel inserts.
+ */
+static bool
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+    bool parallel = false;
+
+    if (ps == NULL)
+        return parallel;
+
+    if (IsA(ps, AppendState))
+    {
+        AppendState *aps = (AppendState *) ps;
+
+        /* Recurse into each subplan; any parallel-capable one counts. */
+        for (int i = 0; i < aps->as_nplans; i++)
+            parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+    }
+    else if (IsA(ps, GatherState) && !ps->ps_ProjInfo)
+    {
+        GatherState *gstate = (GatherState *) ps;
+
+        parallel = true;
+
+        /* Hand the CTAS receiver to this Gather; workers insert directly. */
+        ((DR_intorel *) dest)->is_parallel = true;
+        gstate->dest = dest;
+        ps->plan->plan_rows = 0;    /* no tuples flow up to the leader */
+    }
+
+    return parallel;
+}

Best regards,
houzj



Attachment

RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:

Sorry, my mistake: the last patch had some errors.
Attaching a new one.

Best regards,
houzj



Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, Dec 15, 2020 at 5:48 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > A little explanation about how to push the CTAS info down in the Append case.
> >
> > 1. About how to ignore the tuple cost in this case.
> > IMO, the Gather path under the Append is created via the following call chain:
> > query_planner
> > -make_one_rel
> > --set_base_rel_sizes
> > ---set_rel_size
> > ----set_append_rel_size (*)
> > -----set_rel_size
> > ------set_subquery_pathlist
> > -------subquery_planner
> > --------grouping_planner
> > ---------apply_scanjoin_target_to_paths
> > ----------generate_useful_gather_paths
> >
> > set_append_rel_size() seems the right place where we can check and set a flag
> > to ignore the tuple cost later.
> > We can set the flag in two cases, when no parent path will be
> > created above (such as limit, sort, distinct, ...):
> > i) query_level is 1
> > ii) query_level > 1 and we have already set the flag in the parent_root.
> >
> > Case ii) handles Append under Append:
> > Append
> >    ->Append
> >        ->Gather
> >    ->Other plan
> >
> > 2. About how to push the CTAS info down.
> >
> > We traverse the whole plan tree, and we only care about the Append and Gather node types.
> > Gather: it sets the CTAS dest info and returns true at once if the Gather node
> > does not have a projection.
> > Append: it recursively traverses the subplans of the Append node and returns
> > true if one of the subplans can be parallel.
> >
> > +PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps) {
> > +     bool parallel = false;
> > +
> > +     if(ps == NULL)
> > +             return parallel;
> > +
> > +     if(IsA(ps, AppendState))
> > +     {
> > +             AppendState *aps = (AppendState *) ps;
> > +             for(int i = 0; i < aps->as_nplans; i++)
> > +             {
> > +                     parallel |=
> > PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
> > +             }
> > +     }
> > +     else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
> > +     {
> > +             GatherState *gstate = (GatherState *) ps;
> > +             parallel = true;
> > +
> > +             ((DR_intorel *) dest)->is_parallel = true;
> > +             gstate->dest = dest;
> > +             ps->plan->plan_rows = 0;
> > +     }
> > +
> > +     return parallel;
> > +}
>
> Sorry, my mistake: the last patch had some errors.
> Attaching a new one.

Thanks for the append patches. Basically your changes look good to me.
I'm merging them into the original patch set and adding test cases
to cover these scenarios. I will post the updated patch set soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, Dec 15, 2020 at 5:53 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I'm merging them to the original patch set and adding the test cases
> to cover these cases. I will post the updated patch set soon.

Attaching v12 patch set.

0001 - parallel inserts without tuple cost enforcement.
0002 - enforce planner for parallel tuple cost
0003 - test cases

Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Wed, Dec 16, 2020 at 12:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Dec 15, 2020 at 5:53 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > I'm merging them to the original patch set and adding the test cases
> > to cover these cases. I will post the updated patch set soon.
>
> Attaching v12 patch set.
>
> 0001 - parallel inserts without tuple cost enforcement.
> 0002 - enforce planner for parallel tuple cost
> 0003 - test cases
>
> Please review it further.
>

I think it will be clean to implement the parallel CTAS when a
top-level node is the gather node.  Basically, the idea is that
whenever we get the gather on the top which doesn't have any
projection then we can push down the dest receiver directly to the
worker.  I agree that append is an exception that doesn't do any extra
processing other than appending the results, So IMHO it would be
better that in the first part we parallelize the plan where gather
node on top.  I see that we have already worked on the patch where the
append node is on top so I would suggest that we can keep that part in
a separate patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
Hi

The cfbot seems to complain about the test case:

Command exited with code 1
perl dumpregr.pl
=== $path ===\ndiff -w -U3 C:/projects/postgresql/src/test/regress/expected/write_parallel.out C:/projects/postgresql/src/test/regress/results/write_parallel.out
--- C:/projects/postgresql/src/test/regress/expected/write_parallel.out    2020-12-21 01:41:17.745091500 +0000
+++ C:/projects/postgresql/src/test/regress/results/write_parallel.out    2020-12-21 01:47:20.375514800 +0000
@@ -1204,7 +1204,7 @@
                ->  Gather (actual rows=2 loops=1)
                      Workers Planned: 3
                      Workers Launched: 3
-                     ->  Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+                     ->  Parallel Seq Scan on temp2 (actual rows=1 loops=4)
                            Filter: (col2 < 3)
                            Rows Removed by Filter: 1
(14 rows)
@@ -1233,7 +1233,7 @@
                ->  Gather (actual rows=2 loops=1)
                      Workers Planned: 3
                      Workers Launched: 3
-                     ->  Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+                     ->  Parallel Seq Scan on temp2 (actual rows=1 loops=4)
                            Filter: (col2 < 3)
                            Rows Removed by Filter: 1
(14 rows)

Best regards,
houzj



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Fri, Dec 18, 2020 at 10:08 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I think it will be clean to implement the parallel CTAS when a
> top-level node is the gather node.  Basically, the idea is that
> whenever we get the gather on the top which doesn't have any
> projection then we can push down the dest receiver directly to the
> worker.  I agree that append is an exception that doesn't do any extra
> processing other than appending the results, So IMHO it would be
> better that in the first part we parallelize the plan where gather
> node on top.  I see that we have already worked on the patch where the
> append node is on top so I would suggest that we can keep that part in
> a separate patch.

Thanks! I rearranged the patches to keep the append part separate in
the 0004 patch.

Attaching v13 patch set:

0001 - parallel inserts in ctas without planner enforcement for tuple
cost calculation
0002 - planner enforcement for tuple cost calculation
0003 - tests
0004 - enabling parallel inserts for Append cases, related planner
enforcement code and tests.

Please consider these patches for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Dec 21, 2020 at 8:16 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> The cfbot seems to complain about the test case:
>
> Command exited with code 1
> perl dumpregr.pl
> === $path ===\ndiff -w -U3 C:/projects/postgresql/src/test/regress/expected/write_parallel.out
C:/projects/postgresql/src/test/regress/results/write_parallel.out
> --- C:/projects/postgresql/src/test/regress/expected/write_parallel.out 2020-12-21 01:41:17.745091500 +0000
> +++ C:/projects/postgresql/src/test/regress/results/write_parallel.out  2020-12-21 01:47:20.375514800 +0000
> @@ -1204,7 +1204,7 @@
>                 ->  Gather (actual rows=2 loops=1)
>                       Workers Planned: 3
>                       Workers Launched: 3
> -                     ->  Parallel Seq Scan on temp2 (actual rows=0 loops=4)
> +                     ->  Parallel Seq Scan on temp2 (actual rows=1 loops=4)
>                             Filter: (col2 < 3)
>                             Rows Removed by Filter: 1
> (14 rows)
> @@ -1233,7 +1233,7 @@
>                 ->  Gather (actual rows=2 loops=1)
>                       Workers Planned: 3
>                       Workers Launched: 3
> -                     ->  Parallel Seq Scan on temp2 (actual rows=0 loops=4)
> +                     ->  Parallel Seq Scan on temp2 (actual rows=1 loops=4)
>                             Filter: (col2 < 3)
>                             Rows Removed by Filter: 1
> (14 rows)

Thanks! Looks like the explain analyze test case outputs can be
unstable because we may not always get the requested number of workers.
The comment before the explain_parallel_append function in
partition_prune.sql explains it well.

The solution is to have a function similar to explain_parallel_append,
say explain_parallel_inserts, in write_parallel.sql and use it for all
explain analyze cases. That will make the results consistent.
Thoughts? If okay, I will update the test cases and post new patches.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Mon, Dec 21, 2020 at 8:16 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > The cfbot seems to complain about the test case:
> >
> > Command exited with code 1
> > perl dumpregr.pl
> > === $path ===\ndiff -w -U3 C:/projects/postgresql/src/test/regress/expected/write_parallel.out
C:/projects/postgresql/src/test/regress/results/write_parallel.out
> > --- C:/projects/postgresql/src/test/regress/expected/write_parallel.out 2020-12-21 01:41:17.745091500 +0000
> > +++ C:/projects/postgresql/src/test/regress/results/write_parallel.out  2020-12-21 01:47:20.375514800 +0000
> > @@ -1204,7 +1204,7 @@
> >                 ->  Gather (actual rows=2 loops=1)
> >                       Workers Planned: 3
> >                       Workers Launched: 3
> > -                     ->  Parallel Seq Scan on temp2 (actual rows=0 loops=4)
> > +                     ->  Parallel Seq Scan on temp2 (actual rows=1 loops=4)
> >                             Filter: (col2 < 3)
> >                             Rows Removed by Filter: 1
> > (14 rows)
> > @@ -1233,7 +1233,7 @@
> >                 ->  Gather (actual rows=2 loops=1)
> >                       Workers Planned: 3
> >                       Workers Launched: 3
> > -                     ->  Parallel Seq Scan on temp2 (actual rows=0 loops=4)
> > +                     ->  Parallel Seq Scan on temp2 (actual rows=1 loops=4)
> >                             Filter: (col2 < 3)
> >                             Rows Removed by Filter: 1
> > (14 rows)
>
> Thanks! Looks like the explain analyze test case outputs can be
> unstable because we may not always get the requested number of
> workers. The comment before the explain_parallel_append function in
> partition_prune.sql explains it well.
>
> The solution is to have a function similar to explain_parallel_append,
> say explain_parallel_inserts, in write_parallel.sql and use that for all
> explain analyze cases. This will make the results consistent.
> Thoughts? If okay, I will update the test cases and post new patches.

Attaching v14 patch set that has above changes. Please consider this
for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
vignesh C
Date:
On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
> Attaching v14 patch set that has above changes. Please consider this
> for further review.
>

Few comments:
In the below case, should Create be above Gather?
postgres=# explain  create table t7 as select * from t6;
                            QUERY PLAN
-------------------------------------------------------------------
 Gather  (cost=0.00..9.17 rows=0 width=4)
   Workers Planned: 2
 ->  Create t7
   ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
(4 rows)

Can we change it to something like:
-------------------------------------------------------------------
Create t7
 -> Gather  (cost=0.00..9.17 rows=0 width=4)
  Workers Planned: 2
  ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
(4 rows)

You could change intoclause_len = strlen(intoclausestr) to
strlen(intoclausestr) + 1 and use intoclause_len in the remaining
places; that way we can avoid the +1 everywhere else.
+       /* Estimate space for into clause for CTAS. */
+       if (IS_CTAS(intoclause) && OidIsValid(objectid))
+       {
+               intoclausestr = nodeToString(intoclause);
+               intoclause_len = strlen(intoclausestr);
+               shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+               shm_toc_estimate_keys(&pcxt->estimator, 1);
+       }
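
To illustrate, a minimal sketch of the suggested shape (a hypothetical
fragment, not the actual patch hunk):

+               intoclausestr = nodeToString(intoclause);
+               intoclause_len = strlen(intoclausestr) + 1; /* include the terminating NUL */
+               shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len);
+               shm_toc_estimate_keys(&pcxt->estimator, 1);

and later, when storing it, the same length can be reused as-is:

+               intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
+               memcpy(intoclause_space, intoclausestr, intoclause_len);
+               shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);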

Can we use node->nworkers_launched == 0 in place of
node->need_to_scan_locally? That way the setting and resetting of
node->need_to_scan_locally can be removed, unless need_to_scan_locally
is needed in any of the functions that get called.
+       /* Enable leader to insert in case no parallel workers were launched. */
+       if (node->nworkers_launched == 0)
+               node->need_to_scan_locally = true;
+
+       /*
+        * By now, for parallel workers (if launched any), would have started
+        * their work i.e. insertion to target table. In case the leader is
+        * chosen to participate for parallel inserts in CTAS, then finish its
+        * share before going to wait for the parallel workers to finish.
+        */
+       if (node->need_to_scan_locally)
+       {

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
> > Attaching v14 patch set that has above changes. Please consider this
> > for further review.
> >
>
> Few comments:
> In the below case, should create be above Gather?
> postgres=# explain  create table t7 as select * from t6;
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Gather  (cost=0.00..9.17 rows=0 width=4)
>    Workers Planned: 2
>  ->  Create t7
>    ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
> (4 rows)
>
> Can we change it to something like:
> -------------------------------------------------------------------
> Create t7
>  -> Gather  (cost=0.00..9.17 rows=0 width=4)
>   Workers Planned: 2
>   ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
> (4 rows)
>

I think it is better to have it the way it is in the current patch
because that reflects that we are performing the insert/create below
Gather, which is the purpose of this patch. I think this is similar to
what the Parallel Insert patch [1] has for a similar plan.


[1] - https://commitfest.postgresql.org/31/2844/

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
> You could change intoclause_len = strlen(intoclausestr) to
> strlen(intoclausestr) + 1 and use intoclause_len in the remaining
> places. We can avoid the +1 in the other places.
> +       /* Estimate space for into clause for CTAS. */
> +       if (IS_CTAS(intoclause) && OidIsValid(objectid))
> +       {
> +               intoclausestr = nodeToString(intoclause);
> +               intoclause_len = strlen(intoclausestr);
> +               shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
> +               shm_toc_estimate_keys(&pcxt->estimator, 1);
> +       }

Done.

> Can we use  node->nworkers_launched == 0 in place of
> node->need_to_scan_locally, that way the setting and resetting of
> node->need_to_scan_locally can be removed. Unless need_to_scan_locally
> is needed in any of the functions that gets called.
> +       /* Enable leader to insert in case no parallel workers were launched. */
> +       if (node->nworkers_launched == 0)
> +               node->need_to_scan_locally = true;
> +
> +       /*
> +        * By now, for parallel workers (if launched any), would have started
> +        * their work i.e. insertion to target table. In case the leader is
> +        * chosen to participate for parallel inserts in CTAS, then finish its
> +        * share before going to wait for the parallel workers to finish.
> +        */
> +       if (node->need_to_scan_locally)
> +       {

need_to_scan_locally is set in ExecGather(), and even if
nworkers_launched > 0 it can still be true, so I think we cannot
remove need_to_scan_locally in ExecParallelInsertInCTAS.
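
For reference, ExecGather() in nodeGather.c decides leader participation
roughly like this (a paraphrased sketch of the existing executor code, not
of this patch):

    /*
     * Run the plan locally if there are no worker readers, or if the
     * leader may participate (and single-copy mode doesn't forbid it).
     */
    node->need_to_scan_locally = (node->nreaders == 0) ||
        (!gather->single_copy && parallel_leader_participation);

So nworkers_launched == 0 alone can't substitute for it.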

Attaching v15 patch set for further review. Note that the change is
only in 0001 patch, other patches remain unchanged from v14.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
vignesh C
Date:


On Thu, Dec 24, 2020 at 11:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
> > > Attaching v14 patch set that has above changes. Please consider this
> > > for further review.
> > >
> >
> > Few comments:
> > In the below case, should create be above Gather?
> > postgres=# explain  create table t7 as select * from t6;
> >                             QUERY PLAN
> > -------------------------------------------------------------------
> >  Gather  (cost=0.00..9.17 rows=0 width=4)
> >    Workers Planned: 2
> >  ->  Create t7
> >    ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
> > (4 rows)
> >
> > Can we change it to something like:
> > -------------------------------------------------------------------
> > Create t7
> >  -> Gather  (cost=0.00..9.17 rows=0 width=4)
> >   Workers Planned: 2
> >   ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
> > (4 rows)
> >
>
> I think it is better to have it in a way as in the current patch
> because that reflects that we are performing insert/create below
> Gather which is the purpose of this patch. I think this is similar to
> what the Parallel Insert patch [1] has for a similar plan.
>
>
> [1] - https://commitfest.postgresql.org/31/2844/
>

Also, another thing that I felt was that the Gather node will actually do the insert operation; the Create table will be done earlier itself. Should we change Create table to Insert table, something like below:
                             QUERY PLAN                            
-------------------------------------------------------------------
 Gather  (cost=0.00..9.17 rows=0 width=4)
   Workers Planned: 2
 ->  Insert table2 (instead of Create table2)
   ->  Parallel Seq Scan on table1  (cost=0.00..9.17 rows=417 width=4)

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Fri, Dec 25, 2020 at 7:12 AM vignesh C <vignesh21@gmail.com> wrote:
> Also, another thing that I felt was that the Gather node will actually do
> the insert operation; the Create table will be done earlier itself. Should
> we change Create table to Insert table, something like below:
>                              QUERY PLAN
> -------------------------------------------------------------------
>  Gather  (cost=0.00..9.17 rows=0 width=4)
>    Workers Planned: 2
>  ->  Insert table2 (instead of Create table2)
>    ->  Parallel Seq Scan on table1  (cost=0.00..9.17 rows=417 width=4)

IMO, showing Insert under Gather makes sense if the query is INSERT
INTO SELECT, as in the other patch [1]. Since here it is a CTAS
query, having Create under Gather looks fine to me. This way we can
also distinguish the EXPLAIN outputs of parallel inserts in INSERT INTO
SELECT and CTAS.

Also, some might think that Create under Gather means that each
parallel worker is creating the table; it's actually not the creation
of the table that's parallelized but the insertion. If required, we
can clarify it in the CTAS docs with a sample EXPLAIN. I have not yet
added docs related to allowing parallel inserts in CTAS. Shall I add a
para saying when parallel inserts can be picked and how the sample
EXPLAIN looks? Thoughts?

[1] - https://commitfest.postgresql.org/31/2844/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Fri, Dec 25, 2020 at 9:54 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> IMO, showing Insert under Gather makes sense if the query is INSERT
> INTO SELECT as it's in the other patch [1]. Since here it is a CTAS
> query, so having Create under Gather looks fine to me. This way we can
> also distinguish the EXPLAINs of parallel inserts in INSERT INTO
> SELECT and CTAS.

I don't think that is a problem, because now also if we EXPLAIN CTAS it
appears as if we are executing only the select query, because that is
all we are planning for. So if we are now including the INSERT in the
planning and pushing the insert below the Gather, then it will make
more sense to show INSERT instead of CREATE. Let's see what others
think.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Fri, Dec 25, 2020 at 9:54 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> IMO, showing Insert under Gather makes sense if the query is INSERT
> INTO SELECT as it's in the other patch [1]. Since here it is a CTAS
> query, so having Create under Gather looks fine to me. This way we can
> also distinguish the EXPLAINs of parallel inserts in INSERT INTO
> SELECT and CTAS.
>

Right, IIRC, we did it the way it is in the patch for convenience,
to move forward with it and come back to it later once all other
parts of the patch are good.

> And also, some might wonder that Create under Gather means that each
> parallel worker is creating the table, it's actually not the creation
> of the table that's parallelized but it's insertion. If required, we
> can clarify it in CTAS docs with a sample EXPLAIN. I have not yet
> added docs related to allowing parallel inserts in CTAS. Shall I add a
> para saying when parallel inserts can be picked and how the sample
> EXPLAIN looks? Thoughts?
>

Yeah, I don't see any problem with it, and maybe we can move the
EXPLAIN-related code to a separate patch. The reason is we don't
display the DDL part without parallelism, and this might need a
separate discussion.

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Fri, Dec 25, 2020 at 10:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Yeah, I don't see any problem with it, and maybe we can move the
> EXPLAIN-related code to a separate patch. The reason is we don't
> display the DDL part without parallelism, and this might need a
> separate discussion.
>

This makes sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Thu, Dec 24, 2020 at 1:07 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
> > You could change intoclause_len = strlen(intoclausestr) to
> > strlen(intoclausestr) + 1 and use intoclause_len in the remaining
> > places. We can avoid the +1 in the other places.
> > +       /* Estimate space for into clause for CTAS. */
> > +       if (IS_CTAS(intoclause) && OidIsValid(objectid))
> > +       {
> > +               intoclausestr = nodeToString(intoclause);
> > +               intoclause_len = strlen(intoclausestr);
> > +               shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
> > +               shm_toc_estimate_keys(&pcxt->estimator, 1);
> > +       }
>
> Done.
>
> > Can we use  node->nworkers_launched == 0 in place of
> > node->need_to_scan_locally, that way the setting and resetting of
> > node->need_to_scan_locally can be removed. Unless need_to_scan_locally
> > is needed in any of the functions that gets called.
> > +       /* Enable leader to insert in case no parallel workers were launched. */
> > +       if (node->nworkers_launched == 0)
> > +               node->need_to_scan_locally = true;
> > +
> > +       /*
> > +        * By now, for parallel workers (if launched any), would have started
> > +        * their work i.e. insertion to target table. In case the leader is
> > +        * chosen to participate for parallel inserts in CTAS, then finish its
> > +        * share before going to wait for the parallel workers to finish.
> > +        */
> > +       if (node->need_to_scan_locally)
> > +       {
>
> need_to_scan_locally is being set in ExecGather() even if
> nworkers_launched > 0 it can still be true, so I think we can not
> remove need_to_scan_locally in ExecParallelInsertInCTAS.
>
> Attaching v15 patch set for further review. Note that the change is
> only in 0001 patch, other patches remain unchanged from v14.

I have reviewed part of the v15-0001 patch. I have a few comments and
will continue to review it.

1.

@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
     /* this is global to a transaction, not subtransaction-local */
     if (used)
     {
-        /*
-         * Forbid setting currentCommandIdUsed in a parallel worker, because
-         * we have no provision for communicating this back to the leader.  We
-         * could relax this restriction when currentCommandIdUsed was already
-         * true at the start of the parallel operation.
-         */
-        Assert(!IsParallelWorker());
+         /*
+          * This is a temporary hack for all common parallel insert cases i.e.
+          * insert into, ctas, copy from. To be changed later. In a parallel
+          * worker, set currentCommandIdUsed to true only if it was not set to
+          * true at the start of the parallel operation (by way of
+          * SetCurrentCommandIdUsedForWorker()). We have to do this because
+          * GetCurrentCommandId(true) may be called from anywhere, especially
+          * for parallel inserts, within parallel worker.
+          */
+        Assert(!(IsParallelWorker() && !currentCommandIdUsed));

Why is this a temporary hack? And what is the plan for removing it?

2.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+    if (!IS_CTAS(into))
+        return;

When will this hit?  The function name suggests that it is for CTAS,
but now you have a check that returns if it is not CTAS. Can you add a
comment about when you expect this case?

Also, the function name should start on a new line, i.e.
void
ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)

3.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */

Push down to the Gather nodes?  I think the right statement would be
"push down below the Gather node".


4.
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
     DR_intorel *myState = (DR_intorel *) self;

+    if (myState->is_parallel_worker)
+    {
+        /* In the worker */

+        SetCurrentCommandIdUsedForWorker();
+        myState->output_cid = GetCurrentCommandId(false);
+    }
+    else
     {
        non-parallel worker code
    }
}

I think instead of moving all the non-parallel-worker code into the
else branch we can do better. This will avoid unnecessary code
movement.

4.
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
     DR_intorel *myState = (DR_intorel *) self;

     -- Comment -> in parallel worker we don't need to create dest recv blah blah
+    if (myState->is_parallel_worker)
    {
        --parallel worker handling--
        return;
    }

    --non-parallel worker code stay right there, instead of moving to else

5.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{

From the function name and comments it appears that this function will
return a boolean saying whether parallel insert should be selected or
not. I think the name/comment should be improved for this.

6.
        /*
+         * For parallelizing inserts in CTAS i.e. making each parallel worker
+         * insert the tuples, we must send information such as into clause (for
+         * each worker to build separate dest receiver), object id (for each
+         * worker to open the created table).

The comment says we need to pass the object id, but the code under this
comment is not doing so.

7.
+        /*
+         * Since there are no rows that are transferred from workers to Gather
+         * node, so we set it to 0 to be visible in estimated row count of
+         * explain plans.
+         */
+        queryDesc->planstate->plan->plan_rows = 0;

This seems a bit hackish. Why is it done after the planning? I mean,
the plan must know that it is returning 0 rows.

8.
+        char *intoclause_space = shm_toc_allocate(pcxt->toc,
+                                                  intoclause_len);
+        memcpy(intoclause_space, intoclausestr, intoclause_len);
+        shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);

Add one blank line between the variable declaration and the next code
segment; take care at other places as well.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
vignesh C
Date:
On Thu, Dec 24, 2020 at 1:07 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21@gmail.com> wrote:
> > You could change intoclause_len = strlen(intoclausestr) to
> > strlen(intoclausestr) + 1 and use intoclause_len in the remaining
> > places. We can avoid the +1 in the other places.
> > +       /* Estimate space for into clause for CTAS. */
> > +       if (IS_CTAS(intoclause) && OidIsValid(objectid))
> > +       {
> > +               intoclausestr = nodeToString(intoclause);
> > +               intoclause_len = strlen(intoclausestr);
> > +               shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
> > +               shm_toc_estimate_keys(&pcxt->estimator, 1);
> > +       }
>
> Done.
>
> > Can we use  node->nworkers_launched == 0 in place of
> > node->need_to_scan_locally, that way the setting and resetting of
> > node->need_to_scan_locally can be removed. Unless need_to_scan_locally
> > is needed in any of the functions that gets called.
> > +       /* Enable leader to insert in case no parallel workers were launched. */
> > +       if (node->nworkers_launched == 0)
> > +               node->need_to_scan_locally = true;
> > +
> > +       /*
> > +        * By now, for parallel workers (if launched any), would have started
> > +        * their work i.e. insertion to target table. In case the leader is
> > +        * chosen to participate for parallel inserts in CTAS, then finish its
> > +        * share before going to wait for the parallel workers to finish.
> > +        */
> > +       if (node->need_to_scan_locally)
> > +       {
>
> need_to_scan_locally is being set in ExecGather() even if
> nworkers_launched > 0 it can still be true, so I think we can not
> remove need_to_scan_locally in ExecParallelInsertInCTAS.
>
> Attaching v15 patch set for further review. Note that the change is
> only in 0001 patch, other patches remain unchanged from v14.
>

+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change comment  "parallel inserts must occur" like "parallel
insert must be selected for CTAS on normal table"

+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change comment "parallel inserts must occur" like "parallel
insert must be selected for CTAS on unlogged table"
Similar comment need to be handled in other places also.

+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            $1)
+    loop
+        ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+        ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+        ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        return next ln;
+    end loop;
+end;
+$$;

The above function is the same as the one present in partition_prune.sql:
create function explain_parallel_append(text) returns setof text
language plpgsql as
$$
declare
    ln text;
begin
    for ln in
        execute format('explain (analyze, costs off, summary off, timing off) %s',
            $1)
    loop
        ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
        ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
        ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
        return next ln;
    end loop;
end;
$$;

If possible, try to make a common function for both and use it.

+       if (intoclausestr && OidIsValid(objectid))
+               fpes->objectid = objectid;
+       else
+               fpes->objectid = InvalidOid;
Here the OidIsValid(objectid) check is not required; intoclausestr will
be set only if objectid is valid.
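
Something like this sketch (hypothetical, same semantics as the quoted hunk):

+       /* intoclausestr is set only when the object id is valid */
+       fpes->objectid = intoclausestr ? objectid : InvalidOid;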

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have reviewed part of the v15-0001 patch. I have a few comments and
> will continue to review it.

Thanks a lot.

> 1.
> Why is this a temporary hack? And what is the plan for removing it?

The changes in xact.c, xact.h and heapam.c are common to all the
parallel insert patches - COPY, INSERT INTO SELECT. That was the
initial comment; I forgot to keep it in sync with the other patches.
Now I have used the comment from the INSERT INTO SELECT patch. IIRC,
the plan was to have this code in all the parallel insert patches;
whichever gets reviewed and committed first, the others will update
their patches accordingly.

> 2.
> +/*
> + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> + * insertion is possible, if yes set the parallel insert state i.e. push down
> + * the dest receiver to the Gather nodes.
> + */
> +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> +{
> +    if (!IS_CTAS(into))
> +        return;
>
> When will this hit?  The function name suggests that it is for CTAS,
> but now you have a check that returns if it is not CTAS. Can you add a
> comment about when you expect this case?

Yes, it will be hit in the explain cases, but I chose to remove this
and check outside in the explain code, something like:
if (into)
    ChooseParallelInsertsInCTAS()

> Also the function name should start in a new line
> i.e
> void
> ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)

Ah, missed that. Modified now.

> 3.
> +/*
> + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> + * insertion is possible, if yes set the parallel insert state i.e. push down
> + * the dest receiver to the Gather nodes.
> + */
>
> Push down to the Gather nodes?  I think the right statement will be
> push down below the Gather node.

Modified.

> 4.
> intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
> {
>      DR_intorel *myState = (DR_intorel *) self;
>
>     -- Comment -> in parallel worker we don't need to create dest recv blah blah
> +    if (myState->is_parallel_worker)
>     {
>         --parallel worker handling--
>         return;
>     }
>
>     --non-parallel worker code stay right there, instead of moving to else

Done.

> 5.
> +/*
> + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> + * insertion is possible, if yes set the parallel insert state i.e. push down
> + * the dest receiver to the Gather nodes.
> + */
> +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> +{
>
> From function name and comments it appeared that this function will
> return boolean saying whether
> Parallel insert should be selected or not.  I think name/comment
> should be better for this

Yeah, that function can still return void because there is no point in
returning bool there: the intention is to see whether parallel inserts
can be performed and, if yes, set the state, otherwise exit. I changed
the function name to TryParallelizingInsertsInCTAS(). Let me know your
suggestions if that doesn't work out.

> 6.
>         /*
> +         * For parallelizing inserts in CTAS i.e. making each parallel worker
> +         * insert the tuples, we must send information such as into clause (for
> +         * each worker to build separate dest receiver), object id (for each
> +         * worker to open the created table).
>
> Comment is saying we need to pass object id but the code under this
> comment is not doing so.

Improved the comment.

> 7.
> +        /*
> +         * Since there are no rows that are transferred from workers to Gather
> +         * node, so we set it to 0 to be visible in estimated row count of
> +         * explain plans.
> +         */
> +        queryDesc->planstate->plan->plan_rows = 0;
>
> This seems a bit hackish. Why is it done after the planning? I mean,
> the plan must know that it is returning 0 rows.

This exists to show the estimated row count (in the case of EXPLAIN
CTAS without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS, actual
tuples are shown correctly as 0 because Gather doesn't receive any
tuples.
    if (es->costs)
    {
        if (es->format == EXPLAIN_FORMAT_TEXT)
        {
            appendStringInfo(es->str, "  (cost=%.2f..%.2f rows=%.0f width=%d)",
                             plan->startup_cost, plan->total_cost,
                             plan->plan_rows, plan->plan_width);

Since it's an estimated row count (which may not always be correct), we
will let the EXPLAIN plan show that, and I think we can remove that
part. Thoughts?

I removed it in the v16 patch set.

> 8.
> +        char *intoclause_space = shm_toc_allocate(pcxt->toc,
> +                                                  intoclause_len);
> +        memcpy(intoclause_space, intoclausestr, intoclause_len);
> +        shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
>
> One blank line between variable declaration and next code segment,
> take care at other places as well.

Done.

I'm attaching the v16 patch set. Please note that, as discussed in [1],
I added the documentation saying that parallel insertions can happen,
along with a sample EXPLAIN output, to the 0003 patch. But I didn't
move the EXPLAIN output related code to a separate patch because it's
a small snippet in explain.c. I hope that's okay.

[1] - https://www.postgresql.org/message-id/CAA4eK1JqwXGYoGa1%2B3-f0T50dBGufvKaKQOee_AfFhygZ6QKtA%40mail.gmail.com



With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Sat, Dec 26, 2020 at 9:20 PM vignesh C <vignesh21@gmail.com> wrote:
> +-- parallel inserts must occur
> +select explain_pictas(
> +'create table parallel_write as select length(stringu1) from tenk1;');
> +select count(*) from parallel_write;
> +drop table parallel_write;
>
> We can change comment  "parallel inserts must occur" like "parallel
> insert must be selected for CTAS on normal table"
>
> +-- parallel inserts must occur
> +select explain_pictas(
> +'create unlogged table parallel_write as select length(stringu1) from tenk1;');
> +select count(*) from parallel_write;
> +drop table parallel_write;
>
> We can change comment "parallel inserts must occur" like "parallel
> insert must be selected for CTAS on unlogged table"
> Similar comment need to be handled in other places also.

I think the existing comments look fine. Info like the table type and
the query (CTAS or CMV) is visible by looking at the test case. What I
wanted the comments to convey is whether we support parallel inserts or
not, and if not why, so that it is easy to read. I tried to keep them
as succinct as possible.

> If possible try to make a common function for both and use.

Yes, you are right. The function explain_pictas is the same as
explain_parallel_append from partition_prune.sql. It's a test
function, and I also see that we have serial_schedule and
parallel_schedule, which means that these sql files can run in any
order. I'm not quite sure whether we can have it in a common test sql
file and use it across other test sql files. AFAICS, I didn't find
any function being used in such a manner. Thoughts?

> +       if (intoclausestr && OidIsValid(objectid))
> +               fpes->objectid = objectid;
> +       else
> +               fpes->objectid = InvalidOid;
> Here the OidIsValid(objectid) check is not required; intoclausestr will
> be set only if objectid is valid.

Removed the OidIsValid check in the latest v16 patch set posted
upthread. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Zhihong Yu
Date:
For v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch:

+       if (ignore &&
+           (root->parse->CTASParallelInsInfo &
+            CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))

I wonder why CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is checked again in the above if: when ignore_parallel_tuple_cost returns true, CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is already set.

+ * In this function we only care Append and Gather nodes.

'care' -> 'care about'

+       for (int i = 0; i < aps->as_nplans; i++)
+       {
+           parallel |= PushDownCTASParallelInsertState(dest,
+                                                       aps->appendplans[i],
+                                                       gather_exists);

It seems the loop termination condition can include parallel since we can come out of the loop once parallel is true.
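
For example (a sketch of that suggestion; it only applies if the remaining children don't also need the state pushed down):

+       for (int i = 0; i < aps->as_nplans && !parallel; i++)
+       {
+           parallel |= PushDownCTASParallelInsertState(dest,
+                                                       aps->appendplans[i],
+                                                       gather_exists);
+       }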

+   if (!allow && tuple_cost_flags && gather_exists)

As the above code shows, gather_exists is only checked when allow is false.

+            * We set the flag for two cases when there is no parent path will
+            * be created(such as : limit,sort,distinct...):

Please correct the grammar : there are two verbs following 'when'

For set_append_rel_size:

+           {
+               root->parse->CTASParallelInsInfo |=
+                                       CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+           }
+       }
+
+       if (root->parse->CTASParallelInsInfo &
+           CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+       {
+           root->parse->CTASParallelInsInfo &=
+                                       ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;

In the if block for childrel->rtekind == RTE_SUBQUERY, CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND may be set. Why is it cleared immediately after?

+   /* Set to this in case tuple cost needs to be ignored for Append cases. */
+   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3

Since each CTAS_PARALLEL_INS_ flag is a bit, maybe it's better to use 'turn on' or a similar term in the comment, because 'set to' normally means assignment.

Cheers


Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Sun, Dec 27, 2020 at 2:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I'm attaching the v16 patch set. Please note that, as discussed in [1],
> I added the documentation saying that parallel insertions can happen,
> along with a sample EXPLAIN output, to the 0003 patch. But I didn't
> move the EXPLAIN output related code to a separate patch because it's
> a small snippet in explain.c. I hope that's okay.

Thanks for working on this, I will have a look at the updated patches soon.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
vignesh C
Date:


On Sun, Dec 27, 2020 at 2:28 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sat, Dec 26, 2020 at 9:20 PM vignesh C <vignesh21@gmail.com> wrote:
> > +-- parallel inserts must occur
> > +select explain_pictas(
> > +'create table parallel_write as select length(stringu1) from tenk1;');
> > +select count(*) from parallel_write;
> > +drop table parallel_write;
> >
> > We can change comment  "parallel inserts must occur" like "parallel
> > insert must be selected for CTAS on normal table"
> >
> > +-- parallel inserts must occur
> > +select explain_pictas(
> > +'create unlogged table parallel_write as select length(stringu1) from tenk1;');
> > +select count(*) from parallel_write;
> > +drop table parallel_write;
> >
> > We can change the comment "parallel inserts must occur" to something like
> > "parallel insert must be selected for CTAS on unlogged table".
> > Similar comments need to be handled in other places also.
>
> I think the existing comments look fine. Info like the table type and
> the query (CTAS or CMV) is visible by looking at the test case. What I
> wanted the comments to convey is whether we support parallel inserts or
> not, and if not, why, so that it is easy to read. I tried to keep them
> as succinct as possible.
>

I saw a few inconsistencies in the patch:
+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+                      explain_pictas  

+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+                      explain_pictas  

+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker

+select explain_pictas(
+'create table parallel_write as select two col1,
+    (select two from (select * from tenk2) as tt limit 1) col2
+    from tenk1  where tenk1.four = 3;');
+                             explain_pictas  

+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur

+set enable_nestloop to off;
+set enable_mergejoin to on;

+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(

Test comments are detailed in a few cases, while for other, similar parallelism-selected tests they are not. I felt we could make the test comments consistent across the file.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Dec 28, 2020 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Thanks for working on this, I will have a look at the updated patches soon.

Attaching v17 patch set after addressing comments raised in other
threads. Please consider this patch set for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Dec 28, 2020 at 1:16 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> For v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch:
>
> +       if (ignore &&
> +           (root->parse->CTASParallelInsInfo &
> +            CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
>
> I wonder why CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is checked again in the above if, since when
> ignore_parallel_tuple_cost returns true, CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is set already.

Sometimes we may set the flag CTAS_PARALLEL_INS_TUP_COST_CAN_IGN
before generate_useful_gather_paths, but
generate_useful_gather_paths can return without reaching cost_gather,
where we would reset it. For example, it returns early in the
following case:

if (rel->partial_pathlist == NIL)
    return;

So, for such cases, I'm resetting it here.
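
To spell out the overall pattern, here is a commented sketch based on
the snippets above (not the exact patch text):

/* Turned on only when planning an upper Gather for a CTAS SELECT. */
bool ignore = ignore_parallel_tuple_cost(root);

generate_useful_gather_paths(root, rel, false);

/*
 * If cost_gather ran, it has consumed the flag. If
 * generate_useful_gather_paths bailed out early (e.g. empty
 * partial_pathlist), clear the flag here so it cannot leak into
 * later planning.
 */
if (ignore &&
    (root->parse->CTASParallelInsInfo &
     CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
    root->parse->CTASParallelInsInfo &=
        ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;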

> + * In this function we only care Append and Gather nodes.
>
> 'care' -> 'care about'

Done.

> +       for (int i = 0; i < aps->as_nplans; i++)
> +       {
> +           parallel |= PushDownCTASParallelInsertState(dest,
> +                                                       aps->appendplans[i],
> +                                                       gather_exists);
>
> It seems the loop termination condition can include parallel, since we can come out of the loop once parallel is true.

No, we cannot come out of the for loop if parallel is true, because
our intention there is to look at all the child/sub plans under the
Append, and push the inserts down to the Gather nodes wherever possible.
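
In other words, the walk has to visit every child so that each Gather
beneath the Append gets the dest receiver pushed down; a commented
sketch of that loop (same shape as the snippet above):

parallel = false;

/* Visit every child plan; no early break once parallel becomes true. */
for (int i = 0; i < aps->as_nplans; i++)
    parallel |= PushDownCTASParallelInsertState(dest,
                                                aps->appendplans[i],
                                                gather_exists);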

> +   if (!allow && tuple_cost_flags && gather_exists)
>
> As the above code shows, gather_exists is only checked when allow is false.

Yes. If at least one Gather node exists under the Append for which the
planner would have ignored the tuple cost, and we then don't allow
parallel inserts, we should assert that parallelism was not picked
because of wrong parallel tuple cost enforcement.

> +            * We set the flag for two cases when there is no parent path will
> +            * be created(such as : limit,sort,distinct...):
>
> Please correct the grammar : there are two verbs following 'when'

Done.

> For set_append_rel_size:
>
> +           {
> +               root->parse->CTASParallelInsInfo |=
> +                                       CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
> +           }
> +       }
> +
> +       if (root->parse->CTASParallelInsInfo &
> +           CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
> +       {
> +           root->parse->CTASParallelInsInfo &=
> +                                       ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
>
> In the if block for childrel->rtekind == RTE_SUBQUERY, CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND may be set. Why is it cleared immediately after?

Thanks for pointing that out. It's a miss; the intention is to reset it
after set_rel_size(). Corrected in the v17 patch.

> +   /* Set to this in case tuple cost needs to be ignored for Append cases. */
> +   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
>
> Since each CTAS_PARALLEL_INS_ flag is a bit, maybe it's better to use 'turn on' or a similar term in the comment, because 'set to' normally means assignment.

Done.

All the above comments are addressed in the v17 patch set posted
upthread. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Dec 28, 2020 at 11:24 AM vignesh C <vignesh21@gmail.com> wrote:
> Test comments are detailed in a few cases, while for other, similar parallelism-selected tests they are not. I felt we could make the test comments consistent across the file.

Modified the test case description in the v17 patch set posted
upthread. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Zhihong Yu
Date:
w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch

+ * Push the dest receiver to Gather node when it is either at the top of the
+ * plan or under top Append node unless it does not have any projections to do.

I think the 'unless' should be 'if'. As can be seen from the body of the method:

+       if (!ps->ps_ProjInfo)
+       {
+           GatherState *gstate = (GatherState *) ps;
+
+           parallel = true;

Cheers

On Mon, Dec 28, 2020 at 4:12 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Dec 28, 2020 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Thanks for working on this, I will have a look at the updated patches soon.

Attaching v17 patch set after addressing comments raised in other
threads. Please consider this patch set for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
>
> + * Push the dest receiver to Gather node when it is either at the top of the
> + * plan or under top Append node unless it does not have any projections to do.
>
> I think the 'unless' should be 'if'. As can be seen from the body of the method:
>
> +       if (!ps->ps_ProjInfo)
> +       {
> +           GatherState *gstate = (GatherState *) ps;
> +
> +           parallel = true;

Thanks. Modified it in the 0004 patch. Attaching the v18 patch set. Note
that there is no change in the 0001 to 0003 patches from v17.

Please consider the v18 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Mon, Dec 28, 2020 at 10:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sun, Dec 27, 2020 at 2:20 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > I have reviewed part of v15-0001 patch, I have a few comments, I will
> > > continue to review this.
> >
> > Thanks a lot.
> >
> > > 1.
> > > Why is this temporary hack? and what is the plan for removing this hack?
> >
> > The changes in xact.c, xact.h and heapam.c are common to all the
> > parallel insert patches - COPY, INSERT INTO SELECT. That was the
> > initial comment; I forgot to keep it in sync with the other patches.
> > Now, I used the comment from the INSERT INTO SELECT patch. IIRC, the plan
> > was to have this code in all the parallel insert patches; whichever
> > gets reviewed and committed first, the others will update their patches
> > accordingly.
> >
> > > 2.
> > > +/*
> > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > > + * insertion is possible, if yes set the parallel insert state i.e. push down
> > > + * the dest receiver to the Gather nodes.
> > > + */
> > > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > > +{
> > > +    if (!IS_CTAS(into))
> > > +        return;
> > >
> > > When will this hit?  The function name suggests that it is from CTAS,
> > > but now you have a check that if it is
> > > not for CTAS then return; can you add a comment about when you
> > > expect this case?
> >
> > Yes, it will hit for explain cases, but I chose to remove this and
> > check outside in the explain code, something like:
> > if (into)
> >     ChooseParallelInsertsInCTAS()
> >
> > > Also the function name should start in a new line
> > > i.e
> > > void
> > > ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> >
> > Ah, missed that. Modified now.
> >
> > > 3.
> > > +/*
> > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > > + * insertion is possible, if yes set the parallel insert state i.e. push down
> > > + * the dest receiver to the Gather nodes.
> > > + */
> > >
> > > Push down to the Gather nodes?  I think the right statement will be
> > > push down below the Gather node.
> >
> > Modified.
> >
> > > 4.
> > > intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
> > > {
> > >      DR_intorel *myState = (DR_intorel *) self;
> > >
> > >     -- Comment -> in parallel worker we don't need to create dest recv blah blah
> > > +    if (myState->is_parallel_worker)
> > >     {
> > >         --parallel worker handling--
> > >         return;
> > >     }
> > >
> > >     --non-parallel worker code stay right there, instead of moving to else
> >
> > Done.
> >
> > > 5.
> > > +/*
> > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > > + * insertion is possible, if yes set the parallel insert state i.e. push down
> > > + * the dest receiver to the Gather nodes.
> > > + */
> > > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > > +{
> > >
> > > From the function name and comments it appeared that this function will
> > > return a boolean saying whether
> > > parallel insert should be selected or not.  I think the name/comment
> > > should be improved for this.
> >
> > Yeah, that function can still return void because there is no point in
> > returning bool there, since the intention is to see if parallel inserts can be
> > performed; if yes, set the state, otherwise exit. I changed the
> > function name to TryParallelizingInsertsInCTAS(). Let me know your
> > suggestions if that doesn't work out.
> >
> > > 6.
> > >         /*
> > > +         * For parallelizing inserts in CTAS i.e. making each parallel worker
> > > +         * insert the tuples, we must send information such as into clause (for
> > > +         * each worker to build separate dest receiver), object id (for each
> > > +         * worker to open the created table).
> > >
> > > Comment is saying we need to pass object id but the code under this
> > > comment is not doing so.
> >
> > Improved the comment.
> >
> > > 7.
> > > +        /*
> > > +         * Since there are no rows that are transferred from workers to Gather
> > > +         * node, so we set it to 0 to be visible in estimated row count of
> > > +         * explain plans.
> > > +         */
> > > +        queryDesc->planstate->plan->plan_rows = 0;
> > >
> > > This seems a bit hackish. Why is it done after the planning? I mean,
> > > the plan must know that it is returning 0 rows?
> >
> > This exists to show the estimated row count (in case of EXPLAIN CTAS
> > without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS the actual tuples
> > are shown correctly as 0 because Gather doesn't receive any tuples.
> >     if (es->costs)
> >     {
> >         if (es->format == EXPLAIN_FORMAT_TEXT)
> >         {
> >             appendStringInfo(es->str, "  (cost=%.2f..%.2f rows=%.0f width=%d)",
> >                              plan->startup_cost, plan->total_cost,
> >                              plan->plan_rows, plan->plan_width);
> >
> > Since it's an estimated row count (which may not always be correct), we
> > will let the EXPLAIN plan show that, and I think we can remove that
> > part. Thoughts?
> >
> > I removed it in the v6 patch set.
> >
> > > 8.
> > > +        char *intoclause_space = shm_toc_allocate(pcxt->toc,
> > > +                                                  intoclause_len);
> > > +        memcpy(intoclause_space, intoclausestr, intoclause_len);
> > > +        shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
> > >
> > > One blank line between variable declaration and next code segment,
> > > take care at other places as well.
> >
> > Done.
> >
> > I'm attaching the v16 patch set. Please note that, as discussed in [1],
> > I added documentation saying that parallel insertions can happen, along
> > with a sample EXPLAIN output, to the 0003 patch. But I didn't
> > move the explain output related code to a separate patch because it's
> > a small snippet in explain.c. I hope that's okay.
> >
> > [1] - https://www.postgresql.org/message-id/CAA4eK1JqwXGYoGa1%2B3-f0T50dBGufvKaKQOee_AfFhygZ6QKtA%40mail.gmail.com
> >
>
> Thanks for working on this, I will have a look at the updated patches soon.

I have completed reviewing 0001; I don't have more comments, just one
question.  Soon I will review the remaining patches.

+    /* If parallel inserts are to be allowed, set a few extra information. */
+    if (myState->is_parallel)
+    {
+        myState->object_id = intoRelationAddr.objectId;
+
+        /*
+         * We don't need to skip contacting FSM while inserting tuples for
+         * parallel mode, while extending the relations, workers instead of
+         * blocking on a page while another worker is inserting, can check the
+         * FSM for another page that can accommodate the tuples. This results
+         * in major benefit for parallel inserts.
+         */
+        myState->ti_options = 0;

Is there any performance data for this or just theoretical analysis?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have completed reviewing 0001, I don't have more comments, just one
> question.  Soon I will review the remaining patches.

Thanks.

> +    /* If parallel inserts are to be allowed, set a few extra information. */
> +    if (myState->is_parallel)
> +    {
> +        myState->object_id = intoRelationAddr.objectId;
> +
> +        /*
> +         * We don't need to skip contacting FSM while inserting tuples for
> +         * parallel mode, while extending the relations, workers instead of
> +         * blocking on a page while another worker is inserting, can check the
> +         * FSM for another page that can accommodate the tuples. This results
> +         * in major benefit for parallel inserts.
> +         */
> +        myState->ti_options = 0;
>
> Is there any performance data for this or just theoretical analysis?

I have seen that we don't get much performance with the skip fsm
option, though I don't have the data to back it up. I'm planning to
run performance tests after the patches 0001, 0002 and 0003 get
reviewed. I will capture the data at that time. Hope that's fine.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Wed, 30 Dec 2020 at 10:47 AM, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have completed reviewing 0001, I don't have more comments, just one
> question.  Soon I will review the remaining patches.

Thanks.

> +    /* If parallel inserts are to be allowed, set a few extra information. */
> +    if (myState->is_parallel)
> +    {
> +        myState->object_id = intoRelationAddr.objectId;
> +
> +        /*
> +         * We don't need to skip contacting FSM while inserting tuples for
> +         * parallel mode, while extending the relations, workers instead of
> +         * blocking on a page while another worker is inserting, can check the
> +         * FSM for another page that can accommodate the tuples. This results
> +         * in major benefit for parallel inserts.
> +         */
> +        myState->ti_options = 0;
>
> Is there any performance data for this or just theoretical analysis?

I have seen that we don't get much performance with the skip fsm
option, though I don't have the data to back it up. I'm planning to
run performance tests after the patches 0001, 0002 and 0003 get
reviewed. I will capture the data at that time. Hope that's fine.

Yeah that’s fine
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Wed, Dec 30, 2020 at 10:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, 30 Dec 2020 at 10:47 AM, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> > I have completed reviewing 0001, I don't have more comments, just one
>> > question.  Soon I will review the remaining patches.
>>
>> Thanks.
>>
>> > +    /* If parallel inserts are to be allowed, set a few extra information. */
>> > +    if (myState->is_parallel)
>> > +    {
>> > +        myState->object_id = intoRelationAddr.objectId;
>> > +
>> > +        /*
>> > +         * We don't need to skip contacting FSM while inserting tuples for
>> > +         * parallel mode, while extending the relations, workers instead of
>> > +         * blocking on a page while another worker is inserting, can check the
>> > +         * FSM for another page that can accommodate the tuples. This results
>> > +         * in major benefit for parallel inserts.
>> > +         */
>> > +        myState->ti_options = 0;
>> >
>> > Is there any performance data for this or just theoretical analysis?
>>
>> I have seen that we don't get much performance with the skip fsm
>> option, though I don't have the data to back it up. I'm planning to
>> run performance tests after the patches 0001, 0002 and 0003 get
>> reviewed. I will capture the data at that time. Hope that's fine.
>
>
> Yeah that’s fine
>

Some comments in 0002

1.
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because, no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;


I don't like the naming of these flags.  Especially, there is no need to
define CTAS_PARALLEL_INS_UNDEF; we can directly use 0
for that purpose instead of giving it some weird name.  So I suggest,
first, just get rid of CTAS_PARALLEL_INS_UNDEF.

2.
+ /*
+ * Turn on a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for CTAS and we are generating an upper
+ * level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
  generate_useful_gather_paths(root, rel, false);

+ /*
+ * Reset the ignore flag, in case we turned it on but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ * If we reached cost_gather, we would have been reset it there.
+ */
+ if (ignore && (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }

I think the way we are using these cost-ignoring flags doesn't look clean.

I mean, first, CTAS_PARALLEL_INS_SELECT is set if it is coming from
CTAS, and then ignore_parallel_tuple_cost will
set CTAS_PARALLEL_INS_TUP_COST_CAN_IGN if it satisfies a certain
condition, which is fine.  Now, internally
cost_gather will add CTAS_PARALLEL_INS_TUP_COST_IGNORED and remove
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN, and if
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is not removed then we remove
it outside.  Why do we need to remove the
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN flag at all?

3.
+ if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));

Instead of adding the Assert inside an IF statement, you can convert the
whole statement into an assert.  Let's not add an unnecessary
if in release mode.
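
That is, just a sketch of what I mean:

Assert(!(tuple_cost_flags && gstate->ps.ps_ProjInfo &&
         (*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED)));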

4.
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;

Change this to (if, else) as shown below, because if it goes to the
IF part then ignore_tuple_cost will always be true,
so there is no need to have an extra if check.

if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
(root->parse->CTASParallelInsInfo &
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
{
ignore_tuple_cost = true;
root->parse->CTASParallelInsInfo &=
~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
}
else
run_cost += parallel_tuple_cost * path->path.rows;

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
vignesh C
Date:
On Wed, Dec 30, 2020 at 10:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I have completed reviewing 0001, I don't have more comments, just one
> > question.  Soon I will review the remaining patches.
>
> Thanks.
>
> > +    /* If parallel inserts are to be allowed, set a few extra information. */
> > +    if (myState->is_parallel)
> > +    {
> > +        myState->object_id = intoRelationAddr.objectId;
> > +
> > +        /*
> > +         * We don't need to skip contacting FSM while inserting tuples for
> > +         * parallel mode, while extending the relations, workers instead of
> > +         * blocking on a page while another worker is inserting, can check the
> > +         * FSM for another page that can accommodate the tuples. This results
> > +         * in major benefit for parallel inserts.
> > +         */
> > +        myState->ti_options = 0;
> >
> > Is there any performance data for this or just theoretical analysis?
>
> I have seen that we don't get much performance with the skip fsm
> option, though I don't have the data to back it up. I'm planning to
> run performance tests after the patches 0001, 0002 and 0003 get
> reviewed. I will capture the data at that time. Hope that's fine.
>

When you run the performance tests, you can try to capture and publish
the relation size & the number of pages that get created for the base
table and the CTAS table; you can use something like SELECT relpages
FROM pg_class WHERE relname = 'tablename' and SELECT
pg_total_relation_size('tablename'). Just to make sure that there is
no significant difference between the base table and the CTAS table.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
vignesh C
Date:
On Wed, Dec 30, 2020 at 9:25 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> > w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
> >
> > + * Push the dest receiver to Gather node when it is either at the top of the
> > + * plan or under top Append node unless it does not have any projections to do.
> >
> > I think the 'unless' should be 'if'. As can be seen from the body of the method:
> >
> > +       if (!ps->ps_ProjInfo)
> > +       {
> > +           GatherState *gstate = (GatherState *) ps;
> > +
> > +           parallel = true;
>
> Thanks. Modified it in the 0004 patch. Attaching v18 patch set. Note
> that no change in 0001 to 0003 patches from v17.
>
> Please consider v18 patch set for further review.
>

A few comments:
-       /*
-        * To allow parallel inserts, we need to ensure that they are safe to be
-        * performed in workers. We have the infrastructure to allow parallel
-        * inserts in general except for the cases where inserts generate a new
-        * CommandId (eg. inserts into a table having a foreign key column).
-        */
-       if (IsParallelWorker())
-               ereport(ERROR,
-                               (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
-                                errmsg("cannot insert tuples in a
parallel worker")));

Is it possible to add a check here for whether it is a CTAS insert, as we
do not support inserts in parallel workers from anything else as of now?

+       Oid                     objectid;               /* workers to
open relation/table.  */
+       /* Number of tuples inserted by all the workers. */
+       pg_atomic_uint64        processed;

We can just mention relation instead of relation/table.

+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+                      explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+   Workers Planned: 4
+   Workers Launched: N
+ ->  Create parallel_write
+   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;

Can we include a selection of cmin, xmin in one of the tests to verify
that the parallel workers use the same transaction id,
something like:
select distinct(cmin,xmin) from parallel_write;

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
Thanks for the comments.

How about naming these more generically, as below, and placing them in
parallel.h so that they can also be used for refresh materialized view?

+typedef enum ParallelInsertTupleCostOpt
+{
+ PINS_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PINS_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PINS_TUP_COST_IGNORED = 1 << 2

My plan was to first get the main design idea of pushing the dest receiver
down to Gather reviewed. Once that is agreed upon, I thought of making a few
functions common and placing them in parallel.h and parallel.c so that
they can be used for Parallel Inserts in REFRESH MATERIALIZED VIEW,
because the same design idea can be applied there as well.

For instance my thoughts are: add the below structures, functions and
other macros to parallel.h and parallel.c:
typedef enum ParallelInsertKind
{
    PINS_UNDEF = 0,
    PINS_CREATE_TABLE_AS,
    PINS_REFRESH_MAT_VIEW
} ParallelInsertKind;

typedef struct ParallelInsertCTASInfo
{
    IntoClause *intoclause;
    Oid objectid;
} ParallelInsertCTASInfo;

typedef struct ParallelInsertRMVInfo
{
    Oid objectid;
} ParallelInsertRMVInfo;

ExecInitParallelPlan(PlanState *planstate, EState *estate,
                      Bitmapset *sendParams, int nworkers,
-                     int64 tuples_needed)
+                     int64 tuples_needed, ParallelInsertKind pinskind,
+                     void *pinsinfo)

Change ExecParallelInsertInCTAS to

+static void
+ExecParallelInsert(GatherState *node)
+{

Change SetCTASParallelInsertState to
+void
+SetParallelInsertState(QueryDesc *queryDesc)

Change IsParallelInsertionAllowedInCTAS to

+bool
+IsParallelInsertionAllowed(ParallelInsertKind pinskind, IntoClause *into)
+{

Thoughts?

If okay, I can work on these points and add a new patch into the patch
set that will have changes for parallel inserts in REFRESH
MATERIALIZED VIEW.
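
For example, with the above, a CTAS caller would look roughly as below
(just a sketch of the intended usage, with the names proposed above):

/* in createas.c, once the plan is ready */
if (IsParallelInsertionAllowed(PINS_CREATE_TABLE_AS, into))
    SetParallelInsertState(queryDesc);

/* while launching workers from the Gather node */
ParallelInsertCTASInfo info;

info.intoclause = into;
info.objectid = objectid;

pei = ExecInitParallelPlan(planstate, estate, sendParams, nworkers,
                           tuples_needed, PINS_CREATE_TABLE_AS,
                           (void *) &info);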

On Wed, Dec 30, 2020 at 3:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Some comments in 0002
>
> 1.
> +/*
> + * Information sent to the planner from CTAS to account for the cost
> + * calculations in cost_gather. We need to do this because, no tuples will be
> + * received by the Gather node if the workers insert the tuples in parallel.
> + */
> +typedef enum CTASParallelInsertOpt
> +{
> + CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
> + CTAS_PARALLEL_INS_SELECT = 1 << 0, /* turn on this before planning */
> + /*
> + * Turn on this while planning for upper Gather path to ignore parallel
> + * tuple cost in cost_gather.
> + */
> + CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
> + /* Turn on this after the cost is ignored. */
> + CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
> +} CTASParallelInsertOpt;
>
>
> I don't like the naming of these flags.  Especially, there is no need to
> define CTAS_PARALLEL_INS_UNDEF; we can directly use 0
> for that purpose instead of giving it some weird name.  So I suggest,
> first, just get rid of CTAS_PARALLEL_INS_UNDEF.

+1. I will change it in the next version of the patch.

> 2.
> + /*
> + * Turn on a flag to ignore parallel tuple cost by the Gather path in
> + * cost_gather if the SELECT is for CTAS and we are generating an upper
> + * level Gather path.
> + */
> + bool ignore = ignore_parallel_tuple_cost(root);
> +
>   generate_useful_gather_paths(root, rel, false);
>
> + /*
> + * Reset the ignore flag, in case we turned it on but
> + * generate_useful_gather_paths returned without reaching cost_gather.
> + * If we reached cost_gather, we would have been reset it there.
> + */
> + if (ignore && (root->parse->CTASParallelInsInfo &
> + CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
> + {
> + root->parse->CTASParallelInsInfo &=
> + ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
> + }
>
> I think the way we are using these cost-ignoring flags doesn't look clean.
>
> I mean first, CTAS_PARALLEL_INS_SELECT is set if it is coming from
> CTAS and then ignore_parallel_tuple_cost will
> set the CTAS_PARALLEL_INS_TUP_COST_CAN_IGN if it satisfies certain
> condition which is fine.  Now, internally cost
> gather will add CTAS_PARALLEL_INS_TUP_COST_IGNORED and remove
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN and if
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is not removed then we will remove
> it outside.  Why do we need to remove
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN flag at all?

Yes we don't need to remove the CTAS_PARALLEL_INS_TUP_COST_CAN_IGN
flag. I will change it in the next version.

> 3.
> + if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
> + Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
>
> Instead of adding the Assert inside an IF statement, you can convert the
> whole statement into an assert.  Let's not add an unnecessary
> if in release mode.

+1. I will change it in the next version.

> 4.
> + if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
> + (root->parse->CTASParallelInsInfo &
> + CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
> + {
> + ignore_tuple_cost = true;
> + root->parse->CTASParallelInsInfo &=
> + ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
> + root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
> + }
> +
> + if (!ignore_tuple_cost)
> + run_cost += parallel_tuple_cost * path->path.rows;
>
> Change this to (if, else) as shown below, because if it goes to the
> IF part then ignore_tuple_cost will always be true,
> so there is no need to have an extra if check.
>
> if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
> (root->parse->CTASParallelInsInfo &
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
> {
> ignore_tuple_cost = true;
> root->parse->CTASParallelInsInfo &=
> ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
> root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
> }
> else
> run_cost += parallel_tuple_cost * path->path.rows;

+1. I will change it in the next version.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Wed, Dec 30, 2020 at 7:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks for the comments.
>
> How about naming like below more generically and placing them in
> parallel.h so that it will also be used for refresh materialized view?
>
> +typedef enum ParallelInsertTupleCostOpt
> +{
> + PINS_SELECT_QUERY = 1 << 0, /* turn on this before planning */
> + /*
> + * Turn on this while planning for upper Gather path to ignore parallel
> + * tuple cost in cost_gather.
> + */
> + PINS_CAN_IGN_TUP_COST = 1 << 1,
> + /* Turn on this after the cost is ignored. */
> + PINS_TUP_COST_IGNORED = 1 << 2
>
> My plan was to get the main design idea of pushing the dest receiver
> to gather reviewed and once agreed, then I thought of making few
> functions common and place them in parallel.h and parallel.c so that
> they can be used for Parallel Inserts in REFRESH MATERIALIZED VIEW
> because the same design idea can be applied there as well.


I think instead of PINS_* we can name them PARALLEL_INSERT_*; other than
that I am fine with the names.
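
That is, something like:

typedef enum ParallelInsertTupleCostOpt
{
    PARALLEL_INSERT_SELECT_QUERY = 1 << 0, /* turn on before planning */
    /* upper Gather may ignore the parallel tuple cost in cost_gather */
    PARALLEL_INSERT_CAN_IGN_TUP_COST = 1 << 1,
    PARALLEL_INSERT_TUP_COST_IGNORED = 1 << 2 /* cost was ignored */
} ParallelInsertTupleCostOpt;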

> For instance my thoughts are: add the below structures, functions and
> other macros to parallel.h and parallel.c:
> typedef enum ParallelInsertKind
> {
>     PINS_UNDEF = 0,
>     PINS_CREATE_TABLE_AS,
>     PINS_REFRESH_MAT_VIEW
> } ParallelInsertKind;
>
> typedef struct ParallelInsertCTASInfo
> {
>     IntoClause *intoclause;
>     Oid objectid;
> } ParallelInsertCTASInfo;
>
> typedef struct ParallelInsertRMVInfo
> {
>     Oid objectid;
> } ParallelInsertRMVInfo;
>
> ExecInitParallelPlan(PlanState *planstate, EState *estate,
>                       Bitmapset *sendParams, int nworkers,
> -                     int64 tuples_needed)
> +                     int64 tuples_needed, ParallelInsertKind pinskind,
> +                     void *pinsinfo)
>
> Change ExecParallelInsertInCTAS to
>
> +static void
> +ExecParallelInsert(GatherState *node)
> +{
>
> Change SetCTASParallelInsertState to
> +void
> +SetParallelInsertState(QueryDesc *queryDesc)
>
> Change IsParallelInsertionAllowedInCTAS to
>
> +bool
> +IsParallelInsertionAllowed(ParallelInsertKind pinskind, IntoClause *into)
> +{
>
> Thoughts?
>

I haven’t thought about these structures yet but yeah making them
generic will be good.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Dec 30, 2020 at 5:26 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, Dec 30, 2020 at 10:47 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > I have completed reviewing 0001, I don't have more comments, just one
> > > question.  Soon I will review the remaining patches.
> >
> > Thanks.
> >
> > > +    /* If parallel inserts are to be allowed, set a few extra information. */
> > > +    if (myState->is_parallel)
> > > +    {
> > > +        myState->object_id = intoRelationAddr.objectId;
> > > +
> > > +        /*
> > > +         * We don't need to skip contacting FSM while inserting tuples for
> > > +         * parallel mode, while extending the relations, workers instead of
> > > +         * blocking on a page while another worker is inserting, can check the
> > > +         * FSM for another page that can accommodate the tuples. This results
> > > +         * in major benefit for parallel inserts.
> > > +         */
> > > +        myState->ti_options = 0;
> > >
> > > Is there any performance data for this or just theoretical analysis?
> >
> > I have seen that we don't get much performance with the skip fsm
> > option, though I don't have the data to back it up. I'm planning to
> > run performance tests after the patches 0001, 0002 and 0003 get
> > reviewed. I will capture the data at that time. Hope that's fine.
> >
>
> When you run the performance tests, you can try to capture and publish
> the relation size & the number of pages that get created for the base
> table and the CTAS table; you can use something like SELECT relpages
> FROM pg_class WHERE relname = 'tablename' and SELECT
> pg_total_relation_size('tablename'). Just to make sure that there is
> no significant difference between the base table and the CTAS table.

I can do that. I'm sure the number of pages will be equal or a little
more, since I observed this for parallel copy.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Dec 30, 2020 at 5:28 PM vignesh C <vignesh21@gmail.com> wrote:
> Few comments:
> -       /*
> -        * To allow parallel inserts, we need to ensure that they are safe to be
> -        * performed in workers. We have the infrastructure to allow parallel
> -        * inserts in general except for the cases where inserts generate a new
> -        * CommandId (eg. inserts into a table having a foreign key column).
> -        */
> -       if (IsParallelWorker())
> -               ereport(ERROR,
> -                               (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> -                                errmsg("cannot insert tuples in a
> parallel worker")));
>
> Is it possible to add a check here for whether it is a CTAS insert, as we
> do not support inserts in parallel workers from anything else as of now?

Currently, there's no global variable with which we can selectively skip
this in the case of parallel insertion in CTAS. How about having a
variable in one of the worker global contexts, setting it when parallel
insertion is chosen for CTAS, and using it in heap_prepare_insert() to
skip the above error? Eventually, we can remove this restriction
entirely in case we fully allow parallelism for INSERT INTO SELECT,
CTAS, and COPY.

Thoughts?
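
A rough sketch of what I have in mind (the variable name here is just
for illustration):

/* worker-global state, set while starting a parallel-insert worker */
bool is_parallel_insertion_allowed = false;

/* in heap_prepare_insert() */
if (IsParallelWorker() && !is_parallel_insertion_allowed)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("cannot insert tuples in a parallel worker")));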

> +       Oid                     objectid;               /* workers to
> open relation/table.  */
> +       /* Number of tuples inserted by all the workers. */
> +       pg_atomic_uint64        processed;
>
> We can just mention relation instead of relation/table.

I will modify it in the next patch set.

> +select explain_pictas(
> +'create table parallel_write as select length(stringu1) from tenk1;');
> +                      explain_pictas
> +----------------------------------------------------------
> + Gather (actual rows=N loops=N)
> +   Workers Planned: 4
> +   Workers Launched: N
> + ->  Create parallel_write
> +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> +(5 rows)
> +
> +select count(*) from parallel_write;
>
> Can we include a selection of cmin, xmin in one of the tests to verify
> that the parallel workers use the same transaction id,
> something like:
> select distinct(cmin,xmin) from parallel_write;

This is not possible since cmin and xmin are dynamic; we cannot use
them in test cases. I think it's not necessary to check whether the
leader and workers are in the same txn or not, since we are not
creating a new txn. All the txn state from the leader is serialized in
SerializeTransactionState and restored in
StartParallelWorkerTransaction.
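
For reference, the existing machinery already takes care of this,
roughly as follows (simplified; see parallel.c and xact.c):

/* leader, while setting up the DSM in InitializeParallelDSM() */
SerializeTransactionState(tstatelen, tstatespace);

/* worker, during startup in ParallelWorkerMain() */
StartParallelWorkerTransaction(tstatespace);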

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 30-12-2020 04:55, Bharath Rupireddy wrote:
> On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>> w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
>>
>> + * Push the dest receiver to Gather node when it is either at the top of the
>> + * plan or under top Append node unless it does not have any projections to do.
>>
>> I think the 'unless' should be 'if'. As can be seen from the body of the method:
>>
>> +       if (!ps->ps_ProjInfo)
>> +       {
>> +           GatherState *gstate = (GatherState *) ps;
>> +
>> +           parallel = true;
> 
> Thanks. Modified it in the 0004 patch. Attaching v18 patch set. Note
> that no change in 0001 to 0003 patches from v17.
> 
> Please consider v18 patch set for further review.
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Hi,

Sorry it took so long to get back to reviewing this.

wrt v18-0001....patch:

+        /*
+         * If the worker is for parallel insert in CTAS, then use the proper
+         * dest receiver.
+         */
+        intoclause = (IntoClause *) stringToNode(intoclausestr);
+        receiver = CreateIntoRelDestReceiver(intoclause);
+        ((DR_intorel *)receiver)->is_parallel_worker = true;
+        ((DR_intorel *)receiver)->object_id = fpes->objectid;
I would move this into a function called e.g. 
GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in 
createas.c.
I would then also split up intorel_startup into intorel_leader_startup
and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set 
self->pub.rStartup to intorel_worker_startup.


+    volatile pg_atomic_uint64    *processed;
why is it volatile?


+            if (isctas)
+            {
+                intoclause = ((DR_intorel *) node->dest)->into;
+                objectid = ((DR_intorel *) node->dest)->object_id;
+            }
Given that you extract them each once and then pass them directly into 
the parallel-worker, can't you instead pass in the destreceiver and 
leave that logic to ExecInitParallelPlan?


+                    if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+                        ((DR_intorel *) gstate->dest)->into->rel &&
+                        ((DR_intorel *) gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted? 
because it seems from the intorel_startup function that that would be 
set as soon as startup was done, which i assume (wrongly?) is always done?


+     * In case if no workers were launched, allow the leader to insert entire
+     * tuples.
what does "entire tuples" mean? should it maybe be "all tuples"?


================
wrt v18-0002....patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to 
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

what i'm wondering is why you opted to put logic around 
generate_useful_gather_paths and in cost_gather when to me it seems more 
logical to put it in create_gather_path? i'm probably missing something 
there?


================
wrt v18-0003....patch:

not sure if it is needed, but i was wondering if we would want more 
tests with multiple gather nodes existing? caused e.g. by using CTEs, 
valid subqueries (like the one test you have, but without the group 
by/having)?


Kind regards,
Luc



RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
Hi

> ================
> wrt v18-0002....patch:
> 
> It looks like this introduces a state machine that goes like:
> - starts at CTAS_PARALLEL_INS_UNDEF
> - possibly moves to CTAS_PARALLEL_INS_SELECT
> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
> - if both were added at some stage, we can go to
> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
> 
> what i'm wondering is why you opted to put logic around
> generate_useful_gather_paths and in cost_gather when to me it seems more
> logical to put it in create_gather_path? i'm probably missing something
> there?

IMO, the reason is we want to make sure we only ignore the cost when Gather is the top node.
And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place, which can only create a top-node Gather.
So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node. 


Best regards,
houzj



Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 04-01-2021 12:16, Hou, Zhijie wrote:
> Hi
> 
>> ================
>> wrt v18-0002....patch:
>>
>> It looks like this introduces a state machine that goes like:
>> - starts at CTAS_PARALLEL_INS_UNDEF
>> - possibly moves to CTAS_PARALLEL_INS_SELECT
>> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
>> - if both were added at some stage, we can go to
>> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
>>
>> what i'm wondering is why you opted to put logic around
>> generate_useful_gather_paths and in cost_gather when to me it seems more
>> logical to put it in create_gather_path? i'm probably missing something
>> there?
> 
> > IMO, the reason is we want to make sure we only ignore the cost when Gather is the top node.
> > And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place, which can only create a top-node Gather.
> > So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node.
> 
> 
> Best regards,
> houzj
> 
> 

Hi,

I was wondering actually if we need the state machine. Reason is that 
AFAICS the code could be placed in create_gather_path, where you can 
also check if it is a top gather node, whether the dest receiver is the 
right type, etc? To me that seems like a nicer solution, as it makes 
all the logic that decides whether or not a parallel CTAS is valid live 
in a single place instead of being distributed over various places.

Kind regards,
Luc



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, Dec 31, 2020 at 10:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > How about naming like below more generically and placing them in
> > parallel.h so that it will also be used for refresh materialized view?
> >
> > +typedef enum ParallelInsertTupleCostOpt
> > +{
> > + PINS_SELECT_QUERY = 1 << 0, /* turn on this before planning */
> > + /*
> > + * Turn on this while planning for upper Gather path to ignore parallel
> > + * tuple cost in cost_gather.
> > + */
> > + PINS_CAN_IGN_TUP_COST = 1 << 1,
> > + /* Turn on this after the cost is ignored. */
> > + PINS_TUP_COST_IGNORED = 1 << 2
> >
> > My plan was to get the main design idea of pushing the dest receiver
> > to gather reviewed and once agreed, then I thought of making few
> > functions common and place them in parallel.h and parallel.c so that
> > they can be used for Parallel Inserts in REFRESH MATERIALIZED VIEW
> > because the same design idea can be applied there as well.
>
> I think instead of PINS_* we can name PARALLEL_INSERT_* other than
> that I am fine with the name.

Done.

>
> > For instance my thoughts are: add the below structures, functions and
> > other macros to parallel.h and parallel.c:
> > typedef enum ParallelInsertKind
> > {
> >     PINS_UNDEF = 0,
> >     PINS_CREATE_TABLE_AS,
> >     PINS_REFRESH_MAT_VIEW
> > } ParallelInsertKind;
> >
> > typedef struct ParallelInsertCTASInfo
> > {
> >     IntoClause *intoclause;
> >     Oid objectid;
> > } ParallelInsertCTASInfo;
> >
> > typedef struct ParallelInsertRMVInfo
> > {
> >     Oid objectid;
> > } ParallelInsertRMVInfo;
> >
> > ExecInitParallelPlan(PlanState *planstate, EState *estate,
> >                       Bitmapset *sendParams, int nworkers,
> > -                     int64 tuples_needed)
> > +                     int64 tuples_needed, ParallelInsertKind pinskind,
> > +                     void *pinsinfo)
> >
> > Change ExecParallelInsertInCTAS to
> >
> > +static void
> > +ExecParallelInsert(GatherState *node)
> > +{
> >
> > Change SetCTASParallelInsertState to
> > +void
> > +SetParallelInsertState(QueryDesc *queryDesc)
> >
> > Change IsParallelInsertionAllowedInCTAS to
> >
> > +bool
> > +IsParallelInsertionAllowed(ParallelInsertKind pinskind, IntoClause *into)
> > +{
> >
> > Thoughts?
> >
>
> I haven’t thought about these structures yet but yeah making them
> generic will be good.

Attaching the v19 patch set. It has the following changes: 1) generic code
which can easily be extended to parallel inserts in REFRESH
MATERIALIZED VIEW and to parallelizing the COPY TO command; 2) addressing
the review comments received so far.

Once these patches are reviewed and get to the commit stage, I can
post a separate patch (probably in a separate thread) for parallel
inserts in Refresh Materialized View based on this patch set.

Please review the v19 patch set further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc@swarm64.com> wrote:
> Sorry it took so long to get back to reviewing this.

Thanks for the comments.

> wrt v18-0001....patch:
>
> +               /*
> +                * If the worker is for parallel insert in CTAS, then use the proper
> +                * dest receiver.
> +                */
> +               intoclause = (IntoClause *) stringToNode(intoclausestr);
> +               receiver = CreateIntoRelDestReceiver(intoclause);
> +               ((DR_intorel *)receiver)->is_parallel_worker = true;
> +               ((DR_intorel *)receiver)->object_id = fpes->objectid;
> I would move this into a function called e.g.
> GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
> createas.c.
> I would then also split up intorel_startup into intorel_leader_startup
> and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
> self->pub.rStartup to intorel_worker_startup.

My intention was to not add any new APIs to the dest receiver. I simply made the changes in intorel_startup, in which workers just do the minimal work and exit from it. In the leader, most of the table creation and sanity checks are kept untouched. Please have a look at the v19 patch posted upthread [1].

> +       volatile pg_atomic_uint64       *processed;
> why is it volatile?

The intention is to always read from the actual memory location. I took it from the way pg_atomic_fetch_add_u64_impl, pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their u32 counterparts pass the parameter as volatile pg_atomic_uint64 *ptr.
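
For reference, the counter is only ever touched through the atomics API,
along these lines (the local count variable names are just illustrative):

/* each worker, after finishing its share of inserts */
pg_atomic_add_fetch_u64(myState->processed, tuples_inserted_by_worker);

/* leader, while reporting the total to the client */
total = pg_atomic_read_u64(myState->processed) + tuples_inserted_by_leader;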

> +                       if (isctas)
> +                       {
> +                               intoclause = ((DR_intorel *) node->dest)->into;
> +                               objectid = ((DR_intorel *) node->dest)->object_id;
> +                       }
> Given that you extract them each once and then pass them directly into
> the parallel-worker, can't you instead pass in the destreceiver and
> leave that logic to ExecInitParallelPlan?

That's changed entirely in the v19 patch set posted upthread [1]. Please have a look. To keep the API generic, I didn't pass the dest receiver; I passed the parallel insert command type and a void * pointer to the insertion command's information, because the information we pass to the workers depends on the insertion command (for instance, the workers need the into clause and the object id for CTAS, and the object id for Refresh Mat View).

>
> +                                       if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> +                                               ((DR_intorel *) gstate->dest)->into->rel &&
> +                                               ((DR_intorel *) gstate->dest)->into->rel->relname)
> why would rel and relname not be there? if no rows have been inserted?
> because it seems from the intorel_startup function that that would be
> set as soon as startup was done, which i assume (wrongly?) is always done?

Actually, that into clause rel variable is always set in gram.y for CTAS, Create Materialized View and SELECT INTO (because the qualified_name non-terminal is not optional). My bad. I just added it as a sanity check. Actually, it's not required.

create_as_target:
            qualified_name opt_column_list table_access_method_clause
            OptWith OnCommitOption OptTableSpace
                {
                    $$ = makeNode(IntoClause);
                    $$->rel = $1;
create_mv_target:
            qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
            {
                $$ = makeNode(IntoClause);
                $$->rel = $1;
into_clause:
            INTO OptTempTableName
            {
                $$ = makeNode(IntoClause);
               $$->rel = $2;

I will change the below code:
+                    if (GetParallelInsertCmdType(gstate->dest) ==
+                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+                        ((DR_intorel *) gstate->dest)->into &&
+                        ((DR_intorel *) gstate->dest)->into->rel &&
+                        ((DR_intorel *) gstate->dest)->into->rel->relname)
+                    {

to:
+                    if (GetParallelInsertCmdType(gstate->dest) ==
+                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+                    {

I will update this in the next version of the patch set.

>
> +        * In case if no workers were launched, allow the leader to insert entire
> +        * tuples.
> what does "entire tuples" mean? should it maybe be "all tuples"?

Yeah, I noticed that while working on the v19 patch set. Please have a look at the v19 patch posted upthread [1].

> ================
> wrt v18-0003....patch:
>
> not sure if it is needed, but i was wondering if we would want more
> tests with multiple gather nodes existing? caused e.g. by using CTE's,
> valid subquery's (like the one test you have, but without the group
> by/having)?

I'm not sure if we can have CTAS/CMV/SELECT INTO in CTEs like WITH and WITH RECURSIVE, and I don't see that any of the WITH clause processing hits the createas.c functions. So, IMHO, we don't need to add them. Please let me know if there are any specific use cases you have in mind.

For instance, I tried to cover Init/Sub Plan and Subquery cases with:

below case has multiple Gather, Init Plan:
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+    (select two from (select * from tenk2) as tt limit 1) col2
+    from tenk1  where tenk1.four = 3;');

below case has Gather, Sub Plan:
+-- parallel inserts must not occur, as there is sub plan that gets executed by
+-- the Gather node in leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+    (select tenk1.two from generate_series(1,1)) col2
+    from tenk1  where tenk1.four = 3;');

For multiple Gather node cases, I covered them with the Union All/Append cases in the 0004 patch. Please have a look.

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Jan 4, 2021 at 5:44 PM Luc Vlaming <luc@swarm64.com> wrote:
> On 04-01-2021 12:16, Hou, Zhijie wrote:
> >> ================
> >> wrt v18-0002....patch:
> >>
> >> It looks like this introduces a state machine that goes like:
> >> - starts at CTAS_PARALLEL_INS_UNDEF
> >> - possibly moves to CTAS_PARALLEL_INS_SELECT
> >> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
> >> - if both were added at some stage, we can go to
> >> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
> >>
> >> what i'm wondering is why you opted to put logic around
> >> generate_useful_gather_paths and in cost_gather when to me it seems more
> >> logical to put it in create_gather_path? i'm probably missing something
> >> there?
> >
> > IMO, The reason is we want to make sure we only ignore the cost when Gather is the top node.
> > And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place which can only create top node Gather.

> > So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node.

Right. We wanted to ignore parallel tuple cost for only the upper Gather path.

> I was wondering actually if we need the state machine. Reason is that as
> AFAICS the code could be placed in create_gather_path, where you can
> also check if it is a top gather node, whether the dest receiver is the
> right type, etc? To me that seems like a nicer solution as it makes
> that all logic that decides whether or not a parallel CTAS is valid is
> in a single place instead of distributed over various places.

IMO, we can't determine in create_gather_path that we are going to
generate the top Gather path. To decide whether the top Gather path
will be generated, it's not enough to check root->query_level == 1;
we also need to rely on where generate_useful_gather_paths gets called
from. For instance, for query_level 1, generate_useful_gather_paths
gets called from 2 places in apply_scanjoin_target_to_paths. Likewise,
create_gather_path also gets called from many places. IMO, the current
way, i.e. setting the flag in apply_scanjoin_target_to_paths and
ignoring the tuple cost based on it in cost_gather, seems safe.
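
To illustrate the flag pattern being described, here is a minimal sketch (the CTAS_PARALLEL_INS_* names are from the v18-0002 patch quoted upthread; the state variable name and the exact placement are assumptions, not the patch itself):

/* in apply_scanjoin_target_to_paths(), only around the top-level calls */
ctas_parallel_ins_state |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
generate_useful_gather_paths(root, rel, false);
ctas_parallel_ins_state &= ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;

/* in cost_gather(): if the flag allows it, ignore the leader-transfer cost */
if (ctas_parallel_ins_state & CTAS_PARALLEL_INS_TUP_COST_CAN_IGN)
    ctas_parallel_ins_state |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
else
    run_cost += parallel_tuple_cost * path->path.rows;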

I may be wrong. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > +                                       if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> > +                                               ((DR_intorel *) gstate->dest)->into->rel &&
> > +                                               ((DR_intorel *) gstate->dest)->into->rel->relname)
> > why would rel and relname not be there? if no rows have been inserted?
> > because it seems from the intorel_startup function that that would be
> > set as soon as startup was done, which i assume (wrongly?) is always done?
>
> Actually, that into clause rel variable is always being set in the gram.y for CTAS, Create Materialized View and SELECT INTO (because qualified_name non-terminal is not optional). My bad. I just added it as a sanity check. Actually, it's not required.
>
> create_as_target:
>             qualified_name opt_column_list table_access_method_clause
>             OptWith OnCommitOption OptTableSpace
>                 {
>                     $$ = makeNode(IntoClause);
>                     $$->rel = $1;
> create_mv_target:
>             qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
>             {
>                 $$ = makeNode(IntoClause);
>                 $$->rel = $1;
> into_clause:
>             INTO OptTempTableName
>             {
>                 $$ = makeNode(IntoClause);
>                $$->rel = $2;
>
> I will change the below code:
> +                    if (GetParallelInsertCmdType(gstate->dest) ==
> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> +                        ((DR_intorel *) gstate->dest)->into &&
> +                        ((DR_intorel *) gstate->dest)->into->rel &&
> +                        ((DR_intorel *) gstate->dest)->into->rel->relname)
> +                    {
>
> to:
> +                    if (GetParallelInsertCmdType(gstate->dest) ==
> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +                    {
>
> I will update this in the next version of the patch set.

Attaching v20 patch set that has above change in 0001 patch, note that
0002 to 0004 patches have no changes from v19. Please consider the v20
patch set for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
vignesh C
Date:
On Mon, Jan 4, 2021 at 3:07 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Dec 30, 2020 at 5:28 PM vignesh C <vignesh21@gmail.com> wrote:
> > Few comments:
> > -       /*
> > -        * To allow parallel inserts, we need to ensure that they are safe to be
> > -        * performed in workers. We have the infrastructure to allow parallel
> > -        * inserts in general except for the cases where inserts generate a new
> > -        * CommandId (eg. inserts into a table having a foreign key column).
> > -        */
> > -       if (IsParallelWorker())
> > -               ereport(ERROR,
> > -                               (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> > -                                errmsg("cannot insert tuples in a
> > parallel worker")));
> >
> > Is it possible to add a check if it is a CTAS insert here as we do not
> > support insert in parallel workers from others as of now.
>
> Currently, there's no global variable with which we can selectively skip
> this error in case of parallel insertion in CTAS. How about having a
> variable in any of the worker global contexts, set that when parallel
> insertion is chosen for CTAS and use that in heap_prepare_insert() to
> skip the above error? Eventually, we can remove this restriction
> entirely in case we fully allow parallelism for INSERT INTO SELECT,
> CTAS, and COPY.
>
> Thoughts?

Yes, I felt that the leader can store the command type as CTAS and the
leader/worker can use it to check and throw the error. A similar
change can be used for the other parallel insert patches, and once all
the patches are committed, we can remove it eventually.

>
> > +       Oid                     objectid;               /* workers to
> > open relation/table.  */
> > +       /* Number of tuples inserted by all the workers. */
> > +       pg_atomic_uint64        processed;
> >
> > We can just mention relation instead of relation/table.
>
> I will modify it in the next patch set.
>
> > +select explain_pictas(
> > +'create table parallel_write as select length(stringu1) from tenk1;');
> > +                      explain_pictas
> > +----------------------------------------------------------
> > + Gather (actual rows=N loops=N)
> > +   Workers Planned: 4
> > +   Workers Launched: N
> > + ->  Create parallel_write
> > +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> > +(5 rows)
> > +
> > +select count(*) from parallel_write;
> >
> > Can we include selection of cmin, xmin for one of the test to verify
> > that it uses the same transaction id  in the parallel workers
> > something like:
> > select distinct(cmin,xmin) from parallel_write;
>
> This is not possible since cmin and xmin are dynamic, we can not use
> them in test cases. I think it's not necessary to check whether the
> leader and workers are in the same txn or not, since we are not
> creating a new txn. All the txn state from the leader is serialized in
> SerializeTransactionState and restored in
> StartParallelWorkerTransaction.
>

I had seen in your patch that you serialize and use the same
transaction, but it will be good if you can have at least one test
case to validate that the leader and worker both use the same
transaction. To solve the problem that you are facing where cmin and
xmin are dynamic, you can check the distinct count by using something
like below:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  t1) as dt;

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 04-01-2021 14:32, Bharath Rupireddy wrote:
> On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc@swarm64.com> wrote:
>  > Sorry it took so long to get back to reviewing this.
> 
> Thanks for the comments.
> 
>  > wrt v18-0001....patch:
>  >
>  > +               /*
>  > +                * If the worker is for parallel insert in CTAS, then use the proper
>  > +                * dest receiver.
>  > +                */
>  > +               intoclause = (IntoClause *) stringToNode(intoclausestr);
>  > +               receiver = CreateIntoRelDestReceiver(intoclause);
>  > +               ((DR_intorel *)receiver)->is_parallel_worker = true;
>  > +               ((DR_intorel *)receiver)->object_id = fpes->objectid;
>  > I would move this into a function called e.g.
>  > GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
>  > createas.c.
>  > I would then also split up intorel_startup into intorel_leader_startup
>  > and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
>  > self->pub.rStartup to intorel_worker_startup.
> 
> My intention was to not add any new APIs to the dest receiver. I simply 
> made the changes in intorel_startup, in which for workers it just does 
> the minimalistic work and exit from it. In the leader most of the table 
> creation and sanity check is kept untouched. Please have a look at the 
> v19 patch posted upthread [1].
> 

Looks much better, really nicely abstracted away in the v20 patch.

>  > +       volatile pg_atomic_uint64       *processed;
>  > why is it volatile?
> 
> Intention is to always read from the actual memory location. I referred
> to the way pg_atomic_fetch_add_u64_impl,
> pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
> u32 counterparts pass the parameter as volatile pg_atomic_uint64 *ptr.
> 

Okay, I had not seen this syntax before for atomics with the volatile
keyword, but it's apparently how the atomics abstraction works in postgresql.
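
For reference, a minimal sketch of how the shared counter is used with that abstraction (fpes->processed and es_processed are from the patch; the exact call sites are paraphrased, not quoted):

/* leader, while initializing the shared memory state */
pg_atomic_init_u64(&fpes->processed, 0);

/* each worker, after executing its plan fragment */
pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);

/* leader, combining the workers' counts with its own */
uint64      total = pg_atomic_read_u64(&fpes->processed);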

>  > +                       if (isctas)
>  > +                       {
>  > +                               intoclause = ((DR_intorel *) node->dest)->into;
>  > +                               objectid = ((DR_intorel *) node->dest)->object_id;
>  > +                       }
>  > Given that you extract them each once and then pass them directly into
>  > the parallel-worker, can't you instead pass in the destreceiver and
>  > leave that logic to ExecInitParallelPlan?
> 
> That's changed entirely in the v19 patch set posted upthread [1]. Please 
> have a look. I didn't pass the dest receiver, to keep the API generic, I 
> passed parallel insert command type and a void * ptr which points to 
> insertion command because the information we pass to workers depends on 
> the insertion command (for instance, the information needed by workers 
> is for CTAS into clause and object id and for Refresh Mat View object id).
> 
>  >
>  > +                                       if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
>  > +                                               ((DR_intorel *) gstate->dest)->into->rel &&
>  > +                                               ((DR_intorel *) gstate->dest)->into->rel->relname)
>  > why would rel and relname not be there? if no rows have been inserted?
>  > because it seems from the intorel_startup function that that would be
>  > set as soon as startup was done, which i assume (wrongly?) is always done?
> 
> Actually, that into clause rel variable is always being set in the 
> gram.y for CTAS, Create Materialized View and SELECT INTO (because 
> qualified_name non-terminal is not optional). My bad. I just added it as 
> a sanity check. Actually, it's not required.
> 
> create_as_target:
>              qualified_name opt_column_list table_access_method_clause
>              OptWith OnCommitOption OptTableSpace
>                  {
>                      $$ = makeNode(IntoClause);
>                      $$->rel = $1;
> create_mv_target:
>              qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
>              {
>                  $$ = makeNode(IntoClause);
>                  $$->rel = $1;
> into_clause:
>              INTO OptTempTableName
>              {
>                  $$ = makeNode(IntoClause);
>                  $$->rel = $2;
> 
> I will change the below code:
> +                    if (GetParallelInsertCmdType(gstate->dest) ==
> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> +                        ((DR_intorel *) gstate->dest)->into &&
> +                        ((DR_intorel *) gstate->dest)->into->rel &&
> +                        ((DR_intorel *) gstate->dest)->into->rel->relname)
> +                    {
> 
> to:
> +                    if (GetParallelInsertCmdType(gstate->dest) ==
> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +                    {
> 
> I will update this in the next version of the patch set.
> 

Thanks
>  >
>  > +        * In case if no workers were launched, allow the leader to 
> insert entire
>  > +        * tuples.
>  > what does "entire tuples" mean? should it maybe be "all tuples"?
> 
> Yeah, noticed that while working on the v19 patch set. Please have a 
> look at the v19 patch posted upthread [1].
> 
>  > ================
>  > wrt v18-0003....patch:
>  >
>  > not sure if it is needed, but i was wondering if we would want more
>  > tests with multiple gather nodes existing? caused e.g. by using CTE's,
>  > valid subquery's (like the one test you have, but without the group
>  > by/having)?
> 
> I'm not sure we can have CTAS/CMV/SELECT INTO inside CTEs (WITH, WITH
> RECURSIVE), and I don't see any of the WITH clause processing hitting
> createas.c functions. So, IMHO, we don't need to add tests for them. Please
> let me know if there are any specific use cases you have in mind.
> 
> For instance, I tried to cover Init/Sub Plan and Subquery cases with:
> 
> below case has multiple Gather, Init Plan:
> +-- parallel inserts must occur, as there is init plan that gets executed by
> +-- each parallel worker
> +select explain_pictas(
> +'create table parallel_write as select two col1,
> +    (select two from (select * from tenk2) as tt limit 1) col2
> +    from tenk1  where tenk1.four = 3;');
> 
> below case has Gather, Sub Plan:
> +-- parallel inserts must not occur, as there is sub plan that gets 
> executed by
> +-- the Gather node in leader
> +select explain_pictas(
> +'create table parallel_write as select two col1,
> +    (select tenk1.two from generate_series(1,1)) col2
> +    from tenk1  where tenk1.four = 3;');
> 
> For multiple Gather node cases, I covered them with the Union All/Append 
> cases in the 0004 patch. Please have a look.
> 

Right, had not reviewed part 4 yet. My bad.

> [1] - https://www.postgresql.org/message-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk%2Bzmg%40mail.gmail.com
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com

Kind regards,
Luc



Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 05-01-2021 04:59, Bharath Rupireddy wrote:
> On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>>> +                                       if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
>>> +                                               ((DR_intorel *) gstate->dest)->into->rel &&
>>> +                                               ((DR_intorel *) gstate->dest)->into->rel->relname)
>>> why would rel and relname not be there? if no rows have been inserted?
>>> because it seems from the intorel_startup function that that would be
>>> set as soon as startup was done, which i assume (wrongly?) is always done?
>>
>> Actually, that into clause rel variable is always being set in the gram.y for CTAS, Create Materialized View and SELECT INTO (because qualified_name non-terminal is not optional). My bad. I just added it as a sanity check. Actually, it's not required.
>>
>> create_as_target:
>>              qualified_name opt_column_list table_access_method_clause
>>              OptWith OnCommitOption OptTableSpace
>>                  {
>>                      $$ = makeNode(IntoClause);
>>                      $$->rel = $1;
>> create_mv_target:
>>              qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
>>              {
>>                  $$ = makeNode(IntoClause);
>>                  $$->rel = $1;
>> into_clause:
>>              INTO OptTempTableName
>>              {
>>                  $$ = makeNode(IntoClause);
>>                 $$->rel = $2;
>>
>> I will change the below code:
>> +                    if (GetParallelInsertCmdType(gstate->dest) ==
>> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
>> +                        ((DR_intorel *) gstate->dest)->into &&
>> +                        ((DR_intorel *) gstate->dest)->into->rel &&
>> +                        ((DR_intorel *) gstate->dest)->into->rel->relname)
>> +                    {
>>
>> to:
>> +                    if (GetParallelInsertCmdType(gstate->dest) ==
>> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
>> +                    {
>>
>> I will update this in the next version of the patch set.
> 
> Attaching v20 patch set that has above change in 0001 patch, note that
> 0002 to 0004 patches have no changes from v19. Please consider the v20
> patch set for further review.
> 
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Hi,

Reviewing further v20-0001:

I would still opt for moving the code for the parallel worker into a 
separate function, and then setting rStartup of the dest receiver to 
that function in ExecParallelGetInsReceiver, as it's completely
independent code. Just a matter of style I guess.

Maybe I'm not completely following why but afaics we want parallel 
inserts in various scenarios, not just CTAS? I'm asking because code like
+    if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        pg_atomic_add_fetch_u64(&fpes->processed, 
queryDesc->estate->es_processed);
seems very specific to CTAS. For now that seems fine but I suppose that 
would be generalized soon after? Basically I would have expected the if 
to compare against PARALLEL_INSERT_CMD_UNDEF.

Apart from these small things v20-0001 looks (very) good to me.

v20-0002:
will reply on the specific mail-thread about the state machine

v20-0003 and v20-0004:
looks good to me.

Kind regards,
Luc



Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 04-01-2021 14:53, Bharath Rupireddy wrote:
> On Mon, Jan 4, 2021 at 5:44 PM Luc Vlaming <luc@swarm64.com> wrote:
>> On 04-01-2021 12:16, Hou, Zhijie wrote:
>>>> ================
>>>> wrt v18-0002....patch:
>>>>
>>>> It looks like this introduces a state machine that goes like:
>>>> - starts at CTAS_PARALLEL_INS_UNDEF
>>>> - possibly moves to CTAS_PARALLEL_INS_SELECT
>>>> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
>>>> - if both were added at some stage, we can go to
>>>> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
>>>>
>>>> what i'm wondering is why you opted to put logic around
>>>> generate_useful_gather_paths and in cost_gather when to me it seems more
>>>> logical to put it in create_gather_path? i'm probably missing something
>>>> there?
>>>
>>> IMO, The reason is we want to make sure we only ignore the cost when Gather is the top node.
>>> And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place which can only create top node Gather.

>>> So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node.
> 
> Right. We wanted to ignore parallel tuple cost for only the upper Gather path.
> 
>> I was wondering actually if we need the state machine. Reason is that as
>> AFAICS the code could be placed in create_gather_path, where you can
>> also check if it is a top gather node, whether the dest receiver is the
>> right type, etc? To me that seems like a nicer solution as it makes
>> that all logic that decides whether or not a parallel CTAS is valid is
>> in a single place instead of distributed over various places.
> 
> IMO, we can't determine in create_gather_path that we are going to
> generate the top Gather path. To decide whether the top Gather path
> will be generated, it's not enough to check root->query_level == 1;
> we also need to rely on where generate_useful_gather_paths gets called
> from. For instance, for query_level 1, generate_useful_gather_paths
> gets called from 2 places in apply_scanjoin_target_to_paths. Likewise,
> create_gather_path also gets called from many places. IMO, the current
> way, i.e. setting the flag in apply_scanjoin_target_to_paths and
> ignoring the tuple cost based on it in cost_gather, seems safe.
> 
> I may be wrong. Thoughts?
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

So the way I understand it the requirements are:
- it needs to be the top-most gather
- it should not do anything with the rows after the gather node as this 
would make the parallel inserts conceptually invalid.

Right now we're trying to judge what might be added on-top that could 
change the rows by inspecting all parts of the root object that would 
cause anything to be added, and add a little state machine to track the
state of that knowledge. To me this has the downside that the list in 
HAS_PARENT_PATH_GENERATING_CLAUSE has to be exhaustive, and we need to 
make sure it stays up-to-date, which could result in regressions if not 
tracked carefully.

Personally I would therefore go for a design which is safe in the sense 
that regressions are not as easily introduced. IMHO that could be done 
by inspecting the planned query afterwards, and then judging whether or 
not the parallel inserts are actually the right thing to do.

Another way to create more safety against regressions would be to add an 
assert upon execution of the query that if we do parallel inserts that 
only a subset of allowed nodes exists above the gather node.

Some (not extremely fact checked) approaches as food for thought:
1. Plan the query as normal, and then afterwards look at the resulting 
plan to see if there are only nodes that are ok between the gather node 
and the top node, which afaics would only be things like append nodes.
Which would mean two things:
- at the end of subquery_planner before the final_rel is fetched, we add 
another pass like the grouping_planner called e.g. 
parallel_modify_planner or so, which traverses the query plan and checks 
if the inserts would indeed be executed parallel, and if so sets the 
cost of the gather to 0.
- we always keep around the best gathered partial path, or the partial 
path itself. (A rough sketch of such a plan walk follows after approach 2.)

2. Generate both gather paths: one with zero cost for the inserts and 
one with costs. the one with zero costs would however be kept separately 
and added as prime candidate for the final rel. then we can check in the 
subquery_planner if the final candidate is different and then choose.
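
A very rough sketch of the plan walk in approach 1 (the function name is hypothetical, and the allowed-node list is illustrative, not exhaustive):

static bool
parallel_ins_plan_ok(Plan *plan)
{
    if (plan == NULL)
        return false;
    if (IsA(plan, Gather))
        return true;
    if (IsA(plan, Append))
    {
        ListCell   *lc;

        /* e.g. UNION ALL: every subplan must itself end in a Gather */
        foreach(lc, ((Append *) plan)->appendplans)
        {
            if (!parallel_ins_plan_ok((Plan *) lfirst(lc)))
                return false;
        }
        return true;
    }

    /* anything else may consume or alter the rows above the Gather */
    return false;
}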

Kind regards,
Luc



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, Jan 5, 2021 at 10:08 AM vignesh C <vignesh21@gmail.com> wrote:
> On Mon, Jan 4, 2021 at 3:07 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Dec 30, 2020 at 5:28 PM vignesh C <vignesh21@gmail.com> wrote:
> > > Few comments:
> > > -       /*
> > > -        * To allow parallel inserts, we need to ensure that they are safe to be
> > > -        * performed in workers. We have the infrastructure to allow parallel
> > > -        * inserts in general except for the cases where inserts generate a new
> > > -        * CommandId (eg. inserts into a table having a foreign key column).
> > > -        */
> > > -       if (IsParallelWorker())
> > > -               ereport(ERROR,
> > > -                               (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> > > -                                errmsg("cannot insert tuples in a
> > > parallel worker")));
> > >
> > > Is it possible to add a check if it is a CTAS insert here as we do not
> > > support insert in parallel workers from others as of now.
> >
> > Currently, there's no global variable with which we can selectively skip
> > this error in case of parallel insertion in CTAS. How about having a
> > variable in any of the worker global contexts, set that when parallel
> > insertion is chosen for CTAS and use that in heap_prepare_insert() to
> > skip the above error? Eventually, we can remove this restriction
> > entirely in case we fully allow parallelism for INSERT INTO SELECT,
> > CTAS, and COPY.
> >
> > Thoughts?
>
> Yes, I felt that the leader can store the command type as CTAS and the
> leader/worker can use it to check and throw the error. A similar
> change can be used for the other parallel insert patches, and once all
> the patches are committed, we can remove it eventually.

We can skip the error "cannot insert tuples in a parallel worker" in
heap_prepare_insert() selectively for each parallel insertion, and
eventually we can remove that error after all the parallel insertion
related patches are committed. The main problem is that
heap_prepare_insert() needs to know that we are coming from a parallel
insertion for CTAS or some other command, while at the same time we
don't want to alter the table_tuple_insert()/heap_prepare_insert() API,
because this change will be removed eventually.

We can achieve this in the following ways:
1) Add a backend global variable, set it before each
table_tuple_insert() in intorel_receive() and use that in
heap_prepare_insert() to skip the error.
2) Add a variable to MyBgworkerEntry structure, set it before each
table_tuple_insert() in intorel_receive() or in ParallelQueryMain() if
we are for CTAS parallel insertion and use that in
heap_prepare_insert() to skip the error.
3) Currently, we pass table insert options to
table_tuple_insert()/heap_prepare_insert(), which is a bitmap of below
values. We could also add something like #define
PARALLEL_INSERTION_CMD_CTAS 0x000F, set it before each
table_tuple_insert() in intorel_receive() and use that in
heap_prepare_insert() to skip the error, then unset it.

/* "options" flag bits for table_tuple_insert */
/* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
#define TABLE_INSERT_SKIP_FSM        0x0002
#define TABLE_INSERT_FROZEN            0x0004
#define TABLE_INSERT_NO_LOGICAL        0x0008

IMO either 2 or 3 would be fine. Thoughts?
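
For illustration, option 3 could look roughly like the sketch below. The flag name and the bit value 0x0010 are assumptions chosen so as not to collide with the existing bits (the 0x000F suggested above would overlap them); they are not from the patch:

/* hypothetical extra "options" flag bit for table_tuple_insert */
#define TABLE_INSERT_PARALLEL_CTAS    0x0010

/* in intorel_receive(), before the insert */
myState->ti_options |= TABLE_INSERT_PARALLEL_CTAS;
table_tuple_insert(myState->rel, slot, myState->output_cid,
                   myState->ti_options, myState->bistate);

/* in heap_prepare_insert(), skip the error only for this case */
if (IsParallelWorker() && !(options & TABLE_INSERT_PARALLEL_CTAS))
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("cannot insert tuples in a parallel worker")));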

> > > +       Oid                     objectid;               /* workers to
> > > open relation/table.  */
> > > +       /* Number of tuples inserted by all the workers. */
> > > +       pg_atomic_uint64        processed;
> > >
> > > We can just mention relation instead of relation/table.
> >
> > I will modify it in the next patch set.
> >
> > > +select explain_pictas(
> > > +'create table parallel_write as select length(stringu1) from tenk1;');
> > > +                      explain_pictas
> > > +----------------------------------------------------------
> > > + Gather (actual rows=N loops=N)
> > > +   Workers Planned: 4
> > > +   Workers Launched: N
> > > + ->  Create parallel_write
> > > +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> > > +(5 rows)
> > > +
> > > +select count(*) from parallel_write;
> > >
> > > Can we include selection of cmin, xmin for one of the test to verify
> > > that it uses the same transaction id  in the parallel workers
> > > something like:
> > > select distinct(cmin,xmin) from parallel_write;
> >
> > This is not possible since cmin and xmin are dynamic, we can not use
> > them in test cases. I think it's not necessary to check whether the
> > leader and workers are in the same txn or not, since we are not
> > creating a new txn. All the txn state from the leader is serialized in
> > SerializeTransactionState and restored in
> > StartParallelWorkerTransaction.
> >
>
> I had seen in your patch that you serialize and use the same
> transaction, but it will be good if you can have at least one test
> case to validate that the leader and worker both use the same
> transaction. To solve the problem that you are facing where cmin and
> xmin are dynamic, you can check the distinct count by using something
> like below:
> SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  t1) as dt;

Thanks.

So, the expectation is that the above query should always return 1 if
both leader and workers shared the same txn. I will add this to one of
the test cases in the next version of the patch set.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Tue, Jan 5, 2021 at 12:43 PM Luc Vlaming <luc@swarm64.com> wrote:
>
> On 04-01-2021 14:32, Bharath Rupireddy wrote:
> > On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc@swarm64.com> wrote:
> >  > Sorry it took so long to get back to reviewing this.
> >
> > Thanks for the comments.
> >
> >  > wrt v18-0001....patch:
> >  >
> >  > +               /*
> >  > +                * If the worker is for parallel insert in CTAS, then use the proper
> >  > +                * dest receiver.
> >  > +                */
> >  > +               intoclause = (IntoClause *) stringToNode(intoclausestr);
> >  > +               receiver = CreateIntoRelDestReceiver(intoclause);
> >  > +               ((DR_intorel *)receiver)->is_parallel_worker = true;
> >  > +               ((DR_intorel *)receiver)->object_id = fpes->objectid;
> >  > I would move this into a function called e.g.
> >  > GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
> >  > createas.c.
> >  > I would then also split up intorel_startup into intorel_leader_startup
> >  > and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
> >  > self->pub.rStartup to intorel_worker_startup.
> >
> > My intention was to not add any new APIs to the dest receiver. I simply
> > made the changes in intorel_startup, in which for workers it just does
> > the minimalistic work and exit from it. In the leader most of the table
> > creation and sanity check is kept untouched. Please have a look at the
> > v19 patch posted upthread [1].
> >
>
> Looks much better, really nicely abstracted away in the v20 patch.
>
> >  > +       volatile pg_atomic_uint64       *processed;
> >  > why is it volatile?
> >
> > Intention is to always read from the actual memory location. I referred
> > to the way pg_atomic_fetch_add_u64_impl,
> > pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
> > u32 counterparts pass the parameter as volatile pg_atomic_uint64 *ptr.

But in your case, I do not understand the intention: where do you
think the compiler could optimize it and read the old value?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 05-01-2021 11:32, Dilip Kumar wrote:
> On Tue, Jan 5, 2021 at 12:43 PM Luc Vlaming <luc@swarm64.com> wrote:
>>
>> On 04-01-2021 14:32, Bharath Rupireddy wrote:
>>> On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc@swarm64.com> wrote:
>>>   > Sorry it took so long to get back to reviewing this.
>>>
>>> Thanks for the comments.
>>>
>>>   > wrt v18-0001....patch:
>>>   >
>>>   > +               /*
>>>   > +                * If the worker is for parallel insert in CTAS, then use the proper
>>>   > +                * dest receiver.
>>>   > +                */
>>>   > +               intoclause = (IntoClause *) stringToNode(intoclausestr);
>>>   > +               receiver = CreateIntoRelDestReceiver(intoclause);
>>>   > +               ((DR_intorel *)receiver)->is_parallel_worker = true;
>>>   > +               ((DR_intorel *)receiver)->object_id = fpes->objectid;
>>>   > I would move this into a function called e.g.
>>>   > GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
>>>   > createas.c.
>>>   > I would then also split up intorel_startup into intorel_leader_startup
>>>   > and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
>>>   > self->pub.rStartup to intorel_worker_startup.
>>>
>>> My intention was to not add any new APIs to the dest receiver. I simply
>>> made the changes in intorel_startup, in which for workers it just does
>>> the minimalistic work and exit from it. In the leader most of the table
>>> creation and sanity check is kept untouched. Please have a look at the
>>> v19 patch posted upthread [1].
>>>
>>
>> Looks much better, really nicely abstracted away in the v20 patch.
>>
>>>   > +       volatile pg_atomic_uint64       *processed;
>>>   > why is it volatile?
>>>
>>> Intention is to always read from the actual memory location. I referred
>>> to the way pg_atomic_fetch_add_u64_impl,
>>> pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
>>> u32 counterparts pass the parameter as volatile pg_atomic_uint64 *ptr.
> 
> But in your case, I do not understand the intention: where do you
> think the compiler could optimize it and read the old value?
> 

It can not and should not. I had so far only seen C++ atomic
variables and not a (postgres-specific?) C atomic variable which
apparently requires the volatile keyword. My stupidity ;)

Cheers,
Luc



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, Jan 5, 2021 at 1:00 PM Luc Vlaming <luc@swarm64.com> wrote:
> Reviewing further v20-0001:
>
> I would still opt for moving the code for the parallel worker into a
> separate function, and then setting rStartup of the dest receiver to
> that function in ExecParallelGetInsReceiver, as it's completely
> independent code. Just a matter of style I guess.

If we were to have an intorel_startup_worker and assign it to
self->pub.rStartup:
1) we can do it in CreateIntoRelDestReceiver, but then we have to pass
a parameter to CreateIntoRelDestReceiver as an indication of a parallel
worker, which requires code changes wherever CreateIntoRelDestReceiver
is used;
2) we can assign intorel_startup_worker after CreateIntoRelDestReceiver
in ExecParallelGetInsReceiver, but that doesn't look good to me;
3) we can duplicate CreateIntoRelDestReceiver and have a
CreateIntoRelParallelDestReceiver whose only change is that
self->pub.rStartup = intorel_startup_worker.

IMHO, the way it is currently, looks good. Anyways, I'm open to
changing that if we agree on any of the above 3 ways.

If we were to do any of the above, then we might have to do the same
thing for other commands Refresh Materialized View or Copy To where we
can parallelize.

Thoughts?

> Maybe I'm not completely following why but afaics we want parallel
> inserts in various scenarios, not just CTAS? I'm asking because code like
> +       if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +               pg_atomic_add_fetch_u64(&fpes->processed,
> queryDesc->estate->es_processed);
> seems very specific to CTAS. For now that seems fine but I suppose that
> would be generalized soon after? Basically I would have expected the if
> to compare against PARALLEL_INSERT_CMD_UNDEF.

After this patch is reviewed and goes for commit, then the next thing
I plan to do is to allow parallel inserts in Refresh Materialized View
and it can be used for that. I think the processed variable can also
be used for parallel inserts in INSERT INTO SELECT [1] as well.
Currently, I'm keeping it for CTAS, maybe later (after this is
committed) it can be generalized.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com

> Apart from these small things v20-0001 looks (very) good to me.
> v20-0003 and v20-0004:
> looks good to me.

Thanks.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Zhihong Yu
Date:
For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :

ParallelInsCmdEstimate :

+   Assert(pcxt && ins_info &&
+          (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)

Since the if condition is covered by the assertion, I wonder why the if check is still needed.

Similar comment for SaveParallelInsCmdFixedInfo and SaveParallelInsCmdInfo

Cheers

On Mon, Jan 4, 2021 at 7:59 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > +                                       if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> > +                                               ((DR_intorel *) gstate->dest)->into->rel &&
> > +                                               ((DR_intorel *) gstate->dest)->into->rel->relname)
> > why would rel and relname not be there? if no rows have been inserted?
> > because it seems from the intorel_startup function that that would be
> > set as soon as startup was done, which i assume (wrongly?) is always done?
>
> Actually, that into clause rel variable is always being set in the gram.y for CTAS, Create Materialized View and SELECT INTO (because qualified_name non-terminal is not optional). My bad. I just added it as a sanity check. Actually, it's not required.
>
> create_as_target:
>             qualified_name opt_column_list table_access_method_clause
>             OptWith OnCommitOption OptTableSpace
>                 {
>                     $$ = makeNode(IntoClause);
>                     $$->rel = $1;
> create_mv_target:
>             qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
>             {
>                 $$ = makeNode(IntoClause);
>                 $$->rel = $1;
> into_clause:
>             INTO OptTempTableName
>             {
>                 $$ = makeNode(IntoClause);
>                $$->rel = $2;
>
> I will change the below code:
> +                    if (GetParallelInsertCmdType(gstate->dest) ==
> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> +                        ((DR_intorel *) gstate->dest)->into &&
> +                        ((DR_intorel *) gstate->dest)->into->rel &&
> +                        ((DR_intorel *) gstate->dest)->into->rel->relname)
> +                    {
>
> to:
> +                    if (GetParallelInsertCmdType(gstate->dest) ==
> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +                    {
>
> I will update this in the next version of the patch set.

Attaching v20 patch set that has above change in 0001 patch, note that
0002 to 0004 patches have no changes from v19. Please consider the v20
patch set for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Jan 6, 2021 at 8:19 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
>
> ParallelInsCmdEstimate :
>
> +   Assert(pcxt && ins_info &&
> +          (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
> +
> +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
>
> Since the if condition is covered by the assertion, I wonder why the if check is still needed.
>
> Similar comment for SaveParallelInsCmdFixedInfo and SaveParallelInsCmdInfo

Thanks.

The idea is to have an assertion with all the expected ins_cmd types,
and then later to have selective handling for different ins_cmds. For
example, if we add (in future) parallel insertion in Refresh
Materialized View, then the code in those functions will be something
like:

+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+                       void *ins_info)
+{
+    Assert(pcxt && ins_info &&
+           (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
+            ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
+
+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+    {
+
+    }
+   else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+   {
+
+   }

Similarly for other functions as well.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Zhihong Yu
Date:
The plan sounds good.

Before the second command type is added, can you leave out the 'if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)' and keep the pair of curlies?

You can add the if condition back when the second command type is added.

Cheers

On Tue, Jan 5, 2021 at 7:53 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Jan 6, 2021 at 8:19 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
>
> ParallelInsCmdEstimate :
>
> +   Assert(pcxt && ins_info &&
> +          (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
> +
> +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
>
> Since the if condition is covered by the assertion, I wonder why the if check is still needed.
>
> Similar comment for SaveParallelInsCmdFixedInfo and SaveParallelInsCmdInfo

Thanks.

The idea is to have an assertion with all the expected ins_cmd types,
and then later to have selective handling for different ins_cmds. For
example, if we add (in future) parallel insertion in Refresh
Materialized View, then the code in those functions will be something
like:

+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+                       void *ins_info)
+{
+    Assert(pcxt && ins_info &&
+           (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
+            ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
+
+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+    {
+
+    }
+   else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+   {
+
+   }

Similarly for other functions as well.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Wed, Jan 6, 2021 at 9:23 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>

+/*
+ * List the commands here for which parallel insertions are possible.
+ */
+typedef enum ParallelInsertCmdKind
+{
+ PARALLEL_INSERT_CMD_UNDEF = 0,
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS
+} ParallelInsertCmdKind;

I see there is some code that is generic for CTAS and INSERT INTO
SELECT *,  So is it
possible to take out that common code to a separate base patch?  Later
both CTAS and INSERT INTO SELECT * can expand
that for their usage.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
> > For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
> >
> > ParallelInsCmdEstimate :
> >
> > +   Assert(pcxt && ins_info &&
> > +          (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
> > +
> > +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> >
> > Since the if condition is covered by the assertion, I wonder why the if check is still needed.
> >
> > Similar comment for SaveParallelInsCmdFixedInfo and
> > SaveParallelInsCmdInfo
> 
> Thanks.
> 
> The idea is to have an assertion with all the expected ins_cmd types, and
> then later to have selective handling for different ins_cmds. For example,
> if we add (in future) parallel insertion in Refresh Materialized View, then
> the code in those functions will be something like:
> 
> +static void
> +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> +                       void *ins_info)
> +{
> +    Assert(pcxt && ins_info &&
> +           (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
> +            ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
> +
> +    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +    {
> +
> +    }
> +   else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
> +   {
> +
> +   }
> 
> Similarly for other functions as well.

I think it makes sense.

And if the check 'ins_cmd == xxx1 || ins_cmd == xxx2' may be used in some places,
how about defining a generic function with a comment to mention the purpose?

An example in INSERT INTO SELECT patch:
+/*
+ * IsModifySupportedInParallelMode
+ *
+ * Indicates whether execution of the specified table-modification command
+ * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to certain
+ * parallel-safety conditions.
+ */
+static inline bool
+IsModifySupportedInParallelMode(CmdType commandType)
+{
+    /* Currently only INSERT is supported */
+    return (commandType == CMD_INSERT);
+}

Best regards,
houzj



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Jan 6, 2021 at 10:05 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> The plan sounds good.
>
> Before the second command type is added, can you leave out the 'if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)' and keep the pair of curlies?
>
> You can add the if condition back when the second command type is added.

Thanks.

IMO, an empty pair of curlies is not a good idea. Having if (ins_cmd
== PARALLEL_INSERT_CMD_CREATE_TABLE_AS) doesn't harm anything.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Jan 6, 2021 at 11:06 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
> > >
> > > ParallelInsCmdEstimate :
> > >
> > > +   Assert(pcxt && ins_info &&
> > > +          (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
> > > +
> > > +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > >
> > > Since the if condition is covered by the assertion, I wonder why the if check is still needed.
> > >
> > > Similar comment for SaveParallelInsCmdFixedInfo and
> > > SaveParallelInsCmdInfo
> >
> > Thanks.
> >
> > The idea is to have an assertion with all the expected ins_cmd types, and
> > then later to have selective handling for different ins_cmds. For example,
> > if we add (in future) parallel insertion in Refresh Materialized View, then
> > the code in those functions will be something like:
> >
> > +static void
> > +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> > +                       void *ins_info)
> > +{
> > +    Assert(pcxt && ins_info &&
> > +           (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
> > +            ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
> > +
> > +    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +    {
> > +
> > +    }
> > +   else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
> > +   {
> > +
> > +   }
> >
> > Similarly for other functions as well.
>
> I think it makes sense.
>
> And if the check 'ins_cmd == xxx1 || ins_cmd == xxx2' may be used in some places,
> how about defining a generic function with a comment to mention the purpose?
>
> An example in INSERT INTO SELECT patch:
> +/*
> + * IsModifySupportedInParallelMode
> + *
> + * Indicates whether execution of the specified table-modification command
> + * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to certain
> + * parallel-safety conditions.
> + */
> +static inline bool
> +IsModifySupportedInParallelMode(CmdType commandType)
> +{
> +       /* Currently only INSERT is supported */
> +       return (commandType == CMD_INSERT);
> +}

The intention of the assert is to verify that those functions are called
for appropriate commands such as CTAS, Refresh Mat View and so on with
correct parameters. I really don't think we can replace the assert with a
function like the above; in release mode the assertion will always be
true. In a way, that assertion is only for debugging purposes. And I also
think that since we as the callers know when to call those new functions,
we can even remove the assertions if they are really a problem here.
Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Jan 6, 2021 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Jan 6, 2021 at 9:23 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
>
> +/*
> + * List the commands here for which parallel insertions are possible.
> + */
> +typedef enum ParallelInsertCmdKind
> +{
> + PARALLEL_INSERT_CMD_UNDEF = 0,
> + PARALLEL_INSERT_CMD_CREATE_TABLE_AS
> +} ParallelInsertCmdKind;
>
> I see there is some code that is generic for CTAS and INSERT INTO
> SELECT *,  So is it
> possible to take out that common code to a separate base patch?  Later
> both CTAS and INSERT INTO SELECT * can expand
> that for their usage.

I currently see the common code for parallel inserts (i.e. insert into
selects, copy, ctas/create mat view/refresh mat view) being the code in
heapam.c, xact.c and xact.h. I can make a separate patch for these
changes alone if required. Thoughts?

IIRC parallel inserts in insert into select and copy don't use the
design idea of pushing the dest receiver down to Gather. Whereas
ctas/create mat view, refresh mat view, copy to can use the idea of
pushing the dest receiver to Gather and can easily extend on the
patches I made here.
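
As a rough illustration of what "pushing the dest receiver down to Gather" means on the worker side, assembled from the v18 snippet quoted upthread (fpes fields are from the patch; the else branch is the existing tuple-queue receiver):

/* in ParallelQueryMain(), choose the worker's dest receiver */
if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
{
    /* insert directly into the target table instead of the tuple queue */
    IntoClause *intoclause = (IntoClause *) stringToNode(intoclausestr);

    receiver = CreateIntoRelDestReceiver(intoclause);
    ((DR_intorel *) receiver)->is_parallel_worker = true;
    ((DR_intorel *) receiver)->object_id = fpes->objectid;
}
else
    receiver = ExecParallelGetReceiver(seg, toc);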

Is there anything else that you feel we can have in common?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
> > I think it makes sense.
> >
> > And if the check 'ins_cmd == xxx1 || ins_cmd == xxx2' may be used in
> > some places, how about defining a generic function with a comment to
> > mention the purpose?
> >
> > An example in INSERT INTO SELECT patch:
> > +/*
> > + * IsModifySupportedInParallelMode
> > + *
> > + * Indicates whether execution of the specified table-modification
> > +command
> > + * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to
> > +certain
> > + * parallel-safety conditions.
> > + */
> > +static inline bool
> > +IsModifySupportedInParallelMode(CmdType commandType) {
> > +       /* Currently only INSERT is supported */
> > +       return (commandType == CMD_INSERT); }
> 
> The intention of the assert is to verify that those functions are called for
> appropriate commands such as CTAS, Refresh Mat View and so on with correct
> parameters. I really don't think we can replace the assert with a function
> like the above; in release mode the assertion will always be true. In a way,
> that assertion is only for debugging purposes. And I also think that since
> we as the callers know when to call those new functions, we can even remove
> the assertions if they are really a problem here. Thoughts?
Hi

Thanks for the explanation.

If the check on the command type is only used in asserts, I think you are right.
I suggested a new function because I guess the check can be used in some other places.
Such as:

+        /* Okay to parallelize inserts, so mark it. */
+        if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+            ((DR_intorel *) dest)->is_parallel = true;

+        if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+            ((DR_intorel *) dest)->is_parallel = false;

Or

+    if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);

If you think the ins_cmd type check in the above code will be extended in the future, the generic function may make sense.

Best regards,
houzj 




Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Wed, Jan 6, 2021 at 11:26 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Jan 6, 2021 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Jan 6, 2021 at 9:23 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> >
> > +/*
> > + * List the commands here for which parallel insertions are possible.
> > + */
> > +typedef enum ParallelInsertCmdKind
> > +{
> > + PARALLEL_INSERT_CMD_UNDEF = 0,
> > + PARALLEL_INSERT_CMD_CREATE_TABLE_AS
> > +} ParallelInsertCmdKind;
> >
> > I see there is some code that is generic for CTAS and INSERT INTO
> > SELECT *,  So is it
> > possible to take out that common code to a separate base patch?  Later
> > both CTAS and INSERT INTO SELECT * can expand
> > that for their usage.
>
> I currently see the common code for parallel inserts (i.e. insert into
> selects, copy, ctas/create mat view/refresh mat view) being the code in
> heapam.c, xact.c and xact.h. I can make a separate patch for these
> changes alone if required. Thoughts?

I just saw this structure (ParallelInsertCmdKind): it defines the
parallel insert command kinds, and the usage differs based on the
command type. So I think the generic code, e.g. this structure and
other similar code, can go into a first patch, and we can build the
remaining patches atop that patch. But if you think it is just this
structure and not much code is common, then we can let it be.

> IIRC parallel inserts in insert into select and copy don't use the
> design idea of pushing the dest receiver down to Gather. Whereas
> ctas/create mat view, refresh mat view, copy to can use the idea of
> pushing the dest receiver to Gather and can easily extend on the
> patches I made here.
>
> Is there anything else that you feel we can have in common?

Nothing specific.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Jan 6, 2021 at 11:30 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > > I think it makes sense.
> > >
> > > And if the check 'ins_cmd == xxx1 || ins_cmd == xxx2' may be used in
> > > some places, how about defining a generic function with a comment to
> > > mention the purpose?
> > >
> > > An example in INSERT INTO SELECT patch:
> > > +/*
> > > + * IsModifySupportedInParallelMode
> > > + *
> > > + * Indicates whether execution of the specified table-modification
> > > +command
> > > + * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to
> > > +certain
> > > + * parallel-safety conditions.
> > > + */
> > > +static inline bool
> > > +IsModifySupportedInParallelMode(CmdType commandType) {
> > > +       /* Currently only INSERT is supported */
> > > +       return (commandType == CMD_INSERT); }
> >
> > The intention of the assert is to verify that those functions are called for
> > appropriate commands such as CTAS, Refresh Mat View and so on with correct
> > parameters. I really don't think we can replace the assert with a function
> > like the above; in release mode the assertion will always be true. In a way,
> > that assertion is only for debugging purposes. And I also think that since
> > we as the callers know when to call those new functions, we can even remove
> > the assertions if they are really a problem here. Thoughts?
> Hi
>
> Thanks for the explanation.
>
> If the check on the command type is only used in asserts, I think you are right.
> I suggested a new function because I guess the check can be used in some other places.
> Such as:
>
> +               /* Okay to parallelize inserts, so mark it. */
> +               if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +                       ((DR_intorel *) dest)->is_parallel = true;
>
> +               if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +                       ((DR_intorel *) dest)->is_parallel = false;

We need to know exactly what the command is in the above place, to
dereference and mark is_parallel as true, because is_parallel is being
added to the respective structures, not to the generic _DestReceiver
structure. So, in future, the above code becomes something like below:

+    /* Okay to parallelize inserts, so mark it. */
+    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        ((DR_intorel *) dest)->is_parallel = true;
+    else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+        ((DR_transientrel *) dest)->is_parallel = true;
+    else if (ins_cmd == PARALLEL_INSERT_CMD_COPY_TO)
+        ((DR_copy *) dest)->is_parallel = true;

In the below place, instead of a new function, I think we can just have
something like if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF):

> Or
>
> +       if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +               pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
>
> If you think the ins_cmd type check in the above code will be extended in the future, the generic function may make sense.

We can also change the below to fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF.

+    if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+        receiver = ExecParallelGetInsReceiver(toc, fpes);

If okay, I will modify it in the next version of the patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 05-01-2021 13:57, Bharath Rupireddy wrote:
> On Tue, Jan 5, 2021 at 1:00 PM Luc Vlaming <luc@swarm64.com> wrote:
>> Reviewing further v20-0001:
>>
>> I would still opt for moving the code for the parallel worker into a
>> separate function, and then setting rStartup of the dest receiver to
>> that function in ExecParallelGetInsReceiver, as it's completely
>> independent code. Just a matter of style, I guess.
> 
> If we were to have an intorel_startup_worker and assign it to
> self->pub.rStartup: 1) we could do it in CreateIntoRelDestReceiver, but we
> would have to pass a parameter to CreateIntoRelDestReceiver as an
> indication of a parallel worker, which requires code changes in all the
> places wherever CreateIntoRelDestReceiver is used; 2) we could assign
> intorel_startup_worker after CreateIntoRelDestReceiver in
> ExecParallelGetInsReceiver, but that doesn't look good to me; 3) we
> could duplicate CreateIntoRelDestReceiver and have a
> CreateIntoRelParallelDestReceiver with the only change being that
> self->pub.rStartup = intorel_startup_worker.
> 
> IMHO, the way it is currently looks good. Anyways, I'm open to
> changing that if we agree on any of the above 3 ways.

The current way is good enough; it was a suggestion because personally I find 
it hard to read when two completely separate code paths live in the same 
function. If anything, I would opt for something like 3), where there's a 
CreateIntoRelParallelDestReceiver which calls CreateIntoRelDestReceiver 
and then overrides rStartup to intorel_startup_worker. Then no call sites 
have to change except the ones that are for parallel workers.
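
Something like this minimal sketch, assuming the existing 
CreateIntoRelDestReceiver() and the intorel_startup_worker routine 
discussed above (names as used in this thread; the actual patch may differ):

DestReceiver *
CreateIntoRelParallelDestReceiver(IntoClause *intoClause)
{
    DR_intorel *self = (DR_intorel *) CreateIntoRelDestReceiver(intoClause);

    /*
     * The only difference for parallel workers: a startup routine that
     * attaches to the already-created table instead of creating it.
     */
    self->pub.rStartup = intorel_startup_worker;
    return (DestReceiver *) self;
}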
> 
> If we were to do any of the above, then we might have to do the same
> thing for other commands Refresh Materialized View or Copy To where we
> can parallelize.
> 
> Thoughts?
> 
>> Maybe I'm not completely following why but afaics we want parallel
>> inserts in various scenarios, not just CTAS? I'm asking because code like
>> +       if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
>> +               pg_atomic_add_fetch_u64(&fpes->processed,
>> queryDesc->estate->es_processed);
>> seems very specific to CTAS. For now that seems fine but I suppose that
>> would be generalized soon after? Basically I would have expected the if
>> to compare against PARALLEL_INSERT_CMD_UNDEF.
> 
> After this patch is reviewed and goes for commit, the next thing
> I plan to do is to allow parallel inserts in Refresh Materialized View,
> and it can be used for that. I think the processed variable can also
> be used for parallel inserts in INSERT INTO SELECT [1] as well.
> Currently, I'm keeping it for CTAS, maybe later (after this is
> committed) it can be generalized.
> 
> Thoughts?

Sounds good

> 
> [1] -
https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com
> 
>> Apart from these small things v20-0001 looks (very) good to me.
>> v20-0003 and v20-0004:
>> looks good to me.
> 
> Thanks.
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Kind regards,
Luc



RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
> >
> > +               /* Okay to parallelize inserts, so mark it. */
> > +               if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +                       ((DR_intorel *) dest)->is_parallel = true;
> >
> > +               if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +                       ((DR_intorel *) dest)->is_parallel = false;
> 
> We need to know exactly what the command is in the above place in order to
> dereference and mark is_parallel true, because is_parallel is being added
> to the respective structures, not to the generic _DestReceiver structure.
> So, in the future the above code becomes something like below:
> 
> +    /* Okay to parallelize inserts, so mark it. */
> +    if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +        ((DR_intorel *) dest)->is_parallel = true;
> +    else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
> +        ((DR_transientrel *) dest)->is_parallel = true;
> +    else if (ins_cmd == PARALLEL_INSERT_CMD_COPY_TO)
> +        ((DR_copy *) dest)->is_parallel = true;
> 
> In the below place, instead of a new function, I think we can just have
> something like if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF).
> 
> > Or
> >
> > +       if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +               pg_atomic_add_fetch_u64(&fpes->processed,
> > + queryDesc->estate->es_processed);
> >
> > If you think the above code will extend the ins_cmd type check in the
> future, the generic function may make sense.
> 
> We can also change below to fpes->ins_cmd_type !=
> PARALLEL_INSERT_CMD_UNDEF.
> 
> +    if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +        receiver = ExecParallelGetInsReceiver(toc, fpes);
> 
> If okay, I will modify it in the next version of the patch.

Yes, that looks good to me.

Best regards,
houzj



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, Jan 5, 2021 at 1:25 PM Luc Vlaming <luc@swarm64.com> wrote:
> >>>> wrt v18-0002....patch:
> >>>>
> >>>> It looks like this introduces a state machine that goes like:
> >>>> - starts at CTAS_PARALLEL_INS_UNDEF
> >>>> - possibly moves to CTAS_PARALLEL_INS_SELECT
> >>>> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
> >>>> - if both were added at some stage, we can go to
> >>>> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
> >>>>
> >>>> what i'm wondering is why you opted to put logic around
> >>>> generate_useful_gather_paths and in cost_gather when to me it seems more
> >>>> logical to put it in create_gather_path? i'm probably missing something
> >>>> there?
> >>>
> >>> IMO, The reason is we want to make sure we only ignore the cost when Gather is the top node.
> >>> And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place which can only create the top node Gather.
> >>> So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node.
> >
> > Right. We wanted to ignore parallel tuple cost for only the upper Gather path.
> >
> >> I was wondering actually if we need the state machine. Reason is that as
> >> AFAICS the code could be placed in create_gather_path, where you can
> >> also check if it is a top gather node, whether the dest receiver is the
> >> right type, etc? To me that seems like a nicer solution as its makes
> >> that all logic that decides whether or not a parallel CTAS is valid is
> >> in a single place instead of distributed over various places.
> >
> > IMO, we can't determine in create_gather_path the fact that we are going
> > to generate the top Gather path. To decide whether or not to generate the
> > top Gather path, I think it's not only required to check
> > root->query_level == 1, but we also need to rely on where
> > generate_useful_gather_paths gets called from. For instance, for
> > query_level 1, generate_useful_gather_paths gets called from 2 places
> > in apply_scanjoin_target_to_paths. Likewise, create_gather_path also
> > gets called from many places. IMO, the current way, i.e. setting the flag
> > in apply_scanjoin_target_to_paths and ignoring the cost based on that in
> > cost_gather, seems safe.
> >
> > I may be wrong. Thoughts?
>
> So the way I understand it the requirements are:
> - it needs to be the top-most gather
> - it should not do anything with the rows after the gather node as this
> would make the parallel inserts conceptually invalid.

Right.

> Right now we're trying to judge what might be added on-top that could
> change the rows by inspecting all parts of the root object that would
> cause anything to be added, and add a little statemachine to track the
> state of that knowledge. To me this has the downside that the list in
> HAS_PARENT_PATH_GENERATING_CLAUSE has to be exhaustive, and we need to
> make sure it stays up-to-date, which could result in regressions if not
> tracked carefully.

Right. Any new clause that generates an upper path in grouping_planner
after apply_scanjoin_target_to_paths also needs to be added to
HAS_PARENT_PATH_GENERATING_CLAUSE. Otherwise, we might ignore the
parallel tuple cost, because of which a parallel plan may be chosen,
while we go for parallel inserts only when the top node is Gather. I
don't think any newly added clause would generate a new upper Gather
node in grouping_planner after apply_scanjoin_target_to_paths.
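
For reference, the kind of parse-tree check HAS_PARENT_PATH_GENERATING_CLAUSE
performs looks roughly like this (illustrative only; the exact clause list in
the patch may differ):

/* Illustrative: clauses that generate upper paths after scan/join planning. */
#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
    ((root)->parse->hasAggs || \
     (root)->parse->hasWindowFuncs || \
     (root)->parse->groupClause != NIL || \
     (root)->parse->distinctClause != NIL || \
     (root)->parse->sortClause != NIL)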

> Personally I would therefore go for a design which is safe in the sense
> that regressions are not as easily introduced. IMHO that could be done
> by inspecting the planned query afterwards, and then judging whether or
> not the parallel inserts are actually the right thing to do.

The 0001 patch does that. It doesn't have any influence on the planner's
parallel tuple cost calculation; it just looks at the generated
plan and decides on parallel inserts. Having said that, we might miss
parallel plans even though we know that there will not be tuples
transferred from workers to Gather. So, the 0002 patch adds the code for
influencing the planner's parallel tuple cost.

> Another way to create more safety against regressions would be to add an
> assert upon execution of the query that if we do parallel inserts that
> only a subset of allowed nodes exists above the gather node.

Yes, we already do this. Please have a look at
SetParallelInsertState() in the 0002 patch. The idea is that if the
planner ignored the tuple cost but we later do not allow parallel
inserts (either because the upper node is not Gather, or it is a Gather
with projections), the assertion fails. So, in case any new parent path
generating clause is added (apart from the ones that are there in
HAS_PARENT_PATH_GENERATING_CLAUSE) and we ignore the tuple cost, then
this Assert will catch it. Currently, I couldn't find any assertion
failures in my debug build with make check and make check-world.

+    else
+    {
+        /*
+         * Upper Gather node has projections, so parallel insertions are not
+         * allowed.
+         */
+        if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+            ((DR_intorel *) dest)->is_parallel = false;
+
+        gstate->dest = NULL;
+
+        /*
+         * Before returning, ensure that we have not done wrong parallel tuple
+         * cost enforcement in the planner. Main reason for this assertion is
+         * to check if we enforced the planner to ignore the parallel tuple
+         * cost (with the intention of choosing parallel inserts) due to which
+         * the parallel plan may have been chosen, but we do not allow the
+         * parallel inserts now.
+         *
+         * If we have correctly ignored parallel tuple cost in the planner
+         * while creating Gather path, then this assertion failure should not
+         * occur. In case it occurs, that means the planner may have chosen
+         * this parallel plan because of our wrong enforcement. So let's try to
+         * catch that here.
+         */
+        Assert(tuple_cost_opts && !(*tuple_cost_opts &
+               PARALLEL_INSERT_TUP_COST_IGNORED));
+    }

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Zhihong Yu
Date:
Hi,
For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :

workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.

It would be nice if the scenarios where parallel plan is not chosen are listed.

+   if ((root->parse->parallelInsCmdTupleCostOpt &
+        PARALLEL_INSERT_SELECT_QUERY) &&
+       (root->parse->parallelInsCmdTupleCostOpt &
+        PARALLEL_INSERT_CAN_IGN_TUP_COST))
+   {
+       /* We are ignoring the parallel tuple cost, so mark it. */
+       root->parse->parallelInsCmdTupleCostOpt |=
+                                           PARALLEL_INSERT_TUP_COST_IGNORED;

If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED is implied.

Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting (1) of the first two bits should suffice.

Cheers

On Mon, Jan 4, 2021 at 7:59 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > +                                       if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> > +                                               ((DR_intorel *) gstate->dest)->into->rel &&
> > +                                               ((DR_intorel *) gstate->dest)->into->rel->relname)
> > why would rel and relname not be there? if no rows have been inserted?
> > because it seems from the intorel_startup function that that would be
> > set as soon as startup was done, which i assume (wrongly?) is always done?
>
> Actually, that into clause rel variable is always being set in the gram.y for CTAS, Create Materialized View and SELECT INTO (because the qualified_name non-terminal is not optional). My bad, I just added it as a sanity check; it's not really required.
>
> create_as_target:
>             qualified_name opt_column_list table_access_method_clause
>             OptWith OnCommitOption OptTableSpace
>                 {
>                     $$ = makeNode(IntoClause);
>                     $$->rel = $1;
> create_mv_target:
>             qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
>             {
>                 $$ = makeNode(IntoClause);
>                 $$->rel = $1;
> into_clause:
>             INTO OptTempTableName
>             {
>                 $$ = makeNode(IntoClause);
>                $$->rel = $2;
>
> I will change the below code:
> +                    if (GetParallelInsertCmdType(gstate->dest) ==
> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> +                        ((DR_intorel *) gstate->dest)->into &&
> +                        ((DR_intorel *) gstate->dest)->into->rel &&
> +                        ((DR_intorel *) gstate->dest)->into->rel->relname)
> +                    {
>
> to:
> +                    if (GetParallelInsertCmdType(gstate->dest) ==
> +                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +                    {
>
> I will update this in the next version of the patch set.

Attaching the v20 patch set that has the above change in the 0001 patch;
note that the 0002 to 0004 patches have no changes from v19. Please
consider the v20 patch set for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
>
> workers to Gather node to 0. With this change, there are chances
> that the planner may choose the parallel plan.
>
> It would be nice if the scenarios where a parallel plan is not chosen are listed.

There are many reasons the planner may not choose a parallel plan for
the select part: for instance, if there are temporary tables,
parallel-unsafe functions, or foreign tables, or if the parallelism
GUCs are not set properly, and so on; see
https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
think we will add all of those scenarios to the commit message.

Having said that, we have extensive comments in the code (especially in
the function SetParallelInsertState) about when parallel inserts are
chosen.

+     * Parallel insertions are possible only if the upper node is Gather.
      */
+    if (!IsA(gstate, GatherState))
         return;

+     * Parallelize inserts only when the upper Gather node has no projections.
      */
+    if (!gstate->ps.ps_ProjInfo)
+    {
+        /* Okay to parallelize inserts, so mark it. */
+        if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+            ((DR_intorel *) dest)->is_parallel = true;
+
+        /*
+         * For parallelizing inserts, we must send some information so that the
+         * workers can build their own dest receivers. For CTAS, this info is
+         * into clause, object id (to open the created table).
+         *
+         * Since the required information is available in the dest receiver,
+         * store a reference to it in the Gather state so that it will be used
+         * in ExecInitParallelPlan to pick the information.
+         */
+        gstate->dest = dest;
+    }
+    else
+    {
+        /*
+         * Upper Gather node has projections, so parallel insertions are not
+         * allowed.
+         */

> +   if ((root->parse->parallelInsCmdTupleCostOpt &
> +        PARALLEL_INSERT_SELECT_QUERY) &&
> +       (root->parse->parallelInsCmdTupleCostOpt &
> +        PARALLEL_INSERT_CAN_IGN_TUP_COST))
> +   {
> +       /* We are ignoring the parallel tuple cost, so mark it. */
> +       root->parse->parallelInsCmdTupleCostOpt |=
> +                                           PARALLEL_INSERT_TUP_COST_IGNORED;
>
> If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED is implied.
>
> Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting (1) of the first two bits should suffice.

The way these flags work is as follows: before planning in CTAS, we
set PARALLEL_INSERT_SELECT_QUERY; before we go to generate the upper
Gather path, we set PARALLEL_INSERT_CAN_IGN_TUP_COST; and when we
actually ignore the tuple cost in cost_gather, we set
PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set
PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
generate_useful_gather_paths, but generate_useful_gather_paths
returns before reaching cost_gather; see the snippets below. So, we
need the 3 flags.

void
generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
override_rows)
{
    ListCell   *lc;
    double        rows;
    double       *rowsp = NULL;
    List       *useful_pathkeys_list = NIL;
    Path       *cheapest_partial_path = NULL;

    /* If there are no partial paths, there's nothing to do here. */
    if (rel->partial_pathlist == NIL)
        return;

    /* Should we override the rel's rowcount estimate? */
    if (override_rows)
        rowsp = &rows;

    /* generate the regular gather (merge) paths */
    generate_gather_paths(root, rel, override_rows);

void
generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
{
    Path       *cheapest_partial_path;
    Path       *simple_gather_path;
    ListCell   *lc;
    double        rows;
    double       *rowsp = NULL;

    /* If there are no partial paths, there's nothing to do here. */
    if (rel->partial_pathlist == NIL)
        return;
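
To summarize, a sketch of where each flag is set (locations and field
names as described above and in the patch snippets; this is an
illustration, not the exact patch code):

    /* In createas.c, before planning the SELECT part: */
    query->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_SELECT_QUERY;

    /* In apply_scanjoin_target_to_paths, before the upper Gather path: */
    root->parse->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_CAN_IGN_TUP_COST;

    /*
     * Only cost_gather, if actually reached, records that the tuple cost
     * was really ignored; generate_useful_gather_paths may return early as
     * shown above, so the CAN_IGN flag alone is not proof that it happened.
     */
    root->parse->parallelInsCmdTupleCostOpt |= PARALLEL_INSERT_TUP_COST_IGNORED;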


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Zhihong Yu
Date:
Thanks for the clarification.

w.r.t. the commit message, maybe I was momentarily sidetracked by the phrase "With this change".
It seems the scenarios you listed are known parallel-safety constraints.

Probably rephrase that sentence a little bit to make this clearer.

Cheers


Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
Hi,

Attaching the v21 patch set, which has the following changes:
1) 0001 - changed fpes->ins_cmd_type ==
PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
PARALLEL_INSERT_CMD_UNDEF
2) 0002 - reworded the commit message.
3) 0003 - added cmin, xmin test case to one of the parallel insert
cases to ensure leader and worker insert the tuples in the same xact
and replaced memory usage output in numbers like 25kB to NkB to make
the tests stable.
4) 0004 - updated one of the test output to be in NkB and made the
assertion in SetParallelInsertState to be not under an if condition.

There's one open point [1] on selectively skipping the error "cannot
insert tuples in a parallel worker" in heap_prepare_insert(); thoughts
are welcome.

Please consider the v21 patch set for further review.

[1] - https://www.postgresql.org/message-id/CALj2ACXmbka1P5pxOV2vU-Go3UPTtsPqZXE8nKW1mE49MQcZtw%40mail.gmail.com


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

RE: Parallel Inserts in CREATE TABLE AS

From
"Hou, Zhijie"
Date:
> Attaching v21 patch set, which has following changes:
> 1) 0001 - changed fpes->ins_cmd_type ==
> PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
> PARALLEL_INSERT_CMD_UNDEF
> 2) 0002 - reworded the commit message.
> 3) 0003 - added cmin, xmin test case to one of the parallel insert cases
> to ensure leader and worker insert the tuples in the same xact and replaced
> memory usage output in numbers like 25kB to NkB to make the tests stable.
> 4) 0004 - updated one of the test output to be in NkB and made the assertion
> in SetParallelInsertState to be not under an if condition.
> 
> There's one open point [1] on selective skipping of error "cannot insert
> tuples in a parallel worker" in heap_prepare_insert(), thoughts are
> welcome.
> 
> Please consider the v21 patch set for further review.

Hi,

I took a look into the new patch and have some comments.

1.
+    /*
+     * Do not consider tuple cost in case of we intend to perform parallel
+     * inserts by workers. We would have turned on the ignore flag in
+     * apply_scanjoin_target_to_paths before generating Gather path for the
+     * upper level SELECT part of the query.
+     */
+    if ((root->parse->parallelInsCmdTupleCostOpt &
+         PARALLEL_INSERT_SELECT_QUERY) &&
+        (root->parse->parallelInsCmdTupleCostOpt &
+         PARALLEL_INSERT_CAN_IGN_TUP_COST))

Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here?
IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set.
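
That is, the condition would reduce to:

+    if (root->parse->parallelInsCmdTupleCostOpt &
+        PARALLEL_INSERT_CAN_IGN_TUP_COST)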


2.
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+                       void *ins_info)
...
+        info = (ParallelInsertCTASInfo *) ins_info;
+        intoclause_str = nodeToString(info->intoclause);
+        intoclause_len = strlen(intoclause_str) + 1;

+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+                       void *ins_info)
...
+        info = (ParallelInsertCTASInfo *)ins_info;
+        intoclause_str = nodeToString(info->intoclause);
+        intoclause_len = strlen(intoclause_str) + 1;
+        intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);

I noticed the above code will call nodeToString and strlen twice, which seems unnecessary.
Do you think it's better to store the results of nodeToString and strlen first and pass them where used?
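
A hypothetical sketch of that refactoring; it would mean changing the
helper signatures, so the parameters here are illustrative only:

    /* Caller serializes the into clause once... */
    char       *intoclause_str = nodeToString(info->intoclause);
    int         intoclause_len = strlen(intoclause_str) + 1;

    /* ...and both helpers take the precomputed string and length. */
    ParallelInsCmdEstimate(pcxt, ins_cmd, intoclause_str, intoclause_len);
    ...
    SaveParallelInsCmdInfo(pcxt, ins_cmd, intoclause_str, intoclause_len);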

3.
+    if (node->need_to_scan_locally || node->nworkers_launched == 0)
+    {
+        EState       *estate = node->ps.state;
+        TupleTableSlot *outerTupleSlot;
+
+        for(;;)
+        {
+            /* Install our DSA area while executing the plan. */
+            estate->es_query_dsa =
+                    node->pei ? node->pei->area : NULL;
...
+            node->ps.state->es_processed++;
+        }

How about using the estate variable, i.e. 'estate->es_processed++;',
instead of node->ps.state->es_processed++;?

Best regards,
houzj






Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Jan 11, 2021 at 6:37 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Attaching v21 patch set, which has following changes:
> > 1) 0001 - changed fpes->ins_cmd_type ==
> > PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
> > PARALLEL_INSERT_CMD_UNDEF
> > 2) 0002 - reworded the commit message.
> > 3) 0003 - added cmin, xmin test case to one of the parallel insert cases
> > to ensure leader and worker insert the tuples in the same xact and replaced
> > memory usage output in numbers like 25kB to NkB to make the tests stable.
> > 4) 0004 - updated one of the test output to be in NkB and made the assertion
> > in SetParallelInsertState to be not under an if condition.
> >
> > There's one open point [1] on selective skipping of error "cannot insert
> > tuples in a parallel worker" in heap_prepare_insert(), thoughts are
> > welcome.
> >
> > Please consider the v21 patch set for further review.
>
> Hi,
>
> I took a look into the new patch and have some comments.

Thanks.

> 1.
> +       /*
> +        * Do not consider tuple cost in case of we intend to perform parallel
> +        * inserts by workers. We would have turned on the ignore flag in
> +        * apply_scanjoin_target_to_paths before generating Gather path for the
> +        * upper level SELECT part of the query.
> +        */
> +       if ((root->parse->parallelInsCmdTupleCostOpt &
> +                PARALLEL_INSERT_SELECT_QUERY) &&
> +               (root->parse->parallelInsCmdTupleCostOpt &
> +                PARALLEL_INSERT_CAN_IGN_TUP_COST))
>
> Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here ?
> IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set.

+1. Changed.

> 2.
> +static void
> +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> +                                          void *ins_info)
> ...
> +               info = (ParallelInsertCTASInfo *) ins_info;
> +               intoclause_str = nodeToString(info->intoclause);
> +               intoclause_len = strlen(intoclause_str) + 1;
>
> +static void
> +SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> +                                          void *ins_info)
> ...
> +               info = (ParallelInsertCTASInfo *)ins_info;
> +               intoclause_str = nodeToString(info->intoclause);
> +               intoclause_len = strlen(intoclause_str) + 1;
> +               intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
>
> I noticed the above code will call nodeToString and strlen twice which seems unnecessary.
> Do you think it's better to store the result of nodetostring and strlen first and pass them when used ?

I wanted to keep the API generic, and not do nodeToString/strlen outside
and pass them to the APIs. I don't think it will add too much function
call cost since it's run only in the leader. This way, the code and
API look more readable. Thoughts?

> 3.
> +       if (node->need_to_scan_locally || node->nworkers_launched == 0)
> +       {
> +               EState     *estate = node->ps.state;
> +               TupleTableSlot *outerTupleSlot;
> +
> +               for(;;)
> +               {
> +                       /* Install our DSA area while executing the plan. */
> +                       estate->es_query_dsa =
> +                                       node->pei ? node->pei->area : NULL;
> ...
> +                       node->ps.state->es_processed++;
> +               }
>
> How about use the variables estate like 'estate-> es_processed++;'
> Instead of node->ps.state->es_processed++;

+1. Changed.

Attaching v22 patch set with changes only in 0001 and 0002. Please
consider it for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Luc Vlaming
Date:
On 06-01-2021 09:32, Bharath Rupireddy wrote:
> On Tue, Jan 5, 2021 at 1:25 PM Luc Vlaming <luc@swarm64.com> wrote:
>>>>>> wrt v18-0002....patch:
>>>>>>
>>>>>> It looks like this introduces a state machine that goes like:
>>>>>> - starts at CTAS_PARALLEL_INS_UNDEF
>>>>>> - possibly moves to CTAS_PARALLEL_INS_SELECT
>>>>>> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
>>>>>> - if both were added at some stage, we can go to
>>>>>> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
>>>>>>
>>>>>> what i'm wondering is why you opted to put logic around
>>>>>> generate_useful_gather_paths and in cost_gather when to me it seems more
>>>>>> logical to put it in create_gather_path? i'm probably missing something
>>>>>> there?
>>>>>
>>>>> IMO, The reason is we want to make sure we only ignore the cost when Gather is the top node.
> >>>>> And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place which can only create the top node Gather.
> >>>>> So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node.
>>>
>>> Right. We wanted to ignore parallel tuple cost for only the upper Gather path.
>>>
>>>> I was wondering actually if we need the state machine. Reason is that as
>>>> AFAICS the code could be placed in create_gather_path, where you can
>>>> also check if it is a top gather node, whether the dest receiver is the
>>>> right type, etc? To me that seems like a nicer solution as its makes
>>>> that all logic that decides whether or not a parallel CTAS is valid is
>>>> in a single place instead of distributed over various places.
>>>
>>> IMO, we can't determine in create_gather_path the fact that we are going
>>> to generate the top Gather path. To decide whether or not to generate the
>>> top Gather path, I think it's not only required to check
>>> root->query_level == 1, but we also need to rely on where
>>> generate_useful_gather_paths gets called from. For instance, for
>>> query_level 1, generate_useful_gather_paths gets called from 2 places
>>> in apply_scanjoin_target_to_paths. Likewise, create_gather_path also
>>> gets called from many places. IMO, the current way, i.e. setting the flag
>>> in apply_scanjoin_target_to_paths and ignoring the cost based on that in
>>> cost_gather, seems safe.
>>>
>>> I may be wrong. Thoughts?
>>
>> So the way I understand it the requirements are:
>> - it needs to be the top-most gather
>> - it should not do anything with the rows after the gather node as this
>> would make the parallel inserts conceptually invalid.
> 
> Right.
> 
>> Right now we're trying to judge what might be added on-top that could
>> change the rows by inspecting all parts of the root object that would
>> cause anything to be added, and add a little statemachine to track the
>> state of that knowledge. To me this has the downside that the list in
>> HAS_PARENT_PATH_GENERATING_CLAUSE has to be exhaustive, and we need to
>> make sure it stays up-to-date, which could result in regressions if not
>> tracked carefully.
> 
> Right. Any new clause that generates an upper path in grouping_planner
> after apply_scanjoin_target_to_paths also needs to be added to
> HAS_PARENT_PATH_GENERATING_CLAUSE. Otherwise, we might ignore the
> parallel tuple cost, because of which a parallel plan may be chosen,
> while we go for parallel inserts only when the top node is Gather. I
> don't think any newly added clause would generate a new upper Gather
> node in grouping_planner after apply_scanjoin_target_to_paths.
> 
>> Personally I would therefore go for a design which is safe in the sense
>> that regressions are not as easily introduced. IMHO that could be done
>> by inspecting the planned query afterwards, and then judging whether or
>> not the parallel inserts are actually the right thing to do.
> 
> The 0001 patch does that. It doesn't have any influence on the planner
> for parallel tuple cost calculation, it just looks at the generated
> plan and decides on parallel inserts. Having said that, we might miss
> parallel plans even though we know that there will not be tuples
> transferred from workers to Gather. So, 0002 patch adds the code for
> influencing the planner for parallel tuple cost.
> 

Ok. Thanks for the explanation and sorry for the confusion.

>> Another way to create more safety against regressions would be to add an
>> assert upon execution of the query that if we do parallel inserts that
>> only a subset of allowed nodes exists above the gather node.
> 
> Yes, we already do this. Please have a look at
> SetParallelInsertState() in the 0002 patch. The idea is that if the
> planner ignored the tuple cost but we later do not allow parallel
> inserts (either because the upper node is not Gather, or it is a Gather
> with projections), the assertion fails. So, in case any new parent path
> generating clause is added (apart from the ones that are there in
> HAS_PARENT_PATH_GENERATING_CLAUSE) and we ignore the tuple cost, then
> this Assert will catch it. Currently, I couldn't find any assertion
> failures in my debug build with make check and make check-world.
> 

Ok. Seems I missed that assert when reviewing.

> +    else
> +    {
> +        /*
> +         * Upper Gather node has projections, so parallel insertions are not
> +         * allowed.
> +         */
> +        if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +            ((DR_intorel *) dest)->is_parallel = false;
> +
> +        gstate->dest = NULL;
> +
> +        /*
> +         * Before returning, ensure that we have not done wrong parallel tuple
> +         * cost enforcement in the planner. Main reason for this assertion is
> +         * to check if we enforced the planner to ignore the parallel tuple
> +         * cost (with the intention of choosing parallel inserts) due to which
> +         * the parallel plan may have been chosen, but we do not allow the
> +         * parallel inserts now.
> +         *
> +         * If we have correctly ignored parallel tuple cost in the planner
> +         * while creating Gather path, then this assertion failure should not
> +         * occur. In case it occurs, that means the planner may have chosen
> +         * this parallel plan because of our wrong enforcement. So let's try to
> +         * catch that here.
> +         */
> +        Assert(tuple_cost_opts && !(*tuple_cost_opts &
> +               PARALLEL_INSERT_TUP_COST_IGNORED));
> +    }
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Kind regards,
Luc



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Mon, Jan 11, 2021 at 8:51 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Attaching v22 patch set with changes only in 0001 and 0002. Please
> consider it for further review.

Seems like the v22 patch set was failing in cfbot due to one of the
unstable test cases. Attaching the v23 patch set with modifications in
the 0003 and 0004 patches. No changes to the 0001 and 0002 patches.
Hopefully cfbot will be happy with v23.

Please consider v23 for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

RE: Parallel Inserts in CREATE TABLE AS

From
"Tang, Haiying"
Date:
Hi Bharath,

I'm trying to take some performance measurements on your patch v23.
But when I started, I found an issue with unbalanced tuple distribution among workers (99% of tuples read by one worker) under a specific case, which makes the "parallel select" part show no performance gain.
Then I found it's not introduced by your patch, because it also happens on master (HEAD). But I don't know how to deal with it, so I put it here to see if anybody knows what's going wrong here or has good ideas to deal with this issue.

Here are the conditions to produce the issue:
1. A high-CPU-spec environment (say above 20 processors). On smaller CPUs it also happens, but not so obviously (40% of tuples on one worker in my tests).
2. The query plan is "serial insert + parallel select"; I have reproduced this behavior in CTAS, SELECT INTO, and INSERT INTO SELECT.
3. The select part needs to query a large data size (e.g. query 100 million rows out of 200 million).

Given the above, IMHO, I guess it may be caused by the leader's write rate not keeping up with the workers' read rate, so the tuples of one worker get blocked in the queue and accumulate more and more.

Below is my test info:
1. test spec environment
  CentOS 8.2, 128G RAM, 40 processors, disk SAS 

2. test data prepare
  create table x(a int, b int, c int);
  create index on x(a);
  insert into x select generate_series(1,200000000),floor(random()*(10001-1)+1),floor(random()*(10001-1)+1);

3. test execute results
  *Patched CTAS*: please look at worker 2, 99% of tuples are read by it.
  explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.203..24023.686 rows=100006268 loops=1)
   Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
         Output: a, c
         Filter: ((x.b % 2) = 0)
         Rows Removed by Filter: 19998746
         Worker 0:  actual time=0.016..19.265 rows=94592 loops=1
         Worker 1:  actual time=0.027..31.422 rows=94574 loops=1
         Worker 2:  actual time=0.014..21744.549 rows=99627749 loops=1
         Worker 3:  actual time=0.015..19.347 rows=94586 loops=1  Planning Time: 0.098 ms  Execution Time: 91054.828 ms

  *Non-patched CTAS*: please look at worker 0, also 99% of tuples are read by it.
  explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
   Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.283..19216.157 rows=100003148 loops=1)
   Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.020..4380.360 rows=20000630 loops=5)
         Output: a, c
         Filter: ((x.b % 2) = 0)
         Rows Removed by Filter: 19999370
         Worker 0:  actual time=0.013..21805.647 rows=99624833 loops=1
         Worker 1:  actual time=0.016..19.790 rows=94398 loops=1
         Worker 2:  actual time=0.013..35.340 rows=94423 loops=1
         Worker 3:  actual time=0.035..19.849 rows=94679 loops=1  Planning Time: 0.083 ms  Execution Time: 91151.097 ms

I'm still working on the performance tests on your patch; if I make some progress, I will post my results here.

Regards,
Tang



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Fri, Jan 22, 2021 at 5:16 PM Tang, Haiying <tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Bharath,
>
> I'm trying to take some performance measurements on your patch v23.
> But when I started, I found an issue with unbalanced tuple distribution among workers (99% of tuples read by one worker) under a specific case, which makes the "parallel select" part show no performance gain.
> Then I found it's not introduced by your patch, because it also happens on master (HEAD). But I don't know how to deal with it, so I put it here to see if anybody knows what's going wrong here or has good ideas to deal with this issue.
>
> Here are the conditions to produce the issue:
> 1. A high-CPU-spec environment (say above 20 processors). On smaller CPUs it also happens, but not so obviously (40% of tuples on one worker in my tests).
> 2. The query plan is "serial insert + parallel select"; I have reproduced this behavior in CTAS, SELECT INTO, and INSERT INTO SELECT.
> 3. The select part needs to query a large data size (e.g. query 100 million rows out of 200 million).
>
> Given the above, IMHO, I guess it may be caused by the leader's write rate not keeping up with the workers' read rate, so the tuples of one worker get blocked in the queue and accumulate more and more.
>
> Below is my test info:
> 1. test spec environment
>   CentOS 8.2, 128G RAM, 40 processors, disk SAS
>
> 2. test data prepare
>   create table x(a int, b int, c int);
>   create index on x(a);
>   insert into x select generate_series(1,200000000),floor(random()*(10001-1)+1),floor(random()*(10001-1)+1);
>
> 3. test execute results
>   *Patched CTAS*: please look at worker 2, 99% tuples read by it.
>   explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
>                                                                 QUERY PLAN
>   -------------------------------------------------------------------------------------------------------------------------------
>  Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.203..24023.686 rows=100006268 loops=1)
>    Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
>    Workers Planned: 4
>    Workers Launched: 4
>    ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
>          Output: a, c
>          Filter: ((x.b % 2) = 0)
>          Rows Removed by Filter: 19998746
>          Worker 0:  actual time=0.016..19.265 rows=94592 loops=1
>          Worker 1:  actual time=0.027..31.422 rows=94574 loops=1
>          Worker 2:  actual time=0.014..21744.549 rows=99627749 loops=1
>          Worker 3:  actual time=0.015..19.347 rows=94586 loops=1  Planning Time: 0.098 ms  Execution Time: 91054.828 ms
>
>   *Non-patched CTAS*: please look at worker 0, also 99% tuples read by it.
>   explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
>                                                                 QUERY PLAN
>   -------------------------------------------------------------------------------------------------------------------------------
>    Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.283..19216.157 rows=100003148 loops=1)
>    Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
>    Workers Planned: 4
>    Workers Launched: 4
>    ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.020..4380.360 rows=20000630 loops=5)
>          Output: a, c
>          Filter: ((x.b % 2) = 0)
>          Rows Removed by Filter: 19999370
>          Worker 0:  actual time=0.013..21805.647 rows=99624833 loops=1
>          Worker 1:  actual time=0.016..19.790 rows=94398 loops=1
>          Worker 2:  actual time=0.013..35.340 rows=94423 loops=1
>          Worker 3:  actual time=0.035..19.849 rows=94679 loops=1  Planning Time: 0.083 ms  Execution Time: 91151.097 ms
>
> I'm still working on the performance tests on your patch, if I make some progress, I will post my results here.

Thanks a lot for the tests. In your test case, parallel insertions are not being picked because the Gather node has some projections (floor(((random() * '10000'::double precision) + '1'::double precision))) to perform. That's expected. Whenever parallel insertions are chosen for CTAS, we should see "Create target_table" under the Gather node [1], and also an actual row count of 0 for the Gather node (but in your test it is rows=100006268), in the explain analyze output. Coming to your test case, if it's modified to something like [1], where the Gather node has no projections, then parallel insertions will be chosen.

[1] - I did this test on my development system; I will run it on some performance system and post my observations.
postgres=# explain (analyze, verbose) create table test(a,b,c) as select a,b,c from x where b%2=0;
                                                         QUERY PLAN                                                        
----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..3846.71 rows=1000 width=12) (actual time=5581.308..5581.379 rows=0 loops=1)
   Output: a, b, c
   Workers Planned: 1
   Workers Launched: 1
 ->  Create test
   ->  Parallel Seq Scan on public.x  (cost=0.00..2846.71 rows=588 width=12) (actual time=0.014..29.512 rows=50023 loops=2)
         Output: a, b, c
         Filter: ((x.b % 2) = 0)
         Rows Removed by Filter: 49977
         Worker 0:  actual time=0.015..29.751 rows=49419 loops=1
 Planning Time: 1574.584 ms
 Execution Time: 6437.562 ms
(12 rows)

With Regards,
Bharath Rupireddy.

RE: Parallel Inserts in CREATE TABLE AS

From
"Tang, Haiying"
Date:

> Thanks a lot for the tests. In your test case, parallel insertions are not being picked because the Gather node has
> some projections (floor(((random() * '10000'::double precision) + '1'::double precision))) to perform. That's expected.
> Whenever parallel insertions are chosen for CTAS, we should see "Create target_table" under the Gather node [1],
> and also an actual row count of 0 for the Gather node (but in your test it is rows=100006268), in the explain analyze output.
> Coming to your test case, if it's modified to something like [1], where the Gather node has no projections,
> then parallel insertions will be chosen.

Thanks for your explanation and test.
Actually, I deliberately made my test case (with projection) pick serial insert, in order to make the unbalanced tuple distribution (99% of tuples read by one worker) happen.
This issue leads to the performance regression.

But it's not introduced by your patch; it happens on master (HEAD).
Do you have any thoughts about this?

> [1] - I did this test on my development system; I will run it on some performance system and post my observations.

Thank you, it will be very kind of you to do this.
To reproduce the above issue, you need to use my case (with projection), because it won't occur with parallel insert.

Regards,
Tang

RE: Parallel Inserts in CREATE TABLE AS

From
"Tang, Haiying"
Date:

Hi Bharath,

I chose 5 cases which pick a parallel insert plan in CTAS to measure the patched performance. Each case ran 30 times.
Most of the tests' execution becomes faster with this patch.
However, Test NO 4 (create table xxx as table xxx) shows a performance degradation. I tested various table sizes (2/10/20 million rows); they all show 6%-10% declines. I think this problem may need some investigation.

Below are my test results. 'Test NO' corresponds to 'Test NO' in the attached test_ctas.sql file.
reg%=(patched-master)/master

Test NO |  Test Case                     |reg%  | patched(ms)  | master(ms)
--------|--------------------------------|------|--------------|-------------
1       |  CTAS select from table        | -9%  | 16709.50477  | 18370.76660
2       |  Append plan                   | -14% | 16542.97807  | 19305.86600
3       |  initial plan under Gather node| -5%  | 13374.27187  | 14120.02633
4       |  CTAS table                    | 10%  | 20835.48800  | 18986.40350
5       |  CTAS select from execute      | -6%  | 16973.73890  | 18008.59789

About Test NO 4:
In master (HEAD), this test case picks a serial seq scan.
The query plan looks like:
----------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.tenk1  (cost=0.00..444828.12 rows=10000012 width=244) (actual time=0.005..1675.268 rows=10000000 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4  Planning Time: 0.053 ms  Execution Time: 20165.023 ms

With this patch, it chooses a parallel seq scan and parallel insert.
The query plan looks like:
----------------------------------------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1000.00..370828.03 rows=10000012 width=244) (actual time=20428.823..20437.143 rows=0 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Workers Planned: 4
   Workers Launched: 4
 ->  Create test
   ->  Parallel Seq Scan on public.tenk1  (cost=0.00..369828.03 rows=2500003 width=244) (actual time=0.021..411.094 rows=2000000 loops=5)
         Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
         Worker 0:  actual time=0.023..390.856 rows=1858407 loops=1
         Worker 1:  actual time=0.024..468.587 rows=2264494 loops=1
         Worker 2:  actual time=0.023..473.170 rows=2286580 loops=1
         Worker 3:  actual time=0.027..373.727 rows=1853216 loops=1  Planning Time: 0.053 ms  Execution Time: 20437.643 ms

test machine spec:
CentOS 8.2, 128G RAM, 40 processors, disk SAS

Regards,
Tang

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Jan 27, 2021 at 1:25 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:
> I chose 5 cases which pick a parallel insert plan in CTAS to measure the patched performance. Each case ran 30 times.
>
> Most of the tests' execution becomes faster with this patch.
>
> However, Test NO 4 (create table xxx as table xxx) shows a performance degradation. I tested various table sizes (2/10/20 million rows); they all show 6%-10% declines. I think this problem may need some investigation.
>
> Below are my test results. 'Test NO' corresponds to 'Test NO' in the attached test_ctas.sql file.
> reg%=(patched-master)/master
>
> Test NO |  Test Case                     |reg%  | patched(ms)  | master(ms)
> --------|--------------------------------|------|--------------|-------------
> 1       |  CTAS select from table        | -9%  | 16709.50477  | 18370.76660
> 2       |  Append plan                   | -14% | 16542.97807  | 19305.86600
> 3       |  initial plan under Gather node| -5%  | 13374.27187  | 14120.02633
> 4       |  CTAS table                    | 10%  | 20835.48800  | 18986.40350
> 5       |  CTAS select from execute      | -6%  | 16973.73890  | 18008.59789
>
> About Test NO 4:
>
> In master (HEAD), this test case picks a serial seq scan.
> The query plan looks like:
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
> Seq Scan on public.tenk1  (cost=0.00..444828.12 rows=10000012 width=244) (actual time=0.005..1675.268 rows=10000000 loops=1)
>    Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4  Planning Time: 0.053 ms  Execution Time: 20165.023 ms
>
> With this patch, it chooses a parallel seq scan and parallel insert.
> The query plan looks like:
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
> Gather  (cost=1000.00..370828.03 rows=10000012 width=244) (actual time=20428.823..20437.143 rows=0 loops=1)
>    Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
>    Workers Planned: 4
>    Workers Launched: 4
> ->  Create test
>    ->  Parallel Seq Scan on public.tenk1  (cost=0.00..369828.03 rows=2500003 width=244) (actual time=0.021..411.094 rows=2000000 loops=5)
>          Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
>          Worker 0:  actual time=0.023..390.856 rows=1858407 loops=1
>          Worker 1:  actual time=0.024..468.587 rows=2264494 loops=1
>          Worker 2:  actual time=0.023..473.170 rows=2286580 loops=1
>          Worker 3:  actual time=0.027..373.727 rows=1853216 loops=1  Planning Time: 0.053 ms  Execution Time: 20437.643 ms
>
> test machine spec:
>
> CentOS 8.2, 128G RAM, 40 processors, disk SAS

Thanks a lot for the performance tests and test cases. I will analyze
why the performance is degrading in that one case and respond soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Jan 27, 2021 at 1:25 PM Tang, Haiying
> <tanghy.fnst@cn.fujitsu.com> wrote:
> > I chose 5 cases that pick a parallel insert plan in CTAS to measure the patched performance. Each case was run 30 times.
> >
> > Most of the test executions become faster with this patch.
> >
> > However, Test NO 4 (create table xxx as table xxx) shows a performance degradation. I tested various table sizes (2/10/20 million rows); they all show a 6%-10% decline. I think this problem needs some investigation.
> >
> >
> >
> > Below are my test results. 'Test NO' corresponds to 'Test NO' in the attached test_ctas.sql file.
> >
> > reg%=(patched-master)/master
> >
> > Test NO |  Test Case                     |reg%  | patched(ms)  | master(ms)
> > --------|--------------------------------|------|--------------|-------------
> > 1       |  CTAS select from table        | -9%  | 16709.50477  | 18370.76660
> > 2       |  Append plan                   | -14% | 16542.97807  | 19305.86600
> > 3       |  initial plan under Gather node| -5%  | 13374.27187  | 14120.02633
> > 4       |  CTAS table                    | 10%  | 20835.48800  | 18986.40350
> > 5       |  CTAS select from execute      | -6%  | 16973.73890  | 18008.59789
> >
> >
> >
> > About Test NO 4:
> >
> > In master (HEAD), this test case picks a serial seq scan.
> >
> > The query plan looks like:
> >
> > ----------------------------------------------------------------------------------------------------------------------------------------------------------
> > Seq Scan on public.tenk1  (cost=0.00..444828.12 rows=10000012 width=244) (actual time=0.005..1675.268 rows=10000000 loops=1)
> >    Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
> > Planning Time: 0.053 ms
> > Execution Time: 20165.023 ms
> >
> >
> >
> > With this patch, it chooses a parallel seq scan and parallel insert.
> >
> > The query plan looks like:
> >
> > ----------------------------------------------------------------------------------------------------------------------------------------------------------
> > Gather  (cost=1000.00..370828.03 rows=10000012 width=244) (actual time=20428.823..20437.143 rows=0 loops=1)
> >    Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
> >    Workers Planned: 4
> >    Workers Launched: 4
> > ->  Create test
> >    ->  Parallel Seq Scan on public.tenk1  (cost=0.00..369828.03 rows=2500003 width=244) (actual time=0.021..411.094 rows=2000000 loops=5)
> >          Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
> >          Worker 0:  actual time=0.023..390.856 rows=1858407 loops=1
> >          Worker 1:  actual time=0.024..468.587 rows=2264494 loops=1
> >          Worker 2:  actual time=0.023..473.170 rows=2286580 loops=1
> >          Worker 3:  actual time=0.027..373.727 rows=1853216 loops=1
> > Planning Time: 0.053 ms
> > Execution Time: 20437.643 ms
> >
> >
> >
> > test machine spec:
> >
> > CentOS 8.2, 128G RAM, 40 processors, disk SAS
>
> Thanks a lot for the performance tests and test cases. I will analyze
> why the performance is degrading in one case and respond soon.

I analyzed the performance of parallel inserts in CTAS for different cases
with tuple sizes of 32, 59, 241 and 1064 bytes. We gain if the tuple
sizes are smaller. But if the tuple size is larger, i.e. 1064 bytes,
there's a regression with parallel inserts. Upon further analysis, it
turned out that the parallel workers require frequent addition of extra
blocks while concurrently extending the relation (in
RelationAddExtraBlocks), and the majority of the time is spent flushing
those new empty pages/blocks to disk. I saw no regression when I
increased (for testing purposes) the rate at which the extra blocks are
added in RelationAddExtraBlocks to extraBlocks = Min(1024, lockWaiters *
512); (currently it is extraBlocks = Min(512, lockWaiters * 20);).
Increasing the rate of extra block addition is not a practical solution
to this problem though.
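
For reference, here is a minimal sketch of the experiment in
RelationAddExtraBlocks() (src/backend/access/heap/hio.c); the numbers
were chosen only to confirm the bottleneck, not as a proposal:

    /* current logic: extend by ~20 blocks per lock waiter, capped at 512 */
    extraBlocks = Min(512, lockWaiters * 20);

    /* testing-only change: extend far more aggressively, so that the
     * workers rarely have to wait on the relation extension lock */
    extraBlocks = Min(1024, lockWaiters * 512);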

In an offlist discussion with Robert and Dilip, it came up that using
fallocate() might make relation extension faster. In this regard, it
looks like the AIO/DIO patch set from Andres [1], which involves using
fallocate() to extend files, will surely be helpful. Until then, we
honestly feel that the parallel inserts in CTAS patch set should be put
on hold, to be revived later.

[1] - https://www.postgresql.org/message-id/flat/20210223100344.llw5an2aklengrmn%40alap3.anarazel.de

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"tanghy.fnst@fujitsu.com"
Date:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
>I analyzed the performance of parallel inserts in CTAS for different cases
>with tuple sizes of 32, 59, 241 and 1064 bytes. We gain if the tuple
>sizes are smaller. But if the tuple size is larger, i.e. 1064 bytes,
>there's a regression with parallel inserts.

Thanks for the update.
BTW, maybe you have some more test cases that can reproduce this regression easily.
Could you please share some of those test cases (with big tuple sizes) with me?

Regards,
Tang



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Fri, Mar 19, 2021 at 12:45 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> >I analyzed the performance of parallel inserts in CTAS for different cases
> >with tuple sizes of 32, 59, 241 and 1064 bytes. We gain if the tuple
> >sizes are smaller. But if the tuple size is larger, i.e. 1064 bytes,
> >there's a regression with parallel inserts.
>
> Thanks for the update.
> BTW, maybe you have some more test cases that can reproduce this regression easily.
> Could you please share some of those test cases (with big tuple sizes) with me?

They are pretty simple though. I think someone can also check whether the
same regression exists in the parallel inserts "INSERT INTO SELECT"
patch set as well for larger tuple sizes.

[1]
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 int, c2 int);
INSERT INTO tenk1 values(generate_series(1,100000000),
generate_series(1,100000000));
explain analyze verbose create table test as select * from tenk1;

DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 int, c2 int, c3 varchar(8), c4
varchar(8), c5 varchar(8));
INSERT INTO tenk1 values(generate_series(1,100000000),
generate_series(1,100000000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create table test as select * from tenk1;

DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 varchar(8));
INSERT INTO tenk1 values(generate_series(1,100000000),
generate_series(1,100000000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create table test as select * from tenk1;

DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12
name, c13 name, c14 name, c15 name, c16 name, c17 name, c18 name);
INSERT INTO tenk1 values(generate_series(1,10000000),
generate_series(1,10000000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create unlogged table test as select * from tenk1;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"houzj.fnst@fujitsu.com"
Date:
> They are pretty simple though. I think someone can also check whether the
> same regression exists in the parallel inserts "INSERT INTO SELECT"
> patch set as well for larger tuple sizes.

Thanks for the reminder.
I did some performance tests for parallel inserts in "INSERT INTO SELECT" with the test cases you provided;
the regression does not seem to exist in "INSERT INTO SELECT".

I will try to test with larger tuple sizes later.

Best regards,
houzj

RE: Parallel Inserts in CREATE TABLE AS

From
"houzj.fnst@fujitsu.com"
Date:
> > They are pretty simple though. I think someone can also check whether the
> > same regression exists in the parallel inserts "INSERT INTO SELECT"
> > patch set as well for larger tuple sizes.
>
> Thanks for the reminder.
> I did some performance tests for parallel inserts in "INSERT INTO SELECT" with
> the test cases you provided; the regression does not seem to exist in "INSERT
> INTO SELECT".

I forgot to share the test results with parallel CTAS.

I tested with the SQL: explain analyze verbose create table test as select * from tenk1;

> CREATE UNLOGGED TABLE tenk1(c1 int, c2 int);
> CREATE UNLOGGED TABLE tenk1(c1 int, c2 int, c3 varchar(8), c4 varchar(8), c5 varchar(8));
> CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5 name, c6 varchar(8));
I did not see a regression in these cases (smaller tuple sizes).

> CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5 name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12 name, c13 name, c14 name, c15 name, c16 name, c17 name, c18 name);

I can see the degradation in this case.
The average test results of CTAS are:
Serial   CTAS ----- Execution Time: 80892.240 ms
Parallel CTAS ----- Execution Time: 85725.591 ms
About 6% degradation.

I also tested with the Parallel INSERT patch in this case.
(Note: to keep it consistent, I create a new target table (test) before inserting.)
The average test results of the Parallel INSERT patch are:
Serial run of Parallel INSERT patch   ----- Execution Time: 90075.501 ms
Parallel run of Parallel INSERT patch ----- Execution Time: 85812.202 ms
No degradation.

Best regards,
houzj


Re: Parallel Inserts in CREATE TABLE AS

From
Greg Nancarrow
Date:
On Fri, Mar 19, 2021 at 4:33 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> In an offlist discussion with Robert and Dilip, it came up that using
> fallocate() might make relation extension faster. In this regard, it
> looks like the AIO/DIO patch set from Andres [1], which involves using
> fallocate() to extend files, will surely be helpful. Until then, we
> honestly feel that the parallel inserts in CTAS patch set should be put
> on hold, to be revived later.
>

Hi,

I had partially reviewed some of the patches (first scan) when I was
alerted to your post and intention to put the patch on hold.
I thought I'd just post the comments I have so far, and you can look
at them at a later time when/if you revive the patch.


Patch 0001

1) Patch comment

Leader inserts its share of tuples if instructed to do, and so are workers

should be:

Leader inserts its share of tuples if instructed to, and so do the workers.


2)

void
SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
{
    GatherState *gstate;
    DestReceiver *dest;

    Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));

    gstate = (GatherState *) queryDesc->planstate;
    dest = queryDesc->dest;

    /*
    * Parallel insertions are not possible if the upper node is not
    * Gather, or if it's a Gather but it has some projection to perform.
    */
    if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
        return;


I think it would look better for the code to be:


    dest = queryDesc->dest;

    /*
    * Parallel insertions are not possible if the upper node is not
    * Gather, or if it's a Gather but it has some projection to perform.
    */
    if (!IsA(queryDesc->planstate, GatherState) ||
        queryDesc->planstate->ps_ProjInfo)
        return;
    gstate = (GatherState *) queryDesc->planstate;

3) src/backend/executor/execParallel.c

+ pg_atomic_uint64 processed;

I am wondering, when there is contention from multiple workers in
writing back their processed count, how well does this work? Any
performance issues?
For the Parallel INSERT patch (which has not yet been committed) it
currently uses an array of processed counts for the workers (since #
of workers is capped) so there is never any contention related to
this.
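
For illustration, a rough sketch of the two designs (the field names
here are invented for the sketch, not taken from either patch):

    /* (a) one shared atomic, as in this patch: every worker does an
     * atomic add on the same cache line when writing back its count */
    pg_atomic_fetch_add_u64(&shared->processed, my_processed);

    /* (b) per-worker slots, as in the Parallel INSERT patch: each worker
     * writes only its own slot and the leader sums the slots at the end */
    shared->processed_counts[ParallelWorkerNumber] = my_processed;

If each worker writes its count back only once at shutdown, the
contention may be negligible in practice, but (b) avoids it entirely.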


4) src/backend/executor/execParallel.c

You shouldn't use intermingled declarations and code.
https://www.postgresql.org/docs/13/source-conventions.html

Best to move the uninitialized variable declaration to the top of the block:


ParallelInsertCTASInfo *info = NULL;
char *intoclause_str = NULL;
int intoclause_len;
char *intoclause_space = NULL;

should be:

int intoclause_len;
ParallelInsertCTASInfo *info = NULL;
char *intoclause_str = NULL;
char *intoclause_space = NULL;


5) ExecParallelGetInsReceiver

Would look better to have:

DR_intorel *receiver;

receiver = (DR_intorel *)CreateIntoRelDestReceiver(intoclause);

receiver->is_parallel_worker = true;
receiver->object_id = fpes->objectid;


6) GetParallelInsertCmdType

I think the following would be better:

ParallelInsertCmdKind
GetParallelInsertCmdType(DestReceiver *dest)
{
    if (dest &&
        dest->mydest == DestIntoRel &&
            ((DR_intorel *) dest)->is_parallel)
        return PARALLEL_INSERT_CMD_CREATE_TABLE_AS;

    return PARALLEL_INSERT_CMD_UNDEF;
}


7) IsParallelInsertAllowed

In the following code:

/* Below check may hit in case this function is called from explain.c. */
if (!(into && IsA(into, IntoClause)))
    return false;

If "into" is non-NULL, isn't it guaranteed to point at an IntoClause?

I think the code can just be:

/* Below check may hit in case this function is called from explain.c. */
if (!into)
    return false;

8) ExecGather

The comments and variable name are likely to cause confusion when the
parallel INSERT statement is implemented. Suggest minor change:

change:

   bool perform_parallel_ins = false;

to:

   bool perform_parallel_ins_no_readers = false;


change:

/*
* Do not create tuple queue readers for commands with parallel
* insertion. Because the gather node will not receive any
* tuples, the workers will insert the tuples into the target
* relation.
*/

to:

/*
* Do not create tuple queue readers for commands with parallel
* insertion that don't additionally return tuples. In this case,
* the workers will only insert the tuples into the target
* relation and the gather node will not receive any tuples.
*/


I think some changes in other areas are needed for the same reasons.



Patch 0002

1) I noticed that "rows" is not zero (and so is not displayed as 0 in
the EXPLAIN output for Gather) for the Gather node when parallel
inserts will be used. This doesn't seem right. I think that if
PARALLEL_INSERT_CAN_IGN_TUP_COST is set, path->rows should be set to
0, and the existing "run_cost" just left to be evaluated as normal
(which will be 0, as path->rows is 0); a rough sketch follows below.

2) Is PARALLEL_INSERT_TUP_COST_IGNORED actually needed? Couldn't only
PARALLEL_INSERT_CAN_IGN_TUP_COST be used for the purpose of ignoring
parallel tuple cost?
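
A minimal sketch of the suggestion in 1); only the flag name comes from
the patch, the surrounding names are placeholders:

    /* wherever the patch decides the parallel tuple cost can be ignored */
    if (tuple_cost_flags & PARALLEL_INSERT_CAN_IGN_TUP_COST)
        path->rows = 0;    /* run_cost scales with rows, so it stays 0 too */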


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
>
> I analyzed the performance of parallel inserts in CTAS for different cases
> with tuple sizes of 32, 59, 241 and 1064 bytes. We gain if the tuple
> sizes are smaller. But if the tuple size is larger, i.e. 1064 bytes,
> there's a regression with parallel inserts. Upon further analysis, it
> turned out that the parallel workers require frequent addition of extra
> blocks while concurrently extending the relation (in
> RelationAddExtraBlocks), and the majority of the time is spent flushing
> those new empty pages/blocks to disk.
>

How have you ensured that the cost is due to the flushing of pages?
AFAICS, we don't flush the pages, rather we just write them and then
register them to be flushed by the checkpointer. Now, it is possible that
the checkpointer sync queue gets full and the backend has to write by
itself, but have we checked that? I think we can check via wait events;
if it is due to flush, then we should see a lot of file sync
(WAIT_EVENT_DATA_FILE_SYNC) wait events.  The other possibility could
be that the free pages added to the FSM by one worker are not being used
by another worker for some reason. Can we debug and check if the
pages added by one worker are being used by another worker?

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> >
> > I analyzed the performance of parallel inserts in CTAS for different cases
> > with tuple sizes of 32, 59, 241 and 1064 bytes. We gain if the tuple
> > sizes are smaller. But if the tuple size is larger, i.e. 1064 bytes,
> > there's a regression with parallel inserts. Upon further analysis, it
> > turned out that the parallel workers require frequent addition of extra
> > blocks while concurrently extending the relation (in
> > RelationAddExtraBlocks), and the majority of the time is spent flushing
> > those new empty pages/blocks to disk.
> >
>
> How have you ensured that the cost is due to the flushing of pages?
> AFAICS, we don't flush the pages, rather we just write them and then
> register them to be flushed by the checkpointer. Now, it is possible that
> the checkpointer sync queue gets full and the backend has to write by
> itself, but have we checked that? I think we can check via wait events;
> if it is due to flush, then we should see a lot of file sync
> (WAIT_EVENT_DATA_FILE_SYNC) wait events.  The other possibility could
> be that the free pages added to the FSM by one worker are not being used
> by another worker for some reason. Can we debug and check if the
> pages added by one worker are being used by another worker?

Thanks! I will work on the above points sometime later.

BTW, I forgot to mention one point earlier: we see a benefit even
without parallelism if multi inserts are used for CTAS instead of
single inserts. See [2] for more testing results. I used the "New Table
Access Methods for Multi and Single Inserts" patches from [1] for this
testing. I think it's a good idea to revisit that work.

[1] - https://www.postgresql.org/message-id/CALj2ACXdrOmB6Na9amHWZHKvRT3Z0nwTRsCwoMT-npOBtmXLXg%40mail.gmail.com
[2]
case 1 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn tuples
on master - 130sec
on master with multi inserts - 105sec, gain - 1.23X
on parallel CTAS patch without multi inserts - (2 workers,  82sec,
1.58X), (4 workers, 83sec, 1.56X)
on parallel CTAS patch with multi inserts - (2 workers,  45sec, 2.33X,
overall gain if seen from master 2.88X), (4 workers, 33sec, 3.18X,
overall gain if seen from master 3.9X)

case 2 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
59 bytes, 100mn tuples
on master - 185sec
on master with multi inserts - 121sec, gain - 1.52X
on parallel CTAS patch without multi inserts - (2 workers,  120sec,
1.54X), (4 workers, 123sec, 1.5X)
on parallel CTAS patch with multi inserts - (2 workers,  68sec, 1.77X,
overall gain if seen from master  2.72X), (4 workers, 61sec, 1.98X,
overall gain if seen from master 3.03X)

The above two cases, with tuple sizes of only a few bytes, are the best
cases, where parallel CTAS + multi inserts would give up to 3.9X and
3.03X benefits.

case 3 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
on master - 367sec
on master with multi inserts - 291sec, gain - 1.26X
on parallel CTAS patch without multi inserts - (2 workers,  334sec,
1.09X), (4 workers, 336sec, 1.09X)
on parallel CTAS patch with multi inserts - (2 workers,  284sec,
1.02X, overall gain if seen from master 1.29X), (4 workers, 278sec,
1.04X, overall gain if seen from master 1.32X)

In the above case, where the tuple size is 241 bytes, we don't gain much.

case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
columns, tuple size 1064 bytes, 10mn tuples
on master - 120sec
on master with multi inserts - 115sec, gain - 1.04X
on parallel CTAS patch without multi inserts - (2 workers,  140sec,
0.85X), (4 workers, 142sec, 0.84X)
on parallel CTAS patch with multi inserts - (2 workers,  133sec,
0.86X, overall loss if seen from master  0.9X), (4 workers, 134sec,
0.85X, overall loss if seen from master 0.89X)

In the above case, where the tuple size is 1064 bytes, we gain very
little with multi inserts, and parallel inserts cause a regression.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"houzj.fnst@fujitsu.com"
Date:
Hi Bharath-san,

From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Friday, May 21, 2021 6:49 PM
> 
> On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > >
> > >
> > > I analyzed the performance of parallel inserts in CTAS for different
> > > cases with tuple sizes of 32, 59, 241 and 1064 bytes. We gain if
> > > the tuple sizes are smaller. But if the tuple size is larger, i.e.
> > > 1064 bytes, there's a regression with parallel inserts. Upon
> > > further analysis, it turned out that the parallel workers require
> > > frequent addition of extra blocks while concurrently extending the
> > > relation (in RelationAddExtraBlocks), and the majority of the time
> > > is spent flushing those new empty pages/blocks to disk.
> > >
> >
> > How have you ensured that the cost is due to the flushing of pages?
> > AFAICS, we don't flush the pages, rather we just write them and then
> > register them to be flushed by the checkpointer. Now, it is possible that
> > the checkpointer sync queue gets full and the backend has to write by
> > itself, but have we checked that? I think we can check via wait events;
> > if it is due to flush, then we should see a lot of file sync
> > (WAIT_EVENT_DATA_FILE_SYNC) wait events.  The other possibility could
> > be that the free pages added to the FSM by one worker are not being used
> > by another worker for some reason. Can we debug and check if the
> > pages added by one worker are being used by another worker?
> 
> Thanks! I will work on the above points sometime later.

I noticed one place which could be one of the reasons causing the performance degradation.

+        /*
+         * We don't need to skip contacting FSM while inserting tuples for
+         * parallel mode, while extending the relations, workers instead of
+         * blocking on a page while another worker is inserting, can check the
+         * FSM for another page that can accommodate the tuples. This results
+         * in major benefit for parallel inserts.
+         */
+        myState->ti_options = 0;

I am not quite sure that disabling "SKIP FSM" in parallel workers will bring a performance gain.
In my test environment, if I change this code to use the option TABLE_INSERT_SKIP_FSM, there
seems to be no performance degradation. Could you please give it a try?
(I tested with the SQL you provided earlier [1])

[1] https://www.postgresql.org/message-id/CALj2ACWFvNm4d_uqT2iECPqaXZjEd-O%2By8xbghvqXeMLj0pxGw%40mail.gmail.com

Best regards,
houzj

RE: Parallel Inserts in CREATE TABLE AS

From
"tsunakawa.takay@fujitsu.com"
Date:
Bharath-san, all,


Hmm, I didn't experience performance degradation on my poor-man's Linux VM (4 CPU, 4 GB RAM, HDD)...

[benchmark preparation]
autovacuum = off
shared_buffers = 1GB
checkpoint_timeout = 1h
max_wal_size = 8GB
min_wal_size = 8GB
(other settings to enable parallelism)
CREATE UNLOGGED TABLE a (c char(1100));
INSERT INTO a SELECT i FROM generate_series(1, 300000) i;
(the table size is 335 MB)

[benchmark]
CREATE TABLE b AS SELECT * FROM a;
DROP TABLE a;
CHECKPOINT;
(measure only CTAS)


[results]
parallel_leader_participation = off
  workers  time(ms)
  0  3921
  2  3290
  4  3132
parallel_leader_participation = on
  workers  time(ms)
  2  3266
  4  3247


Although this should be a controversial and maybe crazy idea, the following change brought a 4-11% speedup.  This is
because I thought parallel workers might contend for WAL flush as a result of them using the limited ring buffer and
flushing dirty buffers when the ring buffer is filled.  Can we take advantage of this?

[GetBulkInsertState]
/*  bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);*/
    bistate->strategy = NULL;


[results]
parallel_leader_participation = off
  workers  time(ms)
  0  3695  (5% reduction)
  2  3135  (4% reduction)
  4  2767  (11% reduction)


Regards
Takayuki Tsunakawa


RE: Parallel Inserts in CREATE TABLE AS

From
"tsunakawa.takay@fujitsu.com"
Date:
From: houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> +        /*
> +         * We don't need to skip contacting FSM while inserting tuples
> for
> +         * parallel mode, while extending the relations, workers
> instead of
> +         * blocking on a page while another worker is inserting, can
> check the
> +         * FSM for another page that can accommodate the tuples.
> This results
> +         * in major benefit for parallel inserts.
> +         */
> +        myState->ti_options = 0;
> 
> I am not quite sure that disabling "SKIP FSM" in parallel workers will
> bring a performance gain.
> In my test environment, if I change this code to use the option
> TABLE_INSERT_SKIP_FSM, there
> seems to be no performance degradation.

+1, probably.

Does the code comment describe a situation like this?

1. Worker 1 is inserting into page 1.

2. Worker 2 tries to insert into page 1, but cannot acquire the buffer content lock of page 1 because worker 1 holds it.

3. Worker 2 looks up the FSM to find a page with enough free space.

But isn't the FSM still empty during CTAS?



Regards
Takayuki Tsunakawa


Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, May 25, 2021 at 12:05 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> I noticed one place which could be one of the reasons that cause the performance degradation.
>
> +               /*
> +                * We don't need to skip contacting FSM while inserting tuples for
> +                * parallel mode, while extending the relations, workers instead of
> +                * blocking on a page while another worker is inserting, can check the
> +                * FSM for another page that can accommodate the tuples. This results
> +                * in major benefit for parallel inserts.
> +                */
> +               myState->ti_options = 0;
>
> I am not quite sure that disabling "SKIP FSM" in parallel workers will bring a performance gain.
> In my test environment, if I change this code to use the option TABLE_INSERT_SKIP_FSM, there
> seems to be no performance degradation. Could you please give it a try?
> (I tested with the SQL you provided earlier [1])

Thanks for trying that out.

Please see the code around the use_fsm flag in
RelationGetBufferForTuple for more understanding of the points below.

What happens if the FSM is skipped, i.e. myState->ti_options =
TABLE_INSERT_SKIP_FSM;?
1) The flag use_fsm will be false in heap_insert->RelationGetBufferForTuple.
2) Each worker initially gets a block and keeps inserting into it
until it is full. When the block is full, the worker doesn't look in
the FSM via GetPageWithFreeSpace, as use_fsm is false. It directly goes
for relation extension and tries to acquire the relation extension lock
with LockRelationForExtension. Note that the bulk extension of blocks
with RelationAddExtraBlocks is not reached, as use_fsm is false.
3) After acquiring the relation extension lock, it adds one extra new
block with ReadBufferBI(relation, P_NEW, ...); see the comment "In
addition to whatever extension we performed above, we always add at
least one block to satisfy our own request." The tuple is inserted
into this new block.

Basically, the workers can't look for empty pages among the pages
added by other workers; they keep doing the above steps in silos.

What happens if the FSM is not skipped, i.e. myState->ti_options = 0;?
1) The flag use_fsm will be true in heap_insert->RelationGetBufferForTuple.
2) Each worker initially gets a block and keeps inserting into it
until it is full. When the block is full, the worker looks for a
page with free space in the FSM via GetPageWithFreeSpace, as use_fsm is
true. If it can't find any page with the required amount of free space,
it goes for bulk relation extension (RelationAddExtraBlocks) after
acquiring the relation extension lock with
ConditionalLockRelationForExtension. The worker then adds extraBlocks
= Min(512, lockWaiters * 20); new blocks in RelationAddExtraBlocks and
immediately updates the bottom level of the FSM for each block (see the
comment around RecordPageWithFreeSpace for why only the bottom level,
not the entire FSM tree). After all the blocks are added, it
updates the entire FSM tree via FreeSpaceMapVacuumRange.
3) After the bulk extension, the worker adds another block (see
the comment "In addition to whatever extension we performed above, we
always add at least one block to satisfy our own request.") and inserts
the tuple into this new block.

Basically, the workers can benefit from the bulk extension of the
relation, and they can always look for empty pages among the pages
added by other workers. There are high chances that blocks will be
available after a bulk extension. Having said that, if the added extra
blocks are consumed by the workers very fast, i.e. if the tuples are
big and so very few tuples fit per page, then the bulk extension can't
help much either, and there will be more contention on the relation
extension lock. Well, one might think to add more blocks at a time, say
Min(1024, lockWaiters * 128/256/512), than the current extraBlocks =
Min(512, lockWaiters * 20);. This will work (i.e. we don't see any
regression with the parallel inserts in CTAS patches), but it can't be
a practical solution, because the relation would end up with more total
pages, many of them with a lot of free space. Furthermore, future
sequential scans on that relation might take a lot of time.

If myState->ti_options = TABLE_INSERT_SKIP_FSM; is set in only the one
place (within if (myState->is_parallel)), then it is effective only for
the leader, i.e. the leader will not look in the FSM, but all the
workers will, because within if (myState->is_parallel_worker) in
intorel_startup, myState->ti_options = 0; is set for the workers.
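
For reference, the relevant logic, condensed and paraphrased from
RelationGetBufferForTuple() in src/backend/access/heap/hio.c (see the
actual file for the exact code):

    if (use_fsm)
        targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
    ...
    if (needLock)
    {
        if (!use_fsm)
            LockRelationForExtension(relation, ExclusiveLock);
        else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
        {
            /* couldn't get the lock immediately; wait for it */
            LockRelationForExtension(relation, ExclusiveLock);

            /* another backend may have extended the relation meanwhile */
            targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
            if (targetBlock != InvalidBlockNumber)
            {
                UnlockRelationForExtension(relation, ExclusiveLock);
                goto loop;    /* retry with the free page that was found */
            }

            /* time to bulk-extend */
            RelationAddExtraBlocks(relation, bistate);
        }
    }

    /*
     * In addition to whatever extension we performed above, we always add
     * at least one block to satisfy our own request.
     */
    buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);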

I ran tests with the configuration shown at [1] for case 4 (2
bigint (of 8 bytes each) columns, 16 name (of 64 bytes each) columns,
tuple size 1064 bytes, 10mn tuples) with leader participation, where
I'm seeing the regression:

1) when myState->ti_options = TABLE_INSERT_SKIP_FSM; for both leader
and workers, my results are as follows:
0 workers - 116934.137, 2 workers - 209802.060, 4 workers - 248580.275
2) when myState->ti_options = 0; for both leader and workers, my
results are as follows:
0 workers - 116184.718, 2 workers - 139798.055, 4 workers - 143022.409

I hope the above explanation and the test results clarify that
skipping the FSM doesn't solve the problem. Let me know if
anything is unclear or if I'm missing something.

[1] postgresql.conf parameters used:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
port = 5440

System Configuration:
RAM:     528GB
Disk Type:   SSD
Disk Size:   1.5TB
lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
Stepping:              2
CPU MHz:               1064.000
CPU max MHz:           2129.0000
CPU min MHz:           1064.0000
BogoMIPS:              4266.62
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, May 25, 2021 at 1:10 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> Although this should be a controversial and maybe crazy idea, the following change brought a 4-11% speedup.  This is
> because I thought parallel workers might contend for WAL flush as a result of them using the limited ring buffer and
> flushing dirty buffers when the ring buffer is filled.  Can we take advantage of this?
>
> [GetBulkInsertState]
> /*  bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);*/
>     bistate->strategy = NULL;

You are right. If the ring buffer (16MB) is not used and shared
buffers (1GB) are used instead, then in your case, since the table
size is 335MB and fits in shared buffers, there will be little or no
dirty buffer flushing, and so some more speedup.

Otherwise, a similar speedup can be observed when BAS_BULKWRITE
is increased a bit from the current 16MB to some other reasonable
value. I tried these experiments earlier.
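
For context, the 16MB figure comes from GetAccessStrategy() in
src/backend/storage/buffer/freelist.c; the experiment is simply to
enlarge this constant:

    case BAS_BULKWRITE:
        ring_size = 16 * 1024 * 1024 / BLCKSZ;    /* 16MB worth of buffers */
        break;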

Otherwise, as I said in [1], we can also increase the number of extra
blocks added at a time, say Min(1024, lockWaiters * 128/256/512), from
the current extraBlocks = Min(512, lockWaiters * 20);. This will also
give some speedup, and we don't see any regression with the parallel
inserts in CTAS patches.

But I'm not so sure that the hackers will agree to any of the above as
a practical solution to the "relation extension" problem.

[1] https://www.postgresql.org/message-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Tue, May 25, 2021 at 1:50 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> > +             /*
> > +              * We don't need to skip contacting FSM while inserting tuples
> > for
> > +              * parallel mode, while extending the relations, workers
> > instead of
> > +              * blocking on a page while another worker is inserting, can
> > check the
> > +              * FSM for another page that can accommodate the tuples.
> > This results
> > +              * in major benefit for parallel inserts.
> > +              */
> > +             myState->ti_options = 0;
> >
> > I am not quite sure that disabling "SKIP FSM" in parallel workers will
> > bring a performance gain.
> > In my test environment, if I change this code to use the option
> > TABLE_INSERT_SKIP_FSM, there
> > seems to be no performance degradation.
>
> +1, probably.

I tried to explain it at [1]. Please have a look.

> Does the code comment describe a situation like this?
>
> 1. Worker 1 is inserting into page 1.
>
> 2. Worker 2 tries to insert into page 1, but cannot acquire the buffer content lock of page 1 because worker 1 holds it.
>
> 3. Worker 2 looks up the FSM to find a page with enough free space.

I tried to explain it at [1]. Please have a look.

> But isn't the FSM still empty during CTAS?

No, the FSM is built on the fly if we don't skip it, i.e. with
myState->ti_options = 0; see RelationGetBufferForTuple with use_fsm =
true -> GetPageWithFreeSpace -> fsm_search -> fsm_set_and_search ->
fsm_readbuf with extend = true.

[1] https://www.postgresql.org/message-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> >
> > I analyzed the performance of parallel inserts in CTAS for different cases
> > with tuple sizes of 32, 59, 241 and 1064 bytes. We gain if the tuple
> > sizes are smaller. But if the tuple size is larger, i.e. 1064 bytes,
> > there's a regression with parallel inserts. Upon further analysis, it
> > turned out that the parallel workers require frequent addition of extra
> > blocks while concurrently extending the relation (in
> > RelationAddExtraBlocks), and the majority of the time is spent flushing
> > those new empty pages/blocks to disk.
> >
>
> How have you ensured that the cost is due to the flushing of pages?

I think I was wrong to just say the problem is with the flushing of
empty pages when bulk-extending the relation. I should have said the
problem is with the "relation extension lock", but I will hold on to
that for a moment until I capture the relation extension lock wait
events for the regression-causing cases. I will share the information
soon.

> AFAICS, we don't flush the pages, rather we just write them and then
> register them to be flushed by the checkpointer. Now, it is possible that
> the checkpointer sync queue gets full and the backend has to write by
> itself, but have we checked that? I think we can check via wait events;
> if it is due to flush, then we should see a lot of file sync
> (WAIT_EVENT_DATA_FILE_SYNC) wait events.

I will also capture the data file sync events along with relation
extension lock wait events.

> The other possibility could
> be that the free pages added to the FSM by one worker are not being used
> by another worker for some reason. Can we debug and check if the
> pages added by one worker are being used by another worker?

I tried to explain it at [1]. Please have a look. It looks like the
burden is more on the "relation extension lock" and the way the extra
new blocks are getting added.

[1] https://www.postgresql.org/message-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Wed, May 26, 2021 at 5:28 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
>
> > The other possibility could
> > be that the free pages added to the FSM by one worker are not being used
> > by another worker for some reason. Can we debug and check if the
> > pages added by one worker are being used by another worker?
>
> I tried to explain it at [1]. Please have a look.
>

I have read it, but I think we should try to verify practically what is
happening, because it is possible that the first time the worker checked
the FSM without taking the relation extension lock, it didn't find any
free page, and then when it tried to acquire the conditional lock, it
got it and just extended the relation by one block. So, in such a case
it won't be able to use the pages newly added by another worker. I am
not sure any such thing is happening here, but I think it is better to
verify it in some way. Also, I am not sure if just getting the info
about the relation extension lock is sufficient?

--
With Regards,
Amit Kapila.



RE: Parallel Inserts in CREATE TABLE AS

From
"houzj.fnst@fujitsu.com"
Date:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Wednesday, May 26, 2021 7:22 PM
> Thanks for trying that out.
> 
> Please see the code around the use_fsm flag in RelationGetBufferForTuple for
> more understanding of the points below.
> 
> What happens if the FSM is skipped, i.e. myState->ti_options =
> TABLE_INSERT_SKIP_FSM;?
> 1) The flag use_fsm will be false in heap_insert->RelationGetBufferForTuple.
> 2) Each worker initially gets a block and keeps inserting into it until it is full.
> When the block is full, the worker doesn't look in FSM GetPageWithFreeSpace
> as use_fsm is false. It directly goes for relation extension and tries to acquire
> relation extension lock with LockRelationForExtension. Note that the bulk
> extension of blocks with RelationAddExtraBlocks is not reached as use_fsm is
> false.
> 3) After acquiring the relation extension lock, it adds an extra new block with
> ReadBufferBI(relation, P_NEW, ...), see the comment "In addition to whatever
> extension we performed above, we always add at least one block to satisfy our
> own request." The tuple is inserted into this new block.
> 
> Basically, the workers can't look for the empty pages from the pages added by
> other workers, they keep doing the above steps in silos.
> 
> What happens if the FSM is not skipped, i.e. myState->ti_options = 0;?
> 1) The flag use_fsm will be true in heap_insert->RelationGetBufferForTuple.
> 2) Each worker initially gets a block and keeps inserting into it until it is full.
> When the block is full, the worker looks for the page with free space in FSM
> GetPageWithFreeSpace as use_fsm is true.
> If it can't find any page with the required amount of free space, it goes for bulk
> relation extension(RelationAddExtraBlocks) after acquiring relation extension
> lock with ConditionalLockRelationForExtension. Then the worker adds
> extraBlocks = Min(512, lockWaiters * 20); new blocks in
> RelationAddExtraBlocks and immediately updates the bottom level of FSM for
> each block (see the comment around RecordPageWithFreeSpace for why only
> the bottom level, not the entire FSM tree). After all the blocks are added, then
> it updates the entire FSM tree FreeSpaceMapVacuumRange.
> 3) After the bulk extension, the worker adds another block (see the
> comment "In addition to whatever extension we performed above, we always
> add at least one block to satisfy our own request.") and inserts the tuple
> into this new block.
> 
> Basically, the workers can benefit from the bulk extension of the relation and
> they always can look for the empty pages from the pages added by other
> workers. There are high chances that the blocks will be available after bulk
> extension. Having said that, if the added extra blocks are consumed by the
> workers very fast, i.e. if the tuples are big and so very few tuples fit per page, then
> the bulk extension too can't help much and there will be more contention on
> the relation extension lock. Well, one might think to add more blocks at a time,
> say Min(1024, lockWaiters * 128/256/512) than currently extraBlocks = Min(512,
> lockWaiters * 20);. This will work (i.e. we don't see any regression with parallel
> inserts in CTAS patches), but it can't be a practical solution. Because the total
> pages for the relation will be more with many pages having more free space.
> Furthermore, the future sequential scans on that relation might take a lot of
> time.
> 
> If myState->ti_options = TABLE_INSERT_SKIP_FSM; is set in only the one place
> (within if (myState->is_parallel)), then it is effective only for the leader,
> i.e. the leader will not look in the FSM, but all the workers will, because
> within if (myState->is_parallel_worker) in intorel_startup,
> myState->ti_options = 0; is set for the workers.
> 
> I ran tests with configuration shown at [1] for the case 4 (2 bigint(of 8 bytes
> each) columns, 16 name(of 64 bytes each) columns, tuple size 1064 bytes, 10mn
> tuples) with leader participation where I'm seeing regression:
> 
> 1) when myState->ti_options = TABLE_INSERT_SKIP_FSM; for both leader and
> workers, then my results are as follows:
> 0 workers - 116934.137, 2 workers - 209802.060, 4 workers - 248580.275
> 2) when myState->ti_options = 0; for both leader and workers, then my results
> are as follows:
> 0 workers - 116184.718, 2 workers - 139798.055, 4 workers - 143022.409
> I hope the above explanation and the test results should clarify the fact that
> skipping FSM doesn't solve the problem. Let me know if anything is not clear or
> I'm missing something.

Thanks for the explanation.
I followed your above test steps and the below configuration, but my test results are a little different from yours.
I am not sure of the exact reason; maybe it's because of the hardware..

Test: INSERT 10000000 rows (2 bigint (of 8 bytes each) and 16 name (of 64 bytes each) columns):
SERIAL: 22023.631 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 21824.934 ms  [SKIP FSM]: 19381.474 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 20481.117 ms  [SKIP FSM]: 18381.305 ms

I am afraid that using the FSM does not seem to give a stable performance gain (at least on my machine).
I will take a deep look into this to figure out the difference. A naive idea is that the benefit that bulk extension
brings is not much greater than the cost of the FSM lookups.
Do you have any ideas on it?

My test machine:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              40
On-line CPU(s) list: 0-39
Thread(s) per core:  2
Core(s) per socket:  10
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
Stepping:            7
CPU MHz:             2901.005
CPU max MHz:         3200.0000
CPU min MHz:         1000.0000
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            14080K

Best regards,
houzj

> [1] postgresql.conf parameters used:
> shared_buffers = 40GB
> max_worker_processes = 32
> max_parallel_maintenance_workers = 24
> max_parallel_workers = 32
> synchronous_commit = off
> checkpoint_timeout = 1d
> max_wal_size = 24GB
> min_wal_size = 15GB
> autovacuum = off
> port = 5440
> 
> System Configuration:
> RAM:     528GB
> Disk Type:   SSD
> Disk Size:   1.5TB
> lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                128
> On-line CPU(s) list:   0-127
> Thread(s) per core:    2
> Core(s) per socket:    8
> Socket(s):             8
> NUMA node(s):          8
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 47
> Model name:            Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
> Stepping:              2
> CPU MHz:               1064.000
> CPU max MHz:           2129.0000
> CPU min MHz:           1064.0000
> BogoMIPS:              4266.62
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              24576K


RE: Parallel Inserts in CREATE TABLE AS

From
"tsunakawa.takay@fujitsu.com"
Date:
Thank you for the detailed analysis, I'll look into it too.  (The times have changed...)

From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> Well, one might think to add more blocks at a time, say
> Min(1024, lockWaiters * 128/256/512), than the current extraBlocks =
> Min(512, lockWaiters * 20);. This will work (i.e. we don't see any
> regression with the parallel inserts in CTAS patches), but it can't be
> a practical solution, because the relation would end up with more total
> pages, many of them with a lot of free space. Furthermore, future
> sequential scans on that relation might take a lot of time.

> Otherwise, a similar speedup can be observed when BAS_BULKWRITE
> is increased a bit from the current 16MB to some other reasonable
> value. I tried these experiments earlier.
>
> Otherwise, as I said in [1], we can also increase the number of extra
> blocks added at a time, say Min(1024, lockWaiters * 128/256/512), from
> the current extraBlocks = Min(512, lockWaiters * 20);. This will also
> give some speedup, and we don't see any regression with the parallel
> inserts in CTAS patches.
> 
> But I'm not so sure that the hackers will agree to any of the above as
> a practical solution to the "relation extension" problem.

I think I understand your concern about resource consumption and the impact on other concurrently running jobs (OLTP,
data analysis).

OTOH, what's the situation like when the user wants to run CTAS, and further, wants to speed it up by using
parallelism? Isn't it okay to let the (parallel) CTAS use as much as it wants?  At least, I think we can provide
another mode for it, like Oracle provides the conventional path mode and direct path mode for INSERT and data loading.

What do we want to do to maximize the parallel CTAS speedup if we were a bit unshackled from the current constraints
(alignment with existing code, impact on other concurrent workloads)?

* Use as many shared buffers as possible to decrease WAL flushing.
Otherwise, INSERT SELECT may be faster?

* Minimize relation extension (= increase the block count per extension).
posix_fallocate() would help too.

* Allocate the added pages among the parallel workers, and have each worker fill its pages to their full capacity.
The worker that extended the relation stores the page numbers of the added pages in shared memory for the parallel execution.
Each worker gets a page from there after waiting for the relation extension lock, instead of using the FSM. (A rough sketch of this follows below.)
The last pages that the workers used will be filled only halfway, but the amount of unused space should be low compared to the total table size.
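
A purely hypothetical sketch of that last idea; none of these names
exist in PostgreSQL today:

    typedef struct ParallelExtendState
    {
        slock_t     mutex;
        int         nfreepages;
        BlockNumber freepages[64];    /* filled by the extending worker;
                                       * 64 is an arbitrary cap here */
    } ParallelExtendState;

    /* Consumer: pop a pre-extended block instead of consulting the FSM. */
    static BlockNumber
    ParallelExtendGetPage(ParallelExtendState *pes)
    {
        BlockNumber blkno = InvalidBlockNumber;

        SpinLockAcquire(&pes->mutex);
        if (pes->nfreepages > 0)
            blkno = pes->freepages[--pes->nfreepages];
        SpinLockRelease(&pes->mutex);

        return blkno;
    }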
 



Regards
Takayuki Tsunakawa


Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, May 27, 2021 at 7:12 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> I followed your above test steps and the below configuration, but my test results are a little different from yours.
> I am not sure of the exact reason; maybe it's because of the hardware..
>
> Test: INSERT 10000000 rows (2 bigint (of 8 bytes each) and 16 name (of 64 bytes each) columns):
> SERIAL: 22023.631 ms
> PARALLEL 2 WORKER [NOT SKIP FSM]: 21824.934 ms  [SKIP FSM]: 19381.474 ms
> PARALLEL 4 WORKER [NOT SKIP FSM]: 20481.117 ms  [SKIP FSM]: 18381.305 ms

I'm not sure why there's a huge difference in the execution time: on
your system it takes just ~20 sec, whereas on my system (with SSD) it
takes ~115 sec. I hope you didn't create the table as unlogged in the
CTAS, right? Just for reference, the exact use case I tried is at [1].
The configure command I used to build the postgres source code is at
[2]. I don't know whether I'm missing something here.

[1] case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes
each) columns, tuple size 1064 bytes, 10mn tuples
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12
name, c13 name, c14 name, c15 name, c16 name, c17 name, c18 name);
INSERT INTO tenk1 values(generate_series(1,10000000),
generate_series(1,10000000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create table test as select * from tenk1;

[2] ./configure --with-zlib --prefix=$PWD/inst/ --with-openssl
--with-readline  --with-libxml  > war.log && make -j 8 install >
war.log 2>&1 &

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Wed, May 26, 2021 at 5:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, May 26, 2021 at 5:28 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > >
> >
> > > The other possibility could
> > > be that the free pages added to the FSM by one worker are not being used
> > > by another worker for some reason. Can we debug and check if the
> > > pages added by one worker are being used by another worker?
> >
> > I tried to explain it at [1]. Please have a look.
> >
>
> I have read it, but I think we should try to verify practically what is
> happening, because it is possible that the first time the worker checked
> the FSM without taking the relation extension lock, it didn't find any
> free page, and then when it tried to acquire the conditional lock, it
> got it and just extended the relation by one block. So, in such a case
> it won't be able to use the pages newly added by another worker. I am
> not sure any such thing is happening here, but I think it is better to
> verify it in some way. Also, I am not sure if just getting the info
> about the relation extension lock is sufficient?
>

One idea to find this out could be to have three counters for
each worker, counting the number of times the worker extended the
relation in bulk, the number of times it extended the
relation by one block, and the number of times it got a page
from the FSM. It might be possible that with this we will be able to
figure out why there is a difference between your results and
Hou-San's.
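
A rough sketch of such instrumentation, with one slot per worker in the
shared memory for the parallel execution; all names here are invented
for the sketch:

    typedef struct ExtensionStats
    {
        uint64      nbulk_extends;      /* RelationAddExtraBlocks() calls */
        uint64      nsingle_extends;    /* one-block extensions */
        uint64      nfsm_hits;          /* pages found via GetPageWithFreeSpace() */
    } ExtensionStats;

    /* e.g. ExtensionStats stats[64]; each worker updates only
     * stats[ParallelWorkerNumber], so no locking is needed, and the
     * leader reports the totals at the end of the query */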

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, May 27, 2021 at 9:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I have read it, but I think we should try to verify practically what is
> > happening, because it is possible that the first time the worker checked
> > the FSM without taking the relation extension lock, it didn't find any
> > free page, and then when it tried to acquire the conditional lock, it
> > got it and just extended the relation by one block. So, in such a case
> > it won't be able to use the pages newly added by another worker. I am
> > not sure any such thing is happening here, but I think it is better to
> > verify it in some way. Also, I am not sure if just getting the info
> > about the relation extension lock is sufficient?
> >
>
> One idea to find this out could be to have three counters for
> each worker, counting the number of times the worker extended the
> relation in bulk, the number of times it extended the
> relation by one block, and the number of times it got a page
> from the FSM. It might be possible that with this we will be able to
> figure out why there is a difference between your results and
> Hou-San's.

Yeah, that helps. And also, the time spent in
LockRelationForExtension, ConditionalLockRelationForExtension,
GetPageWithFreeSpace and RelationAddExtraBlocks can give some
insight too; a sketch of how to capture that follows below.
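
One way the timing could be captured, using the instr_time helpers from
src/include/portability/instr_time.h (where the accumulated value is
stored is left to the testing patch):

    instr_time  start,
                duration;

    INSTR_TIME_SET_CURRENT(start);
    LockRelationForExtension(relation, ExclusiveLock);
    INSTR_TIME_SET_CURRENT(duration);
    INSTR_TIME_SUBTRACT(duration, start);
    /* accumulate INSTR_TIME_GET_MICROSEC(duration) in a per-worker slot */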

My plan is to have a patch with the above info added (which I will
share here so that others can test and see the results too) and run
"case 4", where a regression is seen on my system.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, May 27, 2021 at 7:12 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> I am afraid that using the FSM does not seem to give a stable performance gain (at least on my machine).
> I will take a deep look into this to figure out the difference. A naive idea is that the benefit that bulk extension
> brings is not much greater than the cost of the FSM lookups.
> Do you have any ideas on it?

I think, if we try what Amit and I said in [1], we should get some
insights on whether the bulk relation extension is taking more time or
the FSM lookup. I plan to share the testing patch adding the timings
and the counters so that you can also test from your end. I hope
that's fine with you.

[1] - https://www.postgresql.org/message-id/CALj2ACXskhY58%3DFh8TioKLL1DXYkKdyEyWFYykf-6aLJgJ2qmQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Thu, May 27, 2021 at 10:16 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 7:12 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > I am afraid that using the FSM does not seem to give a stable performance gain (at least on my machine).
> > I will take a deep look into this to figure out the difference. A naive idea is that the benefit that bulk
> > extension brings is not much greater than the cost of the FSM lookups.
> > Do you have any ideas on it?
>
> I think, if we try what Amit and I said in [1], we should get some
> insights on whether the bulk relation extension is taking more time or
> the FSM lookup. I plan to share the testing patch adding the timings
> and the counters so that you can also test from your end. I hope
> that's fine with you.

I think some other causes of contention on relation extension locks are:
1. CTAS is using a buffer strategy and, due to that, it might need to
evict buffers frequently to bring new blocks in.  Maybe we can verify
this by turning off the buffer strategy for CTAS and increasing
shared_buffers so that the data fits in memory.

2. I think the parallel workers scanning the table produce a lot of
tuples in a short time, so the demand for new blocks is much higher
than what RelationAddExtraBlocks is able to satisfy; maybe you can try
adding more blocks by increasing the multiplier (see the sketch after
this list) and see what the impact is.

3. Also try cases where the underlying select query has some complex
condition and selects fewer records, say 50%, 40% ... 10%, and see
what the numbers are.
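
For reference, the bulk-extension heuristic in RelationAddExtraBlocks()
(hio.c) sizes the extension by the number of waiters on the relation
extension lock, roughly as below; the experiment in point 2 amounts to
raising the multiplier (20) and/or the cap (512):

    /* From RelationAddExtraBlocks() in hio.c, slightly abridged. */
    lockWaiters = RelationExtensionLockWaiterCount(relation);
    if (lockWaiters <= 0)
        return;

    /* Extend by 20 blocks per lock waiter, but no more than 512. */
    extraBlocks = Min(512, lockWaiters * 20);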

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Dilip Kumar <dilipbalaut@gmail.com>
> I think some other causes of contention on relation extension locks are:
> 1. CTAS is using a buffer strategy and, due to that, it might need to
> evict buffers frequently to bring new blocks in.  Maybe we can verify
> this by turning off the buffer strategy for CTAS and increasing
> shared_buffers so that the data fits in memory.

Yes, both Bharath-san (on a rich-man's machine) and I (on a poor-man's VM) saw that it's effective.  I think we should
remove this shackle from CTAS.

The question is why CTAS chose to use the BULKWRITE strategy in the past.  We need to know that to make a better decision.
I can understand why VACUUM uses a ring buffer, because it should want to act humbly as a background maintenance task to
not cause trouble to frontend tasks.  But why does CTAS have to be humble?  If CTAS needs to be modest, why doesn't it
use the BULKREAD strategy for its SELECT?
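
For reference, these are the strategy types bufmgr.h defines; per
GetAccessStrategy() in freelist.c, the BULKWRITE ring is 16MB while the
BULKREAD ring is only 256kB:

    typedef enum BufferAccessStrategyType
    {
        BAS_NORMAL,     /* Normal random access */
        BAS_BULKREAD,   /* Large read-only scan (hint bit updates are ok) */
        BAS_BULKWRITE,  /* Large multi-block write (e.g. COPY IN) */
        BAS_VACUUM      /* VACUUM */
    } BufferAccessStrategyType;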
 


Regards
Takayuki Tsunakawa


Re: Parallel Inserts in CREATE TABLE AS

From
Dilip Kumar
Date:
On Thu, 27 May 2021 at 11:32 AM, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
From: Dilip Kumar <dilipbalaut@gmail.com>
> I think some other causes of contention on relation extension locks are:
> 1. CTAS is using a buffer strategy and, due to that, it might need to
> evict buffers frequently to bring new blocks in.  Maybe we can verify
> this by turning off the buffer strategy for CTAS and increasing
> shared_buffers so that the data fits in memory.

Yes, both Bharath-san (on a rich-man's machine) and I (on a poor-man's VM) saw that it's effective.  I think we should remove this shackle from CTAS.

The question is why CTAS chose to use BULKWRITE strategy in the past.  We need to know that to make a better decision.

Basically, you are creating a new table and loading data into it, which means you are less likely to access that data soon, so spoiling the buffer cache for such an operation may not be a good idea.  I was suggesting it only as an experiment to identify the root cause.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

RE: Parallel Inserts in CREATE TABLE AS

From
"houzj.fnst@fujitsu.com"
Date:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Thursday, May 27, 2021 12:46 PM
> On Thu, May 27, 2021 at 7:12 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > I am afraid that using the FSM does not seem to give a stable performance
> > gain (at least on my machine). I will take a deep look into this to
> > figure out the difference. A naive idea is that the benefit that bulk
> > extension brings is not much greater than the cost of the FSM lookups.
> > Do you have any ideas on it?
> 
> I think, if we try what Amit and I said in [1], we should get some insights on
> whether the bulk relation extension is taking more time or the FSM lookup. I
> plan to share the testing patch adding the timings and the counters so that you
> can also test from your end. I hope that's fine with you.

Sure, it will be nice if we can calculate the exact time. Thanks in advance.

BTW, I checked my test results; I was testing INSERT INTO an unlogged table.
I re-tested INSERT INTO a normal (logged) table again, and it seems [SKIP FSM] still looks slightly better.
However, the 4 workers case still has performance degradation compared to the serial case.

SERIAL: 58759.213 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms  [SKIP FSM]: 58633.924 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms   [SKIP FSM]: 66960.305 ms

Best regards,
houzj



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, May 27, 2021 at 12:19 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> BTW, I checked my test results; I was testing INSERT INTO an unlogged table.

What do you mean by "testing INSERT INTO"? Is it that you are testing
the timings for parallel inserts in INSERT INTO ... SELECT command? If
so, why should we test parallel inserts in the INSERT INTO ... SELECT
command here?

The way I test parallel inserts in CTAS is: Apply the latest v23 patch
set available at [1]. Run the data preparation sqls from [2]. Enable
timing and run the CTAS query from [3]. Run with 0, 2 and 4 workers
with leader participation on.

[1] - https://www.postgresql.org/message-id/CALj2ACXVWr1o%2BFZrkQt-2GvYfuMQeJjWohajmp62Wr6BU8Y4VA%40mail.gmail.com

[2]
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12
name, c13 name, c14 name, c15 name, c16 name, c17 name, c18 name);
INSERT INTO tenk1 values(generate_series(1,100000),
generate_series(1,10000000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));

[3]
EXPLAIN ANALYZE VERBOSE CREATE TABLE test AS SELECT * FROM tenk1;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"houzj.fnst@fujitsu.com"
Date:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Thursday, May 27, 2021 2:59 PM
> On Thu, May 27, 2021 at 12:19 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > BTW, I checked my test results; I was testing INSERT INTO an unlogged table.
> 
> What do you mean by "testing INSERT INTO"? Is it that you are testing the
> timings for parallel inserts in INSERT INTO ... SELECT command? If so, why
> should we test parallel inserts in the INSERT INTO ... SELECT command here?

Oops, sorry, it's a typo, I actually meant CREATE TABLE AS SELECT.

Best regards,
houzj

RE: Parallel Inserts in CREATE TABLE AS

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Dilip Kumar <dilipbalaut@gmail.com> 
Basically, you are creating a new table and loading data into it, which means you are less likely to access that
data soon, so spoiling the buffer cache for such an operation may not be a good idea.
 
--------------------------------------------------

Some people, including me, would say that the table will be accessed soon, and that's why the data is loaded quickly
during minimal maintenance hours.
 


--------------------------------------------------
I was suggesting it only as an experiment to identify the root cause.
--------------------------------------------------

I thought this was a good chance to possibly change things for the better (^^).
I guess the user would simply think like this: "I just want to finish CTAS as quickly as possible, so I configured it to
take advantage of parallelism.  I want CTAS to make the most of our resources.  Why does Postgres try to limit
resource usage (by using the ring buffer) against my will?"
 


Regards
Takayuki Tsunakawa


Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, May 27, 2021 at 12:46 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Dilip Kumar <dilipbalaut@gmail.com>
> Basically, you are creating a new table and loading data into it, which means you are less likely to access that
> data soon, so spoiling the buffer cache for such an operation may not be a good idea.
> --------------------------------------------------
>
> Some people, including me, would say that the table will be accessed soon, and that's why the data is loaded quickly
> during minimal maintenance hours.
>
>
> --------------------------------------------------
> I was suggesting it only as an experiment to identify the root cause.
> --------------------------------------------------
>
> I thought this was a good chance to possibly change things for the better (^^).
> I guess the user would simply think like this: "I just want to finish CTAS as quickly as possible, so I configured it to
> take advantage of parallelism.  I want CTAS to make the most of our resources.  Why does Postgres try to limit
> resource usage (by using the ring buffer) against my will?"

If the idea is to give the user control of whether or not to use the
separate RING BUFFER for bulk inserts/writes, then how about giving it
as a rel option? Currently BAS_BULKWRITE (GetBulkInsertState) is
being used by CTAS, Refresh Mat View, Table Rewrites (ATRewriteTable)
and COPY. Furthermore, we could make the rel option an integer and
allow users to provide the size of the ring buffer they want to choose
for a particular bulk insert operation (of course with a max limit
which is not exceeding the shared buffers or some reasonable amount
not exceeding the RAM of the system).
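
A rough sketch of what such a rel option might look like in the
intRelOpts table of reloptions.c (the option name, default and limits
below are purely illustrative, not a concrete proposal):

    {
        {
            "bulk_write_ring_size",     /* hypothetical option name */
            "Size of the ring buffer used for bulk writes, in kB",
            RELOPT_KIND_HEAP,
            ShareUpdateExclusiveLock
        },
        16 * 1024,      /* default: the current BAS_BULKWRITE ring (16MB) */
        128,            /* min */
        INT_MAX         /* max; would be clamped against shared_buffers */
    },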

I think we can discuss this in a separate thread and see what other
hackers think.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"tsunakawa.takay@fujitsu.com"
Date:
From: houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> However, the 4 workers case still has performance degradation compared to
> the serial case.
> 
> SERIAL: 58759.213 ms
> PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms  [SKIP FSM]:
> 58633.924 ms
> PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms   [SKIP FSM]:
> 66960.305 ms

Can you see any difference in table sizes?


Regards
Takayuki Tsunakawa


Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, May 27, 2021 at 1:03 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> > However, the 4 workers case still has performance degradation compared to
> > the serial case.
> >
> > SERIAL: 58759.213 ms
> > PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms  [SKIP FSM]:
> > 58633.924 ms
> > PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms   [SKIP FSM]:
> > 66960.305 ms
>
> Can you see any difference in table sizes?

Also, the number of pages the table occupies in each case along with
table size would give more insights.

I do as follows to get the number of pages a relation occupies:
CREATE EXTENSION pgstattuple;
SELECT pg_relpages('test');

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> I think we can discuss this in a separate thread and see what other
> hackers think.

OK, as long as we don't get stuck in the current direction.  (Our goal is not merely to avoid degrading performance, but
to outperform serial execution, isn't it?)
 


> If the idea is to give the user control of whether or not to use the
> separate RING BUFFER for bulk inserts/writes, then how about giving it
> as a rel option? Currently BAS_BULKWRITE (GetBulkInsertState) is
> being used by CTAS, Refresh Mat View, Table Rewrites (ATRewriteTable)
> and COPY. Furthermore, we could make the rel option an integer and
> allow users to provide the size of the ring buffer they want to choose
> for a particular bulk insert operation (of course with a max limit
> which is not exceeding the shared buffers or some reasonable amount
> not exceeding the RAM of the system).

I think it's not a table property but an execution property.  So, it'd be appropriate to control it with the SET
command, just like the DBA sets work_mem and maintenance_work_mem for specific maintenance operations.
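
As a sketch (the GUC name here is invented just for illustration), that
would mean an entry in the ConfigureNamesInt table of guc.c, roughly:

    {
        {"bulk_write_ring_size", PGC_USERSET, RESOURCES_MEM,
            gettext_noop("Sets the size of the ring buffer used for bulk writes."),
            NULL,
            GUC_UNIT_KB
        },
        &bulk_write_ring_size,
        16 * 1024, 128, INT_MAX,
        NULL, NULL, NULL
    },

and then the DBA would run something like SET bulk_write_ring_size =
'256MB' before the bulk operation.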
 

I'll stop on this here...


Regards
Takayuki Tsunakawa


Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Thu, May 27, 2021 at 10:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 10:16 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, May 27, 2021 at 7:12 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > > I am afraid that using the FSM does not seem to give a stable performance gain (at least on my machine).
> > > I will take a deep look into this to figure out the difference. A naive idea is that the benefit that bulk
> > > extension brings is not much greater than the cost of the FSM lookups.
> > > Do you have any ideas on it?
> >
> > I think, if we try what Amit and I said in [1], we should get some
> > insights on whether the bulk relation extension is taking more time or
> > the FSM lookup. I plan to share the testing patch adding the timings
> > and the counters so that you can also test from your end. I hope
> > that's fine with you.
>
> I think some other causes of contention on relation extension locks are:
> 1. CTAS is using a buffer strategy and, due to that, it might need to
> evict buffers frequently to bring new blocks in.  Maybe we can verify
> this by turning off the buffer strategy for CTAS and increasing
> shared_buffers so that the data fits in memory.
>

One more thing to check is whether all the workers are using the same
access strategy.

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, May 27, 2021 at 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I think some other causes of contention on relation extension locks are:
> > 1. CTAS is using a buffer strategy and, due to that, it might need to
> > evict buffers frequently to bring new blocks in.  Maybe we can verify
> > this by turning off the buffer strategy for CTAS and increasing
> > shared_buffers so that the data fits in memory.
> >
>
> One more thing to check is whether all the workers are using the same
> access strategy.

In the Parallel Inserts in CTAS patches, the leader and each worker use
their own 16MB ring buffer, i.e. each does myState->bistate =
GetBulkInsertState() separately.
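
For reference, GetBulkInsertState() in heapam.c is roughly:

    BulkInsertState
    GetBulkInsertState(void)
    {
        BulkInsertState bistate;

        bistate = (BulkInsertState) palloc(sizeof(BulkInsertStateData));
        bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);  /* 16MB ring */
        bistate->current_buf = InvalidBuffer;
        return bistate;
    }

so each backend that calls it gets a private ring; the leader and the
workers do not share one.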

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Parallel Inserts in CREATE TABLE AS

From
"houzj.fnst@fujitsu.com"
Date:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Thursday, May 27, 2021 3:41 PM
> 
> On Thu, May 27, 2021 at 1:03 PM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
> >
> > From: houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> > > However, the 4 workers case still has performance degradation
> > > compared to the serial case.
> > >
> > > SERIAL: 58759.213 ms
> > > PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms  [SKIP FSM]:
> > > 58633.924 ms
> > > PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms   [SKIP FSM]:
> > > 66960.305 ms
> >
> > Can you see any difference in table sizes?
> 
> Also, the number of pages the table occupies in each case along with table size
> would give more insights.
> 
> I do as follows to get the number of pages a relation occupies:
> CREATE EXTENSION pgstattuple;
> SELECT pg_relpages('test');

It seems the difference between SKIP FSM and NOT SKIP FSM is not big.
I tried several times and the average result is almost the same.

pg_relpages
-------------
     1428575

pg_relation_size
-------------
11702976512 (11GB)

Best regards,
houzj


Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Thu, May 27, 2021 at 9:53 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > One idea to find this out could be that we have three counters for
> > each worker which count the number of times each worker extended the
> > relation in bulk, the number of times each worker extended the
> > relation by one block, the number of times each worker gets the page
> > from FSM. It might be possible that with this we will be able to
> > figure out why there is a difference between your and Hou-San's
> > results.
>
> Yeah, that helps. And also, the time spent in
> LockRelationForExtension, ConditionalLockRelationForExtension,
> GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
> insight.
>
> My plan is to have a patch with the above info added (which I will
> share here so that others can test and see the results too) and run
> the "case 4" where there's a regression seen on my system.

I captured below information with the attached patch
0001-test-times-and-block-counts.patch applied on top of CTAS v23
patch set. Testing details are attached in the file named "test".
Total time spent in LockRelationForExtension
Total time spent in GetPageWithFreeSpace
Total time spent in RelationAddExtraBlocks
Total number of times extended the relation in bulk
Total number of times extended the relation by one block
Total number of blocks added in bulk extension
Total number of times getting the page from FSM

Here is a summary of what I observed:
1) The execution time with 2 workers, without TABLE_INSERT_SKIP_FSM
(140 sec) is more than with 0 workers (112 sec)
2) The execution time with 2 workers, with TABLE_INSERT_SKIP_FSM (225
sec) is more than with 2 workers, without TABLE_INSERT_SKIP_FSM (140
sec)
3) The majority of the time goes into waiting for the relation extension
lock in LockRelationForExtension. With 2 workers, without
TABLE_INSERT_SKIP_FSM, out of total execution time 140 sec, the time
spent in LockRelationForExtension is ~40 sec and the time spent in
RelationAddExtraBlocks is ~20 sec. So, ~60 sec are being spent in
these two functions. With 2 workers, with TABLE_INSERT_SKIP_FSM, out
of total execution time 225 sec, the time spent in
LockRelationForExtension is ~135 sec and the time spent in
RelationAddExtraBlocks is 0 sec (because we skip FSM, no bulk extend
logic applies). So, most of the time is being spent in
LockRelationForExtension.

I'm still not sure why the execution time with 0 workers (or serial
execution or no parallelism involved) on my testing system is 112 sec
compared to 58 sec on Hou-San's system for the same use case. Maybe
the testing system I'm using is not of the latest configuration
compared to others.

Having said that, I request others to try and see if the same
observations (as above) hold on their testing systems for the same
use case. If others don't see the regression (with just 2 workers) or
observe little difference with and without TABLE_INSERT_SKIP_FSM,
I'm open to changing the parallel inserts in CTAS code to use
TABLE_INSERT_SKIP_FSM. In any case, if the observation is that there's
a good amount of time being spent in LockRelationForExtension, I'm not
sure (at this point) whether we can do something here or just live
with it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

RE: Parallel Inserts in CREATE TABLE AS

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> I'm still not sure why the execution time with 0 workers (or serial execution or
> no parallelism involved) on my testing system is 112 sec compared to 58 sec on
> Hou-San's system for the same use case. Maybe the testing system I'm using
> is not of the latest configuration compared to others.

What's the setting of wal_level on your two systems?  I thought it could be that you set it to > minimal, while
Hou-san set it to minimal.  (I forgot the results of 2 and 4 workers, though.)
 


Regards
Takayuki Tsunakawa


RE: Parallel Inserts in CREATE TABLE AS

From
"houzj.fnst@fujitsu.com"
Date:
From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
Sent: Friday, May 28, 2021 8:55 AM
> 
> From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> > I'm still not sure why the execution time with 0 workers (or serial
> > execution or no parallelism involved) on my testing system is 112 sec
> > compared to 58 sec on Hou-San's system for the same use case. Maybe
> > the testing system I'm using is not of the latest configuration compared to
> others.
> 
> What's the setting of wal_level on your two's systems?  I thought it could be
> that you set it to > minimal, while Hou-san set it to minimal.  (I forgot the
> results of 2 and 4 workers, though.)

I think I followed the configuration that Bharath-san mentioned.
It could be a hardware difference, because I am not using an SSD.
I will try to test on an SSD to see if there is some difference.

I only changed the following configuration:

shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off

Best regards,
houzj

Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Thu, May 27, 2021 at 7:37 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 9:53 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > One idea to find this out could be that we have three counters for
> > > each worker which count the number of times each worker extended the
> > > relation in bulk, the number of times each worker extended the
> > > relation by one block, the number of times each worker gets the page
> > > from FSM. It might be possible that with this we will be able to
> > > figure out why there is a difference between your and Hou-San's
> > > results.
> >
> > Yeah, that helps. And also, the time spent in
> > LockRelationForExtension, ConditionalLockRelationForExtension,
> > GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
> > insight.
> >
> > My plan is to have a patch with the above info added (which I will
> > share here so that others can test and see the results too) and run
> > the "case 4" where there's a regression seen on my system.
>
> I captured below information with the attached patch
> 0001-test-times-and-block-counts.patch applied on top of CTAS v23
> patch set. Testing details are attached in the file named "test".
> Total time spent in LockRelationForExtension
> Total time spent in GetPageWithFreeSpace
> Total time spent in RelationAddExtraBlocks
> Total number of times extended the relation in bulk
> Total number of times extended the relation by one block
> Total number of blocks added in bulk extension
> Total number of times getting the page from FSM
>

In your results, the number of pages each process is getting from the FSM
does not match the number of blocks added. I think we need to
increment 'fsm_hit_count' in RecordAndGetPageWithFreeSpace as well,
because that is also called and the process can get a free page via
that path. The other thing to check via a debugger is whether, when one
worker adds blocks in bulk, another parallel worker gets all those
blocks. You can achieve that by allowing one worker (say worker-1) to
extend the relation in bulk, then letting it wait and allowing another
worker (say worker-2) to proceed, and seeing if it gets all the pages
added by worker-1 from the FSM. You need to keep the leader also waiting
or not performing any operation.
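
For context, the relevant flow in RelationGetBufferForTuple() (hio.c)
is roughly as below; note that a backend that gets the conditional lock
on the first try never rechecks the FSM and extends by just one block:

    if (needLock)
    {
        if (!use_fsm)
            LockRelationForExtension(relation, ExclusiveLock);
        else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
        {
            /* Couldn't get the lock immediately; wait for it. */
            LockRelationForExtension(relation, ExclusiveLock);

            /* Did someone else extend the relation while we waited? */
            targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
            if (targetBlock != InvalidBlockNumber)
            {
                /* Use the free space added by the other backend. */
                UnlockRelationForExtension(relation, ExclusiveLock);
                goto loop;
            }

            /* No free page; bulk-extend for ourselves and the waiters. */
            RelationAddExtraBlocks(relation, bistate);
        }
        /* else: got the lock on the first attempt and will fall through
         * to extend by a single block without consulting the FSM again. */
    }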

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Fri, May 28, 2021 at 6:24 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> > I'm still not sure why the execution time with 0 workers (or serial execution or
> > no parallelism involved) on my testing system is 112 sec compared to 58 sec on
> > Hou-San's system for the same use case. Maybe the testing system I'm using
> > is not of the latest configuration compared to others.
>
> What's the setting of wal_level on your two systems?  I thought it could be that you set it to > minimal, while
> Hou-san set it to minimal.  (I forgot the results of 2 and 4 workers, though.)
 

Thanks. I was earlier running with the default wal_level = replica.

Results on my system, with wal_level = minimal, PSA file
"test_results2" for more details:
Without TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 61875.255 ms (01:01.875)
2 workers - Time: 89227.379 ms (01:29.227)
4 workers - Time: 81484.876 ms (01:21.485)
With TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 61279.764 ms (01:01.280)
2 workers - Time: 208620.453 ms (03:28.620)
4 workers - Time: 223737.081 ms (03:43.737)

Results on my system, with wal_level = replica, PSA file
"test_results1" for more details:
Without TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 112175.273 ms (01:52.175)
2 workers - Time: 140441.158 ms (02:20.441)
4 workers - Time: 141750.577 ms (02:21.751)

With TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 112637.906 ms (01:52.638)
2 workers - Time: 225358.287 ms (03:45.358)
4 workers - Time: 242172.600 ms (04:02.173)

Results on Hou-san's system:
SERIAL: 58759.213 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms  [SKIP FSM]: 58633.924 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms   [SKIP FSM]: 66960.305 ms

The majority of the time is spent in LockRelationForExtension and
RelationAddExtraBlocks without TABLE_INSERT_SKIP_FSM, and in
LockRelationForExtension with TABLE_INSERT_SKIP_FSM. The observations
made at [1] still hold true with wal_level = minimal.

I request Hou-san to capture the same info with the add-on patch
shared earlier. This would help us to be on the same page. We can
further think on:
1) Why so much time is being spent in LockRelationForExtension?
2) Whether to use TABLE_INSERT_SKIP_FSM or not, in other words,
whether to take advantage of bulk relation extension or not.
3) If bulk relation extension is to be used i.e. without
TABLE_INSERT_SKIP_FSM flag, then whether the blocks being added by one
worker are immediately visible to other workers or not after it
finishes adding all the blocks.

[1] - https://www.postgresql.org/message-id/CALj2ACV-VToW65BE6ndDEB7S_3qhzQ_BUWtw2q6V88iwTwwPSg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

RE: Parallel Inserts in CREATE TABLE AS

From
"houzj.fnst@fujitsu.com"
Date:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Sent: Thursday, May 27, 2021 10:07 PM
> On Thu, May 27, 2021 at 9:53 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > One idea to find this out could be that we have three counters for
> > > each worker which count the number of times each worker extended
> > > the relation in bulk, the number of times each worker extended the
> > > relation by one block, the number of times each worker gets the page
> > > from FSM. It might be possible that with this we will be able to
> > > figure out why there is a difference between your and Hou-San's
> > > results.
> >
> > Yeah, that helps. And also, the time spent in
> > LockRelationForExtension, ConditionalLockRelationForExtension,
> > GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
> > insight.
> >
> > My plan is to have a patch with the above info added (which I will
> > share here so that others can test and see the results too) and run
> > the "case 4" where there's a regression seen on my system.
> 
> I captured below information with the attached patch
> 0001-test-times-and-block-counts.patch applied on top of CTAS v23 patch set.
> Testing details are attached in the file named "test".
> Total time spent in LockRelationForExtension
> Total time spent in GetPageWithFreeSpace
> Total time spent in RelationAddExtraBlocks
> Total number of times extended the relation in bulk
> Total number of times extended the relation by one block
> Total number of blocks added in bulk extension
> Total number of times getting the page from FSM
> 
> Here is a summary of what I observed:
> 1) The execution time with 2 workers, without TABLE_INSERT_SKIP_FSM
> (140 sec) is more than with 0 workers (112 sec)
> 2) The execution time with 2 workers, with TABLE_INSERT_SKIP_FSM (225
> sec) is more than with 2 workers, without TABLE_INSERT_SKIP_FSM (140
> sec)
> 3) The majority of the time goes into waiting for the relation extension lock
> in LockRelationForExtension. With 2 workers, without TABLE_INSERT_SKIP_FSM,
> out of total execution time 140 sec, the time spent in LockRelationForExtension
> is ~40 sec and the time spent in RelationAddExtraBlocks is ~20 sec. So, ~60 sec
> are being spent in these two functions. With 2 workers, with
> TABLE_INSERT_SKIP_FSM, out of total execution time 225 sec, the time spent
> in LockRelationForExtension is ~135 sec and the time spent in
> RelationAddExtraBlocks is 0 sec (because we skip FSM, no bulk extend logic
> applies). So, most of the time is being spent in LockRelationForExtension.
> 
> I'm still not sure why the execution time with 0 workers (or serial execution or
> no parallelism involved) on my testing system is 112 sec compared to 58 sec on
> Hou-San's system for the same use case. Maybe the testing system I'm using is
> not of the latest configuration compared to others.
> 
> Having said that, I request others to try and see if the same observations (as
> above) hold on their testing systems for the same use case. If others don't
> see the regression (with just 2 workers) or observe little difference with
> and without TABLE_INSERT_SKIP_FSM, I'm open to changing the parallel inserts
> in CTAS code to use TABLE_INSERT_SKIP_FSM.

Thanks for the patch!
I attached my test results. Note I did not change wal_level to minimal.

I only changed the following configuration:

shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off

Best regards,
houzj

Attachment

Re: Parallel Inserts in CREATE TABLE AS

From
Amit Kapila
Date:
On Fri, May 28, 2021 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 7:37 PM Bharath Rupireddy
> >
> > I captured below information with the attached patch
> > 0001-test-times-and-block-counts.patch applied on top of CTAS v23
> > patch set. Testing details are attached in the file named "test".
> > Total time spent in LockRelationForExtension
> > Total time spent in GetPageWithFreeSpace
> > Total time spent in RelationAddExtraBlocks
> > Total number of times extended the relation in bulk
> > Total number of times extended the relation by one block
> > Total number of blocks added in bulk extension
> > Total number of times getting the page from FSM
> >
>
> In your results, the number of pages each process is getting from FSM
> is not matching with the number of blocks added. I think we need to
> increment 'fsm_hit_count' in RecordAndGetPageWithFreeSpace as well
> because that is also called and the process can get a free page via
> the same. The other thing to check via debugger is when one worker
> adds the blocks in bulk does another parallel worker gets all those
> blocks. You can achieve that by allowing one worker (say worker-1) to
> extend the relation in bulk and then let it wait and allow another
> worker (say worker-2) to proceed and see if it gets all the pages
> added by worker-1 from FSM. You need to keep the leader also waiting
> or not perform any operation.
>

While looking at the results, I observed one more thing: we are
trying to parallelize I/O, due to which we might not be seeing a benefit
in such cases. I think even for non-write queries there won't be any
(much) benefit if we can't parallelize CPU usage. Basically, the test
you are doing is for the statement: explain analyze verbose create table
test as select * from tenk1;. Now, in this statement, there is no
qualification and still the Gather node is generated for it; this
won't be the case if we check "select * from tenk1". Is it due to the
reason that the patch completely ignores parallel_tuple_cost? But
still, it should prefer a serial plan due to parallel_setup_cost; why is
that not happening? Anyway, I think we should not parallelize such
queries where we can't parallelize CPU usage. Have you tried the cases
without changing any of the costings for parallelism?
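
For reference, cost_gather() in costsize.c charges, roughly:

    /* Parallel setup and communication cost (abridged). */
    startup_cost += parallel_setup_cost;
    run_cost += parallel_tuple_cost * path->path.rows;

so if the patch plans the Gather with rows = 0, only the per-tuple term
vanishes; parallel_setup_cost is still charged, which is why I'd expect
a serial plan to win for a plain "select * from tenk1".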

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts in CREATE TABLE AS

From
Bharath Rupireddy
Date:
On Sat, May 29, 2021 at 9:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> While looking at the results, I observed one more thing: we are
> trying to parallelize I/O, due to which we might not be seeing a benefit
> in such cases. I think even for non-write queries there won't be any
> (much) benefit if we can't parallelize CPU usage. Basically, the test
> you are doing is for the statement: explain analyze verbose create table
> test as select * from tenk1;. Now, in this statement, there is no
> qualification and still the Gather node is generated for it; this
> won't be the case if we check "select * from tenk1". Is it due to the
> reason that the patch completely ignores parallel_tuple_cost? But
> still, it should prefer a serial plan due to parallel_setup_cost; why is
> that not happening? Anyway, I think we should not parallelize such
> queries where we can't parallelize CPU usage. Have you tried the cases
> without changing any of the costings for parallelism?

Hi,

I measured the execution timings for parallel inserts in CTAS in cases
where the planner chooses parallelism for selects naturally. This means
I have used only the 0001 patch from the v23 patch set at [1]. I have
not used the 0002 patch that makes parallel_tuple_cost 0.

Query used for all these tests is below. Also, attached table creation
sqls in the file "test_cases".
EXPLAIN (ANALYZE, VERBOSE) create table test1 as select * from tenk1
t1, tenk2 t2 where t1.c1 = t2.d2;

All the results are of the form (number of workers, exec time in milliseconds).

Test case 1: both tenk1 and tenk2 are of tables with 1 integer(of 4
bytes) columns, tuple size 28 bytes, 100mn tuples
master: (0, 277886.951 ms), (2, 171183.221 ms), (4, 159703.496 ms)
with parallel inserts CTAS patch: (0, 264709.186 ms), (2, 128354.448
ms), (4, 111533.731 ms)

Test case 2: both tenk1 and tenk2 are of tables with 2 integer(of 4
bytes each) columns, 3 varchar(8), tuple size 59 bytes, 100mn tuples
master: (0, 453505.228 ms), (2, 236762.759 ms), (4, 219038.126 ms)
with parallel inserts CTAS patch: (0, 470483.818 ms), (2, 219374.198
ms), (4, 203543.681 ms)

Test case 3: both tenk1 and tenk2 are of tables with 2 bigint(of 8
bytes each) columns, 3 name(of 64 bytes each) columns, 1 varchar(8),
tuple size 241 bytes, 100mn tuples
master: (0, 1052725.928 ms), (2, 592968.486 ms), (4, 562137.891 ms)
with parallel inserts CTAS patch: (0, 1019086.805 ms), (2, 634448.322
ms), (4, 680793.305 ms)

Test case 4: both tenk1 and tenk2 are of tables with 2 bigint(of 8
bytes each) columns, 16 name(of 64 bytes each) columns, tuple size
1064 bytes, 10mn tuples
master: (0, 371931.497 ms), (2, 247206.841 ms), (4, 241959.839 ms)
with parallel inserts CTAS patch: (0, 396342.329 ms), (2, 333860.472
ms), (4, 317895.558 ms)

Observation: parallel insert + parallel select gives a good benefit with
smaller tuple sizes (cases 1 and 2). If the tuple size is bigger,
serial insert + parallel select fares better (cases 3 and 4).

In the coming days, I will try to work on more performance analysis
and clarify some of the points raised upthread.

[1] - https://www.postgresql.org/message-id/CALj2ACXVWr1o%2BFZrkQt-2GvYfuMQeJjWohajmp62Wr6BU8Y4VA%40mail.gmail.com
[2] - postgresql.conf changes I made:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = on
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
wal_level = replica

With Regards,
Bharath Rupireddy.

Attachment