Thread: [HACKERS] utility commands benefiting from parallel plan

[HACKERS] utility commands benefiting from parallel plan

From
Haribabu Kommi
Date:
Hi Hackers,

Here I attached an implementation patch that allows
utility statements that have queries underneath such as
CREATE TABLE AS, CREATE MATERIALIZED VIEW
and REFRESH commands to benefit from parallel plan.

These write operations not performed concurrently by the
parallel workers, but the underlying query that is used by
these operations are eligible for parallel plans.

Currently the write operations are implemented for the
tuple dest types DestIntoRel and DestTransientRel.

Currently I am evaluating other write operations that can
benefit with parallelism without side effects in enabling them.

comments?

Regards,
Hari Babu
Fujitsu Australia
Attachment

Re: [HACKERS] utility commands benefiting from parallel plan

From
Dilip Kumar
Date:
On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Here I attached an implementation patch that allows
> utility statements that have queries underneath such as
> CREATE TABLE AS, CREATE MATERIALIZED VIEW
> and REFRESH commands to benefit from parallel plan.
>
> These write operations not performed concurrently by the
> parallel workers, but the underlying query that is used by
> these operations are eligible for parallel plans.
>
> Currently the write operations are implemented for the
> tuple dest types DestIntoRel and DestTransientRel.
>
> Currently I am evaluating other write operations that can
> benefit with parallelism without side effects in enabling them.

The Idea looks good to me.

Since we are already modifying heap_prepare_insert, I am thinking that
we can as well enable queries like "insert into .. select from .."
with minor modification?

- * For now, parallel operations are required to be strictly read-only.
- * Unlike heap_update() and heap_delete(), an insert should never create a
- * combo CID, so it might be possible to relax this restriction, but not
- * without more thought and testing.
+ * For now, parallel operations are required to be strictly read-only in
+ * parallel worker.
This statement is still not true, we can not do heap_update in the
leader even though worker are doing the read-only operation (update
with select).  We can change the comments such that it appears more
specific to insert I think.
-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] utility commands benefiting from parallel plan

From
Robert Haas
Date:
On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Here I attached an implementation patch that allows
> utility statements that have queries underneath such as
> CREATE TABLE AS, CREATE MATERIALIZED VIEW
> and REFRESH commands to benefit from parallel plan.
>
> These write operations not performed concurrently by the
> parallel workers, but the underlying query that is used by
> these operations are eligible for parallel plans.
>
> Currently the write operations are implemented for the
> tuple dest types DestIntoRel and DestTransientRel.
>
> Currently I am evaluating other write operations that can
> benefit with parallelism without side effects in enabling them.
>
> comments?

I think a lot more work than this will be needed.  See:

https://www.postgresql.org/message-id/CA+TgmoZC5ft_t9uQWSO5_1vU6H8oVyD=zyuLvRnJqTN==fvnhg@mail.gmail.com

...and the discussion which followed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] utility commands benefiting from parallel plan

From
Haribabu Kommi
Date:


On Sat, Feb 25, 2017 at 2:45 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Here I attached an implementation patch that allows
> utility statements that have queries underneath such as
> CREATE TABLE AS, CREATE MATERIALIZED VIEW
> and REFRESH commands to benefit from parallel plan.
>
> These write operations not performed concurrently by the
> parallel workers, but the underlying query that is used by
> these operations are eligible for parallel plans.
>
> Currently the write operations are implemented for the
> tuple dest types DestIntoRel and DestTransientRel.
>
> Currently I am evaluating other write operations that can
> benefit with parallelism without side effects in enabling them.

The Idea looks good to me.

Since we are already modifying heap_prepare_insert, I am thinking that
we can as well enable queries like "insert into .. select from .."
with minor modification?

Thanks for the review.

I am finding it not so easy in supporting write operations like INSERT,
DELETE and UPDATE commands to use parallelism benefits for the
queries that are underneath.

Currently the parallelism is enabled only for the tables that don't have
any triggers and indexes with expressions. This limitation can be
removed after a though testing.

To support the same, I removed all the errors from heap functions
and functions to get a new transaction and updating the command id
to the current snapshot (Required for the cases where a single command
validates the input).
 
Attached a WIP patch for the support for DML write operations.
There is no functional change in base utility write support patch.
 
Regards,
Hari Babu
Fujitsu Australia
Attachment

Re: [HACKERS] utility commands benefiting from parallel plan

From
Haribabu Kommi
Date:


On Sat, Feb 25, 2017 at 3:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Here I attached an implementation patch that allows
> utility statements that have queries underneath such as
> CREATE TABLE AS, CREATE MATERIALIZED VIEW
> and REFRESH commands to benefit from parallel plan.
>
> These write operations not performed concurrently by the
> parallel workers, but the underlying query that is used by
> these operations are eligible for parallel plans.
>
> Currently the write operations are implemented for the
> tuple dest types DestIntoRel and DestTransientRel.
>
> Currently I am evaluating other write operations that can
> benefit with parallelism without side effects in enabling them.
>
> comments?

I think a lot more work than this will be needed.  See:

https://www.postgresql.org/message-id/CA+TgmoZC5ft_t9uQWSO5_1vU6H8oVyD=zyuLvRnJqTN==fvnhg@mail.gmail.com

...and the discussion which followed.


Thanks for the link.
Yes, it needs more work to support parallelism even for 
queries that involved in write operations like INSERT,
DELETE and UPDATE commands.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] utility commands benefiting from parallel plan

From
Haribabu Kommi
Date:


On Tue, Feb 28, 2017 at 12:48 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:


On Sat, Feb 25, 2017 at 3:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Here I attached an implementation patch that allows
> utility statements that have queries underneath such as
> CREATE TABLE AS, CREATE MATERIALIZED VIEW
> and REFRESH commands to benefit from parallel plan.
>
> These write operations not performed concurrently by the
> parallel workers, but the underlying query that is used by
> these operations are eligible for parallel plans.
>
> Currently the write operations are implemented for the
> tuple dest types DestIntoRel and DestTransientRel.
>
> Currently I am evaluating other write operations that can
> benefit with parallelism without side effects in enabling them.
>
> comments?

I think a lot more work than this will be needed.  See:

https://www.postgresql.org/message-id/CA+TgmoZC5ft_t9uQWSO5_1vU6H8oVyD=zyuLvRnJqTN==fvnhg@mail.gmail.com

...and the discussion which followed.


Thanks for the link.
Yes, it needs more work to support parallelism even for 
queries that involved in write operations like INSERT,
DELETE and UPDATE commands.

This patch is marked as "returned with feedback" in the ongoing
commitfest.

The proposed DML write operations patch is having good number
of limitations like triggers and etc, but the utility writer operations
patch is in a good shape in my view to start supporting write operations.
This is useful for materialized view while refreshing the data.

Do you find any problems/missings in supporting parallel plan for utility
commands with the attached update patch?  Or is it something
like supporting all write operations at once?

Regards,
Hari Babu
Fujitsu Australia
Attachment

Re: [HACKERS] utility commands benefiting from parallel plan

From
Haribabu Kommi
Date:

Hi All,

Attached a rebased patch that supports parallelism for the queries
that are underneath of some utility commands such as CREATE TABLE AS
and CREATE MATERIALIZED VIEW.

Note: This patch doesn't make the utility statement (insert operation)
to run in parallel. It only allows the select query to be parallel if the query
is eligible for parallel.
 
Regards,
Hari Babu
Fujitsu Australia
Attachment

Re: [HACKERS] utility commands benefiting from parallel plan

From
Rafia Sabih
Date:
On Fri, Sep 1, 2017 at 12:31 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
> Hi All,
>
> Attached a rebased patch that supports parallelism for the queries
> that are underneath of some utility commands such as CREATE TABLE AS
> and CREATE MATERIALIZED VIEW.
>
> Note: This patch doesn't make the utility statement (insert operation)
> to run in parallel. It only allows the select query to be parallel if the
> query
> is eligible for parallel.
>

Here is my feedback fro this patch,

- The patch is working as expected, all regression tests are passing
- I agree with Dilip that having similar mechanism for 'insert into
select...' statements would add more value to the patch, but even then
this looks like a good idea to extend parallelism for atleast a few of
the write operations

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] utility commands benefiting from parallel plan

From
Haribabu Kommi
Date:


On Wed, Sep 13, 2017 at 4:17 PM, Rafia Sabih <rafia.sabih@enterprisedb.com> wrote:
On Fri, Sep 1, 2017 at 12:31 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
> Hi All,
>
> Attached a rebased patch that supports parallelism for the queries
> that are underneath of some utility commands such as CREATE TABLE AS
> and CREATE MATERIALIZED VIEW.
>
> Note: This patch doesn't make the utility statement (insert operation)
> to run in parallel. It only allows the select query to be parallel if the
> query
> is eligible for parallel.
>

Here is my feedback fro this patch,

- The patch is working as expected, all regression tests are passing

Thanks for the review.
 
- I agree with Dilip that having similar mechanism for 'insert into
select...' statements would add more value to the patch, but even then
this looks like a good idea to extend parallelism for atleast a few of
the write operations

Yes, I also agree that supporting of 'insert into select' will provide more
benefit. I already tried to support the same in [1], but it have many 
drawbacks especially with triggers. To support a proper parallel support
for DML queries, I feel the logic of ParalleMode needs an update to
avoid the errors from PreventCommandIfParallelMode() function to
identify whether it is nested query operation and that should execute
only in backend and etc.

As the current patch falls into DDL category that gets benefited from
parallel query, because of this reason, I didn't add the 'insert into select'
support into this patch. Without support of it also, it provides the benefit.
I work on supporting the DML write support with parallel query as a 
separate patch.

Re: [HACKERS] utility commands benefiting from parallel plan

From
Rafia Sabih
Date:
On Wed, Sep 13, 2017 at 2:29 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Wed, Sep 13, 2017 at 4:17 PM, Rafia Sabih <rafia.sabih@enterprisedb.com>
> wrote:
>>
>> On Fri, Sep 1, 2017 at 12:31 PM, Haribabu Kommi
>> <kommi.haribabu@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > Attached a rebased patch that supports parallelism for the queries
>> > that are underneath of some utility commands such as CREATE TABLE AS
>> > and CREATE MATERIALIZED VIEW.
>> >
>> > Note: This patch doesn't make the utility statement (insert operation)
>> > to run in parallel. It only allows the select query to be parallel if
>> > the
>> > query
>> > is eligible for parallel.
>> >
>>
>> Here is my feedback fro this patch,
>>
>> - The patch is working as expected, all regression tests are passing
>
>
> Thanks for the review.
>
>>
>> - I agree with Dilip that having similar mechanism for 'insert into
>> select...' statements would add more value to the patch, but even then
>> this looks like a good idea to extend parallelism for atleast a few of
>> the write operations
>
>
> Yes, I also agree that supporting of 'insert into select' will provide more
> benefit. I already tried to support the same in [1], but it have many
> drawbacks especially with triggers. To support a proper parallel support
> for DML queries, I feel the logic of ParalleMode needs an update to
> avoid the errors from PreventCommandIfParallelMode() function to
> identify whether it is nested query operation and that should execute
> only in backend and etc.
>
> As the current patch falls into DDL category that gets benefited from
> parallel query, because of this reason, I didn't add the 'insert into
> select'
> support into this patch. Without support of it also, it provides the
> benefit.
> I work on supporting the DML write support with parallel query as a
> separate patch.
>
Sounds sensible. In that case, I'll be marking this patch ready for committer.
>
> [1] -
> https://www.postgresql.org/message-id/CAJrrPGfo58TrYxnqwnFAo4%2BtYr8wUH-oC0dJ7V9x7gAOZeaz%2BQ%40mail.gmail.com
>
>

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] utility commands benefiting from parallel plan

From
Haribabu Kommi
Date:


On Thu, Sep 14, 2017 at 2:42 PM, Rafia Sabih <rafia.sabih@enterprisedb.com> wrote:
On Wed, Sep 13, 2017 at 2:29 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Wed, Sep 13, 2017 at 4:17 PM, Rafia Sabih <rafia.sabih@enterprisedb.com>
> wrote:
>>
>> On Fri, Sep 1, 2017 at 12:31 PM, Haribabu Kommi
>> <kommi.haribabu@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > Attached a rebased patch that supports parallelism for the queries
>> > that are underneath of some utility commands such as CREATE TABLE AS
>> > and CREATE MATERIALIZED VIEW.
>> >
>> > Note: This patch doesn't make the utility statement (insert operation)
>> > to run in parallel. It only allows the select query to be parallel if
>> > the
>> > query
>> > is eligible for parallel.
>> >
>>
>> Here is my feedback fro this patch,
>>
>> - The patch is working as expected, all regression tests are passing
>
>
> Thanks for the review.
>
>>
>> - I agree with Dilip that having similar mechanism for 'insert into
>> select...' statements would add more value to the patch, but even then
>> this looks like a good idea to extend parallelism for atleast a few of
>> the write operations
>
>
> Yes, I also agree that supporting of 'insert into select' will provide more
> benefit. I already tried to support the same in [1], but it have many
> drawbacks especially with triggers. To support a proper parallel support
> for DML queries, I feel the logic of ParalleMode needs an update to
> avoid the errors from PreventCommandIfParallelMode() function to
> identify whether it is nested query operation and that should execute
> only in backend and etc.
>
> As the current patch falls into DDL category that gets benefited from
> parallel query, because of this reason, I didn't add the 'insert into
> select'
> support into this patch. Without support of it also, it provides the
> benefit.
> I work on supporting the DML write support with parallel query as a
> separate patch.
>
Sounds sensible. In that case, I'll be marking this patch ready for committer.

Thanks for the review.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] utility commands benefiting from parallel plan

From
Robert Haas
Date:
On Fri, Sep 15, 2017 at 2:22 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Thanks for the review.

I committed this patch with some cosmetic changes.  I think the fact
that several people have asked for this indicates that, even without
making some of the more complicated cases work, this has some value.
I am not convinced it is safe in any case other than when the DML
command is both creating and populating the table, so I removed
REFRESH MATERIALIZED VIEW support from the patch and worked over the
documentation and comments to a degree.

The problem with a case like REFRESH MATERIALIZED VIEW is that there's
nothing to prevent something that gets run in the course of the query
from trying to access the view (and the heavyweight lock won't prevent
that, due to group locking).  That's probably a stupid thing to do,
but it can't be allowed to break the world.  The other cases are safe
from that particular problem because the table doesn't exist yet.

I am still slightly nervous that there may be some other problem that
none of us have thought about that makes this unsafe, but we still
have quite a while until 11 goes out the door, so if there is such a
problem, maybe someone else will find it now that this is committed
and more likely to get some attention.  I thought about not committing
this just in case such a problem exists, but that seemed too timid: if
we never commit anything that might have an undetected bug, we'll
never commit anything at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] utility commands benefiting from parallel plan

From
Haribabu Kommi
Date:


On Fri, Oct 6, 2017 at 2:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 15, 2017 at 2:22 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Thanks for the review.

I committed this patch with some cosmetic changes.  I think the fact
that several people have asked for this indicates that, even without
making some of the more complicated cases work, this has some value.
I am not convinced it is safe in any case other than when the DML
command is both creating and populating the table, so I removed
REFRESH MATERIALIZED VIEW support from the patch and worked over the
documentation and comments to a degree.

The problem with a case like REFRESH MATERIALIZED VIEW is that there's
nothing to prevent something that gets run in the course of the query
from trying to access the view (and the heavyweight lock won't prevent
that, due to group locking).  That's probably a stupid thing to do,
but it can't be allowed to break the world.  The other cases are safe
from that particular problem because the table doesn't exist yet.

Thanks for committing the patch.
I understand the problem of REFRESH MATERIALIZED VIEW case.

Regards,
Hari Babu
Fujitsu Australia