Thread: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From
Bharath Rupireddy
Date:
Hi,

I think we can pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() for
refresh mat view so that parallelism can be considered for the SELECT
part of the previously created mat view. The refresh mat view queries
can be faster in cases where SELECT is parallelized.

Attaching a small patch. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From
Bharath Rupireddy
Date:
On Tue, Dec 1, 2020 at 5:34 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> I think we can pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() for
> refresh mat view so that parallelism can be considered for the SELECT
> part of the previously created mat view. The refresh mat view queries
> can be faster in cases where SELECT is parallelized.
>
> Attaching a small patch. Thoughts?
>

Added this to commitfest, in case it is useful -
https://commitfest.postgresql.org/31/2856/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From
"Hou, Zhijie"
Date:
Hi

> Added this to commitfest, in case it is useful -
> https://commitfest.postgresql.org/31/2856/

I have an issue about the safety of enable parallel select.

I checked the [parallel insert into select] patch.
https://commitfest.postgresql.org/31/2844/
It seems parallel select is not allowed when target table is temporary table.

+    /*
+     * We can't support table modification in parallel-mode if it's a foreign
+     * table/partition (no FDW API for supporting parallel access) or a
+     * temporary table.
+     */
+    if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+        RelationUsesLocalBuffers(rel))
+    {
+        table_close(rel, lockmode);
+        context->max_hazard = PROPARALLEL_UNSAFE;
+        return context->max_hazard;
+    }

For Refresh MatView.
if CONCURRENTLY is specified, It will builds new data in temp tablespace:
    if (concurrent)
    {
        tableSpace = GetDefaultTablespace(RELPERSISTENCE_TEMP, false);
        relpersistence = RELPERSISTENCE_TEMP;
    }

For the above case, should we still consider parallelism ?

Best regards,
houzj




Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From
Bharath Rupireddy
Date:
On Tue, Dec 22, 2020 at 4:53 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> I have an issue about the safety of enable parallel select.
>
> I checked the [parallel insert into select] patch.
> https://commitfest.postgresql.org/31/2844/
> It seems parallel select is not allowed when target table is temporary table.
>
> +       /*
> +        * We can't support table modification in parallel-mode if it's a foreign
> +        * table/partition (no FDW API for supporting parallel access) or a
> +        * temporary table.
> +        */
> +       if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
> +               RelationUsesLocalBuffers(rel))
> +       {
> +               table_close(rel, lockmode);
> +               context->max_hazard = PROPARALLEL_UNSAFE;
> +               return context->max_hazard;
> +       }
>
> For Refresh MatView.
> if CONCURRENTLY is specified, It will builds new data in temp tablespace:
>         if (concurrent)
>         {
>                 tableSpace = GetDefaultTablespace(RELPERSISTENCE_TEMP, false);
>                 relpersistence = RELPERSISTENCE_TEMP;
>         }
>
> For the above case, should we still consider parallelism ?

Thanks for taking a look at the patch.

The intention of the patch is to just enable the parallel mode while
planning the select part of the materialized view, but the insertions
do happen in the leader backend itself. That way even if there's
temporary tablespace gets created, we have no problem.

This patch can be thought as a precursor to parallelizing inserts in
refresh matview. I'm thinking to follow the design of parallel inserts
in ctas [1] i.e. pushing the dest receiver down to the workers once
that gets reviewed and finalized. At that time, we should take care of
not pushing inserts down to workers if temporary tablespace gets
created.

In summary, as far as this patch is considered we don't have any
problem with temporary tablespace getting created with CONCURRENTLY
option.

I'm planning to add a few test cases to cover this patch in
matview.sql and post a v2 patch soon.

[1] -  https://www.postgresql.org/message-id/CALj2ACWbQ95gS0z1viQC3qFVnMGAz7dcLjno9GdZv%2Bu9RAU9eQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From
"Hou, Zhijie"
Date:
> Thanks for taking a look at the patch.
> 
> The intention of the patch is to just enable the parallel mode while planning
> the select part of the materialized view, but the insertions do happen in
> the leader backend itself. That way even if there's temporary tablespace
> gets created, we have no problem.
> 
> This patch can be thought as a precursor to parallelizing inserts in refresh
> matview. I'm thinking to follow the design of parallel inserts in ctas [1]
> i.e. pushing the dest receiver down to the workers once that gets reviewed
> and finalized. At that time, we should take care of not pushing inserts
> down to workers if temporary tablespace gets created.
> 
> In summary, as far as this patch is considered we don't have any problem
> with temporary tablespace getting created with CONCURRENTLY option.
> 
> I'm planning to add a few test cases to cover this patch in matview.sql
> and post a v2 patch soon.

Thanks for your explanation!
You are right that temporary tablespace does not affect current patch.

For the testcase:
I noticed that you have post a mail about add explain support for REFRESH MATERIALIZED VIEW.
Do you think we can combine these two features in one thread ?

Personally, The testcase with explain info will be better.

Best regards,
houzj



Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From
Bharath Rupireddy
Date:
On Wed, Dec 23, 2020 at 9:14 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Thanks for taking a look at the patch.
> >
> > The intention of the patch is to just enable the parallel mode while planning
> > the select part of the materialized view, but the insertions do happen in
> > the leader backend itself. That way even if there's temporary tablespace
> > gets created, we have no problem.
> >
> > This patch can be thought as a precursor to parallelizing inserts in refresh
> > matview. I'm thinking to follow the design of parallel inserts in ctas [1]
> > i.e. pushing the dest receiver down to the workers once that gets reviewed
> > and finalized. At that time, we should take care of not pushing inserts
> > down to workers if temporary tablespace gets created.
> >
> > In summary, as far as this patch is considered we don't have any problem
> > with temporary tablespace getting created with CONCURRENTLY option.
> >
> > I'm planning to add a few test cases to cover this patch in matview.sql
> > and post a v2 patch soon.
>
> Thanks for your explanation!
> You are right that temporary tablespace does not affect current patch.
>
> For the testcase:
> I noticed that you have post a mail about add explain support for REFRESH MATERIALIZED VIEW.
> Do you think we can combine these two features in one thread ?

Yeah, it is at [1] and on some initial analysis ISTM that
explain/explain analyze RMV requires a good amount of code
rearrangement in ExecRefreshMatView(). IMO these two features can be
kept separate because they serve different purposes.

[1] - https://www.postgresql.org/message-id/flat/CALj2ACU71s91G1EOzo-Xx7Z4mvF0dKq-mYeP5u4nikJWxPNRSA%40mail.gmail.com

> Personally, The testcase with explain info will be better.

Yeah without explain analyze we can not show whether the parallelism
is picked in the test cases. What we could do is that we can add a
plain RMV test case in write_parallel.sql after CMV so that at least
we can be ensured that the parallelism will be picked because of the
enforcement there. We can always see the parallelism for the select
part of explain analyze CMV in write_parallel.sql and the same select
query gets planned even in RMV cases.

IMO, the patch in this thread can go with test case addition to
write_parallel.sql. since it is very small.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From
"Hou, Zhijie"
Date:
> Yeah without explain analyze we can not show whether the parallelism is
> picked in the test cases. What we could do is that we can add a plain RMV
> test case in write_parallel.sql after CMV so that at least we can be ensured
> that the parallelism will be picked because of the enforcement there. We
> can always see the parallelism for the select part of explain analyze CMV
> in write_parallel.sql and the same select query gets planned even in RMV
> cases.
> 
> IMO, the patch in this thread can go with test case addition to
> write_parallel.sql. since it is very small.
> 
> Thoughts?

Yes, agreed.

Best regards,
houzj



Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From
Bharath Rupireddy
Date:
On Wed, Dec 30, 2020 at 8:03 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Yeah without explain analyze we can not show whether the parallelism is
> > picked in the test cases. What we could do is that we can add a plain RMV
> > test case in write_parallel.sql after CMV so that at least we can be ensured
> > that the parallelism will be picked because of the enforcement there. We
> > can always see the parallelism for the select part of explain analyze CMV
> > in write_parallel.sql and the same select query gets planned even in RMV
> > cases.
> >
> > IMO, the patch in this thread can go with test case addition to
> > write_parallel.sql. since it is very small.
> >
> > Thoughts?
>
> Yes, agreed.

Thanks. Added the test case. Attaching v2 patch. Please have a look.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From
Luc Vlaming
Date:
On 30-12-2020 04:49, Bharath Rupireddy wrote:
> On Wed, Dec 30, 2020 at 8:03 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>>> Yeah without explain analyze we can not show whether the parallelism is
>>> picked in the test cases. What we could do is that we can add a plain RMV
>>> test case in write_parallel.sql after CMV so that at least we can be ensured
>>> that the parallelism will be picked because of the enforcement there. We
>>> can always see the parallelism for the select part of explain analyze CMV
>>> in write_parallel.sql and the same select query gets planned even in RMV
>>> cases.
>>>
>>> IMO, the patch in this thread can go with test case addition to
>>> write_parallel.sql. since it is very small.
>>>
>>> Thoughts?
>>
>> Yes, agreed.
> 
> Thanks. Added the test case. Attaching v2 patch. Please have a look.
> 
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Hi,

Looks good to me and a nice simple improvement.

Passes everything according to http://cfbot.cputube.org/ so marked it 
therefore as ready for commiter.

Cheers,
Luc



Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From
Luc Vlaming
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

passes according to http://cfbot.cputube.org/

The new status of this patch is: Ready for Committer