Thread: CREATE/REFRESH MATERIALIZED VIEW planner difference?

CREATE/REFRESH MATERIALIZED VIEW planner difference?

From
Philip Semanchuk
Date:
Hi all,
Should I expect a planner difference between CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW? We have a
materializedview that uses 4 workers during CREATE but only one worker during REFRESH, and as a result the refresh
takesmuch longer (~90 minutes vs. 30 minutes for the CREATE). So far this behavior has been 100% consistent. 

I'm running both the CREATE and REFRESH on the same server (Postgres 11.9 on AWS Aurora). I don't think the refresh is
usingone worker in response to other things happening on the server because we’ve observed this happening when the
serveris not busy. We're not using the CONCURRENTLY option for REFRESH. 

THanks
Philip


Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?

From
Vijaykumar Jain
Date:
if you are not using it concurrently, can you confirm the there are *no active* queries on the mv.
refresh requires AccessExclusiveLock and will wait, till it gets one.
just asking if you can rule out the extended time is not due to waiting for lock.

also, can you share the plans  where you see the diff.





On Tue, 1 Jun 2021 at 23:30, Philip Semanchuk <philip@americanefficient.com> wrote:
Hi all,
Should I expect a planner difference between CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW? We have a materialized view that uses 4 workers during CREATE but only one worker during REFRESH, and as a result the refresh takes much longer (~90 minutes vs. 30 minutes for the CREATE). So far this behavior has been 100% consistent.

I'm running both the CREATE and REFRESH on the same server (Postgres 11.9 on AWS Aurora). I don't think the refresh is using one worker in response to other things happening on the server because we’ve observed this happening when the server is not busy. We're not using the CONCURRENTLY option for REFRESH.

THanks
Philip



--
Thanks,
Vijay
Mumbai, India

Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?

From
Philip Semanchuk
Date:

> On Jun 1, 2021, at 2:20 PM, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
>
> if you are not using it concurrently, can you confirm the there are *no active* queries on the mv.
> refresh requires AccessExclusiveLock and will wait, till it gets one.
> just asking if you can rule out the extended time is not due to waiting for lock.

I can confirm that it’s not waiting on a lock. In addition, through the AWS CPU utilization monitor I can see that the
REFRESHuses one CPU/worker whereas the CREATE uses four. This is consistent with the EXPLAIN ANALYZE for the CREATE
whichsays it uses four workers. 


> also, can you share the plans  where you see the diff.

Unless I misunderstand, there is no plan for a REFRESH.


EXPLAIN (ANALYZE, BUFFERS) refresh materialized view my_mat_view
+-------------------------------------------+
| QUERY PLAN                                |
|-------------------------------------------|
| Utility statements have no plan structure |
+-------------------------------------------+

Cheers
Philip




>
> On Tue, 1 Jun 2021 at 23:30, Philip Semanchuk <philip@americanefficient.com> wrote:
> Hi all,
> Should I expect a planner difference between CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW? We have a
materializedview that uses 4 workers during CREATE but only one worker during REFRESH, and as a result the refresh
takesmuch longer (~90 minutes vs. 30 minutes for the CREATE). So far this behavior has been 100% consistent. 
>
> I'm running both the CREATE and REFRESH on the same server (Postgres 11.9 on AWS Aurora). I don't think the refresh
isusing one worker in response to other things happening on the server because we’ve observed this happening when the
serveris not busy. We're not using the CONCURRENTLY option for REFRESH. 
>
> THanks
> Philip
>
>
>
> --
> Thanks,
> Vijay
> Mumbai, India




Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?

From
Tom Lane
Date:
Philip Semanchuk <philip@americanefficient.com> writes:
> I can confirm that it’s not waiting on a lock. In addition, through the AWS CPU utilization monitor I can see that
theREFRESH uses one CPU/worker whereas the CREATE uses four. This is consistent with the EXPLAIN ANALYZE for the CREATE
whichsays it uses four workers. 

Hm.  I tried to reproduce this here, and in a simple test case I get
parallelized plans for both CREATE and REFRESH.  Are you sure the
REFRESH is running with the same server parameter settings?

>> also, can you share the plans  where you see the diff.

> Unless I misunderstand, there is no plan for a REFRESH.

EXPLAIN isn't bright about that, but if you enable auto_explain,
it will log the plan for a REFRESH's query.

            regards, tom lane



Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?

From
Vijaykumar Jain
Date:
ok i see this.
i may be wrong, but even when i force parallel cost to 0,
i only get workers to create mv, but refresh mv plan does not use workers for the same conf params.

******************* 
postgres=# create table if not exists t( id int primary key, value int );
CREATE TABLE
postgres=# insert into t select x,x from generate_series(1, 100000) x;
INSERT 0 100000
postgres=# analyze t;
ANALYZE
*************

postgres=# drop materialized view mv;
DROP MATERIALIZED VIEW
postgres=# explain analyze create materialized view mv AS select round(avg(id)), sum(id) from t, pg_sleep(10);
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2943.02..2943.03 rows=1 width=40) (actual time=10027.940..10027.941 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..2443.01 rows=100000 width=4) (actual time=10010.513..10022.985 rows=100000 loops=1)
         ->  Function Scan on pg_sleep  (cost=0.00..0.01 rows=1 width=0) (actual time=10010.497..10010.498 rows=1 loops=1)
         ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.012..5.841 rows=100000 loops=1)
 Planning Time: 0.245 ms
 Execution Time: 10039.621 ms
(6 rows)

postgres=# drop materialized view mv;
DROP MATERIALIZED VIEW
postgres=# set parallel_setup_cost=0;
SET
postgres=# set parallel_tuple_cost=0;
SET
postgres=# set min_parallel_table_scan_size=0;
SET
postgres=# set max_parallel_workers_per_gather=4;
SET
postgres=# explain analyze create materialized view mv AS select round(avg(id)), sum(id) from t, pg_sleep(10);
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1318.04..1318.05 rows=1 width=40) (actual time=10042.197..10042.457 rows=1 loops=1)
   ->  Gather  (cost=1318.00..1318.01 rows=4 width=40) (actual time=10041.941..10042.450 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=1318.00..1318.01 rows=1 width=40) (actual time=10035.167..10035.168 rows=1 loops=5)
               ->  Nested Loop  (cost=0.00..1193.00 rows=25000 width=4) (actual time=10011.980..10033.456 rows=20000 loops=5)
                     ->  Parallel Seq Scan on t  (cost=0.00..693.00 rows=25000 width=4) (actual time=0.005..5.791 rows=20000 loops=5)
                     ->  Function Scan on pg_sleep  (cost=0.00..0.01 rows=1 width=0) (actual time=0.501..0.501 rows=1 loops=100000)
 Planning Time: 0.105 ms
 Execution Time: 10059.992 ms
(10 rows)

postgres=# refresh materialized view mv;
REFRESH MATERIALIZED VIEW


*************************************************** auto explain in logs



2021-06-02 00:41:44.294 IST [2687] LOG:  statement: explain analyze create materialized view mv AS select round(avg(id)), sum(id) from t, pg_sleep(10);
2021-06-02 00:41:54.361 IST [2687] LOG:  duration: 10059.566 ms  plan:
        Query Text: explain analyze create materialized view mv AS select round(avg(id)), sum(id) from t, pg_sleep(10);
        Finalize Aggregate  (cost=1318.04..1318.05 rows=1 width=40) (actual time=10042.197..10042.457 rows=1 loops=1)
          Output: round(avg(t.id), 0), sum(t.id)
          Buffers: shared hit=443
          ->  Gather  (cost=1318.00..1318.01 rows=4 width=40) (actual time=10041.941..10042.450 rows=5 loops=1)
                Output: (PARTIAL avg(t.id)), (PARTIAL sum(t.id))
                Workers Planned: 4
                Workers Launched: 4
                Buffers: shared hit=443
                ->  Partial Aggregate  (cost=1318.00..1318.01 rows=1 width=40) (actual time=10035.167..10035.168 rows=1 loops=5)
                      Output: PARTIAL avg(t.id), PARTIAL sum(t.id)
                      Buffers: shared hit=443
                      Worker 0:  actual time=10033.316..10033.316 rows=1 loops=1
                        Buffers: shared hit=62
                      Worker 1:  actual time=10033.162..10033.163 rows=1 loops=1
                        Buffers: shared hit=55
                      Worker 2:  actual time=10034.946..10034.946 rows=1 loops=1
                        Buffers: shared hit=117
                      Worker 3:  actual time=10033.210..10033.211 rows=1 loops=1
                        Buffers: shared hit=103
                      ->  Nested Loop  (cost=0.00..1193.00 rows=25000 width=4) (actual time=10011.980..10033.456 rows=20000 loops=5)
                            Output: t.id
                            Buffers: shared hit=443
                            Worker 0:  actual time=10017.958..10032.681 rows=14012 loops=1
                              Buffers: shared hit=62
                            Worker 1:  actual time=10014.150..10032.520 rows=12430 loops=1
                              Buffers: shared hit=55
                            Worker 2:  actual time=10007.133..10029.864 rows=26442 loops=1
                              Buffers: shared hit=117
                            Worker 3:  actual time=10010.339..10032.137 rows=23160 loops=1
                              Buffers: shared hit=103
                            ->  Parallel Seq Scan on public.t  (cost=0.00..693.00 rows=25000 width=4) (actual time=0.005..5.791 rows=20000 loops=5)
                                  Output: t.id, t.value
                                  Buffers: shared hit=443
                                  Worker 0:  actual time=0.004..0.708 rows=14012 loops=1
                                    Buffers: shared hit=62
                                  Worker 1:  actual time=0.005..0.722 rows=12430 loops=1
                                    Buffers: shared hit=55
                                  Worker 2:  actual time=0.006..1.433 rows=26442 loops=1
                                    Buffers: shared hit=117
                                  Worker 3:  actual time=0.005..17.246 rows=23160 loops=1
                                    Buffers: shared hit=103
                            ->  Function Scan on pg_catalog.pg_sleep  (cost=0.00..0.01 rows=1 width=0) (actual time=0.501..0.501 rows=1 loops=100000)
                                  Output: pg_sleep.pg_sleep
                                  Function Call: pg_sleep('10'::double precision)
                                  Worker 0:  actual time=0.715..0.715 rows=1 loops=14012
                                  Worker 1:  actual time=0.806..0.807 rows=1 loops=12430
                                  Worker 2:  actual time=0.378..0.379 rows=1 loops=26442
                                  Worker 3:  actual time=0.432..0.432 rows=1 loops=23160
2021-06-02 00:41:54.369 IST [2687] LOG:  duration: 10074.336 ms
2021-06-02 00:42:00.567 IST [2687] LOG:  statement: refresh materialized view mv;
2021-06-02 00:42:10.611 IST [2687] LOG:  duration: 10023.402 ms  plan:
        Query Text: refresh materialized view mv;
        Aggregate  (cost=2943.02..2943.03 rows=1 width=40) (actual time=10023.331..10023.332 rows=1 loops=1)
          Output: round(avg(t.id), 0), sum(t.id)
          Buffers: shared hit=443
          ->  Nested Loop  (cost=0.00..2443.01 rows=100000 width=4) (actual time=10005.544..10018.127 rows=100000 loops=1)
                Output: t.id
                Buffers: shared hit=443
                ->  Function Scan on pg_catalog.pg_sleep  (cost=0.00..0.01 rows=1 width=0) (actual time=10005.504..10005.505 rows=1 loops=1)
                      Output: pg_sleep.pg_sleep
                      Function Call: pg_sleep('10'::double precision)
                ->  Seq Scan on public.t  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.032..5.815 rows=100000 loops=1)
                      Output: t.id, t.value
                      Buffers: shared hit=443
2021-06-02 00:42:10.619 IST [2687] LOG:  duration: 10051.366 ms




On Wed, 2 Jun 2021 at 00:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Philip Semanchuk <philip@americanefficient.com> writes:
> I can confirm that it’s not waiting on a lock. In addition, through the AWS CPU utilization monitor I can see that the REFRESH uses one CPU/worker whereas the CREATE uses four. This is consistent with the EXPLAIN ANALYZE for the CREATE which says it uses four workers.

Hm.  I tried to reproduce this here, and in a simple test case I get
parallelized plans for both CREATE and REFRESH.  Are you sure the
REFRESH is running with the same server parameter settings?

>> also, can you share the plans  where you see the diff.

> Unless I misunderstand, there is no plan for a REFRESH.

EXPLAIN isn't bright about that, but if you enable auto_explain,
it will log the plan for a REFRESH's query.

                        regards, tom lane


--
Thanks,
Vijay
Mumbai, India

Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?

From
Thomas Munro
Date:
On Wed, Jun 2, 2021 at 7:15 AM Vijaykumar Jain
<vijaykumarjain.github@gmail.com> wrote:
> i only get workers to create mv, but refresh mv plan does not use workers for the same conf params.

Yeah, this changed in v14:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9e7ccd9ef64d05e87ceb1985d459bef9031205c0



Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?

From
Vijaykumar Jain
Date:
ok, so Tom ran on pg14  it seems. :)


On Wed, 2 Jun 2021 at 00:53, Thomas Munro <thomas.munro@gmail.com> wrote:
On Wed, Jun 2, 2021 at 7:15 AM Vijaykumar Jain
<vijaykumarjain.github@gmail.com> wrote:
> i only get workers to create mv, but refresh mv plan does not use workers for the same conf params.

Yeah, this changed in v14:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9e7ccd9ef64d05e87ceb1985d459bef9031205c0


--
Thanks,
Vijay
Mumbai, India

Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?

From
Philip Semanchuk
Date:

> On Jun 1, 2021, at 3:23 PM, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Jun 2, 2021 at 7:15 AM Vijaykumar Jain
> <vijaykumarjain.github@gmail.com> wrote:
>> i only get workers to create mv, but refresh mv plan does not use workers for the same conf params.
>
> Yeah, this changed in v14:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9e7ccd9ef64d05e87ceb1985d459bef9031205c0


Thanks, all! It’s great to have a clear explanation. I looked at the change notes for 12 & 13 before I posted. I didn’t
occurto me to look at 14. :-)  

Cheers
Philip