Re: CREATE/REFRESH MATERIALIZED VIEW planner difference? - Mailing list pgsql-general

From Vijaykumar Jain
Subject Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?
Date
Msg-id CAM+6J97mxM5zPNuXxU_Jnqiyc5znRF3GTXd6dB_yfvpYWouuEQ@mail.gmail.com
Whole thread Raw
In response to Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-general
ok i see this.
i may be wrong, but even when i force parallel cost to 0,
i only get workers to create mv, but refresh mv plan does not use workers for the same conf params.

******************* 
postgres=# create table if not exists t( id int primary key, value int );
CREATE TABLE
postgres=# insert into t select x,x from generate_series(1, 100000) x;
INSERT 0 100000
postgres=# analyze t;
ANALYZE
*************

postgres=# drop materialized view mv;
DROP MATERIALIZED VIEW
postgres=# explain analyze create materialized view mv AS select round(avg(id)), sum(id) from t, pg_sleep(10);
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2943.02..2943.03 rows=1 width=40) (actual time=10027.940..10027.941 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..2443.01 rows=100000 width=4) (actual time=10010.513..10022.985 rows=100000 loops=1)
         ->  Function Scan on pg_sleep  (cost=0.00..0.01 rows=1 width=0) (actual time=10010.497..10010.498 rows=1 loops=1)
         ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.012..5.841 rows=100000 loops=1)
 Planning Time: 0.245 ms
 Execution Time: 10039.621 ms
(6 rows)

postgres=# drop materialized view mv;
DROP MATERIALIZED VIEW
postgres=# set parallel_setup_cost=0;
SET
postgres=# set parallel_tuple_cost=0;
SET
postgres=# set min_parallel_table_scan_size=0;
SET
postgres=# set max_parallel_workers_per_gather=4;
SET
postgres=# explain analyze create materialized view mv AS select round(avg(id)), sum(id) from t, pg_sleep(10);
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1318.04..1318.05 rows=1 width=40) (actual time=10042.197..10042.457 rows=1 loops=1)
   ->  Gather  (cost=1318.00..1318.01 rows=4 width=40) (actual time=10041.941..10042.450 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=1318.00..1318.01 rows=1 width=40) (actual time=10035.167..10035.168 rows=1 loops=5)
               ->  Nested Loop  (cost=0.00..1193.00 rows=25000 width=4) (actual time=10011.980..10033.456 rows=20000 loops=5)
                     ->  Parallel Seq Scan on t  (cost=0.00..693.00 rows=25000 width=4) (actual time=0.005..5.791 rows=20000 loops=5)
                     ->  Function Scan on pg_sleep  (cost=0.00..0.01 rows=1 width=0) (actual time=0.501..0.501 rows=1 loops=100000)
 Planning Time: 0.105 ms
 Execution Time: 10059.992 ms
(10 rows)

postgres=# refresh materialized view mv;
REFRESH MATERIALIZED VIEW


*************************************************** auto explain in logs



2021-06-02 00:41:44.294 IST [2687] LOG:  statement: explain analyze create materialized view mv AS select round(avg(id)), sum(id) from t, pg_sleep(10);
2021-06-02 00:41:54.361 IST [2687] LOG:  duration: 10059.566 ms  plan:
        Query Text: explain analyze create materialized view mv AS select round(avg(id)), sum(id) from t, pg_sleep(10);
        Finalize Aggregate  (cost=1318.04..1318.05 rows=1 width=40) (actual time=10042.197..10042.457 rows=1 loops=1)
          Output: round(avg(t.id), 0), sum(t.id)
          Buffers: shared hit=443
          ->  Gather  (cost=1318.00..1318.01 rows=4 width=40) (actual time=10041.941..10042.450 rows=5 loops=1)
                Output: (PARTIAL avg(t.id)), (PARTIAL sum(t.id))
                Workers Planned: 4
                Workers Launched: 4
                Buffers: shared hit=443
                ->  Partial Aggregate  (cost=1318.00..1318.01 rows=1 width=40) (actual time=10035.167..10035.168 rows=1 loops=5)
                      Output: PARTIAL avg(t.id), PARTIAL sum(t.id)
                      Buffers: shared hit=443
                      Worker 0:  actual time=10033.316..10033.316 rows=1 loops=1
                        Buffers: shared hit=62
                      Worker 1:  actual time=10033.162..10033.163 rows=1 loops=1
                        Buffers: shared hit=55
                      Worker 2:  actual time=10034.946..10034.946 rows=1 loops=1
                        Buffers: shared hit=117
                      Worker 3:  actual time=10033.210..10033.211 rows=1 loops=1
                        Buffers: shared hit=103
                      ->  Nested Loop  (cost=0.00..1193.00 rows=25000 width=4) (actual time=10011.980..10033.456 rows=20000 loops=5)
                            Output: t.id
                            Buffers: shared hit=443
                            Worker 0:  actual time=10017.958..10032.681 rows=14012 loops=1
                              Buffers: shared hit=62
                            Worker 1:  actual time=10014.150..10032.520 rows=12430 loops=1
                              Buffers: shared hit=55
                            Worker 2:  actual time=10007.133..10029.864 rows=26442 loops=1
                              Buffers: shared hit=117
                            Worker 3:  actual time=10010.339..10032.137 rows=23160 loops=1
                              Buffers: shared hit=103
                            ->  Parallel Seq Scan on public.t  (cost=0.00..693.00 rows=25000 width=4) (actual time=0.005..5.791 rows=20000 loops=5)
                                  Output: t.id, t.value
                                  Buffers: shared hit=443
                                  Worker 0:  actual time=0.004..0.708 rows=14012 loops=1
                                    Buffers: shared hit=62
                                  Worker 1:  actual time=0.005..0.722 rows=12430 loops=1
                                    Buffers: shared hit=55
                                  Worker 2:  actual time=0.006..1.433 rows=26442 loops=1
                                    Buffers: shared hit=117
                                  Worker 3:  actual time=0.005..17.246 rows=23160 loops=1
                                    Buffers: shared hit=103
                            ->  Function Scan on pg_catalog.pg_sleep  (cost=0.00..0.01 rows=1 width=0) (actual time=0.501..0.501 rows=1 loops=100000)
                                  Output: pg_sleep.pg_sleep
                                  Function Call: pg_sleep('10'::double precision)
                                  Worker 0:  actual time=0.715..0.715 rows=1 loops=14012
                                  Worker 1:  actual time=0.806..0.807 rows=1 loops=12430
                                  Worker 2:  actual time=0.378..0.379 rows=1 loops=26442
                                  Worker 3:  actual time=0.432..0.432 rows=1 loops=23160
2021-06-02 00:41:54.369 IST [2687] LOG:  duration: 10074.336 ms
2021-06-02 00:42:00.567 IST [2687] LOG:  statement: refresh materialized view mv;
2021-06-02 00:42:10.611 IST [2687] LOG:  duration: 10023.402 ms  plan:
        Query Text: refresh materialized view mv;
        Aggregate  (cost=2943.02..2943.03 rows=1 width=40) (actual time=10023.331..10023.332 rows=1 loops=1)
          Output: round(avg(t.id), 0), sum(t.id)
          Buffers: shared hit=443
          ->  Nested Loop  (cost=0.00..2443.01 rows=100000 width=4) (actual time=10005.544..10018.127 rows=100000 loops=1)
                Output: t.id
                Buffers: shared hit=443
                ->  Function Scan on pg_catalog.pg_sleep  (cost=0.00..0.01 rows=1 width=0) (actual time=10005.504..10005.505 rows=1 loops=1)
                      Output: pg_sleep.pg_sleep
                      Function Call: pg_sleep('10'::double precision)
                ->  Seq Scan on public.t  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.032..5.815 rows=100000 loops=1)
                      Output: t.id, t.value
                      Buffers: shared hit=443
2021-06-02 00:42:10.619 IST [2687] LOG:  duration: 10051.366 ms




On Wed, 2 Jun 2021 at 00:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Philip Semanchuk <philip@americanefficient.com> writes:
> I can confirm that it’s not waiting on a lock. In addition, through the AWS CPU utilization monitor I can see that the REFRESH uses one CPU/worker whereas the CREATE uses four. This is consistent with the EXPLAIN ANALYZE for the CREATE which says it uses four workers.

Hm.  I tried to reproduce this here, and in a simple test case I get
parallelized plans for both CREATE and REFRESH.  Are you sure the
REFRESH is running with the same server parameter settings?

>> also, can you share the plans  where you see the diff.

> Unless I misunderstand, there is no plan for a REFRESH.

EXPLAIN isn't bright about that, but if you enable auto_explain,
it will log the plan for a REFRESH's query.

                        regards, tom lane


--
Thanks,
Vijay
Mumbai, India

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?
Next
From: Thomas Munro
Date:
Subject: Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?