Thread: Matview size - space increased on concurrently refresh

Matview size - space increased on concurrently refresh

From
Nicola Contu
Date:
Hello,
we noticed with a simple matview we have that refreshing it using the concurrently item the space always increases of about 120MB .
This only happens if I am reading from that matview and at the same time I am am refreshing it.

cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
cmdv3=# \! date
Fri Jul 12 13:52:51 GMT 2019
 
cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
 
Let's try concurrently.....
 
cmdv3=# refresh materialized view CONCURRENTLY matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
261 MB
(1 row)


So the matview is not really used and it does not have anything strange but that matview growth to 12GB as we refresh it once an hour.
It had the free percent at 97%.
I understand with concurrenlty it needs to take copy of the data while reading, but this seems to be too much on the space side.

Is this a bug? Or is there anyone can help us understanding this?

Thanks a lot,
Nicola

Re: Matview size - space increased on concurrently refresh

From
Nicola Contu
Date:
P.S.: I am on postgres 11.3

Il giorno ven 12 lug 2019 alle ore 16:32 Nicola Contu <nicola.contu@gmail.com> ha scritto:
Hello,
we noticed with a simple matview we have that refreshing it using the concurrently item the space always increases of about 120MB .
This only happens if I am reading from that matview and at the same time I am am refreshing it.

cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
cmdv3=# \! date
Fri Jul 12 13:52:51 GMT 2019
 
cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
 
Let's try concurrently.....
 
cmdv3=# refresh materialized view CONCURRENTLY matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
261 MB
(1 row)


So the matview is not really used and it does not have anything strange but that matview growth to 12GB as we refresh it once an hour.
It had the free percent at 97%.
I understand with concurrenlty it needs to take copy of the data while reading, but this seems to be too much on the space side.

Is this a bug? Or is there anyone can help us understanding this?

Thanks a lot,
Nicola

Re: Matview size - space increased on concurrently refresh

From
Kaixi Luo
Date:

On Fri, Jul 12, 2019 at 4:34 PM Nicola Contu <nicola.contu@gmail.com> wrote:
P.S.: I am on postgres 11.3

Il giorno ven 12 lug 2019 alle ore 16:32 Nicola Contu <nicola.contu@gmail.com> ha scritto:
Hello,
we noticed with a simple matview we have that refreshing it using the concurrently item the space always increases of about 120MB .
This only happens if I am reading from that matview and at the same time I am am refreshing it.

cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
cmdv3=# \! date
Fri Jul 12 13:52:51 GMT 2019
 
cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
 
Let's try concurrently.....
 
cmdv3=# refresh materialized view CONCURRENTLY matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
261 MB
(1 row)


So the matview is not really used and it does not have anything strange but that matview growth to 12GB as we refresh it once an hour.
It had the free percent at 97%.
I understand with concurrenlty it needs to take copy of the data while reading, but this seems to be too much on the space side.

Is this a bug? Or is there anyone can help us understanding this?

Thanks a lot,
Nicola

This is normal and something to be expected. When refreshing the materialized view, the new data is written to a disk and then the two tables are diffed. After the refresh finishes, your view size should go back to normal.
 

Re: Matview size - space increased on concurrently refresh

From
Nicola Contu
Date:
It does not. That's the issue.
It always increases of 120mb and it reached 12gb instead of just 180mb.

Il dom 14 lug 2019, 21:34 Kaixi Luo <kaixiluo@gmail.com> ha scritto:

On Fri, Jul 12, 2019 at 4:34 PM Nicola Contu <nicola.contu@gmail.com> wrote:
P.S.: I am on postgres 11.3

Il giorno ven 12 lug 2019 alle ore 16:32 Nicola Contu <nicola.contu@gmail.com> ha scritto:
Hello,
we noticed with a simple matview we have that refreshing it using the concurrently item the space always increases of about 120MB .
This only happens if I am reading from that matview and at the same time I am am refreshing it.

cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
cmdv3=# \! date
Fri Jul 12 13:52:51 GMT 2019
 
cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)
 
 
Let's try concurrently.....
 
cmdv3=# refresh materialized view CONCURRENTLY matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
261 MB
(1 row)


So the matview is not really used and it does not have anything strange but that matview growth to 12GB as we refresh it once an hour.
It had the free percent at 97%.
I understand with concurrenlty it needs to take copy of the data while reading, but this seems to be too much on the space side.

Is this a bug? Or is there anyone can help us understanding this?

Thanks a lot,
Nicola

This is normal and something to be expected. When refreshing the materialized view, the new data is written to a disk and then the two tables are diffed. After the refresh finishes, your view size should go back to normal.
 

Re: Matview size - space increased on concurrently refresh

From
Tom Lane
Date:
[ please do not top-post in your replies, it makes the conversation hard
  to follow ]

Nicola Contu <nicola.contu@gmail.com> writes:
> Il dom 14 lug 2019, 21:34 Kaixi Luo <kaixiluo@gmail.com> ha scritto:
>> This is normal and something to be expected. When refreshing the
>> materialized view, the new data is written to a disk and then the two
>> tables are diffed. After the refresh finishes, your view size should go
>> back to normal.

> It does not. That's the issue.
> It always increases of 120mb and it reached 12gb instead of just 180mb.

A concurrent matview refresh will necessarily leave behind two copies
of any rows it changes, just like any other row-update operation in
Postgres.  Once there are no concurrent transactions that can "see"
the old row copies, they should be reclaimable by vacuum.

Since you're not seeing autovacuum reclaim the space automatically,
I hypothesize that you've got autovacuum turned off or dialed down
to unrealistically non-aggressive settings.  Or possibly you have
old open transactions that are preventing reclaiming dead rows
(because they can still possibly "see" those rows).  Either of those
explanations should imply that you're getting similar bloat in every
other table and matview, though.

You might want to look into pg_stat_all_tables to see what it says
about the last_autovacuum time etc. for that matview.  Another source
of insight is to do a manual "vacuum verbose" on the matview and see
what that says about removable and nonremovable rows.

            regards, tom lane



Re: Matview size - space increased on concurrently refresh

From
Nicola Contu
Date:


Il giorno dom 14 lug 2019 alle ore 22:23 Tom Lane <tgl@sss.pgh.pa.us> ha scritto:
[ please do not top-post in your replies, it makes the conversation hard
  to follow ]

Nicola Contu <nicola.contu@gmail.com> writes:
> Il dom 14 lug 2019, 21:34 Kaixi Luo <kaixiluo@gmail.com> ha scritto:
>> This is normal and something to be expected. When refreshing the
>> materialized view, the new data is written to a disk and then the two
>> tables are diffed. After the refresh finishes, your view size should go
>> back to normal.

> It does not. That's the issue.
> It always increases of 120mb and it reached 12gb instead of just 180mb.

A concurrent matview refresh will necessarily leave behind two copies
of any rows it changes, just like any other row-update operation in
Postgres.  Once there are no concurrent transactions that can "see"
the old row copies, they should be reclaimable by vacuum.

Since you're not seeing autovacuum reclaim the space automatically,
I hypothesize that you've got autovacuum turned off or dialed down
to unrealistically non-aggressive settings.  Or possibly you have
old open transactions that are preventing reclaiming dead rows
(because they can still possibly "see" those rows).  Either of those
explanations should imply that you're getting similar bloat in every
other table and matview, though.

You might want to look into pg_stat_all_tables to see what it says
about the last_autovacuum time etc. for that matview.  Another source
of insight is to do a manual "vacuum verbose" on the matview and see
what that says about removable and nonremovable rows.

                        regards, tom lane

This matview has nothing strange and nothign custom.
We can replicate the matview that is not used by anyone. 

cmdv3=# vacuum (full,analyze,verbose) public.matview_nm_connections;
INFO:  vacuuming "public.matview_nm_connections"
INFO:  "matview_nm_connections": found 0 removable, 295877 nonremovable row versions in 33654 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.45 s, system: 0.43 s, elapsed: 1.24 s.
INFO:  analyzing "public.matview_nm_connections"
INFO:  "matview_nm_connections": scanned 16986 of 16986 pages, containing 295877 live rows and 0 dead rows; 30000 rows in sample, 295877 estimated total rows
VACUUM

This is an example of full and verbose vacuum. Everytime I refresh it I get the size increased.

See stats from the pg_stat_all_tables :

cmdv3=# select * from pg_stat_all_tables where relname = 'matview_nm_connections';
   relid    | schemaname |        relname         | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze |          last_vacuum          |        last_autovacuum        |         last_analyze          |       last_autoanalyze
      | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
------------+------------+------------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------
------+--------------+------------------+---------------+-------------------
 3466831733 | public     | matview_nm_connections |     3725 |    540992219 |    33235 |        255113 |  96874161 |         0 |  95692276 |             0 |     295877 |          0 |                   0 | 2019-07-12 11:58:39.198049+00 | 2019-07-16 11:07:02.765612+00 | 2019-07-17 10:28:08.819679+00 | 2019-07-16 11:03:32.4895
73+00 |            5 |               29 |            11 |                17
(1 row)