Thread: Statistics import and export: difference in statistics of materialized view dumped

Hi Jeff, Corey,
After fixing the statistics difference in dumps of tables with
indexes, I now see difference in statistics of materialized view dump
in the test I am developing at [1] (see the latest patches there).

I see following difference in the dump from the original regression
database and the dump taken from the database where the dump is
restored

@@ -441198,8 +441198,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_bb'::regclass,
- 'relpages', '1'::integer,
- 'reltuples', '1'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -441218,8 +441218,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_tm'::regclass,
- 'relpages', '1'::integer,
- 'reltuples', '3'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -441238,8 +441238,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_tvmm'::regclass,
- 'relpages', '1'::integer,
- 'reltuples', '1'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -448468,9 +448468,9 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.tableam_tblmv_heap2'::regclass,
- 'relpages', '1'::integer,
- 'reltuples', '1'::real,
- 'relallvisible', '1'::integer
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
+ 'relallvisible', '0'::integer
);

These are materialised views created in the test matview.sql and create_am.sql.

When I tried to reproduce the issue outside the test using the
attached scripts. The SQL is just copied from matview.sql. But both
the dumps (from original and restored databases) do not show any
difference. But if I run "make installcheck", take dump of regression
database, restore it, take dump of restored database, I am able to see
the following difference
*** 458089,458097 ****
  SELECT * FROM pg_catalog.pg_restore_relation_stats(
        'version', '180000'::integer,
        'relation', 'public.tableam_tblmv_heap2'::regclass,
!       'relpages', '1'::integer,
!       'reltuples', '1'::real,
!       'relallvisible', '1'::integer
  );


--- 458089,458097 ----
  SELECT * FROM pg_catalog.pg_restore_relation_stats(
        'version', '180000'::integer,
        'relation', 'public.tableam_tblmv_heap2'::regclass,
!       'relpages', '0'::integer,
!       'reltuples', '-1'::real,
!       'relallvisible', '0'::integer
  );

This seems to be a real problem since the statistics is going back
i.e. useful statistics is being reset.

[1] https://www.postgresql.org/message-id/CAExHW5sBbMki6Xs4XxFQQF3C4Wx3wxkLAcySrtuW3vrnOxXDNQ%40mail.gmail.com
-- 
Best Wishes,
Ashutosh Bapat

Attachment
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:
> After fixing the statistics difference in dumps of tables with
> indexes, I now see difference in statistics of materialized view dump
> in the test I am developing at [1] (see the latest patches there).

Are you doing the restore in parallel by any chance?  I had a todo
item to revisit the dependencies that pg_dump is creating for stats
items, because they looked wrong to me, ie inadequate to guarantee
correct restore order.

            regards, tom lane



On Tue, 2025-03-11 at 10:17 -0400, Tom Lane wrote:
> Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:
> > After fixing the statistics difference in dumps of tables with
> > indexes, I now see difference in statistics of materialized view
> > dump
> > in the test I am developing at [1] (see the latest patches there).
>
> Are you doing the restore in parallel by any chance?  I had a todo
> item to revisit the dependencies that pg_dump is creating for stats
> items, because they looked wrong to me, ie inadequate to guarantee
> correct restore order.

It's creating a dependency on the relation and a boundary dependency on
the postDataBound (unless it's an index, or an MV that got pushed to
SECTION_POST_DATA).

I suspect what we need here is a dependency on the MV *data*, because
that's doing a heap swap, which resets the stats. Looking into it.

Regards,
    Jeff Davis




Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2025-03-11 at 10:17 -0400, Tom Lane wrote:
>> Are you doing the restore in parallel by any chance?  I had a todo
>> item to revisit the dependencies that pg_dump is creating for stats
>> items, because they looked wrong to me, ie inadequate to guarantee
>> correct restore order.

> It's creating a dependency on the relation and a boundary dependency on
> the postDataBound (unless it's an index, or an MV that got pushed to
> SECTION_POST_DATA).
> I suspect what we need here is a dependency on the MV *data*, because
> that's doing a heap swap, which resets the stats. Looking into it.

Right, that was what I was thinking, but hadn't had time to look in
detail.  The postDataBound dependency isn't real helpful here, we
could lose that if we had the data dependency.

            regards, tom lane



On Tue, 2025-03-11 at 11:26 -0400, Tom Lane wrote:
> Right, that was what I was thinking, but hadn't had time to look in
> detail.  The postDataBound dependency isn't real helpful here, we
> could lose that if we had the data dependency.

Attached a patch.

It's a bit messier than I expected, so I'm open to other suggestions.
The reason is because materialized view data is also pushed to
RESTORE_PASS_POST_ACL, so we need to do the same for the statistics
(otherwise the dependency is just ignored).

Regards,
    Jeff Davis


Attachment
On Wed, Mar 12, 2025 at 4:08 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2025-03-11 at 11:26 -0400, Tom Lane wrote:
> > Right, that was what I was thinking, but hadn't had time to look in
> > detail.  The postDataBound dependency isn't real helpful here, we
> > could lose that if we had the data dependency.
>
> Attached a patch.
>
> It's a bit messier than I expected, so I'm open to other suggestions.
> The reason is because materialized view data is also pushed to
> RESTORE_PASS_POST_ACL, so we need to do the same for the statistics
> (otherwise the dependency is just ignored).

I ran my test with this patch (we have to remove 0003 patch in my test
which uses --no-statistics option). It failed with following
differences
@@ -452068,8 +452068,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_aa'::regclass,
- 'relpages', '2'::integer,
- 'reltuples', '1'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -452097,8 +452097,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_tm_type'::regclass,
- 'relpages', '2'::integer,
- 'reltuples', '3'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -452111,8 +452111,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_tvmm_expr'::regclass,
- 'relpages', '2'::integer,
- 'reltuples', '1'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--=== stderr ===
=== EOF ===

The previous differences have disappeared but new differences have appeared.

--
Best Wishes,
Ashutosh Bapat



On Wed, Mar 12, 2025 at 4:29 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> I ran my test with this patch (we have to remove 0003 patch in my test
> which uses --no-statistics option). It failed with following
> differences
> @@ -452068,8 +452068,8 @@
> SELECT * FROM pg_catalog.pg_restore_relation_stats(
> 'version', '180000'::integer,
> 'relation', 'public.mvtest_aa'::regclass,
> - 'relpages', '2'::integer,
> - 'reltuples', '1'::real,
> + 'relpages', '0'::integer,
> + 'reltuples', '-1'::real,
> 'relallvisible', '0'::integer
> );
> --
> @@ -452097,8 +452097,8 @@
> SELECT * FROM pg_catalog.pg_restore_relation_stats(
> 'version', '180000'::integer,
> 'relation', 'public.mvtest_tm_type'::regclass,
> - 'relpages', '2'::integer,
> - 'reltuples', '3'::real,
> + 'relpages', '0'::integer,
> + 'reltuples', '-1'::real,
> 'relallvisible', '0'::integer
> );
> --
> @@ -452111,8 +452111,8 @@
> SELECT * FROM pg_catalog.pg_restore_relation_stats(
> 'version', '180000'::integer,
> 'relation', 'public.mvtest_tvmm_expr'::regclass,
> - 'relpages', '2'::integer,
> - 'reltuples', '1'::real,
> + 'relpages', '0'::integer,
> + 'reltuples', '-1'::real,
> 'relallvisible', '0'::integer
> );
> --=== stderr ===
> === EOF ===
>
> The previous differences have disappeared but new differences have appeared.

Pulled the latest sources but the test is still failing with the same
differences.

--
Best Wishes,
Ashutosh Bapat



On Thu, 2025-03-27 at 17:07 +0530, Ashutosh Bapat wrote:
> Pulled the latest sources but the test is still failing with the same
> differences.

The attached set of patches (your 0001 and 0002, combined with my patch
v2j-0003) applied on master (058b5152f0) are passing the pg_upgrade
test suite for me.

Are you saying that the tests don't work for you even when v2j-0003 is
applied? Or are you saying that your tests are failing on master, and
that v2j-0002 should be committed?

Regards,
    Jeff Davis


Attachment
On Fri, Mar 28, 2025 at 10:41 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2025-03-27 at 17:07 +0530, Ashutosh Bapat wrote:
> > Pulled the latest sources but the test is still failing with the same
> > differences.
>
> The attached set of patches (your 0001 and 0002, combined with my patch
> v2j-0003) applied on master (058b5152f0) are passing the pg_upgrade
> test suite for me.
>
> Are you saying that the tests don't work for you even when v2j-0003 is
> applied? Or are you saying that your tests are failing on master, and
> that v2j-0002 should be committed?


When I applied v1 it didn't pass.

But applying v2j-0003, the test passes.

I don't think I understand the patch fully. But I have a comment.

+ * However, the section may be updated later for materialized views.
+ * Matview stats depend on the matview data, because REFRESH
+ * MATERIALIZED VIEW replaces the storage and resets the stats, and
+ * the matview data is in SECTION_POST_DATA. Also, the materialized
+ * SECTION_POST_DATA to resolve some kinds of dependency problems (see
+ * repairMatViewBoundaryMultiLoop()).
+ */

It feels like we can simplify this as: REFRESH MATERIALIZED VIEW
replaces storage and resets the stats hence the Matview stats should
be printed after printing REFRESH MATERIALIZED VIEW command in
SECTION_POST_DATA. Also the materialized view ... . Hence the section
may be updated later for materialized views.

--
Best Wishes,
Ashutosh Bapat



On Fri, 2025-03-28 at 14:53 +0530, Ashutosh Bapat wrote:
> When I applied v1 it didn't pass.

I applied v1 on top of master as of March 15 (771ba90298), and then
took your two changes adding the tests, and it passed.

Version v2j is just rebased forward, which involved a trivial merge
conflict.

> But applying v2j-0003, the test passes.

Great.

> It feels like we can simplify this as: REFRESH MATERIALIZED VIEW
> replaces storage and resets the stats hence the Matview stats should
> be printed after printing REFRESH MATERIALIZED VIEW command in
> SECTION_POST_DATA. Also the materialized view ... . Hence the section
> may be updated later for materialized views.

Done, with a few modifications. Hopefully it's simpler.


Tom suggested that there may be other dependency problems lurking, but
I believe the patch is an improvement, so I committed it now to unblock
your work.

Regards,
    Jeff Davis