Thread: Unstable regression test for contrib/pageinspect

Unstable regression test for contrib/pageinspect

From
Tom Lane
Date:
My very slow buildfarm animal mamba has failed pageinspect
several times [1][2][3][4] with this symptom:

diff -U3 /home/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/expected/page.out
/home/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/results/page.out
--- /home/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/expected/page.out    2022-11-20 10:12:51.780935488
-0500
+++ /home/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/results/page.out    2022-11-20 14:00:25.818743985
-0500
@@ -92,9 +92,9 @@
 SELECT t_infomask, t_infomask2, raw_flags, combined_flags
 FROM heap_page_items(get_raw_page('test1', 0)),
      LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2);
- t_infomask | t_infomask2 |                         raw_flags                         |   combined_flags
-------------+-------------+-----------------------------------------------------------+--------------------
-       2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID} | {HEAP_XMIN_FROZEN}
+ t_infomask | t_infomask2 |                raw_flags                | combined_flags
+------------+-------------+-----------------------------------------+----------------
+       2304 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID} | {}
 (1 row)

 -- tests for decoding of combined flags

It's not hard to guess what the problem is here: the immediately preceding
bit is hopelessly optimistic.

-- If we freeze the only tuple on test1, the infomask should
-- always be the same in all test runs.
VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test1;

The fact that you asked for a freeze doesn't mean you'll get a freeze.
I suppose that a background auto-analyze is holding back global xmin
so that the tuple doesn't actually get frozen.

The core reloptions.sql and vacuum.sql tests are two places that are
also using this option, but they are applying it to temp tables,
which I think makes it safe (and the lack of failures, seeing that
they run within parallel test groups, reinforces that).  Can we apply
that idea in pageinspect?

contrib/amcheck and contrib/pg_visibility are also using
DISABLE_PAGE_SKIPPING, so I wonder if they have similar hazards.
I haven't seen them fall over, though.

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-11-20%2015%3A13%3A19
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-10-31%2013%3A33%3A35
[3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-10-19%2016%3A34%3A07
[4] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-08-29%2017%3A49%3A02



Re: Unstable regression test for contrib/pageinspect

From
Peter Geoghegan
Date:
On Sun, Nov 20, 2022 at 12:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The core reloptions.sql and vacuum.sql tests are two places that are
> also using this option, but they are applying it to temp tables,
> which I think makes it safe (and the lack of failures, seeing that
> they run within parallel test groups, reinforces that).  Can we apply
> that idea in pageinspect?

I believe so. The temp table horizons guarantee isn't all that old, so
the tests may well have been written before it was possible.

> contrib/amcheck and contrib/pg_visibility are also using
> DISABLE_PAGE_SKIPPING, so I wonder if they have similar hazards.
> I haven't seen them fall over, though.

DISABLE_PAGE_SKIPPING forces aggressive mode (which is also possible
with FREEZE), but unlike FREEZE it also forces VACUUM to scan even
all-frozen pages. The other difference is that DISABLE_PAGE_SKIPPING
doesn't affect FreezeLimit/freeze_min_age, whereas FREEZE sets it to
0.

I think that most use of DISABLE_PAGE_SKIPPING by the regression tests
just isn't necessary. Especially where it's combined with FREEZE like
this, as it often seems to be. Why should the behavior around skipping
all-frozen pages (the only thing changed by using
DISABLE_PAGE_SKIPPING on top of FREEZE) actually matter to these
tests?

-- 
Peter Geoghegan



Re: Unstable regression test for contrib/pageinspect

From
Mark Dilger
Date:

> On Nov 20, 2022, at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> contrib/amcheck and contrib/pg_visibility are also using
> DISABLE_PAGE_SKIPPING, so I wonder if they have similar hazards.
> I haven't seen them fall over, though.

In the amcheck regression test case, it's because the test isn't sensitive to whether the freeze actually happens.  You
cancomment out that line, and the only test difference is the comment: 

@@ -108,8 +108,8 @@
 ERROR:  ending block number must be between 0 and 0
 SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
 ERROR:  starting block number must be between 0 and 0
--- Vacuum freeze to change the xids encountered in subsequent tests
-VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) heaptest;
+-- -- Vacuum freeze to change the xids encountered in subsequent tests
+-- VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) heaptest;
 -- Check that valid options are not rejected nor corruption reported
 -- for a non-empty frozen table
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');


The amcheck TAP test is sensitive to commenting out the freeze, though:

t/001_verify_heapam.pl .. 42/?
#   Failed test 'all-frozen corrupted table skipping all-frozen'
#   at t/001_verify_heapam.pl line 58.
#          got: '0|3||line pointer redirection to item at offset 21840 exceeds maximum offset 38
# 0|4||line pointer to page offset 21840 with length 21840 ends beyond maximum page offset 8192
# 0|5||line pointer redirection to item at offset 0 precedes minimum offset 1
# 0|6||line pointer length 0 is less than the minimum tuple header size 24
# 0|7||line pointer to page offset 15 is not maximally aligned
# 0|8||line pointer length 15 is less than the minimum tuple header size 24'
#     expected: ''
t/001_verify_heapam.pl .. 211/? # Looks like you failed 1 test of 272.
t/001_verify_heapam.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/272 subtests
t/002_cic.pl ............ ok
t/003_cic_2pc.pl ........ ok

Test Summary Report
-------------------
t/001_verify_heapam.pl (Wstat: 256 (exited 1) Tests: 272 Failed: 1)
  Failed test:  80
  Non-zero exit status: 1
Files=3, Tests=280, 10 wallclock secs ( 0.05 usr  0.02 sys +  3.84 cusr  3.10 csys =  7.01 CPU)
Result: FAIL
make: *** [check] Error 1


But the TAP test also disables autovacuum, so a background auto-analyze shouldn't be running.  Maybe that's why you
haven'tseen amcheck fall over? 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Unstable regression test for contrib/pageinspect

From
Tom Lane
Date:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
> On Nov 20, 2022, at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> contrib/amcheck and contrib/pg_visibility are also using
>> DISABLE_PAGE_SKIPPING, so I wonder if they have similar hazards.
>> I haven't seen them fall over, though.

> In the amcheck regression test case, it's because the test isn't
> sensitive to whether the freeze actually happens.  You can comment
> out that line, and the only test difference is the comment:

Interesting.  I tried that with pg_visibility, with the same result:
removing its VACUUM commands altogether changes nothing else in the
test output.  I'm not sure this is a good thing.  It makes one wonder
whether these tests really test what they claim to.  But it certainly
explains the lack of failures.

> The amcheck TAP test is sensitive to commenting out the freeze, though:
> ...
> But the TAP test also disables autovacuum, so a background
> auto-analyze shouldn't be running.  Maybe that's why you haven't
> seen amcheck fall over?

Ah, right, I see

    $node->append_conf('postgresql.conf', 'autovacuum=off');

in 001_verify_heapam.pl.  So that one's okay too.

Bottom line seems to be that converting pageinspect's test table
to a temp table should fix this.  If no objections, I'll do that
tomorrow.

            regards, tom lane



Re: Unstable regression test for contrib/pageinspect

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sun, Nov 20, 2022 at 12:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The core reloptions.sql and vacuum.sql tests are two places that are
>> also using this option, but they are applying it to temp tables,
>> which I think makes it safe (and the lack of failures, seeing that
>> they run within parallel test groups, reinforces that).  Can we apply
>> that idea in pageinspect?

> I believe so. The temp table horizons guarantee isn't all that old, so
> the tests may well have been written before it was possible.

Ah, right, I see that that only dates back to v14 (cf a7212be8b).
So we can fix pageinspect's issue by making that table be temp,
but only as far back as v14.

That's probably good enough in terms of reducing the buildfarm
noise level, seeing that mamba has only reported this failure
on HEAD so far.  I'd be tempted to propose back-patching
a7212be8b, but there would be ABI-stability issues, and it's
probably not worth dealing with that.

>> contrib/amcheck and contrib/pg_visibility are also using
>> DISABLE_PAGE_SKIPPING, so I wonder if they have similar hazards.

> I think that most use of DISABLE_PAGE_SKIPPING by the regression tests
> just isn't necessary.

Apparently not -- see followup discussion with Mark Dilger.

            regards, tom lane