Thread: Unstable regression test for contrib/pageinspect
My very slow buildfarm animal mamba has failed pageinspect several times [1][2][3][4] with this symptom: diff -U3 /home/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/expected/page.out /home/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/results/page.out --- /home/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/expected/page.out 2022-11-20 10:12:51.780935488 -0500 +++ /home/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/results/page.out 2022-11-20 14:00:25.818743985 -0500 @@ -92,9 +92,9 @@ SELECT t_infomask, t_infomask2, raw_flags, combined_flags FROM heap_page_items(get_raw_page('test1', 0)), LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2); - t_infomask | t_infomask2 | raw_flags | combined_flags -------------+-------------+-----------------------------------------------------------+-------------------- - 2816 | 2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID} | {HEAP_XMIN_FROZEN} + t_infomask | t_infomask2 | raw_flags | combined_flags +------------+-------------+-----------------------------------------+---------------- + 2304 | 2 | {HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID} | {} (1 row) -- tests for decoding of combined flags It's not hard to guess what the problem is here: the immediately preceding bit is hopelessly optimistic. -- If we freeze the only tuple on test1, the infomask should -- always be the same in all test runs. VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test1; The fact that you asked for a freeze doesn't mean you'll get a freeze. I suppose that a background auto-analyze is holding back global xmin so that the tuple doesn't actually get frozen. The core reloptions.sql and vacuum.sql tests are two places that are also using this option, but they are applying it to temp tables, which I think makes it safe (and the lack of failures, seeing that they run within parallel test groups, reinforces that). Can we apply that idea in pageinspect? contrib/amcheck and contrib/pg_visibility are also using DISABLE_PAGE_SKIPPING, so I wonder if they have similar hazards. I haven't seen them fall over, though. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-11-20%2015%3A13%3A19 [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-10-31%2013%3A33%3A35 [3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-10-19%2016%3A34%3A07 [4] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-08-29%2017%3A49%3A02
On Sun, Nov 20, 2022 at 12:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > The core reloptions.sql and vacuum.sql tests are two places that are > also using this option, but they are applying it to temp tables, > which I think makes it safe (and the lack of failures, seeing that > they run within parallel test groups, reinforces that). Can we apply > that idea in pageinspect? I believe so. The temp table horizons guarantee isn't all that old, so the tests may well have been written before it was possible. > contrib/amcheck and contrib/pg_visibility are also using > DISABLE_PAGE_SKIPPING, so I wonder if they have similar hazards. > I haven't seen them fall over, though. DISABLE_PAGE_SKIPPING forces aggressive mode (which is also possible with FREEZE), but unlike FREEZE it also forces VACUUM to scan even all-frozen pages. The other difference is that DISABLE_PAGE_SKIPPING doesn't affect FreezeLimit/freeze_min_age, whereas FREEZE sets it to 0. I think that most use of DISABLE_PAGE_SKIPPING by the regression tests just isn't necessary. Especially where it's combined with FREEZE like this, as it often seems to be. Why should the behavior around skipping all-frozen pages (the only thing changed by using DISABLE_PAGE_SKIPPING on top of FREEZE) actually matter to these tests? -- Peter Geoghegan
> On Nov 20, 2022, at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > contrib/amcheck and contrib/pg_visibility are also using > DISABLE_PAGE_SKIPPING, so I wonder if they have similar hazards. > I haven't seen them fall over, though. In the amcheck regression test case, it's because the test isn't sensitive to whether the freeze actually happens. You cancomment out that line, and the only test difference is the comment: @@ -108,8 +108,8 @@ ERROR: ending block number must be between 0 and 0 SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000); ERROR: starting block number must be between 0 and 0 --- Vacuum freeze to change the xids encountered in subsequent tests -VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) heaptest; +-- -- Vacuum freeze to change the xids encountered in subsequent tests +-- VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) heaptest; -- Check that valid options are not rejected nor corruption reported -- for a non-empty frozen table SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none'); The amcheck TAP test is sensitive to commenting out the freeze, though: t/001_verify_heapam.pl .. 42/? # Failed test 'all-frozen corrupted table skipping all-frozen' # at t/001_verify_heapam.pl line 58. # got: '0|3||line pointer redirection to item at offset 21840 exceeds maximum offset 38 # 0|4||line pointer to page offset 21840 with length 21840 ends beyond maximum page offset 8192 # 0|5||line pointer redirection to item at offset 0 precedes minimum offset 1 # 0|6||line pointer length 0 is less than the minimum tuple header size 24 # 0|7||line pointer to page offset 15 is not maximally aligned # 0|8||line pointer length 15 is less than the minimum tuple header size 24' # expected: '' t/001_verify_heapam.pl .. 211/? # Looks like you failed 1 test of 272. t/001_verify_heapam.pl .. Dubious, test returned 1 (wstat 256, 0x100) Failed 1/272 subtests t/002_cic.pl ............ ok t/003_cic_2pc.pl ........ ok Test Summary Report ------------------- t/001_verify_heapam.pl (Wstat: 256 (exited 1) Tests: 272 Failed: 1) Failed test: 80 Non-zero exit status: 1 Files=3, Tests=280, 10 wallclock secs ( 0.05 usr 0.02 sys + 3.84 cusr 3.10 csys = 7.01 CPU) Result: FAIL make: *** [check] Error 1 But the TAP test also disables autovacuum, so a background auto-analyze shouldn't be running. Maybe that's why you haven'tseen amcheck fall over? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes: > On Nov 20, 2022, at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> contrib/amcheck and contrib/pg_visibility are also using >> DISABLE_PAGE_SKIPPING, so I wonder if they have similar hazards. >> I haven't seen them fall over, though. > In the amcheck regression test case, it's because the test isn't > sensitive to whether the freeze actually happens. You can comment > out that line, and the only test difference is the comment: Interesting. I tried that with pg_visibility, with the same result: removing its VACUUM commands altogether changes nothing else in the test output. I'm not sure this is a good thing. It makes one wonder whether these tests really test what they claim to. But it certainly explains the lack of failures. > The amcheck TAP test is sensitive to commenting out the freeze, though: > ... > But the TAP test also disables autovacuum, so a background > auto-analyze shouldn't be running. Maybe that's why you haven't > seen amcheck fall over? Ah, right, I see $node->append_conf('postgresql.conf', 'autovacuum=off'); in 001_verify_heapam.pl. So that one's okay too. Bottom line seems to be that converting pageinspect's test table to a temp table should fix this. If no objections, I'll do that tomorrow. regards, tom lane
Peter Geoghegan <pg@bowt.ie> writes: > On Sun, Nov 20, 2022 at 12:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The core reloptions.sql and vacuum.sql tests are two places that are >> also using this option, but they are applying it to temp tables, >> which I think makes it safe (and the lack of failures, seeing that >> they run within parallel test groups, reinforces that). Can we apply >> that idea in pageinspect? > I believe so. The temp table horizons guarantee isn't all that old, so > the tests may well have been written before it was possible. Ah, right, I see that that only dates back to v14 (cf a7212be8b). So we can fix pageinspect's issue by making that table be temp, but only as far back as v14. That's probably good enough in terms of reducing the buildfarm noise level, seeing that mamba has only reported this failure on HEAD so far. I'd be tempted to propose back-patching a7212be8b, but there would be ABI-stability issues, and it's probably not worth dealing with that. >> contrib/amcheck and contrib/pg_visibility are also using >> DISABLE_PAGE_SKIPPING, so I wonder if they have similar hazards. > I think that most use of DISABLE_PAGE_SKIPPING by the regression tests > just isn't necessary. Apparently not -- see followup discussion with Mark Dilger. regards, tom lane