bailing out in tap tests nearly always a bad idea - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | bailing out in tap tests nearly always a bad idea |
Date | |
Msg-id | 20220213232249.7sevhlioapydla37@alap3.anarazel.de Whole thread Raw |
Responses |
Re: bailing out in tap tests nearly always a bad idea
Re: bailing out in tap tests nearly always a bad idea |
List | pgsql-hackers |
Hi, A recent cfbot failure reminded me of a topic I wanted to raise: The use of BAIL in tap tests. When a tap test exits with BAIL it doesn't just consider that test failed, but it also stops running further tap tests in the same group: https://metacpan.org/pod/Test::More "Indicates to the harness that things are going so badly all testing should terminate. This includes the running of any additional test scripts." There may be a few cases where stopping testing makes sense, but nearly all of the uses of bail in our tests look like they just want to fail the the current test, rather than other tests in the same prove invocation. Besides aborting other tests, it's also just hard to parse the output: https://cirrus-ci.com/task/6243385840238592?logs=test_world#L2329 ... [16:28:12.178] [16:27:23] t/024_archive_recovery.pl ............ ok 3440 ms ( 0.00 usr 0.00 sys + 1.43 cusr 0.64 csys= 2.07 CPU) [16:28:12.178] [16:27:26] t/025_stuck_on_old_timeline.pl ....... ok 3024 ms ( 0.00 usr 0.00 sys + 1.21 cusr 0.61 csys= 1.82 CPU) [16:28:12.178] [16:27:30] t/026_overwrite_contrecord.pl ........ ok 3693 ms ( 0.01 usr 0.00 sys + 1.05 cusr 0.42 csys= 1.48 CPU) [16:28:12.178] Bailout called. Further testing stopped: command "/tmp/cirrus-ci-build/src/test/recovery/../../../src/test/regress/pg_regress --dlpath="/tmp/cirrus-ci-build/src/test/regress"--bindir= --host=/tmp/gFHAMHHTJ_ --port=52708 --schedule=../regress/parallel_schedule--max-concurrent-tests=20 --inputdir=../regress --outputdir="/tmp/cirrus-ci-build/src/test/recovery/tmp_check""exited with value 1 [16:28:12.178] FAILED--Further testing stopped: command "/tmp/cirrus-ci-build/src/test/recovery/../../../src/test/regress/pg_regress --dlpath="/tmp/cirrus-ci-build/src/test/regress"--bindir= --host=/tmp/gFHAMHHTJ_ --port=52708 --schedule=../regress/parallel_schedule--max-concurrent-tests=20 --inputdir=../regress --outputdir="/tmp/cirrus-ci-build/src/test/recovery/tmp_check""exited with value 1 [16:28:12.178] make[2]: *** [Makefile:27: check] Error 255 [16:28:12.178] make[1]: *** [Makefile:48: check-recovery-recurse] Error 2 What failed was 027_stream_regress.pl, due to a trivial regression test output change: diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/guc.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/guc.out --- /tmp/cirrus-ci-build/src/test/regress/expected/guc.out 2022-02-13 16:20:53.931949802 +0000 +++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/guc.out 2022-02-13 16:28:00.481307866 +0000 @@ -850,7 +850,8 @@ name --------------------------- default_statistics_target -(1 row) + enable_indexskipscan +(2 rows) -- Runtime-computed GUCs should be part of the preset category. SELECT name FROM tab_settings_flags But there's pretty much no way to know from the error output that this failure was in 027_stream_regress.pl (just an example, this is widespread). For some reason prove's output is much worse when the output is redirected to a file. When testing with prove going to a terminal, bail at least prints the test name. interactive: t/000_fail.pl ........................ Bailout called. Further testing stopped: I am bailing FAILED--Further testing stopped: I am bailing file: Bailout called. Further testing stopped: I am bailing FAILED--Further testing stopped: I am bailing the latter is just about undebuggable, particularly when using PROVE_FLAGS='-j...'. Even just a 'die' is better than bail, output wise (and better due to not preventing other tests from running): interactive: t/000_fail.pl ........................ Dubious, test returned 25 (wstat 6400, 0x1900) No subtests run t/001_stream_rep.pl .................. ok file: t/000_fail.pl ........................ Dubious, test returned 25 (wstat 6400, 0x1900) No subtests run t/001_stream_rep.pl .................. ok Does anybody want to defend / explain the use of BAIL_OUT? If not, I think we should consider doing a global replace of the use of bailing. Best with a central function signalling fatal error, rather than individual uses of die or such. Greetings, Andres Freund
pgsql-hackers by date: