Thread: The 031_recovery_conflict.pl test might fail due to late pgstat entries flushing
The 031_recovery_conflict.pl test might fail due to late pgstat entries flushing
From
Alexander Lakhin
Date:
Hello hackers, A recent buildfarm failure [1] shows that the test 031_recovery_conflict.pl can fail yet another way: 23/296 postgresql:recovery / recovery/031_recovery_conflict ERROR 11.55s exit status 1 [07:58:53.979](0.255s) ok 11 - tablespace conflict: logfile contains terminated connection due to recovery conflict [07:58:54.058](0.080s) not ok 12 - tablespace conflict: stats show conflict on standby [07:58:54.059](0.000s) # Failed test 'tablespace conflict: stats show conflict on standby' # at /home/bf/bf-build/rorqual/REL_17_STABLE/pgsql/src/test/recovery/t/031_recovery_conflict.pl line 332. [07:58:54.059](0.000s) # got: '0' # expected: '1' I managed to reproduce a similar failure by running multiple test instances in parallel on a slow VM. With extra logging added, I see in a failed run log: 10 10 # Failed test 'startup deadlock: stats show conflict on standby' 10 # at t/031_recovery_conflict.pl line 368. 10 # got: '0' 10 # expected: '1' 10 [19:48:19] t/031_recovery_conflict.pl .. 10 Dubious, test returned 1 (wstat 256, 0x100) 2024-07-23 19:48:13.966 UTC [1668402:12][client backend][1/2:0] LOG: !!!pgstat_report_recovery_conflict| reason: 13, pgstat_track_counts: 1 at character 15 2024-07-23 19:48:13.966 UTC [1668402:13][client backend][1/2:0] STATEMENT: SELECT * FROM test_recovery_conflict_table2; 2024-07-23 19:48:13.966 UTC [1668402:14][client backend][1/2:0] ERROR: canceling statement due to conflict with recovery at character 15 2024-07-23 19:48:13.966 UTC [1668402:15][client backend][1/2:0] DETAIL: User transaction caused buffer deadlock with recovery. ... 2024-07-23 19:48:14.129 UTC [1668805:8][client backend][5/2:0] LOG: statement: SELECT confl_deadlock FROM pg_stat_database_conflicts WHERE datname='test_db'; ... 2024-07-23 19:48:14.148 UTC [1668402:16][client backend][1/0:0] LOG: !!!pgstat_database_flush_cb| nowait: 0 This failure can be reproduced easily with a sleep added as below: @@ -514,6 +514,7 @@ pgstat_shutdown_hook(int code, Datum arg) if (OidIsValid(MyDatabaseId)) pgstat_report_disconnect(MyDatabaseId); +if (rand() % 5 == 0) pg_usleep(100000); pgstat_report_stat(true); By running the test in a loop, I get miscellaneous "stats show conflict on standby" failures, including: iteration 19 # +++ tap check in src/test/recovery +++ t/031_recovery_conflict.pl .. 1/? # Failed test 'buffer pin conflict: stats show conflict on standby' # at t/031_recovery_conflict.pl line 332. # got: '0' # expected: '1' t/031_recovery_conflict.pl .. 17/? # Looks like you failed 1 test of 18. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2024-07-23%2007%3A56%3A35 Best regards, Alexander