The 026_overwrite_contrecord test might fail on extremely slow animals - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | The 026_overwrite_contrecord test might fail on extremely slow animals |
Date | |
Msg-id | 9ffdb19a-7a89-424e-925a-dd981c37f0ba@gmail.com Whole thread Raw |
List | pgsql-hackers |
Hello hackers, A couple of buildfarm failures produced by skink, which runs tests under Valgrind: [1], [2], with the following diagnostics: 31/324 postgresql:recovery / recovery/026_overwrite_contrecord ERROR 325.72s (exit status 255 or signal 127 SIGinvalid) # Initializing node "standby" from backup "backup" of node "primary" ### Enabling streaming replication for node "standby" ### Starting node "standby" # Running: pg_ctl -w -D /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/026_overwrite_contrecord/data/t_026_overwrite_contrecord_standby_data/pgdata -l /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/026_overwrite_contrecord/log/026_overwrite_contrecord_standby.log -o --cluster-name=standby start waiting for server to start........ stopped waiting pg_ctl: could not start server Examine the log output. # pg_ctl start failed; see logfile for details: /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/026_overwrite_contrecord/log/026_overwrite_contrecord_standby.log # No postmaster PID for node "standby" [14:44:39.051](7.936s) Bail out! pg_ctl start failed 026_overwrite_contrecord_standby.log: 2025-03-14 14:44:38.533 UTC [1558222][startup][:0] LOG: database system was interrupted; last known up at 2025-03-14 14:44:30 UTC 2025-03-14 14:44:38.806 UTC [1558222][startup][:0] LOG: invalid checkpoint record 2025-03-14 14:44:38.808 UTC [1558222][startup][:0] PANIC: could not locate a valid checkpoint record at 0/2094248 2025-03-14 14:44:38.937 UTC [1555866][postmaster][:0] LOG: startup process (PID 1558222) was terminated by signal 6: Aborted 2025-03-14 14:44:38.937 UTC [1555866][postmaster][:0] LOG: aborting startup due to startup process failure 026_overwrite_contrecord_primary.log: 2025-03-14 14:44:30.536 UTC [1553014][client backend][4/2:0] LOG: statement: SELECT pg_walfile_name(pg_current_wal_insert_lsn()) 2025-03-14 14:44:30.621 UTC [1519069][checkpointer][:0] LOG: checkpoint complete: wrote 109 buffers (85.2%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=12.404 s, sync=0.013 s, total=12.574 s; sync files=28, longest=0.004 s, average=0.001 s; distance=8299 kB, estimate=8299 kB; lsn=0/2094248, redo lsn=0/1FA5A48 2025-03-14 14:44:31.119 UTC [1519026][postmaster][:0] LOG: received immediate shutdown request indicates that this test might fail when a checkpoint performed during the test execution (that is, when the duration of the test up to $node->stop('immediate'); exceeds 300 secs). This failure can be easily reproduced with: --- a/src/test/recovery/t/026_overwrite_contrecord.pl +++ b/src/test/recovery/t/026_overwrite_contrecord.pl @@ -24,2 +24,3 @@ autovacuum = off wal_keep_size = 1GB +checkpoint_timeout=30s )); @@ -61,2 +62,3 @@ $node->safe_psql('postgres', #$node->safe_psql('postgres', qq{create table foo ()}); +sleep(40); my $endfile = $node->safe_psql('postgres', The last 5 runs of the test on skink on master show the following duration of the test: 29/331 postgresql:recovery / recovery/026_overwrite_contrecord OK 317.37s 3 subtests passed 29/331 postgresql:recovery / recovery/026_overwrite_contrecord OK 412.78s 3 subtests passed 29/331 postgresql:recovery / recovery/026_overwrite_contrecord OK 448.46s 3 subtests passed 29/331 postgresql:recovery / recovery/026_overwrite_contrecord OK 445.73s 3 subtests passed 29/331 postgresql:recovery / recovery/026_overwrite_contrecord OK 383.50s 3 subtests passed [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-09%2020%3A23%3A09 - REL_17_STABLE [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-14%2013%3A52%3A09 - master Best regards, Alexander Lakhin Neon (https://neon.tech)
pgsql-hackers by date: