The 026_overwrite_contrecord test might fail on extremely slow animals - Mailing list pgsql-hackers

From Alexander Lakhin
Subject The 026_overwrite_contrecord test might fail on extremely slow animals
Date
Msg-id 9ffdb19a-7a89-424e-925a-dd981c37f0ba@gmail.com
Whole thread Raw
List pgsql-hackers
Hello hackers,

A couple of buildfarm failures produced by skink, which runs
tests under Valgrind: [1], [2], with the following diagnostics:
  31/324 postgresql:recovery / recovery/026_overwrite_contrecord ERROR            325.72s   (exit status 255 or signal

127 SIGinvalid)

# Initializing node "standby" from backup "backup" of node "primary"
### Enabling streaming replication for node "standby"
### Starting node "standby"
# Running: pg_ctl -w -D 

/home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/026_overwrite_contrecord/data/t_026_overwrite_contrecord_standby_data/pgdata

-l 

/home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/026_overwrite_contrecord/log/026_overwrite_contrecord_standby.log

-o --cluster-name=standby start
waiting for server to start........ stopped waiting
pg_ctl: could not start server
Examine the log output.
# pg_ctl start failed; see logfile for details: 

/home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/026_overwrite_contrecord/log/026_overwrite_contrecord_standby.log
# No postmaster PID for node "standby"
[14:44:39.051](7.936s) Bail out!  pg_ctl start failed

026_overwrite_contrecord_standby.log:
2025-03-14 14:44:38.533 UTC [1558222][startup][:0] LOG:  database system was interrupted; last known up at 2025-03-14 
14:44:30 UTC
2025-03-14 14:44:38.806 UTC [1558222][startup][:0] LOG:  invalid checkpoint record
2025-03-14 14:44:38.808 UTC [1558222][startup][:0] PANIC:  could not locate a valid checkpoint record at 0/2094248
2025-03-14 14:44:38.937 UTC [1555866][postmaster][:0] LOG:  startup process (PID 1558222) was terminated by signal 6: 
Aborted
2025-03-14 14:44:38.937 UTC [1555866][postmaster][:0] LOG:  aborting startup due to startup process failure

026_overwrite_contrecord_primary.log:
2025-03-14 14:44:30.536 UTC [1553014][client backend][4/2:0] LOG: statement: SELECT 
pg_walfile_name(pg_current_wal_insert_lsn())
2025-03-14 14:44:30.621 UTC [1519069][checkpointer][:0] LOG: checkpoint complete: wrote 109 buffers (85.2%), wrote 3 
SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=12.404 s, sync=0.013 s, total=12.574 s; sync files=28,

longest=0.004 s, average=0.001 s; distance=8299 kB, estimate=8299 kB; lsn=0/2094248, redo lsn=0/1FA5A48
2025-03-14 14:44:31.119 UTC [1519026][postmaster][:0] LOG:  received immediate shutdown request

indicates that this test might fail when a checkpoint performed during the
test execution (that is, when the duration of the test up to
$node->stop('immediate');
exceeds 300 secs).

This failure can be easily reproduced with:
--- a/src/test/recovery/t/026_overwrite_contrecord.pl
+++ b/src/test/recovery/t/026_overwrite_contrecord.pl
@@ -24,2 +24,3 @@ autovacuum = off
  wal_keep_size = 1GB
+checkpoint_timeout=30s
  ));
@@ -61,2 +62,3 @@ $node->safe_psql('postgres',
  #$node->safe_psql('postgres', qq{create table foo ()});
+sleep(40);
  my $endfile = $node->safe_psql('postgres',

The last 5 runs of the test on skink on master show the following duration of the test:
  29/331 postgresql:recovery / recovery/026_overwrite_contrecord  OK 317.37s   3 subtests passed
  29/331 postgresql:recovery / recovery/026_overwrite_contrecord  OK 412.78s   3 subtests passed
  29/331 postgresql:recovery / recovery/026_overwrite_contrecord  OK 448.46s   3 subtests passed
  29/331 postgresql:recovery / recovery/026_overwrite_contrecord  OK 445.73s   3 subtests passed
  29/331 postgresql:recovery / recovery/026_overwrite_contrecord  OK 383.50s   3 subtests passed

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-09%2020%3A23%3A09 - REL_17_STABLE
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-14%2013%3A52%3A09 - master

Best regards,
Alexander Lakhin
Neon (https://neon.tech)



pgsql-hackers by date:

Previous
From: "Gurjeet Singh"
Date:
Subject: Minor edits to README.tuplock, and a question
Next
From: Etsuro Fujita
Date:
Subject: Re: Options to control remote transactions’ access/deferrable modes in postgres_fdw