dikkop failed the pg_combinebackupCheck/006_db_file_copy.pl test - Mailing list pgsql-hackers

From Alexander Lakhin
Subject dikkop failed the pg_combinebackupCheck/006_db_file_copy.pl test
Date
Msg-id 877b1f23-35d2-31b2-2fcd-d176fd3d05c4@gmail.com
List pgsql-hackers
Hello Tomas,

Please take a look at a recent failure on dikkop [1]. The
regress_log_006_db_file_copy file from that run shows:
[02:08:57.929](0.014s) # initializing database system by copying initdb template
...
[02:09:22.511](24.583s) ok 1 - full backup
...
[02:10:35.758](73.247s) not ok 2 - incremental backup

006_db_file_copy_primary.log contains:
2024-07-28 02:09:29.441 UTC [67785:12] 006_db_file_copy.pl LOG: received replication command: START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
2024-07-28 02:09:29.441 UTC [67785:13] 006_db_file_copy.pl STATEMENT:  START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
2024-07-28 02:09:29.441 UTC [67785:14] 006_db_file_copy.pl LOG: acquired physical replication slot "pg_basebackup_67785"
2024-07-28 02:09:29.441 UTC [67785:15] 006_db_file_copy.pl STATEMENT:  START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
2024-07-28 02:10:29.487 UTC [67785:16] 006_db_file_copy.pl LOG: terminating walsender process due to replication timeout
2024-07-28 02:10:29.487 UTC [67785:17] 006_db_file_copy.pl STATEMENT:  START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
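
As a quick check of the timing in this excerpt, here is a minimal Python sketch that computes the interval between the slot being acquired and the walsender being terminated. The two timestamps are copied from the log above; the variable names are only illustrative, and the comparison with 60 seconds assumes the default wal_sender_timeout was in effect on this animal.

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the primary log excerpt (both are UTC).
slot_acquired = datetime.strptime("2024-07-28 02:09:29.441", LOG_TS_FORMAT)
walsender_terminated = datetime.strptime("2024-07-28 02:10:29.487", LOG_TS_FORMAT)

# Elapsed time between START_REPLICATION and the timeout message.
gap_seconds = (walsender_terminated - slot_acquired).total_seconds()

# Assumption: the default wal_sender_timeout of 60 seconds applies here.
ASSUMED_TIMEOUT_SECONDS = 60.0

print(f"gap: {gap_seconds:.3f}s")
print(f"over assumed timeout by: {gap_seconds - ASSUMED_TIMEOUT_SECONDS:.3f}s")
```

The gap comes out at just over a minute, which is consistent with the termination being triggered by the timeout rather than by some other error.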

It looks like this incremental backup operation ran slower than usual: it
took more than 60 seconds and was apparently interrupted by
wal_sender_timeout. However, regress_log_006_db_file_copy from the six
previous (successful) test runs shows that this step was already slow there:
[14:22:16.841](43.215s) ok 2 - incremental backup
[02:14:42.888](34.595s) ok 2 - incremental backup
[17:51:16.152](43.708s) ok 2 - incremental backup
[04:07:16.757](31.087s) ok 2 - incremental backup
[12:15:01.256](49.432s) ok 2 - incremental backup
[01:06:02.482](52.364s) ok 2 - incremental backup

So reaching 60s on this animal (e.g., due to some background activity)
seems quite plausible. Maybe it would make sense to increase
wal_sender_timeout for it, say, to 120s?
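
To put numbers on the remaining headroom, here is a rough Python sketch that extracts the step durations from the six successful runs quoted above and compares the slowest one with the current and the proposed timeout. The regular expression and the names are hypothetical helpers for illustration only. Note also that the duration of the test step is not the same thing as the walsender's idle time, so this is only an approximation of how close those runs came.

```python
import re

# "ok 2" lines copied from the six previous successful runs.
PREVIOUS_RUNS = """\
[14:22:16.841](43.215s) ok 2 - incremental backup
[02:14:42.888](34.595s) ok 2 - incremental backup
[17:51:16.152](43.708s) ok 2 - incremental backup
[04:07:16.757](31.087s) ok 2 - incremental backup
[12:15:01.256](49.432s) ok 2 - incremental backup
[01:06:02.482](52.364s) ok 2 - incremental backup
"""

# Matches the "(NN.NNNs)" duration that follows the timestamp on each line.
DURATION_RE = re.compile(r"\]\((\d+\.\d+)s\) ok 2 - incremental backup")


def step_durations(log_text: str) -> list[float]:
    """Return the duration in seconds of each 'incremental backup' step."""
    return [float(m.group(1)) for m in DURATION_RE.finditer(log_text)]


durations = step_durations(PREVIOUS_RUNS)
slowest = max(durations)
average = sum(durations) / len(durations)

CURRENT_TIMEOUT = 60.0    # assumed default wal_sender_timeout, in seconds
PROPOSED_TIMEOUT = 120.0  # value suggested in this message

print(f"slowest: {slowest:.3f}s, average: {average:.3f}s")
print(f"headroom at 60s:  {CURRENT_TIMEOUT - slowest:.3f}s")
print(f"headroom at 120s: {PROPOSED_TIMEOUT - slowest:.3f}s")
```

With these figures, the slowest successful run left less than 8 seconds of margin under a 60-second limit, so a modest slowdown is enough to cross it.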

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2024-07-27%2023%3A22%3A57

Best regards,
Alexander


