Hello Tomas,
Please take a look at a recent failure on dikkop [1]. The
regress_log_006_db_file_copy file from that run shows:
[02:08:57.929](0.014s) # initializing database system by copying initdb template
...
[02:09:22.511](24.583s) ok 1 - full backup
...
[02:10:35.758](73.247s) not ok 2 - incremental backup
006_db_file_copy_primary.log contains:
2024-07-28 02:09:29.441 UTC [67785:12] 006_db_file_copy.pl LOG: received replication command: START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
2024-07-28 02:09:29.441 UTC [67785:13] 006_db_file_copy.pl STATEMENT: START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
2024-07-28 02:09:29.441 UTC [67785:14] 006_db_file_copy.pl LOG: acquired physical replication slot "pg_basebackup_67785"
2024-07-28 02:09:29.441 UTC [67785:15] 006_db_file_copy.pl STATEMENT: START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
2024-07-28 02:10:29.487 UTC [67785:16] 006_db_file_copy.pl LOG: terminating walsender process due to replication timeout
2024-07-28 02:10:29.487 UTC [67785:17] 006_db_file_copy.pl STATEMENT: START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
It looks like this incremental backup operation ran slower than usual
(it took more than 60 seconds and was apparently interrupted due to
wal_sender_timeout). But looking at regress_log_006_db_file_copy from the
six previous (successful) test runs, we can see:
[14:22:16.841](43.215s) ok 2 - incremental backup
[02:14:42.888](34.595s) ok 2 - incremental backup
[17:51:16.152](43.708s) ok 2 - incremental backup
[04:07:16.757](31.087s) ok 2 - incremental backup
[12:15:01.256](49.432s) ok 2 - incremental backup
[01:06:02.482](52.364s) ok 2 - incremental backup
Thus, reaching the 60s wal_sender_timeout default (e.g., due to some
background activity) on this animal seems quite possible. So maybe it
would make sense to increase wal_sender_timeout for it, say, to 120s?
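
If the animal owner agrees, a minimal sketch of such a change, assuming
dikkop uses the standard build-farm.conf layout of the buildfarm client
(the 120s value is just a suggestion, to be tuned as needed):

    extra_config => {
        # Raise the walsender timeout above the 60s default so that
        # occasional slow incremental-backup runs don't get killed.
        DEFAULT => [
            "wal_sender_timeout = 120s",
        ],
    },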
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2024-07-27%2023%3A22%3A57
Best regards,
Alexander