Thread: pg_basebackup
Hello,

In a customer installation (PostgreSQL 13.1 on Linux) we are facing the following problem for the first time; it is not reproducible.

The effective part of our backup script contains:

...
test -d ${BACKUPWAL}-${DATE}-${NUM}/ || mkdir -p ${BACKUPWAL}-${DATE}-${NUM}/

# kick to archive the current log; use a DB which will exist;
#
psql -U ${DBSUSER} -dpostgres -c "select pg_switch_wal();" > /dev/null

# backup the cluster
#
printf "%s: pg_basebackup the cluster to %s ... " "`date "+%d.%m.%Y-%H:%M:%S"`" ${BACKUPDIR}-${DATE}-${NUM}
${BINDIR}/pg_basebackup -U ${DBSUSER} -Ft -z -D ${BACKUPDIR}-${DATE}-${NUM}
...

The resulting stdout/stderr of the script:

16.11.2023-20:20:02: pg_basebackup the cluster to /Backup/postgres/sisis-20231116-1 ...
pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
pg_basebackup: child process exited with error 1

pg-error.log:

2023-11-16 20:34:13.538 CET [6250] LOG: terminating walsender process due to replication timeout

Why does the PostgreSQL server say something about "replication" when we are doing a pg_basebackup?

Some more information:

- wal_sender_timeout has its default value (60s)
- the backup target is on local storage, not on network storage
- the Linux SLES 15 server is well equipped
- nothing is logged in /var/log/messages

Any ideas? Thanks.

matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
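A hardened sketch of the backup step from the script above, assuming the same variables (BINDIR, DBSUSER, BACKUPDIR, DATE, NUM) are set by the surrounding script. It quotes the expansions, uses $(...) instead of nested backticks, and reports pg_basebackup's exit status so that a failed run is not mistaken for a good backup. The function name backup_cluster is hypothetical.

```shell
#!/bin/bash
# Sketch only: the backup step from the script above, wrapped in a
# hypothetical function so that a failed run is reported explicitly.
# Assumes BINDIR, DBSUSER, BACKUPDIR, DATE and NUM are set by the caller.
backup_cluster() {
    local target="${BACKUPDIR}-${DATE}-${NUM}"
    local rc

    # $(...) instead of backticks, and quoted expansions throughout
    printf '%s: pg_basebackup the cluster to %s ... ' \
        "$(date '+%d.%m.%Y-%H:%M:%S')" "$target"

    # -Ft: tar format, -z: gzip-compress the tar files, -D: target directory
    "${BINDIR}/pg_basebackup" -U "${DBSUSER}" -Ft -z -D "$target"
    rc=$?

    if [ "$rc" -eq 0 ]; then
        echo "ok"
    else
        echo "FAILED (pg_basebackup exit status $rc)" >&2
    fi
    return "$rc"
}
```

Calling `backup_cluster || exit 1` in the script then stops before any later step treats an incomplete directory as a usable backup.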
On Mon, 2023-11-20 at 07:30 +0100, Matthias Apitz wrote:
> In a customer installation (PostgreSQL 13.1 on Linux) we are facing the
> following problem for the first time; it is not reproducible.

13.1?  Your immediate reaction should be "update to the latest minor release".

> ${BINDIR}/pg_basebackup -U ${DBSUSER} -Ft -z -D ${BACKUPDIR}-${DATE}-${NUM}
>
> The resulting stdout/stderr of the script:
>
> 16.11.2023-20:20:02: pg_basebackup the cluster to /Backup/postgres/sisis-20231116-1 ...
> pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> pg_basebackup: child process exited with error 1
>
> pg-error.log:
>
> 2023-11-16 20:34:13.538 CET [6250] LOG: terminating walsender process due to replication timeout
>
> Why does the PostgreSQL server say something about "replication" when we are
> doing a pg_basebackup?

Because "pg_basebackup" uses a replication connection, which is served by a walsender process.

> Some more information:
>
> - wal_sender_timeout has its default value (60s)

Increase "wal_sender_timeout", perhaps to 0 (which means "infinite").

Yours,
Laurenz Albe
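Both points in this reply can be tried from the shell. A minimal sketch, assuming psql is in PATH and can connect to the postgres database as the superuser named in DBSUSER (ALTER SYSTEM requires superuser in PostgreSQL 13); the function names are hypothetical. The first function lists the walsender connections that a running pg_basebackup holds. The second raises wal_sender_timeout with ALTER SYSTEM and reloads the configuration.

```shell
#!/bin/bash
# Sketch only; function names are hypothetical. Assumes psql is in PATH
# and can connect to the "postgres" database as the superuser in DBSUSER.

# pg_basebackup uses replication connections, each served by a walsender.
# Run this while a backup is in progress to see them.
show_walsenders() {
    psql -U "${DBSUSER}" -d postgres -X -A -t \
        -c "SELECT pid, application_name, state FROM pg_stat_replication;"
}

# Raise wal_sender_timeout (default 60s); '0' disables the timeout.
# ALTER SYSTEM writes postgresql.auto.conf; pg_reload_conf() makes the
# running server re-read its configuration files.
# Note: the value is interpolated into SQL unvalidated (sketch only).
set_wal_sender_timeout() {
    local value="${1:-0}"
    psql -U "${DBSUSER}" -d postgres -X \
        -c "ALTER SYSTEM SET wal_sender_timeout = '${value}';" \
        -c "SELECT pg_reload_conf();"
}
```

For example, `set_wal_sender_timeout 300s` raises the limit to five minutes, while `set_wal_sender_timeout` with no argument sets it to 0 as suggested above. A finite but larger value keeps some protection against connections that are really dead.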
## Matthias Apitz (guru@unixarea.de):
> 2023-11-16 20:34:13.538 CET [6250] LOG: terminating walsender process due to replication timeout

Besides "what Laurenz said" (especially about the horribly outdated PostgreSQL version): check I/O speed and saturation during the backup and make sure you're not stalling. I've seen this behaviour a few times, mostly in conjunction with btrfs; using a suitably proven filesystem usually solved the problem. (Overloaded hardware can be a problem, too, but modern systems can take quite a bit more than in the olden days of spinning rust.)

Regards,
Christoph

-- 
Spare Space.
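A minimal sketch of the checks suggested here, assuming GNU coreutils `stat` and the sysstat `iostat` tool are installed; the function names are hypothetical, and /Backup/postgres is the backup target from the original report.

```shell
#!/bin/bash
# Sketch only; function names are hypothetical.
# Assumes GNU coreutils stat and sysstat iostat are installed.

# Print the filesystem type behind a path (GNU stat), e.g. to spot btrfs
# under the PostgreSQL data directory or the backup target.
fs_type_of() {
    stat -f -c '%T' "$1"
}

# Print extended per-device statistics every 5 seconds (12 reports by
# default) while the backup runs. A %util column that stays near 100
# together with rising await times points to a saturated device.
watch_io() {
    iostat -x 5 "${1:-12}"
}
```

For example, `fs_type_of /Backup/postgres` shows what the backup is written to, and `watch_io 60 > iostat.log &` started just before pg_basebackup records five minutes (60 reports at 5-second intervals) of device load for later comparison with the time of the walsender timeout.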