Thread: pg_basebackup

pg_basebackup

From: Matthias Apitz
Hello,

At a customer installation (PostgreSQL 13.1 on Linux) we have hit the
following problem for the first time; it is not reproducible:

The relevant part of our backup script contains:
...
test -d ${BACKUPWAL}-${DATE}-${NUM}/ || mkdir -p ${BACKUPWAL}-${DATE}-${NUM}/

# kick to archive the current log; use a DB which will exist;
#
psql -U ${DBSUSER} -dpostgres -c "select pg_switch_wal();" > /dev/null

# backup the cluster
#
printf "%s: pg_basebackup the cluster to %s ... " "`date "+%d.%m.%Y-%H:%M:%S"`" ${BACKUPDIR}-${DATE}-${NUM}
${BINDIR}/pg_basebackup -U ${DBSUSER} -Ft -z -D ${BACKUPDIR}-${DATE}-${NUM}

...


The resulting stdout/stderr of the script:

16.11.2023-20:20:02: pg_basebackup the cluster to /Backup/postgres/sisis-20231116-1 ... 
pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
pg_basebackup: child process exited with error 1

pg-error.log:

2023-11-16 20:34:13.538 CET [6250] LOG:  terminating walsender process due to replication timeout

Why does the PostgreSQL server mention "replication" when we are only
running pg_basebackup?

Some more information:

- wal_sender_timeout has default value (60s)
- backup target is a local file, not a network storage
- the Linux SLES 15 server is well equipped
- nothing is logged in /var/log/messages

Any ideas? Thanks.

    matthias


-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: pg_basebackup

From: Laurenz Albe
On Mon, 2023-11-20 at 07:30 +0100, Matthias Apitz wrote:
> At a customer installation (PostgreSQL 13.1 on Linux) we have hit the
> following problem for the first time; it is not reproducible:

13.1?  Your immediate reaction should be "update to the latest minor release".

> ${BINDIR}/pg_basebackup -U ${DBSUSER} -Ft -z -D ${BACKUPDIR}-${DATE}-${NUM}
>
> The resulting stdout/stderr of the script:
>
> 16.11.2023-20:20:02: pg_basebackup the cluster to /Backup/postgres/sisis-20231116-1 ...
> pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> pg_basebackup: child process exited with error 1
>
> pg-error.log:
>
> 2023-11-16 20:34:13.538 CET [6250] LOG:  terminating walsender process due to replication timeout
>
> Why does the PostgreSQL server mention "replication" when we are only
> running pg_basebackup?

Because "pg_basebackup" uses a replication connection.
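
One way to see this is to query the pg_stat_replication view while a backup
is running: the pg_basebackup connections should show up there as walsender
sessions. The following is only a minimal sketch. It assumes the DBSUSER
variable from the backup script above; the helper name show_walsenders and
the PSQL override (handy for a dry run) are hypothetical.

```shell
DBSUSER=${DBSUSER:-postgres}   # assumption: same superuser as in the backup script
PSQL=${PSQL:-psql}             # overridable, e.g. PSQL=echo for a dry run

# Hypothetical helper: list the walsender sessions the server currently knows about.
show_walsenders() {
    "$PSQL" -U "$DBSUSER" -d postgres -c \
        "SELECT pid, application_name, state, backend_start FROM pg_stat_replication;"
}

# Usage (in a second terminal while the backup runs):
#   show_walsenders
```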

> Some more information:
>
> - wal_sender_timeout has default value (60s)

Increase "wal_sender_timeout", perhaps to 0 (which means "infinite").
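
A minimal sketch of how the setting could be changed from the shell, assuming
superuser access via the DBSUSER variable from the backup script. The helper
name set_wal_sender_timeout and the PSQL override are hypothetical, and the
value '10min' in the usage comment is only an example, not a recommendation.
The parameter does not need a server restart, so a configuration reload
should be enough.

```shell
DBSUSER=${DBSUSER:-postgres}   # assumption: same superuser as in the backup script
PSQL=${PSQL:-psql}             # overridable, e.g. PSQL=echo for a dry run

# Hypothetical helper: persist a new wal_sender_timeout and reload the configuration.
# $1 is the new value, e.g. 10min, or 0 to disable the timeout.
set_wal_sender_timeout() {
    "$PSQL" -U "$DBSUSER" -d postgres -c \
        "ALTER SYSTEM SET wal_sender_timeout = '$1';" &&
    "$PSQL" -U "$DBSUSER" -d postgres -c "SELECT pg_reload_conf();"
}

# Usage:
#   set_wal_sender_timeout 10min
#   psql -U "$DBSUSER" -d postgres -c "SHOW wal_sender_timeout;"
```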

Yours,
Laurenz Albe



Re: pg_basebackup

From: Christoph Moench-Tegeder
## Matthias Apitz (guru@unixarea.de):

> 2023-11-16 20:34:13.538 CET [6250] LOG:  terminating walsender process due to replication timeout

Besides "what Laurenz said" (especially about the horribly outdated
PostgreSQL version): check IO speed and saturation during backup
and make sure you're not stalling. I've seen this behaviour a few
times, mostly in conjunction with btrfs - using a suitably proven
filesystem usually solved the problem (overloaded hardware can
be a problem, too - but modern systems can take quite a bit more
than in the olden days of spinning rust).
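
One possible way to collect such IO data is to sample device statistics for
the duration of the backup. This is only a sketch: it assumes iostat from the
sysstat package is installed, and the helper name run_with_iostat, the
5-second interval and the log path in the usage comment are hypothetical.

```shell
# Hypothetical wrapper: run a command while logging extended device statistics.
# $1 is the log file, the remaining arguments are the command to run.
run_with_iostat() {
    log=$1
    shift
    iostat_pid=
    if command -v iostat >/dev/null 2>&1; then
        # -d device report, -x extended statistics, -t timestamps, every 5 seconds
        iostat -dxt 5 > "$log" 2>&1 &
        iostat_pid=$!
    else
        echo "iostat not found, running without IO sampling" >&2
    fi

    "$@"
    rc=$?

    # Stop the sampler again and hand back the exit status of the command.
    if [ -n "$iostat_pid" ]; then
        kill "$iostat_pid" 2>/dev/null
        wait "$iostat_pid" 2>/dev/null || true
    fi
    return "$rc"
}

# Usage, with the variables from the backup script:
#   run_with_iostat /tmp/iostat-backup.log \
#       ${BINDIR}/pg_basebackup -U ${DBSUSER} -Ft -z -D ${BACKUPDIR}-${DATE}-${NUM}
```

In the resulting log, sustained device utilisation near 100% or long wait
times around the moment of the failure would point towards the stall
described above.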

Regards,
Christoph

-- 
Spare Space.