"invalid record length" after restoring pg_basebackup - Mailing list pgsql-general

From Dennis Jacobfeuerborn
Subject "invalid record length" after restoring pg_basebackup
Date
Msg-id c729cdd1-73c3-73f8-9f40-dd0fc84efb29@conversis.de
Whole thread Raw
Responses Re: "invalid record length" after restoring pg_basebackup
List pgsql-general
Hi,
I've run into a strange issue after restoring a backup that I created
using pg_basebackup on a standby instance. The command I use to create
the backup is this:

pg_basebackup -v --write-recovery-conf -h$BACKUP_HOST -p5432
-U$BACKUP_USER --format tar --wal-method stream --compress=2 -D
"$BACKUP_DIR"

This backup runs fine and produces a "base.tar.gz" and "pg_wal.tar.gz"
file. The server version is 11.7.

To restore the backup on another system I unpack the "base.tar.gz" file
into the data directory and "pg_wal.tar.gz" into the "pg_wal"
sub-directory. I then comment out the "primary_conninfo" directive in
the "recovery.conf" file since I don't want to replicate from the
primary on this system I just want to restore the state from the backup.

All of this works fine and the logs report that the db reaches a
consistent recovery state but as last entry it reports an "invalid
record length":

2020-11-13 12:25:34.266 UTC [1] LOG:  listening on IPv4 address
"0.0.0.0", port 5432
2020-11-13 12:25:34.266 UTC [1] LOG:  listening on IPv6 address "::",
port 5432
2020-11-13 12:25:34.269 UTC [1] LOG:  listening on Unix socket
"/var/run/postgresql/.s.PGSQL.5432"
2020-11-13 12:25:34.437 UTC [20] LOG:  database system was interrupted
while in recovery at log time 2020-11-03 14:36:00 UTC
2020-11-13 12:25:34.437 UTC [20] HINT:  If this has occurred more than
once some data might be corrupted and you might need to choose an
earlier recovery target.
2020-11-13 12:25:35.359 UTC [20] WARNING:  recovery command file
"recovery.conf" specified neither primary_conninfo nor restore_command
2020-11-13 12:25:35.359 UTC [20] HINT:  The database server will
regularly poll the pg_wal subdirectory to check for files placed there.
2020-11-13 12:25:35.359 UTC [20] LOG:  entering standby mode
2020-11-13 12:25:35.372 UTC [20] LOG:  redo starts at 932/9D123E70
2020-11-13 12:26:45.435 UTC [20] LOG:  consistent recovery state reached
at 933/AA63F600
2020-11-13 12:26:45.435 UTC [1] LOG:  database system is ready to accept
read only connections
2020-11-13 12:26:45.444 UTC [20] LOG:  invalid record length at
933/AA6DA660: wanted 24, got 0

Looking at the WAL data the last file is "0000000100000933000000AA" and
there is a file "archive_status/0000000100000933000000A9.done" but no
corresponding ".done" file for the "...AA" WAL file.

Running pg_waldump on the "...AA" file shows the following at the end:

$ pg_waldump 0000000100000933000000AA
...
rmgr: Heap        len (rec/tot):     54/    54, tx:    1743220, lsn:
933/AA6DA510, prev 933/AA6D8F08, desc: LOCK off 12: xid 1743220: flags 0
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/13117/874105 blk 0
rmgr: Heap        len (rec/tot):     81/    81, tx:    1743220, lsn:
933/AA6DA548, prev 933/AA6DA510, desc: INSERT off 19, blkref #0: rel
1663/13117/868296 blk 0
rmgr: Btree       len (rec/tot):     80/    80, tx:    1743220, lsn:
933/AA6DA5A0, prev 933/AA6DA548, desc: INSERT_LEAF off 24, blkref #0:
rel 1663/13117/868299 blk 1
rmgr: Btree       len (rec/tot):     72/    72, tx:    1743220, lsn:
933/AA6DA5F0, prev 933/AA6DA5A0, desc: INSERT_LEAF off 23, blkref #0:
rel 1663/13117/915254 blk 2
rmgr: Transaction len (rec/tot):     34/    34, tx:    1743220, lsn:
933/AA6DA638, prev 933/AA6DA5F0, desc: COMMIT 2020-11-03 14:49:29.205562 UTC
pg_waldump: FATAL:  error in WAL record at 933/AA6DA638: invalid record
length at 933/AA6DA660: wanted 24, got 0

It looks like the WAL in the backup is inconsistent at the end but I'd
expect pg_basebackup to create a clean backup. Is this something I
should expect or is this a problem with pg_basebackup?

Regards,
  Dennis



pgsql-general by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Failed Login Attempts in PostgreSQL
Next
From: Radoslav Nedyalkov
Date:
Subject: conflict with recovery when delay is gone