Thread: "invalid record length" after restoring pg_basebackup

"invalid record length" after restoring pg_basebackup

From
Dennis Jacobfeuerborn
Date:
Hi,
I've run into a strange issue after restoring a backup that I created
using pg_basebackup on a standby instance. The command I use to create
the backup is this:

pg_basebackup -v --write-recovery-conf -h$BACKUP_HOST -p5432
-U$BACKUP_USER --format tar --wal-method stream --compress=2 -D
"$BACKUP_DIR"

This backup runs fine and produces a "base.tar.gz" and "pg_wal.tar.gz"
file. The server version is 11.7.

To restore the backup on another system I unpack the "base.tar.gz" file
into the data directory and "pg_wal.tar.gz" into the "pg_wal"
sub-directory. I then comment out the "primary_conninfo" directive in
the "recovery.conf" file since I don't want to replicate from the
primary on this system I just want to restore the state from the backup.

All of this works fine and the logs report that the db reaches a
consistent recovery state but as last entry it reports an "invalid
record length":

2020-11-13 12:25:34.266 UTC [1] LOG:  listening on IPv4 address
"0.0.0.0", port 5432
2020-11-13 12:25:34.266 UTC [1] LOG:  listening on IPv6 address "::",
port 5432
2020-11-13 12:25:34.269 UTC [1] LOG:  listening on Unix socket
"/var/run/postgresql/.s.PGSQL.5432"
2020-11-13 12:25:34.437 UTC [20] LOG:  database system was interrupted
while in recovery at log time 2020-11-03 14:36:00 UTC
2020-11-13 12:25:34.437 UTC [20] HINT:  If this has occurred more than
once some data might be corrupted and you might need to choose an
earlier recovery target.
2020-11-13 12:25:35.359 UTC [20] WARNING:  recovery command file
"recovery.conf" specified neither primary_conninfo nor restore_command
2020-11-13 12:25:35.359 UTC [20] HINT:  The database server will
regularly poll the pg_wal subdirectory to check for files placed there.
2020-11-13 12:25:35.359 UTC [20] LOG:  entering standby mode
2020-11-13 12:25:35.372 UTC [20] LOG:  redo starts at 932/9D123E70
2020-11-13 12:26:45.435 UTC [20] LOG:  consistent recovery state reached
at 933/AA63F600
2020-11-13 12:26:45.435 UTC [1] LOG:  database system is ready to accept
read only connections
2020-11-13 12:26:45.444 UTC [20] LOG:  invalid record length at
933/AA6DA660: wanted 24, got 0

Looking at the WAL data the last file is "0000000100000933000000AA" and
there is a file "archive_status/0000000100000933000000A9.done" but no
corresponding ".done" file for the "...AA" WAL file.

Running pg_waldump on the "...AA" file shows the following at the end:

$ pg_waldump 0000000100000933000000AA
...
rmgr: Heap        len (rec/tot):     54/    54, tx:    1743220, lsn:
933/AA6DA510, prev 933/AA6D8F08, desc: LOCK off 12: xid 1743220: flags 0
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/13117/874105 blk 0
rmgr: Heap        len (rec/tot):     81/    81, tx:    1743220, lsn:
933/AA6DA548, prev 933/AA6DA510, desc: INSERT off 19, blkref #0: rel
1663/13117/868296 blk 0
rmgr: Btree       len (rec/tot):     80/    80, tx:    1743220, lsn:
933/AA6DA5A0, prev 933/AA6DA548, desc: INSERT_LEAF off 24, blkref #0:
rel 1663/13117/868299 blk 1
rmgr: Btree       len (rec/tot):     72/    72, tx:    1743220, lsn:
933/AA6DA5F0, prev 933/AA6DA5A0, desc: INSERT_LEAF off 23, blkref #0:
rel 1663/13117/915254 blk 2
rmgr: Transaction len (rec/tot):     34/    34, tx:    1743220, lsn:
933/AA6DA638, prev 933/AA6DA5F0, desc: COMMIT 2020-11-03 14:49:29.205562 UTC
pg_waldump: FATAL:  error in WAL record at 933/AA6DA638: invalid record
length at 933/AA6DA660: wanted 24, got 0

It looks like the WAL in the backup is inconsistent at the end but I'd
expect pg_basebackup to create a clean backup. Is this something I
should expect or is this a problem with pg_basebackup?

Regards,
  Dennis



Re: "invalid record length" after restoring pg_basebackup

From
Tom Lane
Date:
Dennis Jacobfeuerborn <dennisml@conversis.de> writes:
> All of this works fine and the logs report that the db reaches a
> consistent recovery state but as last entry it reports an "invalid
> record length":

This looks quite normal to me.  If you'd pulled the power plug on the
primary system at the time you made this backup, you would likely see
the same message at the end of its crash recovery.  Some sort of
corrupt-WAL-entry report is expected at the end of WAL replay anytime
you didn't have a clean shutdown.

            regards, tom lane



Re: "invalid record length" after restoring pg_basebackup

From
Dennis Jacobfeuerborn
Date:
On 11/13/20 4:02 PM, Tom Lane wrote:
> Dennis Jacobfeuerborn <dennisml@conversis.de> writes:
>> All of this works fine and the logs report that the db reaches a
>> consistent recovery state but as last entry it reports an "invalid
>> record length":
> 
> This looks quite normal to me.  If you'd pulled the power plug on the
> primary system at the time you made this backup, you would likely see
> the same message at the end of its crash recovery.  Some sort of
> corrupt-WAL-entry report is expected at the end of WAL replay anytime
> you didn't have a clean shutdown.

But the system the backup was pulled from kept running fine. Also
wouldn't that make re-attaching the system to the primary impossible
since replication cannot be continued due to the broken WAL record?
What I would expect is that pg_basebackup only transfer healthy WAL
entries so that a restored system can pick up right after that with
streaming replication.

Regards,
 Dennis



Re: "invalid record length" after restoring pg_basebackup

From
Tom Lane
Date:
Dennis Jacobfeuerborn <dennisml@conversis.de> writes:
> On 11/13/20 4:02 PM, Tom Lane wrote:
>> This looks quite normal to me.  If you'd pulled the power plug on the
>> primary system at the time you made this backup, you would likely see
>> the same message at the end of its crash recovery.  Some sort of
>> corrupt-WAL-entry report is expected at the end of WAL replay anytime
>> you didn't have a clean shutdown.

> But the system the backup was pulled from kept running fine. Also
> wouldn't that make re-attaching the system to the primary impossible
> since replication cannot be continued due to the broken WAL record?

There is no "broken WAL record".  There is only junk following the
primary's current WAL write point.

> What I would expect is that pg_basebackup only transfer healthy WAL
> entries so that a restored system can pick up right after that with
> streaming replication.

You need to adjust your expectations.  pg_basebackup doesn't parse the WAL
data, because it has no need to.  It just copies whole WAL segment files.

            regards, tom lane