Thread: "missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup

Hi,

When I use pg_basebackup to back up a database (let's call it A) and restore
it to a standalone instance (let's call it B), I get "missing chunk number 0
for toast value xxx in pg_toast_xxx" errors.

PG version: 10.3
pg_basebackup command:
     /usr/pgsql-10/bin/pg_basebackup -h p-rdb-c01 -D /var/lib/pgsql/10/data -Xs -P -n --waldir=/tmp/pg_wal
   I mounted a disk at /tmp/pg_wal beforehand and later mount that disk at
/var/lib/pgsql/10/data/pg_wal, to make sure the WAL records written during
the backup are complete.
 
I don't want B to be a standby server; I just want it to be a standalone
server. So I removed recovery.conf and simply started postgresql-10.service.
The service started successfully, but when I use this PostgreSQL instance
(reindex, vacuum and so on), I get "missing chunk number 0 for toast value
xxx in pg_toast_xxx" errors.

pg_basebackup stores the WAL under pg_wal, so can't PostgreSQL work with
those WAL records locally?
I think primary_conninfo in recovery.conf is only used to fetch newer WAL
records from A. Right?

I have also tested this:
If I first start postgresql-10.service with recovery.conf in place and then
split B off from the PostgreSQL cluster, everything works fine.

That test seems to prove that the problem is with the WAL records. I am
really confused.

Regards
Ma Xinjian






--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



On Tue, 2021-04-13 at 02:38 -0700, Ma Xinjian wrote:
> When I use pg_basebackup to back up a database (let's call it A) and restore
> it to a standalone instance (let's call it B), I get "missing chunk number 0
> for toast value xxx in pg_toast_xxx" errors.
> 
> PG version: 10.3
> pg_basebackup command:
>      /usr/pgsql-10/bin/pg_basebackup -h p-rdb-c01 -D /var/lib/pgsql/10/data -Xs -P -n --waldir=/tmp/pg_wal
>    I mounted a disk at /tmp/pg_wal beforehand and later mount that disk at
> /var/lib/pgsql/10/data/pg_wal, to make sure the WAL records written during
> the backup are complete.
>  
> I don't want B to be a standby server; I just want it to be a standalone
> server. So I removed recovery.conf and simply started postgresql-10.service.
> The service started successfully, but when I use this PostgreSQL instance
> (reindex, vacuum and so on), I get "missing chunk number 0 for toast value
> xxx in pg_toast_xxx" errors.
> 
> pg_basebackup stores the WAL under pg_wal, so can't PostgreSQL work with
> those WAL records locally?
> I think primary_conninfo in recovery.conf is only used to fetch newer WAL
> records from A. Right?
> 
> I have also tested this:
> If I first start postgresql-10.service with recovery.conf in place and then
> split B off from the PostgreSQL cluster, everything works fine.
> 
> That test seems to prove that the problem is with the WAL records. I am
> really confused.

Your mail got me confused...

Why do you write the WAL to /tmp/pg_wal, only to later mount that at the
default location?

I see nothing wrong with what you are doing, but I may have got lost
in your complicated procedure.

You don't happen to remove "backup_label", do you?

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




> Why do you write the WAL to /tmp/pg_wal, only to later mount that at the
> default location?
The pg_wal directory has a size limit; if the WAL files grow too large, they
will be overwritten, right?

> I see nothing wrong with what you are doing, but I may have got lost in
> your complicated procedure.

> You don't happen to remove "backup_label", do you?
em, I do remove backup_label...

1. Does that mean recovery.conf is not necessary, but backup_label is?
2. Which key in backup_label is necessary?
3. I searched the log, and it did perform recovery.
So, if there is no backup_label, what are the default START WAL LOCATION and
CHECKPOINT LOCATION?






Ma Xinjian <maxj.fnst@fujitsu.com> writes:
> When I use pg_basebackup to back up a database (let's call it A) and restore
> it to a standalone instance (let's call it B), I get "missing chunk number 0
> for toast value xxx in pg_toast_xxx" errors.
> PG version: 10.3

10.3 is quite a few bug fixes ago.  Maybe you'd have better results
with the current release (10.16).

            regards, tom lane



On Tue, 2021-04-13 at 06:36 -0700, MaXinjian wrote:
> > Why do you write the WAL to /tmp/pg_wal, only to later mount that at the
> > default location?
>
> The pg_wal directory has a size limit; if the WAL files grow too large, they
> will be overwritten, right?

No, they won't.
You could run out of space on the file system though.
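
To keep an eye on that, one could check the free space on the file system
that holds pg_wal. The following is only a minimal Python sketch: the helper
names and the 10% threshold are illustrative assumptions, not something
PostgreSQL provides.

```python
import shutil


def free_fraction(path):
    """Return the fraction of free space on the file system containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / usage.total


def wal_space_is_low(wal_dir, min_free=0.10):
    """Return True if less than min_free of the file system is free.

    The default threshold is an arbitrary assumption; pick one that fits
    your WAL volume.
    """
    return free_fraction(wal_dir) < min_free
```

For the setup in this thread it would be called as
wal_space_is_low("/var/lib/pgsql/10/data/pg_wal").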

> > I see nothing wrong with what you are doing, but I may have got lost in
> > your complicated procedure.
> > You don't happen to remove "backup_label", do you?
>
> em, I do remove backup_label...

Then that's your problem.
That will corrupt your data, because recovery starts from the wrong
checkpoint.

> 1. Does that mean recovery.conf is not necessary, but backup_label is?

Yes, exactly.

> 2. Which key in backup_label is necessary?

The whole file needs to be preserved unchanged, just as it is.
Don't mess with that file.
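
A simple guard before the first start of a restored backup is to verify that
the file is still there. This is a minimal Python sketch; the function names
are made up for illustration. The check only makes sense on a freshly
restored base backup that has never been started, because the server renames
the file to backup_label.old once it has used it.

```python
import os


def has_backup_label(data_dir):
    """Return True if data_dir still contains the backup_label file."""
    return os.path.isfile(os.path.join(data_dir, "backup_label"))


def assert_safe_to_start(data_dir):
    """Refuse to go on if a never-started base backup lost its backup_label."""
    if not has_backup_label(data_dir):
        raise RuntimeError(
            f"{data_dir}: backup_label is missing; "
            "do not start this copy, take a new base backup instead"
        )
```

In the scenario of this thread, assert_safe_to_start("/var/lib/pgsql/10/data")
would be run after pg_basebackup finishes and before postgresql-10.service is
started for the first time.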

> 3. I searched the log, and it did perform recovery.
> So, if there is no backup_label, what are the default START WAL LOCATION and
> CHECKPOINT LOCATION?

That's the catch.

"backup_label" is the *only way* to tell a backup from a crashed
PostgreSQL cluster.

If there is no "backup_label", PostgreSQL will get the latest checkpoint
from the control file (global/pg_control), which may well be later than
the checkpoint that started the backup, so recovery will skip the WAL in
between and you will fail to recover some transactions.
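
To make the gap concrete, here is a rough Python sketch that compares the two
checkpoint positions. Everything in it is an assumption for illustration: the
sample values are invented, only the two keys named earlier in this thread
are shown, and the helper names are hypothetical. It relies on an LSN being
written as two hexadecimal numbers separated by a slash, the high and low 32
bits of a 64-bit position.

```python
def lsn_to_int(lsn):
    """Convert an LSN written as 'X/Y' (two hex numbers) to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def parse_backup_label(text):
    """Split 'KEY: value' lines into a dict; lines without ': ' are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


# Invented sample values, not taken from a real cluster.
SAMPLE_LABEL = (
    "START WAL LOCATION: 0/2000028 (file 000000010000000000000002)\n"
    "CHECKPOINT LOCATION: 0/2000060\n"
)
# Hypothetical "latest checkpoint" as it might be recorded in pg_control.
SAMPLE_CONTROL_CHECKPOINT = "0/5000098"


def skipped_wal_bytes(label_text, control_checkpoint):
    """Approximate distance in bytes between the backup's checkpoint and the
    later checkpoint that recovery would start from without backup_label."""
    fields = parse_backup_label(label_text)
    backup_checkpoint = lsn_to_int(fields["CHECKPOINT LOCATION"].split()[0])
    return max(0, lsn_to_int(control_checkpoint) - backup_checkpoint)
```

Calling skipped_wal_bytes(SAMPLE_LABEL, SAMPLE_CONTROL_CHECKPOINT) gives the
approximate amount of WAL that would never be replayed in this made-up case.
Any non-zero result means changes made during the backup can be lost, which
is how errors like the missing TOAST chunk can arise.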

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com