Thread: A question about possible recovery inconsistency

A question about possible recovery inconsistency

From
Eugen Konkov
Date:
Hello.
We had PostgreSQL v11 and used PITR with our database.
But PITR behavior was changed in v13

>https://www.postgresql.org/docs/release/13.0/
>Generate an error if recovery does not reach the specified recovery target (Leif Gunnar Erlandsen, Peter Eisentraut)
>Previously, a standby would promote itself upon reaching the end of WAL, even if the target was not reached.

I tried to change our scripts to include the time where base backup is
started (also I tried the time when the base backup was finished).
But all my recoveries failed with: `XX000: recovery ended before
configured recovery target was reached`

So we decided to use `recovery_target_name` instead of
`recovery_target_time`. But this attempt also failed with the error
above.
Backup procedure looks like this:
1. Insert data
2. t3=$(date +"%Y-%m-%d %T")
3. psql -c "select pg_create_restore_point( '$t3' )"
4. wal-g backup-push $PGDATA. This tool lever low level API
https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP.

't3' value was used for `recoverty_target_name`.

Issue here is that `t3` name was not included into base backup and
following WAL files. Logs looks like this:

xargs -n1 '-I{}' docker exec test-userdb-restore psql -d userdb -U
user_api -c 'INSERT INTO test (num) VALUES ({});'
INSERT 0 1
docker exec test-userdb-restore psql -c 'select
pg_create_restore_point( '\''2023-10-10 14:48:46'\'' )'
 pg_create_restore_point
-------------------------
 0/5000640
(1 row)
wal-g wal-push pg_wal/000000010000000000000006.00000028.backup

/dbdata/postgres/pg_wal:
total 65540
-rw------- 1 postgres postgres 16777216 Oct 10 14:48 000000010000000000000005
-rw------- 1 postgres postgres 16777216 Oct 10 14:48 000000010000000000000006
-rw------- 1 postgres postgres      364 Oct 10 14:48
000000010000000000000006.00000028.backup
-rw------- 1 postgres postgres 16777216 Oct 10 14:48 000000010000000000000007
-rw------- 1 postgres postgres 16777216 Oct 10 14:48 000000010000000000000008
drwx------ 5 postgres postgres      160 Oct 10 14:48 archive_status
drwx------ 3 root     root           96 Oct 10 14:48 walg_data

/dbdata/postgres/pg_wal/archive_status:
total 0
-rw------- 1 postgres postgres 0 Oct 10 14:48 000000010000000000000005.done
-rw------- 1 root     root     0 Oct 10 14:48
000000010000000000000006.00000028.backup.done
-rw------- 1 root     root     0 Oct 10 14:48 000000010000000000000006.done

/dbdata/postgres/pg_wal/walg_data:
total 0
drwx------ 2 postgres postgres 64 Oct 10 14:48 walg_archive_status

/dbdata/postgres/pg_wal/walg_data/walg_archive_status:
total 0
+ docker exec test-userdb-restore bash -c 'cat $PGDATA/pg_wal/$(ls -R
$PGDATA/pg_wal | grep backup | head -n 1)'
START WAL LOCATION: 0/6000028 (file 000000010000000000000006)
STOP WAL LOCATION: 0/6000100 (file 000000010000000000000006)
CHECKPOINT LOCATION: 0/6000060
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2023-10-10 14:48:46 UTC
LABEL: 2023-10-10 14:48:46.482532 +0000 UTC m=+0.058620814
START TIMELINE: 1
STOP TIME: 2023-10-10 14:48:47 UTC
STOP TIMELINE: 1

When server was stopped couple of WAL files are created:

+ cd /dbdata/postgres/pg_wal
+ ls -R
.:
000000010000000000000006.00000028.backup
000000010000000000000007
000000010000000000000008
000000010000000000000009
00000001000000000000000A
archive_status
walg_data

./archive_status:
000000010000000000000006.00000028.backup.done

**Created `recovery.signal` and started server**
+ echo 'recovery_target_name= '\''2023-10-10 14:48:46'\'''
+ echo 'recovery_target_action= '\''promote'\'''
+ touch /dbdata/postgres/recovery.signal

2023-10-10 14:48:51.825 UTC [127] database system was interrupted;
last known up at 2023-10-10 14:48:46 UTC
2023-10-10 14:48:51.985 UTC [127] LOG:  starting point-in-time
recovery to "2023-10-10 14:48:46"
2023-10-10 14:48:52.065 UTC [127] LOG:  restored log file
"000000010000000000000006" from archive
2023-10-10 14:48:52.082 UTC [127] LOG:  redo starts at 0/6000028
2023-10-10 14:48:52.141 UTC [127] LOG:  consistent recovery state
reached at 0/6000100
2023-10-10 14:48:52.141 UTC [124] LOG:  database system is ready to
accept read-only connections
2023-10-10 14:48:52.188 UTC [127] LOG:  redo done at 0/6000100 system
usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.10 s
2023-10-10 14:48:52.188 UTC [127] FATAL:  recovery ended before
configured recovery target was reached

See full logs in `experiment1.txt` attachment.

Ok. Some name in the past is not included into the base backup which
was done just now.
This is expected, so I created the named restore point after base backup.

Now logs are:
+ docker exec test-userdb-restore psql -c 'select
pg_create_restore_point( '\''2023-10-10 15:07:37'\'' )'
 pg_create_restore_point
-------------------------
 0/7000090
(1 row)

+ cat 000000010000000000000006.00000028.backup
START WAL LOCATION: 0/6000028 (file 000000010000000000000006)
STOP WAL LOCATION: 0/6000100 (file 000000010000000000000006)
CHECKPOINT LOCATION: 0/6000060
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2023-10-10 15:07:37 UTC
LABEL: 2023-10-10 15:07:37.844015 +0000 UTC m=+0.057006433
START TIMELINE: 1
STOP TIME: 2023-10-10 15:07:38 UTC
STOP TIMELINE: 1

2023-10-10 15:07:43.451 UTC [128] LOG:  starting point-in-time
recovery to "2023-10-10 15:07:37"
2023-10-10 15:07:43.525 UTC [128] LOG:  restored log file
"000000010000000000000006" from archive
2023-10-10 15:07:43.540 UTC [128] LOG:  redo starts at 0/6000028
2023-10-10 15:07:43.677 UTC [128] LOG:  consistent recovery state
reached at 0/6000100
2023-10-10 15:07:43.677 UTC [128] LOG:  recovery stopping at restore
point "2023-10-10 15:07:37", time 2023-10-10 15:07:38.846288+00
2023-10-10 15:07:43.677 UTC [128] LOG:  redo done at 0/7000028 system
usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.13 s
2023-10-10 15:07:43.677 UTC [125] LOG:  database system is ready to
accept read-only connections

1. Why here (in experiment2.txt) redo done at 0/7000028 when recovery
target name "2023-10-10 15:07:37" is at 0/7000090?
I suppose 0/7000090 should be included, because
`recovery_target_inclusive=true` by default.

2. Is there any way to include a label into a base backup which I can
use as `recoverty_target_name`?
This is not clear from documentation
https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
Is 'label' the name for `recovery_target_name` /
`pg_create_restore_point` is called by `pg_backup_start`?

3. Is there any way to get the latest time from a base backup which is
reachable and could be used as the value for `recovery_target_time`?
As a workaround for XX000 error I inserted one additional record into
the database, so a new WAL file is generated. Then I can use the t3
value for `recovery_target_time`.
This only works when archive_command/restore_command was configured.
But without them it seems I can not use the `recovery_target_time`
option. Is this true?

Thank you.

-- 

Eugen Konkov

DevOps Engineer, Planitar Inc.

Attachment

Re: A question about possible recovery inconsistency

From
Laurenz Albe
Date:
On Tue, 2023-10-10 at 11:46 -0400, Eugen Konkov wrote:
> [wants to avoid
>  FATAL:  recovery ended before configured recovery target was reached
>  that is issued in v13 and later]
>
> 1. Why here (in experiment2.txt) redo done at 0/7000028 when recovery
> target name "2023-10-10 15:07:37" is at 0/7000090?
> I suppose 0/7000090 should be included, because
> `recovery_target_inclusive=true` by default.

Because there was no transaction at 0/7000090.

> 2. Is there any way to include a label into a base backup which I can
> use as `recoverty_target_name`?
> This is not clear from documentation
> https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
> Is 'label' the name for `recovery_target_name` /
> `pg_create_restore_point` is called by `pg_backup_start`?

No.

> 3. Is there any way to get the latest time from a base backup which is
> reachable and could be used as the value for `recovery_target_time`?
> As a workaround for XX000 error I inserted one additional record into
> the database, so a new WAL file is generated. Then I can use the t3
> value for `recovery_target_time`.
> This only works when archive_command/restore_command was configured.
> But without them it seems I can not use the `recovery_target_time`
> option. Is this true?

Perhaps you could use the time from the "backup" file in the WAL archive,
not sure.

But why do you want to do that, if all that you have to do is specify
"recovery_target = 'immediate'" to recover to the end of the backup?

Yours,
Laurenz Albe



Re: A question about possible recovery inconsistency

From
Eugen Konkov
Date:
>But why do you want to do that, if all that you have to do is specify
"recovery_target = 'immediate'" to recover to the end of the backup?

Because automation scripts do not know if transactions are available
after some point in time or not. But automation scripts know that
backup was completed successfully at that point.
For example:
We want to provide time to recover the database.
1. Base backup restored, wal files are applied successfully if there
is a transaction.
2. Base backup restored, wal files are not applied successfully, even
if we have the correct wal file after target time.
eg. this wal file was created with help: pg_create_restore_point / pg_switch_wal

It looks inconsistent, because we can restore save archive by name,
but we can not restore it by time, even if this time is less when the
named point was created.

As workaround we just insert a fake record into database, but this
looks very questionable: Why do we need to insert more records into
database after successful backup??



On Wed, Oct 11, 2023 at 3:45 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Tue, 2023-10-10 at 11:46 -0400, Eugen Konkov wrote:
> > [wants to avoid
> >  FATAL:  recovery ended before configured recovery target was reached
> >  that is issued in v13 and later]
> >
> > 1. Why here (in experiment2.txt) redo done at 0/7000028 when recovery
> > target name "2023-10-10 15:07:37" is at 0/7000090?
> > I suppose 0/7000090 should be included, because
> > `recovery_target_inclusive=true` by default.
>
> Because there was no transaction at 0/7000090.
>
> > 2. Is there any way to include a label into a base backup which I can
> > use as `recoverty_target_name`?
> > This is not clear from documentation
> > https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
> > Is 'label' the name for `recovery_target_name` /
> > `pg_create_restore_point` is called by `pg_backup_start`?
>
> No.
>
> > 3. Is there any way to get the latest time from a base backup which is
> > reachable and could be used as the value for `recovery_target_time`?
> > As a workaround for XX000 error I inserted one additional record into
> > the database, so a new WAL file is generated. Then I can use the t3
> > value for `recovery_target_time`.
> > This only works when archive_command/restore_command was configured.
> > But without them it seems I can not use the `recovery_target_time`
> > option. Is this true?
>
> Perhaps you could use the time from the "backup" file in the WAL archive,
> not sure.
>
> But why do you want to do that, if all that you have to do is specify
> "recovery_target = 'immediate'" to recover to the end of the backup?
>
> Yours,
> Laurenz Albe



--


Eugen Konkov

DevOps Engineer, Planitar Inc.

M. 416-276-1715

ekonkove@planitar.com | goiguide.com

560 Parkside Drive, Unit 401

Waterloo, ON, Canada N2L 5Z4



Re: A question about possible recovery inconsistency

From
Ron
Date:
On 10/11/23 09:52, Eugen Konkov wrote:
But why do you want to do that, if all that you have to do is specify
"recovery_target = 'immediate'" to recover to the end of the backup?

Because automation scripts do not know if transactions are available
after some point in time or not. But automation scripts know that
backup was completed successfully at that point.
For example:
We want to provide time to recover the database.
1. Base backup restored, wal files are applied successfully if there
is a transaction.

Doesn't "pg_basebackup --wal-method=stream" already do that?

Since the commands below automagically starts physical streaming, there must be a variation where all the wal data is applied but doesn't start replication.

pg_basebackup --dbname=service=basebackup -D $PGDATA --progress --checkpoint=fast -v \
        --write-recovery-conf --wal-method=stream --create-slot --slot=pgstandby1 --compress=server-zstd
pg_ctl start -w

Maybe just remove the two "slot" options:
pg_basebackup --dbname=service=basebackup -D $PGDATA --progress --checkpoint=fast -v \
        --write-recovery-conf --wal-method=stream --compress=server-zstd

--
Born in Arizona, moved to Babylonia.

Re: A question about possible recovery inconsistency

From
Ron
Date:
On 10/11/23 12:07, Ron wrote:
On 10/11/23 09:52, Eugen Konkov wrote:
But why do you want to do that, if all that you have to do is specify
"recovery_target = 'immediate'" to recover to the end of the backup?

Because automation scripts do not know if transactions are available
after some point in time or not. But automation scripts know that
backup was completed successfully at that point.
For example:
We want to provide time to recover the database.
1. Base backup restored, wal files are applied successfully if there
is a transaction.

Doesn't "pg_basebackup --wal-method=stream" already do that?

Since the commands below automagically starts physical streaming, there must be a variation where all the wal data is applied but doesn't start replication.

pg_basebackup --dbname=service=basebackup -D $PGDATA --progress --checkpoint=fast -v \
        --write-recovery-conf --wal-method=stream --create-slot --slot=pgstandby1 --compress=server-zstd
pg_ctl start -w

Maybe just remove the two "slot" options:
pg_basebackup --dbname=service=basebackup -D $PGDATA --progress --checkpoint=fast -v \
        --write-recovery-conf --wal-method=stream --compress=server-zstd


After that pg_basebackup, do this to get R/W copy of the database:
pg_ctl start
pg_ctl promote

--
Born in Arizona, moved to Babylonia.