A question about possible recovery inconsistency - Mailing list pgsql-general
From | Eugen Konkov |
---|---|
Subject | A question about possible recovery inconsistency |
Date | |
Msg-id | CAHKPvkZLN4ow=oFbzUFdGLBNtRdfXwfm97sQYKkYHwL2dJ8d_Q@mail.gmail.com Whole thread Raw |
Responses |
Re: A question about possible recovery inconsistency
|
List | pgsql-general |
Hello. We had PostgreSQL v11 and used PITR with our database. But PITR behavior was changed in v13 >https://www.postgresql.org/docs/release/13.0/ >Generate an error if recovery does not reach the specified recovery target (Leif Gunnar Erlandsen, Peter Eisentraut) >Previously, a standby would promote itself upon reaching the end of WAL, even if the target was not reached. I tried to change our scripts to include the time where base backup is started (also I tried the time when the base backup was finished). But all my recoveries failed with: `XX000: recovery ended before configured recovery target was reached` So we decided to use `recovery_target_name` instead of `recovery_target_time`. But this attempt also failed with the error above. Backup procedure looks like this: 1. Insert data 2. t3=$(date +"%Y-%m-%d %T") 3. psql -c "select pg_create_restore_point( '$t3' )" 4. wal-g backup-push $PGDATA. This tool lever low level API https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP. 't3' value was used for `recoverty_target_name`. Issue here is that `t3` name was not included into base backup and following WAL files. Logs looks like this: xargs -n1 '-I{}' docker exec test-userdb-restore psql -d userdb -U user_api -c 'INSERT INTO test (num) VALUES ({});' INSERT 0 1 docker exec test-userdb-restore psql -c 'select pg_create_restore_point( '\''2023-10-10 14:48:46'\'' )' pg_create_restore_point ------------------------- 0/5000640 (1 row) wal-g wal-push pg_wal/000000010000000000000006.00000028.backup /dbdata/postgres/pg_wal: total 65540 -rw------- 1 postgres postgres 16777216 Oct 10 14:48 000000010000000000000005 -rw------- 1 postgres postgres 16777216 Oct 10 14:48 000000010000000000000006 -rw------- 1 postgres postgres 364 Oct 10 14:48 000000010000000000000006.00000028.backup -rw------- 1 postgres postgres 16777216 Oct 10 14:48 000000010000000000000007 -rw------- 1 postgres postgres 16777216 Oct 10 14:48 000000010000000000000008 drwx------ 5 postgres postgres 160 Oct 10 14:48 archive_status drwx------ 3 root root 96 Oct 10 14:48 walg_data /dbdata/postgres/pg_wal/archive_status: total 0 -rw------- 1 postgres postgres 0 Oct 10 14:48 000000010000000000000005.done -rw------- 1 root root 0 Oct 10 14:48 000000010000000000000006.00000028.backup.done -rw------- 1 root root 0 Oct 10 14:48 000000010000000000000006.done /dbdata/postgres/pg_wal/walg_data: total 0 drwx------ 2 postgres postgres 64 Oct 10 14:48 walg_archive_status /dbdata/postgres/pg_wal/walg_data/walg_archive_status: total 0 + docker exec test-userdb-restore bash -c 'cat $PGDATA/pg_wal/$(ls -R $PGDATA/pg_wal | grep backup | head -n 1)' START WAL LOCATION: 0/6000028 (file 000000010000000000000006) STOP WAL LOCATION: 0/6000100 (file 000000010000000000000006) CHECKPOINT LOCATION: 0/6000060 BACKUP METHOD: streamed BACKUP FROM: primary START TIME: 2023-10-10 14:48:46 UTC LABEL: 2023-10-10 14:48:46.482532 +0000 UTC m=+0.058620814 START TIMELINE: 1 STOP TIME: 2023-10-10 14:48:47 UTC STOP TIMELINE: 1 When server was stopped couple of WAL files are created: + cd /dbdata/postgres/pg_wal + ls -R .: 000000010000000000000006.00000028.backup 000000010000000000000007 000000010000000000000008 000000010000000000000009 00000001000000000000000A archive_status walg_data ./archive_status: 000000010000000000000006.00000028.backup.done **Created `recovery.signal` and started server** + echo 'recovery_target_name= '\''2023-10-10 14:48:46'\''' + echo 'recovery_target_action= '\''promote'\''' + touch /dbdata/postgres/recovery.signal 2023-10-10 14:48:51.825 UTC [127] database system was interrupted; last known up at 2023-10-10 14:48:46 UTC 2023-10-10 14:48:51.985 UTC [127] LOG: starting point-in-time recovery to "2023-10-10 14:48:46" 2023-10-10 14:48:52.065 UTC [127] LOG: restored log file "000000010000000000000006" from archive 2023-10-10 14:48:52.082 UTC [127] LOG: redo starts at 0/6000028 2023-10-10 14:48:52.141 UTC [127] LOG: consistent recovery state reached at 0/6000100 2023-10-10 14:48:52.141 UTC [124] LOG: database system is ready to accept read-only connections 2023-10-10 14:48:52.188 UTC [127] LOG: redo done at 0/6000100 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.10 s 2023-10-10 14:48:52.188 UTC [127] FATAL: recovery ended before configured recovery target was reached See full logs in `experiment1.txt` attachment. Ok. Some name in the past is not included into the base backup which was done just now. This is expected, so I created the named restore point after base backup. Now logs are: + docker exec test-userdb-restore psql -c 'select pg_create_restore_point( '\''2023-10-10 15:07:37'\'' )' pg_create_restore_point ------------------------- 0/7000090 (1 row) + cat 000000010000000000000006.00000028.backup START WAL LOCATION: 0/6000028 (file 000000010000000000000006) STOP WAL LOCATION: 0/6000100 (file 000000010000000000000006) CHECKPOINT LOCATION: 0/6000060 BACKUP METHOD: streamed BACKUP FROM: primary START TIME: 2023-10-10 15:07:37 UTC LABEL: 2023-10-10 15:07:37.844015 +0000 UTC m=+0.057006433 START TIMELINE: 1 STOP TIME: 2023-10-10 15:07:38 UTC STOP TIMELINE: 1 2023-10-10 15:07:43.451 UTC [128] LOG: starting point-in-time recovery to "2023-10-10 15:07:37" 2023-10-10 15:07:43.525 UTC [128] LOG: restored log file "000000010000000000000006" from archive 2023-10-10 15:07:43.540 UTC [128] LOG: redo starts at 0/6000028 2023-10-10 15:07:43.677 UTC [128] LOG: consistent recovery state reached at 0/6000100 2023-10-10 15:07:43.677 UTC [128] LOG: recovery stopping at restore point "2023-10-10 15:07:37", time 2023-10-10 15:07:38.846288+00 2023-10-10 15:07:43.677 UTC [128] LOG: redo done at 0/7000028 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.13 s 2023-10-10 15:07:43.677 UTC [125] LOG: database system is ready to accept read-only connections 1. Why here (in experiment2.txt) redo done at 0/7000028 when recovery target name "2023-10-10 15:07:37" is at 0/7000090? I suppose 0/7000090 should be included, because `recovery_target_inclusive=true` by default. 2. Is there any way to include a label into a base backup which I can use as `recoverty_target_name`? This is not clear from documentation https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP Is 'label' the name for `recovery_target_name` / `pg_create_restore_point` is called by `pg_backup_start`? 3. Is there any way to get the latest time from a base backup which is reachable and could be used as the value for `recovery_target_time`? As a workaround for XX000 error I inserted one additional record into the database, so a new WAL file is generated. Then I can use the t3 value for `recovery_target_time`. This only works when archive_command/restore_command was configured. But without them it seems I can not use the `recovery_target_time` option. Is this true? Thank you. -- Eugen Konkov DevOps Engineer, Planitar Inc.
Attachment
pgsql-general by date: