Thread: [11.6] WALs recovery ordering with Restore Command - timelines

[11.6] WALs recovery ordering with Restore Command - timelines

From
Pavel Suderevsky
Date:
Hi,

PostgreSQL 11.6
Centos 7.6
pgBackrest 2.19


Case:
Master is on timeline 15 while Standby that should be synchronized with Master as a hot standby is on timeline 13. WALs to be obtained from archive.

recovery.conf:
>primary_conninfo = 'user=usename host=10.10.10.3 port=5432 sslmode=prefer application_name=pg2'
>recovery_target_timeline = 'latest'
>restore_command = '/usr/bin/pgbackrest --stanza=platform archive-get %f "%p" --log-level-console info'
>standby_mode = 'on'

Standby recovery is done like that:
1. try restore 0000000F00000A27000000E5
2. unable to find 0000000F00000A27000000E5 in the archive
3. try restore 0000000E00000A27000000E5
4. found 0000000E00000A27000000E5 in the archive
-- trying next WAL for BOTH timelines
1. try restore 0000000F00000A27000000E6
2. unable to find 0000000F00000A27000000E6 in the archive
3. try restore 0000000E00000A27000000E6
4. found 0000000E00000A27000000E6 in the archive

Why does Postgres restore WALs not in a timelines order? First 13, then 14, then 15. Up to timeline switch position. WALs it try to restore for the latest timeline are deliberately not yet exist. It leads to terrible recovery performance because of long "unable to find" operations.

Re: [11.6] WALs recovery ordering with Restore Command - timelines

From
Pavel Suderevsky
Date:
One more log example:
2019-12-06 07:11:16 CST  LOG:  database system was shut down in recovery at 2019-12-06 07:11:08 CST
Searching WAL: 00000022.history, location: pg_wal/RECOVERYHISTORY
2019-12-06 07:11:16 CST  LOG:  restored log file "00000022.history" from archive
Searching WAL: 00000023.history, location: pg_wal/RECOVERYHISTORY
2019-12-06 07:11:16 CST  LOG:  entering standby mode
Searching WAL: 00000022.history, location: pg_wal/RECOVERYHISTORY
2019-12-06 07:11:16 CST  LOG:  restored log file "00000022.history" from archive
Searching WAL: 00000022000018C60000003F, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018C60000003F, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:20 CST  LOG:  restored log file "00000021000018C60000003F" from archive
Searching WAL: 00000021.history, location: pg_wal/RECOVERYHISTORY
2019-12-06 07:11:20 CST  LOG:  restored log file "00000021.history" from archive
Searching WAL: 00000022000018BF0000001B, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001B, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:27 CST  LOG:  restored log file "00000021000018BF0000001B" from archive
2019-12-06 07:11:27 CST  LOG:  redo starts at 18BF/1B311260
Searching WAL: 00000022000018BF0000001C, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001C, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:34 CST  LOG:  restored log file "00000021000018BF0000001C" from archive
Searching WAL: 00000022000018BF0000001D, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001D, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:40 CST  LOG:  restored log file "00000021000018BF0000001D" from archive
Searching WAL: 00000022000018BF0000001E, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001E, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:46 CST  LOG:  restored log file "00000021000018BF0000001E" from archive
Searching WAL: 00000022000018BF0000001F, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001F, location: pg_wal/RECOVERYXLOG
2019-12-06 07:11:53 CST  LOG:  restored log file "00000021000018BF0000001F" from archive
Command: 
restore_command = 'echo -e "Searching WAL: %f, location: %p"; /usr/bin/pgbackrest --stanza=platform archive-get %f "%p"' 

As you can see Postgres tries restoring 00000022* WALs before timeline switch LSN is reached while restoring 00000021*.
 
Should I resend this issue to a pgsql-bugs?

пн, 25 нояб. 2019 г. в 23:26, Pavel Suderevsky <psuderevsky@gmail.com>:
Hi,

PostgreSQL 11.6
Centos 7.6
pgBackrest 2.19


Case:
Master is on timeline 15 while Standby that should be synchronized with Master as a hot standby is on timeline 13. WALs to be obtained from archive.

recovery.conf:
>primary_conninfo = 'user=usename host=10.10.10.3 port=5432 sslmode=prefer application_name=pg2'
>recovery_target_timeline = 'latest'
>restore_command = '/usr/bin/pgbackrest --stanza=platform archive-get %f "%p" --log-level-console info'
>standby_mode = 'on'

Standby recovery is done like that:
1. try restore 0000000F00000A27000000E5
2. unable to find 0000000F00000A27000000E5 in the archive
3. try restore 0000000E00000A27000000E5
4. found 0000000E00000A27000000E5 in the archive
-- trying next WAL for BOTH timelines
1. try restore 0000000F00000A27000000E6
2. unable to find 0000000F00000A27000000E6 in the archive
3. try restore 0000000E00000A27000000E6
4. found 0000000E00000A27000000E6 in the archive

Why does Postgres restore WALs not in a timelines order? First 13, then 14, then 15. Up to timeline switch position. WALs it try to restore for the latest timeline are deliberately not yet exist. It leads to terrible recovery performance because of long "unable to find" operations.