PostgreSQL Timeline Issue After Switchover with Pacemaker - Mailing list pgsql-admin
From | |
---|---|
Subject | PostgreSQL Timeline Issue After Switchover with Pacemaker |
Date | |
Msg-id | 05b001db7374$73a9ccb0$5afd6610$@japannext.co.jp |
List | pgsql-admin |
Hello,
We have multiple two-node primary/standby PostgreSQL clusters managed by Pacemaker.
Yesterday, we performed an OS upgrade following these steps:
- Put the standby node into standby mode using:
pcs node standby [node_name]
- The Unix team upgraded the OS.
- Upgraded packages (pgBackRest, Pacemaker, pcs, Corosync).
- Took the node out of standby mode using:
pcs node unstandby [node_name]
- Repeated the process for the primary node.
However, in one cluster, the PostgreSQL instance did not start when the previous primary became the new standby.
Logs from the failed node:
2025-01-30 15:53:57.106 JST,,,12779,,679b2205.31eb,1,,2025-01-30 15:53:57 JST,,0,FATAL,55000,"highest timeline 7 of the primary is behind recovery timeline 8",,,,,,,,,"","walreceiver",,0
2025-01-30 15:53:57.106 JST,,,9197,,679b212d.23ed,49,,2025-01-30 15:50:21 JST,1/0,0,LOG,00000,"waiting for WAL to become available at 88FE/890000B8",,,,,,,,,"","startup",,0
2025-01-30 15:54:02.109 JST,,,12782,,679b220a.31ee,1,,2025-01-30 15:54:02 JST,,0,FATAL,55000,"highest timeline 7 of the primary is behind recovery timeline 8",,,,,,,,,"","walreceiver",,0
2025-01-30 15:54:02.109 JST,,,9197,,679b212d.23ed,50,,2025-01-30 15:50:21 JST,1/0,0,LOG,00000,"waiting for WAL to become available at 88FE/890000B8",,,,,,,,,"","startup",,0
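For reference, the LSN in the "waiting for WAL" message can be mapped to the WAL segment file the startup process is waiting for. The following is a minimal sketch, assuming the default wal_segment_size of 16 MiB (this cluster's actual setting is not shown above); the helper names parse_lsn and wal_segment_name are made up for illustration.

```python
# Sketch: map an LSN such as "88FE/890000B8" to the WAL segment file name
# that contains it. Assumes the default 16 MiB segment size; adjust
# WAL_SEGMENT_SIZE if the cluster was initialized with another value.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size


def parse_lsn(text):
    """Convert 'HIGH/LOW' (two hex halves) into a single 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(timeline, lsn, segment_size=WAL_SEGMENT_SIZE):
    """Return the 24-character WAL file name holding `lsn` on `timeline`."""
    segments_per_xlogid = 0x100000000 // segment_size
    segment_number = parse_lsn(lsn) // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlogid,
        segment_number % segments_per_xlogid,
    )


if __name__ == "__main__":
    # The same LSN lives in differently named files on timelines 7 and 8.
    print(wal_segment_name(7, "88FE/890000B8"))  # 00000007000088FE00000089
    print(wal_segment_name(8, "88FE/890000B8"))  # 00000008000088FE00000089
```

The first eight hex digits of the file name are the timeline ID, so a standby recovering on timeline 8 cannot be fed this segment by a primary whose highest timeline is 7.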
Steps Taken to Fix It:
I took an incremental backup from the primary database using pgBackRest and performed a delta restore:
pgbackrest --stanza=xxxx --delta --type=standby --log-level-console=detail restore
However, after the restore, PostgreSQL still would not start, and the logs showed the following errors:
2025-01-30 16:16:30.917 JST,,,33337,,679b274e.8239,1,,2025-01-30 16:16:30 JST,,0,LOG,00000,"database system was interrupted; last known up at 2025-01-30 16:11:27 JST",,,,,,,,,"","startup",,0
2025-01-30 16:16:31.087 JST,,,33337,,679b274e.8239,2,,2025-01-30 16:16:30 JST,,0,LOG,00000,"restored log file ""00000008.history"" from archive",,,,,,,,,"","startup",,0
2025-01-30 16:16:31.094 JST,,,33337,,679b274e.8239,3,,2025-01-30 16:16:30 JST,,0,LOG,00000,"entering standby mode",,,,,,,,,"","startup",,0
2025-01-30 16:16:31.094 JST,,,33337,,679b274e.8239,4,,2025-01-30 16:16:30 JST,,0,LOG,00000,"starting backup recovery with redo LSN 88FE/8B000028, checkpoint LSN 88FE/8B000098, on timeline ID 7",,,,,,,,,"","startup",,0
2025-01-30 16:16:31.102 JST,,,33337,,679b274e.8239,5,,2025-01-30 16:16:30 JST,,0,LOG,00000,"restored log file ""00000008.history"" from archive",,,,,,,,,"","startup",,0
2025-01-30 16:16:31.146 JST,,,33337,,679b274e.8239,6,,2025-01-30 16:16:30 JST,,0,LOG,00000,"restored log file ""00000007000088FE0000008B"" from archive",,,,,,,,,"","startup",,0
2025-01-30 16:16:31.155 JST,,,33337,,679b274e.8239,7,,2025-01-30 16:16:30 JST,,0,FATAL,XX000,"requested timeline 8 is not a child of this server's history","Latest checkpoint is at 88FE/8B000098 on timeline 7, but in the history of the requested timeline, the server forked off from that timeline at 88FE/880000A0.",,,,,,,,"","startup",,0
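As I understand it, the FATAL message comes down to an LSN comparison: the restored backup's checkpoint on timeline 7 (checkpoint LSN 88FE/8B000098 in the "starting backup recovery" line) is later than the point where timeline 8 forked off timeline 7 (88FE/880000A0). The sketch below is a simplified illustration of that comparison, not PostgreSQL's actual code. The sample history line contains only the timeline 7 entry, and its reason text is an assumption, since the real contents of 00000008.history are not shown here.

```python
# Sketch: check whether a backup's checkpoint can be followed onto a newer
# timeline, given the contents of that timeline's .history file.
# Each history line has the form: <parent timeline> <switch LSN> <reason>.


def parse_lsn(text):
    """Convert 'HIGH/LOW' (two hex halves) into a single 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def parse_history(content):
    """Return a list of (parent_timeline, switch_lsn) pairs from history text."""
    entries = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 2)
        entries.append((int(fields[0]), parse_lsn(fields[1])))
    return entries


def checkpoint_fits_history(checkpoint_timeline, checkpoint_lsn, history):
    """True if the checkpoint lies at or before the fork point of its timeline."""
    for parent_timeline, switch_lsn in history:
        if parent_timeline == checkpoint_timeline:
            return parse_lsn(checkpoint_lsn) <= switch_lsn
    return False  # the checkpoint's timeline is not an ancestor at all


if __name__ == "__main__":
    # Assumed excerpt of 00000008.history: the fork LSN is taken from the
    # FATAL detail; the reason text is illustrative only.
    sample_history = "7\t88FE/880000A0\tno recovery target specified\n"
    history = parse_history(sample_history)
    # Checkpoint of the restored backup, from the log above.
    print(checkpoint_fits_history(7, "88FE/8B000098", history))  # False
```

A result of False corresponds to the "not a child of this server's history" error: the backup was taken on timeline 7 after the LSN at which timeline 8 branched off.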
After many attempts (including a full backup followed by a full restore, which failed with the same error), I was able to bring up the node using:
pgbackrest --stanza=xxxx --delta --type=standby --log-level-console=detail restore --target-timeline=current
My Questions:
- Why did this happen? Could you explain or provide keywords/links that I can look into?
- Why, after a full backup restore, did the standby still look for the wrong timeline? Is there a file or setting that records the timeline information?
Thank you,
Dean