On 2022-Feb-22, Imseih (AWS), Sami wrote:
> On 13.5 a wal flush PANIC is encountered after a standby is promoted.
>
> With debugging, it was found that when a standby skips a missing
> continuation record on recovery, the missingContrecPtr is not
> invalidated after the record is skipped. Therefore, when the standby
> is promoted to a primary it writes an overwrite_contrecord with an LSN
> of the missingContrecPtr, which is now in the past. On flush time,
> this causes a PANIC. From what I can see, this failure scenario can
> only occur after a standby is promoted.
Ooh, nice find and diagnosis. I can confirm that the test fails as you
described without the code fix, and doesn't fail with it.
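For anyone following along, the failure mode described above can be
restated as a toy model. This is only a sketch, not the xlog.c code: apart
from missingContrecPtr, every name here is hypothetical, and it assumes
(as the report implies) that the fix amounts to invalidating the pointer
once the missing continuation record has been skipped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for an LSN and its "invalid" value (assumed, simplified). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical recovery state; only missingContrecPtr comes from the report. */
typedef struct ToyRecovery
{
    XLogRecPtr missingContrecPtr;   /* LSN of the missing continuation record */
    XLogRecPtr endOfLog;            /* how far replay has progressed */
} ToyRecovery;

/*
 * The standby skips the missing continuation record and keeps replaying.
 * With the fix, the remembered pointer is invalidated at this point.
 */
static void
toy_skip_missing_contrecord(ToyRecovery *r, XLogRecPtr resume_at, bool with_fix)
{
    r->endOfLog = resume_at;
    if (with_fix)
        r->missingContrecPtr = InvalidXLogRecPtr;
}

/*
 * At promotion: would an overwrite record be written at an LSN that is
 * already in the past? That is the condition reported to PANIC at flush.
 */
static bool
toy_promotion_writes_stale_record(const ToyRecovery *r)
{
    if (r->missingContrecPtr == InvalidXLogRecPtr)
        return false;
    return r->missingContrecPtr < r->endOfLog;
}

/* Run the whole scenario: hit the gap, skip it, replay further, promote. */
static bool
toy_scenario(bool with_fix)
{
    ToyRecovery r = { .missingContrecPtr = 1000, .endOfLog = 1000 };

    toy_skip_missing_contrecord(&r, 5000, with_fix);
    return toy_promotion_writes_stale_record(&r);
}
```

In this model toy_scenario(false) reports a stale write at promotion and
toy_scenario(true) does not, which mirrors the test's behavior without and
with the fix.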
I attach the same patch, with the test file put in its final place
rather than as a patch. Due to recent xlog.c changes this needs a bit of
work to apply to back branches; I'll see about getting it in all
branches soon.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php