Re: PANIC during crash recovery of a recently promoted standby - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: PANIC during crash recovery of a recently promoted standby
Msg-id 20180524075707.GE15445@paquier.xyz
In response to Re: PANIC during crash recovery of a recently promoted standby  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: PANIC during crash recovery of a recently promoted standby
List pgsql-hackers
On Mon, May 14, 2018 at 01:14:22PM +0530, Pavan Deolasee wrote:
> Looks like I didn't understand Alvaro's comment when he mentioned it to me
> off-list. But I now see what Michael and Alvaro mean and that indeed seems
> like a problem. I was thinking that the test for (ControlFile->state ==
> DB_IN_ARCHIVE_RECOVERY) will ensure that minRecoveryPoint can't be updated
> after the standby is promoted. While that's true for a DB_IN_PRODUCTION,  the
> RestartPoint may finish after we have written end-of-recovery record, but
> before we're in production and thus the minRecoveryPoint may again be set.

Yeah, this is something I considered at first as well, but I was not
confident enough that setting minRecoveryPoint to InvalidXLogRecPtr
was actually safe for timeline switches.

So I have spent a good portion of today testing and playing with it to
be confident enough that this was right, and I have ended up with the
attached.  The patch adds a new flag to XLogCtl which marks whether the
control file has been updated after the end-of-recovery record has been
written, so that minRecoveryPoint does not get updated by a restart
point running in parallel.
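
To illustrate the idea only, here is a minimal standalone sketch in C of
how such a flag can keep a concurrent restart point from moving
minRecoveryPoint again.  This is a toy model, not the patch: the
structures are simplified stand-ins for the real ones, and the flag name
endOfRecoveryWritten as well as both function names are hypothetical.
It also assumes that minRecoveryPoint gets reset to InvalidXLogRecPtr
when the end-of-recovery record is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL types (assumptions). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef enum { DB_IN_ARCHIVE_RECOVERY, DB_IN_PRODUCTION } DBState;

typedef struct
{
    DBState     state;
    XLogRecPtr  minRecoveryPoint;
} ControlFileModel;

typedef struct
{
    /* Hypothetical flag: set once the end-of-recovery record is written. */
    bool        endOfRecoveryWritten;
} XLogCtlModel;

ControlFileModel ControlFile = { DB_IN_ARCHIVE_RECOVERY, 100 };
XLogCtlModel XLogCtl = { false };

/*
 * Model of the startup process writing the end-of-recovery record.
 * The state is deliberately left at DB_IN_ARCHIVE_RECOVERY to model
 * the window before the switch to DB_IN_PRODUCTION.
 */
void
write_end_of_recovery(void)
{
    assert(ControlFile.state == DB_IN_ARCHIVE_RECOVERY);
    ControlFile.minRecoveryPoint = InvalidXLogRecPtr;
    XLogCtl.endOfRecoveryWritten = true;
}

/*
 * Model of a restart point trying to advance minRecoveryPoint.
 * Returns true if the control file was changed.
 */
bool
restartpoint_update_min_recovery_point(XLogRecPtr lsn)
{
    /* The existing check: nothing to do once in production. */
    if (ControlFile.state != DB_IN_ARCHIVE_RECOVERY)
        return false;

    /* The extra check: recovery has logically ended already. */
    if (XLogCtl.endOfRecoveryWritten)
        return false;

    if (ControlFile.minRecoveryPoint < lsn)
    {
        ControlFile.minRecoveryPoint = lsn;
        return true;
    }
    return false;
}
```

In this model, a restart point that finishes before
write_end_of_recovery() still advances minRecoveryPoint, while one that
finishes afterwards leaves it at InvalidXLogRecPtr even though the state
is still DB_IN_ARCHIVE_RECOVERY.  Locking is left out entirely, so this
says nothing about how the real code has to synchronize the two
processes.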

I have also reworked the test case you sent, removing the manual sleeps
and replacing them with proper wait points.  There is also no point in
waiting after promotion, as pg_ctl promote implies a wait.  Another
important thing is that you need wal_log_hints = off to see the
crash, whereas allows_streaming actually switches that parameter on.

Comments are welcome.
--
Michael

