Re: [BUG] Panic due to incorrect missingContrecPtr after promotion - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Date
Msg-id YrAeJw+DjSFkrgG1@paquier.xyz
In response to Re: [BUG] Panic due to incorrect missingContrecPtr after promotion  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Responses Re: [BUG] Panic due to incorrect missingContrecPtr after promotion  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Fri, May 27, 2022 at 07:01:37PM +0000, Imseih (AWS), Sami wrote:
> What we found:
>
> 1. missingContrecPtr is set when
>    StandbyMode is false, therefore
>    only a writer should set this value
>    and a record is then sent downstream.
>
>    But a standby going through crash
>    recovery will always have StandbyMode = false,
>    causing the missingContrecPtr to be incorrectly
>    set.

That stands as true as far as I know.  StandbyMode would be switched
only once we get out of crash recovery, and only if archive recovery
completes when there is a restore_command.

> 2. If StandbyModeRequested is checked instead,
>      we ensure that a standby will not set a
>      missingContrecPtr.
>
> 3. After applying the patch below, the tap test succeeded

Hmm.  I have not looked at that in depth, but if the intention is to
check that the database is able to write WAL, looking at
XLogCtl->SharedRecoveryState would be the way to go because that's the
flag that switches between crash recovery, archive recovery and the end
of recovery (when WAL can be safely written).
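
To make the distinction concrete, here is a minimal standalone sketch,
not code from the tree.  The enum mirrors RecoveryState from
access/xlog.h; the RecoverySnapshot struct and the two helpers
(wal_writes_allowed, standby_in_crash_recovery) are hypothetical names
used only for illustration, and the flag values for a standby in crash
recovery follow the scenario described upthread rather than anything
verified against a running server.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the RecoveryState enum declared in access/xlog.h. */
typedef enum RecoveryState
{
	RECOVERY_STATE_CRASH = 0,	/* crash recovery */
	RECOVERY_STATE_ARCHIVE,		/* archive recovery */
	RECOVERY_STATE_DONE			/* recovery finished, WAL can be written */
} RecoveryState;

/* Hypothetical snapshot of the three flags discussed in this thread. */
typedef struct RecoverySnapshot
{
	bool		standby_mode;			/* models StandbyMode */
	bool		standby_mode_requested; /* models StandbyModeRequested */
	RecoveryState shared_state;			/* models XLogCtl->SharedRecoveryState */
} RecoverySnapshot;

/* Hypothetical helper: WAL is writable only once recovery has ended. */
static bool
wal_writes_allowed(RecoveryState state)
{
	switch (state)
	{
		case RECOVERY_STATE_CRASH:
		case RECOVERY_STATE_ARCHIVE:
			return false;
		case RECOVERY_STATE_DONE:
			return true;
	}
	assert(!"unknown RecoveryState");
	return false;
}

/*
 * Assumed scenario from upthread: a standby still in crash recovery.
 * StandbyMode is false even though standby mode was requested, so
 * StandbyMode alone cannot tell this node apart from a writer.
 */
static RecoverySnapshot
standby_in_crash_recovery(void)
{
	RecoverySnapshot s;

	s.standby_mode = false;
	s.standby_mode_requested = true;
	s.shared_state = RECOVERY_STATE_CRASH;
	return s;
}
```

In this model, !standby_mode holds for the standby in crash recovery,
while wal_writes_allowed(shared_state) is false for it, which is the
property a "can this node write WAL" check would actually want.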

The check in xlogrecovery_redo() still looks like a good thing to have
anyway, because we know that we can safely skip the contrecord.  Now,
for any patch produced, could the existing TAP test be extended so
that we are able to get a PANIC even if we keep around the sanity check
in xlogrecovery_redo()?  That would likely involve an immediate shutdown
of a standby followed by a start sequence?
--
Michael

