Re: [BUG] Panic due to incorrect missingContrecPtr after promotion - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Date	June 27, 2022 09:02:11
Msg-id	YrlH47KV/VsKVLE9@paquier.xyz
In response to	Re: [BUG] Panic due to incorrect missingContrecPtr after promotion ("Imseih (AWS), Sami" <simseih@amazon.com>)
Responses	Re: [BUG] Panic due to incorrect missingContrecPtr after promotion Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
List	pgsql-hackers

On Fri, Jun 24, 2022 at 04:17:34PM +0000, Imseih (AWS), Sami wrote:
> It has been difficult to get a generic repro, but the way we reproduce
> is through our test suite. To give more details, we are running tests
> in which we constantly failover and promote standbys. The issue
> surfaces after we have gone through a few promotions which occur
> every few hours or so (not really important but to give context).

Hmm.  Could you describe exactly the failover scenario you are using?
Is the test using a set of cascading standbys linked to the promoted
one?  Are the standbys recycled from the promoted nodes with pg_rewind
or created from scratch with a new base backup taken from the
freshly-promoted primary?  I have been looking more at this thread
through the day but I don't see a remaining issue.  It could be
perfectly possible that we are missing a piece related to the handling
of those new overwrite contrecords in some cases, like in a rewind.

> I am adding some additional debugging to see if I can draw a better
> picture of what is happening. Will also give aborted_contrec_reset_3.patch
> a go, although I suspect it will not handle the specific case we are dealing with.

Yeah, this is not going to change things much if you are still seeing
an issue.  This patch does not change the logic, aka it just
simplifies the tracking of the continuation record data, resetting it
when a complete record has been read.  Saying that, getting rid of the
dependency on StandbyMode because we cannot promote in the middle of a
record is nice (my memories around that were a bit blurry but even
recovery_target_lsn would not recover in the middle of a continuation
record), and this is not a bug so there is limited reason to backpatch
this part of the change.
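
For illustration only, here is a rough sketch of the "reset when a
complete record has been read" idea.  This is not the patch itself:
the struct, the function name track_read() and the abortedRecPtr field
are assumptions made for the example, and XLogRecPtr is reduced to a
plain 64-bit integer.

```c
#include <stdint.h>

/* Stand-ins for the server's types, simplified for this sketch. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical holder for the continuation-record tracking data. */
typedef struct ContrecTracker
{
    XLogRecPtr abortedRecPtr;     /* start of the record that was cut short */
    XLogRecPtr missingContrecPtr; /* where its continuation was expected */
} ContrecTracker;

/* Hypothetical outcome of one attempt to read a record. */
typedef struct ReadResult
{
    int        complete;          /* nonzero if a full record was read */
    XLogRecPtr abortedRecPtr;     /* set only when the read hit a broken record */
    XLogRecPtr missingContrecPtr;
} ReadResult;

/*
 * Update the tracker after a read attempt.  A complete record always
 * clears the tracking data, with no dependency on any standby state.
 * A failed read that reports an aborted record remembers its pointers.
 */
static void
track_read(ContrecTracker *t, const ReadResult *r)
{
    if (r->complete)
    {
        t->abortedRecPtr = InvalidXLogRecPtr;
        t->missingContrecPtr = InvalidXLogRecPtr;
    }
    else if (r->abortedRecPtr != InvalidXLogRecPtr)
    {
        t->abortedRecPtr = r->abortedRecPtr;
        t->missingContrecPtr = r->missingContrecPtr;
    }
}
```

With this shape, stale pointers from an earlier broken record cannot
survive past the next record that is read successfully, which is the
only behavior the sketch is meant to show.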
--
Michael

