Re: Unstable tests for recovery conflict handling - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Unstable tests for recovery conflict handling
Date
Msg-id 20220726200354.hunrcu6zfjakxfnk@alap3.anarazel.de
In response to Re: Unstable tests for recovery conflict handling  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Unstable tests for recovery conflict handling
List pgsql-hackers
Hi,

On 2022-07-26 14:30:30 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-07-26 13:57:53 -0400, Tom Lane wrote:
> >> So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
> >> the startup process hasn't detected the buffer conflict in the first
> >> place.
> 
> > I wonder if this, at least partially, could be due to the elog thing
> > I was complaining about nearby. I.e. we decide to FATAL as part of a
> > recovery conflict interrupt, and then during that ERROR out as part of
> > another recovery conflict interrupt (because nothing holds interrupts as
> > part of FATAL).
> 
> There are all sorts of things one could imagine going wrong in the
> backend receiving the recovery conflict interrupt, but AFAICS in these
> failures, the startup process hasn't sent a recovery conflict interrupt.
> It certainly hasn't logged anything suggesting it noticed a conflict.

I don't think we reliably emit a log message before the recovery
conflict is resolved.

I've wondered a couple of times now about making TAP test timeouts somehow
trigger a core dump of all processes. Certainly would make it easier to
debug some of these kinds of issues.

Greetings,

Andres Freund


