Hi,

On 2022-07-26 14:30:30 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-07-26 13:57:53 -0400, Tom Lane wrote:
> >> So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
> >> the startup process hasn't detected the buffer conflict in the first
> >> place.
>
> > I wonder if this, at least partially, could be due to the elog thing
> > I was complaining about nearby. I.e. we decide to FATAL as part of a
> > recovery conflict interrupt, and then during that ERROR out as part of
> > another recovery conflict interrupt (because nothing holds interrupts as
> > part of FATAL).
>
> There are all sorts of things one could imagine going wrong in the
> backend receiving the recovery conflict interrupt, but AFAICS in these
> failures, the startup process hasn't sent a recovery conflict interrupt.
> It certainly hasn't logged anything suggesting it noticed a conflict.

I don't think we reliably emit a log message before the recovery
conflict is resolved.

I've wondered a couple of times now about making TAP test timeouts somehow
trigger a core dump of all processes. Certainly would make it easier to
debug some of these kinds of issues.

Greetings,
Andres Freund