Re: Strange failure on mamba - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Strange failure on mamba
Msg-id 2055732.1668725270@sss.pgh.pa.us
In response to Re: Strange failure on mamba  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Thomas Munro <thomas.munro@gmail.com> writes:
> On Fri, Nov 18, 2022 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> mamba has been showing intermittent failures in various replication
>> tests since day one.

> I wonder if it's a runtime variant of the other problem.  We do
> load_file("libpqwalreceiver", false) before unblocking signals but
> maybe don't resolve the symbols until calling them, or something like
> that...

Yeah, that or some other NetBSD bug could be the explanation, too.
Without a stack trace it's hard to have any confidence about it,
but I've been unable to reproduce the problem outside the buildfarm.
(Which is a familiar refrain.  I wonder what it is about the buildfarm
environment that makes it act different from the exact same code running
on the exact same machine.)

So I'd like to have some way to make the postmaster send SIGABRT instead
of SIGKILL in the buildfarm environment.  The lowest-tech way would be
to drive that off some #define or other.  We could scale it up to a GUC
perhaps.  Adjacent to that, I also wonder whether SIGABRT wouldn't be
more useful than SIGSTOP for the existing SendStop half-a-feature ---
the idea that people should collect cores manually seems mighty
last-century.
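
To make the "#define" idea concrete, here is a minimal standalone sketch, not postmaster code. The macro QUICKDIE_USE_SIGABRT and both function names are hypothetical, invented for illustration; nothing here is an existing PostgreSQL symbol. It only shows a parent picking SIGABRT or SIGKILL at compile time for a child that won't exit, then checking which signal actually terminated it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical compile-time switch; not an existing PostgreSQL symbol. */
#ifndef QUICKDIE_USE_SIGABRT
#define QUICKDIE_USE_SIGABRT 1
#endif

/* Pick the signal to send to a child that ignored the polite request. */
static int
stuck_child_signal(void)
{
#if QUICKDIE_USE_SIGABRT
	return SIGABRT;		/* default action terminates and may dump core */
#else
	return SIGKILL;		/* terminates with no core file */
#endif
}

/*
 * Fork a child that blocks forever, send it signo, and return the
 * signal that terminated it, or -1 on any error.
 */
static int
kill_stuck_child(int signo)
{
	pid_t		pid = fork();
	int			status;

	if (pid < 0)
		return -1;
	if (pid == 0)
	{
		/* Child: simulate a process that never exits on its own. */
		for (;;)
			pause();
	}
	if (kill(pid, signo) != 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling kill_stuck_child(stuck_child_signal()) from a test driver would exercise the selected signal. Whether a core file actually appears still depends on the core size limit (ulimit -c) and the platform's core-file settings, which is presumably part of what a buildfarm setup would need to arrange.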

            regards, tom lane


