Re: What is happening on buildfarm member crake? - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: What is happening on buildfarm member crake?
Date	January 25, 2014 22:04:49
Msg-id	10013.1390687480@sss.pgh.pa.us
In response to	Re: What is happening on buildfarm member crake? (Andrew Dunstan <andrew@dunslane.net>)
Responses	Re: What is happening on buildfarm member crake? Re: What is happening on buildfarm member crake?
List	pgsql-hackers

Andrew Dunstan <andrew@dunslane.net> writes:
> On 01/19/2014 08:22 PM, Robert Haas wrote:
>> Hmm, that looks an awful lot like the SIGUSR1 signal handler is
>> getting called after we've already completed shmem_exit.  And indeed
>> that seems like the sort of thing that would result in dying horribly
>> in just this way.  The obvious fix seems to be to check
>> proc_exit_inprogress before doing anything that might touch shared
>> memory, but there are a lot of other SIGUSR1 handlers that don't do
>> that either.  However, in those cases, the likely cause of a SIGUSR1
>> would be a sinval catchup interrupt or a recovery conflict, which
>> aren't likely to be so far delayed that they arrive after we've
>> already disconnected from shared memory.  But the dynamic background
>> workers stuff adds a new possible cause of SIGUSR1: the postmaster
>> letting us know that a child has started or died.  And that could
>> happen even after we've detached shared memory.

> Is anything happening about this? We're still getting quite a few of 
> these: 
> <http://www.pgbuildfarm.org/cgi-bin/show_failures.pl?max_days=3&member=crake>

Yeah.  If Robert's diagnosis is correct, and it sounds pretty plausible,
then this is really just one instance of a bug that's probably pretty
widespread in our signal handlers.  Somebody needs to go through 'em
all and look for touches of shared memory.

I'm not sure if we can just disable signal response the moment the
proc_exit_inprogress flag goes up, though.  In some cases such as lock
handling, it's likely that we need that functionality to keep working
for some part of the shutdown process.  We might end up having to disable
individual signal handlers at appropriate places.

Ick.
        regards, tom lane
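
To illustrate the per-handler approach discussed in the message above, here is a minimal standalone sketch in C. It is not code from the thread or from the PostgreSQL source tree: `exit_in_progress` is a hypothetical stand-in for `proc_exit_inprogress`, and `shared_counter` is a hypothetical stand-in for a pointer into shared memory that becomes invalid once the process detaches. The handler checks the flag before touching the shared state, and the exit path raises the flag before detaching.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's proc_exit_inprogress flag. */
static volatile sig_atomic_t exit_in_progress = 0;

/* Hypothetical stand-in for shared memory: backing storage plus a
 * pointer that is NULL whenever the process is "detached". */
static volatile sig_atomic_t shared_storage = 0;
static volatile sig_atomic_t *volatile shared_counter = NULL;

/* Counts signals that arrived after shutdown began (demo only). */
static volatile sig_atomic_t ignored_signals = 0;

/* SIGUSR1 handler: refuse to touch shared state once exit has begun. */
static void
sigusr1_handler(int signo)
{
    (void) signo;

    if (exit_in_progress)
    {
        /* Shared memory may already be gone; do nothing that uses it. */
        ignored_signals++;
        return;
    }

    /* Safe only because the flag says we are still attached. */
    (*shared_counter)++;
}

/* Simulate attaching to shared memory. */
static void
attach_shared(void)
{
    shared_counter = &shared_storage;
}

/* Simulate the exit path: raise the flag first, then detach, so a
 * signal arriving in between still sees the flag and backs off. */
static void
begin_exit(void)
{
    exit_in_progress = 1;
    shared_counter = NULL;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int
install_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

A caller would run `attach_shared()` and `install_handler()` at startup and `begin_exit()` at the start of shutdown. As the message notes, a single global check like this may be too blunt for handlers whose work is still needed during part of shutdown; in that case each handler would need its own flag, cleared at the appropriate point in the exit sequence.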
