Re: What is happening on buildfarm member crake? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: What is happening on buildfarm member crake?
Msg-id 10013.1390687480@sss.pgh.pa.us
In response to Re: What is happening on buildfarm member crake?  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: What is happening on buildfarm member crake?  (Andrew Dunstan <andrew@dunslane.net>)
Re: What is happening on buildfarm member crake?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
> On 01/19/2014 08:22 PM, Robert Haas wrote:
>> Hmm, that looks an awful lot like the SIGUSR1 signal handler is
>> getting called after we've already completed shmem_exit.  And indeed
>> that seems like the sort of thing that would result in dying horribly
>> in just this way.  The obvious fix seems to be to check
>> proc_exit_inprogress before doing anything that might touch shared
>> memory, but there are a lot of other SIGUSR1 handlers that don't do
>> that either.  However, in those cases, the likely cause of a SIGUSR1
>> would be a sinval catchup interrupt or a recovery conflict, which
>> aren't likely to be so far delayed that they arrive after we've
>> already disconnected from shared memory.  But the dynamic background
>> workers stuff adds a new possible cause of SIGUSR1: the postmaster
>> letting us know that a child has started or died.  And that could
>> happen even after we've detached shared memory.

> Is anything happening about this? We're still getting quite a few of 
> these: 
> <http://www.pgbuildfarm.org/cgi-bin/show_failures.pl?max_days=3&member=crake>

Yeah.  If Robert's diagnosis is correct, and it sounds pretty plausible,
then this is really just one instance of a bug that's probably pretty
widespread in our signal handlers.  Somebody needs to go through 'em
all and look for touches of shared memory.

I'm not sure if we can just disable signal response the moment the
proc_exit_inprogress flag goes up, though.  In some cases such as lock
handling, it's likely that we need that functionality to keep working
for some part of the shutdown process.  We might end up having to disable
individual signal handlers at appropriate places.

Ick.
        regards, tom lane
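
A minimal sketch of the two ideas discussed above: a SIGUSR1 handler that
checks an exit flag before touching shared memory, and a shutdown path that
disables that one handler at a chosen point rather than the moment exit
begins. This is not PostgreSQL code. Every name here (exit_in_progress,
shmem_attached, shmem_touches, the sketch_* functions) is a hypothetical
stand-in, with exit_in_progress playing the role of proc_exit_inprogress
and a counter standing in for real shared-memory access.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-ins; only sig_atomic_t is touched from the handler. */
static volatile sig_atomic_t exit_in_progress = 0; /* cf. proc_exit_inprogress */
static volatile sig_atomic_t shmem_attached = 1;   /* still attached to shmem? */
static volatile sig_atomic_t shmem_touches = 0;    /* counts simulated accesses */

/* Handler: do nothing once shared memory is gone. */
static void sketch_sigusr1_handler(int signo)
{
    (void) signo;
    if (!shmem_attached)
        return;             /* too late: the segment is already detached */
    shmem_touches++;        /* stands in for reading or writing shared state */
}

/* Install the given disposition for SIGUSR1; returns 0 on success. */
static int sketch_set_sigusr1(void (*handler)(int))
{
    struct sigaction sa;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

static int sketch_install_handler(void)
{
    return sketch_set_sigusr1(sketch_sigusr1_handler);
}

/* Exit starts here, but the handler stays live: some cleanup may need it. */
static void sketch_begin_exit(void)
{
    exit_in_progress = 1;
}

/* The chosen point: disable this one handler, then detach. */
static int sketch_detach_shmem(void)
{
    int rc = sketch_set_sigusr1(SIG_IGN); /* ignore SIGUSR1 from now on */

    shmem_attached = 0;                   /* belt and braces for the handler */
    return rc;
}
```

In this sketch a SIGUSR1 that arrives after sketch_begin_exit() but before
sketch_detach_shmem() is still serviced, while one that arrives afterwards
is ignored. Whether any real handler can be switched off that early or that
late depends on what the rest of shutdown still needs from it, which is
exactly the per-handler audit described above.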


