Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
Date
Msg-id 771a2a12-dcc3-a64e-f54d-b6bbc030d2f8@iki.fi
Whole thread Raw
In response to Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket  (Andres Freund <andres@anarazel.de>)
Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket  (Ashwin Agrawal <aagrawal@pivotal.io>)
Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 19/07/18 23:13, Andres Freund wrote:
> On 2018-07-19 14:54:52 -0500, Nico Williams wrote:
>> On Thu, Jul 19, 2018 at 12:20:53PM -0700, Andres Freund wrote:
>>> On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote:
>>>> Ugh. Yeah, in wal_quickdie, and other aux process *_quickdie() handlers, I
>>>> agree we should just _exit(2). All we want to do is to exit the process
>>>> immediately.
>>>
>>> Indeed.
>>>
>>>> bgworker_quickdie() makes some effort to block SIGQUIT during the exit()
>>>> processing, but that doesn't solve the whole problem. The process could've
>>>> been in the middle of a malloc/free when the signal arrived, for example.
>>>> exit() is simply not safe to call from a signal handler.
>>>
>>> Yea. I really don't understand why we have't been able to agree on this
>>> for *years* now.
>>
>> I mean, only calling async-signal-safe functions from signal handlers is
>> a critical POSIX requirement, and exit(3) is NOT async-signal-safe.
> 
> There's a few standard requirements that aren't particularly relevant in
> practice (including some functions not being listed as signal
> safe). Just arguing from that isn't really helpful. But there's plenty
> hard evidence, including a few messages upthread!, of it being
> practically problematic. Just looking at the reasoning at why exit (and
> malloc) aren't signal safe however, makes it clear that this particular
> restriction should be adhered to, however.

ISTM that no-one has any great ideas on what to do about the ereport() 
in quickdie(). But I think we have consensus on replacing the exit(2) 
calls with _exit(2). If we do just that, it would be better than the 
status quo, even if it doesn't completely fix the problem. This would 
prevent the case that Asim reported, for starters.

Any objections to the attached?

With _exit(), I think we wouldn't really need the 
"PG_SETMASK(&BlockSig);" calls in the aux process *_quickdie handlers 
that don't do anything else than call _exit(). But I didn't dare to 
remove them yet.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: AYahorau@ibagroup.eu
Date:
Subject: Adding TCP_USER_TIMEOUT support for libpq/psqlodbc
Next
From: Etsuro Fujita
Date:
Subject: Re: de-deduplicate code in DML execution hooks in postgres_fdw