Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket - Mailing list pgsql-hackers

From Nico Williams
Subject Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
Date
Msg-id 20180719200143.GJ9712@localhost
Whole thread Raw
In response to Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
List pgsql-hackers
On Thu, Jul 19, 2018 at 03:49:35PM -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote:
> >> The regular backend's quickdie() function is more tricky. It should also
> >> call _exit(2) rather than exit(2). But it also tries to ereport a WARNING,
> >> and that is quite useful.
> 
> There's already an on_exit_reset in there; why do we need more than that?

You're not allowed to call exit(3) in signal handlers.  exit(3) is not
async-signal-safe (for example, it calls atexit handlers, flushes stdio,
and who knows what else).

> > Is that actually true? Clients like libpq create the same error message
> > (which has its own issues, because it'll sometimes mis-interpret
> > things). The message doesn't actually have useful content, no?
> 
> Yes, it does: it lets users tell the difference between exit due to a
> SIGQUIT and a crash of their own backend.

ereport() calls (via errstart() and write_stderr()) vfprintf() and
fflush(stderr).  That's absolutely not async-signal-safe.  The WIN32
code in write_stderr() does use vsnprintf() instead, and that could get
written to stderr with write(2), which is async-signal-safe.

> Admittedly, if we crash trying to send the message, then we're not
> better off.  But since that happens only very rarely, I do not think
> it's a reasonable tradeoff to never send it at all.

If sending it means using OpenSSL or what have you, that's probably even
more async-signal-safe code.

Now, ereport() could all be made safe enough for use in signal handlers,
but it isn't at this time.

What I'd do is have a volatile sig_atomic_t in_signal_handler_context
variable to indicate that we're dying, and then when that is non-zero,
ereport() and friends could use all-async-signal-safe codepaths.

Nico
-- 


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Possible performance regression in version 10.1 with pgbench read-write tests.
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket