Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket - Mailing list pgsql-hackers

From Nico Williams
Subject Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
Date
Msg-id 20180719202705.GN9712@localhost
Whole thread Raw
In response to Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Jul 19, 2018 at 04:16:31PM -0400, Tom Lane wrote:
> Nico Williams <nico@cryptonector.com> writes:
> > I dunno if it is or isn't helpful.  But I do know that this must be done
> > in an async-signal-safe way.
> 
> I haven't actually heard a convincing reason why that's true.  As per

It's undefined behavior.  The system is free to do all sorts of terrible
things.  In practice deadlock or crash are the things you'll see, as you
say.

> the previous discussion, if we happen to service the SIGQUIT at an
> unfortunate moment, we might get a deadlock or crash in the backend
> process, and thereby fail to send the message.  But we're no worse
> off in such cases than if we'd not tried to send it at all.  The only
> likely penalty is that, in the deadlock case, a few seconds will elapse
> before the postmaster runs out of patience and sends SIGKILL.
> 
> Yeah, it'd be nicer if we weren't taking such risks, but the amount
> of effort required to get there seems very far out of proportion to
> the benefit.
> 
> > Besides making ereport() async-signal-safe, which is tricky, you could
> > write(2) the arguments to a pipe that another thread in the same process
> > is reading from and which will then call ereport() and exit(3).
> 
> We have not yet accepted any patch that introduces threading into the
> core backend, and this seems like a particularly unlikely place to
> start.  Color me dubious, as well, that thread 2 calling exit() (never
> mind ereport()) while the main thread continues to do $whatever offers
> any actual gain in safety.

No, the other thread does NOT continue to do whatever -- it
blocks/sleeps forever waiting for the coming exit(3).

I.e., quickdie() would look like this:

        /* Marshall info in to an automatic */
        struct quickdie_info info, *infop;

        info.this = that;
        ...
        infop = &info;

        /* Tell the death thread to report and exit */
        /* XXX actually something like net_write(), to handle EINTR */
        write(quickdie_pipe[1], infop, sizeof(infop));

        /* Wait forever */
        for (;;) usleep(FOREVER);

and the thread would basically do:

        struct quickdie_info *info;

        /* Wait forever for a quickdie() to happen on the main thread */
        /* XXX Actually something like net_read(), to handle EINTR */
        read(quickdie_pipe[0], &infop, sizeof(infop));

        ereport(infop->...);

        exit(1);

This use of threads does not require any changes to the rest of the
codebase.

Nico
-- 


pgsql-hackers by date:

Previous
From: Nico Williams
Date:
Subject: Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
Next
From: Alexander Korotkov
Date:
Subject: Re: Bug in gin insert redo code path during re-compression of emptygin data leaf pages