Re: SIGQUIT handling, redux - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: SIGQUIT handling, redux
Date	September 9, 2020 23:09:00
Msg-id	110116.1599682140@sss.pgh.pa.us
In response to	Re: SIGQUIT handling, redux (Andres Freund <andres@anarazel.de>)
Responses	Re: SIGQUIT handling, redux Re: SIGQUIT handling, redux
List	pgsql-hackers

Andres Freund <andres@anarazel.de> writes:
> I wish startup_die() weren't named startup_ - every single time I see
> the name I think it's about the startup process...

We could call it startup_packet_die or something?

> I think StartupPacketTimeoutHandler is another case?

Yeah.  Although it's a lot less risky, since if the timeout is reached
we're almost certainly waiting for client input.

>> In passing, it's worth noting that startup_die() isn't really much safer
>> for SIGTERM than it is for SIGQUIT; the only argument for distinguishing
>> those is that code that applies BlockSig will at least manage to block the
>> former.

> Which is pretty unconvincing...

Agreed, it'd be nice if this were less shaky.  On the other hand,
we've seen darn few complaints traceable to this AFAIR.  I'm not
really sure it's worth putting a lot of effort into.

> The long term correct way to handle this would obviously be to
> restructure everything that happens covered by startup_die() in a
> non-blocking manner and just rely on CFR(). But that's a tall order to
> get done anytime soon, particularly things like DNS are IIRC pretty hard
> without relying on custom libraries.

Not only DNS, but all the various auth libraries would have to be
contended with.  Lots of work there compared to the likely rewards.
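
[Editor's illustration, not part of the original message.] The
"non-blocking" structure discussed above is the usual pattern of a handler
that only sets a flag, with ordinary code polling that flag at safe points
(presumably what "CFR()" refers to, in the spirit of
CHECK_FOR_INTERRUPTS()). The following is a minimal sketch under that
assumption. All names in it (die_pending, flag_only_die,
install_flag_only_handler, check_for_die) are hypothetical and do not
appear in the PostgreSQL source.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag: written by the handler, polled by normal code. */
static volatile sig_atomic_t die_pending = 0;

/*
 * The handler does only async-signal-safe work: it records that
 * termination was requested and returns.
 */
void
flag_only_die(int signo)
{
    (void) signo;
    die_pending = 1;
}

/* Install the handler for SIGTERM; returns 0 on success, -1 on failure. */
int
install_flag_only_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = flag_only_die;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/*
 * Polling point, called from ordinary code between non-blocking steps.
 * Returns true once termination has been requested.
 */
bool
check_for_die(void)
{
    return die_pending != 0;
}
```

The catch raised in the thread is that this only helps if every step
between polling points returns promptly, which blocking DNS lookups and
authentication library calls do not.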

>> I don't want to give up trying to send a message to the client.

> That still doesn't make much sense to me. The potential for hanging
> (e.g. inside malloc) is so much worse than not sending a message...

We see backends going through this code on a very regular basis in the
buildfarm, but complete hangs are rare as can be.  I think you
overestimate the severity of the problem.

> I only had one coffee so far (and it looks like the sun has died
> outside), so maybe I'm just slow: But, uh, we don't currently send a
> message in startup_die(), right?
> So that part is about quickdie()?

Right.  Note that startup_die() is pre-authentication, so I'm doubtful
that we should tell the would-be client anything about the state of
the server at that point, even ignoring these risk factors.  (I'm a
bit inclined to remove the comment suggesting that'd be desirable.)

            regards, tom lane
