Re: SIGQUIT handling, redux - Mailing list pgsql-hackers

From Tom Lane
Subject Re: SIGQUIT handling, redux
Date
Msg-id 110116.1599682140@sss.pgh.pa.us
Whole thread Raw
In response to Re: SIGQUIT handling, redux  (Andres Freund <andres@anarazel.de>)
Responses Re: SIGQUIT handling, redux  (Andres Freund <andres@anarazel.de>)
Re: SIGQUIT handling, redux  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> I wish startup_die() weren't named startup_ - every single time I see
> the name I think it's about the startup process...

We could call it startup_packet_die or something?

> I think StartupPacketTimeoutHandler is another case?

Yeah.  Although it's a lot less risky, since if the timeout is reached
we're almost certainly waiting for client input.

>> In passing, it's worth noting that startup_die() isn't really much safer
>> for SIGTERM than it is for SIGQUIT; the only argument for distinguishing
>> those is that code that applies BlockSig will at least manage to block the
>> former.

> Which is pretty unconvincing...

Agreed, it'd be nice if this were less shaky.  On the other hand,
we've seen darn few complaints traceable to this AFAIR.  I'm not
really sure it's worth putting a lot of effort into.

> The long term correct way to handle this would obviously be to
> restructure everything that happens covered by startup_die() in a
> non-blocking manner and just rely on CFR(). But that's a tall order to
> get done anytime soon, particularly things like DNS are IIRC pretty hard
> without relying on custom libraries.

Not only DNS, but all the various auth libraries would have to be
contended with.  Lots of work there compared to the likely rewards.

>> I don't want to give up trying to send a message to the client.

> That still doesn't make much sense to me. The potential for hanging
> (e.g. inside malloc) is so much worse than not sending a message...

We see backends going through this code on a very regular basis in the
buildfarm, but complete hangs are rare as can be.  I think you
overestimate the severity of the problem.

> I only had one coffee so far (and it looks like the sun has died
> outside), so maybe I'm just slow: But, uh, we don't currently send a
> message startup_die(), right?
> So that part is about quickdie()?

Right.  Note that startup_die() is pre-authentication, so I'm doubtful
that we should tell the would-be client anything about the state of
the server at that point, even ignoring these risk factors.  (I'm a
bit inclined to remove the comment suggesting that'd be desirable.)

            regards, tom lane



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: WIP: BRIN multi-range indexes
Next
From: Andres Freund
Date:
Subject: Re: SIGQUIT handling, redux