Re: SIGQUIT handling, redux - Mailing list pgsql-hackers

From Andres Freund
Subject Re: SIGQUIT handling, redux
Msg-id 20200909210654.ayufghpoczz7jfmx@alap3.anarazel.de
In response to Re: SIGQUIT handling, redux  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2020-09-09 16:30:37 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2020-09-09 16:09:00 -0400, Tom Lane wrote:
> >> We could call it startup_packet_die or something?
> 
> > Yea, I think that'd be good.
> 
> I'll make it so.

Thanks!


> >> We see backends going through this code on a very regular basis in the
> >> buildfarm, but complete hangs are rare as can be.  I think you
> >> overestimate the severity of the problem.
> 
> > I don't think the BF exercises the problematic paths to a significant
> > degree. It's mostly local socket connections, and where not it's
> > localhost. There's no slow DNS, no more complicated authentication
> > methods, no packet loss. How often do we ever actually end up even
> > getting close to any of the paths but immediate shutdowns?
> 
> Since we're talking about quickdie(), immediate shutdown/crash restart
> is exactly the case of concern, and the buildfarm exercises it all the
> time.

Yea, but only in simple cases: largely no SSL or Kerberos, largely
untranslated messages, and the immediate shutdowns mostly don't arrive
while a backend is inside PL/Python or similar.


> > And in the
> > SIGQUIT path, how often do we end up in the SIGKILL path, masking
> > potential deadlocks?
> 
> True, we can't really tell that.  I wonder if we should make the
> postmaster emit a log message when it times out and goes to SIGKILL.
> After a few months we could scrape the buildfarm logs and get a
> pretty good handle on it.

I think that'd be a good idea.
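Tom's proposal amounts to a timed escalation: send SIGQUIT, wait a bounded time, and if a child is still alive, log that fact before falling back to SIGKILL. The C sketch below illustrates the pattern for a single child process. It is not the postmaster's code. `quit_child_with_timeout`, `spawn_child`, `exit_on_quit`, the 10 ms polling interval and the log text are all hypothetical, and the "hung" child simply ignores SIGQUIT to stand in for a backend stuck in its quit path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): send SIGQUIT to one child,
 * poll for up to timeout_secs for it to exit, and if it is still alive,
 * log that fact and escalate to SIGKILL. Returns true only when SIGKILL
 * was needed, so such events can be counted (e.g. by scraping logs).
 */
static bool
quit_child_with_timeout(pid_t child, int timeout_secs)
{
    struct timespec nap = {0, 10 * 1000 * 1000};   /* 10 ms between polls */
    long        max_polls;

    assert(child > 0 && timeout_secs > 0);
    max_polls = (long) timeout_secs * 100;

    kill(child, SIGQUIT);
    for (long i = 0; i < max_polls; i++)
    {
        pid_t       r = waitpid(child, NULL, WNOHANG);

        if (r == child || r < 0)
            return false;       /* exited (or already reaped) after SIGQUIT */
        nanosleep(&nap, NULL);
    }

    /* Timed out: this is the event proposed for logging before SIGKILL. */
    fprintf(stderr, "LOG:  child %ld still running %d s after SIGQUIT, sending SIGKILL\n",
            (long) child, timeout_secs);
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);    /* reap it so no zombie is left behind */
    return true;
}

/* Demo-only handler: a "healthy" child exits at once; _exit() is async-signal-safe. */
static void
exit_on_quit(int signo)
{
    (void) signo;
    _exit(2);
}

/*
 * Demo-only: fork a child that just waits for signals. If hang is true it
 * ignores SIGQUIT, standing in for a child stuck in its quit path. The
 * disposition is installed before fork() and inherited by the child, so
 * SIGQUIT cannot arrive before the child is ready; the parent's own
 * disposition is restored afterwards.
 */
static pid_t
spawn_child(bool hang)
{
    struct sigaction sa, old;
    pid_t       pid;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = hang ? SIG_IGN : exit_on_quit;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGQUIT, &sa, &old);

    pid = fork();
    if (pid == 0)
    {
        for (;;)
            pause();            /* child: sleep until a signal ends it */
    }

    sigaction(SIGQUIT, &old, NULL);
    return pid;                 /* -1 if fork() failed */
}
```

For example, `quit_child_with_timeout(spawn_child(false), 5)` should return false almost immediately, while passing a child from `spawn_child(true)` with a 1-second timeout should return true after about a second and print the LOG line. A process supervising many children would more likely track one deadline in its main event loop than busy-poll a single pid; the polling loop only keeps the sketch short.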

Greetings,

Andres Freund
