Hi,

On 2020-09-09 16:30:37 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2020-09-09 16:09:00 -0400, Tom Lane wrote:
> >> We could call it startup_packet_die or something?
>
> > Yea, I think that'd be good.
>
> I'll make it so.

Thanks!

> >> We see backends going through this code on a very regular basis in the
> >> buildfarm, but complete hangs are rare as can be. I think you
> >> overestimate the severity of the problem.
>
> > I don't think the BF exercises the problematic paths to a significant
> > degree. It's mostly local socket connections, and where not it's
> > localhost. There's no slow DNS, no more complicated authentication
> > methods, no packet loss. How often do we ever actually end up even
> > getting close to any of the paths but immediate shutdowns?
>
> Since we're talking about quickdie(), immediate shutdown/crash restart
> is exactly the case of concern, and the buildfarm exercises it all the
> time.

Yea, but only in simple cases: largely no SSL / Kerberos, largely
untranslated messages, and the immediate shutdowns mostly don't happen
while inside plpython or such.

> > And in the
> > SIGQUIT path, how often do we end up in the SIGKILL path, masking
> > potential deadlocks?
>
> True, we can't really tell that. I wonder if we should make the
> postmaster emit a log message when it times out and goes to SIGKILL.
> After a few months we could scrape the buildfarm logs and get a
> pretty good handle on it.

I think that'd be a good idea.
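
To illustrate the pattern being discussed (send SIGQUIT, wait for a
timeout, and log before escalating to SIGKILL), here is a minimal sketch.
It is Python rather than postmaster C code, it assumes a POSIX system, and
shutdown_child, its parameters and the message text are all hypothetical
names made up for this example, not anything in the tree:

```python
import signal
import subprocess


def shutdown_child(proc, timeout_s, log=print):
    """Ask a child to quit with SIGQUIT; escalate to SIGKILL on timeout.

    Returns True if the escalation to SIGKILL was needed, so callers
    (or a log scraper) can count how often that path is taken.
    Hypothetical helper, for illustration only.
    """
    proc.send_signal(signal.SIGQUIT)
    try:
        proc.wait(timeout=timeout_s)
        return False  # child exited on its own after SIGQUIT
    except subprocess.TimeoutExpired:
        # This is the message that would make the slow path visible in logs.
        log(f"child {proc.pid} did not exit within {timeout_s}s "
            f"of SIGQUIT, sending SIGKILL")
        proc.kill()   # SIGKILL on POSIX
        proc.wait()   # reap the child
        return True
```

Counting the logged lines over a few months of runs would give the kind
of handle on the SIGKILL path that is suggested above.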

Greetings,
Andres Freund