"Ed L." <pgsql@bluepolka.net> writes:
> Uh, no, I didn't say signal 9 is SIGTERM. Isn't a "smart" shutdown request
> an indication of a SIGTERM? I'm just speculating about what happened, but
> isn't that what you'd see during a system shutdown? The kernel sending
> SIGTERMs?

Yes, the trace is sort of consistent with the idea of a system shutdown:
you'd see SIGTERMs issued, followed some time later by SIGKILL.

I thought Sean had said that the machine did not shut down during this
interval, and so mentally eliminated that theory --- but based on his
latest comment I guess that is what happened after all.

So that does leave me with a question: why didn't it work more cleanly?
Our signal responses are designed around the assumption that during
shutdown the kernel will send SIGTERM to *all* the Postgres processes.
Backends interpret that as an immediate shutdown and should exit quickly
enough to avoid getting SIGKILL'd later. It looks like either the
postmaster was sent SIGTERM but the backends weren't, or the interval
between SIGTERM and SIGKILL was unreasonably short. I don't think I
believe the latter; the last time I checked this on Darwin, it seemed to
be using the traditional 20-second grace period.
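
To make the expected sequence concrete, here is a minimal Python sketch
(POSIX only) of SIGTERM followed by SIGKILL after a grace period. It is an
illustration, not Postgres code or the actual Darwin shutdown logic: the
child scripts, the `shutdown` helper, and the short grace periods are all
hypothetical stand-ins.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a backend: exits promptly when it gets SIGTERM.
CHILD_EXITS_ON_TERM = (
    "import signal, sys, time\n"
    "signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))\n"
    "print('ready', flush=True)\n"
    "while True:\n"
    "    time.sleep(0.1)\n"
)

# Hypothetical stand-in for a process that ignores (or never sees) SIGTERM.
CHILD_IGNORES_TERM = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "while True:\n"
    "    time.sleep(0.1)\n"
)


def shutdown(child_source, grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then SIGKILL.

    Returns the child's return code: 0 for a clean exit, or a negative
    signal number if the child was killed by a signal.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source], stdout=subprocess.PIPE
    )
    proc.stdout.readline()  # block until the child has set up its SIGTERM handling
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # grace period expired: SIGKILL cannot be caught or ignored
        proc.wait()
    proc.stdout.close()
    return proc.returncode


if __name__ == "__main__":
    print(shutdown(CHILD_EXITS_ON_TERM, grace_seconds=5.0))   # 0
    print(shutdown(CHILD_IGNORES_TERM, grace_seconds=0.5))    # -signal.SIGKILL
```

A child that handles SIGTERM exits with status 0 well inside the grace
period; one that ignores it, or never receives it, survives until SIGKILL,
which Python reports as a negative return code.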
Another question: if that was a shutdown we were looking at, how did the
postmaster live long enough to record the final log lines? It shoulda
gotten SIGKILL'd at the same time as its children.

In short, there's something pretty odd about the way these signals are
being passed around. It looks something like a standard system shutdown
sequence, but not enough like it.

regards, tom lane