Thread: A note about SIGTERM illusion and reality

A note about SIGTERM illusion and reality

From: Tom Lane
Date: 11 January 2008, 22:59:54

If you've paid any attention to our signal handling and shutdown
procedures, you know that there is a whole lot of design and logic
based on the assumption that during a forced system shutdown,
we will see SIGTERM delivered to all PG processes, with a little bit
of grace time before we get SIGKILL'ed.
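For readers who have not looked at that logic, the assumption can be illustrated with a minimal sketch. This is Python, not PostgreSQL's actual C code, and the names GracefulShutdown and run_until_terminated are made up for the example. The handler only sets a flag, and the main loop does the orderly cleanup once it notices the flag.

```python
import signal


class GracefulShutdown:
    """Records a SIGTERM request so the main loop can exit cleanly."""

    def __init__(self):
        self.requested = False

    def install(self):
        # Replace the default SIGTERM action (immediate termination).
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Only set a flag here; the real cleanup runs in the main loop.
        self.requested = True


def run_until_terminated(shutdown, do_work, clean_up):
    """Call do_work() repeatedly until SIGTERM arrives, then clean_up()."""
    while not shutdown.requested:
        do_work()
    clean_up()
```

SIGKILL cannot be caught, so whatever clean_up() does has to finish inside the grace period between the two signals.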

I had occasion to test this yesterday (thank you, Duquesne Light)
and couldn't help noticing a lack of expected behavior in the postmaster
log after the lights came back on.  The database recovered fine, but it
had to recover --- there was no orderly shutdown as intended, and the
only indication that the postmaster had any notice at all was a log
entry about a SIGKILL on the walwriter process.

After digging in the man page for init(8) on a couple of machines,
I realized that the SIGTERM-then-SIGKILL behavior only applies to
processes that are launched directly by init.  Now, I can recall
having started the postmaster from an inittab entry on a few systems
I maintained years ago, but it's certainly not been the recommended
practice for a long time --- I think all modern distributions use SysV
init scripts or something comparable.  The inittab idea still has some
attraction because it guarantees automatic restart if the postmaster
dies ... but it's been a long time since that was a big hazard.

Even more bit-rot in the concept: init will only SIGTERM its direct
child and members of that child's process group.  Not too long ago
we made most of the postmaster children call setsid(), which puts each
of them in its own session and process group, so even if you did launch
the postmaster from an inittab entry, the signal would not reach those
children and things wouldn't work as intended.
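The effect of setsid() on process-group membership can be checked with a small sketch like the following. It is Python rather than the backend's C, and the function name pgids_after_setsid is hypothetical. The child reports its process group through a pipe after calling setsid(), and the parent compares that with its own.

```python
import os


def pgids_after_setsid():
    """Fork a child that calls setsid(); return (parent_pgid, child_pgid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: leave the parent's process group, report the new one.
        try:
            os.close(read_fd)
            os.setsid()
            os.write(write_fd, str(os.getpgrp()).encode())
        finally:
            os._exit(0)
    # Parent: read the child's report, then reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return os.getpgrp(), int(data)
```

The two values differ, so a signal addressed to the parent's process group (for instance with killpg()) is not delivered to such a child.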

I have no idea what (if anything) we should do about this; but it
seems clear that there's some design thinking that could stand to be
revisited.
        regards, tom lane

Re: A note about SIGTERM illusion and reality

From: Tom Lane
Date: 12 January 2008, 16:28:38

I wrote:
> After digging in the man page for init(8) on a couple of machines,
> I realized that the SIGTERM-then-SIGKILL behavior only applies to
> processes that are launched directly by init.

Actually, after further experimentation, it seems this is very
platform-dependent.

Current Linux (tested on Fedora 8) actually does behave in the way
our code expects, i.e., every process gets a SIGTERM.  The default
SIGTERM-to-SIGKILL delay is only 5 seconds, which is likely not enough
for a checkpoint in a busy database, but at least we tried :-(.
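The SIGTERM-then-SIGKILL sequence amounts to something like the sketch below. This is an illustration in Python, not the code any particular init uses; terminate_with_grace and its grace_seconds parameter are names invented for the example.

```python
import subprocess


def terminate_with_grace(proc, grace_seconds=5.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    proc is a subprocess.Popen object.  Returns the process's return code,
    which on POSIX is the negated signal number if a signal ended it.
    """
    proc.terminate()  # SIGTERM: the process may catch it and clean up
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught or ignored
        proc.wait()
    return proc.returncode
```

A process that needs longer than the grace period to finish its cleanup ends up killed partway through, which is the concern with a checkpoint on a busy database.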

Mac OS X seems to send every process SIGQUIT instead of SIGTERM.
I didn't experiment to see how much grace period there might be.

No idea about other BSDen, though OS X's behavior might be typical.

HP-UX seems to go straight to SIGKILL.

So the bottom line is that the best bet is to rely on an initscript's
stop command to issue "pg_ctl stop -m fast", which will give us enough
time to perform a clean shutdown.  The system-wide SIGTERM is only of
interest for databases that aren't controlled by an initscript known
to the system.  But there does seem to be some value in our designed
response to SIGTERM, on at least one popular platform, so I no longer
feel a need to rethink that.
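As a rough sketch of the stop command an initscript would issue, here it is written as a Python helper rather than shell. The function name build_stop_command and the example data directory are assumptions; the -D, -m and -t options are pg_ctl's own.

```python
def build_stop_command(data_dir, mode="fast", timeout_seconds=None):
    """Build the argument list for a 'pg_ctl stop' invocation."""
    if mode not in ("smart", "fast", "immediate"):
        raise ValueError("unknown shutdown mode: %r" % (mode,))
    cmd = ["pg_ctl", "stop", "-D", data_dir, "-m", mode]
    if timeout_seconds is not None:
        # -t sets how long pg_ctl waits for the shutdown to complete.
        cmd += ["-t", str(timeout_seconds)]
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True), for example with a data directory such as /var/lib/pgsql/data.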
        regards, tom lane