Thread: Immediate shutdown and system(3)

From: Heikki Linnakangas
Date: 27 February 2009, 05:52:55

We're using SIGQUIT to signal immediate shutdown request. Upon receiving
SIGQUIT, postmaster in turn kills all the child processes with SIGQUIT
and exits.

This is a problem when child processes use system(3) to call other
programs. We use system(3) in two places: to execute archive_command and
restore_command. Fujii Masao identified this with pg_standby back in
November:

http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d3cb@mail.gmail.com
and recently discussed here
http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa3a1@mail.gmail.com

I'm starting a new thread to bring this to the attention of those who
haven't been following the hot standby stuff. pg_standby has a
particular problem because it traps SIGQUIT to mean "end recovery,
promote standby to master", which it shouldn't do IMHO. But ignoring
that for a moment, the problem is generic.

SIGQUIT by default dumps core. That's not what we want to happen on
immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit
immediately instead, but external commands will dump core. While the
command runs, system(3) ignores SIGQUIT in the calling process, so we
can't trap it there; the signal still reaches the child.
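
To make the failure mode concrete, here is a minimal sketch of how a caller of system(3) can tell that the external command was killed by SIGQUIT rather than exiting on its own. The function names are made up for illustration and are not existing PostgreSQL functions.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run cmd through system(3). Returns the number of the signal that
 * terminated the shell, 0 if the shell exited normally, or -1 if
 * system() itself failed.
 */
int
command_terminating_signal(const char *cmd)
{
    int         status = system(cmd);

    if (status == -1)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}

/* Returns 1 if the command was killed by SIGQUIT, else 0. */
int
command_killed_by_sigquit(const char *cmd)
{
    return command_terminating_signal(cmd) == SIGQUIT;
}
```

Whether a core file is actually written depends on the core size limit in effect (ulimit -c) for the command.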

There are a few options for fixing that:

1. Implement a custom version of system(3) using fork+exec that lets us
trap SIGQUIT and send e.g. SIGTERM or SIGINT to the child instead. It
might be a bit tricky to get this right in a portable way; Windows would
certainly need a completely separate implementation.

2. Use a signal other than SIGQUIT for immediate shutdown of child
processes. We can't change the signal sent to postmaster for
backwards-compatibility reasons, but the signal sent by postmaster to
child processes we could change. We've already used all signals in
normal backends, but perhaps we could rearrange them.

3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child
processes that use system(3): the archiver process and the startup
process. Neither of them uses SIGINT currently. SIGINT is ignored by
system(3), like SIGQUIT, but the default action is to terminate the
process rather than core dump. Unfortunately pg_standby traps SIGINT too
to mean "promote to master", but we could change it to use SIGUSR1
instead for that purpose. If someone has a script that uses "killall
-INT pg_standby" to promote a standby server to master, it would need to
be changed. Looking at the manual page of pg_standby, however, it seems
that the kill-method of triggering a promotion isn't documented, so with
a notice in the release notes we could do that.

I'm leaning towards option 3, but I wonder if anyone sees a better solution.

This is all for CVS HEAD. In back-branches, I think we should just
remove the signal handler for SIGQUIT from pg_standby and leave it at
that. If you perform an immediate shutdown, you can get a core dump from
archive_command or restore_command, but that's a minor inconvenience.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com