When walsender calls out to do_pg_stop_backup() (during base backups),
it is not possible to terminate the process with a SIGTERM - it
requires a SIGKILL. This can leave unkillable backends for example if
archive_mode is on and archive_command is failing (or not set). A
similar thing would happen in other cases if walsender calls out to
something that would block (do_pg_start_backup() for example), but the
stop one is easy to provoke.
ISTM one way to fix it is the attached, which is to have walsender set
the "global" flags saying that we have received sigterm, which in turn
causes the CHECK_FOR_INTERRUPTS() calls in the routines to properly
exit the process. AFAICT it works fine. Any holes in this approach?
Second, I wonder if we should add a SIGINT handler as well, that would
make it possible to send a cancel signal. Given that the end result
would be the same (at least if we want to keep with the "walsender is
simple" path), I'm not sure it's necessary - but it would at least
help those doing pg_cancel_backend()... thoughts?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/