On Wed, Oct 5, 2011 at 10:30 PM, Magnus Hagander <magnus@hagander.net> wrote:
> When walsender calls out to do_pg_stop_backup() (during base backups),
> it is not possible to terminate the process with a SIGTERM - it
> requires a SIGKILL. This can leave unkillable backends for example if
> archive_mode is on and archive_command is failing (or not set). A
> similar thing would happen in other cases if walsender calls out to
> something that would block (do_pg_start_backup() for example), but the
> stop one is easy to provoke.
Good catch!
> ISTM one way to fix it is the attached, which is to have walsender set
> the "global" flags saying that we have received sigterm, which in turn
> causes the CHECK_FOR_INTERRUPTS() calls in the routines to properly
> exit the process. AFAICT it works fine. Any holes in this approach?
Very simple patch. Looks fine.
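
For anyone following along without the patch open, the idea can be
sketched outside the backend roughly as below. This is a standalone
illustration, not the attached patch: every name starting with sketch_
and both flag variables are hypothetical stand-ins for the backend's
global interrupt flags and CHECK_FOR_INTERRUPTS(), and the wait loop
only imitates a routine that blocks, such as waiting for WAL archiving.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the backend's global interrupt flags.
 * sig_atomic_t keeps the writes safe inside a signal handler.
 */
static volatile sig_atomic_t sketch_interrupt_pending = 0;
static volatile sig_atomic_t sketch_proc_die_pending = 0;

/* SIGTERM handler: only records the request, does no real work. */
static void
sketch_sigterm_handler(int signo)
{
    (void) signo;
    sketch_interrupt_pending = 1;
    sketch_proc_die_pending = 1;
}

/* Install the handler (assumption: plain signal() is enough here). */
static void
sketch_install_handler(void)
{
    signal(SIGTERM, sketch_sigterm_handler);
}

/*
 * Rough analogue of CHECK_FOR_INTERRUPTS(): reports whether a
 * termination request is pending. The real macro exits the process
 * instead of returning a value.
 */
static bool
sketch_check_for_interrupts(void)
{
    return sketch_interrupt_pending && sketch_proc_die_pending;
}

/*
 * Imitates a blocking wait that polls for interrupts. Returns the
 * number of polls completed, or -1 if a termination request was seen.
 */
static int
sketch_wait_for_archive(int max_polls)
{
    for (int i = 0; i < max_polls; i++)
    {
        if (sketch_check_for_interrupts())
            return -1;      /* bail out instead of blocking forever */
        /* real code would sleep or wait on a latch here */
    }
    return max_polls;
}
```

The point of the sketch is that the handler itself does nothing but
set flags; the blocking routine notices them at its next check and
gives up, so SIGTERM is enough and SIGKILL is not needed.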
> Second, I wonder if we should add a SIGINT handler as well, that would
> make it possible to send a cancel signal. Given that the end result
> would be the same (at least if we want to keep with the "walsender is
> simple" path), I'm not sure it's necessary - but it would at least
> help those doing pg_cancel_backend()... thoughts?
I don't think that's necessary because, as you suggested, there is no
use case for it right now. We can add that handler when someone proposes
a feature that requires it.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center