WAL senders sending base backups not listening much to SIGTERM - Mailing list pgsql-bugs

From Michael Paquier
Subject WAL senders sending base backups not listening much to SIGTERM
Msg-id CAB7nPqQokxXWEGZLOFEkeDdPWEikxVRb5g=NeAtEQxZhJ5p12Q@mail.gmail.com
List pgsql-bugs
Hi all,

A couple of days ago I received a report that Postgres does not
shut down quickly even if the fast stop mode is used with pg_ctl.
Basically "pg_ctl stop -m fast -t 300" was trying to stop the server,
but I saw the following process still alive:
vpostgr+  6883  0.0  0.1 490780 14928 ?        Ss   00:51   0:00
postgres: wal sender process replicator 192.168.111.152(39986) sending
backup "pg_basebackup base backup"
This prevented the postmaster from stopping for 5 minutes, until
pg_ctl gave up at the end of the timeout.

I am aware that WAL senders are stopped last so that they get the
chance to stream WAL records at shutdown, per what InitWalSnd does.
However, I also notice that while WAL senders normally check
walsender_ready_to_stop to decide whether to do an early exit, WAL
senders sending base backups do not check or use it.

I have not been able to reproduce this behavior manually with 9.4.9
(master seems a lot more responsive) and saw it only once, on a test
lab, with a rather large base backup. This is a rather annoying
behavior: I'd expect the WAL sender to exit as fast as it can, and in
the case of fast mode I'd expect the server to be left in a clean
state, by using CancelBackup() at least.

Perhaps I am missing something? Thoughts?
--
Michael
