Thread: WAL senders sending base backups not listening much to SIGTERM

WAL senders sending base backups not listening much to SIGTERM

From
Michael Paquier
Date:
Hi all,

A couple of days ago I received a report that Postgres does not
shut down quickly even if the fast stop mode is used with pg_ctl.
Basically "pg_ctl stop -m fast -t 300" was trying to stop the server
but I saw the following process still remaining alive:
vpostgr+  6883  0.0  0.1 490780 14928 ?        Ss   00:51   0:00
postgres: wal sender process replicator 192.168.111.152(39986) sending
backup "pg_basebackup base backup"
And this prevented the postmaster from stopping for 5 minutes, until
pg_ctl gave up at the end of the timeout.

I am aware of the fact that WAL senders are stopped last so that they
get the chance to stream WAL records at shutdown, per what InitWalSnd
does. What I am also noticing is that in this case WAL senders check
walsender_ready_to_stop to determine whether a WAL sender should do an
early exit or not, but WAL senders sending base backups do not check
or use it.
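
To illustrate the kind of check I have in mind, here is a minimal
standalone sketch, not code from the Postgres tree. The names
stop_requested, handle_sigterm, install_sigterm_handler and
send_backup_chunks are all hypothetical; stop_requested only plays a
role similar in spirit to walsender_ready_to_stop. The idea is that the
loop sending backup data looks at a flag set by the signal handler
between two chunks and leaves early when it is set:

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical flag, similar in spirit to walsender_ready_to_stop. */
static volatile sig_atomic_t stop_requested = 0;

/* Signal handler: only sets the flag, which is async-signal-safe. */
static void
handle_sigterm(int signo)
{
    (void) signo;
    stop_requested = 1;
}

/* Install the SIGTERM handler; returns 0 on success, -1 on failure. */
static int
install_sigterm_handler(void)
{
    return (signal(SIGTERM, handle_sigterm) == SIG_ERR) ? -1 : 0;
}

/*
 * Pretend to send nchunks chunks of a base backup.  The stop flag is
 * checked before each chunk, so a stop request is honored after at
 * most one more chunk.  Returns the number of chunks actually sent.
 */
static size_t
send_backup_chunks(size_t nchunks)
{
    size_t sent = 0;

    while (sent < nchunks)
    {
        if (stop_requested)
            break;          /* early exit; cleanup would happen here */

        /* A real sender would write one chunk to the client here. */
        sent++;
    }
    return sent;
}
```

With something like this, the time needed to react to a stop request
is bounded by the time needed to send a single chunk, and not by the
size of the whole base backup.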

I have not been able to reproduce this behavior manually with 9.4.9
(master seems a lot more responsive) and saw this behavior only once
in a test lab, with a rather large base backup. This is a rather
annoying behavior, and I'd expect the WAL sender to leave as fast as
it can, and in the case of fast mode I'd expect the server to be left
in a clean state by using CancelBackup() at least.

Perhaps I am missing something? Thoughts?
--
Michael

Re: WAL senders sending base backups not listening much to SIGTERM

From
Michael Paquier
Date:
On Tue, Sep 27, 2016 at 11:05 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I have not been able to reproduce this behavior manually with 9.4.9
> (master seems a lot more responsive) and saw this behavior only once
> in a test lab, with a rather large base backup. This is a rather
> annoying behavior, and I'd expect the WAL sender to leave as fast as
> it can, and in the case of fast mode I'd expect the server to be left
> in a clean state by using CancelBackup() at least.
>
> Perhaps I am missing something? Thoughts?

And I did. The application kept bombarding Postgres with
pg_basebackup -c spread requests when it should not have.
--
Michael