Re: Revised patch for fixing archiver shutdown behavior - Mailing list pgsql-patches

From Tom Lane
Subject Re: Revised patch for fixing archiver shutdown behavior
Msg-id 22574.1199927796@sss.pgh.pa.us
In response to Revised patch for fixing archiver shutdown behavior  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
I wrote:
> One point needing discussion is that the postmaster is currently
> coded not to send SIGUSR1 to the archiver if a fast-mode shutdown
> is under way.  I duplicated that in the added SIGUSR1 signal here,
> but I wonder whether it is sane or not.  Comments?

After chewing on that for a while, I decided it was bogus.  If we are
going to have a policy that the archiver gets a chance to archive
everything, that shouldn't depend on fast vs. smart shutdown; those
alternatives determine whether we kick clients out ungracefully,
not whether we take extra risks with committed data.

I think we should allow the archiver to finish out its tasks fully
in all non-crash cases except one: if we got SIGTERM from init.
In that case there's a very great risk of being SIGKILL'd before
we can finish archiving.  The postmaster cannot easily tell whether
its SIGTERM came from init or not, but we can drive this off the
archiver itself getting SIGTERM'd.  I propose that if the archiver
receives SIGTERM, it should cease to issue any new archive commands,
but just wait till it sees the postmaster exit.  (It can't exit
right away, since there's a race condition: the postmaster might
not have been SIGTERM'd yet, and might therefore spawn a new
archiver, which would have no idea it's unsafe to do anything more.)

There's an obvious failure mode in that, which is that a randomly
issued SIGTERM to the archiver would shut down archiving indefinitely.
We can guard against that with a timeout: the archiver should exit
a minute or two after being SIGTERM'd, even if the postmaster is still
there.  That should certainly be enough delay to avoid the race
condition, and if in fact everything is still hunky-dory the
postmaster will immediately spawn a new archiver.
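The timeout guard can be sketched as a small decision function. This is a hedged illustration, not the patch itself: `archiver_should_exit`, `archiver_sigterm_timed_out` and `ARCHIVER_SIGTERM_TIMEOUT_SECS` are hypothetical names, and 60 seconds is an assumed value, since the message only says "a minute or two". The caller keeps the time SIGTERM was first noticed in `*sigterm_time` (0 meaning "not yet") and passes the current time on each loop iteration.

```c
#include <stdbool.h>
#include <time.h>

/* Assumed value; the proposal only says "a minute or two". */
#define ARCHIVER_SIGTERM_TIMEOUT_SECS 60

/*
 * Called only after SIGTERM has been seen.  The first call records the
 * start time; later calls report whether the timeout has elapsed.
 */
static bool
archiver_sigterm_timed_out(time_t *sigterm_time, time_t now)
{
    if (*sigterm_time == 0)
    {
        *sigterm_time = now;    /* first iteration after SIGTERM: start the clock */
        return false;
    }
    return difftime(now, *sigterm_time) >= ARCHIVER_SIGTERM_TIMEOUT_SECS;
}

/* Should the archiver main loop exit on this iteration? */
static bool
archiver_should_exit(bool got_sigterm, bool postmaster_alive,
                     time_t *sigterm_time, time_t now)
{
    if (!postmaster_alive)
        return true;            /* postmaster is gone, so exiting is safe */
    if (!got_sigterm)
        return false;           /* no SIGTERM: keep archiving normally */
    /* A stray SIGTERM must not disable archiving forever: give up waiting. */
    return archiver_sigterm_timed_out(sigterm_time, now);
}
```

If the postmaster is in fact healthy when the timeout fires, the archiver exits and the postmaster simply starts a new one, so a random SIGTERM costs at most one timeout period of archiving.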

Hence, attached revised patch ...

            regards, tom lane


