Re: Improve shutdown during online backup, take 4 - Mailing list pgsql-patches

From Tom Lane
Subject Re: Improve shutdown during online backup, take 4
Date
Msg-id 28186.1209047125@sss.pgh.pa.us
Whole thread Raw
In response to Re: Improve shutdown during online backup, take 4  ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
Responses Re: Improve shutdown during online backup, take 4
List pgsql-patches
"Albe Laurenz" <laurenz.albe@wien.gv.at> writes:
> Tom Lane wrote:
>> Lastly, the changes to pmdie's SIGINT handling seem quite bogus.
>> Don't you need to transition into WAIT_BACKUP rather than WAIT_BACKENDS
>> state in that case too?  Shouldn't you do CancelBackup *before*
>> PostmasterStateMachine?  The thing screams of race conditions.

> I suspect there must be a misunderstanding.
> You cannot really mean that the postmaster should enter WAIT_BACKUP
> state on a fast shutdown request.

Why not?  It'll fall out of the state again immediately in
PostmasterStateMachine, no, if we do a CancelBackup here?  I don't think
these two paths of control should be any more different than really
necessary.

> Sure, CancelBackup could be called earlier. It doesn't do much more than
> rename a file.
> My reason for calling it late was that backup mode should really only be
> cancelled if we manage to shutdown cleanly, and this is not clear until
> the last child is gone.

Well, if there were anything conditional about calling it, then maybe
that argument would hold some water, but the way you've got it here it
*will* get called anyway, just after the PostmasterStateMachine call
... except in the corner case where there were no child processes left
and so PostmasterStateMachine manages to decide it's ready to exit().
So far from implementing the logic you suggest, this arrangement never
gets it right, and has a race condition case in which it gets it exactly
backward.

The other reason for the remark about race conditions is that the
PostmasterStateMachine call should absolutely be the last thing that
pmdie() does --- putting anything after it is wrong, especially things
that might alter the PM state, as indeed CancelBackup could.  The reason
for having that in the signal handler is to cover the possibility that
no such call will occur immediately when we return to the wait loop.
In general all of the condition handlers in postmaster.c should be of
the form "respond to the immediate condition, and then let
PostmasterStateMachine decide if there's anything else to do".

If you actually want the behavior you propose, then the only correct way
to implement it is to embed it into the state machine logic, ie, do the
CancelBackup inside PostmasterStateMachine in some state transition
taken after the last child is gone.

            regards, tom lane

pgsql-patches by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Improve shutdown during online backup, take 4
Next
From: Alvaro Herrera
Date:
Subject: Re: lc_time and localized dates