Re: Sync Rep and shutdown Re: Sync Rep v19 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Sync Rep and shutdown Re: Sync Rep v19
Msg-id 1300265474.20494.7579.camel@ebony
In response to Re: Sync Rep and shutdown Re: Sync Rep v19  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, 2011-03-15 at 22:07 -0400, Robert Haas wrote:
> On Wed, Mar 9, 2011 at 11:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > Same as above. I think that it's more problematic to leave the code
> > as it is. Because smart/fast shutdown can make the server get stuck
> > until immediate shutdown is requested.
> 
> I agree that the current state of affairs is a problem.  However,
> after looking through the code somewhat carefully, it looks a bit
> difficult to fix.  Suppose that backend A is waiting for sync rep.  A
> fast shutdown is performed.  Right now, backend A shrugs its shoulders
> and does nothing.  Not good.  But suppose we change it so that backend
> A closes the connection and exits without either confirming the commit
> or throwing ERROR/FATAL.  That seems like correct behavior, since, if
> we weren't using sync rep, the client would have to interpret that as
> indicating that the connection died in mid-COMMIT, and mustn't
> assume anything about the state of the transaction.  So far so good.
> 
> The problem is that there may be another backend B waiting on a lock
> held by A.  If backend A exits cleanly (without a PANIC), it will
> remove itself from the ProcArray and release locks.  That wakes up B,
> which can now go do its thing.  If the operating system is a bit on
> the slow side delivering the signal to B, then the client to which B
> is connected might manage to see a database state that shows the
> transaction previously running in A as committed, even though that
> transaction wasn't committed.  That would stink, because the whole
> point of having A hold onto locks until the standby ack'd the commit
> was that no other transaction would see it as committed until it was
> replicated.
> 
> This is a pretty unlikely race condition in practice but people who
> are running sync rep are intending precisely to guard against unlikely
> failure scenarios.
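
The race described above can be illustrated with a toy model. This is only a sketch in Python, not PostgreSQL code: ToyServer, its fields, and the visibility rule ("locally committed and no longer listed as in progress") are simplifying assumptions made up for the illustration. It shows why a clean exit of A, with no standby ack, would let another session see A's transaction as committed.

```python
# Toy model of the visibility race. None of these names are PostgreSQL APIs;
# they are hypothetical stand-ins for the ProcArray and commit status.

class ToyServer:
    def __init__(self):
        self.proc_array = set()         # xids still listed as in progress
        self.locally_committed = set()  # xids whose commit is flushed locally
        self.standby_acked = set()      # xids the standby has confirmed

    def begin(self, xid):
        self.proc_array.add(xid)

    def commit_locally(self, xid):
        # The commit is durable locally, but the backend stays in
        # proc_array while it waits for the standby's acknowledgement.
        self.locally_committed.add(xid)

    def backend_exits(self, xid):
        # A clean exit removes the backend from proc_array
        # (and, in the real system, releases its locks).
        self.proc_array.discard(xid)

    def visible_as_committed(self, xid):
        # Simplified rule: committed locally and no longer in progress.
        return xid in self.locally_committed and xid not in self.proc_array


def run_race():
    server = ToyServer()
    xid_a = 100
    server.begin(xid_a)
    server.commit_locally(xid_a)
    seen_while_waiting = server.visible_as_committed(xid_a)
    server.backend_exits(xid_a)  # fast shutdown: A exits without an ack
    seen_after_exit = server.visible_as_committed(xid_a)
    replicated = xid_a in server.standby_acked
    return seen_while_waiting, seen_after_exit, replicated
```

In this model, run_race() reports that the transaction is hidden while A waits, becomes visible once A exits, and was never acknowledged by the standby, which is the state B's client must not be allowed to observe.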
> 
> The only idea I have for allowing fast shutdown to still be fast, even
> when sync rep is involved, is to shut down the system in two phases.
> The postmaster would need to stop accepting new connections, and first
> kill off all the backends that aren't waiting for sync rep.  Then,
> once all remaining backends are waiting for sync rep, we can have them
> proceed as above: close the connection without acking the commit or
> throwing ERROR/FATAL, and exit.  That's pretty complicated, especially
> given the rule that the postmaster mustn't touch shared memory, but I
> don't see any alternative.  
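
The two-phase ordering proposed above might look roughly like the following. Again this is a Python sketch under stated assumptions, not postmaster code: Backend, waiting_for_sync_rep, and two_phase_fast_shutdown are hypothetical names, and the model ignores both the "postmaster mustn't touch shared memory" constraint and the possibility that a backend starts waiting for sync rep while phase 1 is in progress.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    # Hypothetical stand-in for a server backend process.
    name: str
    waiting_for_sync_rep: bool
    alive: bool = True


def two_phase_fast_shutdown(backends):
    """Terminate backends in two phases; return (phase, name) pairs in order."""
    order = []
    # Before phase 1 (not modelled): stop accepting new connections.
    # Phase 1: terminate every backend that is not waiting for sync rep,
    # so nobody is left who could observe state exposed in phase 2.
    for b in backends:
        if b.alive and not b.waiting_for_sync_rep:
            b.alive = False
            order.append((1, b.name))
    # Phase 2: only sync rep waiters remain. Each one closes its connection
    # without acking the commit or raising ERROR/FATAL, then exits.
    for b in backends:
        if b.alive:
            b.alive = False
            order.append((2, b.name))
    return order
```

For example, with A waiting for sync rep and B and C not waiting, B and C are terminated in phase 1 and A only in phase 2, so no surviving session can read data that A's exit makes visible.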


> We could just not allow fast shutdown, as
> now, but I think that's worse.

Please explain why not allowing fast shutdown makes it worse?

For me, I'd rather not support a whole bunch of dubious code, just to
allow you to type -m fast when you can already type -m immediate.

What extra capability are we actually delivering by doing that??
The risk of introducing a bug and thereby losing data far outweighs the
rather dubious benefit.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services


