Re: Sync Rep and shutdown Re: Sync Rep v19 - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Sync Rep and shutdown Re: Sync Rep v19 |
Date | |
Msg-id | 1300265474.20494.7579.camel@ebony |
In response to | Re: Sync Rep and shutdown Re: Sync Rep v19 (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Sync Rep and shutdown Re: Sync Rep v19 |
List | pgsql-hackers |
On Tue, 2011-03-15 at 22:07 -0400, Robert Haas wrote:
> On Wed, Mar 9, 2011 at 11:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > Same as above. I think that it's more problematic to leave the code
> > as it is. Because smart/fast shutdown can make the server get stuck
> > until immediate shutdown is requested.
>
> I agree that the current state of affairs is a problem. However,
> after looking through the code somewhat carefully, it looks a bit
> difficult to fix. Suppose that backend A is waiting for sync rep. A
> fast shutdown is performed. Right now, backend A shrugs its shoulders
> and does nothing. Not good. But suppose we change it so that backend
> A closes the connection and exits without either confirming the commit
> or throwing ERROR/FATAL. That seems like correct behavior, since, if
> we weren't using sync rep, the client would have to interpret that as
> indicating that the connection died in mid-COMMIT, and mustn't
> assume anything about the state of the transaction. So far so good.
>
> The problem is that there may be another backend B waiting on a lock
> held by A. If backend A exits cleanly (without a PANIC), it will
> remove itself from the ProcArray and release locks. That wakes up B,
> which can now go do its thing. If the operating system is a bit on
> the slow side delivering the signal to B, then the client to which B
> is connected might manage to see a database state that shows the
> transaction previously running in A as committed, even though that
> transaction wasn't committed. That would stink, because the whole
> point of having A hold onto locks until the standby ack'd the commit
> was that no other transaction would see it as committed until it was
> replicated.
>
> This is a pretty unlikely race condition in practice but people who
> are running sync rep are intending precisely to guard against unlikely
> failure scenarios.
>
> The only idea I have for allowing fast shutdown to still be fast, even
> when sync rep is involved, is to shut down the system in two phases.
> The postmaster would need to stop accepting new connections, and first
> kill off all the backends that aren't waiting for sync rep. Then,
> once all remaining backends are waiting for sync rep, we can have them
> proceed as above: close the connection without acking the commit or
> throwing ERROR/FATAL, and exit. That's pretty complicated, especially
> given the rule that the postmaster mustn't touch shared memory, but I
> don't see any alternative. We could just not allow fast shutdown, as
> now, but I think that's worse.

Please explain why not allowing fast shutdown makes it worse?

For me, I'd rather not support a whole bunch of dubious code, just to
allow you to type -m fast when you can already type -m immediate.

What extra capability are we actually delivering by doing that??
The risk of introducing a bug and thereby losing data far outweighs the
rather dubious benefit.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
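
For anyone following the thread, the ordering in the two-phase shutdown idea quoted above can be modelled roughly as below. This is only a toy sketch in Python, not PostgreSQL code: `Backend`, `State`, and `fast_shutdown` are hypothetical names made up for illustration, and the model ignores signals, shared memory, and the postmaster's restrictions entirely. It shows just one thing: backends that are not waiting for sync rep go away first, so that when the waiters finally release their locks nobody is left to observe the unreplicated commit.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()           # doing ordinary work, possibly blocked on a lock
    WAITING_SYNC_REP = auto()  # committed locally, waiting for the standby ack
    EXITED = auto()


@dataclass
class Backend:
    # Hypothetical stand-in for a server backend process.
    name: str
    state: State
    holds_locks: bool = True


def fast_shutdown(backends):
    """Return the order in which backends exit under the two-phase idea.

    Assumes new connections have already been refused, so the list of
    backends cannot grow while this runs.
    """
    exit_order = []

    # Phase 1: terminate every backend that is NOT waiting for sync rep.
    # The waiters keep their locks, so their locally committed changes
    # stay invisible to the sessions that are being terminated.
    for b in backends:
        if b.state is State.RUNNING:
            b.state = State.EXITED
            b.holds_locks = False
            exit_order.append(b.name)

    # Phase 2: only sync-rep waiters remain. Each one closes its connection
    # without acking the commit or raising ERROR/FATAL, then exits. Releasing
    # locks is harmless now because no other backend is left to wake up.
    for b in backends:
        if b.state is State.WAITING_SYNC_REP:
            b.state = State.EXITED
            b.holds_locks = False
            exit_order.append(b.name)

    return exit_order


if __name__ == "__main__":
    backends = [
        Backend("A", State.WAITING_SYNC_REP),  # waiting for the standby
        Backend("B", State.RUNNING),           # blocked on a lock held by A
    ]
    print(fast_shutdown(backends))
```

In this model B always exits before A, which is the property the race described above needs. Whether the real complexity is worth it compared with simply using -m immediate is exactly the question raised in the reply.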