Re: Sync Rep and shutdown Re: Sync Rep v19 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Sync Rep and shutdown Re: Sync Rep v19
Date
Msg-id AANLkTimQC+sq69_kQ6kKEmtMSxZg26tpiATuGoejqQCN@mail.gmail.com
In response to Re: Sync Rep and shutdown Re: Sync Rep v19  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Sync Rep and shutdown Re: Sync Rep v19
List pgsql-hackers
On Wed, Mar 16, 2011 at 4:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2011-03-15 at 22:07 -0400, Robert Haas wrote:
>> On Wed, Mar 9, 2011 at 11:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> > Same as above. I think that it's more problematic to leave the code
>> > as it is. Because smart/fast shutdown can make the server get stuck
>> > until immediate shutdown is requested.
>>
>> I agree that the current state of affairs is a problem.  However,
>> after looking through the code somewhat carefully, it looks a bit
>> difficult to fix.  Suppose that backend A is waiting for sync rep.  A
>> fast shutdown is performed.  Right now, backend A shrugs its shoulders
>> and does nothing.  Not good.  But suppose we change it so that backend
>> A closes the connection and exits without either confirming the commit
>> or throwing ERROR/FATAL.  That seems like correct behavior, since, if
>> we weren't using sync rep, the client would have to interpret that as
>> indicating that the connection died in mid-COMMIT, and mustn't
>> assume anything about the state of the transaction.  So far so good.
>>
>> The problem is that there may be another backend B waiting on a lock
>> held by A.  If backend A exits cleanly (without a PANIC), it will
>> remove itself from the ProcArray and release locks.  That wakes up B,
>> which can now go do its thing.  If the operating system is a bit on
>> the slow side delivering the signal to B, then the client to which B
>> is connected might manage to see a database state that shows the
>> transaction previously running in A as committed, even though that
>> transaction wasn't committed.  That would stink, because the whole
>> point of having A hold onto locks until the standby ack'd the commit
>> was that no other transaction would see it as committed until it was
>> replicated.
>>
>> This is a pretty unlikely race condition in practice but people who
>> are running sync rep are intending precisely to guard against unlikely
>> failure scenarios.
>>
>> The only idea I have for allowing fast shutdown to still be fast, even
>> when sync rep is involved, is to shut down the system in two phases.
>> The postmaster would need to stop accepting new connections, and first
>> kill off all the backends that aren't waiting for sync rep.  Then,
>> once all remaining backends are waiting for sync rep, we can have them
>> proceed as above: close the connection without acking the commit or
>> throwing ERROR/FATAL, and exit.  That's pretty complicated, especially
>> given the rule that the postmaster mustn't touch shared memory, but I
>> don't see any alternative.
>
>
>> We could just not allow fast shutdown, as
>> now, but I think that's worse.
>
> Please explain why not allowing fast shutdown makes it worse?
>
> For me, I'd rather not support a whole bunch of dubious code, just to
> allow you to type -m fast when you can already type -m immediate.
>
> What extra capability are we actually delivering by doing that??
> The risk of introducing a bug and thereby losing data far outweighs the
> rather dubious benefit.

Well, my belief is that when users ask the database to shut down, they
want it to work.  If I'm the only one who thinks that, then whatever.
But I firmly believe we'll get bug reports about this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
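
For illustration only, the two-phase fast shutdown idea quoted above can be
sketched as a toy Python model. Nothing here is PostgreSQL code: `Backend`,
`State`, and `two_phase_fast_shutdown` are hypothetical names invented for
this sketch, and real backends are processes signalled by the postmaster,
not objects in a list. The only point is the ordering: backends that are not
waiting for sync rep exit first, so nobody is left to observe the locks
released by the waiters.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()           # not waiting on a standby ack
    WAITING_SYNC_REP = auto()  # commit done locally, awaiting standby ack
    EXITED = auto()


@dataclass
class Backend:
    # Hypothetical stand-in for a backend process (assumption, not a
    # PostgreSQL structure).
    name: str
    state: State


def two_phase_fast_shutdown(backends):
    """Return the names of backends in the order they exit."""
    exit_order = []

    # Phase 1: terminate every backend that is not waiting for sync rep.
    # Waiters still hold their locks, so no surviving backend can see an
    # unreplicated commit.
    for b in backends:
        if b.state is State.RUNNING:
            b.state = State.EXITED
            exit_order.append(b.name)

    # Phase 2 may begin only once every remaining backend is a waiter.
    assert all(b.state is not State.RUNNING for b in backends)

    # Phase 2: waiters close the connection without acking the commit or
    # raising ERROR/FATAL, then exit.
    for b in backends:
        if b.state is State.WAITING_SYNC_REP:
            b.state = State.EXITED
            exit_order.append(b.name)

    return exit_order


if __name__ == "__main__":
    procs = [Backend("A", State.WAITING_SYNC_REP), Backend("B", State.RUNNING)]
    print(two_phase_fast_shutdown(procs))  # prints ['B', 'A']
```

In this model B always exits before A releases anything, which is the
property the race described in the quoted text would otherwise violate. The
sketch ignores the hard parts raised in the thread, such as the postmaster
not being allowed to touch shared memory.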

