Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication. - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
Msg-id AANLkTinSxW_DrJwwfcOoEoJV-UeuVZjrtoSKeP5R69WE@mail.gmail.com
In response to Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
List pgsql-hackers
On Fri, Mar 18, 2011 at 12:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2011-03-18 at 17:47 +0200, Heikki Linnakangas wrote:
>> On 18.03.2011 16:52, Kevin Grittner wrote:
>> > Simon Riggs<simon@2ndQuadrant.com>  wrote:
>> >
>> >> In PostgreSQL other users cannot observe the commit until an
>> >> acknowledgement has been received.
>> >
>> > Really?  I hadn't picked up on that.  That makes for a lot of
>> > complication on crash-and-recovery of a master, but if we can pull
>> > it off, that's really cool.  If we do that and MySQL doesn't, we
>> > definitely don't want to use the same terminology they do, which
>> > would imply the same behavior.
>>
>> To be clear: other users cannot observe the commit until standby
>> acknowledges it - unless the master crashes while waiting for the
>> acknowledgment. If that happens, the commit will be visible to everyone
>> after recovery.
>
> No, only in the case where you choose not to failover to the standby
> when you crash, which would be a fairly strange choice after the effort
> to set up the standby. In a correctly configured and operated cluster
> what I say above is fully correct and needs no addendum.

Except it doesn't work that way.  If, say, a backend on the master
core dumps, the system will perform a crash and restart cycle, and the
transaction will become visible whether it's yet been replicated or
not.  Since we now have a GUC to suppress restart after a backend
crash, it's theoretically possible to set up the system so that this
doesn't occur, but it'd take quite a bit of work to make it robust and
automatic, and it's certainly not the default out of the box.

The fundamental problem here is that once you update CLOG and flush
the corresponding WAL record, there is no going backward.  You can
hold the system in some intermediate state where the transaction still
holds locks and is excluded from MVCC snapshots, but there's no way to
back up.  So there are bound to be corner cases where the
wait doesn't last as long as you want, and stuff leaks out around the
edges.  It's fundamentally impossible to guarantee that you'll remain
in that intermediate state forever - what do you do if a meteor hits
the synchronous standby and at the same time you lose power to the
master?  No amount of configuration will save you from coming back on
line with a visible-but-unreplicated transaction.  I'm not knocking
the system; I think what we have is impressively good.  But pretending
that corner cases can't happen gets us nowhere.
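
To make the sequence above concrete, here is a toy Python model of the commit path. It is only an illustration under stated assumptions, not PostgreSQL code: the class `ToyMaster` and all of its method and field names are invented for this sketch, and the "WAL" is just an in-memory set. It shows that the commit record becomes durable first, that the wait for the standby is purely in-memory state, and that a crash and restart therefore exposes the transaction whether or not it was replicated.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Toy model of a master's commit path (hypothetical, not PostgreSQL code)."""

    # Commit records already flushed locally; these survive a crash.
    durable_commits: set = field(default_factory=set)
    # Transactions committed locally but not yet acknowledged by the standby.
    # This lives only in memory and is lost on a crash.
    waiting_for_ack: set = field(default_factory=set)

    def commit(self, xid: int) -> None:
        # Step 1: flush the commit record. After this there is no going back.
        self.durable_commits.add(xid)
        # Step 2: keep the transaction out of snapshots until the standby acks.
        self.waiting_for_ack.add(xid)

    def standby_ack(self, xid: int) -> None:
        # The standby confirmed the commit; stop hiding the transaction.
        self.waiting_for_ack.discard(xid)

    def visible(self) -> set:
        # Other sessions see only commits that are durable and no longer waiting.
        return self.durable_commits - self.waiting_for_ack

    def crash_and_recover(self) -> None:
        # Recovery replays the durable commit records. The in-memory wait
        # state is gone, so every locally committed transaction is visible.
        self.waiting_for_ack.clear()


master = ToyMaster()
master.commit(100)
visible_while_waiting = master.visible()    # xid 100 is still hidden here
master.crash_and_recover()
visible_after_restart = master.visible()    # xid 100 has leaked out
```

In this model the only way to keep xid 100 hidden after the crash is to never bring the master back up, which is the failover-only case described in the quoted text.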

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

