Re: Sync Rep Design - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject Re: Sync Rep Design
Date
Msg-id 4D1F6938.1090101@kaltenbrunner.cc
In response to Re: Sync Rep Design  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Sync Rep Design  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 01/01/2011 06:29 PM, Simon Riggs wrote:
> On Sat, 2011-01-01 at 18:13 +0100, Stefan Kaltenbrunner wrote:
>> On 01/01/2011 05:55 PM, Simon Riggs wrote:
>>>
>>> It appears to me there has been substantial confusion over alternatives,
>>> because of a misunderstanding about how synchronisation works. Requiring
>>> confirmation that standbys are in sync is *not* the same thing as them
>>> actually being in sync. Every single proposal made by anybody here on
>>> hackers that supports multiple standby servers suffers from the same
>>> issue: when the primary crashes you need to work out which standby
>>> server is ahead.
>>
>> aaah that was exactly what I was after - so the problem is that when you
>> have a sync standby it will technically always be "in front" of the
>> master (because it needs to fsync/apply/whatever before the master).
>> In the end the question boils down to what is "the bigger problem" in
>> the case of a lost master:
>
>> a) a transaction that was confirmed on the master but might not be on
>> any of the surviving sync standbys (or you will never know if it is) -
>> this is how I understand the proposal so far
>
> No that cannot happen, the current situation is that we will fsync WAL
> on the master, then fsync WAL on the standby, then reply to the master.
> The standby is never ahead of the master, at any point.

hmm, maybe my "surviving" standbys was not clear (the case I'm wondering
about is whole-datacenter failures, which might take out more than just
the master) - consider three boxes, one master and two standbys, and
semi-sync replication (i.e. a reply from any one of the standbys is enough).

1. master fsyncs wal
2. standby #1 fsyncs and replies
3. master confirms commit
4. disaster strikes and destroys the master and standby #1, while standby #2
never had time to apply the change (IO/CPU load, latency, whatever)
5. now you have a sync standby that is missing something that was
committed on the master and confirmed to the client, and no way to verify
that this thing happened (same problem with more than two standbys - as
long as you lose ONE standby and the master at the same time you will
never be sure)
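
To make the sequence above concrete, here is a minimal toy sketch in Python. It is not PostgreSQL code: the Node class, the commit() helper and the "reachable" set are hypothetical names made up for illustration, and a WAL is modelled as nothing more than a list of transaction ids. It only assumes the semi-sync rule described above (one standby reply is enough to confirm).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a server; wal holds the fsynced transactions."""
    name: str
    wal: list = field(default_factory=list)
    alive: bool = True


def commit(txn, master, standbys, reachable):
    """Toy semi-sync commit: confirm once any ONE standby has fsynced."""
    master.wal.append(txn)              # step 1: master fsyncs WAL
    acks = 0
    for sb in standbys:
        if sb.name in reachable:        # a lagging standby gets nothing in time
            sb.wal.append(txn)          # step 2: standby fsyncs and replies
            acks += 1
    return acks >= 1                    # step 3: master confirms the commit


master = Node("master")
sb1, sb2 = Node("standby1"), Node("standby2")

# standby2 is too slow (IO/CPU load, latency, whatever) to get the record
confirmed = commit("txn-42", master, [sb1, sb2], reachable={"standby1"})

# step 4: the disaster takes out the master and standby1 together
master.alive = sb1.alive = False
survivors = [n for n in (master, sb1, sb2) if n.alive]

# step 5: the client was told "committed", yet no survivor has the record
lost = confirmed and all("txn-42" not in n.wal for n in survivors)
print(confirmed, [n.name for n in survivors], lost)
```

In this toy model the only survivor is standby2, whose WAL is empty, so nothing on it even hints that txn-42 ever existed.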



what is it that I'm missing here?


>
>> b) a transaction that was not yet confirmed on the master but might have
>> been applied on the surviving standby before the disaster - this is what I
>> understand "confirm from all sync standbys" could result in.
>
> Yes, that is described in the docs changes I published.
>
> (a) was discussed, but ruled out, since it would require any crash/immed
> shutdown of the master to become a failover, or have some kind of weird
> back channel to give the missing data back.
>
> There hasn't been any difference of opinion in this area, that I am
> aware of. All proposals have offered (b).

hmm, I'm confused now - any chance you mixed up a) and b) here? In a)
no back channel is needed, because the standby could just fetch the
missing data from the master.
If that is the case, I agree that it would be hard to get replication
up again after a crash of the master with a standby that is ahead, but in
the end it would be a business decision (as in conflict resolution)
what to do - take the "ahead" standby's data and use that, or destroy the
old standby and recreate it.



Stefan

