Re: Sync Rep Design - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Sync Rep Design
Msg-id 4D205A94.6070205@2ndquadrant.com
In response to Re: Sync Rep Design  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2.1.2011 5:36, Robert Haas wrote:
> On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Yes, working out the math is a good idea. Things are much clearer if we
>> do that.
>>
>> Let's assume we have 98% availability on any single server.
>>
>> 1. Having one primary and 2 standbys, either of which can acknowledge,
>> and we never lock up if both standbys fail, then we will have 99.9992%
>> server availability. (So PostgreSQL hits "5 Nines", with data
>> guarantees). ("Maximised availability")
> I don't agree with this math.  If the master and one standby fail
> simultaneously, the other standby is useless, because it may or may
> not be caught up with the master.  You know that the last transaction
> acknowledged as committed by the master is on at least one of the two
> standbys, but you don't know which one, and so you can't safely
> promote the surviving standby.
> (If you are working in an environment where promoting the surviving
> standby when it's possibly not caught up is OK, then you don't need
> sync rep in the first place: you can just run async rep and get much
> better performance.)
> So the availability is 98% (you are up when the master is up) + 98%^2
> * 2% (you are up when both slaves are up and the master is down) =
> 99.92%.  If you had only a single standby, then you could be certain
> that any commit acknowledged by the master was on that standby.  Thus
> your availability would be 98% (up when master is up) + 98% * 2% (you
> are up when the master is down and the slave is up) = 99.96%.
>
OTOH, in the case where you need _all_ the slaves to confirm, any failing
slave brings the master down, so each slave you add lowers availability
by roughly another 2%.
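The availability figures traded in this thread can be reproduced with a few lines of arithmetic. This is a sketch under the thread's own simplifying assumptions: every server is independently up with probability 0.98, and in the "all must ack" case the system counts as down whenever any node is down (failover is ignored). The function names are hypothetical.

```python
"""Availability arithmetic from the Sync Rep thread.

Assumption: each server is independently up with probability P_UP = 0.98,
the per-server figure proposed in the thread.
"""

P_UP = 0.98


def single_standby(p: float) -> float:
    # Up when the master is up, or when the master is down and the one
    # sync standby (which holds every acknowledged commit) is up.
    return p + (1 - p) * p


def two_standbys_one_ack(p: float) -> float:
    # With 1-of-2 acks, a lone surviving standby may lack acknowledged
    # commits, so failover is only safe when BOTH standbys are up.
    return p + (1 - p) * p ** 2


def not_all_down(p: float, nodes: int = 3) -> float:
    # The "5 nines" figure: up unless all three servers are down at once.
    return 1 - (1 - p) ** nodes


def all_standbys_must_ack(p: float, standbys: int) -> float:
    # Simplified model: commits block whenever any node is down, so each
    # extra standby multiplies availability by p (failover ignored).
    return p ** (1 + standbys)


if __name__ == "__main__":
    print(f"1 standby:              {single_standby(P_UP):.4%}")
    print(f"2 standbys, 1 must ack: {two_standbys_one_ack(P_UP):.4%}")
    print(f"not all 3 nodes down:   {not_all_down(P_UP):.4%}")
    for n in range(1, 4):
        print(f"{n} standby(s), all ack:  {all_standbys_must_ack(P_UP, n):.4%}")
```

The first three lines reproduce the 99.96%, 99.92% and 99.9992% figures quoted above; the loop shows each added all-must-ack standby scaling availability by a further factor of 0.98.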

The solution to achieving good durability AND availability is requiring
N past the post (acknowledgement from N standbys) instead of 1 past the
post.

In this case you can get to 99.9992% availability with a master + 3 sync
slaves, 2 of which must ACK each commit.
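Why 2-of-3 acknowledgement answers the objection about not knowing which standby to promote: if every commit is confirmed by 2 of 3 standbys, any 2 survivors must include at least one confirmer. The sketch below checks this by brute force. It assumes, as with streamed WAL, that a standby holding a commit also holds everything before it, so promoting the most advanced survivor is safe; the function name is hypothetical.

```python
from itertools import combinations


def survivors_cover_every_commit(standbys: int, acks: int, survivors: int) -> bool:
    """True if every possible set of `acks` confirming standbys overlaps
    every possible set of `survivors` standbys left after a failure."""
    nodes = range(standbys)
    return all(
        not set(acked).isdisjoint(alive)
        for acked in combinations(nodes, acks)
        for alive in combinations(nodes, survivors)
    )


# 2 standbys, 1 ack required, master + 1 standby lost.
print(survivors_cover_every_commit(standbys=2, acks=1, survivors=1))  # False
# 3 standbys, 2 acks required, master + 1 standby lost.
print(survivors_cover_every_commit(standbys=3, acks=2, survivors=2))  # True
```

By the pigeonhole principle the check reduces to `acks + survivors > standbys`; the enumeration simply makes that overlap explicit.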

---------------------------------------
Hannu Krosing
Performance and Infinite Scalability Consultant
http://www.2ndQuadrant.com/books/