Re: Standalone synchronous master - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Standalone synchronous master
Date
Msg-id 52CDF4E1.8000604@nasby.net
In response to Re: Standalone synchronous master  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Standalone synchronous master
List pgsql-hackers
On 1/8/14, 6:05 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 01/08/2014 03:27 PM, Tom Lane wrote:
>>> What we lack, and should work on, is a way for sync mode to have M larger
>>> than one.  AFAICS, right now we'll report commit as soon as there's one
>>> up-to-date replica, and some high-reliability cases are going to want
>>> more.
>> "Sync N times" is really just a guarantee against data loss as long as
>> you lose N-1 servers or fewer.  And it becomes an even
>> lower-availability solution if you don't have at least N+1 replicas.
>> For that reason, I'd like to see some realistic actual user demand
>> before we take the idea seriously.
> Sure.  I wasn't volunteering to implement it, just saying that what
> we've got now is not designed to guarantee data survival across failure
> of more than one server.  Changing things around the margins isn't
> going to improve such scenarios very much.
>
> It struck me after re-reading your example scenario that the most
> likely way to figure out what you had left would be to see if some
> additional system (think Nagios monitor, or monitors) had records
> of when the various database servers went down.  This might be
> what you were getting at when you said "logging", but the key point
> is it has to be logging done on an external server that could survive
> failure of the database server.  postmaster.log ain't gonna do it.

Yeah, and I think that the logging command that was suggested allows for that *if configured correctly*.

Automatic degradation to async is useful for protecting you against all modes of a single failure: Master fails, you've got the replica. Replica fails, you've got the master.

But fit hits the shan as soon as you get a double failure, and that double failure can be very subtle. Josh's case is not subtle: You lost power AND the master died. You KNOW you have two failures.

But what happens if there's a network blip that's not large enough to notice (but large enough to degrade your replication) and the master dies? Now you have no clue if you've lost data.

Compare this to async: if the master goes down (one failure), you have zero clue if you lost data or not. At least with auto-degradation you know you have to have 2 failures to suffer data loss.
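
To make the failure counting concrete, here is a rough sketch of a toy model. It is not PostgreSQL code: the event names, the mode names, and both functions are made up for illustration. It assumes one master and one replica, and that any replication interruption happens before the master dies.

```python
from itertools import combinations

# Hypothetical failure events for a master + single replica setup.
MASTER_DIES = "master_dies"
REPLICA_DIES = "replica_dies"
LINK_BLIP = "link_blip"  # replication interrupted, e.g. a brief network outage
EVENTS = (MASTER_DIES, REPLICA_DIES, LINK_BLIP)


def can_lose_data(mode, failures):
    """Return True if this set of failures can lose committed transactions."""
    failures = set(failures)
    if MASTER_DIES not in failures:
        # The master still holds every committed transaction.
        return False
    if mode == "async":
        # Commits never wait for the replica, so it may be behind.
        return True
    if mode == "sync_auto_degrade":
        # Commits wait for the replica unless replication was interrupted,
        # in which case the master degraded to async and committed alone.
        return bool(failures & {REPLICA_DIES, LINK_BLIP})
    raise ValueError(f"unknown mode: {mode}")


def min_failures_for_loss(mode):
    """Smallest number of simultaneous failures that can lose data."""
    for n in range(1, len(EVENTS) + 1):
        if any(can_lose_data(mode, combo) for combo in combinations(EVENTS, n)):
            return n
    return None


for mode in ("async", "sync_auto_degrade"):
    print(mode, min_failures_for_loss(mode))
```

Under those assumptions the model needs one failure for data loss with async and two with auto-degradation. The unnoticed network blip counts as one of the two, which is exactly the subtle case.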
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


