Re: Replication - Mailing list pgsql-hackers

From Chris Browne
Subject Re: Replication
Date
Msg-id 60wt8y195v.fsf@dba2.int.libertyrms.com
In response to Re: PostgreSQL on 64 bit Linux  ("Luke Lonergan" <llonergan@greenplum.com>)
List pgsql-hackers
pgsql@j-davis.com (Jeff Davis) writes:
> On Wed, 2006-08-23 at 13:36 +0200, Markus Schiltknecht wrote:
>> Hannu Krosing wrote:
>> > But if you have very few writes, then there seems no reason to do sync
>> > anyway.
>> 
>> I think there is one: high-availability. A standby-server which can 
>> continue if your primary fails. Of course sync is only needed if you 
>> absolutely cannot afford losing any committed transaction.
>> 
>
> I disagree about high-availability. In fact, I would say that sync
> replication is trading availability and performance for synchronization
> (which is a valid tradeoff, but costly). 
>
> If you have an async system, all nodes must go down for the system to go
> down.
>
> If you have a sync system, if any node goes down the system goes down.
> If you plan on doing failover, consider this: what if it's not obvious
> which system is still up? What if the network route between the two
> systems goes down (or just becomes too slow to replicate over), but
> clients can still connect to both servers? Then you have two systems
> that both think that the other system went down, and both start
> accepting transactions. Now you no longer have replication at all.

That is why, for multimaster, there's a need for both an automatic
policy and some human intervention.

- You need an automatic determination of "quorum", where, to be safe,
  it is only permissible for a set of m servers to believe themselves
  to be active if they number more than 1/2 of the total of expected
  servers.

  Thus, if there are 13 servers in the cluster, then "quorum" is 7
  servers.  If a set of 6 servers get cut off from the rest of the
  network, they don't number at least 7, and thus know that they
  can't represent a quorum.
 

- And if conditions change, a human may need to change the quorum
  number.  Starting from the 13 servers above:
  If 4 new nodes get added (17 total), quorum moves up to 9.
  If 5 nodes get dropped instead (8 total), quorum moves down to 5.
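
To make the arithmetic above concrete, here is a minimal sketch in
Python.  The function names quorum_size and has_quorum are made up
for illustration; they are not part of PostgreSQL or of any existing
replication system, and the sketch assumes the expected server count
is a number that an administrator maintains by hand.

```python
def quorum_size(expected_servers: int) -> int:
    """Smallest group size that is strictly more than half of the
    expected servers (hypothetical helper, for illustration only)."""
    if expected_servers < 1:
        raise ValueError("expected_servers must be at least 1")
    return expected_servers // 2 + 1


def has_quorum(reachable_servers: int, expected_servers: int) -> bool:
    """True if a group of mutually reachable servers is large enough
    to consider itself the active part of the cluster."""
    return reachable_servers >= quorum_size(expected_servers)


if __name__ == "__main__":
    print(quorum_size(13))    # 13 expected servers
    print(has_quorum(6, 13))  # a group of 6 cut off from the other 7
    print(has_quorum(7, 13))  # the remaining 7
```

Because a quorum is strictly more than half of the expected servers,
two disjoint groups can never both hold one, which is what keeps both
sides of a network split from accepting transactions at once.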

Deciding when to throw a node out of the quorum because it is
responding too slowly is still not completely trivial, but having a
quorum policy does address your issue.
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/linux.html
"Be humble.   A lot happened before  you were born."   - Life's Little
Instruction Book

