Re: Synchronization levels in SR - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Synchronization levels in SR
Date
Msg-id 1274818590.6203.2759.camel@ebony
In response to Re: Synchronization levels in SR  (Yeb Havinga <yebhavinga@gmail.com>)
Responses Re: Synchronization levels in SR
List pgsql-hackers
On Tue, 2010-05-25 at 21:19 +0200, Yeb Havinga wrote:
> Simon Riggs wrote:
> > How we handle degraded mode is important, yes. Whatever parameters we
> > choose the problem will remain the same.
> >
> > Should we just ignore degraded mode and respond as if nothing bad had
> > happened? Most people would say not.
> >
> > If we specify server1 = synch and server2 = async we then also need to
> > specify what happens if server1 is down. People might often specify
> >     if (server1 == down) server2 = synch.
> >   
> I have a hard time imagining including async servers in the quorum. If
> an async server's vote is necessary to reach quorum due to a 'real' sync
> standby server failure, it would mean that the async-intended standby is
> now also in sync with the master transactions. IMHO this is a bad
> situation, since instead of the DBA getting the error: "not enough sync
> standbys to reach quorum", he'll now get "database is slow" complaints,
> only to find out later that too many sync standby servers went south.
> (under the assumption that async servers are mostly on links too slow to
> consider for sync standby).

Yeh, there's difficulty either way. 

We don't need to think of servers as being "synch" or "async"; more
likely we would rate them in terms of typical synchronisation delay. So
yeh, calling them "fast" and "slow" in terms of synchronisation delay
makes sense.

Some people with a low xact rate and a high need for protection might
want to switch across to the slow server and keep running. If not,
max_synch_delay would trip and the action you selected,
synch_failure_action = rollback, would apply.
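
As a rough illustration of that behaviour, here is a minimal sketch in
Python. max_synch_delay and synch_failure_action are the parameter names
floated in this mail, not existing settings; the Standby class, the
commit_outcome function and the millisecond figures are hypothetical and
only there to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Standby:
    name: str
    up: bool
    # Servers are rated by typical synchronisation delay, not as sync/async.
    typical_delay_ms: float


def commit_outcome(
    standbys: List[Standby],
    max_synch_delay_ms: float,
    synch_failure_action: str = "rollback",
) -> Tuple[str, Optional[str]]:
    """Pick the fastest live standby; fall back to the failure action
    when even that one would exceed the allowed delay (hypothetical rule)."""
    live = [s for s in standbys if s.up]
    if live:
        fastest = min(live, key=lambda s: s.typical_delay_ms)
        if fastest.typical_delay_ms <= max_synch_delay_ms:
            return ("commit", fastest.name)
    # No live standby is fast enough: max_synch_delay "trips".
    return (synch_failure_action, None)
```

With the fast standby down, a tolerant site (large max_synch_delay)
keeps running against the slow server, while a strict one gets the
configured failure action instead.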

The realistic response is to add a second "fast" sync server, to allow
you to stay up even when you lose one of the fast servers. That now
gives you 4 servers and the failure modes start to get real complex.
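
To get a feel for how quickly the cases multiply, the sketch below
enumerates the up/down combinations of the standbys. It assumes the four
servers are a master plus two "fast" standbys and one "slow" standby;
the server names and the three labels are hypothetical.

```python
from itertools import product

# Assumed layout: the master plus these three standbys (names are made up).
FAST = ("fast1", "fast2")
SLOW = ("slow1",)


def classify(up: dict) -> str:
    """Label one up/down combination of the standbys."""
    if any(up[s] for s in FAST):
        return "fast standby available"
    if any(up[s] for s in SLOW):
        return "only slow standby left"
    return "no standby"


def failure_modes() -> dict:
    """Count how many up/down combinations fall under each label."""
    counts: dict = {}
    names = FAST + SLOW
    for states in product((True, False), repeat=len(names)):
        label = classify(dict(zip(names, states)))
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Even this coarse grouping has to cover eight standby combinations, and
every extra rule (which server takes over, under what delay) splits the
groups further.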

Specifying rules to achieve what you're after would be much harder. Some
people might want that, but most people won't in the general case and if
they did specify them they'd likely get them wrong.

All of these issues show why I want to specify the synchronisation mode
as a USERSET. That will allow us to specify more easily which parts of
our application are important when the cluster is degraded and which
data is so critical it must reach multiple servers.
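
A minimal sketch of what a per-transaction setting could look like from
the application side, again in Python. The mode names, the
REQUIRED_ACKS table and both functions are hypothetical; the only idea
taken from this mail is that each part of the application chooses its
own synchronisation mode.

```python
# Hypothetical per-transaction modes and the number of standby
# acknowledgements each one waits for; these are not real settings.
REQUIRED_ACKS = {"local": 0, "one_standby": 1, "two_standbys": 2}


def commit_allowed(mode: str, live_standbys: int) -> bool:
    """True if a transaction using `mode` can be acknowledged right now."""
    return live_standbys >= REQUIRED_ACKS[mode]


def degraded_report(workload: dict, live_standbys: int) -> dict:
    """Map each part of the application to whether it can still commit."""
    return {
        part: commit_allowed(mode, live_standbys)
        for part, mode in workload.items()
    }
```

For example, with one live standby a workload of
{"session_log": "local", "orders": "one_standby", "payments":
"two_standbys"} would keep the first two parts running while the
payments transactions wait or fail.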

-- Simon Riggs           www.2ndQuadrant.com
