Re: Synchronization levels in SR - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Synchronization levels in SR |
Date | |
Msg-id | 1274860942.6203.2944.camel@ebony |
In response to | Re: Synchronization levels in SR (Fujii Masao <masao.fujii@gmail.com>) |
Responses | Re: Synchronization levels in SR |
List | pgsql-hackers |
On Wed, 2010-05-26 at 12:36 +0900, Fujii Masao wrote:
> On Wed, May 26, 2010 at 2:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > My suggestion is simply to have a single parameter (name unimportant)
> >
> > number_of_synch_servers_we_wait_for = N
> >
> > which is much easier to understand because it is phrased in terms of the
> > guarantee given to the transaction, not in terms of what the admin
> > thinks is the situation.
>
> How can we choose #2, #3 or #4 by using your proposed option?

> If "async", the standby never sends any ACK. If "recv", "fsync",
> or "redo", the standby sends the ACK when it has received, fsynced
> or replayed the WAL from the master, respectively.

Everything I've said about "per-standby" settings applies here, which
was based upon having just 2 settings: sync and async. If you have four
settings instead, things get even more complex. If we were going to
reduce complexity, it would be to reduce the number of options here to
just offering option #2 in the first phase.

AFAICS people would only ever select #2 or #4 anyway. IMHO #3 isn't
likely to be selected on its own because it performs badly for no real
benefit. Having two standbys, I might want to specify #2 to both, or if
one is down then #3 to the remaining standby instead.

Nobody else has yet tried to explain how we would specify what happens
when one of the standbys is down, with per-standby settings. Failure
modes are where the complexity is here. However we proceed, we must have
a discussion about how we specify the failure modes. This is not
something we should add on at the last minute; we should think about
that now and address it openly.

Oracle Data Guard is a great resource for what semantics we might need
to cover, but it's also a lesson in complexity from its per-standby
settings. Please look at net_timeout and alternate options in
particular.
See how difficult it is to specify failure modes, even though
Data Guard offers probably dozens of parameters and options - its
orientation is per-standby, not towards the transaction and the user.

> On the other hand, we add new GUC "max_synchronous_standbys"
> (I prefer it to "number_of_synch_servers_we_wait_for", but does
> anyone have better name?) as PGC_USERSET into postgresql.conf.
> It specifies the maximum number of standbys which transaction
> commit must wait for the ACK from.
>
> If max_synchronous_standbys is 0, no transaction commit waits for
> ACK even if some connected standbys set their replication_mode to
> "recv", "fsync" or "redo". If it's positive, transaction commit waits
> for N ACKs. N is the smaller number between max_synchronous_standbys
> and the actual number of connected "synchronous" standbys.

To summarise, I think we can get away with just 3 parameters:

    synchronous_replication = N     # similar in name to synchronous_commit
    synch_rep_timeout = T
    synch_rep_timeout_action = commit | abort

Conceptually, this is "I want at least N replica copies made of my
database changes, I will wait for up to T milliseconds to get that
otherwise I will do X". Very easy and clear for an application to
understand what guarantees it is requesting. Also very easy for the
administrator to understand the guarantees requested and how to
provision for them: to deliver robustness they typically need N+1
servers, or for even higher levels of robustness and performance N+2
etc.

Making synchronous_replication into a USERSET would be an industry
first: transaction-controlled robustness at every level.

--
 Simon Riggs           www.2ndQuadrant.com
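
The "N copies, wait up to T milliseconds, otherwise do X" semantics of the three proposed parameters can be sketched roughly as below. This is only an illustration of the idea as stated in the mail, not PostgreSQL code: the class `SyncRepSettings`, the function `resolve_commit`, and the `ack_times_ms` input (elapsed milliseconds at which each standby's ACK arrived) are hypothetical names invented for the example, and treating N = 0 as "never wait" is an assumption borrowed from the quoted max_synchronous_standbys description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class TimeoutAction(Enum):
    """Hypothetical model of synch_rep_timeout_action = commit | abort."""
    COMMIT = "commit"
    ABORT = "abort"


@dataclass(frozen=True)
class SyncRepSettings:
    """Hypothetical container for the three proposed parameters."""
    synchronous_replication: int              # N: number of standby ACKs wanted
    synch_rep_timeout_ms: int                 # T: maximum wait in milliseconds
    synch_rep_timeout_action: TimeoutAction   # X: what to do if T expires


def resolve_commit(settings: SyncRepSettings,
                   ack_times_ms: Iterable[int]) -> str:
    """Return "commit" or "abort" for one transaction.

    ack_times_ms holds, for each standby that answered, how many
    milliseconds after the commit request its ACK arrived.
    """
    wanted = settings.synchronous_replication
    if wanted <= 0:
        # Assumption: N = 0 means the transaction does not wait at all.
        return "commit"

    # Count only the ACKs that arrived within the timeout window.
    in_time = [t for t in ack_times_ms
               if t <= settings.synch_rep_timeout_ms]
    if len(in_time) >= wanted:
        return "commit"

    # Fewer than N copies confirmed within T: apply the configured action.
    return settings.synch_rep_timeout_action.value


if __name__ == "__main__":
    strict = SyncRepSettings(2, 100, TimeoutAction.ABORT)
    # One standby answers in 30 ms, the other only after 250 ms.
    print(resolve_commit(strict, [30, 250]))
```

Because the request is phrased per transaction rather than per standby, the sketch never needs to know which standby answered or how each one is configured, only how many ACKs arrived in time.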