Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Support for N synchronous standby servers - take 2 |
Date | |
Msg-id | 20151117.095706.240836667.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Support for N synchronous standby servers - take 2 (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Support for N synchronous standby servers - take 2
|
List | pgsql-hackers |
Hello, At Tue, 17 Nov 2015 01:09:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDhqGB=EtBfqnkHxR8T53d+8qMs4DPm5HVyq4bA2oR5eQ@mail.gmail.com> > > - Notation > > > > synchronous_standby_names, and synchronous_replication_method as > > a variable to provide other syntax is probably no argument > > except its name. But I feel synchronous_standby_num looks bit > > too specific. > > > > I'd like to propose if this doesn't reprise the argument on > > notation for replication definitions:p > > > > The following two GUCs would be enough to bear future expansion > > of notation syntax and/or method. > > > > synchronous_standby_names : as it is > > > > synchronous_replication_method: > > > > default is "1-priority", which means the same with the current > > meaning. possible additional values so far would be, > > > > "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...", > > where n is the number of required acknowledges. > > One question is that what is different between the leading "n" in > s_s_names and the leading "n" of "n-priority"? Ah. Sorry for the ambiguous description. 'n' in s_s_names representing an arbitrary integer number and that in "n-priority" is literally an "n", meaning "a format with any number of priority hosts" as a whole. As an instance, synchronous_replication_method = "n-priority" synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter" I added "n-" of "n-priority" to distinguish with "1-priority" so if we won't provide "1-priority" for backward compatibility, "priority" would be enough to represent the type. By the way, s_r_method is not essentially necessary but it would be important to avoid complexity of autodetection of formats including currently undefined ones. > > "n-quorum": the format of s_s_names is the same as above, but > > it is read in quorum context. The "n" of this is the same as above. > > These can be expanded, for example, as follows, but in future. > > > > "complex" : Michael's format. > > "json" : JSON? > > "json-ext": specify JSON in external file. > > > > Even after we have complex notations, I suppose that many use > > cases are coverd by the first tree notations. > > I'm not sure it's desirable to implement the all kind of methods into core. > I think it's better to extend replication in order to be more > extensibility like adding hook function. > And then other approach is implemented as a contrib module. I agree with you. I proposed the following internal design having that in mind. > > - Internal design > > > > What should be done in SyncRepReleaseWaiters() is calculating a > > pair of LSNs that can be regarded as synced and decide whether > > *this* walsender have advanced the LSN pair, then trying to > > release backends that wait for the LSNs *if* this walsender has > > advanced them. > > > > From such point, the proposed patch will make redundant trials > > to release backens. > > > > Addition to that, the patch looks to be a mixture of the current > > implement and the new feature. These are for the same objective > > so they cannot coexist each other, I think. As the result, codes > > for both quorum/priority judgement appear at multiple level in > > call tree. This would be an obstacle for future (possible) > > expansion. > > > > So, I think this feature should be implemented as following, > > > > SyncRepInitConfig reads the configuration and stores the result > > structure into elsewhere such like WalSnd->syncrepset_definition > > instead of WalSnd->sync_standby_priority, which should be > > removed. Nothing would be stored if the current wal sender is > > not a member of the defined replication set. Storing a pointer > > to matching function there would increase the flexibility but > > such implement in contrast will make the code difficult to be > > read.. (I often look for the entity of xlogreader->read_page() > > ;) > > > > Then SyncRepSyncedLsnAdvancedTo() instead of > > SyncRepGetSynchronousStandbys() returns an LSN pair that can be > > regarded as 'synced' according to specified definition of > > replication set and whether this walsender have advanced the > > LSNs. > > > > Finally, SyncRepReleaseWaiters() uses it to release backends if > > needed. > > > > The differences among quorum/priority or others are confined in > > SyncRepSyncedLsnAdvancedTo(). As the result, > > SyncRepReleaseWaiters would look as following. > > > > | SyncRepReleaseWaiters(void) > > | { > > | if (MyWalSnd->syncrepset_definition == NULL || ...) > > | return; > > | ... > > | if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos)) > > | { > > | /* I haven't advanced the synced LSNs */ > > | LWLockRelease(SyncRepLock); > > | rerturn; > > | } > > | /* Set the lsn first so that when we wake backends they will relase... > > > > I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo > > does but perhaps yes we can:p in effective manner.. > > > > What do you think about this? > > I agree with this design. > What SyncRepSyncedLsnAdvancedTo() does would be different for each > method, so we can implement "n-priority" style multiple sync > replication at first version. Maybe the first *additional* one if we decide to keep backward compatibility, as the discussion above. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: