Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Support for N synchronous standby servers - take 2 |
Date | |
Msg-id | CAD21AoDhqGB=EtBfqnkHxR8T53d+8qMs4DPm5HVyq4bA2oR5eQ@mail.gmail.com Whole thread Raw |
In response to | Re: Support for N synchronous standby servers - take 2 (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: Support for N synchronous standby servers - take 2
|
List | pgsql-hackers |
On Fri, Nov 13, 2015 at 12:52 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Hello, > > At Fri, 13 Nov 2015 09:07:01 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoC9Vi8wOGtXio3Z1NwoVfXBJPNFtt7+5jadVHKn17uHOg@mail.gmail.com> >> On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> > On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > ... >> >> This patch is a tool for discussion, so I'm not going to fix this bug >> >> until getting consensus. >> >> >> >> We are still under the discussion to find solution that can get consensus. >> >> I felt that it's difficult to select from the two approaches within >> >> this development cycle, and there would not be time to implement such >> >> big feature even if we selected. >> >> But this feature is obviously needed by many users. >> >> So I'm considering more simple and extensible something solution, the >> >> idea I posted is one of them. >> >> The another worth considering approach is that just specifying the >> >> number of sync standby. It also can cover the main use cases in >> >> some-cases. >> > >> > Yes, it covers main and simple use case like "I want to have multiple >> > synchronous replicas!". Even if we miss quorum commit at the first >> > version, the feature is still very useful. > > +1 > >> It can cover not only the case you mentioned but also main use case >> Michael mentioned by setting same application_name. >> And that first version patch is almost implemented, so just needs to >> be reviewed. >> >> I think that it would be good to implement the simple feature at the >> first version, and then coordinate the design based on opinion and >> feed backs from more user, use-case. > > Yeah. I agree with it. And I have two proposals in this > direction. > > - Notation > > synchronous_standby_names, and synchronous_replication_method as > a variable to provide other syntax is probably no argument > except its name. But I feel synchronous_standby_num looks bit > too specific. > > I'd like to propose if this doesn't reprise the argument on > notation for replication definitions:p > > The following two GUCs would be enough to bear future expansion > of notation syntax and/or method. > > synchronous_standby_names : as it is > > synchronous_replication_method: > > default is "1-priority", which means the same with the current > meaning. possible additional values so far would be, > > "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...", > where n is the number of required acknowledges. One question is that what is different between the leading "n" in s_s_names and the leading "n" of "n-priority"? > > "n-quorum": the format of s_s_names is the same as above, but > it is read in quorum context. > > These can be expanded, for example, as follows, but in future. > > "complex" : Michael's format. > "json" : JSON? > "json-ext": specify JSON in external file. > > Even after we have complex notations, I suppose that many use > cases are coverd by the first tree notations. I'm not sure it's desirable to implement the all kind of methods into core. I think it's better to extend replication in order to be more extensibility like adding hook function. And then other approach is implemented as a contrib module. > > - Internal design > > What should be done in SyncRepReleaseWaiters() is calculating a > pair of LSNs that can be regarded as synced and decide whether > *this* walsender have advanced the LSN pair, then trying to > release backends that wait for the LSNs *if* this walsender has > advanced them. > > From such point, the proposed patch will make redundant trials > to release backens. > > Addition to that, the patch looks to be a mixture of the current > implement and the new feature. These are for the same objective > so they cannot coexist each other, I think. As the result, codes > for both quorum/priority judgement appear at multiple level in > call tree. This would be an obstacle for future (possible) > expansion. > > So, I think this feature should be implemented as following, > > SyncRepInitConfig reads the configuration and stores the result > structure into elsewhere such like WalSnd->syncrepset_definition > instead of WalSnd->sync_standby_priority, which should be > removed. Nothing would be stored if the current wal sender is > not a member of the defined replication set. Storing a pointer > to matching function there would increase the flexibility but > such implement in contrast will make the code difficult to be > read.. (I often look for the entity of xlogreader->read_page() > ;) > > Then SyncRepSyncedLsnAdvancedTo() instead of > SyncRepGetSynchronousStandbys() returns an LSN pair that can be > regarded as 'synced' according to specified definition of > replication set and whether this walsender have advanced the > LSNs. > > Finally, SyncRepReleaseWaiters() uses it to release backends if > needed. > > The differences among quorum/priority or others are confined in > SyncRepSyncedLsnAdvancedTo(). As the result, > SyncRepReleaseWaiters would look as following. > > | SyncRepReleaseWaiters(void) > | { > | if (MyWalSnd->syncrepset_definition == NULL || ...) > | return; > | ... > | if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos)) > | { > | /* I haven't advanced the synced LSNs */ > | LWLockRelease(SyncRepLock); > | rerturn; > | } > | /* Set the lsn first so that when we wake backends they will relase... > > I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo > does but perhaps yes we can:p in effective manner.. > > What do you think about this? I agree with this design. What SyncRepSyncedLsnAdvancedTo() does would be different for each method, so we can implement "n-priority" style multiple sync replication at first version. Regards, -- Masahiko Sawada
pgsql-hackers by date: