Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Support for N synchronous standby servers - take 2
Date
Msg-id 20151113.125212.102628436.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Support for N synchronous standby servers - take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Support for N synchronous standby servers - take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hello,

At Fri, 13 Nov 2015 09:07:01 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoC9Vi8wOGtXio3Z1NwoVfXBJPNFtt7+5jadVHKn17uHOg@mail.gmail.com>
> On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
...
> >> This patch is a tool for discussion, so I'm not going to fix this bug
> >> until getting consensus.
> >>
> >> We are still under the discussion to find solution that can get consensus.
> >> I felt that it's difficult to select from the two approaches within
> >> this development cycle, and there would not be time to implement such
> >> big feature even if we selected.
> >> But this feature is obviously needed by many users.
> >> So I'm considering more simple and extensible something solution, the
> >> idea I posted is one of them.
> >> The another worth considering approach is that just specifying the
> >> number of sync standby. It also can cover the main use cases in
> >> some-cases.
> >
> > Yes, it covers main and simple use case like "I want to have multiple
> > synchronous replicas!". Even if we miss quorum commit at the first
> > version, the feature is still very useful.

+1

> It can cover not only the case you mentioned but also main use case
> Michael mentioned by setting same application_name.
> And that first version patch is almost implemented, so just needs to
> be reviewed.
> 
> I think that it would be good to implement the simple feature at the
> first version, and then coordinate the design based on opinion and
> feed backs from more user, use-case.

Yeah. I agree with it. And I have two proposals in this
direction.

- Notation
synchronous_standby_names, and synchronous_replication_method asa variable to provide other syntax is probably no
argumentexceptits name. But I feel synchronous_standby_num looks bittoo specific.
 
I'd like to propose if this doesn't reprise the argument onnotation for replication definitions:p
The following two GUCs would be enough to bear future expansionof notation syntax and/or method.
synchronous_standby_names :  as it is
synchronous_replication_method:
  default is "1-priority", which means the same with the current  meaning.  possible additional values so far would
be,
   "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",                 where n is the number of
requiredacknowledges.
 
   "n-quorum":   the format of s_s_names is the same as above, but                 it is read in quorum context.
These can be expanded, for example, as follows, but in future.
   "complex" : Michael's format.   "json"    : JSON?   "json-ext": specify JSON in external file.

Even after we have complex notations, I suppose that many use
cases are coverd by the first tree notations.


- Internal design
What should be done in SyncRepReleaseWaiters() is calculating apair of LSNs that can be regarded as synced and decide
whether*this*walsender have advanced the LSN pair, then trying torelease backends that wait for the LSNs *if* this
walsenderhasadvanced them.
 
From such point, the proposed patch will make redundant trialsto release backens.
Addition to that, the patch looks to be a mixture of the currentimplement and the new feature. These are for the same
objectivesothey cannot coexist each other, I think. As the result, codesfor both quorum/priority judgement appear at
multiplelevel incall tree. This would be an obstacle for future (possible)expansion.
 
So, I think this feature should be implemented as following,
SyncRepInitConfig reads the configuration and stores the resultstructure into elsewhere such like
WalSnd->syncrepset_definitioninsteadof WalSnd->sync_standby_priority, which should beremoved. Nothing would be stored
ifthe current wal sender isnot a member of the defined replication set. Storing a pointerto matching function there
wouldincrease the flexibility butsuch implement in contrast will make the code difficult to beread.. (I often look for
theentity of xlogreader->read_page();)
 
Then SyncRepSyncedLsnAdvancedTo() instead ofSyncRepGetSynchronousStandbys() returns an LSN pair that can beregarded as
'synced'according to specified definition ofreplication set and whether this walsender have advanced theLSNs.
 
Finally, SyncRepReleaseWaiters() uses it to release backends ifneeded.
The differences among quorum/priority or others are confined inSyncRepSyncedLsnAdvancedTo(). As the
result,SyncRepReleaseWaiterswould look as following.
 
| SyncRepReleaseWaiters(void)| {|   if (MyWalSnd->syncrepset_definition == NULL || ...)|      return;|   ...|   if
(!SyncRepSyncedLsnAdvancedTo(&flush_pos,&write_pos))|   {|     /* I haven't advanced the synced LSNs */|
LWLockRelease(SyncRepLock);|    rerturn;|   }|   /* Set the lsn first so that when we wake backends they will
relase...
I'm not thought concretely about what SyncRepSyncedLsnAdvancedTodoes but perhaps yes we can:p in effective manner..
What do you think about this?


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Parallel Seq Scan
Next
From: Kouhei Kaigai
Date:
Subject: Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)