Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Support for N synchronous standby servers - take 2
Date
Msg-id CAD21AoDhqGB=EtBfqnkHxR8T53d+8qMs4DPm5HVyq4bA2oR5eQ@mail.gmail.com
Whole thread Raw
In response to Re: Support for N synchronous standby servers - take 2  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Support for N synchronous standby servers - take 2
List pgsql-hackers
On Fri, Nov 13, 2015 at 12:52 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Fri, 13 Nov 2015 09:07:01 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoC9Vi8wOGtXio3Z1NwoVfXBJPNFtt7+5jadVHKn17uHOg@mail.gmail.com>
>> On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> > On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> ...
>> >> This patch is a tool for discussion, so I'm not going to fix this bug
>> >> until getting consensus.
>> >>
>> >> We are still under the discussion to find solution that can get consensus.
>> >> I felt that it's difficult to select from the two approaches within
>> >> this development cycle, and there would not be time to implement such
>> >> big feature even if we selected.
>> >> But this feature is obviously needed by many users.
>> >> So I'm considering more simple and extensible something solution, the
>> >> idea I posted is one of them.
>> >> The another worth considering approach is that just specifying the
>> >> number of sync standby. It also can cover the main use cases in
>> >> some-cases.
>> >
>> > Yes, it covers main and simple use case like "I want to have multiple
>> > synchronous replicas!". Even if we miss quorum commit at the first
>> > version, the feature is still very useful.
>
> +1
>
>> It can cover not only the case you mentioned but also main use case
>> Michael mentioned by setting same application_name.
>> And that first version patch is almost implemented, so just needs to
>> be reviewed.
>>
>> I think that it would be good to implement the simple feature at the
>> first version, and then coordinate the design based on opinion and
>> feed backs from more user, use-case.
>
> Yeah. I agree with it. And I have two proposals in this
> direction.
>
> - Notation
>
>  synchronous_standby_names, and synchronous_replication_method as
>  a variable to provide other syntax is probably no argument
>  except its name. But I feel synchronous_standby_num looks bit
>  too specific.
>
>  I'd like to propose if this doesn't reprise the argument on
>  notation for replication definitions:p
>
>  The following two GUCs would be enough to bear future expansion
>  of notation syntax and/or method.
>
>  synchronous_standby_names :  as it is
>
>  synchronous_replication_method:
>
>    default is "1-priority", which means the same with the current
>    meaning.  possible additional values so far would be,
>
>     "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
>                   where n is the number of required acknowledges.

One question is that what is different between the leading "n" in
s_s_names and the leading "n" of "n-priority"?

>
>     "n-quorum":   the format of s_s_names is the same as above, but
>                   it is read in quorum context.
>
>  These can be expanded, for example, as follows, but in future.
>
>     "complex" : Michael's format.
>     "json"    : JSON?
>     "json-ext": specify JSON in external file.
>
> Even after we have complex notations, I suppose that many use
> cases are coverd by the first tree notations.

I'm not sure it's desirable to implement the all kind of methods into core.
I think it's better to extend replication  in order to be more
extensibility like adding hook function.
And then other approach is implemented as a contrib module.

>
> - Internal design
>
>  What should be done in SyncRepReleaseWaiters() is calculating a
>  pair of LSNs that can be regarded as synced and decide whether
>  *this* walsender have advanced the LSN pair, then trying to
>  release backends that wait for the LSNs *if* this walsender has
>  advanced them.
>
>  From such point, the proposed patch will make redundant trials
>  to release backens.
>
>  Addition to that, the patch looks to be a mixture of the current
>  implement and the new feature. These are for the same objective
>  so they cannot coexist each other, I think. As the result, codes
>  for both quorum/priority judgement appear at multiple level in
>  call tree. This would be an obstacle for future (possible)
>  expansion.
>
>  So, I think this feature should be implemented as following,
>
>  SyncRepInitConfig reads the configuration and stores the result
>  structure into elsewhere such like WalSnd->syncrepset_definition
>  instead of WalSnd->sync_standby_priority, which should be
>  removed. Nothing would be stored if the current wal sender is
>  not a member of the defined replication set. Storing a pointer
>  to matching function there would increase the flexibility but
>  such implement in contrast will make the code difficult to be
>  read.. (I often look for the entity of xlogreader->read_page()
>  ;)
>
>  Then SyncRepSyncedLsnAdvancedTo() instead of
>  SyncRepGetSynchronousStandbys() returns an LSN pair that can be
>  regarded as 'synced' according to specified definition of
>  replication set and whether this walsender have advanced the
>  LSNs.
>
>  Finally, SyncRepReleaseWaiters() uses it to release backends if
>  needed.
>
>  The differences among quorum/priority or others are confined in
>  SyncRepSyncedLsnAdvancedTo(). As the result,
>  SyncRepReleaseWaiters would look as following.
>
>  | SyncRepReleaseWaiters(void)
>  | {
>  |   if (MyWalSnd->syncrepset_definition == NULL || ...)
>  |      return;
>  |   ...
>  |   if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
>  |   {
>  |     /* I haven't advanced the synced LSNs */
>  |     LWLockRelease(SyncRepLock);
>  |     rerturn;
>  |   }
>  |   /* Set the lsn first so that when we wake backends they will relase...
>
>  I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo
>  does but perhaps yes we can:p in effective manner..
>
>  What do you think about this?

I agree with this design.
What SyncRepSyncedLsnAdvancedTo() does would be different for each
method, so we can implement "n-priority" style multiple sync
replication at first version.

Regards,

--
Masahiko Sawada



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Making tab-complete.c easier to maintain
Next
From: Catalin Iacob
Date:
Subject: Re: proposal: multiple psql option -c