Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Support for N synchronous standby servers - take 2
Date
Msg-id CAD21AoC=AN+DKYNwsJp6COZ-6qmHXxuENxVPisxgPXcuXmPEvw@mail.gmail.com
Whole thread Raw
In response to Re: Support for N synchronous standby servers - take 2  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Support for N synchronous standby servers - take 2  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Tue, Nov 17, 2015 at 9:57 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Tue, 17 Nov 2015 01:09:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoDhqGB=EtBfqnkHxR8T53d+8qMs4DPm5HVyq4bA2oR5eQ@mail.gmail.com>
>> > - Notation
>> >
>> >  synchronous_standby_names, and synchronous_replication_method as
>> >  a variable to provide other syntax is probably no argument
>> >  except its name. But I feel synchronous_standby_num looks bit
>> >  too specific.
>> >
>> >  I'd like to propose if this doesn't reprise the argument on
>> >  notation for replication definitions:p
>> >
>> >  The following two GUCs would be enough to bear future expansion
>> >  of notation syntax and/or method.
>> >
>> >  synchronous_standby_names :  as it is
>> >
>> >  synchronous_replication_method:
>> >
>> >    default is "1-priority", which means the same with the current
>> >    meaning.  possible additional values so far would be,
>> >
>> >     "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
>> >                   where n is the number of required acknowledges.
>>
>> One question is that what is different between the leading "n" in
>> s_s_names and the leading "n" of "n-priority"?
>
> Ah. Sorry for the ambiguous description. 'n' in s_s_names
> representing an arbitrary integer number and that in "n-priority"
> is literally an "n", meaning "a format with any number of
> priority hosts" as a whole. As an instance,
>
> synchronous_replication_method = "n-priority"
> synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>
> I added "n-" of "n-priority" to distinguish with "1-priority" so
> if we won't provide "1-priority" for backward compatibility,
> "priority" would be enough to represent the type.
>
> By the way, s_r_method is not essentially necessary but it would
> be important to avoid complexity of autodetection of formats
> including currently undefined ones.

Than you for your explanation, I understood that.

It means that the format of s_s_names will be changed, which would be not good.
So, how about the adding just s_r_method parameter and the number of
required ACK is represented in the leading of s_r_method?
For example, the following setting is same as above.

synchronous_replication_method = "2-priority"
synchronous_standby_names = "mercury, venus, earth, mars, jupiter"

In quorum method, we can set;
synchronous_replication_method = "2-quorum"
synchronous_standby_names = "mercury, venus, earth, mars, jupiter"

Thought?

>
>
>> >     "n-quorum":   the format of s_s_names is the same as above, but
>> >                   it is read in quorum context.
>
> The "n" of this is the same as above.
>
>> >  These can be expanded, for example, as follows, but in future.
>> >
>> >     "complex" : Michael's format.
>> >     "json"    : JSON?
>> >     "json-ext": specify JSON in external file.
>> >
>> > Even after we have complex notations, I suppose that many use
>> > cases are coverd by the first tree notations.
>>
>> I'm not sure it's desirable to implement the all kind of methods into core.
>> I think it's better to extend replication  in order to be more
>> extensibility like adding hook function.
>> And then other approach is implemented as a contrib module.
>
> I agree with you. I proposed the following internal design having
> that in mind.
>
>> > - Internal design
>> >
>> >  What should be done in SyncRepReleaseWaiters() is calculating a
>> >  pair of LSNs that can be regarded as synced and decide whether
>> >  *this* walsender have advanced the LSN pair, then trying to
>> >  release backends that wait for the LSNs *if* this walsender has
>> >  advanced them.
>> >
>> >  From such point, the proposed patch will make redundant trials
>> >  to release backens.
>> >
>> >  Addition to that, the patch looks to be a mixture of the current
>> >  implement and the new feature. These are for the same objective
>> >  so they cannot coexist each other, I think. As the result, codes
>> >  for both quorum/priority judgement appear at multiple level in
>> >  call tree. This would be an obstacle for future (possible)
>> >  expansion.
>> >
>> >  So, I think this feature should be implemented as following,
>> >
>> >  SyncRepInitConfig reads the configuration and stores the result
>> >  structure into elsewhere such like WalSnd->syncrepset_definition
>> >  instead of WalSnd->sync_standby_priority, which should be
>> >  removed. Nothing would be stored if the current wal sender is
>> >  not a member of the defined replication set. Storing a pointer
>> >  to matching function there would increase the flexibility but
>> >  such implement in contrast will make the code difficult to be
>> >  read.. (I often look for the entity of xlogreader->read_page()
>> >  ;)
>> >
>> >  Then SyncRepSyncedLsnAdvancedTo() instead of
>> >  SyncRepGetSynchronousStandbys() returns an LSN pair that can be
>> >  regarded as 'synced' according to specified definition of
>> >  replication set and whether this walsender have advanced the
>> >  LSNs.
>> >
>> >  Finally, SyncRepReleaseWaiters() uses it to release backends if
>> >  needed.
>> >
>> >  The differences among quorum/priority or others are confined in
>> >  SyncRepSyncedLsnAdvancedTo(). As the result,
>> >  SyncRepReleaseWaiters would look as following.
>> >
>> >  | SyncRepReleaseWaiters(void)
>> >  | {
>> >  |   if (MyWalSnd->syncrepset_definition == NULL || ...)
>> >  |      return;
>> >  |   ...
>> >  |   if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
>> >  |   {
>> >  |     /* I haven't advanced the synced LSNs */
>> >  |     LWLockRelease(SyncRepLock);
>> >  |     rerturn;
>> >  |   }
>> >  |   /* Set the lsn first so that when we wake backends they will relase...
>> >
>> >  I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo
>> >  does but perhaps yes we can:p in effective manner..
>> >
>> >  What do you think about this?
>>
>> I agree with this design.
>> What SyncRepSyncedLsnAdvancedTo() does would be different for each
>> method, so we can implement "n-priority" style multiple sync
>> replication at first version.
>
> Maybe the first *additional* one if we decide to keep backward
> compatibility, as the discussion above.
>

Regards,

--
Masahiko Sawada



pgsql-hackers by date:

Previous
From: konstantin knizhnik
Date:
Subject: Re: Question concerning XTM (eXtensible Transaction Manager API)
Next
From: Simon Riggs
Date:
Subject: Re: Speed up Clog Access by increasing CLOG buffers