Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Support for N synchronous standby servers - take 2
Date
Msg-id 20151117.095706.240836667.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Support for N synchronous standby servers - take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Support for N synchronous standby servers - take 2
List pgsql-hackers
Hello,

At Tue, 17 Nov 2015 01:09:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoDhqGB=EtBfqnkHxR8T53d+8qMs4DPm5HVyq4bA2oR5eQ@mail.gmail.com>
> > - Notation
> >
> >  synchronous_standby_names, and synchronous_replication_method as
> >  a variable to provide other syntax is probably no argument
> >  except its name. But I feel synchronous_standby_num looks bit
> >  too specific.
> >
> >  I'd like to propose if this doesn't reprise the argument on
> >  notation for replication definitions:p
> >
> >  The following two GUCs would be enough to bear future expansion
> >  of notation syntax and/or method.
> >
> >  synchronous_standby_names :  as it is
> >
> >  synchronous_replication_method:
> >
> >    default is "1-priority", which means the same with the current
> >    meaning.  possible additional values so far would be,
> >
> >     "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
> >                   where n is the number of required acknowledges.
> 
> One question is that what is different between the leading "n" in
> s_s_names and the leading "n" of "n-priority"?

Ah. Sorry for the ambiguous description. 'n' in s_s_names
representing an arbitrary integer number and that in "n-priority"
is literally an "n", meaning "a format with any number of
priority hosts" as a whole. As an instance, 

synchronous_replication_method = "n-priority"
synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"

I added "n-" of "n-priority" to distinguish with "1-priority" so
if we won't provide "1-priority" for backward compatibility,
"priority" would be enough to represent the type.

By the way, s_r_method is not essentially necessary but it would
be important to avoid complexity of autodetection of formats
including currently undefined ones.


> >     "n-quorum":   the format of s_s_names is the same as above, but
> >                   it is read in quorum context.

The "n" of this is the same as above.

> >  These can be expanded, for example, as follows, but in future.
> >
> >     "complex" : Michael's format.
> >     "json"    : JSON?
> >     "json-ext": specify JSON in external file.
> >
> > Even after we have complex notations, I suppose that many use
> > cases are coverd by the first tree notations.
> 
> I'm not sure it's desirable to implement the all kind of methods into core.
> I think it's better to extend replication  in order to be more
> extensibility like adding hook function.
> And then other approach is implemented as a contrib module.

I agree with you. I proposed the following internal design having
that in mind.

> > - Internal design
> >
> >  What should be done in SyncRepReleaseWaiters() is calculating a
> >  pair of LSNs that can be regarded as synced and decide whether
> >  *this* walsender have advanced the LSN pair, then trying to
> >  release backends that wait for the LSNs *if* this walsender has
> >  advanced them.
> >
> >  From such point, the proposed patch will make redundant trials
> >  to release backens.
> >
> >  Addition to that, the patch looks to be a mixture of the current
> >  implement and the new feature. These are for the same objective
> >  so they cannot coexist each other, I think. As the result, codes
> >  for both quorum/priority judgement appear at multiple level in
> >  call tree. This would be an obstacle for future (possible)
> >  expansion.
> >
> >  So, I think this feature should be implemented as following,
> >
> >  SyncRepInitConfig reads the configuration and stores the result
> >  structure into elsewhere such like WalSnd->syncrepset_definition
> >  instead of WalSnd->sync_standby_priority, which should be
> >  removed. Nothing would be stored if the current wal sender is
> >  not a member of the defined replication set. Storing a pointer
> >  to matching function there would increase the flexibility but
> >  such implement in contrast will make the code difficult to be
> >  read.. (I often look for the entity of xlogreader->read_page()
> >  ;)
> >
> >  Then SyncRepSyncedLsnAdvancedTo() instead of
> >  SyncRepGetSynchronousStandbys() returns an LSN pair that can be
> >  regarded as 'synced' according to specified definition of
> >  replication set and whether this walsender have advanced the
> >  LSNs.
> >
> >  Finally, SyncRepReleaseWaiters() uses it to release backends if
> >  needed.
> >
> >  The differences among quorum/priority or others are confined in
> >  SyncRepSyncedLsnAdvancedTo(). As the result,
> >  SyncRepReleaseWaiters would look as following.
> >
> >  | SyncRepReleaseWaiters(void)
> >  | {
> >  |   if (MyWalSnd->syncrepset_definition == NULL || ...)
> >  |      return;
> >  |   ...
> >  |   if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
> >  |   {
> >  |     /* I haven't advanced the synced LSNs */
> >  |     LWLockRelease(SyncRepLock);
> >  |     rerturn;
> >  |   }
> >  |   /* Set the lsn first so that when we wake backends they will relase...
> >
> >  I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo
> >  does but perhaps yes we can:p in effective manner..
> >
> >  What do you think about this?
> 
> I agree with this design.
> What SyncRepSyncedLsnAdvancedTo() does would be different for each
> method, so we can implement "n-priority" style multiple sync
> replication at first version.

Maybe the first *additional* one if we decide to keep backward
compatibility, as the discussion above.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?
Next
From: Robert Haas
Date:
Subject: Re: Freeze avoidance of very large table.