Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Support for N synchronous standby servers - take 2
Date
Msg-id CAD21AoCfSX_MbOze8C--zoDJatUfaJcF5PzZ-ELqpigNBQ_ERw@mail.gmail.com
Whole thread Raw
In response to Re: Support for N synchronous standby servers - take 2  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Reply to multiple hackers.
Thank you for reviewing this patch.

> +    used.  Priority is given to servers in the order that the appear
> in the list.
>
> s/the appear/they appear/
>
> -    The minimum wait time is the roundtrip time between primary to standby.
> +    The minimum wait time is the roundtrip time between the primary and the
> +    almost synchronous standby.
>
> s/almost/slowest/

Will fix this typo. Thanks!

On Fri, Mar 4, 2016 at 5:22 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> Sorry for long, hard-to-read writings in advance..
>
> At Thu, 3 Mar 2016 23:30:49 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoD3XGZtuvgc5uKJdvcoJP5S0rvGQQCJLRL4rLsruRch5Q@mail.gmail.com>
>> Hi,
>>
>> Thank you so much for reviewing this patch!
>>
>> All review comments regarding document and comment are fixed.
>> Attached latest v14 patch.
>>
>> > This accepts 'abc^Id' as a name, which is wrong behavior (but
>> > such appliction names are not allowed anyway. If you assume so,
>> > I'd like to see a comment for that.).
>>
>> 'abc^Id' is accepted as application_name, no?
>> postgres(1)=# set application_name to 'abc^Id';
>> SET
>> postgres(1)=# show application_name ;
>>  application_name
>> ------------------
>>  abc^Id
>> (1 row)
>
> Sorry, I implicitly used "^" in the meaning of "ctrl key". So
> "^I" is so-called Ctrl-I, that is horizontal tab or 0x09. So the
> following in psql shows that.
>
> =# set application_name to E'abc\td';
> =# show application_name ;
>  application_name
> ------------------
>  ab?d
> (1 row)
>
> The <tab> is replaced with '?' (literally) at the time of
> guc assinment.

Oh, I see.
I will comment for that.

>> > addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
>> > char ychar) requires differnt character types. Is there any reason
>> > for that?
>>
>> Because addlit_xd_string() is for adding string(char *) to xd_string,
>> OTOH addlit_xd_char() is for adding just one character to xd_string.
>
> Umm. My qustion might have been a bit out of the point.
>
> The addlitchar_xd_string(str,unsigned char c) does
> appendStringInfoChar(, c). On the other hand, the signature of
> the function of stringinfo is the following.
>
> AppendStringInfoChar(StringInfo str, char ch);
>
> Of course "char" is equivalent of "signed char" as
> default. addlitchar_xd_string assigns the given character in
> "unsigned char" to the parameter of AppendStringInfoChar of
> "signed char".
>
> These two are incompatible types. Imagine the
> following codelet,
>
> #include <stdio.h>
>
> void hoge(signed char c){
>   int ch = c;
>   fprintf(stderr, "char = %d\n", ch);
> }
>
> int main(void)
> {
>   unsigned char u;
>
>   u = 200;
>   hoge(u);
>   return 0;
> }
>
> The result is -56. So we generally should get rid of such type of
> mixture of signedness for no particular reason.
>
> In this case, the domain of the variable is 0x20-0x7e so no
> problem won't be actualized but also there's no reason for the
> signedness mixture.

Thank you for explanation.
I will fix this.

>> > I personally don't like addlit*string() things for such simple
>> > syntax but itself is acceptble enough for me. However it uses
>> > StringInfo to hold double-quoted names, which pallocs 1024 bytes
>> > of memory chunk for every double-quoted name. The chunks are
>> > finally stacked up left uncollected until the current
>> > memorycontext is deleted or reset (It is deleted just after
>> > finishing config file processing). Addition to that, setting
>> > s_s_names runs the parser twice. It seems to me too greedy and
>> > seems that static char [NAMEDATALEN] is enough using the v12 way
>> > without palloc/repalloc.
>>
>> I though that length of group name could be more than NAMEDATALEN, so
>> I use StringInfo.
>> Is it not necessary?
>
> Such long names doesn't seem to necessary. Too long identifiers
> no longer act as identifier for human eyeballs. We are limiting
> the length of identifiers of the whole database system to
> NAMEDATALEN-1, which seems to have been enough so I don't see any
> reason to have a group name longer than that.
>

I see. I will fix this.

>> > I found that the name SyncGroupName.wait_num is not
>> > instinctive. How about sync_num, sync_member_num or
>> > sync_standby_num? If the last is preferable, .members also should
>> > be .standbys .
>>
>> Thanks, sync_num is preferable to me.
>>
>> ===
>> > I am quite uncomfortable with the existence of
>> > WanSnd.sync_standby_priority. It represented the pirority in the
>> > old linear s_s_names format but nested groups or even
>> > single-level quarum list obviously doesn't fit it. Can we get rid
>> > of sync_standby_priority, even though we realize atmost
>> > n-priority for now?
>>
>> We could get rid of sync_standby_priority.
>> But if so, we will not be able to see the next sync standby in
>> pg_stat_replication system view.
>> Regarding each node priority, I was thinking that standbys in quorum
>> list have same priority, and in nested group each standbys are given
>> the priority starting from 1.
>
> As far as I can see the varialbe is referred to as a boolean to
> indicate whether a walsernder is connected to a candidate
> synchronous standby. So the value is totally useless, at least
> for now. However, SyncRepRelaseWaiters uses the value to check if
> the synced LSNs can be advaned by a walsender so the variable is
> useful as a boolean.
>
> In the previous versions, the reason why WanSnd had the priority
> value is that a pair of synchronized LSNs is determined only by
> one wansender, which has the highest priority among active
> wansenders. So even if a walsender receives a response from
> walreceiver, it doesn't need to do nothing if it is not at the
> highest priority. It's a simple world.
>
> In the quorum commit word, in contrast, what
> SyncRepGetSyncStandbysFn shoud do is returning certain private
> information to be used to calculate a pair of safe/synched LSNs
> in SyncRepGetSYncedLsnsFn looking into WalSndCtl->wansnds
> list. The latter passes a pair of safe/synced LSNs to the upper
> level list or SyncRepSyncedLsnAdvancedTo as the topmost
> caller. There's no room for sync_standby_priority to work as the
> original objective.
>
> Even if we assign the value in the explained way, the values are
> always 1 for quorum method and duplicate values for multiple
> priority method. What do you want to show by the value to users?

I agree with you.
When we implement nested style of multiple sync replication, it would
tough to show to users using by sync_standby_priority.
But in current our first goal (implementing 1-nest style), it doesn't
seem to need to get rid of sync_standby_priority from WalSnd so far,
no?
Towards multiple nested style, I'm roughly planning to have new system
view is defined like follows.

- New system view shows all groups and nodes informations.
- Move sync_state from pg_stat_replication to new system view.
- Get rid of sync_priority from pg_stat_replication.
- Add new sync_state 'quorum' that indicates candidate sync standbys
of its group using quorum method.
- If parent group state is potential, 'potential:' prefix is added to
the child standby's sync_state.

* s_s_names = '2[a, 1(b,c):group1, 1[d,e]:gorup2]' name  | sync_method |      member         | sync_num |
sync_state      | parant_group

-----------+--------------------+---------------------------+---------------+--------------------------+--------------main
  |  priority         | {a,group1,group2}  |        2      |                      |a         |                     |
                       |       | sync                   | maingroup1 |  quorum        | {b,c}                     |
  1      |
 
sync                    | mainb         |                    |                             |       | sync
    | group1c         |                    |                             |       | potential               |
group1group2|  priority         | {d,e}                     |        1| potential               | maind         |
            |                             |       | potential:sync      | group2e         |                    |
                    |       | potential:potential | group2
 
(8 rows)

* s_s_names = '2(a, 1[b,c]:group1, 1(d,e):group2)' name  | sync_method |      member         | sync_num |
sync_state      | parant_group

-----------+--------------------+--------------------------+----------------+--------------------------+--------------main
  |  quorum        | {a,group1,group2} |        2      |                 |a         |                     |
             |      | quorum              | maingroup1 |  priority         | {b,c}                    |        1
 
| quorum              | mainb         |                     |                           |      | sync
|group1c         |                     |                           |      | potential             | group1group2 |
quorum       | {d,e}                    |        1      |
 
quorum              | maind         |                    |                            |      | quorum              |
group2e        |                    |                            |      | quorum              | group2
 
(8 rows)

>> > SyncRepStandbys are to be in multilevel and the struct is
>> > naturally allowed to be so but SyncRepClearStandbyGroupList
>> > assumes it in single level.
>>
>> Because I think that we don't need to implement to fully support
>> nested style at first version.
>> We have to carefully design this feature while considering
>> expandability, but overkill implementation could be cause of crash.
>> Consider remaining time for 9.6, I feel we could implement quorum
>> method at best.
>
> Yes, so I proposed to ass Aseert() in the function.

Will add it.


Regards,

--
Masahiko Sawada



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Next
From: David Rowley
Date:
Subject: Re: Parallel Aggregate