Thread: Quorum commit for multiple synchronous replication.

Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
Hi all,

In 9.6 development cycle, we had been discussed about configuration
syntax for a long time while considering expanding.
As a result, we had new dedicated language for multiple synchronous
replication, but it supports only priority method.
We know that quorum commit is very useful for many users and can
expand dedicated language easily for quorum commit.
So I'd like to propose quorum commit for multiple synchronous replication here.

The followings are changes attached patches made.
- Add new syntax 'Any N ( node1, node2, ... )' to
synchornous_standby_names for quorum commit.
- In quorum commit, the master can return commit to client after
received ACK from *at least* any N servers of listed standbys.
- sync_priority of all listed servers are same, 1.
- Add regression test for quorum commit.

I was thinking that the syntax for quorum method would use '[ ... ]'
but it will be confused with '( ... )' priority method used.
001 patch adds 'Any N ( ... )' style syntax but I know that we still
might need to discuss about better syntax, discussion is very welcome.
Attached draft patch, please give me feedback.

Regards,

--
Masahiko Sawada

Attachment

Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I was thinking that the syntax for quorum method would use '[ ... ]'
> but it will be confused with '( ... )' priority method used.
> 001 patch adds 'Any N ( ... )' style syntax but I know that we still
> might need to discuss about better syntax, discussion is very welcome.
> Attached draft patch, please give me feedback.

I am +1 for using either "{}" or "[]" to define a quorum set, and -1
for the addition of a keyword in front of the integer defining for how
many nodes server need to wait for.

-    foreach(cell, sync_standbys)
+    foreach (cell, sync_standbys)    {
-        WalSnd       *walsnd = &WalSndCtl->walsnds[lfirst_int(cell)];
+        WalSnd *walsnd = &WalSndCtl->walsnds[lfirst_int(cell)];
This patch has some noise.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I was thinking that the syntax for quorum method would use '[ ... ]'
>> but it will be confused with '( ... )' priority method used.
>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still
>> might need to discuss about better syntax, discussion is very welcome.
>> Attached draft patch, please give me feedback.
>
> I am +1 for using either "{}" or "[]" to define a quorum set, and -1
> for the addition of a keyword in front of the integer defining for how
> many nodes server need to wait for.

Thank you for reply.
"{}" or "[]" are not bad but because these are not intuitive, I
thought that it will be hard for uses to use different method for each
purpose.

> -    foreach(cell, sync_standbys)
> +    foreach (cell, sync_standbys)
>      {
> -        WalSnd       *walsnd = &WalSndCtl->walsnds[lfirst_int(cell)];
> +        WalSnd *walsnd = &WalSndCtl->walsnds[lfirst_int(cell)];
> This patch has some noise.

Will fix.

-- 
Regards,

--
Masahiko Sawada



Re: Quorum commit for multiple synchronous replication.

From
Petr Jelinek
Date:
On 04/08/16 06:40, Masahiko Sawada wrote:
> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> I was thinking that the syntax for quorum method would use '[ ... ]'
>>> but it will be confused with '( ... )' priority method used.
>>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still
>>> might need to discuss about better syntax, discussion is very welcome.
>>> Attached draft patch, please give me feedback.
>>
>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1
>> for the addition of a keyword in front of the integer defining for how
>> many nodes server need to wait for.
>
> Thank you for reply.
> "{}" or "[]" are not bad but because these are not intuitive, I
> thought that it will be hard for uses to use different method for each
> purpose.
>

I think the "any" keyword is more explicit and understandable, also 
closer to SQL. So I would be in favor of doing that.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



Re: Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Sat, Aug 6, 2016 at 6:36 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> On 04/08/16 06:40, Masahiko Sawada wrote:
>>
>> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>>
>>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>>> wrote:
>>>>
>>>> I was thinking that the syntax for quorum method would use '[ ... ]'
>>>> but it will be confused with '( ... )' priority method used.
>>>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still
>>>> might need to discuss about better syntax, discussion is very welcome.
>>>> Attached draft patch, please give me feedback.
>>>
>>>
>>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1
>>> for the addition of a keyword in front of the integer defining for how
>>> many nodes server need to wait for.
>>
>>
>> Thank you for reply.
>> "{}" or "[]" are not bad but because these are not intuitive, I
>> thought that it will be hard for uses to use different method for each
>> purpose.
>>
>
> I think the "any" keyword is more explicit and understandable, also closer
> to SQL. So I would be in favor of doing that.

+1

Also I like the following Simon's idea.

https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com
-----------------------
* first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
* any k (n1, n2, n3) – would release waiters as soon as we have the
responses from k out of N standbys. “any k” would be faster, so is
desirable for performance and resilience
-----------------------

Regards,

--
Fujii Masao



Re: Quorum commit for multiple synchronous replication.

From
Simon Riggs
Date:
On 29 August 2016 at 14:52, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sat, Aug 6, 2016 at 6:36 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> On 04/08/16 06:40, Masahiko Sawada wrote:
>>>
>>> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>>
>>>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>>>> wrote:
>>>>>
>>>>> I was thinking that the syntax for quorum method would use '[ ... ]'
>>>>> but it will be confused with '( ... )' priority method used.
>>>>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still
>>>>> might need to discuss about better syntax, discussion is very welcome.
>>>>> Attached draft patch, please give me feedback.
>>>>
>>>>
>>>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1
>>>> for the addition of a keyword in front of the integer defining for how
>>>> many nodes server need to wait for.
>>>
>>>
>>> Thank you for reply.
>>> "{}" or "[]" are not bad but because these are not intuitive, I
>>> thought that it will be hard for uses to use different method for each
>>> purpose.
>>>
>>
>> I think the "any" keyword is more explicit and understandable, also closer
>> to SQL. So I would be in favor of doing that.
>
> +1
>
> Also I like the following Simon's idea.
>
> https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com
> -----------------------
> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
> * any k (n1, n2, n3) – would release waiters as soon as we have the
> responses from k out of N standbys. “any k” would be faster, so is
> desirable for performance and resilience
> -----------------------

+1

"synchronous_method" -> "synchronization_method"

I'm concerned about the performance of this code. Can we work out a
way of measuring it, so we can judge how successful we are at
releasing waiters quickly? Thanks

For 9.6 we implemented something that allows the DBA to define how
slow programs are. Previously, since 9.1 this was something specified
on the application side. I would like to put it back that way, so we
end up with a parameter on client e.g. commit_quorum = k. Forget the
exact parameters/user API for now, but I'd like to allow the code to
work with user defined settings. Thanks.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Tue, Sep 6, 2016 at 11:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 29 August 2016 at 14:52, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Sat, Aug 6, 2016 at 6:36 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>> On 04/08/16 06:40, Masahiko Sawada wrote:
>>>>
>>>> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier
>>>> <michael.paquier@gmail.com> wrote:
>>>>>
>>>>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> I was thinking that the syntax for quorum method would use '[ ... ]'
>>>>>> but it will be confused with '( ... )' priority method used.
>>>>>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still
>>>>>> might need to discuss about better syntax, discussion is very welcome.
>>>>>> Attached draft patch, please give me feedback.
>>>>>
>>>>>
>>>>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1
>>>>> for the addition of a keyword in front of the integer defining for how
>>>>> many nodes server need to wait for.
>>>>
>>>>
>>>> Thank you for reply.
>>>> "{}" or "[]" are not bad but because these are not intuitive, I
>>>> thought that it will be hard for uses to use different method for each
>>>> purpose.
>>>>
>>>
>>> I think the "any" keyword is more explicit and understandable, also closer
>>> to SQL. So I would be in favor of doing that.
>>
>> +1
>>
>> Also I like the following Simon's idea.
>>
>> https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com
>> -----------------------
>> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
>> * any k (n1, n2, n3) – would release waiters as soon as we have the
>> responses from k out of N standbys. “any k” would be faster, so is
>> desirable for performance and resilience
>> -----------------------
>
> +1
>
> "synchronous_method" -> "synchronization_method"

Thanks, will fix.

> I'm concerned about the performance of this code. Can we work out a
> way of measuring it, so we can judge how successful we are at
> releasing waiters quickly? Thanks

I will measure the performance effect of this code.
I'm expecting that performances are,
'first 1 (n1, n2)'  > 'any 1(n1, n2)' > 'first 2(n1, n2)'
'first 1 (n1, n2)' will be highest throughput.

> For 9.6 we implemented something that allows the DBA to define how
> slow programs are. Previously, since 9.1 this was something specified
> on the application side. I would like to put it back that way, so we
> end up with a parameter on client e.g. commit_quorum = k. Forget the
> exact parameters/user API for now, but I'd like to allow the code to
> work with user defined settings. Thanks.

I see. The parameter on client should effect for priority method as well.
And similar to synchronous_commit, the client can specify the how much
standbys the master waits to commit for according to synchronization
method, even if s_s_names is defined.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Quorum commit for multiple synchronous replication.

From
Josh Berkus
Date:
On 08/29/2016 06:52 AM, Fujii Masao wrote:
> Also I like the following Simon's idea.
>
> https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com
> -----------------------
> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
> * any k (n1, n2, n3) – would release waiters as soon as we have the
> responses from k out of N standbys. “any k” would be faster, so is
> desirable for performance and resilience

What are we going to do for backwards compatibility, here?

So, here's the dilemma:

If we want to keep backwards compatibility with 9.6, then:

"k (n1, n2, n3)" == "first k (n1, n2, n3)"

However, "first k" is not what most users will want, most of the time;
users of version 13, years from now, will be getting constantly confused
by "first k" behavior when they wanted quorum.  So the sensible default
would be:

"k (n1, n2, n3)" == "any k (n1, n2, n3)"

... however, that will break backwards compatibility.  Thoughts?

My $0.02 is that we break backwards compat somehow and document the heck
out of it.

--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Wed, Sep 7, 2016 at 12:47 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 11:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 29 August 2016 at 14:52, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Sat, Aug 6, 2016 at 6:36 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>>> On 04/08/16 06:40, Masahiko Sawada wrote:
>>>>>
>>>>> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier
>>>>> <michael.paquier@gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I was thinking that the syntax for quorum method would use '[ ... ]'
>>>>>>> but it will be confused with '( ... )' priority method used.
>>>>>>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still
>>>>>>> might need to discuss about better syntax, discussion is very welcome.
>>>>>>> Attached draft patch, please give me feedback.
>>>>>>
>>>>>>
>>>>>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1
>>>>>> for the addition of a keyword in front of the integer defining for how
>>>>>> many nodes server need to wait for.
>>>>>
>>>>>
>>>>> Thank you for reply.
>>>>> "{}" or "[]" are not bad but because these are not intuitive, I
>>>>> thought that it will be hard for uses to use different method for each
>>>>> purpose.
>>>>>
>>>>
>>>> I think the "any" keyword is more explicit and understandable, also closer
>>>> to SQL. So I would be in favor of doing that.
>>>
>>> +1
>>>
>>> Also I like the following Simon's idea.
>>>
>>> https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com
>>> -----------------------
>>> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
>>> * any k (n1, n2, n3) – would release waiters as soon as we have the
>>> responses from k out of N standbys. “any k” would be faster, so is
>>> desirable for performance and resilience
>>> -----------------------
>>
>> +1
>>
>> "synchronous_method" -> "synchronization_method"
>
> Thanks, will fix.
>
>> I'm concerned about the performance of this code. Can we work out a
>> way of measuring it, so we can judge how successful we are at
>> releasing waiters quickly? Thanks
>
> I will measure the performance effect of this code.
> I'm expecting that performances are,
> 'first 1 (n1, n2)'  > 'any 1(n1, n2)' > 'first 2(n1, n2)'
> 'first 1 (n1, n2)' will be highest throughput.
>

Sorry, that's wrong.
'any 1(n1, n2)' will be highest throughput or same as 'first 1(n1, n2)'.

>> For 9.6 we implemented something that allows the DBA to define how
>> slow programs are. Previously, since 9.1 this was something specified
>> on the application side. I would like to put it back that way, so we
>> end up with a parameter on client e.g. commit_quorum = k. Forget the
>> exact parameters/user API for now, but I'd like to allow the code to
>> work with user defined settings. Thanks.
>
> I see. The parameter on client should effect for priority method as well.
> And similar to synchronous_commit, the client can specify the how much
> standbys the master waits to commit for according to synchronization
> method, even if s_s_names is defined.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Wed, Sep 7, 2016 at 4:03 AM, Josh Berkus <josh@agliodbs.com> wrote:
> On 08/29/2016 06:52 AM, Fujii Masao wrote:
>> Also I like the following Simon's idea.
>>
>> https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com
>> -----------------------
>> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
>> * any k (n1, n2, n3) – would release waiters as soon as we have the
>> responses from k out of N standbys. “any k” would be faster, so is
>> desirable for performance and resilience
>
> What are we going to do for backwards compatibility, here?
>
> So, here's the dilemma:
>
> If we want to keep backwards compatibility with 9.6, then:
>
> "k (n1, n2, n3)" == "first k (n1, n2, n3)"
>
> However, "first k" is not what most users will want, most of the time;
> users of version 13, years from now, will be getting constantly confused
> by "first k" behavior when they wanted quorum.  So the sensible default
> would be:
>
> "k (n1, n2, n3)" == "any k (n1, n2, n3)"
>

+1.

"k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
compatibility but most users would think "k(n1, n2, n3)" as quorum
after introduced quorum.
I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
n3)" style before 9.6 releasing if we got consensus.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Thu, Sep 8, 2016 at 6:26 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
> compatibility but most users would think "k(n1, n2, n3)" as quorum
> after introduced quorum.
> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
> n3)" style before 9.6 releasing if we got consensus.

Considering breaking backward-compatibility in the next release does
not sound like a good idea to me for a new feature that is going to be
GA soon.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Vik Fearing
Date:
On 09/09/2016 03:28 AM, Michael Paquier wrote:
> On Thu, Sep 8, 2016 at 6:26 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
>> compatibility but most users would think "k(n1, n2, n3)" as quorum
>> after introduced quorum.
>> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
>> n3)" style before 9.6 releasing if we got consensus.
> 
> Considering breaking backward-compatibility in the next release does
> not sound like a good idea to me for a new feature that is going to be
> GA soon.

Indeed.  I'll vote for pulling a fast one on 9.6 for this.
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: Quorum commit for multiple synchronous replication.

From
Petr Jelinek
Date:

On 09/09/16 08:23, Vik Fearing wrote:
> On 09/09/2016 03:28 AM, Michael Paquier wrote:
>> On Thu, Sep 8, 2016 at 6:26 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
>>> compatibility but most users would think "k(n1, n2, n3)" as quorum
>>> after introduced quorum.
>>> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
>>> n3)" style before 9.6 releasing if we got consensus.
>>
>> Considering breaking backward-compatibility in the next release does
>> not sound like a good idea to me for a new feature that is going to be
>> GA soon.
>
> Indeed.  I'll vote for pulling a fast one on 9.6 for this.
>

+1

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



Re: Quorum commit for multiple synchronous replication.

From
Simon Riggs
Date:
On 8 September 2016 at 10:26, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
> compatibility but most users would think "k(n1, n2, n3)" as quorum
> after introduced quorum.
> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
> n3)" style before 9.6 releasing if we got consensus.

Let's see the proposed patch, so we can evaluate the proposal.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Fri, Sep 9, 2016 at 6:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 8 September 2016 at 10:26, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
>> compatibility but most users would think "k(n1, n2, n3)" as quorum
>> after introduced quorum.
>> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
>> n3)" style before 9.6 releasing if we got consensus.
>
> Let's see the proposed patch, so we can evaluate the proposal.
>

Attached 2 patches.
000 patch changes syntax of s_s_names from 'k(n1, n2, n3)' to 'First k
(n1, n2,n3)' for PG9.6.
001 patch adds the quorum commit using syntax 'Any k (n1, n2,n3)' for PG10.

Since we already released 9.6RC1, I understand that it's quite hard to
change syntax of 9.6.
But considering that we support the quorum commit, this could be one
of the solutions in order to avoid breaking backward compatibility and
to provide useful user interface.
So I attached these patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Sat, Sep 17, 2016 at 2:04 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Since we already released 9.6RC1, I understand that it's quite hard to
> change syntax of 9.6.
> But considering that we support the quorum commit, this could be one
> of the solutions in order to avoid breaking backward compatibility and
> to provide useful user interface.
> So I attached these patches.
standby_config:
-        standby_list                { $$ = create_syncrep_config("1", $1); }
-        | FIRST NUM '(' standby_list ')'    { $$ =
create_syncrep_config($1, $4); }
+        standby_list                        { $$ =
create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
+        | ANY NUM '(' standby_list ')'        { $$ =
create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
+        | FIRST NUM '(' standby_list ')'    { $$ =
create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }

Reading again the thread, it seems that my previous post [1] was a bit
misunderstood. My position is to not introduce any new behavior
changes in 9.6, so we could just make the FIRST NUM grammar equivalent
to NUM.

[1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4nJgmMgiAgvcSDKA@mail.gmail.com
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Vik Fearing
Date:
On 09/21/2016 08:30 AM, Michael Paquier wrote:
> On Sat, Sep 17, 2016 at 2:04 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Since we already released 9.6RC1, I understand that it's quite hard to
>> change syntax of 9.6.
>> But considering that we support the quorum commit, this could be one
>> of the solutions in order to avoid breaking backward compatibility and
>> to provide useful user interface.
>> So I attached these patches.
> 
>  standby_config:
> -        standby_list                { $$ = create_syncrep_config("1", $1); }
> -        | FIRST NUM '(' standby_list ')'    { $$ =
> create_syncrep_config($1, $4); }
> +        standby_list                        { $$ =
> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
> +        | ANY NUM '(' standby_list ')'        { $$ =
> create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
> +        | FIRST NUM '(' standby_list ')'    { $$ =
> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }
> 
> Reading again the thread, it seems that my previous post [1] was a bit
> misunderstood. My position is to not introduce any new behavior
> changes in 9.6, so we could just make the FIRST NUM grammar equivalent
> to NUM.
> 
> [1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4nJgmMgiAgvcSDKA@mail.gmail.com

I misunderstood your intent, then.  But I still stand by what I did
understand, namely that 'k (...)'  should mean 'any k (...)'.  It's much
more natural than having it mean 'first k (...)' and I also think it
will be more frequent in practice.
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: Quorum commit for multiple synchronous replication.

From
Petr Jelinek
Date:
On 21/09/16 09:18, Vik Fearing wrote:
> On 09/21/2016 08:30 AM, Michael Paquier wrote:
>> On Sat, Sep 17, 2016 at 2:04 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> Since we already released 9.6RC1, I understand that it's quite hard to
>>> change syntax of 9.6.
>>> But considering that we support the quorum commit, this could be one
>>> of the solutions in order to avoid breaking backward compatibility and
>>> to provide useful user interface.
>>> So I attached these patches.
>>
>>  standby_config:
>> -        standby_list                { $$ = create_syncrep_config("1", $1); }
>> -        | FIRST NUM '(' standby_list ')'    { $$ =
>> create_syncrep_config($1, $4); }
>> +        standby_list                        { $$ =
>> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
>> +        | ANY NUM '(' standby_list ')'        { $$ =
>> create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
>> +        | FIRST NUM '(' standby_list ')'    { $$ =
>> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }
>>
>> Reading again the thread, it seems that my previous post [1] was a bit
>> misunderstood. My position is to not introduce any new behavior
>> changes in 9.6, so we could just make the FIRST NUM grammar equivalent
>> to NUM.
>>
>> [1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4nJgmMgiAgvcSDKA@mail.gmail.com
> 
> I misunderstood your intent, then.  But I still stand by what I did
> understand, namely that 'k (...)'  should mean 'any k (...)'.  It's much
> more natural than having it mean 'first k (...)' and I also think it
> will be more frequent in practice.
> 

I think so as well.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Quorum commit for multiple synchronous replication.

From
Robert Haas
Date:
On Wed, Sep 21, 2016 at 5:54 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>> Reading again the thread, it seems that my previous post [1] was a bit
>>> misunderstood. My position is to not introduce any new behavior
>>> changes in 9.6, so we could just make the FIRST NUM grammar equivalent
>>> to NUM.
>>>
>>> [1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4nJgmMgiAgvcSDKA@mail.gmail.com
>>
>> I misunderstood your intent, then.  But I still stand by what I did
>> understand, namely that 'k (...)'  should mean 'any k (...)'.  It's much
>> more natural than having it mean 'first k (...)' and I also think it
>> will be more frequent in practice.
>>
>
> I think so as well.

Well, I agree, but I think making behavior changes after rc1 is a
non-starter.  It's better to live with the incompatibility than to
change the behavior so close to release.  At least, that's my
position.  Getting the release out on time with a minimal bug count is
more important to me than a minor incompatibility in the meaning of
one GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Wed, Sep 21, 2016 at 11:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 5:54 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>>> Reading again the thread, it seems that my previous post [1] was a bit
>>>> misunderstood. My position is to not introduce any new behavior
>>>> changes in 9.6, so we could just make the FIRST NUM grammar equivalent
>>>> to NUM.
>>>>
>>>> [1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4nJgmMgiAgvcSDKA@mail.gmail.com
>>>
>>> I misunderstood your intent, then.  But I still stand by what I did
>>> understand, namely that 'k (...)'  should mean 'any k (...)'.  It's much
>>> more natural than having it mean 'first k (...)' and I also think it
>>> will be more frequent in practice.
>>>
>>
>> I think so as well.
>
> Well, I agree, but I think making behavior changes after rc1 is a
> non-starter.  It's better to live with the incompatibility than to
> change the behavior so close to release.  At least, that's my
> position.  Getting the release out on time with a minimal bug count is
> more important to me than a minor incompatibility in the meaning of
> one GUC.
>

As the release team announced, it's better to postpone changing the
syntax of existing s_s_name.
I still vote for changing behaviour of existing syntax 'k (n1, n2)' to
quorum commit.
That is,
1.  'First k (n1, n2, n3)' means that the master server waits for ACKs
from k standby servers whose name appear earlier in the list.
2.  'Any k (n1, n2, n3)' means that the master server waits for ACKs
from any k listed standby servers.
3.  'n1, n2, n3' is the same as #1 with k=1.
4.  '(n1, n2, n3)' is the same as #2 with k=1.

Attached updated patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Sat, Sep 24, 2016 at 5:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I still vote for changing behaviour of existing syntax 'k (n1, n2)' to
> quorum commit.
> That is,
> 1.  'First k (n1, n2, n3)' means that the master server waits for ACKs
> from k standby servers whose name appear earlier in the list.
> 2.  'Any k (n1, n2, n3)' means that the master server waits for ACKs
> from any k listed standby servers.
> 3.  'n1, n2, n3' is the same as #1 with k=1.
> 4.  '(n1, n2, n3)' is the same as #2 with k=1.

OK, so I have done a review of this patch keeping that in mind as
that's the consensus. I am still getting familiar with the code...

-    transactions will wait until all the standby servers which are considered
+    transactions will wait until the multiple standby servers which
are considered
There is no real need to update this sentence.

+        <literal>FIRST</> means to control the standby servers with
+        different priorities. The synchronous standbys will be those
+        whose name appear earlier in this list, and that are both
+        currently connected and streaming data in real-time(as shown
+        by a state of <literal>streaming</> in the
+        <link linkend="monitoring-stats-views-table">
+        <literal>pg_stat_replication</></link> view). Other standby
+        servers appearing later in this list represent potential
+        synchronous standbys. If any of the current synchronous
+        standbys disconnects for whatever reason, it will be replaced
+        immediately with the next-highest-priority standby.
+        For example, a setting of <literal>FIRST 3 (s1, s2, s3, s4)</>
+        makes transaction commits wait until their WAL records are received
+        by three higher-priority standbys chosen from standby servers
+        <literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>.
It does not seem necessary to me to enter in this level of details:
The keyword FIRST, coupled with an integer number N, chooses the first
N higher-priority standbys and makes transaction commit when their WAL
records are received. For example <literal>FIRST 3 (s1, s2, s3, s4)</>
makes transaction commits wait until their WAL records are received by
the three high-priority standbys chosen from standby servers s1, s2,
s3 and s4.

+        <literal>ANY</> means to control all of standby servers with
+        same priority. The master sever will wait for receipt from
+        at least <replaceable class="parameter">num_sync</replaceable>
+        standbys, which is quorum commit in the literature. The all of
+        listed standbys are considered as candidate of quorum commit.
+        For example, a setting of <literal> ANY 3 (s1, s2, s3, s4)</> makes
+        transaction commits wait until receiving receipts from at least
+        any three standbys of four listed servers <literal>s1</>,
+        <literal>s2</>, <literal>s3</>, <literal>s4</>.

Similarly, something like that...
The keyword ANY, coupled with an integer number N, chooses N standbys
in a set of standbys with the same, lowest, priority and makes
transaction commit when WAL records are received those N standbys. For
example ANY 3(s1, s2, s3, s4) makes transaction commits wait until WAL
records have been received from 3 servers in the set s1, s2, s3 and
s4.

It could be good also to mention that no keyword specified means ANY,
which is incompatible with 9.6. The docs also miss the fact that if a
simple list of servers is given, without parenthesis and keywords,
this is equivalent to FIRST 1.

-synchronous_standby_names = '2 (s1, s2, s3)'
+synchronous_standby_names = 'First 2 (s1, s2, s3)'
Nit here. It may be a good idea to just use upper-case characters in
the docs, or just lower-case for consistency, but not mix both.
Usually GUCs use lower-case characters.

+      when standby is considered as a condidate of quorum commit.</entry>
s/condidate/candidate/

-syncrep_scanner.c: FLEXFLAGS = -CF -p
+syncrep_scanner.c: FLEXFLAGS = -CF -p -i
Hm... Is that actually a good idea? Now "NODE" and "node" are two
different things for application_name, but with this patch both would
have the same meaning. I am getting to think that we could just use
the lower-case characters for the keywords any/first. Is this -i
switch a problem for elements in standby_list?

+ * Calculate the 'pos' newest Write, Flush and Apply positions among
sync standbys.
I don't understand this comment.

+   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
+       got_recptr = SyncRepGetOldestSyncRecPtr(&writePtr, &flushPtr,
+                                            &applyPtr, &am_sync);
+   else /* SYNC_REP_QUORUM */
+       got_recptr = SyncRepGetNNewestSyncRecPtr(&writePtr, &flushPtr,
+                                             &applyPtr,
SyncRepConfig->num_sync,
+                                             &am_sync);
Those could be grouped together, there is no need to have pos as an argument.

+   /* In quroum method, all sync standby priorities are always 1 */
+   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
+       priority = 1;
This is dead code, SyncRepGetSyncStandbysPriority is not called for
QUORUM. You may want to add an assert in
SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be
sure that they are getting called for the correct method.

+   /* Sort each array in descending order to get 'pos' newest element */
+   qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
+   qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
+   qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
There is no need to reorder things again and to use arrays, you can
choose the newest LSNs when scanning the WalSnd entries.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Wed, Sep 28, 2016 at 5:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> OK, so I have done a review of this patch keeping that in mind as
> that's the consensus. I am still getting familiar with the code...

Returned with feedback for now. This just needs polishing so feel free
to move it to the next CF once you have a new patch.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Wed, Sep 28, 2016 at 5:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sat, Sep 24, 2016 at 5:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I still vote for changing behaviour of existing syntax 'k (n1, n2)' to
>> quorum commit.
>> That is,
>> 1.  'First k (n1, n2, n3)' means that the master server waits for ACKs
>> from k standby servers whose name appear earlier in the list.
>> 2.  'Any k (n1, n2, n3)' means that the master server waits for ACKs
>> from any k listed standby servers.
>> 3.  'n1, n2, n3' is the same as #1 with k=1.
>> 4.  '(n1, n2, n3)' is the same as #2 with k=1.
>
> OK, so I have done a review of this patch keeping that in mind as
> that's the consensus. I am still getting familiar with the code...

Thank you for reviewing!

> -    transactions will wait until all the standby servers which are considered
> +    transactions will wait until the multiple standby servers which
> are considered
> There is no real need to update this sentence.
>
> +        <literal>FIRST</> means to control the standby servers with
> +        different priorities. The synchronous standbys will be those
> +        whose name appear earlier in this list, and that are both
> +        currently connected and streaming data in real-time(as shown
> +        by a state of <literal>streaming</> in the
> +        <link linkend="monitoring-stats-views-table">
> +        <literal>pg_stat_replication</></link> view). Other standby
> +        servers appearing later in this list represent potential
> +        synchronous standbys. If any of the current synchronous
> +        standbys disconnects for whatever reason, it will be replaced
> +        immediately with the next-highest-priority standby.
> +        For example, a setting of <literal>FIRST 3 (s1, s2, s3, s4)</>
> +        makes transaction commits wait until their WAL records are received
> +        by three higher-priority standbys chosen from standby servers
> +        <literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>.
> It does not seem necessary to me to enter in this level of details:
> The keyword FIRST, coupled with an integer number N, chooses the first
> N higher-priority standbys and makes transaction commit when their WAL
> records are received. For example <literal>FIRST 3 (s1, s2, s3, s4)</>
> makes transaction commits wait until their WAL records are received by
> the three high-priority standbys chosen from standby servers s1, s2,
> s3 and s4.

Will fix.

> +        <literal>ANY</> means to control all of standby servers with
> +        same priority. The master sever will wait for receipt from
> +        at least <replaceable class="parameter">num_sync</replaceable>
> +        standbys, which is quorum commit in the literature. The all of
> +        listed standbys are considered as candidate of quorum commit.
> +        For example, a setting of <literal> ANY 3 (s1, s2, s3, s4)</> makes
> +        transaction commits wait until receiving receipts from at least
> +        any three standbys of four listed servers <literal>s1</>,
> +        <literal>s2</>, <literal>s3</>, <literal>s4</>.
>
> Similarly, something like that...
> The keyword ANY, coupled with an integer number N, chooses N standbys
> in a set of standbys with the same, lowest, priority and makes
> transaction commit when WAL records are received those N standbys. For
> example ANY 3(s1, s2, s3, s4) makes transaction commits wait until WAL
> records have been received from 3 servers in the set s1, s2, s3 and
> s4.

Will fix.

> It could be good also to mention that no keyword specified means ANY,
> which is incompatible with 9.6. The docs also miss the fact that if a
> simple list of servers is given, without parenthesis and keywords,
> this is equivalent to FIRST 1.

Right. I will add those documentations.

> -synchronous_standby_names = '2 (s1, s2, s3)'
> +synchronous_standby_names = 'First 2 (s1, s2, s3)'
> Nit here. It may be a good idea to just use upper-case characters in
> the docs, or just lower-case for consistency, but not mix both.
> Usually GUCs use lower-case characters.

Agree. Will fix.

> +      when standby is considered as a condidate of quorum commit.</entry>
> s/condidate/candidate/

Will fix.

> -syncrep_scanner.c: FLEXFLAGS = -CF -p
> +syncrep_scanner.c: FLEXFLAGS = -CF -p -i
> Hm... Is that actually a good idea? Now "NODE" and "node" are two
> different things for application_name, but with this patch both would
> have the same meaning. I am getting to think that we could just use
> the lower-case characters for the keywords any/first. Is this -i
> switch a problem for elements in standby_list?

The string of standby name is not changed actually, only the parser
doesn't distinguish between "NODE" and "node".
The values used for checking application_name will still works fine.
If we want to name "first" or "any" as the standby name then it should
be double quoted.

> + * Calculate the 'pos' newest Write, Flush and Apply positions among
> sync standbys.
> I don't understand this comment.
>
> +   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
> +       got_recptr = SyncRepGetOldestSyncRecPtr(&writePtr, &flushPtr,
> +                                            &applyPtr, &am_sync);
> +   else /* SYNC_REP_QUORUM */
> +       got_recptr = SyncRepGetNNewestSyncRecPtr(&writePtr, &flushPtr,
> +                                             &applyPtr,
> SyncRepConfig->num_sync,
> +                                             &am_sync);
> Those could be grouped together, there is no need to have pos as an argument.

Will fix.

> +   /* In quroum method, all sync standby priorities are always 1 */
> +   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
> +       priority = 1;
> This is dead code, SyncRepGetSyncStandbysPriority is not called for
> QUORUM.

Well, this code is in SyncRepGetStandbyPriority which is called by
SyncRepInitConifig.
SyncRepGetStandbyPriority can be called regardless of the the
synchronization method.


> You may want to add an assert in
> SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be
> sure that they are getting called for the correct method.
> +   /* Sort each array in descending order to get 'pos' newest element */
> +   qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
> +   qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
> +   qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
> There is no need to reorder things again and to use arrays, you can
> choose the newest LSNs when scanning the WalSnd entries.

I considered it that but it depends on performance.
Current patch avoids O(N*M).

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Tue, Oct 11, 2016 at 4:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> You may want to add an assert in
>> SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be
>> sure that they are getting called for the correct method.
>> +   /* Sort each array in descending order to get 'pos' newest element */
>> +   qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> +   qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> +   qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> There is no need to reorder things again and to use arrays, you can
>> choose the newest LSNs when scanning the WalSnd entries.
>
> I considered it that but it depends on performance.
> Current patch avoids O(N*M).

I am surprised by this statement. You would have O(N) by just
discarding the oldest LSN values while holding the spinlock of each
WAL sender. What SyncRepGetNNewestSyncRecPtr looks for are just the
newest apply, write and flush positions.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Tue, Oct 11, 2016 at 6:08 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Oct 11, 2016 at 4:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> You may want to add an assert in
>>> SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be
>>> sure that they are getting called for the correct method.
>>> +   /* Sort each array in descending order to get 'pos' newest element */
>>> +   qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
>>> +   qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
>>> +   qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
>>> There is no need to reorder things again and to use arrays, you can
>>> choose the newest LSNs when scanning the WalSnd entries.
>>
>> I considered it that but it depends on performance.
>> Current patch avoids O(N*M).
>
> I am surprised by this statement. You would have O(N) by just
> discarding the oldest LSN values while holding the spinlock of each
> WAL sender. What SyncRepGetNNewestSyncRecPtr looks for are just the
> newest apply, write and flush positions.

Bah, stupid. I just missed the point with 'pos'. Now I see the trick.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Tue, Oct 11, 2016 at 4:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Sep 28, 2016 at 5:14 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Sat, Sep 24, 2016 at 5:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> I still vote for changing behaviour of existing syntax 'k (n1, n2)' to
>>> quorum commit.
>>> That is,
>>> 1.  'First k (n1, n2, n3)' means that the master server waits for ACKs
>>> from k standby servers whose name appear earlier in the list.
>>> 2.  'Any k (n1, n2, n3)' means that the master server waits for ACKs
>>> from any k listed standby servers.
>>> 3.  'n1, n2, n3' is the same as #1 with k=1.
>>> 4.  '(n1, n2, n3)' is the same as #2 with k=1.
>>
>> OK, so I have done a review of this patch keeping that in mind as
>> that's the consensus. I am still getting familiar with the code...
>
> Thank you for reviewing!
>
>> -    transactions will wait until all the standby servers which are considered
>> +    transactions will wait until the multiple standby servers which
>> are considered
>> There is no real need to update this sentence.
>>
>> +        <literal>FIRST</> means to control the standby servers with
>> +        different priorities. The synchronous standbys will be those
>> +        whose name appear earlier in this list, and that are both
>> +        currently connected and streaming data in real-time(as shown
>> +        by a state of <literal>streaming</> in the
>> +        <link linkend="monitoring-stats-views-table">
>> +        <literal>pg_stat_replication</></link> view). Other standby
>> +        servers appearing later in this list represent potential
>> +        synchronous standbys. If any of the current synchronous
>> +        standbys disconnects for whatever reason, it will be replaced
>> +        immediately with the next-highest-priority standby.
>> +        For example, a setting of <literal>FIRST 3 (s1, s2, s3, s4)</>
>> +        makes transaction commits wait until their WAL records are received
>> +        by three higher-priority standbys chosen from standby servers
>> +        <literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>.
>> It does not seem necessary to me to enter in this level of details:
>> The keyword FIRST, coupled with an integer number N, chooses the first
>> N higher-priority standbys and makes transaction commit when their WAL
>> records are received. For example <literal>FIRST 3 (s1, s2, s3, s4)</>
>> makes transaction commits wait until their WAL records are received by
>> the three high-priority standbys chosen from standby servers s1, s2,
>> s3 and s4.
>
> Will fix.
>
>> +        <literal>ANY</> means to control all of standby servers with
>> +        same priority. The master sever will wait for receipt from
>> +        at least <replaceable class="parameter">num_sync</replaceable>
>> +        standbys, which is quorum commit in the literature. The all of
>> +        listed standbys are considered as candidate of quorum commit.
>> +        For example, a setting of <literal> ANY 3 (s1, s2, s3, s4)</> makes
>> +        transaction commits wait until receiving receipts from at least
>> +        any three standbys of four listed servers <literal>s1</>,
>> +        <literal>s2</>, <literal>s3</>, <literal>s4</>.
>>
>> Similarly, something like that...
>> The keyword ANY, coupled with an integer number N, chooses N standbys
>> in a set of standbys with the same, lowest, priority and makes
>> transaction commit when WAL records are received those N standbys. For
>> example ANY 3(s1, s2, s3, s4) makes transaction commits wait until WAL
>> records have been received from 3 servers in the set s1, s2, s3 and
>> s4.
>
> Will fix.
>
>> It could be good also to mention that no keyword specified means ANY,
>> which is incompatible with 9.6. The docs also miss the fact that if a
>> simple list of servers is given, without parenthesis and keywords,
>> this is equivalent to FIRST 1.
>
> Right. I will add those documentations.
>
>> -synchronous_standby_names = '2 (s1, s2, s3)'
>> +synchronous_standby_names = 'First 2 (s1, s2, s3)'
>> Nit here. It may be a good idea to just use upper-case characters in
>> the docs, or just lower-case for consistency, but not mix both.
>> Usually GUCs use lower-case characters.
>
> Agree. Will fix.
>
>> +      when standby is considered as a condidate of quorum commit.</entry>
>> s/condidate/candidate/
>
> Will fix.
>
>> -syncrep_scanner.c: FLEXFLAGS = -CF -p
>> +syncrep_scanner.c: FLEXFLAGS = -CF -p -i
>> Hm... Is that actually a good idea? Now "NODE" and "node" are two
>> different things for application_name, but with this patch both would
>> have the same meaning. I am getting to think that we could just use
>> the lower-case characters for the keywords any/first. Is this -i
>> switch a problem for elements in standby_list?
>
> The string of standby name is not changed actually, only the parser
> doesn't distinguish between "NODE" and "node".
> The values used for checking application_name will still works fine.
> If we want to name "first" or "any" as the standby name then it should
> be double quoted.
>
>> + * Calculate the 'pos' newest Write, Flush and Apply positions among
>> sync standbys.
>> I don't understand this comment.
>>
>> +   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
>> +       got_recptr = SyncRepGetOldestSyncRecPtr(&writePtr, &flushPtr,
>> +                                            &applyPtr, &am_sync);
>> +   else /* SYNC_REP_QUORUM */
>> +       got_recptr = SyncRepGetNNewestSyncRecPtr(&writePtr, &flushPtr,
>> +                                             &applyPtr,
>> SyncRepConfig->num_sync,
>> +                                             &am_sync);
>> Those could be grouped together, there is no need to have pos as an argument.
>
> Will fix.
>
>> +   /* In quroum method, all sync standby priorities are always 1 */
>> +   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>> +       priority = 1;
>> This is dead code, SyncRepGetSyncStandbysPriority is not called for
>> QUORUM.
>
> Well, this code is in SyncRepGetStandbyPriority which is called by
> SyncRepInitConifig.
> SyncRepGetStandbyPriority can be called regardless of the the
> synchronization method.
>
>
>> You may want to add an assert in
>> SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be
>> sure that they are getting called for the correct method.
>> +   /* Sort each array in descending order to get 'pos' newest element */
>> +   qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> +   qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> +   qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> There is no need to reorder things again and to use arrays, you can
>> choose the newest LSNs when scanning the WalSnd entries.
>
> I considered it that but it depends on performance.
> Current patch avoids O(N*M).
>

Attached latest patch.
Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Mon, Oct 17, 2016 at 4:00 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Attached latest patch.
> Please review it.

Okay, so let's move on with this patch...

+         <para>
+         The keyword <literal>ANY</> is omissible, but note that there is
+         not compatibility between <productname>PostgreSQL</> version 10 and
+         9.6 or before. For example, <literal>1 (s1, s2)</> is the same as the
+         configuration with <literal>FIRST</> and <replaceable
class="parameter">
+         num_sync</replaceable> equal to 1 in <productname>PostgreSQL</> 9.6
+         or before.  On the other hand, It's the same as the configuration with
+         <literal>ANY</> and <replaceable
class="parameter">num_sync</> equal to
+         1 in <productname>PostgreSQL</> 10 or later.
+        </para>
This paragraph could be reworded:
If FIRST or ANY are not specified, this parameter behaves as ANY. Note
that this grammar is incompatible with PostgreSQL 9.6, where no
keyword specified is equivalent as if FIRST was specified.
In short, there is no real need to specify num_sync as this behavior
does not have changed, as well as it is not necessary to mention
pre-9.6 versions as the multi-sync grammar has been added in 9.6.

-        Specifying more than one standby name can allow very high availability.
Why removing this sentence?

+        The keyword <literal>ANY</>, coupeld with an interger number N,
s/coupeld/coupled/ and s/interger/integer/, for a double hit in one
line, still...

+        The keyword <literal>ANY</>, coupeld with an interger number N,
+        chooses N standbys in a set of standbys with the same, lowest,
+        priority and makes transaction commit when WAL records are received
+        those N standbys.
This could be reworded more simply, for example: The keyword ANY,
coupled with an integer number N, makes transaction commits wait until
WAL records are received from N connected standbys among those defined
in the list of synchronous_standby_names.

+    <literal>s2</> and <literal>s3</> wil be considered as synchronous standby
s/wil/will/

+      when standby is considered as a condidate of quorum commit.</entry>
s/condidate/candidate/

[... stopping here ...] Please be more careful with the documentation
and comment grammar. There are other things in the patch..

A bunch of comments at the top of syncrep.c need to be updated.

+extern List *SyncRepGetSyncStandbysPriority(bool *am_sync);
+extern List *SyncRepGetSyncStandbysQuorum(bool *am_sync);
Those two should be static routines in syncrep.c, let's keep the whole
logic about quorum and higher-priority definition only there and not
bother the callers of them about that.

+   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
+       return SyncRepGetSyncStandbysPriority(am_sync);
+   else /* SYNC_REP_QUORUM */
+       return SyncRepGetSyncStandbysQuorum(am_sync);
Both routines share the same logic to detect if a WAL sender can be
selected as a candidate for sync evaluation or not, still per the
selection they do I agree that it is better to keep them as separate.

+   /* In quroum method, all sync standby priorities are always 1 */
+   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
+       priority = 1;
Honestly I don't understand why you are enforcing that. Priority can
be important for users willing to switch from ANY to FIRST to have a
look immediately at what are the standbys that would become sync or
potential.
           else if (list_member_int(sync_standbys, i))
-               values[7] = CStringGetTextDatum("sync");
+               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
+                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
The comment at the top of this code block needs to be refreshed.

The representation given to the user in pg_stat_replication is not
enough IMO. For example, imagine a cluster with 4 standbys:
=# select application_name, sync_priority, sync_state from pg_stat_replication ;application_name | sync_priority |
sync_state
------------------+---------------+------------node_5433        |             0 | asyncnode_5434        |             0
|asyncnode_5435        |             0 | asyncnode_5436        |             0 | async
 
(4 rows)

If FIRST N is used, is it easy for the user to understand what are the
nodes in sync:
=# alter system set synchronous_standby_names = 'FIRST 2 (node_5433,
node_5434, node_5435)';
ALTER SYSTEM
=# select pg_reload_conf();pg_reload_conf
----------------t
(1 row)
=# select application_name, sync_priority, sync_state from pg_stat_replication ;application_name | sync_priority |
sync_state
------------------+---------------+------------node_5433        |             1 | syncnode_5434        |             2
|syncnode_5435        |             3 | potentialnode_5436        |             0 | async
 
(4 rows)

In this case it is easy to understand that two nodes are required to be in sync.

When using ANY similarly for three nodes, here is what
pg_stat_replication tells:
=# select application_name, sync_priority, sync_state from pg_stat_replication ;application_name | sync_priority |
sync_state
------------------+---------------+------------node_5433        |             1 | quorumnode_5434        |
1| quorumnode_5435        |             1 | quorumnode_5436        |             0 | async
 
(4 rows)

It is not possible to guess from how many standbys this needs to wait
for. One idea would be to mark the sync_state not as "quorum", but
"quorum-N", or just add a new column to indicate how many in the set
need to give a commit confirmation.

The patch is going into the right direction in my opinion.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Oct 17, 2016 at 4:00 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Attached latest patch.
>> Please review it.
>
> Okay, so let's move on with this patch...

Thank you for reviewing this patch.

> +         <para>
> +         The keyword <literal>ANY</> is omissible, but note that there is
> +         not compatibility between <productname>PostgreSQL</> version 10 and
> +         9.6 or before. For example, <literal>1 (s1, s2)</> is the same as the
> +         configuration with <literal>FIRST</> and <replaceable
> class="parameter">
> +         num_sync</replaceable> equal to 1 in <productname>PostgreSQL</> 9.6
> +         or before.  On the other hand, It's the same as the configuration with
> +         <literal>ANY</> and <replaceable
> class="parameter">num_sync</> equal to
> +         1 in <productname>PostgreSQL</> 10 or later.
> +        </para>
> This paragraph could be reworded:
> If FIRST or ANY are not specified, this parameter behaves as ANY. Note
> that this grammar is incompatible with PostgreSQL 9.6, where no
> keyword specified is equivalent as if FIRST was specified.
> In short, there is no real need to specify num_sync as this behavior
> does not have changed, as well as it is not necessary to mention
> pre-9.6 versions as the multi-sync grammar has been added in 9.6.

Fixed.

> -        Specifying more than one standby name can allow very high availability.
> Why removing this sentence?
>
> +        The keyword <literal>ANY</>, coupeld with an interger number N,
> s/coupeld/coupled/ and s/interger/integer/, for a double hit in one
> line, still...
>
> +        The keyword <literal>ANY</>, coupeld with an interger number N,
> +        chooses N standbys in a set of standbys with the same, lowest,
> +        priority and makes transaction commit when WAL records are received
> +        those N standbys.
> This could be reworded more simply, for example: The keyword ANY,
> coupled with an integer number N, makes transaction commits wait until
> WAL records are received from N connected standbys among those defined
> in the list of synchronous_standby_names.
>
> +    <literal>s2</> and <literal>s3</> wil be considered as synchronous standby
> s/wil/will/
>
> +      when standby is considered as a condidate of quorum commit.</entry>
> s/condidate/candidate/
>
> [... stopping here ...] Please be more careful with the documentation
> and comment grammar. There are other things in the patch..

I fix some typo as much as I found.

> A bunch of comments at the top of syncrep.c need to be updated.
>
> +extern List *SyncRepGetSyncStandbysPriority(bool *am_sync);
> +extern List *SyncRepGetSyncStandbysQuorum(bool *am_sync);
> Those two should be static routines in syncrep.c, let's keep the whole
> logic about quorum and higher-priority definition only there and not
> bother the callers of them about that.

Fixed.

> +   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
> +       return SyncRepGetSyncStandbysPriority(am_sync);
> +   else /* SYNC_REP_QUORUM */
> +       return SyncRepGetSyncStandbysQuorum(am_sync);
> Both routines share the same logic to detect if a WAL sender can be
> selected as a candidate for sync evaluation or not, still per the
> selection they do I agree that it is better to keep them as separate.
>
> +   /* In quroum method, all sync standby priorities are always 1 */
> +   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
> +       priority = 1;
> Honestly I don't understand why you are enforcing that. Priority can
> be important for users willing to switch from ANY to FIRST to have a
> look immediately at what are the standbys that would become sync or
> potential.

I thought that since all standbys appearing in s_s_names list are
treated equally in quorum method, these standbys should have same
priority.
If these standby have different sync_priority, it looks like that
master server replicates to standby server based on priority.

>             else if (list_member_int(sync_standbys, i))
> -               values[7] = CStringGetTextDatum("sync");
> +               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
> +                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
> The comment at the top of this code block needs to be refreshed.

Fixed.

> The representation given to the user in pg_stat_replication is not
> enough IMO. For example, imagine a cluster with 4 standbys:
> =# select application_name, sync_priority, sync_state from pg_stat_replication ;
>  application_name | sync_priority | sync_state
> ------------------+---------------+------------
>  node_5433        |             0 | async
>  node_5434        |             0 | async
>  node_5435        |             0 | async
>  node_5436        |             0 | async
> (4 rows)
>
> If FIRST N is used, is it easy for the user to understand what are the
> nodes in sync:
> =# alter system set synchronous_standby_names = 'FIRST 2 (node_5433,
> node_5434, node_5435)';
> ALTER SYSTEM
> =# select pg_reload_conf();
>  pg_reload_conf
> ----------------
>  t
> (1 row)
> =# select application_name, sync_priority, sync_state from pg_stat_replication ;
>  application_name | sync_priority | sync_state
> ------------------+---------------+------------
>  node_5433        |             1 | sync
>  node_5434        |             2 | sync
>  node_5435        |             3 | potential
>  node_5436        |             0 | async
> (4 rows)
>
> In this case it is easy to understand that two nodes are required to be in sync.
>
> When using ANY similarly for three nodes, here is what
> pg_stat_replication tells:
> =# select application_name, sync_priority, sync_state from pg_stat_replication ;
>  application_name | sync_priority | sync_state
> ------------------+---------------+------------
>  node_5433        |             1 | quorum
>  node_5434        |             1 | quorum
>  node_5435        |             1 | quorum
>  node_5436        |             0 | async
> (4 rows)
>
> It is not possible to guess from how many standbys this needs to wait
> for. One idea would be to mark the sync_state not as "quorum", but
> "quorum-N", or just add a new column to indicate how many in the set
> need to give a commit confirmation.

As Simon suggested before, we could support another feature that
allows the client to control the quorum number.
Considering adding that feature, I thought it's better to have and
control that information as a GUC parameter.
Thought?

Attached latest v5 patch.
Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Tue, Nov 8, 2016 at 12:25 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> +   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
>> +       return SyncRepGetSyncStandbysPriority(am_sync);
>> +   else /* SYNC_REP_QUORUM */
>> +       return SyncRepGetSyncStandbysQuorum(am_sync);
>> Both routines share the same logic to detect if a WAL sender can be
>> selected as a candidate for sync evaluation or not, still per the
>> selection they do I agree that it is better to keep them as separate.
>>
>> +   /* In quroum method, all sync standby priorities are always 1 */
>> +   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>> +       priority = 1;
>> Honestly I don't understand why you are enforcing that. Priority can
>> be important for users willing to switch from ANY to FIRST to have a
>> look immediately at what are the standbys that would become sync or
>> potential.
>
> I thought that since all standbys appearing in s_s_names list are
> treated equally in quorum method, these standbys should have same
> priority.
> If these standby have different sync_priority, it looks like that
> master server replicates to standby server based on priority.

No actually, because we know that they are a quorum set, and that they
work in the same set. The concept of priorities has no real meaning
for quorum as there is no ordering of the elements. Another, perhaps
cleaner idea may be to mark the field as NULL actually.

>> It is not possible to guess from how many standbys this needs to wait
>> for. One idea would be to mark the sync_state not as "quorum", but
>> "quorum-N", or just add a new column to indicate how many in the set
>> need to give a commit confirmation.
>
> As Simon suggested before, we could support another feature that
> allows the client to control the quorum number.
> Considering adding that feature, I thought it's better to have and
> control that information as a GUC parameter.
> Thought?

Similarly that would be a SIGHUP parameter? Why not. Perhaps my worry
is not that much legitimate, users could just look at s_s_names to
guess how many in hte set a commit needs to wait for.

+        <para>
+        <literal>FIRST</> and <literal>ANY</> are case-insensitive word
+        and the standby name having these words are must be double-quoted.
+        </para>
s/word/words/.

+        <literal>FIRST</> and <literal>ANY</> specify the method of
+        that how master server controls the standby servers.
A little bit hard to understand, I would suggest:
FIRST and ANY specify the method used by the master to control the
standby servers.

+        The keyword <literal>FIRST</>, coupled with an integer
+        number N higher-priority standbys and makes transaction commit
+        when their WAL records are received.
This is unclear to me. Here is a correction:
The keyword FIRST, coupled with an integer N, makes transaction commit
wait until WAL records are received fron the N standbys with higher
priority number.

+        <varname>synchronous_standby_names</>. For example, a setting
+        of <literal>ANY 3 (s1, s2, s3, s4)</> makes transaction commits
+        wait until receiving receipts from at least any three standbys
+        of four listed servers <literal>s1</>, <literal>s2</>, <literal>s3</>,
This could just mention WAL records instead of "receipts".

Instead of saying "an integer number N", we could use <literal>num_sync</>.

+         If <literal>FIRST</> or <literal>ANY</> are not specified,
this parameter
+         behaves as <literal>ANY</>. Note that this grammar is
incompatible with
+         <productname>PostgresSQL</> 9.6, where no keyword specified
is equivalent
+         as if <literal>FIRST</> was specified. In short, there is no
real need to
+         specify <replaceable class="parameter">num_sync</replaceable> as this
+         behavior does not have changed, as well as it is not
necessary to mention
+         pre-9.6 versions are the multi-sync grammar has been added in 9.6.
This paragraph could be reworked, say:
if FIRST or ANY are not specified this parameter behaves as if ANY is
used. Note that this grammar is incompatible with PostgreSQL 9.6 which
is the first version supporting multiple standbys with synchronous
replication, where no such keyword FIRST or ANY can be used. Note that
the grammar behaves as if FIRST is used, which is incompatible with
the post-9.6 behavior.

+     <entry>Synchronous state of this standby server. <literal>quorum-N</>
+     , where N is the number of synchronous standbys that transactions
+     need to wait for replies from, when standby is considered as a
+     candidate of quorum commit.</entry>
Nitpicking: I think that the comma goes to the previous line if it is
the first character of a line.

+   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
+       return SyncRepGetSyncStandbysPriority(am_sync);
+   else /* SYNC_REP_QUORUM */
+       return SyncRepGetSyncStandbysQuorum(am_sync)
Or that?
if (PRIORITY)   return StandbysPriority();
else if (QUORUM)   return StandbysQuorum();
else   elog(ERROR, "Boom");

+ * In priority method, we need the oldest these positions among sync
+ * standbys. In quorum method, we need the newest these positions
+ * specified by SyncRepConfig->num_sync.
Last sentence is grammatically incorrect, and it would be more correct
to precise the Nth LSN positions to be able to select k standbys from
a set of n ones.

+           SpinLockAcquire(&walsnd->mutex);
+           write_array[i] = walsnd->write;
+           flush_array[i]= walsnd->flush;
+           apply_array[i] = walsnd->flush;
+           SpinLockRelease(&walsnd->mutex);
A nit: adding a space on the self of the second = character. And you
need to save the apply position of the WAL sender, not the flush
position in the array that is going to be ordered.
           /*            * More easily understood version of standby state. This is purely
-            * informational, not different from priority.
+            * informational. In quorum method, we add the number to indicate
+            * how many in the set need to give a commit confirmation.            */           if (priority == 0)
       values[7] = CStringGetTextDatum("async");           else if (list_member_int(sync_standbys, i))
 
-               values[7] = CStringGetTextDatum("sync");
+               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
+                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum")
This code block and its explanation comments tell two different
stories. The comment is saying that something like "quorum-N" is used
but the code always prints "quorum".

It may be a good idea in the test to check that when no keywords is
specified the group of standbys is in quorum mode.

The code looks in good shape, I am still willing to run more advanced
tests manually.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Tue, Nov 8, 2016 at 10:12 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Nov 8, 2016 at 12:25 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> +   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
>>> +       return SyncRepGetSyncStandbysPriority(am_sync);
>>> +   else /* SYNC_REP_QUORUM */
>>> +       return SyncRepGetSyncStandbysQuorum(am_sync);
>>> Both routines share the same logic to detect if a WAL sender can be
>>> selected as a candidate for sync evaluation or not, still per the
>>> selection they do I agree that it is better to keep them as separate.
>>>
>>> +   /* In quroum method, all sync standby priorities are always 1 */
>>> +   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>>> +       priority = 1;
>>> Honestly I don't understand why you are enforcing that. Priority can
>>> be important for users willing to switch from ANY to FIRST to have a
>>> look immediately at what are the standbys that would become sync or
>>> potential.
>>
>> I thought that since all standbys appearing in s_s_names list are
>> treated equally in quorum method, these standbys should have same
>> priority.
>> If these standby have different sync_priority, it looks like that
>> master server replicates to standby server based on priority.
>
> No actually, because we know that they are a quorum set, and that they
> work in the same set. The concept of priorities has no real meaning
> for quorum as there is no ordering of the elements. Another, perhaps
> cleaner idea may be to mark the field as NULL actually.

We know that but I'm concerned it might confuse the user.
If these priorities are the same, it can obviously imply that all of
the standby listed in s_s_names are handled equally.

>>> It is not possible to guess from how many standbys this needs to wait
>>> for. One idea would be to mark the sync_state not as "quorum", but
>>> "quorum-N", or just add a new column to indicate how many in the set
>>> need to give a commit confirmation.
>>
>> As Simon suggested before, we could support another feature that
>> allows the client to control the quorum number.
>> Considering adding that feature, I thought it's better to have and
>> control that information as a GUC parameter.
>> Thought?
>
> Similarly that would be a SIGHUP parameter? Why not. Perhaps my worry
> is not that much legitimate, users could just look at s_s_names to
> guess how many in hte set a commit needs to wait for.

It would be PGC_USRSET similar to synchronous_commit. The user can
specify it in statement level.

> +        <para>
> +        <literal>FIRST</> and <literal>ANY</> are case-insensitive word
> +        and the standby name having these words are must be double-quoted.
> +        </para>
> s/word/words/.
>
> +        <literal>FIRST</> and <literal>ANY</> specify the method of
> +        that how master server controls the standby servers.
> A little bit hard to understand, I would suggest:
> FIRST and ANY specify the method used by the master to control the
> standby servers.
>
> +        The keyword <literal>FIRST</>, coupled with an integer
> +        number N higher-priority standbys and makes transaction commit
> +        when their WAL records are received.
> This is unclear to me. Here is a correction:
> The keyword FIRST, coupled with an integer N, makes transaction commit
> wait until WAL records are received fron the N standbys with higher
> priority number.
>
> +        <varname>synchronous_standby_names</>. For example, a setting
> +        of <literal>ANY 3 (s1, s2, s3, s4)</> makes transaction commits
> +        wait until receiving receipts from at least any three standbys
> +        of four listed servers <literal>s1</>, <literal>s2</>, <literal>s3</>,
> This could just mention WAL records instead of "receipts".
>
> Instead of saying "an integer number N", we could use <literal>num_sync</>.
>
> +         If <literal>FIRST</> or <literal>ANY</> are not specified,
> this parameter
> +         behaves as <literal>ANY</>. Note that this grammar is
> incompatible with
> +         <productname>PostgresSQL</> 9.6, where no keyword specified
> is equivalent
> +         as if <literal>FIRST</> was specified. In short, there is no
> real need to
> +         specify <replaceable class="parameter">num_sync</replaceable> as this
> +         behavior does not have changed, as well as it is not
> necessary to mention
> +         pre-9.6 versions are the multi-sync grammar has been added in 9.6.
> This paragraph could be reworked, say:
> if FIRST or ANY are not specified this parameter behaves as if ANY is
> used. Note that this grammar is incompatible with PostgreSQL 9.6 which
> is the first version supporting multiple standbys with synchronous
> replication, where no such keyword FIRST or ANY can be used. Note that
> the grammar behaves as if FIRST is used, which is incompatible with
> the post-9.6 behavior.
>
> +     <entry>Synchronous state of this standby server. <literal>quorum-N</>
> +     , where N is the number of synchronous standbys that transactions
> +     need to wait for replies from, when standby is considered as a
> +     candidate of quorum commit.</entry>
> Nitpicking: I think that the comma goes to the previous line if it is
> the first character of a line.
>
> +   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
> +       return SyncRepGetSyncStandbysPriority(am_sync);
> +   else /* SYNC_REP_QUORUM */
> +       return SyncRepGetSyncStandbysQuorum(am_sync)
> Or that?
> if (PRIORITY)
>     return StandbysPriority();
> else if (QUORUM)
>     return StandbysQuorum();
> else
>     elog(ERROR, "Boom");
>
> + * In priority method, we need the oldest these positions among sync
> + * standbys. In quorum method, we need the newest these positions
> + * specified by SyncRepConfig->num_sync.
> Last sentence is grammatically incorrect, and it would be more correct
> to precise the Nth LSN positions to be able to select k standbys from
> a set of n ones.
>
> +           SpinLockAcquire(&walsnd->mutex);
> +           write_array[i] = walsnd->write;
> +           flush_array[i]= walsnd->flush;
> +           apply_array[i] = walsnd->flush;
> +           SpinLockRelease(&walsnd->mutex);
> A nit: adding a space on the self of the second = character. And you
> need to save the apply position of the WAL sender, not the flush
> position in the array that is going to be ordered.
>
>             /*
>              * More easily understood version of standby state. This is purely
> -            * informational, not different from priority.
> +            * informational. In quorum method, we add the number to indicate
> +            * how many in the set need to give a commit confirmation.
>              */
>             if (priority == 0)
>                 values[7] = CStringGetTextDatum("async");
>             else if (list_member_int(sync_standbys, i))
> -               values[7] = CStringGetTextDatum("sync");
> +               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
> +                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum")
> This code block and its explanation comments tell two different
> stories. The comment is saying that something like "quorum-N" is used
> but the code always prints "quorum".
>
> It may be a good idea in the test to check that when no keywords is
> specified the group of standbys is in quorum mode.

Yeah, I will add some tests.

I will post new version patch incorporated other comments.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Mon, Nov 14, 2016 at 5:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Nov 8, 2016 at 10:12 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Tue, Nov 8, 2016 at 12:25 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> +   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
>>>> +       return SyncRepGetSyncStandbysPriority(am_sync);
>>>> +   else /* SYNC_REP_QUORUM */
>>>> +       return SyncRepGetSyncStandbysQuorum(am_sync);
>>>> Both routines share the same logic to detect if a WAL sender can be
>>>> selected as a candidate for sync evaluation or not, still per the
>>>> selection they do I agree that it is better to keep them as separate.
>>>>
>>>> +   /* In quroum method, all sync standby priorities are always 1 */
>>>> +   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>>>> +       priority = 1;
>>>> Honestly I don't understand why you are enforcing that. Priority can
>>>> be important for users willing to switch from ANY to FIRST to have a
>>>> look immediately at what are the standbys that would become sync or
>>>> potential.
>>>
>>> I thought that since all standbys appearing in s_s_names list are
>>> treated equally in quorum method, these standbys should have same
>>> priority.
>>> If these standby have different sync_priority, it looks like that
>>> master server replicates to standby server based on priority.
>>
>> No actually, because we know that they are a quorum set, and that they
>> work in the same set. The concept of priorities has no real meaning
>> for quorum as there is no ordering of the elements. Another, perhaps
>> cleaner idea may be to mark the field as NULL actually.
>
> We know that but I'm concerned it might confuse the user.
> If these priorities are the same, it can obviously imply that all of
> the standby listed in s_s_names are handled equally.
>
>>>> It is not possible to guess from how many standbys this needs to wait
>>>> for. One idea would be to mark the sync_state not as "quorum", but
>>>> "quorum-N", or just add a new column to indicate how many in the set
>>>> need to give a commit confirmation.
>>>
>>> As Simon suggested before, we could support another feature that
>>> allows the client to control the quorum number.
>>> Considering adding that feature, I thought it's better to have and
>>> control that information as a GUC parameter.
>>> Thought?
>>
>> Similarly that would be a SIGHUP parameter? Why not. Perhaps my worry
>> is not that much legitimate, users could just look at s_s_names to
>> guess how many in hte set a commit needs to wait for.
>
> It would be PGC_USRSET similar to synchronous_commit. The user can
> specify it in statement level.
>
>> +        <para>
>> +        <literal>FIRST</> and <literal>ANY</> are case-insensitive word
>> +        and the standby name having these words are must be double-quoted.
>> +        </para>
>> s/word/words/.
>>
>> +        <literal>FIRST</> and <literal>ANY</> specify the method of
>> +        that how master server controls the standby servers.
>> A little bit hard to understand, I would suggest:
>> FIRST and ANY specify the method used by the master to control the
>> standby servers.
>>
>> +        The keyword <literal>FIRST</>, coupled with an integer
>> +        number N higher-priority standbys and makes transaction commit
>> +        when their WAL records are received.
>> This is unclear to me. Here is a correction:
>> The keyword FIRST, coupled with an integer N, makes transaction commit
>> wait until WAL records are received fron the N standbys with higher
>> priority number.
>>
>> +        <varname>synchronous_standby_names</>. For example, a setting
>> +        of <literal>ANY 3 (s1, s2, s3, s4)</> makes transaction commits
>> +        wait until receiving receipts from at least any three standbys
>> +        of four listed servers <literal>s1</>, <literal>s2</>, <literal>s3</>,
>> This could just mention WAL records instead of "receipts".
>>
>> Instead of saying "an integer number N", we could use <literal>num_sync</>.
>>
>> +         If <literal>FIRST</> or <literal>ANY</> are not specified,
>> this parameter
>> +         behaves as <literal>ANY</>. Note that this grammar is
>> incompatible with
>> +         <productname>PostgresSQL</> 9.6, where no keyword specified
>> is equivalent
>> +         as if <literal>FIRST</> was specified. In short, there is no
>> real need to
>> +         specify <replaceable class="parameter">num_sync</replaceable> as this
>> +         behavior does not have changed, as well as it is not
>> necessary to mention
>> +         pre-9.6 versions are the multi-sync grammar has been added in 9.6.
>> This paragraph could be reworked, say:
>> if FIRST or ANY are not specified this parameter behaves as if ANY is
>> used. Note that this grammar is incompatible with PostgreSQL 9.6 which
>> is the first version supporting multiple standbys with synchronous
>> replication, where no such keyword FIRST or ANY can be used. Note that
>> the grammar behaves as if FIRST is used, which is incompatible with
>> the post-9.6 behavior.
>>
>> +     <entry>Synchronous state of this standby server. <literal>quorum-N</>
>> +     , where N is the number of synchronous standbys that transactions
>> +     need to wait for replies from, when standby is considered as a
>> +     candidate of quorum commit.</entry>
>> Nitpicking: I think that the comma goes to the previous line if it is
>> the first character of a line.
>>
>> +   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
>> +       return SyncRepGetSyncStandbysPriority(am_sync);
>> +   else /* SYNC_REP_QUORUM */
>> +       return SyncRepGetSyncStandbysQuorum(am_sync)
>> Or that?
>> if (PRIORITY)
>>     return StandbysPriority();
>> else if (QUORUM)
>>     return StandbysQuorum();
>> else
>>     elog(ERROR, "Boom");
>>
>> + * In priority method, we need the oldest these positions among sync
>> + * standbys. In quorum method, we need the newest these positions
>> + * specified by SyncRepConfig->num_sync.
>> Last sentence is grammatically incorrect, and it would be more correct
>> to precise the Nth LSN positions to be able to select k standbys from
>> a set of n ones.
>>
>> +           SpinLockAcquire(&walsnd->mutex);
>> +           write_array[i] = walsnd->write;
>> +           flush_array[i]= walsnd->flush;
>> +           apply_array[i] = walsnd->flush;
>> +           SpinLockRelease(&walsnd->mutex);
>> A nit: adding a space on the self of the second = character. And you
>> need to save the apply position of the WAL sender, not the flush
>> position in the array that is going to be ordered.
>>
>>             /*
>>              * More easily understood version of standby state. This is purely
>> -            * informational, not different from priority.
>> +            * informational. In quorum method, we add the number to indicate
>> +            * how many in the set need to give a commit confirmation.
>>              */
>>             if (priority == 0)
>>                 values[7] = CStringGetTextDatum("async");
>>             else if (list_member_int(sync_standbys, i))
>> -               values[7] = CStringGetTextDatum("sync");
>> +               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
>> +                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum")
>> This code block and its explanation comments tell two different
>> stories. The comment is saying that something like "quorum-N" is used
>> but the code always prints "quorum".
>>
>> It may be a good idea in the test to check that when no keywords is
>> specified the group of standbys is in quorum mode.
>
> Yeah, I will add some tests.
>
> I will post new version patch incorporated other comments.
>

Attached latest version patch incorporated review comments. After more
thought, I agree and changed the value of standby priority in quorum
method so that it's not set 1 forcibly. The all standby priorities are
1 If s_s_names = 'ANY(*)'.
Please review this patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Attached latest version patch incorporated review comments. After more
> thought, I agree and changed the value of standby priority in quorum
> method so that it's not set 1 forcibly. The all standby priorities are
> 1 If s_s_names = 'ANY(*)'.
> Please review this patch.

Sorry for my late reply. Here is my final lookup.

 <synopsis>
-<replaceable class="parameter">num_sync</replaceable> ( <replaceable
class="parameter">standby_name</replaceable> [, ...] )
+[ANY] <replaceable class="parameter">num_sync</replaceable> (
<replaceable class="parameter">standby_name</replaceable> [, ...] )
+FIRST <replaceable class="parameter">num_sync</replaceable> (
<replaceable class="parameter">standby_name</replaceable> [, ...] )
 <replaceable class="parameter">standby_name</replaceable> [, ...
This can just be replaced with [ ANY | FIRST ]. There is no need for
braces as the keyword is not mandatory.

+        is the name of a standby server.
+        <literal>FIRST</> and <literal>ANY</> specify the method used by
+        the master to control the standby servres.
         </para>
s/servres/servers/.

            if (priority == 0)
                values[7] = CStringGetTextDatum("async");
            else if (list_member_int(sync_standbys, i))
-               values[7] = CStringGetTextDatum("sync");
+               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
+                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
            else
                values[7] = CStringGetTextDatum("potential");
This can be simplified a bit as "quorum" is the state value for all
standbys with a non-zero priority when the method is set to
SYNC_REP_QUORUM:
            if (priority == 0)
                values[7] = CStringGetTextDatum("async");
+           else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM)
+               values[7] = CStringGetTextDatum("quorum");
            else if (list_member_int(sync_standbys, i))
                values[7] = CStringGetTextDatum("sync");
            else

SyncRepConfig data is made external to syncrep.c with this patch as
walsender.c needs to look at the sync method in place, no complain
about that after considering if there could be a more elegant way to
do things without this change.

While reviewing the patch, I have found a couple of incorrectly shaped
sentences, both in the docs and some comments. Attached is a new
version with this word-smithing. The patch is now switched as ready
for committer.
--
Michael

Attachment

Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Attached latest version patch incorporated review comments. After more
>> thought, I agree and changed the value of standby priority in quorum
>> method so that it's not set 1 forcibly. The all standby priorities are
>> 1 If s_s_names = 'ANY(*)'.
>> Please review this patch.
>
> Sorry for my late reply. Here is my final lookup.

Thank you for reviewing!

>  <synopsis>
> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable
> class="parameter">standby_name</replaceable> [, ...] )
> +[ANY] <replaceable class="parameter">num_sync</replaceable> (
> <replaceable class="parameter">standby_name</replaceable> [, ...] )
> +FIRST <replaceable class="parameter">num_sync</replaceable> (
> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>  <replaceable class="parameter">standby_name</replaceable> [, ...
> This can just be replaced with [ ANY | FIRST ]. There is no need for
> braces as the keyword is not mandatory.
>
> +        is the name of a standby server.
> +        <literal>FIRST</> and <literal>ANY</> specify the method used by
> +        the master to control the standby servres.
>          </para>
> s/servres/servers/.
>
>             if (priority == 0)
>                 values[7] = CStringGetTextDatum("async");
>             else if (list_member_int(sync_standbys, i))
> -               values[7] = CStringGetTextDatum("sync");
> +               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
> +                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
>             else
>                 values[7] = CStringGetTextDatum("potential");
> This can be simplified a bit as "quorum" is the state value for all
> standbys with a non-zero priority when the method is set to
> SYNC_REP_QUORUM:
>             if (priority == 0)
>                 values[7] = CStringGetTextDatum("async");
> +           else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM)
> +               values[7] = CStringGetTextDatum("quorum");
>             else if (list_member_int(sync_standbys, i))
>                 values[7] = CStringGetTextDatum("sync");
>             else
>
> SyncRepConfig data is made external to syncrep.c with this patch as
> walsender.c needs to look at the sync method in place, no complain
> about that after considering if there could be a more elegant way to
> do things without this change.

Agreed.

> While reviewing the patch, I have found a couple of incorrectly shaped
> sentences, both in the docs and some comments. Attached is a new
> version with this word-smithing. The patch is now switched as ready
> for committer.

Thanks. I found a typo in v7 patch, so attached latest v8 patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Mon, Nov 28, 2016 at 8:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> Attached latest version patch incorporated review comments. After more
>>> thought, I agree and changed the value of standby priority in quorum
>>> method so that it's not set 1 forcibly. The all standby priorities are
>>> 1 If s_s_names = 'ANY(*)'.
>>> Please review this patch.
>>
>> Sorry for my late reply. Here is my final lookup.
>
> Thank you for reviewing!
>
>>  <synopsis>
>> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable
>> class="parameter">standby_name</replaceable> [, ...] )
>> +[ANY] <replaceable class="parameter">num_sync</replaceable> (
>> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>> +FIRST <replaceable class="parameter">num_sync</replaceable> (
>> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>>  <replaceable class="parameter">standby_name</replaceable> [, ...
>> This can just be replaced with [ ANY | FIRST ]. There is no need for
>> braces as the keyword is not mandatory.
>>
>> +        is the name of a standby server.
>> +        <literal>FIRST</> and <literal>ANY</> specify the method used by
>> +        the master to control the standby servres.
>>          </para>
>> s/servres/servers/.
>>
>>             if (priority == 0)
>>                 values[7] = CStringGetTextDatum("async");
>>             else if (list_member_int(sync_standbys, i))
>> -               values[7] = CStringGetTextDatum("sync");
>> +               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
>> +                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
>>             else
>>                 values[7] = CStringGetTextDatum("potential");
>> This can be simplified a bit as "quorum" is the state value for all
>> standbys with a non-zero priority when the method is set to
>> SYNC_REP_QUORUM:
>>             if (priority == 0)
>>                 values[7] = CStringGetTextDatum("async");
>> +           else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>> +               values[7] = CStringGetTextDatum("quorum");
>>             else if (list_member_int(sync_standbys, i))
>>                 values[7] = CStringGetTextDatum("sync");
>>             else
>>
>> SyncRepConfig data is made external to syncrep.c with this patch as
>> walsender.c needs to look at the sync method in place, no complain
>> about that after considering if there could be a more elegant way to
>> do things without this change.
>
> Agreed.
>
>> While reviewing the patch, I have found a couple of incorrectly shaped
>> sentences, both in the docs and some comments. Attached is a new
>> version with this word-smithing. The patch is now switched as ready
>> for committer.
>
> Thanks. I found a typo in v7 patch, so attached latest v8 patch.

Moved patch to CF 2017-01, with same status "Ready for committer".
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Mon, Nov 28, 2016 at 8:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> Attached latest version patch incorporated review comments. After more
>>> thought, I agree and changed the value of standby priority in quorum
>>> method so that it's not set 1 forcibly. The all standby priorities are
>>> 1 If s_s_names = 'ANY(*)'.
>>> Please review this patch.
>>
>> Sorry for my late reply. Here is my final lookup.
>
> Thank you for reviewing!
>
>>  <synopsis>
>> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable
>> class="parameter">standby_name</replaceable> [, ...] )
>> +[ANY] <replaceable class="parameter">num_sync</replaceable> (
>> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>> +FIRST <replaceable class="parameter">num_sync</replaceable> (
>> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>>  <replaceable class="parameter">standby_name</replaceable> [, ...
>> This can just be replaced with [ ANY | FIRST ]. There is no need for
>> braces as the keyword is not mandatory.
>>
>> +        is the name of a standby server.
>> +        <literal>FIRST</> and <literal>ANY</> specify the method used by
>> +        the master to control the standby servres.
>>          </para>
>> s/servres/servers/.
>>
>>             if (priority == 0)
>>                 values[7] = CStringGetTextDatum("async");
>>             else if (list_member_int(sync_standbys, i))
>> -               values[7] = CStringGetTextDatum("sync");
>> +               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
>> +                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
>>             else
>>                 values[7] = CStringGetTextDatum("potential");
>> This can be simplified a bit as "quorum" is the state value for all
>> standbys with a non-zero priority when the method is set to
>> SYNC_REP_QUORUM:
>>             if (priority == 0)
>>                 values[7] = CStringGetTextDatum("async");
>> +           else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>> +               values[7] = CStringGetTextDatum("quorum");
>>             else if (list_member_int(sync_standbys, i))
>>                 values[7] = CStringGetTextDatum("sync");
>>             else
>>
>> SyncRepConfig data is made external to syncrep.c with this patch as
>> walsender.c needs to look at the sync method in place, no complain
>> about that after considering if there could be a more elegant way to
>> do things without this change.
>
> Agreed.
>
>> While reviewing the patch, I have found a couple of incorrectly shaped
>> sentences, both in the docs and some comments. Attached is a new
>> version with this word-smithing. The patch is now switched as ready
>> for committer.
>
> Thanks. I found a typo in v7 patch, so attached latest v8 patch.

+        qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
+        qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
+        qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);

In quorum commit, we need to calculate the N-th largest LSN from
M quorum synchronous standbys' LSN. N would be usually 1 - 3 and
M would be 1 - 10, I guess. You used the algorithm using qsort for
that calculation. But I'm not sure if that's enough effective algorithm
or not.

If M (i.e., number of quorum sync standbys) is enough large,
your choice would be good. But usually M seems not so large.

Thought?

Regards,

-- 
Fujii Masao



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Tue, Dec 6, 2016 at 1:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Nov 28, 2016 at 8:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> Attached latest version patch incorporated review comments. After more
>>>> thought, I agree and changed the value of standby priority in quorum
>>>> method so that it's not set 1 forcibly. The all standby priorities are
>>>> 1 If s_s_names = 'ANY(*)'.
>>>> Please review this patch.
>>>
>>> Sorry for my late reply. Here is my final lookup.
>>
>> Thank you for reviewing!
>>
>>>  <synopsis>
>>> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable
>>> class="parameter">standby_name</replaceable> [, ...] )
>>> +[ANY] <replaceable class="parameter">num_sync</replaceable> (
>>> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>>> +FIRST <replaceable class="parameter">num_sync</replaceable> (
>>> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>>>  <replaceable class="parameter">standby_name</replaceable> [, ...
>>> This can just be replaced with [ ANY | FIRST ]. There is no need for
>>> braces as the keyword is not mandatory.
>>>
>>> +        is the name of a standby server.
>>> +        <literal>FIRST</> and <literal>ANY</> specify the method used by
>>> +        the master to control the standby servres.
>>>          </para>
>>> s/servres/servers/.
>>>
>>>             if (priority == 0)
>>>                 values[7] = CStringGetTextDatum("async");
>>>             else if (list_member_int(sync_standbys, i))
>>> -               values[7] = CStringGetTextDatum("sync");
>>> +               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
>>> +                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
>>>             else
>>>                 values[7] = CStringGetTextDatum("potential");
>>> This can be simplified a bit as "quorum" is the state value for all
>>> standbys with a non-zero priority when the method is set to
>>> SYNC_REP_QUORUM:
>>>             if (priority == 0)
>>>                 values[7] = CStringGetTextDatum("async");
>>> +           else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>>> +               values[7] = CStringGetTextDatum("quorum");
>>>             else if (list_member_int(sync_standbys, i))
>>>                 values[7] = CStringGetTextDatum("sync");
>>>             else
>>>
>>> SyncRepConfig data is made external to syncrep.c with this patch as
>>> walsender.c needs to look at the sync method in place, no complain
>>> about that after considering if there could be a more elegant way to
>>> do things without this change.
>>
>> Agreed.
>>
>>> While reviewing the patch, I have found a couple of incorrectly shaped
>>> sentences, both in the docs and some comments. Attached is a new
>>> version with this word-smithing. The patch is now switched as ready
>>> for committer.
>>
>> Thanks. I found a typo in v7 patch, so attached latest v8 patch.
>
> +        qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
> +        qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
> +        qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
>
> In quorum commit, we need to calculate the N-th largest LSN from
> M quorum synchronous standbys' LSN. N would be usually 1 - 3 and
> M would be 1 - 10, I guess. You used the algorithm using qsort for
> that calculation. But I'm not sure if that's enough effective algorithm
> or not.
>
> If M (i.e., number of quorum sync standbys) is enough large,
> your choice would be good. But usually M seems not so large.
>

Thank you for the comment.

One another possible idea is to use the partial selection sort[1],
which takes O(MN) time. Since this is more efficient if N is small
this would be better than qsort for this case. But I'm not sure that
we can see such a difference by result of performance measurement.

[1] https://en.wikipedia.org/wiki/Selection_algorithm#Partial_selection_sort

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Tue, Dec 6, 2016 at 6:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Dec 6, 2016 at 1:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> If M (i.e., number of quorum sync standbys) is enough large,
>> your choice would be good. But usually M seems not so large.
>>
>
> Thank you for the comment.
>
> One another possible idea is to use the partial selection sort[1],
> which takes O(MN) time. Since this is more efficient if N is small
> this would be better than qsort for this case. But I'm not sure that
> we can see such a difference by result of performance measurement.
>
> [1] https://en.wikipedia.org/wiki/Selection_algorithm#Partial_selection_sort

We'll begin to see a minimal performance impact when selecting a sync
standby across hundreds of them, which is less than say what 0.1% (or
less) of existing deployments are doing. The current approach taken
seems simple enough to be kept, and performance is not something to
worry much IMHO.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Tue, Dec 6, 2016 at 6:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Dec 6, 2016 at 1:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Mon, Nov 28, 2016 at 8:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>> Attached latest version patch incorporated review comments. After more
>>>>> thought, I agree and changed the value of standby priority in quorum
>>>>> method so that it's not set 1 forcibly. The all standby priorities are
>>>>> 1 If s_s_names = 'ANY(*)'.
>>>>> Please review this patch.
>>>>
>>>> Sorry for my late reply. Here is my final lookup.
>>>
>>> Thank you for reviewing!
>>>
>>>>  <synopsis>
>>>> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable
>>>> class="parameter">standby_name</replaceable> [, ...] )
>>>> +[ANY] <replaceable class="parameter">num_sync</replaceable> (
>>>> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>>>> +FIRST <replaceable class="parameter">num_sync</replaceable> (
>>>> <replaceable class="parameter">standby_name</replaceable> [, ...] )
>>>>  <replaceable class="parameter">standby_name</replaceable> [, ...
>>>> This can just be replaced with [ ANY | FIRST ]. There is no need for
>>>> braces as the keyword is not mandatory.
>>>>
>>>> +        is the name of a standby server.
>>>> +        <literal>FIRST</> and <literal>ANY</> specify the method used by
>>>> +        the master to control the standby servres.
>>>>          </para>
>>>> s/servres/servers/.
>>>>
>>>>             if (priority == 0)
>>>>                 values[7] = CStringGetTextDatum("async");
>>>>             else if (list_member_int(sync_standbys, i))
>>>> -               values[7] = CStringGetTextDatum("sync");
>>>> +               values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
>>>> +                   CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");
>>>>             else
>>>>                 values[7] = CStringGetTextDatum("potential");
>>>> This can be simplified a bit as "quorum" is the state value for all
>>>> standbys with a non-zero priority when the method is set to
>>>> SYNC_REP_QUORUM:
>>>>             if (priority == 0)
>>>>                 values[7] = CStringGetTextDatum("async");
>>>> +           else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>>>> +               values[7] = CStringGetTextDatum("quorum");
>>>>             else if (list_member_int(sync_standbys, i))
>>>>                 values[7] = CStringGetTextDatum("sync");
>>>>             else
>>>>
>>>> SyncRepConfig data is made external to syncrep.c with this patch as
>>>> walsender.c needs to look at the sync method in place, no complain
>>>> about that after considering if there could be a more elegant way to
>>>> do things without this change.
>>>
>>> Agreed.
>>>
>>>> While reviewing the patch, I have found a couple of incorrectly shaped
>>>> sentences, both in the docs and some comments. Attached is a new
>>>> version with this word-smithing. The patch is now switched as ready
>>>> for committer.
>>>
>>> Thanks. I found a typo in v7 patch, so attached latest v8 patch.
>>
>> +        qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> +        qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> +        qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
>>
>> In quorum commit, we need to calculate the N-th largest LSN from
>> M quorum synchronous standbys' LSN. N would be usually 1 - 3 and
>> M would be 1 - 10, I guess. You used the algorithm using qsort for
>> that calculation. But I'm not sure if that's enough effective algorithm
>> or not.
>>
>> If M (i.e., number of quorum sync standbys) is enough large,
>> your choice would be good. But usually M seems not so large.
>>
>
> Thank you for the comment.
>
> One another possible idea is to use the partial selection sort[1],
> which takes O(MN) time. Since this is more efficient if N is small
> this would be better than qsort for this case. But I'm not sure that
> we can see such a difference by result of performance measurement.

So, isn't it better to compare the performance of some algorithms and
confirm which is the best for quorum commit? Since this code is hot, i.e.,
can be very frequently executed, I'd like to avoid waste of cycle as much
as possible.

Regards,

-- 
Fujii Masao



Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Wed, Dec 7, 2016 at 12:32 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> So, isn't it better to compare the performance of some algorithms and
> confirm which is the best for quorum commit? Since this code is hot, i.e.,
> can be very frequently executed, I'd like to avoid waste of cycle as much
> as possible.

It seems to me that it would be simple enough to write a script to do
that to avoid any other noise: allocate an array with N random
elements, and fetch the M-th element from it after applying a sort
method. I highly doubt that you'd see much difference with a low
number of elements, now if you scale at a thousand standbys in a
quorum set you may surely see something :*)
Anybody willing to try out?
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
At Wed, 7 Dec 2016 13:26:38 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqSyfsg=gHeqgXyzP0iGWvdyrXqnG-UENzfueaU=2m5-zg@mail.gmail.com>
> On Wed, Dec 7, 2016 at 12:32 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > So, isn't it better to compare the performance of some algorithms and
> > confirm which is the best for quorum commit? Since this code is hot, i.e.,
> > can be very frequently executed, I'd like to avoid waste of cycle as much
> > as possible.
> 
> It seems to me that it would be simple enough to write a script to do
> that to avoid any other noise: allocate an array with N random
> elements, and fetch the M-th element from it after applying a sort
> method. I highly doubt that you'd see much difference with a low
> number of elements, now if you scale at a thousand standbys in a
> quorum set you may surely see something :*)
> Anybody willing to try out?


Aside from measurement of the two sorting methods, I'd like to
point out that quorum commit basically doesn't need
sorting. Counting comforming santdbys while scanning the
walsender(receiver) LSN list comparing with the target LSN is
O(n). Small refactoring of SyncRerpGetOldestSyncRecPtr would
enough to do that.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Wed, Dec 7, 2016 at 2:49 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Aside from measurement of the two sorting methods, I'd like to
> point out that quorum commit basically doesn't need
> sorting. Counting conforming santdbys while scanning the
> walsender(receiver) LSN list comparing with the target LSN is
> O(n). Small refactoring of SyncRerpGetOldestSyncRecPtr would
> enough to do that.

Indeed, I haven't thought about that, and that's a no-brainer. That
would remove the need to allocate and sort each array, what is simply
needed is to track the number of times a newest value has been found.
So what this processing would do is updating the write/flush/apply
values for the first k loops if the new value is *older* than the
current one, where k is the quorum number, and between k+1 and N the
value gets updated only if the value compared is newer. No need to
take the mutex lock for a long time as well. By the way, the patch now
conflicts on HEAD, it needs a refresh.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Wed, Dec 7, 2016 at 4:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Dec 7, 2016 at 2:49 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Aside from measurement of the two sorting methods, I'd like to
>> point out that quorum commit basically doesn't need
>> sorting. Counting conforming santdbys while scanning the
>> walsender(receiver) LSN list comparing with the target LSN is
>> O(n). Small refactoring of SyncRerpGetOldestSyncRecPtr would
>> enough to do that.

What does the target LSN mean here?

> Indeed, I haven't thought about that, and that's a no-brainer. That
> would remove the need to allocate and sort each array, what is simply
> needed is to track the number of times a newest value has been found.
> So what this processing would do is updating the write/flush/apply
> values for the first k loops if the new value is *older* than the
> current one, where k is the quorum number, and between k+1 and N the
> value gets updated only if the value compared is newer. No need to
> take the mutex lock for a long time as well.

Sorry, I could not understand this algorithm. Could you elaborate
this? It takes only O(n) times?

> By the way, the patch now
> conflicts on HEAD, it needs a refresh.

Thanks, I'll post the latest patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Wed, Dec 7, 2016 at 5:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Dec 7, 2016 at 4:05 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Indeed, I haven't thought about that, and that's a no-brainer. That
>> would remove the need to allocate and sort each array, what is simply
>> needed is to track the number of times a newest value has been found.
>> So what this processing would do is updating the write/flush/apply
>> values for the first k loops if the new value is *older* than the
>> current one, where k is the quorum number, and between k+1 and N the
>> value gets updated only if the value compared is newer. No need to
>> take the mutex lock for a long time as well.
>
> Sorry, I could not understand this algorithm. Could you elaborate
> this? It takes only O(n) times?

Nah, please forget that, that was a random useless thought. There is
no way to be able to select the k-th element without knowing the
hierarchy induced by the others, which is what the partial sort would
help with here.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Robert Haas
Date:
On Tue, Dec 6, 2016 at 11:26 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Dec 7, 2016 at 12:32 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> So, isn't it better to compare the performance of some algorithms and
>> confirm which is the best for quorum commit? Since this code is hot, i.e.,
>> can be very frequently executed, I'd like to avoid waste of cycle as much
>> as possible.
>
> It seems to me that it would be simple enough to write a script to do
> that to avoid any other noise: allocate an array with N random
> elements, and fetch the M-th element from it after applying a sort
> method. I highly doubt that you'd see much difference with a low
> number of elements, now if you scale at a thousand standbys in a
> quorum set you may surely see something :*)
> Anybody willing to try out?

You could do that, but first I would code up the simplest, cleanest
algorithm you can think of and see if it even shows up in a 'perf'
profile.  Microbenchmarking is probably overkill here unless a problem
is visible on macrobenchmarks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
Hello, context switch was complete that time, sorry.

There's multiple "target LET"s. So we need kth-largest LTEs.

At Wed, 7 Dec 2016 19:04:23 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqR10OnEL5XxW1DVYvAXmtpEVNCMi=V-6Jb_9owFuY8aSg@mail.gmail.com>
> On Wed, Dec 7, 2016 at 5:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Sorry, I could not understand this algorithm. Could you elaborate
> > this? It takes only O(n) times?
> 
> Nah, please forget that, that was a random useless thought. There is
> no way to be able to select the k-th element without knowing the
> hierarchy induced by the others, which is what the partial sort would
> help with here.

So, let's consider for some cases,

- needing 3-quorum among 5 standbys.
There's no problem whatever make kth-largest we choose.Of course qsorts are fine.

- needing 10 quorums among 100 standbys.
I'm not sure if there's any difference with 3/5.

- needing 10 quorums among 1000 standbys.Obviously qsorts are doing too much.  Any kind of kth-largestalgorithm should
beinvolved.  For instance quickselect with O(nlong n) - O(n) seems better in comparison to O(n log n) - O(n^2)of
qsort.

- needing 700 quorums among 1000 standbys.
I don't think this case is worth consider but kth-largest isbetter even for this case.

If we don't 700/1000 is out of at least the current scope, I also
recommend to use kth-largest selection.


If not, the quorum calculation is triggered by every standby
reply message and the frequency of the calculation seems near to
the frequency of WAL-insertion for the worst case. (Right?) Even
the kth-largest takes too long time to have 1000 standys.

Maintining kth-largest in shared memory needs less CPU but leads
to more bad contention on the shared memory.

Inversely, we already have waiting LSNs of backends in procarray.
If we have another array in the order of waiting LSNs and having
a condition varialble on the number of comforming
walsenders. Every walsender can individually looking up it and
count it up. It might performs better but I'm not sure.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> You could do that, but first I would code up the simplest, cleanest
> algorithm you can think of and see if it even shows up in a 'perf'
> profile.  Microbenchmarking is probably overkill here unless a problem
> is visible on macrobenchmarks.

This is what I would go for! The current code is doing a simple thing:
select the Nth element using qsort() after scanning each WAL sender's
values. And I think that Sawada-san got it right. Even running on my
laptop a pgbench run with 10 sync standbys using a data set that fits
into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
using perf top on a non-assert, non-debug build. Hash tables and
allocations get a far larger share. Using the patch,
SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
nodes. Let's kick the ball for now. An extra patch could make things
better later on if that's worth it.
-- 
Michael



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> You could do that, but first I would code up the simplest, cleanest
>> algorithm you can think of and see if it even shows up in a 'perf'
>> profile.  Microbenchmarking is probably overkill here unless a problem
>> is visible on macrobenchmarks.
>
> This is what I would go for! The current code is doing a simple thing:
> select the Nth element using qsort() after scanning each WAL sender's
> values. And I think that Sawada-san got it right. Even running on my
> laptop a pgbench run with 10 sync standbys using a data set that fits
> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
> using perf top on a non-assert, non-debug build. Hash tables and
> allocations get a far larger share. Using the patch,
> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
> nodes. Let's kick the ball for now. An extra patch could make things
> better later on if that's worth it.

Yeah, since the both K and N could be not large these algorithm takes
almost the same time. And current patch does simple thing. When we
need over 100 or 1000 replication node the optimization could be
required.
Attached latest v9 patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Amit Kapila
Date:
On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> You could do that, but first I would code up the simplest, cleanest
>>> algorithm you can think of and see if it even shows up in a 'perf'
>>> profile.  Microbenchmarking is probably overkill here unless a problem
>>> is visible on macrobenchmarks.
>>
>> This is what I would go for! The current code is doing a simple thing:
>> select the Nth element using qsort() after scanning each WAL sender's
>> values. And I think that Sawada-san got it right. Even running on my
>> laptop a pgbench run with 10 sync standbys using a data set that fits
>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
>> using perf top on a non-assert, non-debug build. Hash tables and
>> allocations get a far larger share. Using the patch,
>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
>> nodes. Let's kick the ball for now. An extra patch could make things
>> better later on if that's worth it.
>
> Yeah, since the both K and N could be not large these algorithm takes
> almost the same time. And current patch does simple thing. When we
> need over 100 or 1000 replication node the optimization could be
> required.
> Attached latest v9 patch.
>

Few comments:

+ * In 10.0 we support two synchronization methods, priority and
+ * quorum. The number of synchronous standbys that transactions
+ * must wait for replies from and synchronization method are specified
+ * in synchronous_standby_names. This parameter also specifies a list
+ * of standby names, which determines the priority of each standby for
+ * being chosen as a synchronous standby. In priority method, the standbys
+ * whose names appear earlier in the list are given higher priority
+ * and will be considered as synchronous. Other standby servers appearing
+ * later in this list represent potential synchronous standbys. If any of
+ * the current synchronous standbys disconnects for whatever reason,
+ * it will be replaced immediately with the next-highest-priority standby.
+ * In quorum method, the all standbys appearing in the list are
+ * considered as a candidate for quorum commit.

In the above description, is priority method represented by FIRST and
quorum method by ANY in the synchronous_standby_names syntax?  If so,
it might be better to write about it explicitly.


2.--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state) }
 /*
- * Allocate a new 'memtuples' array, for the heap.  It will hold one tuple
- * from each input tape.
- */
- state->memtupsize = numInputTapes;
- state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
- USEMEM(state, GetMemoryChunkSpace(state->memtuples));
-
- /*
- * Use all the remaining memory we have available for read buffers among
- * the input tapes.
+ * Use all the spare memory we have available for read buffers among the
+ * input tapes.

This doesn't belong to this patch.

3.
+ * Return the list of sync standbys using according to synchronous method,

In above sentence, "using according" seems to either incomplete or wrong usage.

4.
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ "invalid synchronization method is specified \"%d\"",
+ SyncRepConfig->sync_method));

Here, the error message doesn't seem to aligned and you might want to
use errmsg for the same.

5.
+ * In priority method, we need the oldest these positions among sync
+ * standbys. In quorum method, we need the newest these positions
+ * specified by SyncRepConfig->num_sync.

/oldest these/oldest of these
/newest these positions specified/newest of these positions as specified

Instead of newest, can we consider to use latest?

6.
+ int sync_method; /* synchronization method */ /* member_names contains nmembers consecutive nul-terminated C strings
*/char member_names[FLEXIBLE_ARRAY_MEMBER];} SyncRepConfigData;
 

Can't we use 1 or 2 bytes to store sync_method information?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> You could do that, but first I would code up the simplest, cleanest
>>>> algorithm you can think of and see if it even shows up in a 'perf'
>>>> profile.  Microbenchmarking is probably overkill here unless a problem
>>>> is visible on macrobenchmarks.
>>>
>>> This is what I would go for! The current code is doing a simple thing:
>>> select the Nth element using qsort() after scanning each WAL sender's
>>> values. And I think that Sawada-san got it right. Even running on my
>>> laptop a pgbench run with 10 sync standbys using a data set that fits
>>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
>>> using perf top on a non-assert, non-debug build. Hash tables and
>>> allocations get a far larger share. Using the patch,
>>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
>>> nodes. Let's kick the ball for now. An extra patch could make things
>>> better later on if that's worth it.
>>
>> Yeah, since the both K and N could be not large these algorithm takes
>> almost the same time. And current patch does simple thing. When we
>> need over 100 or 1000 replication node the optimization could be
>> required.
>> Attached latest v9 patch.
>>
>
> Few comments:

Thank you for reviewing.

> + * In 10.0 we support two synchronization methods, priority and
> + * quorum. The number of synchronous standbys that transactions
> + * must wait for replies from and synchronization method are specified
> + * in synchronous_standby_names. This parameter also specifies a list
> + * of standby names, which determines the priority of each standby for
> + * being chosen as a synchronous standby. In priority method, the standbys
> + * whose names appear earlier in the list are given higher priority
> + * and will be considered as synchronous. Other standby servers appearing
> + * later in this list represent potential synchronous standbys. If any of
> + * the current synchronous standbys disconnects for whatever reason,
> + * it will be replaced immediately with the next-highest-priority standby.
> + * In quorum method, the all standbys appearing in the list are
> + * considered as a candidate for quorum commit.
>
> In the above description, is priority method represented by FIRST and
> quorum method by ANY in the synchronous_standby_names syntax?  If so,
> it might be better to write about it explicitly.

Added description.

>
> 2.
>  --- a/src/backend/utils/sort/tuplesort.c
> +++ b/src/backend/utils/sort/tuplesort.c
> @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state)
>   }
>
>   /*
> - * Allocate a new 'memtuples' array, for the heap.  It will hold one tuple
> - * from each input tape.
> - */
> - state->memtupsize = numInputTapes;
> - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
> - USEMEM(state, GetMemoryChunkSpace(state->memtuples));
> -
> - /*
> - * Use all the remaining memory we have available for read buffers among
> - * the input tapes.
> + * Use all the spare memory we have available for read buffers among the
> + * input tapes.
>
> This doesn't belong to this patch.

Oops, fixed.

> 3.
> + * Return the list of sync standbys using according to synchronous method,
>
> In above sentence, "using according" seems to either incomplete or wrong usage.

Fixed.

> 4.
> + else
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + "invalid synchronization method is specified \"%d\"",
> + SyncRepConfig->sync_method));
>
> Here, the error message doesn't seem to aligned and you might want to
> use errmsg for the same.
>

Fixed.

> 5.
> + * In priority method, we need the oldest these positions among sync
> + * standbys. In quorum method, we need the newest these positions
> + * specified by SyncRepConfig->num_sync.
>
> /oldest these/oldest of these
> /newest these positions specified/newest of these positions as specified

Fixed.

> Instead of newest, can we consider to use latest?

Yeah, I changed it so.


> 6.
> + int sync_method; /* synchronization method */
>   /* member_names contains nmembers consecutive nul-terminated C strings */
>   char member_names[FLEXIBLE_ARRAY_MEMBER];
>  } SyncRepConfigData;
>
> Can't we use 1 or 2 bytes to store sync_method information?

I changed it to uint8.

Attached latest v10 patch incorporated the review comments so far.
Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> You could do that, but first I would code up the simplest, cleanest
>>>>> algorithm you can think of and see if it even shows up in a 'perf'
>>>>> profile.  Microbenchmarking is probably overkill here unless a problem
>>>>> is visible on macrobenchmarks.
>>>>
>>>> This is what I would go for! The current code is doing a simple thing:
>>>> select the Nth element using qsort() after scanning each WAL sender's
>>>> values. And I think that Sawada-san got it right. Even running on my
>>>> laptop a pgbench run with 10 sync standbys using a data set that fits
>>>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
>>>> using perf top on a non-assert, non-debug build. Hash tables and
>>>> allocations get a far larger share. Using the patch,
>>>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
>>>> nodes. Let's kick the ball for now. An extra patch could make things
>>>> better later on if that's worth it.
>>>
>>> Yeah, since the both K and N could be not large these algorithm takes
>>> almost the same time. And current patch does simple thing. When we
>>> need over 100 or 1000 replication node the optimization could be
>>> required.
>>> Attached latest v9 patch.
>>>
>>
>> Few comments:
>
> Thank you for reviewing.
>
>> + * In 10.0 we support two synchronization methods, priority and
>> + * quorum. The number of synchronous standbys that transactions
>> + * must wait for replies from and synchronization method are specified
>> + * in synchronous_standby_names. This parameter also specifies a list
>> + * of standby names, which determines the priority of each standby for
>> + * being chosen as a synchronous standby. In priority method, the standbys
>> + * whose names appear earlier in the list are given higher priority
>> + * and will be considered as synchronous. Other standby servers appearing
>> + * later in this list represent potential synchronous standbys. If any of
>> + * the current synchronous standbys disconnects for whatever reason,
>> + * it will be replaced immediately with the next-highest-priority standby.
>> + * In quorum method, the all standbys appearing in the list are
>> + * considered as a candidate for quorum commit.
>>
>> In the above description, is priority method represented by FIRST and
>> quorum method by ANY in the synchronous_standby_names syntax?  If so,
>> it might be better to write about it explicitly.
>
> Added description.
>
>>
>> 2.
>>  --- a/src/backend/utils/sort/tuplesort.c
>> +++ b/src/backend/utils/sort/tuplesort.c
>> @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state)
>>   }
>>
>>   /*
>> - * Allocate a new 'memtuples' array, for the heap.  It will hold one tuple
>> - * from each input tape.
>> - */
>> - state->memtupsize = numInputTapes;
>> - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
>> - USEMEM(state, GetMemoryChunkSpace(state->memtuples));
>> -
>> - /*
>> - * Use all the remaining memory we have available for read buffers among
>> - * the input tapes.
>> + * Use all the spare memory we have available for read buffers among the
>> + * input tapes.
>>
>> This doesn't belong to this patch.
>
> Oops, fixed.
>
>> 3.
>> + * Return the list of sync standbys using according to synchronous method,
>>
>> In above sentence, "using according" seems to either incomplete or wrong usage.
>
> Fixed.
>
>> 4.
>> + else
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + "invalid synchronization method is specified \"%d\"",
>> + SyncRepConfig->sync_method));
>>
>> Here, the error message doesn't seem to aligned and you might want to
>> use errmsg for the same.
>>
>
> Fixed.
>
>> 5.
>> + * In priority method, we need the oldest these positions among sync
>> + * standbys. In quorum method, we need the newest these positions
>> + * specified by SyncRepConfig->num_sync.
>>
>> /oldest these/oldest of these
>> /newest these positions specified/newest of these positions as specified
>
> Fixed.
>
>> Instead of newest, can we consider to use latest?
>
> Yeah, I changed it so.
>
>
>> 6.
>> + int sync_method; /* synchronization method */
>>   /* member_names contains nmembers consecutive nul-terminated C strings */
>>   char member_names[FLEXIBLE_ARRAY_MEMBER];
>>  } SyncRepConfigData;
>>
>> Can't we use 1 or 2 bytes to store sync_method information?
>
> I changed it to uint8.
>
> Attached latest v10 patch incorporated the review comments so far.
> Please review it.

Thanks for updating the patch!

Do we need to update postgresql.conf.sample?

+{any_ident}    {
+                yylval.str = pstrdup(yytext);
+                return ANY;
+        }
+{first_ident}    {
+                yylval.str = pstrdup(yytext);
+                return FIRST;
+        }

Why is pstrdup(yytext) necessary here?

+        standby_list                        { $$ =
create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
+        | NUM '(' standby_list ')'            { $$ =
create_syncrep_config($1, $3, SYNC_REP_QUORUM); }
+        | ANY NUM '(' standby_list ')'        { $$ =
create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
+        | FIRST NUM '(' standby_list ')'    { $$ =
create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }

Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works
differently from curent version while "list" works in the same way as
current one) very confusing?

I prefer to either of

1. break the backward-compatibility, i.e., treat the first syntax of   "standby_list" as quorum commit
2. keep the backward-compatibility, i.e., treat the second syntax of   "NUM (standby_list)" as sync rep with the
priority

+        <literal>pg_stat_replication</></link> view). If the keyword
+        <literal>FIRST</> is specified, other standby servers appearing
+        later in this list represent potential synchronous standbys.
+        If any of the current synchronous standbys disconnects for
+        whatever reason, it will be replaced immediately with the
+        next-highest-priority standby. Specifying more than one standby
+        name can allow very high availability.

It seems strange to explain the behavior of FIRST before explaining
the syntax of synchronous_standby_names and FIRST.

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier
>>>> <michael.paquier@gmail.com> wrote:
>>>>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>>> You could do that, but first I would code up the simplest, cleanest
>>>>>> algorithm you can think of and see if it even shows up in a 'perf'
>>>>>> profile.  Microbenchmarking is probably overkill here unless a problem
>>>>>> is visible on macrobenchmarks.
>>>>>
>>>>> This is what I would go for! The current code is doing a simple thing:
>>>>> select the Nth element using qsort() after scanning each WAL sender's
>>>>> values. And I think that Sawada-san got it right. Even running on my
>>>>> laptop a pgbench run with 10 sync standbys using a data set that fits
>>>>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
>>>>> using perf top on a non-assert, non-debug build. Hash tables and
>>>>> allocations get a far larger share. Using the patch,
>>>>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
>>>>> nodes. Let's kick the ball for now. An extra patch could make things
>>>>> better later on if that's worth it.
>>>>
>>>> Yeah, since the both K and N could be not large these algorithm takes
>>>> almost the same time. And current patch does simple thing. When we
>>>> need over 100 or 1000 replication node the optimization could be
>>>> required.
>>>> Attached latest v9 patch.
>>>>
>>>
>>> Few comments:
>>
>> Thank you for reviewing.
>>
>>> + * In 10.0 we support two synchronization methods, priority and
>>> + * quorum. The number of synchronous standbys that transactions
>>> + * must wait for replies from and synchronization method are specified
>>> + * in synchronous_standby_names. This parameter also specifies a list
>>> + * of standby names, which determines the priority of each standby for
>>> + * being chosen as a synchronous standby. In priority method, the standbys
>>> + * whose names appear earlier in the list are given higher priority
>>> + * and will be considered as synchronous. Other standby servers appearing
>>> + * later in this list represent potential synchronous standbys. If any of
>>> + * the current synchronous standbys disconnects for whatever reason,
>>> + * it will be replaced immediately with the next-highest-priority standby.
>>> + * In quorum method, the all standbys appearing in the list are
>>> + * considered as a candidate for quorum commit.
>>>
>>> In the above description, is priority method represented by FIRST and
>>> quorum method by ANY in the synchronous_standby_names syntax?  If so,
>>> it might be better to write about it explicitly.
>>
>> Added description.
>>
>>>
>>> 2.
>>>  --- a/src/backend/utils/sort/tuplesort.c
>>> +++ b/src/backend/utils/sort/tuplesort.c
>>> @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state)
>>>   }
>>>
>>>   /*
>>> - * Allocate a new 'memtuples' array, for the heap.  It will hold one tuple
>>> - * from each input tape.
>>> - */
>>> - state->memtupsize = numInputTapes;
>>> - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
>>> - USEMEM(state, GetMemoryChunkSpace(state->memtuples));
>>> -
>>> - /*
>>> - * Use all the remaining memory we have available for read buffers among
>>> - * the input tapes.
>>> + * Use all the spare memory we have available for read buffers among the
>>> + * input tapes.
>>>
>>> This doesn't belong to this patch.
>>
>> Oops, fixed.
>>
>>> 3.
>>> + * Return the list of sync standbys using according to synchronous method,
>>>
>>> In above sentence, "using according" seems to either incomplete or wrong usage.
>>
>> Fixed.
>>
>>> 4.
>>> + else
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>> + "invalid synchronization method is specified \"%d\"",
>>> + SyncRepConfig->sync_method));
>>>
>>> Here, the error message doesn't seem to aligned and you might want to
>>> use errmsg for the same.
>>>
>>
>> Fixed.
>>
>>> 5.
>>> + * In priority method, we need the oldest these positions among sync
>>> + * standbys. In quorum method, we need the newest these positions
>>> + * specified by SyncRepConfig->num_sync.
>>>
>>> /oldest these/oldest of these
>>> /newest these positions specified/newest of these positions as specified
>>
>> Fixed.
>>
>>> Instead of newest, can we consider to use latest?
>>
>> Yeah, I changed it so.
>>
>>
>>> 6.
>>> + int sync_method; /* synchronization method */
>>>   /* member_names contains nmembers consecutive nul-terminated C strings */
>>>   char member_names[FLEXIBLE_ARRAY_MEMBER];
>>>  } SyncRepConfigData;
>>>
>>> Can't we use 1 or 2 bytes to store sync_method information?
>>
>> I changed it to uint8.
>>
>> Attached latest v10 patch incorporated the review comments so far.
>> Please review it.
>
> Thanks for updating the patch!
>
> Do we need to update postgresql.conf.sample?

Added description to postgresql.conf.sample.

> +{any_ident}    {
> +                yylval.str = pstrdup(yytext);
> +                return ANY;
> +        }
> +{first_ident}    {
> +                yylval.str = pstrdup(yytext);
> +                return FIRST;
> +        }
>
> Why is pstrdup(yytext) necessary here?

The first whole line was unnecessary actually. Removed.

> +        standby_list                        { $$ =
> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
> +        | NUM '(' standby_list ')'            { $$ =
> create_syncrep_config($1, $3, SYNC_REP_QUORUM); }
> +        | ANY NUM '(' standby_list ')'        { $$ =
> create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
> +        | FIRST NUM '(' standby_list ')'    { $$ =
> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }
>
> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works
> differently from curent version while "list" works in the same way as
> current one) very confusing?
>
> I prefer to either of
>
> 1. break the backward-compatibility, i.e., treat the first syntax of
>     "standby_list" as quorum commit
> 2. keep the backward-compatibility, i.e., treat the second syntax of
>     "NUM (standby_list)" as sync rep with the priority

There were some comments when I proposed the quorum commit. If we do
#1 it breaks the backward-compatibility with 9.5 or before as well. I
don't think it's a good idea. On the other hand, if we do #2 then the
behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM
(standby_list)''. But it would not what most of user will want and
would confuse the users of future version who will want to use the
quorum commit. Since many hackers thought that the sensible default
behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the
current patch chose to changes the behaviour of s_s_names and document
that changes thoroughly.

> +        <literal>pg_stat_replication</></link> view). If the keyword
> +        <literal>FIRST</> is specified, other standby servers appearing
> +        later in this list represent potential synchronous standbys.
> +        If any of the current synchronous standbys disconnects for
> +        whatever reason, it will be replaced immediately with the
> +        next-highest-priority standby. Specifying more than one standby
> +        name can allow very high availability.
>
> It seems strange to explain the behavior of FIRST before explaining
> the syntax of synchronous_standby_names and FIRST.
>

Updated document.

Attached latest v11 patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Amit Kapila
Date:
On Mon, Dec 12, 2016 at 9:54 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>
>>>> Few comments:
>>>
>>> Thank you for reviewing.
>>>
>>>> + * In 10.0 we support two synchronization methods, priority and
>>>> + * quorum. The number of synchronous standbys that transactions
>>>> + * must wait for replies from and synchronization method are specified
>>>> + * in synchronous_standby_names. This parameter also specifies a list
>>>> + * of standby names, which determines the priority of each standby for
>>>> + * being chosen as a synchronous standby. In priority method, the standbys
>>>> + * whose names appear earlier in the list are given higher priority
>>>> + * and will be considered as synchronous. Other standby servers appearing
>>>> + * later in this list represent potential synchronous standbys. If any of
>>>> + * the current synchronous standbys disconnects for whatever reason,
>>>> + * it will be replaced immediately with the next-highest-priority standby.
>>>> + * In quorum method, the all standbys appearing in the list are
>>>> + * considered as a candidate for quorum commit.
>>>>
>>>> In the above description, is priority method represented by FIRST and
>>>> quorum method by ANY in the synchronous_standby_names syntax?  If so,
>>>> it might be better to write about it explicitly.
>>>
>>> Added description.
>>>

+ * specified in synchronous_standby_names. The priority method is
+ * represented by FIRST, and the quorum method is represented by ANY

Full stop is missing after "ANY".

>>>
>>>> 6.
>>>> + int sync_method; /* synchronization method */
>>>>   /* member_names contains nmembers consecutive nul-terminated C strings */
>>>>   char member_names[FLEXIBLE_ARRAY_MEMBER];
>>>>  } SyncRepConfigData;
>>>>
>>>> Can't we use 1 or 2 bytes to store sync_method information?
>>>
>>> I changed it to uint8.
>>>

+ int8 sync_method; /* synchronization method */

> I changed it to uint8.

In mail, you have mentioned uint8, but in the code it is int8.  I
think you want to update code to use uint8.


>
>> +        standby_list                        { $$ =
>> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
>> +        | NUM '(' standby_list ')'            { $$ =
>> create_syncrep_config($1, $3, SYNC_REP_QUORUM); }
>> +        | ANY NUM '(' standby_list ')'        { $$ =
>> create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
>> +        | FIRST NUM '(' standby_list ')'    { $$ =
>> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }
>>
>> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works
>> differently from curent version while "list" works in the same way as
>> current one) very confusing?
>>
>> I prefer to either of
>>
>> 1. break the backward-compatibility, i.e., treat the first syntax of
>>     "standby_list" as quorum commit
>> 2. keep the backward-compatibility, i.e., treat the second syntax of
>>     "NUM (standby_list)" as sync rep with the priority
>

+1.

> There were some comments when I proposed the quorum commit. If we do
> #1 it breaks the backward-compatibility with 9.5 or before as well. I
> don't think it's a good idea. On the other hand, if we do #2 then the
> behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM
> (standby_list)''. But it would not what most of user will want and
> would confuse the users of future version who will want to use the
> quorum commit. Since many hackers thought that the sensible default
> behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the
> current patch chose to changes the behaviour of s_s_names and document
> that changes thoroughly.
>

Your arguments are sensible, but I think we should address the point
of confusion raised by Fujii-san.  As a developer, I feel breaking
backward compatibility (go with Option-1 mentioned above) here is a
good move as it can avoid confusions in future.  However, I know many
a time users are so accustomed to the current usage that they feel
irritated with the change in behavior even it is for their betterment,
so it is advisable to do so only if it is necessary or we have
feedback from a couple of users.  So in this case, if we don't want to
go with Option-1, then I think we should go with Option-2.  If we go
with Option-2, then we can anyway comeback to change the behavior
which is more sensible for future after feedback from users.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
At Tue, 13 Dec 2016 08:46:06 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1JoheFzO1rrKm391wJDepFvZr1geRh4ZJ_9iC4FOX+3Uw@mail.gmail.com>
> On Mon, Dec 12, 2016 at 9:54 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> >> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>>
> >>>> Few comments:
> >>>
> >>> Thank you for reviewing.
> >>>
> >>>> + * In 10.0 we support two synchronization methods, priority and
> >>>> + * quorum. The number of synchronous standbys that transactions
> >>>> + * must wait for replies from and synchronization method are specified
> >>>> + * in synchronous_standby_names. This parameter also specifies a list
> >>>> + * of standby names, which determines the priority of each standby for
> >>>> + * being chosen as a synchronous standby. In priority method, the standbys
> >>>> + * whose names appear earlier in the list are given higher priority
> >>>> + * and will be considered as synchronous. Other standby servers appearing
> >>>> + * later in this list represent potential synchronous standbys. If any of
> >>>> + * the current synchronous standbys disconnects for whatever reason,
> >>>> + * it will be replaced immediately with the next-highest-priority standby.
> >>>> + * In quorum method, the all standbys appearing in the list are
> >>>> + * considered as a candidate for quorum commit.
> >>>>
> >>>> In the above description, is priority method represented by FIRST and
> >>>> quorum method by ANY in the synchronous_standby_names syntax?  If so,
> >>>> it might be better to write about it explicitly.
> >>>
> >>> Added description.
> >>>
> 
> + * specified in synchronous_standby_names. The priority method is
> + * represented by FIRST, and the quorum method is represented by ANY
> 
> Full stop is missing after "ANY".
> 
> >>>
> >>>> 6.
> >>>> + int sync_method; /* synchronization method */
> >>>>   /* member_names contains nmembers consecutive nul-terminated C strings */
> >>>>   char member_names[FLEXIBLE_ARRAY_MEMBER];
> >>>>  } SyncRepConfigData;
> >>>>
> >>>> Can't we use 1 or 2 bytes to store sync_method information?
> >>>
> >>> I changed it to uint8.
> >>>
> 
> + int8 sync_method; /* synchronization method */
> 
> > I changed it to uint8.
> 
> In mail, you have mentioned uint8, but in the code it is int8.  I
> think you want to update code to use uint8.
> 
> 
> >
> >> +        standby_list                        { $$ =
> >> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
> >> +        | NUM '(' standby_list ')'            { $$ =
> >> create_syncrep_config($1, $3, SYNC_REP_QUORUM); }
> >> +        | ANY NUM '(' standby_list ')'        { $$ =
> >> create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
> >> +        | FIRST NUM '(' standby_list ')'    { $$ =
> >> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }
> >>
> >> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works
> >> differently from curent version while "list" works in the same way as
> >> current one) very confusing?
> >>
> >> I prefer to either of
> >>
> >> 1. break the backward-compatibility, i.e., treat the first syntax of
> >>     "standby_list" as quorum commit
> >> 2. keep the backward-compatibility, i.e., treat the second syntax of
> >>     "NUM (standby_list)" as sync rep with the priority
> >
> 
> +1.
> 
> > There were some comments when I proposed the quorum commit. If we do
> > #1 it breaks the backward-compatibility with 9.5 or before as well. I
> > don't think it's a good idea. On the other hand, if we do #2 then the
> > behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM
> > (standby_list)''. But it would not what most of user will want and
> > would confuse the users of future version who will want to use the
> > quorum commit. Since many hackers thought that the sensible default
> > behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the
> > current patch chose to changes the behaviour of s_s_names and document
> > that changes thoroughly.
> >
> 
> Your arguments are sensible, but I think we should address the point
> of confusion raised by Fujii-san.  As a developer, I feel breaking
> backward compatibility (go with Option-1 mentioned above) here is a
> good move as it can avoid confusions in future.  However, I know many
> a time users are so accustomed to the current usage that they feel
> irritated with the change in behavior even it is for their betterment,
> so it is advisable to do so only if it is necessary or we have
> feedback from a couple of users.  So in this case, if we don't want to
> go with Option-1, then I think we should go with Option-2.  If we go
> with Option-2, then we can anyway comeback to change the behavior
> which is more sensible for future after feedback from users.

This implicitly put an assumption that replication configuration
is defined by s_s_names. But in the past discussion, some people
suggested that quorum commit should be configured by another GUC
variable and I think it is the time to do this now.

So, we have the third option that would be like the following.
 - s_s_names is restored to work in the way as of 9.5.  or may   be the same as 9.6. Or immediately remove it! My
inclination  is doing this.
 
 - a new GUC varialbe such like "quorum_commit_standbys" (which   is exclusive to s_s_names) is defined for new version
of  quorum commit feature. The option-1 except "standby_list"   format is usable in this.
 

This doesn't puzzle users who don't read release notes carefully
(ME?). Leaving s_s_names can save some of such users but I don't
think it is requried at Pg10.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Tue, Dec 13, 2016 at 5:06 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Tue, 13 Dec 2016 08:46:06 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1JoheFzO1rrKm391wJDepFvZr1geRh4ZJ_9iC4FOX+3Uw@mail.gmail.com>
>> On Mon, Dec 12, 2016 at 9:54 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> > On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> >> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >>>>
>> >>>> Few comments:
>> >>>
>> >>> Thank you for reviewing.
>> >>>
>> >>>> + * In 10.0 we support two synchronization methods, priority and
>> >>>> + * quorum. The number of synchronous standbys that transactions
>> >>>> + * must wait for replies from and synchronization method are specified
>> >>>> + * in synchronous_standby_names. This parameter also specifies a list
>> >>>> + * of standby names, which determines the priority of each standby for
>> >>>> + * being chosen as a synchronous standby. In priority method, the standbys
>> >>>> + * whose names appear earlier in the list are given higher priority
>> >>>> + * and will be considered as synchronous. Other standby servers appearing
>> >>>> + * later in this list represent potential synchronous standbys. If any of
>> >>>> + * the current synchronous standbys disconnects for whatever reason,
>> >>>> + * it will be replaced immediately with the next-highest-priority standby.
>> >>>> + * In quorum method, the all standbys appearing in the list are
>> >>>> + * considered as a candidate for quorum commit.
>> >>>>
>> >>>> In the above description, is priority method represented by FIRST and
>> >>>> quorum method by ANY in the synchronous_standby_names syntax?  If so,
>> >>>> it might be better to write about it explicitly.
>> >>>
>> >>> Added description.
>> >>>
>>
>> + * specified in synchronous_standby_names. The priority method is
>> + * represented by FIRST, and the quorum method is represented by ANY
>>
>> Full stop is missing after "ANY".
>>
>> >>>
>> >>>> 6.
>> >>>> + int sync_method; /* synchronization method */
>> >>>>   /* member_names contains nmembers consecutive nul-terminated C strings */
>> >>>>   char member_names[FLEXIBLE_ARRAY_MEMBER];
>> >>>>  } SyncRepConfigData;
>> >>>>
>> >>>> Can't we use 1 or 2 bytes to store sync_method information?
>> >>>
>> >>> I changed it to uint8.
>> >>>
>>
>> + int8 sync_method; /* synchronization method */
>>
>> > I changed it to uint8.
>>
>> In mail, you have mentioned uint8, but in the code it is int8.  I
>> think you want to update code to use uint8.
>>
>>
>> >
>> >> +        standby_list                        { $$ =
>> >> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
>> >> +        | NUM '(' standby_list ')'            { $$ =
>> >> create_syncrep_config($1, $3, SYNC_REP_QUORUM); }
>> >> +        | ANY NUM '(' standby_list ')'        { $$ =
>> >> create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
>> >> +        | FIRST NUM '(' standby_list ')'    { $$ =
>> >> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }
>> >>
>> >> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works
>> >> differently from curent version while "list" works in the same way as
>> >> current one) very confusing?
>> >>
>> >> I prefer to either of
>> >>
>> >> 1. break the backward-compatibility, i.e., treat the first syntax of
>> >>     "standby_list" as quorum commit
>> >> 2. keep the backward-compatibility, i.e., treat the second syntax of
>> >>     "NUM (standby_list)" as sync rep with the priority
>> >
>>
>> +1.
>>
>> > There were some comments when I proposed the quorum commit. If we do
>> > #1 it breaks the backward-compatibility with 9.5 or before as well. I
>> > don't think it's a good idea. On the other hand, if we do #2 then the
>> > behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM
>> > (standby_list)''. But it would not what most of user will want and
>> > would confuse the users of future version who will want to use the
>> > quorum commit. Since many hackers thought that the sensible default
>> > behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the
>> > current patch chose to changes the behaviour of s_s_names and document
>> > that changes thoroughly.
>> >
>>
>> Your arguments are sensible, but I think we should address the point
>> of confusion raised by Fujii-san.  As a developer, I feel breaking
>> backward compatibility (go with Option-1 mentioned above) here is a
>> good move as it can avoid confusions in future.  However, I know many
>> a time users are so accustomed to the current usage that they feel
>> irritated with the change in behavior even it is for their betterment,
>> so it is advisable to do so only if it is necessary or we have
>> feedback from a couple of users.  So in this case, if we don't want to
>> go with Option-1, then I think we should go with Option-2.  If we go
>> with Option-2, then we can anyway comeback to change the behavior
>> which is more sensible for future after feedback from users.
>
> This implicitly put an assumption that replication configuration
> is defined by s_s_names. But in the past discussion, some people
> suggested that quorum commit should be configured by another GUC
> variable and I think it is the time to do this now.
>
> So, we have the third option that would be like the following.
>
>   - s_s_names is restored to work in the way as of 9.5.  or may
>     be the same as 9.6. Or immediately remove it! My inclination
>     is doing this.
>
>   - a new GUC varialbe such like "quorum_commit_standbys" (which
>     is exclusive to s_s_names) is defined for new version of
>     quorum commit feature. The option-1 except "standby_list"
>     format is usable in this.

If we drop the "standby_list" syntax, I don't think that new parameter is
necessary. We can keep s_s_names and just drop the support for that syntax
from s_s_names. This may be ok if we're really in "break all the things" mode
for PostgreSQL 10.

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> If we drop the "standby_list" syntax, I don't think that new parameter is
> necessary. We can keep s_s_names and just drop the support for that syntax
> from s_s_names. This may be ok if we're really in "break all the things" mode
> for PostgreSQL 10.

Please let's not raise that as an argument again... And not break the
s_list argument. Many users depend on that for just single sync
standbys. FWIW, I'd be in favor of backward compatibility and say that
a standby list is a priority list if we can maintain that. Upthread
agreement was to break that, I did not insist further, and won't if
that's still the feeling.
-- 
Michael



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> If we drop the "standby_list" syntax, I don't think that new parameter is
>> necessary. We can keep s_s_names and just drop the support for that syntax
>> from s_s_names. This may be ok if we're really in "break all the things" mode
>> for PostgreSQL 10.
>
> Please let's not raise that as an argument again... And not break the
> s_list argument. Many users depend on that for just single sync
> standbys. FWIW, I'd be in favor of backward compatibility and say that
> a standby list is a priority list if we can maintain that. Upthread
> agreement was to break that, I did not insist further, and won't if
> that's still the feeling.

I wonder why you think that the backward-compatibility for standby_list is
so "special". We renamed pg_xlog directory to pg_wal and are planning to
change recovery.conf API at all, though they have bigger impacts on
the existing users in terms of the backward compatibility. OTOH, so far,
changing GUC between major releases happened several times.

But I'm not against keeping the backward compatibility for standby_list,
to be honest. My concern is that the latest patch tries to support
the backward compatibility "partially" and which would be confusing to users,
as I told upthread.

So I'd like to propose to keep the backward compatibility fully for s_s_names
(i.e., both "standby_list" and "N (standby_list)" mean the priority method)
at the first commit, then continue discussing this and change it if we reach
the consensus until PostgreSQL 10 is actually released. Thought?

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> If we drop the "standby_list" syntax, I don't think that new parameter is
>>> necessary. We can keep s_s_names and just drop the support for that syntax
>>> from s_s_names. This may be ok if we're really in "break all the things" mode
>>> for PostgreSQL 10.
>>
>> Please let's not raise that as an argument again... And not break the
>> s_list argument. Many users depend on that for just single sync
>> standbys. FWIW, I'd be in favor of backward compatibility and say that
>> a standby list is a priority list if we can maintain that. Upthread
>> agreement was to break that, I did not insist further, and won't if
>> that's still the feeling.
>
> I wonder why you think that the backward-compatibility for standby_list is
> so "special". We renamed pg_xlog directory to pg_wal and are planning to
> change recovery.conf API at all, though they have bigger impacts on
> the existing users in terms of the backward compatibility. OTOH, so far,
> changing GUC between major releases happened several times.

Silent failures for existing failover deployments is a pain to solve
after doing upgrades. That's my only concern. Changing pg_wal would
result in a hard failure when upgrading. And changing the meaning of
the standby list (without keyword ANY or FIRST!) does not fall into
that category... So yes just removing support for standby list would
result in a hard failure, which would be fine for the
let-s-break-all-things move.

> But I'm not against keeping the backward compatibility for standby_list,
> to be honest. My concern is that the latest patch tries to support
> the backward compatibility "partially" and which would be confusing to users,
> as I told upthread.

If we try to support backward compatibility, I'd personally do it
fully, and have a list of standby names specified meaning a priority
list.

> So I'd like to propose to keep the backward compatibility fully for s_s_names
> (i.e., both "standby_list" and "N (standby_list)" mean the priority method)
> at the first commit, then continue discussing this and change it if we reach
> the consensus until PostgreSQL 10 is actually released. Thought?

+1 on that.
-- 
Michael



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Amit Kapila
Date:
On Thu, Dec 15, 2016 at 7:53 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>
>> So I'd like to propose to keep the backward compatibility fully for s_s_names
>> (i.e., both "standby_list" and "N (standby_list)" mean the priority method)
>> at the first commit, then continue discussing this and change it if we reach
>> the consensus until PostgreSQL 10 is actually released. Thought?
>
> +1 on that.
>

+1.  That is the safest option to proceed.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Thu, Dec 15, 2016 at 11:23 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> If we drop the "standby_list" syntax, I don't think that new parameter is
>>>> necessary. We can keep s_s_names and just drop the support for that syntax
>>>> from s_s_names. This may be ok if we're really in "break all the things" mode
>>>> for PostgreSQL 10.
>>>
>>> Please let's not raise that as an argument again... And not break the
>>> s_list argument. Many users depend on that for just single sync
>>> standbys. FWIW, I'd be in favor of backward compatibility and say that
>>> a standby list is a priority list if we can maintain that. Upthread
>>> agreement was to break that, I did not insist further, and won't if
>>> that's still the feeling.
>>
>> I wonder why you think that the backward-compatibility for standby_list is
>> so "special". We renamed pg_xlog directory to pg_wal and are planning to
>> change recovery.conf API at all, though they have bigger impacts on
>> the existing users in terms of the backward compatibility. OTOH, so far,
>> changing GUC between major releases happened several times.
>
> Silent failures for existing failover deployments is a pain to solve
> after doing upgrades. That's my only concern. Changing pg_wal would
> result in a hard failure when upgrading. And changing the meaning of
> the standby list (without keyword ANY or FIRST!) does not fall into
> that category... So yes just removing support for standby list would
> result in a hard failure, which would be fine for the
> let-s-break-all-things move.
>
>> But I'm not against keeping the backward compatibility for standby_list,
>> to be honest. My concern is that the latest patch tries to support
>> the backward compatibility "partially" and which would be confusing to users,
>> as I told upthread.
> If we try to support backward compatibility, I'd personally do it
> fully, and have a list of standby names specified meaning a priority
> list.
>
>> So I'd like to propose to keep the backward compatibility fully for s_s_names
>> (i.e., both "standby_list" and "N (standby_list)" mean the priority method)
>> at the first commit, then continue discussing this and change it if we reach
>> the consensus until PostgreSQL 10 is actually released. Thought?
>
> +1 on that.

+1.
I'll update the patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
At Thu, 15 Dec 2016 14:20:53 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoDn73aC+o0mrWCs800LeOsMYP4oV7xVb0T0_4V5VCQzhQ@mail.gmail.com>
> On Thu, Dec 15, 2016 at 11:23 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> >> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
> >> <michael.paquier@gmail.com> wrote:
> >>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> >>>> If we drop the "standby_list" syntax, I don't think that new parameter is
> >>>> necessary. We can keep s_s_names and just drop the support for that syntax
> >>>> from s_s_names. This may be ok if we're really in "break all the things" mode
> >>>> for PostgreSQL 10.
> >>>
> >>> Please let's not raise that as an argument again... And not break the
> >>> s_list argument. Many users depend on that for just single sync
> >>> standbys. FWIW, I'd be in favor of backward compatibility and say that
> >>> a standby list is a priority list if we can maintain that. Upthread
> >>> agreement was to break that, I did not insist further, and won't if
> >>> that's still the feeling.
> >>
> >> I wonder why you think that the backward-compatibility for standby_list is
> >> so "special". We renamed pg_xlog directory to pg_wal and are planning to
> >> change recovery.conf API at all, though they have bigger impacts on
> >> the existing users in terms of the backward compatibility. OTOH, so far,
> >> changing GUC between major releases happened several times.
> >
> > Silent failures for existing failover deployments is a pain to solve
> > after doing upgrades. That's my only concern. Changing pg_wal would
> > result in a hard failure when upgrading. And changing the meaning of
> > the standby list (without keyword ANY or FIRST!) does not fall into
> > that category... So yes just removing support for standby list would
> > result in a hard failure, which would be fine for the
> > let-s-break-all-things move.
> >
> >> But I'm not against keeping the backward compatibility for standby_list,
> >> to be honest. My concern is that the latest patch tries to support
> >> the backward compatibility "partially" and which would be confusing to users,
> >> as I told upthread.
> > If we try to support backward compatibility, I'd personally do it
> > fully, and have a list of standby names specified meaning a priority
> > list.
> >
> >> So I'd like to propose to keep the backward compatibility fully for s_s_names
> >> (i.e., both "standby_list" and "N (standby_list)" mean the priority method)
> >> at the first commit, then continue discussing this and change it if we reach
> >> the consensus until PostgreSQL 10 is actually released. Thought?
> >
> > +1 on that.
> 
> +1.

FWIW, +1 from me.

> I'll update the patch.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Thu, Dec 15, 2016 at 3:06 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Thu, 15 Dec 2016 14:20:53 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoDn73aC+o0mrWCs800LeOsMYP4oV7xVb0T0_4V5VCQzhQ@mail.gmail.com>
>> On Thu, Dec 15, 2016 at 11:23 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>> > On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> >> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
>> >> <michael.paquier@gmail.com> wrote:
>> >>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> >>>> If we drop the "standby_list" syntax, I don't think that new parameter is
>> >>>> necessary. We can keep s_s_names and just drop the support for that syntax
>> >>>> from s_s_names. This may be ok if we're really in "break all the things" mode
>> >>>> for PostgreSQL 10.
>> >>>
>> >>> Please let's not raise that as an argument again... And not break the
>> >>> s_list argument. Many users depend on that for just single sync
>> >>> standbys. FWIW, I'd be in favor of backward compatibility and say that
>> >>> a standby list is a priority list if we can maintain that. Upthread
>> >>> agreement was to break that, I did not insist further, and won't if
>> >>> that's still the feeling.
>> >>
>> >> I wonder why you think that the backward-compatibility for standby_list is
>> >> so "special". We renamed pg_xlog directory to pg_wal and are planning to
>> >> change recovery.conf API at all, though they have bigger impacts on
>> >> the existing users in terms of the backward compatibility. OTOH, so far,
>> >> changing GUC between major releases happened several times.
>> >
>> > Silent failures for existing failover deployments is a pain to solve
>> > after doing upgrades. That's my only concern. Changing pg_wal would
>> > result in a hard failure when upgrading. And changing the meaning of
>> > the standby list (without keyword ANY or FIRST!) does not fall into
>> > that category... So yes just removing support for standby list would
>> > result in a hard failure, which would be fine for the
>> > let-s-break-all-things move.
>> >
>> >> But I'm not against keeping the backward compatibility for standby_list,
>> >> to be honest. My concern is that the latest patch tries to support
>> >> the backward compatibility "partially" and which would be confusing to users,
>> >> as I told upthread.
>> > If we try to support backward compatibility, I'd personally do it
>> > fully, and have a list of standby names specified meaning a priority
>> > list.
>> >
>> >> So I'd like to propose to keep the backward compatibility fully for s_s_names
>> >> (i.e., both "standby_list" and "N (standby_list)" mean the priority method)
>> >> at the first commit, then continue discussing this and change it if we reach
>> >> the consensus until PostgreSQL 10 is actually released. Thought?
>> >
>> > +1 on that.
>>
>> +1.
>
> FWIW, +1 from me.
>
>> I'll update the patch.
>

Attached latest v12 patch.
I changed behavior of "N (standby_list)" to use the priority method
and incorporated some review comments so far. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Thu, Dec 15, 2016 at 6:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Attached latest v12 patch.
> I changed behavior of "N (standby_list)" to use the priority method
> and incorporated some review comments so far. Please review it.

Some comments...

+    Another example of <varname>synchronous_standby_names</> for multiple
+    synchronous standby is:
Here standby takes an 's'.

+    candidates. The master server will wait for at least 2 replies from them.
+    <literal>s4</> is an asynchronous standby since its name is not in the list.
+   </para>
"will wait for replies from at least two of them".

+ * next-highest-priority standby. In quorum method, the all standbys
+ * appearing in the list are considered as a candidate for quorum commit.
"the all" is incorrect. I think you mean "all the" instead.

+ * NIL if no sync standby is connected. In quorum method, all standby
+ * priorities are same, that is 1. So this function returns the list of
This is not true. Standys have a priority number assigned. Though it does
not matter much for quorum groups, it gives an indication of their position
in the defined list.
#synchronous_standby_names = ''    # standby servers that provide sync rep-               # number of sync standbys and
comma-separatedlist of application_name+               # synchronization method, number of sync standbys+
#and comma-separated list of application_name                # from standby(s); '*' = all
 
The formulation is funny here: "sync rep synchronization method".

I think that Fujii-san has also some doc changes in his box. For anybody
picking up this patch next, it would be good to incorporate the things
I have noticed here.
-- 
Michael



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Fri, Dec 16, 2016 at 2:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Dec 15, 2016 at 6:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Attached latest v12 patch.
>> I changed behavior of "N (standby_list)" to use the priority method
>> and incorporated some review comments so far. Please review it.
>
> Some comments...
>
> +    Another example of <varname>synchronous_standby_names</> for multiple
> +    synchronous standby is:
> Here standby takes an 's'.
>
> +    candidates. The master server will wait for at least 2 replies from them.
> +    <literal>s4</> is an asynchronous standby since its name is not in the list.
> +   </para>
> "will wait for replies from at least two of them".
>
> + * next-highest-priority standby. In quorum method, the all standbys
> + * appearing in the list are considered as a candidate for quorum commit.
> "the all" is incorrect. I think you mean "all the" instead.
>
> + * NIL if no sync standby is connected. In quorum method, all standby
> + * priorities are same, that is 1. So this function returns the list of
> This is not true. Standys have a priority number assigned. Though it does
> not matter much for quorum groups, it gives an indication of their position
> in the defined list.
>
>  #synchronous_standby_names = ''    # standby servers that provide sync rep
>  -               # number of sync standbys and comma-separated list of application_name
>  +               # synchronization method, number of sync standbys
>  +               # and comma-separated list of application_name
>                  # from standby(s); '*' = all
> The formulation is funny here: "sync rep synchronization method".
>
> I think that Fujii-san has also some doc changes in his box. For anybody
> picking up this patch next, it would be good to incorporate the things
> I have noticed here.

Yes, I will. Thanks!

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Fri, Dec 16, 2016 at 5:04 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Dec 16, 2016 at 2:38 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Dec 15, 2016 at 6:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> Attached latest v12 patch.
>>> I changed behavior of "N (standby_list)" to use the priority method
>>> and incorporated some review comments so far. Please review it.
>>
>> Some comments...
>>
>> +    Another example of <varname>synchronous_standby_names</> for multiple
>> +    synchronous standby is:
>> Here standby takes an 's'.
>>
>> +    candidates. The master server will wait for at least 2 replies from them.
>> +    <literal>s4</> is an asynchronous standby since its name is not in the list.
>> +   </para>
>> "will wait for replies from at least two of them".
>>
>> + * next-highest-priority standby. In quorum method, the all standbys
>> + * appearing in the list are considered as a candidate for quorum commit.
>> "the all" is incorrect. I think you mean "all the" instead.
>>
>> + * NIL if no sync standby is connected. In quorum method, all standby
>> + * priorities are same, that is 1. So this function returns the list of
>> This is not true. Standys have a priority number assigned. Though it does
>> not matter much for quorum groups, it gives an indication of their position
>> in the defined list.
>>
>>  #synchronous_standby_names = ''    # standby servers that provide sync rep
>>  -               # number of sync standbys and comma-separated list of application_name
>>  +               # synchronization method, number of sync standbys
>>  +               # and comma-separated list of application_name
>>                  # from standby(s); '*' = all
>> The formulation is funny here: "sync rep synchronization method".
>>
>> I think that Fujii-san has also some doc changes in his box. For anybody
>> picking up this patch next, it would be good to incorporate the things
>> I have noticed here.
>
> Yes, I will. Thanks!

Attached is the modified version of the patch. Barring objections, I will
commit this version.

Even after committing the patch, there will be still many source comments
and documentations that we need to update, for example,
in high-availability.sgml. We need to check and update them throughly later.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Fri, Dec 16, 2016 at 10:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Attached is the modified version of the patch. Barring objections, I will
> commit this version.

There is a whitespace:
$ git diff master --check
src/backend/replication/syncrep.c:39: trailing whitespace.
+ *

> Even after committing the patch, there will be still many source comments
> and documentations that we need to update, for example,
> in high-availability.sgml. We need to check and update them throughly later.

The current patch is complicated enough, so that's fine for me. I
checked the patch one last time and that looks good.
-- 
Michael



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Sun, Dec 18, 2016 at 9:36 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Dec 16, 2016 at 10:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Attached is the modified version of the patch. Barring objections, I will
>> commit this version.
>
> There is a whitespace:
> $ git diff master --check
> src/backend/replication/syncrep.c:39: trailing whitespace.
> + *

Okey, pushed the patch with this fix. Thanks!

Regarding this feature, there are some loose ends. We should work on
and complete them until the release.

(1)
Which synchronous replication method, priority or quorum, should be
chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
a priority-based sync replication is chosen for keeping backward
compatibility. However some hackers argued to change this decision
so that a quorum commit is chosen because they think that most users
prefer to a quorum.

(2)
There will be still many source comments and documentations that
we need to update, for example, in high-availability.sgml. We need to
check and update them throughly.

(3)
The priority value is assigned to each standby listed in s_s_names
even in quorum commit though those priority values are not used at all.
Users can see those priority values in pg_stat_replication.
Isn't this confusing? If yes, it might be better to always assign 1 as
the priority, for example.


Any other?

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Alvaro Herrera
Date:
Fujii Masao wrote:

> Regarding this feature, there are some loose ends. We should work on
> and complete them until the release.

Please list these in https://wiki.postgresql.org/wiki/Open_Items so that we
don't forget.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Mon, Dec 19, 2016 at 9:49 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sun, Dec 18, 2016 at 9:36 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Dec 16, 2016 at 10:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> Attached is the modified version of the patch. Barring objections, I will
>>> commit this version.
>>
>> There is a whitespace:
>> $ git diff master --check
>> src/backend/replication/syncrep.c:39: trailing whitespace.
>> + *
>
> Okey, pushed the patch with this fix. Thanks!

Thank you for reviewing and commit!

> Regarding this feature, there are some loose ends. We should work on
> and complete them until the release.
>
> (1)
> Which synchronous replication method, priority or quorum, should be
> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
> a priority-based sync replication is chosen for keeping backward
> compatibility. However some hackers argued to change this decision
> so that a quorum commit is chosen because they think that most users
> prefer to a quorum.
>
> (2)
> There will be still many source comments and documentations that
> we need to update, for example, in high-availability.sgml. We need to
> check and update them throughly.

Will try to update them.

> (3)
> The priority value is assigned to each standby listed in s_s_names
> even in quorum commit though those priority values are not used at all.
> Users can see those priority values in pg_stat_replication.
> Isn't this confusing? If yes, it might be better to always assign 1 as
> the priority, for example.
>
>
> Any other?
>

Do we need to consider the sorting method and the selecting k-th
latest LSN method?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Tue, Dec 20, 2016 at 2:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Do we need to consider the sorting method and the selecting k-th
> latest LSN method?

Honestly, nah. Tests are showing that there are many more bottlenecks
before that with just memory allocation and parsing.
-- 
Michael



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Tue, Dec 20, 2016 at 1:44 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Fujii Masao wrote:
>
>> Regarding this feature, there are some loose ends. We should work on
>> and complete them until the release.
>
> Please list these in https://wiki.postgresql.org/wiki/Open_Items so that we
> don't forget.

Yep, added!

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Tue, Dec 20, 2016 at 2:46 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Dec 20, 2016 at 2:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Do we need to consider the sorting method and the selecting k-th
>> latest LSN method?
>
> Honestly, nah. Tests are showing that there are many more bottlenecks
> before that with just memory allocation and parsing.

I think that it's worth prototyping alternative algorithm, and
measuring the performances of those alternative and current
algorithms. This measurement should check not only the bottleneck
but also how much each algorithm increases the time that backends
need to wait for before they receive ack from walsender.

If it's reported that current algorithm is enough "effecient",
we can just leave the code as it is. OTOH, if not, let's adopt
the alternative one.

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
At Tue, 20 Dec 2016 23:47:22 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwFcEhv8BPP0HV2VQ8kXaHQmfN7PFAgkKsPyVip0frizpg@mail.gmail.com>
> On Tue, Dec 20, 2016 at 2:46 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Tue, Dec 20, 2016 at 2:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >> Do we need to consider the sorting method and the selecting k-th
> >> latest LSN method?
> >
> > Honestly, nah. Tests are showing that there are many more bottlenecks
> > before that with just memory allocation and parsing.
> 
> I think that it's worth prototyping alternative algorithm, and
> measuring the performances of those alternative and current
> algorithms. This measurement should check not only the bottleneck
> but also how much each algorithm increases the time that backends
> need to wait for before they receive ack from walsender.
> 
> If it's reported that current algorithm is enough "effecient",
> we can just leave the code as it is. OTOH, if not, let's adopt
> the alternative one.

I'm personally interested in the difference of them but it
doesn't seem urgently required. If we have nothing particular to
do with this, considering other ordering method would be
valuable.

By a not-well-grounded thought though, maintaining top-kth list
by insertion sort would be promising rather than running top-kth
sorting on the whole list. Sorting on all walsenders is needed
for the first time and some other situation though.


By the way, do we continue dispu^h^hcussing on the format of
s_s_names and/or a successor right now?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Wed, Dec 21, 2016 at 10:39 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Tue, 20 Dec 2016 23:47:22 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwFcEhv8BPP0HV2VQ8kXaHQmfN7PFAgkKsPyVip0frizpg@mail.gmail.com>
>> On Tue, Dec 20, 2016 at 2:46 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>> > On Tue, Dec 20, 2016 at 2:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >> Do we need to consider the sorting method and the selecting k-th
>> >> latest LSN method?
>> >
>> > Honestly, nah. Tests are showing that there are many more bottlenecks
>> > before that with just memory allocation and parsing.
>>
>> I think that it's worth prototyping alternative algorithm, and
>> measuring the performances of those alternative and current
>> algorithms. This measurement should check not only the bottleneck
>> but also how much each algorithm increases the time that backends
>> need to wait for before they receive ack from walsender.
>>
>> If it's reported that current algorithm is enough "effecient",
>> we can just leave the code as it is. OTOH, if not, let's adopt
>> the alternative one.
>
> I'm personally interested in the difference of them but it
> doesn't seem urgently required.

Yes, it's not urgent task.

> If we have nothing particular to
> do with this, considering other ordering method would be
> valuable.
>
> By a not-well-grounded thought though, maintaining top-kth list
> by insertion sort would be promising rather than running top-kth
> sorting on the whole list. Sorting on all walsenders is needed
> for the first time and some other situation though.
>
>
> By the way, do we continue dispu^h^hcussing on the format of
> s_s_names and/or a successor right now?

Yes. If there is better approach, we should discuss that.

Regards,

-- 
Fujii Masao



Re: Quorum commit for multiple synchronous replication.

From
Noah Misch
Date:
On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> Regarding this feature, there are some loose ends. We should work on
> and complete them until the release.
> 
> (1)
> Which synchronous replication method, priority or quorum, should be
> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
> a priority-based sync replication is chosen for keeping backward
> compatibility. However some hackers argued to change this decision
> so that a quorum commit is chosen because they think that most users
> prefer to a quorum.
> 
> (2)
> There will be still many source comments and documentations that
> we need to update, for example, in high-availability.sgml. We need to
> check and update them throughly.
> 
> (3)
> The priority value is assigned to each standby listed in s_s_names
> even in quorum commit though those priority values are not used at all.
> Users can see those priority values in pg_stat_replication.
> Isn't this confusing? If yes, it might be better to always assign 1 as
> the priority, for example.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>> Regarding this feature, there are some loose ends. We should work on
>> and complete them until the release.
>>
>> (1)
>> Which synchronous replication method, priority or quorum, should be
>> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
>> a priority-based sync replication is chosen for keeping backward
>> compatibility. However some hackers argued to change this decision
>> so that a quorum commit is chosen because they think that most users
>> prefer to a quorum.
>>
>> (2)
>> There will be still many source comments and documentations that
>> we need to update, for example, in high-availability.sgml. We need to
>> check and update them throughly.
>>
>> (3)
>> The priority value is assigned to each standby listed in s_s_names
>> even in quorum commit though those priority values are not used at all.
>> Users can see those priority values in pg_stat_replication.
>> Isn't this confusing? If yes, it might be better to always assign 1 as
>> the priority, for example.
>
> [Action required within three days.  This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
>
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Thanks for the notice!

Regarding the item (2), Sawada-san told me that he will work on it after
this CommitFest finishes. So we would receive the patch for the item from
him next week. If there will be no patch even after the end of next week
(i.e., April 14th), I will. Let's wait for Sawada-san's action at first.

The items (1) and (3) are not bugs. So I don't think that they need to be
resolved before the beta release. After the feature freeze, many users
will try and play with many new features including quorum-based syncrep.
Then if many of them complain about (1) and (3), we can change the code
at that timing. So we need more time that users can try the feature.

BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
as the priority if quorum-based sync rep is chosen. It's less confusing.

Regards,

-- 
Fujii Masao



Re: Quorum commit for multiple synchronous replication.

From
Noah Misch
Date:
On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> >> Regarding this feature, there are some loose ends. We should work on
> >> and complete them until the release.
> >>
> >> (1)
> >> Which synchronous replication method, priority or quorum, should be
> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
> >> a priority-based sync replication is chosen for keeping backward
> >> compatibility. However some hackers argued to change this decision
> >> so that a quorum commit is chosen because they think that most users
> >> prefer to a quorum.
> >>
> >> (2)
> >> There will be still many source comments and documentations that
> >> we need to update, for example, in high-availability.sgml. We need to
> >> check and update them throughly.
> >>
> >> (3)
> >> The priority value is assigned to each standby listed in s_s_names
> >> even in quorum commit though those priority values are not used at all.
> >> Users can see those priority values in pg_stat_replication.
> >> Isn't this confusing? If yes, it might be better to always assign 1 as
> >> the priority, for example.
> >
> > [Action required within three days.  This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
> > since you committed the patch believed to have created it, you own this open
> > item.  If some other commit is more relevant or if this does not belong as a
> > v10 open item, please let us know.  Otherwise, please observe the policy on
> > open item ownership[1] and send a status update within three calendar days of
> > this message.  Include a date for your subsequent status update.  Testers may
> > discover new open items at any time, and I want to plan to get them all fixed
> > well in advance of shipping v10.  Consequently, I will appreciate your efforts
> > toward speedy resolution.  Thanks.
> >
> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> Thanks for the notice!
> 
> Regarding the item (2), Sawada-san told me that he will work on it after
> this CommitFest finishes. So we would receive the patch for the item from
> him next week. If there will be no patch even after the end of next week
> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.

Sounds reasonable; I will look for your update on 14Apr or earlier.

> The items (1) and (3) are not bugs. So I don't think that they need to be
> resolved before the beta release. After the feature freeze, many users
> will try and play with many new features including quorum-based syncrep.
> Then if many of them complain about (1) and (3), we can change the code
> at that timing. So we need more time that users can try the feature.

I've moved (1) to a new section for things to revisit during beta.  If someone
feels strongly that the current behavior is Wrong and must change, speak up as
soon as you reach that conclusion.  Absent such arguments, the behavior won't
change.

> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
> as the priority if quorum-based sync rep is chosen. It's less confusing.

Since you do want (3) to change, please own it like any other open item,
including the mandatory status updates.



Re: Quorum commit for multiple synchronous replication.

From
Petr Jelinek
Date:
On 06/04/17 03:51, Noah Misch wrote:
> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
>>> On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>>>> Regarding this feature, there are some loose ends. We should work on
>>>> and complete them until the release.
>>>>
>>>> (1)
>>>> Which synchronous replication method, priority or quorum, should be
>>>> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
>>>> a priority-based sync replication is chosen for keeping backward
>>>> compatibility. However some hackers argued to change this decision
>>>> so that a quorum commit is chosen because they think that most users
>>>> prefer to a quorum.
>>>>
>>>> (2)
>>>> There will be still many source comments and documentations that
>>>> we need to update, for example, in high-availability.sgml. We need to
>>>> check and update them throughly.
>>>>
>>>> (3)
>>>> The priority value is assigned to each standby listed in s_s_names
>>>> even in quorum commit though those priority values are not used at all.
>>>> Users can see those priority values in pg_stat_replication.
>>>> Isn't this confusing? If yes, it might be better to always assign 1 as
>>>> the priority, for example.
>>>
>>> [Action required within three days.  This is a generic notification.]
>>>
>>> The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
>>> since you committed the patch believed to have created it, you own this open
>>> item.  If some other commit is more relevant or if this does not belong as a
>>> v10 open item, please let us know.  Otherwise, please observe the policy on
>>> open item ownership[1] and send a status update within three calendar days of
>>> this message.  Include a date for your subsequent status update.  Testers may
>>> discover new open items at any time, and I want to plan to get them all fixed
>>> well in advance of shipping v10.  Consequently, I will appreciate your efforts
>>> toward speedy resolution.  Thanks.
>>>
>>> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>
>> Thanks for the notice!
>>
>> Regarding the item (2), Sawada-san told me that he will work on it after
>> this CommitFest finishes. So we would receive the patch for the item from
>> him next week. If there will be no patch even after the end of next week
>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
> 
> Sounds reasonable; I will look for your update on 14Apr or earlier.
> 
>> The items (1) and (3) are not bugs. So I don't think that they need to be
>> resolved before the beta release. After the feature freeze, many users
>> will try and play with many new features including quorum-based syncrep.
>> Then if many of them complain about (1) and (3), we can change the code
>> at that timing. So we need more time that users can try the feature.
> 
> I've moved (1) to a new section for things to revisit during beta.  If someone
> feels strongly that the current behavior is Wrong and must change, speak up as
> soon as you reach that conclusion.  Absent such arguments, the behavior won't
> change.
> 

I was one of the people who said in original thread that I think the
default behavior should change to quorum and I am still of that opinion.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>> >> Regarding this feature, there are some loose ends. We should work on
>> >> and complete them until the release.
>> >>
>> >> (1)
>> >> Which synchronous replication method, priority or quorum, should be
>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
>> >> a priority-based sync replication is chosen for keeping backward
>> >> compatibility. However some hackers argued to change this decision
>> >> so that a quorum commit is chosen because they think that most users
>> >> prefer to a quorum.
>> >>
>> >> (2)
>> >> There will be still many source comments and documentations that
>> >> we need to update, for example, in high-availability.sgml. We need to
>> >> check and update them throughly.
>> >>
>> >> (3)
>> >> The priority value is assigned to each standby listed in s_s_names
>> >> even in quorum commit though those priority values are not used at all.
>> >> Users can see those priority values in pg_stat_replication.
>> >> Isn't this confusing? If yes, it might be better to always assign 1 as
>> >> the priority, for example.
>> >
>> > [Action required within three days.  This is a generic notification.]
>> >
>> > The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
>> > since you committed the patch believed to have created it, you own this open
>> > item.  If some other commit is more relevant or if this does not belong as a
>> > v10 open item, please let us know.  Otherwise, please observe the policy on
>> > open item ownership[1] and send a status update within three calendar days of
>> > this message.  Include a date for your subsequent status update.  Testers may
>> > discover new open items at any time, and I want to plan to get them all fixed
>> > well in advance of shipping v10.  Consequently, I will appreciate your efforts
>> > toward speedy resolution.  Thanks.
>> >
>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>
>> Thanks for the notice!
>>
>> Regarding the item (2), Sawada-san told me that he will work on it after
>> this CommitFest finishes. So we would receive the patch for the item from
>> him next week. If there will be no patch even after the end of next week
>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
>
> Sounds reasonable; I will look for your update on 14Apr or earlier.
>
>> The items (1) and (3) are not bugs. So I don't think that they need to be
>> resolved before the beta release. After the feature freeze, many users
>> will try and play with many new features including quorum-based syncrep.
>> Then if many of them complain about (1) and (3), we can change the code
>> at that timing. So we need more time that users can try the feature.
>
> I've moved (1) to a new section for things to revisit during beta.  If someone
> feels strongly that the current behavior is Wrong and must change, speak up as
> soon as you reach that conclusion.  Absent such arguments, the behavior won't
> change.
>
>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
>> as the priority if quorum-based sync rep is chosen. It's less confusing.
>
> Since you do want (3) to change, please own it like any other open item,
> including the mandatory status updates.

I agree to report NULL as the priority. I'll send a patch for this as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote:
>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>>> >> Regarding this feature, there are some loose ends. We should work on
>>> >> and complete them until the release.
>>> >>
>>> >> (1)
>>> >> Which synchronous replication method, priority or quorum, should be
>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
>>> >> a priority-based sync replication is chosen for keeping backward
>>> >> compatibility. However some hackers argued to change this decision
>>> >> so that a quorum commit is chosen because they think that most users
>>> >> prefer to a quorum.
>>> >>
>>> >> (2)
>>> >> There will be still many source comments and documentations that
>>> >> we need to update, for example, in high-availability.sgml. We need to
>>> >> check and update them throughly.
>>> >>
>>> >> (3)
>>> >> The priority value is assigned to each standby listed in s_s_names
>>> >> even in quorum commit though those priority values are not used at all.
>>> >> Users can see those priority values in pg_stat_replication.
>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as
>>> >> the priority, for example.
>>> >
>>> > [Action required within three days.  This is a generic notification.]
>>> >
>>> > The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
>>> > since you committed the patch believed to have created it, you own this open
>>> > item.  If some other commit is more relevant or if this does not belong as a
>>> > v10 open item, please let us know.  Otherwise, please observe the policy on
>>> > open item ownership[1] and send a status update within three calendar days of
>>> > this message.  Include a date for your subsequent status update.  Testers may
>>> > discover new open items at any time, and I want to plan to get them all fixed
>>> > well in advance of shipping v10.  Consequently, I will appreciate your efforts
>>> > toward speedy resolution.  Thanks.
>>> >
>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>>
>>> Thanks for the notice!
>>>
>>> Regarding the item (2), Sawada-san told me that he will work on it after
>>> this CommitFest finishes. So we would receive the patch for the item from
>>> him next week. If there will be no patch even after the end of next week
>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
>>
>> Sounds reasonable; I will look for your update on 14Apr or earlier.
>>
>>> The items (1) and (3) are not bugs. So I don't think that they need to be
>>> resolved before the beta release. After the feature freeze, many users
>>> will try and play with many new features including quorum-based syncrep.
>>> Then if many of them complain about (1) and (3), we can change the code
>>> at that timing. So we need more time that users can try the feature.
>>
>> I've moved (1) to a new section for things to revisit during beta.  If someone
>> feels strongly that the current behavior is Wrong and must change, speak up as
>> soon as you reach that conclusion.  Absent such arguments, the behavior won't
>> change.
>>
>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
>>> as the priority if quorum-based sync rep is chosen. It's less confusing.
>>
>> Since you do want (3) to change, please own it like any other open item,
>> including the mandatory status updates.
>
> I agree to report NULL as the priority. I'll send a patch for this as well.
>
> Regards,
>

Attached two draft patches. The one makes pg_stat_replication.sync
priority report NULL if in quorum-based sync replication. To prevent
extra change I don't change so far the code of setting standby
priority. The another one improves the comment and documentation. If
there is more thing what we need to mention in documentation please
give me feedback.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Noah Misch
Date:
On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:

> > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> > >> (2)
> > >> There will be still many source comments and documentations that
> > >> we need to update, for example, in high-availability.sgml. We need to
> > >> check and update them throughly.
> > >>
> > >> (3)
> > >> The priority value is assigned to each standby listed in s_s_names
> > >> even in quorum commit though those priority values are not used at all.
> > >> Users can see those priority values in pg_stat_replication.
> > >> Isn't this confusing? If yes, it might be better to always assign 1 as
> > >> the priority, for example.

> > Regarding the item (2), Sawada-san told me that he will work on it after
> > this CommitFest finishes. So we would receive the patch for the item from
> > him next week. If there will be no patch even after the end of next week
> > (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
> 
> Sounds reasonable; I will look for your update on 14Apr or earlier.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

> Since you do want (3) to change, please own it like any other open item,
> including the mandatory status updates.

Likewise.



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Noah Misch
Date:
On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
> 
> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> > > >> (2)
> > > >> There will be still many source comments and documentations that
> > > >> we need to update, for example, in high-availability.sgml. We need to
> > > >> check and update them throughly.
> > > >>
> > > >> (3)
> > > >> The priority value is assigned to each standby listed in s_s_names
> > > >> even in quorum commit though those priority values are not used at all.
> > > >> Users can see those priority values in pg_stat_replication.
> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as
> > > >> the priority, for example.
> 
> > > Regarding the item (2), Sawada-san told me that he will work on it after
> > > this CommitFest finishes. So we would receive the patch for the item from
> > > him next week. If there will be no patch even after the end of next week
> > > (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
> > 
> > Sounds reasonable; I will look for your update on 14Apr or earlier.
> 
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> > Since you do want (3) to change, please own it like any other open item,
> > including the mandatory status updates.
> 
> Likewise.

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2017-04-17 05:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote:
> On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
>> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
>> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>>
>> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>> > > >> (2)
>> > > >> There will be still many source comments and documentations that
>> > > >> we need to update, for example, in high-availability.sgml. We need to
>> > > >> check and update them throughly.
>> > > >>
>> > > >> (3)
>> > > >> The priority value is assigned to each standby listed in s_s_names
>> > > >> even in quorum commit though those priority values are not used at all.
>> > > >> Users can see those priority values in pg_stat_replication.
>> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as
>> > > >> the priority, for example.
>>
>> > > Regarding the item (2), Sawada-san told me that he will work on it after
>> > > this CommitFest finishes. So we would receive the patch for the item from
>> > > him next week. If there will be no patch even after the end of next week
>> > > (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
>> >
>> > Sounds reasonable; I will look for your update on 14Apr or earlier.
>>
>> This PostgreSQL 10 open item is past due for your status update.  Kindly send
>> a status update within 24 hours, and include a date for your subsequent status
>> update.  Refer to the policy on open item ownership:
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Sorry for the delay.

I will review Sawada-san's patch and commit something in next three days.
So next target date is April 19th.

>> > Since you do want (3) to change, please own it like any other open item,
>> > including the mandatory status updates.
>>
>> Likewise.

As I told firstly this is not a bug. There are some proposals for better design
of priority column in pg_stat_replication, but we've not reached the consensus
yet. So I think that it's better to move this open item to "Design Decisions to
Recheck Mid-Beta" section so that we can hear more opinions.

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote:
>>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
>>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>>>> >> Regarding this feature, there are some loose ends. We should work on
>>>> >> and complete them until the release.
>>>> >>
>>>> >> (1)
>>>> >> Which synchronous replication method, priority or quorum, should be
>>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
>>>> >> a priority-based sync replication is chosen for keeping backward
>>>> >> compatibility. However some hackers argued to change this decision
>>>> >> so that a quorum commit is chosen because they think that most users
>>>> >> prefer to a quorum.
>>>> >>
>>>> >> (2)
>>>> >> There will be still many source comments and documentations that
>>>> >> we need to update, for example, in high-availability.sgml. We need to
>>>> >> check and update them throughly.
>>>> >>
>>>> >> (3)
>>>> >> The priority value is assigned to each standby listed in s_s_names
>>>> >> even in quorum commit though those priority values are not used at all.
>>>> >> Users can see those priority values in pg_stat_replication.
>>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as
>>>> >> the priority, for example.
>>>> >
>>>> > [Action required within three days.  This is a generic notification.]
>>>> >
>>>> > The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
>>>> > since you committed the patch believed to have created it, you own this open
>>>> > item.  If some other commit is more relevant or if this does not belong as a
>>>> > v10 open item, please let us know.  Otherwise, please observe the policy on
>>>> > open item ownership[1] and send a status update within three calendar days of
>>>> > this message.  Include a date for your subsequent status update.  Testers may
>>>> > discover new open items at any time, and I want to plan to get them all fixed
>>>> > well in advance of shipping v10.  Consequently, I will appreciate your efforts
>>>> > toward speedy resolution.  Thanks.
>>>> >
>>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>>>
>>>> Thanks for the notice!
>>>>
>>>> Regarding the item (2), Sawada-san told me that he will work on it after
>>>> this CommitFest finishes. So we would receive the patch for the item from
>>>> him next week. If there will be no patch even after the end of next week
>>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
>>>
>>> Sounds reasonable; I will look for your update on 14Apr or earlier.
>>>
>>>> The items (1) and (3) are not bugs. So I don't think that they need to be
>>>> resolved before the beta release. After the feature freeze, many users
>>>> will try and play with many new features including quorum-based syncrep.
>>>> Then if many of them complain about (1) and (3), we can change the code
>>>> at that timing. So we need more time that users can try the feature.
>>>
>>> I've moved (1) to a new section for things to revisit during beta.  If someone
>>> feels strongly that the current behavior is Wrong and must change, speak up as
>>> soon as you reach that conclusion.  Absent such arguments, the behavior won't
>>> change.
>>>
>>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
>>>> as the priority if quorum-based sync rep is chosen. It's less confusing.
>>>
>>> Since you do want (3) to change, please own it like any other open item,
>>> including the mandatory status updates.
>>
>> I agree to report NULL as the priority. I'll send a patch for this as well.
>>
>> Regards,
>>
>
> Attached two draft patches. The one makes pg_stat_replication.sync
> priority report NULL if in quorum-based sync replication. To prevent
> extra change I don't change so far the code of setting standby
> priority. The another one improves the comment and documentation. If
> there is more thing what we need to mention in documentation please
> give me feedback.

Attached is the modified version of the doc improvement patch.
Barring any objection, I will commit this version.

+    In term of performance there is difference between two synchronous
+    replication method. Generally quorum-based synchronous replication
+    tends to be higher performance than priority-based synchronous
+    replication. Because in quorum-based synchronous replication, the
+    transaction can resume as soon as received the specified number of
+    acknowledgement from synchronous standby servers without distinction
+    of standby servers. On the other hand in priority-based synchronous
+    replication, the standby server that the primary server must wait for
+    is fixed until a synchronous standby fails. Therefore, if a server on
+    low-performance machine a has high priority and is chosen as a
+    synchronous standby server it can reduce performance for database
+    applications.

This description looks misleading. A quorum-based sync rep is basically
more efficient when there are multiple standbys in s_s_names and you want
to replicate the transactions to some of them synchronously. I think that
this assumption should be documented explicitly. So I modified this
description. Please see the modified version in the attached patch.

+ /*
+ * Update priority of this WalSender, but note that in
+ * quroum-based sync replication, the value of
+ * sync_standby_priority has no effect.
+ */

This is not true because even quorum-based sync rep uses the priority
value to check whether the standby is async or sync. So I just remove this.

+ * In quorum-based sync replication we select the quorum sync
+ * standby without theirs priority. The all running active standbys
+ * are considered as a candidate for quorum sync standbys

Same as above.

Also I removed some descriptions that I thought unnecessary to add.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote:
>>>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>>>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
>>>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>>>>> >> Regarding this feature, there are some loose ends. We should work on
>>>>> >> and complete them until the release.
>>>>> >>
>>>>> >> (1)
>>>>> >> Which synchronous replication method, priority or quorum, should be
>>>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
>>>>> >> a priority-based sync replication is chosen for keeping backward
>>>>> >> compatibility. However some hackers argued to change this decision
>>>>> >> so that a quorum commit is chosen because they think that most users
>>>>> >> prefer to a quorum.
>>>>> >>
>>>>> >> (2)
>>>>> >> There will be still many source comments and documentations that
>>>>> >> we need to update, for example, in high-availability.sgml. We need to
>>>>> >> check and update them throughly.
>>>>> >>
>>>>> >> (3)
>>>>> >> The priority value is assigned to each standby listed in s_s_names
>>>>> >> even in quorum commit though those priority values are not used at all.
>>>>> >> Users can see those priority values in pg_stat_replication.
>>>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as
>>>>> >> the priority, for example.
>>>>> >
>>>>> > [Action required within three days.  This is a generic notification.]
>>>>> >
>>>>> > The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
>>>>> > since you committed the patch believed to have created it, you own this open
>>>>> > item.  If some other commit is more relevant or if this does not belong as a
>>>>> > v10 open item, please let us know.  Otherwise, please observe the policy on
>>>>> > open item ownership[1] and send a status update within three calendar days of
>>>>> > this message.  Include a date for your subsequent status update.  Testers may
>>>>> > discover new open items at any time, and I want to plan to get them all fixed
>>>>> > well in advance of shipping v10.  Consequently, I will appreciate your efforts
>>>>> > toward speedy resolution.  Thanks.
>>>>> >
>>>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>>>>
>>>>> Thanks for the notice!
>>>>>
>>>>> Regarding the item (2), Sawada-san told me that he will work on it after
>>>>> this CommitFest finishes. So we would receive the patch for the item from
>>>>> him next week. If there will be no patch even after the end of next week
>>>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
>>>>
>>>> Sounds reasonable; I will look for your update on 14Apr or earlier.
>>>>
>>>>> The items (1) and (3) are not bugs. So I don't think that they need to be
>>>>> resolved before the beta release. After the feature freeze, many users
>>>>> will try and play with many new features including quorum-based syncrep.
>>>>> Then if many of them complain about (1) and (3), we can change the code
>>>>> at that timing. So we need more time that users can try the feature.
>>>>
>>>> I've moved (1) to a new section for things to revisit during beta.  If someone
>>>> feels strongly that the current behavior is Wrong and must change, speak up as
>>>> soon as you reach that conclusion.  Absent such arguments, the behavior won't
>>>> change.
>>>>
>>>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
>>>>> as the priority if quorum-based sync rep is chosen. It's less confusing.
>>>>
>>>> Since you do want (3) to change, please own it like any other open item,
>>>> including the mandatory status updates.
>>>
>>> I agree to report NULL as the priority. I'll send a patch for this as well.
>>>
>>> Regards,
>>>
>>
>> Attached two draft patches. The one makes pg_stat_replication.sync
>> priority report NULL if in quorum-based sync replication. To prevent
>> extra change I don't change so far the code of setting standby
>> priority. The another one improves the comment and documentation. If
>> there is more thing what we need to mention in documentation please
>> give me feedback.
>
> Attached is the modified version of the doc improvement patch.
> Barring any objection, I will commit this version.

Thank you for updating the patch.

>
> +    In term of performance there is difference between two synchronous
> +    replication method. Generally quorum-based synchronous replication
> +    tends to be higher performance than priority-based synchronous
> +    replication. Because in quorum-based synchronous replication, the
> +    transaction can resume as soon as received the specified number of
> +    acknowledgement from synchronous standby servers without distinction
> +    of standby servers. On the other hand in priority-based synchronous
> +    replication, the standby server that the primary server must wait for
> +    is fixed until a synchronous standby fails. Therefore, if a server on
> +    low-performance machine a has high priority and is chosen as a
> +    synchronous standby server it can reduce performance for database
> +    applications.
>
> This description looks misleading. A quorum-based sync rep is basically
> more efficient when there are multiple standbys in s_s_names and you want
> to replicate the transactions to some of them synchronously. I think that
> this assumption should be documented explicitly. So I modified this
> description. Please see the modified version in the attached patch.

You're right. The modified version looks good to me, thanks.

>
> + /*
> + * Update priority of this WalSender, but note that in
> + * quroum-based sync replication, the value of
> + * sync_standby_priority has no effect.
> + */
>
> This is not true because even quorum-based sync rep uses the priority
> value to check whether the standby is async or sync. So I just remove this.
>
> + * In quorum-based sync replication we select the quorum sync
> + * standby without theirs priority. The all running active standbys
> + * are considered as a candidate for quorum sync standbys
>
> Same as above.
>
> Also I removed some descriptions that I thought unnecessary to add.
>
> Regards,
>
> --
> Fujii Masao


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBqSjUGx0LCDrjEDLB-yx2EvgLMdT8Nz4ZR_xpxrbMU+Q@mail.gmail.com>
> On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote:
> >>>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
> >>>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
> >>>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> >>>>> >> Regarding this feature, there are some loose ends. We should work on
> >>>>> >> and complete them until the release.
> >>>>> >>
> >>>>> >> (1)
> >>>>> >> Which synchronous replication method, priority or quorum, should be
> >>>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
> >>>>> >> a priority-based sync replication is chosen for keeping backward
> >>>>> >> compatibility. However some hackers argued to change this decision
> >>>>> >> so that a quorum commit is chosen because they think that most users
> >>>>> >> prefer to a quorum.
> >>>>> >>
> >>>>> >> (2)
> >>>>> >> There will be still many source comments and documentations that
> >>>>> >> we need to update, for example, in high-availability.sgml. We need to
> >>>>> >> check and update them throughly.
> >>>>> >>
> >>>>> >> (3)
> >>>>> >> The priority value is assigned to each standby listed in s_s_names
> >>>>> >> even in quorum commit though those priority values are not used at all.
> >>>>> >> Users can see those priority values in pg_stat_replication.
> >>>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as
> >>>>> >> the priority, for example.
> >>>>> >
> >>>>> > [Action required within three days.  This is a generic notification.]
> >>>>> >
> >>>>> > The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
> >>>>> > since you committed the patch believed to have created it, you own this open
> >>>>> > item.  If some other commit is more relevant or if this does not belong as a
> >>>>> > v10 open item, please let us know.  Otherwise, please observe the policy on
> >>>>> > open item ownership[1] and send a status update within three calendar days of
> >>>>> > this message.  Include a date for your subsequent status update.  Testers may
> >>>>> > discover new open items at any time, and I want to plan to get them all fixed
> >>>>> > well in advance of shipping v10.  Consequently, I will appreciate your efforts
> >>>>> > toward speedy resolution.  Thanks.
> >>>>> >
> >>>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> >>>>>
> >>>>> Thanks for the notice!
> >>>>>
> >>>>> Regarding the item (2), Sawada-san told me that he will work on it after
> >>>>> this CommitFest finishes. So we would receive the patch for the item from
> >>>>> him next week. If there will be no patch even after the end of next week
> >>>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
> >>>>
> >>>> Sounds reasonable; I will look for your update on 14Apr or earlier.
> >>>>
> >>>>> The items (1) and (3) are not bugs. So I don't think that they need to be
> >>>>> resolved before the beta release. After the feature freeze, many users
> >>>>> will try and play with many new features including quorum-based syncrep.
> >>>>> Then if many of them complain about (1) and (3), we can change the code
> >>>>> at that timing. So we need more time that users can try the feature.
> >>>>
> >>>> I've moved (1) to a new section for things to revisit during beta.  If someone
> >>>> feels strongly that the current behavior is Wrong and must change, speak up as
> >>>> soon as you reach that conclusion.  Absent such arguments, the behavior won't
> >>>> change.
> >>>>
> >>>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
> >>>>> as the priority if quorum-based sync rep is chosen. It's less confusing.
> >>>>
> >>>> Since you do want (3) to change, please own it like any other open item,
> >>>> including the mandatory status updates.
> >>>
> >>> I agree to report NULL as the priority. I'll send a patch for this as well.
> >>>
> >>> Regards,
> >>>
> >>
> >> Attached two draft patches. The one makes pg_stat_replication.sync
> >> priority report NULL if in quorum-based sync replication. To prevent
> >> extra change I don't change so far the code of setting standby
> >> priority. The another one improves the comment and documentation. If
> >> there is more thing what we need to mention in documentation please
> >> give me feedback.
> >
> > Attached is the modified version of the doc improvement patch.
> > Barring any objection, I will commit this version.
> 
> Thank you for updating the patch.
> 
> >
> > +    In term of performance there is difference between two synchronous
> > +    replication method. Generally quorum-based synchronous replication
> > +    tends to be higher performance than priority-based synchronous
> > +    replication. Because in quorum-based synchronous replication, the
> > +    transaction can resume as soon as received the specified number of
> > +    acknowledgement from synchronous standby servers without distinction
> > +    of standby servers. On the other hand in priority-based synchronous
> > +    replication, the standby server that the primary server must wait for
> > +    is fixed until a synchronous standby fails. Therefore, if a server on
> > +    low-performance machine a has high priority and is chosen as a
> > +    synchronous standby server it can reduce performance for database
> > +    applications.
> >
> > This description looks misleading. A quorum-based sync rep is basically
> > more efficient when there are multiple standbys in s_s_names and you want
> > to replicate the transactions to some of them synchronously. I think that
> > this assumption should be documented explicitly. So I modified this
> > description. Please see the modified version in the attached patch.
> 
> You're right. The modified version looks good to me, thanks.

It looks better to me, too. But (even I'm not sure, of course)
the sentences seem to need improvement.

|    <para>
|     Quorum-based synchronous replication is basically more
|     efficient than priority-based one when you specify multiple
|     standbys in <varname>synchronous_standby_names</> and want
|     to synchronously replicate transactions to two or more of
|     them. In the priority-based case, the replication master
|     must wait for a reply from the slowest standby in the
|     required number of standbys in priority order, which may
|     slower than the rest.  On the other hand, quorum-based
|     synchronous replication may reduce the latency because it
|     allows transactions to wait only for replies from a
|     required number of fastest standbys in all the listed
|     standbys, i.e., such slow standby doesn't block
|     transactions.
|  </para>

I'm not sure that this is actually an improvement..


> > + /*
> > + * Update priority of this WalSender, but note that in
> > + * quroum-based sync replication, the value of
> > + * sync_standby_priority has no effect.
> > + */
> >
> > This is not true because even quorum-based sync rep uses the priority
> > value to check whether the standby is async or sync. So I just remove this.
> >
> > + * In quorum-based sync replication we select the quorum sync
> > + * standby without theirs priority. The all running active standbys
> > + * are considered as a candidate for quorum sync standbys
> >
> > Same as above.
> >
> > Also I removed some descriptions that I thought unnecessary to add.
> >
> > Regards,
> >
> > --
> > Fujii Masao
> 
> 
> Regards,
> 
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Tue, Apr 18, 2017 at 6:40 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBqSjUGx0LCDrjEDLB-yx2EvgLMdT8Nz4ZR_xpxrbMU+Q@mail.gmail.com>
>> On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote:
>> >>>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>> >>>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
>> >>>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>> >>>>> >> Regarding this feature, there are some loose ends. We should work on
>> >>>>> >> and complete them until the release.
>> >>>>> >>
>> >>>>> >> (1)
>> >>>>> >> Which synchronous replication method, priority or quorum, should be
>> >>>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
>> >>>>> >> a priority-based sync replication is chosen for keeping backward
>> >>>>> >> compatibility. However some hackers argued to change this decision
>> >>>>> >> so that a quorum commit is chosen because they think that most users
>> >>>>> >> prefer to a quorum.
>> >>>>> >>
>> >>>>> >> (2)
>> >>>>> >> There will be still many source comments and documentations that
>> >>>>> >> we need to update, for example, in high-availability.sgml. We need to
>> >>>>> >> check and update them throughly.
>> >>>>> >>
>> >>>>> >> (3)
>> >>>>> >> The priority value is assigned to each standby listed in s_s_names
>> >>>>> >> even in quorum commit though those priority values are not used at all.
>> >>>>> >> Users can see those priority values in pg_stat_replication.
>> >>>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as
>> >>>>> >> the priority, for example.
>> >>>>> >
>> >>>>> > [Action required within three days.  This is a generic notification.]
>> >>>>> >
>> >>>>> > The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
>> >>>>> > since you committed the patch believed to have created it, you own this open
>> >>>>> > item.  If some other commit is more relevant or if this does not belong as a
>> >>>>> > v10 open item, please let us know.  Otherwise, please observe the policy on
>> >>>>> > open item ownership[1] and send a status update within three calendar days of
>> >>>>> > this message.  Include a date for your subsequent status update.  Testers may
>> >>>>> > discover new open items at any time, and I want to plan to get them all fixed
>> >>>>> > well in advance of shipping v10.  Consequently, I will appreciate your efforts
>> >>>>> > toward speedy resolution.  Thanks.
>> >>>>> >
>> >>>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>> >>>>>
>> >>>>> Thanks for the notice!
>> >>>>>
>> >>>>> Regarding the item (2), Sawada-san told me that he will work on it after
>> >>>>> this CommitFest finishes. So we would receive the patch for the item from
>> >>>>> him next week. If there will be no patch even after the end of next week
>> >>>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
>> >>>>
>> >>>> Sounds reasonable; I will look for your update on 14Apr or earlier.
>> >>>>
>> >>>>> The items (1) and (3) are not bugs. So I don't think that they need to be
>> >>>>> resolved before the beta release. After the feature freeze, many users
>> >>>>> will try and play with many new features including quorum-based syncrep.
>> >>>>> Then if many of them complain about (1) and (3), we can change the code
>> >>>>> at that timing. So we need more time that users can try the feature.
>> >>>>
>> >>>> I've moved (1) to a new section for things to revisit during beta.  If someone
>> >>>> feels strongly that the current behavior is Wrong and must change, speak up as
>> >>>> soon as you reach that conclusion.  Absent such arguments, the behavior won't
>> >>>> change.
>> >>>>
>> >>>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
>> >>>>> as the priority if quorum-based sync rep is chosen. It's less confusing.
>> >>>>
>> >>>> Since you do want (3) to change, please own it like any other open item,
>> >>>> including the mandatory status updates.
>> >>>
>> >>> I agree to report NULL as the priority. I'll send a patch for this as well.
>> >>>
>> >>> Regards,
>> >>>
>> >>
>> >> Attached two draft patches. The one makes pg_stat_replication.sync
>> >> priority report NULL if in quorum-based sync replication. To prevent
>> >> extra change I don't change so far the code of setting standby
>> >> priority. The another one improves the comment and documentation. If
>> >> there is more thing what we need to mention in documentation please
>> >> give me feedback.
>> >
>> > Attached is the modified version of the doc improvement patch.
>> > Barring any objection, I will commit this version.
>>
>> Thank you for updating the patch.
>>
>> >
>> > +    In term of performance there is difference between two synchronous
>> > +    replication method. Generally quorum-based synchronous replication
>> > +    tends to be higher performance than priority-based synchronous
>> > +    replication. Because in quorum-based synchronous replication, the
>> > +    transaction can resume as soon as received the specified number of
>> > +    acknowledgement from synchronous standby servers without distinction
>> > +    of standby servers. On the other hand in priority-based synchronous
>> > +    replication, the standby server that the primary server must wait for
>> > +    is fixed until a synchronous standby fails. Therefore, if a server on
>> > +    low-performance machine a has high priority and is chosen as a
>> > +    synchronous standby server it can reduce performance for database
>> > +    applications.
>> >
>> > This description looks misleading. A quorum-based sync rep is basically
>> > more efficient when there are multiple standbys in s_s_names and you want
>> > to replicate the transactions to some of them synchronously. I think that
>> > this assumption should be documented explicitly. So I modified this
>> > description. Please see the modified version in the attached patch.
>>
>> You're right. The modified version looks good to me, thanks.
>
> It looks better to me, too. But (even I'm not sure, of course)
> the sentences seem to need improvement.
>
> |    <para>
> |     Quorum-based synchronous replication is basically more
> |     efficient than priority-based one when you specify multiple
> |     standbys in <varname>synchronous_standby_names</> and want
> |     to synchronously replicate transactions to two or more of
> |     them. In the priority-based case, the replication master
> |     must wait for a reply from the slowest standby in the
> |     required number of standbys in priority order, which may
> |     slower than the rest.

I supposed that Fujii-san pointed out that quorum-based sync
replication could be more efficient when we want to replicate the
transaction to "part of" standbys listed in s_s_names. So I guess it's
not good idea to mention "two or more of them" which also can mean the
all of standbys.

> On the other hand, quorum-based
> |     synchronous replication may reduce the latency because it
> |     allows transactions to wait only for replies from a
> |     required number of fastest standbys in all the listed
> |     standbys, i.e., such slow standby doesn't block
> |     transactions.
> |  </para>
>
> I'm not sure that this is actually an improvement..
>



--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Tue, Apr 18, 2017 at 7:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 6:40 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBqSjUGx0LCDrjEDLB-yx2EvgLMdT8Nz4ZR_xpxrbMU+Q@mail.gmail.com>
>>> On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> >> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> >>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote:
>>> >>>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>>> >>>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
>>> >>>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>>> >>>>> >> Regarding this feature, there are some loose ends. We should work on
>>> >>>>> >> and complete them until the release.
>>> >>>>> >>
>>> >>>>> >> (1)
>>> >>>>> >> Which synchronous replication method, priority or quorum, should be
>>> >>>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
>>> >>>>> >> a priority-based sync replication is chosen for keeping backward
>>> >>>>> >> compatibility. However some hackers argued to change this decision
>>> >>>>> >> so that a quorum commit is chosen because they think that most users
>>> >>>>> >> prefer to a quorum.
>>> >>>>> >>
>>> >>>>> >> (2)
>>> >>>>> >> There will be still many source comments and documentations that
>>> >>>>> >> we need to update, for example, in high-availability.sgml. We need to
>>> >>>>> >> check and update them throughly.
>>> >>>>> >>
>>> >>>>> >> (3)
>>> >>>>> >> The priority value is assigned to each standby listed in s_s_names
>>> >>>>> >> even in quorum commit though those priority values are not used at all.
>>> >>>>> >> Users can see those priority values in pg_stat_replication.
>>> >>>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as
>>> >>>>> >> the priority, for example.
>>> >>>>> >
>>> >>>>> > [Action required within three days.  This is a generic notification.]
>>> >>>>> >
>>> >>>>> > The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
>>> >>>>> > since you committed the patch believed to have created it, you own this open
>>> >>>>> > item.  If some other commit is more relevant or if this does not belong as a
>>> >>>>> > v10 open item, please let us know.  Otherwise, please observe the policy on
>>> >>>>> > open item ownership[1] and send a status update within three calendar days of
>>> >>>>> > this message.  Include a date for your subsequent status update.  Testers may
>>> >>>>> > discover new open items at any time, and I want to plan to get them all fixed
>>> >>>>> > well in advance of shipping v10.  Consequently, I will appreciate your efforts
>>> >>>>> > toward speedy resolution.  Thanks.
>>> >>>>> >
>>> >>>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>> >>>>>
>>> >>>>> Thanks for the notice!
>>> >>>>>
>>> >>>>> Regarding the item (2), Sawada-san told me that he will work on it after
>>> >>>>> this CommitFest finishes. So we would receive the patch for the item from
>>> >>>>> him next week. If there will be no patch even after the end of next week
>>> >>>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
>>> >>>>
>>> >>>> Sounds reasonable; I will look for your update on 14Apr or earlier.
>>> >>>>
>>> >>>>> The items (1) and (3) are not bugs. So I don't think that they need to be
>>> >>>>> resolved before the beta release. After the feature freeze, many users
>>> >>>>> will try and play with many new features including quorum-based syncrep.
>>> >>>>> Then if many of them complain about (1) and (3), we can change the code
>>> >>>>> at that timing. So we need more time that users can try the feature.
>>> >>>>
>>> >>>> I've moved (1) to a new section for things to revisit during beta.  If someone
>>> >>>> feels strongly that the current behavior is Wrong and must change, speak up as
>>> >>>> soon as you reach that conclusion.  Absent such arguments, the behavior won't
>>> >>>> change.
>>> >>>>
>>> >>>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
>>> >>>>> as the priority if quorum-based sync rep is chosen. It's less confusing.
>>> >>>>
>>> >>>> Since you do want (3) to change, please own it like any other open item,
>>> >>>> including the mandatory status updates.
>>> >>>
>>> >>> I agree to report NULL as the priority. I'll send a patch for this as well.
>>> >>>
>>> >>> Regards,
>>> >>>
>>> >>
>>> >> Attached two draft patches. The one makes pg_stat_replication.sync
>>> >> priority report NULL if in quorum-based sync replication. To prevent
>>> >> extra change I don't change so far the code of setting standby
>>> >> priority. The another one improves the comment and documentation. If
>>> >> there is more thing what we need to mention in documentation please
>>> >> give me feedback.
>>> >
>>> > Attached is the modified version of the doc improvement patch.
>>> > Barring any objection, I will commit this version.
>>>
>>> Thank you for updating the patch.
>>>
>>> >
>>> > +    In term of performance there is difference between two synchronous
>>> > +    replication method. Generally quorum-based synchronous replication
>>> > +    tends to be higher performance than priority-based synchronous
>>> > +    replication. Because in quorum-based synchronous replication, the
>>> > +    transaction can resume as soon as received the specified number of
>>> > +    acknowledgement from synchronous standby servers without distinction
>>> > +    of standby servers. On the other hand in priority-based synchronous
>>> > +    replication, the standby server that the primary server must wait for
>>> > +    is fixed until a synchronous standby fails. Therefore, if a server on
>>> > +    low-performance machine a has high priority and is chosen as a
>>> > +    synchronous standby server it can reduce performance for database
>>> > +    applications.
>>> >
>>> > This description looks misleading. A quorum-based sync rep is basically
>>> > more efficient when there are multiple standbys in s_s_names and you want
>>> > to replicate the transactions to some of them synchronously. I think that
>>> > this assumption should be documented explicitly. So I modified this
>>> > description. Please see the modified version in the attached patch.
>>>
>>> You're right. The modified version looks good to me, thanks.
>>
>> It looks better to me, too. But (even I'm not sure, of course)
>> the sentences seem to need improvement.
>>
>> |    <para>
>> |     Quorum-based synchronous replication is basically more
>> |     efficient than priority-based one when you specify multiple
>> |     standbys in <varname>synchronous_standby_names</> and want
>> |     to synchronously replicate transactions to two or more of
>> |     them. In the priority-based case, the replication master
>> |     must wait for a reply from the slowest standby in the
>> |     required number of standbys in priority order, which may
>> |     slower than the rest.
>
> I supposed that Fujii-san pointed out that quorum-based sync
> replication could be more efficient when we want to replicate the
> transaction to "part of" standbys listed in s_s_names.

Yes.

Anyway, I pushed the patch except this paragraph.
Regarding this paragraph, the patch for better descriptions is welcome.

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Noah Misch
Date:
On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
> >>
> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> >> > > >> (3)
> >> > > >> The priority value is assigned to each standby listed in s_s_names
> >> > > >> even in quorum commit though those priority values are not used at all.
> >> > > >> Users can see those priority values in pg_stat_replication.
> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as
> >> > > >> the priority, for example.

> >> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> >> a status update within 24 hours, and include a date for your subsequent status
> >> update.  Refer to the policy on open item ownership:
> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

> >> > Since you do want (3) to change, please own it like any other open item,
> >> > including the mandatory status updates.
> >>
> >> Likewise.
> 
> As I told firstly this is not a bug. There are some proposals for better design
> of priority column in pg_stat_replication, but we've not reached the consensus
> yet. So I think that it's better to move this open item to "Design Decisions to
> Recheck Mid-Beta" section so that we can hear more opinions.

I'm reading that some people want to report NULL priority, some people want to
report a constant 1 priority, and nobody wants the current behavior.  Is that
an accurate summary?



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote:
> On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
>> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote:
>> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
>> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
>> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>> >>
>> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>> >> > > >> (3)
>> >> > > >> The priority value is assigned to each standby listed in s_s_names
>> >> > > >> even in quorum commit though those priority values are not used at all.
>> >> > > >> Users can see those priority values in pg_stat_replication.
>> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as
>> >> > > >> the priority, for example.
>
>> >> This PostgreSQL 10 open item is past due for your status update.  Kindly send
>> >> a status update within 24 hours, and include a date for your subsequent status
>> >> update.  Refer to the policy on open item ownership:
>> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>
>> >> > Since you do want (3) to change, please own it like any other open item,
>> >> > including the mandatory status updates.
>> >>
>> >> Likewise.
>>
>> As I told firstly this is not a bug. There are some proposals for better design
>> of priority column in pg_stat_replication, but we've not reached the consensus
>> yet. So I think that it's better to move this open item to "Design Decisions to
>> Recheck Mid-Beta" section so that we can hear more opinions.
>
> I'm reading that some people want to report NULL priority, some people want to
> report a constant 1 priority, and nobody wants the current behavior.  Is that
> an accurate summary?

Yes, I think that's correct.

FWIW the reason of current behavior is that it would be useful for the
user who is willing to switch from ANY to FIRST. They can know which
standbys will become sync or potential.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Wed, Apr 19, 2017 at 1:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote:
>> On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
>>> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote:
>>> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
>>> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
>>> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>>> >>
>>> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>>> >> > > >> (3)
>>> >> > > >> The priority value is assigned to each standby listed in s_s_names
>>> >> > > >> even in quorum commit though those priority values are not used at all.
>>> >> > > >> Users can see those priority values in pg_stat_replication.
>>> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as
>>> >> > > >> the priority, for example.
>>
>>> >> This PostgreSQL 10 open item is past due for your status update.  Kindly send
>>> >> a status update within 24 hours, and include a date for your subsequent status
>>> >> update.  Refer to the policy on open item ownership:
>>> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>
>>> >> > Since you do want (3) to change, please own it like any other open item,
>>> >> > including the mandatory status updates.
>>> >>
>>> >> Likewise.
>>>
>>> As I told firstly this is not a bug. There are some proposals for better design
>>> of priority column in pg_stat_replication, but we've not reached the consensus
>>> yet. So I think that it's better to move this open item to "Design Decisions to
>>> Recheck Mid-Beta" section so that we can hear more opinions.
>>
>> I'm reading that some people want to report NULL priority, some people want to
>> report a constant 1 priority, and nobody wants the current behavior.  Is that
>> an accurate summary?
>
> Yes, I think that's correct.

Just adding that I am the only one advocating for switching the
priority number to NULL for async standbys, and that this proposal is
visibly outvoted as it breaks backward-compatibility with the
0-priority setting.
-- 
Michael



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
At Wed, 19 Apr 2017 03:03:38 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwE95S5GM9UZh0F3ef2D3iEwJ59skh=EwW5HmDJPe2aXog@mail.gmail.com>
> On Tue, Apr 18, 2017 at 7:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Tue, Apr 18, 2017 at 6:40 PM, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBqSjUGx0LCDrjEDLB-yx2EvgLMdT8Nz4ZR_xpxrbMU+Q@mail.gmail.com>
> >>> On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> >>> > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>> > This description looks misleading. A quorum-based sync rep is basically
> >>> > more efficient when there are multiple standbys in s_s_names and you want
> >>> > to replicate the transactions to some of them synchronously. I think that
> >>> > this assumption should be documented explicitly. So I modified this
> >>> > description. Please see the modified version in the attached patch.
> >>>
> >>> You're right. The modified version looks good to me, thanks.

+     A quorum-based synchronous replication is basically more efficient than
+     a priority-based one when you specify multiple standbys in
+     <varname>synchronous_standby_names</> and want to replicate
+     the transactions to some of them synchronously. In this case,
+     the transactions in a priority-based synchronous replication must wait for
+     reply from the slowest standby in synchronous standbys chosen based on
+     their priorities, and which may increase the transaction latencies.
+     On the other hand, using a quorum-based synchronous replication may
+     improve those latencies because it makes the transactions wait only for
+     replies from the requested number of faster standbys in all the listed
+     standbys, i.e., such slow standby doesn't block the transactions.

> >> It looks better to me, too. But (even I'm not sure, of course)
> >> the sentences seem to need improvement.
> >>
> >> |    <para>
> >> |     Quorum-based synchronous replication is basically more
> >> |     efficient than priority-based one when you specify multiple
> >> |     standbys in <varname>synchronous_standby_names</> and want
> >> |     to synchronously replicate transactions to two or more of
> >> |     them. In the priority-based case, the replication master
> >> |     must wait for a reply from the slowest standby in the
> >> |     required number of standbys in priority order, which may
> >> |     slower than the rest.
> >
> > I supposed that Fujii-san pointed out that quorum-based sync
> > replication could be more efficient when we want to replicate the
> > transaction to "part of" standbys listed in s_s_names.
> 
> Yes.

Yes, am I wrote something opposing?

> Anyway, I pushed the patch except this paragraph.
> Regarding this paragraph, the patch for better descriptions is welcome.

+1

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
Ok, I got the point.

At Wed, 19 Apr 2017 17:39:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20170419.173901.16598616.horiguchi.kyotaro@lab.ntt.co.jp>
> > >> |    <para>
> > >> |     Quorum-based synchronous replication is basically more
> > >> |     efficient than priority-based one when you specify multiple
> > >> |     standbys in <varname>synchronous_standby_names</> and want
> > >> |     to synchronously replicate transactions to two or more of
> > >> |     them.

"Some" means "not all".

> > >> |     In the priority-based case, the replication master
> > >> |     must wait for a reply from the slowest standby in the
> > >> |     required number of standbys in priority order, which may
> > >> |     slower than the rest.


Quorum-based synchronous replication is expected to be more
efficient than priority-based one when your master doesn't need
to be in sync with all of the nominated standbys by
<varname>synchronous_standby_names</>.  While quorum-based
replication master waits only for a specified number of fastest
standbys, priority-based replicatoin master must wait for
standbys at the top of the list, which may be slower than the
rest.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Noah Misch
Date:
On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote:
> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
> >> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote:
> >> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
> >> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
> >> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
> >> >>
> >> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> >> >> > > >> (3)
> >> >> > > >> The priority value is assigned to each standby listed in s_s_names
> >> >> > > >> even in quorum commit though those priority values are not used at all.
> >> >> > > >> Users can see those priority values in pg_stat_replication.
> >> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as
> >> >> > > >> the priority, for example.
> >
> >> >> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> >> >> a status update within 24 hours, and include a date for your subsequent status
> >> >> update.  Refer to the policy on open item ownership:
> >> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> >
> >> >> > Since you do want (3) to change, please own it like any other open item,
> >> >> > including the mandatory status updates.
> >> >>
> >> >> Likewise.
> >>
> >> As I told firstly this is not a bug. There are some proposals for better design
> >> of priority column in pg_stat_replication, but we've not reached the consensus
> >> yet. So I think that it's better to move this open item to "Design Decisions to
> >> Recheck Mid-Beta" section so that we can hear more opinions.
> >
> > I'm reading that some people want to report NULL priority, some people want to
> > report a constant 1 priority, and nobody wants the current behavior.  Is that
> > an accurate summary?
> 
> Yes, I think that's correct.

Okay, but ...

> FWIW the reason of current behavior is that it would be useful for the
> user who is willing to switch from ANY to FIRST. They can know which
> standbys will become sync or potential.

... does this mean you personally want to keep the current behavior?  If not,
has some other person stated a wish to keep the current behavior?



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote:
> On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote:
>> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote:
>> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
>> >> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote:
>> >> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
>> >> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
>> >> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>> >> >>
>> >> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>> >> >> > > >> (3)
>> >> >> > > >> The priority value is assigned to each standby listed in s_s_names
>> >> >> > > >> even in quorum commit though those priority values are not used at all.
>> >> >> > > >> Users can see those priority values in pg_stat_replication.
>> >> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as
>> >> >> > > >> the priority, for example.
>> >
>> >> >> This PostgreSQL 10 open item is past due for your status update.  Kindly send
>> >> >> a status update within 24 hours, and include a date for your subsequent status
>> >> >> update.  Refer to the policy on open item ownership:
>> >> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>> >
>> >> >> > Since you do want (3) to change, please own it like any other open item,
>> >> >> > including the mandatory status updates.
>> >> >>
>> >> >> Likewise.
>> >>
>> >> As I told firstly this is not a bug. There are some proposals for better design
>> >> of priority column in pg_stat_replication, but we've not reached the consensus
>> >> yet. So I think that it's better to move this open item to "Design Decisions to
>> >> Recheck Mid-Beta" section so that we can hear more opinions.
>> >
>> > I'm reading that some people want to report NULL priority, some people want to
>> > report a constant 1 priority, and nobody wants the current behavior.  Is that
>> > an accurate summary?
>>
>> Yes, I think that's correct.
>
> Okay, but ...
>
>> FWIW the reason of current behavior is that it would be useful for the
>> user who is willing to switch from ANY to FIRST. They can know which
>> standbys will become sync or potential.
>
> ... does this mean you personally want to keep the current behavior?  If not,
> has some other person stated a wish to keep the current behavior?

No, I want to change the current behavior. IMO it's better to set
priority 1 to all standbys in quorum set. I guess there is no longer
person who supports the current behavior.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
At Fri, 21 Apr 2017 13:20:05 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoCU+ch4b2O0iW-b_BnUs7oMcT8pcwM690XVu134k=cA+Q@mail.gmail.com>
> On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote:
> >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote:
> >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
> >> >> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote:
> >> >> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
> >> >> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
> >> >> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
> >> >> >>
> >> >> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> >> >> >> > > >> (3)
> >> >> >> > > >> The priority value is assigned to each standby listed in s_s_names
> >> >> >> > > >> even in quorum commit though those priority values are not used at all.
> >> >> >> > > >> Users can see those priority values in pg_stat_replication.
> >> >> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as
> >> >> >> > > >> the priority, for example.
> >> >
> >> >> >> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> >> >> >> a status update within 24 hours, and include a date for your subsequent status
> >> >> >> update.  Refer to the policy on open item ownership:
> >> >> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> >> >
> >> >> >> > Since you do want (3) to change, please own it like any other open item,
> >> >> >> > including the mandatory status updates.
> >> >> >>
> >> >> >> Likewise.
> >> >>
> >> >> As I told firstly this is not a bug. There are some proposals for better design
> >> >> of priority column in pg_stat_replication, but we've not reached the consensus
> >> >> yet. So I think that it's better to move this open item to "Design Decisions to
> >> >> Recheck Mid-Beta" section so that we can hear more opinions.
> >> >
> >> > I'm reading that some people want to report NULL priority, some people want to
> >> > report a constant 1 priority, and nobody wants the current behavior.  Is that
> >> > an accurate summary?
> >>
> >> Yes, I think that's correct.
> >
> > Okay, but ...
> >
> >> FWIW the reason of current behavior is that it would be useful for the
> >> user who is willing to switch from ANY to FIRST. They can know which
> >> standbys will become sync or potential.
> >
> > ... does this mean you personally want to keep the current behavior?  If not,
> > has some other person stated a wish to keep the current behavior?
> 
> No, I want to change the current behavior. IMO it's better to set
> priority 1 to all standbys in quorum set. I guess there is no longer
> person who supports the current behavior.

+1 for the latter. For the former, I'd like to distinguish
standbys in sync and not in the field or something if we can
allow the additional complexity.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Noah Misch
Date:
On Fri, Apr 21, 2017 at 01:20:05PM +0900, Masahiko Sawada wrote:
> On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote:
> >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote:
> >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
> >> >> As I told firstly this is not a bug. There are some proposals for better design
> >> >> of priority column in pg_stat_replication, but we've not reached the consensus
> >> >> yet. So I think that it's better to move this open item to "Design Decisions to
> >> >> Recheck Mid-Beta" section so that we can hear more opinions.
> >> >
> >> > I'm reading that some people want to report NULL priority, some people want to
> >> > report a constant 1 priority, and nobody wants the current behavior.  Is that
> >> > an accurate summary?
> >>
> >> Yes, I think that's correct.
> >
> > Okay, but ...
> >
> >> FWIW the reason of current behavior is that it would be useful for the
> >> user who is willing to switch from ANY to FIRST. They can know which
> >> standbys will become sync or potential.
> >
> > ... does this mean you personally want to keep the current behavior?  If not,
> > has some other person stated a wish to keep the current behavior?
> 
> No, I want to change the current behavior. IMO it's better to set
> priority 1 to all standbys in quorum set. I guess there is no longer
> person who supports the current behavior.

In that case, this open item is not eligible for section "Design Decisions to
Recheck Mid-Beta".  That section is for items where we'll probably change
nothing, but we plan to recheck later just in case.  Here, we expect to change
the behavior; the open question is which replacement behavior to prefer.

Fujii, as the owner of this open item, you are responsible for moderating the
debate until there's adequate consensus to make a particular change or to keep
the current behavior after all.  Please proceed to do that.  Beta testers
deserve a UI they may like, not a UI you already plan to change later.

Thanks,
nm



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Noah Misch
Date:
On Thu, Apr 20, 2017 at 11:34:34PM -0700, Noah Misch wrote:
> On Fri, Apr 21, 2017 at 01:20:05PM +0900, Masahiko Sawada wrote:
> > On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote:
> > > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote:
> > >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote:
> > >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
> > >> >> As I told firstly this is not a bug. There are some proposals for better design
> > >> >> of priority column in pg_stat_replication, but we've not reached the consensus
> > >> >> yet. So I think that it's better to move this open item to "Design Decisions to
> > >> >> Recheck Mid-Beta" section so that we can hear more opinions.
> > >> >
> > >> > I'm reading that some people want to report NULL priority, some people want to
> > >> > report a constant 1 priority, and nobody wants the current behavior.  Is that
> > >> > an accurate summary?
> > >>
> > >> Yes, I think that's correct.
> > >
> > > Okay, but ...
> > >
> > >> FWIW the reason of current behavior is that it would be useful for the
> > >> user who is willing to switch from ANY to FIRST. They can know which
> > >> standbys will become sync or potential.
> > >
> > > ... does this mean you personally want to keep the current behavior?  If not,
> > > has some other person stated a wish to keep the current behavior?
> > 
> > No, I want to change the current behavior. IMO it's better to set
> > priority 1 to all standbys in quorum set. I guess there is no longer
> > person who supports the current behavior.
> 
> In that case, this open item is not eligible for section "Design Decisions to
> Recheck Mid-Beta".  That section is for items where we'll probably change
> nothing, but we plan to recheck later just in case.  Here, we expect to change
> the behavior; the open question is which replacement behavior to prefer.
> 
> Fujii, as the owner of this open item, you are responsible for moderating the
> debate until there's adequate consensus to make a particular change or to keep
> the current behavior after all.  Please proceed to do that.  Beta testers
> deserve a UI they may like, not a UI you already plan to change later.

Please observe the policy on open item ownership[1] and send a status update
within three calendar days of this message.  Include a date for your
subsequent status update.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Thu, Apr 20, 2017 at 9:31 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Ok, I got the point.
>
> At Wed, 19 Apr 2017 17:39:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote
in<20170419.173901.16598616.horiguchi.kyotaro@lab.ntt.co.jp>
 
>> > >> |    <para>
>> > >> |     Quorum-based synchronous replication is basically more
>> > >> |     efficient than priority-based one when you specify multiple
>> > >> |     standbys in <varname>synchronous_standby_names</> and want
>> > >> |     to synchronously replicate transactions to two or more of
>> > >> |     them.
>
> "Some" means "not all".
>
>> > >> |     In the priority-based case, the replication master
>> > >> |     must wait for a reply from the slowest standby in the
>> > >> |     required number of standbys in priority order, which may
>> > >> |     slower than the rest.
>
>
> Quorum-based synchronous replication is expected to be more
> efficient than priority-based one when your master doesn't need
> to be in sync with all of the nominated standbys by
> <varname>synchronous_standby_names</>.  While quorum-based
> replication master waits only for a specified number of fastest
> standbys, priority-based replicatoin master must wait for
> standbys at the top of the list, which may be slower than the
> rest.

This description looks good to me. I've updated the patch based on
this description and attached it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Mon, Apr 24, 2017 at 9:02 AM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, Apr 20, 2017 at 11:34:34PM -0700, Noah Misch wrote:
>> On Fri, Apr 21, 2017 at 01:20:05PM +0900, Masahiko Sawada wrote:
>> > On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote:
>> > > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote:
>> > >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote:
>> > >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
>> > >> >> As I told firstly this is not a bug. There are some proposals for better design
>> > >> >> of priority column in pg_stat_replication, but we've not reached the consensus
>> > >> >> yet. So I think that it's better to move this open item to "Design Decisions to
>> > >> >> Recheck Mid-Beta" section so that we can hear more opinions.
>> > >> >
>> > >> > I'm reading that some people want to report NULL priority, some people want to
>> > >> > report a constant 1 priority, and nobody wants the current behavior.  Is that
>> > >> > an accurate summary?
>> > >>
>> > >> Yes, I think that's correct.
>> > >
>> > > Okay, but ...
>> > >
>> > >> FWIW the reason of current behavior is that it would be useful for the
>> > >> user who is willing to switch from ANY to FIRST. They can know which
>> > >> standbys will become sync or potential.
>> > >
>> > > ... does this mean you personally want to keep the current behavior?  If not,
>> > > has some other person stated a wish to keep the current behavior?
>> >
>> > No, I want to change the current behavior. IMO it's better to set
>> > priority 1 to all standbys in quorum set. I guess there is no longer
>> > person who supports the current behavior.
>>
>> In that case, this open item is not eligible for section "Design Decisions to
>> Recheck Mid-Beta".  That section is for items where we'll probably change
>> nothing, but we plan to recheck later just in case.  Here, we expect to change
>> the behavior; the open question is which replacement behavior to prefer.
>>
>> Fujii, as the owner of this open item, you are responsible for moderating the
>> debate until there's adequate consensus to make a particular change or to keep
>> the current behavior after all.  Please proceed to do that.  Beta testers
>> deserve a UI they may like, not a UI you already plan to change later.
>
> Please observe the policy on open item ownership[1] and send a status update
> within three calendar days of this message.  Include a date for your
> subsequent status update.

Okay, so our consensus is to always set the priorities of sync standbys
to 1 in quorum-based syncrep case. Attached patch does this change.
Barrying any objection, I will commit this.

I will commit something to close this open item by April 28th at the latest
(IOW before my vacation starts).

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Mon, Apr 24, 2017 at 2:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Apr 20, 2017 at 9:31 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Ok, I got the point.
>>
>> At Wed, 19 Apr 2017 17:39:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote
in<20170419.173901.16598616.horiguchi.kyotaro@lab.ntt.co.jp>
 
>>> > >> |    <para>
>>> > >> |     Quorum-based synchronous replication is basically more
>>> > >> |     efficient than priority-based one when you specify multiple
>>> > >> |     standbys in <varname>synchronous_standby_names</> and want
>>> > >> |     to synchronously replicate transactions to two or more of
>>> > >> |     them.
>>
>> "Some" means "not all".
>>
>>> > >> |     In the priority-based case, the replication master
>>> > >> |     must wait for a reply from the slowest standby in the
>>> > >> |     required number of standbys in priority order, which may
>>> > >> |     slower than the rest.
>>
>>
>> Quorum-based synchronous replication is expected to be more
>> efficient than priority-based one when your master doesn't need
>> to be in sync with all of the nominated standbys by
>> <varname>synchronous_standby_names</>.

This description may be invalid in the case where the requested number
of sync standbys is smaller than the number of "nominated" standbys by
s_s_names. For example, please imagine the case where there are five
standbys nominated by s_s_name, the requested number of sync standbys
is 2, and only two sync standbys are running. In this case, the master
needs to wait for those two standbys whatever the sync rep method is.
I think that we should rewrite that to something like "quorum-based
synchronous replication is more effecient when the requested number
of synchronous standbys is smaller than the number of potential
synchronous standbys running".

>  While quorum-based
>> replication master waits only for a specified number of fastest
>> standbys, priority-based replicatoin master must wait for
>> standbys at the top of the list, which may be slower than the
>> rest.
>
> This description looks good to me. I've updated the patch based on
> this description and attached it.

But I still think that the original description that I used in my patch is
better than this....

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Tue, Apr 25, 2017 at 12:56 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Apr 24, 2017 at 9:02 AM, Noah Misch <noah@leadboat.com> wrote:
>> On Thu, Apr 20, 2017 at 11:34:34PM -0700, Noah Misch wrote:
>>> On Fri, Apr 21, 2017 at 01:20:05PM +0900, Masahiko Sawada wrote:
>>> > On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote:
>>> > > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote:
>>> > >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote:
>>> > >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
>>> > >> >> As I told firstly this is not a bug. There are some proposals for better design
>>> > >> >> of priority column in pg_stat_replication, but we've not reached the consensus
>>> > >> >> yet. So I think that it's better to move this open item to "Design Decisions to
>>> > >> >> Recheck Mid-Beta" section so that we can hear more opinions.
>>> > >> >
>>> > >> > I'm reading that some people want to report NULL priority, some people want to
>>> > >> > report a constant 1 priority, and nobody wants the current behavior.  Is that
>>> > >> > an accurate summary?
>>> > >>
>>> > >> Yes, I think that's correct.
>>> > >
>>> > > Okay, but ...
>>> > >
>>> > >> FWIW the reason of current behavior is that it would be useful for the
>>> > >> user who is willing to switch from ANY to FIRST. They can know which
>>> > >> standbys will become sync or potential.
>>> > >
>>> > > ... does this mean you personally want to keep the current behavior?  If not,
>>> > > has some other person stated a wish to keep the current behavior?
>>> >
>>> > No, I want to change the current behavior. IMO it's better to set
>>> > priority 1 to all standbys in quorum set. I guess there is no longer
>>> > person who supports the current behavior.
>>>
>>> In that case, this open item is not eligible for section "Design Decisions to
>>> Recheck Mid-Beta".  That section is for items where we'll probably change
>>> nothing, but we plan to recheck later just in case.  Here, we expect to change
>>> the behavior; the open question is which replacement behavior to prefer.
>>>
>>> Fujii, as the owner of this open item, you are responsible for moderating the
>>> debate until there's adequate consensus to make a particular change or to keep
>>> the current behavior after all.  Please proceed to do that.  Beta testers
>>> deserve a UI they may like, not a UI you already plan to change later.
>>
>> Please observe the policy on open item ownership[1] and send a status update
>> within three calendar days of this message.  Include a date for your
>> subsequent status update.
>
> Okay, so our consensus is to always set the priorities of sync standbys
> to 1 in quorum-based syncrep case. Attached patch does this change.
> Barrying any objection, I will commit this.

+1

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
At Tue, 25 Apr 2017 01:13:12 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwFZHQXfu04d+FwOOgFzvXdRoRvPrU6jFQJRF2BPLkADsQ@mail.gmail.com>
> On Mon, Apr 24, 2017 at 2:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Thu, Apr 20, 2017 at 9:31 AM, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> Ok, I got the point.
> >>
> >> At Wed, 19 Apr 2017 17:39:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
wrotein <20170419.173901.16598616.horiguchi.kyotaro@lab.ntt.co.jp>
 
> >>> > >> |    <para>
> >>> > >> |     Quorum-based synchronous replication is basically more
> >>> > >> |     efficient than priority-based one when you specify multiple
> >>> > >> |     standbys in <varname>synchronous_standby_names</> and want
> >>> > >> |     to synchronously replicate transactions to two or more of
> >>> > >> |     them.
> >>
> >> "Some" means "not all".
> >>
> >>> > >> |     In the priority-based case, the replication master
> >>> > >> |     must wait for a reply from the slowest standby in the
> >>> > >> |     required number of standbys in priority order, which may
> >>> > >> |     slower than the rest.
> >>
> >>
> >> Quorum-based synchronous replication is expected to be more
> >> efficient than priority-based one when your master doesn't need
> >> to be in sync with all of the nominated standbys by
> >> <varname>synchronous_standby_names</>.
> 
> This description may be invalid in the case where the requested number
> of sync standbys is smaller than the number of "nominated" standbys by
> s_s_names. For example, please imagine the case where there are five
> standbys nominated by s_s_name, the requested number of sync standbys
> is 2, and only two sync standbys are running. In this case, the master
> needs to wait for those two standbys whatever the sync rep method is.

Hmm. The 'nominated' standbys are standbys that their names are
listed in the s_s_names. "your master doesn't need to be in sync
with all of" means "number of sync standbys is smaller than the
number of.." So it seems to be the same... for me.

> I think that we should rewrite that to something like "quorum-based
> synchronous replication is more effecient when the requested number
> of synchronous standbys is smaller than the number of potential
> synchronous standbys running".

Against this phrase, "potential sync standbys" is "nominated
standbys".


> >  While quorum-based
> >> replication master waits only for a specified number of fastest
> >> standbys, priority-based replicatoin master must wait for
> >> standbys at the top of the list, which may be slower than the
> >> rest.
> 
> > This description looks good to me. I've updated the patch based on
> > this description and attached it.
> 
> But I still think that the original description that I used in my patch is
> better than this....

I'm not good at composition, so I cannot insist on my
proposal. For the convenience of others, here is the proposal
from Fujii-san.

+     A quorum-based synchronous replication is basically more efficient than
+     a priority-based one when you specify multiple standbys in
+     <varname>synchronous_standby_names</> and want to replicate
+     the transactions to some of them synchronously. In this case,
+     the transactions in a priority-based synchronous replication must wait for
+     reply from the slowest standby in synchronous standbys chosen based on
+     their priorities, and which may increase the transaction latencies.
+     On the other hand, using a quorum-based synchronous replication may
+     improve those latencies because it makes the transactions wait only for
+     replies from the requested number of faster standbys in all the listed
+     standbys, i.e., such slow standby doesn't block the transactions.


-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
At Tue, 25 Apr 2017 09:22:59 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAG88zYUwhV9L5muNX-qPSB+AgzerFDD0JDDVoM25gKKw@mail.gmail.com>
> >> Please observe the policy on open item ownership[1] and send a status update
> >> within three calendar days of this message.  Include a date for your
> >> subsequent status update.
> >
> > Okay, so our consensus is to always set the priorities of sync standbys
> > to 1 in quorum-based syncrep case. Attached patch does this change.
> > Barrying any objection, I will commit this.
> 
> +1

Ok, +1 from me.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Amit Kapila
Date:
On Tue, Apr 25, 2017 at 2:09 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> I'm not good at composition, so I cannot insist on my
> proposal. For the convenience of others, here is the proposal
> from Fujii-san.
>

Do you see any problem with the below proposal?  To me, this sounds reasonable.

> +     A quorum-based synchronous replication is basically more efficient than
> +     a priority-based one when you specify multiple standbys in
> +     <varname>synchronous_standby_names</> and want to replicate
> +     the transactions to some of them synchronously. In this case,
> +     the transactions in a priority-based synchronous replication must wait for
> +     reply from the slowest standby in synchronous standbys chosen based on
> +     their priorities, and which may increase the transaction latencies.
> +     On the other hand, using a quorum-based synchronous replication may
> +     improve those latencies because it makes the transactions wait only for
> +     replies from the requested number of faster standbys in all the listed
> +     standbys, i.e., such slow standby doesn't block the transactions.
>

Can we do few modifications like:
improve those latencies --> reduce those latencies
such slow standby --> a slow standby

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Tue, Apr 25, 2017 at 8:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 2:09 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>
>> I'm not good at composition, so I cannot insist on my
>> proposal. For the convenience of others, here is the proposal
>> from Fujii-san.
>>
>
> Do you see any problem with the below proposal?
> To me, this sounds reasonable.

I agree.

>
>> +     A quorum-based synchronous replication is basically more efficient than
>> +     a priority-based one when you specify multiple standbys in
>> +     <varname>synchronous_standby_names</> and want to replicate
>> +     the transactions to some of them synchronously. In this case,
>> +     the transactions in a priority-based synchronous replication must wait for
>> +     reply from the slowest standby in synchronous standbys chosen based on
>> +     their priorities, and which may increase the transaction latencies.
>> +     On the other hand, using a quorum-based synchronous replication may
>> +     improve those latencies because it makes the transactions wait only for
>> +     replies from the requested number of faster standbys in all the listed
>> +     standbys, i.e., such slow standby doesn't block the transactions.
>>
>
> Can we do few modifications like:
> improve those latencies --> reduce those latencies
> such slow standby --> a slow standby
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Fujii Masao
Date:
On Tue, Apr 25, 2017 at 5:41 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Tue, 25 Apr 2017 09:22:59 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAG88zYUwhV9L5muNX-qPSB+AgzerFDD0JDDVoM25gKKw@mail.gmail.com>
>> >> Please observe the policy on open item ownership[1] and send a status update
>> >> within three calendar days of this message.  Include a date for your
>> >> subsequent status update.
>> >
>> > Okay, so our consensus is to always set the priorities of sync standbys
>> > to 1 in quorum-based syncrep case. Attached patch does this change.
>> > Barrying any objection, I will commit this.
>>
>> +1
>
> Ok, +1 from me.

Pushed. Thanks!

Regards,

-- 
Fujii Masao



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Kyotaro HORIGUCHI
Date:
At Tue, 25 Apr 2017 21:21:29 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBqpMzQ3hnLjOrAj1PX__Bqo9XWUhSX9hzAewdbQP9QKg@mail.gmail.com>
> On Tue, Apr 25, 2017 at 8:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Apr 25, 2017 at 2:09 PM, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >>
> >> I'm not good at composition, so I cannot insist on my
> >> proposal. For the convenience of others, here is the proposal
> >> from Fujii-san.
> >>
> >
> > Do you see any problem with the below proposal?
> > To me, this sounds reasonable.
> 
> I agree.

Ok, I give up:p Thanks for shoving me.

> >> +     A quorum-based synchronous replication is basically more efficient than
> >> +     a priority-based one when you specify multiple standbys in
> >> +     <varname>synchronous_standby_names</> and want to replicate
> >> +     the transactions to some of them synchronously. In this case,
> >> +     the transactions in a priority-based synchronous replication must wait for
> >> +     reply from the slowest standby in synchronous standbys chosen based on
> >> +     their priorities, and which may increase the transaction latencies.
> >> +     On the other hand, using a quorum-based synchronous replication may
> >> +     improve those latencies because it makes the transactions wait only for
> >> +     replies from the requested number of faster standbys in all the listed
> >> +     standbys, i.e., such slow standby doesn't block the transactions.
> >>
> >
> > Can we do few modifications like:
> > improve those latencies --> reduce those latencies
> > such slow standby --> a slow standby
> >
> > --
> > With Regards,
> > Amit Kapila.
> > EnterpriseDB: http://www.enterprisedb.com
> 
> 
> Regards,
> 
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Noah Misch
Date:
On Thu, Apr 06, 2017 at 08:55:37AM +0200, Petr Jelinek wrote:
> On 06/04/17 03:51, Noah Misch wrote:
> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
> >> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
> >>> On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> >>>> Regarding this feature, there are some loose ends. We should work on
> >>>> and complete them until the release.
> >>>>
> >>>> (1)
> >>>> Which synchronous replication method, priority or quorum, should be
> >>>> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
> >>>> a priority-based sync replication is chosen for keeping backward
> >>>> compatibility. However some hackers argued to change this decision
> >>>> so that a quorum commit is chosen because they think that most users
> >>>> prefer to a quorum.

> >> The items (1) and (3) are not bugs. So I don't think that they need to be
> >> resolved before the beta release. After the feature freeze, many users
> >> will try and play with many new features including quorum-based syncrep.
> >> Then if many of them complain about (1) and (3), we can change the code
> >> at that timing. So we need more time that users can try the feature.
> > 
> > I've moved (1) to a new section for things to revisit during beta.  If someone
> > feels strongly that the current behavior is Wrong and must change, speak up as
> > soon as you reach that conclusion.  Absent such arguments, the behavior won't
> > change.
> > 
> 
> I was one of the people who said in original thread that I think the
> default behavior should change to quorum and I am still of that opinion.

This item appears under "decisions to recheck mid-beta".  If anyone is going
to push for a change here, now is the time.



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Fri, Jul 28, 2017 at 2:24 PM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, Apr 06, 2017 at 08:55:37AM +0200, Petr Jelinek wrote:
>> On 06/04/17 03:51, Noah Misch wrote:
>> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>> >> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote:
>> >>> On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>> >>>> Regarding this feature, there are some loose ends. We should work on
>> >>>> and complete them until the release.
>> >>>>
>> >>>> (1)
>> >>>> Which synchronous replication method, priority or quorum, should be
>> >>>> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
>> >>>> a priority-based sync replication is chosen for keeping backward
>> >>>> compatibility. However some hackers argued to change this decision
>> >>>> so that a quorum commit is chosen because they think that most users
>> >>>> prefer to a quorum.
>
>> >> The items (1) and (3) are not bugs. So I don't think that they need to be
>> >> resolved before the beta release. After the feature freeze, many users
>> >> will try and play with many new features including quorum-based syncrep.
>> >> Then if many of them complain about (1) and (3), we can change the code
>> >> at that timing. So we need more time that users can try the feature.
>> >
>> > I've moved (1) to a new section for things to revisit during beta.  If someone
>> > feels strongly that the current behavior is Wrong and must change, speak up as
>> > soon as you reach that conclusion.  Absent such arguments, the behavior won't
>> > change.
>> >
>>
>> I was one of the people who said in original thread that I think the
>> default behavior should change to quorum and I am still of that opinion.
>
> This item appears under "decisions to recheck mid-beta".  If anyone is going
> to push for a change here, now is the time.

It has been 1 week since the previous mail. I though that there were
others argued to change the behavior of old-style setting so that a
quorum commit is chosen. If nobody is going to push for a change we
can live with the current behavior?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Fri, Aug 4, 2017 at 8:19 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 2:24 PM, Noah Misch <noah@leadboat.com> wrote:
>> This item appears under "decisions to recheck mid-beta".  If anyone is going
>> to push for a change here, now is the time.
>
> It has been 1 week since the previous mail. I though that there were
> others argued to change the behavior of old-style setting so that a
> quorum commit is chosen. If nobody is going to push for a change we
> can live with the current behavior?

FWIW, I still see no harm in keeping backward-compatibility here, so I
am in favor of a statu-quo.
-- 
Michael



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Josh Berkus
Date:
On 08/09/2017 10:49 PM, Michael Paquier wrote:
> On Fri, Aug 4, 2017 at 8:19 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Fri, Jul 28, 2017 at 2:24 PM, Noah Misch <noah@leadboat.com> wrote:
>>> This item appears under "decisions to recheck mid-beta".  If anyone is going
>>> to push for a change here, now is the time.
>>
>> It has been 1 week since the previous mail. I though that there were
>> others argued to change the behavior of old-style setting so that a
>> quorum commit is chosen. If nobody is going to push for a change we
>> can live with the current behavior?
> 
> FWIW, I still see no harm in keeping backward-compatibility here, so I
> am in favor of a statu-quo.
> 

I am vaguely in favor of making quorum the default over "ordered".
However, given that anybody using sync commit without
understanding/customizing the setup is going to be sorry regardless,
keeping backwards compatibility is acceptable.

-- 
Josh Berkus
Containers & Databases Oh My!



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Fri, Aug 11, 2017 at 1:40 AM, Josh Berkus <josh@berkus.org> wrote:
> On 08/09/2017 10:49 PM, Michael Paquier wrote:
>> On Fri, Aug 4, 2017 at 8:19 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Fri, Jul 28, 2017 at 2:24 PM, Noah Misch <noah@leadboat.com> wrote:
>>>> This item appears under "decisions to recheck mid-beta".  If anyone is going
>>>> to push for a change here, now is the time.
>>>
>>> It has been 1 week since the previous mail. I though that there were
>>> others argued to change the behavior of old-style setting so that a
>>> quorum commit is chosen. If nobody is going to push for a change we
>>> can live with the current behavior?
>>
>> FWIW, I still see no harm in keeping backward-compatibility here, so I
>> am in favor of a statu-quo.
>>
>
> I am vaguely in favor of making quorum the default over "ordered".
> However, given that anybody using sync commit without
> understanding/customizing the setup is going to be sorry regardless,
> keeping backwards compatibility is acceptable.
>

Thank you for the comment.

FWIW, in my opinion if tte current behavior of 'N(a,b)' could confuse
users and we want to break the backward compatibility, I'd rather like
to remove that style in PostgreSQL 10 and to raise an syntax error to
user for more safety. Also, since the syntax 'a, b' might be opaque
for new users who don't know the history of s_s_names syntax, we could
unify its syntax to '[ANY|FIRST] N (a, b, ...)' syntax while keeping
the '*'.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Wed, Aug 16, 2017 at 4:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> FWIW, in my opinion if tte current behavior of 'N(a,b)' could confuse
> users and we want to break the backward compatibility, I'd rather like
> to remove that style in PostgreSQL 10 and to raise an syntax error to
> user for more safety. Also, since the syntax 'a, b' might be opaque
> for new users who don't know the history of s_s_names syntax, we could
> unify its syntax to '[ANY|FIRST] N (a, b, ...)' syntax while keeping
> the '*'.

I find the removal of a syntax in release N for something introduced
in release (N - 1) a bit hard to swallow from the user prospective.
What about just issuing a warning instead and say that the use of
ANY/FIRST is recommended? It costs nothing in maintenance to keep it
around.
-- 
Michael



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Wed, Aug 16, 2017 at 4:37 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Aug 16, 2017 at 4:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> FWIW, in my opinion if tte current behavior of 'N(a,b)' could confuse
>> users and we want to break the backward compatibility, I'd rather like
>> to remove that style in PostgreSQL 10 and to raise an syntax error to
>> user for more safety. Also, since the syntax 'a, b' might be opaque
>> for new users who don't know the history of s_s_names syntax, we could
>> unify its syntax to '[ANY|FIRST] N (a, b, ...)' syntax while keeping
>> the '*'.
>
> I find the removal of a syntax in release N for something introduced
> in release (N - 1) a bit hard to swallow from the user prospective.
> What about just issuing a warning instead and say that the use of
> ANY/FIRST is recommended? It costs nothing in maintenance to keep it
> around.

Yeah, I think that would be better. If we decide to not make quorum
commit the default we can issue a warning in docs. Attached a draft
patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Thu, Aug 17, 2017 at 2:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Aug 16, 2017 at 4:37 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Wed, Aug 16, 2017 at 4:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> FWIW, in my opinion if tte current behavior of 'N(a,b)' could confuse
>>> users and we want to break the backward compatibility, I'd rather like
>>> to remove that style in PostgreSQL 10 and to raise an syntax error to
>>> user for more safety. Also, since the syntax 'a, b' might be opaque
>>> for new users who don't know the history of s_s_names syntax, we could
>>> unify its syntax to '[ANY|FIRST] N (a, b, ...)' syntax while keeping
>>> the '*'.
>>
>> I find the removal of a syntax in release N for something introduced
>> in release (N - 1) a bit hard to swallow from the user prospective.
>> What about just issuing a warning instead and say that the use of
>> ANY/FIRST is recommended? It costs nothing in maintenance to keep it
>> around.
>
> Yeah, I think that would be better. If we decide to not make quorum
> commit the default we can issue a warning in docs. Attached a draft
> patch.

I had in mind a ereport(WARNING) in create_syncrep_config. Extra
thoughts/opinions welcome.
-- 
Michael



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Robert Haas
Date:
On Thu, Aug 17, 2017 at 1:13 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I had in mind a ereport(WARNING) in create_syncrep_config. Extra
> thoughts/opinions welcome.

I think for v10 we should just document the behavior we've got; I
think it's too late to be whacking things around now.

For v11, we could emit a warning if we plan to deprecate and
eventually remove the syntax without ANY/FIRST, but let's not do:

WARNING:  what you did is ok, but you might have wanted to do something else

First of all, whether or not that can properly be called a warning is
highly debatable.  Also, if you do that sort of thing to your spouse
and/or children, they call it "nagging".  I don't think users will
like it any more than family members do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Sat, Aug 19, 2017 at 12:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 17, 2017 at 1:13 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> I had in mind a ereport(WARNING) in create_syncrep_config. Extra
>> thoughts/opinions welcome.
>
> I think for v10 we should just document the behavior we've got; I
> think it's too late to be whacking things around now.
>
> For v11, we could emit a warning if we plan to deprecate and
> eventually remove the syntax without ANY/FIRST, but let's not do:
>
> WARNING:  what you did is ok, but you might have wanted to do something else
>
> First of all, whether or not that can properly be called a warning is
> highly debatable.  Also, if you do that sort of thing to your spouse
> and/or children, they call it "nagging".  I don't think users will
> like it any more than family members do.
>

It seems to me that we should discuss whether we want to keep the some
syntax such as 'a,b', 'N(a,b)' before thinking whether or not that
making the quorum commit the default behavior of 'N(a,b)' syntax. If
we plan to remove such syntax in a future release we can live with the
current code and should document it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Michael Paquier
Date:
On Wed, Aug 23, 2017 at 3:04 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> It seems to me that we should discuss whether we want to keep the some
> syntax such as 'a,b', 'N(a,b)' before thinking whether or not that
> making the quorum commit the default behavior of 'N(a,b)' syntax. If
> we plan to remove such syntax in a future release we can live with the
> current code and should document it.

The parsing code of repl_gram.y represents zero maintenance at the
end, so let me suggest to just live with what we have and do nothing.
Things kept as they are are not bad either. By changing the default,
people may have their failover flows silently trapped. So if we change
the default we will perhaps make some users happy, but I think that we
are going to make also some people angry. That's not fun to debug
silent failover issues.

At the end of the day, we could just add one sentence in the docs
saying the use of ANY and FIRST is encouraged over the past grammar
because they are clearer to understand.
-- 
Michael



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Josh Berkus
Date:
On 08/22/2017 11:04 PM, Masahiko Sawada wrote:
> WARNING:  what you did is ok, but you might have wanted to do something else
> 
> First of all, whether or not that can properly be called a warning is
> highly debatable.  Also, if you do that sort of thing to your spouse
> and/or children, they call it "nagging".  I don't think users will
> like it any more than family members do.

Realistically, we'll support the backwards-compatible syntax for 3-5
years.  Which is fine.

I suggest that we just gradually deprecate the old syntax from the docs,
and then around Postgres 16 eliminate it.  I posit that that's better
than changing the meaning of the old syntax out from under people.

-- 
Josh Berkus
Containers & Databases Oh My!



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Thu, Aug 24, 2017 at 3:11 AM, Josh Berkus <josh@berkus.org> wrote:
> On 08/22/2017 11:04 PM, Masahiko Sawada wrote:
>> WARNING:  what you did is ok, but you might have wanted to do something else
>>
>> First of all, whether or not that can properly be called a warning is
>> highly debatable.  Also, if you do that sort of thing to your spouse
>> and/or children, they call it "nagging".  I don't think users will
>> like it any more than family members do.
>
> Realistically, we'll support the backwards-compatible syntax for 3-5
> years.  Which is fine.
>
> I suggest that we just gradually deprecate the old syntax from the docs,
> and then around Postgres 16 eliminate it.  I posit that that's better
> than changing the meaning of the old syntax out from under people.
>

It seems to me that there is no folk who intently votes for making the
quorum commit the default. There some folks suggest to keep backward
compatibility in PG10 and gradually deprecate the old syntax. And only
the issuing from docs can be possible in PG10.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Quorum commit for multiple synchronous replication.

From
Masahiko Sawada
Date:
On Thu, Aug 24, 2017 at 4:27 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Aug 24, 2017 at 3:11 AM, Josh Berkus <josh@berkus.org> wrote:
>> On 08/22/2017 11:04 PM, Masahiko Sawada wrote:
>>> WARNING:  what you did is ok, but you might have wanted to do something else
>>>
>>> First of all, whether or not that can properly be called a warning is
>>> highly debatable.  Also, if you do that sort of thing to your spouse
>>> and/or children, they call it "nagging".  I don't think users will
>>> like it any more than family members do.
>>
>> Realistically, we'll support the backwards-compatible syntax for 3-5
>> years.  Which is fine.
>>
>> I suggest that we just gradually deprecate the old syntax from the docs,
>> and then around Postgres 16 eliminate it.  I posit that that's better
>> than changing the meaning of the old syntax out from under people.
>>
>
> It seems to me that there is no folk who intently votes for making the
> quorum commit the default. There some folks suggest to keep backward
> compatibility in PG10 and gradually deprecate the old syntax. And only
> the issuing from docs can be possible in PG10.
>

According to the discussion so far, it seems to me that keeping
backward compatibility and issuing a warning in docs that old syntax
could be changed or removed in a future release is the most acceptable
way in PG10. There is no objection against that so far and I already
posted a patch to add a warning in docs[1]. I'll wait for the
committer's decision.

[1] https://www.postgresql.org/message-id/CAD21AoAe%2BoGSFi3bjZ%2BfW6Q%3DTK7avPdDCZLEr02zM_c-U0JsRA%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center