Thread: Quorum commit for multiple synchronous replication.
Hi all, In 9.6 development cycle, we had been discussed about configuration syntax for a long time while considering expanding. As a result, we had new dedicated language for multiple synchronous replication, but it supports only priority method. We know that quorum commit is very useful for many users and can expand dedicated language easily for quorum commit. So I'd like to propose quorum commit for multiple synchronous replication here. The followings are changes attached patches made. - Add new syntax 'Any N ( node1, node2, ... )' to synchornous_standby_names for quorum commit. - In quorum commit, the master can return commit to client after received ACK from *at least* any N servers of listed standbys. - sync_priority of all listed servers are same, 1. - Add regression test for quorum commit. I was thinking that the syntax for quorum method would use '[ ... ]' but it will be confused with '( ... )' priority method used. 001 patch adds 'Any N ( ... )' style syntax but I know that we still might need to discuss about better syntax, discussion is very welcome. Attached draft patch, please give me feedback. Regards, -- Masahiko Sawada
Attachment
On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I was thinking that the syntax for quorum method would use '[ ... ]' > but it will be confused with '( ... )' priority method used. > 001 patch adds 'Any N ( ... )' style syntax but I know that we still > might need to discuss about better syntax, discussion is very welcome. > Attached draft patch, please give me feedback. I am +1 for using either "{}" or "[]" to define a quorum set, and -1 for the addition of a keyword in front of the integer defining for how many nodes server need to wait for. - foreach(cell, sync_standbys) + foreach (cell, sync_standbys) { - WalSnd *walsnd = &WalSndCtl->walsnds[lfirst_int(cell)]; + WalSnd *walsnd = &WalSndCtl->walsnds[lfirst_int(cell)]; This patch has some noise. -- Michael
On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> I was thinking that the syntax for quorum method would use '[ ... ]' >> but it will be confused with '( ... )' priority method used. >> 001 patch adds 'Any N ( ... )' style syntax but I know that we still >> might need to discuss about better syntax, discussion is very welcome. >> Attached draft patch, please give me feedback. > > I am +1 for using either "{}" or "[]" to define a quorum set, and -1 > for the addition of a keyword in front of the integer defining for how > many nodes server need to wait for. Thank you for reply. "{}" or "[]" are not bad but because these are not intuitive, I thought that it will be hard for uses to use different method for each purpose. > - foreach(cell, sync_standbys) > + foreach (cell, sync_standbys) > { > - WalSnd *walsnd = &WalSndCtl->walsnds[lfirst_int(cell)]; > + WalSnd *walsnd = &WalSndCtl->walsnds[lfirst_int(cell)]; > This patch has some noise. Will fix. -- Regards, -- Masahiko Sawada
On 04/08/16 06:40, Masahiko Sawada wrote: > On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> I was thinking that the syntax for quorum method would use '[ ... ]' >>> but it will be confused with '( ... )' priority method used. >>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still >>> might need to discuss about better syntax, discussion is very welcome. >>> Attached draft patch, please give me feedback. >> >> I am +1 for using either "{}" or "[]" to define a quorum set, and -1 >> for the addition of a keyword in front of the integer defining for how >> many nodes server need to wait for. > > Thank you for reply. > "{}" or "[]" are not bad but because these are not intuitive, I > thought that it will be hard for uses to use different method for each > purpose. > I think the "any" keyword is more explicit and understandable, also closer to SQL. So I would be in favor of doing that. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Aug 6, 2016 at 6:36 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: > On 04/08/16 06:40, Masahiko Sawada wrote: >> >> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> >>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> >>> wrote: >>>> >>>> I was thinking that the syntax for quorum method would use '[ ... ]' >>>> but it will be confused with '( ... )' priority method used. >>>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still >>>> might need to discuss about better syntax, discussion is very welcome. >>>> Attached draft patch, please give me feedback. >>> >>> >>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1 >>> for the addition of a keyword in front of the integer defining for how >>> many nodes server need to wait for. >> >> >> Thank you for reply. >> "{}" or "[]" are not bad but because these are not intuitive, I >> thought that it will be hard for uses to use different method for each >> purpose. >> > > I think the "any" keyword is more explicit and understandable, also closer > to SQL. So I would be in favor of doing that. +1 Also I like the following Simon's idea. https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com ----------------------- * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now * any k (n1, n2, n3) – would release waiters as soon as we have the responses from k out of N standbys. “any k” would be faster, so is desirable for performance and resilience ----------------------- Regards, -- Fujii Masao
On 29 August 2016 at 14:52, Fujii Masao <masao.fujii@gmail.com> wrote: > On Sat, Aug 6, 2016 at 6:36 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: >> On 04/08/16 06:40, Masahiko Sawada wrote: >>> >>> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier >>> <michael.paquier@gmail.com> wrote: >>>> >>>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> >>>> wrote: >>>>> >>>>> I was thinking that the syntax for quorum method would use '[ ... ]' >>>>> but it will be confused with '( ... )' priority method used. >>>>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still >>>>> might need to discuss about better syntax, discussion is very welcome. >>>>> Attached draft patch, please give me feedback. >>>> >>>> >>>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1 >>>> for the addition of a keyword in front of the integer defining for how >>>> many nodes server need to wait for. >>> >>> >>> Thank you for reply. >>> "{}" or "[]" are not bad but because these are not intuitive, I >>> thought that it will be hard for uses to use different method for each >>> purpose. >>> >> >> I think the "any" keyword is more explicit and understandable, also closer >> to SQL. So I would be in favor of doing that. > > +1 > > Also I like the following Simon's idea. > > https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com > ----------------------- > * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now > * any k (n1, n2, n3) – would release waiters as soon as we have the > responses from k out of N standbys. “any k” would be faster, so is > desirable for performance and resilience > ----------------------- +1 "synchronous_method" -> "synchronization_method" I'm concerned about the performance of this code. Can we work out a way of measuring it, so we can judge how successful we are at releasing waiters quickly? Thanks For 9.6 we implemented something that allows the DBA to define how slow programs are. Previously, since 9.1 this was something specified on the application side. I would like to put it back that way, so we end up with a parameter on client e.g. commit_quorum = k. Forget the exact parameters/user API for now, but I'd like to allow the code to work with user defined settings. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 6, 2016 at 11:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 29 August 2016 at 14:52, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Sat, Aug 6, 2016 at 6:36 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: >>> On 04/08/16 06:40, Masahiko Sawada wrote: >>>> >>>> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier >>>> <michael.paquier@gmail.com> wrote: >>>>> >>>>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> >>>>> wrote: >>>>>> >>>>>> I was thinking that the syntax for quorum method would use '[ ... ]' >>>>>> but it will be confused with '( ... )' priority method used. >>>>>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still >>>>>> might need to discuss about better syntax, discussion is very welcome. >>>>>> Attached draft patch, please give me feedback. >>>>> >>>>> >>>>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1 >>>>> for the addition of a keyword in front of the integer defining for how >>>>> many nodes server need to wait for. >>>> >>>> >>>> Thank you for reply. >>>> "{}" or "[]" are not bad but because these are not intuitive, I >>>> thought that it will be hard for uses to use different method for each >>>> purpose. >>>> >>> >>> I think the "any" keyword is more explicit and understandable, also closer >>> to SQL. So I would be in favor of doing that. >> >> +1 >> >> Also I like the following Simon's idea. >> >> https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com >> ----------------------- >> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now >> * any k (n1, n2, n3) – would release waiters as soon as we have the >> responses from k out of N standbys. “any k” would be faster, so is >> desirable for performance and resilience >> ----------------------- > > +1 > > "synchronous_method" -> "synchronization_method" Thanks, will fix. > I'm concerned about the performance of this code. Can we work out a > way of measuring it, so we can judge how successful we are at > releasing waiters quickly? Thanks I will measure the performance effect of this code. I'm expecting that performances are, 'first 1 (n1, n2)' > 'any 1(n1, n2)' > 'first 2(n1, n2)' 'first 1 (n1, n2)' will be highest throughput. > For 9.6 we implemented something that allows the DBA to define how > slow programs are. Previously, since 9.1 this was something specified > on the application side. I would like to put it back that way, so we > end up with a parameter on client e.g. commit_quorum = k. Forget the > exact parameters/user API for now, but I'd like to allow the code to > work with user defined settings. Thanks. I see. The parameter on client should effect for priority method as well. And similar to synchronous_commit, the client can specify the how much standbys the master waits to commit for according to synchronization method, even if s_s_names is defined. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 08/29/2016 06:52 AM, Fujii Masao wrote: > Also I like the following Simon's idea. > > https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com > ----------------------- > * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now > * any k (n1, n2, n3) – would release waiters as soon as we have the > responses from k out of N standbys. “any k” would be faster, so is > desirable for performance and resilience What are we going to do for backwards compatibility, here? So, here's the dilemma: If we want to keep backwards compatibility with 9.6, then: "k (n1, n2, n3)" == "first k (n1, n2, n3)" However, "first k" is not what most users will want, most of the time; users of version 13, years from now, will be getting constantly confused by "first k" behavior when they wanted quorum. So the sensible default would be: "k (n1, n2, n3)" == "any k (n1, n2, n3)" ... however, that will break backwards compatibility. Thoughts? My $0.02 is that we break backwards compat somehow and document the heck out of it. -- -- Josh Berkus Red Hat OSAS (any opinions are my own)
On Wed, Sep 7, 2016 at 12:47 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Tue, Sep 6, 2016 at 11:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 29 August 2016 at 14:52, Fujii Masao <masao.fujii@gmail.com> wrote: >>> On Sat, Aug 6, 2016 at 6:36 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: >>>> On 04/08/16 06:40, Masahiko Sawada wrote: >>>>> >>>>> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier >>>>> <michael.paquier@gmail.com> wrote: >>>>>> >>>>>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> >>>>>> wrote: >>>>>>> >>>>>>> I was thinking that the syntax for quorum method would use '[ ... ]' >>>>>>> but it will be confused with '( ... )' priority method used. >>>>>>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still >>>>>>> might need to discuss about better syntax, discussion is very welcome. >>>>>>> Attached draft patch, please give me feedback. >>>>>> >>>>>> >>>>>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1 >>>>>> for the addition of a keyword in front of the integer defining for how >>>>>> many nodes server need to wait for. >>>>> >>>>> >>>>> Thank you for reply. >>>>> "{}" or "[]" are not bad but because these are not intuitive, I >>>>> thought that it will be hard for uses to use different method for each >>>>> purpose. >>>>> >>>> >>>> I think the "any" keyword is more explicit and understandable, also closer >>>> to SQL. So I would be in favor of doing that. >>> >>> +1 >>> >>> Also I like the following Simon's idea. >>> >>> https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com >>> ----------------------- >>> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now >>> * any k (n1, n2, n3) – would release waiters as soon as we have the >>> responses from k out of N standbys. “any k” would be faster, so is >>> desirable for performance and resilience >>> ----------------------- >> >> +1 >> >> "synchronous_method" -> "synchronization_method" > > Thanks, will fix. > >> I'm concerned about the performance of this code. Can we work out a >> way of measuring it, so we can judge how successful we are at >> releasing waiters quickly? Thanks > > I will measure the performance effect of this code. > I'm expecting that performances are, > 'first 1 (n1, n2)' > 'any 1(n1, n2)' > 'first 2(n1, n2)' > 'first 1 (n1, n2)' will be highest throughput. > Sorry, that's wrong. 'any 1(n1, n2)' will be highest throughput or same as 'first 1(n1, n2)'. >> For 9.6 we implemented something that allows the DBA to define how >> slow programs are. Previously, since 9.1 this was something specified >> on the application side. I would like to put it back that way, so we >> end up with a parameter on client e.g. commit_quorum = k. Forget the >> exact parameters/user API for now, but I'd like to allow the code to >> work with user defined settings. Thanks. > > I see. The parameter on client should effect for priority method as well. > And similar to synchronous_commit, the client can specify the how much > standbys the master waits to commit for according to synchronization > method, even if s_s_names is defined. > > Regards, > > -- > Masahiko Sawada > NIPPON TELEGRAPH AND TELEPHONE CORPORATION > NTT Open Source Software Center Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Sep 7, 2016 at 4:03 AM, Josh Berkus <josh@agliodbs.com> wrote: > On 08/29/2016 06:52 AM, Fujii Masao wrote: >> Also I like the following Simon's idea. >> >> https://www.postgresql.org/message-id/CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com >> ----------------------- >> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now >> * any k (n1, n2, n3) – would release waiters as soon as we have the >> responses from k out of N standbys. “any k” would be faster, so is >> desirable for performance and resilience > > What are we going to do for backwards compatibility, here? > > So, here's the dilemma: > > If we want to keep backwards compatibility with 9.6, then: > > "k (n1, n2, n3)" == "first k (n1, n2, n3)" > > However, "first k" is not what most users will want, most of the time; > users of version 13, years from now, will be getting constantly confused > by "first k" behavior when they wanted quorum. So the sensible default > would be: > > "k (n1, n2, n3)" == "any k (n1, n2, n3)" > +1. "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward compatibility but most users would think "k(n1, n2, n3)" as quorum after introduced quorum. I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2, n3)" style before 9.6 releasing if we got consensus. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Sep 8, 2016 at 6:26 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward > compatibility but most users would think "k(n1, n2, n3)" as quorum > after introduced quorum. > I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2, > n3)" style before 9.6 releasing if we got consensus. Considering breaking backward-compatibility in the next release does not sound like a good idea to me for a new feature that is going to be GA soon. -- Michael
On 09/09/2016 03:28 AM, Michael Paquier wrote: > On Thu, Sep 8, 2016 at 6:26 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward >> compatibility but most users would think "k(n1, n2, n3)" as quorum >> after introduced quorum. >> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2, >> n3)" style before 9.6 releasing if we got consensus. > > Considering breaking backward-compatibility in the next release does > not sound like a good idea to me for a new feature that is going to be > GA soon. Indeed. I'll vote for pulling a fast one on 9.6 for this. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 09/09/16 08:23, Vik Fearing wrote: > On 09/09/2016 03:28 AM, Michael Paquier wrote: >> On Thu, Sep 8, 2016 at 6:26 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward >>> compatibility but most users would think "k(n1, n2, n3)" as quorum >>> after introduced quorum. >>> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2, >>> n3)" style before 9.6 releasing if we got consensus. >> >> Considering breaking backward-compatibility in the next release does >> not sound like a good idea to me for a new feature that is going to be >> GA soon. > > Indeed. I'll vote for pulling a fast one on 9.6 for this. > +1 -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 8 September 2016 at 10:26, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward > compatibility but most users would think "k(n1, n2, n3)" as quorum > after introduced quorum. > I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2, > n3)" style before 9.6 releasing if we got consensus. Let's see the proposed patch, so we can evaluate the proposal. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 9, 2016 at 6:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 8 September 2016 at 10:26, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward >> compatibility but most users would think "k(n1, n2, n3)" as quorum >> after introduced quorum. >> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2, >> n3)" style before 9.6 releasing if we got consensus. > > Let's see the proposed patch, so we can evaluate the proposal. > Attached 2 patches. 000 patch changes syntax of s_s_names from 'k(n1, n2, n3)' to 'First k (n1, n2,n3)' for PG9.6. 001 patch adds the quorum commit using syntax 'Any k (n1, n2,n3)' for PG10. Since we already released 9.6RC1, I understand that it's quite hard to change syntax of 9.6. But considering that we support the quorum commit, this could be one of the solutions in order to avoid breaking backward compatibility and to provide useful user interface. So I attached these patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Sat, Sep 17, 2016 at 2:04 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Since we already released 9.6RC1, I understand that it's quite hard to > change syntax of 9.6. > But considering that we support the quorum commit, this could be one > of the solutions in order to avoid breaking backward compatibility and > to provide useful user interface. > So I attached these patches. standby_config: - standby_list { $$ = create_syncrep_config("1", $1); } - | FIRST NUM '(' standby_list ')' { $$ = create_syncrep_config($1, $4); } + standby_list { $$ = create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } + | ANY NUM '(' standby_list ')' { $$ = create_syncrep_config($2, $4, SYNC_REP_QUORUM); } + | FIRST NUM '(' standby_list ')' { $$ = create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } Reading again the thread, it seems that my previous post [1] was a bit misunderstood. My position is to not introduce any new behavior changes in 9.6, so we could just make the FIRST NUM grammar equivalent to NUM. [1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4nJgmMgiAgvcSDKA@mail.gmail.com -- Michael
On 09/21/2016 08:30 AM, Michael Paquier wrote: > On Sat, Sep 17, 2016 at 2:04 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Since we already released 9.6RC1, I understand that it's quite hard to >> change syntax of 9.6. >> But considering that we support the quorum commit, this could be one >> of the solutions in order to avoid breaking backward compatibility and >> to provide useful user interface. >> So I attached these patches. > > standby_config: > - standby_list { $$ = create_syncrep_config("1", $1); } > - | FIRST NUM '(' standby_list ')' { $$ = > create_syncrep_config($1, $4); } > + standby_list { $$ = > create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } > + | ANY NUM '(' standby_list ')' { $$ = > create_syncrep_config($2, $4, SYNC_REP_QUORUM); } > + | FIRST NUM '(' standby_list ')' { $$ = > create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } > > Reading again the thread, it seems that my previous post [1] was a bit > misunderstood. My position is to not introduce any new behavior > changes in 9.6, so we could just make the FIRST NUM grammar equivalent > to NUM. > > [1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4nJgmMgiAgvcSDKA@mail.gmail.com I misunderstood your intent, then. But I still stand by what I did understand, namely that 'k (...)' should mean 'any k (...)'. It's much more natural than having it mean 'first k (...)' and I also think it will be more frequent in practice. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 21/09/16 09:18, Vik Fearing wrote: > On 09/21/2016 08:30 AM, Michael Paquier wrote: >> On Sat, Sep 17, 2016 at 2:04 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Since we already released 9.6RC1, I understand that it's quite hard to >>> change syntax of 9.6. >>> But considering that we support the quorum commit, this could be one >>> of the solutions in order to avoid breaking backward compatibility and >>> to provide useful user interface. >>> So I attached these patches. >> >> standby_config: >> - standby_list { $$ = create_syncrep_config("1", $1); } >> - | FIRST NUM '(' standby_list ')' { $$ = >> create_syncrep_config($1, $4); } >> + standby_list { $$ = >> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } >> + | ANY NUM '(' standby_list ')' { $$ = >> create_syncrep_config($2, $4, SYNC_REP_QUORUM); } >> + | FIRST NUM '(' standby_list ')' { $$ = >> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } >> >> Reading again the thread, it seems that my previous post [1] was a bit >> misunderstood. My position is to not introduce any new behavior >> changes in 9.6, so we could just make the FIRST NUM grammar equivalent >> to NUM. >> >> [1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4nJgmMgiAgvcSDKA@mail.gmail.com > > I misunderstood your intent, then. But I still stand by what I did > understand, namely that 'k (...)' should mean 'any k (...)'. It's much > more natural than having it mean 'first k (...)' and I also think it > will be more frequent in practice. > I think so as well. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Sep 21, 2016 at 5:54 AM, Petr Jelinek <petr@2ndquadrant.com> wrote: >>> Reading again the thread, it seems that my previous post [1] was a bit >>> misunderstood. My position is to not introduce any new behavior >>> changes in 9.6, so we could just make the FIRST NUM grammar equivalent >>> to NUM. >>> >>> [1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4nJgmMgiAgvcSDKA@mail.gmail.com >> >> I misunderstood your intent, then. But I still stand by what I did >> understand, namely that 'k (...)' should mean 'any k (...)'. It's much >> more natural than having it mean 'first k (...)' and I also think it >> will be more frequent in practice. >> > > I think so as well. Well, I agree, but I think making behavior changes after rc1 is a non-starter. It's better to live with the incompatibility than to change the behavior so close to release. At least, that's my position. Getting the release out on time with a minimal bug count is more important to me than a minor incompatibility in the meaning of one GUC. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Sep 21, 2016 at 11:22 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Sep 21, 2016 at 5:54 AM, Petr Jelinek <petr@2ndquadrant.com> wrote: >>>> Reading again the thread, it seems that my previous post [1] was a bit >>>> misunderstood. My position is to not introduce any new behavior >>>> changes in 9.6, so we could just make the FIRST NUM grammar equivalent >>>> to NUM. >>>> >>>> [1]: https://www.postgresql.org/message-id/CAB7nPqRDvJn18e54ccNpOP1A2_iUN6-iU=4nJgmMgiAgvcSDKA@mail.gmail.com >>> >>> I misunderstood your intent, then. But I still stand by what I did >>> understand, namely that 'k (...)' should mean 'any k (...)'. It's much >>> more natural than having it mean 'first k (...)' and I also think it >>> will be more frequent in practice. >>> >> >> I think so as well. > > Well, I agree, but I think making behavior changes after rc1 is a > non-starter. It's better to live with the incompatibility than to > change the behavior so close to release. At least, that's my > position. Getting the release out on time with a minimal bug count is > more important to me than a minor incompatibility in the meaning of > one GUC. > As the release team announced, it's better to postpone changing the syntax of existing s_s_name. I still vote for changing behaviour of existing syntax 'k (n1, n2)' to quorum commit. That is, 1. 'First k (n1, n2, n3)' means that the master server waits for ACKs from k standby servers whose name appear earlier in the list. 2. 'Any k (n1, n2, n3)' means that the master server waits for ACKs from any k listed standby servers. 3. 'n1, n2, n3' is the same as #1 with k=1. 4. '(n1, n2, n3)' is the same as #2 with k=1. Attached updated patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Sat, Sep 24, 2016 at 5:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I still vote for changing behaviour of existing syntax 'k (n1, n2)' to > quorum commit. > That is, > 1. 'First k (n1, n2, n3)' means that the master server waits for ACKs > from k standby servers whose name appear earlier in the list. > 2. 'Any k (n1, n2, n3)' means that the master server waits for ACKs > from any k listed standby servers. > 3. 'n1, n2, n3' is the same as #1 with k=1. > 4. '(n1, n2, n3)' is the same as #2 with k=1. OK, so I have done a review of this patch keeping that in mind as that's the consensus. I am still getting familiar with the code... - transactions will wait until all the standby servers which are considered + transactions will wait until the multiple standby servers which are considered There is no real need to update this sentence. + <literal>FIRST</> means to control the standby servers with + different priorities. The synchronous standbys will be those + whose name appear earlier in this list, and that are both + currently connected and streaming data in real-time(as shown + by a state of <literal>streaming</> in the + <link linkend="monitoring-stats-views-table"> + <literal>pg_stat_replication</></link> view). Other standby + servers appearing later in this list represent potential + synchronous standbys. If any of the current synchronous + standbys disconnects for whatever reason, it will be replaced + immediately with the next-highest-priority standby. + For example, a setting of <literal>FIRST 3 (s1, s2, s3, s4)</> + makes transaction commits wait until their WAL records are received + by three higher-priority standbys chosen from standby servers + <literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>. It does not seem necessary to me to enter in this level of details: The keyword FIRST, coupled with an integer number N, chooses the first N higher-priority standbys and makes transaction commit when their WAL records are received. For example <literal>FIRST 3 (s1, s2, s3, s4)</> makes transaction commits wait until their WAL records are received by the three high-priority standbys chosen from standby servers s1, s2, s3 and s4. + <literal>ANY</> means to control all of standby servers with + same priority. The master sever will wait for receipt from + at least <replaceable class="parameter">num_sync</replaceable> + standbys, which is quorum commit in the literature. The all of + listed standbys are considered as candidate of quorum commit. + For example, a setting of <literal> ANY 3 (s1, s2, s3, s4)</> makes + transaction commits wait until receiving receipts from at least + any three standbys of four listed servers <literal>s1</>, + <literal>s2</>, <literal>s3</>, <literal>s4</>. Similarly, something like that... The keyword ANY, coupled with an integer number N, chooses N standbys in a set of standbys with the same, lowest, priority and makes transaction commit when WAL records are received those N standbys. For example ANY 3(s1, s2, s3, s4) makes transaction commits wait until WAL records have been received from 3 servers in the set s1, s2, s3 and s4. It could be good also to mention that no keyword specified means ANY, which is incompatible with 9.6. The docs also miss the fact that if a simple list of servers is given, without parenthesis and keywords, this is equivalent to FIRST 1. -synchronous_standby_names = '2 (s1, s2, s3)' +synchronous_standby_names = 'First 2 (s1, s2, s3)' Nit here. It may be a good idea to just use upper-case characters in the docs, or just lower-case for consistency, but not mix both. Usually GUCs use lower-case characters. + when standby is considered as a condidate of quorum commit.</entry> s/condidate/candidate/ -syncrep_scanner.c: FLEXFLAGS = -CF -p +syncrep_scanner.c: FLEXFLAGS = -CF -p -i Hm... Is that actually a good idea? Now "NODE" and "node" are two different things for application_name, but with this patch both would have the same meaning. I am getting to think that we could just use the lower-case characters for the keywords any/first. Is this -i switch a problem for elements in standby_list? + * Calculate the 'pos' newest Write, Flush and Apply positions among sync standbys. I don't understand this comment. + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) + got_recptr = SyncRepGetOldestSyncRecPtr(&writePtr, &flushPtr, + &applyPtr, &am_sync); + else /* SYNC_REP_QUORUM */ + got_recptr = SyncRepGetNNewestSyncRecPtr(&writePtr, &flushPtr, + &applyPtr, SyncRepConfig->num_sync, + &am_sync); Those could be grouped together, there is no need to have pos as an argument. + /* In quroum method, all sync standby priorities are always 1 */ + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM) + priority = 1; This is dead code, SyncRepGetSyncStandbysPriority is not called for QUORUM. You may want to add an assert in SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be sure that they are getting called for the correct method. + /* Sort each array in descending order to get 'pos' newest element */ + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn); + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn); + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn); There is no need to reorder things again and to use arrays, you can choose the newest LSNs when scanning the WalSnd entries. -- Michael
On Wed, Sep 28, 2016 at 5:14 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > OK, so I have done a review of this patch keeping that in mind as > that's the consensus. I am still getting familiar with the code... Returned with feedback for now. This just needs polishing so feel free to move it to the next CF once you have a new patch. -- Michael
On Wed, Sep 28, 2016 at 5:14 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Sat, Sep 24, 2016 at 5:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> I still vote for changing behaviour of existing syntax 'k (n1, n2)' to >> quorum commit. >> That is, >> 1. 'First k (n1, n2, n3)' means that the master server waits for ACKs >> from k standby servers whose name appear earlier in the list. >> 2. 'Any k (n1, n2, n3)' means that the master server waits for ACKs >> from any k listed standby servers. >> 3. 'n1, n2, n3' is the same as #1 with k=1. >> 4. '(n1, n2, n3)' is the same as #2 with k=1. > > OK, so I have done a review of this patch keeping that in mind as > that's the consensus. I am still getting familiar with the code... Thank you for reviewing! > - transactions will wait until all the standby servers which are considered > + transactions will wait until the multiple standby servers which > are considered > There is no real need to update this sentence. > > + <literal>FIRST</> means to control the standby servers with > + different priorities. The synchronous standbys will be those > + whose name appear earlier in this list, and that are both > + currently connected and streaming data in real-time(as shown > + by a state of <literal>streaming</> in the > + <link linkend="monitoring-stats-views-table"> > + <literal>pg_stat_replication</></link> view). Other standby > + servers appearing later in this list represent potential > + synchronous standbys. If any of the current synchronous > + standbys disconnects for whatever reason, it will be replaced > + immediately with the next-highest-priority standby. > + For example, a setting of <literal>FIRST 3 (s1, s2, s3, s4)</> > + makes transaction commits wait until their WAL records are received > + by three higher-priority standbys chosen from standby servers > + <literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>. > It does not seem necessary to me to enter in this level of details: > The keyword FIRST, coupled with an integer number N, chooses the first > N higher-priority standbys and makes transaction commit when their WAL > records are received. For example <literal>FIRST 3 (s1, s2, s3, s4)</> > makes transaction commits wait until their WAL records are received by > the three high-priority standbys chosen from standby servers s1, s2, > s3 and s4. Will fix. > + <literal>ANY</> means to control all of standby servers with > + same priority. The master sever will wait for receipt from > + at least <replaceable class="parameter">num_sync</replaceable> > + standbys, which is quorum commit in the literature. The all of > + listed standbys are considered as candidate of quorum commit. > + For example, a setting of <literal> ANY 3 (s1, s2, s3, s4)</> makes > + transaction commits wait until receiving receipts from at least > + any three standbys of four listed servers <literal>s1</>, > + <literal>s2</>, <literal>s3</>, <literal>s4</>. > > Similarly, something like that... > The keyword ANY, coupled with an integer number N, chooses N standbys > in a set of standbys with the same, lowest, priority and makes > transaction commit when WAL records are received those N standbys. For > example ANY 3(s1, s2, s3, s4) makes transaction commits wait until WAL > records have been received from 3 servers in the set s1, s2, s3 and > s4. Will fix. > It could be good also to mention that no keyword specified means ANY, > which is incompatible with 9.6. The docs also miss the fact that if a > simple list of servers is given, without parenthesis and keywords, > this is equivalent to FIRST 1. Right. I will add those documentations. > -synchronous_standby_names = '2 (s1, s2, s3)' > +synchronous_standby_names = 'First 2 (s1, s2, s3)' > Nit here. It may be a good idea to just use upper-case characters in > the docs, or just lower-case for consistency, but not mix both. > Usually GUCs use lower-case characters. Agree. Will fix. > + when standby is considered as a condidate of quorum commit.</entry> > s/condidate/candidate/ Will fix. > -syncrep_scanner.c: FLEXFLAGS = -CF -p > +syncrep_scanner.c: FLEXFLAGS = -CF -p -i > Hm... Is that actually a good idea? Now "NODE" and "node" are two > different things for application_name, but with this patch both would > have the same meaning. I am getting to think that we could just use > the lower-case characters for the keywords any/first. Is this -i > switch a problem for elements in standby_list? The string of standby name is not changed actually, only the parser doesn't distinguish between "NODE" and "node". The values used for checking application_name will still works fine. If we want to name "first" or "any" as the standby name then it should be double quoted. > + * Calculate the 'pos' newest Write, Flush and Apply positions among > sync standbys. > I don't understand this comment. > > + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) > + got_recptr = SyncRepGetOldestSyncRecPtr(&writePtr, &flushPtr, > + &applyPtr, &am_sync); > + else /* SYNC_REP_QUORUM */ > + got_recptr = SyncRepGetNNewestSyncRecPtr(&writePtr, &flushPtr, > + &applyPtr, > SyncRepConfig->num_sync, > + &am_sync); > Those could be grouped together, there is no need to have pos as an argument. Will fix. > + /* In quroum method, all sync standby priorities are always 1 */ > + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM) > + priority = 1; > This is dead code, SyncRepGetSyncStandbysPriority is not called for > QUORUM. Well, this code is in SyncRepGetStandbyPriority which is called by SyncRepInitConifig. SyncRepGetStandbyPriority can be called regardless of the the synchronization method. > You may want to add an assert in > SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be > sure that they are getting called for the correct method. > + /* Sort each array in descending order to get 'pos' newest element */ > + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn); > + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn); > + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn); > There is no need to reorder things again and to use arrays, you can > choose the newest LSNs when scanning the WalSnd entries. I considered it that but it depends on performance. Current patch avoids O(N*M). Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Oct 11, 2016 at 4:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> You may want to add an assert in >> SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be >> sure that they are getting called for the correct method. >> + /* Sort each array in descending order to get 'pos' newest element */ >> + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn); >> + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn); >> + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn); >> There is no need to reorder things again and to use arrays, you can >> choose the newest LSNs when scanning the WalSnd entries. > > I considered it that but it depends on performance. > Current patch avoids O(N*M). I am surprised by this statement. You would have O(N) by just discarding the oldest LSN values while holding the spinlock of each WAL sender. What SyncRepGetNNewestSyncRecPtr looks for are just the newest apply, write and flush positions. -- Michael
On Tue, Oct 11, 2016 at 6:08 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Oct 11, 2016 at 4:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> You may want to add an assert in >>> SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be >>> sure that they are getting called for the correct method. >>> + /* Sort each array in descending order to get 'pos' newest element */ >>> + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn); >>> + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn); >>> + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn); >>> There is no need to reorder things again and to use arrays, you can >>> choose the newest LSNs when scanning the WalSnd entries. >> >> I considered it that but it depends on performance. >> Current patch avoids O(N*M). > > I am surprised by this statement. You would have O(N) by just > discarding the oldest LSN values while holding the spinlock of each > WAL sender. What SyncRepGetNNewestSyncRecPtr looks for are just the > newest apply, write and flush positions. Bah, stupid. I just missed the point with 'pos'. Now I see the trick. -- Michael
On Tue, Oct 11, 2016 at 4:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Wed, Sep 28, 2016 at 5:14 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Sat, Sep 24, 2016 at 5:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> I still vote for changing behaviour of existing syntax 'k (n1, n2)' to >>> quorum commit. >>> That is, >>> 1. 'First k (n1, n2, n3)' means that the master server waits for ACKs >>> from k standby servers whose name appear earlier in the list. >>> 2. 'Any k (n1, n2, n3)' means that the master server waits for ACKs >>> from any k listed standby servers. >>> 3. 'n1, n2, n3' is the same as #1 with k=1. >>> 4. '(n1, n2, n3)' is the same as #2 with k=1. >> >> OK, so I have done a review of this patch keeping that in mind as >> that's the consensus. I am still getting familiar with the code... > > Thank you for reviewing! > >> - transactions will wait until all the standby servers which are considered >> + transactions will wait until the multiple standby servers which >> are considered >> There is no real need to update this sentence. >> >> + <literal>FIRST</> means to control the standby servers with >> + different priorities. The synchronous standbys will be those >> + whose name appear earlier in this list, and that are both >> + currently connected and streaming data in real-time(as shown >> + by a state of <literal>streaming</> in the >> + <link linkend="monitoring-stats-views-table"> >> + <literal>pg_stat_replication</></link> view). Other standby >> + servers appearing later in this list represent potential >> + synchronous standbys. If any of the current synchronous >> + standbys disconnects for whatever reason, it will be replaced >> + immediately with the next-highest-priority standby. >> + For example, a setting of <literal>FIRST 3 (s1, s2, s3, s4)</> >> + makes transaction commits wait until their WAL records are received >> + by three higher-priority standbys chosen from standby servers >> + <literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>. >> It does not seem necessary to me to enter in this level of details: >> The keyword FIRST, coupled with an integer number N, chooses the first >> N higher-priority standbys and makes transaction commit when their WAL >> records are received. For example <literal>FIRST 3 (s1, s2, s3, s4)</> >> makes transaction commits wait until their WAL records are received by >> the three high-priority standbys chosen from standby servers s1, s2, >> s3 and s4. > > Will fix. > >> + <literal>ANY</> means to control all of standby servers with >> + same priority. The master sever will wait for receipt from >> + at least <replaceable class="parameter">num_sync</replaceable> >> + standbys, which is quorum commit in the literature. The all of >> + listed standbys are considered as candidate of quorum commit. >> + For example, a setting of <literal> ANY 3 (s1, s2, s3, s4)</> makes >> + transaction commits wait until receiving receipts from at least >> + any three standbys of four listed servers <literal>s1</>, >> + <literal>s2</>, <literal>s3</>, <literal>s4</>. >> >> Similarly, something like that... >> The keyword ANY, coupled with an integer number N, chooses N standbys >> in a set of standbys with the same, lowest, priority and makes >> transaction commit when WAL records are received those N standbys. For >> example ANY 3(s1, s2, s3, s4) makes transaction commits wait until WAL >> records have been received from 3 servers in the set s1, s2, s3 and >> s4. > > Will fix. > >> It could be good also to mention that no keyword specified means ANY, >> which is incompatible with 9.6. The docs also miss the fact that if a >> simple list of servers is given, without parenthesis and keywords, >> this is equivalent to FIRST 1. > > Right. I will add those documentations. > >> -synchronous_standby_names = '2 (s1, s2, s3)' >> +synchronous_standby_names = 'First 2 (s1, s2, s3)' >> Nit here. It may be a good idea to just use upper-case characters in >> the docs, or just lower-case for consistency, but not mix both. >> Usually GUCs use lower-case characters. > > Agree. Will fix. > >> + when standby is considered as a condidate of quorum commit.</entry> >> s/condidate/candidate/ > > Will fix. > >> -syncrep_scanner.c: FLEXFLAGS = -CF -p >> +syncrep_scanner.c: FLEXFLAGS = -CF -p -i >> Hm... Is that actually a good idea? Now "NODE" and "node" are two >> different things for application_name, but with this patch both would >> have the same meaning. I am getting to think that we could just use >> the lower-case characters for the keywords any/first. Is this -i >> switch a problem for elements in standby_list? > > The string of standby name is not changed actually, only the parser > doesn't distinguish between "NODE" and "node". > The values used for checking application_name will still works fine. > If we want to name "first" or "any" as the standby name then it should > be double quoted. > >> + * Calculate the 'pos' newest Write, Flush and Apply positions among >> sync standbys. >> I don't understand this comment. >> >> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) >> + got_recptr = SyncRepGetOldestSyncRecPtr(&writePtr, &flushPtr, >> + &applyPtr, &am_sync); >> + else /* SYNC_REP_QUORUM */ >> + got_recptr = SyncRepGetNNewestSyncRecPtr(&writePtr, &flushPtr, >> + &applyPtr, >> SyncRepConfig->num_sync, >> + &am_sync); >> Those could be grouped together, there is no need to have pos as an argument. > > Will fix. > >> + /* In quroum method, all sync standby priorities are always 1 */ >> + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM) >> + priority = 1; >> This is dead code, SyncRepGetSyncStandbysPriority is not called for >> QUORUM. > > Well, this code is in SyncRepGetStandbyPriority which is called by > SyncRepInitConifig. > SyncRepGetStandbyPriority can be called regardless of the the > synchronization method. > > >> You may want to add an assert in >> SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be >> sure that they are getting called for the correct method. >> + /* Sort each array in descending order to get 'pos' newest element */ >> + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn); >> + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn); >> + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn); >> There is no need to reorder things again and to use arrays, you can >> choose the newest LSNs when scanning the WalSnd entries. > > I considered it that but it depends on performance. > Current patch avoids O(N*M). > Attached latest patch. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Mon, Oct 17, 2016 at 4:00 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Attached latest patch. > Please review it. Okay, so let's move on with this patch... + <para> + The keyword <literal>ANY</> is omissible, but note that there is + not compatibility between <productname>PostgreSQL</> version 10 and + 9.6 or before. For example, <literal>1 (s1, s2)</> is the same as the + configuration with <literal>FIRST</> and <replaceable class="parameter"> + num_sync</replaceable> equal to 1 in <productname>PostgreSQL</> 9.6 + or before. On the other hand, It's the same as the configuration with + <literal>ANY</> and <replaceable class="parameter">num_sync</> equal to + 1 in <productname>PostgreSQL</> 10 or later. + </para> This paragraph could be reworded: If FIRST or ANY are not specified, this parameter behaves as ANY. Note that this grammar is incompatible with PostgreSQL 9.6, where no keyword specified is equivalent as if FIRST was specified. In short, there is no real need to specify num_sync as this behavior does not have changed, as well as it is not necessary to mention pre-9.6 versions as the multi-sync grammar has been added in 9.6. - Specifying more than one standby name can allow very high availability. Why removing this sentence? + The keyword <literal>ANY</>, coupeld with an interger number N, s/coupeld/coupled/ and s/interger/integer/, for a double hit in one line, still... + The keyword <literal>ANY</>, coupeld with an interger number N, + chooses N standbys in a set of standbys with the same, lowest, + priority and makes transaction commit when WAL records are received + those N standbys. This could be reworded more simply, for example: The keyword ANY, coupled with an integer number N, makes transaction commits wait until WAL records are received from N connected standbys among those defined in the list of synchronous_standby_names. + <literal>s2</> and <literal>s3</> wil be considered as synchronous standby s/wil/will/ + when standby is considered as a condidate of quorum commit.</entry> s/condidate/candidate/ [... stopping here ...] Please be more careful with the documentation and comment grammar. There are other things in the patch.. A bunch of comments at the top of syncrep.c need to be updated. +extern List *SyncRepGetSyncStandbysPriority(bool *am_sync); +extern List *SyncRepGetSyncStandbysQuorum(bool *am_sync); Those two should be static routines in syncrep.c, let's keep the whole logic about quorum and higher-priority definition only there and not bother the callers of them about that. + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) + return SyncRepGetSyncStandbysPriority(am_sync); + else /* SYNC_REP_QUORUM */ + return SyncRepGetSyncStandbysQuorum(am_sync); Both routines share the same logic to detect if a WAL sender can be selected as a candidate for sync evaluation or not, still per the selection they do I agree that it is better to keep them as separate. + /* In quroum method, all sync standby priorities are always 1 */ + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM) + priority = 1; Honestly I don't understand why you are enforcing that. Priority can be important for users willing to switch from ANY to FIRST to have a look immediately at what are the standbys that would become sync or potential. else if (list_member_int(sync_standbys, i)) - values[7] = CStringGetTextDatum("sync"); + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum"); The comment at the top of this code block needs to be refreshed. The representation given to the user in pg_stat_replication is not enough IMO. For example, imagine a cluster with 4 standbys: =# select application_name, sync_priority, sync_state from pg_stat_replication ;application_name | sync_priority | sync_state ------------------+---------------+------------node_5433 | 0 | asyncnode_5434 | 0 |asyncnode_5435 | 0 | asyncnode_5436 | 0 | async (4 rows) If FIRST N is used, is it easy for the user to understand what are the nodes in sync: =# alter system set synchronous_standby_names = 'FIRST 2 (node_5433, node_5434, node_5435)'; ALTER SYSTEM =# select pg_reload_conf();pg_reload_conf ----------------t (1 row) =# select application_name, sync_priority, sync_state from pg_stat_replication ;application_name | sync_priority | sync_state ------------------+---------------+------------node_5433 | 1 | syncnode_5434 | 2 |syncnode_5435 | 3 | potentialnode_5436 | 0 | async (4 rows) In this case it is easy to understand that two nodes are required to be in sync. When using ANY similarly for three nodes, here is what pg_stat_replication tells: =# select application_name, sync_priority, sync_state from pg_stat_replication ;application_name | sync_priority | sync_state ------------------+---------------+------------node_5433 | 1 | quorumnode_5434 | 1| quorumnode_5435 | 1 | quorumnode_5436 | 0 | async (4 rows) It is not possible to guess from how many standbys this needs to wait for. One idea would be to mark the sync_state not as "quorum", but "quorum-N", or just add a new column to indicate how many in the set need to give a commit confirmation. The patch is going into the right direction in my opinion. -- Michael
On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Mon, Oct 17, 2016 at 4:00 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Attached latest patch. >> Please review it. > > Okay, so let's move on with this patch... Thank you for reviewing this patch. > + <para> > + The keyword <literal>ANY</> is omissible, but note that there is > + not compatibility between <productname>PostgreSQL</> version 10 and > + 9.6 or before. For example, <literal>1 (s1, s2)</> is the same as the > + configuration with <literal>FIRST</> and <replaceable > class="parameter"> > + num_sync</replaceable> equal to 1 in <productname>PostgreSQL</> 9.6 > + or before. On the other hand, It's the same as the configuration with > + <literal>ANY</> and <replaceable > class="parameter">num_sync</> equal to > + 1 in <productname>PostgreSQL</> 10 or later. > + </para> > This paragraph could be reworded: > If FIRST or ANY are not specified, this parameter behaves as ANY. Note > that this grammar is incompatible with PostgreSQL 9.6, where no > keyword specified is equivalent as if FIRST was specified. > In short, there is no real need to specify num_sync as this behavior > does not have changed, as well as it is not necessary to mention > pre-9.6 versions as the multi-sync grammar has been added in 9.6. Fixed. > - Specifying more than one standby name can allow very high availability. > Why removing this sentence? > > + The keyword <literal>ANY</>, coupeld with an interger number N, > s/coupeld/coupled/ and s/interger/integer/, for a double hit in one > line, still... > > + The keyword <literal>ANY</>, coupeld with an interger number N, > + chooses N standbys in a set of standbys with the same, lowest, > + priority and makes transaction commit when WAL records are received > + those N standbys. > This could be reworded more simply, for example: The keyword ANY, > coupled with an integer number N, makes transaction commits wait until > WAL records are received from N connected standbys among those defined > in the list of synchronous_standby_names. > > + <literal>s2</> and <literal>s3</> wil be considered as synchronous standby > s/wil/will/ > > + when standby is considered as a condidate of quorum commit.</entry> > s/condidate/candidate/ > > [... stopping here ...] Please be more careful with the documentation > and comment grammar. There are other things in the patch.. I fix some typo as much as I found. > A bunch of comments at the top of syncrep.c need to be updated. > > +extern List *SyncRepGetSyncStandbysPriority(bool *am_sync); > +extern List *SyncRepGetSyncStandbysQuorum(bool *am_sync); > Those two should be static routines in syncrep.c, let's keep the whole > logic about quorum and higher-priority definition only there and not > bother the callers of them about that. Fixed. > + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) > + return SyncRepGetSyncStandbysPriority(am_sync); > + else /* SYNC_REP_QUORUM */ > + return SyncRepGetSyncStandbysQuorum(am_sync); > Both routines share the same logic to detect if a WAL sender can be > selected as a candidate for sync evaluation or not, still per the > selection they do I agree that it is better to keep them as separate. > > + /* In quroum method, all sync standby priorities are always 1 */ > + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM) > + priority = 1; > Honestly I don't understand why you are enforcing that. Priority can > be important for users willing to switch from ANY to FIRST to have a > look immediately at what are the standbys that would become sync or > potential. I thought that since all standbys appearing in s_s_names list are treated equally in quorum method, these standbys should have same priority. If these standby have different sync_priority, it looks like that master server replicates to standby server based on priority. > else if (list_member_int(sync_standbys, i)) > - values[7] = CStringGetTextDatum("sync"); > + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? > + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum"); > The comment at the top of this code block needs to be refreshed. Fixed. > The representation given to the user in pg_stat_replication is not > enough IMO. For example, imagine a cluster with 4 standbys: > =# select application_name, sync_priority, sync_state from pg_stat_replication ; > application_name | sync_priority | sync_state > ------------------+---------------+------------ > node_5433 | 0 | async > node_5434 | 0 | async > node_5435 | 0 | async > node_5436 | 0 | async > (4 rows) > > If FIRST N is used, is it easy for the user to understand what are the > nodes in sync: > =# alter system set synchronous_standby_names = 'FIRST 2 (node_5433, > node_5434, node_5435)'; > ALTER SYSTEM > =# select pg_reload_conf(); > pg_reload_conf > ---------------- > t > (1 row) > =# select application_name, sync_priority, sync_state from pg_stat_replication ; > application_name | sync_priority | sync_state > ------------------+---------------+------------ > node_5433 | 1 | sync > node_5434 | 2 | sync > node_5435 | 3 | potential > node_5436 | 0 | async > (4 rows) > > In this case it is easy to understand that two nodes are required to be in sync. > > When using ANY similarly for three nodes, here is what > pg_stat_replication tells: > =# select application_name, sync_priority, sync_state from pg_stat_replication ; > application_name | sync_priority | sync_state > ------------------+---------------+------------ > node_5433 | 1 | quorum > node_5434 | 1 | quorum > node_5435 | 1 | quorum > node_5436 | 0 | async > (4 rows) > > It is not possible to guess from how many standbys this needs to wait > for. One idea would be to mark the sync_state not as "quorum", but > "quorum-N", or just add a new column to indicate how many in the set > need to give a commit confirmation. As Simon suggested before, we could support another feature that allows the client to control the quorum number. Considering adding that feature, I thought it's better to have and control that information as a GUC parameter. Thought? Attached latest v5 patch. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Tue, Nov 8, 2016 at 12:25 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) >> + return SyncRepGetSyncStandbysPriority(am_sync); >> + else /* SYNC_REP_QUORUM */ >> + return SyncRepGetSyncStandbysQuorum(am_sync); >> Both routines share the same logic to detect if a WAL sender can be >> selected as a candidate for sync evaluation or not, still per the >> selection they do I agree that it is better to keep them as separate. >> >> + /* In quroum method, all sync standby priorities are always 1 */ >> + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM) >> + priority = 1; >> Honestly I don't understand why you are enforcing that. Priority can >> be important for users willing to switch from ANY to FIRST to have a >> look immediately at what are the standbys that would become sync or >> potential. > > I thought that since all standbys appearing in s_s_names list are > treated equally in quorum method, these standbys should have same > priority. > If these standby have different sync_priority, it looks like that > master server replicates to standby server based on priority. No actually, because we know that they are a quorum set, and that they work in the same set. The concept of priorities has no real meaning for quorum as there is no ordering of the elements. Another, perhaps cleaner idea may be to mark the field as NULL actually. >> It is not possible to guess from how many standbys this needs to wait >> for. One idea would be to mark the sync_state not as "quorum", but >> "quorum-N", or just add a new column to indicate how many in the set >> need to give a commit confirmation. > > As Simon suggested before, we could support another feature that > allows the client to control the quorum number. > Considering adding that feature, I thought it's better to have and > control that information as a GUC parameter. > Thought? Similarly that would be a SIGHUP parameter? Why not. Perhaps my worry is not that much legitimate, users could just look at s_s_names to guess how many in hte set a commit needs to wait for. + <para> + <literal>FIRST</> and <literal>ANY</> are case-insensitive word + and the standby name having these words are must be double-quoted. + </para> s/word/words/. + <literal>FIRST</> and <literal>ANY</> specify the method of + that how master server controls the standby servers. A little bit hard to understand, I would suggest: FIRST and ANY specify the method used by the master to control the standby servers. + The keyword <literal>FIRST</>, coupled with an integer + number N higher-priority standbys and makes transaction commit + when their WAL records are received. This is unclear to me. Here is a correction: The keyword FIRST, coupled with an integer N, makes transaction commit wait until WAL records are received fron the N standbys with higher priority number. + <varname>synchronous_standby_names</>. For example, a setting + of <literal>ANY 3 (s1, s2, s3, s4)</> makes transaction commits + wait until receiving receipts from at least any three standbys + of four listed servers <literal>s1</>, <literal>s2</>, <literal>s3</>, This could just mention WAL records instead of "receipts". Instead of saying "an integer number N", we could use <literal>num_sync</>. + If <literal>FIRST</> or <literal>ANY</> are not specified, this parameter + behaves as <literal>ANY</>. Note that this grammar is incompatible with + <productname>PostgresSQL</> 9.6, where no keyword specified is equivalent + as if <literal>FIRST</> was specified. In short, there is no real need to + specify <replaceable class="parameter">num_sync</replaceable> as this + behavior does not have changed, as well as it is not necessary to mention + pre-9.6 versions are the multi-sync grammar has been added in 9.6. This paragraph could be reworked, say: if FIRST or ANY are not specified this parameter behaves as if ANY is used. Note that this grammar is incompatible with PostgreSQL 9.6 which is the first version supporting multiple standbys with synchronous replication, where no such keyword FIRST or ANY can be used. Note that the grammar behaves as if FIRST is used, which is incompatible with the post-9.6 behavior. + <entry>Synchronous state of this standby server. <literal>quorum-N</> + , where N is the number of synchronous standbys that transactions + need to wait for replies from, when standby is considered as a + candidate of quorum commit.</entry> Nitpicking: I think that the comma goes to the previous line if it is the first character of a line. + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) + return SyncRepGetSyncStandbysPriority(am_sync); + else /* SYNC_REP_QUORUM */ + return SyncRepGetSyncStandbysQuorum(am_sync) Or that? if (PRIORITY) return StandbysPriority(); else if (QUORUM) return StandbysQuorum(); else elog(ERROR, "Boom"); + * In priority method, we need the oldest these positions among sync + * standbys. In quorum method, we need the newest these positions + * specified by SyncRepConfig->num_sync. Last sentence is grammatically incorrect, and it would be more correct to precise the Nth LSN positions to be able to select k standbys from a set of n ones. + SpinLockAcquire(&walsnd->mutex); + write_array[i] = walsnd->write; + flush_array[i]= walsnd->flush; + apply_array[i] = walsnd->flush; + SpinLockRelease(&walsnd->mutex); A nit: adding a space on the self of the second = character. And you need to save the apply position of the WAL sender, not the flush position in the array that is going to be ordered. /* * More easily understood version of standby state. This is purely - * informational, not different from priority. + * informational. In quorum method, we add the number to indicate + * how many in the set need to give a commit confirmation. */ if (priority == 0) values[7] = CStringGetTextDatum("async"); else if (list_member_int(sync_standbys, i)) - values[7] = CStringGetTextDatum("sync"); + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum") This code block and its explanation comments tell two different stories. The comment is saying that something like "quorum-N" is used but the code always prints "quorum". It may be a good idea in the test to check that when no keywords is specified the group of standbys is in quorum mode. The code looks in good shape, I am still willing to run more advanced tests manually. -- Michael
On Tue, Nov 8, 2016 at 10:12 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Nov 8, 2016 at 12:25 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) >>> + return SyncRepGetSyncStandbysPriority(am_sync); >>> + else /* SYNC_REP_QUORUM */ >>> + return SyncRepGetSyncStandbysQuorum(am_sync); >>> Both routines share the same logic to detect if a WAL sender can be >>> selected as a candidate for sync evaluation or not, still per the >>> selection they do I agree that it is better to keep them as separate. >>> >>> + /* In quroum method, all sync standby priorities are always 1 */ >>> + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM) >>> + priority = 1; >>> Honestly I don't understand why you are enforcing that. Priority can >>> be important for users willing to switch from ANY to FIRST to have a >>> look immediately at what are the standbys that would become sync or >>> potential. >> >> I thought that since all standbys appearing in s_s_names list are >> treated equally in quorum method, these standbys should have same >> priority. >> If these standby have different sync_priority, it looks like that >> master server replicates to standby server based on priority. > > No actually, because we know that they are a quorum set, and that they > work in the same set. The concept of priorities has no real meaning > for quorum as there is no ordering of the elements. Another, perhaps > cleaner idea may be to mark the field as NULL actually. We know that but I'm concerned it might confuse the user. If these priorities are the same, it can obviously imply that all of the standby listed in s_s_names are handled equally. >>> It is not possible to guess from how many standbys this needs to wait >>> for. One idea would be to mark the sync_state not as "quorum", but >>> "quorum-N", or just add a new column to indicate how many in the set >>> need to give a commit confirmation. >> >> As Simon suggested before, we could support another feature that >> allows the client to control the quorum number. >> Considering adding that feature, I thought it's better to have and >> control that information as a GUC parameter. >> Thought? > > Similarly that would be a SIGHUP parameter? Why not. Perhaps my worry > is not that much legitimate, users could just look at s_s_names to > guess how many in hte set a commit needs to wait for. It would be PGC_USRSET similar to synchronous_commit. The user can specify it in statement level. > + <para> > + <literal>FIRST</> and <literal>ANY</> are case-insensitive word > + and the standby name having these words are must be double-quoted. > + </para> > s/word/words/. > > + <literal>FIRST</> and <literal>ANY</> specify the method of > + that how master server controls the standby servers. > A little bit hard to understand, I would suggest: > FIRST and ANY specify the method used by the master to control the > standby servers. > > + The keyword <literal>FIRST</>, coupled with an integer > + number N higher-priority standbys and makes transaction commit > + when their WAL records are received. > This is unclear to me. Here is a correction: > The keyword FIRST, coupled with an integer N, makes transaction commit > wait until WAL records are received fron the N standbys with higher > priority number. > > + <varname>synchronous_standby_names</>. For example, a setting > + of <literal>ANY 3 (s1, s2, s3, s4)</> makes transaction commits > + wait until receiving receipts from at least any three standbys > + of four listed servers <literal>s1</>, <literal>s2</>, <literal>s3</>, > This could just mention WAL records instead of "receipts". > > Instead of saying "an integer number N", we could use <literal>num_sync</>. > > + If <literal>FIRST</> or <literal>ANY</> are not specified, > this parameter > + behaves as <literal>ANY</>. Note that this grammar is > incompatible with > + <productname>PostgresSQL</> 9.6, where no keyword specified > is equivalent > + as if <literal>FIRST</> was specified. In short, there is no > real need to > + specify <replaceable class="parameter">num_sync</replaceable> as this > + behavior does not have changed, as well as it is not > necessary to mention > + pre-9.6 versions are the multi-sync grammar has been added in 9.6. > This paragraph could be reworked, say: > if FIRST or ANY are not specified this parameter behaves as if ANY is > used. Note that this grammar is incompatible with PostgreSQL 9.6 which > is the first version supporting multiple standbys with synchronous > replication, where no such keyword FIRST or ANY can be used. Note that > the grammar behaves as if FIRST is used, which is incompatible with > the post-9.6 behavior. > > + <entry>Synchronous state of this standby server. <literal>quorum-N</> > + , where N is the number of synchronous standbys that transactions > + need to wait for replies from, when standby is considered as a > + candidate of quorum commit.</entry> > Nitpicking: I think that the comma goes to the previous line if it is > the first character of a line. > > + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) > + return SyncRepGetSyncStandbysPriority(am_sync); > + else /* SYNC_REP_QUORUM */ > + return SyncRepGetSyncStandbysQuorum(am_sync) > Or that? > if (PRIORITY) > return StandbysPriority(); > else if (QUORUM) > return StandbysQuorum(); > else > elog(ERROR, "Boom"); > > + * In priority method, we need the oldest these positions among sync > + * standbys. In quorum method, we need the newest these positions > + * specified by SyncRepConfig->num_sync. > Last sentence is grammatically incorrect, and it would be more correct > to precise the Nth LSN positions to be able to select k standbys from > a set of n ones. > > + SpinLockAcquire(&walsnd->mutex); > + write_array[i] = walsnd->write; > + flush_array[i]= walsnd->flush; > + apply_array[i] = walsnd->flush; > + SpinLockRelease(&walsnd->mutex); > A nit: adding a space on the self of the second = character. And you > need to save the apply position of the WAL sender, not the flush > position in the array that is going to be ordered. > > /* > * More easily understood version of standby state. This is purely > - * informational, not different from priority. > + * informational. In quorum method, we add the number to indicate > + * how many in the set need to give a commit confirmation. > */ > if (priority == 0) > values[7] = CStringGetTextDatum("async"); > else if (list_member_int(sync_standbys, i)) > - values[7] = CStringGetTextDatum("sync"); > + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? > + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum") > This code block and its explanation comments tell two different > stories. The comment is saying that something like "quorum-N" is used > but the code always prints "quorum". > > It may be a good idea in the test to check that when no keywords is > specified the group of standbys is in quorum mode. Yeah, I will add some tests. I will post new version patch incorporated other comments. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Nov 14, 2016 at 5:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Tue, Nov 8, 2016 at 10:12 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Tue, Nov 8, 2016 at 12:25 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> On Tue, Oct 25, 2016 at 10:35 PM, Michael Paquier >>> <michael.paquier@gmail.com> wrote: >>>> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) >>>> + return SyncRepGetSyncStandbysPriority(am_sync); >>>> + else /* SYNC_REP_QUORUM */ >>>> + return SyncRepGetSyncStandbysQuorum(am_sync); >>>> Both routines share the same logic to detect if a WAL sender can be >>>> selected as a candidate for sync evaluation or not, still per the >>>> selection they do I agree that it is better to keep them as separate. >>>> >>>> + /* In quroum method, all sync standby priorities are always 1 */ >>>> + if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM) >>>> + priority = 1; >>>> Honestly I don't understand why you are enforcing that. Priority can >>>> be important for users willing to switch from ANY to FIRST to have a >>>> look immediately at what are the standbys that would become sync or >>>> potential. >>> >>> I thought that since all standbys appearing in s_s_names list are >>> treated equally in quorum method, these standbys should have same >>> priority. >>> If these standby have different sync_priority, it looks like that >>> master server replicates to standby server based on priority. >> >> No actually, because we know that they are a quorum set, and that they >> work in the same set. The concept of priorities has no real meaning >> for quorum as there is no ordering of the elements. Another, perhaps >> cleaner idea may be to mark the field as NULL actually. > > We know that but I'm concerned it might confuse the user. > If these priorities are the same, it can obviously imply that all of > the standby listed in s_s_names are handled equally. > >>>> It is not possible to guess from how many standbys this needs to wait >>>> for. One idea would be to mark the sync_state not as "quorum", but >>>> "quorum-N", or just add a new column to indicate how many in the set >>>> need to give a commit confirmation. >>> >>> As Simon suggested before, we could support another feature that >>> allows the client to control the quorum number. >>> Considering adding that feature, I thought it's better to have and >>> control that information as a GUC parameter. >>> Thought? >> >> Similarly that would be a SIGHUP parameter? Why not. Perhaps my worry >> is not that much legitimate, users could just look at s_s_names to >> guess how many in hte set a commit needs to wait for. > > It would be PGC_USRSET similar to synchronous_commit. The user can > specify it in statement level. > >> + <para> >> + <literal>FIRST</> and <literal>ANY</> are case-insensitive word >> + and the standby name having these words are must be double-quoted. >> + </para> >> s/word/words/. >> >> + <literal>FIRST</> and <literal>ANY</> specify the method of >> + that how master server controls the standby servers. >> A little bit hard to understand, I would suggest: >> FIRST and ANY specify the method used by the master to control the >> standby servers. >> >> + The keyword <literal>FIRST</>, coupled with an integer >> + number N higher-priority standbys and makes transaction commit >> + when their WAL records are received. >> This is unclear to me. Here is a correction: >> The keyword FIRST, coupled with an integer N, makes transaction commit >> wait until WAL records are received fron the N standbys with higher >> priority number. >> >> + <varname>synchronous_standby_names</>. For example, a setting >> + of <literal>ANY 3 (s1, s2, s3, s4)</> makes transaction commits >> + wait until receiving receipts from at least any three standbys >> + of four listed servers <literal>s1</>, <literal>s2</>, <literal>s3</>, >> This could just mention WAL records instead of "receipts". >> >> Instead of saying "an integer number N", we could use <literal>num_sync</>. >> >> + If <literal>FIRST</> or <literal>ANY</> are not specified, >> this parameter >> + behaves as <literal>ANY</>. Note that this grammar is >> incompatible with >> + <productname>PostgresSQL</> 9.6, where no keyword specified >> is equivalent >> + as if <literal>FIRST</> was specified. In short, there is no >> real need to >> + specify <replaceable class="parameter">num_sync</replaceable> as this >> + behavior does not have changed, as well as it is not >> necessary to mention >> + pre-9.6 versions are the multi-sync grammar has been added in 9.6. >> This paragraph could be reworked, say: >> if FIRST or ANY are not specified this parameter behaves as if ANY is >> used. Note that this grammar is incompatible with PostgreSQL 9.6 which >> is the first version supporting multiple standbys with synchronous >> replication, where no such keyword FIRST or ANY can be used. Note that >> the grammar behaves as if FIRST is used, which is incompatible with >> the post-9.6 behavior. >> >> + <entry>Synchronous state of this standby server. <literal>quorum-N</> >> + , where N is the number of synchronous standbys that transactions >> + need to wait for replies from, when standby is considered as a >> + candidate of quorum commit.</entry> >> Nitpicking: I think that the comma goes to the previous line if it is >> the first character of a line. >> >> + if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY) >> + return SyncRepGetSyncStandbysPriority(am_sync); >> + else /* SYNC_REP_QUORUM */ >> + return SyncRepGetSyncStandbysQuorum(am_sync) >> Or that? >> if (PRIORITY) >> return StandbysPriority(); >> else if (QUORUM) >> return StandbysQuorum(); >> else >> elog(ERROR, "Boom"); >> >> + * In priority method, we need the oldest these positions among sync >> + * standbys. In quorum method, we need the newest these positions >> + * specified by SyncRepConfig->num_sync. >> Last sentence is grammatically incorrect, and it would be more correct >> to precise the Nth LSN positions to be able to select k standbys from >> a set of n ones. >> >> + SpinLockAcquire(&walsnd->mutex); >> + write_array[i] = walsnd->write; >> + flush_array[i]= walsnd->flush; >> + apply_array[i] = walsnd->flush; >> + SpinLockRelease(&walsnd->mutex); >> A nit: adding a space on the self of the second = character. And you >> need to save the apply position of the WAL sender, not the flush >> position in the array that is going to be ordered. >> >> /* >> * More easily understood version of standby state. This is purely >> - * informational, not different from priority. >> + * informational. In quorum method, we add the number to indicate >> + * how many in the set need to give a commit confirmation. >> */ >> if (priority == 0) >> values[7] = CStringGetTextDatum("async"); >> else if (list_member_int(sync_standbys, i)) >> - values[7] = CStringGetTextDatum("sync"); >> + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? >> + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum") >> This code block and its explanation comments tell two different >> stories. The comment is saying that something like "quorum-N" is used >> but the code always prints "quorum". >> >> It may be a good idea in the test to check that when no keywords is >> specified the group of standbys is in quorum mode. > > Yeah, I will add some tests. > > I will post new version patch incorporated other comments. > Attached latest version patch incorporated review comments. After more thought, I agree and changed the value of standby priority in quorum method so that it's not set 1 forcibly. The all standby priorities are 1 If s_s_names = 'ANY(*)'. Please review this patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Attached latest version patch incorporated review comments. After more > thought, I agree and changed the value of standby priority in quorum > method so that it's not set 1 forcibly. The all standby priorities are > 1 If s_s_names = 'ANY(*)'. > Please review this patch. Sorry for my late reply. Here is my final lookup. <synopsis> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable class="parameter">standby_name</replaceable> [, ...] ) +[ANY] <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="parameter">standby_name</replaceable> [, ...] ) +FIRST <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="parameter">standby_name</replaceable> [, ...] ) <replaceable class="parameter">standby_name</replaceable> [, ... This can just be replaced with [ ANY | FIRST ]. There is no need for braces as the keyword is not mandatory. + is the name of a standby server. + <literal>FIRST</> and <literal>ANY</> specify the method used by + the master to control the standby servres. </para> s/servres/servers/. if (priority == 0) values[7] = CStringGetTextDatum("async"); else if (list_member_int(sync_standbys, i)) - values[7] = CStringGetTextDatum("sync"); + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum"); else values[7] = CStringGetTextDatum("potential"); This can be simplified a bit as "quorum" is the state value for all standbys with a non-zero priority when the method is set to SYNC_REP_QUORUM: if (priority == 0) values[7] = CStringGetTextDatum("async"); + else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM) + values[7] = CStringGetTextDatum("quorum"); else if (list_member_int(sync_standbys, i)) values[7] = CStringGetTextDatum("sync"); else SyncRepConfig data is made external to syncrep.c with this patch as walsender.c needs to look at the sync method in place, no complain about that after considering if there could be a more elegant way to do things without this change. While reviewing the patch, I have found a couple of incorrectly shaped sentences, both in the docs and some comments. Attached is a new version with this word-smithing. The patch is now switched as ready for committer. -- Michael
Attachment
On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Attached latest version patch incorporated review comments. After more >> thought, I agree and changed the value of standby priority in quorum >> method so that it's not set 1 forcibly. The all standby priorities are >> 1 If s_s_names = 'ANY(*)'. >> Please review this patch. > > Sorry for my late reply. Here is my final lookup. Thank you for reviewing! > <synopsis> > -<replaceable class="parameter">num_sync</replaceable> ( <replaceable > class="parameter">standby_name</replaceable> [, ...] ) > +[ANY] <replaceable class="parameter">num_sync</replaceable> ( > <replaceable class="parameter">standby_name</replaceable> [, ...] ) > +FIRST <replaceable class="parameter">num_sync</replaceable> ( > <replaceable class="parameter">standby_name</replaceable> [, ...] ) > <replaceable class="parameter">standby_name</replaceable> [, ... > This can just be replaced with [ ANY | FIRST ]. There is no need for > braces as the keyword is not mandatory. > > + is the name of a standby server. > + <literal>FIRST</> and <literal>ANY</> specify the method used by > + the master to control the standby servres. > </para> > s/servres/servers/. > > if (priority == 0) > values[7] = CStringGetTextDatum("async"); > else if (list_member_int(sync_standbys, i)) > - values[7] = CStringGetTextDatum("sync"); > + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? > + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum"); > else > values[7] = CStringGetTextDatum("potential"); > This can be simplified a bit as "quorum" is the state value for all > standbys with a non-zero priority when the method is set to > SYNC_REP_QUORUM: > if (priority == 0) > values[7] = CStringGetTextDatum("async"); > + else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM) > + values[7] = CStringGetTextDatum("quorum"); > else if (list_member_int(sync_standbys, i)) > values[7] = CStringGetTextDatum("sync"); > else > > SyncRepConfig data is made external to syncrep.c with this patch as > walsender.c needs to look at the sync method in place, no complain > about that after considering if there could be a more elegant way to > do things without this change. Agreed. > While reviewing the patch, I have found a couple of incorrectly shaped > sentences, both in the docs and some comments. Attached is a new > version with this word-smithing. The patch is now switched as ready > for committer. Thanks. I found a typo in v7 patch, so attached latest v8 patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Mon, Nov 28, 2016 at 8:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Attached latest version patch incorporated review comments. After more >>> thought, I agree and changed the value of standby priority in quorum >>> method so that it's not set 1 forcibly. The all standby priorities are >>> 1 If s_s_names = 'ANY(*)'. >>> Please review this patch. >> >> Sorry for my late reply. Here is my final lookup. > > Thank you for reviewing! > >> <synopsis> >> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable >> class="parameter">standby_name</replaceable> [, ...] ) >> +[ANY] <replaceable class="parameter">num_sync</replaceable> ( >> <replaceable class="parameter">standby_name</replaceable> [, ...] ) >> +FIRST <replaceable class="parameter">num_sync</replaceable> ( >> <replaceable class="parameter">standby_name</replaceable> [, ...] ) >> <replaceable class="parameter">standby_name</replaceable> [, ... >> This can just be replaced with [ ANY | FIRST ]. There is no need for >> braces as the keyword is not mandatory. >> >> + is the name of a standby server. >> + <literal>FIRST</> and <literal>ANY</> specify the method used by >> + the master to control the standby servres. >> </para> >> s/servres/servers/. >> >> if (priority == 0) >> values[7] = CStringGetTextDatum("async"); >> else if (list_member_int(sync_standbys, i)) >> - values[7] = CStringGetTextDatum("sync"); >> + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? >> + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum"); >> else >> values[7] = CStringGetTextDatum("potential"); >> This can be simplified a bit as "quorum" is the state value for all >> standbys with a non-zero priority when the method is set to >> SYNC_REP_QUORUM: >> if (priority == 0) >> values[7] = CStringGetTextDatum("async"); >> + else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM) >> + values[7] = CStringGetTextDatum("quorum"); >> else if (list_member_int(sync_standbys, i)) >> values[7] = CStringGetTextDatum("sync"); >> else >> >> SyncRepConfig data is made external to syncrep.c with this patch as >> walsender.c needs to look at the sync method in place, no complain >> about that after considering if there could be a more elegant way to >> do things without this change. > > Agreed. > >> While reviewing the patch, I have found a couple of incorrectly shaped >> sentences, both in the docs and some comments. Attached is a new >> version with this word-smithing. The patch is now switched as ready >> for committer. > > Thanks. I found a typo in v7 patch, so attached latest v8 patch. Moved patch to CF 2017-01, with same status "Ready for committer". -- Michael
On Mon, Nov 28, 2016 at 8:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Attached latest version patch incorporated review comments. After more >>> thought, I agree and changed the value of standby priority in quorum >>> method so that it's not set 1 forcibly. The all standby priorities are >>> 1 If s_s_names = 'ANY(*)'. >>> Please review this patch. >> >> Sorry for my late reply. Here is my final lookup. > > Thank you for reviewing! > >> <synopsis> >> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable >> class="parameter">standby_name</replaceable> [, ...] ) >> +[ANY] <replaceable class="parameter">num_sync</replaceable> ( >> <replaceable class="parameter">standby_name</replaceable> [, ...] ) >> +FIRST <replaceable class="parameter">num_sync</replaceable> ( >> <replaceable class="parameter">standby_name</replaceable> [, ...] ) >> <replaceable class="parameter">standby_name</replaceable> [, ... >> This can just be replaced with [ ANY | FIRST ]. There is no need for >> braces as the keyword is not mandatory. >> >> + is the name of a standby server. >> + <literal>FIRST</> and <literal>ANY</> specify the method used by >> + the master to control the standby servres. >> </para> >> s/servres/servers/. >> >> if (priority == 0) >> values[7] = CStringGetTextDatum("async"); >> else if (list_member_int(sync_standbys, i)) >> - values[7] = CStringGetTextDatum("sync"); >> + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? >> + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum"); >> else >> values[7] = CStringGetTextDatum("potential"); >> This can be simplified a bit as "quorum" is the state value for all >> standbys with a non-zero priority when the method is set to >> SYNC_REP_QUORUM: >> if (priority == 0) >> values[7] = CStringGetTextDatum("async"); >> + else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM) >> + values[7] = CStringGetTextDatum("quorum"); >> else if (list_member_int(sync_standbys, i)) >> values[7] = CStringGetTextDatum("sync"); >> else >> >> SyncRepConfig data is made external to syncrep.c with this patch as >> walsender.c needs to look at the sync method in place, no complain >> about that after considering if there could be a more elegant way to >> do things without this change. > > Agreed. > >> While reviewing the patch, I have found a couple of incorrectly shaped >> sentences, both in the docs and some comments. Attached is a new >> version with this word-smithing. The patch is now switched as ready >> for committer. > > Thanks. I found a typo in v7 patch, so attached latest v8 patch. + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn); + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn); + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn); In quorum commit, we need to calculate the N-th largest LSN from M quorum synchronous standbys' LSN. N would be usually 1 - 3 and M would be 1 - 10, I guess. You used the algorithm using qsort for that calculation. But I'm not sure if that's enough effective algorithm or not. If M (i.e., number of quorum sync standbys) is enough large, your choice would be good. But usually M seems not so large. Thought? Regards, -- Fujii Masao
On Tue, Dec 6, 2016 at 1:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Nov 28, 2016 at 8:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>> Attached latest version patch incorporated review comments. After more >>>> thought, I agree and changed the value of standby priority in quorum >>>> method so that it's not set 1 forcibly. The all standby priorities are >>>> 1 If s_s_names = 'ANY(*)'. >>>> Please review this patch. >>> >>> Sorry for my late reply. Here is my final lookup. >> >> Thank you for reviewing! >> >>> <synopsis> >>> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable >>> class="parameter">standby_name</replaceable> [, ...] ) >>> +[ANY] <replaceable class="parameter">num_sync</replaceable> ( >>> <replaceable class="parameter">standby_name</replaceable> [, ...] ) >>> +FIRST <replaceable class="parameter">num_sync</replaceable> ( >>> <replaceable class="parameter">standby_name</replaceable> [, ...] ) >>> <replaceable class="parameter">standby_name</replaceable> [, ... >>> This can just be replaced with [ ANY | FIRST ]. There is no need for >>> braces as the keyword is not mandatory. >>> >>> + is the name of a standby server. >>> + <literal>FIRST</> and <literal>ANY</> specify the method used by >>> + the master to control the standby servres. >>> </para> >>> s/servres/servers/. >>> >>> if (priority == 0) >>> values[7] = CStringGetTextDatum("async"); >>> else if (list_member_int(sync_standbys, i)) >>> - values[7] = CStringGetTextDatum("sync"); >>> + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? >>> + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum"); >>> else >>> values[7] = CStringGetTextDatum("potential"); >>> This can be simplified a bit as "quorum" is the state value for all >>> standbys with a non-zero priority when the method is set to >>> SYNC_REP_QUORUM: >>> if (priority == 0) >>> values[7] = CStringGetTextDatum("async"); >>> + else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM) >>> + values[7] = CStringGetTextDatum("quorum"); >>> else if (list_member_int(sync_standbys, i)) >>> values[7] = CStringGetTextDatum("sync"); >>> else >>> >>> SyncRepConfig data is made external to syncrep.c with this patch as >>> walsender.c needs to look at the sync method in place, no complain >>> about that after considering if there could be a more elegant way to >>> do things without this change. >> >> Agreed. >> >>> While reviewing the patch, I have found a couple of incorrectly shaped >>> sentences, both in the docs and some comments. Attached is a new >>> version with this word-smithing. The patch is now switched as ready >>> for committer. >> >> Thanks. I found a typo in v7 patch, so attached latest v8 patch. > > + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn); > + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn); > + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn); > > In quorum commit, we need to calculate the N-th largest LSN from > M quorum synchronous standbys' LSN. N would be usually 1 - 3 and > M would be 1 - 10, I guess. You used the algorithm using qsort for > that calculation. But I'm not sure if that's enough effective algorithm > or not. > > If M (i.e., number of quorum sync standbys) is enough large, > your choice would be good. But usually M seems not so large. > Thank you for the comment. One another possible idea is to use the partial selection sort[1], which takes O(MN) time. Since this is more efficient if N is small this would be better than qsort for this case. But I'm not sure that we can see such a difference by result of performance measurement. [1] https://en.wikipedia.org/wiki/Selection_algorithm#Partial_selection_sort Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Dec 6, 2016 at 6:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Tue, Dec 6, 2016 at 1:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> If M (i.e., number of quorum sync standbys) is enough large, >> your choice would be good. But usually M seems not so large. >> > > Thank you for the comment. > > One another possible idea is to use the partial selection sort[1], > which takes O(MN) time. Since this is more efficient if N is small > this would be better than qsort for this case. But I'm not sure that > we can see such a difference by result of performance measurement. > > [1] https://en.wikipedia.org/wiki/Selection_algorithm#Partial_selection_sort We'll begin to see a minimal performance impact when selecting a sync standby across hundreds of them, which is less than say what 0.1% (or less) of existing deployments are doing. The current approach taken seems simple enough to be kept, and performance is not something to worry much IMHO. -- Michael
On Tue, Dec 6, 2016 at 6:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Tue, Dec 6, 2016 at 1:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Mon, Nov 28, 2016 at 8:03 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> On Sat, Nov 26, 2016 at 10:27 PM, Michael Paquier >>> <michael.paquier@gmail.com> wrote: >>>> On Tue, Nov 15, 2016 at 7:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>>> Attached latest version patch incorporated review comments. After more >>>>> thought, I agree and changed the value of standby priority in quorum >>>>> method so that it's not set 1 forcibly. The all standby priorities are >>>>> 1 If s_s_names = 'ANY(*)'. >>>>> Please review this patch. >>>> >>>> Sorry for my late reply. Here is my final lookup. >>> >>> Thank you for reviewing! >>> >>>> <synopsis> >>>> -<replaceable class="parameter">num_sync</replaceable> ( <replaceable >>>> class="parameter">standby_name</replaceable> [, ...] ) >>>> +[ANY] <replaceable class="parameter">num_sync</replaceable> ( >>>> <replaceable class="parameter">standby_name</replaceable> [, ...] ) >>>> +FIRST <replaceable class="parameter">num_sync</replaceable> ( >>>> <replaceable class="parameter">standby_name</replaceable> [, ...] ) >>>> <replaceable class="parameter">standby_name</replaceable> [, ... >>>> This can just be replaced with [ ANY | FIRST ]. There is no need for >>>> braces as the keyword is not mandatory. >>>> >>>> + is the name of a standby server. >>>> + <literal>FIRST</> and <literal>ANY</> specify the method used by >>>> + the master to control the standby servres. >>>> </para> >>>> s/servres/servers/. >>>> >>>> if (priority == 0) >>>> values[7] = CStringGetTextDatum("async"); >>>> else if (list_member_int(sync_standbys, i)) >>>> - values[7] = CStringGetTextDatum("sync"); >>>> + values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ? >>>> + CStringGetTextDatum("sync") : CStringGetTextDatum("quorum"); >>>> else >>>> values[7] = CStringGetTextDatum("potential"); >>>> This can be simplified a bit as "quorum" is the state value for all >>>> standbys with a non-zero priority when the method is set to >>>> SYNC_REP_QUORUM: >>>> if (priority == 0) >>>> values[7] = CStringGetTextDatum("async"); >>>> + else if (SyncRepConfig->sync_method == SYNC_REP_QUORUM) >>>> + values[7] = CStringGetTextDatum("quorum"); >>>> else if (list_member_int(sync_standbys, i)) >>>> values[7] = CStringGetTextDatum("sync"); >>>> else >>>> >>>> SyncRepConfig data is made external to syncrep.c with this patch as >>>> walsender.c needs to look at the sync method in place, no complain >>>> about that after considering if there could be a more elegant way to >>>> do things without this change. >>> >>> Agreed. >>> >>>> While reviewing the patch, I have found a couple of incorrectly shaped >>>> sentences, both in the docs and some comments. Attached is a new >>>> version with this word-smithing. The patch is now switched as ready >>>> for committer. >>> >>> Thanks. I found a typo in v7 patch, so attached latest v8 patch. >> >> + qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn); >> + qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn); >> + qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn); >> >> In quorum commit, we need to calculate the N-th largest LSN from >> M quorum synchronous standbys' LSN. N would be usually 1 - 3 and >> M would be 1 - 10, I guess. You used the algorithm using qsort for >> that calculation. But I'm not sure if that's enough effective algorithm >> or not. >> >> If M (i.e., number of quorum sync standbys) is enough large, >> your choice would be good. But usually M seems not so large. >> > > Thank you for the comment. > > One another possible idea is to use the partial selection sort[1], > which takes O(MN) time. Since this is more efficient if N is small > this would be better than qsort for this case. But I'm not sure that > we can see such a difference by result of performance measurement. So, isn't it better to compare the performance of some algorithms and confirm which is the best for quorum commit? Since this code is hot, i.e., can be very frequently executed, I'd like to avoid waste of cycle as much as possible. Regards, -- Fujii Masao
On Wed, Dec 7, 2016 at 12:32 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > So, isn't it better to compare the performance of some algorithms and > confirm which is the best for quorum commit? Since this code is hot, i.e., > can be very frequently executed, I'd like to avoid waste of cycle as much > as possible. It seems to me that it would be simple enough to write a script to do that to avoid any other noise: allocate an array with N random elements, and fetch the M-th element from it after applying a sort method. I highly doubt that you'd see much difference with a low number of elements, now if you scale at a thousand standbys in a quorum set you may surely see something :*) Anybody willing to try out? -- Michael
At Wed, 7 Dec 2016 13:26:38 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqSyfsg=gHeqgXyzP0iGWvdyrXqnG-UENzfueaU=2m5-zg@mail.gmail.com> > On Wed, Dec 7, 2016 at 12:32 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > > So, isn't it better to compare the performance of some algorithms and > > confirm which is the best for quorum commit? Since this code is hot, i.e., > > can be very frequently executed, I'd like to avoid waste of cycle as much > > as possible. > > It seems to me that it would be simple enough to write a script to do > that to avoid any other noise: allocate an array with N random > elements, and fetch the M-th element from it after applying a sort > method. I highly doubt that you'd see much difference with a low > number of elements, now if you scale at a thousand standbys in a > quorum set you may surely see something :*) > Anybody willing to try out? Aside from measurement of the two sorting methods, I'd like to point out that quorum commit basically doesn't need sorting. Counting comforming santdbys while scanning the walsender(receiver) LSN list comparing with the target LSN is O(n). Small refactoring of SyncRerpGetOldestSyncRecPtr would enough to do that. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Dec 7, 2016 at 2:49 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Aside from measurement of the two sorting methods, I'd like to > point out that quorum commit basically doesn't need > sorting. Counting conforming santdbys while scanning the > walsender(receiver) LSN list comparing with the target LSN is > O(n). Small refactoring of SyncRerpGetOldestSyncRecPtr would > enough to do that. Indeed, I haven't thought about that, and that's a no-brainer. That would remove the need to allocate and sort each array, what is simply needed is to track the number of times a newest value has been found. So what this processing would do is updating the write/flush/apply values for the first k loops if the new value is *older* than the current one, where k is the quorum number, and between k+1 and N the value gets updated only if the value compared is newer. No need to take the mutex lock for a long time as well. By the way, the patch now conflicts on HEAD, it needs a refresh. -- Michael
On Wed, Dec 7, 2016 at 4:05 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Dec 7, 2016 at 2:49 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> Aside from measurement of the two sorting methods, I'd like to >> point out that quorum commit basically doesn't need >> sorting. Counting conforming santdbys while scanning the >> walsender(receiver) LSN list comparing with the target LSN is >> O(n). Small refactoring of SyncRerpGetOldestSyncRecPtr would >> enough to do that. What does the target LSN mean here? > Indeed, I haven't thought about that, and that's a no-brainer. That > would remove the need to allocate and sort each array, what is simply > needed is to track the number of times a newest value has been found. > So what this processing would do is updating the write/flush/apply > values for the first k loops if the new value is *older* than the > current one, where k is the quorum number, and between k+1 and N the > value gets updated only if the value compared is newer. No need to > take the mutex lock for a long time as well. Sorry, I could not understand this algorithm. Could you elaborate this? It takes only O(n) times? > By the way, the patch now > conflicts on HEAD, it needs a refresh. Thanks, I'll post the latest patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Dec 7, 2016 at 5:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Wed, Dec 7, 2016 at 4:05 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> Indeed, I haven't thought about that, and that's a no-brainer. That >> would remove the need to allocate and sort each array, what is simply >> needed is to track the number of times a newest value has been found. >> So what this processing would do is updating the write/flush/apply >> values for the first k loops if the new value is *older* than the >> current one, where k is the quorum number, and between k+1 and N the >> value gets updated only if the value compared is newer. No need to >> take the mutex lock for a long time as well. > > Sorry, I could not understand this algorithm. Could you elaborate > this? It takes only O(n) times? Nah, please forget that, that was a random useless thought. There is no way to be able to select the k-th element without knowing the hierarchy induced by the others, which is what the partial sort would help with here. -- Michael
On Tue, Dec 6, 2016 at 11:26 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Dec 7, 2016 at 12:32 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> So, isn't it better to compare the performance of some algorithms and >> confirm which is the best for quorum commit? Since this code is hot, i.e., >> can be very frequently executed, I'd like to avoid waste of cycle as much >> as possible. > > It seems to me that it would be simple enough to write a script to do > that to avoid any other noise: allocate an array with N random > elements, and fetch the M-th element from it after applying a sort > method. I highly doubt that you'd see much difference with a low > number of elements, now if you scale at a thousand standbys in a > quorum set you may surely see something :*) > Anybody willing to try out? You could do that, but first I would code up the simplest, cleanest algorithm you can think of and see if it even shows up in a 'perf' profile. Microbenchmarking is probably overkill here unless a problem is visible on macrobenchmarks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello, context switch was complete that time, sorry. There's multiple "target LET"s. So we need kth-largest LTEs. At Wed, 7 Dec 2016 19:04:23 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqR10OnEL5XxW1DVYvAXmtpEVNCMi=V-6Jb_9owFuY8aSg@mail.gmail.com> > On Wed, Dec 7, 2016 at 5:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Sorry, I could not understand this algorithm. Could you elaborate > > this? It takes only O(n) times? > > Nah, please forget that, that was a random useless thought. There is > no way to be able to select the k-th element without knowing the > hierarchy induced by the others, which is what the partial sort would > help with here. So, let's consider for some cases, - needing 3-quorum among 5 standbys. There's no problem whatever make kth-largest we choose.Of course qsorts are fine. - needing 10 quorums among 100 standbys. I'm not sure if there's any difference with 3/5. - needing 10 quorums among 1000 standbys.Obviously qsorts are doing too much. Any kind of kth-largestalgorithm should beinvolved. For instance quickselect with O(nlong n) - O(n) seems better in comparison to O(n log n) - O(n^2)of qsort. - needing 700 quorums among 1000 standbys. I don't think this case is worth consider but kth-largest isbetter even for this case. If we don't 700/1000 is out of at least the current scope, I also recommend to use kth-largest selection. If not, the quorum calculation is triggered by every standby reply message and the frequency of the calculation seems near to the frequency of WAL-insertion for the worst case. (Right?) Even the kth-largest takes too long time to have 1000 standys. Maintining kth-largest in shared memory needs less CPU but leads to more bad contention on the shared memory. Inversely, we already have waiting LSNs of backends in procarray. If we have another array in the order of waiting LSNs and having a condition varialble on the number of comforming walsenders. Every walsender can individually looking up it and count it up. It might performs better but I'm not sure. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote: > You could do that, but first I would code up the simplest, cleanest > algorithm you can think of and see if it even shows up in a 'perf' > profile. Microbenchmarking is probably overkill here unless a problem > is visible on macrobenchmarks. This is what I would go for! The current code is doing a simple thing: select the Nth element using qsort() after scanning each WAL sender's values. And I think that Sawada-san got it right. Even running on my laptop a pgbench run with 10 sync standbys using a data set that fits into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead using perf top on a non-assert, non-debug build. Hash tables and allocations get a far larger share. Using the patch, SyncRepGetSyncRecPtr is at the same level with a quorum set of 10 nodes. Let's kick the ball for now. An extra patch could make things better later on if that's worth it. -- Michael
On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> You could do that, but first I would code up the simplest, cleanest >> algorithm you can think of and see if it even shows up in a 'perf' >> profile. Microbenchmarking is probably overkill here unless a problem >> is visible on macrobenchmarks. > > This is what I would go for! The current code is doing a simple thing: > select the Nth element using qsort() after scanning each WAL sender's > values. And I think that Sawada-san got it right. Even running on my > laptop a pgbench run with 10 sync standbys using a data set that fits > into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead > using perf top on a non-assert, non-debug build. Hash tables and > allocations get a far larger share. Using the patch, > SyncRepGetSyncRecPtr is at the same level with a quorum set of 10 > nodes. Let's kick the ball for now. An extra patch could make things > better later on if that's worth it. Yeah, since the both K and N could be not large these algorithm takes almost the same time. And current patch does simple thing. When we need over 100 or 1000 replication node the optimization could be required. Attached latest v9 patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> You could do that, but first I would code up the simplest, cleanest >>> algorithm you can think of and see if it even shows up in a 'perf' >>> profile. Microbenchmarking is probably overkill here unless a problem >>> is visible on macrobenchmarks. >> >> This is what I would go for! The current code is doing a simple thing: >> select the Nth element using qsort() after scanning each WAL sender's >> values. And I think that Sawada-san got it right. Even running on my >> laptop a pgbench run with 10 sync standbys using a data set that fits >> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead >> using perf top on a non-assert, non-debug build. Hash tables and >> allocations get a far larger share. Using the patch, >> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10 >> nodes. Let's kick the ball for now. An extra patch could make things >> better later on if that's worth it. > > Yeah, since the both K and N could be not large these algorithm takes > almost the same time. And current patch does simple thing. When we > need over 100 or 1000 replication node the optimization could be > required. > Attached latest v9 patch. > Few comments: + * In 10.0 we support two synchronization methods, priority and + * quorum. The number of synchronous standbys that transactions + * must wait for replies from and synchronization method are specified + * in synchronous_standby_names. This parameter also specifies a list + * of standby names, which determines the priority of each standby for + * being chosen as a synchronous standby. In priority method, the standbys + * whose names appear earlier in the list are given higher priority + * and will be considered as synchronous. Other standby servers appearing + * later in this list represent potential synchronous standbys. If any of + * the current synchronous standbys disconnects for whatever reason, + * it will be replaced immediately with the next-highest-priority standby. + * In quorum method, the all standbys appearing in the list are + * considered as a candidate for quorum commit. In the above description, is priority method represented by FIRST and quorum method by ANY in the synchronous_standby_names syntax? If so, it might be better to write about it explicitly. 2.--- a/src/backend/utils/sort/tuplesort.c +++ b/src/backend/utils/sort/tuplesort.c @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state) } /* - * Allocate a new 'memtuples' array, for the heap. It will hold one tuple - * from each input tape. - */ - state->memtupsize = numInputTapes; - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple)); - USEMEM(state, GetMemoryChunkSpace(state->memtuples)); - - /* - * Use all the remaining memory we have available for read buffers among - * the input tapes. + * Use all the spare memory we have available for read buffers among the + * input tapes. This doesn't belong to this patch. 3. + * Return the list of sync standbys using according to synchronous method, In above sentence, "using according" seems to either incomplete or wrong usage. 4. + else + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + "invalid synchronization method is specified \"%d\"", + SyncRepConfig->sync_method)); Here, the error message doesn't seem to aligned and you might want to use errmsg for the same. 5. + * In priority method, we need the oldest these positions among sync + * standbys. In quorum method, we need the newest these positions + * specified by SyncRepConfig->num_sync. /oldest these/oldest of these /newest these positions specified/newest of these positions as specified Instead of newest, can we consider to use latest? 6. + int sync_method; /* synchronization method */ /* member_names contains nmembers consecutive nul-terminated C strings */char member_names[FLEXIBLE_ARRAY_MEMBER];} SyncRepConfigData; Can't we use 1 or 2 bytes to store sync_method information? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> You could do that, but first I would code up the simplest, cleanest >>>> algorithm you can think of and see if it even shows up in a 'perf' >>>> profile. Microbenchmarking is probably overkill here unless a problem >>>> is visible on macrobenchmarks. >>> >>> This is what I would go for! The current code is doing a simple thing: >>> select the Nth element using qsort() after scanning each WAL sender's >>> values. And I think that Sawada-san got it right. Even running on my >>> laptop a pgbench run with 10 sync standbys using a data set that fits >>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead >>> using perf top on a non-assert, non-debug build. Hash tables and >>> allocations get a far larger share. Using the patch, >>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10 >>> nodes. Let's kick the ball for now. An extra patch could make things >>> better later on if that's worth it. >> >> Yeah, since the both K and N could be not large these algorithm takes >> almost the same time. And current patch does simple thing. When we >> need over 100 or 1000 replication node the optimization could be >> required. >> Attached latest v9 patch. >> > > Few comments: Thank you for reviewing. > + * In 10.0 we support two synchronization methods, priority and > + * quorum. The number of synchronous standbys that transactions > + * must wait for replies from and synchronization method are specified > + * in synchronous_standby_names. This parameter also specifies a list > + * of standby names, which determines the priority of each standby for > + * being chosen as a synchronous standby. In priority method, the standbys > + * whose names appear earlier in the list are given higher priority > + * and will be considered as synchronous. Other standby servers appearing > + * later in this list represent potential synchronous standbys. If any of > + * the current synchronous standbys disconnects for whatever reason, > + * it will be replaced immediately with the next-highest-priority standby. > + * In quorum method, the all standbys appearing in the list are > + * considered as a candidate for quorum commit. > > In the above description, is priority method represented by FIRST and > quorum method by ANY in the synchronous_standby_names syntax? If so, > it might be better to write about it explicitly. Added description. > > 2. > --- a/src/backend/utils/sort/tuplesort.c > +++ b/src/backend/utils/sort/tuplesort.c > @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state) > } > > /* > - * Allocate a new 'memtuples' array, for the heap. It will hold one tuple > - * from each input tape. > - */ > - state->memtupsize = numInputTapes; > - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple)); > - USEMEM(state, GetMemoryChunkSpace(state->memtuples)); > - > - /* > - * Use all the remaining memory we have available for read buffers among > - * the input tapes. > + * Use all the spare memory we have available for read buffers among the > + * input tapes. > > This doesn't belong to this patch. Oops, fixed. > 3. > + * Return the list of sync standbys using according to synchronous method, > > In above sentence, "using according" seems to either incomplete or wrong usage. Fixed. > 4. > + else > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + "invalid synchronization method is specified \"%d\"", > + SyncRepConfig->sync_method)); > > Here, the error message doesn't seem to aligned and you might want to > use errmsg for the same. > Fixed. > 5. > + * In priority method, we need the oldest these positions among sync > + * standbys. In quorum method, we need the newest these positions > + * specified by SyncRepConfig->num_sync. > > /oldest these/oldest of these > /newest these positions specified/newest of these positions as specified Fixed. > Instead of newest, can we consider to use latest? Yeah, I changed it so. > 6. > + int sync_method; /* synchronization method */ > /* member_names contains nmembers consecutive nul-terminated C strings */ > char member_names[FLEXIBLE_ARRAY_MEMBER]; > } SyncRepConfigData; > > Can't we use 1 or 2 bytes to store sync_method information? I changed it to uint8. Attached latest v10 patch incorporated the review comments so far. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier >>> <michael.paquier@gmail.com> wrote: >>>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>>> You could do that, but first I would code up the simplest, cleanest >>>>> algorithm you can think of and see if it even shows up in a 'perf' >>>>> profile. Microbenchmarking is probably overkill here unless a problem >>>>> is visible on macrobenchmarks. >>>> >>>> This is what I would go for! The current code is doing a simple thing: >>>> select the Nth element using qsort() after scanning each WAL sender's >>>> values. And I think that Sawada-san got it right. Even running on my >>>> laptop a pgbench run with 10 sync standbys using a data set that fits >>>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead >>>> using perf top on a non-assert, non-debug build. Hash tables and >>>> allocations get a far larger share. Using the patch, >>>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10 >>>> nodes. Let's kick the ball for now. An extra patch could make things >>>> better later on if that's worth it. >>> >>> Yeah, since the both K and N could be not large these algorithm takes >>> almost the same time. And current patch does simple thing. When we >>> need over 100 or 1000 replication node the optimization could be >>> required. >>> Attached latest v9 patch. >>> >> >> Few comments: > > Thank you for reviewing. > >> + * In 10.0 we support two synchronization methods, priority and >> + * quorum. The number of synchronous standbys that transactions >> + * must wait for replies from and synchronization method are specified >> + * in synchronous_standby_names. This parameter also specifies a list >> + * of standby names, which determines the priority of each standby for >> + * being chosen as a synchronous standby. In priority method, the standbys >> + * whose names appear earlier in the list are given higher priority >> + * and will be considered as synchronous. Other standby servers appearing >> + * later in this list represent potential synchronous standbys. If any of >> + * the current synchronous standbys disconnects for whatever reason, >> + * it will be replaced immediately with the next-highest-priority standby. >> + * In quorum method, the all standbys appearing in the list are >> + * considered as a candidate for quorum commit. >> >> In the above description, is priority method represented by FIRST and >> quorum method by ANY in the synchronous_standby_names syntax? If so, >> it might be better to write about it explicitly. > > Added description. > >> >> 2. >> --- a/src/backend/utils/sort/tuplesort.c >> +++ b/src/backend/utils/sort/tuplesort.c >> @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state) >> } >> >> /* >> - * Allocate a new 'memtuples' array, for the heap. It will hold one tuple >> - * from each input tape. >> - */ >> - state->memtupsize = numInputTapes; >> - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple)); >> - USEMEM(state, GetMemoryChunkSpace(state->memtuples)); >> - >> - /* >> - * Use all the remaining memory we have available for read buffers among >> - * the input tapes. >> + * Use all the spare memory we have available for read buffers among the >> + * input tapes. >> >> This doesn't belong to this patch. > > Oops, fixed. > >> 3. >> + * Return the list of sync standbys using according to synchronous method, >> >> In above sentence, "using according" seems to either incomplete or wrong usage. > > Fixed. > >> 4. >> + else >> + ereport(ERROR, >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> + "invalid synchronization method is specified \"%d\"", >> + SyncRepConfig->sync_method)); >> >> Here, the error message doesn't seem to aligned and you might want to >> use errmsg for the same. >> > > Fixed. > >> 5. >> + * In priority method, we need the oldest these positions among sync >> + * standbys. In quorum method, we need the newest these positions >> + * specified by SyncRepConfig->num_sync. >> >> /oldest these/oldest of these >> /newest these positions specified/newest of these positions as specified > > Fixed. > >> Instead of newest, can we consider to use latest? > > Yeah, I changed it so. > > >> 6. >> + int sync_method; /* synchronization method */ >> /* member_names contains nmembers consecutive nul-terminated C strings */ >> char member_names[FLEXIBLE_ARRAY_MEMBER]; >> } SyncRepConfigData; >> >> Can't we use 1 or 2 bytes to store sync_method information? > > I changed it to uint8. > > Attached latest v10 patch incorporated the review comments so far. > Please review it. Thanks for updating the patch! Do we need to update postgresql.conf.sample? +{any_ident} { + yylval.str = pstrdup(yytext); + return ANY; + } +{first_ident} { + yylval.str = pstrdup(yytext); + return FIRST; + } Why is pstrdup(yytext) necessary here? + standby_list { $$ = create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } + | NUM '(' standby_list ')' { $$ = create_syncrep_config($1, $3, SYNC_REP_QUORUM); } + | ANY NUM '(' standby_list ')' { $$ = create_syncrep_config($2, $4, SYNC_REP_QUORUM); } + | FIRST NUM '(' standby_list ')' { $$ = create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works differently from curent version while "list" works in the same way as current one) very confusing? I prefer to either of 1. break the backward-compatibility, i.e., treat the first syntax of "standby_list" as quorum commit 2. keep the backward-compatibility, i.e., treat the second syntax of "NUM (standby_list)" as sync rep with the priority + <literal>pg_stat_replication</></link> view). If the keyword + <literal>FIRST</> is specified, other standby servers appearing + later in this list represent potential synchronous standbys. + If any of the current synchronous standbys disconnects for + whatever reason, it will be replaced immediately with the + next-highest-priority standby. Specifying more than one standby + name can allow very high availability. It seems strange to explain the behavior of FIRST before explaining the syntax of synchronous_standby_names and FIRST. Regards, -- Fujii Masao
On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier >>>> <michael.paquier@gmail.com> wrote: >>>>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>>>> You could do that, but first I would code up the simplest, cleanest >>>>>> algorithm you can think of and see if it even shows up in a 'perf' >>>>>> profile. Microbenchmarking is probably overkill here unless a problem >>>>>> is visible on macrobenchmarks. >>>>> >>>>> This is what I would go for! The current code is doing a simple thing: >>>>> select the Nth element using qsort() after scanning each WAL sender's >>>>> values. And I think that Sawada-san got it right. Even running on my >>>>> laptop a pgbench run with 10 sync standbys using a data set that fits >>>>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead >>>>> using perf top on a non-assert, non-debug build. Hash tables and >>>>> allocations get a far larger share. Using the patch, >>>>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10 >>>>> nodes. Let's kick the ball for now. An extra patch could make things >>>>> better later on if that's worth it. >>>> >>>> Yeah, since the both K and N could be not large these algorithm takes >>>> almost the same time. And current patch does simple thing. When we >>>> need over 100 or 1000 replication node the optimization could be >>>> required. >>>> Attached latest v9 patch. >>>> >>> >>> Few comments: >> >> Thank you for reviewing. >> >>> + * In 10.0 we support two synchronization methods, priority and >>> + * quorum. The number of synchronous standbys that transactions >>> + * must wait for replies from and synchronization method are specified >>> + * in synchronous_standby_names. This parameter also specifies a list >>> + * of standby names, which determines the priority of each standby for >>> + * being chosen as a synchronous standby. In priority method, the standbys >>> + * whose names appear earlier in the list are given higher priority >>> + * and will be considered as synchronous. Other standby servers appearing >>> + * later in this list represent potential synchronous standbys. If any of >>> + * the current synchronous standbys disconnects for whatever reason, >>> + * it will be replaced immediately with the next-highest-priority standby. >>> + * In quorum method, the all standbys appearing in the list are >>> + * considered as a candidate for quorum commit. >>> >>> In the above description, is priority method represented by FIRST and >>> quorum method by ANY in the synchronous_standby_names syntax? If so, >>> it might be better to write about it explicitly. >> >> Added description. >> >>> >>> 2. >>> --- a/src/backend/utils/sort/tuplesort.c >>> +++ b/src/backend/utils/sort/tuplesort.c >>> @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state) >>> } >>> >>> /* >>> - * Allocate a new 'memtuples' array, for the heap. It will hold one tuple >>> - * from each input tape. >>> - */ >>> - state->memtupsize = numInputTapes; >>> - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple)); >>> - USEMEM(state, GetMemoryChunkSpace(state->memtuples)); >>> - >>> - /* >>> - * Use all the remaining memory we have available for read buffers among >>> - * the input tapes. >>> + * Use all the spare memory we have available for read buffers among the >>> + * input tapes. >>> >>> This doesn't belong to this patch. >> >> Oops, fixed. >> >>> 3. >>> + * Return the list of sync standbys using according to synchronous method, >>> >>> In above sentence, "using according" seems to either incomplete or wrong usage. >> >> Fixed. >> >>> 4. >>> + else >>> + ereport(ERROR, >>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >>> + "invalid synchronization method is specified \"%d\"", >>> + SyncRepConfig->sync_method)); >>> >>> Here, the error message doesn't seem to aligned and you might want to >>> use errmsg for the same. >>> >> >> Fixed. >> >>> 5. >>> + * In priority method, we need the oldest these positions among sync >>> + * standbys. In quorum method, we need the newest these positions >>> + * specified by SyncRepConfig->num_sync. >>> >>> /oldest these/oldest of these >>> /newest these positions specified/newest of these positions as specified >> >> Fixed. >> >>> Instead of newest, can we consider to use latest? >> >> Yeah, I changed it so. >> >> >>> 6. >>> + int sync_method; /* synchronization method */ >>> /* member_names contains nmembers consecutive nul-terminated C strings */ >>> char member_names[FLEXIBLE_ARRAY_MEMBER]; >>> } SyncRepConfigData; >>> >>> Can't we use 1 or 2 bytes to store sync_method information? >> >> I changed it to uint8. >> >> Attached latest v10 patch incorporated the review comments so far. >> Please review it. > > Thanks for updating the patch! > > Do we need to update postgresql.conf.sample? Added description to postgresql.conf.sample. > +{any_ident} { > + yylval.str = pstrdup(yytext); > + return ANY; > + } > +{first_ident} { > + yylval.str = pstrdup(yytext); > + return FIRST; > + } > > Why is pstrdup(yytext) necessary here? The first whole line was unnecessary actually. Removed. > + standby_list { $$ = > create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } > + | NUM '(' standby_list ')' { $$ = > create_syncrep_config($1, $3, SYNC_REP_QUORUM); } > + | ANY NUM '(' standby_list ')' { $$ = > create_syncrep_config($2, $4, SYNC_REP_QUORUM); } > + | FIRST NUM '(' standby_list ')' { $$ = > create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } > > Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works > differently from curent version while "list" works in the same way as > current one) very confusing? > > I prefer to either of > > 1. break the backward-compatibility, i.e., treat the first syntax of > "standby_list" as quorum commit > 2. keep the backward-compatibility, i.e., treat the second syntax of > "NUM (standby_list)" as sync rep with the priority There were some comments when I proposed the quorum commit. If we do #1 it breaks the backward-compatibility with 9.5 or before as well. I don't think it's a good idea. On the other hand, if we do #2 then the behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM (standby_list)''. But it would not what most of user will want and would confuse the users of future version who will want to use the quorum commit. Since many hackers thought that the sensible default behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the current patch chose to changes the behaviour of s_s_names and document that changes thoroughly. > + <literal>pg_stat_replication</></link> view). If the keyword > + <literal>FIRST</> is specified, other standby servers appearing > + later in this list represent potential synchronous standbys. > + If any of the current synchronous standbys disconnects for > + whatever reason, it will be replaced immediately with the > + next-highest-priority standby. Specifying more than one standby > + name can allow very high availability. > > It seems strange to explain the behavior of FIRST before explaining > the syntax of synchronous_standby_names and FIRST. > Updated document. Attached latest v11 patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Dec 12, 2016 at 9:54 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> >>>> Few comments: >>> >>> Thank you for reviewing. >>> >>>> + * In 10.0 we support two synchronization methods, priority and >>>> + * quorum. The number of synchronous standbys that transactions >>>> + * must wait for replies from and synchronization method are specified >>>> + * in synchronous_standby_names. This parameter also specifies a list >>>> + * of standby names, which determines the priority of each standby for >>>> + * being chosen as a synchronous standby. In priority method, the standbys >>>> + * whose names appear earlier in the list are given higher priority >>>> + * and will be considered as synchronous. Other standby servers appearing >>>> + * later in this list represent potential synchronous standbys. If any of >>>> + * the current synchronous standbys disconnects for whatever reason, >>>> + * it will be replaced immediately with the next-highest-priority standby. >>>> + * In quorum method, the all standbys appearing in the list are >>>> + * considered as a candidate for quorum commit. >>>> >>>> In the above description, is priority method represented by FIRST and >>>> quorum method by ANY in the synchronous_standby_names syntax? If so, >>>> it might be better to write about it explicitly. >>> >>> Added description. >>> + * specified in synchronous_standby_names. The priority method is + * represented by FIRST, and the quorum method is represented by ANY Full stop is missing after "ANY". >>> >>>> 6. >>>> + int sync_method; /* synchronization method */ >>>> /* member_names contains nmembers consecutive nul-terminated C strings */ >>>> char member_names[FLEXIBLE_ARRAY_MEMBER]; >>>> } SyncRepConfigData; >>>> >>>> Can't we use 1 or 2 bytes to store sync_method information? >>> >>> I changed it to uint8. >>> + int8 sync_method; /* synchronization method */ > I changed it to uint8. In mail, you have mentioned uint8, but in the code it is int8. I think you want to update code to use uint8. > >> + standby_list { $$ = >> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } >> + | NUM '(' standby_list ')' { $$ = >> create_syncrep_config($1, $3, SYNC_REP_QUORUM); } >> + | ANY NUM '(' standby_list ')' { $$ = >> create_syncrep_config($2, $4, SYNC_REP_QUORUM); } >> + | FIRST NUM '(' standby_list ')' { $$ = >> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } >> >> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works >> differently from curent version while "list" works in the same way as >> current one) very confusing? >> >> I prefer to either of >> >> 1. break the backward-compatibility, i.e., treat the first syntax of >> "standby_list" as quorum commit >> 2. keep the backward-compatibility, i.e., treat the second syntax of >> "NUM (standby_list)" as sync rep with the priority > +1. > There were some comments when I proposed the quorum commit. If we do > #1 it breaks the backward-compatibility with 9.5 or before as well. I > don't think it's a good idea. On the other hand, if we do #2 then the > behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM > (standby_list)''. But it would not what most of user will want and > would confuse the users of future version who will want to use the > quorum commit. Since many hackers thought that the sensible default > behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the > current patch chose to changes the behaviour of s_s_names and document > that changes thoroughly. > Your arguments are sensible, but I think we should address the point of confusion raised by Fujii-san. As a developer, I feel breaking backward compatibility (go with Option-1 mentioned above) here is a good move as it can avoid confusions in future. However, I know many a time users are so accustomed to the current usage that they feel irritated with the change in behavior even it is for their betterment, so it is advisable to do so only if it is necessary or we have feedback from a couple of users. So in this case, if we don't want to go with Option-1, then I think we should go with Option-2. If we go with Option-2, then we can anyway comeback to change the behavior which is more sensible for future after feedback from users. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
At Tue, 13 Dec 2016 08:46:06 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1JoheFzO1rrKm391wJDepFvZr1geRh4ZJ_9iC4FOX+3Uw@mail.gmail.com> > On Mon, Dec 12, 2016 at 9:54 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > >> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > >>>> > >>>> Few comments: > >>> > >>> Thank you for reviewing. > >>> > >>>> + * In 10.0 we support two synchronization methods, priority and > >>>> + * quorum. The number of synchronous standbys that transactions > >>>> + * must wait for replies from and synchronization method are specified > >>>> + * in synchronous_standby_names. This parameter also specifies a list > >>>> + * of standby names, which determines the priority of each standby for > >>>> + * being chosen as a synchronous standby. In priority method, the standbys > >>>> + * whose names appear earlier in the list are given higher priority > >>>> + * and will be considered as synchronous. Other standby servers appearing > >>>> + * later in this list represent potential synchronous standbys. If any of > >>>> + * the current synchronous standbys disconnects for whatever reason, > >>>> + * it will be replaced immediately with the next-highest-priority standby. > >>>> + * In quorum method, the all standbys appearing in the list are > >>>> + * considered as a candidate for quorum commit. > >>>> > >>>> In the above description, is priority method represented by FIRST and > >>>> quorum method by ANY in the synchronous_standby_names syntax? If so, > >>>> it might be better to write about it explicitly. > >>> > >>> Added description. > >>> > > + * specified in synchronous_standby_names. The priority method is > + * represented by FIRST, and the quorum method is represented by ANY > > Full stop is missing after "ANY". > > >>> > >>>> 6. > >>>> + int sync_method; /* synchronization method */ > >>>> /* member_names contains nmembers consecutive nul-terminated C strings */ > >>>> char member_names[FLEXIBLE_ARRAY_MEMBER]; > >>>> } SyncRepConfigData; > >>>> > >>>> Can't we use 1 or 2 bytes to store sync_method information? > >>> > >>> I changed it to uint8. > >>> > > + int8 sync_method; /* synchronization method */ > > > I changed it to uint8. > > In mail, you have mentioned uint8, but in the code it is int8. I > think you want to update code to use uint8. > > > > > >> + standby_list { $$ = > >> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } > >> + | NUM '(' standby_list ')' { $$ = > >> create_syncrep_config($1, $3, SYNC_REP_QUORUM); } > >> + | ANY NUM '(' standby_list ')' { $$ = > >> create_syncrep_config($2, $4, SYNC_REP_QUORUM); } > >> + | FIRST NUM '(' standby_list ')' { $$ = > >> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } > >> > >> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works > >> differently from curent version while "list" works in the same way as > >> current one) very confusing? > >> > >> I prefer to either of > >> > >> 1. break the backward-compatibility, i.e., treat the first syntax of > >> "standby_list" as quorum commit > >> 2. keep the backward-compatibility, i.e., treat the second syntax of > >> "NUM (standby_list)" as sync rep with the priority > > > > +1. > > > There were some comments when I proposed the quorum commit. If we do > > #1 it breaks the backward-compatibility with 9.5 or before as well. I > > don't think it's a good idea. On the other hand, if we do #2 then the > > behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM > > (standby_list)''. But it would not what most of user will want and > > would confuse the users of future version who will want to use the > > quorum commit. Since many hackers thought that the sensible default > > behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the > > current patch chose to changes the behaviour of s_s_names and document > > that changes thoroughly. > > > > Your arguments are sensible, but I think we should address the point > of confusion raised by Fujii-san. As a developer, I feel breaking > backward compatibility (go with Option-1 mentioned above) here is a > good move as it can avoid confusions in future. However, I know many > a time users are so accustomed to the current usage that they feel > irritated with the change in behavior even it is for their betterment, > so it is advisable to do so only if it is necessary or we have > feedback from a couple of users. So in this case, if we don't want to > go with Option-1, then I think we should go with Option-2. If we go > with Option-2, then we can anyway comeback to change the behavior > which is more sensible for future after feedback from users. This implicitly put an assumption that replication configuration is defined by s_s_names. But in the past discussion, some people suggested that quorum commit should be configured by another GUC variable and I think it is the time to do this now. So, we have the third option that would be like the following. - s_s_names is restored to work in the way as of 9.5. or may be the same as 9.6. Or immediately remove it! My inclination is doing this. - a new GUC varialbe such like "quorum_commit_standbys" (which is exclusive to s_s_names) is defined for new version of quorum commit feature. The option-1 except "standby_list" format is usable in this. This doesn't puzzle users who don't read release notes carefully (ME?). Leaving s_s_names can save some of such users but I don't think it is requried at Pg10. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Dec 13, 2016 at 5:06 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Tue, 13 Dec 2016 08:46:06 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1JoheFzO1rrKm391wJDepFvZr1geRh4ZJ_9iC4FOX+3Uw@mail.gmail.com> >> On Mon, Dec 12, 2016 at 9:54 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> > On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> >> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >>>> >> >>>> Few comments: >> >>> >> >>> Thank you for reviewing. >> >>> >> >>>> + * In 10.0 we support two synchronization methods, priority and >> >>>> + * quorum. The number of synchronous standbys that transactions >> >>>> + * must wait for replies from and synchronization method are specified >> >>>> + * in synchronous_standby_names. This parameter also specifies a list >> >>>> + * of standby names, which determines the priority of each standby for >> >>>> + * being chosen as a synchronous standby. In priority method, the standbys >> >>>> + * whose names appear earlier in the list are given higher priority >> >>>> + * and will be considered as synchronous. Other standby servers appearing >> >>>> + * later in this list represent potential synchronous standbys. If any of >> >>>> + * the current synchronous standbys disconnects for whatever reason, >> >>>> + * it will be replaced immediately with the next-highest-priority standby. >> >>>> + * In quorum method, the all standbys appearing in the list are >> >>>> + * considered as a candidate for quorum commit. >> >>>> >> >>>> In the above description, is priority method represented by FIRST and >> >>>> quorum method by ANY in the synchronous_standby_names syntax? If so, >> >>>> it might be better to write about it explicitly. >> >>> >> >>> Added description. >> >>> >> >> + * specified in synchronous_standby_names. The priority method is >> + * represented by FIRST, and the quorum method is represented by ANY >> >> Full stop is missing after "ANY". >> >> >>> >> >>>> 6. >> >>>> + int sync_method; /* synchronization method */ >> >>>> /* member_names contains nmembers consecutive nul-terminated C strings */ >> >>>> char member_names[FLEXIBLE_ARRAY_MEMBER]; >> >>>> } SyncRepConfigData; >> >>>> >> >>>> Can't we use 1 or 2 bytes to store sync_method information? >> >>> >> >>> I changed it to uint8. >> >>> >> >> + int8 sync_method; /* synchronization method */ >> >> > I changed it to uint8. >> >> In mail, you have mentioned uint8, but in the code it is int8. I >> think you want to update code to use uint8. >> >> >> > >> >> + standby_list { $$ = >> >> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } >> >> + | NUM '(' standby_list ')' { $$ = >> >> create_syncrep_config($1, $3, SYNC_REP_QUORUM); } >> >> + | ANY NUM '(' standby_list ')' { $$ = >> >> create_syncrep_config($2, $4, SYNC_REP_QUORUM); } >> >> + | FIRST NUM '(' standby_list ')' { $$ = >> >> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } >> >> >> >> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works >> >> differently from curent version while "list" works in the same way as >> >> current one) very confusing? >> >> >> >> I prefer to either of >> >> >> >> 1. break the backward-compatibility, i.e., treat the first syntax of >> >> "standby_list" as quorum commit >> >> 2. keep the backward-compatibility, i.e., treat the second syntax of >> >> "NUM (standby_list)" as sync rep with the priority >> > >> >> +1. >> >> > There were some comments when I proposed the quorum commit. If we do >> > #1 it breaks the backward-compatibility with 9.5 or before as well. I >> > don't think it's a good idea. On the other hand, if we do #2 then the >> > behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM >> > (standby_list)''. But it would not what most of user will want and >> > would confuse the users of future version who will want to use the >> > quorum commit. Since many hackers thought that the sensible default >> > behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the >> > current patch chose to changes the behaviour of s_s_names and document >> > that changes thoroughly. >> > >> >> Your arguments are sensible, but I think we should address the point >> of confusion raised by Fujii-san. As a developer, I feel breaking >> backward compatibility (go with Option-1 mentioned above) here is a >> good move as it can avoid confusions in future. However, I know many >> a time users are so accustomed to the current usage that they feel >> irritated with the change in behavior even it is for their betterment, >> so it is advisable to do so only if it is necessary or we have >> feedback from a couple of users. So in this case, if we don't want to >> go with Option-1, then I think we should go with Option-2. If we go >> with Option-2, then we can anyway comeback to change the behavior >> which is more sensible for future after feedback from users. > > This implicitly put an assumption that replication configuration > is defined by s_s_names. But in the past discussion, some people > suggested that quorum commit should be configured by another GUC > variable and I think it is the time to do this now. > > So, we have the third option that would be like the following. > > - s_s_names is restored to work in the way as of 9.5. or may > be the same as 9.6. Or immediately remove it! My inclination > is doing this. > > - a new GUC varialbe such like "quorum_commit_standbys" (which > is exclusive to s_s_names) is defined for new version of > quorum commit feature. The option-1 except "standby_list" > format is usable in this. If we drop the "standby_list" syntax, I don't think that new parameter is necessary. We can keep s_s_names and just drop the support for that syntax from s_s_names. This may be ok if we're really in "break all the things" mode for PostgreSQL 10. Regards, -- Fujii Masao
On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > If we drop the "standby_list" syntax, I don't think that new parameter is > necessary. We can keep s_s_names and just drop the support for that syntax > from s_s_names. This may be ok if we're really in "break all the things" mode > for PostgreSQL 10. Please let's not raise that as an argument again... And not break the s_list argument. Many users depend on that for just single sync standbys. FWIW, I'd be in favor of backward compatibility and say that a standby list is a priority list if we can maintain that. Upthread agreement was to break that, I did not insist further, and won't if that's still the feeling. -- Michael
On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> If we drop the "standby_list" syntax, I don't think that new parameter is >> necessary. We can keep s_s_names and just drop the support for that syntax >> from s_s_names. This may be ok if we're really in "break all the things" mode >> for PostgreSQL 10. > > Please let's not raise that as an argument again... And not break the > s_list argument. Many users depend on that for just single sync > standbys. FWIW, I'd be in favor of backward compatibility and say that > a standby list is a priority list if we can maintain that. Upthread > agreement was to break that, I did not insist further, and won't if > that's still the feeling. I wonder why you think that the backward-compatibility for standby_list is so "special". We renamed pg_xlog directory to pg_wal and are planning to change recovery.conf API at all, though they have bigger impacts on the existing users in terms of the backward compatibility. OTOH, so far, changing GUC between major releases happened several times. But I'm not against keeping the backward compatibility for standby_list, to be honest. My concern is that the latest patch tries to support the backward compatibility "partially" and which would be confusing to users, as I told upthread. So I'd like to propose to keep the backward compatibility fully for s_s_names (i.e., both "standby_list" and "N (standby_list)" mean the priority method) at the first commit, then continue discussing this and change it if we reach the consensus until PostgreSQL 10 is actually released. Thought? Regards, -- Fujii Masao
On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> If we drop the "standby_list" syntax, I don't think that new parameter is >>> necessary. We can keep s_s_names and just drop the support for that syntax >>> from s_s_names. This may be ok if we're really in "break all the things" mode >>> for PostgreSQL 10. >> >> Please let's not raise that as an argument again... And not break the >> s_list argument. Many users depend on that for just single sync >> standbys. FWIW, I'd be in favor of backward compatibility and say that >> a standby list is a priority list if we can maintain that. Upthread >> agreement was to break that, I did not insist further, and won't if >> that's still the feeling. > > I wonder why you think that the backward-compatibility for standby_list is > so "special". We renamed pg_xlog directory to pg_wal and are planning to > change recovery.conf API at all, though they have bigger impacts on > the existing users in terms of the backward compatibility. OTOH, so far, > changing GUC between major releases happened several times. Silent failures for existing failover deployments is a pain to solve after doing upgrades. That's my only concern. Changing pg_wal would result in a hard failure when upgrading. And changing the meaning of the standby list (without keyword ANY or FIRST!) does not fall into that category... So yes just removing support for standby list would result in a hard failure, which would be fine for the let-s-break-all-things move. > But I'm not against keeping the backward compatibility for standby_list, > to be honest. My concern is that the latest patch tries to support > the backward compatibility "partially" and which would be confusing to users, > as I told upthread. If we try to support backward compatibility, I'd personally do it fully, and have a list of standby names specified meaning a priority list. > So I'd like to propose to keep the backward compatibility fully for s_s_names > (i.e., both "standby_list" and "N (standby_list)" mean the priority method) > at the first commit, then continue discussing this and change it if we reach > the consensus until PostgreSQL 10 is actually released. Thought? +1 on that. -- Michael
On Thu, Dec 15, 2016 at 7:53 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier >> <michael.paquier@gmail.com> wrote: > >> So I'd like to propose to keep the backward compatibility fully for s_s_names >> (i.e., both "standby_list" and "N (standby_list)" mean the priority method) >> at the first commit, then continue discussing this and change it if we reach >> the consensus until PostgreSQL 10 is actually released. Thought? > > +1 on that. > +1. That is the safest option to proceed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 15, 2016 at 11:23 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>> If we drop the "standby_list" syntax, I don't think that new parameter is >>>> necessary. We can keep s_s_names and just drop the support for that syntax >>>> from s_s_names. This may be ok if we're really in "break all the things" mode >>>> for PostgreSQL 10. >>> >>> Please let's not raise that as an argument again... And not break the >>> s_list argument. Many users depend on that for just single sync >>> standbys. FWIW, I'd be in favor of backward compatibility and say that >>> a standby list is a priority list if we can maintain that. Upthread >>> agreement was to break that, I did not insist further, and won't if >>> that's still the feeling. >> >> I wonder why you think that the backward-compatibility for standby_list is >> so "special". We renamed pg_xlog directory to pg_wal and are planning to >> change recovery.conf API at all, though they have bigger impacts on >> the existing users in terms of the backward compatibility. OTOH, so far, >> changing GUC between major releases happened several times. > > Silent failures for existing failover deployments is a pain to solve > after doing upgrades. That's my only concern. Changing pg_wal would > result in a hard failure when upgrading. And changing the meaning of > the standby list (without keyword ANY or FIRST!) does not fall into > that category... So yes just removing support for standby list would > result in a hard failure, which would be fine for the > let-s-break-all-things move. > >> But I'm not against keeping the backward compatibility for standby_list, >> to be honest. My concern is that the latest patch tries to support >> the backward compatibility "partially" and which would be confusing to users, >> as I told upthread. > If we try to support backward compatibility, I'd personally do it > fully, and have a list of standby names specified meaning a priority > list. > >> So I'd like to propose to keep the backward compatibility fully for s_s_names >> (i.e., both "standby_list" and "N (standby_list)" mean the priority method) >> at the first commit, then continue discussing this and change it if we reach >> the consensus until PostgreSQL 10 is actually released. Thought? > > +1 on that. +1. I'll update the patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
At Thu, 15 Dec 2016 14:20:53 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDn73aC+o0mrWCs800LeOsMYP4oV7xVb0T0_4V5VCQzhQ@mail.gmail.com> > On Thu, Dec 15, 2016 at 11:23 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: > > On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > >> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier > >> <michael.paquier@gmail.com> wrote: > >>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > >>>> If we drop the "standby_list" syntax, I don't think that new parameter is > >>>> necessary. We can keep s_s_names and just drop the support for that syntax > >>>> from s_s_names. This may be ok if we're really in "break all the things" mode > >>>> for PostgreSQL 10. > >>> > >>> Please let's not raise that as an argument again... And not break the > >>> s_list argument. Many users depend on that for just single sync > >>> standbys. FWIW, I'd be in favor of backward compatibility and say that > >>> a standby list is a priority list if we can maintain that. Upthread > >>> agreement was to break that, I did not insist further, and won't if > >>> that's still the feeling. > >> > >> I wonder why you think that the backward-compatibility for standby_list is > >> so "special". We renamed pg_xlog directory to pg_wal and are planning to > >> change recovery.conf API at all, though they have bigger impacts on > >> the existing users in terms of the backward compatibility. OTOH, so far, > >> changing GUC between major releases happened several times. > > > > Silent failures for existing failover deployments is a pain to solve > > after doing upgrades. That's my only concern. Changing pg_wal would > > result in a hard failure when upgrading. And changing the meaning of > > the standby list (without keyword ANY or FIRST!) does not fall into > > that category... So yes just removing support for standby list would > > result in a hard failure, which would be fine for the > > let-s-break-all-things move. > > > >> But I'm not against keeping the backward compatibility for standby_list, > >> to be honest. My concern is that the latest patch tries to support > >> the backward compatibility "partially" and which would be confusing to users, > >> as I told upthread. > > If we try to support backward compatibility, I'd personally do it > > fully, and have a list of standby names specified meaning a priority > > list. > > > >> So I'd like to propose to keep the backward compatibility fully for s_s_names > >> (i.e., both "standby_list" and "N (standby_list)" mean the priority method) > >> at the first commit, then continue discussing this and change it if we reach > >> the consensus until PostgreSQL 10 is actually released. Thought? > > > > +1 on that. > > +1. FWIW, +1 from me. > I'll update the patch. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Dec 15, 2016 at 3:06 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Thu, 15 Dec 2016 14:20:53 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDn73aC+o0mrWCs800LeOsMYP4oV7xVb0T0_4V5VCQzhQ@mail.gmail.com> >> On Thu, Dec 15, 2016 at 11:23 AM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >> > On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> >> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier >> >> <michael.paquier@gmail.com> wrote: >> >>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> >>>> If we drop the "standby_list" syntax, I don't think that new parameter is >> >>>> necessary. We can keep s_s_names and just drop the support for that syntax >> >>>> from s_s_names. This may be ok if we're really in "break all the things" mode >> >>>> for PostgreSQL 10. >> >>> >> >>> Please let's not raise that as an argument again... And not break the >> >>> s_list argument. Many users depend on that for just single sync >> >>> standbys. FWIW, I'd be in favor of backward compatibility and say that >> >>> a standby list is a priority list if we can maintain that. Upthread >> >>> agreement was to break that, I did not insist further, and won't if >> >>> that's still the feeling. >> >> >> >> I wonder why you think that the backward-compatibility for standby_list is >> >> so "special". We renamed pg_xlog directory to pg_wal and are planning to >> >> change recovery.conf API at all, though they have bigger impacts on >> >> the existing users in terms of the backward compatibility. OTOH, so far, >> >> changing GUC between major releases happened several times. >> > >> > Silent failures for existing failover deployments is a pain to solve >> > after doing upgrades. That's my only concern. Changing pg_wal would >> > result in a hard failure when upgrading. And changing the meaning of >> > the standby list (without keyword ANY or FIRST!) does not fall into >> > that category... So yes just removing support for standby list would >> > result in a hard failure, which would be fine for the >> > let-s-break-all-things move. >> > >> >> But I'm not against keeping the backward compatibility for standby_list, >> >> to be honest. My concern is that the latest patch tries to support >> >> the backward compatibility "partially" and which would be confusing to users, >> >> as I told upthread. >> > If we try to support backward compatibility, I'd personally do it >> > fully, and have a list of standby names specified meaning a priority >> > list. >> > >> >> So I'd like to propose to keep the backward compatibility fully for s_s_names >> >> (i.e., both "standby_list" and "N (standby_list)" mean the priority method) >> >> at the first commit, then continue discussing this and change it if we reach >> >> the consensus until PostgreSQL 10 is actually released. Thought? >> > >> > +1 on that. >> >> +1. > > FWIW, +1 from me. > >> I'll update the patch. > Attached latest v12 patch. I changed behavior of "N (standby_list)" to use the priority method and incorporated some review comments so far. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Dec 15, 2016 at 6:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Attached latest v12 patch. > I changed behavior of "N (standby_list)" to use the priority method > and incorporated some review comments so far. Please review it. Some comments... + Another example of <varname>synchronous_standby_names</> for multiple + synchronous standby is: Here standby takes an 's'. + candidates. The master server will wait for at least 2 replies from them. + <literal>s4</> is an asynchronous standby since its name is not in the list. + </para> "will wait for replies from at least two of them". + * next-highest-priority standby. In quorum method, the all standbys + * appearing in the list are considered as a candidate for quorum commit. "the all" is incorrect. I think you mean "all the" instead. + * NIL if no sync standby is connected. In quorum method, all standby + * priorities are same, that is 1. So this function returns the list of This is not true. Standys have a priority number assigned. Though it does not matter much for quorum groups, it gives an indication of their position in the defined list. #synchronous_standby_names = '' # standby servers that provide sync rep- # number of sync standbys and comma-separatedlist of application_name+ # synchronization method, number of sync standbys+ #and comma-separated list of application_name # from standby(s); '*' = all The formulation is funny here: "sync rep synchronization method". I think that Fujii-san has also some doc changes in his box. For anybody picking up this patch next, it would be good to incorporate the things I have noticed here. -- Michael
On Fri, Dec 16, 2016 at 2:38 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Dec 15, 2016 at 6:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Attached latest v12 patch. >> I changed behavior of "N (standby_list)" to use the priority method >> and incorporated some review comments so far. Please review it. > > Some comments... > > + Another example of <varname>synchronous_standby_names</> for multiple > + synchronous standby is: > Here standby takes an 's'. > > + candidates. The master server will wait for at least 2 replies from them. > + <literal>s4</> is an asynchronous standby since its name is not in the list. > + </para> > "will wait for replies from at least two of them". > > + * next-highest-priority standby. In quorum method, the all standbys > + * appearing in the list are considered as a candidate for quorum commit. > "the all" is incorrect. I think you mean "all the" instead. > > + * NIL if no sync standby is connected. In quorum method, all standby > + * priorities are same, that is 1. So this function returns the list of > This is not true. Standys have a priority number assigned. Though it does > not matter much for quorum groups, it gives an indication of their position > in the defined list. > > #synchronous_standby_names = '' # standby servers that provide sync rep > - # number of sync standbys and comma-separated list of application_name > + # synchronization method, number of sync standbys > + # and comma-separated list of application_name > # from standby(s); '*' = all > The formulation is funny here: "sync rep synchronization method". > > I think that Fujii-san has also some doc changes in his box. For anybody > picking up this patch next, it would be good to incorporate the things > I have noticed here. Yes, I will. Thanks! Regards, -- Fujii Masao
On Fri, Dec 16, 2016 at 5:04 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Fri, Dec 16, 2016 at 2:38 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Thu, Dec 15, 2016 at 6:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Attached latest v12 patch. >>> I changed behavior of "N (standby_list)" to use the priority method >>> and incorporated some review comments so far. Please review it. >> >> Some comments... >> >> + Another example of <varname>synchronous_standby_names</> for multiple >> + synchronous standby is: >> Here standby takes an 's'. >> >> + candidates. The master server will wait for at least 2 replies from them. >> + <literal>s4</> is an asynchronous standby since its name is not in the list. >> + </para> >> "will wait for replies from at least two of them". >> >> + * next-highest-priority standby. In quorum method, the all standbys >> + * appearing in the list are considered as a candidate for quorum commit. >> "the all" is incorrect. I think you mean "all the" instead. >> >> + * NIL if no sync standby is connected. In quorum method, all standby >> + * priorities are same, that is 1. So this function returns the list of >> This is not true. Standys have a priority number assigned. Though it does >> not matter much for quorum groups, it gives an indication of their position >> in the defined list. >> >> #synchronous_standby_names = '' # standby servers that provide sync rep >> - # number of sync standbys and comma-separated list of application_name >> + # synchronization method, number of sync standbys >> + # and comma-separated list of application_name >> # from standby(s); '*' = all >> The formulation is funny here: "sync rep synchronization method". >> >> I think that Fujii-san has also some doc changes in his box. For anybody >> picking up this patch next, it would be good to incorporate the things >> I have noticed here. > > Yes, I will. Thanks! Attached is the modified version of the patch. Barring objections, I will commit this version. Even after committing the patch, there will be still many source comments and documentations that we need to update, for example, in high-availability.sgml. We need to check and update them throughly later. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Fri, Dec 16, 2016 at 10:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > Attached is the modified version of the patch. Barring objections, I will > commit this version. There is a whitespace: $ git diff master --check src/backend/replication/syncrep.c:39: trailing whitespace. + * > Even after committing the patch, there will be still many source comments > and documentations that we need to update, for example, > in high-availability.sgml. We need to check and update them throughly later. The current patch is complicated enough, so that's fine for me. I checked the patch one last time and that looks good. -- Michael
On Sun, Dec 18, 2016 at 9:36 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Dec 16, 2016 at 10:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> Attached is the modified version of the patch. Barring objections, I will >> commit this version. > > There is a whitespace: > $ git diff master --check > src/backend/replication/syncrep.c:39: trailing whitespace. > + * Okey, pushed the patch with this fix. Thanks! Regarding this feature, there are some loose ends. We should work on and complete them until the release. (1) Which synchronous replication method, priority or quorum, should be chosen when neither FIRST nor ANY is specified in s_s_names? Right now, a priority-based sync replication is chosen for keeping backward compatibility. However some hackers argued to change this decision so that a quorum commit is chosen because they think that most users prefer to a quorum. (2) There will be still many source comments and documentations that we need to update, for example, in high-availability.sgml. We need to check and update them throughly. (3) The priority value is assigned to each standby listed in s_s_names even in quorum commit though those priority values are not used at all. Users can see those priority values in pg_stat_replication. Isn't this confusing? If yes, it might be better to always assign 1 as the priority, for example. Any other? Regards, -- Fujii Masao
Fujii Masao wrote: > Regarding this feature, there are some loose ends. We should work on > and complete them until the release. Please list these in https://wiki.postgresql.org/wiki/Open_Items so that we don't forget. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 19, 2016 at 9:49 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Sun, Dec 18, 2016 at 9:36 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Fri, Dec 16, 2016 at 10:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> Attached is the modified version of the patch. Barring objections, I will >>> commit this version. >> >> There is a whitespace: >> $ git diff master --check >> src/backend/replication/syncrep.c:39: trailing whitespace. >> + * > > Okey, pushed the patch with this fix. Thanks! Thank you for reviewing and commit! > Regarding this feature, there are some loose ends. We should work on > and complete them until the release. > > (1) > Which synchronous replication method, priority or quorum, should be > chosen when neither FIRST nor ANY is specified in s_s_names? Right now, > a priority-based sync replication is chosen for keeping backward > compatibility. However some hackers argued to change this decision > so that a quorum commit is chosen because they think that most users > prefer to a quorum. > > (2) > There will be still many source comments and documentations that > we need to update, for example, in high-availability.sgml. We need to > check and update them throughly. Will try to update them. > (3) > The priority value is assigned to each standby listed in s_s_names > even in quorum commit though those priority values are not used at all. > Users can see those priority values in pg_stat_replication. > Isn't this confusing? If yes, it might be better to always assign 1 as > the priority, for example. > > > Any other? > Do we need to consider the sorting method and the selecting k-th latest LSN method? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Dec 20, 2016 at 2:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Do we need to consider the sorting method and the selecting k-th > latest LSN method? Honestly, nah. Tests are showing that there are many more bottlenecks before that with just memory allocation and parsing. -- Michael
On Tue, Dec 20, 2016 at 1:44 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Fujii Masao wrote: > >> Regarding this feature, there are some loose ends. We should work on >> and complete them until the release. > > Please list these in https://wiki.postgresql.org/wiki/Open_Items so that we > don't forget. Yep, added! Regards, -- Fujii Masao
On Tue, Dec 20, 2016 at 2:46 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Dec 20, 2016 at 2:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Do we need to consider the sorting method and the selecting k-th >> latest LSN method? > > Honestly, nah. Tests are showing that there are many more bottlenecks > before that with just memory allocation and parsing. I think that it's worth prototyping alternative algorithm, and measuring the performances of those alternative and current algorithms. This measurement should check not only the bottleneck but also how much each algorithm increases the time that backends need to wait for before they receive ack from walsender. If it's reported that current algorithm is enough "effecient", we can just leave the code as it is. OTOH, if not, let's adopt the alternative one. Regards, -- Fujii Masao
At Tue, 20 Dec 2016 23:47:22 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwFcEhv8BPP0HV2VQ8kXaHQmfN7PFAgkKsPyVip0frizpg@mail.gmail.com> > On Tue, Dec 20, 2016 at 2:46 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: > > On Tue, Dec 20, 2016 at 2:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> Do we need to consider the sorting method and the selecting k-th > >> latest LSN method? > > > > Honestly, nah. Tests are showing that there are many more bottlenecks > > before that with just memory allocation and parsing. > > I think that it's worth prototyping alternative algorithm, and > measuring the performances of those alternative and current > algorithms. This measurement should check not only the bottleneck > but also how much each algorithm increases the time that backends > need to wait for before they receive ack from walsender. > > If it's reported that current algorithm is enough "effecient", > we can just leave the code as it is. OTOH, if not, let's adopt > the alternative one. I'm personally interested in the difference of them but it doesn't seem urgently required. If we have nothing particular to do with this, considering other ordering method would be valuable. By a not-well-grounded thought though, maintaining top-kth list by insertion sort would be promising rather than running top-kth sorting on the whole list. Sorting on all walsenders is needed for the first time and some other situation though. By the way, do we continue dispu^h^hcussing on the format of s_s_names and/or a successor right now? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Dec 21, 2016 at 10:39 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Tue, 20 Dec 2016 23:47:22 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwFcEhv8BPP0HV2VQ8kXaHQmfN7PFAgkKsPyVip0frizpg@mail.gmail.com> >> On Tue, Dec 20, 2016 at 2:46 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >> > On Tue, Dec 20, 2016 at 2:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> Do we need to consider the sorting method and the selecting k-th >> >> latest LSN method? >> > >> > Honestly, nah. Tests are showing that there are many more bottlenecks >> > before that with just memory allocation and parsing. >> >> I think that it's worth prototyping alternative algorithm, and >> measuring the performances of those alternative and current >> algorithms. This measurement should check not only the bottleneck >> but also how much each algorithm increases the time that backends >> need to wait for before they receive ack from walsender. >> >> If it's reported that current algorithm is enough "effecient", >> we can just leave the code as it is. OTOH, if not, let's adopt >> the alternative one. > > I'm personally interested in the difference of them but it > doesn't seem urgently required. Yes, it's not urgent task. > If we have nothing particular to > do with this, considering other ordering method would be > valuable. > > By a not-well-grounded thought though, maintaining top-kth list > by insertion sort would be promising rather than running top-kth > sorting on the whole list. Sorting on all walsenders is needed > for the first time and some other situation though. > > > By the way, do we continue dispu^h^hcussing on the format of > s_s_names and/or a successor right now? Yes. If there is better approach, we should discuss that. Regards, -- Fujii Masao
On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: > Regarding this feature, there are some loose ends. We should work on > and complete them until the release. > > (1) > Which synchronous replication method, priority or quorum, should be > chosen when neither FIRST nor ANY is specified in s_s_names? Right now, > a priority-based sync replication is chosen for keeping backward > compatibility. However some hackers argued to change this decision > so that a quorum commit is chosen because they think that most users > prefer to a quorum. > > (2) > There will be still many source comments and documentations that > we need to update, for example, in high-availability.sgml. We need to > check and update them throughly. > > (3) > The priority value is assigned to each standby listed in s_s_names > even in quorum commit though those priority values are not used at all. > Users can see those priority values in pg_stat_replication. > Isn't this confusing? If yes, it might be better to always assign 1 as > the priority, for example. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. Fujii, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >> Regarding this feature, there are some loose ends. We should work on >> and complete them until the release. >> >> (1) >> Which synchronous replication method, priority or quorum, should be >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, >> a priority-based sync replication is chosen for keeping backward >> compatibility. However some hackers argued to change this decision >> so that a quorum commit is chosen because they think that most users >> prefer to a quorum. >> >> (2) >> There will be still many source comments and documentations that >> we need to update, for example, in high-availability.sgml. We need to >> check and update them throughly. >> >> (3) >> The priority value is assigned to each standby listed in s_s_names >> even in quorum commit though those priority values are not used at all. >> Users can see those priority values in pg_stat_replication. >> Isn't this confusing? If yes, it might be better to always assign 1 as >> the priority, for example. > > [Action required within three days. This is a generic notification.] > > The above-described topic is currently a PostgreSQL 10 open item. Fujii, > since you committed the patch believed to have created it, you own this open > item. If some other commit is more relevant or if this does not belong as a > v10 open item, please let us know. Otherwise, please observe the policy on > open item ownership[1] and send a status update within three calendar days of > this message. Include a date for your subsequent status update. Testers may > discover new open items at any time, and I want to plan to get them all fixed > well in advance of shipping v10. Consequently, I will appreciate your efforts > toward speedy resolution. Thanks. > > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com Thanks for the notice! Regarding the item (2), Sawada-san told me that he will work on it after this CommitFest finishes. So we would receive the patch for the item from him next week. If there will be no patch even after the end of next week (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. The items (1) and (3) are not bugs. So I don't think that they need to be resolved before the beta release. After the feature freeze, many users will try and play with many new features including quorum-based syncrep. Then if many of them complain about (1) and (3), we can change the code at that timing. So we need more time that users can try the feature. BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL as the priority if quorum-based sync rep is chosen. It's less confusing. Regards, -- Fujii Masao
On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: > On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: > >> Regarding this feature, there are some loose ends. We should work on > >> and complete them until the release. > >> > >> (1) > >> Which synchronous replication method, priority or quorum, should be > >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, > >> a priority-based sync replication is chosen for keeping backward > >> compatibility. However some hackers argued to change this decision > >> so that a quorum commit is chosen because they think that most users > >> prefer to a quorum. > >> > >> (2) > >> There will be still many source comments and documentations that > >> we need to update, for example, in high-availability.sgml. We need to > >> check and update them throughly. > >> > >> (3) > >> The priority value is assigned to each standby listed in s_s_names > >> even in quorum commit though those priority values are not used at all. > >> Users can see those priority values in pg_stat_replication. > >> Isn't this confusing? If yes, it might be better to always assign 1 as > >> the priority, for example. > > > > [Action required within three days. This is a generic notification.] > > > > The above-described topic is currently a PostgreSQL 10 open item. Fujii, > > since you committed the patch believed to have created it, you own this open > > item. If some other commit is more relevant or if this does not belong as a > > v10 open item, please let us know. Otherwise, please observe the policy on > > open item ownership[1] and send a status update within three calendar days of > > this message. Include a date for your subsequent status update. Testers may > > discover new open items at any time, and I want to plan to get them all fixed > > well in advance of shipping v10. Consequently, I will appreciate your efforts > > toward speedy resolution. Thanks. > > > > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com > > Thanks for the notice! > > Regarding the item (2), Sawada-san told me that he will work on it after > this CommitFest finishes. So we would receive the patch for the item from > him next week. If there will be no patch even after the end of next week > (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. Sounds reasonable; I will look for your update on 14Apr or earlier. > The items (1) and (3) are not bugs. So I don't think that they need to be > resolved before the beta release. After the feature freeze, many users > will try and play with many new features including quorum-based syncrep. > Then if many of them complain about (1) and (3), we can change the code > at that timing. So we need more time that users can try the feature. I've moved (1) to a new section for things to revisit during beta. If someone feels strongly that the current behavior is Wrong and must change, speak up as soon as you reach that conclusion. Absent such arguments, the behavior won't change. > BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL > as the priority if quorum-based sync rep is chosen. It's less confusing. Since you do want (3) to change, please own it like any other open item, including the mandatory status updates.
On 06/04/17 03:51, Noah Misch wrote: > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: >>> On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >>>> Regarding this feature, there are some loose ends. We should work on >>>> and complete them until the release. >>>> >>>> (1) >>>> Which synchronous replication method, priority or quorum, should be >>>> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, >>>> a priority-based sync replication is chosen for keeping backward >>>> compatibility. However some hackers argued to change this decision >>>> so that a quorum commit is chosen because they think that most users >>>> prefer to a quorum. >>>> >>>> (2) >>>> There will be still many source comments and documentations that >>>> we need to update, for example, in high-availability.sgml. We need to >>>> check and update them throughly. >>>> >>>> (3) >>>> The priority value is assigned to each standby listed in s_s_names >>>> even in quorum commit though those priority values are not used at all. >>>> Users can see those priority values in pg_stat_replication. >>>> Isn't this confusing? If yes, it might be better to always assign 1 as >>>> the priority, for example. >>> >>> [Action required within three days. This is a generic notification.] >>> >>> The above-described topic is currently a PostgreSQL 10 open item. Fujii, >>> since you committed the patch believed to have created it, you own this open >>> item. If some other commit is more relevant or if this does not belong as a >>> v10 open item, please let us know. Otherwise, please observe the policy on >>> open item ownership[1] and send a status update within three calendar days of >>> this message. Include a date for your subsequent status update. Testers may >>> discover new open items at any time, and I want to plan to get them all fixed >>> well in advance of shipping v10. Consequently, I will appreciate your efforts >>> toward speedy resolution. Thanks. >>> >>> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com >> >> Thanks for the notice! >> >> Regarding the item (2), Sawada-san told me that he will work on it after >> this CommitFest finishes. So we would receive the patch for the item from >> him next week. If there will be no patch even after the end of next week >> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. > > Sounds reasonable; I will look for your update on 14Apr or earlier. > >> The items (1) and (3) are not bugs. So I don't think that they need to be >> resolved before the beta release. After the feature freeze, many users >> will try and play with many new features including quorum-based syncrep. >> Then if many of them complain about (1) and (3), we can change the code >> at that timing. So we need more time that users can try the feature. > > I've moved (1) to a new section for things to revisit during beta. If someone > feels strongly that the current behavior is Wrong and must change, speak up as > soon as you reach that conclusion. Absent such arguments, the behavior won't > change. > I was one of the people who said in original thread that I think the default behavior should change to quorum and I am still of that opinion. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote: > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: >> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >> >> Regarding this feature, there are some loose ends. We should work on >> >> and complete them until the release. >> >> >> >> (1) >> >> Which synchronous replication method, priority or quorum, should be >> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, >> >> a priority-based sync replication is chosen for keeping backward >> >> compatibility. However some hackers argued to change this decision >> >> so that a quorum commit is chosen because they think that most users >> >> prefer to a quorum. >> >> >> >> (2) >> >> There will be still many source comments and documentations that >> >> we need to update, for example, in high-availability.sgml. We need to >> >> check and update them throughly. >> >> >> >> (3) >> >> The priority value is assigned to each standby listed in s_s_names >> >> even in quorum commit though those priority values are not used at all. >> >> Users can see those priority values in pg_stat_replication. >> >> Isn't this confusing? If yes, it might be better to always assign 1 as >> >> the priority, for example. >> > >> > [Action required within three days. This is a generic notification.] >> > >> > The above-described topic is currently a PostgreSQL 10 open item. Fujii, >> > since you committed the patch believed to have created it, you own this open >> > item. If some other commit is more relevant or if this does not belong as a >> > v10 open item, please let us know. Otherwise, please observe the policy on >> > open item ownership[1] and send a status update within three calendar days of >> > this message. Include a date for your subsequent status update. Testers may >> > discover new open items at any time, and I want to plan to get them all fixed >> > well in advance of shipping v10. Consequently, I will appreciate your efforts >> > toward speedy resolution. Thanks. >> > >> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com >> >> Thanks for the notice! >> >> Regarding the item (2), Sawada-san told me that he will work on it after >> this CommitFest finishes. So we would receive the patch for the item from >> him next week. If there will be no patch even after the end of next week >> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. > > Sounds reasonable; I will look for your update on 14Apr or earlier. > >> The items (1) and (3) are not bugs. So I don't think that they need to be >> resolved before the beta release. After the feature freeze, many users >> will try and play with many new features including quorum-based syncrep. >> Then if many of them complain about (1) and (3), we can change the code >> at that timing. So we need more time that users can try the feature. > > I've moved (1) to a new section for things to revisit during beta. If someone > feels strongly that the current behavior is Wrong and must change, speak up as > soon as you reach that conclusion. Absent such arguments, the behavior won't > change. > >> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL >> as the priority if quorum-based sync rep is chosen. It's less confusing. > > Since you do want (3) to change, please own it like any other open item, > including the mandatory status updates. I agree to report NULL as the priority. I'll send a patch for this as well. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote: >> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: >>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >>> >> Regarding this feature, there are some loose ends. We should work on >>> >> and complete them until the release. >>> >> >>> >> (1) >>> >> Which synchronous replication method, priority or quorum, should be >>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, >>> >> a priority-based sync replication is chosen for keeping backward >>> >> compatibility. However some hackers argued to change this decision >>> >> so that a quorum commit is chosen because they think that most users >>> >> prefer to a quorum. >>> >> >>> >> (2) >>> >> There will be still many source comments and documentations that >>> >> we need to update, for example, in high-availability.sgml. We need to >>> >> check and update them throughly. >>> >> >>> >> (3) >>> >> The priority value is assigned to each standby listed in s_s_names >>> >> even in quorum commit though those priority values are not used at all. >>> >> Users can see those priority values in pg_stat_replication. >>> >> Isn't this confusing? If yes, it might be better to always assign 1 as >>> >> the priority, for example. >>> > >>> > [Action required within three days. This is a generic notification.] >>> > >>> > The above-described topic is currently a PostgreSQL 10 open item. Fujii, >>> > since you committed the patch believed to have created it, you own this open >>> > item. If some other commit is more relevant or if this does not belong as a >>> > v10 open item, please let us know. Otherwise, please observe the policy on >>> > open item ownership[1] and send a status update within three calendar days of >>> > this message. Include a date for your subsequent status update. Testers may >>> > discover new open items at any time, and I want to plan to get them all fixed >>> > well in advance of shipping v10. Consequently, I will appreciate your efforts >>> > toward speedy resolution. Thanks. >>> > >>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com >>> >>> Thanks for the notice! >>> >>> Regarding the item (2), Sawada-san told me that he will work on it after >>> this CommitFest finishes. So we would receive the patch for the item from >>> him next week. If there will be no patch even after the end of next week >>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. >> >> Sounds reasonable; I will look for your update on 14Apr or earlier. >> >>> The items (1) and (3) are not bugs. So I don't think that they need to be >>> resolved before the beta release. After the feature freeze, many users >>> will try and play with many new features including quorum-based syncrep. >>> Then if many of them complain about (1) and (3), we can change the code >>> at that timing. So we need more time that users can try the feature. >> >> I've moved (1) to a new section for things to revisit during beta. If someone >> feels strongly that the current behavior is Wrong and must change, speak up as >> soon as you reach that conclusion. Absent such arguments, the behavior won't >> change. >> >>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL >>> as the priority if quorum-based sync rep is chosen. It's less confusing. >> >> Since you do want (3) to change, please own it like any other open item, >> including the mandatory status updates. > > I agree to report NULL as the priority. I'll send a patch for this as well. > > Regards, > Attached two draft patches. The one makes pg_stat_replication.sync priority report NULL if in quorum-based sync replication. To prevent extra change I don't change so far the code of setting standby priority. The another one improves the comment and documentation. If there is more thing what we need to mention in documentation please give me feedback. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote: > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: > > >> (2) > > >> There will be still many source comments and documentations that > > >> we need to update, for example, in high-availability.sgml. We need to > > >> check and update them throughly. > > >> > > >> (3) > > >> The priority value is assigned to each standby listed in s_s_names > > >> even in quorum commit though those priority values are not used at all. > > >> Users can see those priority values in pg_stat_replication. > > >> Isn't this confusing? If yes, it might be better to always assign 1 as > > >> the priority, for example. > > Regarding the item (2), Sawada-san told me that he will work on it after > > this CommitFest finishes. So we would receive the patch for the item from > > him next week. If there will be no patch even after the end of next week > > (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. > > Sounds reasonable; I will look for your update on 14Apr or earlier. This PostgreSQL 10 open item is past due for your status update. Kindly send a status update within 24 hours, and include a date for your subsequent status update. Refer to the policy on open item ownership: https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com > Since you do want (3) to change, please own it like any other open item, > including the mandatory status updates. Likewise.
On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote: > On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote: > > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: > > > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: > > > >> (2) > > > >> There will be still many source comments and documentations that > > > >> we need to update, for example, in high-availability.sgml. We need to > > > >> check and update them throughly. > > > >> > > > >> (3) > > > >> The priority value is assigned to each standby listed in s_s_names > > > >> even in quorum commit though those priority values are not used at all. > > > >> Users can see those priority values in pg_stat_replication. > > > >> Isn't this confusing? If yes, it might be better to always assign 1 as > > > >> the priority, for example. > > > > Regarding the item (2), Sawada-san told me that he will work on it after > > > this CommitFest finishes. So we would receive the patch for the item from > > > him next week. If there will be no patch even after the end of next week > > > (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. > > > > Sounds reasonable; I will look for your update on 14Apr or earlier. > > This PostgreSQL 10 open item is past due for your status update. Kindly send > a status update within 24 hours, and include a date for your subsequent status > update. Refer to the policy on open item ownership: > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com > > > Since you do want (3) to change, please own it like any other open item, > > including the mandatory status updates. > > Likewise. IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is long past due for your status update. Please reacquaint yourself with the policy on open item ownership[1] and then reply immediately. If I do not hear from you by 2017-04-17 05:00 UTC, I will transfer this item to release management team ownership without further notice. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote: > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote: >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote: >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >> > > >> (2) >> > > >> There will be still many source comments and documentations that >> > > >> we need to update, for example, in high-availability.sgml. We need to >> > > >> check and update them throughly. >> > > >> >> > > >> (3) >> > > >> The priority value is assigned to each standby listed in s_s_names >> > > >> even in quorum commit though those priority values are not used at all. >> > > >> Users can see those priority values in pg_stat_replication. >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as >> > > >> the priority, for example. >> >> > > Regarding the item (2), Sawada-san told me that he will work on it after >> > > this CommitFest finishes. So we would receive the patch for the item from >> > > him next week. If there will be no patch even after the end of next week >> > > (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. >> > >> > Sounds reasonable; I will look for your update on 14Apr or earlier. >> >> This PostgreSQL 10 open item is past due for your status update. Kindly send >> a status update within 24 hours, and include a date for your subsequent status >> update. Refer to the policy on open item ownership: >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com Sorry for the delay. I will review Sawada-san's patch and commit something in next three days. So next target date is April 19th. >> > Since you do want (3) to change, please own it like any other open item, >> > including the mandatory status updates. >> >> Likewise. As I told firstly this is not a bug. There are some proposals for better design of priority column in pg_stat_replication, but we've not reached the consensus yet. So I think that it's better to move this open item to "Design Decisions to Recheck Mid-Beta" section so that we can hear more opinions. Regards, -- Fujii Masao
On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote: >>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: >>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >>>> >> Regarding this feature, there are some loose ends. We should work on >>>> >> and complete them until the release. >>>> >> >>>> >> (1) >>>> >> Which synchronous replication method, priority or quorum, should be >>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, >>>> >> a priority-based sync replication is chosen for keeping backward >>>> >> compatibility. However some hackers argued to change this decision >>>> >> so that a quorum commit is chosen because they think that most users >>>> >> prefer to a quorum. >>>> >> >>>> >> (2) >>>> >> There will be still many source comments and documentations that >>>> >> we need to update, for example, in high-availability.sgml. We need to >>>> >> check and update them throughly. >>>> >> >>>> >> (3) >>>> >> The priority value is assigned to each standby listed in s_s_names >>>> >> even in quorum commit though those priority values are not used at all. >>>> >> Users can see those priority values in pg_stat_replication. >>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as >>>> >> the priority, for example. >>>> > >>>> > [Action required within three days. This is a generic notification.] >>>> > >>>> > The above-described topic is currently a PostgreSQL 10 open item. Fujii, >>>> > since you committed the patch believed to have created it, you own this open >>>> > item. If some other commit is more relevant or if this does not belong as a >>>> > v10 open item, please let us know. Otherwise, please observe the policy on >>>> > open item ownership[1] and send a status update within three calendar days of >>>> > this message. Include a date for your subsequent status update. Testers may >>>> > discover new open items at any time, and I want to plan to get them all fixed >>>> > well in advance of shipping v10. Consequently, I will appreciate your efforts >>>> > toward speedy resolution. Thanks. >>>> > >>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com >>>> >>>> Thanks for the notice! >>>> >>>> Regarding the item (2), Sawada-san told me that he will work on it after >>>> this CommitFest finishes. So we would receive the patch for the item from >>>> him next week. If there will be no patch even after the end of next week >>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. >>> >>> Sounds reasonable; I will look for your update on 14Apr or earlier. >>> >>>> The items (1) and (3) are not bugs. So I don't think that they need to be >>>> resolved before the beta release. After the feature freeze, many users >>>> will try and play with many new features including quorum-based syncrep. >>>> Then if many of them complain about (1) and (3), we can change the code >>>> at that timing. So we need more time that users can try the feature. >>> >>> I've moved (1) to a new section for things to revisit during beta. If someone >>> feels strongly that the current behavior is Wrong and must change, speak up as >>> soon as you reach that conclusion. Absent such arguments, the behavior won't >>> change. >>> >>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL >>>> as the priority if quorum-based sync rep is chosen. It's less confusing. >>> >>> Since you do want (3) to change, please own it like any other open item, >>> including the mandatory status updates. >> >> I agree to report NULL as the priority. I'll send a patch for this as well. >> >> Regards, >> > > Attached two draft patches. The one makes pg_stat_replication.sync > priority report NULL if in quorum-based sync replication. To prevent > extra change I don't change so far the code of setting standby > priority. The another one improves the comment and documentation. If > there is more thing what we need to mention in documentation please > give me feedback. Attached is the modified version of the doc improvement patch. Barring any objection, I will commit this version. + In term of performance there is difference between two synchronous + replication method. Generally quorum-based synchronous replication + tends to be higher performance than priority-based synchronous + replication. Because in quorum-based synchronous replication, the + transaction can resume as soon as received the specified number of + acknowledgement from synchronous standby servers without distinction + of standby servers. On the other hand in priority-based synchronous + replication, the standby server that the primary server must wait for + is fixed until a synchronous standby fails. Therefore, if a server on + low-performance machine a has high priority and is chosen as a + synchronous standby server it can reduce performance for database + applications. This description looks misleading. A quorum-based sync rep is basically more efficient when there are multiple standbys in s_s_names and you want to replicate the transactions to some of them synchronously. I think that this assumption should be documented explicitly. So I modified this description. Please see the modified version in the attached patch. + /* + * Update priority of this WalSender, but note that in + * quroum-based sync replication, the value of + * sync_standby_priority has no effect. + */ This is not true because even quorum-based sync rep uses the priority value to check whether the standby is async or sync. So I just remove this. + * In quorum-based sync replication we select the quorum sync + * standby without theirs priority. The all running active standbys + * are considered as a candidate for quorum sync standbys Same as above. Also I removed some descriptions that I thought unnecessary to add. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote: >>>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >>>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: >>>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >>>>> >> Regarding this feature, there are some loose ends. We should work on >>>>> >> and complete them until the release. >>>>> >> >>>>> >> (1) >>>>> >> Which synchronous replication method, priority or quorum, should be >>>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, >>>>> >> a priority-based sync replication is chosen for keeping backward >>>>> >> compatibility. However some hackers argued to change this decision >>>>> >> so that a quorum commit is chosen because they think that most users >>>>> >> prefer to a quorum. >>>>> >> >>>>> >> (2) >>>>> >> There will be still many source comments and documentations that >>>>> >> we need to update, for example, in high-availability.sgml. We need to >>>>> >> check and update them throughly. >>>>> >> >>>>> >> (3) >>>>> >> The priority value is assigned to each standby listed in s_s_names >>>>> >> even in quorum commit though those priority values are not used at all. >>>>> >> Users can see those priority values in pg_stat_replication. >>>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as >>>>> >> the priority, for example. >>>>> > >>>>> > [Action required within three days. This is a generic notification.] >>>>> > >>>>> > The above-described topic is currently a PostgreSQL 10 open item. Fujii, >>>>> > since you committed the patch believed to have created it, you own this open >>>>> > item. If some other commit is more relevant or if this does not belong as a >>>>> > v10 open item, please let us know. Otherwise, please observe the policy on >>>>> > open item ownership[1] and send a status update within three calendar days of >>>>> > this message. Include a date for your subsequent status update. Testers may >>>>> > discover new open items at any time, and I want to plan to get them all fixed >>>>> > well in advance of shipping v10. Consequently, I will appreciate your efforts >>>>> > toward speedy resolution. Thanks. >>>>> > >>>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com >>>>> >>>>> Thanks for the notice! >>>>> >>>>> Regarding the item (2), Sawada-san told me that he will work on it after >>>>> this CommitFest finishes. So we would receive the patch for the item from >>>>> him next week. If there will be no patch even after the end of next week >>>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. >>>> >>>> Sounds reasonable; I will look for your update on 14Apr or earlier. >>>> >>>>> The items (1) and (3) are not bugs. So I don't think that they need to be >>>>> resolved before the beta release. After the feature freeze, many users >>>>> will try and play with many new features including quorum-based syncrep. >>>>> Then if many of them complain about (1) and (3), we can change the code >>>>> at that timing. So we need more time that users can try the feature. >>>> >>>> I've moved (1) to a new section for things to revisit during beta. If someone >>>> feels strongly that the current behavior is Wrong and must change, speak up as >>>> soon as you reach that conclusion. Absent such arguments, the behavior won't >>>> change. >>>> >>>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL >>>>> as the priority if quorum-based sync rep is chosen. It's less confusing. >>>> >>>> Since you do want (3) to change, please own it like any other open item, >>>> including the mandatory status updates. >>> >>> I agree to report NULL as the priority. I'll send a patch for this as well. >>> >>> Regards, >>> >> >> Attached two draft patches. The one makes pg_stat_replication.sync >> priority report NULL if in quorum-based sync replication. To prevent >> extra change I don't change so far the code of setting standby >> priority. The another one improves the comment and documentation. If >> there is more thing what we need to mention in documentation please >> give me feedback. > > Attached is the modified version of the doc improvement patch. > Barring any objection, I will commit this version. Thank you for updating the patch. > > + In term of performance there is difference between two synchronous > + replication method. Generally quorum-based synchronous replication > + tends to be higher performance than priority-based synchronous > + replication. Because in quorum-based synchronous replication, the > + transaction can resume as soon as received the specified number of > + acknowledgement from synchronous standby servers without distinction > + of standby servers. On the other hand in priority-based synchronous > + replication, the standby server that the primary server must wait for > + is fixed until a synchronous standby fails. Therefore, if a server on > + low-performance machine a has high priority and is chosen as a > + synchronous standby server it can reduce performance for database > + applications. > > This description looks misleading. A quorum-based sync rep is basically > more efficient when there are multiple standbys in s_s_names and you want > to replicate the transactions to some of them synchronously. I think that > this assumption should be documented explicitly. So I modified this > description. Please see the modified version in the attached patch. You're right. The modified version looks good to me, thanks. > > + /* > + * Update priority of this WalSender, but note that in > + * quroum-based sync replication, the value of > + * sync_standby_priority has no effect. > + */ > > This is not true because even quorum-based sync rep uses the priority > value to check whether the standby is async or sync. So I just remove this. > > + * In quorum-based sync replication we select the quorum sync > + * standby without theirs priority. The all running active standbys > + * are considered as a candidate for quorum sync standbys > > Same as above. > > Also I removed some descriptions that I thought unnecessary to add. > > Regards, > > -- > Fujii Masao Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBqSjUGx0LCDrjEDLB-yx2EvgLMdT8Nz4ZR_xpxrbMU+Q@mail.gmail.com> > On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote: > >>>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: > >>>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: > >>>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: > >>>>> >> Regarding this feature, there are some loose ends. We should work on > >>>>> >> and complete them until the release. > >>>>> >> > >>>>> >> (1) > >>>>> >> Which synchronous replication method, priority or quorum, should be > >>>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, > >>>>> >> a priority-based sync replication is chosen for keeping backward > >>>>> >> compatibility. However some hackers argued to change this decision > >>>>> >> so that a quorum commit is chosen because they think that most users > >>>>> >> prefer to a quorum. > >>>>> >> > >>>>> >> (2) > >>>>> >> There will be still many source comments and documentations that > >>>>> >> we need to update, for example, in high-availability.sgml. We need to > >>>>> >> check and update them throughly. > >>>>> >> > >>>>> >> (3) > >>>>> >> The priority value is assigned to each standby listed in s_s_names > >>>>> >> even in quorum commit though those priority values are not used at all. > >>>>> >> Users can see those priority values in pg_stat_replication. > >>>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as > >>>>> >> the priority, for example. > >>>>> > > >>>>> > [Action required within three days. This is a generic notification.] > >>>>> > > >>>>> > The above-described topic is currently a PostgreSQL 10 open item. Fujii, > >>>>> > since you committed the patch believed to have created it, you own this open > >>>>> > item. If some other commit is more relevant or if this does not belong as a > >>>>> > v10 open item, please let us know. Otherwise, please observe the policy on > >>>>> > open item ownership[1] and send a status update within three calendar days of > >>>>> > this message. Include a date for your subsequent status update. Testers may > >>>>> > discover new open items at any time, and I want to plan to get them all fixed > >>>>> > well in advance of shipping v10. Consequently, I will appreciate your efforts > >>>>> > toward speedy resolution. Thanks. > >>>>> > > >>>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com > >>>>> > >>>>> Thanks for the notice! > >>>>> > >>>>> Regarding the item (2), Sawada-san told me that he will work on it after > >>>>> this CommitFest finishes. So we would receive the patch for the item from > >>>>> him next week. If there will be no patch even after the end of next week > >>>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. > >>>> > >>>> Sounds reasonable; I will look for your update on 14Apr or earlier. > >>>> > >>>>> The items (1) and (3) are not bugs. So I don't think that they need to be > >>>>> resolved before the beta release. After the feature freeze, many users > >>>>> will try and play with many new features including quorum-based syncrep. > >>>>> Then if many of them complain about (1) and (3), we can change the code > >>>>> at that timing. So we need more time that users can try the feature. > >>>> > >>>> I've moved (1) to a new section for things to revisit during beta. If someone > >>>> feels strongly that the current behavior is Wrong and must change, speak up as > >>>> soon as you reach that conclusion. Absent such arguments, the behavior won't > >>>> change. > >>>> > >>>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL > >>>>> as the priority if quorum-based sync rep is chosen. It's less confusing. > >>>> > >>>> Since you do want (3) to change, please own it like any other open item, > >>>> including the mandatory status updates. > >>> > >>> I agree to report NULL as the priority. I'll send a patch for this as well. > >>> > >>> Regards, > >>> > >> > >> Attached two draft patches. The one makes pg_stat_replication.sync > >> priority report NULL if in quorum-based sync replication. To prevent > >> extra change I don't change so far the code of setting standby > >> priority. The another one improves the comment and documentation. If > >> there is more thing what we need to mention in documentation please > >> give me feedback. > > > > Attached is the modified version of the doc improvement patch. > > Barring any objection, I will commit this version. > > Thank you for updating the patch. > > > > > + In term of performance there is difference between two synchronous > > + replication method. Generally quorum-based synchronous replication > > + tends to be higher performance than priority-based synchronous > > + replication. Because in quorum-based synchronous replication, the > > + transaction can resume as soon as received the specified number of > > + acknowledgement from synchronous standby servers without distinction > > + of standby servers. On the other hand in priority-based synchronous > > + replication, the standby server that the primary server must wait for > > + is fixed until a synchronous standby fails. Therefore, if a server on > > + low-performance machine a has high priority and is chosen as a > > + synchronous standby server it can reduce performance for database > > + applications. > > > > This description looks misleading. A quorum-based sync rep is basically > > more efficient when there are multiple standbys in s_s_names and you want > > to replicate the transactions to some of them synchronously. I think that > > this assumption should be documented explicitly. So I modified this > > description. Please see the modified version in the attached patch. > > You're right. The modified version looks good to me, thanks. It looks better to me, too. But (even I'm not sure, of course) the sentences seem to need improvement. | <para> | Quorum-based synchronous replication is basically more | efficient than priority-based one when you specify multiple | standbys in <varname>synchronous_standby_names</> and want | to synchronously replicate transactions to two or more of | them. In the priority-based case, the replication master | must wait for a reply from the slowest standby in the | required number of standbys in priority order, which may | slower than the rest. On the other hand, quorum-based | synchronous replication may reduce the latency because it | allows transactions to wait only for replies from a | required number of fastest standbys in all the listed | standbys, i.e., such slow standby doesn't block | transactions. | </para> I'm not sure that this is actually an improvement.. > > + /* > > + * Update priority of this WalSender, but note that in > > + * quroum-based sync replication, the value of > > + * sync_standby_priority has no effect. > > + */ > > > > This is not true because even quorum-based sync rep uses the priority > > value to check whether the standby is async or sync. So I just remove this. > > > > + * In quorum-based sync replication we select the quorum sync > > + * standby without theirs priority. The all running active standbys > > + * are considered as a candidate for quorum sync standbys > > > > Same as above. > > > > Also I removed some descriptions that I thought unnecessary to add. > > > > Regards, > > > > -- > > Fujii Masao > > > Regards, > > -- > Masahiko Sawada > NIPPON TELEGRAPH AND TELEPHONE CORPORATION > NTT Open Source Software Center -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Apr 18, 2017 at 6:40 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBqSjUGx0LCDrjEDLB-yx2EvgLMdT8Nz4ZR_xpxrbMU+Q@mail.gmail.com> >> On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote: >> >>>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >> >>>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: >> >>>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >> >>>>> >> Regarding this feature, there are some loose ends. We should work on >> >>>>> >> and complete them until the release. >> >>>>> >> >> >>>>> >> (1) >> >>>>> >> Which synchronous replication method, priority or quorum, should be >> >>>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, >> >>>>> >> a priority-based sync replication is chosen for keeping backward >> >>>>> >> compatibility. However some hackers argued to change this decision >> >>>>> >> so that a quorum commit is chosen because they think that most users >> >>>>> >> prefer to a quorum. >> >>>>> >> >> >>>>> >> (2) >> >>>>> >> There will be still many source comments and documentations that >> >>>>> >> we need to update, for example, in high-availability.sgml. We need to >> >>>>> >> check and update them throughly. >> >>>>> >> >> >>>>> >> (3) >> >>>>> >> The priority value is assigned to each standby listed in s_s_names >> >>>>> >> even in quorum commit though those priority values are not used at all. >> >>>>> >> Users can see those priority values in pg_stat_replication. >> >>>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as >> >>>>> >> the priority, for example. >> >>>>> > >> >>>>> > [Action required within three days. This is a generic notification.] >> >>>>> > >> >>>>> > The above-described topic is currently a PostgreSQL 10 open item. Fujii, >> >>>>> > since you committed the patch believed to have created it, you own this open >> >>>>> > item. If some other commit is more relevant or if this does not belong as a >> >>>>> > v10 open item, please let us know. Otherwise, please observe the policy on >> >>>>> > open item ownership[1] and send a status update within three calendar days of >> >>>>> > this message. Include a date for your subsequent status update. Testers may >> >>>>> > discover new open items at any time, and I want to plan to get them all fixed >> >>>>> > well in advance of shipping v10. Consequently, I will appreciate your efforts >> >>>>> > toward speedy resolution. Thanks. >> >>>>> > >> >>>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com >> >>>>> >> >>>>> Thanks for the notice! >> >>>>> >> >>>>> Regarding the item (2), Sawada-san told me that he will work on it after >> >>>>> this CommitFest finishes. So we would receive the patch for the item from >> >>>>> him next week. If there will be no patch even after the end of next week >> >>>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. >> >>>> >> >>>> Sounds reasonable; I will look for your update on 14Apr or earlier. >> >>>> >> >>>>> The items (1) and (3) are not bugs. So I don't think that they need to be >> >>>>> resolved before the beta release. After the feature freeze, many users >> >>>>> will try and play with many new features including quorum-based syncrep. >> >>>>> Then if many of them complain about (1) and (3), we can change the code >> >>>>> at that timing. So we need more time that users can try the feature. >> >>>> >> >>>> I've moved (1) to a new section for things to revisit during beta. If someone >> >>>> feels strongly that the current behavior is Wrong and must change, speak up as >> >>>> soon as you reach that conclusion. Absent such arguments, the behavior won't >> >>>> change. >> >>>> >> >>>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL >> >>>>> as the priority if quorum-based sync rep is chosen. It's less confusing. >> >>>> >> >>>> Since you do want (3) to change, please own it like any other open item, >> >>>> including the mandatory status updates. >> >>> >> >>> I agree to report NULL as the priority. I'll send a patch for this as well. >> >>> >> >>> Regards, >> >>> >> >> >> >> Attached two draft patches. The one makes pg_stat_replication.sync >> >> priority report NULL if in quorum-based sync replication. To prevent >> >> extra change I don't change so far the code of setting standby >> >> priority. The another one improves the comment and documentation. If >> >> there is more thing what we need to mention in documentation please >> >> give me feedback. >> > >> > Attached is the modified version of the doc improvement patch. >> > Barring any objection, I will commit this version. >> >> Thank you for updating the patch. >> >> > >> > + In term of performance there is difference between two synchronous >> > + replication method. Generally quorum-based synchronous replication >> > + tends to be higher performance than priority-based synchronous >> > + replication. Because in quorum-based synchronous replication, the >> > + transaction can resume as soon as received the specified number of >> > + acknowledgement from synchronous standby servers without distinction >> > + of standby servers. On the other hand in priority-based synchronous >> > + replication, the standby server that the primary server must wait for >> > + is fixed until a synchronous standby fails. Therefore, if a server on >> > + low-performance machine a has high priority and is chosen as a >> > + synchronous standby server it can reduce performance for database >> > + applications. >> > >> > This description looks misleading. A quorum-based sync rep is basically >> > more efficient when there are multiple standbys in s_s_names and you want >> > to replicate the transactions to some of them synchronously. I think that >> > this assumption should be documented explicitly. So I modified this >> > description. Please see the modified version in the attached patch. >> >> You're right. The modified version looks good to me, thanks. > > It looks better to me, too. But (even I'm not sure, of course) > the sentences seem to need improvement. > > | <para> > | Quorum-based synchronous replication is basically more > | efficient than priority-based one when you specify multiple > | standbys in <varname>synchronous_standby_names</> and want > | to synchronously replicate transactions to two or more of > | them. In the priority-based case, the replication master > | must wait for a reply from the slowest standby in the > | required number of standbys in priority order, which may > | slower than the rest. I supposed that Fujii-san pointed out that quorum-based sync replication could be more efficient when we want to replicate the transaction to "part of" standbys listed in s_s_names. So I guess it's not good idea to mention "two or more of them" which also can mean the all of standbys. > On the other hand, quorum-based > | synchronous replication may reduce the latency because it > | allows transactions to wait only for replies from a > | required number of fastest standbys in all the listed > | standbys, i.e., such slow standby doesn't block > | transactions. > | </para> > > I'm not sure that this is actually an improvement.. > -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Apr 18, 2017 at 7:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Tue, Apr 18, 2017 at 6:40 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBqSjUGx0LCDrjEDLB-yx2EvgLMdT8Nz4ZR_xpxrbMU+Q@mail.gmail.com> >>> On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> >> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> >>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch <noah@leadboat.com> wrote: >>> >>>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >>> >>>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: >>> >>>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >>> >>>>> >> Regarding this feature, there are some loose ends. We should work on >>> >>>>> >> and complete them until the release. >>> >>>>> >> >>> >>>>> >> (1) >>> >>>>> >> Which synchronous replication method, priority or quorum, should be >>> >>>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, >>> >>>>> >> a priority-based sync replication is chosen for keeping backward >>> >>>>> >> compatibility. However some hackers argued to change this decision >>> >>>>> >> so that a quorum commit is chosen because they think that most users >>> >>>>> >> prefer to a quorum. >>> >>>>> >> >>> >>>>> >> (2) >>> >>>>> >> There will be still many source comments and documentations that >>> >>>>> >> we need to update, for example, in high-availability.sgml. We need to >>> >>>>> >> check and update them throughly. >>> >>>>> >> >>> >>>>> >> (3) >>> >>>>> >> The priority value is assigned to each standby listed in s_s_names >>> >>>>> >> even in quorum commit though those priority values are not used at all. >>> >>>>> >> Users can see those priority values in pg_stat_replication. >>> >>>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as >>> >>>>> >> the priority, for example. >>> >>>>> > >>> >>>>> > [Action required within three days. This is a generic notification.] >>> >>>>> > >>> >>>>> > The above-described topic is currently a PostgreSQL 10 open item. Fujii, >>> >>>>> > since you committed the patch believed to have created it, you own this open >>> >>>>> > item. If some other commit is more relevant or if this does not belong as a >>> >>>>> > v10 open item, please let us know. Otherwise, please observe the policy on >>> >>>>> > open item ownership[1] and send a status update within three calendar days of >>> >>>>> > this message. Include a date for your subsequent status update. Testers may >>> >>>>> > discover new open items at any time, and I want to plan to get them all fixed >>> >>>>> > well in advance of shipping v10. Consequently, I will appreciate your efforts >>> >>>>> > toward speedy resolution. Thanks. >>> >>>>> > >>> >>>>> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com >>> >>>>> >>> >>>>> Thanks for the notice! >>> >>>>> >>> >>>>> Regarding the item (2), Sawada-san told me that he will work on it after >>> >>>>> this CommitFest finishes. So we would receive the patch for the item from >>> >>>>> him next week. If there will be no patch even after the end of next week >>> >>>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first. >>> >>>> >>> >>>> Sounds reasonable; I will look for your update on 14Apr or earlier. >>> >>>> >>> >>>>> The items (1) and (3) are not bugs. So I don't think that they need to be >>> >>>>> resolved before the beta release. After the feature freeze, many users >>> >>>>> will try and play with many new features including quorum-based syncrep. >>> >>>>> Then if many of them complain about (1) and (3), we can change the code >>> >>>>> at that timing. So we need more time that users can try the feature. >>> >>>> >>> >>>> I've moved (1) to a new section for things to revisit during beta. If someone >>> >>>> feels strongly that the current behavior is Wrong and must change, speak up as >>> >>>> soon as you reach that conclusion. Absent such arguments, the behavior won't >>> >>>> change. >>> >>>> >>> >>>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL >>> >>>>> as the priority if quorum-based sync rep is chosen. It's less confusing. >>> >>>> >>> >>>> Since you do want (3) to change, please own it like any other open item, >>> >>>> including the mandatory status updates. >>> >>> >>> >>> I agree to report NULL as the priority. I'll send a patch for this as well. >>> >>> >>> >>> Regards, >>> >>> >>> >> >>> >> Attached two draft patches. The one makes pg_stat_replication.sync >>> >> priority report NULL if in quorum-based sync replication. To prevent >>> >> extra change I don't change so far the code of setting standby >>> >> priority. The another one improves the comment and documentation. If >>> >> there is more thing what we need to mention in documentation please >>> >> give me feedback. >>> > >>> > Attached is the modified version of the doc improvement patch. >>> > Barring any objection, I will commit this version. >>> >>> Thank you for updating the patch. >>> >>> > >>> > + In term of performance there is difference between two synchronous >>> > + replication method. Generally quorum-based synchronous replication >>> > + tends to be higher performance than priority-based synchronous >>> > + replication. Because in quorum-based synchronous replication, the >>> > + transaction can resume as soon as received the specified number of >>> > + acknowledgement from synchronous standby servers without distinction >>> > + of standby servers. On the other hand in priority-based synchronous >>> > + replication, the standby server that the primary server must wait for >>> > + is fixed until a synchronous standby fails. Therefore, if a server on >>> > + low-performance machine a has high priority and is chosen as a >>> > + synchronous standby server it can reduce performance for database >>> > + applications. >>> > >>> > This description looks misleading. A quorum-based sync rep is basically >>> > more efficient when there are multiple standbys in s_s_names and you want >>> > to replicate the transactions to some of them synchronously. I think that >>> > this assumption should be documented explicitly. So I modified this >>> > description. Please see the modified version in the attached patch. >>> >>> You're right. The modified version looks good to me, thanks. >> >> It looks better to me, too. But (even I'm not sure, of course) >> the sentences seem to need improvement. >> >> | <para> >> | Quorum-based synchronous replication is basically more >> | efficient than priority-based one when you specify multiple >> | standbys in <varname>synchronous_standby_names</> and want >> | to synchronously replicate transactions to two or more of >> | them. In the priority-based case, the replication master >> | must wait for a reply from the slowest standby in the >> | required number of standbys in priority order, which may >> | slower than the rest. > > I supposed that Fujii-san pointed out that quorum-based sync > replication could be more efficient when we want to replicate the > transaction to "part of" standbys listed in s_s_names. Yes. Anyway, I pushed the patch except this paragraph. Regarding this paragraph, the patch for better descriptions is welcome. Regards, -- Fujii Masao
On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote: > On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote: > > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote: > >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote: > >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: > >> > >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: > >> > > >> (3) > >> > > >> The priority value is assigned to each standby listed in s_s_names > >> > > >> even in quorum commit though those priority values are not used at all. > >> > > >> Users can see those priority values in pg_stat_replication. > >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as > >> > > >> the priority, for example. > >> This PostgreSQL 10 open item is past due for your status update. Kindly send > >> a status update within 24 hours, and include a date for your subsequent status > >> update. Refer to the policy on open item ownership: > >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com > >> > Since you do want (3) to change, please own it like any other open item, > >> > including the mandatory status updates. > >> > >> Likewise. > > As I told firstly this is not a bug. There are some proposals for better design > of priority column in pg_stat_replication, but we've not reached the consensus > yet. So I think that it's better to move this open item to "Design Decisions to > Recheck Mid-Beta" section so that we can hear more opinions. I'm reading that some people want to report NULL priority, some people want to report a constant 1 priority, and nobody wants the current behavior. Is that an accurate summary?
On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote: > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote: >> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote: >> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote: >> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote: >> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >> >> >> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >> >> > > >> (3) >> >> > > >> The priority value is assigned to each standby listed in s_s_names >> >> > > >> even in quorum commit though those priority values are not used at all. >> >> > > >> Users can see those priority values in pg_stat_replication. >> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as >> >> > > >> the priority, for example. > >> >> This PostgreSQL 10 open item is past due for your status update. Kindly send >> >> a status update within 24 hours, and include a date for your subsequent status >> >> update. Refer to the policy on open item ownership: >> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com > >> >> > Since you do want (3) to change, please own it like any other open item, >> >> > including the mandatory status updates. >> >> >> >> Likewise. >> >> As I told firstly this is not a bug. There are some proposals for better design >> of priority column in pg_stat_replication, but we've not reached the consensus >> yet. So I think that it's better to move this open item to "Design Decisions to >> Recheck Mid-Beta" section so that we can hear more opinions. > > I'm reading that some people want to report NULL priority, some people want to > report a constant 1 priority, and nobody wants the current behavior. Is that > an accurate summary? Yes, I think that's correct. FWIW the reason of current behavior is that it would be useful for the user who is willing to switch from ANY to FIRST. They can know which standbys will become sync or potential. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Apr 19, 2017 at 1:52 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote: >> On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote: >>> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote: >>> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote: >>> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote: >>> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >>> >> >>> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >>> >> > > >> (3) >>> >> > > >> The priority value is assigned to each standby listed in s_s_names >>> >> > > >> even in quorum commit though those priority values are not used at all. >>> >> > > >> Users can see those priority values in pg_stat_replication. >>> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as >>> >> > > >> the priority, for example. >> >>> >> This PostgreSQL 10 open item is past due for your status update. Kindly send >>> >> a status update within 24 hours, and include a date for your subsequent status >>> >> update. Refer to the policy on open item ownership: >>> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com >> >>> >> > Since you do want (3) to change, please own it like any other open item, >>> >> > including the mandatory status updates. >>> >> >>> >> Likewise. >>> >>> As I told firstly this is not a bug. There are some proposals for better design >>> of priority column in pg_stat_replication, but we've not reached the consensus >>> yet. So I think that it's better to move this open item to "Design Decisions to >>> Recheck Mid-Beta" section so that we can hear more opinions. >> >> I'm reading that some people want to report NULL priority, some people want to >> report a constant 1 priority, and nobody wants the current behavior. Is that >> an accurate summary? > > Yes, I think that's correct. Just adding that I am the only one advocating for switching the priority number to NULL for async standbys, and that this proposal is visibly outvoted as it breaks backward-compatibility with the 0-priority setting. -- Michael
At Wed, 19 Apr 2017 03:03:38 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwE95S5GM9UZh0F3ef2D3iEwJ59skh=EwW5HmDJPe2aXog@mail.gmail.com> > On Tue, Apr 18, 2017 at 7:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Apr 18, 2017 at 6:40 PM, Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > >> At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBqSjUGx0LCDrjEDLB-yx2EvgLMdT8Nz4ZR_xpxrbMU+Q@mail.gmail.com> > >>> On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > >>> > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>> > This description looks misleading. A quorum-based sync rep is basically > >>> > more efficient when there are multiple standbys in s_s_names and you want > >>> > to replicate the transactions to some of them synchronously. I think that > >>> > this assumption should be documented explicitly. So I modified this > >>> > description. Please see the modified version in the attached patch. > >>> > >>> You're right. The modified version looks good to me, thanks. + A quorum-based synchronous replication is basically more efficient than + a priority-based one when you specify multiple standbys in + <varname>synchronous_standby_names</> and want to replicate + the transactions to some of them synchronously. In this case, + the transactions in a priority-based synchronous replication must wait for + reply from the slowest standby in synchronous standbys chosen based on + their priorities, and which may increase the transaction latencies. + On the other hand, using a quorum-based synchronous replication may + improve those latencies because it makes the transactions wait only for + replies from the requested number of faster standbys in all the listed + standbys, i.e., such slow standby doesn't block the transactions. > >> It looks better to me, too. But (even I'm not sure, of course) > >> the sentences seem to need improvement. > >> > >> | <para> > >> | Quorum-based synchronous replication is basically more > >> | efficient than priority-based one when you specify multiple > >> | standbys in <varname>synchronous_standby_names</> and want > >> | to synchronously replicate transactions to two or more of > >> | them. In the priority-based case, the replication master > >> | must wait for a reply from the slowest standby in the > >> | required number of standbys in priority order, which may > >> | slower than the rest. > > > > I supposed that Fujii-san pointed out that quorum-based sync > > replication could be more efficient when we want to replicate the > > transaction to "part of" standbys listed in s_s_names. > > Yes. Yes, am I wrote something opposing? > Anyway, I pushed the patch except this paragraph. > Regarding this paragraph, the patch for better descriptions is welcome. +1 regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Ok, I got the point. At Wed, 19 Apr 2017 17:39:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170419.173901.16598616.horiguchi.kyotaro@lab.ntt.co.jp> > > >> | <para> > > >> | Quorum-based synchronous replication is basically more > > >> | efficient than priority-based one when you specify multiple > > >> | standbys in <varname>synchronous_standby_names</> and want > > >> | to synchronously replicate transactions to two or more of > > >> | them. "Some" means "not all". > > >> | In the priority-based case, the replication master > > >> | must wait for a reply from the slowest standby in the > > >> | required number of standbys in priority order, which may > > >> | slower than the rest. Quorum-based synchronous replication is expected to be more efficient than priority-based one when your master doesn't need to be in sync with all of the nominated standbys by <varname>synchronous_standby_names</>. While quorum-based replication master waits only for a specified number of fastest standbys, priority-based replicatoin master must wait for standbys at the top of the list, which may be slower than the rest. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote: > On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote: > > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote: > >> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote: > >> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote: > >> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote: > >> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: > >> >> > >> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: > >> >> > > >> (3) > >> >> > > >> The priority value is assigned to each standby listed in s_s_names > >> >> > > >> even in quorum commit though those priority values are not used at all. > >> >> > > >> Users can see those priority values in pg_stat_replication. > >> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as > >> >> > > >> the priority, for example. > > > >> >> This PostgreSQL 10 open item is past due for your status update. Kindly send > >> >> a status update within 24 hours, and include a date for your subsequent status > >> >> update. Refer to the policy on open item ownership: > >> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com > > > >> >> > Since you do want (3) to change, please own it like any other open item, > >> >> > including the mandatory status updates. > >> >> > >> >> Likewise. > >> > >> As I told firstly this is not a bug. There are some proposals for better design > >> of priority column in pg_stat_replication, but we've not reached the consensus > >> yet. So I think that it's better to move this open item to "Design Decisions to > >> Recheck Mid-Beta" section so that we can hear more opinions. > > > > I'm reading that some people want to report NULL priority, some people want to > > report a constant 1 priority, and nobody wants the current behavior. Is that > > an accurate summary? > > Yes, I think that's correct. Okay, but ... > FWIW the reason of current behavior is that it would be useful for the > user who is willing to switch from ANY to FIRST. They can know which > standbys will become sync or potential. ... does this mean you personally want to keep the current behavior? If not, has some other person stated a wish to keep the current behavior?
On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote: > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote: >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote: >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote: >> >> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote: >> >> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote: >> >> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote: >> >> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >> >> >> >> >> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >> >> >> > > >> (3) >> >> >> > > >> The priority value is assigned to each standby listed in s_s_names >> >> >> > > >> even in quorum commit though those priority values are not used at all. >> >> >> > > >> Users can see those priority values in pg_stat_replication. >> >> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as >> >> >> > > >> the priority, for example. >> > >> >> >> This PostgreSQL 10 open item is past due for your status update. Kindly send >> >> >> a status update within 24 hours, and include a date for your subsequent status >> >> >> update. Refer to the policy on open item ownership: >> >> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com >> > >> >> >> > Since you do want (3) to change, please own it like any other open item, >> >> >> > including the mandatory status updates. >> >> >> >> >> >> Likewise. >> >> >> >> As I told firstly this is not a bug. There are some proposals for better design >> >> of priority column in pg_stat_replication, but we've not reached the consensus >> >> yet. So I think that it's better to move this open item to "Design Decisions to >> >> Recheck Mid-Beta" section so that we can hear more opinions. >> > >> > I'm reading that some people want to report NULL priority, some people want to >> > report a constant 1 priority, and nobody wants the current behavior. Is that >> > an accurate summary? >> >> Yes, I think that's correct. > > Okay, but ... > >> FWIW the reason of current behavior is that it would be useful for the >> user who is willing to switch from ANY to FIRST. They can know which >> standbys will become sync or potential. > > ... does this mean you personally want to keep the current behavior? If not, > has some other person stated a wish to keep the current behavior? No, I want to change the current behavior. IMO it's better to set priority 1 to all standbys in quorum set. I guess there is no longer person who supports the current behavior. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
At Fri, 21 Apr 2017 13:20:05 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCU+ch4b2O0iW-b_BnUs7oMcT8pcwM690XVu134k=cA+Q@mail.gmail.com> > On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote: > > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote: > >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote: > >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote: > >> >> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch <noah@leadboat.com> wrote: > >> >> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote: > >> >> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote: > >> >> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: > >> >> >> > >> >> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: > >> >> >> > > >> (3) > >> >> >> > > >> The priority value is assigned to each standby listed in s_s_names > >> >> >> > > >> even in quorum commit though those priority values are not used at all. > >> >> >> > > >> Users can see those priority values in pg_stat_replication. > >> >> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 as > >> >> >> > > >> the priority, for example. > >> > > >> >> >> This PostgreSQL 10 open item is past due for your status update. Kindly send > >> >> >> a status update within 24 hours, and include a date for your subsequent status > >> >> >> update. Refer to the policy on open item ownership: > >> >> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com > >> > > >> >> >> > Since you do want (3) to change, please own it like any other open item, > >> >> >> > including the mandatory status updates. > >> >> >> > >> >> >> Likewise. > >> >> > >> >> As I told firstly this is not a bug. There are some proposals for better design > >> >> of priority column in pg_stat_replication, but we've not reached the consensus > >> >> yet. So I think that it's better to move this open item to "Design Decisions to > >> >> Recheck Mid-Beta" section so that we can hear more opinions. > >> > > >> > I'm reading that some people want to report NULL priority, some people want to > >> > report a constant 1 priority, and nobody wants the current behavior. Is that > >> > an accurate summary? > >> > >> Yes, I think that's correct. > > > > Okay, but ... > > > >> FWIW the reason of current behavior is that it would be useful for the > >> user who is willing to switch from ANY to FIRST. They can know which > >> standbys will become sync or potential. > > > > ... does this mean you personally want to keep the current behavior? If not, > > has some other person stated a wish to keep the current behavior? > > No, I want to change the current behavior. IMO it's better to set > priority 1 to all standbys in quorum set. I guess there is no longer > person who supports the current behavior. +1 for the latter. For the former, I'd like to distinguish standbys in sync and not in the field or something if we can allow the additional complexity. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Apr 21, 2017 at 01:20:05PM +0900, Masahiko Sawada wrote: > On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote: > > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote: > >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote: > >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote: > >> >> As I told firstly this is not a bug. There are some proposals for better design > >> >> of priority column in pg_stat_replication, but we've not reached the consensus > >> >> yet. So I think that it's better to move this open item to "Design Decisions to > >> >> Recheck Mid-Beta" section so that we can hear more opinions. > >> > > >> > I'm reading that some people want to report NULL priority, some people want to > >> > report a constant 1 priority, and nobody wants the current behavior. Is that > >> > an accurate summary? > >> > >> Yes, I think that's correct. > > > > Okay, but ... > > > >> FWIW the reason of current behavior is that it would be useful for the > >> user who is willing to switch from ANY to FIRST. They can know which > >> standbys will become sync or potential. > > > > ... does this mean you personally want to keep the current behavior? If not, > > has some other person stated a wish to keep the current behavior? > > No, I want to change the current behavior. IMO it's better to set > priority 1 to all standbys in quorum set. I guess there is no longer > person who supports the current behavior. In that case, this open item is not eligible for section "Design Decisions to Recheck Mid-Beta". That section is for items where we'll probably change nothing, but we plan to recheck later just in case. Here, we expect to change the behavior; the open question is which replacement behavior to prefer. Fujii, as the owner of this open item, you are responsible for moderating the debate until there's adequate consensus to make a particular change or to keep the current behavior after all. Please proceed to do that. Beta testers deserve a UI they may like, not a UI you already plan to change later. Thanks, nm
On Thu, Apr 20, 2017 at 11:34:34PM -0700, Noah Misch wrote: > On Fri, Apr 21, 2017 at 01:20:05PM +0900, Masahiko Sawada wrote: > > On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote: > > > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote: > > >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote: > > >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote: > > >> >> As I told firstly this is not a bug. There are some proposals for better design > > >> >> of priority column in pg_stat_replication, but we've not reached the consensus > > >> >> yet. So I think that it's better to move this open item to "Design Decisions to > > >> >> Recheck Mid-Beta" section so that we can hear more opinions. > > >> > > > >> > I'm reading that some people want to report NULL priority, some people want to > > >> > report a constant 1 priority, and nobody wants the current behavior. Is that > > >> > an accurate summary? > > >> > > >> Yes, I think that's correct. > > > > > > Okay, but ... > > > > > >> FWIW the reason of current behavior is that it would be useful for the > > >> user who is willing to switch from ANY to FIRST. They can know which > > >> standbys will become sync or potential. > > > > > > ... does this mean you personally want to keep the current behavior? If not, > > > has some other person stated a wish to keep the current behavior? > > > > No, I want to change the current behavior. IMO it's better to set > > priority 1 to all standbys in quorum set. I guess there is no longer > > person who supports the current behavior. > > In that case, this open item is not eligible for section "Design Decisions to > Recheck Mid-Beta". That section is for items where we'll probably change > nothing, but we plan to recheck later just in case. Here, we expect to change > the behavior; the open question is which replacement behavior to prefer. > > Fujii, as the owner of this open item, you are responsible for moderating the > debate until there's adequate consensus to make a particular change or to keep > the current behavior after all. Please proceed to do that. Beta testers > deserve a UI they may like, not a UI you already plan to change later. Please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
On Thu, Apr 20, 2017 at 9:31 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Ok, I got the point. > > At Wed, 19 Apr 2017 17:39:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in<20170419.173901.16598616.horiguchi.kyotaro@lab.ntt.co.jp> >> > >> | <para> >> > >> | Quorum-based synchronous replication is basically more >> > >> | efficient than priority-based one when you specify multiple >> > >> | standbys in <varname>synchronous_standby_names</> and want >> > >> | to synchronously replicate transactions to two or more of >> > >> | them. > > "Some" means "not all". > >> > >> | In the priority-based case, the replication master >> > >> | must wait for a reply from the slowest standby in the >> > >> | required number of standbys in priority order, which may >> > >> | slower than the rest. > > > Quorum-based synchronous replication is expected to be more > efficient than priority-based one when your master doesn't need > to be in sync with all of the nominated standbys by > <varname>synchronous_standby_names</>. While quorum-based > replication master waits only for a specified number of fastest > standbys, priority-based replicatoin master must wait for > standbys at the top of the list, which may be slower than the > rest. This description looks good to me. I've updated the patch based on this description and attached it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Apr 24, 2017 at 9:02 AM, Noah Misch <noah@leadboat.com> wrote: > On Thu, Apr 20, 2017 at 11:34:34PM -0700, Noah Misch wrote: >> On Fri, Apr 21, 2017 at 01:20:05PM +0900, Masahiko Sawada wrote: >> > On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote: >> > > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote: >> > >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote: >> > >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote: >> > >> >> As I told firstly this is not a bug. There are some proposals for better design >> > >> >> of priority column in pg_stat_replication, but we've not reached the consensus >> > >> >> yet. So I think that it's better to move this open item to "Design Decisions to >> > >> >> Recheck Mid-Beta" section so that we can hear more opinions. >> > >> > >> > >> > I'm reading that some people want to report NULL priority, some people want to >> > >> > report a constant 1 priority, and nobody wants the current behavior. Is that >> > >> > an accurate summary? >> > >> >> > >> Yes, I think that's correct. >> > > >> > > Okay, but ... >> > > >> > >> FWIW the reason of current behavior is that it would be useful for the >> > >> user who is willing to switch from ANY to FIRST. They can know which >> > >> standbys will become sync or potential. >> > > >> > > ... does this mean you personally want to keep the current behavior? If not, >> > > has some other person stated a wish to keep the current behavior? >> > >> > No, I want to change the current behavior. IMO it's better to set >> > priority 1 to all standbys in quorum set. I guess there is no longer >> > person who supports the current behavior. >> >> In that case, this open item is not eligible for section "Design Decisions to >> Recheck Mid-Beta". That section is for items where we'll probably change >> nothing, but we plan to recheck later just in case. Here, we expect to change >> the behavior; the open question is which replacement behavior to prefer. >> >> Fujii, as the owner of this open item, you are responsible for moderating the >> debate until there's adequate consensus to make a particular change or to keep >> the current behavior after all. Please proceed to do that. Beta testers >> deserve a UI they may like, not a UI you already plan to change later. > > Please observe the policy on open item ownership[1] and send a status update > within three calendar days of this message. Include a date for your > subsequent status update. Okay, so our consensus is to always set the priorities of sync standbys to 1 in quorum-based syncrep case. Attached patch does this change. Barrying any objection, I will commit this. I will commit something to close this open item by April 28th at the latest (IOW before my vacation starts). Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Apr 24, 2017 at 2:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Thu, Apr 20, 2017 at 9:31 AM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> Ok, I got the point. >> >> At Wed, 19 Apr 2017 17:39:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in<20170419.173901.16598616.horiguchi.kyotaro@lab.ntt.co.jp> >>> > >> | <para> >>> > >> | Quorum-based synchronous replication is basically more >>> > >> | efficient than priority-based one when you specify multiple >>> > >> | standbys in <varname>synchronous_standby_names</> and want >>> > >> | to synchronously replicate transactions to two or more of >>> > >> | them. >> >> "Some" means "not all". >> >>> > >> | In the priority-based case, the replication master >>> > >> | must wait for a reply from the slowest standby in the >>> > >> | required number of standbys in priority order, which may >>> > >> | slower than the rest. >> >> >> Quorum-based synchronous replication is expected to be more >> efficient than priority-based one when your master doesn't need >> to be in sync with all of the nominated standbys by >> <varname>synchronous_standby_names</>. This description may be invalid in the case where the requested number of sync standbys is smaller than the number of "nominated" standbys by s_s_names. For example, please imagine the case where there are five standbys nominated by s_s_name, the requested number of sync standbys is 2, and only two sync standbys are running. In this case, the master needs to wait for those two standbys whatever the sync rep method is. I think that we should rewrite that to something like "quorum-based synchronous replication is more effecient when the requested number of synchronous standbys is smaller than the number of potential synchronous standbys running". > While quorum-based >> replication master waits only for a specified number of fastest >> standbys, priority-based replicatoin master must wait for >> standbys at the top of the list, which may be slower than the >> rest. > > This description looks good to me. I've updated the patch based on > this description and attached it. But I still think that the original description that I used in my patch is better than this.... Regards, -- Fujii Masao
On Tue, Apr 25, 2017 at 12:56 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Apr 24, 2017 at 9:02 AM, Noah Misch <noah@leadboat.com> wrote: >> On Thu, Apr 20, 2017 at 11:34:34PM -0700, Noah Misch wrote: >>> On Fri, Apr 21, 2017 at 01:20:05PM +0900, Masahiko Sawada wrote: >>> > On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch <noah@leadboat.com> wrote: >>> > > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote: >>> > >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch <noah@leadboat.com> wrote: >>> > >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote: >>> > >> >> As I told firstly this is not a bug. There are some proposals for better design >>> > >> >> of priority column in pg_stat_replication, but we've not reached the consensus >>> > >> >> yet. So I think that it's better to move this open item to "Design Decisions to >>> > >> >> Recheck Mid-Beta" section so that we can hear more opinions. >>> > >> > >>> > >> > I'm reading that some people want to report NULL priority, some people want to >>> > >> > report a constant 1 priority, and nobody wants the current behavior. Is that >>> > >> > an accurate summary? >>> > >> >>> > >> Yes, I think that's correct. >>> > > >>> > > Okay, but ... >>> > > >>> > >> FWIW the reason of current behavior is that it would be useful for the >>> > >> user who is willing to switch from ANY to FIRST. They can know which >>> > >> standbys will become sync or potential. >>> > > >>> > > ... does this mean you personally want to keep the current behavior? If not, >>> > > has some other person stated a wish to keep the current behavior? >>> > >>> > No, I want to change the current behavior. IMO it's better to set >>> > priority 1 to all standbys in quorum set. I guess there is no longer >>> > person who supports the current behavior. >>> >>> In that case, this open item is not eligible for section "Design Decisions to >>> Recheck Mid-Beta". That section is for items where we'll probably change >>> nothing, but we plan to recheck later just in case. Here, we expect to change >>> the behavior; the open question is which replacement behavior to prefer. >>> >>> Fujii, as the owner of this open item, you are responsible for moderating the >>> debate until there's adequate consensus to make a particular change or to keep >>> the current behavior after all. Please proceed to do that. Beta testers >>> deserve a UI they may like, not a UI you already plan to change later. >> >> Please observe the policy on open item ownership[1] and send a status update >> within three calendar days of this message. Include a date for your >> subsequent status update. > > Okay, so our consensus is to always set the priorities of sync standbys > to 1 in quorum-based syncrep case. Attached patch does this change. > Barrying any objection, I will commit this. +1 Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
At Tue, 25 Apr 2017 01:13:12 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwFZHQXfu04d+FwOOgFzvXdRoRvPrU6jFQJRF2BPLkADsQ@mail.gmail.com> > On Mon, Apr 24, 2017 at 2:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Apr 20, 2017 at 9:31 AM, Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > >> Ok, I got the point. > >> > >> At Wed, 19 Apr 2017 17:39:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrotein <20170419.173901.16598616.horiguchi.kyotaro@lab.ntt.co.jp> > >>> > >> | <para> > >>> > >> | Quorum-based synchronous replication is basically more > >>> > >> | efficient than priority-based one when you specify multiple > >>> > >> | standbys in <varname>synchronous_standby_names</> and want > >>> > >> | to synchronously replicate transactions to two or more of > >>> > >> | them. > >> > >> "Some" means "not all". > >> > >>> > >> | In the priority-based case, the replication master > >>> > >> | must wait for a reply from the slowest standby in the > >>> > >> | required number of standbys in priority order, which may > >>> > >> | slower than the rest. > >> > >> > >> Quorum-based synchronous replication is expected to be more > >> efficient than priority-based one when your master doesn't need > >> to be in sync with all of the nominated standbys by > >> <varname>synchronous_standby_names</>. > > This description may be invalid in the case where the requested number > of sync standbys is smaller than the number of "nominated" standbys by > s_s_names. For example, please imagine the case where there are five > standbys nominated by s_s_name, the requested number of sync standbys > is 2, and only two sync standbys are running. In this case, the master > needs to wait for those two standbys whatever the sync rep method is. Hmm. The 'nominated' standbys are standbys that their names are listed in the s_s_names. "your master doesn't need to be in sync with all of" means "number of sync standbys is smaller than the number of.." So it seems to be the same... for me. > I think that we should rewrite that to something like "quorum-based > synchronous replication is more effecient when the requested number > of synchronous standbys is smaller than the number of potential > synchronous standbys running". Against this phrase, "potential sync standbys" is "nominated standbys". > > While quorum-based > >> replication master waits only for a specified number of fastest > >> standbys, priority-based replicatoin master must wait for > >> standbys at the top of the list, which may be slower than the > >> rest. > > > This description looks good to me. I've updated the patch based on > > this description and attached it. > > But I still think that the original description that I used in my patch is > better than this.... I'm not good at composition, so I cannot insist on my proposal. For the convenience of others, here is the proposal from Fujii-san. + A quorum-based synchronous replication is basically more efficient than + a priority-based one when you specify multiple standbys in + <varname>synchronous_standby_names</> and want to replicate + the transactions to some of them synchronously. In this case, + the transactions in a priority-based synchronous replication must wait for + reply from the slowest standby in synchronous standbys chosen based on + their priorities, and which may increase the transaction latencies. + On the other hand, using a quorum-based synchronous replication may + improve those latencies because it makes the transactions wait only for + replies from the requested number of faster standbys in all the listed + standbys, i.e., such slow standby doesn't block the transactions. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 25 Apr 2017 09:22:59 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAG88zYUwhV9L5muNX-qPSB+AgzerFDD0JDDVoM25gKKw@mail.gmail.com> > >> Please observe the policy on open item ownership[1] and send a status update > >> within three calendar days of this message. Include a date for your > >> subsequent status update. > > > > Okay, so our consensus is to always set the priorities of sync standbys > > to 1 in quorum-based syncrep case. Attached patch does this change. > > Barrying any objection, I will commit this. > > +1 Ok, +1 from me. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Apr 25, 2017 at 2:09 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > I'm not good at composition, so I cannot insist on my > proposal. For the convenience of others, here is the proposal > from Fujii-san. > Do you see any problem with the below proposal? To me, this sounds reasonable. > + A quorum-based synchronous replication is basically more efficient than > + a priority-based one when you specify multiple standbys in > + <varname>synchronous_standby_names</> and want to replicate > + the transactions to some of them synchronously. In this case, > + the transactions in a priority-based synchronous replication must wait for > + reply from the slowest standby in synchronous standbys chosen based on > + their priorities, and which may increase the transaction latencies. > + On the other hand, using a quorum-based synchronous replication may > + improve those latencies because it makes the transactions wait only for > + replies from the requested number of faster standbys in all the listed > + standbys, i.e., such slow standby doesn't block the transactions. > Can we do few modifications like: improve those latencies --> reduce those latencies such slow standby --> a slow standby -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 25, 2017 at 8:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Apr 25, 2017 at 2:09 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> >> I'm not good at composition, so I cannot insist on my >> proposal. For the convenience of others, here is the proposal >> from Fujii-san. >> > > Do you see any problem with the below proposal? > To me, this sounds reasonable. I agree. > >> + A quorum-based synchronous replication is basically more efficient than >> + a priority-based one when you specify multiple standbys in >> + <varname>synchronous_standby_names</> and want to replicate >> + the transactions to some of them synchronously. In this case, >> + the transactions in a priority-based synchronous replication must wait for >> + reply from the slowest standby in synchronous standbys chosen based on >> + their priorities, and which may increase the transaction latencies. >> + On the other hand, using a quorum-based synchronous replication may >> + improve those latencies because it makes the transactions wait only for >> + replies from the requested number of faster standbys in all the listed >> + standbys, i.e., such slow standby doesn't block the transactions. >> > > Can we do few modifications like: > improve those latencies --> reduce those latencies > such slow standby --> a slow standby > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Apr 25, 2017 at 5:41 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Tue, 25 Apr 2017 09:22:59 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAG88zYUwhV9L5muNX-qPSB+AgzerFDD0JDDVoM25gKKw@mail.gmail.com> >> >> Please observe the policy on open item ownership[1] and send a status update >> >> within three calendar days of this message. Include a date for your >> >> subsequent status update. >> > >> > Okay, so our consensus is to always set the priorities of sync standbys >> > to 1 in quorum-based syncrep case. Attached patch does this change. >> > Barrying any objection, I will commit this. >> >> +1 > > Ok, +1 from me. Pushed. Thanks! Regards, -- Fujii Masao
At Tue, 25 Apr 2017 21:21:29 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoBqpMzQ3hnLjOrAj1PX__Bqo9XWUhSX9hzAewdbQP9QKg@mail.gmail.com> > On Tue, Apr 25, 2017 at 8:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Apr 25, 2017 at 2:09 PM, Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > >> > >> I'm not good at composition, so I cannot insist on my > >> proposal. For the convenience of others, here is the proposal > >> from Fujii-san. > >> > > > > Do you see any problem with the below proposal? > > To me, this sounds reasonable. > > I agree. Ok, I give up:p Thanks for shoving me. > >> + A quorum-based synchronous replication is basically more efficient than > >> + a priority-based one when you specify multiple standbys in > >> + <varname>synchronous_standby_names</> and want to replicate > >> + the transactions to some of them synchronously. In this case, > >> + the transactions in a priority-based synchronous replication must wait for > >> + reply from the slowest standby in synchronous standbys chosen based on > >> + their priorities, and which may increase the transaction latencies. > >> + On the other hand, using a quorum-based synchronous replication may > >> + improve those latencies because it makes the transactions wait only for > >> + replies from the requested number of faster standbys in all the listed > >> + standbys, i.e., such slow standby doesn't block the transactions. > >> > > > > Can we do few modifications like: > > improve those latencies --> reduce those latencies > > such slow standby --> a slow standby > > > > -- > > With Regards, > > Amit Kapila. > > EnterpriseDB: http://www.enterprisedb.com > > > Regards, > > -- > Masahiko Sawada > NIPPON TELEGRAPH AND TELEPHONE CORPORATION > NTT Open Source Software Center -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Apr 06, 2017 at 08:55:37AM +0200, Petr Jelinek wrote: > On 06/04/17 03:51, Noah Misch wrote: > > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: > >> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: > >>> On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: > >>>> Regarding this feature, there are some loose ends. We should work on > >>>> and complete them until the release. > >>>> > >>>> (1) > >>>> Which synchronous replication method, priority or quorum, should be > >>>> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, > >>>> a priority-based sync replication is chosen for keeping backward > >>>> compatibility. However some hackers argued to change this decision > >>>> so that a quorum commit is chosen because they think that most users > >>>> prefer to a quorum. > >> The items (1) and (3) are not bugs. So I don't think that they need to be > >> resolved before the beta release. After the feature freeze, many users > >> will try and play with many new features including quorum-based syncrep. > >> Then if many of them complain about (1) and (3), we can change the code > >> at that timing. So we need more time that users can try the feature. > > > > I've moved (1) to a new section for things to revisit during beta. If someone > > feels strongly that the current behavior is Wrong and must change, speak up as > > soon as you reach that conclusion. Absent such arguments, the behavior won't > > change. > > > > I was one of the people who said in original thread that I think the > default behavior should change to quorum and I am still of that opinion. This item appears under "decisions to recheck mid-beta". If anyone is going to push for a change here, now is the time.
On Fri, Jul 28, 2017 at 2:24 PM, Noah Misch <noah@leadboat.com> wrote: > On Thu, Apr 06, 2017 at 08:55:37AM +0200, Petr Jelinek wrote: >> On 06/04/17 03:51, Noah Misch wrote: >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote: >> >> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch <noah@leadboat.com> wrote: >> >>> On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote: >> >>>> Regarding this feature, there are some loose ends. We should work on >> >>>> and complete them until the release. >> >>>> >> >>>> (1) >> >>>> Which synchronous replication method, priority or quorum, should be >> >>>> chosen when neither FIRST nor ANY is specified in s_s_names? Right now, >> >>>> a priority-based sync replication is chosen for keeping backward >> >>>> compatibility. However some hackers argued to change this decision >> >>>> so that a quorum commit is chosen because they think that most users >> >>>> prefer to a quorum. > >> >> The items (1) and (3) are not bugs. So I don't think that they need to be >> >> resolved before the beta release. After the feature freeze, many users >> >> will try and play with many new features including quorum-based syncrep. >> >> Then if many of them complain about (1) and (3), we can change the code >> >> at that timing. So we need more time that users can try the feature. >> > >> > I've moved (1) to a new section for things to revisit during beta. If someone >> > feels strongly that the current behavior is Wrong and must change, speak up as >> > soon as you reach that conclusion. Absent such arguments, the behavior won't >> > change. >> > >> >> I was one of the people who said in original thread that I think the >> default behavior should change to quorum and I am still of that opinion. > > This item appears under "decisions to recheck mid-beta". If anyone is going > to push for a change here, now is the time. It has been 1 week since the previous mail. I though that there were others argued to change the behavior of old-style setting so that a quorum commit is chosen. If nobody is going to push for a change we can live with the current behavior? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Aug 4, 2017 at 8:19 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Jul 28, 2017 at 2:24 PM, Noah Misch <noah@leadboat.com> wrote: >> This item appears under "decisions to recheck mid-beta". If anyone is going >> to push for a change here, now is the time. > > It has been 1 week since the previous mail. I though that there were > others argued to change the behavior of old-style setting so that a > quorum commit is chosen. If nobody is going to push for a change we > can live with the current behavior? FWIW, I still see no harm in keeping backward-compatibility here, so I am in favor of a statu-quo. -- Michael
On 08/09/2017 10:49 PM, Michael Paquier wrote: > On Fri, Aug 4, 2017 at 8:19 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Fri, Jul 28, 2017 at 2:24 PM, Noah Misch <noah@leadboat.com> wrote: >>> This item appears under "decisions to recheck mid-beta". If anyone is going >>> to push for a change here, now is the time. >> >> It has been 1 week since the previous mail. I though that there were >> others argued to change the behavior of old-style setting so that a >> quorum commit is chosen. If nobody is going to push for a change we >> can live with the current behavior? > > FWIW, I still see no harm in keeping backward-compatibility here, so I > am in favor of a statu-quo. > I am vaguely in favor of making quorum the default over "ordered". However, given that anybody using sync commit without understanding/customizing the setup is going to be sorry regardless, keeping backwards compatibility is acceptable. -- Josh Berkus Containers & Databases Oh My!
On Fri, Aug 11, 2017 at 1:40 AM, Josh Berkus <josh@berkus.org> wrote: > On 08/09/2017 10:49 PM, Michael Paquier wrote: >> On Fri, Aug 4, 2017 at 8:19 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> On Fri, Jul 28, 2017 at 2:24 PM, Noah Misch <noah@leadboat.com> wrote: >>>> This item appears under "decisions to recheck mid-beta". If anyone is going >>>> to push for a change here, now is the time. >>> >>> It has been 1 week since the previous mail. I though that there were >>> others argued to change the behavior of old-style setting so that a >>> quorum commit is chosen. If nobody is going to push for a change we >>> can live with the current behavior? >> >> FWIW, I still see no harm in keeping backward-compatibility here, so I >> am in favor of a statu-quo. >> > > I am vaguely in favor of making quorum the default over "ordered". > However, given that anybody using sync commit without > understanding/customizing the setup is going to be sorry regardless, > keeping backwards compatibility is acceptable. > Thank you for the comment. FWIW, in my opinion if tte current behavior of 'N(a,b)' could confuse users and we want to break the backward compatibility, I'd rather like to remove that style in PostgreSQL 10 and to raise an syntax error to user for more safety. Also, since the syntax 'a, b' might be opaque for new users who don't know the history of s_s_names syntax, we could unify its syntax to '[ANY|FIRST] N (a, b, ...)' syntax while keeping the '*'. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Aug 16, 2017 at 4:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > FWIW, in my opinion if tte current behavior of 'N(a,b)' could confuse > users and we want to break the backward compatibility, I'd rather like > to remove that style in PostgreSQL 10 and to raise an syntax error to > user for more safety. Also, since the syntax 'a, b' might be opaque > for new users who don't know the history of s_s_names syntax, we could > unify its syntax to '[ANY|FIRST] N (a, b, ...)' syntax while keeping > the '*'. I find the removal of a syntax in release N for something introduced in release (N - 1) a bit hard to swallow from the user prospective. What about just issuing a warning instead and say that the use of ANY/FIRST is recommended? It costs nothing in maintenance to keep it around. -- Michael
On Wed, Aug 16, 2017 at 4:37 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Aug 16, 2017 at 4:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> FWIW, in my opinion if tte current behavior of 'N(a,b)' could confuse >> users and we want to break the backward compatibility, I'd rather like >> to remove that style in PostgreSQL 10 and to raise an syntax error to >> user for more safety. Also, since the syntax 'a, b' might be opaque >> for new users who don't know the history of s_s_names syntax, we could >> unify its syntax to '[ANY|FIRST] N (a, b, ...)' syntax while keeping >> the '*'. > > I find the removal of a syntax in release N for something introduced > in release (N - 1) a bit hard to swallow from the user prospective. > What about just issuing a warning instead and say that the use of > ANY/FIRST is recommended? It costs nothing in maintenance to keep it > around. Yeah, I think that would be better. If we decide to not make quorum commit the default we can issue a warning in docs. Attached a draft patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Aug 17, 2017 at 2:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Wed, Aug 16, 2017 at 4:37 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Wed, Aug 16, 2017 at 4:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> FWIW, in my opinion if tte current behavior of 'N(a,b)' could confuse >>> users and we want to break the backward compatibility, I'd rather like >>> to remove that style in PostgreSQL 10 and to raise an syntax error to >>> user for more safety. Also, since the syntax 'a, b' might be opaque >>> for new users who don't know the history of s_s_names syntax, we could >>> unify its syntax to '[ANY|FIRST] N (a, b, ...)' syntax while keeping >>> the '*'. >> >> I find the removal of a syntax in release N for something introduced >> in release (N - 1) a bit hard to swallow from the user prospective. >> What about just issuing a warning instead and say that the use of >> ANY/FIRST is recommended? It costs nothing in maintenance to keep it >> around. > > Yeah, I think that would be better. If we decide to not make quorum > commit the default we can issue a warning in docs. Attached a draft > patch. I had in mind a ereport(WARNING) in create_syncrep_config. Extra thoughts/opinions welcome. -- Michael
On Thu, Aug 17, 2017 at 1:13 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > I had in mind a ereport(WARNING) in create_syncrep_config. Extra > thoughts/opinions welcome. I think for v10 we should just document the behavior we've got; I think it's too late to be whacking things around now. For v11, we could emit a warning if we plan to deprecate and eventually remove the syntax without ANY/FIRST, but let's not do: WARNING: what you did is ok, but you might have wanted to do something else First of all, whether or not that can properly be called a warning is highly debatable. Also, if you do that sort of thing to your spouse and/or children, they call it "nagging". I don't think users will like it any more than family members do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Aug 19, 2017 at 12:28 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Aug 17, 2017 at 1:13 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> I had in mind a ereport(WARNING) in create_syncrep_config. Extra >> thoughts/opinions welcome. > > I think for v10 we should just document the behavior we've got; I > think it's too late to be whacking things around now. > > For v11, we could emit a warning if we plan to deprecate and > eventually remove the syntax without ANY/FIRST, but let's not do: > > WARNING: what you did is ok, but you might have wanted to do something else > > First of all, whether or not that can properly be called a warning is > highly debatable. Also, if you do that sort of thing to your spouse > and/or children, they call it "nagging". I don't think users will > like it any more than family members do. > It seems to me that we should discuss whether we want to keep the some syntax such as 'a,b', 'N(a,b)' before thinking whether or not that making the quorum commit the default behavior of 'N(a,b)' syntax. If we plan to remove such syntax in a future release we can live with the current code and should document it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Aug 23, 2017 at 3:04 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > It seems to me that we should discuss whether we want to keep the some > syntax such as 'a,b', 'N(a,b)' before thinking whether or not that > making the quorum commit the default behavior of 'N(a,b)' syntax. If > we plan to remove such syntax in a future release we can live with the > current code and should document it. The parsing code of repl_gram.y represents zero maintenance at the end, so let me suggest to just live with what we have and do nothing. Things kept as they are are not bad either. By changing the default, people may have their failover flows silently trapped. So if we change the default we will perhaps make some users happy, but I think that we are going to make also some people angry. That's not fun to debug silent failover issues. At the end of the day, we could just add one sentence in the docs saying the use of ANY and FIRST is encouraged over the past grammar because they are clearer to understand. -- Michael
On 08/22/2017 11:04 PM, Masahiko Sawada wrote: > WARNING: what you did is ok, but you might have wanted to do something else > > First of all, whether or not that can properly be called a warning is > highly debatable. Also, if you do that sort of thing to your spouse > and/or children, they call it "nagging". I don't think users will > like it any more than family members do. Realistically, we'll support the backwards-compatible syntax for 3-5 years. Which is fine. I suggest that we just gradually deprecate the old syntax from the docs, and then around Postgres 16 eliminate it. I posit that that's better than changing the meaning of the old syntax out from under people. -- Josh Berkus Containers & Databases Oh My!
On Thu, Aug 24, 2017 at 3:11 AM, Josh Berkus <josh@berkus.org> wrote: > On 08/22/2017 11:04 PM, Masahiko Sawada wrote: >> WARNING: what you did is ok, but you might have wanted to do something else >> >> First of all, whether or not that can properly be called a warning is >> highly debatable. Also, if you do that sort of thing to your spouse >> and/or children, they call it "nagging". I don't think users will >> like it any more than family members do. > > Realistically, we'll support the backwards-compatible syntax for 3-5 > years. Which is fine. > > I suggest that we just gradually deprecate the old syntax from the docs, > and then around Postgres 16 eliminate it. I posit that that's better > than changing the meaning of the old syntax out from under people. > It seems to me that there is no folk who intently votes for making the quorum commit the default. There some folks suggest to keep backward compatibility in PG10 and gradually deprecate the old syntax. And only the issuing from docs can be possible in PG10. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Aug 24, 2017 at 4:27 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Thu, Aug 24, 2017 at 3:11 AM, Josh Berkus <josh@berkus.org> wrote: >> On 08/22/2017 11:04 PM, Masahiko Sawada wrote: >>> WARNING: what you did is ok, but you might have wanted to do something else >>> >>> First of all, whether or not that can properly be called a warning is >>> highly debatable. Also, if you do that sort of thing to your spouse >>> and/or children, they call it "nagging". I don't think users will >>> like it any more than family members do. >> >> Realistically, we'll support the backwards-compatible syntax for 3-5 >> years. Which is fine. >> >> I suggest that we just gradually deprecate the old syntax from the docs, >> and then around Postgres 16 eliminate it. I posit that that's better >> than changing the meaning of the old syntax out from under people. >> > > It seems to me that there is no folk who intently votes for making the > quorum commit the default. There some folks suggest to keep backward > compatibility in PG10 and gradually deprecate the old syntax. And only > the issuing from docs can be possible in PG10. > According to the discussion so far, it seems to me that keeping backward compatibility and issuing a warning in docs that old syntax could be changed or removed in a future release is the most acceptable way in PG10. There is no objection against that so far and I already posted a patch to add a warning in docs[1]. I'll wait for the committer's decision. [1] https://www.postgresql.org/message-id/CAD21AoAe%2BoGSFi3bjZ%2BfW6Q%3DTK7avPdDCZLEr02zM_c-U0JsRA%40mail.gmail.com Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center