Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Support for N synchronous standby servers - take 2
Date
Msg-id CAHGQGwHc838dvraSD2MynnpnVUpwrGWxRUmu7_oe8YaUSegPHQ@mail.gmail.com
Whole thread Raw
In response to Re: Support for N synchronous standby servers - take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Support for N synchronous standby servers - take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Wed, Jan 20, 2016 at 2:35 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Jan 19, 2016 at 1:52 AM, Thom Brown <thom@linux.com> wrote:
>> On 3 January 2016 at 13:26, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Fri, Dec 25, 2015 at 7:21 AM, Thomas Munro
>>> <thomas.munro@enterprisedb.com> wrote:
>>>> On Fri, Dec 25, 2015 at 8:50 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>> On Wed, Dec 23, 2015 at 8:45 AM, Thomas Munro
>>>>> <thomas.munro@enterprisedb.com> wrote:
>>>>>> On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
>>>>>> <thomas.munro@enterprisedb.com> wrote:
>>>>>>> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
>>>>>>> above, then this function could be renamed to SyncRepGetSyncStandbys.
>>>>>>> I think it would be a tiny bit nicer if it also took a Size n argument
>>>>>>> along with the output buffer pointer.
>>>>>
>>>>> Sorry, I could not get your point. SyncRepGetSyncStandbysPriority()
>>>>> function uses synchronous_standby_num which is global variable.
>>>>> But you mean that the number of synchronous standbys is given as
>>>>> function argument?
>>>>
>>>> Yeah, I was thinking of it as the output buffer size which I would be
>>>> inclined to make more explicit (I am still coming to terms with the
>>>> use of global variables in Postgres) but it doesn't matter, please
>>>> disregard that suggestion.
>>>>
>>>>>>> As for the body of that function (which I won't paste here), it
>>>>>>> contains an algorithm to find the top K elements in an array of N
>>>>>>> elements.  It does that with a linear search through the top K seen so
>>>>>>> far for each value in the input array, so its worst case is O(KN)
>>>>>>> comparisons.  Some of the sorting gurus on this list might have
>>>>>>> something to say about that but my take is that it seems fine for the
>>>>>>> tiny values of K and N that we're dealing with here, and it's nice
>>>>>>> that it doesn't need any space other than the output buffer, unlike
>>>>>>> some other top-K algorithms which would win for larger inputs.
>>>>>
>>>>> Yeah, it's improvement point.
>>>>> But I'm assumed that the number of synchronous replication is not
>>>>> large, so I use this algorithm as first version.
>>>>> And I think that its worst case is O(K(N-K)). Am I missing something?
>>>>
>>>> You're right, I was dropping that detail, in the tradition of the
>>>> hand-wavy school of big-O notation.  (I suppose you could skip the
>>>> inner loop when the priority is lower than the current lowest
>>>> priority, giving a O(N) best case when the walsenders are perfectly
>>>> ordered by coincidence.  Probably a bad idea or just not worth
>>>> worrying about.)
>>>
>>> Thank you for reviewing the patch.
>>> Yeah, I added the logic that skip the inner loop.
>>>
>>>>
>>>>> Attached latest version patch.
>>>>
>>>> +/*
>>>> + * Obtain currently synced LSN location: write and flush, using priority
>>>> - * In 9.1 we support only a single synchronous standby, chosen from a
>>>> - * priority list of synchronous_standby_names. Before it can become the
>>>> + * In 9.6 we support multiple synchronous standby, chosen from a priority
>>>>
>>>> s/standby/standbys/
>>>>
>>>> + * list of synchronous_standby_names. Before it can become the
>>>>
>>>> s/Before it can become the/Before any standby can become a/
>>>>
>>>>   * synchronous standby it must have caught up with the primary; that may
>>>>   * take some time. Once caught up, the current highest priority standby
>>>>
>>>> s/standby/standbys/
>>>>
>>>>   * will release waiters from the queue.
>>>>
>>>> +bool
>>>> +SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>>>> +{
>>>> + int sync_standbys[synchronous_standby_num];
>>>>
>>>> I think this should be sync_standbys[SYNC_REP_MAX_SYNC_STANDBY_NUM].
>>>> (Variable sized arrays are a feature of C99 and PostgreSQL is written
>>>> in C89.)
>>>>
>>>> +/*
>>>> + * Populate a caller-supplied array which much have enough space for
>>>> + * synchronous_standby_num. Returns position of standbys currently
>>>> + * considered as synchronous, and its length.
>>>> + */
>>>> +int
>>>> +SyncRepGetSyncStandbys(int *sync_standbys)
>>>>
>>>> s/much/must/ (my bad, in previous email).
>>>>
>>>> + ereport(ERROR,
>>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>>> + errmsg("The number of synchronous standbys must be smaller than the
>>>> number of listed : %d",
>>>> + synchronous_standby_num)));
>>>>
>>>> How about "the number of synchronous standbys exceeds the length of
>>>> the standby list: %d"?  Error messages usually start with lower case,
>>>> ':' is not usually preceded by a space.
>>>>
>>>> + ereport(ERROR,
>>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>>> + errmsg("The number of synchronous standbys must be between 1 and %d : %d",
>>>>
>>>> s/The/the/, s/ : /: /
>>>
>>> Fixed you mentioned.
>>>
>>> Attached latest v5 patch.
>>> Please review it.
>>
>> synchronous_standby_num doesn't appear to be a valid GUC name:
>>
>> LOG:  unrecognized configuration parameter "synchronous_standby_num"
>> in file "/home/thom/Development/test/primary/postgresql.conf" line 244
>>
>> All I did was uncomment it and set it to a value.
>>
>
> Thank you for having a look it.
>
> Yeah, synchronous_standby_num should not exists in postgresql.conf.
> Please test for multiple sync replication with latest patch.

In synchronous_replication_method = 'priority' case, when I set
synchronous_standby_names to invalid value like 'hoge,foo' and
reloaded the configuration file, the server crashed with
the following error. This crash should not happen.
   FATAL:  invalid input syntax for integer: "hoge"

+    /*
+     * After read all synchronous replication configuration parameter, we apply
+     * settings according to replication method.
+     */
+    ProcessSynchronousReplicationConfig();

Why does the above function need to be called in ProcessConfigFile(), i.e.,
by every postgres processes? I was thinking that only walsender should
call that to check which walsender is synchronous according to the setting.

When synchronous_replication_method = '1-priority' and
synchronous_standby_names = '*', I started one synchronous standby.
Then, when I ran "SELECT * FROM pg_stat_replication", I got the
following WARNING message.
   WARNING:  detected write past chunk end in ExprContext 0x2acb3c0

I don't think that it's good design to specify the number of sync replicas
to wait for, in synchronous_standby_names. It's confusing for the users.
It's better to add separate parameter (synchronous_standby_num) for
specifying that number. Which increases the number of GUC parameters,
though.

Are we really planning to implement synchronous_replication_method=quorum
at the first version? If not, I'd like to remove s_r_method parameter
because it's meaningless. We can add it later when we implement "quorum".

Regards,

-- 
Fujii Masao



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Template for commit messages
Next
From: Alvaro Herrera
Date:
Subject: Re: Weighted Stats