Re: Quorum commit for multiple synchronous replication. - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Quorum commit for multiple synchronous replication.
Date
Msg-id CAD21AoAs+V1iVK9ki75gw6Z0cm2O+TKcznuSrwJd4GJajNGJww@mail.gmail.com
Whole thread Raw
In response to Re: Quorum commit for multiple synchronous replication.  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Quorum commit for multiple synchronous replication.  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Tue, Oct 11, 2016 at 4:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Sep 28, 2016 at 5:14 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Sat, Sep 24, 2016 at 5:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> I still vote for changing behaviour of existing syntax 'k (n1, n2)' to
>>> quorum commit.
>>> That is,
>>> 1.  'First k (n1, n2, n3)' means that the master server waits for ACKs
>>> from k standby servers whose name appear earlier in the list.
>>> 2.  'Any k (n1, n2, n3)' means that the master server waits for ACKs
>>> from any k listed standby servers.
>>> 3.  'n1, n2, n3' is the same as #1 with k=1.
>>> 4.  '(n1, n2, n3)' is the same as #2 with k=1.
>>
>> OK, so I have done a review of this patch keeping that in mind as
>> that's the consensus. I am still getting familiar with the code...
>
> Thank you for reviewing!
>
>> -    transactions will wait until all the standby servers which are considered
>> +    transactions will wait until the multiple standby servers which
>> are considered
>> There is no real need to update this sentence.
>>
>> +        <literal>FIRST</> means to control the standby servers with
>> +        different priorities. The synchronous standbys will be those
>> +        whose name appear earlier in this list, and that are both
>> +        currently connected and streaming data in real-time(as shown
>> +        by a state of <literal>streaming</> in the
>> +        <link linkend="monitoring-stats-views-table">
>> +        <literal>pg_stat_replication</></link> view). Other standby
>> +        servers appearing later in this list represent potential
>> +        synchronous standbys. If any of the current synchronous
>> +        standbys disconnects for whatever reason, it will be replaced
>> +        immediately with the next-highest-priority standby.
>> +        For example, a setting of <literal>FIRST 3 (s1, s2, s3, s4)</>
>> +        makes transaction commits wait until their WAL records are received
>> +        by three higher-priority standbys chosen from standby servers
>> +        <literal>s1</>, <literal>s2</>, <literal>s3</> and <literal>s4</>.
>> It does not seem necessary to me to enter in this level of details:
>> The keyword FIRST, coupled with an integer number N, chooses the first
>> N higher-priority standbys and makes transaction commit when their WAL
>> records are received. For example <literal>FIRST 3 (s1, s2, s3, s4)</>
>> makes transaction commits wait until their WAL records are received by
>> the three high-priority standbys chosen from standby servers s1, s2,
>> s3 and s4.
>
> Will fix.
>
>> +        <literal>ANY</> means to control all of standby servers with
>> +        same priority. The master sever will wait for receipt from
>> +        at least <replaceable class="parameter">num_sync</replaceable>
>> +        standbys, which is quorum commit in the literature. The all of
>> +        listed standbys are considered as candidate of quorum commit.
>> +        For example, a setting of <literal> ANY 3 (s1, s2, s3, s4)</> makes
>> +        transaction commits wait until receiving receipts from at least
>> +        any three standbys of four listed servers <literal>s1</>,
>> +        <literal>s2</>, <literal>s3</>, <literal>s4</>.
>>
>> Similarly, something like that...
>> The keyword ANY, coupled with an integer number N, chooses N standbys
>> in a set of standbys with the same, lowest, priority and makes
>> transaction commit when WAL records are received those N standbys. For
>> example ANY 3(s1, s2, s3, s4) makes transaction commits wait until WAL
>> records have been received from 3 servers in the set s1, s2, s3 and
>> s4.
>
> Will fix.
>
>> It could be good also to mention that no keyword specified means ANY,
>> which is incompatible with 9.6. The docs also miss the fact that if a
>> simple list of servers is given, without parenthesis and keywords,
>> this is equivalent to FIRST 1.
>
> Right. I will add those documentations.
>
>> -synchronous_standby_names = '2 (s1, s2, s3)'
>> +synchronous_standby_names = 'First 2 (s1, s2, s3)'
>> Nit here. It may be a good idea to just use upper-case characters in
>> the docs, or just lower-case for consistency, but not mix both.
>> Usually GUCs use lower-case characters.
>
> Agree. Will fix.
>
>> +      when standby is considered as a condidate of quorum commit.</entry>
>> s/condidate/candidate/
>
> Will fix.
>
>> -syncrep_scanner.c: FLEXFLAGS = -CF -p
>> +syncrep_scanner.c: FLEXFLAGS = -CF -p -i
>> Hm... Is that actually a good idea? Now "NODE" and "node" are two
>> different things for application_name, but with this patch both would
>> have the same meaning. I am getting to think that we could just use
>> the lower-case characters for the keywords any/first. Is this -i
>> switch a problem for elements in standby_list?
>
> The string of standby name is not changed actually, only the parser
> doesn't distinguish between "NODE" and "node".
> The values used for checking application_name will still works fine.
> If we want to name "first" or "any" as the standby name then it should
> be double quoted.
>
>> + * Calculate the 'pos' newest Write, Flush and Apply positions among
>> sync standbys.
>> I don't understand this comment.
>>
>> +   if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
>> +       got_recptr = SyncRepGetOldestSyncRecPtr(&writePtr, &flushPtr,
>> +                                            &applyPtr, &am_sync);
>> +   else /* SYNC_REP_QUORUM */
>> +       got_recptr = SyncRepGetNNewestSyncRecPtr(&writePtr, &flushPtr,
>> +                                             &applyPtr,
>> SyncRepConfig->num_sync,
>> +                                             &am_sync);
>> Those could be grouped together, there is no need to have pos as an argument.
>
> Will fix.
>
>> +   /* In quroum method, all sync standby priorities are always 1 */
>> +   if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
>> +       priority = 1;
>> This is dead code, SyncRepGetSyncStandbysPriority is not called for
>> QUORUM.
>
> Well, this code is in SyncRepGetStandbyPriority which is called by
> SyncRepInitConifig.
> SyncRepGetStandbyPriority can be called regardless of the the
> synchronization method.
>
>
>> You may want to add an assert in
>> SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum to be
>> sure that they are getting called for the correct method.
>> +   /* Sort each array in descending order to get 'pos' newest element */
>> +   qsort(write_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> +   qsort(flush_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> +   qsort(apply_array, len, sizeof(XLogRecPtr), cmp_lsn);
>> There is no need to reorder things again and to use arrays, you can
>> choose the newest LSNs when scanning the WalSnd entries.
>
> I considered it that but it depends on performance.
> Current patch avoids O(N*M).
>

Attached latest patch.
Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: When is TopMemoryContext released ?
Next
From: Pavan Deolasee
Date:
Subject: Re: FSM corruption leading to errors