Re: Review of Synchronous Replication patches - Mailing list pgsql-hackers

From zb@cybertec.at
Subject Re: Review of Synchronous Replication patches
Date
Msg-id 45f2f88591b7bc77ba03839929736061.squirrel@internal.cybertec.at
Whole thread Raw
In response to Review of Synchronous Replication patches  (Yeb Havinga <yebhavinga@gmail.com>)
Responses Re: Review of Synchronous Replication patches
Re: Review of Synchronous Replication patches
List pgsql-hackers
Hi,

> ... Off the list I've received word from Zoltan
> that work on a new patch is planned. It would be great if ideas from
> both patches could be merged into one.

...

> * Does it follow SQL spec, or the community-agreed behavior?
> A: Unknown, though the choices in guc parameters suggest the patch's
> been made to support actual use cases.
> B: Design of patch B has been discussed on the mailing list as well as
> the wiki (for links I refer to my previous mail)

> * Have all the bases been covered?
> Patch A seems to cover two-phase commit where patch B does not. (I'm
> time constrained and currently do not have a replicated cluster with
> patch B anymore, so I cannot test).

...

> Differences:
> Patch A synchronizes by sending back the Xids that have been received
> and written to disk. When the master receives the xids, the respective
> backends with having those xids active are unlocked and signalled to
> continue. This means some locking of the procarray. The libpq protocol
> was adapted so a client (the walreceiver) can report back the xids.
> Patch B synchronizes by waiting to send wal data, until the sync
> replicas have sending back LSN pointers in the wal files they have
> synced to, in one of three ways. The libpq protocol was adapted so the
> walreceiver can report back WAL file positions.
>
> Perhaps the weakness of both patches is that they are not one. In my
> opinion, both patches have their strengths. It would be great if these
> strenght could be unified in a single patch.
>
> patch A strengths:
> * design of the guc parameters - meaning of
> min_sync_replication_clients. As I understand it, it is not possible
> with patch B to define a minimum number of synchronous standby servers a
> transaction must be replicated to, before the commit succeeds. Perhaps
> the name could be changed (quorum_min_sync_standbys?), but in my opinion
> the definitive sync replication patch needs a parameter with this number
> and exact meaning. The 'synchronous_slave' guc in boolean is also a good
> one in my opinion.
>
> patch B strengths:
> * having the walsender synced by waiting for acked LSN, instead of
> signalling respective backends with a specific XID active, seems like a
> simpler and therefore better solution.
> * the three different ways when to sync (recv,flush and replay), and
> also that this is configurable per standby.
>
> In patch B there's probably some work to do in WaitXLogSend(), after a
> long day I'm having troubles understanding it's meaning but somehow I
> believe that a more intuitive set of guc parameters can be found,
> accompanied by clear code in this function. The way patch A synchronizes
> and count synchronous standbys for a commit is clearer.

This week I am on vacation but we discussed your review a little with
Hans-Jürgen Schönig and given the opinion and the consensus that sending
back LSNs is a better solution, I will withdraw my patch from the
commitfest page.

Instead, I will post a patch that unifies my configuration choices with
Fujii's patch. Do you have suggestions for better worded GUCs?
"slave" seems to be phased out by "standby" for political correctness, so
"synchronous_standby" instead of "synchronous_slave". You mentioned
"min_sync_replication_clients" -> "quorum_min_sync_standbys". What else?

You also noticed that my patch addressed 2PC, maybe I will have to add
this part to Fujii's patch, too. Note: I haven't yet read his patch,
maybe working with LSNs instead of XIDs make this work automatically,
I don't know. We should definitely test.

Best regards,
Zoltán Böszörményi




pgsql-hackers by date:

Previous
From: Yeb Havinga
Date:
Subject: Review of Synchronous Replication patches
Next
From: Marko Tiikkaja
Date:
Subject: Re: Rewrite, normal execution vs. EXPLAIN ANALYZE