Hello Zoltán,
Thanks for your reply!
> Instead, I will post a patch that unifies my configuration choices with
> Fujii's patch.
Please know that Fujii's patch is also a work in progress. I didn't
mention in my review the previously discussed items, most important the
changing the polling loops (see e.g.
http://archives.postgresql.org/pgsql-hackers/2010-07/msg00757.php).
> Do you have suggestions for better worded GUCs?
> "slave" seems to be phased out by "standby" for political correctness, so
> "synchronous_standby" instead of "synchronous_slave". You mentioned
> "min_sync_replication_clients" -> "quorum_min_sync_standbys". What else?
>
The 'quorum_min_sync_standbys' came from a reply to the design of
Fujii's patch on
http://archives.postgresql.org/pgsql-hackers/2010-07/msg01167.php.
However after thinking about it a bit more, I still fail to see the use
case for a maximum quorum number. Having a 'max quorum' also seems to
contradict common meanings of 'quorum', which in itself is a minimum,
minimum number, least required number, lower limit.
Having also sync in the name is useful, imho, so people know it's not
about async servers. So then the name would become quorum_sync_standbys
or synchronous_standby_quorum.
Though I think I wouldn't use 'strict_sync_replication' (explained in
http://archives.postgresql.org/pgsql-hackers/2010-04/msg01516.php) I can
imagine others want this feature, to not have the master halt by sync
standby failure. If that's what somebody never want, than the way this
is solved by this parameter is elegant: only wait if they are connected.
In recovery.conf, a boolean to discern between sync and async servers
(like in your patch) is IMHO better than mixing 'sync or async' with the
replication modes. Together with the replication modes, this could then
become
synchronous_standby (boolean)
synchronous_mode (recv,fsync,replay)
> You also noticed that my patch addressed 2PC, maybe I will have to add
> this part to Fujii's patch, too. Note: I haven't yet read his patch,
> maybe working with LSNs instead of XIDs make this work automatically,
> I don't know.
Yes, and I think we definately need automated replicated cluster tests.
A large part of reviewing went into virtual machine setup and cluster
setup. I'm not sure if a full test suite that includes node failures
could or should be included in the core regression test, but anything
that automates setup, a few transactions and failures would benefit
everyone working and testing on replication.
regards,
Yeb Havinga