
From: Michael Paquier
Subject: Re: Support for N synchronous standby servers - take 2
Msg-id: CAB7nPqQdS7wmPVXqJxF7ZgTM0L-mxM0-ohadL7=e0+UjjpsJGw@mail.gmail.com
In response to: Re: Support for N synchronous standby servers - take 2 (Josh Berkus <josh@agliodbs.com>)
List: pgsql-hackers
On Mon, Jun 29, 2015 at 4:20 AM, Josh Berkus <josh@agliodbs.com> wrote:
> On 06/28/2015 04:36 AM, Sawada Masahiko wrote:
>> On Sat, Jun 27, 2015 at 3:53 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>> On 06/26/2015 11:32 AM, Robert Haas wrote:
>>>> I think your proposal is worth considering, but you would need to fill
>>>> in a lot more details and explain how it works in detail, rather than
>>>> just via a set of example function calls.  The GUC-based syntax
>>>> proposal covers cases like multi-level rules and, now, prioritization,
>>>> and it's not clear how those would be reflected in what you propose.
>>>
>>> So what I'm seeing from the current proposal is:
>>>
>>> 1. we have several defined synchronous sets
>>> 2. each set requires a quorum of k  (defined per set)
>>> 3. within each set, replicas are arranged in priority order.
>>>
>>> One thing which the proposal does not implement is *names* for
>>> synchronous sets.  I would also suggest that if I lose this battle and
>>> we decide to go with a single stringy GUC, that we at least use JSON
>>> instead of defining our own proprietary syntax?
>>
>> JSON would be more flexible for defining synchronous sets, but it
>> would force us to change how the configuration file is parsed so
>> that a value can contain newlines.
>
> Right.  Well, another reason we should be using a system catalog and not
> a single GUC ...

I assume that this takes into account the fact that you would still
need a SIGHUP to properly reload the new node information from those
catalogs, and a way to track whether that information has been
modified. A connection to those catalogs would be needed as well,
something that we don't have now. Another barrier to the catalog
approach is that catalogs get replicated to the standbys, and I think
we want to avoid that. But perhaps you simply meant an SQL interface
with some metadata, right? Perhaps I got confused by the word
'catalog'.
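
As for the JSON format mentioned upthread, here is the kind of value
I imagine we are talking about (the keys are purely hypothetical,
just to fix ideas; note the embedded newlines, which is exactly the
parsing problem Sawada-san points out):

    synchronous_standby_names = '{
      "sync_sets": [
        {"name": "dc1", "mode": "priority", "num_sync": 1,
         "members": ["node_a", "node_b"]},
        {"name": "dc2", "mode": "quorum", "num_sync": 2,
         "members": ["node_c", "node_d", "node_e"]}
      ]
    }'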

>>> I'm personally not convinced that quorum and prioritization are
>>> compatible.  I suggest instead that quorum and prioritization should be
>>> exclusive alternatives, that is that a synch set should be either a
>>> quorum set (with all members as equals) or a prioritization set (if rep1
>>> fails, try rep2).  I can imagine use cases for either mode, but not one
>>> which would involve doing both together.
>>>
>>
>> Yep, separating the GUC parameter between prioritization and quorum
>> could also be a good idea.
>
> We're agreed, then ...

Er, I disagree here. Being able to get prioritization and quorum
working together is, in my opinion, a requirement of this feature.
Taking again the two-data-center example upthread: defining a
prioritization set on the nodes of data center 1 and a quorum set in
data center 2 would reduce the failure probability by preventing
problems where, for example, one or more nodes lag behind (and would
improve performance at the same time).
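
To illustrate with a completely hypothetical syntax (the shape is
what matters here, not the exact grammar):

    # DC1 as a priority set, DC2 as a quorum set, in a single value
    synchronous_standby_names =
        'priority(1: node_a, node_b), quorum(2: node_c, node_d, node_e)'

A commit would then wait for the highest-priority live node in DC1
plus any two nodes in DC2.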

>> Also I think that we must make it possible to decide which server
>> should be promoted when the master server is down.
>
> Yes, and probably my biggest issue with this patch is that it makes
> deciding which server to fail over to *more* difficult (by adding more
> synchronous options) without giving the DBA any more tools to decide how
> to fail over.  Aside from "because we said we'd eventually do it", what
> real-world problem are we solving with this patch?

Hm. This patch needs to be coupled with improvements to
pg_stat_replication so that it can represent a node tree, basically
by showing which group each node is assigned to. I can draft that if
needed; I am just a bit too lazy now...

Honestly, this is not a matter of tooling. Even today, if a DBA wants
to change s_s_names without touching postgresql.conf, they can just
run ALTER SYSTEM and then reload parameters.
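
For example, with the single GUC value we have today:

    ALTER SYSTEM SET synchronous_standby_names = 'node_a, node_b';
    SELECT pg_reload_conf();

No edit of postgresql.conf is needed; the new value is written to
postgresql.auto.conf and picked up at reload.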

> It's always been a problem that one can accomplish a de-facto
> denial-of-service by joining a cluster using the same application_name
> as the synch standby, more so because it's far too easy to do that
> accidentally.  One needs to simply make the mistake of copying
> recovery.conf from the synch replica instead of the async replica, and
> you've created a reliability problem.

That's a scripting problem, then. There are many ways to get this
wrong when setting up a standby. The application_name value is one;
you can do worse by pointing at an incorrect IP address, missing a
firewall rule, or using an incorrect port.
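
Concretely, the mistake boils down to copying a recovery.conf like
this one from the wrong node (host and names invented for the
example):

    # recovery.conf taken verbatim from the synchronous replica
    standby_mode = 'on'
    primary_conninfo = 'host=192.0.2.10 port=5432 application_name=sync_node'

Start a second standby with that file and it competes for the
synchronous slot under the same application_name.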

> Also, the fact that we use application_name for synch_standby groups
> prevents us from giving the standbys in the group their own names for
> identification purposes.  It's only the fact that synchronous groups are
> relatively useless in the current feature set that's prevented this from
> being a real operational problem; if we implement quorum commit, then
> users are going to want to use groups more often and will want to
> identify the members of the group, and not just by IP address.

Managing groups in the synchronous protocol adds one level of
complexity for the operator, while what I had in mind first was to
allow a user to pass the server a formula that decides whether
synchronous_commit is satisfied or not. In any case, thinking about
it now, this feels like a different feature.
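
To be clear about what I mean by "formula", something of this flavor
(no such parameter exists; this is pure hand-waving):

    # hypothetical: a commit is acknowledged once this expression is true
    synchronous_commit_policy =
        '(node_a AND node_b) OR quorum(2, node_c, node_d, node_e)'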

> We *really* should have discussed this feature at PGCon.

What is done is done. Sawada-san and I met last weekend, and we
agreed to get a clear image of a spec for this feature on this thread
before doing any coding. So let's continue the discussion.
-- 
Michael


