From: Michael Paquier
Subject: Re: Support for N synchronous standby servers - take 2
Date:
Msg-id: CAB7nPqQdS7wmPVXqJxF7ZgTM0L-mxM0-ohadL7=e0+UjjpsJGw@mail.gmail.com
In response to: Re: Support for N synchronous standby servers - take 2 (Josh Berkus <josh@agliodbs.com>)
List: pgsql-hackers
On Mon, Jun 29, 2015 at 4:20 AM, Josh Berkus <josh@agliodbs.com> wrote:
> On 06/28/2015 04:36 AM, Sawada Masahiko wrote:
>> On Sat, Jun 27, 2015 at 3:53 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>> On 06/26/2015 11:32 AM, Robert Haas wrote:
>>>> I think your proposal is worth considering, but you would need to fill
>>>> in a lot more details and explain how it works in detail, rather than
>>>> just via a set of example function calls. The GUC-based syntax
>>>> proposal covers cases like multi-level rules and, now, prioritization,
>>>> and it's not clear how those would be reflected in what you propose.
>>>
>>> So what I'm seeing from the current proposal is:
>>>
>>> 1. we have several defined synchronous sets
>>> 2. each set requires a quorum of k (defined per set)
>>> 3. within each set, replicas are arranged in priority order.
>>>
>>> One thing which the proposal does not implement is *names* for
>>> synchronous sets. I would also suggest that if I lose this battle and
>>> we decide to go with a single stringy GUC, that we at least use JSON
>>> instead of defining our own proprietary syntax?
>>
>> JSON would be more flexible for making synchronous sets, but it would
>> force us to change how the configuration file is parsed to allow a
>> value that contains newlines.
>
> Right. Well, another reason we should be using a system catalog and not
> a single GUC ...

I assume that this takes into account the fact that you will still need a SIGHUP to properly reload the new node information from those catalogs and to track whether some information has been modified or not, and the fact that a connection to those catalogs will be needed as well, something that we don't have now. Another barrier to the catalog approach is that catalogs get replicated to the standbys, and I think that we want to avoid that. But perhaps you simply meant having an SQL interface with some metadata, right? Perhaps I got confused by the word 'catalog'.

>>> I'm personally not convinced that quorum and prioritization are
>>> compatible. I suggest instead that quorum and prioritization should be
>>> exclusive alternatives, that is that a synch set should be either a
>>> quorum set (with all members as equals) or a prioritization set (if rep1
>>> fails, try rep2). I can imagine use cases for either mode, but not one
>>> which would involve doing both together.
>>
>> Yep, separating the GUC parameter between prioritization and quorum
>> could also be a good idea.
>
> We're agreed, then ...

Er, I disagree here. Being able to get prioritization and quorum working together is a requirement of this feature in my opinion. Using again the example above with two data centers, being able to define a prioritization set for the nodes of data center 1 and a quorum set for data center 2 would reduce the failure probability by preventing problems where, for example, one or more nodes lag behind (improving performance at the same time).
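To make that example a bit more concrete, here is a rough sketch of what such a mixed definition could look like if we went the JSON route Josh suggests (the structure and field names below are purely invented for the sake of discussion, nothing of this is designed or implemented):

    {"groups": {
        "dc1": {"type": "priority", "nodes": ["node_a", "node_b"]},
        "dc2": {"type": "quorum", "quorum": 2,
                "nodes": ["node_c", "node_d", "node_e"]}},
     "wait_for": ["dc1", "dc2"]}

The idea being that a commit would wait for the highest-priority live node of dc1 and for any two nodes of dc2.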
>> Also I think that we must enable us to decide which server we should
>> promote when the master server is down.
>
> Yes, and probably my biggest issue with this patch is that it makes
> deciding which server to fail over to *more* difficult (by adding more
> synchronous options) without giving the DBA any more tools to decide how
> to fail over. Aside from "because we said we'd eventually do it", what
> real-world problem are we solving with this patch?

Hm. This patch needs to be coupled with improvements to pg_stat_replication so as to be able to represent a node tree, basically by adding to which group a node is assigned. I can draft that if needed; I am just a bit too lazy now... Honestly, this is not a matter of tooling. Even today, if a DBA wants to change s_s_names without touching postgresql.conf directly, they can just run ALTER SYSTEM and then reload the parameters.
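For example, with the existing comma-separated syntax (the standby names here are just placeholders):

    ALTER SYSTEM SET synchronous_standby_names = 'standby_1, standby_2';
    SELECT pg_reload_conf();

The new value lands in postgresql.auto.conf and is picked up at reload, with no manual edit of postgresql.conf needed.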
> It's always been a problem that one can accomplish a de-facto
> denial-of-service by joining a cluster using the same application_name
> as the synch standby, more so because it's far too easy to do that
> accidentally. One needs to simply make the mistake of copying
> recovery.conf from the synch replica instead of the async replica, and
> you've created a reliability problem.

That's a scripting problem then. There are many ways to make a false manipulation in this area when setting up a standby: the application_name value is one, and you can do worse by pointing to an incorrect IP, missing a firewall filter, or pointing to an incorrect port.

> Also, the fact that we use application_name for synch_standby groups
> prevents us from giving the standbys in the group their own names for
> identification purposes. It's only the fact that synchronous groups are
> relatively useless in the current feature set that's prevented this from
> being a real operational problem; if we implement quorum commit, then
> users are going to want to use groups more often and will want to
> identify the members of the group, and not just by IP address.

Managing groups in the synchronous protocol adds one level of complexity for the operator, while what I had in mind first was to allow a user to pass the server a formula that decides whether synchronous_commit is satisfied or not. In any case, thinking about it now, this feels like a different feature.

> We *really* should have discussed this feature at PGCon.

What is done is done. Sawada-san and I met last weekend, and we agreed to get a clear image of a spec for this feature on this thread before doing any coding. So let's continue the discussion.
--
Michael