Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Support for N synchronous standby servers - take 2
Msg-id CAB7nPqQ5dOBFqUL8OfKzjJA7JGf_gqZsO+c8YwWYeZPcXgeH6A@mail.gmail.com
In response to Re: Support for N synchronous standby servers - take 2  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Support for N synchronous standby servers - take 2  (Robert Haas <robertmhaas@gmail.com>)
Re: Support for N synchronous standby servers - take 2  (Sawada Masahiko <sawada.mshk@gmail.com>)
Re: Support for N synchronous standby servers - take 2  (Peter Eisentraut <peter_e@gmx.net>)
Re: Support for N synchronous standby servers - take 2  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Jun 25, 2015 at 8:32 PM, Simon Riggs wrote:
> Let's start with a complex, fully described use case then work out how to
> specify what we want.

Well, one of the simplest cases where quorum commit and this
feature would be useful is the following, with 2 data centers:
- on center 1, master A and standby B
- on center 2, standby C and standby D
With the current synchronous_standby_names, all we can do is
ensure that one node has acknowledged the commit on the master, for
example with synchronous_standby_names = 'B,C,D'. But you know that :)
What this feature would allow us to do is, for example, ensure
that a node in data center 2 has acknowledged the commit on the
master, meaning that even if data center 1 is completely lost for one
reason or another, we have at least one node in center 2 that has
lost no data at transaction commit.
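To make the limitation concrete, here is a simplified model of the existing single-standby behavior, assuming that synchronous_standby_names is read as a priority list and that the first listed standby which is currently connected is the one a commit waits for (it ignores '*' and the streaming-state check). The function name current_sync_standby is hypothetical, not PostgreSQL code. With all standbys up, the only acknowledged copy is B, which sits in the same center as master A.

```python
from typing import Optional

def current_sync_standby(standby_names: str, connected: set[str]) -> Optional[str]:
    """Hypothetical model: return the single standby a commit waits on.

    Names are scanned in priority order; the first connected one wins.
    Returns None when no listed standby is connected.
    """
    for name in (n.strip() for n in standby_names.split(",")):
        if name in connected:
            return name
    return None

# Layout from the example: center 1 holds master A and standby B,
# center 2 holds standbys C and D.
CENTER = {"B": 1, "C": 2, "D": 2}

sync = current_sync_standby("B,C,D", connected={"B", "C", "D"})
print(sync, "in center", CENTER[sync])  # B in center 1
```

Only if B disconnects does C (in center 2) take over, so nothing in the current syntax forces an acknowledgement from center 2.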

Now, regarding the way to express that, we need the concept of a
node group for the elements of synchronous_standby_names. A group
contains a set of elements, each element being either a group or a
single node. And for each group we need to know three things when a
commit needs to be acknowledged:
- Does my group need to acknowledge the commit?
- If yes, how many elements in my group need to acknowledge it?
- Does the order of my elements matter?
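One possible in-memory shape for such a node group is sketched below, assuming a tree where leaves are standby names. The classes Node and Group and their fields are hypothetical names for illustration. In this sketch the first question is answered implicitly by where an element sits in the tree, while the second and third become the fields k and ordered.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """A single standby, identified by its name (hypothetical model)."""
    name: str

@dataclass
class Group:
    """A set of elements, each either a Node or a nested Group.

    k       -- how many elements must acknowledge the commit
    ordered -- whether the order of the elements matters
    """
    elements: list  # of Node or Group
    k: int = 1
    ordered: bool = False

    def __post_init__(self) -> None:
        # A group cannot require more acknowledgements than it has elements.
        if not 1 <= self.k <= len(self.elements):
            raise ValueError(f"k={self.k} must be between 1 and {len(self.elements)}")

# The two-center layout: standby B, plus any one of C and D.
config = Group([Node("B"), Group([Node("C"), Node("D")])], k=2)
```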

That's where the micro-language idea makes sense. For example, we
can define a group using separators like (elt1,...,eltN) or
[elt1,elt2,eltN]. Prefixing a group with a number is essential as
well for quorum commits. Hence, for example, assuming that '()' is
used for a group whose element order does not matter and '[]' for a
group whose element order matters:
- k(elt1,elt2,eltN) means that we need any k elements in the set
to return true (aka commit confirmation).
- k[elt1,elt2,eltN] means that we need the first k elements in the
set to return true.
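The two group kinds can be evaluated recursively against the set of standbys that have confirmed a commit. This is a minimal sketch under the assumption that "the first k elements" means the first k elements as listed (not, say, the first k currently connected); Node, Group and acknowledged are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Node:
    name: str

@dataclass
class Group:
    elements: list
    k: int = 1
    ordered: bool = False  # False for k(...), True for k[...]

Element = Union[Node, Group]

def acknowledged(elt: Element, acked: set[str]) -> bool:
    """True if elt counts as having confirmed the commit, given the standbys in acked."""
    if isinstance(elt, Node):
        return elt.name in acked
    if elt.ordered:
        # k[...]: the first k elements, in list order, must all have confirmed.
        return all(acknowledged(e, acked) for e in elt.elements[:elt.k])
    # k(...): any k elements of the set must have confirmed.
    return sum(acknowledged(e, acked) for e in elt.elements) >= elt.k

unordered = Group([Node("B"), Node("C"), Node("D")], k=2)             # 2(B,C,D)
ordered = Group([Node("B"), Node("C"), Node("D")], k=2, ordered=True)  # 2[B,C,D]

print(acknowledged(unordered, {"C", "D"}))  # True: any two will do
print(acknowledged(ordered, {"C", "D"}))    # False: B and C are both required
```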

When k is not specified for a group, k = 1. A plain comma-separated
list at the top level means that we wait for the first element in the
set (for backward compatibility), hence:
1(elt1,elt2,eltN) <=> elt1,elt2,eltN
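A small recursive-descent parser is enough for this grammar. The sketch below assumes standby names are simple identifiers (letters, digits, underscores, not starting with a digit), applies the default k = 1, and reads a bare top-level list as 1(...) following the equivalence above. The tokenizer, Parser and parse are hypothetical helpers, not an existing PostgreSQL API.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Node:
    name: str

@dataclass
class Group:
    elements: list
    k: int = 1
    ordered: bool = False  # True for k[...], False for k(...)

Element = Union[Node, Group]

# One token: a count, a standby name (assumed identifier-like), or a delimiter.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([()\[\],]))")

def tokenize(text: str) -> list[str]:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(m.lastindex))  # exactly one alternative matched
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, text: str) -> None:
        self.tokens, self.pos = tokenize(text), 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'more input'}, got {tok!r}")
        self.pos += 1
        return tok

    def element_list(self) -> list:
        items = [self.element()]
        while self.peek() == ",":
            self.take(",")
            items.append(self.element())
        return items

    def element(self) -> Element:
        tok, count = self.take(), None
        if tok.isdigit():
            count, tok = int(tok), self.take()
            if tok not in ("(", "["):
                raise ValueError("a count must be followed by '(' or '['")
        if tok in ("(", "["):
            items = self.element_list()
            self.take(")" if tok == "(" else "]")
            return Group(items, k=1 if count is None else count, ordered=(tok == "["))
        if tok in (")", "]", ","):
            raise ValueError(f"unexpected {tok!r}")
        return Node(tok)

def parse(text: str) -> Group:
    parser = Parser(text)
    items = parser.element_list()
    if parser.peek() is not None:
        raise ValueError(f"trailing input starting at {parser.peek()!r}")
    # An explicit top-level group is the whole configuration; a bare list
    # "elt1,elt2,eltN" is read as 1(elt1,elt2,eltN) for backward compatibility.
    if len(items) == 1 and isinstance(items[0], Group):
        return items[0]
    return Group(items)

print(parse("B,C,D") == parse("1(B,C,D)"))  # True
```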

We could also mix both behaviors, i.e. be able to define a group
that waits for the first k elements and for a total of j elements in
the whole set, but I don't think that we need to go that far. I
suspect that in most cases users will be satisfied with the case
where there is a group of data centers and they want to be sure that
one or two nodes in each center have acknowledged a commit on the
master (performance is not the concern here if the centers are far
apart). Hence, in the case above, you could get the desired behavior
with this definition:
2(B,(C,D))
With more data centers, like 3 (waiting for two nodes in the 3rd set):
3(B,(C,D),2(E,F,G))
Users could define more levels of groups, like this:
2(A,(B,(C,D)))
But that's actually something few people would do in real cases.
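As a sanity check of the two- and three-center definitions, the sketch below builds the trees for 2(B,(C,D)) and 3(B,(C,D),2(E,F,G)) by hand and evaluates them against a few sets of confirming standbys. It reads k(...) as "any k elements"; the classes, the helper g and the function acknowledged are hypothetical. For 2(B,(C,D)) no acknowledgement set without C or D satisfies the group, which is the "at least one node in center 2" guarantee.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Node:
    name: str

@dataclass
class Group:
    elements: list
    k: int = 1
    ordered: bool = False  # only unordered k(...) groups are used below

Element = Union[Node, Group]

def acknowledged(elt: Element, acked: set[str]) -> bool:
    """Quorum check: does elt count as confirmed, given the standbys in acked?"""
    if isinstance(elt, Node):
        return elt.name in acked
    if elt.ordered:
        return all(acknowledged(e, acked) for e in elt.elements[:elt.k])
    return sum(acknowledged(e, acked) for e in elt.elements) >= elt.k

def g(*elements: Element, k: int = 1) -> Group:
    """Shorthand for an unordered group k(...)."""
    return Group(list(elements), k=k)

# 2(B,(C,D)): B plus one of C, D.
two_centers = g(Node("B"), g(Node("C"), Node("D")), k=2)
# 3(B,(C,D),2(E,F,G)): B, one of C/D, and two of E/F/G.
three_centers = g(Node("B"), g(Node("C"), Node("D")),
                  g(Node("E"), Node("F"), Node("G"), k=2), k=3)

print(acknowledged(two_centers, {"B", "D"}))              # True
print(acknowledged(two_centers, {"C", "D"}))              # False: B is missing
print(acknowledged(three_centers, {"B", "C", "E", "G"}))  # True
print(acknowledged(three_centers, {"B", "C", "D", "E"}))  # False: only one of E, F, G
```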

> I'm nervous of "it would be good ifs" because we do a ton of work only to
> find a design flaw.

That makes sense. Let's keep discussing it then.
-- 
Michael
