Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: Support for N synchronous standby servers - take 2 |
| Date | |
| Msg-id | CAD21AoA9UqcbTnDKi0osd0yhN4FPgTrg6wuZeTtvpSYy2LqL5Q@mail.gmail.com |
| In response to | Re: Support for N synchronous standby servers - take 2 (Michael Paquier <michael.paquier@gmail.com>) |
| Responses | Re: Support for N synchronous standby servers - take 2 |
| List | pgsql-hackers |
On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>>> <michael.paquier@gmail.com> wrote:
>>>>> Yes, please let's use the custom language, and let's not care of not
>>>>> more than 1 level of nesting so as it is possible to represent
>>>>> pg_stat_replication in a simple way for the user.
>>>>
>>>> "not" is used twice in this sentence in a way that renders me not able
>>>> to be sure that I'm not understanding it not properly.
>>>
>>> 4 times here. Score beaten.
>>>
>>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>>> to only support configurations up to one level of nested objects, like
>>> that:
>>> 2[node1, node2, node3]
>>> node1, 2[node2, node3], node3
>>> In short, we could restrict things so as we cannot define a group of
>>> nodes within an existing group.
>>
>> No, actually, that's stupid. Having up to two nested levels makes more
>> sense, a quite common case for this feature being something like that:
>> 2{node1,[node2,node3]}
>> In short, sync confirmation is waited from node1 and (node2 or node3).
>>
>> Flattening groups of nodes with a new catalog will be necessary to
>> ease the view of this data to users:
>> - group name?
>> - array of members with nodes/groups
>> - group type: quorum or priority
>> - number of items to wait for in this group
>
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the meta data in
> the custom language and represent it properly in a new catalog,
> without messing up too much with the existing pg_stat_replication that
> people are now used to for 5 releases since 9.0. So, I would think
> that we will need to have a new catalog, say
> pg_stat_replication_groups with the following things:
> - One line of this catalog represents the status of a group or of a single node.
> - The status of a node/group is either sync or potential, if a
> node/group is specified more than once, it may be possible that it
> would be sync and potential depending on where it is defined, in which
> case setting its status to 'sync' has the most sense. If it is in sync
> state I guess.
> - Move sync_priority and sync_state, actually an equivalent from
> pg_stat_replication into this new catalog, because those represent the
> status of a node or group of nodes.
> - group name, and by that I think that we had perhaps better make
> mandatory the need to append a name with a quorum or priority group.
> The group at the highest level is forcibly named as 'top', 'main', or
> whatever if not directly specified by the user. If the entry is
> directly a node, use the application_name.
> - Type of group, quorum or priority
> - Elements in this group, an element can be a group name or a node
> name, aka application_name. If group is of type priority, the elements
> are listed in increasing order. So the elements with lower priority
> get first, etc. We could have one column listing explicitly a list of
> integers that map with the elements of a group but it does not seem
> worth it, what users would like to know is what are the nodes that are
> prioritized. This covers the former 'priority' field of
> pg_stat_replication.
>
> We may have a good idea of how to define a custom language, still we
> are going to need to design a clean interface at catalog level more or
> less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
>
> Anyway that's not a small project, and perhaps I am over-complicating
> the whole thing.
>
I agree with adding a new system catalog so that users can easily check the
replication status, and a group name will be needed for this.
What about appending the group name with ":" immediately after the set of
standbys, as follows?
2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]
Also, to show sync replication status according to the configuration, the
view I'm thinking of has the following definition.
=# \d pg_synchronous_replication
     Column     |  Type   | Modifiers
----------------+---------+-----------
 name           | text    |
 sync_type      | text    |
 wait_num       | integer |
 sync_priority  | integer |
 sync_state     | text    |
 member         | text[]  |
 level          | integer |
 write_location | pg_lsn  |
 flush_location | pg_lsn  |
 apply_location | pg_lsn  |
- "name" : node name or group name, or "main" meaning top level node.
- "sync_type" : 'priority' or 'quorum' for group node, otherwise NULL.
- "wait_num" : number of nodes/groups to wait for in this group.
- "sync_priority" : priority of node/group in this group. "main" node has "0".
    - a standby in a quorum group always has priority 1.
    - a standby in a priority group has a priority according to definition order.
- "sync_state" : 'sync' or 'potential' or 'quorum'.
    - a standby in a quorum group is always 'quorum'.
    - a standby in a priority group is 'sync' / 'potential'.
- "member" : array of members for group node, otherwise NULL.
- "level" : nested level. "main" node is level 0.
- "write/flush/apply_location" : group/node calculated LSN according
to configuration.
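To make "calculated LSN according to configuration" more concrete, here is a rough sketch of one possible rule. It is only an illustration, not the patch's implementation. It assumes that a priority group reports the minimum LSN among its first wait_num available members in definition order, and that a quorum group reports the wait_num-th largest LSN among its available members. The Node and Group classes, the group_lsn() helper, and the use of plain integers for LSNs are all made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Node:
    name: str
    lsn: Optional[int]  # None when the standby is not connected (assumption)


@dataclass
class Group:
    name: str
    sync_type: str  # 'priority' or 'quorum'
    wait_num: int
    members: List[Union["Group", Node]]


def group_lsn(item: Union[Group, Node]) -> Optional[int]:
    """Return the LSN a node or group can confirm, or None if it cannot."""
    if isinstance(item, Node):
        return item.lsn
    lsns = [group_lsn(m) for m in item.members]
    available = [lsn for lsn in lsns if lsn is not None]
    if len(available) < item.wait_num:
        return None  # not enough members to satisfy wait_num
    if item.sync_type == "priority":
        # Assumed rule: the first wait_num available members, in definition
        # order, must all confirm, so the slowest of them decides.
        return min(available[: item.wait_num])
    # Assumed rule for quorum: any wait_num members may confirm, so take the
    # wait_num-th largest LSN.
    return sorted(available, reverse=True)[item.wait_num - 1]


# Example values are invented; group types follow the sample view output.
london = Group("london", "quorum", 2,
               [Node("london1", 90), Node("london2", 80), Node("london3", 70)])
tokyo = Group("tokyo", "quorum", 1, [Node("tokyo1", 60), Node("tokyo2", 95)])
main = Group("main", "priority", 2, [Node("local", 100), london, tokyo])

print(group_lsn(london), group_lsn(tokyo), group_lsn(main))
```

With a rule like this, each row of the view could show the LSN computed for that node or group, and the "main" row would show the position up to which a commit is considered synchronously replicated.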
When sync replication is set as above, the new system view shows,
=# select * from pg_stat_replication_group;
  name   | sync_type | wait_num | sync_priority | sync_state |          member           | level | write_location | flush_location | apply_location
---------+-----------+----------+---------------+------------+---------------------------+-------+----------------+----------------+----------------
 main    | priority  |        2 |             0 | sync       | {local,london,tokyo}      |     0 |                |                |
 local   |           |        0 |             1 | sync       |                           |     1 |                |                |
 london  | quorum    |        2 |             2 | potential  | {london1,london2,london3} |     1 |                |                |
 london1 |           |        0 |             1 | potential  |                           |     2 |                |                |
 london2 |           |        0 |             2 | potential  |                           |     2 |                |                |
 london3 |           |        0 |             3 | potential  |                           |     2 |                |                |
 tokyo   | quorum    |        1 |             3 | potential  | {tokyo1,tokyo2}           |     1 |                |                |
 tokyo1  |           |        0 |             1 | quorum     |                           |     2 |                |                |
 tokyo2  |           |        0 |             1 | quorum     |                           |     2 |                |                |
(9 rows)
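As a rough illustration of how the nested configuration could be flattened into rows like the ones above, here is a small sketch. It only fills in name, sync_type, wait_num, member and level, and leaves out sync_priority, sync_state and the LSN columns. The Group class and the flatten() helper are hypothetical names for this example, and the group types are copied from the sample output rather than derived from the bracket syntax.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Group:
    name: str
    sync_type: str  # 'priority' or 'quorum'
    wait_num: int
    members: List[Union["Group", str]]  # a str is a standby's application_name


# (name, sync_type, wait_num, member, level)
Row = Tuple[str, Optional[str], int, Optional[List[str]], int]


def flatten(item: Union[Group, str], level: int = 0) -> List[Row]:
    """Walk the configuration depth-first and emit one row per node or group."""
    if isinstance(item, str):
        # A plain standby: no group type, no members, wait_num 0 as in the sample.
        return [(item, None, 0, None, level)]
    names = [m if isinstance(m, str) else m.name for m in item.members]
    rows: List[Row] = [(item.name, item.sync_type, item.wait_num, names, level)]
    for member in item.members:
        rows.extend(flatten(member, level + 1))
    return rows


config = Group("main", "priority", 2, [
    "local",
    Group("london", "quorum", 2, ["london1", "london2", "london3"]),
    Group("tokyo", "quorum", 1, ["tokyo1", "tokyo2"]),
])

rows = flatten(config)
for row in rows:
    print(row)
```

A depth-first walk gives the same row order as the sample output, with each group row followed directly by its members.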
Thoughts?
Regards,
--
Masahiko Sawada