
Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

From	Masahiko Sawada
Subject	Re: Support for N synchronous standby servers - take 2
Date	February 5, 2016 09:20:10
Msg-id	CAD21AoA9UqcbTnDKi0osd0yhN4FPgTrg6wuZeTtvpSYy2LqL5Q@mail.gmail.com
In response to	Re: Support for N synchronous standby servers - take 2 (Michael Paquier <michael.paquier@gmail.com>)
Responses	Re: Support for N synchronous standby servers - take 2
List	pgsql-hackers


On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>>> <michael.paquier@gmail.com> wrote:
>>>>> Yes, please let's use the custom language, and let's not care of not
>>>>> more than 1 level of nesting so as it is possible to represent
>>>>> pg_stat_replication in a simple way for the user.
>>>>
>>>> "not" is used twice in this sentence in a way that renders me not able
>>>> to be sure that I'm not understanding it not properly.
>>>
>>> 4 times here. Score beaten.
>>>
>>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>>> to only support configurations up to one level of nested objects, like
>>> that:
>>> 2[node1, node2, node3]
>>> node1, 2[node2, node3], node3
>>> In short, we could restrict things so as we cannot define a group of
>>> nodes within an existing group.
>>
>> No, actually, that's stupid. Having up to two nested levels makes more
>> sense, a quite common case for this feature being something like that:
>> 2{node1,[node2,node3]}
>> In short, sync confirmation is waited from node1 and (node2 or node3).
>>
>> Flattening groups of nodes with a new catalog will be necessary to
>> ease the view of this data to users:
>> - group name?
>> - array of members with nodes/groups
>> - group type: quorum or priority
>> - number of items to wait for in this group
>
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the meta data in
> the custom language and represent it properly in a new catalog,
> without messing up too much with the existing pg_stat_replication that
> people are now used to for 5 releases since 9.0. So, I would think
> that we will need to have a new catalog, say
> pg_stat_replication_groups with the following things:
> - One line of this catalog represents the status of a group or of a single node.
> - The status of a node/group is either sync or potential, if a
> node/group is specified more than once, it may be possible that it
> would be sync and potential depending on where it is defined, in which
> case setting its status to 'sync' has the most sense. If it is in sync
> state I guess.
> - Move sync_priority and sync_state, actually an equivalent from
> pg_stat_replication into this new catalog, because those represent the
> status of a node or group of nodes.
> - group name, and by that I think that we had perhaps better make
> mandatory the need to append a name with a quorum or priority group.
> The group at the highest level is forcibly named as 'top', 'main', or
> whatever if not directly specified by the user. If the entry is
> directly a node, use the application_name.
> - Type of group, quorum or priority
> - Elements in this group, an element can be a group name or a node
> name, aka application_name. If group is of type priority, the elements
> are listed in increasing order. So the elements with lower priority
> get first, etc. We could have one column listing explicitly a list of
> integers that map with the elements of a group but it does not seem
> worth it, what users would like to know is what are the nodes that are
> prioritized. This covers the former 'priority' field of
> pg_stat_replication.
>
> We may have a good idea of how to define a custom language, still we
> are going to need to design a clean interface at catalog level more or
> less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
>
> Anyway that's not a small project, and perhaps I am over-complicating
> the whole thing.
>

I agree with adding a new system catalog so that users can easily
check the replication status. A group name will be needed for this.
What about adding the group name with ":" immediately after the set of
standbys, as follows?

2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]
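
To make the proposed notation concrete, here is a minimal Python sketch of
a parser for strings like the one above. The grammar is my assumption, read
off this single example: an optional leading count, a bracketed
comma-separated member list, and an optional ":name" suffix. The names
Group, Parser and parse are hypothetical, and the sketch deliberately
records the bracket character as written instead of deciding whether it
means priority or quorum, since that mapping is still under discussion.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# Tokens: a count, a node/group name, or one punctuation character.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[\[\](),:]")
CLOSERS = {"[": "]", "(": ")"}


@dataclass
class Group:
    bracket: str                 # "[" or "(" exactly as written
    wait_num: Optional[int]      # leading count, None when omitted
    members: List[Union["Group", str]]
    name: Optional[str] = None   # the ":name" suffix, if any


def tokenize(text: str) -> List[str]:
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def ident(self) -> str:
        tok = self.take()
        if not (tok[0].isalpha() or tok[0] == "_"):
            raise ValueError(f"expected a name, got {tok!r}")
        return tok

    def item(self) -> Union[Group, str]:
        # A member is a nested group (starts with a count or a bracket)
        # or a plain node name.
        tok = self.peek()
        if tok is not None and (tok.isdigit() or tok in CLOSERS):
            return self.group()
        return self.ident()

    def group(self) -> Group:
        wait_num = None
        if self.peek() is not None and self.peek().isdigit():
            wait_num = int(self.take())
        bracket = self.take()
        if bracket not in CLOSERS:
            raise ValueError(f"expected '[' or '(', got {bracket!r}")
        members = [self.item()]
        while self.peek() == ",":
            self.take(",")
            members.append(self.item())
        self.take(CLOSERS[bracket])
        name = None
        if self.peek() == ":":
            self.take(":")
            name = self.ident()
        return Group(bracket, wait_num, members, name)


def parse(text: str) -> Union[Group, str]:
    parser = Parser(tokenize(text))
    top = parser.item()
    if parser.peek() is not None:
        raise ValueError(f"trailing input starting at {parser.peek()!r}")
    return top


top = parse("2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]")
```

With this grammar the top-level group has no ":name" suffix, so whatever
consumes the tree would have to supply a default such as "main".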

Also, to show the sync replication status according to the
configuration, the view I'm thinking of has the following definition.

=# \d pg_synchronous_replication
     Column     |  Type   | Modifiers
----------------+---------+-----------
 name           | text    |
 sync_type      | text    |
 wait_num       | integer |
 sync_priority  | integer |
 sync_state     | text    |
 member         | text[]  |
 level          | integer |
 write_location | pg_lsn  |
 flush_location | pg_lsn  |
 apply_location | pg_lsn  |

- "name" : node name or group name, or "main" meaning the top-level node.
- "sync_type" : 'priority' or 'quorum' for a group node, otherwise NULL.
- "wait_num" : number of nodes/groups to wait for in this group.
- "sync_priority" : priority of the node/group in its group. The "main" node has "0".
    - a standby in a quorum group always has priority 1.
    - a standby in a priority group has a priority according to definition order.
- "sync_state" : 'sync' or 'potential' or 'quorum'.
    - a standby in a quorum group is always 'quorum'.
    - a standby in a priority group is 'sync' / 'potential'.
- "member" : array of members for a group node, otherwise NULL.
- "level" : nesting level. The "main" node is level 0.
- "write/flush/apply_location" : group/node LSN calculated according
to the configuration.
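
As a rough illustration of how the static columns could be derived, the
following Python sketch flattens a nested configuration into one row per
node or group, applying the rules above as written. Everything in it is
hypothetical (the Group and Row classes, the flatten function, and the
choice of sync_type for each group); sync_state and the LSN columns are
left out because they depend on the runtime state of the standbys. Plain
nodes get wait_num 0, which is an assumption taken from the sample output
in this message.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Group:
    name: str
    sync_type: str                      # 'priority' or 'quorum'
    wait_num: int
    members: List[Union["Group", str]]  # nested groups or node names


@dataclass
class Row:
    name: str
    sync_type: Optional[str]
    wait_num: int
    sync_priority: int
    member: Optional[List[str]]
    level: int


def flatten(entry: Union[Group, str], level: int = 0, priority: int = 0) -> List[Row]:
    """Depth-first walk producing one Row per node or group."""
    if isinstance(entry, str):
        # A plain standby: no group columns, wait_num 0 (assumption).
        return [Row(entry, None, 0, priority, None, level)]

    member_names = [m if isinstance(m, str) else m.name for m in entry.members]
    rows = [Row(entry.name, entry.sync_type, entry.wait_num,
                priority, member_names, level)]
    for position, member in enumerate(entry.members, start=1):
        # Priority groups number members by definition order;
        # quorum group members always get priority 1.
        child_priority = position if entry.sync_type == "priority" else 1
        rows.extend(flatten(member, level + 1, child_priority))
    return rows


# Hypothetical configuration reusing the node names from this message.
config = Group("main", "priority", 2, [
    "local",
    Group("london", "quorum", 2, ["london1", "london2", "london3"]),
    Group("tokyo", "quorum", 1, ["tokyo1", "tokyo2"]),
])

rows = flatten(config)
for row in rows:
    print(row)
```

The depth-first order puts each group directly above its members, so the
"level" column is enough to read the nesting back from the flat result.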

When sync replication is set as above, the new system view shows,

=# select * from pg_stat_replication_group;
  name   | sync_type | wait_num | sync_priority | sync_state |          member           | level | write_location | flush_location | apply_location
---------+-----------+----------+---------------+------------+---------------------------+-------+----------------+----------------+----------------
 main    | priority  |        2 |             0 | sync       | {local,london,tokyo}      |     0 |                |                |
 local   |           |        0 |             1 | sync       |                           |     1 |                |                |
 london  | quorum    |        2 |             2 | potential  | {london1,london2,london3} |     1 |                |                |
 london1 |           |        0 |             1 | potential  |                           |     2 |                |                |
 london2 |           |        0 |             2 | potential  |                           |     2 |                |                |
 london3 |           |        0 |             3 | potential  |                           |     2 |                |                |
 tokyo   | quorum    |        1 |             3 | potential  | {tokyo1,tokyo2}           |     1 |                |                |
 tokyo1  |           |        0 |             1 | quorum     |                           |     2 |                |                |
 tokyo2  |           |        0 |             1 | quorum     |                           |     2 |                |                |
(9 rows)

Thoughts?

Regards,

--
Masahiko Sawada
