Multi-Master Logical Replication - Mailing list pgsql-hackers

From Peter Smith
Subject Multi-Master Logical Replication
Date
Msg-id CAHut+PuwRAoWY9pz=Eubps3ooQCOBFiYPU9Yi=VB-U+yORU7OA@mail.gmail.com
Whole thread Raw
Responses Re: Multi-Master Logical Replication  (Laurenz Albe <laurenz.albe@cybertec.at>)
Re: Multi-Master Logical Replication  (vignesh C <vignesh21@gmail.com>)
Re: Multi-Master Logical Replication  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
MULTI-MASTER LOGICAL REPLICATION

1.0 BACKGROUND

Let’s assume that a user wishes to set up a multi-master environment
so that a set of PostgreSQL instances (nodes) use logical replication
to share tables with every other node in the set.

We define this as a multi-master logical replication (MMLR) node-set.

<please refer to the attached node-set diagram>

1.1 ADVANTAGES OF MMLR

- Increases write scalability (e.g., all nodes can write arbitrary data).
- Allows load balancing
- Allows rolling updates of nodes (e.g., logical replication works
between different major versions of PostgreSQL).
- Improves the availability of the system (e.g., no single point of failure)
- Improves performance (e.g., lower latencies for geographically local nodes)

2.0 MMLR AND POSTGRESQL

It is already possible to configure a kind of MMLR set in PostgreSQL
15 using PUB/SUB, but it is very restrictive because it can only work
when no two nodes operate on the same table. This is because when two
nodes try to share the same table then there becomes a circular
recursive problem where Node1 replicates data to Node2 which is then
replicated back to Node1 and so on.

To prevent the circular recursive problem Vignesh is developing a
patch [1] that introduces new SUBSCRIPTION options "local_only" (for
publishing only data originating at the publisher node) and
"copy_data=force". Using this patch, we have created a script [2]
demonstrating how to set up all the above multi-node examples. An
overview of the necessary steps is given in the next section.

2.1 STEPS – Adding a new node N to an existing node-set

step 1. Prerequisites – Apply Vignesh’s patch [1]. All nodes in the
set must be visible to each other by a known CONNECTION. All shared
tables must already be defined on all nodes.

step 2. On node N do CREATE PUBLICATION pub_N FOR ALL TABLES

step 3. All other nodes then CREATE SUBSCRIPTION to PUBLICATION pub_N
with "local_only=on, copy_data=on" (this will replicate initial data
from the node N tables to every other node).

step 4. On node N, temporarily ALTER PUBLICATION pub_N to prevent
replication of 'truncate', then TRUNCATE all tables of node N, then
re-allow replication of 'truncate'.

step 5. On node N do CREATE SUBSCRIPTION to the publications of all
other nodes in the set
5a. Specify "local_only=on, copy_data=force" for exactly one of the
subscriptions  (this will make the node N tables now have the same
data as the other nodes)
5b. Specify "local_only=on, copy_data=off" for all other subscriptions.

step 6. Result - Now changes to any table on any node should be
replicated to every other node in the set.

Note: Steps 4 and 5 need to be done within the same transaction to
avoid loss of data in case of some command failure. (Because we can't
perform create subscription in a transaction, we need to create the
subscription in a disabled mode first and then enable it in the
transaction).

2.2 DIFFICULTIES

Notice that it becomes increasingly complex to configure MMLR manually
as the number of nodes in the set increases. There are also some
difficulties such as
- dealing with initial table data
- coordinating the timing to avoid concurrent updates
- getting the SUBSCRIPTION options for copy_data exactly right.

3.0 PROPOSAL

To make the MMLR setup simpler, we propose to create a new API that
will hide all the step details and remove the burden on the user to
get it right without mistakes.

3.1 MOTIVATION
- MMLR (sharing the same tables) is not currently possible
- Vignesh's patch [1] makes MMLR possible, but the manual setup is
still quite difficult
- An MMLR implementation can solve the timing problems (e.g., using
Database Locking)

3.2 API

Preferably the API would be implemented as new SQL functions in
PostgreSQL core, however, implementation using a contrib module or
some new SQL syntax may also be possible.

SQL functions will be like below:
- pg_mmlr_set_create = create a new set, and give it a name
- pg_mmlr_node_attach = attach the current node to a specified set
- pg_mmlr_node_detach = detach a specified node from a specified set
- pg_mmlr_set_delete = delete a specified set

For example, internally the pg_mmlr_node_attach API function would
execute the equivalent of all the CREATE PUBLICATION, CREATE
SUBSCRIPTION, and TRUNCATE steps described above.

Notice this proposal has some external API similarities with the BDR
extension [3] (which also provides multi-master logical replication),
although we plan to implement it entirely using PostgreSQL’s PUB/SUB.

4.0 ACKNOWLEDGEMENTS

The following people have contributed to this proposal – Hayato
Kuroda, Vignesh C, Peter Smith, Amit Kapila.

5.0 REFERENCES

[1] https://www.postgresql.org/message-id/flat/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9%3D9orqubhjcQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPvY2P%3DUL-X6maMA5QxFKdcdciRRCKDH3j%3D_hO8u2OyRYg%40mail.gmail.com
[3] https://www.enterprisedb.com/docs/bdr/latest/

[END]

~~~

One of my colleagues will post more detailed information later.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: bogus: logical replication rows/cols combinations
Next
From: Thomas Munro
Date:
Subject: Re: pgsql: Add contrib/pg_walinspect.