Re: Multi-Master Logical Replication - Mailing list pgsql-hackers

From vignesh C
Subject Re: Multi-Master Logical Replication
Date
Msg-id CALDaNm3aD3nZ0HWXA8V435AGMvORyR5-mq2FzqQdKQ8CPomB5Q@mail.gmail.com
In response to Re: Multi-Master Logical Replication  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Fri, Apr 29, 2022 at 2:35 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Apr 29, 2022 at 2:16 PM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> >
> > В Чт, 28/04/2022 в 17:37 +0530, vignesh C пишет:
> > > On Thu, Apr 28, 2022 at 4:24 PM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> > > > В Чт, 28/04/2022 в 09:49 +1000, Peter Smith пишет:
> > > >
> > > > > 1.1 ADVANTAGES OF MMLR
> > > > >
> > > > > - Increases write scalability (e.g., all nodes can write arbitrary data).
> > > >
> > > > I've never heard how a transaction-aware multimaster increases
> > > > write scalability. Moreover, usually even a non-transactional
> > > > multimaster doesn't increase write scalability. At best it
> > > > doesn't decrease it.
> > > >
> > > > That is because all hosts have to write all changes anyway, and
> > > > the side costs increase due to additional network traffic,
> > > > interlocking (for transaction-aware MM), and increased latency.
> > >
> > > I agree it won't increase in all cases, but it will be better in a few
> > > cases when the user works on different geographical regions operating
> > > on independent schemas in asynchronous mode. Since the write node is
> > > closer to the geographical zone, the performance will be better in a
> > > few cases.
> >
> > From the EnterpriseDB BDR page [1]:
> >
> > > Adding more master nodes to a BDR Group does not result in
> > > significant write throughput increase when most tables are
> > > replicated because BDR has to replay all the writes on all nodes.
> > > Because BDR writes are in general more effective than writes coming
> > > from Postgres clients via SQL, some performance increase can be
> > > achieved. Read throughput generally scales linearly with the number
> > > of nodes.
> >
> > And I'm sure EnterpriseDB does it best.
> >
> > > > В Чт, 28/04/2022 в 08:34 +0000, kuroda.hayato@fujitsu.com пишет:
> > > > > Dear Laurenz,
> > > > >
> > > > > Thank you for your interest in our works!
> > > > >
> > > > > > I am missing a discussion how replication conflicts are handled to
> > > > > > prevent replication from breaking
> > > > >
> > > > > Actually we don't have plans for developing the feature that avoids conflict.
> > > > > We think that it should be done as core PUB/SUB feature, and
> > > > > this module will just use that.
> > > >
> > > > If you really want to have proper isolation levels (
> > > > Read Committed? Repeatable Read?) and/or want to have the
> > > > same data on each "master", there is no easy way. If you
> > > > think it will be "easy", you are already wrong.
> > >
> > > The synchronous_commit and synchronous_standby_names configuration
> > > parameters will help in getting the same data across the nodes. Can
> > > you give an example for the scenario where it will be difficult?
> >
> > So, synchronous or asynchronous?
> > Synchronous commit on every master, every alive master or on quorum
> > of masters?
> >
> > And it is not about synchronicity. It is about determinism at
> > conflicts.
> >
> > If you have fully deterministic conflict resolution that works
> > exactly the same way on each host, then it is possible to have the
> > same data on each host. (But it will not be transactional.) And it
> > seems EDB BDR achieved this.
> >
> > Or if you have fully and correctly implemented one of distributed
> > transactions protocols.
> >
> > [1]  https://www.enterprisedb.com/docs/bdr/latest/overview/#characterising-bdr-performance
> >
> > regards
> >
> > ------
> >
> > Yura Sokolov
>
> Thanks for your feedback.
>
> This MMLR proposal was mostly just to create an interface making it
> easier to use PostgreSQL core logical replication CREATE
> PUBLICATION/SUBSCRIPTION for table sharing among a set of nodes.
> Otherwise, this is difficult for a user to do manually. (e.g.
> difficulties as mentioned in section 2.2 of the original post [1] -
> dealing with initial table data, coordinating the timing/locking to
> avoid concurrent updates, getting the SUBSCRIPTION options for
> copy_data exactly right etc)

The different problems, and how to solve each scenario, are described
in detail in [1].
It gets even more complex when more nodes are involved; let's
consider the 3-node case:
To add a new node node3 to existing nodes node1 and node2 when data is
already present in node1 and node2, the following steps are required:
Create a publication in node3:
CREATE PUBLICATION pub_node3 FOR ALL TABLES;

Create a subscription in node1 to subscribe to the changes from node3:
CREATE SUBSCRIPTION sub_node1_node3
 CONNECTION 'dbname=foo host=node3 user=repuser'
 PUBLICATION pub_node3
 WITH (copy_data = off, local_only = on);

Create a subscription in node2 to subscribe to the changes from node3:
CREATE SUBSCRIPTION sub_node2_node3
 CONNECTION 'dbname=foo host=node3 user=repuser'
 PUBLICATION pub_node3
 WITH (copy_data = off, local_only = on);

Lock the database at node2 and wait till the walsender sends WAL to
node1 (up to the current LSN) to avoid any data loss caused by node2's
WAL not being sent to node1. This lock needs to be held till the setup
is complete.
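The catch-up wait can be sketched roughly as below; this is only an
illustration (the LSN value and the assumption that the node2-to-node1
link shows up as a row in pg_stat_replication are hypothetical):

```sql
-- On node2: capture the current WAL insert position.
SELECT pg_current_wal_insert_lsn();

-- Still on node2: poll until the walsender serving node1's
-- subscription has replayed past the captured position
-- (substitute the LSN returned above).
SELECT application_name, replay_lsn
FROM pg_stat_replication
WHERE replay_lsn >= '0/1513B18'::pg_lsn;
```

Once the polling query returns a row for node1's subscription, all of
node2's WAL up to the lock point has been applied on node1 and the
setup can proceed.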

Create a subscription in node3 to subscribe to the changes from node1;
here copy_data is specified as force so that the existing table data
is copied during the initial sync:
CREATE SUBSCRIPTION sub_node3_node1
 CONNECTION 'dbname=foo host=node1 user=repuser'
 PUBLICATION pub_node1
 WITH (copy_data = force, local_only = on);

Create a subscription in node3 to subscribe to the changes from node2:
CREATE SUBSCRIPTION sub_node3_node2
 CONNECTION 'dbname=foo host=node2 user=repuser'
 PUBLICATION pub_node2
 WITH (copy_data = off, local_only = on);

If data is already present in node3, a few additional steps are
required: a) copy node3's data to node1 b) copy node3's data to node2
c) alter the publication not to send truncate operations d) truncate
the data in node3 e) alter the publication to again include sending of
truncate operations.
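Steps c) to e) could look roughly like this (a sketch only; t1 is an
illustrative table name, and steps a/b would be done out of band, e.g.
with pg_dump/restore, before this point):

```sql
-- (c) Temporarily stop publishing truncate operations from node3.
ALTER PUBLICATION pub_node3 SET (publish = 'insert, update, delete');

-- (d) Truncate node3's local data; the truncate is not replicated now,
--     and the rows will arrive again via the initial sync from node1.
TRUNCATE t1;

-- (e) Re-enable publishing of truncate operations.
ALTER PUBLICATION pub_node3
    SET (publish = 'insert, update, delete, truncate');
```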

[1] - https://www.postgresql.org/message-id/CAA4eK1%2Bco2cd8a6okgUD_pcFEHcc7mVc0k_RE2%3D6ahyv3WPRMg%40mail.gmail.com

Regards,
Vignesh


