Home > mailing lists

Re: Multi-Master Logical Replication - Mailing list pgsql-hackers

From	Bharath Rupireddy
Subject	Re: Multi-Master Logical Replication
Date	June 10, 2022 07:10:15
Msg-id	CALj2ACWw3USgZbFL55UoW8RKWkck5foe6ZuVQSe528+O8Hwo3Q@mail.gmail.com Whole thread
In response to	Re: Multi-Master Logical Replication (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Multi-Master Logical Replication
List	pgsql-hackers

Tree view

On Fri, Jun 10, 2022 at 9:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jun 9, 2022 at 6:04 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, Apr 28, 2022 at 5:20 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > MULTI-MASTER LOGICAL REPLICATION
> > >
> > > 1.0 BACKGROUND
> > >
> > > Let’s assume that a user wishes to set up a multi-master environment
> > > so that a set of PostgreSQL instances (nodes) use logical replication
> > > to share tables with every other node in the set.
> > >
> > > We define this as a multi-master logical replication (MMLR) node-set.
> > >
> > > <please refer to the attached node-set diagram>
> > >
> > > 1.1 ADVANTAGES OF MMLR
> > >
> > > - Increases write scalability (e.g., all nodes can write arbitrary data).
> > > - Allows load balancing
> > > - Allows rolling updates of nodes (e.g., logical replication works
> > > between different major versions of PostgreSQL).
> > > - Improves the availability of the system (e.g., no single point of failure)
> > > - Improves performance (e.g., lower latencies for geographically local nodes)
> >
> > Thanks for working on this proposal. I have a few high-level thoughts,
> > please bear with me if I repeat any of them:
> >
> > 1. Are you proposing to use logical replication subscribers to be in
> > sync quorum? In other words, in an N-masters node, M (M >= N)-node
> > configuration, will each master be part of the sync quorum in the
> > other master?
> >
>
> What exactly do you mean by sync quorum here? If you mean to say that
> each master node will be allowed to wait till the commit happens on
> all other nodes similar to how our current synchronous_commit and
> synchronous_standby_names work, then yes, it could be achieved. I
> think the patch currently doesn't support this but it could be
> extended to support the same. Basically, one can be allowed to set up
> async and sync nodes in combination depending on its use case.

Yes, I meant each master node will be in synchronous_commit with
others. In this setup, do you see any problems such as deadlocks if
write-txns on the same table occur on all the masters at a time?

If the master nodes are not in synchronous_commit i.e. connected in
asynchronous mode, don't we have data synchronous problems because of
logical decoding and replication latencies? Say, I do a bulk-insert to
a table foo on master 1, Imagine there's a latency with which the
inserted rows get replicated to master 2 and meanwhile I do update on
the same table foo on master 2 based on the rows inserted in master 1
- master 2 doesn't have all the inserted rows on master 1 - how does
the solution proposed here address this problem?

> > 3. What if "some" postgres provider assures an SLA of very few seconds
> > for failovers in typical HA set up with primary and multiple sync and
> > async standbys? In this context, where does the multi-master
> > architecture sit in the broad range of postgres use-cases?
> >
>
> I think this is one of the primary use cases of the n-way logical
> replication solution where in there shouldn't be any noticeable wait
> time when one or more of the nodes goes down. All nodes have the
> capability to allow writes so the app just needs to connect to another
> node. I feel some analysis is required to find out and state exactly
> how the users can achieve this but seems doable. The other use cases
> are discussed in this thread and are summarized in emails [1][2].

IIUC, the main goals of this feature are - zero failover times and
less write latencies, right? How is it going to solve the data
synchronization problem (stated above) with the master nodes connected
to each other in asynchronous mode?

> > 4. Can the design proposed here be implemented as an extension instead
> > of a core postgres solution?
> >
>
> Yes, I think it could be. I think this proposal introduces some system
> tables, so need to analyze what to do about that.  BTW, do you see any
> advantages to doing so?

IMO, yes, doing it the extension way has many advantages - it doesn't
have to touch the core part of postgres, usability will be good -
whoever requires this solution will use and we can avoid code chunks
within the core such as if (feature_enabled) { do foo} else { do bar}
sorts. Since this feature is based on core postgres logical
replication infrastructure, I think it's worth implementing it as an
extension first, maybe the extension as a PoC?

> > 5. Why should one use logical replication for multi master
> > replication? If logical replication is used, isn't it going to be
> > something like logically decode and replicate every WAL record from
> > one master to all other masters? Instead, can't it be achieved via
> > streaming/physical replication?
> >
>
> The failover/downtime will be much lesser in a solution based on
> logical replication because all nodes are master nodes and users will
> be allowed to write on other nodes instead of waiting for the physical
> standby to become writeable.

I don't think that's a correct statement unless the design proposed
here addresses the data synchronization problem (stated above) with
the master nodes connected to each other in asynchronous mode.

> Then it will allow more localized
> database access for geographically distributed databases, see the
> email for further details on this [3]. Also, the benefiting scenarios
> are the same as all usual Logical Replication quoted benefits - e.g
> version independence, getting selective/required data, etc.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2BZP9c6q1BQWSQC__w09WQ-qGt22dTmajDmTxR_CAUyJQ%40mail.gmail.com
> [2] -
https://www.postgresql.org/message-id/TYAPR01MB58660FCFEC7633E15106C94BF5A29%40TYAPR01MB5866.jpnprd01.prod.outlook.com
> [3] - https://www.postgresql.org/message-id/CAA4eK1%2BDRHCNLongM0stsVBY01S-s%3DEa_yjBFnv_Uz3m3Hky-w%40mail.gmail.com

IMHO, geographically distributed databases are "different sorts in
themselves" and have different ways and means to address data
synchronization, latencies, replication, failovers, conflict
resolutions etc. (I'm no expert there, others may have better
thoughts).

Having said that, it will be great to know if there are any notable or
mentionable customer typical scenarios or use-cases for multi master
solutions within postgres.

Regards,
Bharath Rupireddy.

pgsql-hackers by date:

From: Fabien COELHO
Date: 10 June 2022, 06:36:38
Subject: Re: better page-level checksums

From: Etsuro Fujita
Date: 10 June 2022, 08:17:22
Subject: Re: Error from the foreign RDBMS on a foreign table I have no privilege on

Re: Multi-Master Logical Replication - Mailing list pgsql-hackers

Previous

Next