Re: Multi-Master Logical Replication - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Multi-Master Logical Replication |
Date | |
Msg-id | CALj2ACWw3USgZbFL55UoW8RKWkck5foe6ZuVQSe528+O8Hwo3Q@mail.gmail.com Whole thread Raw |
In response to | Re: Multi-Master Logical Replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Multi-Master Logical Replication
|
List | pgsql-hackers |
On Fri, Jun 10, 2022 at 9:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jun 9, 2022 at 6:04 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Thu, Apr 28, 2022 at 5:20 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > MULTI-MASTER LOGICAL REPLICATION > > > > > > 1.0 BACKGROUND > > > > > > Let’s assume that a user wishes to set up a multi-master environment > > > so that a set of PostgreSQL instances (nodes) use logical replication > > > to share tables with every other node in the set. > > > > > > We define this as a multi-master logical replication (MMLR) node-set. > > > > > > <please refer to the attached node-set diagram> > > > > > > 1.1 ADVANTAGES OF MMLR > > > > > > - Increases write scalability (e.g., all nodes can write arbitrary data). > > > - Allows load balancing > > > - Allows rolling updates of nodes (e.g., logical replication works > > > between different major versions of PostgreSQL). > > > - Improves the availability of the system (e.g., no single point of failure) > > > - Improves performance (e.g., lower latencies for geographically local nodes) > > > > Thanks for working on this proposal. I have a few high-level thoughts, > > please bear with me if I repeat any of them: > > > > 1. Are you proposing to use logical replication subscribers to be in > > sync quorum? In other words, in an N-masters node, M (M >= N)-node > > configuration, will each master be part of the sync quorum in the > > other master? > > > > What exactly do you mean by sync quorum here? If you mean to say that > each master node will be allowed to wait till the commit happens on > all other nodes similar to how our current synchronous_commit and > synchronous_standby_names work, then yes, it could be achieved. I > think the patch currently doesn't support this but it could be > extended to support the same. Basically, one can be allowed to set up > async and sync nodes in combination depending on its use case. Yes, I meant each master node will be in synchronous_commit with others. In this setup, do you see any problems such as deadlocks if write-txns on the same table occur on all the masters at a time? If the master nodes are not in synchronous_commit i.e. connected in asynchronous mode, don't we have data synchronous problems because of logical decoding and replication latencies? Say, I do a bulk-insert to a table foo on master 1, Imagine there's a latency with which the inserted rows get replicated to master 2 and meanwhile I do update on the same table foo on master 2 based on the rows inserted in master 1 - master 2 doesn't have all the inserted rows on master 1 - how does the solution proposed here address this problem? > > 3. What if "some" postgres provider assures an SLA of very few seconds > > for failovers in typical HA set up with primary and multiple sync and > > async standbys? In this context, where does the multi-master > > architecture sit in the broad range of postgres use-cases? > > > > I think this is one of the primary use cases of the n-way logical > replication solution where in there shouldn't be any noticeable wait > time when one or more of the nodes goes down. All nodes have the > capability to allow writes so the app just needs to connect to another > node. I feel some analysis is required to find out and state exactly > how the users can achieve this but seems doable. The other use cases > are discussed in this thread and are summarized in emails [1][2]. IIUC, the main goals of this feature are - zero failover times and less write latencies, right? How is it going to solve the data synchronization problem (stated above) with the master nodes connected to each other in asynchronous mode? > > 4. Can the design proposed here be implemented as an extension instead > > of a core postgres solution? > > > > Yes, I think it could be. I think this proposal introduces some system > tables, so need to analyze what to do about that. BTW, do you see any > advantages to doing so? IMO, yes, doing it the extension way has many advantages - it doesn't have to touch the core part of postgres, usability will be good - whoever requires this solution will use and we can avoid code chunks within the core such as if (feature_enabled) { do foo} else { do bar} sorts. Since this feature is based on core postgres logical replication infrastructure, I think it's worth implementing it as an extension first, maybe the extension as a PoC? > > 5. Why should one use logical replication for multi master > > replication? If logical replication is used, isn't it going to be > > something like logically decode and replicate every WAL record from > > one master to all other masters? Instead, can't it be achieved via > > streaming/physical replication? > > > > The failover/downtime will be much lesser in a solution based on > logical replication because all nodes are master nodes and users will > be allowed to write on other nodes instead of waiting for the physical > standby to become writeable. I don't think that's a correct statement unless the design proposed here addresses the data synchronization problem (stated above) with the master nodes connected to each other in asynchronous mode. > Then it will allow more localized > database access for geographically distributed databases, see the > email for further details on this [3]. Also, the benefiting scenarios > are the same as all usual Logical Replication quoted benefits - e.g > version independence, getting selective/required data, etc. > > [1] - https://www.postgresql.org/message-id/CAA4eK1%2BZP9c6q1BQWSQC__w09WQ-qGt22dTmajDmTxR_CAUyJQ%40mail.gmail.com > [2] - https://www.postgresql.org/message-id/TYAPR01MB58660FCFEC7633E15106C94BF5A29%40TYAPR01MB5866.jpnprd01.prod.outlook.com > [3] - https://www.postgresql.org/message-id/CAA4eK1%2BDRHCNLongM0stsVBY01S-s%3DEa_yjBFnv_Uz3m3Hky-w%40mail.gmail.com IMHO, geographically distributed databases are "different sorts in themselves" and have different ways and means to address data synchronization, latencies, replication, failovers, conflict resolutions etc. (I'm no expert there, others may have better thoughts). Having said that, it will be great to know if there are any notable or mentionable customer typical scenarios or use-cases for multi master solutions within postgres. Regards, Bharath Rupireddy.
pgsql-hackers by date: