Re: Multi-Master Logical Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Multi-Master Logical Replication
Msg-id CAA4eK1+DRHCNLongM0stsVBY01S-s=Ea_yjBFnv_Uz3m3Hky-w@mail.gmail.com
In response to Re: Multi-Master Logical Replication  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Multi-Master Logical Replication
Re: Multi-Master Logical Replication
List pgsql-hackers
On Tue, May 24, 2022 at 5:57 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Sat, May 14, 2022 at 12:20:05PM +0530, Amit Kapila wrote:
> > On Sat, May 14, 2022 at 12:33 AM Bruce Momjian <bruce@momjian.us> wrote:
> > >
> > > Uh, without these features, what workload would this help with?
> > >
> >
> > To allow replication among multiple nodes when some of the nodes may
> > have pre-existing data. This work plans to provide simple APIs to
> > achieve that. Now, let me try to explain the difficulties users can
> > face with the existing interface. It is simple to set up replication
> > among various nodes when they don't have any pre-existing data but
> > even in that case if the user operates on the same table at multiple
> > nodes, the replication will lead to an infinite loop and won't
> > proceed. The example in email [1] demonstrates that and the patch in
> > that thread attempts to solve it. I have mentioned that problem
> > because this work will need that patch.
> ...
> > This will become more complicated when more than two nodes are
> > involved, see the example provided for the three nodes case [2]. Can
> > you think of some other simpler way to achieve the same? If not, I
> > don't think the current way is ideal and even users won't prefer that.
> > I am not telling that the APIs proposed in this thread is the only or
> > best way to achieve the desired purpose but I think we should do
> > something to allow users to easily set up replication among multiple
> > nodes.
>
> You still have not answered my question above.  "Without these features,
> what workload would this help with?"  You have only explained how the
> patch would fix one of the many larger problems.
>

It helps with setting up logical replication among two or more nodes
where data flows both ways, which is important for use cases where
applications are data-aware. For such apps, it is beneficial to always
send data to and retrieve it from the local node in a geographically
distributed database. To get 100% consistent data among the nodes, one
needs to enable synchronous mode (i.e., set
synchronous_standby_names), but if that hurts performance and the data
is for analytical purposes, one can run in asynchronous mode instead.
In such cases, if the local node goes down, the other master node is
immediately available for use; operations may be slower for some time
until the local node comes back up. It will also be easier to perform
online upgrades for such apps later.
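
To illustrate the synchronous versus asynchronous choice, here is a
minimal sketch. It assumes two nodes that already replicate to each
other, and the subscription names are hypothetical. By default a
subscription's name is used as its application_name, which is what
synchronous_standby_names matches against.

```sql
-- Sketch only: subscription names below are hypothetical.

-- On node1: make commits wait until node2's subscription
-- (named sub_node2_from_node1 on node2) confirms them.
ALTER SYSTEM SET synchronous_standby_names = 'sub_node2_from_node1';
SELECT pg_reload_conf();

-- On node2: the mirror-image setting for node1's subscription.
ALTER SYSTEM SET synchronous_standby_names = 'sub_node1_from_node2';
SELECT pg_reload_conf();

-- Asynchronous mode: leave the list empty so commits do not wait.
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();
```

This also relies on synchronous_commit being left at a level that
waits for the standby (the default "on" does).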

Without this, if the user tries to achieve the same via physical
replication by having two local nodes, it can take quite a long time
before the standby can be promoted to master, and local reads/writes
will be much costlier.

-- 
With Regards,
Amit Kapila.


