Re: Handle infinite recursion in logical replication setup - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Handle infinite recursion in logical replication setup
Date
Msg-id CAA4eK1+ucspf0ypAQ3sVt-j3xg1v5wja_coEs3NXZ9B1B55mAg@mail.gmail.com
In response to Re: Handle infinite recursion in logical replication setup  (Peter Smith <smithpb2250@gmail.com>)
Responses RE: Handle infinite recursion in logical replication setup
List pgsql-hackers
On Wed, Aug 17, 2022 at 12:34 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Aug 17, 2022 at 8:48 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Tuesday, August 2, 2022 8:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Thanks for the summary.
> > >
> > > I think it's fine to make the user use the copy_data option more carefully to
> > > prevent duplicate copies by reporting an ERROR.
> > >
> > > But I also have a similar concern to Sawada-san's, as it's possible for the
> > > user to receive an ERROR in some unexpected cases.
> > >
> > > For example, suppose I want to build a bi-directional setup between two nodes:
> > >
> > > Node A: TABLE test (has actual data)
> > > Node B: TABLE test (empty)
> > >
> > > Step 1:
> > > CREATE PUBLICATION on both Node A and B.
> > >
> > > Step 2:
> > > CREATE SUBSCRIPTION on Node A with (copy_data = on)
> > > -- this is fine as there is no data on Node B
> > >
> > > Step 3:
> > > CREATE SUBSCRIPTION on Node B with (copy_data = on)
> > > -- this should be fine as user needs to copy data from Node A to Node B,
> > > -- but we still report an error for this case.
> > >
> > > It looks a bit strict to report an ERROR in this case, and it does not seem
> > > easy to avoid. So, personally, I think it might be better to document the
> > > correct steps to build the bi-directional replication, and probably also
> > > document the steps to recover if the user accidentally did a duplicate
> > > initial copy, if that is not documented yet.
> > >
> > > In addition, we could also LOG some additional information about the ORIGIN and
> > > initial copy, which might help the user analyze the problem if needed.
> > >
> >
> > But why LOG instead of WARNING? I feel in this case there is a chance
> > of inconsistent data so a WARNING like "publication "pub1" could have
> > data from multiple origins" can be given when the user has specified
> > options: "copy_data = on, origin = NONE" while creating a
> > subscription. We give a WARNING during subscription creation when the
> > corresponding publication doesn't exist, e.g.:
> >
> > postgres=# create subscription sub1 connection 'dbname = postgres'
> > publication pub1;
> > WARNING:  publication "pub1" does not exist in the publisher
> >
> > Then, we can explain in docs how users can avoid data inconsistencies
> > while setting up replication.
> >
>
> I was wondering if this copy/origin case really should be a NOTICE.
>

We usually give NOTICE for some sort of additional implicit
information, e.g., when we create a slot during the CREATE SUBSCRIPTION
command: "NOTICE: created replication slot "sub1" on publisher". IMO,
this case is likely to lead to data inconsistency, so I think here
we can choose between WARNING and LOG. I prefer WARNING but am okay with
LOG as well if others feel so. I think we can change this later as
well if required. We do have the option to not do anything and just
document it, but I feel it is better to give the user some indication of
the problem here because not everyone reads each update of the documentation.
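
For reference, Hou-San's scenario quoted above could be written out roughly
as below. This is only a sketch: the publication and subscription names and
the connection strings are placeholders I made up, and the origin option is
the one proposed in this thread, so its final spelling may differ.

```sql
-- Step 1: create a publication on each node (names are placeholders).
-- On Node A (table "test" has data):
CREATE PUBLICATION pub_a FOR TABLE test;
-- On Node B (table "test" is empty):
CREATE PUBLICATION pub_b FOR TABLE test;

-- Step 2: on Node A, subscribe to Node B.
-- Node B's table is empty, so copy_data = on copies nothing.
CREATE SUBSCRIPTION sub_a_from_b
    CONNECTION 'host=node_b dbname=postgres'  -- placeholder connection string
    PUBLICATION pub_b
    WITH (copy_data = on, origin = none);

-- Step 3: on Node B, subscribe to Node A.
-- The user legitimately wants Node A's existing rows copied here, yet this
-- is the statement that would hit the ERROR (or the WARNING suggested
-- above) under the behaviour being discussed.
CREATE SUBSCRIPTION sub_b_from_a
    CONNECTION 'host=node_a dbname=postgres'  -- placeholder connection string
    PUBLICATION pub_a
    WITH (copy_data = on, origin = none);
```

With a WARNING rather than an ERROR, Step 3 would still succeed, and the
docs can explain when the message is safe to ignore.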

Jonathan, Sawada-San, Hou-San, and others, what do you think is the
best way to move forward here?

-- 
With Regards,
Amit Kapila.


