Re: Handle infinite recursion in logical replication setup - Mailing list pgsql-hackers

From Jonathan S. Katz
Subject Re: Handle infinite recursion in logical replication setup
Date
Msg-id afb653ba-e2b1-33a3-a54c-849f4466e1b4@postgresql.org
Whole thread Raw
In response to Re: Handle infinite recursion in logical replication setup  (vignesh C <vignesh21@gmail.com>)
Responses Re: Handle infinite recursion in logical replication setup
Re: Handle infinite recursion in logical replication setup
List pgsql-hackers
On 7/25/22 4:54 AM, vignesh C wrote:
> On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>>
>> On 7/22/22 12:47 AM, Amit Kapila wrote:
>>> On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:

>>> BTW, do you have any opinion on the idea of the first remaining patch
>>> where we accomplish two things: a) Checks and throws an error if
>>> 'copy_data = on' and 'origin = none' but the publication tables were
>>> also replicated from other publishers. b) Adds 'force' value for
>>> copy_data parameter to allow copying in such a case. The primary
>>> reason for this patch is to avoid loops or duplicate data in the
>>> initial phase. We can't skip copying based on origin as we can do
>>> while replicating changes from WAL. So, we detect that the publisher
>>> already has data from some other node and doesn't allow replication
>>> unless the user uses the 'force' option for copy_data.
>>
>> In general, I agree with the patch; but I'm not sure why we are calling
>> "copy_data = force" in this case and how it varies from "on". I
>> understand the goal is to prevent the infinite loop, but is there some
>> technical restriction why we can't set "origin = none, copy_data = on"
>> and have this work (and apologies if I missed that upthread)?
> 
> Let's take a simple case to understand why copy_data = force is
> required to replicate between two primaries for table t1 which has
> data as given below:

> step4 - Node-2:
> Create Subscription sub2 Connection '<node-1 details>' Publication
> pub1_2 with (origin = none, copy_data=on);
> If we had allowed the create subscription to be successful with
> copy_data = on. After this the data will be something like this:
> Node-1:
> 1, 2, 3, 4, 5, 6, 7, 8
> 
> Node-2:
> 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8
> 
> So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
> case, table t1 has a unique key, it will lead to a unique key
> violation and replication won't proceed.
> 
> To avoid this we will throw an error:
> ERROR:  could not replicate table "public.t1"
> DETAIL:  CREATE/ALTER SUBSCRIPTION with origin = none and copy_data =
> on is not allowed when the publisher has subscribed same table.
> HINT:  Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.

Thanks for the example. I agree that it is fairly simple to reproduce.

I understand that "copy_data = force" is meant to protect a user from 
hurting themself. I'm not convinced that this is the best way to do so.

For example today I can subscribe to multiple publications that write to 
the same table. If I have a primary key on that table, and two of the 
subscriptions try to write an identical ID, we conflict. We don't have 
any special flags or modes to guard against that from happening, though 
we do have documentation on conflicts and managing them.

AFAICT the same issue with "copy_data" also exists in the above scenario 
too, even without the "origin" attribute. However, I think this case is 
more noticeable for "origin=none" because we currently default 
"copy_data" to "true" and in this case data can be copied in two 
directions.

That said, this introduces a new restriction for this particular 
scenario that doesn't exist on other scenarios. Instead, I would 
advocate we document how to correctly set up the two-way replication 
scenario (which we have a draft!), document the warnings around the 
conflicts, perhaps include Vignesh's instructions on how to remediate a 
conflict on initial sync, and consider throwing a WARNING as you suggested.

Thoughts?

Thanks,

Jonathan

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Introduce wait_for_subscription_sync for TAP tests
Next
From: Peter Smith
Date:
Subject: Re: Handle infinite recursion in logical replication setup