Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Data is copied twice when specifying both child and parent table in publication
Msg-id CAA4eK1JmEpuhQeo9hzd=5jA+e7gwS34x__Ymu_3SuwhUBaN16Q@mail.gmail.com
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication
List pgsql-hackers
On Wed, Oct 20, 2021 at 1:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Oct 20, 2021 at 12:44 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Mon, Oct 18, 2021 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > > I have not debugged it yet to find out why, but with the patch
> > > > applied, the original double-publish problem that I reported
> > > > (converted to just use TABLE rather than ALL TABLES IN SCHEMA) still
> > > > occurs.
> > > >
> > >
> > > Yeah, I think this is a variant of the problem being fixed by
> > > Hou-San's patch. I think one possible idea to investigate is that on
> > > the subscriber-side, after fetching tables, we check the already
> > > subscribed tables and if the child tables already exist then we ignore
> > > the parent table and vice versa. We might want to consider the case
> > > where a user has toggled the "publish_via_partition_root" parameter.
> > >
> > > It seems both these behaviours/problems exist since commit 17b9e7f9
> > > (Support adding partitioned tables to publication). Adding Amit L and
> > > Peter E (people involved in this work) to know their opinion?
> > >
> >
> > Actually, at least with the scenario I gave steps for, after looking
> > at it again and debugging, I think that the behavior is understandable
> > and not a bug.
> > The reason is that the INSERTed data is first published through the
> > partitions, since initially there is no partitioned table in the
> > publication (so publish_via_partition_root=true doesn't have any
> > effect). But then adding the partitioned table to the publication and
> > refreshing the publication in the subscriber, the data is then
> > published "using the identity and schema of the partitioned table" due
> > to publish_via_partition_root=true. Note that the corresponding table
> > in the subscriber may well be a non-partitioned table (or the
> > partitions arranged differently) so the data does need to be
> > replicated again.
>

Even if the partitions are arranged differently why would the user
expect the same data to be replicated twice?

> I don't think this behavior is consistent, I mean for the initial sync
> we will replicate the duplicate data, whereas for later streaming we
> will only replicate it once.  From the user's POV, this behavior doesn't
> look correct.
>

+1.
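
For illustration only, the subscriber-side idea quoted above (after
fetching tables, ignore a parent whose child is already subscribed, and
vice versa) could be sketched roughly as follows. This is not PostgreSQL
code: the function names, the parent_of mapping, and the table names are
all hypothetical, and the sketch ignores both a toggled
publish_via_partition_root and the case where only some partitions of a
parent are already subscribed.

```python
from typing import Dict, Iterable, List, Optional, Set


def ancestors_of(table: str, parent_of: Dict[str, Optional[str]]) -> Set[str]:
    """Return every ancestor of `table` in a (hypothetical) partition tree."""
    result: Set[str] = set()
    current = parent_of.get(table)
    while current is not None:
        result.add(current)
        current = parent_of.get(current)
    return result


def tables_needing_initial_copy(
    fetched: Iterable[str],
    subscribed: Iterable[str],
    parent_of: Dict[str, Optional[str]],
) -> List[str]:
    """Filter freshly fetched tables so related data is not copied twice.

    A fetched table is skipped when it is already subscribed, when one of
    its ancestors is already subscribed, or when one of its descendants
    is already subscribed.
    """
    subscribed_set = set(subscribed)

    # Every table that is an ancestor of some already-subscribed table.
    ancestors_of_subscribed: Set[str] = set()
    for table in subscribed_set:
        ancestors_of_subscribed |= ancestors_of(table, parent_of)

    to_copy: List[str] = []
    for table in fetched:
        if table in subscribed_set:
            continue  # already subscribed, nothing new to copy
        if ancestors_of(table, parent_of) & subscribed_set:
            continue  # a parent is already subscribed: ignore the child
        if table in ancestors_of_subscribed:
            continue  # a child is already subscribed: ignore the parent
        to_copy.append(table)
    return to_copy


# Hypothetical partition tree: "parent" has partitions "part1" and "part2".
PARENT_OF = {"parent": None, "part1": "parent", "part2": "parent", "other": None}
```

With this sketch, a refresh that fetches "parent" while "part1" and
"part2" are already subscribed would schedule no new initial copy, which
is the behaviour a user would presumably expect.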

-- 
With Regards,
Amit Kapila.


