Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Data is copied twice when specifying both child and parent table in publication
Date	October 20, 2021 10:19:46
Msg-id	CAA4eK1+Y0cP+xgZuHHxvsO=hQ+Zrp4GbBRTKH6kCGq3=FfVAHA@mail.gmail.com
In response to	Re: Data is copied twice when specifying both child and parent table in publication (Greg Nancarrow <gregn4422@gmail.com>)
Responses	Re: Data is copied twice when specifying both child and parent table in publication
List	pgsql-hackers

On Wed, Oct 20, 2021 at 3:03 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Oct 20, 2021 at 7:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > Actually, at least with the scenario I gave steps for, after looking
> > > > at it again and debugging, I think that the behavior is understandable
> > > > and not a bug.
> > > > The reason is that the INSERTed data is first published through the
> > > > partitions, since initially there is no partitioned table in the
> > > > publication (so publish_via_partition_root=true doesn't have any
> > > > effect). But after the partitioned table is added to the publication
> > > > and the publication is refreshed on the subscriber, the data is then
> > > > published "using the identity and schema of the partitioned table" due
> > > > to publish_via_partition_root=true. Note that the corresponding table
> > > > in the subscriber may well be a non-partitioned table (or the
> > > > partitions arranged differently) so the data does need to be
> > > > replicated again.
> > >
> >
> > Even if the partitions are arranged differently, why would the user
> > expect the same data to be replicated twice?
> >
>
> It's the same data, but published in different ways because of changes
> the user made to the publication.
> I am not talking in general; I am specifically referring to the
> scenario I gave steps for.
> In the example scenario I gave, when the subscription was initially
> created, the publication explicitly included just the partitions, but
> publish_via_partition_root was true. So in this case the data is
> published through the individual partitions (as no partitioned table
> is present in the publication). Then, on the publisher side, the
> partitioned table was added to the publication, and ALTER SUBSCRIPTION ...
> REFRESH PUBLICATION was run on the subscriber side. Now that the
> partitioned table is present in the publication and
> publish_via_partition_root is true, the data is "published using the
> identity and schema of the partitioned table rather than that of the
> individual partitions that are actually changed". So the data is
> replicated again.
>
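
The scenario described above can be sketched in SQL roughly as follows. This is a hedged reconstruction, not the original reproduction steps: the table names (tab, tab_1), the publication and subscription names (pub, sub) and the connection string are hypothetical, and the publisher and subscriber are assumed to be separate servers.

```sql
-- On the publisher (all object names are hypothetical).
CREATE TABLE tab (a int) PARTITION BY RANGE (a);
CREATE TABLE tab_1 PARTITION OF tab FOR VALUES FROM (0) TO (100);
-- Only the partition is published, even though publish_via_partition_root is set.
CREATE PUBLICATION pub FOR TABLE tab_1 WITH (publish_via_partition_root = true);

-- On the subscriber (a separate server): matching tables, then subscribe.
CREATE TABLE tab (a int) PARTITION BY RANGE (a);
CREATE TABLE tab_1 PARTITION OF tab FOR VALUES FROM (0) TO (100);
CREATE SUBSCRIPTION sub
    CONNECTION 'host=publisher.example dbname=postgres'  -- hypothetical
    PUBLICATION pub;

-- On the publisher: no initial data existed; this row is streamed via tab_1,
-- because the partitioned root is not yet in the publication.
INSERT INTO tab VALUES (1);
-- Now add the partitioned table itself to the publication.
ALTER PUBLICATION pub ADD TABLE tab;

-- On the subscriber: pick up the newly published table. With the default
-- copy_data = true, the tablesync worker copies the contents of tab,
-- so row 1 arrives a second time (the behavior discussed in this thread).
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
```

Because the sketch's tables have no primary key, a second copy would show up as a duplicate row rather than as a unique-violation error during the sync.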

I don't see why the data needs to be replicated again, even in that case.
Can you see any such duplicate data replicated for non-partitioned
tables?

> This scenario didn't use initial table data, so initial table sync
> didn't come into play.
>

It will be equivalent to the initial sync, because in this case the
tablesync worker would copy the entire data again unless we pass
copy_data = false during the refresh.
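
For reference, the copy_data option of ALTER SUBSCRIPTION ... REFRESH PUBLICATION controls whether existing data in newly added tables is copied. The subscription name sub below is hypothetical. Skipping the copy is only appropriate when that data is already present on the subscriber, since tables added this way otherwise start without their pre-existing rows.

```sql
-- Refresh the subscription's table list without having the tablesync
-- worker copy existing data for tables newly added to the publication.
ALTER SUBSCRIPTION sub REFRESH PUBLICATION WITH (copy_data = false);
```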

-- 
With Regards,
Amit Kapila.