Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Data is copied twice when specifying both child and parent table in publication
Msg-id CAA4eK1+Y0cP+xgZuHHxvsO=hQ+Zrp4GbBRTKH6kCGq3=FfVAHA@mail.gmail.com
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Greg Nancarrow <gregn4422@gmail.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication
List pgsql-hackers
On Wed, Oct 20, 2021 at 3:03 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Oct 20, 2021 at 7:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > Actually, at least with the scenario I gave steps for, after looking
> > > > at it again and debugging, I think that the behavior is understandable
> > > > and not a bug.
> > > > The reason is that the INSERTed data is first published through the
> > > > partitions, since initially there is no partitioned table in the
> > > > publication (so publish_via_partition_root=true doesn't have any
> > > > effect). But then, after adding the partitioned table to the publication
> > > > and refreshing the publication in the subscriber, the data is
> > > > published "using the identity and schema of the partitioned table" due
> > > > to publish_via_partition_root=true. Note that the corresponding table
> > > > in the subscriber may well be a non-partitioned table (or the
> > > > partitions arranged differently) so the data does need to be
> > > > replicated again.
> > >
> >
> > Even if the partitions are arranged differently, why would the user
> > expect the same data to be replicated twice?
> >
>
> It's the same data, but published in different ways because of changes
> the user made to the publication.
> I am not talking in general, I am specifically referring to the
> scenario I gave steps for.
> In the example scenario I gave, initially when the subscription was
> made, the publication just explicitly included the partitions, but
> publish_via_partition_root was true. So in this case it publishes
> through the individual partitions (as no partitioned table is present
> in the publication). Then on the publisher side, the partitioned table
> was then added to the publication and then ALTER SUBSCRIPTION ...
> REFRESH PUBLICATION done on the subscriber side. Now that the
> partitioned table is present in the publication and
> publish_via_partition_root is true, it is "published using the
> identity and schema of the partitioned table rather than that of the
> individual partitions that are actually changed". So the data is
> replicated again.
>

I don't see why the data needs to be replicated again even in that case.
Can you see any such duplicate data replicated for non-partitioned
tables?

> This scenario didn't use initial table data, so initial table sync
> didn't come into play.
>

It will be equivalent to an initial sync, because the tablesync worker
would copy the entire data again in this case unless we pass
copy_data = false during the refresh.
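To make the double copy concrete, here is a toy Python model of the scenario above. It is a sketch under stated assumptions, not PostgreSQL's implementation: the classes `Publisher` and `Subscriber`, the method `published_as`, and the table names `parent`, `p1` and `p2` are all hypothetical. It assumes a single level of partitioning, that with publish_via_partition_root a partition's change is published as the parent only once the parent is in the publication, and that REFRESH PUBLICATION starts a full copy (tablesync) of every relation the subscriber has not seen before unless copy_data is false.

```python
# Toy model of the thread's scenario; NOT PostgreSQL's implementation.
# Hypothetical assumptions: one partitioned table "parent" with leaf
# partitions "p1" and "p2"; with publish_via_partition_root, a change is
# published as "parent" only once "parent" is in the publication; and
# REFRESH PUBLICATION fully copies every relation new to the subscriber
# unless copy_data is false.

PARENT = "parent"
PARTITIONS = ("p1", "p2")


class Publisher:
    def __init__(self, tables, via_root=True):
        self.pub_tables = set(tables)      # tables listed in the publication
        self.via_root = via_root           # publish_via_partition_root
        self.data = {p: [] for p in PARTITIONS}

    def published_as(self, partition):
        """Relation a change to `partition` is published as (or None)."""
        if self.via_root and PARENT in self.pub_tables:
            return PARENT
        return partition if partition in self.pub_tables else None

    def published_relations(self):
        return {r for r in map(self.published_as, PARTITIONS) if r is not None}

    def contents(self, relation):
        """Rows an initial copy of `relation` would see."""
        if relation == PARENT:
            return [row for p in PARTITIONS for row in self.data[p]]
        return list(self.data[relation])

    def insert(self, partition, row, subscriber):
        """INSERT on the publisher, streamed to the subscriber if subscribed."""
        self.data[partition].append(row)
        rel = self.published_as(partition)
        if rel in subscriber.known:
            subscriber.apply(rel, row)


class Subscriber:
    def __init__(self):
        self.known = set()   # relations picked up at the last refresh
        self.rows = {}       # relation -> rows applied on the subscriber

    def apply(self, relation, row):
        self.rows.setdefault(relation, []).append(row)

    def refresh(self, publisher, copy_data=True):
        current = publisher.published_relations()
        for rel in sorted(current - self.known):
            if copy_data:    # tablesync: copy the relation's full contents
                for row in publisher.contents(rel):
                    self.apply(rel, row)
        self.known = current


def run(copy_data):
    """Return how many rows the subscriber applies in total."""
    pub = Publisher(tables=PARTITIONS, via_root=True)
    sub = Subscriber()
    sub.refresh(pub)                       # CREATE SUBSCRIPTION (no data yet)
    pub.insert("p1", 1, sub)               # streamed as p1
    pub.insert("p2", 2, sub)               # streamed as p2
    pub.pub_tables.add(PARENT)             # ALTER PUBLICATION ... ADD TABLE
    sub.refresh(pub, copy_data=copy_data)  # ALTER SUBSCRIPTION ... REFRESH PUBLICATION
    return sum(len(rows) for rows in sub.rows.values())


print(run(copy_data=True), run(copy_data=False))  # prints "4 2"
```

In this model, with copy_data left at true, each of the two publisher rows is applied twice on the subscriber: once streamed through its partition and once more from the initial copy of parent. Passing copy_data = false on the refresh skips that second copy.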

-- 
With Regards,
Amit Kapila.
