On Wed, Oct 20, 2021 at 3:03 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Wed, Oct 20, 2021 at 7:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > Actually, at least with the scenario I gave steps for, after looking
> > > > at it again and debugging, I think that the behavior is understandable
> > > > and not a bug.
> > > > The reason is that the INSERTed data is first published through the
> > > > partitions, since initially there is no partitioned table in the
> > > > publication (so publish_via_partition_root=true doesn't have any
> > > > effect). But after the partitioned table is added to the publication
> > > > and the publication is refreshed on the subscriber, the data is
> > > > published "using the identity and schema of the partitioned table" due
> > > > to publish_via_partition_root=true. Note that the corresponding table
> > > > in the subscriber may well be a non-partitioned table (or its
> > > > partitions may be arranged differently), so the data does need to be
> > > > replicated again.
> > >
> >
> > Even if the partitions are arranged differently why would the user
> > expect the same data to be replicated twice?
> >
>
> It's the same data, but published in different ways because of changes
> the user made to the publication.
> I'm not talking in general; I'm specifically referring to the
> scenario I gave steps for.
> In that scenario, when the subscription was initially created, the
> publication explicitly included only the partitions, but
> publish_via_partition_root was true. So in this case the changes are
> published through the individual partitions (as no partitioned table is
> present in the publication). Then, on the publisher side, the
> partitioned table was added to the publication, and ALTER SUBSCRIPTION
> ... REFRESH PUBLICATION was run on the subscriber side. Now that the
> partitioned table is present in the publication and
> publish_via_partition_root is true, the data is "published using the
> identity and schema of the partitioned table rather than that of the
> individual partitions that are actually changed". So the data is
> replicated again.
>
I don't see why the data needs to be replicated again, even in that case.
Do you see any such duplicate data being replicated for non-partitioned
tables?
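To make the identity switch in the scenario above concrete, here is a toy model (not PostgreSQL's pgoutput code) of which relation a change to a partition is published under. It assumes a simplified rule: with publish_via_partition_root=true and an ancestor present in the publication, the topmost published ancestor is used; otherwise the changed partition itself is used. The function name `published_as` and the relation names `p1`, `p2`, `parent` are hypothetical.

```python
from typing import Optional, Sequence, Set


def published_as(
    leaf: str,
    ancestors: Sequence[str],  # nearest parent first, root last
    publication: Set[str],
    publish_via_partition_root: bool,
) -> Optional[str]:
    """Toy model: return the relation whose identity a change to `leaf`
    is published under, or None if the change is not published at all."""
    published_ancestors = [a for a in ancestors if a in publication]
    if publish_via_partition_root and published_ancestors:
        # Use the topmost ancestor that is itself in the publication.
        return published_ancestors[-1]
    if leaf in publication or published_ancestors:
        # Publish through the partition that was actually changed.
        return leaf
    return None


# Before: only the partitions are in the publication.
before = published_as("p1", ["parent"], {"p1", "p2"}, True)
# After: the partitioned table has been added as well.
after = published_as("p1", ["parent"], {"p1", "p2", "parent"}, True)
```

In this model `before` is `"p1"` and `after` is `"parent"`, so from the subscriber's point of view the same rows now arrive under a relation it had not subscribed to before.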
> This scenario didn't use initial table data, so initial table sync
> didn't come into play.
>
It will be equivalent to an initial sync, because the tablesync worker
would copy the entire data again in this case unless we pass
copy_data = false during the refresh.
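A toy model of the refresh step described above may help show why this amounts to an initial sync. It is a sketch under stated assumptions, not PostgreSQL's implementation: a refresh picks up relations the subscriber is not yet subscribed to, and with copy_data enabled each newly subscribed relation gets a full initial copy. The subscriber's target is assumed to have no unique key, so re-copied rows become duplicates (with a key, the copy would fail instead). The function `refresh_publication` and all relation names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[int, str]


def refresh_publication(
    subscribed: Set[str],
    published: Set[str],
    publisher_rows: Dict[str, List[Row]],
    subscriber_rows: List[Row],
    copy_data: bool = True,
) -> Tuple[Set[str], List[Row]]:
    """Toy model of REFRESH PUBLICATION: newly published relations are
    added to the subscription and, if copy_data is set, fully copied."""
    new_rels = sorted(published - subscribed)
    rows = list(subscriber_rows)
    if copy_data:
        for rel in new_rels:
            # Initial copy of everything the publisher currently holds,
            # regardless of what was already streamed via the partitions.
            rows.extend(publisher_rows[rel])
    return set(published), rows


already_streamed = [(1, "a"), (2, "b")]  # arrived earlier via p1/p2
publisher = {"parent": [(1, "a"), (2, "b")]}
subs, rows = refresh_publication({"p1", "p2"}, {"parent"}, publisher, already_streamed)
_, rows_no_copy = refresh_publication(
    {"p1", "p2"}, {"parent"}, publisher, already_streamed, copy_data=False
)
```

Here `rows` holds each row twice, while `rows_no_copy` matches what was already streamed, which mirrors the point that passing copy_data = false during the refresh avoids the re-copy.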
--
With Regards,
Amit Kapila.