Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Data is copied twice when specifying both child and parent table in publication
Msg-id CAJcOf-fv7tEv=N+LZo9H1fp1A7NB9wsWDDMw048XNy2fyESgnw@mail.gmail.com
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Oct 20, 2021 at 7:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > Actually, at least with the scenario I gave steps for, after looking
> > > at it again and debugging, I think that the behavior is understandable
> > > and not a bug.
> > > The reason is that the INSERTed data is first published through the
> > > partitions, since initially there is no partitioned table in the
> > > publication (so publish_via_partition_root=true doesn't have any
> > > effect). But then adding the partitioned table to the publication and
> > > refreshing the publication in the subscriber, the data is then
> > > published "using the identity and schema of the partitioned table" due
> > > to publish_via_partition_root=true. Note that the corresponding table
> > > in the subscriber may well be a non-partitioned table (or the
> > > partitions arranged differently) so the data does need to be
> > > replicated again.
> >
>
> Even if the partitions are arranged differently why would the user
> expect the same data to be replicated twice?
>

It's the same data, but published in different ways because of changes
the user made to the publication.
I am not talking in general; I am specifically referring to the
scenario I gave steps for.
In the example scenario I gave, initially when the subscription was
made, the publication just explicitly included the partitions, but
publish_via_partition_root was true. So in this case it publishes
through the individual partitions (as no partitioned table is present
in the publication). Then, on the publisher side, the partitioned table
was added to the publication, and ALTER SUBSCRIPTION ... REFRESH
PUBLICATION was run on the subscriber side. Now that the
partitioned table is present in the publication and
publish_via_partition_root is true, it is "published using the
identity and schema of the partitioned table rather than that of the
individual partitions that are actually changed". So the data is
replicated again.
This scenario didn't use initial table data, so initial table sync
didn't come into play (although as I previously posted, I can see a
double-publish issue on initial sync if data is put in the table prior
to subscription and partitions have been explicitly added to the
publication).
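
To make the sequence concrete, here is a minimal sketch of the kind of
scenario described above. It is not necessarily identical to the steps
posted earlier in the thread: the table, publication, and subscription
names and the connection string are all hypothetical, and the subscriber
is assumed to have the same table definitions as the publisher.

```sql
-- All object names below are hypothetical, chosen only for illustration.

-- Publisher: a partitioned table with two partitions.
CREATE TABLE sales (id int, region int) PARTITION BY RANGE (region);
CREATE TABLE sales_p1 PARTITION OF sales FOR VALUES FROM (0) TO (10);
CREATE TABLE sales_p2 PARTITION OF sales FOR VALUES FROM (10) TO (20);

-- Publisher: only the partitions are listed, but
-- publish_via_partition_root is already true.
CREATE PUBLICATION pub FOR TABLE sales_p1, sales_p2
    WITH (publish_via_partition_root = true);

-- Subscriber (assumed to have the same three tables already created).
-- The connection string is a placeholder.
CREATE SUBSCRIPTION sub
    CONNECTION 'host=publisher_host dbname=postgres'
    PUBLICATION pub;

-- Publisher: no initial data existed, so these rows are published
-- through the individual partitions.
INSERT INTO sales VALUES (1, 5), (2, 15);

-- Publisher: now add the partitioned table itself.
ALTER PUBLICATION pub ADD TABLE sales;

-- Subscriber: pick up the newly added table. Per the description
-- above, the data is then replicated again, this time using the
-- identity and schema of the partitioned table.
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
```

Counting the rows in the subscriber's copy of the table after each
step would be one way to check whether the data really arrives twice.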

Regards,
Greg Nancarrow
Fujitsu Australia


