Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id CAJcOf-fHq5Mca2sf7MqckwkXGLfjqiKboKsDNnywC-jnvM_BBQ@mail.gmail.com
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Wed, Oct 20, 2021 at 7:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > Actually, at least with the scenario I gave steps for, after looking
> > at it again and debugging, I think that the behavior is understandable
> > and not a bug.
> > The reason is that the INSERTed data is first published through the
> > partitions, since initially there is no partitioned table in the
> > publication (so publish_via_partition_root=true doesn't have any
> > effect). But then adding the partitioned table to the publication and
> > refreshing the publication in the subscriber, the data is then
> > published "using the identity and schema of the partitioned table" due
> > to publish_via_partition_root=true. Note that the corresponding table
> > in the subscriber may well be a non-partitioned table (or the
> > partitions arranged differently) so the data does need to be
> > replicated again.
>
> I don't think this behavior is consistent, I mean for the initial sync
> we will replicate the duplicate data, whereas for later streaming we
> will only replicate it once.  From the user POV, this behavior doesn't
> look correct.
>

The scenario I gave steps for didn't have any table data when the
subscription was made, so the initial sync did not replicate any data.
I was referring to the double-publish that occurs when
publish_via_partition_root=true is set, the partitioned table is then
added to the publication, and the subscriber runs ALTER SUBSCRIPTION
... REFRESH PUBLICATION.
If I modify my example to include both the partitioned table and
(explicitly) its child partitions in the publication, and insert some
data on the publisher side prior to the subscription, then I am seeing
duplicate data on the initial sync on the subscriber side, and I would
agree that this doesn't seem correct.
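
For reference, the modified scenario can be sketched roughly as below.
This is only an illustration of the setup described above, not the
exact steps from my earlier mail: the table, publication and
subscription names and the connection string are all placeholders I
made up here.

```sql
-- Publisher side (all object names are hypothetical).
CREATE TABLE parent (a int) PARTITION BY RANGE (a);
CREATE TABLE child_1 PARTITION OF parent FOR VALUES FROM (0) TO (100);
CREATE TABLE child_2 PARTITION OF parent FOR VALUES FROM (100) TO (200);

-- Publish the partitioned table AND (explicitly) its partitions.
CREATE PUBLICATION pub FOR TABLE parent, child_1, child_2
    WITH (publish_via_partition_root = true);

-- Insert data BEFORE the subscription exists, so that it is
-- copied by the initial table sync rather than streamed.
INSERT INTO parent VALUES (1), (150);

-- Subscriber side: same table layout assumed for simplicity.
CREATE TABLE parent (a int) PARTITION BY RANGE (a);
CREATE TABLE child_1 PARTITION OF parent FOR VALUES FROM (0) TO (100);
CREATE TABLE child_2 PARTITION OF parent FOR VALUES FROM (100) TO (200);

-- Placeholder connection string; adjust for the actual publisher.
CREATE SUBSCRIPTION sub
    CONNECTION 'host=publisher_host dbname=postgres'
    PUBLICATION pub;

-- Once the initial sync has finished, check for duplicated rows.
SELECT a, count(*) FROM parent GROUP BY a ORDER BY a;
```

With a setup along these lines, a count greater than 1 for any value
of "a" would correspond to the duplicate initial-sync data I mentioned.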

Regards,
Greg Nancarrow
Fujitsu Australia


