Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id CA+HiwqEJdWeYKS+W+_enTt+SkpJ4wioBa+KFN6HamaT5RA=UtQ@mail.gmail.com
Whole thread Raw
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication  (Amit Kapila <amit.kapila16@gmail.com>)
RE: Data is copied twice when specifying both child and parent table in publication  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Mon, Oct 18, 2021 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Oct 18, 2021 at 8:27 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Sat, Oct 16, 2021 at 5:30 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Friday, October 15, 2021 7:23 PM houzj.fnst@fujitsu.com wrote:
> > > > Attach a patch to fix it.
> > > Attach a new version patch which refactor the fix code in a cleaner way.
> > >
> >
> > I have not debugged it yet to find out why, but with the patch
> > applied, the original double-publish problem that I reported
> > (converted to just use TABLE rather than ALL TABLES IN SCHEMA) still
> > occurs.
> >
>
> Yeah, I think this is a variant of the problem being fixed by
> Hou-San's patch. I think one possible idea to investigate is that on
> the subscriber-side, after fetching tables, we check the already
> subscribed tables and if the child tables already exist then we ignore
> the parent table and vice versa. We might want to consider the case
> where a user has toggled the "publish_via_partition_root" parameter.
>
> It seems both these behaviours/problems exist since commit 17b9e7f9
> (Support adding partitioned tables to publication). Adding Amit L and
> Peter E (people involved in this work) to know their opinion?

I can imagine that the behavior seen here may look surprising, but not
sure if I would call it a bug as such.  I do remember thinking about
this case and the current behavior is how I may have coded it to be.

Looking at this command in Hou-san's email:

  create publication pub for table tbl1, tbl1_part1 with
(publish_via_partition_root=on);

It's adding both the root partitioned table and the leaf partition
*explicitly*, and it's not clear to me if the latter's inclusion in
the publication should be assumed because the former is found to have
been added to the publication, that is, as far as the latter's
visibility to the subscriber is concerned.  It's not a stretch to
imagine that a user may write the command this way to account for a
subscriber node on which tbl1 and tbl1_part1 are unrelated tables.

I don't think we assume anything on the publisher side regarding the
state/configuration of tables on the subscriber side, at least with
publication commands where tables are added to a publication
explicitly, so it is up to the user to make sure that the tables are
not added duplicatively.  One may however argue that the way we've
decided to handle FOR ALL TABLES does assume something about
partitions where it skips advertising them to subscribers when
publish_via_partition_root flag is set to true, but that is exactly to
avoid the duplication of data that goes to a subscriber.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Amit Kapila
Date:
Subject: Re: Skipping logical replication transactions on subscriber side