RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From shiy.fnst@fujitsu.com
Subject RE: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id OSZPR01MB631041286A2FA44C6195813DFD809@OSZPR01MB6310.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Data is copied twice when specifying both child and parent table in publication  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Tuesday, October 19, 2021 10:47 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> 
> On Monday, October 18, 2021 5:03 PM Amit Langote
> <amitlangote09@gmail.com> wrote:
> > I can imagine that the behavior seen here may look surprising, but not
> > sure if I would call it a bug as such.  I do remember thinking about
> > this case and the current behavior is how I may have coded it to be.
> >
> > Looking at this command in Hou-san's email:
> >
> >   create publication pub for table tbl1, tbl1_part1 with
> > (publish_via_partition_root=on);
> >
> > It's adding both the root partitioned table and the leaf partition
> > *explicitly*, and it's not clear to me if the latter's inclusion in
> > the publication should be assumed because the former is found to have
> > been added to the publication, that is, as far as the latter's
> > visibility to the subscriber is concerned.  It's not a stretch to
> > imagine that a user may write the command this way to account for a
> > subscriber node on which tbl1 and tbl1_part1 are unrelated tables.
> >
> > I don't think we assume anything on the publisher side regarding the
> > state/configuration of tables on the subscriber side, at least with
> > publication commands where tables are added to a publication
> > explicitly, so it is up to the user to make sure that the tables are
> > not added duplicatively.  One may however argue that the way we've
> > decided to handle FOR ALL TABLES does assume something about
> > partitions where it skips advertising them to subscribers when
> > publish_via_partition_root flag is set to true, but that is exactly to
> > avoid the duplication of data that goes to a subscriber.
> 
> Hi,
> 
> Thanks for the explanation.
> 
> I think one reason that I consider this behavior a bug is that: If we add
> both the root partitioned table and the leaf partition explicitly to the
> publication (and set publish_via_partition_root = on), the behavior of the
> apply worker is inconsistent with the behavior of table sync worker.
> 
> In this case, all changes in the leaf the partition will be applied using the
> identity and schema of the partitioned(root) table. But for the table sync, it
> will execute table sync for both the leaf and the root table which cause
> duplication of data.
> 
> Wouldn't it be better to make the behavior consistent here ?
> 

I agree with this point. 

About this case,

> >   create publication pub for table tbl1, tbl1_part1 with
> > (publish_via_partition_root=on);

As a user, although partitioned table includes the partition, publishing partitioned
table and its partition is allowed. So, I think we should take this case into
consideration. Initial data is copied once via the parent table seems reasonable.

Regards
Shi yu

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: should we allow users with a predefined role to access pg_backend_memory_contexts view and pg_log_backend_memory_contexts function?
Next
From: Tom Lane
Date:
Subject: Re: Experimenting with hash tables inside pg_dump