RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id OS0PR01MB57165AB0FE2E96B642238AFB94BD9@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication  (Dilip Kumar <dilipbalaut@gmail.com>)
RE: Data is copied twice when specifying both child and parent table in publication  ("shiy.fnst@fujitsu.com" <shiy.fnst@fujitsu.com>)
List pgsql-hackers
On Monday, October 18, 2021 5:03 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I can imagine that the behavior seen here may look surprising, but not
> sure if I would call it a bug as such.  I do remember thinking about
> this case and the current behavior is how I may have coded it to be.
> 
> Looking at this command in Hou-san's email:
> 
>   create publication pub for table tbl1, tbl1_part1 with
> (publish_via_partition_root=on);
> 
> It's adding both the root partitioned table and the leaf partition
> *explicitly*, and it's not clear to me if the latter's inclusion in
> the publication should be assumed because the former is found to have
> been added to the publication, that is, as far as the latter's
> visibility to the subscriber is concerned.  It's not a stretch to
> imagine that a user may write the command this way to account for a
> subscriber node on which tbl1 and tbl1_part1 are unrelated tables.
> 
> I don't think we assume anything on the publisher side regarding the
> state/configuration of tables on the subscriber side, at least with
> publication commands where tables are added to a publication
> explicitly, so it is up to the user to make sure that the tables are
> not added duplicatively.  One may however argue that the way we've
> decided to handle FOR ALL TABLES does assume something about
> partitions where it skips advertising them to subscribers when
> publish_via_partition_root flag is set to true, but that is exactly to
> avoid the duplication of data that goes to a subscriber.

Hi,

Thanks for the explanation.

One reason I consider this behavior a bug is that, if we add both the root
partitioned table and the leaf partition explicitly to the publication (and
set publish_via_partition_root = on), the behavior of the apply worker is
inconsistent with that of the table sync worker.

In this case, all changes in the leaf partition will be applied using the
identity and schema of the partitioned (root) table. But table sync will be
executed for both the leaf and the root table, which causes duplication of
data.
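
To make the scenario concrete, here is a minimal sketch of a reproduction.
Only the table names and the CREATE PUBLICATION command come from the command
quoted above; the column definition, partition bounds, subscription name and
connection string are assumptions for illustration.

```sql
-- Publisher (table definitions are assumed, not taken from the report)
CREATE TABLE tbl1 (a int) PARTITION BY RANGE (a);
CREATE TABLE tbl1_part1 PARTITION OF tbl1 FOR VALUES FROM (1) TO (10);
INSERT INTO tbl1 VALUES (1);

-- Both the root and the leaf are added explicitly
CREATE PUBLICATION pub FOR TABLE tbl1, tbl1_part1
    WITH (publish_via_partition_root = on);

-- Subscriber (same assumed table layout as on the publisher)
CREATE TABLE tbl1 (a int) PARTITION BY RANGE (a);
CREATE TABLE tbl1_part1 PARTITION OF tbl1 FOR VALUES FROM (1) TO (10);

-- The connection string is a placeholder
CREATE SUBSCRIPTION sub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub;

-- Once table sync has finished: with the behavior described above, the
-- pre-existing row would be expected to appear twice, copied once for
-- tbl1 and once for tbl1_part1.
SELECT * FROM tbl1;
```

Rows inserted on the publisher after the subscription is created go through
the apply worker instead, which is where the two paths differ.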

Wouldn't it be better to make the behavior consistent here?

Best regards,
Hou zj


