On Tuesday, October 19, 2021 10:47 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
>
> On Monday, October 18, 2021 5:03 PM Amit Langote
> <amitlangote09@gmail.com> wrote:
> > I can imagine that the behavior seen here may look surprising, but not
> > sure if I would call it a bug as such. I do remember thinking about
> > this case and the current behavior is how I may have coded it to be.
> >
> > Looking at this command in Hou-san's email:
> >
> > create publication pub for table tbl1, tbl1_part1 with
> > (publish_via_partition_root=on);
> >
> > It's adding both the root partitioned table and the leaf partition
> > *explicitly*, and it's not clear to me if the latter's inclusion in
> > the publication should be assumed because the former is found to have
> > been added to the publication, that is, as far as the latter's
> > visibility to the subscriber is concerned. It's not a stretch to
> > imagine that a user may write the command this way to account for a
> > subscriber node on which tbl1 and tbl1_part1 are unrelated tables.
> >
> > I don't think we assume anything on the publisher side regarding the
> > state/configuration of tables on the subscriber side, at least with
> > publication commands where tables are added to a publication
> > explicitly, so it is up to the user to make sure that the tables are
> > not added duplicatively. One may however argue that the way we've
> > decided to handle FOR ALL TABLES does assume something about
> > partitions where it skips advertising them to subscribers when
> > publish_via_partition_root flag is set to true, but that is exactly to
> > avoid the duplication of data that goes to a subscriber.
>
> Hi,
>
> Thanks for the explanation.
>
> I think one reason that I consider this behavior a bug is that: If we add
> both the root partitioned table and the leaf partition explicitly to the
> publication (and set publish_via_partition_root = on), the behavior of the
> apply worker is inconsistent with the behavior of table sync worker.
>
> In this case, all changes in the leaf the partition will be applied using the
> identity and schema of the partitioned(root) table. But for the table sync, it
> will execute table sync for both the leaf and the root table which cause
> duplication of data.
>
> Wouldn't it be better to make the behavior consistent here ?
>
I agree with this point.
About this case,
> > create publication pub for table tbl1, tbl1_part1 with
> > (publish_via_partition_root=on);
As a user, although partitioned table includes the partition, publishing partitioned
table and its partition is allowed. So, I think we should take this case into
consideration. Initial data is copied once via the parent table seems reasonable.
Regards
Shi yu