Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id CAA4eK1JtYLk-Hp7mKKw1-gYf2q+feQe-mT4EdJOJj9aSUjUEeg@mail.gmail.com
Whole thread Raw
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Oct 19, 2021 at 8:45 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 18, 2021 at 2:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Oct 18, 2021 at 2:32 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > Looking at this command in Hou-san's email:
> > >
> > >   create publication pub for table tbl1, tbl1_part1 with
> > > (publish_via_partition_root=on);
> > >
> > > It's adding both the root partitioned table and the leaf partition
> > > *explicitly*, and it's not clear to me if the latter's inclusion in
> > > the publication should be assumed because the former is found to have
> > > been added to the publication, that is, as far as the latter's
> > > visibility to the subscriber is concerned.  It's not a stretch to
> > > imagine that a user may write the command this way to account for a
> > > subscriber node on which tbl1 and tbl1_part1 are unrelated tables.
> > >
> > > I don't think we assume anything on the publisher side regarding the
> > > state/configuration of tables on the subscriber side, at least with
> > > publication commands where tables are added to a publication
> > > explicitly, so it is up to the user to make sure that the tables are
> > > not added duplicatively.  One may however argue that the way we've
> > > decided to handle FOR ALL TABLES does assume something about
> > > partitions where it skips advertising them to subscribers when
> > > publish_via_partition_root flag is set to true, but that is exactly to
> > > avoid the duplication of data that goes to a subscriber.
> > >
> >
> > I think the same confusion will then apply to the new feature (For All
> > Tables In Schema) being discussed [1] (that is a bit long thread so
> > shared the email where the latest patch version is posted). There
> > also, the partitioned table and partition can be in a different
> > schema.
> >
>
> Sorry, I wanted to say that table partition and partitioned table can
> be in the same schema. Now, if the user publishes all tables in a
> schema, if we want to follow the "For All Tables" behavior then we
> should skip the leaf table and publish only the parent table, OTOH, if
> want to follow "For Table" behavior, we need to publish both
> partitioned table and partition table. I feel it is better to be
> consistent here in all three cases ("For Table", "For All Tables", and
> "For All Tables In Schema") as it will be easier to explain and
> document it.
>

Thinking some more about it, I think we also have a problem when the
partitioned and partition tables are in different schemas especially
when the user created a publication having a combination of "For
Table" and "For All Tables In Schema", see below:

create schema sch1;
create schema sch2;

create table sch1.tbl1 (a int) partition by range ( a );
create table sch2.tbl1_part1 partition of sch1.tbl1 for values from
(1) to (101);
create table sch1.tbl1_part2 partition of sch1.tbl1 for values from
(101) to (200);

create publication mypub for table sch2.tbl1_part1, all tables in
schema sch1 WITH (publish_via_partition_root = true);

Now, here if we follow the rules of "For Table", then we should get
both partitioned and partition tables which will be different from the
case when both are in the same schema considering we follow "For All
Tables" behavior in "For All Tables In Schema" case.

The point is that as we extend the current feature, I think the
complications will increase if we don't have a consistent behavior for
all cases and it will also be difficult to explain it to users.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [PATCH] Prefer getenv("HOME") to find the UNIX home directory
Next
From: Amit Kapila
Date:
Subject: Re: Added schema level support for publication.