Thread: pg_get_publication_tables() output duplicate relid

pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
Hi hackers,

In another thread[1], we found the pg_get_publication_tables function will output
duplicate partition relid when adding both child and parent table to the
publication(pubviaroot = false).

Example:
create table tbl1 (a int) partition by range (a);
create table tbl1_part1 partition of tbl1 for values from (1) to (10);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);

select * from pg_get_publication_tables('pub');
 relid
-------
 16387
 16390
 16387

The reason of the duplicate output is that:

pg_get_publication_tables invokes function GetPublicationRelations internally.
In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1'
twice. First time from extracting partitions from the specified parent table
'tbl1', second time from the explicitly specified partition 'tbl1_part1'.


I am not sure is this behavior expected as it seems natural for
pg_get_publication_tables to return duplicate-free relid list. OTOH, there
seems no harm for the current behavior(duplicate output), it doesn't affect the
initial sync and change replication when using logical replication.

Personally, I think it might be better to make the output of
pg_get_publication_tables duplicate-free, because the change happened on each
output relid will only be replicated once. So, it seems more consistent to
output each relid only once.

Thoughts ?

(Attach a patch which make the output duplicate-free)

[1] https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com

Best regards,
Hou zj

Attachment

Re: pg_get_publication_tables() output duplicate relid

From
Bharath Rupireddy
Date:
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Hi hackers,
>
> In another thread[1], we found the pg_get_publication_tables function will output
> duplicate partition relid when adding both child and parent table to the
> publication(pubviaroot = false).
>
> Example:
> create table tbl1 (a int) partition by range (a);
> create table tbl1_part1 partition of tbl1 for values from (1) to (10);
> create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> create publication pub for table
> tbl1, tbl1_part1 with (publish_via_partition_root=false);
>
> select * from pg_get_publication_tables('pub');
>  relid
> -------
>  16387
>  16390
>  16387
>
> The reason of the duplicate output is that:
>
> pg_get_publication_tables invokes function GetPublicationRelations internally.
> In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1'
> twice. First time from extracting partitions from the specified parent table
> 'tbl1', second time from the explicitly specified partition 'tbl1_part1'.
>
>
> I am not sure is this behavior expected as it seems natural for
> pg_get_publication_tables to return duplicate-free relid list. OTOH, there
> seems no harm for the current behavior(duplicate output), it doesn't affect the
> initial sync and change replication when using logical replication.
>
> Personally, I think it might be better to make the output of
> pg_get_publication_tables duplicate-free, because the change happened on each
> output relid will only be replicated once. So, it seems more consistent to
> output each relid only once.
>
> Thoughts ?
>
> (Attach a patch which make the output duplicate-free)
>
> [1] https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com

The users can always specify the distinct clause, see [1]. I don't see
any problem with the existing way. If required you can specify in the
view pg_publication_tables documentation that "This view returns
multiple rows for the same relation, if the relation is child to a
parent partition table and if both the parent and child are specified
in the publication" or something similart.

If at all, the pg_get_publication_tables() is supposed to give unique
outputs, let's fix it and it is a good idea to specify that the view
pg_publication_tables returns unique rows even though child and parent
partition tables are specified in the publication.

I have fdw comments on the patch:
1) Why to do  result = list_concat_unique_oid(result, relids); within
the while loop which can make the while loop O(n*n*n)
complexity(list_concat_unique_oid is O(n*n)? Can't you use
list_append_unique, of course this can also be costlier? Let's avoid
adding anything to the while loop and use
list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n))

    /* Now sort and de-duplicate the result list */
    list_sort(result, list_oid_cmp);
    list_deduplicate_oid(result);

  while (HeapTupleIsValid(tup = systable_getnext(scan)))
  {
  Form_pg_publication_rel pubrel;
+ List *relids = NIL;

  pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);
- result = GetPubPartitionOptionRelations(result, pub_partopt,
+ relids = GetPubPartitionOptionRelations(relids, pub_partopt,
  pubrel->prrelid);
+
+ result = list_concat_unique_oid(result, relids);
  }

2) Are you sure you want GetPublicationRelations to be returning the
unique relations? I'm just thinking why can't you just do
list_sort+list_deduplicate_oid in the caller? I think this makes the
function more restrictive. If required, you can pass a boolean flag
(bool give_unique and specify in the function comments and if passed
true do list_sort+list_deduplicate_oid at the end of
GetPublicationRelations ).

[1]
postgres=# select * from pg_get_publication_tables('pub');
 relid
-------
 16387
 16390
 16387
(3 rows)

postgres=# select distinct * from pg_get_publication_tables('pub');
 relid
-------
 16387
 16390
(2 rows)

postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+------------
 pub     | public     | tbl1_part1
 pub     | public     | tbl1_part2
 pub     | public     | tbl1_part1
(3 rows)

postgres=# select distinct * from pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+------------
 pub     | public     | tbl1_part1
 pub     | public     | tbl1_part2
(2 rows)

Regards,
Bharath Rupireddy.



Re: pg_get_publication_tables() output duplicate relid

From
Alvaro Herrera
Date:
> On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:

> > create table tbl1 (a int) partition by range (a);
> > create table tbl1_part1 partition of tbl1 for values from (1) to (10);
> > create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > create publication pub for table
> > tbl1, tbl1_part1 with (publish_via_partition_root=false);

In the name of consistency, I think this situation should be an error --
I mean, if we detect that the user is trying to add both the partitioned
table *and* its partition, then all manner of things are possibly going
to go wrong in some way, so my inclination is to avoid it altogether.

Is there any reason to allow that?

If we do that, then we have to be careful with later ALTER PUBLICATION
too: adding a partition is not allowed if its partitioned table is
there, and adding a partitioned table should behave in some sensible way
if one of its partitions is there (either removing the partition at the
same time, or outright rejecting the operation.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Mon, Nov 15, 2021 6:17 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Hi hackers,
> >
> > In another thread[1], we found the pg_get_publication_tables function
> > will output duplicate partition relid when adding both child and
> > parent table to the publication(pubviaroot = false).
> >
> > Example:
> > create table tbl1 (a int) partition by range (a); create table
> > tbl1_part1 partition of tbl1 for values from (1) to (10); create table
> > tbl1_part2 partition of tbl1 for values from (10) to (20); create
> > publication pub for table tbl1, tbl1_part1 with
> > (publish_via_partition_root=false);
> >
> > select * from pg_get_publication_tables('pub');  relid
> > -------
> >  16387
> >  16390
> >  16387
> >
> > The reason of the duplicate output is that:
> >
> > pg_get_publication_tables invokes function GetPublicationRelations
> internally.
> > In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1'
> > twice. First time from extracting partitions from the specified parent
> > table 'tbl1', second time from the explicitly specified partition 'tbl1_part1'.
> >
> >
> > I am not sure is this behavior expected as it seems natural for
> > pg_get_publication_tables to return duplicate-free relid list. OTOH,
> > there seems no harm for the current behavior(duplicate output), it
> > doesn't affect the initial sync and change replication when using logical
> replication.
> >
> > Personally, I think it might be better to make the output of
> > pg_get_publication_tables duplicate-free, because the change happened
> > on each output relid will only be replicated once. So, it seems more
> > consistent to output each relid only once.
> >
> > Thoughts ?
> >
> > (Attach a patch which make the output duplicate-free)
> >
> > [1]
> >
> https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37Ptsxu
> NW4nB
> > GN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com
> 
> The users can always specify the distinct clause, see [1]. I don't see any problem
> with the existing way. If required you can specify in the view
> pg_publication_tables documentation that "This view returns multiple rows for
> the same relation, if the relation is child to a parent partition table and if both
> the parent and child are specified in the publication" or something similart.

Thanks for the response.
Yes, I agreed, as I said there's no harm for the current behavior. If most people don't think
It's worth changing the behavior, I am fine with that.

> If at all, the pg_get_publication_tables() is supposed to give unique outputs,
> let's fix it and it is a good idea to specify that the view pg_publication_tables
> returns unique rows even though child and parent partition tables are specified
> in the publication.
> 
> I have fdw comments on the patch:
> Let's avoid adding anything to the while
> loop and use list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n))
> 

Thanks for the comments, I will address this if we finally decide to fix it.

> 
> 2) Are you sure you want GetPublicationRelations to be returning the unique
> relations? I'm just thinking why can't you just do list_sort+list_deduplicate_oid
> in the caller? I think this makes the function more restrictive. If required, you
> can pass a boolean flag (bool give_unique and specify in the function
> comments and if passed true do list_sort+list_deduplicate_oid at the end of
> GetPublicationRelations ).

I change the GetPublicationRelations because I found no caller need the
duplicate oid, and de-duplicate oid in GetPublicationRelations could reduce
some code change. I think if we need the duplicate oid in the future, we can
add the flag as you suggested.

Best regards,
Hou zj

RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Mon, Nov 15, 2021 9:42 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> 
> > > create table tbl1 (a int) partition by range (a); create table
> > > tbl1_part1 partition of tbl1 for values from (1) to (10); create
> > > table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > > create publication pub for table tbl1, tbl1_part1 with
> > > (publish_via_partition_root=false);
> 
> In the name of consistency, I think this situation should be an error -- I mean, if
> we detect that the user is trying to add both the partitioned table *and* its
> partition, then all manner of things are possibly going to go wrong in some way,
> so my inclination is to avoid it altogether.
>
> Is there any reason to allow that?
>
> If we do that, then we have to be careful with later ALTER PUBLICATION
> too: adding a partition is not allowed if its partitioned table is there, and adding
> a partitioned table should behave in some sensible way if one of its partitions is
> there (either removing the partition at the same time, or outright rejecting the
> operation.)

Thanks for the response.

If we decide to disallow this case, we seem need to handle some other cases as
well, for example: We might also need additional check when ATTACH a partition,
because the partition's parent table could already be published in the same
publication as the partition. During the check we might also need to lock all
the parent tables to handle concurrently change which looks complicated to me :(
I am not sure is it worth adding these lock and check. Maybe we can leave the current
behavior as is ?

Best regards,
Hou zj

Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Mon, Nov 15, 2021 at 7:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
>
> > > create table tbl1 (a int) partition by range (a);
> > > create table tbl1_part1 partition of tbl1 for values from (1) to (10);
> > > create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > > create publication pub for table
> > > tbl1, tbl1_part1 with (publish_via_partition_root=false);
>
> In the name of consistency, I think this situation should be an error --
> I mean, if we detect that the user is trying to add both the partitioned
> table *and* its partition, then all manner of things are possibly going
> to go wrong in some way, so my inclination is to avoid it altogether.
>
> Is there any reason to allow that?
>

I think it could provide flexibility to users to later change
"publish_via_partition_root" option. Because when that option is
false, we use individual partitions schema to send changes and when it
is true, we use root table's schema to send changes. Added Amit L. to
know if he has any thoughts on this matter as he was the author of
this work?

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Mon, Nov 15, 2021 9:42 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
> > > <houzj.fnst@fujitsu.com> wrote:
> >
> > > > create table tbl1 (a int) partition by range (a); create table
> > > > tbl1_part1 partition of tbl1 for values from (1) to (10); create
> > > > table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > > > create publication pub for table tbl1, tbl1_part1 with
> > > > (publish_via_partition_root=false);
> >
> > In the name of consistency, I think this situation should be an error -- I mean, if
> > we detect that the user is trying to add both the partitioned table *and* its
> > partition, then all manner of things are possibly going to go wrong in some way,
> > so my inclination is to avoid it altogether.
> >
> > Is there any reason to allow that?
> >
> > If we do that, then we have to be careful with later ALTER PUBLICATION
> > too: adding a partition is not allowed if its partitioned table is there, and adding
> > a partitioned table should behave in some sensible way if one of its partitions is
> > there (either removing the partition at the same time, or outright rejecting the
> > operation.)
>
> Thanks for the response.
>
> If we decide to disallow this case, we seem need to handle some other cases as
> well, for example: We might also need additional check when ATTACH a partition,
> because the partition's parent table could already be published in the same
> publication as the partition.
>

What kind of additional checks you are envisioning and why?

-- 
With Regards,
Amit Kapila.



RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Mon, Nov 15, 2021 9:42 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
> wrote:
> > > > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
> > > > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > > > create table tbl1 (a int) partition by range (a); create table
> > > > > tbl1_part1 partition of tbl1 for values from (1) to (10); create
> > > > > table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > > > > create publication pub for table tbl1, tbl1_part1 with
> > > > > (publish_via_partition_root=false);
> > >
> > > In the name of consistency, I think this situation should be an
> > > error -- I mean, if we detect that the user is trying to add both
> > > the partitioned table *and* its partition, then all manner of things
> > > are possibly going to go wrong in some way, so my inclination is to avoid it
> > > altogether.
> > >
> > > Is there any reason to allow that?
> > >
> > > If we do that, then we have to be careful with later ALTER
> > > PUBLICATION
> > > too: adding a partition is not allowed if its partitioned table is
> > > there, and adding a partitioned table should behave in some sensible
> > > way if one of its partitions is there (either removing the partition
> > > at the same time, or outright rejecting the
> > > operation.)
> >
> > Thanks for the response.
> >
> > If we decide to disallow this case, we seem need to handle some other
> > cases as well, for example: We might also need additional check when
> > ATTACH a partition, because the partition's parent table could already
> > be published in the same publication as the partition.
> >
> 
> What kind of additional checks you are envisioning and why?

For example:

create table tbl1 (a int) partition by range (a);
create table tbl1_part1 (like tbl1);
create table tbl1_part2 partition of tbl1 for values from (10) to (20);
create publication pub for table
tbl1, tbl1_part1 with (publish_via_partition_root=false);

--- We might need addition check here
Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10);

In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a
publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1'
which seems the case we want to disallow(both parent and child table in
publication). So, When ATTACH, I thought we might need to check all the parent
tables to see if any parent table is in the same publication which the table to
be attached is also belongs to. Does it make sense ?

Best regards,
Hou zj

Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Tue, Nov 16, 2021 at 10:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Nov 15, 2021 at 7:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
> > > <houzj.fnst@fujitsu.com> wrote:
> >
> > > > create table tbl1 (a int) partition by range (a);
> > > > create table tbl1_part1 partition of tbl1 for values from (1) to (10);
> > > > create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > > > create publication pub for table
> > > > tbl1, tbl1_part1 with (publish_via_partition_root=false);
> >
> > In the name of consistency, I think this situation should be an error --
> > I mean, if we detect that the user is trying to add both the partitioned
> > table *and* its partition, then all manner of things are possibly going
> > to go wrong in some way, so my inclination is to avoid it altogether.
> >
> > Is there any reason to allow that?
>
> I think it could provide flexibility to users to later change
> "publish_via_partition_root" option. Because when that option is
> false, we use individual partitions schema to send changes and when it
> is true, we use root table's schema to send changes. Added Amit L. to
> know if he has any thoughts on this matter as he was the author of
> this work?

FWIW, I'm not sure that adding an error in this path, that is, when a
user adds both a partitioned table and its partitions to a
publication, helps much.  As for the safety of allowing it, if you
look at get_rel_sync_entry(), which handles much of the complexity of
determining whether to publish a relation's changes and the schema to
use when doing so, you may be able to see that a partition being added
duplicatively is harmless, modulo any undiscovered bugs.  At least as
far as the post-initial-sync replication functionality is concerned.

What IS problematic is what a subscriber sees in the
pg_publication_tables view and the problem occurs only in the initial
sync phase, where the partition is synced duplicatively because of
being found in the view along with the parent in this case, that is,
when publish_via_partiiton_root is true.  I was saying on the other
thread [1] that we should leave it up to the subscriber to decide what
to do when the view (duplicatively) returns both the parent and the
partition, citing the use case that a subscriber may want to replicate
the parent and the partition as independent tables.  Though I now tend
to agree with Amit K that that may be such a meaningful and all that
common use case, and the implementation on the subscriber side would
have to be unnecessarily complex.

So maybe it makes sense to just do what has been proposed --
de-duplicate partitions out of the pg_publication_tables view, unless
we know of a bigger problem that requires us to hack the subscriber
side of things too.  Actually, I came to know of one such problem
while thinking about this today: when you ATTACH a partition to a
table that is present in a publish_via_partition_root=true
publication, it doesn't get copied via the initial sync even though
subsequent replication works just fine.  The reason for that is that
the subscriber only syncs the partitions that are known at the time
when the parent is first synced, that too via the parent (as SELECT
<columns..> FROM parent), and then marks the parent as sync-done.
Refreshing the subscription after ATTACHing doesn't help, because the
subscriber can't see any partitions to begin with.  Maybe a more
elaborate solution is needed for this one, though I haven't figured
what it is going to look like yet.

Thanks.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Wed, Nov 17, 2021 at 12:15 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com wrote:
> > > If we decide to disallow this case, we seem need to handle some other
> > > cases as well, for example: We might also need additional check when
> > > ATTACH a partition, because the partition's parent table could already
> > > be published in the same publication as the partition.
> > >
> >
> > What kind of additional checks you are envisioning and why?
>
> For example:
>
> create table tbl1 (a int) partition by range (a);
> create table tbl1_part1 (like tbl1);
> create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> create publication pub for table
> tbl1, tbl1_part1 with (publish_via_partition_root=false);
>
> --- We might need addition check here
> Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10);
>
> In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a
> publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1'
> which seems the case we want to disallow(both parent and child table in
> publication). So, When ATTACH, I thought we might need to check all the parent
> tables to see if any parent table is in the same publication which the table to
> be attached is also belongs to. Does it make sense ?

I don't think creating or attaching a partition of a table that is
present in a publish_via_partition_root=false actually adds the
partition to pg_publication_rel, the base catalog.  A partition's
membership in the publication is implicit, unless of course you add it
to the publication explicitly, like all the examples we have been
discussing.  I guess we're only arguing about the problems with the
pg_publication_tables view, which does expand the partitioned table to
show the partitions that are not otherwise not present in the base
catalog.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Wed, Nov 17, 2021 at 3:09 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Though I now tend
> to agree with Amit K that that may be such a meaningful and all that
> common use case,

Oops I meant: that may NOT be such...

-- 
Amit Langote
EDB: http://www.enterprisedb.com



RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Wed, Nov 17, 2021 2:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Nov 17, 2021 at 12:15 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com wrote:
> > > > If we decide to disallow this case, we seem need to handle some other
> > > > cases as well, for example: We might also need additional check when
> > > > ATTACH a partition, because the partition's parent table could already
> > > > be published in the same publication as the partition.
> > > >
> > >
> > > What kind of additional checks you are envisioning and why?
> >
> > For example:
> >
> > create table tbl1 (a int) partition by range (a);
> > create table tbl1_part1 (like tbl1);
> > create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > create publication pub for table
> > tbl1, tbl1_part1 with (publish_via_partition_root=false);
> >
> > --- We might need addition check here
> > Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10);
> >
> > In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a
> > publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1'
> > which seems the case we want to disallow(both parent and child table in
> > publication). So, When ATTACH, I thought we might need to check all the parent
> > tables to see if any parent table is in the same publication which the table to
> > be attached is also belongs to. Does it make sense ?
> 
> I don't think creating or attaching a partition of a table that is
> present in a publish_via_partition_root=false actually adds the
> partition to pg_publication_rel, the base catalog.  A partition's
> membership in the publication is implicit, unless of course you add it
> to the publication explicitly, like all the examples we have been
> discussing.  I guess we're only arguing about the problems with the
> pg_publication_tables view, which does expand the partitioned table to
> show the partitions that are not otherwise not present in the base
> catalog.

Maybe I didn't make it clear, I was trying to explain that it would be
complicated if we want to completely disallow specifying both child and parent
table in the publication because of the ATTACH case I gave. In other words, I
think it's fine to specify both child and parent table in the publication.

Best regards,
Hou zj

Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Thu, Nov 18, 2021 at 9:33 houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
On Wed, Nov 17, 2021 2:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Nov 17, 2021 at 12:15 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com wrote:
> > > > If we decide to disallow this case, we seem need to handle some other
> > > > cases as well, for example: We might also need additional check when
> > > > ATTACH a partition, because the partition's parent table could already
> > > > be published in the same publication as the partition.
> > > >
> > >
> > > What kind of additional checks you are envisioning and why?
> >
> > For example:
> >
> > create table tbl1 (a int) partition by range (a);
> > create table tbl1_part1 (like tbl1);
> > create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > create publication pub for table
> > tbl1, tbl1_part1 with (publish_via_partition_root=false);
> >
> > --- We might need addition check here
> > Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10);
> >
> > In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a
> > publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1'
> > which seems the case we want to disallow(both parent and child table in
> > publication). So, When ATTACH, I thought we might need to check all the parent
> > tables to see if any parent table is in the same publication which the table to
> > be attached is also belongs to. Does it make sense ?
>
> I don't think creating or attaching a partition of a table that is
> present in a publish_via_partition_root=false actually adds the
> partition to pg_publication_rel, the base catalog.  A partition's
> membership in the publication is implicit, unless of course you add it
> to the publication explicitly, like all the examples we have been
> discussing.  I guess we're only arguing about the problems with the
> pg_publication_tables view, which does expand the partitioned table to
> show the partitions that are not otherwise not present in the base
> catalog.

Maybe I didn't make it clear, I was trying to explain that it would be
complicated if we want to completely disallow specifying both child and parent
table in the publication because of the ATTACH case I gave. In other words, I
think it's fine to specify both child and parent table in the publication.

Ah okay.  I see your point.  A table that was not once a partition would be in the catalog along with the unrelated parent table and if we were to enforce the rule that a parent and a partition cannot be in the catalog at the same time, we’d need to delete that table’s catalog record on ATTACHing it as a partition.

Thanks.
--

Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Wed, Nov 17, 2021 at 11:39 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Tue, Nov 16, 2021 at 10:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Mon, Nov 15, 2021 at 7:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
> > > > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > > > create table tbl1 (a int) partition by range (a);
> > > > > create table tbl1_part1 partition of tbl1 for values from (1) to (10);
> > > > > create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > > > > create publication pub for table
> > > > > tbl1, tbl1_part1 with (publish_via_partition_root=false);
> > >
> > > In the name of consistency, I think this situation should be an error --
> > > I mean, if we detect that the user is trying to add both the partitioned
> > > table *and* its partition, then all manner of things are possibly going
> > > to go wrong in some way, so my inclination is to avoid it altogether.
> > >
> > > Is there any reason to allow that?
> >
> > I think it could provide flexibility to users to later change
> > "publish_via_partition_root" option. Because when that option is
> > false, we use individual partitions schema to send changes and when it
> > is true, we use root table's schema to send changes. Added Amit L. to
> > know if he has any thoughts on this matter as he was the author of
> > this work?
>
> FWIW, I'm not sure that adding an error in this path, that is, when a
> user adds both a partitioned table and its partitions to a
> publication, helps much.  As for the safety of allowing it, if you
> look at get_rel_sync_entry(), which handles much of the complexity of
> determining whether to publish a relation's changes and the schema to
> use when doing so, you may be able to see that a partition being added
> duplicatively is harmless, modulo any undiscovered bugs.  At least as
> far as the post-initial-sync replication functionality is concerned.
>
> What IS problematic is what a subscriber sees in the
> pg_publication_tables view and the problem occurs only in the initial
> sync phase, where the partition is synced duplicatively because of
> being found in the view along with the parent in this case, that is,
> when publish_via_partiiton_root is true.  I was saying on the other
> thread [1] that we should leave it up to the subscriber to decide what
> to do when the view (duplicatively) returns both the parent and the
> partition, citing the use case that a subscriber may want to replicate
> the parent and the partition as independent tables.  Though I now tend
> to agree with Amit K that that may be such a meaningful and all that
> common use case, and the implementation on the subscriber side would
> have to be unnecessarily complex.
>
> So maybe it makes sense to just do what has been proposed --
> de-duplicate partitions out of the pg_publication_tables view,
>

AFAICU, there are actually two problems related to
pg_publication_tables view that are being discussed: (a) when
'publish_via_partition_root' is true then it returns both parent and
child tables (provided both are part of publication), this can lead to
duplicate data on the subscriber; the fix for this is being discussed
in thread [1]. (b) when 'publish_via_partition_root' is false, then it
returns duplicate entries for child tables (provided both are part of
publication), this problem is being discussed in this thread.

As per the proposed patches, it seems both need a separate fix, we can
fix them together if we want but I am not sure. Is your understanding
the same?

> unless
> we know of a bigger problem that requires us to hack the subscriber
> side of things too.
>

There is yet another issue that might need subscriber side change. See
the second issue summarized by Hou-San in the email[2]. I feel it is
better to tackle that separately.

>  Actually, I came to know of one such problem
> while thinking about this today: when you ATTACH a partition to a
> table that is present in a publish_via_partition_root=true
> publication, it doesn't get copied via the initial sync even though
> subsequent replication works just fine.  The reason for that is that
> the subscriber only syncs the partitions that are known at the time
> when the parent is first synced, that too via the parent (as SELECT
> <columns..> FROM parent), and then marks the parent as sync-done.
> Refreshing the subscription after ATTACHing doesn't help, because the
> subscriber can't see any partitions to begin with.
>

Sorry, it is not clear to me how this can lead to a problem. Even if
we just have the root table in the subscriber at the time of sync
later shouldn't copy get the newly attached partition's data and
replicate it to the required partition for
"publish_via_partition_root=true" case? Anyway, if this is a problem
we need to figure the solution for this separately.

[1]: https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com
[2]:
https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com



--
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Bharath Rupireddy
Date:
On Thu, Nov 18, 2021 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> (b) when 'publish_via_partition_root' is false, then it
> returns duplicate entries for child tables (provided both are part of
> publication), this problem is being discussed in this thread.

+1 to just focus on fixing the initial problem proposed i.e.
pg_get_publication_tables() outputting duplicate relid. IMO, all other
issues (if unrelated to this problem and the patch can go to another
new thread, if necessary).

Regards,
Bharath Rupireddy.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Thu, Nov 18, 2021 at 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Nov 17, 2021 at 11:39 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > What IS problematic is what a subscriber sees in the
> > pg_publication_tables view and the problem occurs only in the initial
> > sync phase, where the partition is synced duplicatively because of
> > being found in the view along with the parent in this case, that is,
> > when publish_via_partiiton_root is true.  I was saying on the other
> > thread [1] that we should leave it up to the subscriber to decide what
> > to do when the view (duplicatively) returns both the parent and the
> > partition, citing the use case that a subscriber may want to replicate
> > the parent and the partition as independent tables.  Though I now tend
> > to agree with Amit K that that may be such a meaningful and all that
> > common use case, and the implementation on the subscriber side would
> > have to be unnecessarily complex.
> >
> > So maybe it makes sense to just do what has been proposed --
> > de-duplicate partitions out of the pg_publication_tables view,
>
> AFAICU, there are actually two problems related to
> pg_publication_tables view that are being discussed: (a) when
> 'publish_via_partition_root' is true then it returns both parent and
> child tables (provided both are part of publication), this can lead to
> duplicate data on the subscriber; the fix for this is being discussed
> in thread [1]. (b) when 'publish_via_partition_root' is false, then it
> returns duplicate entries for child tables (provided both are part of
> publication), this problem is being discussed in this thread.
>
> As per the proposed patches, it seems both need a separate fix, we can
> fix them together if we want but I am not sure. Is your understanding
> the same?

Actually, I'm seeing both (a) and (b) as more or less the same thing.
If publish_via_partition_root=true, we'd want to remove a partition
from the list if the root parent is also present.  If false, we'd want
to remove a partition from the list because it'd also be added by way
of expanding the root.

I know that (a) causes the partition-double-sync problem that could be
fixed by partition OID de-duplication as proposed in the other thread.
As a comment on that, it'd be nice to see a test case added to
src/test/subscription suite in that patch, because
partition-double-sync is the main problem statement I'd think.

Like Bharath said, I don't see (b) as much of a problem, because the
subscriber applies DISTINCT when querying pg_publication_tables()
anyway.  Other users, if any out there, may be doing the same by
following core subscription code's example.  That said, I am not
opposed to applying the patch being proposed here, though I guess we
don't have any src/test/subscription test case for this one?  As in,
do we know of any replication (initial/streaming) misbehavior caused
by the duplicate partition OIDs in this case or is the only problem
that pg_publication_tables output looks odd?

> > unless
> > we know of a bigger problem that requires us to hack the subscriber
> > side of things too.
> >
>
> There is yet another issue that might need subscriber side change. See
> the second issue summarized by Hou-San in the email[2]. I feel it is
> better to tackle that separately.
>
> >  Actually, I came to know of one such problem
> > while thinking about this today: when you ATTACH a partition to a
> > table that is present in a publish_via_partition_root=true
> > publication, it doesn't get copied via the initial sync even though
> > subsequent replication works just fine.  The reason for that is that
> > the subscriber only syncs the partitions that are known at the time
> > when the parent is first synced, that too via the parent (as SELECT
> > <columns..> FROM parent), and then marks the parent as sync-done.
> > Refreshing the subscription after ATTACHing doesn't help, because the
> > subscriber can't see any partitions to begin with.
> >
>
> Sorry, it is not clear to me how this can lead to a problem. Even if
> we just have the root table in the subscriber at the time of sync
> later shouldn't copy get the newly attached partition's data and
> replicate it to the required partition for
> "publish_via_partition_root=true" case?

Not sure if you thought otherwise but I am talking about attaching a
new partition into the partition tree on the *publication side*.

The problematic case is attaching the partition *after* the subscriber
has already marked the root parent as synced and/or ready for
replication.  Refreshing the subscription doesn't help it discover the
newly attached partition, because a publish_via_partition_root only
ever tells about the root parent, which would be already synced, so
the subscriber would think there's nothing to copy.

> Anyway, if this is a problem
> we need to figure the solution for this separately.

Sure, we might need to do that after all.  Though it might be a good
idea to be sure that we won't need to reconsider the fix we push for
the issue(s) being discussed here and elsewhere, because I suspect
that the solution to the problem I mentioned is likely to involve
tweaking pg_publication_tables view output.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Nov 18, 2021 at 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > AFAICU, there are actually two problems related to
> > pg_publication_tables view that are being discussed: (a) when
> > 'publish_via_partition_root' is true then it returns both parent and
> > child tables (provided both are part of publication), this can lead to
> > duplicate data on the subscriber; the fix for this is being discussed
> > in thread [1]. (b) when 'publish_via_partition_root' is false, then it
> > returns duplicate entries for child tables (provided both are part of
> > publication), this problem is being discussed in this thread.
> >
> > As per the proposed patches, it seems both need a separate fix, we can
> > fix them together if we want but I am not sure. Is your understanding
> > the same?
>
> Actually, I'm seeing both (a) and (b) as more or less the same thing.
> If publish_via_partition_root=true, we'd want to remove a partition
> from the list if the root parent is also present.  If false, we'd want
> to remove a partition from the list because it'd also be added by way
> of expanding the root.
>
> I know that (a) causes the partition-double-sync problem that could be
> fixed by partition OID de-duplication as proposed in the other thread.
> As a comment on that, it'd be nice to see a test case added to
> src/test/subscription suite in that patch, because
> partition-double-sync is the main problem statement I'd think.
>
> Like Bharath said, I don't see (b) as much of a problem, because the
> subscriber applies DISTINCT when querying pg_publication_tables()
> anyway.  Other users, if any out there, may be doing the same by
> following core subscription code's example.
>

Hmm, I don't think we can assume that current or future users of
pg_publication_tables() will use it with DISTINCT unless we specify in
the specs/docs. However, I guess we might want to do it only for HEAD
as there is no direct field report for this and anyway we have some
workaround for this.

>  That said, I am not
> opposed to applying the patch being proposed here, though I guess we
> don't have any src/test/subscription test case for this one?
>

Yeah, we can add tests for this and the other case.

>  As in,
> do we know of any replication (initial/streaming) misbehavior caused
> by the duplicate partition OIDs in this case or is the only problem
> that pg_publication_tables output looks odd?
>

The latter one but I think either we should document this or change it
as we can't assume users will follow what subscriber-side code does.

> > > unless
> > > we know of a bigger problem that requires us to hack the subscriber
> > > side of things too.
> > >
> >
> > There is yet another issue that might need subscriber side change. See
> > the second issue summarized by Hou-San in the email[2]. I feel it is
> > better to tackle that separately.
> >
>
> The problematic case is attaching the partition *after* the subscriber
> has already marked the root parent as synced and/or ready for
> replication.  Refreshing the subscription doesn't help it discover the
> newly attached partition, because a publish_via_partition_root only
> ever tells about the root parent, which would be already synced, so
> the subscriber would think there's nothing to copy.
>

Okay, I see this could be a problem but I haven't tried to reproduce it.

> > Anyway, if this is a problem
> > we need to figure the solution for this separately.
>
> Sure, we might need to do that after all.  Though it might be a good
> idea to be sure that we won't need to reconsider the fix we push for
> the issue(s) being discussed here and elsewhere, because I suspect
> that the solution to the problem I mentioned is likely to involve
> tweaking pg_publication_tables view output.
>

Okay, so we have four known problems in the same area. The first has
been reported at the start of this thread, then the two as summarized
by Hou-San in his email [1] and the fourth one is the attach of
partitions after initial sync as you mentioned. It makes sense to have
some high-level idea on how to solve each of the problems before
pushing a fix for any one particular problem as one fix could have an
impact on other fixes. I'll think over it but please do let me know if
you have any ideas.


[1] -
https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > The problematic case is attaching the partition *after* the subscriber
> > has already marked the root parent as synced and/or ready for
> > replication.  Refreshing the subscription doesn't help it discover the
> > newly attached partition, because a publish_via_partition_root only
> > ever tells about the root parent, which would be already synced, so
> > the subscriber would think there's nothing to copy.
> >
>
> Okay, I see this could be a problem but I haven't tried to reproduce it.
>
> > > Anyway, if this is a problem
> > > we need to figure the solution for this separately.
> >
> > Sure, we might need to do that after all.  Though it might be a good
> > idea to be sure that we won't need to reconsider the fix we push for
> > the issue(s) being discussed here and elsewhere, because I suspect
> > that the solution to the problem I mentioned is likely to involve
> > tweaking pg_publication_tables view output.
> >

I have thought about this problem and I see two possibilities for a
solution (a) We could provide a new option say 'truncate' (something
on lines proposed here [1]) which would truncate the table(s) and
change its status to 'i' in the pg_subscription_rel, this would allow
the newly added partition to be synced after refresh. This could lead
to a large copy in such a case. (b) We could somehow get and store all
the partition info from the publisher-side on the subscriber-side
while initial sync (say in new system table
pg_subscription_rel_members). Now, after the refresh, if this list
changes, we can allow to just get the data of that particular
partition but I guess it would mean that we need to store oids of the
publisher which might or might not be safe considering oids can
wraparound before the refresh.

Do you have any other ideas?

One more thing you mentioned is that the initial sync won't work after
refresh but later changes will be replicated but I noticed that later
changes also don't get streamed till we restart the subscriber server.
I am not sure but we might not be invalidating apply workers cache due
to which it didn't notice the same.

[1] - https://www.postgresql.org/message-id/CF3B6672-2A43-4204-A60A-68F359218A9B%40endpoint.com

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >  As in,
> > do we know of any replication (initial/streaming) misbehavior caused
> > by the duplicate partition OIDs in this case or is the only problem
> > that pg_publication_tables output looks odd?
>
> The latter one but I think either we should document this or change it
> as we can't assume users will follow what subscriber-side code does.

On second thought, I agree that de-duplicating partitions from this
view is an improvement.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Sat, Nov 20, 2021 at 8:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > The problematic case is attaching the partition *after* the subscriber
> > > has already marked the root parent as synced and/or ready for
> > > replication.  Refreshing the subscription doesn't help it discover the
> > > newly attached partition, because a publish_via_partition_root only
> > > ever tells about the root parent, which would be already synced, so
> > > the subscriber would think there's nothing to copy.
> >
> > Okay, I see this could be a problem but I haven't tried to reproduce it.
> >
> > > > Anyway, if this is a problem
> > > > we need to figure the solution for this separately.
> > >
> > > Sure, we might need to do that after all.  Though it might be a good
> > > idea to be sure that we won't need to reconsider the fix we push for
> > > the issue(s) being discussed here and elsewhere, because I suspect
> > > that the solution to the problem I mentioned is likely to involve
> > > tweaking pg_publication_tables view output.
>
> I have thought about this problem and I see two possibilities for a
> solution (a) We could provide a new option say 'truncate' (something
> on lines proposed here [1]) which would truncate the table(s) and
> change its status to 'i' in the pg_subscription_rel, this would allow
> the newly added partition to be synced after refresh. This could lead
> to a large copy in such a case.

Maybe I am missing something about the proposal, though I'd think a
more automatic solution would be better, something that doesn't need
to rely on an unrelated feature.

> (b) We could somehow get and store all
> the partition info from the publisher-side on the subscriber-side
> while initial sync (say in new system table
> pg_subscription_rel_members). Now, after the refresh, if this list
> changes, we can allow to just get the data of that particular
> partition but I guess it would mean that we need to store oids of the
> publisher which might or might not be safe considering oids can
> wraparound before the refresh.
>
> Do you have any other ideas?

I thought that the idea I had earlier mentioned at [1] may be useful,
which I can see is similar to your idea (b). I also suspect that it
can be implemented without needing a separate catalog and storing
publication-side relation OIDs in the subscription-side catalog,
though maybe I haven't thought hard enough.

> One more thing you mentioned is that the initial sync won't work after
> refresh but later changes will be replicated but I noticed that later
> changes also don't get streamed till we restart the subscriber server.
> I am not sure but we might not be invalidating apply workers cache due
> to which it didn't notice the same.

Oh, that sounds odd and, as you appear to say, a separate problem.  I'll check.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Mon, Nov 22, 2021 at 1:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Sat, Nov 20, 2021 at 8:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > The problematic case is attaching the partition *after* the subscriber
> > > > has already marked the root parent as synced and/or ready for
> > > > replication.  Refreshing the subscription doesn't help it discover the
> > > > newly attached partition, because a publish_via_partition_root only
> > > > ever tells about the root parent, which would be already synced, so
> > > > the subscriber would think there's nothing to copy.
> > >
> > > Okay, I see this could be a problem but I haven't tried to reproduce it.
> > >
> > > > > Anyway, if this is a problem
> > > > > we need to figure the solution for this separately.
> > > >
> > > > Sure, we might need to do that after all.  Though it might be a good
> > > > idea to be sure that we won't need to reconsider the fix we push for
> > > > the issue(s) being discussed here and elsewhere, because I suspect
> > > > that the solution to the problem I mentioned is likely to involve
> > > > tweaking pg_publication_tables view output.
> >
> > I have thought about this problem and I see two possibilities for a
> > solution (a) We could provide a new option say 'truncate' (something
> > on lines proposed here [1]) which would truncate the table(s) and
> > change its status to 'i' in the pg_subscription_rel, this would allow
> > the newly added partition to be synced after refresh. This could lead
> > to a large copy in such a case.
>
> Maybe I am missing something about the proposal, though I'd think a
> more automatic solution would be better, something that doesn't need
> to rely on an unrelated feature.
>

Agreed, this was more of a workaround for users if we didn't get any
automatic solution.

> > (b) We could somehow get and store all
> > the partition info from the publisher-side on the subscriber-side
> > while initial sync (say in new system table
> > pg_subscription_rel_members). Now, after the refresh, if this list
> > changes, we can allow to just get the data of that particular
> > partition but I guess it would mean that we need to store oids of the
> > publisher which might or might not be safe considering oids can
> > wraparound before the refresh.
> >
> > Do you have any other ideas?
>
> I thought that the idea I had earlier mentioned at [1] may be useful,
> which I can see is similar to your idea (b). I also suspect that it
> can be implemented without needing a separate catalog and storing
> publication-side relation OIDs in the subscription-side catalog,
> though maybe I haven't thought hard enough.
>

The problem with storing this info in pg_subscription_rel as you were
describing in your proposal is that currently, we ensure that the same
table exists in subscriber and then store the subscriber side table id
in that catalog. I am not sure if we can store publisher-side oids in
that catalog and if we store then it would be confusing as now it will
have info of both publisher-side oids and subscriber-side oids. Isn't
that a problem with this approach?

One more problem with this overall approach is the risk of OID
wraparound. Say between Create Subscription and Alter Subscription ..
Refresh, we detach one partition, the oid wraparounds, and we
create/reattach another partition which gets the same oid as for the
partition which we detached earlier then we won't be able to identify
the new partition even after refresh.

-- 
With Regards,
Amit Kapila.



RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Mon, Nov 22, 2021 6:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Nov 22, 2021 at 1:45 PM Amit Langote <amitlangote09@gmail.com>
> wrote:
> >
> > On Sat, Nov 20, 2021 at 8:31 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote
> <amitlangote09@gmail.com> wrote:
> > > > > The problematic case is attaching the partition *after* the
> > > > > subscriber has already marked the root parent as synced and/or
> > > > > ready for replication.  Refreshing the subscription doesn't help
> > > > > it discover the newly attached partition, because a
> > > > > publish_via_partition_root only ever tells about the root
> > > > > parent, which would be already synced, so the subscriber would think
> > > > > there's nothing to copy.
> > > >
> > > > Okay, I see this could be a problem but I haven't tried to reproduce it.
> > > >
> > > > > > Anyway, if this is a problem
> > > > > > we need to figure the solution for this separately.
> > > > >
> > > > > Sure, we might need to do that after all.  Though it might be a
> > > > > good idea to be sure that we won't need to reconsider the fix we
> > > > > push for the issue(s) being discussed here and elsewhere,
> > > > > because I suspect that the solution to the problem I mentioned
> > > > > is likely to involve tweaking pg_publication_tables view output.
> > >
> > > I have thought about this problem and I see two possibilities for a
> > > solution (a) We could provide a new option say 'truncate' (something
> > > on lines proposed here [1]) which would truncate the table(s) and
> > > change its status to 'i' in the pg_subscription_rel, this would
> > > allow the newly added partition to be synced after refresh. This
> > > could lead to a large copy in such a case.
> >
> > Maybe I am missing something about the proposal, though I'd think a
> > more automatic solution would be better, something that doesn't need
> > to rely on an unrelated feature.
> >
> 
> Agreed, this was more of a workaround for users if we didn't get any automatic
> solution.
> 
> > > (b) We could somehow get and store all the partition info from the
> > > publisher-side on the subscriber-side while initial sync (say in new
> > > system table pg_subscription_rel_members). Now, after the refresh,
> > > if this list changes, we can allow to just get the data of that
> > > particular partition but I guess it would mean that we need to store
> > > oids of the publisher which might or might not be safe considering
> > > oids can wraparound before the refresh.
> > >
> > > Do you have any other ideas?
> >
> > I thought that the idea I had earlier mentioned at [1] may be useful,
> > which I can see is similar to your idea (b). I also suspect that it
> > can be implemented without needing a separate catalog and storing
> > publication-side relation OIDs in the subscription-side catalog,
> > though maybe I haven't thought hard enough.
> >
> 
> The problem with storing this info in pg_subscription_rel as you were
> describing in your proposal is that currently, we ensure that the same table
> exists in subscriber and then store the subscriber side table id in that catalog. I
> am not sure if we can store publisher-side oids in that catalog and if we store
> then it would be confusing as now it will have info of both publisher-side oids
> and subscriber-side oids. Isn't that a problem with this approach?
> 
> One more problem with this overall approach is the risk of OID wraparound.
> Say between Create Subscription and Alter Subscription ..
> Refresh, we detach one partition, the oid wraparounds, and we create/reattach
> another partition which gets the same oid as for the partition which we
> detached earlier then we won't be able to identify the new partition even after
> refresh.

If we use this approach, I think maybe we can store the publisher-side's table
name instead of oid. Because for non-partitioned table, it's possible that user
drop and create a new table with the same name in which case the oid would be
changed. And in this case, the existing behavior wouldn't sync the table again.
So I think it might be better to make the partitioned table's behavior
consistent with the non-partitioned table which only sync the table's data when
detect new table name.

Best regards,
Hou zj

Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Tue, Nov 23, 2021 at 7:28 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Mon, Nov 22, 2021 6:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Mon, Nov 22, 2021 at 1:45 PM Amit Langote <amitlangote09@gmail.com>
> > wrote:
> > >
> > > On Sat, Nov 20, 2021 at 8:31 PM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > > > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote
> > <amitlangote09@gmail.com> wrote:
> > > > > > The problematic case is attaching the partition *after* the
> > > > > > subscriber has already marked the root parent as synced and/or
> > > > > > ready for replication.  Refreshing the subscription doesn't help
> > > > > > it discover the newly attached partition, because a
> > > > > > publish_via_partition_root only ever tells about the root
> > > > > > parent, which would be already synced, so the subscriber would think
> > > > > > there's nothing to copy.
> > > > >
> > > > > Okay, I see this could be a problem but I haven't tried to reproduce it.
> > > > >
> > > > > > > Anyway, if this is a problem
> > > > > > > we need to figure the solution for this separately.
> > > > > >
> > > > > > Sure, we might need to do that after all.  Though it might be a
> > > > > > good idea to be sure that we won't need to reconsider the fix we
> > > > > > push for the issue(s) being discussed here and elsewhere,
> > > > > > because I suspect that the solution to the problem I mentioned
> > > > > > is likely to involve tweaking pg_publication_tables view output.
> > > >
> > > > I have thought about this problem and I see two possibilities for a
> > > > solution (a) We could provide a new option say 'truncate' (something
> > > > on lines proposed here [1]) which would truncate the table(s) and
> > > > change its status to 'i' in the pg_subscription_rel, this would
> > > > allow the newly added partition to be synced after refresh. This
> > > > could lead to a large copy in such a case.
> > >
> > > Maybe I am missing something about the proposal, though I'd think a
> > > more automatic solution would be better, something that doesn't need
> > > to rely on an unrelated feature.
> > >
> >
> > Agreed, this was more of a workaround for users if we didn't get any automatic
> > solution.
> >
> > > > (b) We could somehow get and store all the partition info from the
> > > > publisher-side on the subscriber-side while initial sync (say in new
> > > > system table pg_subscription_rel_members). Now, after the refresh,
> > > > if this list changes, we can allow to just get the data of that
> > > > particular partition but I guess it would mean that we need to store
> > > > oids of the publisher which might or might not be safe considering
> > > > oids can wraparound before the refresh.
> > > >
> > > > Do you have any other ideas?
> > >
> > > I thought that the idea I had earlier mentioned at [1] may be useful,
> > > which I can see is similar to your idea (b). I also suspect that it
> > > can be implemented without needing a separate catalog and storing
> > > publication-side relation OIDs in the subscription-side catalog,
> > > though maybe I haven't thought hard enough.
> > >
> >
> > The problem with storing this info in pg_subscription_rel as you were
> > describing in your proposal is that currently, we ensure that the same table
> > exists in subscriber and then store the subscriber side table id in that catalog. I
> > am not sure if we can store publisher-side oids in that catalog and if we store
> > then it would be confusing as now it will have info of both publisher-side oids
> > and subscriber-side oids. Isn't that a problem with this approach?
> >
> > One more problem with this overall approach is the risk of OID wraparound.
> > Say between Create Subscription and Alter Subscription ..
> > Refresh, we detach one partition, the oid wraparounds, and we create/reattach
> > another partition which gets the same oid as for the partition which we
> > detached earlier then we won't be able to identify the new partition even after
> > refresh.
>
> If we use this approach, I think maybe we can store the publisher-side's table
> name instead of oid. Because for non-partitioned table, it's possible that user
> drop and create a new table with the same name in which case the oid would be
> changed. And in this case, the existing behavior wouldn't sync the table again.
> So I think it might be better to make the partitioned table's behavior
> consistent with the non-partitioned table which only sync the table's data when
> detect new table name.
>

I think firstly you need to then also store schema info. In the case
of non-partitioned tables, it is derived differently because we remove
the table's entry from pg_publication_rel, so the data won't be sent
to subscriber till the new table is added to the publication. In the
case of partitioned table (especially for the case of
publish_via_partition_root) we won't sync initial data in such a case
(dropped and re-created the partition with the same name) both with
and without your solution but the future data will be streamed.

Another point related to the Attach problem is what if such a
partition before was also part of publication separately, then if we
copy its initial data then that data copy might be considered a
duplicate. Isn't it better to document this case and explain what
users can expect instead of trying to design a solution around this?
Even if we do so the streaming after attach partition problem as
discussed above should be fixed.

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Tue, Nov 23, 2021 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Isn't it better to document this case and explain what
> users can expect instead of trying to design a solution around this?

I thought about the problems you've described and it looks like I
hadn't considered many of the details which complicate implementing a
solution for this.

So yeah, documenting the ATTACH issue as a limitation sounds like the
best course for now.  I might word it as follows and add it under
Notes at https://www.postgresql.org/docs/current/sql-createpublication.html:

"ATTACHing a table into a partition tree whose root is published using
a publication with publish_via_partition_root set to true does not
result in the table's existing contents to be replicated."

I'm not sure there's a clean enough workaround to this that we can add
to the paragraph.

Does that make sense?

> Even if we do so the streaming after attach partition problem as
> discussed above should be fixed.

I agree.  I have reproduced the problem though haven't managed to pin
down the cause yet.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Wed, Nov 24, 2021 at 12:02 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Tue, Nov 23, 2021 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Isn't it better to document this case and explain what
> > users can expect instead of trying to design a solution around this?
>
> I thought about the problems you've described and it looks like I
> hadn't considered many of the details which complicate implementing a
> solution for this.
>
> So yeah, documenting the ATTACH issue as a limitation sounds like the
> best course for now.  I might word it as follows and add it under
> Notes at https://www.postgresql.org/docs/current/sql-createpublication.html:
>
> "ATTACHing a table into a partition tree whose root is published using
> a publication with publish_via_partition_root set to true does not
> result in the table's existing contents to be replicated."
>

Instead of "to be", shall we use "being"?

> I'm not sure there's a clean enough workaround to this that we can add
> to the paragraph.
>
> Does that make sense?
>

Yeah, that sounds reasonable. I think the case with
"publish_via_partition_root = false" should work but please test it
once. The other thing which we need to consider about all these fixes
is whether to back patch them or not. I think it is on case to case
basis. I feel this limitation should be documented and backpatched,
what do you think? Feel free to submit patches accordingly.

> > Even if we do so the streaming after attach partition problem as
> > discussed above should be fixed.
>
> I agree.  I have reproduced the problem though haven't managed to pin
> down the cause yet.
>

No problem, do let us know whatever be your findings at the end. I
think now we know how to proceed with this attach issue, maybe we
should also try to commit the fixes for other issues.

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Mon, Nov 22, 2021 at 12:55 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > >  As in,
> > > do we know of any replication (initial/streaming) misbehavior caused
> > > by the duplicate partition OIDs in this case or is the only problem
> > > that pg_publication_tables output looks odd?
> >
> > The latter one but I think either we should document this or change it
> > as we can't assume users will follow what subscriber-side code does.
>
> On second thought, I agree that de-duplicating partitions from this
> view is an improvement.
>

Fair enough. Hou-San, Can you please submit the updated patch after
fixing any pending comments including the test case?

-- 
With Regards,
Amit Kapila.



RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Wed, Nov 24, 2021 4:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Nov 22, 2021 at 12:55 PM Amit Langote <amitlangote09@gmail.com>
> wrote:
> >
> > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com>
> wrote:
> > > >  As in,
> > > > do we know of any replication (initial/streaming) misbehavior
> > > > caused by the duplicate partition OIDs in this case or is the only
> > > > problem that pg_publication_tables output looks odd?
> > >
> > > The latter one but I think either we should document this or change
> > > it as we can't assume users will follow what subscriber-side code does.
> >
> > On second thought, I agree that de-duplicating partitions from this
> > view is an improvement.
> >
> 
> Fair enough. Hou-San, Can you please submit the updated patch after fixing
> any pending comments including the test case?

OK, I will submit the updated patch soon.

Best regards,
Hou zj

Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Wed, Nov 24, 2021 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Nov 24, 2021 at 12:02 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > So yeah, documenting the ATTACH issue as a limitation sounds like the
> > best course for now.  I might word it as follows and add it under
> > Notes at https://www.postgresql.org/docs/current/sql-createpublication.html:
> >
> > "ATTACHing a table into a partition tree whose root is published using
> > a publication with publish_via_partition_root set to true does not
> > result in the table's existing contents to be replicated."
>
> Instead of "to be", shall we use "being"?

That reads better.

> > I'm not sure there's a clean enough workaround to this that we can add
> > to the paragraph.
> >
> > Does that make sense?
> >
>
> Yeah, that sounds reasonable. I think the case with
> "publish_via_partition_root = false" should work but please test it
> once.

Yeah, I did test that case to compare and didn't see a problem with it.

> The other thing which we need to consider about all these fixes
> is whether to back patch them or not. I think it is on case to case
> basis. I feel this limitation should be documented and backpatched,
> what do you think? Feel free to submit patches accordingly.

I agree with backpatching the doc fix.  I've attached a diff against
master, though it also appears to apply to 13 and 14 branches.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Thu, Nov 25, 2021 at 1:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Wed, Nov 24, 2021 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Nov 24, 2021 at 12:02 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > So yeah, documenting the ATTACH issue as a limitation sounds like the
> > > best course for now.  I might word it as follows and add it under
> > > Notes at https://www.postgresql.org/docs/current/sql-createpublication.html:
> > >
> > > "ATTACHing a table into a partition tree whose root is published using
> > > a publication with publish_via_partition_root set to true does not
> > > result in the table's existing contents to be replicated."
> >
> > Instead of "to be", shall we use "being"?
>
> That reads better.
>
> > > I'm not sure there's a clean enough workaround to this that we can add
> > > to the paragraph.
> > >
>
> I agree with backpatching the doc fix.  I've attached a diff against
> master, though it also appears to apply to 13 and 14 branches.
>

I think we can <literal></literal> for publish_via_partition_root,
other than that the patch looks good to me.

Hou-San, others, do you have any opinion about this patch and whether
to backpatch it or not?

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Thu, Nov 25, 2021 at 5:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Nov 25, 2021 at 1:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > I agree with backpatching the doc fix.  I've attached a diff against
> > master, though it also appears to apply to 13 and 14 branches.
>
> I think we can <literal></literal> for publish_via_partition_root,
> other than that the patch looks good to me.

Oh right, fixed.  I also added the tag for "true" in the same sentence.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, November 25, 2021 4:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Nov 25, 2021 at 1:30 PM Amit Langote <amitlangote09@gmail.com>
> >
> > I agree with backpatching the doc fix.  I've attached a diff against
> > master, though it also appears to apply to 13 and 14 branches.
> >
> 
> I think we can <literal></literal> for publish_via_partition_root, other than that
> the patch looks good to me.
> 
> Hou-San, others, do you have any opinion about this patch and whether to
> backpatch it or not?

I think it makes sense to backpatch the doc fix, and the patch looks good to me.

Best regards,
Hou zj

Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Fri, Nov 26, 2021 at 7:10 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, November 25, 2021 4:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Nov 25, 2021 at 1:30 PM Amit Langote <amitlangote09@gmail.com>
> > >
> > > I agree with backpatching the doc fix.  I've attached a diff against
> > > master, though it also appears to apply to 13 and 14 branches.
> > >
> >
> > I think we can <literal></literal> for publish_via_partition_root, other than that
> > the patch looks good to me.
> >
> > Hou-San, others, do you have any opinion about this patch and whether to
> > backpatch it or not?
>
> I think it makes sense to backpatch the doc fix, and the patch looks good to me.
>

Thanks, I'll push this sometime early next week unless there are any
objections to it.


-- 
With Regards,
Amit Kapila.



RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Wed, Nov 24, 2021 4:48 PM Amit Kapila <amit.kapila16@gmail.com>
> On Mon, Nov 22, 2021 at 12:55 PM Amit Langote <amitlangote09@gmail.com>
> wrote:
> >
> > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com>
> wrote:
> > > >  As in,
> > > > do we know of any replication (initial/streaming) misbehavior
> > > > caused by the duplicate partition OIDs in this case or is the only
> > > > problem that pg_publication_tables output looks odd?
> > >
> > > The latter one but I think either we should document this or change
> > > it as we can't assume users will follow what subscriber-side code does.
> >
> > On second thought, I agree that de-duplicating partitions from this
> > view is an improvement.
> >
> 
> Fair enough. Hou-San, Can you please submit the updated patch after fixing
> any pending comments including the test case?

Attach the updated patch which address the comments so far.

The patch only adds a testcase in publication.sql because the duplicate
output doesn't cause unexpected behavior in pub-sub test.

Best regards,
Hou zj

Attachment

Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Fri, Nov 26, 2021 at 8:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 26, 2021 at 7:10 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Thursday, November 25, 2021 4:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Thu, Nov 25, 2021 at 1:30 PM Amit Langote <amitlangote09@gmail.com>
> > > >
> > > > I agree with backpatching the doc fix.  I've attached a diff against
> > > > master, though it also appears to apply to 13 and 14 branches.
> > > >
> > >
> > > I think we can <literal></literal> for publish_via_partition_root, other than that
> > > the patch looks good to me.
> > >
> > > Hou-San, others, do you have any opinion about this patch and whether to
> > > backpatch it or not?
> >
> > I think it makes sense to backpatch the doc fix, and the patch looks good to me.
> >
>
> Thanks, I'll push this sometime early next week unless there are any
> objections to it.
>

Pushed.

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Mon, Nov 29, 2021 at 2:37 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Wed, Nov 24, 2021 4:48 PM Amit Kapila <amit.kapila16@gmail.com>
> > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote <amitlangote09@gmail.com>
> > wrote:
> > >
> > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com>
> > wrote:
> > > > >  As in,
> > > > > do we know of any replication (initial/streaming) misbehavior
> > > > > caused by the duplicate partition OIDs in this case or is the only
> > > > > problem that pg_publication_tables output looks odd?
> > > >
> > > > The latter one but I think either we should document this or change
> > > > it as we can't assume users will follow what subscriber-side code does.
> > >
> > > On second thought, I agree that de-duplicating partitions from this
> > > view is an improvement.
> > >
> >
> > Fair enough. Hou-San, Can you please submit the updated patch after fixing
> > any pending comments including the test case?
>
> Attach the updated patch which address the comments so far.
>
> The patch only adds a testcase in publication.sql because the duplicate
> output doesn't cause unexpected behavior in pub-sub test.
>

Thanks, the patch looks good to me. I have slightly changed the commit
message in the attached. I would like to commit this only in HEAD as
there is no user complaint about this and it might not stop any user's
service unless it relies on this view's output for the initial table
synchronization.

What do you think? Bharath/Amit L., others, do you have any opinion on
this matter?

-- 
With Regards,
Amit Kapila.

Attachment

RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Wed, Dec 1, 2021 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Nov 29, 2021 at 2:37 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Wed, Nov 24, 2021 4:48 PM Amit Kapila <amit.kapila16@gmail.com>
> > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote
> > > <amitlangote09@gmail.com>
> > > wrote:
> > > >
> > > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila
> > > > <amit.kapila16@gmail.com>
> > > wrote:
> > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote
> > > > > <amitlangote09@gmail.com>
> > > wrote:
> > > > > >  As in,
> > > > > > do we know of any replication (initial/streaming) misbehavior
> > > > > > caused by the duplicate partition OIDs in this case or is the
> > > > > > only problem that pg_publication_tables output looks odd?
> > > > >
> > > > > The latter one but I think either we should document this or
> > > > > change it as we can't assume users will follow what subscriber-side
> > > > > code does.
> > > >
> > > > On second thought, I agree that de-duplicating partitions from
> > > > this view is an improvement.
> > > >
> > >
> > > Fair enough. Hou-San, Can you please submit the updated patch after
> > > fixing any pending comments including the test case?
> >
> > Attach the updated patch which address the comments so far.
> >
> > The patch only adds a testcase in publication.sql because the
> > duplicate output doesn't cause unexpected behavior in pub-sub test.
> >
> 
> Thanks, the patch looks good to me. I have slightly changed the commit
> message in the attached. I would like to commit this only in HEAD as there is no
> user complaint about this and it might not stop any user's service unless it relies
> on this view's output for the initial table synchronization.
> 
> What do you think?
I agreed that we can commit only in HEAD.

Best regards,
Hou zj


Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Thu, Dec 2, 2021 at 9:44 houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> On Wed, Dec 1, 2021 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote
> > > > <amitlangote09@gmail.com> wrote:
> > > > > On second thought, I agree that de-duplicating partitions from
> > > > > this view is an improvement.
> > > >
> > > > Fair enough. Hou-San, Can you please submit the updated patch after
> > > > fixing any pending comments including the test case?
> > >
> > > Attach the updated patch which address the comments so far.
> > >
> > > The patch only adds a testcase in publication.sql because the
> > > duplicate output doesn't cause unexpected behavior in pub-sub test.
> >
> > Thanks, the patch looks good to me. I have slightly changed the commit
> > message in the attached. I would like to commit this only in HEAD as there is no
> > user complaint about this and it might not stop any user's service unless it relies
> > on this view's output for the initial table synchronization.

The patch looks good to me too in that it gets the job done.

Reading Alvaro's email at the top again gave me a pause to reconsider
what I had said in reply.  It might indeed have been nice if the
publication DDL itself had prevented the cases where a partition and
its ancestor are added to a publication, though as Hou-san mentioned,
partition ATTACHes make that a bit tricky to enforce at all times,
though of course not impossible.  Maybe it's okay to just de-duplicate
pg_publication_tables output as the patch does, even though it may
appear to be a band-aid solution if we one considers Alvaro's point
about consistency.  Sorry about too many second thoughts on these
threads. :(

> > What do you think?
> I agreed that we can commit only in HEAD.

+1 to do this only in HEAD.

Thank you.


--
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Dec 2, 2021 at 9:44 houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > On Wed, Dec 1, 2021 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote
> > > > > <amitlangote09@gmail.com> wrote:
> > > > > > On second thought, I agree that de-duplicating partitions from
> > > > > > this view is an improvement.
> > > > >
> > > > > Fair enough. Hou-San, Can you please submit the updated patch after
> > > > > fixing any pending comments including the test case?
> > > >
> > > > Attach the updated patch which address the comments so far.
> > > >
> > > > The patch only adds a testcase in publication.sql because the
> > > > duplicate output doesn't cause unexpected behavior in pub-sub test.
> > >
> > > Thanks, the patch looks good to me. I have slightly changed the commit
> > > message in the attached. I would like to commit this only in HEAD as there is no
> > > user complaint about this and it might not stop any user's service unless it relies
> > > on this view's output for the initial table synchronization.
>
> The patch looks good to me too in that it gets the job done.
>
> Reading Alvaro's email at the top again gave me a pause to reconsider
> what I had said in reply.  It might indeed have been nice if the
> publication DDL itself had prevented the cases where a partition and
> its ancestor are added to a publication, though as Hou-san mentioned,
> partition ATTACHes make that a bit tricky to enforce at all times,
> though of course not impossible.  Maybe it's okay to just de-duplicate
> pg_publication_tables output as the patch does, even though it may
> appear to be a band-aid solution if we one considers Alvaro's point
> about consistency.
>

True, I think even if we consider that idea it will be a much bigger
change, and also as it will be a behavioral change so we might want to
keep it just for HEAD and some of these fixes need to be backpatched.
Having said that, if you or someone want to pursue that idea and come
up with a better solution than what we have currently it is certainly
welcome.


> > > What do you think?
> > I agreed that we can commit only in HEAD.
>
> +1 to do this only in HEAD.
>

Okay, If there are no further suggestions or objections, I'll commit
the patch early next week (mostly by Monday).

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > Reading Alvaro's email at the top again gave me a pause to reconsider
> > what I had said in reply.  It might indeed have been nice if the
> > publication DDL itself had prevented the cases where a partition and
> > its ancestor are added to a publication, though as Hou-san mentioned,
> > partition ATTACHes make that a bit tricky to enforce at all times,
> > though of course not impossible.  Maybe it's okay to just de-duplicate
> > pg_publication_tables output as the patch does, even though it may
> > appear to be a band-aid solution if we one considers Alvaro's point
> > about consistency.
>
> True, I think even if we consider that idea it will be a much bigger
> change, and also as it will be a behavioral change so we might want to
> keep it just for HEAD and some of these fixes need to be backpatched.
> Having said that, if you or someone want to pursue that idea and come
> up with a better solution than what we have currently it is certainly
> welcome.

Okay, I did write a PoC patch this morning after sending out my
earlier email.  I polished it a bit, which is attached.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Thu, Dec 2, 2021 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > Reading Alvaro's email at the top again gave me a pause to reconsider
> > > what I had said in reply.  It might indeed have been nice if the
> > > publication DDL itself had prevented the cases where a partition and
> > > its ancestor are added to a publication, though as Hou-san mentioned,
> > > partition ATTACHes make that a bit tricky to enforce at all times,
> > > though of course not impossible.  Maybe it's okay to just de-duplicate
> > > pg_publication_tables output as the patch does, even though it may
> > > appear to be a band-aid solution if we one considers Alvaro's point
> > > about consistency.
> >
> > True, I think even if we consider that idea it will be a much bigger
> > change, and also as it will be a behavioral change so we might want to
> > keep it just for HEAD and some of these fixes need to be backpatched.
> > Having said that, if you or someone want to pursue that idea and come
> > up with a better solution than what we have currently it is certainly
> > welcome.
>
> Okay, I did write a PoC patch this morning after sending out my
> earlier email.  I polished it a bit, which is attached.
>

I see multiple problems with this patch and idea. (a) I think you
forgot to deal with "All Tables In Schema" Publication which will be
quite tricky to deal with during attach operation. How will you remove
a particular relation from such a publication if there is a need to do
so? (b) Also, how will we prohibit adding partition and its root in
the case of "All Tables In Schema" or "All Tables" publication? If
there is no way to do that, then that would mean we anyway need to
deal with parent and child tables as part of the same publication and
this new behavior will add an exception to it. (c) I think this can
make switching "publish_via_partition_root" inconvenient for users.
Say all or some of the partitions are part of the publication and then
the user decides to add root table and change publish option
"publish_via_partition_root" to "true" at that moment she needs to
remove all partitions first.

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Fri, Dec 3, 2021 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Dec 2, 2021 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > Reading Alvaro's email at the top again gave me a pause to reconsider
> > > > what I had said in reply.  It might indeed have been nice if the
> > > > publication DDL itself had prevented the cases where a partition and
> > > > its ancestor are added to a publication, though as Hou-san mentioned,
> > > > partition ATTACHes make that a bit tricky to enforce at all times,
> > > > though of course not impossible.  Maybe it's okay to just de-duplicate
> > > > pg_publication_tables output as the patch does, even though it may
> > > > appear to be a band-aid solution if we one considers Alvaro's point
> > > > about consistency.
> > >
> > > True, I think even if we consider that idea it will be a much bigger
> > > change, and also as it will be a behavioral change so we might want to
> > > keep it just for HEAD and some of these fixes need to be backpatched.
> > > Having said that, if you or someone want to pursue that idea and come
> > > up with a better solution than what we have currently it is certainly
> > > welcome.
> >
> > Okay, I did write a PoC patch this morning after sending out my
> > earlier email.  I polished it a bit, which is attached.
>
> I see multiple problems with this patch and idea.

Thanks for looking at it.  Yeah, I have not looked very closely at ALL
TABLES [IN SCHEMA], though only because I suspected that those cases
deal with partitioning in such a way that the partition duplication
issue doesn't arise.  That is, only the FOR TABLE list_of_tables and
ADD TABLE syntax allow for the duplication issue to occur.

> (a) I think you
> forgot to deal with "All Tables In Schema" Publication which will be
> quite tricky to deal with during attach operation. How will you remove
> a particular relation from such a publication if there is a need to do
> so?

Hmm, my understanding of how FOR ALL TABLES... features work is that
one cannot remove a particular relation from such publications?

create schema sch;
create table sch.p (a int primary key) partition by list (a);
create table sch.p1 partition of sch.p for values in (1);
create table sch.p2 partition of sch.p for values in (2);
create table p (a int primary key) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create publication puball for all tables;
create publication pubsch for all tables in schema sch;

alter publication puball drop table p;
ERROR:  publication "puball" is defined as FOR ALL TABLES
DETAIL:  Tables cannot be added to or dropped from FOR ALL TABLES publications.

alter publication pubsch drop table sch.p;
ERROR:  relation "p" is not part of the publication

What am I missing?

> (b) Also, how will we prohibit adding partition and its root in
> the case of "All Tables In Schema" or "All Tables" publication? If
> there is no way to do that, then that would mean we anyway need to
> deal with parent and child tables as part of the same publication and
> this new behavior will add an exception to it.

I checked that FOR ALL TABLES publications don't allow adding a table
explicitly, but apparently IN SCHEMA one does:

alter publication pubsch add table p2;
\dRp+ pubsch
                           Publication pubsch
 Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
-------+------------+---------+---------+---------+-----------+----------
 amit  | f          | t       | t       | t       | t         | f
Tables:
    "public.p2"
Tables from schemas:
    "sch"

ISTM that the ..IN SCHEMA syntax does not behave like FOR ALL TABLES
without the IN SCHEMA in this regard.  Is that correct?

> (c) I think this can
> make switching "publish_via_partition_root" inconvenient for users.
> Say all or some of the partitions are part of the publication and then
> the user decides to add root table and change publish option
> "publish_via_partition_root" to "true" at that moment she needs to
> remove all partitions first.

Hmm, actually I don't think I have tested the case where some or all
partitions are present by themselves before adding the root table.
I'll check that and try to think of sensible behavior for that.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Fri, Dec 3, 2021 at 21:34 Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Dec 3, 2021 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Dec 2, 2021 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > Reading Alvaro's email at the top again gave me a pause to reconsider
> > > > what I had said in reply.  It might indeed have been nice if the
> > > > publication DDL itself had prevented the cases where a partition and
> > > > its ancestor are added to a publication, though as Hou-san mentioned,
> > > > partition ATTACHes make that a bit tricky to enforce at all times,
> > > > though of course not impossible.  Maybe it's okay to just de-duplicate
> > > > pg_publication_tables output as the patch does, even though it may
> > > > appear to be a band-aid solution if we one considers Alvaro's point
> > > > about consistency.
> > >
> > > True, I think even if we consider that idea it will be a much bigger
> > > change, and also as it will be a behavioral change so we might want to
> > > keep it just for HEAD and some of these fixes need to be backpatched.
> > > Having said that, if you or someone want to pursue that idea and come
> > > up with a better solution than what we have currently it is certainly
> > > welcome.
> >
> > Okay, I did write a PoC patch this morning after sending out my
> > earlier email.  I polished it a bit, which is attached.
>
> I see multiple problems with this patch and idea.

Thanks for looking at it.  Yeah, I have not looked very closely at ALL
TABLES [IN SCHEMA], though only because I suspected that those cases
deal with partitioning in such a way that the partition duplication
issue doesn't arise.  That is, only the FOR TABLE list_of_tables and
ADD TABLE syntax allow for the duplication issue to occur.

Another thing I forgot to mention is that the patch passes check-world.  Perhaps we don’t have enough tests that would’ve exposed any problems with the patch’s approach.
--

Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Fri, Dec 3, 2021 at 6:04 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Fri, Dec 3, 2021 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Dec 2, 2021 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > > Reading Alvaro's email at the top again gave me a pause to reconsider
> > > > > what I had said in reply.  It might indeed have been nice if the
> > > > > publication DDL itself had prevented the cases where a partition and
> > > > > its ancestor are added to a publication, though as Hou-san mentioned,
> > > > > partition ATTACHes make that a bit tricky to enforce at all times,
> > > > > though of course not impossible.  Maybe it's okay to just de-duplicate
> > > > > pg_publication_tables output as the patch does, even though it may
> > > > > appear to be a band-aid solution if we one considers Alvaro's point
> > > > > about consistency.
> > > >
> > > > True, I think even if we consider that idea it will be a much bigger
> > > > change, and also as it will be a behavioral change so we might want to
> > > > keep it just for HEAD and some of these fixes need to be backpatched.
> > > > Having said that, if you or someone want to pursue that idea and come
> > > > up with a better solution than what we have currently it is certainly
> > > > welcome.
> > >
> > > Okay, I did write a PoC patch this morning after sending out my
> > > earlier email.  I polished it a bit, which is attached.
> >
> > I see multiple problems with this patch and idea.
>
> Thanks for looking at it.  Yeah, I have not looked very closely at ALL
> TABLES [IN SCHEMA], though only because I suspected that those cases
> deal with partitioning in such a way that the partition duplication
> issue doesn't arise.  That is, only the FOR TABLE list_of_tables and
> ADD TABLE syntax allow for the duplication issue to occur.
>
> > (a) I think you
> > forgot to deal with "All Tables In Schema" Publication which will be
> > quite tricky to deal with during attach operation. How will you remove
> > a particular relation from such a publication if there is a need to do
> > so?
>
> Hmm, my understanding of how FOR ALL TABLES... features work is that
> one cannot remove a particular relation from such publications?
>
> create schema sch;
> create table sch.p (a int primary key) partition by list (a);
> create table sch.p1 partition of sch.p for values in (1);
> create table sch.p2 partition of sch.p for values in (2);
> create table p (a int primary key) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2 partition of p for values in (2);
> create publication puball for all tables;
> create publication pubsch for all tables in schema sch;
>
> alter publication puball drop table p;
> ERROR:  publication "puball" is defined as FOR ALL TABLES
> DETAIL:  Tables cannot be added to or dropped from FOR ALL TABLES publications.
>
> alter publication pubsch drop table sch.p;
> ERROR:  relation "p" is not part of the publication
>
> What am I missing?
>

Currently, in your patch, you are trying to remove a particular
relation/partition during attach but how will you do that if such a
relation is part of All Tables In Schema publication?

> > (b) Also, how will we prohibit adding partition and its root in
> > the case of "All Tables In Schema" or "All Tables" publication? If
> > there is no way to do that, then that would mean we anyway need to
> > deal with parent and child tables as part of the same publication and
> > this new behavior will add an exception to it.
>
> I checked that FOR ALL TABLES publications don't allow adding a table
> explicitly, but apparently IN SCHEMA one does:
>
> alter publication pubsch add table p2;
> \dRp+ pubsch
>                            Publication pubsch
>  Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
> -------+------------+---------+---------+---------+-----------+----------
>  amit  | f          | t       | t       | t       | t         | f
> Tables:
>     "public.p2"
> Tables from schemas:
>     "sch"
>
> ISTM that the ..IN SCHEMA syntax does not behave like FOR ALL TABLES
> without the IN SCHEMA in this regard.  Is that correct?
>

We do allow adding additional tables or schema to an exiting All
tables in schema publication.

-- 
With Regards,
Amit Kapila.



RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, December 2, 2021 9:48 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com>
> wrote:
> > > Reading Alvaro's email at the top again gave me a pause to
> > > reconsider what I had said in reply.  It might indeed have been nice
> > > if the publication DDL itself had prevented the cases where a
> > > partition and its ancestor are added to a publication, though as
> > > Hou-san mentioned, partition ATTACHes make that a bit tricky to
> > > enforce at all times, though of course not impossible.  Maybe it's
> > > okay to just de-duplicate pg_publication_tables output as the patch
> > > does, even though it may appear to be a band-aid solution if we one
> > > considers Alvaro's point about consistency.
> >
> > True, I think even if we consider that idea it will be a much bigger
> > change, and also as it will be a behavioral change so we might want to
> > keep it just for HEAD and some of these fixes need to be backpatched.
> > Having said that, if you or someone want to pursue that idea and come
> > up with a better solution than what we have currently it is certainly
> > welcome.
> 
> Okay, I did write a PoC patch this morning after sending out my earlier email.  I
> polished it a bit, which is attached.

After thinking more on this. I find there might be some usage about adding both
child and parent to the publication.

For the future feature publication row filter(and maybe column filter), It
could be useful for user to adding child and parent with different filter
expression. If pubviaroot=true, user can expect the parent's filter take
effect, If pubviaroot=false, they can expect the child's filter take effect.

If we disallow adding both child and parent to publication, it could be harder
for these features to implement.

So, personally, I am not sure disallow adding both child and parent to the
publication is a good idea.

Best regards,
Hou zj

Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Mon, Dec 6, 2021 at 3:55 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> On Thursday, December 2, 2021 9:48 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com>
> > wrote:
> > > > Reading Alvaro's email at the top again gave me a pause to
> > > > reconsider what I had said in reply.  It might indeed have been nice
> > > > if the publication DDL itself had prevented the cases where a
> > > > partition and its ancestor are added to a publication, though as
> > > > Hou-san mentioned, partition ATTACHes make that a bit tricky to
> > > > enforce at all times, though of course not impossible.  Maybe it's
> > > > okay to just de-duplicate pg_publication_tables output as the patch
> > > > does, even though it may appear to be a band-aid solution if we one
> > > > considers Alvaro's point about consistency.
> > >
> > > True, I think even if we consider that idea it will be a much bigger
> > > change, and also as it will be a behavioral change so we might want to
> > > keep it just for HEAD and some of these fixes need to be backpatched.
> > > Having said that, if you or someone want to pursue that idea and come
> > > up with a better solution than what we have currently it is certainly
> > > welcome.
> >
> > Okay, I did write a PoC patch this morning after sending out my earlier email.  I
> > polished it a bit, which is attached.
>
> After thinking more on this. I find there might be some usage about adding both
> child and parent to the publication.
>
> For the future feature publication row filter(and maybe column filter), It
> could be useful for user to adding child and parent with different filter
> expression. If pubviaroot=true, user can expect the parent's filter take
> effect, If pubviaroot=false, they can expect the child's filter take effect.
>
> If we disallow adding both child and parent to publication, it could be harder
> for these features to implement.

It's possible that one may want such a feature, and yes, outright
preventing adding both the parent and a partition to a publication
would perhaps make it harder.  Mind you though that any such feature
would need to fix the current implementation anyway to make the
publication-behavior-related catalog entries for child tables, which
we currently don't, unless children are added explicitly.

That said, we should also be careful in implementing new features that
offer a finer per-partition granularity, because the fact that we do
currently offer that for many of the existing old features such as
constraints, indexes, etc. limits the scalability of our architecture.
That point may be off-topic for the thread, but something to consider,
or maybe I need to take a look at the other patch to see if my
concerns are addressable / have been addressed.


--
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Mon, Dec 6, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Mon, Dec 6, 2021 at 3:55 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > On Thursday, December 2, 2021 9:48 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com>
> > > wrote:
> > > > > Reading Alvaro's email at the top again gave me a pause to
> > > > > reconsider what I had said in reply.  It might indeed have been nice
> > > > > if the publication DDL itself had prevented the cases where a
> > > > > partition and its ancestor are added to a publication, though as
> > > > > Hou-san mentioned, partition ATTACHes make that a bit tricky to
> > > > > enforce at all times, though of course not impossible.  Maybe it's
> > > > > okay to just de-duplicate pg_publication_tables output as the patch
> > > > > does, even though it may appear to be a band-aid solution if we one
> > > > > considers Alvaro's point about consistency.
> > > >
> > > > True, I think even if we consider that idea it will be a much bigger
> > > > change, and also as it will be a behavioral change so we might want to
> > > > keep it just for HEAD and some of these fixes need to be backpatched.
> > > > Having said that, if you or someone want to pursue that idea and come
> > > > up with a better solution than what we have currently it is certainly
> > > > welcome.
> > >
> > > Okay, I did write a PoC patch this morning after sending out my earlier email.  I
> > > polished it a bit, which is attached.
> >
> > After thinking more on this. I find there might be some usage about adding both
> > child and parent to the publication.
> >
> > For the future feature publication row filter(and maybe column filter), It
> > could be useful for user to adding child and parent with different filter
> > expression. If pubviaroot=true, user can expect the parent's filter take
> > effect, If pubviaroot=false, they can expect the child's filter take effect.
> >
> > If we disallow adding both child and parent to publication, it could be harder
> > for these features to implement.
>
> It's possible that one may want such a feature, and yes, outright
> preventing adding both the parent and a partition to a publication
> would perhaps make it harder.  Mind you though that any such feature
> would need to fix the current implementation anyway to make the
> publication-behavior-related catalog entries for child tables, which
> we currently don't, unless children are added explicitly.
>
> That said, we should also be careful in implementing new features that
> offer a finer per-partition granularity, because the fact that we do
> currently offer that for many of the existing old features such as
> constraints, indexes, etc. limits the scalability of our architecture.
> That point may be off-topic for the thread, but something to consider,
> or maybe I need to take a look at the other patch to see if my
> concerns are addressable / have been addressed.
>

Your concern is not very clear to me. Can you be more specific in what
kind of problem you see with such a design for row filtering? You
might want to once look at that thread. You can find a summary of that
patch in the email [1].

[1] - https://www.postgresql.org/message-id/49ba49f1-8bdb-40b7-ae9e-f17d88b3afcd%40www.fastmail.com

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Mon, Dec 6, 2021 at 1:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Dec 3, 2021 at 6:04 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Fri, Dec 3, 2021 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Thu, Dec 2, 2021 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > Okay, I did write a PoC patch this morning after sending out my
> > > > earlier email.  I polished it a bit, which is attached.
> > >
> > > I see multiple problems with this patch and idea.
> >
> > Thanks for looking at it.  Yeah, I have not looked very closely at ALL
> > TABLES [IN SCHEMA], though only because I suspected that those cases
> > deal with partitioning in such a way that the partition duplication
> > issue doesn't arise.  That is, only the FOR TABLE list_of_tables and
> > ADD TABLE syntax allow for the duplication issue to occur.
> >
> > > (a) I think you
> > > forgot to deal with "All Tables In Schema" Publication which will be
> > > quite tricky to deal with during attach operation. How will you remove
> > > a particular relation from such a publication if there is a need to do
> > > so?
> >
> > Hmm, my understanding of how FOR ALL TABLES... features work is that
> > one cannot remove a particular relation from such publications?
> >
> > create schema sch;
> > create table sch.p (a int primary key) partition by list (a);
> > create table sch.p1 partition of sch.p for values in (1);
> > create table sch.p2 partition of sch.p for values in (2);
> > create table p (a int primary key) partition by list (a);
> > create table p1 partition of p for values in (1);
> > create table p2 partition of p for values in (2);
> > create publication puball for all tables;
> > create publication pubsch for all tables in schema sch;
> >
> > alter publication puball drop table p;
> > ERROR:  publication "puball" is defined as FOR ALL TABLES
> > DETAIL:  Tables cannot be added to or dropped from FOR ALL TABLES publications.
> >
> > alter publication pubsch drop table sch.p;
> > ERROR:  relation "p" is not part of the publication
> >
> > What am I missing?
>
> Currently, in your patch, you are trying to remove a particular
> relation/partition during attach but how will you do that if such a
> relation is part of All Tables In Schema publication?

So IIUC the scenario of concern is when a table to be attached as a
partition is in a schema that's present in pg_publication_namespace.
The only way to stop it from being published is to move it to another
schema that is not published using that publication.

I think I misunderstood how the IN SCHEMA feature works.
Specifically, I didn't know that one can add a partitioned table to
the same publication (or any table other than those in a particular
schema for that matter).  Then the attached partition would still be
present in the publication by way of being part of the schema that is
present in the publication, along with the partitioned table that is
added separately.

Yes, my proposal in its current form can't prevent that kind of duplication.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Mon, Dec 6, 2021 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Dec 6, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Mon, Dec 6, 2021 at 3:55 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > > After thinking more on this. I find there might be some usage about adding both
> > > child and parent to the publication.
> > >
> > > For the future feature publication row filter(and maybe column filter), It
> > > could be useful for user to adding child and parent with different filter
> > > expression. If pubviaroot=true, user can expect the parent's filter take
> > > effect, If pubviaroot=false, they can expect the child's filter take effect.
> > >
> > > If we disallow adding both child and parent to publication, it could be harder
> > > for these features to implement.
> >
> > It's possible that one may want such a feature, and yes, outright
> > preventing adding both the parent and a partition to a publication
> > would perhaps make it harder.  Mind you though that any such feature
> > would need to fix the current implementation anyway to make the
> > publication-behavior-related catalog entries for child tables, which
> > we currently don't, unless children are added explicitly.
> >
> > That said, we should also be careful in implementing new features that
> > offer a finer per-partition granularity, because the fact that we do
> > currently offer that for many of the existing old features such as
> > constraints, indexes, etc. limits the scalability of our architecture.
> > That point may be off-topic for the thread, but something to consider,
> > or maybe I need to take a look at the other patch to see if my
> > concerns are addressable / have been addressed.
> >
>
> Your concern is not very clear to me. Can you be more specific in what
> kind of problem you see with such a design for row filtering?

I guess my characterization of it is still vague but the problem I see
is that we'll be adding one more feature that allows manipulating
partitions directly, which means more system catalogs than there
already are that know about partitions as objects.  I remember being
deliberate about avoiding pg_publication_rel entries for all
partitions when a partitioned table is added to a publication for that
very reason.

Is covering the use case of defining different filters for each
partition important for developing that feature?  I feel that defining
one for the root table and just using that for all partitions (albeit
translating as needed to account for possibly different attribute
numbers) would be enough for most users' use cases.  Maybe I'm wrong
on that.

> You
> might want to once look at that thread. You can find a summary of that
> patch in the email [1].

Okay, I'll try to do that.


--
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Mon, Dec 6, 2021 at 8:04 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> So IIUC the scenario of concern is when a table to be attached as a
> partition is in a schema that's present in pg_publication_namespace.
> The only way to stop it from being published is to move it to another
> schema that is not published using that publication.
>
> I think I misunderstood how the IN SCHEMA feature works.
> Specifically, I didn't know that one can add a partitioned table to
> the same publication (or any table other than those in a particular
> schema for that matter).  Then the attached partition would still be
> present in the publication by way of being part of the schema that is
> present in the publication, along with the partitioned table that is
> added separately.
>

Right.

> Yes, my proposal in its current form can't prevent that kind of duplication.
>

I am not sure how to proceed here. I feel it is better to go ahead
with the fix Hou-san proposed here and in another email [1] to fix the
know issues, especially because the issue discussed in [1] needs to be
back-patched. We can evaluate your proposal separately for HEAD. What
do you think? I am fine if you think that we should evaluate/analyze
your proposal before making a call on whether to fix the existing
issues with the proposed set of patches.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB57165D94CB187A97128F75EA946A9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Mon, Dec 6, 2021 at 8:59 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Mon, Dec 6, 2021 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Mon, Dec 6, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Mon, Dec 6, 2021 at 3:55 PM houzj.fnst@fujitsu.com
> > > <houzj.fnst@fujitsu.com> wrote:
> > > > After thinking more on this. I find there might be some usage about adding both
> > > > child and parent to the publication.
> > > >
> > > > For the future feature publication row filter(and maybe column filter), It
> > > > could be useful for user to adding child and parent with different filter
> > > > expression. If pubviaroot=true, user can expect the parent's filter take
> > > > effect, If pubviaroot=false, they can expect the child's filter take effect.
> > > >
> > > > If we disallow adding both child and parent to publication, it could be harder
> > > > for these features to implement.
> > >
> > > It's possible that one may want such a feature, and yes, outright
> > > preventing adding both the parent and a partition to a publication
> > > would perhaps make it harder.  Mind you though that any such feature
> > > would need to fix the current implementation anyway to make the
> > > publication-behavior-related catalog entries for child tables, which
> > > we currently don't, unless children are added explicitly.
> > >
> > > That said, we should also be careful in implementing new features that
> > > offer a finer per-partition granularity, because the fact that we do
> > > currently offer that for many of the existing old features such as
> > > constraints, indexes, etc. limits the scalability of our architecture.
> > > That point may be off-topic for the thread, but something to consider,
> > > or maybe I need to take a look at the other patch to see if my
> > > concerns are addressable / have been addressed.
> > >
> >
> > Your concern is not very clear to me. Can you be more specific in what
> > kind of problem you see with such a design for row filtering?
>
> I guess my characterization of it is still vague but the problem I see
> is that we'll be adding one more feature that allows manipulating
> partitions directly, which means more system catalogs than there
> already are that know about partitions as objects.  I remember being
> deliberate about avoiding pg_publication_rel entries for all
> partitions when a partitioned table is added to a publication for that
> very reason.
>

The same will be true even after this new feature.

> Is covering the use case of defining different filters for each
> partition important for developing that feature?
>

Users can only define the filters for a relation only when they want
to add a particular partition to the publication. Say, if only root
table is added to the publication (and pubviaroot is true) then we
don't need row filters to be defined for partitions. Basically, there
will be only one additional column in pg_publication_rel for the row
filter.

>  I feel that defining
> one for the root table and just using that for all partitions (albeit
> translating as needed to account for possibly different attribute
> numbers) would be enough for most users' use cases.
>

Sure, but what if the user just wants to publish only one of the
partitions? I think in that case along with adding the partition to
publication, we need to allow specifying row filter.

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Tue, Dec 7, 2021 at 12:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Dec 6, 2021 at 8:04 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > So IIUC the scenario of concern is when a table to be attached as a
> > partition is in a schema that's present in pg_publication_namespace.
> > The only way to stop it from being published is to move it to another
> > schema that is not published using that publication.
> >
> > I think I misunderstood how the IN SCHEMA feature works.
> > Specifically, I didn't know that one can add a partitioned table to
> > the same publication (or any table other than those in a particular
> > schema for that matter).  Then the attached partition would still be
> > present in the publication by way of being part of the schema that is
> > present in the publication, along with the partitioned table that is
> > added separately.
>
> Right.
>
> > Yes, my proposal in its current form can't prevent that kind of duplication.
>
> I am not sure how to proceed here. I feel it is better to go ahead
> with the fix Hou-san proposed here and in another email [1] to fix the
> know issues, especially because the issue discussed in [1] needs to be
> back-patched.
>
> We can evaluate your proposal separately for HEAD. What
> do you think?

Yeah, maybe.  Though given the direction that the row-filters patch
set is taking in allowing to definw filters on the individual
partitions, I am not sure if I should be pushing the approach to
disallow partitions from being added to publications explicitly
alongside their parent tables.  I'll try to take a look at that thread
to be sure if that's actually the case.

Also, for the purposes of the problems that Greg and Hou-san have
discovered, I have no objection with applying Hou-san's patches.
Those seem better for the back-patching anyway.

Thank you.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Tue, Dec 7, 2021 at 1:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Dec 6, 2021 at 8:59 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Mon, Dec 6, 2021 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Your concern is not very clear to me. Can you be more specific in what
> > > kind of problem you see with such a design for row filtering?
> >
> > I guess my characterization of it is still vague but the problem I see
> > is that we'll be adding one more feature that allows manipulating
> > partitions directly, which means more system catalogs than there
> > already are that know about partitions as objects.  I remember being
> > deliberate about avoiding pg_publication_rel entries for all
> > partitions when a partitioned table is added to a publication for that
> > very reason.
>
> The same will be true even after this new feature.

Oh, interesting.

> > Is covering the use case of defining different filters for each
> > partition important for developing that feature?
> >
>
> Users can only define the filters for a relation only when they want
> to add a particular partition to the publication. Say, if only root
> table is added to the publication (and pubviaroot is true) then we
> don't need row filters to be defined for partitions. Basically, there
> will be only one additional column in pg_publication_rel for the row
> filter.

Okay, so you are talking about allowing setting filters on a partition
that is added *explicitly* to a publication.  I guess that's the only
way to do such a thing given the syntax of FOR TABLE / ADD TABLE.
Filters specified on a partitioned table that is added to a
publication apply equally to all partitions, depending on the
publish_via_partition_root configuration.  So if a partition is
explicitly present in a publication with a filter defined on it, I
suppose following are the possible scenarios:

* publish_via_partition_root=true

1. If the root table is not present in the publication (perhaps
unlikely), use the partition's filter because there's nothing else to
consider.

2. If the root table is present in the publication (very likely), use
the root's filter, ignoring/overriding the partition's own if any.

* publish_via_partition_root=false

1. If the root table is not present in the publication, use the
partition's filter because there's nothing else to consider.

2. If the root table is present in the publication, use the
partition's own filter if present, else root's.

> >  I feel that defining
> > one for the root table and just using that for all partitions (albeit
> > translating as needed to account for possibly different attribute
> > numbers) would be enough for most users' use cases.
> >
>
> Sure, but what if the user just wants to publish only one of the
> partitions? I think in that case along with adding the partition to
> publication, we need to allow specifying row filter.

Yeah, I understand now that we're talking about partitions that are
added explicitly.

Thanks for your patience.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Tue, Dec 7, 2021 at 10:52 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Tue, Dec 7, 2021 at 1:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Mon, Dec 6, 2021 at 8:59 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Mon, Dec 6, 2021 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > Your concern is not very clear to me. Can you be more specific in what
> > > > kind of problem you see with such a design for row filtering?
> > >
> > > I guess my characterization of it is still vague but the problem I see
> > > is that we'll be adding one more feature that allows manipulating
> > > partitions directly, which means more system catalogs than there
> > > already are that know about partitions as objects.  I remember being
> > > deliberate about avoiding pg_publication_rel entries for all
> > > partitions when a partitioned table is added to a publication for that
> > > very reason.
> >
> > The same will be true even after this new feature.
>
> Oh, interesting.
>
> > > Is covering the use case of defining different filters for each
> > > partition important for developing that feature?
> > >
> >
> > Users can only define the filters for a relation only when they want
> > to add a particular partition to the publication. Say, if only root
> > table is added to the publication (and pubviaroot is true) then we
> > don't need row filters to be defined for partitions. Basically, there
> > will be only one additional column in pg_publication_rel for the row
> > filter.
>
> Okay, so you are talking about allowing setting filters on a partition
> that is added *explicitly* to a publication.  I guess that's the only
> way to do such a thing given the syntax of FOR TABLE / ADD TABLE.
> Filters specified on a partitioned table that is added to a
> publication apply equally to all partitions, depending on the
> publish_via_partition_root configuration.
>

Yes.

>  So if a partition is
> explicitly present in a publication with a filter defined on it, I
> suppose following are the possible scenarios:
>
> * publish_via_partition_root=true
>
> 1. If the root table is not present in the publication (perhaps
> unlikely), use the partition's filter because there's nothing else to
> consider.
>
> 2. If the root table is present in the publication (very likely), use
> the root's filter, ignoring/overriding the partition's own if any.
>
> * publish_via_partition_root=false
>
> 1. If the root table is not present in the publication, use the
> partition's filter because there's nothing else to consider.
>
> 2. If the root table is present in the publication, use the
> partition's own filter if present, else root's.
>

I have not tested/checked each of these scenarios individually but I
think it is something like, if publish_via_partition_root is false
then we will always try to use partitions row filter and if it is not
there then we don't use any filter. Similarly, if
publish_via_partition_root is true, then we always try to use a
partitioned table row filter and if it is not there (either because
the partitioned table is not part of the publication or because there
is no filter defined for it) then we don't use any filter.

-- 
With Regards,
Amit Kapila.



Re: pg_get_publication_tables() output duplicate relid

From
Amit Langote
Date:
On Tue, Dec 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Dec 7, 2021 at 10:52 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >  So if a partition is
> > explicitly present in a publication with a filter defined on it, I
> > suppose following are the possible scenarios:
> >
> > * publish_via_partition_root=true
> >
> > 1. If the root table is not present in the publication (perhaps
> > unlikely), use the partition's filter because there's nothing else to
> > consider.
> >
> > 2. If the root table is present in the publication (very likely), use
> > the root's filter, ignoring/overriding the partition's own if any.
> >
> > * publish_via_partition_root=false
> >
> > 1. If the root table is not present in the publication, use the
> > partition's filter because there's nothing else to consider.
> >
> > 2. If the root table is present in the publication, use the
> > partition's own filter if present, else root's.
>
> I have not tested/checked each of these scenarios individually but I
> think it is something like, if publish_via_partition_root is false
> then we will always try to use partitions row filter and if it is not
> there then we don't use any filter. Similarly, if
> publish_via_partition_root is true, then we always try to use a
> partitioned table row filter and if it is not there (either because
> the partitioned table is not part of the publication or because there
> is no filter defined for it) then we don't use any filter.

Okay, thanks for the explanation.


--
Amit Langote
EDB: http://www.enterprisedb.com



Re: pg_get_publication_tables() output duplicate relid

From
Amit Kapila
Date:
On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Dec 2, 2021 at 9:44 houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > On Wed, Dec 1, 2021 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote
> > > > > <amitlangote09@gmail.com> wrote:
> > > > > > On second thought, I agree that de-duplicating partitions from
> > > > > > this view is an improvement.
> > > > >
> > > > > Fair enough. Hou-San, Can you please submit the updated patch after
> > > > > fixing any pending comments including the test case?
> > > >
> > > > Attach the updated patch which address the comments so far.
> > > >
> > > > The patch only adds a testcase in publication.sql because the
> > > > duplicate output doesn't cause unexpected behavior in pub-sub test.
> > >
> > > Thanks, the patch looks good to me. I have slightly changed the commit
> > > message in the attached. I would like to commit this only in HEAD as there is no
> > > user complaint about this and it might not stop any user's service unless it relies
> > > on this view's output for the initial table synchronization.
>
> The patch looks good to me too in that it gets the job done.
>

Pushed.

-- 
With Regards,
Amit Kapila.



RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Sat, Nov 20, 2021 7:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com>
> wrote:
> > >
> > > The problematic case is attaching the partition *after* the
> > > subscriber has already marked the root parent as synced and/or ready
> > > for replication.  Refreshing the subscription doesn't help it
> > > discover the newly attached partition, because a
> > > publish_via_partition_root only ever tells about the root parent,
> > > which would be already synced, so the subscriber would think there's
> > > nothing to copy.
> > >
> >
> > Okay, I see this could be a problem but I haven't tried to reproduce it.
> 
> One more thing you mentioned is that the initial sync won't work after refresh
> but later changes will be replicated but I noticed that later changes also don't
> get streamed till we restart the subscriber server.
> I am not sure but we might not be invalidating apply workers cache due to
> which it didn't notice the same.

I investigated this bug recently, and I think the reason is that when receiving
relcache invalidation message, the callback function[1] in walsender only reset
the schema sent status while it doesn't reset the replicate_valid flag. So, it
won’t rebuild the publication actions of the relation.

[1]
static void
rel_sync_cache_relation_cb(Datum arg, Oid relid)
...
    /*
     * Reset schema sent status as the relation definition may have changed.
     * Also free any objects that depended on the earlier definition.
     */
    if (entry != NULL)
    {
        entry->schema_sent = false;
        list_free(entry->streamed_txns);
...

Also, when you DETACH a partition, the publication won’t be rebuilt too because
of the same reason. Which could cause unexpected behavior if we modify the
detached table's data . And the bug happens regardless of whether pubviaroot is
set or not.

For the fix:

I think if we also reset replicate_valid flag in rel_sync_cache_relation_cb,
then the bug can be fixed. I have a bit hesitation about this approach, because
it could increase the frequency of invalidating and rebuilding the publication
action. But I haven't produced some other better approaches.

Another possibility could be that we add a syscache callback function for
pg_inherits table, but we currently don't have syscache for pg_inherits. We
might need to add the cache pg_inherits first which doesn't seems better
than the above approach.

What do you think ?

Attach an initial patch which reset the replicate_valid flag in
rel_sync_cache_relation_cb and add some reproduction tests in 013_partition.pl.

Best regards,
Hou zj




Attachment

RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, December 14, 2021 3:42 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> 
> On Sat, Nov 20, 2021 7:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > >
> > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote
> > > <amitlangote09@gmail.com>
> > wrote:
> > > >
> > > > The problematic case is attaching the partition *after* the
> > > > subscriber has already marked the root parent as synced and/or
> > > > ready for replication.  Refreshing the subscription doesn't help
> > > > it discover the newly attached partition, because a
> > > > publish_via_partition_root only ever tells about the root parent,
> > > > which would be already synced, so the subscriber would think
> > > > there's nothing to copy.
> > > >
> > >
> > > Okay, I see this could be a problem but I haven't tried to reproduce it.
> >
> > One more thing you mentioned is that the initial sync won't work after
> > refresh but later changes will be replicated but I noticed that later
> > changes also don't get streamed till we restart the subscriber server.
> > I am not sure but we might not be invalidating apply workers cache due
> > to which it didn't notice the same.
> 
> I investigated this bug recently, and I think the reason is that when receiving
> relcache invalidation message, the callback function[1] in walsender only reset
> the schema sent status while it doesn't reset the replicate_valid flag. So, it
> won’t rebuild the publication actions of the relation.
> 
> [1]
> static void
> rel_sync_cache_relation_cb(Datum arg, Oid relid) ...
>     /*
>      * Reset schema sent status as the relation definition may have
> changed.
>      * Also free any objects that depended on the earlier definition.
>      */
>     if (entry != NULL)
>     {
>         entry->schema_sent = false;
>         list_free(entry->streamed_txns);
> ...
> 
> Also, when you DETACH a partition, the publication won’t be rebuilt too
> because of the same reason. Which could cause unexpected behavior if we
> modify the detached table's data . And the bug happens regardless of whether
> pubviaroot is set or not.
> 
> For the fix:
> 
> I think if we also reset replicate_valid flag in rel_sync_cache_relation_cb, then
> the bug can be fixed. I have a bit hesitation about this approach, because it
> could increase the frequency of invalidating and rebuilding the publication
> action. But I haven't produced some other better approaches.
> 

I have confirmed that the bug of ATTACH PARTITION has been fixed due to recent
commit 7f481b8. Currently, we always invalidate the RelationSyncCache when
attaching a partition, so the pubactions of the newly attached partition will
be rebuilt correctly.

Best regards,
Hou zj



Re: pg_get_publication_tables() output duplicate relid

From
Alvaro Herrera
Date:
On 2022-Apr-11, houzj.fnst@fujitsu.com wrote:

> I have confirmed that the bug of ATTACH PARTITION has been fixed due to recent
> commit 7f481b8. Currently, we always invalidate the RelationSyncCache when
> attaching a partition, so the pubactions of the newly attached partition will
> be rebuilt correctly.

Hm, this commit was not backpatched -- it is only in master.  I just
checked that pg14 behaves as you describe in the opening email of this
thread; did we decide that we don't want to change behavior of stable
branches?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



RE: pg_get_publication_tables() output duplicate relid

From
"houzj.fnst@fujitsu.com"
Date:
On Friday, April 15, 2022 12:46 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 
> On 2022-Apr-11, houzj.fnst@fujitsu.com wrote:
> 
> > I have confirmed that the bug of ATTACH PARTITION has been fixed due to
> recent
> > commit 7f481b8. Currently, we always invalidate the RelationSyncCache when
> > attaching a partition, so the pubactions of the newly attached partition will
> > be rebuilt correctly.
> 
> Hm, this commit was not backpatched -- it is only in master.  I just
> checked that pg14 behaves as you describe in the opening email of this
> thread; did we decide that we don't want to change behavior of stable
> branches?

Hi,

Do you mean the original bug: "pg_get_publication_tables function will output
duplicate partition relid when adding both child and parent table to the
publication(pubviaroot = false). " ?

We decided to commit the fix for it only in HEAD as there is no user complaint
about this and it might not stop any user's service unless it relies on this
view's output for the initial table synchronization.

Or do you mean another one mentioned here which is "the changes on newly
attached partition won't be replicated" ? For this one, I think it's fine to
backpatch the fix and I can prepare the patch for back branches if needed. BTW,
if we want to backpatch this fix we might need to keep the order of
RelationSyncEntry struct member unchanged in back branches.

Best regards,
Hou zj