Thread: pg_get_publication_tables() output duplicate relid
Hi hackers, In another thread[1], we found the pg_get_publication_tables function will output duplicate partition relid when adding both child and parent table to the publication(pubviaroot = false). Example: create table tbl1 (a int) partition by range (a); create table tbl1_part1 partition of tbl1 for values from (1) to (10); create table tbl1_part2 partition of tbl1 for values from (10) to (20); create publication pub for table tbl1, tbl1_part1 with (publish_via_partition_root=false); select * from pg_get_publication_tables('pub'); relid ------- 16387 16390 16387 The reason of the duplicate output is that: pg_get_publication_tables invokes function GetPublicationRelations internally. In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1' twice. First time from extracting partitions from the specified parent table 'tbl1', second time from the explicitly specified partition 'tbl1_part1'. I am not sure is this behavior expected as it seems natural for pg_get_publication_tables to return duplicate-free relid list. OTOH, there seems no harm for the current behavior(duplicate output), it doesn't affect the initial sync and change replication when using logical replication. Personally, I think it might be better to make the output of pg_get_publication_tables duplicate-free, because the change happened on each output relid will only be replicated once. So, it seems more consistent to output each relid only once. Thoughts ? (Attach a patch which make the output duplicate-free) [1] https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com Best regards, Hou zj
Attachment
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > Hi hackers, > > In another thread[1], we found the pg_get_publication_tables function will output > duplicate partition relid when adding both child and parent table to the > publication(pubviaroot = false). > > Example: > create table tbl1 (a int) partition by range (a); > create table tbl1_part1 partition of tbl1 for values from (1) to (10); > create table tbl1_part2 partition of tbl1 for values from (10) to (20); > create publication pub for table > tbl1, tbl1_part1 with (publish_via_partition_root=false); > > select * from pg_get_publication_tables('pub'); > relid > ------- > 16387 > 16390 > 16387 > > The reason of the duplicate output is that: > > pg_get_publication_tables invokes function GetPublicationRelations internally. > In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1' > twice. First time from extracting partitions from the specified parent table > 'tbl1', second time from the explicitly specified partition 'tbl1_part1'. > > > I am not sure is this behavior expected as it seems natural for > pg_get_publication_tables to return duplicate-free relid list. OTOH, there > seems no harm for the current behavior(duplicate output), it doesn't affect the > initial sync and change replication when using logical replication. > > Personally, I think it might be better to make the output of > pg_get_publication_tables duplicate-free, because the change happened on each > output relid will only be replicated once. So, it seems more consistent to > output each relid only once. > > Thoughts ? > > (Attach a patch which make the output duplicate-free) > > [1] https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com The users can always specify the distinct clause, see [1]. I don't see any problem with the existing way. If required you can specify in the view pg_publication_tables documentation that "This view returns multiple rows for the same relation, if the relation is child to a parent partition table and if both the parent and child are specified in the publication" or something similart. If at all, the pg_get_publication_tables() is supposed to give unique outputs, let's fix it and it is a good idea to specify that the view pg_publication_tables returns unique rows even though child and parent partition tables are specified in the publication. I have fdw comments on the patch: 1) Why to do result = list_concat_unique_oid(result, relids); within the while loop which can make the while loop O(n*n*n) complexity(list_concat_unique_oid is O(n*n)? Can't you use list_append_unique, of course this can also be costlier? Let's avoid adding anything to the while loop and use list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n)) /* Now sort and de-duplicate the result list */ list_sort(result, list_oid_cmp); list_deduplicate_oid(result); while (HeapTupleIsValid(tup = systable_getnext(scan))) { Form_pg_publication_rel pubrel; + List *relids = NIL; pubrel = (Form_pg_publication_rel) GETSTRUCT(tup); - result = GetPubPartitionOptionRelations(result, pub_partopt, + relids = GetPubPartitionOptionRelations(relids, pub_partopt, pubrel->prrelid); + + result = list_concat_unique_oid(result, relids); } 2) Are you sure you want GetPublicationRelations to be returning the unique relations? I'm just thinking why can't you just do list_sort+list_deduplicate_oid in the caller? I think this makes the function more restrictive. If required, you can pass a boolean flag (bool give_unique and specify in the function comments and if passed true do list_sort+list_deduplicate_oid at the end of GetPublicationRelations ). [1] postgres=# select * from pg_get_publication_tables('pub'); relid ------- 16387 16390 16387 (3 rows) postgres=# select distinct * from pg_get_publication_tables('pub'); relid ------- 16387 16390 (2 rows) postgres=# select * from pg_publication_tables; pubname | schemaname | tablename ---------+------------+------------ pub | public | tbl1_part1 pub | public | tbl1_part2 pub | public | tbl1_part1 (3 rows) postgres=# select distinct * from pg_publication_tables; pubname | schemaname | tablename ---------+------------+------------ pub | public | tbl1_part1 pub | public | tbl1_part2 (2 rows) Regards, Bharath Rupireddy.
> On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > create table tbl1 (a int) partition by range (a); > > create table tbl1_part1 partition of tbl1 for values from (1) to (10); > > create table tbl1_part2 partition of tbl1 for values from (10) to (20); > > create publication pub for table > > tbl1, tbl1_part1 with (publish_via_partition_root=false); In the name of consistency, I think this situation should be an error -- I mean, if we detect that the user is trying to add both the partitioned table *and* its partition, then all manner of things are possibly going to go wrong in some way, so my inclination is to avoid it altogether. Is there any reason to allow that? If we do that, then we have to be careful with later ALTER PUBLICATION too: adding a partition is not allowed if its partitioned table is there, and adding a partitioned table should behave in some sensible way if one of its partitions is there (either removing the partition at the same time, or outright rejecting the operation.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Mon, Nov 15, 2021 6:17 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > > > Hi hackers, > > > > In another thread[1], we found the pg_get_publication_tables function > > will output duplicate partition relid when adding both child and > > parent table to the publication(pubviaroot = false). > > > > Example: > > create table tbl1 (a int) partition by range (a); create table > > tbl1_part1 partition of tbl1 for values from (1) to (10); create table > > tbl1_part2 partition of tbl1 for values from (10) to (20); create > > publication pub for table tbl1, tbl1_part1 with > > (publish_via_partition_root=false); > > > > select * from pg_get_publication_tables('pub'); relid > > ------- > > 16387 > > 16390 > > 16387 > > > > The reason of the duplicate output is that: > > > > pg_get_publication_tables invokes function GetPublicationRelations > internally. > > In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1' > > twice. First time from extracting partitions from the specified parent > > table 'tbl1', second time from the explicitly specified partition 'tbl1_part1'. > > > > > > I am not sure is this behavior expected as it seems natural for > > pg_get_publication_tables to return duplicate-free relid list. OTOH, > > there seems no harm for the current behavior(duplicate output), it > > doesn't affect the initial sync and change replication when using logical > replication. > > > > Personally, I think it might be better to make the output of > > pg_get_publication_tables duplicate-free, because the change happened > > on each output relid will only be replicated once. So, it seems more > > consistent to output each relid only once. > > > > Thoughts ? > > > > (Attach a patch which make the output duplicate-free) > > > > [1] > > > https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37Ptsxu > NW4nB > > GN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com > > The users can always specify the distinct clause, see [1]. I don't see any problem > with the existing way. If required you can specify in the view > pg_publication_tables documentation that "This view returns multiple rows for > the same relation, if the relation is child to a parent partition table and if both > the parent and child are specified in the publication" or something similart. Thanks for the response. Yes, I agreed, as I said there's no harm for the current behavior. If most people don't think It's worth changing the behavior, I am fine with that. > If at all, the pg_get_publication_tables() is supposed to give unique outputs, > let's fix it and it is a good idea to specify that the view pg_publication_tables > returns unique rows even though child and parent partition tables are specified > in the publication. > > I have fdw comments on the patch: > Let's avoid adding anything to the while > loop and use list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n)) > Thanks for the comments, I will address this if we finally decide to fix it. > > 2) Are you sure you want GetPublicationRelations to be returning the unique > relations? I'm just thinking why can't you just do list_sort+list_deduplicate_oid > in the caller? I think this makes the function more restrictive. If required, you > can pass a boolean flag (bool give_unique and specify in the function > comments and if passed true do list_sort+list_deduplicate_oid at the end of > GetPublicationRelations ). I change the GetPublicationRelations because I found no caller need the duplicate oid, and de-duplicate oid in GetPublicationRelations could reduce some code change. I think if we need the duplicate oid in the future, we can add the flag as you suggested. Best regards, Hou zj
On Mon, Nov 15, 2021 9:42 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com > > <houzj.fnst@fujitsu.com> wrote: > > > > create table tbl1 (a int) partition by range (a); create table > > > tbl1_part1 partition of tbl1 for values from (1) to (10); create > > > table tbl1_part2 partition of tbl1 for values from (10) to (20); > > > create publication pub for table tbl1, tbl1_part1 with > > > (publish_via_partition_root=false); > > In the name of consistency, I think this situation should be an error -- I mean, if > we detect that the user is trying to add both the partitioned table *and* its > partition, then all manner of things are possibly going to go wrong in some way, > so my inclination is to avoid it altogether. > > Is there any reason to allow that? > > If we do that, then we have to be careful with later ALTER PUBLICATION > too: adding a partition is not allowed if its partitioned table is there, and adding > a partitioned table should behave in some sensible way if one of its partitions is > there (either removing the partition at the same time, or outright rejecting the > operation.) Thanks for the response. If we decide to disallow this case, we seem need to handle some other cases as well, for example: We might also need additional check when ATTACH a partition, because the partition's parent table could already be published in the same publication as the partition. During the check we might also need to lock all the parent tables to handle concurrently change which looks complicated to me :( I am not sure is it worth adding these lock and check. Maybe we can leave the current behavior as is ? Best regards, Hou zj
On Mon, Nov 15, 2021 at 7:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com > > <houzj.fnst@fujitsu.com> wrote: > > > > create table tbl1 (a int) partition by range (a); > > > create table tbl1_part1 partition of tbl1 for values from (1) to (10); > > > create table tbl1_part2 partition of tbl1 for values from (10) to (20); > > > create publication pub for table > > > tbl1, tbl1_part1 with (publish_via_partition_root=false); > > In the name of consistency, I think this situation should be an error -- > I mean, if we detect that the user is trying to add both the partitioned > table *and* its partition, then all manner of things are possibly going > to go wrong in some way, so my inclination is to avoid it altogether. > > Is there any reason to allow that? > I think it could provide flexibility to users to later change "publish_via_partition_root" option. Because when that option is false, we use individual partitions schema to send changes and when it is true, we use root table's schema to send changes. Added Amit L. to know if he has any thoughts on this matter as he was the author of this work? -- With Regards, Amit Kapila.
On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > On Mon, Nov 15, 2021 9:42 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > create table tbl1 (a int) partition by range (a); create table > > > > tbl1_part1 partition of tbl1 for values from (1) to (10); create > > > > table tbl1_part2 partition of tbl1 for values from (10) to (20); > > > > create publication pub for table tbl1, tbl1_part1 with > > > > (publish_via_partition_root=false); > > > > In the name of consistency, I think this situation should be an error -- I mean, if > > we detect that the user is trying to add both the partitioned table *and* its > > partition, then all manner of things are possibly going to go wrong in some way, > > so my inclination is to avoid it altogether. > > > > Is there any reason to allow that? > > > > If we do that, then we have to be careful with later ALTER PUBLICATION > > too: adding a partition is not allowed if its partitioned table is there, and adding > > a partitioned table should behave in some sensible way if one of its partitions is > > there (either removing the partition at the same time, or outright rejecting the > > operation.) > > Thanks for the response. > > If we decide to disallow this case, we seem need to handle some other cases as > well, for example: We might also need additional check when ATTACH a partition, > because the partition's parent table could already be published in the same > publication as the partition. > What kind of additional checks you are envisioning and why? -- With Regards, Amit Kapila.
On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > > > On Mon, Nov 15, 2021 9:42 PM Alvaro Herrera <alvherre@alvh.no-ip.org> > wrote: > > > > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com > > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > create table tbl1 (a int) partition by range (a); create table > > > > > tbl1_part1 partition of tbl1 for values from (1) to (10); create > > > > > table tbl1_part2 partition of tbl1 for values from (10) to (20); > > > > > create publication pub for table tbl1, tbl1_part1 with > > > > > (publish_via_partition_root=false); > > > > > > In the name of consistency, I think this situation should be an > > > error -- I mean, if we detect that the user is trying to add both > > > the partitioned table *and* its partition, then all manner of things > > > are possibly going to go wrong in some way, so my inclination is to avoid it > > > altogether. > > > > > > Is there any reason to allow that? > > > > > > If we do that, then we have to be careful with later ALTER > > > PUBLICATION > > > too: adding a partition is not allowed if its partitioned table is > > > there, and adding a partitioned table should behave in some sensible > > > way if one of its partitions is there (either removing the partition > > > at the same time, or outright rejecting the > > > operation.) > > > > Thanks for the response. > > > > If we decide to disallow this case, we seem need to handle some other > > cases as well, for example: We might also need additional check when > > ATTACH a partition, because the partition's parent table could already > > be published in the same publication as the partition. > > > > What kind of additional checks you are envisioning and why? For example: create table tbl1 (a int) partition by range (a); create table tbl1_part1 (like tbl1); create table tbl1_part2 partition of tbl1 for values from (10) to (20); create publication pub for table tbl1, tbl1_part1 with (publish_via_partition_root=false); --- We might need addition check here Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10); In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1' which seems the case we want to disallow(both parent and child table in publication). So, When ATTACH, I thought we might need to check all the parent tables to see if any parent table is in the same publication which the table to be attached is also belongs to. Does it make sense ? Best regards, Hou zj
On Tue, Nov 16, 2021 at 10:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Nov 15, 2021 at 7:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > create table tbl1 (a int) partition by range (a); > > > > create table tbl1_part1 partition of tbl1 for values from (1) to (10); > > > > create table tbl1_part2 partition of tbl1 for values from (10) to (20); > > > > create publication pub for table > > > > tbl1, tbl1_part1 with (publish_via_partition_root=false); > > > > In the name of consistency, I think this situation should be an error -- > > I mean, if we detect that the user is trying to add both the partitioned > > table *and* its partition, then all manner of things are possibly going > > to go wrong in some way, so my inclination is to avoid it altogether. > > > > Is there any reason to allow that? > > I think it could provide flexibility to users to later change > "publish_via_partition_root" option. Because when that option is > false, we use individual partitions schema to send changes and when it > is true, we use root table's schema to send changes. Added Amit L. to > know if he has any thoughts on this matter as he was the author of > this work? FWIW, I'm not sure that adding an error in this path, that is, when a user adds both a partitioned table and its partitions to a publication, helps much. As for the safety of allowing it, if you look at get_rel_sync_entry(), which handles much of the complexity of determining whether to publish a relation's changes and the schema to use when doing so, you may be able to see that a partition being added duplicatively is harmless, modulo any undiscovered bugs. At least as far as the post-initial-sync replication functionality is concerned. What IS problematic is what a subscriber sees in the pg_publication_tables view and the problem occurs only in the initial sync phase, where the partition is synced duplicatively because of being found in the view along with the parent in this case, that is, when publish_via_partiiton_root is true. I was saying on the other thread [1] that we should leave it up to the subscriber to decide what to do when the view (duplicatively) returns both the parent and the partition, citing the use case that a subscriber may want to replicate the parent and the partition as independent tables. Though I now tend to agree with Amit K that that may be such a meaningful and all that common use case, and the implementation on the subscriber side would have to be unnecessarily complex. So maybe it makes sense to just do what has been proposed -- de-duplicate partitions out of the pg_publication_tables view, unless we know of a bigger problem that requires us to hack the subscriber side of things too. Actually, I came to know of one such problem while thinking about this today: when you ATTACH a partition to a table that is present in a publish_via_partition_root=true publication, it doesn't get copied via the initial sync even though subsequent replication works just fine. The reason for that is that the subscriber only syncs the partitions that are known at the time when the parent is first synced, that too via the parent (as SELECT <columns..> FROM parent), and then marks the parent as sync-done. Refreshing the subscription after ATTACHing doesn't help, because the subscriber can't see any partitions to begin with. Maybe a more elaborate solution is needed for this one, though I haven't figured what it is going to look like yet. Thanks. -- Amit Langote EDB: http://www.enterprisedb.com [1] https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com
On Wed, Nov 17, 2021 at 12:15 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com wrote: > > > If we decide to disallow this case, we seem need to handle some other > > > cases as well, for example: We might also need additional check when > > > ATTACH a partition, because the partition's parent table could already > > > be published in the same publication as the partition. > > > > > > > What kind of additional checks you are envisioning and why? > > For example: > > create table tbl1 (a int) partition by range (a); > create table tbl1_part1 (like tbl1); > create table tbl1_part2 partition of tbl1 for values from (10) to (20); > create publication pub for table > tbl1, tbl1_part1 with (publish_via_partition_root=false); > > --- We might need addition check here > Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10); > > In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a > publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1' > which seems the case we want to disallow(both parent and child table in > publication). So, When ATTACH, I thought we might need to check all the parent > tables to see if any parent table is in the same publication which the table to > be attached is also belongs to. Does it make sense ? I don't think creating or attaching a partition of a table that is present in a publish_via_partition_root=false actually adds the partition to pg_publication_rel, the base catalog. A partition's membership in the publication is implicit, unless of course you add it to the publication explicitly, like all the examples we have been discussing. I guess we're only arguing about the problems with the pg_publication_tables view, which does expand the partitioned table to show the partitions that are not otherwise not present in the base catalog. -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Nov 17, 2021 at 3:09 PM Amit Langote <amitlangote09@gmail.com> wrote: > Though I now tend > to agree with Amit K that that may be such a meaningful and all that > common use case, Oops I meant: that may NOT be such... -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Nov 17, 2021 2:18 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Nov 17, 2021 at 12:15 PM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com wrote: > > > > If we decide to disallow this case, we seem need to handle some other > > > > cases as well, for example: We might also need additional check when > > > > ATTACH a partition, because the partition's parent table could already > > > > be published in the same publication as the partition. > > > > > > > > > > What kind of additional checks you are envisioning and why? > > > > For example: > > > > create table tbl1 (a int) partition by range (a); > > create table tbl1_part1 (like tbl1); > > create table tbl1_part2 partition of tbl1 for values from (10) to (20); > > create publication pub for table > > tbl1, tbl1_part1 with (publish_via_partition_root=false); > > > > --- We might need addition check here > > Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10); > > > > In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a > > publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1' > > which seems the case we want to disallow(both parent and child table in > > publication). So, When ATTACH, I thought we might need to check all the parent > > tables to see if any parent table is in the same publication which the table to > > be attached is also belongs to. Does it make sense ? > > I don't think creating or attaching a partition of a table that is > present in a publish_via_partition_root=false actually adds the > partition to pg_publication_rel, the base catalog. A partition's > membership in the publication is implicit, unless of course you add it > to the publication explicitly, like all the examples we have been > discussing. I guess we're only arguing about the problems with the > pg_publication_tables view, which does expand the partitioned table to > show the partitions that are not otherwise not present in the base > catalog. Maybe I didn't make it clear, I was trying to explain that it would be complicated if we want to completely disallow specifying both child and parent table in the publication because of the ATTACH case I gave. In other words, I think it's fine to specify both child and parent table in the publication. Best regards, Hou zj
On Wed, Nov 17, 2021 2:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Nov 17, 2021 at 12:15 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > On Wed, Nov 17, 2021 10:47 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > On Tue, Nov 16, 2021 at 7:21 AM houzj.fnst@fujitsu.com wrote:
> > > > If we decide to disallow this case, we seem need to handle some other
> > > > cases as well, for example: We might also need additional check when
> > > > ATTACH a partition, because the partition's parent table could already
> > > > be published in the same publication as the partition.
> > > >
> > >
> > > What kind of additional checks you are envisioning and why?
> >
> > For example:
> >
> > create table tbl1 (a int) partition by range (a);
> > create table tbl1_part1 (like tbl1);
> > create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > create publication pub for table
> > tbl1, tbl1_part1 with (publish_via_partition_root=false);
> >
> > --- We might need addition check here
> > Alter table tbl1 ATTACH partition tbl1_part1 for values from (1) to (10);
> >
> > In the above cases, 'tbl1_part1' is not a partition of 'tb1' when creating a
> > publication. After the ATTACH, 'tbl1_part1' would become a partition of 'tbl1'
> > which seems the case we want to disallow(both parent and child table in
> > publication). So, When ATTACH, I thought we might need to check all the parent
> > tables to see if any parent table is in the same publication which the table to
> > be attached is also belongs to. Does it make sense ?
>
> I don't think creating or attaching a partition of a table that is
> present in a publish_via_partition_root=false actually adds the
> partition to pg_publication_rel, the base catalog. A partition's
> membership in the publication is implicit, unless of course you add it
> to the publication explicitly, like all the examples we have been
> discussing. I guess we're only arguing about the problems with the
> pg_publication_tables view, which does expand the partitioned table to
> show the partitions that are not otherwise not present in the base
> catalog.
Maybe I didn't make it clear, I was trying to explain that it would be
complicated if we want to completely disallow specifying both child and parent
table in the publication because of the ATTACH case I gave. In other words, I
think it's fine to specify both child and parent table in the publication.
Ah okay. I see your point. A table that was not once a partition would be in the catalog along with the unrelated parent table and if we were to enforce the rule that a parent and a partition cannot be in the catalog at the same time, we’d need to delete that table’s catalog record on ATTACHing it as a partition.
Thanks.
Amit Langote
EDB: http://www.enterprisedb.com
EDB: http://www.enterprisedb.com
On Wed, Nov 17, 2021 at 11:39 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Tue, Nov 16, 2021 at 10:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 15, 2021 at 7:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com > > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > create table tbl1 (a int) partition by range (a); > > > > > create table tbl1_part1 partition of tbl1 for values from (1) to (10); > > > > > create table tbl1_part2 partition of tbl1 for values from (10) to (20); > > > > > create publication pub for table > > > > > tbl1, tbl1_part1 with (publish_via_partition_root=false); > > > > > > In the name of consistency, I think this situation should be an error -- > > > I mean, if we detect that the user is trying to add both the partitioned > > > table *and* its partition, then all manner of things are possibly going > > > to go wrong in some way, so my inclination is to avoid it altogether. > > > > > > Is there any reason to allow that? > > > > I think it could provide flexibility to users to later change > > "publish_via_partition_root" option. Because when that option is > > false, we use individual partitions schema to send changes and when it > > is true, we use root table's schema to send changes. Added Amit L. to > > know if he has any thoughts on this matter as he was the author of > > this work? > > FWIW, I'm not sure that adding an error in this path, that is, when a > user adds both a partitioned table and its partitions to a > publication, helps much. As for the safety of allowing it, if you > look at get_rel_sync_entry(), which handles much of the complexity of > determining whether to publish a relation's changes and the schema to > use when doing so, you may be able to see that a partition being added > duplicatively is harmless, modulo any undiscovered bugs. At least as > far as the post-initial-sync replication functionality is concerned. > > What IS problematic is what a subscriber sees in the > pg_publication_tables view and the problem occurs only in the initial > sync phase, where the partition is synced duplicatively because of > being found in the view along with the parent in this case, that is, > when publish_via_partiiton_root is true. I was saying on the other > thread [1] that we should leave it up to the subscriber to decide what > to do when the view (duplicatively) returns both the parent and the > partition, citing the use case that a subscriber may want to replicate > the parent and the partition as independent tables. Though I now tend > to agree with Amit K that that may be such a meaningful and all that > common use case, and the implementation on the subscriber side would > have to be unnecessarily complex. > > So maybe it makes sense to just do what has been proposed -- > de-duplicate partitions out of the pg_publication_tables view, > AFAICU, there are actually two problems related to pg_publication_tables view that are being discussed: (a) when 'publish_via_partition_root' is true then it returns both parent and child tables (provided both are part of publication), this can lead to duplicate data on the subscriber; the fix for this is being discussed in thread [1]. (b) when 'publish_via_partition_root' is false, then it returns duplicate entries for child tables (provided both are part of publication), this problem is being discussed in this thread. As per the proposed patches, it seems both need a separate fix, we can fix them together if we want but I am not sure. Is your understanding the same? > unless > we know of a bigger problem that requires us to hack the subscriber > side of things too. > There is yet another issue that might need subscriber side change. See the second issue summarized by Hou-San in the email[2]. I feel it is better to tackle that separately. > Actually, I came to know of one such problem > while thinking about this today: when you ATTACH a partition to a > table that is present in a publish_via_partition_root=true > publication, it doesn't get copied via the initial sync even though > subsequent replication works just fine. The reason for that is that > the subscriber only syncs the partitions that are known at the time > when the parent is first synced, that too via the parent (as SELECT > <columns..> FROM parent), and then marks the parent as sync-done. > Refreshing the subscription after ATTACHing doesn't help, because the > subscriber can't see any partitions to begin with. > Sorry, it is not clear to me how this can lead to a problem. Even if we just have the root table in the subscriber at the time of sync later shouldn't copy get the newly attached partition's data and replicate it to the required partition for "publish_via_partition_root=true" case? Anyway, if this is a problem we need to figure the solution for this separately. [1]: https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com [2]: https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
On Thu, Nov 18, 2021 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > (b) when 'publish_via_partition_root' is false, then it > returns duplicate entries for child tables (provided both are part of > publication), this problem is being discussed in this thread. +1 to just focus on fixing the initial problem proposed i.e. pg_get_publication_tables() outputting duplicate relid. IMO, all other issues (if unrelated to this problem and the patch can go to another new thread, if necessary). Regards, Bharath Rupireddy.
On Thu, Nov 18, 2021 at 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Nov 17, 2021 at 11:39 AM Amit Langote <amitlangote09@gmail.com> wrote: > > What IS problematic is what a subscriber sees in the > > pg_publication_tables view and the problem occurs only in the initial > > sync phase, where the partition is synced duplicatively because of > > being found in the view along with the parent in this case, that is, > > when publish_via_partiiton_root is true. I was saying on the other > > thread [1] that we should leave it up to the subscriber to decide what > > to do when the view (duplicatively) returns both the parent and the > > partition, citing the use case that a subscriber may want to replicate > > the parent and the partition as independent tables. Though I now tend > > to agree with Amit K that that may be such a meaningful and all that > > common use case, and the implementation on the subscriber side would > > have to be unnecessarily complex. > > > > So maybe it makes sense to just do what has been proposed -- > > de-duplicate partitions out of the pg_publication_tables view, > > AFAICU, there are actually two problems related to > pg_publication_tables view that are being discussed: (a) when > 'publish_via_partition_root' is true then it returns both parent and > child tables (provided both are part of publication), this can lead to > duplicate data on the subscriber; the fix for this is being discussed > in thread [1]. (b) when 'publish_via_partition_root' is false, then it > returns duplicate entries for child tables (provided both are part of > publication), this problem is being discussed in this thread. > > As per the proposed patches, it seems both need a separate fix, we can > fix them together if we want but I am not sure. Is your understanding > the same? Actually, I'm seeing both (a) and (b) as more or less the same thing. If publish_via_partition_root=true, we'd want to remove a partition from the list if the root parent is also present. If false, we'd want to remove a partition from the list because it'd also be added by way of expanding the root. I know that (a) causes the partition-double-sync problem that could be fixed by partition OID de-duplication as proposed in the other thread. As a comment on that, it'd be nice to see a test case added to src/test/subscription suite in that patch, because partition-double-sync is the main problem statement I'd think. Like Bharath said, I don't see (b) as much of a problem, because the subscriber applies DISTINCT when querying pg_publication_tables() anyway. Other users, if any out there, may be doing the same by following core subscription code's example. That said, I am not opposed to applying the patch being proposed here, though I guess we don't have any src/test/subscription test case for this one? As in, do we know of any replication (initial/streaming) misbehavior caused by the duplicate partition OIDs in this case or is the only problem that pg_publication_tables output looks odd? > > unless > > we know of a bigger problem that requires us to hack the subscriber > > side of things too. > > > > There is yet another issue that might need subscriber side change. See > the second issue summarized by Hou-San in the email[2]. I feel it is > better to tackle that separately. > > > Actually, I came to know of one such problem > > while thinking about this today: when you ATTACH a partition to a > > table that is present in a publish_via_partition_root=true > > publication, it doesn't get copied via the initial sync even though > > subsequent replication works just fine. The reason for that is that > > the subscriber only syncs the partitions that are known at the time > > when the parent is first synced, that too via the parent (as SELECT > > <columns..> FROM parent), and then marks the parent as sync-done. > > Refreshing the subscription after ATTACHing doesn't help, because the > > subscriber can't see any partitions to begin with. > > > > Sorry, it is not clear to me how this can lead to a problem. Even if > we just have the root table in the subscriber at the time of sync > later shouldn't copy get the newly attached partition's data and > replicate it to the required partition for > "publish_via_partition_root=true" case? Not sure if you thought otherwise but I am talking about attaching a new partition into the partition tree on the *publication side*. The problematic case is attaching the partition *after* the subscriber has already marked the root parent as synced and/or ready for replication. Refreshing the subscription doesn't help it discover the newly attached partition, because a publish_via_partition_root only ever tells about the root parent, which would be already synced, so the subscriber would think there's nothing to copy. > Anyway, if this is a problem > we need to figure the solution for this separately. Sure, we might need to do that after all. Though it might be a good idea to be sure that we won't need to reconsider the fix we push for the issue(s) being discussed here and elsewhere, because I suspect that the solution to the problem I mentioned is likely to involve tweaking pg_publication_tables view output. -- Amit Langote EDB: http://www.enterprisedb.com
On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, Nov 18, 2021 at 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > AFAICU, there are actually two problems related to > > pg_publication_tables view that are being discussed: (a) when > > 'publish_via_partition_root' is true then it returns both parent and > > child tables (provided both are part of publication), this can lead to > > duplicate data on the subscriber; the fix for this is being discussed > > in thread [1]. (b) when 'publish_via_partition_root' is false, then it > > returns duplicate entries for child tables (provided both are part of > > publication), this problem is being discussed in this thread. > > > > As per the proposed patches, it seems both need a separate fix, we can > > fix them together if we want but I am not sure. Is your understanding > > the same? > > Actually, I'm seeing both (a) and (b) as more or less the same thing. > If publish_via_partition_root=true, we'd want to remove a partition > from the list if the root parent is also present. If false, we'd want > to remove a partition from the list because it'd also be added by way > of expanding the root. > > I know that (a) causes the partition-double-sync problem that could be > fixed by partition OID de-duplication as proposed in the other thread. > As a comment on that, it'd be nice to see a test case added to > src/test/subscription suite in that patch, because > partition-double-sync is the main problem statement I'd think. > > Like Bharath said, I don't see (b) as much of a problem, because the > subscriber applies DISTINCT when querying pg_publication_tables() > anyway. Other users, if any out there, may be doing the same by > following core subscription code's example. > Hmm, I don't think we can assume that current or future users of pg_publication_tables() will use it with DISTINCT unless we specify in the specs/docs. However, I guess we might want to do it only for HEAD as there is no direct field report for this and anyway we have some workaround for this. > That said, I am not > opposed to applying the patch being proposed here, though I guess we > don't have any src/test/subscription test case for this one? > Yeah, we can add tests for this and the other case. > As in, > do we know of any replication (initial/streaming) misbehavior caused > by the duplicate partition OIDs in this case or is the only problem > that pg_publication_tables output looks odd? > The latter one but I think either we should document this or change it as we can't assume users will follow what subscriber-side code does. > > > unless > > > we know of a bigger problem that requires us to hack the subscriber > > > side of things too. > > > > > > > There is yet another issue that might need subscriber side change. See > > the second issue summarized by Hou-San in the email[2]. I feel it is > > better to tackle that separately. > > > > The problematic case is attaching the partition *after* the subscriber > has already marked the root parent as synced and/or ready for > replication. Refreshing the subscription doesn't help it discover the > newly attached partition, because a publish_via_partition_root only > ever tells about the root parent, which would be already synced, so > the subscriber would think there's nothing to copy. > Okay, I see this could be a problem but I haven't tried to reproduce it. > > Anyway, if this is a problem > > we need to figure the solution for this separately. > > Sure, we might need to do that after all. Though it might be a good > idea to be sure that we won't need to reconsider the fix we push for > the issue(s) being discussed here and elsewhere, because I suspect > that the solution to the problem I mentioned is likely to involve > tweaking pg_publication_tables view output. > Okay, so we have four known problems in the same area. The first has been reported at the start of this thread, then the two as summarized by Hou-San in his email [1] and the fourth one is the attach of partitions after initial sync as you mentioned. It makes sense to have some high-level idea on how to solve each of the problems before pushing a fix for any one particular problem as one fix could have an impact on other fixes. I'll think over it but please do let me know if you have any ideas. [1] - https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > > The problematic case is attaching the partition *after* the subscriber > > has already marked the root parent as synced and/or ready for > > replication. Refreshing the subscription doesn't help it discover the > > newly attached partition, because a publish_via_partition_root only > > ever tells about the root parent, which would be already synced, so > > the subscriber would think there's nothing to copy. > > > > Okay, I see this could be a problem but I haven't tried to reproduce it. > > > > Anyway, if this is a problem > > > we need to figure the solution for this separately. > > > > Sure, we might need to do that after all. Though it might be a good > > idea to be sure that we won't need to reconsider the fix we push for > > the issue(s) being discussed here and elsewhere, because I suspect > > that the solution to the problem I mentioned is likely to involve > > tweaking pg_publication_tables view output. > > I have thought about this problem and I see two possibilities for a solution (a) We could provide a new option say 'truncate' (something on lines proposed here [1]) which would truncate the table(s) and change its status to 'i' in the pg_subscription_rel, this would allow the newly added partition to be synced after refresh. This could lead to a large copy in such a case. (b) We could somehow get and store all the partition info from the publisher-side on the subscriber-side while initial sync (say in new system table pg_subscription_rel_members). Now, after the refresh, if this list changes, we can allow to just get the data of that particular partition but I guess it would mean that we need to store oids of the publisher which might or might not be safe considering oids can wraparound before the refresh. Do you have any other ideas? One more thing you mentioned is that the initial sync won't work after refresh but later changes will be replicated but I noticed that later changes also don't get streamed till we restart the subscriber server. I am not sure but we might not be invalidating apply workers cache due to which it didn't notice the same. [1] - https://www.postgresql.org/message-id/CF3B6672-2A43-4204-A60A-68F359218A9B%40endpoint.com -- With Regards, Amit Kapila.
On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote: > > As in, > > do we know of any replication (initial/streaming) misbehavior caused > > by the duplicate partition OIDs in this case or is the only problem > > that pg_publication_tables output looks odd? > > The latter one but I think either we should document this or change it > as we can't assume users will follow what subscriber-side code does. On second thought, I agree that de-duplicating partitions from this view is an improvement. -- Amit Langote EDB: http://www.enterprisedb.com
On Sat, Nov 20, 2021 at 8:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > The problematic case is attaching the partition *after* the subscriber > > > has already marked the root parent as synced and/or ready for > > > replication. Refreshing the subscription doesn't help it discover the > > > newly attached partition, because a publish_via_partition_root only > > > ever tells about the root parent, which would be already synced, so > > > the subscriber would think there's nothing to copy. > > > > Okay, I see this could be a problem but I haven't tried to reproduce it. > > > > > > Anyway, if this is a problem > > > > we need to figure the solution for this separately. > > > > > > Sure, we might need to do that after all. Though it might be a good > > > idea to be sure that we won't need to reconsider the fix we push for > > > the issue(s) being discussed here and elsewhere, because I suspect > > > that the solution to the problem I mentioned is likely to involve > > > tweaking pg_publication_tables view output. > > I have thought about this problem and I see two possibilities for a > solution (a) We could provide a new option say 'truncate' (something > on lines proposed here [1]) which would truncate the table(s) and > change its status to 'i' in the pg_subscription_rel, this would allow > the newly added partition to be synced after refresh. This could lead > to a large copy in such a case. Maybe I am missing something about the proposal, though I'd think a more automatic solution would be better, something that doesn't need to rely on an unrelated feature. > (b) We could somehow get and store all > the partition info from the publisher-side on the subscriber-side > while initial sync (say in new system table > pg_subscription_rel_members). Now, after the refresh, if this list > changes, we can allow to just get the data of that particular > partition but I guess it would mean that we need to store oids of the > publisher which might or might not be safe considering oids can > wraparound before the refresh. > > Do you have any other ideas? I thought that the idea I had earlier mentioned at [1] may be useful, which I can see is similar to your idea (b). I also suspect that it can be implemented without needing a separate catalog and storing publication-side relation OIDs in the subscription-side catalog, though maybe I haven't thought hard enough. > One more thing you mentioned is that the initial sync won't work after > refresh but later changes will be replicated but I noticed that later > changes also don't get streamed till we restart the subscriber server. > I am not sure but we might not be invalidating apply workers cache due > to which it didn't notice the same. Oh, that sounds odd and, as you appear to say, a separate problem. I'll check. -- Amit Langote EDB: http://www.enterprisedb.com [1] https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com
On Mon, Nov 22, 2021 at 1:45 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Sat, Nov 20, 2021 at 8:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > > The problematic case is attaching the partition *after* the subscriber > > > > has already marked the root parent as synced and/or ready for > > > > replication. Refreshing the subscription doesn't help it discover the > > > > newly attached partition, because a publish_via_partition_root only > > > > ever tells about the root parent, which would be already synced, so > > > > the subscriber would think there's nothing to copy. > > > > > > Okay, I see this could be a problem but I haven't tried to reproduce it. > > > > > > > > Anyway, if this is a problem > > > > > we need to figure the solution for this separately. > > > > > > > > Sure, we might need to do that after all. Though it might be a good > > > > idea to be sure that we won't need to reconsider the fix we push for > > > > the issue(s) being discussed here and elsewhere, because I suspect > > > > that the solution to the problem I mentioned is likely to involve > > > > tweaking pg_publication_tables view output. > > > > I have thought about this problem and I see two possibilities for a > > solution (a) We could provide a new option say 'truncate' (something > > on lines proposed here [1]) which would truncate the table(s) and > > change its status to 'i' in the pg_subscription_rel, this would allow > > the newly added partition to be synced after refresh. This could lead > > to a large copy in such a case. > > Maybe I am missing something about the proposal, though I'd think a > more automatic solution would be better, something that doesn't need > to rely on an unrelated feature. > Agreed, this was more of a workaround for users if we didn't get any automatic solution. > > (b) We could somehow get and store all > > the partition info from the publisher-side on the subscriber-side > > while initial sync (say in new system table > > pg_subscription_rel_members). Now, after the refresh, if this list > > changes, we can allow to just get the data of that particular > > partition but I guess it would mean that we need to store oids of the > > publisher which might or might not be safe considering oids can > > wraparound before the refresh. > > > > Do you have any other ideas? > > I thought that the idea I had earlier mentioned at [1] may be useful, > which I can see is similar to your idea (b). I also suspect that it > can be implemented without needing a separate catalog and storing > publication-side relation OIDs in the subscription-side catalog, > though maybe I haven't thought hard enough. > The problem with storing this info in pg_subscription_rel as you were describing in your proposal is that currently, we ensure that the same table exists in subscriber and then store the subscriber side table id in that catalog. I am not sure if we can store publisher-side oids in that catalog and if we store then it would be confusing as now it will have info of both publisher-side oids and subscriber-side oids. Isn't that a problem with this approach? One more problem with this overall approach is the risk of OID wraparound. Say between Create Subscription and Alter Subscription .. Refresh, we detach one partition, the oid wraparounds, and we create/reattach another partition which gets the same oid as for the partition which we detached earlier then we won't be able to identify the new partition even after refresh. -- With Regards, Amit Kapila.
On Mon, Nov 22, 2021 6:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Nov 22, 2021 at 1:45 PM Amit Langote <amitlangote09@gmail.com> > wrote: > > > > On Sat, Nov 20, 2021 at 8:31 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote > <amitlangote09@gmail.com> wrote: > > > > > The problematic case is attaching the partition *after* the > > > > > subscriber has already marked the root parent as synced and/or > > > > > ready for replication. Refreshing the subscription doesn't help > > > > > it discover the newly attached partition, because a > > > > > publish_via_partition_root only ever tells about the root > > > > > parent, which would be already synced, so the subscriber would think > > > > > there's nothing to copy. > > > > > > > > Okay, I see this could be a problem but I haven't tried to reproduce it. > > > > > > > > > > Anyway, if this is a problem > > > > > > we need to figure the solution for this separately. > > > > > > > > > > Sure, we might need to do that after all. Though it might be a > > > > > good idea to be sure that we won't need to reconsider the fix we > > > > > push for the issue(s) being discussed here and elsewhere, > > > > > because I suspect that the solution to the problem I mentioned > > > > > is likely to involve tweaking pg_publication_tables view output. > > > > > > I have thought about this problem and I see two possibilities for a > > > solution (a) We could provide a new option say 'truncate' (something > > > on lines proposed here [1]) which would truncate the table(s) and > > > change its status to 'i' in the pg_subscription_rel, this would > > > allow the newly added partition to be synced after refresh. This > > > could lead to a large copy in such a case. > > > > Maybe I am missing something about the proposal, though I'd think a > > more automatic solution would be better, something that doesn't need > > to rely on an unrelated feature. > > > > Agreed, this was more of a workaround for users if we didn't get any automatic > solution. > > > > (b) We could somehow get and store all the partition info from the > > > publisher-side on the subscriber-side while initial sync (say in new > > > system table pg_subscription_rel_members). Now, after the refresh, > > > if this list changes, we can allow to just get the data of that > > > particular partition but I guess it would mean that we need to store > > > oids of the publisher which might or might not be safe considering > > > oids can wraparound before the refresh. > > > > > > Do you have any other ideas? > > > > I thought that the idea I had earlier mentioned at [1] may be useful, > > which I can see is similar to your idea (b). I also suspect that it > > can be implemented without needing a separate catalog and storing > > publication-side relation OIDs in the subscription-side catalog, > > though maybe I haven't thought hard enough. > > > > The problem with storing this info in pg_subscription_rel as you were > describing in your proposal is that currently, we ensure that the same table > exists in subscriber and then store the subscriber side table id in that catalog. I > am not sure if we can store publisher-side oids in that catalog and if we store > then it would be confusing as now it will have info of both publisher-side oids > and subscriber-side oids. Isn't that a problem with this approach? > > One more problem with this overall approach is the risk of OID wraparound. > Say between Create Subscription and Alter Subscription .. > Refresh, we detach one partition, the oid wraparounds, and we create/reattach > another partition which gets the same oid as for the partition which we > detached earlier then we won't be able to identify the new partition even after > refresh. If we use this approach, I think maybe we can store the publisher-side's table name instead of oid. Because for non-partitioned table, it's possible that user drop and create a new table with the same name in which case the oid would be changed. And in this case, the existing behavior wouldn't sync the table again. So I think it might be better to make the partitioned table's behavior consistent with the non-partitioned table which only sync the table's data when detect new table name. Best regards, Hou zj
On Tue, Nov 23, 2021 at 7:28 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > On Mon, Nov 22, 2021 6:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 22, 2021 at 1:45 PM Amit Langote <amitlangote09@gmail.com> > > wrote: > > > > > > On Sat, Nov 20, 2021 at 8:31 PM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote > > <amitlangote09@gmail.com> wrote: > > > > > > The problematic case is attaching the partition *after* the > > > > > > subscriber has already marked the root parent as synced and/or > > > > > > ready for replication. Refreshing the subscription doesn't help > > > > > > it discover the newly attached partition, because a > > > > > > publish_via_partition_root only ever tells about the root > > > > > > parent, which would be already synced, so the subscriber would think > > > > > > there's nothing to copy. > > > > > > > > > > Okay, I see this could be a problem but I haven't tried to reproduce it. > > > > > > > > > > > > Anyway, if this is a problem > > > > > > > we need to figure the solution for this separately. > > > > > > > > > > > > Sure, we might need to do that after all. Though it might be a > > > > > > good idea to be sure that we won't need to reconsider the fix we > > > > > > push for the issue(s) being discussed here and elsewhere, > > > > > > because I suspect that the solution to the problem I mentioned > > > > > > is likely to involve tweaking pg_publication_tables view output. > > > > > > > > I have thought about this problem and I see two possibilities for a > > > > solution (a) We could provide a new option say 'truncate' (something > > > > on lines proposed here [1]) which would truncate the table(s) and > > > > change its status to 'i' in the pg_subscription_rel, this would > > > > allow the newly added partition to be synced after refresh. This > > > > could lead to a large copy in such a case. > > > > > > Maybe I am missing something about the proposal, though I'd think a > > > more automatic solution would be better, something that doesn't need > > > to rely on an unrelated feature. > > > > > > > Agreed, this was more of a workaround for users if we didn't get any automatic > > solution. > > > > > > (b) We could somehow get and store all the partition info from the > > > > publisher-side on the subscriber-side while initial sync (say in new > > > > system table pg_subscription_rel_members). Now, after the refresh, > > > > if this list changes, we can allow to just get the data of that > > > > particular partition but I guess it would mean that we need to store > > > > oids of the publisher which might or might not be safe considering > > > > oids can wraparound before the refresh. > > > > > > > > Do you have any other ideas? > > > > > > I thought that the idea I had earlier mentioned at [1] may be useful, > > > which I can see is similar to your idea (b). I also suspect that it > > > can be implemented without needing a separate catalog and storing > > > publication-side relation OIDs in the subscription-side catalog, > > > though maybe I haven't thought hard enough. > > > > > > > The problem with storing this info in pg_subscription_rel as you were > > describing in your proposal is that currently, we ensure that the same table > > exists in subscriber and then store the subscriber side table id in that catalog. I > > am not sure if we can store publisher-side oids in that catalog and if we store > > then it would be confusing as now it will have info of both publisher-side oids > > and subscriber-side oids. Isn't that a problem with this approach? > > > > One more problem with this overall approach is the risk of OID wraparound. > > Say between Create Subscription and Alter Subscription .. > > Refresh, we detach one partition, the oid wraparounds, and we create/reattach > > another partition which gets the same oid as for the partition which we > > detached earlier then we won't be able to identify the new partition even after > > refresh. > > If we use this approach, I think maybe we can store the publisher-side's table > name instead of oid. Because for non-partitioned table, it's possible that user > drop and create a new table with the same name in which case the oid would be > changed. And in this case, the existing behavior wouldn't sync the table again. > So I think it might be better to make the partitioned table's behavior > consistent with the non-partitioned table which only sync the table's data when > detect new table name. > I think firstly you need to then also store schema info. In the case of non-partitioned tables, it is derived differently because we remove the table's entry from pg_publication_rel, so the data won't be sent to subscriber till the new table is added to the publication. In the case of partitioned table (especially for the case of publish_via_partition_root) we won't sync initial data in such a case (dropped and re-created the partition with the same name) both with and without your solution but the future data will be streamed. Another point related to the Attach problem is what if such a partition before was also part of publication separately, then if we copy its initial data then that data copy might be considered a duplicate. Isn't it better to document this case and explain what users can expect instead of trying to design a solution around this? Even if we do so the streaming after attach partition problem as discussed above should be fixed. -- With Regards, Amit Kapila.
On Tue, Nov 23, 2021 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > Isn't it better to document this case and explain what > users can expect instead of trying to design a solution around this? I thought about the problems you've described and it looks like I hadn't considered many of the details which complicate implementing a solution for this. So yeah, documenting the ATTACH issue as a limitation sounds like the best course for now. I might word it as follows and add it under Notes at https://www.postgresql.org/docs/current/sql-createpublication.html: "ATTACHing a table into a partition tree whose root is published using a publication with publish_via_partition_root set to true does not result in the table's existing contents to be replicated." I'm not sure there's a clean enough workaround to this that we can add to the paragraph. Does that make sense? > Even if we do so the streaming after attach partition problem as > discussed above should be fixed. I agree. I have reproduced the problem though haven't managed to pin down the cause yet. -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Nov 24, 2021 at 12:02 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Tue, Nov 23, 2021 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Isn't it better to document this case and explain what > > users can expect instead of trying to design a solution around this? > > I thought about the problems you've described and it looks like I > hadn't considered many of the details which complicate implementing a > solution for this. > > So yeah, documenting the ATTACH issue as a limitation sounds like the > best course for now. I might word it as follows and add it under > Notes at https://www.postgresql.org/docs/current/sql-createpublication.html: > > "ATTACHing a table into a partition tree whose root is published using > a publication with publish_via_partition_root set to true does not > result in the table's existing contents to be replicated." > Instead of "to be", shall we use "being"? > I'm not sure there's a clean enough workaround to this that we can add > to the paragraph. > > Does that make sense? > Yeah, that sounds reasonable. I think the case with "publish_via_partition_root = false" should work but please test it once. The other thing which we need to consider about all these fixes is whether to back patch them or not. I think it is on case to case basis. I feel this limitation should be documented and backpatched, what do you think? Feel free to submit patches accordingly. > > Even if we do so the streaming after attach partition problem as > > discussed above should be fixed. > > I agree. I have reproduced the problem though haven't managed to pin > down the cause yet. > No problem, do let us know whatever be your findings at the end. I think now we know how to proceed with this attach issue, maybe we should also try to commit the fixes for other issues. -- With Regards, Amit Kapila.
On Mon, Nov 22, 2021 at 12:55 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > As in, > > > do we know of any replication (initial/streaming) misbehavior caused > > > by the duplicate partition OIDs in this case or is the only problem > > > that pg_publication_tables output looks odd? > > > > The latter one but I think either we should document this or change it > > as we can't assume users will follow what subscriber-side code does. > > On second thought, I agree that de-duplicating partitions from this > view is an improvement. > Fair enough. Hou-San, Can you please submit the updated patch after fixing any pending comments including the test case? -- With Regards, Amit Kapila.
On Wed, Nov 24, 2021 4:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote <amitlangote09@gmail.com> > wrote: > > > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> > wrote: > > > > As in, > > > > do we know of any replication (initial/streaming) misbehavior > > > > caused by the duplicate partition OIDs in this case or is the only > > > > problem that pg_publication_tables output looks odd? > > > > > > The latter one but I think either we should document this or change > > > it as we can't assume users will follow what subscriber-side code does. > > > > On second thought, I agree that de-duplicating partitions from this > > view is an improvement. > > > > Fair enough. Hou-San, Can you please submit the updated patch after fixing > any pending comments including the test case? OK, I will submit the updated patch soon. Best regards, Hou zj
On Wed, Nov 24, 2021 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Nov 24, 2021 at 12:02 PM Amit Langote <amitlangote09@gmail.com> wrote: > > So yeah, documenting the ATTACH issue as a limitation sounds like the > > best course for now. I might word it as follows and add it under > > Notes at https://www.postgresql.org/docs/current/sql-createpublication.html: > > > > "ATTACHing a table into a partition tree whose root is published using > > a publication with publish_via_partition_root set to true does not > > result in the table's existing contents to be replicated." > > Instead of "to be", shall we use "being"? That reads better. > > I'm not sure there's a clean enough workaround to this that we can add > > to the paragraph. > > > > Does that make sense? > > > > Yeah, that sounds reasonable. I think the case with > "publish_via_partition_root = false" should work but please test it > once. Yeah, I did test that case to compare and didn't see a problem with it. > The other thing which we need to consider about all these fixes > is whether to back patch them or not. I think it is on case to case > basis. I feel this limitation should be documented and backpatched, > what do you think? Feel free to submit patches accordingly. I agree with backpatching the doc fix. I've attached a diff against master, though it also appears to apply to 13 and 14 branches. -- Amit Langote EDB: http://www.enterprisedb.com
Attachment
On Thu, Nov 25, 2021 at 1:30 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Wed, Nov 24, 2021 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 24, 2021 at 12:02 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > So yeah, documenting the ATTACH issue as a limitation sounds like the > > > best course for now. I might word it as follows and add it under > > > Notes at https://www.postgresql.org/docs/current/sql-createpublication.html: > > > > > > "ATTACHing a table into a partition tree whose root is published using > > > a publication with publish_via_partition_root set to true does not > > > result in the table's existing contents to be replicated." > > > > Instead of "to be", shall we use "being"? > > That reads better. > > > > I'm not sure there's a clean enough workaround to this that we can add > > > to the paragraph. > > > > > I agree with backpatching the doc fix. I've attached a diff against > master, though it also appears to apply to 13 and 14 branches. > I think we can <literal></literal> for publish_via_partition_root, other than that the patch looks good to me. Hou-San, others, do you have any opinion about this patch and whether to backpatch it or not? -- With Regards, Amit Kapila.
On Thu, Nov 25, 2021 at 5:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Nov 25, 2021 at 1:30 PM Amit Langote <amitlangote09@gmail.com> wrote: > > I agree with backpatching the doc fix. I've attached a diff against > > master, though it also appears to apply to 13 and 14 branches. > > I think we can <literal></literal> for publish_via_partition_root, > other than that the patch looks good to me. Oh right, fixed. I also added the tag for "true" in the same sentence. -- Amit Langote EDB: http://www.enterprisedb.com
Attachment
On Thursday, November 25, 2021 4:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Nov 25, 2021 at 1:30 PM Amit Langote <amitlangote09@gmail.com> > > > > I agree with backpatching the doc fix. I've attached a diff against > > master, though it also appears to apply to 13 and 14 branches. > > > > I think we can <literal></literal> for publish_via_partition_root, other than that > the patch looks good to me. > > Hou-San, others, do you have any opinion about this patch and whether to > backpatch it or not? I think it makes sense to backpatch the doc fix, and the patch looks good to me. Best regards, Hou zj
On Fri, Nov 26, 2021 at 7:10 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > On Thursday, November 25, 2021 4:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 25, 2021 at 1:30 PM Amit Langote <amitlangote09@gmail.com> > > > > > > I agree with backpatching the doc fix. I've attached a diff against > > > master, though it also appears to apply to 13 and 14 branches. > > > > > > > I think we can <literal></literal> for publish_via_partition_root, other than that > > the patch looks good to me. > > > > Hou-San, others, do you have any opinion about this patch and whether to > > backpatch it or not? > > I think it makes sense to backpatch the doc fix, and the patch looks good to me. > Thanks, I'll push this sometime early next week unless there are any objections to it. -- With Regards, Amit Kapila.
On Wed, Nov 24, 2021 4:48 PM Amit Kapila <amit.kapila16@gmail.com> > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote <amitlangote09@gmail.com> > wrote: > > > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> > wrote: > > > > As in, > > > > do we know of any replication (initial/streaming) misbehavior > > > > caused by the duplicate partition OIDs in this case or is the only > > > > problem that pg_publication_tables output looks odd? > > > > > > The latter one but I think either we should document this or change > > > it as we can't assume users will follow what subscriber-side code does. > > > > On second thought, I agree that de-duplicating partitions from this > > view is an improvement. > > > > Fair enough. Hou-San, Can you please submit the updated patch after fixing > any pending comments including the test case? Attach the updated patch which address the comments so far. The patch only adds a testcase in publication.sql because the duplicate output doesn't cause unexpected behavior in pub-sub test. Best regards, Hou zj
Attachment
On Fri, Nov 26, 2021 at 8:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 26, 2021 at 7:10 AM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > > > On Thursday, November 25, 2021 4:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > On Thu, Nov 25, 2021 at 1:30 PM Amit Langote <amitlangote09@gmail.com> > > > > > > > > I agree with backpatching the doc fix. I've attached a diff against > > > > master, though it also appears to apply to 13 and 14 branches. > > > > > > > > > > I think we can <literal></literal> for publish_via_partition_root, other than that > > > the patch looks good to me. > > > > > > Hou-San, others, do you have any opinion about this patch and whether to > > > backpatch it or not? > > > > I think it makes sense to backpatch the doc fix, and the patch looks good to me. > > > > Thanks, I'll push this sometime early next week unless there are any > objections to it. > Pushed. -- With Regards, Amit Kapila.
On Mon, Nov 29, 2021 at 2:37 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > On Wed, Nov 24, 2021 4:48 PM Amit Kapila <amit.kapila16@gmail.com> > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote <amitlangote09@gmail.com> > > wrote: > > > > > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> > > wrote: > > > > > As in, > > > > > do we know of any replication (initial/streaming) misbehavior > > > > > caused by the duplicate partition OIDs in this case or is the only > > > > > problem that pg_publication_tables output looks odd? > > > > > > > > The latter one but I think either we should document this or change > > > > it as we can't assume users will follow what subscriber-side code does. > > > > > > On second thought, I agree that de-duplicating partitions from this > > > view is an improvement. > > > > > > > Fair enough. Hou-San, Can you please submit the updated patch after fixing > > any pending comments including the test case? > > Attach the updated patch which address the comments so far. > > The patch only adds a testcase in publication.sql because the duplicate > output doesn't cause unexpected behavior in pub-sub test. > Thanks, the patch looks good to me. I have slightly changed the commit message in the attached. I would like to commit this only in HEAD as there is no user complaint about this and it might not stop any user's service unless it relies on this view's output for the initial table synchronization. What do you think? Bharath/Amit L., others, do you have any opinion on this matter? -- With Regards, Amit Kapila.
Attachment
On Wed, Dec 1, 2021 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Nov 29, 2021 at 2:37 PM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > > > On Wed, Nov 24, 2021 4:48 PM Amit Kapila <amit.kapila16@gmail.com> > > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote > > > <amitlangote09@gmail.com> > > > wrote: > > > > > > > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila > > > > <amit.kapila16@gmail.com> > > > wrote: > > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote > > > > > <amitlangote09@gmail.com> > > > wrote: > > > > > > As in, > > > > > > do we know of any replication (initial/streaming) misbehavior > > > > > > caused by the duplicate partition OIDs in this case or is the > > > > > > only problem that pg_publication_tables output looks odd? > > > > > > > > > > The latter one but I think either we should document this or > > > > > change it as we can't assume users will follow what subscriber-side > > > > > code does. > > > > > > > > On second thought, I agree that de-duplicating partitions from > > > > this view is an improvement. > > > > > > > > > > Fair enough. Hou-San, Can you please submit the updated patch after > > > fixing any pending comments including the test case? > > > > Attach the updated patch which address the comments so far. > > > > The patch only adds a testcase in publication.sql because the > > duplicate output doesn't cause unexpected behavior in pub-sub test. > > > > Thanks, the patch looks good to me. I have slightly changed the commit > message in the attached. I would like to commit this only in HEAD as there is no > user complaint about this and it might not stop any user's service unless it relies > on this view's output for the initial table synchronization. > > What do you think? I agreed that we can commit only in HEAD. Best regards, Hou zj
On Thu, Dec 2, 2021 at 9:44 houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > On Wed, Dec 1, 2021 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote > > > > <amitlangote09@gmail.com> wrote: > > > > > On second thought, I agree that de-duplicating partitions from > > > > > this view is an improvement. > > > > > > > > Fair enough. Hou-San, Can you please submit the updated patch after > > > > fixing any pending comments including the test case? > > > > > > Attach the updated patch which address the comments so far. > > > > > > The patch only adds a testcase in publication.sql because the > > > duplicate output doesn't cause unexpected behavior in pub-sub test. > > > > Thanks, the patch looks good to me. I have slightly changed the commit > > message in the attached. I would like to commit this only in HEAD as there is no > > user complaint about this and it might not stop any user's service unless it relies > > on this view's output for the initial table synchronization. The patch looks good to me too in that it gets the job done. Reading Alvaro's email at the top again gave me a pause to reconsider what I had said in reply. It might indeed have been nice if the publication DDL itself had prevented the cases where a partition and its ancestor are added to a publication, though as Hou-san mentioned, partition ATTACHes make that a bit tricky to enforce at all times, though of course not impossible. Maybe it's okay to just de-duplicate pg_publication_tables output as the patch does, even though it may appear to be a band-aid solution if we one considers Alvaro's point about consistency. Sorry about too many second thoughts on these threads. :( > > What do you think? > I agreed that we can commit only in HEAD. +1 to do this only in HEAD. Thank you. -- Amit Langote EDB: http://www.enterprisedb.com
On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, Dec 2, 2021 at 9:44 houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > On Wed, Dec 1, 2021 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote > > > > > <amitlangote09@gmail.com> wrote: > > > > > > On second thought, I agree that de-duplicating partitions from > > > > > > this view is an improvement. > > > > > > > > > > Fair enough. Hou-San, Can you please submit the updated patch after > > > > > fixing any pending comments including the test case? > > > > > > > > Attach the updated patch which address the comments so far. > > > > > > > > The patch only adds a testcase in publication.sql because the > > > > duplicate output doesn't cause unexpected behavior in pub-sub test. > > > > > > Thanks, the patch looks good to me. I have slightly changed the commit > > > message in the attached. I would like to commit this only in HEAD as there is no > > > user complaint about this and it might not stop any user's service unless it relies > > > on this view's output for the initial table synchronization. > > The patch looks good to me too in that it gets the job done. > > Reading Alvaro's email at the top again gave me a pause to reconsider > what I had said in reply. It might indeed have been nice if the > publication DDL itself had prevented the cases where a partition and > its ancestor are added to a publication, though as Hou-san mentioned, > partition ATTACHes make that a bit tricky to enforce at all times, > though of course not impossible. Maybe it's okay to just de-duplicate > pg_publication_tables output as the patch does, even though it may > appear to be a band-aid solution if we one considers Alvaro's point > about consistency. > True, I think even if we consider that idea it will be a much bigger change, and also as it will be a behavioral change so we might want to keep it just for HEAD and some of these fixes need to be backpatched. Having said that, if you or someone want to pursue that idea and come up with a better solution than what we have currently it is certainly welcome. > > > What do you think? > > I agreed that we can commit only in HEAD. > > +1 to do this only in HEAD. > Okay, If there are no further suggestions or objections, I'll commit the patch early next week (mostly by Monday). -- With Regards, Amit Kapila.
On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote: > > Reading Alvaro's email at the top again gave me a pause to reconsider > > what I had said in reply. It might indeed have been nice if the > > publication DDL itself had prevented the cases where a partition and > > its ancestor are added to a publication, though as Hou-san mentioned, > > partition ATTACHes make that a bit tricky to enforce at all times, > > though of course not impossible. Maybe it's okay to just de-duplicate > > pg_publication_tables output as the patch does, even though it may > > appear to be a band-aid solution if we one considers Alvaro's point > > about consistency. > > True, I think even if we consider that idea it will be a much bigger > change, and also as it will be a behavioral change so we might want to > keep it just for HEAD and some of these fixes need to be backpatched. > Having said that, if you or someone want to pursue that idea and come > up with a better solution than what we have currently it is certainly > welcome. Okay, I did write a PoC patch this morning after sending out my earlier email. I polished it a bit, which is attached. -- Amit Langote EDB: http://www.enterprisedb.com
Attachment
On Thu, Dec 2, 2021 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > Reading Alvaro's email at the top again gave me a pause to reconsider > > > what I had said in reply. It might indeed have been nice if the > > > publication DDL itself had prevented the cases where a partition and > > > its ancestor are added to a publication, though as Hou-san mentioned, > > > partition ATTACHes make that a bit tricky to enforce at all times, > > > though of course not impossible. Maybe it's okay to just de-duplicate > > > pg_publication_tables output as the patch does, even though it may > > > appear to be a band-aid solution if we one considers Alvaro's point > > > about consistency. > > > > True, I think even if we consider that idea it will be a much bigger > > change, and also as it will be a behavioral change so we might want to > > keep it just for HEAD and some of these fixes need to be backpatched. > > Having said that, if you or someone want to pursue that idea and come > > up with a better solution than what we have currently it is certainly > > welcome. > > Okay, I did write a PoC patch this morning after sending out my > earlier email. I polished it a bit, which is attached. > I see multiple problems with this patch and idea. (a) I think you forgot to deal with "All Tables In Schema" Publication which will be quite tricky to deal with during attach operation. How will you remove a particular relation from such a publication if there is a need to do so? (b) Also, how will we prohibit adding partition and its root in the case of "All Tables In Schema" or "All Tables" publication? If there is no way to do that, then that would mean we anyway need to deal with parent and child tables as part of the same publication and this new behavior will add an exception to it. (c) I think this can make switching "publish_via_partition_root" inconvenient for users. Say all or some of the partitions are part of the publication and then the user decides to add root table and change publish option "publish_via_partition_root" to "true" at that moment she needs to remove all partitions first. -- With Regards, Amit Kapila.
On Fri, Dec 3, 2021 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Dec 2, 2021 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > > Reading Alvaro's email at the top again gave me a pause to reconsider > > > > what I had said in reply. It might indeed have been nice if the > > > > publication DDL itself had prevented the cases where a partition and > > > > its ancestor are added to a publication, though as Hou-san mentioned, > > > > partition ATTACHes make that a bit tricky to enforce at all times, > > > > though of course not impossible. Maybe it's okay to just de-duplicate > > > > pg_publication_tables output as the patch does, even though it may > > > > appear to be a band-aid solution if we one considers Alvaro's point > > > > about consistency. > > > > > > True, I think even if we consider that idea it will be a much bigger > > > change, and also as it will be a behavioral change so we might want to > > > keep it just for HEAD and some of these fixes need to be backpatched. > > > Having said that, if you or someone want to pursue that idea and come > > > up with a better solution than what we have currently it is certainly > > > welcome. > > > > Okay, I did write a PoC patch this morning after sending out my > > earlier email. I polished it a bit, which is attached. > > I see multiple problems with this patch and idea. Thanks for looking at it. Yeah, I have not looked very closely at ALL TABLES [IN SCHEMA], though only because I suspected that those cases deal with partitioning in such a way that the partition duplication issue doesn't arise. That is, only the FOR TABLE list_of_tables and ADD TABLE syntax allow for the duplication issue to occur. > (a) I think you > forgot to deal with "All Tables In Schema" Publication which will be > quite tricky to deal with during attach operation. How will you remove > a particular relation from such a publication if there is a need to do > so? Hmm, my understanding of how FOR ALL TABLES... features work is that one cannot remove a particular relation from such publications? create schema sch; create table sch.p (a int primary key) partition by list (a); create table sch.p1 partition of sch.p for values in (1); create table sch.p2 partition of sch.p for values in (2); create table p (a int primary key) partition by list (a); create table p1 partition of p for values in (1); create table p2 partition of p for values in (2); create publication puball for all tables; create publication pubsch for all tables in schema sch; alter publication puball drop table p; ERROR: publication "puball" is defined as FOR ALL TABLES DETAIL: Tables cannot be added to or dropped from FOR ALL TABLES publications. alter publication pubsch drop table sch.p; ERROR: relation "p" is not part of the publication What am I missing? > (b) Also, how will we prohibit adding partition and its root in > the case of "All Tables In Schema" or "All Tables" publication? If > there is no way to do that, then that would mean we anyway need to > deal with parent and child tables as part of the same publication and > this new behavior will add an exception to it. I checked that FOR ALL TABLES publications don't allow adding a table explicitly, but apparently IN SCHEMA one does: alter publication pubsch add table p2; \dRp+ pubsch Publication pubsch Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root -------+------------+---------+---------+---------+-----------+---------- amit | f | t | t | t | t | f Tables: "public.p2" Tables from schemas: "sch" ISTM that the ..IN SCHEMA syntax does not behave like FOR ALL TABLES without the IN SCHEMA in this regard. Is that correct? > (c) I think this can > make switching "publish_via_partition_root" inconvenient for users. > Say all or some of the partitions are part of the publication and then > the user decides to add root table and change publish option > "publish_via_partition_root" to "true" at that moment she needs to > remove all partitions first. Hmm, actually I don't think I have tested the case where some or all partitions are present by themselves before adding the root table. I'll check that and try to think of sensible behavior for that. -- Amit Langote EDB: http://www.enterprisedb.com
On Fri, Dec 3, 2021 at 21:34 Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Dec 3, 2021 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Dec 2, 2021 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > Reading Alvaro's email at the top again gave me a pause to reconsider
> > > > what I had said in reply. It might indeed have been nice if the
> > > > publication DDL itself had prevented the cases where a partition and
> > > > its ancestor are added to a publication, though as Hou-san mentioned,
> > > > partition ATTACHes make that a bit tricky to enforce at all times,
> > > > though of course not impossible. Maybe it's okay to just de-duplicate
> > > > pg_publication_tables output as the patch does, even though it may
> > > > appear to be a band-aid solution if we one considers Alvaro's point
> > > > about consistency.
> > >
> > > True, I think even if we consider that idea it will be a much bigger
> > > change, and also as it will be a behavioral change so we might want to
> > > keep it just for HEAD and some of these fixes need to be backpatched.
> > > Having said that, if you or someone want to pursue that idea and come
> > > up with a better solution than what we have currently it is certainly
> > > welcome.
> >
> > Okay, I did write a PoC patch this morning after sending out my
> > earlier email. I polished it a bit, which is attached.
>
> I see multiple problems with this patch and idea.
Thanks for looking at it. Yeah, I have not looked very closely at ALL
TABLES [IN SCHEMA], though only because I suspected that those cases
deal with partitioning in such a way that the partition duplication
issue doesn't arise. That is, only the FOR TABLE list_of_tables and
ADD TABLE syntax allow for the duplication issue to occur.
Another thing I forgot to mention is that the patch passes check-world. Perhaps we don’t have enough tests that would’ve exposed any problems with the patch’s approach.
Amit Langote
EDB: http://www.enterprisedb.com
EDB: http://www.enterprisedb.com
On Fri, Dec 3, 2021 at 6:04 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Fri, Dec 3, 2021 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 2, 2021 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > > > Reading Alvaro's email at the top again gave me a pause to reconsider > > > > > what I had said in reply. It might indeed have been nice if the > > > > > publication DDL itself had prevented the cases where a partition and > > > > > its ancestor are added to a publication, though as Hou-san mentioned, > > > > > partition ATTACHes make that a bit tricky to enforce at all times, > > > > > though of course not impossible. Maybe it's okay to just de-duplicate > > > > > pg_publication_tables output as the patch does, even though it may > > > > > appear to be a band-aid solution if we one considers Alvaro's point > > > > > about consistency. > > > > > > > > True, I think even if we consider that idea it will be a much bigger > > > > change, and also as it will be a behavioral change so we might want to > > > > keep it just for HEAD and some of these fixes need to be backpatched. > > > > Having said that, if you or someone want to pursue that idea and come > > > > up with a better solution than what we have currently it is certainly > > > > welcome. > > > > > > Okay, I did write a PoC patch this morning after sending out my > > > earlier email. I polished it a bit, which is attached. > > > > I see multiple problems with this patch and idea. > > Thanks for looking at it. Yeah, I have not looked very closely at ALL > TABLES [IN SCHEMA], though only because I suspected that those cases > deal with partitioning in such a way that the partition duplication > issue doesn't arise. That is, only the FOR TABLE list_of_tables and > ADD TABLE syntax allow for the duplication issue to occur. > > > (a) I think you > > forgot to deal with "All Tables In Schema" Publication which will be > > quite tricky to deal with during attach operation. How will you remove > > a particular relation from such a publication if there is a need to do > > so? > > Hmm, my understanding of how FOR ALL TABLES... features work is that > one cannot remove a particular relation from such publications? > > create schema sch; > create table sch.p (a int primary key) partition by list (a); > create table sch.p1 partition of sch.p for values in (1); > create table sch.p2 partition of sch.p for values in (2); > create table p (a int primary key) partition by list (a); > create table p1 partition of p for values in (1); > create table p2 partition of p for values in (2); > create publication puball for all tables; > create publication pubsch for all tables in schema sch; > > alter publication puball drop table p; > ERROR: publication "puball" is defined as FOR ALL TABLES > DETAIL: Tables cannot be added to or dropped from FOR ALL TABLES publications. > > alter publication pubsch drop table sch.p; > ERROR: relation "p" is not part of the publication > > What am I missing? > Currently, in your patch, you are trying to remove a particular relation/partition during attach but how will you do that if such a relation is part of All Tables In Schema publication? > > (b) Also, how will we prohibit adding partition and its root in > > the case of "All Tables In Schema" or "All Tables" publication? If > > there is no way to do that, then that would mean we anyway need to > > deal with parent and child tables as part of the same publication and > > this new behavior will add an exception to it. > > I checked that FOR ALL TABLES publications don't allow adding a table > explicitly, but apparently IN SCHEMA one does: > > alter publication pubsch add table p2; > \dRp+ pubsch > Publication pubsch > Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root > -------+------------+---------+---------+---------+-----------+---------- > amit | f | t | t | t | t | f > Tables: > "public.p2" > Tables from schemas: > "sch" > > ISTM that the ..IN SCHEMA syntax does not behave like FOR ALL TABLES > without the IN SCHEMA in this regard. Is that correct? > We do allow adding additional tables or schema to an exiting All tables in schema publication. -- With Regards, Amit Kapila.
On Thursday, December 2, 2021 9:48 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> > wrote: > > > Reading Alvaro's email at the top again gave me a pause to > > > reconsider what I had said in reply. It might indeed have been nice > > > if the publication DDL itself had prevented the cases where a > > > partition and its ancestor are added to a publication, though as > > > Hou-san mentioned, partition ATTACHes make that a bit tricky to > > > enforce at all times, though of course not impossible. Maybe it's > > > okay to just de-duplicate pg_publication_tables output as the patch > > > does, even though it may appear to be a band-aid solution if we one > > > considers Alvaro's point about consistency. > > > > True, I think even if we consider that idea it will be a much bigger > > change, and also as it will be a behavioral change so we might want to > > keep it just for HEAD and some of these fixes need to be backpatched. > > Having said that, if you or someone want to pursue that idea and come > > up with a better solution than what we have currently it is certainly > > welcome. > > Okay, I did write a PoC patch this morning after sending out my earlier email. I > polished it a bit, which is attached. After thinking more on this. I find there might be some usage about adding both child and parent to the publication. For the future feature publication row filter(and maybe column filter), It could be useful for user to adding child and parent with different filter expression. If pubviaroot=true, user can expect the parent's filter take effect, If pubviaroot=false, they can expect the child's filter take effect. If we disallow adding both child and parent to publication, it could be harder for these features to implement. So, personally, I am not sure disallow adding both child and parent to the publication is a good idea. Best regards, Hou zj
On Mon, Dec 6, 2021 at 3:55 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > On Thursday, December 2, 2021 9:48 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> > > wrote: > > > > Reading Alvaro's email at the top again gave me a pause to > > > > reconsider what I had said in reply. It might indeed have been nice > > > > if the publication DDL itself had prevented the cases where a > > > > partition and its ancestor are added to a publication, though as > > > > Hou-san mentioned, partition ATTACHes make that a bit tricky to > > > > enforce at all times, though of course not impossible. Maybe it's > > > > okay to just de-duplicate pg_publication_tables output as the patch > > > > does, even though it may appear to be a band-aid solution if we one > > > > considers Alvaro's point about consistency. > > > > > > True, I think even if we consider that idea it will be a much bigger > > > change, and also as it will be a behavioral change so we might want to > > > keep it just for HEAD and some of these fixes need to be backpatched. > > > Having said that, if you or someone want to pursue that idea and come > > > up with a better solution than what we have currently it is certainly > > > welcome. > > > > Okay, I did write a PoC patch this morning after sending out my earlier email. I > > polished it a bit, which is attached. > > After thinking more on this. I find there might be some usage about adding both > child and parent to the publication. > > For the future feature publication row filter(and maybe column filter), It > could be useful for user to adding child and parent with different filter > expression. If pubviaroot=true, user can expect the parent's filter take > effect, If pubviaroot=false, they can expect the child's filter take effect. > > If we disallow adding both child and parent to publication, it could be harder > for these features to implement. It's possible that one may want such a feature, and yes, outright preventing adding both the parent and a partition to a publication would perhaps make it harder. Mind you though that any such feature would need to fix the current implementation anyway to make the publication-behavior-related catalog entries for child tables, which we currently don't, unless children are added explicitly. That said, we should also be careful in implementing new features that offer a finer per-partition granularity, because the fact that we do currently offer that for many of the existing old features such as constraints, indexes, etc. limits the scalability of our architecture. That point may be off-topic for the thread, but something to consider, or maybe I need to take a look at the other patch to see if my concerns are addressable / have been addressed. -- Amit Langote EDB: http://www.enterprisedb.com
On Mon, Dec 6, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Mon, Dec 6, 2021 at 3:55 PM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > On Thursday, December 2, 2021 9:48 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> > > > wrote: > > > > > Reading Alvaro's email at the top again gave me a pause to > > > > > reconsider what I had said in reply. It might indeed have been nice > > > > > if the publication DDL itself had prevented the cases where a > > > > > partition and its ancestor are added to a publication, though as > > > > > Hou-san mentioned, partition ATTACHes make that a bit tricky to > > > > > enforce at all times, though of course not impossible. Maybe it's > > > > > okay to just de-duplicate pg_publication_tables output as the patch > > > > > does, even though it may appear to be a band-aid solution if we one > > > > > considers Alvaro's point about consistency. > > > > > > > > True, I think even if we consider that idea it will be a much bigger > > > > change, and also as it will be a behavioral change so we might want to > > > > keep it just for HEAD and some of these fixes need to be backpatched. > > > > Having said that, if you or someone want to pursue that idea and come > > > > up with a better solution than what we have currently it is certainly > > > > welcome. > > > > > > Okay, I did write a PoC patch this morning after sending out my earlier email. I > > > polished it a bit, which is attached. > > > > After thinking more on this. I find there might be some usage about adding both > > child and parent to the publication. > > > > For the future feature publication row filter(and maybe column filter), It > > could be useful for user to adding child and parent with different filter > > expression. If pubviaroot=true, user can expect the parent's filter take > > effect, If pubviaroot=false, they can expect the child's filter take effect. > > > > If we disallow adding both child and parent to publication, it could be harder > > for these features to implement. > > It's possible that one may want such a feature, and yes, outright > preventing adding both the parent and a partition to a publication > would perhaps make it harder. Mind you though that any such feature > would need to fix the current implementation anyway to make the > publication-behavior-related catalog entries for child tables, which > we currently don't, unless children are added explicitly. > > That said, we should also be careful in implementing new features that > offer a finer per-partition granularity, because the fact that we do > currently offer that for many of the existing old features such as > constraints, indexes, etc. limits the scalability of our architecture. > That point may be off-topic for the thread, but something to consider, > or maybe I need to take a look at the other patch to see if my > concerns are addressable / have been addressed. > Your concern is not very clear to me. Can you be more specific in what kind of problem you see with such a design for row filtering? You might want to once look at that thread. You can find a summary of that patch in the email [1]. [1] - https://www.postgresql.org/message-id/49ba49f1-8bdb-40b7-ae9e-f17d88b3afcd%40www.fastmail.com -- With Regards, Amit Kapila.
On Mon, Dec 6, 2021 at 1:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Dec 3, 2021 at 6:04 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Fri, Dec 3, 2021 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > On Thu, Dec 2, 2021 at 7:18 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > Okay, I did write a PoC patch this morning after sending out my > > > > earlier email. I polished it a bit, which is attached. > > > > > > I see multiple problems with this patch and idea. > > > > Thanks for looking at it. Yeah, I have not looked very closely at ALL > > TABLES [IN SCHEMA], though only because I suspected that those cases > > deal with partitioning in such a way that the partition duplication > > issue doesn't arise. That is, only the FOR TABLE list_of_tables and > > ADD TABLE syntax allow for the duplication issue to occur. > > > > > (a) I think you > > > forgot to deal with "All Tables In Schema" Publication which will be > > > quite tricky to deal with during attach operation. How will you remove > > > a particular relation from such a publication if there is a need to do > > > so? > > > > Hmm, my understanding of how FOR ALL TABLES... features work is that > > one cannot remove a particular relation from such publications? > > > > create schema sch; > > create table sch.p (a int primary key) partition by list (a); > > create table sch.p1 partition of sch.p for values in (1); > > create table sch.p2 partition of sch.p for values in (2); > > create table p (a int primary key) partition by list (a); > > create table p1 partition of p for values in (1); > > create table p2 partition of p for values in (2); > > create publication puball for all tables; > > create publication pubsch for all tables in schema sch; > > > > alter publication puball drop table p; > > ERROR: publication "puball" is defined as FOR ALL TABLES > > DETAIL: Tables cannot be added to or dropped from FOR ALL TABLES publications. > > > > alter publication pubsch drop table sch.p; > > ERROR: relation "p" is not part of the publication > > > > What am I missing? > > Currently, in your patch, you are trying to remove a particular > relation/partition during attach but how will you do that if such a > relation is part of All Tables In Schema publication? So IIUC the scenario of concern is when a table to be attached as a partition is in a schema that's present in pg_publication_namespace. The only way to stop it from being published is to move it to another schema that is not published using that publication. I think I misunderstood how the IN SCHEMA feature works. Specifically, I didn't know that one can add a partitioned table to the same publication (or any table other than those in a particular schema for that matter). Then the attached partition would still be present in the publication by way of being part of the schema that is present in the publication, along with the partitioned table that is added separately. Yes, my proposal in its current form can't prevent that kind of duplication. -- Amit Langote EDB: http://www.enterprisedb.com
On Mon, Dec 6, 2021 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Dec 6, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Mon, Dec 6, 2021 at 3:55 PM houzj.fnst@fujitsu.com > > <houzj.fnst@fujitsu.com> wrote: > > > After thinking more on this. I find there might be some usage about adding both > > > child and parent to the publication. > > > > > > For the future feature publication row filter(and maybe column filter), It > > > could be useful for user to adding child and parent with different filter > > > expression. If pubviaroot=true, user can expect the parent's filter take > > > effect, If pubviaroot=false, they can expect the child's filter take effect. > > > > > > If we disallow adding both child and parent to publication, it could be harder > > > for these features to implement. > > > > It's possible that one may want such a feature, and yes, outright > > preventing adding both the parent and a partition to a publication > > would perhaps make it harder. Mind you though that any such feature > > would need to fix the current implementation anyway to make the > > publication-behavior-related catalog entries for child tables, which > > we currently don't, unless children are added explicitly. > > > > That said, we should also be careful in implementing new features that > > offer a finer per-partition granularity, because the fact that we do > > currently offer that for many of the existing old features such as > > constraints, indexes, etc. limits the scalability of our architecture. > > That point may be off-topic for the thread, but something to consider, > > or maybe I need to take a look at the other patch to see if my > > concerns are addressable / have been addressed. > > > > Your concern is not very clear to me. Can you be more specific in what > kind of problem you see with such a design for row filtering? I guess my characterization of it is still vague but the problem I see is that we'll be adding one more feature that allows manipulating partitions directly, which means more system catalogs than there already are that know about partitions as objects. I remember being deliberate about avoiding pg_publication_rel entries for all partitions when a partitioned table is added to a publication for that very reason. Is covering the use case of defining different filters for each partition important for developing that feature? I feel that defining one for the root table and just using that for all partitions (albeit translating as needed to account for possibly different attribute numbers) would be enough for most users' use cases. Maybe I'm wrong on that. > You > might want to once look at that thread. You can find a summary of that > patch in the email [1]. Okay, I'll try to do that. -- Amit Langote EDB: http://www.enterprisedb.com
On Mon, Dec 6, 2021 at 8:04 PM Amit Langote <amitlangote09@gmail.com> wrote: > > So IIUC the scenario of concern is when a table to be attached as a > partition is in a schema that's present in pg_publication_namespace. > The only way to stop it from being published is to move it to another > schema that is not published using that publication. > > I think I misunderstood how the IN SCHEMA feature works. > Specifically, I didn't know that one can add a partitioned table to > the same publication (or any table other than those in a particular > schema for that matter). Then the attached partition would still be > present in the publication by way of being part of the schema that is > present in the publication, along with the partitioned table that is > added separately. > Right. > Yes, my proposal in its current form can't prevent that kind of duplication. > I am not sure how to proceed here. I feel it is better to go ahead with the fix Hou-san proposed here and in another email [1] to fix the know issues, especially because the issue discussed in [1] needs to be back-patched. We can evaluate your proposal separately for HEAD. What do you think? I am fine if you think that we should evaluate/analyze your proposal before making a call on whether to fix the existing issues with the proposed set of patches. [1] - https://www.postgresql.org/message-id/OS0PR01MB57165D94CB187A97128F75EA946A9%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
On Mon, Dec 6, 2021 at 8:59 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Mon, Dec 6, 2021 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Dec 6, 2021 at 1:00 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > On Mon, Dec 6, 2021 at 3:55 PM houzj.fnst@fujitsu.com > > > <houzj.fnst@fujitsu.com> wrote: > > > > After thinking more on this. I find there might be some usage about adding both > > > > child and parent to the publication. > > > > > > > > For the future feature publication row filter(and maybe column filter), It > > > > could be useful for user to adding child and parent with different filter > > > > expression. If pubviaroot=true, user can expect the parent's filter take > > > > effect, If pubviaroot=false, they can expect the child's filter take effect. > > > > > > > > If we disallow adding both child and parent to publication, it could be harder > > > > for these features to implement. > > > > > > It's possible that one may want such a feature, and yes, outright > > > preventing adding both the parent and a partition to a publication > > > would perhaps make it harder. Mind you though that any such feature > > > would need to fix the current implementation anyway to make the > > > publication-behavior-related catalog entries for child tables, which > > > we currently don't, unless children are added explicitly. > > > > > > That said, we should also be careful in implementing new features that > > > offer a finer per-partition granularity, because the fact that we do > > > currently offer that for many of the existing old features such as > > > constraints, indexes, etc. limits the scalability of our architecture. > > > That point may be off-topic for the thread, but something to consider, > > > or maybe I need to take a look at the other patch to see if my > > > concerns are addressable / have been addressed. > > > > > > > Your concern is not very clear to me. Can you be more specific in what > > kind of problem you see with such a design for row filtering? > > I guess my characterization of it is still vague but the problem I see > is that we'll be adding one more feature that allows manipulating > partitions directly, which means more system catalogs than there > already are that know about partitions as objects. I remember being > deliberate about avoiding pg_publication_rel entries for all > partitions when a partitioned table is added to a publication for that > very reason. > The same will be true even after this new feature. > Is covering the use case of defining different filters for each > partition important for developing that feature? > Users can only define the filters for a relation only when they want to add a particular partition to the publication. Say, if only root table is added to the publication (and pubviaroot is true) then we don't need row filters to be defined for partitions. Basically, there will be only one additional column in pg_publication_rel for the row filter. > I feel that defining > one for the root table and just using that for all partitions (albeit > translating as needed to account for possibly different attribute > numbers) would be enough for most users' use cases. > Sure, but what if the user just wants to publish only one of the partitions? I think in that case along with adding the partition to publication, we need to allow specifying row filter. -- With Regards, Amit Kapila.
On Tue, Dec 7, 2021 at 12:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Dec 6, 2021 at 8:04 PM Amit Langote <amitlangote09@gmail.com> wrote: > > So IIUC the scenario of concern is when a table to be attached as a > > partition is in a schema that's present in pg_publication_namespace. > > The only way to stop it from being published is to move it to another > > schema that is not published using that publication. > > > > I think I misunderstood how the IN SCHEMA feature works. > > Specifically, I didn't know that one can add a partitioned table to > > the same publication (or any table other than those in a particular > > schema for that matter). Then the attached partition would still be > > present in the publication by way of being part of the schema that is > > present in the publication, along with the partitioned table that is > > added separately. > > Right. > > > Yes, my proposal in its current form can't prevent that kind of duplication. > > I am not sure how to proceed here. I feel it is better to go ahead > with the fix Hou-san proposed here and in another email [1] to fix the > know issues, especially because the issue discussed in [1] needs to be > back-patched. > > We can evaluate your proposal separately for HEAD. What > do you think? Yeah, maybe. Though given the direction that the row-filters patch set is taking in allowing to definw filters on the individual partitions, I am not sure if I should be pushing the approach to disallow partitions from being added to publications explicitly alongside their parent tables. I'll try to take a look at that thread to be sure if that's actually the case. Also, for the purposes of the problems that Greg and Hou-san have discovered, I have no objection with applying Hou-san's patches. Those seem better for the back-patching anyway. Thank you. -- Amit Langote EDB: http://www.enterprisedb.com
On Tue, Dec 7, 2021 at 1:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Dec 6, 2021 at 8:59 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Mon, Dec 6, 2021 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Your concern is not very clear to me. Can you be more specific in what > > > kind of problem you see with such a design for row filtering? > > > > I guess my characterization of it is still vague but the problem I see > > is that we'll be adding one more feature that allows manipulating > > partitions directly, which means more system catalogs than there > > already are that know about partitions as objects. I remember being > > deliberate about avoiding pg_publication_rel entries for all > > partitions when a partitioned table is added to a publication for that > > very reason. > > The same will be true even after this new feature. Oh, interesting. > > Is covering the use case of defining different filters for each > > partition important for developing that feature? > > > > Users can only define the filters for a relation only when they want > to add a particular partition to the publication. Say, if only root > table is added to the publication (and pubviaroot is true) then we > don't need row filters to be defined for partitions. Basically, there > will be only one additional column in pg_publication_rel for the row > filter. Okay, so you are talking about allowing setting filters on a partition that is added *explicitly* to a publication. I guess that's the only way to do such a thing given the syntax of FOR TABLE / ADD TABLE. Filters specified on a partitioned table that is added to a publication apply equally to all partitions, depending on the publish_via_partition_root configuration. So if a partition is explicitly present in a publication with a filter defined on it, I suppose following are the possible scenarios: * publish_via_partition_root=true 1. If the root table is not present in the publication (perhaps unlikely), use the partition's filter because there's nothing else to consider. 2. If the root table is present in the publication (very likely), use the root's filter, ignoring/overriding the partition's own if any. * publish_via_partition_root=false 1. If the root table is not present in the publication, use the partition's filter because there's nothing else to consider. 2. If the root table is present in the publication, use the partition's own filter if present, else root's. > > I feel that defining > > one for the root table and just using that for all partitions (albeit > > translating as needed to account for possibly different attribute > > numbers) would be enough for most users' use cases. > > > > Sure, but what if the user just wants to publish only one of the > partitions? I think in that case along with adding the partition to > publication, we need to allow specifying row filter. Yeah, I understand now that we're talking about partitions that are added explicitly. Thanks for your patience. -- Amit Langote EDB: http://www.enterprisedb.com
On Tue, Dec 7, 2021 at 10:52 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Tue, Dec 7, 2021 at 1:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Dec 6, 2021 at 8:59 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > On Mon, Dec 6, 2021 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Your concern is not very clear to me. Can you be more specific in what > > > > kind of problem you see with such a design for row filtering? > > > > > > I guess my characterization of it is still vague but the problem I see > > > is that we'll be adding one more feature that allows manipulating > > > partitions directly, which means more system catalogs than there > > > already are that know about partitions as objects. I remember being > > > deliberate about avoiding pg_publication_rel entries for all > > > partitions when a partitioned table is added to a publication for that > > > very reason. > > > > The same will be true even after this new feature. > > Oh, interesting. > > > > Is covering the use case of defining different filters for each > > > partition important for developing that feature? > > > > > > > Users can only define the filters for a relation only when they want > > to add a particular partition to the publication. Say, if only root > > table is added to the publication (and pubviaroot is true) then we > > don't need row filters to be defined for partitions. Basically, there > > will be only one additional column in pg_publication_rel for the row > > filter. > > Okay, so you are talking about allowing setting filters on a partition > that is added *explicitly* to a publication. I guess that's the only > way to do such a thing given the syntax of FOR TABLE / ADD TABLE. > Filters specified on a partitioned table that is added to a > publication apply equally to all partitions, depending on the > publish_via_partition_root configuration. > Yes. > So if a partition is > explicitly present in a publication with a filter defined on it, I > suppose following are the possible scenarios: > > * publish_via_partition_root=true > > 1. If the root table is not present in the publication (perhaps > unlikely), use the partition's filter because there's nothing else to > consider. > > 2. If the root table is present in the publication (very likely), use > the root's filter, ignoring/overriding the partition's own if any. > > * publish_via_partition_root=false > > 1. If the root table is not present in the publication, use the > partition's filter because there's nothing else to consider. > > 2. If the root table is present in the publication, use the > partition's own filter if present, else root's. > I have not tested/checked each of these scenarios individually but I think it is something like, if publish_via_partition_root is false then we will always try to use partitions row filter and if it is not there then we don't use any filter. Similarly, if publish_via_partition_root is true, then we always try to use a partitioned table row filter and if it is not there (either because the partitioned table is not part of the publication or because there is no filter defined for it) then we don't use any filter. -- With Regards, Amit Kapila.
On Tue, Dec 7, 2021 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Dec 7, 2021 at 10:52 AM Amit Langote <amitlangote09@gmail.com> wrote: > > So if a partition is > > explicitly present in a publication with a filter defined on it, I > > suppose following are the possible scenarios: > > > > * publish_via_partition_root=true > > > > 1. If the root table is not present in the publication (perhaps > > unlikely), use the partition's filter because there's nothing else to > > consider. > > > > 2. If the root table is present in the publication (very likely), use > > the root's filter, ignoring/overriding the partition's own if any. > > > > * publish_via_partition_root=false > > > > 1. If the root table is not present in the publication, use the > > partition's filter because there's nothing else to consider. > > > > 2. If the root table is present in the publication, use the > > partition's own filter if present, else root's. > > I have not tested/checked each of these scenarios individually but I > think it is something like, if publish_via_partition_root is false > then we will always try to use partitions row filter and if it is not > there then we don't use any filter. Similarly, if > publish_via_partition_root is true, then we always try to use a > partitioned table row filter and if it is not there (either because > the partitioned table is not part of the publication or because there > is no filter defined for it) then we don't use any filter. Okay, thanks for the explanation. -- Amit Langote EDB: http://www.enterprisedb.com
On Thu, Dec 2, 2021 at 8:42 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Thu, Dec 2, 2021 at 9:44 houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > On Wed, Dec 1, 2021 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote > > > > > <amitlangote09@gmail.com> wrote: > > > > > > On second thought, I agree that de-duplicating partitions from > > > > > > this view is an improvement. > > > > > > > > > > Fair enough. Hou-San, Can you please submit the updated patch after > > > > > fixing any pending comments including the test case? > > > > > > > > Attach the updated patch which address the comments so far. > > > > > > > > The patch only adds a testcase in publication.sql because the > > > > duplicate output doesn't cause unexpected behavior in pub-sub test. > > > > > > Thanks, the patch looks good to me. I have slightly changed the commit > > > message in the attached. I would like to commit this only in HEAD as there is no > > > user complaint about this and it might not stop any user's service unless it relies > > > on this view's output for the initial table synchronization. > > The patch looks good to me too in that it gets the job done. > Pushed. -- With Regards, Amit Kapila.
On Sat, Nov 20, 2021 7:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangote09@gmail.com> > wrote: > > > > > > The problematic case is attaching the partition *after* the > > > subscriber has already marked the root parent as synced and/or ready > > > for replication. Refreshing the subscription doesn't help it > > > discover the newly attached partition, because a > > > publish_via_partition_root only ever tells about the root parent, > > > which would be already synced, so the subscriber would think there's > > > nothing to copy. > > > > > > > Okay, I see this could be a problem but I haven't tried to reproduce it. > > One more thing you mentioned is that the initial sync won't work after refresh > but later changes will be replicated but I noticed that later changes also don't > get streamed till we restart the subscriber server. > I am not sure but we might not be invalidating apply workers cache due to > which it didn't notice the same. I investigated this bug recently, and I think the reason is that when receiving relcache invalidation message, the callback function[1] in walsender only reset the schema sent status while it doesn't reset the replicate_valid flag. So, it won’t rebuild the publication actions of the relation. [1] static void rel_sync_cache_relation_cb(Datum arg, Oid relid) ... /* * Reset schema sent status as the relation definition may have changed. * Also free any objects that depended on the earlier definition. */ if (entry != NULL) { entry->schema_sent = false; list_free(entry->streamed_txns); ... Also, when you DETACH a partition, the publication won’t be rebuilt too because of the same reason. Which could cause unexpected behavior if we modify the detached table's data . And the bug happens regardless of whether pubviaroot is set or not. For the fix: I think if we also reset replicate_valid flag in rel_sync_cache_relation_cb, then the bug can be fixed. I have a bit hesitation about this approach, because it could increase the frequency of invalidating and rebuilding the publication action. But I haven't produced some other better approaches. Another possibility could be that we add a syscache callback function for pg_inherits table, but we currently don't have syscache for pg_inherits. We might need to add the cache pg_inherits first which doesn't seems better than the above approach. What do you think ? Attach an initial patch which reset the replicate_valid flag in rel_sync_cache_relation_cb and add some reproduction tests in 013_partition.pl. Best regards, Hou zj
Attachment
On Tuesday, December 14, 2021 3:42 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > On Sat, Nov 20, 2021 7:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote > > > <amitlangote09@gmail.com> > > wrote: > > > > > > > > The problematic case is attaching the partition *after* the > > > > subscriber has already marked the root parent as synced and/or > > > > ready for replication. Refreshing the subscription doesn't help > > > > it discover the newly attached partition, because a > > > > publish_via_partition_root only ever tells about the root parent, > > > > which would be already synced, so the subscriber would think > > > > there's nothing to copy. > > > > > > > > > > Okay, I see this could be a problem but I haven't tried to reproduce it. > > > > One more thing you mentioned is that the initial sync won't work after > > refresh but later changes will be replicated but I noticed that later > > changes also don't get streamed till we restart the subscriber server. > > I am not sure but we might not be invalidating apply workers cache due > > to which it didn't notice the same. > > I investigated this bug recently, and I think the reason is that when receiving > relcache invalidation message, the callback function[1] in walsender only reset > the schema sent status while it doesn't reset the replicate_valid flag. So, it > won’t rebuild the publication actions of the relation. > > [1] > static void > rel_sync_cache_relation_cb(Datum arg, Oid relid) ... > /* > * Reset schema sent status as the relation definition may have > changed. > * Also free any objects that depended on the earlier definition. > */ > if (entry != NULL) > { > entry->schema_sent = false; > list_free(entry->streamed_txns); > ... > > Also, when you DETACH a partition, the publication won’t be rebuilt too > because of the same reason. Which could cause unexpected behavior if we > modify the detached table's data . And the bug happens regardless of whether > pubviaroot is set or not. > > For the fix: > > I think if we also reset replicate_valid flag in rel_sync_cache_relation_cb, then > the bug can be fixed. I have a bit hesitation about this approach, because it > could increase the frequency of invalidating and rebuilding the publication > action. But I haven't produced some other better approaches. > I have confirmed that the bug of ATTACH PARTITION has been fixed due to recent commit 7f481b8. Currently, we always invalidate the RelationSyncCache when attaching a partition, so the pubactions of the newly attached partition will be rebuilt correctly. Best regards, Hou zj
On 2022-Apr-11, houzj.fnst@fujitsu.com wrote: > I have confirmed that the bug of ATTACH PARTITION has been fixed due to recent > commit 7f481b8. Currently, we always invalidate the RelationSyncCache when > attaching a partition, so the pubactions of the newly attached partition will > be rebuilt correctly. Hm, this commit was not backpatched -- it is only in master. I just checked that pg14 behaves as you describe in the opening email of this thread; did we decide that we don't want to change behavior of stable branches? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Friday, April 15, 2022 12:46 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Apr-11, houzj.fnst@fujitsu.com wrote: > > > I have confirmed that the bug of ATTACH PARTITION has been fixed due to > recent > > commit 7f481b8. Currently, we always invalidate the RelationSyncCache when > > attaching a partition, so the pubactions of the newly attached partition will > > be rebuilt correctly. > > Hm, this commit was not backpatched -- it is only in master. I just > checked that pg14 behaves as you describe in the opening email of this > thread; did we decide that we don't want to change behavior of stable > branches? Hi, Do you mean the original bug: "pg_get_publication_tables function will output duplicate partition relid when adding both child and parent table to the publication(pubviaroot = false). " ? We decided to commit the fix for it only in HEAD as there is no user complaint about this and it might not stop any user's service unless it relies on this view's output for the initial table synchronization. Or do you mean another one mentioned here which is "the changes on newly attached partition won't be replicated" ? For this one, I think it's fine to backpatch the fix and I can prepare the patch for back branches if needed. BTW, if we want to backpatch this fix we might need to keep the order of RelationSyncEntry struct member unchanged in back branches. Best regards, Hou zj