Re: pg_get_publication_tables() output duplicate relid - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: pg_get_publication_tables() output duplicate relid |
Date | |
Msg-id | CALj2ACX69Pr5HoEfifB_=cjSz-G=eH7MV6L0sjN_YWtwuJX-yA@mail.gmail.com Whole thread Raw |
In response to | pg_get_publication_tables() output duplicate relid ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>) |
Responses |
Re: pg_get_publication_tables() output duplicate relid
RE: pg_get_publication_tables() output duplicate relid |
List | pgsql-hackers |
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > Hi hackers, > > In another thread[1], we found the pg_get_publication_tables function will output > duplicate partition relid when adding both child and parent table to the > publication(pubviaroot = false). > > Example: > create table tbl1 (a int) partition by range (a); > create table tbl1_part1 partition of tbl1 for values from (1) to (10); > create table tbl1_part2 partition of tbl1 for values from (10) to (20); > create publication pub for table > tbl1, tbl1_part1 with (publish_via_partition_root=false); > > select * from pg_get_publication_tables('pub'); > relid > ------- > 16387 > 16390 > 16387 > > The reason of the duplicate output is that: > > pg_get_publication_tables invokes function GetPublicationRelations internally. > In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1' > twice. First time from extracting partitions from the specified parent table > 'tbl1', second time from the explicitly specified partition 'tbl1_part1'. > > > I am not sure is this behavior expected as it seems natural for > pg_get_publication_tables to return duplicate-free relid list. OTOH, there > seems no harm for the current behavior(duplicate output), it doesn't affect the > initial sync and change replication when using logical replication. > > Personally, I think it might be better to make the output of > pg_get_publication_tables duplicate-free, because the change happened on each > output relid will only be replicated once. So, it seems more consistent to > output each relid only once. > > Thoughts ? > > (Attach a patch which make the output duplicate-free) > > [1] https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com The users can always specify the distinct clause, see [1]. I don't see any problem with the existing way. If required you can specify in the view pg_publication_tables documentation that "This view returns multiple rows for the same relation, if the relation is child to a parent partition table and if both the parent and child are specified in the publication" or something similart. If at all, the pg_get_publication_tables() is supposed to give unique outputs, let's fix it and it is a good idea to specify that the view pg_publication_tables returns unique rows even though child and parent partition tables are specified in the publication. I have fdw comments on the patch: 1) Why to do result = list_concat_unique_oid(result, relids); within the while loop which can make the while loop O(n*n*n) complexity(list_concat_unique_oid is O(n*n)? Can't you use list_append_unique, of course this can also be costlier? Let's avoid adding anything to the while loop and use list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n)) /* Now sort and de-duplicate the result list */ list_sort(result, list_oid_cmp); list_deduplicate_oid(result); while (HeapTupleIsValid(tup = systable_getnext(scan))) { Form_pg_publication_rel pubrel; + List *relids = NIL; pubrel = (Form_pg_publication_rel) GETSTRUCT(tup); - result = GetPubPartitionOptionRelations(result, pub_partopt, + relids = GetPubPartitionOptionRelations(relids, pub_partopt, pubrel->prrelid); + + result = list_concat_unique_oid(result, relids); } 2) Are you sure you want GetPublicationRelations to be returning the unique relations? I'm just thinking why can't you just do list_sort+list_deduplicate_oid in the caller? I think this makes the function more restrictive. If required, you can pass a boolean flag (bool give_unique and specify in the function comments and if passed true do list_sort+list_deduplicate_oid at the end of GetPublicationRelations ). [1] postgres=# select * from pg_get_publication_tables('pub'); relid ------- 16387 16390 16387 (3 rows) postgres=# select distinct * from pg_get_publication_tables('pub'); relid ------- 16387 16390 (2 rows) postgres=# select * from pg_publication_tables; pubname | schemaname | tablename ---------+------------+------------ pub | public | tbl1_part1 pub | public | tbl1_part2 pub | public | tbl1_part1 (3 rows) postgres=# select distinct * from pg_publication_tables; pubname | schemaname | tablename ---------+------------+------------ pub | public | tbl1_part1 pub | public | tbl1_part2 (2 rows) Regards, Bharath Rupireddy.
pgsql-hackers by date: