Re: pg_get_publication_tables() output duplicate relid - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_get_publication_tables() output duplicate relid
Date
Msg-id CALj2ACX69Pr5HoEfifB_=cjSz-G=eH7MV6L0sjN_YWtwuJX-yA@mail.gmail.com
Whole thread Raw
In response to pg_get_publication_tables() output duplicate relid  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses Re: pg_get_publication_tables() output duplicate relid
RE: pg_get_publication_tables() output duplicate relid
List pgsql-hackers
On Mon, Nov 15, 2021 at 1:48 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Hi hackers,
>
> In another thread[1], we found the pg_get_publication_tables function will output
> duplicate partition relid when adding both child and parent table to the
> publication(pubviaroot = false).
>
> Example:
> create table tbl1 (a int) partition by range (a);
> create table tbl1_part1 partition of tbl1 for values from (1) to (10);
> create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> create publication pub for table
> tbl1, tbl1_part1 with (publish_via_partition_root=false);
>
> select * from pg_get_publication_tables('pub');
>  relid
> -------
>  16387
>  16390
>  16387
>
> The reason of the duplicate output is that:
>
> pg_get_publication_tables invokes function GetPublicationRelations internally.
> In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1'
> twice. First time from extracting partitions from the specified parent table
> 'tbl1', second time from the explicitly specified partition 'tbl1_part1'.
>
>
> I am not sure is this behavior expected as it seems natural for
> pg_get_publication_tables to return duplicate-free relid list. OTOH, there
> seems no harm for the current behavior(duplicate output), it doesn't affect the
> initial sync and change replication when using logical replication.
>
> Personally, I think it might be better to make the output of
> pg_get_publication_tables duplicate-free, because the change happened on each
> output relid will only be replicated once. So, it seems more consistent to
> output each relid only once.
>
> Thoughts ?
>
> (Attach a patch which make the output duplicate-free)
>
> [1] https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com

The users can always specify the distinct clause, see [1]. I don't see
any problem with the existing way. If required you can specify in the
view pg_publication_tables documentation that "This view returns
multiple rows for the same relation, if the relation is child to a
parent partition table and if both the parent and child are specified
in the publication" or something similart.

If at all, the pg_get_publication_tables() is supposed to give unique
outputs, let's fix it and it is a good idea to specify that the view
pg_publication_tables returns unique rows even though child and parent
partition tables are specified in the publication.

I have fdw comments on the patch:
1) Why to do  result = list_concat_unique_oid(result, relids); within
the while loop which can make the while loop O(n*n*n)
complexity(list_concat_unique_oid is O(n*n)? Can't you use
list_append_unique, of course this can also be costlier? Let's avoid
adding anything to the while loop and use
list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n))

    /* Now sort and de-duplicate the result list */
    list_sort(result, list_oid_cmp);
    list_deduplicate_oid(result);

  while (HeapTupleIsValid(tup = systable_getnext(scan)))
  {
  Form_pg_publication_rel pubrel;
+ List *relids = NIL;

  pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);
- result = GetPubPartitionOptionRelations(result, pub_partopt,
+ relids = GetPubPartitionOptionRelations(relids, pub_partopt,
  pubrel->prrelid);
+
+ result = list_concat_unique_oid(result, relids);
  }

2) Are you sure you want GetPublicationRelations to be returning the
unique relations? I'm just thinking why can't you just do
list_sort+list_deduplicate_oid in the caller? I think this makes the
function more restrictive. If required, you can pass a boolean flag
(bool give_unique and specify in the function comments and if passed
true do list_sort+list_deduplicate_oid at the end of
GetPublicationRelations ).

[1]
postgres=# select * from pg_get_publication_tables('pub');
 relid
-------
 16387
 16390
 16387
(3 rows)

postgres=# select distinct * from pg_get_publication_tables('pub');
 relid
-------
 16387
 16390
(2 rows)

postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+------------
 pub     | public     | tbl1_part1
 pub     | public     | tbl1_part2
 pub     | public     | tbl1_part1
(3 rows)

postgres=# select distinct * from pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+------------
 pub     | public     | tbl1_part1
 pub     | public     | tbl1_part2
(2 rows)

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Partial aggregates pushdown
Next
From: Amit Kapila
Date:
Subject: Re: row filtering for logical replication