Home > mailing lists

Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From	Peter Smith
Subject	Re: Data is copied twice when specifying both child and parent table in publication
Date	March 22, 2023 04:50:07
Msg-id	CAHut+Ptk6oANo2-mw5191aU0HHP9YKjYhWgTZYU0g0Y1Co_5SQ@mail.gmail.com Whole thread Raw
In response to	RE: Data is copied twice when specifying both child and parent table in publication ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
Responses	RE: Data is copied twice when specifying both child and parent table in publication
List	pgsql-hackers

Tree view

Here are some review comments for patch code of HEAD_v19-0001

======
doc/src/sgml/ref/create_publication.sgml

1.
+         <para>
+          There can be a case where a subscription combines multiple
+          publications. If a root partitioned table is published by any
+          subscribed publications which set
+          <literal>publish_via_partition_root</literal> = true, changes on this
+          root partitioned table (or on its partitions) will be published using
+          the identity and schema of this root partitioned table rather than
+          that of the individual partitions.
+         </para>

1a.
The paragraph prior to this one just refers to "partitioned tables"
instead of "root partitioned table", so IMO we should continue with
the same terminology.

I also modified the remaining text slightly. AFAIK my suggestion
conveys exactly the same information but is shorter.

SUGGESTION
There can be a case where one subscription combines multiple
publications. If any of those publications has set
<literal>publish_via_partition_root</literal> = true, then changes in
a partitioned table (or on its partitions) will be published using the
identity and schema of the partitioned table.

~

1b.
Shouldn't that paragraph (or possibly somewhere in the CREATE
SUBSCRIPTION notes?) also explain that in this scenario the logical
replication will only publish one set of changes? After all, this is
the whole point of the patch, but I am not sure if the user will know
of this behaviour from the current documentation.

======
src/backend/catalog/pg_publication.c

2. filter_partitions

BEFORE:
static void
filter_partitions(List *table_infos)
{
ListCell   *lc;

foreach(lc, table_infos)
{
bool skip = false;
List    *ancestors = NIL;
ListCell    *lc2;
published_rel    *table_info = (published_rel *) lfirst(lc);

if (get_rel_relispartition(table_info->relid))
ancestors = get_partition_ancestors(table_info->relid);

foreach(lc2, ancestors)
{
Oid ancestor = lfirst_oid(lc2);

/* Is ancestor exists in the published table list? */
if (is_ancestor_member_tableinfos(ancestor, table_infos))
{
skip = true;
break;
}
}

if (skip)
table_infos = foreach_delete_current(table_infos, lc);
}
}

~

2a.
My previous review [1] (see #1) suggested putting most code within the
condition. AFAICT my comment still is applicable but was not yet
addressed.

2b.
IMO the comment "/* Is ancestor exists in the published table list?
*/" is unnecessary because it is already clear what is the purpose of
the function named "is_ancestor_member_tableinfos".


SUGGESTION
static void
filter_partitions(List *table_infos)
{
    ListCell   *lc;

    foreach(lc, table_infos)
    {
        if (get_rel_relispartition(table_info->relid))
        {
            bool skip = false;
            ListCell    *lc2;
            published_rel    *table_info = (published_rel *) lfirst(lc);
            List *ancestors = get_partition_ancestors(table_info->relid);

            foreach(lc2, ancestors)
            {
                Oid ancestor = lfirst_oid(lc2);

                if (is_ancestor_member_tableinfos(ancestor, table_infos))
                {
                    skip = true;
                    break;
                }
            }

            if (skip)
                table_infos = foreach_delete_current(table_infos, lc);
        }
    }
}

======
src/backend/commands/subscriptioncmds.c

3.
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
 {
  WalRcvExecResult *res;
- StringInfoData cmd;
+ StringInfoData cmd,
+ pub_names;
  TupleTableSlot *slot;
  Oid tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
  List    *tablelist = NIL;
- bool check_columnlist = (walrcv_server_version(wrconn) >= 150000);
+ int server_version = walrcv_server_version(wrconn);
+ bool check_columnlist = (server_version >= 150000);
+
+ initStringInfo(&pub_names);
+ get_publications_str(publications, &pub_names, true);

  initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");

- /* Get column lists for each relation if the publisher supports it */
- if (check_columnlist)
- appendStringInfoString(&cmd, ", t.attnames\n");
+ /* Get the list of tables from the publisher. */
+ if (server_version >= 160000)
+ {

~

I think the 'pub_names' is only needed within that ">= 160000" condition.

So all the below code can be moved into that scope can't it?

+ StringInfoData pub_names;
+ initStringInfo(&pub_names);
+ get_publications_str(publications, &pub_names, true);

+ pfree(pub_names.data);

------
[1] https://www.postgresql.org/message-id/CAHut%2BPuNsvO9o9XzeJuSLsAsndgCKVphDPBRqYuOTy2bR28E%2Bg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

pgsql-hackers by date:

From: Michael Paquier
Date: 22 March 2023, 04:45:12
Subject: Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

From: Andres Freund
Date: 22 March 2023, 05:11:58
Subject: Re: CREATE DATABASE ... STRATEGY WAL_LOG issues

Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

Previous

Next