Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id CAHut+Ptk6oANo2-mw5191aU0HHP9YKjYhWgTZYU0g0Y1Co_5SQ@mail.gmail.com
Whole thread Raw
In response to RE: Data is copied twice when specifying both child and parent table in publication  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
Responses RE: Data is copied twice when specifying both child and parent table in publication
List pgsql-hackers
Here are some review comments for patch code of HEAD_v19-0001

======
doc/src/sgml/ref/create_publication.sgml

1.
+         <para>
+          There can be a case where a subscription combines multiple
+          publications. If a root partitioned table is published by any
+          subscribed publications which set
+          <literal>publish_via_partition_root</literal> = true, changes on this
+          root partitioned table (or on its partitions) will be published using
+          the identity and schema of this root partitioned table rather than
+          that of the individual partitions.
+         </para>

1a.
The paragraph prior to this one just refers to "partitioned tables"
instead of "root partitioned table", so IMO we should continue with
the same terminology.

I also modified the remaining text slightly. AFAIK my suggestion
conveys exactly the same information but is shorter.

SUGGESTION
There can be a case where one subscription combines multiple
publications. If any of those publications has set
<literal>publish_via_partition_root</literal> = true, then changes in
a partitioned table (or on its partitions) will be published using the
identity and schema of the partitioned table.

~

1b.
Shouldn't that paragraph (or possibly somewhere in the CREATE
SUBSCRIPTION notes?) also explain that in this scenario the logical
replication will only publish one set of changes? After all, this is
the whole point of the patch, but I am not sure if the user will know
of this behaviour from the current documentation.

======
src/backend/catalog/pg_publication.c

2. filter_partitions

BEFORE:
static void
filter_partitions(List *table_infos)
{
ListCell   *lc;

foreach(lc, table_infos)
{
bool skip = false;
List    *ancestors = NIL;
ListCell    *lc2;
published_rel    *table_info = (published_rel *) lfirst(lc);

if (get_rel_relispartition(table_info->relid))
ancestors = get_partition_ancestors(table_info->relid);

foreach(lc2, ancestors)
{
Oid ancestor = lfirst_oid(lc2);

/* Is ancestor exists in the published table list? */
if (is_ancestor_member_tableinfos(ancestor, table_infos))
{
skip = true;
break;
}
}

if (skip)
table_infos = foreach_delete_current(table_infos, lc);
}
}

~

2a.
My previous review [1] (see #1) suggested putting most code within the
condition. AFAICT my comment still is applicable but was not yet
addressed.

2b.
IMO the comment "/* Is ancestor exists in the published table list?
*/" is unnecessary because it is already clear what is the purpose of
the function named "is_ancestor_member_tableinfos".


SUGGESTION
static void
filter_partitions(List *table_infos)
{
    ListCell   *lc;

    foreach(lc, table_infos)
    {
        if (get_rel_relispartition(table_info->relid))
        {
            bool skip = false;
            ListCell    *lc2;
            published_rel    *table_info = (published_rel *) lfirst(lc);
            List *ancestors = get_partition_ancestors(table_info->relid);

            foreach(lc2, ancestors)
            {
                Oid ancestor = lfirst_oid(lc2);

                if (is_ancestor_member_tableinfos(ancestor, table_infos))
                {
                    skip = true;
                    break;
                }
            }

            if (skip)
                table_infos = foreach_delete_current(table_infos, lc);
        }
    }
}

======
src/backend/commands/subscriptioncmds.c

3.
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
 {
  WalRcvExecResult *res;
- StringInfoData cmd;
+ StringInfoData cmd,
+ pub_names;
  TupleTableSlot *slot;
  Oid tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
  List    *tablelist = NIL;
- bool check_columnlist = (walrcv_server_version(wrconn) >= 150000);
+ int server_version = walrcv_server_version(wrconn);
+ bool check_columnlist = (server_version >= 150000);
+
+ initStringInfo(&pub_names);
+ get_publications_str(publications, &pub_names, true);

  initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");

- /* Get column lists for each relation if the publisher supports it */
- if (check_columnlist)
- appendStringInfoString(&cmd, ", t.attnames\n");
+ /* Get the list of tables from the publisher. */
+ if (server_version >= 160000)
+ {

~

I think the 'pub_names' is only needed within that ">= 160000" condition.

So all the below code can be moved into that scope can't it?

+ StringInfoData pub_names;
+ initStringInfo(&pub_names);
+ get_publications_str(publications, &pub_names, true);

+ pfree(pub_names.data);

------
[1] https://www.postgresql.org/message-id/CAHut%2BPuNsvO9o9XzeJuSLsAsndgCKVphDPBRqYuOTy2bR28E%2Bg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
Next
From: Andres Freund
Date:
Subject: Re: CREATE DATABASE ... STRATEGY WAL_LOG issues