Re: Handle infinite recursion in logical replication setup - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Handle infinite recursion in logical replication setup
Date
Msg-id CAHut+PueiXUeb4bfdQXFStpxp6KLMEDXGRN_ta-N2RYBQAhm6g@mail.gmail.com
Whole thread Raw
In response to Re: Handle infinite recursion in logical replication setup  (vignesh C <vignesh21@gmail.com>)
Responses Re: Handle infinite recursion in logical replication setup
Re: Handle infinite recursion in logical replication setup
List pgsql-hackers
Here are my review comments for v45-0001:

======

1. doc/src/sgml/logical-replication.sgml

  <para>
   To find which tables might potentially include non-local origins (due to
   other subscriptions created on the publisher) try this SQL query:
<programlisting>
SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
FROM pg_publication P,
     LATERAL pg_get_publication_tables(P.pubname) GPT
     LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
     pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
      P.pubname IN (pub-names);
</programlisting></para>

1a.
To use "<pub-names>" with the <> then simply put meta characters in the SGML.
e.g.
<pub-names>

~

1b.
The patch forgot to add the SQL "#" instruction as suggested by my
previous comment (see [1] #3b)

~~~

2.

 <sect1 id="preventing-inconsistencies-for-copy_data-origin">
  <title>Preventing Data Inconsistencies for copy_data, origin=NONE</title>

The title is OK, but I think this should all be a <sect2> sub-section
of "31.2 Subscription"

======

3. src/backend/commands/subscriptioncmds.c - check_publications_origin

+ initStringInfo(&cmd);
+ appendStringInfoString(&cmd,
+    "SELECT DISTINCT P.pubname AS pubname\n"
+    "FROM pg_publication P,\n"
+    " LATERAL pg_get_publication_tables(P.pubname) GPT\n"
+    " LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+    " pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+    "WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND P.pubname IN (");
+ get_publications_str(publications, &cmd, true);
+ appendStringInfoChar(&cmd, ')');
+ get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);

(see from get_skip_tables_str)

+ appendStringInfoString(dest, " AND C.oid NOT IN (SELECT C.oid FROM
pg_class C JOIN pg_namespace N on N.oid = C.relnamespace where ");


IMO the way you are using get_skip_tables_str should be modified. I
will show by way of example below.
- "where" -> "WHERE"
- put the SELECT at the caller instead of inside the function
- handle the ")" at the caller

All this will also make the body of the 'get_skip_tables_str' function
much simpler (see the next review comments)

SUGGESTION
if (subrel_count > 0)
{
/* TODO - put some explanatory comment here about skipping the tables */
appendStringInfo(&cmd,
" AND C.oid NOT IN (\n"
"SELECT C.oid FROM pg_class C\n"
"JOIN pg_namespace N on N.oid = C.relnamespace WHERE ");
get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);
appendStringInf(&cmd, “)”);
}

~~~

4. src/backend/commands/subscriptioncmds.c - get_skip_tables_str

+/*
+ * Add the table names that should be skipped.
+ */

This comment does not have enough detail to know really what the
function is for. Perhaps you only need to say that this is a helper
function for 'check_publications_origin' and then where it is called
you can give a comment (e.g. see my review comment #3)

~~

5. get_skip_tables_str (body)

5a. Variable 'count' is not a very good name; IMO just say 'i' for
index, and it can be declared C99 style.

~

5b. Variable 'first' is not necessary - it's same as (i == 0)

~

5c. The whole "SELECT" part and the ")" parts are more simply done at
the caller (see the review comment #3)

~

5d.

+ appendStringInfo(dest, "(C.relname = '%s' and N.nspname = '%s')",
+ tablename, schemaname);

It makes no difference but I thought it would feel more natural if the
SQL would compare the schema name *before* the table name.

~

5e.
"and" -> "AND"

~

Doing all 5a,b,c,d, and e means overall having a much simpler function body:

SUGGESTION
+ for (int i = 0; i < subrel_count; i++)
+ {
+ Oid relid = subrel_local_oids[i];
+ char    *schemaname = get_namespace_name(get_rel_namespace(relid));
+ char    *tablename = get_rel_name(relid);
+
+ if (i > 0)
+ appendStringInfoString(dest, " OR ");
+
+ appendStringInfo(dest, "(N.nspname = '%s' AND C.relname = '%s')",
+ schemaname, tablename);
+ }

------
[1] https://www.postgresql.org/message-id/CAHut%2BPsku25%2BVjz7HiohWxc2WU07O_ZV4voFG%2BU7WzwKhUzpGQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: "shiy.fnst@fujitsu.com"
Date:
Subject: RE: Column Filtering in Logical Replication
Next
From: Amit Kapila
Date:
Subject: Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)