Re: Handle infinite recursion in logical replication setup - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Handle infinite recursion in logical replication setup
Date
Msg-id CAHut+PvsvrZiPPZK7=_4c+Lc+MCe8peqDKMwbH+_mp1wj0-yOQ@mail.gmail.com
Whole thread Raw
In response to Re: Handle infinite recursion in logical replication setup  (vignesh C <vignesh21@gmail.com>)
Responses Re: Handle infinite recursion in logical replication setup  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Handle infinite recursion in logical replication setup  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Re: Handle infinite recursion in logical replication setup  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
Here are my comments for the latest patch v6-0001.

(I will post my v6-0002 review comments separately)

PATCH v6-0001 comments
======================

1.1 General - Option name

I still feel like the option name is not ideal. Unfortunately, this is
important because any name change would impact lots of these patch
files and docs, struct members etc.

It was originally called "local_only", but I thought that as a
SUBSCRIPTION option that was confusing because "local" means local to
what? Really it is local to the publisher, not local to the
subscriber, so that name seemed misleading.

Then I suggested "publish_local_only". Although that resolved the
ambiguity problem, other people thought it seemed odd to have the
"publish" prefix for a subscription-side option.

So now it is changed again to "subscribe_local_only" -- It's getting
better but still, it is implied that the "local" means local to the
publisher except there is nothing in the option name really to convey
that meaning. IMO we here all understand the meaning of this option
mostly by familiarity with this discussion thread, but I think a user
coming to this for the first time will still be confused by the name.

Below are some other name ideas. Maybe they are not improvements, but
it might help other people to come up with something better.

subscribe_publocal_only = true/false
origin = pub_only/any
adjacent_data_only = true/false
direct_subscriptions_only = true/false
...


(FYI, the remainder of these review comments will assume the option is
still called "subscribe_local_only")

~~~

1.2 General - inconsistent members and args

IMO the struct members and args should also be named for close
consistency with whatever the option name is.

Currently the option is called "subscription_local_only".  So I think
the members/args would be better to be called "local_only" instead of
"only_local".

~~~

1.3 Commit message - wrong option name

The commit message refers to the option name as "publish_local_only"
instead of the option name that is currently implemented.

~~~

1.4 Commit message - wording

The wording seems a bit off. Below is suggested simpler wording which
I AFAIK conveys the same information.

BEFORE
Add an option publish_local_only which will subscribe only to the locally
generated data in the publisher node. If subscriber is created with this
option, publisher will skip publishing the data that was subscribed
from other nodes. It can be created using following syntax:
ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=9999'
PUBLICATION pub1 with (publish_local_only = on);

SUGGESTION
This patch adds a new SUBSCRIPTION boolean option
"subscribe_local_only". The default is false. When a SUBSCRIPTION is
created with this option enabled, the publisher will only publish data
that originated at the publisher node.
Usage:
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=9999'
PUBLICATION pub1 with (subscribe_local_only = true);

~~~

1.5 doc/src/sgml/ref/create_subscription.sgml - "generated" changes.

+         <para>
+          Specifies whether the subscription will request the publisher to send
+          locally generated changes or both the locally generated changes and
+          the replicated changes that was generated from other nodes. The
+          default is <literal>false</literal>.
+         </para>

For some reason, it seemed a bit strange to me to use the term
"generated" changes. Maybe better to refer to the origin of changes?

SUGGESTION
Specifies whether the publisher should send only changes that
originated locally at the publisher node, or send any publisher node
changes regardless of their origin. The default is false.

~~~

1.6 src/backend/replication/pgoutput/pgoutput.c -
LOGICALREP_PROTO_TWOPHASE_VERSION_NUM

@@ -496,6 +509,12 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
  else
  ctx->twophase_opt_given = true;

+ if (data->only_local && data->protocol_version <
LOGICALREP_PROTO_TWOPHASE_VERSION_NUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("requested proto_version=%d does not support
subscribe_local_only, need %d or higher",
+ data->protocol_version, LOGICALREP_PROTO_TWOPHASE_VERSION_NUM)));

I thought this code should not be using
LOGICALREP_PROTO_TWOPHASE_VERSION_NUM. Shouldn't there be some newly
introduced constant like LOGICALREP_PROTO_LOCALONLY_VERSION_NUM which
you will use here?

~~~

1.7 src/bin/pg_dump/pg_dump.c - 150000

@@ -4451,11 +4452,13 @@ getSubscriptions(Archive *fout)
  if (fout->remoteVersion >= 150000)
  appendPQExpBufferStr(query,
  " s.subtwophasestate,\n"
- " s.subdisableonerr\n");
+ " s.subdisableonerr,\n"
+ " s.sublocal\n");
  else
  appendPQExpBuffer(query,
    " '%c' AS subtwophasestate,\n"
-   " false AS subdisableonerr\n",
+   " false AS subdisableonerr,\n"
+   " false AS s.sublocal\n",
    LOGICALREP_TWOPHASE_STATE_DISABLED);

I think this local_only feature is unlikely to get into the PG15
release, so this code should be split out into a separate condition
because later will need to change to say >= 160000.

~~~

1.8 src/bin/pg_dump/pg_dump.c - dumpSubscription

@@ -4585,6 +4591,9 @@ dumpSubscription(Archive *fout, const
SubscriptionInfo *subinfo)
  if (strcmp(subinfo->subdisableonerr, "t") == 0)
  appendPQExpBufferStr(query, ", disable_on_error = true");

+ if (strcmp(subinfo->sublocal, "f") != 0)
+ appendPQExpBufferStr(query, ", subscribe_local_only = on");
+

I felt it is more natural to say "if it is true set to true", instead
of "if it is not false set to on".

SUGGESTION
if (strcmp(subinfo->sublocal, "t") == 0)
appendPQExpBufferStr(query, ", subscribe_local_only = true");

~~~

1.9 src/bin/psql/describe.c - 150000

@@ -6318,9 +6318,11 @@ describeSubscriptions(const char *pattern, bool verbose)
  if (pset.sversion >= 150000)
  appendPQExpBuffer(&buf,
    ", subtwophasestate AS \"%s\"\n"
-   ", subdisableonerr AS \"%s\"\n",
+   ", subdisableonerr AS \"%s\"\n"
+   ", sublocal AS \"%s\"\n",
    gettext_noop("Two phase commit"),
-   gettext_noop("Disable on error"));
+   gettext_noop("Disable on error"),
+   gettext_noop("Only local"));

I think this local_only feature is unlikely to get into the PG15
release, so this code should be split out into a separate condition
because later will need to change to say >= 160000.

~~~

1.10 src/bin/psql/describe.c - describeSubscriptions column

@@ -6318,9 +6318,11 @@ describeSubscriptions(const char *pattern, bool verbose)
  if (pset.sversion >= 150000)
  appendPQExpBuffer(&buf,
    ", subtwophasestate AS \"%s\"\n"
-   ", subdisableonerr AS \"%s\"\n",
+   ", subdisableonerr AS \"%s\"\n"
+   ", sublocal AS \"%s\"\n",
    gettext_noop("Two phase commit"),
-   gettext_noop("Disable on error"));
+   gettext_noop("Disable on error"),
+   gettext_noop("Only local"));

I think the column name here should be more consistent with the option
name. e.g. it should be "Local only", not "Only local".

~~~

1.11 src/bin/psql/tab-complete.c - whitespace

@@ -3167,7 +3167,7 @@ psql_completion(const char *text, int start, int end)
  /* Complete "CREATE SUBSCRIPTION <name> ...  WITH ( <opt>" */
  else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
  COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
-   "enabled", "slot_name", "streaming",
+   "enabled",  "slot_name", "streaming", "subscribe_local_only",

The patch accidentally added a space char before the "slot_name".

~~~

1.12 src/include/replication/walreceiver.h - "generated"

@@ -183,6 +183,7 @@ typedef struct
  bool streaming; /* Streaming of large transactions */
  bool twophase; /* Streaming of two-phase transactions at
  * prepare time */
+ bool only_local; /* publish only locally generated data */

This is a similar review comment as #1.5 about saying the word "generated".
Maybe there is another way to word this?

~~~

1.13 src/test/regress/sql/subscription.sql - missing test case

Isn't there a missing test case for ensuring that the new option is boolean?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?
Next
From: Masahiko Sawada
Date:
Subject: Re: Skipping logical replication transactions on subscriber side