Re: Handle infinite recursion in logical replication setup - Mailing list pgsql-hackers

From vignesh C
Subject Re: Handle infinite recursion in logical replication setup
Date
Msg-id CALDaNm3iLLxP4OV+yQHs-c71P6zQ9W8D30DGsve1SQs_1pFsSQ@mail.gmail.com
Whole thread Raw
In response to Re: Handle infinite recursion in logical replication setup  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Thu, May 26, 2022 at 2:06 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> This post completes my review of patch v15*.
>
> ======
>
> 1. A general comment affecting both patches
>
> I realised the things defined in the subscription WITH are referred to
> in the PG DOCS as "parameters" (not "options"). See CREATE
> SUBSCRIPTION [1]. This is already reflected in some of my review
> comments for the v15-0002 below., but I'd already posted my v15-0001
> review comments before noticing this.
>
> Please also go through the v15-0001 patch to make all the necessary
> terminology changes (including in the v15-0001 commit message).

Modified

> =====
>
> Please find below review comments for patch v15-0002.
>
> 2. Commit message
>
> This patch does couple of things:
> change 1) Added force option for copy_data.
> change 2) Check and throw an error if the publication tables were also
> subscribing data in the publisher from other publishers.
>
> --
>
> 2a.
> "couple" -> "a couple"
> "Added force" -> "Adds force ..."
> "Check and throw" -> "Checks and throws ..."

Modified

> 2b.
> Isn't "Change 2)" only applicable when copy_data and only_local are
> on? It should say so here.

Modified

> ~~~
>
> 3. Commit message
>
> node1: Table t1 (c1 int) has data 1, 2, 3, 4
> node2: Table t1 (c1 int) has data 5, 6, 7, 8
>
> --
>
> I feel these kinds of replication examples are always much easier to
> visualize when the data somehow indicates where it came from. E.g.
> consider using different data in your example like:
>
> node1: Table t1 (c1 int) has data 11, 12, 13, 14
> node2: Table t1 (c1 int) has data 21, 22, 23, 24

Modified

> ~~~
>
> 4. Commit message
>
> Examples in this message exceed 80 chars. Please wrap the SQL appropriately

Modified

> ~~~
>
> 5. Commit message
>
> After this the data will be something like this:
> node1:
> 1, 2, 3, 4, 5, 6, 7, 8
>
> node2:
> 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8
>
> So, you can see that data on node2 (5, 6, 7, 8) is duplicated.
>
> --
>
> Yeah, but is that entirely correct though? AFAIK the scenario
> described by this example is going to recurse forever.

Modified

> ~~~
>
> 6. Commit message
>
> This problem can be solved by using only_local and copy_data option as
> given below
>
> --
>
> "option" -> "parameters"

Modified

> ~~~
>
> 7. Commit message
>
> There are other minor review comments but it's too messy to report in
> detail here. I suggest checking all the ":" (some were missing) and
> the letter case of/within the sentences. I also suggest
> cutting/pasting all this text into MSWord or some grammar checker to
> correct anything else reported.

Modified

> ~~~
>
> 8. Commit message
>
> CREATE SUBSCRIPTION sub_node2_node1 CONNECTION '<node1 details>'
> PUBLICATION pub_node1 WITH (copy_data = force, only_local = on);
> ERROR:  CREATE/ALTER SUBSCRIPTION with only_local and copy_data as
> true is not allowed when the publisher might have replicated data
>
> --
>
> Looks like a typo: e.g. If copy_data is "force" then how is the error happening?

Modified

> ~~~
>
> 9. Commit message
>
> The examples seem confused/muddled in some way IMO.
>
> IIUC the new ERROR is like a new integrity check for when the
> subscription is using the only_local = on, along with copy_data = on.
>
> And so then the copy_data = force was added as a way to override that
> previous error being reported.
>
> So unless you know about the possible error then the "force" option
> makes no sense. So I think you need to change the order of the
> explanations in the commit message to describe the new error first.

Modified

> ======
>
> 10. doc/src/sgml/logical-replication.sgml
>
> +
> + <sect1 id="bidirectional-logical-replication">
> +  <title>Bidirectional logical replication</title>
> +
> +  <sect2 id="setting-bidirectional-replication-two-nodes">
> +   <title>Setting bidirectional replication between two nodes</title>
> +   <para>
> +    Bidirectional replication is useful in creating a multi-master database
> +    which helps in performing read/write operations from any of the nodes.
> +    Setting up bidirectional logical replication between two nodes requires
> +    creation of a publication in all the nodes, creating subscriptions in
> +    each of the nodes that subscribes to data from all the nodes. The steps
> +    to create a two-node bidirectional replication when there is no data in
> +    both the nodes are given below:
> +   </para>
>
> This description is confusing. It seems to be trying to say something
> about a multi-master system (e.g. you say "all the nodes") yet this is
> all written inside the section about "... two nodes", so saying "all
> the nodes" makes no sense here.
>
> I think you need to split this information and put some kind of blurb
> text at the top level (section 31.11), and then this section (for "two
> nodes") should ONLY be talking about the case when there are just TWO
> nodes.

Modified

> ~~~
>
> 11. doc/src/sgml/logical-replication.sgml
>
> +   <para>
> +    Lock the required tables in <literal>node1</literal> and
> +    <literal>node2</literal> till the setup is completed.
> +   </para>
>
> "till" -> "until" (make this same change in multiple places)

Modified

> ~~~
>
> 12. doc/src/sgml/logical-replication.sgml
>
> +  <sect2 id="add-new-node-data-in-existing-node">
> +   <title>Adding a new node when data is present in the existing nodes</title>
> +    <para>
> +     Adding a new node <literal>node3</literal> to the existing
> +     <literal>node1</literal> and <literal>node2</literal> when data is present
> +     in existing nodes <literal>node1</literal> and <literal>node2</literal>
> +     needs similar steps. The only change required here is that
> +     <literal>node3</literal> should create a subscription with
> +     <literal>copy_data = force</literal> to one of the existing nodes to
> +     receive the existing data during initial data synchronization.
> +   </para>
>
> I think perhaps this should clarify that there is NO data in node3 for
> this example.

Modified

> ~~~
>
> 13. doc/src/sgml/logical-replication.sgml - section 31.11.3
>
> +   <para>
> +    Lock the required tables in <literal>node2</literal> and
> +    <literal>node3</literal> till the setup is completed.
> +   </para>
>
> Should you describe why you say no lock is required for tables on
> node1? (Similar information is in section 31.11.4 also)

Modified

> ~~~
>
> 14. doc/src/sgml/logical-replication.sgml - section 31.11.3
>
> +   <para>
> +    Create a subscription in <literal>node3</literal> to subscribe to
> +    <literal>node2</literal>:
> +<programlisting>
> +node3=# CREATE SUBSCRIPTION sub_node3_node2
> +node3-# CONNECTION 'dbname=foo host=node2 user=repuser'
> +node3-# PUBLICATION pub_node2
> +node3-# WITH (copy_data = off, only_local = on);
> +CREATE SUBSCRIPTION
> +</programlisting></para>
>
> Using a similar description as in section 31.11.4 this should probably
> also say something like:
> "Use copy_data specified as off because the initial table data would
> have been already copied in the previous step".

Modified

> ~~~
>
> 15. doc/src/sgml/logical-replication.sgml - section 31.11.4
>
> +  <sect2 id="add-node-data-present-in-new-node">
> +   <title>Adding a new node when data is present in the new node</title>
> +   <para>
> +     Adding a new node <literal>node3</literal> to the existing
> +     <literal>node1</literal> and <literal>node2</literal> when data is present
> +     in the new node <literal>node3</literal> needs similar steps. A
> few changes
> +     are required here to get the existing data from <literal>node3</literal>
> +     to <literal>node1</literal> and <literal>node2</literal> and later
> +     cleaning up of data in <literal>node3</literal> before synchronization of
> +     all the data from the existing nodes.
> +   </para>
>
> I think perhaps this should clarify that there is NO data in node1 or
> node2 for this example.

This section contents have been removed because this is not feasible currently.

> ~~~
>
> 16. doc/src/sgml/logical-replication.sgml
>
> +  <sect2 id="generic-steps-add-new-node">
> +   <title>Generic steps to add a new node to the existing set of nodes</title>
>
> I think all the steps in this section should be numbered because the
> order is important.

Modified

> ~~~
>
> 17. doc/src/sgml/logical-replication.sgml
>
> +  <sect2 id="generic-steps-add-new-node">
> +   <title>Generic steps to add a new node to the existing set of nodes</title>
>
> IIUC the word "parameter" should replace the word "option" in all the
> text of this section.

Modified

> ~~~
>
> 18. doc/src/sgml/logical-replication.sgml - Notes
>
> +  <sect2>
> +   <title>Notes</title>
> +   <para>
> +    Setting up bidirectional logical replication across nodes requires multiple
> +    steps to be performed on various nodes, as all operations are not
> +    transactional, user is advised to take backup of existing data to avoid any
> +    inconsistency.
> +   </para>
> +  </sect2>
>
> 18a.
> I thought this "Notes" section should be rendered more like an sgml
> note or a warning. E.g. like the warning on this page [2].

Modified

> 18b.
> SUGGESTION (split the para into multiple sentences). e.g.
> Setting up bidirectional logical replication across nodes requires
> multiple steps to be performed on various nodes. Because not all
> operations are transactional, the user is advised to take backups.

Modified

> ======
>
> 19. doc/src/sgml/ref/alter_subscription.sgml
>
> +         <para>
> +          There is some interaction between the <literal>only_local</literal>
> +          option and <literal>copy_data</literal> option. Refer to the
> +          <command>CREATE SUBSCRIPTION</command>
> +          <xref linkend="sql-createsubscription-notes" /> for interaction
> +          details and usage of <literal>force</literal> for
> +          <literal>copy_data</literal> option.
>           </para>
>
> "option" -> "parameter" (3x)

Modified

> ======
>
> 20. doc/src/sgml/ref/create_subscription.sgml
>
> +         <para>
> +          There is some interaction between the <literal>only_local</literal>
> +          option and <literal>copy_data</literal> option. Refer to the
> +          <xref linkend="sql-createsubscription-notes" /> for interaction
> +          details and usage of <literal>force</literal> for
> +          <literal>copy_data</literal> option.
> +         </para>
>
> "option" -> "parameter" (3x)

Modified

> ======
>
> 21. src/backend/commands/subscriptioncmds.c
>
> @@ -69,6 +69,18 @@
>  /* check if the 'val' has 'bits' set */
>  #define IsSet(val, bits)  (((val) & (bits)) == (bits))
>
> +#define IS_COPY_DATA_ON_OR_FORCE(copy_data) ((copy_data) != COPY_DATA_OFF)
> +
> +/*
> + * Represents whether copy_data option is specified with off, on or force.
> + */
> +typedef enum CopyData
> +{
> + COPY_DATA_OFF,
> + COPY_DATA_ON,
> + COPY_DATA_FORCE
> +} CopyData;
> +
>
> 21a.
> "option" -> "parameter"

Modified

> 21b.
> I think the new #define is redundant and can be removed (see the next
> review comment)

Modified

> 21c.
> I think you set the OFF enum value like:
> typedef enum CopyData
> {
>  COPY_DATA_OFF = 0,
>  COPY_DATA_ON,
>  COPY_DATA_FORCE
> } CopyData;
>
> then it will greatly simplify things; many changes in this file will
> be unnecessary (see subsequent review comments)

Modified

> ~~~
>
> 22. src/backend/commands/subscriptioncmds.c
>
> +/*
> + * Validate the value specified for copy_data option.
> + */
> +static CopyData
> +DefGetCopyData(DefElem *def)
>
> "option" -> "parameter"

Modified

> ~~~
>
> 23. src/backend/commands/subscriptioncmds.c
>
> + /*
> + * Allow 0, 1, "true", "false", "on", "off" or "force".
> + */
> + switch (nodeTag(def->arg))
> + {
> + case T_Integer:
> + switch (intVal(def->arg))
> + {
> + case 0:
> + return COPY_DATA_OFF;
> + case 1:
> + return COPY_DATA_ON;
> + default:
> + /* otherwise, error out below */
> + break;
> + }
> + break;
>
> What about also allowing copy_data = 2, and making it equivalent to "force"?

I felt the existing looks ok, no need to support 2. It might confuse the user.

> ~~~
>
> 24. src/backend/commands/subscriptioncmds.c
>
> @@ -333,17 +400,17 @@ parse_subscription_options(ParseState *pstate,
> List *stmt_options,
>   errmsg("%s and %s are mutually exclusive options",
>   "connect = false", "create_slot = true")));
>
> - if (opts->copy_data &&
> + if (IS_COPY_DATA_ON_OR_FORCE(opts->copy_data) &&
>   IsSet(opts->specified_opts, SUBOPT_COPY_DATA))
>
> I think this change would not be needed if you set the OFF enum = 0.

Modified

> ~~~
> 25. src/backend/commands/subscriptioncmds.c
>
> @@ -671,13 +738,14 @@ CreateSubscription(ParseState *pstate,
> CreateSubscriptionStmt *stmt,
>   * Set sync state based on if we were asked to do data copy or
>   * not.
>   */
> - table_state = opts.copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY;
> + table_state = IS_COPY_DATA_ON_OR_FORCE(opts.copy_data) ?
> SUBREL_STATE_INIT : SUBREL_STATE_READY;
>
> I think this change would not be needed if you set the OFF enum = 0.

Modified

> ~~~
>
> 26. src/backend/commands/subscriptioncmds.c
>
> @@ -720,7 +788,8 @@ CreateSubscription(ParseState *pstate,
> CreateSubscriptionStmt *stmt,
>   * PENDING, to allow ALTER SUBSCRIPTION ... REFRESH
>   * PUBLICATION to work.
>   */
> - if (opts.twophase && !opts.copy_data && tables != NIL)
> + if (opts.twophase && opts.copy_data == COPY_DATA_OFF &&
> + tables != NIL)
>   twophase_enabled = true;
>
> I think this change would not be needed if you set the OFF enum = 0.

Modified

> ~~~
>
> 27. src/backend/commands/subscriptioncmds.c
>
> @@ -851,7 +921,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data,
>   list_length(subrel_states), sizeof(Oid), oid_cmp))
>   {
>   AddSubscriptionRelState(sub->oid, relid,
> - copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
> + IS_COPY_DATA_ON_OR_FORCE(copy_data) ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
>   InvalidXLogRecPtr);
>
> I think this change would not be needed if you set the OFF enum = 0.
>

Modified

> ~~~
> 28. src/backend/commands/subscriptioncmds.c
>
> @@ -1157,7 +1227,7 @@ AlterSubscription(ParseState *pstate,
> AlterSubscriptionStmt *stmt,
>   * See ALTER_SUBSCRIPTION_REFRESH for details why this is
>   * not allowed.
>   */
> - if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data)
> + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
> IS_COPY_DATA_ON_OR_FORCE(opts.copy_data))
>   ereport(ERROR,
>   (errcode(ERRCODE_SYNTAX_ERROR),
>   errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed
> when two_phase is enabled"),
>
> I think this change would not be needed if you set the OFF enum = 0.

Modified

> ~~~
>
> 29. src/backend/commands/subscriptioncmds.c
>
> @@ -1209,7 +1279,7 @@ AlterSubscription(ParseState *pstate,
> AlterSubscriptionStmt *stmt,
>   * See ALTER_SUBSCRIPTION_REFRESH for details why this is
>   * not allowed.
>   */
> - if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data)
> + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
> IS_COPY_DATA_ON_OR_FORCE(opts.copy_data))
>   ereport(ERROR,
>   (errcode(ERRCODE_SYNTAX_ERROR),
>   errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed
> when two_phase is enabled"),
>
> I think this change would not be needed if you set the OFF enum = 0.

Modified

> ~~~
>
> 30. src/backend/commands/subscriptioncmds.c
>
> @@ -1255,7 +1325,8 @@ AlterSubscription(ParseState *pstate,
> AlterSubscriptionStmt *stmt,
>   *
>   * For more details see comments atop worker.c.
>   */
> - if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data)
> + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
> + IS_COPY_DATA_ON_OR_FORCE(opts.copy_data))
>   ereport(ERROR,
>   (errcode(ERRCODE_SYNTAX_ERROR),
>   errmsg("ALTER SUBSCRIPTION ... REFRESH with copy_data is not allowed
> when two_phase is enabled"),
>
> I think this change would not be needed if you set the OFF enum = 0.

Modified

> ~~~
>
> 31. src/backend/commands/subscriptioncmds.c
>
> + appendStringInfoString(&cmd,
> +    "SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename,
> PS.srrelid as replicated\n"
> +    "FROM pg_publication P,\n"
> +    "LATERAL pg_get_publication_tables(P.pubname) GPT\n"
> +    "LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> +    "pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
> +    "WHERE C.oid = GPT.relid AND P.pubname in (");
>
> use upper case "in" -> "IN"

Modified

> ======
>
> 32.  src/test/regress/sql/subscription.sql
>
> @@ -40,6 +40,9 @@ SET SESSION AUTHORIZATION 'regress_subscription_user';
>
>  -- fail - invalid option combinations
>  CREATE SUBSCRIPTION regress_testsub2 CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
> false, copy_data = true);
> +CREATE SUBSCRIPTION regress_testsub2 CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
> false, copy_data = on);
> +CREATE SUBSCRIPTION regress_testsub2 CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
> false, copy_data = 1);
> +CREATE SUBSCRIPTION regress_testsub2 CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
> false, copy_data = force);
>
> 32a.
> The test for "copy_data = 1" should confirm it is the same as "on".

We have the test for copy_data = on and copy_data = 1, both the
results are the same. Isn't that sufficient?

> 32b.
> Should also test copy_data = 2 (and confirm it is the same as "force").

Currently only on, off, true, false, 0 & 1 is supported. 2 is not
supported. Added a test for copy_data as 2

> ======
>
> 33. src/test/subscription/t/032_localonly.pl - cosmetic changes
>
> 33a.
> +# Detach node C from the node-group of (A, B, C) and clean the table contents
> +# from all nodes.
>
> SUGGESTION
> Detach node_C from the node-group of (node_A, node_B, node_C) and ...

Modified

> 33b.
> +# Subroutine to create subscription and wait till the initial sync is
> completed.
>
> "till" -> "until"

Modified

> 33c.
> +# Subroutine to create subscription and wait till the initial sync is
> completed.
> +# Subroutine expects subscriber node, publisher node, subscription name,
> +# destination connection string, publication name and the subscription with
> +# options to be passed as input parameters.
> +sub create_subscription
> +{
> + my ($node_subscriber, $node_publisher, $sub_name, $node_connstr,
> + $pub_name, $with_options)
> +   = @_;
>
> "subscription with options" => "subscription parameters"
> "$with_options" -> "$sub_params"

Modified

> 33d.
> +# Specifying only_local 'on' which indicates that the publisher should only
>
> "Specifying" => "Specify"

Modified

Thanks for the comments, the v17 patch attached at [1] has the changes
for the same.
[1] - https://www.postgresql.org/message-id/CALDaNm1rMihO7daiFyLdxkqArrC%2BdtM61oPXc-XrTYBYnJg3nw%40mail.gmail.com

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Handle infinite recursion in logical replication setup
Next
From: vignesh C
Date:
Subject: Re: Handle infinite recursion in logical replication setup