Re: Handle infinite recursion in logical replication setup - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Handle infinite recursion in logical replication setup |
Date | |
Msg-id | CALDaNm3iLLxP4OV+yQHs-c71P6zQ9W8D30DGsve1SQs_1pFsSQ@mail.gmail.com Whole thread Raw |
In response to | Re: Handle infinite recursion in logical replication setup (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
On Thu, May 26, 2022 at 2:06 PM Peter Smith <smithpb2250@gmail.com> wrote: > > This post completes my review of patch v15*. > > ====== > > 1. A general comment affecting both patches > > I realised the things defined in the subscription WITH are referred to > in the PG DOCS as "parameters" (not "options"). See CREATE > SUBSCRIPTION [1]. This is already reflected in some of my review > comments for the v15-0002 below., but I'd already posted my v15-0001 > review comments before noticing this. > > Please also go through the v15-0001 patch to make all the necessary > terminology changes (including in the v15-0001 commit message). Modified > ===== > > Please find below review comments for patch v15-0002. > > 2. Commit message > > This patch does couple of things: > change 1) Added force option for copy_data. > change 2) Check and throw an error if the publication tables were also > subscribing data in the publisher from other publishers. > > -- > > 2a. > "couple" -> "a couple" > "Added force" -> "Adds force ..." > "Check and throw" -> "Checks and throws ..." Modified > 2b. > Isn't "Change 2)" only applicable when copy_data and only_local are > on? It should say so here. Modified > ~~~ > > 3. Commit message > > node1: Table t1 (c1 int) has data 1, 2, 3, 4 > node2: Table t1 (c1 int) has data 5, 6, 7, 8 > > -- > > I feel these kinds of replication examples are always much easier to > visualize when the data somehow indicates where it came from. E.g. > consider using different data in your example like: > > node1: Table t1 (c1 int) has data 11, 12, 13, 14 > node2: Table t1 (c1 int) has data 21, 22, 23, 24 Modified > ~~~ > > 4. Commit message > > Examples in this message exceed 80 chars. Please wrap the SQL appropriately Modified > ~~~ > > 5. Commit message > > After this the data will be something like this: > node1: > 1, 2, 3, 4, 5, 6, 7, 8 > > node2: > 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8 > > So, you can see that data on node2 (5, 6, 7, 8) is duplicated. > > -- > > Yeah, but is that entirely correct though? AFAIK the scenario > described by this example is going to recurse forever. Modified > ~~~ > > 6. Commit message > > This problem can be solved by using only_local and copy_data option as > given below > > -- > > "option" -> "parameters" Modified > ~~~ > > 7. Commit message > > There are other minor review comments but it's too messy to report in > detail here. I suggest checking all the ":" (some were missing) and > the letter case of/within the sentences. I also suggest > cutting/pasting all this text into MSWord or some grammar checker to > correct anything else reported. Modified > ~~~ > > 8. Commit message > > CREATE SUBSCRIPTION sub_node2_node1 CONNECTION '<node1 details>' > PUBLICATION pub_node1 WITH (copy_data = force, only_local = on); > ERROR: CREATE/ALTER SUBSCRIPTION with only_local and copy_data as > true is not allowed when the publisher might have replicated data > > -- > > Looks like a typo: e.g. If copy_data is "force" then how is the error happening? Modified > ~~~ > > 9. Commit message > > The examples seem confused/muddled in some way IMO. > > IIUC the new ERROR is like a new integrity check for when the > subscription is using the only_local = on, along with copy_data = on. > > And so then the copy_data = force was added as a way to override that > previous error being reported. > > So unless you know about the possible error then the "force" option > makes no sense. So I think you need to change the order of the > explanations in the commit message to describe the new error first. Modified > ====== > > 10. doc/src/sgml/logical-replication.sgml > > + > + <sect1 id="bidirectional-logical-replication"> > + <title>Bidirectional logical replication</title> > + > + <sect2 id="setting-bidirectional-replication-two-nodes"> > + <title>Setting bidirectional replication between two nodes</title> > + <para> > + Bidirectional replication is useful in creating a multi-master database > + which helps in performing read/write operations from any of the nodes. > + Setting up bidirectional logical replication between two nodes requires > + creation of a publication in all the nodes, creating subscriptions in > + each of the nodes that subscribes to data from all the nodes. The steps > + to create a two-node bidirectional replication when there is no data in > + both the nodes are given below: > + </para> > > This description is confusing. It seems to be trying to say something > about a multi-master system (e.g. you say "all the nodes") yet this is > all written inside the section about "... two nodes", so saying "all > the nodes" makes no sense here. > > I think you need to split this information and put some kind of blurb > text at the top level (section 31.11), and then this section (for "two > nodes") should ONLY be talking about the case when there are just TWO > nodes. Modified > ~~~ > > 11. doc/src/sgml/logical-replication.sgml > > + <para> > + Lock the required tables in <literal>node1</literal> and > + <literal>node2</literal> till the setup is completed. > + </para> > > "till" -> "until" (make this same change in multiple places) Modified > ~~~ > > 12. doc/src/sgml/logical-replication.sgml > > + <sect2 id="add-new-node-data-in-existing-node"> > + <title>Adding a new node when data is present in the existing nodes</title> > + <para> > + Adding a new node <literal>node3</literal> to the existing > + <literal>node1</literal> and <literal>node2</literal> when data is present > + in existing nodes <literal>node1</literal> and <literal>node2</literal> > + needs similar steps. The only change required here is that > + <literal>node3</literal> should create a subscription with > + <literal>copy_data = force</literal> to one of the existing nodes to > + receive the existing data during initial data synchronization. > + </para> > > I think perhaps this should clarify that there is NO data in node3 for > this example. Modified > ~~~ > > 13. doc/src/sgml/logical-replication.sgml - section 31.11.3 > > + <para> > + Lock the required tables in <literal>node2</literal> and > + <literal>node3</literal> till the setup is completed. > + </para> > > Should you describe why you say no lock is required for tables on > node1? (Similar information is in section 31.11.4 also) Modified > ~~~ > > 14. doc/src/sgml/logical-replication.sgml - section 31.11.3 > > + <para> > + Create a subscription in <literal>node3</literal> to subscribe to > + <literal>node2</literal>: > +<programlisting> > +node3=# CREATE SUBSCRIPTION sub_node3_node2 > +node3-# CONNECTION 'dbname=foo host=node2 user=repuser' > +node3-# PUBLICATION pub_node2 > +node3-# WITH (copy_data = off, only_local = on); > +CREATE SUBSCRIPTION > +</programlisting></para> > > Using a similar description as in section 31.11.4 this should probably > also say something like: > "Use copy_data specified as off because the initial table data would > have been already copied in the previous step". Modified > ~~~ > > 15. doc/src/sgml/logical-replication.sgml - section 31.11.4 > > + <sect2 id="add-node-data-present-in-new-node"> > + <title>Adding a new node when data is present in the new node</title> > + <para> > + Adding a new node <literal>node3</literal> to the existing > + <literal>node1</literal> and <literal>node2</literal> when data is present > + in the new node <literal>node3</literal> needs similar steps. A > few changes > + are required here to get the existing data from <literal>node3</literal> > + to <literal>node1</literal> and <literal>node2</literal> and later > + cleaning up of data in <literal>node3</literal> before synchronization of > + all the data from the existing nodes. > + </para> > > I think perhaps this should clarify that there is NO data in node1 or > node2 for this example. This section contents have been removed because this is not feasible currently. > ~~~ > > 16. doc/src/sgml/logical-replication.sgml > > + <sect2 id="generic-steps-add-new-node"> > + <title>Generic steps to add a new node to the existing set of nodes</title> > > I think all the steps in this section should be numbered because the > order is important. Modified > ~~~ > > 17. doc/src/sgml/logical-replication.sgml > > + <sect2 id="generic-steps-add-new-node"> > + <title>Generic steps to add a new node to the existing set of nodes</title> > > IIUC the word "parameter" should replace the word "option" in all the > text of this section. Modified > ~~~ > > 18. doc/src/sgml/logical-replication.sgml - Notes > > + <sect2> > + <title>Notes</title> > + <para> > + Setting up bidirectional logical replication across nodes requires multiple > + steps to be performed on various nodes, as all operations are not > + transactional, user is advised to take backup of existing data to avoid any > + inconsistency. > + </para> > + </sect2> > > 18a. > I thought this "Notes" section should be rendered more like an sgml > note or a warning. E.g. like the warning on this page [2]. Modified > 18b. > SUGGESTION (split the para into multiple sentences). e.g. > Setting up bidirectional logical replication across nodes requires > multiple steps to be performed on various nodes. Because not all > operations are transactional, the user is advised to take backups. Modified > ====== > > 19. doc/src/sgml/ref/alter_subscription.sgml > > + <para> > + There is some interaction between the <literal>only_local</literal> > + option and <literal>copy_data</literal> option. Refer to the > + <command>CREATE SUBSCRIPTION</command> > + <xref linkend="sql-createsubscription-notes" /> for interaction > + details and usage of <literal>force</literal> for > + <literal>copy_data</literal> option. > </para> > > "option" -> "parameter" (3x) Modified > ====== > > 20. doc/src/sgml/ref/create_subscription.sgml > > + <para> > + There is some interaction between the <literal>only_local</literal> > + option and <literal>copy_data</literal> option. Refer to the > + <xref linkend="sql-createsubscription-notes" /> for interaction > + details and usage of <literal>force</literal> for > + <literal>copy_data</literal> option. > + </para> > > "option" -> "parameter" (3x) Modified > ====== > > 21. src/backend/commands/subscriptioncmds.c > > @@ -69,6 +69,18 @@ > /* check if the 'val' has 'bits' set */ > #define IsSet(val, bits) (((val) & (bits)) == (bits)) > > +#define IS_COPY_DATA_ON_OR_FORCE(copy_data) ((copy_data) != COPY_DATA_OFF) > + > +/* > + * Represents whether copy_data option is specified with off, on or force. > + */ > +typedef enum CopyData > +{ > + COPY_DATA_OFF, > + COPY_DATA_ON, > + COPY_DATA_FORCE > +} CopyData; > + > > 21a. > "option" -> "parameter" Modified > 21b. > I think the new #define is redundant and can be removed (see the next > review comment) Modified > 21c. > I think you set the OFF enum value like: > typedef enum CopyData > { > COPY_DATA_OFF = 0, > COPY_DATA_ON, > COPY_DATA_FORCE > } CopyData; > > then it will greatly simplify things; many changes in this file will > be unnecessary (see subsequent review comments) Modified > ~~~ > > 22. src/backend/commands/subscriptioncmds.c > > +/* > + * Validate the value specified for copy_data option. > + */ > +static CopyData > +DefGetCopyData(DefElem *def) > > "option" -> "parameter" Modified > ~~~ > > 23. src/backend/commands/subscriptioncmds.c > > + /* > + * Allow 0, 1, "true", "false", "on", "off" or "force". > + */ > + switch (nodeTag(def->arg)) > + { > + case T_Integer: > + switch (intVal(def->arg)) > + { > + case 0: > + return COPY_DATA_OFF; > + case 1: > + return COPY_DATA_ON; > + default: > + /* otherwise, error out below */ > + break; > + } > + break; > > What about also allowing copy_data = 2, and making it equivalent to "force"? I felt the existing looks ok, no need to support 2. It might confuse the user. > ~~~ > > 24. src/backend/commands/subscriptioncmds.c > > @@ -333,17 +400,17 @@ parse_subscription_options(ParseState *pstate, > List *stmt_options, > errmsg("%s and %s are mutually exclusive options", > "connect = false", "create_slot = true"))); > > - if (opts->copy_data && > + if (IS_COPY_DATA_ON_OR_FORCE(opts->copy_data) && > IsSet(opts->specified_opts, SUBOPT_COPY_DATA)) > > I think this change would not be needed if you set the OFF enum = 0. Modified > ~~~ > 25. src/backend/commands/subscriptioncmds.c > > @@ -671,13 +738,14 @@ CreateSubscription(ParseState *pstate, > CreateSubscriptionStmt *stmt, > * Set sync state based on if we were asked to do data copy or > * not. > */ > - table_state = opts.copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY; > + table_state = IS_COPY_DATA_ON_OR_FORCE(opts.copy_data) ? > SUBREL_STATE_INIT : SUBREL_STATE_READY; > > I think this change would not be needed if you set the OFF enum = 0. Modified > ~~~ > > 26. src/backend/commands/subscriptioncmds.c > > @@ -720,7 +788,8 @@ CreateSubscription(ParseState *pstate, > CreateSubscriptionStmt *stmt, > * PENDING, to allow ALTER SUBSCRIPTION ... REFRESH > * PUBLICATION to work. > */ > - if (opts.twophase && !opts.copy_data && tables != NIL) > + if (opts.twophase && opts.copy_data == COPY_DATA_OFF && > + tables != NIL) > twophase_enabled = true; > > I think this change would not be needed if you set the OFF enum = 0. Modified > ~~~ > > 27. src/backend/commands/subscriptioncmds.c > > @@ -851,7 +921,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data, > list_length(subrel_states), sizeof(Oid), oid_cmp)) > { > AddSubscriptionRelState(sub->oid, relid, > - copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY, > + IS_COPY_DATA_ON_OR_FORCE(copy_data) ? SUBREL_STATE_INIT : SUBREL_STATE_READY, > InvalidXLogRecPtr); > > I think this change would not be needed if you set the OFF enum = 0. > Modified > ~~~ > 28. src/backend/commands/subscriptioncmds.c > > @@ -1157,7 +1227,7 @@ AlterSubscription(ParseState *pstate, > AlterSubscriptionStmt *stmt, > * See ALTER_SUBSCRIPTION_REFRESH for details why this is > * not allowed. > */ > - if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data) > + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && > IS_COPY_DATA_ON_OR_FORCE(opts.copy_data)) > ereport(ERROR, > (errcode(ERRCODE_SYNTAX_ERROR), > errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed > when two_phase is enabled"), > > I think this change would not be needed if you set the OFF enum = 0. Modified > ~~~ > > 29. src/backend/commands/subscriptioncmds.c > > @@ -1209,7 +1279,7 @@ AlterSubscription(ParseState *pstate, > AlterSubscriptionStmt *stmt, > * See ALTER_SUBSCRIPTION_REFRESH for details why this is > * not allowed. > */ > - if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data) > + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && > IS_COPY_DATA_ON_OR_FORCE(opts.copy_data)) > ereport(ERROR, > (errcode(ERRCODE_SYNTAX_ERROR), > errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed > when two_phase is enabled"), > > I think this change would not be needed if you set the OFF enum = 0. Modified > ~~~ > > 30. src/backend/commands/subscriptioncmds.c > > @@ -1255,7 +1325,8 @@ AlterSubscription(ParseState *pstate, > AlterSubscriptionStmt *stmt, > * > * For more details see comments atop worker.c. > */ > - if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data) > + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && > + IS_COPY_DATA_ON_OR_FORCE(opts.copy_data)) > ereport(ERROR, > (errcode(ERRCODE_SYNTAX_ERROR), > errmsg("ALTER SUBSCRIPTION ... REFRESH with copy_data is not allowed > when two_phase is enabled"), > > I think this change would not be needed if you set the OFF enum = 0. Modified > ~~~ > > 31. src/backend/commands/subscriptioncmds.c > > + appendStringInfoString(&cmd, > + "SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename, > PS.srrelid as replicated\n" > + "FROM pg_publication P,\n" > + "LATERAL pg_get_publication_tables(P.pubname) GPT\n" > + "LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n" > + "pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n" > + "WHERE C.oid = GPT.relid AND P.pubname in ("); > > use upper case "in" -> "IN" Modified > ====== > > 32. src/test/regress/sql/subscription.sql > > @@ -40,6 +40,9 @@ SET SESSION AUTHORIZATION 'regress_subscription_user'; > > -- fail - invalid option combinations > CREATE SUBSCRIPTION regress_testsub2 CONNECTION > 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = > false, copy_data = true); > +CREATE SUBSCRIPTION regress_testsub2 CONNECTION > 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = > false, copy_data = on); > +CREATE SUBSCRIPTION regress_testsub2 CONNECTION > 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = > false, copy_data = 1); > +CREATE SUBSCRIPTION regress_testsub2 CONNECTION > 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = > false, copy_data = force); > > 32a. > The test for "copy_data = 1" should confirm it is the same as "on". We have the test for copy_data = on and copy_data = 1, both the results are the same. Isn't that sufficient? > 32b. > Should also test copy_data = 2 (and confirm it is the same as "force"). Currently only on, off, true, false, 0 & 1 is supported. 2 is not supported. Added a test for copy_data as 2 > ====== > > 33. src/test/subscription/t/032_localonly.pl - cosmetic changes > > 33a. > +# Detach node C from the node-group of (A, B, C) and clean the table contents > +# from all nodes. > > SUGGESTION > Detach node_C from the node-group of (node_A, node_B, node_C) and ... Modified > 33b. > +# Subroutine to create subscription and wait till the initial sync is > completed. > > "till" -> "until" Modified > 33c. > +# Subroutine to create subscription and wait till the initial sync is > completed. > +# Subroutine expects subscriber node, publisher node, subscription name, > +# destination connection string, publication name and the subscription with > +# options to be passed as input parameters. > +sub create_subscription > +{ > + my ($node_subscriber, $node_publisher, $sub_name, $node_connstr, > + $pub_name, $with_options) > + = @_; > > "subscription with options" => "subscription parameters" > "$with_options" -> "$sub_params" Modified > 33d. > +# Specifying only_local 'on' which indicates that the publisher should only > > "Specifying" => "Specify" Modified Thanks for the comments, the v17 patch attached at [1] has the changes for the same. [1] - https://www.postgresql.org/message-id/CALDaNm1rMihO7daiFyLdxkqArrC%2BdtM61oPXc-XrTYBYnJg3nw%40mail.gmail.com Regards, Vignesh
pgsql-hackers by date: