Re: Handle infinite recursion in logical replication setup - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Handle infinite recursion in logical replication setup
Date
Msg-id CAA4eK1KjaLGnvpeFT15qt4U5PXtEGCYMzqA7xFehtKRmuu=fsA@mail.gmail.com
Whole thread Raw
In response to Re: Handle infinite recursion in logical replication setup  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Handle infinite recursion in logical replication setup
List pgsql-hackers
On Wed, Aug 31, 2022 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my review comments for v43-0001.
>
> ======
>
> 1. Commit message
>
> The initial copy phase has no way to know the origin of the row data,
> so if 'copy_data = true' in the step 4 below, log a warning to notify user
> that potentially non-local data might have been copied.
>
> 1a.
> "in the step 4" -> "in step 4"
>
> ~
>
> 1b.
> "notify user" -> "notify the user"
>
> ======
>
> 2. doc/src/sgml/ref/create_subscription.sgml
>
> +  <para>
> +   If the subscription is created with <literal>origin = none</literal> and
> +   <literal>copy_data = true</literal>, it will check if the publisher has
> +   subscribed to the same table from other publishers and, if so, log a
> +   warning to notify the user to check the publisher tables. Before continuing
> +   with other operations the user should check that publisher tables did not
> +   have data with different origins to prevent data inconsistency issues on the
> +   subscriber.
> +  </para>
>
> I am not sure about that wording saying "to prevent data inconsistency
> issues" because I thought maybe is already too late because any
> unwanted origin data is already copied during the initial sync. I
> think the user can do it try to clean up any problems caused before
> things become even worse (??)
>
> ~~~
>
> You asked for my thoughts (see [1] 6b):
>
> > There is nothing much a user can do in this case. Only option would be
> > to take a backup before the operation and restore it and then recreate
> > the replication setup. I was not sure if we should document these
> > steps as similar inconsistent data could get created without the
> > origin option . Thoughts?
>
> I think those cases are a bit different:
>
> - For a normal scenario (i.e. origin=ANY) then inconsistent data just
> refers to problems like maybe PK violations so I think you just get
> what you get and have to deal with the errors...
>
> - But when the user said origin=NONE they are specifically saying they
> do NOT want any non-local data, yet they still might end up getting
> data contrary to their wishes. So I think maybe there ought to be
> something documented to warn about this potentially happening. My
> first impression is that all this seems like a kind of terrible
> problem because if it happens then cleanup could be difficult - if
> that subscriber in turn was also a publisher then this bad data might
> have propagated all over the place already! Anyway, I guess all this
> could be mentioned in some (new ?) section of the
> logical-replication.sgml file (i.e. patch 0002).
>
> IDEA: I was wondering if the documentation should recommend (or maybe
> we can even enforce this) that when using origin=NONE the user should
> always create the subscription with enabled=FALSE. That way if there
> are some warnings about potential bad origin data then there it might
> be possible to take some action or change their mind *before* any
> unwanted data pollutes everything.
>
> ======
>
> 3. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> + if (count == 0)
> + tablenames = makeStringInfo();
> + else
> + appendStringInfoString(tablenames, ", ");
> +
> + appendStringInfo(tablenames, "%s.%s", nspname, relname);
> + count++;
>
> I think the quotes are not correct - each separate table name should be quoted.
>
> ~~~
>
> 4.
>
> + /*
> + * There might be many tables present in this case, we will display a
> + * maximum of 100 tables followed by "..." to indicate there might be
> + * more tables.
> + */
> + if (count == 100)
> + {
> + appendStringInfoString(tablenames, ", ...");
> + break;
> + }
>
> 4a.
> IMO this code should be part of the previous if/else
>
> e.g.
> if (count == 0) ... makeStringInfo()
> else if (count > 100) ... append ellipsis "..." and break out of the loop
> else ... append table name to the message
>
> Doing it in this way means "..." definitely means there are more than
> 100 bad tables.
>
> ~
>
> 4b.
> Changing like above (#4a) simplifies the comment because it removes
> doubts like "might be…" since you already know you found the 101st
> table.
>
> SUGGESTION (comment)
> The message displays a maximum of 100 tables having potentially
> non-local data. Any more than that will be indicated by "...".
>
> ~
>

Are we using this style of an error message anywhere else in the code?
If not, then I think we should avoid it here as well as it appears a
bit adhoc to me in the sense that why restrict at 100 tables. Can we
instead think of displaying the publications list in the message? So
errdetail would be something like: Subscribed publications \"%s\" has
been subscribed from some other publisher. Then we can probably give a
SQL query for a user to find tables that may data from multiple
origins especially if that is not complicated. Also, then we can
change the function name to check_publications_origin().

Few other minor comments on v43-0001*
1.
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("subscription %s requested origin=NONE but might copy data
that had a different origin.",
+ subname),

In error message, we don't use full stop at the end.

2.
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("subscription %s requested origin=NONE but might copy data
that had a different origin.",
+ subname),
+ errdetail("Subscribed table(s) \"%s\" has been subscribed from some
other publisher.",
+   tablenames->data),

It is better to use errdetail_plural here as there could be one or
more objects in this message?

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Instrumented pages/tuples frozen in autovacuum's server log out (and VACUUM VERBOSE)
Next
From: Tom Lane
Date:
Subject: Re: Reducing the chunk header sizes on all memory context types