Re: Add the replication origin name and commit-LSN to logical replication worker errcontext - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add the replication origin name and commit-LSN to logical replication worker errcontext
Date
Msg-id CAD21AoAz76QEWhdbC45qDLXJscejZa2BOSJUKGnXXo6W2mPyQA@mail.gmail.com
Whole thread Raw
In response to Re: Add the replication origin name and commit-LSN to logical replication worker errcontext  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Add the replication origin name and commit-LSN to logical replication worker errcontext
List pgsql-hackers
On Thu, Mar 3, 2022 at 3:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 2, 2022 at 1:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Mar 2, 2022 at 4:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I've attached updated patches.
> >
>
> The first patch LGTM. Some comments on the second patch:
>
> 1.
> @@ -3499,6 +3503,17 @@ ApplyWorkerMain(Datum main_arg)
>   myslotname = MemoryContextStrdup(ApplyContext, syncslotname);
>
>   pfree(syncslotname);
> +
> + /*
> + * Allocate the origin name in long-lived context for error context
> + * message
> + */
> + ReplicationOriginNameForTablesync(MySubscription->oid,
> +   MyLogicalRepWorker->relid,
> +   originname,
> +   sizeof(originname));
> + apply_error_callback_arg.origin_name = MemoryContextStrdup(ApplyContext,
> +    originname);
>
> Can we assign this in LogicalRepSyncTableStart() where we already
> forming the origin name? That will avoid this extra call to
> ReplicationOriginNameForTablesync.

Yes, but it requires to expose either apply_error_callback_arg or the
tablesync's origin name since the tablesync worker sets its origin
name in tablesync.c. I think it's better to avoid exposing them.

>
> 2.
> @@ -3657,30 +3679,37 @@ apply_error_callback(void *arg)
>   errcontext("processing remote data during \"%s\"",
>      logicalrep_message_type(errarg->command));
>   else
> - errcontext("processing remote data during \"%s\" in transaction %u",
> + errcontext("processing remote data during \"%s\" in transaction %u
> committed at LSN %X/%X from replication origin \"%s\"",
>      logicalrep_message_type(errarg->command),
> -    errarg->remote_xid);
> +    errarg->remote_xid,
> +    LSN_FORMAT_ARGS(errarg->commit_lsn),
> +    errarg->origin_name);
>
> Won't we set the origin name before the command? If so, it can be used
> in the first message as well and we can change the condition in the
> beginning such that if the origin or command is not set then we can
> return without adding additional context information.

Good point.

>
> Isn't this error message used for rollback prepared failure as well?
> If so, do we need to say "... finished at LSN ..." instead of "...
> committed at LSN ..."?

Right. Or can we just remove "committed" since the current message is
"transaction %u at %s"? That is , just replace the timestamp with LSN.

>
> The other point about this message is that saying ".... from
> replication origin ..." sounds more like remote information similar to
> publication but the origin is of the local node. How about slightly
> changing it to "processing remote data for replication origin \"%s\"
> during \"%s\" in transaction ..."?

Okay, so the modified message would be like:

"processing remote data for replication origin \"%s\" during \"%s\"
for replication target relation \"%s.%s\" column \"%s\" in transaction
%u finished at LSN %X/%X"

>
> 3.
> +</screen>
> +   The LSN of the transaction that contains the change violating the
> constraint and
> +   the replication origin name can be found from those outputs (LSN
> 0/14C0378 and
> +   replication origin <literal>pg_16395</literal> in the above case).
> The transaction
> +   can be skipped by calling the <link linkend="pg-replication-origin-advance">
>     <function>pg_replication_origin_advance()</function></link> function with
> -   a <parameter>node_name</parameter> corresponding to the subscription name,
> -   and a position.  The current position of origins can be seen in the
> -   <link linkend="view-pg-replication-origin-status">
> +   the <parameter>node_name</parameter> and the next LSN of the commit LSN
> +   (i.e., 0/14C0379) from those outputs.
>
> After node_name, can we specify origin_name similar to what we do for
> LSN as that will make things more clear? Also, shall we mention that
> users need to disable subscriptions to perform replication origin
> advance?

Agreed.

>
> I think for prepared transactions, the user might need to use it twice
> because after skipping prepared xact, it will get an error again
> during the processing of commit prepared (no such prepare exists).

Good point.

>  I
> thought of mentioning it but felt that might lead to specifying too
> many details which can confuse users as well. What do you think?

Given that this method of using pg_replication_origin_advance() is
normally not a preferable way, I think we might not need to mention
it. Also, it needs twice for one transaction but the steps are the
same.

>
> 4. There are places in the patch like apply_handle_stream_start()
> which sets commit_lsn in callback arg as InvalidXLogRecPtr but the
> callback function seems to be assuming that it is always a valid
> value. Shouldn't we need to avoid appending LSN for such cases?

Agreed. Will fix it in the next version patch.

I'm updating the patches and will submit them.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: Problem with moderation of messages with patched attached.
Next
From: Andrew Dunstan
Date:
Subject: Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set