Re: Add the replication origin name and commit-LSN to logical replication worker errcontext - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Add the replication origin name and commit-LSN to logical replication worker errcontext |
Date | |
Msg-id | CAD21AoAz76QEWhdbC45qDLXJscejZa2BOSJUKGnXXo6W2mPyQA@mail.gmail.com Whole thread Raw |
In response to | Re: Add the replication origin name and commit-LSN to logical replication worker errcontext (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Add the replication origin name and commit-LSN to logical replication worker errcontext
|
List | pgsql-hackers |
On Thu, Mar 3, 2022 at 3:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Mar 2, 2022 at 1:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Mar 2, 2022 at 4:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I've attached updated patches. > > > > The first patch LGTM. Some comments on the second patch: > > 1. > @@ -3499,6 +3503,17 @@ ApplyWorkerMain(Datum main_arg) > myslotname = MemoryContextStrdup(ApplyContext, syncslotname); > > pfree(syncslotname); > + > + /* > + * Allocate the origin name in long-lived context for error context > + * message > + */ > + ReplicationOriginNameForTablesync(MySubscription->oid, > + MyLogicalRepWorker->relid, > + originname, > + sizeof(originname)); > + apply_error_callback_arg.origin_name = MemoryContextStrdup(ApplyContext, > + originname); > > Can we assign this in LogicalRepSyncTableStart() where we already > forming the origin name? That will avoid this extra call to > ReplicationOriginNameForTablesync. Yes, but it requires to expose either apply_error_callback_arg or the tablesync's origin name since the tablesync worker sets its origin name in tablesync.c. I think it's better to avoid exposing them. > > 2. > @@ -3657,30 +3679,37 @@ apply_error_callback(void *arg) > errcontext("processing remote data during \"%s\"", > logicalrep_message_type(errarg->command)); > else > - errcontext("processing remote data during \"%s\" in transaction %u", > + errcontext("processing remote data during \"%s\" in transaction %u > committed at LSN %X/%X from replication origin \"%s\"", > logicalrep_message_type(errarg->command), > - errarg->remote_xid); > + errarg->remote_xid, > + LSN_FORMAT_ARGS(errarg->commit_lsn), > + errarg->origin_name); > > Won't we set the origin name before the command? If so, it can be used > in the first message as well and we can change the condition in the > beginning such that if the origin or command is not set then we can > return without adding additional context information. Good point. > > Isn't this error message used for rollback prepared failure as well? > If so, do we need to say "... finished at LSN ..." instead of "... > committed at LSN ..."? Right. Or can we just remove "committed" since the current message is "transaction %u at %s"? That is , just replace the timestamp with LSN. > > The other point about this message is that saying ".... from > replication origin ..." sounds more like remote information similar to > publication but the origin is of the local node. How about slightly > changing it to "processing remote data for replication origin \"%s\" > during \"%s\" in transaction ..."? Okay, so the modified message would be like: "processing remote data for replication origin \"%s\" during \"%s\" for replication target relation \"%s.%s\" column \"%s\" in transaction %u finished at LSN %X/%X" > > 3. > +</screen> > + The LSN of the transaction that contains the change violating the > constraint and > + the replication origin name can be found from those outputs (LSN > 0/14C0378 and > + replication origin <literal>pg_16395</literal> in the above case). > The transaction > + can be skipped by calling the <link linkend="pg-replication-origin-advance"> > <function>pg_replication_origin_advance()</function></link> function with > - a <parameter>node_name</parameter> corresponding to the subscription name, > - and a position. The current position of origins can be seen in the > - <link linkend="view-pg-replication-origin-status"> > + the <parameter>node_name</parameter> and the next LSN of the commit LSN > + (i.e., 0/14C0379) from those outputs. > > After node_name, can we specify origin_name similar to what we do for > LSN as that will make things more clear? Also, shall we mention that > users need to disable subscriptions to perform replication origin > advance? Agreed. > > I think for prepared transactions, the user might need to use it twice > because after skipping prepared xact, it will get an error again > during the processing of commit prepared (no such prepare exists). Good point. > I > thought of mentioning it but felt that might lead to specifying too > many details which can confuse users as well. What do you think? Given that this method of using pg_replication_origin_advance() is normally not a preferable way, I think we might not need to mention it. Also, it needs twice for one transaction but the steps are the same. > > 4. There are places in the patch like apply_handle_stream_start() > which sets commit_lsn in callback arg as InvalidXLogRecPtr but the > callback function seems to be assuming that it is always a valid > value. Shouldn't we need to avoid appending LSN for such cases? Agreed. Will fix it in the next version patch. I'm updating the patches and will submit them. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
pgsql-hackers by date: