Re: Add the replication origin name and commit-LSN to logical replication worker errcontext - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Add the replication origin name and commit-LSN to logical replication worker errcontext |
Date | |
Msg-id | CAA4eK1+V51Rcy1Pg0u2srqKvG1u_6RUz4=MuC0Wt3+smoObn9w@mail.gmail.com Whole thread Raw |
In response to | Re: Add the replication origin name and commit-LSN to logical replication worker errcontext (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Add the replication origin name and commit-LSN to logical replication worker errcontext
|
List | pgsql-hackers |
On Wed, Mar 2, 2022 at 1:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Mar 2, 2022 at 4:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I've attached updated patches. > The first patch LGTM. Some comments on the second patch: 1. @@ -3499,6 +3503,17 @@ ApplyWorkerMain(Datum main_arg) myslotname = MemoryContextStrdup(ApplyContext, syncslotname); pfree(syncslotname); + + /* + * Allocate the origin name in long-lived context for error context + * message + */ + ReplicationOriginNameForTablesync(MySubscription->oid, + MyLogicalRepWorker->relid, + originname, + sizeof(originname)); + apply_error_callback_arg.origin_name = MemoryContextStrdup(ApplyContext, + originname); Can we assign this in LogicalRepSyncTableStart() where we already forming the origin name? That will avoid this extra call to ReplicationOriginNameForTablesync. 2. @@ -3657,30 +3679,37 @@ apply_error_callback(void *arg) errcontext("processing remote data during \"%s\"", logicalrep_message_type(errarg->command)); else - errcontext("processing remote data during \"%s\" in transaction %u", + errcontext("processing remote data during \"%s\" in transaction %u committed at LSN %X/%X from replication origin \"%s\"", logicalrep_message_type(errarg->command), - errarg->remote_xid); + errarg->remote_xid, + LSN_FORMAT_ARGS(errarg->commit_lsn), + errarg->origin_name); Won't we set the origin name before the command? If so, it can be used in the first message as well and we can change the condition in the beginning such that if the origin or command is not set then we can return without adding additional context information. Isn't this error message used for rollback prepared failure as well? If so, do we need to say "... finished at LSN ..." instead of "... committed at LSN ..."? The other point about this message is that saying ".... from replication origin ..." sounds more like remote information similar to publication but the origin is of the local node. How about slightly changing it to "processing remote data for replication origin \"%s\" during \"%s\" in transaction ..."? 3. +</screen> + The LSN of the transaction that contains the change violating the constraint and + the replication origin name can be found from those outputs (LSN 0/14C0378 and + replication origin <literal>pg_16395</literal> in the above case). The transaction + can be skipped by calling the <link linkend="pg-replication-origin-advance"> <function>pg_replication_origin_advance()</function></link> function with - a <parameter>node_name</parameter> corresponding to the subscription name, - and a position. The current position of origins can be seen in the - <link linkend="view-pg-replication-origin-status"> + the <parameter>node_name</parameter> and the next LSN of the commit LSN + (i.e., 0/14C0379) from those outputs. After node_name, can we specify origin_name similar to what we do for LSN as that will make things more clear? Also, shall we mention that users need to disable subscriptions to perform replication origin advance? I think for prepared transactions, the user might need to use it twice because after skipping prepared xact, it will get an error again during the processing of commit prepared (no such prepare exists). I thought of mentioning it but felt that might lead to specifying too many details which can confuse users as well. What do you think? 4. There are places in the patch like apply_handle_stream_start() which sets commit_lsn in callback arg as InvalidXLogRecPtr but the callback function seems to be assuming that it is always a valid value. Shouldn't we need to avoid appending LSN for such cases? -- With Regards, Amit Kapila.
pgsql-hackers by date: