Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAD21AoCO3TED2cSche4iGpYekpyE-rwzput0oDW3YNvoXNkrQQ@mail.gmail.com
Whole thread Raw
In response to Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Fri, Jul 30, 2021 at 12:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Jul 29, 2021 at 2:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > > Yeah, it seems to be introduced by commit 0926e96c493. I've attached
> > > > the patch for that.
> > > >
> > > > Also, I've attached the updated version patches. This version patch
> > > > has pg_stat_reset_subscription_error() SQL function and sends a clear
> > > > message after skipping the transaction. 0004 patch includes the
> > > > skipping transaction feature and introducing RESET to ALTER
> > > > SUBSCRIPTION. It would be better to separate them.
> > > >
>
> +1, to separate out the reset part.

Okay, I'll do that.

>
> > >
> > > I've attached the new version patches that fix cfbot failure.
> >
> > Sorry I've attached wrong ones. Reattached the correct version patches.
> >
>
> Pushed the 0001* patch that removes the unused parameter.

Thanks!

>
> Few comments on v4-0001-Add-errcontext-to-errors-of-the-applying-logical-
> ===========================================================

Thank you for the comments!

> 1.
> --- a/src/backend/commands/tablecmds.c
> +++ b/src/backend/commands/tablecmds.c
> @@ -78,6 +78,7 @@
>  #include "partitioning/partbounds.h"
>  #include "partitioning/partdesc.h"
>  #include "pgstat.h"
> +#include "replication/logicalworker.h"
>  #include "rewrite/rewriteDefine.h"
>  #include "rewrite/rewriteHandler.h"
>  #include "rewrite/rewriteManip.h"
> @@ -1899,6 +1900,9 @@ ExecuteTruncateGuts(List *explicit_rels,
>   continue;
>   }
>
> + /* Set logical replication error callback info if necessary */
> + set_logicalrep_error_context_rel(rel);
> +
>   /*
>   * Build the lists of foreign tables belonging to each foreign server
>   * and pass each list to the foreign data wrapper's callback function,
> @@ -2006,6 +2010,9 @@ ExecuteTruncateGuts(List *explicit_rels,
>   pgstat_count_truncate(rel);
>   }
>
> + /* Reset logical replication error callback info */
> + reset_logicalrep_error_context_rel();
> +
>
> Setting up logical rep error context in a generic function looks a bit
> odd to me. Do we really need to set up error context here? I
> understand we can't do this in caller but anyway I think we are not
> sending this to logical replication view as well, so not sure we need
> to do it here.

Yeah, I'm not convinced of this part yet. I wanted to show relid also
in truncate cases but I came up with only this idea.

If an error happens during truncating the table (in
ExecuteTruncateGuts()), relid set by
set_logicalrep_error_context_rel() is actually sent to the view. If we
don’t have it, the view always shows relid as NULL in truncate cases.
On the other hand, it doesn’t cover all cases. For example, it doesn’t
cover an error that the target table doesn’t exist on the subscriber,
which happens when opening the target table. Anyway, in most cases,
even if relid is NULL, the error message in the view helps users to
know which relation the error happened on. What do you think?

>
> 2.
> +/* Struct for saving and restoring apply information */
> +typedef struct ApplyErrCallbackArg
> +{
> + LogicalRepMsgType command; /* 0 if invalid */
> +
> + /* Local relation information */
> + char    *nspname; /* used for error context */
> + char    *relname; /* used for error context */
> +
> + TransactionId remote_xid;
> + TimestampTz committs;
> +} ApplyErrCallbackArg;
> +static ApplyErrCallbackArg apply_error_callback_arg =
> +{
> + .command = 0,
> + .relname = NULL,
> + .nspname = NULL,
> + .remote_xid = InvalidTransactionId,
> + .committs = 0,
> +};
> +
>
> Better to have a space between the above two declarations.

Will fix.

>
> 3. commit message:
> This commit adds the error context to errors happening during applying
> logical replication changes, showing the command, the relation
> relation, transaction ID, and commit timestamp in the server log.
>
> 'relation' is mentioned twice.

Will fix.

>
> The patch is not getting applied probably due to yesterday's commit in
> this area.

Okay. I'll rebase the patches to the current HEAD.

I'm incorporating all comments from you and Houzj, and will submit the
new patch soon.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Next
From: Masahiko Sawada
Date:
Subject: Re: Skipping logical replication transactions on subscriber side