Re: Logical replication timeout problem - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Logical replication timeout problem
Date
Msg-id CAA4eK1JBT_ZKY5xSG_cwD7EOb26GrK2_CS1YPBgxE7rc69Y3jg@mail.gmail.com
Whole thread Raw
In response to Re: Logical replication timeout problem  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Logical replication timeout problem
List pgsql-hackers
On Mon, May 2, 2022 at 7:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Apr 28, 2022 at 7:01 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Hi Sawada-san, Wang
> >
> > I was looking at the patch and noticed that we moved some logic from
> > update_replication_progress() to OutputPluginUpdateProgress() like
> > your suggestion.
> >
> > I have a question about this change. In the patch we added some
> > restriction in this function OutputPluginUpdateProgress() like below:
> >
> > + /*
> > + * If we are at the end of transaction LSN, update progress tracking.
> > + * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we
> > + * try to send a keepalive message if required.
> > + */
> > + if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD)
> > + {
> > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid,
> > + skipped_xact);
> > + changes_count = 0;
> > + }
> >
> > After the patch, we won't be able to always invoke the update_progress() if the
> > caller are at the middle of transaction(e.g. end_xact = false). The bebavior of the
> > public function OutputPluginUpdateProgress() is changed. I am thinking is it ok to
> > change this at back-branches ?
> >
> > Because OutputPluginUpdateProgress() is a public function for the extension
> > developer, just a little concerned about the behavior change here.
>
> Good point.
>
> As you pointed out, it would not be good if we change the behavior of
> OutputPluginUpdateProgress() in back branches. Also, after more
> thought, it is not a good idea even for HEAD since there might be
> background workers that use logical decoding and the timeout checking
> might not be relevant at all with them.
>

So, shall we go back to the previous approach of using a separate
function update_replication_progress?

> BTW, I think you're concerned about the plugins that call
> OutputPluginUpdateProgress() at the middle of the transaction (i.e.,
> end_xact = false). We have the following change that makes the
> walsender not update the progress at the middle of the transaction. Do
> you think it is okay since it's not common usage to call
> OutputPluginUpdateProgress() at the middle of the transaction by the
> plugin that is used by the walsender?
>

We have done that purposefully as otherwise, the lag tracker shows
incorrect information. See email [1]. The reason is that we always get
ack from subscribers for transaction end. Also, prior to this patch we
never call the lag tracker recording apart from the transaction end,
so as a bug fix we shouldn't try to change it.

[1] -
https://www.postgresql.org/message-id/OS3PR01MB62755D216245199554DDC8DB9EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: ERROR: type of parameter 1 (fruit2) does not match that when preparing the plan (fruit1)
Next
From: Masahiko Sawada
Date:
Subject: Re: Logical replication timeout problem