Re: Logical replication timeout problem - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Logical replication timeout problem
Date
Msg-id CAD21AoB4km=SgT0DAJOoKYicVNqvWZMtuQ7D9d0RFq20NK_EEw@mail.gmail.com
Whole thread Raw
In response to Re: Logical replication timeout problem  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Logical replication timeout problem
List pgsql-hackers
On Mon, May 2, 2022 at 11:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 2, 2022 at 7:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Thu, Apr 28, 2022 at 7:01 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > Hi Sawada-san, Wang
> > >
> > > I was looking at the patch and noticed that we moved some logic from
> > > update_replication_progress() to OutputPluginUpdateProgress() like
> > > your suggestion.
> > >
> > > I have a question about this change. In the patch we added some
> > > restriction in this function OutputPluginUpdateProgress() like below:
> > >
> > > + /*
> > > + * If we are at the end of transaction LSN, update progress tracking.
> > > + * Otherwise, after continuously processing CHANGES_THRESHOLD changes, we
> > > + * try to send a keepalive message if required.
> > > + */
> > > + if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD)
> > > + {
> > > + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid,
> > > + skipped_xact);
> > > + changes_count = 0;
> > > + }
> > >
> > > After the patch, we won't be able to always invoke the update_progress() if the
> > > caller are at the middle of transaction(e.g. end_xact = false). The bebavior of the
> > > public function OutputPluginUpdateProgress() is changed. I am thinking is it ok to
> > > change this at back-branches ?
> > >
> > > Because OutputPluginUpdateProgress() is a public function for the extension
> > > developer, just a little concerned about the behavior change here.
> >
> > Good point.
> >
> > As you pointed out, it would not be good if we change the behavior of
> > OutputPluginUpdateProgress() in back branches. Also, after more
> > thought, it is not a good idea even for HEAD since there might be
> > background workers that use logical decoding and the timeout checking
> > might not be relevant at all with them.
> >
>
> So, shall we go back to the previous approach of using a separate
> function update_replication_progress?

Ok, agreed.

>
> > BTW, I think you're concerned about the plugins that call
> > OutputPluginUpdateProgress() at the middle of the transaction (i.e.,
> > end_xact = false). We have the following change that makes the
> > walsender not update the progress at the middle of the transaction. Do
> > you think it is okay since it's not common usage to call
> > OutputPluginUpdateProgress() at the middle of the transaction by the
> > plugin that is used by the walsender?
> >
>
> We have done that purposefully as otherwise, the lag tracker shows
> incorrect information. See email [1]. The reason is that we always get
> ack from subscribers for transaction end. Also, prior to this patch we
> never call the lag tracker recording apart from the transaction end,
> so as a bug fix we shouldn't try to change it.

Make sense.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Logical replication timeout problem
Next
From: Masahiko Sawada
Date:
Subject: Re: Add index scan progress to pg_stat_progress_vacuum