Re: Logical replication timeout problem - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Logical replication timeout problem
Date
Msg-id CAExHW5u93iKb_DPvXG=FcOPOMU7=tCYUbGeha7-qgxZLiu0JhA@mail.gmail.com
Whole thread Raw
In response to RE: Logical replication timeout problem  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
List pgsql-hackers
On Mon, Jan 9, 2023 at 4:08 PM wangw.fnst@fujitsu.com
<wangw.fnst@fujitsu.com> wrote:
>
> On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> I'm not sure if we need to add global variables or member variables for a
> cumulative count that is only used here. How would you feel if I add some
> comments when declaring this static variable?

I see WalSndUpdateProgress::sendTime is static already. So this seems
fine. A comment will help sure.

>
> > +
> > +    if (!ctx->update_progress)
> > +        return;
> > +
> > +    Assert(!ctx->fast_forward);
> > +
> > +    /* set output state */
> > +    ctx->accept_writes = false;
> > +    ctx->write_xid = txn->xid;
> > +    ctx->write_location = change->lsn;
> > +    ctx->end_xact = false;
> >
> > This patch reverts many of the changes of the previous commit which tried to
> > fix this issue i.e. 55558df2374. end_xact was introduced by the same commit but
> > without much explanation of that in the commit message. Its only user,
> > WalSndUpdateProgress(), is probably making a wrong assumption as well.
> >
> >      * We don't have a mechanism to get the ack for any LSN other than end
> >      * xact LSN from the downstream. So, we track lag only for end of
> >      * transaction LSN.
> >
> > IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr which is
> > sent downstream through a keep alive message. Downstream may
> > acknowledge this
> > LSN. So we do get ack for any LSN, not just commit LSN.
> >
> > So I propose removing end_xact as well.
>
> We didn't track the lag during a transaction because it could make the
> calculations of lag functionality inaccurate. If we track every lsn, it could
> fail to record important lsn information because of
> WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress).
> Please see details in [1] and [2].

LagTrackerRead() interpolates to reduce the inaccuracy. I don't
understand why we need to track the end LSN only. But I don't think
that affects this fix. So I am fine if we want to leave end_xact
there.

-- 
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Next
From: Tom Lane
Date:
Subject: Re: Rethinking the implementation of ts_headline()