Re: pg_stat_replication.*_lag sometimes shows NULL during active replication - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: pg_stat_replication.*_lag sometimes shows NULL during active replication
Msg-id CAHGQGwEmMBBAE0RG-R3_LacfT4fbB55qGE6n9O5mNwrqvbNBtw@mail.gmail.com
In response to Re: pg_stat_replication.*_lag sometimes shows NULL during active replication  (Shinya Kato <shinya11.kato@gmail.com>)
Responses Re: pg_stat_replication.*_lag sometimes shows NULL during active replication
List pgsql-hackers
On Tue, Mar 10, 2026 at 10:02 AM Shinya Kato <shinya11.kato@gmail.com> wrote:
>
> On Mon, Mar 9, 2026 at 8:21 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> > > The attached v2 patch takes a different approach: it additionally
> > > requires that all reported positions (write/flush/apply) remain
> > > unchanged from the previous reply. This directly detects a truly idle
> > > system without relying on timeouts—if any position has advanced, new
> > > WAL activity must have occurred, so we should not clear the lag values
> > > even if the lag tracker is empty.
> >
> > This approach looks good to me.
>
> Thank you for looking into this.
>
> > One comment: currently, the lag becomes NULL basically after about one
> > wal_receiver_status_interval during periods of no activity. OTOH, with this
> > approach, it seems it would take about twice wal_receiver_status_interval.
> > Is this understanding correct?
>
> Exactly. With this patch, it takes about two
> wal_receiver_status_interval cycles to show NULL instead of one. I
> think this is an acceptable trade-off because it is better to take a
> bit longer to detect inactivity than to incorrectly show NULL during
> active replication.

Even with your latest patch, if we remove fullyAppliedLastTime and set
clearLagTimes to true whenever applyPtr == sentPtr && noLagSamples &&
positionsUnchanged holds, wouldn't the time for the lag to become NULL
be almost the same as one wal_receiver_status_interval?
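
To make the comparison concrete, here is a minimal standalone sketch in C.
It is not the patch and not the walsender code: the Reply and State structs,
the helper names, and the way the v2 patch is assumed to combine
fullyAppliedLastTime with the new conditions are all hypothetical
simplifications. It only counts how many idle status replies arrive before
the lag would be cleared under each rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr; this whole file is a simplified model. */
typedef uint64_t XLogRecPtr;

/* Hypothetical summary of one standby reply as seen by the walsender. */
typedef struct Reply
{
    XLogRecPtr writePtr;
    XLogRecPtr flushPtr;
    XLogRecPtr applyPtr;
    bool       noLagSamples;    /* lag tracker had nothing to report */
} Reply;

/* Hypothetical per-walsender state carried between replies. */
typedef struct State
{
    Reply prev;                 /* positions from the previous reply */
    bool  fullyAppliedLastTime;
} State;

typedef bool (*DecideFn) (State *s, const Reply *r, XLogRecPtr sentPtr);

/* applyPtr == sentPtr && noLagSamples && positionsUnchanged */
static bool
is_idle(const State *s, const Reply *r, XLogRecPtr sentPtr)
{
    bool positionsUnchanged = r->writePtr == s->prev.writePtr &&
                              r->flushPtr == s->prev.flushPtr &&
                              r->applyPtr == s->prev.applyPtr;

    return r->applyPtr == sentPtr && r->noLagSamples && positionsUnchanged;
}

/* Assumed shape of v2: the idle condition must hold on two consecutive replies. */
static bool
clear_with_flag(State *s, const Reply *r, XLogRecPtr sentPtr)
{
    bool idle = is_idle(s, r, sentPtr);
    bool clearLagTimes = idle && s->fullyAppliedLastTime;

    s->fullyAppliedLastTime = idle;
    s->prev = *r;
    return clearLagTimes;
}

/* Suggested variant: drop fullyAppliedLastTime, clear as soon as the condition holds. */
static bool
clear_without_flag(State *s, const Reply *r, XLogRecPtr sentPtr)
{
    bool clearLagTimes = is_idle(s, r, sentPtr);

    s->prev = *r;
    return clearLagTimes;
}

/*
 * Reply 0 reports the last advance (50 -> 100); every later reply is an idle
 * status message with the same positions.  Returns the index of the first
 * reply that would clear the lag, or -1 if none does within ten replies.
 */
static int
replies_until_clear(DecideFn decide)
{
    State      s = {{50, 50, 50, false}, false};
    Reply      r = {100, 100, 100, true};
    XLogRecPtr sentPtr = 100;

    assert(decide != NULL);
    for (int i = 0; i < 10; i++)
    {
        if (decide(&s, &r, sentPtr))
            return i;
    }
    return -1;
}
```

Under these assumptions, replies_until_clear(clear_with_flag) yields 2 and
replies_until_clear(clear_without_flag) yields 1, i.e. roughly two status
intervals versus one after the last position change.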

The documentation doesn't clearly specify how long it should take for
the lag to become NULL, so doubling that time might be acceptable.
However, if we can keep it roughly the same without much complexity,
I think that would be preferable.

Thoughts?

--
Fujii Masao


