pgsql: Fix premature NULL lag reporting in pg_stat_replication - Mailing list pgsql-committers

From Fujii Masao
Subject pgsql: Fix premature NULL lag reporting in pg_stat_replication
Date
Msg-id E1w5jHV-001XI8-1m@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Fix premature NULL lag reporting in pg_stat_replication

pg_stat_replication is documented to keep the last measured lag values for
a short time after the standby catches up, and then set them to NULL when
there is no WAL activity. However, previously lag values could become NULL
prematurely even while WAL activity was ongoing, especially in logical
replication.

This happened because the code cleared lag when two consecutive reply messages
indicated that the apply location had caught up with the send location.
It did not verify that the reported positions were unchanged, so lag could be
cleared even when positions had advanced between messages. In logical
replication, where the apply location often quickly catches up, this issue was
more likely to occur.

This commit fixes the issue by clearing lag only when the standby reports that
it has fully replayed WAL (i.e., both flush and apply locations have caught up
with the send location) and the write/flush/apply positions remain unchanged
across two consecutive reply messages.

The second message with unchanged positions typically results from
wal_receiver_status_interval, so lag values are cleared after that interval
when there is no activity. This avoids showing stale lag data while preventing
premature NULL values.

Even with this fix, lag may rarely become NULL during activity if identical
position reports are sent repeatedly. Eliminating such duplicate messages
would address this fully, but that change is considered too invasive for stable
branches and will be handled in master only later.

Backpatch to all supported branches.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAOzEurTzcUrEzrH97DD7+Yz=HGPU81kzWQonKZvqBwYhx2G9_A@mail.gmail.com
Backpatch-through: 14

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/bf7ecf35318d432e6661c3b79e0160ff733eef7b

Modified Files
--------------
src/backend/replication/walsender.c | 37 ++++++++++++++++++++-----------------
1 file changed, 20 insertions(+), 17 deletions(-)


pgsql-committers by date:

Previous
From: Fujii Masao
Date:
Subject: pgsql: Fix premature NULL lag reporting in pg_stat_replication
Next
From: Fujii Masao
Date:
Subject: pgsql: Avoid sending duplicate WAL locations in standby status replies