Re: Add max_wal_replay_size connection parameter to libpq - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject Re: Add max_wal_replay_size connection parameter to libpq
Date
Msg-id CAHg+QDffk2NSTTvobAqqBmpN+DTZDJ3cdwsi5XxOUmY-MdKgwA@mail.gmail.com
In response to Re: Add max_wal_replay_size connection parameter to libpq  (Jim Jones <jim.jones@uni-muenster.de>)
Responses Re: Add max_wal_replay_size connection parameter to libpq
List pgsql-hackers
Hi,

On Sun, Mar 29, 2026 at 11:53 AM Jim Jones <jim.jones@uni-muenster.de> wrote:


On 29/03/2026 20:31, SATYANARAYANA NARLAPURAM wrote:
> What if none of them meets the criteria? You fail the connection?
> Wouldn't it cause an availability issue?


Yes, the connection fails if no host meets the threshold. This is
intentional, and it is consistent with the existing behaviour of
target_session_attrs: if you set target_session_attrs=standby and no
standby is reachable, the connection fails too.


>     If pg_last_wal_receive_lsn() is NULL (e.g. no active WAL receiver due to
>     missing primary_conninfo or a disconnected upstream), the backlog cannot
>     be determined. In that case, the standby is treated as exceeding the
>     threshold and is skipped.
>
>
> When a standby is replaying archiving log, it can still be caught up.
> This doesn't seem right to me.


I totally see your point here. The issue is that
pg_last_wal_receive_lsn() returns NULL when there is no WAL receiver
process -- regardless of how current the data actually is. Without a
receive LSN, the metric this parameter is based on (receive_lsn -
replay_lsn) is simply undefined for that standby.

Please let me know if I am missing something here.
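
For illustration, the metric discussed above can be sketched as follows. This is only a rough sketch, not the patch's code: the helper names (parse_lsn, replay_backlog, host_is_acceptable) and the byte-based threshold argument are hypothetical, and the two LSN values are assumed to have been fetched from the standby already, e.g. via pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn().

```python
from typing import Optional


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' into a 64-bit byte position.

    The text form is two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replay_backlog(receive_lsn: Optional[str],
                   replay_lsn: Optional[str]) -> Optional[int]:
    """Return received-but-not-yet-replayed WAL in bytes.

    Returns None when either LSN is NULL (None here), because
    receive_lsn - replay_lsn is undefined in that case.
    """
    if receive_lsn is None or replay_lsn is None:
        return None
    return parse_lsn(receive_lsn) - parse_lsn(replay_lsn)


def host_is_acceptable(receive_lsn: Optional[str],
                       replay_lsn: Optional[str],
                       max_backlog_bytes: int) -> bool:
    """Hypothetical check: skip the host if the backlog is unknown or too large."""
    backlog = replay_backlog(receive_lsn, replay_lsn)
    if backlog is None:
        # No WAL receiver: treated as exceeding the threshold, as described above.
        return False
    return backlog <= max_backlog_bytes
```

With this sketch, a standby that replays only from the archive has no receive LSN and is always skipped, however current its data is, which is the case being questioned in this thread.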


>
>     This parameter measures only the apply lag on the standby itself, i.e.,
>     how much already-received WAL remains to be replayed. It does not
>     attempt to measure how far the standby is behind the primary. In
>     particular, a standby that is slow to receive WAL but fast to replay it
>     may report a small backlog here while still being significantly behind.
>
>
> IMHO, this change appears to not meet the objective of routing
> connections/queries to the most up-to-date standby.


The parameter's objective is not to route to the most up-to-date
standby; it is to skip standbys whose apply lag exceeds a given threshold.

What is the expected outcome of such routing? Is it to give the client fresher data, or
to keep user connections off a lagging standby so that it can catch up with the primary?
The paragraph in the original description was about freshness.
 
Thanks,
Satya
