Home > mailing lists
Re: Add pg_stat_recovery system view - Mailing list pgsql-hackers

From	Chao Li
Subject	Re: Add pg_stat_recovery system view
Date	March 6 10:31:36
Msg-id	3198603F-8323-4616-888C-EF77D0D0165B@gmail.com Whole thread Raw
In response to	Re: Add pg_stat_recovery system view (Xuneng Zhou <xunengzhou@gmail.com>)
Responses	Re: Add pg_stat_recovery system view
List	pgsql-hackers
Tree view

> On Mar 5, 2026, at 21:01, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Thu, Mar 5, 2026 at 3:37 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>>
>> Hi Michael,
>>
>> Thanks for the detailed feedback!
>>
>> On Thu, Mar 5, 2026 at 11:58 AM Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Wed, Mar 04, 2026 at 09:02:29AM +0800, Xuneng Zhou wrote:
>>>> Just rebase.
>>>
>>> I have applied 0001, that simply moves some code around.
>>>
>>
>> Thanks for applying.
>>
>>> Regarding 0002, I would recommend to not bump the catalog version in
>>> catversion.h when sending a patch to the lists.  This task is left to
>>> committers when the code gets merged into the tree.  And this is
>>> annoying for submitters because it can create a lot of conflicts.
>>
>> I'll leave it untouched.
>>
>>> Using a set-returning function is I think wrong here, betraying the
>>> representation of the recovery status as stored in the system.  We
>>> know that there is only one state of recovery, fixed in shared memory.
>>> Like the cousins of this new function, let's make thinks non-SRF,
>>> returning one row all the time with PG_RETURN_NULL() if the conditions
>>> for information display are not satisfied.  When we are not in
>>> recovery or when the role querying the function is not granted
>>> ROLE_PG_READ_ALL_STATS, that will simplify the patch as there is no
>>> need for the else branch with the nulls, as written in your patch.
>>> The field values are acquired the right way, spinlock acquisitions
>>> have to be short.
>>
>> My earlier thought for keeping pg_stat_get_recovery as an SRF is to
>> make pg_stat_recovery simpler by avoiding an extra filter such as
>> WHERE s.promote_triggered IS NOT NULL to preserve 0/1-row semantics.
>> pg_stat_get_recovery_prefetch also uses the SRF pattern.
>>
>>> Like pg_stat_wal_receiver, let's add to the view definition a qual to
>>> return a row only if the fields are not NULL.
>>>
>>> pg_get_wal_replay_pause_state() displays the pause_state, and it is
>>> not hidden behind the stats read role.  I am not really convinced that
>>> this is worth bothering to treat as an exception in this patch.  It
>>> makes it a bit more complicated, for little gain.  I would just hide
>>> all the fields behind the role granted, to keep the code simpler, even
>>> if that's slightly inconsistent with pg_get_wal_replay_pause_state().
>>
>> I agree that exposing a subset of columns unconditionally is not worth
>> the added complexity.
>>
>>> After writing this last point, as far as I can see there is a small
>>> optimization possible in the patch.  When a role is not granted
>>> ROLE_PG_READ_ALL_STATS, we know that we will not return any
>>> information so we could skip the spinlock acquisition and avoid
>>> spinlock acquisitions when one queries the function but has no access
>>> to its data.
>>
>> Makes sense to me. I'll apply it as suggested.
>>
>>> +       True if a promotion has been triggered for this standby server.
>>>
>>> Standby servers are implied if data is returned, this sentence can be
>>> simpler and cut the "standby server" part.
>>
>> + 1
>>
>>> +       Start LSN of the last WAL record replayed during recovery.
>>> [...]
>>> +       End LSN of the last WAL record replayed during recovery.
>>> [...]
>>> +       Timeline of the last replayed WAL record.
>>> For other system views with LSN information, we don't use "LSN", but
>>> "write-ahead log location".  I'd suggest the same term here.  These
>>> three fields refer to the last record *successfully* replayed.  It
>>> seems important to mention this fact, I guess?
>>
>> I'll replace them.
>>
>>> +       <structfield>replay_end_lsn</structfield> <type>pg_lsn</type>
>>> +      </para>
>>> +      <para>
>>> +       Current replay position. When replaying a record, this is the end
>>> +       position of the record being replayed; otherwise it equals
>>> +       <structfield>last_replayed_end_lsn</structfield>.
>>>
>>> Slightly inexact.  This is the end LSN + 1.
>>
>> Yeh, this needs to be corrected.
>>
>>> +       <structfield>replay_end_lsn</structfield> <type>pg_lsn</type>
>>> [..]
>>> +       <structfield>replay_end_tli</structfield> <type>integer</type>
>>>
>>> Okay, I can fall behind the addition of these two, it could be helpful
>>> in debugging something like a locking issue when replaying a specific
>>> record, I guess.  At least I'd want to know what is happening for a
>>> record currently being replayed.  It seems to me that this could be
>>> more precise, mentioning that these refer to a record *currently*
>>> being replayed.
>>
>> I will adjust the docs to say these describe the record currently
>> being replayed, with replay_end_lsn being the end position + 1.
>>
>>> Is recoveryLastXTime actually that relevant to have?  We use it for
>>> logging and for recovery target times, which is something less
>>> relevant than the other fields, especially if we think about standbys
>>> where these have no targets to reach.
>>
>> I agree it is less central for standby monitoring (and partly overlaps
>> with pg_last_xact_replay_timestamp()), so I’ll remove it from this
>> view in the next revision.
>>
>>> currentChunkStartTime, on the contrary, is much more relevant to me,
>>> due to the fact that we use it in WaitForWALToBecomeAvailable() with
>>> active WAL receivers running.
>>
>> Yeah, it could be useful for apply-delay/catch-up diagnostics.
>>
>
> Here is the updated patch set. Please take a look.
>
>
> --
> Best,
> Xuneng
>
<v3-0001-Add-pg_stat_recovery-system-view.patch><v3-0002-Refactor-move-XLogSource-enum-to-xlogrecovery.h.patch><v3-0003-Add-wal_source-column-to-pg_stat_recovery.patch>

0001 has been pushed.

0002 is pure refactoring and only moves the structure XLogSource from .h to .c, thus no comment.

A few comments on 0003.

1
```
+    /*
+     * Source from which the startup process most recently read WAL data.
+     * Updated when the startup process successfully reads WAL from a source.
+     * Note: this reflects the read source, not the original receipt source;
+     * streamed WAL read from local files will show XLOG_FROM_PG_WAL.
+     */
+    XLogSource    lastReadSource;
```

This comment says, "streamed WAL read from local files will show XLOG_FROM_PG_WAL”. But I don’t see how the conversion
happens.In WaitForWALToBecomeAvailable(), there is a path XLogFileRead() is called with XLOG_FROM_STREAM, but in
XLogFileRead(),source is directly assigned to XLogRecoveryCtl->lastReadSource without conversion. 

On the other side, if “stream” is impossible, then the doc should not mention it:
```
+         <para>
+          <literal>stream</literal>: WAL actively being streamed from the
+          upstream server.
+         </para>
```


2 Related to 1. WRT the new column name wal_source, it sounds like “where WAL is coming from”. Say the comment of
lastReadSourceis true, WAL from stream is shown as pg_wal, “wal source” sounds more like stream, and “wal read source”
ispg_wal. From this perspective, I just feel the new column name should be something like “wal_read_source”. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
pgsql-hackers by date:
From: Madyshev Egor
Date: 06 March, 10:26:44
Subject: RE: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)
From: "慈超云"
Date: 06 March, 10:34:16
Subject: [PATCH] Remove unused is_error parameter from TeardownHistoricSnapshot()
Re: Add pg_stat_recovery system view - Mailing list pgsql-hackers

Previous

Next