Re: Add pg_stat_recovery system view - Mailing list pgsql-hackers
| From | Xuneng Zhou |
|---|---|
| Subject | Re: Add pg_stat_recovery system view |
| Date | |
| Msg-id | CABPTF7UyUWT0E4_vxzfxPkHy=hqjrO1wcKO_ZCLBq9+en_UpQQ@mail.gmail.com Whole thread |
| In response to | Re: Add pg_stat_recovery system view (Chao Li <li.evan.chao@gmail.com>) |
| List | pgsql-hackers |
Hi, Thanks for looking into this. On Fri, Mar 6, 2026 at 3:32 PM Chao Li <li.evan.chao@gmail.com> wrote: > > > > > On Mar 5, 2026, at 21:01, Xuneng Zhou <xunengzhou@gmail.com> wrote: > > > > Hi, > > > > On Thu, Mar 5, 2026 at 3:37 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > >> > >> Hi Michael, > >> > >> Thanks for the detailed feedback! > >> > >> On Thu, Mar 5, 2026 at 11:58 AM Michael Paquier <michael@paquier.xyz> wrote: > >>> > >>> On Wed, Mar 04, 2026 at 09:02:29AM +0800, Xuneng Zhou wrote: > >>>> Just rebase. > >>> > >>> I have applied 0001, that simply moves some code around. > >>> > >> > >> Thanks for applying. > >> > >>> Regarding 0002, I would recommend to not bump the catalog version in > >>> catversion.h when sending a patch to the lists. This task is left to > >>> committers when the code gets merged into the tree. And this is > >>> annoying for submitters because it can create a lot of conflicts. > >> > >> I'll leave it untouched. > >> > >>> Using a set-returning function is I think wrong here, betraying the > >>> representation of the recovery status as stored in the system. We > >>> know that there is only one state of recovery, fixed in shared memory. > >>> Like the cousins of this new function, let's make thinks non-SRF, > >>> returning one row all the time with PG_RETURN_NULL() if the conditions > >>> for information display are not satisfied. When we are not in > >>> recovery or when the role querying the function is not granted > >>> ROLE_PG_READ_ALL_STATS, that will simplify the patch as there is no > >>> need for the else branch with the nulls, as written in your patch. > >>> The field values are acquired the right way, spinlock acquisitions > >>> have to be short. > >> > >> My earlier thought for keeping pg_stat_get_recovery as an SRF is to > >> make pg_stat_recovery simpler by avoiding an extra filter such as > >> WHERE s.promote_triggered IS NOT NULL to preserve 0/1-row semantics. > >> pg_stat_get_recovery_prefetch also uses the SRF pattern. > >> > >>> Like pg_stat_wal_receiver, let's add to the view definition a qual to > >>> return a row only if the fields are not NULL. > >>> > >>> pg_get_wal_replay_pause_state() displays the pause_state, and it is > >>> not hidden behind the stats read role. I am not really convinced that > >>> this is worth bothering to treat as an exception in this patch. It > >>> makes it a bit more complicated, for little gain. I would just hide > >>> all the fields behind the role granted, to keep the code simpler, even > >>> if that's slightly inconsistent with pg_get_wal_replay_pause_state(). > >> > >> I agree that exposing a subset of columns unconditionally is not worth > >> the added complexity. > >> > >>> After writing this last point, as far as I can see there is a small > >>> optimization possible in the patch. When a role is not granted > >>> ROLE_PG_READ_ALL_STATS, we know that we will not return any > >>> information so we could skip the spinlock acquisition and avoid > >>> spinlock acquisitions when one queries the function but has no access > >>> to its data. > >> > >> Makes sense to me. I'll apply it as suggested. > >> > >>> + True if a promotion has been triggered for this standby server. > >>> > >>> Standby servers are implied if data is returned, this sentence can be > >>> simpler and cut the "standby server" part. > >> > >> + 1 > >> > >>> + Start LSN of the last WAL record replayed during recovery. > >>> [...] > >>> + End LSN of the last WAL record replayed during recovery. > >>> [...] > >>> + Timeline of the last replayed WAL record. > >>> For other system views with LSN information, we don't use "LSN", but > >>> "write-ahead log location". I'd suggest the same term here. These > >>> three fields refer to the last record *successfully* replayed. It > >>> seems important to mention this fact, I guess? > >> > >> I'll replace them. > >> > >>> + <structfield>replay_end_lsn</structfield> <type>pg_lsn</type> > >>> + </para> > >>> + <para> > >>> + Current replay position. When replaying a record, this is the end > >>> + position of the record being replayed; otherwise it equals > >>> + <structfield>last_replayed_end_lsn</structfield>. > >>> > >>> Slightly inexact. This is the end LSN + 1. > >> > >> Yeh, this needs to be corrected. > >> > >>> + <structfield>replay_end_lsn</structfield> <type>pg_lsn</type> > >>> [..] > >>> + <structfield>replay_end_tli</structfield> <type>integer</type> > >>> > >>> Okay, I can fall behind the addition of these two, it could be helpful > >>> in debugging something like a locking issue when replaying a specific > >>> record, I guess. At least I'd want to know what is happening for a > >>> record currently being replayed. It seems to me that this could be > >>> more precise, mentioning that these refer to a record *currently* > >>> being replayed. > >> > >> I will adjust the docs to say these describe the record currently > >> being replayed, with replay_end_lsn being the end position + 1. > >> > >>> Is recoveryLastXTime actually that relevant to have? We use it for > >>> logging and for recovery target times, which is something less > >>> relevant than the other fields, especially if we think about standbys > >>> where these have no targets to reach. > >> > >> I agree it is less central for standby monitoring (and partly overlaps > >> with pg_last_xact_replay_timestamp()), so I’ll remove it from this > >> view in the next revision. > >> > >>> currentChunkStartTime, on the contrary, is much more relevant to me, > >>> due to the fact that we use it in WaitForWALToBecomeAvailable() with > >>> active WAL receivers running. > >> > >> Yeah, it could be useful for apply-delay/catch-up diagnostics. > >> > > > > Here is the updated patch set. Please take a look. > > > > > > -- > > Best, > > Xuneng > > <v3-0001-Add-pg_stat_recovery-system-view.patch><v3-0002-Refactor-move-XLogSource-enum-to-xlogrecovery.h.patch><v3-0003-Add-wal_source-column-to-pg_stat_recovery.patch> > > 0001 has been pushed. > > 0002 is pure refactoring and only moves the structure XLogSource from .h to .c, thus no comment. > > A few comments on 0003. > > 1 > ``` > + /* > + * Source from which the startup process most recently read WAL data. > + * Updated when the startup process successfully reads WAL from a source. > + * Note: this reflects the read source, not the original receipt source; > + * streamed WAL read from local files will show XLOG_FROM_PG_WAL. > + */ > + XLogSource lastReadSource; > ``` > > This comment says, "streamed WAL read from local files will show XLOG_FROM_PG_WAL”. But I don’t see how the conversionhappens. In WaitForWALToBecomeAvailable(), there is a path XLogFileRead() is called with XLOG_FROM_STREAM, butin XLogFileRead(), source is directly assigned to XLogRecoveryCtl->lastReadSource without conversion. > > On the other side, if “stream” is impossible, then the doc should not mention it: > ``` > + <para> > + <literal>stream</literal>: WAL actively being streamed from the > + upstream server. > + </para> > ``` you're right, the comment is wrong. There is no conversion. In WaitForWALToBecomeAvailable(), when the walreceiver has flushed data ahead of the startup process, the code deliberately opens the segment file with XLOG_FROM_STREAM , and the existing comment explains the intent: * Use XLOG_FROM_STREAM so that source info is set correctly * and XLogReceiptTime isn't changed. XLogFileRead() then assigns that directly to lastReadSource without conversion. So XLOG_FROM_STREAM is a valid observable value — it means the segment was opened during active walreceiver streaming, even though the actual I/O is against a local file in pg_wal. I think stream documentation entry should stay. > 2 Related to 1. WRT the new column name wal_source, it sounds like “where WAL is coming from”. Say the comment of lastReadSourceis true, WAL from stream is shown as pg_wal, “wal source” sounds more like stream, and “wal read source” ispg_wal. From this perspective, I just feel the new column name should be something like “wal_read_source”. > agreed, wal_read_source seems to be a better name. It more closely mirrors the underlying lastReadSource field and avoids ambiguity about whether the column describes the delivery mechanism or the read path. Please check the updated patches. -- Best, Xuneng
Attachment
pgsql-hackers by date: