Re: per backend WAL statistics - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: per backend WAL statistics
Date
Msg-id Z8UKZyVSHUUQJHNb@paquier.xyz
Whole thread Raw
In response to Re: per backend WAL statistics  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: per backend WAL statistics
List pgsql-hackers
On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote:
> Also attaching the patch I mentioned up-thread to address some of Rahila's
> comments ([1]): It adds a AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
> and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
> modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
> and B_WAL_WRITER.

Okay by me as it makes the code automatically more flexible if
pgstat_tracks_backend_bktype() gets tweaked, including the call of
pgstat_flush_backend() in pgstat_report_wal() so as the WAL writer is
able to report backend stats for its WAL I/O.  Applied this part as of
3f1db99bfabb.

Something that's still not quite right is that the WAL receiver and
the WAL summarizer do not call pgstat_report_wal() at all, so we don't
report much data and we expect these processes to run continuously.
The location where to report stats for the WAL summarizer is simple,
even if the system is aggressive with WAL this is never called more
than a couple of times per seconds, like the WAL writer:

@@ -1541,6 +1542,10 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
                  * so we don't tight-loop.
                  */
                 HandleWalSummarizerInterrupts();
+
+                /* report pending statistics to the cumulative stats system */
+                pgstat_report_wal(false);
+
                 summarizer_wait_for_wal();

At this location, the WAL summarizer would wait as there is no data to
read.  The hot path is when we're reading a block.

The WAL receiver is a different story, because the WaitLatchOrSocket()
call in the main loop of WalReceiverMain() is *very* aggressive, and
it's easy to reach this code dozens of times each millisecond.  In
short, we need to be careful, I think, based on how this is currently
written.  My choice is then this path:
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -583,6 +583,10 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
                      */
                     bool        requestReply = false;

+                    /* report pending statistics to the cumulative stats system */
+                    pgstat_report_wal(false);
+
                     /*
                      * Check if time since last receive from prim

This would update the stats only when the WAL receiver has nothing to
do or if wal_receiver_status_interval is reached, so we're not going
to storm pgstats with updates, still we get some data on a periodic
basis *because* wal_receiver_status_interval would make sure that the
path is taken even if we're under a lot of WAL pressure when sending
feedback messages back to the WAL sender.  Of course this needs a
pretty good comment explaining the choice of this location.  What do
you think?

> It looks like it does not need doc updates. Attached as 0002 as it's somehow
> un-related to this thread (but not sure it deserves it's dedicated thread though).

I'm wondering if we should not lift more the list of processes listed
in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER,
B_STARTUP, B_CHECKPOINTER, B_BG_WRITER at this stage, removing the
entire paragraph.  Not sure if we really have to do that for this
release, but we could look at that separately.

With 3f1db99bfabb in place, wouldn't it be simpler to update
pgstat_report_wal() in v12-0001 so as we use PGSTAT_BACKEND_FLUSH_ALL
with one call of pgstat_flush_backend()?  This saves one call for each
stats flush.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Jacob Brazeal
Date:
Subject: Fixing various typos in comments and docs
Next
From: Michael Paquier
Date:
Subject: Re: [BUG]: the walsender does not update its IO statistics until it exits