Re: per backend WAL statistics - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: per backend WAL statistics |
Date | |
Msg-id | Z8UKZyVSHUUQJHNb@paquier.xyz Whole thread Raw |
In response to | Re: per backend WAL statistics (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
Responses |
Re: per backend WAL statistics
|
List | pgsql-hackers |
On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote: > Also attaching the patch I mentioned up-thread to address some of Rahila's > comments ([1]): It adds a AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid() > and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a > modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER > and B_WAL_WRITER. Okay by me as it makes the code automatically more flexible if pgstat_tracks_backend_bktype() gets tweaked, including the call of pgstat_flush_backend() in pgstat_report_wal() so as the WAL writer is able to report backend stats for its WAL I/O. Applied this part as of 3f1db99bfabb. Something that's still not quite right is that the WAL receiver and the WAL summarizer do not call pgstat_report_wal() at all, so we don't report much data and we expect these processes to run continuously. The location where to report stats for the WAL summarizer is simple, even if the system is aggressive with WAL this is never called more than a couple of times per seconds, like the WAL writer: @@ -1541,6 +1542,10 @@ summarizer_read_local_xlog_page(XLogReaderState *state, * so we don't tight-loop. */ HandleWalSummarizerInterrupts(); + + /* report pending statistics to the cumulative stats system */ + pgstat_report_wal(false); + summarizer_wait_for_wal(); At this location, the WAL summarizer would wait as there is no data to read. The hot path is when we're reading a block. The WAL receiver is a different story, because the WaitLatchOrSocket() call in the main loop of WalReceiverMain() is *very* aggressive, and it's easy to reach this code dozens of times each millisecond. In short, we need to be careful, I think, based on how this is currently written. My choice is then this path: --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -583,6 +583,10 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len) */ bool requestReply = false; + /* report pending statistics to the cumulative stats system */ + pgstat_report_wal(false); + /* * Check if time since last receive from prim This would update the stats only when the WAL receiver has nothing to do or if wal_receiver_status_interval is reached, so we're not going to storm pgstats with updates, still we get some data on a periodic basis *because* wal_receiver_status_interval would make sure that the path is taken even if we're under a lot of WAL pressure when sending feedback messages back to the WAL sender. Of course this needs a pretty good comment explaining the choice of this location. What do you think? > It looks like it does not need doc updates. Attached as 0002 as it's somehow > un-related to this thread (but not sure it deserves it's dedicated thread though). I'm wondering if we should not lift more the list of processes listed in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER, B_STARTUP, B_CHECKPOINTER, B_BG_WRITER at this stage, removing the entire paragraph. Not sure if we really have to do that for this release, but we could look at that separately. With 3f1db99bfabb in place, wouldn't it be simpler to update pgstat_report_wal() in v12-0001 so as we use PGSTAT_BACKEND_FLUSH_ALL with one call of pgstat_flush_backend()? This saves one call for each stats flush. -- Michael
Attachment
pgsql-hackers by date: