Re: per backend WAL statistics - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: per backend WAL statistics
Date
Msg-id Z8VzqpBkj6ja5+EN@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: per backend WAL statistics  (Michael Paquier <michael@paquier.xyz>)
Responses Re: per backend WAL statistics
Re: per backend WAL statistics
List pgsql-hackers
Hi,

On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote:
> On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote:
> > Also attaching the patch I mentioned up-thread to address some of Rahila's
> > comments ([1]): It adds a AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
> > and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
> > modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
> > and B_WAL_WRITER.
> 
> Okay by me as it makes the code automatically more flexible if
> pgstat_tracks_backend_bktype() gets tweaked, including the call of
> pgstat_flush_backend() in pgstat_report_wal() so as the WAL writer is
> able to report backend stats for its WAL I/O.  Applied this part as of
> 3f1db99bfabb.

Thanks!

> Something that's still not quite right is that the WAL receiver and
> the WAL summarizer do not call pgstat_report_wal() at all, so we don't
> report much data and we expect these processes to run continuously.
> The location where to report stats for the WAL summarizer is simple,
> even if the system is aggressive with WAL this is never called more
> than a couple of times per seconds, like the WAL writer:
> 
> @@ -1541,6 +1542,10 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
>                   * so we don't tight-loop.
>                   */
>                  HandleWalSummarizerInterrupts();
> +
> +                /* report pending statistics to the cumulative stats system */
> +                pgstat_report_wal(false);
> +
>                  summarizer_wait_for_wal();
> 
> At this location, the WAL summarizer would wait as there is no data to
> read.  The hot path is when we're reading a block.

Did not look closely enough but that sounds right after a quick look.

> The WAL receiver is a different story, because the WaitLatchOrSocket()
> call in the main loop of WalReceiverMain() is *very* aggressive, and
> it's easy to reach this code dozens of times each millisecond.  In
> short, we need to be careful, I think, based on how this is currently
> written.  My choice is then this path:
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -583,6 +583,10 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
>                       */
>                      bool        requestReply = false;
>  
> +                    /* report pending statistics to the cumulative stats system */
> +                    pgstat_report_wal(false);
> +
>                      /*
>                       * Check if time since last receive from prim
> 
> This would update the stats only when the WAL receiver has nothing to
> do or if wal_receiver_status_interval is reached, so we're not going
> to storm pgstats with updates, still we get some data on a periodic
> basis *because* wal_receiver_status_interval would make sure that the
> path is taken even if we're under a lot of WAL pressure when sending
> feedback messages back to the WAL sender.  Of course this needs a
> pretty good comment explaining the choice of this location.  What do
> you think?

Same as above, that sounds right after a quick look.

> > It looks like it does not need doc updates. Attached as 0002 as it's somehow
> > un-related to this thread (but not sure it deserves it's dedicated thread though).
> 
> I'm wondering if we should not lift more the list of processes listed
> in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER,
> B_STARTUP, B_CHECKPOINTER, B_BG_WRITER at this stage, removing the
> entire paragraph.  Not sure if we really have to do that for this
> release, but we could look at that separately.

hm, do you mean update the comment on top of pgstat_tracks_backend_bktype() or
update the documentation?

> With 3f1db99bfabb in place, wouldn't it be simpler to update
> pgstat_report_wal() in v12-0001 so as we use PGSTAT_BACKEND_FLUSH_ALL
> with one call of pgstat_flush_backend()?  This saves one call for each
> stats flush.

hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
that need to be called from pgstat_report_wal(). But I think that's open
door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Suraj Kharage
Date:
Subject: Re: Support for NO INHERIT to INHERIT state change with named NOT NULL constraints
Next
From: Amit Kapila
Date:
Subject: Re: Selectively invalidate caches in pgoutput module