Re: per backend WAL statistics - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: per backend WAL statistics
Date
Msg-id Z8a+XFNm5RV4LTy1@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: per backend WAL statistics  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

On Tue, Mar 04, 2025 at 09:28:27AM +0900, Michael Paquier wrote:
> On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
> > On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote:
> >> Something that's still not quite right is that the WAL receiver and
> >> the WAL summarizer do not call pgstat_report_wal() at all, so we don't
> >> report much data and we expect these processes to run continuously.
> >> The location where to report stats for the WAL summarizer is simple,
> >> even if the system is aggressive with WAL this is never called more
> >> than a couple of times per seconds, like the WAL writer:
> > 
> > Same as above, that sounds right after a quick look.
> 
> Attached is a patch for this set of issues for the WAL receiver, the
> WAL summarizer and the WAL writer.

Thanks for the patch!

=== 1

@@ -1543,6 +1544,9 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
                                HandleWalSummarizerInterrupts();
                                summarizer_wait_for_wal();

+                               /* report pending statistics to the cumulative stats system */
+                               pgstat_report_wal(false);

s/report/Report/ and s/system/system./? to be consistent with the other single
line comments around.

=== 2

+  /*
+   * Report pending statistics to the cumulative stats
+   * system. This location is useful for the report as it is
+   * not within a tight loop in the WAL receiver, which
+   * would bloat requests to pgstats, while also making sure
+   * that the reports happen at least each time a status
+   * update is sent.
+   */

Yeah, I also think that's the right location.

Nit: s/would/could/?

=== 3

+       /*
+        * Some BackendTypes only perform IO under IOOBJECT_WAL, hence exclude all
+        * rows for all the other objects for these.
+        */
+       if ((bktype == B_WAL_SUMMARIZER || bktype == B_WAL_RECEIVER ||
+                bktype == B_WAL_WRITER) && io_object != IOOBJECT_WAL)
+               return false;

I think that makes sense and it removes 15 lines out of 86. This function is
"hard" to read/parse from my point of view. Maybe we could re-write it in a
simpler way but that's outside the purpose of this thread. 

=== 4

+   WHERE backend_type = 'walsummarizer' AND object = 'wal'});

The object = 'wal' is not needed (thanks to === 3), maybe we can remove this
filtering?

Also what about adding a test to check that sum(reads) is NULL where object != 'wal'?

=== 5

Same remark as above for the WAL receiver (excepts that sum(writes) is NULL where
object != 'wal').

> All that should be fixed before looking at the remaining patch for the
> WAL stats at backend level

Not sure as that would also prevent the other backend types to report their WAL
statistics if the above is not fixed. 

>, so what do you think about the attached?

That's pretty straightforward, so yeah we can wait that it goes in before
moving forward with the WAL stats at backend level.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: bug: ALTER TABLE ADD VIRTUAL GENERATED COLUMN with table rewrite
Next
From: Peter Eisentraut
Date:
Subject: Re: Accidental assignment instead of compare in GetOperatorFromCompareType?