Re: Show WAL write and fsync stats in pg_stat_io - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Show WAL write and fsync stats in pg_stat_io
Date
Msg-id ZYkfSkorj6OgD6ZM@paquier.xyz
Whole thread Raw
In response to Re: Show WAL write and fsync stats in pg_stat_io  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Show WAL write and fsync stats in pg_stat_io
List pgsql-hackers
On Sat, Dec 16, 2023 at 08:20:57PM +0100, Michael Paquier wrote:
> One thing that 0001 missed is an update of the header where the
> function is declared.  I've edited a few things, and applied it to
> start on this stuff.  The rest will have to wait a bit more..

I have been reviewing the whole, and spotted a couple of issues.

+     * At the end of the if case, accumulate time for the pg_stat_io.
+     */
+    if (pgstat_should_track_io_time(io_object, io_context))

There was a bug here.  WAL operations can do IOOP_WRITE or IOOP_READ,
and this would cause pgstat_count_buffer_read_time() and
pgstat_count_buffer_write_time() to be called, incrementing
pgStatBlock{Read,Write}Time, which would be incorrect when it comes to
a WAL page or a WAL segment.  I was wondering what to do here first,
but we could just avoid calling these routines when working on an
IOOBJECT_WAL as that's the only object not doing a buffer operation.

A comment at the top of pgstat_tracks_io_bktype() is incorrect,
because this patch adds the WAL writer sender in the I/O tracking.

+           case B_WAL_RECEIVER:
+           case B_WAL_SENDER:
+           case B_WAL_WRITER:
+               return false;

pgstat_tracks_io_op() now needs B_WAL_SUMMARIZER.

pgstat_should_track_io_time() is used only in pgstat_io.c, so it can
be static rather than published in pgstat.h.

pgstat_tracks_io_bktype() does not look correct to me.  Why is the WAL
receiver considered as something correct in the list of backend types,
while the intention is to *not* add it to pg_stat_io?  I have tried to
switche to the correct behavior of returning false for a
B_WAL_RECEIVER, to notice that pg_rewind's test 002_databases.pl
freezes on its shutdown sequence.  Something weird is going on here.
Could you look at it?  See the XXX comment in the attached, which is
the same behavior as v6-0002.  It looks to me that the patch has
introduced an infinite loop tweaking pgstat_tracks_io_bktype() in an
incorrect way to avoid the root issue.

I have also spent more time polishing the rest, touching a few things
while reviewing.  Not sure that I see a point in splitting the tests
from the main patch.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Should "CRC" be in uppercase?
Next
From: Michael Paquier
Date:
Subject: Re: Show WAL write and fsync stats in pg_stat_io