Re: Show WAL write and fsync stats in pg_stat_io - Mailing list pgsql-hackers
From | Nazir Bilal Yavuz |
---|---|
Subject | Re: Show WAL write and fsync stats in pg_stat_io |
Date | |
Msg-id | CAN55FZ2obn1N09Hgf9mBWg5jmHuoJQ_2gOrtXTJ4Bi37di-_7Q@mail.gmail.com Whole thread Raw |
In response to | Re: Show WAL write and fsync stats in pg_stat_io (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Show WAL write and fsync stats in pg_stat_io
|
List | pgsql-hackers |
Hi, Thanks for the review and feedback on your previous reply! On Mon, 25 Dec 2023 at 09:40, Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Dec 25, 2023 at 03:20:58PM +0900, Michael Paquier wrote: > > pgstat_tracks_io_bktype() does not look correct to me. Why is the WAL > > receiver considered as something correct in the list of backend types, > > while the intention is to *not* add it to pg_stat_io? I have tried to > > switche to the correct behavior of returning false for a > > B_WAL_RECEIVER, to notice that pg_rewind's test 002_databases.pl > > freezes on its shutdown sequence. Something weird is going on here. > > Could you look at it? See the XXX comment in the attached, which is > > the same behavior as v6-0002. It looks to me that the patch has > > introduced an infinite loop tweaking pgstat_tracks_io_bktype() in an > > incorrect way to avoid the root issue. > > Ah, that's because it would trigger an assertion failure: > TRAP: failed Assert("pgstat_tracks_io_op(MyBackendType, io_object, > io_context, io_op)"), File: "pgstat_io.c", Line: 89, PID: 6824 > postgres: standby_local: walreceiver > (ExceptionalCondition+0xa8)[0x560d1b4dd38a] > > And the backtrace just tells that this is the WAL receiver > initializing a WAL segment: > #5 0x0000560d1b3322c8 in pgstat_count_io_op_n > (io_object=IOOBJECT_WAL, io_context=IOCONTEXT_INIT, io_op=IOOP_WRITE, > cnt=1) at pgstat_io.c:89 > #6 0x0000560d1b33254a in pgstat_count_io_op_time > (io_object=IOOBJECT_WAL, io_context=IOCONTEXT_INIT, io_op=IOOP_WRITE, > start_time=..., cnt=1) at pgstat_io.c:181 > #7 0x0000560d1ae7f932 in XLogFileInitInternal (logsegno=3, logtli=1, > added=0x7ffd2733c6eb, path=0x7ffd2733c2e0 "pg_wal/00000001", '0' > <repeats 15 times>, "3") at xlog.c:3115 > #8 0x0000560d1ae7fc4e in XLogFileInit (logsegno=3, logtli=1) at > xlog.c:3215 Correct. > > Wouldn't it be simpler to just bite the bullet in this case and handle > WAL receivers in the IO tracking? There is one problem and I couldn't decide how to solve it. We need to handle read IO in WALRead() in xlogreader.c. How many bytes the WALRead() function will read is controlled by a variable and it can be different from XLOG_BLCKSZ. This is a problem because pg_stat_io's op_bytes column is a constant. Here are all WALRead() function calls: 1- read_local_xlog_page_guts() in xlogutils.c => WALRead(XLOG_BLCKSZ) => always reads XLOG_BLCKSZ. 2- summarizer_read_local_xlog_page() in walsummarizer.c => WALRead(XLOG_BLCKSZ) => always reads XLOG_BLCKSZ. 3- logical_read_xlog_page() in walsender.c => WALRead(XLOG_BLCKSZ) => always reads XLOG_BLCKSZ. 4- XLogSendPhysical() in walsender.c => WALRead(nbytes) => nbytes can be different from XLOG_BLCKSZ. 5- WALDumpReadPage() in pg_waldump.c => WALRead(count) => count can be different from XLOG_BLCKSZ. 4 and 5 are the problematic calls. Melanie's answer to this problem on previous discussions: On Wed, 9 Aug 2023 at 21:52, Melanie Plageman <melanieplageman@gmail.com> wrote: > > If there is any combination of BackendType and IOContext which will > always read XLOG_BLCKSZ bytes, we could use XLOG_BLCKSZ for that row's > op_bytes. For other cases, we may have to consider using op_bytes 1 and > tracking reads and write IOOps in number of bytes (instead of number of > pages). I don't actually know if there is a clear separation by > BackendType for these different cases. Using op_bytes as 1 solves this problem but since it will be different from the rest of the pg_stat_io view it could be hard to understand. There is no clear separation by backends as it can be seen from the walsender. > > The other alternative I see is to use XLOG_BLCKSZ as the op_bytes and > treat op_bytes * number of reads as an approximation of the number of > bytes read. I don't actually know what makes more sense. I don't think I > would like having a number for bytes that is not accurate. Also, we have a similar problem in XLogPageRead() in xlogrecovery.c. pg_pread() call tries to read XLOG_BLCKSZ but it is not certain and we don't count IO if it couldn't read XLOG_BLCKSZ. IMO, this is not as important as the previous problem but it still is a problem. I would be glad to hear opinions on these problems. -- Regards, Nazir Bilal Yavuz Microsoft
pgsql-hackers by date: