Re: Show WAL write and fsync stats in pg_stat_io - Mailing list pgsql-hackers
From | Nazir Bilal Yavuz |
---|---|
Subject | Re: Show WAL write and fsync stats in pg_stat_io |
Date | |
Msg-id | CAN55FZ05OQ-FVD-OCbyb8RFDFViNabiL4aAbgbZFsryyEN38Cg@mail.gmail.com Whole thread Raw |
In response to | Re: Show WAL write and fsync stats in pg_stat_io (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: Show WAL write and fsync stats in pg_stat_io
|
List | pgsql-hackers |
Hi, Thanks for the review! Current status of the patch is: - 'WAL read' stats in xlogrecovery.c are added to pg_stat_io. - IOCONTEXT_INIT is added to count 'WAL init'. 'WAL init' stats are added to pg_stat_io. - pg_stat_io shows different op_bytes for the IOOBJECT_WAL operations. - Working on which 'BackendType / IOContext / IOOp' should be banned in pg_stat_io. - Working on adding 'WAL read' to the xlogreader.c and walsender.c. - PendingWalStats.wal_sync and PendingWalStats.wal_write_time/PendingWalStats.wal_sync_time are moved to pgstat_count_io_op_n()/pgstat_count_io_op_time() respectively. TODOs: - Documentation. - Thinking about how to interpret the different IOOps within the two IOContexts and discussing what would be useful to count. - Decide which 'BackendType / IOContext / IOOp' should not be tracked. - Adding 'WAL read' to the xlogreader.c and walsender.c. (This could be an another patch) - Adding WAIT_EVENT_WAL_COPY_* operations to pg_stat_io if needed. (This could be an another patch) On Sat, 22 Jul 2023 at 01:30, Melanie Plageman <melanieplageman@gmail.com> wrote: > I think it would be good to count WAL reads even though they are not > currently represented in pg_stat_wal. Here is a thread discussing this > [1]. I used the same implementation in the thread link [1]. I added 'WAL read' to only xlogrecovery.c for now. I didn't add 'WAL read' to xlogreader.c and walsender.c because they cause some failures on: '!pgStatLocal.shmem->is_shutdown' asserts. I will spend more time on these. Also, I added Bharath to CC. I have a question about 'WAL read': 1. There are two places where 'WAL read' happens. a. In WALRead() in xlogreader.c, it reads 'count' bytes, most of the time count is equal to XLOG_BLCKSZ but there are some cases it is not. For example - in XLogSendPhysical() in walsender.c WALRead() is called by nbytes - in WALDumpReadPage() in pg_waldump.c WALRead() is called by count These nbytes and count variables could be different from XLOG_BLCKSZ. b. in XLogPageRead() in xlogreader.c, it reads exactly XLOG_BLCKSZ bytes: pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff); So, what should op_bytes be set to for 'WAL read' operations? > Eventually, the docs will need an update as well. You can wait until a > later version of the patch to do this, but I would include it in a list > of the remaining TODOs in your next version. Done. I shared TODOs at the top. > I think we will also want to add an IOContext for WAL initialization. > Then we can track how long is spent doing 'WAL init' (including filling > the WAL file with zeroes). XLogFileInitInternal() is likely where we > would want to add it. And op_bytes for this would likely be > wal_segment_size. I thought I heard about someone proposing adding WAL > init to pg_stat_wal, but I can't find the thread. Done. I created a new IOCONTEXT_INIT IOContext for the 'WAL init'. I have a question there: 1. Some of the WAL processes happens at initdb (standalone backend IOCONTEXT_NORMAL/(IOOP_READ & IOOP_WRITE) and IOCONTEXT_INIT/(IOOP_WRITE & IOOP_FSYNC)). Since this happens at the initdb, AFAIK there is no way to set 'track_wal_io_timing' and 'track_io_timing' variables there. So, their timings appear as 0. Should I use IsBootstrapProcessingMode() to enable WAL io timings at the initdb or are they not that much important? > I think there is also an argument for counting WAL files recycled as > IOOP_REUSES. We should start thinking about how to interpret the > different IOOps within the two IOContexts and discussing what would be > useful to count. For example, should removing a logfile count as an > IOOP_EVICT? Maybe it is not directly related to "IO" enough or even an > interesting statistic, but we should think about what kinds of > IO-related WAL statistics we want to track. I added that to TODOs. > Any that we decide not to count for now should be "banned" in > pgstat_tracks_io_op() for clarity. For example, if we create a separate > IOContext for WAL file init, I'm not sure what would count as an > IOOP_EXTEND in IOCONTEXT_NORMAL for IOOBJECT_WAL. > > Also, I think there are some backend types which will not generate WAL > and we should determine which those are and skip those rows in > pgstat_tracks_io_object(). I agree, I am working on this. I have a couple of questions: 1. Can client backend and background worker do IOCONTEXT_NORMAL/IOOP_READ? 2. Is there an easy way to check if 'BackendType / IOOBJECT_WAL' does specific IOOp operations? > diff --git a/src/backend/access/transam/xlog.c > b/src/backend/access/transam/xlog.c > index 8b0710abe6..2ee6c21398 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -2207,6 +2207,10 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID > tli, bool flexible) > > I think we should likely follow the pattern of using > pgstat_prepare_io_time() and pgstat_count_io_op_time() as it is done > elsewhere. You could pass the IOObject as a parameter to > pgstat_prepare_io_time() in order to determine if we should check > track_io_timing or track_wal_io_timing. And we'll want to check > track_wal_io_timing if IOObject is IOOBJECT_WAL in > pgstat_count_io_op_time(). Done. Instead of passing parameters to pgstat_prepare_io_time(), I used a slightly different implementation. I return the current time if there is a chance that any 'time' can be tracked. > INSTR_TIME_SET_CURRENT(duration); > > INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration, > start); > > + pgstat_count_io_op_time(IOOBJECT_WAL, > IOCONTEXT_NORMAL, IOOP_WRITE, start, 1); > + } else > + { > > Other users of pgstat_count_io_op_time()/io_op_n() which write multiple > pages at a time pass the number of pages in as the cnt parameter. (see > ExtendBufferedRelLocal() as an example). I think we want to do that for > WAL also. In this case, it would be the local variable "npages" and we > can do it outside of this loop. > > It is true that the existing WAL stats count wal_writes here. However, > this is essentially counting write system calls, which is probably not > what we want for pg_stat_io. See [2] for a discussion about whether to > count blocks written back or writeback system calls for a previous > pg_stat_io feature. All of the other block-based IO statistics in > pg_stat_io count the number of blocks. > > This being said, we probably want to just leave > PendingWalStats.wal_write++ here. We would normally move it into > pg_stat_io like we have with pgBufferUsage and the db IO stats that are > updated in pgstat_count_io_op_time(). This consolidation makes it easier > to eventually reduce the duplication. However, in this case, it seems > wal_write counts something we don't count in pg_stat_io, so it can > probably be left here. I would still move the > PendingWalStats.wal_write_time into pgstat_count_io_op_time(), since > that seems like it is the same as what will be in pg_stat_io. Done. I moved PendingWalStats.wal_sync and PendingWalStats.wal_write_time/PendingWalStats.wal_sync_time to pgstat_count_io_op_n()/pgstat_count_io_op_time() respectively. Because of this change, pg_stat_wal's and pg_stat_io's IOOBJECT_WAL/IOCONTEXT_NORMAL/IOOP_WRITE counts are different but the rest are the same. > Also, op_bytes for IOOBJECT_WAL/IOCONTEXT_NORMAL should be XLOG_BLCKSZ > (see comment in pg_stat_get_io() in pgstatfuncs.c). Those default to the > same value but can be made to be different. Done. > I would wrap this line and check other lines to make sure they are not > too long. Done. > > + } else > + { > + pgstat_count_io_op_n(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC, 1); > } > > PendingWalStats.wal_sync++; > > Same feedback as above about using the prepare/count pattern used for > pg_stat_io elsewhere. In this case, you should be able to move > PendingWalStats.wal_sync into there as well. Done. > > There is a track_io_timing variable to control pg_stat_io > > timings and a track_wal_io_timing variable to control WAL timings. I > > couldn't decide on which logic to enable WAL timings on pg_stat_io. > > For now, both pg_stat_io and track_wal_io_timing are needed to be > > enabled to track WAL timings in pg_stat_io. > > Hmm. I could see a case where someone doesn't want to incur the > overhead of track_io_timing for regular IO but does want to do so for > WAL because they are interested in a specific issue. I'm not sure > though. I could be convinced otherwise (based on relative overhead, > etc). Done. IOOBJECT_WAL uses track_wal_io_timing regardless of track_io_timing for now. > [1] https://www.postgresql.org/message-id/flat/20230216191138.jotc73lqb7xhfqbi%40awork3.anarazel.de#eb4a641427fa1eb013e9ecdd8648e640 > [2] https://www.postgresql.org/message-id/20230504165738.4e2hfoddoels542c%40awork3.anarazel.de In addition to these, are WAIT_EVENT_WAL_COPY_* operations needed to be added to pg_stat_io? If the answer is yes, should I add them to the current patch? Any kind of feedback would be appreciated. Regards, Nazir Bilal Yavuz Microsoft
Attachment
pgsql-hackers by date: