Re: Show WAL write and fsync stats in pg_stat_io - Mailing list pgsql-hackers

From Nazir Bilal Yavuz
Subject Re: Show WAL write and fsync stats in pg_stat_io
Date
Msg-id CAN55FZ0exE77NARQ5YYQ5Q0diOtu5Fbuj1uNbbKXz_Y=GpQKaQ@mail.gmail.com
Whole thread Raw
In response to Re: Show WAL write and fsync stats in pg_stat_io  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi,

Thank you for looking into this! And, sorry for the late answer.

On Mon, 13 May 2024 at 17:12, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Apr 19, 2024 at 1:32 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > > I wanted to inform you that the 73f0a13266 commit changed all WALRead
> > > calls to read variable bytes, only the WAL receiver was reading
> > > variable bytes before.
> >
> > I want to start working on this again if possible. I will try to
> > summarize the current status:
>
> Thanks for working on this.
>
> > * With the 73f0a13266 commit, the WALRead() function started to read
> > variable bytes in every case. Before, only the WAL receiver was
> > reading variable bytes.
> >
> > * With the 91f2cae7a4 commit, WALReadFromBuffers() is merged. We were
> > discussing what we have to do when this is merged. It is decided that
> > WALReadFromBuffers() does not call pgstat_report_wait_start() because
> > this function does not perform any IO [1]. We may follow the same
> > logic by not including these to pg_stat_io?
>
> Right. WALReadFromBuffers doesn't do any I/O.
>
> Whoever reads WAL from disk (backends, walsenders, recovery process)
> using pg_pread (XLogPageRead, WALRead) needs to be tracked in
> pg_stat_io or some other view. If it were to be in pg_stat_io,
> although we may not be able to distinguish WAL read stats at a backend
> level (like how many times/bytes a walsender or recovery process or a
> backend read WAL from disk), but it can help understand overall impact
> of WAL read I/O at a cluster level. With this approach, the WAL I/O
> stats are divided up - WAL read I/O and write I/O stats are in
> pg_stat_io and pg_stat_wal respectively.
>
> This makes me think if we need to add WAL read I/O stats also to
> pg_stat_wal. Then, we can also add WALReadFromBuffers stats
> hits/misses there. With this approach, pg_stat_wal can be a one-stop
> view for all the WAL related stats. If needed, we can join info from
> pg_stat_wal to pg_stat_io in system_views.sql so that the I/O stats
> are emitted to the end-user via pg_stat_io.

I agree that the ultimate goal is seeing WAL I/O stats from one place.
There is a reply to this from Amit:

On Tue, 28 May 2024 at 03:48, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> If possible, let's have all the I/O stats (even for WAL) in
> pg_stat_io. Can't we show the WAL data we get from buffers in the hits
> column and then have read_bytes or something like that to know the
> amount of data read?

I think it is better to have all the I/O stats in pg_stat_io like Amit
said. And, it makes sense to me to show 'WAL data we get from buffers'
in the hits column. Since, basically instead of doing I/O from disk;
we get data directly from WAL buffers. I think that fits the
explanation of the hits column in pg_stat_io, which is 'The number of
times a desired block was found in a shared buffer.' [1].

> > * With the b5a9b18cd0 commit, streaming I/O is merged but AFAIK this
> > does not block anything related to putting WAL stats in pg_stat_io.
> >
> > If I am not missing any new changes, the only problem is reading
> > variable bytes now. We have discussed a couple of solutions:
> >
> > 1- Change op_bytes to something like -1, 0, 1, NULL etc. and document
> > that this means some variable byte I/O is happening.
> >
> > I kind of dislike this solution because if the *only* read I/O is
> > happening in variable bytes, it will look like write and extend I/Os
> > are happening in variable bytes as well. As a solution, it could be
> > documented that only read I/Os could happen in variable bytes for now.
>
> Yes, read I/O for relation and WAL can happen in variable bytes. I
> think this idea seems reasonable and simple yet useful to know the
> cluster-wide read I/O.

I agree.

> However, another point here is how the total number of bytes read is
> represented with existing pg_stat_io columns 'reads' and 'op_bytes'.
> It is known now with 'reads' * 'op_bytes', but with variable bytes,
> how is read bytes calculated? Maybe add new columns
> read_bytes/write_bytes?
>
> > 2- Use op_bytes_[read | write | extend] columns instead of one
> > op_bytes column, also use the first solution.
> >
> > This can solve the first solution's weakness but it introduces two
> > more columns. This is more future proof compared to the first solution
> > if there is a chance that some variable I/O could happen in write and
> > extend calls as well in the future.
>
> -1 as more columns impact the readability and usability.

I did not understand the overall difference between what you suggested
(adding read_bytes/write_bytes columns) and my suggestion (adding
op_bytes_[read | write | extend] columns). They both introduce new
columns. Could you please explain what you suggested in more detail?

> > 3- Create a new pg_stat_io_wal view to put WAL I/Os here instead of pg_stat_io.
> >
> > pg_stat_io could remain untouchable and we will have flexibility to
> > edit this new view as much as we want. But the original aim of the
> > pg_stat_io is evaluating all I/O from a single view and adding a new
> > view breaks this aim.
>
> -1 as it defeats the very purpose of one-stop view pg_stat_io for all
> kinds of I/O. PS: see my response above about adding both WAL write
> I/O and read I/O stats to pg_stat_wal.

I agree.

[1] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-IO-VIEW

--
Regards,
Nazir Bilal Yavuz
Microsoft



pgsql-hackers by date:

Previous
From: "Li, Yong"
Date:
Subject: Re: Proposal to add page headers to SLRU pages
Next
From: "Joel Jacobson"
Date:
Subject: Re: Optimize numeric.c mul_var() using the Karatsuba algorithm