Thread: Re: Make pg_stat_io view count IOs as bytes instead of blocks

Re: Make pg_stat_io view count IOs as bytes instead of blocks

From
Melanie Plageman
Date:
On Wed, Sep 11, 2024 at 7:19 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Currently, in the pg_stat_io view, IOs are counted as blocks. However, there are two issues with this approach:
>
> 1- The actual number of IO requests to the kernel is lower because IO requests can be merged before sending the final
request.Additionally, it appears that all IOs are counted in block size.
 

I think this is a great idea. It will allow people to tune
io_combine_limit as you mention below.

> 2- Some IOs may not align with block size. For example, WAL read IOs are done in variable bytes and it is not
possibleto correctly show these IOs in the pg_stat_io view [1].
 

Yep, this makes a lot of sense as a solution.

> To address this, I propose showing the total number of IO requests to the kernel (as smgr function calls) and the
totalnumber of bytes in the IO. To implement this change, the op_bytes column will be removed from the pg_stat_io view.
Instead,the [reads | writes | extends] columns will track the total number of IO requests, and newly added [read |
write| extend]_bytes columns will track the total number of bytes in the IO.
 

smgr API seems like the right place for this.

> Example benefit of this change:
>
> Running query [2], the result is:
>
> ╔═══════════════════╦══════════╦══════════╦═══════════════╗
> ║    backend_type   ║  object  ║  context ║ avg_io_blocks ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║   client backend  ║ relation ║ bulkread ║     15.99     ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║ background worker ║ relation ║ bulkread ║     15.99     ║
> ╚═══════════════════╩══════════╩══════════╩═══════════════╝

I don't understand why background worker is listed here.

> You can rerun the same query [2] after setting io_combine_limit to 32 [3]. The result is:
>
> ╔═══════════════════╦══════════╦══════════╦═══════════════╗
> ║    backend_type   ║  object  ║  context ║ avg_io_blocks ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║   client backend  ║ relation ║ bulkread ║     31.70     ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║ background worker ║ relation ║ bulkread ║     31.60     ║
> ╚═══════════════════╩══════════╩══════════╩═══════════════╝
>
> I believe that having visibility into avg_io_[bytes | blocks] is valuable information that could help optimize
Postgres.

In general, for this example, I think it would be more clear if you
compared what visibility we have in pg_stat_io on master with what
visibility we have with your patch.

I like that you show how io_combine_limit can be tuned using this, but
I don't think the problem statement is clear nor is the full
narrative.

> CREATE TABLE t as select i, repeat('a', 600) as filler from generate_series(1, 10000000) as i;
> SELECT pg_stat_reset_shared('io');
> SELECT * FROM t WHERE i = 0;
> SELECT backend_type, object, context, TRUNC((read_bytes / reads / (SELECT current_setting('block_size')::numeric)),
2)as avg_io_blocks FROM pg_stat_io WHERE reads > 0;
 

I like that you calculate the avg_io_blocks, but I think it is good to
show the raw columns as well.

- Melanie

Re: Make pg_stat_io view count IOs as bytes instead of blocks

From
Bertrand Drouvot
Date:
Hi,

On Wed, Nov 27, 2024 at 11:08:01AM -0500, Melanie Plageman wrote:
> On Wed, Sep 11, 2024 at 7:19 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > Currently, in the pg_stat_io view, IOs are counted as blocks. However, there are two issues with this approach:
> >
> > 1- The actual number of IO requests to the kernel is lower because IO requests can be merged before sending the
finalrequest. Additionally, it appears that all IOs are counted in block size.
 
> 
> I think this is a great idea. It will allow people to tune
> io_combine_limit as you mention below.
> 
> > 2- Some IOs may not align with block size. For example, WAL read IOs are done in variable bytes and it is not
possibleto correctly show these IOs in the pg_stat_io view [1].
 
> 
> Yep, this makes a lot of sense as a solution.

Thanks for the patch! I also think it makes sense.

A few random comments:

=== 1

+       /*
+        * If IO done in bytes and byte is <= 0, this means there is an error
+        * while doing an IO. Don't count these IOs.
+        */

s/byte/bytes/?

also:

The pgstat_io_count_checks() parameter is uint64. Does it mean it has to be
changed to int64?

Also from what I can see the calls are done with those values:

- 0
- io_buffers_len * BLCKSZ
- extend_by * BLCKSZ
- BLCKSZ

could io_buffers_len and extend_by be < 0? If not, is the comment correct?

=== 2

+       Assert((io_op == IOOP_READ || io_op == IOOP_WRITE || io_op == IOOP_EXTEND

and

+       if ((io_op == IOOP_READ || io_op == IOOP_WRITE || io_op == IOOP_EXTEND) &&

What about ordering the enum in IOOp (no bytes/bytes) so that we could check
that io_op >= "our firt bytes enum" instead?

Also we could create a macro on top of that to make it clear. And a comment
would be needed around the IOOp definition.

I think that would be simpler to maintain should we add no bytes or bytes op in
the future.

=== 3

+pgstat_io_count_checks(IOObject io_object, IOContext io_context, IOOp io_op, uint64 bytes)
+{
+       Assert((unsigned int) io_object < IOOBJECT_NUM_TYPES);
+       Assert((unsigned int) io_context < IOCONTEXT_NUM_TYPES);
+       Assert((unsigned int) io_op < IOOP_NUM_TYPES);
+       Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));

IOObject and IOContext are passed only for the assertions. What about removing
them from there and put the asserts in other places?

=== 4

+       /* Only IOOP_READ, IOOP_WRITE and IOOP_EXTEND can do IO in bytes. */

Not sure about "can do IO in bytes" (same wording is used in multiple places).

=== 5

/* Convert to numeric. */

"convert to numeric"? to be consistent with others single line comments around.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Make pg_stat_io view count IOs as bytes instead of blocks

From
Bertrand Drouvot
Date:
Hi,

On Wed, Dec 04, 2024 at 02:49:11PM +0300, Nazir Bilal Yavuz wrote:
> On Thu, 28 Nov 2024 at 16:39, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> You are right, no need to have this check; it can not be less than 0.
> I completely removed the function now.

Thanks! Yeah I think that makes sense. 

> > What about ordering the enum in IOOp (no bytes/bytes) so that we could check
> > that io_op >= "our firt bytes enum" instead?
> >
> > Also we could create a macro on top of that to make it clear. And a comment
> > would be needed around the IOOp definition.
> >
> > I think that would be simpler to maintain should we add no bytes or bytes op in
> > the future.
> 
> I think this is a good idea. I applied all the comments.

Thanks!

+ * non-byte-measured and byte-measured operations. So, that makes is easier
+ * to check whether IO is measured in bytes.

s/that makes is/that makes it/

+ *
+ * Note: If this enum is modified, ensure that both `IOOP_NUM_TYPES` macro
+ * and the `ioop_measured_in_bytes` function are updated accordingly.

Yeah, or?

"Ensure IOOP_EXTEND is the first and IOOP_WRITE the last ones in the measured in
bytes" group and that the groups stay in that order. 

That would make the "checks" local to the enum def should someone modify it.

> Created an
> inline function instead of macro and added this 'Assert((unsigned int)
> io_object < IOOBJECT_NUM_TYPES);' to function.

An inline function seems the right choice for me too.

> Done. I moved these checks to the pgstat_count_io_op_n() function. The
> code looks more like its previous version now.

Thanks! Yeah, more easier to follow now.

> > Not sure about "can do IO in bytes" (same wording is used in multiple places).
> 
> I changed it to 'measured in bytes' but I am not sure if this is
> better, open to suggestions.

I'm tempted to say something around "track", would mean things like:

1.

ioop_measured_in_bytes => is_ioop_tracked_in_bytes

2.

s/If an op does not do IO in bytes/If an op does not track bytes/

3.

s/not every IO measured in bytes/not every IO tracks bytes/

4.

s/non-byte-measured and byte-measured/non-tracking and tracking bytes/

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com