Thread: Re: Make pg_stat_io view count IOs as bytes instead of blocks

Re: Make pg_stat_io view count IOs as bytes instead of blocks

From

Melanie Plageman

Date:

27 November 2024, 19:08:01

On Wed, Sep 11, 2024 at 7:19 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Currently, in the pg_stat_io view, IOs are counted as blocks. However, there are two issues with this approach:
>
> 1- The actual number of IO requests to the kernel is lower because IO requests can be merged before sending the final
request.Additionally, it appears that all IOs are counted in block size.
 

I think this is a great idea. It will allow people to tune
io_combine_limit as you mention below.

> 2- Some IOs may not align with block size. For example, WAL read IOs are done in variable bytes and it is not
possibleto correctly show these IOs in the pg_stat_io view [1].
 

Yep, this makes a lot of sense as a solution.

> To address this, I propose showing the total number of IO requests to the kernel (as smgr function calls) and the
totalnumber of bytes in the IO. To implement this change, the op_bytes column will be removed from the pg_stat_io view.
Instead,the [reads | writes | extends] columns will track the total number of IO requests, and newly added [read |
write| extend]_bytes columns will track the total number of bytes in the IO.
 

smgr API seems like the right place for this.

> Example benefit of this change:
>
> Running query [2], the result is:
>
> ╔═══════════════════╦══════════╦══════════╦═══════════════╗
> ║    backend_type   ║  object  ║  context ║ avg_io_blocks ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║   client backend  ║ relation ║ bulkread ║     15.99     ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║ background worker ║ relation ║ bulkread ║     15.99     ║
> ╚═══════════════════╩══════════╩══════════╩═══════════════╝

I don't understand why background worker is listed here.

> You can rerun the same query [2] after setting io_combine_limit to 32 [3]. The result is:
>
> ╔═══════════════════╦══════════╦══════════╦═══════════════╗
> ║    backend_type   ║  object  ║  context ║ avg_io_blocks ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║   client backend  ║ relation ║ bulkread ║     31.70     ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║ background worker ║ relation ║ bulkread ║     31.60     ║
> ╚═══════════════════╩══════════╩══════════╩═══════════════╝
>
> I believe that having visibility into avg_io_[bytes | blocks] is valuable information that could help optimize
Postgres.

In general, for this example, I think it would be more clear if you
compared what visibility we have in pg_stat_io on master with what
visibility we have with your patch.

I like that you show how io_combine_limit can be tuned using this, but
I don't think the problem statement is clear nor is the full
narrative.

> CREATE TABLE t as select i, repeat('a', 600) as filler from generate_series(1, 10000000) as i;
> SELECT pg_stat_reset_shared('io');
> SELECT * FROM t WHERE i = 0;
> SELECT backend_type, object, context, TRUNC((read_bytes / reads / (SELECT current_setting('block_size')::numeric)),
2)as avg_io_blocks FROM pg_stat_io WHERE reads > 0;
 

I like that you calculate the avg_io_blocks, but I think it is good to
show the raw columns as well.

- Melanie

Re: Make pg_stat_io view count IOs as bytes instead of blocks

From

Bertrand Drouvot

Date:

28 November 2024, 16:39:21

Hi,

On Wed, Nov 27, 2024 at 11:08:01AM -0500, Melanie Plageman wrote:
> On Wed, Sep 11, 2024 at 7:19 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > Currently, in the pg_stat_io view, IOs are counted as blocks. However, there are two issues with this approach:
> >
> > 1- The actual number of IO requests to the kernel is lower because IO requests can be merged before sending the
finalrequest. Additionally, it appears that all IOs are counted in block size.

> 
> I think this is a great idea. It will allow people to tune
> io_combine_limit as you mention below.
> 
> > 2- Some IOs may not align with block size. For example, WAL read IOs are done in variable bytes and it is not
possibleto correctly show these IOs in the pg_stat_io view [1].

> 
> Yep, this makes a lot of sense as a solution.

Thanks for the patch! I also think it makes sense.

A few random comments:

=== 1

+       /*
+        * If IO done in bytes and byte is <= 0, this means there is an error
+        * while doing an IO. Don't count these IOs.
+        */

s/byte/bytes/?

also:

The pgstat_io_count_checks() parameter is uint64. Does it mean it has to be
changed to int64?

Also from what I can see the calls are done with those values:

- 0
- io_buffers_len * BLCKSZ
- extend_by * BLCKSZ
- BLCKSZ

could io_buffers_len and extend_by be < 0? If not, is the comment correct?

=== 2

+       Assert((io_op == IOOP_READ || io_op == IOOP_WRITE || io_op == IOOP_EXTEND

and

+       if ((io_op == IOOP_READ || io_op == IOOP_WRITE || io_op == IOOP_EXTEND) &&

What about ordering the enum in IOOp (no bytes/bytes) so that we could check
that io_op >= "our firt bytes enum" instead?

Also we could create a macro on top of that to make it clear. And a comment
would be needed around the IOOp definition.

I think that would be simpler to maintain should we add no bytes or bytes op in
the future.

=== 3

+pgstat_io_count_checks(IOObject io_object, IOContext io_context, IOOp io_op, uint64 bytes)
+{
+       Assert((unsigned int) io_object < IOOBJECT_NUM_TYPES);
+       Assert((unsigned int) io_context < IOCONTEXT_NUM_TYPES);
+       Assert((unsigned int) io_op < IOOP_NUM_TYPES);
+       Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));

IOObject and IOContext are passed only for the assertions. What about removing
them from there and put the asserts in other places?

=== 4

+       /* Only IOOP_READ, IOOP_WRITE and IOOP_EXTEND can do IO in bytes. */

Not sure about "can do IO in bytes" (same wording is used in multiple places).

=== 5

/* Convert to numeric. */

"convert to numeric"? to be consistent with others single line comments around.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Make pg_stat_io view count IOs as bytes instead of blocks

From

Bertrand Drouvot

Date:

05 December 2024, 12:13:53

Hi,

On Wed, Dec 04, 2024 at 02:49:11PM +0300, Nazir Bilal Yavuz wrote:
> On Thu, 28 Nov 2024 at 16:39, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> You are right, no need to have this check; it can not be less than 0.
> I completely removed the function now.

Thanks! Yeah I think that makes sense. 

> > What about ordering the enum in IOOp (no bytes/bytes) so that we could check
> > that io_op >= "our firt bytes enum" instead?
> >
> > Also we could create a macro on top of that to make it clear. And a comment
> > would be needed around the IOOp definition.
> >
> > I think that would be simpler to maintain should we add no bytes or bytes op in
> > the future.
> 
> I think this is a good idea. I applied all the comments.

Thanks!

+ * non-byte-measured and byte-measured operations. So, that makes is easier
+ * to check whether IO is measured in bytes.

s/that makes is/that makes it/

+ *
+ * Note: If this enum is modified, ensure that both `IOOP_NUM_TYPES` macro
+ * and the `ioop_measured_in_bytes` function are updated accordingly.

Yeah, or?

"Ensure IOOP_EXTEND is the first and IOOP_WRITE the last ones in the measured in
bytes" group and that the groups stay in that order. 

That would make the "checks" local to the enum def should someone modify it.

> Created an
> inline function instead of macro and added this 'Assert((unsigned int)
> io_object < IOOBJECT_NUM_TYPES);' to function.

An inline function seems the right choice for me too.

> Done. I moved these checks to the pgstat_count_io_op_n() function. The
> code looks more like its previous version now.

Thanks! Yeah, more easier to follow now.

> > Not sure about "can do IO in bytes" (same wording is used in multiple places).
> 
> I changed it to 'measured in bytes' but I am not sure if this is
> better, open to suggestions.

I'm tempted to say something around "track", would mean things like:

1.

ioop_measured_in_bytes => is_ioop_tracked_in_bytes

2.

s/If an op does not do IO in bytes/If an op does not track bytes/

3.

s/not every IO measured in bytes/not every IO tracks bytes/

4.

s/non-byte-measured and byte-measured/non-tracking and tracking bytes/

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Make pg_stat_io view count IOs as bytes instead of blocks

From

Bertrand Drouvot

Date:

09 January, 18:30:37

Hi,

On Thu, Jan 09, 2025 at 02:20:16PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Thu, 9 Jan 2025 at 11:11, Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Thu, Jan 09, 2025 at 10:15:20AM +0300, Nazir Bilal Yavuz wrote:
> > > I am a bit confused, are you suggesting these two alternatives:
> > > 1- Making pgstat_count_io_op_n() static and continuing to use
> > > pgstat_count_io_op() as it is.
> > > 2- Removing pgstat_count_io_op() and instead using
> > > pgstat_count_io_op_n() everywhere.
> >
> > Either of these options is OK by me.  The current state of things just
> > seems a bit strange because we publish a routine that's used nowhere.
> > If you have plans for it in a different patch, that's also fine.
> 
> I followed the second option as it is similar to
> pgstat_count_io_op_time() and also more future proof. I attached it as
> another patch. v7 is attached.

Thanks for the patches!

v7-0001:

+pg_attribute_unused()
+static inline bool
+is_ioop_tracked_in_bytes(IOOp io_op)
+{
+       Assert((unsigned int) io_op < IOOP_NUM_TYPES);
+       return io_op >= IOOP_EXTEND;
+}

and then

+       Assert(is_ioop_tracked_in_bytes(io_op) || bytes == 0);

We first use an Assert in is_ioop_tracked_in_bytes() and then we return
a value "just" to check another Assert. I wonder if it wouldn't make more sense
to get rid of this function and use a macro instead, something like?

#define is_ioop_tracked_in_bytes(io_op) \
    ((io_op) < IOOP_NUM_TYPES && (io_op) >= IOOP_EXTEND)

v7-0002:

I wonder if it wouldn't make more sense to remove pgstat_count_io_op() first
and then implement what currently is in v7-0001. What v7-0002 is removing is
not produced by v7-0001.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com