Re: Show WAL write and fsync stats in pg_stat_io - Mailing list pgsql-hackers
From | Nazir Bilal Yavuz |
---|---|
Subject | Re: Show WAL write and fsync stats in pg_stat_io |
Date | |
Msg-id | CAN55FZ1c=p0Gwqa=hUqoKHStEtkMqMecdhC5YQf-RbStZVhPhw@mail.gmail.com Whole thread Raw |
In response to | Re: Show WAL write and fsync stats in pg_stat_io (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
Hi, On Wed, 10 Jan 2024 at 08:25, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jan 03, 2024 at 04:10:58PM +0300, Nazir Bilal Yavuz wrote: > > > > I thought removing op_bytes completely ( as you said "This patch > > extends it with two more operation sizes, and there are even cases > > where it may be a variable" ) from pg_stat_io view then adding > > something like {read | write | extend}_bytes and {read | write | > > extend}_calls could be better, so that we don't lose any information. > > But then you'd lose the possibility to analyze correlations between > the size and the number of the operations, which is something that > matters for more complex I/O scenarios. This does not need to be > tackled in this patch, which is useful on its own, though I am really > wondering if this is required for the recent work done by Thomas. > Perhaps Andres, Thomas or Melanie could comment on that? Yes, you are right. > >> Yeah, a limitation like that may be acceptable for now. Tracking the > >> WAL writer and WAL sender activities can be relevant in a lot of cases > >> even if we don't have the full picture for the WAL receiver yet. > > > > I added that and disabled B_WAL_RECEIVER backend with comments > > explaining why. v8 is attached. > > I can see that's what you have been adding here, which should be OK: > > > - if (track_io_timing) > > + /* > > + * B_WAL_RECEIVER backend does IOOBJECT_WAL IOObject & IOOP_READ IOOp IOs > > + * but these IOs are not countable for now because IOOP_READ IOs' op_bytes > > + * (number of bytes per unit of I/O) might not be the same all the time. > > + * The current implementation requires that the op_bytes must be the same > > + * for the same IOObject, IOContext and IOOp. To avoid confusion, the > > + * B_WAL_RECEIVER backend & IOOBJECT_WAL IOObject IOs are disabled for > > + * now. > > + */ > > + if (MyBackendType == B_WAL_RECEIVER && io_object == IOOBJECT_WAL) > > + return; > > This could be worded better, but that's one of these nits from me I > usually tweak when committing stuff. Thanks for doing that! Do you have any specific comments that can help improve it? > > +/* > > + * Decide if IO timings need to be tracked. Timings associated to > > + * IOOBJECT_WAL objects are tracked if track_wal_io_timing is enabled, > > + * else rely on track_io_timing. > > + */ > > +static bool > > +pgstat_should_track_io_time(IOObject io_object) > > +{ > > + if (io_object == IOOBJECT_WAL) > > + return track_wal_io_timing; > > + > > + return track_io_timing; > > +} > > One thing I was also considering is if eliminating this routine would > make pgstat_count_io_op_time() more readable the result, but I cannot > get to that. I could not think of a way to eliminate pgstat_should_track_io_time() route without causing performance regressions. What do you think about moving inside of 'pgstat_should_track_io_time(io_object) if check' to another function and call this function from pgstat_count_io_op_time()? This does not change anything but IMO it increases the readability. > > if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND) > > { > > - pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); > > + if (io_object != IOOBJECT_WAL) > > + pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); > > + > > if (io_object == IOOBJECT_RELATION) > > INSTR_TIME_ADD(pgBufferUsage.shared_blk_write_time, io_time); > > else if (io_object == IOOBJECT_TEMP_RELATION) > > @@ -139,7 +177,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, > > } > > else if (io_op == IOOP_READ) > > { > > - pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time)); > > + if (io_object != IOOBJECT_WAL) > > + pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time)); > > + > > if (io_object == IOOBJECT_RELATION) > > INSTR_TIME_ADD(pgBufferUsage.shared_blk_read_time, io_time); > > else if (io_object == IOOBJECT_TEMP_RELATION) > > A second thing is if this would be better with more switch/cases, say: > switch (io_op): > { > case IOOP_EXTEND: > case IOOP_WRITE: > switch (io_object): > { > case WAL: > /* do nothing */ > break; > case RELATION: > case TEMP: > .. blah .. > } > break; > case IOOP_READ: > switch (io_object): > { > .. blah .. > } > break; > } > > Or just this one to make it clear that nothing happens for WAL > objects: > switch (io_object): > { > case WAL: > /* do nothing */ > break; > case RELATION: > switch (io_op): > { > case IOOP_EXTEND: > case IOOP_WRITE: > .. blah .. > case IOOP_READ: > .. blah .. > } > break; > case TEMP: > /* same switch as RELATION */ > break; > } > > This duplicates a bit things, but at least in the second case it's > clear which counters are updated when I/O timings are tracked. It's > OK by me if people don't like this suggestion, but that would avoid > bugs like the one I found upthread. I am more inclined towards the second one because it is more likely that a new io_object will be introduced rather than a new io_op. So, I think the second one is a bit more future proof. -- Regards, Nazir Bilal Yavuz Microsoft
pgsql-hackers by date: