Re: Show WAL write and fsync stats in pg_stat_io - Mailing list pgsql-hackers

From Nazir Bilal Yavuz
Subject Re: Show WAL write and fsync stats in pg_stat_io
Date
Msg-id CAN55FZ1c=p0Gwqa=hUqoKHStEtkMqMecdhC5YQf-RbStZVhPhw@mail.gmail.com
Whole thread Raw
In response to Re: Show WAL write and fsync stats in pg_stat_io  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

On Wed, 10 Jan 2024 at 08:25, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jan 03, 2024 at 04:10:58PM +0300, Nazir Bilal Yavuz wrote:
> >
> > I thought removing op_bytes completely ( as you said "This patch
> > extends it with two more operation sizes, and there are even cases
> > where it may be a variable" ) from pg_stat_io view then adding
> > something like {read | write | extend}_bytes and {read | write |
> > extend}_calls could be better, so that we don't lose any information.
>
> But then you'd lose the possibility to analyze correlations between
> the size and the number of the operations, which is something that
> matters for more complex I/O scenarios.  This does not need to be
> tackled in this patch, which is useful on its own, though I am really
> wondering if this is required for the recent work done by Thomas.
> Perhaps Andres, Thomas or Melanie could comment on that?

Yes, you are right.

> >> Yeah, a limitation like that may be acceptable for now.  Tracking the
> >> WAL writer and WAL sender activities can be relevant in a lot of cases
> >> even if we don't have the full picture for the WAL receiver yet.
> >
> > I added that and disabled B_WAL_RECEIVER backend with comments
> > explaining why. v8 is attached.
>
> I can see that's what you have been adding here, which should be OK:
>
> > -    if (track_io_timing)
> > +    /*
> > +     * B_WAL_RECEIVER backend does IOOBJECT_WAL IOObject & IOOP_READ IOOp IOs
> > +     * but these IOs are not countable for now because IOOP_READ IOs' op_bytes
> > +     * (number of bytes per unit of I/O) might not be the same all the time.
> > +     * The current implementation requires that the op_bytes must be the same
> > +     * for the same IOObject, IOContext and IOOp. To avoid confusion, the
> > +     * B_WAL_RECEIVER backend & IOOBJECT_WAL IOObject IOs are disabled for
> > +     * now.
> > +     */
> > +    if (MyBackendType == B_WAL_RECEIVER && io_object == IOOBJECT_WAL)
> > +        return;
>
> This could be worded better, but that's one of these nits from me I
> usually tweak when committing stuff.

Thanks for doing that! Do you have any specific comments that can help
improve it?

> > +/*
> > + * Decide if IO timings need to be tracked.  Timings associated to
> > + * IOOBJECT_WAL objects are tracked if track_wal_io_timing is enabled,
> > + * else rely on track_io_timing.
> > + */
> > +static bool
> > +pgstat_should_track_io_time(IOObject io_object)
> > +{
> > +    if (io_object == IOOBJECT_WAL)
> > +        return track_wal_io_timing;
> > +
> > +    return track_io_timing;
> > +}
>
> One thing I was also considering is if eliminating this routine would
> make pgstat_count_io_op_time() more readable the result, but I cannot
> get to that.

I could not think of a way to eliminate pgstat_should_track_io_time()
route without causing performance regressions. What do you think about
moving inside of 'pgstat_should_track_io_time(io_object) if check' to
another function and call this function from
pgstat_count_io_op_time()? This does not change anything but IMO it
increases the readability.

> >          if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND)
> >          {
> > -            pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
> > +            if (io_object != IOOBJECT_WAL)
> > +                pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
> > +
> >              if (io_object == IOOBJECT_RELATION)
> >                  INSTR_TIME_ADD(pgBufferUsage.shared_blk_write_time, io_time);
> >              else if (io_object == IOOBJECT_TEMP_RELATION)
> > @@ -139,7 +177,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
> >          }
> >          else if (io_op == IOOP_READ)
> >          {
> > -            pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
> > +            if (io_object != IOOBJECT_WAL)
> > +                pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
> > +
> >              if (io_object == IOOBJECT_RELATION)
> >                  INSTR_TIME_ADD(pgBufferUsage.shared_blk_read_time, io_time);
> >              else if (io_object == IOOBJECT_TEMP_RELATION)
>
> A second thing is if this would be better with more switch/cases, say:
> switch (io_op):
> {
>     case IOOP_EXTEND:
>     case IOOP_WRITE:
>         switch (io_object):
>         {
>             case WAL:
>                 /* do nothing */
>                 break;
>             case RELATION:
>             case TEMP:
>                 .. blah ..
>         }
>         break;
>     case IOOP_READ:
>         switch (io_object):
>         {
>             .. blah ..
>         }
>         break;
> }
>
> Or just this one to make it clear that nothing happens for WAL
> objects:
> switch (io_object):
> {
>    case WAL:
>        /* do nothing */
>        break;
>    case RELATION:
>        switch (io_op):
>        {
>            case IOOP_EXTEND:
>            case IOOP_WRITE:
>                .. blah ..
>            case IOOP_READ:
>                .. blah ..
>        }
>        break;
>    case TEMP:
>        /* same switch as RELATION */
>        break;
> }
>
> This duplicates a bit things, but at least in the second case it's
> clear which counters are updated when I/O timings are tracked.  It's
> OK by me if people don't like this suggestion, but that would avoid
> bugs like the one I found upthread.

I am more inclined towards the second one because it is more likely
that a new io_object will be introduced rather than a new io_op. So, I
think the second one is a bit more future proof.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: System username in pg_stat_activity
Next
From: Matthias van de Meent
Date:
Subject: Re: the s_lock_stuck on perform_spin_delay