Re: pg_stat_io not tracking smgrwriteback() is confusing - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: pg_stat_io not tracking smgrwriteback() is confusing
Date
Msg-id CAAKRu_ZeOtUU1sCZfpF-wa3FicdfOYXtbDNs3H3ZTvqEx8i-6g@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_io not tracking smgrwriteback() is confusing  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: pg_stat_io not tracking smgrwriteback() is confusing
Re: pg_stat_io not tracking smgrwriteback() is confusing
List pgsql-hackers
Thanks for the review!

On Wed, Apr 26, 2023 at 10:22 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 26 Apr 2023 17:08:14 -0400, Melanie Plageman <melanieplageman@gmail.com> wrote in
> > On Mon, Apr 24, 2023 at 9:29 PM Melanie Plageman
> > <melanieplageman@gmail.com> wrote:
> > > I've yet to cook up a client backend test case (e.g. with COPY). I've taken
> > > that as a todo.
> >
> > It was trivial to see client backend writebacks in almost any scenario
> > once I set backend_flush_after. I wonder if it is worth mentioning the
> > various "*flush_after" gucs in the docs?
> >
> > > I have a few outstanding questions:
> > >
> > > 1) Does it make sense for writebacks to count the number of blocks for
> > > which writeback was requested or the number of calls to smgrwriteback() or
> > > the number of actual syscalls made? We don't actually know from outside
> > > of mdwriteback() how many FileWriteback() calls we will make.
> >
> > So, in the attached v3, I've kept the first method: writebacks are the
> > number of blocks which the backend has requested writeback of. I'd like
> > it to be clear in the docs exactly what writebacks are (so that people
> > know not to add them together with writes or something like that). I
> > made an effort but could use further docs review.
>
> +        Number of units of size <varname>op_bytes</varname> which the backend
> +        requested the kernel write out to permanent storage.
>
> I just want to mention that it is not necessarily the same as the
> number of system calls, but I'm not sure what others think about that.

My thinking is that some other IO operations, for example, extends,
count the number of blocks extended and not the number of syscalls.

> +        Time spent in writeback operations in milliseconds (if
> +        <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero). This
> +        does not necessarily count the time spent by the kernel writing the
> +        data out. The backend will initiate write-out of the dirty pages and
> +        wait only if the request queue is full.
>
> The last sentence looks like it's taken from the sync_file_range() man
> page, but I think it's a bit too detailed. We could just say, "The
> time usually only includes the time it takes to queue write-out
> requests.", bit I'm not sure wh...

Ah, yes, I indeed took heavy inspiration from the sync_file_range()
man page :) I've modified this comment in the attached v4. I didn't want
to say "usually" since I imagine it is quite workload and configuration
dependent.

> > > 2) I'm a little nervous about not including IOObject in the writeback
> > > context. Technically, there is nothing stopping local buffer code from
> > > calling IssuePendingWritebacks(). Right now, local buffer code doesn't
> > > do ScheduleBufferTagForWriteback(). But it doesn't seem quite right to
> > > hardcode in IOOBJECT_RELATION when there is nothing wrong with
> > > requesting writeback of local buffers (AFAIK). What do you think?
> >
> > I've gone ahead and added IOObject to the WritebackContext.
>
> The smgropen call in IssuePendingWritebacks below clearly shows that
> the function only deals with shared buffers.
>
> >               /* and finally tell the kernel to write the data to storage */
> >               reln = smgropen(currlocator, InvalidBackendId);
> >               smgrwriteback(reln, BufTagGetForkNum(&tag), tag.blockNum, nblocks);

Yes, as it is currently, IssuePendingWritebacks() is only used for shared
buffers. My rationale for including IOObject is that localbuf.c calls
smgr* functions and there isn't anything stopping it from calling
smgrwriteback() or using WritebackContexts (AFAICT).

> The callback-related code fully depends on callers following its
> expectation. So we can rewrite the following comment added to
> InitBufferPoll with a more confident tone.
>
> +        * Initialize per-backend file flush context. IOObject is initialized to
> +        * IOOBJECT_RELATION and IOContext to IOCONTEXT_NORMAL since these are the
> +        * most likely targets for writeback. The backend can overwrite these as
> +        * appropriate.

I have updated this comment to be more confident and specific.

> Or I actually think we might not even need to pass around the io_*
> parameters and could just pass immediate values to the
> pgstat_count_io_op_time call. If we ever start using shared buffers
> for thing other than relation files (for example SLRU?), we'll have to
> consider the target individually for each buffer block. That being
> said, I'm fine with how it is either.

In IssuePendingWritebacks() we don't actually know which IOContext we
are issuing writebacks for when we call pgstat_count_io_op_time() (we do
issue pending writebacks for other IOContexts than IOCONTEXT_NORMAL). I
agree IOObject is not strictly necessary right now. I've kept IOObject a
member of WritebackContext for the reasons I mention above, however, I
am open to removing it if it adds confusion.

- Melanie

Attachment

pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Possible regression setting GUCs on \connect
Next
From: Tom Lane
Date:
Subject: Re: Possible regression setting GUCs on \connect