Re: pg_stat_io not tracking smgrwriteback() is confusing - Mailing list pgsql-hackers

From Jonathan S. Katz
Subject Re: pg_stat_io not tracking smgrwriteback() is confusing
Date
Msg-id 86559a5e-ea15-9d0d-bf08-d08cd36e8e93@postgresql.org
Whole thread Raw
In response to Re: pg_stat_io not tracking smgrwriteback() is confusing  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: pg_stat_io not tracking smgrwriteback() is confusing
List pgsql-hackers
On 4/27/23 11:36 AM, Melanie Plageman wrote:
> Thanks for the review!
> 
> On Wed, Apr 26, 2023 at 10:22 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
>>
>> At Wed, 26 Apr 2023 17:08:14 -0400, Melanie Plageman <melanieplageman@gmail.com> wrote in
>>> On Mon, Apr 24, 2023 at 9:29 PM Melanie Plageman
>>> <melanieplageman@gmail.com> wrote:
>>>> I've yet to cook up a client backend test case (e.g. with COPY). I've taken
>>>> that as a todo.
>>>
>>> It was trivial to see client backend writebacks in almost any scenario
>>> once I set backend_flush_after. I wonder if it is worth mentioning the
>>> various "*flush_after" gucs in the docs?
>>>
>>>> I have a few outstanding questions:
>>>>
>>>> 1) Does it make sense for writebacks to count the number of blocks for
>>>> which writeback was requested or the number of calls to smgrwriteback() or
>>>> the number of actual syscalls made? We don't actually know from outside
>>>> of mdwriteback() how many FileWriteback() calls we will make.
>>>
>>> So, in the attached v3, I've kept the first method: writebacks are the
>>> number of blocks which the backend has requested writeback of. I'd like
>>> it to be clear in the docs exactly what writebacks are (so that people
>>> know not to add them together with writes or something like that). I
>>> made an effort but could use further docs review.
>>
>> +        Number of units of size <varname>op_bytes</varname> which the backend
>> +        requested the kernel write out to permanent storage.
>>
>> I just want to mention that it is not necessarily the same as the
>> number of system calls, but I'm not sure what others think about that.
> 
> My thinking is that some other IO operations, for example, extends,
> count the number of blocks extended and not the number of syscalls.
> 
>> +        Time spent in writeback operations in milliseconds (if
>> +        <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero). This
>> +        does not necessarily count the time spent by the kernel writing the
>> +        data out. The backend will initiate write-out of the dirty pages and
>> +        wait only if the request queue is full.
>>
>> The last sentence looks like it's taken from the sync_file_range() man
>> page, but I think it's a bit too detailed. We could just say, "The
>> time usually only includes the time it takes to queue write-out
>> requests.", bit I'm not sure wh...
> 
> Ah, yes, I indeed took heavy inspiration from the sync_file_range()
> man page :) I've modified this comment in the attached v4. I didn't want
> to say "usually" since I imagine it is quite workload and configuration
> dependent.
> 
>>>> 2) I'm a little nervous about not including IOObject in the writeback
>>>> context. Technically, there is nothing stopping local buffer code from
>>>> calling IssuePendingWritebacks(). Right now, local buffer code doesn't
>>>> do ScheduleBufferTagForWriteback(). But it doesn't seem quite right to
>>>> hardcode in IOOBJECT_RELATION when there is nothing wrong with
>>>> requesting writeback of local buffers (AFAIK). What do you think?
>>>
>>> I've gone ahead and added IOObject to the WritebackContext.
>>
>> The smgropen call in IssuePendingWritebacks below clearly shows that
>> the function only deals with shared buffers.
>>
>>>                /* and finally tell the kernel to write the data to storage */
>>>                reln = smgropen(currlocator, InvalidBackendId);
>>>                smgrwriteback(reln, BufTagGetForkNum(&tag), tag.blockNum, nblocks);
> 
> Yes, as it is currently, IssuePendingWritebacks() is only used for shared
> buffers. My rationale for including IOObject is that localbuf.c calls
> smgr* functions and there isn't anything stopping it from calling
> smgrwriteback() or using WritebackContexts (AFAICT).
> 
>> The callback-related code fully depends on callers following its
>> expectation. So we can rewrite the following comment added to
>> InitBufferPoll with a more confident tone.
>>
>> +        * Initialize per-backend file flush context. IOObject is initialized to
>> +        * IOOBJECT_RELATION and IOContext to IOCONTEXT_NORMAL since these are the
>> +        * most likely targets for writeback. The backend can overwrite these as
>> +        * appropriate.
> 
> I have updated this comment to be more confident and specific.
> 
>> Or I actually think we might not even need to pass around the io_*
>> parameters and could just pass immediate values to the
>> pgstat_count_io_op_time call. If we ever start using shared buffers
>> for thing other than relation files (for example SLRU?), we'll have to
>> consider the target individually for each buffer block. That being
>> said, I'm fine with how it is either.
> 
> In IssuePendingWritebacks() we don't actually know which IOContext we
> are issuing writebacks for when we call pgstat_count_io_op_time() (we do
> issue pending writebacks for other IOContexts than IOCONTEXT_NORMAL). I
> agree IOObject is not strictly necessary right now. I've kept IOObject a
> member of WritebackContext for the reasons I mention above, however, I
> am open to removing it if it adds confusion.

[RMT hat]

Horiguchi-san: do the changes that Melanie made address your feedback?

It'd be good if we can get this into Beta 1 if everyone is comfortable 
with the patch.

Thanks,

Jonathan

Attachment

pgsql-hackers by date:

Previous
From: "Jonathan S. Katz"
Date:
Subject: Re: Move defaults toward ICU in 16?
Next
From: "Jonathan S. Katz"
Date:
Subject: Re: psql: Add role's membership options to the \du+ command