Re: pg_stat_io not tracking smgrwriteback() is confusing - Mailing list pgsql-hackers
From | Jonathan S. Katz |
---|---|
Subject | Re: pg_stat_io not tracking smgrwriteback() is confusing |
Date | |
Msg-id | 86559a5e-ea15-9d0d-bf08-d08cd36e8e93@postgresql.org Whole thread Raw |
In response to | Re: pg_stat_io not tracking smgrwriteback() is confusing (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: pg_stat_io not tracking smgrwriteback() is confusing
|
List | pgsql-hackers |
On 4/27/23 11:36 AM, Melanie Plageman wrote: > Thanks for the review! > > On Wed, Apr 26, 2023 at 10:22 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: >> >> At Wed, 26 Apr 2023 17:08:14 -0400, Melanie Plageman <melanieplageman@gmail.com> wrote in >>> On Mon, Apr 24, 2023 at 9:29 PM Melanie Plageman >>> <melanieplageman@gmail.com> wrote: >>>> I've yet to cook up a client backend test case (e.g. with COPY). I've taken >>>> that as a todo. >>> >>> It was trivial to see client backend writebacks in almost any scenario >>> once I set backend_flush_after. I wonder if it is worth mentioning the >>> various "*flush_after" gucs in the docs? >>> >>>> I have a few outstanding questions: >>>> >>>> 1) Does it make sense for writebacks to count the number of blocks for >>>> which writeback was requested or the number of calls to smgrwriteback() or >>>> the number of actual syscalls made? We don't actually know from outside >>>> of mdwriteback() how many FileWriteback() calls we will make. >>> >>> So, in the attached v3, I've kept the first method: writebacks are the >>> number of blocks which the backend has requested writeback of. I'd like >>> it to be clear in the docs exactly what writebacks are (so that people >>> know not to add them together with writes or something like that). I >>> made an effort but could use further docs review. >> >> + Number of units of size <varname>op_bytes</varname> which the backend >> + requested the kernel write out to permanent storage. >> >> I just want to mention that it is not necessarily the same as the >> number of system calls, but I'm not sure what others think about that. > > My thinking is that some other IO operations, for example, extends, > count the number of blocks extended and not the number of syscalls. > >> + Time spent in writeback operations in milliseconds (if >> + <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero). This >> + does not necessarily count the time spent by the kernel writing the >> + data out. The backend will initiate write-out of the dirty pages and >> + wait only if the request queue is full. >> >> The last sentence looks like it's taken from the sync_file_range() man >> page, but I think it's a bit too detailed. We could just say, "The >> time usually only includes the time it takes to queue write-out >> requests.", bit I'm not sure wh... > > Ah, yes, I indeed took heavy inspiration from the sync_file_range() > man page :) I've modified this comment in the attached v4. I didn't want > to say "usually" since I imagine it is quite workload and configuration > dependent. > >>>> 2) I'm a little nervous about not including IOObject in the writeback >>>> context. Technically, there is nothing stopping local buffer code from >>>> calling IssuePendingWritebacks(). Right now, local buffer code doesn't >>>> do ScheduleBufferTagForWriteback(). But it doesn't seem quite right to >>>> hardcode in IOOBJECT_RELATION when there is nothing wrong with >>>> requesting writeback of local buffers (AFAIK). What do you think? >>> >>> I've gone ahead and added IOObject to the WritebackContext. >> >> The smgropen call in IssuePendingWritebacks below clearly shows that >> the function only deals with shared buffers. >> >>> /* and finally tell the kernel to write the data to storage */ >>> reln = smgropen(currlocator, InvalidBackendId); >>> smgrwriteback(reln, BufTagGetForkNum(&tag), tag.blockNum, nblocks); > > Yes, as it is currently, IssuePendingWritebacks() is only used for shared > buffers. My rationale for including IOObject is that localbuf.c calls > smgr* functions and there isn't anything stopping it from calling > smgrwriteback() or using WritebackContexts (AFAICT). > >> The callback-related code fully depends on callers following its >> expectation. So we can rewrite the following comment added to >> InitBufferPoll with a more confident tone. >> >> + * Initialize per-backend file flush context. IOObject is initialized to >> + * IOOBJECT_RELATION and IOContext to IOCONTEXT_NORMAL since these are the >> + * most likely targets for writeback. The backend can overwrite these as >> + * appropriate. > > I have updated this comment to be more confident and specific. > >> Or I actually think we might not even need to pass around the io_* >> parameters and could just pass immediate values to the >> pgstat_count_io_op_time call. If we ever start using shared buffers >> for thing other than relation files (for example SLRU?), we'll have to >> consider the target individually for each buffer block. That being >> said, I'm fine with how it is either. > > In IssuePendingWritebacks() we don't actually know which IOContext we > are issuing writebacks for when we call pgstat_count_io_op_time() (we do > issue pending writebacks for other IOContexts than IOCONTEXT_NORMAL). I > agree IOObject is not strictly necessary right now. I've kept IOObject a > member of WritebackContext for the reasons I mention above, however, I > am open to removing it if it adds confusion. [RMT hat] Horiguchi-san: do the changes that Melanie made address your feedback? It'd be good if we can get this into Beta 1 if everyone is comfortable with the patch. Thanks, Jonathan
Attachment
pgsql-hackers by date: