Re: AIO v2.5 - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | 20250320011118.8d.nmisch@google.com Whole thread Raw |
In response to | Re: AIO v2.5 (Andres Freund <andres@anarazel.de>) |
Responses |
Re: AIO v2.5
|
List | pgsql-hackers |
On Wed, Mar 19, 2025 at 06:17:37PM -0400, Andres Freund wrote: > On 2025-03-19 14:25:30 -0700, Noah Misch wrote: > > commit 55b454d wrote: > > > aio: Infrastructure for io_method=worker > > > > > + /* Try to launch one. */ > > > + child = StartChildProcess(B_IO_WORKER); > > > + if (child != NULL) > > > + { > > > + io_worker_children[id] = child; > > > + ++io_worker_count; > > > + } > > > + else > > > + break; /* XXX try again soon? */ > > > > I'd change the comment to something like one of: > > > > retry after DetermineSleepTime() > > next LaunchMissingBackgroundProcesses() will retry in <60s > > Hm, we retry more frequently that that if there are new connections... Maybe > just "try again next time"? Works for me. > > On Tue, Mar 18, 2025 at 04:12:18PM -0400, Andres Freund wrote: > > > Subject: [PATCH v2.10 08/28] bufmgr: Implement AIO read support > > > > Some comments about BM_IO_IN_PROGRESS may need updates. This paragraph: > > > > * The BM_IO_IN_PROGRESS flag acts as a kind of lock, used to wait for I/O on a > > buffer to complete (and in releases before 14, it was accompanied by a > > per-buffer LWLock). The process doing a read or write sets the flag for the > > duration, and processes that need to wait for it to be cleared sleep on a > > condition variable. > > First draft: > * The BM_IO_IN_PROGRESS flag acts as a kind of lock, used to wait for I/O on a > buffer to complete (and in releases before 14, it was accompanied by a > per-buffer LWLock). The process start a read or write sets the flag. When the s/start/starting/ > I/O is completed, be it by the process that initiated the I/O or by another > process, the flag is removed and the Buffer's condition variable is signalled. > Processes that need to wait for the I/O to complete can wait for asynchronous > I/O to using BufferDesc->io_wref and for BM_IO_IN_PROGRESS to be unset by s/to using/by using/ > sleeping on the buffer's condition variable. Sounds good. > > And these individual lines from "git grep BM_IO_IN_PROGRESS": > > > > * i.e at most one BM_IO_IN_PROGRESS bit is set per proc. > > > > The last especially. > > Huh - yea. This isn't a "new" issue, I think I missed this comment in 16's > 12f3867f5534. I think the comment can just be deleted? Hmm, yes, it's orthogonal to $SUBJECT and deletion works fine. > > * I/O already in progress. We already hold BM_IO_IN_PROGRESS for the > > * only one process at a time can set the BM_IO_IN_PROGRESS bit. > > * only one process at a time can set the BM_IO_IN_PROGRESS bit. > > > For the other three lines and the paragraph, the notion > > of a process "holding" BM_IO_IN_PROGRESS or being the process to "set" it or > > being the process "doing a read" becomes less significant when one process > > starts the IO and another completes it. > > Hm. I think they'd be ok as-is, but we can probably improve them. Maybe Looking again, I agree they're okay. > > * Now it's safe to write buffer to disk. Note that no one else should > * have been able to write it while we were busy with log flushing because > * we got the exclusive right to perform I/O by setting the > * BM_IO_IN_PROGRESS bit. That's fine too. Maybe s/perform/stage/ or s/perform/start/. > > I see this relies on md_readv_complete having converted "result" to blocks. > > Was there some win from doing that as opposed to doing the division here? > > Division here ("blocks_read = prior_result.result / BLCKSZ") would feel easier > > to follow, to me. > > It seemed like that would be wrong layering - what if we had an smgr that > could store data in a compressed format? The raw read would be of a smaller > size. The smgr API deals in BlockNumbers, only the md.c layer should know > about bytes. I hadn't thought of that. That's a good reason.
pgsql-hackers by date: