Re: AIO v2.5 - Mailing list pgsql-hackers

From Noah Misch
Subject Re: AIO v2.5
Date
Msg-id 20250320011118.8d.nmisch@google.com
Whole thread Raw
In response to Re: AIO v2.5  (Andres Freund <andres@anarazel.de>)
Responses Re: AIO v2.5
List pgsql-hackers
On Wed, Mar 19, 2025 at 06:17:37PM -0400, Andres Freund wrote:
> On 2025-03-19 14:25:30 -0700, Noah Misch wrote:
> > commit 55b454d wrote:
> > >     aio: Infrastructure for io_method=worker
> >
> > > +        /* Try to launch one. */
> > > +        child = StartChildProcess(B_IO_WORKER);
> > > +        if (child != NULL)
> > > +        {
> > > +            io_worker_children[id] = child;
> > > +            ++io_worker_count;
> > > +        }
> > > +        else
> > > +            break;                /* XXX try again soon? */
> >
> > I'd change the comment to something like one of:
> >
> >   retry after DetermineSleepTime()
> >   next LaunchMissingBackgroundProcesses() will retry in <60s
> 
> Hm, we retry more frequently that that if there are new connections...  Maybe
> just "try again next time"?

Works for me.

> > On Tue, Mar 18, 2025 at 04:12:18PM -0400, Andres Freund wrote:
> > > Subject: [PATCH v2.10 08/28] bufmgr: Implement AIO read support
> >
> > Some comments about BM_IO_IN_PROGRESS may need updates.  This paragraph:
> >
> > * The BM_IO_IN_PROGRESS flag acts as a kind of lock, used to wait for I/O on a
> > buffer to complete (and in releases before 14, it was accompanied by a
> > per-buffer LWLock).  The process doing a read or write sets the flag for the
> > duration, and processes that need to wait for it to be cleared sleep on a
> > condition variable.
> 
> First draft:
> * The BM_IO_IN_PROGRESS flag acts as a kind of lock, used to wait for I/O on a
> buffer to complete (and in releases before 14, it was accompanied by a
> per-buffer LWLock).  The process start a read or write sets the flag. When the
s/start/starting/
> I/O is completed, be it by the process that initiated the I/O or by another
> process, the flag is removed and the Buffer's condition variable is signalled.
> Processes that need to wait for the I/O to complete can wait for asynchronous
> I/O to using BufferDesc->io_wref and for BM_IO_IN_PROGRESS to be unset by
s/to using/by using/
> sleeping on the buffer's condition variable.

Sounds good.

> > And these individual lines from "git grep BM_IO_IN_PROGRESS":
> >
> >  *    i.e at most one BM_IO_IN_PROGRESS bit is set per proc.
> >
> > The last especially.
> 
> Huh - yea.  This isn't a "new" issue, I think I missed this comment in 16's
> 12f3867f5534.  I think the comment can just be deleted?

Hmm, yes, it's orthogonal to $SUBJECT and deletion works fine.

> >          * I/O already in progress.  We already hold BM_IO_IN_PROGRESS for the
> >      * only one process at a time can set the BM_IO_IN_PROGRESS bit.
> >      * only one process at a time can set the BM_IO_IN_PROGRESS bit.
> 
> > For the other three lines and the paragraph, the notion
> > of a process "holding" BM_IO_IN_PROGRESS or being the process to "set" it or
> > being the process "doing a read" becomes less significant when one process
> > starts the IO and another completes it.
> 
> Hm. I think they'd be ok as-is, but we can probably improve them. Maybe

Looking again, I agree they're okay.

> 
>      * Now it's safe to write buffer to disk. Note that no one else should
>      * have been able to write it while we were busy with log flushing because
>      * we got the exclusive right to perform I/O by setting the
>      * BM_IO_IN_PROGRESS bit.

That's fine too.  Maybe s/perform/stage/ or s/perform/start/.

> > I see this relies on md_readv_complete having converted "result" to blocks.
> > Was there some win from doing that as opposed to doing the division here?
> > Division here ("blocks_read = prior_result.result / BLCKSZ") would feel easier
> > to follow, to me.
> 
> It seemed like that would be wrong layering - what if we had an smgr that
> could store data in a compressed format? The raw read would be of a smaller
> size. The smgr API deals in BlockNumbers, only the md.c layer should know
> about bytes.

I hadn't thought of that.  That's a good reason.



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Doc: Fixup misplaced filelist.sgml entities and add some commentary
Next
From: Sutou Kouhei
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations