Re: AIO v2.5 - Mailing list pgsql-hackers

From Noah Misch
Subject Re: AIO v2.5
Date
Msg-id 20250320195245.b5.nmisch@google.com
Whole thread Raw
In response to Re: AIO v2.5  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Mar 20, 2025 at 02:54:14PM -0400, Andres Freund wrote:
> On 2025-03-19 18:11:18 -0700, Noah Misch wrote:
> > On Wed, Mar 19, 2025 at 06:17:37PM -0400, Andres Freund wrote:
> > > On 2025-03-19 14:25:30 -0700, Noah Misch wrote:
> > > > I see this relies on md_readv_complete having converted "result" to blocks.
> > > > Was there some win from doing that as opposed to doing the division here?
> > > > Division here ("blocks_read = prior_result.result / BLCKSZ") would feel easier
> > > > to follow, to me.
> > > 
> > > It seemed like that would be wrong layering - what if we had an smgr that
> > > could store data in a compressed format? The raw read would be of a smaller
> > > size. The smgr API deals in BlockNumbers, only the md.c layer should know
> > > about bytes.
> > 
> > I hadn't thought of that.  That's a good reason.
> 
> I thought that was better documented, but alas, it wasn't. How about updating
> the documentation of smgrstartreadv to the following:
> 
> /*
>  * smgrstartreadv() -- asynchronous version of smgrreadv()
>  *
>  * This starts an asynchronous readv IO using the IO handle `ioh`. Other than
>  * `ioh` all parameters are the same as smgrreadv().
>  *
>  * Completion callbacks above smgr will be passed the result as the number of
>  * successfully read blocks if the read [partially] succeeds. This maintains
>  * the abstraction that smgr operates on the level of blocks, rather than
>  * bytes.
>  */

That's good.  Possibly add "(Buffers for blocks not successfully read might
bear unspecified modifications, up to the full nblocks.)"

In a bit of over-thinking this, I wondered if shared_buffer_readv_complete
would be better named shared_buffer_smgrreadv_complete, to emphasize the
smgrreadv semantics.  PGAIO_HCB_SHARED_BUFFER_READV likewise.  But I tend to
think not.  smgrreadv() has no "result" concept, so the symmetry is limited.

> I briefly had a bug in test_aio's injection point that lead to *increasing*
> the number of bytes successfully read. That triggered an assertion failure in
> bufmgr.c, but not closer to the problem.  Is it worth adding an assert against
> that to md_readv_complete? Can't quite decide.

I'd lean yes, if in doubt.



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Have postgres.bki self-identify
Next
From: Andres Freund
Date:
Subject: Re: md.c vs elog.c vs smgrreleaseall() in barrier