Re: AIO v2.5 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO v2.5
Date
Msg-id 24mblqwtpzwncjcmfoqhpyuwzcejrnnyddska3h2z6fmmkh5t2@gldyx4346n3y
In response to Re: AIO v2.5  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
Hi,

On 2025-07-10 21:00:21 +0200, Matthias van de Meent wrote:
> On Wed, 9 Jul 2025 at 16:59, Andres Freund <andres@anarazel.de> wrote:
> > > 3. I noticed that there is AIO code for writev-related operations
> > > (specifically, pgaio_io_start_writev is exposed, as is
> > > PGAIO_OP_WRITEV), but no practical way to exercise that code: it's
> > > not called from anywhere in the project, and there is no way for
> > > extensions to register the relevant callbacks required to make writev
> > > work well on buffered contents. Is that intentional?
> >
> > Yes.  We obviously do want to support writes eventually, and it didn't seem
> > useful to not have the most basic code for writes in the AIO infrastructure.
> >
> > You could still use it to e.g. write out temporary file data or such.
>
> Yes, though IIUC that would require an implementation of at least
> PgAioTargetInfo for such a use case (it's definitely not a SMGR
> target), which currently isn't available and can't be registered
> dynamically by an extension. Or did I miss something?

I can see some hacky ways around that, but they're just that, hacky...



> (PS. I'm not quite 100% sure that it is impossible to use, just that
> there are rather few handles available for using this part of the new
> tool, and it seems completely untested in the PG18 branch)

I'm not saying it's 100% ready to use without modifying core code, but for
something that's like 30 lines of code, as part of a considerably larger
subsystem, I just don't see a problem with writev not yet being covered.  It's
just incremental development.


> -----
>
> Something else I've just noticed is the use of int32 in
> PgAIOHandle->result. In sync and worker mode, pg_preadv and pg_pwritev
> return ssize_t, which most modern systems can't fit in int32 (the
> output was int before, then size_t, then ssize_t: [0]).

I don't think there's anything that can actually do IO that's large enough to
be problematic. What's the potential scenario where you'd want to read/write
more than 2GB of data within one syscall? That just doesn't seem to make
sense.


> While not directly an issue in default PG18 due to the use of 1GB relation
> segments capping the max IO size for SMGR-managed IOs (and various other
> code-level constraints), this may have more issues when an extension starts
> bulk-reading data on a system compiled with RELSEG_SIZE >= 2GB; I can't find
> any protective checks against overflows in downcasting the IO result.

I don't think the relation size is the relevant piece here, it's just that it
doesn't make sense (and likely isn't possible) to read that much data at once.
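
For illustration only, the kind of protective check asked about above could
be sketched in C roughly as follows. This is a minimal sketch under stated
assumptions: narrow_io_result is a hypothetical helper name, it is not part
of PostgreSQL, and nothing in this thread says the AIO code does or should
do this. For context, Linux documents that read() and similar calls transfer
at most 0x7ffff000 bytes per call, which already fits in an int32.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>			/* ssize_t */

/*
 * Hypothetical helper (an assumption for illustration, not PostgreSQL code):
 * store an ssize_t syscall result in an int32_t only if the value is
 * representable.  Returns false and leaves *dst untouched when the value
 * would be truncated by the downcast.
 */
static bool
narrow_io_result(ssize_t raw, int32_t *dst)
{
	if (raw > INT32_MAX || raw < INT32_MIN)
		return false;

	*dst = (int32_t) raw;
	return true;
}
```

A caller would treat a false return as an error (or assertion failure)
instead of silently storing a truncated byte count.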


Greetings,

Andres Freund


