Re: AIO v2.5 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO v2.5
Date
Msg-id sakxksi5yt2ruf7temo7imakewfjljrdy4zfmqtklafjqra6jg@fyj4jvkncfcc
Whole thread Raw
In response to Re: AIO v2.5  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
Hi,

On 2025-07-09 13:26:09 +0200, Matthias van de Meent wrote:
> I've been going through the new AIO code as an effort to rebase and
> adapt Neon to PG18. In doing so, I found the following
> items/curiosities:
> 
> 1. In aio/README.md, the following code snippet is found:
> 
> [...]
> pgaio_io_set_handle_data_32(ioh, (uint32 *) buffer, 1);
> [...]
> 
> I believe it would be clearer if it took a reference to the buffer:
> 
> pgaio_io_set_handle_data_32(ioh, (uint32 *) &buffer, 1);
> 
> The main reason here is that common practice is to have a `Buffer
> buffer;` whereas a Buffer * is more commonly plural.

It's also just simply wrong as-is :/. Interpreting the buffer id as a pointer
obviously makes no sense...


> 
> 2. In aio.h, PgAioHandleCallbackID is checked to fit in
> PGAIO_RESULT_ID_BITS (though the value of PGAIO_HCB_MAX).
> However, the check is off by 1:
> 
> ```
> #define PGAIO_HCB_MAX    PGAIO_HCB_LOCAL_BUFFER_READV
> StaticAssertDecl(PGAIO_HCB_MAX <= (1 << PGAIO_RESULT_ID_BITS), [...])
> ```
> 
> This static assert will not trigger when PGAIO_HCB_MAX is equal to
> 2^PGAIO_RESULT_ID_BITS, but its value won't fit in those 6 bits and
> instead overflow to 0.
> To fix this, I suggest the `<=` is replaced with `<` to make that
> work, or the definition of PGAIO_HCB_MAX to be updated to
> PGAIO_HCB_LOCAL_BUFFER_READV + 1.

Ooops.



> 3. I noticed that there is AIO code for writev-related operations
> (specifically, pgaio_io_start_writev is exposed, as is
> PGAIO_OP_WRITEV), but no practical way to excercise that code: it's
> not called from anywhere in the project, and there is no way for
> extensions to register the relevant callbacks required to make writev
> work well on buffered contents. Is that intentional?

Yes.  We obviously do want to support writes eventually, and it didn't seem
useful to not have the most basic code for writes in the AIO infrastructure.

You could still use it to e.g. write out temporary file data or such.


> and there is no way for extensions to register the relevant callbacks
> required to make writev work well on buffered contents. Is that intentional?

FWIW, the problem with writev for buffered IO is not so much with the AIO
infrastructure, but with buffer locking and sync.c...




> Relatedly, the rationale for using enum PgAioHandleCallbackID rather
> than function pointers in the documentation above its definition says
> that function pointer instability across backends is one of the 2 main
> reasons.
> Is there any example of OS or linker behaviour that does not start
> PostgreSQL with stable function pointer addresses across backends of
> the same PostgreSQL binary? Or is this designed with extensibility
> and/or cross-version EXEC_BACKEND in mind, but with extensibility not
> yet been implemented due to $constraints?

It's due to EXEC_BACKEND - function pointers aren't stable with EXEC_BACKEND
even *without* cross-version issues. Due to ASLR function/data pointers may be
shifted around.

Even without EXEC_BACKEND we probably would want to have something like
PgAioHandleCallbackID, just to save space. But it'd be a lot easier to make it
extensible, because we could add entries once, rather than having to do so in
every process.


I suspect to make this extensible in the face of EXEC_BACKEND we'd want an
array of dynamically registered callbacks, in shared memory, that has the
library_name/symbol_name for the location of each callback struct, to be
registered in _PG_init(), which then would be entered into the backend-local
table in pgaio_init_backend().

Personally I'm not particularly interested in investing the energy to make
that work, I'd rather wait till we have threading and focus on the
infrastructure work for AIO writes etc. But then I don't work on an extension
these days, so I'm not directly affected by this not being extensible - I'm
not opposed to somebody else doing that work.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Rintaro Ikeda
Date:
Subject: Re: Suggestion to add --continue-client-on-abort option to pgbench
Next
From: Andres Freund
Date:
Subject: Re: Adding wait events statistics