Re: AIO v2.2 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO v2.2
Msg-id fsauvxs3xzgqsowpu4cyon5pj4nwzfejbazsd5aqbd5t3qxi6p@fklsi6bpmniw
In response to Re: AIO v2.2  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2025-01-07 14:59:58 -0500, Robert Haas wrote:
> On Tue, Jan 7, 2025 at 11:11 AM Andres Freund <andres@anarazel.de> wrote:
> > The difference between a handle and a reference is useful right now, to have
> > some separation between the functions that can be called by anyone (taking a
> > PgAioHandleRef) and only by the issuer (PgAioHandle). That might better be
> > solved by having a PgAioHandleIssuerRef ref or something.
>
> To me, those names don't convey that.

I'm certainly not wedded to these names - I went back and forth between
different names a fair bit, because I wasn't quite happy. I am however certain
that the current names are better than what they used to be (PgAioInProgress and
because that's long, a bunch of PgAioIP* names) :)


To make sure we're talking about the same things, I am thinking of the
following "entities" needing names:


1) Shared memory representation of an IO, for the AIO subsystem internally

   Currently: PgAioHandle

   Because shared memory is limited, we need to reuse this entity. This reuse
   needs to be possible "immediately" after completion, to avoid a bunch of
   nasty scenarios.

   To distinguish a reused PgAioHandle from its "prior" incarnation, each
   PgAioHandle has a 64bit "generation" counter.

   In addition to being referenceable via pointer, it's also possible to
   assign a 32bit integer to each PgAioHandle, as there is a fixed number of
   them.


2) A way for the issuer of an IO to reference 1), to attach information to the
   IO

   Currently: PgAioHandle*

   As long as the issuer hasn't yet staged the IO, it can't be
   reused. Therefore it's OK to just point to the PgAioHandle.

   One disadvantage of just using a PgAioHandle* is that it's harder to
   distinguish subsystem-internal functions that accept PgAioHandle* from
   "public" functions that accept the "issuer reference".


3) A way for any backend to wait for a specific IO to complete

   Currently: PgAioHandleRef

   This references 1) using a 32 bit ID and the 64bit generation.

   This is used to allow any backend to wait for a specific IO to
   complete. E.g. by including it in the BufferDesc so that WaitIO can wait
   for it.

   Because it includes the generation it's trivial to detect whether the
   PgAioHandle was reused.
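
To illustrate the ID + generation scheme described in 1) and 3), here is a
minimal sketch. All names prefixed with Sketch/sketch_ are hypothetical
stand-ins, not the definitions from the patch; the array size and the
"in_use" field are assumptions made purely for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the real number of handles is fixed at startup. */
#define SKETCH_MAX_HANDLES 4

/* Stand-in for entity 1): the reusable shared memory representation. */
typedef struct SketchAioHandle
{
    uint64_t    generation;     /* incremented each time the slot is reused */
    bool        in_use;         /* assumption, only for illustration */
} SketchAioHandle;

/* Stand-in for entity 3): a 32 bit ID plus the 64 bit generation. */
typedef struct SketchAioHandleRef
{
    uint32_t    id;             /* index into the fixed-size handle array */
    uint64_t    generation;     /* generation seen when the ref was created */
} SketchAioHandleRef;

static SketchAioHandle sketch_handles[SKETCH_MAX_HANDLES];

/* Create a reference that stays meaningful even if the slot is reused. */
SketchAioHandleRef
sketch_make_ref(uint32_t id)
{
    SketchAioHandleRef ref;

    assert(id < SKETCH_MAX_HANDLES);
    ref.id = id;
    ref.generation = sketch_handles[id].generation;
    return ref;
}

/* Reusing a slot for a new IO bumps its generation. */
void
sketch_recycle(uint32_t id)
{
    assert(id < SKETCH_MAX_HANDLES);
    sketch_handles[id].generation++;
}

/* A mismatch means the IO the ref pointed to is gone and the slot reused. */
bool
sketch_ref_is_stale(SketchAioHandleRef ref)
{
    return sketch_handles[ref.id].generation != ref.generation;
}
```

A waiter that finds its reference stale can treat the IO as already
completed, since the slot could only have been reused after completion.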



> I would perhaps call the thing that supports issuer-only operations a
> "PgAio" and the thing other people can use a "PgAioHandle". Or
> "PgAioRequest" and "PgAioHandle" or something like that. With
> PgAioHandleRef, IMHO you've got two words that both imply a layer of
> indirection -- "handle" and "ref" -- which doesn't seem quite as nice,
> because then the other thing -- "PgAioHandle" still sort of implies one
> layer of indirection and the whole thing seems a bit less clear.

It's indirections all the way down. The PG representation of "one IO" in the
end is just an indirection for a kernel operation :)


I would like to split 1) and 2) above.

1) PgAio{Handle,Request,} (a large struct)  - used internally by AIO subsystem,
   "pointed to" by the following
2) PgAioIssuerRef (an ID or pointer) - used by the issuer to incrementally
   define the IO
3) PgAioWaitRef (an ID and generation) - used to wait for a specific IO to
   complete, not affected by reuse of PgAioHandle
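
A rough sketch of how that three-way split could look at the type level.
This is only an illustration under assumptions: the Sketch* types, the
sketch_* functions and the "nblocks" field are hypothetical, and wrapping the
pointer in a one-member struct is just one possible way to let the compiler
tell issuer-facing functions apart from subsystem-internal ones.

```c
#include <stdint.h>

/* 1) Internal representation (stand-in for the large struct). */
typedef struct SketchAio
{
    uint32_t    id;             /* slot number */
    uint64_t    generation;     /* bumped on reuse */
    int         nblocks;        /* hypothetical piece of IO definition */
} SketchAio;

/* 2) Issuer reference: a distinct type, so it can't be mixed up with 1). */
typedef struct SketchAioIssuerRef
{
    SketchAio  *io;
} SketchAioIssuerRef;

/* 3) Wait reference: ID and generation, unaffected by reuse of 1). */
typedef struct SketchAioWaitRef
{
    uint32_t    id;
    uint64_t    generation;
} SketchAioWaitRef;

/* Subsystem-internal: hands an issuer reference to whoever acquired the IO. */
SketchAioIssuerRef
sketch_issuer_ref(SketchAio *io)
{
    SketchAioIssuerRef ref = {io};

    return ref;
}

/* "Public", issuer only: incrementally define the IO. */
void
sketch_issuer_set_nblocks(SketchAioIssuerRef ref, int nblocks)
{
    ref.io->nblocks = nblocks;
}

/* "Public": derive something any backend can later wait on. */
SketchAioWaitRef
sketch_issuer_get_wait_ref(SketchAioIssuerRef ref)
{
    SketchAioWaitRef wref = {ref.io->id, ref.io->generation};

    return wref;
}
```

With distinct types, passing a SketchAio * to an issuer-only function is a
compile error rather than a convention.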





> > > REAPED feels like a bad name. It sounds like a later stage than COMPLETED,
> > > but it's actually vice versa.
> >
> > What would you call having gotten "completion notifications" from the kernel,
> > but not having processed them?
>
> The Linux kernel calls those zombie processes, so we could call it a ZOMBIE
> state, but that seems like it might be a bit of inside baseball.

ZOMBIE feels even later than REAPED to me :)


> I do agree with Heikki that REAPED sounds later than COMPLETED, because you
> reap zombie processes by collecting their exit status. Maybe you could have
> AHS_COMPLETE or AHS_IO_COMPLETE for the state where the I/O is done but
> there's still completion-related work to be done, and then the other state
> could be AHS_DONE or AHS_FINISHED or AHS_FINAL or AHS_REAPED or something.

How about

AHS_COMPLETE_KERNEL or AHS_COMPLETE_RAW - raw syscall completed
AHS_COMPLETE_SHARED_CB - shared callback completed
AHS_COMPLETE_LOCAL_CB - local callback completed

?
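
For illustration, the proposed names could be laid out as an ordered enum,
assuming the list above is in the order an IO passes through these states
(picking AHS_COMPLETE_KERNEL over AHS_COMPLETE_RAW arbitrarily). The enum
type name and the helper are hypothetical, and the states before completion
are left out.

```c
#include <stdbool.h>

/* Hypothetical enum using the names suggested above, earliest state first. */
typedef enum SketchAioHandleState
{
    AHS_COMPLETE_KERNEL,        /* raw syscall completed */
    AHS_COMPLETE_SHARED_CB,     /* shared callback completed */
    AHS_COMPLETE_LOCAL_CB       /* local callback completed */
} SketchAioHandleState;

/* True if "current" is at or past "wanted" in the completion sequence. */
bool
sketch_state_reached(SketchAioHandleState current,
                     SketchAioHandleState wanted)
{
    return current >= wanted;
}
```

Each name then says which part of completion has finished, so none of them
sounds "later" than the state that actually follows it.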

Greetings,

Andres Freund


