Re: AIO v2.2 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: AIO v2.2 |
Date | |
Msg-id | fsauvxs3xzgqsowpu4cyon5pj4nwzfejbazsd5aqbd5t3qxi6p@fklsi6bpmniw |
In response to | Re: AIO v2.2 (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Hi,

On 2025-01-07 14:59:58 -0500, Robert Haas wrote:
> On Tue, Jan 7, 2025 at 11:11 AM Andres Freund <andres@anarazel.de> wrote:
> > The difference between a handle and a reference is useful right now, to have
> > some separation between the functions that can be called by anyone (taking a
> > PgAioHandleRef) and only by the issuer (PgAioHandle). That might better be
> > solved by having a PgAioHandleIssuerRef ref or something.
>
> To me, those names don't convey that.

I'm certainly not wedded to these names - I went back and forth between
different names a fair bit, because I wasn't quite happy. I am however certain
that the current names are better than what it used to be (PgAioInProgress and
because that's long, a bunch of PgAioIP* names) :)

To make sure we're talking about the same things, I am thinking of the
following "entities" needing names:

1) Shared memory representation of an IO, for the AIO subsystem internally

   Currently: PgAioHandle

   Because shared memory is limited, we need to reuse this entity. This reuse
   needs to be possible "immediately" after completion, to avoid a bunch of
   nasty scenarios. To distinguish a reused PgAioHandle from its "prior"
   incarnation, each PgAioHandle has a 64bit "generation" counter.

   In addition to being referenceable via pointer, it's also possible to
   assign a 32bit integer to each PgAioHandle, as there is a fixed number of
   them.

2) A way for the issuer of an IO to reference 1), to attach information to the
   IO

   Currently: PgAioHandle*

   As long as the issuer hasn't yet staged the IO, it can't be reused.
   Therefore it's OK to just point to the PgAioHandle.

   One disadvantage of just using a pointer to PgAioHandle* is that it's
   harder to distinguish subsystem-internal functions that accept PgAioHandle*
   from "public" functions that accept the "issuer reference".

3) A way for any backend to wait for a specific IO to complete

   Currently: PgAioHandleRef

   This references 1) using a 32 bit ID and the 64bit generation.
   This is used to allow any backend to wait for a specific IO to complete.
   E.g. by including it in the BufferDesc so that WaitIO can wait for it.

   Because it includes the generation it's trivial to detect whether the
   PgAioHandle was reused.

> I would perhaps call the thing that supports issuer-only operations a
> "PgAio" and the thing other people can use a "PgAioHandle". Or
> "PgAioRequest" and "PgAioHandle" or something like that. With
> PgAioHandleRef, IMHO you've got two words that both imply a layer of
> indirection -- "handle" and "ref" -- which doesn't seem quite as nice,
> because then the other thing -- "PgAioHandle" still sort of implies one
> layer of indirection and the whole thing seems a bit less clear.

It's indirections all the way down. The PG representation of "one IO" in the
end is just an indirection for a kernel operation :)

I would like to split 1) and 2) above.

1) PgAio{Handle,Request,} (a large struct)
   - used internally by AIO subsystem, "pointed to" by the following

2) PgAioIssuerRef (an ID or pointer)
   - used by the issuer to incrementally define the IO

3) PgAioWaitRef - (an ID and generation)
   - used to wait for a specific IO to complete, not affected by reuse of
     PgAioHandle

> > > REAPED feels like a bad name. It sounds like a later stage than COMPLETED,
> > > but it's actually vice versa.
> >
> > What would you call having gotten "completion notifications" from the kernel,
> > but not having processed them?
>
> The Linux kernel calls those zombie processes, so we could call it a ZOMBIE
> state, but that seems like it might be a bit of inside baseball.

ZOMBIE feels even later than REAPED to me :)

> I do agree with Heikki that REAPED sounds later than COMPLETED, because you
> reap zombie processes by collecting their exit status.
> Maybe you could have AHS_COMPLETE or AHS_IO_COMPLETE for the state where
> the I/O is done but there's still completion-related work to be done, and
> then the other state could be AHS_DONE or AHS_FINISHED or AHS_FINAL or
> AHS_REAPED or something.

How about

AHS_COMPLETE_KERNEL or AHS_COMPLETE_RAW - raw syscall completed
AHS_COMPLETE_SHARED_CB - shared callback completed
AHS_COMPLETE_LOCAL_CB - local callback completed

?

Greetings,

Andres Freund
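
For readers following the naming discussion, the three entities in the message can be illustrated with a small C sketch. This is not the patch's code: only the type names PgAioHandle, PgAioIssuerRef and PgAioWaitRef come from the message, while the struct fields, the pool size and every `sketch_*` helper are hypothetical. It also assumes the generation is bumped when a handle is released for reuse, which the message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pool size; the message only says the number is fixed. */
#define SKETCH_NUM_HANDLES 4

/* 1) Shared-memory representation of one IO, reused after completion. */
typedef struct PgAioHandle
{
	uint64_t	generation;		/* distinguishes incarnations of this slot */
	bool		in_use;			/* hypothetical stand-in for the real state */
} PgAioHandle;

/* 2) Issuer-side reference: a plain pointer, valid until the IO is staged. */
typedef PgAioHandle *PgAioIssuerRef;

/* 3) Wait reference for any backend: 32-bit ID plus 64-bit generation. */
typedef struct PgAioWaitRef
{
	uint32_t	id;
	uint64_t	generation;
} PgAioWaitRef;

/* Stand-in for the shared-memory array of handles. */
static PgAioHandle sketch_handles[SKETCH_NUM_HANDLES];

/* Hand out slot "id" to an issuer (hypothetical helper). */
PgAioIssuerRef
sketch_acquire(uint32_t id)
{
	assert(id < SKETCH_NUM_HANDLES);
	assert(!sketch_handles[id].in_use);
	sketch_handles[id].in_use = true;
	return &sketch_handles[id];
}

/* Capture ID and generation so that later reuse of the slot is detectable. */
PgAioWaitRef
sketch_make_wait_ref(PgAioIssuerRef ioh)
{
	PgAioWaitRef ref;

	ref.id = (uint32_t) (ioh - sketch_handles);
	ref.generation = ioh->generation;
	return ref;
}

/* Release the slot; bumping the generation here is an assumption. */
void
sketch_recycle(PgAioIssuerRef ioh)
{
	ioh->generation++;
	ioh->in_use = false;
}

/* True while the referenced incarnation has not been recycled. */
bool
sketch_wait_ref_is_current(PgAioWaitRef ref)
{
	assert(ref.id < SKETCH_NUM_HANDLES);
	return sketch_handles[ref.id].generation == ref.generation;
}
```

In this sketch a waiter holding a PgAioWaitRef never dereferences a pointer it was handed; it looks the slot up by ID and compares generations, so a mismatch simply means the IO it cared about is already finished and the slot has moved on.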