Re: AIO v2.3 - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: AIO v2.3 |
Date | |
Msg-id | CA+hUKGLxH1tsUgzZfng4BU6GqnS6bKF2ThvxH1_w5c7-sLRKQw@mail.gmail.com Whole thread Raw |
Responses |
Re: AIO v2.3
Re: AIO v2.3 Re: AIO v2.3 |
List | pgsql-hackers |
On Thu, Jan 23, 2025 at 5:29 PM Andres Freund <andres@anarazel.de> wrote: +/* caller will issue more io, don't submit */ +#define READ_BUFFERS_MORE_MORE_MORE (1 << 3) > - Heikki doesn't love pgaio_submit_staged(), suggested pgaio_kick_staged() or > such. I don't love that name though. Problem statement: You want to be able to batch I/O submission, ie make a single call to ioring_enter() (and other mechanisms) to start several I/Os, but the code that submits is inside StartReadBuffers() and the code that knows how many I/Os it wants to start now is at a higher level, read_stream.c and in future elsewhere. So you invented this flag to tell StartReadBuffers() not to call pgaio_submit_staged(), because you promise to do it later, via this staging list. Additionally, there is a kind of programming rule here that you *must* submit I/Os that you stage, you aren't allowed to (for example) stage I/Os and then sleep, so it has to be a fairly tight piece of code. Would the API be better like this?: When you want to create a batch of I/Os submitted together, you wrap the work in pgaio_begin_batch() and pgaio_submit_batch(), eg the loop in read_stream_lookahead(). Then bufmgr wouldn't need this flag: when it (or anything else) calls smgrstartreadv(), if there is not currently an explicit batch then it would be submitted immediately, and otherwise it would only be staged. This way, batch construction (or whatever word you prefer for batch) is in a clearly and explicitly demarcated stretch of code in one lexical scope (though its effect is dynamically scoped just like the staging list itself because we don't want to pass explicit I/O contexts through the layers), but code that doesn't call those and reaches AsyncReadBuffer() or whatever gets an implicit batch of size one and that's also OK. Not sure what semantics nesting would have but I doubt it matters much. > Things that need to be fixed / are fixed in this: > - max pinned buffers should be limited by io_combine_limit, not * 4 > - overflow distance > - pins need to be limited in more places I have patches for these and a few more things and will post in a separate thread shortly because they can be understood without reference to this AIO stuff and that'll hopefully be more digestible. + /* + * In some rare-ish cases one operation causes multiple reads (e.g. if a + * buffer was concurrently read by another backend). It'd be much better + * if we ensured that each ReadBuffersOperation covered only one IO - but + * that's not entirely trivial, due to having pinned victim buffers before + * starting IOs. + * + * TODO: Change the API of StartReadBuffers() to ensure we only ever need + * one IO. Likewise. + /* IO finished, but result has not yet been processed */ + PGAIO_HS_COMPLETED_IO, + + /* IO completed, shared completion has been called */ + PGAIO_HS_COMPLETED_SHARED, + + /* IO completed, local completion has been called */ + PGAIO_HS_COMPLETED_LOCAL, (Repeating something I mentioned in off-list bikeshedding) I wondered if it might be clearer to use the terminology "terminated" for the work that PostgreSQL has to do after an I/O completes, instead of overloading/subdividing the term "completed". We already "terminate" an I/O when smgr I/O completes in pre-existing bufmgr terminology, and this feels like a sort of generalisation of that notion. In this AIO world, some work is done by the backend that receives the completion notification from the kernel, and some is done by the backend that submitted the I/O in the first place, a division that doesn't exist with simple synchronous system calls. I wonder if it would be clearer to use terms based on those two roles, rather than "shared" and "local", leading to something like: PGAIO_HS_COMPLETED, PGAIO_HS_TERMINATED_BY_COMPLETER, PGAIO_HS_TERMINATED_BY_SUBMITTER,
pgsql-hackers by date: