Re: AIO v2.5 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | dbeeaize47y7esifdrinpa2l7cqqb67k72exvuf3appyxywjnc@7bt76mozhcy2 Whole thread Raw |
In response to | Re: AIO v2.5 (Noah Misch <noah@leadboat.com>) |
Responses |
Re: AIO v2.5
|
List | pgsql-hackers |
Hi, On 2025-03-24 17:45:37 -0700, Noah Misch wrote: > (We may be due for a test mode that does smgrreleaseall() at every > CHECK_FOR_INTERRUPTS()?) I suspect we are. I'm a bit afraid of even trying... ... It's extremely slow - but at least the main regression as well as the aio tests pass! > > I however don't particularly like the *start* or *prep* names, I've gone back > > and forth on those a couple times. I could see "begin" work uniformly across > > those. > > For ease of new readers understanding things, I think it helps for the > functions that advance PgAioHandleState to have names that use words from > PgAioHandleState. It's one less mapping to get into the reader's head. Unfortunately for me it's kind of the opposite in this case, see below. > "Begin", "Start" and "prep" are all outside that taxonomy, making the reader > learn how to map them to the taxonomy. What reward does the reader get at the > end of that exercise? I'm not seeing one, but please do tell me what I'm > missing here. Because the end state varies, depending on the number of previously staged IOs, the IO method and whether batchmode is enabled, I think it's better if the "function naming pattern" (i.e. FileStartReadv, smgrstartreadv etc) is *not* aligned with an internal state name. It will just mislead readers to think that there's a deterministic mapping when that does not exist. That's not an excuse for pgaio_io_prep* though, that's a pointlessly different naming that I just stopped seeing. I'll try to think more about this, perhaps I can make myself see your POV more. > > > > Subject: [PATCH v2.11 08/27] localbuf: Track pincount in BufferDesc as well > > > LockBufferForCleanup() should get code like this > > > ConditionalLockBufferForCleanup() code, either now or when "not possible > > > today" ends. Currently, it just assumes all local buffers are > > > cleanup-lockable: > > > > > > /* Nobody else to wait for */ > > > if (BufferIsLocal(buffer)) > > > return; > > > > Kinda, yes, kinda no? LockBufferForCleanup() assumes, even for shared > > buffers, that the current backend can't be doing anything that conflicts with > > acquiring a buffer pin - note that it doesn't check the backend local pincount > > for shared buffers either. > > It checks the local pincount via callee CheckBufferIsPinnedOnce(). In exactly one of the callers :/ > As the patch stands, LockBufferForCleanup() can succeed when > ConditionalLockBufferForCleanup() would have returned false. That's already true today, right? In master ConditionalLockBufferForCleanup() for temp buffers checks LocalRefCount, whereas LockBufferForCleanup() doesn't. I think I agree with your suggestion further below, but independent of that, I don't see how the current modification in the patch makes the worse. Historically this behaviour of LockBufferForCleanup() kinda somewhat makes sense - the only place we use LockBufferForCleanup() is in a non-transactional command i.e. vacuum / index vacuum. So LockBufferForCleanup() turns out to only be safe in that context. > Like the comment, I expect it's academic today. I expect it will stay > academic. Anything that does a cleanup will start by reading the buffer, > which will resolve any refcnt the AIO subsystems holds for a read. If there's > an AIO write, the LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE) will block on > that. How about just removing the ConditionalLockBufferForCleanup() changes > or replacing them with a comment (like the present paragraph)? I think we'll need an expanded version of what I suggest once we have writes - but as you say, it shouldn't matter as long as we only have reads. So I think moving the relevant changes, with adjusted caveats, to the bufmgr: write change makes sense. > > > /* --- > > > * Opt-in to using AIO batchmode. > > > * > > > * Submitting IO in larger batches can be more efficient than doing so > > > * one-by-one, particularly for many small reads. It does, however, require > > > * the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO > > > * batching (c.f. pgaio_enter_batchmode()). Basically, the callback may not: > > > * a) block without first calling pgaio_submit_staged(), unless a > > > * to-be-waited-on lock cannot be part of a deadlock, e.g. because it is > > > * never acquired in a nested fashion > > > * b) directly or indirectly start another batch pgaio_enter_batchmode() > > I think a callback could still do: > > pgaio_exit_batchmode() > ... arbitrary code that might reach pgaio_enter_batchmode() ... > pgaio_enter_batchmode() Yea - but I somehow doubt there are many cases where it makes sense to deep-queue IOs within the callback. The cases I can think of are things like ensuring the right VM buffer is in s_b. But if it turns out to be necessary, what you seuggest would be an out. Do you think it's worth mentioning the above workaround? I'm mildly inclined that not. If it turns out to be actually useful to do nested batching, we can change it so that nested batching *is* allowed, that'd not be hard. > > > * > > > * As this requires care and is nontrivial in some cases, batching is only > > > * used with explicit opt-in. > > > * --- > > > */ > > > #define READ_STREAM_USE_BATCHING 0x08 > > > > +1 > > Agreed. It's simple, and there's no loss of generality. > > > I wonder if something more like READ_STREAM_CALLBACK_BATCHMODE_AWARE > > would be better, to highlight that you are making a declaration about > > a property of your callback, not just turning on an independent > > go-fast feature... I fished those words out of the main (?) > > description of this topic atop pgaio_enter_batchmode(). Just a > > thought, IDK. > > Good points. I lean toward your renaming suggestion, or shortening to > READ_STREAM_BATCHMODE_AWARE or READ_STREAM_BATCH_OK. I'm also fine with the > original name, though. I'm ok with all of these. In order of preference: 1) READ_STREAM_USE_BATCHING or READ_STREAM_BATCH_OK 2) READ_STREAM_BATCHMODE_AWARE 3) READ_STREAM_CALLBACK_BATCHMODE_AWARE Greetings, Andres Freund
pgsql-hackers by date: