Re: AIO v2.5 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | z6iusgq6hpz64nkzjlwj3eagko3kvjithksxeerg2mmvjvbbc6@sxekmwohpmda Whole thread Raw |
In response to | Re: AIO v2.5 (Noah Misch <noah@leadboat.com>) |
Responses |
Re: AIO v2.5
|
List | pgsql-hackers |
Hi, On 2025-04-04 14:18:02 -0700, Noah Misch wrote: > On Fri, Apr 04, 2025 at 03:16:18PM -0400, Andres Freund wrote: > > > - Make DEFINED in completor before verifying page. It might be cleaner to > > > do this when the completor first retrieves a return value from io_uring, > > > since this just makes up for what Valgrind already does for readv(). > > > > I think we can't rely on the marking during retrieving it from io_uring, as > > that might have happened in a different backend for a temp buffer. That'd only > > happen if we got io_uring events for *another* IO that involved a shared rel, > > but it can happen. > > Good point. I think the VALGRIND_MAKE_MEM_DEFINED() in > pgaio_uring_drain_locked() isn't currently needed at all. If > completor-subxact==definer-subxact, PinBuffer() already did what Valgrind > needs. Otherwise, buffer_readv_complete_one() does what Valgrind needs. We did need it - but only because I bungled something in the earlier patch to add valgrind support. The problem is that in PinLocalBuffer() there may not actually be any storage allocated for the buffer yet, so VALGRIND_MAKE_MEM_DEFINED() doesn't work. In the first use of the buffer the allocation happens a bit later, in GetLocalVictimBuffer(), namely during the call to GetLocalBufferStorage(). Not quite sure yet how to best deal with it. Putting the PinLocalBuffer() slightly later into GetLocalVictimBuffer() fixes the issue, but also doesn't really seem great. > If that's right, it would still be nice to reach the right > VALGRIND_MAKE_MEM_DEFINED() without involving bufmgr. I think that would be possible if we didn't do VALGRIND_MAKE_MEM_NOACCESS() in UnpinBuffer()/UnpinLocalBuffer(). But with that I don't see how we can avoid needing to remark the region as accessible? > That helps future, non-bufmgr AIO use cases. It's tricky to pick the right > place for that VALGRIND_MAKE_MEM_DEFINED(): > - pgaio_uring_drain_locked() is problematic, I think. In the localbuf case, > the iovec base address is relevant only in the ioh-defining process. In the > shmem completor!=definer case, this runs only in the completor. You're right :( > - A complete_local callback solves those problems. However, if the > AIO-defining subxact aborted, then we shouldn't set DEFINED at all, since > the buffer mapping may have changed by the time of complete_local. I don't think that is possible, due to the aio subsystem owned pin? > - Putting it in the place that would call pgaio_result_report(ERROR) if > needed, e.g. ProcessReadBuffersResult(), solves the problem of the buffer > mapping having moved. ProcessReadBuffersResult() doesn't even need this, > since PinBuffer() already did it. Each future AIO use case will have a > counterpart of ProcessReadBuffersResult() that consumes the result and > proceeds with tasks that depend on the AIO. That's the place. I don't really follow - at the point something like ProcessReadBuffersResult() gets involved, we'll already have done the accesses that needed the memory to be accessible and defined? I think the point about non-aio uses is a fair one, but I don't quite know how to best solve it right now, due to the local buffer issue you mentioned. I'd guess that we'd best put it somewhere a) in pgaio_io_process_completion(), if definer==completor || !PGAIO_HF_REFERENCES_LOCAL b) pgaio_io_call_complete_local(), just before calling pgaio_io_call_complete_local() if PGAIO_HF_REFERENCES_LOCAL > > related code until it is pinned bu "user" code again. But it requires some > > s/bu/by/ > > > + * Return the iovecand its length. Currently only expected to be used by > > s/iovecand/iovec and/ Fixed. > > @@ -361,13 +405,16 @@ pgaio_uring_drain_locked(PgAioUringContext *context) > > for (int i = 0; i < ncqes; i++) > > { > > struct io_uring_cqe *cqe = cqes[i]; > > + int32 res; > > PgAioHandle *ioh; > > > > ioh = io_uring_cqe_get_data(cqe); > > errcallback.arg = ioh; > > + res = cqe->res; > > + > > io_uring_cqe_seen(&context->io_uring_ring, cqe); > > > > - pgaio_io_process_completion(ioh, cqe->res); > > + pgaio_uring_io_process_completion(ioh, res); > > I guess this is a distinct cleanup, done to avoid any suspicion of cqe being > reused asynchronously after io_uring_cqe_seen(). Is that right? I don't think there is any such danger - there's no background thing processing things on the ring, if there were, it'd get corrupted, but it seemed cleaner to do it that way when I introduced pgaio_uring_io_process_completion(). > > Subject: [PATCH v1 3/3] aio: Avoid spurious coverity warning > > Ready for commit Thanks! Greetings, Andres Freund
pgsql-hackers by date: