Re: AIO v2.5 - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | 20250405134352.66.nmisch@google.com Whole thread Raw |
In response to | Re: AIO v2.5 (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Fri, Apr 04, 2025 at 11:53:13PM -0400, Andres Freund wrote: > On 2025-04-04 14:18:02 -0700, Noah Misch wrote: > > On Fri, Apr 04, 2025 at 03:16:18PM -0400, Andres Freund wrote: > > > > - Make DEFINED in completor before verifying page. It might be cleaner to > > > > do this when the completor first retrieves a return value from io_uring, > > > > since this just makes up for what Valgrind already does for readv(). > > > > > > I think we can't rely on the marking during retrieving it from io_uring, as > > > that might have happened in a different backend for a temp buffer. That'd only > > > happen if we got io_uring events for *another* IO that involved a shared rel, > > > but it can happen. > > > > Good point. I think the VALGRIND_MAKE_MEM_DEFINED() in > > pgaio_uring_drain_locked() isn't currently needed at all. If > > completor-subxact==definer-subxact, PinBuffer() already did what Valgrind > > needs. Otherwise, buffer_readv_complete_one() does what Valgrind needs. > > We did need it - but only because I bungled something in the earlier patch to > add valgrind support. The problem is that in PinLocalBuffer() there may not > actually be any storage allocated for the buffer yet, so > VALGRIND_MAKE_MEM_DEFINED() doesn't work. In the first use of the buffer the > allocation happens a bit later, in GetLocalVictimBuffer(), namely during the > call to GetLocalBufferStorage(). > > Not quite sure yet how to best deal with it. Putting the PinLocalBuffer() > slightly later into GetLocalVictimBuffer() fixes the issue, but also doesn't > really seem great. Yeah. Maybe this (untested): diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c index fb44d75..3de13e7 100644 --- a/src/backend/storage/buffer/localbuf.c +++ b/src/backend/storage/buffer/localbuf.c @@ -813,7 +813,8 @@ PinLocalBuffer(BufferDesc *buf_hdr, bool adjust_usagecount) pg_atomic_unlocked_write_u32(&buf_hdr->state, buf_state); /* see comment in PinBuffer() */ - VALGRIND_MAKE_MEM_DEFINED(LocalBufHdrGetBlock(buf_hdr), BLCKSZ); + if (LocalBufHdrGetBlock(bufHdr) != NULL) + VALGRIND_MAKE_MEM_DEFINED(LocalBufHdrGetBlock(buf_hdr), BLCKSZ); } LocalRefCount[bufid]++; ResourceOwnerRememberBuffer(CurrentResourceOwner, @@ -932,6 +933,15 @@ GetLocalBufferStorage(void) next_buf_in_block++; total_bufs_allocated++; + /* + * Caller's PinBuffer() was too early for Valgrind updates, so do it here. + * The block is actually undefined, but we want consistency with the + * regular case of not needing to allocate memory. This is specifically + * needed when method_io_uring.c fills the block, because Valgrind doesn't + * recognize io_uring reads causing undefined memory to become defined. + */ + VALGRIND_MAKE_MEM_DEFINED(this_buf, BLCKSZ); + return (Block) this_buf; } > > If that's right, it would still be nice to reach the right > > VALGRIND_MAKE_MEM_DEFINED() without involving bufmgr. > > I think that would be possible if we didn't do VALGRIND_MAKE_MEM_NOACCESS() in > UnpinBuffer()/UnpinLocalBuffer(). But with that I don't see how we can avoid > needing to remark the region as accessible? Yes, it's not that we should remove VALGRIND_MAKE_MEM_DEFINED() from bufmgr. I was trying to think about future AIO callers (e.g. RelationCopyStorage()) and how they'd want things to work. That said, perhaps we should just omit the io_uring-level Valgrind calls and delegate the problem to higher layers until there's a concrete use case: --- a/src/backend/storage/smgr/smgr.c +++ b/src/backend/storage/smgr/smgr.c @@ -742,14 +742,16 @@ smgrreadv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, * Compared to smgrreadv(), more responsibilities fall on the caller: * - Partial reads need to be handled by the caller re-issuing IO for the * unread blocks * - smgr will ereport(LOG_SERVER_ONLY) some problems, but higher layers are * responsible for pgaio_result_report() to mirror that news to the user (if * the IO results in PGAIO_RS_WARNING) or abort the (sub)transaction (if * PGAIO_RS_ERROR). + * - Under Valgrind, the "buffers" memory may or may not change status to + * DEFINED, depending on io_method and concurrent activity. */ void smgrstartreadv(PgAioHandle *ioh, > > - A complete_local callback solves those problems. However, if the > > AIO-defining subxact aborted, then we shouldn't set DEFINED at all, since > > the buffer mapping may have changed by the time of complete_local. > > I don't think that is possible, due to the aio subsystem owned pin? We currently drop the shared buffer AIO subsystem pin in complete_shared (buffer_readv_complete_one() -> TerminateBufferIO()). I was trying to say that if we put a VALGRIND_MAKE_MEM_DEFINED in complete_local without changing anything else, it would have this problem. We probably wouldn't want to move the shared buffer pin drop to complete_local without a strong reason. > > - Putting it in the place that would call pgaio_result_report(ERROR) if > > needed, e.g. ProcessReadBuffersResult(), solves the problem of the buffer > > mapping having moved. ProcessReadBuffersResult() doesn't even need this, > > since PinBuffer() already did it. Each future AIO use case will have a > > counterpart of ProcessReadBuffersResult() that consumes the result and > > proceeds with tasks that depend on the AIO. That's the place. > > I don't really follow - at the point something like ProcessReadBuffersResult() > gets involved, we'll already have done the accesses that needed the memory to > be accessible and defined? (I was referring to future AIO callers, and this part of the discussion is obsolete if we just delegate the problem.) The bufmgr subsystem does the PageIsVerified() accesses before ProcessReadBuffersResult(), but I think the client of the bufmgr subsystem, e.g. read_stream.c, does accesses only after that. The client must do accesses only after that, since any pgaio_result_report(ERROR) doesn't happen until ProcessReadBuffersResult(). > I think the point about non-aio uses is a fair one, but I don't quite know how > to best solve it right now, due to the local buffer issue you mentioned. I'd > guess that we'd best put it somewhere > a) in pgaio_io_process_completion(), if definer==completor || !PGAIO_HF_REFERENCES_LOCAL > b) pgaio_io_call_complete_local(), just before calling > pgaio_io_call_complete_local() if PGAIO_HF_REFERENCES_LOCAL I think that would do nothing wrong today, but it uses !PGAIO_HF_REFERENCES_LOCAL as a proxy for "target memory is already DEFINED, or a higher layer will make it DEFINED". While !PGAIO_HF_REFERENCES_LOCAL does imply that today, I think that's a bufmgr-specific conclusion. I have no particular reason to expect that to hold for future AIO use cases. As above, I'd be inclined to omit the io_uring-level Valgrind calls until we have a concrete use case to drive their design. How do you see it? > > > @@ -361,13 +405,16 @@ pgaio_uring_drain_locked(PgAioUringContext *context) > > > for (int i = 0; i < ncqes; i++) > > > { > > > struct io_uring_cqe *cqe = cqes[i]; > > > + int32 res; > > > PgAioHandle *ioh; > > > > > > ioh = io_uring_cqe_get_data(cqe); > > > errcallback.arg = ioh; > > > + res = cqe->res; > > > + > > > io_uring_cqe_seen(&context->io_uring_ring, cqe); > > > > > > - pgaio_io_process_completion(ioh, cqe->res); > > > + pgaio_uring_io_process_completion(ioh, res); > > > > I guess this is a distinct cleanup, done to avoid any suspicion of cqe being > > reused asynchronously after io_uring_cqe_seen(). Is that right? > > I don't think there is any such danger - there's no background thing > processing things on the ring, if there were, it'd get corrupted, but it > seemed cleaner to do it that way when I introduced > pgaio_uring_io_process_completion(). I agree there's no concrete danger.
pgsql-hackers by date: