Re: AIO v2.5 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | zfkgslbfxyrzntjclthxa5jo75es3qqt5sdavci4givvszp4u3@iwu2ciwdxhcb Whole thread Raw |
In response to | Re: AIO v2.5 (Noah Misch <noah@leadboat.com>) |
Responses |
Re: AIO v2.5
|
List | pgsql-hackers |
Hi, On 2025-03-25 17:19:15 -0700, Noah Misch wrote: > On Mon, Mar 24, 2025 at 09:18:06PM -0400, Andres Freund wrote: > > @@ -296,7 +299,9 @@ pgaio_io_call_complete_local(PgAioHandle *ioh) > > > > /* > > * Note that we don't save the result in ioh->distilled_result, the local > > - * callback's result should not ever matter to other waiters. > > + * callback's result should not ever matter to other waiters. However, the > > + * local backend does care, so we return the result as modified by local > > + * callbacks, which then can be passed to ioh->report_return->result. > > */ > > pgaio_debug_io(DEBUG3, ioh, > > "after local completion: distilled result: (status %s, id %u, error_data %d, result %d), raw_result:%d", > > Should this debug message remove the word "distilled", since this commit > solidifies distilled_result as referring to the complete_shared result? Good point, updated. > > Subject: [PATCH v2.12 01/28] aio: Be more paranoid about interrupts > Ready for commit > > Subject: [PATCH v2.12 02/28] aio: Pass result of local callbacks to > > ->report_return > > Ready for commit w/ up to one cosmetic change: > And pushed. Together with the s/pgaio_io_prep_/s/pgaio_io_start_/ renaming we've been discussing. Btw, I figured out the origin of that, I was just mirroring the liburing API... Thanks again for the reviews. > > Subject: [PATCH v2.12 03/28] aio: Add liburing dependency > > Ready for commit > > > > Subject: [PATCH v2.12 04/28] aio: Add io_method=io_uring > > Ready for commit w/ open_fd.fixup Yay. Planning to push those soon. > > Subject: [PATCH v2.12 05/28] aio: Implement support for reads in smgr/md/fd > > Ready for commit w/ up to two cosmetic changes: Cool. > > +/* > > + * AIO error reporting callback for mdstartreadv(). > > + * > > + * Errors are encoded as follows: > > + * - PgAioResult.error_data != 0 encodes IO that failed with errno != 0 > > I recommend replacing "errno != 0" with either "that errno" or "errno == > error_data". Applied. > Second, the aio_internal.h comment changes discussed in > postgr.es/m/20250325155808.f7.nmisch@google.com and earlier. Here's my current version of that: * Note that the externally visible functions to start IO * (e.g. FileStartReadV(), via pgaio_io_start_readv()) move an IO from * PGAIO_HS_HANDED_OUT to at least PGAIO_HS_STAGED and at most * PGAIO_HS_COMPLETED_LOCAL (at which point the handle will be reused). Does that work? I think I'll push that as part of the comment updates patch instead of "Implement support for reads in smgr/md/fd", unless you see a reason to do so differently. I'd have done it in the patch to s/prep/start/, but then it would reference functions that don't exist yet... > > Subject: [PATCH v2.12 06/28] aio: Add README.md explaining higher level design > > Ready for commit Cool. Comments in it reference PGAIO_HCB_SHARED_BUFFER_READV, so I'm inclined to reorder it until after "bufmgr: Implement AIO read support". There's also a small change in a new patch in the series (not yet sent), due to the changes related to emitting WARNINGs about checksum failures to the client connection. I think that part is fine, but... > (This and the previous patch have three spots that would change with the > s/prep/start/ renames. No opinion on whether to rename before or rename > after.) I thought it'd be better to do the renaming first. > > Subject: [PATCH v2.12 07/28] localbuf: Track pincount in BufferDesc as well > > The plan here looks good: > postgr.es/m/dbeeaize47y7esifdrinpa2l7cqqb67k72exvuf3appyxywjnc@7bt76mozhcy2 > > Subject: [PATCH v2.12 08/28] bufmgr: Implement AIO read support > > See review here and later discussion: > postgr.es/m/20250325022037.91.nmisch@google.com I'm working on a version with those addressed. > > Subject: [PATCH v2.12 09/28] bufmgr: Use AIO in StartReadBuffers() > > Ready for commit after a batch of small things, all but one of which have no > implications beyond code cosmetics. Yay. > I like the test coverage (by the end of the patch series). I'm really shocked just how bad our test coverage for a lot of this is today :( > For anyone else following, I found "diff -w" helpful for the bufmgr.c > changes. That's because a key part is former WaitReadBuffers() code moving > up an indentation level to its home in new subroutine AsyncReadBuffers(). For reviewing changes that move stuff around a lot I find this rather helpful: git diff --color-moved --color-moved-ws=ignore-space-change That highlights removed code differently from moved code, and due to ignore-space-change considers code that changed just due to space changes, to be moved. > > Assert(*nblocks == 1 || allow_forwarding); > > Assert(*nblocks > 0); > > Assert(*nblocks <= MAX_IO_COMBINE_LIMIT); > > + Assert(*nblocks == 1 || allow_forwarding); > > Duplicates the assert three lines back. Ah, it was moved into ce1a75c4fea, which I didn't notice while rebasing... > > + nblocks = aio_ret->result.result; > > + > > + elog(DEBUG3, "partial read, will retry"); > > + > > + } > > + else if (aio_ret->result.status == PGAIO_RS_ERROR) > > + { > > + pgaio_result_report(aio_ret->result, &aio_ret->target_data, ERROR); > > + nblocks = 0; /* silence compiler */ > > + } > > > > Assert(nblocks > 0); > > Assert(nblocks <= MAX_IO_COMBINE_LIMIT); > > > > + operation->nblocks_done += nblocks; > > I struggled somewhat from the variety of "nblocks" variables: this local > nblocks, operation->nblocks, actual_nblocks, and *nblocks in/out parameters of > some functions. No one of them is clearly wrong to use the name, and some of > these names are preexisting. That said, if you see opportunities to push in > the direction of more-specific names, I'd welcome it. > > For example, this local variable could become add_to_nblocks_done instead. I named it "newly_read_blocks", hope that works? > > + AsyncReadBuffers(operation, &nblocks); > > I suggest renaming s/nblocks/ignored_nblocks_progress/ here. Adopted. Unfortunately I didn't see a good way to reduce the amount of the other nblocks variables, as they are all, I think, preexisting. > > + * If we need to wait for IO before we can get a handle, submit already > > + * staged IO first, so that other backends don't need to wait. There > > s/already staged/already-staged/. Normally I'd skip this as nitpicking, but I > misread this particular sentence twice, as "submit" being the subject that > "staged" something. (It's still nitpicking, alas.) Makes sense - it doesn't help that it was at a linebreak... > > /* > > * How many neighboring-on-disk blocks can we scatter-read into other > > * buffers at the same time? In this case we don't wait if we see an > > - * I/O already in progress. We already hold BM_IO_IN_PROGRESS for the > > + * I/O already in progress. We already set BM_IO_IN_PROGRESS for the > > * head block, so we should get on with that I/O as soon as possible. > > - * We'll come back to this block again, above. > > + * > > + * We'll come back to this block in the next call to > > + * StartReadBuffers() -> AsyncReadBuffers(). > > Did this mean to say "WaitReadBuffers() -> AsyncReadBuffers()"? I'm guessing > so, since WaitReadBuffers() is the one that loops. It might be referring to > read_stream_start_pending_read()'s next StartReadBuffers(), though. I was referring to the latter, as that is the more common case (it's pretty easy to hit if you e.g. have multiple sequential scans on the same table going). > I think this could just delete the last sentence. The function header comment > already mentions the possibility of reading a subset of the request. This > spot doesn't need to detail how the higher layers come back to here. Agreed. > > + smgrstartreadv(ioh, operation->smgr, forknum, > > + blocknum + nblocks_done, > > + io_pages, io_buffers_len); > > + pgstat_count_io_op_time(io_object, io_context, IOOP_READ, > > + io_start, 1, *nblocks_progress * BLCKSZ); > > We don't assign *nblocks_progress until lower in the function, so I think > "io_buffers_len" should replace "*nblocks_progress" here. (This is my only > non-cosmetic comment on this patch.) Good catch! > > Subject: [PATCH v2.12 16/28] aio: Add test_aio module > > > +use List::Util qw(sample); > > sample() is new in 2020: > https://metacpan.org/release/PEVANS/Scalar-List-Utils-1.68/source/Changes#L100 > > Hence, I'd expect some buildfarm failures. I'd try to use shuffle(), then > take the first N elements. Hah. Bilal's patch was using shuffe(). I wanted to reduce the number of iterations and first did as you suggest and then saw that there's a nicer way... Done that way again... > > +++ b/src/test/modules/test_aio/test_aio.c > > @@ -0,0 +1,712 @@ > > +/*------------------------------------------------------------------------- > > + * > > + * delay_execution.c > > + * Test module to allow delay between parsing and execution of a query. > > + * > > + * The delay is implemented by taking and immediately releasing a specified > > + * advisory lock. If another process has previously taken that lock, the > > + * current process will be blocked until the lock is released; otherwise, > > + * there's no effect. This allows an isolationtester script to reliably > > + * test behaviors where some specified action happens in another backend > > + * between parsing and execution of any desired query. > > + * > > + * Copyright (c) 2020-2025, PostgreSQL Global Development Group > > + * > > + * IDENTIFICATION > > + * src/test/modules/test_aio/test_aio.c > > To elaborate on my last review, the entire header comment was a copy from > delay_execution.c. v2.12 fixes the IDENTIFICATION, but the rest needs > updates. I was really too tired that day... Embarassing. Greetings, Andres Freund
pgsql-hackers by date: