Re: AIO v2.5 - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | 20250503030511.9b.nmisch@google.com Whole thread Raw |
In response to | Re: AIO v2.5 (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Wed, Apr 30, 2025 at 04:00:35PM -0400, Andres Freund wrote: > pgaio_io_wait_for_free() does what it says on the tin. For that, after a bunch > of other things, finds the oldest in-flight IO and waits for it. > > PgAioHandle *ioh = dclist_head_element(PgAioHandle, node, > &pgaio_my_backend->in_flight_ios); > > switch (ioh->state) > { > ... > case PGAIO_HS_COMPLETED_IO: > case PGAIO_HS_SUBMITTED: > pgaio_debug_io(DEBUG2, ioh, > "waiting for free io with %d in flight", > dclist_count(&pgaio_my_backend->in_flight_ios)); > ... > pgaio_io_wait(ioh, ioh->generation); > break; > > > The problem is that, if the log level is low enough, ereport() (which is > called by pgaio_debug_io()), processes interrupts. The interrupt processing > may end up execute ProcessBarrierSmgrRelease(), which in turn needs to wait > for all in-flight IOs before the IOs are closed. > > Which then leads to the > elog(PANIC, "waiting for own IO in wrong state: %d", > state); > > error. Printing state 0 (PGAIO_HS_IDLE), right? I think the chief problem is that pgaio_io_wait_for_free() is fetching ioh->state, then possibly processing interrupts in pgaio_debug_io(), then finally fetching ioh->generation. If it fetched ioh->generation to a local variable before pgaio_debug_io, I think that would resolve this one. Then the pgaio_io_was_recycled() would prevent the PANIC: if (pgaio_io_was_recycled(ioh, ref_generation, &state)) return; if (am_owner) { if (state != PGAIO_HS_SUBMITTED && state != PGAIO_HS_COMPLETED_IO && state != PGAIO_HS_COMPLETED_SHARED && state != PGAIO_HS_COMPLETED_LOCAL) { elog(PANIC, "waiting for own IO in wrong state: %d", state); } } Is that right? If that's the solution, pgaio_closing_fd() and pgaio_shutdown() would need similar care around fetching the generation before the pgaio_debug_io. Maybe there's an opportunity for a common inline function. Or at least a comment at the "generation" field on how to safely time a fetch thereof and any barrier required. > A similar set of steps can lead to the "no free IOs despite no in-flight IOs" > ERROR that Alexander also observed - if pgaio_submit_staged() triggers a debug > ereport that executes ProcessBarrierSmgrRelease() in an interrupt, we might > wait for all in-flight IOs during IO submission, triggering the error. That makes sense. > I'm not yet sure how to best fix it - locally I have done so by pgaio_debug() > do a HOLD_INTERRUPTS()/RESUME_INTERRUPTS() around the call to ereport. But > that doesn't really seem great - otoh requiring various pieces of code to know > that anything emitting debug messages needs to hold interrupts etc makes for > rare and hard to understand bugs. > > We could just make the relevant functions hold interrupts, and that might be > the best path forward, but we don't really need to hold all interrupts > (e.g. termination would be fine), so it's a bit coarse grained. It would need > to happen in a few places, which isn't great either. > > Other suggestions? For the "no free IOs despite no in-flight IOs" case, I'd replace the ereport(ERROR) with "return;" since we now know interrupt processing reclaimed an IO. Then decide what protection if any, we need against bugs causing an infinite loop in caller pgaio_io_acquire(). What's the case motivating the unbounded loop in pgaio_io_acquire(), as opposed to capping at two pgaio_io_acquire_nb() calls? If the theory is that pgaio_io_acquire() could be reentrant, what scenario would reach that reentrancy? > Thanks again for finding and reporting this Alexander! +1!
pgsql-hackers by date: