Re: AIO v2.5 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | xg2s43enrr7lwvghh55kofxdj7o7zd6s6ocgmowel76oxyb2gm@eh6guzumnido Whole thread Raw |
In response to | Re: AIO v2.5 (Alexander Lakhin <exclusion@gmail.com>) |
Responses |
Re: AIO v2.5
|
List | pgsql-hackers |
Hi, After a bit more private back and forth with Alexander I have found the issue - and it's pretty stupid: pgaio_io_wait_for_free() does what it says on the tin. For that, after a bunch of other things, finds the oldest in-flight IO and waits for it. PgAioHandle *ioh = dclist_head_element(PgAioHandle, node, &pgaio_my_backend->in_flight_ios); switch (ioh->state) { ... case PGAIO_HS_COMPLETED_IO: case PGAIO_HS_SUBMITTED: pgaio_debug_io(DEBUG2, ioh, "waiting for free io with %d in flight", dclist_count(&pgaio_my_backend->in_flight_ios)); ... pgaio_io_wait(ioh, ioh->generation); break; The problem is that, if the log level is low enough, ereport() (which is called by pgaio_debug_io()), processes interrupts. The interrupt processing may end up execute ProcessBarrierSmgrRelease(), which in turn needs to wait for all in-flight IOs before the IOs are closed. Which then leads to the elog(PANIC, "waiting for own IO in wrong state: %d", state); error. The waiting for in-flight IOs before closing FDs only happens with io-uring, hence this only triggering with io-uring. A similar set of steps can lead to the "no free IOs despite no in-flight IOs" ERROR that Alexander also observed - if pgaio_submit_staged() triggers a debug ereport that executes ProcessBarrierSmgrRelease() in an interrupt, we might wait for all in-flight IOs during IO submission, triggering the error. I'm somewhat amazed that Alexander could somewhat reliably reproduce this - I haven't been able to do so once using Alexander's recipe. I did find a reproducer though: c=32;pgbench -n -c$c -j$c -P1 -T1000 -f <(echo 'SELECT sum(abalance) FROM pgbench_accounts;') c=1;pgbench -n -c$c -j$c -P1 -T1000 -f <(echo 'DROP DATABASE IF EXISTS foo;CREATE DATABASE foo;') trigger both issues quickly if run with log_min_messages=debug2 io_method=io_uring and not when a non-debug log level is used. I'm not yet sure how to best fix it - locally I have done so by pgaio_debug() do a HOLD_INTERRUPTS()/RESUME_INTERRUPTS() around the call to ereport. But that doesn't really seem great - otoh requiring various pieces of code to know that anything emitting debug messages needs to hold interrupts etc makes for rare and hard to understand bugs. We could just make the relevant functions hold interrupts, and that might be the best path forward, but we don't really need to hold all interrupts (e.g. termination would be fine), so it's a bit coarse grained. It would need to happen in a few places, which isn't great either. Other suggestions? Thanks again for finding and reporting this Alexander! Greetings, Andres Freund
pgsql-hackers by date: