On 19/08/2025 03:07, Andres Freund wrote:
> On 2025-08-15 17:39:30 +1200, Thomas Munro wrote:
>> From ec2e1e21054f00918d3b28ce01129bc06de37790 Mon Sep 17 00:00:00 2001
>> From: Thomas Munro <thomas.munro@gmail.com>
>> Date: Sun, 3 Aug 2025 23:07:56 +1200
>> Subject: [PATCH v2 1/2] aio: Fix pgaio_io_wait() for staged IOs.
>>
>> Previously, pgaio_io_wait()'s cases for PGAIO_HS_DEFINED and
>> PGAIO_HS_STAGED fell through to waiting for completion. The owner only
>> promises to advance it to PGAIO_HS_SUBMITTED. The waiter needs to be
>> prepared to call ->wait_one() itself once the IO is submitted in order
>> to guarantee progress and avoid deadlocks on IO methods that provide
>> ->wait_one().
>>
>> Introduce a new per-backend condition variable submit_cv, woken by by
>> pgaio_submit_stage(), and use it to wait for the state to advance. The
>> new broadcast doesn't seem to cause any measurable slowdown, so ideas
>> for optimizing the common no-waiters case were abandoned for now.
>>
>> It may not be possible to reach any real deadlock with existing AIO
>> users, but that situation could change. There's also no reason the
>> waiter shouldn't begin to wait via the IO method as soon as possible
>> even without a deadlock.
>>
>> Picked up by testing a proposed IO method that has ->wait_one(), like
>> io_method=io_uring, and code review.
>
> LGTM.
I just independently noticed this same issue, wrote a little test to
reproduce it, and was about to report it, when I noticed that you found
this already. Attached is the repro script.
Both of the proposed patches seem fine to me. I'm inclined to go with
the first patch (v2-0001-aio-Fix-pgaio_io_wait-for-staged-IOs.patch),
without the extra optimization, unless we can actually measure a
performance difference.
- Heikki