Re: AIO v2.5 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO v2.5
Date
Msg-id 5ons2rtmwarqqhhexb3dnqulw5rjgwgoct57vpdau4rujlrffj@3fls6d2mkiwc
In response to Re: AIO v2.5  (Noah Misch <noah@leadboat.com>)
Responses Re: AIO v2.5
List pgsql-hackers
Hi,

On 2025-03-25 08:58:08 -0700, Noah Misch wrote:
> While having nagging thoughts that we might be releasing FDs before io_uring
> gets them into kernel custody, I tried this hack to maximize FD turnover:
> 
> static void
> ReleaseLruFiles(void)
> {
> #if 0
>     while (nfile + numAllocatedDescs + numExternalFDs >= max_safe_fds)
>     {
>         if (!ReleaseLruFile())
>             break;
>     }
> #else
>     while (ReleaseLruFile())
>         ;
> #endif
> }
> 
> "make check" with default settings (io_method=worker) passes, but
> io_method=io_uring in the TEMP_CONFIG file got different diffs in each of two
> runs.  s/#if 0/#if 1/ (restore normal FD turnover) removes the failures.
> Here's the richer of the two diffs:

Yikes. That's a very good catch.

I spent a bit of time debugging this, and I think I see what's going on: it
turns out that the kernel does *not* open the FDs during io_uring_enter() if
IOSQE_ASYNC is specified [1]. We do set that flag heuristically, in an attempt
to avoid a small but measurable slowdown for sequential scans that are fully
buffered (cf. pgaio_uring_submit()). If I disable that heuristic, your patch
above passes all tests here.


I don't know if that's an intentional or unintentional behavioral difference.

There are 2 1/2 ways around this:

1) Stop using the IOSQE_ASYNC heuristic
2a) Wait for all in-flight IOs when any FD gets closed
2b) Wait for all in-flight IOs using an FD when that FD gets closed

Given that we have clear evidence that io_uring doesn't completely support
closing FDs while IOs are in flight, be it a bug or intentional, it seems
clearly better to go for 2a or 2b.

Greetings,

Andres Freund


[1] Instead, files are opened when the queue entry is being worked on.
    Interestingly, that only happens when the IO is *explicitly* requested to
    be executed in the workqueue with IOSQE_ASYNC, not when it's put there
    because it couldn't be done in a non-blocking way.
