Hi,
On 2025-03-25 08:58:08 -0700, Noah Misch wrote:
> While having nagging thoughts that we might be releasing FDs before io_uring
> gets them into kernel custody, I tried this hack to maximize FD turnover:
>
> static void
> ReleaseLruFiles(void)
> {
> #if 0
> 	while (nfile + numAllocatedDescs + numExternalFDs >= max_safe_fds)
> 	{
> 		if (!ReleaseLruFile())
> 			break;
> 	}
> #else
> 	while (ReleaseLruFile())
> 		;
> #endif
> }
>
> "make check" with default settings (io_method=worker) passes, but
> io_method=io_uring in the TEMP_CONFIG file got different diffs in each of two
> runs. s/#if 0/#if 1/ (restore normal FD turnover) removes the failures.
> Here's the richer of the two diffs:
Yikes. That's a very good catch.
I spent a bit of time debugging this. I think I see what's going on - it turns
out that the kernel does *not* open the FDs during io_uring_enter() if
IOSQE_ASYNC is specified [1]. We add that flag heuristically, in an attempt to
avoid a small but measurable slowdown for sequential scans that are fully
buffered (cf. pgaio_uring_submit()). If I disable that heuristic, your patch
above passes all tests here.
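To illustrate why the timing of the FD lookup matters, here is a toy model in
plain C. It is only a sketch of the hazard, not kernel or PostgreSQL code: the
FD table, ToyRequest and the toy_* functions are all made up for illustration,
and "deferred" merely stands in for a request submitted with IOSQE_ASYNC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FDS 8

/* Toy per-process FD table: fd number -> name of the open file, or NULL. */
const char *toy_fd_table[TOY_MAX_FDS];

/* Hypothetical request; not a real io_uring structure. */
typedef struct ToyRequest
{
	int			fd;		/* fd number as passed at submission */
	const char *file;	/* file the fd resolved to, NULL until resolved */
} ToyRequest;

/* Resolve the fd number using the FD table as it looks right now. */
void
toy_resolve(ToyRequest *req)
{
	assert(req->fd >= 0 && req->fd < TOY_MAX_FDS);
	req->file = toy_fd_table[req->fd];
}

/*
 * Submission.  An eager request resolves its fd immediately; a deferred one
 * (modelling IOSQE_ASYNC in this sketch) leaves that to the worker.
 */
ToyRequest
toy_submit(int fd, bool deferred)
{
	ToyRequest	req = {fd, NULL};

	if (!deferred)
		toy_resolve(&req);
	return req;
}

/* The worker picks up the request and resolves the fd if still unresolved. */
const char *
toy_execute(ToyRequest *req)
{
	if (req->file == NULL)
		toy_resolve(req);
	return req->file;
}
```

If fd 3 is closed and its number reused for another file between toy_submit()
and toy_execute(), the eager request still operates on the original file,
while the deferred one silently picks up whatever fd 3 refers to by then.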
I don't know if that's an intentional or unintentional behavioral difference.
There are 2 1/2 ways around this:
1) Stop using the IOSQE_ASYNC heuristic
2a) Wait for all in-flight IOs when any FD gets closed
2b) Wait for all in-flight IOs using an FD when that FD gets closed
Given that we have clear evidence that io_uring doesn't completely support
closing FDs while IOs are in flight, be it a bug or intentional, it seems
clearly better to go for 2a or 2b.
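To make the difference between 2a and 2b concrete, here is a rough sketch in
plain C. Everything in it is hypothetical: the in-flight table, the function
names, and in particular inflight_complete(), which just marks a slot as done
where real code would have to block until the kernel reports the completion.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INFLIGHT 64

/* Hypothetical bookkeeping: one slot per submitted-but-not-completed IO. */
typedef struct InflightIo
{
	bool		in_use;
	int			fd;
} InflightIo;

InflightIo	inflight[MAX_INFLIGHT];

/* Record an IO at submission; returns its slot, or -1 if the table is full. */
int
inflight_add(int fd)
{
	for (int i = 0; i < MAX_INFLIGHT; i++)
	{
		if (!inflight[i].in_use)
		{
			inflight[i].in_use = true;
			inflight[i].fd = fd;
			return i;
		}
	}
	return -1;
}

/* Stand-in for waiting on one IO: here we simply mark the slot as done. */
void
inflight_complete(int slot)
{
	assert(slot >= 0 && slot < MAX_INFLIGHT && inflight[slot].in_use);
	inflight[slot].in_use = false;
}

/* Number of IOs currently tracked as in flight. */
int
inflight_count(void)
{
	int			n = 0;

	for (int i = 0; i < MAX_INFLIGHT; i++)
		n += inflight[i].in_use ? 1 : 0;
	return n;
}

/* 2b: before closing fd, wait only for the IOs that reference that fd. */
int
wait_for_fd_before_close(int fd)
{
	int			waited = 0;

	for (int i = 0; i < MAX_INFLIGHT; i++)
	{
		if (inflight[i].in_use && inflight[i].fd == fd)
		{
			inflight_complete(i);
			waited++;
		}
	}
	return waited;
}

/* 2a: before closing any fd, wait for every in-flight IO. */
int
wait_for_all_before_close(void)
{
	int			waited = 0;

	for (int i = 0; i < MAX_INFLIGHT; i++)
	{
		if (inflight[i].in_use)
		{
			inflight_complete(i);
			waited++;
		}
	}
	return waited;
}
```

2a needs no per-FD tracking but stalls on unrelated IOs; 2b waits for less at
the cost of having to know which FD each in-flight IO uses.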
Greetings,
Andres Freund
[1] Instead, files are opened when the queue entry is being worked on.
Interestingly that only happens when the IO is *explicitly*
requested to be executed in the workqueue with IOSQE_ASYNC, not when it's
put there because it couldn't be done in a non-blocking way.