Re: AIO v2.5 - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | CA+hUKGKwV7MccEL+atTwwX2Pazo1h8M_ZChzKKMp7pz258uWow@mail.gmail.com |
In response to | Re: AIO v2.5 (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Mon, Mar 24, 2025 at 5:59 AM Andres Freund <andres@anarazel.de> wrote:
> On 2025-03-23 08:55:29 -0700, Noah Misch wrote:
> > An IO in PGAIO_HS_STAGED clearly blocks closing the IO's FD, and an IO in
> > PGAIO_HS_COMPLETED_IO clearly doesn't block that close. For io_method=worker,
> > closing in PGAIO_HS_SUBMITTED is okay. For io_method=io_uring, is there a
> > reference about it being okay to close during PGAIO_HS_SUBMITTED? I looked
> > awhile for an authoritative view on that, but I didn't find one. If we can
> > rely on io_uring_submit() returning only after the kernel has given the
> > io_uring its own reference to all applicable file descriptors, I expect it's
> > okay to close the process's FD. If the io_uring acquires its reference later
> > than that, I expect we shouldn't close before that later time.
>
> I'm fairly sure io_uring has its own reference for the file descriptor by the
> time io_uring_enter() returns [1]. What io_uring does *not* reliably tolerate
> is the issuing process *exiting* before the IO completes, even if there are
> other processes attached to the same io_uring instance.

It is a bit strange that the documentation doesn't say that
explicitly. You can sorta-maybe-kinda infer it from the fact that
io_uring didn't originally support cancelling requests at all, maybe a
small clue that it also didn't cancel them when you closed the fd :-)
The only sane alternative would seem to be that they keep running and
have their own reference to the *file* (not the fd), which is the
actual case, and might also be inferrable at a stretch from the
io_uring_register() documentation that says it reduces overheads with
a "long term reference" reducing "per-I/O overhead". (The distant
third option/non-option is a sort of late/async binding fd as seen in
the Glibc user space POSIX AIO implementation, but that sort of
madness doesn't seem to be the sort of thing anyone working in the
kernel would entertain for a nanosecond...)
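
To make the file-versus-fd distinction concrete, here is a minimal sketch in plain POSIX C. It is only an analogy and does not use io_uring at all: dup() stands in for the extra reference that an in-flight request would hold on the open file, and write_after_close() is a made-up name for this illustration. The point is just that closing one descriptor does not tear down the underlying file while another reference to it exists.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustration only (hypothetical helper, not PostgreSQL or io_uring code):
 * dup() plays the role of the second reference to the open file.
 *
 * Writes msg through the duplicate after the original descriptor has been
 * closed, then reads it back from the pipe.  Returns the number of bytes
 * read back, or -1 on failure.
 */
ssize_t
write_after_close(const char *msg, char *out, size_t outlen)
{
    int         fds[2];
    int         ref;
    ssize_t     nread = -1;
    size_t      len = strlen(msg);

    if (pipe(fds) != 0)
        return -1;

    /* Take a second reference to the same open file (the write end). */
    ref = dup(fds[1]);
    if (ref < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Close the original descriptor; the file itself stays open. */
    close(fds[1]);

    /* The remaining reference is still usable. */
    if (write(ref, msg, len) == (ssize_t) len)
        nread = read(fds[0], out, outlen);

    close(ref);
    close(fds[0]);
    return nread;
}
```

Calling write_after_close("hello", buf, sizeof(buf)) should hand back the same five bytes even though the descriptor the pipe was created with is already gone by the time of the write.
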

Anyway, there are also public discussions involving Mr Axboe that
discuss the fact that async operations continue to run when the
associated fd is closed, eg from people who were surprised by that
when porting stuff from other systems, which might help fill in the
documentation gap a teensy bit if people want to see something outside
the source code:

https://github.com/axboe/liburing/issues/568

> AIO v1 had a posix_aio backend, which, on several platforms, did *not*
> tolerate the FD being closed before the IO completes. Because of that
> IoMethodOps had a closing_fd callback, which posix_aio used to wait for the
> IO's completion [2].

Just for the record while remembering this stuff: Windows is another
system that took the cancel-on-close approach, so the Windows IOCP
proof-of-concept patches also used that AIO v1 callback and we'll have
to think about that again if/when we want to get that stuff going on
AIO v2. I recall also speculating that it might be better to teach
the vfd system to pick another victim to close instead if an fd was
currently tied up with an asynchronous I/O for the benefit of those
cancel-on-close systems, hopefully without any happy-path
book-keeping. But just submitting staged I/O is a nice and cheap
solution for now, without them in the picture.
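
For what it's worth, the "pick another victim" idea could look roughly like the sketch below. Everything in it is hypothetical: the names (demo_pick_victim, demo_fd_busy, demo_busy_fd) and the flat LRU array are stand-ins for illustration, not the real vfd data structures. The busy check is a callback consulted only when a victim is needed, on the assumption that this keeps extra book-keeping off the happy path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: "is this kernel fd tied up with an async I/O?" */
typedef bool (*DemoFdBusyCheck) (int fd);

/* Toy stand-in for asking the AIO subsystem; at most one fd is "busy". */
int         demo_busy_fd = -1;

bool
demo_fd_busy(int fd)
{
    return fd == demo_busy_fd;
}

/*
 * Walk the descriptors from least to most recently used and return the
 * index of the first one that is open (>= 0) and not busy, or -1 if every
 * open descriptor is currently tied up.
 */
int
demo_pick_victim(const int *lru_fds, size_t nfds, DemoFdBusyCheck fd_busy)
{
    for (size_t i = 0; i < nfds; i++)
    {
        if (lru_fds[i] < 0)
            continue;           /* slot has no kernel fd to close */
        if (fd_busy != NULL && fd_busy(lru_fds[i]))
            continue;           /* skip: closing would cancel its I/O */
        return (int) i;
    }
    return -1;
}
```

With lru_fds = {-1, 7, 9, 11} and fd 7 busy, the picker skips the empty slot and the busy fd and settles on index 2. A caller that gets -1 back would still need some fallback, such as waiting for one of the I/Os to complete.
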