Re: AIO v2.5 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO v2.5
Msg-id djy6kd673fj4ked5jb2itksixceoog2evrpgk5xglaflkglmaw@pmk3oyeui2cj
In response to Re: AIO v2.5  (Noah Misch <noah@leadboat.com>)
Responses Re: AIO v2.5
List pgsql-hackers
Hi,

On 2025-03-23 08:55:29 -0700, Noah Misch wrote:
> On Sun, Mar 23, 2025 at 11:11:53AM -0400, Andres Freund wrote:
> Unrelated to the above, another question about io_uring:
> 
> commit da722699 wrote:
> > +/*
> > + * Need to submit staged but not yet submitted IOs using the fd, otherwise
> > + * the IO would end up targeting something bogus.
> > + */
> > +void
> > +pgaio_closing_fd(int fd)
> 
> An IO in PGAIO_HS_STAGED clearly blocks closing the IO's FD, and an IO in
> PGAIO_HS_COMPLETED_IO clearly doesn't block that close.  For io_method=worker,
> closing in PGAIO_HS_SUBMITTED is okay.  For io_method=io_uring, is there a
> reference about it being okay to close during PGAIO_HS_SUBMITTED?  I looked
> awhile for an authoritative view on that, but I didn't find one.  If we can
> rely on io_uring_submit() returning only after the kernel has given the
> io_uring its own reference to all applicable file descriptors, I expect it's
> okay to close the process's FD.  If the io_uring acquires its reference later
> than that, I expect we shouldn't close before that later time.

I'm fairly sure io_uring has its own reference for the file descriptor by the
time io_uring_enter() returns [1].  What io_uring does *not* reliably tolerate
is the issuing process *exiting* before the IO completes, even if there are
other processes attached to the same io_uring instance.

AIO v1 had a posix_aio backend, which, on several platforms, did *not*
tolerate the FD being closed before the IO completes. Because of that,
IoMethodOps had a closing_fd callback, which posix_aio used to wait for the
IO's completion [2].


I've added a test case exercising this path for all io methods. But I can't
think of a way to catch, with high likelihood, io_uring not actually holding
a reference to the fd - the IO will almost always complete too quickly for
that. It still seems better than not testing the path at all - it does at
least catch the problem of pgaio_closing_fd() not doing anything.

Greetings,

Andres Freund

[1] See
  https://github.com/torvalds/linux/blob/586de92313fcab8ed84ac5f78f4d2aae2db92c59/io_uring/io_uring.c#L1728
  called from
  https://github.com/torvalds/linux/blob/586de92313fcab8ed84ac5f78f4d2aae2db92c59/io_uring/io_uring.c#L2204
  called from
  https://github.com/torvalds/linux/blob/586de92313fcab8ed84ac5f78f4d2aae2db92c59/io_uring/io_uring.c#L3372
  in the io_uring_enter() syscall

[2]
https://github.com/anarazel/postgres/blob/a08cd717b5af4e51afb25ec86623973158a72ab9/src/backend/storage/aio/aio_posix.c#L738


