Re: AIO v2.5 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO v2.5
Date
Msg-id cdxulvhcemfcvluc6cttva5f5gthya6jcxrj64yhnmy3rmufe2@7kbk657ixbuu
Whole thread Raw
In response to Re: AIO v2.5  (Noah Misch <noah@leadboat.com>)
Responses Re: AIO v2.5
List pgsql-hackers
Hi,

On 2025-03-22 19:09:55 -0700, Noah Misch wrote:
> On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> > Attached v2.11
>
> > Subject: [PATCH v2.11 05/27] aio: Add io_method=io_uring
>
> Apart from some isolated cosmetic points, this is ready to commit:
>
> > +            ereport(ERROR,
> > +                    errcode(err),
> > +                    errmsg("io_uring_queue_init failed: %m"),
> > +                    hint != NULL ? errhint("%s", hint) : 0);
>
> https://www.postgresql.org/docs/current/error-style-guide.html gives the example:
>
> BAD:    open() failed: %m
> BETTER: could not open file %s: %m
>
> Hence, this errmsg should change, perhaps to:
> "could not setup io_uring queues: %m".

You're right. I didn't intentionally "violate" the policy, but I do have to
admit, I'm not a huge fan of that aspect, it just obfuscates what actually
failed, forcing one to look at the code or strace to figure out what precisely
failed.

(Changed)


> > +        pgaio_debug_io(DEBUG3, ioh,
> > +                       "wait_one io_gen: %llu, ref_gen: %llu, cycle %d",
> > +                       (long long unsigned) ref_generation,
> > +                       (long long unsigned) ioh->generation,
>
> In the message string, io_gen appears before ref_gen.  In the subsequent args,
> the order is swapped relative to the message string.

Oops, you're right.


> > --- a/src/backend/utils/activity/wait_event_names.txt
> > +++ b/src/backend/utils/activity/wait_event_names.txt
> > @@ -192,6 +192,8 @@ ABI_compatibility:
> >
> >  Section: ClassName - WaitEventIO
> >
> > +AIO_IO_URING_SUBMIT    "Waiting for IO submission via io_uring."
> > +AIO_IO_URING_COMPLETION    "Waiting for IO completion via io_uring."
> >  AIO_IO_COMPLETION    "Waiting for IO completion."
>
> I'm wondering if there's an opportunity to enrich the last two wait event
> names and/or descriptions.  The current descriptions suggest to me more
> similarity than is actually there.  Inputs to the decision:
>
> - AIO_IO_COMPLETION waits for an IO in PGAIO_HS_DEFINED, PGAIO_HS_STAGED, or
>   PGAIO_HS_COMPLETED_IO to reach PGAIO_HS_COMPLETED_SHARED.  The three
>   starting states are the states where some other backend owns the next
>   action, so the current backend can only wait to be signaled.
>
> - AIO_IO_URING_COMPLETION waits for the kernel to do enough so we can move
>   from PGAIO_HS_SUBMITTED to PGAIO_HS_COMPLETED_IO.
>
> Possible names and descriptions, based on PgAioHandleState enum names and
> comments:
>
> AIO_IO_URING_COMPLETED_IO    "Waiting for IO result via io_uring."
> AIO_COMPLETED_SHARED    "Waiting for IO shared completion callback."
>
> If "shared completion callback" is too internals-focused, perhaps this:
>
> AIO_IO_URING_COMPLETED_IO    "Waiting for IO result via io_uring."
> AIO_COMPLETED_SHARED    "Waiting for IO completion to update shared memory."

Hm, right now AIO_IO_COMPLETION also covers the actual "raw" execution of the
IO with io_method=worker/sync. For that AIO_COMPLETED_SHARED would be
inappropriate.

We could use a different wait event if wait for an IO via CV in
PGAIO_HS_SUBMITTED, with a small refactoring of pgaio_io_wait().  But I'm not
sure that would get you that far - we don't broadcast the CV when
transitioning from PGAIO_HS_SUBMITTED -> PGAIO_HS_COMPLETED_IO, so the wait
event would stay the same, now wrong, wait event until the shared callback
completes. Obviously waking everyone up just so they can use a differen wait
event doesn't make sense.

A more minimal change would be to narrow AIO_IO_URING_COMPLETION to
"execution" or something like that, to hint at a separation between the raw IO
being completed and the IO, including the callbacks completing.


> > --- a/doc/src/sgml/config.sgml
> > +++ b/doc/src/sgml/config.sgml
> > @@ -2710,6 +2710,12 @@ include_dir 'conf.d'
> >              <literal>worker</literal> (execute asynchronous I/O using worker processes)
> >             </para>
> >            </listitem>
> > +          <listitem>
> > +           <para>
> > +            <literal>io_uring</literal> (execute asynchronous I/O using
> > +            io_uring, if available)
>
> I feel the "if available" doesn't quite fit, since we'll fail if unavailable.
> Maybe just "(execute asynchronous I/O using Linux io_uring)" with "Linux"
> there to reduce surprise on other platforms.

You're right, the if available can be misunderstood. But not mentioning that
it's an optional dependency seems odd too. What about something like

           <para>
            <literal>io_uring</literal> (execute asynchronous I/O using
            io_uring, requires postgres to have been built with
            <link linkend="configure-option-with-liburing"><option>--with-liburing</option></link> /
            <link linkend="configure-with-liburing-meson"><option>-Dliburing</option></link>)
           </para>

Should the docs for --with-liburing/-Dliburing mention it's linux only? We
don't seem to do that for things like systemd (linux), selinux (linux) and
only kinda for bonjour (macos).

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: AIO v2.5
Next
From: Noah Misch
Date:
Subject: Re: AIO v2.5