Re: AIO v2.5 - Mailing list pgsql-hackers

From Noah Misch
Subject Re: AIO v2.5
Date
Msg-id 20250323002056.95.nmisch@google.com
Whole thread Raw
In response to Re: AIO v2.5  (Andres Freund <andres@anarazel.de>)
Responses Re: AIO v2.5
List pgsql-hackers
On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> Attached v2.11, with the following changes:

> - Added an error check for FileStartReadV() failing
> 
>   FileStartReadV() actually can fail, if the file can't be re-opened. I
>   thought it'd be important for the error message to differ from the one
>   that's issued for read actually failing, so I went with:
> 
>   "could not start reading blocks %u..%u in file \"%s\": %m"
> 
>   but I'm not sure how good that is.

Message looks good.

> - Improved error message if io_uring_queue_init() fails
> 
>   Added errhint()s for likely cases of failure.
> 
>   Added errcode().  I was tempted to use errcode_for_file_access(), but that
>   doesn't support ENOSYS - perhaps I should add that instead?

Either way is fine with me.  ENOSYS -> ERRCODE_FEATURE_NOT_SUPPORTED is a good
general mapping to have in errcode_for_file_access(), but it's also not a
problem to keep it the way v2.11 has it.

> - Disable io_uring method when using EXEC_BACKEND, they're not compatible
> 
>   I chose to do this with a define aio.h, but I guess we could also do it at
>   configure time? That seems more complicated though - how would we even know
>   that EXEC_BACKEND is used on non-windows?

Agreed, "make PROFILE=-DEXEC_BACKEND" is a valid way to get EXEC_BACKEND.

>   Not sure yet how to best disable testing io_uring in this case. We can't
>   just query EXEC_BACKEND from pg_config.h unfortunately.  I guess making the
>   initdb not fail and checking the error log would work, but that doesn't work
>   nicely with Cluster.pm.

How about "postgres -c io_method=io_uring -C <anything>":

--- a/src/test/modules/test_aio/t/001_aio.pl
+++ b/src/test/modules/test_aio/t/001_aio.pl
@@ -29,7 +29,13 @@ $node_worker->stop();
 # Test io_method=io_uring
 ###
 
-if ($ENV{with_liburing} eq 'yes')
+sub have_io_uring
+{
+    local %ENV = $node_worker->_get_env();  # any node works
+    return run_log [qw(postgres -c io_method=io_uring -C io_method)];
+}
+
+if (have_io_uring())
 {
     my $node_uring = create_node('io_uring');
     $node_uring->start();

> Questions:
> 
> 
> - We only "look" at BM_IO_ERROR for writes, isn't that somewhat weird?
> 
>   See AbortBufferIO(Buffer buffer)
> 
>   It doesn't really matter for the patchset, but it just strikes me as an oddity.

That caught my attention in an earlier review round, but I didn't find it
important enough to raise.  It's mildly unfortunate to be setting BM_IO_ERROR
for reads when the only thing BM_IO_ERROR drives is message "Multiple failures
--- write error might be permanent."  It's minor, so let's leave it that way
for the foreseeable future.

> Subject: [PATCH v2.11 01/27] aio, bufmgr: Comment fixes

Ready to commit, though other comment fixes might come up in later reviews.
One idea so far is to comment on valid states after some IoMethodOps
callbacks:

--- a/src/include/storage/aio_internal.h
+++ b/src/include/storage/aio_internal.h
@@ -310,6 +310,9 @@ typedef struct IoMethodOps
     /*
      * Start executing passed in IOs.
      *
+     * Shall advance state to PGAIO_HS_SUBMITTED.  (By the time this returns,
+     * other backends might have advanced the state further.)
+     *
      * Will not be called if ->needs_synchronous_execution() returned true.
      *
      * num_staged_ios is <= PGAIO_SUBMIT_BATCH_SIZE.
@@ -321,6 +324,12 @@ typedef struct IoMethodOps
     /*
      * Wait for the IO to complete. Optional.
      *
+     * On return, state shall be PGAIO_HS_COMPLETED_IO,
+     * PGAIO_HS_COMPLETED_SHARED or PGAIO_HS_COMPLETED_LOCAL.  (The callback
+     * need not change the state if it's already one of those.)  If state is
+     * PGAIO_HS_COMPLETED_IO, state will reach PGAIO_HS_COMPLETED_SHARED
+     * without further intervention.
+     *
      * If not provided, it needs to be guaranteed that the IO method calls
      * pgaio_io_process_completion() without further interaction by the
      * issuing backend.

> Subject: [PATCH v2.11 02/27] aio: Change prefix of PgAioResultStatus values to
>  PGAIO_RS_

Ready to commit

> Subject: [PATCH v2.11 03/27] Redefine max_files_per_process to control
>  additionally opened files

Ready to commit

> Subject: [PATCH v2.11 04/27] aio: Add liburing dependency

> --- a/meson.build
> +++ b/meson.build
> @@ -944,6 +944,18 @@ endif
>  
>  
>  
> +###############################################################
> +# Library: liburing
> +###############################################################
> +
> +liburingopt = get_option('liburing')
> +liburing = dependency('liburing', required: liburingopt)
> +if liburing.found()
> +  cdata.set('USE_LIBURING', 1)
> +endif

This is a different style from other deps; is it equivalent to our standard
style?  Example for lz4:

lz4opt = get_option('lz4')
if not lz4opt.disabled()
  lz4 = dependency('liblz4', required: false)
  # Unfortunately the dependency is named differently with cmake
  if not lz4.found() # combine with above once meson 0.60.0 is required
    lz4 = dependency('lz4', required: lz4opt,
                     method: 'cmake', modules: ['LZ4::lz4_shared'],
                    )
  endif

  if lz4.found()
    cdata.set('USE_LZ4', 1)
    cdata.set('HAVE_LIBLZ4', 1)
  endif

else
  lz4 = not_found_dep
endif

> --- a/configure.ac
> +++ b/configure.ac
> @@ -975,6 +975,14 @@ AC_SUBST(with_readline)
>  PGAC_ARG_BOOL(with, libedit-preferred, no,
>                [prefer BSD Libedit over GNU Readline])
>  
> +#
> +# liburing
> +#
> +AC_MSG_CHECKING([whether to build with liburing support])
> +PGAC_ARG_BOOL(with, liburing, no, [io_uring support, for asynchronous I/O],

Fourth arg generally starts with "build" for args like this.  I suggest "build
with io_uring support, for asynchronous I/O".  Comparable options:

  --with-llvm             build with LLVM based JIT support
  --with-tcl              build Tcl modules (PL/Tcl)
  --with-perl             build Perl modules (PL/Perl)
  --with-python           build Python modules (PL/Python)
  --with-gssapi           build with GSSAPI support
  --with-pam              build with PAM support
  --with-bsd-auth         build with BSD Authentication support
  --with-ldap             build with LDAP support
  --with-bonjour          build with Bonjour support
  --with-selinux          build with SELinux support
  --with-systemd          build with systemd support
  --with-libcurl          build with libcurl support
  --with-libxml           build with XML support
  --with-libxslt          use XSLT support when building contrib/xml2
  --with-lz4              build with LZ4 support
  --with-zstd             build with ZSTD support

> +              [AC_DEFINE([USE_LIBURING], 1, [Define to build with io_uring support. (--with-liburing)])])
> +AC_MSG_RESULT([$with_liburing])
> +AC_SUBST(with_liburing)
>  
>  #
>  # UUID library
> @@ -1463,6 +1471,9 @@ elif test "$with_uuid" = ossp ; then
>  fi
>  AC_SUBST(UUID_LIBS)
>  
> +if test "$with_liburing" = yes; then
> +  PKG_CHECK_MODULES(LIBURING, liburing)
> +fi

We usually put this right after the AC_MSG_CHECKING ... AC_SUBST block.  This
currently has unrelated stuff separating them.  Also, with the exception of
icu, we follow PKG_CHECK_MODULES uses by absorbing flags from pkg-config and
use AC_CHECK_LIB to add the actual "-l".  By not absorbing flags, I think a
liburing in a nonstandard location would require --with-libraries and
--with-includes, unlike the other PKG_CHECK_MODULES-based dependencies.  lz4
is a representative example of our standard:

```
AC_MSG_CHECKING([whether to build with LZ4 support])
PGAC_ARG_BOOL(with, lz4, no, [build with LZ4 support],
              [AC_DEFINE([USE_LZ4], 1, [Define to 1 to build with LZ4 support. (--with-lz4)])])
AC_MSG_RESULT([$with_lz4])
AC_SUBST(with_lz4)

if test "$with_lz4" = yes; then
  PKG_CHECK_MODULES(LZ4, liblz4)
  # We only care about -I, -D, and -L switches;
  # note that -llz4 will be added by AC_CHECK_LIB below.
  for pgac_option in $LZ4_CFLAGS; do
    case $pgac_option in
      -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
    esac
  done
  for pgac_option in $LZ4_LIBS; do
    case $pgac_option in
      -L*) LDFLAGS="$LDFLAGS $pgac_option";;
    esac
  done
fi

# ... later in file ...

if test "$with_lz4" = yes ; then
  AC_CHECK_LIB(lz4, LZ4_compress_default, [], [AC_MSG_ERROR([library 'lz4' is required for LZ4 support])])
fi
```

I think it's okay to not use the AC_CHECK_LIB and rely on explicit
src/backend/Makefile code like you've done, but we shouldn't miss
CPPFLAGS/LDFLAGS (or should have a comment on why missing them is right).

> --- a/doc/src/sgml/installation.sgml
> +++ b/doc/src/sgml/installation.sgml

lz4 and other deps have a mention in <sect1 id="install-requirements">, in
addition to sections edited here.

> Subject: [PATCH v2.11 05/27] aio: Add io_method=io_uring

(Still reviewing this one.)



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Proposal - Allow extensions to set a Plan Identifier
Next
From: Michael Paquier
Date:
Subject: Re: query_id: jumble names of temp tables for better pg_stat_statement UX