Re: AIO v2.5 - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: AIO v2.5
Msg-id: 73p6ltusdmvnycra32pcaysqeeh2yvwoijwhvmkjexhej6jeso@gxhtb32iiu3d
In response to: Re: AIO v2.5 (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

Hi,

On 2025-03-14 15:43:15 -0400, Andres Freund wrote:
> Open items:
>
> - The upstream BAS_BULKREAD is so small that throughput is substantially worse
>   once a table reaches 1/4 shared_buffers. That patch in the patchset as-is is
>   probably not good enough, although I am not sure about that
>
>
> - The set_max_safe_fds() issue for io_uring
>
>
> - Right now effective_io_concurrency cannot be set > 0 on Windows and other
>   platforms that lack posix_fadvise. But with AIO we can read ahead without
>   posix_fadvise().
>
>   It'd not really make anything worse than today to not remove the limit, but
>   it'd be pretty weird to prevent windows etc from benefiting from AIO.  Need
>   to look around and see whether it would require anything other than doc
>   changes.

A fourth, smaller, question:

- Should the docs for debug_io_direct be rephrased and if so, how?

  Without read-stream-AIO debug_io_direct=data has completely unusable
  performance if there's ever any data IO - and if there's no IO there's no
  point in using the option.

  Now there is a certain set of workloads where performance with
  debug_io_direct=data can be better than master, sometimes substantially
  so. But at the same time, many other workloads will regress badly without
  support for at least:

  - AIO writes for at least checkpointer, bgwriter

    doing one synchronous IO for each buffer is ... slow.


  - read-streamified index vacuuming


  And probably also:
  - AIO-ified writes for writes executed by backends, e.g. due to strategies

    Doing one synchronous IO for each buffer is ... slow. And e.g. with COPY
    we do a *lot* of those. OTOH, it could be fine if most modifications are
    done via INSERTs instead of COPY.


  - prefetching for non-BHS (bitmap heap scan) index accesses

    Without prefetching, a well-correlated index-range scan will be orders of
    magnitude slower with DIO.


  - Anything bypassing shared_buffers, like RelationCopyStorage() or
    bulk_write.c, will be extremely slow.

    The only saving grace is that these aren't all *that* common.
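
  To make the "one synchronous IO for each buffer is slow" point concrete,
  here is a back-of-envelope model. It is not PostgreSQL code:
  estimate_io_seconds() is a hypothetical helper, the per-IO latency is an
  assumed input, and it ignores bandwidth limits, IO merging and CPU cost.
  It only shows how elapsed time scales with the number of IOs in flight.

```c
#include <stdio.h>

/*
 * Rough model (an assumption, not a measurement): the time to complete
 * nbuffers IOs when at most `depth` IOs can be in flight at once.
 * depth = 1 corresponds to one synchronous IO per buffer.
 */
static double
estimate_io_seconds(long nbuffers, double latency_sec, int depth)
{
    long rounds;

    /* treat a nonsensical depth as fully synchronous */
    if (depth < 1)
        depth = 1;

    /* each round of up to `depth` concurrent IOs costs one device latency */
    rounds = (nbuffers + depth - 1) / depth;

    return (double) rounds * latency_sec;
}
```

  With an assumed 100us per direct IO, writing 131072 8kB buffers (1GB)
  needs 131072 rounds at depth 1 but only 4096 rounds at depth 32, so the
  synchronous case is about 32x slower in this model.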


Due to those constraints I think it's pretty clear we can't remove the debug_
prefix at this time.

Perhaps it's worth going from

       <para>
        Currently this feature reduces performance, and is intended for
        developer testing only.
       </para>
to
       <para>
        Currently this feature reduces performance in many workloads, and is
        intended for testing only.
       </para>

I.e. qualify the downside with "many workloads" and widen the audience ever so
slightly?

Greetings,

Andres Freund


