Re: aio/README.md comments - Mailing list pgsql-hackers

From Andres Freund
Subject Re: aio/README.md comments
Msg-id uebw3wuq3iudyx7xjgfqt7icqrtk4xv22cmwjittcy4s3rsaj2@d6sf52qwppbe
In response to Re: aio/README.md comments  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Hi,

On 2025-08-29 15:23:48 -0700, Jeff Davis wrote:
> On Fri, 2025-08-29 at 12:32 -0400, Andres Freund wrote:
> > I don't really see an advantage of sync in those cases either.
> 
> It seems a bit early to say that it's just there for debugging. But
> it's just in a README, so I won't argue the point.

There might be some regressions that make io_method=sync beneficial, but in
the short to medium term the goal ought to be for all non-ridiculous
configurations (I don't care about AIO performing well with s_b=16) not to
regress meaningfully, and for most things to be the same or better with AIO.

In the medium to long term, I don't see any reason to keep io_method=sync for
anything other than debugging.

Why do you think differently?



> diff --git a/src/backend/storage/aio/README.md b/src/backend/storage/aio/README.md
> index 72ae3b3737d..8fa6bd6e9ca 100644
> --- a/src/backend/storage/aio/README.md
> +++ b/src/backend/storage/aio/README.md
> @@ -4,27 +4,38 @@
>  
>  ### Why Asynchronous IO
>  
> -Until the introduction of asynchronous IO postgres relied on the operating
> -system to hide the cost of synchronous IO from postgres. While this worked
> -surprisingly well in a lot of workloads, it does not do as good a job on
> -prefetching and controlled writeback as we would like.
> -
> -There are important expensive operations like `fdatasync()` where the operating
> -system cannot hide the storage latency. This is particularly important for WAL
> -writes, where the ability to asynchronously issue `fdatasync()` or O_DSYNC
> -writes can yield significantly higher throughput.

I think this second paragraph was important and your rewrite largely removed
it?


> +Postgres depends on IO operations happening asynchronously for reasonable
> +performance: for instance, a sequential scan would be far slower without the
> +benefit of readahead. Historically, Postgres only used synchronous APIs for
> +IO, while assuming that the operating system would use the kernel buffer cache
> +to make those operations asynchronous in most cases (aside from, e.g.,
> +`fdatasync()`).
> +
> +The asynchronous IO APIs described here do not depend on that
> +assumption. Instead, they allow different low-level IO methods, which are
> +given more control and therefore rely less on the kernel's
> +behavior. Currently, only async read operations are supported, but the
> +infrastructure is designed to support async write operations in the future.

The infrastructure supports writes today; it's just that md.c and bufmgr.c
aren't ready to use it yet.


>  ### Why Direct / unbuffered IO
>  
>  The main reasons to want to use Direct IO are:
>  
> -- Lower CPU usage / higher throughput. Particularly on modern storage buffered
> -  writes are bottlenecked by the operating system having to copy data from the
> -  kernel's page cache to postgres buffer pool using the CPU. Whereas direct IO
> -  can often move the data directly between the storage devices and postgres'
> -  buffer cache, using DMA. While that transfer is ongoing, the CPU is free to
> -  perform other work.
> +- Avoid extra memory copies between the kernel buffer cache and Postgres
> +  shared buffers. These memory copies can become the bottleneck when the
> +  underlying storage has high enough throughput, which is common for
> +  solid-state drives or fast network block devices. Instead, direct IO can
> +  often move the data directly between the Postgres buffer cache and the
> +  device by using DMA, leaving the CPU free to perform other work.
>  - Reduced latency - Direct IO can have substantially lower latency than
>    buffered IO, which can be impactful for OLTP workloads bottlenecked by WAL
>    write latency.

I preferred the prior formulation that had the main reasons at the start of
the bullet points.


> @@ -37,11 +48,24 @@ The main reasons *not* to use Direct IO are:
>  
>  - Without AIO, Direct IO is unusably slow for most purposes.
>  - Even with AIO, many parts of postgres need to be modified to perform
> -  explicit prefetching.
> +  explicit prefetching (see read_stream.c).
>  - In situations where shared_buffers cannot be set appropriately large,
>    e.g. because there are many different postgres instances hosted on shared
>    hardware, performance will often be worse than when using buffered IO.

Ok, although perhaps better to refer to the read stream section at the bottom?


> +### Writing WAL
> +
> +Using AIO and Direct IO can reduce the overhead of WAL logging
> +substantially:
> +
> +- AIO allows to start WAL writes eagerly, so they complete before needing to
> +  wait
> +- AIO allows to have multiple WAL flushes in progress at the same time
> +- Direct IO can reduce the number of roundtrips to storage on some OSs
> +  and storage HW (buffered IO and direct IO without O_DSYNC needs to
> +  issue a write and after the write's completion a cache flush,
> +  whereas O\_DIRECT + O\_DSYNC can use a single Force Unit Access
> +  (FUA) write).

>  ## AIO Usage Example
>  
> @@ -196,25 +220,15 @@ processing to the AIO workers).
>  
>  ### IO can be started in critical sections
>  
> -Using AIO for WAL writes can reduce the overhead of WAL logging substantially:
>  
> -- AIO allows to start WAL writes eagerly, so they complete before needing to
> -  wait
> -- AIO allows to have multiple WAL flushes in progress at the same time
> -- AIO makes it more realistic to use O\_DIRECT + O\_DSYNC, which can reduce
> -  the number of roundtrips to storage on some OSs and storage HW (buffered IO
> -  and direct IO without O_DSYNC needs to issue a write and after the write's
> -  completion a cache flush, whereas O\_DIRECT + O\_DSYNC can use a single
> -  Force Unit Access (FUA) write).

Direct IO alone does not reduce the number of roundtrips; the combination of
DIO and O_DSYNC does. I think that got less clear in the rewrite.
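
To make the distinction concrete, here is a rough sketch of the two patterns
at the syscall level. This is not PostgreSQL code: the `demo_*` names and
`DEMO_ALIGN` are made up for illustration, Linux/glibc is assumed, and whether
the second variant really ends up as a single FUA write depends on the OS and
the storage hardware. If the filesystem rejects O_DIRECT at open time, the
sketch falls back to O_DSYNC alone.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE				/* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical alignment for the demo, not a PostgreSQL constant. */
#define DEMO_ALIGN 4096

/* Allocate a zeroed buffer aligned suitably for direct IO. */
static void *
demo_alloc_aligned(size_t len)
{
	void	   *buf = NULL;

	if (posix_memalign(&buf, DEMO_ALIGN, len) != 0)
		return NULL;
	memset(buf, 0, len);
	return buf;
}

/*
 * Variant 1: buffered write followed by an explicit flush. Durability
 * takes two steps: the write, and after its completion a cache flush.
 */
static int
demo_write_then_flush(const char *path, const void *buf, size_t len)
{
	int			fd = open(path, O_WRONLY | O_CREAT, 0600);

	if (fd < 0)
		return -1;
	if (pwrite(fd, buf, len, 0) != (ssize_t) len || fdatasync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}

/*
 * Variant 2: O_DIRECT combined with O_DSYNC. The write call itself only
 * returns once the data is durable, so no separate flush is issued here.
 * It is the combination of both flags that matters, not O_DIRECT alone.
 */
static int
demo_write_direct_dsync(const char *path, const void *buf, size_t len)
{
	int			flags = O_WRONLY | O_CREAT | O_DSYNC;
	int			fd;

	/* Direct IO requires aligned buffers and lengths. */
	assert(len % DEMO_ALIGN == 0);
	assert((uintptr_t) buf % DEMO_ALIGN == 0);

#ifdef O_DIRECT
	fd = open(path, flags | O_DIRECT, 0600);
	if (fd < 0 && errno == EINVAL)
		fd = open(path, flags, 0600);	/* filesystem lacks O_DIRECT */
#else
	fd = open(path, flags, 0600);
#endif
	if (fd < 0)
		return -1;
	if (pwrite(fd, buf, len, 0) != (ssize_t) len)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}
```

A caller would allocate a buffer with `demo_alloc_aligned()`, fill it, and
pass it to either function. Both return 0 on success and -1 on failure.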

Greetings,

Andres Freund


