Re: aio/README.md comments - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: aio/README.md comments |
Date | |
Msg-id | uebw3wuq3iudyx7xjgfqt7icqrtk4xv22cmwjittcy4s3rsaj2@d6sf52qwppbe |
In response to | Re: aio/README.md comments (Jeff Davis <pgsql@j-davis.com>) |
List | pgsql-hackers |
Hi,

On 2025-08-29 15:23:48 -0700, Jeff Davis wrote:
> On Fri, 2025-08-29 at 12:32 -0400, Andres Freund wrote:
> > I don't really see an advantage of sync in those cases either.
>
> It seems a bit early to say that it's just there for debugging. But
> it's just in a README, so I won't argue the point.

There might be some regressions that make io_method=sync beneficial, but short
to medium term, the goal ought to be to make all non-ridiculous configurations
(I don't care about AIO performing well with s_b=16) to not regress
meaningfully and for most things to be the same or better with AIO.

I don't see any reason for io_method=sync to be something we should have for
anything other than debugging medium to long term. Why do you think different?

> diff --git a/src/backend/storage/aio/README.md b/src/backend/storage/aio/README.md
> index 72ae3b3737d..8fa6bd6e9ca 100644
> --- a/src/backend/storage/aio/README.md
> +++ b/src/backend/storage/aio/README.md
> @@ -4,27 +4,38 @@
>
>  ### Why Asynchronous IO
>
> -Until the introduction of asynchronous IO postgres relied on the operating
> -system to hide the cost of synchronous IO from postgres. While this worked
> -surprisingly well in a lot of workloads, it does not do as good a job on
> -prefetching and controlled writeback as we would like.
> -
> -There are important expensive operations like `fdatasync()` where the operating
> -system cannot hide the storage latency. This is particularly important for WAL
> -writes, where the ability to asynchronously issue `fdatasync()` or O_DSYNC
> -writes can yield significantly higher throughput.

I think this second paragraph was important and your rewrite largely removed
it?

> +Postgres depends on IO operations happening asynchronously for reasonable
> +performance: for instance, a sequential scan would be far slower without the
> +benefit of readahead. Historically, Postgres only used synchronous APIs for
> +IO, while assuming that the operating system would use the kernel buffer cache
> +to make those operations asynchronous in most cases (aside from, e.g.,
> +`fdatasync()`).
> +
> +The asynchronous IO APIs described here do not depend on that
> +assumption. Instead, they allow different low-level IO methods, which are
> +given more control and therefore rely less on the kernel's
> +behavior. Currently, only async read operations are supported, but the
> +infrastructure is designed to support async write operations in the future.

The infrastructure supports writes today, it's just that md.c and bufmgr.c
aren't ready to use it today.

>  ### Why Direct / unbuffered IO
>
>  The main reasons to want to use Direct IO are:
>
> -- Lower CPU usage / higher throughput. Particularly on modern storage buffered
> -  writes are bottlenecked by the operating system having to copy data from the
> -  kernel's page cache to postgres buffer pool using the CPU. Whereas direct IO
> -  can often move the data directly between the storage devices and postgres'
> -  buffer cache, using DMA. While that transfer is ongoing, the CPU is free to
> -  perform other work.
> +- Avoid extra memory copies between the kernel buffer cache and Postgres
> +  shared buffers. These memory copies can become the bottleneck when the
> +  underlying storage has high enough throughput, which is common for
> +  solid-state drives or fast network block devices. Instead, direct IO can
> +  often move the data directly between the Postgres buffer cache and the
> +  device by using DMA, leaving the CPU free to perform other work.
>  - Reduced latency - Direct IO can have substantially lower latency than
>    buffered IO, which can be impactful for OLTP workloads bottlenecked by WAL
>    write latency.

I preferred the prior formulation that had the main reasons at the start of
the bullet points.

> @@ -37,11 +48,24 @@ The main reasons *not* to use Direct IO are:
>
>  - Without AIO, Direct IO is unusably slow for most purposes.
>  - Even with AIO, many parts of postgres need to be modified to perform
> -  explicit prefetching.
> +  explicit prefetching (see read_stream.c).
>  - In situations where shared_buffers cannot be set appropriately large,
>    e.g. because there are many different postgres instances hosted on shared
>    hardware, performance will often be worse than when using buffered IO.

Ok, although perhaps better to refer to the read stream section at the bottom?

> +### Writing WAL
> +
> +Using AIO and Direct IO can reduce the overhead of WAL logging
> +substantially:
> +
> +- AIO allows to start WAL writes eagerly, so they complete before needing to
> +  wait
> +- AIO allows to have multiple WAL flushes in progress at the same time
> +- Direct IO can reduce the number of roundtrips to storage on some OSs
> +  and storage HW (buffered IO and direct IO without O_DSYNC needs to
> +  issue a write and after the write's completion a cache flush,
> +  whereas O\_DIRECT + O\_DSYNC can use a single Force Unit Access
> +  (FUA) write).
>  ## AIO Usage Example
>
> @@ -196,25 +220,15 @@ processing to the AIO workers).
>
>  ### IO can be started in critical sections
>
> -Using AIO for WAL writes can reduce the overhead of WAL logging substantially:
>
> -- AIO allows to start WAL writes eagerly, so they complete before needing to
> -  wait
> -- AIO allows to have multiple WAL flushes in progress at the same time
> -- AIO makes it more realistic to use O\_DIRECT + O\_DSYNC, which can reduce
> -  the number of roundtrips to storage on some OSs and storage HW (buffered IO
> -  and direct IO without O_DSYNC needs to issue a write and after the write's
> -  completion a cache flush, whereas O\_DIRECT + O\_DSYNC can use a single
> -  Force Unit Access (FUA) write).

Direct IO alone does not reduce the number of roundtrips, the combination of
DIO and O_DSYNC does. I think that got less clear in the rewrite.

Greetings,

Andres Freund
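
For readers following the roundtrip point, here is a minimal C sketch (not taken from the Postgres sources) of the two write paths the README text contrasts: a write followed by a separate `fdatasync()`, versus a file opened with `O_DSYNC` (optionally plus `O_DIRECT`) where the write itself only returns once the data is durable. The function names, `BLOCK_SIZE`, and the 4096-byte alignment are assumptions made for illustration. Whether the second path is really issued as a single FUA write depends on the OS and storage hardware.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* not available here: sketch stays buffered */
#endif

#define BLOCK_SIZE 4096         /* assumed alignment; real devices may differ */

/* Path 1: write, then a separate flush. Two steps before data is durable. */
int
write_then_flush(int fd, const void *buf, size_t len, off_t off)
{
    if (pwrite(fd, buf, len, off) != (ssize_t) len)
        return -1;
    return fdatasync(fd);       /* second step: wait for the data to be durable */
}

/* Path 2 setup: open so that every write is durable by the time it returns. */
int
open_dsync(const char *path, int use_direct)
{
    int         flags = O_WRONLY | O_CREAT | O_DSYNC;

    if (use_direct)
        flags |= O_DIRECT;      /* bypass the kernel page cache as well */
    return open(path, flags, 0600);
}

/* Path 2: a single call; no separate flush is issued by the caller. */
int
write_dsync(int fd, const void *buf, size_t len, off_t off)
{
    return pwrite(fd, buf, len, off) == (ssize_t) len ? 0 : -1;
}

/* O_DIRECT generally requires an aligned buffer, so allocate one. */
void *
alloc_block(void)
{
    void       *p = NULL;

    if (posix_memalign(&p, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return NULL;
    memset(p, 0, BLOCK_SIZE);
    return p;
}
```

With `use_direct` set, the buffer, length, and offset passed to `write_dsync()` all need to be suitably aligned, which is why `alloc_block()` uses `posix_memalign()`. Some filesystems reject `O_DIRECT` outright, so a caller would need a fallback.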