Re: AIO v2.5 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO v2.5
Date
Msg-id 3yxd5r23zly5bytvgyktbxtxq2r3gbpi7xd4dugevh3h4w4q6c@lu6oatjjpltz
Whole thread Raw
In response to Re: AIO v2.5  (Antonin Houska <ah@cybertec.at>)
Responses Re: AIO v2.5
Re: AIO v2.5
Re: AIO v2.5
List pgsql-hackers
Hi,

On 2025-03-13 11:53:03 +0100, Antonin Houska wrote:
> Attached are a few proposals for minor comment fixes.

Thanks, applied.


> Besides that, it occurred to me when I was trying to get familiar with the
> patch set (respectable work, btw) that an additional Assert() statement could
> make sense:

Yea, it does. I added it to another place as well.


Attached is v2.8 with the following changes:

- I wasn't happy with the way StartReadBuffers(), WaitReadBuffers() and
  AsyncReadBuffers() interacted. The io_method=sync path in particular was too
  cute by half, calling WaitReadBuffers() from within WaitReadBuffers().

  I think the new state considerably better.

  Plenty other smaller changes as part of that. One worth calling out is that
  ReadBuffersCanStartIO() now submits staged IO before actually blocking. Not
  the prettiest code, but I think it's ok.


- Added a function to assert the sanitiy of a ReadBuffersOperation

  While doing the refactoring for the prior point, I temporarily had a bug
  that returned buffers for which IO wasn't actually performed. Surprisingly
  the only assertion that triggered was when that buffer was read again by
  another operation, because it had been marked dirty, despite never being
  valid.

  Now there's a function that can be used to check that the buffers referenced
  by a ReadBuffersOperation are in a sane state.

  I guess this could be committed independently, but it'd not be entirely
  trivial to extract, so I'm currently leaning against doing that.


- Previously VacuumCostActive accounting happened after IO completion. But
  that doesn't seem quite right, it'd allow to submit a lot of IO at
  once. It's now moved to AsyncReadBuffers()


- With io_method=sync or with worker and temp tables, smgrstartreadv() would
  actually execute the IO. But the time accounting was done entirely around
  pgaio_wref_wait(). Now it's done in both places.


- Rebased onto newer version of Thomas' read_stream.c changes

  With that newer version the integration with read stream for actually doing
  AIO is a bit simpler.  There's one FIXME in the patch, because I don't
  really understand what a comment is referring to.

  I also split out a more experimental patch to make more efficient use of
  batching in read stream, the heuristics are more complicated, and it works
  well enough without.


- I added a commit to clean up the buffer access accounting for the case that
  a buffer was read in concurrently. That IMO is somewhat bogus on master, and
  it seemed to get more bogus with AIO.


- Integrated Antonin Houska's fixes and Assert suggestion


- Added a patch to address the smgr.c/md.c interrupt issue (a problem in master), see
  https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl



I think the reasonable next steps are:

- Commit "localbuf: *" commits


- Commit temp table tests, likely after lowering the minimum temp_buffers setting


- Pursue a fix of the smgr interupt issue on the referenced thread

  This can happen in parallel with AIO patches up to
  "aio: Implement support for reads in smgr/md/fd"


- Commit the core AIO infrastructure patch after doing a few more passes


- Commit IO worker support


- In parallel: Find a way to deal with the set_max_safe_fds() issue that we've
  been discussing on this thread recently. As that only affects io_uring, it
  doesn't have to block other patches going in.


- Do a round of review of the read_stream changes Thomas recently posted (and
  that are also included here)


- Try to get some more review for the bufmgr.c related changes. I've whacked
  them around a fair bit lately.


- Try to get Thomas to review my read_stream.c changes



Open items:

- The upstream BAS_BULKREAD is so small that throughput is substantially worse
  once a table reaches 1/4 shared_buffers. That patch in the patchset as-is is
  probably not good enough, although I am not sure about that


- The set_max_safe_fds() issue for io_uring


- Right now effective_io_concurrency cannot be set > 0 on Windows and other
  platforms that lack posix_fadvise. But with AIO we can read ahead without
  posix_fadvise().

  It'd not really make anything worse than today to not remove the limit, but
  it'd be pretty weird to prevent windows etc from benefiting from AIO.  Need
  to look around and see whether it would require anything other than doc
  changes.


Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Álvaro Herrera
Date:
Subject: Re: lwlocknames.h beautification attempt
Next
From: Álvaro Herrera
Date:
Subject: Re: Doc: fix the rewrite condition when executing ALTER TABLE ADD COLUMN