Re: AIO v2.0 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO v2.0
Msg-id sazl7yyvaae23dysaedc62pu3zfvpc3bytaaqy5lk2sec3cmca@w4gt3tjs2tso
In response to Re: AIO v2.0  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
Hi,

On 2024-09-17 11:08:19 -0700, Noah Misch wrote:
> > - I am worried about the need for bounce buffers for writes of checksummed
> >   buffers. That quickly ends up being a significant chunk of memory,
> >   particularly when using a small shared_buffers with a higher than default
> >   number of connections. I'm currently hacking up a prototype that'd prevent us
> >   from setting hint bits with just a share lock. I'm planning to start a
> >   separate thread about that.
> 
> AioChooseBounceBuffers() limits usage to 256 blocks (2MB) per MaxBackends.
> Doing better is nice, but I don't consider this a blocker.  I recommend
> dealing with the worry by reducing the limit initially (128 blocks?).  Can
> always raise it later.

On storage that has nontrivial latency, like just about all cloud storage,
even 256 will be too low, particularly for the checkpointer.

Assuming 1ms latency - which isn't the high end of cloud storage latency - 256
blocks in flight limits you to <= 256MByte/s, even on storage that can have a
lot more throughput. With 3ms, which isn't uncommon, it's 85MB/s.
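
As a rough check on figures like these, the ceiling can be estimated with
Little's law: sustained throughput is at most (blocks in flight * block size)
/ latency. The sketch below is not taken from the patch; the function name and
the 8 kB block size (PostgreSQL's default BLCKSZ) are assumptions. Under those
assumptions, 256 blocks in flight work out to roughly 2 GB/s at 1 ms and about
700 MB/s at 3 ms. The 256 and 85 figures above therefore line up with
thousands of blocks completed per second, or with MB/s at a block size of
about 1 kB.

```python
# Upper bound on write throughput when the number of in-flight blocks is capped.
# Assumption: 8 kB blocks (PostgreSQL's default BLCKSZ); names are illustrative.

BLOCK_SIZE = 8192  # bytes


def max_throughput_bytes_per_sec(blocks_in_flight, latency_ms, block_size=BLOCK_SIZE):
    """Little's law bound: concurrency * request size / latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    # Each in-flight slot completes 1000 / latency_ms requests per second.
    return blocks_in_flight * block_size * 1000 / latency_ms


def blocks_per_sec(blocks_in_flight, latency_ms):
    """Number of blocks that can complete per second at the given latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return blocks_in_flight * 1000 / latency_ms


if __name__ == "__main__":
    for latency_ms in (1, 3):
        bound = max_throughput_bytes_per_sec(256, latency_ms)
        rate = blocks_per_sec(256, latency_ms)
        print(f"{latency_ms} ms: {rate:,.0f} blocks/s, {bound / 1e6:,.0f} MB/s")
```

The bound scales linearly with the limit, so lowering it from 256 to 128
blocks halves the ceiling at any given latency.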

Of course this could be addressed by tuning, but it seems like something that
shouldn't need to be tuned by the majority of folks running postgres.


We also discussed the topic at https://postgr.es/m/20240925020022.c5.nmisch%40google.com
> ... neither BM_SETTING_HINTS nor keeping bounce buffers looks like a bad
> decision.  From what I've heard so far of the performance effects, if it were
> me, I would keep the bounce buffers.  I'd pursue BM_SETTING_HINTS and bounce
> buffer removal as a distinct project after the main AIO capability.  Bounce
> buffers have an implementation.  They aren't harming other design decisions.
> The AIO project is big, so I'd want to err on the side of not designating
> other projects as its prerequisites.

Given the issues that modifying pages while they are in flight causes, not
just with PG-level checksums but also with filesystem-level checksums, I don't
feel like it's a particularly promising approach.
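
To illustrate the problem being referred to, here is a minimal sketch of how
changing a page after its checksum has been computed, but before the data
reaches storage, yields a mismatch, and how writing from a private copy avoids
that. It is only a model: zlib.crc32 stands in for the real page checksum, the
byte offset and bit chosen for the "hint bit" are arbitrary, and none of the
names come from PostgreSQL.

```python
import zlib

BLOCK_SIZE = 8192  # assumption: default 8 kB page


def page_checksum(page):
    """Stand-in checksum (CRC32); PostgreSQL uses a different algorithm."""
    return zlib.crc32(bytes(page))


def set_hint_bit(page):
    """Model a concurrent hint-bit update; offset 100 and bit 0x01 are arbitrary."""
    page[100] |= 0x01


# Case 1: write directly from the shared page.
shared_page = bytearray(BLOCK_SIZE)
checksum_at_submit = page_checksum(shared_page)
set_hint_bit(shared_page)            # page changes while the write is in flight
bytes_on_disk = bytes(shared_page)   # the device sees the modified contents
mismatch_without_bounce = page_checksum(bytes_on_disk) != checksum_at_submit

# Case 2: copy into a bounce buffer first and write from the copy.
shared_page = bytearray(BLOCK_SIZE)
bounce_buffer = bytes(shared_page)   # private snapshot of the page
checksum_at_submit = page_checksum(bounce_buffer)
set_hint_bit(shared_page)            # only the shared page changes
bytes_on_disk = bounce_buffer        # the device sees the unmodified snapshot
match_with_bounce = page_checksum(bytes_on_disk) == checksum_at_submit

if __name__ == "__main__":
    print("mismatch without bounce buffer:", mismatch_without_bounce)
    print("checksum intact with bounce buffer:", match_with_bounce)
```

The copy is what costs memory: every write in flight needs its own snapshot,
which is where the per-backend bounce buffer budget discussed above comes from.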

However, I think this doesn't have to mean that the BM_SETTING_HINTS stuff has
to be completed before we can move forward with AIO. If I split out the write
portion from the read portion a bit further, the main AIO changes and the
shared-buffer read user can be merged before there's a dependency on the hint
bit stuff being done.

Does that seem reasonable?

Greetings,

Andres Freund


