Re: AIO writes vs hint bits vs checksums - Mailing list pgsql-hackers

From Noah Misch
Subject Re: AIO writes vs hint bits vs checksums
Msg-id 20240924194340.92.nmisch@google.com
In response to AIO writes vs hint bits vs checksums  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Sep 24, 2024 at 11:55:08AM -0400, Andres Freund wrote:
> So far the AIO patchset has solved this by introducing a set of "bounce
> buffers", which can be acquired and used as the source/target of IO when doing
> it in-place into shared buffers isn't viable.
> 
> I am worried about that solution however, as either acquisition of bounce
> buffers becomes a performance issue (that's how I did it at first, it was hard
> to avoid regressions) or we reserve bounce buffers for each backend, in which
> case the memory overhead for instances with relatively small amount of
> shared_buffers and/or many connections can be significant.

> But: We can address this and improve performance over the status quo! Today we
> determine tuple visibility one-by-one, even when checking the
> visibility of an entire page worth of tuples. That's not exactly free. I've
> prototyped checking visibility of an entire page of tuples at once and it
> indeed speeds up visibility checks substantially (in some cases seqscans are
> over 20% faster!).

Nice!  It sounds like you refactored the relationship between
heap_prepare_pagescan() and HeapTupleSatisfiesVisibility() to move the hint
bit setting upward or the iterate-over-tuples downward.  Is that about right?

> Once we have page-level visibility checks we can get the right to set hint
> bits once for an entire page instead of doing it for every tuple - with that
> in place the "new approach" of setting hint bits only with BM_SETTING_HINTS
> wins.

How did page-level+BM_SETTING_HINTS performance compare to performance of the
page-level change w/o the BM_SETTING_HINTS change?

> Having a page level approach to setting hint bits has other advantages:
> 
> E.g. today, with wal_log_hints, we'll log hint bits when the first hint bit on
> the page is set, and we don't mark a page dirty on hot standby. This often
> results in hint bits not being persistently set on replicas until the page is
> frozen.

Nice way to improve that.

> Does this sound like a reasonable idea?  Counterpoints?

I guess the main part left to discuss is index scans or other scan types where
we'd either not do page-level visibility or we'd do page-level visibility
including tuples we wouldn't otherwise use.  BM_SETTING_HINTS likely won't
show up so readily in index scan profiles, but the cost is still there.  How
should we think about comparing the distributed cost of the buffer header
manipulations during index scans vs. the costs of bounce buffers?

Thanks,
nm


