Re: index prefetching - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: index prefetching
Date:
Msg-id: bpdeohyqvltb77viyft4bza4xc4peed3jcoep74d2ih6ynqlke@wbnhcwmq3ril
In response to: Re: index prefetching (Andres Freund <andres@anarazel.de>)
Responses: Re: index prefetching
List: pgsql-hackers
Hi,

I spent a fair bit more time analyzing this issue.

On 2025-08-28 21:10:48 -0400, Andres Freund wrote:
> On 2025-08-28 19:57:17 -0400, Peter Geoghegan wrote:
> > On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra <tomas@vondra.me> wrote:
> > I'm not sure that Thomas'/your patch to ameliorate the problem on the
> > read stream side is essential here. Perhaps Andres can just take a
> > look at the test case + feature branch, without the extra patches.
> > That way he'll be able to see whatever the immediate problem is, which
> > might be all we need.
>
> It seems caused to a significant degree by waiting at low queue depths. If I
> comment out the stream->distance-- in read_stream_start_pending_read() the
> regression is reduced greatly.
>
> As far as I can tell, after that the process is CPU bound, i.e. IO waits don't
> play a role.

Indeed, the actual AIO subsystem is unrelated, from what I can tell: I hacked up read_stream.c/bufmgr.c to do readahead even if the buffer is in shared_buffers. With that, the negative performance impact of enable_indexscan_prefetch=1 is of a similar magnitude even if the table is already entirely in shared buffers. I.e. actual IO is unrelated.

I compared perf stat -ddd output for enable_indexscan_prefetch=0 with enable_indexscan_prefetch=1. The only real difference is a substantial (~3x) increase in branch misses. I then took a perf profile to see where all those misses come from. The first source is:

> I see a variety of reasons for increased CPU usage:
>
> 1) The private ref count infrastructure in bufmgr.c gets a bit slower once
> more buffers are pinned

The problem mainly seems to be that the branches in the loop at the start of GetPrivateRefCountEntry() are entirely unpredictable in this workload. I had an old patch that tried to make it possible to use SIMD for the search, by using a separate array for the Buffer ids - with that, gcc generates fairly crappy code, but it does make the search branchless. Here that substantially reduces the overhead of doing prefetching. Afterwards it's not a meaningful source of misses anymore. (A rough sketch of the idea is at the end of this mail.)

> 3) same issue with the resowner tracking

This one is much harder to address:

a) The "key" we are searching for is much wider (16 bytes), making
   vectorization of the search less helpful

b) Because we search up to owner->narr instead of a fixed length, the
   compiler wouldn't be able to auto-vectorize anyway

c) The branch misses are partially caused by ResourceOwnerForget()
   "scrambling" the order in the array when forgetting an element
   (a sketch of that is at the end of this mail as well)

I don't know how to fix this right now. I nevertheless wanted to see how big the impact is, so I just neutered ResourceOwner{Remember,Forget}{Buffer,BufferIO} - that's obviously not correct, but it suffices to show that the performance difference shrinks substantially. But not completely, unfortunately.

> But there's some additional difference in performance I don't yet
> understand...

I still don't think I fully understand why the impact of this is so large. The branch misses appear to be the only thing differentiating the two cases, but with resowners neutralized, the remaining difference in branch misses seems too large - it's not like the sequence of block numbers is more predictable without prefetching... The main increase in branch misses is in index_scan_stream_read_next...

Greetings,

Andres Freund
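
PS: For readers not following along in the code, here is a rough sketch of the look-ahead heuristic that the stream->distance-- quoted above is part of, as I understand it. This is a simplified stand-in, not the actual read_stream.c logic, and the struct/function names are made up:

#include <stdbool.h>

/*
 * Simplified model of the read stream look-ahead heuristic: buffers that
 * turn out to be cached decay the look-ahead distance, buffers that need
 * IO ramp it back up. On a mostly cached workload the distance therefore
 * collapses to a very low queue depth - the stream pays the per-buffer
 * bookkeeping cost without getting much IO overlap in return.
 */
typedef struct SketchReadStream
{
    int distance;       /* current look-ahead distance, in blocks */
    int max_distance;   /* upper bound on the look-ahead distance */
} SketchReadStream;

static void
sketch_adjust_distance(SketchReadStream *stream, bool buffer_was_cached)
{
    if (buffer_was_cached)
    {
        /* no IO was necessary: decay - this is the stream->distance-- above */
        if (stream->distance > 1)
            stream->distance--;
    }
    else
    {
        /* IO was necessary: double the look-ahead distance, up to the cap */
        stream->distance *= 2;
        if (stream->distance > stream->max_distance)
            stream->distance = stream->max_distance;
    }
}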
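
Similarly, a rough sketch of the "separate array for the Buffer ids" idea for GetPrivateRefCountEntry() - not the actual old patch, just an illustration of why a fixed-length scan without an early exit avoids the unpredictable branches (the constants mirror bufmgr.c/buf.h as far as I remember; the function name is made up):

typedef int Buffer;

#define REFCOUNT_ARRAY_ENTRIES 8

/* buffer ids kept in their own densely packed array, separate from the entries */
static Buffer PrivateRefCountIds[REFCOUNT_ARRAY_ENTRIES];

static int
sketch_find_refcount_slot(Buffer buffer)
{
    int match = -1;

    /*
     * Fixed trip count and no early exit: the compiler can turn this into
     * conditional moves (or SIMD compares) instead of a data-dependent
     * branch per element, so an unpredictable hit position doesn't cost
     * a branch mispredict.
     */
    for (int i = 0; i < REFCOUNT_ARRAY_ENTRIES; i++)
        match = (PrivateRefCountIds[i] == buffer) ? i : match;

    return match;   /* -1 means: fall back to the overflow hash table */
}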
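
And finally, a sketch of point c) above - why forgetting an element "scrambles" the resowner array. Again a simplified stand-in rather than the resowner.c code; the 16-byte element just mirrors the value + kind pair mentioned above, and all names are made up:

#include <stdint.h>

/*
 * Swap-with-last removal keeps the array dense and removal O(1), but it
 * destroys insertion order. Later linear searches therefore find their
 * match at essentially random positions, which is exactly the kind of
 * pattern a branch predictor cannot learn.
 */
typedef struct SketchResourceElem
{
    uintptr_t   value;  /* the resource itself, e.g. a buffer id */
    const void *kind;   /* which kind of resource this is */
} SketchResourceElem;   /* 16 bytes on 64-bit platforms */

static void
sketch_forget(SketchResourceElem *arr, uint32_t *narr, uint32_t idx)
{
    /* move the last element into the hole; order is no longer insertion order */
    arr[idx] = arr[*narr - 1];
    (*narr)--;
}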