Re: index prefetching - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: index prefetching
Msg-id 64c8b824-6203-46a3-b045-5e95b796feee@vondra.me
In response to Re: index prefetching  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
On 7/13/25 23:56, Tomas Vondra wrote:
> 
> ...
> 
>>> The one real limitation of the simpler approach is that prefetching is
>>> limited to a single leaf page - we can't prefetch from the next one,
>>> until the scan advances to it. But based on experiments comparing this
>>> simpler and the "complex" approach, I don't think that really matters
>>> that much. I haven't seen any difference for regular queries.
>>
>> Did you model/benchmark it?
>>
> 
> Yes. I did benchmark the simple and complex versions I had at the time.
> But you know how it is with benchmarking - I'm sure it's possible to pick
> queries where it'd make a (significant) difference.
> 
> For example, if you make the index tuples "fat", that would make the
> prefetching less efficient.
> 
> Another thing is hardware. I've been testing on local NVMe drives, and
> those don't seem to need very long queues (it's diminishing returns).
> Maybe the results would be different on systems with more I/O latency
> (e.g. because the storage is not local).
> 
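
To illustrate the "fat tuples" point quantitatively, a back-of-envelope
calculation (the per-page overhead numbers are rough assumptions for
illustration, not exact nbtree accounting):

```python
# Back-of-envelope: how index tuple width limits per-leaf prefetching.
# Assumed numbers (8KB pages, 24B page header, 4B line pointers) are
# illustrative approximations, not exact nbtree accounting.
PAGE_SIZE = 8192
PAGE_HEADER = 24
LINE_POINTER = 4

def items_per_leaf(tuple_bytes):
    """Approximate number of index items that fit on one leaf page."""
    return (PAGE_SIZE - PAGE_HEADER) // (tuple_bytes + LINE_POINTER)

print(items_per_leaf(16))   # narrow keys: ~408 TIDs per leaf to prefetch
print(items_per_leaf(400))  # "fat" tuples: only ~20 TIDs per leaf
```

With only ~20 TIDs per leaf, a prefetcher that can't look past the current
leaf page has very little to work with, so the penalty of the simple
approach should grow as the tuples get fatter.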

I decided to run some fresh benchmarks, to confirm that my claims about
the simple vs. complex patches still hold for the recent versions. And
there's a lot of strange behavior I don't quite understand.

The results are in git (still running, so only some data sets):

  https://github.com/tvondra/indexscan-prefetch-tests/

There's a run.sh script; it expects three builds - master,
prefetch-simple and prefetch-complex (one for each patch). It then runs
queries with index scans (and bitmap scans, for comparison), forcing
different io_method and effective_io_concurrency settings, etc. All
tests run on the same data directory, in random order.

Consider for example this (attached):

https://github.com/tvondra/indexscan-prefetch-tests/blob/master/d16-rows-cold-32GB-16-scaled.pdf

There's one column for each io_method ("worker" appears twice, with two
different worker counts), and one row per data set. There's not much
difference between the io_methods, so I'll focus on "sync" (the
simplest one).

For the "uniform" data set, both prefetch patches do much better than
master (for low selectivities it's clearer in the log-scale chart). The
"complex" prefetch patch appears to have a bit of an edge for
selectivities above 1%. I find this a bit surprising - the leaf pages
have ~360 index items, so I wouldn't expect such an impact from not
being able to prefetch beyond the end of the current leaf page. But
maybe it matters on storage with higher latency (this is a cloud SSD on
Azure).
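
To put the ~360 items per leaf in perspective, here's a rough model (my
own numbers, not measured from either patch) of what fraction of each
leaf's scan runs with a truncated lookahead window, for a few
hypothetical window sizes:

```python
# Rough model: with a single-leaf prefetcher and a lookahead window of
# W TIDs, only the last W positions on each leaf page are affected by
# the inability to prefetch past the leaf boundary.
# ITEMS_PER_LEAF matches the ~360 figure above; window sizes are
# hypothetical.
ITEMS_PER_LEAF = 360

def truncated_fraction(window):
    """Fraction of leaf positions where the window hits the leaf end."""
    return min(window, ITEMS_PER_LEAF) / ITEMS_PER_LEAF

for w in (16, 64, 256):
    print(w, round(truncated_fraction(w), 3))
```

For small windows the truncated fraction is only a few percent, which is
why the observed edge of the "complex" patch is surprising - unless the
effective window on high-latency storage is much larger.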

But the thing I don't really understand is the "cyclic" data set (for
example), where the "simple" patch performs really badly. This data set
is designed to defeat prefetching - it's pretty much an adversarial
case. There are ~100 TIDs pointing to 100 heap pages for each key
value, and once you've read those 100 pages, the following key values
hit the same pages again and again. Prefetching is pointless, and
skipping duplicate blocks can't help either, because the duplicates are
too far apart for the check to be effective.
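
A toy sketch of that access pattern (the key and page counts are
assumptions matching the description above, not the actual generator
from the test repo):

```python
# Toy model of the "cyclic" data set: every key value points to the
# same ~100 heap pages, so after the first key the pages are already
# cached and prefetching them again is wasted work.
KEYS = 1000           # assumed number of distinct key values
PAGES_PER_KEY = 100   # ~100 heap pages per key, shared across keys

def blocks_for_key(key):
    # adversarial cycle: each key maps to the same 100 heap pages
    return list(range(PAGES_PER_KEY))

all_reads = [b for k in range(KEYS) for b in blocks_for_key(k)]
print(len(all_reads))       # 100000 block references in total...
print(len(set(all_reads)))  # ...but only 100 distinct blocks
```

So after the first ~100 page reads, everything is a cache hit, and any
time spent issuing prefetch requests is pure overhead.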

But how come the "complex" patch does so much better? It can't really
benefit from prefetching TIDs from the next leaf - not this much. Yet
it even does a bit better than master. I've been looking at this since
yesterday, and it makes no sense to me. Per "perf trace" it actually
issues 2x as many fadvise calls as the "simple" patch (which is strange
on its own, I think), yet it's apparently so much faster?
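
For what it's worth, here's a toy model (entirely my own construction,
not what either patch actually implements) of how much a
duplicate-skipping window of recently prefetched blocks could change
the fadvise count on this cyclic pattern:

```python
# Hypothetical model: a prefetcher that remembers the last N blocks it
# prefetched and skips duplicates. On a cyclic pattern, a window larger
# than the cycle suppresses almost all fadvise calls; a window of zero
# issues one call per block reference.
from collections import deque

def fadvise_count(blocks, window):
    """Count prefetch calls issued with a dedup window of `window` blocks."""
    recent = deque(maxlen=window)   # recently prefetched blocks
    calls = 0
    for b in blocks:
        if b not in recent:
            calls += 1              # a real prefetcher would fadvise here
            recent.append(b)
    return calls

# 1000 key values, each hitting the same 100 heap pages (as above)
cyclic = [p for _ in range(1000) for p in range(100)]
print(fadvise_count(cyclic, 0))    # no dedup: one call per reference
print(fadvise_count(cyclic, 128))  # window covers the cycle: ~100 calls
```

If the two patches track recently prefetched blocks differently, that
alone could explain a large difference in fadvise counts - though it
doesn't explain why *more* fadvise calls would coincide with *faster*
execution.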


regards

-- 
Tomas Vondra

