Re: index prefetching - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: index prefetching
Date
Msg-id 425c9dba-309d-49fa-a701-4ccbf5cce1da@garret.ru
In response to Re: index prefetching  (Tomas Vondra <tomas@vondra.me>)
Responses Re: index prefetching
List pgsql-hackers
On 28/12/2025 8:08 PM, Tomas Vondra wrote:
> On 12/25/25 16:39, Konstantin Knizhnik wrote:
>> On 21/12/2025 7:55 PM, Peter Geoghegan wrote:
>>> On Wed, Dec 10, 2025 at 9:21 PM Peter Geoghegan <pg@bowt.ie> wrote:
>>>> Attached is v4.
>>> Attached is v5. Changes from v4:
>>>
>>> * Simplified and optimized index-only scans, with a particular
>>> emphasis on avoiding regressions with nested loop joins with an inner
>>> index-only scan.
>>>
>>> There were quite a number of small problems/dead code related to
>>> index-only scans fixed by this new v5. Overall, I'm quite a bit
>>> happier with the state of index-only scans, which I'd not paid too
>>> much attention to before now.
>>>
>>> * Added Valgrind instrumentation to the hash index patch, which was
>>> required to fix some false positives.
>>>
>>> The generic indexam_util_batch_unlock routine had Valgrind
>>> instrumentation in earlier versions, just to keep nbtree's buffer
>>> locking checks from generating similar false positives. Some time
>>> later, when I added the hashgetbatch patch, there were new Valgrind
>>> false positives during hash index scans -- which I missed at first.
>>> This new v5 revision adds similar Valgrind checks to hash itself
>>> (changes that add code that is more or less a direct port of the stuff
>>> added to nbtree by commit 4a70f829), which fixes the false positives,
>>> and is independently useful.
>>>
>>> The rule for amgetbatch-based index AMs is that they must have similar
>>> buffer locking instrumentation. That seems like a good thing.
>>>
>>> -- 
>>> Peter Geoghegan
>> In the previous mail I shared results of my experiments with different
>> prefetch distances.
>> I think that we should start prefetching of heap tuples not from the
>> second batch, but after some number of processed TIDs.
>>
>> Attached please find a patch which implements this approach.
>> And below are updated results:
>>
>> limit\prefetch    on      off     always  inc     threshold
>> 1                 12074   12765   3146    3282    12394
>> 2                 5912    6198    2463    2438    6124
>> 4                 2919    3047    1334    1964    2910
>> 8                 1554    1496    1166    1409    1588
>> 16                815     775     947     940     600
>> 32                424     403     687     695     478
>> 64                223     208     446     453     358
>> 128               115     106     258     270     232
>> 256               68      53      138     149     131
>> 512               43      27      72      78      71
>> 1024              28      13      38      40      38
>>
>> Last column is result of prefetch with read_stream_threshold=10.
>>
> That's great, but it only works for cases that can (and do) benefit from
> the prefetching. Try running the benchmark with a data set that fits
> into shared buffers (or RAM), which makes prefetching useless.
>
> I tried that with your test, comparing master, v5 and v5 + your
> read_stream_threshold patch. See the attached run.sh script, and the PDF
> summarizing the results. The last two column groups are comparisons to
> master, with green=improvement, red=regression. There are no actual
> improvements (1% delta is just noise). But the read_stream_threshold
> results have a clear pattern of pretty massive (20-30%) regressions.
>
> The difference between v5 and v5-threshold is pretty clear.
>
> IIRC cases like this are *exactly* why we ended up with the current
> heuristics, enabling prefetching only from the second batch. This
> removes the risk of expensive read_stream init for very fast queries
> that don't benefit anything. Of course, prefetching may be useless for
> later batches too (e.g. if all the data is cached), but the query will
> be expensive enough for the read_stream init cost to be negligible.
>
> To put this differently, the more aggressive the heuristic is (enabling
> prefetching in more cases), the more likely it is to cause regressions.
> We've chosen to be more defensive, i.e. to sacrifice some possible gains
> in order to not regress plausible workloads. I hope we agree queries on
> fully cached "hot" data are pretty common / important.
>
> We can probably do better in the future. But we'll never know for sure
> if a given scan benefits from prefetching. It's not just about the
> number of items in the batch, but also about how many heap pages that
> translates to, what I/O pattern (random vs. sequential?), how many are
> already cached. For some queries we don't even know how many items we'll
> actually need. We can't check all that at the very beginning, because
> it's simply prohibitively expensive.

Thank you for looking at my patch.
I agree with you that such overhead when the data is already present in 
shared buffers is certainly not acceptable.
But it just means that we need a better criterion than the number of 
scanned TIDs - for example, the number of actual smgr heap reads.
I do not think that it will be too complex or expensive to implement - I 
will try.

But in any case, the current heuristic (prefetching only from the 
second batch) IMHO does not solve this problem.
First of all, as far as I understand, a batch is the set of TIDs from 
one leaf page, and if the keys are large (e.g. URLs), there will be 
just a few items in the batch.
Also, if all pages are present in shared buffers, then even starting 
prefetching from the second batch will have a negative impact on 
performance.
Maybe for long queries (scanning a lot of data) this overhead will be 
less noticeable.
But since it is proportional to the amount of scanned data, it may 
still be desirable to avoid it.





>
> regards
>


