Re: index prefetching - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: index prefetching
Msg-id c333af4f-6448-4494-aaf7-9dea3f0afb2e@vondra.me
In response to Re: index prefetching  (Nazir Bilal Yavuz <byavuz81@gmail.com>)
On 7/21/25 08:53, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Mon, 21 Jul 2025 at 03:59, Thomas Munro <thomas.munro@gmail.com> wrote:
>>
>> On Sun, Jul 20, 2025 at 1:07 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>>> On Sat, Jul 19, 2025 at 11:23 PM Tomas Vondra <tomas@vondra.me> wrote:
>>>> The thing that however concerns me is that what I observed was not the
>>>> distance getting reset to 1, and then ramping up. Which should happen
>>>> pretty quickly, thanks to the doubling. In my experiments it *never*
>>>> ramped up again, it stayed at 1. I still don't quite understand why.
>>>
>>> Huh.  Will look into that on Monday.
>>
>> I suspect that it might be working as designed, but suffering from a
>> bit of a weakness in the distance control algorithm, which I described
>> in another thread[1].  In short, the simple minded algorithm that
>> doubles on miss and subtracts one on hit can get stuck alternating
>> between 1 and 2 if you hit certain patterns.  Bilal pinged me off-list
>> to say that he'd repro'd something like your test case and that's what
>> seemed to be happening, anyway?  I will dig out my experimental
>> patches that tried different adjustments to escape from that state....
> 
> I used Tomas Vondra's test [1]. I tracked how many times the
> StartReadBuffersImpl() function returns true (I/O is needed) and false
> (I/O is not needed, cache hit). It returns true ~6% of the time with
> both the simple and complex patches (~116000 times true, ~1900000
> times false for both).
> 
> The complex patch ramps up to a distance of ~250 at the start of the
> stream, and a 6% miss ratio is enough to stay at that distance.
> Actually, it is enough to ramp up further, but the max distance seems
> to be ~270, so it stays there. The simple patch, on the other hand,
> doesn't ramp up at the start of the stream, and 6% is not enough to
> ramp up later. The distance is always 1 when an I/O is needed, so the
> distance is doubled to 2, but then the next block is cached, so the
> distance is decreased by 1 and is 1 again.
> 
> [1] https://www.postgresql.org/message-id/aa46af80-5219-47e6-a7d0-7628106965a6%40vondra.me
> 

Yes, this is the behavior I observed too. I was wondering if the 5% miss
ratio hit some special "threshold" in the distance heuristics, and maybe
it'd work fine with a couple more misses.

But I don't think so. I think pretty much any workload with up to 50%
misses may hit this problem. We reset the distance to 1, and then with
50% misses we alternate roughly one hit and one miss, which doubles the
distance to 2 and then reduces it back to 1, indefinitely. Of course,
that's only for an even distribution of hits/misses (and the synthetic
workloads are fairly even).
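To make the oscillation concrete, here is a minimal simulation of the
simple heuristic as described above (double on miss, subtract one on
hit; the cap of 256 is an arbitrary stand-in, not the actual limit).
This is just an illustrative model, not PostgreSQL code:

```python
def adjust(distance, miss, max_distance=256):
    # Simplified model of the heuristic discussed above:
    # double the distance on an I/O miss, subtract one on a cache hit.
    if miss:
        return min(distance * 2, max_distance)
    return max(distance - 1, 1)

# Evenly interleaved hits and misses (50% miss ratio):
# the distance never escapes the 1 <-> 2 oscillation.
d = 1
trace = []
for i in range(20):
    d = adjust(d, miss=(i % 2 == 0))
    trace.append(d)
print(max(trace))  # 2

# A burst of consecutive misses, by contrast, ramps up quickly.
d = 1
for _ in range(8):
    d = adjust(d, miss=True)
print(d)  # 256
```

With an even hit/miss interleaving the decrement always cancels the
doubling, which is exactly the stuck state observed in the benchmarks.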

Real workloads are likely to have multiple misses in a row, which does
ramp up the distance quickly. So maybe it's not that bad. Could we track
a longer history of hits/misses and consider that when adjusting the
distance, rather than just the most recent hit/miss?
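One possible shape for that idea, purely as an illustration (the class,
window size, and 5% threshold are all made up here, not anything in
PostgreSQL): keep a sliding window of recent outcomes and adjust the
distance based on the observed miss ratio instead of the last event.

```python
from collections import deque

class DistanceHistory:
    """Hypothetical sketch: adjust the prefetch distance from the recent
    miss ratio rather than only the most recent hit/miss."""
    def __init__(self, window=32, max_distance=256):
        self.events = deque(maxlen=window)  # True = miss, False = hit
        self.distance = 1
        self.max_distance = max_distance

    def record(self, miss):
        self.events.append(miss)
        miss_ratio = sum(self.events) / len(self.events)
        if miss_ratio > 0.05:
            # Any non-trivial miss rate keeps the distance ramped up.
            self.distance = min(self.distance * 2, self.max_distance)
        elif not miss:
            # Near-perfect hit rate: decay slowly, as today.
            self.distance = max(self.distance - 1, 1)

h = DistanceHistory()
for i in range(64):
    h.record(i % 2 == 0)  # 50% misses, evenly interleaved
print(h.distance)  # ramps to the cap instead of oscillating at 1-2
```

With history, the evenly interleaved 50%-miss pattern that trapped the
simple heuristic now keeps the distance at the maximum.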


FWIW I re-ran the index-prefetch-test benchmarks with restoring the
distance for the "simple" patch. The results are in the same github
repository, in a separate branch:

https://github.com/tvondra/indexscan-prefetch-tests/tree/with-distance-restore-after-reset

I'm attaching two PDFs with results for eic=16 (linear and log scale, to
allow comparing timings for quick queries). They show that with the
distance restored after a reset, the simple patch performs pretty much
the same as the complex patch.

The only data set where that's not the case is the "linear" data set,
where everything is perfectly sequential. In this case the simple patch
performs like "master" (i.e. no prefetching). I'm not sure why that is.


Anyway, this seems to confirm that most of the difference between the
two patches is due to the "distance collapse". The impact of the resets
in the first benchmarks surprised me quite a bit, but if we don't ramp
the distance back up, it makes perfect sense.

The issue probably affects other queries that do a lot of resets. Index
scan prefetching just makes it very obvious.


regards

-- 
Tomas Vondra

