Re: index prefetching - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: index prefetching
Date
Msg-id 99028cb4-2782-43fe-b7aa-590b9692b040@vondra.me
In response to Re: index prefetching  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On 8/25/25 17:43, Thomas Munro wrote:
> On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra <tomas@vondra.me> wrote:
>> Of course, this can happen even with other hit ratios, there's nothing
>> special about 50%.
> 
> Right, that's what this patch was attacking directly, basically only
> giving up when misses are so sparse we can't do anything about it for
> an ordered stream:
> 
> https://www.postgresql.org/message-id/CA%2BhUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw%40mail.gmail.com
> 
> aio: Improve read_stream.c look-ahead heuristics C
> 
> Previously we would reduce the look-ahead distance by one every time we
> got a cache hit, which sometimes performed poorly with mixed hit/miss
> patterns, especially if it was trapped at one.
> 
> Instead, sustain the current distance until we've seen evidence that
> there is no window big enough to span the gap between rare IOs.  In
> other words, we now use information from a much larger window to
> estimate the utility of looking far ahead.
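To make the difference between the two heuristics concrete, here is a toy Python model of how the look-ahead distance evolves over a sequence of cache hits and misses. It is only a sketch under stated assumptions, not code from the patch. It assumes the distance doubles on a miss (capped at a hypothetical max_distance), which is my understanding of what read_stream.c does. sustain_policy's rule, which decays only after a run of hits longer than max_distance, is a hypothetical reading of "no window big enough to span the gap". The model tracks only the distance variable and does not simulate any actual I/O.

```python
def old_policy(pattern, max_distance):
    """Distance after each block: double on a miss, minus one on a hit
    (the previous behavior described in the commit message)."""
    distance = 1
    trace = []
    for is_miss in pattern:
        if is_miss:
            distance = min(distance * 2, max_distance)
        else:
            distance = max(distance - 1, 1)
        trace.append(distance)
    return trace


def sustain_policy(pattern, max_distance):
    """Hypothetical 'sustain' rule: keep the distance on hits, and only
    decay once the run of consecutive hits exceeds max_distance, i.e.
    once no look-ahead window could span the gap to the next miss."""
    distance = 1
    hits_since_miss = 0
    trace = []
    for is_miss in pattern:
        if is_miss:
            hits_since_miss = 0
            distance = min(distance * 2, max_distance)
        else:
            hits_since_miss += 1
            if hits_since_miss > max_distance:
                distance = max(distance - 1, 1)
        trace.append(distance)
    return trace


# Alternating miss/hit pattern (50% misses).
pattern = [True, False] * 50
print(max(old_policy(pattern, 64)))      # 2: trapped near one
print(sustain_policy(pattern, 64)[-1])   # 64: grows to the cap

# Very sparse misses (one per 200 blocks): look-ahead can't help, so
# even the sustain rule gives up and decays back to 1.
sparse = ([True] + [False] * 199) * 3
print(sustain_policy(sparse, 64)[-1])    # 1
```

With the decrement-per-hit rule, each hit undoes the growth from the preceding miss, so a 50% hit ratio keeps the distance oscillating between 1 and 2. The sustain rule only gives up when the gaps between misses are longer than any window the stream is allowed to use.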

Ah, I forgot about this patch.

There have been too many PoC / experimental patches with read_stream
improvements; I'm losing track of them. I'm ready to do some
evaluation, but it's not clear which ones to evaluate, etc. Could you
maybe consolidate them into a patch series that I could benchmark?

I did give this patch a try with the dataset/query shared in [1], and
the explain looks like this:

                              QUERY PLAN
---------------------------------------------------------------------
Index Scan using idx on t  (actual rows=9048576.00 loops=1)
   Index Cond: ((a >= 16150) AND (a <= 4540437))
   Index Searches: 1
   Prefetch Distance: 271.999
   Prefetch Count: 4339129
   Prefetch Stalls: 386
   Prefetch Skips: 6039906
   Prefetch Resets: 0
   Stream Ungets: 1331122
   Stream Forwarded: 306719
   Prefetch Histogram: [2,4) => 10, [4,8) => 2, [8,16) => 2,
                       [16,32) => 2, [32,64) => 2, [64,128) => 3,
                       [256,512) => 4339108
   Buffers: shared hit=2573920 read=455610
 Planning:
   Buffers: shared hit=83 read=26
 Planning Time: 4.142 ms
 Execution Time: 1694.368 ms
(16 rows)

which is pretty good, and pretty much on par with master (so no
regression, which is good).

It's a bit strange the distance ends up being that high, though. The
explain says:

   Prefetch Distance: 271.999

There are ~70% misses on average, so isn't 272 a bit too high? Wouldn't
that cause too many concurrent IOs? Maybe I'm interpreting this wrong,
or maybe the explain stats are not quite right.

For comparison, the patch from [1] ends up with this:

   Prefetch Distance: 36.321
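A back-of-the-envelope check of the concern above, as a small Python sketch. It assumes "Prefetch Distance" is the average number of blocks the stream looks ahead and that misses are spread uniformly across that window at the ~70% rate quoted above. It ignores any separate cap on concurrent IOs and any I/O combining, so it is only a rough upper-bound estimate.

```python
def expected_ios_in_flight(distance, miss_ratio):
    """Rough number of blocks in the look-ahead window that need an
    actual read, assuming misses are spread uniformly."""
    return distance * miss_ratio


for label, distance in [("read_stream heuristics patch", 271.999),
                        ("patch from [1]", 36.321)]:
    print(f"{label}: ~{expected_ios_in_flight(distance, 0.70):.0f} reads in flight")
# read_stream heuristics patch: ~190 reads in flight
# patch from [1]: ~25 reads in flight
```

Under these assumptions the first distance implies roughly 190 outstanding reads, versus about 25 for the patch from [1].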

In any case, the patch seems to help, and maybe it's a better approach;
I need to take a closer look.


regards


[1]
https://www.postgresql.org/message-id/8f5d66cf-44e9-40e0-8349-d5590ba8efb4%40vondra.me

-- 
Tomas Vondra



