Re: index prefetching - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | 99028cb4-2782-43fe-b7aa-590b9692b040@vondra.me |
In response to | Re: index prefetching (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-hackers |
On 8/25/25 17:43, Thomas Munro wrote:
> On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra <tomas@vondra.me> wrote:
>> Of course, this can happen even with other hit ratios, there's nothing
>> special about 50%.
>
> Right, that's what this patch was attacking directly, basically only
> giving up when misses are so sparse we can't do anything about it for
> an ordered stream:
>
> https://www.postgresql.org/message-id/CA%2BhUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw%40mail.gmail.com
>
> aio: Improve read_stream.c look-ahead heuristics C
>
> Previously we would reduce the look-ahead distance by one every time we
> got a cache hit, which sometimes performed poorly with mixed hit/miss
> patterns, especially if it was trapped at one.
>
> Instead, sustain the current distance until we've seen evidence that
> there is no window big enough to span the gap between rare IOs. In
> other words, we now use information from a much larger window to
> estimate the utility of looking far ahead.

Ah, I forgot about this patch. There have been too many PoC / experimental
patches with read_stream improvements, and I'm losing track of them. I'm
ready to do some evaluation, but it's not clear which ones to evaluate.
Could you maybe consolidate them into a patch series that I could
benchmark?
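To make the difference between the two heuristics in the quoted commit message concrete, here is a toy simulation. It is only a sketch and not the code in read_stream.c: the function names, the doubling-on-miss growth rule, the cap of 64, and the "decay only after more consecutive hits than the cap" rule are all assumptions chosen for illustration.

```python
def decay_on_every_hit(pattern, max_distance=64):
    """Toy model of the old rule: shrink the distance by one on every hit."""
    distance = 1
    history = []
    for is_miss in pattern:
        if is_miss:
            # Assumed growth rule: double on a miss, up to a cap.
            distance = min(distance * 2, max_distance)
        else:
            distance = max(distance - 1, 1)
        history.append(distance)
    return history


def sustain_until_sparse(pattern, max_distance=64):
    """Toy model of the new idea: hold the distance until misses look too
    sparse for any window (here: up to max_distance) to span the gap."""
    distance = 1
    hits_since_miss = 0
    history = []
    for is_miss in pattern:
        if is_miss:
            hits_since_miss = 0
            distance = min(distance * 2, max_distance)
        else:
            hits_since_miss += 1
            # Hypothetical evidence test: only decay after a run of hits
            # longer than the largest window we could look ahead.
            if hits_since_miss > max_distance:
                distance = max(distance - 1, 1)
        history.append(distance)
    return history


# Mixed pattern: one miss followed by three hits, repeated (True = miss).
mixed = [True, False, False, False] * 50
old = decay_on_every_hit(mixed)
new = sustain_until_sparse(mixed)
print("old rule, final distance:", old[-1])
print("new rule, final distance:", new[-1])

# Sparse pattern: a single miss followed by a long run of hits.
sparse = [True] + [False] * 200
print("new rule on sparse misses:", sustain_until_sparse(sparse)[-1])
```

In this model the old rule stays trapped near a distance of one on the mixed pattern, while the sustained rule climbs to the cap and only gives up when the misses become sparse.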
I did give this patch a try with the dataset/query shared in [1], and
the explain looks like this:

                               QUERY PLAN
---------------------------------------------------------------------
 Index Scan using idx on t (actual rows=9048576.00 loops=1)
   Index Cond: ((a >= 16150) AND (a <= 4540437))
   Index Searches: 1
   Prefetch Distance: 271.999
   Prefetch Count: 4339129
   Prefetch Stalls: 386
   Prefetch Skips: 6039906
   Prefetch Resets: 0
   Stream Ungets: 1331122
   Stream Forwarded: 306719
   Prefetch Histogram: [2,4) => 10, [4,8) => 2, [8,16) => 2, [16,32) => 2, [32,64) => 2, [64,128) => 3, [256,512) => 4339108
   Buffers: shared hit=2573920 read=455610
 Planning:
   Buffers: shared hit=83 read=26
 Planning Time: 4.142 ms
 Execution Time: 1694.368 ms
(16 rows)

which is pretty good, and pretty much on par with master (so no
regression, which is good).

It's a bit strange the distance ends up being that high, though. The
explain says:

   Prefetch Distance: 271.999

There are ~70% misses on average, so isn't 272 a bit too high? Wouldn't
that cause too many concurrent IOs? Maybe I'm interpreting this wrong,
or maybe the explain stats are not quite right.

For comparison, the patch from [1] ends up with this:

   Prefetch Distance: 36.321

In any case, the patch seems to help, and maybe it's a better approach.
I need to take a closer look.

regards

[1] https://www.postgresql.org/message-id/8f5d66cf-44e9-40e0-8349-d5590ba8efb4%40vondra.me

-- 
Tomas Vondra
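The concern about concurrent IOs can be put into rough numbers. The sketch below assumes the simplest possible model, which is not taken from the patch: every miss inside the look-ahead window is a separate IO in flight, with no merging of adjacent blocks and no cap on concurrency. Under that assumption the expected number of IOs in the window is just distance times miss ratio. The function name is hypothetical; the distances and the ~70% miss ratio are the figures quoted above.

```python
def expected_ios_in_window(distance, miss_ratio):
    """Naive estimate: each miss in the look-ahead window is one IO.

    Ignores IO combining and any concurrency limit, so treat the result
    as a rough upper-bound style figure, not a measurement.
    """
    return distance * miss_ratio


for label, distance in [("this patch", 271.999), ("patch from [1]", 36.321)]:
    ios = expected_ios_in_window(distance, 0.70)
    print(f"{label}: distance {distance} -> about {round(ios)} IOs in the window")
```

If the reported distance really is a count of blocks looked ahead, this naive model gives roughly 190 IOs in the window for this patch versus roughly 25 for the patch from [1], which is the gap the question above is about. Whether that many IOs are actually issued concurrently depends on limits this model deliberately leaves out.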