Re: index prefetching - Mailing list pgsql-hackers
| From | Konstantin Knizhnik |
|---|---|
| Subject | Re: index prefetching |
| Date | |
| Msg-id | 383515d3-1122-47fd-b6ea-09152500496a@garret.ru |
| In response to | Re: index prefetching (Andres Freund <andres@anarazel.de>) |
| List | pgsql-hackers |
On 18/12/2025 4:45 PM, Andres Freund wrote:
> Hi,
>
> On 2025-12-18 15:40:59 +0100, Tomas Vondra wrote:
>> The technical reason is that batch_getnext() does this:
>>
>> /* Delay initializing stream until reading from scan's second batch */
>> if (priorbatch && !scan->xs_heapfetch->rs && !batchqueue->disabled &&
>> enable_indexscan_prefetch)
>> scan->xs_heapfetch->rs =
>> read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
>> ....);
>>
>> which means we only create the read_stream (which is what enables the
>> prefetching) only when creating the second batch. And with LIMIT 100 we
>> likely read just a single leaf page (=batch) most of the time, which
>> means no read_stream and thus no prefetching.
> Why is the logic tied to the number of batches, rather the number of items in
> batches? It's not hard to come up with scenarios where having to wait for ~100
> random pages will be the majority of the queries IO wait... It makes sense to
> not initialize readahead if we just fetch an entry or two, but after that?
I did more experiments trying to understand when we can take advantage
of prefetch.
The schema is the same:
create table t (pk integer, sk integer, payload text default repeat('x',
1000)) with (fillfactor=10);
insert into t values (generate_series(1,10000000),random()*10000000);
create index on t(sk);
select.sql:
\set sk random(1, 10000000)
select * from t where sk >= :sk order by sk limit N;
And I run:
pgbench -n -T 30 -M prepared -f select.sql postgres
Results (pgbench tps, higher is better):

| limit \ prefetch | on | off | always | incremental |
|---|---|---|---|---|
| 1 | 12074 | 12765 | 3146 | 3282 |
| 2 | 5912 | 6198 | 2463 | 2438 |
| 4 | 2919 | 3047 | 1334 | 1964 |
| 8 | 1554 | 1496 | 1166 | 1409 |
| 16 | 815 | 775 | 947 | 940 |
| 32 | 424 | 403 | 687 | 695 |
| 64 | 223 | 208 | 446 | 453 |
| 128 | 115 | 106 | 258 | 270 |
| 256 | 68 | 53 | 138 | 149 |
| 512 | 43 | 27 | 72 | 78 |
| 1024 | 28 | 13 | 38 | 40 |
prefetch=always means commenting out the `priorbatch` check, i.e. immediate
creation of the read_stream:

      /* Delay initializing stream until reading from scan's second batch */
-     if (priorbatch && !scan->xs_heapfetch->rs && !batchqueue->disabled &&
+     if (/*priorbatch && */!scan->xs_heapfetch->rs && !batchqueue->disabled &&
prefetch=incremental replaces doubling of the prefetch distance with
incrementing it by one:

      /* Look-ahead distance ramps up rapidly after we do I/O. */
-     distance = stream->distance * 2;
+     distance = stream->distance ? stream->distance + 1 : 0;
So, as you expected, immediate creation of the read_stream causes quite a
significant performance degradation for indexscans inspecting a small
number of TIDs.
It looks like the threshold where the read stream starts to provide a
performance advantage is about 10 items.
Above that, earlier initialization of the read stream gives a quite
noticeable performance improvement.
I tried to find out with a profiler and debugger where most of the time
is spent in this case, and the answer was quite predictable:
in read_stream_reset->read_stream_next_buffer.
So we are just consuming prefetched buffers which we do not need.
I thought that we could use some better policy for increasing the prefetch
distance (right now it is just doubled), which is why I tried this
"incremental" policy.
Unfortunately it does not help to reduce the prefetch overhead for "short"
indexscans.
But what surprised me is that for longer indexscans this approach seems
to be slightly more efficient than doubling.
So it looks like we really should use a number-of-items criterion for read
stream initialization rather than the number of batches.
And maybe also think about an alternative policy for increasing the
prefetch distance.
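For illustration only, a rough sketch (not the actual patch) of what an
item-count based check in batch_getnext() could look like; the
nitems_returned counter and the INDEX_PREFETCH_MIN_ITEMS constant are
hypothetical, with the threshold of ~10 taken from the measurements above:

      /* Hypothetical: delay stream creation until the scan has returned enough items */
      #define INDEX_PREFETCH_MIN_ITEMS 10

      if (scan->nitems_returned >= INDEX_PREFETCH_MIN_ITEMS &&
          !scan->xs_heapfetch->rs && !batchqueue->disabled &&
          enable_indexscan_prefetch)
          scan->xs_heapfetch->rs =
              read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
                                         ....);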