Re: index prefetching - Mailing list pgsql-hackers
| From | Konstantin Knizhnik |
|---|---|
| Subject | Re: index prefetching |
| Date | |
| Msg-id | 03cbf145-ea85-49d3-b5fc-71ed144c28e2@garret.ru |
| In response to | Re: index prefetching (Tomas Vondra <tomas@vondra.me>) |
| List | pgsql-hackers |
On 18/12/2025 4:40 PM, Tomas Vondra wrote:
On 12/18/25 14:57, Konstantin Knizhnik wrote:

On 17/12/2025 9:54 PM, Tomas Vondra wrote:

On 12/17/25 20:30, Andres Freund wrote:

Hi,

On 2025-12-17 13:49:43 -0500, Peter Geoghegan wrote:

On Wed, Dec 17, 2025 at 12:19 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:

Moreover with `enable_indexscan_prefetch=off` results are the same.

It's quite unlikely that the current heuristics that trigger prefetching would have ever allowed any prefetching for queries such as these. The exact rule right now is that we don't even begin prefetching until we've already read at least one index leaf page, and have to read another one. So it's impossible to use prefetching with a LIMIT of 1, with queries such as these. It's highly unlikely that you'd see any benefits from prefetching even with LIMIT 100 (usually we wouldn't even begin prefetching).

Note that due to the tuple size and fillfactor in Konstantin's workload, there will be one tuple per page... That should allow for some prefetching.

Yes, but that's in the heap. The mechanism Peter described is about leaf pages in the index, and the index has the usual fillfactor. So there'll be many index entries per leaf.

I slightly changed my benchmark setup:

```sql
create table t (pk integer, sk integer, payload text default repeat('x', 1000)) with (fillfactor=10);
insert into t values (generate_series(1,10000000), random()*10000000);
create index on t(sk);
```

select.sql:

```sql
\set sk random(1, 10000000)
select * from t where sk >= :sk order by sk limit N;
```

You are right. There is almost no effect of prefetch for limit=100, but a ~2x improvement for limit=1000 (TPS):

| eio \ limit | 1 | 100 | 1000 |
|---|---|---|---|
| 10 | 11102 | 142 | 28 |
| 0 | 11419 | 137 | 14 |

master:

| limit | 1 | 100 | 1000 |
|---|---|---|---|
| | 11480 | 130 | 13 |

One of the motivations for my experiments was to check that there is no performance degradation because of batching, and it is nice that there is no performance penalty here.
Still it is not quite clear to me why there is no positive effect for LIMIT 100.

The technical reason is that batch_getnext() does this:

```c
/* Delay initializing stream until reading from scan's second batch */
if (priorbatch && !scan->xs_heapfetch->rs &&
    !batchqueue->disabled && enable_indexscan_prefetch)
    scan->xs_heapfetch->rs =
        read_stream_begin_relation(READ_STREAM_DEFAULT, NULL, ....);
```

which means we create the read_stream (which is what enables the prefetching) only when creating the second batch. And with LIMIT 100 we likely read just a single leaf page (= batch) most of the time, which means no read_stream and thus no prefetching. You can try disabling this "priorbatch" condition, so that the read_stream gets created right away.
It has the expected effect: performance for LIMIT 100 increased from 142 TPS to 315 TPS (so also ~2x). At the same time, performance for LIMIT 1 dropped from 11419 to 3499 TPS, i.e. ~4x. For LIMIT 10 the results are 1388 TPS with prefetch disabled and 1116 TPS with prefetch enabled. So it looks like the threshold for enabling prefetch should be based not on the number of batches but on the expected number of heap reads, and it is closer to 100 or even 10 than to 1000. And for slower disks (or remote storage) the effect of prefetch should be much bigger.
True, but only if the data is not already in memory / shared buffers. IIRC this "priorbatch" logic mitigates regressions for cached workloads, because the read_stream initialization is expensive enough to hurt small queries when no I/O is needed.
I see.
But maybe we should compare the table size with shared_buffers or effective_cache_size?