Re: index prefetching - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: index prefetching
Msg-id: CAH2-Wzk+ba5oHf1ghC34Z8eoCvum5yHGMxHeFCbC2ibcQ+J7uw@mail.gmail.com
In response to: Re: index prefetching (Tomas Vondra <tomas@vondra.me>)
List: pgsql-hackers
On Tue, Jul 22, 2025 at 9:06 AM Tomas Vondra <tomas@vondra.me> wrote:
> Real workloads are likely to have multiple misses in a row, which indeed
> ramps up the distance quickly. So maybe it's not that bad. Could we
> track a longer history of hits/misses, and consider that when adjusting
> the distance? Not just the most recent hit/miss?

+1

> FWIW I re-ran the index-prefetch-test benchmarks with restoring the
> distance for the "simple" patch. The results are in the same github
> repository, in a separate branch:
>
> https://github.com/tvondra/indexscan-prefetch-tests/tree/with-distance-restore-after-reset

These results make way more sense. There was absolutely no reason why
the "simple" patch should have done so much worse than the "complex"
one for most of the tests you've been running.

Obviously, whatever advantage that the "complex" patch has is bound to
be limited to cases where index characteristics are naturally the
limiting factor. For example, with the pgbench_accounts_pkey table
there are only ever 6 distinct heap blocks on each leaf page. I bet
that your "linear" test more or less looks like that, too.

I attach pgbench_accounts_pkey_nhblks.txt, which shows a query that
(among other things) outputs "nhblks" for each leaf page from a given
index (while showing the details of each leaf page in index key space
order). It also shows results for pgbench_accounts_pkey with pgbench
scale 1. This is how I determined that every pgbench_accounts_pkey
leaf page points to exactly 6 distinct heap blocks -- "nhblks" is
always 6. Note that this is what I see regardless of the pgbench
scale, indicating that things always perfectly line up (even more than
I would expect for very synthetic data such as this).
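
For reference, something along these lines can be put together with the
pageinspect extension. The sketch below is only a simplified stand-in for
the attached query -- it scans leaf pages in physical block order rather
than in key space order, assumes 8KB blocks, and ignores posting list
tuples (fine for a unique index with no duplicates, like this one):

CREATE EXTENSION IF NOT EXISTS pageinspect;

SELECT b.blkno,
       -- heap TIDs on this leaf page (the high key has a NULL htid)
       count(*) FILTER (WHERE i.htid IS NOT NULL) AS nhtids,
       -- distinct heap blocks that those TIDs point to
       count(DISTINCT (i.htid::text::point)[0]::bigint) AS nhblks
FROM generate_series(1,
       (pg_relation_size('pgbench_accounts_pkey') / 8192)::int - 1) AS b(blkno),
     LATERAL bt_page_stats('pgbench_accounts_pkey', b.blkno) AS s,
     LATERAL bt_page_items('pgbench_accounts_pkey', b.blkno) AS i
WHERE s.type = 'l'   -- leaf pages only (block 0 is the metapage, so skip it)
GROUP BY b.blkno
ORDER BY b.blkno;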

This query is unwieldy when run against larger indexes, but that
shouldn't be necessary. As with pgbench_accounts_pkey, it's typical
for synthetically generated data to have a very consistent "nhblks",
regardless of the total amount of data.

With your "uniform" test cases, I'd expect this query to show "nhtids
== nhblks" (or very close to it), which of course makes our ability to
eagerly read further leaf pages almost irrelevant. If there are
hundreds of distinct heap blocks on each leaf page, but
effective_io_concurrency is 16 (or even 64), there's little we can do
about it.
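
If wading through per-page output for a larger index is a pain, the same
numbers can be rolled up into a single summary row instead (same
pageinspect sketch and caveats as above; swap in whichever index you want
to inspect):

WITH per_leaf AS (
    SELECT b.blkno,
           count(*) FILTER (WHERE i.htid IS NOT NULL) AS nhtids,
           count(DISTINCT (i.htid::text::point)[0]::bigint) AS nhblks
    FROM generate_series(1,
           (pg_relation_size('pgbench_accounts_pkey') / 8192)::int - 1) AS b(blkno),
         LATERAL bt_page_stats('pgbench_accounts_pkey', b.blkno) AS s,
         LATERAL bt_page_items('pgbench_accounts_pkey', b.blkno) AS i
    WHERE s.type = 'l'
    GROUP BY b.blkno
)
SELECT count(*)              AS nleafpages,
       round(avg(nhtids), 1) AS avg_nhtids,
       round(avg(nhblks), 1) AS avg_nhblks
FROM per_leaf;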

> I'm attaching two PDFs with results for eic=16 (linear and log-scale, to
> compare timings for quick queries). This shows that with restoring
> distance after reset, the simple patch is pretty much the same as the
> complex patch.
>
> The only data set where that's not the case is the "linear" data set,
> when everything is perfectly sequential. In this case the simple patch
> performs like "master" (i.e. no prefetching). I'm not sure why that is.

Did you restore the distance for the "complex" patch, too? I think
that it might well matter there as well.

Isn't the obvious explanation that the complex patch benefits from
being able to prefetch without being limited by index
characteristics/leaf page boundaries, while the simple patch doesn't?

> Anyway, it seems to confirm that most of the differences between the two
> patches are due to the "distance collapse". The impact of the resets in
> the first benchmarks surprised me quite a bit, but if we don't ramp up
> the distance that makes perfect sense.
>
> The issue probably affects other queries that do a lot of resets. Index
> scan prefetching just makes it very obvious.

What is the difference between cases like "linear / eic=16 / sync" and
"linear_1 / eic=16 / sync"?

One would imagine that these tests are very similar, based on the fact
that they have very similar names. But we see very different results
for each: with the former ("linear") test results, the "complex" patch
is 2x-4x faster than the "simple" patch. But, with the latter test
results ("linear_1", and other similar pairs of "linear_N" tests) the
advantage for the "complex" patch *completely* evaporates. I find that
very suspicious, and wonder if it might be due to a bug/inefficiency
that could easily be fixed (possibly an issue on the read stream side,
like the one you mentioned to Nazir just now).

--
Peter Geoghegan

Attachment: pgbench_accounts_pkey_nhblks.txt
