Re: index prefetching - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: index prefetching
Date
Msg-id CAH2-WznH5SQNvtTonhQexdZ-0x_7-m8uAsrbUkxpucg5ErYNzA@mail.gmail.com
Whole thread Raw
In response to Re: index prefetching  (Tomas Vondra <tomas@vondra.me>)
Responses Re: index prefetching
List pgsql-hackers
On Mon, Aug 25, 2025 at 10:18 AM Tomas Vondra <tomas@vondra.me> wrote:
> Almost all regressions (at least the top ones) now look like this, i.e.
> distance collapses to ~2.0, which essentially disables prefetching.

Good to know.

> But I no longer think it's caused by the "priorbatch" optimization,
> which delays read stream creation until after the first batch. I still
> think we may need to rethink that (e.g. if the first batch is huge), but
> he distance can "collapse" even without it. The optimization just makes
> it easier to happen.

That shouldn't count against the "priorbatch" optimization. I still
think that this issue should be treated as 100% unrelated to the
"priorbatch" optimization.

You might very well be right that the "priorbatch" optimization is too
naive about index scans whose first/possibly only leaf page has TIDs
that point to many distinct heap blocks (hundreds, say). But there's
no reason to think that that's truly relevant to the problem at hand.
If there was such a problem, then it wouldn't look like a regression
against enable_indexscan_prefetch = off/master. We'd likely require a
targeted approach to even notice such a problem; so far, most/all of
our index scan test cases have read hundreds/thousands of index pages
-- so any problem that's limited to the first leaf page read is likely
to go unnoticed.

I think that the "priorbatch" optimization at least takes
*approximately* the right approach, which is good enough for now. It
at least shouldn't ever do completely the wrong thing. It even seems
possible that sufficiently testing will actually show that its naive
approach to be the best one, on balance, once the cost of adding
mitigations (costs for all queries, not just ones like the one you
looked at recently) is taken into account.

I suggest that we not even think about "priorbatch" until the problem
on the read stream side is fixed. IMV we should at least have a
prototype patch for the read stream that we're reasonably happy with
before looking at "priorbatch" in further detail. I don't think we
have that right now.

> AFAICS the distance collapse is "inherent" to how the distance gets
> increased/decreased after hits/misses.

Right. (I think that you'll probably agree with me about addressing
this problem before even thinking about limitations in the
"priorbatch" optimization, but I thought it best to be clear about
that.)

> I find this distance heuristics a bit strange, for a couple reasons:
>
> * It doesn't seem right to get stuck at distance=2 with 50% misses.
> Surely that would benefit from prefetching a bit more?

Maybe, but at what cost? It doesn't necessarily make sense to continue
to read additional leaf pages, regardless of the number of heap buffer
hits in the recent past. At some point it likely makes more sense to
just give up and do actual query processing/return rows to the scan.
Even without a LIMIT. I have low confidence here, though.

> * It mostly ignores effective_io_concurrency, which I think about as
> "Keep this number of I/Os in the queue." But we don't try doing that.

As I said, I might just be wrong about "just giving up at some point"
making sense. I just don't necessarily think it makes sense to go from
ignoring effective_io_concurrency to *only* caring about
effective_io_concurrency. It's likely true that keeping
effective_io_concurrency-many I/Os in flight is the single most
important thing -- but I doubt it's the only thing that ever matters
(again, even assuming that there's no LIMIT involved).

> Attached is an example table/query, found by my script. Without the
> read_stream patch (i.e. just with the current index prefetching), it
> looks like this:

> So it's more a case of "mitigating a regression" (finding regressions
> like this is the purpose of my script). Still, I believe the questions
> about the distance heuristics are valid.
>
> (Another interesting detail is that the regression happens only with
> io_method=worker, not with io_uring. I'm not sure why.)

I find that the regression happens with io_uring. I also find that
your patch doesn't fix it. I have no idea why.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: index prefetching
Next
From: Sami Imseih
Date:
Subject: Re: GetNamedLWLockTranche crashes on Windows in normal backend