Re: index prefetching - Mailing list pgsql-hackers

From Andres Freund
Subject Re: index prefetching
Msg-id 5pltwb73d7cynsxo2yb54ygjk7haviatkrx43mnzihc6kkield@ahnstpgof46i
In response to Re: index prefetching  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
Hi,

On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote:
> On 8/28/25 18:16, Andres Freund wrote:
> >> So I think the IPC overhead with "worker" can be quite significant,
> >> especially for cases with distance=1. I don't think it's a major issue
> >> for PG18, because seq/bitmap scans are unlikely to collapse the distance
> >> like this. And with larger distances the cost amortizes. It's a much
> >> bigger issue for index prefetching, it seems.
> > 
> > I couldn't keep up with all the discussion, but are there actually valid
> > I/O-bound cases (i.e. not ones where we erroneously keep the distance
> > short) where index scans can't have a higher distance?
> > 
> 
> I don't know, really.
> 
> Is the presented example really a case of an "erroneously short
> distance"?

I think the query isn't actually measuring something particularly useful in
the general case. You're benchmarking something where the results are never
looked at - which means the time between two index fetches is unrealistically
short. That means any tiny latency increase matters a lot more than with
realistic queries.
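
As a back-of-the-envelope illustration of that point (a toy model, not a
measurement; the function name and the microsecond values are made up for
illustration), the same fixed latency increase looks very different depending
on how much work happens between two fetches:

```python
def relative_slowdown(work_us: float, added_latency_us: float) -> float:
    """Ratio of per-fetch time with the extra latency to per-fetch time without it.

    work_us: hypothetical time the query spends between two fetches.
    added_latency_us: hypothetical fixed latency added to every fetch.
    """
    if work_us <= 0:
        raise ValueError("work_us must be positive")
    return (work_us + added_latency_us) / work_us


# Same added latency, increasing amounts of per-fetch work.
for work_us in (1.0, 10.0, 100.0):
    print(f"work={work_us:6.1f}us  slowdown={relative_slowdown(work_us, 5.0):.2f}x")
```

With these hypothetical numbers, 5us of extra latency is a 6x slowdown when
the query does 1us of work per fetch, but only about 5% when it does 100us.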

And this is, IIUC, on a local SSD. I'd bet that on cloud latencies AIO would
still be a huge win.


> From the 2x regression (compared to master) it might seem like that, but
> even with the increased distance it's still slower than master (by 25%). So
> maybe the "error" is to use AIO in these cases, instead of just switching to
> I/O done by the backend.

If it's slower at a higher distance, we're missing something.


> It may be a bit worse for non-btree indexes, e.g. for ordered scans
> on gist indexes (getting the next tuple may require reading many leaf
> pages, so maybe we can't look too far ahead?). Or for indexes with
> naturally "fat" tuples, which limits how many tuples we see ahead.

I am not worried at all about those cases. If you have to read a lot of index
leaf pages to get a heap fetch, a distance of even just 2 will be fine,
because the IPC overhead is a negligible cost compared to the index
processing. Similarly, if you have to do very deep index traversals due to
wide index tuples, there's going to be more time between two table fetches.
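
A rough sketch of why a small distance is enough in that situation. This is a
deliberately simplified model and an assumption on my part, not how the read
stream actually accounts for things: an I/O issued `distance` fetches ahead
gets `(distance - 1)` rounds of index processing to complete before it is
needed, and only the remainder shows up as a stall. All names and numbers are
hypothetical.

```python
def stall_per_fetch_us(distance: int, work_us: float,
                       io_latency_us: float, ipc_overhead_us: float) -> float:
    """Time a fetch still has to wait after overlapping with index processing.

    Simplifying assumption: look-ahead of `distance` hides
    (distance - 1) * work_us of the I/O latency plus IPC overhead.
    """
    if distance < 1:
        raise ValueError("distance must be >= 1")
    lead_time_us = (distance - 1) * work_us
    return max(0.0, io_latency_us + ipc_overhead_us - lead_time_us)


# Cheap index processing: distance 2 hides almost nothing.
print(stall_per_fetch_us(2, work_us=2.0, io_latency_us=100.0, ipc_overhead_us=10.0))
# Expensive index processing (many leaf pages per heap fetch): distance 2 hides it all.
print(stall_per_fetch_us(2, work_us=500.0, io_latency_us=100.0, ipc_overhead_us=10.0))
```

In this model the stall at distance 2 drops from 108us to 0us once the index
work between two heap fetches exceeds the I/O latency plus the IPC overhead.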


> > Obviously you can construct cases with a low distance by having indexes point
> > to a lot of tiny tuples pointing to perfectly correlated pages, but in that
> > case IO can't be a significant factor.
> > 
> 
> It's definitely true the examples the script finds are "adversarial", but
> also not entirely unrealistic.

I think doing index scans where the results are just thrown out is entirely
unrealistic...


> I suppose there will be such cases for any heuristics we come up with.

Agreed.


> There are probably more cases like this, where we end up with many hits.
> Say, a merge join may visit index tuples repeatedly, and so on. But then
> it's likely in shared buffers, so there won't be any IPC.

Yea, I'd not expect a meaningful impact of any of this in a workload like
that.

Greetings,

Andres Freund


