Re: index prefetching - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: index prefetching
Date	July 18 21:50:46
Msg-id	CAH2-Wz=FTBKaOS=1zC_ThFSiaLTebGiOvR0WuOj43GJD6igwzg@mail.gmail.com
In response to	Re: index prefetching (Tomas Vondra <tomas@vondra.me>)
List	pgsql-hackers
On Fri, Jul 18, 2025 at 1:44 PM Tomas Vondra <tomas@vondra.me> wrote:
> I agree tableam needs to have a say in this, so that it can interpret
> the TIDs in a way that fits how it actually stores data. But I'm not
> sure it should be responsible for calling index_batch_getnext(). Isn't
> the batching mostly an "implementation" detail of the index AM? That's
> how I was thinking about it, at least.

I think of it in roughly the opposite way: to me, the table AM should
mostly be in control of the whole process. The index AM (or really
some generalized layer that is used for every index AM) should have
some influence over the scheduling of index scans, but in typical
cases where prefetching might be helpful the index AM should have
little or no impact on the scheduling.

All of this business with holding on to buffer pins is 100% due to
heap AM implementation details. Index vacuuming acquires cleanup
locks, but not because the index AM requires them. Cleanup locks are only
required because otherwise there are races that affect index scans,
where we get confused about which TID relates to which logical row.
That's why bitmap index scans don't need to hold onto pins at all.

It's true that the current index AM API makes this the direct
responsibility of index AMs, by requiring them to hold on to buffer
pins across heap accesses. But that's just a historical accident.

> The reasons why I started to look at the "simple" patch again [1] were
> not entirely technical, at least not in the sense "Which of the two
> designs is better?" It was mostly about my (in)ability to get it into a
> shape I'd be confident enough to commit. I kept running into weird and
> subtle issues in parts of the code I knew nothing about. Great way to
> learn stuff, but also a great way to burn out ...

I was almost 100% sure that those nbtree implementation details were
quite fixable from a very early stage. I didn't really get involved
too much at first, because I didn't want to encroach. I probably could
have done a lot better with that myself.

> So the way I was thinking about it is more "perfect approach that I'll
> never be able to commit" vs. "good (and much simpler) approach". It's a
> bit like in the saying about a tree falling in a forest. If a perfect
> patch never gets committed, does it make a sound?

Give yourself some credit. The complex patch is roughly 98% your work,
and already works quite well. It's far from committable, of course,
but it feels like it's already in roughly the right shape.

> From the technical point of view, the "complex" approach is clearly more
> flexible. Because how could it not be? It can do everything the simple
> approach can, but also some additional stuff thanks to having multiple
> leaf pages at once.

Right.

More than anything else, I don't like the "simple" approach because
limiting the number of leaf pages that can be read to exactly one feels
so unnatural to me. It works in terms of the existing behavior with
reading one leaf page at a time to do heap prefetching. But that
existing behavior is itself a behavior that only exists for the
benefit of heapam.

It just seems circular to me: "simple" heap prefetching does things in
a way that's convenient for index AMs, specifically around the
leaf-at-a-time implementation details -- details which only exist for
the benefit of heapam. My sense is that just cutting out the index AM
entirely is a much more principled approach.

It's also because of the ability to reorder work, and to centralize
scheduling of index scans, of course -- there are practical benefits,
too. But, honestly, my primary concern is this issue with
"circularity". The "simple" patch is simpler only as one incremental
step. But it doesn't actually leave the codebase as a whole in a
simpler state than I believe to be possible with the "complex" patch.
It won't really be simpler in the first committed version, and it
definitely won't be if we ever want to improve on that.

If anybody else has an opinion on this, please speak up. I'm pretty
sure that only Tomas and I have commented on this important aspect
directly. I don't want to win the argument; I just want the best
design.

> I don't have any clear "vision" of how the index AMs should work. My
> ambition was (and still is) limited to "add prefetching to index scans",
> and I don't feel qualified to make judgments about the overall design of
> index AMs (interfaces, layering). I have opinions, of course, but I also
> realize my insights are not very deep in this area.

Thanks for being so open. Your position is completely reasonable.

> Which is why I've been trying to measure the "practical" differences
> between the two approaches, e.g. trying to compare how it performs on
> different data sets, etc. There are some pretty massive differences in
> favor of the "complex" approach, mostly due to the single-leaf-page
> limitation of the simple patch. I'm still trying to understand if this
> is "inherent" or if it could be mitigated in read_stream_reset(). (Will
> share results from a couple experiments in a separate message later.)

At a minimum, you should definitely teach the "simple" patchset to not
reset the prefetch distance when there's no real need for it. That
puts the "simple" patch at an artificial and unfair disadvantage.
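
To illustrate why an unnecessary reset hurts, here is a toy model. It
is not PostgreSQL code: the function name, the "double after every
item" ramp-up rule, and all of the numbers are assumptions made purely
for illustration, and the real read stream heuristics are more subtle.
It only compares the average look-ahead distance when the distance is
reset at every leaf page boundary against a scan where it is kept.

```python
def average_distance(items_per_leaf, leaves, max_distance, reset_per_leaf):
    """Average look-ahead distance over a scan in a toy ramp-up model.

    Assumption: the distance starts at 1 and doubles after every item
    until it reaches max_distance. Real heuristics differ.
    """
    distance = 1
    total = 0
    for _ in range(leaves):
        if reset_per_leaf:
            # Models a reset at every leaf page boundary.
            distance = 1
        for _ in range(items_per_leaf):
            total += distance
            distance = min(distance * 2, max_distance)
    return total / (items_per_leaf * leaves)


# Hypothetical workload: 4 leaf pages, 8 items each, distance capped at 64.
with_reset = average_distance(8, 4, 64, reset_per_leaf=True)
without_reset = average_distance(8, 4, 64, reset_per_leaf=False)
print(with_reset, without_reset)
```

In this model the scan that resets on every leaf page spends most of
its time ramping back up, so its average distance stays well below the
cap, while the scan that keeps its distance reaches the cap once and
stays there.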

> This is the context of the benchmarks I've been sharing - me trying to
> understand the practical implications/limits of the simple approach. Not
> an attempt to somehow prove it's better, or anything like that.

Makes sense.

> I'm not opposed to continuing work on the "complex" approach, but as I
> said, I'm sure I can't pull that off on my own. With your help, I think
> the chance of success would be considerably higher.

I can commit to making this project my #1 focus for Postgres 19 (#1
focus by far), provided the "complex" approach is used - just say the
word.

I cannot promise that we will be successful. But I can say for sure
that I'll have skin in the game. If the project fails, then I'll have
failed too.

> Does this clarify how I think about the complex patch?

Yes, it does.

BTW, I don't think that there's all that much left to be said about
nbtree in particular here. I don't think that there's very much work
left there.

--
Peter Geoghegan