Re: index prefetching - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: index prefetching
Msg-id 1f4a10b3-0e5d-43cd-99d8-7657c02f4078@vondra.me
In response to Re: index prefetching  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On 7/17/25 00:33, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 6:18 PM Andres Freund <andres@anarazel.de> wrote:
>> There's no problem today - the indexams never use the tids to look up blocks
>> themselves. They're always passed to the tableam to do so (via
>> table_index_fetch_tuple() etc). I.e. the translation from TIDs to specific
>> blocks & buffers happens entirely inside the tableam, therefore the tableam
>> can choose to not use a 1:1 mapping or even to not use any buffers at all.
> 
> Of course. Somehow, I missed that obvious point. That is the bare
> minimum for a new interface such as this.
> 
>> ISTM the right answer would be to allow the tableam to get the batches,
>> without indexam feeding the read stream.  That, perhaps not so coincidentally,
>> is also what's needed for batching heap page locking and HOT search.
> 
> I agree.
> 
>> I think this means that it has to be the tableam that creates the read stream
>> and that does the work that's currently done in index_scan_stream_read_next(),
>> i.e. the translation from TID to whatever resources are required by the
>> tableam. Which presumably would include the tableam calling
>> index_batch_getnext().
> 
> It probably makes sense to put that off for (let's say) a couple more
> months. Just so we can get what we have now in better shape. The
> "complex" patch only very recently started to pass all my tests (my
> custom nbtree test suite used for my work in 17 and 18).
> 

I agree tableam needs to have a say in this, so that it can interpret
the TIDs in a way that fits how it actually stores data. But I'm not
sure it should be responsible for calling index_batch_getnext(). Isn't
the batching mostly an "implementation" detail of the index AM? That's
how I was thinking about it, at least.

Some of these arguments could be used against the current patch, where
the next_block callback is defined by executor nodes. So in a way those
are also "aware" of the batching.

> I still need buy-in from Tomas on the "complex" approach. We chatted 
> briefly on IM, and he seems more optimistic about it than I thought 
> (in my on-list remarks from earlier). It is definitely his patch,
> and I don't want to speak for him.

I think I feel much better about the "complex" approach, mostly because
you got involved and fixed some of the issues I've been struggling with.
That is a huge help, thank you for that.

The reasons why I started to look at the "simple" patch again [1] were
not entirely technical, at least not in the sense "Which of the two
designs is better?" It was mostly about my (in)ability to get it into a
shape I'd be confident enough to commit. I kept running into weird and
subtle issues in parts of the code I knew nothing about. Great way to
learn stuff, but also a great way to burn out ...

So the way I was thinking about it is more "perfect approach that I'll
never be able to commit" vs. "good (and much simpler) approach". It's a
bit like the saying about a tree falling in the forest. If a perfect
patch never gets committed, does it make a sound?

From a technical point of view, the "complex" approach is clearly more
flexible. Because how could it not be? It can do everything the simple
approach can, but also some additional stuff thanks to having multiple
leaf pages at once.

The question I'm still trying to figure out is how significant those
benefits are, and whether they're worth the extra complexity. I realize
there's a difference between "complexity of a patch" and "complexity of
the final code", and it may very well be that the complex approach would
result in a much cleaner final code - I don't know.

I don't have any clear "vision" of how the index AMs should work. My
ambition was (and still is) limited to "add prefetching to index scans",
and I don't feel qualified to make judgments about the overall design of
index AMs (interfaces, layering). I have opinions, of course, but I also
realize my insights are not very deep in this area.

Which is why I've been trying to measure the "practical" differences
between the two approaches, e.g. trying to compare how it performs on
different data sets, etc. There are some pretty massive differences in
favor of the "complex" approach, mostly due to the single-leaf-page
limitation of the simple patch. I'm still trying to understand if this
is "inherent" or if it could be mitigated in read_stream_reset(). (Will
share results from a couple experiments in a separate message later.)

This is the context of the benchmarks I've been sharing - me trying to
understand the practical implications/limits of the simple approach. Not
an attempt to somehow prove it's better, or anything like that.

I'm not opposed to continuing work on the "complex" approach, but as I
said, I'm sure I can't pull that off on my own. With your help, I think
the chance of success would be considerably higher.

Does this clarify how I think about the complex patch?



regards

[1]
https://www.postgresql.org/message-id/32c15a30-6e25-4f6d-9191-76a19482c556%40vondra.me

-- 
Tomas Vondra



