Re: index prefetching - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | 1f4a10b3-0e5d-43cd-99d8-7657c02f4078@vondra.me Whole thread Raw |
In response to | Re: index prefetching (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: index prefetching
Re: index prefetching |
List | pgsql-hackers |
On 7/17/25 00:33, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 6:18 PM Andres Freund <andres@anarazel.de> wrote: >> There's no problem today - the indexams never use the tids to look up blocks >> themselves. They're always passed to the tableam to do so (via >> table_index_fetch_tuple() etc). I.e. the translation from TIDs to specific >> blocks & buffers happens entirely inside the tableam, therefore the tableam >> can choose to not use a 1:1 mapping or even to not use any buffers at all. > > Of course. Somehow, I missed that obvious point. That is the bare > minimum for a new interface such as this. > >> ISTM the right answer would be to allow the tableam to get the batches, >> without indexam feeding the read stream. That, perhaps not so coincidentally, >> is also what's needed for batching heap page locking and and HOT search. > > I agree. > >> I think this means that it has to be the tableam that creates the read stream >> and that does the work that's currently done in index_scan_stream_read_next(), >> i.e. the translation from TID to whatever resources are required by the >> tableam. Which presumably would include the tableam calling >> index_batch_getnext(). > > It probably makes sense to put that off for (let's say) a couple more > months. Just so we can get what we have now in better shape. The > "complex" patch only very recently started to pass all my tests (my > custom nbtree test suite used for my work in 17 and 18). > I agree tableam needs to have a say in this, so that it can interpret the TIDs in a way that fits how it actually stores data. But I'm not sure it should be responsible for calling index_batch_getnext(). Isn't the batching mostly an "implementation" detail of the index AM? That's how I was thinking about it, at least. Some of these arguments could be used against the current patch, where the next_block callback is defined by executor nodes. So in a way those are also "aware" of the batching. > I still need buy-in from Tomas on the "complex" approach. We chatted > briefly on IM, and he seems more optimistic about it than I thought > (in my on-list remarks from earlier). It is definitely his patch, > and I don't want to speak for him. I think I feel much better about the "complex" approach, mostly because you got involved and fixed some of the issues I've been struggling with. That is a huge help, thank you for that. The reasons why I started to look at the "simple" patch again [1] were not entirely technical, at least not in the sense "Which of the two designs is better?" It was mostly about my (in)ability to get it into a shape I'd be confident enough to commit. I kept running into weird and subtle issues in parts of the code I knew nothing about. Great way to learn stuff, but also a great way to burnout ... So the way I was thinking about it is more "perfect approach that I'll never be able to commit" vs. "good (and much simpler) approach". It's a bit like in the saying about a tree falling in forest. If a perfect patch never gets committed, does it make a sound? From the technical point of view, the "complex" approach is clearly more flexible. Because how could it not be? It can do everything the simple approach can, but also some additional stuff thanks to having multiple leaf pages at once. The question I'm still trying to figure out is how significant those benefits are, and whether it's worth it the extra complexity. I realize there's a difference between "complexity of a patch" and "complexity of the final code", and it may very well be that the complex approach would result in a much cleaner final code - I don't know. I don't have any clear "vision" of how the index AMs should work. My ambition was (and still is) limited to "add prefetching to index scans", and I don't feel qualified to make judgments about the overall design of index AMs (interfaces, layering). I have opinions, of course, but I also realize my insights are not very deep in this area. Which is why I've been trying to measure the "practical" differences between the two approaches, e.g. trying to compare how it performs on different data sets, etc. There are some pretty massive differences in favor of the "complex" approach, mostly due to the single-leaf-page limitation of the simple patch. I'm still trying to understand if this is "inherent" or if it could be mitigated in read_stream_reset(). (Will share results from a couple experiments in a separate message later.) This is the context of the benchmarks I've been sharing - me trying to understand the practical implications/limits of the simple approach. Not an attempt to somehow prove it's better, or anything like that. I'm not opposed to continuing work on the "complex" approach, but as I said, I'm sure I can't pull that off on my own. With your help, I think the chance of success would be considerably higher. Does this clarify how I think about the complex patch? regards [1] https://www.postgresql.org/message-id/32c15a30-6e25-4f6d-9191-76a19482c556%40vondra.me -- Tomas Vondra
pgsql-hackers by date: