Re: index prefetching - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | CAH2-Wz=FTBKaOS=1zC_ThFSiaLTebGiOvR0WuOj43GJD6igwzg@mail.gmail.com |
In response to | Re: index prefetching (Tomas Vondra <tomas@vondra.me>) |
List | pgsql-hackers |
On Fri, Jul 18, 2025 at 1:44 PM Tomas Vondra <tomas@vondra.me> wrote:
> I agree tableam needs to have a say in this, so that it can interpret
> the TIDs in a way that fits how it actually stores data. But I'm not
> sure it should be responsible for calling index_batch_getnext(). Isn't
> the batching mostly an "implementation" detail of the index AM? That's
> how I was thinking about it, at least.

I think of it in roughly the opposite way: to me, the table AM should mostly be in control of the whole process. The index AM (or really some generalized layer that is used for every index AM) should have some influence over the scheduling of index scans, but in typical cases where prefetching might be helpful the index AM should have little or no impact on the scheduling.

All of this business with holding on to buffer pins is 100% due to heap AM implementation details. Index vacuuming doesn't acquire cleanup locks because the index AM requires it. Cleanup locks are only required because otherwise there are races that affect index scans, where we get confused about which TID relates to which logical row. That's why bitmap index scans don't need to hold onto pins at all.

It's true that the current index AM API makes this the direct responsibility of index AMs, by requiring them to hold on to buffer pins across heap accesses. But that's just a historical accident.

> The reasons why I started to look at the "simple" patch again [1] were
> not entirely technical, at least not in the sense "Which of the two
> designs is better?" It was mostly about my (in)ability to get it into a
> shape I'd be confident enough to commit. I kept running into weird and
> subtle issues in parts of the code I knew nothing about. Great way to
> learn stuff, but also a great way to burnout ...

I was almost 100% sure that those nbtree implementation details were quite fixable from a very early stage. I didn't really get involved too much at first, because I didn't want to encroach.
I probably could have done a lot better with that myself.

> So the way I was thinking about it is more "perfect approach that I'll
> never be able to commit" vs. "good (and much simpler) approach". It's a
> bit like in the saying about a tree falling in forest. If a perfect
> patch never gets committed, does it make a sound?

Give yourself some credit. The complex patch is roughly 98% your work, and already works quite well. It's far from committable, of course, but it feels like it's already in roughly the right shape.

> From the technical point of view, the "complex" approach is clearly more
> flexible. Because how could it not be? It can do everything the simple
> approach can, but also some additional stuff thanks to having multiple
> leaf pages at once.

Right. More than anything else, I don't like the "simple" approach because limiting the number of leaf pages that can be read to exactly one feels so unnatural to me. It works in terms of the existing behavior with reading one leaf page at a time to do heap prefetching. But that existing behavior is itself a behavior that only exists for the benefit of heapam.

It just seems circular to me: "simple" heap prefetching does things in a way that's convenient for index AMs, specifically around the leaf-at-a-time implementation details -- details which only exist for the benefit of heapam. My sense is that just cutting out the index AM entirely is a much more principled approach.

It's also because of the ability to reorder work, and to centralize scheduling of index scans, of course -- there are practical benefits, too. But, honestly, my primary concern is this issue with "circularity".

The "simple" patch is simpler only as one incremental step. But it doesn't actually leave the codebase as a whole in a simpler state than I believe to be possible with the "complex" patch. It won't really be simpler in the first committed version, and it definitely won't be if we ever want to improve on that.
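
To make the single-leaf-page limitation concrete, here is a toy model of it. This is only a sketch under loud assumptions: it is not code from either patch set or from PostgreSQL's read stream, and every name and constant in it (total_lookahead, TIDS_PER_LEAF, NUM_LEAVES, MAX_DISTANCE, the doubling ramp-up) is hypothetical. It sums, over all TIDs in a scan, how many later TIDs could already have been handed to a prefetcher, first when look-ahead stops and restarts at every leaf page boundary, then when it may span leaf pages.

```c
#include <stdio.h>

/* Toy constants, chosen only for illustration. */
#define TIDS_PER_LEAF 8   /* TIDs returned per index leaf page */
#define NUM_LEAVES    4   /* leaf pages visited by the scan */
#define MAX_DISTANCE  16  /* cap on the look-ahead distance */

/*
 * Sum, over every TID the scan consumes, the number of later TIDs that
 * could already have been passed to the prefetcher at that point.
 *
 * leaf_at_a_time != 0: look-ahead cannot cross a leaf page boundary, and
 * the distance starts over at 1 on each new leaf page.
 * leaf_at_a_time == 0: look-ahead may span leaf pages, and the distance
 * is never reset.
 */
static int
total_lookahead(int leaf_at_a_time)
{
    int total_tids = TIDS_PER_LEAF * NUM_LEAVES;
    int distance = 1;
    int sum = 0;

    for (int pos = 0; pos < total_tids; pos++)
    {
        int offset = pos % TIDS_PER_LEAF;
        int remaining;
        int ahead;

        if (leaf_at_a_time)
        {
            if (offset == 0)
                distance = 1;           /* restart on each leaf page */
            remaining = TIDS_PER_LEAF - 1 - offset;
        }
        else
            remaining = total_tids - 1 - pos;

        /* Cannot look further ahead than the TIDs currently visible. */
        ahead = (distance < remaining) ? distance : remaining;
        sum += ahead;

        /* Ramp up by doubling until the cap is reached. */
        if (distance < MAX_DISTANCE)
            distance *= 2;
    }

    return sum;
}
```

With these made-up constants, total_lookahead(1) comes to 68 and total_lookahead(0) comes to 327. The numbers mean nothing in absolute terms; the point is only that a look-ahead bounded by one leaf page never gets to use most of its allowed distance in this model.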

If anybody else has an opinion on this, please speak up. I'm pretty sure that only Tomas and I have commented on this important aspect directly. I don't want to win the argument; I just want the best design.

> I don't have any clear "vision" of how the index AMs should work. My
> ambition was (and still is) limited to "add prefetching to index scans",
> and I don't feel qualified to make judgments about the overall design of
> index AMs (interfaces, layering). I have opinions, of course, but I also
> realize my insights are not very deep in this area.

Thanks for being so open. Your position is completely reasonable.

> Which is why I've been trying to measure the "practical" differences
> between the two approaches, e.g. trying to compare how it performs on
> different data sets, etc. There are some pretty massive differences in
> favor of the "complex" approach, mostly due to the single-leaf-page
> limitation of the simple patch. I'm still trying to understand if this
> is "inherent" or if it could be mitigated in read_stream_reset(). (Will
> share results from a couple experiments in a separate message later.)

At a minimum, you should definitely teach the "simple" patchset to not reset the prefetch distance when there's no real need for it. That puts the "simple" patch at an artificial and unfair disadvantage.

> This is the context of the benchmarks I've been sharing - me trying to
> understand the practical implications/limits of the simple approach. Not
> an attempt to somehow prove it's better, or anything like that.

Makes sense.

> I'm not opposed to continuing work on the "complex" approach, but as I
> said, I'm sure I can't pull that off on my own. With your help, I think
> the chance of success would be considerably higher.

I can commit to making this project my #1 focus for Postgres 19 (#1 focus by far), provided the "complex" approach is used - just say the word. I cannot promise that we will be successful.
But I can say for sure that I'll have skin in the game. If the project fails, then I'll have failed too.

> Does this clarify how I think about the complex patch?

Yes, it does.

BTW, I don't think that there's all that much left to be said about nbtree in particular here. I don't think that there's very much work left there.

--
Peter Geoghegan