Re: index prefetching - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | CAH2-WzkqnVGLEQ31W1vm8T_uzy-ma-6A8QL-C56=0QUqs12b=Q@mail.gmail.com |
In response to | Re: index prefetching (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: index prefetching |
List | pgsql-hackers |
On Mon, Nov 11, 2024 at 1:33 PM Robert Haas <robertmhaas@gmail.com> wrote:
> That makes sense from the point of view of working with the btree code
> itself, but from a system-wide perspective, it's weird to pretend like
> the pins don't exist or don't matter just because a buffer lock is
> also held.

I can see how that could cause confusion. If you're working on nbtree all day long, it becomes natural, though. Both points are true, and relevant to the discussion.

I prefer to over-communicate when discussing these points -- it's too easy to talk past each other here. I think that the precise reasons why the index AM does things with buffer pins will need to be put on a more rigorous and formalized footing with Tomas' patch. The different requirements/safety considerations will have to be carefully teased apart.

> I had actually forgotten that the btree code tends to
> pin+lock together; now that you mention it, I remember that I knew it
> at one point, but it fell out of my head a long time ago...

The same thing appears to mostly be true of hash, which mostly uses _hash_getbuf + _hash_relbuf (hash's idiosyncratic use of cleanup locks notwithstanding).

To be fair it does look like GiST's gistdoinsert function holds onto multiple buffer pins at a time, for its own reasons -- index AM reasons. But this looks to be more or less an optimization to deal with navigating the tree with a loose index order, where multiple descents and ascents are absolutely expected. (This makes it a bit like the nbtree "drop lock but not pin" case that I mentioned in my last email.)

It's not as if these gistdoinsert buffer pins persist across calls to amgettuple, though, so for the purposes of this discussion about the new batch API to replace amgettuple they are not relevant -- they don't actually undermine my point. (Though to be fair their existence does help to explain why you found my characterization of buffer pins as irrelevant to index AMs confusing.)
The real sign that what I said is generally true of index AMs is that you'll see so few calls to LockBufferForCleanup/ConditionalLockBufferForCleanup. Only hash calls ConditionalLockBufferForCleanup at all (which I find a bit weird). GiST and SP-GiST call neither function -- even during VACUUM. So GiST and SP-GiST make clear that index AMs (that support only MVCC snapshot scans) can easily get by without any use of cleanup locks (and with no externally significant use of buffer pins).

> > I think that this is exactly what I propose to do, said in a different
> > way. (Again, I wouldn't have expressed it in this way because it seems
> > obvious to me that buffer pins don't have nearly the same significance
> > to an index AM as they do to heapam -- they have no value in
> > protecting the index structure, or helping an index scan to reason
> > about concurrency that isn't due to a heapam issue.)
> >
> > Does that make sense?
>
> Yeah, it just really throws me for a loop that you're using "pin" to
> mean "pin at a time when we don't also hold a lock."

I'll try to be more careful about that in the future, then.

> The fundamental
> purpose of a pin is to prevent a buffer from being evicted while
> someone is in the middle of looking at it, and nothing that uses
> buffers can possibly work correctly without that guarantee. Everything
> you've written in parentheses there is, AFAICT, 100% wrong if you mean
> "any pin" and 100% correct if you mean "a pin held without a
> corresponding lock."

I agree.

--
Peter Geoghegan
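
For readers less familiar with the distinction being drawn in this message, here is a minimal toy model in C of "a pin held without a corresponding lock". It is a sketch only and does not use PostgreSQL's buffer manager: ToyBuffer and every toy_* function are hypothetical names invented for illustration. The one behavior it borrows from the real system is that a cleanup lock (LockBufferForCleanup/ConditionalLockBufferForCleanup) requires the exclusive content lock plus the caller's pin being the only pin on the buffer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer. NOT PostgreSQL's real BufferDesc. */
typedef struct ToyBuffer
{
    int  pin_count;    /* how many backends currently hold a pin */
    int  share_locks;  /* how many hold the content lock in share mode */
    bool exclusive;    /* is the content lock held exclusively? */
} ToyBuffer;

void toy_pin(ToyBuffer *b)
{
    b->pin_count++;
}

void toy_unpin(ToyBuffer *b)
{
    assert(b->pin_count > 0);
    b->pin_count--;
}

/* A pin's basic job: a pinned buffer cannot be evicted. */
bool toy_can_evict(const ToyBuffer *b)
{
    return b->pin_count == 0;
}

/* Share-lock the buffer contents; the caller must already hold a pin. */
bool toy_lock_share(ToyBuffer *b)
{
    if (b->pin_count == 0 || b->exclusive)
        return false;
    b->share_locks++;
    return true;
}

void toy_unlock_share(ToyBuffer *b)
{
    assert(b->share_locks > 0);
    b->share_locks--;
}

/*
 * Toy analogue of a conditional cleanup lock: succeed only if nobody else
 * holds the content lock AND the caller's pin is the only pin. It never
 * waits; it just reports failure.
 */
bool toy_conditional_cleanup_lock(ToyBuffer *b)
{
    if (b->exclusive || b->share_locks > 0 || b->pin_count != 1)
        return false;
    b->exclusive = true;
    return true;
}

/*
 * Scenario: a scan pins and locks together, then drops the lock but keeps
 * the pin. Returns 0 if the retained pin blocked the cleanup lock until it
 * was released, nonzero otherwise.
 */
int toy_scenario(void)
{
    ToyBuffer buf = {0, 0, false};

    toy_pin(&buf);                 /* scan: pin ... */
    if (!toy_lock_share(&buf))     /* ... and lock together */
        return 1;
    toy_unlock_share(&buf);        /* drop lock but not pin */

    toy_pin(&buf);                 /* a second backend pins the buffer */
    if (toy_conditional_cleanup_lock(&buf))
        return 2;                  /* must fail: scan's pin is still held */

    toy_unpin(&buf);               /* scan finally drops its pin */
    if (!toy_conditional_cleanup_lock(&buf))
        return 3;                  /* must succeed: sole remaining pin */
    return 0;
}
```

In this model a pin held alongside a lock adds nothing that the lock does not already imply for other lockers. It is only the pin retained after toy_unlock_share that changes what the second backend can do, which is the "pin held without a corresponding lock" case discussed above.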