As already discussed on another nearby thread [1], I think that we
should get rid of nbtree's BTScanPosUnpinIfPinned idiom (which
originated in 2015 commit 2ed5b87f), since it is rather brittle.
Recent experience from today's bugfix commit 7c319f54 made that pretty
clear.
Attached patch gets rid of nbtree's use of BTScanPosUnpinIfPinned.
Rather than deciding whether or not to unpin a saved position's buffer
on the basis of it having a buffer set at all (at any point where we
invalidate the buffer), the patch has us condition everything on the
scan-level so->dropPin field. This so->dropPin field was added by my
recent commit e6eed40e; you can think of this patch as finishing off
the work started in that commit.
I'll add this patch to the Postgres 19 cycle's first commitfest.
More on the underlying motivation:
Removing BTScanPosUnpinIfPinned allows me to significantly simplify
the management of buffer pins with mark/restore. The patch also gets
rid of all of the calls to IncrBufferRefCount made from
nbtree, since it's no longer necessary to support a so->markPos
representation of a mark that needs its own pin, independent of the
pin held by/for so->currPos (if so->markPos needs its own pin it'll be
because it has its own page).
In other words, with the patch, there is exactly one way of
representing that there's a mark held on the current page. We still
need multiple pins (one on currPos, another on markPos) during
!so->dropPin scans, but now that only happens when they are on
different pages. In other other words, the only valid representation
of a mark held on the page currently in so->currPos is the
"so->markItemIndex" field (we're now consistent about that).
I also believe that this simplified approach to holding buffer pins in
mark/restore will facilitate better nbtree designs. Having "exactly
one way of representing that there's a mark held on the current page"
helps with designs that do more heap prefetching by reading multiple
leaf pages, before earlier pages have performed all of their required
heap page accesses.
I'm thinking of new designs that maintain multiple leaf page
details/returned items within one big currPos-like position (the
current so->currPos just maintains a single leaf page). The main thing
that makes that difficult right now is the complex rules around buffer
pins, particularly with mark/restore, which is generally a very hard
code path to test.
[1] https://postgr.es/m/CAH2-Wzm3a8i31aBXi6oQt9uG7m1-xpX-MXjMMYoDJH=sBj1jnA@mail.gmail.com
--
Peter Geoghegan