Attached are two patches, both of which are fixes for bugs in nbtree
VACUUM page deletion.
The first fix for a bug in commit 857f9c36cda. The immediate issue is
that the code that maintains the oldest btpo.xact in the index
accesses the special area of pages without holding a buffer pin. More
fundamentally, the logic for examining pages that are deleted by the
current VACUUM operation for the purposes of maintaining the oldest
btpo.xact needs to be pushed down into the guts of the page deletion
code. This has been described on other threads [1][2], but I'm
starting a new one to focus attention on the bugs themselves (any
discussion of Valgrind instrumentation can remain on the other
thread).
The second fix is a spin-off of the first. It fixes a much less
serious issue that is present in all supported versions (it has
probably been around forever). The issue is with undercounting in the
"%u index pages have been deleted" figure reported by VACUUM VERBOSE.
For example, suppose I delete most of the tuples from a table with a
B-Tree index, leading to lots of empty pages that are candidates for
deletion (the specifics really aren't very important). I then run
VACUUM VERBOSE, and can see something like this:
4900 index pages have been deleted, 0 are currently reusable.
If I immediately run another VACUUM VERBOSE, I can see this:
4924 index pages have been deleted, 4924 are currently reusable.
It makes sense that the pages weren't reusable in the first VACUUM,
but were in the second VACUUM. But where did the extra 24 pages come
from? That doesn't make any sense at all. With the second patch
applied, the same test case shows the correct number ("4924 index
pages have been deleted") consistently. See the patch for details of
how the accounting is wrong. (The short version is that we need to
push the accounting down into the guts of page deletion, which is why
the second patch relies on the first patch.)
I would like to backpatch both patches to all branches that have
commit 857f9c36cda -- v11, v12, and master. The second issue isn't
serious, but it seems worth keeping v11+ in sync in this area. Note
that any backpatch theoretically creates an ABI break for callers of
the _bt_pagedel() function. Perhaps I could get around this by using
global state in the back branches or something, but that's ugly as
sin. Besides, I have a very hard time imagining an extension that
feels entitled to call _bt_pagedel(). There are all kinds of ways in
which that's a terrible idea. And it's hard to imagine how that could
ever seem useful. I'd like to hear what other people think about this
ABI issue in particular, though.
[1] https://postgr.es/m/CAH2-Wz=mWPBLZ2cr95cBV=kzPav8OOBtkHTfg+-+AUiozANK0w@mail.gmail.com
[2] https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
--
Peter Geoghegan