On Sat, Feb 19, 2022 at 5:54 PM Andres Freund <andres@anarazel.de> wrote:
> How does that cause the endless loop?
Attached is the page image itself, dumped via gdb (and gzip'd). This
was on recent HEAD (commit 8f388f6f, actually), plus
0001-Add-adversarial-ConditionalLockBuff[...]. No other changes. No
defragmenting in pg_surgery, nothing like that.
> It doesn't do so on HEAD + 0001-Add-adversarial-ConditionalLockBuff[...] for
> me. So something needs have changed with your patch?
It doesn't always happen -- only about half the time on my machine.
Maybe it's timing sensitive?
We hit the "goto retry" on offnum 2, which is the first tuple with
storage (you can see "the ghost" of the tuple from the LP_DEAD item at
offnum 1, since the page isn't defragmented in pg_surgery). I think
that this happens because the heap-only tuple at offnum 2 is fully
DEAD to lazy_scan_prune, but hasn't been recognized as such by
heap_page_prune. There is no way that they'll ever "agree" on the
tuple being DEAD right now, because pruning still doesn't assume that
an orphaned heap-only tuple is fully DEAD.
We can either do that, or we can throw an error concerning corruption
when heap_page_prune notices orphaned tuples. Neither seems
particularly appealing. But it definitely makes no sense to allow
lazy_scan_prune to spin in a futile attempt to reach agreement with
heap_page_prune about a DEAD tuple really being DEAD.
> Given that heap_surgery's raison d'etre is correcting corruption etc, I think
> it makes sense for it to do as minimal work as possible. Iterating through a
> HOT chain would be a problem if you e.g. tried to repair a page with HOT
> corruption.
I guess that's also true. There is at least a legitimate argument to
be made for not leaving behind any orphaned heap-only tuples. The
interface is a TID, and so the user may already believe that they're
killing the heap-only, not just the root item (since ctid suggests
that the TID of a heap-only tuple is the TID of the root item, which
is kind of misleading).
Anyway, we can decide on what to do in heap_surgery later, once the
main issue is under control. My point was mostly just that orphaned
heap-only tuples are definitely not okay, in general. They are the
least worst option when corruption has already happened, maybe -- but
maybe not.
--
Peter Geoghegan