Hi,
On 2022-02-19 17:22:33 -0800, Peter Geoghegan wrote:
> Looks like pg_surgery isn't processing HOT chains as whole units,
> which it really should (at least in the context of killing items via
> the heap_force_kill() function). Killing a root item in a HOT chain is
> just hazardous -- disconnected/orphaned heap-only tuples are liable to
> cause chaos, and should be avoided everywhere (including during
> pruning, and within pg_surgery).
How does that cause the endless loop?
It doesn't do so on HEAD + 0001-Add-adversarial-ConditionalLockBuff[...] for
me. So something needs have changed with your patch?
> It's likely that the hardening I already planned on adding to pruning
> [1] (as follow-up work to recent bugfix commit 18b87b201f) will
> prevent lazy_scan_prune from getting stuck like this, whatever the
> cause happens to be.
Yea, we should pick that up again. Not just for robustness or
performance. Also because it's just a lot easier to understand.
> Leaving behind disconnected/orphaned heap-only tuples is pretty much
> pointless anyway, since they'll never be accessible by index scans.
> Even after a REINDEX, since there is no root item from the heap page
> to go in the index. (A dump and restore might work better, though.)
Given that heap_surgery's raison d'etre is correcting corruption etc, I think
it makes sense for it to do as minimal work as possible. Iterating through a
HOT chain would be a problem if you e.g. tried to repair a page with HOT
corruption.
Greetings,
Andres Freund