On Sun, Jun 6, 2021 at 11:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Sorry, but I already killed the process to try to follow Matthias' suggestion.
> I have a core file from "gcore" but it looks like it's incomplete and the
> address is now "out of bounds"...

Based on what you said about ending up back in lazy_scan_prune()
alone, I think he's right. That is, I agree that it's very likely that
the stuck VACUUM would not have become stuck had the "goto retry on
HEAPTUPLE_DEAD inside lazy_scan_prune" thing not been added by commit
8523492d4e3. But that in itself doesn't necessarily implicate commit
8523492d4e3.

The interesting question is: Why doesn't heap_page_prune() ever agree
with HeapTupleSatisfiesVacuum() calls made from lazy_scan_prune(), no
matter how many times the call to heap_page_prune() is repeated? (It's
repeated to try to resolve the disagreement that aborted xacts can
sometimes cause.)

If I had to guess I'd say that the underlying problem has something to
do with heap_prune_satisfies_vacuum() not agreeing with
HeapTupleSatisfiesVacuum(), perhaps only when GlobalVisCatalogRels is
used. But that's a pretty wild guess at this point.
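
To make the shape of that failure concrete, here is a toy model in C. It
is not PostgreSQL source: every name in it (ToyTuple, toy_prune_check,
toy_vacuum_check, toy_scan_prune, the two "horizon" cutoffs) is
hypothetical, and the assumption that the two checks use different
cutoffs is just one way a permanent disagreement could arise, in line
with the guess above. The real retry is unbounded; the toy adds a
max_retries cap only so that it terminates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL definitions. */
typedef enum { TOY_RECENTLY_DEAD, TOY_DEAD } ToyStatus;

typedef struct {
    unsigned xmax;   /* toy xid of the transaction that deleted the tuple */
    bool     pruned; /* set once the pruning pass has removed the tuple */
} ToyTuple;

/* Visibility check used by the pruning pass (assumed cutoff: prune_horizon). */
static ToyStatus toy_prune_check(const ToyTuple *t, unsigned prune_horizon)
{
    return t->xmax < prune_horizon ? TOY_DEAD : TOY_RECENTLY_DEAD;
}

/* Visibility check used by the caller (assumed cutoff: vacuum_horizon). */
static ToyStatus toy_vacuum_check(const ToyTuple *t, unsigned vacuum_horizon)
{
    return t->xmax < vacuum_horizon ? TOY_DEAD : TOY_RECENTLY_DEAD;
}

/* Pruning pass: removes every tuple that its own check considers dead. */
static void toy_page_prune(ToyTuple *tuples, size_t n, unsigned prune_horizon)
{
    for (size_t i = 0; i < n; i++)
        if (!tuples[i].pruned &&
            toy_prune_check(&tuples[i], prune_horizon) == TOY_DEAD)
            tuples[i].pruned = true;
}

/*
 * Prune, then re-check the surviving tuples with the caller's check.
 * If any survivor looks dead to the caller, prune again.
 * Returns the number of retries needed, or -1 if the two checks still
 * disagree after max_retries retries (the toy analogue of being stuck).
 */
static int toy_scan_prune(ToyTuple *tuples, size_t n,
                          unsigned prune_horizon, unsigned vacuum_horizon,
                          int max_retries)
{
    int retries = 0;

    for (;;) {
        bool disagree = false;

        toy_page_prune(tuples, n, prune_horizon);

        for (size_t i = 0; i < n; i++)
            if (!tuples[i].pruned &&
                toy_vacuum_check(&tuples[i], vacuum_horizon) == TOY_DEAD)
                disagree = true;

        if (!disagree)
            return retries;
        if (retries == max_retries)
            return -1;
        retries++;
    }
}
```

With equal cutoffs the loop exits without retrying. With a tuple whose
xmax falls between the two cutoffs (say xmax 15, prune_horizon 10,
vacuum_horizon 20), repeating the pruning pass changes nothing, so
without the cap the loop would never end.
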
--
Peter Geoghegan