On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Any inserting transaction which aborts after heap_page_prune()'s
> visibility check will now be of no concern to lazy_scan_prune(). Since
> we don't do the visibility check again, we won't find the tuple
> HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
> the array of tuples for vacuum to reap. This does mean that we won't
> reap and remove tuples whose inserting transactions abort right after
> heap_page_prune()'s visibility check. But, this doesn't seem like an
> issue.
It's definitely not an issue.
The loop is essentially a hacky way of getting a consistent picture of
which tuples should be treated as HEAPTUPLE_DEAD, and which tuples
need to be left behind (consistent at the level of each page and each
HOT chain, at least). Making that explicit seems strictly better.
> They may not be removed until the next vacuum, but ISTM it is
> actually worse to pay the cost of re-pruning the whole page just to
> clean up that one tuple. Maybe if that renders the page all visible
> and we can mark it as such in the visibility map -- but that seems
> like a relatively narrow use case.
The chances of actually hitting the retry are microscopic anyway. It
has nothing to do with making sure that dead tuples from aborted
tuples get removed for its own sake, or anything. Rather, the retry is
all about making sure that all TIDs that get removed from indexes can
only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD
tuples with storage would very occasionally be left behind, which made
life difficult in a bunch of other places -- for no good reason.
--
Peter Geoghegan