On Mon, Mar 15, 2021 at 4:11 PM Andres Freund <andres@anarazel.de> wrote:
> > > I'm not comfortable with this change without adding more safety
> > > checks. If there's ever a case in which the HEAPTUPLE_DEAD case is hit
> > > and the xid needs to be frozen, we'll either cause errors or
> > > corruption. Yes, that's already the case with params->index_cleanup ==
> > > DISABLED, but that's not that widely used.
> >
> > I noticed that Noah's similar 2013 patch [1] added a defensive
> > heap_tuple_needs_freeze() + elog(ERROR) to the HEAPTUPLE_DEAD case. I
> > suppose that that's roughly what you have in mind here?
>
> I'm not sure that's sufficient. If the case is legitimately reachable
> (I'm maybe 60% is not, after staring at it for a long time, but ...),
> then we can't just error out when we didn't so far.
If you're only 60% sure that the heap_tuple_needs_freeze() error thing
doesn't break anything that should work by now then it seems unlikely
that you'll ever get past 90% sure. I think that we should make a
conservative assumption that the defensive elog(ERROR) won't be
sufficient, and proceed on that basis.
> I kinda wonder whether this case should just be handled by just gotoing
> back to the start of the blkno loop, and redoing the pruning. The only
> thing that makes that a bit more complicatd is that we've already
> incremented vacrelstats->{scanned_pages,vacrelstats->tupcount_pages}.
That seems like a good solution to me -- this is a very seldom hit
path, so we can be a bit inefficient without it mattering.
It might make sense to *also* check some things (maybe using
heap_tuple_needs_freeze()) in passing, just for debugging purposes.
> We really should put the per-page work (i.e. the blkno loop body) of
> lazy_scan_heap() into a separate function, same with the
> too-many-dead-tuples branch.
+1.
BTW I've noticed that the code (and code like it) tends to confuse
things that the current VACUUM performed versus things by *some*
VACUUM (that may or may not be current one). This refactoring might be
a good opportunity to think about that as well.
> > * It is assumed that the caller has checked the tuple with
> > * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
> > * (else we should be removing the tuple, not freezing it).
> >
> > Does that need work too?
>
> I'm pretty scared of the index-cleanup-disabled path, for that reason. I
> think the hot path is more likely to be unproblematic, but I'd not bet
> my (nonexistant) farm on it.
Well if we can solve the problem by simply doing pruning once again in
the event of a HEAPTUPLE_DEAD return value from the lazy_scan_heap()
HTSV call, then the comment becomes 100% true (which is not the case
even today).
--
Peter Geoghegan