Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Msg-id CA+TgmoY44P_retk+YikORugc_7=t0Qne9vOBWD6R8qMy6OkL3w@mail.gmail.com
In response to Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin  (Andres Freund <andres@anarazel.de>)
Responses Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
List pgsql-hackers
On Tue, Jun 25, 2024 at 8:03 AM Andres Freund <andres@anarazel.de> wrote:
> I think that's going in the wrong direction. We *want* to prune more
> aggressively if we can (*), the necessary state is represented by the
> vistest. That's a different thing than *having* to prune tuples beyond a
> certain xmin (the cutoff determined by vacuum.c/vacuumlazy.c). The problem
> we're having here is that the two states can get out of sync due to the
> vistest "moving backwards", because of hot_standby_feedback (and perhaps also
> an issue around aborts).

I agree that we want to prune more aggressively if we can. I think
that fixing this by preventing vistest from going backward is
reasonable, and I like it better than what Melanie proposed, although
I like what Melanie proposed much better than not fixing it! I'm not
sure how to do that cleanly, but one of you may have an idea.

I do think that having a bunch of different XID values that function
as horizons and a vistest object that holds some more XID horizons
floating around in vacuum makes the code hard to understand. The
relationships between the various values are not well-documented. For
instance, the vistest has to be after vacrel->cutoffs.OldestXmin for
correctness, but I don't think there's a single comment anywhere
saying that; meanwhile, the comments for VacuumCutoffs say "OldestXmin
is the Xid below which tuples deleted by any xact (that committed)
should be considered DEAD, not just RECENTLY_DEAD." Surely the reader
can be forgiven for thinking that this is the cutoff that will
actually be used by pruning, but it isn't.
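
To make the undocumented relationship concrete, here is a minimal sketch of the check I have in mind. It is not PostgreSQL code: xid_t, xid_precedes(), and pruning_horizon_is_safe() are hypothetical names, the comparison ignores the special (permanent) XIDs, and treating the vistest as a single 32-bit horizon is a simplifying assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t xid_t;

/*
 * Wraparound-aware "a is older than b": compare modulo 2^32 by looking at
 * the sign of the 32-bit difference. Special XIDs are ignored here.
 */
static bool
xid_precedes(xid_t a, xid_t b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical invariant check: the horizon that pruning actually uses
 * (taken from the vistest) must not be older than the OldestXmin cutoff
 * that vacuum computed. If it is older, pruning can leave behind tuples
 * that the rest of vacuum already considers DEAD.
 */
static bool
pruning_horizon_is_safe(xid_t oldest_xmin, xid_t vistest_horizon)
{
    return !xid_precedes(vistest_horizon, oldest_xmin);
}
```

With OldestXmin = 1000, a vistest horizon of 1005 passes the check and a horizon of 990 (the "moved backwards" case) fails it.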

And more generally, it seems like a fairly big problem to me that
LVRelState directly stores NewRelfrozenXid; contains a VacuumCutoffs
object that stores relfrozenxid, OldestXmin, and FreezeLimit; and also
points to a GlobalVisState object that contains definitely_needed and
maybe_needed. That is six different XID cutoffs for one vacuum
operation. That's a lot. I can't describe how they're all different
from each other or what the necessary relationships between them are
off-hand, and I bet nobody else could either, at least until recently,
else we might not have this bug. I feel like if it were possible to
have fewer of them and still have things work, we'd be better off. I'm
not sure that's doable. But six seems like a lot.
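
For reference, the six values can be laid out roughly like this. This is a simplified sketch and not the real definitions: the struct names with a _sketch suffix and xid_t are hypothetical, unrelated fields are omitted, and the integer widths are only illustrative.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t xid_t;

/* Sketch of the XID fields in VacuumCutoffs (other fields omitted). */
struct cutoffs_sketch
{
    xid_t relfrozenxid;   /* cutoff 1 */
    xid_t OldestXmin;     /* cutoff 2 */
    xid_t FreezeLimit;    /* cutoff 3 */
};

/* Sketch of the two horizons held by a GlobalVisState (the vistest). */
struct visstate_sketch
{
    uint64_t definitely_needed;  /* cutoff 4 */
    uint64_t maybe_needed;       /* cutoff 5 */
};

/* Sketch of the relevant parts of LVRelState. */
struct vacrel_sketch
{
    xid_t NewRelfrozenXid;            /* cutoff 6, stored directly */
    struct cutoffs_sketch cutoffs;    /* embedded: three more */
    struct visstate_sketch *vistest;  /* pointed to: two more */
};

/* One stored directly, three embedded, two behind the pointer. */
enum { XID_CUTOFFS_PER_VACUUM = 1 + 3 + 2 };
```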

--
Robert Haas
EDB: http://www.enterprisedb.com
