Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id CA+fd4k67ke0uj4ioK03o7bepxHadyM64c+veop+np_GKZk3TsA@mail.gmail.com
Whole thread Raw
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, 28 Aug 2020 at 23:39, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Aug 28, 2020 at 4:07 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> > You've removed the description about executing VACUUM with
> > DISABLE_PAGE_SKIPPING option on the target relation after using
> > pg_surgery functions from the doc but I guess it’s better to recommend
> > that in the doc for safety. Could you please tell me the reason for
> > removing that?
>
> Well, I think that was added because there wasn't code to clear the
> visibility map bits, either page-level in the map, but we added code
> for that, so now I don't really see why it's necessary or even
> desirable.
>
> Here are a few example scenarios:
>
> 1. My table is not corrupt. For no particular reason, I force-freeze
> or force-kill a tuple which is neither dead nor all-visible.
> Concurrent queries might return wrong answers, but the table is not
> corrupt. It does not require VACUUM and would not benefit from it.
> Actually, it doesn't need anything at all.
>
> 2. My table is not corrupt. For no particular reason, I force-freeze a
> tuple which is dead. I believe it's possible that the index entries
> for that tuple might be gone already, but VACUUM won't fix that.
> REINDEX or a table rewrite would, though. It's also possible, if the
> dead tuple was added by an aborted transaction which added columns to
> the table, that the tuple might have been created using a tuple
> descriptor that differs from the table's current tuple descriptor. If
> so, I think scanning the table could produce a crash. VACUUM won't fix
> this, either. I would need to delete or force-kill the offending
> tuple.
>
> 3. I have one or more tuples in my table that are intact except that
> they have garbage values for xmin, resulting in VACUUM failure or
> possibly even SELECT failure if the CLOG entries are also missing. I
> force-kill or force-freeze them. If by chance the affected tuples were
> also omitted from one or more indexes, a REINDEX or table rewrite is
> needed to fix them, but a VACUUM will not help. On the other hand, if
> those tuples are present in the indexes, there's no remaining problem
> and VACUUM is not needed for the purpose of restoring the integrity of
> the table. If the problem has been ongoing for a while, VACUUM might
> be needed to advance relfrozenxid, but that doesn't require
> DISABLE_PAGE_SKIPPING.
>
> 4. I have some pages in my table that have incorrect visibility map
> bits. In this case, I need VACUUM (DISABLE_PAGE_SKIPPING). However, I
> don't need the functions we're talking about here at all unless I also
> have tuples with corrupted visibility information. If I do happen to
> have both tuples with corrupted visibility information and also pages
> with incorrect visibility map bits, then I suppose I need both these
> tools and also VACUUM (DISABLE_PAGE_SKIPPING). Probably, I'll want to
> do the VACUUM second. But, if I happened to do the VACUUM first and
> then use these functions afterward, the worst thing that could happen
> is that I might end up with a some dead tuples that could've gotten
> removed faster if I'd switched the order. And that's not a disaster.
>
> Basically, I can see no real reason to recommend VACUUM
> (DISABLE_PAGE_SKIPPING) here. There are problems that can be fixed
> with that command, and there are problems that can be fixed by this
> method, but they are mostly independent of each other. We should not
> recommend that people run VACUUM "just in case." That kind of fuzzy
> thinking seems relatively prevalent already, and it leads to people
> spending a lot of time running slow maintenance commands that do
> nothing to help them, and which occasionally make things worse.
>

Thank you for your explanation. That very makes sense to me.

If vacuum could fix the particular kind of problem by using together
with pg_surgery we could recommend using vacuum. But I agree that the
corruption of heap table is not the case.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Bernd Helmle
Date:
Subject: Re: Documentation patch for backup manifests in protocol.sgml
Next
From: Tomas Vondra
Date:
Subject: Re: Disk-based hash aggregate's cost model