recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

Hi,

A number of EDB customers have had this error crop on their tables for
reasons that we have usually not been able to determine. In many
cases, it's probably down to things like running buggy old releases
for a long time before upgrading, or bad backup and recovery
procedures. It's more than possible that there are still-unfixed
server bugs, but I do not have any compelling evidence of such bugs at
this time. Unfortunately, once you're in this situation, it's kind of
hard to find your way out of it. There are a few problems:

1. There's nothing to identify the tuple that has the problem, and no
way to know how many more of them there might be. Back-patching
b61d161c146328ae6ba9ed937862d66e5c8b035a would help with the first
part of this.

2. In some other, similar situations, e.g. where the tuple data is
garbled, it's often possible to get out from under the problem by
deleting the tuple at issue. But I think that doesn't necessarily fix
anything in this case.

3. We've had some success with using a PL/plgsql loop with an
EXCEPTION block to extract all the accessible tuples from the table.
Then you can truncate the original table and reinsert the data. But
this is slow, so it stinks if the table is big, and it's not a viable
approach if the table in question is a system catalog table -- at
least if it's not if it's something critical like pg_class.

I realize somebody's probably going to say "well, you shouldn't try to
repair a database that's in this state, you shouldn't let it happen in
the first place, and if it does happen, you should track the root
cause to the ends of the earth." But I think that's a completely
impractical approach. I at least have no idea how I'm supposed to
figure out when and how a bad relfrozenxid ended up in the table, and
by the time the problem is discovered after an upgrade the problem
that caused it may be quite old. Moreover, not everyone is as
interested in an extended debugging exercise as they are in getting
the system working again, and VACUUM failing repeatedly is a pretty
serious problem.

Therefore, one of my colleagues has - at my request - created a couple
of functions called heap_force_kill() and heap_force_freeze() which
take an array of TIDs. The former truncates them all to dead line
pointers. The latter resets the infomask and xmin to make the xmin
frozen. (It should probably handle the xmax too; not sure that the
current version does that, but it's easily fixed if not.) The
intention is that you can use these to get either get rid of, or get
access to, tuples whose visibility information is corrupted for
whatever reason. These are pretty sharp tools; you could corrupt a
perfectly-good table by incautious use of them, or destroy a large
amount of data. You could, for example, force-freeze a tuple created
by a transaction which added a column, inserted data, and rolled back;
that would likely be disastrous. However, in the cases that I'm
thinking about, disaster has already struck, and something that you
can use to get things back to a saner state is better than just
leaving the table perpetually broken. Without something like this, the
backup plan is probably to shut down the server and try to edit the
pages using a perl script or something, but that seems clearly worse.

So I have these questions:

- Do people think it would me smart/good/useful to include something
like this in PostgreSQL?

- If so, how? I would propose a new contrib module that we back-patch
all the way, because the VACUUM errors were back-patched all the way,
and there seems to be no advantage in making people wait 5 years for a
new version that has some kind of tooling in this area.

- Any ideas for additional things we should include, or improvements
on the sketch above?

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: Re: Proposal: Automatic partition creation
Next
From: Anastasia Lubennikova
Date:
Subject: Re: Proposal: Automatic partition creation