Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Robert Haas
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id CA+TgmoYDsjyLBOE8YG-hXryGu4WSoV1-oQGRG-uggFZOesV6-A@mail.gmail.com
Whole thread Raw
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses Re: recovering from "found xmin ... from before relfrozenxid ..."
Re: recovering from "found xmin ... from before relfrozenxid ..."
List pgsql-hackers
On Wed, Aug 26, 2020 at 10:26 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Okay, point noted.

I spent some time today working on this patch. I'm fairly happy with
it now and intend to commit it if nobody sees a big problem with that.
Per discussion, I do not intend to back-patch at this time. The two
most significant changes I made to your version are:

1. I changed things around to avoid using any form of ereport() in a
critical section. I'm not actually sure whether it is project policy
to avoid ereport(NOTICE, ...) or similar in a critical section, but it
seems prudent, because if anything fails in a critical section, we
will PANIC, so doing fewer things there seems prudent.

2. I changed the code so that it does not try to follow redirected
line pointers; instead, it skips them with an appropriate message, as
we were already doing for dead and unused line pointers. I think the
way you had it coded might've been my suggestion originally, but the
more I looked into it the less I liked it. One problem is that it
didn't match the docs. A second is that following a corrupted line
pointer might index off the end of the line pointer array, and while
that probably shouldn't happen, we are talking about corruption
recovery here. Then I realized that, as you coded it, if the line
pointer was redirected to a line pointer that is in turn dead (or
unused, if there's corruption) the user would get a NOTICE complaining
about a TID they hadn't specified, which seems like it would be very
confusing. I thought about trying to fix all that stuff, but it just
didn't seem worth it, because I can't think of a good reason to pass
this function the TID of a redirected line pointer in the first place.
If you're doing surgery, you should probably specify the exact thing
upon which you want to operate, not some other thing that points to
it.

Here is a list of other changes I made:

* Added a .gitignore file.
* Renamed the regression test file from pg_surgery to heap_surgery to
match the name of the single C source file we currently have.
* Capitalized TID in a few places.
* Ran pgindent.
* Adjusted various comments.
* Removed the check for an empty TID array. I don't see any reason why
this should be an error case and I don't see much precedent for having
such a check.
* Fixed the code to work properly with that change and added a test case.
* Added a check that the array is not multi-dimensional.
* Put the AM type check before the relkind check, following existing precedent.
* Adjusted the check to use the AM OID rather than the handler OID,
following existing precedent. Fixed the message wording accordingly.
* Changed the documentation wording to say less about specific
recovery procedures and focus more on the general idea that this is
dangerous.
* Removed all but one of the test cases that checked what happens if
you use this on a non-heap; three tests for basically the same thing
seemed excessive.
* Added some additional tests to improve code coverage. There are now
only a handful of lines not covered.
* Reorganized the test cases somewhat.

New patch attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: "cert" + clientcert=verify-ca in pg_hba.conf?
Next
From: Robert Haas
Date:
Subject: Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach)