Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: recovering from "found xmin ... from before relfrozenxid ..." |
Date | |
Msg-id | CA+TgmoYDsjyLBOE8YG-hXryGu4WSoV1-oQGRG-uggFZOesV6-A@mail.gmail.com Whole thread Raw |
In response to | Re: recovering from "found xmin ... from before relfrozenxid ..." (Ashutosh Sharma <ashu.coek88@gmail.com>) |
Responses |
Re: recovering from "found xmin ... from before relfrozenxid ..."
Re: recovering from "found xmin ... from before relfrozenxid ..." |
List | pgsql-hackers |
On Wed, Aug 26, 2020 at 10:26 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Okay, point noted. I spent some time today working on this patch. I'm fairly happy with it now and intend to commit it if nobody sees a big problem with that. Per discussion, I do not intend to back-patch at this time. The two most significant changes I made to your version are: 1. I changed things around to avoid using any form of ereport() in a critical section. I'm not actually sure whether it is project policy to avoid ereport(NOTICE, ...) or similar in a critical section, but it seems prudent, because if anything fails in a critical section, we will PANIC, so doing fewer things there seems prudent. 2. I changed the code so that it does not try to follow redirected line pointers; instead, it skips them with an appropriate message, as we were already doing for dead and unused line pointers. I think the way you had it coded might've been my suggestion originally, but the more I looked into it the less I liked it. One problem is that it didn't match the docs. A second is that following a corrupted line pointer might index off the end of the line pointer array, and while that probably shouldn't happen, we are talking about corruption recovery here. Then I realized that, as you coded it, if the line pointer was redirected to a line pointer that is in turn dead (or unused, if there's corruption) the user would get a NOTICE complaining about a TID they hadn't specified, which seems like it would be very confusing. I thought about trying to fix all that stuff, but it just didn't seem worth it, because I can't think of a good reason to pass this function the TID of a redirected line pointer in the first place. If you're doing surgery, you should probably specify the exact thing upon which you want to operate, not some other thing that points to it. Here is a list of other changes I made: * Added a .gitignore file. * Renamed the regression test file from pg_surgery to heap_surgery to match the name of the single C source file we currently have. * Capitalized TID in a few places. * Ran pgindent. * Adjusted various comments. * Removed the check for an empty TID array. I don't see any reason why this should be an error case and I don't see much precedent for having such a check. * Fixed the code to work properly with that change and added a test case. * Added a check that the array is not multi-dimensional. * Put the AM type check before the relkind check, following existing precedent. * Adjusted the check to use the AM OID rather than the handler OID, following existing precedent. Fixed the message wording accordingly. * Changed the documentation wording to say less about specific recovery procedures and focus more on the general idea that this is dangerous. * Removed all but one of the test cases that checked what happens if you use this on a non-heap; three tests for basically the same thing seemed excessive. * Added some additional tests to improve code coverage. There are now only a handful of lines not covered. * Reorganized the test cases somewhat. New patch attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: