Thread: "failed to find parent tuple for heap-only tuple" error as anERRCODE_DATA_CORRUPTION ereport()
"failed to find parent tuple for heap-only tuple" error as anERRCODE_DATA_CORRUPTION ereport()
From
Peter Geoghegan
Date:
Commit d70cf811, from 2014, promoted an Assert() within IndexBuildHeapScan() to a "can't happen" elog() error, in order to detect when a parent tuple cannot be found for some heap-only tuple -- if this happens, then it indicates corruption. I think that we should make it a full ereport(), with an errcode of ERRCODE_DATA_CORRUPTION, to match what Andres just added to code that deals with freezing (he promoted Assert()s to errors, just like the 2014 commit, though he went as far as making them ereport()s to begin with). Attached patch does this. I propose a backpatch to 9.3, partially for the sake of tools like amcheck, where users may only be on the lookout for ERRCODE_DATA_CORRUPTION and ERRCODE_INDEX_CORRUPTED. FWIW, an old MultiXact/recovery bug, alluded to by the commit message of d70cf811 [1] (and fixed by 6bfa88acd) was the cause of some déjà vu for me while looking into the "freeze the dead" issues. Because the enhanced amcheck [2] actually raised this error when I went to verify the first "freeze the dead" bugfix [3], it's clearly effective as a test for certain types of corruption. If CREATE INDEX/IndexBuildHeapScan() didn't already perform this check, then it would probably be necessary for amcheck to implement it on its own. What heap_get_root_tuples() does for us here is ideally suited to finding inconsistencies in HOT chains, because it matches xmin against xmax, looks at line pointer bits/redirects, and consults pg_multixact if necessary. The only thing that it *doesn't* do is make sure that hint bits accurately reflect what it says in the CLOG -- we'll need to find another way to do that, by directly targeting heap relations with their own function. In short, it does an awful lot for tools like amcheck, and I want to make sure that we get the full benefit of that. [1] https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com [2] https://github.com/petergeoghegan/amcheck#optional-heapallindexed-verification [3] https://postsgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com -- Peter Geoghegan
Attachment
Re: "failed to find parent tuple for heap-only tuple" error as anERRCODE_DATA_CORRUPTION ereport()
From
David Steele
Date:
On 12/15/17 5:31 PM, Peter Geoghegan wrote: > Commit d70cf811, from 2014, promoted an Assert() within > IndexBuildHeapScan() to a "can't happen" elog() error, in order to > detect when a parent tuple cannot be found for some heap-only tuple -- > if this happens, then it indicates corruption. I think that we should > make it a full ereport(), with an errcode of ERRCODE_DATA_CORRUPTION, > to match what Andres just added to code that deals with freezing (he > promoted Assert()s to errors, just like the 2014 commit, though he > went as far as making them ereport()s to begin with). Attached patch > does this. > > I propose a backpatch to 9.3, partially for the sake of tools like > amcheck, where users may only be on the lookout for > ERRCODE_DATA_CORRUPTION and ERRCODE_INDEX_CORRUPTED. This patch applies, passes testing, and appears very straight-forward to me. Are there any tests for these conditions currently, or are you only doing that in amcheck? -- -David david@pgmasters.net
Re: "failed to find parent tuple for heap-only tuple" error as anERRCODE_DATA_CORRUPTION ereport()
From
Peter Geoghegan
Date:
On Thu, Mar 1, 2018 at 11:15 AM, David Steele <david@pgmasters.net> wrote: > On 12/15/17 5:31 PM, Peter Geoghegan wrote: >> Commit d70cf811, from 2014, promoted an Assert() within >> IndexBuildHeapScan() to a "can't happen" elog() error, in order to >> detect when a parent tuple cannot be found for some heap-only tuple -- >> if this happens, then it indicates corruption. I think that we should >> make it a full ereport(), with an errcode of ERRCODE_DATA_CORRUPTION, >> to match what Andres just added to code that deals with freezing (he >> promoted Assert()s to errors, just like the 2014 commit, though he >> went as far as making them ereport()s to begin with). Attached patch >> does this. >> >> I propose a backpatch to 9.3, partially for the sake of tools like >> amcheck, where users may only be on the lookout for >> ERRCODE_DATA_CORRUPTION and ERRCODE_INDEX_CORRUPTED. > > This patch applies, passes testing, and appears very straight-forward to > me. Are there any tests for these conditions currently, or are you only > doing that in amcheck? The enhanced amcheck in the current CF only tests these conditions indirectly, by pretending to be a CREATE INDEX statement. It matters to amcheck because these conditions are pretty plausible symptoms of corruption, and I can imagine someone missing them because they are not either ERRCODE_DATA_CORRUPTION or ERRCODE_INDEX_CORRUPTED. If amcheck didn't exist, then I'd still think that this patch was worthwhile. -- Peter Geoghegan
Re: "failed to find parent tuple for heap-only tuple" error as anERRCODE_DATA_CORRUPTION ereport()
From
David Steele
Date:
On 3/1/18 2:19 PM, Peter Geoghegan wrote: > On Thu, Mar 1, 2018 at 11:15 AM, David Steele <david@pgmasters.net> wrote: >> On 12/15/17 5:31 PM, Peter Geoghegan wrote: >>> Commit d70cf811, from 2014, promoted an Assert() within >>> IndexBuildHeapScan() to a "can't happen" elog() error, in order to >>> detect when a parent tuple cannot be found for some heap-only tuple -- >>> if this happens, then it indicates corruption. I think that we should >>> make it a full ereport(), with an errcode of ERRCODE_DATA_CORRUPTION, >>> to match what Andres just added to code that deals with freezing (he >>> promoted Assert()s to errors, just like the 2014 commit, though he >>> went as far as making them ereport()s to begin with). Attached patch >>> does this. >>> >>> I propose a backpatch to 9.3, partially for the sake of tools like >>> amcheck, where users may only be on the lookout for >>> ERRCODE_DATA_CORRUPTION and ERRCODE_INDEX_CORRUPTED. >> >> This patch applies, passes testing, and appears very straight-forward to >> me. Are there any tests for these conditions currently, or are you only >> doing that in amcheck? > > The enhanced amcheck in the current CF only tests these conditions > indirectly, by pretending to be a CREATE INDEX statement. It matters > to amcheck because these conditions are pretty plausible symptoms of > corruption, and I can imagine someone missing them because they are > not either ERRCODE_DATA_CORRUPTION or ERRCODE_INDEX_CORRUPTED. > > If amcheck didn't exist, then I'd still think that this patch was worthwhile. Yes, I agree. My thrust was more to discover if there is any testing for these conditions being done in core. It sounds like no, but I don't think it's the responsibility of this patch to add them. I'll mark it Ready for Committer. -- -David david@pgmasters.net
Re: "failed to find parent tuple for heap-only tuple" error as anERRCODE_DATA_CORRUPTION ereport()
From
Peter Geoghegan
Date:
On Thu, Mar 1, 2018 at 11:23 AM, David Steele <david@pgmasters.net> wrote: > Yes, I agree. My thrust was more to discover if there is any testing > for these conditions being done in core. It sounds like no, but I don't > think it's the responsibility of this patch to add them. The only tests possible with stock Postgres are performed by CREATE INDEX/REINDEX. That generally isn't practical as a general smoke test, not least because you cannot do a CREATE INDEX on a standby -- unlike the amcheck heapallindexed test. -- Peter Geoghegan
Re: "failed to find parent tuple for heap-only tuple" error as an ERRCODE_DATA_CORRUPTION ereport()
From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes: > On Thu, Mar 1, 2018 at 11:23 AM, David Steele <david@pgmasters.net> wrote: >> Yes, I agree. My thrust was more to discover if there is any testing >> for these conditions being done in core. It sounds like no, but I don't >> think it's the responsibility of this patch to add them. > The only tests possible with stock Postgres are performed by CREATE > INDEX/REINDEX. Yeah, I see no practical way to test this error case, and little point in asking this patch to do anything about it. If we had some test rig for injecting corrupted data, it might be interesting to look at ... Patch looks fine to me, will push. regards, tom lane
Re: "failed to find parent tuple for heap-only tuple" error as anERRCODE_DATA_CORRUPTION ereport()
From
Peter Geoghegan
Date:
On Thu, Mar 1, 2018 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Patch looks fine to me, will push. Thank you. -- Peter Geoghegan