Thread: Failed to re-find parent key
What does the error message

    failed to re-find parent key in "tablename_pkey"

mean? This happens reproducibly during VACUUM on a certain table.

Would REINDEX fix it? Anything else we should check?

This is PostgreSQL 7.4.2. Are there relevant fixes later in the 7.4 series?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes:
> What does the error message
>     failed to re-find parent key in "tablename_pkey"
> mean? This happens reproducibly during VACUUM on a certain table.

If it happens during vacuum (not vacuum full) then it must be coming from _bt_pagedel, and it means that _bt_pagedel could not find the parent-level btree entry for the page it wants to remove from the index.

> Would REINDEX fix it? Anything else we should check?

REINDEX would fix it, but it would be interesting to find out what the actual cause is. I think we've seen one or two similar reports previously in 7.4.*, but there's never been enough info to track it down. Any chance of going in with a debugger, or capturing a tarball image of the database for someone else to look at?

regards, tom lane
On Tue, Mar 22, 2005 at 12:31:55PM +0100, Peter Eisentraut wrote:
> What does the error message
>
>     failed to re-find parent key in "tablename_pkey"
>
> mean? This happens reproducibly during VACUUM on a certain table.

This has been reported before, but no one has been able to reproduce it (not the VACUUM failure, but the steps that led the index to that state). This is probably a very subtle bug that appeared after the page-reusing code was introduced in nbtree. I don't think it has been corrected in later releases.

There are two occurrences of this error message in the code. One is raised while splitting a page, when inserting the pointer to the new page into its parent. That is not what you are seeing, because no splitting takes place during vacuum. The other occurrence is in the first pass of page recovery, which happens at VACUUM. The code tries to find the parent page in order to delete the pointer that leads to the page being unlinked; if it can't find that pointer, the error you see is issued. I think it takes a lot of concurrency for the situation to arise.

> Would REINDEX fix it? Anything else we should check?

Maybe you could find out exactly which page is causing the problem, pg_filedump it, and see what the exact problem is.

Yes, a REINDEX fixes it (at least it did in Gaetano's case).

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"There are those who acquire the bad habit of being unhappy" (M. A. Evans)
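[Editor's illustration] The page-deletion failure described above can be sketched with a toy model. This is not PostgreSQL's nbtree code: `ParentPage`, `remove_downlink` and `ParentKeyNotFound` are hypothetical names, and the page layout is reduced to a list of (key, child block) pairs. It only shows the shape of the check: look up the parent's pointer to the page being unlinked, and raise the error if no such pointer exists.

```python
# Toy model only: these names and this layout are assumptions for illustration,
# not PostgreSQL's actual nbtree structures or functions.
from dataclasses import dataclass, field


class ParentKeyNotFound(Exception):
    """Raised when the parent page has no pointer to the child being removed."""


@dataclass
class ParentPage:
    # Each downlink is a (separator_key, child_block_number) pair.
    downlinks: list = field(default_factory=list)


def remove_downlink(parent: ParentPage, child_block: int, index_name: str) -> None:
    """Delete the parent's pointer to child_block, or fail if it is missing."""
    for pos, (_key, block) in enumerate(parent.downlinks):
        if block == child_block:
            del parent.downlinks[pos]
            return
    raise ParentKeyNotFound(f'failed to re-find parent key in "{index_name}"')


parent = ParentPage(downlinks=[(10, 3), (20, 4), (30, 5)])

# Normal case: the pointer to block 4 is found and removed.
remove_downlink(parent, 4, "tablename_pkey")
remaining = list(parent.downlinks)

# Failure case: nothing in the parent points at block 99.
try:
    remove_downlink(parent, 99, "tablename_pkey")
    error_text = None
except ParentKeyNotFound as exc:
    error_text = str(exc)
```

In this sketch the failure depends only on the stored downlinks, so repeating the call against the same state fails the same way each time.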
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> I think it takes a lot of concurrency for
> the situation to arise.

Maybe. Since Peter can reproduce the error, there's not any concurrency misbehavior involved in VACUUM itself; what we are dealing with is probably corruption in the on-disk state of the index (or maybe a legal corner case that _bt_pagedel mishandles). There might have been concurrency to blame for getting into that state in the first place. Need data ...

regards, tom lane
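[Editor's illustration] A rough way to picture the "on-disk state" hypothesis is a check that compares one parent level's pointers with the child pages that actually exist. The sketch below is a toy under stated assumptions: `check_downlinks` is a hypothetical helper, pages are plain block numbers, and nothing here reads a real index file or reflects the real page format.

```python
# Toy consistency check; the function name and data shapes are assumptions,
# not part of PostgreSQL or pg_filedump.
def check_downlinks(parent_downlinks, child_blocks):
    """Compare a parent level's downlinks with the child pages that exist."""
    linked = {block for _key, block in parent_downlinks}
    children = set(child_blocks)
    return {
        # Child pages that no parent entry points to.
        "children_without_downlink": sorted(children - linked),
        # Parent entries that point to a page that is not there.
        "downlinks_without_child": sorted(linked - children),
    }


# Block 4 exists at the child level, but the parent has no pointer to it.
report = check_downlinks([(10, 3), (30, 5)], [3, 4, 5])
```

A child listed under `children_without_downlink` is the kind of state in which a later attempt to remove that page's parent pointer would find nothing to remove, regardless of what else is running at the time.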
On Tuesday, 22 March 2005 15:54, Tom Lane wrote:
> Any chance of going in with a debugger, or capturing a
> tarball image of the database for someone else to look at?

Unfortunately, this database is restricted and I don't have access myself. I will tell the customer that they should provide a data directory tarball if they are interested in researching the problem, but I don't expect much. The database does have a lot of concurrent read and write access and extremely high load, but I am aware that this doesn't help pinpoint the problem.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/