Thread: Failed to re-find parent key

Failed to re-find parent key

From
Peter Eisentraut
Date:
What does the error message

failed to re-find parent key in "tablename_pkey"

mean?  This happens reproducibly during VACUUM on a certain table.

Would REINDEX fix it?  Anything else we should check?

This is PostgreSQL 7.4.2.  Are there relevant fixes later in the 7.4 series?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Failed to re-find parent key

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> What does the error message
> failed to re-find parent key in "tablename_pkey"
> mean?  This happens reproducibly during VACUUM on a certain table.

If it happens during vacuum (not vacuum full) then it must be coming
from _bt_pagedel, and it means that _bt_pagedel could not find the
parent-level btree entry for the page it wants to remove from the index.

> Would REINDEX fix it?  Anything else we should check?

REINDEX would fix it, but it would be interesting to find out what the
actual cause is.  I think we've seen one or two similar reports
previously in 7.4.*, but there's never been enough info to track
it down.  Any chance of going in with a debugger, or capturing a
tarball image of the database for someone else to look at?
        regards, tom lane


Re: Failed to re-find parent key

From
Alvaro Herrera
Date:
On Tue, Mar 22, 2005 at 12:31:55PM +0100, Peter Eisentraut wrote:
> What does the error message
> 
> failed to re-find parent key in "tablename_pkey"
> 
> mean?  This happens reproducibly during VACUUM on a certain table.

This has been reported before, but no one has been able to reproduce it
(not the VACUUM failure itself, but the steps that led the index to that
state).  This is probably a very subtle bug that crept in when the
page-reusing code was added to nbtree.  I don't think it has been
corrected in later releases.

There are two occurrences of this error message in the code.  One is
raised while splitting a page, when inserting the pointer to the new
page into its parent.  This is not the one you are seeing, because no
splitting takes place during vacuum.

The other occurrence is in the first pass of page recovery, which happens
during VACUUM.  The code tries to find the parent page in order to delete
the pointer that leads to the page being unlinked; if it can't find said
pointer, the error you see is issued.  I think it takes a lot of
concurrency for the situation to arise.
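
As a rough illustration of that second case, here is a toy model in C.
It is only a sketch and not the actual nbtree code: the names
ToyParentPage, toy_find_downlink and toy_delete_downlink are made up,
and a parent page is assumed to be nothing more than a small array of
child block numbers.  It shows the shape of the check: look up the
pointer to the child in the parent, and report failure if it is not
there.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these are NOT PostgreSQL's real structures. */
#define TOY_MAX_DOWNLINKS 8

typedef struct ToyParentPage {
    unsigned int downlinks[TOY_MAX_DOWNLINKS]; /* child block numbers */
    size_t       ndownlinks;                   /* entries in use */
} ToyParentPage;

/*
 * Return the slot in `parent` whose downlink points at `child_blkno`,
 * or -1 if the parent holds no such downlink.
 */
static int
toy_find_downlink(const ToyParentPage *parent, unsigned int child_blkno)
{
    assert(parent != NULL);
    assert(parent->ndownlinks <= TOY_MAX_DOWNLINKS);

    for (size_t i = 0; i < parent->ndownlinks; i++) {
        if (parent->downlinks[i] == child_blkno)
            return (int) i;
    }
    return -1;
}

/*
 * Remove the downlink that leads to `child_blkno`.
 * Returns 0 on success.  Returns -1 when the downlink cannot be found,
 * which stands in for the "failed to re-find parent key" condition.
 */
static int
toy_delete_downlink(ToyParentPage *parent, unsigned int child_blkno)
{
    int slot = toy_find_downlink(parent, child_blkno);

    if (slot < 0)
        return -1;

    /* Close the gap left by the removed entry. */
    for (size_t i = (size_t) slot; i + 1 < parent->ndownlinks; i++)
        parent->downlinks[i] = parent->downlinks[i + 1];
    parent->ndownlinks--;
    return 0;
}
```

In this model, asking to delete a downlink the parent does not hold
always fails the same way, no matter how often it is retried.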

> Would REINDEX fix it?  Anything else we should check?

Maybe you could find out exactly which page is causing the problem,
run pg_filedump on it, and see what is wrong with it.  Yes, a REINDEX
fixes it (at least it did in Gaetano's case).

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)


Re: Failed to re-find parent key

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> I think it takes a lot of concurrency for
> the situation to arise.

Maybe.  Since Peter can reproduce the error, there's not any concurrency
misbehavior involved in VACUUM itself; what we are dealing with is
probably corruption in the on-disk state of the index (or maybe a legal
corner case that _bt_pagedel mishandles).  There might have been
concurrency to blame for getting into that state in the first place.
Need data ...
        regards, tom lane


Re: Failed to re-find parent key

From
Peter Eisentraut
Date:
On Tuesday, 22 March 2005 15:54, Tom Lane wrote:
> Any chance of going in with a debugger, or capturing a
> tarball image of the database for someone else to look at?

Unfortunately, this database is restricted and I don't have access myself.  I 
will tell the customer that they should provide a data directory tarball if 
they are interested in researching the problem, but I don't expect much.

The database does have a lot of concurrent read and write access and extremely
high load, but I am aware that this doesn't help pinpoint the problem.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/