strange nbtree corruption report - Mailing list pgsql-hackers

From Alvaro Herrera
Subject strange nbtree corruption report
Msg-id 1321915576-sup-7558@alvh.no-ip.org
List pgsql-hackers
Hi,

We got a very strange nbtree corruption report some time ago.  This was
a btree index on a very high churn table -- entries are updated and
deleted very quickly, so the index grows very large and also shrinks
quickly (AFAICT this is a work queue of sorts).

The strangest thing of all is that there was this error:

ERROR:  left link changed unexpectedly in block 3378 of index "index_name"
CONTEXT:  automatic vacuum of table "table_name"

This was reported not once, but several dozen times, by each new
autovacuum worker that tried to vacuum the table.

As far as I can see, there is just no way for this to happen ... much
less happen repeatedly.  I thought it might be related to concurrent
insertions somehow managing to split the page under deletion very
quickly (given the load these systems are under, this is plausible).
But I can't find how.

(There were other error reports of btree indexes going awry here, such
as "ERROR: right sibling's left-link doesn't match: block 67 links to
2118 instead of expected 2228 in index "pg_depend_depender_index"".)
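
Both messages complain about sibling links: each page on a btree level
points to its left and right neighbours, and a page's right sibling is
expected to point back at it.  Below is a minimal sketch of that
invariant, checked over a toy in-memory model rather than real index
pages.  The names Page and check_sibling_links are hypothetical, the
use of 0 for "no sibling" is an assumption of the model, and the block
numbers in the usage note are only borrowed from the message quoted
above, not taken from the actual index.

```python
from typing import Dict, List, NamedTuple

# Assumption of this model: block number 0 means "no sibling".
NO_SIBLING = 0


class Page(NamedTuple):
    """Toy stand-in for a btree page: only its sibling links."""
    left: int
    right: int


def check_sibling_links(pages: Dict[int, Page]) -> List[str]:
    """Return one message per page whose right sibling does not link back."""
    problems = []
    for blkno, page in sorted(pages.items()):
        if page.right == NO_SIBLING:
            continue  # rightmost page on its level, nothing to verify
        sibling = pages.get(page.right)
        if sibling is None:
            problems.append(
                f"block {blkno} links to missing block {page.right}"
            )
        elif sibling.left != blkno:
            problems.append(
                f"block {page.right} links to {sibling.left} "
                f"instead of expected {blkno}"
            )
    return problems
```

With a hypothetical layout in which blocks 2118 and 2228 both name
block 67 as their right sibling, but block 67 points back only to 2118,
check_sibling_links({2118: Page(0, 67), 2228: Page(0, 67),
67: Page(2118, 0)}) reports a single problem for block 2228, worded
like the tail of the message above.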

These guys are running 8.3.14 here, and this is a Londiste slave
database.  Sadly, it seems that the index files in our case are gone
now.

I see three independent reports of this error message in the archives
(Ulrich Wisser, Mark Kirkwood, Gabriel Ferro), but no one seems to have
carried the investigation far enough to discover exactly what is
going wrong.

Any thoughts about this?

--
Álvaro Herrera <alvherre@alvh.no-ip.org>

