Re: 8.1 index corruption woes - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: 8.1 index corruption woes
Date
Msg-id 20080711180223.GI4110@alvh.no-ip.org
Whole thread Raw
In response to Re: 8.1 index corruption woes  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: 8.1 index corruption woes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Alvaro Herrera wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre@commandprompt.com> writes:
> > > We've detected what I think is some sort of index corruption in 8.1.
> > > The server is running 8.1.11, so AFAICT the problem with truncated pages
> > > in vacuum is already patched and accounted for (i.e. we reindexed, and a
> > > bit later the problem presented itself again).  There haven't been any
> > > relevant fixes after that AFAICT.
> > 
> > > What we see is that after a bit of updating the index, it starts having
> > > tuples that poing to heap entries which are marked unused.
> > 
> > Do you actually see any observed problem, or is this conclusion based
> > entirely on your pg_filedump analysis?
> 
> Well, yeah, this all started because the guys started getting weird
> results in queries, and found out that disabling index scans returned
> different results.

I neglected to mention that further analysis of the failed index scans
showed that index entries were pointing to heap tuples with completely
different data.

I was tracking one of those, when I noticed that in the same index page
was an entry pointing to a heap entry marked as unused.

> > I suspect a problem with your analysis script, although a quick scan
> > of the code didn't find an issue.

I don't want to discard this idea, because we're getting a very
unusually high number of bogus entries.  However, they are all (or a
very high percentage of them) the very first entry on each index page.
I want to confirm that the leftmost on a leaf btree page is a valid
item, and not something like the lower bound value?  (I think we only
store high bounds on internal pages, not leaf pages, but I'm not sure).


> > Another point to keep in mind, if you are trying to analyze files
> > belonging to a live database, is that what you can see in the filesystem
> > may not be the "current" contents of every page.  For typical access
> > patterns it'd be unsurprising for the visible index pages to lag behind
> > those of the heap, since they'd be "hotter" and tend to stay in shared
> > buffers longer.

It was confirmed that the servers are not live, and a checkpoint has
been executed by the recovery code.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: _bt_delitems: change before WAL?
Next
From: Tom Lane
Date:
Subject: Re: 8.1 index corruption woes