Re: corrupted tuple (header?), pg_filedump output - Mailing list pgsql-hackers

From: Eric Parusel
Subject: Re: corrupted tuple (header?), pg_filedump output
Msg-id: 423B8B8D.30106@globalrelay.net
In response to: corrupted tuple (header?), pg_filedump output (Eric Parusel <lists@globalrelay.net>)
List: pgsql-hackers

I've brought this back on-list, probably best that way..?

Eric Parusel wrote:
> Tom Lane wrote:
> 
>> What it kinda looks like from here is that you suffered a "page tear":
>> the itemid pointers at the front of the page may be self-consistent, but
>> they don't quite match the state of the rest of the page.  For instance
>> the claimed item-2 header is obviously bogus but it looks like there is
>> a valid header starting a few bytes after where the itemid points.
>> I suspect that the itemid pointers are one generation earlier or later
>> than the remainder of the page.  Since disks typically write in 512-byte
>> sectors and there is nothing else in the first 512 bytes except the
>> itemids, we could imagine that that sector got written and then the rest
>> of the page did not.  Postgres is supposed to protect against this sort
>> of thing in case of a system crash, but I wouldn't want to swear that
>> the protections are completely bulletproof.  Have you had any power
>> failures or system crashes lately?  What sort of hardware and OS is this
>> on?
> 
> 
> Hmm...
> Here is some system information:
> 
> Dell PE1750, 2GB ECC ram, 2x73GB 10K scsi attached to Perc4/di 
> (raid-on-motherboard, LSI megaraid chipset, battery-backed cache, 
> write-back cache enabled), firmware/drivers is up to date as of a month 
> ago.
> 
> The OS is RHEL3, kept up to date with the newest kernel for it.
> 
> PgSQL 8.0.1 installed from RPMs on postgresql.org, it had 8.0.0 
> installed from PGDG RPMs initially until 8.0.1 came out.
> 
> No power failures or crashes since it's been up...
> 
> It's been up and running with moderate to heavy load for about 2 months 
> now.
> 
> I don't think there have been any pgsql backend (if that's the word for 
> them) processes crashing or anything of that sort...
> 
> Pretty heavy write load on the box, it will be getting a 14 disk raid10 
> array plugged into it soon to speed things up.
> 
> 
> 
> I can't remember and I couldn't find it, but is there a consistency 
> checking tool (pg_fsck or something?) for pgsql?  Or I suppose a dump of 
> the whole database (which I do nightly) ensures all the data is readable...
> 
> If there's anything else I can do to help figure this out, let me know..
> 
> Thanks,
> Eric
> 

How would I go about double checking I don't have this problem on other 
pages?  As above, a successful db dump would verify everything's fine?
I suppose a dump and reload after that point would verify that my 
indexes and anything else in base/ is fine?
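
For what it's worth, the kind of mismatch Tom describes (itemids that no longer agree with the rest of the page) could in principle be looked for mechanically, one page at a time. The following is only a rough Python sketch, not an existing PostgreSQL tool. It assumes the default 8192-byte block size, a little-endian machine, a 20-byte page header as I understand the 8.0 layout (later releases use a different header, so check bufpage.h for the version in use), and 4-byte line pointers packed as lp_off (15 bits), lp_flags (2 bits), lp_len (15 bits). The function names are made up for illustration.

```python
import struct

BLCKSZ = 8192  # assumption: default block size (it is a compile-time option)
# Assumed 8.0 page header, little-endian:
# pd_lsn (2 x uint32), pd_tli, pd_lower, pd_upper, pd_special, pd_pagesize_version
HEADER_FMT = "<IIIHHHH"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 20 bytes under this assumption


def decode_itemid(word):
    """Split a 32-bit line pointer into (lp_off, lp_flags, lp_len)."""
    return word & 0x7FFF, (word >> 15) & 0x3, (word >> 17) & 0x7FFF


def suspicious_items(page):
    """Return (item_number, reason) pairs for line pointers that look wrong."""
    if not any(page):
        return []  # an all-zero page is treated as empty, not as damage
    _, _, _, lower, upper, special, _ = struct.unpack_from(HEADER_FMT, page, 0)
    if not (HEADER_SIZE <= lower <= upper <= special <= len(page)):
        return [(0, "page header bounds are inconsistent")]
    problems = []
    for i in range((lower - HEADER_SIZE) // 4):
        (word,) = struct.unpack_from("<I", page, HEADER_SIZE + 4 * i)
        off, _flags, length = decode_itemid(word)
        if length == 0:
            continue  # simplification: a pointer with no storage is skipped
        if off < upper or off + length > special:
            problems.append((i + 1, "item lies outside the tuple area"))
    return problems


def scan_relation_file(path):
    """Yield (block_number, problems) for each suspicious page in one file."""
    with open(path, "rb") as f:
        block_no = 0
        while True:
            page = f.read(BLCKSZ)
            if len(page) < BLCKSZ:
                break  # end of file (a trailing partial page is ignored)
            problems = suspicious_items(page)
            if problems:
                yield block_no, problems
            block_no += 1
```

This only checks that each line pointer lands inside the tuple area of its own page. It would not notice a pointer that is in range but aimed a few bytes off a real tuple header, so a clean result does not prove the file is undamaged.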

How would I figure out where and how much to overwrite with dd if I were 
to clear this page? Or how would I set the invalid item's itemid to empty?
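
The arithmetic behind the dd question can at least be sketched. The Python below assumes the default 8192-byte block size and the default 1 GB relation segment files (both are build-time settings), and the same 20-byte page header and 4-byte line pointers as I understand the 8.0 layout. The helper names and the example path are hypothetical. The dd options it prints (bs, seek, count, conv=notrunc) are standard ones, but the command is only built as a string here, never run.

```python
BLCKSZ = 8192                     # assumption: default block size
SEGMENT_BYTES = 1024 ** 3         # assumption: default 1 GB segment files
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLCKSZ
HEADER_SIZE = 20                  # assumption: 8.0 page header size
ITEMID_SIZE = 4                   # one line pointer is a 32-bit word


def locate_block(block_no, relfile_path):
    """Return (segment file path, block within that file, byte offset)."""
    segment, block_in_segment = divmod(block_no, BLOCKS_PER_SEGMENT)
    # Segment 0 is the bare file name; later segments get a ".N" suffix.
    path = relfile_path if segment == 0 else f"{relfile_path}.{segment}"
    return path, block_in_segment, block_in_segment * BLCKSZ


def dd_zero_command(block_no, relfile_path):
    """Build (but do not run) a dd command that zeroes exactly one page."""
    path, block_in_segment, _ = locate_block(block_no, relfile_path)
    return (f"dd if=/dev/zero of={path} bs={BLCKSZ} "
            f"seek={block_in_segment} count=1 conv=notrunc")


def itemid_offset(block_in_segment, item_no):
    """Byte offset in the segment file of line pointer item_no (1-based)."""
    return block_in_segment * BLCKSZ + HEADER_SIZE + ITEMID_SIZE * (item_no - 1)


if __name__ == "__main__":
    # Hypothetical relation file path and block number, for illustration only.
    print(dd_zero_command(3, "base/12345/67890"))
    print(itemid_offset(3, 2))
```

So the answer to "how much" would be one whole block, starting at block number times block size within the right segment file. conv=notrunc matters because without it dd truncates the file after the written block. Anything like this should only be tried with the postmaster stopped and a copy of the file saved first.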

Obviously, stuff like this tends not to be in the documentation :D

Thanks for the help,
Eric

