Re: BUG #5055: Invalid page header error - Mailing list pgsql-bugs

From	Craig Ringer
Subject	Re: BUG #5055: Invalid page header error
Date	September 15, 2009 02:43:28
Msg-id	1252993395.27254.16.camel@wallace.localnet
In response to	BUG #5055: Invalid page header error ("john martin" <postgres_bee@live.com>)
Responses	Re: BUG #5055: Invalid page header error Re: BUG #5055: Invalid page header error
List	pgsql-bugs

On Mon, 2009-09-14 at 23:17 +0000, john martin wrote:
> All of a sudden we started seeing page header errors in certain queries.

Was there any particular event that marked the onset of these issues?

Anything in the system logs (dmesg / syslog etc) around that time?

[for SATA disks]: does smartctl from the smartmontools package indicate
anything interesting about the disk(s)? (Ignore the "health status",
it's a foul lie, and rely on the error log plus the vendor attributes:
reallocated sector count, pending sector, uncorrectable sector count,
etc).
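
As a rough illustration of the attribute check described above, here is a
minimal Python sketch that reads the vendor attribute table printed by
"smartctl -A" and reports any watched counter with a non-zero raw value.
The attribute names in WATCHED are the ones smartctl commonly prints for
ATA/SATA drives, but vendors differ, so treat that set as an assumption.
The helper names (parse_watched_attributes, suspicious, check_disk) are
made up for this example. The drive's error log still has to be read
separately, e.g. with "smartctl -l error".

```python
import subprocess

# Attribute names as smartctl commonly prints them for ATA/SATA drives.
# Assumption: your drive uses these names; some vendors label them differently.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_watched_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes.

    Only rows whose raw value is a plain integer are kept.
    """
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows look like:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit():
                found[fields[1]] = int(raw)
    return found


def suspicious(attributes):
    """Keep only the counters that are non-zero."""
    return {name: raw for name, raw in attributes.items() if raw > 0}


def check_disk(device):
    """Run 'smartctl -A' on a device (usually needs root) and flag counters."""
    # smartctl uses its exit status as a bitmask, so a non-zero status is
    # not necessarily a failure to run; do not raise on it.
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return suspicious(parse_watched_attributes(result.stdout))
```

Calling check_disk("/dev/sda") (the device path is only an example) returns
an empty dict when none of the watched counters has moved. A non-empty
result is a reason to look harder at the disk, not proof that it caused
the corruption.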

Was Pg forcibly killed and restarted, or the machine hard-reset? (This
_shouldn't_ cause data corruption, but might give some starting point
for looking for a bug).

> I am urging the community to investigate the possibility that it may not be
> hardware related, especially since it was first reported at least 5 years
> back.

If anything, the fact that it was first reported 5 years back makes it
_more_ likely to be hardware related. Bad hardware eats/scrambles some
of your data; Pg goes "oh crap, that page is garbage". People aren't
constantly getting their data eaten, though, despite the age of the
initial reports.

It's not turning up often. It's not turning up in cases where hardware
issues can be ruled out. There doesn't seem to be a strong pattern
associating the issue with a particular CPU / disk controller / drive etc.
that would suggest either Pg triggering a hardware bug or a bug in Pg
triggered by a hardware quirk. It doesn't seem to be reproducible:
people generally can't trigger the issue repeatedly.
Either it's a *really* rare and quirky bug that's really hard to
trigger, or it's a variety of hardware / disk issues.

If it's a really rare, quirky, hard-to-trigger bug, where do you even
start looking without *some* idea what happened to trigger the issue? Do
you have any idea what might've started it in your case?

*** DID YOU TAKE COPIES OF YOUR DATA FILES BEFORE "FIXING" THEM *** ?

--
Craig Ringer
