Re: Detecting corrupted pages earlier - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Detecting corrupted pages earlier
Date
Msg-id 21294.1045547521@sss.pgh.pa.us
In response to Re: Detecting corrupted pages earlier  (Curt Sampson <cjs@cynic.net>)
Responses Re: Detecting corrupted pages earlier  (Curt Sampson <cjs@cynic.net>)
Re: Detecting corrupted pages earlier  (Kevin Brown <kevin@sysexperts.com>)
List pgsql-hackers
Curt Sampson <cjs@cynic.net> writes:
> Well, I wasn't proposing the whole page, just the header. That would be
> significantly cheaper (in fact, there's no real need even for a CRC;
> probably just xoring all of the words in the header into one word would
> be fine) and would tell you if the page was torn during the write, which
> was what I was imagining the problem might be.

The header is only a dozen or two bytes long, so it sits entirely within
the first disk sector of the page; torn-page syndrome therefore won't
result in header corruption.

The cases I've been able to study look like the header and a lot of the
following page data have been overwritten with garbage --- when it made
any sense at all, it looked like the contents of non-Postgres files (e.g.,
plain text), which is why I mentioned the possibility of disks writing
data to the wrong sector.  Another recent report suggested that all
bytes of the header had been replaced with 0x55, which sounds more like
RAM or disk-controller malfeasance.

You're right that we don't need a heck of a powerful check to catch
this sort of thing.  I was envisioning checks comparable to what's now
in PageAddItem: valid pagesize, valid version, pd_lower and pd_upper and
pd_special sane relative to each other and to the pagesize.  I think this
would be nearly as effective as an XOR sum --- and it has the major
advantage of being compatible with the existing page layout.  I'd like
to think we're done munging the page layout for a while.
        regards, tom lane
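
The header checks described above (valid pagesize, valid version, and
pd_lower, pd_upper and pd_special sane relative to each other and to the
pagesize) might look roughly like the following. This is only a sketch
under stated assumptions: the struct, the constants and the function name
are hypothetical and simplified for illustration, and they are not the
actual PostgreSQL page header or the code in PageAddItem.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE    8192u   /* assumed block size */
#define SKETCH_PAGE_VERSION 1u      /* assumed page layout version */

/* Hypothetical, simplified header; NOT the real PostgreSQL struct. */
typedef struct SketchPageHeader
{
    uint16_t lower;      /* offset to start of free space */
    uint16_t upper;      /* offset to end of free space */
    uint16_t special;    /* offset to start of special space */
    uint16_t page_size;  /* size of the page in bytes */
    uint16_t version;    /* page layout version */
} SketchPageHeader;

/*
 * Return true if the header fields are mutually consistent.
 * No checksum is involved, so the page layout needs no extra field.
 */
static bool
sketch_header_is_sane(const SketchPageHeader *h)
{
    if (h->page_size != SKETCH_PAGE_SIZE)
        return false;               /* invalid pagesize */
    if (h->version != SKETCH_PAGE_VERSION)
        return false;               /* invalid version */
    if (h->lower < sizeof(SketchPageHeader))
        return false;               /* free space would overlap the header */
    if (h->lower > h->upper)
        return false;               /* free space has negative length */
    if (h->upper > h->special)
        return false;               /* tuple data would overlap special space */
    if (h->special > h->page_size)
        return false;               /* special space starts past end of page */
    return true;
}
```

In this sketch, a header whose bytes have all been replaced with 0x55
fails the very first test, since the pagesize field then reads 0x5555
rather than 8192; a header overwritten with plain text is very likely to
fail one of the tests too, though that is not guaranteed.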

