Re: Detecting corrupted pages earlier - Mailing list pgsql-hackers

From Greg Copeland
Subject Re: Detecting corrupted pages earlier
Msg-id 1045598835.3290.2.camel@mouse.copelandconsulting.net
In response to Re: Detecting corrupted pages earlier  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, 2003-02-17 at 22:04, Tom Lane wrote:
> Curt Sampson <cjs@cynic.net> writes:
> > On Mon, 17 Feb 2003, Tom Lane wrote:
> >> Postgres has a bad habit of becoming very confused if the page header of
> >> a page on disk has become corrupted.
> 
> > What typically causes this corruption?
> 
> Well, I'd like to know that too.  I have seen some cases that were
> identified as hardware problems (disk wrote data to wrong sector, RAM
> dropped some bits, etc).  I'm not convinced that that's the whole story,
> but I have nothing to chew on that could lead to identifying a software
> bug.
> 
> > If it's any kind of a serious problem, maybe it would be worth keeping
> > a CRC of the header at the end of the page somewhere.
> 
> See past discussions about keeping CRCs of page contents.  Ultimately
> I think it's a significant expenditure of CPU for very marginal returns
> --- the layers underneath us are supposed to keep their own CRCs or
> other cross-checks, and a very substantial chunk of the problem seems
> to be bad RAM, against which occasional software CRC checks aren't 
> especially useful.

This is exactly why "magic numbers" or simple algorithmic bit patterns
are commonly used.  If the "magic number" or bit pattern doesn't match
what the page's own number says it should be, you know something is
wrong.  Storage cost tends to be slight and CPU overhead low.
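To make the idea concrete, here is a minimal C sketch, assuming a hypothetical page header (not PostgreSQL's actual PageHeaderData) whose first field stores a pattern derived from the page's block number. The names PAGE_SIZE, PAGE_MAGIC_SEED, page_magic_for, page_stamp, and page_magic_ok are all made up for illustration. Every step of the mixing function is reversible, so two different block numbers never produce the same pattern. A page that lands in the wrong block therefore fails the check, as does a bit flip in the header field. Damage elsewhere in the page body goes undetected, which is the trade-off against a full CRC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192               /* hypothetical page size */
#define PAGE_MAGIC_SEED 0x5047DA7Au  /* hypothetical constant */

/* Hypothetical header layout: NOT PostgreSQL's real page header. */
typedef struct {
    uint32_t magic;  /* pattern derived from this page's block number */
    uint16_t lower;
    uint16_t upper;
} PageHeader;

/* Cheap, deterministic pattern for a block number. XOR and rotate are
 * each bijections, so distinct block numbers get distinct patterns. */
static uint32_t page_magic_for(uint32_t blkno)
{
    uint32_t x = blkno ^ PAGE_MAGIC_SEED;
    x = (x << 13) | (x >> 19);  /* 32-bit rotate left by 13 */
    return x ^ 0xA5A5A5A5u;
}

/* Write the expected pattern into the header before the page goes to disk. */
static void page_stamp(unsigned char *page, uint32_t blkno)
{
    PageHeader hdr;
    assert(page != NULL);
    memcpy(&hdr, page, sizeof hdr);
    hdr.magic = page_magic_for(blkno);
    memcpy(page, &hdr, sizeof hdr);
}

/* On read: returns 1 if the header carries the pattern expected for blkno. */
static int page_magic_ok(const unsigned char *page, uint32_t blkno)
{
    PageHeader hdr;
    memcpy(&hdr, page, sizeof hdr);
    return hdr.magic == page_magic_for(blkno);
}
```

For example, a page stamped as block 42 passes page_magic_ok(page, 42). The same check fails if that page is read back as block 43, which is what a misdirected write looks like, and it also fails after a single bit in the magic field is flipped. The check costs one comparison plus a few integer operations per read.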

I agree with you that a CRC seems like overkill for little return.

Regards,

-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting


