Re: Page Checksums - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Page Checksums
Date	December 19, 2011 18:27:26
Msg-id	CA+TgmoZhSKAP-TN6N2ahe-+zfZn_L-T_ykVOekyuCU_Z2Kh+=Q@mail.gmail.com
In response to	Re: Page Checksums (David Fetter <david@fetter.org>)
Responses	Re: Page Checksums
List	pgsql-hackers

On Mon, Dec 19, 2011 at 12:07 PM, David Fetter <david@fetter.org> wrote:
> On Mon, Dec 19, 2011 at 09:34:51AM -0500, Robert Haas wrote:
>> On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> > * Aidan Van Dyk (aidan@highrise.ca) wrote:
>> >> But the scary part is you don't know how long *ago* the crash was.
>> >> Because a hint-bit-only change w/ a torn-page is a "non event" in
>> >> PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try
>> >> and "scrub" every page in the database.
>> >
>> > Fair enough, but, could we distinguish these two cases?  In other words,
>> > would it be possible to detect if a page was torn due to a 'traditional'
>> > crash and not complain in that case, but complain if there's a CRC
>> > failure and it *doesn't* look like a torn page?
>>
>> No.
>
> Would you be so kind as to elucidate this a bit?

Well, basically, Stephen's proposal was pure hand-waving.  :-)

I don't know of any magic trick that would allow us to know whether a
CRC failure "looks like a torn page".  The only information we're
going to get is the knowledge of whether the CRC matches or not.  If
it doesn't, it's fundamentally impossible for us to know why.  We know
the page contents are not as expected - that's it!

It's been proposed before that we could examine the page, consider all
the unset hint bits that could be set, and try all combinations of
setting and clearing them to see whether any of them produce a valid
CRC.  But, as Tom has pointed out previously, that has a really quite
large chance of making a page that's *actually* been corrupted look
OK.  If you have 30 or so unset hint bits, odds are very good that
some combination will produce the 32-bit CRC you're expecting.
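
As a rough, back-of-the-envelope sketch of that risk: assume (an
idealization, not something stated above) that each combination of hint
bits yields an independent, uniformly distributed 32-bit CRC.  Real CRCs
are linear, so this is only an approximate model, and the function name
below is hypothetical.  It estimates the chance that at least one of the
2^k combinations happens to match the stored CRC of a genuinely
corrupted page.

```python
import math

CRC_BITS = 32  # width of the checksum under discussion


def false_match_probability(unset_hint_bits: int) -> float:
    """Chance that some hint-bit combination matches the stored CRC.

    Assumes each of the 2**k combinations gives an independent,
    uniformly random CRC value (a simplifying assumption).
    """
    combinations = 2 ** unset_hint_bits
    # P(no match in one try) = 1 - 2**-32; raise to the number of tries.
    # log1p/expm1 keep the arithmetic accurate for such tiny probabilities.
    log_no_match = combinations * math.log1p(-(2.0 ** -CRC_BITS))
    return -math.expm1(log_no_match)


for k in (20, 30, 32, 35, 40):
    print(f"{k} unset hint bits: {false_match_probability(k):.4f}")
```

Under this model the chance of a spurious match grows quickly with the
number of unset hint bits, and a page can easily carry many more than 30.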

To put this another way, we currently WAL-log just about everything.
We get away with NOT WAL-logging some things when we don't care about
whether they make it to disk.  Hint bits, killed index tuple pointers,
etc. cause no harm if they don't get written out, even if some other
portion of the same page does get written out.  But as soon as you CRC
the whole page, now absolutely every single bit on that page becomes
critical data which CANNOT be lost.  IOW, it now requires the same
sort of protection that we already need for our other critical updates
- i.e. WAL logging.  Or you could introduce some completely new
mechanism that serves the same purpose, like MySQL's double-write
buffer.
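
To illustrate the point, here is a minimal simulation, not PostgreSQL
code.  It assumes a hypothetical page layout with the CRC stored in the
first four bytes, a 512-byte atomic sector size, and zlib's CRC-32 as a
stand-in checksum.  A hint bit is set far from the header, the CRC is
restamped, and a crash lets only the first sector reach disk.

```python
import zlib

PAGE_SIZE = 8192    # PostgreSQL's default block size
SECTOR_SIZE = 512   # assumed atomic write unit of the disk
CRC_BYTES = 4       # hypothetical layout: CRC lives in the first 4 bytes


def stamp(page: bytearray) -> None:
    """Compute the CRC of everything after the CRC field and store it."""
    crc = zlib.crc32(bytes(page[CRC_BYTES:]))
    page[:CRC_BYTES] = crc.to_bytes(CRC_BYTES, "little")


def verify(page: bytes) -> bool:
    """Return True if the stored CRC matches the page contents."""
    stored = int.from_bytes(page[:CRC_BYTES], "little")
    return stored == zlib.crc32(page[CRC_BYTES:])


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only the first N sectors hit the disk."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]


old = bytearray(PAGE_SIZE)
stamp(old)

new = bytearray(old)
new[5000] |= 0x01   # set a "hint bit" in the tenth sector; not WAL-logged
stamp(new)          # the header sector now carries the new CRC

torn = torn_write(bytes(old), bytes(new), sectors_written=1)

print(verify(bytes(old)))   # True
print(verify(bytes(new)))   # True
print(verify(torn))         # False: new CRC, old hint bit
```

No user data was lost in the torn write, yet the page now fails
verification, and the checker has no way to tell this case apart from
real corruption.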

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
