Re: Page Checksums - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Page Checksums |
Date | |
Msg-id | CA+TgmoZhSKAP-TN6N2ahe-+zfZn_L-T_ykVOekyuCU_Z2Kh+=Q@mail.gmail.com |
In response to | Re: Page Checksums (David Fetter <david@fetter.org>) |
Responses | Re: Page Checksums |
List | pgsql-hackers |
On Mon, Dec 19, 2011 at 12:07 PM, David Fetter <david@fetter.org> wrote:
> On Mon, Dec 19, 2011 at 09:34:51AM -0500, Robert Haas wrote:
>> On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> > * Aidan Van Dyk (aidan@highrise.ca) wrote:
>> >> But the scary part is you don't know how long *ago* the crash was.
>> >> Because a hint-bit-only change w/ a torn-page is a "non event" in
>> >> PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try
>> >> and "scrub" every page in the database.
>> >
>> > Fair enough, but, could we distinguish these two cases?  In other words,
>> > would it be possible to detect if a page was torn due to a 'traditional'
>> > crash and not complain in that case, but complain if there's a CRC
>> > failure and it *doesn't* look like a torn page?
>>
>> No.
>
> Would you be so kind as to elucidate this a bit?

Well, basically, Stephen's proposal was pure hand-waving.  :-)

I don't know of any magic trick that would allow us to know whether a
CRC failure "looks like a torn page".  The only information we're
going to get is the knowledge of whether the CRC matches or not.  If
it doesn't, it's fundamentally impossible for us to know why.  We know
the page contents are not as expected - that's it!

It's been proposed before that we could examine the page, consider all
the unset hint bits that could be set, and try all combinations of
setting and clearing them to see whether any of them produce a valid
CRC.  But, as Tom has pointed out previously, that has a really quite
large chance of making a page that's *actually* been corrupted look
OK.  If you have 30 or so unset hint bits, odds are very good that
some combination will produce the 32-bit CRC you're expecting.

To put this another way, we currently WAL-log just about everything.
We get away with NOT WAL-logging some things when we don't care about
whether they make it to disk.  Hint bits, killed index tuple pointers,
etc. cause no harm if they don't get written out, even if some other
portion of the same page does get written out.  But as soon as you CRC
the whole page, now absolutely every single bit on that page becomes
critical data which CANNOT be lost.  IOW, it now requires the same
sort of protection that we already need for our other critical updates
- i.e. WAL logging.  Or you could introduce some completely new
mechanism that serves the same purpose, like MySQL's double-write
buffer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
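
The two arguments in this message can be illustrated with a small Python sketch. It is not from the thread and is not PostgreSQL code: the page layout (a 4-byte CRC followed by the body), the 512-byte sector size, and every function name are assumptions made up for illustration. The probability estimate also assumes each candidate CRC behaves like an independent, uniformly random value, which is only an approximation for a real CRC.

```python
import math
import zlib

PAGE_SIZE = 8192   # assumed page size for the illustration
SECTOR = 512       # assumed unit of atomic disk writes


def false_match_probability(unset_hint_bits: int, crc_bits: int = 32) -> float:
    """Chance that at least one of the 2**k hint-bit combinations matches
    the stored checksum by accident on a genuinely corrupted page.

    Simplifying assumption: every combination yields an independent,
    uniformly random CRC value.
    """
    trials = 2 ** unset_hint_bits
    # 1 - (1 - 2**-crc_bits) ** trials, computed without losing precision.
    return -math.expm1(trials * math.log1p(-(2.0 ** -crc_bits)))


def make_page(body: bytes) -> bytes:
    """Hypothetical layout: 4-byte CRC-32 of the body, then the body."""
    return zlib.crc32(body).to_bytes(4, "big") + body


def verify(page: bytes) -> bool:
    """True if the stored CRC matches the CRC of the body."""
    return int.from_bytes(page[:4], "big") == zlib.crc32(page[4:])


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Model a crash after only the first few sectors of `new` hit disk."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


def set_hint_bit(page: bytes, offset: int) -> bytes:
    """Return a re-checksummed copy of the page with one bit set in the body."""
    body = bytearray(page[4:])
    body[offset] |= 0x01
    return make_page(bytes(body))


if __name__ == "__main__":
    for k in (20, 30, 32, 36):
        print(f"{k} unset hint bits -> {false_match_probability(k):.4f}")

    old_page = make_page(bytes(PAGE_SIZE - 4))
    new_page = set_hint_bit(old_page, 6000)   # change lands in a late sector
    on_disk = torn_write(old_page, new_page, sectors_written=1)

    # Both complete versions verify; the torn mix of the two does not,
    # and the checksum alone cannot say that a hint bit was the cause.
    print(verify(old_page), verify(new_page), verify(on_disk))
```

Under that uniform-CRC model the estimate grows quickly with the number of unset hint bits, so a brute-force search over them becomes steadily more likely to "repair" a page that is really corrupt. The second half shows why a whole-page CRC makes a lost hint-bit write matter: the first sector carries the new checksum, the later sector still holds the old bit, and verification fails even though no tuple data was damaged.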