On Thu, 2011-12-22 at 03:50 -0600, Kevin Grittner wrote:
> Now, on to the separate-but-related topic of double-write. That
> absolutely requires some form of checksum or CRC to detect torn
> pages, in order for the technique to work at all. Adding a CRC
> without double-write would work fine if you have a storage stack
> which prevents torn pages in the file system or hardware driver. If
> you don't have that, it could create a damaged page indication after
> a hardware or OS crash, although I suspect that would be the
> exception, not the typical case. Given all that, and the fact that
> it would be cleaner to deal with these as two separate patches, it
> seems the CRC patch should go in first.
I think it could be broken down further.
Taking a step back, there are several types of HW-induced corruption,
and checksums only catch some of them. For instance, the disk losing
data completely and just returning zeros won't be caught, because we
assume that a zero page is just fine.
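
To make that gap concrete, here is a minimal sketch. It is Python for brevity, not PostgreSQL code, and everything in it is an assumption for illustration: an 8 kB page, a hypothetical 4-byte CRC-32 stored at the start of the page, and a verifier that accepts an all-zero page as a valid new page.

```python
import zlib

PAGE_SIZE = 8192   # assumed page size
CSUM_BYTES = 4     # hypothetical checksum field at the start of the page


def make_page(payload: bytes) -> bytes:
    """Build a page: CRC-32 of the body, followed by the zero-padded body."""
    if len(payload) > PAGE_SIZE - CSUM_BYTES:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAGE_SIZE - CSUM_BYTES, b"\x00")
    return zlib.crc32(body).to_bytes(CSUM_BYTES, "little") + body


def verify_page(page: bytes) -> bool:
    """Return True if the page is considered intact."""
    if page == bytes(PAGE_SIZE):
        # An all-zero page is assumed to be a freshly extended page,
        # so it is accepted without any checksum comparison.
        return True
    stored = int.from_bytes(page[:CSUM_BYTES], "little")
    return stored == zlib.crc32(page[CSUM_BYTES:])


good = make_page(b"x" * (PAGE_SIZE - CSUM_BYTES))

flipped = bytearray(good)
flipped[100] ^= 0x01              # a single flipped bit is detected
lost = bytes(PAGE_SIZE)           # the disk "lost" the page and returned zeros

print(verify_page(good), verify_page(bytes(flipped)), verify_page(lost))
```

The bit flip fails verification, but the lost page passes, because the zero-page shortcut runs before the checksum is ever consulted.
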
From a development standpoint, I think a better approach would be:
1. Investigate if there are reasonable ways to ensure that (outside of
recovery) pages are always initialized; and therefore zero pages can be
treated as corruption.
2. Make some room in the page header for checksums and maybe some other
simple sanity information (like file and page number). It will be a big
project to sort out the pg_upgrade issues (as Tom and others have
pointed out).
3. Attack the hint bits problem.
If (1) and (2) were complete, we would catch many common types of
corruption, and we'd be in a much better position to think clearly about
hint bits, double writes, etc.
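
As a rough sketch of what (1) and (2) together might buy us (again Python for brevity; the header layout, the field names, and the in_recovery flag are all hypothetical and not the actual page format):

```python
import struct
import zlib

PAGE_SIZE = 8192
# Hypothetical header: checksum, file id, block number (all uint32).
HEADER = struct.Struct("<III")


def build_page(file_id: int, block_no: int, payload: bytes) -> bytes:
    """Build a page whose checksum covers the identity fields and the body."""
    if len(payload) > PAGE_SIZE - HEADER.size:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAGE_SIZE - HEADER.size, b"\x00")
    ident = struct.pack("<II", file_id, block_no)
    csum = zlib.crc32(ident + body)
    return struct.pack("<I", csum) + ident + body


def check_page(page: bytes, file_id: int, block_no: int,
               in_recovery: bool = False) -> str:
    """Classify a page read from the given file and block."""
    if len(page) != PAGE_SIZE:
        return "bad size"
    if page == bytes(PAGE_SIZE):
        # Step (1): outside recovery, pages are assumed to be always
        # initialized, so an all-zero page counts as corruption.
        return "ok" if in_recovery else "zero page"
    csum, got_file, got_block = HEADER.unpack_from(page)
    if csum != zlib.crc32(page[4:]):
        return "checksum mismatch"
    # Step (2): the checksum is valid, but the page may still have been
    # written to, or read from, the wrong place.
    if (got_file, got_block) != (file_id, block_no):
        return "wrong location"
    return "ok"


page = build_page(16384, 7, b"row data")
print(check_page(page, 16384, 7))
print(check_page(page, 16384, 8))
print(check_page(bytes(PAGE_SIZE), 16384, 7))
```

The second call models a misdirected write: the checksum is valid, yet the page claims to belong to a different block. A checksum alone would not notice that, and neither it nor the zero-page case depends on solving hint bits first.
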
Regards,
	Jeff Davis