Re: Page Checksums - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Page Checksums
Msg-id CA+TgmoYp4+7UNS-FeSWejy3ZT1rjuRkgqtLVCz0J0zTVUNRQhw@mail.gmail.com
In response to Re: Page Checksums  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Tue, Dec 27, 2011 at 1:39 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2011-12-19 at 07:50 -0500, Robert Haas wrote:
>> I
>> think it would be regrettable if everyone had to give up 4 bytes per
>> page because some people want checksums.
>
> I can understand that some people might not want the CPU expense of
> calculating CRCs; or the upgrade expense to convert to new pages; but do
> you think 4 bytes out of 8192 is a real concern?
>
> (Aside: it would be MAXALIGNed anyway, so probably more like 8 bytes.)

Yeah, I do.  Our on-disk footprint is already significantly greater
than that of some other systems, and IMHO we should be looking for a
way to shrink our overhead in that area, not make it bigger.
Admittedly, most of the fat is probably in the tuple header rather
than the page header, but at any rate I don't consider burning up roughly
0.1% of our available storage space to be a negligible overhead.  I'm not
sure I believe it needs to be MAXALIGN'd, since it is followed by item
pointers, which IIRC only need 2-byte alignment.  Then again, Heikki
also recently proposed adding 4 bytes per page to allow each page to
track its XID generation, to help mitigate the need for
anti-wraparound vacuuming.
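
A quick sketch of the per-page arithmetic being argued about, assuming the
default 8192-byte block size.  The function name is made up for
illustration, and the numbers cover only the page header, not tuple headers.

```python
PAGE_SIZE = 8192  # assumed default PostgreSQL block size, in bytes


def overhead_percent(extra_bytes, page_size=PAGE_SIZE):
    """Return extra_bytes as a percentage of a single page."""
    return 100.0 * extra_bytes / page_size


# 2 bytes: a 16-bit checksum; 4 bytes: a 32-bit checksum;
# 8 bytes: a 32-bit checksum padded out to an 8-byte MAXALIGN boundary.
for extra in (2, 4, 8):
    print(f"{extra} bytes per page = {overhead_percent(extra):.3f}% "
          f"of a {PAGE_SIZE}-byte page")
```

On these assumptions, even the MAXALIGNed case stays just under a tenth of a
percent per page, so the disagreement is about whether that is worth paying
unconditionally, not about the size of the number.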

I think Simon's approach of stealing the 16-bit page version field is
reasonably clever in this regard, although I also understand why Tom
objects to it, and I certainly agree with him that we need to be
careful not to back ourselves into a corner.  What I'm not too clear
about is whether a 16-bit checksum meets the needs of people who want
checksums.  If we assume that flaky hardware is going to corrupt pages
steadily over time, then it seems like it might be adequate, because
in the unlikely event that the first corrupted page happens to still
pass its checksum test, well, another will come along and we'll
probably spot the problem then, likely well before any significant
fraction of the data gets eaten.  But I'm not sure whether that's the
right mental model.  I, and I think some others, initially assumed
we'd want a 32-bit checksum, but I'm not sure I can justify that
beyond "well, I think that's what people usually do".  It could be
that even if we add new page header space for the checksum (as opposed
to stuffing it into the page version field) we still want to add only
2 bytes.  Not sure...
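
The "another will come along" reasoning can be sketched numerically.  This
assumes an idealized checksum, where a corrupted page passes a b-bit check
with probability 2^-b, and assumes corruptions on different pages are
independent.  Neither assumption is guaranteed for real hardware faults or
for any particular checksum algorithm, and the function names are
hypothetical.

```python
def p_all_undetected(bits, corrupted_pages):
    """Chance that every corrupted page still passes its checksum.

    Idealized model: each corrupted page passes a `bits`-bit checksum
    with probability 2**-bits, independently of the others.
    """
    return 2.0 ** (-bits * corrupted_pages)


def p_detected(bits, corrupted_pages):
    """Chance that at least one corrupted page fails its checksum."""
    return 1.0 - p_all_undetected(bits, corrupted_pages)


for bits in (16, 32):
    for pages in (1, 2, 3):
        print(f"{bits}-bit checksum, {pages} corrupted page(s): "
              f"P(all undetected) = {p_all_undetected(bits, pages):.3e}")
```

Under this model, two corrupted pages with a 16-bit checksum are as unlikely
to slip through together as one corrupted page with a 32-bit checksum, which
is the case for 16 bits being adequate when corruption recurs.  It says
nothing about a single isolated corruption, where 16 bits leaves a 1 in
65536 chance of a miss.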

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

