Re: Page Checksums - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Page Checksums |
Date | |
Msg-id | CA+TgmoYp4+7UNS-FeSWejy3ZT1rjuRkgqtLVCz0J0zTVUNRQhw@mail.gmail.com |
In response to | Re: Page Checksums (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Page Checksums, Re: Page Checksums |
List | pgsql-hackers |
On Tue, Dec 27, 2011 at 1:39 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2011-12-19 at 07:50 -0500, Robert Haas wrote:
>> I think it would be regrettable if everyone had to give up 4 bytes per
>> page because some people want checksums.
>
> I can understand that some people might not want the CPU expense of
> calculating CRCs; or the upgrade expense to convert to new pages; but do
> you think 4 bytes out of 8192 is a real concern?
>
> (Aside: it would be MAXALIGNed anyway, so probably more like 8 bytes.)

Yeah, I do. Our on-disk footprint is already significantly greater than
that of some other systems, and IMHO we should be looking for a way to
shrink our overhead in that area, not make it bigger. Admittedly, most of
the fat is probably in the tuple header rather than the page header, but
at any rate I don't consider burning up 1% of our available storage space
to be a negligible overhead.

I'm not sure I believe it should need to be MAXALIGN'd, since it is
followed by item pointers which IIRC only need 2-byte alignment, but then
again Heikki also recently proposed adding 4 bytes per page to allow each
page to track its XID generation, to help mitigate the need for
anti-wraparound vacuuming. I think Simon's approach of stealing the
16-bit page version field is reasonably clever in this regard, although I
also understand why Tom objects to it, and I certainly agree with him
that we need to be careful not to back ourselves into a corner.

What I'm not too clear about is whether a 16-bit checksum meets the needs
of people who want checksums. If we assume that flaky hardware is going
to corrupt pages steadily over time, then it seems like it might be
adequate, because in the unlikely event that the first corrupted page
happens to still pass its checksum test, well, another will come along
and we'll probably spot the problem then, likely well before any
significant fraction of the data gets eaten. But I'm not sure whether
that's the right mental model. I, and I think some others, initially
assumed we'd want a 32-bit checksum, but I'm not sure I can justify that
beyond "well, I think that's what people usually do". It could be that
even if we add new page header space for the checksum (as opposed to
stuffing it into the page version field) we still want to add only 2
bytes. Not sure...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
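To make the layout question concrete, here is a minimal C sketch of the
scheme discussed above. The struct is modeled loosely on PostgreSQL's
PageHeaderData but is *not* the real on-disk layout, and Fletcher-16 is
only an illustrative stand-in for whatever algorithm would actually be
chosen. It shows a 16-bit checksum occupying the slot of the old
page-version field, so the header does not grow and no MAXALIGN question
arises.

```c
/*
 * Hypothetical sketch only -- not PostgreSQL's actual PageHeaderData.
 * It illustrates reusing the existing 16-bit page-version slot for a
 * checksum, so the header stays at its current size rather than growing
 * by a (possibly MAXALIGN'd) 4 bytes for a 32-bit field.
 */
#include <stdint.h>
#include <stddef.h>

#define BLCKSZ 8192             /* default PostgreSQL block size */

typedef struct SketchPageHeader
{
    uint64_t    pd_lsn;         /* LSN of last change to this page */
    uint16_t    pd_tli;         /* low bits of the timeline ID */
    uint16_t    pd_flags;       /* flag bits */
    uint16_t    pd_lower;       /* offset to start of free space */
    uint16_t    pd_upper;       /* offset to end of free space */
    uint16_t    pd_special;     /* offset to start of special space */
    uint16_t    pd_checksum;    /* 16-bit checksum occupying the slot of
                                 * the old 16-bit page-version field */
    uint32_t    pd_prune_xid;   /* oldest prunable XID, or zero */
} SketchPageHeader;

/*
 * Fletcher-16 over the whole block, skipping the two bytes that hold
 * the checksum itself.  Purely illustrative: a real design would weigh
 * CRCs and other codes on detection strength and CPU cost.
 */
static uint16_t
sketch_page_checksum(const uint8_t *page)
{
    uint32_t    sum1 = 0;
    uint32_t    sum2 = 0;
    size_t      skip = offsetof(SketchPageHeader, pd_checksum);
    size_t      i;

    for (i = 0; i < BLCKSZ; i++)
    {
        if (i == skip || i == skip + 1)
            continue;           /* don't checksum the checksum field */
        sum1 = (sum1 + page[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}
```

By contrast, adding a new 32-bit field to this sketch would grow the
header by 4 bytes, or 8 once rounded up to MAXALIGN on 64-bit platforms,
which is exactly the per-8192-byte-page overhead being weighed above.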
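On the 16-bit-versus-32-bit question, the arithmetic under an idealized
model (an assumption not established in the thread: corruption randomizes
a page so checksum outcomes are uniform and independent) runs as follows.
A single corrupted page passes a 16-bit check with probability
2^-16 = 1/65536, so the chance that the first k corrupted pages all
escape detection is (2^-16)^k, already about 2.3 x 10^-10 at k = 2. A
32-bit checksum buys that same 2^-32 false-pass rate on the very first
corrupted page, which is the quantitative content of the "steady
corruption" mental model described above.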