Re: Enabling Checksums - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: Enabling Checksums |
Date | |
Msg-id | 1362506575.26602.89.camel@jdavis Whole thread Raw |
In response to | Re: Enabling Checksums (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: Enabling Checksums
|
List | pgsql-hackers |
Thank you for the review. On Tue, 2013-03-05 at 11:35 +0200, Heikki Linnakangas wrote: > If you enable checksums, the free space map never gets updated in a > standby. It will slowly drift to be completely out of sync with reality, > which could lead to significant slowdown and bloat after failover. Will investigate. > Since the checksums are an all-or-nothing cluster-wide setting, the > three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and > PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the > code simpler, and leaves the bits free for future use. If we want to > enable such per-page setting in the future, we can add it later. For a > per-relation scheme, they're not needed. They don't really need to be there, I just put them there because it seemed wise if we ever want to allow online enabling/disabling of checksums. But I will remove them. > How does the error detection rate of this compare with e.g CRC-16? Is > there any ill effect from truncating the Fletcher sums like this? I don't recall if I published these results or not, but I loaded a table, and used pageinspect to get the checksums of the pages. I then did some various GROUP BY queries to see if I could find any clustering or stepping of the checksum values, and I could not. The distribution seemed very uniform across the 255^2 space. I tried to think of other problems, like missing errors in the high or low bits of a word or a page (similar to the issue with mod 256 described below), but I couldn't find any. I'm not enough of an expert to say more than that about the error detection rate. Fletcher is probably significantly faster than CRC-16, because I'm just doing int32 addition in a tight loop. Simon originally chose Fletcher, so perhaps he has more to say. > That's a bit odd. We don't avoid zero in the WAL crc, and I don't recall > seeing that in other checksum implementations either. 16-bits is not > very wide for a checksum, and this eats about 1% of the space of valid > values. > > I can see that it might be a handy debugging aid to avoid 0. But there's > probably no need to avoid 0 in both bytes, it seems enough to avoid a > completely zero return value. http://en.wikipedia.org/wiki/Fletcher%27s_checksum If you look at the section on Fletcher-16, it discusses the choice of the modulus. If we used 256, then an error anywhere except the lowest byte of a 4-byte word read from the page would be missed. Considering that I was using only 255 values anyway, I thought I might as well shift the values away from zero. We could get slightly better by using all combinations. I also considered chopping the 64-bit ints into 16-bit chunks and XORing them together. But when I saw the fact that we avoided zero with the other approach, I kind of liked it, and kept it. > XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN > without a lock. That's not atomic, so it could incorrectly determine > that a page doesn't need to be backed up. We used to always hold an > exclusive lock on the buffer when it's called, which prevents > modifications to the LSN, but that's no longer the case. Will investigate, but it sounds like a buffer header lock will fix it. > Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I > think it will generate WAL records for unlogged tables as it is. Yes, thank you. Also, in FlushBuffer(), this patch moves the clearing of the BM_JUST_DIRTIED bit to before the WAL flush. That seems to expand the window during which a change to a page will prevent it from being marked clean. Do you see any performance problem with that? The alternative is to take the buffer header lock twice: once to get the LSN, then WAL flush, then another header lock to clear BM_JUST_DIRTIED. Not sure if that's better or worse. This goes back to Simon's patch, so he may have a comment here, as well. I'll post a new patch with these comments addressed, probably tomorrow so that I have some time to self-review and do some basic testing. Regards,Jeff Davis
pgsql-hackers by date: