Re: Enabling Checksums - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Enabling Checksums
Date
Msg-id 1362506575.26602.89.camel@jdavis
Whole thread Raw
In response to Re: Enabling Checksums  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Thank you for the review.

On Tue, 2013-03-05 at 11:35 +0200, Heikki Linnakangas wrote:
> If you enable checksums, the free space map never gets updated in a 
> standby. It will slowly drift to be completely out of sync with reality, 
> which could lead to significant slowdown and bloat after failover.

Will investigate.

> Since the checksums are an all-or-nothing cluster-wide setting, the 
> three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and 
> PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the 
> code simpler, and leaves the bits free for future use. If we want to 
> enable such per-page setting in the future, we can add it later. For a 
> per-relation scheme, they're not needed.

They don't really need to be there, I just put them there because it
seemed wise if we ever want to allow online enabling/disabling of
checksums. But I will remove them.

> How does the error detection rate of this compare with e.g CRC-16? Is 
> there any ill effect from truncating the Fletcher sums like this?

I don't recall if I published these results or not, but I loaded a
table, and used pageinspect to get the checksums of the pages. I then
did some various GROUP BY queries to see if I could find any clustering
or stepping of the checksum values, and I could not. The distribution
seemed very uniform across the 255^2 space.

I tried to think of other problems, like missing errors in the high or
low bits of a word or a page (similar to the issue with mod 256
described below), but I couldn't find any. I'm not enough of an expert
to say more than that about the error detection rate.

Fletcher is probably significantly faster than CRC-16, because I'm just
doing int32 addition in a tight loop.

Simon originally chose Fletcher, so perhaps he has more to say.

> That's a bit odd. We don't avoid zero in the WAL crc, and I don't recall 
> seeing that in other checksum implementations either. 16-bits is not 
> very wide for a checksum, and this eats about 1% of the space of valid 
> values.
> 
> I can see that it might be a handy debugging aid to avoid 0. But there's 
> probably no need to avoid 0 in both bytes, it seems enough to avoid a 
> completely zero return value.

http://en.wikipedia.org/wiki/Fletcher%27s_checksum

If you look at the section on Fletcher-16, it discusses the choice of
the modulus. If we used 256, then an error anywhere except the lowest
byte of a 4-byte word read from the page would be missed.

Considering that I was using only 255 values anyway, I thought I might
as well shift the values away from zero.

We could get slightly better by using all combinations. I also
considered chopping the 64-bit ints into 16-bit chunks and XORing them
together. But when I saw the fact that we avoided zero with the other
approach, I kind of liked it, and kept it.

> XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN 
> without a lock. That's not atomic, so it could incorrectly determine 
> that a page doesn't need to be backed up. We used to always hold an 
> exclusive lock on the buffer when it's called, which prevents 
> modifications to the LSN, but that's no longer the case.

Will investigate, but it sounds like a buffer header lock will fix it.

> Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I 
> think it will generate WAL records for unlogged tables as it is.

Yes, thank you.

Also, in FlushBuffer(), this patch moves the clearing of the
BM_JUST_DIRTIED bit to before the WAL flush. That seems to expand the
window during which a change to a page will prevent it from being marked
clean. Do you see any performance problem with that?

The alternative is to take the buffer header lock twice: once to get the
LSN, then WAL flush, then another header lock to clear BM_JUST_DIRTIED.
Not sure if that's better or worse. This goes back to Simon's patch, so
he may have a comment here, as well.

I'll post a new patch with these comments addressed, probably tomorrow
so that I have some time to self-review and do some basic testing.

Regards,Jeff Davis




pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: sql_drop Event Triggerg
Next
From: Tom Lane
Date:
Subject: Re: [GENERAL] Floating point error