Re: Enabling Checksums - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Enabling Checksums
Msg-id 516868A4.6000403@vmware.com
In response to Re: Enabling Checksums  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers

On 12.04.2013 22:31, Bruce Momjian wrote:
> On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote:
>>> Only point worth discussing is that this change would make backup blocks be
>>> covered by a 16-bit checksum, not the CRC-32 it is now. i.e. the record
>>> header is covered by a CRC32 but the backup blocks only by 16-bit.
>>
>> That means we will have to do the verification for this in
>> ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
>> won't always recognize the end of WAL correctly.
>> And I am a bit wary of reducing the likelihood of noticing the proper
>> end-of-recovery by reducing the crc width.
>>
>> Why again are we doing this now? Just to reduce the overhead of CRC
>> computation for full page writes? Or are we foreseeing issues with the
>> page checksums being wrong because of non-zero data in the hole being
>> zero after the restore from bkp blocks?
>
> I thought the idea is that we were going to re-use the already-computed
> CRC checksum on the page, and we only have 16-bits of storage for that.

No, the patch has to compute the 16-bit checksum for the page when the 
full-page image is added to the WAL record. Otherwise there would be no
need to calculate the page checksum at that point; it would only be
computed later, when the page is written out from the shared buffer cache.

I think this is a bad idea. It complicates the WAL format significantly. 
Simon's patch didn't include the changes to recovery to validate the 
checksum, but I suspect it would be complicated. And it reduces the 
error-detection capability of WAL recovery. Keep in mind that page
checksums are never expected to fail, so even if we miss a few errors
it's still better than nothing. The WAL checksum is different: it is used
to detect end-of-WAL, so a failure is expected every time we do crash
recovery. Thus far, we've considered a probability of one in 2^32 small
enough for that purpose, but IMHO one in 2^16 is much too weak.
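
To put rough numbers on that, here is a minimal sketch. It assumes an
idealized checksum whose value on stale or garbage data is uniformly
random, which real checksums only approximate; the function name is
hypothetical and this is not PostgreSQL code.

```python
from fractions import Fraction


def false_accept_probability(checksum_bits: int) -> Fraction:
    """Chance that garbage past the end of WAL matches its checksum by accident.

    Assumption: the checksum of garbage data is uniformly distributed
    over all 2**checksum_bits possible values.
    """
    return Fraction(1, 2 ** checksum_bits)


p32 = false_accept_probability(32)  # CRC-32, as WAL uses today
p16 = false_accept_probability(16)  # the proposed 16-bit checksum

# How many times more likely the narrower checksum is to accept garbage.
weakening_factor = p16 / p32

for bits, p in ((32, p32), (16, p16)):
    print(f"{bits}-bit: one false accept in {p.denominator} checks")
print(f"16-bit is {weakening_factor}x more likely to miss the end of WAL")
```

Since every crash recovery runs into invalid data at the end of WAL, this
is a check that actually gets exercised, unlike a page checksum failure.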

If you want to speed up the CRC calculation of full-page images, you 
could have an optimized version of the WAL CRC algorithm, using e.g. 
SIMD instructions. Because typical WAL records are small, max 100-200
bytes, and consist of several even smaller chunks, the normal WAL
CRC calculation is quite resistant to common optimization techniques.
But it might work for the full-page images. Let's not conflate it with 
the page checksums, though.
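
As a rough illustration of the difference in shape between the two cases,
here is a sketch using zlib's CRC-32 as a stand-in. That is an assumption
for illustration only, not the actual WAL CRC implementation, and the
chunk contents and the 8192-byte page size are made up for the example.

```python
import zlib


def crc_over_chunks(chunks) -> int:
    """Accumulate one CRC across several separate buffers, chunk by chunk."""
    crc = 0
    for chunk in chunks:
        # zlib.crc32 accepts a running value, so the CRC can be continued
        # across buffers without concatenating them first.
        crc = zlib.crc32(chunk, crc)
    return crc


# A typical small record: a handful of tiny pieces (hypothetical contents).
record_chunks = [b"h" * 24, b"t" * 40, b"d" * 60]

# A full-page image: one large contiguous buffer (8192 bytes assumed).
page_image = bytes(8192)

record_crc = crc_over_chunks(record_chunks)
page_crc = zlib.crc32(page_image)

# Chunked accumulation gives the same result as hashing the joined bytes.
assert record_crc == zlib.crc32(b"".join(record_chunks))
```

With the small record, each call processes only a few dozen bytes, so
there is little for a bulk-optimized routine to work with. The page image
is a single long run of bytes, which is where such a routine could pay off.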

- Heikki


