Re: Enabling Checksums - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Enabling Checksums
Date
Msg-id 6363139D-2757-4A78-993D-66F39F478D80@phlo.org
Whole thread Raw
In response to Re: Enabling Checksums  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Enabling Checksums  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Nov10, 2012, at 00:08 , Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2012-11-09 at 20:48 +0100, Markus Wanner wrote:
>> Given your description of option 2 I was under the impression that each
>> page already has a bit indicating whether or not the page is protected
>> by a checksum. Why do you need more bits than that?
> 
> The bit indicating that a checksum is present may be lost due to
> corruption.

Though that concern mostly goes away if instead of a separate bit we use a
special checksum value, say 0xDEAD, to indicate that the page isn't
checksummed, no?

If checksums were always enabled, the probability of a random corruption
going undetected is N/N^2 = 1/N where N is the number of distinct checksum
values, since out of the N^2 equally likely pairs of computed and stored
checksums values, N show two identical values.

With the 0xDEAD-scheme, the probability of a random corruption going
undetected is (N-1 + N)/N^2 = 2/N - 1/N^2, since there are (N-1) pairs
with identical values != 0xDEAD, and N pairs where the stored checksum
value is 0xDEAD.

So instead of a 1 in 65536 chance of a corruption going undetected, the
0xDEAD-schema gives (approximately) a chance of 1 in 32768, i.e the
strength of the checksum is reduced by one bit. That's still acceptable,
I'd say.

In practice, 0xDEAD may be a bad choice because of it's widespread use
as an uninitialized marker for blocks of memory. A randomly picked value
would probably be a better choice.

best regards,
Florian Pflug





pgsql-hackers by date:

Previous
From: Chen Huajun
Date:
Subject: Re: Fix errcontext() function
Next
From: Noah Misch
Date:
Subject: Re: Incorrect behaviour when using a GiST index on points