Re: Cost of XLogInsert CRC calculations - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Cost of XLogInsert CRC calculations
Msg-id 878y2dccno.fsf@stark.xeocode.com
In response to Re: Cost of XLogInsert CRC calculations  ("Dann Corbit" <DCorbit@connx.com>)
List pgsql-hackers
"Dann Corbit" <DCorbit@connx.com> writes:

> Probably you already knew that, and probably the birthday paradox does
> not apply.
> 
> I generally use 64 bit CRCs (UMAC) for just about anything that needs a
> CRC.
> http://www.cs.ucdavis.edu/~rogaway/umac/

The birthday paradox doesn't come up here. The CRC has to match the actual
data for that specific xlog; it is not enough for it to collide with the CRC
of some other xlog from a large list.
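To make the distinction concrete, here is a rough sketch comparing the two
situations. It assumes an idealized check value that is uniformly distributed
over n bits (real CRCs only approximate this), and the function names are made
up for illustration.

```python
import math


def targeted_match_probability(bits: int) -> float:
    """Chance that one corrupted record still matches its own n-bit check value."""
    return 2.0 ** -bits


def birthday_collision_probability(bits: int, records: int) -> float:
    """Approximate chance that any two of `records` random n-bit values collide.

    Uses the standard approximation 1 - exp(-k(k-1) / 2^(n+1)).
    """
    pairs = records * (records - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)


if __name__ == "__main__":
    for records in (1_000, 100_000):
        print(
            f"{records} records, 32 bits: "
            f"targeted={targeted_match_probability(32):.3e} "
            f"birthday={birthday_collision_probability(32, records):.3e}"
        )
```

The targeted probability does not depend on how many records exist, which is
the case that applies when a single damaged xlog is checked against its own
CRC.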

So if an xlog is corrupted or truncated then the chance that a 64-bit CRC
would still match and the xlog be mishandled is about one in 18 quintillion
(2^64). The chance that a 32-bit CRC would match the invalid data is about one
in 4 billion. The chance that a 16-bit CRC would match is about one in 65
thousand.
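As a quick check on the orders of magnitude, the odds for each width are just
2^n. A minimal sketch, again assuming the check value behaves like a uniformly
random n-bit number for corrupted data (the helper name is hypothetical):

```python
def false_match_odds(bits: int) -> int:
    """Return N such that a random corruption passes an n-bit check 1 time in N."""
    return 2 ** bits


if __name__ == "__main__":
    for bits in (16, 32, 64):
        print(f"{bits}-bit: one in {false_match_odds(bits):,}")
```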

I mention 16 bits because you use a system every day that relies on a 16-bit
checksum and you trust thousands of data blocks each day to this protection
(actually probably thousands each second). I refer to TCP/IP. Every TCP
segment is protected by just a 16-bit checksum (strictly a ones'-complement
sum rather than a CRC, but the width is what matters here).

Have you ever seen a corrupted TCP stream caused by the use of such a short
checksum? Actually I have. A router with a bad memory card caused about 1%
packet loss due to corrupted segments. That was low enough not to be noticed,
but in a large FTP transfer it meant about one corrupted packet got through
every 2.4GB of data or so.

Now consider the data being protected by the xlog CRC. If 1% of all disk
writes were being corrupted, would one incorrect xlog being read in and
mishandled about once every few gigabytes of logs really be the worst of your
worries?

More realistically, suppose you were losing power frequently and getting
truncated xlog writes about once every 5 minutes (if you could get it to boot
that fast). With a one in 65 thousand chance per truncated write, would one
incorrectly handled truncated log every 228 days or so be considered
unacceptable? That would be the consequence of 16-bit checksums.

If you ran the same experiment with 32-bit checksums it would mean the
database would fail to replay correctly about once every forty thousand
years.
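The arithmetic behind this kind of estimate can be sketched as follows. It
assumes each truncated write independently passes an n-bit check with
probability exactly 2^-n, so the mean gap between undetected failures is the
failure interval multiplied by 2^n. The function names and the 365.25-day year
are choices made for this illustration.

```python
MINUTES_PER_DAY = 24 * 60
DAYS_PER_YEAR = 365.25


def mean_minutes_between_misses(bits: int, minutes_between_failures: float) -> float:
    """Expected minutes between corrupted records that slip past an n-bit check."""
    return minutes_between_failures * 2 ** bits


def mean_days_between_misses(bits: int, minutes_between_failures: float) -> float:
    """Same expectation, expressed in days."""
    return mean_minutes_between_misses(bits, minutes_between_failures) / MINUTES_PER_DAY


def mean_years_between_misses(bits: int, minutes_between_failures: float) -> float:
    """Same expectation, expressed in years."""
    return mean_days_between_misses(bits, minutes_between_failures) / DAYS_PER_YEAR


if __name__ == "__main__":
    # One truncated write every 5 minutes, as in the thought experiment.
    print(f"16-bit: {mean_days_between_misses(16, 5):.1f} days")
    print(f"32-bit: {mean_years_between_misses(32, 5):.0f} years")
```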

-- 
greg


