Thread: RE: CRC was: Re: beta testing version

RE: CRC was: Re: beta testing version

From
"Mikheev, Vadim"
Date:
> > This may be implemented very fast (if someone points me where
> > I can find CRC func). And I could implement "physical log"
> > till next monday.
> 
> I have been experimenting with CRCs for the past 6 months in 
> our database for internal logging purposes. Downloaded a lot of
> hash libraries, tried different algorithms, and implemented a few
> myself. Which algorithm do you want? Have a look at the openssl
> libraries (www.openssl.org) for a start - if you don't find what
> you want let me know.

Thanks.

> As the logging might include large data blocks, especially 
> now that we can TOAST our data, 

TOAST breaks data into a few 2K (or so) tuples to be inserted
separately. But the first btree split after a checkpoint will require
logging of a 2x8K record -:(

> I would strongly suggest to use strong hashes like RIPEMD or
> MD5 instead of CRC-32 and the like. Sure, it takes more time 
> to calculate and more space on the hard disk, but then: a database
> without data integrity (and means of _proving_ integrity) is
> pretty worthless.

Other opinions? Also, we shouldn't forget licence issues.

Vadim


Re: CRC was: Re: beta testing version

From
Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>> I would strongly suggest to use strong hashes like RIPEMD or
>> MD5 instead of CRC-32 and the like.

> Other opinions? Also, we shouldn't forget licence issues.

I agree with whoever commented that crypto hashes are silly for this
application.  A 64-bit CRC *might* be enough stronger than a 32-bit
CRC to be worth the extra calculation, but frankly I doubt that too.

Remember that we are already sitting atop hardware that's really pretty
reliable, despite the carping that's been going on in this thread.  All
that we have to do is detect the infrequent case where a block of data
didn't get written due to system failure.  It's wildly pessimistic to
think that we might get called on to do so as much as once a day (if
you are trying to run a reliable database, and are suffering power
failures once a day, and haven't bought a UPS, you're a lost cause).
A 32-bit CRC will fail to detect such an error with a probability of
about 1 in 2^32.  So, a 32-bit CRC will have an MTBF of 2^32 days, or
11 million years, on the wildly pessimistic side --- real installations
probably 100 times better.  That's plenty for me, and improving the odds
to 2^64 or 2^128 is not worth any slowdown IMHO.
        regards, tom lane


Re: CRC was: Re: beta testing version

From
ncm@zembu.com (Nathan Myers)
Date:
On Thu, Dec 07, 2000 at 04:35:00PM -0500, Tom Lane wrote:
> Remember that we are already sitting atop hardware that's really
> pretty reliable, despite the carping that's been going on in this
> thread. All that we have to do is detect the infrequent case where a
> block of data didn't get written due to system failure. It's wildly
> pessimistic to think that we might get called on to do so as much as
> once a day (if you are trying to run a reliable database, and are
> suffering power failures once a day, and haven't bought a UPS, you're
> a lost cause). A 32-bit CRC will fail to detect such an error with a
> probability of about 1 in 2^32. So, a 32-bit CRC will have an MTBF of
> 2^32 days, or 11 million years, on the wildly pessimistic side ---
> real installations probably 100 times better. That's plenty for me,
> and improving the odds to 2^64 or 2^128 is not worth any slowdown
> IMHO.

1. Computing a CRC-64 takes only about twice as long as a CRC-32, for
   2^32 times the confidence.  That's pretty cheap confidence.

2. I disagree with the way the above statistics were computed.  That
   eleven-million-year figure gets whittled down pretty quickly when you
   factor in all the sources of corruption, even without crashes.
   (Power failures are only one of many sources of corruption.)  They
   grow with the size and activity of the database.  Databases are
   getting very large and busy indeed.

3. Many users clearly hope to be able to pull the plug on their hardware
   and get back up confidently.  While we can't promise they won't have
   to go to their backups, we should at least be equipped to promise,
   with confidence, that they will know whether they need to.

4. For a way to mark the "current final" log entry, you want a lot more
   confidence, because you read a lot more of them, and reading beyond
   the end may cause you to corrupt a currently-valid database, which
   seems a lot worse than just using a corrupted database.

Still, I agree that a 32-bit CRC is better than none at all.

Nathan Myers
ncm@zembu.com


Re: CRC was: Re: beta testing version

From
Tom Lane
Date:
ncm@zembu.com (Nathan Myers) writes:
> 2. I disagree with the way the above statistics were computed.  That eleven 
>    million-year figure gets whittled down pretty quickly when you 
>    factor in all the sources of corruption, even without crashes.  
>    (Power failures are only one of many sources of corruption.)  They 
>    grow with the size and activity of the database.  Databases are 
>    getting very large and busy indeed.

Sure, but the argument still holds.  If the net MTBF of your underlying
system is less than a day, it's too unreliable to run a database that
you want to trust.  Doesn't matter what the contributing failure
mechanisms are.  In practice, I'd demand an MTBF of a lot more than a
day before I'd accept a hardware system as satisfactory...

> 3. Many users clearly hope to be able to pull the plug on their hardware 
>    and get back up confidently.  While we can't promise they won't have 
>    to go to their backups, we should at least be equipped to promise,
>    with confidence, that they will know whether they need to.

And the difference in odds between 2^32 and 2^64 matters here?  I made
a numerical case that it doesn't, and you haven't refuted it.  By your
logic, we might as well say that we should be using a 128-bit CRC, or
256-bit, or heck, a few kilobytes.  It only takes a little longer to go
up each step, right, so where should you stop?  I say MTBF measured in
megayears ought to be plenty.  Show me the numerical argument that 64
bits is the right place on the curve.

> 4. For a way to mark the "current final" log entry, you want a lot more
>    confidence, because you read a lot more of them,

You only need to make the distinction during a restart, so I don't
think that argument is correct.
        regards, tom lane


RFC: CRC datatype

From
"Horst Herb"
Date:
I believe that there are many good points to have CRC facilities "built
in", and I failed to detect any arguments against it. In my domain (the
medical domain) we simply can't use data without "proof" of integrity
("proof" as in highest possible level of confidence within reasonable
effort).

Therefore, I propose defining new data types like "CRC32", "CRC64",
"RIPEMD", whatever (rather than pluggable arbitrary CRCs).

Similarly to the SERIAL data type, the CRC data type would automatically
generate a trigger function that calculates a CRC across the tuple
(omitting the CRC property of course, and maybe the OID as well) before each
update and stores it in itself.

Is there anything wrong with this idea?

Can somebody help me by pointing me in the right direction to implement
it? (The person who implemented the SERIAL data type, maybe?)

Regards,
Horst



RE: RFC: CRC datatype

From
"Christopher Kings-Lynne"
Date:
> Therefore, I propose defining new data types like "CRC32", "CRC64",
> "RIPEMD", whatever (rather than pluggable arbitrary CRCs).

I suspect that you are really looking at the problem from the wrong end.
CRC checking should not need to be done by the database user, with a fancy
type.  The postgres server itself should guarantee data integrity - you
shouldn't have to worry about it in userland.

This is, in fact, what the recent discussion on this list has been
proposing...

Chris



Re: RFC: CRC datatype

From
"Horst Herb"
Date:
> I suspect that you are really looking at the problem from the wrong end.
> CRC checking should not need to be done by the database user, with a fancy
> type.  The postgres server itself should guarantee data integrity - you
> shouldn't have to worry about it in userland.

I agree in principle. However, performance sometimes is more important than
integrity. Think of a data logger of uncritical data, or an online forum. There
are plenty of occasions where you don't have to worry about a single bit on or
off, but a lot to worry about performance. Look at all those people using M$
Access or MySQL who don't give a damn about data integrity. As opposed to
them, there will always be other "typical" database applications where 100%
integrity is paramount. Then it is nice to have a choice of CRCs, where the
database designer can choose according to his/her specific
performance/integrity balanced needs. This is why I would prefer the
"datatype" solution.

> This is, in fact, what the recent discussion on this list has been
> proposing...

AFAIK the thread for "built in" CRCs referred only to CRCs in the
transaction log. This here is a different thing. CRCs in the transaction log
are crucial to prove integrity of the log; CRCs as a datatype are necessary
to prove integrity of database entries at row level. Always remember that a
postgres database on the hard disk can be manipulated accidentally /
maliciously without postgres even running. These are the cases where you
need row-level CRCs.

Horst



Re: RFC: CRC datatype

From
Tom Lane
Date:
"Horst Herb" <hherb@malleenet.net.au> writes:
> AFAIK the thread for "built in" CRCs referred only to CRCs in the
> transaction log. This here is a different thing. CRCs in the transaction log
> are crucial to prove integrity of the log; CRCs as a datatype are necessary
> to prove integrity of database entries at row level.

I think a row-level CRC is rather pointless.  Perhaps it'd be a good
idea to have a disk-page-level CRC, though.  That would check the rows
on the page *and* allow catching errors in the page and tuple overhead
structures, which row-level CRCs would not cover.

I suspect TOAST breaks your notion of computing a CRC at trigger time
anyway --- some of the fields may be toasted already, some not.

If you're sufficiently paranoid that you insist you need a row-level
CRC, it seems to me that you'd want to generate it and later check it
in your application, not in the database.  That's the only way you get
end-to-end coverage.  Surely you don't trust your TCP connection to the
server, either?
        regards, tom lane


Re: RFC: CRC datatype

From
"Horst Herb"
Date:
> I think a row-level CRC is rather pointless.  Perhaps it'd be a good
> idea to have a disk-page-level CRC, though.  That would check the rows
> on the page *and* allow catching errors in the page and tuple overhead
> structures, which row-level CRCs would not cover.

Row level is necessary to be able to check integrity at application level.

> I suspect TOAST breaks your notion of computing a CRC at trigger time
> anyway --- some of the fields may be toasted already, some not.

The workaround is a logging table where you store the CRCs as well. Later on,
an "integrity daemon" can compare whether they match or not.

> If you're sufficiently paranoid that you insist you need a row-level
> CRC, it seems to me that you'd want to generate it and later check it
> in your application, not in the database.  That's the only way you get

Oh, sure, that is the way we do it now. And no, nothing to do with paranoia.
We were previously badly burnt by the assumption that a decent SQL server is
a "guarantee" of data integrity. Shit simply happens.

> end-to-end coverage.  Surely you don't trust your TCP connection to the
> server, either?

TCP _IS_ heavily checksummed. But yes, we _do_ calculate checksums at the
client, recalculate at the server, and compare after the transaction is
completed. As we have only a few writes between heavy read access, the
performance penalty of doing this (for our purposes) is minimal.

Horst



Re: CRC was: Re: beta testing version

From
Bruce Guenter
Date:
On Thu, Dec 07, 2000 at 04:01:23PM -0800, Nathan Myers wrote:
> 1. Computing a CRC-64 takes only about twice as long as a CRC-32, for
>    2^32 times the confidence.  That's pretty cheap confidence.

Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
Celeron and on a PIII was MD5.  The 64-bit fingerprint was only a hair
slower, the CRC was (quite surprisingly) about 40% slower, and the
implementation of SHA that I had available was a real dog.  Taking an
arbitrary 32 bits of a MD5 would likely be less collision prone than
using a 32-bit CRC, and it appears faster as well.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: CRC was: Re: beta testing version

From
Ian Lance Taylor
Date:
Date: Fri, 8 Dec 2000 12:19:39 -0600
From: Bruce Guenter <bruceg@em.ca>

   Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
   the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
   Celeron and on a PIII was MD5.  The 64-bit fingerprint was only a hair
   slower, the CRC was (quite surprisingly) about 40% slower, and the
   implementation of SHA that I had available was a real dog.  Taking an
   arbitrary 32 bits of a MD5 would likely be less collision prone than
   using a 32-bit CRC, and it appears faster as well.

I just want to confirm that you used something like the fast 32-bit
CRC algorithm, appended.  The one posted earlier was accurate but
slow.

Ian

/*
 * Copyright (C) 1986 Gary S. Brown.  You may use this program, or
 * code or tables extracted from it, as desired without restriction.
 */

/* Modified slightly by Ian Lance Taylor, ian@airs.com, for use with  Taylor UUCP.  */

#include "uucp.h"
#include "prot.h"

/* First, the polynomial itself and its table of feedback terms.  The  */
/* polynomial is                                                       */
/* X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0 */
/* Note that we take it "backwards" and put the highest-order term in  */
/* the lowest-order bit.  The X^32 term is "implied"; the LSB is the   */
/* X^31 term, etc.  The X^0 term (usually shown as "+1") results in    */
/* the MSB being 1.                                                    */

/* Note that the usual hardware shift register implementation, which   */
/* is what we're using (we're merely optimizing it by doing eight-bit  */
/* chunks at a time) shifts bits into the lowest-order term.  In our   */
/* implementation, that means shifting towards the right.  Why do we   */
/* do it this way?  Because the calculated CRC must be transmitted in  */
/* order from highest-order term to lowest-order term.  UARTs transmit */
/* characters in order from LSB to MSB.  By storing the CRC this way,  */
/* we hand it to the UART in the order low-byte to high-byte; the UART */
/* sends each low-bit to hight-bit; and the result is transmission bit */
/* by bit from highest- to lowest-order term without requiring any bit */
/* shuffling on our part.  Reception works similarly.                  */

/* The feedback terms table consists of 256, 32-bit entries.  Notes:   */
/*                                                                     */
/*     The table can be generated at runtime if desired; code to do so */
/*     is shown later.  It might not be obvious, but the feedback      */
/*     terms simply represent the results of eight shift/xor opera-    */
/*     tions for all combinations of data and CRC register values.     */
/*     [this code is no longer present--ian]                   */
/*                                                                     */
/*     The values must be right-shifted by eight bits by the "updcrc"  */
/*     logic; the shift must be unsigned (bring in zeroes).  On some   */
/*     hardware you could probably optimize the shift in assembler by  */
/*     using byte-swap instructions.                                   */

static const unsigned long aicrc32tab[] = { /* CRC polynomial 0xedb88320 */
0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L, 0x706af48fL, 0xe963a535L, 0x9e6495a3L,
0x0edb8832L, 0x79dcb8a4L, 0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L, 0x90bf1d91L,
0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL, 0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L,
0x136c9856L, 0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L, 0xfa0f3d63L, 0x8d080df5L,
0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L, 0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL,
0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L, 0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L,
0x26d930acL, 0x51de003aL, 0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L, 0xb8bda50fL,
0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L, 0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL,
0x76dc4190L, 0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL, 0x9fbfe4a5L, 0xe8b8d433L,
0x7807c9a2L, 0x0f00f934L, 0x9609a88eL, 0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L,
0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL, 0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L,
0x65b0d9c6L, 0x12b7e950L, 0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L, 0xfbd44c65L,
0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L, 0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL,
0x4369e96aL, 0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L, 0xaa0a4c5fL, 0xdd0d7cc9L,
0x5005713cL, 0x270241aaL, 0xbe0b1010L, 0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL,
0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L, 0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL,
0xedb88320L, 0x9abfb3b6L, 0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L, 0x73dc1683L,
0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L, 0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L,
0xf00f9344L, 0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL, 0x196c3671L, 0x6e6b06e7L,
0xfed41b76L, 0x89d32be0L, 0x10da7a5aL, 0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L,
0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L, 0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL,
0xd80d2bdaL, 0xaf0a1b4cL, 0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL, 0x4669be79L,
0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L, 0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL,
0xc5ba3bbeL, 0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L, 0x2cd99e8bL, 0x5bdeae1dL,
0x9b64c2b0L, 0xec63f226L, 0x756aa39cL, 0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L,
0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL, 0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L,
0x86d3d2d4L, 0xf1d4e242L, 0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L, 0x18b74777L,
0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL, 0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L,
0xa00ae278L, 0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L, 0x4969474dL, 0x3e6e77dbL,
0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L, 0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L,
0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L, 0xcdd70693L, 0x54de5729L, 0x23d967bfL,
0xb3667a2eL, 0xc4614ab8L, 0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL, 0x2d02ef8dL
};

/*
 * IUPDC32 macro derived from article Copyright (C) 1986 Stephen Satchell.
 *  NOTE: First argument must be in range 0 to 255.
 *        Second argument is referenced twice.
 *
 * Programmers may incorporate any or all code into their programs,
 * giving proper credit within the source.  Publication of the
 * source routines is permitted so long as proper credit is given
 * to Stephen Satchell, Satchell Evaluations and Chuck Forsberg,
 * Omen Technology.
 */

#define IUPDC32(b, ick) \
  (aicrc32tab[((int) (ick) ^ (b)) & 0xff] ^ (((ick) >> 8) & 0x00ffffffL))

unsigned long
icrc (z, c, ick)
     const char *z;
     size_t c;
     unsigned long ick;
{
  while (c > 4)
    {
      ick = IUPDC32 (*z++, ick);
      ick = IUPDC32 (*z++, ick);
      ick = IUPDC32 (*z++, ick);
      ick = IUPDC32 (*z++, ick);
      c -= 4;
    }
  while (c-- != 0)
    ick = IUPDC32 (*z++, ick);
  return ick;
}


Re: RFC: CRC datatype

From
Tom Lane
Date:
"Horst Herb" <hherb@malleenet.net.au> writes:
>> Surely you don't trust your TCP connection to the
>> server, either?

> TCP _IS_ heavily checksummed.

Yes, and so are the disk drives that you are asserting you don't trust.

My point is that in both cases, there are lots and lots of failure
mechanisms that won't be caught by the transport or storage CRC.
The same applies to anything other than an end-to-end check.
        regards, tom lane


Re: CRC was: Re: beta testing version

From
Bruce Guenter
Date:
On Fri, Dec 08, 2000 at 10:36:39AM -0800, Ian Lance Taylor wrote:
>    Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
>    the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
>    Celeron and on a PIII was MD5.  The 64-bit fingerprint was only a hair
>    slower, the CRC was (quite surprisingly) about 40% slower, and the
>    implementation of SHA that I had available was a real dog.  Taking an
>    arbitrary 32 bits of a MD5 would likely be less collision prone than
>    using a 32-bit CRC, and it appears faster as well.
>
> I just want to confirm that you used something like the fast 32-bit
> CRC algorithm, appended.  The one posted earlier was accurate but
> slow.

Yes.  I just rebuilt the framework using this exact code, and it
performed identically to the previous CRC code (which didn't have an
unrolled inner loop).  These were compiled with -O6 with egcs 1.1.2.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: CRC was: Re: beta testing version

From
Bruce Guenter
Date:
On Fri, Dec 08, 2000 at 01:58:12PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> > ... Taking an
> > arbitrary 32 bits of a MD5 would likely be less collision prone than
> > using a 32-bit CRC, and it appears faster as well.
>
> ... but that would be an algorithm that you know NOTHING about the
> properties of.  What is your basis for asserting it's better than CRC?

MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
impossible to produce a collision using any other method than brute
force attempts.  In other words, any stream of input to the hash that is
longer than the hash length (16 bytes for MD5) is equally probable to
produce a given hash code.

> CRC is pretty well studied and its error-detection behavior is known
> (and good).  MD5 has been studied less thoroughly AFAIK, and in any
> case what's known about its behavior is that the *entire* MD5 output
> provides a good signature for a datastream.  If you pick some ad-hoc
> method like taking a randomly chosen subset of MD5's output bits,
> you really don't know anything at all about what the error-detection
> properties of the method are.

Actually, in my reading regarding the properties of MD5, I read an
article that stated that if a smaller number of bits was desired, one
could either (and here's where my memory fails me) just select the
middle N bits from the resulting hash, or fold the hash using XOR until
the desired number of bits was reached.  I'll see if I can find a
reference...

RFC2289 (http://www.ietf.org/rfc/rfc2289.txt) includes an algorithm for
folding MD5 digests down to 64 bits by XORing the top half with the
bottom half.  See appendix A.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: CRC

From
ncm@zembu.com (Nathan Myers)
Date:
On Fri, Dec 08, 2000 at 12:19:39PM -0600, Bruce Guenter wrote:
> On Thu, Dec 07, 2000 at 04:01:23PM -0800, Nathan Myers wrote:
> > 1. Computing a CRC-64 takes only about twice as long as a CRC-32, for
> >    2^32 times the confidence.  That's pretty cheap confidence.
> 
> Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
> the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
> Celeron and on a PIII was MD5.  The 64-bit fingerprint was only a hair
> slower, the CRC was (quite surprisingly) about 40% slower, and the
> implementation of SHA that I had available was a real dog.  Taking an
> arbitrary 32 bits of a MD5 would likely be less collision prone than
> using a 32-bit CRC, and it appears faster as well.

This is very interesting.  MD4 is faster than MD5.  (MD5, described as 
"MD4 with suspenders on", does some extra stuff to protect against more-
obscure attacks, of no interest to us.)  Which 64-bit CRC code did you 
use, Mark Mitchell's?  Are you really saying MD5 was faster than CRC-32?

I don't know of any reason to think that 32 bits of an MD5 would be 
better distributed than a CRC-32, or that having computed the 64 bits
there would be any point in throwing away half.

Current evidence suggests that MD4 would be a good choice for a hash
algorithm.

Nathan Myers
ncm@zembu.com



Re: CRC was: Re: beta testing version

From
Tom Lane
Date:
Bruce Guenter <bruceg@em.ca> writes:
> ... Taking an
> arbitrary 32 bits of a MD5 would likely be less collision prone than
> using a 32-bit CRC, and it appears faster as well.

... but that would be an algorithm that you know NOTHING about the
properties of.  What is your basis for asserting it's better than CRC?

CRC is pretty well studied and its error-detection behavior is known
(and good).  MD5 has been studied less thoroughly AFAIK, and in any
case what's known about its behavior is that the *entire* MD5 output
provides a good signature for a datastream.  If you pick some ad-hoc
method like taking a randomly chosen subset of MD5's output bits,
you really don't know anything at all about what the error-detection
properties of the method are.

I am reminded of Knuth's famous advice about random number generators:
"Random numbers should not be generated with a method chosen at random.
Some theory should be used."  Error-detection codes, like random-number
generators, have decades of theory behind them.  Seat-of-the-pants
tinkering, even if it starts with a known-good method, is not likely to
produce an improvement.
        regards, tom lane


Re: CRC was: Re: beta testing version

From
Tom Lane
Date:
Bruce Guenter <bruceg@em.ca> writes:
> MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
> impossible to produce a collision using any other method than brute
> force attempts.

True but irrelevant.  What we need to worry about is the probability
that a random error will be detected, not the computational effort that
a malicious attacker would need in order to insert an undetectable
error.

MD5 is designed for a purpose that really doesn't have much to do with
error detection, when you think about it.  It says "you will have a hard
time computing a different string that produces the same hash as some
prespecified string".  This is not the same as promising
better-than-random odds against a damaged copy of some string having the
same hash as the original.  CRC, on the other hand, is specifically
designed for error detection, and for localized errors (such as a
corrupted byte or two) it does a provably better-than-random job.
For nonlocalized errors you don't get a guarantee, but you do get
same-as-random odds of detection (ie, 1 in 2^N for an N-bit CRC).
I really doubt that MD5 can beat a CRC with the same number of output
bits for the purpose of error detection; given the lack of guarantee
about short burst errors, I doubt it's even as good.  (Wild-pointer
stomps on disk buffers are an example of the sort of thing that may
look like a burst error.)

Now, if you are worried about crypto-capable gremlins living in your
file system, maybe what you want is MD5.  But I'm inclined to think that
CRC is more appropriate for the job at hand.
        regards, tom lane


Re: Re: CRC

From
ncm@zembu.com (Nathan Myers)
Date:
On Fri, Dec 08, 2000 at 11:10:19AM -0800, I wrote:
> Current evidence suggests that MD4 would be a good choice for a hash
> algorithm.

Thinking about it, I suspect that any CRC implementation that can't outrun 
MD5 by a wide margin is seriously sub-optimal.  Can you post any more
details about how the tests were run?  I'd like to try it.

Nathan Myers
ncm@zembu.com


Re: Re: CRC

From
Bruce Guenter
Date:
On Fri, Dec 08, 2000 at 11:10:19AM -0800, Nathan Myers wrote:
> This is very interesting.  MD4 is faster than MD5.  (MD5, described as
> "MD4 with suspenders on", does some extra stuff to protect against more-
> obscure attacks, of no interest to us.)  Which 64-bit CRC code did you
> use, Mark Mitchell's?

Yes.

> Are you really saying MD5 was faster than CRC-32?

Yes.  I expect it's because the operations used in MD5 are easily
parallelized, and operate on blocks of 64-bytes at a time, while the CRC
is mostly non-parallelizable, uses a table lookup, and operates on
single bytes.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: Re: CRC

From
Tom Lane
Date:
ncm@zembu.com (Nathan Myers) writes:
> Thinking about it, I suspect that any CRC implementation that can't outrun 
> MD5 by a wide margin is seriously sub-optimal.

I was finding that hard to believe, too, at least for CRC-32 (CRC-64
would take more code, so I'm not so sure about it).

Is that 64-bit code you pointed us to before actually a CRC, or
something else?  It doesn't call itself a CRC, and I was having a hard
time extracting anything definite (like the polynomial) from all the
bit-pushing underbrush :-(
        regards, tom lane


Re: Re: CRC

From
Tom Lane
Date:
Bruce Guenter <bruceg@em.ca> writes:
>> Are you really saying MD5 was faster than CRC-32?

> Yes.  I expect it's because the operations used in MD5 are easily
> parallelized, and operate on blocks of 64-bytes at a time, while the CRC
> is mostly non-parallelizable, uses a table lookup, and operates on
> single bytes.

What MD5 implementation did you use?  The one I have handy (the original
RSA reference version) sure looks like it's more computation per byte
than a CRC.
        regards, tom lane


Re: Re: CRC

From
Bruce Guenter
Date:
On Fri, Dec 08, 2000 at 04:21:21PM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > Thinking about it, I suspect that any CRC implementation that can't outrun
> > MD5 by a wide margin is seriously sub-optimal.
> I was finding that hard to believe, too, at least for CRC-32 (CRC-64
> would take more code, so I'm not so sure about it).

Would you like to see the simple benchmarking setup I used?  The amount
of code involved (once all the hashes are factored in) is fairly large,
so I'm somewhat hesitant to just send it to the mailing list.

> Is that 64-bit code you pointed us to before actually a CRC, or
> something else?  It doesn't call itself a CRC, and I was having a hard
> time extracting anything definite (like the polynomial) from all the
> bit-pushing underbrush :-(

It isn't a CRC.  It's a fingerprint.  As you've mentioned, it doesn't
have the guarantees against burst errors that a CRC would have, but it
does have as good as random collision avoidance over any random data
corruption.  At least, that's what the author claims.  My math isn't
nearly good enough to verify such claims.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: CRC was: Re: beta testing version

From
Bruce Guenter
Date:
On Fri, Dec 08, 2000 at 03:38:09PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> > MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
> > impossible to produce a collision using any other method than brute
> > force attempts.
> True but irrelevant.  What we need to worry about is the probability
> that a random error will be detected,

Which I indicated immediately after the sentence you quoted.  The
probability that a random error will be detected is the same as the
probability of a collision in the hash given two different inputs.  The
brute force note means that the probability of a collision is as good as
random.

> MD5 is designed for a purpose that really doesn't have much to do with
> error detection, when you think about it.  It says "you will have a hard
> time computing a different string that produces the same hash as some
> prespecified string".  This is not the same as promising
> better-than-random odds against a damaged copy of some string having the
> same hash as the original.

It does provide as-good-as-random odds against a damaged copy of some
string having the same hash as the original -- nobody has been able to
exhibit any collisions in MD5 (see http://cr.yp.to/papers/hash127.ps,
page 18 for notes on this).

> CRC, on the other hand, is specifically
> designed for error detection, and for localized errors (such as a
> corrupted byte or two) it does a provably better-than-random job.
> For nonlocalized errors you don't get a guarantee, but you do get
> same-as-random odds of detection (ie, 1 in 2^N for an N-bit CRC).

For the log, the CRC's primary function (as far as I understand it)
would be to guard against an inconsistent transaction being treated as
consistent data.  Such inconsistent transactions would be partially
written, resulting in errors much larger than a small burst.

For guarding the actual record data, I agree with you 100% -- what we're
likely to see is a few localized bytes with flipped bits due to hardware
failure of one kind or another.  However, if the data is really
critical, an ECC may be more appropriate, but that would make the data
significantly larger (9/8 for the algorithms I've seen).

> I really doubt that MD5 can beat a CRC with the same number of output
> bits for the purpose of error detection;

Agreed.  However, MD5 provides four times as many bits as the standard
32-bit CRC.

(I think I initially suggested you could take an arbitrary 32 bits out
of MD5 to provide a check code "as good as CRC-32".  I now take that
back.  Due to the burst error nature of CRCs, nothing else could be as
good as it, unless the alternate algorithm also made some guarantees,
which MD5 definitely doesn't.)

> (Wild-pointer
> stomps on disk buffers are an example of the sort of thing that may
> look like a burst error.)

Actually, wild-pointer incidents involving disk buffers at the kernel
level, from my experience, are characterized by content from one file
appearing in another, which is distinctly different than a burst error,
and more like what would be seen if a log record were partially written.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: Re: CRC

From
Bruce Guenter
Date:
On Fri, Dec 08, 2000 at 04:30:58PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> >> Are you really saying MD5 was faster than CRC-32?
> > Yes.  I expect it's because the operations used in MD5 are easily
> > parallelized, and operate on blocks of 64-bytes at a time, while the CRC
> > is mostly non-parallelizable, uses a table lookup, and operates on
> > single bytes.
> What MD5 implementation did you use?

I used the GPL'ed implementation written by Ulrich Drepper in 1995.  The
code from OpenSSL looks identical in terms of the operations performed.

> The one I have handy (the original
> RSA reference version) sure looks like it's more computation per byte
> than a CRC.

The algorithm itself does use more computation per byte.  However, the
algorithm works on blocks of 64 bytes at a time.  As well, the
operations should be easily pipelined.  On the other hand, the CRC code
is largely serial, and highly dependent on a table lookup operation.
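For concreteness, that serial dependency is visible in a minimal table-driven CRC-32 (a sketch using the standard reflected polynomial 0xEDB88320; the names are illustrative, not taken from the benchmark sources):

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Build the 256-entry table for the reflected CRC-32 polynomial. */
static void crc32_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[n] = c;
    }
}

/* Each step needs the previous crc value before it can index the
   table, so the chain is inherently serial. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--)
        crc = crc_table[(crc ^ *buf++) & 0xFF] ^ (crc >> 8);
    return ~crc;
}
```

The table lookup in the middle of the dependency chain is the load that stalls the pipeline on the newer CPUs discussed below.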
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: Re: CRC

From
Tom Lane
Date:
Bruce Guenter <bruceg@em.ca> writes:
> Would you like to see the simple benchmarking setup I used?  The amount
> of code involved (once all the hashes are factored in) is fairly large,
> so I'm somewhat hesitant to just send it to the mailing list.

I agree, don't send it to the whole list.  But I'd like a copy.
        regards, tom lane


Re: Re: CRC

From
Tom Lane
Date:
Bruce Guenter <bruceg@em.ca> writes:
>> I agree, don't send it to the whole list.  But I'd like a copy.

> Here you go.

As near as I could tell, the test as you have it (one CRC computation per
fread) is purely I/O bound.  I changed the main loop to this:

int main()
{
  static char buf[8192];
  size_t rd;
  hash_t hash;

  while (rd = fread(buf, 1, sizeof buf, stdin)) {
    int i;
    for (i = 0; i < 1000; i++) {
      init(&hash);
      update(&hash, buf, rd);
    }
  }
  return 0;
}

so as to get a reasonable amount of computation per fread.  On an
otherwise idle HP 9000 C180 machine, I get the following numbers on a
1MB input file:

time benchcrc <random32

real       35.3
user       35.0
sys         0.0

time benchmd5 <random32

real       37.6
user       37.3
sys         0.0

This is a lot closer than I'd have expected, but it sure ain't
"MD5 40% faster" as you reported.  I wonder why the difference
in results between your platform and mine?

BTW, I used gcc 2.95.2 to compile, -O6, no other switches.
        regards, tom lane


Re: Re: CRC

From
Tom Lane
Date:
A couple further observations while playing with this benchmark ---

1. This MD5 implementation is not too robust.  On my machine it dumps
core if given a non-word-aligned data buffer.  We could probably work
around that, but it bespeaks a little *too* much hand optimization...

2. It's a bad idea to ignore the startup/termination costs of the
algorithms.  Of course startup/termination is trivial for CRC, but
it's not so trivial for MD5.  I changed the code so that the md5
update() routine also calls md5_finish_ctx(), so that each inner
loop represents a complete MD5 calculation for a message of the
size of the main routine's fread buffer.  I then experimented with
different buffer sizes.  At a buffer size of 1K:

time benchcrc <random32

real       35.4
user       35.1
sys         0.0
time benchmd5 <random32

real       41.4
user       41.1
sys         0.0

At a buffer size of 100 bytes:

time benchcrc <random32

real       36.3
user       36.0
sys         0.0
time benchmd5 <random32

real     1:09.7
user     1:09.2
sys         0.0

(The total amount of data processed is 1000 MB in either case, but
it's divided into more messages in the second case.)

I'm not sure exactly what Vadim has in mind for computing CRCs on the
WAL log.  If he's thinking of a CRC for each log message, the MD5 stuff
would be at a definite disadvantage.  For disk-page checksums (8K or
more) this isn't too much of an issue, however.
        regards, tom lane


Re: Re: CRC

From
Bruce Guenter
Date:
On Fri, Dec 08, 2000 at 09:28:38PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> >> I agree, don't send it to the whole list.  But I'd like a copy.
> > Here you go.
> As near as I could tell, the test as you have it (one CRC computation per
> fread) is purely I/O bound.

Nope.  They got 99-100% CPU time with the original version.

> I changed the main loop to this:
> [...hash each block repeatedly...]

Good idea.  Might have been even better to just read the block once and
hash it even more times.

>   On an
> otherwise idle HP 9000 C180 machine, I get the following numbers on a
> 1MB input file:
>
> time benchcrc <random32
> real       35.3
> user       35.0
> sys         0.0
>
> time benchmd5 <random32
> real       37.6
> user       37.3
> sys         0.0
>
> This is a lot closer than I'd have expected, but it sure ain't
> "MD5 40% faster" as you reported.  I wonder why the difference
> in results between your platform and mine?

The difference is likely because PA-RISC (like most other RISC
architectures) lacks a "roll" opcode that is very prevalent in the MD5
algorithm.  Intel CPUs have it.  With a new version modified to repeat
the inner loop 100,000 times, I got the following:

time benchcrc <random
21.35user 0.01system 0:21.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (79major+11minor)pagefaults 0swaps
time benchmd5 <random
12.79user 0.01system 0:12.79elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (80major+11minor)pagefaults 0swaps
time benchcrc <random
21.32user 0.06system 0:21.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (79major+11minor)pagefaults 0swaps
time benchmd5 <random
12.79user 0.01system 0:12.80elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (80major+11minor)pagefaults 0swaps
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: Re: CRC

From
Tom Lane
Date:
Bruce Guenter <bruceg@em.ca> writes:
>> This is a lot closer than I'd have expected, but it sure ain't
>> "MD5 40% faster" as you reported.  I wonder why the difference
>> in results between your platform and mine?

> The difference is likely because PA-RISC (like most other RISC
> architectures) lacks a "roll" opcode that is very prevalent in the MD5
> algorithm.

A good theory, but unfortunately not a correct theory.  PA-RISC can do a
circular shift in one cycle using the "shift right double" instruction,
with the same register specified as both high and low halves of the
64-bit operand.  And gcc does know about that.

After some groveling through assembly code, it seems that the CRC-32
implementation is about as tight as it could get: two loads, two XORs,
and two EXTRU's per byte (one used to implement the right shift, the
other to implement masking with 0xFF).  And the wall clock timing does
indeed come out to just about six cycles per byte.  The MD5 code also
looks pretty tight.  Each basic OP requires either two or three logical
operations (and/or/xor/not) depending on which round you're looking at,
plus four additions and a circular shift.  PA-RISC needs two cycles to
load an arbitrary 32-bit constant, but other than that I see no wasted
cycles here:
        ldil L'-1444681467,%r20
        xor %r3,%r14,%r19
        ldo R'-1444681467(%r20),%r20
        and %r1,%r19,%r19
        addl %r15,%r20,%r20
        xor %r14,%r19,%r19
        addl %r19,%r26,%r19
        addl %r20,%r19,%r15
        shd %r15,%r15,27,%r15
        addl %r15,%r3,%r15
 

Note gcc has been smart enough to assign all the correct_words[] array
elements to registers, else we'd lose another cycle to a load operation
--- fortunately PA-RISC has lots of registers.

There are 64 of these basic OPs needed in each round, and each round
processes 64 input bytes, so basically you can figure one OP per byte.
Ignoring loop overhead and so forth, it's nine or ten cycles per byte
for MD5 versus six for CRC.
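For reference, one such round-1 "basic OP" written out in C, following the RFC 1321 structure (a sketch; a real implementation unrolls 64 of these against a 16-word input block, with per-step constants and shift amounts):

```c
#include <stdint.h>

/* Round-1 nonlinear function: a choice between c and d, selected by b. */
#define F(b, c, d) (((b) & (c)) | (~(b) & (d)))

/* Circular left shift: one cycle on x86 (roll) and PA-RISC (shd),
   more on CPUs without a rotate. */
#define ROTL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* One basic OP: four additions, the two-or-three-op F(), one rotate.
   x is a word of input, t the additive constant, s the shift amount. */
#define OP(a, b, c, d, x, s, t) \
    ((a) = (b) + ROTL((a) + F((b), (c), (d)) + (x) + (t), (s)))
```

Since the transform only runs per 64-byte block, per-message startup and finalization (padding, length encoding, one last block) dominate for short records, which is the point made below about sub-100-byte messages.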

I'm at a loss to see how a Pentium would arrive at a better result for
MD5 than for CRC.  For one thing, it's going to be at a disadvantage
because it hasn't got enough registers.  I'd be interested to see the
assembly code...
        regards, tom lane


Re: Re: CRC

From
Bruce Guenter
Date:
On Fri, Dec 08, 2000 at 10:17:00PM -0500, Tom Lane wrote:
> A couple further observations while playing with this benchmark ---
>
> 1. This MD5 implementation is not too robust.  On my machine it dumps
> core if given a non-word-aligned data buffer.  We could probably work
> around that, but it bespeaks a little *too* much hand optimization...

The operations in the MD5 core are based on word-sized chunks.
Obviously, the implementation only does word-sized loads and stores for
that data, and you got a bus error.

> 2. It's a bad idea to ignore the startup/termination costs of the
> algorithms.

Yes.  I had included the startup costs in my benchmark, but not the
termination costs, which are large for MD5 as you point out.

> Of course startup/termination is trivial for CRC, but
> it's not so trivial for MD5.  I changed the code so that the md5
> update() routine also calls md5_finish_ctx(), so that each inner
> loop represents a complete MD5 calculation for a message of the
> size of the main routine's fread buffer.  I then experimented with
> different buffer sizes.  At a buffer size of 1K:

On my Celeron, at 1K blocks MD5 is still significantly faster than CRC,
but is slightly slower at 100 byte blocks.  For comparison, I added
RIPEMD-160, but it's far slower than any of them (twice as long as CRC).
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: Re: CRC

From
Bruce Guenter
Date:
On Sat, Dec 09, 2000 at 06:46:23PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> > The difference is likely because PA-RISC (like most other RISC
> > architectures) lacks a "roll" opcode that is very prevalent in the MD5
> > algorithm.
>
> A good theory, but unfortunately not a correct theory.  PA-RISC can do a
> circular shift in one cycle using the "shift right double" instruction,
> with the same register specified as both high and low halves of the
> 64-bit operand.  And gcc does know about that.

Interesting.  I was under the impression that virtually no RISC CPU had
a rotate instruction.  Do any others?

> After some groveling through assembly code, it seems that the CRC-32
> implementation is about as tight as it could get: two loads, two XORs,
> and two EXTRU's per byte (one used to implement the right shift, the
> other to implement masking with 0xFF).

Same with the x86 core:

        movb %dl,%al
        xorb (%ecx),%al
        andl $255,%eax
        shrl $8,%edx
        incl %ecx
        xorl (%esi,%eax,4),%edx

> And the wall clock timing does
> indeed come out to just about six cycles per byte.

On my Celeron, the timing for those six opcodes is almost a whopping 13
cycles per byte.  Obviously there's some major performance hit in the
memory instructions, because there's no more than 4 cycles' worth of
dependent instructions in that snippet.

BTW, for reference, P3 timings are almost identical to those of the
Celeron, so it's not causing problems outside the built-in caches common
to the two chips.

> The MD5 code also
> looks pretty tight.  Each basic OP requires either two or three logical
> operations (and/or/xor/not) depending on which round you're looking at,
> plus four additions and a circular shift.  PA-RISC needs two cycles to
> load an arbitrary 32-bit constant, but other than that I see no wasted
> cycles here:
>
>     ldil L'-1444681467,%r20
>     xor %r3,%r14,%r19
>     ldo R'-1444681467(%r20),%r20
>     and %r1,%r19,%r19
>     addl %r15,%r20,%r20
>     xor %r14,%r19,%r19
>     addl %r19,%r26,%r19
>     addl %r20,%r19,%r15
>     shd %r15,%r15,27,%r15
>     addl %r15,%r3,%r15

Here's the x86 assembly code for what appears to be the same basic OP:

        movl %edx,%eax
        xorl %esi,%eax
        andl %edi,%eax
        xorl %esi,%eax
        movl -84(%ebp),%ecx
        leal -1444681467(%ecx,%eax),%eax
        addl %eax,%ebx
        roll $5,%ebx
        addl %edx,%ebx
This is a couple fewer instructions, mainly saving on doing any loads to
use the constant value.  This takes almost exactly 9 cycles per byte.

> There are 64 of these basic OPs needed in each round, and each round
> processes 64 input bytes, so basically you can figure one OP per byte.
> Ignoring loop overhead and so forth, it's nine or ten cycles per byte
> for MD5 versus six for CRC.

On Celeron/P3, CRC scores almost 13 cycles per byte, MD4 is about 6
cycles per byte, and MD5 is about 9 cycles per byte.  On Pentium MMX,
CRC is 7.25, MD4 is 7.5 and MD5 is 10.25.  So, the newer CPUs actually
do worse on CRC than the older ones do.  Weirder and weirder.

> I'm at a loss to see how a Pentium would arrive at a better result for
> MD5 than for CRC.  For one thing, it's going to be at a disadvantage
> because it hasn't got enough registers.

I agree.  It would appear that the table lookup is causing a major
bubble in the pipelines on the newer Celeron/P2/P3 CPUs.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: Re: CRC

From
ncm@zembu.com (Nathan Myers)
Date:
On Sat, Dec 09, 2000 at 06:46:23PM -0500, Tom Lane wrote:
> I'm at a loss to see how a Pentium would arrive at a better result for
> MD5 than for CRC.  For one thing, it's going to be at a disadvantage
> because it hasn't got enough registers.  I'd be interested to see the
> assembly code...

Minutiae aside, it's clear that the MD5 and CRC are "comparable",
regardless of CPU.

For a 32-bit hash, the proven characteristics of CRCs are critical in 
some applications.  With a good 64-bit hash, the probability of any 
collision whether from a burst error or otherwise becomes much lower 
than every other systematic source of error -- the details just don't
matter any more.  If you miss the confidence that CRCs gave you about 
burst errors, consider how easy it would be to construct a collision 
if you could just try changing a couple of adjacent bytes -- an 
exhaustive search would be easy.  

MD4 would be a better choice than MD5, despite that a theoretical attack
on MD4 has been described (albeit never executed).  We don't even care 
about real attacks, never mind theoretical ones.  What matters is that 
MD4 is entirely good enough, and faster to compute than MD5.

I find these results very encouraging.  BSD-licensed MD4 code is readily
available, e.g. from any of the BSDs themselves.

Nathan Myers
ncm@zembu.com



Re: Re: CRC

From
Tom Lane
Date:
ncm@zembu.com (Nathan Myers) writes:
> Minutiae aside, it's clear that the MD5 and CRC are "comparable",
> regardless of CPU.

We've established that the inner loops are pretty comparable.  I'm
still concerned about the startup/termination costs of MD5 on short
records.  The numbers Bruce and I were trading were mostly for records
long enough to make the startup costs negligible, but they're not
negligible for sub-100-byte records.

> For a 32-bit hash, the proven characteristics of CRCs are critical in 
> some applications.  With a good 64-bit hash, the probability of any 
> collision whether from a burst error or otherwise becomes much lower 
> than every other systematic source of error -- the details just don't
> matter any more.

That's a good point.  Of course the critical phrase there is *good*
hash, ie, one without any systematic weaknesses, but as long as we
don't use a "method chosen at random" you're right, it hardly matters.

However, this just begs the question: can't the same be said of a 32-bit
checksum?  My argument the other day essentially was that 32 bits is
plenty for what we need to do, and I have not heard a refutation.

One thing we should look at before going with a 64-bit method is the
extra storage space for the larger checksum.  We can clearly afford
an extra 32 bits for a checksum on an 8K disk page, but if Vadim is
envisioning checksumming each individual XLOG record then the extra
space is more annoying.

Also, there's the KISS issue.  When it takes less than a dozen lines
to do a CRC, versus pages to do MD5, you have to ask yourself what the
extra code space is buying you... also whether you want to get into
licensing issues by borrowing someone else's code.  The MD5 code that
Bruce was using is GPL, not BSD, and so couldn't go into the Postgres
core anyway.  

> MD4 would be a better choice than MD5, despite that a theoretical attack
> on MD4 has been described (albeit never executed).  We don't even care 
> about real attacks, never mind theoretical ones.  What matters is that 
> MD4 is entirely good enough, and faster to compute than MD5.

> I find these results very encouraging.  BSD-licensed MD4 code is readily
> available, e.g. from any of the BSDs themselves.

MD4 would be worth looking at, especially if it has less
startup/shutdown overhead than MD5.  I think a 64-bit true CRC might
also be worth looking at, just for completeness.  But I don't know
where to find code for one.
        regards, tom lane
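For completeness, a 64-bit true CRC is structurally identical to the table-driven 32-bit kind, just with a wider table and polynomial (a sketch using the reflected ECMA-182 polynomial 0xC96C5795D7870F42; this particular polynomial is illustrative, not one the thread settled on):

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t crc64_table[256];

/* Reflected form of the ECMA-182 polynomial 0x42F0E1EBA9EA3693. */
static void crc64_init(void)
{
    for (uint64_t n = 0; n < 256; n++) {
        uint64_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xC96C5795D7870F42ull : c >> 1;
        crc64_table[n] = c;
    }
}

/* Same inner loop as CRC-32; only the table and shift widths change. */
static uint64_t crc64_update(uint64_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--)
        crc = crc64_table[(crc ^ *buf++) & 0xFF] ^ (crc >> 8);
    return ~crc;
}
```

The per-byte work is unchanged from the 32-bit case; the cost is the doubled table (2 KB) and the doubled checksum field in each record or page.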


Re: Re: CRC

From
Tom Lane
Date:
Bruce Guenter <bruceg@em.ca> writes:
>> A good theory, but unfortunately not a correct theory.  PA-RISC can do a
>> circular shift in one cycle using the "shift right double" instruction,

> Interesting.  I was under the impression that virtually no RISC CPU had
> a rotate instruction.  Do any others?

Darn if I know.  A RISC purist would probably say that PA-RISC isn't all
that reduced ... for example, the reason it needs six cycles not seven
for the CRC inner loop is that the LOAD instruction has an option to
postincrement the pointer register (like a C "*ptr++").

> Same with the x86 core:
>         movb %dl,%al
>         xorb (%ecx),%al
>         andl $255,%eax
>         shrl $8,%edx
>         incl %ecx
>         xorl (%esi,%eax,4),%edx

> On my Celeron, the timing for those six opcodes is almost a whopping 13
> cycles per byte.  Obviously there's some major performance hit in the
> memory instructions, because there's no more than 4 cycles' worth of
> dependent instructions in that snippet.

Yes.  It looks like we're looking at pipeline stalls for the memory
reads.  I expect PA-RISC would have the same problem if it were not that
the CRC table and data buffer are almost certainly loaded into level-2
cache memory.  Curious that you don't get the same result --- what is
the memory cache architecture on your box?

As Nathan remarks nearby, this is just minutiae, but I'm interested
anyway...
        regards, tom lane


Re: Re: CRC

From
Alfred Perlstein
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [001210 12:00] wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> >> A good theory, but unfortunately not a correct theory.  PA-RISC can do a
> >> circular shift in one cycle using the "shift right double" instruction,
> 
> > Interesting.  I was under the impression that virtually no RISC CPU had
> > a rotate instruction.  Do any others?
> 
> Darn if I know.  A RISC purist would probably say that PA-RISC isn't all
> that reduced ... for example, the reason it needs six cycles not seven
> for the CRC inner loop is that the LOAD instruction has an option to
> postincrement the pointer register (like a C "*ptr++").
> 
> > Same with the x86 core:
> >         movb %dl,%al
> >         xorb (%ecx),%al
> >         andl $255,%eax
> >         shrl $8,%edx
> >         incl %ecx
> >         xorl (%esi,%eax,4),%edx
> 
> > On my Celeron, the timing for those six opcodes is almost a whopping 13
> > cycles per byte.  Obviously there's some major performance hit in the
> > memory instructions, because there's no more than 4 cycles' worth of
> > dependent instructions in that snippet.
> 
> Yes.  It looks like we're looking at pipeline stalls for the memory
> reads.  I expect PA-RISC would have the same problem if it were not that
> the CRC table and data buffer are almost certainly loaded into level-2
> cache memory.  Curious that you don't get the same result --- what is
> the memory cache architecture on your box?
> 
> As Nathan remarks nearby, this is just minutiae, but I'm interested
> anyway...

I would try unrolling the loop some (if possible) and retesting.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Re: CRC

From
Bruce Guenter
Date:
On Sun, Dec 10, 2000 at 02:53:43PM -0500, Tom Lane wrote:
> > On my Celeron, the timing for those six opcodes is almost a whopping 13
> > cycles per byte.  Obviously there's some major performance hit in the
> > memory instructions, because there's no more than 4 cycles' worth of
> > dependent instructions in that snippet.
> Yes.  It looks like we're looking at pipeline stalls for the memory
> reads.

In particular, for the single-byte memory read.  By loading in 32-bit
words at a time, the cost drops to about 7 cycles per byte.  I
imagine on a 64-bit CPU, loading 64-bit words at a time would drop the
cost even further.

    word1 = *(unsigned long*)z;
    while (c > 4)
      {
        z += 4;
        ick = IUPDC32 (word1, ick); word1 >>= 8;
        c -= 4;
        ick = IUPDC32 (word1, ick); word1 >>= 8;
        word1 = *(unsigned long*)z;
        ick = IUPDC32 (word1, ick); word1 >>= 8;
        ick = IUPDC32 (word1, ick);
      }
I tried loading two words at a time, starting to load the second word
well before it's used, but that didn't actually reduce the time taken.

> As Nathan remarks nearby, this is just minutiae, but I'm interested
> anyway...

Yup.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: Re: CRC

From
Bruce Guenter
Date:
On Sun, Dec 10, 2000 at 12:24:59PM -0800, Alfred Perlstein wrote:
> I would try unrolling the loop some (if possible) and retesting.

The inner loop was already unrolled, but was only processing single
bytes at a time.  By loading in 32-bit words at once, it reduced the
cost to only 7 cycles per byte (from 13).
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: Re: CRC

From
Erich
Date:
PG should include support for SHA1 anyway.  MD5 is not being used in
new stuff anymore, I think.  I actually have an SHA1 implementation
that links into PG if anyone is interested (especially if it could get
included in a future release).

e


Re: Re: CRC

From
Thomas Lockhart
Date:
> Interesting.  I was under the impression that virtually no RISC CPU had
> a rotate instruction.  Do any others?

(fyi; doesn't really contribute to the thread :/)

Most or all do. There are no "pure RISC" chips in production; all have
had some optimized complex operations added for performance and for code
compactness.
                     - Thomas


Re: Re: CRC

From
Bruce Guenter
Date:
On Sun, Dec 10, 2000 at 02:36:38PM -0500, Tom Lane wrote:
> > MD4 would be a better choice than MD5, despite that a theoretical attack
> > on MD4 has been described (albeit never executed).  We don't even care
> > about real attacks, never mind theoretical ones.  What matters is that
> > MD4 is entirely good enough, and faster to compute than MD5.
>
> > I find these results very encouraging.  BSD-licensed MD4 code is readily
> > available, e.g. from any of the BSDs themselves.
>
> MD4 would be worth looking at, especially if it has less
> startup/shutdown overhead than MD5.  I think a 64-bit true CRC might
> also be worth looking at, just for completeness.  But I don't know
> where to find code for one.

The startup/shutdown for MD4 is identical to that of MD5, however the
inner loop is much smaller (a total of 48 operations instead of 64, with
fewer constants).  The inner MD4 loop is about 1.5 times the speed of
MD5.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: CRC was: Re: beta testing version

From
ncm@zembu.com (Nathan Myers)
Date:
On Thu, Dec 07, 2000 at 07:36:33PM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > 2. I disagree with way the above statistics were computed.  That eleven 
> >    million-year figure gets whittled down pretty quickly when you 
> >    factor in all the sources of corruption, even without crashes.  
> >    (Power failures are only one of many sources of corruption.)  They 
> >    grow with the size and activity of the database.  Databases are 
> >    getting very large and busy indeed.
> 
> Sure, but the argument still holds.  If the net MTBF of your underlying
> system is less than a day, it's too unreliable to run a database that
> you want to trust.  Doesn't matter what the contributing failure
> mechanisms are.  In practice, I'd demand an MTBF of a lot more than a
> day before I'd accept a hardware system as satisfactory...

In many intended uses (such as Landmark's original plan?) it is not just 
one box, but hundreds or thousands.  With thousands of databases deployed, 
the MTBF (including power outages) for commodity hardware is well under a 
day, and there's not much you can do about that.

In a large database (e.g. 64GB) you have 8M blocks.  Each hash covers
one block.  With a 32-bit checksum, when you check one block, you have
a 2^(-32) likelihood of missing an error, assuming there is one.  Across
8M (= 2^23) blocks, though, the aggregate chance of missing an error
rises to 2^(-9).

This is what I meant by "whittling".  A factor of ten or a thousand
here, another there, and pretty soon the possibility of undetected
corruption is something that can't reasonably be ruled out.


> > 3. Many users clearly hope to be able to pull the plug on their hardware 
> >    and get back up confidently.  While we can't promise they won't have 
> >    to go to their backups, we should at least be equipped to promise,
> >    with confidence, that they will know whether they need to.
> 
> And the difference in odds between 2^32 and 2^64 matters here?  I made
> a numerical case that it doesn't, and you haven't refuted it.  By your
> logic, we might as well say that we should be using a 128-bit CRC, or
> 256-bit, or heck, a few kilobytes.  It only takes a little longer to go
> up each step, right, so where should you stop?  I say MTBF measured in
> megayears ought to be plenty.  Show me the numerical argument that 64
> bits is the right place on the curve.

I agree that this is a reasonable question.  However, the magic of 
exponential growth makes any dissatisfaction with a 64-bit checksum
far less likely than with a 32-bit checksum.

It would forestall any such problems to arrange a configure-time
flag such as "--with-checksum crc-32" or "--with-checksum md4",
and make it clear where to plug in the checksum of one's choice.
Then, ship 7.2 with just crc-32 and let somebody else produce 
patches for alternatives if they want them.

BTW, I have been looking for Free 64-bit CRC codes/polynomials and 
the closest thing I have found so far was Mark Mitchell's hash, 
translated from the Modula-3 system.  All the tape drive makers
advertise (but don't publish (AFAIK)) a 64-bit CRC.

A reasonable approach would be to deliver CRC-32 in 7.2, and then
reconsider the default later if anybody contributes good alternatives.

Nathan Myers
ncm@zembu.com


Re: RFC: CRC datatype

From
ncm@zembu.com (Nathan Myers)
Date:
On Sat, Dec 09, 2000 at 12:03:52AM +1100, Horst Herb wrote:
> AFAIK the thread for "built in" crcs referred only to CRCs in
> the transaction log. 

We have been discussing checksums for both the table blocks and for
the transaction log.

> Always remember that a postgres database on the hard disk can be
> manipulated accidentally / maliciously without postgres even running.
> These are the cases where you need row level CRCs.

"There is no security without physical security."  If somebody can
change the row contents, they can also change the row and/or block 
checksum to match.

Nathan Myers
ncm@zembu.com


Re: RFC: CRC datatype

From
Horst Herb
Date:
> > Always remember that a postgres database on the hard disk can be
> > manipulated accidentally / maliciously without postgres even running.
> > These are the cases where you need row level CRCs.
>
> "There is no security without physical security."  If somebody can
> change the row contents, they can also change the row and/or block
> checksum to match.

They may, but in a proper setup they won't be able to access the CRC log 
files. That way, you can still detect alterations. I presume anyway that most
alterations would be accidental rather than malicious, and in that case the
CRC is extremely helpful.

Horst