Re: Cost of XLogInsert CRC calculations - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Cost of XLogInsert CRC calculations
Date
Msg-id 1115761565.3830.196.camel@localhost.localdomain
Whole thread Raw
In response to Re: Cost of XLogInsert CRC calculations  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Cost of XLogInsert CRC calculations  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, 2005-05-10 at 10:34 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > > I was just researching some articles on compression (zlib) and I saw mention
> > > of the Adler-32 algorithm which is supposed to be slightly less accurate
> > > than an equivalent CRC calculation but significantly faster to compute. I
> > > haven't located a good paper comparing the error rates of the two different
> > > checksums,
> > 
> > ... probably because there isn't one.  With all due respect to the Zip
> > guys, I doubt anyone has done anywhere near the analysis on Adler-32
> > that has been done on CRCs.  I'd much prefer to stick with true CRC
> > and drop it to 32 bits than go with a less-tested algorithm.  Throwing
> > more bits at the problem doesn't necessarily create a safer checksum.
> 
> Agreed.  64-bit was overkill when we added it, and it is now shown to be
> a performance problem.

Hold on... Tom has shown that there is a performance problem with the
existing CRC calculation. Thanks to Mark for staying on top of that with
some good ideas. 

The cause of the performance problem has been attributed to it being a
64-bit rather than 32-bit calculation. That is certainly part of it, but
I have seen evidence that there is an Intel processor stall associated
with the use of a single byte constant somewhere in the algorithm. So
I'm unclear as to what extent the poor performance is attributable to
either issue.

That's where my experience stops so I have highlighted that for somebody
with more hardware specific assembler experience to have a look at the
algorithm. Fixing that, if possible, could greatly improve the
performance whether or not we drop from 64 to 32 bits. My hope for
outside assistance on that looks like it is not now forthcoming.

My guess would be that the algorithm's use of the byte-by-byte
calculation together with a bitmask of &FF is responsible. Perhaps
varying the length of the bitmask to either &000000FF or longer may
help? (see src/include/xlog_internal.h)

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: [PATCHES] Cleaning up unreferenced table files
Next
From: Simon Riggs
Date:
Subject: Re: Table Partitioning, Part 1