Re: Enabling Checksums - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: Enabling Checksums
Date
Msg-id CA+CSw_uXO-fRkuzL0Yzs0wSdL8dipZV-ugMvYN-yV45SGUBU2w@mail.gmail.com
Whole thread Raw
In response to Re: Enabling Checksums  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: Enabling Checksums  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Wed, Mar 20, 2013 at 12:52 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 3/19/13 6:08 PM, Ants Aasma wrote:
>>
>> My main worry is that there is a reasonably
>> large population of users out there that don't have that acceleration
>> capability and will have to settle for performance overhead 4x worse
>> than what you currently measured for a shared buffer swapping
>> workload.
>
>
> That would be very bad.  I want to keep hammering on this part of the
> implementation.  If the only style of checksum that's computationally
> feasible is the Fletcher one that's already been done--if that approach is
> basically the most expensive one that's practical to use--I'd still consider
> that a major win over doing nothing.

Well there is also the SIMD checksum that outperforms hardware
assisted CRC's, is almost 3 times as fast as Fletcher on the most
popular platform, should run fast on every CPU that has vector
instructions (almost all server CPUs from the last 10 years), should
run fast even on last two generations of cellphone CPUs and I don't
see any obvious errors that it misses. It will require some
portability work (maybe use intrinsics instead of relying on the
vectorizer) but I don't see why it wouldn't work.

> While being a lazy researcher today instead of writing code, I discovered
> that the PNG file format includes a CRC-32 on its data chunks, and to
> support that there's a CRC32 function inside of zlib:
> http://www.zlib.net/zlib_tech.html
>
> Is there anywhere that compiles a PostgreSQL --without-zlib that matters?
>
> The UI looks like this:
>
> ZEXTERN uLong ZEXPORT crc32 OF((uLong crc, const Bytef *buf, uInt len));
>
> And they've already put some work into optimizing its table-driven
> implementation.  Seems possible to punt the whole problem of how to do this
> efficiently toward the zlib developers, let them drop into assembly to get
> the best possible Intel acceleration etc. one day.

That's the same byte at a time lookup-table algorithm that Intel uses
in the slice-by-8 method, zlib uses a 4 level lookup table for a
smaller table but more overhead. Also, zlib uses the 0x04C11DB7
polynomial that is not supported by the Intel accelerated crc32c
instruction. I believe that if we go crc32 route we should definitely
pick the Castagnoli polynomial that atleast has the hope of being
accelerated.

I copied crc32.c, crc32.h and zutil.h from zlib to the test framework
and ran the tests. While at it I also did a version where the fletcher
loop was unrolled by hand 8 times.

Results on sandy bridge (plain -O2 compile):
CRC32 slicing by 8 Algorithm (bytes/cycle), 0.522284
CRC32 zlib (bytes/cycle), 0.308307
Fletcher Algorithm: (bytes/cycle), 1.891964
Fletcher Algorithm hand unrolled: (bytes/cycle), 3.306666
SIMD Algorithm (gcc): (bytes/cycle), 0.572407
SIMD Algorithm (hand coded): (bytes/cycle), 9.124589

Results from papers:
crc32c instruction (castagnoli only): 6.8 bytes/cycle
pqlmulqdq based crc32: 0.9 bytes/cycle

Fletcher is also still a strong contender, we just need to replace the
255 modulus with something less prone to common errors, maybe use
65521 as the modulus. I'd have to think how to best combine the values
in that case. I believe we can lose the property that neither byte can
be zero, just avoiding both being zero seems good enough to me.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



pgsql-hackers by date:

Previous
From: Greg Smith
Date:
Subject: Re: Enabling Checksums
Next
From: Simon Riggs
Date:
Subject: Re: Enabling Checksums