Re: What exactly is our CRC algorithm? - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: What exactly is our CRC algorithm?
Date
Msg-id 546CC8F1.3010404@vmware.com
Whole thread Raw
In response to Re: What exactly is our CRC algorithm?  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Responses Re: What exactly is our CRC algorithm?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 11/19/2014 05:58 PM, Abhijit Menon-Sen wrote:
> At 2014-11-11 16:56:00 +0530, ams@2ndQuadrant.com wrote:
>>
>> I'm working on this (first speeding up the default calculation using
>> slice-by-N, then adding support for the SSE4.2 CRC instruction on
>> top).
>
> I've done the first part in the attached patch, and I'm working on the
> second (especially the bits to issue CPUID at startup and decide which
> implementation to use).
>
> As a benchmark, I ran pg_xlogdump --stats against 11GB of WAL data (674
> segments) generated by running a total of 2M pgbench transactions on a
> db initialised with scale factor 25.

That's an interesting choice of workload. That sure is heavy on the CRC 
calculation, but the speed of pg_xlogdump hardly matters in real life.

> With HEAD's CRC code:
>
> bin/pg_xlogdump --stats wal/000000010000000000000001   29.81s user 3.56s system 77% cpu 43.274 total
> bin/pg_xlogdump --stats wal/000000010000000000000001   29.59s user 3.85s system 75% cpu 44.227 total
>
> With slice-by-4 (a minor variant of the attached patch; the results are
> included only for curiosity's sake, but I can post the code if needed):
>
> bin/pg_xlogdump --stats wal/000000010000000000000001   13.52s user 3.82s system 48% cpu 35.808 total
> bin/pg_xlogdump --stats wal/000000010000000000000001   13.34s user 3.96s system 48% cpu 35.834 total
>
> With slice-by-8 (i.e. the attached patch):
>
> bin/pg_xlogdump --stats wal/000000010000000000000001   7.88s user 3.96s system 34% cpu 34.414 total
> bin/pg_xlogdump --stats wal/000000010000000000000001   7.85s user 4.10s system 34% cpu 35.001 total
>
> (Note the progressive reduction in user time from ~29s to ~8s.)
>
> Finally, just for comparison, here's what happens when we use the
> hardware instruction via gcc's __builtin_ia32_crc32xx intrinsics
> (i.e. the additional patch I'm working on):
>
> bin/pg_xlogdump --stats wal/000000010000000000000001   3.33s user 4.79s system 23% cpu 34.832 total

Impressive!

It would be good to see separate benchmarks on WAL generation, and WAL 
replay. pg_xlogdump is probably close to WAL replay, but the WAL 
generation case is somewhat different, as WAL is generated in small 
pieces, and each piece is accumulated to the CRC separately. The 
slice-by-X algorithm might be less effective in that scenario. I have no 
doubt that it's still a lot faster than the byte-at-a-time operation, 
but would be nice to have numbers on it.

- Heikki




pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: proposal: plpgsql - Assert statement
Next
From: Robert Haas
Date:
Subject: Re: What exactly is our CRC algorithm?