Re: What exactly is our CRC algorithm? - Mailing list pgsql-hackers

From Abhijit Menon-Sen
Subject Re: What exactly is our CRC algorithm?
Date
Msg-id 20141119155811.GA32492@toroid.org
Whole thread Raw
In response to Re: What exactly is our CRC algorithm?  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Responses Re: What exactly is our CRC algorithm?  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
At 2014-11-11 16:56:00 +0530, ams@2ndQuadrant.com wrote:
>
> I'm working on this (first speeding up the default calculation using
> slice-by-N, then adding support for the SSE4.2 CRC instruction on
> top).

I've done the first part in the attached patch, and I'm working on the
second (especially the bits to issue CPUID at startup and decide which
implementation to use).

As a benchmark, I ran pg_xlogdump --stats against 11GB of WAL data (674
segments) generated by running a total of 2M pgbench transactions on a
db initialised with scale factor 25. The tests were run on my i5-3230
CPU, and the code in each case was compiled with "-O3 -msse4.2" (and
without --enable-debug). The profile was dominated by the CRC
calculation in ValidXLogRecord.

With HEAD's CRC code:

bin/pg_xlogdump --stats wal/000000010000000000000001   29.81s user 3.56s system 77% cpu 43.274 total
bin/pg_xlogdump --stats wal/000000010000000000000001   29.59s user 3.85s system 75% cpu 44.227 total

With slice-by-4 (a minor variant of the attached patch; the results are
included only for curiosity's sake, but I can post the code if needed):

bin/pg_xlogdump --stats wal/000000010000000000000001   13.52s user 3.82s system 48% cpu 35.808 total
bin/pg_xlogdump --stats wal/000000010000000000000001   13.34s user 3.96s system 48% cpu 35.834 total

With slice-by-8 (i.e. the attached patch):

bin/pg_xlogdump --stats wal/000000010000000000000001   7.88s user 3.96s system 34% cpu 34.414 total
bin/pg_xlogdump --stats wal/000000010000000000000001   7.85s user 4.10s system 34% cpu 35.001 total

(Note the progressive reduction in user time from ~29s to ~8s.)

Finally, just for comparison, here's what happens when we use the
hardware instruction via gcc's __builtin_ia32_crc32xx intrinsics
(i.e. the additional patch I'm working on):

bin/pg_xlogdump --stats wal/000000010000000000000001   3.33s user 4.79s system 23% cpu 34.832 total

There are a number of potential micro-optimisations, I just wanted to
submit the obvious thing first and explore more possibilities later.

-- Abhijit

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Add shutdown_at_recovery_target option to recovery.conf
Next
From: Robert Haas
Date:
Subject: Re: group locking: incomplete patch, just for discussion