Re: Cost of XLogInsert CRC calculations - Mailing list pgsql-hackers

"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> I didn't post the sources to the list originally as I wasn't sure if the
> topic were of enough interest to warrant a larger email. I've attached the
> two corrected programs as a .tar.gz - crctest.c uses uint32, whereas
> crctest64.c uses uint64.

I did some experimentation and concluded that gcc is screwing up
big-time on optimizing the CRC64 code for 32-bit Intel.  It does much
better on every other architecture though.

Here are some numbers with gcc 3.2.3 on an Intel Xeon machine.  (I'm
showing the median of three trials in each case, but the numbers were
pretty repeatable.  I also tried gcc 4.0.0 on this machine and got
similar numbers.)
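
For anyone reproducing these measurements, the median-of-three procedure can be sketched roughly as below. This is only an illustration, not the attached test program: the bitwise CRC-32 workload and the names crc32_bitwise, median3, time_one_trial and median_of_three_trials are all hypothetical stand-ins, and clock() is assumed to be an adequate timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR of
 * all ones).  A stand-in workload only; the real test programs may differ. */
static uint32_t crc32_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Return the middle value of three timings. */
static double median3(double a, double b, double c)
{
    if (a > b) { double t = a; a = b; b = t; }   /* now a <= b */
    if (b > c) { double t = b; b = c; c = t; }   /* now c is the largest */
    if (a > b) { double t = a; a = b; b = t; }   /* now a <= b <= c */
    return b;
}

/* Time `reps` passes over the buffer, in CPU seconds. */
static double time_one_trial(const unsigned char *buf, size_t len,
                             int reps, uint32_t *sink)
{
    assert(reps > 0);
    clock_t start = clock();
    uint32_t acc = 0;
    for (int r = 0; r < reps; r++)
        acc ^= crc32_bitwise(buf, len);
    *sink = acc;   /* keep the result live so the loop is not optimized away */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Run three trials and report the median, as in the tables in this post. */
static double median_of_three_trials(const unsigned char *buf, size_t len,
                                     int reps)
{
    uint32_t sink;
    double t1 = time_one_trial(buf, len, reps, &sink);
    double t2 = time_one_trial(buf, len, reps, &sink);
    double t3 = time_one_trial(buf, len, reps, &sink);
    return median3(t1, t2, t3);
}
```

Calling median_of_three_trials(buf, len, reps) on a fixed buffer gives one figure per compiler setting. The median is less sensitive than the mean to a single trial disturbed by other activity on the machine.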

gcc -O1 crctest.c            0.328571 s
gcc -O2 crctest.c            0.297978 s
gcc -O3 crctest.c            0.306894 s

gcc -O1 crctest64.c            0.358263 s
gcc -O2 crctest64.c            0.773544 s
gcc -O3 crctest64.c            0.770945 s

When -O2 is slower than -O1, you know the compiler is blowing it :-(.
I fooled around with non-default -march settings but didn't see much
change.
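
To make the 32-bit Intel case concrete: that target has no 64-bit general registers, so each uint64 shift and XOR in the CRC loop has to be carried out on a pair of 32-bit halves. The sketch below writes a table-driven CRC-64 update both ways, once with uint64 and once by hand with two uint32 values, so the two can be compared. It is an assumption-laden illustration rather than the code in crctest64.c: the ECMA-182 polynomial, the MSB-first update, the zero initial value and all function names here are hypothetical choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ECMA-182 polynomial; an assumed choice for this sketch. */
#define CRC64_POLY UINT64_C(0x42F0E1EBA9EA3693)

static uint64_t crc64_table[256];
static uint32_t crc64_table_hi[256];   /* upper 32 bits of each entry */
static uint32_t crc64_table_lo[256];   /* lower 32 bits of each entry */

/* Build the MSB-first lookup table and its two 32-bit halves. */
static void crc64_init_tables(void)
{
    for (int i = 0; i < 256; i++) {
        uint64_t v = (uint64_t)i << 56;
        for (int k = 0; k < 8; k++)
            v = (v & UINT64_C(0x8000000000000000)) ? (v << 1) ^ CRC64_POLY
                                                   : (v << 1);
        crc64_table[i] = v;
        crc64_table_hi[i] = (uint32_t)(v >> 32);
        crc64_table_lo[i] = (uint32_t)(v & 0xFFFFFFFFu);
    }
}

/* Straightforward version: one 64-bit accumulator. */
static uint64_t crc64_wide(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint64_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc64_table[(unsigned)(crc >> 56) ^ buf[i]] ^ (crc << 8);
    return crc;
}

/* Same computation with the accumulator split into two 32-bit halves. */
static uint64_t crc64_split(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t hi = 0, lo = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned idx = (hi >> 24) ^ buf[i];           /* top byte selects entry */
        hi = ((hi << 8) | (lo >> 24)) ^ crc64_table_hi[idx];  /* uses old lo */
        lo = (lo << 8) ^ crc64_table_lo[idx];
    }
    return ((uint64_t)hi << 32) | lo;
}
```

After crc64_init_tables() has run, crc64_wide and crc64_split should return the same value for any input. The split form costs only a few extra 32-bit operations per byte, which is consistent with the -O1 figures above being close to the 32-bit ones.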

Similar tests on a several-year-old Pentium 4 machine, this time with
gcc version 3.4.3:

gcc -O1 -march=pentium4 crctest.c    0.486266 s
gcc -O2 -march=pentium4 crctest.c    0.520237 s
gcc -O3 -march=pentium4 crctest.c    0.520299 s

gcc -O1 -march=pentium4 crctest64.c    0.928107 s
gcc -O2 -march=pentium4 crctest64.c    1.247673 s
gcc -O3 -march=pentium4 crctest64.c    1.654102 s

Here are some comparisons showing that the performance difference is
not inherent:

IA64 (Itanium 2), gcc 3.2.3:

gcc -O1 crctest.c            0.898595 s
gcc -O2 crctest.c            0.599005 s
gcc -O3 crctest.c            0.598824 s

gcc -O1 crctest64.c            0.524257 s
gcc -O2 crctest64.c            0.524168 s
gcc -O3 crctest64.c            0.524140 s

X86_64 (Opteron), gcc 3.2.3:

gcc -O1 crctest.c            0.460000 s
gcc -O2 crctest.c            0.460000 s
gcc -O3 crctest.c            0.460000 s

gcc -O1 crctest64.c            0.410000 s
gcc -O2 crctest64.c            0.410000 s
gcc -O3 crctest64.c            0.410000 s

PPC64 (IBM POWER4+), gcc 3.2.3:

gcc -O1 crctest.c            0.819492 s
gcc -O2 crctest.c            0.819427 s
gcc -O3 crctest.c            0.820616 s

gcc -O1 crctest64.c            0.751639 s
gcc -O2 crctest64.c            0.894250 s
gcc -O3 crctest64.c            0.888959 s

PPC (Mac G4), gcc 3.3:

gcc -O1 crctest.c            0.949094 s
gcc -O2 crctest.c            1.011220 s
gcc -O3 crctest.c            1.013847 s

gcc -O1 crctest64.c            1.314093 s
gcc -O2 crctest64.c            1.015367 s
gcc -O3 crctest64.c            1.011468 s

HPPA, gcc 2.95.3:

gcc -O1 crctest.c            1.796604 s
gcc -O2 crctest.c            1.676023 s
gcc -O3 crctest.c            1.676476 s

gcc -O1 crctest64.c            2.022798 s
gcc -O2 crctest64.c            1.916185 s
gcc -O3 crctest64.c            1.904094 s

Given the lack of impressive advantage to the 64-bit code even on 64-bit
architectures, it might be best to go with the 32-bit code everywhere,
but I also think we have grounds to file a gcc bug report.

Anyone want to try it with non-gcc compilers?  I attach a slightly
cleaned-up version of Mark's original (doesn't draw compiler warnings
or errors on what I tried it on).

            regards, tom lane

