Re: pgbench - add pseudo-random permutation function - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: pgbench - add pseudo-random permutation function |
Date | |
Msg-id | CAEepm=0FMnWm9miM3Tkuq5j3Q31WQJ7vvniy6rinCfkrsByQRQ@mail.gmail.com Whole thread Raw |
In response to | Re: pgbench - add pseudo-random permutation function (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses |
Re: pgbench - add pseudo-random permutation function
|
List | pgsql-hackers |
On Wed, Sep 26, 2018 at 8:26 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > modular_multiply() is an interesting device. I will leave this to > > committers with a stronger mathematical background than me, but I have > > a small comment in passing: > > For testing this function, I have manually commented out the various > shortcuts so that the manual multiplication code is always used, and the > tests passed. I just did it again. > > > +#ifdef HAVE__BUILTIN_POPCOUNTLL > > + return __builtin_popcountll(n); > > +#else /* use clever no branching bitwise operator version */ > > > > I think it is not enough for the compiler to have > > __builtin_popcountll(). The CPU that this is eventually executed on > > must actually have the POPCNT instruction[1] (or equivalent on ARM, > > POWER etc), or the program will die with SIGILL. > > Hmmm, I'd be pretty surprised: The point of the builtin is to delegate to > the compiler the hassle of selecting the best option available, depending > on target hardware class: whether it issues a cpu/sse4 instruction is > not/should not be our problem. True, it selects based on what you tell it: $ cat x.c #include <stdio.h> #include <stdlib.h> int main(int argc, char *argv[]) { printf("%d\n", __builtin_popcount(atoi(argv[1]))); } $ gdb -q a.out ... (gdb) disass main ... 0x00000000004005a4 <+39>: callq 0x4005c0 <__popcountdi2> ... $ gcc -mpopcnt x.c $ gdb -q a.out ... (gdb) disass main ... 0x000000000040059f <+34>: popcnt %eax,%eax ... I'd forgotten one detail... we figure out if need to tell it that we have SSE4.2 instructions explicitly in the configure script: checking which CRC-32C implementation to use... SSE 4.2 with runtime check We enable -msse4.2 just on the one file that needs it, in src/port/Makefile: pg_crc32c_sse42.o: CFLAGS+=$(CFLAGS_SSE42) On this CentOS machine I see: gcc ... -msse4.2 ... -c pg_crc32c_sse42.c -o pg_crc32c_sse42_srv.o That is necessary because most people consume PostgreSQL through packages from distributions that have to work on an Athlon II or whatever, so we can't just use -msse4.2 for every translation unit. So it becomes our job to isolate small bits of code that use newer instructions, if it's really worth the effort to do that, and supply our own runtime checks and provide a fallback. > > There are CPUs in circulation produced in this decade that don't have > > it. > > Then the compiler, when generating code that is expected to run for a > large class of hardware which include such old ones, should not use a > possibly unavailable instruction, or the runtime should take > responsability for dynamically selecting a viable option. Right. Our problem is that if we didn't do our own runtime testing, users of (say) Debian and RHEL packages (= most users of PostgreSQL) would effectively never be able to use things like CRC32 instructions. None of that seems worth it for something like this. > An interesting side effect of your mail is that while researching a bit on > the subject I noticed __builtin_clzll() which helps improve the nbits code > compared to using popcount. Patch attached uses CLZ insted of POPCOUNT. It's annoying that fls() didn't make it into POSIX along with ffs(). On my system it compiles to a BSR instruction, just like __builtin_clz(). -- Thomas Munro http://www.enterprisedb.com
pgsql-hackers by date: