Re: pgbench - add pseudo-random permutation function - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench - add pseudo-random permutation function
Date
Msg-id alpine.DEB.2.21.1809260717020.22248@lancre
Whole thread Raw
In response to Re: pgbench - add pseudo-random permutation function  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: pgbench - add pseudo-random permutation function
List pgsql-hackers
Hello Thomas,

> modular_multiply() is an interesting device.  I will leave this to
> committers with a stronger mathematical background than me, but I have
> a small comment in passing:

For testing this function, I have manually commented out the various 
shortcuts so that the manual multiplication code is always used, and the 
tests passed. I just did it again.

> +#ifdef HAVE__BUILTIN_POPCOUNTLL
> + return __builtin_popcountll(n);
> +#else /* use clever no branching bitwise operator version */
>
> I think it is not enough for the compiler to have 
> __builtin_popcountll().  The CPU that this is eventually executed on 
> must actually have the POPCNT instruction[1] (or equivalent on ARM, 
> POWER etc), or the program will die with SIGILL.

Hmmm, I'd be pretty surprised: The point of the builtin is to delegate to 
the compiler the hassle of selecting the best option available, depending 
on target hardware class: whether it issues a cpu/sse4 instruction is 
not/should not be our problem.

> There are CPUs in circulation produced in this decade that don't have 
> it.

Then the compiler, when generating code that is expected to run for a 
large class of hardware which include such old ones, should not use a 
possibly unavailable instruction, or the runtime should take 
responsability for dynamically selecting a viable option.

My understanding is that it should always be safe to call __builtin_XYZ 
functions when available. Then if you compile saying that you want code 
specific to cpu X and then run it on cpu Y and it fails, basically you 
just shot yourself in the foot.

> I have previously considered something like this[2], but realised you 
> would therefore need a runtime check.  There are a couple of ways to do 
> that: see commit a7a73875 for one example, also 
> __builtin_cpu_supports(), and direct CPU ID bit checks (some platforms). 
> There is also the GCC "multiversion" system that takes care of that for 
> you but that worked only for C++ last time I checked.

ISTM that the purpose of a dynamic check would be to have the better 
hardware benefit even when compiling for a generic class of hardware which 
may or may not have the better option.

This approach is fine for reaching a better performance/portability 
compromise, but I do not think that it is that useful for pgbench to go to 
this level of optimization, unless someone else takes care, hence the 
compiler builtin.

An interesting side effect of your mail is that while researching a bit on 
the subject I noticed __builtin_clzll() which helps improve the nbits code 
compared to using popcount. Patch attached uses CLZ insted of POPCOUNT.

-- 
Fabien.
Attachment

pgsql-hackers by date:

Previous
From: Chris Travers
Date:
Subject: Re: Proposal for Signal Detection Refactoring
Next
From: Fabien COELHO
Date:
Subject: Re: Progress reporting for pg_verify_checksums