Hello again,
>>> - lrand48 (48 bits state as 3 uint16) is 29 ops
>>> (10 =, 8 *, 7 +, 4 >>)
>>
>> - xoshiro256** (256 bits states as 4 uint64) is 24 ops (18 if rot in hw)
>> 8 =, 2 *, 2 +, 5 <<, 5 ^, 2 |
>
> Small benchmark on my laptop with gcc-7.3 -O3:
>
> - pg_lrand48 takes 4.0 seconds to generate 1 billion 32-bit ints
> - xoshiro256** takes 1.6 seconds to generate 1 billion 64-bit ints
These too-good-to-be-true figures have raised doubt in my mind, so after
giving it some thought, I do not trust them: the function call is probably
inlined and other optimizations are performed which would not apply in a
more realistic case.
> With -O2 it is 4.8 and 3.4 seconds, respectively.
I trust this one, which is reasonably consistent with the operation count.
More tests with "clang -O2" yielded 3.2 and 1.6 seconds respectively, which
I do not trust either.
I then compiled the generators separately to prevent inlining and other
undesirable optimizations: clang -Ofast or -O2 gives 5.2 and 3.9 seconds,
gcc -Ofast gives 5.4 and 3.5 seconds.
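For reference, here is a rough sketch of the kind of timing loop I mean. It
is not the actual benchmark code: toy_next, time_calls and the NOINLINE
macro are names made up for this illustration, the generator is a plain
48-bit LCG step standing in for the real functions, and the noinline
attribute is only a single-file approximation of what separate compilation
achieves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Assumption: gcc/clang attribute, used here in place of a separate
 * compilation unit so that the call cannot be inlined into the loop. */
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical stand-in generator: one 48-bit LCG step with the usual
 * drand48-family constants, returning the 31 high bits of the state. */
static uint64_t lcg_state = 1;

NOINLINE uint64_t toy_next(void)
{
    lcg_state = (lcg_state * UINT64_C(0x5DEECE66D) + UINT64_C(0xB))
                & ((UINT64_C(1) << 48) - 1);
    return lcg_state >> 17;
}

/* Call gen() n times and return the elapsed CPU time in seconds.
 * The results are folded into *sink so the loop is not optimized away. */
double time_calls(uint64_t (*gen)(void), uint64_t n, uint64_t *sink)
{
    uint64_t acc = 0;
    clock_t start, stop;

    assert(gen != NULL && sink != NULL);
    start = clock();
    for (uint64_t i = 0; i < n; i++)
        acc ^= gen();
    stop = clock();
    *sink = acc;
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_calls(toy_next, 1000000000, &sink) from main() and printing
the result gives one figure per generator. The point is only that the
generator call must stay a real call for the numbers to mean anything.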
It seems that clang is better at compiling pg_erand48 but gcc is better at
xoshiro256**.
> So significantly better speed _and_ quality are quite achievable.
xoshiro256** is about 1/3 faster while producing twice as many pseudo-random
bits per call, and it passes significant randomness tests that an LCG PRNG
would not.
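For those who have not looked at it, here is a minimal sketch of the
xoshiro256** step, following the public-domain reference algorithm by
Blackman and Vigna. The struct and function names (xoshiro256ss_state,
xoshiro256ss_seed, xoshiro256ss_next) are mine, for illustration only, and
are not a proposed API.

```c
#include <assert.h>
#include <stdint.h>

/* 256 bits of state as 4 uint64; the names here are illustrative only. */
typedef struct
{
    uint64_t s[4];
} xoshiro256ss_state;

/* Rotate left; compilers usually turn this into a single instruction
 * when the hardware has one. Only valid for 0 < k < 64. */
static inline uint64_t rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* The state must not be all zeros, otherwise the generator is stuck. */
void xoshiro256ss_seed(xoshiro256ss_state *st,
                       uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
    assert((a | b | c | d) != 0);
    st->s[0] = a;
    st->s[1] = b;
    st->s[2] = c;
    st->s[3] = d;
}

/* One step: returns 64 pseudo-random bits and advances the state. */
uint64_t xoshiro256ss_next(xoshiro256ss_state *st)
{
    uint64_t *s = st->s;
    const uint64_t result = rotl64(s[1] * 5, 7) * 9;
    const uint64_t t = s[1] << 17;

    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl64(s[3], 45);

    return result;
}
```

In real use the four seed words should come from a decent source, for
instance another generator, rather than small constants.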
--
Fabien.