>> - lrand48 (48 bits state as 3 uint16) is 29 ops
>> (10 =, 8 *, 7 +, 4 >>)
>
> - xoshiro256** (256 bits states as 4 uint64) is 24 ops (18 if rot in hw)
> 8 =, 2 *, 2 +, 5 <<, 5 ^, 2 |
>
> See http://vigna.di.unimi.it/xorshift/

Small benchmark on my laptop with gcc-7.3 -O3:
- pg_lrand48 takes 4.0 seconds to generate 1 billion 32-bit ints
- xoshiro256** takes 1.6 seconds to generate 1 billion 64-bit ints
With -O2 it is 4.8 and 3.4 seconds, respectively. So significantly better
speed _and_ quality are quite achievable.
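
For reference, here is a minimal sketch of the kind of code being timed. The xoshiro256** step follows the public reference algorithm from the page linked above. The struct and function names, the splitmix64 seeding, and the timing helper are assumptions for illustration, not the code actually benchmarked:

```c
#include <stdint.h>
#include <time.h>

/* Generator state: 256 bits as four 64-bit words (must not be all zero). */
typedef struct { uint64_t s[4]; } xoshiro256ss_state;

/* Rotate left; compilers typically turn this into a single instruction. */
static inline uint64_t rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* One step of xoshiro256**: returns 64 pseudo-random bits. */
uint64_t xoshiro256ss_next(xoshiro256ss_state *st)
{
    uint64_t *s = st->s;
    const uint64_t result = rotl64(s[1] * 5, 7) * 9;
    const uint64_t t = s[1] << 17;

    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl64(s[3], 45);

    return result;
}

/* Fill the state from a 64-bit seed with splitmix64 (assumed seeding choice). */
void xoshiro256ss_seed(xoshiro256ss_state *st, uint64_t seed)
{
    for (int i = 0; i < 4; i++)
    {
        uint64_t z = (seed += UINT64_C(0x9e3779b97f4a7c15));
        z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
        z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
        st->s[i] = z ^ (z >> 31);
    }
}

/* Hypothetical timing helper: draws n values and returns CPU seconds.
 * The XOR into *sink keeps the optimizer from deleting the loop. */
double bench_xoshiro256ss(uint64_t n, uint64_t *sink)
{
    xoshiro256ss_state st;
    uint64_t acc = 0;
    clock_t start;

    xoshiro256ss_seed(&st, 42);
    start = clock();
    for (uint64_t i = 0; i < n; i++)
        acc ^= xoshiro256ss_next(&st);
    *sink = acc;
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling bench_xoshiro256ss(1000000000, &sink) from a main() and printing the result would give a figure of the same kind as those quoted here; absolute numbers depend on the machine and compiler flags.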

Note that a small attempt at optimizing these functions (inlining the
constants, replacing the array with scalars) did not yield significant
improvements.
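
To illustrate what a scalar-state variant can look like on the lrand48 side, here is a sketch of the 48-bit linear congruential recurrence that POSIX specifies for the drand48 family, with the state held in one 64-bit scalar instead of three uint16. This is not the pg_lrand48 code and not necessarily the variant I measured; the names are hypothetical:

```c
#include <stdint.h>

#define LCG48_A    UINT64_C(0x5DEECE66D)        /* POSIX drand48 multiplier */
#define LCG48_C    UINT64_C(0xB)                /* POSIX drand48 addend */
#define LCG48_MASK ((UINT64_C(1) << 48) - 1)    /* keep the low 48 bits */

/* Hypothetical type: state in one 64-bit scalar, only the low 48 bits used. */
typedef struct { uint64_t x; } lcg48_state;

/* Same seeding rule as srand48: seed in the high 32 bits, 0x330E below. */
void lcg48_seed(lcg48_state *st, uint32_t seed)
{
    st->x = ((uint64_t) seed << 16) | 0x330E;
}

/* Advance X = (a * X + c) mod 2^48 and return the top 31 bits, as lrand48 does.
 * The 64-bit product may wrap, but its low 48 bits are still exact. */
uint32_t lcg48_next(lcg48_state *st)
{
    st->x = (LCG48_A * st->x + LCG48_C) & LCG48_MASK;
    return (uint32_t) (st->x >> 17);
}
```

Each call costs one multiply, one add, one mask and one shift on a 64-bit machine, which makes the comparison with the xoshiro256** step fairly direct.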
--
Fabien.