> About costs, not counting array accesses:
>
> - lrand48 (48 bits state as 3 uint16) is 29 ops
> (10 =, 8 *, 7 +, 4 >>)
> - xorshift+ (128 bits state as 2 uint64) is 13 ops
> ( 5 =, 0 *, 1 +, 3 >>, 4 ^)
> - xoroshiro128+ (idem) is 17 ops
> ( 6 =, 0 *, 1 +, 5 >>, 3 ^, 2 |, less if rot in hardware)
> - WELL512 (512 bits state as 16 uint32) is 38 ops
> (11 =, 0 *, 3 +, 7 >>, 10 ^, 4 &)
> probably much better, but probably slower than the current version
>
> I'd be of the (debatable) opinion that we could use xoroshiro128+, already
> used by various languages, even if it fails some specialized tests.

After some more digging, the better choice seems to be xoshiro256**
(xoshiro = xor shift rotate), which is optimized for 64-bit arithmetic:

- xoshiro256** (256 bits of state as 4 uint64) is 24 ops (18 if rot in hw)
(8 =, 2 *, 2 +, 5 <<, 5 ^, 2 |)
See http://vigna.di.unimi.it/xorshift/
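
For reference, here is a minimal C sketch of one xoshiro256** step, to show
where the multiplications, shifts, xors and rotations come from. It follows
the public-domain reference code published at the URL above as far as I
recall it. The names (xoshiro256ss_state, xoshiro256ss_next, rotl64) are mine
and purely illustrative, this is not a proposed patch, and how to seed the
state is left out. The state must not be all zeros.

```c
#include <stdint.h>

/* Hypothetical type name: 256 bits of state held as 4 uint64. */
typedef struct
{
	uint64_t	s[4];
} xoshiro256ss_state;

/* Rotate left; compilers usually turn this into one instruction
 * where the hardware has a rotate (valid for 0 < k < 64). */
static inline uint64_t
rotl64(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

/* Return the next 64-bit output and advance the state. */
static uint64_t
xoshiro256ss_next(xoshiro256ss_state *st)
{
	uint64_t   *s = st->s;

	/* "**" scrambler: two multiplications around a rotation */
	const uint64_t result = rotl64(s[1] * 5, 7) * 9;
	const uint64_t t = s[1] << 17;

	/* linear state update: xors, one shift, one rotation */
	s[2] ^= s[0];
	s[3] ^= s[1];
	s[1] ^= s[2];
	s[0] ^= s[3];
	s[2] ^= t;
	s[3] = rotl64(s[3], 45);

	return result;
}
```

With the state set to {1, 2, 3, 4}, the first three outputs worked out by
hand are 11520, 0 and 1509978240, which makes for a quick sanity check.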
--
Fabien.