Hello Dean,
Thanks a lot for this work. This version looks much better than the
previous one you sent, and has significant advantages over the one I sent,
in particular avoiding the prime number stuff and large modular multiply.
So this looks good!
I'm happy that halves-xoring is back because it was an essential part of
the design. ISTM that the previous version you sent only had linear/affine
transforms which was a bad idea.
The linear transform is moved inside halves, why not, and the
any-odd-number multiplication is prime with power-of-two trick is neat on
these.
However I still have some reservations:
First, I have a thing against erand48. The erand 48 bits design looks like
something that made sense when computer architectures where 16/32 bits and
large integers were 32 bits, so a 16 bits margin looked good enough. Using
a int16[3] type now is really depressing and probably costly on modern
architectures. With 64 bits architecture, and given that we are
manipulating 64 bits integers (well 63 really), ISTM that the minimal
state size for a PRNG should be at least 64 bits. It there a good reason
NOT to use a 64 bits state prn generator? I took Knuth's because it is
cheap and 64 bits. I'd accept anything which is at least 64 bits. I looked
at xor-shift things, but there was some controversy around these so I kept
it simple and just shifted small values to get rid of the obvious cycles
on Knuth's generator.
Second, I have a significant reservation about the very structure of the
transformation in this version:
loop 4 times :
// FIRST HALF STEER
m/r = pseudo randoms
if v in first "half"
v = ((v * m) ^ r) & mask;
rotate1(v)
// FULL SHIFT 1
r = pseudo random
v = (v + r) % size
// SECOND HALF STEER
m/r = pseudo randoms
if v in second "half"
same as previous on second half
// FULL SHIFT 2
r = pseudo random
v = (v + r) % size
I'm really at odds with FULL SHIFT 1, because it means that up to 1/256 of
values are kept out of STEERING. Whole chunks of values could be kept
unshuffled because they would only have SHIFTS apply to them and each time
fall in the not steered half. It should be an essential part of the design
that at least one steer is applied on a value at each round, and if two
are applied then fine, but certainly not zero. So basically I think that
the design would be significantly improved by removing "FULL SHIFT 1".
Third, I think that the rotate code can be simplified, in particular the
?: should be avoided because it may induce branches quite damaging to
processor performance.
I'll give it some more thoughts.
--
Fabien.