Hello Dean,
> It may be true that the bias is of the same magnitude as FP multiply,
> but it is not of the same nature. With FP multiply, the
> more-likely-to-be-chosen values are more-or-less evenly distributed
> across the range, whereas modulo concentrates them all at one end,
> making it more likely to bias test results.
Yes, that is true.
> It's worth paying attention to how other languages/libraries implement
> this, and basically no one chooses the modulo method, which ought to
> raise alarm bells. Of the biased methods, it has the worst kind of
> bias and the worst performance.
Hmmm. That is not exactly how I interpreted the figures in the blog.
> If a biased method is OK, then the biased integer multiply method
> seems to be the clear winner. This requires the high part of a
> 64x64-bit product, which is trivial if 128-bit integers are available,
> but would need a little more work otherwise. There's some code in
> common/d2s that might be suitable.
And yes, modulo is expensive. If we allow 128 bits integers operations, I
would not choose this RNPG in the first place, I'd take PCG with a 128 bits state.
That does not change the discussion about bias, though.
> Most other implementations tend to use an unbiased method though, and I
> think it's worth doing the same. It might be a bit slower, or even
> faster depending on implementation and platform, but in the context of
> the DB as a whole, I don't think a few extra cycles matters either way.
Ok ok ok, I surrender!
> The method recommended at the very end of that blog seems to be pretty
> good all round, but any of the other commonly used unbiased methods
> would probably be OK too.
That does not address my other issues with the proposed methods, in
particular the fact that the generated sequence is less deterministic, but
I think I have a simple way around that. I'm hesitating to skip to the the
bitmask method, and give up performance uniformity. I'll try to come up
with something over the week-end, and also address Tom's comments in
passing.
--
Fabien.