Hello Yura,
>> However, I'm not enthusiastic about combining two methods depending on
>> the range, the function looks complex enough without that, so I would
>> suggest not to take this option. Also, the decision process adds to
>> the average cost, which is undesirable.
>
> Given that 99.99% of cases will take the likely path, the branch
> predictor should eliminate the decision cost.
Hmmm. ISTM that, unaided, a branch predictor would guess that
"unknown < small" is false, so a hint should be given that it is
actually expected to be true.
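
Something along these lines is what I have in mind (a sketch only:
likely() is the wrapper around __builtin_expect from c.h, while
SMALL_RANGE and the two method names are made up for illustration):

    /* hint the compiler/predictor that the cheap path is the norm */
    if (likely(range <= SMALL_RANGE))
        v = cheap_method(range);
    else
        v = general_method(range);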
> And as Dean Rasheed correctly mentioned, the mask method will have a
> really bad pattern for the branch predictor if the range is not just
> below or equal to a power of two.
On average the bitmask is the better unbiased method, if the online
figures are to be trusted. Also, as already said, I do not really want
to add code complexity, especially to get lower average performance, and
especially with code like "threshold = - range % range" where both
variables are unsigned; I get a headache just looking at it:-)
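
For the record, here is how I read that idiom (a sketch of the
modulo-threshold rejection method, not proposed code; pg_prng_u64() is
a stand-in for whatever 64-bit generator is used):

    #include <stdint.h>

    extern uint64_t pg_prng_u64(void);  /* assumed 64-bit PRNG source */

    /* unbiased value in [0, range), range > 0 */
    static uint64_t
    uniform_mod(uint64_t range)
    {
        /*
         * In unsigned arithmetic -range wraps to 2^64 - range, so
         * "-range % range" is 2^64 mod range: the count of leftover
         * values that would bias a plain "v % range".
         */
        uint64_t    threshold = -range % range;
        uint64_t    v;

        do
            v = pg_prng_u64();
        while (v < threshold);      /* rejection probability < 1/2 */

        return v % range;
    }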
> And __builtin_clzl is no free lunch either, it has a latency of 3-4
> cycles on modern processors.
Well, % is not cheap either.
> Well, it could probably run in parallel with some part of xoroshiro,
> but it depends on how the compiler optimizes this function.
>
>> I would certainly select the unbiased multiply method if we want a
>> u32 range variant.
>
> There could be two functions.
Yep, but do we need both? Who is likely to want 32-bit pseudo-random
ints in a range? pgbench needs 64 bits.
So I'm still inclined to just keep the faster-on-average bitmask method,
even though it may be slower for some ranges. The worst case is a range
just above a power of two, where each draw is kept with probability just
over 1/2, so the average number of PRNG calls is, if I'm not mistaken,
at most:
 1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... = 2
which does not look too bad.
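
For completeness, the bitmask method I mean is essentially this (a
sketch; same assumed pg_prng_u64() source as above, and __builtin_clzl
assumes a 64-bit long):

    #include <stdint.h>

    extern uint64_t pg_prng_u64(void);  /* assumed 64-bit PRNG source */

    /* unbiased value in [0, range), range > 0 */
    static uint64_t
    uniform_mask(uint64_t range)
    {
        /* smallest all-ones mask covering range - 1 */
        uint64_t    mask = (range <= 1) ? 0 :
            UINT64_MAX >> __builtin_clzl(range - 1);
        uint64_t    v;

        do
            v = pg_prng_u64() & mask;
        while (v >= range);     /* kept with probability > 1/2 */

        return v;
    }

No division, one clz per call, and fewer than 2 PRNG calls on average
even in the worst case.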
--
Fabien.