>> The first value is taken about 75% of the time for N=1000 and s=2.5, which
>> means that a non-deterministic implementation would succeed about 0.75² ~
>> 56% of the time on that one.
>
> Right, that's about what we've been seeing on OpenBSD.
>
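For anyone wanting to recheck these figures: with p_k = k^-s / H(n,s), the
most likely value has probability p_1 = 1 / H(n,s). A tiny standalone
program (just a sketch to verify the arithmetic, not pgbench code):

#include <stdio.h>
#include <math.h>

int main(void)
{
    int    n = 1000;
    double s = 2.5;
    double h = 0.0;

    /* generalized harmonic number H(n,s) = sum_k k^-s */
    for (int k = 1; k <= n; k++)
        h += pow(k, -s);

    /* most likely value, and two independent draws both landing on it */
    printf("p1 = %.2f, p1^2 = %.2f\n", 1.0 / h, 1.0 / (h * h));
    return 0;
}

This prints about p1 = 0.75 and p1^2 = 0.56, matching the quoted numbers.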
>> Also, the drawing procedure is less efficient when the parameter is close
>> to 1 because it is more likely to loop,
>
> That might be something to fix, but I agree it's a reason not to go
> overboard trying to flatten the test case's distribution right now.
Probably one would have to invent a new method for drawing from a zipfian
distribution to fix that, which would be nice.
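For context, here is a sketch of the usual rejection method for drawing
zipfian values with s > 1 (after Devroye, "Non-Uniform Random Variate
Generation"); the helper names and the uniform generator are illustrative,
this is not the actual pgbench code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* illustrative uniform in (0,1); pgbench uses its own PRNG */
static double
uniform01(void)
{
    return (rand() + 1.0) / ((double) RAND_MAX + 2.0);
}

/* one zipfian(n, s) draw by rejection, for s > 1 */
static long
zipf_draw(long n, double s)
{
    double b = pow(2.0, s - 1.0);

    for (;;)
    {
        double u = uniform01();
        double v = uniform01();
        double x = floor(pow(u, -1.0 / (s - 1.0)));
        double t = pow(1.0 + 1.0 / x, s - 1.0);

        /*
         * Envelope acceptance test; x > n is also rejected.  The closer
         * s is to 1, the heavier the tail of the proposal x, so the
         * x > n rejection fires more often and the loop runs longer,
         * which is the inefficiency mentioned above.
         */
        if (x <= n && v * x * (t - 1.0) / (b - 1.0) <= t / b)
            return (long) x;
    }
}

int main(void)
{
    for (int i = 0; i < 10; i++)
        printf("%ld\n", zipf_draw(1000, 1.5));
    return 0;
}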
>> If you want something more drastic, using 1.5 instead of 2.5 would reduce
>> the probability of accidentally passing the test by chance to about 20%, so
>> it would fail 80% of the time.
>
> I think your math is off;
Argh. Although I confirm my computation, ISTM that with 1.5 the first
value has a 39% chance of coming out, hence a collision on it in about
0.39² ~ 15% of cases; the second value has a 14% chance, hence a collision
in about 2% of cases; ... total cumulated probability about 18%.
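Pushing the same computation over all values gives the total collision
probability, sum over k of p_k². Adapting the small program from earlier
in the thread (again, just a sketch):

#include <stdio.h>
#include <math.h>

int main(void)
{
    int    n = 1000;
    double s = 1.5;
    double h = 0.0, coll = 0.0;

    for (int k = 1; k <= n; k++)
        h += pow(k, -s);            /* H(n,s) */

    for (int k = 1; k <= n; k++)
    {
        double p = pow(k, -s) / h;  /* p_k */
        coll += p * p;              /* both draws land on value k */
    }

    printf("p1 = %.2f, collision = %.2f\n", 1.0 / h, coll);
    return 0;
}

This prints about p1 = 0.39 and collision = 0.18, i.e. the ~18% total above.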
> 1.5 works quite well here. I saw one failure to produce distinct values
> in 20 attempts.
With about 3 failures expected over 20 attempts (0.18 * 20 ~ 3.6), seeing
only one is quite possible.
> It's not demonstrably slower than 2.5 either. (1.1 is measurably
> slower; probably not by enough for anyone to care, but 1.5 is good
> enough for me.)
Good, as long as it fails quickly enough for you.
--
Fabien.