Re: rand48 replacement - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: rand48 replacement
Date
Msg-id 8904daa4a916b4990b11d7140e0ff81f@postgrespro.ru
Whole thread Raw
In response to Re: rand48 replacement  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: rand48 replacement  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Re: rand48 replacement  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
Fabien COELHO писал 2021-07-06 23:49:
> Hello Yura,
> 
>>> However, I'm not enthousiastic at combining two methods depending on
>>> the range, the function looks complex enough without that, so I would
>>> suggest not to take this option. Also, the decision process adds to
>>> the average cost, which is undesirable.
>> 
>> Given 99.99% cases will be in the likely case, branch predictor should
>> eliminate decision cost.
> 
> Hmmm. ISTM that a branch predictor should predict that unknown < small
> should probably be false, so a hint should be given that it is really
> true.

Why? Branch predictor is history based: if it were usually true here
then it will be true this time either.
unknown < small is usually true therefore branch predictor will assume
it is true.

I put `likely` for compiler: compiler then puts `likely` path closer.

> 
>> And as Dean Rasheed correctly mentioned, mask method will have really 
>> bad pattern for branch predictor if range is not just below or equal 
>> to power of two.
> 
> On average the bitmask is the better unbiased method, if the online
> figures are to be trusted. Also, as already said, I do not really want
> to add code complexity, especially to get lower average performance,
> and especially with code like "threshold = - range % range", where
> both variables are unsigned, I have a headache just looking at it:-)

If you mention https://www.pcg-random.org/posts/bounded-rands.html then
1. first graphs are made with not exact Lemire's code.
   Last code from 
https://lemire.me/blog/2016/06/30/fast-random-shuffling/
   (which I derived from) performs modulo operation only if `(leftover < 
range)`.
   Even with `rand_range(1000000)` it is just once in four thousands 
runs.
2. You can see "Small-Constant Benchmark" at that page, Debiased Int is
   1.5 times faster. And even in "Small-Shuffle" benchmark their 
unoptimized
   version is on-par with mask method.
3. If you go to "Making Improvements/Faster Threshold-Based Discarding"
   section, then you'll see code my version is matched with. It is twice
   faster than masked method in Small-Shuffle benchmark, and just a bit
   slower in Large-Shuffle.

> 
>> And __builtin_clzl is not free lunch either, it has latency 3-4 cycles
>> on modern processor.
> 
> Well, % is not cheap either.
> 
>> Well, probably it could run in parallel with some part of xoroshiro, 
>> but it depends on how compiler will optimize this function.
>> 
>>> I would certainly select the unbias multiply method if we want a u32 
>>> range variant.
>> 
>> There could be two functions.
> 
> Yep, but do we need them? Who is likely to want 32 bits pseudo random
> ints in a range? pgbench needs 64 bits.
> 
> So I'm still inclined to just keep the faster-on-average bitmask
> method, despite that it may be slower for some ranges. The average
> cost for the worst case in PRNG calls is, if I'm not mistaken:
> 
>   1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2
> 
> which does not look too bad.

You doesn't count cost of branch-misprediction.
https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array
https://lemire.me/blog/2019/10/15/mispredicted-branches-can-multiply-your-running-times/
Therefore calculation should be at least:

    1 * 0.5 + 0.5 * (3 + 0.5 * (3 + ...)) = 6

By the way, we have 64bit random. If we use 44bit from it for range <= 
(1<<20), then
bias will be less than 1/(2**24). Could we just ignore it (given it is 
not crypto
strong random)?

uint64 pg_prng_u64_range(pg_prng_state *state, uint64 range)
{
   uint64 val = xoroshiro128ss(state);
   uint64 m;
   if ((range & (range-1) == 0) /* handle all power 2 cases */
     return range != 0 ? val & (range-1) : 0;
   if (likely(range < (1<<20)))
      /*
       * While multiply method is biased, bias will be smaller than 
1/(1<<24) for
       * such small ranges. Lets ignore it.
       */
      return ((val >> 20) * range) >> 44;
   /* Apple's mask method */
   m = mask_u64(range-1);
   val &= m;
   while (val >= range)
     val = xoroshiro128ss(state) & m;
   return val;
}

Anyway, excuse me for heating this discussion cause of such 
non-essential issue.
I'll try to control myself and don't proceed it further.

regards,
Sokolov Yura.



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: [PATCH] expand the units that pg_size_pretty supports on output
Next
From: Amit Kapila
Date:
Subject: Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options