Re: rand48 replacement - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: rand48 replacement
Date
Msg-id 3136e2672d5ab395521bae05a80df469@postgrespro.ru
Whole thread Raw
In response to Re: rand48 replacement  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: rand48 replacement  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
Fabien COELHO писал 2021-07-04 23:29:
>> The important property of determinism is that if I set a seed, and 
>> then make an identical set of calls to the random API, the results 
>> will be identical every time, so that it's possible to write tests 
>> with predictable/repeatable results.
> 
> Hmmm… I like my stronger determinism definition more than this one:-)
> 
>>> I would work around that by deriving another 128 bit generator 
>>> instead of splitmix 64 bit, but that is overkill.
>> 
>> Not really relevant now, but I'm pretty sure that's impossible to do.
>> You might try it as an interesting academic exercise, but I believe
>> it's a logical impossibility.
> 
> Hmmm… I was simply thinking of seeding a new pg_prng_state from the
> main pg_prng_state with some transformation, and then iterate over the
> second PRNG, pretty much like I did with splitmix, but with 128 bits
> so that your #states argument does not apply, i.e. something like:
> 
>  /* select in a range with bitmask rejection */
>  uint64 pg_prng_u64_range(pg_prng_state *state, uint64 range)
>  {
>     /* always advance state once */
>     uint64 next = xoroshiro128ss(state);
>     uint64 val;
> 
>     if (range >= 2)
>     {
>         uint64 mask = mask_u64(range-1);
> 
>         val = next & mask;
> 
>         if (val >= range)
>         {
>             /* copy and update current prng state */
>             pg_prng_state iterator = *state;
> 
>             iterator.s0 ^= next;
>             iterator.s1 += UINT64CONST(0x9E3779B97f4A7C15);
> 
>             /* iterate till val in [0, range) */
>             while ((val = xoroshiro128ss(&iterator) & mask) >= range)
>                 ;
>         }
>     }
>     else
>         val = 0;
> 
>     return val;
>  }
> 
> The initial pseudo-random sequence is left to proceed, and a new PRNG
> is basically forked for iterating on the mask, if needed.

I believe most "range" values are small, much smaller than UINT32_MAX.
In this case, according to [1] fastest method is Lemire's one (I'd take
original version from [2])

Therefore combined method pg_prng_u64_range could branch on range value

uint64 pg_prng_u64_range(pg_prng_state *state, uint64 range)
{
   uint64 val = xoroshiro128ss(state);
   uint64 m;
   if ((range & (range-1) == 0) /* handle all power 2 cases */
     return range != 0 ? val & (range-1) : 0;
   if (likely(range < PG_UINT32_MAX/32)
   {
     /*
      * Daniel Lamire's unbiased range random algorithm based on 
rejection sampling
      * https://lemire.me/blog/2016/06/30/fast-random-shuffling/
      */
     m = (uint32)val * range;
     if ((uint32)m < range)
     {
       uint32 t = -range % range;
       while ((uint32)m < t)
         m = (uint32)xoroshiro128ss(state) * range;
     }
     return m >> 32;
   }
   /* Apple's mask method */
   m = mask_u64(range-1);
   val &= m;
   while (val >= range)
     val = xoroshiro128ss(state) & m;
   return val;
}

Mask method could also be faster when range is close to mask.
For example, fast check for "range is within 1/1024 to mask" is
   range < (range/512 + (range&(range*2)))

And then method choose could like:
   if (likely(range < UINT32_MAX/32 && range > (range/512 + 
(range&(range*2)))))

But I don't know does additional condition worth difference or not.

[1] https://www.pcg-random.org/posts/bounded-rands.html
[2] https://lemire.me/blog/2016/06/30/fast-random-shuffling/

regards,
Sokolov Yura



pgsql-hackers by date:

Previous
From: Masahiro Ikeda
Date:
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Next
From: Ronan Dunklau
Date:
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates