Home > mailing lists

Re: rand48 replacement - Mailing list pgsql-hackers

From	Fabien COELHO
Subject	Re: rand48 replacement
Date	July 8, 2021 12:31:16
Msg-id	alpine.DEB.2.22.394.2107072036140.4160419@pseudo Whole thread
In response to	Re: rand48 replacement (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses	Re: rand48 replacement
List	pgsql-hackers

Tree view

Hello Yura,

>>> Given 99.99% cases will be in the likely case, branch predictor should
>>> eliminate decision cost.
>> 
>> Hmmm. ISTM that a branch predictor should predict that unknown < small
>> should probably be false, so a hint should be given that it is really
>> true.
>
> Why? Branch predictor is history based:

Hmmm. This means running the compiler with some special options, running 
the code on significant and representative data, then recompiling based on 
collected branch stats. This is not the usual way pg is built.

> if it were usually true here then it will be true this time either. 
> unknown < small is usually true therefore branch predictor will assume 
> it is true.
>
> I put `likely` for compiler: compiler then puts `likely` path closer.

Yes, an explicit hint is needed.

>>> And as Dean Rasheed correctly mentioned, mask method will have really bad 
>>> pattern for branch predictor if range is not just below or equal to power 
>>> of two.
>> 
>> On average the bitmask is the better unbiased method, if the online
>> figures are to be trusted. Also, as already said, I do not really want
>> to add code complexity, especially to get lower average performance,
>> and especially with code like "threshold = - range % range", where
>> both variables are unsigned, I have a headache just looking at it:-)
>
> If you mention https://www.pcg-random.org/posts/bounded-rands.html then

Indeed, this is the figures I was refering to when saying that bitmask 
looks the best method.

> 1. first graphs are made with not exact Lemire's code.
>  Last code from https://lemire.me/blog/2016/06/30/fast-random-shuffling/

Ok, other figures, however there is no comparison with the mask method in 
this post, it mostly argues agains division/modulo.

> By the way, we have 64bit random. If we use 44bit from it for range <= 
> (1<<20), then bias will be less than 1/(2**24). Could we just ignore it 
> (given it is not crypto strong random)?

That was my initial opinion, by Dean insists on an unbiased method. I 
agree with Dean that performance, if it is not too bad, does not matter 
that much, so that I'm trying to keep the code simple as a main objective.

You do not seem ready to buy this argument. Note that despite that my 
research is about compiler optimizations, I did bought it:-)

Given the overheads involved in pgbench, the performance impact of best vs 
worst case scenario is minimal:

   \set i random(0, 7) -- 8 values, good for mask: 4.515 Mtps
   \set i random(0, 8) -- 9 values, bad for mask: 4.151 Mtps

sp the performance penalty is about 8%.

>  if ((range & (range-1) == 0) /* handle all power 2 cases */
>    return range != 0 ? val & (range-1) : 0;
>  if (likely(range < (1<<20)))
>     /*
>      * While multiply method is biased, bias will be smaller than 1/(1<<24) 
> for
>      * such small ranges. Lets ignore it.
>      */
>     return ((val >> 20) * range) >> 44;
>  /* Apple's mask method */
>  m = mask_u64(range-1);
>  val &= m;
>  while (val >= range)
>    val = xoroshiro128ss(state) & m;
>  return val;
> }

Hmmm. The code looks "simple" enough, but I do not like optimizing for 20 
bits values is worth it, especially as the bitmask method seems the best 
anyway. We were discussing 32 bits before.

> Anyway, excuse me for heating this discussion cause of such 
> non-essential issue.

Well, I like to discuss things!

> I'll try to control myself and don't proceed it further.

Yep. We have to compromise at some point. The majority opinion seems to be 
that we want code simplicity more, so the bitmask it is. I've posted a 
v10.

Thanks for the interesting discussion and arguments!

-- 
Fabien.

pgsql-hackers by date:

From: Fabien COELHO
Date: 08 July 2021, 12:19:38
Subject: Re: rand48 replacement

From: Filip Janus
Date: 08 July 2021, 12:33:33
Subject: SHA-1 FIPS - compliance

Re: rand48 replacement - Mailing list pgsql-hackers

Previous

Next