Re: rand48 replacement - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: rand48 replacement
Date
Msg-id CAEZATCXcmccPQismOioD5kr2NMyMoeLpDaesizzoKh8M-BK5Dw@mail.gmail.com
Whole thread Raw
In response to Re: rand48 replacement  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: rand48 replacement  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Sun, 4 Jul 2021 at 10:35, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> I did not understand why it is not correct.
>

Well, to make it easier to visualise, let's imagine our word size is
just 3 bits instead of 64 bits, and that the basic prng() function
generates numbers in the range [0,8). Similarly, imagine a splitmix3()
that operates on 3-bit values. So it might do something like this
(state offset and return values made up):

splitmix3(state):
  state=0 -> 5, return 2
  state=1 -> 6, return 5
  state=2 -> 7, return 0
  state=3 -> 0, return 3
  state=4 -> 1, return 6
  state=5 -> 2, return 1
  state=6 -> 3, return 7
  state=7 -> 4, return 4

Now suppose we want a random number in the range [0,6). This is what
happens with your algorithm for each of the possible prng() return
values:

  prng() returns 0 -- OK
  prng() returns 1 -- OK
  prng() returns 2 -- OK
  prng() returns 3 -- OK
  prng() returns 4 -- OK
  prng() returns 5 -- OK
  prng() returns 6 -- out of range so use splitmix3() with initial state=6:
    state=6 -> 3, return 7 -- still out of range, so repeat
    state=3 -> 0, return 3 -- now OK
  prng() returns 7 -- out of range so use splitmix3() with initial state=7:
    state=7 -> 4, return 4 -- now OK

So, assuming that prng() chooses each of the 8 possible values with
equal probability (1/8), the overall result is that the values 0,1,2
and 5 are returned with a probability of 1/8, whereas 3 and 4 are
returned with a probability of 2/8.

Using the correct implementation of the bitmask algorithm, each
iteration calls prng() again, so in the end no particular return value
is ever more likely than any other (hence it's unbiased).

As for determinism, the end result is still fully deterministic. For
example, lets say that prng() returns the following sequence, for some
initial state:

  1,0,3,0,3,7,4,7,6,6,5,3,7,7,7,0,3,6,5,2,3,3,4,0,0,2,7,4,...

then the bitmask method just returns that sequence with all the 6's
and 7's removed:

  1,0,3,0,3,4,5,3,0,3,5,2,3,3,4,0,0,2,4,...

and that same sequence will always be returned, when starting from
that initial seed.

Regards,
Dean



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Reducing the cycle time for CLOBBER_CACHE_ALWAYS buildfarm members
Next
From: Michael Paquier
Date:
Subject: Re: Mention --enable-tap-tests in the TAP section page