Re: rand48 replacement - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: rand48 replacement |
Date | |
Msg-id | alpine.DEB.2.22.394.2107041718570.2359988@pseudo Whole thread Raw |
In response to | Re: rand48 replacement (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: rand48 replacement
|
List | pgsql-hackers |
> Now suppose we want a random number in the range [0,6). This is what > happens with your algorithm for each of the possible prng() return > values: > > prng() returns 0 -- OK > prng() returns 1 -- OK > prng() returns 2 -- OK > prng() returns 3 -- OK > prng() returns 4 -- OK > prng() returns 5 -- OK > prng() returns 6 -- out of range so use splitmix3() with initial state=6: > state=6 -> 3, return 7 -- still out of range, so repeat > state=3 -> 0, return 3 -- now OK > prng() returns 7 -- out of range so use splitmix3() with initial state=7: > state=7 -> 4, return 4 -- now OK > > So, assuming that prng() chooses each of the 8 possible values with > equal probability (1/8), the overall result is that the values 0,1,2 > and 5 are returned with a probability of 1/8, whereas 3 and 4 are > returned with a probability of 2/8. Ok, I got that explanation. > Using the correct implementation of the bitmask algorithm, each > iteration calls prng() again, so in the end no particular return value > is ever more likely than any other (hence it's unbiased). Ok, you're taking into account the number of states of the PRNG, so this finite number implies some bias on some values if you actually enumerate all possible cases, as you do above. I was reasoning "as if" the splitmix PRNG was an actual random function, which is obviously false, but is also somehow a usual (false) assumption with PRNGs, and with this false assumption my implementation is perfect:-) The defect of the modulo method for range extraction is that even with an actual (real) random generator the results would be biased. The bias is in the method itself. Now you are arguing for a bias linked to the internals of the PRNG. They are not the same in nature, even if the effect is the same. Also the bias is significant for close to the limit ranges, which is not the kind of use case I have in mind when looking at pgbench. If you consider the PRNG internals, then splitmix extraction function could also be taken into account. If it is not invertible (I'm unsure), then assuming it is some kind of hash function, about 1/e of output values would not reachable, which is yet another bias that you could argue against. Using the initial PRNG works better only because the underlying 128 bits state is much larger than the output value. Which is the point for having a larger state in the first place, anyway. > As for determinism, the end result is still fully deterministic. For > example, lets say that prng() returns the following sequence, for some > initial state: > > 1,0,3,0,3,7,4,7,6,6,5,3,7,7,7,0,3,6,5,2,3,3,4,0,0,2,7,4,... > > then the bitmask method just returns that sequence with all the 6's > and 7's removed: > > 1,0,3,0,3,4,5,3,0,3,5,2,3,3,4,0,0,2,4,... > > and that same sequence will always be returned, when starting from > that initial seed. Yes and no. The result is indeed deterministic of you call the function with the same range. However, if you change the range value in one place then sometimes the state can advance differently, so the determinism is lost, meaning that it depends on actual range values. Attached a v7 which does as you wish, but also looses the deterministic non-value dependent property I was seeking. I would work around that by deriving another 128 bit generator instead of splitmix 64 bit, but that is overkill. -- Fabien.
Attachment
pgsql-hackers by date: