Thread: rand48 replacement
Hello pg-devs,

I have given a go at proposing a replacement for rand48.

POSIX 1988 (?) rand48 is an LCG PRNG designed to generate 32-bit integers or floats from a 48-bit state, on 16- or 32-bit architectures. An LCG cycles on its low bits, which can be quite annoying. Given that we run on 64-bit architectures and that we need to generate 64-bit ints or doubles, IMHO it makes very little sense to stick to that.

We should (probably) want:
 - one reasonable default PRNG for all pg internal uses.
 - NOT to invent a new design!
 - something fast, close to rand48 (which basically does 2 arithmetic
   ops, so it is hard to compete); no need for something cryptographic
   though, which would imply slow.
 - to produce 64-bit integers & doubles with a 52-bit mantissa,
   so state size > 64 bits.
 - a small state though, because we might generate quite a few of them
   for different purposes, so state size <= 256 or even <= 128 bits.
 - the state to be aligned to whatever => 128 bits.
 - 64-bit operations for efficiency on modern architectures,
   but not 128-bit operations.
 - not to depend on special hardware for speed (eg MMX/SSE/AES).
 - not something with obvious known and relevant defects.
 - not something with "rights" attached.

These constraints drastically reduce the available options from https://en.wikipedia.org/wiki/List_of_random_number_generators

The attached patch removes "rand48", adds a "pg_prng" implementation based on xoroshiro128ss, and replaces the former everywhere. In pgbench, the non-portable double-relying code is replaced by hopefully portable ints. The interface makes it easy to replace the underlying PRNG if something else is desired.

Thanks for your feedback.

--
Fabien.
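For reference, the xoroshiro128** step the patch builds on is tiny. Here is a sketch of the public-domain reference algorithm (Blackman & Vigna), adapted to a hypothetical two-field state struct -- the struct and function names here are assumptions, not necessarily what the patch uses:

    #include <stdint.h>

    typedef struct { uint64_t s0, s1; } prng_state;

    static inline uint64_t rotl64(uint64_t x, int k)
    {
        return (x << k) | (x >> (64 - k));
    }

    /* one xoroshiro128** step: returns 64 pseudo-random bits and advances the state */
    static uint64_t xoroshiro128ss_next(prng_state *s)
    {
        uint64_t s0 = s->s0;
        uint64_t s1 = s->s1;
        uint64_t result = rotl64(s0 * 5, 7) * 9;

        s1 ^= s0;
        s->s0 = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
        s->s1 = rotl64(s1, 37);
        return result;
    }

It is just a handful of 64-bit multiplies, rotations, shifts and xors per 64 bits of output, which is why it can stay within a few percent of rand48.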
Attachment
Hi,

On 5/24/21 12:31 PM, Fabien COELHO wrote:
>
> Hello pg-devs,
>
> I have given a go at proposing a replacement for rand48.
>

So what is the motivation for replacing rand48? Speed, quality of the produced random numbers, features rand48 can't provide, or what?

> POSIX 1988 (?) rand48 is an LCG PRNG designed to generate 32-bit
> integers or floats from a 48-bit state, on 16- or 32-bit architectures.
> An LCG cycles on its low bits, which can be quite annoying. Given that
> we run on 64-bit architectures and that we need to generate 64-bit ints
> or doubles, IMHO it makes very little sense to stick to that.
>
> [... list of desiderata trimmed ...]
>
> The attached patch removes "rand48", adds a "pg_prng" implementation
> based on xoroshiro128ss, and replaces the former everywhere. In pgbench,
> the non-portable double-relying code is replaced by hopefully portable
> ints. The interface makes it easy to replace the underlying PRNG if
> something else is desired.
>

xoroshiro seems reasonable. How does it compare to rand48? Does it need much less/more state, is it faster/slower, etc.? I'd expect that it produces a better random sequence, considering rand48 is an LCG, which is a fairly simple, decades-old design.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello Tomas,

>> I have given a go at proposing a replacement for rand48.
>
> So what is the motivation for replacing rand48? Speed, quality of the
> produced random numbers, features rand48 can't provide, or what?

Speed can only be near rand48, see below. Quality (eg no trivial cycles, does not fail too quickly on statistical tests) and soundness (the point of generating 52-64 bits of data out of a 48-bit state escapes me: only a small part of the target space can ever be covered).

Also, I like having an implementation-independent interface (the current rand48 code advertises the name of the algorithm everywhere, and the "uint16[3]" state type is hardcoded in several places).

> [...]
>
> xoroshiro seems reasonable. How does it compare to rand48? Does it need
> much less/more state, is it faster/slower, etc.?

Basically any PRNG should be slower than or comparable to rand48, because rand48 only does 2 arithmetic ops: you cannot do much less while still stirring bits. However, because of the 16-bit unpacking/packing rand48 does on a 64-bit architecture, there is some room for additional useful ops, so in the end the user-visible performance is only about 5% lower.

State is 16 bytes vs 6 bytes for rand48. This is ok for generating 8 bytes per round and is still quite small.

> I'd expect that it produces a better random sequence, considering rand48
> is an LCG, which is a fairly simple, decades-old design.

Yep, it does not cycle trivially on the low bits the way an LCG does (eg odd -> even -> odd -> even -> ...): if you have the bad idea to do "% 2" on an LCG to extract a bool, you just alternate.

To summarize:
 - better software engineering
 - similar speed (slightly slower)
 - better statistical quality
 - quite small state
 - soundness

--
Fabien.
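To make the low-bit cycling concrete: in rand48's LCG, x' = a*x + c mod 2^48 with a and c both odd, so the low bit of the state simply alternates with period 2. A standalone demo (not part of the patch, just the standard rand48 constants):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* the rand48 LCG: x' = a*x + c mod 2^48, with a and c both odd */
        const uint64_t a = UINT64_C(0x5DEECE66D);
        const uint64_t c = UINT64_C(0xB);
        const uint64_t mask48 = (UINT64_C(1) << 48) - 1;
        uint64_t x = 42;    /* arbitrary seed */

        for (int i = 0; i < 8; i++)
        {
            x = (a * x + c) & mask48;
            printf("%d", (int) (x & 1));    /* low bit of each state */
        }
        printf("\n");       /* prints 10101010: period 2 */
        return 0;
    }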
> On 24 May 2021, at 15:31, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> - NOT to invent a new design!

A radical version of this argument would be to use the de-facto standard and ubiquitous MT19937. Though, I suspect, it's not the optimal solution to date.

Best regards, Andrey Borodin.
Hi Fabien,

> To summarize:
> - better software engineering
> - similar speed (slightly slower)
> - better statistical quality
> - quite small state
> - soundness

Personally, I think your patch is great. Speaking of speed, I believe we should consider the performance of the entire DBMS in typical scenarios, not the performance of a single procedure. I'm pretty sure in these terms the impact of your patch is negligible now, and almost certainly beneficial in the long term because of better randomness.

While reviewing your patch I noticed that you missed test_integerset.c. Here is an updated patch.

--
Best regards,
Aleksander Alekseev
Attachment
Hello Andrey,

>> - NOT to invent a new design!
>
> A radical version of this argument would be to use the de-facto standard
> and ubiquitous MT19937.

Indeed, I started considering this one for this reason, obviously.

> Though, I suspect, it's not the optimal solution to date.

"not optimal" does not do justice to the issues. The main one is the huge 2.5 KB state of MT19937, which makes it quite impractical for plenty of internal and temporary uses. In pgbench many PRNGs are needed for reproducibility (eg one global, 3 per thread, one per client), plus a temporary one internal to a function call (permute) which is expected to be reasonably fast, so it should not start by initializing 2.5 KB of data. In postgres there are 2 permanent ones (sql random, C double random), plus some in GEQO and in sampling internal structures.

So standard MT19937 is basically out of the equation. It also happens to fail some statistical tests and is not very fast. It has an insanely huge cycle, but pg does not need that, and probably nobody does. The only good point is that it is a standard, which IMO is not enough to outweigh the other issues.

--
Fabien.
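For scale, a sketch of the textbook MT19937 state layout (not any particular library's definition) next to the proposed one:

    /* classic Mersenne Twister state: a 624-word twist buffer plus a cursor */
    typedef struct
    {
        uint32_t mt[624];   /* 624 * 4 = 2496 bytes of twister state */
        int      index;     /* position in mt[], buffer refilled every 624 draws */
    } mt19937_state;        /* ~2.5 KB, vs 16 bytes for a xoroshiro128** state */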
Hello Aleksander,

>> - better software engineering
>> - similar speed (slightly slower)
>> - better statistical quality
>> - quite small state
>> - soundness
>
> Personally, I think your patch is great.

Thanks for having a look!

> Speaking of speed, I believe we should consider the performance of
> the entire DBMS in typical scenarios, not the performance of a single
> procedure.

Sure. I tested a worst-case pgbench script with only "\set i random(1, 100000000)" in a loop; the slowdown was a few percent (AFAICR < 5%).

> I'm pretty sure in these terms the impact of your patch is negligible
> now, and almost certainly beneficial in the long term because of better
> randomness.
>
> While reviewing your patch I noticed that you missed test_integerset.c.
> Here is an updated patch.

Indeed. Thanks for the catch & the v2 patch!

--
Fabien.
Sorry for the duplicate entry in the CF web application.
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Although the patch looks OK, I would like to keep the status "Needs review" for now in case someone would like to join the discussion.
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           tested, passed
> Documentation:            tested, passed
>
> Although the patch looks OK, I would like to keep the status "Needs review"
> for now in case someone would like to join the discussion.

Ok, fine with me.

--
Fabien.
Although this patch is marked RFC, the cfbot shows it doesn't even compile on Windows. I think you missed updating Mkvcbuild.pm.

regards, tom lane
> Although this patch is marked RFC, the cfbot shows it doesn't
> even compile on Windows. I think you missed updating Mkvcbuild.pm.

Indeed. Here is a blind attempt at fixing the build; I'll check later to see whether it works. It would help me if the cfbot results were integrated into the cf app.

--
Fabien.
Attachment
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Although this patch is marked RFC, the cfbot shows it doesn't
>> even compile on Windows. I think you missed updating Mkvcbuild.pm.

> Indeed. Here is a blind attempt at fixing the build; I'll check later to
> see whether it works. It would help me if the cfbot results were
> integrated into the cf app.

Hmm, not there yet per cfbot, not sure why not.

Anyway, after taking a very quick look at the patch itself, I've got just one main objection: I don't approve of putting this in port.h or src/port/. erand48.c is there because we envisioned it originally as an occasionally-used substitute for libc facilities. But this is most certainly not that, so it belongs in src/common/ instead. I'd also be inclined to invent a new single-purpose .h file for it.

I see that you probably did that because random.c and srandom.c depend on it, but I wonder why we don't make an effort to flush those altogether. It's surely pretty confusing to newbies that what appears to be a call of the libc primitives is no such thing.

regards, tom lane
On Thu, 1 Jul 2021 at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Anyway, after taking a very quick look at the patch itself, I've
> got just one main objection: I don't approve of putting this in
> port.h or src/port/.

I haven't looked at the patch in detail, but one thing I object to is the code to choose a random integer in an arbitrary range. Currently, this is done in pgbench by getrand(), which has its problems. However, this patch seems to be replacing that with a simple modulo operation, which is perhaps the worst possible way to do it. There's plenty of research out there on how to do it better -- see, for example, [1] for a nice summary.

Also, I'd say that functions to choose random integers in an arbitrary range ought to be part of the common API, as they are in almost every language's random API.

Regards,
Dean

[1] https://www.pcg-random.org/posts/bounded-rands.html
Hello Tom,

>> Indeed. Here is a blind attempt at fixing the build; I'll check later to
>> see whether it works. It would help me if the cfbot results were
>> integrated into the cf app.
>
> Hmm, not there yet per cfbot, not sure why not.

I'll investigate.

> Anyway, after taking a very quick look at the patch itself, I've
> got just one main objection: I don't approve of putting this in
> port.h or src/port/. erand48.c is there because we envisioned it
> originally as an occasionally-used substitute for libc facilities.
> But this is most certainly not that, so it belongs in src/common/
> instead.

Ok, this would make sense.

> I'd also be inclined to invent a new single-purpose .h
> file for it.

Hmmm. Why not.

> I see that you probably did that because random.c and srandom.c
> depend on it, but I wonder why we don't make an effort to flush
> those altogether.

Ok for removing them. They are used in contrib, where they can be replaced. I hope that extensions do not depend on that, though.

> It's surely pretty confusing to newbies that what appears to be a call
> of the libc primitives is no such thing.

I do not understand your point. If people believe the current random() implementation to be *the* libc primitive, then my linux doc says "The random() function uses a nonlinear additive feedback random number generator employing a default table of size 31 long integers to return successive pseudo-random numbers in the range from 0 to RAND_MAX. The period of this random number generator is very large, approximately 16 * ((2^31) - 1).", which is pretty far from the rand48 implementation provided in port, so ISTM that the confusion is already there?

--
Fabien.
Hello Dean,

> I haven't looked at the patch in detail, but one thing I object to is
> the code to choose a random integer in an arbitrary range.

Thanks for bringing up this interesting question!

> Currently, this is done in pgbench by getrand(), which has its
> problems.

Yes. That is one of the motivations for providing something hopefully better.

> However, this patch seems to be replacing that with a simple
> modulo operation, which is perhaps the worst possible way to do it.

I did it knowing this issue. Here is why:

The modulo operation is biased for large ranges close to the limit, sure. Also, the bias is somehow of the same magnitude as the FP multiplication approach used previously, so the "worst" has not changed much; it is really the same as before. I thought it is not such an issue because for typical uses we are unlikely to be in these conditions, so the one-operation, no-branching approach seemed like a good precision-vs-performance compromise: I'd expect the typical largest ranges to be well below 40 bits (eg a key in a pretty large table in pgbench), which makes the bias well under 1/2**24, and ISTM that I can live with that. With the initial 48-bit state, obviously the situation was not the same.

> There's plenty of research out there on how to do it better -- see,
> for example, [1] for a nice summary.

Rejection methods include branches, and thus may cost significantly more, as shown by the performance figures in the blog. Also, they somehow break the sequence determinism when using a range, which I find quite annoying: ISTM desirable that when generating a number the state advances once, and just once. Also, some methods have a cost that depends on the actual range, eg the bitmask approach: for range 129 the bitmask is 0xff, so you have a nearly 50% probability of iterating once, nearly 25% of iterating twice, and so on… I like performance to be uniform, not to depend on actual values.

Given these arguments I'd be inclined to keep the bias, but I'm open to more discussion.

> Also, I'd say that functions to choose random integers in an arbitrary
> range ought to be part of the common API, as they are in almost every
> language's random API.

That is a good point.

--
Fabien.
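To put a number on the modulo bias under the assumptions above: if u is uniform on [0, 2^64) and we compute u % R, then the residues below 2^64 mod R occur ceil(2^64/R) times while the others occur floor(2^64/R) times, so the worst-case relative over-representation is about R/2^64. For R <= 2^40 that is at most 2^-24, i.e. under one part in sixteen million -- which is where the 1/2**24 figure comes from.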
On Thu, 1 Jul 2021 at 22:18, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> However, this patch seems to be replacing that with a simple
>> modulo operation, which is perhaps the worst possible way to do it.
>
> The modulo operation is biased for large ranges close to the limit, sure.
> Also, the bias is somehow of the same magnitude as the FP multiplication
> approach used previously, so the "worst" has not changed much; it is
> really the same as before.

It may be true that the bias is of the same magnitude as FP multiply, but it is not of the same nature. With FP multiply, the more-likely-to-be-chosen values are more-or-less evenly distributed across the range, whereas modulo concentrates them all at one end, making it more likely to bias test results.

It's worth paying attention to how other languages/libraries implement this, and basically no one chooses the modulo method, which ought to raise alarm bells. Of the biased methods, it has the worst kind of bias and the worst performance.

If a biased method is OK, then the biased integer multiply method seems to be the clear winner. This requires the high part of a 64x64-bit product, which is trivial if 128-bit integers are available, but would need a little more work otherwise. There's some code in common/d2s that might be suitable.

Most other implementations tend to use an unbiased method though, and I think it's worth doing the same. It might be a bit slower, or even faster depending on implementation and platform, but in the context of the DB as a whole, I don't think a few extra cycles matters either way.

The method recommended at the very end of that blog seems to be pretty good all round, but any of the other commonly used unbiased methods would probably be OK too.

Regards,
Dean
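The biased integer multiply method mentioned above is a one-multiply mapping of a 64-bit draw into [0, range). A minimal sketch, assuming a compiler with __uint128_t (gcc/clang) and using rand64() as a stand-in for whatever 64-bit PRNG step is chosen:

    #include <stdint.h>

    extern uint64_t rand64(void);   /* placeholder: any 64-bit PRNG step */

    /*
     * Map a uniform 64-bit value into [0, range) via the high half of a
     * 64x64-bit product. Cheap (one multiply, one shift, no branches),
     * but slightly biased unless a rejection step (Lemire's method) is
     * added on top.
     */
    static uint64_t
    bounded_multiply(uint64_t range)
    {
        return (uint64_t) (((__uint128_t) rand64() * range) >> 64);
    }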
Hello Dean,

> It may be true that the bias is of the same magnitude as FP multiply,
> but it is not of the same nature. With FP multiply, the
> more-likely-to-be-chosen values are more-or-less evenly distributed
> across the range, whereas modulo concentrates them all at one end,
> making it more likely to bias test results.

Yes, that is true.

> It's worth paying attention to how other languages/libraries implement
> this, and basically no one chooses the modulo method, which ought to
> raise alarm bells. Of the biased methods, it has the worst kind of
> bias and the worst performance.

Hmmm. That is not exactly how I interpreted the figures in the blog.

> If a biased method is OK, then the biased integer multiply method
> seems to be the clear winner. This requires the high part of a
> 64x64-bit product, which is trivial if 128-bit integers are available,
> but would need a little more work otherwise. There's some code in
> common/d2s that might be suitable.

And yes, modulo is expensive. If we allowed 128-bit integer operations, I would not choose this PRNG in the first place; I'd take PCG with a 128-bit state. That does not change the discussion about bias, though.

> Most other implementations tend to use an unbiased method though, and I
> think it's worth doing the same. It might be a bit slower, or even
> faster depending on implementation and platform, but in the context of
> the DB as a whole, I don't think a few extra cycles matters either way.

Ok ok ok, I surrender!

> The method recommended at the very end of that blog seems to be pretty
> good all round, but any of the other commonly used unbiased methods
> would probably be OK too.

That does not address my other issues with the proposed methods, in particular the fact that the generated sequence is less deterministic, but I think I have a simple way around that. I'm hesitant to skip to the bitmask method and give up performance uniformity.

I'll try to come up with something over the week-end, and also address Tom's comments in passing.

--
Fabien.
Hello Dean & Tom,

Here is a v4, which:
 - moves the stuff to common and fully removes random/srandom (Tom)
 - includes a range generation function based on the bitmask method (Dean)
   but iterates with splitmix so that the state always advances once (Me)

--
Fabien.
Attachment
> Here is a v4, which:
>
> - moves the stuff to common and fully removes random/srandom (Tom)
> - includes a range generation function based on the bitmask method (Dean)
>   but iterates with splitmix so that the state always advances once (Me)

And a v5 where an unused test file also compiles, if we insist.

--
Fabien.
Attachment
Fabien COELHO wrote 2021-07-03 11:45:
> And a v5 where an unused test file also compiles, if we insist.

About the patch:

1. PostgreSQL source uses `uint64` and `uint32`, but not `uint64_t`/`uint32_t`.

2. I don't see why pg_prng_state could not be `typedef uint64 pg_prng_state[2];`

3. Then SamplerRandomState and pgbench RandomState could stay, and the patch would be a lot shorter. I don't like the mix of semantic refactoring and syntactic refactoring in the same patch. While I could agree with replacing `SamplerRandomState => pg_prng_state`, I'd rather see it in a separate commit. And that separate commit could contain the transition `typedef uint64 pg_prng_state[2];` => `typedef struct { uint64 s0, s1 } pg_prng_state;`

4. There is no need for ReservoirStateData->randstate_initialized. There could be a macro/function: `bool pg_prng_initiated(state) { return (state[0] | state[1]) != 0; }`

5. Is there a need for a 128-bit prng at all? At least 2*64 bit. There is a 2*32-bit xoroshiro64 https://prng.di.unimi.it/xoroshiro64starstar.c and there is a 4*32-bit xoshiro128: https://prng.di.unimi.it/xoshiro128plusplus.c 32-bit operations are faster on 32-bit platforms, but 32-bit platforms are quite rare in production these days, therefore I don't have a strong opinion on this.

regards,
Sokolov Yura.
Yura Sokolov <y.sokolov@postgrespro.ru> writes:
> 2. I don't see why pg_prng_state could not be `typedef uint64
> pg_prng_state[2];`

Please no. That sort of typedef behaves so weirdly that it's a foot-gun.

regards, tom lane
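To illustrate the foot-gun with a standalone sketch (nothing from the patch): an array typedef silently decays to a pointer in parameter lists and cannot be assigned or returned by value, unlike a struct:

    #include <stdint.h>

    typedef uint64_t state_arr[2];                  /* the proposed typedef */
    typedef struct { uint64_t s0, s1; } state_st;   /* the struct alternative */

    void f(state_arr s);    /* actually declares f(uint64_t *): inside f,
                             * sizeof(s) is the size of a pointer, not 16 */

    void
    demo(void)
    {
        state_arr a = {1, 2}, b;
        state_st  x = {1, 2}, y;

        /* b = a; */        /* error: arrays are not assignable */
        y = x;              /* fine: structs copy by value */
        (void) b;
        (void) y;
    }

An array type also cannot be returned from a function, so simple save/restore idioms stop working.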
Hello Yura,

> 1. PostgreSQL source uses `uint64` and `uint32`, but not
> `uint64_t`/`uint32_t`.

> 2. I don't see why pg_prng_state could not be `typedef uint64
> pg_prng_state[2];`

It could, but I do not see that as desirable. From an API design point of view we want something clean and abstract, and for me a struct looks better for that than a struct with an array inside. I also think that the code is more readable when it avoids constant index accesses (s[0] vs s0); we do not need actual indexes.

> 3. Then SamplerRandomState and pgbench RandomState could stay,
> and the patch would be a lot shorter.

You mean "typedef pg_prng_state SamplerRandomState"? One point of the patch is to have "one" standard PRNG commonly used where one is needed, so I'd say we want that name to be used, hence the substitutions. Also, I have a thing against objects named "Random" which are not random, which is highly misleading: a PRNG is purely deterministic, so removing misleading names is also a benefit. If people want to keep the old names it can be done, but I see these name changes as desirable.

> I don't like the mix of semantic refactoring and syntactic refactoring in
> the same patch. While I could agree with replacing `SamplerRandomState
> => pg_prng_state`, I'd rather see it in a separate commit. And that
> separate commit could contain the transition `typedef uint64
> pg_prng_state[2];` => `typedef struct { uint64 s0, s1 } pg_prng_state;`

I would tend to agree on principle, but separating into two phases here looks pointless: why implement a cleaner rand48 interface, which would then NOT be the rand48 standard, just to upgrade it to something else in the next commit? The other path is as painful and as pointless. So I think the new feature better comes with its associated refactoring, which is an integral part of it.

> 4. There is no need for ReservoirStateData->randstate_initialized. There could
> be a macro/function:
> `bool pg_prng_initiated(state) { return (state[0] | state[1]) != 0; }`

It would work for this particular implementation, but not necessarily for others that we may want to substitute later: it would mean either breaking the interface or adding a boolean to the structure if there is no special uninitialized state that can be detected, which would impact memory usage and alignment. So I think it is better to keep it that way: usually the user knows whether their structure has been initialized, and the special case in reservoir sampling, where the user does not seem to know that, can handle its own boolean without impacting the common API or the data structure.

> 5. Is there a need for a 128-bit prng at all?

This is a 64-bit PRNG with a 128-bit state. We are generating 64-bit values, so we want a 64-bit PRNG. A PRNG's state must be larger than its generated value, so we need sizeof(state) > 64 bits, hence at least 128 bits with 128-bit memory alignment.

> And there is a 4*32-bit xoshiro128:
> https://prng.di.unimi.it/xoshiro128plusplus.c
> 32-bit operations are faster on 32-bit platforms,
> but 32-bit platforms are quite rare in production these days,
> therefore I don't have a strong opinion on this.

I think that 99.9% of the hardware running postgres is 64-bit, so 64 bits is the right choice.

--
Fabien.
> 1. PostgreSQL source uses `uint64` and `uint32`, but not
> `uint64_t`/`uint32_t`.

Indeed, you are right. Attached v6 does that as well.

--
Fabien.
Attachment
On Sat, 3 Jul 2021 at 08:06, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Here is a v4, which:
>
> - moves the stuff to common and fully removes random/srandom (Tom)
> - includes a range generation function based on the bitmask method (Dean)
>   but iterates with splitmix so that the state always advances once (Me)

At the risk of repeating myself: do *not* invent your own scheme.

The problem with iterating using splitmix is that splitmix is a simple shuffling function that takes a single input and returns a mutated output depending only on that input. So let's say for simplicity that you're generating numbers in the range [0,N) with N=2^64-n and n<2^63. Each of the n values in [N,2^64) that lie outside the range wanted are just mapped in a deterministic way back onto (at most) n values in the range [0,N), making those n values twice as likely to be chosen as the other N-n values.

So what you've just invented is an algorithm with the complexity of the unbiased bitmask method, but basically the same bias as the biased integer multiply method.

I don't understand why you object to advancing the state more than once. Doing so doesn't make the resulting sequence of numbers any less deterministic. In fact, I'm pretty sure you *have to* advance the state more than once in *any* unbiased scheme. That's a common characteristic of all the unbiased methods I've seen, and intuitively, I think it has to be so.

Otherwise, I'm happy with the use of the bitmask method, as long as it's implemented correctly.

Regards,
Dean
Hello Dean,

>> - moves the stuff to common and fully removes random/srandom (Tom)
>> - includes a range generation function based on the bitmask method (Dean)
>>   but iterates with splitmix so that the state always advances once (Me)
>
> At the risk of repeating myself: do *not* invent your own scheme.
> The problem with iterating using splitmix is that splitmix is a simple
> shuffling function that takes a single input and returns a mutated
> output depending only on that input.

It also iterates over its 64-bit state in a round-robin fashion, so that the cycle size is 2^64 (the state update is a simple add).

> So let's say for simplicity that you're generating numbers in the range
> [0,N) with N=2^64-n and n<2^63. Each of the n values in [N,2^64) that
> lie outside the range wanted are just mapped in a deterministic way back
> onto (at most) n values in the range [0,N), making those n values twice
> as likely to be chosen as the other N-n values.

I do understand your point. If the value is outside the range, splitmix iterates over its seed and the extraction function produces a new number, which is tested again. I do not understand the "mapped back onto" part: the out-of-range value is just discarded and the loop starts over with a new derivation, so I do not see why some values would be more likely to come out.

> So what you've just invented is an algorithm with the complexity of the
> unbiased bitmask method,

That is what I am trying to implement.

> but basically the same bias as the biased integer multiply method.

I did not understand that.

> I don't understand why you object to advancing the state more than
> once. Doing so doesn't make the resulting sequence of numbers any less
> deterministic.

It does, somehow, hence my struggle to try to avoid it:

  call seed(0xdeadbeef);
  x1 = somepseudorand();
  x2 = somepseudorand();
  x3 = somepseudorand();

I think we should want x3 to be the same result whatever the previous calls to the API.

> In fact, I'm pretty sure you *have to* advance the state more than
> once in *any* unbiased scheme. That's a common characteristic of all
> the unbiased methods I've seen, and intuitively, I think it has to be
> so.

Yes and no. We can advance another state seeded by the root prng.

> Otherwise, I'm happy with the use of the bitmask method, as long as
> it's implemented correctly.

I did not understand why it is not correct.

--
Fabien.
On Sun, 4 Jul 2021 at 10:35, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> I did not understand why it is not correct.
>

Well, to make it easier to visualise, let's imagine our word size is just 3 bits instead of 64 bits, and that the basic prng() function generates numbers in the range [0,8). Similarly, imagine a splitmix3() that operates on 3-bit values. So it might do something like this (state offset and return values made up):

splitmix3(state):
  state=0 -> 5, return 2
  state=1 -> 6, return 5
  state=2 -> 7, return 0
  state=3 -> 0, return 3
  state=4 -> 1, return 6
  state=5 -> 2, return 1
  state=6 -> 3, return 7
  state=7 -> 4, return 4

Now suppose we want a random number in the range [0,6). This is what happens with your algorithm for each of the possible prng() return values:

  prng() returns 0 -- OK
  prng() returns 1 -- OK
  prng() returns 2 -- OK
  prng() returns 3 -- OK
  prng() returns 4 -- OK
  prng() returns 5 -- OK
  prng() returns 6 -- out of range so use splitmix3() with initial state=6:
    state=6 -> 3, return 7 -- still out of range, so repeat
    state=3 -> 0, return 3 -- now OK
  prng() returns 7 -- out of range so use splitmix3() with initial state=7:
    state=7 -> 4, return 4 -- now OK

So, assuming that prng() chooses each of the 8 possible values with equal probability (1/8), the overall result is that the values 0, 1, 2 and 5 are returned with a probability of 1/8, whereas 3 and 4 are returned with a probability of 2/8.

Using the correct implementation of the bitmask algorithm, each iteration calls prng() again, so in the end no particular return value is ever more likely than any other (hence it's unbiased).

As for determinism, the end result is still fully deterministic. For example, let's say that prng() returns the following sequence, for some initial state:

  1,0,3,0,3,7,4,7,6,6,5,3,7,7,7,0,3,6,5,2,3,3,4,0,0,2,7,4,...

then the bitmask method just returns that sequence with all the 6's and 7's removed:

  1,0,3,0,3,4,5,3,0,3,5,2,3,3,4,0,0,2,4,...

and that same sequence will always be returned, when starting from that initial seed.

Regards,
Dean
> Now suppose we want a random number in the range [0,6). This is what
> happens with your algorithm for each of the possible prng() return
> values: [...]
>
> So, assuming that prng() chooses each of the 8 possible values with
> equal probability (1/8), the overall result is that the values 0, 1, 2
> and 5 are returned with a probability of 1/8, whereas 3 and 4 are
> returned with a probability of 2/8.

Ok, I got that explanation.

> Using the correct implementation of the bitmask algorithm, each
> iteration calls prng() again, so in the end no particular return value
> is ever more likely than any other (hence it's unbiased).

Ok, you're taking into account the number of states of the PRNG, and this finite number implies some bias for some values if you actually enumerate all possible cases, as you did above. I was reasoning "as if" the splitmix PRNG were an actual random function, which is obviously false, but is also somehow the usual (false) assumption with PRNGs, and under this false assumption my implementation is perfect:-)

The defect of the modulo method for range extraction is that even with an actual (real) random generator the results would be biased: the bias is in the method itself. Now you are arguing about a bias linked to the internals of the PRNG. They are not the same in nature, even if the effect is the same. Also, the bias is significant only for ranges close to the limit, which is not the kind of use case I have in mind when looking at pgbench.

If you consider the PRNG internals, then the splitmix extraction function could also be taken into account. If it is not invertible (I'm unsure), then, assuming it behaves like some kind of hash function, about 1/e of the output values would not be reachable, which is yet another bias that you could argue against. Using the initial PRNG works better only because the underlying 128-bit state is much larger than the output value. Which is the point of having a larger state in the first place, anyway.

> As for determinism, the end result is still fully deterministic. For
> example, let's say that prng() returns the following sequence, for some
> initial state:
>
> 1,0,3,0,3,7,4,7,6,6,5,3,7,7,7,0,3,6,5,2,3,3,4,0,0,2,7,4,...
>
> then the bitmask method just returns that sequence with all the 6's
> and 7's removed:
>
> 1,0,3,0,3,4,5,3,0,3,5,2,3,3,4,0,0,2,4,...
>
> and that same sequence will always be returned, when starting from
> that initial seed.

Yes and no. The result is indeed deterministic if you call the function with the same range. However, if you change the range value in one place then sometimes the state advances differently, so that kind of determinism is lost, meaning that the sequence depends on the actual range values.

Attached is a v7 which does as you wish, but also loses the deterministic, value-independent property I was seeking. I could work around that by deriving another 128-bit generator instead of the 64-bit splitmix, but that is overkill.

--
Fabien.
Attachment
On Sun, 4 Jul 2021 at 17:03, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> As for determinism, the end result is still fully deterministic.
>
> The result is indeed deterministic if you call the function with the same
> range. However, if you change the range value in one place then sometimes
> the state advances differently, so that kind of determinism is lost,
> meaning that the sequence depends on the actual range values.

Ah yes, that's true. I can trivially reproduce that in other languages too. For example, in python, if I call random.seed(0) and then random.randrange() with the inputs 10,10,10 then the results are 6,6,0. But if the randrange() inputs are 10,1000,10 then the results are 6,776,6. So the result from the 3rd call changes as a result of changing the 2nd input. That's not entirely surprising.

The important property of determinism is that if I set a seed, and then make an identical set of calls to the random API, the results will be identical every time, so that it's possible to write tests with predictable/repeatable results.

> I could work around that by
> deriving another 128-bit generator instead of the 64-bit splitmix, but
> that is overkill.

Not really relevant now, but I'm pretty sure that's impossible to do. You might try it as an interesting academic exercise, but I believe it's a logical impossibility.

Regards,
Dean
> The important property of determinism is that if I set a seed, and then
> make an identical set of calls to the random API, the results will be
> identical every time, so that it's possible to write tests with
> predictable/repeatable results.

Hmmm… I like my stronger determinism definition more than this one:-)

>> I could work around that by deriving another 128-bit generator instead
>> of the 64-bit splitmix, but that is overkill.
>
> Not really relevant now, but I'm pretty sure that's impossible to do.
> You might try it as an interesting academic exercise, but I believe
> it's a logical impossibility.

Hmmm… I was simply thinking of seeding a new pg_prng_state from the main pg_prng_state with some transformation, and then iterating over the second PRNG, pretty much like I did with splitmix, but with 128 bits so that your #states argument does not apply, i.e. something like:

    /* select in a range with bitmask rejection */
    uint64
    pg_prng_u64_range(pg_prng_state *state, uint64 range)
    {
        /* always advance state once */
        uint64      next = xoroshiro128ss(state);
        uint64      val;

        if (range >= 2)
        {
            uint64      mask = mask_u64(range - 1);

            val = next & mask;

            if (val >= range)
            {
                /* copy and update current prng state */
                pg_prng_state iterator = *state;

                iterator.s0 ^= next;
                iterator.s1 += UINT64CONST(0x9E3779B97f4A7C15);

                /* iterate till val in [0, range) */
                while ((val = xoroshiro128ss(&iterator) & mask) >= range)
                    ;
            }
        }
        else
            val = 0;

        return val;
    }

The initial pseudo-random sequence is left to proceed, and a new PRNG is basically forked for iterating on the mask, if needed.

--
Fabien.
Fabien COELHO wrote 2021-07-04 23:29:
>> The important property of determinism is that if I set a seed, and
>> then make an identical set of calls to the random API, the results
>> will be identical every time, so that it's possible to write tests
>> with predictable/repeatable results.
>
> Hmmm… I like my stronger determinism definition more than this one:-)
>
> [...]
>
> The initial pseudo-random sequence is left to proceed, and a new PRNG
> is basically forked for iterating on the mask, if needed.

I believe most "range" values are small, much smaller than UINT32_MAX. In this case, according to [1], the fastest method is Lemire's (I'd take the original version from [2]). Therefore a combined pg_prng_u64_range could branch on the range value:

    uint64
    pg_prng_u64_range(pg_prng_state *state, uint64 range)
    {
        uint64      val = xoroshiro128ss(state);
        uint64      m;

        if ((range & (range - 1)) == 0) /* handle all power-of-2 cases */
            return range != 0 ? val & (range - 1) : 0;

        if (likely(range < PG_UINT32_MAX / 32))
        {
            /*
             * Daniel Lemire's unbiased range random algorithm based on
             * rejection sampling:
             * https://lemire.me/blog/2016/06/30/fast-random-shuffling/
             */
            m = (uint32) val * range;
            if ((uint32) m < range)
            {
                uint32      t = -(uint32) range % (uint32) range;

                while ((uint32) m < t)
                    m = (uint32) xoroshiro128ss(state) * range;
            }
            return m >> 32;
        }

        /* Apple's mask method */
        m = mask_u64(range - 1);
        val &= m;
        while (val >= range)
            val = xoroshiro128ss(state) & m;
        return val;
    }

The mask method could also be faster when the range is close to the mask. For example, a fast check for "range is within 1/1024 of the mask" is

    range < (range/512 + (range & (range*2)))

and then the method choice could look like:

    if (likely(range < UINT32_MAX/32 && range > (range/512 + (range & (range*2)))))

But I don't know whether the additional condition is worth the difference.

[1] https://www.pcg-random.org/posts/bounded-rands.html
[2] https://lemire.me/blog/2016/06/30/fast-random-shuffling/

regards,
Sokolov Yura
Hello Yura,

> I believe most "range" values are small, much smaller than UINT32_MAX.
> In this case, according to [1], the fastest method is Lemire's (I'd take
> the original version from [2]) [...]

Yep. I share your point that the range is more often 32 bits.

However, I'm not enthusiastic about combining two methods depending on the range: the function looks complex enough without that, so I would suggest not to take this option. Also, the decision process adds to the average cost, which is undesirable.

I would certainly select the unbiased multiply method if we wanted a u32 range variant.

--
Fabien.
Fabien COELHO wrote 2021-07-06 09:13:
> Hello Yura,
>
> I share your point that the range is more often 32 bits.
>
> However, I'm not enthusiastic about combining two methods depending on
> the range: the function looks complex enough without that, so I would
> suggest not to take this option. Also, the decision process adds to
> the average cost, which is undesirable.

Given that 99.99% of cases will be in the likely case, the branch predictor should eliminate the decision cost.

And as Dean Rasheed correctly mentioned, the mask method will have a really bad pattern for the branch predictor if the range is not just below or equal to a power of two. For example, rand_range(10000) will have a 60% probability of passing through `while (val >= range)` and a 40% probability of going into the next loop iteration. rand_range(100000) will have 76%/24% probabilities. The branch predictor doesn't like that. Even rand_range(1000000), which is quite close to 2^20, will have 95%/5%, still not enough to please the BP.

But with the unbiased multiply method it will be 0.0002%/99.9998% for 10000, 0.002%/99.998% for 100000 and 0.02%/99.98% for 1000000 -- much, much better. The branch predictor will make it almost free (I hope).

And __builtin_clzl is no free lunch either: it has a latency of 3-4 cycles on a modern processor. Well, probably it could run in parallel with some part of xoroshiro, but that depends on how the compiler optimizes this function.

> I would certainly select the unbiased multiply method if we wanted a u32
> range variant.

There could be two functions.

regards,
Sokolov Yura.
Hello Yura,

>> However, I'm not enthusiastic about combining two methods depending on
>> the range: the function looks complex enough without that, so I would
>> suggest not to take this option. Also, the decision process adds to
>> the average cost, which is undesirable.
>
> Given that 99.99% of cases will be in the likely case, the branch
> predictor should eliminate the decision cost.

Hmmm. ISTM that a branch predictor would predict that "unknown < small" is probably false, so a hint should be given that it is really true.

> And as Dean Rasheed correctly mentioned, the mask method will have a really
> bad pattern for the branch predictor if the range is not just below or equal
> to a power of two.

On average the bitmask is the better unbiased method, if the online figures are to be trusted. Also, as already said, I do not really want to add code complexity, especially to get lower average performance, and especially with code like "threshold = - range % range", where both variables are unsigned: I get a headache just looking at it:-)

> And __builtin_clzl is no free lunch either: it has a latency of 3-4 cycles
> on a modern processor.

Well, % is not cheap either.

> Well, probably it could run in parallel with some part of xoroshiro,
> but that depends on how the compiler optimizes this function.
>
>> I would certainly select the unbiased multiply method if we wanted a u32
>> range variant.
>
> There could be two functions.

Yep, but do we need them? Who is likely to want 32-bit pseudo-random ints in a range? pgbench needs 64 bits.

So I'm still inclined to just keep the faster-on-average bitmask method, despite the fact that it may be slower for some ranges. The average cost for the worst case, in PRNG calls, is, if I'm not mistaken:

  1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2

which does not look too bad.

--
Fabien.
Fabien COELHO wrote 2021-07-06 23:49:
> Hello Yura,
>
> Hmmm. ISTM that a branch predictor would predict that "unknown < small"
> is probably false, so a hint should be given that it is really true.

Why? A branch predictor is history-based: if the branch was usually taken here, it will be assumed taken this time as well. "unknown < small" is usually true, therefore the branch predictor will assume it is true. I put `likely` there for the compiler: the compiler then places the `likely` path closer.

> On average the bitmask is the better unbiased method, if the online
> figures are to be trusted. Also, as already said, I do not really want
> to add code complexity, especially to get lower average performance,
> and especially with code like "threshold = - range % range", where
> both variables are unsigned: I get a headache just looking at it:-)

If you mean https://www.pcg-random.org/posts/bounded-rands.html then:

1. The first graphs are made with a version that is not exactly Lemire's code. The last code from https://lemire.me/blog/2016/06/30/fast-random-shuffling/ (which I derived mine from) performs the modulo operation only if `(leftover < range)`. Even with `rand_range(1000000)` that happens just once in about four thousand runs.

2. You can see the "Small-Constant Benchmark" on that page: Debiased Int is 1.5 times faster. And even in the "Small-Shuffle" benchmark their unoptimized version is on par with the mask method.

3. If you go to the "Making Improvements/Faster Threshold-Based Discarding" section, you'll see the code my version is matched with. It is twice as fast as the mask method in the Small-Shuffle benchmark, and just a bit slower in Large-Shuffle.

> So I'm still inclined to just keep the faster-on-average bitmask
> method, despite the fact that it may be slower for some ranges. The
> average cost for the worst case, in PRNG calls, is, if I'm not mistaken:
>
>   1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2
>
> which does not look too bad.

You don't count the cost of branch misprediction:

https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array
https://lemire.me/blog/2019/10/15/mispredicted-branches-can-multiply-your-running-times/

Therefore the calculation should be more like:

  1 * 0.5 + 0.5 * (3 + 0.5 * (3 + ...)) = 6

By the way, we have a 64-bit random value. If we use 44 bits of it for ranges <= (1<<20), the bias will be less than 1/(2**24). Could we just ignore it (given it is not a crypto-strength random)?
    uint64
    pg_prng_u64_range(pg_prng_state *state, uint64 range)
    {
        uint64      val = xoroshiro128ss(state);
        uint64      m;

        if ((range & (range - 1)) == 0) /* handle all power-of-2 cases */
            return range != 0 ? val & (range - 1) : 0;

        if (likely(range < (1 << 20)))
        {
            /*
             * While the multiply method is biased, the bias will be smaller
             * than 1/(1<<24) for such small ranges. Let's ignore it.
             */
            return ((val >> 20) * range) >> 44;
        }

        /* Apple's mask method */
        m = mask_u64(range - 1);
        val &= m;
        while (val >= range)
            val = xoroshiro128ss(state) & m;
        return val;
    }

Anyway, excuse me for heating up this discussion over such a non-essential issue. I'll try to control myself and not push it further.

regards,
Sokolov Yura.
On Wed, 7 Jul 2021 at 04:00, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>
> Anyway, excuse me for heating up this discussion over such a
> non-essential issue. I'll try to control myself and not push it further.
>

Whilst it has been interesting learning and discussing all these different techniques, I think it's probably best to stick with the bitmask method, rather than making the code too complex and difficult to follow. The bitmask method has the advantage of being very simple, easy to understand and fast (fastest in many of the benchmarks, and close enough in others to make me think that the difference won't matter for our purposes).

To test the current patch, I hacked up a simple SQL-callable server function: random(bigint, bigint) returns bigint, similar to the one in pgbench. After doing so, I couldn't help thinking that it would be useful to have such a function in core, so maybe that could be a follow-on patch. Anyway, that led to the following observations:

Firstly, there's a bug in the existing mask_u64() code -- if pg_leftmost_one_pos64(u) returns 63, you end up with a mask equal to 0, and it breaks down.

Secondly, I think it would be simpler to implement this as a bitshift, rather than a bitmask, using the high bits from the random number. That might not make much difference for xoroshiro**, but in general, PRNGs tend to be weaker in the lower bits, so it seems preferable on that basis. But also, it makes the code simpler and less error-prone.

Finally, I think it would be better to treat the upper bound of the range as inclusive. Doing so makes the function able to cover all possible 64-bit ranges. It would then be easy (perhaps in another follow-on patch) to make the pgbench random() function work for all 64-bit bounds (as long as max >= min), without the weird overflow checking it currently has.

Putting those 3 things together, the code (minus comments) becomes:

    if (range > 0)
    {
        int         rshift = 63 - pg_leftmost_one_pos64(range);

        do
        {
            val = xoroshiro128ss(state) >> rshift;
        }
        while (val > range);
    }
    else
        val = 0;

which reduces the complexity a bit.

Regards,
Dean
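A quick worked trace of that bitshift variant (numbers mine, just to illustrate): for range = 5, i.e. values 0..5 inclusive, pg_leftmost_one_pos64(5) = 2, so rshift = 63 - 2 = 61 and each draw keeps the top 3 bits of the PRNG output, uniform on [0,8). Values 6 and 7 are rejected and redrawn, so each of 0..5 comes out with probability exactly 1/6, and the expected number of PRNG calls is 8/6 ≈ 1.33.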
Hello Dean,

> Whilst it has been interesting learning and discussing all these
> different techniques, I think it's probably best to stick with the
> bitmask method, rather than making the code too complex and difficult
> to follow.

Yes.

> To test the current patch, I hacked up a simple SQL-callable server
> function: random(bigint, bigint) returns bigint, similar to the one in
> pgbench. After doing so, I couldn't help thinking that it would be
> useful to have such a function in core, so maybe that could be a
> follow-on patch.

Yep.

> Firstly, there's a bug in the existing mask_u64() code -- if
> pg_leftmost_one_pos64(u) returns 63, you end up with a mask equal to
> 0, and it breaks down.

Oops:-(

> Secondly, I think it would be simpler to implement this as a bitshift,
> rather than a bitmask, using the high bits from the random number. That
> might not make much difference for xoroshiro**, but in general, PRNGs
> tend to be weaker in the lower bits, so it seems preferable on that
> basis. But also, it makes the code simpler and less error-prone.

Indeed, that looks like a good option.

> Finally, I think it would be better to treat the upper bound of the
> range as inclusive.

This bothered me as well, but the usual approach seems to be to use the range as the number of values, so I was hesitant to depart from that. I'm still hesitant to go that way.

> Putting those 3 things together, the code (minus comments) becomes:
> [...]
> which reduces the complexity a bit.

Indeed. Attached v9 follows this approach, except that the range stays exclusive (the number of values), as most sources I found understand the range that way.

--
Fabien.
Attachment
On Thu, 8 Jul 2021 at 09:26, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> Finally, I think it would be better to treat the upper bound of the
>> range as inclusive.
>
> This bothered me as well, but the usual approach seems to be to use the
> range as the number of values, so I was hesitant to depart from that.
> I'm still hesitant to go that way.
>

Yeah, that bothered me too.

For example, java.util.Random.nextInt(bound) returns a value in the range [0,bound). But other implementations are not all like that. For example, python's random.randint(a,b) returns a value in the range [a,b]. Python also has random.randrange(start,stop[,step]), which is designed for compatibility with their range(start,stop[,step]) function, which treats "stop" as exclusive.

However, Postgres tends to go the other way, and treat the upper bound as inclusive, as in, for example, generate_series() and pgbench's random() function. I think it makes more sense to do it that way, because then such functions can work all the way up to and including the limit of the bound's datatype, which makes them more general.

Regards,
Dean
>>> Finally, I think it would be better to treat the upper bound of the
>>> range as inclusive.
>>
>> This bothered me as well, but the usual approach seems to be to use the
>> range as the number of values, so I was hesitant to depart from that.
>> I'm still hesitant to go that way.
>
> Yeah, that bothered me too.
>
> [...]
>
> I think it makes more sense to do it that way, because then such
> functions can work all the way up to and including the limit of the
> bound's datatype, which makes them more general.

Yep. Still, with one argument:

 - C#: Random Next is exclusive
 - Go: rand Intn is exclusive
 - Rust: rand gen_range is exclusive
 - Erlang: rand uniform is inclusive, BUT values start from 1

The rule seems to be: one parameter is usually the number of values, thus exclusive; two parameters describe the range, which is inclusive.

Attached is a v10 which is some kind of compromise: the interface uses inclusive min and max bounds, so that all values can be reached.

--
Fabien.
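A sketch of how inclusive min/max bounds can sit on top of a zero-based draw -- the names here are hypothetical, not necessarily what v10 does:

    /*
     * Hypothetical inclusive-bounds wrapper. draw_le(state, n) is assumed
     * to return a uniform value in [0, n] inclusive, e.g. via the
     * bitshift-with-rejection loop shown upthread.
     */
    uint64
    prng_range_incl(pg_prng_state *state, uint64 rmin, uint64 rmax)
    {
        /*
         * rmax - rmin wraps correctly in unsigned arithmetic, so even the
         * full 64-bit range (rmax - rmin == UINT64_MAX) is expressible;
         * this is what makes inclusive bounds cover every case.
         */
        return rmin + draw_le(state, rmax - rmin);
    }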
Attachment
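The v10 compromise then amounts to offsetting Dean's loop by the lower
bound. A minimal sketch reusing the hypothetical prng_up_to() from above
(the names are mine, not the patch's actual identifiers):

    /*
     * Uniform value in [rmin, rmax], both bounds inclusive, assuming
     * rmax >= rmin.  With inclusive bounds, rmax - rmin always fits in
     * a uint64_t, so the full 64-bit domain is reachable; a value count
     * rmax - rmin + 1 would overflow to 0 for [0, UINT64_MAX].
     */
    static uint64_t
    prng_range(void *state, uint64_t rmin, uint64_t rmax)
    {
        return rmin + prng_up_to(state, rmax - rmin);
    }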
Hello Yura,

>>> Given 99.99% cases will be in the likely case, branch predictor should
>>> eliminate decision cost.
>>
>> Hmmm. ISTM that a branch predictor should predict that unknown < small
>> should probably be false, so a hint should be given that it is really
>> true.
>
> Why? Branch predictor is history based:

Hmmm. This means running the compiler with some special options, running
the code on significant and representative data, then recompiling based on
the collected branch stats. This is not the usual way pg is built.

> if it were usually true here then it will be true this time either.
> unknown < small is usually true therefore branch predictor will assume
> it is true.
>
> I put `likely` for compiler: compiler then puts `likely` path closer.

Yes, an explicit hint is needed.

>>> And as Dean Rasheed correctly mentioned, mask method will have really bad
>>> pattern for branch predictor if range is not just below or equal to power
>>> of two.
>>
>> On average the bitmask is the better unbiased method, if the online
>> figures are to be trusted. Also, as already said, I do not really want
>> to add code complexity, especially to get lower average performance,
>> and especially with code like "threshold = - range % range", where
>> both variables are unsigned; I have a headache just looking at it :-)
>
> If you mention https://www.pcg-random.org/posts/bounded-rands.html then

Indeed, these are the figures I was referring to when saying that the
bitmask looks like the best method.

> 1. first graphs are made with not exact Lemire's code.
> Last code from https://lemire.me/blog/2016/06/30/fast-random-shuffling/

Ok, other figures. However, there is no comparison with the mask method in
this post; it mostly argues against division/modulo.

> By the way, we have 64bit random. If we use 44bit from it for range <=
> (1<<20), then bias will be less than 1/(2**24). Could we just ignore it
> (given it is not crypto strong random)?

That was my initial opinion, but Dean insists on an unbiased method. I
agree with Dean that performance, if it is not too bad, does not matter
that much, so I'm trying to keep the code simple as a main objective. You
do not seem ready to buy this argument. Note that although my research is
about compiler optimizations, I did buy it :-)

Given the overheads involved in pgbench, the performance impact of the
best vs worst case scenario is minimal:

  \set i random(0, 7) -- 8 values, good for mask: 4.515 Mtps
  \set i random(0, 8) -- 9 values, bad for mask:  4.151 Mtps

so the performance penalty is about 8%.

> if ((range & (range-1)) == 0) /* handle all power-of-2 cases */
>     return range != 0 ? val & (range-1) : 0;
> if (likely(range < (1<<20)))
>     /*
>      * While multiply method is biased, bias will be smaller than
>      * 1/(1<<24) for such small ranges. Lets ignore it.
>      */
>     return ((val >> 20) * range) >> 44;
> /* Apple's mask method */
> m = mask_u64(range-1);
> val &= m;
> while (val >= range)
>     val = xoroshiro128ss(state) & m;
> return val;

Hmmm. The code looks "simple" enough, but I do not think that optimizing
for 20-bit values is worth it, especially as the bitmask method seems the
best anyway. We were discussing 32-bit values before.

> Anyway, excuse me for heating this discussion cause of such
> non-essential issue.

Well, I like to discuss things!

> I'll try to control myself and don't proceed it further.

Yep. We have to compromise at some point. The majority opinion seems to be
that we want code simplicity more, so the bitmask it is. I've posted a v10.
Thanks for the interesting discussion and arguments!

-- Fabien.
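For reference, the "threshold = - range % range" expression that came up
above is the rejection threshold of Lemire's multiplicative method, which
the linked posts benchmark. A minimal sketch, assuming any uniform 64-bit
next_u64() generator and a compiler with unsigned __int128 (the thread's
constraints ruled 128-bit operations out for the patch itself):

    #include <stdint.h>

    extern uint64_t next_u64(void);   /* assumed: uniform 64-bit PRNG */

    /*
     * Lemire's multiply-with-rejection: uniform value in [0, range),
     * range > 0.  The 128-bit product spreads the draw over range
     * buckets; draws whose low half falls below
     * threshold = (2^64 - range) % range are rejected so that every
     * bucket keeps exactly the same number of source values.
     */
    static uint64_t
    bounded_rand(uint64_t range)
    {
        unsigned __int128 m = (unsigned __int128) next_u64() * range;
        uint64_t lo = (uint64_t) m;

        if (lo < range)
        {
            /* unsigned wraparound: -range is 2^64 - range */
            uint64_t threshold = -range % range;

            while (lo < threshold)
            {
                m = (unsigned __int128) next_u64() * range;
                lo = (uint64_t) m;
            }
        }
        return (uint64_t) (m >> 64);
    }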
Hi Fabien,
> Attached a v10 which is some kind of compromise where the interface uses
> inclusive min and max bounds, so that all values can be reached.
Just wanted to let you know that cfbot [1] doesn't seem to be happy with
the patch. Apparently, some tests are failing. To be honest, I didn't
invest too much time into investigating this. Hopefully, it's not a big
deal.

[1]: http://cfbot.cputube.org/
--
Best regards,
Aleksander Alekseev
Hello Aleksander,

>> Attached a v10 which is some kind of compromise where the interface uses
>> inclusive min and max bounds, so that all values can be reached.
>
> Just wanted to let you know that cfbot [1] doesn't seem to be happy with
> the patch. Apparently, some tests are failing. To be honest, I didn't
> invest too much time into investigating this. Hopefully, it's not a big
> deal.
>
> [1]: http://cfbot.cputube.org/

Indeed. I wish that these results were available from the cf interface.

Attached is a v11 which might improve things.

Thanks for the ping!

-- Fabien.
Attachment
Hello again,

>> Just wanted to let you know that cfbot [1] doesn't seem to be happy with
>> the patch. Apparently, some tests are failing. To be honest, I didn't
>> invest too much time into investigating this. Hopefully, it's not a big
>> deal.
>>
>> [1]: http://cfbot.cputube.org/
>
> Indeed. I wish that these results were available from the cf interface.
>
> Attached is a v11 which might improve things.

Not enough. Here is a v12 which might improve things further.

-- Fabien.
Attachment
<resent because of list filter>

>>> [1]: http://cfbot.cputube.org/
>>
>> Indeed. I wish that these results were available from the cf interface.
>>
>> Attached is a v11 which might improve things.
>
> Not enough. Here is a v12 which might improve things further.

Not enough. Here is a v13 which might improve things further still.

-- Fabien.
Attachment
>>>> [1]: http://cfbot.cputube.org/
>>>
>>> Indeed. I wish that these results were available from the cf interface.
>>>
>>> Attached is a v11 which might improve things.
>>
>> Not enough. Here is a v12 which might improve things further.
>
> Not enough. Here is a v13 which might improve things further still.

Not enough. Here is a v14 which might improve things yet again. Sorry for
this noise due to blind Windows tests.

-- Fabien.
Attachment
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Not enough. Here is a v14 which might improve things yet again. Sorry for
> this noise due to blind Windows tests.

Just FTR, I strongly object to your removal of process-startup srandom()
calls. Those are not only setting the seed for our own use, but also
ensuring that things like random() calls within PL functions or other
libraries aren't 100% predictable.

			regards, tom lane
Hello Tom,

> Just FTR, I strongly object to your removal of process-startup srandom()
> calls.

Ok. The point of the patch is to replace and unify the underlying postgres
PRNG, so there was some logic behind this removal.

> Those are not only setting the seed for our own use, but also ensuring
> that things like random() calls within PL functions or other libraries
> aren't 100% predictable.

Sure, they shouldn't be predictable.

Attached v15 also calls srandom if it is there, and fixes yet another
remaining random call.

-- Fabien.
Attachment
>> Just FTR, I strongly object to your removal of process-startup srandom()
>> calls.
>
> Ok. The point of the patch is to replace and unify the underlying postgres
> PRNG, so there was some logic behind this removal.

FTR, this was triggered by your comment on Jul 1:

>> [...] I see that you probably did that because random.c and srandom.c
>> depend on it, but I wonder why we don't make an effort to flush those
>> altogether. It's surely pretty confusing to newbies that what appears
>> to be a call of the libc primitives is no such thing.

I understood "flushing s?random.c" to mean that it would be a good thing
to remove their definitions, hence their calls, whereas in the initial
patch I provided a replacement for srandom & random.

-- Fabien.
> Attached v15 also calls srandom if it is there, and fixes yet another
> remaining random call.

I think that I have now removed all references to "random" from the pg
source. However, the test still fails on Windows, because the linker does
not find a global variable when compiling extensions, although it finds
the functions defined in the very same file:

Link:
  4130 C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\link.exe
       /ERRORREPORT:QUEUE /OUT:".\Release\tablefunc\tablefunc.dll" /INCREMENTAL:NO /NOLOGO
       Release/postgres/postgres.lib kernel32.lib user32.lib gdi32.lib winspool.lib
       comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib
       odbccp32.lib /NODEFAULTLIB:libc /DEF:"./Release/tablefunc/tablefunc.def" /MANIFEST
       /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /manifest:embed /DEBUG
       /PDB:".\Release\tablefunc\tablefunc.pdb" /SUBSYSTEM:CONSOLE /STACK:"4194304" /TLBID:1
       /DYNAMICBASE:NO /NXCOMPAT /IMPLIB:"Release/tablefunc/tablefunc.lib" /MACHINE:X64
       /ignore:4197 /DLL .\Release\tablefunc\win32ver.res
  4131 .\Release\tablefunc\tablefunc.obj
  4132 Creating library Release/tablefunc/tablefunc.lib and object Release/tablefunc/tablefunc.exp
  4133 tablefunc.obj : error LNK2001: unresolved external symbol pg_global_prng_state [C:\projects\postgresql\tablefunc.vcxproj]
  4134 .\Release\tablefunc\tablefunc.dll : fatal error LNK1120: 1 unresolved externals [C:\projects\postgresql\tablefunc.vcxproj]
  4135 Done Building Project "C:\projects\postgresql\tablefunc.vcxproj" (default targets) -- FAILED.

The missing symbol is really defined in common/pg_prng.c, which AFAICT is
linked with postgres.

If someone experienced with the Windows compilation chain could give a
hint of what is needed, I'd appreciate it!

-- Fabien.
On Thu, Sep 30, 2021 at 9:23 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> The missing symbol is really defined in common/pg_prng.c, which AFAICT is
> linked with postgres.
>
> If someone experienced with the Windows compilation chain could give a
> hint of what is needed, I'd appreciate it!

I guess the declaration needs PGDLLIMPORT.
> I guess the declaration needs PGDLLIMPORT.

Indeed, thanks!

Attached v16 adds that.

-- Fabien.
Attachment
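For context, PGDLLIMPORT is how PostgreSQL marks global variables that
extension DLLs must be able to resolve on Windows, so the fix is
presumably a one-word change to the state's declaration, along these
lines (the header placement is an assumption):

    /* in the patch's header, e.g. common/pg_prng.h */
    extern PGDLLIMPORT pg_prng_state pg_global_prng_state;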
Hi hackers,

>>> I guess the declaration needs PGDLLIMPORT.
>>
>> Indeed, thanks!
>>
>> Attached v16 adds that.

It looks like the patch is in pretty good shape. I noticed that the return
value of pg_prng_strong_seed() is not checked in several places, and there
was a typo in pg_trgm.c. The corrected patch is attached.

Assuming the new version will not upset cfbot, I would call the patch
"Ready for Committer".

--
Best regards,
Aleksander Alekseev
Attachment
Aleksander Alekseev <aleksander@timescale.com> writes:
> It looks like the patch is in pretty good shape. I noticed that the
> return value of pg_prng_strong_seed() is not checked in several
> places, and there was a typo in pg_trgm.c. The corrected patch is
> attached. Assuming the new version will not upset cfbot, I would call
> the patch "Ready for Committer".

I took a quick look through this.

The biggest substantive point I found was that you didn't update the
configure script. It's certainly not appropriate for configure to
continue to do AC_REPLACE_FUNCS on random and srandom when you've removed
the src/port files that that would attempt to include. The simplest
change here is just to delete those entries from the list, but that would
also result in not #define'ing HAVE_RANDOM or HAVE_SRANDOM, and I see
that the patch introduces a dependency on the latter.

I'm inclined to think that's misguided. srandom() has been required by
POSIX since SUSv2, and we certainly have not got any non-Windows
buildfarm members that lack it. So I don't think we really need a
configure check. What we do need is a decision about what to do on
Windows. We could write it like

+#ifndef WIN32
+	srandom(pg_prng_i32(&pg_global_prng_state));
+#endif

but I have a different modest suggestion: add

	#define srandom(seed) srand(seed)

in win32_port.h. As far as I can see from Microsoft's docs [1], srand()
is exactly like srandom(); they just had some compulsion to not be
POSIX-compatible.

BTW, the commentary in InitProcessGlobals is now completely inadequate;
it's unclear to a reader why we should be bothering with srandom(). I
suggest adding a comment right before the srandom() call, along the
lines of

	/*
	 * Also make sure that we've set a good seed for random() (or rand()
	 * on Windows). Use of those functions is deprecated in core
	 * Postgres, but they might get used by extensions.
	 */

+/* use Donald Knuth's LCG constants for default state */

How did Knuth get into this? This algorithm is certainly not his,
so why are those constants at all relevant?

Other cosmetic/commentary issues:

* I could do without the stream-of-consciousness notes in pg_prng.c.
I think what's appropriate is to say "We use thus-and-such a generator
which is documented here", maybe with a line or two about its properties.

* Function names like these convey practically nothing to readers:

+extern int64 pg_prng_i64(pg_prng_state *state);
+extern uint32 pg_prng_u32(pg_prng_state *state);
+extern int32 pg_prng_i32(pg_prng_state *state);
+extern double pg_prng_f64(pg_prng_state *state);
+extern bool pg_prng_bool(pg_prng_state *state);

and these functions' header comments add a grand total of zero bits of
information. What someone generally wants to know first about a PRNG is
(a) is it uniform and (b) what is the range of outputs, neither of which
is specified anywhere.

+#define FIRST_BIT_MASK UINT64CONST(0x8000000000000000)
+#define RIGHT_HALF_MASK UINT64CONST(0x00000000FFFFFFFF)
+#define DMANTISSA_MASK UINT64CONST(0x000FFFFFFFFFFFFF)

I'm not sure that these named constants are any more readable than
writing the underlying constant, maybe less so --- in particular I think
something based on (1<<52)-1 would be more appropriate for the float
mantissa operations. We don't need RIGHT_HALF_MASK at all; the casts to
uint32 or int32 will accomplish that just fine.

BTW, why are we bothering with FIRST_BIT_MASK in the first place, rather
than returning "v & 1" for pg_prng_bool? Is xoroshiro128ss less random in
the low-order bits than the higher?
If so, that would be a pretty important thing to document. If it's not,
we shouldn't make the code look like it is.

+ * select in a range with bitmask rejection.

What is "bitmask rejection"? Is it actually important to callers? I
think this should be documented more like "Produce a random integer
uniformly selected from the range [rmin, rmax)."

			regards, tom lane

[1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/srand?view=msvc-170
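As a side note on the (1<<52)-1 suggestion, deriving the mask makes the
bit count visible instead of burying it in hex digits. A sketch of what
that could look like, keeping the patch's macro name purely for
illustration:

    /* the 52 explicit mantissa bits of an IEEE-754 double */
    #define DMANTISSA_MASK  ((UINT64CONST(1) << 52) - 1)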
I wrote:
> ... What we do need is a decision about what to do on Windows.
> We could write it like
>
> +#ifndef WIN32
> +	srandom(pg_prng_i32(&pg_global_prng_state));
> +#endif
>
> but I have a different modest suggestion: add
>
>	#define srandom(seed) srand(seed)
>
> in win32_port.h. As far as I can see from Microsoft's docs [1], srand()
> is exactly like srandom(); they just had some compulsion to not be
> POSIX-compatible.

Oh, wait, I take that back --- rand()/srand() are also in POSIX, and in
the C99 standard (which presumably is where Microsoft got them from).
They're deprecated by POSIX on the grounds that the spec only allows them
to have 32 bits of state, so they can't be terribly random.

Given that, I think we should just avert our eyes; anybody depending on
those functions is destined to lose anyway. Probably the "#ifndef WIN32"
fragment suggested above is enough. I suppose we could *also* call
srand(), but that feels a bit silly.

			regards, tom lane
Hello Tom,

Thanks for the feedback.

> +/* use Donald Knuth's LCG constants for default state */
>
> How did Knuth get into this? This algorithm is certainly not his,
> so why are those constants at all relevant?

They are neither more nor less relevant than any other "random" constant.
The state needs a default initialization. The point of using DK's is that
they cannot be some specially crafted value which has some special
property known only to the purveyor of the constant, which could be used
by them to break the algorithm:

https://en.wikipedia.org/wiki/Dual_EC_DRBG

> * I could do without the stream-of-consciousness notes in pg_prng.c.
> I think what's appropriate is to say "We use thus-and-such a generator
> which is documented here", maybe with a line or two about its properties.

The stuff was really written as a "why this" for the first patch, and to
prevent later questions about "why not this other generator", because
those could never stop.

> * Function names like these convey practically nothing to readers:
>
> +extern int64 pg_prng_i64(pg_prng_state *state);
> +extern uint32 pg_prng_u32(pg_prng_state *state);
> +extern int32 pg_prng_i32(pg_prng_state *state);
> +extern double pg_prng_f64(pg_prng_state *state);
> +extern bool pg_prng_bool(pg_prng_state *state);

The intention is obviously "postgres pseudo-random number generator for
<type>". ISTM that it conveys (1) that it is postgres-specific stuff, (2)
that it is a PRNG (which I find *MUCH* more informative than the
misleading statement that something is random when it is not, and it is
shorter), and (3) the type it returns, because C requires functions to
have distinct names.

What would you suggest?

> and these functions' header comments add a grand total of zero bits
> of information.

Yes, probably. I do not like not commenting on a function at all.

> What someone generally wants to know first about a PRNG is (a) is it
> uniform and (b) what is the range of outputs, neither of which is
> specified anywhere.

ISTM (b) is suggested by the type, and for (a) I'm not sure about a PRNG
which would not at least claim to be uniform. Non-uniform PRNGs are
usually built on top of a uniform one.

What do you suggest as alternate names?

> +#define FIRST_BIT_MASK UINT64CONST(0x8000000000000000)
> +#define RIGHT_HALF_MASK UINT64CONST(0x00000000FFFFFFFF)
> +#define DMANTISSA_MASK UINT64CONST(0x000FFFFFFFFFFFFF)
>
> I'm not sure that these named constants are any more readable than
> writing the underlying constant, maybe less so --- in particular
> I think something based on (1<<52)-1 would be more appropriate for
> the float mantissa operations. We don't need RIGHT_HALF_MASK at
> all; the casts to uint32 or int32 will accomplish that just fine.

Yep. I did it for uniformity.

> BTW, why are we bothering with FIRST_BIT_MASK in the first place,
> rather than returning "v & 1" for pg_prng_bool?

Because some PRNGs are very bad in the low bits -- not the xoroshiro
stuff, though.

> Is xoroshiro128ss less random in the low-order bits than the higher?
> If so, that would be a pretty important thing to document. If it's not,
> we shouldn't make the code look like it is.

Dunno. Why should we prefer low bits?

> + * select in a range with bitmask rejection.
>
> What is "bitmask rejection"? Is it actually important to callers?

No, but it is important to understand how it is done. That is the name of
the technique which is implemented, which helps if you want to understand
what is going on by googling it.
This point could be moved inside the function.

> I think this should be documented more like "Produce a random
> integer uniformly selected from the range [rmin, rmax)."

Sure.

-- Fabien.
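Since the mantissa discussion is hard to follow in the abstract, here is
a minimal sketch of the kind of uint64-to-double conversion such a mask
supports. This is an illustration, not the patch's exact code; ldexp() is
the libm dependency mentioned at the end of the thread.

    #include <math.h>
    #include <stdint.h>

    /*
     * Map a 64-bit PRNG output to a uniform double in [0, 1): keep the
     * top 52 bits (the width of a double's explicit mantissa) and scale
     * by 2^-52.
     */
    static double
    u64_to_double(uint64_t v)
    {
        return ldexp((double) (v >> (64 - 52)), -52);
    }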
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> How did Knuth get into this? This algorithm is certainly not his,
>> so why are those constants at all relevant?

> They are neither more nor less relevant than any other "random" constant.
> The state needs a default initialization. The point of using DK's is that
> they cannot be some specially crafted value which has some special
> property known only to the purveyor of the constant, which could be used
> by them to break the algorithm.

Well, none of that is in the comment, which is probably just as well,
because it reads like baseless paranoia. *Any* initialization vector
should be as good as any other; if it's not, that's an algorithm fault.
(OK, I'll give it a pass for zeroes being bad, but otherwise not.)

>> * Function names like these convey practically nothing to readers:
>>
>> +extern int64 pg_prng_i64(pg_prng_state *state);
>> +extern uint32 pg_prng_u32(pg_prng_state *state);
>> +extern int32 pg_prng_i32(pg_prng_state *state);
>> +extern double pg_prng_f64(pg_prng_state *state);
>> +extern bool pg_prng_bool(pg_prng_state *state);

> The intention is obviously "postgres pseudo-random number generator for
> <type>". [...]
> What would you suggest?

We have names for these types, and those abbreviations are (mostly) not
them. Name-wise I'd be all right with pg_prng_int64 and so on, but I
still expect these functions' header comments to be explicit about
uniformity and about the precise output range. As an example, it's far
from obvious whether the minimum value of pg_prng_int32 should be zero or
INT_MIN. (Actually, I suspect you ought to provide both of those cases.)
And the output range of pg_prng_float8 is not merely unobvious, but not
very easy to deduce from examining the code either; not that users should
have to.

>> BTW, why are we bothering with FIRST_BIT_MASK in the first place,
>> rather than returning "v & 1" for pg_prng_bool?

> Because some PRNGs are very bad in the low bits -- not the xoroshiro
> stuff, though.

Good, but then you shouldn't write the associated code as if that's still
a problem, because you'll cause other people to think it's still a
problem and write equally contorted code elsewhere. "v & 1" is a
transparent way of producing a bool, while this code isn't.

			regards, tom lane
Hello,

>> They are neither more nor less relevant than any other "random" constant.
>> The state needs a default initialization. The point of using DK's is that
>> they cannot be some specially crafted value which has some special
>> property known only to the purveyor of the constant, which could be used
>> by them to break the algorithm.
>
> Well, none of that is in the comment, which is probably just as well,
> because it reads like baseless paranoia.

Sure. Welcome to cryptography :-)

> *Any* initialization vector should be as good as any other; if it's not,
> that's an algorithm fault.

Yep.

> (OK, I'll give it a pass for zeroes being bad, but otherwise not.)

Ok. We can use any non-zero constant. What's wrong with constants provided
by a Turing-award computer scientist? I find them more attractive than
some stupid 0x0123456789….

>>> * Function names like these convey practically nothing to readers:
>>> +extern int64 pg_prng_i64(pg_prng_state *state); [...]
>
>> The intention is obviously "postgres pseudo-random number generator for
>> <type>". [...]
>> What would you suggest?
>
> We have names for these types, and those abbreviations are (mostly)
> not them. Name-wise I'd be all right with pg_prng_int64 and so on,

Ok. You prefer "uint64" to "u64".

> but I still expect these functions' header comments to be explicit
> about uniformity and about the precise output range.

Ok.

> As an example, it's far from obvious whether the minimum value
> of pg_prng_int32 should be zero or INT_MIN.
> (Actually, I suspect you ought to provide both of those cases.)

I agree that it is not obvious. I added "p" for "positive" variants, and
found one place where one could be used.

> And the output range of pg_prng_float8 is not merely unobvious, but not
> very easy to deduce from examining the code either; not that users
> should have to.

Ok.

>>> BTW, why are we bothering with FIRST_BIT_MASK in the first place,
>>> rather than returning "v & 1" for pg_prng_bool?
>
>> Because some PRNGs are very bad in the low bits -- not the xoroshiro
>> stuff, though.
>
> Good, but then you shouldn't write the associated code as if that's still
> a problem, because you'll cause other people to think it's still a
> problem and write equally contorted code elsewhere. "v & 1" is a
> transparent way of producing a bool, while this code isn't.

"v & 1" really produces an integer, not a bool; I'd prefer to actually
generate a boolean and let the compiler's optimizer do the cleaning. Some
generators of the Xoshiro family have "linear artifacts in the low bits".
Although Xoroshiro128** is supposed to be immune, I thought it better to
keep away from those bits, and I could not see why the last bit would be
better than any other bit, so taking the first looked okay to me.

I think that the attached v18 addresses most of your concerns.

-- Fabien.
Attachment
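To make the two positions concrete, both derivations are one-liners. A
sketch of the two options (which one the committed code settles on is
only stated below as trusting the high bits consistently):

    #include <stdbool.h>
    #include <stdint.h>

    /* two transparent ways to turn a 64-bit PRNG output into a bool */
    static bool bool_from_low_bit(uint64_t v)  { return (v & 1) != 0; }   /* the review's suggestion */
    static bool bool_from_high_bit(uint64_t v) { return (v >> 63) != 0; } /* keep away from low bits */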
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Good, but then you shouldn't write the associated code as if that's still
>> a problem, because you'll cause other people to think it's still a
>> problem and write equally contorted code elsewhere. "v & 1" is a
>> transparent way of producing a bool, while this code isn't.

> Some generators of the Xoshiro family have "linear artifacts in the low
> bits". Although Xoroshiro128** is supposed to be immune, I thought it
> better to keep away from those bits, and I could not see why the last bit
> would be better than any other bit, so taking the first looked okay to me.

Meh. If we're going to trust the high bits more than the lower ones, we
should do so consistently; it makes no sense to do that in one pg_prng.c
function and not its siblings.

Pushed with that change and some others, notably:

* Rewrote a lot of the comments.

* Refactored so that pg_strong_random() is not called from pg_prng.c. As
it stood, that would result in pulling in OpenSSL in programs that have no
need of it. (ldexp() still creates a dependency on libm, but I figured
that was OK.)

* Changed a number of call sites that were using modulo reduction to use
pg_prng_uint64_range instead. Given the lengthy discussion we had, it
seems silly not to apply the conclusion everywhere.

			regards, tom lane
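For readers landing here, a minimal usage sketch of the committed
interface as the thread describes it; treat the exact names and
signatures as assumptions to verify against common/pg_prng.h:

    #include "common/pg_prng.h"

    static void
    prng_demo(void)
    {
        pg_prng_state rng;
        uint64      roll;
        double      x;
        bool        coin;

        pg_prng_seed(&rng, UINT64CONST(42));        /* deterministic seed */
        roll = pg_prng_uint64_range(&rng, 1, 6);    /* inclusive bounds: 1..6 */
        x = pg_prng_double(&rng);                   /* uniform in [0, 1) */
        coin = pg_prng_bool(&rng);

        (void) roll; (void) x; (void) coin;
    }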
> Pushed with that change and some others, notably:

Thanks for the improvements and the push!

-- Fabien.