Re: pgbench - add pseudo-random permutation function - Mailing list pgsql-hackers
From:           Fabien COELHO
Subject:        Re: pgbench - add pseudo-random permutation function
Msg-id:         alpine.DEB.2.22.394.2104051009040.2033293@pseudo
In response to: Re: pgbench - add pseudo-random permutation function (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses:      Re: pgbench - add pseudo-random permutation function
List:           pgsql-hackers
Hello Dean,

>> I think that permute should only use integer operations. I'd suggest
>> using one of the integer variants instead of going through a double
>> computation and casting back to int. The internal state is based on
>> integers; I do not see the added value of going through floats, possibly
>> enduring floating-point issues (underflow, rounding, normalization,
>> whatever) on the way, whereas from start to finish we just need ints.
>
> This is the already-established coding pattern used in getrand() to
> pick a random number uniformly in some range that's not necessarily a
> power of 2.

Indeed. I'm not particularly happy with that one either.

> Floating point underflow and normalisation issues are not possible
> because erand48() takes a 48-bit integer N and uses ldexp() to store
> N/2^48 in a double, which is an exact operation since IEEE doubles
> have something like 56-bit mantissas.

The double mantissa is 52 bits.

> This is then turned back into an integer in the required range by
> multiplying by the desired maximum value, so there's never any risk of
> underflow or normalisation issues.

ISTM that there are significant issues when multiplying by an integer,
because the integer is cast to a double before the multiplication, so if
the int needs more than 52 bits it is coldly truncated: some values are
simply lost in the process and will never be drawn. Probably not too many
of them, but some of them anyway.

> I guess that there may be rounding variations once the required
> maximum value exceeds something like 2^56 (although the comment in
> getrand() is much more conservative than that), so it's possible that
> a pgbench script calling random() with (ub-lb) larger than that might
> give different results on different platforms. Dunno.

This may be the same issue I'm pointing out above.

> For the non-uniform random functions, that effect might well kick in
> sooner. I'm not aware of any field complaints about that though,
> possibly because real-world data sizes are significantly smaller than
> that.
>
> In practice, permute() is likely to take its input from one of the
> non-uniform random functions, so it won't be permute() that first
> introduces rounding issues.

Sure. I'd still like permute to be immune to that.

>> See attached v27 proposal.
>
> This update has a number of flaws. For example, this:

Indeed :-)

> +static uint64
> +randu64(RandomState * state)
> +{
> +	uint64	r1 = pg_jrand48((*state).xseed),
> +			r2 = pg_jrand48((*state).xseed);
> +	return r1 << 51 | r2 << 13 | r1 >> 13;
> +}
>
> It still uses a 48-bit RandomState, so it doesn't improve on getrand()
> in that respect.

Sure. I'm pretty unhappy with that one too, but I was not trying to
address that point here: my idea was that randu64 would be replaced with
something better at some point. My intention was "64-bit pseudo-random";
my implementation does not deliver that, ok :-)

> It replaces a single erand48() call with 2 jrand48() calls, which
> comes at a cost in terms of performance. (In fact, changing the number
> of rounds in the previous version of permute() from 4 to 6 has a
> smaller performance impact than this -- more about that below.)

Sure, same remark as above: I was not trying to address that point either.

> jrand48() returns a signed 32-bit integer, which has a 50% chance of
> being negative, so when that is cast to a uint64, there is a 50%
> chance that the 32 most significant bits will be 1. When the various
> parts are OR'ed together, that will then mask out any randomness from
> the other parts. For example, 50% of the time, the jrand48() value
> used for r1 will be negative, and so 32 bits in the middle of the
> final result will all be set.

Argh. I hesitated to use xor; I should not have :-)

> So overall, the results will be highly non-uniform, with less
> randomness and poorer performance than erand48().

Indeed, a bad choice. I wanted to use the unsigned version, but it is not
implemented, and I switched to the signed version without thinking through
some of the implications.
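For the record, a sketch of how the two calls could be combined without
the sign-extension pitfall (this is not the v27 code, and the pg_jrand48
declaration is paraphrased here with stdint types):

#include <stdint.h>

/* pgbench's 48-bit PRNG; declaration paraphrased with stdint types */
extern int32_t pg_jrand48(unsigned short xseed[3]);

static uint64_t
randu64_sketch(unsigned short xseed[3])
{
	/*
	 * Narrow to uint32_t first: a direct cast to uint64_t would
	 * sign-extend a negative jrand48() result and set the high 32 bits.
	 */
	uint64_t	hi = (uint32_t) pg_jrand48(xseed);
	uint64_t	lo = (uint32_t) pg_jrand48(xseed);

	/* two independent 32-bit halves make 64 pseudo-random bits */
	return (hi << 32) | lo;
}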
> In addition, it returns a result in the range [0,2^64), which is not
> really what's wanted. For example:
>
> +	/* Random offset */
> +	r = randu64(&random_state2);
> +	v = (v + r) % size;
>
> The previous code used r = getrand(0, size-1), which gave a uniformly
> random offset. However, the new code risks introducing additional
> non-uniformity when size is not a power of 2.

ISTM that the overall non-uniformity is worse with the float approach than
with the int approach. Conceptually, the same kind of bias is expected
whether you go through floats or through ints, because the underlying
power-of-two maths is the same; what makes the difference in reducing
non-uniformity is using more bits. Basically, when enough bits are used,
each value appears either n or n+1 times. When not enough bits are
provided, things get ugly: for instance, with size = 2^53, even if the
floats carried a fully pseudo-random 52-bit mantissa (it is really 48 bits
with erand48), only even numbers would be produced, whereas with ints all
numbers are produced. With erand48, when size is above 48 bits, ISTM that
the last bits are always zeros with the double approach. And I'm not even
counting the values lost to truncation when size is converted to a double.

> Finally, worst of all, this random offset is no longer bijective, due
> to 64-bit integer wrap-around. For example, suppose that size=100 and
> r=(2^64-10), then the following 2 values both map to the same result:
>
> v = 20 -> (v + r) % size
>         = (20 + (2^64 - 10)) % 100
>         = (2^64 + 10) % 100
>         = (10) % 100
>         = 10
>
> v = 4 -> (v + r) % size
>        = (4 + (2^64 - 10)) % 100
>        = (2^64 - 6) % 100
>        = (18446744073709551610) % 100
>        = 10
>
> So not only are the results no longer uniformly random, they might not
> even form a permutation.

Indeed, this one is pretty fun! Probably the right formula for this
approach is "(v + r % size) % size", which is kind of a mouthful.
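Spelling the two variants out as a throwaway check (not from the patch):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t	size = 100;
	uint64_t	r = UINT64_MAX - 9;	/* 2^64 - 10 */

	/*
	 * Broken offset: v + r wraps modulo 2^64 before "% size", so
	 * v = 20 and v = 4 both collide on 10, as computed above.
	 */
	printf("%llu %llu\n",
		   (unsigned long long) ((20 + r) % size),	/* 10 */
		   (unsigned long long) ((4 + r) % size));	/* 10 */

	/*
	 * Fixed offset: r % size < size, so v + r % size cannot wrap for
	 * v < size (with size up to 2^63), and the mapping stays bijective.
	 */
	printf("%llu %llu\n",
		   (unsigned long long) ((20 + r % size) % size),	/* 26 */
		   (unsigned long long) ((4 + r % size) % size));	/* 10 */

	return 0;
}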
I fully agree that my v27 implementation is botched along several
dimensions, some of them intentional and temporary (calling jrand48 twice)
and some of them accidental (not considering int overflows, being
optimistic about signed-to-unsigned casts…). I still disagree, though,
that going through floating point is the right thing to do, because of
some of the issues outlined above (e.g. truncation and rounding for sizes
above 48/52 bits). Basically, I think that an algorithm dealing with
integers should not have to resort to floating-point computations unless
that is actually required, and it is not required for permute: v26 uses
doubles as glorified 48-bit integers, which could be extended to 52-bit
integers, but no further. The only benefit I see is implicitly using the
internal 104-bit rounding-by-truncation on multiply, but I do not think
that implicitly reducing practical int values to 52 bits is worth it; the
same quality (bias) can be achieved for 63-bit integers by keeping them as
ints and writing the right formula, which I fully failed to demonstrate in
v27.

> I did some more testing of the previous version (v26), this time
> looking at much larger sizes, all the way up to the maximum, which is
> 2^63-1 since it comes from a signed int64. In general, the results
> were very good, though I did notice some slight non-uniformity in the
> way adjacent inputs were separated from one another when the size was
> just under a power of 2. I think that's the hardest case for this
> algorithm, because there's very little overlap between the 2 halves.

Yes, fewer values are steered twice per round. However, as for adjacent
values at large sizes, I wonder whether this has more to do with the
48-bit limitation, so that the lower bits are not really xored, for
instance. Not sure.

> Increasing the number of rounds from 4 to 6 ironed out that
> non-uniformity (and as mentioned above, is still cheaper than using
> randu64() with 4 rounds), so I think we should go with that.

There is a quality-cost tradeoff. With the previous version I convinced
myself that 4 rounds were a good compromise (not perfect, but ok for
keeping the cost low at practical sizes). With this version, I'll admit
that I do not have an opinion.

> You may wish to submit a separate patch to replace pgbench's use of
> *rand48() with something else, and that would be discussed on its own
> merits, but I don't see why that should hold up adding permute().

I'll see.

Attached is a v28 which I hope fixes the above issues and stays with ints.
The randu64 there is still something of a joke: I artificially reduced the
cost by calling jrand48 once and extending it to 64 bits, so that it gives
an idea of the cost that would be endured if a 64-bit PRNG were used.

Now, you are the committer; you can do as you please. I'm just stating my
(mathematical) opinions about using floating-point computations for this.
I think that, apart from this point of principle/philosophy, the permute
performance and implementation are reasonable, and better than my initial
version because it avoids the int128 computations and the large-prime
business.
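As an illustration of the "stay with ints" approach: a bounded uniform
draw needs no doubles at all, e.g. with rejection sampling. This is a
sketch only, not the v28 code, and rand64() is a stand-in for whatever
64-bit generator would be used:

#include <stdint.h>

extern uint64_t rand64(void);	/* hypothetical 64-bit PRNG */

/*
 * Uniform draw from [0, size), size > 0, with integer arithmetic only:
 * reject the top partial block so every residue appears equally often.
 */
static uint64_t
uniform_below(uint64_t size)
{
	/* largest multiple of size representable in 64 bits */
	uint64_t	limit = UINT64_MAX - UINT64_MAX % size;
	uint64_t	r;

	do
	{
		r = rand64();
	} while (r >= limit);

	return r % size;
}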
-- 
Fabien.