Re: pgbench - add pseudo-random permutation function - Mailing list pgsql-hackers

From: Fabien COELHO
Subject: Re: pgbench - add pseudo-random permutation function
Date:
Msg-id: alpine.DEB.2.22.394.2104051009040.2033293@pseudo
In response to: Re: pgbench - add pseudo-random permutation function (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses: Re: pgbench - add pseudo-random permutation function
List: pgsql-hackers

Hello Dean,

>> I think that permute should only use integer operations. I'd suggest to
>> use one of the integer variants instead of going through a double
>> computation and casting back to int. The internal state is based on
>> integers, I do not see the added value of going through floats, possibly
>> enduring floating point issues (underflow, rounding, normalization,
>> whatever) on the way, whereas from start to finish we just need ints.
>
> This is the already-established coding pattern used in getrand() to
> pick a random number uniformly in some range that's not necessarily a
> power of 2.

Indeed. I'm not particularly happy with that one either.

> Floating point underflow and normalisation issues are not possible
> because erand48() takes a 48-bit integer N and uses ldexp() to store
> N/2^48 in a double, which is an exact operation since IEEE doubles
> have something like 56-bit mantissas.

Double mantissa size is 52 bits (53 counting the implicit leading bit).

> This is then turned back into an integer in the required range by 
> multiplying by the desired maximum value, so there's never any risk of 
> underflow or normalisation issues.

ISTM that there are significant issues when multiplying by an integer, 
because the integer is cast to a double before multiplying, so if the 
integer does not fit in 53 bits it is silently truncated and some values 
are simply lost in the process and will never be drawn. Probably not too 
many of them, but some of them anyway.
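
To illustrate (a standalone sketch, not pgbench code): once the size does 
not fit in 53 bits, casting it to a double is already lossy, and even the 
largest possible erand48-style draw cannot reach the top of the range:

    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    int main(void)
    {
        int64_t size = ((int64_t) 1 << 53) + 1; /* not exactly representable */
        double  dsize = (double) size;          /* silently rounds to 2^53 */

        /* getrand-style draw: uniform double in [0,1) scaled by the size,
         * using the largest possible 48-bit erand48 output */
        double  u = ldexp((double) (((int64_t) 1 << 48) - 1), -48);
        int64_t v = (int64_t) (u * dsize);

        printf("size - 1               = %lld\n", (long long) (size - 1));
        printf("largest drawable value = %lld\n", (long long) v);
        return 0;
    }

The last printed value is 2^53 - 32, so the top of the range is never 
drawn.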

> I guess that there may be rounding variations once the required
> maximum value exceeds something like 2^56 (although the comment in
> getrand() is much more conservative than that), so it's possible that
> a pgbench script calling random() with (ub-lb) larger than that might
> give different results on different platforms.

Dunno. This may be the same issue I'm pointing out above.

> For the non-uniform random functions, that effect might well kick in 
> sooner. I'm not aware of any field complaints about that though, 
> possibly because real-world data sizes are significantly smaller than 
> that.
>
> In practice, permute() is likely to take its input from one of the
> non-uniform random functions, so it won't be permute() that first
> introduces rounding issues.

Sure. I'd like permute to be immune to that.

>> See attached v27 proposal.
>
> This update has a number of flaws. For example, this:

Indeed:-)

> +static uint64
> +randu64(RandomState * state)
> +{
> +    uint64 r1 = pg_jrand48((*state).xseed),
> +           r2 = pg_jrand48((*state).xseed);
> +    return r1 << 51 | r2 << 13 | r1 >> 13;
> +}
>
> It still uses a 48-bit RandomState, so it doesn't improve on getrand()
> in that respect.

Sure. I'm pretty unhappy with that one, but I was not trying to address 
that. My idea was that randu64 would be replaced with something better at 
some point. My intention was "64-bit pseudo-random"; my implementation 
does not work, ok:-)

> It replaces a single erand48() call with 2 jrand48() calls, which
> comes at a cost in terms of performance. (In fact, changing the number
> of rounds in the previous version of permute() from 4 to 6 has a
> smaller performance impact than this -- more about that below.)

Sure, same remark as above, I was not trying to address that point.

> jrand48() returns a signed 32-bit integer, which has a 50% chance of
> being negative, so when that is cast to a uint64, there is a 50%
> chance that the 32 most significant bits will be 1.

Argh.

> When the various parts are OR'ed together, that will then mask out any 
> randomness from the other parts. For example, 50% of the time, the 
> jrand48() value used for r1 will be negative, and so 32 bits in the 
> middle of the final result will all be set.

Argh. I hesitated to use xor. I should not have:-)
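
For the record, a sketch of the kind of fix, masking out the sign 
extension before shifting (with the same 48-bit-state caveat as before):

    uint64 r1 = (uint32) pg_jrand48(state->xseed); /* keep the low 32 bits */
    uint64 r2 = (uint32) pg_jrand48(state->xseed);
    uint64 r = r1 << 32 | r2;                      /* no sign extension */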

> So overall, the results will be highly non-uniform, with less
> randomness and poorer performance than erand48().

Indeed, a bad choice. I wanted to use the unsigned version, but it is not 
implemented, and I switched to the signed version without thinking through 
some of the implications.

> In addition, it returns a result in the range [0,2^64), which is not
> really what's wanted. For example:
>
> +        /* Random offset */
> +        r = randu64(&random_state2);
> +        v = (v + r) % size;
>
> The previous code used r = getrand(0, size-1), which gave a uniformly
> random offset. However, the new code risks introducing additional
> non-uniformity when size is not a power of 2.

ISTM that the overall non-uniformity is worse with the float approach 
than with the int approach.

Conceptually, the same kind of bias is expected whether you go through 
floats or through ints, because the underlying power-of-two maths is the 
same, so what makes the difference in reducing non-uniformity is using 
more bits: when mapping m uniform bits onto a size that does not divide 
2^m, each output value appears either n or n+1 times (n = floor(2^m / 
size)), and with enough bits the relative difference becomes negligible.
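
A toy illustration of the n vs n+1 point, exhaustively mapping m = 16 
bits onto size = 6 (standalone, not pgbench code):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t count[6] = {0};

        for (uint64_t x = 0; x < ((uint64_t) 1 << 16); x++)
            count[x % 6]++;         /* int approach: plain modulo */

        for (int i = 0; i < 6; i++)
            printf("%d: %llu\n", i, (unsigned long long) count[i]);
        /* values 0..3 appear 10923 times, values 4..5 appear 10922
         * times; with more input bits the difference vanishes */
        return 0;
    }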

When not enough bits are provided, things get ugly: for instance, with 
size = 2^53, even if the full 52-bit double mantissa were pseudo-random 
(with erand48 it is really only 48 bits), only even numbers would be 
produced, whereas with ints all numbers can be produced. With erand48, 
when size is above 2^48, ISTM that the last bits are always zero with the 
double approach. I'm not even counting the values lost to truncation when 
size itself is converted to a double.
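
The size = 2^53 case is easy to check (standalone sketch): scaling a 
52-bit mantissa n / 2^52 by 2^53 yields exactly 2n, so the low bit of the 
result is always zero:

    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    int main(void)
    {
        uint64_t odd = 0;

        /* sample 52-bit mantissas across the full range */
        for (uint64_t n = 1; n < ((uint64_t) 1 << 52);
             n += ((uint64_t) 1 << 32) + 12345)
        {
            double   u = ldexp((double) n, -52);          /* in [0,1), exact */
            uint64_t v = (uint64_t) (u * ldexp(1.0, 53)); /* size = 2^53 */
            odd += v & 1;
        }
        printf("odd values drawn: %llu\n", (unsigned long long) odd); /* 0 */
        return 0;
    }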

> Finally, worst of all, this random offset is no longer bijective, due
> to 64-bit integer wrap-around. For example, suppose that size=100 and
> r=(2^64-10), then the following 2 values both map to the same result:
>
>  v = 20 -> (v + r) % size
>          = (20 + (2^64 - 10)) % 100
>          = (2^64 + 10) % 100
>          = (10) % 100
>          = 10
>
>  v = 4 -> (v + r) % size
>         = (4 + (2^64 - 10)) % 100
>         = (2^64 - 6) % 100
>         = (18446744073709551610) % 100
>         = 10
>
> So not only are the results no longer uniformly random, they might not
> even form a permutation.

Indeed, this one is pretty fun! Probably the right formula for this 
approach is "(v + r % size) % size", which is kind of a mouthful.
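
In C, with v, r and size all uint64, the fixed offset step would be:

    /* reduce r before adding: v < size and r % size < size, so the sum
     * is below 2 * size, which fits since size <= 2^63 - 1 */
    v = (v + r % size) % size;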

I fully agree that my v27 implementation is botched on many dimensions, 
some of them intentional and temporary (using jrand48 twice) and some of 
them accidental (not considering int overflows, being optimistic about 
signed-to-unsigned casts…).

I still disagree, though, that going through floating point is the right 
thing to do, because of some of the issues outlined above (e.g. truncation 
and rounding for sizes above 48/52 bits). Basically, I think that an 
algorithm dealing with integers should not have to resort to floating 
point computations unless actually required. This is not the case for 
permute, where v26 uses doubles as glorified 48-bit integers, which could 
be extended to 52-bit integers, but no more. The only benefit I see is 
implicitly using the internal 104-bit rounding-by-truncation on multiply, 
but I do not think that implicitly reducing the practical int values to 
52 bits is worth it, and the same quality (bias) can be achieved for 
63-bit integers by keeping them as ints and writing the right formula, 
which I fully failed to demonstrate in v27.
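
For the record, one standard integer-only way to get an unbiased value in 
[0, size) is rejection (the textbook approach, not what permute does; 
rand64() is a hypothetical 64-bit PRNG):

    uint64 threshold = -size % size;    /* == 2^64 mod size */
    uint64 r;

    do
        r = rand64();                   /* hypothetical 64-bit generator */
    while (r < threshold);              /* reject the biased low band */
    return r % size;                    /* now exactly uniform */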

> I did some more testing of the previous version (v26), this time
> looking at much larger sizes, all the way up to the maximum, which is
> 2^63-1 since it comes from a signed int64. In general, the results
> were very good, though I did notice some slight non-uniformity in the
> way adjacent inputs were separated from another when the size was just
> under a power of 2. I think that's the hardest case for this
> algorithm, because there's very little overlap between the 2 halves.

Yes, fewer values are steered twice per round. However, as for adjacent 
values at large sizes, I'm wondering whether this may have more to do 
with the 48-bit limitation, so that the lower bits are not really xored, 
for instance. Not sure.

> Increasing the number of rounds from 4 to 6 ironed out that
> non-uniformity (and as mentioned above, is still cheaper than using
> randu64() with 4 rounds), so I think we should go with that.

There is a quality-cost tradeoff. With the previous version I convinced 
myself that 4 rounds were a good compromise (not perfect, but ok for 
keeping the cost low on practical sizes).

With this version, I'll admit that I do not have an opinion.

> You may wish to submit a separate patch to replace pgbench's use of
> *rand48() with something else, and that would be discussed on its own
> merits, but I don't see why that should hold up adding permute().

I'll see.

Attached is a v28 which I hope fixes the many issues above and stays with 
ints. The randu64 is still something of a joke: I artificially reduced its 
cost by calling jrand48 once and extending the result to 64 bits, so as to 
give an idea of the cost that would be incurred if a real 64-bit PRNG were 
used.
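
Something along these lines (illustration only, not necessarily the exact 
v28 code):

    /* stretch one 32-bit draw to 64 bits: roughly the cost of a 64-bit
     * PRNG call, but obviously not 64 bits of actual randomness */
    uint64 r = (uint32) pg_jrand48(state->xseed);
    r = r << 32 | r;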

Now, you are the committer, so you can do as you please; I'm just stating 
my (mathematical) opinions about using floating point computations for 
this. I think that, apart from this point of principle/philosophy, the 
permute performance and implementation are reasonable, and better than my 
initial version because it avoids the int128 computations and the large 
prime number business.

-- 
Fabien.
