Re: pgbench - add pseudo-random permutation function - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: pgbench - add pseudo-random permutation function
Date
Msg-id CAEZATCVgW-hT2L_eyighcBB3+pujuRmihXAE73_vwNF4WtDLVQ@mail.gmail.com
Whole thread Raw
In response to Re: pgbench - add pseudo-random permutation function  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: pgbench - add pseudo-random permutation function
List pgsql-hackers
On Wed, 31 Mar 2021 at 18:53, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> While looking at it, I have some doubts on this part:
>
>   m = (uint64) (pg_erand48(random_state.xseed) * (mask + 1)) | 1;
>   r = (uint64) (pg_erand48(random_state.xseed) * (mask + 1));
>   r = (uint64) (pg_erand48(random_state.xseed) * size);
>
> I do not understand why the random values are multiplied by anything in
> the first place…
>

These are just random integers in the range [0,mask] and [0,size-1],
formed in exactly the same way as getrand().

> This one looks like a no-op :
>
>     r = (uint64) (pg_erand48(random_state.xseed) * size);
>     v = (v + r) % size;
>
>     v = (v + r) % size
>       = (v + rand * size) % size
>       =? (v % size + rand * size % size) % size
>       =? (v % size + 0) % size
>       = v % size
>       = v
>

rand * size % size is not zero because rand is a floating point number
in the range [0,1), so actually rand * size % size = rand * size.
Similarly in the other case, you're forgetting that rand is not an
integer.

Thinking more about our use of erand48(), the only real impact it has
is to limit the number of possible permutations produced, and actually
2^48 is so huge (roughly 281 million million) that I can't ever see
that being an issue in practice. (In a quick dummy test, replacing
erand48() with a silly "erand8()" function that only returned one of
256 distinct values, permute() still worked fine at any size, except
for the fact that only up to 256 distinct permutations were produced.
In other words, limitations on the source of randomness don't prevent
it from producing permutations of any size, they just limit the number
of distinct permutations possible. And since 2^48 is so big, that
shouldn't be an issue.)

Also, I think the source of the input seed is most likely to be either
manually hand-picked integers or pgbench's own random() function, so
the only real issue I can see is that by ignoring the upper 16-bits,
there's a very small chance of always using the same random sequence
if some hand-picked numbers only vary in the top 16 bits, though I
think that's highly unlikely in practice.

Nonetheless, it's not much more effort to make another random state
and use those remaining bits of the seed and get more internal random
states, so here's an update doing that. I intentionally chose to reuse
the lower 16 bits of the seed in the second random function (in a
different slot of the random state), since those are probably the ones
most likely to vary in practice.

This doesn't actually make any measurable difference to any of the
tests, but it closes that potential loophole of ignoring part of the
seed. In all my tests, the biggest improvement was between v23 and v24
of the patch. By comparison, the later versions have been relatively
small improvements, and it's probably now "random enough" for the
intended purposes.

Regards,
Dean

Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Stronger safeguard for archive recovery not to miss data
Next
From: Michael Paquier
Date:
Subject: Re: Flaky vacuum truncate test in reloptions.sql