Re: rand48 replacement - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: rand48 replacement
Date
Msg-id CAEZATCX2qmzW+mVgpFM1HWwP_7bu_PnPvJomKMC58u02Ls39+w@mail.gmail.com
Whole thread Raw
In response to Re: rand48 replacement  (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses Re: rand48 replacement  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Wed, 7 Jul 2021 at 04:00, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>
> Anyway, excuse me for heating this discussion cause of such
> non-essential issue.
> I'll try to control myself and don't proceed it further.
>

Whilst it has been interesting learning and discussing all these
different techniques, I think it's probably best to stick with the
bitmask method, rather than making the code too complex and difficult
to follow. The bitmask method has the advantage of being very simple,
easy to understand and fast (fastest in many of the benchmarks, and
close enough in others to make me think that the difference won't
matter for our purposes).

To test the current patch, I hacked up a simple SQL-callable server
function: random(bigint, bigint) returns bigint, similar to the one in
pgbench. After doing so, I couldn't help thinking that it would be
useful to have such a function in core, so maybe that could be a
follow-on patch. Anyway, that led to the following observations:

Firstly, there's a bug in the existing mask_u64() code -- if
pg_leftmost_one_pos64(u) returns 63, you end up with a mask equal to
0, and it breaks down.

Secondly, I think it would be simpler to implement this as a bitshift,
rather than a bitmask, using the high bits from the random number.
That might not make much difference for xoroshiro**, but in general,
PRNGs tend to be weaker in the lower bits, so it seems preferable on
that basis. But also, it makes the code simpler and less error-prone.

Finally, I think it would be better to treat the upper bound of the
range as inclusive. Doing so makes the function able to cover all
possible 64-bit ranges. It would then be easy (perhaps in another
follow-on patch) to make the pgbench random() function work for all
64-bit bounds (as long as max >= min), without the weird overflow
checking it currently has.

Putting those 3 things together, the code (minus comments) becomes:

    if (range > 0)
    {
        int rshift = 63 - pg_leftmost_one_pos64(range);

        do
        {
            val = xoroshiro128ss(state) >> rshift;
        }
        while (val > range);
    }
    else
        val = 0;

which reduces the complexity a bit.

Regards,
Dean



pgsql-hackers by date:

Previous
From: Boris Kolpackov
Date:
Subject: Re: Pipeline mode and PQpipelineSync()
Next
From: Yugo NAGATA
Date:
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors