Thread: rand48 replacement
Hello pg-devs,

I have given a go at proposing a replacement for rand48.

POSIX 1988 (?) rand48 is an LCG PRNG designed to generate 32-bit integers or floats from a 48-bit state, on 16- or 32-bit architectures. An LCG cycles on its low bits, which can be quite annoying. Given that we run on 64-bit architectures and that we need to generate 64-bit ints or doubles, IMHO it makes very little sense to stick to that.

We should (probably) want:
 - one reasonable default PRNG for all pg internal uses.
 - NOT to invent a new design!
 - something fast, close to rand48 (which basically does 2 arithmetic
   ops, so it is hard to compete); no need for something cryptographic
   though, which would imply slow.
 - to produce 64-bit integers & doubles with a 52-bit mantissa,
   so state size > 64 bits.
 - a small state though, because we might generate quite a few of them
   for different purposes, so state size <= 256 or even <= 128 bits.
 - the state to be aligned to whatever => 128 bits.
 - 64-bit operations for efficiency on modern architectures,
   but not 128-bit operations.
 - not to depend on special hardware for speed (eg MMX/SSE/AES).
 - not something with obvious known and relevant defects.
 - not something with "rights" attached.

These constraints drastically reduce the available options from https://en.wikipedia.org/wiki/List_of_random_number_generators

The attached patch removes "rand48", adds a "pg_prng" implementation based on xoroshiro128ss, and replaces the former everywhere. In pgbench, the non-portable double-relying code is replaced by hopefully portable ints. The interface makes it easy to replace the underlying PRNG if something else is desired.

Thanks for your feedback.

--
Fabien.
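For reference, the xoroshiro128** step the patch builds on is tiny. Here is a sketch of the public-domain reference algorithm (Blackman & Vigna), adapted to a hypothetical two-field state struct -- the struct and function names here are assumptions, not necessarily what the patch uses:

    #include <stdint.h>

    typedef struct { uint64_t s0, s1; } prng_state;

    static inline uint64_t rotl64(uint64_t x, int k)
    {
        return (x << k) | (x >> (64 - k));
    }

    /* one xoroshiro128** step: returns 64 pseudo-random bits and advances the state */
    static uint64_t xoroshiro128ss_next(prng_state *s)
    {
        uint64_t s0 = s->s0;
        uint64_t s1 = s->s1;
        uint64_t result = rotl64(s0 * 5, 7) * 9;

        s1 ^= s0;
        s->s0 = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
        s->s1 = rotl64(s1, 37);
        return result;
    }

It is just a handful of 64-bit multiplies, rotations, shifts and xors per 64 bits of output, which is why it can stay within a few percent of rand48.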
Attachment
Hi,

On 5/24/21 12:31 PM, Fabien COELHO wrote:
>
> Hello pg-devs,
>
> I have given a go at proposing a replacement for rand48.
>

So what is the motivation for replacing rand48? Speed, quality of the produced random numbers, features rand48 can't provide, or what?

> POSIX 1988 (?) rand48 is an LCG PRNG designed to generate 32-bit
> integers or floats from a 48-bit state, on 16- or 32-bit architectures.
> An LCG cycles on its low bits, which can be quite annoying. Given that
> we run on 64-bit architectures and that we need to generate 64-bit ints
> or doubles, IMHO it makes very little sense to stick to that.
>
> [... list of desiderata trimmed ...]
>
> The attached patch removes "rand48", adds a "pg_prng" implementation
> based on xoroshiro128ss, and replaces the former everywhere. In pgbench,
> the non-portable double-relying code is replaced by hopefully portable
> ints. The interface makes it easy to replace the underlying PRNG if
> something else is desired.
>

xoroshiro seems reasonable. How does it compare to rand48? Does it need much less/more state, is it faster/slower, etc.? I'd expect that it produces a better random sequence, considering rand48 is an LCG, which is a fairly simple, decades-old design.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello Tomas,

>> I have given a go at proposing a replacement for rand48.
>
> So what is the motivation for replacing rand48? Speed, quality of the
> produced random numbers, features rand48 can't provide, or what?

Speed can only be near rand48, see below. Quality (eg no trivial cycles, does not fail too quickly on statistical tests) and soundness (the point of generating 52-64 bits of data out of a 48-bit state escapes me: only a small part of the target space can ever be covered).

Also, I like having an implementation-independent interface (the current rand48 code advertises the name of the algorithm everywhere, and the "uint16[3]" state type is hardcoded in several places).

> [...]
>
> xoroshiro seems reasonable. How does it compare to rand48? Does it need
> much less/more state, is it faster/slower, etc.?

Basically any PRNG should be slower than or comparable to rand48, because rand48 only does 2 arithmetic ops: you cannot do much less while still stirring bits. However, because of the 16-bit unpacking/packing rand48 does on a 64-bit architecture, there is some room for additional useful ops, so in the end the user-visible performance is only about 5% lower.

State is 16 bytes vs 6 bytes for rand48. This is ok for generating 8 bytes per round and is still quite small.

> I'd expect that it produces a better random sequence, considering rand48
> is an LCG, which is a fairly simple, decades-old design.

Yep, it does not cycle trivially on the low bits the way an LCG does (eg odd -> even -> odd -> even -> ...): if you have the bad idea to do "% 2" on an LCG to extract a bool, you just alternate.

To summarize:
 - better software engineering
 - similar speed (slightly slower)
 - better statistical quality
 - quite small state
 - soundness

--
Fabien.
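To make the low-bit cycling concrete: in rand48's LCG, x' = a*x + c mod 2^48 with a and c both odd, so the low bit of the state simply alternates with period 2. A standalone demo (not part of the patch, just the standard rand48 constants):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* the rand48 LCG: x' = a*x + c mod 2^48, with a and c both odd */
        const uint64_t a = UINT64_C(0x5DEECE66D);
        const uint64_t c = UINT64_C(0xB);
        const uint64_t mask48 = (UINT64_C(1) << 48) - 1;
        uint64_t x = 42;    /* arbitrary seed */

        for (int i = 0; i < 8; i++)
        {
            x = (a * x + c) & mask48;
            printf("%d", (int) (x & 1));    /* low bit of each state */
        }
        printf("\n");       /* prints 10101010: period 2 */
        return 0;
    }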
> On 24 May 2021, at 15:31, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> - NOT to invent a new design!

A radical version of this argument would be to use the de-facto standard and ubiquitous MT19937. Though, I suspect, it's not the optimal solution to date.

Best regards, Andrey Borodin.
Hi Fabien,

> To summarize:
> - better software engineering
> - similar speed (slightly slower)
> - better statistical quality
> - quite small state
> - soundness

Personally, I think your patch is great. Speaking of speed, I believe we should consider the performance of the entire DBMS in typical scenarios, not the performance of a single procedure. I'm pretty sure in these terms the impact of your patch is negligible now, and almost certainly beneficial in the long term because of better randomness.

While reviewing your patch I noticed that you missed test_integerset.c. Here is an updated patch.

--
Best regards,
Aleksander Alekseev
Attachment
Hello Andrey,

>> - NOT to invent a new design!
>
> A radical version of this argument would be to use the de-facto standard
> and ubiquitous MT19937.

Indeed, I started considering this one for this reason, obviously.

> Though, I suspect, it's not the optimal solution to date.

"not optimal" does not do justice to the issues. The main one is the huge 2.5 KB state of MT19937, which makes it quite impractical for plenty of internal and temporary uses. In pgbench many PRNGs are needed for reproducibility (eg one global, 3 per thread, one per client), plus a temporary one internal to a function call (permute) which is expected to be reasonably fast, so it should not start by initializing 2.5 KB of data. In postgres there are 2 permanent ones (sql random, C double random), plus some in GEQO and in sampling internal structures.

So standard MT19937 is basically out of the equation. It also happens to fail some statistical tests and is not very fast. It has an insanely huge cycle, but pg does not need that, and probably nobody does. The only good point is that it is a standard, which IMO is not enough to outweigh the other issues.

--
Fabien.
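For scale, a sketch of the textbook MT19937 state layout (not any particular library's definition) next to the proposed one:

    /* classic Mersenne Twister state: a 624-word twist buffer plus a cursor */
    typedef struct
    {
        uint32_t mt[624];   /* 624 * 4 = 2496 bytes of twister state */
        int      index;     /* position in mt[], buffer refilled every 624 draws */
    } mt19937_state;        /* ~2.5 KB, vs 16 bytes for a xoroshiro128** state */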
Hello Aleksander,

>> - better software engineering
>> - similar speed (slightly slower)
>> - better statistical quality
>> - quite small state
>> - soundness
>
> Personally, I think your patch is great.

Thanks for having a look!

> Speaking of speed, I believe we should consider the performance of
> the entire DBMS in typical scenarios, not the performance of a single
> procedure.

Sure. I tested a worst-case pgbench script with only "\set i random(1, 100000000)" in a loop; the slowdown was a few percent (AFAICR < 5%).

> I'm pretty sure in these terms the impact of your patch is negligible
> now, and almost certainly beneficial in the long term because of better
> randomness.
>
> While reviewing your patch I noticed that you missed test_integerset.c.
> Here is an updated patch.

Indeed. Thanks for the catch & the v2 patch!

--
Fabien.
Sorry for the duplicate entry in the CF web application.
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Although the patch looks OK, I would like to keep the status "Needs review" for now in case someone would like to join the discussion.
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           tested, passed
> Documentation:            tested, passed
>
> Although the patch looks OK, I would like to keep the status "Needs review"
> for now in case someone would like to join the discussion.

Ok, fine with me.

--
Fabien.
Although this patch is marked RFC, the cfbot shows it doesn't even compile on Windows. I think you missed updating Mkvcbuild.pm.

regards, tom lane
> Although this patch is marked RFC, the cfbot shows it doesn't
> even compile on Windows. I think you missed updating Mkvcbuild.pm.

Indeed. Here is a blind attempt at fixing the build; I'll check later to see whether it works. It would help me if the cfbot results were integrated into the cf app.

--
Fabien.
Attachment
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Although this patch is marked RFC, the cfbot shows it doesn't
>> even compile on Windows. I think you missed updating Mkvcbuild.pm.

> Indeed. Here is a blind attempt at fixing the build; I'll check later to
> see whether it works. It would help me if the cfbot results were
> integrated into the cf app.

Hmm, not there yet per cfbot, not sure why not.

Anyway, after taking a very quick look at the patch itself, I've got just one main objection: I don't approve of putting this in port.h or src/port/. erand48.c is there because we envisioned it originally as an occasionally-used substitute for libc facilities. But this is most certainly not that, so it belongs in src/common/ instead. I'd also be inclined to invent a new single-purpose .h file for it.

I see that you probably did that because random.c and srandom.c depend on it, but I wonder why we don't make an effort to flush those altogether. It's surely pretty confusing to newbies that what appears to be a call of the libc primitives is no such thing.

regards, tom lane
On Thu, 1 Jul 2021 at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Anyway, after taking a very quick look at the patch itself, I've
> got just one main objection: I don't approve of putting this in
> port.h or src/port/.

I haven't looked at the patch in detail, but one thing I object to is the code to choose a random integer in an arbitrary range. Currently, this is done in pgbench by getrand(), which has its problems. However, this patch seems to be replacing that with a simple modulo operation, which is perhaps the worst possible way to do it. There's plenty of research out there on how to do it better -- see, for example, [1] for a nice summary.

Also, I'd say that functions to choose random integers in an arbitrary range ought to be part of the common API, as they are in almost every language's random API.

Regards,
Dean

[1] https://www.pcg-random.org/posts/bounded-rands.html
Hello Tom,

>> Indeed. Here is a blind attempt at fixing the build; I'll check later to
>> see whether it works. It would help me if the cfbot results were
>> integrated into the cf app.
>
> Hmm, not there yet per cfbot, not sure why not.

I'll investigate.

> Anyway, after taking a very quick look at the patch itself, I've
> got just one main objection: I don't approve of putting this in
> port.h or src/port/. erand48.c is there because we envisioned it
> originally as an occasionally-used substitute for libc facilities.
> But this is most certainly not that, so it belongs in src/common/
> instead.

Ok, this would make sense.

> I'd also be inclined to invent a new single-purpose .h
> file for it.

Hmmm. Why not.

> I see that you probably did that because random.c and srandom.c
> depend on it, but I wonder why we don't make an effort to flush
> those altogether.

Ok for removing them. They are used in contrib, where they can be replaced. I hope that extensions do not depend on that, though.

> It's surely pretty confusing to newbies that what appears to be a call
> of the libc primitives is no such thing.

I do not understand your point. If people believe the current random() implementation to be *the* libc primitive, then my linux doc says "The random() function uses a nonlinear additive feedback random number generator employing a default table of size 31 long integers to return successive pseudo-random numbers in the range from 0 to RAND_MAX. The period of this random number generator is very large, approximately 16 * ((2^31) - 1).", which is pretty far from the rand48 implementation provided in port, so ISTM that the confusion is already there?

--
Fabien.
Hello Dean,

> I haven't looked at the patch in detail, but one thing I object to is
> the code to choose a random integer in an arbitrary range.

Thanks for bringing up this interesting question!

> Currently, this is done in pgbench by getrand(), which has its
> problems.

Yes. That is one of the motivations for providing something hopefully better.

> However, this patch seems to be replacing that with a simple
> modulo operation, which is perhaps the worst possible way to do it.

I did it knowing this issue. Here is why:

The modulo operation is biased for large ranges close to the limit, sure. Also, the bias is somehow of the same magnitude as the FP multiplication approach used previously, so the "worst" has not changed much; it is really the same as before. I thought it is not such an issue because for typical uses we are unlikely to be in these conditions, so the one-operation, no-branching approach seemed like a good precision-vs-performance compromise: I'd expect the typical largest ranges to be well below 40 bits (eg a key in a pretty large table in pgbench), which makes the bias well under 1/2**24, and ISTM that I can live with that. With the initial 48-bit state, obviously the situation was not the same.

> There's plenty of research out there on how to do it better -- see,
> for example, [1] for a nice summary.

Rejection methods include branches, and thus may cost significantly more, as shown by the performance figures in the blog. Also, they somehow break the sequence determinism when using a range, which I find quite annoying: ISTM desirable that when generating a number the state advances once, and just once. Also, some methods have a cost that depends on the actual range, eg the bitmask approach: for range 129 the bitmask is 0xff, so you have a nearly 50% probability of iterating once, nearly 25% of iterating twice, and so on… I like performance to be uniform, not to depend on actual values.

Given these arguments I'd be inclined to keep the bias, but I'm open to more discussion.

> Also, I'd say that functions to choose random integers in an arbitrary
> range ought to be part of the common API, as they are in almost every
> language's random API.

That is a good point.

--
Fabien.
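To put a number on the modulo bias under the assumptions above: if u is uniform on [0, 2^64) and we compute u % R, then the residues below 2^64 mod R occur ceil(2^64/R) times while the others occur floor(2^64/R) times, so the worst-case relative over-representation is about R/2^64. For R <= 2^40 that is at most 2^-24, i.e. under one part in sixteen million -- which is where the 1/2**24 figure comes from.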
On Thu, 1 Jul 2021 at 22:18, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> However, this patch seems to be replacing that with a simple
>> modulo operation, which is perhaps the worst possible way to do it.
>
> The modulo operation is biased for large ranges close to the limit, sure.
> Also, the bias is somehow of the same magnitude as the FP multiplication
> approach used previously, so the "worst" has not changed much; it is
> really the same as before.

It may be true that the bias is of the same magnitude as FP multiply, but it is not of the same nature. With FP multiply, the more-likely-to-be-chosen values are more-or-less evenly distributed across the range, whereas modulo concentrates them all at one end, making it more likely to bias test results.

It's worth paying attention to how other languages/libraries implement this, and basically no one chooses the modulo method, which ought to raise alarm bells. Of the biased methods, it has the worst kind of bias and the worst performance.

If a biased method is OK, then the biased integer multiply method seems to be the clear winner. This requires the high part of a 64x64-bit product, which is trivial if 128-bit integers are available, but would need a little more work otherwise. There's some code in common/d2s that might be suitable.

Most other implementations tend to use an unbiased method though, and I think it's worth doing the same. It might be a bit slower, or even faster depending on implementation and platform, but in the context of the DB as a whole, I don't think a few extra cycles matters either way.

The method recommended at the very end of that blog seems to be pretty good all round, but any of the other commonly used unbiased methods would probably be OK too.

Regards,
Dean
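The biased integer multiply method mentioned above is a one-multiply mapping of a 64-bit draw into [0, range). A minimal sketch, assuming a compiler with __uint128_t (gcc/clang) and using rand64() as a stand-in for whatever 64-bit PRNG step is chosen:

    #include <stdint.h>

    extern uint64_t rand64(void);   /* placeholder: any 64-bit PRNG step */

    /*
     * Map a uniform 64-bit value into [0, range) via the high half of a
     * 64x64-bit product. Cheap (one multiply, one shift, no branches),
     * but slightly biased unless a rejection step (Lemire's method) is
     * added on top.
     */
    static uint64_t
    bounded_multiply(uint64_t range)
    {
        return (uint64_t) (((__uint128_t) rand64() * range) >> 64);
    }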
Hello Dean,

> It may be true that the bias is of the same magnitude as FP multiply,
> but it is not of the same nature. With FP multiply, the
> more-likely-to-be-chosen values are more-or-less evenly distributed
> across the range, whereas modulo concentrates them all at one end,
> making it more likely to bias test results.

Yes, that is true.

> It's worth paying attention to how other languages/libraries implement
> this, and basically no one chooses the modulo method, which ought to
> raise alarm bells. Of the biased methods, it has the worst kind of
> bias and the worst performance.

Hmmm. That is not exactly how I interpreted the figures in the blog.

> If a biased method is OK, then the biased integer multiply method
> seems to be the clear winner. This requires the high part of a
> 64x64-bit product, which is trivial if 128-bit integers are available,
> but would need a little more work otherwise. There's some code in
> common/d2s that might be suitable.

And yes, modulo is expensive. If we allowed 128-bit integer operations, I would not choose this PRNG in the first place; I'd take PCG with a 128-bit state. That does not change the discussion about bias, though.

> Most other implementations tend to use an unbiased method though, and I
> think it's worth doing the same. It might be a bit slower, or even
> faster depending on implementation and platform, but in the context of
> the DB as a whole, I don't think a few extra cycles matters either way.

Ok ok ok, I surrender!

> The method recommended at the very end of that blog seems to be pretty
> good all round, but any of the other commonly used unbiased methods
> would probably be OK too.

That does not address my other issues with the proposed methods, in particular the fact that the generated sequence is less deterministic, but I think I have a simple way around that. I'm hesitant to skip to the bitmask method and give up performance uniformity.

I'll try to come up with something over the week-end, and also address Tom's comments in passing.

--
Fabien.
Hello Dean & Tom,

Here is a v4, which:
 - moves the stuff to common and fully removes random/srandom (Tom)
 - includes a range generation function based on the bitmask method (Dean)
   but iterates with splitmix so that the state always advances once (Me)

--
Fabien.
Attachment
> Here is a v4, which:
>
> - moves the stuff to common and fully removes random/srandom (Tom)
> - includes a range generation function based on the bitmask method (Dean)
>   but iterates with splitmix so that the state always advances once (Me)

And a v5 where an unused test file also compiles, if we insist.

--
Fabien.
Attachment
Fabien COELHO wrote 2021-07-03 11:45:
> And a v5 where an unused test file also compiles, if we insist.

About the patch:

1. PostgreSQL source uses `uint64` and `uint32`, but not `uint64_t`/`uint32_t`.

2. I don't see why pg_prng_state could not be `typedef uint64 pg_prng_state[2];`

3. Then SamplerRandomState and pgbench RandomState could stay, and the patch would be a lot shorter. I don't like the mix of semantic refactoring and syntactic refactoring in the same patch. While I could agree with replacing `SamplerRandomState => pg_prng_state`, I'd rather see it in a separate commit. And that separate commit could contain the transition `typedef uint64 pg_prng_state[2];` => `typedef struct { uint64 s0, s1 } pg_prng_state;`

4. There is no need for ReservoirStateData->randstate_initialized. There could be a macro/function: `bool pg_prng_initiated(state) { return (state[0] | state[1]) != 0; }`

5. Is there a need for a 128-bit prng at all? At least 2*64 bit. There is a 2*32-bit xoroshiro64 https://prng.di.unimi.it/xoroshiro64starstar.c and there is a 4*32-bit xoshiro128: https://prng.di.unimi.it/xoshiro128plusplus.c 32-bit operations are faster on 32-bit platforms, but 32-bit platforms are quite rare in production these days, therefore I don't have a strong opinion on this.

regards,
Sokolov Yura.
Yura Sokolov <y.sokolov@postgrespro.ru> writes:
> 2. I don't see why pg_prng_state could not be `typedef uint64
> pg_prng_state[2];`

Please no. That sort of typedef behaves so weirdly that it's a foot-gun.

regards, tom lane
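To illustrate the foot-gun with a standalone sketch (nothing from the patch): an array typedef silently decays to a pointer in parameter lists and cannot be assigned or returned by value, unlike a struct:

    #include <stdint.h>

    typedef uint64_t state_arr[2];                  /* the proposed typedef */
    typedef struct { uint64_t s0, s1; } state_st;   /* the struct alternative */

    void f(state_arr s);    /* actually declares f(uint64_t *): inside f,
                             * sizeof(s) is the size of a pointer, not 16 */

    void
    demo(void)
    {
        state_arr a = {1, 2}, b;
        state_st  x = {1, 2}, y;

        /* b = a; */        /* error: arrays are not assignable */
        y = x;              /* fine: structs copy by value */
        (void) b;
        (void) y;
    }

An array type also cannot be returned from a function, so simple save/restore idioms stop working.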
Hello Yura,

> 1. PostgreSQL source uses `uint64` and `uint32`, but not
> `uint64_t`/`uint32_t`.

> 2. I don't see why pg_prng_state could not be `typedef uint64
> pg_prng_state[2];`

It could, but I do not see that as desirable. From an API design point of view we want something clean and abstract, and for me a struct looks better for that than a struct with an array inside. I also think that the code is more readable when it avoids constant index accesses (s[0] vs s0); we do not need actual indexes.

> 3. Then SamplerRandomState and pgbench RandomState could stay,
> and the patch would be a lot shorter.

You mean "typedef pg_prng_state SamplerRandomState"? One point of the patch is to have "one" standard PRNG commonly used where one is needed, so I'd say we want that name to be used, hence the substitutions. Also, I have a thing against objects named "Random" which are not random, which is highly misleading: a PRNG is purely deterministic, so removing misleading names is also a benefit. If people want to keep the old names it can be done, but I see these name changes as desirable.

> I don't like the mix of semantic refactoring and syntactic refactoring in
> the same patch. While I could agree with replacing `SamplerRandomState
> => pg_prng_state`, I'd rather see it in a separate commit. And that
> separate commit could contain the transition `typedef uint64
> pg_prng_state[2];` => `typedef struct { uint64 s0, s1 } pg_prng_state;`

I would tend to agree on principle, but separating into two phases here looks pointless: why implement a cleaner rand48 interface, which would then NOT be the rand48 standard, just to upgrade it to something else in the next commit? The other path is as painful and as pointless. So I think the new feature better comes with its associated refactoring, which is an integral part of it.

> 4. There is no need for ReservoirStateData->randstate_initialized. There could
> be a macro/function:
> `bool pg_prng_initiated(state) { return (state[0] | state[1]) != 0; }`

It would work for this particular implementation, but not necessarily for others that we may want to substitute later: it would mean either breaking the interface or adding a boolean to the structure if there is no special uninitialized state that can be detected, which would impact memory usage and alignment. So I think it is better to keep it that way: usually the user knows whether their structure has been initialized, and the special case in reservoir sampling, where the user does not seem to know that, can handle its own boolean without impacting the common API or the data structure.

> 5. Is there a need for a 128-bit prng at all?

This is a 64-bit PRNG with a 128-bit state. We are generating 64-bit values, so we want a 64-bit PRNG. A PRNG's state must be larger than its generated value, so we need sizeof(state) > 64 bits, hence at least 128 bits with 128-bit memory alignment.

> And there is a 4*32-bit xoshiro128:
> https://prng.di.unimi.it/xoshiro128plusplus.c
> 32-bit operations are faster on 32-bit platforms,
> but 32-bit platforms are quite rare in production these days,
> therefore I don't have a strong opinion on this.

I think that 99.9% of the hardware running postgres is 64-bit, so 64 bits is the right choice.

--
Fabien.
> 1. PostgreSQL source uses `uint64` and `uint32`, but not
> `uint64_t`/`uint32_t`.

Indeed, you are right. Attached v6 does that as well.

--
Fabien.
Attachment
On Sat, 3 Jul 2021 at 08:06, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Here is a v4, which:
>
> - moves the stuff to common and fully removes random/srandom (Tom)
> - includes a range generation function based on the bitmask method (Dean)
>   but iterates with splitmix so that the state always advances once (Me)

At the risk of repeating myself: do *not* invent your own scheme.

The problem with iterating using splitmix is that splitmix is a simple shuffling function that takes a single input and returns a mutated output depending only on that input. So let's say for simplicity that you're generating numbers in the range [0,N) with N=2^64-n and n<2^63. Each of the n values in [N,2^64) that lie outside the range wanted are just mapped in a deterministic way back onto (at most) n values in the range [0,N), making those n values twice as likely to be chosen as the other N-n values.

So what you've just invented is an algorithm with the complexity of the unbiased bitmask method, but basically the same bias as the biased integer multiply method.

I don't understand why you object to advancing the state more than once. Doing so doesn't make the resulting sequence of numbers any less deterministic. In fact, I'm pretty sure you *have to* advance the state more than once in *any* unbiased scheme. That's a common characteristic of all the unbiased methods I've seen, and intuitively, I think it has to be so.

Otherwise, I'm happy with the use of the bitmask method, as long as it's implemented correctly.

Regards,
Dean
Hello Dean,

>> - moves the stuff to common and fully removes random/srandom (Tom)
>> - includes a range generation function based on the bitmask method (Dean)
>>   but iterates with splitmix so that the state always advances once (Me)
>
> At the risk of repeating myself: do *not* invent your own scheme.
> The problem with iterating using splitmix is that splitmix is a simple
> shuffling function that takes a single input and returns a mutated
> output depending only on that input.

It also iterates over its 64-bit state in a round-robin fashion, so that the cycle size is 2^64 (the state update is a simple add).

> So let's say for simplicity that you're generating numbers in the range
> [0,N) with N=2^64-n and n<2^63. Each of the n values in [N,2^64) that
> lie outside the range wanted are just mapped in a deterministic way back
> onto (at most) n values in the range [0,N), making those n values twice
> as likely to be chosen as the other N-n values.

I do understand your point. If the value is outside the range, splitmix iterates over its seed and the extraction function produces a new number, which is tested again. I do not understand the "mapped back onto" part: the out-of-range value is just discarded and the loop starts over with a new derivation, so I do not see why some values would be more likely to come out.

> So what you've just invented is an algorithm with the complexity of the
> unbiased bitmask method,

That is what I am trying to implement.

> but basically the same bias as the biased integer multiply method.

I did not understand that.

> I don't understand why you object to advancing the state more than
> once. Doing so doesn't make the resulting sequence of numbers any less
> deterministic.

It does, somehow, hence my struggle to try to avoid it:

  call seed(0xdeadbeef);
  x1 = somepseudorand();
  x2 = somepseudorand();
  x3 = somepseudorand();

I think we should want x3 to be the same result whatever the previous calls to the API.

> In fact, I'm pretty sure you *have to* advance the state more than
> once in *any* unbiased scheme. That's a common characteristic of all
> the unbiased methods I've seen, and intuitively, I think it has to be
> so.

Yes and no. We can advance another state seeded by the root prng.

> Otherwise, I'm happy with the use of the bitmask method, as long as
> it's implemented correctly.

I did not understand why it is not correct.

--
Fabien.
On Sun, 4 Jul 2021 at 10:35, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> I did not understand why it is not correct.
>

Well, to make it easier to visualise, let's imagine our word size is just 3 bits instead of 64 bits, and that the basic prng() function generates numbers in the range [0,8). Similarly, imagine a splitmix3() that operates on 3-bit values. So it might do something like this (state offset and return values made up):

splitmix3(state):
  state=0 -> 5, return 2
  state=1 -> 6, return 5
  state=2 -> 7, return 0
  state=3 -> 0, return 3
  state=4 -> 1, return 6
  state=5 -> 2, return 1
  state=6 -> 3, return 7
  state=7 -> 4, return 4

Now suppose we want a random number in the range [0,6). This is what happens with your algorithm for each of the possible prng() return values:

  prng() returns 0 -- OK
  prng() returns 1 -- OK
  prng() returns 2 -- OK
  prng() returns 3 -- OK
  prng() returns 4 -- OK
  prng() returns 5 -- OK
  prng() returns 6 -- out of range so use splitmix3() with initial state=6:
    state=6 -> 3, return 7 -- still out of range, so repeat
    state=3 -> 0, return 3 -- now OK
  prng() returns 7 -- out of range so use splitmix3() with initial state=7:
    state=7 -> 4, return 4 -- now OK

So, assuming that prng() chooses each of the 8 possible values with equal probability (1/8), the overall result is that the values 0, 1, 2 and 5 are returned with a probability of 1/8, whereas 3 and 4 are returned with a probability of 2/8.

Using the correct implementation of the bitmask algorithm, each iteration calls prng() again, so in the end no particular return value is ever more likely than any other (hence it's unbiased).

As for determinism, the end result is still fully deterministic. For example, let's say that prng() returns the following sequence, for some initial state:

  1,0,3,0,3,7,4,7,6,6,5,3,7,7,7,0,3,6,5,2,3,3,4,0,0,2,7,4,...

then the bitmask method just returns that sequence with all the 6's and 7's removed:

  1,0,3,0,3,4,5,3,0,3,5,2,3,3,4,0,0,2,4,...

and that same sequence will always be returned, when starting from that initial seed.

Regards,
Dean
> Now suppose we want a random number in the range [0,6). This is what
> happens with your algorithm for each of the possible prng() return
> values: [...]
>
> So, assuming that prng() chooses each of the 8 possible values with
> equal probability (1/8), the overall result is that the values 0, 1, 2
> and 5 are returned with a probability of 1/8, whereas 3 and 4 are
> returned with a probability of 2/8.

Ok, I got that explanation.

> Using the correct implementation of the bitmask algorithm, each
> iteration calls prng() again, so in the end no particular return value
> is ever more likely than any other (hence it's unbiased).

Ok, you're taking into account the number of states of the PRNG, and this finite number implies some bias for some values if you actually enumerate all possible cases, as you did above. I was reasoning "as if" the splitmix PRNG were an actual random function, which is obviously false, but is also somehow the usual (false) assumption with PRNGs, and under this false assumption my implementation is perfect:-)

The defect of the modulo method for range extraction is that even with an actual (real) random generator the results would be biased: the bias is in the method itself. Now you are arguing about a bias linked to the internals of the PRNG. They are not the same in nature, even if the effect is the same. Also, the bias is significant only for ranges close to the limit, which is not the kind of use case I have in mind when looking at pgbench.

If you consider the PRNG internals, then the splitmix extraction function could also be taken into account. If it is not invertible (I'm unsure), then, assuming it behaves like some kind of hash function, about 1/e of the output values would not be reachable, which is yet another bias that you could argue against. Using the initial PRNG works better only because the underlying 128-bit state is much larger than the output value. Which is the point of having a larger state in the first place, anyway.

> As for determinism, the end result is still fully deterministic. For
> example, let's say that prng() returns the following sequence, for some
> initial state:
>
> 1,0,3,0,3,7,4,7,6,6,5,3,7,7,7,0,3,6,5,2,3,3,4,0,0,2,7,4,...
>
> then the bitmask method just returns that sequence with all the 6's
> and 7's removed:
>
> 1,0,3,0,3,4,5,3,0,3,5,2,3,3,4,0,0,2,4,...
>
> and that same sequence will always be returned, when starting from
> that initial seed.

Yes and no. The result is indeed deterministic if you call the function with the same range. However, if you change the range value in one place then sometimes the state advances differently, so that kind of determinism is lost, meaning that the sequence depends on the actual range values.

Attached is a v7 which does as you wish, but also loses the deterministic, value-independent property I was seeking. I could work around that by deriving another 128-bit generator instead of the 64-bit splitmix, but that is overkill.

--
Fabien.
Attachment
On Sun, 4 Jul 2021 at 17:03, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> As for determinism, the end result is still fully deterministic.
>
> The result is indeed deterministic if you call the function with the same
> range. However, if you change the range value in one place then sometimes
> the state advances differently, so that kind of determinism is lost,
> meaning that the sequence depends on the actual range values.

Ah yes, that's true. I can trivially reproduce that in other languages too. For example, in python, if I call random.seed(0) and then random.randrange() with the inputs 10,10,10 then the results are 6,6,0. But if the randrange() inputs are 10,1000,10 then the results are 6,776,6. So the result from the 3rd call changes as a result of changing the 2nd input. That's not entirely surprising.

The important property of determinism is that if I set a seed, and then make an identical set of calls to the random API, the results will be identical every time, so that it's possible to write tests with predictable/repeatable results.

> I could work around that by
> deriving another 128-bit generator instead of the 64-bit splitmix, but
> that is overkill.

Not really relevant now, but I'm pretty sure that's impossible to do. You might try it as an interesting academic exercise, but I believe it's a logical impossibility.

Regards,
Dean
> The important property of determinism is that if I set a seed, and then
> make an identical set of calls to the random API, the results will be
> identical every time, so that it's possible to write tests with
> predictable/repeatable results.

Hmmm… I like my stronger determinism definition more than this one:-)

>> I could work around that by deriving another 128-bit generator instead
>> of the 64-bit splitmix, but that is overkill.
>
> Not really relevant now, but I'm pretty sure that's impossible to do.
> You might try it as an interesting academic exercise, but I believe
> it's a logical impossibility.

Hmmm… I was simply thinking of seeding a new pg_prng_state from the main pg_prng_state with some transformation, and then iterating over the second PRNG, pretty much like I did with splitmix, but with 128 bits so that your #states argument does not apply, i.e. something like:

    /* select in a range with bitmask rejection */
    uint64
    pg_prng_u64_range(pg_prng_state *state, uint64 range)
    {
        /* always advance state once */
        uint64      next = xoroshiro128ss(state);
        uint64      val;

        if (range >= 2)
        {
            uint64      mask = mask_u64(range - 1);

            val = next & mask;

            if (val >= range)
            {
                /* copy and update current prng state */
                pg_prng_state iterator = *state;

                iterator.s0 ^= next;
                iterator.s1 += UINT64CONST(0x9E3779B97f4A7C15);

                /* iterate till val in [0, range) */
                while ((val = xoroshiro128ss(&iterator) & mask) >= range)
                    ;
            }
        }
        else
            val = 0;

        return val;
    }

The initial pseudo-random sequence is left to proceed, and a new PRNG is basically forked for iterating on the mask, if needed.

--
Fabien.
Fabien COELHO wrote 2021-07-04 23:29:
>> The important property of determinism is that if I set a seed, and
>> then make an identical set of calls to the random API, the results
>> will be identical every time, so that it's possible to write tests
>> with predictable/repeatable results.
>
> Hmmm… I like my stronger determinism definition more than this one:-)
>
> [...]
>
> The initial pseudo-random sequence is left to proceed, and a new PRNG
> is basically forked for iterating on the mask, if needed.

I believe most "range" values are small, much smaller than UINT32_MAX. In this case, according to [1], the fastest method is Lemire's (I'd take the original version from [2]). Therefore a combined pg_prng_u64_range could branch on the range value:

    uint64
    pg_prng_u64_range(pg_prng_state *state, uint64 range)
    {
        uint64      val = xoroshiro128ss(state);
        uint64      m;

        if ((range & (range - 1)) == 0) /* handle all power-of-2 cases */
            return range != 0 ? val & (range - 1) : 0;

        if (likely(range < PG_UINT32_MAX / 32))
        {
            /*
             * Daniel Lemire's unbiased range random algorithm based on
             * rejection sampling:
             * https://lemire.me/blog/2016/06/30/fast-random-shuffling/
             */
            m = (uint32) val * range;
            if ((uint32) m < range)
            {
                uint32      t = -(uint32) range % (uint32) range;

                while ((uint32) m < t)
                    m = (uint32) xoroshiro128ss(state) * range;
            }
            return m >> 32;
        }

        /* Apple's mask method */
        m = mask_u64(range - 1);
        val &= m;
        while (val >= range)
            val = xoroshiro128ss(state) & m;
        return val;
    }

The mask method could also be faster when the range is close to the mask. For example, a fast check for "range is within 1/1024 of the mask" is

    range < (range/512 + (range & (range*2)))

and then the method choice could look like:

    if (likely(range < UINT32_MAX/32 && range > (range/512 + (range & (range*2)))))

But I don't know whether the additional condition is worth the difference.

[1] https://www.pcg-random.org/posts/bounded-rands.html
[2] https://lemire.me/blog/2016/06/30/fast-random-shuffling/

regards,
Sokolov Yura
Hello Yura,

> I believe most "range" values are small, much smaller than UINT32_MAX.
> In this case, according to [1], the fastest method is Lemire's (I'd take
> the original version from [2]) [...]

Yep. I share your point that the range is more often 32 bits.

However, I'm not enthusiastic about combining two methods depending on the range: the function looks complex enough without that, so I would suggest not to take this option. Also, the decision process adds to the average cost, which is undesirable.

I would certainly select the unbiased multiply method if we wanted a u32 range variant.

--
Fabien.
Fabien COELHO wrote 2021-07-06 09:13:
> Hello Yura,
>
> I share your point that the range is more often 32 bits.
>
> However, I'm not enthusiastic about combining two methods depending on
> the range: the function looks complex enough without that, so I would
> suggest not to take this option. Also, the decision process adds to
> the average cost, which is undesirable.

Given that 99.99% of cases will be in the likely case, the branch predictor should eliminate the decision cost.

And as Dean Rasheed correctly mentioned, the mask method will have a really bad pattern for the branch predictor if the range is not just below or equal to a power of two. For example, rand_range(10000) will have a 60% probability of passing through `while (val >= range)` and a 40% probability of going into the next loop iteration. rand_range(100000) will have 76%/24% probabilities. The branch predictor doesn't like that. Even rand_range(1000000), which is quite close to 2^20, will have 95%/5%, still not enough to please the BP.

But with the unbiased multiply method it will be 0.0002%/99.9998% for 10000, 0.002%/99.998% for 100000 and 0.02%/99.98% for 1000000 -- much, much better. The branch predictor will make it almost free (I hope).

And __builtin_clzl is no free lunch either: it has a latency of 3-4 cycles on a modern processor. Well, probably it could run in parallel with some part of xoroshiro, but that depends on how the compiler optimizes this function.

> I would certainly select the unbiased multiply method if we wanted a u32
> range variant.

There could be two functions.

regards,
Sokolov Yura.
Hello Yura,

>> However, I'm not enthusiastic about combining two methods depending on
>> the range: the function looks complex enough without that, so I would
>> suggest not to take this option. Also, the decision process adds to
>> the average cost, which is undesirable.
>
> Given that 99.99% of cases will be in the likely case, the branch
> predictor should eliminate the decision cost.

Hmmm. ISTM that a branch predictor would predict that "unknown < small" is probably false, so a hint should be given that it is really true.

> And as Dean Rasheed correctly mentioned, the mask method will have a really
> bad pattern for the branch predictor if the range is not just below or equal
> to a power of two.

On average the bitmask is the better unbiased method, if the online figures are to be trusted. Also, as already said, I do not really want to add code complexity, especially to get lower average performance, and especially with code like "threshold = - range % range", where both variables are unsigned: I get a headache just looking at it:-)

> And __builtin_clzl is no free lunch either: it has a latency of 3-4 cycles
> on a modern processor.

Well, % is not cheap either.

> Well, probably it could run in parallel with some part of xoroshiro,
> but that depends on how the compiler optimizes this function.
>
>> I would certainly select the unbiased multiply method if we wanted a u32
>> range variant.
>
> There could be two functions.

Yep, but do we need them? Who is likely to want 32-bit pseudo-random ints in a range? pgbench needs 64 bits.

So I'm still inclined to just keep the faster-on-average bitmask method, despite the fact that it may be slower for some ranges. The average cost for the worst case, in PRNG calls, is, if I'm not mistaken:

  1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2

which does not look too bad.

--
Fabien.
Fabien COELHO wrote 2021-07-06 23:49:
> Hello Yura,
>
> Hmmm. ISTM that a branch predictor would predict that "unknown < small"
> is probably false, so a hint should be given that it is really true.

Why? A branch predictor is history-based: if the branch was usually taken here, it will be assumed taken this time as well. "unknown < small" is usually true, therefore the branch predictor will assume it is true. I put `likely` there for the compiler: the compiler then places the `likely` path closer.

> On average the bitmask is the better unbiased method, if the online
> figures are to be trusted. Also, as already said, I do not really want
> to add code complexity, especially to get lower average performance,
> and especially with code like "threshold = - range % range", where
> both variables are unsigned: I get a headache just looking at it:-)

If you mean https://www.pcg-random.org/posts/bounded-rands.html then:

1. The first graphs are made with a version that is not exactly Lemire's code. The last code from https://lemire.me/blog/2016/06/30/fast-random-shuffling/ (which I derived mine from) performs the modulo operation only if `(leftover < range)`. Even with `rand_range(1000000)` that happens just once in about four thousand runs.

2. You can see the "Small-Constant Benchmark" on that page: Debiased Int is 1.5 times faster. And even in the "Small-Shuffle" benchmark their unoptimized version is on par with the mask method.

3. If you go to the "Making Improvements/Faster Threshold-Based Discarding" section, you'll see the code my version is matched with. It is twice as fast as the mask method in the Small-Shuffle benchmark, and just a bit slower in Large-Shuffle.

> So I'm still inclined to just keep the faster-on-average bitmask
> method, despite the fact that it may be slower for some ranges. The
> average cost for the worst case, in PRNG calls, is, if I'm not mistaken:
>
>   1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2
>
> which does not look too bad.

You don't count the cost of branch misprediction:

https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array
https://lemire.me/blog/2019/10/15/mispredicted-branches-can-multiply-your-running-times/

Therefore the calculation should be more like:

  1 * 0.5 + 0.5 * (3 + 0.5 * (3 + ...)) = 6

By the way, we have a 64-bit random value. If we use 44 bits of it for ranges <= (1<<20), the bias will be less than 1/(2**24). Could we just ignore it (given it is not a crypto-strength random)?
    uint64
    pg_prng_u64_range(pg_prng_state *state, uint64 range)
    {
        uint64      val = xoroshiro128ss(state);
        uint64      m;

        if ((range & (range - 1)) == 0) /* handle all power-of-2 cases */
            return range != 0 ? val & (range - 1) : 0;

        if (likely(range < (1 << 20)))
        {
            /*
             * While the multiply method is biased, the bias will be smaller
             * than 1/(1<<24) for such small ranges. Let's ignore it.
             */
            return ((val >> 20) * range) >> 44;
        }

        /* Apple's mask method */
        m = mask_u64(range - 1);
        val &= m;
        while (val >= range)
            val = xoroshiro128ss(state) & m;
        return val;
    }

Anyway, excuse me for heating up this discussion over such a non-essential issue. I'll try to control myself and not push it further.

regards,
Sokolov Yura.
On Wed, 7 Jul 2021 at 04:00, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>
> Anyway, excuse me for heating up this discussion over such a
> non-essential issue. I'll try to control myself and not push it further.
>

Whilst it has been interesting learning and discussing all these different techniques, I think it's probably best to stick with the bitmask method, rather than making the code too complex and difficult to follow. The bitmask method has the advantage of being very simple, easy to understand and fast (fastest in many of the benchmarks, and close enough in others to make me think that the difference won't matter for our purposes).

To test the current patch, I hacked up a simple SQL-callable server function: random(bigint, bigint) returns bigint, similar to the one in pgbench. After doing so, I couldn't help thinking that it would be useful to have such a function in core, so maybe that could be a follow-on patch. Anyway, that led to the following observations:

Firstly, there's a bug in the existing mask_u64() code -- if pg_leftmost_one_pos64(u) returns 63, you end up with a mask equal to 0, and it breaks down.

Secondly, I think it would be simpler to implement this as a bitshift, rather than a bitmask, using the high bits from the random number. That might not make much difference for xoroshiro**, but in general, PRNGs tend to be weaker in the lower bits, so it seems preferable on that basis. But also, it makes the code simpler and less error-prone.

Finally, I think it would be better to treat the upper bound of the range as inclusive. Doing so makes the function able to cover all possible 64-bit ranges. It would then be easy (perhaps in another follow-on patch) to make the pgbench random() function work for all 64-bit bounds (as long as max >= min), without the weird overflow checking it currently has.

Putting those 3 things together, the code (minus comments) becomes:

    if (range > 0)
    {
        int         rshift = 63 - pg_leftmost_one_pos64(range);

        do
        {
            val = xoroshiro128ss(state) >> rshift;
        }
        while (val > range);
    }
    else
        val = 0;

which reduces the complexity a bit.

Regards,
Dean
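A quick worked trace of that bitshift variant (numbers mine, just to illustrate): for range = 5, i.e. values 0..5 inclusive, pg_leftmost_one_pos64(5) = 2, so rshift = 63 - 2 = 61 and each draw keeps the top 3 bits of the PRNG output, uniform on [0,8). Values 6 and 7 are rejected and redrawn, so each of 0..5 comes out with probability exactly 1/6, and the expected number of PRNG calls is 8/6 ≈ 1.33.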
Hello Dean,

> Whilst it has been interesting learning and discussing all these
> different techniques, I think it's probably best to stick with the
> bitmask method, rather than making the code too complex and difficult
> to follow.

Yes.

> To test the current patch, I hacked up a simple SQL-callable server
> function: random(bigint, bigint) returns bigint, similar to the one in
> pgbench. After doing so, I couldn't help thinking that it would be
> useful to have such a function in core, so maybe that could be a
> follow-on patch.

Yep.

> Firstly, there's a bug in the existing mask_u64() code -- if
> pg_leftmost_one_pos64(u) returns 63, you end up with a mask equal to
> 0, and it breaks down.

Oops:-(

> Secondly, I think it would be simpler to implement this as a bitshift,
> rather than a bitmask, using the high bits from the random number. That
> might not make much difference for xoroshiro**, but in general, PRNGs
> tend to be weaker in the lower bits, so it seems preferable on that
> basis. But also, it makes the code simpler and less error-prone.

Indeed, that looks like a good option.

> Finally, I think it would be better to treat the upper bound of the
> range as inclusive.

This bothered me as well, but the usual approach seems to be to use the range as the number of values, so I was hesitant to depart from that. I'm still hesitant to go that way.

> Putting those 3 things together, the code (minus comments) becomes:
> [...]
> which reduces the complexity a bit.

Indeed. Attached v9 follows this approach, except that the range stays exclusive (the number of values), as most sources I found understand the range that way.

--
Fabien.
Attachment
On Thu, 8 Jul 2021 at 09:26, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> Finally, I think it would be better to treat the upper bound of the
>> range as inclusive.
>
> This bothered me as well, but the usual approach seems to be to use the
> range as the number of values, so I was hesitant to depart from that.
> I'm still hesitant to go that way.
>

Yeah, that bothered me too.

For example, java.util.Random.nextInt(bound) returns a value in the range [0,bound). But other implementations are not all like that. For example, python's random.randint(a,b) returns a value in the range [a,b]. Python also has random.randrange(start,stop[,step]), which is designed for compatibility with their range(start,stop[,step]) function, which treats "stop" as exclusive.

However, Postgres tends to go the other way, and treat the upper bound as inclusive, as in, for example, generate_series() and pgbench's random() function. I think it makes more sense to do it that way, because then such functions can work all the way up to and including the limit of the bound's datatype, which makes them more general.

Regards,
Dean
>>> Finally, I think it would be better to treat the upper bound of the
>>> range as inclusive.
>>
>> This bothered me as well, but the usual approach seems to be to use the
>> range as the number of values, so I was hesitant to depart from that.
>> I'm still hesitant to go that way.
>
> Yeah, that bothered me too.
>
> [...]
>
> I think it makes more sense to do it that way, because then such
> functions can work all the way up to and including the limit of the
> bound's datatype, which makes them more general.

Yep. Still, with one argument:

 - C#: Random Next is exclusive
 - Go: rand Intn is exclusive
 - Rust: rand gen_range is exclusive
 - Erlang: rand uniform is inclusive, BUT values start from 1

The rule seems to be: one parameter is usually the number of values, thus exclusive; two parameters describe the range, which is inclusive.

Attached is a v10 which is some kind of compromise: the interface uses inclusive min and max bounds, so that all values can be reached.

--
Fabien.
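A sketch of how inclusive min/max bounds can sit on top of a zero-based draw -- the names here are hypothetical, not necessarily what v10 does:

    /*
     * Hypothetical inclusive-bounds wrapper. draw_le(state, n) is assumed
     * to return a uniform value in [0, n] inclusive, e.g. via the
     * bitshift-with-rejection loop shown upthread.
     */
    uint64
    prng_range_incl(pg_prng_state *state, uint64 rmin, uint64 rmax)
    {
        /*
         * rmax - rmin wraps correctly in unsigned arithmetic, so even the
         * full 64-bit range (rmax - rmin == UINT64_MAX) is expressible;
         * this is what makes inclusive bounds cover every case.
         */
        return rmin + draw_le(state, rmax - rmin);
    }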
Attachment
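The v10 compromise then amounts to offsetting Dean's loop by the lower
bound. A minimal sketch reusing the hypothetical prng_up_to() from above
(the names are mine, not the patch's actual identifiers):

    /*
     * Uniform value in [rmin, rmax], both bounds inclusive, assuming
     * rmax >= rmin.  With inclusive bounds, rmax - rmin always fits in
     * a uint64_t, so the full 64-bit domain is reachable; a value count
     * rmax - rmin + 1 would overflow to 0 for [0, UINT64_MAX].
     */
    static uint64_t
    prng_range(void *state, uint64_t rmin, uint64_t rmax)
    {
        return rmin + prng_up_to(state, rmax - rmin);
    }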
Hello Yura,

>>> Given 99.99% cases will be in the likely case, branch predictor should
>>> eliminate decision cost.
>>
>> Hmmm. ISTM that a branch predictor should predict that unknown < small
>> should probably be false, so a hint should be given that it is really
>> true.
>
> Why? Branch predictor is history based:

Hmmm. This means running the compiler with some special options, running
the code on significant and representative data, then recompiling based on
the collected branch stats. This is not the usual way pg is built.

> if it were usually true here then it will be true this time either.
> unknown < small is usually true therefore branch predictor will assume
> it is true.
>
> I put `likely` for compiler: compiler then puts `likely` path closer.

Yes, an explicit hint is needed.

>>> And as Dean Rasheed correctly mentioned, mask method will have really bad
>>> pattern for branch predictor if range is not just below or equal to power
>>> of two.
>>
>> On average the bitmask is the better unbiased method, if the online
>> figures are to be trusted. Also, as already said, I do not really want
>> to add code complexity, especially to get lower average performance,
>> and especially with code like "threshold = - range % range", where
>> both variables are unsigned; I have a headache just looking at it :-)
>
> If you mention https://www.pcg-random.org/posts/bounded-rands.html then

Indeed, these are the figures I was referring to when saying that the
bitmask looks like the best method.

> 1. first graphs are made with not exact Lemire's code.
> Last code from https://lemire.me/blog/2016/06/30/fast-random-shuffling/

Ok, other figures. However, there is no comparison with the mask method in
this post; it mostly argues against division/modulo.

> By the way, we have 64bit random. If we use 44bit from it for range <=
> (1<<20), then bias will be less than 1/(2**24). Could we just ignore it
> (given it is not crypto strong random)?

That was my initial opinion, but Dean insists on an unbiased method. I
agree with Dean that performance, if it is not too bad, does not matter
that much, so I'm trying to keep the code simple as a main objective. You
do not seem ready to buy this argument. Note that although my research is
about compiler optimizations, I did buy it :-)

Given the overheads involved in pgbench, the performance impact of the
best vs worst case scenario is minimal:

  \set i random(0, 7) -- 8 values, good for mask: 4.515 Mtps
  \set i random(0, 8) -- 9 values, bad for mask:  4.151 Mtps

so the performance penalty is about 8%.

> if ((range & (range-1)) == 0) /* handle all power-of-2 cases */
>     return range != 0 ? val & (range-1) : 0;
> if (likely(range < (1<<20)))
>     /*
>      * While multiply method is biased, bias will be smaller than
>      * 1/(1<<24) for such small ranges. Lets ignore it.
>      */
>     return ((val >> 20) * range) >> 44;
> /* Apple's mask method */
> m = mask_u64(range-1);
> val &= m;
> while (val >= range)
>     val = xoroshiro128ss(state) & m;
> return val;

Hmmm. The code looks "simple" enough, but I do not think that optimizing
for 20-bit values is worth it, especially as the bitmask method seems the
best anyway. We were discussing 32-bit values before.

> Anyway, excuse me for heating this discussion cause of such
> non-essential issue.

Well, I like to discuss things!

> I'll try to control myself and don't proceed it further.

Yep. We have to compromise at some point. The majority opinion seems to be
that we want code simplicity more, so the bitmask it is. I've posted a v10.
Thanks for the interesting discussion and arguments!

-- Fabien.
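For reference, the "threshold = - range % range" expression that came up
above is the rejection threshold of Lemire's multiplicative method, which
the linked posts benchmark. A minimal sketch, assuming any uniform 64-bit
next_u64() generator and a compiler with unsigned __int128 (the thread's
constraints ruled 128-bit operations out for the patch itself):

    #include <stdint.h>

    extern uint64_t next_u64(void);   /* assumed: uniform 64-bit PRNG */

    /*
     * Lemire's multiply-with-rejection: uniform value in [0, range),
     * range > 0.  The 128-bit product spreads the draw over range
     * buckets; draws whose low half falls below
     * threshold = (2^64 - range) % range are rejected so that every
     * bucket keeps exactly the same number of source values.
     */
    static uint64_t
    bounded_rand(uint64_t range)
    {
        unsigned __int128 m = (unsigned __int128) next_u64() * range;
        uint64_t lo = (uint64_t) m;

        if (lo < range)
        {
            /* unsigned wraparound: -range is 2^64 - range */
            uint64_t threshold = -range % range;

            while (lo < threshold)
            {
                m = (unsigned __int128) next_u64() * range;
                lo = (uint64_t) m;
            }
        }
        return (uint64_t) (m >> 64);
    }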
Hi Fabien,
> Attached a v10 which is some kind of compromise where the interface uses
> inclusive min and max bounds, so that all values can be reached.
Just wanted to let you know that cfbot [1] doesn't seem to be happy with
the patch. Apparently, some tests are failing. To be honest, I didn't
invest too much time into investigating this. Hopefully, it's not a big
deal.

[1]: http://cfbot.cputube.org/
--
Best regards,
Aleksander Alekseev
Hello Aleksander,

>> Attached a v10 which is some kind of compromise where the interface uses
>> inclusive min and max bounds, so that all values can be reached.
>
> Just wanted to let you know that cfbot [1] doesn't seem to be happy with
> the patch. Apparently, some tests are failing. To be honest, I didn't
> invest too much time into investigating this. Hopefully, it's not a big
> deal.
>
> [1]: http://cfbot.cputube.org/

Indeed. I wish that these results were available from the cf interface.

Attached is a v11 which might improve things.

Thanks for the ping!

-- Fabien.
Attachment
Hello again,

>> Just wanted to let you know that cfbot [1] doesn't seem to be happy with
>> the patch. Apparently, some tests are failing. To be honest, I didn't
>> invest too much time into investigating this. Hopefully, it's not a big
>> deal.
>>
>> [1]: http://cfbot.cputube.org/
>
> Indeed. I wish that these results were available from the cf interface.
>
> Attached is a v11 which might improve things.

Not enough. Here is a v12 which might improve things further.

-- Fabien.
Attachment
<resent because of list filter>

>>> [1]: http://cfbot.cputube.org/
>>
>> Indeed. I wish that these results were available from the cf interface.
>>
>> Attached is a v11 which might improve things.
>
> Not enough. Here is a v12 which might improve things further.

Not enough. Here is a v13 which might improve things further still.

-- Fabien.
Attachment
>>>> [1]: http://cfbot.cputube.org/
>>>
>>> Indeed. I wish that these results were available from the cf interface.
>>>
>>> Attached is a v11 which might improve things.
>>
>> Not enough. Here is a v12 which might improve things further.
>
> Not enough. Here is a v13 which might improve things further still.

Not enough. Here is a v14 which might improve things yet again. Sorry for
this noise due to blind Windows tests.

-- Fabien.
Attachment
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Not enough. Here is a v14 which might improve things yet again. Sorry for
> this noise due to blind Windows tests.

Just FTR, I strongly object to your removal of process-startup srandom()
calls. Those are not only setting the seed for our own use, but also
ensuring that things like random() calls within PL functions or other
libraries aren't 100% predictable.

			regards, tom lane
Hello Tom,

> Just FTR, I strongly object to your removal of process-startup srandom()
> calls.

Ok. The point of the patch is to replace and unify the underlying postgres
PRNG, so there was some logic behind this removal.

> Those are not only setting the seed for our own use, but also ensuring
> that things like random() calls within PL functions or other libraries
> aren't 100% predictable.

Sure, they shouldn't be predictable.

Attached v15 also calls srandom if it is there, and fixes yet another
remaining random call.

-- Fabien.
Attachment
>> Just FTR, I strongly object to your removal of process-startup srandom()
>> calls.
>
> Ok. The point of the patch is to replace and unify the underlying postgres
> PRNG, so there was some logic behind this removal.

FTR, this was triggered by your comment on Jul 1:

>> [...] I see that you probably did that because random.c and srandom.c
>> depend on it, but I wonder why we don't make an effort to flush those
>> altogether. It's surely pretty confusing to newbies that what appears
>> to be a call of the libc primitives is no such thing.

I understood "flushing s?random.c" to mean that it would be a good thing
to remove their definitions, hence their calls, whereas in the initial
patch I provided a replacement for srandom & random.

-- Fabien.
> Attached v15 also calls srandom if it is there, and fixes yet another
> remaining random call.

I think that I have now removed all references to "random" from the pg
source. However, the test still fails on Windows, because the linker does
not find a global variable when compiling extensions, although it finds
the functions defined in the very same file:

Link:
  4130 C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\link.exe
       /ERRORREPORT:QUEUE /OUT:".\Release\tablefunc\tablefunc.dll" /INCREMENTAL:NO /NOLOGO
       Release/postgres/postgres.lib kernel32.lib user32.lib gdi32.lib winspool.lib
       comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib
       odbccp32.lib /NODEFAULTLIB:libc /DEF:"./Release/tablefunc/tablefunc.def" /MANIFEST
       /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /manifest:embed /DEBUG
       /PDB:".\Release\tablefunc\tablefunc.pdb" /SUBSYSTEM:CONSOLE /STACK:"4194304" /TLBID:1
       /DYNAMICBASE:NO /NXCOMPAT /IMPLIB:"Release/tablefunc/tablefunc.lib" /MACHINE:X64
       /ignore:4197 /DLL .\Release\tablefunc\win32ver.res
  4131 .\Release\tablefunc\tablefunc.obj
  4132 Creating library Release/tablefunc/tablefunc.lib and object Release/tablefunc/tablefunc.exp
  4133 tablefunc.obj : error LNK2001: unresolved external symbol pg_global_prng_state [C:\projects\postgresql\tablefunc.vcxproj]
  4134 .\Release\tablefunc\tablefunc.dll : fatal error LNK1120: 1 unresolved externals [C:\projects\postgresql\tablefunc.vcxproj]
  4135 Done Building Project "C:\projects\postgresql\tablefunc.vcxproj" (default targets) -- FAILED.

The missing symbol is really defined in common/pg_prng.c, which AFAICT is
linked with postgres.

If someone experienced with the Windows compilation chain could give a
hint of what is needed, I'd appreciate it!

-- Fabien.
On Thu, Sep 30, 2021 at 9:23 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> The missing symbol is really defined in common/pg_prng.c, which AFAICT is
> linked with postgres.
>
> If someone experienced with the Windows compilation chain could give a
> hint of what is needed, I'd appreciate it!

I guess the declaration needs PGDLLIMPORT.
> I guess the declaration needs PGDLLIMPORT.

Indeed, thanks!

Attached v16 adds that.

-- Fabien.
Attachment
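For context, PGDLLIMPORT is how PostgreSQL marks global variables that
extension DLLs must be able to resolve on Windows, so the fix is
presumably a one-word change to the state's declaration, along these
lines (the header placement is an assumption):

    /* in the patch's header, e.g. common/pg_prng.h */
    extern PGDLLIMPORT pg_prng_state pg_global_prng_state;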
Hi hackers,

>>> I guess the declaration needs PGDLLIMPORT.
>>
>> Indeed, thanks!
>>
>> Attached v16 adds that.

It looks like the patch is in pretty good shape. I noticed that the return
value of pg_prng_strong_seed() is not checked in several places, and there
was a typo in pg_trgm.c. The corrected patch is attached.

Assuming the new version will not upset cfbot, I would call the patch
"Ready for Committer".

--
Best regards,
Aleksander Alekseev
Attachment
Aleksander Alekseev <aleksander@timescale.com> writes:
> It looks like the patch is in pretty good shape. I noticed that the
> return value of pg_prng_strong_seed() is not checked in several
> places, and there was a typo in pg_trgm.c. The corrected patch is
> attached. Assuming the new version will not upset cfbot, I would call
> the patch "Ready for Committer".

I took a quick look through this.

The biggest substantive point I found was that you didn't update the
configure script. It's certainly not appropriate for configure to
continue to do AC_REPLACE_FUNCS on random and srandom when you've removed
the src/port files that that would attempt to include. The simplest
change here is just to delete those entries from the list, but that would
also result in not #define'ing HAVE_RANDOM or HAVE_SRANDOM, and I see
that the patch introduces a dependency on the latter.

I'm inclined to think that's misguided. srandom() has been required by
POSIX since SUSv2, and we certainly have not got any non-Windows
buildfarm members that lack it. So I don't think we really need a
configure check. What we do need is a decision about what to do on
Windows. We could write it like

+#ifndef WIN32
+	srandom(pg_prng_i32(&pg_global_prng_state));
+#endif

but I have a different modest suggestion: add

	#define srandom(seed) srand(seed)

in win32_port.h. As far as I can see from Microsoft's docs [1], srand()
is exactly like srandom(); they just had some compulsion to not be
POSIX-compatible.

BTW, the commentary in InitProcessGlobals is now completely inadequate;
it's unclear to a reader why we should be bothering with srandom(). I
suggest adding a comment right before the srandom() call, along the
lines of

	/*
	 * Also make sure that we've set a good seed for random() (or rand()
	 * on Windows). Use of those functions is deprecated in core
	 * Postgres, but they might get used by extensions.
	 */

+/* use Donald Knuth's LCG constants for default state */

How did Knuth get into this? This algorithm is certainly not his,
so why are those constants at all relevant?

Other cosmetic/commentary issues:

* I could do without the stream-of-consciousness notes in pg_prng.c.
I think what's appropriate is to say "We use thus-and-such a generator
which is documented here", maybe with a line or two about its properties.

* Function names like these convey practically nothing to readers:

+extern int64 pg_prng_i64(pg_prng_state *state);
+extern uint32 pg_prng_u32(pg_prng_state *state);
+extern int32 pg_prng_i32(pg_prng_state *state);
+extern double pg_prng_f64(pg_prng_state *state);
+extern bool pg_prng_bool(pg_prng_state *state);

and these functions' header comments add a grand total of zero bits of
information. What someone generally wants to know first about a PRNG is
(a) is it uniform and (b) what is the range of outputs, neither of which
is specified anywhere.

+#define FIRST_BIT_MASK UINT64CONST(0x8000000000000000)
+#define RIGHT_HALF_MASK UINT64CONST(0x00000000FFFFFFFF)
+#define DMANTISSA_MASK UINT64CONST(0x000FFFFFFFFFFFFF)

I'm not sure that these named constants are any more readable than
writing the underlying constant, maybe less so --- in particular I think
something based on (1<<52)-1 would be more appropriate for the float
mantissa operations. We don't need RIGHT_HALF_MASK at all; the casts to
uint32 or int32 will accomplish that just fine.

BTW, why are we bothering with FIRST_BIT_MASK in the first place, rather
than returning "v & 1" for pg_prng_bool? Is xoroshiro128ss less random in
the low-order bits than the higher?
If so, that would be a pretty important thing to document. If it's not,
we shouldn't make the code look like it is.

+ * select in a range with bitmask rejection.

What is "bitmask rejection"? Is it actually important to callers? I
think this should be documented more like "Produce a random integer
uniformly selected from the range [rmin, rmax)."

			regards, tom lane

[1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/srand?view=msvc-170
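As a side note on the (1<<52)-1 suggestion, deriving the mask makes the
bit count visible instead of burying it in hex digits. A sketch of what
that could look like, keeping the patch's macro name purely for
illustration:

    /* the 52 explicit mantissa bits of an IEEE-754 double */
    #define DMANTISSA_MASK  ((UINT64CONST(1) << 52) - 1)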
I wrote:
> ... What we do need is a decision about what to do on Windows.
> We could write it like
>
> +#ifndef WIN32
> +	srandom(pg_prng_i32(&pg_global_prng_state));
> +#endif
>
> but I have a different modest suggestion: add
>
>	#define srandom(seed) srand(seed)
>
> in win32_port.h. As far as I can see from Microsoft's docs [1], srand()
> is exactly like srandom(); they just had some compulsion to not be
> POSIX-compatible.

Oh, wait, I take that back --- rand()/srand() are also in POSIX, and in
the C99 standard (which presumably is where Microsoft got them from).
They're deprecated by POSIX on the grounds that the spec only allows them
to have 32 bits of state, so they can't be terribly random.

Given that, I think we should just avert our eyes; anybody depending on
those functions is destined to lose anyway. Probably the "#ifndef WIN32"
fragment suggested above is enough. I suppose we could *also* call
srand(), but that feels a bit silly.

			regards, tom lane
Hello Tom,

Thanks for the feedback.

> +/* use Donald Knuth's LCG constants for default state */
>
> How did Knuth get into this? This algorithm is certainly not his,
> so why are those constants at all relevant?

They are neither more nor less relevant than any other "random" constant.
The state needs a default initialization. The point of using DK's is that
they cannot be some specially crafted value which has some special
property known only to the purveyor of the constant, which could be used
by them to break the algorithm:

https://en.wikipedia.org/wiki/Dual_EC_DRBG

> * I could do without the stream-of-consciousness notes in pg_prng.c.
> I think what's appropriate is to say "We use thus-and-such a generator
> which is documented here", maybe with a line or two about its properties.

The stuff was really written as a "why this" for the first patch, and to
prevent later questions about "why not this other generator", because
those could never stop.

> * Function names like these convey practically nothing to readers:
>
> +extern int64 pg_prng_i64(pg_prng_state *state);
> +extern uint32 pg_prng_u32(pg_prng_state *state);
> +extern int32 pg_prng_i32(pg_prng_state *state);
> +extern double pg_prng_f64(pg_prng_state *state);
> +extern bool pg_prng_bool(pg_prng_state *state);

The intention is obviously "postgres pseudo-random number generator for
<type>". ISTM that it conveys (1) that it is postgres-specific stuff, (2)
that it is a PRNG (which I find *MUCH* more informative than the
misleading statement that something is random when it is not, and it is
shorter), and (3) the type it returns, because C requires functions to
have distinct names.

What would you suggest?

> and these functions' header comments add a grand total of zero bits
> of information.

Yes, probably. I do not like not commenting on a function at all.

> What someone generally wants to know first about a PRNG is (a) is it
> uniform and (b) what is the range of outputs, neither of which is
> specified anywhere.

ISTM (b) is suggested by the type, and for (a) I'm not sure about a PRNG
which would not at least claim to be uniform. Non-uniform PRNGs are
usually built on top of a uniform one.

What do you suggest as alternate names?

> +#define FIRST_BIT_MASK UINT64CONST(0x8000000000000000)
> +#define RIGHT_HALF_MASK UINT64CONST(0x00000000FFFFFFFF)
> +#define DMANTISSA_MASK UINT64CONST(0x000FFFFFFFFFFFFF)
>
> I'm not sure that these named constants are any more readable than
> writing the underlying constant, maybe less so --- in particular
> I think something based on (1<<52)-1 would be more appropriate for
> the float mantissa operations. We don't need RIGHT_HALF_MASK at
> all; the casts to uint32 or int32 will accomplish that just fine.

Yep. I did it for uniformity.

> BTW, why are we bothering with FIRST_BIT_MASK in the first place,
> rather than returning "v & 1" for pg_prng_bool?

Because some PRNGs are very bad in the low bits -- not the xoroshiro
stuff, though.

> Is xoroshiro128ss less random in the low-order bits than the higher?
> If so, that would be a pretty important thing to document. If it's not,
> we shouldn't make the code look like it is.

Dunno. Why should we prefer low bits?

> + * select in a range with bitmask rejection.
>
> What is "bitmask rejection"? Is it actually important to callers?

No, but it is important to understand how it is done. That is the name of
the technique which is implemented, which helps if you want to understand
what is going on by googling it.
This point could be moved inside the function.

> I think this should be documented more like "Produce a random
> integer uniformly selected from the range [rmin, rmax)."

Sure.

-- Fabien.
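Since the mantissa discussion is hard to follow in the abstract, here is
a minimal sketch of the kind of uint64-to-double conversion such a mask
supports. This is an illustration, not the patch's exact code; ldexp() is
the libm dependency mentioned at the end of the thread.

    #include <math.h>
    #include <stdint.h>

    /*
     * Map a 64-bit PRNG output to a uniform double in [0, 1): keep the
     * top 52 bits (the width of a double's explicit mantissa) and scale
     * by 2^-52.
     */
    static double
    u64_to_double(uint64_t v)
    {
        return ldexp((double) (v >> (64 - 52)), -52);
    }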
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> How did Knuth get into this? This algorithm is certainly not his,
>> so why are those constants at all relevant?

> They are neither more nor less relevant than any other "random" constant.
> The state needs a default initialization. The point of using DK's is that
> they cannot be some specially crafted value which has some special
> property known only to the purveyor of the constant, which could be used
> by them to break the algorithm.

Well, none of that is in the comment, which is probably just as well,
because it reads like baseless paranoia. *Any* initialization vector
should be as good as any other; if it's not, that's an algorithm fault.
(OK, I'll give it a pass for zeroes being bad, but otherwise not.)

>> * Function names like these convey practically nothing to readers:
>>
>> +extern int64 pg_prng_i64(pg_prng_state *state);
>> +extern uint32 pg_prng_u32(pg_prng_state *state);
>> +extern int32 pg_prng_i32(pg_prng_state *state);
>> +extern double pg_prng_f64(pg_prng_state *state);
>> +extern bool pg_prng_bool(pg_prng_state *state);

> The intention is obviously "postgres pseudo-random number generator for
> <type>". [...]
> What would you suggest?

We have names for these types, and those abbreviations are (mostly) not
them. Name-wise I'd be all right with pg_prng_int64 and so on, but I
still expect these functions' header comments to be explicit about
uniformity and about the precise output range. As an example, it's far
from obvious whether the minimum value of pg_prng_int32 should be zero or
INT_MIN. (Actually, I suspect you ought to provide both of those cases.)
And the output range of pg_prng_float8 is not merely unobvious, but not
very easy to deduce from examining the code either; not that users should
have to.

>> BTW, why are we bothering with FIRST_BIT_MASK in the first place,
>> rather than returning "v & 1" for pg_prng_bool?

> Because some PRNGs are very bad in the low bits -- not the xoroshiro
> stuff, though.

Good, but then you shouldn't write the associated code as if that's still
a problem, because you'll cause other people to think it's still a
problem and write equally contorted code elsewhere. "v & 1" is a
transparent way of producing a bool, while this code isn't.

			regards, tom lane
Hello,

>> They are neither more nor less relevant than any other "random" constant.
>> The state needs a default initialization. The point of using DK's is that
>> they cannot be some specially crafted value which has some special
>> property known only to the purveyor of the constant, which could be used
>> by them to break the algorithm.
>
> Well, none of that is in the comment, which is probably just as well,
> because it reads like baseless paranoia.

Sure. Welcome to cryptography :-)

> *Any* initialization vector should be as good as any other; if it's not,
> that's an algorithm fault.

Yep.

> (OK, I'll give it a pass for zeroes being bad, but otherwise not.)

Ok. We can use any non-zero constant. What's wrong with constants provided
by a Turing-award computer scientist? I find them more attractive than
some stupid 0x0123456789….

>>> * Function names like these convey practically nothing to readers:
>>> +extern int64 pg_prng_i64(pg_prng_state *state); [...]
>
>> The intention is obviously "postgres pseudo-random number generator for
>> <type>". [...]
>> What would you suggest?
>
> We have names for these types, and those abbreviations are (mostly)
> not them. Name-wise I'd be all right with pg_prng_int64 and so on,

Ok. You prefer "uint64" to "u64".

> but I still expect these functions' header comments to be explicit
> about uniformity and about the precise output range.

Ok.

> As an example, it's far from obvious whether the minimum value
> of pg_prng_int32 should be zero or INT_MIN.
> (Actually, I suspect you ought to provide both of those cases.)

I agree that it is not obvious. I added "p" for "positive" variants, and
found one place where one could be used.

> And the output range of pg_prng_float8 is not merely unobvious, but not
> very easy to deduce from examining the code either; not that users
> should have to.

Ok.

>>> BTW, why are we bothering with FIRST_BIT_MASK in the first place,
>>> rather than returning "v & 1" for pg_prng_bool?
>
>> Because some PRNGs are very bad in the low bits -- not the xoroshiro
>> stuff, though.
>
> Good, but then you shouldn't write the associated code as if that's still
> a problem, because you'll cause other people to think it's still a
> problem and write equally contorted code elsewhere. "v & 1" is a
> transparent way of producing a bool, while this code isn't.

"v & 1" really produces an integer, not a bool; I'd prefer to actually
generate a boolean and let the compiler's optimizer do the cleaning. Some
generators of the Xoshiro family have "linear artifacts in the low bits".
Although Xoroshiro128** is supposed to be immune, I thought it better to
keep away from those bits, and I could not see why the last bit would be
better than any other bit, so taking the first looked okay to me.

I think that the attached v18 addresses most of your concerns.

-- Fabien.
Attachment
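To make the two positions concrete, both derivations are one-liners. A
sketch of the two options (which one the committed code settles on is
only stated below as trusting the high bits consistently):

    #include <stdbool.h>
    #include <stdint.h>

    /* two transparent ways to turn a 64-bit PRNG output into a bool */
    static bool bool_from_low_bit(uint64_t v)  { return (v & 1) != 0; }   /* the review's suggestion */
    static bool bool_from_high_bit(uint64_t v) { return (v >> 63) != 0; } /* keep away from low bits */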
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Good, but then you shouldn't write the associated code as if that's still
>> a problem, because you'll cause other people to think it's still a
>> problem and write equally contorted code elsewhere. "v & 1" is a
>> transparent way of producing a bool, while this code isn't.

> Some generators of the Xoshiro family have "linear artifacts in the low
> bits". Although Xoroshiro128** is supposed to be immune, I thought it
> better to keep away from those bits, and I could not see why the last bit
> would be better than any other bit, so taking the first looked okay to me.

Meh. If we're going to trust the high bits more than the lower ones, we
should do so consistently; it makes no sense to do that in one pg_prng.c
function and not its siblings.

Pushed with that change and some others, notably:

* Rewrote a lot of the comments.

* Refactored so that pg_strong_random() is not called from pg_prng.c. As
it stood, that would result in pulling in OpenSSL in programs that have no
need of it. (ldexp() still creates a dependency on libm, but I figured
that was OK.)

* Changed a number of call sites that were using modulo reduction to use
pg_prng_uint64_range instead. Given the lengthy discussion we had, it
seems silly not to apply the conclusion everywhere.

			regards, tom lane
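For readers landing here, a minimal usage sketch of the committed
interface as the thread describes it; treat the exact names and
signatures as assumptions to verify against common/pg_prng.h:

    #include "common/pg_prng.h"

    static void
    prng_demo(void)
    {
        pg_prng_state rng;
        uint64      roll;
        double      x;
        bool        coin;

        pg_prng_seed(&rng, UINT64CONST(42));        /* deterministic seed */
        roll = pg_prng_uint64_range(&rng, 1, 6);    /* inclusive bounds: 1..6 */
        x = pg_prng_double(&rng);                   /* uniform in [0, 1) */
        coin = pg_prng_bool(&rng);

        (void) roll; (void) x; (void) coin;
    }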
> Pushed with that change and some others, notably:

Thanks for the improvements and the push!

-- Fabien.