Re: rand48 replacement - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: rand48 replacement
Date
Msg-id alpine.DEB.2.22.394.2111271951200.223291@pseudo
In response to Re: rand48 replacement  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: rand48 replacement
List pgsql-hackers
Hello Tom,

Thanks for the feedback.

> +/* use Donald Knuth's LCG constants for default state */
>
> How did Knuth get into this?  This algorithm is certainly not his,
> so why are those constants at all relevant?

They are neither more nor less relevant than any other "random" constant.
The state needs a default initialization. The point of using DK's is that
it cannot be some specially crafted value with a special property known
only to the purveyor of the constant, which they could then use to break
the algorithm.

   https://en.wikipedia.org/wiki/Dual_EC_DRBG
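
To make the "nothing up my sleeve" point concrete, here is a minimal sketch
of a default state taken from published constants. It assumes the constants
meant are the multiplier and increment of Knuth's MMIX LCG and that the
generator is xoroshiro128**; the names prng_state, PRNG_DEFAULT_STATE and
prng_next are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical state type: two 64-bit words, as xoroshiro128** requires. */
typedef struct
{
    uint64_t s0;
    uint64_t s1;
} prng_state;

/*
 * Assumption: multiplier and increment of Knuth's MMIX LCG, reused here
 * only as publicly known, non-zero bit patterns for the default state.
 */
#define PRNG_DEFAULT_STATE \
    { UINT64_C(6364136223846793005), UINT64_C(1442695040888963407) }

static inline uint64_t
rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* One step of xoroshiro128**: returns 64 bits and advances the state. */
static uint64_t
prng_next(prng_state *st)
{
    uint64_t s0 = st->s0;
    uint64_t s1 = st->s1;
    uint64_t result = rotl64(s0 * 5, 7) * 9;

    s1 ^= s0;
    st->s0 = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
    st->s1 = rotl64(s1, 37);
    return result;
}
```

The only hard requirement on the default state is that it is not all
zeros; any published constant would do equally well.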

> * I could do without the stream-of-consciousness notes in pg_prng.c.
> I think what's appropriate is to say "We use thus-and-such a generator
> which is documented here", maybe with a line or two about its properties.

The notes were really written as a "why this" for the first patch, and to
head off later questions about "why not this other generator", because
such questions could go on forever.

> * Function names like these convey practically nothing to readers:
>
> +extern int64 pg_prng_i64(pg_prng_state *state);
> +extern uint32 pg_prng_u32(pg_prng_state *state);
> +extern int32 pg_prng_i32(pg_prng_state *state);
> +extern double pg_prng_f64(pg_prng_state *state);
> +extern bool pg_prng_bool(pg_prng_state *state);

The intention is obviously "postgres pseudo-random number generator for
<type>". ISTM that it conveys (1) that it is postgres-specific, (2) that
it is a PRNG (which I find *MUCH* more informative than the misleading
statement that something is random when it is not, and it is shorter) and
(3) the type it returns, because C does require functions to have
distinct names.

What would you suggest?

> and these functions' header comments add a grand total of zero bits
> of information.

Yes, probably. I do not like leaving a function without any comment.

> What someone generally wants to know first about a PRNG is (a) is it 
> uniform and (b) what is the range of outputs, neither of which are 
> specified anywhere.

ISTM (b) is suggested by the type, and as for (a), I do not know of a PRNG
which would not at least claim to be uniform. Non-uniform PRNGs are
usually built on top of a uniform one.

What do you suggest as alternate names?

> +#define FIRST_BIT_MASK UINT64CONST(0x8000000000000000)
> +#define RIGHT_HALF_MASK UINT64CONST(0x00000000FFFFFFFF)
> +#define DMANTISSA_MASK UINT64CONST(0x000FFFFFFFFFFFFF)
>
> I'm not sure that these named constants are any more readable than
> writing the underlying constant, maybe less so --- in particular
> I think something based on (1<<52)-1 would be more appropriate for
> the float mantissa operations.  We don't need RIGHT_HALF_MASK at
> all, the casts to uint32 or int32 will accomplish that just fine.

Yep. I did it for uniformity.
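
For reference, a minimal sketch of the (1<<52)-1 formulation for the float
conversion. It assumes IEEE 754 doubles and that the goal is a value in
[0, 1) built from 52 random bits; the function name u64_to_unit_double is
hypothetical.

```c
#include <stdint.h>

/*
 * Map 64 random bits to a double in [0, 1).
 * Keeps the low 52 bits and scales them by 2^-52, so every result is an
 * exactly representable multiple of 2^-52.
 */
static double
u64_to_unit_double(uint64_t v)
{
    const uint64_t mantissa_mask = (UINT64_C(1) << 52) - 1;

    /* 4503599627370496 is 2^52; the division below is exact. */
    return (double) (v & mantissa_mask) * (1.0 / 4503599627370496.0);
}
```

Writing the mask as a shift makes the link with the 52-bit mantissa
visible, which a bare hexadecimal constant does not.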

> BTW, why are we bothering with FIRST_BIT_MASK in the first place,
> rather than returning "v & 1" for pg_prng_bool?

Because some PRNGs are very bad in the low bits, although not the
xoroshiro family.
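
A small sketch of what "bad in the low bits" means, using a 64-bit LCG
with a power-of-two modulus rather than xoroshiro. The constants are
assumed to be Knuth's MMIX ones and all function names are hypothetical.
With an odd multiplier and an odd increment, the lowest bit of such an LCG
simply alternates, so a boolean taken from it is not random at all, while
the highest bit does not have this defect.

```c
#include <stdbool.h>
#include <stdint.h>

/* LCG modulo 2^64 (unsigned wraparound); illustration only. */
static uint64_t
lcg_next(uint64_t *x)
{
    *x = *x * UINT64_C(6364136223846793005) + UINT64_C(1442695040888963407);
    return *x;
}

/* Boolean from the lowest bit: alternates true/false with the LCG above. */
static bool
bool_from_low_bit(uint64_t v)
{
    return (v & 1) != 0;
}

/* Boolean from the highest bit: the safer default for an unknown PRNG. */
static bool
bool_from_high_bit(uint64_t v)
{
    return (v >> 63) != 0;
}
```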

> Is xoroshiro128ss less random in the low-order bits than the higher? 
> If so, that would be a pretty important thing to document.  If it's not, 
> we shouldn't make the code look like it is.

Dunno. Why should we prefer low bits?

> + * select in a range with bitmask rejection.
>
> What is "bitmask rejection"?  Is it actually important to callers?

No, not to callers, but it is important for understanding how the function
works. That is the name of the technique which is implemented, which helps
if you want to understand what is going on by googling it. This point
could be moved inside the function.

> I think this should be documented more like "Produce a random
> integer uniformly selected from the range [rmin, rmax)."

Sure.
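
For readers who do not want to google it, a minimal sketch of bitmask
rejection for the range [rmin, rmax). It is not the patch's code: the
names next_u64, covering_mask and uniform_range are hypothetical, and
splitmix64 is used only as a stand-in 64-bit generator so that the example
is self-contained.

```c
#include <stdint.h>

/* Stand-in 64-bit generator (splitmix64); any uniform source would do. */
static uint64_t
next_u64(uint64_t *state)
{
    uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));

    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Smallest mask of the form 2^k - 1 that covers v. */
static uint64_t
covering_mask(uint64_t v)
{
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v |= v >> 32;
    return v;
}

/* Uniform integer in [rmin, rmax); the caller must ensure rmin < rmax. */
static int64_t
uniform_range(uint64_t *state, int64_t rmin, int64_t rmax)
{
    /* Number of possible values, computed in unsigned to avoid overflow. */
    uint64_t span = (uint64_t) rmax - (uint64_t) rmin;
    uint64_t mask = covering_mask(span - 1);
    uint64_t v;

    do
        v = next_u64(state) & mask;     /* candidate in [0, mask] */
    while (v >= span);                  /* reject candidates outside the span */

    return (int64_t) ((uint64_t) rmin + v);
}
```

Because the mask is smaller than twice the span, each draw is accepted
with probability above one half, so the loop runs fewer than two times on
average, and no modulo bias is introduced.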

-- 
Fabien.


