Re: rand48 replacement - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: rand48 replacement
Date
Msg-id alpine.DEB.2.22.394.2107031643010.2359988@pseudo
Whole thread Raw
In response to Re: rand48 replacement  (Yura Sokolov <y.sokolov@postgrespro.ru>)
List pgsql-hackers
Hello Yura,

> 1. PostgreSQL source uses `uint64` and `uint32`, but not 
> `uint64_t`/`uint32_t`
> 2. I don't see why pg_prng_state could not be `typedef uint64 
> pg_prng_state[2];`

It could, but I do not see that as desirable. From an API design point of 
view we want something clean and abstract, and for me a struct looks 
better for that. It would be a struct with an array insided, I think that 
the code is more readable by avoiding constant index accesses (s[0] vs 
s0), we do not need actual indexes.

> 3. Then SamplerRandomState and pgbench RandomState could stay.
>   Patch will be a lot shorter.

You mean "typedef pg_prng_state SamplerRandomState"? One point of the 
patch is to have "one" standard PRNG commonly used where one is needed, so 
I'd say we want the name to be used, hence the substitutions.

Also, I have a thing against objects being named "Random" which are not 
random, which is highly misleading. A PRNG is purely deterministic. 
Removing misleading names is also a benefit.

So If people want to keep the old name it can be done. But I see these 
name changes as desirable.

> I don't like mix of semantic refactoring and syntactic refactoring in 
> the same patch. While I could agree with replacing `SamplerRandomState 
> => pg_prng_state`, I'd rather see it in separate commit. And that 
> separate commit could contain transition: `typedef uint64 
> pg_prng_state[2];` => `typedef struct { uint64 s0, s1 } pg_prng_state;`

I would tend to agree on principle, but separating in two phases here 
looks pointless: why implementing a cleaner rand48 interface, which would 
then NOT be the rand48 standard, just to upgrade it to something else in 
the next commit? And the other path is as painfull and pointless.

So I think that the new feature better comes with its associated 
refactoring which is an integral part of it.

> 4. There is no need in ReservoirStateData->randstate_initialized. There could
>   be macros/function:
>   `bool pg_prng_initiated(state) { return (state[0]|state[1]) != 0; }`

It would work for this peculiar implementation but not necessary for 
others that we may want to substitute later, as it would mean either 
breaking the interface or adding a boolean in the structure if there is no 
special unintialized state that can be detected, which would impact memory 
usage and alignment.

So I think it is better to keep it that way, usually the user knows 
whether its structure has been initialized, and the special case for 
reservoir where the user does not seem to know about that can handle its 
own boolean without impacting the common API or the data structure.

> 5. Is there need for 128 bit prng at all?

This is a 64 bits PRNG with a 128 bit state. We are generating 64 bits 
values, so we want a 64 bit PRNG. A prng state must be larger than its 
generated value, so we need sizeof(state) > 64 bits, hence at least 128 
bits if we add 128 bits memory alignement.

>   And there is 4*32bit xoshiro128: 
> https://prng.di.unimi.it/xoshiro128plusplus.c
>   32bit operations are faster on 32bit platforms.
>   But 32bit platforms are quite rare in production this days.
>   Therefore I don't have strong opinion on this.

I think that 99.9% hardware running postgres is 64 bits, so 64 bits is the 
right choice.

-- 
Fabien.




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Preventing abort() and exit() calls in libpq
Next
From: Fabien COELHO
Date:
Subject: Re: rand48 replacement