Re: random() (was Re: New GUC to sample log queries) - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: random() (was Re: New GUC to sample log queries)
Date
Msg-id alpine.DEB.2.21.1812280808490.32444@lancre
Whole thread Raw
In response to Re: random() (was Re: New GUC to sample log queries)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: random() (was Re: New GUC to sample log queries)
Re: random() (was Re: New GUC to sample log queries)
List pgsql-hackers
Hello Tom,

>>> Another idea, which would be a lot less prone to breakage by
>>> add-on code, is to change drandom() and setseed() to themselves
>>> use pg_erand48() with a private seed.
>
>> The pg_erand48 code looks like crumbs from the 70's optimized for 16 bits
>> architectures (which it is probably not, but why not going to 64 bits or
>> 128 bits directly looks like a missed opportunity), its internal state is
>> 48 bits as its name implies, and its period probably around 2**48, which
>> is 2**12 better than the previous case, not an extraordinary achievement.
>
> I can't get terribly excited about rewriting that code.  You're arguing
> from a "one size should fit all" perspective,

Not exactly. I'm not claiming that distinguishing parts that require 
good random from others is a bad choice. I'm arguing that:

(1) from a software engineering perspective, the PRNG implementation 
should hide the underlying generator name and state size.

(2) from a numerical perspective, poor seedings practice should be avoided 
when possible.

(3) from a cryptographic perspective, LCG is a poor choice of fast PRNG, 
which a quick look at wikipedia (mother of all knowledge) confirms.

(4) the fact that pg historical PRNG choices are well documented is not a 
good justification for not improving them.

Better alternatives exist that do not cost much (eg xorshift variants, 
WELL...), some of which are optimized for 64 bits architectures.

> which is exactly not the design approach we're using.

> We've already converted security-sensitive PRNG uses to use 
> pg_strong_random (or if we haven't, that's a localized bug in any such 
> places we missed).  What remains are places where we are not so 
> concerned about cryptographic strength but rather speed.

I do agree with the speed concern. I'm arguing that better quality at 
speed can be achieved with better seeding practices and not using a LCG.

About costs, not counting array accesses:

  - lrand48 (48 bits state as 3 uint16)        is 29 ops
    (10 =, 8 *, 7 +, 4 >>)
  - xorshift+ (128 bits state as 2 uint64)     is 13 ops
    ( 5 =, 0 *, 1 +, 3 >>, 4 ^)
  - xororshift128+ (idem)                      is 17 ops
    ( 6 =, 0 *, 1 +, 5 >>, 3 ^, 2 |, less if rot in hardware)
  - WELL512 (512 bits state as 16 uint32)      is 38 ops
    (11 =, 0 *, 3 +, 7 >>, 10 ^, 4 &)
    probably much better, but probably slower than the current version

I'd be of the (debatable) opinion that we could use xororshift128+, 
already used by various languages, even if it fails some specialized 
tests.

> It does behoove us to ensure that the seed values are unpredictable and 
> that a user-controllable seed isn't used for internal operations,

Sure.

> but I don't feel a need for replacing the algorithm.

Hmmm. Does it mean that you would veto any change, even if the speed 
concern is addressed (i.e. faster/not slower with better quality)?

> You might argue that the SQL function drandom should be held to a higher
> standard, but we document exactly this same tradeoff for it.  Users who
> want cryptographic strength are directed to pgcrypto, leaving the audience
> for drandom being people who want speed.

My point is that speed is not necessary incompatible with better quality.
Better quality should not replace strong random when needed.

-- 
Fabien.


pgsql-hackers by date:

Previous
From: Tatsuro Yamada
Date:
Subject: Re: [HACKERS] CLUSTER command progress monitor
Next
From: Etsuro Fujita
Date:
Subject: Re: postgres_fdw: oddity in costing aggregate pushdown paths