Re: random() (was Re: New GUC to sample log queries) - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: random() (was Re: New GUC to sample log queries) |
Date | |
Msg-id | alpine.DEB.2.21.1812280808490.32444@lancre Whole thread Raw |
In response to | Re: random() (was Re: New GUC to sample log queries) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: random() (was Re: New GUC to sample log queries)
Re: random() (was Re: New GUC to sample log queries) |
List | pgsql-hackers |
Hello Tom, >>> Another idea, which would be a lot less prone to breakage by >>> add-on code, is to change drandom() and setseed() to themselves >>> use pg_erand48() with a private seed. > >> The pg_erand48 code looks like crumbs from the 70's optimized for 16 bits >> architectures (which it is probably not, but why not going to 64 bits or >> 128 bits directly looks like a missed opportunity), its internal state is >> 48 bits as its name implies, and its period probably around 2**48, which >> is 2**12 better than the previous case, not an extraordinary achievement. > > I can't get terribly excited about rewriting that code. You're arguing > from a "one size should fit all" perspective, Not exactly. I'm not claiming that distinguishing parts that require good random from others is a bad choice. I'm arguing that: (1) from a software engineering perspective, the PRNG implementation should hide the underlying generator name and state size. (2) from a numerical perspective, poor seedings practice should be avoided when possible. (3) from a cryptographic perspective, LCG is a poor choice of fast PRNG, which a quick look at wikipedia (mother of all knowledge) confirms. (4) the fact that pg historical PRNG choices are well documented is not a good justification for not improving them. Better alternatives exist that do not cost much (eg xorshift variants, WELL...), some of which are optimized for 64 bits architectures. > which is exactly not the design approach we're using. > We've already converted security-sensitive PRNG uses to use > pg_strong_random (or if we haven't, that's a localized bug in any such > places we missed). What remains are places where we are not so > concerned about cryptographic strength but rather speed. I do agree with the speed concern. I'm arguing that better quality at speed can be achieved with better seeding practices and not using a LCG. About costs, not counting array accesses: - lrand48 (48 bits state as 3 uint16) is 29 ops (10 =, 8 *, 7 +, 4 >>) - xorshift+ (128 bits state as 2 uint64) is 13 ops ( 5 =, 0 *, 1 +, 3 >>, 4 ^) - xororshift128+ (idem) is 17 ops ( 6 =, 0 *, 1 +, 5 >>, 3 ^, 2 |, less if rot in hardware) - WELL512 (512 bits state as 16 uint32) is 38 ops (11 =, 0 *, 3 +, 7 >>, 10 ^, 4 &) probably much better, but probably slower than the current version I'd be of the (debatable) opinion that we could use xororshift128+, already used by various languages, even if it fails some specialized tests. > It does behoove us to ensure that the seed values are unpredictable and > that a user-controllable seed isn't used for internal operations, Sure. > but I don't feel a need for replacing the algorithm. Hmmm. Does it mean that you would veto any change, even if the speed concern is addressed (i.e. faster/not slower with better quality)? > You might argue that the SQL function drandom should be held to a higher > standard, but we document exactly this same tradeoff for it. Users who > want cryptographic strength are directed to pgcrypto, leaving the audience > for drandom being people who want speed. My point is that speed is not necessary incompatible with better quality. Better quality should not replace strong random when needed. -- Fabien.
pgsql-hackers by date: