Re: random() (was Re: New GUC to sample log queries) - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: random() (was Re: New GUC to sample log queries) |
Date | |
Msg-id | alpine.DEB.2.21.1812270833190.32444@lancre |
In response to | random() (was Re: New GUC to sample log queries) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: random() (was Re: New GUC to sample log queries) |
List | pgsql-hackers |
Hello all,

> I am not sure I buy the argument that this is a security hazard, but
> there are other reasons to question the use of random() here, some of
> which you stated yourself above. Another one is that using random()
> for internal purposes interferes with applications' possible use of
> drandom() and setseed(), ie an application depending on getting a
> particular random series would see different behavior depending on
> whether this GUC is active or not.
>
> Another idea, which would be a lot less prone to breakage by
> add-on code, is to change drandom() and setseed() to themselves
> use pg_erand48() with a private seed.

My random thoughts about random, erand48, etc., which may be slightly off topic, sorry if this is the case.

The word "random" is a misnomer for these pseudo-random generators, so that "strong" has to be used for higher quality generators :-(

On Linux, random() is advertised with a period of around 2**36, and its internal state is 8 to 256 bytes (default unclear, probably 8 bytes). However, seeding with srandom() provides only 32 bits, which is a drawback.

The pg_erand48 code looks like crumbs from the 70's optimized for 16-bit architectures (which it probably is not, but not going to 64 or 128 bits directly looks like a missed opportunity). Its internal state is 48 bits, as its name implies, and its period is probably around 2**48, which is 2**12 better than the previous case: not an extraordinary achievement.

Initial seeding of any pseudo-random generator should NEVER use only pid & time, which are too predictable, as already noted on the thread. It should use a strong random source if available, and maybe some backup, eg hashing logs. I think that this should be directly implemented, maybe with some provision to set the seed manually for debugging purposes, although with time-dependent features that may use random I'm not sure how far this would go.
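The seeding advice above can be sketched roughly as follows. This is only an illustration, not PostgreSQL code: the helper names strong_random_bytes() and choose_initial_seed() are hypothetical, /dev/urandom is assumed to be the available strong source, and the debug override is assumed to arrive as a string (for instance from a setting).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a PostgreSQL function): fill buf with len bytes
 * from the OS entropy pool.  Assumes /dev/urandom exists on this platform.
 */
static bool
strong_random_bytes(void *buf, size_t len)
{
    FILE   *f = fopen("/dev/urandom", "rb");
    size_t  got;

    if (f == NULL)
        return false;
    got = fread(buf, 1, len, f);
    fclose(f);
    return got == len;
}

/*
 * Hypothetical helper: choose the initial 64-bit seed.  A non-NULL
 * debug_seed forces a reproducible series for debugging; otherwise the
 * seed comes from the strong source.  Returns false if no seed could be
 * obtained, so that the caller can try a backup source.
 */
static bool
choose_initial_seed(uint64_t *seed, const char *debug_seed)
{
    if (debug_seed != NULL)
    {
        char   *end;

        *seed = strtoull(debug_seed, &end, 0);
        /* accept only a fully parsed, non-empty number */
        return end != debug_seed && *end == '\0';
    }
    return strong_random_bytes(seed, sizeof(*seed));
}
```

Calling choose_initial_seed(&seed, NULL) draws 64 bits from the strong source, while choose_initial_seed(&seed, "42") pins the seed to 42. Pid and time never enter the seed here; what to do when the function returns false (the "backup" case) is left open.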
Also, I would suggest centralizing and abstracting the implementation of a default pseudo-random generator so that its actual internal size and quality can be changed. That would mean renaming pg_erand48 and hiding its state size, maybe along the lines of:

    // extractors
    void pg_random_bytes(int nbytes, char *where_to_put_them);
    uint32 pg_random_32();
    uint64 pg_random_48();
    uint64 pg_random_64();
    ...

    // dynamic?
    int pg_random_state_size(void); // in bytes
    // or static?
    #define PG_RANDOM_STATE_SIZE 6 // bytes

    // get/set state
    bool pg_random_get_state(uchar *state(, int size|[PG_RANDOM_STATE_SIZE]));
    bool pg_random_set_state(const uchar *state...);

Given the typical hardware a postgres instance runs on, I would shop around for a pseudo-random generator which takes advantage of 64-bit operations, and not go below 64-bit seeds, or possibly 128.

If a strong random source is available but considered too costly, so that a (weak) linear congruential algorithm must be used, a possible compromise is to reseed from the strong source every few thousand/million draws, or with a time periodicity, eg every few minutes, or maybe some configurable option.

A not too costly security enhancer is to combine different fast generators so that if one becomes weak at some point, the combination does not.

-- 
Fabien.
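As an illustration of the last two suggestions in the message (periodic reseeding and combining fast generators), here is a minimal sketch. It is not from the message itself and is not PostgreSQL code: the demo_rng names are hypothetical, the choice of a 64-bit linear congruential generator (Knuth's MMIX constants) XORed with xorshift64* is only one possible pairing, and the strong source is assumed to be supplied as a callback. The result is still not a cryptographically strong generator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callback assumed to fill buf with len bytes from a strong source. */
typedef bool (*reseed_source) (void *buf, size_t len);

/* Hypothetical combined generator; none of these names exist in PostgreSQL. */
typedef struct demo_rng
{
    uint64_t      lcg;           /* 64-bit linear congruential state */
    uint64_t      xs;            /* xorshift64* state, must be nonzero */
    uint64_t      draws;         /* draws since the last successful reseed */
    uint64_t      reseed_every;  /* 0 disables periodic reseeding */
    reseed_source source;        /* may be NULL: reseeding then always fails */
} demo_rng;

/* Replace both states with fresh bytes from the strong source. */
static bool
demo_rng_reseed(demo_rng *r)
{
    uint64_t    fresh[2];

    if (r->source == NULL || !r->source(fresh, sizeof(fresh)))
        return false;
    r->lcg = fresh[0];
    r->xs = (fresh[1] != 0) ? fresh[1] : 1; /* xorshift must not be zero */
    r->draws = 0;
    return true;
}

/* Draw 64 bits, reseeding first once reseed_every draws have been made. */
static uint64_t
demo_rng_next64(demo_rng *r)
{
    uint64_t    x;

    if (r->reseed_every != 0 && r->draws >= r->reseed_every)
        (void) demo_rng_reseed(r);  /* on failure keep the old state, retry next draw */
    r->draws++;

    /* generator 1: linear congruential step (Knuth's MMIX constants) */
    r->lcg = r->lcg * UINT64_C(6364136223846793005)
        + UINT64_C(1442695040888963407);

    /* generator 2: xorshift64* step */
    x = r->xs;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    r->xs = x;

    /* combine the two outputs with XOR */
    return r->lcg ^ (x * UINT64_C(0x2545F4914F6CDD1D));
}
```

A caller would initialise both states from a strong source, set reseed_every to the chosen number of draws, and call demo_rng_next64() for each value. A time-based reseed, as also suggested in the message, would compare a timestamp instead of the draw counter.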