Thread: pgsql: Use a separate random seed for SQL random()/setseed()functions.
Use a separate random seed for SQL random()/setseed() functions. Previously, the SQL random() function depended on libc's random(3), and setseed() invoked srandom(3). This results in interference between these functions and backend-internal uses of random(3). We'd never paid too much mind to that, but in the wake of commit 88bdbd3f7 which added log_statement_sample_rate, the interference arguably has a security consequence: if log_statement_sample_rate is active then an unprivileged user could probably control which if any of his SQL commands get logged, by issuing setseed() at the right times. That seems bad. To fix this reliably, we need random() and setseed() to use their own private random state variable. Standard random(3) isn't amenable to such usage, so let's switch to pg_erand48(). It's hard to say whether that's more or less "random" than any particular platform's version of random(3), but it does have a wider seed value and a longer period than are required by POSIX, so we can hope that this isn't a big downgrade. Also, we should now have uniform behavior of random() across platforms, which is worth something. While at it, upgrade the per-process seed initialization method to use pg_strong_random() if available, greatly reducing the predictability of the initial seed value. (I'll separately do something similar for the internal uses of random().) In addition to forestalling the possible security problem, this has a benefit in the other direction, which is that we can now document setseed() as guaranteeing a reproducible sequence of random() values. Previously, because of the possibility of internal calls of random(3), we could not promise any such thing. Discussion: https://postgr.es/m/3859.1545849900@sss.pgh.pa.us Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/6645ad6bdd81e7d5a764e0d94ef52fae053a9e13 Modified Files -------------- doc/src/sgml/func.sgml | 14 +++++++----- src/backend/utils/adt/float.c | 52 ++++++++++++++++++++++++++++++++++++------- 2 files changed, 53 insertions(+), 13 deletions(-)
Hello Tom, I'm sorry I'm a bit late back into this discussion, I was on the road. > To fix this reliably, we need random() and setseed() to use their own > private random state variable. Ok. > Standard random(3) isn't amenable to such usage, so let's switch to > pg_erand48(). Hmmm… bad idea? > It's hard to say whether that's more or less "random" than any > particular platform's version of random(3), It looks much less pseudo-random on Linux: POSIX provides 3 pseudo-random functions (probably 2 too many): "random", "rand" (mapped on the previous one by glibc), and ".rand48". According to glibc documentation, "random" has an internal state of 31 long integers i.e. 992 bits (checked into the source code, although why it can only be seeded from 32 bits fails me) with a "nonlinear additive feedback" PRNG, vs 48 bits of rand48 linear congruential generator PRNG, also seeded from 32 bits. For me, a 48 bit state is inadequate for anything but a toy application that would need a few casual pseudo-random numbers. I recommand against using the rand48 for a backend purpose which might have any, even remote, security implication. As rand48 is a LCG, it cycles on the low-order bits so that they should not be used (although they are for erand48). The 48 bit state looks like a 80's design, when hardware was between 16 and 32 bit, and was still in use in the 90's so that it is in POSIX and Java. I can retro-explain it as follows: the aim was to produce reasonable 32 bit pseudo-random ints on slow machines while not using low-order bits, so 48 was the closest round-up possible. Why not go up to 64 bits was very probably because it would have required more expensive mults to simulate 64 bit multiply on a 16 or 32 bit architecture. The 48-bit LCG makes it "good enough" for less than a cubic root of size samples, i.e. 2**16 draws. This is much too small on today GHz hardware. ISTM that 64 bits would be on the too-low side as well. I'd shop for a 128 or 256-bit state generator. I'm unsure of the best choice, though. I have looked at "xorshift128+" and "xoshiro256**", which have some critics (basically, non-cryptographic PRNG can have their state rebuilt from a few outputs, and there usually is a simple transformation on outputs which make it fails statistical tests). ISTM that "xoshiro256**" would be a reasonable choice, much better than "rand48". An LCG with a larger state (>= 128) could be admissible as well. > but it does have a wider seed value and a longer period than are required > by POSIX, so we can hope that this isn't a big downgrade. I'd say that it is a significant downgrade that I wish postgres woud avoid, especially with the argument that it for better security! I'd suggest again that (1) postgres should provide an algorithm-independent interface to its PRNG with an external state and (2) use an alternative to rand48, the choice of which should be discussed. -- Fabien.
Hello Tom, I'm sorry I'm a bit late back into this discussion, I was on the road. > To fix this reliably, we need random() and setseed() to use their own > private random state variable. Ok. > Standard random(3) isn't amenable to such usage, so let's switch to > pg_erand48(). Hmmm… bad idea? > It's hard to say whether that's more or less "random" than any > particular platform's version of random(3), It looks much less pseudo-random on Linux: POSIX provides 3 pseudo-random functions (probably 2 too many): "random", "rand" (mapped on the previous one by glibc), and ".rand48". According to glibc documentation, "random" has an internal state of 31 long integers i.e. 992 bits (checked into the source code, although why it can only be seeded from 32 bits fails me) with a "nonlinear additive feedback" PRNG, vs 48 bits of rand48 linear congruential generator PRNG, also seeded from 32 bits. For me, a 48 bit state is inadequate for anything but a toy application that would need a few casual pseudo-random numbers. I recommand against using the rand48 for a backend purpose which might have any, even remote, security implication. As rand48 is a LCG, it cycles on the low-order bits so that they should not be used (although they are for erand48). The 48 bit state looks like a 80's design, when hardware was between 16 and 32 bit, and was still in use in the 90's so that it is in POSIX and Java. I can retro-explain it as follows: the aim was to produce reasonable 32 bit pseudo-random ints on slow machines while not using low-order bits, so 48 was the closest round-up possible. Why not go up to 64 bits was very probably because it would have required more expensive mults to simulate 64 bit multiply on a 16 or 32 bit architecture. The 48-bit LCG makes it "good enough" for less than a cubic root of size samples, i.e. 2**16 draws. This is much too small on today GHz hardware. ISTM that 64 bits would be on the too-low side as well. I'd shop for a 128 or 256-bit state generator. I'm unsure of the best choice, though. I have looked at "xorshift128+" and "xoshiro256**", which have some critics (basically, non-cryptographic PRNG can have their state rebuilt from a few outputs, and there usually is a simple transformation on outputs which make it fails statistical tests). ISTM that "xoshiro256**" would be a reasonable choice, much better than "rand48". An LCG with a larger state (>= 128) could be admissible as well. > but it does have a wider seed value and a longer period than are required > by POSIX, so we can hope that this isn't a big downgrade. I'd say that it is a significant downgrade that I wish postgres woud avoid, especially with the argument that it for better security! I'd suggest again that (1) postgres should provide an algorithm-independent interface to its PRNG with an external state and (2) use an alternative to rand48, the choice of which should be discussed. -- Fabien.