Thread: pgsql: Use a separate random seed for SQL random()/setseed() functions.

pgsql: Use a separate random seed for SQL random()/setseed() functions.

From
Tom Lane
Date:
Use a separate random seed for SQL random()/setseed() functions.

Previously, the SQL random() function depended on libc's random(3),
and setseed() invoked srandom(3).  This results in interference between
these functions and backend-internal uses of random(3).  We'd never paid
too much mind to that, but in the wake of commit 88bdbd3f7 which added
log_statement_sample_rate, the interference arguably has a security
consequence: if log_statement_sample_rate is active then an unprivileged
user could probably control which if any of his SQL commands get logged,
by issuing setseed() at the right times.  That seems bad.
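
The interference described above can be illustrated with a small C sketch.
Everything here is a hypothetical stand-in: internal_sample_decision() is
not the backend's actual sampling code, and user_visible_setseed() merely
mimics a setseed() that calls srandom(3). The point is only that both draw
on the same process-wide random(3) state.

```c
/* Request POSIX/XSI declarations for random() and srandom(). */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a backend-internal sampling decision that
 * draws from the process-wide random(3) state.
 */
static bool
internal_sample_decision(double sample_rate)
{
    /* random() returns a value in [0, 2^31 - 1]. */
    return (double) random() < sample_rate * 2147483648.0;
}

/*
 * Hypothetical stand-in for a user-visible setseed(): it reseeds the
 * same process-wide state that the internal decision relies on.
 */
static void
user_visible_setseed(unsigned int seed)
{
    srandom(seed);
}

/* Reseed, then report what the next "internal" decision will be. */
static bool
decision_after_seed(unsigned int seed, double sample_rate)
{
    user_visible_setseed(seed);
    return internal_sample_decision(sample_rate);
}
```

Calling decision_after_seed() twice with the same seed always yields the
same answer, so whoever controls the seed can choose one that produces the
outcome they want.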

To fix this reliably, we need random() and setseed() to use their own
private random state variable.  Standard random(3) isn't amenable to such
usage, so let's switch to pg_erand48().  It's hard to say whether that's
more or less "random" than any particular platform's version of random(3),
but it does have a wider seed value and a longer period than are required
by POSIX, so we can hope that this isn't a big downgrade.  Also, we should
now have uniform behavior of random() across platforms, which is worth
something.
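
A minimal sketch of the private-state idea, using POSIX erand48() in place
of pg_erand48(). The names sketch_seed, sketch_setseed and sketch_random
are illustrative, and the seed-to-state mapping is an assumption made for
this example, not a copy of the committed code.

```c
/* Request POSIX/XSI declarations for erand48(). */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <stdint.h>
#include <stdlib.h>

/* Private 48-bit state, touched only by the two functions below. */
static unsigned short sketch_seed[3] = {0x330E, 0xABCD, 0x1234};

/* Map a seed in [-1, 1] onto the 48-bit state (illustrative mapping). */
static void
sketch_setseed(double seed)
{
    int64_t  iseed = (int64_t) (seed * (double) INT64_C(0x7FFFFFFFFFFF));
    uint64_t bits = (uint64_t) iseed;

    sketch_seed[0] = (unsigned short) (bits & 0xFFFF);
    sketch_seed[1] = (unsigned short) ((bits >> 16) & 0xFFFF);
    sketch_seed[2] = (unsigned short) ((bits >> 32) & 0xFFFF);
}

/* Return a double in [0, 1) drawn from the private state only. */
static double
sketch_random(void)
{
    return erand48(sketch_seed);
}
```

Because erand48() advances only the array it is handed, calls to libc's
random(3) elsewhere in the process do not disturb the sequence, and
repeating sketch_setseed() with the same value replays the same numbers.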

While at it, upgrade the per-process seed initialization method to use
pg_strong_random() if available, greatly reducing the predictability
of the initial seed value.  (I'll separately do something similar for
the internal uses of random().)

In addition to forestalling the possible security problem, this has a
benefit in the other direction, which is that we can now document
setseed() as guaranteeing a reproducible sequence of random() values.
Previously, because of the possibility of internal calls of random(3),
we could not promise any such thing.

Discussion: https://postgr.es/m/3859.1545849900@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/6645ad6bdd81e7d5a764e0d94ef52fae053a9e13

Modified Files
--------------
doc/src/sgml/func.sgml        | 14 +++++++-----
src/backend/utils/adt/float.c | 52 ++++++++++++++++++++++++++++++++++++-------
2 files changed, 53 insertions(+), 13 deletions(-)


Re: pgsql: Use a separate random seed for SQL random()/setseed() functions.

From
Fabien COELHO
Date:
Hello Tom,

I'm sorry to be a bit late getting back to this discussion; I was on the road.

> To fix this reliably, we need random() and setseed() to use their own
> private random state variable.

Ok.

> Standard random(3) isn't amenable to such usage, so let's switch to 
> pg_erand48().

Hmmm… bad idea?

> It's hard to say whether that's more or less "random" than any 
> particular platform's version of random(3),

It looks much less pseudo-random on Linux: POSIX provides 3 pseudo-random
functions (probably 2 too many): "random", "rand" (mapped onto the previous
one by glibc), and the ".rand48" family. According to the glibc
documentation, "random" has an internal state of 31 long integers, i.e. 992
bits (I checked in the source code, although why it can only be seeded from
32 bits escapes me), with a "nonlinear additive feedback" PRNG, versus the
48-bit state of the rand48 linear congruential generator, also seeded from
32 bits.

For me, a 48-bit state is inadequate for anything but a toy application
that would need a few casual pseudo-random numbers. I recommend against
using rand48 for a backend purpose which might have any, even remote,
security implication.

As rand48 is an LCG, its low-order bits cycle with short periods, so they
should not be used (although they are for erand48). The 48-bit state looks
like an 80's design, from when hardware was between 16 and 32 bits, and it
was still in use in the 90's, which is how it ended up in POSIX and Java. I
can retro-explain it as follows: the aim was to produce reasonable 32-bit
pseudo-random ints on slow machines while not using the low-order bits, so
48 was the closest round-up possible. The reason for not going up to 64
bits was very probably that it would have required more expensive
multiplications to simulate a 64-bit multiply on a 16- or 32-bit
architecture. A 48-bit LCG is only "good enough" for fewer draws than the
cube root of its state-space size, i.e. 2**16 draws. This is much too small
on today's GHz hardware.
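
To make the low-order-bit remark concrete, here is a small sketch that
measures how quickly the low k bits of the rand48 recurrence repeat. It
uses the default POSIX multiplier and addend (0x5DEECE66D and 0xB); the
helper names rand48_step and low_bits_period are made up for this example,
and k is assumed to be between 1 and 47.

```c
#include <stdint.h>

#define RAND48_MULT UINT64_C(0x5DEECE66D)
#define RAND48_ADD  UINT64_C(0xB)
#define RAND48_MASK ((UINT64_C(1) << 48) - 1)

/* One step of the rand48 recurrence: x' = (a * x + c) mod 2^48. */
static uint64_t
rand48_step(uint64_t x)
{
    return (RAND48_MULT * x + RAND48_ADD) & RAND48_MASK;
}

/*
 * Count the steps until the low k bits of the state return to their
 * starting value.  The low k bits of the next state depend only on the
 * low k bits of the current one, so this is their cycle length.
 */
static uint64_t
low_bits_period(uint64_t start, unsigned k)
{
    uint64_t mask = (UINT64_C(1) << k) - 1;
    uint64_t x = start & RAND48_MASK;
    uint64_t steps = 0;

    do
    {
        x = rand48_step(x);
        steps++;
    } while ((x & mask) != (start & mask));

    return steps;
}
```

With these constants the lowest bit simply alternates (period 2), and in
general the low k bits repeat after 2**k steps whatever the starting state.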

ISTM that 64 bits would be on the too-low side as well. I'd shop for a 128
or 256-bit state generator. I'm unsure of the best choice, though. I have
looked at "xorshift128+" and "xoshiro256**", which have drawn some criticism
(basically, non-cryptographic PRNGs can have their state rebuilt from a few
outputs, and there usually is a simple transformation on the outputs which
makes them fail statistical tests). ISTM that "xoshiro256**" would be a
reasonable choice, much better than "rand48". An LCG with a larger state
(>= 128 bits) could be admissible as well.
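
For reference, a compact sketch of xoshiro256** with its 256-bit state
held in a caller-supplied struct. The generator step follows the published
public-domain algorithm by Blackman and Vigna; seeding the four state
words through splitmix64 is the approach its authors suggest. The type and
function names are chosen for this example only.

```c
#include <stdint.h>

/* 256 bits of generator state, owned by the caller. */
typedef struct
{
    uint64_t s[4];
} xoshiro256ss_state;

static uint64_t
rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* splitmix64: used here only to expand a 64-bit seed into the state. */
static uint64_t
splitmix64_next(uint64_t *x)
{
    uint64_t z = (*x += UINT64_C(0x9e3779b97f4a7c15));

    z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
    return z ^ (z >> 31);
}

static void
xoshiro256ss_seed(xoshiro256ss_state *st, uint64_t seed)
{
    for (int i = 0; i < 4; i++)
        st->s[i] = splitmix64_next(&seed);
}

/* Produce the next 64-bit output and advance the state. */
static uint64_t
xoshiro256ss_next(xoshiro256ss_state *st)
{
    uint64_t result = rotl64(st->s[1] * 5, 7) * 9;
    uint64_t t = st->s[1] << 17;

    st->s[2] ^= st->s[0];
    st->s[3] ^= st->s[1];
    st->s[1] ^= st->s[2];
    st->s[0] ^= st->s[3];
    st->s[2] ^= t;
    st->s[3] = rotl64(st->s[3], 45);

    return result;
}
```

This is not a cryptographic generator: as noted above, its state can be
recovered from a few outputs.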

> but it does have a wider seed value and a longer period than are required
> by POSIX, so we can hope that this isn't a big downgrade.

I'd say that it is a significant downgrade that I wish postgres would
avoid, especially with the argument that it is for better security!

I'd suggest again that (1) postgres should provide an 
algorithm-independent interface to its PRNG with an external state and (2) 
use an alternative to rand48, the choice of which should be discussed.
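
One possible shape for point (1), as a hedged sketch: callers hold an
explicit state object and go through three functions, so the algorithm
behind them can be swapped without touching call sites. All names here
(sketch_prng_state, sketch_prng_seed, ...) are hypothetical, and the
splitmix64 step is only a placeholder backend, not a proposed choice.

```c
#include <stdint.h>

/*
 * Caller-owned generator state.  It is sized for a 256-bit algorithm;
 * the placeholder backend below uses only the first word.
 */
typedef struct sketch_prng_state
{
    uint64_t words[4];
} sketch_prng_state;

/* Initialize a state object from a 64-bit seed. */
static void
sketch_prng_seed(sketch_prng_state *state, uint64_t seed)
{
    state->words[0] = seed;
    state->words[1] = state->words[2] = state->words[3] = 0;
}

/* Next 64-bit value (placeholder backend: one splitmix64 step). */
static uint64_t
sketch_prng_uint64(sketch_prng_state *state)
{
    uint64_t z = (state->words[0] += UINT64_C(0x9e3779b97f4a7c15));

    z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
    return z ^ (z >> 31);
}

/* Next double in [0, 1), built from the top 53 bits of the output. */
static double
sketch_prng_double(sketch_prng_state *state)
{
    return (double) (sketch_prng_uint64(state) >> 11) *
        (1.0 / 9007199254740992.0);     /* 2^-53 */
}
```

With such an interface, SQL random()/setseed() and each internal consumer
could keep separate state objects, so reseeding one cannot influence the
others.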

-- 
Fabien.
