Thread: Random not so random

Random not so random

From: "Arnau Rebassa"
Hi everybody,

  I'm doing the following query:

    select * from messages order by random() limit 1;

  In the table messages I have more than 200 messages, and a lot of the time
the message retrieved is the same. Does anybody know how I could get a more
"random" random?

  Thank you very much

--
Arnau



Re: Random not so random

From: "Arnau Rebassa"
Hi Greg,

>What OS is this? Postgres is just using your OS's random()/srandom() calls.
>On some platforms these may be poorly implemented and not very random.

  I'm using a debian linux as OS with a 2.4 kernel running on it.

>Incidentally, are you reconnecting every time or is it that multiple calls
>in a single session are returning the same record?

  I'm reconnecting each time I want to retrieve a message. The idea is that I
have a library of messages and I want to pick one of them randomly. I don't
know if there is a way to seed the random number generator manually; does
anybody know?


Thanks to all

--
Arnau



Re: Random not so random

From: Neil Conway
Arnau Rebassa wrote:
> I don't know if there is a way to seed the random number generator
> manually; does anybody know?

setseed()

-Neil
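
For illustration, here is a minimal libpq sketch of the reconnect-per-message
pattern from the original post, seeding each new session with setseed() before
drawing a row. The connection string and the "messages" table are placeholders,
and the seed derivation is only an example; setseed() takes a double between
-1 and 1.

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=test");    /* placeholder */
        PGresult *res;
        char      query[64];

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* derive a seed in (-1, 1) from the time and the client pid */
        snprintf(query, sizeof(query), "SELECT setseed(%.3f)",
                 (double) ((time(NULL) ^ getpid()) % 1000) / 1000.0);
        res = PQexec(conn, query);
        PQclear(res);

        res = PQexec(conn, "SELECT * FROM messages ORDER BY random() LIMIT 1");
        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
            printf("%s\n", PQgetvalue(res, 0, 0));
        PQclear(res);
        PQfinish(conn);
        return 0;
    }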

Re: Random not so random

From: Tom Lane
"Arnau Rebassa" <arebassa@hotmail.com> writes:
>   I'm using a debian linux as OS with a 2.4 kernel running on it.

>> Incidentally, are you reconnecting every time or is it that multiple calls
>> in a single session are returning the same record?

>   I'm reconnecting each time I want to retrieve a message.

Hmm.  postmaster.c does this during startup of each backend process:

    gettimeofday(&now, &tz);
    srandom((unsigned int) now.tv_usec);

which would ordinarily be fairly good at mixing things up.  On some
platforms I might worry that the microseconds part of gettimeofday
might only have a few bits of accuracy, but I don't think that's an
issue on Linux.

It occurs to me that you might be seeing predictability as an indirect
result of something else you are doing that somehow tends to synchronize
the backend start times.  Are you connecting from a cron script that
would tend to be launched at the same relative instant within a second?

It might improve matters to make the code do something like

    srandom((unsigned int) (now.tv_sec ^ now.tv_usec));

            regards, tom lane
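
As a standalone sketch (not the actual postmaster.c source), the suggested
change amounts to:

    #include <stdlib.h>
    #include <sys/time.h>

    static void
    seed_backend(void)
    {
        struct timeval now;
        struct timezone tz;

        gettimeofday(&now, &tz);
        /* XOR the seconds into the microseconds so two backends started
           at the same relative instant within different seconds still
           get different seeds */
        srandom((unsigned int) (now.tv_sec ^ now.tv_usec));
    }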

Re: Random not so random

From: Bruno Wolff III
On Mon, Oct 04, 2004 at 10:14:19 -0400,
  Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> It occurs to me that you might be seeing predictability as an indirect
> result of something else you are doing that somehow tends to synchronize
> the backend start times.  Are you connecting from a cron script that
> would tend to be launched at the same relative instant within a second?
>
> It might improve matters to make the code do something like
>
>     srandom((unsigned int) (now.tv_sec ^ now.tv_usec));

Using /dev/urandom, where available, might be another option. However, some
people may not want their entropy pool getting 4 bytes used up on every
connection start up.
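
A sketch of that option, assuming /dev/urandom exists, falling back to a
time-based seed when the device can't be read:

    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <unistd.h>

    static void
    seed_from_urandom(void)
    {
        unsigned int seed;
        int fd = open("/dev/urandom", O_RDONLY);

        if (fd >= 0 && read(fd, &seed, sizeof(seed)) == (ssize_t) sizeof(seed))
            srandom(seed);          /* 4 bytes of kernel entropy */
        else
        {
            struct timeval now;

            gettimeofday(&now, NULL);
            srandom((unsigned int) (now.tv_sec ^ now.tv_usec));
        }
        if (fd >= 0)
            close(fd);
    }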

Re: Random not so random

From: Marco Colombo
On Mon, 4 Oct 2004, Tom Lane wrote:

> "Arnau Rebassa" <arebassa@hotmail.com> writes:
>>   I'm using a debian linux as OS with a 2.4 kernel running on it.
>
>>> Incidentally, are you reconnecting every time or is it that multiple calls
>>> in a single session are returning the same record?
>
>>   I'm reconnecting each time I want to retrieve a message.
>
> Hmm.  postmaster.c does this during startup of each backend process:
>
>     gettimeofday(&now, &tz);
>     srandom((unsigned int) now.tv_usec);
>
> which would ordinarily be fairly good at mixing things up.  On some
> platforms I might worry that the microseconds part of gettimeofday
> might only have a few bits of accuracy, but I don't think that's an
> issue on Linux.
>
> It occurs to me that you might be seeing predictability as an indirect
> result of something else you are doing that somehow tends to synchronize
> the backend start times.  Are you connecting from a cron script that
> would tend to be launched at the same relative instant within a second?
>
> It might improve matters to make the code do something like
>
>     srandom((unsigned int) (now.tv_sec ^ now.tv_usec));

How about reading from /dev/urandom on platforms that support it?

Actually, that should be done each time the random() function
is evaluated. (I have no familiarity with the code, so please
bear with me if the suggestion is unsound). I'd even add a parameter
for "really" random data to be provided, by reading /dev/random
instead of /dev/urandom (but read(2) may block).

How about the following:
random() = random(0) = traditional random()
random(1) = best effort random() via /dev/urandom
random(2) = wait for really random bits via /dev/random

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Random not so random

From: Bruno Wolff III
On Mon, Oct 04, 2004 at 18:58:41 +0200,
  Marco Colombo <pgsql@esiway.net> wrote:
>
> Actually, that should be done each time the random() function
> is evaluated. (I have no familiarity with the code, so please

That may be overkill, since I don't think that random has been advertised
as a secure or even particularly strong random number generator.

> bear with me if the suggestion is unsound). I'd even add a parameter
> for "really" random data to be provided, by reading /dev/random
> instead of /dev/urandom (but read(2) may block).

You don't want to use /dev/random. You aren't going to get better random
numbers that way and blocking reads is a big problem.

> How about the following:
> random() = random(0) = traditional random()
> random(1) = best effort random() via /dev/urandom
> random(2) = wait for really random bits via /dev/random

It might be nice to have a secure random function available in postgres.
Just using /dev/urandom is probably good enough to provide this service.

Re: Random not so random

From: "D. Stimits"
Tom Lane wrote:
> "Arnau Rebassa" <arebassa@hotmail.com> writes:
>
>>  I'm using a debian linux as OS with a 2.4 kernel running on it.
>
>
>>>Incidentally, are you reconnecting every time or is it that multiple calls
>>>in a single session are returning the same record?
>
>
>>  I'm reconnecting each time I want to retrieve a message.
>
>
> Hmm.  postmaster.c does this during startup of each backend process:
>
>     gettimeofday(&now, &tz);
>     srandom((unsigned int) now.tv_usec);

If it uses the same seed from the connection, then all randoms within a
connect that has not reconnected will use the same seed. Which means the
same sequence will be generated each time, which is why it is
pseudo-random and not random. For it to be random not just the first
call of a new connection, but among all calls of new connection, it
would have to seed it based on time at the moment of query and not at
the moment of connect. A pseudo-random generator using the same seed
will generate the same sequence.

D. Stimits, stimits AT comcast DOT net


Re: Random not so random

From: Tom Lane
"D. Stimits" <stimits@comcast.net> writes:
> Tom Lane wrote:
>> Hmm.  postmaster.c does this during startup of each backend process:
>>
>> gettimeofday(&now, &tz);
>> srandom((unsigned int) now.tv_usec);

> If it uses the same seed from the connection, then all randoms within a
> connect that has not reconnected will use the same seed. Which means the
> same sequence will be generated each time, which is why it is
> pseudo-random and not random. For it to be random not just the first
> call of a new connection, but among all calls of new connection, it
> would have to seed it based on time at the moment of query and not at
> the moment of connect. A pseudo-random generator using the same seed
> will generate the same sequence.

Did you read what I said?  Or experiment?

            regards, tom lane

Re: Random not so random

From: Harald Fuchs
In article <20041004155742.GA8488@wolff.to>,
Bruno Wolff III <bruno@wolff.to> writes:

> On Mon, Oct 04, 2004 at 10:14:19 -0400,
>   Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> It occurs to me that you might be seeing predictability as an indirect
>> result of something else you are doing that somehow tends to synchronize
>> the backend start times.  Are you connecting from a cron script that
>> would tend to be launched at the same relative instant within a second?
>>
>> It might improve matters to make the code do something like
>>
>> srandom((unsigned int) (now.tv_sec ^ now.tv_usec));

> Using /dev/urandom, where available, might be another option. However, some
> people may not want their entropy pool getting 4 bytes used up on every
> connection start up.

I think we don't need the randomness provided by /dev/[u]random.  How
about XORing in getpid?
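
As a sketch, the proposal just extends the line Tom suggested:

    #include <stdlib.h>
    #include <sys/time.h>
    #include <unistd.h>

    static void
    seed_backend(void)
    {
        struct timeval now;

        gettimeofday(&now, NULL);
        /* getpid() differs per backend even when two backends start
           within the same microsecond */
        srandom((unsigned int) (now.tv_sec ^ now.tv_usec ^ getpid()));
    }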

Re: Random not so random

From: Michael Fuhr
On Tue, Oct 05, 2004 at 02:39:13PM +0200, Harald Fuchs wrote:

> I think we don't need the randomness provided by /dev/[u]random.  How
> about XORing in getpid?

What about making the seeding mechanism and perhaps random()'s
behavior configurable?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Random not so random

From: Tom Lane
Harald Fuchs <hf0722x@protecting.net> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> It might improve matters to make the code do something like
>>>> srandom((unsigned int) (now.tv_sec ^ now.tv_usec));

> I think we don't need the randomness provided by /dev/[u]random.  How
> about XORing in getpid?

That sounds like a fine compromise --- it'll ensure a reasonable-size
set of possible seeds, it's at least marginally less predictable than
now.tv_sec, and it's perfectly portable.  No one in their right mind
expects random(3) to be cryptographically secure anyway, so doing more
doesn't seem warranted.

The various proposals to create a more-secure, less-portable variant
of random() don't seem appropriate to me for beta.  But I'd not object
to someone whipping up a contrib module for 8.1 or beyond.

            regards, tom lane

Re: Random not so random

From: "Dann Corbit"
A better way would be to seed a Mersenne Twister PRNG at server startup
time and then use the same generator for all subsequent calls.
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

The period is exceptionally long, and it has many excellent properties.
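
A sketch of that idea against the mt19937ar reference code (init_genrand and
genrand_real2 are the published entry points; the header name is made up here,
since the reference distribution ships a single .c file):

    #include <sys/time.h>
    #include <unistd.h>
    #include "mt19937ar.h"              /* hypothetical wrapper header */

    /* seed once per process at server startup... */
    void
    seed_mt_once(void)
    {
        struct timeval now;

        gettimeofday(&now, NULL);
        init_genrand((unsigned long) (now.tv_sec ^ now.tv_usec ^ getpid()));
    }

    /* ...then reuse the same generator for all subsequent calls */
    double
    mt_random(void)
    {
        return genrand_real2();         /* uniform double in [0,1) */
    }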


Re: Random not so random

From: "Dann Corbit"
Here is a tarball for MT:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.tgz

It has a BSD license.  What could be better?


Re: Random not so random

From: Vivek Khera
>>>>> "DS" == D Stimits <stimits@comcast.net> writes:

DS> If it uses the same seed from the connection, then all randoms within
DS> a connect that has not reconnected will use the same seed. Which means
DS> the same sequence will be generated each time, which is why it is
DS> pseudo-random and not random. For it to be random not just the first
DS> call of a new connection, but among all calls of new connection, it
DS> would have to seed it based on time at the moment of query and not at
DS> the moment of connect. A pseudo-random generator using the same seed
DS> will generate the same sequence.

You clearly demonstrate you do not understand the purpose of a seed in
a PRNG, nor how PRNGs in general work.  You want to seed it once and
only once per process, not every time you issue a query.  And nobody
said to use the same seed every time, either.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Random not so random

From: "D. Stimits"
Vivek Khera wrote:
>>>>>>"DS" == D Stimits <stimits@comcast.net> writes:
>
>
> DS> If it uses the same seed from the connection, then all randoms within
> DS> a connect that has not reconnected will use the same seed. Which means
> DS> the same sequence will be generated each time, which is why it is
> DS> pseudo-random and not random. For it to be random not just the first
> DS> call of a new connection, but among all calls of new connection, it
> DS> would have to seed it based on time at the moment of query and not at
> DS> the moment of connect. A pseudo-random generator using the same seed
> DS> will generate the same sequence.
>
> You clearly demonstrate you do not understand the purpose of a seed in
> a PRNG, nor how PRNG's in general work.  You want to seed it once and
> only once per process, not every time you issue a query.  And nobody
> said to use the same seed every time, either.
>

Sorry, at the time I don't believe PRNG was part of the conversation;
that came after (maybe I missed it). What I saw were functions based
on one-way hashes... is that incorrect? For one-way hashes and
pseudo-random generators using some form of hash, a different seed should
be used or else the pattern will be the same when using the same data.
From the man page on srandom():

    The srandom() function sets its argument as the seed for a new sequence
    of pseudo-random integers to be returned by random().  These sequences
    are repeatable by calling srandom() with the same seed value.  If no
    seed value is provided, the random() function is automatically seeded
    with a value of 1.

The srandom() caught my eye in the earlier email, not PRNG. You are
welcome to re-use srandom() without a new seed if you want.

D. Stimits, stimits AT comcast DOT net

Re: Random not so random

From: Harald Fuchs
In article <D425483C2C5C9F49B5B7A41F89441547055523@postal.corporate.connx.com>,
"Dann Corbit" <DCorbit@connx.com> writes:

> A better way would be to seed a Mersenne Twister PRNG at server startup
> time and then use the same generator for all subsequent calls.
> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

> The period is exceptionally long, and it has many excellent properties.

I think you're slightly missing the point.  Every PRNG always returns
the same output sequence for the same seed, and that's the problem
with the current implementation: it might happen that two backends get
the same seed.

Re: Random not so random

From: Bruno Wolff III
On Tue, Oct 05, 2004 at 11:27:05 +0200,
  Marco Colombo <marco@esi.it> wrote:
> On Mon, 4 Oct 2004, Bruno Wolff III wrote:
>
> >You don't want to use /dev/random. You aren't going to get better random
> >numbers that way and blocking reads is a big problem.
>
> Sure you are. As long as the entropy pool isn't empty, /dev/random
> won't block, and thus there's no difference in behaviour.

/dev/random blocking is a huge problem.

> When you're short of random bits, /dev/random blocks, /dev/urandom
> falls back to a PRNG + hash (I think SHA1). Under these conditions,
> /dev/urandom output has 0 "entropy" at all: an attacker can predict
> the output after short observation provided that he can break SHA1.

We aren't worried about an attacker breaking SHA1 in this case because
we are just generating a seed for srandom, which isn't cryptographically
secure. What you have to worry about is using so many values that
the nonrandomness becomes apparent. This will generally happen
after roughly 2^(number of bits in state / 2) samples. That isn't going
to happen when /dev/urandom is being used to generate a seed.

> That is, anything that uses /dev/urandom (when the kernel pool is
> empty) is just as safe as SHA1 is.

SHA1 is pretty safe for this purpose. The recent weaknesses of related
hashes aren't going to be of much help in predicting the output of
/dev/urandom. If SHA1 were to be broken that badly /dev/random would
also be broken.

The reason to add in entropy is to protect against your state having been
leaked once and an attacker then always being able to know what the next
output is going to be without having further access to the state. I don't
believe the way Linux mixes in entropy protects against this attack. If you
mix in too little entropy between output values, you can figure out what
entropy was added by going through all possible entropy values and seeing
which one gives the expected output.
(This assumes that you have gotten a copy of the internal state at
some point.) My memory of looking at the /dev/[u]random code is that
there is just one entropy pool and entropy is added to it as it is
obtained. So that if values are obtained from /dev/[u]random at a high
enough rate the above attack is practical.

So the only case where you might want to use /dev/random over /dev/urandom
is where the internal state is vulnerable, the attacker has access to a large
fraction of the output values, and there are at least some gaps between
samples where large amounts of entropy are collected.

Re: Random not so random

From: Michael Fuhr
On Tue, Oct 05, 2004 at 07:23:32AM -0600, Michael Fuhr wrote:
> On Tue, Oct 05, 2004 at 02:39:13PM +0200, Harald Fuchs wrote:
>
> > I think we don't need the randomness provided by /dev/[u]random.  How
> > about XORing in getpid?
>
> What about making the seeding mechanism and perhaps random()'s
> behavior configurable?

Regarding a configurable seeding mechanism, I was thinking along
the lines of Apache's SSLRandomSeed directive:

http://httpd.apache.org/docs-2.0/mod/mod_ssl.html#sslrandomseed

The "builtin" source could use a seed based on the time and
possibly the process ID, similar to the current implementation.

The "file" source would allow admins to use /dev/random or
/dev/urandom, whichever they prefer, or even an ordinary
file if they always wanted the same seed for testing purposes.
The backend wouldn't know or care what the source was: it
would simply open the specified file and read from it.

The "exec" source would read the seed from an external program,
which could generate it by whatever means desired.

The Apache directive also supports "egd" to obtain the seed
from an Entropy Gathering Daemon.

By making the seeding mechanism configurable, everybody can have it
their own way.

Comments?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
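
A sketch of how the "builtin" and "file" sources might look (the seed_source
string and function name are invented for illustration; the "exec" and "egd"
sources are omitted):

    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    static unsigned int
    acquire_seed(const char *seed_source)
    {
        unsigned int seed;

        if (strncmp(seed_source, "file:", 5) == 0)
        {
            /* e.g. "file:/dev/urandom"; an ordinary file gives a
               fixed, repeatable seed for testing */
            FILE *f = fopen(seed_source + 5, "r");

            if (f != NULL)
            {
                size_t got = fread(&seed, sizeof(seed), 1, f);

                fclose(f);
                if (got == 1)
                    return seed;
            }
            /* fall through to the builtin source on any failure */
        }

        /* "builtin": time and pid, similar to the current code */
        {
            struct timeval now;

            gettimeofday(&now, NULL);
            seed = (unsigned int) (now.tv_sec ^ now.tv_usec ^ getpid());
        }
        return seed;
    }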

Re: Random not so random

From: Bruno Wolff III
I am going to keep this on general for now, since it seems like other people
might be interested even though it is straying somewhat off topic.

On Wed, Oct 06, 2004 at 18:02:39 +0200,
  Marco Colombo <marco@esi.it> wrote:
>
> It depends. What's wrong with a SQL function taking long to
> complete? It could be a long computation, maybe days long. As far as

Days-long SQL queries are hardly normal. Common things you might want
to generate secure random numbers for aren't going to be queries you
want to run a long time. For example you might want to generate
session ids to store in a cookie handed out for web requests.

> the client is concerned, there's no difference. Of course you'll have
> to use it with care, just as you would do with any potentially long
> function (that is, don't lock everything and then sit down waiting).

This might be reasonable if there was a significant benefit to doing so.
My argument is that /dev/random does not produce significantly more
secure random numbers under normal circumstances.

> >SHA1 is pretty safe for this purpose. The recent weaknesses of related
> >hashes aren't going to be of much help in predicting the output of
> >/dev/urandom. If SHA1 were to be broken that badly /dev/random would
> >also be broken.
>
> No, because /dev/random returns only after the internal state has been
> perturbed "enough". An observer may (in theory) get to know all
> bits of the internal pool right _after_ a read() from /dev/random,
> but still isn't able to guess the value of all of them right before the
> _next_ read() returns, exactly because the kernel waits until the return
> is safe in this regard.

OK, I'll buy that for cases where the estimates of entropy acquired are
reasonable. This may still be a problem for some uses though. It may allow
an attacker to figure out other data returned by /dev/random that they
weren't able to observe.

> Now, if the attacker gets the output, and breaks SHA1, he'll know the
> internal state again. But if the output goes elsewhere, he won't be
> able to guess it. That's why /dev/random is 'safe'. It won't return
> until the output is random enough.

You don't necessarily need to break SHA1 to be able to track the internal
state.

> Now, my estimate is 0 entropy bits for my pool, since you drained them all.
>
> "Secure Application X" needs to generate a session key (128bits),
> so reads them from /dev/urandom. Let's assume the entropy count is still 0.
> I (the kernel) provide it with 128bits that come from my PRNG + SHA1 engine.
> Now, you can predict those 128 bits easily, and thus you know the session
> key. The attack succeeds.
>
> "Really Secure Application Y" needs 128 bits, too. But this time it
> reads from /dev/random (assume entropy count is still 0).
> Now, I can't fulfill the request. So have Y wait on read().
>
> As time passes, I start collecting entropy bits, from IRQ timings,
> or a hardware RNG. These bits change the internal pool in a way you
> don't know. Eventually I get the count up to 128 bits.
> Now, I run my PRNG (from the now-changed state) + SHA1 engine, and return
> 128 bits to Y. Can you guess the session key? Of course. You know
> how the internal state was before; you can "go through all possible entropy
> values". How many of them? 2^128. That's, no wonder, the same as trying
> to guess the key directly. Now you're telling me if you had the key,
> you could know the internal state again? Man, if you had the key, you
> already broke application Y, and the attack already succeeded by other
> means!

Assuming that all 128 bits were grabbed in one call to /dev/random, you
should be safe. However, if the bits were returned a byte at a time,
with intervening bytes returned from /dev/urandom to the attacker, the
key could be vulnerable.

> >My memory of looking at the /dev/[u]random code is that
> >there is just one entropy pool and entropy is added to it as it is
> >obtained. So that if values are obtained from /dev/[u]random at a high
> >enough rate the above attack is practical.
> >
> >So the only case where you might want to use /dev/random over /dev/urandom
> >is where the internal state is vulnerable, the attacker has access to a
> >large fraction of the output values, and there are at least some gaps
> >between samples where large amounts of entropy are collected.
>
> No. SHA1 protects the internal pool from an attacker who knows all the
> output. That's easy, just read from /dev/random until it blocks. If you're
> fast enough, you can assume no one else read from /dev/random. Now,
> _if you can break SHA1_, you have enough cleartext output of the internal
> PRNG to guess its state. You may have used /dev/urandom as well, there's
> no difference.

I think you misunderstood the context. The attacker is assumed to have
initially gotten the state through some means. By watching the output
from /dev/urandom it is possible to figure out what the current state is
if not too much entropy has been added since the attacker knew the state.

> The purpose of /dev/random blocking is not protecting the internal pool
> state, that's SHA1 job. /dev/random blocks in order to protect other
> users (application Y in my example) from you.

SHA1 only protects you from getting the state solely by viewing the output.
It doesn't protect against you getting it through other means.

I think /dev/random output would be better protected if it didn't share
a pool with /dev/urandom. Sharing a pool is what makes tracking the
internal state possible.

Another issue with using /dev/random instead of /dev/urandom is that it
adds another way to do denial of service attacks. That may or may not
be an issue in a particular use, since there may be more effective ways
of doing denial of service attacks.

Re: Random not so random

From: Marco Colombo
On Tue, 5 Oct 2004, Tom Lane wrote:

> now.tv_sec, and it's perfectly portable.  No one in their right mind
> expects random(3) to be cryptographically secure anyway, so doing more
> doesn't seem warranted.

Tom, having a source of "real" random data isn't useful just for crypto
applications. No PRNG is perfect when it comes to statistics.

> The various proposals to create a more-secure, less-portable variant
> of random() don't seem appropriate to me for beta.  But I'd not object
> to someone whipping up a contrib module for 8.1 or beyond.

Agreed.

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Random not so random

From: Marco Colombo
On Wed, 6 Oct 2004, Bruno Wolff III wrote:

> I am going to keep this on general for now, since it seems like other people
> might be interested even though it is straying somewhat off topic.

Agreed.
Anyone who's not really interested in /dev/[u]random issues should
jump right to the last paragraphs of this message, which are more
on-topic.

> On Wed, Oct 06, 2004 at 18:02:39 +0200,
>  Marco Colombo <marco@esi.it> wrote:
>>
>> It depends. What's wrong with a SQL function taking long to
>> complete? It could be a long computation, maybe days long. As far as
>
> Days-long SQL queries are hardly normal. Common things you might want
> to generate secure random numbers for aren't going to be queries you
> want to run a long time. For example you might want to generate
> session ids to store in a cookie handed out for web requests.

I'm just pointing out that:

(1)     select read_from_dev_random();

may take a while if the read() blocks, but it's not that different
from:

(2)     select very_complex_aggregate() from huge_table;

which may take a while as well. I think the backend is already able to
handle a blocking read(), since _any_ read() from disk may block; a read()
from /dev/random is nothing special. For clients, there's no difference.
So I don't see why a blocking read from /dev/random should be a problem.

>> the client is concerned, there's no difference. Of course you'll have
>> to use it with care, just as you would do with any potentially long
>> function (that is, don't lock everything and then sit down waiting).
>
> This might be reasonable if there was a significant benefit to doing so.
> My argument is that /dev/random does not produce significantly more
> secure random numbers under normal circumstances.
>
>>> SHA1 is pretty safe for this purpose. The recent weaknesses of related
>>> hashes aren't going to be of much help in predicting the output of
>>> /dev/urandom. If SHA1 were to be broken that badly /dev/random would
>>> also be broken.
>>
>> No, because /dev/random returns only after the internal state has been
>> perturbed "enough". An observer may (in theory) get to know all
>> bits of the internal pool right _after_ a read() from /dev/random,
>> but still isn't able to guess the value of all of them right before the
>> _next_ read() returns, exactly because the kernel waits until the return
>> is safe in this regard.
>
> OK, I'll buy that for cases where the estimates of entropy acquired are
> reasonable. This may still be a problem for some uses though. It may allow
> an attacker to figure out other data returned by /dev/random that they
> weren't able to observe.

If the kernel is overestimating the entropy count, it's a kernel bug,
and should be fixed.

>> Now, if the attacker gets the output, and breaks SHA1, he'll know the
>> internal state again. But if the output goes elsewhere, he won't be
>> able to guess it. That's why /dev/random is 'safe'. It won't return
>> until the output is random enough.
>
> You don't necessarily need to break SHA1 to be able to track the internal
> state.

Well, I'm not an expert. I base my knowledge on what other people say.
Quoting the opening comment in random.c from a recent Linux kernel tree:

  * When random bytes are desired, they are obtained by taking the SHA
  * hash of the contents of the "entropy pool".  The SHA hash avoids
  * exposing the internal state of the entropy pool.  It is believed to
  * be computationally infeasible to derive any useful information
  * about the input of SHA from its output.
[...]
  * If this estimate goes to zero, the routine can still generate
  * random numbers; however, an attacker may (at least in theory) be
  * able to infer the future output of the generator from prior
  * outputs.  This requires successful cryptanalysis of SHA, which is
  * not believed to be feasible, but there is a remote possibility.

The latter refers to the /dev/urandom interface, not /dev/random.

[...]
  * The two other interfaces are two character devices /dev/random and
  * /dev/urandom.  /dev/random is suitable for use when very high
  * quality randomness is desired (for example, for key generation or
  * one-time pads), as it will only return a maximum of the number of
  * bits of randomness (as estimated by the random number generator)
  * contained in the entropy pool.
  *
  * The /dev/urandom device does not have this limit, and will return
  * as many bytes as are requested.  As more and more random bytes are
  * requested without giving time for the entropy pool to recharge,
  * this will result in random numbers that are merely cryptographically
  * strong.  For many applications, however, this is acceptable.

Whether "random numbers that are merely cryptographically strong"
are acceptable for your application is up to you. Your application
blocking might be even a less acceptable event, but I don't buy this
kind of argument. Random bits are just another resource.
If the application needs disk space, and the system is out of it,
the application "blocks" or worse. The same for virtual memory.
Should we stop using hard disks or RAM? I guess no. Just as you
make sure your apps have enough disk and RAM, make sure they have
enough random bits, so that /dev/random never blocks (that is, use
a larger pool and feed it from an external source).

>> Now, my estimate is 0 entropy bits for my pool, since you drained them all.
>>
>> "Secure Application X" needs to generate a session key (128bits),
>> so reads them from /dev/urandom. Let's assume the entropy count is still 0.
>> I (the kernel) provide it with 128bits that come from my PRNG + SHA1 engine.
>> Now, you can predict those 128 bits easily, and thus you know the session
>> key. The attack succeeds.
>>
>> "Really Secure Application Y" needs 128 bits, too. But this time it
>> reads from /dev/random (assume entropy count is still 0).
>> Now, I can't fulfill the request. So have Y wait on read().
>>
>> As time passes, I start collecting entropy bits, from IRQ timings,
>> or a hardware RNG. These bits change the internal pool in a way you
>> don't know. Eventually I get the count up to 128 bits.
>> Now, I run my PRNG (from the now-changed state) + SHA1 engine, and return
>> 128 bits to Y. Can you guess the session key? Of course. You know
>> how the internal state was before; you can "go through all possible entropy
>> values". How many of them? 2^128. That's, no wonder, the same as trying
>> to guess the key directly. Now you're telling me if you had the key,
>> you could know the internal state again? Man, if you had the key, you
>> already broke application Y, and the attack already succeeded by other
>> means!
>
> Assuming that all 128 bits were grabbed in one call to /dev/random, you
> should be safe. However, if the bits were returned a byte at a time,
> with intervening bytes returned from /dev/urandom to the attacker, the
> key could be vulnerable.

I'm not sure I get the "intervening bytes" part. Guessing a 128-bit-long
string or a 160-bit-long one of which 32 bits are known shouldn't make
any difference.

[...]
>> No. SHA1 protects the internal pool from an attacker who knows all the
>> output. That's easy, just read from /dev/random until it blocks. If you're
>> fast enough, you can assume no one else read from /dev/random. Now,
>> _if you can break SHA1_, you have enough cleartext output of the internal
>> PRNG to guess its state. You may have used /dev/urandom as well, there's
>> no difference.
>
> I think you misunderstood the context. The attacker is assumed to have
> initially gotten the state through some means. By watching the output
> from /dev/urandom it is possible to figure out what the current state is
> if not too much entropy has been added since the attacker knew the state.
>
>> The purpose of /dev/random blocking is not protecting the internal pool
>> state, that's SHA1 job. /dev/random blocks in order to protect other
>> users (application Y in my example) from you.
>
> SHA1 only protects you from getting the state solely by viewing the output.
> It doesn't protect against you getting it through other means.

The only other means I can think of (speaking of Linux) are:

- the CAP_SYS_ADMIN RNDGETPOOL ioctl(), to read the pool contents;
- the CAP_SYS_ADMIN RNDADDENTROPY ioctl(), to 'taint' the pool;
- the CAP_SYS_ADMIN RNDADDTOENTCNT ioctl(), to trick the kernel into
   thinking the pool contains more randomness w/o adding any.

No wonder, these methods are CAP_SYS_ADMIN only. If you have the
CAP_SYS_ADMIN capability, you already own the system.

Ok, maybe you can control the physics of your hard disks, and have them
behave in a known (to you) way. But if you can do that, you are Q, and
already own the universe. :)

> I think /dev/random output would be better protected if it didn't share
> a pool with /dev/urandom. Sharing a pool is what makes tracking the
> internal state possible.
>
> Another issue with using /dev/random instead of /dev/urandom is that it
> adds another way to do denial of service attacks. That may or may not
> be an issue in a particular use, since there may be more effective ways
> of doing denial of service attacks.

Exactly. Randomness is just a resource. Like many other resources,
it's limited, and that opens the door to possible DoS attacks. Nothing new.


I think the whole point is: does PostgreSQL have to care about this?
Is providing a source of real random data valuable? Or should the
client handle the issue?

Well, in a Web-based application, some kind of server-side tier is
likely to be already present: no one is so crazy as to have the DB
directly generate HTML pages, right? (pun intended) :)
So it could take care of /dev/random and friends as well, without relying
on the DB for that. That's because it runs on the server.

But in a real client-server application, where all the logic is
either in the DB, or in the (remote) client, placing the need to
collect really random data in a safe way on the clients may be
infeasible.

The same argument holds for the current weak random() we have now.
I can't think of any client-side language that doesn't provide
a random() function. Even bash can do some $RANDOM. A purist may
argue that "providing a random answer to a query" is not the job of
an RDBMS. But a random() function can be useful sometimes.

I know 0 of PostgreSQL internals and development issues.
I'll play with the idea of making a contrib module.

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Random not so random

From: Bruno Wolff III
On Thu, Oct 07, 2004 at 12:08:51 +0200,
  Marco Colombo <pgsql@esiway.net> wrote:
> On Wed, 6 Oct 2004, Bruno Wolff III wrote:
>
> >You don't necessarily need to break SHA1 to be able to track the internal
> >state.
>
> Well, I'm not an expert. I base my knowledge on what other people say.
> Quoting the opening comment in random.c from a recent Linux kernel tree:

I was thinking of something like having root access for a short time, or
getting access to the state that is saved between reboots off a backup
tape or something.

> Whether "random numbers that are merely cryptographically strong"
> are acceptable for your application is up to you. Your application
> blocking might be an even less acceptable event, but I don't buy this
> kind of argument. Random bits are just another resource.

I think we agree here. I just don't see many cases where the risk of SHA1
being broken in a way useful to compromise /dev/urandom is significant
compared to other risks for a typical postgres system.

> >Assuming that all 128 bits were grabbed in one call to /dev/random, you
> >should be safe. However, if the bits were returned a byte at a time,
> >with intervening bytes returned from /dev/urandom to the attacker, the
> >key could be vulnerable.
>
> I'm not sure I get the "intervening bytes" part. Guessing a 128-bit-long
> string or a 160-bit-long one of which 32 bits are known shouldn't make
> any difference.

I am not sure of the semantics of a read from /dev/random requesting, say,
16 bytes. If you are guaranteed to get all 16 bytes without anyone else
being able to get any /dev/urandom reads in between any of the bytes, you
have convinced me that /dev/random does protect the data better than
/dev/urandom. However, if you get /dev/urandom output between the bytes
(by, say, knowing the approximate rate at which entropy is being collected
and doing the requests in bursts shortly after estimating that enough
entropy for one byte has been obtained), then you only have to break
16 8-bit keys instead of one 128-bit key.

> I know 0 of PostgreSQL internals and development issues.
> I'll play with the idea of making a contrib module.

It should be pretty easy to write a C function that reads from /dev/[u]random.
Extensibility is one of the nice features of Postgres.
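
A sketch of such a function using the server's C-function interface (the
function name is invented; this is not an existing contrib module):

    #include "postgres.h"
    #include "fmgr.h"

    #include <fcntl.h>
    #include <unistd.h>

    PG_FUNCTION_INFO_V1(urandom_int4);

    Datum
    urandom_int4(PG_FUNCTION_ARGS)
    {
        int32 result;
        int   fd = open("/dev/urandom", O_RDONLY);

        if (fd < 0)
            elog(ERROR, "could not open /dev/urandom");
        if (read(fd, &result, sizeof(result)) != (ssize_t) sizeof(result))
        {
            close(fd);
            elog(ERROR, "could not read from /dev/urandom");
        }
        close(fd);
        PG_RETURN_INT32(result);
    }

Once compiled into a shared library, it would be declared with something like
CREATE FUNCTION urandom_int4() RETURNS int4 AS 'path/to/module' LANGUAGE C.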

Re: Random not so random

From: Marco Colombo
On Mon, 4 Oct 2004, Bruno Wolff III wrote:

> On Mon, Oct 04, 2004 at 18:58:41 +0200,
>  Marco Colombo <pgsql@esiway.net> wrote:
>>
>> Actually, that should be done each time the random() function
>> is evaluated. (I have no familiarity with the code, so please
>
> That may be overkill, since I don't think that random has been advertised
> as a secure or even particularly strong random number generator.
>
>> bear with me if the suggestion is unsound). I'd even add a parameter
>> for "really" random data to be provided, by reading /dev/random
>> instead of /dev/urandom (but read(2) may block).
>
> You don't want to use /dev/random. You aren't going to get better random
> numbers that way and blocking reads is a big problem.

Sure you are. As long as the entropy pool isn't empty, /dev/random
won't block, and thus there's no difference in behaviour.
When you're short of random bits, /dev/random blocks, /dev/urandom
falls back to a PRNG + hash (I think SHA1). Under these conditions,
/dev/urandom output has 0 "entropy" at all: an attacker can predict
the output after short observation provided that he can break SHA1.
That is, anything that uses /dev/urandom (when the kernel pool is
empty) is just as safe as SHA1 is.

I agree that for a general-purpose 'good' random() function,
/dev/urandom is enough (as opposed to a plain old PRNG).
In some applications you may need the extra security provided
by /dev/random: its output (_when_ it is available) is always
truly random (as long as you trust the kernel, of course - there
have been bugs in the past in Linux about overestimating the randomness
of certain sources, but they've been corrected AFAIK).

>> How about the following:
>> random() = random(0) = traditional random()
>> random(1) = best effort random() via /dev/urandom
>> random(2) = wait for really random bits via /dev/random
>
> It might be nice to have a secure random function available in postgres.
> Just using /dev/urandom is probably good enough to provide this service.

Why not have all of them? The problem is how to handle a potentially
blocking read of /dev/random (actually _any_ disk read may block
as well). Just warn people not to use random(2) unless they really
know what they're doing...

I don't think the read syscall overhead is noticeable (in Linux at least).
But for sure we can't afford to _open_ /dev/urandom each time...
backends will have to keep an extra fd open just for /dev/urandom... hmm...
I can't think of any better way of doing that.

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it
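
For completeness, a sketch of the descriptor-caching idea from the last
paragraph above: one fd opened lazily per backend and kept for its lifetime
(names invented):

    #include <fcntl.h>
    #include <unistd.h>

    static int urandom_fd = -1;     /* one per backend, opened lazily */

    static int
    read_urandom(void *buf, size_t len)
    {
        if (urandom_fd < 0)
        {
            urandom_fd = open("/dev/urandom", O_RDONLY);
            if (urandom_fd < 0)
                return -1;
        }
        /* short reads from /dev/urandom are not expected, but treat
           them as failure anyway */
        return (read(urandom_fd, buf, len) == (ssize_t) len) ? 0 : -1;
    }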