Re: Random not so random - Mailing list pgsql-general

From Marco Colombo
Subject Re: Random not so random
Msg-id Pine.LNX.4.61.0410070931070.22573@Megathlon.ESI
In response to Re: Random not so random  (Bruno Wolff III <bruno@wolff.to>)
Responses Re: Random not so random  (Bruno Wolff III <bruno@wolff.to>)
List pgsql-general
On Wed, 6 Oct 2004, Bruno Wolff III wrote:

> I am going to keep this on general for now, since it seems like other people
> might be interested even though it is straying a somewhat off topic.

Agreed.
Anyone who's not really interested in /dev/[u]random issues should
jump right to the last paragraphs of this message, which are more
on-topic.

> On Wed, Oct 06, 2004 at 18:02:39 +0200,
>  Marco Colombo <marco@esi.it> wrote:
>>
>> It depends. What's wrong with a SQL function taking long to
>> complete? It could be a long computation, maybe days long. As far as
>
> Days long SQL queries are hardly normal. Common things you might want
> to generate secure random numbers for aren't going to be queries you
> want to run a long time. For example you might want to generate
> session ids to store in a cookie handed out for web requests.

I'm just pointing out that:

(1)     select read_from_dev_random();

may take a while if the read() blocks, but it's not that different
from:

(2)     select very_complex_aggregate() from huge_table;

which may take a while as well. I think the backend is already able to
handle a blocking read(): _any_ read() from disk may block, so a read()
from /dev/random is nothing special. For clients, there's no difference.
So I don't see why a blocking read from /dev/random should be a problem.
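
Just to make the idea concrete, here's a rough C sketch of the kind of
blocking read I have in mind. The helper name read_random_bytes() is
made up, and it's deliberately not tied to the backend's function
manager in any way:

    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>

    /*
     * Fill buf with len bytes taken from /dev/random.  The read() may
     * block until the kernel's entropy estimate is high enough; to the
     * caller it looks like any other slow read.
     */
    static int
    read_random_bytes(unsigned char *buf, size_t len)
    {
        size_t  got = 0;
        int     fd = open("/dev/random", O_RDONLY);

        if (fd < 0)
            return -1;

        while (got < len)
        {
            ssize_t n = read(fd, buf + got, len - got);

            if (n < 0)
            {
                if (errno == EINTR)
                    continue;           /* interrupted, retry */
                close(fd);
                return -1;
            }
            got += n;                   /* this is where we may have blocked */
        }

        close(fd);
        return 0;
    }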

>> the client is concerned, there's no difference. Of course you'll have
>> to use it with care, just as you would do with any potentially long
>> function (that is, don't lock everything and then sit down waiting).
>
> This might be reasonable if there was a significant benefit to doing so.
> My argument is that /dev/random does not produce significantly more
> secure random numbers under normal circumstances.
>
>>> SHA1 is pretty safe for this purpose. The recent weaknesses of related
>>> hashes isn't going to be of much help in predicting the output of
>>> /dev/urandom. If SHA1 were to be broken that badly /dev/random would
>>> also be broken.
>>
>> No, because /dev/random returns only after the internal state has been
>> perturbed "enough". An observer may (in theory) get to know all
>> bits of the internal pool right _after_ a read() from /dev/random,
>> but still isn't able to guess the value of all of them right before the
>> _next_ read() returns, exactly because the kernel waits until the return
>> is safe in this regard.
>
> OK, I'll buy that for cases where the estimates of entropy acquired are
> reasonable. This may still be a problem for some uses though. It may allow
> an attacker to figure out other data returned by /dev/random that they
> weren't able to observe.

If the kernel is overestimating the entropy count, it's a kernel bug,
and should be fixed.

>> Now, if the attacker gets the output, and breaks SHA1, he'll know the
>> internal state again. But if the output goes elsewhere, he won't be
>> able to guess it. That's why /dev/random is 'safe'. It won't return
>> until the output is random enough.
>
> You don't necessarily need to break SHA1 to be able to track the internal
> state.

Well, I'm not an expert. I base my knowledge on what other people say.
Quoting the opening comment in random.c from a recent Linux kernel tree:

  * When random bytes are desired, they are obtained by taking the SHA
  * hash of the contents of the "entropy pool".  The SHA hash avoids
  * exposing the internal state of the entropy pool.  It is believed to
  * be computationally infeasible to derive any useful information
  * about the input of SHA from its output.
[...]
  * If this estimate goes to zero, the routine can still generate
  * random numbers; however, an attacker may (at least in theory) be
  * able to infer the future output of the generator from prior
  * outputs.  This requires successful cryptanalysis of SHA, which is
  * not believed to be feasible, but there is a remote possibility.

The latter refers to the /dev/urandom interface, not /dev/random.

[...]
  * The two other interfaces are two character devices /dev/random and
  * /dev/urandom.  /dev/random is suitable for use when very high
  * quality randomness is desired (for example, for key generation or
  * one-time pads), as it will only return a maximum of the number of
  * bits of randomness (as estimated by the random number generator)
  * contained in the entropy pool.
  *
  * The /dev/urandom device does not have this limit, and will return
  * as many bytes as are requested.  As more and more random bytes are
  * requested without giving time for the entropy pool to recharge,
  * this will result in random numbers that are merely cryptographically
  * strong.  For many applications, however, this is acceptable.

Whether "random numbers that are merely cryptographically strong"
are acceptable for your application is up to you. Your application
blocking might be an even less acceptable event, but I don't buy that
kind of argument. Random bits are just another resource.
If the application needs disk space and the system is out of it,
the application "blocks" or worse; the same goes for virtual memory.
Should we stop using hard disks or RAM? I guess not. Just as you
make sure your apps have enough disk and RAM, make sure they have
enough random bits, so that /dev/random never blocks (that is, use
a larger pool and feed it from an external source).
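
If you want to know in advance whether a read from /dev/random is
likely to block, the kernel exports its bookkeeping under /proc.
A quick sketch (Linux-specific; note that the units of poolsize have
changed between kernel versions):

    #include <stdio.h>

    /*
     * Print the kernel's current entropy estimate and pool size.
     * A read from /dev/random is likely to block when entropy_avail
     * is low.
     */
    int
    main(void)
    {
        FILE *f;
        int avail = -1, poolsize = -1;

        if ((f = fopen("/proc/sys/kernel/random/entropy_avail", "r")) != NULL)
        {
            if (fscanf(f, "%d", &avail) != 1)
                avail = -1;
            fclose(f);
        }

        if ((f = fopen("/proc/sys/kernel/random/poolsize", "r")) != NULL)
        {
            if (fscanf(f, "%d", &poolsize) != 1)
                poolsize = -1;
            fclose(f);
        }

        printf("entropy_avail = %d bits, poolsize = %d\n", avail, poolsize);
        return 0;
    }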

>> Now, my esteem is 0 entropy bits for my pool, since you drained them all.
>>
>> "Secure Application X" needs to generate a session key (128bits),
>> so reads them from /dev/urandom. Let's assume the entropy count is still 0.
>> I (the kernel) provide it with 128bits that come from my PRNG + SHA1 engine.
>> Now, you can predict those 128 bits easily, and thus you know the session
>> key. The attack succeeds.
>>
>> "Really Secure Application Y" needs 128 bits, too. But this time it
>> reads from /dev/random (assume entropy count is still 0).
>> Now, I can't fulfill the request. So have Y wait on read().
>>
>> As time passes, I start collecting entropy bits, from IRQ timings,
>> or a hardware RNG. These bits change the internal pool in a way you
>> don't know. Eventually I get the count up to 128 bits.
>> Now, I run my PRNG (from the now-changed state) + SHA1 engine, and return
>> 128 bits to Y. Can you guess the session key? Of course. You know
>> how internal state was before, you can "go through all possible entropy
>> values". How many of them? 2^128. That's, no wonder, the same as trying
>> to guess the key directly. Now you're telling me if you had the key,
>> you could know the internal state again? Man, if you had the key, you
>> already broke application Y, and the attack already succeeded by other
>> means!
>
> Assuming that all 128 bits were grabbed in one call to /dev/random, you
> should be safe. However if the bits were returned a byte at a time
> with intervening bytes returned via /dev/urandom to the attacker, the
> key could be vulnerable.

I'm not sure I get the "intervening bytes" part. Guessing a 128-bit
string, or a 160-bit one of which 32 bits are known, shouldn't make
any difference: either way there are 2^128 possibilities left.

[...]
>> No. SHA1 protects the internal pool from an attacker who knows all the
>> output. That's easy, just read from /dev/random until it blocks. If you're
>> fast enough, you can assume no one else read from /dev/random. Now,
>> _if you can break SHA1_, you have enough cleartext output of the internal
>> PRNG to guess its state. You may have used /dev/urandom as well, there's
>> no difference.
>
> I think you misunderstood the context. The attacker is assumed to have
> initially gotten the state through some means. By watching the output
> from /dev/urandom it is possible to figure out what the current state is
> if not too much entropy has been added since the attacker knew the state.
>
>> The purpose of /dev/random blocking is not protecting the internal pool
>> state, that's SHA1 job. /dev/random blocks in order to protect other
>> users (application Y in my example) from you.
>
> SHA1 only protects you from getting the state solely be viewing the output.
> It doesn't protect against you getting through other means.

The only other means I can think of (speaking of Linux) are:

- the CAP_SYS_ADMIN RNDGETPOOL ioctl(), to read the pool contents;
- the CAP_SYS_ADMIN RNDADDENTROPY ioctl(), to 'taint' the pool;
- the CAP_SYS_ADMIN RNDADDTOENTCNT ioctl(), to trick the kernel into
   thinking the pool contains more randomness w/o adding any.

No wonder, these methods are CAP_SYS_ADMIN only. If you have the
CAP_SYS_ADMIN capability, you already own the system.

Ok, maybe you can control the physics of your hard disks, and have them
behave in a known (to you) way. But if you can do that, you are Q, and
already own the universe. :)
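
For the record, this is roughly how a privileged entropy-feeding
daemon would use the RNDADDENTROPY ioctl. The sample bytes below are
just a placeholder (a real feeder would read them from a hardware
RNG), and the claimed entropy count is whatever you trust the source
for:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/random.h>

    /*
     * Add 32 bytes to the kernel entropy pool and credit them as
     * entropy.  Needs CAP_SYS_ADMIN.
     */
    int
    main(void)
    {
        unsigned char sample[32];
        struct rand_pool_info *info;
        int fd;

        memset(sample, 0xA5, sizeof(sample));       /* placeholder, NOT random! */

        info = malloc(sizeof(*info) + sizeof(sample));
        if (info == NULL)
            return 1;
        info->entropy_count = 8 * sizeof(sample);   /* claimed entropy, in bits */
        info->buf_size = sizeof(sample);
        memcpy(info->buf, sample, sizeof(sample));

        fd = open("/dev/random", O_RDWR);
        if (fd < 0 || ioctl(fd, RNDADDENTROPY, info) < 0)
        {
            perror("RNDADDENTROPY");
            return 1;
        }

        close(fd);
        free(info);
        return 0;
    }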

> I think /dev/random output would be better protected if it didn't share
> a pool with /dev/urandom. Doing that is what makes tracking the internal
> state possible.
>
> Another issue with using /dev/random instead of /dev/urandom is that it
> adds another way to do denial of service attacks. That may or may not
> be an issue in a particular use since there may be more effective ways
> of doing denial of service attacks in other ways.

Exactly. Randomness is just a resource. Like many other resources,
it's limited, and that opens the door to possible DoS attacks. Nothing new.


I think the whole point is: does PostgreSQL have to care about this?
Is providing a source of real random data valuable? Or should the
client handle the issue?

Well, in a Web based application, some kind of server-side tier is
likely to be present already: no one is so crazy as to have the DB
directly generate HTML pages, right? (pun intended) :)
So that tier could take care of /dev/random and friends as well, w/o
relying on the DB for that. That's because it runs on the server.

But in a real client-server application, where all the logic is
either in the DB or in the (remote) client, pushing the job of
collecting really random data in a safe way onto the clients may be
infeasible.

The same argument holds for the weak random() we have now.
I can't think of any client-side language that doesn't provide
a random() function. Even bash can do some $RANDOM. A purist may
argue that "providing a random answer to a query" is not the job of
an RDBMS. But a random() function can be useful sometimes.

I know nothing of PostgreSQL internals and development issues, but
I'll play with the idea of making a contrib module.
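Something along these lines, I guess. The function name
random_bytes() is invented, and the varlena macros have changed
between PostgreSQL versions (SET_VARSIZE is the newer spelling of
what used to be done with VARATT_SIZEP), so take it as a sketch only:

    #include "postgres.h"
    #include "fmgr.h"

    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>

    #ifdef PG_MODULE_MAGIC
    PG_MODULE_MAGIC;            /* required on 8.2 and later */
    #endif

    PG_FUNCTION_INFO_V1(random_bytes);

    /* random_bytes(n): return n bytes read from /dev/random as a bytea */
    Datum
    random_bytes(PG_FUNCTION_ARGS)
    {
        int32   nbytes = PG_GETARG_INT32(0);
        bytea  *result;
        char   *p;
        int     fd;
        int32   got = 0;

        if (nbytes <= 0)
            elog(ERROR, "random_bytes: argument must be positive");

        result = (bytea *) palloc(VARHDRSZ + nbytes);
        SET_VARSIZE(result, VARHDRSZ + nbytes);  /* VARATT_SIZEP on old versions */
        p = VARDATA(result);

        fd = open("/dev/random", O_RDONLY);
        if (fd < 0)
            elog(ERROR, "random_bytes: cannot open /dev/random: %m");

        while (got < nbytes)
        {
            ssize_t n = read(fd, p + got, nbytes - got);   /* may block here */

            if (n < 0)
            {
                if (errno == EINTR)
                    continue;
                close(fd);
                elog(ERROR, "random_bytes: read from /dev/random failed: %m");
            }
            got += n;
        }

        close(fd);
        PG_RETURN_BYTEA_P(result);
    }

On the SQL side it would be declared with CREATE FUNCTION ...
LANGUAGE C STRICT, and called much like the read_from_dev_random()
example above.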

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it
