Thread: pgcrypto seeding problem when ssl=on

pgcrypto seeding problem when ssl=on

From
Marko Kreen
Date:
When there is 'ssl=on' then postmaster calls SSL_CTX_new(),
which asks for random number, thus requiring initialization
of randomness pool (RAND_poll).  After that all forked backends
think pool is already initialized.  Thus they proceed with same
fixed state they got from postmaster.

Output is not completely constant due to:
1) When outputting random bytes, OpenSSL includes getpid()
   output when hashing pool state.
2) SSL_accept() adds time() output into pool before asking
   for random bytes.  This and 1) should make sure SSL handshake
   is done with unique random numbers.  [It's uncertain tho'
   how unpredictable the PRNG is in such mode.]

Now, problem in pgcrypto side is that when compiled with OpenSSL,
it expects the randomness pool to be already initialized.  This
expectation is filled when ssl=off, or when actual SSL connection
is used.  But it's broken when non-SSL connection is used while
having ssl=on in config.  Then all backends are in same state
when they reach pgcrypto functions first time and output is only
affected by pid.

Affected:
* pgp_encrypt*() - it feeds hashes of user data back to pool,
  but this is "randomized", thus there is small chance first
  few messages have weaker keys.

* gen_random_bytes() - this does not feed back anything, thus
  when used alone in session, it's output will repeat quite easily
  as randomness sequence is affected only by pid.

Attached patch makes both gen_random_bytes() and pgp_encrypt()
seed pool with output from gettimeofday(), thus getting pool
off from fixed state.  Basically, this mirrors what SSL_accept()
already does.

I'd like to do bigger reorg of seeding code in pgcrypto, but
that might not be back-patchable.  So I propose this simple fix,
which should be applied also to older releases.

--
marko

Attachment

Re: pgcrypto seeding problem when ssl=on

From
Noah Misch
Date:
This should have gone to security@postgresql.org, instead.

On Fri, Dec 21, 2012 at 06:05:10PM +0200, Marko Kreen wrote:
> When there is 'ssl=on' then postmaster calls SSL_CTX_new(),
> which asks for random number, thus requiring initialization
> of randomness pool (RAND_poll).  After that all forked backends
> think pool is already initialized.  Thus they proceed with same
> fixed state they got from postmaster.

> Attached patch makes both gen_random_bytes() and pgp_encrypt()
> seed pool with output from gettimeofday(), thus getting pool
> off from fixed state.  Basically, this mirrors what SSL_accept()
> already does.

That adds only 10-20 bits of entropy.  Is that enough?

How about instead calling RAND_cleanup() after each backend fork?

Thanks,
nm



Re: pgcrypto seeding problem when ssl=on

From
Marko Kreen
Date:
On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote:
> This should have gone to security@postgresql.org, instead.

It went and this could have been patched in 9.2.2 but they did not care.

> On Fri, Dec 21, 2012 at 06:05:10PM +0200, Marko Kreen wrote:
>> When there is 'ssl=on' then postmaster calls SSL_CTX_new(),
>> which asks for random number, thus requiring initialization
>> of randomness pool (RAND_poll).  After that all forked backends
>> think pool is already initialized.  Thus they proceed with same
>> fixed state they got from postmaster.
>
>> Attached patch makes both gen_random_bytes() and pgp_encrypt()
>> seed pool with output from gettimeofday(), thus getting pool
>> off from fixed state.  Basically, this mirrors what SSL_accept()
>> already does.
>
> That adds only 10-20 bits of entropy.  Is that enough?

It's enough to get numbers that are not the same.

Whether it's possible to guess next number when you know
that there is only time() and getpid() added to fixed state
(eg. those cleartext bytes in SSL handshake) - i don't know.

In any case, this is how Postgres already operates SSL connections.

> How about instead calling RAND_cleanup() after each backend fork?

Not "instead" - the gettimeofday() makes sense in any case.  Considering
that it's immediate problem only for pgcrypto, this patch has higher chance
for appearing in back-branches.

If the _cleanup() makes next RAND_bytes() call RAND_poll() again?
Then yes, it certainly makes sense as core patch.

-- 
marko



Re: pgcrypto seeding problem when ssl=on

From
Noah Misch
Date:
On Sat, Dec 22, 2012 at 12:59:55AM +0200, Marko Kreen wrote:
> On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote:
> > This should have gone to security@postgresql.org, instead.
> 
> It went and this could have been patched in 9.2.2 but they did not care.

Ah.

> > On Fri, Dec 21, 2012 at 06:05:10PM +0200, Marko Kreen wrote:
> >> When there is 'ssl=on' then postmaster calls SSL_CTX_new(),
> >> which asks for random number, thus requiring initialization
> >> of randomness pool (RAND_poll).  After that all forked backends
> >> think pool is already initialized.  Thus they proceed with same
> >> fixed state they got from postmaster.
> >
> >> Attached patch makes both gen_random_bytes() and pgp_encrypt()
> >> seed pool with output from gettimeofday(), thus getting pool
> >> off from fixed state.  Basically, this mirrors what SSL_accept()
> >> already does.
> >
> > That adds only 10-20 bits of entropy.  Is that enough?
> 
> It's enough to get numbers that are not the same.

I think I see what you're getting at.  With an ssl=on postmaster that only
processes non-SSL connections, calls to gen_random_bytes() in a session can,
at least initially, only produce about 2^15[1] distinct random sequences per
postmaster restart.  Furthermore, an attacker with access to run queries can
generate the list of possible sequences, then proceed to know the exact bytes
each other backend with receive, for the life of the postmaster.  With your
patch, no two backends will ever mix the same PID+microsecond time into the
random state.

For anyone following along, the following Perl program demonstrates the
weakness.  After it has visited the full range of PIDs, every future iteration
will find a duplicate:

for my $offset (1 .. 100000) {my $db = DBI->connect('dbi:Pg:', undef, undef, {RaiseError => 1});my($bytes) =
$db->selectrow_array('SELECTgen_random_bytes(10)');print "Saw $bytes twice!\n" if $seen{$bytes};$seen{$bytes}++;print
"Done$offset\n" unless $offset % 100;
 
}

> Whether it's possible to guess next number when you know
> that there is only time() and getpid() added to fixed state
> (eg. those cleartext bytes in SSL handshake) - i don't know.

I don't know, either.

> In any case, this is how Postgres already operates SSL connections.

Fair point.  Interesting reading (search for "fork"):

http://www.acsac.org/2003/papers/79.pdf
http://www.apache-ssl.org/randomness.pdf
http://openssl.6102.n7.nabble.com/recycled-pids-causes-PRNG-to-repeat-td41669.html

The first paper suggests exactly what you've implemented.  Between that and
OpenSSL doing it, I don't feel further need to doubt its adequacy.

> > How about instead calling RAND_cleanup() after each backend fork?
> 
> Not "instead" - the gettimeofday() makes sense in any case.  Considering
> that it's immediate problem only for pgcrypto, this patch has higher chance
> for appearing in back-branches.

I worry that it's too cavalier to leave the OpenSSL PRNG in an insecure state,
expecting every consumer to fix the problem or, failing to do so, silently
operate in an insecure manner.  pgcrypto may be the only security-critical
user in our tree, but others in the field would not surprise me.  Putting the
change after fork_process() in BackendStartup() solves the problem for good.

> If the _cleanup() makes next RAND_bytes() call RAND_poll() again?
> Then yes, it certainly makes sense as core patch.

It does.  The disadvantage is that we'll then draw bytes from /dev/urandom for
every backend fork.  The Linux /dev/urandom is mostly designed for such abuse,
but it would carry some unnecessary risk for back branches.  So, I think your
mixing of time into the seed is indeed best.

Thanks,
nm

[1] That assumes the common 1 ... 32768 PID range.



Re: pgcrypto seeding problem when ssl=on

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Sat, Dec 22, 2012 at 12:59:55AM +0200, Marko Kreen wrote:
>> On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote:
>>> This should have gone to security@postgresql.org, instead.
>> 
>> It went and this could have been patched in 9.2.2 but they did not care.

> Ah.

A slightly less revisionist view of the history is that the patch that
Marko originally proposed to -security was considerably more complicated
and less portable than this one, so we put it on hold until we thought of
something better.  This new patch looks pretty reasonable from here,
modulo the question of whether it adds "enough" entropy.

I'm not entirely sold on whether it does.  ISTM that any attack based on
this line of thinking already assumes that the attacker has complete
knowledge of how many backends have been launched (else he doesn't know
which sequence a targeted session will get).  If he knows that much,
mightn't he also know *when* they were launched?  Alternatively: if he
can know the session's start time (which we helpfully make available...)
how much harder is this really making it for him to deduce the session's
seed?

On top of which: what exactly will he do with the seed once he's got it
that would amount to a security problem?

Or to put it in different terms, I'm not quite convinced that there's a
plausible threat model that this patch blocks effectively.
        regards, tom lane



Re: pgcrypto seeding problem when ssl=on

From
Noah Misch
Date:
On Fri, Dec 21, 2012 at 10:05:30PM -0500, Tom Lane wrote:
> This new patch looks pretty reasonable from here,
> modulo the question of whether it adds "enough" entropy.

More brains reviewing that question will be good.

> I'm not entirely sold on whether it does.  ISTM that any attack based on
> this line of thinking already assumes that the attacker has complete
> knowledge of how many backends have been launched (else he doesn't know
> which sequence a targeted session will get).

pg_stat_activity.pid will do.

> If he knows that much,
> mightn't he also know *when* they were launched?  Alternatively: if he
> can know the session's start time (which we helpfully make available...)
> how much harder is this really making it for him to deduce the session's
> seed?

If he has access to his own fork-time seed, he can indeed estimate the seeds
of other sessions.  That seed is currently invisible to non-native code, and a
user able to deploy a C function already has the access needed to compromise
all sessions.  The assumption that the fork-time seed stays buried is a source
of unease, though.

> On top of which: what exactly will he do with the seed once he's got it
> that would amount to a security problem?
> 
> Or to put it in different terms, I'm not quite convinced that there's a
> plausible threat model that this patch blocks effectively.

The examples could be as numerous as the algorithms that specify use of a
cryptographically secure PRNG.  Here's a simple one: an application generates
long-term private encryption keys by connecting, issuing "SELECT
gen_random_bytes(16)", and disconnecting.  Long-term, across postmaster
restarts, there are still almost 2^128 possible keys.  However, backends of
any given postmaster can only generate 2^15 possible keys.  An attacker can
attempt to acquire, over time, a backend with every PID in the system.  The
script I gave earlier is an example of doing so.  He won't manage to visit
literally every PID, sure, but he'll easily get 95% of them.  By issuing
"SELECT pg_backend_pid(), gen_random_bytes(16)" in each session, he assembles
a dictionary of most possible keys under this postmaster.  If he repeats this
after each postmaster restart, he might acquire a dictionary covering months
or years of key generation.  Given a ciphertext based on a key presumed to be
made during that time, he can try his relatively-small dictionary with a high
chance of success.

nm



Re: pgcrypto seeding problem when ssl=on

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote:
>> How about instead calling RAND_cleanup() after each backend fork?

> Not "instead" - the gettimeofday() makes sense in any case.  Considering
> that it's immediate problem only for pgcrypto, this patch has higher chance
> for appearing in back-branches.

> If the _cleanup() makes next RAND_bytes() call RAND_poll() again?
> Then yes, it certainly makes sense as core patch.

I thought some more about this, and I think there is a pretty strong
objection to this version of the patch: it *only* fixes the
lack-of-randomness problem for pgcrypto users.  Any other third-party
code depending on the OpenSSL PRNG will have the same problem.
Furthermore, it's not even obvious that this patch fixes it for all
pgcrypto functions --- it probably is all right today, since I assume
Marko remembers all the connections in that code, but it would be easy
to miss the problem in future additions.

I believe that we'd be better off doing something in postmaster.c to
positively ensure that each session has a distinct seed value.  Notice
that BackendRun() already takes measures to ensure that's the case for
the regular libc random() function; it seems like a reasonable extension
to also worry about OpenSSL's PRNG.

I'm not thrilled with the suggestion to do RAND_cleanup() after forking
though, as that seems like it'll guarantee that each session starts out
with only a minimal amount of entropy.  What seems to me like it'd be
most secure is to make the postmaster do RAND_add() with a gettimeofday
result after each successful fork, say at the bottom of
BackendStartup().  That way the randomness accumulates in the parent
process, and there's no way to predict the initial state of any session
without exact knowledge of every child process launch since postmaster
start.  So it'd go something like

#ifdef USE_SSLif (EnableSSL){    struct timeval tv;
    gettimeofday(&tv, NULL);    RAND_add(&tv, sizeof(tv), 0);}
#endif

We could perhaps also make this conditional on not EXEC_BACKEND, since
the whole issue is moot if backends are launched by fork/exec.
        regards, tom lane



Re: pgcrypto seeding problem when ssl=on

From
Noah Misch
Date:
On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote:
> I believe that we'd be better off doing something in postmaster.c to
> positively ensure that each session has a distinct seed value.  Notice
> that BackendRun() already takes measures to ensure that's the case for
> the regular libc random() function; it seems like a reasonable extension
> to also worry about OpenSSL's PRNG.

> #ifdef USE_SSL
>     if (EnableSSL)
>     {
>         struct timeval tv;
> 
>         gettimeofday(&tv, NULL);
>         RAND_add(&tv, sizeof(tv), 0);
>     }
> #endif

Take the caution one step further and make it independent of EnableSSL.  In a
stock installation, a !EnableSSL postmaster will never seed its PRNG, and
there's no vulnerability.  Add a shared_preload_libraries module that uses the
OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again.

Other than that, looks good.

> We could perhaps also make this conditional on not EXEC_BACKEND, since
> the whole issue is moot if backends are launched by fork/exec.

True.



Re: pgcrypto seeding problem when ssl=on

From
Andres Freund
Date:
On 2012-12-22 14:20:56 -0500, Tom Lane wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote:
> >> How about instead calling RAND_cleanup() after each backend fork?
>
> > Not "instead" - the gettimeofday() makes sense in any case.  Considering
> > that it's immediate problem only for pgcrypto, this patch has higher chance
> > for appearing in back-branches.
>
> > If the _cleanup() makes next RAND_bytes() call RAND_poll() again?
> > Then yes, it certainly makes sense as core patch.
>
> I thought some more about this, and I think there is a pretty strong
> objection to this version of the patch: it *only* fixes the
> lack-of-randomness problem for pgcrypto users.  Any other third-party
> code depending on the OpenSSL PRNG will have the same problem.
> Furthermore, it's not even obvious that this patch fixes it for all
> pgcrypto functions --- it probably is all right today, since I assume
> Marko remembers all the connections in that code, but it would be easy
> to miss the problem in future additions.
>
> I believe that we'd be better off doing something in postmaster.c to
> positively ensure that each session has a distinct seed value.  Notice
> that BackendRun() already takes measures to ensure that's the case for
> the regular libc random() function; it seems like a reasonable extension
> to also worry about OpenSSL's PRNG.
>
> I'm not thrilled with the suggestion to do RAND_cleanup() after forking
> though, as that seems like it'll guarantee that each session starts out
> with only a minimal amount of entropy.  What seems to me like it'd be
> most secure is to make the postmaster do RAND_add() with a gettimeofday
> result after each successful fork, say at the bottom of
> BackendStartup().  That way the randomness accumulates in the parent
> process, and there's no way to predict the initial state of any session
> without exact knowledge of every child process launch since postmaster
> start.

I am entirely unconvinced that adding in a gettimeofday() provides more
entropy than what openssl gathers internally after a
RAND_cleanup(). gettimeofday() will yield the same result in close
enough forks on a significant number of systems whereas openssl should
use stuff like /dev/urandom to initialize its PRNG after a cleanup.

I think a better solution might be to use up some entropy in the
postmaster *before* doing a fork and then mix in the pid +
gettimeofday()after the fork.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pgcrypto seeding problem when ssl=on

From
Marko Kreen
Date:
On Sat, Dec 22, 2012 at 9:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not thrilled with the suggestion to do RAND_cleanup() after forking
> though, as that seems like it'll guarantee that each session starts out
> with only a minimal amount of entropy.

No, that's actually the right fix - that forces OpenSSL to do new reseed
with system randomness, thus making backend (including SSL_accept)
maximally disconnected from static pool in postmaster.

This also makes behaviour equal to current ssl=off and exec-backend
mode, which already do initial seeding in backend.

The fact that PRNG behaviour is affected by complex set of compile-
and run-time switches makes current situation rather fragile and
hard to understand.

>  What seems to me like it'd be
> most secure is to make the postmaster do RAND_add() with a gettimeofday
> result after each successful fork, say at the bottom of
> BackendStartup().  That way the randomness accumulates in the parent
> process, and there's no way to predict the initial state of any session
> without exact knowledge of every child process launch since postmaster
> start.  So it'd go something like
>
> #ifdef USE_SSL
>         if (EnableSSL)
>         {
>                 struct timeval tv;
>
>                 gettimeofday(&tv, NULL);
>                 RAND_add(&tv, sizeof(tv), 0);
>         }
> #endif

If you decide against RAND_cleanup in backend and instead
do workarounds in backend or postmaster, then please
also to periodic RAND_cleanup in postmaster.  This should
make harder to exploit weaknesses in reused slowly moving state.

-- 
marko



Re: pgcrypto seeding problem when ssl=on

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> On Sat, Dec 22, 2012 at 9:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not thrilled with the suggestion to do RAND_cleanup() after forking
>> though, as that seems like it'll guarantee that each session starts out
>> with only a minimal amount of entropy.

> No, that's actually the right fix - that forces OpenSSL to do new reseed
> with system randomness, thus making backend (including SSL_accept)
> maximally disconnected from static pool in postmaster.

I don't think that "maximal disconnectedness" is the deciding factor
here, or even a very important factor.  If we do RAND_cleanup() then
each new session will be forced to suck entropy from /dev/urandom
(immediately, if an SSL connection is attempted).  This opens the door
to quasi-denial-of-service attacks that bleed all the entropy out of the
system, forcing long delays for new PRNG-using processes, Postgres or
otherwise.  And long delays are the *best case* scenario --- worst case,
if the system lacks decent /dev/random support, is that new processes
are starting up with highly correlated PRNG states.

If such an approach were a good thing, the OpenSSL guys would have
implemented it internally --- it'd be easy enough to do, just by forcing
RAND_cleanup anytime getpid() reads differently than it did when the
pool was set up.

gettimeofday() might not be the very best way of changing the inherited
PRNG state from child to child, but I think it's a more attractive
solution than RAND_cleanup.

> This also makes behaviour equal to current ssl=off and exec-backend
> mode, which already do initial seeding in backend.

No, it's not "equal", because when ssl=off seeding attempts won't happen
immediately after fork and thus an attacker doesn't have such an easy
way of draining entropy.  If we do what you're suggesting, the attacker
doesn't even need a valid database login to drain entropy --- he can
just fire-and-forget NEGOTIATE_SSL startup packets.  (The exec-backend
case will have such issues anyway, but who thought Windows was a secure
OS?)

> If you decide against RAND_cleanup in backend and instead
> do workarounds in backend or postmaster, then please
> also to periodic RAND_cleanup in postmaster.  This should
> make harder to exploit weaknesses in reused slowly moving state.

We could consider that, but it's not apparent to me that it has any
real value.
        regards, tom lane



Re: pgcrypto seeding problem when ssl=on

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote:
>> I believe that we'd be better off doing something in postmaster.c to
>> positively ensure that each session has a distinct seed value.  Notice
>> that BackendRun() already takes measures to ensure that's the case for
>> the regular libc random() function; it seems like a reasonable extension
>> to also worry about OpenSSL's PRNG.

>> #ifdef USE_SSL
>> if (EnableSSL)
>> {
>>    struct timeval tv;
>> 
>>    gettimeofday(&tv, NULL);
>>    RAND_add(&tv, sizeof(tv), 0);
>> }
>> #endif

> Take the caution one step further and make it independent of EnableSSL.  In a
> stock installation, a !EnableSSL postmaster will never seed its PRNG, and
> there's no vulnerability.  Add a shared_preload_libraries module that uses the
> OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again.

Meh.  In a postmaster that wasn't built with SSL support at all, such
a module is still dangerous (and I'm not convinced anybody would build
such a module anyway).  I think we should confine our ambitions to
preventing security issues caused by our own code.
        regards, tom lane



Re: pgcrypto seeding problem when ssl=on

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-12-22 14:20:56 -0500, Tom Lane wrote:
>> I believe that we'd be better off doing something in postmaster.c to
>> positively ensure that each session has a distinct seed value.

> I am entirely unconvinced that adding in a gettimeofday() provides more
> entropy than what openssl gathers internally after a
> RAND_cleanup().

No, it doesn't provide more entropy, but what it does do is avoid
draining entropy from the kernel (see my reply to Marko just now).

> gettimeofday() will yield the same result in close
> enough forks on a significant number of systems whereas openssl should
> use stuff like /dev/urandom to initialize its PRNG after a cleanup.

Well, in the first place, that doesn't matter much because the PRNG
state will still change from repeated mix-ins of gettimeofday, even if
we obtain the same reading successive times.  In the second place, if we
add it where I suggest, it would be easy to also include the child
process PID in the added entropy -- we could xor it into the tv_sec
value for instance -- thus further reducing the chance of identical
values getting added in.
        regards, tom lane



Re: pgcrypto seeding problem when ssl=on

From
Noah Misch
Date:
On Sun, Dec 23, 2012 at 02:49:08PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote:
> >> #ifdef USE_SSL
> >> if (EnableSSL)
> >> {
> >>    struct timeval tv;
> >> 
> >>    gettimeofday(&tv, NULL);
> >>    RAND_add(&tv, sizeof(tv), 0);
> >> }
> >> #endif
> 
> > Take the caution one step further and make it independent of EnableSSL.  In a
> > stock installation, a !EnableSSL postmaster will never seed its PRNG, and
> > there's no vulnerability.  Add a shared_preload_libraries module that uses the
> > OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again.
> 
> Meh.  In a postmaster that wasn't built with SSL support at all, such
> a module is still dangerous (and I'm not convinced anybody would build
> such a module anyway).  I think we should confine our ambitions to
> preventing security issues caused by our own code.

You're adding lines of code to prematurely micro-optimize the backend fork
cycle.  If code introduced into the postmaster, by us or others, ever violates
the assumption behind that micro-optimization, certain users get a precipitous
loss of security with no clear alarm bells.  I don't like that trade.

nm



Re: pgcrypto seeding problem when ssl=on

From
Marko Kreen
Date:
On Sun, Dec 23, 2012 at 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> On Sat, Dec 22, 2012 at 9:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm not thrilled with the suggestion to do RAND_cleanup() after forking
> >> though, as that seems like it'll guarantee that each session starts out
> >> with only a minimal amount of entropy.
>
> > No, that's actually the right fix - that forces OpenSSL to do new reseed
> > with system randomness, thus making backend (including SSL_accept)
> > maximally disconnected from static pool in postmaster.
>
> I don't think that "maximal disconnectedness" is the deciding factor
> here, or even a very important factor.  If we do RAND_cleanup() then
> each new session will be forced to suck entropy from /dev/urandom
> (immediately, if an SSL connection is attempted).  This opens the door
> to quasi-denial-of-service attacks that bleed all the entropy out of the
> system, forcing long delays for new PRNG-using processes, Postgres or
> otherwise.  And long delays are the *best case* scenario --- worst case,
> if the system lacks decent /dev/random support, is that new processes
> are starting up with highly correlated PRNG states.

You cannot realistically drain /dev/urandom - it's just a CSPRNG, periodically
seeded from /dev/random.  And it cannot cause long-delays.  Sessions keys
are actually it's main use-case.

And anyway - if best way to attack Postgres SSL session is to drain and
cryptoanalyze /dev/urandom via SSL attempts, then we are in pretty good
shape.  When depening only on properly-used OpenSSL PRNG, we are still
good, but bit worse situation - it's gets minimal amount of entropy
after initial seeding, thus successful cryptoanalyze is more probable
than on /dev/urandom.  And when depending on incorrectly used OpenSSL PRNG
(current situation) then the situation is bad - we seem to be depending
on security of single hash.

> If such an approach were a good thing, the OpenSSL guys would have
> implemented it internally --- it'd be easy enough to do, just by forcing
> RAND_cleanup anytime getpid() reads differently than it did when the
> pool was set up.

I thought about it and I realized they cannot do it - the libcrypto has
no idea of application lifetime, so doing such cleanup without application
knowledge is wrong.

Eg. imagine a daemon that loads config, sets up SSL, goes background into
empty chroot.

So the current idea of hashing getpid() on each call is best they can
do to help badly written applications - like Postgres.

> gettimeofday() might not be the very best way of changing the inherited
> PRNG state from child to child, but I think it's a more attractive
> solution than RAND_cleanup.

1. It optimizes CPU for unrealistic attack scenario.

2. It spends more CPU in single-threaded postmaster when SSL is not used -  as postmaster does not know whether
incomingconnection is SSL or not  it needs to do the PRNG feeding anyway.  This might be noticeable  in realistic
short-connectionsscenario, where SSL is used only  for admin or replication connections.
 

> > This also makes behaviour equal to current ssl=off and exec-backend
> > mode, which already do initial seeding in backend.
>
> No, it's not "equal", because when ssl=off seeding attempts won't happen
> immediately after fork and thus an attacker doesn't have such an easy
> way of draining entropy.  If we do what you're suggesting, the attacker
> doesn't even need a valid database login to drain entropy --- he can
> just fire-and-forget NEGOTIATE_SSL startup packets.

You can't just fire-and-forget them - you need to do TCP handshake
plus Postgres SSL handshake; this means each request takes noticeable
amount of setup time from attacker side and on server side the sucker's
IP is visible in logs.

And /dev/urandom seeding is prettly inexpensive compared to SSL pubkey
crypto.  It does not look like a scenario we need to design against.

> (The exec-backend
> case will have such issues anyway, but who thought Windows was a secure
> OS?)

ATM the Windows is pretty good shape compared to our Unix situation...

> > If you decide against RAND_cleanup in backend and instead
> > do workarounds in backend or postmaster, then please
> > also to periodic RAND_cleanup in postmaster.  This should
> > make harder to exploit weaknesses in reused slowly moving state.
>
> We could consider that, but it's not apparent to me that it has any
> real value.

Properly-designed and properly-used CSPRNG feeding and extraction
should tolerate minor breaks in low-level hashes/ciphers.  If we are
not using it as intended, then we lose that guarantee and need to
limit the damage on our side.

-- 
marko



Re: pgcrypto seeding problem when ssl=on

From
Marko Kreen
Date:
On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote:
> How about instead calling RAND_cleanup() after each backend fork?

Attached is a patch that adds RAND_cleanup() to fork_process().
That way all forked processes start with fresh state.  This should
make sure the problem does not happen in any processes
forked by postmaster.

Please backpatch.

...

Alternative is to put RAND_cleanup() to BackendInitialize() so only
new backends start with fresh state.

Another alternative is to put RAND_cleanup() after SSL_accept(),
that way core code sees no change, but other OpenSSL users
in backend operate securely.

--
marko

Attachment

Re: pgcrypto seeding problem when ssl=on

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote:
>> How about instead calling RAND_cleanup() after each backend fork?

> Attached is a patch that adds RAND_cleanup() to fork_process().

I remain unconvinced that this is the best solution.  Anybody else have
an opinion?
        regards, tom lane



Re: pgcrypto seeding problem when ssl=on

From
Noah Misch
Date:
On Sun, Jan 13, 2013 at 05:46:12PM -0500, Tom Lane wrote:
> Marko Kreen <markokr@gmail.com> writes:
> > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote:
> >> How about instead calling RAND_cleanup() after each backend fork?
> 
> > Attached is a patch that adds RAND_cleanup() to fork_process().
> 
> I remain unconvinced that this is the best solution.  Anybody else have
> an opinion?

I'd describe it as the clearly-secure solution.  The biggest hazard I see is
the drain on system entropy.  A system having only a blocking /dev/random
could suddenly find itself entropy-exhausted after installing the minor
upgrade.  Backends could block waiting for system entropy to accumulate.
That's not directly a problem on Linux.  However, if other programs on the
system read from /dev/random, the pressure from PostgreSQL's /dev/urandom
usage may make those programs wait longer for entropy.

I'm also comfortable with the security of Marko's original proposal, modulo it
happening in each backend shortly after fork, not merely in pgcrypto.
OpenSSL's ssl module uses a similar method, and one of the papers I cited
describes it.  (If anything, OpenSSL is less cautious.  It uses time(), not
gettimeofday().)  The performance characteristics of this approach are easier
to guess: one system call if we use MyProcPid + gettimeofday(), zero if we use
MyProcPid + MyStartTime.

You proposed mixing gettimeofday() into the postmaster's entropy pool after
each fork.  I wouldn't be too concerned if we did it that way, but my quick
literature review did not turn up any similar ideas.  Given that this is
strictly more expensive than the previous method, I don't recommend it.

Overall, I'd recommend mixing in MyProcPid and MyStartTime after each fork.

nm



Re: pgcrypto seeding problem when ssl=on

From
Marko Kreen
Date:
On Mon, Jan 14, 2013 at 12:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote:
>>> How about instead calling RAND_cleanup() after each backend fork?
>
>> Attached is a patch that adds RAND_cleanup() to fork_process().
>
> I remain unconvinced that this is the best solution.  Anybody else have
> an opinion?

Do you have knowledge about systems that have /dev/random (blocking)
but not /dev/urandom (non-blocking)?  The only argument I see against
RAND_cleanup() is that postgres might eat entropy from /dev/random (blocking)
and cause both other programs and itself block, waiting for more entropy.

But this can only happen on systems that don't have /dev/urandom.

Note: reading from /dev/urandom does not affect /dev/random.

-- 
marko



Re: pgcrypto seeding problem when ssl=on

From
Noah Misch
Date:
On Mon, Jan 14, 2013 at 02:21:00PM +0200, Marko Kreen wrote:
> Note: reading from /dev/urandom does not affect /dev/random.

Reading from /dev/urandom drains the pool that serves /dev/random:

$ cat /proc/sys/kernel/random/entropy_avail 
3596
$ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null
1+0 records in
1+0 records out
100 bytes (100 B) copied, 0.000174798 s, 572 kB/s
$ cat /proc/sys/kernel/random/entropy_avail 
2839
$ head -c10000000 /dev/urandom >/dev/null
$ cat /proc/sys/kernel/random/entropy_avail 
212
$ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null
0+1 records in
0+1 records out
38 bytes (38 B) copied, 0.000101439 s, 375 kB/s



Re: pgcrypto seeding problem when ssl=on

From
Marko Kreen
Date:
On Mon, Jan 14, 2013 at 3:00 PM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Jan 14, 2013 at 02:21:00PM +0200, Marko Kreen wrote:
>> Note: reading from /dev/urandom does not affect /dev/random.
>
> Reading from /dev/urandom drains the pool that serves /dev/random:
>
> $ cat /proc/sys/kernel/random/entropy_avail
> 3596
> $ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null
> 1+0 records in
> 1+0 records out
> 100 bytes (100 B) copied, 0.000174798 s, 572 kB/s
> $ cat /proc/sys/kernel/random/entropy_avail
> 2839
> $ head -c10000000 /dev/urandom >/dev/null
> $ cat /proc/sys/kernel/random/entropy_avail
> 212
> $ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null
> 0+1 records in
> 0+1 records out
> 38 bytes (38 B) copied, 0.000101439 s, 375 kB/s

This slightly weakens my argument, but not completely - any
/dev/urandom-using processes are still unaffected, but
indeed processes using /dev/random can block.  But what are those?

So it's still problem only on systems that do not have /dev/urandom.

Btw, observe fun behaviour:

$ cat /proc/sys/kernel/random/entropy_avail
3451
$ cat /proc/sys/kernel/random/entropy_avail
3218
$ cat /proc/sys/kernel/random/entropy_avail
3000
$ cat /proc/sys/kernel/random/entropy_avail
2771
$ cat /proc/sys/kernel/random/entropy_avail
2551
$ cat /proc/sys/kernel/random/entropy_avail
2321
$ cat /proc/sys/kernel/random/entropy_avail
2101
$ cat /proc/sys/kernel/random/entropy_avail
1759
$ cat /proc/sys/kernel/random/entropy_avail
1527
$ cat /proc/sys/kernel/random/entropy_avail
1319
$ cat /proc/sys/kernel/random/entropy_avail
1080
$ cat /proc/sys/kernel/random/entropy_avail
844
$ cat /proc/sys/kernel/random/entropy_avail
605
$ cat /proc/sys/kernel/random/entropy_avail
366
$ cat /proc/sys/kernel/random/entropy_avail
128
$ cat /proc/sys/kernel/random/entropy_avail
142
$ cat /proc/sys/kernel/random/entropy_avail

Each exec() eats from the pool on modern system.

-- 
marko



Re: pgcrypto seeding problem when ssl=on

From
Noah Misch
Date:
On Mon, Jan 14, 2013 at 03:42:42PM +0200, Marko Kreen wrote:
> On Mon, Jan 14, 2013 at 3:00 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Jan 14, 2013 at 02:21:00PM +0200, Marko Kreen wrote:
> >> Note: reading from /dev/urandom does not affect /dev/random.
> >
> > Reading from /dev/urandom drains the pool that serves /dev/random:
> >
> > $ cat /proc/sys/kernel/random/entropy_avail
> > 3596
> > $ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null
> > 1+0 records in
> > 1+0 records out
> > 100 bytes (100 B) copied, 0.000174798 s, 572 kB/s
> > $ cat /proc/sys/kernel/random/entropy_avail
> > 2839
> > $ head -c10000000 /dev/urandom >/dev/null
> > $ cat /proc/sys/kernel/random/entropy_avail
> > 212
> > $ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null
> > 0+1 records in
> > 0+1 records out
> > 38 bytes (38 B) copied, 0.000101439 s, 375 kB/s
> 
> This slightly weakens my argument, but not completely - any
> /dev/urandom-using processes are still unaffected, but
> indeed processes using /dev/random can block.  But what are those?

I don't know, really.  I agree it's probably a rare thing, to the point of
trivial concern for a major release.  For a backpatch, avoiding such
system-wide effects seems to me like a fair tiebreaker.

Needless to say, all three proposals I listed would be major improvements over
the present, vulnerable state.

> $ cat /proc/sys/kernel/random/entropy_avail
> 3451
> $ cat /proc/sys/kernel/random/entropy_avail
> 3218
> $ cat /proc/sys/kernel/random/entropy_avail
> 3000

> Each exec() eats from the pool on modern system.

Amusing.

Thanks,
nm



Re: pgcrypto seeding problem when ssl=on

From
Noah Misch
Date:
On Sun, Jan 13, 2013 at 07:01:07PM -0500, Noah Misch wrote:
> On Sun, Jan 13, 2013 at 05:46:12PM -0500, Tom Lane wrote:
> > Marko Kreen <markokr@gmail.com> writes:
> > > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote:
> > >> How about instead calling RAND_cleanup() after each backend fork?
> > 
> > > Attached is a patch that adds RAND_cleanup() to fork_process().
> > 
> > I remain unconvinced that this is the best solution.  Anybody else have
> > an opinion?
> 
> I'd describe it as the clearly-secure solution.  The biggest hazard I see is
> the drain on system entropy.  A system having only a blocking /dev/random
> could suddenly find itself entropy-exhausted after installing the minor
> upgrade.  Backends could block waiting for system entropy to accumulate.
> That's not directly a problem on Linux.  However, if other programs on the
> system read from /dev/random, the pressure from PostgreSQL's /dev/urandom
> usage may make those programs wait longer for entropy.
> 
> I'm also comfortable with the security of Marko's original proposal, modulo it
> happening in each backend shortly after fork, not merely in pgcrypto.
> OpenSSL's ssl module uses a similar method, and one of the papers I cited
> describes it.  (If anything, OpenSSL is less cautious.  It uses time(), not
> gettimeofday().)  The performance characteristics of this approach are easier
> to guess: one system call if we use MyProcPid + gettimeofday(), zero if we use
> MyProcPid + MyStartTime.
> 
> You proposed mixing gettimeofday() into the postmaster's entropy pool after
> each fork.  I wouldn't be too concerned if we did it that way, but my quick
> literature review did not turn up any similar ideas.  Given that this is
> strictly more expensive than the previous method, I don't recommend it.
> 
> Overall, I'd recommend mixing in MyProcPid and MyStartTime after each fork.

I neglected to ping this for the last back-branch releases.  May we adopt one
of the above fixes and issue a CVE with the next releases?  Though I've stated
my preference, all three proposals would be major improvements over the
present, quietly-vulnerable state.

Thanks,
nm