Thread: pgcrypto seeding problem when ssl=on
When there is 'ssl=on' then postmaster calls SSL_CTX_new(), which asks for random number, thus requiring initialization of randomness pool (RAND_poll). After that all forked backends think pool is already initialized. Thus they proceed with same fixed state they got from postmaster. Output is not completely constant due to: 1) When outputting random bytes, OpenSSL includes getpid() output when hashing pool state. 2) SSL_accept() adds time() output into pool before asking for random bytes. This and 1) should make sure SSL handshake is done with unique random numbers. [It's uncertain tho' how unpredictable the PRNG is in such mode.] Now, problem in pgcrypto side is that when compiled with OpenSSL, it expects the randomness pool to be already initialized. This expectation is filled when ssl=off, or when actual SSL connection is used. But it's broken when non-SSL connection is used while having ssl=on in config. Then all backends are in same state when they reach pgcrypto functions first time and output is only affected by pid. Affected: * pgp_encrypt*() - it feeds hashes of user data back to pool, but this is "randomized", thus there is small chance first few messages have weaker keys. * gen_random_bytes() - this does not feed back anything, thus when used alone in session, it's output will repeat quite easily as randomness sequence is affected only by pid. Attached patch makes both gen_random_bytes() and pgp_encrypt() seed pool with output from gettimeofday(), thus getting pool off from fixed state. Basically, this mirrors what SSL_accept() already does. I'd like to do bigger reorg of seeding code in pgcrypto, but that might not be back-patchable. So I propose this simple fix, which should be applied also to older releases. -- marko
Attachment
This should have gone to security@postgresql.org, instead. On Fri, Dec 21, 2012 at 06:05:10PM +0200, Marko Kreen wrote: > When there is 'ssl=on' then postmaster calls SSL_CTX_new(), > which asks for random number, thus requiring initialization > of randomness pool (RAND_poll). After that all forked backends > think pool is already initialized. Thus they proceed with same > fixed state they got from postmaster. > Attached patch makes both gen_random_bytes() and pgp_encrypt() > seed pool with output from gettimeofday(), thus getting pool > off from fixed state. Basically, this mirrors what SSL_accept() > already does. That adds only 10-20 bits of entropy. Is that enough? How about instead calling RAND_cleanup() after each backend fork? Thanks, nm
On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote: > This should have gone to security@postgresql.org, instead. It went and this could have been patched in 9.2.2 but they did not care. > On Fri, Dec 21, 2012 at 06:05:10PM +0200, Marko Kreen wrote: >> When there is 'ssl=on' then postmaster calls SSL_CTX_new(), >> which asks for random number, thus requiring initialization >> of randomness pool (RAND_poll). After that all forked backends >> think pool is already initialized. Thus they proceed with same >> fixed state they got from postmaster. > >> Attached patch makes both gen_random_bytes() and pgp_encrypt() >> seed pool with output from gettimeofday(), thus getting pool >> off from fixed state. Basically, this mirrors what SSL_accept() >> already does. > > That adds only 10-20 bits of entropy. Is that enough? It's enough to get numbers that are not the same. Whether it's possible to guess next number when you know that there is only time() and getpid() added to fixed state (eg. those cleartext bytes in SSL handshake) - i don't know. In any case, this is how Postgres already operates SSL connections. > How about instead calling RAND_cleanup() after each backend fork? Not "instead" - the gettimeofday() makes sense in any case. Considering that it's immediate problem only for pgcrypto, this patch has higher chance for appearing in back-branches. If the _cleanup() makes next RAND_bytes() call RAND_poll() again? Then yes, it certainly makes sense as core patch. -- marko
On Sat, Dec 22, 2012 at 12:59:55AM +0200, Marko Kreen wrote: > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote: > > This should have gone to security@postgresql.org, instead. > > It went and this could have been patched in 9.2.2 but they did not care. Ah. > > On Fri, Dec 21, 2012 at 06:05:10PM +0200, Marko Kreen wrote: > >> When there is 'ssl=on' then postmaster calls SSL_CTX_new(), > >> which asks for random number, thus requiring initialization > >> of randomness pool (RAND_poll). After that all forked backends > >> think pool is already initialized. Thus they proceed with same > >> fixed state they got from postmaster. > > > >> Attached patch makes both gen_random_bytes() and pgp_encrypt() > >> seed pool with output from gettimeofday(), thus getting pool > >> off from fixed state. Basically, this mirrors what SSL_accept() > >> already does. > > > > That adds only 10-20 bits of entropy. Is that enough? > > It's enough to get numbers that are not the same. I think I see what you're getting at. With an ssl=on postmaster that only processes non-SSL connections, calls to gen_random_bytes() in a session can, at least initially, only produce about 2^15[1] distinct random sequences per postmaster restart. Furthermore, an attacker with access to run queries can generate the list of possible sequences, then proceed to know the exact bytes each other backend with receive, for the life of the postmaster. With your patch, no two backends will ever mix the same PID+microsecond time into the random state. For anyone following along, the following Perl program demonstrates the weakness. After it has visited the full range of PIDs, every future iteration will find a duplicate: for my $offset (1 .. 100000) {my $db = DBI->connect('dbi:Pg:', undef, undef, {RaiseError => 1});my($bytes) = $db->selectrow_array('SELECTgen_random_bytes(10)');print "Saw $bytes twice!\n" if $seen{$bytes};$seen{$bytes}++;print "Done$offset\n" unless $offset % 100; } > Whether it's possible to guess next number when you know > that there is only time() and getpid() added to fixed state > (eg. those cleartext bytes in SSL handshake) - i don't know. I don't know, either. > In any case, this is how Postgres already operates SSL connections. Fair point. Interesting reading (search for "fork"): http://www.acsac.org/2003/papers/79.pdf http://www.apache-ssl.org/randomness.pdf http://openssl.6102.n7.nabble.com/recycled-pids-causes-PRNG-to-repeat-td41669.html The first paper suggests exactly what you've implemented. Between that and OpenSSL doing it, I don't feel further need to doubt its adequacy. > > How about instead calling RAND_cleanup() after each backend fork? > > Not "instead" - the gettimeofday() makes sense in any case. Considering > that it's immediate problem only for pgcrypto, this patch has higher chance > for appearing in back-branches. I worry that it's too cavalier to leave the OpenSSL PRNG in an insecure state, expecting every consumer to fix the problem or, failing to do so, silently operate in an insecure manner. pgcrypto may be the only security-critical user in our tree, but others in the field would not surprise me. Putting the change after fork_process() in BackendStartup() solves the problem for good. > If the _cleanup() makes next RAND_bytes() call RAND_poll() again? > Then yes, it certainly makes sense as core patch. It does. The disadvantage is that we'll then draw bytes from /dev/urandom for every backend fork. The Linux /dev/urandom is mostly designed for such abuse, but it would carry some unnecessary risk for back branches. So, I think your mixing of time into the seed is indeed best. Thanks, nm [1] That assumes the common 1 ... 32768 PID range.
Noah Misch <noah@leadboat.com> writes: > On Sat, Dec 22, 2012 at 12:59:55AM +0200, Marko Kreen wrote: >> On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote: >>> This should have gone to security@postgresql.org, instead. >> >> It went and this could have been patched in 9.2.2 but they did not care. > Ah. A slightly less revisionist view of the history is that the patch that Marko originally proposed to -security was considerably more complicated and less portable than this one, so we put it on hold until we thought of something better. This new patch looks pretty reasonable from here, modulo the question of whether it adds "enough" entropy. I'm not entirely sold on whether it does. ISTM that any attack based on this line of thinking already assumes that the attacker has complete knowledge of how many backends have been launched (else he doesn't know which sequence a targeted session will get). If he knows that much, mightn't he also know *when* they were launched? Alternatively: if he can know the session's start time (which we helpfully make available...) how much harder is this really making it for him to deduce the session's seed? On top of which: what exactly will he do with the seed once he's got it that would amount to a security problem? Or to put it in different terms, I'm not quite convinced that there's a plausible threat model that this patch blocks effectively. regards, tom lane
On Fri, Dec 21, 2012 at 10:05:30PM -0500, Tom Lane wrote: > This new patch looks pretty reasonable from here, > modulo the question of whether it adds "enough" entropy. More brains reviewing that question will be good. > I'm not entirely sold on whether it does. ISTM that any attack based on > this line of thinking already assumes that the attacker has complete > knowledge of how many backends have been launched (else he doesn't know > which sequence a targeted session will get). pg_stat_activity.pid will do. > If he knows that much, > mightn't he also know *when* they were launched? Alternatively: if he > can know the session's start time (which we helpfully make available...) > how much harder is this really making it for him to deduce the session's > seed? If he has access to his own fork-time seed, he can indeed estimate the seeds of other sessions. That seed is currently invisible to non-native code, and a user able to deploy a C function already has the access needed to compromise all sessions. The assumption that the fork-time seed stays buried is a source of unease, though. > On top of which: what exactly will he do with the seed once he's got it > that would amount to a security problem? > > Or to put it in different terms, I'm not quite convinced that there's a > plausible threat model that this patch blocks effectively. The examples could be as numerous as the algorithms that specify use of a cryptographically secure PRNG. Here's a simple one: an application generates long-term private encryption keys by connecting, issuing "SELECT gen_random_bytes(16)", and disconnecting. Long-term, across postmaster restarts, there are still almost 2^128 possible keys. However, backends of any given postmaster can only generate 2^15 possible keys. An attacker can attempt to acquire, over time, a backend with every PID in the system. The script I gave earlier is an example of doing so. He won't manage to visit literally every PID, sure, but he'll easily get 95% of them. By issuing "SELECT pg_backend_pid(), gen_random_bytes(16)" in each session, he assembles a dictionary of most possible keys under this postmaster. If he repeats this after each postmaster restart, he might acquire a dictionary covering months or years of key generation. Given a ciphertext based on a key presumed to be made during that time, he can try his relatively-small dictionary with a high chance of success. nm
Marko Kreen <markokr@gmail.com> writes: > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote: >> How about instead calling RAND_cleanup() after each backend fork? > Not "instead" - the gettimeofday() makes sense in any case. Considering > that it's immediate problem only for pgcrypto, this patch has higher chance > for appearing in back-branches. > If the _cleanup() makes next RAND_bytes() call RAND_poll() again? > Then yes, it certainly makes sense as core patch. I thought some more about this, and I think there is a pretty strong objection to this version of the patch: it *only* fixes the lack-of-randomness problem for pgcrypto users. Any other third-party code depending on the OpenSSL PRNG will have the same problem. Furthermore, it's not even obvious that this patch fixes it for all pgcrypto functions --- it probably is all right today, since I assume Marko remembers all the connections in that code, but it would be easy to miss the problem in future additions. I believe that we'd be better off doing something in postmaster.c to positively ensure that each session has a distinct seed value. Notice that BackendRun() already takes measures to ensure that's the case for the regular libc random() function; it seems like a reasonable extension to also worry about OpenSSL's PRNG. I'm not thrilled with the suggestion to do RAND_cleanup() after forking though, as that seems like it'll guarantee that each session starts out with only a minimal amount of entropy. What seems to me like it'd be most secure is to make the postmaster do RAND_add() with a gettimeofday result after each successful fork, say at the bottom of BackendStartup(). That way the randomness accumulates in the parent process, and there's no way to predict the initial state of any session without exact knowledge of every child process launch since postmaster start. So it'd go something like #ifdef USE_SSLif (EnableSSL){ struct timeval tv; gettimeofday(&tv, NULL); RAND_add(&tv, sizeof(tv), 0);} #endif We could perhaps also make this conditional on not EXEC_BACKEND, since the whole issue is moot if backends are launched by fork/exec. regards, tom lane
On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote: > I believe that we'd be better off doing something in postmaster.c to > positively ensure that each session has a distinct seed value. Notice > that BackendRun() already takes measures to ensure that's the case for > the regular libc random() function; it seems like a reasonable extension > to also worry about OpenSSL's PRNG. > #ifdef USE_SSL > if (EnableSSL) > { > struct timeval tv; > > gettimeofday(&tv, NULL); > RAND_add(&tv, sizeof(tv), 0); > } > #endif Take the caution one step further and make it independent of EnableSSL. In a stock installation, a !EnableSSL postmaster will never seed its PRNG, and there's no vulnerability. Add a shared_preload_libraries module that uses the OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again. Other than that, looks good. > We could perhaps also make this conditional on not EXEC_BACKEND, since > the whole issue is moot if backends are launched by fork/exec. True.
On 2012-12-22 14:20:56 -0500, Tom Lane wrote: > Marko Kreen <markokr@gmail.com> writes: > > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote: > >> How about instead calling RAND_cleanup() after each backend fork? > > > Not "instead" - the gettimeofday() makes sense in any case. Considering > > that it's immediate problem only for pgcrypto, this patch has higher chance > > for appearing in back-branches. > > > If the _cleanup() makes next RAND_bytes() call RAND_poll() again? > > Then yes, it certainly makes sense as core patch. > > I thought some more about this, and I think there is a pretty strong > objection to this version of the patch: it *only* fixes the > lack-of-randomness problem for pgcrypto users. Any other third-party > code depending on the OpenSSL PRNG will have the same problem. > Furthermore, it's not even obvious that this patch fixes it for all > pgcrypto functions --- it probably is all right today, since I assume > Marko remembers all the connections in that code, but it would be easy > to miss the problem in future additions. > > I believe that we'd be better off doing something in postmaster.c to > positively ensure that each session has a distinct seed value. Notice > that BackendRun() already takes measures to ensure that's the case for > the regular libc random() function; it seems like a reasonable extension > to also worry about OpenSSL's PRNG. > > I'm not thrilled with the suggestion to do RAND_cleanup() after forking > though, as that seems like it'll guarantee that each session starts out > with only a minimal amount of entropy. What seems to me like it'd be > most secure is to make the postmaster do RAND_add() with a gettimeofday > result after each successful fork, say at the bottom of > BackendStartup(). That way the randomness accumulates in the parent > process, and there's no way to predict the initial state of any session > without exact knowledge of every child process launch since postmaster > start. I am entirely unconvinced that adding in a gettimeofday() provides more entropy than what openssl gathers internally after a RAND_cleanup(). gettimeofday() will yield the same result in close enough forks on a significant number of systems whereas openssl should use stuff like /dev/urandom to initialize its PRNG after a cleanup. I think a better solution might be to use up some entropy in the postmaster *before* doing a fork and then mix in the pid + gettimeofday()after the fork. Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Dec 22, 2012 at 9:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm not thrilled with the suggestion to do RAND_cleanup() after forking > though, as that seems like it'll guarantee that each session starts out > with only a minimal amount of entropy. No, that's actually the right fix - that forces OpenSSL to do new reseed with system randomness, thus making backend (including SSL_accept) maximally disconnected from static pool in postmaster. This also makes behaviour equal to current ssl=off and exec-backend mode, which already do initial seeding in backend. The fact that PRNG behaviour is affected by complex set of compile- and run-time switches makes current situation rather fragile and hard to understand. > What seems to me like it'd be > most secure is to make the postmaster do RAND_add() with a gettimeofday > result after each successful fork, say at the bottom of > BackendStartup(). That way the randomness accumulates in the parent > process, and there's no way to predict the initial state of any session > without exact knowledge of every child process launch since postmaster > start. So it'd go something like > > #ifdef USE_SSL > if (EnableSSL) > { > struct timeval tv; > > gettimeofday(&tv, NULL); > RAND_add(&tv, sizeof(tv), 0); > } > #endif If you decide against RAND_cleanup in backend and instead do workarounds in backend or postmaster, then please also to periodic RAND_cleanup in postmaster. This should make harder to exploit weaknesses in reused slowly moving state. -- marko
Marko Kreen <markokr@gmail.com> writes: > On Sat, Dec 22, 2012 at 9:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm not thrilled with the suggestion to do RAND_cleanup() after forking >> though, as that seems like it'll guarantee that each session starts out >> with only a minimal amount of entropy. > No, that's actually the right fix - that forces OpenSSL to do new reseed > with system randomness, thus making backend (including SSL_accept) > maximally disconnected from static pool in postmaster. I don't think that "maximal disconnectedness" is the deciding factor here, or even a very important factor. If we do RAND_cleanup() then each new session will be forced to suck entropy from /dev/urandom (immediately, if an SSL connection is attempted). This opens the door to quasi-denial-of-service attacks that bleed all the entropy out of the system, forcing long delays for new PRNG-using processes, Postgres or otherwise. And long delays are the *best case* scenario --- worst case, if the system lacks decent /dev/random support, is that new processes are starting up with highly correlated PRNG states. If such an approach were a good thing, the OpenSSL guys would have implemented it internally --- it'd be easy enough to do, just by forcing RAND_cleanup anytime getpid() reads differently than it did when the pool was set up. gettimeofday() might not be the very best way of changing the inherited PRNG state from child to child, but I think it's a more attractive solution than RAND_cleanup. > This also makes behaviour equal to current ssl=off and exec-backend > mode, which already do initial seeding in backend. No, it's not "equal", because when ssl=off seeding attempts won't happen immediately after fork and thus an attacker doesn't have such an easy way of draining entropy. If we do what you're suggesting, the attacker doesn't even need a valid database login to drain entropy --- he can just fire-and-forget NEGOTIATE_SSL startup packets. (The exec-backend case will have such issues anyway, but who thought Windows was a secure OS?) > If you decide against RAND_cleanup in backend and instead > do workarounds in backend or postmaster, then please > also to periodic RAND_cleanup in postmaster. This should > make harder to exploit weaknesses in reused slowly moving state. We could consider that, but it's not apparent to me that it has any real value. regards, tom lane
Noah Misch <noah@leadboat.com> writes: > On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote: >> I believe that we'd be better off doing something in postmaster.c to >> positively ensure that each session has a distinct seed value. Notice >> that BackendRun() already takes measures to ensure that's the case for >> the regular libc random() function; it seems like a reasonable extension >> to also worry about OpenSSL's PRNG. >> #ifdef USE_SSL >> if (EnableSSL) >> { >> struct timeval tv; >> >> gettimeofday(&tv, NULL); >> RAND_add(&tv, sizeof(tv), 0); >> } >> #endif > Take the caution one step further and make it independent of EnableSSL. In a > stock installation, a !EnableSSL postmaster will never seed its PRNG, and > there's no vulnerability. Add a shared_preload_libraries module that uses the > OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again. Meh. In a postmaster that wasn't built with SSL support at all, such a module is still dangerous (and I'm not convinced anybody would build such a module anyway). I think we should confine our ambitions to preventing security issues caused by our own code. regards, tom lane
Andres Freund <andres@2ndquadrant.com> writes: > On 2012-12-22 14:20:56 -0500, Tom Lane wrote: >> I believe that we'd be better off doing something in postmaster.c to >> positively ensure that each session has a distinct seed value. > I am entirely unconvinced that adding in a gettimeofday() provides more > entropy than what openssl gathers internally after a > RAND_cleanup(). No, it doesn't provide more entropy, but what it does do is avoid draining entropy from the kernel (see my reply to Marko just now). > gettimeofday() will yield the same result in close > enough forks on a significant number of systems whereas openssl should > use stuff like /dev/urandom to initialize its PRNG after a cleanup. Well, in the first place, that doesn't matter much because the PRNG state will still change from repeated mix-ins of gettimeofday, even if we obtain the same reading successive times. In the second place, if we add it where I suggest, it would be easy to also include the child process PID in the added entropy -- we could xor it into the tv_sec value for instance -- thus further reducing the chance of identical values getting added in. regards, tom lane
On Sun, Dec 23, 2012 at 02:49:08PM -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Sat, Dec 22, 2012 at 02:20:56PM -0500, Tom Lane wrote: > >> #ifdef USE_SSL > >> if (EnableSSL) > >> { > >> struct timeval tv; > >> > >> gettimeofday(&tv, NULL); > >> RAND_add(&tv, sizeof(tv), 0); > >> } > >> #endif > > > Take the caution one step further and make it independent of EnableSSL. In a > > stock installation, a !EnableSSL postmaster will never seed its PRNG, and > > there's no vulnerability. Add a shared_preload_libraries module that uses the > > OpenSSL PRNG in its _PG_init(), and suddenly you're vulnerable again. > > Meh. In a postmaster that wasn't built with SSL support at all, such > a module is still dangerous (and I'm not convinced anybody would build > such a module anyway). I think we should confine our ambitions to > preventing security issues caused by our own code. You're adding lines of code to prematurely micro-optimize the backend fork cycle. If code introduced into the postmaster, by us or others, ever violates the assumption behind that micro-optimization, certain users get a precipitous loss of security with no clear alarm bells. I don't like that trade. nm
On Sun, Dec 23, 2012 at 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marko Kreen <markokr@gmail.com> writes: >> On Sat, Dec 22, 2012 at 9:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I'm not thrilled with the suggestion to do RAND_cleanup() after forking > >> though, as that seems like it'll guarantee that each session starts out > >> with only a minimal amount of entropy. > > > No, that's actually the right fix - that forces OpenSSL to do new reseed > > with system randomness, thus making backend (including SSL_accept) > > maximally disconnected from static pool in postmaster. > > I don't think that "maximal disconnectedness" is the deciding factor > here, or even a very important factor. If we do RAND_cleanup() then > each new session will be forced to suck entropy from /dev/urandom > (immediately, if an SSL connection is attempted). This opens the door > to quasi-denial-of-service attacks that bleed all the entropy out of the > system, forcing long delays for new PRNG-using processes, Postgres or > otherwise. And long delays are the *best case* scenario --- worst case, > if the system lacks decent /dev/random support, is that new processes > are starting up with highly correlated PRNG states. You cannot realistically drain /dev/urandom - it's just a CSPRNG, periodically seeded from /dev/random. And it cannot cause long-delays. Sessions keys are actually it's main use-case. And anyway - if best way to attack Postgres SSL session is to drain and cryptoanalyze /dev/urandom via SSL attempts, then we are in pretty good shape. When depening only on properly-used OpenSSL PRNG, we are still good, but bit worse situation - it's gets minimal amount of entropy after initial seeding, thus successful cryptoanalyze is more probable than on /dev/urandom. And when depending on incorrectly used OpenSSL PRNG (current situation) then the situation is bad - we seem to be depending on security of single hash. > If such an approach were a good thing, the OpenSSL guys would have > implemented it internally --- it'd be easy enough to do, just by forcing > RAND_cleanup anytime getpid() reads differently than it did when the > pool was set up. I thought about it and I realized they cannot do it - the libcrypto has no idea of application lifetime, so doing such cleanup without application knowledge is wrong. Eg. imagine a daemon that loads config, sets up SSL, goes background into empty chroot. So the current idea of hashing getpid() on each call is best they can do to help badly written applications - like Postgres. > gettimeofday() might not be the very best way of changing the inherited > PRNG state from child to child, but I think it's a more attractive > solution than RAND_cleanup. 1. It optimizes CPU for unrealistic attack scenario. 2. It spends more CPU in single-threaded postmaster when SSL is not used - as postmaster does not know whether incomingconnection is SSL or not it needs to do the PRNG feeding anyway. This might be noticeable in realistic short-connectionsscenario, where SSL is used only for admin or replication connections. > > This also makes behaviour equal to current ssl=off and exec-backend > > mode, which already do initial seeding in backend. > > No, it's not "equal", because when ssl=off seeding attempts won't happen > immediately after fork and thus an attacker doesn't have such an easy > way of draining entropy. If we do what you're suggesting, the attacker > doesn't even need a valid database login to drain entropy --- he can > just fire-and-forget NEGOTIATE_SSL startup packets. You can't just fire-and-forget them - you need to do TCP handshake plus Postgres SSL handshake; this means each request takes noticeable amount of setup time from attacker side and on server side the sucker's IP is visible in logs. And /dev/urandom seeding is prettly inexpensive compared to SSL pubkey crypto. It does not look like a scenario we need to design against. > (The exec-backend > case will have such issues anyway, but who thought Windows was a secure > OS?) ATM the Windows is pretty good shape compared to our Unix situation... > > If you decide against RAND_cleanup in backend and instead > > do workarounds in backend or postmaster, then please > > also to periodic RAND_cleanup in postmaster. This should > > make harder to exploit weaknesses in reused slowly moving state. > > We could consider that, but it's not apparent to me that it has any > real value. Properly-designed and properly-used CSPRNG feeding and extraction should tolerate minor breaks in low-level hashes/ciphers. If we are not using it as intended, then we lose that guarantee and need to limit the damage on our side. -- marko
On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote: > How about instead calling RAND_cleanup() after each backend fork? Attached is a patch that adds RAND_cleanup() to fork_process(). That way all forked processes start with fresh state. This should make sure the problem does not happen in any processes forked by postmaster. Please backpatch. ... Alternative is to put RAND_cleanup() to BackendInitialize() so only new backends start with fresh state. Another alternative is to put RAND_cleanup() after SSL_accept(), that way core code sees no change, but other OpenSSL users in backend operate securely. -- marko
Attachment
Marko Kreen <markokr@gmail.com> writes: > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote: >> How about instead calling RAND_cleanup() after each backend fork? > Attached is a patch that adds RAND_cleanup() to fork_process(). I remain unconvinced that this is the best solution. Anybody else have an opinion? regards, tom lane
On Sun, Jan 13, 2013 at 05:46:12PM -0500, Tom Lane wrote: > Marko Kreen <markokr@gmail.com> writes: > > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote: > >> How about instead calling RAND_cleanup() after each backend fork? > > > Attached is a patch that adds RAND_cleanup() to fork_process(). > > I remain unconvinced that this is the best solution. Anybody else have > an opinion? I'd describe it as the clearly-secure solution. The biggest hazard I see is the drain on system entropy. A system having only a blocking /dev/random could suddenly find itself entropy-exhausted after installing the minor upgrade. Backends could block waiting for system entropy to accumulate. That's not directly a problem on Linux. However, if other programs on the system read from /dev/random, the pressure from PostgreSQL's /dev/urandom usage may make those programs wait longer for entropy. I'm also comfortable with the security of Marko's original proposal, modulo it happening in each backend shortly after fork, not merely in pgcrypto. OpenSSL's ssl module uses a similar method, and one of the papers I cited describes it. (If anything, OpenSSL is less cautious. It uses time(), not gettimeofday().) The performance characteristics of this approach are easier to guess: one system call if we use MyProcPid + gettimeofday(), zero if we use MyProcPid + MyStartTime. You proposed mixing gettimeofday() into the postmaster's entropy pool after each fork. I wouldn't be too concerned if we did it that way, but my quick literature review did not turn up any similar ideas. Given that this is strictly more expensive than the previous method, I don't recommend it. Overall, I'd recommend mixing in MyProcPid and MyStartTime after each fork. nm
On Mon, Jan 14, 2013 at 12:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marko Kreen <markokr@gmail.com> writes: >> On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote: >>> How about instead calling RAND_cleanup() after each backend fork? > >> Attached is a patch that adds RAND_cleanup() to fork_process(). > > I remain unconvinced that this is the best solution. Anybody else have > an opinion? Do you have knowledge about systems that have /dev/random (blocking) but not /dev/urandom (non-blocking)? The only argument I see against RAND_cleanup() is that postgres might eat entropy from /dev/random (blocking) and cause both other programs and itself block, waiting for more entropy. But this can only happen on systems that don't have /dev/urandom. Note: reading from /dev/urandom does not affect /dev/random. -- marko
On Mon, Jan 14, 2013 at 02:21:00PM +0200, Marko Kreen wrote: > Note: reading from /dev/urandom does not affect /dev/random. Reading from /dev/urandom drains the pool that serves /dev/random: $ cat /proc/sys/kernel/random/entropy_avail 3596 $ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null 1+0 records in 1+0 records out 100 bytes (100 B) copied, 0.000174798 s, 572 kB/s $ cat /proc/sys/kernel/random/entropy_avail 2839 $ head -c10000000 /dev/urandom >/dev/null $ cat /proc/sys/kernel/random/entropy_avail 212 $ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null 0+1 records in 0+1 records out 38 bytes (38 B) copied, 0.000101439 s, 375 kB/s
On Mon, Jan 14, 2013 at 3:00 PM, Noah Misch <noah@leadboat.com> wrote: > On Mon, Jan 14, 2013 at 02:21:00PM +0200, Marko Kreen wrote: >> Note: reading from /dev/urandom does not affect /dev/random. > > Reading from /dev/urandom drains the pool that serves /dev/random: > > $ cat /proc/sys/kernel/random/entropy_avail > 3596 > $ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null > 1+0 records in > 1+0 records out > 100 bytes (100 B) copied, 0.000174798 s, 572 kB/s > $ cat /proc/sys/kernel/random/entropy_avail > 2839 > $ head -c10000000 /dev/urandom >/dev/null > $ cat /proc/sys/kernel/random/entropy_avail > 212 > $ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null > 0+1 records in > 0+1 records out > 38 bytes (38 B) copied, 0.000101439 s, 375 kB/s This slightly weakens my argument, but not completely - any /dev/urandom-using processes are still unaffected, but indeed processes using /dev/random can block. But what are those? So it's still problem only on systems that do not have /dev/urandom. Btw, observe fun behaviour: $ cat /proc/sys/kernel/random/entropy_avail 3451 $ cat /proc/sys/kernel/random/entropy_avail 3218 $ cat /proc/sys/kernel/random/entropy_avail 3000 $ cat /proc/sys/kernel/random/entropy_avail 2771 $ cat /proc/sys/kernel/random/entropy_avail 2551 $ cat /proc/sys/kernel/random/entropy_avail 2321 $ cat /proc/sys/kernel/random/entropy_avail 2101 $ cat /proc/sys/kernel/random/entropy_avail 1759 $ cat /proc/sys/kernel/random/entropy_avail 1527 $ cat /proc/sys/kernel/random/entropy_avail 1319 $ cat /proc/sys/kernel/random/entropy_avail 1080 $ cat /proc/sys/kernel/random/entropy_avail 844 $ cat /proc/sys/kernel/random/entropy_avail 605 $ cat /proc/sys/kernel/random/entropy_avail 366 $ cat /proc/sys/kernel/random/entropy_avail 128 $ cat /proc/sys/kernel/random/entropy_avail 142 $ cat /proc/sys/kernel/random/entropy_avail Each exec() eats from the pool on modern system. -- marko
On Mon, Jan 14, 2013 at 03:42:42PM +0200, Marko Kreen wrote: > On Mon, Jan 14, 2013 at 3:00 PM, Noah Misch <noah@leadboat.com> wrote: > > On Mon, Jan 14, 2013 at 02:21:00PM +0200, Marko Kreen wrote: > >> Note: reading from /dev/urandom does not affect /dev/random. > > > > Reading from /dev/urandom drains the pool that serves /dev/random: > > > > $ cat /proc/sys/kernel/random/entropy_avail > > 3596 > > $ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null > > 1+0 records in > > 1+0 records out > > 100 bytes (100 B) copied, 0.000174798 s, 572 kB/s > > $ cat /proc/sys/kernel/random/entropy_avail > > 2839 > > $ head -c10000000 /dev/urandom >/dev/null > > $ cat /proc/sys/kernel/random/entropy_avail > > 212 > > $ dd iflag=nonblock bs=100 count=1 if=/dev/random of=/dev/null > > 0+1 records in > > 0+1 records out > > 38 bytes (38 B) copied, 0.000101439 s, 375 kB/s > > This slightly weakens my argument, but not completely - any > /dev/urandom-using processes are still unaffected, but > indeed processes using /dev/random can block. But what are those? I don't know, really. I agree it's probably a rare thing, to the point of trivial concern for a major release. For a backpatch, avoiding such system-wide effects seems to me like a fair tiebreaker. Needless to say, all three proposals I listed would be major improvements over the present, vulnerable state. > $ cat /proc/sys/kernel/random/entropy_avail > 3451 > $ cat /proc/sys/kernel/random/entropy_avail > 3218 > $ cat /proc/sys/kernel/random/entropy_avail > 3000 > Each exec() eats from the pool on modern system. Amusing. Thanks, nm
On Sun, Jan 13, 2013 at 07:01:07PM -0500, Noah Misch wrote: > On Sun, Jan 13, 2013 at 05:46:12PM -0500, Tom Lane wrote: > > Marko Kreen <markokr@gmail.com> writes: > > > On Fri, Dec 21, 2012 at 10:27 PM, Noah Misch <noah@leadboat.com> wrote: > > >> How about instead calling RAND_cleanup() after each backend fork? > > > > > Attached is a patch that adds RAND_cleanup() to fork_process(). > > > > I remain unconvinced that this is the best solution. Anybody else have > > an opinion? > > I'd describe it as the clearly-secure solution. The biggest hazard I see is > the drain on system entropy. A system having only a blocking /dev/random > could suddenly find itself entropy-exhausted after installing the minor > upgrade. Backends could block waiting for system entropy to accumulate. > That's not directly a problem on Linux. However, if other programs on the > system read from /dev/random, the pressure from PostgreSQL's /dev/urandom > usage may make those programs wait longer for entropy. > > I'm also comfortable with the security of Marko's original proposal, modulo it > happening in each backend shortly after fork, not merely in pgcrypto. > OpenSSL's ssl module uses a similar method, and one of the papers I cited > describes it. (If anything, OpenSSL is less cautious. It uses time(), not > gettimeofday().) The performance characteristics of this approach are easier > to guess: one system call if we use MyProcPid + gettimeofday(), zero if we use > MyProcPid + MyStartTime. > > You proposed mixing gettimeofday() into the postmaster's entropy pool after > each fork. I wouldn't be too concerned if we did it that way, but my quick > literature review did not turn up any similar ideas. Given that this is > strictly more expensive than the previous method, I don't recommend it. > > Overall, I'd recommend mixing in MyProcPid and MyStartTime after each fork. I neglected to ping this for the last back-branch releases. May we adopt one of the above fixes and issue a CVE with the next releases? Though I've stated my preference, all three proposals would be major improvements over the present, quietly-vulnerable state. Thanks, nm