Thread: SSL renegotiation
Hi, I'm having a look at the SSL support code, because one of our customers reported it behaves unstably when the network is unreliable. I have yet to reproduce the exact problem they're having, but while reading the code I notice this in be-secure.c:secure_write():

	if (ssl_renegotiation_limit && port->count > ssl_renegotiation_limit * 1024L)
	{
		SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
								   sizeof(SSL_context));
		if (SSL_renegotiate(port->ssl) <= 0)
			ereport(COMMERROR,
					(errcode(ERRCODE_PROTOCOL_VIOLATION),
					 errmsg("SSL renegotiation failure")));
		if (SSL_do_handshake(port->ssl) <= 0)
			ereport(COMMERROR,
					(errcode(ERRCODE_PROTOCOL_VIOLATION),
					 errmsg("SSL renegotiation failure")));
		if (port->ssl->state != SSL_ST_OK)
			ereport(COMMERROR,
					(errcode(ERRCODE_PROTOCOL_VIOLATION),
					 errmsg("SSL failed to send renegotiation request")));
		port->ssl->state |= SSL_ST_ACCEPT;
		SSL_do_handshake(port->ssl);
		if (port->ssl->state != SSL_ST_OK)
			ereport(COMMERROR,
					(errcode(ERRCODE_PROTOCOL_VIOLATION),
					 errmsg("SSL renegotiation failure")));
		port->count = 0;
	}

I call your attention to the fact that we're calling SSL_do_handshake twice here; and what's more, we're messing with the internal port->ssl->state variable by setting the SSL_ST_ACCEPT bit. According to the commit message that introduces this[1], this is "text book correct", but I'm unable to find the text book that specifies that this is correct. Is it? I have gone through the OpenSSL documentation and can find not a single bit on this; and I have also gone through the OpenSSL mailing list archives, and as far as I can tell they say you have to call SSL_renegotiate() and then SSL_do_handshake() once, and that's it. (You can call SSL_renegotiate_pending() afterwards to verify that the renegotiation completed successfully, which is something we don't do.) I have found in our archives several reports of people that get "SSL renegotiation failed" error messages in the log, and no explanation has ever been found.
I instrumented that block, and I have observed that after the second handshake call, ssl->state is 0x2111, 0x21c0, 0x21a0 and other values; never SSL_ST_OK. (0x2000 is SSL_ST_ACCEPT, which is the bit we added; SSL_ST_OK is 0x03.) If I remove the second SSL_do_handshake() call and the changes to port->ssl->state, everything appears to work perfectly well and there's no extra log spam (and ssl->state is SSL_ST_OK by the time things are finished). So apparently renegotiation is not actually failing; we're just adding a confusing error message for no apparent reason.

Thoughts? I have still to find where the actual problems are happening in unreliable networks ...

[1] commit 17386ac45345fe38a10caec9d6e63afa3ce31bb9
    Author: Bruce Momjian <bruce@momjian.us>
    Date: Wed Jun 11 15:05:50 2003 +0000

    patch by Sean Chittenden (in CC), posted as [2]

[2] http://www.postgresql.org/message-id/20030419190821.GQ79923@perrin.int.nxad.com

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I think this block is better written as:

	if (ssl_renegotiation_limit && port->count > ssl_renegotiation_limit * 1024L)
	{
		SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
								   sizeof(SSL_context));
		if (SSL_renegotiate(port->ssl) <= 0)
			ereport(COMMERROR,
					(errcode(ERRCODE_PROTOCOL_VIOLATION),
					 errmsg("SSL renegotiation failure in renegotiate")));
		else
		{
			int		handshake;

			do {
				handshake = SSL_do_handshake(port->ssl);
				if (handshake <= 0)
					ereport(COMMERROR,
							(errcode(ERRCODE_PROTOCOL_VIOLATION),
							 errmsg("SSL renegotiation failure in handshake, retrying")));
			} while (handshake <= 0);
			if (port->ssl->state != SSL_ST_OK)
				ereport(COMMERROR,
						(errcode(ERRCODE_PROTOCOL_VIOLATION),
						 errmsg("SSL failed to send renegotiation request")));
			else
				port->count = 0;
		}
	}

In other words, retry the SSL_do_handshake() until it reports OK, but not more than that; this seems to give better results in my (admittedly crude) experiments. I am unsure about checking port->ssl->state after the handshake; it's probably pointless, really.

In any case, the old code was calling SSL_do_handshake() even if SSL_renegotiate() failed; and it was resetting port->count even if the handshake failed. Both of these smell like bugs to me.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Wow, that was a long time ago... I remember a few things about this:

1) I was running in to an issue where every 64KB of transfer (or something inanely low like that), SSL was being renegotiated. This was causing performance problems over the wire. I think we settled on once an hour renegotiating the key, but I'd have to look again to check. It looks like the default is now 512MB, which is a more sane limit, but still pretty easy to exhaust - I have several hosts that would run past that default every 45sec or so, probably faster at peak times.

2) The system that I was running this on was Solaris 2.5, I believe, and /dev/random was having a problem staying populated given the frequent renegotiations, which prompted me to look in to this. In your testing and attempts to repro, try draining your prng pool, or patch things on Linux to read from /dev/random instead of /dev/urandom... something like that may be at fault and why limited testing won't expose this, but under load you might see it. *shrug* A WAG, but possibly relevant.

Rough notes inline below (almost all of this should be wrapped in an <iirc> block):

> if (ssl_renegotiation_limit && port->count > ssl_renegotiation_limit * 1024L)

This doesn't seem right. 512MB * 1024? Maybe that's why I haven't actually had to play with this limit in a long time. Every 512GB is much more reasonable in that it would take 12hrs to renegotiate on a busy host. The "* 1024L" seems suspicious to me and should probably be removed in favor of the constant passed in from the config.

> {
>     SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
>                                sizeof(SSL_context));
>     if (SSL_renegotiate(port->ssl) <= 0)
>         ereport(COMMERROR,
>                 (errcode(ERRCODE_PROTOCOL_VIOLATION),
>                  errmsg("SSL renegotiation failure")));

This sets a bit asking the peer to renegotiate.
> if (SSL_do_handshake(port->ssl) <= 0)
>     ereport(COMMERROR,
>             (errcode(ERRCODE_PROTOCOL_VIOLATION),
>              errmsg("SSL renegotiation failure")));
> if (port->ssl->state != SSL_ST_OK)
>     ereport(COMMERROR,
>             (errcode(ERRCODE_PROTOCOL_VIOLATION),
>              errmsg("SSL failed to send renegotiation request")));

Push the request to renegotiate out to the peer and check the status.

> port->ssl->state |= SSL_ST_ACCEPT;
> SSL_do_handshake(port->ssl);

In OpenSSL 0.9.6 this was the correct way to renegotiate a connection, and I would need to confirm if this is still required for OpenSSL >= 0.9.7.

> if (port->ssl->state != SSL_ST_OK)
>     ereport(COMMERROR,
>             (errcode(ERRCODE_PROTOCOL_VIOLATION),
>              errmsg("SSL renegotiation failure")));
> port->count = 0;
> }
>
> I call your attention to the fact that we're calling SSL_do_handshake
> twice here; and what's more, we're messing with the internal
> port->ssl->state variable by setting the SSL_ST_ACCEPT bit. According
> to the commit message that introduces this[1], this is "text book
> correct", but I'm unable to find the text book that specifies that this
> is correct. Is it? I have gone through the OpenSSL documentation and
> can find not a single bit on this; and I have also gone through the
> OpenSSL mailing list archives and as far as I can tell they say you have
> to call SSL_renegotiate() and then SSL_do_handshake() once, and that's
> it. (You can call SSL_renegotiate_pending() afterwards to verify that
> the renegotiation completed successfully, which is something we don't
> do.)

It would not surprise me if we could #ifdef out the "|= SSL_ST_ACCEPT" and second call to SSL_do_handshake() if we're running OpenSSL 0.9.7 or newer. iirc, 0.9.7's big feature was improving renegotiation.

> I have found in our archives several reports of people that get "SSL
> renegotiation failed" error messages in the log, and no explanation has
> ever been found.
> I instrumented that block, and I have observed that
> after the second handshake call, ssl->state is 0x2111, 0x21c0, 0x21a0
> and other values; never SSL_ST_OK. (0x2000 is SSL_ST_ACCEPT which is
> the bit we added; SSL_ST_OK is 0x03). If I remove the second
> SSL_do_handshake() call and the changes to port->ssl->state, everything
> appears to work perfectly well and there's no extra log spam (and
> ssl->state is SSL_ST_OK by the time things are finished). So apparently
> renegotiation is not actually failing, but we're just adding a confusing
> error message for no apparent reason.
>
> Thoughts?

Since 0.9.6 almost certainly has vulnerabilities that haven't been fixed, can we just deprecate anything older than OpenSSL 0.9.7 or #ifdef the above out? 0.9.7 is also ancient, but we're talking about new releases... and while we're at it, can we prefer PFS ciphers?

> I have still to find where the actual problems are happening in
> unreliable networks ...

I'm guessing you're blocking on /dev/random on some systems and that's the source of unreliability/timeouts.

> [1] commit 17386ac45345fe38a10caec9d6e63afa3ce31bb9
> Author: Bruce Momjian <bruce@momjian.us>
> Date: Wed Jun 11 15:05:50 2003 +0000
>
> patch by Sean Chittenden (in CC), posted as [2]
>
> [2] http://www.postgresql.org/message-id/20030419190821.GQ79923@perrin.int.nxad.com

If you want, I can dig in to this further.... my brain's way back machine isn't as on-demand as it used to be, but I can certainly help clean some of this up.

-sc

--
Sean Chittenden
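[Editor's note on the "* 1024L" doubt above: the multiplier reads consistently with the GUC being expressed in kilobytes, so "* 1024L" converts the configured value to bytes for comparison against the byte counter port->count, rather than inflating the limit to 512GB. A quick arithmetic sanity check, using stand-in variable names rather than the server's actual code:]

	#include <assert.h>
	#include <stdio.h>

	int main(void)
	{
	    /* Hypothetical stand-in: the configured limit, in kilobytes.
	     * A 512MB default corresponds to 524288 kB. */
	    long ssl_renegotiation_limit = 524288;
	    long threshold_bytes = ssl_renegotiation_limit * 1024L;

	    /* 524288 kB * 1024 = 536870912 bytes = 512MB, not 512GB:
	     * the multiplication is a unit conversion. */
	    assert(threshold_bytes == 536870912L);
	    printf("%ld\n", threshold_bytes);
	    return 0;
	}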
Hi,
These are the relevant bits from Apache2.4's mod_ssl.
SSL_renegotiate(ssl);
SSL_do_handshake(ssl);
if (SSL_get_state(ssl) != SSL_ST_OK) {
ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r, APLOGNO(02225)
"Re-negotiation request failed");
ssl_log_ssl_error(SSLLOG_MARK, APLOG_ERR, r->server);
r->connection->keepalive = AP_CONN_CLOSE;
return HTTP_FORBIDDEN;
}
ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, APLOGNO(02226)
"Awaiting re-negotiation handshake");
/* XXX: Should replace setting state with SSL_renegotiate(ssl);
* However, this causes failures in perl-framework currently,
* perhaps pre-test if we have already negotiated?
*/
#ifdef OPENSSL_NO_SSL_INTERN
SSL_set_state(ssl, SSL_ST_ACCEPT);
#else
ssl->state = SSL_ST_ACCEPT;
#endif
SSL_do_handshake(ssl);
sslconn->reneg_state = RENEG_REJECT;
if (SSL_get_state(ssl) != SSL_ST_OK) {
ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r, APLOGNO(02261)
"Re-negotiation handshake failed: "
"Not accepted by client!?");
r->connection->keepalive = AP_CONN_CLOSE;
return HTTP_FORBIDDEN;
}
That code supports at least OpenSSL 0.9.7 and later.
Some explanation for it can be found here:
http://books.google.dk/books?id=IIqwAy4qEl0C&pg=PT184&lpg=PT184&dq=SSL_do_handshake&source=bl&ots=ma632U4vWv&sig=d2qqS0svhu4EwIZZaONdHoTulVI&hl=en&sa=X&ei=xdPdUczoDuf34QSzmoDQDg&ved=0CIIDEOgBMCo
The snippet there is probably the textbook implementation.
So the original code looks OK. Perhaps the check
on the return code of the first SSL_do_handshake (and SSL_renegotiate)
is unnecessary and may lead to unwarranted error messages, though.
And it may be wrong to continue the renegotiation if
the state is not SSL_ST_OK after the first SSL_do_handshake.
If the renegotiation fails, I suppose two things can be done:
1. Quit the connection
2. Carry on pretending nothing happened.
I think 2. is correct in the vast majority of cases (as it looks like is being done now).
And in that case: not resetting port->count will make for a very slow
and awkward connection as new handshakes will be attempted again and again,
even if the other party persistently refuse to shake hands.
Kind Regards
Troels Nielsen
On Thu, Jul 11, 2013 at 12:34 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I think this block is better written as:

if (ssl_renegotiation_limit && port->count > ssl_renegotiation_limit * 1024L)
{
    SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
                               sizeof(SSL_context));
    if (SSL_renegotiate(port->ssl) <= 0)
        ereport(COMMERROR,
                (errcode(ERRCODE_PROTOCOL_VIOLATION),
                 errmsg("SSL renegotiation failure in renegotiate")));
    else
    {
        int handshake;

        do {
            handshake = SSL_do_handshake(port->ssl);
            if (handshake <= 0)
                ereport(COMMERROR,
                        (errcode(ERRCODE_PROTOCOL_VIOLATION),
                         errmsg("SSL renegotiation failure in handshake, retrying")));
        } while (handshake <= 0);
        if (port->ssl->state != SSL_ST_OK)
            ereport(COMMERROR,
                    (errcode(ERRCODE_PROTOCOL_VIOLATION),
                     errmsg("SSL failed to send renegotiation request")));
        else
            port->count = 0;
    }
}
In other words, retry the SSL_do_handshake() until it reports OK, but
not more than that; this seems to give better results in my (admittedly
crude) experiments. I am unsure about checking port->ssl->state after
the handshake; it's probably pointless, really.
In any case, the old code was calling SSL_do_handshake() even if
SSL_renegotiate() failed; and it was resetting the port->count even if
the handshake failed. Both of these smell like bugs to me.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
> If the renegotiation fails

AH! Now I remember. SSL clients can optionally renegotiate; it's not required to renegotiate the session if the other side chooses not to (almost certainly due to a bug or limitation in the client's connecting library). By monkeying with the state, you can explicitly force a client to renegotiate. I don't think in code from yesteryear it was portable or possible to see if the server successfully renegotiated a connection before 0.9.6, so you just forced the client to renegotiate after the server and ignored the result. A client pausing for a few extra round trips was probably never noticed. I'm not saying this is correct, but I think that was the thinking back in the day.

> , I suppose two things can be done:
>
> 1. Quit the connection

With my Infosec hat on, this is the correct option - force the client back in to compliance with whatever the stated crypto policy is through a reconnection.

> 2. Carry on pretending nothing happened.

This is almost never correct in a security context (all errors or abnormalities must boil up).

> I think 2 is correct in the vast majority of cases (as it looks like
> is being done now).

That is a correct statement in that most code disregards renegotiation, but that is because there is a pragmatic assumption that HTTPS connections will be short lived. In the case of PostgreSQL, there is a good chance that a connection will be established for weeks or months. In the case of Apache, allowing a client to renegotiate every byte would be a possible CPU DoS, but I digress....

> And in that case: not resetting port->count will make for a very slow
> and awkward connection as new handshakes will be attempted again and
> again,
> even if the other party persistently refuse to shake hands.

Which could lead to a depletion of random bits. This sounds like a plausible explanation to me. Too bad we're stuck using an ill-conceived SSL implementation and can't use botan[1].
> I think this block is better written as:
>
> if (ssl_renegotiation_limit && port->count > ssl_renegotiation_limit * 1024L)

I don't think the " * 1024L" is right.

> {
>     SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
>                                sizeof(SSL_context));
>     if (SSL_renegotiate(port->ssl) <= 0)
>         ereport(COMMERROR,
>                 (errcode(ERRCODE_PROTOCOL_VIOLATION),
>                  errmsg("SSL renegotiation failure in renegotiate")));
>     else
>     {
>         int handshake;
>
>         do {
>             handshake = SSL_do_handshake(port->ssl);
>             if (handshake <= 0)
>                 ereport(COMMERROR,
>                         (errcode(ERRCODE_PROTOCOL_VIOLATION),
>                          errmsg("SSL renegotiation failure in handshake, retrying")));
>         } while (handshake <= 0);

It's worth noting that for broken SSL implementations, or SSL implementations that refuse to renegotiate, this will yield a stalled connection, though at least it will be obvious in the logs. I think this is the correct approach. It is probably prudent to set an upper bound on this loop in order to free up the resource and unblock the client, who will appear to be mysteriously hung for no reason until they look at the PostgreSQL server logs.

>         if (port->ssl->state != SSL_ST_OK)
>             ereport(COMMERROR,
>                     (errcode(ERRCODE_PROTOCOL_VIOLATION),
>                      errmsg("SSL failed to send renegotiation request")));
>         else
>             port->count = 0;
>     }
> }
>
> In other words, retry the SSL_do_handshake() until it reports OK, but
> not more than that; this seems to give better results in my (admittedly
> crude) experiments. I am unsure about checking port->ssl->state after
> the handshake; it's probably pointless, really.

Correct. In async IO this would be important, but since the server is synchronous in its handling of communication, we can remove the if/else (state != SSL_ST_OK) block.

> In any case, the old code was calling SSL_do_handshake() even if
> SSL_renegotiate() failed; and it was resetting the port->count even if
> the handshake failed. Both of these smell like bugs to me.
I don't know how SSL_renegotiate() could fail in the past. SSL_renegotiate(3) should never fail on a well-formed implementation (e.g. ssl/t1_reneg.c @ssl_add_serverhello_renegotiate_ext()).

While we're on the subject of crypto and OpenSSL, we force server ciphers to be preferred instead of client ciphers:

SSL_CTX_set_options(SSL_context, SSL_OP_CIPHER_SERVER_PREFERENCE);

http://www.openssl.org/docs/ssl/SSL_CTX_set_options.html#NOTES

SSL_get_secure_renegotiation_support() would be a good call to add to autoconf to see if it is supported (OpenSSL >= 0.9.8m), which would make this code a good bit easier. Add a GUC, ssl_renegotiation_require=(true,false,disconnect), which would force renegotiations, ignore renegotiation failures, and disconnect a client if it fails. As mentioned above re: breaking out of the loop, there should probably be a ssl_renegotiation_min tunable so that the client can't renegotiate too fast.

The default ssl_ciphers in the examples should also be updated as well (e.g. we still allow SSLv2):

ssl_ciphers = 'ECDHE-RSA-AES128-SHA256:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH:@STRENGTH'

Longer, but current with today's security standards.

-sc

[1] http://botan.randombit.net/tls.html?highlight=renegotiate

--
Sean Chittenden
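[Editor's note: the upper bound Sean suggests for the retry loop can be sketched as follows. The handshake function, counters, and retry cap here are made-up stand-ins, not PostgreSQL or OpenSSL code; the sketch only illustrates the control flow of a bounded retry:]

	#include <assert.h>
	#include <stdbool.h>
	#include <stdio.h>

	#define MAX_HANDSHAKE_RETRIES 20   /* hypothetical cap */

	/* Stand-in for SSL_do_handshake(): fails until the peer cooperates. */
	static int attempts_needed;
	static int calls_made;

	static int fake_do_handshake(void)
	{
	    calls_made++;
	    return (calls_made >= attempts_needed) ? 1 : -1;
	}

	/* Retry the handshake, but give up after a bounded number of tries
	 * instead of stalling the connection forever. Returns true on success. */
	static bool renegotiate_with_cap(void)
	{
	    for (int i = 0; i < MAX_HANDSHAKE_RETRIES; i++)
	    {
	        if (fake_do_handshake() > 0)
	            return true;
	        /* the real code would ereport() here and retry */
	    }
	    return false;   /* caller should drop the connection */
	}

	int main(void)
	{
	    /* Peer succeeds on the 3rd try: loop exits early with success. */
	    attempts_needed = 3; calls_made = 0;
	    assert(renegotiate_with_cap() && calls_made == 3);

	    /* Peer never succeeds: loop gives up after the cap. */
	    attempts_needed = 1000; calls_made = 0;
	    assert(!renegotiate_with_cap() && calls_made == MAX_HANDSHAKE_RETRIES);

	    puts("ok");
	    return 0;
	}

With the cap, a peer that persistently refuses to shake hands costs at most a fixed number of attempts before the server can report failure and free the resource, rather than spinning forever.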
On Thu, Jul 11, 2013 at 4:20 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I'm having a look at the SSL support code, because one of our customers
> reported it behaves unstably when the network is unreliable. I have yet
> to reproduce the exact problem they're having, but while reading the
> code I notice this in be-secure.c:secure_write() :

The recap of my experiences you requested...

I first saw SSL renegotiation failures on Ubuntu 10.04 LTS (Lucid) with openssl 0.9.8 (something). I think this was because SSL renegotiation had been disabled due to CVE 2009-3555 (affecting all versions before 0.9.8l). I think the version now in lucid is 0.9.8k with fixes for SSL renegotiation, but I haven't tested this.

The failures I saw with no-renegotiation-SSL for streaming replication looked like this:

On the master:
2012-06-25 16:16:26 PDT LOG: SSL renegotiation failure
2012-06-25 16:16:26 PDT LOG: SSL error: unexpected record
2012-06-25 16:16:26 PDT LOG: could not send data to client: Connection reset by peer

On the hot standby:
2012-06-25 11:12:11 PDT FATAL: could not receive data from WAL stream: SSL error: sslv3 alert unexpected message
2012-06-25 11:12:11 PDT LOG: record with zero length at 1C5/95D2FE00

Now I'm running Ubuntu 12.04 LTS (Precise) with openssl 1.0.1, and I think all the known renegotiation issues have been dealt with. I still get failures, but they are less informative:

<postgres@[unknown]:19761> 2013-03-15 03:55:12 UTC LOG: SSL renegotiation failure

--
Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
On Thu, Jul 11, 2013 at 1:13 AM, Sean Chittenden <sean@chittenden.org> wrote:
>> , I suppose two things can be done:
>>
>> 1. Quit the connection
>
> With my Infosec hat on, this is the correct option - force the client
> back in to compliance with whatever the stated crypto policy is through
> a reconnection.
>
>> 2. Carry on pretending nothing happened.
>
> This is almost never correct in a security context (all errors or
> abnormalities must boil up).
>
>> I think 2 is correct in the vast majority of cases (as it looks like
>> is being done now).
>
> That is a correct statement in that most code disregards renegotiation,
> but that is because there is a pragmatic assumption that HTTPS
> connections will be short lived. In the case of PostgreSQL, there is a
> good chance that a connection will be established for weeks or months.
> In the case of Apache, allowing a client to renegotiate every byte would
> be a possible CPU DoS, but I digress....

And, allowing the client to refuse to renegotiate leaves the relevant vulnerability unpatched. Renegotiation was introduced to patch a vulnerability in which, without renegotiation, there was the possibility of an attacker gaining knowledge of session keys (and hence the ability to intercept the stream). I think 2 is not viable in this context. Only 1.
Troels Nielsen escribió:
> Hi,
>
> These are the relevant bits from Apache2.4's mod_ssl.
>
> [snip]

So this is basically the same thing the Pg code is doing.

> That code supports at least OpenSSL 0.9.7 and later.
>
> Some explanation for it can be found here:
>
> http://books.google.dk/books?id=IIqwAy4qEl0C&pg=PT184&lpg=PT184&dq=SSL_do_handshake&source=bl&ots=ma632U4vWv&sig=d2qqS0svhu4EwIZZaONdHoTulVI&hl=en&sa=X&ei=xdPdUczoDuf34QSzmoDQDg&ved=0CIIDEOgBMCo
>
> The snippet there is probably the textbook implementation.

Ah, thanks for the pointer. I notice this book is from 2002 and documents OpenSSL 0.9.6. So yeah, that's what both mod_ssl and Pg implement. However, reading the text carefully, I see that this snippet is the correct implementation for 0.9.6 and earlier; the same book, speaking about the 0.9.7 release in the future tense, explains that in that release it will be much easier to do renegotiations, without getting into precise details of how exactly it is to be done (I suppose the OpenSSL team hadn't finalized the API yet). As far as I can understand, what I propose is the right sequence.

Now, should we support the 0.9.6-and-earlier mechanism? My inclination is no; even RHEL 3, the oldest supported Linux distribution, uses 0.9.7 (heck, even Red Hat Linux 9, released in 2003, does). To see OpenSSL 0.9.6 you need to go back to Red Hat Linux 7.2, released in 2001 using a Linux kernel 2.4. Surely no one in their right mind would use a current Postgres release on such an ancient animal. So I continue to maintain that we should rip the old mechanism out and replace it with current renegotiation.

Now, I've read a bit more of the code and it seems that we should be doing SSL_renegotiate() and then check the SSL_renegotiate_pending() result until it returns that the renegotiation has completed.

> So the original code looks OK.
> Perhaps the check
> on the return code of the first SSL_do_handshake (and SSL_renegotiate)
> is unnecessary and may lead to unwarranted error messages, though.
> And it may be wrong to continue the renegotiation if
> the state is not SSL_ST_OK after the first SSL_do_handshake.
>
> If the renegotiation fails, I suppose two things can be done:
> 1. Quit the connection
> 2. Carry on pretending nothing happened.
>
> I think 2. is correct in the vast majority of cases (as it looks like is
> being done now).
> And in that case: not resetting port->count will make for a very slow
> and awkward connection as new handshakes will be attempted again and again,
> even if the other party persistently refuse to shake hands.

Good point. From a security point of view, it seems that the correct reaction is to close the connection if renegotiation doesn't complete. Along the same lines, it seems to me that accepting SSLv2 (which doesn't support renegotiations at all) when renegotiations have been requested is not a good choice; we should accept only SSLv3 in that case.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jul 12, 2013 at 04:32:52PM -0400, Alvaro Herrera wrote:
> Now, should we support the 0.9.6-and-earlier mechanism? My inclination
> is no; even RHEL 3, the oldest supported Linux distribution, uses 0.9.7
> (Heck, even Red Hat Linux 9, released on 2003). To see OpenSSL 0.9.6
> you need to go back to Red Hat Linux 7.2, released on 2001 using a Linux
> kernel 2.4. Surely no one in their right mind would use a current
> Postgres release on such an ancient animal.

Agreed. The OpenSSL Project last applied a security fix to 0.9.6 over eight years ago. Compatibility with 0.9.6 has zero or negative value.

--
Noah Misch EnterpriseDB http://www.enterprisedb.com
On Fri, Jul 12, 2013 at 8:51 PM, Noah Misch <noah@leadboat.com> wrote:
> On Fri, Jul 12, 2013 at 04:32:52PM -0400, Alvaro Herrera wrote:
>> Now, should we support the 0.9.6-and-earlier mechanism? My inclination
>> is no; even RHEL 3, the oldest supported Linux distribution, uses 0.9.7
>> (Heck, even Red Hat Linux 9, released on 2003). To see OpenSSL 0.9.6
>> you need to go back to Red Hat Linux 7.2, released on 2001 using a Linux
>> kernel 2.4. Surely no one in their right mind would use a current
>> Postgres release on such an ancient animal.
>
> Agreed. The OpenSSL Project last applied a security fix to 0.9.6 over eight
> years ago. Compatibility with 0.9.6 has zero or negative value.

+1 from me as well, if any more are needed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jul 12, 2013 at 08:51:52PM -0400, Noah Misch wrote:
> On Fri, Jul 12, 2013 at 04:32:52PM -0400, Alvaro Herrera wrote:
> > Now, should we support the 0.9.6-and-earlier mechanism? My
> > inclination is no; even RHEL 3, the oldest supported Linux
> > distribution, uses 0.9.7 (Heck, even Red Hat Linux 9, released on
> > 2003). To see OpenSSL 0.9.6 you need to go back to Red Hat Linux
> > 7.2, released on 2001 using a Linux kernel 2.4. Surely no one in
> > their right mind would use a current Postgres release on such an
> > ancient animal.
>
> Agreed. The OpenSSL Project last applied a security fix to 0.9.6
> over eight years ago. Compatibility with 0.9.6 has zero or negative
> value.

You've made a persuasive case that we should actively break backward compatibility here. Would that be complicated to do?

Cheers,
David.

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Jul 16, 2013 at 10:41:44AM -0700, David Fetter wrote:
> On Fri, Jul 12, 2013 at 08:51:52PM -0400, Noah Misch wrote:
> > Agreed. The OpenSSL Project last applied a security fix to 0.9.6
> > over eight years ago. Compatibility with 0.9.6 has zero or negative
> > value.
>
> You've made a persuasive case that we should actively break backward
> compatibility here. Would that be complicated to do?

Nope. If Alvaro's code change builds under 0.9.6, malfunctioning only at runtime, I suspect we would add a "configure"-time version check and possibly a runtime one as well.

--
Noah Misch EnterpriseDB http://www.enterprisedb.com
Here's the patch I propose to handle renegotiation cleanly.

I noticed in testing that SSL_renegotiate_pending() doesn't seem to actually work --- if I throw an ereport(FATAL) at the point where I expect the renegotiation to be complete, it always dies, even if I give it megabytes of extra traffic while waiting for the renegotiation to complete. I suspect this is an OpenSSL bug. Instead, in this patch I check the internal renegotiation counter: grab its current value when starting the renegotiation, and consider it complete when the counter has advanced. This works fine.

Another thing not covered by the original code snippet I proposed upthread is to avoid renegotiating when there was already a renegotiation in progress. This bug has been observed in the field.

Per discussion, I made it close the connection with a FATAL error if the limit is reached and the renegotiation hasn't taken place. To do otherwise is not acceptable from a security PoV.

Sean Chittenden mentioned that when retrying the handshake, we should be careful to only do it a few times, not forever, to prevent a malfeasant from grabbing hold of a connection indefinitely. I've added that too, hardcoding the number of retries to 20.

Also, I made this code request a renegotiation slightly before the limit is actually reached. I noticed that in some cases some traffic can go by before the renegotiation is actually completed. The difference should be pretty minimal.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
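[Editor's note: the completion check described above — snapshot the renegotiation counter when the renegotiation starts, and treat it as complete once the counter has advanced — can be sketched with a stubbed counter. The struct and function names are made up for illustration; the real patch reads OpenSSL's internal counter:]

	#include <assert.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Hypothetical stand-in for the SSL object's internal counter,
	 * which is bumped each time a handshake completes. */
	struct fake_ssl { long num_renegotiations; };

	/* Snapshot taken when the renegotiation is requested. */
	static long reneg_count_at_start;

	static void start_renegotiation(struct fake_ssl *ssl)
	{
	    reneg_count_at_start = ssl->num_renegotiations;
	    /* ... the real code calls SSL_renegotiate() here ... */
	}

	/* The renegotiation is considered complete once the counter advances. */
	static bool renegotiation_complete(const struct fake_ssl *ssl)
	{
	    return ssl->num_renegotiations > reneg_count_at_start;
	}

	int main(void)
	{
	    struct fake_ssl ssl = { 4 };

	    start_renegotiation(&ssl);
	    assert(!renegotiation_complete(&ssl));  /* handshake not done yet */

	    ssl.num_renegotiations++;               /* handshake finished */
	    assert(renegotiation_complete(&ssl));

	    puts("ok");
	    return 0;
	}

Comparing against a snapshot rather than a fixed value means the check works regardless of how many renegotiations the connection has already performed.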
Here's an updated version; this mainly simplifies code, per comments from Andres (things were a bit too baroque in places due to the way the code had evolved, and I hadn't gone over it to simplify it). The only behavior change is that the renegotiation is requested 1kB before the limit is hit: the raise to 1% of the configured limit was removed. The "fixup" is an incremental on top of the previous version; the other one is the full v2 patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
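[Editor's note: the "request renegotiation 1kB before the limit" behavior described above amounts to comparing the byte count against a slightly lowered threshold, leaving headroom for traffic that goes by while the handshake completes. A sketch with made-up names; the real patch works on the Port struct and GUC variables:]

	#include <assert.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Request renegotiation 1kB before the configured limit (given in kB)
	 * is reached; a limit of 0 disables renegotiation entirely. */
	static bool time_to_renegotiate(long bytes_sent, long limit_kb)
	{
	    long limit_bytes = limit_kb * 1024L;
	    return limit_kb > 0 && bytes_sent > limit_bytes - 1024L;
	}

	int main(void)
	{
	    long limit_kb = 512;   /* a small 512kB limit, for illustration */

	    assert(!time_to_renegotiate(100 * 1024L, limit_kb));      /* well under */
	    assert(time_to_renegotiate(512 * 1024L - 512, limit_kb)); /* inside the 1kB window */
	    assert(!time_to_renegotiate(1000000, 0));                 /* 0 disables */

	    puts("ok");
	    return 0;
	}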
On Mon, Sep 23, 2013 at 4:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Here's an updated version; this mainly simplifies code, per comments
> from Andres (things were a bit too baroque in places due to the way the
> code had evolved, and I hadn't gone over it to simplify it).
>
> The only behavior change is that the renegotiation is requested 1kB
> before the limit is hit: the raise to 1% of the configured limit was
> removed.

What basis do we have for thinking that 1kB is definitely enough to avoid spurious disconnects? (I have a bad feeling that you're going to say something along the lines of "well, we tried it a bunch of times, and...".)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas escribió:
> On Mon, Sep 23, 2013 at 4:51 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > The only behavior change is that the renegotiation is requested 1kB
> > before the limit is hit: the raise to 1% of the configured limit was
> > removed.
>
> What basis do we have for thinking that 1kB is definitely enough to
> avoid spurious disconnects?

I noticed that the "count" variable (which is what we use to determine when to start the renegotiation and eventually kill the connection) is only incremented when there's successful SSL transmission: it doesn't count low-level network transmission. If OpenSSL returns a WANT_READ or WANT_WRITE error code, that variable is not incremented. The number of bytes returned does not include network data transmitted only to satisfy the renegotiation.

Sadly, with the OpenSSL codebase, there isn't much documented field experience to go by. Even something battle-tested such as Apache's mod_ssl gets this wrong; but apparently they don't care, because their sessions are normally so short-lived that they don't get these problems. Also, I spent several days trying to understand the OpenSSL codebase to figure out how this works, and I think there might be bugs in there too, at least with nonblocking sockets. I wasn't able to reproduce an actual failure, though. Funnily enough, their own test utilities do not stress this area too much (at least the ones they include in their release tarballs).

> (I have a bad feeling that you're going to say something along the
> lines of "well, we tried it a bunch of times, and...".)
Well, I did try a few times and saw no failure :-)

I have heard about processes in production environments that are
restarted periodically to avoid SSL failures which they blame on
renegotiation. Some other people run with ssl_renegotiation_limit=0
because they know it causes them network problems. I suggest we need to
get this patch out there so that they can test it; and if 1kB turns out
not to be sufficient, we will have field experience, including
appropriate error messages, on what is actually going on. (Right now,
the error messages we get complain about completely the wrong thing.)

I mean, if that 1kB limit is the only quarrel you have with this patch,
I'm happy.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 24, 2013 at 12:30 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I mean, if that 1kB limit is the only quarrel you have with this patch,
> I'm happy.

You should probably be happy, then. :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Since back-branch releases are getting closer, I would like to push
this to all supported branches. To avoid a compatibility nightmare in
case the new die-on-delayed-renegotiation behavior turns out not to be
so great, I think it would be OK to set the error level to WARNING in
all branches but master (and reset the byte count, to avoid filling the
log). I would also add a CONTEXT line with the current counter value
and the configured limit, and a HINT to report to pgsql-hackers. That
way we will hopefully have more info on problems in the field.

Anybody opposed to this?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 1, 2013 at 9:16 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Since back-branch releases are getting closer, I would like to push
> this to all supported branches. To avoid a compatibility nightmare in
> case the new die-on-delayed-renegotiation behavior turns out not to be
> so great, I think it would be OK to set the error level to WARNING in
> all branches but master (and reset the byte count, to avoid filling the
> log). I would also add a CONTEXT line with the current counter value
> and the configured limit, and a HINT to report to pgsql-hackers. That
> way we will hopefully have more info on problems in the field.
>
> Anybody opposed to this?

Yes, warnings suck. If things just failed, users would fix them, but
instead they fill up their hard disk, and then things fail much later,
usually when they are asleep in bed.

If we can't feel comfortable with an ERROR, let's not do it at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Oct 1, 2013 at 4:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 1, 2013 at 9:16 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Since back-branch releases are getting closer, I would like to push
>> this to all supported branches. To avoid a compatibility nightmare in
>> case the new die-on-delayed-renegotiation behavior turns out not to be
>> so great, I think it would be OK to set the error level to WARNING in
>> all branches but master (and reset the byte count, to avoid filling the
>> log). I would also add a CONTEXT line with the current counter value
>> and the configured limit, and a HINT to report to pgsql-hackers. That
>> way we will hopefully have more info on problems in the field.
>>
>> Anybody opposed to this?
>
> Yes, warnings suck. If things just failed, users would fix them, but
> instead they fill up their hard disk, and then things fail much later,
> usually when they are asleep in bed.
>
> If we can't feel comfortable with an ERROR, let's not do it at all.

In principle, I agree.

However, if we want to do this as a temporary measure to judge impact,
we could do WARNING now and flip it to ERROR in the next minor release.

However, do we think we'll actually *get* any reports if we do that? As
in, would we trust the input? If we do, then it might be worth doing
that. If we don't believe we'll get any input from the WARNINGs anyway,
we might as well flip it to an ERROR. But if we do flip it to an ERROR,
we should have a way to disable that error if, as Alvaro puts it, we
end up breaking too many things.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
On Tue, Oct 1, 2013 at 10:19 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> If we can't feel comfortable with an ERROR, let's not do it at all.
>
> In principle, I agree.
>
> However, if we want to do this as a temporary measure to judge impact,
> we could do WARNING now and flip it to ERROR in the next minor
> release.

I can't imagine that whacking the behavior around multiple times is
going to please very many people. And, from a practical standpoint, the
time between minor releases is typically on the order of ~3 months.
That's not very long. The situations in which trouble occurs are likely
to be obscure, and a lot of users don't apply every minor release, or
don't apply them in a timely fashion. So I can't see that this strategy
would increase our confidence very much anyway. In other words...

> However, do we think we'll actually *get* any reports if we do that?
> As in, would we trust the input?

...no.

> If we do, then it might be worth doing that. If we don't believe we'll
> get any input from the WARNINGs anyway, we might as well flip it to an
> ERROR. But if we do flip it to an ERROR, we should have a way to
> disable that error if, as Alvaro puts it, we end up breaking too many
> things.

A way of disabling the error seems like an awfully good idea. Since I
know my audience, I won't suggest the obvious method of accomplishing
that goal, but I think we all know what it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-10-01 10:27:14 -0400, Robert Haas wrote:
> On Tue, Oct 1, 2013 at 10:19 AM, Magnus Hagander <magnus@hagander.net> wrote:
> >> If we can't feel comfortable with an ERROR, let's not do it at all.
> >
> > In principle, I agree.
> >
> > However, if we want to do this as a temporary measure to judge impact,
> > we could do WARNING now and flip it to ERROR in the next minor
> > release.
>
> I can't imagine that whacking the behavior around multiple times is
> going to please very many people.

If we have to whack it around, it's because there's a bug in either our
usage of OpenSSL or in OpenSSL itself. Neither is particularly
unlikely, but it's not as if failing to error or warn will stop that.
The alternate reason for getting the WARNING/ERROR is that somebody is
MITMing the connection and explicitly corrupting renegotiation packets.
But that's a reason for making it an ERROR immediately, not for staying
silent. I think from the security POV it's pretty clear that we should
make it an error.

So, imo the question is: why are we uncomfortable making it an ERROR
immediately? I think the most likely cause of problems is users having
configured ssl_renegotiation_limit absurdly low, like less than what
the particular algorithms used actually need. What about clamping it to
1MB if != 0? We can't make it an actual error in the back branches...

> > If we do, then it might be worth doing that. If we don't believe we'll
> > get any input from the WARNINGs anyway, we might as well flip it to an
> > ERROR. But if we do flip it to an ERROR, we should have a way to
> > disable that error if, as Alvaro puts it, we end up breaking too many
> > things.
>
> A way of disabling the error seems like an awfully good idea. Since I
> know my audience, I won't suggest the obvious method of accomplishing
> that goal, but I think we all know what it is.

In which scenario would you want to do that? The way to prevent the
ERROR is to disable renegotiation entirely, and that's already
possible. Anything else seems like papering over security issues.

I guess I am voting for making renegotiation failure an ERROR
everywhere, silently clamping a nonzero ssl_renegotiation_limit to a
lower bound of 1MB in the back branches, and making values in the
(0, 1MB) range an error in HEAD.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 09/23/2013 10:51 PM, Alvaro Herrera wrote:
> + /* are we in the middle of a renegotiation? */
> + static bool in_ssl_renegotiation = false;
> +

Since this was committed, I'm getting the following warning:

be-secure.c:105:13: warning: ‘in_ssl_renegotiation’ defined but not used [-Wunused-variable]

-- 
Vik
Vik Fearing wrote:
> On 09/23/2013 10:51 PM, Alvaro Herrera wrote:
> > + /* are we in the middle of a renegotiation? */
> > + static bool in_ssl_renegotiation = false;
>
> Since this was committed, I'm getting the following warning:
>
> be-secure.c:105:13: warning: ‘in_ssl_renegotiation’ defined but not used
> [-Wunused-variable]

Jaime Casanova wrote:
> Shouldn't this new variable be declared inside the #ifdef USE_SSL block?
> My compiler is giving me a warning for the unused variable

Yep, thanks guys. Just pushed a fix.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
So I committed this patch without backpatching anything. There was some
discussion about the exact strategy for backpatching the behavior
change, but no final agreement; the suggestions were:

0. Backpatch as an ERROR. If it causes failures, people are supposed to
   change their apps or something.

1. Don't backpatch the ERROR bit at all, so that if the renegotiation
   fails we would silently continue just as currently.

2. Do spit out the message, but only as a WARNING. The risk is that this
   may end up causing log disks to fill up.

3. Add it as an ERROR, but make it possible to disable it, presumably
   via a new GUC. So people can see their security problems and
   hopefully fix them, but if they don't, then they can shut it up via
   server configuration. This would entail a GUC variable that exists
   in existing branches but not HEAD (we could avoid an upgradability
   problem of postgresql.conf files by having a no-op phantom GUC
   variable in HEAD).

I was reminded of this once more because I just saw a spurious
renegotiation failure in somebody's production setup. Kind of like a
recurring nightmare which I thought I had already eradicated.

Opinions? Also, should we wait longer for the new renegotiation code to
be more battle-tested?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> 1. Don't backpatch the ERROR bit at all, so that if the renegotiation
> fails we would silently continue just as currently.

I'm leaning towards the above at this point.

> I was reminded of this once more because I just saw a spurious
> renegotiation failure in somebody's production setup. Kind of like a
> recurring nightmare which I thought I had already eradicated.

I saw one yesterday. :(

> Opinions? Also, should we wait longer for the new renegotiation code to
> be more battle-tested?

I've got a better environment to test this in now, and given that I saw
it just yesterday, I'm very interested in addressing it. I grow tired
of seeing these renegotiation errors.

Thanks!

Stephen
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So I committed this patch without backpatching anything. ...
> ... should we wait longer for the new renegotiation code to
> be more battle-tested?

+1 to waiting awhile. I think if we don't see any problems in HEAD,
then back-patching as-is would be the best solution. The other
alternatives are essentially acknowledging that you're back-patching
something you're afraid isn't production-ready. Let's not go there.

Another reason I'm not in a hurry is that the problem we're trying to
solve doesn't seem to be causing real-world trouble. So by "awhile",
I'm thinking "let's let it get through 9.4 beta testing".

			regards, tom lane
On 2013-11-15 10:43:23 -0500, Tom Lane wrote:
> +1 to waiting awhile. I think if we don't see any problems in
> HEAD, then back-patching as-is would be the best solution.
> The other alternatives are essentially acknowledging that you're
> back-patching something you're afraid isn't production-ready.
> Let's not go there.

Agreed. Both on just backpatching it unchanged and on waiting for the
fix to prove itself a bit.

> Another reason I'm not in a hurry is that the problem we're trying
> to solve doesn't seem to be causing real-world trouble. So by
> "awhile", I'm thinking "let's let it get through 9.4 beta testing".

Well, there have been a bunch of customer complaints about it; afair
that's what made Alvaro look into it in the first place. So it's not a
victimless bug.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-11-15 10:43:23 -0500, Tom Lane wrote:
>> Another reason I'm not in a hurry is that the problem we're trying
>> to solve doesn't seem to be causing real-world trouble. So by
>> "awhile", I'm thinking "let's let it get through 9.4 beta testing".

> Well, there have been a bunch of customer complaints about it; afair
> that's what made Alvaro look into it in the first place. So it's not a
> victimless bug.

OK, then maybe end-of-beta is too long. But how much testing will it
get during development? I know I never use SSL on development installs.
How many hackers do?

			regards, tom lane
On 2013-11-15 10:58:19 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-11-15 10:43:23 -0500, Tom Lane wrote:
> >> Another reason I'm not in a hurry is that the problem we're trying
> >> to solve doesn't seem to be causing real-world trouble. So by
> >> "awhile", I'm thinking "let's let it get through 9.4 beta testing".
>
> > Well, there have been a bunch of customer complaints about it; afair
> > that's what made Alvaro look into it in the first place. So it's not
> > a victimless bug.
>
> OK, then maybe end-of-beta is too long. But how much testing will it get
> during development? I know I never use SSL on development installs.
> How many hackers do?

I guess few. And even fewer will actually have connections that live
long enough to experience renegotiations :/.

I wonder how hard it'd be to rig the buildfarm code to generate SSL
certificates and use them during installcheck. If we'd additionally set
a low renegotiation limit...

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 15, 2013 at 10:49 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Another reason I'm not in a hurry is that the problem we're trying
>> to solve doesn't seem to be causing real-world trouble. So by
>> "awhile", I'm thinking "let's let it get through 9.4 beta testing".
>
> Well, there have been a bunch of customer complaints about it; afair
> that's what made Alvaro look into it in the first place. So it's not a
> victimless bug.

Well, can any of those people try running with this patch? That'd be a
good way of getting some confidence in it.

Generally, I agree that something needs to be back-patched here. But we
don't want to create a situation where we fix some people and break
others, and it's not too obvious that we have a way to get there.
Personally, I favor adding some kind of GUC to control the behavior,
but I'm not exactly sure what the shape of it ought to be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-11-15 10:43:23 -0500, Tom Lane wrote:
> >> Another reason I'm not in a hurry is that the problem we're trying
> >> to solve doesn't seem to be causing real-world trouble. So by
> >> "awhile", I'm thinking "let's let it get through 9.4 beta testing".
>
> > Well, there have been a bunch of customer complaints about it; afair
> > that's what made Alvaro look into it in the first place. So it's not
> > a victimless bug.
>
> OK, then maybe end-of-beta is too long. But how much testing will it get
> during development? I know I never use SSL on development installs.
> How many hackers do?

Just a reminder that I intend to backpatch this (and subsequent fixes).
We've gone through two 9.4 betas now. Maybe it'd be a good thing if the
beta3 announcement carried a note about enabling SSL with a low
ssl_renegotiation_limit setting.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Aug 25, 2014 at 11:46:13PM -0400, Alvaro Herrera wrote:
> Tom Lane wrote:
> > OK, then maybe end-of-beta is too long. But how much testing will it
> > get during development? I know I never use SSL on development
> > installs. How many hackers do?
>
> Just a reminder that I intend to backpatch this (and subsequent fixes).
> We've gone through two 9.4 betas now. Maybe it'd be a good thing if the
> beta3 announcement carried a note about enabling SSL with a low
> ssl_renegotiation_limit setting.

To elaborate on my private comments of 2013-10-11, I share Robert's
wariness[1] concerning the magic number of 1024 bytes of renegotiation
headroom. Use of that number predates your work, but your work turned
exhaustion of that headroom into a FATAL error. Situations where the
figure is too small will become disruptive, whereas the problem is
nearly invisible today. Network congestion is a factor, so the lack of
complaints during beta is relatively uninformative. Disabling
renegotiation is a quick workaround, fortunately, but needing to use
that workaround will damage users' fragile faith in the safety of our
minor releases. My recommendation is to either keep this 9.4-only or do
focused load testing to determine the actual worst-case headroom
requirement.

[1] http://www.postgresql.org/message-id/CA+TgmoZVGmyZLx7e4ARq_5nu4uDeN7wrvg1xJXg_o9c61CHu3g@mail.gmail.com