Thread: Improve errors when setting incorrect bounds for SSL protocols
Hi all, (Daniel G. in CC.) As discussed on the thread to be able to set the min/max SSL protocols with libpq, when mixing incorrect bounds the user experience is not that good: https://www.postgresql.org/message-id/9CFA34EE-F670-419D-B92C-CB7943A27573@yesql.se It happens that the error generated with incorrect combinations depends solely on what OpenSSL thinks is fine, and that's the following: psql: error: could not connect to server: SSL error: tlsv1 alert internal error It is hard for users to understand what such an error means and how to act on it. Please note that OpenSSL 1.1.0 has added two routines to be able to get the min/max protocols set in a context, called SSL_CTX_get_min/max_proto_version. Thinking about older versions of OpenSSL I think that it is better to use ssl_protocol_version_to_openssl to do the parsing work. I also found that it is easier to check for compatible versions after setting both bounds in the SSL context, so as there is no need to worry about invalid values depending on the build of OpenSSL used. So attached is a patch to improve the detection of incorrect combinations. Once applied, we get a complain about an incorrect version at server startup (FATAL) or reload (LOG). The patch includes new regression tests. Thoughts? -- Michael
Attachment
> On 14 Jan 2020, at 04:54, Michael Paquier <michael@paquier.xyz> wrote: > > Hi all, > (Daniel G. in CC.) > > As discussed on the thread to be able to set the min/max SSL protocols > with libpq, when mixing incorrect bounds the user experience is not > that good: > https://www.postgresql.org/message-id/9CFA34EE-F670-419D-B92C-CB7943A27573@yesql.se > > It happens that the error generated with incorrect combinations > depends solely on what OpenSSL thinks is fine, and that's the > following: > psql: error: could not connect to server: SSL error: tlsv1 alert > internal error > > It is hard for users to understand what such an error means and how to > act on it. Correct, it's an easy mistake to make but based on the error it might take some time to figure it out. > Please note that OpenSSL 1.1.0 has added two routines to be able to > get the min/max protocols set in a context, called > SSL_CTX_get_min/max_proto_version. Thinking about older versions of > OpenSSL I think that it is better to use > ssl_protocol_version_to_openssl to do the parsing work. I also found > that it is easier to check for compatible versions after setting both > bounds in the SSL context, so as there is no need to worry about > invalid values depending on the build of OpenSSL used. I'm not convinced that it's a good idea to check for incompatible protocol range in the OpenSSL backend. We've spent a lot of energy to make the TLS code library agnostic and pluggable, and since identifying a basic configuration error isn't OpenSSL specific I think it should be in the guc code. That would keep the layering as well as ensure that we don't mistakenly treat this differently should we get a second TLS backend. cheers ./daniel
On Tue, Jan 14, 2020 at 11:21:53AM +0100, Daniel Gustafsson wrote: > On 14 Jan 2020, at 04:54, Michael Paquier <michael@paquier.xyz> wrote: >> Please note that OpenSSL 1.1.0 has added two routines to be able to >> get the min/max protocols set in a context, called >> SSL_CTX_get_min/max_proto_version. Thinking about older versions of >> OpenSSL I think that it is better to use >> ssl_protocol_version_to_openssl to do the parsing work. I also found >> that it is easier to check for compatible versions after setting both >> bounds in the SSL context, so as there is no need to worry about >> invalid values depending on the build of OpenSSL used. > > I'm not convinced that it's a good idea to check for incompatible protocol > range in the OpenSSL backend. We've spent a lot of energy to make the TLS code > library agnostic and pluggable, and since identifying a basic configuration > error isn't OpenSSL specific I think it should be in the guc code. That would > keep the layering as well as ensure that we don't mistakenly treat this > differently should we get a second TLS backend. Good points. And the get routines are not that portable in OpenSSL either even if HEAD supports 1.0.1 and newer versions... Attached is an updated patch which uses a GUC check for both parameters, and provides a hint on top of the original error message. The SSL context does not get reloaded if there is an error, so the errors from OpenSSL cannot be triggered as far as I checked (after mixing a couple of corrent and incorrect combinations manually). -- Michael
Attachment
> On 15 Jan 2020, at 03:28, Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jan 14, 2020 at 11:21:53AM +0100, Daniel Gustafsson wrote: >> On 14 Jan 2020, at 04:54, Michael Paquier <michael@paquier.xyz> wrote: >>> Please note that OpenSSL 1.1.0 has added two routines to be able to >>> get the min/max protocols set in a context, called >>> SSL_CTX_get_min/max_proto_version. Thinking about older versions of >>> OpenSSL I think that it is better to use >>> ssl_protocol_version_to_openssl to do the parsing work. I also found >>> that it is easier to check for compatible versions after setting both >>> bounds in the SSL context, so as there is no need to worry about >>> invalid values depending on the build of OpenSSL used. >> >> I'm not convinced that it's a good idea to check for incompatible protocol >> range in the OpenSSL backend. We've spent a lot of energy to make the TLS code >> library agnostic and pluggable, and since identifying a basic configuration >> error isn't OpenSSL specific I think it should be in the guc code. That would >> keep the layering as well as ensure that we don't mistakenly treat this >> differently should we get a second TLS backend. > > Good points. And the get routines are not that portable in OpenSSL > either even if HEAD supports 1.0.1 and newer versions... Attached is > an updated patch which uses a GUC check for both parameters, and > provides a hint on top of the original error message. The SSL context > does not get reloaded if there is an error, so the errors from OpenSSL > cannot be triggered as far as I checked (after mixing a couple of > corrent and incorrect combinations manually). This is pretty much exactly the patch I was intending to write for this, so +1 from me. cheers ./daniel
On Wed, Jan 15, 2020 at 06:34:39PM +0100, Daniel Gustafsson wrote: > This is pretty much exactly the patch I was intending to write for this, so +1 > from me. Thanks for the review. Let's wait a couple of days to see if others have objections or more comments about this patch, but I'd like to fix the issue and backpatch down to 12 where the parameters have been introduced. -- Michael
Attachment
On Thu, Jan 16, 2020 at 10:00:52AM +0900, Michael Paquier wrote: > Thanks for the review. Let's wait a couple of days to see if others > have objections or more comments about this patch, but I'd like to > fix the issue and backpatch down to 12 where the parameters have been > introduced. And committed. -- Michael
Attachment
On 2020-01-15 03:28, Michael Paquier wrote: > Good points. And the get routines are not that portable in OpenSSL > either even if HEAD supports 1.0.1 and newer versions... Attached is > an updated patch which uses a GUC check for both parameters, and > provides a hint on top of the original error message. The SSL context > does not get reloaded if there is an error, so the errors from OpenSSL > cannot be triggered as far as I checked (after mixing a couple of > corrent and incorrect combinations manually). The reason this wasn't done originally is that it is not correct to have GUC check hooks that refer to other GUC variables, because otherwise you get inconsistent behavior depending on the order of processing of the assignments. In this case, I think it would work because you have symmetric checks for both variables, but in general it is a problematic strategy. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Michael Paquier <michael@paquier.xyz> writes: > On Thu, Jan 16, 2020 at 10:00:52AM +0900, Michael Paquier wrote: >> Thanks for the review. Let's wait a couple of days to see if others >> have objections or more comments about this patch, but I'd like to >> fix the issue and backpatch down to 12 where the parameters have been >> introduced. > And committed. I just happened to look at this patch while working on the release notes. I think this is a bad idea and very probably creates worse problems than it fixes. As we have learned painfully in the past, you can't have GUC check or assign hooks that look at other GUC variables, because that creates order-of-operations problems. If a postgresql.conf update is trying to change both values (hardly an unlikely scenario, for this pair of variables) then the checks are going to be comparing against the old values of the other variables, leading to either incorrect rejections of valid states or incorrect acceptances of invalid states. It's pure accident that the particular cases tested in the regression tests behave sanely. I think this should be reverted. Perhaps there's a way to do it without these problems, but we failed to find one in the past. regards, tom lane
> On 6 Feb 2020, at 20:04, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think this should be reverted. Perhaps there's a way to do it without > these problems, but we failed to find one in the past. Or change to the v1 patch in this thread, which avoids the problem by doing it in the OpenSSL code. It's a shame to have generic TLS functionality be OpenSSL specific when everything else TLS has been abstracted, but not working is clearly a worse option. cheers ./daniel
On Thu, Feb 06, 2020 at 11:30:40PM +0100, Daniel Gustafsson wrote: > Or change to the v1 patch in this thread, which avoids the problem by doing it > in the OpenSSL code. It's a shame to have generic TLS functionality be OpenSSL > specific when everything else TLS has been abstracted, but not working is > clearly a worse option. The v1 would work just fine considering that, as the code would be invoked in a context where all GUCs are already loaded. That's too late before the release though, so I have reverted 41aadee, and attached is a new patch to consider with improvements compared to v1 mainly in the error messages. -- Michael
Attachment
> On 7 Feb 2020, at 01:33, Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Feb 06, 2020 at 11:30:40PM +0100, Daniel Gustafsson wrote: >> Or change to the v1 patch in this thread, which avoids the problem by doing it >> in the OpenSSL code. It's a shame to have generic TLS functionality be OpenSSL >> specific when everything else TLS has been abstracted, but not working is >> clearly a worse option. > > The v1 would work just fine considering that, as the code would be > invoked in a context where all GUCs are already loaded. That's too > late before the release though, so I have reverted 41aadee, and > attached is a new patch to consider with improvements compared to v1 > mainly in the error messages. Having gone back to look at this, I can't think of a better way to implement this and I think we should go ahead with the proposed patch. In this message we aren't quoting the TLS protocol setting: + (errmsg("%s setting %s not supported by this build", ..but in this detail we are: + errdetail("\"%s\" cannot be higher than \"%s\"", Perhaps we should be consistent across all ereports? Marking as ready for committer. cheers ./daniel
On Thu, Mar 19, 2020 at 10:54:35PM +0100, Daniel Gustafsson wrote: > In this message we aren't quoting the TLS protocol setting: > + (errmsg("%s setting %s not supported by this build", > ..but in this detail we are: > + errdetail("\"%s\" cannot be higher than \"%s\"", > Perhaps we should be consistent across all ereports? Right. Using quotes is a more popular style when it comes to GUC parameters and their values, so switched to use that, and committed the patch. Thanks for the review. -- Michael
Attachment
Working in the TLS corners of the backend, I found while re-reviewing and re-testing for the release that this patch actually was a small, but vital, brick shy of a load. The error handling is always invoked due to a set of missing braces. Going into the check will cause the context to be freed and be_tls_open_server error out. The tests added narrowly escapes it by not setting the max version in the final test, but I'm not sure it's worth changing that now as not setting a value is an interesting testcase too. Sorry for missing that at the time of reviewing. cheers ./daniel
Attachment
On Wed, Apr 29, 2020 at 01:57:49PM +0200, Daniel Gustafsson wrote: > Working in the TLS corners of the backend, I found while re-reviewing and > re-testing for the release that this patch actually was a small, but vital, > brick shy of a load. The error handling is always invoked due to a set of > missing braces. Going into the check will cause the context to be freed and > be_tls_open_server error out. The tests added narrowly escapes it by not > setting the max version in the final test, but I'm not sure it's worth changing > that now as not setting a value is an interesting testcase too. Sorry for > missing that at the time of reviewing. Good catch, fixed. We would still have keep around the SSL old context if both bounds were set. Testing this case would mean one extra full restart of the server, and I am not sure either if that's worth the extra cost here. -- Michael
Attachment
> On 30 Apr 2020, at 01:14, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Apr 29, 2020 at 01:57:49PM +0200, Daniel Gustafsson wrote: >> Working in the TLS corners of the backend, I found while re-reviewing and >> re-testing for the release that this patch actually was a small, but vital, >> brick shy of a load. The error handling is always invoked due to a set of >> missing braces. Going into the check will cause the context to be freed and >> be_tls_open_server error out. The tests added narrowly escapes it by not >> setting the max version in the final test, but I'm not sure it's worth changing >> that now as not setting a value is an interesting testcase too. Sorry for >> missing that at the time of reviewing. > > Good catch, fixed. We would still have keep around the SSL old > context if both bounds were set. Testing this case would mean one > extra full restart of the server, and I am not sure either if that's > worth the extra cost here. Agreed. I don't think the cost is warranted given the low probability of new errors around here, so I think the commit as it stands is sufficient. Thanks. cheers ./daniel