Thread: Improve errors when setting incorrect bounds for SSL protocols

Improve errors when setting incorrect bounds for SSL protocols

From

Michael Paquier

Date:

14 January 2020, 06:54:20

Hi all,
(Daniel G. in CC.)

As discussed on the thread to be able to set the min/max SSL protocols
with libpq, when mixing incorrect bounds the user experience is not
that good:
https://www.postgresql.org/message-id/9CFA34EE-F670-419D-B92C-CB7943A27573@yesql.se

It happens that the error generated with incorrect combinations
depends solely on what OpenSSL thinks is fine, and that's the
following:
psql: error: could not connect to server: SSL error: tlsv1 alert
internal error

It is hard for users to understand what such an error means and how to
act on it.

Please note that OpenSSL 1.1.0 has added two routines to be able to
get the min/max protocols set in a context, called
SSL_CTX_get_min/max_proto_version.  Thinking about older versions of
OpenSSL I think that it is better to use
ssl_protocol_version_to_openssl to do the parsing work.  I also found
that it is easier to check for compatible versions after setting both
bounds in the SSL context, so as there is no need to worry about
invalid values depending on the build of OpenSSL used.

So attached is a patch to improve the detection of incorrect
combinations.  Once applied, we get a complain about an incorrect
version at server startup (FATAL) or reload (LOG).  The patch includes
new regression tests.

Thoughts?
--
Michael

Attachment

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Daniel Gustafsson

Date:

14 January 2020, 13:21:53

> On 14 Jan 2020, at 04:54, Michael Paquier <michael@paquier.xyz> wrote:
>
> Hi all,
> (Daniel G. in CC.)
>
> As discussed on the thread to be able to set the min/max SSL protocols
> with libpq, when mixing incorrect bounds the user experience is not
> that good:
> https://www.postgresql.org/message-id/9CFA34EE-F670-419D-B92C-CB7943A27573@yesql.se
>
> It happens that the error generated with incorrect combinations
> depends solely on what OpenSSL thinks is fine, and that's the
> following:
> psql: error: could not connect to server: SSL error: tlsv1 alert
> internal error
>
> It is hard for users to understand what such an error means and how to
> act on it.

Correct, it's an easy mistake to make but based on the error it might take some
time to figure it out.

> Please note that OpenSSL 1.1.0 has added two routines to be able to
> get the min/max protocols set in a context, called
> SSL_CTX_get_min/max_proto_version.  Thinking about older versions of
> OpenSSL I think that it is better to use
> ssl_protocol_version_to_openssl to do the parsing work.  I also found
> that it is easier to check for compatible versions after setting both
> bounds in the SSL context, so as there is no need to worry about
> invalid values depending on the build of OpenSSL used.

I'm not convinced that it's a good idea to check for incompatible protocol
range in the OpenSSL backend.  We've spent a lot of energy to make the TLS code
library agnostic and pluggable, and since identifying a basic configuration
error isn't OpenSSL specific I think it should be in the guc code.  That would
keep the layering as well as ensure that we don't mistakenly treat this
differently should we get a second TLS backend.

cheers ./daniel

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Michael Paquier

Date:

15 January 2020, 05:28:05

On Tue, Jan 14, 2020 at 11:21:53AM +0100, Daniel Gustafsson wrote:
> On 14 Jan 2020, at 04:54, Michael Paquier <michael@paquier.xyz> wrote:
>> Please note that OpenSSL 1.1.0 has added two routines to be able to
>> get the min/max protocols set in a context, called
>> SSL_CTX_get_min/max_proto_version.  Thinking about older versions of
>> OpenSSL I think that it is better to use
>> ssl_protocol_version_to_openssl to do the parsing work.  I also found
>> that it is easier to check for compatible versions after setting both
>> bounds in the SSL context, so as there is no need to worry about
>> invalid values depending on the build of OpenSSL used.
>
> I'm not convinced that it's a good idea to check for incompatible protocol
> range in the OpenSSL backend.  We've spent a lot of energy to make the TLS code
> library agnostic and pluggable, and since identifying a basic configuration
> error isn't OpenSSL specific I think it should be in the guc code.  That would
> keep the layering as well as ensure that we don't mistakenly treat this
> differently should we get a second TLS backend.

Good points.  And the get routines are not that portable in OpenSSL
either even if HEAD supports 1.0.1 and newer versions...  Attached is
an updated patch which uses a GUC check for both parameters, and
provides a hint on top of the original error message.  The SSL context
does not get reloaded if there is an error, so the errors from OpenSSL
cannot be triggered as far as I checked (after mixing a couple of
corrent and incorrect combinations manually).
--
Michael

Attachment

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Daniel Gustafsson

Date:

15 January 2020, 20:34:39

> On 15 Jan 2020, at 03:28, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jan 14, 2020 at 11:21:53AM +0100, Daniel Gustafsson wrote:
>> On 14 Jan 2020, at 04:54, Michael Paquier <michael@paquier.xyz> wrote:
>>> Please note that OpenSSL 1.1.0 has added two routines to be able to
>>> get the min/max protocols set in a context, called
>>> SSL_CTX_get_min/max_proto_version.  Thinking about older versions of
>>> OpenSSL I think that it is better to use
>>> ssl_protocol_version_to_openssl to do the parsing work.  I also found
>>> that it is easier to check for compatible versions after setting both
>>> bounds in the SSL context, so as there is no need to worry about
>>> invalid values depending on the build of OpenSSL used.
>>
>> I'm not convinced that it's a good idea to check for incompatible protocol
>> range in the OpenSSL backend.  We've spent a lot of energy to make the TLS code
>> library agnostic and pluggable, and since identifying a basic configuration
>> error isn't OpenSSL specific I think it should be in the guc code.  That would
>> keep the layering as well as ensure that we don't mistakenly treat this
>> differently should we get a second TLS backend.
>
> Good points.  And the get routines are not that portable in OpenSSL
> either even if HEAD supports 1.0.1 and newer versions...  Attached is
> an updated patch which uses a GUC check for both parameters, and
> provides a hint on top of the original error message.  The SSL context
> does not get reloaded if there is an error, so the errors from OpenSSL
> cannot be triggered as far as I checked (after mixing a couple of
> corrent and incorrect combinations manually).

This is pretty much exactly the patch I was intending to write for this, so +1
from me.

cheers ./daniel

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Michael Paquier

Date:

16 January 2020, 04:00:52

On Wed, Jan 15, 2020 at 06:34:39PM +0100, Daniel Gustafsson wrote:
> This is pretty much exactly the patch I was intending to write for this, so +1
> from me.

Thanks for the review.  Let's wait a couple of days to see if others
have objections or more comments about this patch, but I'd like to
fix the issue and backpatch down to 12 where the parameters have been
introduced.
--
Michael

Attachment

signature.asc

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Michael Paquier

Date:

18 January 2020, 06:36:20

On Thu, Jan 16, 2020 at 10:00:52AM +0900, Michael Paquier wrote:
> Thanks for the review.  Let's wait a couple of days to see if others
> have objections or more comments about this patch, but I'd like to
> fix the issue and backpatch down to 12 where the parameters have been
> introduced.

And committed.
--
Michael

Attachment

signature.asc

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Peter Eisentraut

Date:

20 January 2020, 11:11:30

On 2020-01-15 03:28, Michael Paquier wrote:
> Good points.  And the get routines are not that portable in OpenSSL
> either even if HEAD supports 1.0.1 and newer versions...  Attached is
> an updated patch which uses a GUC check for both parameters, and
> provides a hint on top of the original error message.  The SSL context
> does not get reloaded if there is an error, so the errors from OpenSSL
> cannot be triggered as far as I checked (after mixing a couple of
> corrent and incorrect combinations manually).

The reason this wasn't done originally is that it is not correct to have 
GUC check hooks that refer to other GUC variables, because otherwise you 
get inconsistent behavior depending on the order of processing of the 
assignments.  In this case, I think it would work because you have 
symmetric checks for both variables, but in general it is a problematic 
strategy.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Tom Lane

Date:

06 February 2020, 22:04:53

Michael Paquier <michael@paquier.xyz> writes:
> On Thu, Jan 16, 2020 at 10:00:52AM +0900, Michael Paquier wrote:
>> Thanks for the review.  Let's wait a couple of days to see if others
>> have objections or more comments about this patch, but I'd like to
>> fix the issue and backpatch down to 12 where the parameters have been
>> introduced.

> And committed.

I just happened to look at this patch while working on the release notes.
I think this is a bad idea and very probably creates worse problems than
it fixes.  As we have learned painfully in the past, you can't have GUC
check or assign hooks that look at other GUC variables, because that
creates order-of-operations problems.  If a postgresql.conf update is
trying to change both values (hardly an unlikely scenario, for this
pair of variables) then the checks are going to be comparing against the
old values of the other variables, leading to either incorrect rejections
of valid states or incorrect acceptances of invalid states.  It's pure
accident that the particular cases tested in the regression tests behave
sanely.

I think this should be reverted.  Perhaps there's a way to do it without
these problems, but we failed to find one in the past.

            regards, tom lane

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Daniel Gustafsson

Date:

07 February 2020, 01:30:40

> On 6 Feb 2020, at 20:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think this should be reverted.  Perhaps there's a way to do it without
> these problems, but we failed to find one in the past.

Or change to the v1 patch in this thread, which avoids the problem by doing it
in the OpenSSL code.  It's a shame to have generic TLS functionality be OpenSSL
specific when everything else TLS has been abstracted, but not working is
clearly a worse option.

cheers ./daniel

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Michael Paquier

Date:

07 February 2020, 03:33:35

On Thu, Feb 06, 2020 at 11:30:40PM +0100, Daniel Gustafsson wrote:
> Or change to the v1 patch in this thread, which avoids the problem by doing it
> in the OpenSSL code.  It's a shame to have generic TLS functionality be OpenSSL
> specific when everything else TLS has been abstracted, but not working is
> clearly a worse option.

The v1 would work just fine considering that, as the code would be
invoked in a context where all GUCs are already loaded.  That's too
late before the release though, so I have reverted 41aadee, and
attached is a new patch to consider with improvements compared to v1
mainly in the error messages.
--
Michael

Attachment

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Daniel Gustafsson

Date:

20 March 2020, 00:54:35

> On 7 Feb 2020, at 01:33, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Feb 06, 2020 at 11:30:40PM +0100, Daniel Gustafsson wrote:
>> Or change to the v1 patch in this thread, which avoids the problem by doing it
>> in the OpenSSL code.  It's a shame to have generic TLS functionality be OpenSSL
>> specific when everything else TLS has been abstracted, but not working is
>> clearly a worse option.
>
> The v1 would work just fine considering that, as the code would be
> invoked in a context where all GUCs are already loaded.  That's too
> late before the release though, so I have reverted 41aadee, and
> attached is a new patch to consider with improvements compared to v1
> mainly in the error messages.

Having gone back to look at this, I can't think of a better way to implement
this and I think we should go ahead with the proposed patch.

In this message we aren't quoting the TLS protocol setting:
+          (errmsg("%s setting %s not supported by this build",
..but in this detail we are:
+           errdetail("\"%s\" cannot be higher than \"%s\"",
Perhaps we should be consistent across all ereports?

Marking as ready for committer.

cheers ./daniel

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Michael Paquier

Date:

23 March 2020, 05:06:45

On Thu, Mar 19, 2020 at 10:54:35PM +0100, Daniel Gustafsson wrote:
> In this message we aren't quoting the TLS protocol setting:
> +          (errmsg("%s setting %s not supported by this build",
> ..but in this detail we are:
> +           errdetail("\"%s\" cannot be higher than \"%s\"",
> Perhaps we should be consistent across all ereports?

Right.  Using quotes is a more popular style when it comes to GUC
parameters and their values, so switched to use that, and committed
the patch.  Thanks for the review.
--
Michael

Attachment

signature.asc

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Daniel Gustafsson

Date:

29 April 2020, 14:57:49

Working in the TLS corners of the backend, I found while re-reviewing and
re-testing for the release that this patch actually was a small, but vital,
brick shy of a load.  The error handling is always invoked due to a set of
missing braces.  Going into the check will cause the context to be freed and
be_tls_open_server error out.  The tests added narrowly escapes it by not
setting the max version in the final test, but I'm not sure it's worth changing
that now as not setting a value is an interesting testcase too.  Sorry for
missing that at the time of reviewing.

cheers ./daniel

Attachment

minmaxproto_guc.patch

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Michael Paquier

Date:

30 April 2020, 02:14:55

On Wed, Apr 29, 2020 at 01:57:49PM +0200, Daniel Gustafsson wrote:
> Working in the TLS corners of the backend, I found while re-reviewing and
> re-testing for the release that this patch actually was a small, but vital,
> brick shy of a load.  The error handling is always invoked due to a set of
> missing braces.  Going into the check will cause the context to be freed and
> be_tls_open_server error out.  The tests added narrowly escapes it by not
> setting the max version in the final test, but I'm not sure it's worth changing
> that now as not setting a value is an interesting testcase too.  Sorry for
> missing that at the time of reviewing.

Good catch, fixed.  We would still have keep around the SSL old
context if both bounds were set.  Testing this case would mean one
extra full restart of the server, and I am not sure either if that's
worth the extra cost here.
--
Michael

Attachment

signature.asc

Re: Improve errors when setting incorrect bounds for SSL protocols

From

Daniel Gustafsson

Date:

30 April 2020, 10:51:06

> On 30 Apr 2020, at 01:14, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Apr 29, 2020 at 01:57:49PM +0200, Daniel Gustafsson wrote:
>> Working in the TLS corners of the backend, I found while re-reviewing and
>> re-testing for the release that this patch actually was a small, but vital,
>> brick shy of a load.  The error handling is always invoked due to a set of
>> missing braces.  Going into the check will cause the context to be freed and
>> be_tls_open_server error out.  The tests added narrowly escapes it by not
>> setting the max version in the final test, but I'm not sure it's worth changing
>> that now as not setting a value is an interesting testcase too.  Sorry for
>> missing that at the time of reviewing.
>
> Good catch, fixed.  We would still have keep around the SSL old
> context if both bounds were set.  Testing this case would mean one
> extra full restart of the server, and I am not sure either if that's
> worth the extra cost here.

Agreed.  I don't think the cost is warranted given the low probability of new
errors around here, so I think the commit as it stands is sufficient.  Thanks.

cheers ./daniel