Re: Support for NSS as a libpq TLS backend - Mailing list pgsql-hackers

From Daniel Gustafsson
Subject Re: Support for NSS as a libpq TLS backend
Date
Msg-id 051B5E92-066F-4079-8C40-01009CFEBC4B@yesql.se
Whole thread Raw
In response to Re: Support for NSS as a libpq TLS backend  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Support for NSS as a libpq TLS backend  (Jacob Champion <pchampion@vmware.com>)
List pgsql-hackers
> On 23 Mar 2021, at 20:04, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Daniel Gustafsson (daniel@yesql.se) wrote:
>>> On 22 Mar 2021, at 00:49, Stephen Frost <sfrost@snowman.net> wrote:
>>
>> Thanks for the review!  Below is a partial response, I haven't had time to
>> address all your review comments yet but I wanted to submit a rebased patchset
>> directly since the current version doesn't work after recent changes in the
>> tree. I will address the remaining comments tomorrow or the day after.
>
> Great, thanks!
>
>> This rebase also includes a fix for pgtls_init which was sent offlist by Jacob.
>> The changes in pgtls_init can potentially be used to initialize the crypto
>> context for NSS to clean up this patch, Jacob is currently looking at that.
>
> Ah, cool, sounds good.
>
>>> They aren't the same and it might not be
>>> clear what's going on if one was to somehow mix them (at least if pr_fd
>>> continues to sometimes be a void*, but I wonder why that's being
>>> done..?  more on that later..).
>>
>> To paraphrase from a later in this email, there are collisions between nspr and
>> postgres on things like BITS_PER_BYTE, and there were also collisions on basic
>> types until I learned about NO_NSPR_10_SUPPORT.  By moving the juggling of this
>> into common/nss.h we can use proper types without introducing that pollution
>> everywhere. I will address these places.
>
> Ah, ok, and great, that sounds good.

>>>> +    /*
>>>> +     * Set the fallback versions for the TLS protocol version range to a
>>>> +     * combination of our minimal requirement and the library maximum. Error
>>>> +     * messages should be kept identical to those in be-secure-openssl.c to
>>>> +     * make translations easier.
>>>> +     */
>>>
>>> Should we pull these error messages out into another header so that
>>> they're in one place to make sure they're kept consistent, if we really
>>> want to put the effort in to keep them the same..?  I'm not 100% sure
>>> that it's actually necessary to do so, but defining these in one place
>>> would help maintain this if we want to.  Also alright with just keeping
>>> the comment, not that big of a deal.
>>
>> It might make sense to pull them into common/nss.h, but seeing the error
>> message right there when reading the code does IMO make it clearer so it's a
>> doubleedged sword.  Not sure what is the best option, but I'm not married to
>> the current solution so if there is consensus to pull them out somewhere I'm
>> happy to do so.
>
> My thought was to put them into some common/ssl.h or something along
> those lines but I don't see it as a big deal either way really.  You
> make a good point that having the error message there when reading the
> code is nice.

Thinking more on this, I think my vote will be to keep them duplicated in the
code for readability.  Unless there are strong feelings against I think we at
least should start there.

>>> Maybe we should put a stake in the ground that says "we only support
>>> back to version X of NSS", test with that and a few more recent versions
>>> and the most recent, and then rip out anything that's needed for
>>> versions which are older than that?
>>
>> Yes, right now there is very little in the patch which caters for old versions,
>> the PR_Init call might be one of the few offenders.  There has been discussion
>> upthread about settling for a required version, combining the insights learned
>> there with a survey of which versions are commonly available packaged.
>>
>> Once we settle on a version we can confirm if PR_Init is/isn't needed and
>> remove all traces of it if not.
>
> I don't really see this as all that hard to do- I'd suggest we look at
> what systems someone might reasonably deploy v14 on.  To that end, I'd
> say "only systems which are presently supported", so: RHEL7+, Debian 9+,
> Ubuntu 16.04+.

Sounds reasonable.

> Looking at those, I see:
>
> Ubuntu 16.04: 3.28.4
> RHEL6: v3.28.4
> Debian: 3.26.2

I assume these have matching NSPR versions placing the Debian 9 NSPR package as
the lowest required version for that?

>>> I have a pretty hard time imagining that someone is going to want to build PG
>>> v14 w/ NSS 2.0 ...
>>
>> Let alone compiling 2.0 at all on a recent system..
>
> Indeed, and given the above, it seems entirely reasonable to make the
> requirement be NSS v3+, no?  I wouldn't be against making that even
> tighter if we thought it made sense to do so.

I think anything but doing that would be incredibly unreasonable.

>>> Also- we don't seem to complain at all about a cipher being specified that we
>>> don't find?  Guess I would think that we might want to throw a WARNING in such
>>> a case, but I could possibly be convinced otherwise.
>>
>> No, I think you're right, we should throw WARNING there or possibly even a
>> higher elevel. Should that be a COMMERROR even?
>
> I suppose the thought I was having was that we might want to allow some
> string that covered all the OpenSSL and NSS ciphers that someone feels
> comfortable with and we'd just ignore the ones that don't make sense for
> the particular library we're currently built with.  Making it a
> COMMERROR seems like overkill and I'm not entirely sure we actually want
> any warning since we might then be constantly bleating about it.

Right, with a string like that we'd induce WARNING fatigue quickly.  Catching
the case of *no* ciphers enabled with a COMMERROR is going some way towards
being helpful to the user in debugging the failed connection here.

>>>> +pg_ssl_read(PRFileDesc *fd, void *buf, PRInt32 amount, PRIntn flags,
>>>> +            PRIntervalTime timeout)
>>>> +{
>>>> +    PRRecvFN    read_fn;
>>>> +    PRInt32        n_read;
>>>> +
>>>> +    read_fn = fd->lower->methods->recv;
>>>> +    n_read = read_fn(fd->lower, buf, amount, flags, timeout);
>>>> +
>>>> +    return n_read;
>>>> +}
>>>> +
>>>> +static PRInt32
>>>> +pg_ssl_write(PRFileDesc *fd, const void *buf, PRInt32 amount, PRIntn flags,
>>>> +             PRIntervalTime timeout)
>>>> +{
>>>> +    PRSendFN    send_fn;
>>>> +    PRInt32        n_write;
>>>> +
>>>> +    send_fn = fd->lower->methods->send;
>>>> +    n_write = send_fn(fd->lower, buf, amount, flags, timeout);
>>>> +
>>>> +    return n_write;
>>>> +}
>>>> +
>>>> +static PRStatus
>>>> +pg_ssl_close(PRFileDesc *fd)
>>>> +{
>>>> +    /*
>>>> +     * Disconnect our private Port from the fd before closing out the stack.
>>>> +     * (Debug builds of NSPR will assert if we do not.)
>>>> +     */
>>>> +    fd->secret = NULL;
>>>> +    return PR_GetDefaultIOMethods()->close(fd);
>>>> +}
>>>
>>> Regarding these, I find myself wondering how they're different from the
>>> defaults..?  I mean, the above just directly called
>>> PR_GetDefaultIOMethods() to then call it's close() function- are the
>>> fd->lower_methods->recv/send not the default methods?  I don't quite get
>>> what the point is from having our own callbacks here if they just do
>>> exactly what the defaults would do (or are there actually no defined
>>> defaults and you have to provide these..?).
>>
>> It's really just to cope with debug builds of NSPR which assert that fd->secret
>> is null before closing.
>
> And we have to override the recv/send functions for this too..?  Sorry,
> my comment wasn't just about the close() method but about the others
> too.

Ah, no we can ditch the .send and .recv functions and stick with the default
built-ins, I just confirmed this and removed them.  I think they are leftovers
from when I injected debug code there during development, they were as you say
copies of the default.

>>>> +    /*
>>>> +     * Return the underlying PRFileDesc which can be used to access
>>>> +     * information on the connection details. There is no SSL context per se.
>>>> +     */
>>>> +    if (strcmp(struct_name, "NSS") == 0)
>>>> +        return conn->pr_fd;
>>>> +    return NULL;
>>>> +}
>>>
>>> Is there never a reason someone might want the pointer returned by
>>> NSS_InitContext?  I don't know that there is but it might be something
>>> to consider (we could even possibly have our own structure returned by
>>> this function which includes both, maybe..?).  Not sure if there's a
>>> sensible use-case for that or not just wanted to bring it up as it's
>>> something I asked myself while reading through this patch.
>>
>> Not sure I understand what you're asking for here, did you mean "is there ever
>> a reason"?
>
> Eh, poor wording on my part.  You're right, the question, reworded
> again, was "Would someone want to get the context returned by
> NSS_InitContext?".  If we think there's a reason that someone might want
> that context then perhaps we should allow getting it, in addition to the
> pr_fd.  If there's really no reason to ever want the context from
> NSS_InitContext then what you have here where we're returning pr_fd is
> probably fine.

I can't think of any reason, maybe Jacob who has been knee-deep in NSS contexts
have insights which tell a different story?

>>>> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
>>>> index c601071838..7f10da3010 100644
>>>> --- a/src/interfaces/libpq/fe-secure.c
>>>> +++ b/src/interfaces/libpq/fe-secure.c
>>>> @@ -448,6 +448,27 @@ PQdefaultSSLKeyPassHook_OpenSSL(char *buf, int size, PGconn *conn)
>>>> }
>>>> #endif                            /* USE_OPENSSL */
>>>>
>>>> +#ifndef USE_NSS
>>>> +
>>>> +PQsslKeyPassHook_nss_type
>>>> +PQgetSSLKeyPassHook_nss(void)
>>>> +{
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +void
>>>> +PQsetSSLKeyPassHook_nss(PQsslKeyPassHook_nss_type hook)
>>>> +{
>>>> +    return;
>>>> +}
>>>> +
>>>> +char *
>>>> +PQdefaultSSLKeyPassHook_nss(PK11SlotInfo * slot, PRBool retry, void *arg)
>>>> +{
>>>> +    return NULL;
>>>> +}
>>>> +#endif                            /* USE_NSS */
>>>
>>> Isn't this '!USE_NSS'?
>>
>> Technically it is, but using just /* USE_NSS */ is consistent with the rest of
>> blocks in the file.
>
> Hrmpf.  I guess it seems a bit confusing to me to have to go find the
> opening #ifndef to realize that it's actally !USE_NSS..  In other words,
> I would think we'd actually want to fix all of these, heh.  I only
> actually see one case on a quick grep where it's wrong for USE_OPENSSL
> and so that doesn't seem like it's really a precedent and is more of a
> bug.  We certainly say 'not OPENSSL' in one place today too and also
> have a number of places where we have: #endif ... /* ! WHATEVER */.

No disagreement from me.  To cut down the size of this patchset however I
propose that we tackle this separately and leave this as is in this thread
since it's in line with the rest of the file (for now).

>>>> Subject: [PATCH v30 2/9] Refactor SSL testharness for multiple library
>>>>
>>>> The SSL testharness was fully tied to OpenSSL in the way the server was
>>>> set up and reconfigured. This refactors the SSLServer module into a SSL
>>>> library agnostic SSL/Server module which in turn use SSL/Backend/<lib>
>>>> modules for the implementation details.
>>>>
>>>> No changes are done to the actual tests, this only change how setup and
>>>> teardown is performed.
>>>
>>> Presumably this could be committed ahead of the main NSS support?
>>
>> Correct, I think this has merits even if NSS support is ultimately rejected.
>
> Ok- could you break it out on to its own thread and I'll see about
> committing it soonish, to get it out of the way?

It was already on it's own thread, as we discussed offlist.  I have since
rebased and expanded that patch over in that thread which has gotten review
that needs to be addressed.  As such, I will not update that patch in the
series in this thread but keep the changes on that thread, and then pull them
back into here when ready.

>>>> Subject: [PATCH v30 5/9] nss: Documentation
>>>> +++ b/doc/src/sgml/acronyms.sgml
>>>> @@ -684,6 +717,16 @@
>>>>    </listitem>
>>>>   </varlistentry>
>>>>
>>>> +   <varlistentry>
>>>> +    <term><acronym>TLS</acronym></term>
>>>> +    <listitem>
>>>> +     <para>
>>>> +      <ulink url="https://en.wikipedia.org/wiki/Transport_Layer_Security">
>>>> +      Transport Layer Security</ulink>
>>>> +     </para>
>>>> +    </listitem>
>>>> +   </varlistentry>
>>>
>>> We don't have this already..?  Surely we should..
>>
>> We really should, especially since we've had <acronym>TLS</acronym> in
>> config.sgml since 2014 (c6763156589).  That's another small piece that could be
>> committed on it's own to cut down the size of this patchset (even if only by a
>> tiny amount).
>
> Ditto on this. :)

Done in https://postgr.es/m/27109504-82DB-41A8-8E63-C0498314F5B0@yesql.se

>>>> @@ -1288,7 +1305,9 @@ include_dir 'conf.d'
>>>>        connections using TLS version 1.2 and lower are affected.  There is
>>>>        currently no setting that controls the cipher choices used by TLS
>>>>        version 1.3 connections.  The default value is
>>>> -        <literal>HIGH:MEDIUM:+3DES:!aNULL</literal>.  The default is usually a
>>>> +        <literal>HIGH:MEDIUM:+3DES:!aNULL</literal> for servers which have
>>>> +        been built with <productname>OpenSSL</productname> as the
>>>> +        <acronym>SSL</acronym> library.  The default is usually a
>>>>        reasonable choice unless you have specific security requirements.
>>>>       </para>
>>>
>>> Shouldn't we say something here wrt NSS?
>>
>> We should, but I'm not entirely what just yet. Need to revisit that.
>
> Not sure if we really want to do this but at least with ssllabs.com,
> postgresql.org gets an 'A' rating with this set:
>
> ECDHE-ECDSA-CHACHA20-POLY1305
> ECDHE-RSA-CHACHA20-POLY1305
> ECDHE-ECDSA-AES128-GCM-SHA256
> ECDHE-RSA-AES128-GCM-SHA256
> ECDHE-ECDSA-AES256-GCM-SHA384
> ECDHE-RSA-AES256-GCM-SHA384
> DHE-RSA-AES128-GCM-SHA256
> DHE-RSA-AES256-GCM-SHA384
> ECDHE-ECDSA-AES128-SHA256
> ECDHE-RSA-AES128-SHA256
> ECDHE-ECDSA-AES128-SHA
> ECDHE-RSA-AES256-SHA384
> ECDHE-RSA-AES128-SHA
> ECDHE-ECDSA-AES256-SHA384
> ECDHE-ECDSA-AES256-SHA
> ECDHE-RSA-AES256-SHA
> DHE-RSA-AES128-SHA256
> DHE-RSA-AES128-SHA
> DHE-RSA-AES256-SHA256
> DHE-RSA-AES256-SHA
> ECDHE-ECDSA-DES-CBC3-SHA
> ECDHE-RSA-DES-CBC3-SHA
> EDH-RSA-DES-CBC3-SHA
> AES128-GCM-SHA256
> AES256-GCM-SHA384
> AES128-SHA256
> AES256-SHA256
> AES128-SHA
> AES256-SHA
> DES-CBC3-SHA
> !DSS
>
> Which also seems kinda close to what the default when built with OpenSSL
> ends up being?  Thought the ssllabs report does list which ones it
> thinks are weak and so we might consider excluding those by default too:
>
> https://www.ssllabs.com/ssltest/analyze.html?d=postgresql.org&s=2a02%3a16a8%3adc51%3a0%3a0%3a0%3a0%3a50

Agreed, maintaining parity (or thereabouts) with OpenSSL defaults taking
industry best practices into account is probably what we should aim for.

>>>> @@ -1490,8 +1509,11 @@ include_dir 'conf.d'
>>>>       <para>
>>>>        Sets an external command to be invoked when a passphrase for
>>>>        decrypting an SSL file such as a private key needs to be obtained.  By
>>>> -        default, this parameter is empty, which means the built-in prompting
>>>> -        mechanism is used.
>>>> +        default, this parameter is empty. When the server is using
>>>> +        <productname>OpenSSL</productname>, this means the built-in prompting
>>>> +        mechanism is used. When using <productname>NSS</productname>, there is
>>>> +        no default prompting so a blank callback will be used returning an
>>>> +        empty password.
>>>>       </para>
>>>
>>> Maybe we should point out here that this requires the database to not
>>> require a password..?  So if they have one, they need to set this, or
>>> maybe we should provide a default one..
>>
>> I've added a sentence on not using a password for the cert database.  I'm not
>> sure if providing a default one is a good idea but it's no less insecure than
>> having no password really..
>
> I was meaning a default callback to prompt, not sure if that was clear.

Ah, no that's not what I thought you meant.  Do you have any thoughts on what
that callback would look like? Take a password on a TTY input?

Below are a few fixes addressed from the original review email:

>>> +    /*
>>> +     * Set up the custom IO layer.
>>> +     */
>>
>> Might be good to mention that the IO Layer is what sets up the
>> read/write callbacks to be used.
>
> Good point, will do in the next version of the patchset.

Fixed.

>>> +    port->pr_fd = SSL_ImportFD(model, pr_fd);
>>> +    if (!port->pr_fd)
>>> +    {
>>> +        ereport(COMMERROR,
>>> +                (errmsg("unable to initialize")));
>>> +        return -1;
>>> +    }
>>
>> Maybe a comment and a better error message for this?
>
> Will do.

Fixed.

>>>
>>> +    PR_Close(model);
>>
>> This might deserve one also, the whole 'model' construct is a bit
>> different. :)
>
> Agreed. will do.


Fixed.

>> Also, I get that they do similar jobs and that one is in the frontend
>> and the other is in the backend, but I'm not a fan of having two
>> 'ssl_protocol_version_to_nss()'s functions that take different argument
>> types but have exact same name and do functionally different things..
>
> Good point, I'll change that.

Fixed.

>>> +    /*
>>> +     * Configure cipher policy.
>>> +     */
>>> +    status = NSS_SetDomesticPolicy();
>>> +    if (status != SECSuccess)
>>> +    {
>>> +        printfPQExpBuffer(&conn->errorMessage,
>>> +                          libpq_gettext("unable to configure cipher policy: %s"),
>>> +                          pg_SSLerrmessage(PR_GetError()));
>>> +
>>> +        return PGRES_POLLING_FAILED;
>>> +    }
>>
>> Probably good to pull over at least some parts of the comments made in
>> the backend code about SetDomesticPolicy() actually enabling everything
>> (just like all the policies apparently do)...
>
> Good point, will do.

Fixed.

>>> +int
>>> +be_tls_open_server(Port *port)
>>> +{
>>> +    SECStatus    status;
>>> +    PRFileDesc *model;
>>> +    PRFileDesc *pr_fd;
>>
>> pr_fd here is materially different from port->pr_fd, no?  As in, one is
>> the NSS raw TCP fd while the other is the SSL fd, right?  Maybe we
>> should use two different variable names to try and make sure they don't
>> get confused?  Might even set this to NULL after we are done with it
>> too..  Then again, I see later on that when we do the dance with the
>> 'model' PRFileDesc that we just use the same variable- maybe we should
>> do that?  That is, just get rid of this 'pr_fd' and use port->pr_fd
>> always?
>
> Hmm, I think you're right. I will try that for the next patchset version.


>> Similar comments to the backend code- should we just always use
>> conn->pr_fd?  Or should we rename pr_fd to something else?
>
> Renaming is probably not a bad idea, will fix.


Both fixed.

Additionally, a few other off-list reported issues are also fixed in this
version (such as fixing the silly markup doc error and testplan off-by-one etc).

--
Daniel Gustafsson        https://vmware.com/


Attachment

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: shared memory stats: high level design decisions: consistency, dropping
Next
From: Jacob Champion
Date:
Subject: Re: Support for NSS as a libpq TLS backend