Thread: Support for NSS as a libpq TLS backend

Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
The attached patch implements NSS (Network Security Services) [0] with the
required NSPR runtime [1] as a TLS backend for PostgreSQL.

While all sslmodes are implemented and work for the most part, the patch is
*not* ready yet but I wanted to show progress early so that anyone interested
in this can help out with testing and maybe even hacking.

Why NSS?  Well.  It shares no lineage with OpenSSL making it not just an
alternative by fork but a 100% alternative.  It's also actively maintained, is
readily available on many platforms where PostgreSQL is popular and has a FIPS
mode which doesn't require an EOL'd library.  And finally, I was asked nicely
with the promise of a free beverage, an incentive as good as any.


Differences with OpenSSL
------------------------
NSS does not use certificates and keys on the filesystem; it instead uses a
certificate database in which all certificates, keys and CRLs are loaded.  A
set of tools is provided to work with the database, such as certutil, crlutil
and pk12util.  We could support plain PEM files as well, and load them into a
database ourselves, but so far I've opted for just using what is already in the
database.

This does mean that new GUCs are needed to identify the database.  I've mostly
repurposed the existing ones for cert/key/crl, but had to invent a new one for
the database.  Maybe there should be an entirely new set?  This needs to be
discussed with not only NSS in mind but for additional as-of-yet unknown
backends we might get (SChannel comes to mind).

NSS also supports partial chain validation per default (as do many other TLS
libraries) where OpenSSL does not.  I haven't done anything about that just
yet, thus there is a failing test as a reminder to address it.

The documentation of NSS/NSPR is unfortunately quite poor and oftentimes
outdated or simply nonexistent.  Cloning the repo and reading the source code
is the only option for parts of the API.

Featurewise there might be other things we can make use of in NSS which don't
exist in OpenSSL, but for now I've tried to keep them aligned.


Known Bugs and Limitations (in this version of the patch)
---------------------------------------------------------
The frontend doesn't attempt to verify whether the specified CRL exists in the
database or not.  This can be done with pretty much the same code as in the
backend, except that we don't have the client side certificate loaded so we
either need to read it back from the database, or parse a list of all CRLs
(which would save us from having the cert in local memory which generally is a
good thing to avoid).

pgtls_read is calling PR_Recv which works fine for communicating with an NSS
backend cluster, but hangs waiting for IO when communicating with an OpenSSL
backend cluster.  Using PR_Read reverses the situation.  This is probably a
simple bug but I haven't had time to track it down yet.  The below shifts
between the two for debugging.

-         nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT);
+         nread = PR_Read(conn->pr_fd, ptr, len);

Passphrase handling in the backend is broken, more on that under TODO.

There are a few failing tests and a few skipped ones for now, but the majority
of the tests pass.


Testing
-------
In order for the TAP framework to be able to handle backends with different
characteristics I've broken up SSLServer.pm into a set of modules:

    SSL::Server
    SSL::Backend::NSS
    SSL::Backend::OpenSSL

The SSL tests import SSL::Server which in turn imports the appropriate backend
module in order to perform backend specific setup tasks.  The backend used
should be transparent for the TAP code when it comes to switching server certs
etc.

So far I've used foo|bar in the matching regex to provide alternative output,
and SKIP blocks for tests that don't apply.  There might be neater ways to
achieve this, but I was trying to avoid having separate test files for the
different backends.
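
The foo|bar alternation described above can be illustrated outside the Perl
TAP code as well.  The following is a minimal sketch in C using the POSIX
regcomp()/regexec() interface; the two error strings and the names
output_matches and expected_error are made up for illustration and are not
the actual messages or identifiers used by either backend or by the patch.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if "output" matches the POSIX extended regular expression
 * "pattern".  Returns false on no match or if the pattern does not compile.
 */
static bool
output_matches(const char *pattern, const char *output)
{
    regex_t     re;
    bool        matched;

    assert(pattern != NULL && output != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    matched = (regexec(&re, output, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}

/* Hypothetical messages: one alternative per TLS backend. */
static const char *const expected_error =
    "certificate verify failed|peer certificate is not trusted";
```

A single test can then accept either backend's wording by checking
output_matches(expected_error, stderr_text), at the cost of a slightly
looser assertion than one exact string per backend.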

The certificate databases can be created with a new nssfiles make target in
src/test/ssl, which uses the existing files (and also depends on OpenSSL, which I
don't think is a problematic dependency for development environments).  To keep
it simple I've named the certificates in the NSS database after the filenames,
this isn't really NSS best-practices but it makes for an easier time reading
the tests IMO.

If this direction is of interest, extracting this into a separate patch for just
setting up the modules and implementing OpenSSL without a new backend is
probably the next step.


TODO
----
This patch is a work in progress, and there is work left to do.  Below is a dump
of what is left to fix before this can be considered a full implementation for
review.  Most of these items have more documentation in the code comments.

* The split between init and open needs to be revisited, especially in frontend
  where we have a bit more freedom. It remains to be seen if we can do better in
  the backend part.

* Documentation, it's currently not even started

* Windows support.  I've hacked mostly using Debian and have tested versions of
  the patch on macOS, but not Windows so far.

* Figure out how to handle cipher configuration.  Getting a set of ciphers that
  result in a useable socket isn't as easy as with OpenSSL, and policies seem
  much more preferred.  At the very least this needs to be solidly documented.

* The rules in src/test/ssl/Makefile for generating certificate databases can
  probably be generalized into a smaller set of rules based on wildcards.

* The password callback on server-side won't be invoked at server start due to
  init happening in be_tls_open, so something needs to be figured out there.
  Maybe attempt to open the database with a throw-away context in init just to
  invoke the password callback?

* Identify code duplicated between frontend and backend and try to generalize.

* Make sure we're handling the error codes correctly in the certificate and auth
  callbacks to properly handle self-signed certs etc.

* Tidy up the tests which are partially hardwired for NSS now to make sure
  there are no regressions for OpenSSL.

* All the code using OpenSSL which isn't the libpq communications parts, like
  pgcrypto, strong_random, sha2, SCRAM et al.

* Review language in code comments and run pgindent et al.

* Settle on a minimum required version.  I've been using NSS 3.42 and NSPR 4.20
  simply since they were the packages Debian wanted to install for me, but I'm
  quite convinced that we can go a bit lower (featurewise we can go much lower
  but there are bugfixes in recent versions that we might want to include).
  Anything lower than a version supporting TLSv1.3 seems like an obvious no-no.
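
The minimum-version item above comes down to a numeric major/minor
comparison.  A minimal sketch, assuming version strings of the form
"major.minor" such as the 3.42 and 4.20 mentioned above; the function names
are hypothetical and this is not how the patch performs the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse a "major.minor" version string.  Returns true on success.
 * Trailing components (such as a patch level) are ignored.
 */
static bool
parse_version(const char *str, int *major, int *minor)
{
    assert(str != NULL && major != NULL && minor != NULL);
    return sscanf(str, "%d.%d", major, minor) == 2;
}

/*
 * Return true if version "found" is at least version "required".
 * Unparsable input is treated as not meeting the requirement.
 */
static bool
version_at_least(const char *found, const char *required)
{
    int         fmaj, fmin, rmaj, rmin;

    if (!parse_version(found, &fmaj, &fmin) ||
        !parse_version(required, &rmaj, &rmin))
        return false;
    if (fmaj != rmaj)
        return fmaj > rmaj;
    return fmin >= rmin;
}
```

Comparing the components as integers rather than as strings matters here:
"3.9" sorts after "3.42" lexically but is the older release.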


I'd be surprised if this is all, but that's at least a start.  There isn't
really a playbook on how to add a new TLS backend, but I'm hoping to be able to
summarize the required bits and pieces in README.SSL once this is a bit closer
to completion.

My plan is to keep hacking at this to have it reviewable for the 14 cycle, so
if anyone has an interest in NSS, then I would love to hear feedback on how it
works (and doesn't work).

The 0001 patch contains the full NSS support, and 0002 is a fix for the pgstat
abstraction which IMO leaks backend implementation details.  This needs to go
on its own thread, but since 0001 fails without it I've included it here for
simplicity's sake for now.

cheers ./daniel

[0] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS
[1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR



Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 15 May 2020, at 22:46, Daniel Gustafsson <daniel@yesql.se> wrote:

> The 0001 patch contains the full NSS support, and 0002 is a fix for the pgstat
> abstraction which IMO leaks backend implementation details.  This needs to go
> on it's own thread, but since 0001 fails without it I've included it here for
> simplicity sake for now.

The attached 0001 and 0002 are the same patchseries as before, but with the
OpenSSL test module fixed and a rebase on top of the current master.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 25 Jun 2020, at 17:39, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 15 May 2020, at 22:46, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> The 0001 patch contains the full NSS support, and 0002 is a fix for the pgstat
>> abstraction which IMO leaks backend implementation details.  This needs to go
>> on it's own thread, but since 0001 fails without it I've included it here for
>> simplicity sake for now.
>
> The attached 0001 and 0002 are the same patchseries as before, but with the
> OpenSSL test module fixed and a rebase on top of the current master.

Another rebase to resolve conflicts with the recent fixes in the SSL tests, as
well as some minor cleanup.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Thomas Munro
Date:
On Fri, Jul 3, 2020 at 11:51 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 25 Jun 2020, at 17:39, Daniel Gustafsson <daniel@yesql.se> wrote:
> >> On 15 May 2020, at 22:46, Daniel Gustafsson <daniel@yesql.se> wrote:
> >> The 0001 patch contains the full NSS support, and 0002 is a fix for the pgstat
> >> abstraction which IMO leaks backend implementation details.  This needs to go
> >> on it's own thread, but since 0001 fails without it I've included it here for
> >> simplicity sake for now.
> >
> > The attached 0001 and 0002 are the same patchseries as before, but with the
> > OpenSSL test module fixed and a rebase on top of the current master.
>
> Another rebase to resolve conflicts with the recent fixes in the SSL tests, as
> well as some minor cleanup.

Hi Daniel,

Thanks for blazing the trail for other implementations to coexist in
the tree.  I see that cURL (another project Daniel works on)
supports a lot of TLS implementations[1].  I recognise 4 other library
names from that table as having appeared on this mailing list as
candidates for PostgreSQL support complete with WIP patches, including
another one from you (Apple Secure Transport).  I don't have strong
views on how many and which libraries we should support, but I was
curious how many packages depend on libssl1.1, libgnutls30 and libnss3
in the Debian package repos in my sources.list, and I came up with
OpenSSL = 820, GnuTLS = 342, and NSS = 87.

I guess Solution.pm needs at least USE_NSS => undef for this not to
break the build on Windows.

Obviously cfbot is useless for testing this code, since its build
script does --with-openssl and you need --with-nss, but it still shows
us one thing: with your patch, a --with-openssl build is apparently
broken:

/001_ssltests.pl .. 1/93 Bailout called.  Further testing stopped:
system pg_ctl failed

There are some weird configure-related hunks in the patch:

+  -runstatedir | --runstatedir | --runstatedi | --runstated \
...[more stuff like that]...

-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))

I see the same when I use Debian's autoconf, but not FreeBSD's or
MacPorts', despite all being version 2.69.  That seems to be due to
non-upstreamed changes added by the Debian maintainers (I see the
off_t thing mentioned in /usr/share/doc/autoconf/changelog.Debian.gz).
I think you need to build a stock autoconf 2.69 or run autoconf on a
non-Debian system.
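
For reference, the two LARGE_OFF_T spellings in the hunk above evaluate to
the same number when off_t is 64 bits wide, so the change is one of notation
rather than value.  A minimal sketch that checks the arithmetic, assuming
int64_t as a stand-in for off_t; the sketch_off_t, *_OLD and *_NEW names are
hypothetical and exist only to put both definitions side by side.

```c
#include <assert.h>
#include <stdint.h>

/* int64_t stands in for a 64-bit off_t in this sketch. */
typedef int64_t sketch_off_t;

/* The definition removed by the hunk. */
#define LARGE_OFF_T_OLD \
    (((sketch_off_t) 1 << 62) - 1 + ((sketch_off_t) 1 << 62))

/* The definition added by the hunk. */
#define LARGE_OFF_T_NEW \
    ((((sketch_off_t) 1 << 31) << 31) - 1 + (((sketch_off_t) 1 << 31) << 31))

/*
 * Return 1 if both spellings yield 2^63 - 1, the largest value a signed
 * 64-bit integer can hold, and 0 otherwise.
 */
static int
large_off_t_definitions_agree(void)
{
    assert(sizeof(sketch_off_t) == 8);
    return LARGE_OFF_T_OLD == LARGE_OFF_T_NEW &&
        LARGE_OFF_T_NEW == INT64_MAX;
}
```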

I installed libnss3-dev on my Debian box and then configure had
trouble locating and understanding <ssl.h>, until I added
--with-includes=/usr/include/nss:/usr/include/nspr.  I suspect this is
supposed to be done with pkg-config nss --cflags somewhere in
configure (or alternatively nss-config --cflags, nspr-config --cflags,
I don't know, but we're using pkg-config for other stuff).

I installed the Debian package libnss3-tools (for certutil) and then,
in src/test/ssl, I ran make nssfiles (I guess that should be
automatic?), and make check, and I got this far:

Test Summary Report
-------------------
t/001_ssltests.pl (Wstat: 3584 Tests: 93 Failed: 14)
  Failed tests:  14, 16, 18-20, 24, 27-28, 54-55, 78-80
                91
  Non-zero exit status: 14

You mentioned some were failing in this WIP -- are those results you expect?
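
As an aside on reading the summary above: the Wstat value is the raw wait
status with the exit code in the high byte, so 3584 corresponds to the
reported exit status of 14 (3584 = 14 * 256).  A minimal sketch of the
decoding, assuming the conventional layout where the low byte carries signal
information; the function name is hypothetical.

```c
#include <assert.h>

/*
 * Extract the exit code from a raw wait status as shown in the "Wstat"
 * field, assuming the exit code is stored in bits 8-15.
 */
static int
exit_code_from_wstat(int wstat)
{
    assert(wstat >= 0);
    return (wstat >> 8) & 0xff;
}
```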

[1] https://curl.haxx.se/docs/ssl-compared.html



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 10 Jul 2020, at 07:10, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Fri, Jul 3, 2020 at 11:51 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>> On 25 Jun 2020, at 17:39, Daniel Gustafsson <daniel@yesql.se> wrote:
>>>> On 15 May 2020, at 22:46, Daniel Gustafsson <daniel@yesql.se> wrote:
>>>> The 0001 patch contains the full NSS support, and 0002 is a fix for the pgstat
>>>> abstraction which IMO leaks backend implementation details.  This needs to go
>>>> on it's own thread, but since 0001 fails without it I've included it here for
>>>> simplicity sake for now.
>>>
>>> The attached 0001 and 0002 are the same patchseries as before, but with the
>>> OpenSSL test module fixed and a rebase on top of the current master.
>>
>> Another rebase to resolve conflicts with the recent fixes in the SSL tests, as
>> well as some minor cleanup.
>
> Hi Daniel,
>
> Thanks for blazing the trail for other implementations to coexist in
> the tree.  I see that cURL (another project Daniel works on)
> supports a lot of TLS implementations[1].

The list on that URL is also just a selection, the total count is 10 (not
counting OpenSSL forks) IIRC, after axing support for a few lately.  OpenSSL
clearly has a large mindshare but the gist of it is that there exist quite a
few alternatives each with their own strengths.

> I recognise 4 other library
> names from that table as having appeared on this mailing list as
> candidates for PostgreSQL support complete with WIP patches, including
> another one from you (Apple Secure Transport).  I don't have strong
> views on how many and which libraries we should support,

I think it's key to keep in mind *why* it's relevant to provide options in the
first place, after all, as they must be 100% interoperable one can easily argue
for a single one being enough.  We need to look at what they offer users on
top of just a TLS connection, like: managed certificate storage like for
example macOS Keychains, FIPS certification, good platform availability and/or
OS integration etc.  If all a library offers is "not being OpenSSL" then it's
not clear that we're adding much value by spending the cycles to support it.

My personal opinion is that we should keep it pretty limited, not least to
lessen the burden of testing and during feature development.  Supporting a new
library comes with requirements on both the CFBot as well as the buildfarm, not
to mention on developers who dabble in that area of the code.  The goal should
IMHO be to make it trivial for every postgres installation to use TLS
regardless of platform and experience level of the person installing it.

The situation is a bit different for curl where we have as a goal to provide
enough alternatives such that every platform can have a libcurl/curl more or
less regardless of what it contains.  As a consequence, we have around 80 CI
jobs to test each pull request to provide ample coverage.  Being a kitchen-
sink is really hard work.

> but I was
> curious how many packages depend on libssl1.1, libgnutls30 and libnss3
> in the Debian package repos in my sources.list, and I came up with
> OpenSSL = 820, GnuTLS = 342, and NSS = 87.

I don't see a lot of excitement over GnuTLS lately, but Debian shipping it due
to (I believe) licensing concerns with OpenSSL does help it along.  In my
experience, platforms with GnuTLS easily available also have OpenSSL easily
available.

> I guess Solution.pm needs at least USE_NSS => undef for this not to
> break the build on Windows.

Thanks, I'll fix (I admittedly haven't tried this at all on Windows yet).

> Obviously cfbot is useless for testing this code, since its build
> script does --with-openssl and you need --with-nss,

Right, this is a CFBot problem with any patch that requires specific autoconf
flags to be exercised.  I wonder if we can make something when we do CF app
integration which can inject flags to a Travis pipeline in a safe manner?

> but it still shows
> us one thing: with your patch, a --with-openssl build is apparently
> broken:
>
> /001_ssltests.pl .. 1/93 Bailout called.  Further testing stopped:
> system pg_ctl failed

Humm ..  I hate to say "it worked on my machine" but it did, but my TLS
environment is hardly standard.  Sorry for posting breakage, most likely this
is a bug in the new test module structure that the patch introduces in order to
support multiple backends for src/test/ssl.  I'll fix.

> There are some weird configure-related hunks in the patch:
>
> +  -runstatedir | --runstatedir | --runstatedi | --runstated \
> ...[more stuff like that]...
>
> -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
> +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
>
> I see the same when I use Debian's autoconf, but not FreeBSD's or
> MacPorts', despite all being version 2.69.  That seems to be due to
> non-upstreamed changes added by the Debian maintainers (I see the
> off_t thing mentioned in /usr/share/doc/autoconf/changelog.Debian.gz).
> I think you need to build a stock autoconf 2.69 or run autoconf on a
> non-Debian system.

Sigh, yes that's a Debianism that slipped through, again sorry about that.

> I installed libnss3-dev on my Debian box and then configure had
> trouble locating and understanding <ssl.h>, until I added
> --with-includes=/usr/include/nss:/usr/include/nspr.  I suspect this is
> supposed to be done with pkg-config nss --cflags somewhere in
> configure (or alternatively nss-config --cflags, nspr-config --cflags,
> I don't know, but we're using pkg-config for other stuff).

Yeah, that's a good point, I should fix that.  Having a metric ton of TLS
libraries in various versions around in my environment I've been Stockholm
Syndromed to --with-includes to the point where I didn't even think to run
without it. It should clearly be as easy to use as OpenSSL wrt autoconf.

> I installed the Debian package libnss3-tools (for certutil) and then,
> in src/test/ssl, I ran make nssfiles (I guess that should be
> automatic?)

Yes, it needs to run automatically for NSS builds on make check.

> , and make check, and I got this far:
>
> Test Summary Report
> -------------------
> t/001_ssltests.pl (Wstat: 3584 Tests: 93 Failed: 14)
>  Failed tests:  14, 16, 18-20, 24, 27-28, 54-55, 78-80
>                91
>  Non-zero exit status: 14
>
> You mentioned some were failing in this WIP -- are those results you expect?

I'm not on my dev box at the moment, and I don't remember off the cuff, but
that sounds higher than I remember.  I wonder if I fat-fingered the regexes in
the last version?

Thanks for taking a look at the patch, I'll fix up the reported issues Monday
at the latest.

cheers ./daniel


Re: Support for NSS as a libpq TLS backend

From
Thomas Munro
Date:
On Fri, Jul 10, 2020 at 5:10 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
> +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
>
> I see the same when I use Debian's autoconf, but not FreeBSD's or
> MacPorts', despite all being version 2.69.  That seems to be due to
> non-upstreamed changes added by the Debian maintainers (I see the
> off_t thing mentioned in /usr/share/doc/autoconf/changelog.Debian.gz).

By the way, Dagfinn mentioned that these changes were in fact
upstreamed, and happened to be beta-released today[1], and are due out
in ~3 months as 2.70.  That'll be something for us to coordinate a bit
further down the road.

[1] https://lists.gnu.org/archive/html/autoconf/2020-07/msg00006.html



Re: Support for NSS as a libpq TLS backend

From
Andrew Dunstan
Date:
On 5/15/20 4:46 PM, Daniel Gustafsson wrote:
>
> My plan is to keep hacking at this to have it reviewable for the 14 cycle, so
> if anyone has an interest in NSS, then I would love to hear feedback on how it
> works (and doesn't work).


I'll be happy to help, particularly with Windows support and with some
of the callback stuff I've had a hand in.


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 12 Jul 2020, at 00:03, Daniel Gustafsson <daniel@yesql.se> wrote:

> Thanks for taking a look at the patch, I'll fix up the reported issues Monday
> at the latest.

A bit of life intervened, but attached is a new version of the patch which
should work for OpenSSL builds, and have the other issues addressed as well.  I
took the opportunity to clean up the NSS tests to be more like the OpenSSL ones
to lessen the impact on the TAP testcases.  On my Debian box, using the
standard NSS and NSPR packages, I get 6 failures which are essentially all
around CRL handling. I'm going to circle back and look at what is missing there.

This version also removes the required patch for statistics reporting as that
has been committed in 6a5c750f3f72899f4f982f921d5bf5665f55651e.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 15 Jul 2020, at 20:35, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>
> On 5/15/20 4:46 PM, Daniel Gustafsson wrote:
>>
>> My plan is to keep hacking at this to have it reviewable for the 14 cycle, so
>> if anyone has an interest in NSS, then I would love to hear feedback on how it
>> works (and doesn't work).
>
> I'll be happy to help, particularly with Windows support and with some
> of the callback stuff I've had a hand in.

That would be fantastic, thanks!  The password callback handling is still a
TODO so feel free to take a stab at that since you have a lot of context on
there.

For Windows, I've included USE_NSS in Solution.pm as Thomas pointed out in this
thread, but that was done blind as I've done no testing on Windows yet.

cheers ./daniel


Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 16 Jul 2020, at 00:16, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 12 Jul 2020, at 00:03, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> Thanks for taking a look at the patch, I'll fix up the reported issues Monday
>> at the latest.
>
> A bit of life intervened, but attached is a new version of the patch which
> should work for OpenSSL builds, and have the other issues addressed as well.  I
> took the opportunity to clean up the NSS tests to be more like the OpenSSL ones
> to lessen the impact on the TAP testcases.  On my Debian box, using the
> standard NSS and NSPR packages, I get 6 failures which are essentially all
> around CRL handling. I'm going to circle back and look at what is missing there.

Taking a look at this, the issue was that I had fat-fingered the Makefile rules
for generating the NSS databases.  This is admittedly very messy at this point,
partly due to trying to mimic OpenSSL filepaths/names to minimize the impact
on tests and to keep OpenSSL/NSS tests as "optically" equivalent as I could.

With this, I have one failing test ("intermediate client certificate is
provided by client") which I've left failing since I believe the case should be
supported by NSS.  The issue is most likely that I haven't figured out the right
certinfo incantation to make it so (Mozilla hasn't strained themselves when
writing documentation for this toolchain, or any part of NSS for that matter).

The failing test when running with OpenSSL also remains, the issue is that the
very first test for incorrect key passphrase fails, even though the server is
behaving exactly as it should.  Something in the test suite hackery breaks for
that test but I've been unable to pin down what it is, any help would be
greatly appreciated.

This version adds support for sslinfo on NSS for most of the functions.  In the
process I realized that sslinfo never got the memo about SSL support being
abstracted behind an API, so I went and did that as well.  This part of the
patch should perhaps be broken out into a separate patch/thread in case it's
deemed interesting regardless of the eventual conclusion on this patch.  Doing
this removed a bit of duplication with the backend code, and some errorhandling
moved to be-secure-openssl.c (originally added in d94c36a45ab45).  As the
original commit message states, they're mostly code hygiene with belts and
suspenders, but if we deemed them valuable enough for a contrib module ISTM
they should go into the backend as well.  Adding a testcase for sslinfo is a
TODO.

Support for pg_strong_random, sha2 and pgcrypto has been started, but it's less
trivial as NSS/NSPR requires a lot more initialization and state than OpenSSL,
so it needs a bit more thought.

I've also done a rebase over today's HEAD, a pgindent pass and some cleanup here
and there.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Andrew Dunstan
Date:
On 7/15/20 6:18 PM, Daniel Gustafsson wrote:
>> On 15 Jul 2020, at 20:35, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>>
>> On 5/15/20 4:46 PM, Daniel Gustafsson wrote:
>>> My plan is to keep hacking at this to have it reviewable for the 14 cycle, so
>>> if anyone has an interest in NSS, then I would love to hear feedback on how it
>>> works (and doesn't work).
>> I'll be happy to help, particularly with Windows support and with some
>> of the callback stuff I've had a hand in.
> That would be fantastic, thanks!  The password callback handling is still a
> TODO so feel free to take a stab at that since you have a lot of context on
> there.
>
> For Windows, I've include USE_NSS in Solution.pm as Thomas pointed out in this
> thread, but that was done blind as I've done no testing on Windows yet.
>


OK, here is an update of your patch that compiles and runs against NSS
under Windows (VS2019).


In addition to some work that was missing in src/tools/msvc, I had to
make a few adjustments, including:


  * strtok_r() isn't available on Windows. We don't use it elsewhere in
    the postgres code, and it seemed unnecessary to have reentrant calls
    here, so I just replaced it with equivalent strtok() calls.
  * We were missing an NSS implementation of
    pgtls_verify_peer_name_matches_certificate_guts(). I supplied a
    dummy that's enough to get it building cleanly, but that needs to be
    filled in properly.
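
The strtok_r() to strtok() replacement in the first item can be sketched as
follows.  This is a minimal, self-contained illustration and not the code
from the patch; the function name and the delimiter-separated input are
assumptions.  Note that strtok() keeps its position in hidden static state,
so the two are only equivalent as long as no other tokenization is
interleaved with the loop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the non-empty tokens in "list" separated by any character in
 * "delims".  strtok() modifies its input, so work on a private copy.
 * Returns -1 if memory cannot be allocated.
 */
static int
count_tokens(const char *list, const char *delims)
{
    char       *copy;
    char       *token;
    int         ntokens = 0;

    assert(list != NULL && delims != NULL);

    copy = malloc(strlen(list) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, list);

    /*
     * With strtok_r() this loop would carry an explicit save pointer;
     * strtok() keeps that position in hidden static state instead.
     */
    for (token = strtok(copy, delims); token != NULL;
         token = strtok(NULL, delims))
        ntokens++;

    free(copy);
    return ntokens;
}
```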


There is still plenty of work to go, but this seemed a sufficient
milestone to report progress on.



cheers


andrew





Attachment

Re: Support for NSS as a libpq TLS backend

From
Andrew Dunstan
Date:
On 7/31/20 4:44 PM, Andrew Dunstan wrote:
> OK, here is an update of your patch that compiles and runs against NSS
> under Windows (VS2019).
>
>
> In addition to some work that was missing in src/tools/msvc, I had to
> make a few adjustments, including:
>
>
>   * strtok_r() isn't available on Windows. We don't use it elsewhere in
>     the postgres code, and it seemed unnecessary to have reentrant calls
>     here, so I just replaced it with equivalent strtok() calls.
>   * We were missing an NSS implementation of
>     pgtls_verify_peer_name_matches_certificate_guts(). I supplied a
>     dummy that's enough to get it building cleanly, but that needs to be
>     filled in properly.
>
>
> There is still plenty of work to go, but this seemed a sufficient
> milestone to report progress on.
>
>


OK, this version contains pre-generated nss files, and passes a full
buildfarm run including the ssl test module, with both openssl and NSS.
That should keep the cfbot happy :-)


cheers


andrew




Attachment

Re: Support for NSS as a libpq TLS backend

From
Andrew Dunstan
Date:
On 8/3/20 12:46 PM, Andrew Dunstan wrote:
> OK, this version contains pre-generated nss files, and passes a full
> buildfarm run including the ssl test module, with both openssl and NSS.
> That should keep the cfbot happy :-)
>
>

rebased on current master.


cheers


andrew




Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 3 Aug 2020, at 21:18, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
> On 8/3/20 12:46 PM, Andrew Dunstan wrote:
>> On 7/31/20 4:44 PM, Andrew Dunstan wrote:

>>> OK, here is an update of your patch that compiles and runs against NSS
>>> under Windows (VS2019).

Out of curiosity since I'm not familiar with Windows, how hard/easy is it to
install NSS for the purpose of a) hacking on postgres+NSS and b) using postgres
with NSS as the backend?

>>>  * strtok_r() isn't available on Windows. We don't use it elsewhere in
>>>    the postgres code, and it seemed unnecessary to have reentrant calls
>>>    here, so I just replaced it with equivalent strtok() calls.

Fair enough, that makes sense.
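
To illustrate the swap described above, here is a minimal sketch of tokenizing a separator-delimited list with plain strtok(). It is not the code from the patch: split_list and its parameters are hypothetical names, and the colon-separated input is an assumption. The point to note is that strtok() keeps its position in hidden static state, so the loop must not be interleaved with another strtok() sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a writable list in place on any character in
 * sep, storing up to max_tokens pointers into out[]. Returns the number of
 * tokens stored. The input buffer is modified (separators become NULs).
 */
size_t
split_list(char *list, const char *sep, char **out, size_t max_tokens)
{
    size_t      n = 0;
    char       *tok;

    assert(list != NULL && sep != NULL && out != NULL);

    /* First call passes the buffer, later calls pass NULL to continue. */
    for (tok = strtok(list, sep);
         tok != NULL && n < max_tokens;
         tok = strtok(NULL, sep))
        out[n++] = tok;

    return n;
}
```

Empty fields are skipped, since strtok() treats a run of separators as one.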

>>>  * We were missing an NSS implementation of
>>>    pgtls_verify_peer_name_matches_certificate_guts(). I supplied a
>>>    dummy that's enough to get it building cleanly, but that needs to be
>>>    filled in properly.

Interesting, not sure how I could've missed that one.

>> OK, this version contains pre-generated nss files, and passes a full
>> buildfarm run including the ssl test module, with both openssl and NSS.
>> That should keep the cfbot happy :-)

Exciting, thanks a lot for helping out on this!  I've started to look at the
required documentation changes during vacation, will hopefully be able to post
something soon.

cheers ./daniel


Re: Support for NSS as a libpq TLS backend

From
Andrew Dunstan
Date:
On 8/4/20 5:42 PM, Daniel Gustafsson wrote:
>> On 3 Aug 2020, at 21:18, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>> On 8/3/20 12:46 PM, Andrew Dunstan wrote:
>>> On 7/31/20 4:44 PM, Andrew Dunstan wrote:
>>>> OK, here is an update of your patch that compiles and runs against NSS
>>>> under Windows (VS2019).
> Out of curiosity since I'm not familiar with Windows, how hard/easy is it to
> install NSS for the purpose of a) hacking on postgres+NSS and b) using postgres
> with NSS as the backend?




I've laid out the process at
https://www.2ndquadrant.com/en/blog/nss-on-windows-for-postgresql-development/


>>> OK, this version contains pre-generated nss files, and passes a full
>>> buildfarm run including the ssl test module, with both openssl and NSS.
>>> That should keep the cfbot happy :-)
> Exciting, thanks a lot for helping out on this!  I've started to look at the
> required documentation changes during vacation, will hopefully be able to post
> something soon.
>


Good. Having got the tests running cleanly on Linux, I'm now going back
to work on that for Windows.


After that I'll look at the hook/callback stuff.


cheers


andrew







Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 5 Aug 2020, at 22:38, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>
> On 8/4/20 5:42 PM, Daniel Gustafsson wrote:
>>> On 3 Aug 2020, at 21:18, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>>> On 8/3/20 12:46 PM, Andrew Dunstan wrote:
>>>> On 7/31/20 4:44 PM, Andrew Dunstan wrote:
>>>>> OK, here is an update of your patch that compiles and runs against NSS
>>>>> under Windows (VS2019).
>> Out of curiosity since I'm not familiar with Windows, how hard/easy is it to
>> install NSS for the purpose of a) hacking on postgres+NSS and b) using postgres
>> with NSS as the backend?
>
> I've laid out the process at
> https://www.2ndquadrant.com/en/blog/nss-on-windows-for-postgresql-development/

That's fantastic, thanks for putting that together.

>>>> OK, this version contains pre-generated nss files, and passes a full
>>>> buildfarm run including the ssl test module, with both openssl and NSS.
>>>> That should keep the cfbot happy :-)

Turns out the CFBot doesn't like the binary diffs.  They are included in this
version too but we should probably drop them again it seems.

>> Exciting, thanks a lot for helping out on this!  I've started to look at the
>> required documentation changes during vacation, will hopefully be able to post
>> something soon.
>
> Good. Having got the tests running cleanly on Linux, I'm now going back
> to work on that for Windows.
>
> After that I'll look at the hook/callback stuff.

The attached v9 contains mostly a first stab at getting some documentation
going, it's far from completed but I'd rather share more frequently to not have
local trees deviate too much in case you've had time to hack as well.  I had a
few documentation tweaks in the code too, but no real functionality change for
now.

The 0001 patch isn't strictly necessary but it seems reasonable to address the
various ways OpenSSL was spelled out in the docs while updating the SSL
portions.  It essentially ensures that markup around OpenSSL and SSL is used
consistently.  To make review easier, I didn't address the line lengths being
too long in this patch.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Andrew Dunstan
Date:
>
> >>>> OK, this version contains pre-generated nss files, and passes a full
> >>>> buildfarm run including the ssl test module, with both openssl and NSS.
> >>>> That should keep the cfbot happy :-)
>
> Turns out the CFBot doesn't like the binary diffs.  They are included in this
> version too but we should probably drop them again it seems.
>

I did ask Thomas about this, he was going to try to fix it. In
principle we should want it to accept binary diffs exactly for this
sort of thing.


> The attached v9 contains mostly a first stab at getting some documentation
> going, it's far from completed but I'd rather share more frequently to not have
> local trees deviate too much in case you've had time to hack as well.  I had a
> few documentation tweaks in the code too, but no real functionality change for
> now.
>
> The 0001 patch isn't strictly necessary but it seems reasonable to address the
> various ways OpenSSL was spelled out in the docs while at updating the SSL
> portions.  It essentially ensures that markup around OpenSSL and SSL is used
> consistently.  I didn't address the linelengths being too long in this patch to
> make review easier instead.
>


I'll take a look.

cheers

andrew






Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Thu, Sep 03, 2020 at 03:26:03PM -0400, Andrew Dunstan wrote:
>> The 0001 patch isn't strictly necessary but it seems reasonable to address the
>> various ways OpenSSL was spelled out in the docs while at updating the SSL
>> portions.  It essentially ensures that markup around OpenSSL and SSL is used
>> consistently.  I didn't address the linelengths being too long in this patch to
>> make review easier instead.
>
> I'll take a look.

Adding a <productname> markup around OpenSSL in the docs makes things
consistent.  +1.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Fri, Sep 04, 2020 at 10:23:34AM +0900, Michael Paquier wrote:
> Adding a <productname> markup around OpenSSL in the docs makes things
> consistent.  +1.

I have looked at 0001, and applied it after fixing the line length
(thanks for not doing it to ease my lookup), and I found one extra
place in need of fix.  Patch 0002 is failing to apply.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 17 Sep 2020, at 09:41, Michael Paquier <michael@paquier.xyz> wrote:
> 
> On Fri, Sep 04, 2020 at 10:23:34AM +0900, Michael Paquier wrote:
>> Adding a <productname> markup around OpenSSL in the docs makes things
>> consistent.  +1.
> 
> I have looked at 0001, and applied it after fixing the line length
> (thanks for not doing it to ease my lookup), and I found one extra
> place in need of fix.  

Thanks!

> Patch 0002 is failing to apply.

Attached is a v10 rebased to apply on top of HEAD.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote:
> Attached is a v10 rebased to apply on top of HEAD.

I am afraid that this needs a new rebase.  The patch is failing to
apply, per the CF bot. :/
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 29 Sep 2020, at 07:59, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote:
>> Attached is a v10 rebased to apply on top of HEAD.
>
> I am afraid that this needs a new rebase.  The patch is failing to
> apply, per the CF bot. :/

It's failing on binary diffs due to the NSS certificate databases being
included to make hacking on the patch easier:

  File src/test/ssl/ssl/nss/server.crl: git binary diffs are not supported.

This is a limitation of the CFBot patch tester; the text portions of the patch
still apply with a tiny bit of fuzz.

cheers ./daniel


Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 29 Sep 2020, at 09:52, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 29 Sep 2020, at 07:59, Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote:
>>> Attached is a v10 rebased to apply on top of HEAD.
>>
>> I am afraid that this needs a new rebase.  The patch is failing to
>> apply, per the CF bot. :/
>
> It's failing on binary diffs due to the NSS certificate databases being
> included to make hacking on the patch easier:
>
>  File src/test/ssl/ssl/nss/server.crl: git binary diffs are not supported.
>
> This is a limitation of the CFBot patch tester; the text portions of the patch
> still apply with a tiny bit of fuzz.

Attached is a new version which doesn't contain the NSS certificate databases
to keep the CFBot happy.

It also implements server-side passphrase callbacks as well as re-enables the
tests for those.  The callback works a bit differently from the OpenSSL one as
it must run in the forked process, so it can't run on server reload.  There's
also no default fallback reading from a TTY like in OpenSSL, so if no callback
is set, the always-failing dummy is used.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
The attached v12 adds support for pgcrypto as well as pg_strong_random, which I
believe completes the required subsystems where we have OpenSSL support today.
I opted for not adding code to handle the internal shaXXX implementations until
the dust settles around the proposal to change the API there.

Blowfish is not supported by NSS AFAICT, even though the cipher mechanism is
defined, so the internal implementation is used there instead.  CAST5 is
supported, but segfaults inside NSS on most inputs so support for that is not
included for now.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Andres Freund
Date:
Hi,

On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote:
> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001
> From: Daniel Gustafsson <daniel@yesql.se>
> Date: Thu, 8 Oct 2020 18:44:28 +0200
> Subject: [PATCH] Support for NSS as a TLS backend v12
> ---
>  configure                                     |  223 +++-
>  configure.ac                                  |   39 +-
>  contrib/Makefile                              |    2 +-
>  contrib/pgcrypto/Makefile                     |    5 +
>  contrib/pgcrypto/nss.c                        |  773 +++++++++++
>  contrib/pgcrypto/openssl.c                    |    2 +-
>  contrib/pgcrypto/px.c                         |    1 +
>  contrib/pgcrypto/px.h                         |    1 +

Personally I'd like to see this patch broken up a bit - it's quite
large. Several of the changes could easily be committed separately, no?


>  if test "$with_openssl" = yes ; then
> +  if test x"$with_nss" = x"yes" ; then
> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
> +  fi

Based on a quick look there's no similar error check for the msvc
build. Should there be?

>  
> +if test "$with_nss" = yes ; then
> +  if test x"$with_openssl" = x"yes" ; then
> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
> +  fi

Isn't this a repetition of the earlier check?


> +  CLEANLDFLAGS="$LDFLAGS"
> +  # TODO: document this set of LDFLAGS
> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"

Shouldn't this use nss-config or such?


> +if test "$with_nss" = yes ; then
> +  AC_CHECK_HEADER(ssl.h, [], [AC_MSG_ERROR([header file <ssl.h> is required for NSS])])
> +  AC_CHECK_HEADER(nss.h, [], [AC_MSG_ERROR([header file <nss.h> is required for NSS])])
> +fi

Hm. For me, on debian, these headers are not directly in the default
include search path, but would be as nss/ssl.h. I don't see you adding
nss/ to CFLAGS anywhere? How does this work currently?

I think it'd also be better if we could include these files as nss/ssl.h
etc - ssl.h is a name way too likely to conflict imo.

> +++ b/src/backend/libpq/be-secure-nss.c
> @@ -0,0 +1,1158 @@
> +/*
> + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef
> + * our version to avoid compiler warnings on redefinition.
> + */
> +#define pg_BITS_PER_BYTE BITS_PER_BYTE
> +#undef BITS_PER_BYTE

Most compilers/preprocessors don't warn about redefinitions when they
would result in the same value (IIRC we have some cases of redefinitions
in tree even). Does nspr's differ?


> +/*
> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
> + * colliding definitions from ours, causing a much expected compiler error.
> + * The definitions are however not actually used in NSPR at all, and are only
> + * intended for what seems to be backwards compatibility for apps written
> + * against old versions of NSPR.  The following comment is in the referenced
> + * file, and was added in 1998:
> + *
> + *        This section typedefs the old 'native' types to the new PR<type>s.
> + *        These definitions are scheduled to be eliminated at the earliest
> + *        possible time. The NSPR API is implemented and documented using
> + *        the new definitions.
> + *
> + * As there is no opt-out from pulling in these typedefs, we define the guard
> + * for the file to exclude it. This is incredibly ugly, but seems to be about
> + * the only way around it.
> + */
> +#define PROTYPES_H
> +#include <nspr.h>
> +#undef PROTYPES_H

Yuck :(.


> +int
> +be_tls_init(bool isServerStart)
> +{
> +    SECStatus    status;
> +    SSLVersionRange supported_sslver;
> +
> +    /*
> +     * Set up the connection cache for multi-processing application behavior.

Hm. Do we necessarily want that? Session resumption is not exactly
unproblematic... Or does this do something else?


> +     * If we are in ServerStart then we initialize the cache. If the server is
> +     * already started, we inherit the cache such that it can be used for
> +     * connections. Calling SSL_ConfigMPServerSIDCache sets an environment
> +     * variable which contains enough information for the forked child to know
> +     * how to access it.  Passing NULL to SSL_InheritMPServerSIDCache will
> +     * make the forked child look it up by the default name SSL_INHERITANCE,
> +     * if env vars aren't inherited then the contents of the variable can be
> +     * passed instead.
> +     */

Does this stuff work on windows / EXEC_BACKEND?


> +     * The below parameters are what the implicit initialization would've done
> +     * for us, and should work even for older versions where it might not be
> +     * done automatically. The last parameter, maxPTDs, is set to various
> +     * values in other codebases, but has been unused since NSPR 2.1 which was
> +     * released sometime in 1998.
> +     */
> +    PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ );

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_Init
says that currently all parameters are ignored?




> +    /*
> +     * Import the already opened socket as we don't want to use NSPR functions
> +     * for opening the network socket due to how the PostgreSQL protocol works
> +     * with TLS connections. This function is not part of the NSPR public API,
> +     * see the comment at the top of the file for the rationale of still using
> +     * it.
> +     */
> +    pr_fd = PR_ImportTCPSocket(port->sock);
> +    if (!pr_fd)
> +        ereport(ERROR,
> +                (errmsg("unable to connect to socket")));

I don't see the comment you're referring to?


> +    /*
> +     * Most of the documentation available, and implementations of, NSS/NSPR
> +     * use the PR_NewTCPSocket() function here, which has the drawback that it
> +     * can only create IPv4 sockets. Instead use PR_OpenTCPSocket() which
> +     * copes with IPv6 as well.
> +     */
> +    model = PR_OpenTCPSocket(port->laddr.addr.ss_family);
> +    if (!model)
> +        ereport(ERROR,
> +                (errmsg("unable to open socket")));
> +
> +    /*
> +     * Convert the NSPR socket to an SSL socket. Ensuring the success of this
> +     * operation is critical as NSS SSL_* functions may return SECSuccess on
> +     * the socket even though SSL hasn't been enabled, which introduce a risk
> +     * of silent downgrades.
> +     */
> +    model = SSL_ImportFD(NULL, model);
> +    if (!model)
> +        ereport(ERROR,
> +                (errmsg("unable to enable TLS on socket")));

It's confusing that these functions do not actually reference the socket
via some handle :(. What does opening a socket do here?


> +    /*
> +     * Configure the allowed cipher. If there are no user preferred suites,

*ciphers?

> +
> +    port->pr_fd = SSL_ImportFD(model, pr_fd);
> +    if (!port->pr_fd)
> +        ereport(ERROR,
> +                (errmsg("unable to initialize")));
> +
> +    PR_Close(model);

A comment explaining why we first import a NULL into the model, and then
release the model, and import the real fd would be good.


> +ssize_t
> +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
> +{
> +    ssize_t        n_read;
> +    PRErrorCode err;
> +
> +    n_read = PR_Read(port->pr_fd, ptr, len);
> +
> +    if (n_read < 0)
> +    {
> +        err = PR_GetError();

Yay, more thread global state :(.

> +        /* XXX: This logic seems potentially bogus? */
> +        if (err == PR_WOULD_BLOCK_ERROR)
> +            *waitfor = WL_SOCKET_READABLE;
> +        else
> +            *waitfor = WL_SOCKET_WRITEABLE;

Don't we need to handle failed connections somewhere here? secure_read()
won't know about PR_GetError() etc? How would SSL errors be signalled
upwards here?

Also, as you XXX, it's not clear to me that your mapping would always
result in waiting for the right event? A tls write could e.g. very well
require receiving data etc?

> +    /*
> +     * At least one byte with password content was returned, and NSS requires
> +     * that we return it allocated in NSS controlled memory. If we fail to
> +     * allocate then abort without passing back NULL and bubble up the error
> +     * on the PG side.
> +     */
> +    password = (char *) PR_Malloc(len + 1);
> +    if (!password)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_OUT_OF_MEMORY),
> +                 errmsg("out of memory")));
>
> +    strlcpy(password, buf, sizeof(password));
> +    explicit_bzero(buf, sizeof(buf));
> +

In case of error you're not bzero'ing out the password!

Separately, I wonder if we should introduce a function for throwing OOM
errors - which then e.g. could print the memory context stats in those
places too...
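
A minimal sketch of the "wipe on every path" idea, in portable C rather than the patch's code. take_secret and wipe are hypothetical names, malloc() stands in for PR_Malloc(), and the volatile loop stands in for explicit_bzero(). The copy uses the known length rather than sizeof on the destination, since sizeof applied to a pointer (which password appears to be, given the cast) yields the pointer size, not the allocation size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Zero memory through a volatile pointer so the stores are not elided. */
void
wipe(void *p, size_t n)
{
    volatile unsigned char *vp = p;

    while (n--)
        *vp++ = 0;
}

/*
 * Copy len bytes of secret out of buf into a fresh allocation, then wipe
 * buf regardless of whether the allocation succeeded. Returns NULL on
 * allocation failure; the caller reports the error after the wipe.
 */
char *
take_secret(char *buf, size_t buflen, size_t len)
{
    char       *out;

    assert(buf != NULL && len < buflen);

    out = malloc(len + 1);
    if (out != NULL)
    {
        memcpy(out, buf, len);
        out[len] = '\0';
    }

    /* Runs on both the success and the failure path. */
    wipe(buf, buflen);
    return out;
}
```

In the backend the error would still be raised with ereport(), just after the buffer has been cleared rather than before.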


> +static SECStatus
> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
> +{
> +    SECStatus    status;
> +    Port       *port = (Port *) arg;
> +    CERTCertificate *cert;
> +    char       *peer_cn;
> +    int            len;
> +
> +    status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
> +    if (status == SECSuccess)
> +    {
> +        cert = SSL_PeerCertificate(port->pr_fd);
> +        len = strlen(cert->subjectName);
> +        peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
> +        if (strncmp(cert->subjectName, "CN=", 3) == 0)
> +            strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
> +        else
> +            strlcpy(peer_cn, cert->subjectName, len + 1);
> +        CERT_DestroyCertificate(cert);
> +
> +        port->peer_cn = peer_cn;
> +        port->peer_cert_valid = true;

Hm. We either should have something similar to

            /*
             * Reject embedded NULLs in certificate common name to prevent
             * attacks like CVE-2009-4034.
             */
            if (len != strlen(peer_cn))
            {
                ereport(COMMERROR,
                        (errcode(ERRCODE_PROTOCOL_VIOLATION),
                         errmsg("SSL certificate's common name contains embedded null")));
                pfree(peer_cn);
                return -1;
            }
here, or a comment explaining why not.

Also, what's up with the CN= bit? Why is that needed here, but not for
openssl?
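
The embedded-NUL check quoted above compares an encoded length against strlen(). A minimal standalone sketch of the same test, assuming the name is available as a pointer plus an explicit length (has_embedded_nul is a hypothetical helper, not an NSS or PostgreSQL function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the first len bytes of data contain a NUL byte, that is,
 * if a C-string view of the field would be shorter than its encoded length.
 */
bool
has_embedded_nul(const char *data, size_t len)
{
    assert(data != NULL);
    return memchr(data, '\0', len) != NULL;
}
```

Note that the check only works if len comes from the certificate encoding; taking it from strlen() on the same buffer, as the quoted NSS hunk does, can never detect a truncated name.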


> +static PRFileDesc *
> +init_iolayer(Port *port, int loglevel)
> +{
> +    const        PRIOMethods *default_methods;
> +    PRFileDesc *layer;
> +
> +    /*
> +     * Start by initializing our layer with all the default methods so that we
> +     * can selectively override the ones we want while still ensuring that we
> +     * have a complete layer specification.
> +     */
> +    default_methods = PR_GetDefaultIOMethods();
> +    memcpy(&pr_iomethods, default_methods, sizeof(PRIOMethods));
> +
> +    pr_iomethods.recv = pg_ssl_read;
> +    pr_iomethods.send = pg_ssl_write;
> +
> +    /*
> +     * Each IO layer must be identified by a unique name, where uniqueness is
> +     * per connection. Each connection in a postgres cluster can generate the
> +     * identity from the same string as they will create their IO layers on
> +     * different sockets. Only one layer per socket can have the same name.
> +     */
> +    pr_id = PR_GetUniqueIdentity("PostgreSQL");

Seems like it might not be a bad idea to append Server or something?


> +
> +    /*
> +     * Create the actual IO layer as a stub such that it can be pushed onto
> +     * the layer stack. The step via a stub is required as we define custom
> +     * callbacks.
> +     */
> +    layer = PR_CreateIOLayerStub(pr_id, &pr_iomethods);
> +    if (!layer)
> +    {
> +        ereport(loglevel,
> +                (errmsg("unable to create NSS I/O layer")));
> +        return NULL;
> +    }

Why is this accepting a variable log level? The only caller passes ERROR?


> +/*
> + * pg_SSLerrmessage
> + *        Create and return a human readable error message given
> + *        the specified error code
> + *
> + * PR_ErrorToName only converts the enum identifier of the error to string,
> + * but that can be quite useful for debugging (and in case PR_ErrorToString is
> + * unable to render a message then we at least have something).
> + */
> +static char *
> +pg_SSLerrmessage(PRErrorCode errcode)
> +{
> +    char        error[128];
> +    int            ret;
> +
> +    /* TODO: this should perhaps use a StringInfo instead.. */
> +    ret = pg_snprintf(error, sizeof(error), "%s (%s)",
> +                      PR_ErrorToString(errcode, PR_LANGUAGE_I_DEFAULT),
> +                      PR_ErrorToName(errcode));
> +    if (ret)
> +        return pstrdup(error);

> +    return pstrdup(_("unknown TLS error"));
> +}

Why not use psprintf() here?
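
For context, psprintf() formats into a buffer sized to fit the result, which avoids the fixed 128-byte array and the copy. A rough standalone analog of that behavior, using malloc() instead of palloc() (format_alloc is a hypothetical name, and this is a sketch rather than PostgreSQL's implementation):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format into a freshly allocated buffer of exactly the required size.
 * Returns NULL on formatting or allocation failure; caller frees.
 */
char *
format_alloc(const char *fmt, ...)
{
    va_list     ap;
    va_list     ap2;
    int         needed;
    char       *out;

    assert(fmt != NULL);

    va_start(ap, fmt);
    va_copy(ap2, ap);

    /* First pass: measure the formatted length without writing. */
    needed = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    if (needed < 0)
    {
        va_end(ap2);
        return NULL;
    }

    /* Second pass: write into a buffer that is known to be large enough. */
    out = malloc((size_t) needed + 1);
    if (out != NULL)
        vsnprintf(out, (size_t) needed + 1, fmt, ap2);
    va_end(ap2);

    return out;
}
```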



> +++ b/src/include/common/pg_nss.h
> @@ -0,0 +1,141 @@
> +/*-------------------------------------------------------------------------
> + *
> + * pg_nss.h
> + *      Support for NSS as a TLS backend
> + *
> + * These definitions are used by both frontend and backend code.
> + *
> + * Copyright (c) 2020, PostgreSQL Global Development Group
> + *
> + * IDENTIFICATION
> + *        src/include/common/pg_nss.h
> + *
> + *-------------------------------------------------------------------------
> + */
> +#ifndef PG_NSS_H
> +#define PG_NSS_H
> +
> +#ifdef USE_NSS
> +
> +#include <sslproto.h>
> +
> +PRUint16    pg_find_cipher(char *name);
> +
> +typedef struct
> +{
> +    const char *name;
> +    PRUint16    number;
> +}            NSSCiphers;
> +
> +#define INVALID_CIPHER    0xFFFF
> +
> +/*
> + * This list is a partial copy of the ciphers in NSS files lib/ssl/sslproto.h
> + * in order to provide a human readable version of the ciphers. It would be
> + * nice to not have to have this, but NSS doesn't provide any API addressing
> + * the ciphers by name. TODO: do we want more of the ciphers, or perhaps less?
> + */
> +static const NSSCiphers NSS_CipherList[] = {
> +
> +    {"TLS_NULL_WITH_NULL_NULL", TLS_NULL_WITH_NULL_NULL},

Hm. Is this whole business of defining array constants in a header just
done to avoid having a .c file that needs to be compiled both in
frontend and backend code?


> +/*
> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
> + * colliding definitions from ours, causing a much expected compiler error.
> + * The definitions are however not actually used in NSPR at all, and are only
> + * intended for what seems to be backwards compatibility for apps written
> + * against old versions of NSPR.  The following comment is in the referenced
> + * file, and was added in 1998:
> + *
> + *        This section typedefs the old 'native' types to the new PR<type>s.
> + *        These definitions are scheduled to be eliminated at the earliest
> + *        possible time. The NSPR API is implemented and documented using
> + *        the new definitions.
> + *
> + * As there is no opt-out from pulling in these typedefs, we define the guard
> + * for the file to exclude it. This is incredibly ugly, but seems to be about
> + * the only way around it.
> + */

There's a lot of duplicated comments here. Could we move either of the
files to reference the other for longer ones?



> +/*
> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
> + * only way to make NSS use an already set up POSIX file descriptor rather
> + * than opening one itself. To quote the NSS documentation:
> + *
> + *        "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
> + *        implementation changes. In practice, this is unlikely to happen because
> + *        NSPR's implementation has been stable for years and because of NSPR's
> + *        strong commitment to backward compatibility."
> + *
> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
> + *
> + * The function is declared in <private/pprio.h>, but as it is a header marked
> + * private we declare it here rather than including it.
> + */
> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);

Ugh. This is really the way to do this? How do other applications deal
with this problem?


> +#if defined(WIN32)
> +static const char *ca_trust_name = "nssckbi.dll";
> +#elif defined(__darwin__)
> +static const char *ca_trust_name = "libnssckbi.dylib";
> +#else
> +static const char *ca_trust_name = "libnssckbi.so";
> +#endif

There's really no pre-existing handling for this in nss???


> +    /*
> +     * The original design of NSS was for a single application to use a single
> +     * copy of it, initialized with NSS_Initialize() which isn't returning any
> +     * handle with which to refer to NSS. NSS initialization and shutdown are
> +     * global for the application, so a shutdown in another NSS enabled
> +     * library would cause NSS to be stopped for libpq as well.  The fix has
> +     * been to introduce NSS_InitContext which returns a context handle to
> +     * pass to NSS_ShutdownContext.  NSS_InitContext was introduced in NSS
> +     * 3.12, but the use of it is not very well documented.
> +     * https://bugzilla.redhat.com/show_bug.cgi?id=738456
> +     *
> +     * The InitParameters struct passed can be used to override internal
> +     * values in NSS, but the usage is not documented at all. When using
> +     * NSS_Init initializations, the values are instead set via PK11_Configure
> +     * calls so the PK11_Configure documentation can be used to glean some
> +     * details on these.
> +     *
> +     * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11/Module_Specs

> +
> +    if (!nss_context)
> +    {
> +        char       *err = pg_SSLerrmessage(PR_GetError());
> +
> +        printfPQExpBuffer(&conn->errorMessage,
> +                          libpq_gettext("unable to %s certificate database: %s"),
> +                          conn->cert_database ? "open" : "create",
> +                          err);
> +        free(err);
> +        return PGRES_POLLING_FAILED;
> +    }
> +
> +    /*
> +     * Configure cipher policy.
> +     */
> +    status = NSS_SetDomesticPolicy();

Why is "domestic" the right thing here?


> +
> +    PK11_SetPasswordFunc(PQssl_passwd_cb);

Is it actually OK to do stuff like this when other users of NSS might be
present? That's obviously more likely in the libpq case, compared to the
backend case (where it's also possible, of course). What prevents us
from overriding another user's callback?


> +ssize_t
> +pgtls_read(PGconn *conn, void *ptr, size_t len)
> +{
> +    PRInt32        nread;
> +    PRErrorCode status;
> +    int            read_errno = 0;
> +
> +    nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT);
> +
> +    /*
> +     * PR_Recv blocks until there is data to read or the timeout expires. Zero
> +     * is returned for closed connections, while -1 indicates an error within
> +     * the ongoing connection.
> +     */
> +    if (nread == 0)
> +    {
> +        read_errno = ECONNRESET;
> +        return -1;
> +    }

It's a bit confusing to talk about blocking when the socket presumably
is in non-blocking mode, and you're also asking to never wait?


> +    if (nread == -1)
> +    {
> +        status = PR_GetError();
> +
> +        switch (status)
> +        {
> +            case PR_WOULD_BLOCK_ERROR:
> +                read_errno = EINTR;
> +                break;

Uh, isn't this going to cause a busy-loop by the caller? EINTR isn't the
same as EAGAIN/EWOULDBLOCK?
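
The distinction matters because callers commonly retry at once on EINTR but go back to waiting on the socket for EAGAIN/EWOULDBLOCK. A minimal sketch of a mapping that keeps the two apart, using hypothetical stand-in codes rather than the real NSPR constants from <prerr.h>, and assuming a POSIX <errno.h>:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for NSPR error codes, for illustration only. */
typedef enum
{
    SKETCH_WOULD_BLOCK,
    SKETCH_CONNECT_RESET,
    SKETCH_OTHER
} SketchError;

/*
 * Translate a library error into the errno the caller's read loop expects.
 * "Would block" maps to EAGAIN so the caller waits for readiness instead
 * of spinning, which is what an EINTR mapping would invite.
 */
int
sketch_errno_for(SketchError err)
{
    switch (err)
    {
        case SKETCH_WOULD_BLOCK:
            return EAGAIN;
        case SKETCH_CONNECT_RESET:
            return ECONNRESET;
        default:
            return EIO;
    }
}
```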


> +            case PR_IO_TIMEOUT_ERROR:
> +                break;

What does this mean? We'll return with a 0 errno here, right? When is
this case reachable?

E.g. the comment in fe-misc.c:
                /* pqsecure_read set the error message for us */
for this case doesn't seem to be fulfilled by this.


> +/*
> + *    Verify that the server certificate matches the hostname we connected to.
> + *
> + * The certificate's Common Name and Subject Alternative Names are considered.
> + */
> +int
> +pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn,
> +                                                int *names_examined,
> +                                                char **first_name)
> +{
> +    return 1;
> +}

Uh, huh? Certainly doesn't verify anything...


> +/* ------------------------------------------------------------ */
> +/*            PostgreSQL specific TLS support functions            */
> +/* ------------------------------------------------------------ */
> +
> +/*
> + * TODO: this a 99% copy of the same function in the backend, make these share
> + * a single implementation instead.
> + */
> +static char *
> +pg_SSLerrmessage(PRErrorCode errcode)
> +{
> +    const char *error;
> +
> +    error = PR_ErrorToName(errcode);
> +    if (error)
> +        return strdup(error);
> +
> +    return strdup("unknown TLS error");
> +}

Btw, why does this need to duplicate strings, instead of returning a
const char*?



Greetings,

Andres Freund



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,

Thanks for your review, much appreciated!

> On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote:
>> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001
>> From: Daniel Gustafsson <daniel@yesql.se>
>> Date: Thu, 8 Oct 2020 18:44:28 +0200
>> Subject: [PATCH] Support for NSS as a TLS backend v12
>> ---
>> configure                                     |  223 +++-
>> configure.ac                                  |   39 +-
>> contrib/Makefile                              |    2 +-
>> contrib/pgcrypto/Makefile                     |    5 +
>> contrib/pgcrypto/nss.c                        |  773 +++++++++++
>> contrib/pgcrypto/openssl.c                    |    2 +-
>> contrib/pgcrypto/px.c                         |    1 +
>> contrib/pgcrypto/px.h                         |    1 +
>
> Personally I'd like to see this patch broken up a bit - it's quite
> large. Several of the changes could easily be committed separately, no?

Not sure how much of this makes sense committed separately (unless separately
means in quick succession), but it could certainly be broken up for the sake of
making review easier.  I will take a stab at that, but in a follow-up email as
I would like the split to be a version just doing the split and not also
introducing/fixing things.

>> if test "$with_openssl" = yes ; then
>> +  if test x"$with_nss" = x"yes" ; then
>> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
>> +  fi
>
> Based on a quick look there's no similar error check for the msvc
> build. Should there be?

That's a good question.  When embarking on this it seemed quite natural to me
that it should be, but now I'm not so sure.  Maybe there should be a
--with-openssl-preferred like how we handle readline/libedit or just allow
multiple and let the last one win?  Do you have any input on what would make
sense?

The only thing I think makes no sense is to allow multiple ones at the same
time given the current autoconf switches, even if it would just be to pick say
pg_strong_random from one and libpq TLS from another.

>> +if test "$with_nss" = yes ; then
>> +  if test x"$with_openssl" = x"yes" ; then
>> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
>> +  fi
>
> Isn't this a repetition of the earlier check?

It is, and if we want to keep such a check it should be broken out into a
separate step performed before all library specific checks IMO.

>> +  CLEANLDFLAGS="$LDFLAGS"
>> +  # TODO: document this set of LDFLAGS
>> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"
>
> Shouldn't this use nss-config or such?

Indeed it should, where available.  I've added rudimentary support for that
without a fallback as of now.

>> +if test "$with_nss" = yes ; then
>> +  AC_CHECK_HEADER(ssl.h, [], [AC_MSG_ERROR([header file <ssl.h> is required for NSS])])
>> +  AC_CHECK_HEADER(nss.h, [], [AC_MSG_ERROR([header file <nss.h> is required for NSS])])
>> +fi
>
> Hm. For me, on debian, these headers are not directly in the default
> include search path, but would be as nss/ssl.h. I don't see you adding
> nss/ to CFLAGS anywhere? How does this work currently?

I had Stockholm-syndromed myself into passing --with-includes and hadn't really
realized. Sometimes the obvious is too obvious in a 4000+ LOC patch.

> I think it'd also be better if we could include these files as nss/ssl.h
> etc - ssl.h is a name way too likely to conflict imo.

I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path
will still need the direct path to the headers (from autoconf) since nss.h
includes NSPR headers as #include <nspr.h> and so on.

>> +++ b/src/backend/libpq/be-secure-nss.c
>> @@ -0,0 +1,1158 @@
>> +/*
>> + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef
>> + * our version to avoid compiler warnings on redefinition.
>> + */
>> +#define pg_BITS_PER_BYTE BITS_PER_BYTE
>> +#undef BITS_PER_BYTE
>
> Most compilers/preprocessors don't warn about redefinitions when they
> would result in the same value (IIRC we have some cases of redefinitions
> in tree even). Does nspr's differ?

GCC 8.3 in my Debian installation throws the below warning:

    In file included from /usr/include/nspr/prtypes.h:26,
                     from /usr/include/nspr/pratom.h:14,
                     from /usr/include/nspr/nspr.h:9,
                     from be-secure-nss.c:45:
    /usr/include/nspr/prcpucfg.h:1143: warning: "BITS_PER_BYTE" redefined
     #define BITS_PER_BYTE  PR_BITS_PER_BYTE

    In file included from ../../../src/include/c.h:55,
                     from ../../../src/include/postgres.h:46,
                     from be-secure-nss.c:16:
    ../../../src/include/pg_config_manual.h:115: note: this is the location of the previous definition
     #define BITS_PER_BYTE  8

PR_BITS_PER_BYTE is defined per platform in pr/include/md/_<platform>.cfg and
is as expected 8. I assume it's that indirection which causes the warning?
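
For reference, C only lets a macro be redefined silently when the new
replacement list is token-for-token identical to the old one, so `8` versus
`PR_BITS_PER_BYTE` is diagnosed even though both evaluate to 8.  Below is a
minimal sketch of the workaround.  SKETCH_PR_BITS_PER_BYTE is a made-up
stand-in for the NSPR definition, not a real NSPR identifier.  One detail worth
keeping in mind: a plain `#define pg_BITS_PER_BYTE BITS_PER_BYTE` is expanded
lazily and so does not capture the old value, which is why the sketch captures
it in an enum constant before the #undef.

```c
#include <assert.h>

/* Stand-in for the definition in pg_config_manual.h. */
#define BITS_PER_BYTE 8

/* Capture the value now; a macro alias would be re-expanded later instead. */
enum { pg_bits_per_byte = BITS_PER_BYTE };

/* Drop our definition so that the next #define is not a redefinition. */
#undef BITS_PER_BYTE

/* Stand-in for what an NSPR-style header does (hypothetical name). */
#define SKETCH_PR_BITS_PER_BYTE 8
#define BITS_PER_BYTE SKETCH_PR_BITS_PER_BYTE

/* Both definitions must agree, otherwise mixing the headers is unsafe. */
static_assert(BITS_PER_BYTE == pg_bits_per_byte,
              "BITS_PER_BYTE definitions disagree");

/* Value visible to code compiled after the NSPR-style header. */
static int
bits_seen_after_header(void)
{
    return BITS_PER_BYTE;
}

/* Value captured from our own header before the #undef. */
static int
bits_saved_before_undef(void)
{
    return pg_bits_per_byte;
}
```

With this ordering the compiler sees one definition at a time, so no
redefinition warning is produced.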

>> +/*
>> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
>> + * colliding definitions from ours, causing a much expected compiler error.
>> + * The definitions are however not actually used in NSPR at all, and are only
>> + * intended for what seems to be backwards compatibility for apps written
>> + * against old versions of NSPR.  The following comment is in the referenced
>> + * file, and was added in 1998:
>> + *
>> + *        This section typedefs the old 'native' types to the new PR<type>s.
>> + *        These definitions are scheduled to be eliminated at the earliest
>> + *        possible time. The NSPR API is implemented and documented using
>> + *        the new definitions.
>> + *
>> + * As there is no opt-out from pulling in these typedefs, we define the guard
>> + * for the file to exclude it. This is incredibly ugly, but seems to be about
>> + * the only way around it.
>> + */
>> +#define PROTYPES_H
>> +#include <nspr.h>
>> +#undef PROTYPES_H
>
> Yuck :(.

That's not an understatement.  Taking another dive into the NSPR code I did
however find a proper way to deal with this.  Defining NO_NSPR_10_SUPPORT stops
NSPR from using the files in obsolete/. So fixed, yay!

>> +int
>> +be_tls_init(bool isServerStart)
>> +{
>> +    SECStatus    status;
>> +    SSLVersionRange supported_sslver;
>> +
>> +    /*
>> +     * Set up the connection cache for multi-processing application behavior.
>
> Hm. Do we necessarily want that? Session resumption is not exactly
> unproblematic... Or does this do something else?

From my reading of the docs, and experience with the code, a server application
must set up a connection cache in order to accept connections.  Not entirely
sure, and the docs aren't terribly clear for non SSLv2/v3 environments (it
seems to only cache for SSLv2/3 and not TLSv+) but it seems like it may have
other uses internally.  I will hunt down some more information on the NSS
mailing list.

>> +     * If we are in ServerStart then we initialize the cache. If the server is
>> +     * already started, we inherit the cache such that it can be used for
>> +     * connections. Calling SSL_ConfigMPServerSIDCache sets an environment
>> +     * variable which contains enough information for the forked child to know
>> +     * how to access it.  Passing NULL to SSL_InheritMPServerSIDCache will
>> +     * make the forked child look it up by the default name SSL_INHERITANCE,
>> +     * if env vars aren't inherited then the contents of the variable can be
>> +     * passed instead.
>> +     */
>
> Does this stuff work on windows

According to the documentation it does, and Andrew had this working on Windows
in an earlier version of the patch.  I need to get a proper Windows env for
testing/dev up and running as mine has bitrotted to nothingness.

> / EXEC_BACKEND?

That's a good point, maybe we need to do a SSL_ConfigServerSessionIDCache
rather than the MP version for EXEC_BACKEND? Not sure.

>> +     * The below parameters are what the implicit initialization would've done
>> +     * for us, and should work even for older versions where it might not be
>> +     * done automatically. The last parameter, maxPTDs, is set to various
>> +     * values in other codebases, but has been unused since NSPR 2.1 which was
>> +     * released sometime in 1998.
>> +     */
>> +    PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ );
>
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_Init
> says that currently all parameters are ignored?

Right, my comment didn't reflect that they're all dead these days, only that
one of them has been unused since RUN DMC topped the charts with "It's like
that". Comment updated.

>> +    /*
>> +     * Import the already opened socket as we don't want to use NSPR functions
>> +     * for opening the network socket due to how the PostgreSQL protocol works
>> +     * with TLS connections. This function is not part of the NSPR public API,
>> +     * see the comment at the top of the file for the rationale of still using
>> +     * it.
>> +     */
>> +    pr_fd = PR_ImportTCPSocket(port->sock);
>> +    if (!pr_fd)
>> +        ereport(ERROR,
>> +                (errmsg("unable to connect to socket")));
>
> I don't see the comment you're referring to?

It's referring to the comment discussing PR_ImportTCPSocket being a private API
call, yet still used by everyone (which is also discussed later in this review).

>> +    /*
>> +     * Most of the documentation available, and implementations of, NSS/NSPR
>> +     * use the PR_NewTCPSocket() function here, which has the drawback that it
>> +     * can only create IPv4 sockets. Instead use PR_OpenTCPSocket() which
>> +     * copes with IPv6 as well.
>> +     */
>> +    model = PR_OpenTCPSocket(port->laddr.addr.ss_family);
>> +    if (!model)
>> +        ereport(ERROR,
>> +                (errmsg("unable to open socket")));
>> +
>> +    /*
>> +     * Convert the NSPR socket to an SSL socket. Ensuring the success of this
>> +     * operation is critical as NSS SSL_* functions may return SECSuccess on
>> +     * the socket even though SSL hasn't been enabled, which introduce a risk
>> +     * of silent downgrades.
>> +     */
>> +    model = SSL_ImportFD(NULL, model);
>> +    if (!model)
>> +        ereport(ERROR,
>> +                (errmsg("unable to enable TLS on socket")));
>
> It's confusing that these functions do not actually reference the socket
> via some handle :(. What does opening a socket do here?

This specific call converts the socket from a plain NSPR socket to an SSL/TLS
capable socket which NSS will work with.  This is a required step for
"activating" NSS on the socket.

>> +    /*
>> +     * Configure the allowed cipher. If there are no user preferred suites,
>
> *ciphers?

Yes, fixed.

>> +
>> +    port->pr_fd = SSL_ImportFD(model, pr_fd);
>> +    if (!port->pr_fd)
>> +        ereport(ERROR,
>> +                (errmsg("unable to initialize")));
>> +
>> +    PR_Close(model);
>
> A comment explaining why we first import a NULL into the model, and then
> release the model, and import the real fd would be good.

I've added a small comment to explain how the model is a configuration template
for the actual socket.  This part of NSS/NSPR is a bit overcomplicated for how
we have connections, it's more geared towards having many open sockets in the
same process.

>> +ssize_t
>> +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
>> +{
>> +    ssize_t        n_read;
>> +    PRErrorCode err;
>> +
>> +    n_read = PR_Read(port->pr_fd, ptr, len);
>> +
>> +    if (n_read < 0)
>> +    {
>> +        err = PR_GetError();
>
> Yay, more thread global state :(.

Sorry about that.

>> +        /* XXX: This logic seems potentially bogus? */
>> +        if (err == PR_WOULD_BLOCK_ERROR)
>> +            *waitfor = WL_SOCKET_READABLE;
>> +        else
>> +            *waitfor = WL_SOCKET_WRITEABLE;
>
> Don't we need to handle failed connections somewhere here? secure_read()
> won't know about PR_GetError() etc? How would SSL errors be signalled
> upwards here?
>
> Also, as you XXX, it's not clear to me that your mapping would always
> result in waiting for the right event? A tls write could e.g. very well
> require receiving data etc?

Fixed, but there might be more to be done here.

>> +    /*
>> +     * At least one byte with password content was returned, and NSS requires
>> +     * that we return it allocated in NSS controlled memory. If we fail to
>> +     * allocate then abort without passing back NULL and bubble up the error
>> +     * on the PG side.
>> +     */
>> +    password = (char *) PR_Malloc(len + 1);
>> +    if (!password)
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_OUT_OF_MEMORY),
>> +                 errmsg("out of memory")));
>>
>> +    strlcpy(password, buf, sizeof(password));
>> +    explicit_bzero(buf, sizeof(buf));
>> +
>
> In case of error you're not bzero'ing out the password!

Fixed.
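
To illustrate the shape of the fix, here is a minimal standalone sketch, not
the code in the attached patch.  It assumes a hypothetical `alloc` hook
standing in for PR_Malloc and a hand-rolled `wipe` standing in for
explicit_bzero.  It also sidesteps a second problem in the quoted hunk, where
`sizeof(password)` appears to be taken on a pointer and therefore yields the
pointer size rather than the allocation size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Zero a buffer through a volatile pointer so the stores are not elided. */
static void
wipe(void *p, size_t n)
{
    volatile unsigned char *vp = (volatile unsigned char *) p;

    while (n--)
        *vp++ = 0;
}

/* Allocator that always fails, only used to exercise the error path. */
static void *
failing_alloc(size_t n)
{
    (void) n;
    return NULL;
}

/*
 * Copy the NUL-terminated password held in buf (capacity buflen) into
 * memory obtained from alloc, then wipe buf whether or not the allocation
 * succeeded.  Returns NULL on allocation failure.
 */
static char *
copy_password(char *buf, size_t buflen, void *(*alloc) (size_t))
{
    const char *end;
    size_t      len;
    char       *password;

    assert(buf != NULL && alloc != NULL);

    /* Never read past the end of buf, even if it lacks a terminator. */
    end = memchr(buf, '\0', buflen);
    len = end ? (size_t) (end - buf) : buflen;

    password = alloc(len + 1);
    if (password != NULL)
    {
        memcpy(password, buf, len);
        password[len] = '\0';
    }

    /* Wipe on both the success and the failure path. */
    wipe(buf, buflen);
    return password;
}
```

The caller can then raise the out-of-memory error after the function returns
NULL, at which point the source buffer has already been cleared.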

> Separately, I wonder if we should introduce a function for throwing OOM
> errors - which then e.g. could print the memory context stats in those
> places too...

+1. I'd be happy to review such a patch.

>> +static SECStatus
>> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
>> +{
>> +    SECStatus    status;
>> +    Port       *port = (Port *) arg;
>> +    CERTCertificate *cert;
>> +    char       *peer_cn;
>> +    int            len;
>> +
>> +    status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
>> +    if (status == SECSuccess)
>> +    {
>> +        cert = SSL_PeerCertificate(port->pr_fd);
>> +        len = strlen(cert->subjectName);
>> +        peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
>> +        if (strncmp(cert->subjectName, "CN=", 3) == 0)
>> +            strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
>> +        else
>> +            strlcpy(peer_cn, cert->subjectName, len + 1);
>> +        CERT_DestroyCertificate(cert);
>> +
>> +        port->peer_cn = peer_cn;
>> +        port->peer_cert_valid = true;
>
> Hm. We either should have something similar to
>
>             /*
>              * Reject embedded NULLs in certificate common name to prevent
>              * attacks like CVE-2009-4034.
>              */
>             if (len != strlen(peer_cn))
>             {
>                 ereport(COMMERROR,
>                         (errcode(ERRCODE_PROTOCOL_VIOLATION),
>                          errmsg("SSL certificate's common name contains embedded null")));
>                 pfree(peer_cn);
>                 return -1;
>             }
> here, or a comment explaining why not.

We should, but it's proving rather difficult as there is no equivalent API call
to get the string as well as the expected length of it.

> Also, what's up with the CN= bit? Why is that needed here, but not for
> openssl?

OpenSSL returns only the value portion, whereas NSS returns key=value so we
need to skip over the key= part.
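
As a sketch of what both steps could look like once a length is available,
the following standalone function takes the raw subject bytes plus an explicit
length.  The function name and the (raw, rawlen) interface are assumptions for
illustration; as noted above, NSS does not obviously hand us such a pair.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc'd copy of the value part of a "CN=value" subject, or of
 * the whole subject if it does not start with "CN=".  Returns NULL if the
 * raw bytes contain an embedded NUL, or on allocation failure.
 */
static char *
peer_cn_from_subject(const char *raw, size_t rawlen)
{
    const char *start = raw;
    size_t      len = rawlen;
    char       *cn;

    assert(raw != NULL);

    /* Reject embedded NULs, in the spirit of the CVE-2009-4034 check. */
    if (memchr(raw, '\0', rawlen) != NULL)
        return NULL;

    /* Skip the key= part when present. */
    if (len >= 3 && memcmp(start, "CN=", 3) == 0)
    {
        start += 3;
        len -= 3;
    }

    cn = malloc(len + 1);
    if (cn == NULL)
        return NULL;

    memcpy(cn, start, len);
    cn[len] = '\0';
    return cn;
}
```

The NUL check only works because the length is passed in separately from the
string, which is exactly the piece that is missing on the NSS side.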

>> +static PRFileDesc *
>> +init_iolayer(Port *port, int loglevel)
>> +{
>> +    const        PRIOMethods *default_methods;
>> +    PRFileDesc *layer;
>> +
>> +    /*
>> +     * Start by initializing our layer with all the default methods so that we
>> +     * can selectively override the ones we want while still ensuring that we
>> +     * have a complete layer specification.
>> +     */
>> +    default_methods = PR_GetDefaultIOMethods();
>> +    memcpy(&pr_iomethods, default_methods, sizeof(PRIOMethods));
>> +
>> +    pr_iomethods.recv = pg_ssl_read;
>> +    pr_iomethods.send = pg_ssl_write;
>> +
>> +    /*
>> +     * Each IO layer must be identified by a unique name, where uniqueness is
>> +     * per connection. Each connection in a postgres cluster can generate the
>> +     * identity from the same string as they will create their IO layers on
>> +     * different sockets. Only one layer per socket can have the same name.
>> +     */
>> +    pr_id = PR_GetUniqueIdentity("PostgreSQL");
>
> Seems like it might not be a bad idea to append Server or something?

Fixed.

>> +    /*
>> +     * Create the actual IO layer as a stub such that it can be pushed onto
>> +     * the layer stack. The step via a stub is required as we define custom
>> +     * callbacks.
>> +     */
>> +    layer = PR_CreateIOLayerStub(pr_id, &pr_iomethods);
>> +    if (!layer)
>> +    {
>> +        ereport(loglevel,
>> +                (errmsg("unable to create NSS I/O layer")));
>> +        return NULL;
>> +    }
>
> Why is this accepting a variable log level? The only caller passes ERROR?

Good catch, that's a leftover from a previous version which no longer makes
sense.  loglevel param removed.

>> +/*
>> + * pg_SSLerrmessage
>> + *        Create and return a human readable error message given
>> + *        the specified error code
>> + *
>> + * PR_ErrorToName only converts the enum identifier of the error to string,
>> + * but that can be quite useful for debugging (and in case PR_ErrorToString is
>> + * unable to render a message then we at least have something).
>> + */
>> +static char *
>> +pg_SSLerrmessage(PRErrorCode errcode)
>> +{
>> +    char        error[128];
>> +    int            ret;
>> +
>> +    /* TODO: this should perhaps use a StringInfo instead.. */
>> +    ret = pg_snprintf(error, sizeof(error), "%s (%s)",
>> +                      PR_ErrorToString(errcode, PR_LANGUAGE_I_DEFAULT),
>> +                      PR_ErrorToName(errcode));
>> +    if (ret)
>> +        return pstrdup(error);
>
>> +    return pstrdup(_("unknown TLS error"));
>> +}
>
> Why not use psprintf() here?

That's a good question to which I don't have a good answer.  Changed to doing
just that.
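
As an aside on the fixed-buffer version quoted above: assuming pg_snprintf
follows the C99 snprintf convention, it returns the number of characters the
full output would need, so `if (ret)` is true on truncation and on errors as
well.  A minimal sketch of a stricter check using plain snprintf, with a
hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format "msg (name)" into dst.  Returns 0 on success, 1 if the output was
 * truncated to fit dstlen, and -1 on an output error.
 */
static int
format_tls_error(char *dst, size_t dstlen, const char *msg, const char *name)
{
    int         ret;

    assert(dst != NULL && dstlen > 0);

    ret = snprintf(dst, dstlen, "%s (%s)", msg, name);
    if (ret < 0)
        return -1;              /* output error */
    if ((size_t) ret >= dstlen)
        return 1;               /* did not fit, dst holds a truncated copy */
    return 0;
}
```

Building the string in dynamically sized memory, as psprintf does, makes the
truncation case go away entirely.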

>> +++ b/src/include/common/pg_nss.h
>> @@ -0,0 +1,141 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * pg_nss.h
>> + *      Support for NSS as a TLS backend
>> + *
>> + * These definitions are used by both frontend and backend code.
>> + *
>> + * Copyright (c) 2020, PostgreSQL Global Development Group
>> + *
>> + * IDENTIFICATION
>> + *        src/include/common/pg_nss.h
>> + *
>> + *-------------------------------------------------------------------------
>> + */
>> +#ifndef PG_NSS_H
>> +#define PG_NSS_H
>> +
>> +#ifdef USE_NSS
>> +
>> +#include <sslproto.h>
>> +
>> +PRUint16    pg_find_cipher(char *name);
>> +
>> +typedef struct
>> +{
>> +    const char *name;
>> +    PRUint16    number;
>> +}            NSSCiphers;
>> +
>> +#define INVALID_CIPHER    0xFFFF
>> +
>> +/*
>> + * This list is a partial copy of the ciphers in NSS files lib/ssl/sslproto.h
>> + * in order to provide a human readable version of the ciphers. It would be
>> + * nice to not have to have this, but NSS doesn't provide any API addressing
>> + * the ciphers by name. TODO: do we want more of the ciphers, or perhaps less?
>> + */
>> +static const NSSCiphers NSS_CipherList[] = {
>> +
>> +    {"TLS_NULL_WITH_NULL_NULL", TLS_NULL_WITH_NULL_NULL},
>
> Hm. Is this whole business of defining array constants in a header just
> done to avoid having a .c file that needs to be compiled both in
> frontend and backend code?

That was the original motivation, but I guess I should just bite the bullet and
make it a .c compiled in both frontend and backend?

>> +/*
>> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
>> + * colliding definitions from ours, causing a much expected compiler error.
>> + * The definitions are however not actually used in NSPR at all, and are only
>> + * intended for what seems to be backwards compatibility for apps written
>> + * against old versions of NSPR.  The following comment is in the referenced
>> + * file, and was added in 1998:
>> + *
>> + *        This section typedefs the old 'native' types to the new PR<type>s.
>> + *        These definitions are scheduled to be eliminated at the earliest
>> + *        possible time. The NSPR API is implemented and documented using
>> + *        the new definitions.
>> + *
>> + * As there is no opt-out from pulling in these typedefs, we define the guard
>> + * for the file to exclude it. This is incredibly ugly, but seems to be about
>> + * the only way around it.
>> + */
>
> There's a lot of duplicated comments here. Could we move either of the
> files to reference the other for longer ones?

I took a stab at this in the attached version.  The code is perhaps
over-commented in parts but I tried to encode my understanding of NSS into the
comments where documentation is lacking, since I assume I'm not the only one
who is new to NSS.  There might be a need to pare back to keep it focused in
case this patch goes further.

>> +/*
>> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
>> + * only way to make NSS use an already set up POSIX file descriptor rather
>> + * than opening one itself. To quote the NSS documentation:
>> + *
>> + *        "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
>> + *        implementation changes. In practice, this is unlikely to happen because
>> + *        NSPR's implementation has been stable for years and because of NSPR's
>> + *        strong commitment to backward compatibility."
>> + *
>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
>> + *
>> + * The function is declared in <private/pprio.h>, but as it is a header marked
>> + * private we declare it here rather than including it.
>> + */
>> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);
>
> Ugh. This is really the way to do this? How do other applications deal
> with this problem?

They either #include <private/pprio.h> or they do it like this (or vendor NSPR
which makes calling private APIs less problematic).  It sure is ugly, but there
is no alternative to using this function.

>> +#if defined(WIN32)
>> +static const char *ca_trust_name = "nssckbi.dll";
>> +#elif defined(__darwin__)
>> +static const char *ca_trust_name = "libnssckbi.dylib";
>> +#else
>> +static const char *ca_trust_name = "libnssckbi.so";
>> +#endif
>
> There's really no pre-existing handling for this in nss???

NSS_Init does have more or less the above logic (see snippet below), but only
when there is a cert database defined.

    /*
     * The following code is an attempt to automagically find the external root
     * module.
     * Note: Keep the #if-defined chunks in order. HPUX must select before UNIX.
     */

    static const char *dllname =
    #if defined(XP_WIN32) || defined(XP_OS2)
        "nssckbi.dll";
    #elif defined(HPUX) && !defined(__ia64) /* HP-UX PA-RISC */
        "libnssckbi.sl";
    #elif defined(DARWIN)
        "libnssckbi.dylib";
    #elif defined(XP_UNIX) || defined(XP_BEOS)
        "libnssckbi.so";
    #else
    #error "Uh! Oh! I don't know about this platform."
    #endif

In the NSS_INIT_NOCERTDB case there is no such handling of the libname provided
by NSS so we need to do that ourselves.

>> +    /*
>> +     * The original design of NSS was for a single application to use a single
>> +     * copy of it, initialized with NSS_Initialize() which isn't returning any
>> +     * handle with which to refer to NSS. NSS initialization and shutdown are
>> +     * global for the application, so a shutdown in another NSS enabled
>> +     * library would cause NSS to be stopped for libpq as well.  The fix has
>> +     * been to introduce NSS_InitContext which returns a context handle to
>> +     * pass to NSS_ShutdownContext.  NSS_InitContext was introduced in NSS
>> +     * 3.12, but the use of it is not very well documented.
>> +     * https://bugzilla.redhat.com/show_bug.cgi?id=738456
>> +     *
>> +     * The InitParameters struct passed can be used to override internal
>> +     * values in NSS, but the usage is not documented at all. When using
>> +     * NSS_Init initializations, the values are instead set via PK11_Configure
>> +     * calls so the PK11_Configure documentation can be used to glean some
>> +     * details on these.
>> +     *
>> +     * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11/Module_Specs
>
>> +
>> +    if (!nss_context)
>> +    {
>> +        char       *err = pg_SSLerrmessage(PR_GetError());
>> +
>> +        printfPQExpBuffer(&conn->errorMessage,
>> +                          libpq_gettext("unable to %s certificate database: %s"),
>> +                          conn->cert_database ? "open" : "create",
>> +                          err);
>> +        free(err);
>> +        return PGRES_POLLING_FAILED;
>> +    }
>> +
>> +    /*
>> +     * Configure cipher policy.
>> +     */
>> +    status = NSS_SetDomesticPolicy();
>
> Why is "domestic" the right thing here?

Historically there are three cipher policies in NSS: Domestic, Export and
France.  These would enable a set of ciphers based on US export restrictions
(domest/export) or French import restrictions.  All ciphers would start
disabled and then the ciphers belonging to the chosen set would be enabled.
Long ago, that was however removed and they now all get enabled by calling
either of these three functions.  NSS_SetDomesticPolicy enables all implemented
ciphers, and the other calls just call NSS_SetDomesticPolicy, I guess that API
was kept for backwards compatibility.  The below bugzilla entry has a bit more
information on this:

   https://bugzilla.mozilla.org/show_bug.cgi?id=848384

That being said, the comment in the code did not reflect that, so I've reworded
it hoping it will be clearer now.

>> +
>> +    PK11_SetPasswordFunc(PQssl_passwd_cb);
>
> Is it actually OK to do stuff like this when other users of NSS might be
> present? That's obviously more likely in the libpq case, compared to the
> backend case (where it's also possible, of course). What prevents us
> from overriding another user's callback?

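The password callback pointer is stored in a static variable in NSS (in the
file lib/pk11wrap/pk11auth.c), so the callback is global to the process and
nothing prevents us from overriding one set by another user of NSS.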

>> +ssize_t
>> +pgtls_read(PGconn *conn, void *ptr, size_t len)
>> +{
>> +    PRInt32        nread;
>> +    PRErrorCode status;
>> +    int            read_errno = 0;
>> +
>> +    nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT);
>> +
>> +    /*
>> +     * PR_Recv blocks until there is data to read or the timeout expires. Zero
>> +     * is returned for closed connections, while -1 indicates an error within
>> +     * the ongoing connection.
>> +     */
>> +    if (nread == 0)
>> +    {
>> +        read_errno = ECONNRESET;
>> +        return -1;
>> +    }
>
> It's a bit confusing to talk about blocking when the socket presumably
> is in non-blocking mode, and you're also asking to never wait?

Fair enough, I can agree that the wording isn't spot on. The socket is
non-blocking while PR_Recv can block (which is what we ask it not to).  I've
reworded and moved the comment around to hopefully make it clearer.

>> +    if (nread == -1)
>> +    {
>> +        status = PR_GetError();
>> +
>> +        switch (status)
>> +        {
>> +            case PR_WOULD_BLOCK_ERROR:
>> +                read_errno = EINTR;
>> +                break;
>
> Uh, isn't this going to cause a busy-loop by the caller? EINTR isn't the
> same as EAGAIN/EWOULDBLOCK?

Right, that's clearly not right.
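
The distinction matters because EINTR conventionally asks the caller to retry
right away, while EAGAIN/EWOULDBLOCK asks it to wait until the socket is ready.
Here is a minimal sketch of the intended mapping.  The SKETCH_ enum is a
stand-in for the NSPR error codes so that the example builds without NSPR, and
the choice of ECONNRESET as the catch-all is an assumption rather than what the
attached patch necessarily does.

```c
#include <errno.h>

/* Stand-ins for NSPR error codes (hypothetical names, not the NSPR API). */
enum sketch_err
{
    SKETCH_WOULD_BLOCK,
    SKETCH_CONN_RESET,
    SKETCH_OTHER
};

/*
 * Translate a TLS library error into an errno value the caller of the
 * read function knows how to act on.
 */
static int
sketch_errno_for(enum sketch_err err)
{
    switch (err)
    {
        case SKETCH_WOULD_BLOCK:
            /* No data yet: the caller should wait for the socket. */
            return EAGAIN;
        case SKETCH_CONN_RESET:
            return ECONNRESET;
        default:
            /* Treat anything unexpected as a dead connection. */
            return ECONNRESET;
    }
}
```

Returning EINTR for the would-block case would make the caller retry
immediately without waiting, which is the busy-loop pointed out above.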

>> +            case PR_IO_TIMEOUT_ERROR:
>> +                break;
>
> What does this mean? We'll return with a 0 errno here, right? When is
> this case reachable?

It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
we don't do.  It was mentioned somewhere that it had happened in no-wait calls due
to a bug, but I fail to find that reference now.  Either way, I've removed it
to fall into the default error handling which now sets errno correctly as that
was a paddle short here.

> E.g. the comment in fe-misc.c:
>                 /* pqsecure_read set the error message for us */
> for this case doesn't seem to be fulfilled by this.

Fixed, I hope.

>> +/*
>> + *    Verify that the server certificate matches the hostname we connected to.
>> + *
>> + * The certificate's Common Name and Subject Alternative Names are considered.
>> + */
>> +int
>> +pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn,
>> +                                                int *names_examined,
>> +                                                char **first_name)
>> +{
>> +    return 1;
>> +}
>
> Uh, huh? Certainly doesn't verify anything...

Doh, the verification was done as part of the cert validation callback and I
had missed moving it to the stub.  Fixed and also expanded to closer match how
it's done in the OpenSSL implementation.

>> +/* ------------------------------------------------------------ */
>> +/*            PostgreSQL specific TLS support functions            */
>> +/* ------------------------------------------------------------ */
>> +
>> +/*
>> + * TODO: this a 99% copy of the same function in the backend, make these share
>> + * a single implementation instead.
>> + */
>> +static char *
>> +pg_SSLerrmessage(PRErrorCode errcode)
>> +{
>> +    const char *error;
>> +
>> +    error = PR_ErrorToName(errcode);
>> +    if (error)
>> +        return strdup(error);
>> +
>> +    return strdup("unknown TLS error");
>> +}
>
> Btw, why does this need to duplicate strings, instead of returning a
> const char*?

No, it doesn't, and no longer does.
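
The non-duplicating shape is roughly the following sketch.  Here
`sketch_error_name` is a hypothetical stand-in for the library lookup so the
example builds without NSPR; it returns NULL for codes it does not know.

```c
#include <stddef.h>

/* Hypothetical lookup: returns a static name, or NULL for unknown codes. */
static const char *
sketch_error_name(int errcode)
{
    return (errcode == 1) ? "SKETCH_ERROR_ONE" : NULL;
}

/*
 * Return a pointer to a static string describing the error.  Nothing is
 * allocated, so the caller must not free the result.
 */
static const char *
tls_errmessage(int errcode)
{
    const char *name = sketch_error_name(errcode);

    return name ? name : "unknown TLS error";
}
```

Callers that previously did `free(err)` on the returned string have to drop
that call once the function returns a const pointer.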

The attached includes fixes for the above mentioned issues (and a few small
other ones I stumbled across), hopefully without introducing too many new ones.  As
mentioned, I'll perform the split into multiple patches in a separate version
which only performs a split to make it easier to diff the individual patchfile
versions.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Heikki Linnakangas
Date:
On 27/10/2020 22:07, Daniel Gustafsson wrote:
> /*
>  * Track whether the NSS database has a password set or not. There is no API
>  * function for retrieving password status, so we simply flip this to true in
>  * case NSS invoked the password callback - as that will only happen in case
>  * there is a password. The reason for tracking this is that there are calls
>  * which require a password parameter, but doesn't use the callbacks provided,
>  * so we must call the callback on behalf of these.
>  */
> static bool has_password = false;

This is set in PQssl_passwd_cb function, but never reset. That seems 
wrong. The NSS database used in one connection might have a password, 
while another one might not. Or have I completely misunderstood this?

- Heikki



Re: Support for NSS as a libpq TLS backend

From
Andres Freund
Date:
Hi,

On 2020-10-27 21:07:01 +0100, Daniel Gustafsson wrote:
> > On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote:
> >> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001
> >> From: Daniel Gustafsson <daniel@yesql.se>
> >> Date: Thu, 8 Oct 2020 18:44:28 +0200
> >> Subject: [PATCH] Support for NSS as a TLS backend v12
> >> ---
> >> configure                                     |  223 +++-
> >> configure.ac                                  |   39 +-
> >> contrib/Makefile                              |    2 +-
> >> contrib/pgcrypto/Makefile                     |    5 +
> >> contrib/pgcrypto/nss.c                        |  773 +++++++++++
> >> contrib/pgcrypto/openssl.c                    |    2 +-
> >> contrib/pgcrypto/px.c                         |    1 +
> >> contrib/pgcrypto/px.h                         |    1 +
> > 
> > Personally I'd like to see this patch broken up a bit - it's quite
> > large. Several of the changes could easily be committed separately, no?
> 
> Not sure how much of this makes sense committed separately (unless separately
> means in quick succession), but it could certainly be broken up for the sake of
> making review easier.

Committing e.g. the pgcrypto pieces separately from the backend code
seems unproblematic. But yes, I would expect them to go in close to each
other. I'm mainly concerned with smaller review-able units.

Have you done testing to ensure that NSS PG cooperates correctly with
openssl PG? Is there a way we can make that easier to do? E.g. allowing
to build frontend with NSS and backend with openssl and vice versa?


> >> if test "$with_openssl" = yes ; then
> >> +  if test x"$with_nss" = x"yes" ; then
> >> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
> >> +  fi
> > 
> > Based on a quick look there's no similar error check for the msvc
> > build. Should there be?
> 
> That's a good question.  When embarking on this it seemed quite natural to me
> that it should be, but now I'm not so sure.  Maybe there should be a
> --with-openssl-preferred like how we handle readline/libedit or just allow
> multiple and let the last one win?  Do you have any input on what would make
> sense?
>
> The only thing I think makes no sense is to allow multiple ones at the same
> time given the current autoconf switches, even if it would just be to pick say
> pg_strong_random from one and libpq TLS from another.

Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing
to check for errors.

Even better, of course, would be to allow switching of the SSL backend
based on config options (PGC_POSTMASTER GUC for backend, connection
string for frontend). Mainly because that would make testing of
interoperability so much easier.  Obviously still a few places like
pgcrypto, randomness, etc, where only a compile time decision seems to
make sense.


> >> +  CLEANLDFLAGS="$LDFLAGS"
> >> +  # TODO: document this set of LDFLAGS
> >> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"
> > 
> > Shouldn't this use nss-config or such?
> 
> Indeed it should, where available.  I've added rudimentary support for that
> without a fallback as of now.

When would we need a fallback?


> > I think it'd also be better if we could include these files as nss/ssl.h
> > etc - ssl.h is a name way too likely to conflict imo.
> 
> I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path
> will still need the direct path to the headers (from autoconf) since nss.h
> includes NSPR headers as #include <nspr.h> and so on.

Hm. Then it's probably not worth going there...


> >> +static SECStatus
> >> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
> >> +{
> >> +    SECStatus    status;
> >> +    Port       *port = (Port *) arg;
> >> +    CERTCertificate *cert;
> >> +    char       *peer_cn;
> >> +    int            len;
> >> +
> >> +    status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
> >> +    if (status == SECSuccess)
> >> +    {
> >> +        cert = SSL_PeerCertificate(port->pr_fd);
> >> +        len = strlen(cert->subjectName);
> >> +        peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
> >> +        if (strncmp(cert->subjectName, "CN=", 3) == 0)
> >> +            strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
> >> +        else
> >> +            strlcpy(peer_cn, cert->subjectName, len + 1);
> >> +        CERT_DestroyCertificate(cert);
> >> +
> >> +        port->peer_cn = peer_cn;
> >> +        port->peer_cert_valid = true;
> > 
> > Hm. We either should have something similar to
> > 
> >             /*
> >              * Reject embedded NULLs in certificate common name to prevent
> >              * attacks like CVE-2009-4034.
> >              */
> >             if (len != strlen(peer_cn))
> >             {
> >                 ereport(COMMERROR,
> >                         (errcode(ERRCODE_PROTOCOL_VIOLATION),
> >                          errmsg("SSL certificate's common name contains embedded null")));
> >                 pfree(peer_cn);
> >                 return -1;
> >             }
> > here, or a comment explaining why not.
> 
> We should, but it's proving rather difficult as there is no equivalent API call
> to get the string as well as the expected length of it.

Hm. Should at least have a test to ensure that's not a problem then. I
hope/assume NSS rejects this somewhere internally...


> > Also, what's up with the CN= bit? Why is that needed here, but not for
> > openssl?
> 
> OpenSSL returns only the value portion, whereas NSS returns key=value so we
> need to skip over the key= part.

Why is it a conditional path though?




> >> +/*
> >> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
> >> + * only way to make NSS use an already set up POSIX file descriptor rather
> >> + * than opening one itself. To quote the NSS documentation:
> >> + *
> >> + *        "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
> >> + *        implementation changes. In practice, this is unlikely to happen because
> >> + *        NSPR's implementation has been stable for years and because of NSPR's
> >> + *        strong commitment to backward compatibility."
> >> + *
> >> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
> >> + *
> >> + * The function is declared in <private/pprio.h>, but as it is a header marked
> >> + * private we declare it here rather than including it.
> >> + */
> >> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);
> > 
> > Ugh. This is really the way to do this? How do other applications deal
> > with this problem?
> 
> They either #include <private/pprio.h> or they do it like this (or vendor NSPR
> which makes calling private APIs less problematic).  It sure is ugly, but there
> is no alternative to using this function.

Hm - in debian unstable's NSS this function appears to be in nss/ssl.h,
not pprio.h:

/*
** Imports fd into SSL, returning a new socket.  Copies SSL configuration
** from model.
*/
SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd);

and ssl.h starts with:
/*
 * This file contains prototypes for the public SSL functions.


> >> +
> >> +    PK11_SetPasswordFunc(PQssl_passwd_cb);
> > 
> > Is it actually OK to do stuff like this when other users of NSS might be
> > present? That's obviously more likely in the libpq case, compared to the
> > backend case (where it's also possible, of course). What prevents us
> > from overriding another user's callback?
> 
> The password callback pointer is stored in a static variable in NSS (in the
> file lib/pk11wrap/pk11auth.c).

But, uh, how is that not a problem? What happens if a backend imports
libpq? What if plpython imports curl which then also uses nss?


> +    /*
> +     * Finally we must configure the socket for being a server by setting the
> +     * certificate and key.
> +     */
> +    status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa);
> +    if (status != SECSuccess)
> +        ereport(ERROR,
> +                (errmsg("unable to configure secure server: %s",
> +                        pg_SSLerrmessage(PR_GetError()))));
> +    status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0);
> +    if (status != SECSuccess)
> +        ereport(ERROR,
> +                (errmsg("unable to configure server for TLS server connections: %s",
> +                        pg_SSLerrmessage(PR_GetError()))));

Why do both of these need to get called? The NSS docs say:

/*
** Deprecated variant of SSL_ConfigServerCert.
**
...
SSL_IMPORT SECStatus SSL_ConfigSecureServer(
    PRFileDesc *fd, CERTCertificate *cert,
    SECKEYPrivateKey *key, SSLKEAType kea);


Greetings,

Andres Freund



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
>>> Personally I'd like to see this patch broken up a bit - it's quite
>>> large. Several of the changes could easily be committed separately, no?
>>
>> Not sure how much of this makes sense committed separately (unless separately
>> means in quick succession), but it could certainly be broken up for the sake of
>> making review easier.
>
> Committing e.g. the pgcrypto pieces separately from the backend code
> seems unproblematic. But yes, I would expect them to go in close to each
> other. I'm mainly concerned with smaller review-able units.

Attached is a v14 where the logical units are separated into individual
commits.  I hope this split makes it easier to read.

The 0006 commit contains things not really related to NSS at all that can be
submitted to -hackers independently of this work, but they're still there since
this version wasn't supposed to change anything.

Most of the changes to sslinfo in 0005 are really only needed in case OpenSSL
isn't the only TLS library, but I would argue that they should be considered
regardless.  There we are still accessing the ->ssl member directly and passing
it to OpenSSL rather than using the be_tls_* API that we have.  I can extract
that portion as a separate patch submission unless there are objections.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 28 Oct 2020, at 07:39, Andres Freund <andres@anarazel.de> wrote:

> Have you done testing to ensure that NSS PG cooperates correctly with
> openssl PG? Is there a way we can make that easier to do? E.g. allowing
> to build frontend with NSS and backend with openssl and vice versa?

When I wrote the Secure Transport patch I had a patch against PostgresNode
which allowed for overriding the server binaries like so:

   SSLTEST_SERVER_BIN=/path/bin/ make -C src/test/ssl/ check

I've used that coupled with manual testing so far to make sure that an openssl
client can talk to an NSS backend and so on.  Before any other backend is added
we clearly need *a* way of doing this, one which no doubt will need to be
improved upon to suit more workflows.

This is sort of the same situation as pg_upgrade, where two trees are needed to
really test it.

I can clean that patch up and post as a starting point for discussions.

>>>> if test "$with_openssl" = yes ; then
>>>> +  if test x"$with_nss" = x"yes" ; then
>>>> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
>>>> +  fi
>>>
>>> Based on a quick look there's no similar error check for the msvc
>>> build. Should there be?
>>
>> That's a good question.  When embarking on this it seemed quite natural to me
>> that it should be, but now I'm not so sure.  Maybe there should be a
>> --with-openssl-preferred like how we handle readline/libedit or just allow
>> multiple and let the last one win?  Do you have any input on what would make
>> sense?
>>
>> The only thing I think makes no sense is to allow multiple ones at the same
>> time given the current autoconf switches, even if it would just be to pick say
>> pg_strong_random from one and libpq TLS from another.
>
> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing
> to check for errors.

That's another option, with --with-openssl being an alias for --with-ssl=openssl.

After another round of thinking I like this even better as it makes the build
infra cleaner, so the attached patch has this implemented.

> Even better, of course, would be to allow switching of the SSL backend
> based on config options (PGC_POSTMASTER GUC for backend, connection
> string for frontend). Mainly because that would make testing of
> interoperability so much easier.  Obviously still a few places like
> pgcrypto, randomness, etc, where only a compile time decision seems to
> make sense.

It would make testing easier, but the expense seems potentially rather high.
How would a GUC switch be allowed to operate, would we have mixed backends or
would we require all openssl connections to be dropped before serving nss ones?

>>>> +  CLEANLDFLAGS="$LDFLAGS"
>>>> +  # TODO: document this set of LDFLAGS
>>>> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"
>>>
>>> Shouldn't this use nss-config or such?
>>
>> Indeed it should, where available.  I've added rudimentary support for that
>> without a fallback as of now.
>
> When would we need a fallback?

On one of my boxes I have NSS/NSPR installed via homebrew and they don't ship
an nss-config AFAICT. I wouldn't be surprised if there are other cases.

>>> I think it'd also be better if we could include these files as nss/ssl.h
>>> etc - ssl.h is a name way too likely to conflict imo.
>>
>> I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path
>> will still need the direct path to the headers (from autoconf) since nss.h
>> includes NSPR headers as #include <nspr.h> and so on.
>
> Hm. Then it's probably not worth going there...

It does however make visual parsing of the source files easier since it's clear
which ssl.h is being referred to.  I'm in favor of keeping it.

>>>> +static SECStatus
>>>> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
>>>> +{
>>>> +    SECStatus    status;
>>>> +    Port       *port = (Port *) arg;
>>>> +    CERTCertificate *cert;
>>>> +    char       *peer_cn;
>>>> +    int            len;
>>>> +
>>>> +    status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
>>>> +    if (status == SECSuccess)
>>>> +    {
>>>> +        cert = SSL_PeerCertificate(port->pr_fd);
>>>> +        len = strlen(cert->subjectName);
>>>> +        peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
>>>> +        if (strncmp(cert->subjectName, "CN=", 3) == 0)
>>>> +            strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
>>>> +        else
>>>> +            strlcpy(peer_cn, cert->subjectName, len + 1);
>>>> +        CERT_DestroyCertificate(cert);
>>>> +
>>>> +        port->peer_cn = peer_cn;
>>>> +        port->peer_cert_valid = true;
>>>
>>> Hm. We either should have something similar to
>>>
>>>             /*
>>>              * Reject embedded NULLs in certificate common name to prevent
>>>              * attacks like CVE-2009-4034.
>>>              */
>>>             if (len != strlen(peer_cn))
>>>             {
>>>                 ereport(COMMERROR,
>>>                         (errcode(ERRCODE_PROTOCOL_VIOLATION),
>>>                          errmsg("SSL certificate's common name contains embedded null")));
>>>                 pfree(peer_cn);
>>>                 return -1;
>>>             }
>>> here, or a comment explaining why not.
>>
>> We should, but it's proving rather difficult as there is no equivalent API call
>> to get the string as well as the expected length of it.
>
> Hm. Should at least have a test to ensure that's not a problem then. I
> hope/assume NSS rejects this somewhere internally...

Agreed, I'll try to hack up a testcase.
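
As a rough sketch of what such a check boils down to, assuming the CN bytes can
somehow be obtained together with their length (which is exactly the hard part
with NSS): the helper name below is invented for illustration and is not in the
patch, and nothing here calls into NSS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: returns true if the first
 * len bytes of data contain a NUL byte, i.e. if strlen() on the buffer
 * would report fewer than len bytes.
 */
bool
cn_has_embedded_nul(const char *data, size_t len)
{
    /* A NULL buffer is only acceptable together with a zero length. */
    assert(data != NULL || len == 0);

    if (len == 0)
        return false;

    return memchr(data, '\0', len) != NULL;
}
```

A testcase would feed this a CN such as "good.example\0.evil.test" with the
full byte length and expect it to be flagged, while a plain "example.com"
should pass.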

>>> Also, what's up with the CN= bit? Why is that needed here, but not for
>>> openssl?
>>
>> OpenSSL returns only the value portion, whereas NSS returns key=value so we
>> need to skip over the key= part.
>
> Why is it a conditional path though?

It was mostly just a belts-and-suspenders thing, I don't have any hard evidence
that it's been a thing in any modern NSS version so it can be removed.
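
For clarity, the conditional form in the patch amounts to something like the
sketch below. The function name is made up for illustration; the only
assumption is that the input is a NUL-terminated string which may or may not
start with "CN=".

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: return a pointer just past
 * a leading "CN=" if there is one, otherwise return the input unchanged.
 */
const char *
skip_cn_prefix(const char *subject)
{
    static const char prefix[] = "CN=";

    assert(subject != NULL);

    if (strncmp(subject, prefix, sizeof(prefix) - 1) == 0)
        return subject + (sizeof(prefix) - 1);

    return subject;
}
```

Removing the fallback branch would mean always advancing by three bytes, which
is only safe if the key= part is guaranteed to be there.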

>>>> +/*
>>>> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
>>>> + * only way to make NSS use an already set up POSIX file descriptor rather
>>>> + * than opening one itself. To quote the NSS documentation:
>>>> + *
>>>> + *        "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
>>>> + *        implementation changes. In practice, this is unlikely to happen because
>>>> + *        NSPR's implementation has been stable for years and because of NSPR's
>>>> + *        strong commitment to backward compatibility."
>>>> + *
>>>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
>>>> + *
>>>> + * The function is declared in <private/pprio.h>, but as it is a header marked
>>>> + * private we declare it here rather than including it.
>>>> + */
>>>> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);
>>>
>>> Ugh. This is really the way to do this? How do other applications deal
>>> with this problem?
>>
>> They either #include <private/pprio.h> or they do it like this (or vendor NSPR
>> which makes calling private APIs less problematic).  It sure is ugly, but there
>> is no alternative to using this function.
>
> Hm - in debian unstable's NSS this function appears to be in nss/ssl.h,
> not pprio.h:
>
> /*
> ** Imports fd into SSL, returning a new socket.  Copies SSL configuration
> ** from model.
> */
> SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd);
>
> and ssl.h starts with:
> /*
> * This file contains prototypes for the public SSL functions.

Right, but that's Import*FD*, not Import*TCPSocket*.  We use ImportFD as well
since it's the API for importing an NSPR socket into NSS and enabling SSL/TLS
on it.  That's been a public API for a long time.  ImportTCPSocket is used to
import an already opened socket into NSPR, else NSPR must open the socket
itself.  That part has been kept private for reasons unknown, as it's
incredibly useful.

>>>> +    PK11_SetPasswordFunc(PQssl_passwd_cb);
>>>
>>> Is it actually OK to do stuff like this when other users of NSS might be
>>> present? That's obviously more likely in the libpq case, compared to the
>>> backend case (where it's also possible, of course). What prevents us
>>> from overriding another user's callback?
>>
>> The password callback pointer is stored in a static variable in NSS (in the
>> file lib/pk11wrap/pk11auth.c).
>
> But, uh, how is that not a problem? What happens if a backend imports
> libpq? What if plpython imports curl which then also uses nss?

Sorry, that sentence wasn't really finished.  What I meant to write was that I
don't really have good answers here.  The available implementation is via the
static var, and there are no alternative APIs.  I've tried googling for
insights but haven't come across any.

The only datapoint I have is that I can't recall there ever being a complaint
against libcurl doing this exact thing.  That of course doesn't mean it cannot
happen or cause problems.
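
To make the concern concrete, here is a toy model of a single process-wide
callback slot.  None of this is NSS code; all names are stand-ins invented for
the example, and the only assumption carried over is that the setter stores the
pointer in one static variable so that the last caller wins.

```c
#include <assert.h>
#include <stddef.h>

typedef const char *(*password_cb) (void);

/* Stand-in for the library's single, process-wide callback slot. */
static password_cb current_cb = NULL;

/* Stand-in for the setter: silently replaces whatever was registered. */
void
set_password_cb(password_cb cb)
{
    current_cb = cb;
}

/* Stand-in for the library asking for a password later on. */
const char *
ask_password(void)
{
    assert(current_cb != NULL);
    return current_cb();
}

/* Two independent users of the library living in the same process. */
const char *
app_cb(void)
{
    return "password from the application";
}

const char *
libpq_cb(void)
{
    return "password from libpq";
}
```

If the application registers app_cb and libpq later registers libpq_cb, every
subsequent ask_password() goes to libpq_cb, including the ones made on behalf
of the application's own connections.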

>> +    /*
>> +     * Finally we must configure the socket for being a server by setting the
>> +     * certificate and key.
>> +     */
>> +    status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa);
>> +    if (status != SECSuccess)
>> +        ereport(ERROR,
>> +                (errmsg("unable to configure secure server: %s",
>> +                        pg_SSLerrmessage(PR_GetError()))));
>> +    status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0);
>> +    if (status != SECSuccess)
>> +        ereport(ERROR,
>> +                (errmsg("unable to configure server for TLS server connections: %s",
>> +                        pg_SSLerrmessage(PR_GetError()))));
>
> Why do both of these need to get called? The NSS docs say:
>
> /*
> ** Deprecated variant of SSL_ConfigServerCert.
> **
> ...
> SSL_IMPORT SECStatus SSL_ConfigSecureServer(
>    PRFileDesc *fd, CERTCertificate *cert,
>    SECKEYPrivateKey *key, SSLKEAType kea);

They don't, I had missed the deprecation warning as it's not mentioned at all
in the online documentation:

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/SSL_functions/sslfnc.html

(SSL_ConfigServerCert isn't mentioned there at all, which dates the page to
before it went in and obsoleted SSL_ConfigSecureServer.)

Fixed by removing the superfluous call.

Thanks again for reviewing!

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Andrew Dunstan
Date:
On 10/29/20 11:20 AM, Daniel Gustafsson wrote:
>> On 28 Oct 2020, at 07:39, Andres Freund <andres@anarazel.de> wrote:
>> Have you done testing to ensure that NSS PG cooperates correctly with
>> openssl PG? Is there a way we can make that easier to do? E.g. allowing
>> to build frontend with NSS and backend with openssl and vice versa?
> When I wrote the Secure Transport patch I had a patch against PostgresNode
> which allowed for overriding the server binaries like so:
>
>    SSLTEST_SERVER_BIN=/path/bin/ make -C src/test/ssl/ check
>
> I've used that coupled with manual testing so far to make sure that an openssl
> client can talk to an NSS backend and so on.  Before any other backend is added
> we clearly need *a* way of doing this, one which no doubt will need to be
> improved upon to suit more workflows.
>
> This is sort of the same situation as pg_upgrade, where two trees are needed to
> really test it.
>
> I can clean that patch up and post as a starting point for discussions.



I've been looking through the new patch set, in particular the testing
setup.

The way it seems to proceed is to use the existing openssl generated
certificates and import them into NSS certificate databases. That seems
fine to bootstrap testing, but it seems to me it would be more sound not
to rely on openssl at all. I'd rather see the Makefile containing
commands to create these from scratch, which mirror the openssl
variants. IOW you should be able to build and test this from scratch,
including certificate generation, without having openssl installed at all.

I also notice that the invocations to pk12util don't contain the "sql:"
prefix to the -d option, even though the database was created with that
prefix a few lines above. That seems like a mistake from my reading of
the pk12util man page.


cheers


andrew







Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 1 Nov 2020, at 14:13, Andrew Dunstan <andrew@dunslane.net> wrote:

> I've been looking through the new patch set, in particular the testing
> setup.

Thanks!

> The way it seems to proceed is to use the existing openssl generated
> certificates and import them into NSS certificate databases. That seems
> fine to bootstrap testing,

That's pretty much why I opted for using the existing certs: to bootstrap the
patch and ensure OpenSSL-backend compatibility.

> but it seems to me it would be more sound not
> to rely on openssl at all. I'd rather see the Makefile containing
> commands to create these from scratch, which mirror the openssl
> variants. IOW you should be able to build and test this from scratch,
> including certificate generation, without having openssl installed at all.

I don't disagree with this, but I do also believe there is value in testing all
TLS backends with exactly the same certificates to act as a baseline.  The
nssfiles target should definitely be able to generate from scratch, but maybe a
combination is the best option?

Being well versed in the buildfarm code, do you have an off-the-cuff idea on
how to do cross library testing such that OpenSSL/NSS compatibility can be
ensured?  Andres was floating the idea of making a single sourcetree be able to
have both for testing but more discussion is needed to settle on a way forward.

> I also notice that the invocations to pk12util don't contain the "sql:"
> prefix to the -d option, even though the database was created with that
> prefix a few lines above. That seems like a mistake from my reading of
> the pk12util man page.

Fixed in the attached v16, which also drops the parts of the patchset which
have been submitted separately to -hackers (the sslinfo patch hunks are still
there as they are required).

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Andrew Dunstan
Date:
On 11/1/20 5:04 PM, Daniel Gustafsson wrote:
>> On 1 Nov 2020, at 14:13, Andrew Dunstan <andrew@dunslane.net> wrote:
>> I've been looking through the new patch set, in particular the testing
>> setup.
> Thanks!
>
>> The way it seems to proceed is to use the existing openssl generated
>> certificates and imports them into NSS certificate databases. That seems
>> fine to bootstrap testing,
> That's pretty much why I opted for using the existing certs: to bootstrap the
> patch and ensure OpenSSL-backend compatibility.
>
>> but it seems to me it would be more sound not
>> to rely on openssl at all. I'd rather see the Makefile containing
>> commands to create these from scratch, which mirror the openssl
>> variants. IOW you should be able to build and test this from scratch,
>> including certificate generation, without having openssl installed at all.
> I don't disagree with this, but I do also believe there is value in testing all
> TLS backends with exactly the same certificates to act as a baseline.  The
> nssfiles target should definitely be able to generate from scratch, but maybe a
> combination is the best option?


Yeah. I certainly think we need something that shows how we would
generate them from scratch using nss. That said, the importation code is
also useful.



>
> Being well versed in the buildfarm code, do you have an off-the-cuff idea on
> how to do cross library testing such that OpenSSL/NSS compatibility can be
> ensured?  Andres was floating the idea of making a single sourcetree be able to
> have both for testing but more discussion is needed to settle on a way forward.


Well, I'd probably try to leverage the knowledge we have in doing
cross-version upgrade testing. It works like this: After the
install-check-C stage each branch saves its binaries and data files in a
special location, adjusting things like library locations to match. Then
to test that version it uses that against all the older versions
similarly saved.


We could generalize that saving mechanism and do it if any module
required it. But instead of testing against a different branch, we'd
test against a different animal. So we'd have two animals, one building
with openssl and one with nss, and they would test against each other
(i.e. one as the client and one as the server, and vice versa).


This would involve a deal of work on my part, but it's very doable, I
believe.


We'd need a way to run tests where we could specify the client and
server binary locations.


Anyway, those are my thoughts. Comments welcome.



cheers


andrew


--
Andrew Dunstan
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 27 Oct 2020, at 21:18, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 27/10/2020 22:07, Daniel Gustafsson wrote:
>> /*
>> * Track whether the NSS database has a password set or not. There is no API
>> * function for retrieving password status, so we simply flip this to true in
>> * case NSS invoked the password callback - as that will only happen in case
>> * there is a password. The reason for tracking this is that there are calls
>> * which require a password parameter, but doesn't use the callbacks provided,
>> * so we must call the callback on behalf of these.
>> */
>> static bool has_password = false;
>
> This is set in PQssl_passwd_cb function, but never reset. That seems wrong. The NSS database used in one connection
> might have a password, while another one might not. Or have I completely misunderstood this?

(sorry for slow response).  You are absolutely right, the has_password flag
must be tracked per connection in PGconn.  The attached v17 implements this as
well as a fix for a frontend bug which caused dropped connections and some
smaller fixups to make strings more translatable.
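
To illustrate the per-connection tracking described above, here is a minimal
sketch. It is not the code from the patch: DemoConn is a hypothetical stand-in
for PGconn, and demo_passwd_cb and demo_connect are hypothetical stand-ins for
the NSS password callback and the connection setup. The point is only that the
flag lives in the connection object handed to the callback, so one connection
cannot leak its state into another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PGconn; only the field relevant here. */
typedef struct DemoConn
{
	bool		has_password;	/* set once the password callback has fired */
} DemoConn;

/*
 * Hypothetical password callback. The connection is passed as the callback
 * argument, so the flag is recorded on that connection only.
 */
static const char *
demo_passwd_cb(void *arg)
{
	DemoConn   *conn = (DemoConn *) arg;

	conn->has_password = true;
	return "secret";
}

/* Simulate opening a connection to a database with or without a password. */
static void
demo_connect(DemoConn *conn, bool db_has_password)
{
	assert(conn != NULL);

	/* Reset per connection, which a file-level static flag never did. */
	conn->has_password = false;
	if (db_has_password)
		(void) demo_passwd_cb(conn);
}
```

With two DemoConn objects, connecting the first to a password-protected
database and the second to an unprotected one leaves has_password set only on
the first.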

I've also included a WIP version of SCRAM channel binding in the attached
patch, it's currently failing to connect but someone here might spot the bug
before I do so I figured it's better to include it.

The 0005 patch is now, thanks to the sslinfo patch going in on master, only
containing NSS specific code.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 2 Nov 2020, at 15:17, Andrew Dunstan <andrew@dunslane.net> wrote:

> We could generalize that saving mechanism and do it if any module
> required it. But instead of testing against a different branch, we'd
> test against a different animal. So we'd have two animals, one building
> with openssl and one with nss, and they would test against each other
> (i.e. one as the client and one as the server, and vice versa).

That seems like a very good plan.  It would also allow us to test a backend
compiled with OpenSSL 1.0.2 against a frontend with OpenSSL 1.1.1 which might
come in handy when OpenSSL 3.0.0 lands.

> This would involve a deal of work on my part, but it's very doable, I
> believe.

I have no experience with the buildfarm code, but I'm happy to help if there's
anything I can do.

cheers ./daniel



Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Nov 4, 2020, at 5:09 AM, Daniel Gustafsson <daniel@yesql.se> wrote:

> (sorry for slow response).  You are absolutely right, the has_password flag
> must be tracked per connection in PGconn.  The attached v17 implements this as
> well as a fix for a frontend bug which caused dropped connections and some
> smaller fixups to make strings more translatable.

Some initial notes from building and testing on macOS Mojave. I'm working with
both a brew-packaged NSS/NSPR (which includes basic nss-/nspr-config) and a
hand-built NSS/NSPR (which does not).

1. In configure.ac:

> +  LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS"
> +  CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS"
> +
> +  AC_CHECK_LIB(nss3, SSL_VersionRangeSet, [], [AC_MSG_ERROR([library 'nss3' is required for NSS])])

Looks like SSL_VersionRangeSet is part of libssl3, not libnss3. So this fails
with the hand-built stack, where there is no nss-config to populate LDFLAGS. I
changed the function to NSS_InitContext and that seems to work nicely.

2. Among the things to eventually think about when it comes to configuring, it
looks like some platforms [1] install the headers under <nspr4/...> and
<nss3/...> instead of <nspr/...> and <nss/...>. It's unfortunate that the NSS
maintainers never chose an official installation layout.

3. I need two more `#define NO_NSPR_10_SUPPORT` guards added in both

  src/include/common/pg_nss.h
  src/port/pg_strong_random.c

before the tree will compile for me. Both of those files include NSS headers.

4. be_tls_init() refuses to run correctly for me; I end up getting an NSPR
assertion that looks like

  sslMutex_Init not implemented for multi-process applications !

With assertions disabled, this ends up showing a somewhat unhelpful

  FATAL:  unable to set up TLS connection cache: security library failure. (SEC_ERROR_LIBRARY_FAILURE)

It looks like cross-process locking isn't actually enabled on macOS, which is a
long-standing bug in NSPR [2, 3]. So calls to SSL_ConfigMPServerSIDCache()
error out.

--Jacob

[1] https://github.com/erthink/ReOpenLDAP/issues/112
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=538680
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1192500




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 6 Nov 2020, at 21:37, Jacob Champion <pchampion@vmware.com> wrote:

> Some initial notes from building and testing on macOS Mojave. I'm working with
> both a brew-packaged NSS/NSPR (which includes basic nss-/nspr-config) and a
> hand-built NSS/NSPR (which does not).

Thanks for looking!

> 1. In configure.ac:
>
>> +  LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS"
>> +  CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS"
>> +
>> +  AC_CHECK_LIB(nss3, SSL_VersionRangeSet, [], [AC_MSG_ERROR([library 'nss3' is required for NSS])])
>
> Looks like SSL_VersionRangeSet is part of libssl3, not libnss3. So this fails
> with the hand-built stack, where there is no nss-config to populate LDFLAGS. I
> changed the function to NSS_InitContext and that seems to work nicely.

Ah yes, fixed.

> 2. Among the things to eventually think about when it comes to configuring, it
> looks like some platforms [1] install the headers under <nspr4/...> and
> <nss3/...> instead of <nspr/...> and <nss/...>. It's unfortunate that the NSS
> maintainers never chose an official installation layout.

Yeah, maybe we need to start with the most common path and have fallbacks in
case not found?

> 3. I need two more `#define NO_NSPR_10_SUPPORT` guards added in both
>
>  src/include/common/pg_nss.h
>  src/port/pg_strong_random.c
>
> before the tree will compile for me. Both of those files include NSS headers.

Odd that I was able to compile on Linux, but I've added these.

> 4. be_tls_init() refuses to run correctly for me; I end up getting an NSPR
> assertion that looks like
>
>  sslMutex_Init not implemented for multi-process applications !
>
> With assertions disabled, this ends up showing a somewhat unhelpful
>
>  FATAL:  unable to set up TLS connection cache: security library failure. (SEC_ERROR_LIBRARY_FAILURE)
>
> It looks like cross-process locking isn't actually enabled on macOS, which is a
> long-standing bug in NSPR [2, 3]. So calls to SSL_ConfigMPServerSIDCache()
> error out.

That's unfortunate since the session cache is required for a server application
backed by NSS.  The attached switches to SSL_ConfigServerSessionIDCacheWithOpt
with which one can explicitly make the cache non-shared, which in turn backs
the mutexes with NSPR locks rather than the missing sem_init.  Can you test
this version and see if that makes it work?

This version also contains a fix for a channel binding bug that Heikki pointed
out off-list (sadly not The bug) and a few very minor cleanups as well as a
rebase to handle the new pg_strong_random_init.  Actually performing the context
init there is still a TODO, but I wanted to get a version out that at least
compiles.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Nov 6, 2020, at 3:11 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> The attached switches to SSL_ConfigServerSessionIDCacheWithOpt
> with which one can explicitly make the cache non-shared, which in turn backs
> the mutexes with NSPR locks rather than the missing sem_init.  Can you test
> this version and see if that makes it work?

Yep, I get much farther through the tests with that patch. I'm currently
diving into another assertion failure during socket disconnection:

    Assertion failure: fd->secret == NULL, at prlayer.c:45

cURL has some ominously vague references to this [1], though I'm not
sure that we should work around it in the same way without knowing what
the cause is...

--Jacob

[1] https://github.com/curl/curl/blob/4d2f800/lib/vtls/nss.c#L1266




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 10 Nov 2020, at 21:11, Jacob Champion <pchampion@vmware.com> wrote:
> On Nov 6, 2020, at 3:11 PM, Daniel Gustafsson <daniel@yesql.se> wrote:

>> The attached switches to SSL_ConfigServerSessionIDCacheWithOpt
>> with which one can explicitly make the cache non-shared, which in turn backs
>> the mutexes with NSPR locks rather than the missing sem_init.  Can you test
>> this version and see if that makes it work?
>
> Yep, I get much farther through the tests with that patch.

Great, thanks for confirming.

> I'm currently
> diving into another assertion failure during socket disconnection:
>
>    Assertion failure: fd->secret == NULL, at prlayer.c:45
>
> cURL has some ominously vague references to this [1], though I'm not
> sure that we should work around it in the same way without knowing what
> the cause is...

Digging through the archives from when this landed in curl, the assertion
failure was never fully identified back then but happened spuriously.  Which
version of NSPR is this happening with?

cheers ./daniel


Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Nov 10, 2020, at 2:28 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Digging through the archives from when this landed in curl, the assertion
> failure was never fully identified back then but happened spuriously.  Which
> version of NSPR is this happening with?

This is NSPR 4.29, with debugging enabled. The fd that causes the
assertion is the custom layer that's added during be_tls_open_server(),
which connects a Port as the layer secret. It looks like NSPR is trying
to help surface potential memory leaks by asserting if the secret is
non-NULL at the time the stack is being closed.

In this case, it doesn't matter since the Port lifetime is managed
elsewhere, but it looks easy enough to add a custom close in the way
that cURL and the NSPR test programs [1] do. Sample patch attached,
which gets me to the end of the tests without any assertions. (Two
failures left on my machine.)

--Jacob

[1] https://hg.mozilla.org/projects/nspr/file/bf6620c143/pr/tests/nblayer.c#l354


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Nov 11, 2020, at 10:17 AM, Jacob Champion <pchampion@vmware.com> wrote:
>
> (Two failures left on my machine.)

False alarm -- the stderr debugging I'd added in to track down the
assertion tripped up the "no stderr" tests. Zero failing tests now.

--Jacob



Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Nov 11, 2020, at 10:57 AM, Jacob Champion <pchampion@vmware.com> wrote:
>
> False alarm -- the stderr debugging I'd added in to track down the
> assertion tripped up the "no stderr" tests. Zero failing tests now.

I took a look at the OpenSSL interop problems you mentioned upthread. I
don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during
connection.

I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes
were read, in order to satisfy its API. There was some discussion on
this upthread:

On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote:
>>
>>> +            case PR_IO_TIMEOUT_ERROR:
>>> +                break;
>>
>> What does this mean? We'll return with a 0 errno here, right? When is
>> this case reachable?
>
> It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
> we don't do.  It was mentioned somewhere that it had happened in no-wait calls due
> to a bug, but I fail to find that reference now.  Either way, I've removed it
> to fall into the default error handling which now sets errno correctly as that
> was a paddle short here.

PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my
machine. It doesn't look like the PR_Recv() API has a choice -- if
there's no data, it can't return a positive integer, and returning zero
means that the socket has been disconnected. So -1 with a timeout error
is the only option.
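
The mapping argued for above can be sketched as follows. This is a
self-contained illustration under stated assumptions, not the attached patch:
DemoError and demo_map_read_result are hypothetical names, with
DEMO_ERR_IO_TIMEOUT playing the role of PR_IO_TIMEOUT_ERROR, and the choice of
ECONNRESET for the failure cases is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical error codes standing in for values from PR_GetError(). */
typedef enum DemoError
{
	DEMO_ERR_NONE = 0,
	DEMO_ERR_IO_TIMEOUT,		/* plays the role of PR_IO_TIMEOUT_ERROR */
	DEMO_ERR_OTHER
} DemoError;

/*
 * Translate a receive result into the caller's convention:
 *   > 0  bytes read
 *     0  no data available yet, caller should wait and retry
 *    -1  hard failure, *sock_errno is set
 */
static int
demo_map_read_result(int nread, DemoError err, int *sock_errno)
{
	assert(sock_errno != NULL);
	*sock_errno = 0;

	if (nread > 0)
		return nread;

	if (nread == 0)
	{
		/* The peer closed the connection; report it through errno. */
		*sock_errno = ECONNRESET;
		return -1;
	}

	/* A timeout on a no-wait read means "nothing yet", not an error. */
	if (err == DEMO_ERR_IO_TIMEOUT)
		return 0;

	*sock_errno = ECONNRESET;
	return -1;
}
```

A negative result with the timeout code is reported as zero bytes read, while a
zero-length read and any other negative result are both reported as -1 with the
errno output set.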

I'm not completely sure why this is exposed so easily with an OpenSSL
server -- I'm guessing the implementation slices up its packets
differently on the wire, causing a read event before NSS is able to
decrypt a full record -- but it's worth noting that this case also shows
up during NSS-to-NSS psql connections, when handling notifications at
the end of every query. PQconsumeInput() reports a hard failure with the
current implementation, but its return value is ignored by
PrintNotifications(). Otherwise this probably would have shown up
earlier.

(What's the best way to test this case? Are there lower-level tests for
the protocol/network layer somewhere that I'm missing?)

While patching this case, I also noticed that pgtls_read() doesn't call
SOCK_ERRNO_SET() for the disconnection case. That is also in the
attached patch.

--Jacob


Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 12 Nov 2020, at 23:12, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Nov 11, 2020, at 10:57 AM, Jacob Champion <pchampion@vmware.com> wrote:
>>
>> False alarm -- the stderr debugging I'd added in to track down the
>> assertion tripped up the "no stderr" tests. Zero failing tests now.
>
> I took a look at the OpenSSL interop problems you mentioned upthread.

Great, thanks!

> I don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during
> connection.
>
> I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes
> were read, in order to satisfy its API. There was some discussion on
> this upthread:
>
> On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>> On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote:
>>>
>>>> +            case PR_IO_TIMEOUT_ERROR:
>>>> +                break;
>>>
>>> What does this mean? We'll return with a 0 errno here, right? When is
>>> this case reachable?
>>
>> It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
>> we don't do.  It was mentioned somewhere that it had happened in no-wait calls due
>> to a bug, but I fail to find that reference now.  Either way, I've removed it
>> to fall into the default error handling which now sets errno correctly as that
>> was a paddle short here.
>
> PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my
> machine. It doesn't look like the PR_Recv() API has a choice -- if
> there's no data, it can't return a positive integer, and returning zero
> means that the socket has been disconnected. So -1 with a timeout error
> is the only option.

Right, that makes sense.

> I'm not completely sure why this is exposed so easily with an OpenSSL
> server -- I'm guessing the implementation slices up its packets
> differently on the wire, causing a read event before NSS is able to
> decrypt a full record -- but it's worth noting that this case also shows
> up during NSS-to-NSS psql connections, when handling notifications at
> the end of every query. PQconsumeInput() reports a hard failure with the
> current implementation, but its return value is ignored by
> PrintNotifications(). Otherwise this probably would have shown up
> earlier.

Should there perhaps be an Assert there to catch those?

> (What's the best way to test this case? Are there lower-level tests for
> the protocol/network layer somewhere that I'm missing?)

Not AFAIK.  Having been knee-deep now, do you have any ideas on how to
implement?

> While patching this case, I also noticed that pgtls_read() doesn't call
> SOCK_ERRNO_SET() for the disconnection case. That is also in the
> attached patch.

Ah yes, nice catch.

I've incorporated this patch as well as the previous patch for the assertion
failure on private callback data into the attached v19 patchset.  I also did a
spellcheck and pgindent run on it for ease of review.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Nov 13, 2020, at 4:14 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 12 Nov 2020, at 23:12, Jacob Champion <pchampion@vmware.com> wrote:
>>
>> I'm not completely sure why this is exposed so easily with an OpenSSL
>> server -- I'm guessing the implementation slices up its packets
>> differently on the wire, causing a read event before NSS is able to
>> decrypt a full record -- but it's worth noting that this case also shows
>> up during NSS-to-NSS psql connections, when handling notifications at
>> the end of every query. PQconsumeInput() reports a hard failure with the
>> current implementation, but its return value is ignored by
>> PrintNotifications(). Otherwise this probably would have shown up
>> earlier.
>
> Should there perhaps be an Assert there to catch those?

Hm. From the perspective of helping developers out, perhaps, but from
the standpoint of "don't crash when an endpoint outside our control does
something strange", I think that's a harder sell. Should the error be
bubbled all the way up instead? Or perhaps, if psql isn't supposed to
treat notification errors as "hard" failures, it should at least warn
the user that something is fishy?

>> (What's the best way to test this case? Are there lower-level tests for
>> the protocol/network layer somewhere that I'm missing?)
>
> Not AFAIK.  Having been knee-deep now, do you have any ideas on how to
> implement?

I think that testing these sorts of important edge cases needs a
friendly DSL -- something that doesn't want to make devs tear their hair
out while building tests. I've been playing a little bit with Scapy [1]
to understand more of the libpq v3 protocol; I'll see if that can be
adapted for pieces of the TLS handshake in a way that's easy to
maintain. If it can be, maybe that'd be a good starting example.

> I've incorporated this patch as well as the previous patch for the assertion
> failure on private callback data into the attached v19 patchset.  I also did a
> spellcheck and pgindent run on it for ease of review.

Commit 6be725e70 got rid of some psql error messaging that the tests
were keying off of, so there are a few new failures after a rebase onto
latest master.

I've attached a patch that gets the SCRAM tests a little further
(certificate hashing was caught in an infinite loop). I also added error
checks to those loops, along the lines of the existing OpenSSL
implementation: if a suitable digest can't be found, the user will see
an error like

    psql: error: could not find digest for OID 'PKCS #1 SHA-256 With RSA Encryption'

It's a little verbose but I don't think this case should come up in
normal practice.

--Jacob

[1] https://scapy.net/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 16 Nov 2020, at 21:00, Jacob Champion <pchampion@vmware.com> wrote:
> On Nov 13, 2020, at 4:14 AM, Daniel Gustafsson <daniel@yesql.se> wrote:

>> I've incorporated this patch as well as the previous patch for the assertion
>> failure on private callback data into the attached v19 patchset.  I also did a
>> spellcheck and pgindent run on it for ease of review.
>
> Commit 6be725e70 got rid of some psql error messaging that the tests
> were keying off of, so there are a few new failures after a rebase onto
> latest master.
>
> I've attached a patch that gets the SCRAM tests a little further
> (certificate hashing was caught in an infinite loop). I also added error
> checks to those loops, along the lines of the existing OpenSSL
> implementation: if a suitable digest can't be found, the user will see
> an error like
>
>    psql: error: could not find digest for OID 'PKCS #1 SHA-256 With RSA Encryption'
>
> It's a little verbose but I don't think this case should come up in
> normal practice.

Nice, thanks for the fix!  I've incorporated your patch into the attached v20
which also fixes client side error reporting to be more readable.  The SCRAM
tests are now also hooked up, albeit with SKIP blocks for NSS, so they can
start getting fixed.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Nov 17, 2020, at 7:00 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Nice, thanks for the fix!  I've incorporated your patch into the attached v20
> which also fixes client side error reporting to be more readable.

I was testing handshake failure modes and noticed that some FATAL
messages are being sent through to the client in cleartext. The OpenSSL
implementation doesn't do this, because it logs handshake problems at
COMMERROR level. Should we switch all those ereport() calls in the NSS
be_tls_open_server() to COMMERROR as well (and return explicitly), to
avoid this? Or was there a reason for logging at FATAL/ERROR level?

Related note, at the end of be_tls_open_server():

>     ...
>     port->ssl_in_use = true;
>     return 0;
>
> error:
>     return 1;
> }

This needs to return -1 in the error case; the only caller of
secure_open_server() does a direct `result == -1` comparison rather than
checking `result != 0`.
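
A small sketch of why the return value matters. The functions below are
hypothetical stubs rather than the real be_tls_open_server() or its caller;
they only model a caller that compares the result against -1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handshake stub that reports failure as 1. */
static int
demo_open_server_returns_one(bool handshake_ok)
{
	return handshake_ok ? 0 : 1;
}

/* Hypothetical handshake stub that reports failure as -1. */
static int
demo_open_server_returns_minus_one(bool handshake_ok)
{
	return handshake_ok ? 0 : -1;
}

/* Caller that, like the one described above, only tests for -1. */
static bool
demo_caller_sees_failure(int result)
{
	return result == -1;
}
```

With a failed handshake, the caller notices the failure only for the stub that
returns -1; the stub that returns 1 is treated as a success.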

--Jacob


Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Tue, 2020-10-27 at 21:07 +0100, Daniel Gustafsson wrote:
> > On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote:
> > 
> > > +static SECStatus
> > > +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
> > > +{
> > > +    SECStatus    status;
> > > +    Port       *port = (Port *) arg;
> > > +    CERTCertificate *cert;
> > > +    char       *peer_cn;
> > > +    int            len;
> > > +
> > > +    status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
> > > +    if (status == SECSuccess)
> > > +    {
> > > +        cert = SSL_PeerCertificate(port->pr_fd);
> > > +        len = strlen(cert->subjectName);
> > > +        peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
> > > +        if (strncmp(cert->subjectName, "CN=", 3) == 0)
> > > +            strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
> > > +        else
> > > +            strlcpy(peer_cn, cert->subjectName, len + 1);
> > > +        CERT_DestroyCertificate(cert);
> > > +
> > > +        port->peer_cn = peer_cn;
> > > +        port->peer_cert_valid = true;
> > 
> > Hm. We either should have something similar to
> > 
> >             /*
> >              * Reject embedded NULLs in certificate common name to prevent
> >              * attacks like CVE-2009-4034.
> >              */
> >             if (len != strlen(peer_cn))
> >             {
> >                 ereport(COMMERROR,
> >                         (errcode(ERRCODE_PROTOCOL_VIOLATION),
> >                          errmsg("SSL certificate's common name contains embedded null")));
> >                 pfree(peer_cn);
> >                 return -1;
> >             }
> > here, or a comment explaining why not.
> 
> We should, but it's proving rather difficult as there is no equivalent API call
> to get the string as well as the expected length of it.

I'm going to try to tackle this part next. It looks like NSS uses RFC
4514 (or something like it) backslash-quoting, which this code either
needs to undo or bypass before performing a comparison.
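
For reference, the check quoted above boils down to the following, assuming the
raw name is available together with its length (which is the part that is hard
to obtain from NSS, per the discussion). demo_has_embedded_nul is a
hypothetical helper, and it uses memchr rather than the len != strlen()
comparison from the quoted code; the two are equivalent when len is the true
length of the name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the first len bytes of buf contain a NUL byte. A name with
 * an embedded NUL would be truncated by C string functions, so a comparison
 * could match on only a prefix of the certificate's name.
 */
static bool
demo_has_embedded_nul(const char *buf, size_t len)
{
	assert(buf != NULL);
	return memchr(buf, '\0', len) != NULL;
}
```

A name such as "good.com" followed by a NUL byte and ".evil.org" is rejected,
while a plain "example.com" passes.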

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Tue, Nov 17, 2020 at 04:00:53PM +0100, Daniel Gustafsson wrote:
> Nice, thanks for the fix!  I've incorporated your patch into the attached v20
> which also fixes client side error reporting to be more readable.  The SCRAM
> tests are now also hooked up, albeit with SKIP blocks for NSS, so they can
> start getting fixed.

On top of the set of TODO items mentioned in the logs of the patches,
this patch set needs a rebase because it does not apply.  In order to
move on with this set, I would suggest to extract some parts of the
patch set independently of the others and have two buildfarm members
for the MSVC and non-MSVC cases to stress the parts that can be
committed.  Just seeing the size, we could move on with:
- The ./configure set, with the change to introduce --with-ssl=openssl.
- 0004 for strong randoms.
- Support for cryptohashes.

+/*
+ * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef
+ * our version to avoid compiler warnings on redefinition.
+ */
+#define pg_BITS_PER_BYTE BITS_PER_BYTE
+#undef BITS_PER_BYTE
This could be done separately.

src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no?
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 18 Jan 2021, at 08:08, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Nov 17, 2020 at 04:00:53PM +0100, Daniel Gustafsson wrote:
>> Nice, thanks for the fix!  I've incorporated your patch into the attached v20
>> which also fixes client side error reporting to be more readable.  The SCRAM
>> tests are now also hooked up, albeit with SKIP blocks for NSS, so they can
>> start getting fixed.
>
> On top of the set of TODO items mentioned in the logs of the patches,
> this patch set needs a rebase because it does not apply.

Fixed in the attached, which also addresses the points raised earlier by Jacob,
adds certificates created entirely by NSS tooling and adds initial
cryptohash support.  There is something iffy with these certs (the test fails
on mismatching ciphers and/or signature algorithms) that I haven't been able to
pin down, but to get more eyes on this I'm posting the patch with the test
enabled.  The NSS toolchain requires interactive input which makes the Makefile
a bit hacky, ideas on cleaning that up are appreciated.

> In order to
> move on with this set, I would suggest to extract some parts of the
> patch set independently of the others and have two buildfarm members
> for the MSVC and non-MSVC cases to stress the parts that can be
> committed.  Just seeing the size, we could move on with:
> - The ./configure set, with the change to introduce --with-ssl=openssl.
> - 0004 for strong randoms.
> - Support for cryptohashes.

I will leave it to others to decide the feasibility of this, I'm happy to slice
and dice the commits into smaller bits to for example separate out the
--with-ssl autoconf change into a non NSS dependent commit, if that's wanted.

> +/*
> + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef
> + * our version to avoid compiler warnings on redefinition.
> + */
> +#define pg_BITS_PER_BYTE BITS_PER_BYTE
> +#undef BITS_PER_BYTE
> This could be done separately.

Based on an offlist discussion I believe this was a misunderstanding, but if I
instead misunderstood that feel free to correct me with how you think this
should be done.

> src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no?

Good point, fixed.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 4 Dec 2020, at 01:57, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Nov 17, 2020, at 7:00 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>> Nice, thanks for the fix!  I've incorporated your patch into the attached v20
>> which also fixes client side error reporting to be more readable.
>
> I was testing handshake failure modes and noticed that some FATAL
> messages are being sent through to the client in cleartext. The OpenSSL
> implementation doesn't do this, because it logs handshake problems at
> COMMERROR level. Should we switch all those ereport() calls in the NSS
> be_tls_open_server() to COMMERROR as well (and return explicitly), to
> avoid this? Or was there a reason for logging at FATAL/ERROR level?

The ERROR logging made early development easier but then stuck around.  I've
changed them to COMMERROR and return an error instead in the v21 patch just
sent to the list.

> Related note, at the end of be_tls_open_server():
>
>>    ...
>>    port->ssl_in_use = true;
>>    return 0;
>>
>> error:
>>    return 1;
>> }
>
> This needs to return -1 in the error case; the only caller of
> secure_open_server() does a direct `result == -1` comparison rather than
> checking `result != 0`.

Fixed.

cheers ./daniel


Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Tue, 2021-01-19 at 21:21 +0100, Daniel Gustafsson wrote:
> There is something iffy with these certs (the test fails
> on mismatching ciphers and/or signature algorithms) that I haven't been able to
> pin down, but to get more eyes on this I'm posting the patch with the test
> enabled.

Removing `--keyUsage keyEncipherment` from the native_server-* CSR
generation seems to let the tests pass for me, but I'm wary of just
pushing that as a solution because I don't understand why that would
have anything to do with the failure mode
(SSL_ERROR_NO_SUPPORTED_SIGNATURE_ALGORITHM).

> The NSS toolchain requires interactive input which makes the Makefile
> a bit hacky, ideas on cleaning that up are appreciated.

Hm. I got nothing, short of a feature request to NSS...

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 20 Jan 2021, at 01:40, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Tue, 2021-01-19 at 21:21 +0100, Daniel Gustafsson wrote:
>> There is something iffy with these certs (the test fails
>> on mismatching ciphers and/or signature algorithms) that I haven't been able to
>> pin down, but to get more eyes on this I'm posting the patch with the test
>> enabled.
>
> Removing `--keyUsage keyEncipherment` from the native_server-* CSR
> generation seems to let the tests pass for me, but I'm wary of just
> pushing that as a solution because I don't understand why that would
> have anything to do with the failure mode
> (SSL_ERROR_NO_SUPPORTED_SIGNATURE_ALGORITHM).

Aha, that was a good clue, I had overlooked the required extensions in the CSR.
Re-reading RFC 5280 it seems we need keyEncipherment, dataEncipherment and
digitalSignature to create a valid SSL Server certificate.  Adding those indeed
makes the test pass.  Skimming the certutil code *I think* removing it as you
did caused a set of defaults to kick in that made it work based on the parameter
"--nsCertType sslServer", but it's not entirely easy to make out.  Either way,
relying on defaults in a test suite seems less than good, so I've extended the
Makefile to be explicit about the extensions.

The attached v22 rebase incorporates the fixup to the test Makefile, with no
further changes on top of that.

cheers ./daniel


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-01-20 at 12:58 +0100, Daniel Gustafsson wrote:
> Aha, that was a good clue, I had overlooked the required extensions in the CSR.
> Re-reading RFC 5280 it seems we need keyEncipherment, dataEncipherment and
> digitalSignature to create a valid SSL Server certificate.  Adding those indeed
> makes the test pass.  Skimming the certutil code *I think* removing it as you
> did caused a set of defaults to kick in that made it work based on the parameter
> "--nsCertType sslServer", but it's not entirely easy to make out.

Lovely. I didn't expect *removing* an extension to effectively *add*
more, but I'm glad it works now.

==

To continue the Subject Common Name discussion [1] from a different
part of the thread:

Attached is a v23 version of the patchset that peels the raw Common
Name out from a client cert's Subject. This allows the following cases
that the OpenSSL implementation currently handles:

- subjects that don't begin with a CN
- subjects with quotable characters
- subjects that have no CN at all

Embedded NULLs are now handled in a similar manner to the OpenSSL side,
though because this failure happens during the certificate
authentication callback, it results in a TLS alert rather than simply
closing the connection.

For easier review of just the parts I've changed, I've also attached a
since-v22.diff, which is part of the 0001 patch.

--Jacob

[1] 
https://www.postgresql.org/message-id/7d6a23a7e30540b486abc823f7ced7a93e1da1e8.camel%40vmware.com

Attachment

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Tue, Jan 19, 2021 at 09:21:41PM +0100, Daniel Gustafsson wrote:
>> In order to
>> move on with this set, I would suggest to extract some parts of the
>> patch set independently of the others and have two buildfarm members
>> for the MSVC and non-MSVC cases to stress the parts that can be
>> committed.  Just seeing the size, we could move on with:
>> - The ./configure set, with the change to introduce --with-ssl=openssl.
>> - 0004 for strong randoms.
>> - Support for cryptohashes.
>
> I will leave it to others to decide the feasibility of this, I'm happy to slice
> and dice the commits into smaller bits to for example separate out the
> --with-ssl autoconf change into a non NSS dependent commit, if that's wanted.

IMO it makes sense to extract the independent pieces and build on top
of them.  The bulk of the changes is likely going to have a bunch of
comments if reviewed deeply, so I think that we had better remove from
the stack the small-ish problems to ease the next moves.  The
./configure part and replacement of with_openssl by with_ssl is mixed
in 0001 and 0002, which is actually confusing.  And, FWIW, I would be
fine with applying a patch that introduces a --with-ssl with a
compatibility kept for --with-openssl.  This is what 0001 is doing,
actually, similarly to the past switches for --with-uuid.

A point that has been mentioned offline by you, but not mentioned on
this list.  The structure of the modules in src/test/ssl/ could be
refactored to help with an easier integration of more SSL libraries.
This makes sense taken independently.

> Based on an offlist discussion I believe this was a misunderstanding, but if I
> instead misunderstood that feel free to correct me with how you think this
> should be done.

The point would be to rename BITS_PER_BYTE to PG_BITS_PER_BYTE in the
code and avoid conflicts.  I am not completely sure if others would
agree here, but this would remove quite some ifdef/undef stuff from
the code dedicated to NSS.

> > src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no?
>
> Good point, fixed.

Please note that patch 0001 is failing to apply after the recent
commit b663a41.  There are conflicts in postgres_fdw.out.

Also, what's the minimum version of NSS that would be supported?  It
would be good to define an acceptable older version, to keep that
documented and to track that perhaps with some configure checks (?),
similarly to what is done for OpenSSL.

Patch 0006 has three trailing whitespaces (git diff --check
complains).  Running the regression tests of pgcrypto, I think that
the SHA2 implementation is not completely right.  Some SHA2 encoding
reports results from already-freed data.  I have spotted a second
issue within scram_HMAC_init(), where pg_cryptohash_create() remains
stuck inside NSS_InitContext(), freezing the regression tests where
passwords hashed for SCRAM are created.

+   ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
+   ctx = MemoryContextAlloc(TopMemoryContext, sizeof(pg_cryptohash_ctx));
+#else
+   ctx = pg_malloc(sizeof(pg_cryptohash_ctx));
+#endif
cryptohash_nss.c cannot use pg_malloc() for frontend allocations.  On
OOM, your patch would call exit() directly, even within libpq.  But
shared library callers need to know about the OOM failure.

+   explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
+   pfree(ctx);
For similar reasons, pfree should not be used for the frontend code in
cryptohash_nss.c.  The fallback should be just a malloc/free set.
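
A rough sketch of the malloc/free fallback described above, assuming a
hypothetical sketch_ctx type and function names (the real pg_cryptohash_ctx
has different members, and the patch uses explicit_bzero() where this sketch
uses memset() to stay portable):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context type, for illustration only. */
typedef struct sketch_ctx
{
    int           type;
    unsigned char state[64];
} sketch_ctx;

/*
 * Allocate with plain malloc() and report OOM to the caller by returning
 * NULL, rather than exiting the process from inside a library.
 */
static sketch_ctx *
sketch_ctx_create(int type)
{
    sketch_ctx *ctx = malloc(sizeof(sketch_ctx));

    if (ctx == NULL)
        return NULL;            /* the caller decides how to handle OOM */
    memset(ctx, 0, sizeof(sketch_ctx));
    ctx->type = type;
    return ctx;
}

/* Wipe the state, then release it with free(), the match for malloc(). */
static void
sketch_ctx_free(sketch_ctx *ctx)
{
    if (ctx == NULL)
        return;
    /* Real code would use explicit_bzero() so the wipe cannot be elided. */
    memset(ctx, 0, sizeof(sketch_ctx));
    free(ctx);
}
```

The point is only that the frontend path hands the failure back to its caller
instead of calling exit().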

+   status = PK11_DigestBegin(ctx->pk11_context);
+
+   if (status != SECSuccess)
+       return 1;
+   return 0;
This needs to return -1 on failure, not 1.

I really need to study more the choice of the options chosen for
NSS_InitContext()...  But based on the docs I can read on the matter I
think that saving nsscontext in pg_cryptohash_ctx is right for each
cryptohash built.

src/tools/msvc/ is missing an update for cryptohash_nss.c.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Mon, 2020-07-20 at 15:35 +0200, Daniel Gustafsson wrote:
> With this, I have one failing test ("intermediate client certificate is
> provided by client") which I've left failing since I believe the case should be
> supported by NSS.  The issue is most likely that I haven't figured out the right
> certinfo incantation to make it so (Mozilla hasn't strained themselves when
> writing documentation for this toolchain, or any part of NSS for that matter).

I think we're missing a counterpart to this piece of the OpenSSL
implementation, in be_tls_init():

    if (ssl_ca_file[0])
    {
        ...
        SSL_CTX_set_client_CA_list(context, root_cert_list);
    }

I think the NSS equivalent to SSL_CTX_set_client_CA_list() is probably
SSL_SetTrustAnchors() (which isn't called out in the online NSS docs,
as far as I can see).

What I'm less sure of is how we want the NSS counterpart to ssl_ca_file
to behave. The OpenSSL implementation allows a list of CA names to be
sent. Should the NSS side take a list of CA cert nicknames? a list of
Subjects? something else?

mod_nss for httpd had a proposed feature [1] to do this that
unfortunately withered on the vine, and Google returns ~500 results for
"SSL_SetTrustAnchors", so I'm unaware of any prior art in the wild...

--Jacob

[1] https://bugzilla.redhat.com/show_bug.cgi?id=719401

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Thu, 2021-01-21 at 14:21 +0900, Michael Paquier wrote:
> Also, what's the minimum version of NSS that would be supported?  It
> would be good to define an acceptable older version, to keep that
> documented and to track that perhaps with some configure checks (?),
> similarly to what is done for OpenSSL.

Some version landmarks:

- 3.21 adds support for extended master secret, which according to [1]
is required for SCRAM channel binding to actually be secure.
- 3.26 is Debian Stretch.
- 3.28 is Ubuntu 16.04, and RHEL6 (I think).
- 3.35 is Ubuntu 18.04.
- 3.36 is RHEL7 (I think).
- 3.39 gets us final TLS 1.3 support.
- 3.42 is Debian Buster.
- 3.49 is Ubuntu 20.04.

(I'm having trouble finding online package information for RHEL variants,
so I've pulled those versions from online support docs. If someone notices
that those are wrong please speak up.)
 
So 3.39 would guarantee TLS1.3 but exclude a decent chunk of still-
supported Debian-alikes. Anything less than 3.21 seems actively unsafe
unless we disable SCRAM with those versions.
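
To make the trade-off concrete, here is a small sketch of a minimum-version
comparison applied to the landmarks above. The helper version_at_least is
hypothetical; a real check would more likely live in configure, or compare
against the version macros that nss.h is believed to expose (NSS_VMAJOR and
NSS_VMINOR), which this sketch deliberately does not depend on.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if major.minor is at least min_major.min_minor. */
static bool
version_at_least(int major, int minor, int min_major, int min_minor)
{
    if (major != min_major)
        return major > min_major;
    return minor >= min_minor;
}
```

With a floor of 3.39, version_at_least(3, 26, 3, 39) is false (Debian Stretch
would be excluded), while a floor of 3.21 accepts every distro version listed.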

Any other important landmarks (whether feature- or distro-related) we
need to consider?

--Jacob

[1] https://tools.ietf.org/html/rfc7677#section-4

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Wed, Jan 20, 2021 at 05:07:08PM +0000, Jacob Champion wrote:
> Lovely. I didn't expect *removing* an extension to effectively *add*
> more, but I'm glad it works now.

My apologies for chiming in.  I was looking at your patch set here,
and while reviewing the strong random and cryptohash parts I have
found a couple of mistakes in the ./configure part.  I think that the
switch from --with-openssl to --with-ssl={openssl} could just be done
independently as a building piece of the rest, then the first portion
based on NSS could just add the minimum set in configure.ac.

Please note that the patch set has been using autoconf from Debian, or
something forked from upstream.  There were also missing updates in
several parts of the code base, and a lack of docs for the new
switch.  I have spent time checking that with --with-openssl to make
sure that the obsolete grammar is still compatible, --with-ssl=openssl
and also without it.

Thoughts?
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-01-27 at 16:39 +0900, Michael Paquier wrote:
> My apologies for chiming in.  I was looking at your patch set here,
> and while reviewing the strong random and cryptohash parts I have
> found a couple of mistakes in the ./configure part.  I think that the
> switch from --with-openssl to --with-ssl={openssl} could just be done
> independently as a building piece of the rest, then the first portion
> based on NSS could just add the minimum set in configure.ac.
> 
> Please note that the patch set has been using autoconf from Debian, or
> something forked from upstream.  There were also missing updates in
> several parts of the code base, and a lack of docs for the new
> switch.  I have spent time checking that with --with-openssl to make
> sure that the obsolete grammar is still compatible, --with-ssl=openssl
> and also without it.
> 
> Thoughts?

Seems good to me on Ubuntu; builds with both flavors.

From peering at the Windows side:

> --- a/src/tools/msvc/config_default.pl
> +++ b/src/tools/msvc/config_default.pl
> @@ -16,7 +16,7 @@ our $config = {
>      tcl       => undef,    # --with-tcl=<path>
>      perl      => undef,    # --with-perl=<path>
>      python    => undef,    # --with-python=<path>
> -    openssl   => undef,    # --with-openssl=<path>
> +    openssl   => undef,    # --with-ssl=openssl with <path>
>      uuid      => undef,    # --with-uuid=<path>
>      xml       => undef,    # --with-libxml=<path>
>      xslt      => undef,    # --with-libxslt=<path>

So to check understanding: the `openssl` config variable is still alive
for MSVC builds; it just turns that into `--with-ssl=openssl` in the
fake CONFIGURE_ARGS?

<bikeshed color="lightblue">

Since SSL is an obsolete term, and the choice of OpenSSL vs NSS vs
[nothing] affects server operation (such as cryptohash) regardless of
whether or not connection-level TLS is actually used, what would you
all think about naming this option --with-crypto? I.e.

    --with-crypto=openssl
    --with-crypto=nss

</bikeshed>

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Wed, Jan 27, 2021 at 06:47:17PM +0000, Jacob Champion wrote:
> So to check understanding: the `openssl` config variable is still alive
> for MSVC builds; it just turns that into `--with-ssl=openssl` in the
> fake CONFIGURE_ARGS?

Yeah, I think that keeping both variables separated in the MSVC
scripts is the most straight-forward option, as this passes down a
path.  Once there is a value for nss, we'd need to properly issue an
error if both OpenSSL and NSS are specified.

> Since SSL is an obsolete term, and the choice of OpenSSL vs NSS vs
> [nothing] affects server operation (such as cryptohash) regardless of
> whether or not connection-level TLS is actually used, what would you
> all think about naming this option --with-crypto? I.e.
>
>     --with-crypto=openssl
>     --with-crypto=nss

Looking around, curl has multiple switches for each lib with one named
--with-ssl for OpenSSL, but it needs to be able to use multiple
libraries at run time.  I can spot that libssh2 uses what you are
proposing.  It seems to me that --with-ssl is a bit more popular but
not by that much: wget, wayland, some apache stuff (it uses a path as
option value).  Anyway, what you are suggesting sounds like a good idea in
the context of Postgres.  Daniel?
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 28 Jan 2021, at 07:06, Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Jan 27, 2021 at 06:47:17PM +0000, Jacob Champion wrote:

>> Since SSL is an obsolete term, and the choice of OpenSSL vs NSS vs
>> [nothing] affects server operation (such as cryptohash) regardless of
>> whether or not connection-level TLS is actually used, what would you
>> all think about naming this option --with-crypto? I.e.
>>
>>    --with-crypto=openssl
>>    --with-crypto=nss
>
> Looking around, curl has multiple switches for each lib with one named
> --with-ssl for OpenSSL, but it needs to be able to use multiple
> libraries at run time.

To be fair, if we started over in curl I would push back on --with-ssl meaning
OpenSSL but that ship has long since sailed.

> I can spot that libssh2 uses what you are
> proposing.  It seems to me that --with-ssl is a bit more popular but
> not by that much: wget, wayland, some apache stuff (it uses a path as
> option value).  Anyway, what you are suggesting sounds like a good in
> the context of Postgres.  Daniel?

SSL is admittedly an obsolete technical term, but it's one that enough people
have decided is interchangeable with TLS that it's not a hill worth dying on
IMHO.  Since postgres won't allow for using libnss or OpenSSL for cryptohash
*without* compiling SSL/TLS support (used or not), I think --with-ssl=LIB is
more descriptive and less confusing.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Thu, 2021-01-21 at 20:16 +0000, Jacob Champion wrote:
> I think we're missing a counterpart to this piece of the OpenSSL
> implementation, in be_tls_init():

Never mind. Using SSL_SetTrustAnchors is something we could potentially
do if we wanted to further limit the CAs that are actually sent to the
client, but it shouldn't be necessary to get the tests to pass.

I now think that it's just a matter of making sure that the "server-cn-
only" DB has the root_ca.crt included, so that it can correctly
validate the client certificate. Incidentally I think this should also
fix the remaining failing SCRAM test. I'll try to get a patch out
tomorrow, if adding the root CA doesn't invalidate some other test
logic.

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Fri, Jan 29, 2021 at 12:20:21AM +0100, Daniel Gustafsson wrote:
> SSL is admittedly an obsolete technical term, but it's one that enough people
> have decided is interchangeable with TLS that it's not a hill worth dying on
> IMHO.  Since postgres won't allow for using libnss or OpenSSL for cryptohash
> *without* compiling SSL/TLS support (used or not), I think --with-ssl=LIB is
> more descriptive and less confusing.

Okay, let's use --with-ssl then for the new switch name.  The previous
patch is backward-compatible, and will simplify the rest of the set,
so let's move on with it.  Once this is done, my guess is that it
would be cleaner to have a new patch that includes only the
./configure and MSVC changes, and then the rest: test refactoring,
cryptohash, strong random and lastly TLS (we may want to cut this a
bit more though and perhaps have some restrictions depending on the
scope of options a first patch set could support).

I'll wait a bit first to see if there are any objections to this
change.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 21 Jan 2021, at 06:21, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jan 19, 2021 at 09:21:41PM +0100, Daniel Gustafsson wrote:
>>> In order to
>>> move on with this set, I would suggest to extract some parts of the
>>> patch set independently of the others and have two buildfarm members
>>> for the MSVC and non-MSVC cases to stress the parts that can be
>>> committed.  Just seeing the size, we could move on with:
>>> - The ./configure set, with the change to introduce --with-ssl=openssl.
>>> - 0004 for strong randoms.
>>> - Support for cryptohashes.
>>
>> I will leave it to others to decide the feasibility of this, I'm happy to slice
>> and dice the commits into smaller bits to for example separate out the
>> --with-ssl autoconf change into a non NSS dependent commit, if that's wanted.
>
> IMO it makes sense to extract the independent pieces and build on top
> of them.  The bulk of the changes is likely going to have a bunch of
> comments if reviewed deeply, so I think that we had better remove from
> the stack the small-ish problems to ease the next moves.  The
> ./configure part and replacement of with_openssl by with_ssl is mixed
> in 0001 and 0002, which is actually confusing.  And, FWIW, I would be
> fine with applying a patch that introduces a --with-ssl with a
> compatibility kept for --with-openssl.  This is what 0001 is doing,
> actually, similarly to the past switches for --with-uuid.

This has been discussed elsewhere in the thread, so let's continue that there.
The attached v23 does however split off --with-ssl for OpenSSL in 0001, adding
the nss option in 0002.

> A point that has been mentioned offline by you, but not mentioned on
> this list.  The structure of the modules in src/test/ssl/ could be
> refactored to help with an easier integration of more SSL libraries.
> This makes sense taken independently.

This has been submitted in F513E66A-E693-4802-9F8A-A74C1D0E3D10@yesql.se.

>> Based on an offlist discussion I believe this was a misunderstanding, but if I
>> instead misunderstood that feel free to correct me with how you think this
>> should be done.
>
> The point would be to rename BITS_PER_BYTE to PG_BITS_PER_BYTE in the
> code and avoid conflicts.  I am not completely sure if others would
> agree here, but this would remove quite some ifdef/undef stuff from
> the code dedicated to NSS.

Aha, now I see what you mean, sorry for the confusion.  That can certainly be
done (and done so outside of this patchset), but it admittedly feels a bit
intrusive.  If there is consensus that we should namespace our version like
this I'll go ahead and do that.

>>> src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no?
>>
>> Good point, fixed.
>
> Please note that patch 0001 is failing to apply after the recent
> commit b663a41.  There are conflicts in postgres_fdw.out.

Fixed.

> Patch 0006 has three trailing whitespaces (git diff --check complains).

Fixed.

> Running the regression tests of pgcrypto, I think that
> the SHA2 implementation is not completely right.  Some SHA2 encoding
> reports results from already-freed data.

I've been unable to reproduce, can you shed some light on this?

> I have spotted a second
> issue within scram_HMAC_init(), where pg_cryptohash_create() remains
> stuck inside NSS_InitContext(), freezing the regression tests where
> passwords hashed for SCRAM are created.

I think the freezing you saw comes from opening and closing NSS contexts per
cryptohash op (some patience on my part runs the test Ok in ~30s which is
clearly not in the wheelhouse of acceptable), more on that below.

> +   ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
> +   ctx = MemoryContextAlloc(TopMemoryContext, sizeof(pg_cryptohash_ctx));
> +#else
> +   ctx = pg_malloc(sizeof(pg_cryptohash_ctx));
> +#endif
> cryptohash_nss.c cannot use pg_malloc() for frontend allocations.  On
> OOM, your patch would call exit() directly, even within libpq.  But
> shared library callers need to know about the OOM failure.

Of course, fixed.

> +   status = PK11_DigestBegin(ctx->pk11_context);
> +
> +   if (status != SECSuccess)
> +       return 1;
> +   return 0;
> This needs to return -1 on failure, not 1.

Doh, fixed.

> I really need to study more the choice of the options chosen for
> NSS_InitContext()...  But based on the docs I can read on the matter I
> think that saving nsscontext in pg_cryptohash_ctx is right for each
> cryptohash built.

It's a safe but slow option, NSS wasn't really made for running a single crypto
operation.  Since we are opening a context which isn't backed by an NSS
database we could have a static context, which indeed speeds up processing a
lot.  The problem with that is that there is no good callsite for closing the
context as the backend is closing down.  Since you are knee-deep in the
cryptohash code, do you have any thoughts on this?  I've included 0008 which
implements this, with a commented out dummy stub for cleaning up.

Making nss_context static in cryptohash_nss.c is
appealing but there is no good option for closing it there.  Any thoughts on
how to handle global contexts like this?
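
The trade-off can be sketched in isolation like this. Everything here is a
hypothetical stand-in (fake_context_open, fake_context_close and the counters
are not NSS calls), and the sketch ignores thread safety entirely:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for an expensive library context. */
static int  opens = 0;              /* how many times a context was opened */
static bool shared_open = false;    /* is the shared context currently open? */

static void fake_context_open(void)  { opens++; }
static void fake_context_close(void) { }

/* Per-operation approach: open and close a context around every hash. */
static void
hash_per_operation(void)
{
    fake_context_open();
    /* ... digest work would go here ... */
    fake_context_close();
}

/* Static approach: open lazily on first use and keep the context around. */
static void
hash_with_shared_context(void)
{
    if (!shared_open)
    {
        fake_context_open();
        shared_open = true;
    }
    /* ... digest work would go here ... */
}

/*
 * Something has to call this when the process exits; finding a good
 * callsite for it is the open question.
 */
static void
shared_context_shutdown(void)
{
    if (shared_open)
    {
        fake_context_close();
        shared_open = false;
    }
}
```

Three hashes cost three opens in the first variant and one in the second,
at the price of needing an owner for the shutdown call.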

> src/tools/msvc/ is missing an update for cryptohash_nss.c.

Fixed.

--
Daniel Gustafsson        https://vmware.com/



Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 29 Jan 2021, at 07:01, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jan 29, 2021 at 12:20:21AM +0100, Daniel Gustafsson wrote:
>> SSL is admittedly an obsolete technical term, but it's one that enough people
>> have decided is interchangeable with TLS that it's not a hill worth dying on
>> IMHO.  Since postgres won't allow for using libnss or OpenSSL for cryptohash
>> *without* compiling SSL/TLS support (used or not), I think --with-ssl=LIB is
>> more descriptive and less confusing.
>
> Okay, let's use --with-ssl then for the new switch name.  The previous
> patch is backward-compatible, and will simplify the rest of the set,
> so let's move on with it.  Once this is done, my guess is that it
> would be cleaner to have a new patch that includes only the
> ./configure and MSVC changes, and then the rest: test refactoring,
> cryptohash, strong random and lastly TLS (we may want to cut this a
> bit more though and perhaps have some restrictions depending on the
> scope of options a first patch set could support).
>
> I'll wait a bit first to see if there are any objections to this
> change.

I'm still not convinced that adding --with-ssl=openssl is worth it before the
rest of NSS goes in (and more importantly, *if* it goes in).

On the one hand, we already have pluggable (for some value of) support for
adding TLS libraries, and adding --with-ssl is one more piece of that puzzle.
We could of course have endless --with-X options instead but as you say,
--with-uuid has set the tone here (and I believe that's good).  On the other
hand, if we never add any other library than OpenSSL then it's just complexity
without benefit.

As mentioned elsewhere in the thread, the current v23 patchset has the
--with-ssl change as a separate commit to at least make it visual what it looks
like.  The documentation changes are in the main NSS patch though since
documenting --with-ssl when there is only one possible value didn't seem to be
helpful to users who are fully expected to use --with-openssl still.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Fri, 2021-01-29 at 13:57 +0100, Daniel Gustafsson wrote:
> > On 21 Jan 2021, at 06:21, Michael Paquier <michael@paquier.xyz> wrote:
> > I really need to study more the choice of the options chosen for
> > NSS_InitContext()...  But based on the docs I can read on the matter I
> > think that saving nsscontext in pg_cryptohash_ctx is right for each
> > cryptohash built.
> 
> It's a safe but slow option, NSS wasn't really made for running a single crypto
> operation.  Since we are opening a context which isn't backed by an NSS
> database we could have a static context, which indeed speeds up processing a
> lot.  The problem with that is that there is no good callsite for closing the
> context as the backend is closing down.  Since you are knee-deep in the
> cryptohash code, do you have any thoughts on this?  I've included 0008 which
> implements this, with a commented out dummy stub for cleaning up.
> 
> Making nss_context static in cryptohash_nss.c is
> appealing but there is no good option for closing it there.  Any thoughts on
> how to handle global contexts like this?

I'm completely new to this code, so take my thoughts with a grain of
salt...

I think the bad news is that the static approach will need support for
ENABLE_THREAD_SAFETY. (It looks like the NSS implementation of
pgtls_close() needs some thread support too?)

The good(?) news is that I don't understand why OpenSSL's
implementation of cryptohash doesn't _also_ need the thread-safety
code. (Shouldn't we need to call CRYPTO_set_locking_callback() et al
before using any of its cryptohash implementation?) So maybe we can
implement the same global setup/teardown API for OpenSSL too and not
have to one-off it for NSS...

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Fri, Jan 29, 2021 at 02:13:30PM +0100, Daniel Gustafsson wrote:
> I'm still not convinced that adding --with-ssl=openssl is worth it before the
> rest of NSS goes in (and more importantly, *if* it goes in).
>
> On the one hand, we already have pluggable (for some value of) support for
> adding TLS libraries, and adding --with-ssl is one more piece of that puzzle.
> We could of course have endless --with-X options instead but as you say,
> --with-uuid has set the tone here (and I believe that's good).  On the other
> hand, if we never add any other library than OpenSSL then it's just complexity
> without benefit.

IMO, one could say the same thing for any piece of refactoring we have
done in the past to make the TLS/crypto code more modular.  There is
demand for being able to choose among multiple SSL libs at build time,
and we are still in a phase where we evaluate the options at hand.
This refactoring is just careful progress, and this is one step in
this direction.  The piece about refactoring the SSL tests is
similar.

> As mentioned elsewhere in the thread, the current v23 patchset has the
> --with-ssl change as a separate commit to at least make it visual what it looks
> like.  The documentation changes are in the main NSS patch though since
> documenting --with-ssl when there is only one possible value didn't seem to be
> helpful to users who are fully expected to use --with-openssl still.

The documentation changes should be part of the patch introducing the
switch IMO: a description of the new switch, as well as a paragraph
about the old value being deprecated.  That's done this way for UUID.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Fri, Jan 29, 2021 at 01:57:02PM +0100, Daniel Gustafsson wrote:
> This has been discussed elsewhere in the thread, so let's continue that there.
> The attached v23 does however split off --with-ssl for OpenSSL in 0001, adding
> the nss option in 0002.

While going through 0001, I have found a couple of things.

-CF_SRCS = $(if $(subst no,,$(with_openssl)), $(OSSL_SRCS), $(INT_SRCS))
-CF_TESTS = $(if $(subst no,,$(with_openssl)), $(OSSL_TESTS), $(INT_TESTS))
+CF_SRCS = $(if $(subst openssl,,$(with_ssl)), $(OSSL_SRCS), $(INT_SRCS))
+CF_TESTS = $(if $(subst openssl,,$(with_ssl)), $(OSSL_TESTS), $(INT_TESTS))
It seems to me that this part is the opposite, aka here the OpenSSL
files and tests (OSSL*) would be used if with_ssl is not openssl.
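
To spell out the inversion, here is a small C emulation of how make evaluates
the two rules. The helpers subst_remove, make_if, old_rule and patched_rule
are hypothetical and only approximate make's semantics (whitespace stripping
is ignored); the rule text in the comments is taken from the diff above.

```c
#include <assert.h>
#include <string.h>

/*
 * Emulates $(subst from,,text): copies text into out with every occurrence
 * of from removed.  out must hold at least strlen(text) + 1 bytes.
 */
static void
subst_remove(const char *from, const char *text, char *out)
{
    size_t      flen = strlen(from);

    while (*text)
    {
        if (flen > 0 && strncmp(text, from, flen) == 0)
            text += flen;
        else
            *out++ = *text++;
    }
    *out = '\0';
}

/* Emulates $(if cond,then,else): a non-empty cond selects the "then" part. */
static const char *
make_if(const char *cond, const char *then_val, const char *else_val)
{
    return cond[0] != '\0' ? then_val : else_val;
}

/* Old rule: $(if $(subst no,,$(with_openssl)), OSSL, INT) */
static const char *
old_rule(const char *with_openssl)
{
    char        buf[64];

    subst_remove("no", with_openssl, buf);
    return make_if(buf, "OSSL", "INT");
}

/* Rule from the patch: $(if $(subst openssl,,$(with_ssl)), OSSL, INT) */
static const char *
patched_rule(const char *with_ssl)
{
    char        buf[64];

    subst_remove("openssl", with_ssl, buf);
    return make_if(buf, "OSSL", "INT");
}
```

Under this emulation old_rule("yes") picks OSSL, but patched_rule("openssl")
picks INT and patched_rule("no") picks OSSL.  Swapping the two branches would
be one way to restore the intent.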

-ifeq ($(with_openssl),yes)
+ifneq ($(with_ssl),no)
+OBJS += \
+       fe-secure-common.o
+endif
This split is better, good idea.

The two SSL tests still included a reference to with_openssl after
0001:
src/test/ssl/t/001_ssltests.pl:if ($ENV{with_openssl} eq 'yes')
src/test/ssl/t/002_scram.pl:if ($ENV{with_openssl} ne 'yes')

I have refreshed the docs on top to be consistent with the new
configuration, and applied it after more checks.  I'll try to look in
more details at the failures with cryptohashes I found upthread.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 20 Jan 2021, at 18:07, Jacob Champion <pchampion@vmware.com> wrote:

> To continue the Subject Common Name discussion [1] from a different
> part of the thread:
>
> Attached is a v23 version of the patchset that peels the raw Common
> Name out from a client cert's Subject. This allows the following cases
> that the OpenSSL implementation currently handles:
>
> - subjects that don't begin with a CN
> - subjects with quotable characters
> - subjects that have no CN at all

Nice, thanks for fixing this!

> Embedded NULLs are now handled in a similar manner to the OpenSSL side,
> though because this failure happens during the certificate
> authentication callback, it results in a TLS alert rather than simply
> closing the connection.

But returning SECFailure from the cert callback forces NSS to terminate the
connection immediately, doesn't it?

> For easier review of just the parts I've changed, I've also attached a
> since-v22.diff, which is part of the 0001 patch.

I confused my dev trees and forgot to include this in the v23 that I sent out
(which should've been v24), sorry about that.  Attached is a v24 which is
rebased on top of today's --with-ssl commit, and now includes your changes.

Additionally I've added a shutdown callback such that we close the connection
immediately if NSS is shutting down from underneath us.  I can't imagine a
scenario in which that's benign, so let's take whatever precautions we can.

I've also changed the NSS initialization in the cryptohash code to more closely
match what the NSS documentation recommends for similar scenarios, but more on
that downthread where that's discussed.

--
Daniel Gustafsson        https://vmware.com/



Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 29 Jan 2021, at 19:46, Jacob Champion <pchampion@vmware.com> wrote:

> I think the bad news is that the static approach will need support for
> ENABLE_THREAD_SAFETY.

I did some more reading today and noticed that the NSS documentation (and their
sample code for doing crypto without TLS connections) says to use NSS_NoDB_Init
to perform a read-only init which doesn't require a matching close call.  Now,
the docs aren't terribly clear and also seem to have gone offline from MDN,
and skimming the code isn't entirely self-explanatory, so I may well have
missed something.  The v24 patchset posted changes to this and at least passes
tests with decent performance so it seems worth investigating.

> (It looks like the NSS implementation of pgtls_close() needs some thread
> support too?)


Storing the context in conn would probably be better?

> The good(?) news is that I don't understand why OpenSSL's
> implementation of cryptohash doesn't _also_ need the thread-safety
> code. (Shouldn't we need to call CRYPTO_set_locking_callback() et al
> before using any of its cryptohash implementation?) So maybe we can
> implement the same global setup/teardown API for OpenSSL too and not
> have to one-off it for NSS...

No idea here, wouldn't that impact pgcrypto as well in that case?

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 1 Feb 2021, at 14:25, Michael Paquier <michael@paquier.xyz> wrote:

> I have refreshed the docs on top to be consistent with the new
> configuration, and applied it after more checks.

Thanks, I was just about to send a rebased version earlier today with the doc
changes in the 0001 patch when this email landed in my inbox =) The v24 posted
upthread is now rebased on top of this.

> I'll try to look in more details at the failures with cryptohashes I found
> upthread.

Great, thanks.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Mon, 2021-02-01 at 21:49 +0100, Daniel Gustafsson wrote:
> > On 29 Jan 2021, at 19:46, Jacob Champion <pchampion@vmware.com> wrote:
> > I think the bad news is that the static approach will need support for
> > ENABLE_THREAD_SAFETY.
> 
> I did some more reading today and noticed that the NSS documentation (and their
> sample code for doing crypto without TLS connections) says to use NSS_NoDB_Init
> to perform a read-only init which doesn't require a matching close call.  Now,
> the docs aren't terribly clear and also seem to have gone offline from MDN,
> and skimming the code isn't entirely self-explanatory, so I may well have
> missed something.  The v24 patchset posted changes to this and at least passes
> tests with decent performance so it seems worth investigating.

Nice! Not having to close helps quite a bit.

(Looks like thread safety for NSS_Init was added in 3.13, so we have an
absolute version floor.)

> > (It looks like the NSS implementation of pgtls_close() needs some thread
> > support too?)
> 
> Storing the context in conn would probably be better?

Agreed.

> > The good(?) news is that I don't understand why OpenSSL's
> > implementation of cryptohash doesn't _also_ need the thread-safety
> > code. (Shouldn't we need to call CRYPTO_set_locking_callback() et al
> > before using any of its cryptohash implementation?) So maybe we can
> > implement the same global setup/teardown API for OpenSSL too and not
> > have to one-off it for NSS...
> 
> No idea here, wouldn't that impact pgcrypto as well in that case?

If pgcrypto is backend-only then I don't think it should need
multithreading protection; is that right?

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Mon, 2021-02-01 at 21:49 +0100, Daniel Gustafsson wrote:
> > Embedded NULLs are now handled in a similar manner to the OpenSSL side,
> > though because this failure happens during the certificate
> > authentication callback, it results in a TLS alert rather than simply
> > closing the connection.
> 
> But returning SECFailure from the cert callback forces NSS to terminate the
> connection immediately, doesn't it?

IIRC NSS will send the alert first, whereas our OpenSSL implementation
will complete the handshake and then drop the connection. I'll rebuild
with the latest and confirm.

> > For easier review of just the parts I've changed, I've also attached a
> > since-v22.diff, which is part of the 0001 patch.
> 
> I confused my dev trees and failed to include this in the v23 that I sent out
> (which should've been v24), sorry about that.  Attached is a v24 which is
> rebased on top of today's --with-ssl commit, and now includes your changes.

No problem. Thanks!

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Tue, Feb 02, 2021 at 12:42:23AM +0000, Jacob Champion wrote:
> (Looks like thread safety for NSS_Init was added in 3.13, so we have an
> absolute version floor.)

If that's the case, I would recommend adding at least something to the
install-requirements section of the docs.

> If pgcrypto is backend-only then I don't think it should need
> multithreading protection; is that right?

No need for it in the backend, unless there are plans to switch from
processes to threads there :p

libpq, ecpg and anything using them have to care about that.  Worth
noting that OpenSSL also has some special handling in libpq with
CRYPTO_get_id_callback() and that it tracks the number of opened
connections.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Tue, 2021-02-02 at 00:55 +0000, Jacob Champion wrote:
> On Mon, 2021-02-01 at 21:49 +0100, Daniel Gustafsson wrote:
> > > Embedded NULLs are now handled in a similar manner to the OpenSSL side,
> > > though because this failure happens during the certificate
> > > authentication callback, it results in a TLS alert rather than simply
> > > closing the connection.
> > 
> > But returning SECFailure from the cert callback forces NSS to terminate the
> > connection immediately, doesn't it?
> 
> IIRC NSS will send the alert first, whereas our OpenSSL implementation
> will complete the handshake and then drop the connection. I'll rebuild
> with the latest and confirm.

I wasn't able to reproduce the behavior I thought I saw before. In any
case I think the current NSS implementation for embedded NULLs will
work correctly.

> > Attached is a v24 which is
> > rebased on top of today's --with-ssl commit, and now includes your changes.

I have a v25 attached which fixes and re-enables the skipped/todo'd
client certificate and SCRAM tests. (Changes between v24 and v25 are in
since-v24.diff.) The server-cn-only database didn't have the root CA
installed to be able to verify client certificates, so I've added it.

Note that this changes the error message printed during the invalid-
root tests, because NSS is now sending the root of the chain. So the
server's issuer is considered untrusted rather than unrecognized.

--Jacob

Attachment

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Tue, Feb 02, 2021 at 08:33:35PM +0000, Jacob Champion wrote:
> Note that this changes the error message printed during the invalid-
> root tests, because NSS is now sending the root of the chain. So the
> server's issuer is considered untrusted rather than unrecognized.

I think that it is not a good idea to attach the since-v*.diff patches
into the threads.  This causes the CF bot to fail in applying those
patches.

Could it be possible to split 0001 into two parts at least with one
patch that includes the basic changes for the build and ./configure,
and a second with the FE/BE changes?
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Thu, 2021-02-04 at 16:30 +0900, Michael Paquier wrote:
> On Tue, Feb 02, 2021 at 08:33:35PM +0000, Jacob Champion wrote:
> > Note that this changes the error message printed during the invalid-
> > root tests, because NSS is now sending the root of the chain. So the
> > server's issuer is considered untrusted rather than unrecognized.
> 
> I think that it is not a good idea to attach the since-v*.diff patches
> into the threads.  This causes the CF bot to fail in applying those
> patches.

Ah, sorry about that. Is there an extension I can use (or lack thereof)
that the CF bot will ignore, or does it scan the attachment contents?

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Thu, Feb 04, 2021 at 06:35:28PM +0000, Jacob Champion wrote:
> Ah, sorry about that. Is there an extension I can use (or lack thereof)
> that the CF bot will ignore, or does it scan the attachment contents?

The thing is smart, but there are ways to bypass it.  Here is the
code:
https://github.com/macdice/cfbot/

And here are the patterns looked at:
cfbot_commitfest_rpc.py:    groups = re.search('<a href="(/message-id/attachment/[^"]*\\.(diff|diff\\.gz|patch|patch\\.gz|tar\\.gz|tgz|tar\\.bz2))">', line)
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 4 Feb 2021, at 08:30, Michael Paquier <michael@paquier.xyz> wrote:

> Could it be possible to split 0001 into two parts at least with one
> patch that includes the basic changes for the build and ./configure,
> and a second with the FE/BE changes?

Attached is a new patchset where I've tried to split the patches even further
to try and separate out changes for easier review.  While not a perfect split
I'm sure, and clearly only for review purposes, I do hope it helps a little.
There is one hunk in 0002 which moves some OpenSSL specific code from
underneath USE_SSL, but that's about the only non-NSS change left in this
patchset AFAICS.

Additionally, this version moves the code in the shared header to a proper .c
file shared between frontend and backend as well as performs some general
cleanup around that.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 4 Feb 2021, at 19:35, Jacob Champion <pchampion@vmware.com> wrote:
> 
> On Thu, 2021-02-04 at 16:30 +0900, Michael Paquier wrote:
>> On Tue, Feb 02, 2021 at 08:33:35PM +0000, Jacob Champion wrote:
>>> Note that this changes the error message printed during the invalid-
>>> root tests, because NSS is now sending the root of the chain. So the
>>> server's issuer is considered untrusted rather than unrecognized.
>> 
>> I think that it is not a good idea to attach the since-v*.diff patches
>> into the threads.  This causes the CF bot to fail in applying those
>> patches.
> 
> Ah, sorry about that. Is there an extension I can use (or lack thereof)
> that the CF bot will ignore, or does it scan the attachment contents?

Naming the file .patch.txt should work, and it serves the double purpose of
making it extra clear that this is not a patch intended to be applied but one
intended to be read for informational purposes.
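
To illustrate why a .patch.txt name slips past the pattern quoted upthread,
here is a minimal sketch of the same suffix test.  It assumes the match only
depends on how the attachment name ends, which is how the quoted regex reads;
the helper name cfbot_would_pick_up is made up for this example and is not
part of cfbot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Suffixes copied from the cfbot regex quoted upthread. */
static const char *const cfbot_suffixes[] = {
    ".diff", ".diff.gz", ".patch", ".patch.gz", ".tar.gz", ".tgz", ".tar.bz2"
};

/*
 * Hypothetical helper: returns true if an attachment name ends in one of
 * the suffixes above, i.e. if the quoted pattern would match its link.
 */
static bool
cfbot_would_pick_up(const char *filename)
{
    size_t      len = strlen(filename);
    size_t      nsuffixes = sizeof(cfbot_suffixes) / sizeof(cfbot_suffixes[0]);

    for (size_t i = 0; i < nsuffixes; i++)
    {
        size_t      slen = strlen(cfbot_suffixes[i]);

        if (len >= slen &&
            strcmp(filename + len - slen, cfbot_suffixes[i]) == 0)
            return true;
    }
    return false;
}
```

With this check, since-v24.diff is picked up while since-v24.patch.txt is
not, since the latter ends in .txt.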

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Tue, Feb 09, 2021 at 12:08:37AM +0100, Daniel Gustafsson wrote:
> Attached is a new patchset where I've tried to split the patches even further
> to try and separate out changes for easier review.  While not a perfect split
> I'm sure, and clearly only for review purposes, I do hope it helps a little.
> There is one hunk in 0002 which moves some OpenSSL specific code from
> underneath USE_SSL, but that's about the only non-NSS change left in this
> patchset AFAICS.

I would have imagined 0010 to be either a 0001 or a 0002 :)

 }
+#endif                         /* USE_SSL */
+
+#ifndef USE_OPENSSL

PQsslKeyPassHook_OpenSSL_type
PQgetSSLKeyPassHook_OpenSSL(void)
Indeed.  Let's fix that on HEAD, as an independent thing.

     errmsg("hostssl record cannot match because SSL is not supported by this build"),
-    errhint("Compile with --with-ssl=openssl to use SSL connections."),
+    errhint("Compile with --with-ssl to use SSL connections."),
Actually, we could change that directly on HEAD as you suggest.  This
code area is surrounded with USE_SSL so there is no need to mention
openssl at all.

-/* Support for overriding sslpassword handling with a callback. */
+/* Support for overriding sslpassword handling with a callback */
Makes sense.

 /*
  * USE_SSL code should be compiled only when compiling with an SSL
- * implementation.  (Currently, only OpenSSL is supported, but we might add
- * more implementations in the future.)
+ * implementation.
  */
Fine by me as well, meaning that 0002 could just be committed as-is.
I am also looking at 0003 a bit.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 9 Feb 2021, at 07:47, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Feb 09, 2021 at 12:08:37AM +0100, Daniel Gustafsson wrote:
>> Attached is a new patchset where I've tried to split the patches even further
>> to try and separate out changes for easier review.  While not a perfect split
>> I'm sure, and clearly only for review purposes, I do hope it helps a little.
>> There is one hunk in 0002 which moves some OpenSSL specific code from
>> underneath USE_SSL, but that's about the only non-NSS change left in this
>> patchset AFAICS.
>
> I would have imagined 0010 to be either a 0001 or a 0002 :)

Well, 0010 is a 2 in binary =) Jokes aside, I just didn't want to have a patch
referencing files added by later patches in the series.

>     errmsg("hostssl record cannot match because SSL is not supported by this build"),
> -    errhint("Compile with --with-ssl=openssl to use SSL connections."),
> +    errhint("Compile with --with-ssl to use SSL connections."),
> Actually, we could change that directly on HEAD as you suggest.  This
> code area is surrounded with USE_SSL so there is no need to mention
> openssl at all.

We could; the only reason it says =openssl today is that it's the only possible
value, but that's an implementation detail.  Changing it now before it's shipped
anywhere means the translation will be stable even if another library is
supported.

> 0002 could just be committed as-is.

It can be, it's not the most pressing patch scope reduction but everything
helps of course.

> I am also looking at 0003 a bit.

Thanks.  That patch is slightly more interesting in terms of reducing scope
here, but I also think it makes the test code a bit easier to digest when
certificate management is abstracted into the API rather than the job of the
testfile to perform.

--
Daniel Gustafsson        https://vmware.com/


Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Tue, Feb 09, 2021 at 10:30:52AM +0100, Daniel Gustafsson wrote:
> It can be, it's not the most pressing patch scope reduction but everything
> helps of course.

Okay.  I have spent some time on this one and finished it.

> Thanks.  That patch is slightly more interesting in terms of reducing scope
> here, but I also think it makes the test code a bit easier to digest when
> certificate management is abstracted into the API rather than the job of the
> testfile to perform.

That's my impression.  Still, I am wondering if there could be a
different approach.  I need to think more about that first..
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 10 Feb 2021, at 08:23, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Feb 09, 2021 at 10:30:52AM +0100, Daniel Gustafsson wrote:
>> It can be, it's not the most pressing patch scope reduction but everything
>> helps of course.
>
> Okay.  I have spent some time on this one and finished it.

Thanks, I'll post a rebased version on top of this soon.

>> Thanks.  That patch is slightly more interesting in terms of reducing scope
>> here, but I also think it makes the test code a bit easier to digest when
>> certificate management is abstracted into the API rather than the job of the
>> testfile to perform.
>
> That's my impression.  Still, I am wondering if there could be a
> different approach.  I need to think more about that first..

Another option could be to roll SSL config into PostgresNode and expose SSL
connections to every subsystem tested with TAP. Something like:

    $node = get_new_node(..);
    $node->setup_ssl(..);
    $node->set_certificate(..);

That is a fair bit more work though, but perhaps we could then more easily find
(and/or prevent) bugs like the one fixed in a45bc8a4f6495072bc48ad40a5aa03.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Mon, 2020-07-20 at 15:35 +0200, Daniel Gustafsson wrote:
> This version adds support for sslinfo on NSS for most the functions.

I've poked around to see what can be done about the
unimplemented ssl_client_dn_field/ssl_issuer_field functions. There's a
nasty soup of specs to wade around in, and it's not really clear to me
which ones take precedence since they're mostly centered on LDAP.

My take on it is that OpenSSL has done its own thing here, with almost-
based-on-a-spec-but-not-quite semantics. NSS has no equivalents to many
of the field names that OpenSSL supports (e.g. "commonName"). Likewise,
OpenSSL doesn't support case-insensitivity (e.g. "cn" in addition to
"CN") as many of the relevant RFCs require. They do both support
dotted-decimal representations, so we could theoretically get feature
parity there without a huge amount of work.

For the few attributes that NSS has a public API for retrieving:
- common name
- country
- locality
- state
- organization
- domain component
- org. unit
- DN qualifier
- uid
- email address(es?)
we could hardcode the list of OpenSSL-compatible names, and just
translate manually in sslinfo. Then leave the rest up to dotted-decimal 
OIDs.
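
A rough sketch of what that lookup could look like follows.  Everything in it
is an assumption made for illustration: the FieldKind enum and classify_field
are invented names, the table holds what I believe are the OpenSSL long names
for the attributes listed above, and matching is case-sensitive to mirror the
OpenSSL behaviour described earlier.  Anything not in the table is accepted
only if it looks like a dotted-decimal OID.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a requested DN field name. */
typedef enum
{
    FIELD_UNKNOWN = 0,
    FIELD_COMMON_NAME,
    FIELD_COUNTRY,
    FIELD_LOCALITY,
    FIELD_STATE,
    FIELD_ORGANIZATION,
    FIELD_DOMAIN_COMPONENT,
    FIELD_ORG_UNIT,
    FIELD_DN_QUALIFIER,
    FIELD_UID,
    FIELD_EMAIL,
    FIELD_DOTTED_OID            /* not hardcoded, handled as a raw OID */
} FieldKind;

typedef struct
{
    const char *openssl_name;
    FieldKind   kind;
} FieldName;

/* Illustrative table of OpenSSL-compatible names; short names are omitted. */
static const FieldName field_names[] = {
    {"commonName", FIELD_COMMON_NAME},
    {"countryName", FIELD_COUNTRY},
    {"localityName", FIELD_LOCALITY},
    {"stateOrProvinceName", FIELD_STATE},
    {"organizationName", FIELD_ORGANIZATION},
    {"domainComponent", FIELD_DOMAIN_COMPONENT},
    {"organizationalUnitName", FIELD_ORG_UNIT},
    {"dnQualifier", FIELD_DN_QUALIFIER},
    {"userId", FIELD_UID},
    {"emailAddress", FIELD_EMAIL},
};

/* True for strings such as "2.5.4.3": digit runs separated by single dots. */
static bool
is_dotted_decimal(const char *s)
{
    int         digits = 0;
    int         dots = 0;

    for (; *s != '\0'; s++)
    {
        if (isdigit((unsigned char) *s))
            digits++;
        else if (*s == '.' && digits > 0)
        {
            dots++;
            digits = 0;
        }
        else
            return false;
    }
    return digits > 0 && dots > 0;
}

/* Try the hardcoded names first, then fall back to dotted-decimal OIDs. */
static FieldKind
classify_field(const char *name)
{
    for (size_t i = 0; i < sizeof(field_names) / sizeof(field_names[0]); i++)
    {
        if (strcmp(name, field_names[i].openssl_name) == 0)
            return field_names[i].kind;
    }
    return is_dotted_decimal(name) ? FIELD_DOTTED_OID : FIELD_UNKNOWN;
}
```

The NSS side would then switch on the returned kind to pick the matching
public accessor, and hand FIELD_DOTTED_OID requests to a generic OID lookup.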

Would that be desirable, or do we want this interface to be something
more generally compatible with (some as-of-yet unspecified) spec?

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 17 Feb 2021, at 02:02, Jacob Champion <pchampion@vmware.com> wrote:

> On Mon, 2020-07-20 at 15:35 +0200, Daniel Gustafsson wrote:
>> This version adds support for sslinfo on NSS for most the functions.
>
> I've poked around to see what can be done about the
> unimplemented ssl_client_dn_field/ssl_issuer_field functions. There's a
> nasty soup of specs to wade around in, and it's not really clear to me
> which ones take precedence since they're mostly centered on LDAP.

Thanks for digging!

> we could hardcode the list of OpenSSL-compatible names, and just
> translate manually in sslinfo. Then leave the rest up to dotted-decimal
> OIDs.
>
> Would that be desirable, or do we want this interface to be something
> more generally compatible with (some as-of-yet unspecified) spec?

Regardless of approach taken I think this sounds like something that should be
tackled in a follow-up patch if the NSS patch is merged - and probably only as
a follow-up to a patch that adds test coverage to sslinfo.  From the sounds of
things we may not be able to guarantee stability across OpenSSL versions as it
is right now?

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 10 Feb 2021, at 13:17, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 10 Feb 2021, at 08:23, Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Tue, Feb 09, 2021 at 10:30:52AM +0100, Daniel Gustafsson wrote:
>>> It can be, it's not the most pressing patch scope reduction but everything
>>> helps of course.
>>
>> Okay.  I have spent some time on this one and finished it.
>
> Thanks, I'll post a rebased version on top of this soon.

Attached is a rebase on top of this and the recent cryptohash changes to pass
in buffer lengths to the _final function.  On top of that, I fixed up and
expanded the documentation, improved SCRAM handling (by using NSS digest
operations which are better suited) and reworded and expanded comments.  This
patch version is, I think, feature complete with the OpenSSL implementation.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-02-17 at 22:19 +0100, Daniel Gustafsson wrote:
> > On 17 Feb 2021, at 02:02, Jacob Champion <pchampion@vmware.com> wrote:
> > Would that be desirable, or do we want this interface to be something
> > more generally compatible with (some as-of-yet unspecified) spec?
> 
> Regardless of approach taken I think this sounds like something that should be
> tackled in a follow-up patch if the NSS patch is merged - and probably only as
> a follow-up to a patch that adds test coverage to sslinfo.

Sounds good, and +1 to adding coverage at the same time.

> From the sounds of
> things we may not be able to guarantee stability across OpenSSL versions as it
> is right now?

Yeah. I was going to write that OpenSSL would be unlikely to change
these once they're added for the first time, but after checking GitHub
it looks like they have done so recently [1], as part of a patch
release no less.

--Jacob

[1] https://github.com/openssl/openssl/pull/10029

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-02-17 at 22:35 +0100, Daniel Gustafsson wrote:
> Attached is a rebase on top of this and the recent cryptohash changes to pass
> in buffer lengths to the _final function.  On top of that, I fixed up and
> expanded the documentation, improved SCRAM handling (by using NSS digest
> operations which are better suited) and reworded and expanded comments.  This
> patch version is, I think, feature complete with the OpenSSL implementation.

fe-secure-nss.c is no longer compiling as of this patchset; looks
like pgtls_open_client() has a truncated statement.

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> Attached is a rebase which attempts to fix the cfbot Appveyor failure, there
> were missing HAVE_ defines for MSVC.

> Subject: [PATCH v30 1/9] nss: Support libnss as TLS library in libpq
>
> This commit contains the frontend and backend portion of TLS support
> in libpq to allow encrypted connections. The implementation is done

maybe add 'using NSS' to that first sentence. ;)

> +++ b/src/backend/libpq/auth.c
> @@ -2849,7 +2849,14 @@ CheckCertAuth(Port *port)
>  {
>      int            status_check_usermap = STATUS_ERROR;
>
> +#if defined(USE_OPENSSL)
>      Assert(port->ssl);
> +#elif defined(USE_NSS)
> +    /* TODO: should we rename pr_fd to ssl, to keep consistency? */
> +    Assert(port->pr_fd);
> +#else
> +    Assert(false);
> +#endif

Having thought about this TODO item for a bit, I tend to think it's
better to keep them distinct.  They aren't the same and it might not be
clear what's going on if one was to somehow mix them (at least if pr_fd
continues to sometimes be a void*, but I wonder why that's being
done..?  more on that later..).

> +++ b/src/backend/libpq/be-secure-nss.c
[...]
> +/* default init hook can be overridden by a shared library */
> +static void default_nss_tls_init(bool isServerStart);
> +nss_tls_init_hook_type nss_tls_init_hook = default_nss_tls_init;

> +static PRDescIdentity pr_id;
> +
> +static PRIOMethods pr_iomethods;

Happy to be told I'm missing something, but the above two variables seem
to only be used in init_iolayer.. is there a reason they're declared
here instead of just being declared in that function?

> +    /*
> +     * Set the fallback versions for the TLS protocol version range to a
> +     * combination of our minimal requirement and the library maximum. Error
> +     * messages should be kept identical to those in be-secure-openssl.c to
> +     * make translations easier.
> +     */

Should we pull these error messages out into another header so that
they're in one place to make sure they're kept consistent, if we really
want to put the effort in to keep them the same..?  I'm not 100% sure
that it's actually necessary to do so, but defining these in one place
would help maintain this if we want to.  Also alright with just keeping
the comment, not that big of a deal.

> +int
> +be_tls_open_server(Port *port)
> +{
> +    SECStatus    status;
> +    PRFileDesc *model;
> +    PRFileDesc *pr_fd;

pr_fd here is materially different from port->pr_fd, no?  As in, one is
the NSS raw TCP fd while the other is the SSL fd, right?  Maybe we
should use two different variable names to try and make sure they don't
get confused?  Might even set this to NULL after we are done with it
too..  Then again, I see later on that when we do the dance with the
'model' PRFileDesc that we just use the same variable- maybe we should
do that?  That is, just get rid of this 'pr_fd' and use port->pr_fd
always?

> +    /*
> +     * The NSPR documentation states that runtime initialization via PR_Init
> +     * is no longer required, as the first caller into NSPR will perform the
> +     * initialization implicitly. The documentation doesn't however clarify
> +     * from which version this is holds true, so let's perform the potentially
> +     * superfluous initialization anyways to avoid crashing on older versions
> +     * of NSPR, as there is no difference in overhead.  The NSS documentation
> +     * still states that PR_Init must be called in some way (implicitly or
> +     * explicitly).
> +     *
> +     * The below parameters are what the implicit initialization would've done
> +     * for us, and should work even for older versions where it might not be
> +     * done automatically. The last parameter, maxPTDs, is set to various
> +     * values in other codebases, but has been unused since NSPR 2.1 which was
> +     * released sometime in 1998. In current versions of NSPR all parameters
> +     * are ignored.
> +     */
> +    PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ );
> +
> +    /*
> +     * The certificate path (configdir) must contain a valid NSS database. If
> +     * the certificate path isn't a valid directory, NSS will fall back on the
> +     * system certificate database. If the certificate path is a directory but
> +     * is empty then the initialization will fail. On the client side this can
> +     * be allowed for any sslmode but the verify-xxx ones.
> +     * https://bugzilla.redhat.com/show_bug.cgi?id=728562 For the server side
> +     * we won't allow this to fail however, as we require the certificate and
> +     * key to exist.
> +     *
> +     * The original design of NSS was for a single application to use a single
> +     * copy of it, initialized with NSS_Initialize() which isn't returning any
> +     * handle with which to refer to NSS. NSS initialization and shutdown are
> +     * global for the application, so a shutdown in another NSS enabled
> +     * library would cause NSS to be stopped for libpq as well.  The fix has
> +     * been to introduce NSS_InitContext which returns a context handle to
> +     * pass to NSS_ShutdownContext.  NSS_InitContext was introduced in NSS
> +     * 3.12, but the use of it is not very well documented.
> +     * https://bugzilla.redhat.com/show_bug.cgi?id=738456

The above seems to indicate that we will be requiring at least 3.12,
right?  Yet above we have code to work with NSPR versions before 2.1?
Maybe we should put a stake in the ground that says "we only support
back to version X of NSS", test with that and a few more recent versions
and the most recent, and then rip out anything that's needed for
versions which are older than that?  I have a pretty hard time imagining
that someone is going to want to build PG v14 w/ NSS 2.0 ...
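
As a sketch of what such a stake in the ground could look like, the helper
below compares a "major.minor" version string against a fixed floor.  The
3.13 floor is only a placeholder borrowed from the NSS_Init thread-safety
remark upthread, and nss_version_supported is a made-up name.  A real check
would more likely use the version macros that nss.h provides at compile time
rather than parsing strings.

```c
#include <stdbool.h>
#include <stdio.h>

/* Placeholder floor; the actual minimum is still up for discussion. */
#define MIN_NSS_MAJOR 3
#define MIN_NSS_MINOR 13

/*
 * Hypothetical helper: true if a "major.minor[.patch]" version string is
 * at or above the floor.  Unparseable input is treated as unsupported.
 */
static bool
nss_version_supported(const char *version)
{
    int         major;
    int         minor;

    if (sscanf(version, "%d.%d", &major, &minor) != 2)
        return false;
    if (major != MIN_NSS_MAJOR)
        return major > MIN_NSS_MAJOR;
    return minor >= MIN_NSS_MINOR;
}
```

Anything that only exists to support versions below the floor could then be
removed rather than carried along untested.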

> +    {
> +        char       *ciphers,
> +                   *c;
> +
> +        char       *sep = ":;, ";
> +        PRUint16    ciphercode;
> +        const        PRUint16 *nss_ciphers;
> +
> +        /*
> +         * If the user has specified a set of preferred cipher suites we start
> +         * by turning off all the existing suites to avoid the risk of down-
> +         * grades to a weaker cipher than expected.
> +         */
> +        nss_ciphers = SSL_GetImplementedCiphers();
> +        for (int i = 0; i < SSL_GetNumImplementedCiphers(); i++)
> +            SSL_CipherPrefSet(model, nss_ciphers[i], PR_FALSE);
> +
> +        ciphers = pstrdup(SSLCipherSuites);
> +
> +        for (c = strtok(ciphers, sep); c; c = strtok(NULL, sep))
> +        {
> +            if (!pg_find_cipher(c, &ciphercode))
> +            {
> +                status = SSL_CipherPrefSet(model, ciphercode, PR_TRUE);
> +                if (status != SECSuccess)
> +                {
> +                    ereport(COMMERROR,
> +                            (errmsg("invalid cipher-suite specified: %s", c)));
> +                    return -1;
> +                }
> +            }
> +        }

Maybe I'm a bit confused, but doesn't pg_find_cipher return *true* when
a cipher is found, and therefore the '!' above is saying "if we don't
find a matching cipher, then run the code to set the cipher ...".  Also-
we don't seem to complain at all about a cipher being specified that we
don't find?  Guess I would think that we might want to throw a WARNING
in such a case, but I could possibly be convinced otherwise.  Kind of
wonder just what happens with the current code, I'm guessing ciphercode
is zero and therefore doesn't complain but also doesn't do what we want.
I wonder if there's a way to test this?

I do think we should probably throw an error if we end up with *no*
ciphers being set, which doesn't seem to be happening here..?
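
To make the intended control flow concrete, here is a standalone sketch of
the parsing loop with the lookup result used the right way round.  It is not
the patch's code: find_cipher and enable_ciphers are stand-ins written for
this example, the three-entry table is illustrative, and recording a code in
an array stands in for the SSL_CipherPrefSet call.  Unknown names are counted
so the caller can warn about them, and the function fails if nothing at all
ends up enabled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    const char *name;
    uint16_t    code;
} CipherEntry;

/* Illustrative table; the codes are the IANA values for these suites. */
static const CipherEntry known_ciphers[] = {
    {"TLS_AES_128_GCM_SHA256", 0x1301},
    {"TLS_AES_256_GCM_SHA384", 0x1302},
    {"TLS_CHACHA20_POLY1305_SHA256", 0x1303},
};

/* Stand-in lookup: returns true and sets *code when the name is known. */
static bool
find_cipher(const char *name, size_t len, uint16_t *code)
{
    for (size_t i = 0; i < sizeof(known_ciphers) / sizeof(known_ciphers[0]); i++)
    {
        if (strlen(known_ciphers[i].name) == len &&
            strncmp(known_ciphers[i].name, name, len) == 0)
        {
            *code = known_ciphers[i].code;
            return true;
        }
    }
    return false;
}

/*
 * Walk a separator-delimited list, enabling each known cipher and counting
 * unknown ones.  Returns the number enabled, or -1 if none were enabled.
 */
static int
enable_ciphers(const char *list, uint16_t *enabled, size_t max_enabled,
               size_t *n_unknown)
{
    const char *sep = ":;, ";
    size_t      n = 0;

    *n_unknown = 0;
    while (*list != '\0')
    {
        size_t      toklen;
        uint16_t    code;

        list += strspn(list, sep);  /* skip any run of separators */
        toklen = strcspn(list, sep);
        if (toklen == 0)
            break;

        if (find_cipher(list, toklen, &code))
        {
            /* found: this is where the suite would be switched on */
            if (n < max_enabled)
                enabled[n++] = code;
        }
        else
            (*n_unknown)++;         /* caller can report these */

        list += toklen;
    }
    return n > 0 ? (int) n : -1;
}
```

Whether unknown names should produce a WARNING or a hard error is a policy
choice left to the caller; the sketch only makes both conditions detectable.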

> +    /*
> +     * Set up the custom IO layer.
> +     */

Might be good to mention that the IO Layer is what sets up the
read/write callbacks to be used.

> +    port->pr_fd = SSL_ImportFD(model, pr_fd);
> +    if (!port->pr_fd)
> +    {
> +        ereport(COMMERROR,
> +                (errmsg("unable to initialize")));
> +        return -1;
> +    }

Maybe a comment and a better error message for this?

> +    PR_Close(model);

This might deserve one also, the whole 'model' construct is a bit
different. :)

> +    port->ssl_in_use = true;
> +
> +    /* Register out shutdown callback */

*our

> +int
> +be_tls_get_cipher_bits(Port *port)
> +{
> +    SECStatus    status;
> +    SSLChannelInfo channel;
> +    SSLCipherSuiteInfo suite;
> +
> +    status = SSL_GetChannelInfo(port->pr_fd, &channel, sizeof(channel));
> +    if (status != SECSuccess)
> +        goto error;
> +
> +    status = SSL_GetCipherSuiteInfo(channel.cipherSuite, &suite, sizeof(suite));
> +    if (status != SECSuccess)
> +        goto error;
> +
> +    return suite.effectiveKeyBits;
> +
> +error:
> +    ereport(WARNING,
> +            (errmsg("unable to extract TLS session information: %s",
> +                    pg_SSLerrmessage(PR_GetError()))));
> +    return 0;
> +}

It doesn't have to be much, but I, at least, do prefer to see
function-header comments. :)  Not that the OpenSSL code has them
consistently, so obviously not that big of a deal.  Goes for a number of
the functions being added.

> +            /* Found a CN, ecode and copy it into a newly allocated buffer */

*decode

> +static PRInt32
> +pg_ssl_read(PRFileDesc *fd, void *buf, PRInt32 amount, PRIntn flags,
> +            PRIntervalTime timeout)
> +{
> +    PRRecvFN    read_fn;
> +    PRInt32        n_read;
> +
> +    read_fn = fd->lower->methods->recv;
> +    n_read = read_fn(fd->lower, buf, amount, flags, timeout);
> +
> +    return n_read;
> +}
> +
> +static PRInt32
> +pg_ssl_write(PRFileDesc *fd, const void *buf, PRInt32 amount, PRIntn flags,
> +             PRIntervalTime timeout)
> +{
> +    PRSendFN    send_fn;
> +    PRInt32        n_write;
> +
> +    send_fn = fd->lower->methods->send;
> +    n_write = send_fn(fd->lower, buf, amount, flags, timeout);
> +
> +    return n_write;
> +}
> +
> +static PRStatus
> +pg_ssl_close(PRFileDesc *fd)
> +{
> +    /*
> +     * Disconnect our private Port from the fd before closing out the stack.
> +     * (Debug builds of NSPR will assert if we do not.)
> +     */
> +    fd->secret = NULL;
> +    return PR_GetDefaultIOMethods()->close(fd);
> +}

Regarding these, I find myself wondering how they're different from the
defaults..?  I mean, the above just directly calls
PR_GetDefaultIOMethods() to then call its close() function; are the
fd->lower->methods->recv/send not the default methods?  I don't quite get
what the point is of having our own callbacks here if they just do
exactly what the defaults would do (or are there actually no defined
defaults and you have to provide these..?).

> +/*
> + * ssl_protocol_version_to_nss
> + *            Translate PostgreSQL TLS version to NSS version
> + *
> + * Returns zero in case the requested TLS version is undefined (PG_ANY) and
> + * should be set by the caller, or -1 on failure.
> + */
> +static uint16
> +ssl_protocol_version_to_nss(int v, const char *guc_name)

guc_name isn't actually used in this function..?  Is there some reason
to keep it or is it leftover?

Also, I get that they do similar jobs and that one is in the frontend
and the other is in the backend, but I'm not a fan of having two
'ssl_protocol_version_to_nss()' functions that take different argument
types but have the exact same name and do functionally different things..

> +++ b/src/backend/utils/misc/guc.c
> @@ -4377,6 +4381,18 @@ static struct config_string ConfigureNamesString[] =
>          check_canonical_path, assign_pgstat_temp_directory, NULL
>      },
>
> +#ifdef USE_NSS
> +    {
> +        {"ssl_database", PGC_SIGHUP, CONN_AUTH_SSL,
> +            gettext_noop("Location of the NSS certificate database."),
> +            NULL
> +        },
> +        &ssl_database,
> +        "",
> +        NULL, NULL, NULL
> +    },
> +#endif

We don't #ifdef out the various GUCs even if SSL isn't compiled in, so
it doesn't seem quite right to be doing so here?  Generally speaking,
GUCs that we expect people to use (rather than debugging ones and such)
are typically always built, even if we don't build support for that
capability, so we can throw a better error message than just some ugly
syntax or parsing error if we come across one being set in a non-enabled
build.

> +++ b/src/common/cipher_nss.c
> @@ -0,0 +1,192 @@
> +/*-------------------------------------------------------------------------
> + *
> + * cipher_nss.c
> + *      NSS functionality shared between frontend and backend for working
> + *      with ciphers
> + *
> + * This should only bse used if code is compiled with NSS support.

*be

> +++ b/src/include/libpq/libpq-be.h
> @@ -200,6 +200,10 @@ typedef struct Port
>      SSL           *ssl;
>      X509       *peer;
>  #endif
> +
> +#ifdef USE_NSS
> +    void       *pr_fd;
> +#endif
>  } Port;

Given this is under a #ifdef USE_NSS, does it need to be / should it
really be a void*?

> +++ b/src/interfaces/libpq/fe-connect.c
> @@ -359,6 +359,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
>          "Target-Session-Attrs", "", 15, /* sizeof("prefer-standby") = 15 */
>      offsetof(struct pg_conn, target_session_attrs)},
>
> +    {"cert_database", NULL, NULL, NULL,
> +        "CertificateDatabase", "", 64,
> +    offsetof(struct pg_conn, cert_database)},

I mean, maybe nitpicking here, but all the other SSL stuff is
'sslsomething' and the backend version of this is 'ssl_database', so
wouldn't it be more consistent to have this be 'ssldatabase'?

> +++ b/src/interfaces/libpq/fe-secure-nss.c
> + * This logic exist in NSS as well, but it's only available for when there is

*exists

> +    /*
> +     * The NSPR documentation states that runtime initialization via PR_Init
> +     * is no longer required, as the first caller into NSPR will perform the
> +     * initialization implicitly. See be-secure-nss.c for further discussion
> +     * on PR_Init.
> +     */
> +    PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0);

See same comment I made above- and also there's a comment earlier in
this file that we don't need to PR_Init() even ...

> +    {
> +        conn->nss_context = NSS_InitContext("", "", "", "", &params,
> +                                            NSS_INIT_READONLY | NSS_INIT_NOCERTDB |
> +                                            NSS_INIT_NOMODDB | NSS_INIT_FORCEOPEN |
> +                                            NSS_INIT_NOROOTINIT | NSS_INIT_PK11RELOAD);
> +        if (!conn->nss_context)
> +        {
> +            printfPQExpBuffer(&conn->errorMessage,
> +                              libpq_gettext("unable to create certificate database: %s"),
> +                              pg_SSLerrmessage(PR_GetError()));
> +            return PGRES_POLLING_FAILED;
> +        }
> +    }

That error message seems a bit ... off?  Surely we aren't trying to
actually create a certificate database here?

> +    /*
> +     * Configure cipher policy.
> +     */
> +    status = NSS_SetDomesticPolicy();
> +    if (status != SECSuccess)
> +    {
> +        printfPQExpBuffer(&conn->errorMessage,
> +                          libpq_gettext("unable to configure cipher policy: %s"),
> +                          pg_SSLerrmessage(PR_GetError()));
> +
> +        return PGRES_POLLING_FAILED;
> +    }

Probably good to pull over at least some parts of the comments made in
the backend code about SetDomesticPolicy() actually enabling everything
(just like all the policies apparently do)...

> +    /*
> +     * If we don't have a certificate database, the system trust store is the
> +     * fallback we can use. If we fail to initialize that as well, we can
> +     * still attempt a connection as long as the sslmode isn't verify*.
> +     */
> +    if (!conn->cert_database && conn->sslmode[0] == 'v')
> +    {
> +        status = pg_load_nss_module(&ca_trust, ca_trust_name, "\"Root Certificates\"");
> +        if (status != SECSuccess)
> +        {
> +            printfPQExpBuffer(&conn->errorMessage,
> +                              libpq_gettext("WARNING: unable to load NSS trust module \"%s\" : %s"),
> +                              ca_trust_name,
> +                              pg_SSLerrmessage(PR_GetError()));
> +
> +            return PGRES_POLLING_FAILED;
> +        }
> +    }

Maybe have something a bit more here about "maybe you should specify a
cert_database" or such?

> +    if (conn->ssl_max_protocol_version && strlen(conn->ssl_max_protocol_version) > 0)
> +    {
> +        int            ssl_max_ver = ssl_protocol_version_to_nss(conn->ssl_max_protocol_version);
> +
> +        if (ssl_max_ver == -1)
> +        {
> +            printfPQExpBuffer(&conn->errorMessage,
> +                              libpq_gettext("invalid value \"%s\" for maximum version of SSL protocol\n"),
> +                              conn->ssl_max_protocol_version);
> +            return -1;
> +        }
> +
> +        desired_range.max = ssl_max_ver;
> +    }

In the backend code, we have an additional check to make sure they
didn't set the min version higher than the max.. should we have that
here too?  Either way, seems like we should be consistent.
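
For illustration, the frontend check could look roughly like the
following.  This is a standalone sketch and not code from the patch:
sketch_version_range and sketch_range_is_valid are made-up names, the
struct only mimics the min/max fields of NSS's SSLVersionRange, and
checking against the library-supported range as well is an assumption
about what we'd want.

```c
#include <stdbool.h>

/* Same shape as NSS's SSLVersionRange; redeclared so the sketch is standalone. */
typedef struct
{
    unsigned short min;
    unsigned short max;
} sketch_version_range;

/*
 * Returns true if the requested range is usable: min must not exceed max,
 * and the whole range must lie within what the library reports as supported.
 */
static bool
sketch_range_is_valid(const sketch_version_range *desired,
                      const sketch_version_range *supported)
{
    if (desired->min > desired->max)
        return false;           /* e.g. min=TLSv1.3, max=TLSv1.2 */
    if (desired->min < supported->min || desired->max > supported->max)
        return false;           /* outside what the library can do */
    return true;
}
```

A caller would run this once after both the min and max settings have
been parsed, and report an error on false, in both the frontend and the
backend.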

> +     * The model can now we closed as we've applied the settings of the model

*be

> +     * onto the real socket. From hereon we should only use conn->pr_fd.

*here on

Similar comments to the backend code- should we just always use
conn->pr_fd?  Or should we rename pr_fd to something else?

> +    /*
> +     * Specify which hostname we are expecting to talk to. This is required,
> +     * albeit mostly applies to when opening a connection to a traditional
> +     * http server it seems.
> +     */
> +    SSL_SetURL(conn->pr_fd, (conn->connhost[conn->whichhost]).host);

We should probably also set SNI, if available (NSS 3.12.6 it seems?),
since it looks like that's going to be added to the OpenSSL code.

> +    do
> +    {
> +        status = SSL_ForceHandshake(conn->pr_fd);
> +    }
> +    while (status != SECSuccess && PR_GetError() == PR_WOULD_BLOCK_ERROR);

We don't seem to have this loop in the backend code..  Is there some
reason that we don't?  Is it possible that we need to have a loop here
too?  I recall in the GSS encryption code there were definitely things
during setup that had to be looped back over on both sides to make sure
everything was finished ...

> +    if (conn->sslmode[0] == 'v')
> +        return SECFailure;

Seems a bit grotty to do this (though I see that the OpenSSL code does
too ... at least there we have a comment though, maybe add one here?).
I would have thought we'd actually do strcmp()'s like above.
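
Something along these lines is what I had in mind for the strcmp()
variant.  It's only a sketch; the helper name sslmode_verifies_peer is
hypothetical and doesn't exist in libpq or the patch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true for the sslmodes that require certificate verification.
 * Spelling the modes out avoids relying on 'v' being a unique first letter.
 */
static bool
sslmode_verifies_peer(const char *sslmode)
{
    if (sslmode == NULL)
        return false;

    return strcmp(sslmode, "verify-ca") == 0 ||
        strcmp(sslmode, "verify-full") == 0;
}
```

The callback would then test sslmode_verifies_peer(conn->sslmode)
instead of conn->sslmode[0] == 'v'.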

> +    /*
> +     * Return the underlying PRFileDesc which can be used to access
> +     * information on the connection details. There is no SSL context per se.
> +     */
> +    if (strcmp(struct_name, "NSS") == 0)
> +        return conn->pr_fd;
> +    return NULL;
> +}

Is there never a reason someone might want the pointer returned by
NSS_InitContext?  I don't know that there is but it might be something
to consider (we could even possibly have our own structure returned by
this function which includes both, maybe..?).  Not sure if there's a
sensible use-case for that or not; I just wanted to bring it up as it's
something I asked myself while reading through this patch.

> +    if (strcmp(attribute_name, "protocol") == 0)
> +    {
> +        switch (channel.protocolVersion)
> +        {
> +#ifdef SSL_LIBRARY_VERSION_TLS_1_3
> +            case SSL_LIBRARY_VERSION_TLS_1_3:
> +                return "TLSv1.3";
> +#endif
> +#ifdef SSL_LIBRARY_VERSION_TLS_1_2
> +            case SSL_LIBRARY_VERSION_TLS_1_2:
> +                return "TLSv1.2";
> +#endif
> +#ifdef SSL_LIBRARY_VERSION_TLS_1_1
> +            case SSL_LIBRARY_VERSION_TLS_1_1:
> +                return "TLSv1.1";
> +#endif
> +            case SSL_LIBRARY_VERSION_TLS_1_0:
> +                return "TLSv1.0";
> +            default:
> +                return "unknown";
> +        }
> +    }

Not sure that it really matters, but this seems like it might be useful
to have as its own function...  Maybe even a data structure that both
functions use just in opposite directions.  Really minor tho. :)
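
A rough sketch of the table idea, to show what I mean.  All names here
(tls_version_map, tls_version_to_name, tls_name_to_version and the
SKETCH_* macros) are hypothetical; the macros stand in for the NSS
SSL_LIBRARY_VERSION_* constants so that this compiles without the NSS
headers, and the strings follow the switch quoted above rather than
whatever spelling the connection parameters end up accepting.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for SSL_LIBRARY_VERSION_TLS_1_x; real code would use the NSS macros. */
#define SKETCH_TLS_1_0 0x0301
#define SKETCH_TLS_1_1 0x0302
#define SKETCH_TLS_1_2 0x0303
#define SKETCH_TLS_1_3 0x0304

typedef struct
{
    const char *name;           /* name shown to the user */
    unsigned short version;     /* library protocol version code */
} tls_version_map;

static const tls_version_map tls_versions[] = {
    {"TLSv1.0", SKETCH_TLS_1_0},
    {"TLSv1.1", SKETCH_TLS_1_1},
    {"TLSv1.2", SKETCH_TLS_1_2},
    {"TLSv1.3", SKETCH_TLS_1_3},
};

#define N_TLS_VERSIONS (sizeof(tls_versions) / sizeof(tls_versions[0]))

/* Version code to name, "unknown" if the code is not in the table. */
static const char *
tls_version_to_name(unsigned short version)
{
    for (size_t i = 0; i < N_TLS_VERSIONS; i++)
    {
        if (tls_versions[i].version == version)
            return tls_versions[i].name;
    }
    return "unknown";
}

/* Name to version code, -1 if the name is not in the table. */
static int
tls_name_to_version(const char *name)
{
    for (size_t i = 0; i < N_TLS_VERSIONS; i++)
    {
        if (strcmp(tls_versions[i].name, name) == 0)
            return tls_versions[i].version;
    }
    return -1;
}
```

With something like this, the attribute lookup and the protocol version
parsing share one list, so adding a new TLS version is a one-line change.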

> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
> index c601071838..7f10da3010 100644
> --- a/src/interfaces/libpq/fe-secure.c
> +++ b/src/interfaces/libpq/fe-secure.c
> @@ -448,6 +448,27 @@ PQdefaultSSLKeyPassHook_OpenSSL(char *buf, int size, PGconn *conn)
>  }
>  #endif                            /* USE_OPENSSL */
>
> +#ifndef USE_NSS
> +
> +PQsslKeyPassHook_nss_type
> +PQgetSSLKeyPassHook_nss(void)
> +{
> +    return NULL;
> +}
> +
> +void
> +PQsetSSLKeyPassHook_nss(PQsslKeyPassHook_nss_type hook)
> +{
> +    return;
> +}
> +
> +char *
> +PQdefaultSSLKeyPassHook_nss(PK11SlotInfo * slot, PRBool retry, void *arg)
> +{
> +    return NULL;
> +}
> +#endif                            /* USE_NSS */

Isn't this '!USE_NSS'?

> diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
> index 0c9e95f1a7..f15af39222 100644
> --- a/src/interfaces/libpq/libpq-int.h
> +++ b/src/interfaces/libpq/libpq-int.h
> @@ -383,6 +383,7 @@ struct pg_conn
>      char       *sslrootcert;    /* root certificate filename */
>      char       *sslcrl;            /* certificate revocation list filename */
>      char       *sslcrldir;        /* certificate revocation list directory name */
> +    char       *cert_database;    /* NSS certificate/key database */
>      char       *requirepeer;    /* required peer credentials for local sockets */
>      char       *gssencmode;        /* GSS mode (require,prefer,disable) */
>      char       *krbsrvname;        /* Kerberos service name */
> @@ -507,6 +508,28 @@ struct pg_conn
>                                   * OpenSSL version changes */
>  #endif
>  #endif                            /* USE_OPENSSL */
> +
> +/*
> + * The NSS/NSPR specific types aren't used to avoid pulling in the required
> + * headers here, as they are causing conflicts with PG definitions.
> + */

I'm a bit confused- what are the conflicts being caused here..?
Certainly under USE_OPENSSL we use the actual OpenSSL types..

> Subject: [PATCH v30 2/9] Refactor SSL testharness for multiple library
>
> The SSL testharness was fully tied to OpenSSL in the way the server was
> set up and reconfigured. This refactors the SSLServer module into a SSL
> library agnostic SSL/Server module which in turn use SSL/Backend/<lib>
> modules for the implementation details.
>
> No changes are done to the actual tests, this only change how setup and
> teardown is performed.

Presumably this could be committed ahead of the main NSS support?

> Subject: [PATCH v30 4/9] nss: pg_strong_random support
> +++ b/src/port/pg_strong_random.c
> +bool
> +pg_strong_random(void *buf, size_t len)
> +{
> +    NSSInitParameters params;
> +    NSSInitContext *nss_context;
> +    SECStatus    status;
> +
> +    memset(&params, 0, sizeof(params));
> +    params.length = sizeof(params);
> +    nss_context = NSS_InitContext("", "", "", "", &params,
> +                                  NSS_INIT_READONLY | NSS_INIT_NOCERTDB |
> +                                  NSS_INIT_NOMODDB | NSS_INIT_FORCEOPEN |
> +                                  NSS_INIT_NOROOTINIT | NSS_INIT_PK11RELOAD);
> +
> +    if (!nss_context)
> +        return false;
> +
> +    status = PK11_GenerateRandom(buf, len);
> +    NSS_ShutdownContext(nss_context);
> +
> +    if (status == SECSuccess)
> +        return true;
> +
> +    return false;
> +}
> +
> +#else                            /* not USE_OPENSSL, USE_NSS or WIN32 */

I don't know that it's an issue, but do we actually need to init the NSS
context and shut it down every time..?

>  /*
>   * Without OpenSSL or Win32 support, just read /dev/urandom ourselves.

*or NSS

> Subject: [PATCH v30 5/9] nss: Documentation
> +++ b/doc/src/sgml/acronyms.sgml
> @@ -684,6 +717,16 @@
>      </listitem>
>     </varlistentry>
>
> +   <varlistentry>
> +    <term><acronym>TLS</acronym></term>
> +    <listitem>
> +     <para>
> +      <ulink url="https://en.wikipedia.org/wiki/Transport_Layer_Security">
> +      Transport Layer Security</ulink>
> +     </para>
> +    </listitem>
> +   </varlistentry>

We don't have this already..?  Surely we should..

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index 967de73596..1608e9a7c7 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -1272,6 +1272,23 @@ include_dir 'conf.d'
>        </listitem>
>       </varlistentry>
>
> +     <varlistentry id="guc-ssl-database" xreflabel="ssl_database">
> +      <term><varname>ssl_database</varname> (<type>string</type>)
> +      <indexterm>
> +       <primary><varname>ssl_database</varname> configuration parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        Specifies the name of the file containing the server certificates and
> +        keys when using <productname>NSS</productname> for <acronym>SSL</acronym>
> +        connections. This parameter can only be set in the
> +        <filename>postgresql.conf</filename> file or on the server command
> +        line.

*SSL/TLS maybe?

> @@ -1288,7 +1305,9 @@ include_dir 'conf.d'
>          connections using TLS version 1.2 and lower are affected.  There is
>          currently no setting that controls the cipher choices used by TLS
>          version 1.3 connections.  The default value is
> -        <literal>HIGH:MEDIUM:+3DES:!aNULL</literal>.  The default is usually a
> +        <literal>HIGH:MEDIUM:+3DES:!aNULL</literal> for servers which have
> +        been built with <productname>OpenSSL</productname> as the
> +        <acronym>SSL</acronym> library.  The default is usually a
>          reasonable choice unless you have specific security requirements.
>         </para>

Shouldn't we say something here wrt NSS?

> @@ -1490,8 +1509,11 @@ include_dir 'conf.d'
>         <para>
>          Sets an external command to be invoked when a passphrase for
>          decrypting an SSL file such as a private key needs to be obtained.  By
> -        default, this parameter is empty, which means the built-in prompting
> -        mechanism is used.
> +        default, this parameter is empty. When the server is using
> +        <productname>OpenSSL</productname>, this means the built-in prompting
> +        mechanism is used. When using <productname>NSS</productname>, there is
> +        no default prompting so a blank callback will be used returning an
> +        empty password.
>         </para>

Maybe we should point out here that this requires the database to not
require a password..?  So if they have one, they need to set this, or
maybe we should provide a default one..

> +++ b/doc/src/sgml/libpq.sgml
> +<synopsis>
> +PQsslKeyPassHook_nss_type PQgetSSLKeyPassHook_nss(void);
> +</synopsis>
> +      </para>
> +
> +      <para>
> +        <function>PQgetSSLKeyPassHook_nss</function> has no effect unless the
> +        server was compiled with <productname>nss</productname> support.
> +      </para>

We should try to be consistent- above should be NSS, not nss.

> +         <listitem>
> +          <para>
> +           <productname>NSS</productname>: specifying the parameter is required
> +           in case any password protected items are referenced in the
> +           <productname>NSS</productname> database, or if the database itself
> +           is password protected.  If multiple different objects are password
> +           protected, the same password is used for all.
> +          </para>
> +         </listitem>
> +        </itemizedlist>

Is this a statement about NSS databases (which I don't think it is) or
about the fact that we'll just use the password provided for all
attempts to decrypt something we need in the database?  Assuming the
latter, seems like we could reword this to be a bit more clear.

Maybe:

All attempts to decrypt objects which are password protected in the
database will use this password.

?

> @@ -2620,9 +2791,14 @@ void *PQsslStruct(const PGconn *conn, const char *struct_name);
> +       For <productname>NSS</productname>, there is one struct available under
> +       the name "NSS", and it returns a pointer to the
> +       <productname>NSS</productname> <literal>PRFileDesc</literal>.

... SSL PRFileDesc associated with the connection, no?

> +++ b/doc/src/sgml/runtime.sgml
> @@ -2552,6 +2583,89 @@ openssl x509 -req -in server.csr -text -days 365 \
>     </para>
>    </sect2>
>
> +  <sect2 id="nss-certificate-database">
> +   <title>NSS Certificate Databases</title>
> +
> +   <para>
> +    When using <productname>NSS</productname>, all certificates and keys must
> +    be loaded into an <productname>NSS</productname> certificate database.
> +   </para>
> +
> +   <para>
> +    To create a new <productname>NSS</productname> certificate database and
> +    load the certificates created in <xref linkend="ssl-certificate-creation" />,
> +    use the following <productname>NSS</productname> commands:
> +<programlisting>
> +certutil -d "sql:server.db" -N --empty-password
> +certutil -d "sql:server.db" -A -n server.crt -i server.crt -t "CT,C,C"
> +certutil -d "sql:server.db" -A -n root.crt -i root.crt -t "CT,C,C"
> +</programlisting>
> +    This will give the certificate the filename as the nickname identifier in
> +    the database which is created as <filename>server.db</filename>.
> +   </para>
> +   <para>
> +    Then load the server key, which require converting it to

*requires

> Subject: [PATCH v30 6/9] nss: Support NSS in pgcrypto
> +++ b/doc/src/sgml/pgcrypto.sgml
>        <row>
>         <entry>Blowfish</entry>
>         <entry>yes</entry>
>         <entry>yes</entry>
> +       <entry>yes</entry>
>        </row>

Maybe this should mention that it's with the built-in implementation as
blowfish isn't available from NSS?

>        <row>
>         <entry>DES/3DES/CAST5</entry>
>         <entry>no</entry>
>         <entry>yes</entry>
> +       <entry>yes</entry>
> +      </row>

Surely CAST5 from the above should be removed, since it's given its own
entry now?

> @@ -1241,7 +1260,8 @@ gen_random_uuid() returns uuid
>     <orderedlist>
>      <listitem>
>       <para>
> -      Any digest algorithm <productname>OpenSSL</productname> supports
> +      Any digest algorithm <productname>OpenSSL</productname> and
> +      <productname>NSS</productname> supports
>        is automatically picked up.

*or?  Maybe something more specific though- "Any digest algorithm
included with the library that PostgreSQL is compiled with is
automatically picked up." ?

> Subject: [PATCH v30 7/9] nss: Support NSS in sslinfo
>
> Since sslinfo to a large extent use the be_tls_* API this mostly

*uses

> Subject: [PATCH v30 8/9] nss: Support NSS in cryptohash
> +++ b/src/common/cryptohash_nss.c
> +    /*
> +     * Initialize our own NSS context without a database backing it.
> +     */
> +    memset(&params, 0, sizeof(params));
> +    params.length = sizeof(params);
> +    status = NSS_NoDB_Init(".");

We take some pains to use NSS_InitContext elsewhere..  Are we sure that
we should be using NSS_NoDB_Init here..?

Just a, well, not so quick read-through.  Generally it's looking pretty
good to me.  Will see about playing with it this week.

Thanks!

Stephen

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 22 Mar 2021, at 00:49, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,

Thanks for the review!  Below is a partial response; I haven't had time to
address all your review comments yet, but I wanted to submit a rebased patchset
right away since the current version doesn't work after recent changes in the
tree.  I will address the remaining comments tomorrow or the day after.

This rebase also includes a fix for pgtls_init which was sent offlist by Jacob.
The changes in pgtls_init can potentially be used to initialize the crypto
context for NSS to clean up this patch; Jacob is currently looking at that.

>> Subject: [PATCH v30 1/9] nss: Support libnss as TLS library in libpq
>>
>> This commit contains the frontend and backend portion of TLS support
>> in libpq to allow encrypted connections. The implementation is done
>
> maybe add 'using NSS' to that first sentence. ;)

Fixed.

>> +++ b/src/backend/libpq/auth.c
>> @@ -2849,7 +2849,14 @@ CheckCertAuth(Port *port)
>> {
>>     int            status_check_usermap = STATUS_ERROR;
>>
>> +#if defined(USE_OPENSSL)
>>     Assert(port->ssl);
>> +#elif defined(USE_NSS)
>> +    /* TODO: should we rename pr_fd to ssl, to keep consistency? */
>> +    Assert(port->pr_fd);
>> +#else
>> +    Assert(false);
>> +#endif
>
> Having thought about this TODO item for a bit, I tend to think it's
> better to keep them distinct.

I agree, which is why the TODO comment was there in the first place.  I've
removed the comment now.

> They aren't the same and it might not be
> clear what's going on if one was to somehow mix them (at least if pr_fd
> continues to sometimes be a void*, but I wonder why that's being
> done..?  more on that later..).

To paraphrase from later in this email, there are collisions between nspr and
postgres on things like BITS_PER_BYTE, and there were also collisions on basic
types until I learned about NO_NSPR_10_SUPPORT.  By moving the juggling of this
into common/nss.h we can use proper types without introducing that pollution
everywhere. I will address these places.

>> +++ b/src/backend/libpq/be-secure-nss.c
> [...]
>> +/* default init hook can be overridden by a shared library */
>> +static void default_nss_tls_init(bool isServerStart);
>> +nss_tls_init_hook_type nss_tls_init_hook = default_nss_tls_init;
>
>> +static PRDescIdentity pr_id;
>> +
>> +static PRIOMethods pr_iomethods;
>
> Happy to be told I'm missing something, but the above two variables seem
> to only be used in init_iolayer.. is there a reason they're declared
> here instead of just being declared in that function?

They must be there since NSPR doesn't copy these but keeps references to them.
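
To illustrate the lifetime issue with a standalone sketch (the sketch_*
names are made up and this is not NSPR code; it only mimics a layer that
stores a pointer to its method table rather than a copy of it):

```c
#include <stddef.h>

typedef int (*sketch_read_fn) (void);

typedef struct
{
    sketch_read_fn read;
} sketch_methods;

typedef struct
{
    const sketch_methods *methods;  /* pointer into caller-owned storage */
} sketch_layer;

/* File scope, so the table is still valid after the init function returns. */
static sketch_methods layer_methods;

static int
sketch_read(void)
{
    return 42;
}

static void
sketch_init_layer(sketch_layer *layer)
{
    layer_methods.read = sketch_read;
    layer->methods = &layer_methods;    /* stored by reference, not copied */
}

/* Called long after init; this dereferences the stored pointer. */
static int
sketch_use_layer(const sketch_layer *layer)
{
    return layer->methods->read();
}
```

Had layer_methods been a local variable in sketch_init_layer, the
pointer kept in the layer would dangle as soon as the function returned.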

>> +    /*
>> +     * Set the fallback versions for the TLS protocol version range to a
>> +     * combination of our minimal requirement and the library maximum. Error
>> +     * messages should be kept identical to those in be-secure-openssl.c to
>> +     * make translations easier.
>> +     */
>
> Should we pull these error messages out into another header so that
> they're in one place to make sure they're kept consistent, if we really
> want to put the effort in to keep them the same..?  I'm not 100% sure
> that it's actually necessary to do so, but defining these in one place
> would help maintain this if we want to.  Also alright with just keeping
> the comment, not that big of a deal.

It might make sense to pull them into common/nss.h, but seeing the error
message right there when reading the code does IMO make it clearer so it's a
double-edged sword.  Not sure what is the best option, but I'm not married to
the current solution so if there is consensus to pull them out somewhere I'm
happy to do so.

>> +int
>> +be_tls_open_server(Port *port)
>> +{
>> +    SECStatus    status;
>> +    PRFileDesc *model;
>> +    PRFileDesc *pr_fd;
>
> pr_fd here is materially different from port->pr_fd, no?  As in, one is
> the NSS raw TCP fd while the other is the SSL fd, right?  Maybe we
> should use two different variable names to try and make sure they don't
> get confused?  Might even set this to NULL after we are done with it
> too..  Then again, I see later on that when we do the dance with the
> 'model' PRFileDesc that we just use the same variable- maybe we should
> do that?  That is, just get rid of this 'pr_fd' and use port->pr_fd
> always?

Hmm, I think you're right. I will try that for the next patchset version.

>> +    /*
>> +     * The NSPR documentation states that runtime initialization via PR_Init
>> +     * is no longer required, as the first caller into NSPR will perform the
>> +     * initialization implicitly. The documentation doesn't however clarify
>> +     * from which version this is holds true, so let's perform the potentially
>> +     * superfluous initialization anyways to avoid crashing on older versions
>> +     * of NSPR, as there is no difference in overhead.  The NSS documentation
>> +     * still states that PR_Init must be called in some way (implicitly or
>> +     * explicitly).
>> +     *
>> +     * The below parameters are what the implicit initialization would've done
>> +     * for us, and should work even for older versions where it might not be
>> +     * done automatically. The last parameter, maxPTDs, is set to various
>> +     * values in other codebases, but has been unused since NSPR 2.1 which was
>> +     * released sometime in 1998. In current versions of NSPR all parameters
>> +     * are ignored.
>> +     */
>> +    PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ );
>> +
>> +    /*
>> +     * The certificate path (configdir) must contain a valid NSS database. If
>> +     * the certificate path isn't a valid directory, NSS will fall back on the
>> +     * system certificate database. If the certificate path is a directory but
>> +     * is empty then the initialization will fail. On the client side this can
>> +     * be allowed for any sslmode but the verify-xxx ones.
>> +     * https://bugzilla.redhat.com/show_bug.cgi?id=728562 For the server side
>> +     * we won't allow this to fail however, as we require the certificate and
>> +     * key to exist.
>> +     *
>> +     * The original design of NSS was for a single application to use a single
>> +     * copy of it, initialized with NSS_Initialize() which isn't returning any
>> +     * handle with which to refer to NSS. NSS initialization and shutdown are
>> +     * global for the application, so a shutdown in another NSS enabled
>> +     * library would cause NSS to be stopped for libpq as well.  The fix has
>> +     * been to introduce NSS_InitContext which returns a context handle to
>> +     * pass to NSS_ShutdownContext.  NSS_InitContext was introduced in NSS
>> +     * 3.12, but the use of it is not very well documented.
>> +     * https://bugzilla.redhat.com/show_bug.cgi?id=738456
>
> The above seems to indicate that we will be requiring at least 3.12,
> right?  Yet above we have code to work with NSPR versions before 2.1?

Well, not really.  The comment tries to explain the rationale for the
parameters passed.  Clearly the comment could be improved to make that point
clearer.

> Maybe we should put a stake in the ground that says "we only support
> back to version X of NSS", test with that and a few more recent versions
> and the most recent, and then rip out anything that's needed for
> versions which are older than that?

Yes, right now there is very little in the patch which caters for old versions,
the PR_Init call might be one of the few offenders.  There has been discussion
upthread about settling for a required version, combining the insights learned
there with a survey of which versions are commonly available packaged.

Once we settle on a version we can confirm if PR_Init is/isn't needed and
remove all traces of it if not.

> I have a pretty hard time imagining that someone is going to want to build PG
> v14 w/ NSS 2.0 ...


Let alone compiling 2.0 at all on a recent system..

>> +    {
>> +        char       *ciphers,
>> +                   *c;
>> +
>> +        char       *sep = ":;, ";
>> +        PRUint16    ciphercode;
>> +        const        PRUint16 *nss_ciphers;
>> +
>> +        /*
>> +         * If the user has specified a set of preferred cipher suites we start
>> +         * by turning off all the existing suites to avoid the risk of down-
>> +         * grades to a weaker cipher than expected.
>> +         */
>> +        nss_ciphers = SSL_GetImplementedCiphers();
>> +        for (int i = 0; i < SSL_GetNumImplementedCiphers(); i++)
>> +            SSL_CipherPrefSet(model, nss_ciphers[i], PR_FALSE);
>> +
>> +        ciphers = pstrdup(SSLCipherSuites);
>> +
>> +        for (c = strtok(ciphers, sep); c; c = strtok(NULL, sep))
>> +        {
>> +            if (!pg_find_cipher(c, &ciphercode))
>> +            {
>> +                status = SSL_CipherPrefSet(model, ciphercode, PR_TRUE);
>> +                if (status != SECSuccess)
>> +                {
>> +                    ereport(COMMERROR,
>> +                            (errmsg("invalid cipher-suite specified: %s", c)));
>> +                    return -1;
>> +                }
>> +            }
>> +        }
>
> Maybe I'm a bit confused, but doesn't pg_find_cipher return *true* when
> a cipher is found, and therefore the '!' above is saying "if we don't
> find a matching cipher, then run the code to set the cipher ...".

Hmm, yes, that's broken. Fixed.

> Also- we don't seem to complain at all about a cipher being specified that we
> don't find?  Guess I would think that we might want to throw a WARNING in such
> a case, but I could possibly be convinced otherwise.


No, I think you're right, we should throw WARNING there or possibly even a
higher level. Should that be a COMMERROR even?

> Kind of wonder just what happens with the current code, I'm guessing ciphercode
> is zero and therefore doesn't complain but also doesn't do what we want.  I
> wonder if there's a way to test this?


We could extend the test suite to set ciphers in postgresql.conf, I'll give it
a go.

> I do think we should probably throw an error if we end up with *no*
> ciphers being set, which doesn't seem to be happening here..?

Yeah, that should be a COMMERROR. Fixed.
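
As a rough standalone sketch of the intended behavior (not the actual
patch code): sketch_find_cipher stands in for pg_find_cipher with a
two-entry table, the point where the patch would call
SSL_CipherPrefSet() is only marked with a comment, and the warning goes
to stderr instead of ereport().  All sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for pg_find_cipher(): returns true and sets *code when found. */
static bool
sketch_find_cipher(const char *name, unsigned short *code)
{
    static const struct
    {
        const char *name;
        unsigned short code;
    }           known[] = {
        {"TLS_AES_128_GCM_SHA256", 0x1301},
        {"TLS_AES_256_GCM_SHA384", 0x1302},
    };

    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
    {
        if (strcmp(known[i].name, name) == 0)
        {
            *code = known[i].code;
            return true;
        }
    }
    return false;
}

/*
 * Returns the number of suites enabled, or -1 if none could be enabled
 * (the case that should be a hard error in the backend).
 */
static int
sketch_enable_ciphers(const char *suites)
{
    const char *sep = ":;, ";
    char       *copy;
    char       *c;
    int         enabled = 0;

    copy = malloc(strlen(suites) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, suites);       /* strtok modifies its input */

    for (c = strtok(copy, sep); c != NULL; c = strtok(NULL, sep))
    {
        unsigned short code;

        if (!sketch_find_cipher(c, &code))
        {
            fprintf(stderr, "WARNING: unknown cipher suite \"%s\" ignored\n", c);
            continue;
        }

        /* The patch would call SSL_CipherPrefSet(model, code, PR_TRUE) here. */
        (void) code;
        enabled++;
    }

    free(copy);
    return enabled > 0 ? enabled : -1;
}
```

A test which sets ssl_ciphers in postgresql.conf to a mix of valid and
bogus names should then see the valid ones enabled, and an all-bogus
list rejected.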

>> +    /*
>> +     * Set up the custom IO layer.
>> +     */
>
> Might be good to mention that the IO Layer is what sets up the
> read/write callbacks to be used.

Good point, will do in the next version of the patchset.

>> +    port->pr_fd = SSL_ImportFD(model, pr_fd);
>> +    if (!port->pr_fd)
>> +    {
>> +        ereport(COMMERROR,
>> +                (errmsg("unable to initialize")));
>> +        return -1;
>> +    }
>
> Maybe a comment and a better error message for this?

Will do.

>
>> +    PR_Close(model);
>
> This might deserve one also, the whole 'model' construct is a bit
> different. :)

Agreed, will do.

>> +    port->ssl_in_use = true;
>> +
>> +    /* Register out shutdown callback */
>
> *our

Fixed.

>> +int
>> +be_tls_get_cipher_bits(Port *port)
>> +{
>> +    SECStatus    status;
>> +    SSLChannelInfo channel;
>> +    SSLCipherSuiteInfo suite;
>> +
>> +    status = SSL_GetChannelInfo(port->pr_fd, &channel, sizeof(channel));
>> +    if (status != SECSuccess)
>> +        goto error;
>> +
>> +    status = SSL_GetCipherSuiteInfo(channel.cipherSuite, &suite, sizeof(suite));
>> +    if (status != SECSuccess)
>> +        goto error;
>> +
>> +    return suite.effectiveKeyBits;
>> +
>> +error:
>> +    ereport(WARNING,
>> +            (errmsg("unable to extract TLS session information: %s",
>> +                    pg_SSLerrmessage(PR_GetError()))));
>> +    return 0;
>> +}
>
> It doesn't have to be much, but I, at least, do prefer to see
> function-header comments. :)  Not that the OpenSSL code has them
> consistently, so obviously not that big of a deal.  Goes for a number of
> the functions being added.

No disagreement from me, I've added comments on a few more functions and will
continue to go over the patchset to add them everywhere.  Some of these
comments are pretty uninteresting and could do with some wordsmithing.

>> +            /* Found a CN, ecode and copy it into a newly allocated buffer */
>
> *decode

Fixed.

>> +static PRInt32
>> +pg_ssl_read(PRFileDesc *fd, void *buf, PRInt32 amount, PRIntn flags,
>> +            PRIntervalTime timeout)
>> +{
>> +    PRRecvFN    read_fn;
>> +    PRInt32        n_read;
>> +
>> +    read_fn = fd->lower->methods->recv;
>> +    n_read = read_fn(fd->lower, buf, amount, flags, timeout);
>> +
>> +    return n_read;
>> +}
>> +
>> +static PRInt32
>> +pg_ssl_write(PRFileDesc *fd, const void *buf, PRInt32 amount, PRIntn flags,
>> +             PRIntervalTime timeout)
>> +{
>> +    PRSendFN    send_fn;
>> +    PRInt32        n_write;
>> +
>> +    send_fn = fd->lower->methods->send;
>> +    n_write = send_fn(fd->lower, buf, amount, flags, timeout);
>> +
>> +    return n_write;
>> +}
>> +
>> +static PRStatus
>> +pg_ssl_close(PRFileDesc *fd)
>> +{
>> +    /*
>> +     * Disconnect our private Port from the fd before closing out the stack.
>> +     * (Debug builds of NSPR will assert if we do not.)
>> +     */
>> +    fd->secret = NULL;
>> +    return PR_GetDefaultIOMethods()->close(fd);
>> +}
>
> Regarding these, I find myself wondering how they're different from the
> defaults..?  I mean, the above just directly called
> PR_GetDefaultIOMethods() to then call its close() function- are the
> fd->lower->methods->recv/send not the default methods?  I don't quite get
> what the point is from having our own callbacks here if they just do
> exactly what the defaults would do (or are there actually no defined
> defaults and you have to provide these..?).

It's really just to cope with debug builds of NSPR which assert that fd->secret
is null before closing.

>> +/*
>> + * ssl_protocol_version_to_nss
>> + *            Translate PostgreSQL TLS version to NSS version
>> + *
>> + * Returns zero in case the requested TLS version is undefined (PG_ANY) and
>> + * should be set by the caller, or -1 on failure.
>> + */
>> +static uint16
>> +ssl_protocol_version_to_nss(int v, const char *guc_name)
>
> guc_name isn't actually used in this function..?  Is there some reason
> to keep it or is it leftover?

It's a leftover from when the function was doing error reporting, fixed.

> Also, I get that they do similar jobs and that one is in the frontend
> and the other is in the backend, but I'm not a fan of having two
> 'ssl_protocol_version_to_nss()'s functions that take different argument
> types but have exact same name and do functionally different things..

Good point, I'll change that.

>> +++ b/src/backend/utils/misc/guc.c
>> @@ -4377,6 +4381,18 @@ static struct config_string ConfigureNamesString[] =
>>         check_canonical_path, assign_pgstat_temp_directory, NULL
>>     },
>>
>> +#ifdef USE_NSS
>> +    {
>> +        {"ssl_database", PGC_SIGHUP, CONN_AUTH_SSL,
>> +            gettext_noop("Location of the NSS certificate database."),
>> +            NULL
>> +        },
>> +        &ssl_database,
>> +        "",
>> +        NULL, NULL, NULL
>> +    },
>> +#endif
>
> We don't #ifdef out the various GUCs even if SSL isn't compiled in, so
> it doesn't seem quite right to be doing so here?  Generally speaking,
> GUCs that we expect people to use (rather than debugging ones and such)
> are typically always built, even if we don't build support for that
> capability, so we can throw a better error message than just some ugly
> syntax or parsing error if we come across one being set in a non-enabled
> build.

Of course, fixed.

>> +++ b/src/common/cipher_nss.c
>> @@ -0,0 +1,192 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * cipher_nss.c
>> + *      NSS functionality shared between frontend and backend for working
>> + *      with ciphers
>> + *
>> + * This should only bse used if code is compiled with NSS support.
>
> *be

Fixed.

>> +++ b/src/include/libpq/libpq-be.h
>> @@ -200,6 +200,10 @@ typedef struct Port
>>     SSL           *ssl;
>>     X509       *peer;
>> #endif
>> +
>> +#ifdef USE_NSS
>> +    void       *pr_fd;
>> +#endif
>> } Port;
>
> Given this is under a #ifdef USE_NSS, does it need to be / should it
> really be a void*?

It's to avoid the same BITS_PER_BYTE collision discussed elsewhere in this
email.

>> +++ b/src/interfaces/libpq/fe-connect.c
>> @@ -359,6 +359,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
>>         "Target-Session-Attrs", "", 15, /* sizeof("prefer-standby") = 15 */
>>     offsetof(struct pg_conn, target_session_attrs)},
>>
>> +    {"cert_database", NULL, NULL, NULL,
>> +        "CertificateDatabase", "", 64,
>> +    offsetof(struct pg_conn, cert_database)},
>
> I mean, maybe nitpicking here, but all the other SSL stuff is
> 'sslsomething' and the backend version of this is 'ssl_database', so
> wouldn't it be more consistent to have this be 'ssldatabase'?

That's a good point, I was clearly Stockholm syndromed since I hadn't reflected
on that but it's clearly wrong.  Will fix.

>> +++ b/src/interfaces/libpq/fe-secure-nss.c
>> + * This logic exist in NSS as well, but it's only available for when there is
>
> *exists

Fixed.

>> +    /*
>> +     * The NSPR documentation states that runtime initialization via PR_Init
>> +     * is no longer required, as the first caller into NSPR will perform the
>> +     * initialization implicitly. See be-secure-nss.c for further discussion
>> +     * on PR_Init.
>> +     */
>> +    PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0);
>
> See same comment I made above- and also there's a comment earlier in
> this file that we don't need to PR_Init() even ...

Right, once we can confirm that the minimum required versions are past the
PR_Init dependency then we should remove all of these calls.  If we can't
remove the calls, the comments should be updated to reflect why they are there.

>> +    {
>> +        conn->nss_context = NSS_InitContext("", "", "", "", &params,
>> +                                            NSS_INIT_READONLY | NSS_INIT_NOCERTDB |
>> +                                            NSS_INIT_NOMODDB | NSS_INIT_FORCEOPEN |
>> +                                            NSS_INIT_NOROOTINIT | NSS_INIT_PK11RELOAD);
>> +        if (!conn->nss_context)
>> +        {
>> +            printfPQExpBuffer(&conn->errorMessage,
>> +                              libpq_gettext("unable to create certificate database: %s"),
>> +                              pg_SSLerrmessage(PR_GetError()));
>> +            return PGRES_POLLING_FAILED;
>> +        }
>> +    }
>
> That error message seems a bit ... off?  Surely we aren't trying to
> actually create a certificate database here?

Not really no, it does set up a transient database structure for the duration
of the connection AFAIK but that's clearly not the level of detail we should be
giving users.  I've reworded to indicate that NSS init failed, and ideally the
pg_SSLerrmessage call will provide appropriate detail.

>> +    /*
>> +     * Configure cipher policy.
>> +     */
>> +    status = NSS_SetDomesticPolicy();
>> +    if (status != SECSuccess)
>> +    {
>> +        printfPQExpBuffer(&conn->errorMessage,
>> +                          libpq_gettext("unable to configure cipher policy: %s"),
>> +                          pg_SSLerrmessage(PR_GetError()));
>> +
>> +        return PGRES_POLLING_FAILED;
>> +    }
>
> Probably good to pull over at least some parts of the comments made in
> the backend code about SetDomesticPolicy() actually enabling everything
> (just like all the policies apparently do)...

Good point, will do.

>> +    /*
>> +     * If we don't have a certificate database, the system trust store is the
>> +     * fallback we can use. If we fail to initialize that as well, we can
>> +     * still attempt a connection as long as the sslmode isn't verify*.
>> +     */
>> +    if (!conn->cert_database && conn->sslmode[0] == 'v')
>> +    {
>> +        status = pg_load_nss_module(&ca_trust, ca_trust_name, "\"Root Certificates\"");
>> +        if (status != SECSuccess)
>> +        {
>> +            printfPQExpBuffer(&conn->errorMessage,
>> +                              libpq_gettext("WARNING: unable to load NSS trust module \"%s\" : %s"),
>> +                              ca_trust_name,
>> +                              pg_SSLerrmessage(PR_GetError()));
>> +
>> +            return PGRES_POLLING_FAILED;
>> +        }
>> +    }
>
> Maybe have something a bit more here about "maybe you should specify a
> cert_database" or such?

Good point, will expand with more detail.

>> +    if (conn->ssl_max_protocol_version && strlen(conn->ssl_max_protocol_version) > 0)
>> +    {
>> +        int            ssl_max_ver = ssl_protocol_version_to_nss(conn->ssl_max_protocol_version);
>> +
>> +        if (ssl_max_ver == -1)
>> +        {
>> +            printfPQExpBuffer(&conn->errorMessage,
>> +                              libpq_gettext("invalid value \"%s\" for maximum version of SSL protocol\n"),
>> +                              conn->ssl_max_protocol_version);
>> +            return -1;
>> +        }
>> +
>> +        desired_range.max = ssl_max_ver;
>> +    }
>
> In the backend code, we have an additional check to make sure they
> didn't set the min version higher than the max.. should we have that
> here too?  Either way, seems like we should be consistent.

We already test that in src/interfaces/libpq/fe-connect.c.

>> +     * The model can now we closed as we've applied the settings of the model
>
> *be

Fixed.

>> +     * onto the real socket. From hereon we should only use conn->pr_fd.
>
> *here on

Fixed.

> Similar comments to the backend code- should we just always use
> conn->pr_fd?  Or should we rename pr_fd to something else?

Renaming is probably not a bad idea, will fix.

>> +    /*
>> +     * Specify which hostname we are expecting to talk to. This is required,
>> +     * albeit mostly applies to when opening a connection to a traditional
>> +     * http server it seems.
>> +     */
>> +    SSL_SetURL(conn->pr_fd, (conn->connhost[conn->whichhost]).host);
>
> We should probably also set SNI, if available (NSS 3.12.6 it seems?),
> since it looks like that's going to be added to the OpenSSL code.

Good point, will do.

>> +    do
>> +    {
>> +        status = SSL_ForceHandshake(conn->pr_fd);
>> +    }
>> +    while (status != SECSuccess && PR_GetError() == PR_WOULD_BLOCK_ERROR);
>
> We don't seem to have this loop in the backend code..  Is there some
> reason that we don't?  Is it possible that we need to have a loop here
> too?  I recall in the GSS encryption code there were definitely things
> during setup that had to be looped back over on both sides to make sure
> everything was finished ...

Off the cuff I can't remember, will look into it.
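
For reference while looking into it, the shape of the retry loop can be
sketched without NSS.  Everything below is a mock: mock_conn,
mock_handshake_step() and the attempt limit are assumptions made for the
example, with the step function standing in for SSL_ForceHandshake() returning
a would-block error on a non-blocking socket.

```c
#include <stdbool.h>

typedef enum
{
	HS_DONE,
	HS_WOULD_BLOCK,
	HS_FAILED
} hs_result;

/* Mock connection: reports "would block" steps_remaining times, then is done. */
typedef struct
{
	int			steps_remaining;
	bool		fail;
} mock_conn;

static hs_result
mock_handshake_step(mock_conn *conn)
{
	if (conn->fail)
		return HS_FAILED;
	if (conn->steps_remaining > 0)
	{
		conn->steps_remaining--;
		return HS_WOULD_BLOCK;
	}
	return HS_DONE;
}

/*
 * Keep driving the handshake while it only reports "would block".  A hard
 * failure ends the loop at once, and so does running out of attempts.
 */
static bool
drive_handshake(mock_conn *conn, int max_attempts)
{
	for (int i = 0; i < max_attempts; i++)
	{
		hs_result	r = mock_handshake_step(conn);

		if (r == HS_DONE)
			return true;
		if (r == HS_FAILED)
			return false;
		/* HS_WOULD_BLOCK: real code would wait for socket readiness here. */
	}
	return false;
}
```

The attempt limit is not in the patch, it is only there to keep the sketch from
spinning forever.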

>> +    if (conn->sslmode[0] == 'v')
>> +        return SECFailure;
>
> Seems a bit grotty to do this (though I see that the OpenSSL code does
> too ... at least there we have a comment though, maybe add one here?).
> I would have thought we'd actually do strcmp()'s like above.

That's admittedly copied from the OpenSSL code, and I agree that it's a bit too
clever.  Replaced with plain strcmp's to improve readability in both places it
occurred.
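
Roughly along these lines, as a standalone sketch; the helper name
sslmode_verifies_server() is made up for the example and isn't necessarily
what the patch ends up using:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true only for the two sslmode values that require the server
 * certificate to be verified, instead of peeking at the first character.
 */
static bool
sslmode_verifies_server(const char *sslmode)
{
	if (sslmode == NULL)
		return false;

	return strcmp(sslmode, "verify-ca") == 0 ||
		strcmp(sslmode, "verify-full") == 0;
}
```

Compared to checking sslmode[0] == 'v', this also can't be fooled by a future
mode name which happens to start with the same letter.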

>> +    /*
>> +     * Return the underlying PRFileDesc which can be used to access
>> +     * information on the connection details. There is no SSL context per se.
>> +     */
>> +    if (strcmp(struct_name, "NSS") == 0)
>> +        return conn->pr_fd;
>> +    return NULL;
>> +}
>
> Is there never a reason someone might want the pointer returned by
> NSS_InitContext?  I don't know that there is but it might be something
> to consider (we could even possibly have our own structure returned by
> this function which includes both, maybe..?).  Not sure if there's a
> sensible use-case for that or not just wanted to bring it up as it's
> something I asked myself while reading through this patch.

Not sure I understand what you're asking for here, did you mean "is there ever
a reason"?

>> +    if (strcmp(attribute_name, "protocol") == 0)
>> +    {
>> +        switch (channel.protocolVersion)
>> +        {
>> +#ifdef SSL_LIBRARY_VERSION_TLS_1_3
>> +            case SSL_LIBRARY_VERSION_TLS_1_3:
>> +                return "TLSv1.3";
>> +#endif
>> +#ifdef SSL_LIBRARY_VERSION_TLS_1_2
>> +            case SSL_LIBRARY_VERSION_TLS_1_2:
>> +                return "TLSv1.2";
>> +#endif
>> +#ifdef SSL_LIBRARY_VERSION_TLS_1_1
>> +            case SSL_LIBRARY_VERSION_TLS_1_1:
>> +                return "TLSv1.1";
>> +#endif
>> +            case SSL_LIBRARY_VERSION_TLS_1_0:
>> +                return "TLSv1.0";
>> +            default:
>> +                return "unknown";
>> +        }
>> +    }
>
> Not sure that it really matters, but this seems like it might be useful
> to have as its own function...  Maybe even a data structure that both
> functions use just in opposite directions.  Really minor tho. :)

I suppose that wouldn't be a bad thing, will fix.
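
A possible shape for that shared data structure, as a sketch only: the struct
and function names are hypothetical, the strings mirror the ones returned by
the switch above, and the numeric codes are assumed to be the TLS wire version
numbers rather than taken from the NSS headers.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
	const char *name;
	int			code;
} tls_version_entry;

/* One table, consulted in both directions. */
static const tls_version_entry tls_versions[] = {
	{"TLSv1.0", 0x0301},
	{"TLSv1.1", 0x0302},
	{"TLSv1.2", 0x0303},
	{"TLSv1.3", 0x0304},
};

#define N_TLS_VERSIONS (sizeof(tls_versions) / sizeof(tls_versions[0]))

/* Name to code; returns -1 for a name which isn't in the table. */
static int
tls_version_from_name(const char *name)
{
	for (size_t i = 0; i < N_TLS_VERSIONS; i++)
	{
		if (strcmp(tls_versions[i].name, name) == 0)
			return tls_versions[i].code;
	}
	return -1;
}

/* Code to name; returns "unknown" for a code which isn't in the table. */
static const char *
tls_version_to_name(int code)
{
	for (size_t i = 0; i < N_TLS_VERSIONS; i++)
	{
		if (tls_versions[i].code == code)
			return tls_versions[i].name;
	}
	return "unknown";
}
```

Entries for versions a given NSS build doesn't know about could still be
guarded with #ifdef in the table, like the switch does today.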

>> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
>> index c601071838..7f10da3010 100644
>> --- a/src/interfaces/libpq/fe-secure.c
>> +++ b/src/interfaces/libpq/fe-secure.c
>> @@ -448,6 +448,27 @@ PQdefaultSSLKeyPassHook_OpenSSL(char *buf, int size, PGconn *conn)
>> }
>> #endif                            /* USE_OPENSSL */
>>
>> +#ifndef USE_NSS
>> +
>> +PQsslKeyPassHook_nss_type
>> +PQgetSSLKeyPassHook_nss(void)
>> +{
>> +    return NULL;
>> +}
>> +
>> +void
>> +PQsetSSLKeyPassHook_nss(PQsslKeyPassHook_nss_type hook)
>> +{
>> +    return;
>> +}
>> +
>> +char *
>> +PQdefaultSSLKeyPassHook_nss(PK11SlotInfo * slot, PRBool retry, void *arg)
>> +{
>> +    return NULL;
>> +}
>> +#endif                            /* USE_NSS */
>
> Isn't this '!USE_NSS'?

Technically it is, but using just /* USE_NSS */ is consistent with the rest of
the blocks in the file.

>> diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
>> index 0c9e95f1a7..f15af39222 100644
>> --- a/src/interfaces/libpq/libpq-int.h
>> +++ b/src/interfaces/libpq/libpq-int.h
>> @@ -383,6 +383,7 @@ struct pg_conn
>>     char       *sslrootcert;    /* root certificate filename */
>>     char       *sslcrl;            /* certificate revocation list filename */
>>     char       *sslcrldir;        /* certificate revocation list directory name */
>> +    char       *cert_database;    /* NSS certificate/key database */
>>     char       *requirepeer;    /* required peer credentials for local sockets */
>>     char       *gssencmode;        /* GSS mode (require,prefer,disable) */
>>     char       *krbsrvname;        /* Kerberos service name */
>> @@ -507,6 +508,28 @@ struct pg_conn
>>                                  * OpenSSL version changes */
>> #endif
>> #endif                            /* USE_OPENSSL */
>> +
>> +/*
>> + * The NSS/NSPR specific types aren't used to avoid pulling in the required
>> + * headers here, as they are causing conflicts with PG definitions.
>> + */
>
> I'm a bit confused- what are the conflicts being caused here..?
> Certainly under USE_OPENSSL we use the actual OpenSSL types..

It's referring to collisions with for example BITS_PER_BYTE which is defined
both by postgres and nspr.  Since writing this I've introduced src/common/nss.h
to handle it in a single place, so we can indeed use the proper types without
polluting the file.  Fixed.

>> Subject: [PATCH v30 2/9] Refactor SSL testharness for multiple library
>>
>> The SSL testharness was fully tied to OpenSSL in the way the server was
>> set up and reconfigured. This refactors the SSLServer module into a SSL
>> library agnostic SSL/Server module which in turn use SSL/Backend/<lib>
>> modules for the implementation details.
>>
>> No changes are done to the actual tests, this only change how setup and
>> teardown is performed.
>
> Presumably this could be committed ahead of the main NSS support?

Correct, I think this has merits even if NSS support is ultimately rejected.

>> Subject: [PATCH v30 4/9] nss: pg_strong_random support
>> +++ b/src/port/pg_strong_random.c
>> +bool
>> +pg_strong_random(void *buf, size_t len)
>> +{
>> +    NSSInitParameters params;
>> +    NSSInitContext *nss_context;
>> +    SECStatus    status;
>> +
>> +    memset(&params, 0, sizeof(params));
>> +    params.length = sizeof(params);
>> +    nss_context = NSS_InitContext("", "", "", "", &params,
>> +                                  NSS_INIT_READONLY | NSS_INIT_NOCERTDB |
>> +                                  NSS_INIT_NOMODDB | NSS_INIT_FORCEOPEN |
>> +                                  NSS_INIT_NOROOTINIT | NSS_INIT_PK11RELOAD);
>> +
>> +    if (!nss_context)
>> +        return false;
>> +
>> +    status = PK11_GenerateRandom(buf, len);
>> +    NSS_ShutdownContext(nss_context);
>> +
>> +    if (status == SECSuccess)
>> +        return true;
>> +
>> +    return false;
>> +}
>> +
>> +#else                            /* not USE_OPENSSL, USE_NSS or WIN32 */
>
> I don't know that it's an issue, but do we actually need to init the NSS
> context and shut it down every time..?

We need to have a context, and we should be able to set it like how the WIN32
code sets hProvider.  I don't remember if there was a reason against that, will
revisit.
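
What I have in mind is the usual initialize-once pattern, sketched here with
mocks so that it stands on its own.  rng_context, mock_init_context() and
mock_generate_random() are placeholders for NSS_InitContext() and
PK11_GenerateRandom(); they don't produce real randomness.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the library context. */
typedef struct
{
	int			id;
} rng_context;

static int	init_calls = 0;
static rng_context the_context;

/* Mock of the expensive initialization; counts how often it is invoked. */
static rng_context *
mock_init_context(void)
{
	init_calls++;
	the_context.id = init_calls;
	return &the_context;
}

/* Mock generator: fills the buffer with a fixed pattern, NOT random data. */
static bool
mock_generate_random(rng_context *ctx, void *buf, size_t len)
{
	unsigned char *p = buf;

	for (size_t i = 0; i < len; i++)
		p[i] = (unsigned char) (i * 31 + 7);
	return ctx != NULL;
}

/* The context is created on first use and kept for all later calls. */
static rng_context *cached_context = NULL;

static bool
strong_random_cached(void *buf, size_t len)
{
	if (cached_context == NULL)
	{
		cached_context = mock_init_context();
		if (cached_context == NULL)
			return false;
	}
	return mock_generate_random(cached_context, buf, len);
}
```

Note that the sketch isn't thread-safe and never shuts the context down, both
of which would need an answer on the libpq side before doing this for real.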

>> /*
>>  * Without OpenSSL or Win32 support, just read /dev/urandom ourselves.
>
> *or NSS

Fixed.

>> Subject: [PATCH v30 5/9] nss: Documentation
>> +++ b/doc/src/sgml/acronyms.sgml
>> @@ -684,6 +717,16 @@
>>     </listitem>
>>    </varlistentry>
>>
>> +   <varlistentry>
>> +    <term><acronym>TLS</acronym></term>
>> +    <listitem>
>> +     <para>
>> +      <ulink url="https://en.wikipedia.org/wiki/Transport_Layer_Security">
>> +      Transport Layer Security</ulink>
>> +     </para>
>> +    </listitem>
>> +   </varlistentry>
>
> We don't have this already..?  Surely we should..

We really should, especially since we've had <acronym>TLS</acronym> in
config.sgml since 2014 (c6763156589).  That's another small piece that could be
committed on its own to cut down the size of this patchset (even if only by a
tiny amount).

>> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
>> index 967de73596..1608e9a7c7 100644
>> --- a/doc/src/sgml/config.sgml
>> +++ b/doc/src/sgml/config.sgml
>> @@ -1272,6 +1272,23 @@ include_dir 'conf.d'
>>       </listitem>
>>      </varlistentry>
>>
>> +     <varlistentry id="guc-ssl-database" xreflabel="ssl_database">
>> +      <term><varname>ssl_database</varname> (<type>string</type>)
>> +      <indexterm>
>> +       <primary><varname>ssl_database</varname> configuration parameter</primary>
>> +      </indexterm>
>> +      </term>
>> +      <listitem>
>> +       <para>
>> +        Specifies the name of the file containing the server certificates and
>> +        keys when using <productname>NSS</productname> for <acronym>SSL</acronym>
>> +        connections. This parameter can only be set in the
>> +        <filename>postgresql.conf</filename> file or on the server command
>> +        line.
>
> *SSL/TLS maybe?

Fixed.

>> @@ -1288,7 +1305,9 @@ include_dir 'conf.d'
>>         connections using TLS version 1.2 and lower are affected.  There is
>>         currently no setting that controls the cipher choices used by TLS
>>         version 1.3 connections.  The default value is
>> -        <literal>HIGH:MEDIUM:+3DES:!aNULL</literal>.  The default is usually a
>> +        <literal>HIGH:MEDIUM:+3DES:!aNULL</literal> for servers which have
>> +        been built with <productname>OpenSSL</productname> as the
>> +        <acronym>SSL</acronym> library.  The default is usually a
>>         reasonable choice unless you have specific security requirements.
>>        </para>
>
> Shouldn't we say something here wrt NSS?

We should, but I'm not entirely sure what just yet. Need to revisit that.

>> @@ -1490,8 +1509,11 @@ include_dir 'conf.d'
>>        <para>
>>         Sets an external command to be invoked when a passphrase for
>>         decrypting an SSL file such as a private key needs to be obtained.  By
>> -        default, this parameter is empty, which means the built-in prompting
>> -        mechanism is used.
>> +        default, this parameter is empty. When the server is using
>> +        <productname>OpenSSL</productname>, this means the built-in prompting
>> +        mechanism is used. When using <productname>NSS</productname>, there is
>> +        no default prompting so a blank callback will be used returning an
>> +        empty password.
>>        </para>
>
> Maybe we should point out here that this requires the database to not
> require a password..?  So if they have one, they need to set this, or
> maybe we should provide a default one..

I've added a sentence on not using a password for the cert database.  I'm not
sure if providing a default one is a good idea but it's no less insecure than
having no password really..

>> +++ b/doc/src/sgml/libpq.sgml
>> +<synopsis>
>> +PQsslKeyPassHook_nss_type PQgetSSLKeyPassHook_nss(void);
>> +</synopsis>
>> +      </para>
>> +
>> +      <para>
>> +        <function>PQgetSSLKeyPassHook_nss</function> has no effect unless the
>> +        server was compiled with <productname>nss</productname> support.
>> +      </para>
>
> We should try to be consistent- above should be NSS, not nss.

Fixed.

>> +         <listitem>
>> +          <para>
>> +           <productname>NSS</productname>: specifying the parameter is required
>> +           in case any password protected items are referenced in the
>> +           <productname>NSS</productname> database, or if the database itself
>> +           is password protected.  If multiple different objects are password
>> +           protected, the same password is used for all.
>> +          </para>
>> +         </listitem>
>> +        </itemizedlist>
>
> Is this a statement about NSS databases (which I don't think it is) or
> about the fact that we'll just use the password provided for all
> attempts to decrypt something we need in the database?

Correct, it's the latter.

> Assuming the
> latter, seems like we could reword this to be a bit more clear.
>
> Maybe:
>
> All attempts to decrypt objects which are password protected in the
> database will use this password.

Agreed, fixed.

>> @@ -2620,9 +2791,14 @@ void *PQsslStruct(const PGconn *conn, const char *struct_name);
>> +       For <productname>NSS</productname>, there is one struct available under
>> +       the name "NSS", and it returns a pointer to the
>> +       <productname>NSS</productname> <literal>PRFileDesc</literal>.
>
> ... SSL PRFileDesc associated with the connection, no?

I was trying to be specific that it's an NSS-defined structure and not a
PostgreSQL one which is returned. Fixed.

>> +++ b/doc/src/sgml/runtime.sgml
>> @@ -2552,6 +2583,89 @@ openssl x509 -req -in server.csr -text -days 365 \
>>    </para>
>>   </sect2>
>>
>> +  <sect2 id="nss-certificate-database">
>> +   <title>NSS Certificate Databases</title>
>> +
>> +   <para>
>> +    When using <productname>NSS</productname>, all certificates and keys must
>> +    be loaded into an <productname>NSS</productname> certificate database.
>> +   </para>
>> +
>> +   <para>
>> +    To create a new <productname>NSS</productname> certificate database and
>> +    load the certificates created in <xref linkend="ssl-certificate-creation" />,
>> +    use the following <productname>NSS</productname> commands:
>> +<programlisting>
>> +certutil -d "sql:server.db" -N --empty-password
>> +certutil -d "sql:server.db" -A -n server.crt -i server.crt -t "CT,C,C"
>> +certutil -d "sql:server.db" -A -n root.crt -i root.crt -t "CT,C,C"
>> +</programlisting>
>> +    This will give the certificate the filename as the nickname identifier in
>> +    the database which is created as <filename>server.db</filename>.
>> +   </para>
>> +   <para>
>> +    Then load the server key, which require converting it to
>
> *requires

Fixed.

>> Subject: [PATCH v30 6/9] nss: Support NSS in pgcrypto
>> +++ b/doc/src/sgml/pgcrypto.sgml
>>       <row>
>>        <entry>Blowfish</entry>
>>        <entry>yes</entry>
>>        <entry>yes</entry>
>> +       <entry>yes</entry>
>>       </row>
>
> Maybe this should mention that it's with the built-in implementation as
> blowfish isn't available from NSS?

Fixed by adding a Note item.

>>       <row>
>>        <entry>DES/3DES/CAST5</entry>
>>        <entry>no</entry>
>>        <entry>yes</entry>
>> +       <entry>yes</entry>
>> +      </row>
>
> Surely CAST5 from the above should be removed, since it's given its own
> entry now?

Indeed, fixed.

>> @@ -1241,7 +1260,8 @@ gen_random_uuid() returns uuid
>>    <orderedlist>
>>     <listitem>
>>      <para>
>> -      Any digest algorithm <productname>OpenSSL</productname> supports
>> +      Any digest algorithm <productname>OpenSSL</productname> and
>> +      <productname>NSS</productname> supports
>>       is automatically picked up.
>
> *or?  Maybe something more specific though- "Any digest algorithm
> included with the library that PostgreSQL is compiled with is
> automatically picked up." ?

Good point, that's better. Fixed.

>> Subject: [PATCH v30 7/9] nss: Support NSS in sslinfo
>>
>> Since sslinfo to a large extent use the be_tls_* API this mostly
>
> *uses

Fixed.

>> Subject: [PATCH v30 8/9] nss: Support NSS in cryptohash
>> +++ b/src/common/cryptohash_nss.c
>> +    /*
>> +     * Initialize our own NSS context without a database backing it.
>> +     */
>> +    memset(&params, 0, sizeof(params));
>> +    params.length = sizeof(params);
>> +    status = NSS_NoDB_Init(".");
>
> We take some pains to use NSS_InitContext elsewhere..  Are we sure that
> we should be using NSS_NoDB_Init here..?

No, we should probably be using NSS_InitContext. Will fix.

> Just a, well, not so quick read-through.  Generally it's looking pretty
> good to me.  Will see about playing with it this week.

Thanks again for reviewing, another version which addresses the remaining
issues will be posted soon but I wanted to get this out to give further reviews
something that properly works.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> > On 22 Mar 2021, at 00:49, Stephen Frost <sfrost@snowman.net> wrote:
>
> Thanks for the review!  Below is a partial response, I haven't had time to
> address all your review comments yet but I wanted to submit a rebased patchset
> directly since the current version doesn't work after recent changes in the
> tree. I will address the remaining comments tomorrow or the day after.

Great, thanks!

> This rebase also includes a fix for pgtls_init which was sent offlist by Jacob.
> The changes in pgtls_init can potentially be used to initialize the crypto
> context for NSS to clean up this patch, Jacob is currently looking at that.

Ah, cool, sounds good.

> > They aren't the same and it might not be
> > clear what's going on if one was to somehow mix them (at least if pr_fd
> > continues to sometimes be a void*, but I wonder why that's being
> > done..?  more on that later..).
>
> To paraphrase from later in this email, there are collisions between nspr and
> postgres on things like BITS_PER_BYTE, and there were also collisions on basic
> types until I learned about NO_NSPR_10_SUPPORT.  By moving the juggling of this
> into common/nss.h we can use proper types without introducing that pollution
> everywhere. I will address these places.

Ah, ok, and great, that sounds good.

> >> +++ b/src/backend/libpq/be-secure-nss.c
> > [...]
> >> +/* default init hook can be overridden by a shared library */
> >> +static void default_nss_tls_init(bool isServerStart);
> >> +nss_tls_init_hook_type nss_tls_init_hook = default_nss_tls_init;
> >
> >> +static PRDescIdentity pr_id;
> >> +
> >> +static PRIOMethods pr_iomethods;
> >
> > Happy to be told I'm missing something, but the above two variables seem
> > to only be used in init_iolayer.. is there a reason they're declared
> > here instead of just being declared in that function?
>
> They must be there since NSPR doesn't copy these but reference them.

Ah, ok, interesting.
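
Just to spell out for the archives why that forces file scope, here's a tiny
sketch with a made-up library.  io_methods, library_register() and
library_close() are hypothetical and only mimic an API that stores the pointer
it is handed rather than copying the struct behind it.

```c
#include <stddef.h>

typedef struct
{
	int			(*close_fn) (void);
} io_methods;

/* Hypothetical library: it remembers the pointer, it does not copy. */
static const io_methods *registered_methods = NULL;

static void
library_register(const io_methods *m)
{
	registered_methods = m;
}

static int
library_close(void)
{
	/* Dereferenced long after registration has returned. */
	return registered_methods->close_fn();
}

static int
my_close(void)
{
	return 42;
}

/*
 * File scope gives the struct static storage duration, so the pointer kept
 * by the library stays valid.  A local variable in init_iolayer() would be
 * gone by the time library_close() is called.
 */
static io_methods my_methods;

static void
init_iolayer(void)
{
	my_methods.close_fn = my_close;
	library_register(&my_methods);
}
```

A function-local static would work as well, but then the variables aren't
visible to anything else in the file that might need them.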

> >> +    /*
> >> +     * Set the fallback versions for the TLS protocol version range to a
> >> +     * combination of our minimal requirement and the library maximum. Error
> >> +     * messages should be kept identical to those in be-secure-openssl.c to
> >> +     * make translations easier.
> >> +     */
> >
> > Should we pull these error messages out into another header so that
> > they're in one place to make sure they're kept consistent, if we really
> > want to put the effort in to keep them the same..?  I'm not 100% sure
> > that it's actually necessary to do so, but defining these in one place
> > would help maintain this if we want to.  Also alright with just keeping
> > the comment, not that big of a deal.
>
> It might make sense to pull them into common/nss.h, but seeing the error
> message right there when reading the code does IMO make it clearer so it's a
> double-edged sword.  Not sure what is the best option, but I'm not married to
> the current solution so if there is consensus to pull them out somewhere I'm
> happy to do so.

My thought was to put them into some common/ssl.h or something along
those lines but I don't see it as a big deal either way really.  You
make a good point that having the error message there when reading the
code is nice.

> > Maybe we should put a stake in the ground that says "we only support
> > back to version X of NSS", test with that and a few more recent versions
> > and the most recent, and then rip out anything that's needed for
> > versions which are older than that?
>
> Yes, right now there is very little in the patch which caters for old versions,
> the PR_Init call might be one of the few offenders.  There has been discussion
> upthread about settling for a required version, combining the insights learned
> there with a survey of which versions are commonly available packaged.
>
> Once we settle on a version we can confirm if PR_Init is/isn't needed and
> remove all traces of it if not.

I don't really see this as all that hard to do- I'd suggest we look at
what systems someone might reasonably deploy v14 on.  To that end, I'd
say "only systems which are presently supported", so: RHEL7+, Debian 9+,
Ubuntu 16.04+.  Looking at those, I see:

Ubuntu 16.04: 3.28.4
RHEL6: v3.28.4
Debian: 3.26.2

> > I have a pretty hard time imagining that someone is going to want to build PG
> > v14 w/ NSS 2.0 ...
>
> Let alone compiling 2.0 at all on a recent system..

Indeed, and given the above, it seems entirely reasonable to make the
requirement be NSS v3+, no?  I wouldn't be against making that even
tighter if we thought it made sense to do so.
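
As a rough illustration of what enforcing such a floor could look like, here
is a minimal sketch in plain C that compares a dotted version string against
a minimum.  The function names (parse_version, version_at_least) are made up
for this example and are not part of the patch or of NSS; 3.26.2 is simply
the lowest version in the list above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Pack "major.minor[.patch]" (optionally prefixed with 'v') into a single
 * comparable integer.  Returns -1 if the string cannot be parsed.
 * Hypothetical helper for illustration only.
 */
long
parse_version(const char *s)
{
    int         major = 0;
    int         minor = 0;
    int         patch = 0;

    assert(s != NULL);

    if (*s == 'v')
        s++;

    /* At least major and minor must be present; patch defaults to 0. */
    if (sscanf(s, "%d.%d.%d", &major, &minor, &patch) < 2)
        return -1;
    if (major < 0 || minor < 0 || patch < 0 || minor > 999 || patch > 999)
        return -1;

    return major * 1000000L + minor * 1000L + patch;
}

/* Returns 1 if "have" is at least "need", 0 otherwise (or on parse error). */
int
version_at_least(const char *have, const char *need)
{
    long        h = parse_version(have);
    long        n = parse_version(need);

    return h >= 0 && n >= 0 && h >= n;
}
```

With this, version_at_least("3.28.4", "3.26.2") holds while "2.0" is
rejected, which is the kind of check a configure test or a startup sanity
check might perform.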

> > Also- we don't seem to complain at all about a cipher being specified that we
> > don't find?  Guess I would think that we might want to throw a WARNING in such
> > a case, but I could possibly be convinced otherwise.
>
> No, I think you're right, we should throw WARNING there or possibly even a
> higher level. Should that be a COMMERROR even?

I suppose the thought I was having was that we might want to allow some
string that covered all the OpenSSL and NSS ciphers that someone feels
comfortable with and we'd just ignore the ones that don't make sense for
the particular library we're currently built with.  Making it a
COMMERROR seems like overkill and I'm not entirely sure we actually want
any warning since we might then be constantly bleating about it.

> > Kind of wonder just what happens with the current code, I'm guessing ciphercode
> > is zero and therefore doesn't complain but also doesn't do what we want.  I
> > wonder if there's a way to test this?
>
> We could extend the test suite to set ciphers in postgresql.conf, I'll give it
> a go.

That'd be great, thanks!

> > I do think we should probably throw an error if we end up with *no*
> > ciphers being set, which doesn't seem to be happening here..?
>
> Yeah, that should be a COMMERROR. Fixed.

I do think it makes sense to throw a COMMERROR here since the connection
is going to end up failing anyway.
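
To make the two cases concrete (silently skipping names the current library
doesn't know, but failing when nothing at all is left), here is a minimal
sketch in plain C.  Everything in it is an assumption for illustration: the
colon-separated format, the small table of known names and the function
names are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of cipher names that this build's library understands. */
static const char *const known_ciphers[] = {
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_AES_128_GCM_SHA256",
    "TLS_AES_256_GCM_SHA384",
};

/* Returns 1 if the first len bytes of name match a known cipher exactly. */
static int
cipher_is_known(const char *name, size_t len)
{
    size_t      i;

    for (i = 0; i < sizeof(known_ciphers) / sizeof(known_ciphers[0]); i++)
    {
        if (strlen(known_ciphers[i]) == len &&
            strncmp(known_ciphers[i], name, len) == 0)
            return 1;
    }
    return 0;
}

/*
 * Count the entries of a colon-separated list that are known to this build.
 * Unknown and empty entries are skipped without complaint; a result of zero
 * is the case where the caller would raise an error.
 */
int
count_usable_ciphers(const char *list)
{
    int         count = 0;
    const char *p;

    assert(list != NULL);

    p = list;
    while (*p != '\0')
    {
        size_t      len = strcspn(p, ":");

        if (len > 0 && cipher_is_known(p, len))
            count++;

        p += len;
        if (*p == ':')
            p++;
    }
    return count;
}
```

A list mixing OpenSSL-style and NSS-style names would then yield only the
entries that apply to the library in use, and the caller would raise the
error only when the count comes back as zero.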

> >> +pg_ssl_read(PRFileDesc *fd, void *buf, PRInt32 amount, PRIntn flags,
> >> +            PRIntervalTime timeout)
> >> +{
> >> +    PRRecvFN    read_fn;
> >> +    PRInt32        n_read;
> >> +
> >> +    read_fn = fd->lower->methods->recv;
> >> +    n_read = read_fn(fd->lower, buf, amount, flags, timeout);
> >> +
> >> +    return n_read;
> >> +}
> >> +
> >> +static PRInt32
> >> +pg_ssl_write(PRFileDesc *fd, const void *buf, PRInt32 amount, PRIntn flags,
> >> +             PRIntervalTime timeout)
> >> +{
> >> +    PRSendFN    send_fn;
> >> +    PRInt32        n_write;
> >> +
> >> +    send_fn = fd->lower->methods->send;
> >> +    n_write = send_fn(fd->lower, buf, amount, flags, timeout);
> >> +
> >> +    return n_write;
> >> +}
> >> +
> >> +static PRStatus
> >> +pg_ssl_close(PRFileDesc *fd)
> >> +{
> >> +    /*
> >> +     * Disconnect our private Port from the fd before closing out the stack.
> >> +     * (Debug builds of NSPR will assert if we do not.)
> >> +     */
> >> +    fd->secret = NULL;
> >> +    return PR_GetDefaultIOMethods()->close(fd);
> >> +}
> >
> > Regarding these, I find myself wondering how they're different from the
> > defaults..?  I mean, the above just directly called
> > PR_GetDefaultIOMethods() to then call its close() function- are the
> > fd->lower_methods->recv/send not the default methods?  I don't quite get
> > what the point is from having our own callbacks here if they just do
> > exactly what the defaults would do (or are there actually no defined
> > defaults and you have to provide these..?).
>
> It's really just to cope with debug builds of NSPR which assert that fd->secret
> is null before closing.

And we have to override the recv/send functions for this too..?  Sorry,
my comment wasn't just about the close() method but about the others
too.

> >> +    /*
> >> +     * Return the underlying PRFileDesc which can be used to access
> >> +     * information on the connection details. There is no SSL context per se.
> >> +     */
> >> +    if (strcmp(struct_name, "NSS") == 0)
> >> +        return conn->pr_fd;
> >> +    return NULL;
> >> +}
> >
> > Is there never a reason someone might want the pointer returned by
> > NSS_InitContext?  I don't know that there is but it might be something
> > to consider (we could even possibly have our own structure returned by
> > this function which includes both, maybe..?).  Not sure if there's a
> > sensible use-case for that or not just wanted to bring it up as it's
> > something I asked myself while reading through this patch.
>
> Not sure I understand what you're asking for here, did you mean "is there ever
> a reason"?

Eh, poor wording on my part.  You're right, the question, reworded
again, was "Would someone want to get the context returned by
NSS_InitContext?".  If we think there's a reason that someone might want
that context then perhaps we should allow getting it, in addition to the
pr_fd.  If there's really no reason to ever want the context from
NSS_InitContext then what you have here where we're returning pr_fd is
probably fine.

> >> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
> >> index c601071838..7f10da3010 100644
> >> --- a/src/interfaces/libpq/fe-secure.c
> >> +++ b/src/interfaces/libpq/fe-secure.c
> >> @@ -448,6 +448,27 @@ PQdefaultSSLKeyPassHook_OpenSSL(char *buf, int size, PGconn *conn)
> >> }
> >> #endif                            /* USE_OPENSSL */
> >>
> >> +#ifndef USE_NSS
> >> +
> >> +PQsslKeyPassHook_nss_type
> >> +PQgetSSLKeyPassHook_nss(void)
> >> +{
> >> +    return NULL;
> >> +}
> >> +
> >> +void
> >> +PQsetSSLKeyPassHook_nss(PQsslKeyPassHook_nss_type hook)
> >> +{
> >> +    return;
> >> +}
> >> +
> >> +char *
> >> +PQdefaultSSLKeyPassHook_nss(PK11SlotInfo * slot, PRBool retry, void *arg)
> >> +{
> >> +    return NULL;
> >> +}
> >> +#endif                            /* USE_NSS */
> >
> > Isn't this '!USE_NSS'?
>
> Technically it is, but using just /* USE_NSS */ is consistent with the rest of
> blocks in the file.

Hrmpf.  I guess it seems a bit confusing to me to have to go find the
opening #ifndef to realize that it's actually !USE_NSS..  In other words,
I would think we'd actually want to fix all of these, heh.  I only
actually see one case on a quick grep where it's wrong for USE_OPENSSL
and so that doesn't seem like it's really a precedent and is more of a
bug.  We certainly say 'not OPENSSL' in one place today too and also
have a number of places where we have: #endif ... /* ! WHATEVER */.

> >> diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
> >> index 0c9e95f1a7..f15af39222 100644
> >> --- a/src/interfaces/libpq/libpq-int.h
> >> +++ b/src/interfaces/libpq/libpq-int.h
> >> @@ -383,6 +383,7 @@ struct pg_conn
> >>     char       *sslrootcert;    /* root certificate filename */
> >>     char       *sslcrl;            /* certificate revocation list filename */
> >>     char       *sslcrldir;        /* certificate revocation list directory name */
> >> +    char       *cert_database;    /* NSS certificate/key database */
> >>     char       *requirepeer;    /* required peer credentials for local sockets */
> >>     char       *gssencmode;        /* GSS mode (require,prefer,disable) */
> >>     char       *krbsrvname;        /* Kerberos service name */
> >> @@ -507,6 +508,28 @@ struct pg_conn
> >>                                  * OpenSSL version changes */
> >> #endif
> >> #endif                            /* USE_OPENSSL */
> >> +
> >> +/*
> >> + * The NSS/NSPR specific types aren't used to avoid pulling in the required
> >> + * headers here, as they are causing conflicts with PG definitions.
> >> + */
> >
> > I'm a bit confused- what are the conflicts being caused here..?
> > Certainly under USE_OPENSSL we use the actual OpenSSL types..
>
> It's referring to collisions with for example BITS_PER_BYTE which is defined
> both by postgres and nspr.  Since writing this I've introduced src/common/nss.h
> to handle it in a single place, so we can indeed use the proper types without
> polluting the file.  Fixed.

Great, thanks!

> >> Subject: [PATCH v30 2/9] Refactor SSL testharness for multiple library
> >>
> >> The SSL testharness was fully tied to OpenSSL in the way the server was
> >> set up and reconfigured. This refactors the SSLServer module into a SSL
> >> library agnostic SSL/Server module which in turn use SSL/Backend/<lib>
> >> modules for the implementation details.
> >>
> >> No changes are done to the actual tests, this only change how setup and
> >> teardown is performed.
> >
> > Presumably this could be committed ahead of the main NSS support?
>
> Correct, I think this has merits even if NSS support is ultimately rejected.

Ok- could you break it out on to its own thread and I'll see about
committing it soonish, to get it out of the way?

> >> Subject: [PATCH v30 5/9] nss: Documentation
> >> +++ b/doc/src/sgml/acronyms.sgml
> >> @@ -684,6 +717,16 @@
> >>     </listitem>
> >>    </varlistentry>
> >>
> >> +   <varlistentry>
> >> +    <term><acronym>TLS</acronym></term>
> >> +    <listitem>
> >> +     <para>
> >> +      <ulink url="https://en.wikipedia.org/wiki/Transport_Layer_Security">
> >> +      Transport Layer Security</ulink>
> >> +     </para>
> >> +    </listitem>
> >> +   </varlistentry>
> >
> > We don't have this already..?  Surely we should..
>
> We really should, especially since we've had <acronym>TLS</acronym> in
> config.sgml since 2014 (c6763156589).  That's another small piece that could be
> committed on its own to cut down the size of this patchset (even if only by a
> tiny amount).

Ditto on this. :)

> >> @@ -1288,7 +1305,9 @@ include_dir 'conf.d'
> >>         connections using TLS version 1.2 and lower are affected.  There is
> >>         currently no setting that controls the cipher choices used by TLS
> >>         version 1.3 connections.  The default value is
> >> -        <literal>HIGH:MEDIUM:+3DES:!aNULL</literal>.  The default is usually a
> >> +        <literal>HIGH:MEDIUM:+3DES:!aNULL</literal> for servers which have
> >> +        been built with <productname>OpenSSL</productname> as the
> >> +        <acronym>SSL</acronym> library.  The default is usually a
> >>         reasonable choice unless you have specific security requirements.
> >>        </para>
> >
> > Shouldn't we say something here wrt NSS?
>
> We should, but I'm not entirely sure what just yet. Need to revisit that.

Not sure if we really want to do this but at least with ssllabs.com,
postgresql.org gets an 'A' rating with this set:

ECDHE-ECDSA-CHACHA20-POLY1305
ECDHE-RSA-CHACHA20-POLY1305
ECDHE-ECDSA-AES128-GCM-SHA256
ECDHE-RSA-AES128-GCM-SHA256
ECDHE-ECDSA-AES256-GCM-SHA384
ECDHE-RSA-AES256-GCM-SHA384
DHE-RSA-AES128-GCM-SHA256
DHE-RSA-AES256-GCM-SHA384
ECDHE-ECDSA-AES128-SHA256
ECDHE-RSA-AES128-SHA256
ECDHE-ECDSA-AES128-SHA
ECDHE-RSA-AES256-SHA384
ECDHE-RSA-AES128-SHA
ECDHE-ECDSA-AES256-SHA384
ECDHE-ECDSA-AES256-SHA
ECDHE-RSA-AES256-SHA
DHE-RSA-AES128-SHA256
DHE-RSA-AES128-SHA
DHE-RSA-AES256-SHA256
DHE-RSA-AES256-SHA
ECDHE-ECDSA-DES-CBC3-SHA
ECDHE-RSA-DES-CBC3-SHA
EDH-RSA-DES-CBC3-SHA
AES128-GCM-SHA256
AES256-GCM-SHA384
AES128-SHA256
AES256-SHA256
AES128-SHA
AES256-SHA
DES-CBC3-SHA
!DSS

Which also seems kinda close to what the default when built with OpenSSL
ends up being?  Though the ssllabs report does list which ones it
thinks are weak and so we might consider excluding those by default too:

https://www.ssllabs.com/ssltest/analyze.html?d=postgresql.org&s=2a02%3a16a8%3adc51%3a0%3a0%3a0%3a0%3a50

> >> @@ -1490,8 +1509,11 @@ include_dir 'conf.d'
> >>        <para>
> >>         Sets an external command to be invoked when a passphrase for
> >>         decrypting an SSL file such as a private key needs to be obtained.  By
> >> -        default, this parameter is empty, which means the built-in prompting
> >> -        mechanism is used.
> >> +        default, this parameter is empty. When the server is using
> >> +        <productname>OpenSSL</productname>, this means the built-in prompting
> >> +        mechanism is used. When using <productname>NSS</productname>, there is
> >> +        no default prompting so a blank callback will be used returning an
> >> +        empty password.
> >>        </para>
> >
> > Maybe we should point out here that this requires the database to not
> > require a password..?  So if they have one, they need to set this, or
> > maybe we should provide a default one..
>
> I've added a sentence on not using a password for the cert database.  I'm not
> sure if providing a default one is a good idea but it's no less insecure than
> having no password really..

I was meaning a default callback to prompt, not sure if that was clear.

> > Just a, well, not so quick read-through.  Generally it's looking pretty
> > good to me.  Will see about playing with it this week.
>
> Thanks again for reviewing, another version which addresses the remaining
> issues will be posted soon but I wanted to get this out to give further reviews
> something that properly works.

Fantastic, thanks again!

Stephen

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Tue, 2021-03-23 at 00:38 +0100, Daniel Gustafsson wrote:
> This rebase also includes a fix for pgtls_init which was sent offlist by Jacob.
> The changes in pgtls_init can potentially be used to initialize the crypto
> context for NSS to clean up this patch, Jacob is currently looking at that.

I'm having a hell of a time trying to get the context stuff working.
Findings so far (I have patches in progress for many of these, but it's
all blowing up because of the last problem):

NSS_INIT_NOROOTINIT is hardcoded for NSS_InitContext(), so we probably
don't need to pass it explicitly. NSS_INIT_PK11RELOAD is apparently
meant to hack around libraries that do their own PKCS loading; do we
need it?

NSS_ShutdownContext() can (and does) fail if we've leaked handles to
objects, so we need to check its return value. Once this happens,
future NSS_InitContext() calls behave poorly. Currently we leak the
pr_fd as well as a handful of server_cert handles.

NSS_NoDB_Init() is going to pin NSS in memory. For the backend this is
probably okay, but for libpq clients that's probably not what we want.

The first database loaded by NSS_InitContext() becomes the "default"
database. This is what I'm currently hung up on. I can't figure out how
to get NSS to use the database that was loaded for the current
connection, so in my local patches for the issues above, client
certificates fail to load. I can work around it temporarily for the
tests, but this will be a problem if any libpq clients load up multiple
independent databases for use with separate connections. Anyone know if
this is a supported use case for NSS?

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Wed, Mar 24, 2021 at 12:05:35AM +0000, Jacob Champion wrote:
> The first database loaded by NSS_InitContext() becomes the "default"
> database. This is what I'm currently hung up on. I can't figure out how
> to get NSS to use the database that was loaded for the current
> connection, so in my local patches for the issues above, client
> certificates fail to load. I can work around it temporarily for the
> tests, but this will be a problem if any libpq clients load up multiple
> independent databases for use with separate connections. Anyone know if
> this is a supported use case for NSS?

Are you referring to the case of threading here?  This should be a
supported case, as threads created by an application through libpq
could perfectly use completely different connection strings.
--
Michael

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Tue, Mar 23, 2021 at 12:38:50AM +0100, Daniel Gustafsson wrote:
> Thanks again for reviewing, another version which addresses the remaining
> issues will be posted soon but I wanted to get this out to give further reviews
> something that properly works.

I have been looking at the infrastructure of the tests, patches 0002
(some refactoring) and 0003 (more refactoring with tests for NSS), and
I am a bit confused by its state.

First, I think that the split is not completely clear.  For example,
patch 0003 has changes for OpenSSL.pm and Server.pm, but wouldn't it
be better to have all the refactoring infrastructure only in 0002,
with 0003 introducing only the NSS pieces for its internal data and
NSS.pm?

+       keyfile => 'server-password',
+       nssdatabase => 'server-cn-only.crt__server-password.key.db',
+       passphrase_cmd => 'echo secret1',
001_ssltests.pl and 002_scram.pl have NSS-related parameters, which
does not look like a clean separation to me as there are OpenSSL tests
that use some NSS parts, and the main scripts should remain neutral in
terms of setting contents, including only variables and callbacks that
should be filled specifically for each SSL implementation, no?  Aren't
we missing a second piece here with a set of callbacks for the
per-library test paths then?

+   if (defined($openssl))
+   {
+       copy_files("ssl/server-*.crt", $pgdata);
+       copy_files("ssl/server-*.key", $pgdata);
+       chmod(0600, glob "$pgdata/server-*.key") or die $!;
+       copy_files("ssl/root+client_ca.crt", $pgdata);
+       copy_files("ssl/root_ca.crt",        $pgdata);
+       copy_files("ssl/root+client.crl",    $pgdata);
+       mkdir("$pgdata/root+client-crldir");
+       copy_files("ssl/root+client-crldir/*", "$pgdata/root+client-crldir/");
+   }
+   elsif (defined($nss))
+   {
+       RecursiveCopy::copypath("ssl/nss", $pgdata . "/nss") if -e "ssl/nss";
+   }
This had better be in its own callback, for example.
--
Michael

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-03-24 at 09:28 +0900, Michael Paquier wrote:
> On Wed, Mar 24, 2021 at 12:05:35AM +0000, Jacob Champion wrote:
> > I can work around it temporarily for the
> > tests, but this will be a problem if any libpq clients load up multiple
> > independent databases for use with separate connections. Anyone know if
> > this is a supported use case for NSS?
> 
> Are you referring to the case of threading here?  This should be a
> supported case, as threads created by an application through libpq
> could perfectly use completely different connection strings.

Right, but to clarify -- I was asking if *NSS* supports loading and
using separate certificate databases as part of its API. It seems like
the internals make it possible, but I don't see the public interfaces
to actually use those internals.

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings Jacob,

* Jacob Champion (pchampion@vmware.com) wrote:
> On Wed, 2021-03-24 at 09:28 +0900, Michael Paquier wrote:
> > On Wed, Mar 24, 2021 at 12:05:35AM +0000, Jacob Champion wrote:
> > > I can work around it temporarily for the
> > > tests, but this will be a problem if any libpq clients load up multiple
> > > independent databases for use with separate connections. Anyone know if
> > > this is a supported use case for NSS?
> >
> > Are you referring to the case of threading here?  This should be a
> > supported case, as threads created by an application through libpq
> > could perfectly use completely different connection strings.
> Right, but to clarify -- I was asking if *NSS* supports loading and
> using separate certificate databases as part of its API. It seems like
> the internals make it possible, but I don't see the public interfaces
> to actually use those internals.

Yes, this is done using SECMOD_OpenUserDB, see:

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11_Functions#SECMOD_OpenUserDB

also there's info here:

https://groups.google.com/g/mozilla.dev.tech.crypto/c/Xz6Emfcue0E

We should document that, as mentioned in the link above, the NSS find
functions will find certs in all the opened databases.  As this would
all be under one application which is linked against libpq and passing
in different values for ssl_database for different connections, this
doesn't seem like it's really that much of an issue.

Thanks!

Stephen

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-03-24 at 13:00 -0400, Stephen Frost wrote:
> * Jacob Champion (pchampion@vmware.com) wrote:
> > Right, but to clarify -- I was asking if *NSS* supports loading and
> > using separate certificate databases as part of its API. It seems like
> > the internals make it possible, but I don't see the public interfaces
> > to actually use those internals.
> 
> Yes, this is done using SECMOD_OpenUserDB, see:
> 
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11_Functions#SECMOD_OpenUserDB

Ah, I had assumed that the DB-specific InitContext was using this
behind the scenes; apparently not. I will give that a try, thanks!

> also there's info here:
> 
> https://groups.google.com/g/mozilla.dev.tech.crypto/c/Xz6Emfcue0E
> 
> We should document that, as mentioned in the link above, the NSS find
> functions will find certs in all the opened databases.  As this would
> all be under one application which is linked against libpq and passing
> in different values for ssl_database for different connections, this
> doesn't seem like it's really that much of an issue.

I could see this being a problem if two client certificate nicknames
collide across multiple in-use databases, maybe?

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:
> On Wed, 2021-03-24 at 13:00 -0400, Stephen Frost wrote:
> > * Jacob Champion (pchampion@vmware.com) wrote:
> > > Right, but to clarify -- I was asking if *NSS* supports loading and
> > > using separate certificate databases as part of its API. It seems like
> > > the internals make it possible, but I don't see the public interfaces
> > > to actually use those internals.
> >
> > Yes, this is done using SECMOD_OpenUserDB, see:
> >
> > https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11_Functions#SECMOD_OpenUserDB
>
> Ah, I had assumed that the DB-specific InitContext was using this
> behind the scenes; apparently not. I will give that a try, thanks!
>
> > also there's info here:
> >
> > https://groups.google.com/g/mozilla.dev.tech.crypto/c/Xz6Emfcue0E
> >
> > We should document that, as mentioned in the link above, the NSS find
> > functions will find certs in all the opened databases.  As this would
> > all be under one application which is linked against libpq and passing
> > in different values for ssl_database for different connections, this
> > doesn't seem like it's really that much of an issue.
>
> I could see this being a problem if two client certificate nicknames
> collide across multiple in-use databases, maybe?

Right, in such a case either cert might get returned and it's possible
that the "wrong" one is returned and therefore the connection would end
up failing, assuming that they aren't actually the same and just happen
to be in both.

Seems like we could use SECMOD_OpenUserDB() and then pass the result
from that into PK11_ListCertsInSlot() and scan through the certs in just
the specified database to find the one we're looking for if we really
feel compelled to try and address this risk.  I've reached out to the
NSS folks to see if they have any thoughts about the best way to address
this.
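
To illustrate why scoping the scan to one database helps, here is a small
model in plain C.  It deliberately uses no NSS calls: ModelCert and ModelDb
are hypothetical stand-ins for a certificate and for the list of certs in
one slot, and the "first match wins" behaviour of the global search is an
assumption of the model, not something established about NSS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not NSS types. */
typedef struct ModelCert
{
    const char *nickname;
    const char *subject;
} ModelCert;

typedef struct ModelDb
{
    const ModelCert *certs;
    size_t      ncerts;
} ModelDb;

/* Two databases which both contain a cert with the nickname "client". */
static const ModelCert db_a_certs[] = {{"client", "CN=alice"}};
static const ModelCert db_b_certs[] = {{"client", "CN=bob"}};

const ModelDb model_dbs[] = {
    {db_a_certs, 1},
    {db_b_certs, 1},
};

/* Search only the given database, like a scan over one slot's cert list. */
const ModelCert *
find_cert_in_db(const ModelDb *db, const char *nickname)
{
    size_t      i;

    assert(db != NULL && nickname != NULL);

    for (i = 0; i < db->ncerts; i++)
    {
        if (strcmp(db->certs[i].nickname, nickname) == 0)
            return &db->certs[i];
    }
    return NULL;
}

/* Search every open database in order; the first match is returned. */
const ModelCert *
find_cert_anywhere(const ModelDb *dbs, size_t ndbs, const char *nickname)
{
    size_t      i;

    assert(dbs != NULL);

    for (i = 0; i < ndbs; i++)
    {
        const ModelCert *cert = find_cert_in_db(&dbs[i], nickname);

        if (cert != NULL)
            return cert;
    }
    return NULL;
}
```

In this model a connection that wants the cert from the second database gets
the one from the first when it searches globally, while the scoped lookup
returns the intended one.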

Thanks,

Stephen

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 24 Mar 2021, at 04:54, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Mar 23, 2021 at 12:38:50AM +0100, Daniel Gustafsson wrote:
>> Thanks again for reviewing, another version which addresses the remaining
>> issues will be posted soon but I wanted to get this out to give further reviews
>> something that properly works.
>
> I have been looking at the infrastructure of the tests, patches 0002
> (some refactoring) and 0003 (more refactoring with tests for NSS), and
> I am a bit confused by its state.
>
> First, I think that the split is not completely clear.  For example,
> patch 0003 has changes for OpenSSL.pm and Server.pm, but wouldn't it
> be better to have all the refactoring infrastructure only in 0002,
> with 0003 introducing only the NSS pieces for its internal data and
> NSS.pm?

Yes.  Juggling a patchset of this size is error-prone.  This is why I opened the
separate thread for this where the patch can be held apart cleaner, so let's
take this discussion over there.  I will post an updated patch there shortly.

> +       keyfile => 'server-password',
> +       nssdatabase => 'server-cn-only.crt__server-password.key.db',
> +       passphrase_cmd => 'echo secret1',
> 001_ssltests.pl and 002_scram.pl have NSS-related parameters, which
> does not look like a clean separation to me as there are OpenSSL tests
> that use some NSS parts, and the main scripts should remain neutral in
> terms of setting contents, including only variables and callbacks that
> should be filled specifically for each SSL implementation, no?  Aren't
> we missing a second piece here with a set of callbacks for the
> per-library test paths then?

Well, then again, keyfile is an OpenSSL specific parameter, it just happens to
be named quite neutrally.  I'm not sure how to best express the certificate and
key requirements of a test since the testcase is the source of truth in terms
of what it requires.  If we introduce a standard set of cert/keys which all
backends are required to supply, we could refer to those.  Tests that need
something more specific can then go into 00X_<library>.pl.  There is a balance
to strike though, there is a single backend now with at most one on the horizon
which is yet to be decided upon, making it too generic may end up making test
writing overcomplicated. Do you have any concrete ideas?

> +   if (defined($openssl))
> +   {
> +       copy_files("ssl/server-*.crt", $pgdata);
> +       copy_files("ssl/server-*.key", $pgdata);
> +       chmod(0600, glob "$pgdata/server-*.key") or die $!;
> +       copy_files("ssl/root+client_ca.crt", $pgdata);
> +       copy_files("ssl/root_ca.crt",        $pgdata);
> +       copy_files("ssl/root+client.crl",    $pgdata);
> +       mkdir("$pgdata/root+client-crldir");
> +       copy_files("ssl/root+client-crldir/*",
> "$pgdata/root+client-crldir/");
> +   }
> +   elsif (defined($nss))
> +   {
> +       RecursiveCopy::copypath("ssl/nss", $pgdata . "/nss") if -e
> "ssl/nss";
> +   }
> This had better be in its own callback, for example.

Yes, this one is a clearer case, fixed in the v2 patch which will be posted on
the separate thread.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-03-24 at 14:10 -0400, Stephen Frost wrote:
> * Jacob Champion (pchampion@vmware.com) wrote:
> > I could see this being a problem if two client certificate nicknames
> > collide across multiple in-use databases, maybe?
> 
> Right, in such a case either cert might get returned and it's possible
> that the "wrong" one is returned and therefore the connection would end
> up failing, assuming that they aren't actually the same and just happen
> to be in both.
> 
> Seems like we could use SECMOD_OpenUserDB() and then pass the result
> from that into PK11_ListCertsInSlot() and scan through the certs in just
> the specified database to find the one we're looking for if we really
> feel compelled to try and address this risk.  I've reached out to the
> NSS folks to see if they have any thoughts about the best way to address
> this.

Some additional findings (NSS 3.63), please correct me if I've made any mistakes:

The very first NSSInitContext created is special. If it contains a
database, that database will be considered part of the "internal" slot
and its certificates can be referenced directly by nickname. If it
doesn't have a database, the internal slot has no certificates, and it
will continue to have zero certificates until NSS is completely shut
down and reinitialized with a new "first" context.

Databases that are opened *after* the first one are given their own
separate slots. Any certificates that are part of those databases
seemingly can't be referenced directly by nickname. They have to be
prefixed by their token name -- a name which you don't have if you used
NSS_InitContext() to create the database. You have to use
SECMOD_OpenUserDB() instead. This explains some strange failures I was
seeing in local testing, where the order of InitContext determined
whether our client certificate selection succeeded or failed.

If you SECMOD_OpenUserDB() a database that is identical to the first
(internal) database, NSS deduplicates for you and just returns the
internal slot. Which seems like it's helpful, except you're not allowed
to close that database, and you have to know not to close it by checking
to see whether that slot is the "internal key slot". It appears to
remain open until NSS is shut down entirely.

But if you open a database that is *not* the magic internal database,
and then open a duplicate of that one, NSS creates yet another new slot
for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource
hog, depending on the global state of the process at the time libpq
opens its first connection. We won't be able to control what the parent
application will do before loading us up.
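
For the token-name prefix mentioned above, the qualified form as far as I
can tell is "token:nickname". A minimal sketch of building it in plain C
follows; qualify_nickname is a made-up helper, not part of the patch, and
treating a nickname that already contains a colon as qualified is an
assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "token:nickname" into buf.  If token is NULL or empty, or the
 * nickname already contains a ':' (assumed to mean it is already
 * qualified), the nickname is copied unchanged.
 *
 * Returns 0 on success, -1 if buf is too small to hold the result.
 * Hypothetical helper for illustration only.
 */
int
qualify_nickname(char *buf, size_t buflen,
                 const char *token, const char *nickname)
{
    int         n;

    assert(buf != NULL && nickname != NULL);

    if (token == NULL || token[0] == '\0' || strchr(nickname, ':') != NULL)
        n = snprintf(buf, buflen, "%s", nickname);
    else
        n = snprintf(buf, buflen, "%s:%s", token, nickname);

    /* snprintf reports the length it wanted; anything >= buflen truncated. */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```

The token name itself would have to come from the slot that
SECMOD_OpenUserDB() returns, which is exactly the piece that is missing when
the database was opened through NSS_InitContext().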

It also doesn't look like any of the SECMOD_* machinery that we're
looking at is thread-safe, but I'd really like to be wrong...

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 23 Mar 2021, at 20:04, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Daniel Gustafsson (daniel@yesql.se) wrote:
>>> On 22 Mar 2021, at 00:49, Stephen Frost <sfrost@snowman.net> wrote:
>>
>> Thanks for the review!  Below is a partial response, I haven't had time to
>> address all your review comments yet but I wanted to submit a rebased patchset
>> directly since the current version doesn't work after recent changes in the
>> tree. I will address the remaining comments tomorrow or the day after.
>
> Great, thanks!
>
>> This rebase also includes a fix for pgtls_init which was sent offlist by Jacob.
>> The changes in pgtls_init can potentially be used to initialize the crypto
>> context for NSS to clean up this patch, Jacob is currently looking at that.
>
> Ah, cool, sounds good.
>
>>> They aren't the same and it might not be
>>> clear what's going on if one was to somehow mix them (at least if pr_fd
>>> continues to sometimes be a void*, but I wonder why that's being
>>> done..?  more on that later..).
>>
>> To paraphrase from a later in this email, there are collisions between nspr and
>> postgres on things like BITS_PER_BYTE, and there were also collisions on basic
>> types until I learned about NO_NSPR_10_SUPPORT.  By moving the juggling of this
>> into common/nss.h we can use proper types without introducing that pollution
>> everywhere. I will address these places.
>
> Ah, ok, and great, that sounds good.

>>>> +    /*
>>>> +     * Set the fallback versions for the TLS protocol version range to a
>>>> +     * combination of our minimal requirement and the library maximum. Error
>>>> +     * messages should be kept identical to those in be-secure-openssl.c to
>>>> +     * make translations easier.
>>>> +     */
>>>
>>> Should we pull these error messages out into another header so that
>>> they're in one place to make sure they're kept consistent, if we really
>>> want to put the effort in to keep them the same..?  I'm not 100% sure
>>> that it's actually necessary to do so, but defining these in one place
>>> would help maintain this if we want to.  Also alright with just keeping
>>> the comment, not that big of a deal.
>>
>> It might make sense to pull them into common/nss.h, but seeing the error
>> message right there when reading the code does IMO make it clearer so it's a
>> double-edged sword.  Not sure what is the best option, but I'm not married to
>> the current solution so if there is consensus to pull them out somewhere I'm
>> happy to do so.
>
> My thought was to put them into some common/ssl.h or something along
> those lines but I don't see it as a big deal either way really.  You
> make a good point that having the error message there when reading the
> code is nice.

Thinking more on this, my vote is to keep them duplicated in the code for
readability.  Unless there are strong feelings against that, I think we should
at least start there.

>>> Maybe we should put a stake in the ground that says "we only support
>>> back to version X of NSS", test with that and a few more recent versions
>>> and the most recent, and then rip out anything that's needed for
>>> versions which are older than that?
>>
>> Yes, right now there is very little in the patch which caters for old versions,
>> the PR_Init call might be one of the few offenders.  There has been discussion
>> upthread about settling for a required version, combining the insights learned
>> there with a survey of which versions are commonly available packaged.
>>
>> Once we settle on a version we can confirm if PR_Init is/isn't needed and
>> remove all traces of it if not.
>
> I don't really see this as all that hard to do- I'd suggest we look at
> what systems someone might reasonably deploy v14 on.  To that end, I'd
> say "only systems which are presently supported", so: RHEL7+, Debian 9+,
> Ubuntu 16.04+.

Sounds reasonable.

> Looking at those, I see:
>
> Ubuntu 16.04: 3.28.4
> RHEL6: v3.28.4
> Debian: 3.26.2

I assume these have matching NSPR versions, placing the Debian 9 NSPR package
as the lowest required version for that?

>>> I have a pretty hard time imagining that someone is going to want to build PG
>>> v14 w/ NSS 2.0 ...
>>
>> Let alone compiling 2.0 at all on a recent system..
>
> Indeed, and given the above, it seems entirely reasonable to make the
> requirement be NSS v3+, no?  I wouldn't be against making that even
> tighter if we thought it made sense to do so.

I think anything but doing that would be incredibly unreasonable.

>>> Also- we don't seem to complain at all about a cipher being specified that we
>>> don't find?  Guess I would think that we might want to throw a WARNING in such
>>> a case, but I could possibly be convinced otherwise.
>>
>> No, I think you're right, we should throw WARNING there or possibly even a
>> higher elevel. Should that be a COMMERROR even?
>
> I suppose the thought I was having was that we might want to allow some
> string that covered all the OpenSSL and NSS ciphers that someone feels
> comfortable with and we'd just ignore the ones that don't make sense for
> the particular library we're currently built with.  Making it a
> COMMERROR seems like overkill and I'm not entirely sure we actually want
> any warning since we might then be constantly bleating about it.

Right, with a string like that we'd induce WARNING fatigue quickly.  Catching
the case of *no* ciphers enabled with a COMMERROR is going some way towards
being helpful to the user in debugging the failed connection here.
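
As a rough illustration of the behavior discussed here (not code from the
patch), the sketch below skips cipher names the current library does not know
without emitting anything, and leaves it to the caller to raise an error only
when nothing at all ends up enabled.  The table contents and the function
names cipher_is_known() and count_enabled_ciphers() are assumptions made up
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder table: names the hypothetical current TLS library accepts. */
static const char *const known_ciphers[] = {
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
};

/* Return true if the len-byte string at name matches a table entry. */
static bool
cipher_is_known(const char *name, size_t len)
{
    size_t      i;

    for (i = 0; i < sizeof(known_ciphers) / sizeof(known_ciphers[0]); i++)
    {
        if (strlen(known_ciphers[i]) == len &&
            strncmp(known_ciphers[i], name, len) == 0)
            return true;
    }
    return false;
}

/*
 * Count the entries of a colon-separated cipher list that are recognized.
 * Unknown entries are skipped silently, so a shared OpenSSL+NSS string does
 * not produce a warning per foreign name.  A result of zero is the case the
 * caller would report as an error.
 */
static int
count_enabled_ciphers(const char *list)
{
    const char *p = list;
    int         enabled = 0;

    assert(list != NULL);

    while (*p != '\0')
    {
        size_t      len = strcspn(p, ":");

        if (len > 0 && cipher_is_known(p, len))
            enabled++;
        p += len;
        if (*p == ':')
            p++;
    }
    return enabled;
}
```

A caller would treat count_enabled_ciphers(list) == 0 as the only condition
worth reporting, which avoids the warning fatigue mentioned above.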

>>>> +pg_ssl_read(PRFileDesc *fd, void *buf, PRInt32 amount, PRIntn flags,
>>>> +            PRIntervalTime timeout)
>>>> +{
>>>> +    PRRecvFN    read_fn;
>>>> +    PRInt32        n_read;
>>>> +
>>>> +    read_fn = fd->lower->methods->recv;
>>>> +    n_read = read_fn(fd->lower, buf, amount, flags, timeout);
>>>> +
>>>> +    return n_read;
>>>> +}
>>>> +
>>>> +static PRInt32
>>>> +pg_ssl_write(PRFileDesc *fd, const void *buf, PRInt32 amount, PRIntn flags,
>>>> +             PRIntervalTime timeout)
>>>> +{
>>>> +    PRSendFN    send_fn;
>>>> +    PRInt32        n_write;
>>>> +
>>>> +    send_fn = fd->lower->methods->send;
>>>> +    n_write = send_fn(fd->lower, buf, amount, flags, timeout);
>>>> +
>>>> +    return n_write;
>>>> +}
>>>> +
>>>> +static PRStatus
>>>> +pg_ssl_close(PRFileDesc *fd)
>>>> +{
>>>> +    /*
>>>> +     * Disconnect our private Port from the fd before closing out the stack.
>>>> +     * (Debug builds of NSPR will assert if we do not.)
>>>> +     */
>>>> +    fd->secret = NULL;
>>>> +    return PR_GetDefaultIOMethods()->close(fd);
>>>> +}
>>>
>>> Regarding these, I find myself wondering how they're different from the
>>> defaults..?  I mean, the above just directly called
>>> PR_GetDefaultIOMethods() to then call its close() function- are the
>>> fd->lower_methods->recv/send not the default methods?  I don't quite get
>>> what the point is from having our own callbacks here if they just do
>>> exactly what the defaults would do (or are there actually no defined
>>> defaults and you have to provide these..?).
>>
>> It's really just to cope with debug builds of NSPR which assert that fd->secret
>> is null before closing.
>
> And we have to override the recv/send functions for this too..?  Sorry,
> my comment wasn't just about the close() method but about the others
> too.

Ah, no we can ditch the .send and .recv functions and stick with the default
built-ins, I just confirmed this and removed them.  I think they are leftovers
from when I injected debug code there during development, they were as you say
copies of the default.

>>>> +    /*
>>>> +     * Return the underlying PRFileDesc which can be used to access
>>>> +     * information on the connection details. There is no SSL context per se.
>>>> +     */
>>>> +    if (strcmp(struct_name, "NSS") == 0)
>>>> +        return conn->pr_fd;
>>>> +    return NULL;
>>>> +}
>>>
>>> Is there never a reason someone might want the pointer returned by
>>> NSS_InitContext?  I don't know that there is but it might be something
>>> to consider (we could even possibly have our own structure returned by
>>> this function which includes both, maybe..?).  Not sure if there's a
>>> sensible use-case for that or not just wanted to bring it up as it's
>>> something I asked myself while reading through this patch.
>>
>> Not sure I understand what you're asking for here, did you mean "is there ever
>> a reason"?
>
> Eh, poor wording on my part.  You're right, the question, reworded
> again, was "Would someone want to get the context returned by
> NSS_InitContext?".  If we think there's a reason that someone might want
> that context then perhaps we should allow getting it, in addition to the
> pr_fd.  If there's really no reason to ever want the context from
> NSS_InitContext then what you have here where we're returning pr_fd is
> probably fine.

I can't think of any reason, maybe Jacob who has been knee-deep in NSS contexts
has insights which tell a different story?

>>>> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
>>>> index c601071838..7f10da3010 100644
>>>> --- a/src/interfaces/libpq/fe-secure.c
>>>> +++ b/src/interfaces/libpq/fe-secure.c
>>>> @@ -448,6 +448,27 @@ PQdefaultSSLKeyPassHook_OpenSSL(char *buf, int size, PGconn *conn)
>>>> }
>>>> #endif                            /* USE_OPENSSL */
>>>>
>>>> +#ifndef USE_NSS
>>>> +
>>>> +PQsslKeyPassHook_nss_type
>>>> +PQgetSSLKeyPassHook_nss(void)
>>>> +{
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +void
>>>> +PQsetSSLKeyPassHook_nss(PQsslKeyPassHook_nss_type hook)
>>>> +{
>>>> +    return;
>>>> +}
>>>> +
>>>> +char *
>>>> +PQdefaultSSLKeyPassHook_nss(PK11SlotInfo * slot, PRBool retry, void *arg)
>>>> +{
>>>> +    return NULL;
>>>> +}
>>>> +#endif                            /* USE_NSS */
>>>
>>> Isn't this '!USE_NSS'?
>>
>> Technically it is, but using just /* USE_NSS */ is consistent with the rest of
>> the blocks in the file.
>
> Hrmpf.  I guess it seems a bit confusing to me to have to go find the
> opening #ifndef to realize that it's actually !USE_NSS..  In other words,
> I would think we'd actually want to fix all of these, heh.  I only
> actually see one case on a quick grep where it's wrong for USE_OPENSSL
> and so that doesn't seem like it's really a precedent and is more of a
> bug.  We certainly say 'not OPENSSL' in one place today too and also
> have a number of places where we have: #endif ... /* ! WHATEVER */.

No disagreement from me.  To cut down the size of this patchset however I
propose that we tackle this separately and leave this as is in this thread
since it's in line with the rest of the file (for now).

>>>> Subject: [PATCH v30 2/9] Refactor SSL testharness for multiple library
>>>>
>>>> The SSL testharness was fully tied to OpenSSL in the way the server was
>>>> set up and reconfigured. This refactors the SSLServer module into a SSL
>>>> library agnostic SSL/Server module which in turn use SSL/Backend/<lib>
>>>> modules for the implementation details.
>>>>
>>>> No changes are done to the actual tests, this only change how setup and
>>>> teardown is performed.
>>>
>>> Presumably this could be committed ahead of the main NSS support?
>>
>> Correct, I think this has merits even if NSS support is ultimately rejected.
>
> Ok- could you break it out on to its own thread and I'll see about
> committing it soonish, to get it out of the way?

It was already on its own thread, as we discussed offlist.  I have since
rebased and expanded that patch over in that thread which has gotten review
that needs to be addressed.  As such, I will not update that patch in the
series in this thread but keep the changes on that thread, and then pull them
back into here when ready.

>>>> Subject: [PATCH v30 5/9] nss: Documentation
>>>> +++ b/doc/src/sgml/acronyms.sgml
>>>> @@ -684,6 +717,16 @@
>>>>    </listitem>
>>>>   </varlistentry>
>>>>
>>>> +   <varlistentry>
>>>> +    <term><acronym>TLS</acronym></term>
>>>> +    <listitem>
>>>> +     <para>
>>>> +      <ulink url="https://en.wikipedia.org/wiki/Transport_Layer_Security">
>>>> +      Transport Layer Security</ulink>
>>>> +     </para>
>>>> +    </listitem>
>>>> +   </varlistentry>
>>>
>>> We don't have this already..?  Surely we should..
>>
>> We really should, especially since we've had <acronym>TLS</acronym> in
>> config.sgml since 2014 (c6763156589).  That's another small piece that could be
>> committed on its own to cut down the size of this patchset (even if only by a
>> tiny amount).
>
> Ditto on this. :)

Done in https://postgr.es/m/27109504-82DB-41A8-8E63-C0498314F5B0@yesql.se

>>>> @@ -1288,7 +1305,9 @@ include_dir 'conf.d'
>>>>        connections using TLS version 1.2 and lower are affected.  There is
>>>>        currently no setting that controls the cipher choices used by TLS
>>>>        version 1.3 connections.  The default value is
>>>> -        <literal>HIGH:MEDIUM:+3DES:!aNULL</literal>.  The default is usually a
>>>> +        <literal>HIGH:MEDIUM:+3DES:!aNULL</literal> for servers which have
>>>> +        been built with <productname>OpenSSL</productname> as the
>>>> +        <acronym>SSL</acronym> library.  The default is usually a
>>>>        reasonable choice unless you have specific security requirements.
>>>>       </para>
>>>
>>> Shouldn't we say something here wrt NSS?
>>
>> We should, but I'm not entirely sure what just yet. Need to revisit that.
>
> Not sure if we really want to do this but at least with ssllabs.com,
> postgresql.org gets an 'A' rating with this set:
>
> ECDHE-ECDSA-CHACHA20-POLY1305
> ECDHE-RSA-CHACHA20-POLY1305
> ECDHE-ECDSA-AES128-GCM-SHA256
> ECDHE-RSA-AES128-GCM-SHA256
> ECDHE-ECDSA-AES256-GCM-SHA384
> ECDHE-RSA-AES256-GCM-SHA384
> DHE-RSA-AES128-GCM-SHA256
> DHE-RSA-AES256-GCM-SHA384
> ECDHE-ECDSA-AES128-SHA256
> ECDHE-RSA-AES128-SHA256
> ECDHE-ECDSA-AES128-SHA
> ECDHE-RSA-AES256-SHA384
> ECDHE-RSA-AES128-SHA
> ECDHE-ECDSA-AES256-SHA384
> ECDHE-ECDSA-AES256-SHA
> ECDHE-RSA-AES256-SHA
> DHE-RSA-AES128-SHA256
> DHE-RSA-AES128-SHA
> DHE-RSA-AES256-SHA256
> DHE-RSA-AES256-SHA
> ECDHE-ECDSA-DES-CBC3-SHA
> ECDHE-RSA-DES-CBC3-SHA
> EDH-RSA-DES-CBC3-SHA
> AES128-GCM-SHA256
> AES256-GCM-SHA384
> AES128-SHA256
> AES256-SHA256
> AES128-SHA
> AES256-SHA
> DES-CBC3-SHA
> !DSS
>
> Which also seems kinda close to what the default when built with OpenSSL
> ends up being?  Though the ssllabs report does list which ones it
> thinks are weak and so we might consider excluding those by default too:
>
> https://www.ssllabs.com/ssltest/analyze.html?d=postgresql.org&s=2a02%3a16a8%3adc51%3a0%3a0%3a0%3a0%3a50

Agreed, maintaining parity (or thereabouts) with OpenSSL defaults taking
industry best practices into account is probably what we should aim for.

>>>> @@ -1490,8 +1509,11 @@ include_dir 'conf.d'
>>>>       <para>
>>>>        Sets an external command to be invoked when a passphrase for
>>>>        decrypting an SSL file such as a private key needs to be obtained.  By
>>>> -        default, this parameter is empty, which means the built-in prompting
>>>> -        mechanism is used.
>>>> +        default, this parameter is empty. When the server is using
>>>> +        <productname>OpenSSL</productname>, this means the built-in prompting
>>>> +        mechanism is used. When using <productname>NSS</productname>, there is
>>>> +        no default prompting so a blank callback will be used returning an
>>>> +        empty password.
>>>>       </para>
>>>
>>> Maybe we should point out here that this requires the database to not
>>> require a password..?  So if they have one, they need to set this, or
>>> maybe we should provide a default one..
>>
>> I've added a sentence on not using a password for the cert database.  I'm not
>> sure if providing a default one is a good idea but it's no less insecure than
>> having no password really..
>
> I was meaning a default callback to prompt, not sure if that was clear.

Ah, no that's not what I thought you meant.  Do you have any thoughts on what
that callback would look like? Take a password on a TTY input?

Below are a few fixes addressed from the original review email:

>>> +    /*
>>> +     * Set up the custom IO layer.
>>> +     */
>>
>> Might be good to mention that the IO Layer is what sets up the
>> read/write callbacks to be used.
>
> Good point, will do in the next version of the patchset.

Fixed.

>>> +    port->pr_fd = SSL_ImportFD(model, pr_fd);
>>> +    if (!port->pr_fd)
>>> +    {
>>> +        ereport(COMMERROR,
>>> +                (errmsg("unable to initialize")));
>>> +        return -1;
>>> +    }
>>
>> Maybe a comment and a better error message for this?
>
> Will do.

Fixed.

>>>
>>> +    PR_Close(model);
>>
>> This might deserve one also, the whole 'model' construct is a bit
>> different. :)
>
> Agreed. will do.


Fixed.

>> Also, I get that they do similar jobs and that one is in the frontend
>> and the other is in the backend, but I'm not a fan of having two
>> 'ssl_protocol_version_to_nss()'s functions that take different argument
>> types but have exact same name and do functionally different things..
>
> Good point, I'll change that.

Fixed.

>>> +    /*
>>> +     * Configure cipher policy.
>>> +     */
>>> +    status = NSS_SetDomesticPolicy();
>>> +    if (status != SECSuccess)
>>> +    {
>>> +        printfPQExpBuffer(&conn->errorMessage,
>>> +                          libpq_gettext("unable to configure cipher policy: %s"),
>>> +                          pg_SSLerrmessage(PR_GetError()));
>>> +
>>> +        return PGRES_POLLING_FAILED;
>>> +    }
>>
>> Probably good to pull over at least some parts of the comments made in
>> the backend code about SetDomesticPolicy() actually enabling everything
>> (just like all the policies apparently do)...
>
> Good point, will do.

Fixed.

>>> +int
>>> +be_tls_open_server(Port *port)
>>> +{
>>> +    SECStatus    status;
>>> +    PRFileDesc *model;
>>> +    PRFileDesc *pr_fd;
>>
>> pr_fd here is materially different from port->pr_fd, no?  As in, one is
>> the NSS raw TCP fd while the other is the SSL fd, right?  Maybe we
>> should use two different variable names to try and make sure they don't
>> get confused?  Might even set this to NULL after we are done with it
>> too..  Then again, I see later on that when we do the dance with the
>> 'model' PRFileDesc that we just use the same variable- maybe we should
>> do that?  That is, just get rid of this 'pr_fd' and use port->pr_fd
>> always?
>
> Hmm, I think you're right. I will try that for the next patchset version.


>> Similar comments to the backend code- should we just always use
>> conn->pr_fd?  Or should we rename pr_fd to something else?
>
> Renaming is probably not a bad idea, will fix.


Both fixed.

Additionally, a few other off-list reported issues are also fixed in this
version (such as fixing the silly markup doc error and testplan off-by-one etc).

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Fri, 2021-03-26 at 00:22 +0100, Daniel Gustafsson wrote:
> > On 23 Mar 2021, at 20:04, Stephen Frost <sfrost@snowman.net> wrote:
> > 
> > Eh, poor wording on my part.  You're right, the question, reworded
> > again, was "Would someone want to get the context returned by
> > NSS_InitContext?".  If we think there's a reason that someone might want
> > that context then perhaps we should allow getting it, in addition to the
> > pr_fd.  If there's really no reason to ever want the context from
> > NSS_InitContext then what you have here where we're returning pr_fd is
> > probably fine.
> 
> I can't think of any reason, maybe Jacob who has been knee-deep in NSS contexts
> has insights which tell a different story?

The only thing you can do with a context pointer is shut it down, and I
don't think that's something that should be exposed.

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:
> On Wed, 2021-03-24 at 14:10 -0400, Stephen Frost wrote:
> > * Jacob Champion (pchampion@vmware.com) wrote:
> > > I could see this being a problem if two client certificate nicknames
> > > collide across multiple in-use databases, maybe?
> >
> > Right, in such a case either cert might get returned and it's possible
> > that the "wrong" one is returned and therefore the connection would end
> > up failing, assuming that they aren't actually the same and just happen
> > to be in both.
> >
> > Seems like we could use SECMOD_OpenUserDB() and then pass the result
> > from that into PK11_ListCertsInSlot() and scan through the certs in just
> > the specified database to find the one we're looking for if we really
> > feel compelled to try and address this risk.  I've reached out to the
> > NSS folks to see if they have any thoughts about the best way to address
> > this.
>
> Some additional findings (NSS 3.63), please correct me if I've made any mistakes:
>
> The very first NSSInitContext created is special. If it contains a
> database, that database will be considered part of the "internal" slot
> and its certificates can be referenced directly by nickname. If it
> doesn't have a database, the internal slot has no certificates, and it
> will continue to have zero certificates until NSS is completely shut
> down and reinitialized with a new "first" context.
>
> Databases that are opened *after* the first one are given their own
> separate slots. Any certificates that are part of those databases
> seemingly can't be referenced directly by nickname. They have to be
> prefixed by their token name -- a name which you don't have if you used
> NSS_InitContext() to create the database. You have to use
> SECMOD_OpenUserDB() instead. This explains some strange failures I was
> seeing in local testing, where the order of InitContext determined
> whether our client certificate selection succeeded or failed.

This is more-or-less what we would want though, right..?  If a user asks
for a connection with ssl_database=blah and sslcert=whatever, we'd want
to open database 'blah' and search (just) that database for cert
'whatever'.  We could possibly offer other options in the future but
certainly this would work and be the most straight-forward and expected
behavior.

> If you SECMOD_OpenUserDB() a database that is identical to the first
> (internal) database, NSS deduplicates for you and just returns the
> internal slot. Which seems like it's helpful, except you're not
> allowed to close that database, and you have to know not to close it
> by checking to see whether that slot is the "internal key slot". It
> appears to remain open until NSS is shut down entirely.

Seems like we shouldn't do that and should just use SECMOD_OpenUserDB()
for opening databases.

> But if you open a database that is *not* the magic internal database,
> and then open a duplicate of that one, NSS creates yet another new slot
> for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource
> hog, depending on the global state of the process at the time libpq
> opens its first connection. We won't be able to control what the parent
> application will do before loading us up.

I would think we'd want to avoid re-opening the same database multiple
times, to avoid the duplicate slots and such.  If the application code
does it themselves, well, there's not much we can do about that, but we
could at least avoid doing so in *our* code.  I wouldn't expect us to be
opening hundreds of databases either and so keeping a simple list around
of what we've opened and scanning it seems like it'd be workable.  Of
course, this could likely be improved in the future but I would think
that'd be good for an initial implementation.

We could also just generally caution users in our documentation against
using multiple databases.  The NSS folks discourage doing so and it
doesn't strike me as being a terribly useful thing to do anyway, at
least from within one invocation of an application.  Still, if we could
make it work reasonably well, then I'd say we should go ahead and do so.

> It also doesn't look like any of the SECMOD_* machinery that we're
> looking at is thread-safe, but I'd really like to be wrong...

That's unfortunate but solvable by using our own locks, similar
to what's done in fe-secure-openssl.c.

Thanks!

Stephen

Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Fri, 2021-03-26 at 15:33 -0400, Stephen Frost wrote:
> * Jacob Champion (pchampion@vmware.com) wrote:
> > Databases that are opened *after* the first one are given their own
> > separate slots. [...]
> 
> This is more-or-less what we would want though, right..?  If a user asks
> for a connection with ssl_database=blah and sslcert=whatever, we'd want
> to open database 'blah' and search (just) that database for cert
> 'whatever'.  We could possibly offer other options in the future but
> certainly this would work and be the most straight-forward and expected
> behavior.

Yes, but see below.

> > If you SECMOD_OpenUserDB() a database that is identical to the first
> > (internal) database, NSS deduplicates for you and just returns the
> > internal slot. Which seems like it's helpful, except you're not
> > allowed to close that database, and you have to know not to close it
> > by checking to see whether that slot is the "internal key slot". It
> > appears to remain open until NSS is shut down entirely.
> 
> Seems like we shouldn't do that and should just use SECMOD_OpenUserDB()
> for opening databases.

We don't have control over whether or not this happens. If the
application embedding libpq has already loaded the database into the
internal slot via its own NSS initialization, then when we call
SECMOD_OpenUserDB() for that same database, the internal slot will be
returned and we have to handle it accordingly.

It's not a huge amount of work, but it is magic knowledge that has to
be maintained, especially in the absence of specialized clientside
tests.

> > But if you open a database that is *not* the magic internal database,
> > and then open a duplicate of that one, NSS creates yet another new slot
> > for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource
> > hog, depending on the global state of the process at the time libpq
> > opens its first connection. We won't be able to control what the parent
> > application will do before loading us up.
> 
> I would think we'd want to avoid re-opening the same database multiple
> times, to avoid the duplicate slots and such.  If the application code
> does it themselves, well, there's not much we can do about that, but we
> could at least avoid doing so in *our* code.  I wouldn't expect us to be
> opening hundreds of databases either and so keeping a simple list around
> of what we've opened and scanning it seems like it'd be workable.  Of
> course, this could likely be improved in the future but I would think
> that'd be good for an initial implementation.
> 
> [...]
> 
> > It also doesn't look like any of the SECMOD_* machinery that we're
> > looking at is thread-safe, but I'd really like to be wrong...
> 
> That's unfortunate but solvable by using our own locks, similar
> to what's done in fe-secure-openssl.c.

Yeah. I was hoping to avoid implementing our own locks and refcounts,
but it seems like it's going to be required.
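
To make "our own locks and refcounts" concrete, here is a minimal sketch of
the simple list Stephen describes, guarded by a pthread mutex.  It is an
assumption-laden illustration, not code from the patch: OpenedDb,
db_acquire() and db_release() are hypothetical names, and the places where
the real code would open or close the NSS database are only marked with
comments.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One entry per certificate database this library has opened itself. */
typedef struct OpenedDb
{
    char       *path;
    int         refcount;
    struct OpenedDb *next;
} OpenedDb;

static OpenedDb *opened_dbs = NULL;
static pthread_mutex_t opened_dbs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the entry for path, creating it on first use; NULL on OOM. */
static OpenedDb *
db_acquire(const char *path)
{
    OpenedDb   *db;

    pthread_mutex_lock(&opened_dbs_lock);
    for (db = opened_dbs; db != NULL; db = db->next)
    {
        if (strcmp(db->path, path) == 0)
        {
            db->refcount++;     /* already open: just share it */
            pthread_mutex_unlock(&opened_dbs_lock);
            return db;
        }
    }

    db = malloc(sizeof(OpenedDb));
    if (db != NULL)
    {
        size_t      len = strlen(path) + 1;

        db->path = malloc(len);
        if (db->path == NULL)
        {
            free(db);
            db = NULL;
        }
        else
        {
            memcpy(db->path, path, len);
            /* Real code would open the database here, exactly once. */
            db->refcount = 1;
            db->next = opened_dbs;
            opened_dbs = db;
        }
    }
    pthread_mutex_unlock(&opened_dbs_lock);
    return db;
}

/* Drop one reference; returns what remains (0 means the entry is gone). */
static int
db_release(OpenedDb *db)
{
    OpenedDb  **link;
    int         remaining;

    pthread_mutex_lock(&opened_dbs_lock);
    assert(db->refcount > 0);
    remaining = --db->refcount;
    if (remaining == 0)
    {
        for (link = &opened_dbs; *link != NULL; link = &(*link)->next)
        {
            if (*link == db)
            {
                *link = db->next;
                break;
            }
        }
        /* Real code would close the database here, unless NSS handed */
        /* back the internal key slot, which must not be closed.      */
        free(db->path);
        free(db);
    }
    pthread_mutex_unlock(&opened_dbs_lock);
    return remaining;
}
```

This only prevents duplicate opens made by the library itself; it cannot see
databases that the embedding application opened on its own.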

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:
> On Fri, 2021-03-26 at 15:33 -0400, Stephen Frost wrote:
> > * Jacob Champion (pchampion@vmware.com) wrote:
> > > Databases that are opened *after* the first one are given their own
> > > separate slots. [...]
> >
> > This is more-or-less what we would want though, right..?  If a user asks
> > for a connection with ssl_database=blah and sslcert=whatever, we'd want
> > to open database 'blah' and search (just) that database for cert
> > 'whatever'.  We could possibly offer other options in the future but
> > certainly this would work and be the most straight-forward and expected
> > behavior.
>
> Yes, but see below.
>
> > > If you SECMOD_OpenUserDB() a database that is identical to the first
> > > (internal) database, NSS deduplicates for you and just returns the
> > > internal slot. Which seems like it's helpful, except you're not
> > > allowed to close that database, and you have to know not to close it
> > > by checking to see whether that slot is the "internal key slot". It
> > > appears to remain open until NSS is shut down entirely.
> >
> > Seems like we shouldn't do that and should just use SECMOD_OpenUserDB()
> > for opening databases.
>
> We don't have control over whether or not this happens. If the
> application embedding libpq has already loaded the database into the
> internal slot via its own NSS initialization, then when we call
> SECMOD_OpenUserDB() for that same database, the internal slot will be
> returned and we have to handle it accordingly.
>
> It's not a huge amount of work, but it is magic knowledge that has to
> be maintained, especially in the absence of specialized clientside
> tests.

Ah..  yeah, fair enough.  We could document that we discourage
applications from doing so, but I agree that we'll need to deal with it
since it could happen.

> > > But if you open a database that is *not* the magic internal database,
> > > and then open a duplicate of that one, NSS creates yet another new slot
> > > for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource
> > > hog, depending on the global state of the process at the time libpq
> > > opens its first connection. We won't be able to control what the parent
> > > application will do before loading us up.
> >
> > I would think we'd want to avoid re-opening the same database multiple
> > times, to avoid the duplicate slots and such.  If the application code
> > does it themselves, well, there's not much we can do about that, but we
> > could at least avoid doing so in *our* code.  I wouldn't expect us to be
> > opening hundreds of databases either and so keeping a simple list around
> > of what we've opened and scanning it seems like it'd be workable.  Of
> > course, this could likely be improved in the future but I would think
> > that'd be good for an initial implementation.
> >
> > [...]
> >
> > > It also doesn't look like any of the SECMOD_* machinery that we're
> > > looking at is thread-safe, but I'd really like to be wrong...
> >
> > That's unfortunate but solvable by using our own locks, similar
> > to what's done in fe-secure-openssl.c.
>
> Yeah. I was hoping to avoid implementing our own locks and refcounts,
> but it seems like it's going to be required.

Yeah, afraid so.

Thanks!

Stephen

Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Fri, 2021-03-26 at 18:05 -0400, Stephen Frost wrote:
> * Jacob Champion (pchampion@vmware.com) wrote:
> > Yeah. I was hoping to avoid implementing our own locks and refcounts,
> > but it seems like it's going to be required.
> 
> Yeah, afraid so.

I think it gets worse, after having debugged some confusing crashes.
There's already been a discussion on PR_Init upthread a bit:

> Once we settle on a version we can confirm if PR_Init is/isn't needed and
> remove all traces of it if not.

What the NSPR documentation omits is that implicit initialization is
not threadsafe. So NSS_InitContext() is technically "threadsafe"
because it's built on PR_CallOnce(), but if you haven't called
PR_Init() yet, multiple simultaneous PR_CallOnce() calls can crash into
each other.

So, fine. We just add our own locks around NSS_InitContext() (or around
a single call to PR_Init()). Well, the first thread to win and
successfully initialize NSPR gets marked as the "primordial" thread
using thread-local state. And it gets a pthread destructor that does...
something. So lazy initialization seems a bit dangerous regardless of
whether or not we add locks, but I can't really prove whether it's
dangerous or not in practice.
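
For reference, the "add our own locks around a single call" idea could look
roughly like the sketch below.  It is only an illustration under stated
assumptions: runtime_init_stub() is a hypothetical stand-in for the one-time
runtime setup (PR_Init() in the discussion above), and the counter exists
purely so the behavior can be observed.  It does nothing about which thread
ends up being treated as the "primordial" one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t tls_init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool tls_initialized = false;
static int  runtime_init_calls = 0; /* demonstration only */

/* Hypothetical stand-in for the real one-time runtime initialization. */
static void
runtime_init_stub(void)
{
    runtime_init_calls++;
}

/*
 * Safe to call from any thread, any number of times: the first caller to
 * take the lock performs the initialization, later callers see the flag
 * already set and return without repeating it.
 */
static void
ensure_tls_runtime_initialized(void)
{
    pthread_mutex_lock(&tls_init_lock);
    if (!tls_initialized)
    {
        runtime_init_stub();
        tls_initialized = true;
    }
    assert(tls_initialized);
    pthread_mutex_unlock(&tls_init_lock);
}
```

The lock serializes initialization, but whichever thread happens to arrive
first still becomes the initializing thread, so the concern about
thread-specific state remains.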

I do know that only the primordial thread is allowed to call
PR_Cleanup(), and of course we wouldn't be able to control which thread
does what for libpq clients. I don't know what other assumptions are
made about the primordial thread, or if there are any platform-specific 
behaviors with older versions of NSPR that we'd need to worry about. It
used to be that the primordial thread was not allowed to exit before
any other threads, but that restriction was lifted at some point [1].

I think we're going to need some analogue to PQinitOpenSSL() to help
client applications cut through the mess, but I'm not sure what it
should look like, or how we would maintain any sort of API
compatibility between the two flavors. And does libpq already have some
notion of a "main thread" that I'm missing?

--Jacob

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=294955

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Wed, Mar 31, 2021 at 10:15:15PM +0000, Jacob Champion wrote:
> I think we're going to need some analogue to PQinitOpenSSL() to help
> client applications cut through the mess, but I'm not sure what it
> should look like, or how we would maintain any sort of API
> compatibility between the two flavors. And does libpq already have some
> notion of a "main thread" that I'm missing?

Nope as far as I recall.  With OpenSSL, the initialization of the SSL
mutex lock and the crypto callback initialization is done by the first
thread in.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Wed, Mar 31, 2021 at 10:15:15PM +0000, Jacob Champion wrote:
> > I think we're going to need some analogue to PQinitOpenSSL() to help
> > client applications cut through the mess, but I'm not sure what it
> > should look like, or how we would maintain any sort of API
> > compatibility between the two flavors. And does libpq already have some
> > notion of a "main thread" that I'm missing?
>
> Nope as far as I recall.  With OpenSSL, the initialization of the SSL
> mutex lock and the crypto callback initialization is done by the first
> thread in.

Yeah, we haven't got any such concept in libpq.  I do think that some of
this can simply be documented as "if you do this, then you need to make
sure to do this".

Thanks,

Stephen

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 23 Mar 2021, at 00:38, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 22 Mar 2021, at 00:49, Stephen Frost <sfrost@snowman.net> wrote:

Attached is a rebase on top of the recent SSL related commits with a few more
fixes from previous reviews.

>>> +++ b/src/interfaces/libpq/fe-connect.c
>>> @@ -359,6 +359,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
>>>         "Target-Session-Attrs", "", 15, /* sizeof("prefer-standby") = 15 */
>>>     offsetof(struct pg_conn, target_session_attrs)},
>>>
>>> +    {"cert_database", NULL, NULL, NULL,
>>> +        "CertificateDatabase", "", 64,
>>> +    offsetof(struct pg_conn, cert_database)},
>>
>> I mean, maybe nitpicking here, but all the other SSL stuff is
>> 'sslsomething' and the backend version of this is 'ssl_database', so
>> wouldn't it be more consistent to have this be 'ssldatabase'?
>
> That's a good point, I was clearly Stockholm syndromed since I hadn't reflected
> on that but it's clearly wrong.  Will fix.

Fixed

>>> +    /*
>>> +     * If we don't have a certificate database, the system trust store is the
>>> +     * fallback we can use. If we fail to initialize that as well, we can
>>> +     * still attempt a connection as long as the sslmode isn't verify*.
>>> +     */
>>> +    if (!conn->cert_database && conn->sslmode[0] == 'v')
>>> +    {
>>> +        status = pg_load_nss_module(&ca_trust, ca_trust_name, "\"Root Certificates\"");
>>> +        if (status != SECSuccess)
>>> +        {
>>> +            printfPQExpBuffer(&conn->errorMessage,
>>> +                              libpq_gettext("WARNING: unable to load NSS trust module \"%s\" : %s"),
>>> +                              ca_trust_name,
>>> +                              pg_SSLerrmessage(PR_GetError()));
>>> +
>>> +            return PGRES_POLLING_FAILED;
>>> +        }
>>> +    }
>>
>> Maybe have something a bit more here about "maybe you should specify a
>> cert_database" or such?
>
> Good point, will expand with more detail.

Fixed.

>>> +    /*
>>> +     * Specify which hostname we are expecting to talk to. This is required,
>>> +     * albeit mostly applies to when opening a connection to a traditional
>>> +     * http server it seems.
>>> +     */
>>> +    SSL_SetURL(conn->pr_fd, (conn->connhost[conn->whichhost]).host);
>>
>> We should probably also set SNI, if available (NSS 3.12.6 it seems?),
>> since it looks like that's going to be added to the OpenSSL code.
>
> Good point, will do.

Actually, it turns out that NSS 3.12.6 introduced the serverside SNI handling
by providing callbacks to respond to hostname verification.  There was no
mention of clientside SNI in the NSS documentation that I could find, reading
the code however SSL_SetURL does actually set the SNI extension in the
ClientHello.  So, clientside SNI (which is what is proposed for the OpenSSL
backend) is already in.

>>> +    do
>>> +    {
>>> +        status = SSL_ForceHandshake(conn->pr_fd);
>>> +    }
>>> +    while (status != SECSuccess && PR_GetError() == PR_WOULD_BLOCK_ERROR);
>>
>> We don't seem to have this loop in the backend code..  Is there some
>> reason that we don't?  Is it possible that we need to have a loop here
>> too?  I recall in the GSS encryption code there were definitely things
>> during setup that had to be looped back over on both sides to make sure
>> everything was finished ...
>
> Off the cuff I can't remember, will look into it.

Thinking more about this, I don't think we should have the loop at all in the
frontend either.  The reason it was added was to cover cases where we're
confused about blocking but I can't actually see the case I was worried about
in the code so I think it's useless.  Removed.

>>> +    if (strcmp(attribute_name, "protocol") == 0)
>>> +    {
>>> +        switch (channel.protocolVersion)
>>> +        {
>>> +#ifdef SSL_LIBRARY_VERSION_TLS_1_3
>>> +            case SSL_LIBRARY_VERSION_TLS_1_3:
>>> +                return "TLSv1.3";
>>> +#endif
>>> +#ifdef SSL_LIBRARY_VERSION_TLS_1_2
>>> +            case SSL_LIBRARY_VERSION_TLS_1_2:
>>> +                return "TLSv1.2";
>>> +#endif
>>> +#ifdef SSL_LIBRARY_VERSION_TLS_1_1
>>> +            case SSL_LIBRARY_VERSION_TLS_1_1:
>>> +                return "TLSv1.1";
>>> +#endif
>>> +            case SSL_LIBRARY_VERSION_TLS_1_0:
>>> +                return "TLSv1.0";
>>> +            default:
>>> +                return "unknown";
>>> +        }
>>> +    }
>>
>> Not sure that it really matters, but this seems like it might be useful
>> to have as its own function...  Maybe even a data structure that both
>> functions use just in opposite directions.  Really minor tho. :)
>
> I suppose that wouldn't be a bad thing, will fix.

Moved this into a shared function as it's used by both frontend and backend.
It's moved mostly verbatim as it seemed simple enough to not warrant much
complication.
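
For reference, the table-driven variant suggested in the review could look
roughly like the sketch below.  It is not what the patch does.  The SKETCH_*
constants are stand-ins for the SSL_LIBRARY_VERSION_TLS_* macros so that the
example compiles without NSS headers (the values are the TLS wire-format
version numbers), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for SSL_LIBRARY_VERSION_TLS_*; values are TLS wire versions. */
#define SKETCH_TLS_1_0 0x0301
#define SKETCH_TLS_1_1 0x0302
#define SKETCH_TLS_1_2 0x0303
#define SKETCH_TLS_1_3 0x0304

typedef struct
{
    int         version;
    const char *name;
} tls_version_name;

/* One table, consulted in both directions. */
static const tls_version_name tls_versions[] = {
    {SKETCH_TLS_1_0, "TLSv1.0"},
    {SKETCH_TLS_1_1, "TLSv1.1"},
    {SKETCH_TLS_1_2, "TLSv1.2"},
    {SKETCH_TLS_1_3, "TLSv1.3"},
};

#define N_TLS_VERSIONS (sizeof(tls_versions) / sizeof(tls_versions[0]))

/* Version code to display name; "unknown" if the code is not in the table. */
static const char *
tls_version_to_name(int version)
{
    for (size_t i = 0; i < N_TLS_VERSIONS; i++)
    {
        if (tls_versions[i].version == version)
            return tls_versions[i].name;
    }
    return "unknown";
}

/* Display name to version code; -1 if the name is not in the table. */
static int
tls_name_to_version(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_TLS_VERSIONS; i++)
    {
        if (strcmp(tls_versions[i].name, name) == 0)
            return tls_versions[i].version;
    }
    return -1;
}
```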

--
Daniel Gustafsson        https://vmware.com/



Attachment

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Mon, Apr 05, 2021 at 12:13:43AM +0200, Daniel Gustafsson wrote:
> Another rebase to cope with recent changes (hmac, ssl tests etc) that
> conflicted and broke this patchset.

Please find an updated set, v35, attached, and my apologies for
breaking again your patch set.  While testing this patch set and
adjusting the SSL tests with HEAD, I have noticed what looks like a
bug with the DN mapping that NSS does not run.  The connection strings
are the same in v35 and in v34, with dbname only changing in-between.

Just to be sure, because I could have done something wrong with the
rebase of v35, I have done the same test with v34 applied on top of
dfc843d and things are failing.  So it seems to me that there is an
issue with the DN mapping part.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Mon, Apr 05, 2021 at 11:12:22AM +0900, Michael Paquier wrote:
> Please find an updated set, v35, attached, and my apologies for
> breaking again your patch set.  While testing this patch set and
> adjusting the SSL tests with HEAD, I have noticed what looks like a
> bug with the DN mapping that NSS does not run.  The connection strings
> are the same in v35 and in v34, with dbname only changing in-between.
>
> Just to be sure, because I could have done something wrong with the
> rebase of v35, I have done the same test with v34 applied on top of
> dfc843d and things are failing.  So it seems to me that there is an
> issue with the DN mapping part.

For now, I have marked this patch set as returned with feedback as it
is still premature for integration, and there are still bugs in it.
FWIW, I think that there is a future for providing an alternative to
OpenSSL, so, even if it could not make it for this release, I'd like
to push forward with this area more seriously as of 15.  The recent
libcrypto-related refactorings were one step in this direction, as
well.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 25 Mar 2021, at 00:56, Jacob Champion <pchampion@vmware.com> wrote:

> Databases that are opened *after* the first one are given their own separate
> slots. Any certificates that are part of those databases seemingly can't be
> referenced directly by nickname. They have to be prefixed by their token name
> -- a name which you don't have if you used NSS_InitContext() to create the
> database. You have to use SECMOD_OpenUserDB() instead. This explains some
> strange failures I was seeing in local testing, where the order of
> InitContext determined whether our client certificate selection succeeded or
> failed.

Sorry for the latency in responding, but I'm now back from parental leave.

AFAICT the tokenname for the database can be set with the dbTokenDescription
member in the NSSInitParameters struct passed to NSS_InitContext() (documented
in nss.h).  Using this we can avoid the messier SECMOD machinery and use the
token in the auth callback to refer to the database we loaded.  I hacked this
up in my local tree (rebased patchset coming soon) and it seems to work as
intended.
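
To illustrate what "refer to the database by token" means on the lookup side,
here is a small sketch that builds a token-qualified nickname.  It assumes the
usual NSS convention of writing such names as "token:nickname".  The helper
name and the sample token string used with it are hypothetical, and nothing
here calls into NSS.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "token:nickname" into buf, or just "nickname" when no token name is
 * known.  Returns 0 on success and -1 if the result does not fit in buf.
 */
static int
qualify_nickname(char *buf, size_t buflen,
                 const char *token, const char *nickname)
{
    int n;

    assert(buf != NULL && nickname != NULL);

    if (token == NULL || token[0] == '\0')
        n = snprintf(buf, buflen, "%s", nickname);
    else
        n = snprintf(buf, buflen, "%s:%s", token, nickname);

    /* snprintf reports the length it wanted; anything >= buflen was cut. */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```

With a unique token name per loaded database, the qualified name stays
unambiguous no matter in which order the databases were opened.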

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
Attached is a rebase to keep bitrot at bay.  On top of rebasing and smaller fixes
in comments etc, this version fixes/adds a number of things:

* Performs DN resolution to support the DN mapping
* Locks the SECMOD parts and PR_Init call in the frontend as per Jacob's
  findings upthread
* Properly sets the tokenname of the database to avoid ambiguous lookups in case
  multiple databases are loaded (a better name to ensure uniqueness is a TODO)
* Adds a test for certificate lookup without sslcert set



--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jeff Davis
Date:
On Tue, 2020-10-27 at 23:39 -0700, Andres Freund wrote:
> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid
> needing
> to check for errors.

[ apologies for the late reply ]

Would it be more proper to call it --with-tls={openssl,nss} ?

Regards,
    Jeff Davis





Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 3 Jun 2021, at 19:37, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2020-10-27 at 23:39 -0700, Andres Freund wrote:
>> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid
>> needing
>> to check for errors.
>
> [ apologies for the late reply ]
>
> Would it be more proper to call it --with-tls={openssl,nss} ?

Well, we use SSL for everything else (GUCs, connection params and env vars etc)
so I think --with-ssl is sensible.

However, SSL and TLS are used quite interchangeably these days so I think it
makes sense to provide --with-tls as an alias.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Andrew Dunstan
Date:
On 6/3/21 1:47 PM, Daniel Gustafsson wrote:
>> On 3 Jun 2021, at 19:37, Jeff Davis <pgsql@j-davis.com> wrote:
>>
>> On Tue, 2020-10-27 at 23:39 -0700, Andres Freund wrote:
>>> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid
>>> needing
>>> to check for errors.
>> [ apologies for the late reply ]
>>
>> Would it be more proper to call it --with-tls={openssl,nss} ?
> Well, we use SSL for everything else (GUCs, connection params and env vars etc)
> so I think --with-ssl is sensible.
>
> However, SSL and TLS are used quite interchangeably these days so I think it
> makes sense to provide --with-tls as an alias.
>

Yeah, but it's annoying to have to start every talk I give touching this
subject with the slide that says "When we say SSL we really mean TLS".
Maybe release 15 would be a good time to rename user-visible option
names etc, with support for legacy names.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: Support for NSS as a libpq TLS backend

From
Jeff Davis
Date:
On Thu, 2021-06-03 at 15:53 -0400, Andrew Dunstan wrote:
> Yeah, but it's annoying to have to start every talk I give touching
> this
> subject with the slide that says "When we say SSL we really mean
> TLS".
> Maybe release 15 would be a good time to rename user-visible option
> names etc, with support for legacy names.

Sounds good to me, though I haven't looked into how big of a diff that
will be.

Also, do we have precedent for GUC aliases? That might be a little
weird.

Regards,
    Jeff Davis





Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 3 Jun 2021, at 22:14, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2021-06-03 at 15:53 -0400, Andrew Dunstan wrote:
>> Yeah, but it's annoying to have to start every talk I give touching
>> this
>> subject with the slide that says "When we say SSL we really mean
>> TLS".
>> Maybe release 15 would be a good time to rename user-visible option
>> names etc, with support for legacy names.

Perhaps.  Having spent some time in this space, SSL has IMHO become the de
facto term for an encrypted connection at the socket layer, with TLS being the
current protocol suite (additionally, often referred to as SSL/TLS).  Offering
tls* counterparts to our ssl GUCs etc will offer a level of correctness but I
doubt we'll ever get rid of ssl* so we might not help too many users by the
added complexity.

It might also put us in a hard spot if the next TLS spec ends up being called
something other than TLS?  It's clearly happened before =)

> Sounds good to me, though I haven't looked into how big of a diff that
> will be.
>
> Also, do we have precedent for GUC aliases? That might be a little
> weird.

I don't think we do currently, but I have a feeling the topic has surfaced here
before.

If we end up settling on this being something we want I can volunteer to do the
legwork, but it seems a discussion best had before a patch is drafted.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> It might also put us in a hard spot if the next TLS spec ends up being called
> something other than TLS?  It's clearly happened before =)

Good point.  I'm inclined to just stick with the SSL terminology.

>> Also, do we have precedent for GUC aliases? That might be a little
>> weird.

> I don't think we do currently, but I have a feeling the topic has surfaced here
> before.

We do, look for "sort_mem" in guc.c.  So it's not like it'd be
inconvenient to implement.  But I think user confusion and the
potential for the new terminology to fail to be any more
future-proof are good reasons to just leave the names alone.
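
For readers unfamiliar with the mechanism, an old-name to new-name table can
be sketched as below.  This is a simplified illustration and not a copy of
guc.c: the real lookup may differ in layout and is not limited to an exact
strcmp().  The "tls_ciphers" entry is purely hypothetical and only shows what
an alias of the kind discussed here would look like.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pairs of (old or alias name, canonical name), terminated by NULL. */
static const char *const old_name_map[] = {
    "sort_mem", "work_mem",
    "tls_ciphers", "ssl_ciphers",   /* hypothetical alias, not in PostgreSQL */
    NULL
};

/* Return the canonical name for an alias, or the input name unchanged. */
static const char *
canonical_setting_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; old_name_map[i] != NULL; i += 2)
    {
        if (strcmp(name, old_name_map[i]) == 0)
            return old_name_map[i + 1];
    }
    return name;
}
```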

            regards, tom lane



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 3 Jun 2021, at 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>>> Also, do we have precedent for GUC aliases? That might be a little
>>> weird.
>
>> I don't think we do currently, but I have a feeling the topic has surfaced here
>> before.
>
> We do, look for "sort_mem" in guc.c.

I knew it seemed familiar but I failed to find it, thanks for the pointer.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Bruce Momjian
Date:
On Thu, Jun  3, 2021 at 04:55:45PM -0400, Tom Lane wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
> > It might also put us in a hard spot if the next TLS spec ends up being called
> > something other than TLS?  It's clearly happened before =)
> 
> Good point.  I'm inclined to just stick with the SSL terminology.

I wonder if we should use SSL/TLS in more places in our documentation.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Support for NSS as a libpq TLS backend

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> I wonder if we should use SSL/TLS in more places in our documentation.

No objection to doing that in the docs; I'm just questioning
switching the code-visible names.

            regards, tom lane



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 3 Jun 2021, at 23:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Bruce Momjian <bruce@momjian.us> writes:
>> I wonder if we should use SSL/TLS in more places in our documentation.
>
> No objection to doing that in the docs; I'm just questioning
> switching the code-visible names.

As long as it's still searchable by "SSL", "TLS" and "SSL/TLS" and not just the
latter.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Magnus Hagander
Date:
On Thu, Jun 3, 2021 at 11:14 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 3 Jun 2021, at 23:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Bruce Momjian <bruce@momjian.us> writes:
> >> I wonder if we should use SSL/TLS in more places in our documentation.
> >
> > No objection to doing that in the docs; I'm just questioning
> > switching the code-visible names.

+1.

I also don't think it's worth changing the actual names, I think
that'll cause more problems than it solves. But we can, and probably
should, change the messaging around it, particularly the docs (but
probably also comments in the config file).


> As long as it's still searchable by "SSL", "TLS" and "SSL/TLS" and not just the
> latter.

Agreed, making it searchable and easily cross-linkable. And maybe
both terms should be in the glossary.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Fri, 2021-05-28 at 11:04 +0200, Daniel Gustafsson wrote:
> Attached is a rebase to keep bitrot at bay.

I get a failure during one of the CRL directory tests due to a missing
database -- it looks like the Makefile is missing an entry. (I'm
dusting off my build after a few months away, so I don't know if this
latest rebase introduced it or not.)

Attached is a quick patch; does it work on your machine?

--Jacob

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 15 Jun 2021, at 00:15, Jacob Champion <pchampion@vmware.com> wrote:

> Attached is a quick patch; does it work on your machine?

It does, thanks!  I've included it in the attached v37 along with a few tiny
non-functional improvements in comment spelling etc.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-06-16 at 00:08 +0200, Daniel Gustafsson wrote:
> > On 15 Jun 2021, at 00:15, Jacob Champion <pchampion@vmware.com> wrote:
> > Attached is a quick patch; does it work on your machine?
> 
> It does, thanks!  I've included it in the attached v37 along with a few tiny
> non-functional improvements in comment spelling etc.

Great, thanks!

I've been tracking down reference leaks in the client. These open
references prevent NSS from shutting down cleanly, which then makes it
impossible to open a new context in the future. This probably affects
other libpq clients more than it affects psql.

The first step to fixing that is not ignoring failures during NSS
shutdown, so I've tried a patch to pgtls_close() that pushes any
failures through the pqInternalNotice(). That seems to be working well.
The tests were still mostly green, so I taught connect_ok() to fail if
any stderr showed up, and that exposed quite a few failures.

I am currently stuck on one last failing test. This leak seems to only
show up when using TLSv1.2 or below. There doesn't seem to be a
substantial difference in libpq code coverage between 1.2 and 1.3, so
I'm worried that either 1) there's some API we use that "requires"
cleanup, but only on 1.2 and below, or 2) there's some bug in my
version of NSS.

Attached are a few work-in-progress patches. I think the reference
cleanups themselves are probably solid, but the rest of it could use
some feedback. Are there better ways to test for this? And can anyone
reproduce the TLSv1.2 leak?

--Jacob

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 16 Jun 2021, at 01:50, Jacob Champion <pchampion@vmware.com> wrote:

> I've been tracking down reference leaks in the client. These open
> references prevent NSS from shutting down cleanly, which then makes it
> impossible to open a new context in the future. This probably affects
> other libpq clients more than it affects psql.

Ah, nice catch, that's indeed a bug in the frontend implementation.  The
problem is that the NSS trustdomain cache *must* be empty before shutting down
the context, else this very issue happens. Note this in be_tls_destroy():

    /*
     * It reads a bit odd to clear a session cache when we are destroying the
     * context altogether, but if the session cache isn't cleared before
     * shutting down the context it will fail with SEC_ERROR_BUSY.
     */
    SSL_ClearSessionCache();

Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.

There is another resource leak left (visible in one test after the above is
added), the SECMOD module needs to be unloaded in case it's been loaded.
Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
have yet to figure out (when acquiring a lock with NSSRWLock_LockRead).

> The first step to fixing that is not ignoring failures during NSS
> shutdown, so I've tried a patch to pgtls_close() that pushes any
> failures through the pqInternalNotice(). That seems to be working well.

I'm keeping these in during hacking, with a comment that they need to be
revisited during review since they are mainly useful for debugging.

> The tests were still mostly green, so I taught connect_ok() to fail if
> any stderr showed up, and that exposed quite a few failures.


With your patches I'm seeing a couple of these:

  SSL error: The one-time function was previously called and failed. Its error code is no longer available

This is an error from NSPR, but it's not clear to me which PR_CallOnce call
it's coming from.  It seems to be hitting in the SAN and CRL tests, so it
smells of some form of caching implemented with NSPR API's to me but that's a
mere hunch.

> I am currently stuck on one last failing test. This leak seems to only
> show up when using TLSv1.2 or below.

AFAICT the session cache is avoided for TLSv1.3 due to 1.3 not supporting
renegotiation.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-06-16 at 15:31 +0200, Daniel Gustafsson wrote:
> > On 16 Jun 2021, at 01:50, Jacob Champion <pchampion@vmware.com> wrote:
> > I've been tracking down reference leaks in the client. These open
> > references prevent NSS from shutting down cleanly, which then makes it
> > impossible to open a new context in the future. This probably affects
> > other libpq clients more than it affects psql.
> 
> Ah, nice catch, that's indeed a bug in the frontend implementation.  The
> problem is that the NSS trustdomain cache *must* be empty before shutting down
> the context, else this very issue happens. Note this in be_tls_destroy():
> 
>     /*
>      * It reads a bit odd to clear a session cache when we are destroying the
>      * context altogether, but if the session cache isn't cleared before
>      * shutting down the context it will fail with SEC_ERROR_BUSY.
>      */
>     SSL_ClearSessionCache();
> 
> Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.

That's unfortunate. The session cache is global, right? So I'm guessing
we'll need to refcount and lock that call, to avoid cleaning up out
from under a thread that's actively using the cache?
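
Roughly what I have in mind, as a sketch only: count the open contexts under
a mutex and clear the global cache when the last one goes away.  The function
hypothetical_clear_session_cache() stands in for SSL_ClearSessionCache() so
this compiles without NSS, and all other names are made up.  Whether NSS
really needs this, or whether clearing at last close is even enough for each
context to shut down cleanly, is exactly the open question.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t ctx_mutex = PTHREAD_MUTEX_INITIALIZER;
static int open_contexts = 0;
static int cache_clears = 0;    /* how often the cache was cleared */

/* Stand-in for SSL_ClearSessionCache(). */
static void
hypothetical_clear_session_cache(void)
{
    cache_clears++;
}

/* Call after a context has been created for a connection. */
static void
context_opened(void)
{
    pthread_mutex_lock(&ctx_mutex);
    open_contexts++;
    pthread_mutex_unlock(&ctx_mutex);
}

/*
 * Call when a connection closes.  The global cache is cleared only when the
 * last open context goes away; unbalanced extra calls are ignored.
 */
static void
context_closed(void)
{
    pthread_mutex_lock(&ctx_mutex);
    if (open_contexts > 0 && --open_contexts == 0)
        hypothetical_clear_session_cache();
    assert(open_contexts >= 0);
    pthread_mutex_unlock(&ctx_mutex);
}
```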

> There is another resource leak left (visible in one test after the above is
> added), the SECMOD module needs to be unloaded in case it's been loaded.
> Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
> have yet to figure out (when acquiring a lock with NSSRWLock_LockRead).
> 
> [...]
> 
> With your patches I'm seeing a couple of these:
> 
>   SSL error: The one-time function was previously called and failed. Its error code is no longer available

Hmm. Adding SSL_ClearSessionCache() (without thread-safety at the
moment) fixes all of the SSL tests for me, and I don't see either the
SECMOD leak or the "one-time function" error that you've mentioned.
What version of NSS are you running? I'm on 3.63.

I've attached my current patchset (based on v37) for comparison.

> > I am currently stuck on one last failing test. This leak seems to only
> > show up when using TLSv1.2 or below.
> 
> AFAICT the session cache is avoided for TLSv1.3 due to 1.3 not supporting
> renegotiation.

Nice, at least that mystery is solved. :D

Thanks,
--Jacob

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 16 Jun 2021, at 18:15, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Wed, 2021-06-16 at 15:31 +0200, Daniel Gustafsson wrote:
>>> On 16 Jun 2021, at 01:50, Jacob Champion <pchampion@vmware.com> wrote:
>>> I've been tracking down reference leaks in the client. These open
>>> references prevent NSS from shutting down cleanly, which then makes it
>>> impossible to open a new context in the future. This probably affects
>>> other libpq clients more than it affects psql.
>>
>> Ah, nice catch, that's indeed a bug in the frontend implementation.  The
>> problem is that the NSS trustdomain cache *must* be empty before shutting down
>> the context, else this very issue happens. Note this in be_tls_destroy():
>>
>>    /*
>>     * It reads a bit odd to clear a session cache when we are destroying the
>>     * context altogether, but if the session cache isn't cleared before
>>     * shutting down the context it will fail with SEC_ERROR_BUSY.
>>     */
>>    SSL_ClearSessionCache();
>>
>> Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.
>
> That's unfortunate. The session cache is global, right? So I'm guessing
> we'll need to refcount and lock that call, to avoid cleaning up out
> from under a thread that's actively using the cache?

I'm not sure, the documentation doesn't give any answers and implementations of
libnss tend to just clear the cache without consideration.  In libcurl we do
just that, and haven't had any complaints - which doesn't mean it's correct but
it's a datapoint.

>> There is another resource leak left (visible in one test after the above is
>> added), the SECMOD module needs to be unloaded in case it's been loaded.
>> Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
>> have yet to figure out (when acquiring a lock with NSSRWLock_LockRead).
>>
>> [...]
>>
>> With your patches I'm seeing a couple of these:
>>
>>  SSL error: The one-time function was previously called and failed. Its error code is no longer available
>
> Hmm. Adding SSL_ClearSessionCache() (without thread-safety at the
> moment) fixes all of the SSL tests for me, and I don't see either the
> SECMOD leak or the "one-time function" error that you've mentioned.

Reading the code I don't think a loaded user module is considered a resource
that must've been released prior to closing the context.  I will dig for what
showed up in my tests, but I don't think it was caused by this.

> What version of NSS are you running? I'm on 3.63.

Right now I'm using what Debian 10 is packaging which is 3.42.  Admittedly not
hot off the press but I've been trying to develop off a packaged version which
we might see users wanting to deploy against should this get shipped.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
Attached is a rebased version which incorporates your recent patchset for
resource handling, as well as the connect_ok test patch.

I've implemented tracking the close_notify alert that you mentioned offlist,
but it turns out that the alert callbacks in NSS are of limited use so
close_notify is currently the only checked description.  The enum which labels
the descriptions in the SSLAlert struct is private, so it's just sending over
an anonymous number apart from close_notify which is zero.

A few other fixups are included as well, like adapting the pending data read
function in the frontend to how the OpenSSL implementation does it.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-06-23 at 15:48 +0200, Daniel Gustafsson wrote:
> Attached is a rebased version which incorporates your recent patchset for
> resource handling, as well as the connect_ok test patch.

With v38 I do see the "one-time function was previously called and
failed" message you mentioned before, as well as some PR_Assert()
crashes. Looks like it's just due to the placement of
SSL_ClearSessionCache(); gating it behind the conn->nss_context check
ensures that we don't call it if no NSS context actually exists. Patch
attached (0001).

--

Continuing my jog around the patch... client connections will crash if
hostaddr is provided rather than host, because SSL_SetURL can't handle
a NULL argument. I'm running with 0002 to fix it for the moment, but
I'm not sure yet if it does the right thing for IP addresses, which the
OpenSSL side has a special case for.
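
For context on the IP address question: RFC 6066 does not permit literal IPv4
or IPv6 addresses in the SNI HostName, so a name handed to the TLS library for
SNI purposes needs a check along these lines.  This is a standalone sketch
with hypothetical helper names, not code from 0002 or from the OpenSSL
backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* True if host parses as a literal IPv4 or IPv6 address. */
static bool
host_is_ip_literal(const char *host)
{
    unsigned char buf[sizeof(struct in6_addr)];

    assert(host != NULL);
    return inet_pton(AF_INET, host, buf) == 1 ||
           inet_pton(AF_INET6, host, buf) == 1;
}

/*
 * Return the name that may be sent as SNI, or NULL when none should be sent:
 * no host was given (hostaddr only), it is empty, or it is an IP literal.
 */
static const char *
sni_name_for(const char *host)
{
    if (host == NULL || host[0] == '\0')
        return NULL;
    if (host_is_ip_literal(host))
        return NULL;
    return host;
}
```

A caller would skip the SNI-setting step entirely when sni_name_for() returns
NULL, which also covers the NULL host that triggers the crash.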

Early EOFs coming from the server don't currently have their own error
message, which leads to a confusingly empty

    connection to server at "127.0.0.1", port 47447 failed: 

0003 adds one, to roughly match the corresponding OpenSSL message.

While I was fixing that I noticed that I was getting a "unable to
verify certificate" error message for the early EOF case, even with
sslmode=require. That error message is being printed to
conn->errorMessage during pg_cert_auth_handler(), even if we're not
verifying certificates, and then that message is included in later
unrelated failures. 0004 patches that.

--Jacob

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 19 Jul 2021, at 21:33, Jacob Champion <pchampion@vmware.com> wrote:

> ..client connections will crash if
> hostaddr is provided rather than host, because SSL_SetURL can't handle
> a NULL argument. I'm running with 0002 to fix it for the moment, but
> I'm not sure yet if it does the right thing for IP addresses, which the
> OpenSSL side has a special case for.

AFAICT the idea is to handle it in the cert auth callback, so I've added some
PoC code to check for sslsni there and updated the TODO comment to reflect
that.

I've applied your patches in the attached rebase which passes all tests for me.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Tue, 2021-08-10 at 19:22 +0200, Daniel Gustafsson wrote:
> Another rebase to work around the recent changes in the ssl Makefile.

I have a local test suite that I've been writing against libpq. With
the new ssldatabase connection option, one tricky aspect is figuring
out whether it's supported or not. It doesn't look like there's any way
to tell, from a client application, whether NSS or OpenSSL (or neither)
is in use.

You'd mentioned that perhaps we should support a call like

    PQsslAttribute(NULL, "library"); /* returns "NSS", "OpenSSL", or NULL */

so that you don't have to have an actual connection first in order to
figure out what connection options you need to supply. Clients that
support multiple libpq versions would need to know whether that call is
reliable (older versions of libpq will always return NULL, whether SSL
is compiled in or not), so maybe we could add a feature macro at the
same time?

We could also add a new API (say, PQsslLibrary()) but I don't know if
that gives us anything in practice. Thoughts?
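
To make the client-side logic concrete, here is a sketch of how an
application might interpret the answer.  It does not call libpq: the library
argument stands for whatever the proposed PQsslAttribute(NULL, "library") call
would return, and detection_reliable stands for the presence of the feature
macro.  The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum
{
    TLS_UNKNOWN,    /* cannot tell, e.g. an older libpq */
    TLS_NONE,       /* libpq built without TLS support */
    TLS_OPENSSL,
    TLS_NSS
} tls_backend;

/*
 * A NULL answer means "no TLS" only if this libpq is known to support the
 * query; older versions return NULL whether SSL is compiled in or not.
 */
static tls_backend
classify_backend(bool detection_reliable, const char *library)
{
    if (library == NULL)
        return detection_reliable ? TLS_NONE : TLS_UNKNOWN;
    if (strcmp(library, "NSS") == 0)
        return TLS_NSS;
    if (strcmp(library, "OpenSSL") == 0)
        return TLS_OPENSSL;
    return TLS_UNKNOWN;
}

/* In this sketch only the NSS backend understands ssldatabase. */
static bool
backend_supports_option(tls_backend backend, const char *option)
{
    assert(option != NULL);
    if (strcmp(option, "ssldatabase") == 0)
        return backend == TLS_NSS;
    return backend == TLS_NSS || backend == TLS_OPENSSL;
}
```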

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Michael Paquier
Date:
On Wed, Aug 18, 2021 at 12:06:59AM +0000, Jacob Champion wrote:
> I have a local test suite that I've been writing against libpq. With
> the new ssldatabase connection option, one tricky aspect is figuring
> out whether it's supported or not. It doesn't look like there's any way
> to tell, from a client application, whether NSS or OpenSSL (or neither)
> is in use.

That's about guessing which library libpq is compiled with, so yes
that's a problem.

> so that you don't have to have an actual connection first in order to
> figure out what connection options you need to supply. Clients that
> support multiple libpq versions would need to know whether that call is
> reliable (older versions of libpq will always return NULL, whether SSL
> is compiled in or not), so maybe we could add a feature macro at the
> same time?

Still, the problem is wider than that, no?  One cannot know either if
a version of libpq is able to work with GSSAPI until they attempt a
connection with gssencmode.  It seems to me that we should work on the
larger picture here.

> We could also add a new API (say, PQsslLibrary()) but I don't know if
> that gives us anything in practice. Thoughts?

Knowing that the GSSAPI stuff is part of fe-secure.c, we may want
instead a call that returns a list of supported secure libraries.
--
Michael

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 18 Aug 2021, at 02:32, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Aug 18, 2021 at 12:06:59AM +0000, Jacob Champion wrote:
>> I have a local test suite that I've been writing against libpq. With
>> the new ssldatabase connection option, one tricky aspect is figuring
>> out whether it's supported or not. It doesn't look like there's any way
>> to tell, from a client application, whether NSS or OpenSSL (or neither)
>> is in use.
>
> That's about guessing which library libpq is compiled with, so yes
> that's a problem.
>
>> so that you don't have to have an actual connection first in order to
>> figure out what connection options you need to supply. Clients that
>> support multiple libpq versions would need to know whether that call is
>> reliable (older versions of libpq will always return NULL, whether SSL
>> is compiled in or not), so maybe we could add a feature macro at the
>> same time?
>
> Still, the problem is wider than that, no?  One cannot know either if
> a version of libpq is able to work with GSSAPI until they attempt a
> connection with gssencmode.  It seems to me that we should work on the
> larger picture here.

I think we should do both.  PQsslAttribute() already exists, and being able to
get the library attribute for a NULL conn object when there are multiple
libraries makes a lot of sense to me.  That doesn’t exclude working on a better
way for apps to interrogate the libpq they have at hand for which capabilities
it has.  Personally I’m not sure what that API could look like, but we should
discuss that in a separate thread I guess.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Mon, 2021-07-26 at 15:26 +0200, Daniel Gustafsson wrote:
> > On 19 Jul 2021, at 21:33, Jacob Champion <pchampion@vmware.com> wrote:
> > ..client connections will crash if
> > hostaddr is provided rather than host, because SSL_SetURL can't handle
> > a NULL argument. I'm running with 0002 to fix it for the moment, but
> > I'm not sure yet if it does the right thing for IP addresses, which the
> > OpenSSL side has a special case for.
> 
> AFAICT the idea is to handle it in the cert auth callback, so I've added some
> PoC code to check for sslsni there and updated the TODO comment to reflect
> that.

I dug a bit deeper into the SNI stuff:

> +    server_hostname = SSL_RevealURL(conn->pr_fd);
> +    if (!server_hostname || server_hostname[0] == '\0')
> +    {
> +        /* If SNI is enabled we must have a hostname set */
> +        if (conn->sslsni && conn->sslsni[0])
> +            status = SECFailure;

conn->sslsni can be explicitly set to "0" to disable it, so this should
probably be changed to a check for "1", but I'm not sure that would be
correct either. If the user has the default sslsni="1" and supplies an
IP address for the host parameter, I don't think we should fail the
connection.
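
To make the difference concrete, here is a small sketch comparing the non-empty check from the hunk above with an explicit check for "1". The function names are hypothetical and not taken from the patch. With sslsni=0 the first variant still reports SNI as enabled.

```c
#include <stddef.h>

/* Mirrors the posted condition: any non-empty value counts as enabled. */
int
sni_enabled_nonempty(const char *sslsni)
{
    return sslsni != NULL && sslsni[0] != '\0';
}

/* Explicit check: only a leading '1' enables SNI, so "0" disables it. */
int
sni_enabled_explicit(const char *sslsni)
{
    return sslsni != NULL && sslsni[0] == '1';
}
```

Whether a missing hostname should then fail the connection is a separate question that neither variant answers.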

> +    if (host && host[0] &&
> +        !(strspn(host, "0123456789.") == strlen(host) ||
> +          strchr(host, ':')))
> +        SSL_SetURL(conn->pr_fd, host);

It looks like NSS may already have some code that prevents SNI from
being sent for IP addresses, so that part of the guard might not be
necessary. (And potentially counterproductive, because it looks like
NSS can perform verification against the certificate's SANs if you pass
an IP address to SSL_SetURL().)
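
For reference, the guard in the hunk above can be pulled out into a standalone predicate to see what it accepts. This is only a sketch of the heuristic as posted (digits and dots, or any colon). The function name is hypothetical and this is not a real address parser.

```c
#include <stddef.h>
#include <string.h>

/*
 * Heuristic from the quoted hunk: treat the host as an IP literal if it
 * consists solely of digits and dots (IPv4-like) or contains a colon
 * (IPv6-like).  Returns 0 for NULL or empty input.
 */
int
host_looks_like_ip(const char *host)
{
    if (host == NULL || host[0] == '\0')
        return 0;
    if (strspn(host, "0123456789.") == strlen(host))
        return 1;
    return strchr(host, ':') != NULL;
}
```

Note that it also accepts strings such as "1.2", so it is looser than a real address parser.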

Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
backend can handle those. That's a separate conversation, but I might
take a look at a patch for next commitfest.

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 21 Sep 2021, at 02:06, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Mon, 2021-07-26 at 15:26 +0200, Daniel Gustafsson wrote:
>>> On 19 Jul 2021, at 21:33, Jacob Champion <pchampion@vmware.com> wrote:
>>> ..client connections will crash if
>>> hostaddr is provided rather than host, because SSL_SetURL can't handle
>>> a NULL argument. I'm running with 0002 to fix it for the moment, but
>>> I'm not sure yet if it does the right thing for IP addresses, which the
>>> OpenSSL side has a special case for.
>>
>> AFAICT the idea is to handle it in the cert auth callback, so I've added some
>> PoC code to check for sslsni there and updated the TODO comment to reflect
>> that.
>
> I dug a bit deeper into the SNI stuff:
>
>> +    server_hostname = SSL_RevealURL(conn->pr_fd);
>> +    if (!server_hostname || server_hostname[0] == '\0')
>> +    {
>> +        /* If SNI is enabled we must have a hostname set */
>> +        if (conn->sslsni && conn->sslsni[0])
>> +            status = SECFailure;
>
> conn->sslsni can be explicitly set to "0" to disable it, so this should
> probably be changed to a check for "1",

Agreed.

> but I'm not sure that would be
> correct either. If the user has the default sslsni="1" and supplies an
> IP address for the host parameter, I don't think we should fail the
> connection.

Maybe not, but doing so is at least in line with how the OpenSSL support will
handle the same config AFAICT. Or am I missing something?

>> +    if (host && host[0] &&
>> +        !(strspn(host, "0123456789.") == strlen(host) ||
>> +          strchr(host, ':')))
>> +        SSL_SetURL(conn->pr_fd, host);
>
> It looks like NSS may already have some code that prevents SNI from
> being sent for IP addresses, so that part of the guard might not be
> necessary. (And potentially counterproductive, because it looks like
> NSS can perform verification against the certificate's SANs if you pass
> an IP address to SSL_SetURL().)

Skimming the NSS code I wasn't able to find the countermeasures, can you provide a
reference to where I should look?

Feel free to post a new version of the NSS patch with these changes if you want.

> Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
> backend can handle those. That's a separate conversation, but I might
> take a look at a patch for next commitfest.

Please do.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
> > On 21 Sep 2021, at 02:06, Jacob Champion <pchampion@vmware.com> wrote:
> > but I'm not sure that would be
> > correct either. If the user has the default sslsni="1" and supplies an
> > IP address for the host parameter, I don't think we should fail the
> > connection.
> 
> Maybe not, but doing so is at least in line with how the OpenSSL support will
> handle the same config AFAICT. Or am I missing something?

With OpenSSL, I don't see a connection failure when using sslsni=1 with
IP addresses. (verify-full can't work, but that's a separate problem.)

> > > +    if (host && host[0] &&
> > > +        !(strspn(host, "0123456789.") == strlen(host) ||
> > > +          strchr(host, ':')))
> > > +        SSL_SetURL(conn->pr_fd, host);
> > 
> > It looks like NSS may already have some code that prevents SNI from
> > being sent for IP addresses, so that part of the guard might not be
> > necessary. (And potentially counterproductive, because it looks like
> > NSS can perform verification against the certificate's SANs if you pass
> > an IP address to SSL_SetURL().)
> 
> Skimming the NSS code I wasn't able find the countermeasures, can you provide a
> reference to where I should look?

I see the check in ssl_ShouldSendSNIExtension(), in ssl3exthandle.c.

> Feel free to post a new version of the NSS patch with these changes if you want.

Will do!

Thanks,
--Jacob

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Mon, 2021-09-27 at 16:29 +0000, Jacob Champion wrote:
> On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
> > 
> > Feel free to post a new version of the NSS patch with these changes if you want.
> 
> Will do!

Something like the attached, v43, I think. (since-v42.diff.txt has the
changes only.)

This fixes the interaction of IP addresses and SNI for me, and honors
sslsni=0.

--Jacob

Attachment

Re: Support for NSS as a libpq TLS backend

From
Rachel Heaton
Date:
On Mon, Sep 20, 2021 at 2:38 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Rebased on top of HEAD with off-list comment fixes by Kevin Burke.
>

Hello Daniel,

I've been playing with your patch on Mac (OS 11.6 Big Sur) and have
run into a couple of issues so far.

1. I get 7 warnings while running make (truncated):
cryptohash_nss.c:101:21: warning: implicit conversion from enumeration
type 'SECOidTag' to different enumeration type 'HASH_HashType'
[-Wenum-conversion]
                        ctx->hash_type = SEC_OID_SHA1;
                                       ~ ^~~~~~~~~~~~
...
cryptohash_nss.c:134:34: warning: implicit conversion from enumeration
type 'HASH_HashType' to different enumeration type 'SECOidTag'
[-Wenum-conversion]
        hash = SECOID_FindOIDByTag(ctx->hash_type);
               ~~~~~~~~~~~~~~~~~~~ ~~~~~^~~~~~~~~
7 warnings generated.

2. libpq-refs-stamp fails -- it appears an exit is being injected into
libpq on Mac

Notes about my environment:
I've installed nss via homebrew (at version 3.70) and linked it.

Cheers,
Rachel



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 28 Sep 2021, at 01:07, Rachel Heaton <rachelmheaton@gmail.com> wrote:

> 1. I get 7 warnings while running make (truncated):
> cryptohash_nss.c:101:21: warning: implicit conversion from enumeration
> type 'SECOidTag' to different enumeration type 'HASH_HashType'

Nice catch, fixed in the attached.
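
The fix itself is in the attachment and is not reproduced here. As a general illustration of how this class of warning can be avoided, the sketch below keeps the two enumerations separate and converts between them through an explicit mapping. The enum types and their values are stand-ins invented for the example, not the real SECOidTag or HASH_HashType definitions.

```c
/* Stand-in for an OID tag enumeration (values are arbitrary). */
typedef enum
{
    DEMO_OID_UNKNOWN = 0,
    DEMO_OID_SHA1 = 10,
    DEMO_OID_SHA256 = 11
} demo_oid_tag;

/* Stand-in for a hash type enumeration (values are arbitrary). */
typedef enum
{
    DEMO_HASH_NULL = 0,
    DEMO_HASH_SHA1 = 1,
    DEMO_HASH_SHA256 = 2
} demo_hash_type;

/* Map an OID tag to the matching hash type without an implicit cast. */
demo_hash_type
demo_hash_type_from_oid(demo_oid_tag tag)
{
    switch (tag)
    {
        case DEMO_OID_SHA1:
            return DEMO_HASH_SHA1;
        case DEMO_OID_SHA256:
            return DEMO_HASH_SHA256;
        default:
            return DEMO_HASH_NULL;
    }
}

/* Map a hash type back to its OID tag. */
demo_oid_tag
demo_oid_from_hash_type(demo_hash_type type)
{
    switch (type)
    {
        case DEMO_HASH_SHA1:
            return DEMO_OID_SHA1;
        case DEMO_HASH_SHA256:
            return DEMO_OID_SHA256;
        default:
            return DEMO_OID_UNKNOWN;
    }
}
```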

> 2. libpq-refs-stamp fails -- it appears an exit is being injected into
> libpq on Mac

I spent some time investigating this, and there are two cases of _exit() and
one atexit() which are coming from the threading code in libnspr (which is the
runtime lib required by libnss).

On macOS the threading code registers an atexit handler [0] in order to work
around issues with __attribute__((destructor)) [1].  The pthreads code also
defines PR_ProcessExit [2] which does what it says on the tin, calls exit and
not much more [3].  Both of these uses are only compiled when building with
pthreads, which can be disabled in autoconf but that seems broken in recent
versions of NSPR.  I'm fairly sure I've built NSPR with the user pthreads in the
past, but if packagers build it like this then we need to conform to that.  The
PR_CreateProcess() [4] call further calls _exit() [5] in a number of error
paths on failing syscalls.

The libpq libnss implementation doesn't call either of these, and neither does
libnss.

I'm not entirely sure what to do here, it clearly requires an exception in the
Makefile check of sorts if we deem we can live with this.

@Jacob: how did you configure your copy of NSPR?

--
Daniel Gustafsson        https://vmware.com/

[0] https://hg.mozilla.org/projects/nspr/file/tip/pr/src/pthreads/ptthread.c#l1034
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1399746#c99
[2] https://www-archive.mozilla.org/projects/nspr/reference/html/prinit.html#15859
[3] https://hg.mozilla.org/projects/nspr/file/tip/pr/src/pthreads/ptthread.c#l1181
[4] https://www-archive.mozilla.org/projects/nspr/reference/html/prprocess.html#24535
[5] https://hg.mozilla.org/projects/nspr/file/tip/pr/src/md/unix/uxproces.c#l268


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Thu, 2021-09-30 at 14:17 +0200, Daniel Gustafsson wrote:
> The libpq libnss implementation doesn't call either of these, and neither does
> libnss.

I thought the refs check only searched for direct symbol dependencies;
is that piece of NSPR being statically included somehow?

> I'm not entirely sure what to do here, it clearly requires an exception in the
> Makefile check of sorts if we deem we can live with this.
> 
> @Jacob: how did you configure your copy of NSPR?

I use the Ubuntu 20.04 builtin (NSPR 4.25.0), but it looks like the
reason I haven't been seeing this is that I've always used
--enable-coverage. If I take that out, I see the same exit check failure.
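
For readers unfamiliar with the check, the sketch below models how it is assumed to work. The assumption is that the rule lists the undefined symbols of the built library with nm and fails if any of them contains "exit", with __cxa_atexit exempted, and that the rule is skipped for coverage builds. These details should be verified against src/interfaces/libpq/Makefile. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if a line of `nm -u` output would trip the check as described
 * above: it mentions "exit" and is not the exempted __cxa_atexit.
 */
int
nm_line_trips_exit_check(const char *line)
{
    if (line == NULL)
        return 0;
    if (strstr(line, "__cxa_atexit") != NULL)
        return 0;
    return strstr(line, "exit") != NULL;
}
```

Because this is a substring match, undefined references to _exit and atexit would be flagged as well as exit itself.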

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Thu, 2021-09-30 at 16:04 +0000, Jacob Champion wrote:
> On Thu, 2021-09-30 at 14:17 +0200, Daniel Gustafsson wrote:
> > The libpq libnss implementation doesn't call either of these, and neither does
> > libnss.
> 
> I thought the refs check only searched for direct symbol dependencies;
> is that piece of NSPR being statically included somehow?

On my machine, at least, exit() is coming in due to a few calls to
psprintf(), pstrdup(), and pg_malloc() in the new NSS code.
(Disassembly via `objdump -S libpq.so` helped me track those down.) I'm
working on a patch.
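
The background is that the frontend versions of pg_malloc(), pstrdup() and psprintf() call exit() when they run out of memory, which a library like libpq must not do. A minimal sketch of the usual alternative is to use plain malloc() and snprintf() and hand the failure back to the caller. The helper names are hypothetical and this is not the code from the patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy src; returns NULL on bad input or allocation failure, never exits. */
char *
copy_string_noexit(const char *src)
{
    size_t      len;
    char       *dst;

    if (src == NULL)
        return NULL;
    len = strlen(src) + 1;
    dst = malloc(len);
    if (dst == NULL)
        return NULL;            /* caller reports the error */
    memcpy(dst, src, len);
    return dst;
}

/*
 * Format "left:right" into a caller-provided buffer.  Returns 0 on
 * success and -1 on bad input or truncation, instead of allocating.
 */
int
join_noexit(char *buf, size_t buflen, const char *left, const char *right)
{
    int         n;

    if (buf == NULL || buflen == 0 || left == NULL || right == NULL)
        return -1;
    n = snprintf(buf, buflen, "%s:%s", left, right);
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```

The caller then decides how to surface the failure, for example by setting an error message on the connection.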

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 1 Oct 2021, at 02:02, Jacob Champion <pchampion@vmware.com> wrote:

> On my machine, at least, exit() is coming in due to a few calls to
> psprintf(), pstrdup(), and pg_malloc() in the new NSS code.
> (Disassembly via `objdump -S libpq.so` helped me track those down.) I'm
> working on a patch.

Ah, that makes perfect sense.  I was so focused on hunting through what was newly
linked in that I overlooked the obvious.  Thanks for finding these.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Fri, 2021-10-01 at 08:55 +0200, Daniel Gustafsson wrote:
> Ah, that makes perfect sense.  I was too focused on hunting in what new was
> linked against that I overlooked the obvious.  Thanks for finding these.

No problem at all :) The exit() check is useful but still a little
opaque, I think, especially since (from my newbie perspective) there's
so much of the pgcommon staticlib that is forbidden for use in libpq.

Fixed in v44, attached; changes in since-v43.diff.txt.

--Jacob

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 4 Oct 2021, at 18:14, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Fri, 2021-10-01 at 08:55 +0200, Daniel Gustafsson wrote:
>> Ah, that makes perfect sense.  I was too focused on hunting in what new was
>> linked against that I overlooked the obvious.  Thanks for finding these.
>
> No problem at all :) The exit() check is useful but still a little
> opaque, I think, especially since (from my newbie perspective) there's
> so much of the pgcommon staticlib that is forbidden for use in libpq.

Thanks!  These changes look good.  Since you accidentally based this on v43
and not the v44 I posted with the cryptohash fix in, the attached is a v45 with
both your v44 and the previous one, all rebased over HEAD.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Tue, 2021-10-05 at 15:08 +0200, Daniel Gustafsson wrote:
> Thanks!  These changes looks good.  Since you accidentally based this on v43
> and not the v44 I posted with the cryptohash fix in, the attached is a v45 with
> both your v44 and the previous one, all rebased over HEAD.

Thanks, and sorry about that.

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Kevin Burke
Date:
Hi all, apologies but I'm having trouble applying the latest patch (v45) to the latest commit on master (6b0f6f79eef2168ce38a8ee99c3ed76e3df5d7ad)

I downloaded all of the patches to my local filesystem, and then ran: 

for patch in ../../kevinburke/rustls-postgres/patchsets/2021-10-05-gustafsson-mailing-list/*.patch; do git am $patch; done;

I get the following error on the second patch file:

Applying: Refactor SSL testharness for multiple library
error: patch failed: src/test/ssl/t/001_ssltests.pl:7
error: src/test/ssl/t/001_ssltests.pl: patch does not apply
error: patch failed: src/test/ssl/t/SSLServer.pm:26
error: src/test/ssl/t/SSLServer.pm: patch does not apply
Patch failed at 0001 Refactor SSL testharness for multiple library
hint: Use 'git am --show-current-patch=diff' to see the failed patch

I believe that these patches need to integrate the refactoring in commit b3b4d8e68ae83f432f43f035c7eb481ef93e1583 - git is searching for the wrong text in the existing file, but I'm not sure how to submit a patch against a patch.

Thanks,
Kevin

On Tue, Oct 5, 2021 at 8:05 AM Jacob Champion <pchampion@vmware.com> wrote:
On Tue, 2021-10-05 at 15:08 +0200, Daniel Gustafsson wrote:
> Thanks!  These changes looks good.  Since you accidentally based this on v43
> and not the v44 I posted with the cryptohash fix in, the attached is a v45 with
> both your v44 and the previous one, all rebased over HEAD.

Thanks, and sorry about that.

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Kevin Burke
Date:
For anyone else trying to test out this branch I'm able to get the patches to apply cleanly if I check out e.g. commit 92e6a98c3636948e7ece9a3260f9d89dd60da278.

Kevin

--
Kevin Burke
phone: 925-271-7005 | kevin.burke.dev



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 29 Oct 2021, at 06:31, Kevin Burke <kevin@burke.dev> wrote:

Thanks for testing the patch!

> I believe that these patches need to integrate the refactoring in commit
> b3b4d8e68ae83f432f43f035c7eb481ef93e1583 - git is searching for the wrong text
> in the existing file


Correct, b3b4d8e68 as well as b4c4a00ea both created conflicts with this
patchset.  Attached is an updated patchset fixing both of those as well as
adding version checks for NSS and NSPR to autoconf (with fallbacks for
non-{nss|nspr}-config systems).  The versions picked are semi-arbitrary and
definitely up for discussion.  I chose them mainly as they were the oldest
commonly available packages I found, and they satisfy the requirements we have.
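
To illustrate the kind of check meant here (the real one lives in configure), the sketch below compares a version string of the form printed by nss-config --version against a minimum major.minor. The function name is hypothetical. The thread does not state which minimum versions were chosen, so any numbers passed to it are placeholders.

```c
#include <stdio.h>

/*
 * Returns 1 if version ("major.minor[.patch]") is at least
 * min_major.min_minor, and 0 otherwise or if it cannot be parsed.
 */
int
version_at_least(const char *version, int min_major, int min_minor)
{
    int         major = 0;
    int         minor = 0;

    if (version == NULL || sscanf(version, "%d.%d", &major, &minor) != 2)
        return 0;
    if (major != min_major)
        return major > min_major;
    return minor >= min_minor;
}
```

Parsing the components as integers matters: a plain string comparison would rank "3.9" above "3.42".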

> I'm not sure how to submit a patch against a patch.

If you've done the work of fixing the conflicts in a rebase, the best option is
IMO to supply a whole new version of the patchset, since that lets the CF patch
tester build and test that version.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Fri, Nov 5, 2021 at 6:01 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Attached is a rebase fixing a tiny bug in the documentation which prevented it
> from being able to compile.
>

Hello, I'm looking to help out with reviews for this CF and I'm
currently looking at this patchset.

Currently I'm stuck trying to configure:

checking for nss-config... /usr/bin/nss-config
checking for nspr-config... /usr/bin/nspr-config
...
checking nss/ssl.h usability... no
checking nss/ssl.h presence... no
checking for nss/ssl.h... no
configure: error: header file <nss/ssl.h> is required for NSS

This is on Fedora 33 and nss-devel is installed, nss-config is
available (and configure finds it) but the directory is different from
Ubuntu:
(base) [vagrant@fedora ~]$ nss-config --includedir
/usr/include/nss3
(base) [vagrant@fedora ~]$ ls -al /usr/include/nss3/ssl.h
-rw-r--r--. 1 root root 70450 Sep 30 05:41 /usr/include/nss3/ssl.h

So if nss-config --includedir is used then #include <ssl.h> should be
used, or if not then #include <nss3/ssl.h> but on this system #include
<nss/ssl.h> is not going to work.

Thanks



Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Tue, Nov 9, 2021 at 1:59 PM Joshua Brindle
<joshua.brindle@crunchydata.com> wrote:
>
> On Fri, Nov 5, 2021 at 6:01 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> >
> > Attached is a rebase fixing a tiny bug in the documentation which prevented it
> > from being able to compile.
> >
>
> Hello, I'm looking to help out with reviews for this CF and I'm
> currently looking at this patchset.
>
> currently I'm stuck trying to configure:
>
> checking for nss-config... /usr/bin/nss-config
> checking for nspr-config... /usr/bin/nspr-config
> ...
> checking nss/ssl.h usability... no
> checking nss/ssl.h presence... no
> checking for nss/ssl.h... no
> configure: error: header file <nss/ssl.h> is required for NSS
>
> This is on fedora 33 and nss-devel is installed, nss-config is
> available (and configure finds it) but the directory is different from
> Ubuntu:
> (base) [vagrant@fedora ~]$ nss-config --includedir
> /usr/include/nss3
> (base) [vagrant@fedora ~]$ ls -al /usr/include/nss3/ssl.h
> -rw-r--r--. 1 root root 70450 Sep 30 05:41 /usr/include/nss3/ssl.h
>
> So if nss-config --includedir is used then #include <ssl.h> should be
> used, or if not then #include <nss3/ssl.h> but on this system #include
> <nss/ssl.h> is not going to work.

FYI, if I make a symlink to get past this, configure completes but
compilation fails because nspr/nspr.h cannot be found (I'm not sure
why configure doesn't discover this)
../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
#include <nspr/nspr.h>In file included from protocol_nss.c:24:
../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
#include <nspr/nspr.h>
 ^~~~~~~~~~~~~

It's a similar issue:
$ nspr-config --includedir
/usr/include/nspr4



Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Tue, Nov 9, 2021 at 2:02 PM Joshua Brindle
<joshua.brindle@crunchydata.com> wrote:
>
> On Tue, Nov 9, 2021 at 1:59 PM Joshua Brindle
> <joshua.brindle@crunchydata.com> wrote:
> >
> > On Fri, Nov 5, 2021 at 6:01 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> > >
> > > Attached is a rebase fixing a tiny bug in the documentation which prevented it
> > > from being able to compile.
> > >
> >
> > Hello, I'm looking to help out with reviews for this CF and I'm
> > currently looking at this patchset.
> >
> > currently I'm stuck trying to configure:
> >
> > checking for nss-config... /usr/bin/nss-config
> > checking for nspr-config... /usr/bin/nspr-config
> > ...
> > checking nss/ssl.h usability... no
> > checking nss/ssl.h presence... no
> > checking for nss/ssl.h... no
> > configure: error: header file <nss/ssl.h> is required for NSS
> >
> > This is on fedora 33 and nss-devel is installed, nss-config is
> > available (and configure finds it) but the directory is different from
> > Ubuntu:
> > (base) [vagrant@fedora ~]$ nss-config --includedir
> > /usr/include/nss3
> > (base) [vagrant@fedora ~]$ ls -al /usr/include/nss3/ssl.h
> > -rw-r--r--. 1 root root 70450 Sep 30 05:41 /usr/include/nss3/ssl.h
> >
> > So if nss-config --includedir is used then #include <ssl.h> should be
> > used, or if not then #include <nss3/ssl.h> but on this system #include
> > <nss/ssl.h> is not going to work.
>
> FYI, if I make a symlink to get past this, configure completes but
> compilation fails because nspr/nspr.h cannot be found (I'm not sure
> why configure doesn't discover this)
> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
> #include <nspr/nspr.h>In file included from protocol_nss.c:24:
> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
> #include <nspr/nspr.h>
>  ^~~~~~~~~~~~~
>
> It's a similar issue:
> $ nspr-config --includedir
> /usr/include/nspr4

If these get resolved, the next issue is that the LLVM bitcode doesn't
compile because the NSS includedir is missing from CPPFLAGS:

/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv
-O2  -I../../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2
-I/usr/include -flto=thin -emit-llvm -c -o be-secure-nss.bc
be-secure-nss.c
In file included from be-secure-nss.c:20:
In file included from ../../../src/include/common/nss.h:38:
In file included from /usr/include/nss/nss.h:34:
/usr/include/nss/seccomon.h:17:10: fatal error: 'prtypes.h' file not found
#include "prtypes.h"
         ^~~~~~~~~~~
1 error generated.



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 9 Nov 2021, at 22:22, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> On Tue, Nov 9, 2021 at 2:02 PM Joshua Brindle
> <joshua.brindle@crunchydata.com> wrote:
>>
>> On Tue, Nov 9, 2021 at 1:59 PM Joshua Brindle
>> <joshua.brindle@crunchydata.com> wrote:

>>> Hello, I'm looking to help out with reviews for this CF and I'm
>>> currently looking at this patchset.

Thanks, much appreciated!

>>> currently I'm stuck trying to configure:
>>>
>>> checking for nss-config... /usr/bin/nss-config
>>> checking for nspr-config... /usr/bin/nspr-config
>>> ...
>>> checking nss/ssl.h usability... no
>>> checking nss/ssl.h presence... no
>>> checking for nss/ssl.h... no
>>> configure: error: header file <nss/ssl.h> is required for NSS
>>>
>>> This is on fedora 33 and nss-devel is installed, nss-config is
>>> available (and configure finds it) but the directory is different from
>>> Ubuntu:
>>> (base) [vagrant@fedora ~]$ nss-config --includedir
>>> /usr/include/nss3
>>> (base) [vagrant@fedora ~]$ ls -al /usr/include/nss3/ssl.h
>>> -rw-r--r--. 1 root root 70450 Sep 30 05:41 /usr/include/nss3/ssl.h
>>>
>>> So if nss-config --includedir is used then #include <ssl.h> should be
>>> used, or if not then #include <nss3/ssl.h> but on this system #include
>>> <nss/ssl.h> is not going to work.

Interesting rename, I doubt any version but NSS 3 and NSPR 4 is alive anywhere
and an incremented major version seems highly unlikely.  Going back to plain
#include <ssl.h> and having the include flags sort out the correct directories
seems like the best option then.  Fixed in the attached.

>> FYI, if I make a symlink to get past this, configure completes but
>> compilation fails because nspr/nspr.h cannot be found (I'm not sure
>> why configure doesn't discover this)
>> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
>> #include <nspr/nspr.h>In file included from protocol_nss.c:24:
>> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
>> #include <nspr/nspr.h>
>> ^~~~~~~~~~~~~
>>
>> It's a similar issue:
>> $ nspr-config --includedir
>> /usr/include/nspr4

Fixed.

> If these get resolved the next issue is llvm bitcode doesn't compile
> because the nss includedir is missing from CPPFLAGS:
>
> /usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv
> -O2  -I../../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2
> -I/usr/include -flto=thin -emit-llvm -c -o be-secure-nss.bc
> be-secure-nss.c
> In file included from be-secure-nss.c:20:
> In file included from ../../../src/include/common/nss.h:38:
> In file included from /usr/include/nss/nss.h:34:
> /usr/include/nss/seccomon.h:17:10: fatal error: 'prtypes.h' file not found
> #include "prtypes.h"
>         ^~~~~~~~~~~
> 1 error generated.

Fixed.

The attached also resolves the conflicts in pgcrypto following db7d1a7b05.  PGP
ElGamal and RSA pubkey functions aren't supported for now as there are no bignum
functions similar to the BN_* family in OpenSSL.  I will look further into how hard
it would be to support them; for now this gets us ahead.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Wed, Nov 10, 2021 at 8:49 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 9 Nov 2021, at 22:22, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> > On Tue, Nov 9, 2021 at 2:02 PM Joshua Brindle
> > <joshua.brindle@crunchydata.com> wrote:
> >>
> >> On Tue, Nov 9, 2021 at 1:59 PM Joshua Brindle
> >> <joshua.brindle@crunchydata.com> wrote:
>
> >>> Hello, I'm looking to help out with reviews for this CF and I'm
> >>> currently looking at this patchset.
>
> Thanks, much appreciated!
>
> >>> currently I'm stuck trying to configure:
> >>>
> >>> checking for nss-config... /usr/bin/nss-config
> >>> checking for nspr-config... /usr/bin/nspr-config
> >>> ...
> >>> checking nss/ssl.h usability... no
> >>> checking nss/ssl.h presence... no
> >>> checking for nss/ssl.h... no
> >>> configure: error: header file <nss/ssl.h> is required for NSS
> >>>
> >>> This is on fedora 33 and nss-devel is installed, nss-config is
> >>> available (and configure finds it) but the directory is different from
> >>> Ubuntu:
> >>> (base) [vagrant@fedora ~]$ nss-config --includedir
> >>> /usr/include/nss3
> >>> (base) [vagrant@fedora ~]$ ls -al /usr/include/nss3/ssl.h
> >>> -rw-r--r--. 1 root root 70450 Sep 30 05:41 /usr/include/nss3/ssl.h
> >>>
> >>> So if nss-config --includedir is used then #include <ssl.h> should be
> >>> used, or if not then #include <nss3/ssl.h> but on this system #include
> >>> <nss/ssl.h> is not going to work.
>
> Interesting rename, I doubt any version but NSS 3 and NSPR 4 is alive anywhere
> and an incremented major version seems highly unlikely.  Going back to plain
> #include <ssl.h> and have the includeflags sort out the correct directories
> seems like the best option then.  Fixed in the attached.
>
> >> FYI, if I make a symlink to get past this, configure completes but
> >> compilation fails because nspr/nspr.h cannot be found (I'm not sure
> >> why configure doesn't discover this)
> >> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
> >> #include <nspr/nspr.h>In file included from protocol_nss.c:24:
> >> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
> >> #include <nspr/nspr.h>
> >> ^~~~~~~~~~~~~
> >>
> >> It's a similar issue:
> >> $ nspr-config --includedir
> >> /usr/include/nspr4
>
> Fixed.
>
> > If these get resolved the next issue is llvm bitcode doesn't compile
> > because the nss includedir is missing from CPPFLAGS:
> >
> > /usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv
> > -O2  -I../../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2
> > -I/usr/include -flto=thin -emit-llvm -c -o be-secure-nss.bc
> > be-secure-nss.c
> > In file included from be-secure-nss.c:20:
> > In file included from ../../../src/include/common/nss.h:38:
> > In file included from /usr/include/nss/nss.h:34:
> > /usr/include/nss/seccomon.h:17:10: fatal error: 'prtypes.h' file not found
> > #include "prtypes.h"
> >         ^~~~~~~~~~~
> > 1 error generated.
>
> Fixed.

Apologies for the delay, this didn't go to my inbox and I missed it on list.

The bitcode generation is still broken, this time for nspr.h:

/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv
-O2  -I../../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2
-I/usr/include -flto=thin -emit-llvm -c -o be-secure-nss.bc
be-secure-nss.c
In file included from be-secure-nss.c:20:
../../../src/include/common/nss.h:31:10: fatal error: 'nspr.h' file not found
#include <nspr.h>
         ^~~~~~~~
1 error generated.

FWIW I attached the Dockerfile I've been using to test this, primarily
to ensure that there were no openssl devel files lurking around during
compilation.

It expects a ./postgres directory with whatever patches already applied to it.

>
> The attached also resolves the conflicts in pgcrypto following db7d1a7b05.  PGP
> ElGamal and RSA pubkey functions aren't supported for now as there are no bignum
> functions similar to the BN_* in OpenSSL.  I will look further into how hard it
> would be to support; for now this gets us ahead.
>
> --
> Daniel Gustafsson               https://vmware.com/
>

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 15 Nov 2021, at 20:51, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:

> Apologies for the delay, this didn't go to my inbox and I missed it on list.
>
> The bitcode generation is still broken, this time for nspr.h:

Interesting, I am unable to replicate that in my tree but I'll investigate
further tomorrow using your Dockerfile.  For the sake of testing, does
compilation pass for you in the same place without using --with-llvm?

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Mon, Nov 15, 2021 at 4:44 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 15 Nov 2021, at 20:51, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
>
> > Apologies for the delay, this didn't go to my inbox and I missed it on list.
> >
> > The bitcode generation is still broken, this time for nspr.h:
>
> Interesting, I am unable to replicate that in my tree but I'll investigate
> further tomorrow using your Dockerfile.  For the sake of testing, does
> compilation pass for you in the same place without using --with-llvm?
>

Yes, it builds and check-world passes. I'll continue testing with this
build. Thank you.



Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Mon, Nov 15, 2021 at 5:37 PM Joshua Brindle
<joshua.brindle@crunchydata.com> wrote:
>
> On Mon, Nov 15, 2021 at 4:44 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> >
> > > On 15 Nov 2021, at 20:51, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> >
> > > Apologies for the delay, this didn't go to my inbox and I missed it on list.
> > >
> > > The bitcode generation is still broken, this time for nspr.h:
> >
> > Interesting, I am unable to replicate that in my tree but I'll investigate
> > further tomorrow using your Dockerfile.  For the sake of testing, does
> > compilation pass for you in the same place without using --with-llvm?
> >
>
> Yes, it builds and check-world passes. I'll continue testing with this
> build. Thank you.

The previous Dockerfile had some issues due to a hasty port from RHEL
to Fedora, attached is one that works with your patchset, llvm
currently disabled, and the llvm deps removed.

The service file is also attached since it's referenced in the
Dockerfile and you'd have had to reproduce it.

After building, run with:
docker run --name pg-test -p 5432:5432 --cap-add=SYS_ADMIN -v
/sys/fs/cgroup:/sys/fs/cgroup:ro -d <final docker hash>

Attachment

Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Tue, Nov 16, 2021 at 9:45 AM Joshua Brindle
<joshua.brindle@crunchydata.com> wrote:
>
> On Mon, Nov 15, 2021 at 5:37 PM Joshua Brindle
> <joshua.brindle@crunchydata.com> wrote:
> >
> > On Mon, Nov 15, 2021 at 4:44 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > >
> > > > On 15 Nov 2021, at 20:51, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> > >
> > > > Apologies for the delay, this didn't go to my inbox and I missed it on list.
> > > >
> > > > The bitcode generation is still broken, this time for nspr.h:
> > >
> > > Interesting, I am unable to replicate that in my tree but I'll investigate
> > > further tomorrow using your Dockerfile.  For the sake of testing, does
> > > compilation pass for you in the same place without using --with-llvm?
> > >
> >
> > Yes, it builds and check-world passes. I'll continue testing with this
> > build. Thank you.
>
> The previous Dockerfile had some issues due to a hasty port from RHEL
> to Fedora, attached is one that works with your patchset, llvm
> currently disabled, and the llvm deps removed.
>
> The service file is also attached since it's referenced in the
> Dockerfile and you'd have had to reproduce it.
>
> After building, run with:
> docker run --name pg-test -p 5432:5432 --cap-add=SYS_ADMIN -v
> /sys/fs/cgroup:/sys/fs/cgroup:ro -d <final docker hash>

I think there is a typo in the docs here that prevents them from
building (this diff seems to fix it):

diff --git a/doc/src/sgml/pgcrypto.sgml b/doc/src/sgml/pgcrypto.sgml
index 56b73e033c..844aa31e86 100644
--- a/doc/src/sgml/pgcrypto.sgml
+++ b/doc/src/sgml/pgcrypto.sgml
@@ -767,7 +767,7 @@ pgp_sym_encrypt(data, psw, 'compress-algo=1, cipher-algo=aes256')
    <para>
     Which cipher algorithm to use.  <literal>cast5</literal> is only available
     if <productname>PostgreSQL</productname> was built with
-    <productname>OpenSSL</productame>.
+    <productname>OpenSSL</productname>.
    </para>
 <literallayout>
 Values: bf, aes128, aes192, aes256, 3des, cast5



Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Tue, Nov 16, 2021 at 1:26 PM Joshua Brindle
<joshua.brindle@crunchydata.com> wrote:
>
> On Tue, Nov 16, 2021 at 9:45 AM Joshua Brindle
> <joshua.brindle@crunchydata.com> wrote:
> >
> > On Mon, Nov 15, 2021 at 5:37 PM Joshua Brindle
> > <joshua.brindle@crunchydata.com> wrote:
> > >
> > > On Mon, Nov 15, 2021 at 4:44 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > > >
> > > > > On 15 Nov 2021, at 20:51, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> > > >
> > > > > Apologies for the delay, this didn't go to my inbox and I missed it on list.
> > > > >
> > > > > The bitcode generation is still broken, this time for nspr.h:
> > > >
> > > > Interesting, I am unable to replicate that in my tree but I'll investigate
> > > > further tomorrow using your Dockerfile.  For the sake of testing, does
> > > > compilation pass for you in the same place without using --with-llvm?
> > > >
> > >
> > > Yes, it builds and check-world passes. I'll continue testing with this
> > > build. Thank you.
> >
> > The previous Dockerfile had some issues due to a hasty port from RHEL
> > to Fedora, attached is one that works with your patchset, llvm
> > currently disabled, and the llvm deps removed.
> >
> > The service file is also attached since it's referenced in the
> > Dockerfile and you'd have had to reproduce it.
> >
> > After building, run with:
> > docker run --name pg-test -p 5432:5432 --cap-add=SYS_ADMIN -v
> > /sys/fs/cgroup:/sys/fs/cgroup:ro -d <final docker hash>
>
> I think there is a typo in the docs here that prevents them from
> building (this diff seems to fix it):
>
> diff --git a/doc/src/sgml/pgcrypto.sgml b/doc/src/sgml/pgcrypto.sgml
> index 56b73e033c..844aa31e86 100644
> --- a/doc/src/sgml/pgcrypto.sgml
> +++ b/doc/src/sgml/pgcrypto.sgml
> @@ -767,7 +767,7 @@ pgp_sym_encrypt(data, psw, 'compress-algo=1,
> cipher-algo=aes256')
>     <para>
>      Which cipher algorithm to use.  <literal>cast5</literal> is only available
>      if <productname>PostgreSQL</productname> was built with
> -    <productname>OpenSSL</productame>.
> +    <productname>OpenSSL</productname>.
>     </para>
>  <literallayout>
>  Values: bf, aes128, aes192, aes256, 3des, cast5

After a bit more testing, the server is up and running with an nss
database but before configuring the client database I tried connecting
and got a segfault:

#0  PR_Write (fd=0x0, buf=0x141ba60, amount=84) at
io/../../.././nspr/pr/src/io/priometh.c:114
#1  0x00007ff33dfdc62f in pgtls_write (conn=0x13cecb0, ptr=0x141ba60,
len=84) at fe-secure-nss.c:583
#2  0x00007ff33dfd6e18 in pqsecure_write (conn=0x13cecb0,
ptr=0x141ba60, len=84) at fe-secure.c:295
#3  0x00007ff33dfd04dc in pqSendSome (conn=0x13cecb0, len=84) at fe-misc.c:834
#4  0x00007ff33dfd06c8 in pqFlush (conn=0x13cecb0) at fe-misc.c:972
#5  0x00007ff33dfc257c in pqPacketSend (conn=0x13cecb0, pack_type=0
'\000', buf=0x1414c60, buf_len=80) at fe-connect.c:4619
#6  0x00007ff33dfbfadd in PQconnectPoll (conn=0x13cecb0) at fe-connect.c:2986
#7  0x00007ff33dfbe55c in connectDBComplete (conn=0x13cecb0) at
fe-connect.c:2218
#8  0x00007ff33dfbbaef in PQconnectdbParams (keywords=0x1427d10,
values=0x1427e60, expand_dbname=1) at fe-connect.c:668
#9  0x000000000043ebc7 in main (argc=2, argv=0x7ffdccd0e2f8) at startup.c:273

It looks like the ssl connection falls through to attempt a non-ssl
connection but at some point conn->ssl_in_use gets set to true,
despite pr_fd and nss_context being null.

This patch fixes the segfault but I suspect is not the correct fix,
due to the error when connecting saying "Success":

--- a/src/interfaces/libpq/fe-secure-nss.c
+++ b/src/interfaces/libpq/fe-secure-nss.c
@@ -498,6 +498,11 @@ pgtls_read(PGconn *conn, void *ptr, size_t len)
         * for closed connections, while -1 indicates an error within the ongoing
         * connection.
         */
+       if (!conn->pr_fd) {
+               SOCK_ERRNO_SET(read_errno);
+               return -1;
+       }
+
        nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT);

        if (nread == 0)
@@ -580,6 +585,11 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
        PRErrorCode status;
        int                     write_errno = 0;

+       if (!conn->pr_fd) {
+               SOCK_ERRNO_SET(write_errno);
+               return -1;
+       }
+
        n = PR_Write(conn->pr_fd, ptr, len);

        if (n < 0)
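
As a side note on the "Success" message: in the hunk above write_errno is
initialized to 0, so the guard sets errno to 0, and on glibc strerror(0) is
"Success".  Below is a minimal sketch of a guard that reports a real error
instead.  It assumes ECONNRESET is an acceptable code to report; MockConn and
mock_tls_write are hypothetical stand-ins for illustration, not libpq code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the libpq connection object. */
typedef struct MockConn
{
    void       *pr_fd;      /* NULL when no TLS descriptor was set up */
} MockConn;

/*
 * Guard in the style of the patch above, but with a non-zero errno so the
 * caller does not end up printing "Success".  ECONNRESET is an assumption
 * chosen for illustration only.
 */
static ssize_t
mock_tls_write(MockConn *conn, const void *ptr, size_t len)
{
    assert(conn != NULL);
    (void) ptr;             /* payload is not inspected in this sketch */

    if (conn->pr_fd == NULL)
    {
        errno = ECONNRESET;
        return -1;
    }

    /* Pretend the whole buffer was written. */
    return (ssize_t) len;
}
```

This only changes what gets reported; whether the function should be reachable
at all without a TLS descriptor is a separate question.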



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 17 Nov 2021, at 19:42, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> On Tue, Nov 16, 2021 at 1:26 PM Joshua Brindle
> <joshua.brindle@crunchydata.com> wrote:

>> I think there is a typo in the docs here that prevents them from
>> building (this diff seems to fix it):

Ah yes, thanks, I had noticed that one but forgot to send out a new version to
make the CFBot green.

> After a bit more testing, the server is up and running with an nss
> database but before configuring the client database I tried connecting
> and got a segfault:

Interesting.  I'm unable to reproduce this crash, can you show the sequence of
commands which led to this?

> It looks like the ssl connection falls through to attempt a non-ssl
> connection but at some point conn->ssl_in_use gets set to true,
> despite pr_fd and nss_context being null.

pgtls_close missed setting ssl_in_use to false, fixed in the attached.  I've
also added some assertions to the connection setup for debugging this.
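
To illustrate the kind of invariant involved, here is a minimal sketch.  It is
not the code in the attached patch: SketchConn, sketch_tls_close and
sketch_state_is_consistent are hypothetical names, and a real close function
would also release the NSPR descriptor and the NSS context rather than just
clearing the pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the connection state discussed above. */
typedef struct SketchConn
{
    bool        ssl_in_use;
    void       *pr_fd;
    void       *nss_context;
} SketchConn;

/* Reset every TLS-related field together so no stale flag survives. */
static void
sketch_tls_close(SketchConn *conn)
{
    assert(conn != NULL);

    conn->pr_fd = NULL;
    conn->nss_context = NULL;
    conn->ssl_in_use = false;
}

/* The invariant: ssl_in_use implies that a TLS descriptor exists. */
static bool
sketch_state_is_consistent(const SketchConn *conn)
{
    return !conn->ssl_in_use || conn->pr_fd != NULL;
}
```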

> This patch fixes the segfault but I suspect is not the correct fix,
> due to the error when connecting saying "Success":

Right, without an SSL enabled FD we should never get here.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Tue, Nov 23, 2021 at 9:12 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 17 Nov 2021, at 19:42, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> > On Tue, Nov 16, 2021 at 1:26 PM Joshua Brindle
> > <joshua.brindle@crunchydata.com> wrote:
>
> >> I think there is a typo in the docs here that prevents them from
> >> building (this diff seems to fix it):
>
> Ah yes, thanks, I had noticed that one but forgot to send out a new version to
> make the CFBot green.
>
> > After a bit more testing, the server is up and running with an nss
> > database but before configuring the client database I tried connecting
> > and got a segfault:
>
> Interesting.  I'm unable to reproduce this crash, can you show the sequence of
> commands which led to this?

It no longer happens with v49, since it was a null deref of the pr_fd
which no longer happens.

I'll continue testing now, so far it's looking better.

Did the build issue with --with-llvm get fixed in this update also? I
haven't tried building with it yet.

> > It looks like the ssl connection falls through to attempt a non-ssl
> > connection but at some point conn->ssl_in_use gets set to true,
> > despite pr_fd and nss_context being null.
>
> pgtls_close missed setting ssl_in_use to false, fixed in the attached.  I've
> also added some assertions to the connection setup for debugging this.
>
> > This patch fixes the segfault but I suspect is not the correct fix,
> > due to the error when connecting saying "Success":
>
> Right, without an SSL enabled FD we should never get here.
>

Thank you.



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 23 Nov 2021, at 23:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:

> It no longer happens with v49, since it was a null deref of the pr_fd
> which no longer happens.
>
> I'll continue testing now, so far it's looking better.

Great, thanks for confirming.  I'm still keen on knowing how you triggered the
segfault so I can ensure there are no further bugs around there.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Wed, Nov 24, 2021 at 6:59 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 23 Nov 2021, at 23:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
>
> > It no longer happens with v49, since it was a null deref of the pr_fd
> > which no longer happens.
> >
> > I'll continue testing now, so far it's looking better.
>
> Great, thanks for confirming.  I'm still keen on knowing how you triggered the
> segfault so I can ensure there are no further bugs around there.
>

It happened when I ran psql with hostssl on the server but before I'd
initialized my client certificate store.



Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Wed, Nov 24, 2021 at 8:46 AM Joshua Brindle
<joshua.brindle@crunchydata.com> wrote:
>
> On Wed, Nov 24, 2021 at 6:59 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> >
> > > On 23 Nov 2021, at 23:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> >
> > > It no longer happens with v49, since it was a null deref of the pr_fd
> > > which no longer happens.
> > >
> > > I'll continue testing now, so far it's looking better.
> >
> > Great, thanks for confirming.  I'm still keen on knowing how you triggered the
> > segfault so I can ensure there are no further bugs around there.
> >
>
> It happened when I ran psql with hostssl on the server but before I'd
> initialized my client certificate store.

I don't know enough about NSS to know if this is problematic or not
but if I try verify-full without having the root CA in the certificate
store I get:

$ /usr/pgsql-15/bin/psql "host=localhost sslmode=verify-full user=postgres"
psql: error: SSL error: Issuer certificate is invalid.
unable to shut down NSS context: NSS could not shutdown. Objects are
still in use.



Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Wed, Nov 24, 2021 at 8:49 AM Joshua Brindle
<joshua.brindle@crunchydata.com> wrote:
>
> On Wed, Nov 24, 2021 at 8:46 AM Joshua Brindle
> <joshua.brindle@crunchydata.com> wrote:
> >
> > On Wed, Nov 24, 2021 at 6:59 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> > >
> > > > On 23 Nov 2021, at 23:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> > >
> > > > It no longer happens with v49, since it was a null deref of the pr_fd
> > > > which no longer happens.
> > > >
> > > > I'll continue testing now, so far it's looking better.
> > >
> > > Great, thanks for confirming.  I'm still keen on knowing how you triggered the
> > > segfault so I can ensure there are no further bugs around there.
> > >
> >
> > It happened when I ran psql with hostssl on the server but before I'd
> > initialized my client certificate store.
>
> I don't know enough about NSS to know if this is problematic or not
> but if I try verify-full without having the root CA in the certificate
> store I get:
>
> $ /usr/pgsql-15/bin/psql "host=localhost sslmode=verify-full user=postgres"
> psql: error: SSL error: Issuer certificate is invalid.
> unable to shut down NSS context: NSS could not shutdown. Objects are
> still in use.

Something is strange with SSL downgrading and a bad ssldatabase:
[postgres@11cdfa30f763 ~]$ /usr/pgsql-15/bin/psql "ssldatabase=oops
sslcert=client_cert host=localhost"
Password for user postgres:

<freezes here>

On the server side:
2021-11-25 01:52:01.984 UTC [269] LOG:  unable to handshake:
Encountered end of file (PR_END_OF_FILE_ERROR)

Other than that, and the fact that I still haven't tested --with-llvm, I've
gotten everything working, including with an OpenSSL client. Attached is a
Dockerfile that gets to the point where a client can connect with
clientcert=verify-full. I've removed some of the old cruft and
debugging from the previous versions.

Thank you.

Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
> > Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
> > backend can handle those. That's a separate conversation, but I might
> > take a look at a patch for next commitfest.
> 
> Please do.

Didn't get around to it for November, but I'm putting the finishing
touches on that now.

While I was looking at the new SAN code (in fe-secure-nss.c,
pgtls_verify_peer_name_matches_certificate_guts()), I noticed that code
coverage never seemed to touch a good chunk of it:

> +        for (cn = san_list; cn != san_list; cn = CERT_GetNextGeneralName(cn))
> +        {
> +            char       *alt_name;
> +            int         rv;
> +            char        tmp[512];

That loop can never execute. But I wonder if all of that extra SAN code
should be removed anyway? There's this comment above it:

> +    /*
> +     * CERT_VerifyCertName will internally perform RFC 2818 SubjectAltName
> +     * verification.
> +     */

and it seems like SAN verification is working in my testing, despite
the dead loop.
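
To spell out why the loop is dead: the condition cn != san_list is already
false on entry, so the body runs zero times.  Here is a self-contained sketch
of the difference, assuming the name list is circular (which the loop's own
termination test suggests).  NameNode and both functions are hypothetical
stand-ins, not NSS types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a node in a circular list of names. */
typedef struct NameNode
{
    const char *name;
    struct NameNode *next;  /* the last node points back to the head */
} NameNode;

/* Same shape as the loop in the patch: the test fails before the first pass. */
static int
count_with_for(NameNode *head)
{
    int         visited = 0;
    NameNode   *cn;

    for (cn = head; cn != head; cn = cn->next)
        visited++;
    return visited;
}

/* Visit the head first, then stop once the traversal wraps around. */
static int
count_with_do_while(NameNode *head)
{
    int         visited = 0;
    NameNode   *cn = head;

    assert(head != NULL);
    do
    {
        visited++;
        cn = cn->next;
    } while (cn != head);
    return visited;
}
```

With a three-element ring the for variant visits nothing and the do/while
variant visits all three nodes.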

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 25 Nov 2021, at 14:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> On Wed, Nov 24, 2021 at 8:49 AM Joshua Brindle
> <joshua.brindle@crunchydata.com> wrote:
>>
>> On Wed, Nov 24, 2021 at 8:46 AM Joshua Brindle
>> <joshua.brindle@crunchydata.com> wrote:

>> I don't know enough about NSS to know if this is problematic or not
>> but if I try verify-full without having the root CA in the certificate
>> store I get:
>>
>> $ /usr/pgsql-15/bin/psql "host=localhost sslmode=verify-full user=postgres"
>> psql: error: SSL error: Issuer certificate is invalid.
>> unable to shut down NSS context: NSS could not shutdown. Objects are
>> still in use.

Fixed.

> Something is strange with ssl downgrading and a bad ssldatabase
> [postgres@11cdfa30f763 ~]$ /usr/pgsql-15/bin/psql "ssldatabase=oops
> sslcert=client_cert host=localhost"
> Password for user postgres:
>
> <freezes here>

Also fixed.

> On the server side:
> 2021-11-25 01:52:01.984 UTC [269] LOG:  unable to handshake:
> Encountered end of file (PR_END_OF_FILE_ERROR)

This is normal and expected, but to make it easier on users I've changed this
error message to be aligned with the OpenSSL implementation.

> Other than that and I still haven't tested --with-llvm I've gotten
> everything working, including with an openssl client. Attached is a
> dockerfile that gets to the point where a client can connect with
> clientcert=verify-full. I've removed some of the old cruft and
> debugging from the previous versions.

Very cool, thanks!  I've been unable to reproduce any issues with llvm but I'll
keep poking at that.  A new version will be posted shortly with the above and a
few more fixes.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:

> On 30 Nov 2021, at 20:03, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
>>> Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
>>> backend can handle those. That's a separate conversation, but I might
>>> take a look at a patch for next commitfest.
>>
>> Please do.
>
> Didn't get around to it for November, but I'm putting the finishing
> touches on that now.

Cool, thanks!

> While I was looking at the new SAN code (in fe-secure-nss.c,
> pgtls_verify_peer_name_matches_certificate_guts()), I noticed that code
> coverage never seemed to touch a good chunk of it:
>
>> +        for (cn = san_list; cn != san_list; cn = CERT_GetNextGeneralName(cn))
>> +        {
>> +            char       *alt_name;
>> +            int         rv;
>> +            char        tmp[512];
>
> That loop can never execute. But I wonder if all of that extra SAN code
> should be removed anyway? There's this comment above it:
>
>> +    /*
>> +     * CERT_VerifyCertName will internally perform RFC 2818 SubjectAltName
>> +     * verification.
>> +     */
>
> and it seems like SAN verification is working in my testing, despite
> the dead loop.

Yeah, that's clearly bogus.  I followed the bouncing ball reading NSS code and
from what I can tell the comment is correct.  I removed the dead code, only
realizing after the fact that I might cause conflict with your tree doing so,
in that case sorry.

I've attached a v50 which fixes the issues found by Joshua upthread, as well as
rebases on top of all the recent SSL and pgcrypto changes.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote:
> > On 30 Nov 2021, at 20:03, Jacob Champion <pchampion@vmware.com> wrote:
> > 
> > On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
> > > > Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
> > > > backend can handle those. That's a separate conversation, but I might
> > > > take a look at a patch for next commitfest.
> > > 
> > > Please do.
> > 
> > Didn't get around to it for November, but I'm putting the finishing
> > touches on that now.
> 
> Cool, thanks!

Done and registered in Commitfest.

> Yeah, that's clearly bogus.  I followed the bouncing ball reading NSS code and
> from what I can tell the comment is correct.  I removed the dead code, only
> realizing after the fact that I might cause conflict with your tree doing so,
> in that case sorry.

No worries, there weren't any issues with the rebase.

> I've attached a v50 which fixes the issues found by Joshua upthread, as well as
> rebases on top of all the recent SSL and pgcrypto changes.

Thanks!

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Wed, Dec 15, 2021 at 5:05 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 25 Nov 2021, at 14:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> > On Wed, Nov 24, 2021 at 8:49 AM Joshua Brindle
> > <joshua.brindle@crunchydata.com> wrote:
> >>
> >> On Wed, Nov 24, 2021 at 8:46 AM Joshua Brindle
> >> <joshua.brindle@crunchydata.com> wrote:
>
> >> I don't know enough about NSS to know if this is problematic or not
> >> but if I try verify-full without having the root CA in the certificate
> >> store I get:
> >>
> >> $ /usr/pgsql-15/bin/psql "host=localhost sslmode=verify-full user=postgres"
> >> psql: error: SSL error: Issuer certificate is invalid.
> >> unable to shut down NSS context: NSS could not shutdown. Objects are
> >> still in use.
>
> Fixed.
>
> > Something is strange with ssl downgrading and a bad ssldatabase
> > [postgres@11cdfa30f763 ~]$ /usr/pgsql-15/bin/psql "ssldatabase=oops
> > sslcert=client_cert host=localhost"
> > Password for user postgres:
> >
> > <freezes here>
>
> Also fixed.
>
> > On the server side:
> > 2021-11-25 01:52:01.984 UTC [269] LOG:  unable to handshake:
> > Encountered end of file (PR_END_OF_FILE_ERROR)
>
> This is normal and expected, but to make it easier on users I've changed this
> error message to be aligned with the OpenSSL implementation.
>
> > Other than that and I still haven't tested --with-llvm I've gotten
> > everything working, including with an openssl client. Attached is a
> > dockerfile that gets to the point where a client can connect with
> > clientcert=verify-full. I've removed some of the old cruft and
> > debugging from the previous versions.
>
> Very cool, thanks!  I've been unable to reproduce any issues with llvm but I'll
> keep poking at that.  A new version will be posted shortly with the above and a
> few more fixes.

For v50 this change was required for an llvm build to succeed on my
Fedora system:

diff --git a/configure b/configure
index 25388a75a2..62d554806a 100755
--- a/configure
+++ b/configure
@@ -13276,6 +13276,7 @@ fi

   LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS"
   CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS"
+  CPPFLAGS="$CPPFLAGS $NSS_CFLAGS $NSPR_CFLAGS"


 $as_echo "#define USE_NSS 1" >>confdefs.h

I'm not certain why configure didn't already have that, configure.ac
appears to, but nonetheless it builds, all tests succeed, and a quick
tire kicking looks good.

Thank you.



Re: Support for NSS as a libpq TLS backend

From
Julien Rouhaud
Date:
Hi,

On Wed, Dec 15, 2021 at 11:10:14PM +0100, Daniel Gustafsson wrote:
> 
> I've attached a v50 which fixes the issues found by Joshua upthread, as well as
> rebases on top of all the recent SSL and pgcrypto changes.

The cfbot reports that the patchset doesn't apply anymore:

http://cfbot.cputube.org/patch_36_3138.log
=== Applying patches on top of PostgreSQL commit ID 74527c3e022d3ace648340b79a6ddec3419f6732 ===
[...]
=== applying patch ./v50-0010-nss-Build-infrastructure.patch
patching file configure
patching file configure.ac
Hunk #3 succeeded at 1566 (offset 1 line).
Hunk #4 succeeded at 2366 (offset 1 line).
Hunk #5 succeeded at 2379 (offset 1 line).
patching file src/backend/libpq/Makefile
patching file src/common/Makefile
patching file src/include/pg_config.h.in
Hunk #3 succeeded at 926 (offset 3 lines).
patching file src/interfaces/libpq/Makefile
patching file src/tools/msvc/Install.pm
Hunk #1 FAILED at 440.
1 out of 1 hunk FAILED -- saving rejects to file src/tools/msvc/Install.pm.rej

Could you send a rebased version, possibly with an updated configure as
reported by Joshua?  In the meantime I will switch the entry to Waiting on
Author.



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 15 Jan 2022, at 05:42, Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Wed, Dec 15, 2021 at 11:10:14PM +0100, Daniel Gustafsson wrote:
>>
>> I've attached a v50 which fixes the issues found by Joshua upthread, as well as
>> rebases on top of all the recent SSL and pgcrypto changes.
>
> The cfbot reports that the patchset doesn't apply anymore:

Fixed, as well as rebased and fixed up on top of the recent cryptohash error
reporting functionality to support that on par with the OpenSSL backend.

> ..possibly with an updated configure as reported by Joshua?

I must've fat-fingered the "git add -p" for v50 as the fix was in configure.ac
but not configure.  Fixed now.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Julien Rouhaud
Date:
Hi,

On Mon, Jan 17, 2022 at 03:09:11PM +0100, Daniel Gustafsson wrote:
> 
> I must've fat-fingered the "git add -p" for v50 as the fix was in configure.ac
> but not configure.  Fixed now.

Thanks!  Apparently this version now fails on all OS, e.g.:

https://cirrus-ci.com/task/4643868095283200
[22:17:39.965] #   Failed test 'certificate authorization succeeds with correct client cert in PEM format'
[22:17:39.965] #   at t/001_ssltests.pl line 456.
[22:17:39.965] #          got: '2'
[22:17:39.965] #     expected: '0'
[22:17:39.965]
[22:17:39.965] #   Failed test 'certificate authorization succeeds with correct client cert in PEM format: no stderr'
[22:17:39.965] #   at t/001_ssltests.pl line 456.
[22:17:39.965] #          got: 'psql: error: connection to server at "127.0.0.1", port 50023 failed: certificate present, but not private key file "/home/postgres/.postgresql/postgresql.key"'
[22:17:39.965] #     expected: ''
[22:17:39.965]
[22:17:39.965] #   Failed test 'certificate authorization succeeds with correct client cert in DER format'
[22:17:39.965] #   at t/001_ssltests.pl line 475.
[22:17:39.965] #          got: '2'
[22:17:39.965] #     expected: '0'
[...]



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 18 Jan 2022, at 07:36, Julien Rouhaud <rjuju123@gmail.com> wrote:

> On Mon, Jan 17, 2022 at 03:09:11PM +0100, Daniel Gustafsson wrote:
>>
>> I must've fat-fingered the "git add -p" for v50 as the fix was in configure.ac
>> but not configure.  Fixed now.
>
> Thanks!  Apparently this version now fails on all OS, e.g.:

Fixed, I had made a mistake in the OpenSSL.pm testcode and failed to catch it
in testing.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Support for NSS as a libpq TLS backend

From
Joshua Brindle
Date:
On Tue, Jan 18, 2022 at 7:43 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 18 Jan 2022, at 07:36, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> > On Mon, Jan 17, 2022 at 03:09:11PM +0100, Daniel Gustafsson wrote:
> >>
> >> I must've fat-fingered the "git add -p" for v50 as the fix was in configure.ac
> >> but not configure.  Fixed now.
> >
> > Thanks!  Apparently this version now fails on all OS, e.g.:
>
> Fixed, I had made a mistake in the OpenSSL.pm testcode and failed to catch it
> in testing.

LGTM +1



Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote:
> I've attached a v50 which fixes the issues found by Joshua upthread, as well as
> rebases on top of all the recent SSL and pgcrypto changes.

I'm currently tracking down a slot leak. When opening and closing large
numbers of NSS databases, at some point we appear to run out of slots
and then NSS starts misbehaving, even though we've closed all of our
context handles.

I don't have anything more helpful to share yet, but I wanted to make a
note of it here in case anyone else had seen it or has ideas on what
may be causing it. My next move will be to update the version of NSS
I'm running.

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 18 Jan 2022, at 17:37, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote:
>> I've attached a v50 which fixes the issues found by Joshua upthread, as well as
>> rebases on top of all the recent SSL and pgcrypto changes.
>
> I'm currently tracking down a slot leak. When opening and closing large
> numbers of NSS databases, at some point we appear to run out of slots
> and then NSS starts misbehaving, even though we've closed all of our
> context handles.

Interesting, are you able to share a reproducer for this so I can assist in
debugging it?

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Andres Freund
Date:
Hi,

On 2022-01-18 13:42:54 +0100, Daniel Gustafsson wrote:
> Fixed, I had made a mistake in the OpenSSL.pm testcode and failed to catch it
> in testing.


> +task:
> +  name: Linux - Debian Bullseye (nss)
> [ copy of a bunch of code ]

I also needed similar-but-not-quite-equivalent tasks for the meson patch. I
just moved to splitting the tasks into a template and a use of it. It's
probably not quite right the way I did it there, but it might be worth
looking into:

https://github.com/anarazel/postgres/blob/meson/.cirrus.yml#L181

But maybe this case actually has a better solution, see two paragraphs down:


> +  install_script: |
> +    DEBIAN_FRONTEND=noninteractive apt-get --yes install libnss3 libnss3-dev libnss3-tools libnspr4 libnspr4-dev

This needs an apt-get update beforehand to succeed. That's what caused the last few runs
to fail, see e.g.
https://cirrus-ci.com/task/6293612580306944


Just duplicating the task doesn't really scale once in tree. What about
reconfiguring (note: add --enable-depend) the linux tasks to build against
nss, and then run the relevant subset of tests with it?  Most tests don't use
tcp / SSL anyway, so rerunning a small subset of tests should be feasible?


> From 297ee9ab31aa579e002edc335cce83dae19711b1 Mon Sep 17 00:00:00 2001
> From: Daniel Gustafsson <daniel@yesql.se>
> Date: Mon, 8 Feb 2021 23:52:22 +0100
> Subject: [PATCH v52 01/11] nss: Support libnss as TLS library in libpq

>  16 files changed, 3192 insertions(+), 7 deletions(-)

Phew. This is a huge patch.

Damn, I only opened this thread to report the CI failure. But now I ended up
doing a small review...



> +#include "common/nss.h"
> +
> +/*
> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
> + * colliding definitions from ours, causing a much expected compiler error.
> + * Remove backwards compatibility with ancient NSPR versions to avoid this.
> + */
> +#define NO_NSPR_10_SUPPORT
> +#include <nspr.h>
> +#include <prerror.h>
> +#include <prio.h>
> +#include <prmem.h>
> +#include <prtypes.h>

Duplicated with nss.h. Which brings me to:


> +#include <nss.h>

Is it a great idea to have common/nss.h when there's a library header nss.h?
Perhaps we should have a pg_ssl_{nss,openssl}.h or such?


> +/* ------------------------------------------------------------ */
> +/*                         Public interface                        */
> +/* ------------------------------------------------------------ */

Nitpicks:
I don't think we typically do multiple /* */ comments in a row for this type
of thing. I also don't particularly like centering things like this, tends to
get inconsistent across comments.


> +/*
> + * be_tls_open_server
> + *
> + * Since NSPR initialization must happen after forking, most of the actual
> + * setup of NSPR/NSS is done here rather than in be_tls_init.

The "Since ... must happen after forking" sounds like it's referencing a
previously remarked upon fact. But I don't see anything but a copy of this
comment.

Does this make some things notably more expensive? Presumably it does remove a
bunch of COW opportunities, but likely that's not a huge factor compared to
asymmetric crypto negotiation...

Maybe some of this commentary should migrate to the file header or such?


> This introduce
> + * differences with the OpenSSL support where some errors are only reported
> + * at runtime with NSS where they are reported at startup with OpenSSL.

Found this sentence hard to parse somehow.

It seems pretty unfriendly to only have minimal error checking at postmaster
startup time. Seems at least the presence and usability of keys should be done
*also* at that time?


> +    /*
> +     * If no ciphers are specified, enable them all.
> +     */
> +    if (!SSLCipherSuites || strlen(SSLCipherSuites) == 0)
> +    {
> +        status = NSS_SetDomesticPolicy();
> +        if (status != SECSuccess)
> +        {
> +            ereport(COMMERROR,
> +                    (errmsg("unable to set cipher policy: %s",
> +                            pg_SSLerrmessage(PR_GetError()))));
> +            return -1;
> +        }
> +    }
> +    else
> +    {
> +        char       *ciphers,
> +                   *c;
> +
> +        char       *sep = ":;, ";
> +        PRUint16    ciphercode;
> +        const        PRUint16 *nss_ciphers;
> +        bool        found = false;
> +
> +        /*
> +         * If the user has specified a set of preferred cipher suites we start
> +         * by turning off all the existing suites to avoid the risk of down-
> +         * grades to a weaker cipher than expected.
> +         */
> +        nss_ciphers = SSL_GetImplementedCiphers();
> +        for (int i = 0; i < SSL_GetNumImplementedCiphers(); i++)
> +            SSL_CipherPrefSet(model, nss_ciphers[i], PR_FALSE);
> +
> +        ciphers = pstrdup(SSLCipherSuites);
> +
> +        for (c = strtok(ciphers, sep); c; c = strtok(NULL, sep))
> +        {
> +            if (pg_find_cipher(c, &ciphercode))
> +            {
> +                status = SSL_CipherPrefSet(model, ciphercode, PR_TRUE);
> +                found = true;
> +                if (status != SECSuccess)
> +                {
> +                    ereport(COMMERROR,
> +                            (errmsg("invalid cipher-suite specified: %s", c)));
> +                    return -1;

It likely doesn't matter much because the backend will exit, but because
COMMERROR doesn't throw, it seems like this will leak "ciphers"?
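
A minimal sketch of the pattern in question, not taken from the patch: since
the error report returns control to the caller, the working copy has to be
freed on every exit path. Plain malloc()/free() stand in for pstrdup()/pfree()
here, and demo_find_cipher() and demo_configure_ciphers() are hypothetical
names. As a simplification, an unknown suite name is treated as the error
case.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cipher lookup: accepts only two names. */
static bool
demo_find_cipher(const char *name)
{
    return strcmp(name, "TLS_AES_128_GCM_SHA256") == 0 ||
        strcmp(name, "TLS_AES_256_GCM_SHA384") == 0;
}

/*
 * Returns the number of recognized suites, or -1 on the first unknown one.
 * The working copy is released on every path through the single exit label.
 */
static int
demo_configure_ciphers(const char *suites)
{
    const char *sep = ":;, ";
    size_t      len = strlen(suites) + 1;
    char       *copy;
    char       *c;
    int         result = 0;

    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, suites, len);

    for (c = strtok(copy, sep); c; c = strtok(NULL, sep))
    {
        if (!demo_find_cipher(c))
        {
            /* Error path: fall through to the cleanup instead of returning */
            result = -1;
            goto cleanup;
        }
        result++;
    }

cleanup:
    free(copy);
    return result;
}
```

Funnelling both the success and the failure path through one label keeps the
free in a single place, which also helps if the loop later moves into its own
function.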


> +                }
> +            }
> +        }
> +
> +        pfree(ciphers);
> +
> +        if (!found)
> +        {
> +            ereport(COMMERROR,
> +                    (errmsg("no cipher-suites found")));
> +            return -1;
> +        }
> +    }

Seems like this could reasonably be done in a separate function?



> +    server_cert = PK11_FindCertFromNickname(ssl_cert_file, (void *) port);
> +    if (!server_cert)
> +    {
> +        if (dummy_ssl_passwd_cb_called)
> +        {
> +            ereport(COMMERROR,
> +                    (errmsg("unable to load certificate for \"%s\": %s",
> +                            ssl_cert_file, pg_SSLerrmessage(PR_GetError())),
> +                     errhint("The certificate requires a password.")));
> +            return -1;
> +        }

I assume PR_GetError() is some thread-local construct, given it's also used in
libpq? Why, oh why, do people copy the abysmal "global errno" approach
everywhere.


> +ssize_t
> +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
> +{

I'm not a fan of duplicating the symbol names between be-secure-openssl.c and
this. For one it's annoying for source code navigation. It also seems that at
some point we might want to be able to link against both at the same time?
Maybe we should name them unambiguously and then use some indirection in a
header somewhere?
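
One way to read that suggestion, as a rough sketch only: each library gets
uniquely named functions, and a small table of function pointers provides the
indirection. Every name below (DemoPort, DemoTlsBackend, demo_*_read) is
hypothetical and nothing here exists in the tree.

```c
#include <stddef.h>
#include <string.h>

typedef struct DemoPort DemoPort;   /* hypothetical stand-in for Port */

/* Per-library entry points, collected in one table. */
typedef struct DemoTlsBackend
{
    const char *name;
    long        (*read) (DemoPort *port, void *buf, size_t len);
} DemoTlsBackend;

struct DemoPort
{
    const DemoTlsBackend *tls;      /* backend chosen for this connection */
};

/* Unambiguous, per-library symbol names; the bodies are placeholders. */
static long
demo_openssl_read(DemoPort *port, void *buf, size_t len)
{
    (void) port;
    memset(buf, 'o', len);
    return (long) len;
}

static long
demo_nss_read(DemoPort *port, void *buf, size_t len)
{
    (void) port;
    memset(buf, 'n', len);
    return (long) len;
}

static const DemoTlsBackend demo_openssl_backend = {"openssl", demo_openssl_read};
static const DemoTlsBackend demo_nss_backend = {"nss", demo_nss_read};

/* The generic entry point that callers would use. */
static long
demo_tls_read(DemoPort *port, void *buf, size_t len)
{
    return port->tls->read(port, buf, len);
}
```

With this layout both implementations can be linked into the same binary,
since no two of them export the same symbol.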


> +    ssize_t        n_read;
> +    PRErrorCode err;
> +
> +    n_read = PR_Read(port->pr_fd, ptr, len);
> +
> +    if (n_read < 0)
> +    {
> +        err = PR_GetError();
> +
> +        if (err == PR_WOULD_BLOCK_ERROR)
> +        {
> +            *waitfor = WL_SOCKET_READABLE;
> +            errno = EWOULDBLOCK;
> +        }
> +        else
> +            errno = ECONNRESET;
> +    }
> +
> +    return n_read;
> +}
> +
> +ssize_t
> +be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
> +{
> +    ssize_t        n_write;
> +    PRErrorCode err;
> +    PRIntn        flags = 0;
> +
> +    /*
> +     * The flags parameter to PR_Send is no longer used and is, according to
> +     * the documentation, required to be zero.
> +     */
> +    n_write = PR_Send(port->pr_fd, ptr, len, flags, PR_INTERVAL_NO_WAIT);
> +
> +    if (n_write < 0)
> +    {
> +        err = PR_GetError();
> +
> +        if (err == PR_WOULD_BLOCK_ERROR)
> +        {
> +            *waitfor = WL_SOCKET_WRITEABLE;
> +            errno = EWOULDBLOCK;
> +        }
> +        else
> +            errno = ECONNRESET;
> +    }
> +
> +    return n_write;
> +}
> +
> +/*
> + * be_tls_close
> + *
> + * Callback for closing down the current connection, if any.
> + */
> +void
> +be_tls_close(Port *port)
> +{
> +    if (!port)
> +        return;
> +    /*
> +     * Immediately signal to the rest of the backend that this connnection is
> +     * no longer to be considered to be using TLS encryption.
> +     */
> +    port->ssl_in_use = false;
> +
> +    if (port->peer_cn)
> +    {
> +        SSL_InvalidateSession(port->pr_fd);
> +        pfree(port->peer_cn);
> +        port->peer_cn = NULL;
> +    }
> +
> +    PR_Close(port->pr_fd);
> +    port->pr_fd = NULL;

What if we failed before initializing pr_fd?
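
A small sketch of the guard this question points at, under the assumption
that the close routine should not be handed a descriptor that was never set
up. DemoFd, DemoPort and the demo_* functions are hypothetical stand-ins, not
NSPR or PostgreSQL APIs.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoFd
{
    int         closed;
} DemoFd;

typedef struct DemoPort
{
    bool        ssl_in_use;
    DemoFd     *fd;             /* NULL until the TLS setup has succeeded */
} DemoPort;

static int  demo_close_calls = 0;

/* Stand-in for the library's close call; must not be given NULL. */
static void
demo_fd_close(DemoFd *fd)
{
    fd->closed = 1;
    demo_close_calls++;
}

static void
demo_tls_close(DemoPort *port)
{
    if (port == NULL)
        return;

    port->ssl_in_use = false;

    /* Only close what was actually opened, and make repeat calls harmless */
    if (port->fd != NULL)
    {
        demo_fd_close(port->fd);
        port->fd = NULL;
    }
}
```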


> +    /*
> +     * Since there is no password callback in NSS when the server starts up,
> +     * it makes little sense to create an interactive callback. Thus, if this
> +     * is a retry attempt then give up immediately.
> +     */
> +    if (retry)
> +        return NULL;

That's really not great. Can't we do something like initialize NSS in
postmaster, load the key into memory, including prompting, and then shut nss
down again?



> +/*
> + * raw_subject_common_name
> + *
> + * Returns the Subject Common Name for the given certificate as a raw char
> + * buffer (that is, without any form of escaping for unprintable characters or
> + * embedded nulls), with the length of the buffer returned in the len param.
> + * The buffer is allocated in the TopMemoryContext and is given a NULL
> + * terminator so that callers are safe to call strlen() on it.
> + *
> + * This is used instead of CERT_GetCommonName(), which always performs quoting
> + * and/or escaping. NSS doesn't appear to give us a way to easily unescape the
> + * result, and we need to store the raw CN into port->peer_cn for compatibility
> + * with the OpenSSL implementation.
> + */

Do we have a testcase for embedded NULLs in common names?
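
For illustration, the check such a test would exercise can be sketched as
below. It assumes, as the comment above describes, that the caller gets both
a terminated buffer and its true length; demo_cn_has_embedded_nul() is a
hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if the first len bytes of cn contain a NUL byte, that is, if
 * strlen() would report the name as shorter than the certificate says it is.
 */
static bool
demo_cn_has_embedded_nul(const char *cn, size_t len)
{
    return memchr(cn, '\0', len) != NULL;
}
```

A name such as "example.com\0.attacker.test" has a raw length of 26 bytes,
but string functions stop at byte 11, which is why the raw length has to be
carried alongside the buffer.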


> +static char *
> +raw_subject_common_name(CERTCertificate *cert, unsigned int *len)
> +{
> +    CERTName    subject = cert->subject;
> +    CERTRDN      **rdn;
> +
> +    for (rdn = subject.rdns; *rdn; rdn++)
> +    {
> +        CERTAVA      **ava;
> +
> +        for (ava = (*rdn)->avas; *ava; ava++)
> +        {
> +            SECItem       *buf;
> +            char       *cn;
> +
> +            if (CERT_GetAVATag(*ava) != SEC_OID_AVA_COMMON_NAME)
> +                continue;
> +
> +            /* Found a CN, decode and copy it into a newly allocated buffer */
> +            buf = CERT_DecodeAVAValue(&(*ava)->value);
> +            if (!buf)
> +            {
> +                /*
> +                 * This failure case is difficult to test. (Since this code
> +                 * runs after certificate authentication has otherwise
> +                 * succeeded, you'd need to convince a CA implementation to
> +                 * sign a corrupted certificate in order to get here.)

Why is that hard with a toy CA locally? Might not be worth the effort, but if
the comment explicitly talks about it being hard...



> +                 * Follow the behavior of CERT_GetCommonName() in this case and
> +                 * simply return NULL, as if a Common Name had not been found.
> +                 */
> +                goto fail;
> +            }
> +
> +            cn = MemoryContextAlloc(TopMemoryContext, buf->len + 1);
> +            memcpy(cn, buf->data, buf->len);
> +            cn[buf->len] = '\0';
> +
> +            *len = buf->len;
> +
> +            SECITEM_FreeItem(buf, PR_TRUE);
> +            return cn;
> +        }
> +    }
> +
> +fail:
> +    /* Not found  */
> +    *len = 0;
> +    return NULL;
> +}
>

> +/*
> + * pg_SSLShutdownFunc
> + *        Callback for NSS shutdown
> + *
> + * If NSS is terminated from the outside when the connection is still in use

What does "NSS is terminated from the outside when the connection" really
mean? Does this mean the client initiating something?


> + * we must treat this as potentially hostile and immediately close to avoid
> + * leaking the connection in any way. Once this is called, NSS will shutdown
> + * regardless so we may as well clean up the best we can. Returning SECFailure
> + * will cause the NSS shutdown to return with an error, but it will shutdown
> + * nevertheless. nss_data is reserved for future use and is always NULL.
> + */
> +static SECStatus
> +pg_SSLShutdownFunc(void *private_data, void *nss_data)
> +{
> +    Port *port = (Port *) private_data;
> +
> +    if (!port || !port->ssl_in_use)
> +        return SECSuccess;

How can that happen?


> +    /*
> +     * There is a connection still open, close it and signal to whatever that
> +     * called the shutdown that it was erroneous.
> +     */
> +    be_tls_close(port);
> +    be_tls_destroy();

And there isn't any danger around those functions getting called again
later?



> +void
> +pgtls_close(PGconn *conn)
> +{
> +    conn->ssl_in_use = false;
> +    conn->has_password = false;
> +
> +    /*
> +     * If the system trust module has been loaded we must try to unload it
> +     * before closing the context, since it will otherwise fail reporting a
> +     * SEC_ERROR_BUSY error.
> +     */
> +    if (ca_trust != NULL)
> +    {
> +        if (SECMOD_UnloadUserModule(ca_trust) != SECSuccess)
> +        {
> +            pqInternalNotice(&conn->noticeHooks,
> +                             "unable to unload trust module");
> +        }
> +        else
> +        {
> +            SECMOD_DestroyModule(ca_trust);
> +            ca_trust = NULL;
> +        }
> +    }

Might just misunderstand: How can it be ok to destroy ca_trust here? What if
there's other connections using it? The same thread might be using multiple
connections, and multiple threads might be using connections. Seems very much
not thread safe.
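
To make the concern concrete, here is one possible shape for sharing a
process-wide resource between connections: a reference count protected by a
mutex, so that the resource is only torn down when the last user goes away.
This is a sketch under that assumption and not what the patch does; all
demo_* names are hypothetical, and an int flag stands in for the module
handle.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t demo_trust_lock = PTHREAD_MUTEX_INITIALIZER;
static int  demo_trust_refs = 0;    /* connections currently using it */
static int  demo_trust_loaded = 0;  /* stands in for the module handle */

/* Called when a connection starts using the shared trust module. */
static void
demo_trust_acquire(void)
{
    pthread_mutex_lock(&demo_trust_lock);
    if (demo_trust_refs++ == 0)
        demo_trust_loaded = 1;      /* first user: load the module here */
    pthread_mutex_unlock(&demo_trust_lock);
}

/* Called when a connection closes. */
static void
demo_trust_release(void)
{
    pthread_mutex_lock(&demo_trust_lock);
    if (demo_trust_refs > 0 && --demo_trust_refs == 0)
        demo_trust_loaded = 0;      /* last user: unload the module here */
    pthread_mutex_unlock(&demo_trust_lock);
}
```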


> +PostgresPollingStatusType
> +pgtls_open_client(PGconn *conn)
> +{
> +    SECStatus    status;
> +    PRFileDesc *model;
> +    NSSInitParameters params;
> +    SSLVersionRange desired_range;
> +
> +#ifdef ENABLE_THREAD_SAFETY
> +#ifdef WIN32
> +    /* This locking is modelled after fe-secure-openssl.c */
> +    if (ssl_config_mutex == NULL)
> +    {
> +        while (InterlockedExchange(&win32_ssl_create_mutex, 1) == 1)
> +            /* loop while another thread owns the lock */ ;
> +        if (ssl_config_mutex == NULL)
> +        {
> +            if (pthread_mutex_init(&ssl_config_mutex, NULL))
> +            {
> +                printfPQExpBuffer(&conn->errorMessage,
> +                                  libpq_gettext("unable to lock thread"));
> +                return PGRES_POLLING_FAILED;
> +            }
> +        }
> +        InterlockedExchange(&win32_ssl_create_mutex, 0);
> +    }
> +#endif
> +    if (pthread_mutex_lock(&ssl_config_mutex))
> +    {
> +        printfPQExpBuffer(&conn->errorMessage,
> +                          libpq_gettext("unable to lock thread"));
> +        return PGRES_POLLING_FAILED;
> +    }
> +#endif                            /* ENABLE_THREAD_SAFETY */

I'd very much like to avoid duplicating this code. Can we put it somewhere
combined instead?


> +    /*
> +     * The NSPR documentation states that runtime initialization via PR_Init
> +     * is no longer required, as the first caller into NSPR will perform the
> +     * initialization implicitly. See be-secure-nss.c for further discussion
> +     * on PR_Init.
> +     */
> +    PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0);

Why does this, and several subsequent bits, have to happen under a lock?


> +    if (conn->ssl_max_protocol_version && strlen(conn->ssl_max_protocol_version) > 0)
> +    {
> +        int            ssl_max_ver = ssl_protocol_param_to_nss(conn->ssl_max_protocol_version);
> +
> +        if (ssl_max_ver == -1)
> +        {
> +            printfPQExpBuffer(&conn->errorMessage,
> +                              libpq_gettext("invalid value \"%s\" for maximum version of SSL protocol\n"),
> +                              conn->ssl_max_protocol_version);
> +            return -1;
> +        }
> +
> +        desired_range.max = ssl_max_ver;
> +    }
> +
> +    if (SSL_VersionRangeSet(model, &desired_range) != SECSuccess)
> +    {
> +        printfPQExpBuffer(&conn->errorMessage,
> +                          libpq_gettext("unable to set allowed SSL protocol version range: %s"),
> +                          pg_SSLerrmessage(PR_GetError()));
> +        return PGRES_POLLING_FAILED;
> +    }

Why are some parts returning -1 and some PGRES_POLLING_FAILED? -1 certainly
isn't a member of PostgresPollingStatusType.
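
A sketch of the consistent form, using a local stand-in enum rather than the
real PostgresPollingStatusType: the -1 sentinel stays internal to the parsing
helper, and the function that is declared to return a status only ever
returns members of the enum. All names are hypothetical.

```c
#include <string.h>

/* Stand-in for the polling status enum; values are illustrative only. */
typedef enum
{
    DEMO_POLLING_FAILED = 0,
    DEMO_POLLING_OK
} DemoPollingStatus;

/* Hypothetical parser: returns a version code, or -1 if unrecognized. */
static int
demo_protocol_to_version(const char *name)
{
    if (strcmp(name, "TLSv1.2") == 0)
        return 12;
    if (strcmp(name, "TLSv1.3") == 0)
        return 13;
    return -1;
}

/* Every exit path returns a DemoPollingStatus member, never a bare -1. */
static DemoPollingStatus
demo_set_max_version(const char *name, int *max_version)
{
    int         ver = demo_protocol_to_version(name);

    if (ver == -1)
        return DEMO_POLLING_FAILED;

    *max_version = ver;
    return DEMO_POLLING_OK;
}
```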


> +                /*
> +                 * The error cases for PR_Recv are not documented, but can be
> +                 * reverse engineered from _MD_unix_map_default_error() in the
> +                 * NSPR code, defined in pr/src/md/unix/unix_errors.c.
> +                 */

Can we propose a patch to document them? Don't want to get bitten by this
suddenly changing...




> From a12769bd793a8e073125c3b3a176b355335646bc Mon Sep 17 00:00:00 2001
> From: Daniel Gustafsson <daniel@yesql.se>
> Date: Mon, 8 Feb 2021 23:52:45 +0100
> Subject: [PATCH v52 07/11] nss: Support NSS in pgcrypto
>
> This extends pgcrypto to be able to use libnss as a cryptographic
> backend for pgcrypto much like how OpenSSL is a supported backend.
> Blowfish is not a supported cipher in NSS, so the implementation
> falls back on the built-in BF code to be compatible in terms of
> cipher support.

I wish we didn't have pgcrypto in its current form.



> From 5079ce8a677074b93ef1f118d535c6dee4ce64f9 Mon Sep 17 00:00:00 2001
> From: Daniel Gustafsson <daniel@yesql.se>
> Date: Mon, 8 Feb 2021 23:52:55 +0100
> Subject: [PATCH v52 10/11] nss: Build infrastructure
> 
> Finally this adds the infrastructure to build a postgres installation
> with libnss support.

I would suggest trying to come up with a way to reorder / split the series so
that smaller pieces are committable. The way you have this right now leaves
you with applying all of it at once as the only realistic way. And this
patchset is too large for that.


Greetings,

Andres Freund



Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2022-01-19 at 10:01 +0100, Daniel Gustafsson wrote:
> > On 18 Jan 2022, at 17:37, Jacob Champion <pchampion@vmware.com> wrote:
> > 
> > On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote:
> > > I've attached a v50 which fixes the issues found by Joshua upthread, as well as
> > > rebases on top of all the recent SSL and pgcrypto changes.
> > 
> > I'm currently tracking down a slot leak. When opening and closing large
> > numbers of NSS databases, at some point we appear to run out of slots
> > and then NSS starts misbehaving, even though we've closed all of our
> > context handles.
> 
> Interesting, are you able to share a reproducer for this so I can assist in
> debugging it?

(This was in my spam folder, sorry for the delay...) Let me see if I
can minimize my current reproduction case and get it ported out of
Python.

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Tue, 2022-01-25 at 22:26 +0000, Jacob Champion wrote:
> On Wed, 2022-01-19 at 10:01 +0100, Daniel Gustafsson wrote:
> > > On 18 Jan 2022, at 17:37, Jacob Champion <pchampion@vmware.com> wrote:
> > > 
> > > On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote:
> > > > I've attached a v50 which fixes the issues found by Joshua upthread, as well as
> > > > rebases on top of all the recent SSL and pgcrypto changes.
> > > 
> > > I'm currently tracking down a slot leak. When opening and closing large
> > > numbers of NSS databases, at some point we appear to run out of slots
> > > and then NSS starts misbehaving, even though we've closed all of our
> > > context handles.
> > 
> > Interesting, are you able to share a reproducer for this so I can assist in
> > debugging it?
> 
> (This was in my spam folder, sorry for the delay...) Let me see if I
> can minimize my current reproduction case and get it ported out of
> Python.

Here's my attempt at a Bash port. It has races but reliably reproduces
on my machine after 98 connections (there's a hardcoded slot limit of
100, so that makes sense when factoring in the internal NSS slots).

--Jacob

Attachment

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 23 Jan 2022, at 22:20, Andres Freund <andres@anarazel.de> wrote:
> On 2022-01-18 13:42:54 +0100, Daniel Gustafsson wrote:

Thanks heaps for the review, much appreciated!

>> +  install_script: |
>> +    DEBIAN_FRONTEND=noninteractive apt-get --yes install libnss3 libnss3-dev libnss3-tools libnspr4 libnspr4-dev
>
> This needs an apt-get update beforehand to succeed. That's what caused the last few runs
> to fail, see e.g.
> https://cirrus-ci.com/task/6293612580306944

Ah, good point. Adding that made it indeed work.

> Just duplicating the task doesn't really scale once in tree.

Totally agree.  This was mostly a hack to see if I could make the CFBot build a
tailored build, then life threw school closures etc at me and I sort of forgot
about removing it again.

> What about
> reconfiguring (note: add --enable-depend) the linux tasks to build against
> nss, and then run the relevant subset of tests with it?  Most tests don't use
> tcp / SSL anyway, so rerunning a small subset of tests should be feasible?

That's an interesting idea, I think that could work and be reasonably readable
at the same time (and won't require in-depth knowledge of Cirrus).  As it's the
same task it does spend more time towards the max runtime per task, but that's
not a problem for now.  It's worth keeping in mind though if we deem this to be
a way forward with testing multiple settings.

>> From 297ee9ab31aa579e002edc335cce83dae19711b1 Mon Sep 17 00:00:00 2001
>> From: Daniel Gustafsson <daniel@yesql.se>
>> Date: Mon, 8 Feb 2021 23:52:22 +0100
>> Subject: [PATCH v52 01/11] nss: Support libnss as TLS library in libpq
>
>> 16 files changed, 3192 insertions(+), 7 deletions(-)
>
> Phew. This is a huge patch.

Yeah =/ ..  without going beyond and inventing new things on top of what is
needed to replace OpenSSL, a lot of code (and tests) has to be written.  If
nothing else, this work at least highlights just how much we've come to use
OpenSSL.

> Damn, I only opened this thread to report the CI failure. But now I ended up
> doing a small review...

Thanks! Next time we meet, I owe you a beverage of choice.

>> +#include "common/nss.h"
>> +
>> +/*
>> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
>> + * colliding definitions from ours, causing a much expected compiler error.
>> + * Remove backwards compatibility with ancient NSPR versions to avoid this.
>> + */
>> +#define NO_NSPR_10_SUPPORT
>> +#include <nspr.h>
>> +#include <prerror.h>
>> +#include <prio.h>
>> +#include <prmem.h>
>> +#include <prtypes.h>
>
> Duplicated with nss.h. Which brings me to:

Fixed, there and elsewhere.

>> +#include <nss.h>
>
> Is it a great idea to have common/nss.h when there's a library header nss.h?
> Perhaps we should have a pg_ssl_{nss,openssl}.h or such?

That's a good point, I modelled it after common/openssl.h but I agree it's
better to differentiate the filenames.  I've renamed it to common/pg_nss.h and
we should IMO rename common/openssl.h regardless of what happens to this patch.

>> +/* ------------------------------------------------------------ */
>> +/*                         Public interface                        */
>> +/* ------------------------------------------------------------ */
>
> Nitpicks:
> I don't think we typically do multiple /* */ comments in a row for this type
> of thing. I also don't particularly like centering things like this, tends to
> get inconsistent across comments.

This is just a copy/paste from be-secure-openssl.c, but I'm far from married to
it so happy to remove. Fixed.

>> +/*
>> + * be_tls_open_server
>> + *
>> + * Since NSPR initialization must happen after forking, most of the actual
>> + * setup of NSPR/NSS is done here rather than in be_tls_init.
>
> The "Since ... must happen after forking" sounds like it's referencing a
> previously remarked upon fact. But I don't see anything but a copy of this
> comment.

NSS contexts aren't fork safe, IIRC it's around its use of file descriptors.
Fairly old NSS documentation and mailing list posts cite hardware tokens (which
was a very strong focus in the earlier days of NSS) not being safe to use across
forks and thus none of NSS was ever intended to be initialized until after the
fork. I've reworded this comment a bit to make that clearer.

> Does this make some things notably more expensive? Presumably it does remove a
> bunch of COW opportunities, but likely that's not a huge factor compared to
> asymmetric crypto negotiation...

Right, in the context of setting up crypto across a network connection it's
highly likely to drown out the costs.

> Maybe some of this commentary should migrate to the file header or such?

Maybe, or perhaps README.ssl?  Not sure where it would be most reasonable to
keep it such that it's also kept up to date.

>> This introduce
>> + * differences with the OpenSSL support where some errors are only reported
>> + * at runtime with NSS where they are reported at startup with OpenSSL.
>
> Found this sentence hard to parse somehow.
>
> It seems pretty unfriendly to only have minimal error checking at postmaster
> startup time. Seems at least the presence and usability of keys should be done
> *also* at that time?

I'll look at adding some setup, and subsequent teardown, of NSS at startup
during which we could do checking to be more on par with how the OpenSSL
backend will report errors.

>> +    /*
>> +     * If no ciphers are specified, enable them all.
>> +     */
>> +    if (!SSLCipherSuites || strlen(SSLCipherSuites) == 0)
>> +    {
>> +        ...
>> +                if (status != SECSuccess)
>> +                {
>> +                    ereport(COMMERROR,
>> +                            (errmsg("invalid cipher-suite specified: %s", c)));
>> +                    return -1;
>
> It likely doesn't matter much because the backend will exit, but because
> COMMERROR doesn't throw, it seems like this will leak "ciphers"?

Agreed, it won't matter much in practice but we should clearly pfree it, fixed.

>> +        pfree(ciphers);
>> +
>> +        if (!found)
>> +        {
>> +            ereport(COMMERROR,
>> +                    (errmsg("no cipher-suites found")));
>> +            return -1;
>> +        }
>> +    }
>
> Seems like this could reasonably be done in a separate function?

Agreed, trimming the length of an already very long function is a good idea.
Fixed.

> I assume PR_GetError() is some thread-local construct, given it's also used in
> libpq?

Correct.

> Why, oh why, do people copy the abysmal "global errno" approach everywhere.

Even better, NSPR has two of them: PR_GetError and PR_GetOSError (the latter
isn't used in this implementation, but it could potentially be added to error
paths on NSS_InitContext and other calls that read off the filesystem).

>> +ssize_t
>> +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
>> +{
>
> I'm not a fan of duplicating the symbol names between be-secure-openssl.c and
> this. For one it's annoying for source code navigation. It also seems that at
> some point we might want to be able to link against both at the same time?
> Maybe we should name them unambiguously and then use some indirection in a
> header somewhere?

We could do that, and that's something that we could do independently of this
patch to keep the scope down.  Doing it in master now with just the OpenSSL
implementation as a consumer would be a logical next step in the TLS library
abstraction we've done.

>> +    PR_Close(port->pr_fd);
>> +    port->pr_fd = NULL;
>
> What if we failed before initializing pr_fd?

Fixed.

>> +    /*
>> +     * Since there is no password callback in NSS when the server starts up,
>> +     * it makes little sense to create an interactive callback. Thus, if this
>> +     * is a retry attempt then give up immediately.
>> +     */
>> +    if (retry)
>> +        return NULL;
>
> That's really not great. Can't we do something like initialize NSS in
> postmaster, load the key into memory, including prompting, and then shut nss
> down again?

I can look at doing something along those lines.  It does require setting up a
fair bit of infrastructure but if the code is refactored to allow reuse it can
probably be done fairly readably.

>> +/*
>> + * raw_subject_common_name
>> + *
>> + * Returns the Subject Common Name for the given certificate as a raw char
>> + * buffer (that is, without any form of escaping for unprintable characters or
>> + * embedded nulls), with the length of the buffer returned in the len param.
>> + * The buffer is allocated in the TopMemoryContext and is given a NULL
>> + * terminator so that callers are safe to call strlen() on it.
>> + *
>> + * This is used instead of CERT_GetCommonName(), which always performs quoting
>> + * and/or escaping. NSS doesn't appear to give us a way to easily unescape the
>> + * result, and we need to store the raw CN into port->peer_cn for compatibility
>> + * with the OpenSSL implementation.
>> + */
>
> Do we have a testcase for embedded NULLs in common names?

We don't, neither for OpenSSL nor NSS.  AFAICR Jacob spent days trying to get a
certificate generation to include an embedded NULL byte but in the end gave up.
We would have to write our own tools for generating certificates to add that
(which may or may not be a bad idea, but it hasn't been done).

>> +            /* Found a CN, decode and copy it into a newly allocated buffer */
>> +            buf = CERT_DecodeAVAValue(&(*ava)->value);
>> +            if (!buf)
>> +            {
>> +                /*
>> +                 * This failure case is difficult to test. (Since this code
>> +                 * runs after certificate authentication has otherwise
>> +                 * succeeded, you'd need to convince a CA implementation to
>> +                 * sign a corrupted certificate in order to get here.)
>
> Why is that hard with a toy CA locally? Might not be worth the effort, but if
> the comment explicitly talks about it being hard...

The gist of this comment is that it's hard to do with a stock local CA.  I've
added a small blurb to clarify that a custom implementation would be required.

>> +/*
>> + * pg_SSLShutdownFunc
>> + *        Callback for NSS shutdown
>> + *
>> + * If NSS is terminated from the outside when the connection is still in use
>
> What does "NSS is terminated from the outside when the connection" really
> mean? Does this mean the client initiating something?

If an extension, or other server-loaded code, interfered with NSS and managed
to close contexts in order to interfere with connections, this would ensure
that we close it down cleanly.

That being said, I was now unable to get my old testcase working so I've for
now removed this callback from the patch until I can work out if we can make
proper use of it.  AFAICS other mature NSS implementations aren't using it
(OpenLDAP did in the past but have since removed it, will look at how/why).

>> +        else
>> +        {
>> +            SECMOD_DestroyModule(ca_trust);
>> +            ca_trust = NULL;
>> +        }
>> +    }
>
> Might just misunderstand: How can it be ok to destroy ca_trust here? What if
> there's other connections using it? The same thread might be using multiple
> connections, and multiple threads might be using connections. Seems very much
> not thread safe.

Right, that's a leftover from early hacking that I had missed. Fixed.

>> +    /* This locking is modelled after fe-secure-openssl.c */
>> +    if (ssl_config_mutex == NULL)
>> +    {
>> +    ...
>
> I'd very much like to avoid duplicating this code. Can we put it somewhere
> combined instead?

I can look at splitting it out to fe-secure-common.c.  A first step here to
keep the goalposts from moving in this patch would be to look at combining lock
init in fe-secure-openssl.c:pgtls_init() and fe-connect.c:default_threadlock,
and then just apply the same recipe here once landed.  This could be done
independent of this patch.

>> +    /*
>> +     * The NSPR documentation states that runtime initialization via PR_Init
>> +     * is no longer required, as the first caller into NSPR will perform the
>> +     * initialization implicitly. See be-secure-nss.c for further discussion
>> +     * on PR_Init.
>> +     */
>> +    PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0);
>
> Why does this, and several subsequent bits, have to happen under a lock?

NSS initialization isn't thread-safe, there is more discussion upthread in and
around this email:

https://postgr.es/m/c8d4bc0dfd266799ab4213f1673a813786ac0c70.camel@vmware.com
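
To illustrate the locking pattern being discussed, here is a minimal sketch
of serializing a non-thread-safe, one-time initialization behind a mutex.  It
is not code from the patch: fake_library_init() is a hypothetical stand-in for
the real PR_Init/NSS setup sequence, and ensure_initialized() and
init_call_count() are names invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a library setup routine that must not be
 * entered by two threads at once.  It only counts how often it ran.
 */
static int	init_calls = 0;

static bool
fake_library_init(void)
{
	init_calls++;
	return true;
}

static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool initialized = false;

/*
 * Returns true once initialization has succeeded.  The mutex guarantees
 * that the check and the setup call happen as one unit, so concurrent
 * callers cannot both run the setup.
 */
bool
ensure_initialized(void)
{
	bool		ok;
	int			rc;

	if (pthread_mutex_lock(&init_mutex) != 0)
		return false;

	if (!initialized)
		initialized = fake_library_init();
	ok = initialized;

	rc = pthread_mutex_unlock(&init_mutex);
	assert(rc == 0);
	(void) rc;

	return ok;
}

/* Exposes the counter so that callers can observe how often setup ran. */
int
init_call_count(void)
{
	return init_calls;
}
```

Calling ensure_initialized() repeatedly, from one thread or several, should
leave init_call_count() at 1 as long as the first setup attempt succeeds.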

> Why are some parts returning -1 and some PGRES_POLLING_FAILED? -1 certainly
> isn't a member of PostgresPollingStatusType.

That was a thinko, fixed.

>> +                /*
>> +                 * The error cases for PR_Recv are not documented, but can be
>> +                 * reverse engineered from _MD_unix_map_default_error() in the
>> +                 * NSPR code, defined in pr/src/md/unix/unix_errors.c.
>> +                 */
>
> Can we propose a patch to document them? Don't want to get bitten by this
> suddenly changing...

I can certainly propose something on their mailinglist, but I unfortunately
wouldn't get my hopes up too high as NSS and documentation aren't exactly best
friends (the in-tree docs don't cover the API and Mozilla recently removed
most of the online docs in their neverending developer site reorg).

>> From a12769bd793a8e073125c3b3a176b355335646bc Mon Sep 17 00:00:00 2001
>> From: Daniel Gustafsson <daniel@yesql.se>
>> Date: Mon, 8 Feb 2021 23:52:45 +0100
>> Subject: [PATCH v52 07/11] nss: Support NSS in pgcrypto
>>
>> This extends pgcrypto to be able to use libnss as a cryptographic
>> backend for pgcrypto much like how OpenSSL is a supported backend.
>> Blowfish is not a supported cipher in NSS, so the implementation
>> falls back on the built-in BF code to be compatible in terms of
>> cipher support.
>
> I wish we didn't have pgcrypto in its current form.

Yes.  Very much yes.  I don't think doing anything about that in the context of
this patch is wise, but a discussion on where to take pgcrypto in the future
would probably be a good idea.

>> From 5079ce8a677074b93ef1f118d535c6dee4ce64f9 Mon Sep 17 00:00:00 2001
>> From: Daniel Gustafsson <daniel@yesql.se>
>> Date: Mon, 8 Feb 2021 23:52:55 +0100
>> Subject: [PATCH v52 10/11] nss: Build infrastructure
>>
>> Finally this adds the infrastructure to build a postgres installation
>> with libnss support.
>
> I would suggest trying to come up with a way to reorder / split the series so
> that smaller pieces are committable. The way you have this right now leaves
> you with applying all of it at once as the only realistic way. And this
> patchset is too large for that.

I completely agree, the hard part is identifying smaller sets which also make
sense and which don't leave the tree in a bad state should anyone check out
that specific point in time.

The two commits in the patchset that are "easy" to consider for pushing
independently in this regard are IMO:

  * 0002 Test refactoring to support multiple TLS libraries.
  * 0004 Check for empty stderr during connect_ok

The refactoring in 0002 is hopefully not too controversial, but it clearly
needs eyes from someone more familiar with modern and idiomatic Perl.  0004
could IMO be pushed regardless of the fate of this patchset (after being
floated in its own thread on -hackers).

In order to find a good split I think we need to figure out what to optimize for;
do we optimize for ease of reverting should that be needed, or along
functionality borders, or something else?  I don't have good ideas here, but a
single 7596 insertions(+), 421 deletions(-) commit is clearly not a good idea.

Stephen had an idea off-list that we could look at splitting this across the
server/client boundary, which I think is the only idea I've heard so far which has
legs. (The first to go in would come with the common code of course.)

Do you have any thoughts after reading through the patch?

The attached v53 incorporates the fixes discussed above, and builds green for
both OpenSSL and NSS in Cirrus on my Github repo (thanks again for your work on
those files) so it will be interesting to see the CFBot running them.  Next
would be to figure out how to make the MSVC build it, basing an attempt on
Andrew's blogpost.

--
Daniel Gustafsson        https://vmware.com/



Re: Support for NSS as a libpq TLS backend

From
Andres Freund
Date:
Hi,

On 2022-01-26 21:39:16 +0100, Daniel Gustafsson wrote:
> > What about
> > reconfiguring (note: add --enable-depend) the linux tasks to build against
> > nss, and then run the relevant subset of tests with it?  Most tests don't use
> > tcp / SSL anyway, so rerunning a small subset of tests should be feasible?
> 
> That's an interesting idea, I think that could work and be reasonably readable
> at the same time (and won't require in-depth knowledge of Cirrus).  As it's the
> same task it does spend more time towards the max runtime per task, but that's
> not a problem for now.  It's worth keeping in mind though if we deem this to be
> a way forward with testing multiple settings.

I think it's a way for a limited number of settings, that each only require a
limited amount of tests... Rerunning all tests etc is a different story.



> > Is it a great idea to have common/nss.h when there's a library header nss.h?
> > Perhaps we should have a pg_ssl_{nss,openssl}.h or such?
> 
> That's a good point, I modelled it after common/openssl.h but I agree it's
> better to differentiate the filenames.  I've renamed it to common/pg_nss.h and
> we should IMO rename common/openssl.h regardless of what happens to this patch.

+1


> > Does this make some things notably more expensive? Presumably it does remove a
> > bunch of COW opportunities, but likely that's not a huge factor compared to
> > asymmetric crypto negotiation...
> 
> Right, in the context of setting up crypto across a network connection it's highly
> likely to drown out the costs.

If you start to need to run a helper to decrypt an encrypted private key, and
do all the initialization, I'm not sure that holds true anymore... Have
you done any connection speed tests? pgbench -C is helpful for that.


> > Maybe some of this commentary should migrate to the file header or such?
> 
> Maybe, or perhaps README.ssl?  Not sure where it would be most reasonable to
> keep it such that it's also kept up to date.

Either would work for me.


> >> This introduce
> >> + * differences with the OpenSSL support where some errors are only reported
> >> + * at runtime with NSS where they are reported at startup with OpenSSL.
> > 
> > Found this sentence hard to parse somehow.
> > 
> > It seems pretty unfriendly to only have minimal error checking at postmaster
> > startup time. Seems at least the presence and usability of keys should be done
> > *also* at that time?
> 
> I'll look at adding some setup, and subsequent teardown, of NSS at startup
> during which we could do checking to be more on par with how the OpenSSL
> backend will report errors.

Cool.


> >> +/*
> >> + * raw_subject_common_name
> >> + *
> >> + * Returns the Subject Common Name for the given certificate as a raw char
> >> + * buffer (that is, without any form of escaping for unprintable characters or
> >> + * embedded nulls), with the length of the buffer returned in the len param.
> >> + * The buffer is allocated in the TopMemoryContext and is given a NULL
> >> + * terminator so that callers are safe to call strlen() on it.
> >> + *
> >> + * This is used instead of CERT_GetCommonName(), which always performs quoting
> >> + * and/or escaping. NSS doesn't appear to give us a way to easily unescape the
> >> + * result, and we need to store the raw CN into port->peer_cn for compatibility
> >> + * with the OpenSSL implementation.
> >> + */
> > 
> > Do we have a testcase for embedded NULLs in common names?
> 
> We don't, neither for OpenSSL nor NSS.  AFAICR Jacob spent days trying to get a
> certificate generation to include an embedded NULL byte but in the end gave up.
> We would have to write our own tools for generating certificates to add that
> (which may or may not be a bad idea, but it hasn't been done).

Hah, that's interesting.
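
For context on why the raw buffer and its length matter here: with both
values in hand, a caller can tell whether the Common Name hides a NUL byte
that strlen() would stop at.  A minimal sketch, assuming the buffer holds at
least len bytes as the quoted comment describes; cn_has_embedded_nul() is a
hypothetical helper name, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if any of the first len bytes of buf is a NUL byte.
 * A name such as "good.example\0.evil.test" would look like
 * "good.example" to strlen()-based code, which is what this detects.
 */
bool
cn_has_embedded_nul(const char *buf, size_t len)
{
	assert(buf != NULL);

	return memchr(buf, '\0', len) != NULL;
}
```

A caller could reject the certificate when this returns true, rather than
comparing a silently truncated name.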


> >> +/*
> >> + * pg_SSLShutdownFunc
> >> + *        Callback for NSS shutdown
> >> + *
> >> + * If NSS is terminated from the outside when the connection is still in use
> > 
> > What does "NSS is terminated from the outside when the connection" really
> > mean? Does this mean the client initiating something?
> 
> If an extension, or other server-loaded code, interfered with NSS and managed
> to close contexts in order to interfere with connections this would ensure us
> closing it down cleanly.
> 
> That being said, I was now unable to get my old testcase working so I've for
> now removed this callback from the patch until I can work out if we can make
> proper use of it.  AFAICS other mature NSS implementations aren't using it
> (OpenLDAP did in the past but have since removed it, will look at how/why).

I think that'd be elog(FATAL) time if we want to do anything (after changing
state so that no data is sent to client).


> >> +                /*
> >> +                 * The error cases for PR_Recv are not documented, but can be
> >> +                 * reverse engineered from _MD_unix_map_default_error() in the
> >> +                 * NSPR code, defined in pr/src/md/unix/unix_errors.c.
> >> +                 */
> > 
> > Can we propose a patch to document them? Don't want to get bitten by this
> > suddenly changing...
> 
> I can certainly propose something on their mailinglist, but I unfortunately
> wouldn't get my hopes up too high as NSS and documentation aren't exactly best
> friends (the in-tree docs don't cover the API and Mozilla recently removed
> most of the online docs in their neverending developer site reorg).

Kinda makes me question the wisdom of starting to depend on NSS. When openssl
docs are vastly outshining a library's, that library really should start to
ask itself some hard questions.



> In order to find a good split I think we need to figure out what to optimize for;
> do we optimize for ease of reverting should that be needed, or along
> functionality borders, or something else?  I don't have good ideas here, but a
> single 7596 insertions(+), 421 deletions(-) commit is clearly not a good idea.

I think the goal should be the ability to incrementally commit.


> Stephen had an idea off-list that we could look at splitting this across the
> server/client boundary, which I think is the only idea I've heard so far which has
> legs. (The first to go in would come with the common code of course.)

Yea, that's the most obvious one. I suspect client-side has a lower
complexity, because it doesn't need to replace quite as many things?


> The attached v53 incorporates the fixes discussed above, and builds green for
> both OpenSSL and NSS in Cirrus on my Github repo (thanks again for your work on
> those files) so it will be interesting to see the CFBot running them.

Looks like that worked...

Greetings,

Andres Freund



Re: Support for NSS as a libpq TLS backend

From
Jacob Champion
Date:
On Wed, 2022-01-26 at 15:59 -0800, Andres Freund wrote:
> > > Do we have a testcase for embedded NULLs in common names?
> > 
> > We don't, neither for OpenSSL nor NSS.  AFAICR Jacob spent days trying to get a
> > certificate generation to include an embedded NULL byte but in the end gave up.
> > We would have to write our own tools for generating certificates to add that
> > (which may or may not be a bad idea, but it hasn't been done).
> 
> Hah, that's interesting.

Yeah, OpenSSL just refused to do it, with any method I could find at
least. My personal test suite is using pyca/cryptography and psycopg2
to cover that case.

--Jacob

Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
>>> Can we propose a patch to document them? Don't want to get bitten by this
>>> suddenly changing...
>>
>> I can certainly propose something on their mailinglist, but I unfortunately
>> wouldn't get my hopes up too high as NSS and documentation aren't exactly best
>> friends (the in-tree docs don't cover the API and Mozilla recently removed
>> most of the online docs in their neverending developer site reorg).
>
> Kinda makes me question the wisdom of starting to depend on NSS. When openssl
> docs are vastly outshining a library's, that library really should start to
> ask itself some hard questions.

Sadly, there is that.  While this is not a new problem, Mozilla has been making
some very weird decisions around NSS governance as of late.  Another data point
is the below thread from libcurl:

    https://curl.se/mail/lib-2022-01/0120.html

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Robert Haas
Date:
On Fri, Jan 28, 2022 at 9:08 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> > Kinda makes me question the wisdom of starting to depend on NSS. When openssl
> > docs are vastly outshining a library's, that library really should start to
> > ask itself some hard questions.

Yeah, OpenSSL is very poor, so being worse is not good.

> Sadly, there is that.  While this is not a new problem, Mozilla has been making
> some very weird decisions around NSS governance as of late.  Another data point
> is the below thread from libcurl:
>
>     https://curl.se/mail/lib-2022-01/0120.html

I would really, really like to have an alternative to OpenSSL for PG.
I don't know if this is the right thing, though. If other people are
dropping support for it, that's a pretty bad sign IMHO. Later in the
thread it says OpenLDAP have dropped support for it already as well.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 28 Jan 2022, at 15:30, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jan 28, 2022 at 9:08 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>>> Kinda makes me question the wisdom of starting to depend on NSS. When openssl
>>> docs are vastly outshining a library's, that library really should start to
>>> ask itself some hard questions.
>
> Yeah, OpenSSL is very poor, so being worse is not good.

Some background on this for anyone interested: Mozilla removed the
documentation from the MDN website and the attempt at resurrecting it in the
tree (where it should've been all along </rant>) isn't making much progress.
Some more can be found in this post on the NSS mailinglist:

https://groups.google.com/a/mozilla.org/g/dev-tech-crypto/c/p0MO7030K4A/m/Mx5St_2sAwAJ

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 28 Jan 2022, at 15:30, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jan 28, 2022 at 9:08 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>>> Kinda makes me question the wisdom of starting to depend on NSS. When openssl
>>> docs are vastly outshining a library's, that library really should start to
>>> ask itself some hard questions.
>
> Yeah, OpenSSL is very poor, so being worse is not good.
>
>> Sadly, there is that.  While this is not a new problem, Mozilla has been making
>> some very weird decisions around NSS governance as of late.  Another data point
>> is the below thread from libcurl:
>>
>>    https://curl.se/mail/lib-2022-01/0120.html
>
> I would really, really like to have an alternative to OpenSSL for PG.
> I don't know if this is the right thing, though. If other people are
> dropping support for it, that's a pretty bad sign IMHO. Later in the
> thread it says OpenLDAP have dropped support for it already as well.

I'm counting this and Andres' comment as a -1 on the patchset, and given where
we are in the cycle I'll mark it rejected in the CF app shortly unless anyone
objects.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> > On 28 Jan 2022, at 15:30, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Fri, Jan 28, 2022 at 9:08 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> >>> Kinda makes me question the wisdom of starting to depend on NSS. When openssl
> >>> docs are vastly outshining a library's, that library really should start to
> >>> ask itself some hard questions.
> >
> > Yeah, OpenSSL is very poor, so being worse is not good.
> >
> >> Sadly, there is that.  While this is not a new problem, Mozilla has been making
> >> some very weird decisions around NSS governance as of late.  Another data point
> >> is the below thread from libcurl:
> >>
> >>    https://curl.se/mail/lib-2022-01/0120.html
> >
> > I would really, really like to have an alternative to OpenSSL for PG.
> > I don't know if this is the right thing, though. If other people are
> > dropping support for it, that's a pretty bad sign IMHO. Later in the
> > thread it says OpenLDAP have dropped support for it already as well.
>
> I'm counting this and Andres' comment as a -1 on the patchset, and given where
> we are in the cycle I'll mark it rejected in the CF app shortly unless anyone
> objects.

I agree that it's concerning to hear that OpenLDAP dropped support for
NSS... though I don't seem to be able to find any information as to why
they decided to do so.  NSS is clearly still supported and maintained
and they do seem to understand that they need to work on the
documentation situation and to get that fixed (the current issue seems
to be around NSS vs. NSPR and the migration off of MDN to the in-tree
documentation as Daniel mentioned, if I followed the discussion
correctly in the bug that was filed by the curl folks and was then
actively responded to by the NSS/NSPR folks), which seems to be the main
issue that's being raised about it by the curl folks and here.

I'm also very much a fan of having an alternative to OpenSSL and the
NSS/NSPR license fits well for us, unlike the alternatives to OpenSSL
used by other projects, such as GnuTLS (which is the alternative to
OpenSSL that OpenLDAP now has) or other libraries like wolfSSL.

Beyond the documentation issue, which I agree is a concern but also
seems to be actively realized as an issue by the NSS/NSPR folks, is
there some other reason that the curl folks are thinking of dropping
support for it?  Or does anyone have insight into why OpenLDAP decided
to remove support?

Thanks,

Stephen


Re: Support for NSS as a libpq TLS backend

From
Andres Freund
Date:
Hi,

On 2022-01-31 14:24:03 +0100, Daniel Gustafsson wrote:
> > On 28 Jan 2022, at 15:30, Robert Haas <robertmhaas@gmail.com> wrote:
> > I would really, really like to have an alternative to OpenSSL for PG.
> > I don't know if this is the right thing, though. If other people are
> > dropping support for it, that's a pretty bad sign IMHO. Later in the
> > thread it says OpenLDAP have dropped support for it already as well.
> 
> I'm counting this and Andres' comment as a -1 on the patchset, and given where
> we are in the cycle I'll mark it rejected in the CF app shortly unless anyone
> objects.

I'd make mine more a -0.2 or so. I'm concerned about the lack of non-code
documentation and the state of code documentation. I'd like an openssl
alternative, although not as much as a few years ago - it seems that the state
of openssl has improved compared to most of the other implementations.

Greetings,

Andres Freund



Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 31 Jan 2022, at 17:24, Stephen Frost <sfrost@snowman.net> wrote:
> * Daniel Gustafsson (daniel@yesql.se) wrote:

>> I'm counting this and Andres' comment as a -1 on the patchset, and given where
>> we are in the cycle I'll mark it rejected in the CF app shortly unless anyone
>> objects.
>
> I agree that it's concerning to hear that OpenLDAP dropped support for
> NSS... though I don't seem to be able to find any information as to why
> they decided to do so.

I was also unable to do that.  There is no information that I could see in
either the commit message, Bugzilla entry (#9207) or on the mailinglist.
Searching the web didn't yield anything either.  I've reached out to hopefully
get a bit more information.

> I'm also very much a fan of having an alternative to OpenSSL and the
> NSS/NSPR license fits well for us, unlike the alternatives to OpenSSL
> used by other projects, such as GnuTLS (which is the alternative to
> OpenSSL that OpenLDAP now has) or other libraries like wolfSSL.

Short of platform specific (proprietary) libraries like Schannel and Secure
Transport, the alternatives are indeed slim.

> Beyond the documentation issue, which I agree is a concern but also
> seems to be actively realized as an issue by the NSS/NSPR folks,

It is, but it has also been an issue for years to be honest, getting the docs
up to scratch will require a very large effort.

> is there some other reason that the curl folks are thinking of dropping support
> for it?

It's also not really used anymore in conjunction with curl, with Red Hat no
longer shipping builds against it.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 31 Jan 2022, at 22:32, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-01-31 14:24:03 +0100, Daniel Gustafsson wrote:
>>> On 28 Jan 2022, at 15:30, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I would really, really like to have an alternative to OpenSSL for PG.
>>> I don't know if this is the right thing, though. If other people are
>>> dropping support for it, that's a pretty bad sign IMHO. Later in the
>>> thread it says OpenLDAP have dropped support for it already as well.
>>
>> I'm counting this and Andres' comment as a -1 on the patchset, and given where
>> we are in the cycle I'll mark it rejected in the CF app shortly unless anyone
>> objects.
>
> I'd make mine more a -0.2 or so. I'm concerned about the lack of non-code
> documentation and the state of code documentation. I'd like an openssl
> alternative, although not as much as a few years ago - it seems that the state
> of openssl has improved compared to most of the other implementations.

IMHO OpenSSL has improved over the OpenSSL of the past - which is great to
see - but they have also diverted themselves into writing a full QUIC
implementation which *I personally think* is a distraction they don't need.

That being said, there aren't too many other options.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 31 Jan 2022, at 22:48, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 31 Jan 2022, at 17:24, Stephen Frost <sfrost@snowman.net> wrote:

>> I agree that it's concerning to hear that OpenLDAP dropped support for
>> NSS... though I don't seem to be able to find any information as to why
>> they decided to do so.
>
> I was also unable to do that.  There is no information that I could see in
> either the commit message, Bugzilla entry (#9207) or on the mailinglist.
> Searching the web didn't yield anything either.  I've reached out to hopefully
> get a bit more information.

Support issues and Red Hat dropping OpenLDAP were cited [0] as the main drivers
for dropping NSS.

--
Daniel Gustafsson        https://vmware.com/

[0] https://curl.se/mail/lib-2022-02/0000.html


Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> > On 31 Jan 2022, at 22:48, Daniel Gustafsson <daniel@yesql.se> wrote:
> >> On 31 Jan 2022, at 17:24, Stephen Frost <sfrost@snowman.net> wrote:
>
> >> I agree that it's concerning to hear that OpenLDAP dropped support for
> >> NSS... though I don't seem to be able to find any information as to why
> >> they decided to do so.
> >
> > I was also unable to do that.  There is no information that I could see in
> > either the commit message, Bugzilla entry (#9207) or on the mailinglist.
> > Searching the web didn't yield anything either.  I've reached out to hopefully
> > get a bit more information.
>
> Support issues and Red Hat dropping OpenLDAP were cited [0] as the main drivers
> for dropping NSS.

That's both very vague and oddly specific, I have to say.  Also, not
really sure that it's a good reason for other projects to move away, or
for the large amount of work put into this effort to be thrown out when
it seems to be quite close to finally being done and giving us an
alternative, supported and maintained, TLS/SSL library.

The concern about the documentation not being easily available is
certainly something to consider.  I remember in prior reviews not having
that much difficulty looking up documentation for functions, and in
doing some quick looking around there's certainly some (most?) of the
NSS documentation still up, the issue is that the NSPR documentation was
taken off of the MDN website and that's referenced from the NSS pages
and is obviously something that folks working with NSS need to be able
to find the documentation for too.

All that said, while having documentation on the web is nice and all, it
seems to still be in the source, at least when I grabbed NSPR locally
with apt-get source and looked at PR_Recv, I found:

/*
 *************************************************************************
 * FUNCTION: PR_Recv
 * DESCRIPTION:
 *    Receive a specified number of bytes from a connected socket.
 *     The operation will block until some positive number of bytes are
 *     transferred, a time out has occurred, or there is an error.
 *     No more than 'amount' bytes will be transferred.
 * INPUTS:
 *     PRFileDesc *fd
 *       points to a PRFileDesc object representing a socket.
 *     void *buf
 *       pointer to a buffer to hold the data received.
 *     PRInt32 amount
 *       the size of 'buf' (in bytes)
 *     PRIntn flags
 *       must be zero or PR_MSG_PEEK.
 *     PRIntervalTime timeout
 *       Time limit for completion of the receive operation.
 * OUTPUTS:
 *     None
 * RETURN: PRInt32
 *         a positive number indicates the number of bytes actually received.
 *         0 means the network connection is closed.
 *         -1 indicates a failure. The reason for the failure is obtained
 *         by calling PR_GetError().
 **************************************************************************
 */

So, it's not the case that the documentation is completely gone and
utterly unavailable to those who are interested in it, it's just in the
source rather than being on a nicely formatted webpage.  One can find it
on the web too, naturally:

https://github.com/thespooler/nspr/blob/29ba433ebceda269d2b0885176b7f8cd4c5c2c52/pr/include/prio.h#L1424

(no idea what version that is, just found a random github repo with it,
but wouldn't be hard to import the latest version).
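
As a small illustration of the return contract in that header comment, the
three documented outcomes map onto code roughly as follows.  This is only a
sketch: int32_t stands in for PRInt32, classify_recv_result() and the
RecvOutcome enum are names made up for the example, and a real caller would
consult PR_GetError() in the failure branch.

```c
#include <assert.h>
#include <stdint.h>

typedef enum
{
	RECV_DATA,					/* positive: that many bytes were received */
	RECV_CLOSED,				/* zero: the network connection is closed */
	RECV_ERROR					/* -1: failure, reason available separately */
} RecvOutcome;

/*
 * Maps a receive call's return value onto the three cases listed in the
 * RETURN section of the comment above.
 */
RecvOutcome
classify_recv_result(int32_t nread)
{
	/* The documented contract never yields anything below -1. */
	assert(nread >= -1);

	if (nread > 0)
		return RECV_DATA;
	if (nread == 0)
		return RECV_CLOSED;
	return RECV_ERROR;
}
```

What the comment does not list is which error codes can follow a -1, which
is the gap the earlier review comment was about.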

Considering how much we point people to our source when they're writing
extensions and such, this doesn't strike me as quite the dire situation
that it first appeared to be based on the initial comments.  There is
documentation, it's not actually that hard to find if you're working
with the library, and the maintainers have stated their intention to
work on improving the web-based documentation.

Thanks,

Stephen


Re: Support for NSS as a libpq TLS backend

From
Andres Freund
Date:
Hi,

On 2022-02-01 15:12:28 -0500, Stephen Frost wrote:
> The concern about the documentation not being easily available is
> certainly something to consider.  I remember in prior reviews not having
> that much difficulty looking up documentation for functions

I've definitely asked several times in the course of this thread for
documentation about specific bits and there was none. And not just recently.


> All that said, while having documentation on the web is nice and all, it
> seems to still be in the source, at least when I grabbed NSPR locally
> with apt-get source and looked at PR_Recv, I found:

What I'm most concerned about is less the way individual functions work, and
more a bit higher level things. Like e.g. about not being allowed to
fork. Which has significant design implications given postgres' process
model...
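
To make the fork constraint concrete: assuming the rule is that library state
set up in one process must not be used in a child created by fork(), a
process can at least detect that it is holding inherited state by recording
which pid performed the setup.  This is a generic sketch, not something taken
from NSS or from the patch; record_setup_owner() and
setup_owned_by_this_process() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* pid of the process that performed the setup, or -1 if none has. */
static pid_t owner_pid = -1;

/* Call right after the (hypothetical) library setup succeeds. */
void
record_setup_owner(void)
{
	/* Setup is expected to run at most once per process. */
	assert(owner_pid != getpid());

	owner_pid = getpid();
}

/*
 * Returns true only in the process that ran the setup.  In a forked
 * child the recorded pid differs from getpid(), so this returns false
 * and the child knows the state was inherited.
 */
bool
setup_owned_by_this_process(void)
{
	return owner_pid == getpid();
}
```

In a postmaster-style model this means the setup has to be deferred until
after the fork, which is why checks that OpenSSL can do at startup become
harder.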


I think some documentation has been re-uploaded in the last few days. I recall
the content around https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS
being gone too, last time I checked.


> So, it's not the case that the documentation is completely gone and
> utterly unavailable to those who are interested in it, it's just in the
> source rather than being on a nicely formatted webpage.  One can find it
> on the web too, naturally:

> https://github.com/thespooler/nspr/blob/29ba433ebceda269d2b0885176b7f8cd4c5c2c52/pr/include/prio.h#L1424

> (no idea what version that is, just found a random github repo with it,
> but wouldn't be hard to import the latest version).

It's last been updated 2015...

There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
think the upstream source.

A project without even a bare-minimal README at the root does have an "internal
only" feel to it...

Greetings,

Andres Freund



Re: Support for NSS as a libpq TLS backend

From
Bruce Momjian
Date:
On Tue, Feb  1, 2022 at 01:52:09PM -0800, Andres Freund wrote:
> There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
> think the upstream source.
> 
> A project without even a bare-minimal README at the root does have an "internal
> only" feel to it...

I agree --- it is a library --- if they don't feel the need to publish
the API, it seems to mean they want to maintain the ability to change it
at any time, and therefore it is inappropriate for other software to
rely on that API.

This is not the same as Postgres extensions needing to read the Postgres
source code --- they are an important but edge use case and we never saw
the need to standardize or publish the internal functions that must be
studied and adjusted possibly for major releases.

This kind of feels like the Chrome JavaScript code that used to be able
to be built separately for PL/v8, but has gotten much harder to do in
the past few years.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Support for NSS as a libpq TLS backend

From
Peter Eisentraut
Date:
On 28.01.22 15:30, Robert Haas wrote:
> I would really, really like to have an alternative to OpenSSL for PG.

What are the reasons people want that?  With OpenSSL 3, the main reasons 
-- license and FIPS support -- have gone away.




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 3 Feb 2022, at 15:07, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 28.01.22 15:30, Robert Haas wrote:
>> I would really, really like to have an alternative to OpenSSL for PG.
>
> What are the reasons people want that?  With OpenSSL 3, the main reasons
> -- license and FIPS support -- have gone away.

At least it will go away when OpenSSL 3 is FIPS certified, which is yet to
happen (submitted, not processed).

I see quite a few valid reasons to want an alternative, a few off the top of my
head include:

- Using trust stores like Keychain on macOS with Secure Transport.  There is
AFAIK something similar on Windows and NSS has its certificate databases.
Especially on client side libpq it would be quite nice to integrate with where
certificates already are rather than rely on files on disks.

- Not having to install OpenSSL where Schannel or Secure Transport is already
available would make life easier for packagers.

- Simply having an alternative.  The OpenSSL project's recent venture into
writing transport protocols has made a lot of people worried about their
bandwidth for fixing and supporting core features.

Just my $0.02, everyone's mileage varies on these.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Feb  1, 2022 at 01:52:09PM -0800, Andres Freund wrote:
> > There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
> > think the upstream source.
> >
> > A project without even a bare-minimal README at the root does have an "internal
> > only" feel to it...
>
> I agree --- it is a library --- if they don't feel the need to publish
> the API, it seems to mean they want to maintain the ability to change it
> at any time, and therefore it is inappropriate for other software to
> rely on that API.

This is really not a reasonable representation of how this library has
been maintained historically nor is there any reason to think that their
policy regarding the API has changed recently.  They do have a
documented API and that hasn't changed- it's just that it's not easily
available in web-page form any longer and that's due to something
independent of the library maintenance.  They've also done a good job
with maintaining the API as one would expect from a library and so this
really isn't a reason to avoid using it.  If there's actual specific
examples of the API not being well maintained and causing issues then
please point to them and we can discuss if that is a reason to consider
not depending on NSS/NSPR.

> This is not the same as Postgres extensions needing to read the Postgres
> source code --- they are an important but edge use case and we never saw
> the need to standardize or publish the internal functions that must be
> studied and adjusted possibly for major releases.

I agree that extensions and public libraries aren't entirely the same
but I don't think it's all that unreasonable for developers that are
using a library to look at the source code for that library when
developing against it, that's certainly something I've done for a
number of different libraries.

> This kind of feels like the Chrome JavaScript code that used to be able
> to be built separately for PL/v8, but has gotten much harder to do in
> the past few years.

This isn't at all like that case, where the maintainers made a very
clear and intentional choice to make it quite difficult for packagers to
pull v8 out to package it.  Nothing like that has happened with NSS and
there isn't any reason to think that it will based on what the
maintainers have said and what they've done across the many years that
NSS has been around.

Thanks,

Stephen


Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> > On 3 Feb 2022, at 15:07, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> >
> > On 28.01.22 15:30, Robert Haas wrote:
> >> I would really, really like to have an alternative to OpenSSL for PG.
> >
> > What are the reasons people want that?  With OpenSSL 3, the main reasons
> > -- license and FIPS support -- have gone away.
>
> At least it will go away when OpenSSL 3 is FIPS certified, which is yet to
> happen (submitted, not processed).
>
> I see quite a few valid reasons to want an alternative, a few off the top of my
> head include:
>
> - Using trust stores like Keychain on macOS with Secure Transport.  There is
> AFAIK something similar on Windows and NSS has its certificate databases.
> Especially on client side libpq it would be quite nice to integrate with where
> certificates already are rather than rely on files on disks.
>
> - Not having to install OpenSSL, Schannel and Secure Transport would make life
> easier for packagers.
>
> - Simply having an alternative.  The OpenSSL project's recent venture into
> writing transport protocols has made a lot of people worried over their
> bandwidth for fixing and supporting core features.
>
> Just my $0.02, everyone's mileage varies on these.

Yeah, agreed on all of these.

Thanks,

Stephen


Re: Support for NSS as a libpq TLS backend

From
Peter Eisentraut
Date:
On 03.02.22 15:53, Daniel Gustafsson wrote:
> I see quite a few valid reasons to want an alternative, a few off the top of my
> head include:
> 
> - Using trust stores like Keychain on macOS with Secure Transport.  There is
> AFAIK something similar on Windows and NSS has its certificate databases.
> Especially on client side libpq it would be quite nice to integrate with where
> certificates already are rather than rely on files on disks.
> 
> - Not having to install OpenSSL, Schannel and Secure Transport would make life
> easier for packagers.

Those are good reasons for Schannel and Secure Transport, less so for NSS.

> - Simply having an alternative.  The OpenSSL project's recent venture into
> writing transport protocols has made a lot of people worried over their
> bandwidth for fixing and supporting core features.

If we want simply an alternative, we had a GnuTLS variant almost done a 
few years ago, but in the end people didn't want it enough.  It seems to 
be similar now.




Re: Support for NSS as a libpq TLS backend

From
Robert Haas
Date:
On Thu, Feb 3, 2022 at 2:16 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> If we want simply an alternative, we had a GnuTLS variant almost done a
> few years ago, but in the end people didn't want it enough.  It seems to
> be similar now.

Yeah. I think it's pretty clear that the only real downside of
committing support for GnuTLS or NSS or anything else is that we then
need to maintain that support (or eventually remove it). I don't
really see a problem if Daniel wants to commit this, set up a few
buildfarm animals, and fix stuff when it breaks. If he does that, I
don't see that we're losing anything. But, if he commits it in the
hope that other people are going to step up to do the maintenance
work, maybe that's not going to happen, or at least not without
grumbling. I'm not objecting to this being committed in the sense that
I don't ever want to see it in the tree, but I'm also not volunteering
to maintain it.

As a philosophical matter, I don't think it's great for us - or the
Internet in general - to be too dependent on OpenSSL. Software
monocultures are not great, and OpenSSL has near-constant security
updates and mediocre documentation. Now, maybe anything else we
support will end up having similar issues, or worse. But if we and
other projects are never willing to support anything but OpenSSL, then
there will never be viable alternatives to OpenSSL, because a library
that isn't actually used by the software you care about is of no use.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Support for NSS as a libpq TLS backend

From
Bruce Momjian
Date:
On Thu, Feb  3, 2022 at 01:42:53PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Tue, Feb  1, 2022 at 01:52:09PM -0800, Andres Freund wrote:
> > > There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
> > > think the upstream source.
> > > 
> > > A project without even a bare-minimal README at the root does have an "internal
> > > only" feel to it...
> > 
> > I agree --- it is a library --- if they don't feel the need to publish
> > the API, it seems to mean they want to maintain the ability to change it
> > at any time, and therefore it is inappropriate for other software to
> > rely on that API.
> 
> This is really not a reasonable representation of how this library has
> been maintained historically nor is there any reason to think that their
> policy regarding the API has changed recently.  They do have a
> documented API and that hasn't changed- it's just that it's not easily
> available in web-page form any longer and that's due to something
> independent of the library maintenance.  They've also done a good job

So they have always been bad at providing an API, not just now, or that
their web content disappeared and they haven't fixed it, for how long? 
I guess that is better than the v8 case, but not much.  Is posting web
content really that hard for them?

> with maintaining the API as one would expect from a library and so this
> really isn't a reason to avoid using it.  If there's actual specific
> examples of the API not being well maintained and causing issues then
> please point to them and we can discuss if that is a reason to consider
> not depending on NSS/NSPR.

I have no specifics.

> > This is not the same as Postgres extensions needing to read the Postgres
> > source code --- they are an important but edge use case and we never saw
> > the need to standardize or publish the internal functions that must be
> > studied and adjusted possibly for major releases.
> 
> I agree that extensions and public libraries aren't entirely the same
> but I don't think it's all that unreasonable for developers that are
> using a library to look at the source code for that library when
> developing against it, that's certainly something I've done for a
> number of different libraries.

Wow, you have a much higher tolerance than I do.  How do you even know
which functions are the public API if you have to look at the source
code?

> > This kind of feels like the Chrome JavaScript code that used to be able
> > to be built separately for PL/v8, but has gotten much harder to do in
> > the past few years.
> 
> This isn't at all like that case, where the maintainers made a very
> clear and intentional choice to make it quite difficult for packagers to
> pull v8 out to package it.  Nothing like that has happened with NSS and
> there isn't any reason to think that it will based on what the
> maintainers have said and what they've done across the many years that
> NSS has been around.

As far as I know, the v8 developers didn't say anything, they just
started moving things around to make it easier for them and harder for
packagers --- and they didn't care.

I frankly think we need some public statement from the NSS developers
before moving forward --- there are just too many red flags here, and
once we support it, it will be hard to remove support for it.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Support for NSS as a libpq TLS backend

From
Bruce Momjian
Date:
On Thu, Feb  3, 2022 at 02:33:37PM -0500, Robert Haas wrote:
> As a philosophical matter, I don't think it's great for us - or the
> Internet in general - to be too dependent on OpenSSL. Software
> monocultures are not great, and OpenSSL has near-constant security
> updates and mediocre documentation. Now, maybe anything else we

I don't think it is fair to be criticizing OpenSSL for its mediocre
documentation when the alternative being considered, NSS, has no public
documentation.  Can the source-code-defined NSS documentation be
considered better than the mediocre OpenSSL public documentation?

For the record, I do like the idea of adding NSS, but I am concerned
about its long-term maintenance, as you explained.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Support for NSS as a libpq TLS backend

From
Robert Haas
Date:
On Fri, Feb 4, 2022 at 1:22 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Feb  3, 2022 at 02:33:37PM -0500, Robert Haas wrote:
> > As a philosophical matter, I don't think it's great for us - or the
> > Internet in general - to be too dependent on OpenSSL. Software
> > monocultures are not great, and OpenSSL has near-constant security
> > updates and mediocre documentation. Now, maybe anything else we
>
> I don't think it is fair to be criticizing OpenSSL for its mediocre
> documentation when the alternative being considered, NSS, has no public
> documentation.  Can the source-code-defined NSS documentation be
> considered better than the mediocre OpenSSL public documentation?

I mean, I think it's fair to say that my experiences with trying to
use the OpenSSL documentation have been poor. Admittedly it's been a
few years now so maybe it's gotten better, but my experience was what
it was. In one case, the function I needed wasn't documented at all,
and I had to read the C code, which was weirdly-formatted and had no
comments. That wasn't fun, and knowing that NSS could be an even worse
experience doesn't retroactively turn that into a good one.

> For the record, I do like the idea of adding NSS, but I am concerned
> about its long-term maintenance, as you explained.

It sounds like we come down in about the same place here, in the end.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Support for NSS as a libpq TLS backend

From
Bruce Momjian
Date:
On Fri, Feb  4, 2022 at 01:33:00PM -0500, Robert Haas wrote:
> > I don't think it is fair to be criticizing OpenSSL for its mediocre
> > documentation when the alternative being considered, NSS, has no public
> > documentation.  Can the source-code-defined NSS documentation be
> > considered better than the mediocre OpenSSL public documentation?
> 
> I mean, I think it's fair to say that my experiences with trying to
> use the OpenSSL documentation have been poor. Admittedly it's been a
> few years now so maybe it's gotten better, but my experience was what
> it was. In one case, the function I needed wasn't documented at all,
> and I had to read the C code, which was weirdly-formatted and had no
> comments. That wasn't fun, and knowing that NSS could be an even worse
> experience doesn't retroactively turn that into a good one.

Oh, yeah, the OpenSSL documentation is verifiably mediocre.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 4 Feb 2022, at 19:22, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Feb  3, 2022 at 02:33:37PM -0500, Robert Haas wrote:
>> As a philosophical matter, I don't think it's great for us - or the
>> Internet in general - to be too dependent on OpenSSL. Software
>> monocultures are not great, and OpenSSL has near-constant security
>> updates and mediocre documentation. Now, maybe anything else we
>
> I don't think it is fair to be criticizing OpenSSL for its mediocre
> documentation when the alternative being considered, NSS, has no public
> documentation.  Can the source-code-defined NSS documentation..

Not that it will shift the needle either way, but to give credit where credit
is due:

Both NSS and NSPR are documented, and have been since they were published by
Netscape in 1998.  The documentation does lack things, and some parts are quite
out of date.  That's true and undisputed even by the projects themselves who
state this: "It currently is very deprecated and likely incorrect or broken in
many places".

The recent issue was that Mozilla decided to remove all 3rd party projects (why
they consider their own code 3rd party is a mystery to me) from their MDN site,
and so NSS and NSPR were deleted with no replacement.  This was said to be
worked on but didn't happen and no docs were imported into the tree.  When
Daniel from curl (the other one, not I) complained, this caused enough momentum
to get this work going and it's now been "done".

   NSS: https://firefox-source-docs.mozilla.org/security/nss/
   NSPR: https://firefox-source-docs.mozilla.org/nspr/

I am writing done above in quotes, since the documentation also needs to be
updated, completed, rewritten, organized etc etc.  The above is an import of
what was found, and is in a fairly poor state.  Unfortunately, it's still not
in the tree where I personally believe documentation stands the best chance of
being kept up to date.  The NSPR documentation is probably the best of the two,
but it's also much less of a moving target.

It is true that the documentation is poor and currently in bad shape with lots
of broken links and heavily disorganized etc.  It's also true that I managed to
implement full libpq support without any crystal ball or help from the NSS
folks.  The latter doesn't mean we can brush documentation concerns aside, but
let's be fair in our criticism.

> ..be considered better than the mediocre OpenSSL public documentation?

OpenSSL has gotten a lot better in recent years, it's still not great or where
I would like it to be, but a lot better.

--
Daniel Gustafsson        https://vmware.com/




Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Feb  3, 2022 at 01:42:53PM -0500, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > On Tue, Feb  1, 2022 at 01:52:09PM -0800, Andres Freund wrote:
> > > > There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
> > > > think the upstream source.
> > > >
> > > > A project without even a bare-minimal README at the root does have an "internal
> > > > only" feel to it...
> > >
> > > I agree --- it is a library --- if they don't feel the need to publish
> > > the API, it seems to mean they want to maintain the ability to change it
> > > at any time, and therefore it is inappropriate for other software to
> > > rely on that API.
> >
> > This is really not a reasonable representation of how this library has
> > been maintained historically nor is there any reason to think that their
> > policy regarding the API has changed recently.  They do have a
> > documented API and that hasn't changed- it's just that it's not easily
> > available in web-page form any longer and that's due to something
> > independent of the library maintenance.  They've also done a good job
>
> So they have always been bad at providing an API, not just now, or that
> their web content disappeared and they haven't fixed it, for how long?
> I guess that is better than the v8 case, but not much.  Is posting web
> content really that hard for them?

To be clear, *part* of the web-based documentation disappeared and
hasn't been replaced yet.  The NSS-specific pieces are actually still
available, it's the NSPR (which is a lower level library used by NSS)
part that was removed from MDN and hasn't been brought back yet, but
which does still exist as comments in the source of the library.

> > with maintaining the API as one would expect from a library and so this
> > really isn't a reason to avoid using it.  If there's actual specific
> > examples of the API not being well maintained and causing issues then
> > please point to them and we can discuss if that is a reason to consider
> > not depending on NSS/NSPR.
>
> I have no specifics.

Then I don't understand how the claim you made that "it seems to mean
they want to maintain the ability to change it at any time" has any
merit.

> > > This is not the same as Postgres extensions needing to read the Postgres
> > > source code --- they are an important but edge use case and we never saw
> > > the need to standardize or publish the internal functions that must be
> > > studied and adjusted possibly for major releases.
> >
> > I agree that extensions and public libraries aren't entirely the same
> > but I don't think it's all that unreasonable for developers that are
> > using a library to look at the source code for that library when
> > developing against it, that's certainly something I've done for a
> > number of different libraries.
>
> Wow, you have a much higher tolerance than I do.  How do you even know
> which functions are the public API if you have to look at the source
> code?

Because... it's documented?  They have public (and private) .h files in
the source tree and the function declarations have large comment blocks
above them which provide a documented API.  I'm not talking about having
to decipher from the actual C code what's going on but just reading the
function header comment that provides the documentation of the API for
each of the functions, and there's larger blocks of comments at the top
of those .h files which provide more insight into how the functions in
that particular part of the system work and interact with each other.
Maybe those things would be better as separate README files like what we
do, but maybe not, and I don't see it as a huge failing that they chose
to use a big comment block at the top of their .h files to explain
things rather than separate README files.

Reading comments in code that I'm calling out to, even if it's in
another library (or another part of PG where the README isn't helping me
enough, or due to there not being a README for that particular thing)
almost seems typical, to me anyway.  Perhaps the exception being when
there are good man pages.

> I frankly think we need some public statement from the NSS developers
> before moving forward --- there are just too many red flags here, and
> once we support it, it will be hard to remove support for it.

They have made public statements regarding this and it's been linked to
already in this thread:

https://github.com/mdn/content/issues/12471

where they explicitly state that the project is alive and maintained,
further, it now also links to this:

https://bugzilla.mozilla.org/show_bug.cgi?id=1753127

Which certainly seems to have had a fair bit of action taken on it.

Indeed, it looks like they've got a lot of the docs up and online now,
including the documentation for the function that started much of this:

https://firefox-source-docs.mozilla.org/nspr/reference/pr_recv.html#pr-recv

Looks like they're still working out some of the kinks between the NSS
pages and having links from them over to the NSPR pages, but a whole lot
of progress sure looks like it's been made in pretty short order here.

Definitely isn't looking unmaintained to me.

Thanks,

Stephen


Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Feb  3, 2022 at 02:33:37PM -0500, Robert Haas wrote:
> > As a philosophical matter, I don't think it's great for us - or the
> > Internet in general - to be too dependent on OpenSSL. Software
> > monocultures are not great, and OpenSSL has near-constant security
> > updates and mediocre documentation. Now, maybe anything else we
>
> I don't think it is fair to be criticizing OpenSSL for its mediocre
> documentation when the alternative being considered, NSS, has no public
> documentation.  Can the source-code-defined NSS documentation be
> considered better than the mediocre OpenSSL public documentation?

This simply isn't the case and wasn't even the case at the start of this
thread.  The NSPR documentation was only available through the header
files due to it being taken down from MDN.  The NSS documentation was
actually still there.  Looks like they've now (mostly) fixed the lack of
NSPR documentation, as noted in the recent email that I sent.

> For the record, I do like the idea of adding NSS, but I am concerned
> about its long-term maintenance, as you explained.

They've come out and explicitly said that the project is active and
maintained, and they've been doing regular releases.  I don't think
there's really any reason to think that it's not being maintained at
this point.

Thanks,

Stephen


Re: Support for NSS as a libpq TLS backend

From
Stephen Frost
Date:
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> I am writing done above in quotes, since the documentation also needs to be
> updated, completed, rewritten, organized etc etc.  The above is an import of
> what was found, and is in a fairly poor state.  Unfortunately, it's still not
> in the tree where I personally believe documentation stands the best chance of
> being kept up to date.  The NSPR documentation is probably the best of the two,
> but it's also much less of a moving target.

I wonder about the 'not in tree' bit since it is in the header files,
certainly for NSPR which I've been poking at due to this discussion.  I
had hoped that they were generating the documentation on the webpage
from what's in the header files, is that not the case then?  Which is
more accurate?  If it's a simple matter of spending time going through
what's in the tree and making sure what's online matches that, I suspect
we could find some folks with time to work on helping them there.

If the in-tree stuff isn't accurate then that's a bigger problem, of
course.

> It is true that the documentation is poor and currently in bad shape with lots
> of broken links and heavily disorganized etc.  It's also true that I managed to
> implement full libpq support without any crystal ball or help from the NSS
> folks.  The latter doesn't mean we can brush documentation concerns aside, but
> let's be fair in our criticism.

Agreed.

Thanks,

Stephen


Re: Support for NSS as a libpq TLS backend

From
Daniel Gustafsson
Date:
> On 4 Feb 2022, at 21:03, Stephen Frost <sfrost@snowman.net> wrote:

> I wonder about the 'not in tree' bit since it is in the header files,
> certainly for NSPR which I've been poking at due to this discussion.

What I meant was that the documentation on the website isn't published from
documentation source code (in whichever format) residing in the tree.

That being said, I take that back since I just now in a git pull found that
they had done just that 6 days ago.  It's just as messy and incomplete as what
is currently on the web, important APIs like NSS_InitContext are still not
even mentioned more than in a release note, but I think it stands a better
chance of success than before.

> I had hoped that they were generating the documentation on the webpage from
> what's in the header files, is that not the case then?


Not from what I can tell, no.

--
Daniel Gustafsson        https://vmware.com/