Thread: Support for NSS as a libpq TLS backend
The attached patch implements NSS (Network Security Services) [0] with the required NSPR runtime [1] as a TLS backend for PostgreSQL.

While all sslmodes are implemented and work for the most part, the patch is *not* ready yet, but I wanted to show progress early so that anyone interested in this can help out with testing and maybe even hacking.

Why NSS? Well. It shares no lineage with OpenSSL, making it not just an alternative by fork but a 100% alternative. It's also actively maintained, is readily available on many platforms where PostgreSQL is popular, and has a FIPS mode which doesn't require an EOL'd library. And finally, I was asked nicely with the promise of a free beverage, an incentive as good as any.

Differences with OpenSSL
------------------------
NSS does not use certificates and keys on the filesystem; it instead uses a certificate database in which all certificates, keys and CRLs are loaded. A set of tools is provided to work with the database, like certutil, crlutil, pk12util etc. We could support plain PEM files as well, and load them into a database ourselves, but so far I've opted for just using what is already in the database.

This does mean that new GUCs are needed to identify the database. I've mostly repurposed the existing ones for cert/key/crl, but had to invent a new one for the database. Maybe there should be an entirely new set? This needs to be discussed with not only NSS in mind but also additional as-of-yet unknown backends we might get (SChannel comes to mind).

NSS also supports partial chain validation by default (as do many other TLS libraries) where OpenSSL does not. I haven't done anything about that just yet, thus there is a failing test as a reminder to address it.

The documentation of NSS/NSPR is unfortunately quite poor and oftentimes outdated or simply nonexistent. Cloning the repo and reading the source code is the only option for parts of the API.
Featurewise there might be other things we can make use of in NSS which don't exist in OpenSSL, but for now I've tried to keep them aligned.

Known Bugs and Limitations (in this version of the patch)
---------------------------------------------------------
The frontend doesn't attempt to verify whether the specified CRL exists in the database or not. This can be done with pretty much the same code as in the backend, except that we don't have the client-side certificate loaded, so we either need to read it back from the database or parse a list of all CRLs (which would save us from having the cert in local memory, which generally is a good thing to avoid).

pgtls_read is calling PR_Recv which works fine for communicating with an NSS backend cluster, but hangs waiting for IO when communicating with an OpenSSL backend cluster. Using PR_Read reverses the situation. This is probably a simple bug but I haven't had time to track it down yet. The below shifts between the two for debugging.

-    nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT);
+    nread = PR_Read(conn->pr_fd, ptr, len);

Passphrase handling in the backend is broken, more on that under TODO.

There are a few failing tests and a few skipped ones for now, but the majority of the tests pass.

Testing
-------
In order for the TAP framework to be able to handle backends with different characteristics I've broken up SSLServer.pm into a set of modules:

    SSL::Server
    SSL::Backend::NSS
    SSL::Backend::OpenSSL

The SSL tests import SSL::Server which in turn imports the appropriate backend module in order to perform backend specific setup tasks. The backend used should be transparent for the TAP code when it comes to switching server certs etc.

So far I've used foo|bar in the matching regex to provide alternative output, and SKIP blocks for tests that don't apply. There might be neater ways to achieve this, but I was trying to avoid having separate test files for the different backends.
The certificate databases can be created with a new nssfiles make target in src/test/ssl, which use the existing files (and also depend on OpenSSL, which I don't think is a problematic dependency for development environments). To keep it simple I've named the certificates in the NSS database after the filenames; this isn't really NSS best practice but it makes for an easier time reading the tests IMO.

If this direction is of interest, extracting it into a separate patch for just setting up the modules and implementing OpenSSL without a new backend is probably the next step.

TODO
----
This patch is a work in progress, and there is work left to do; below is a dump of what is left to fix before this can be considered a full implementation for review. Most of these items have more documentation in the code comments.

* The split between init and open needs to be revisited, especially in frontend where we have a bit more freedom. It remains to be seen if we can do better in the backend part.
* Documentation, it's currently not even started
* Windows support. I've hacked mostly using Debian and have tested versions of the patch on macOS, but not Windows so far.
* Figure out how to handle cipher configuration. Getting a set of ciphers that result in a usable socket isn't as easy as with OpenSSL, and policies seem much more preferred. At the very least this needs to be solidly documented.
* The rules in src/test/ssl/Makefile for generating certificate databases can probably be generalized into a smaller set of rules based on wildcards.
* The password callback on server-side won't be invoked at server start due to init happening in be_tls_open, so something needs to be figured out there. Maybe attempt to open the database with a throw-away context in init just to invoke the password callback?
* Identify code duplicated between frontend and backend and try to generalize.
* Make sure we are handling the error codes correctly in the certificate and auth callbacks to properly handle self-signed certs etc.
* Tidy up the tests which are partially hardwired for NSS now to make sure there are no regressions for OpenSSL.
* All the code using OpenSSL which isn't the libpq communications parts, like pgcrypto, strong_random, sha2, SCRAM et al.
* Review language in code comments and run pgindent et al.
* Settle on a minimum required version. I've been using NSS 3.42 and NSPR 4.20 simply since they were the packages Debian wanted to install for me, but I'm quite convinced that we can go a bit lower (featurewise we can go much lower but there are bugfixes in recent versions that we might want to include). Anything lower than a version supporting TLSv1.3 seems like an obvious no-no.

I'd be surprised if this is all, but that's at least a start. There isn't really a playbook on how to add a new TLS backend, but I'm hoping to be able to summarize the required bits and pieces in README.SSL once this is a bit closer to completion.

My plan is to keep hacking at this to have it reviewable for the 14 cycle, so if anyone has an interest in NSS, then I would love to hear feedback on how it works (and doesn't work).

The 0001 patch contains the full NSS support, and 0002 is a fix for the pgstat abstraction which IMO leaks backend implementation details. This needs to go on its own thread, but since 0001 fails without it I've included it here for simplicity's sake for now.

cheers ./daniel

[0] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS
[1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR
Attachment
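The foo|bar alternation mentioned in the Testing section of the message above can be sketched in C with POSIX regular expressions. This is only an illustration and not code from the patch: the real tests are Perl TAP scripts, and the function name and both error strings below are made up for the example rather than taken from OpenSSL or NSS.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if "output" matches the extended regular expression "pattern",
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
int
output_matches(const char *pattern, const char *output)
{
    regex_t re;
    int     rc;

    assert(pattern != NULL && output != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, output, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/*
 * Hypothetical expectation shared by both backends: one alternative per
 * library, joined with '|'. Neither string is a real library message.
 */
const char *const expected_error =
    "certificate verify failed|peer certificate is not trusted";
```

A single test can then accept the output of either backend, for instance by calling output_matches(expected_error, client_stderr), at the cost of also accepting one backend's message when the other backend is in use.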
> On 15 May 2020, at 22:46, Daniel Gustafsson <daniel@yesql.se> wrote: > The 0001 patch contains the full NSS support, and 0002 is a fix for the pgstat > abstraction which IMO leaks backend implementation details. This needs to go > on it's own thread, but since 0001 fails without it I've included it here for > simplicity sake for now. The attached 0001 and 0002 are the same patchseries as before, but with the OpenSSL test module fixed and a rebase on top of the current master. cheers ./daniel
Attachment
> On 25 Jun 2020, at 17:39, Daniel Gustafsson <daniel@yesql.se> wrote: > >> On 15 May 2020, at 22:46, Daniel Gustafsson <daniel@yesql.se> wrote: > >> The 0001 patch contains the full NSS support, and 0002 is a fix for the pgstat >> abstraction which IMO leaks backend implementation details. This needs to go >> on it's own thread, but since 0001 fails without it I've included it here for >> simplicity sake for now. > > The attached 0001 and 0002 are the same patchseries as before, but with the > OpenSSL test module fixed and a rebase on top of the current master. Another rebase to resolve conflicts with the recent fixes in the SSL tests, as well as some minor cleanup. cheers ./daniel
Attachment
On Fri, Jul 3, 2020 at 11:51 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 25 Jun 2020, at 17:39, Daniel Gustafsson <daniel@yesql.se> wrote:
> >> On 15 May 2020, at 22:46, Daniel Gustafsson <daniel@yesql.se> wrote:
> >> The 0001 patch contains the full NSS support, and 0002 is a fix for the pgstat
> >> abstraction which IMO leaks backend implementation details. This needs to go
> >> on it's own thread, but since 0001 fails without it I've included it here for
> >> simplicity sake for now.
> >
> > The attached 0001 and 0002 are the same patchseries as before, but with the
> > OpenSSL test module fixed and a rebase on top of the current master.
>
> Another rebase to resolve conflicts with the recent fixes in the SSL tests, as
> well as some minor cleanup.

Hi Daniel,

Thanks for blazing the trail for other implementations to coexist in the tree. I see that cURL (another project Daniel works on) supports a lot of TLS implementations[1]. I recognise 4 other library names from that table as having appeared on this mailing list as candidates for PostgreSQL support complete with WIP patches, including another one from you (Apple Secure Transport). I don't have strong views on how many and which libraries we should support, but I was curious how many packages depend on libssl1.1, libgnutls30 and libnss3 in the Debian package repos in my sources.list, and I came up with OpenSSL = 820, GnuTLS = 342, and NSS = 87.

I guess Solution.pm needs at least USE_NSS => undef for this not to break the build on Windows.

Obviously cfbot is useless for testing this code, since its build script does --with-openssl and you need --with-nss, but it still shows us one thing: with your patch, a --with-openssl build is apparently broken:

/001_ssltests.pl .. 1/93 Bailout called. Further testing stopped: system pg_ctl failed

There are some weird configure-related hunks in the patch:

+ -runstatedir | --runstatedir | --runstatedi | --runstated \
...[more stuff like that]...
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))

I see the same when I use Debian's autoconf, but not FreeBSD's or MacPorts', despite all being version 2.69. That seems to be due to non-upstreamed changes added by the Debian maintainers (I see the off_t thing mentioned in /usr/share/doc/autoconf/changelog.Debian.gz). I think you need to build a stock autoconf 2.69 or run autoconf on a non-Debian system.

I installed libnss3-dev on my Debian box and then configure had trouble locating and understanding <ssl.h>, until I added --with-includes=/usr/include/nss:/usr/include/nspr. I suspect this is supposed to be done with pkg-config nss --cflags somewhere in configure (or alternatively nss-config --cflags, nspr-config --cflags, I don't know, but we're using pkg-config for other stuff).

I installed the Debian package libnss3-tools (for certutil) and then, in src/test/ssl, I ran make nssfiles (I guess that should be automatic?), and make check, and I got this far:

Test Summary Report
-------------------
t/001_ssltests.pl (Wstat: 3584 Tests: 93 Failed: 14)
  Failed tests:  14, 16, 18-20, 24, 27-28, 54-55, 78-80
                91
  Non-zero exit status: 14

You mentioned some were failing in this WIP -- are those results you expect?

[1] https://curl.haxx.se/docs/ssl-compared.html
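As a reading aid for the Test Summary Report above, the three numbers in it can be cross-checked with a few lines of C. This is a sketch under assumptions, not part of the patch or of the test harness: it assumes Wstat is a wait status in the traditional Unix layout (exit code in the high byte) and that the test script exits with its number of failed tests. The function names are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Exit code carried in a traditional Unix wait status (high byte). */
int
wstat_exit_code(int wstat)
{
    return (wstat >> 8) & 0xff;
}

/*
 * Count the test numbers in a list such as "14, 16, 18-20", where "a-b"
 * stands for every test from a to b inclusive. Entries may be separated
 * by commas and/or whitespace. Returns -1 on malformed input.
 */
int
count_listed_tests(const char *list)
{
    int         count = 0;
    const char *p = list;

    assert(list != NULL);

    while (*p != '\0')
    {
        char *end;
        long  first;
        long  last;

        /* Skip separators between entries. */
        if (*p == ',' || isspace((unsigned char) *p))
        {
            p++;
            continue;
        }
        if (!isdigit((unsigned char) *p))
            return -1;

        first = strtol(p, &end, 10);
        last = first;
        p = end;
        if (*p == '-')
        {
            /* A range: parse its upper bound. */
            if (!isdigit((unsigned char) p[1]))
                return -1;
            last = strtol(p + 1, &end, 10);
            p = end;
        }
        if (last < first)
            return -1;
        count += (int) (last - first + 1);
    }
    return count;
}
```

Under those assumptions the report is self-consistent: the listed ranges expand to 14 tests, matching "Failed: 14", and a wait status of 3584 carries exit code 14, matching "Non-zero exit status: 14".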
> On 10 Jul 2020, at 07:10, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Fri, Jul 3, 2020 at 11:51 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>> On 25 Jun 2020, at 17:39, Daniel Gustafsson <daniel@yesql.se> wrote:
>>>> On 15 May 2020, at 22:46, Daniel Gustafsson <daniel@yesql.se> wrote:
>>>> The 0001 patch contains the full NSS support, and 0002 is a fix for the pgstat
>>>> abstraction which IMO leaks backend implementation details. This needs to go
>>>> on it's own thread, but since 0001 fails without it I've included it here for
>>>> simplicity sake for now.
>>>
>>> The attached 0001 and 0002 are the same patchseries as before, but with the
>>> OpenSSL test module fixed and a rebase on top of the current master.
>>
>> Another rebase to resolve conflicts with the recent fixes in the SSL tests, as
>> well as some minor cleanup.
>
> Hi Daniel,
>
> Thanks for blazing the trail for other implementations to coexist in
> the tree. I see that cURL (another project Daniel works on)
> supports a lot of TLS implementations[1].

The list on that URL is also just a selection, the total count is 10 (not counting OpenSSL forks) IIRC, after axing support for a few lately. OpenSSL clearly has a large mindshare but the gist of it is that there exist quite a few alternatives each with their own strengths.

> I recognise 4 other library
> names from that table as having appeared on this mailing list as
> candidates for PostgreSQL support complete with WIP patches, including
> another one from you (Apple Secure Transport). I don't have strong
> views on how many and which libraries we should support,

I think it's key to keep in mind *why* it's relevant to provide options in the first place, after all, as they must be 100% interoperable one can easily argue for a single one being enough.
We need to look at what they offer users on top of just a TLS connection, like: managed certificate storage (for example macOS Keychains), FIPS certification, good platform availability and/or OS integration etc. If all a library offers is "not being OpenSSL" then it's not clear that we're adding much value by spending the cycles to support it.

My personal opinion is that we should keep it pretty limited, not least to lessen the burden of testing and during feature development. Supporting a new library comes with requirements on both the CFBot as well as the buildfarm, not to mention on developers who dabble in that area of the code. The goal should IMHO be to make it trivial for every postgres installation to use TLS regardless of platform and experience level of the person installing it.

The situation is a bit different for curl where we have as a goal to provide enough alternatives such that every platform can have a libcurl/curl more or less regardless of what it contains. As a consequence, we have around 80 CI jobs to test each pull request to provide ample coverage. Being a kitchen-sink is really hard work.

> but I was
> curious how many packages depend on libss1.1, libgnutls30 and libnss3
> in the Debian package repos in my sources.list, and I came up with
> OpenSSL = 820, GnuTLS = 342, and NSS = 87.

I don't see a lot of excitement over GnuTLS lately, but Debian shipping it due to (I believe) licensing concerns with OpenSSL does help it along. In my experience, platforms with GnuTLS easily available also have OpenSSL easily available.

> I guess Solution.pm needs at least USE_NSS => undef for this not to
> break the build on Windows.

Thanks, I'll fix (I admittedly haven't tried this at all on Windows yet).

> Obviously cfbot is useless for testing this code, since its build
> script does --with-openssl and you need --with-nss,

Right, this is a CFBot problem with any patch that requires specific autoconf flags to be exercised.
I wonder if we can make something when we do CF app integration which can inject flags to a Travis pipeline in a safe manner?

> but it still shows
> us one thing: with your patch, a --with-openssl build is apparently
> broken:
>
> /001_ssltests.pl .. 1/93 Bailout called. Further testing stopped:
> system pg_ctl failed

Humm .. I hate to say "it worked on my machine" but it did, though my TLS environment is hardly standard. Sorry for posting breakage, most likely this is a bug in the new test module structure that the patch introduces in order to support multiple backends for src/test/ssl. I'll fix.

> There are some weird configure-related hunks in the patch:
>
> + -runstatedir | --runstatedir | --runstatedi | --runstated \
> ...[more stuff like that]...
>
> -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
> +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
>
> I see the same when I use Debian's autoconf, but not FreeBSD's or
> MacPorts', despite all being version 2.69. That seems to be due to
> non-upstreamed changes added by the Debian maintainers (I see the
> off_t thing mentioned in /usr/share/doc/autoconf/changelog.Debian.gz).
> I think you need to build a stock autoconf 2.69 or run autoconf on a
> non-Debian system.

Sigh, yes that's a Debianism that slipped through, again sorry about that.

> I installed libnss3-dev on my Debian box and then configure had
> trouble locating and understanding <ssl.h>, until I added
> --with-includes=/usr/include/nss:/usr/include/nspr. I suspect this is
> supposed to be done with pkg-config nss --cflags somewhere in
> configure (or alternatively nss-config --cflags, nspr-config --cflags,
> I don't know, but we're using pkg-config for other stuff).

Yeah, that's a good point, I should fix that. Having a metric ton of TLS libraries in various versions around in my environment I've been Stockholm Syndromed to --with-includes to the point where I didn't even think to run without it.
It should clearly be as easy to use as OpenSSL wrt autoconf.

> I installed the Debian package libnss3-tools (for certutil) and then,
> in src/test/ssl, I ran make nssfiles (I guess that should be
> automatic?)

Yes, it needs to run automatically for NSS builds on make check.

> , and make check, and I got this far:
>
> Test Summary Report
> -------------------
> t/001_ssltests.pl (Wstat: 3584 Tests: 93 Failed: 14)
> Failed tests: 14, 16, 18-20, 24, 27-28, 54-55, 78-80
> 91
> Non-zero exit status: 14
>
> You mentioned some were failing in this WIP -- are those results you expect?

I'm not on my dev box at the moment, and I don't remember off the cuff, but that sounds higher than I remember. I wonder if I fat-fingered the regexes in the last version?

Thanks for taking a look at the patch, I'll fix up the reported issues Monday at the latest.

cheers ./daniel
On Fri, Jul 10, 2020 at 5:10 PM Thomas Munro <thomas.munro@gmail.com> wrote: > -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) > +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31)) > > I see the same when I use Debian's autoconf, but not FreeBSD's or > MacPorts', despite all being version 2.69. That seems to be due to > non-upstreamed changes added by the Debian maintainers (I see the > off_t thing mentioned in /usr/share/doc/autoconf/changelog.Debian.gz). By the way, Dagfinn mentioned that these changes were in fact upstreamed, and happened to be beta-released today[1], and are due out in ~3 months as 2.70. That'll be something for us to coordinate a bit further down the road. [1] https://lists.gnu.org/archive/html/autoconf/2020-07/msg00006.html
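For readers wondering whether the two LARGE_OFF_T spellings in the quoted configure diff differ in value: for a 64-bit signed type they do not. The following is a minimal C sketch, not taken from autoconf or the patch. It uses int64_t as a stand-in for off_t (an assumption, since the width of off_t is platform-dependent), and the typedef, macro and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for off_t; assumed to be 64 bits wide in this sketch. */
typedef int64_t demo_off_t;

/* Spelling on the "-" side of the diff: a single 62-bit shift. */
#define DEMO_LARGE_OFF_T_SINGLE \
    (((demo_off_t) 1 << 62) - 1 + ((demo_off_t) 1 << 62))

/* Spelling on the "+" side of the diff: two 31-bit shifts. */
#define DEMO_LARGE_OFF_T_SPLIT \
    ((((demo_off_t) 1 << 31) << 31) - 1 + (((demo_off_t) 1 << 31) << 31))

/* Value of the single-shift spelling. */
demo_off_t
large_off_t_single(void)
{
    return DEMO_LARGE_OFF_T_SINGLE;
}

/* Value of the split-shift spelling. */
demo_off_t
large_off_t_split(void)
{
    return DEMO_LARGE_OFF_T_SPLIT;
}

/* Abort if the two spellings disagree or are not 2^63 - 1. */
void
check_large_off_t(void)
{
    assert(large_off_t_single() == large_off_t_split());
    assert(large_off_t_single() == INT64_MAX);
}
```

Both expressions evaluate to 2^62 - 1 + 2^62, which is 2^63 - 1, so the hunk changes only the spelling of the constant and not its value on such a platform.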
On 5/15/20 4:46 PM, Daniel Gustafsson wrote: > > My plan is to keep hacking at this to have it reviewable for the 14 cycle, so > if anyone has an interest in NSS, then I would love to hear feedback on how it > works (and doesn't work). I'll be happy to help, particularly with Windows support and with some of the callback stuff I've had a hand in. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 12 Jul 2020, at 00:03, Daniel Gustafsson <daniel@yesql.se> wrote: > Thanks for taking a look at the patch, I'll fix up the reported issues Monday > at the latest. A bit of life intervened, but attached is a new version of the patch which should work for OpenSSL builds, and have the other issues addressed as well. I took the opportunity to clean up the NSS tests to be more like the OpenSSL ones to lessen the impact on the TAP testcases. On my Debian box, using the standard NSS and NSPR packages, I get 6 failures which are essentially all around CRL handling. I'm going to circle back and look at what is missing there. This version also removes the required patch for statistics reporting as that has been committed in 6a5c750f3f72899f4f982f921d5bf5665f55651e. cheers ./daniel
Attachment
> On 15 Jul 2020, at 20:35, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>
> On 5/15/20 4:46 PM, Daniel Gustafsson wrote:
>>
>> My plan is to keep hacking at this to have it reviewable for the 14 cycle, so
>> if anyone has an interest in NSS, then I would love to hear feedback on how it
>> works (and doesn't work).
>
> I'll be happy to help, particularly with Windows support and with some
> of the callback stuff I've had a hand in.

That would be fantastic, thanks! The password callback handling is still a TODO so feel free to take a stab at that since you have a lot of context there.

For Windows, I've included USE_NSS in Solution.pm as Thomas pointed out in this thread, but that was done blind as I've done no testing on Windows yet.

cheers ./daniel
> On 16 Jul 2020, at 00:16, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 12 Jul 2020, at 00:03, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> Thanks for taking a look at the patch, I'll fix up the reported issues Monday
>> at the latest.
>
> A bit of life intervened, but attached is a new version of the patch which
> should work for OpenSSL builds, and have the other issues addressed as well. I
> took the opportunity to clean up the NSS tests to be more like the OpenSSL ones
> to lessen the impact on the TAP testcases. On my Debian box, using the
> standard NSS and NSPR packages, I get 6 failures which are essentially all
> around CRL handling. I'm going to circle back and look at what is missing there.

Taking a look at this, the issue was that I had fat-fingered the Makefile rules for generating the NSS databases. This is admittedly very messy at this point, partly due to trying to mimic OpenSSL filepaths/names to minimize the impact on tests and to keep OpenSSL/NSS tests as "optically" equivalent as I could.

With this, I have one failing test ("intermediate client certificate is provided by client") which I've left failing since I believe the case should be supported by NSS. The issue is most likely that I haven't figured out the right certinfo incantation to make it so (Mozilla hasn't strained themselves when writing documentation for this toolchain, or any part of NSS for that matter).

The failing test when running with OpenSSL also remains; the issue is that the very first test for incorrect key passphrase fails, even though the server is behaving exactly as it should. Something in the test suite hackery breaks for that test but I've been unable to pin down what it is, any help on that would be greatly appreciated.

This version adds support for sslinfo on NSS for most of the functions. In the process I realized that sslinfo never got the memo about SSL support being abstracted behind an API, so I went and did that as well.
This part of the patch should perhaps be broken out into a separate patch/thread in case it's deemed interesting regardless of the eventual conclusion on this patch. Doing this removed a bit of duplication with the backend code, and some error handling moved to be-secure-openssl.c (originally added in d94c36a45ab45). As the original commit message states, they're mostly code hygiene with belts and suspenders, but if we deemed them valuable enough for a contrib module ISTM they should go into the backend as well. Adding a testcase for sslinfo is a TODO.

Support for pg_strong_random, sha2 and pgcrypto has been started, but it's less trivial as NSS/NSPR requires a lot more initialization and state than OpenSSL, so it needs a bit more thought.

I've also done a rebase over today's HEAD, a pgindent pass and some cleanup here and there.

cheers ./daniel
Attachment
On 7/15/20 6:18 PM, Daniel Gustafsson wrote:
>> On 15 Jul 2020, at 20:35, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>>
>> On 5/15/20 4:46 PM, Daniel Gustafsson wrote:
>>> My plan is to keep hacking at this to have it reviewable for the 14 cycle, so
>>> if anyone has an interest in NSS, then I would love to hear feedback on how it
>>> works (and doesn't work).
>> I'll be happy to help, particularly with Windows support and with some
>> of the callback stuff I've had a hand in.
> That would be fantastic, thanks! The password callback handling is still a
> TODO so feel free to take a stab at that since you have a lot of context on
> there.
>
> For Windows, I've include USE_NSS in Solution.pm as Thomas pointed out in this
> thread, but that was done blind as I've done no testing on Windows yet.
>

OK, here is an update of your patch that compiles and runs against NSS under Windows (VS2019).

In addition to some work that was missing in src/tools/msvc, I had to make a few adjustments, including:

* strtok_r() isn't available on Windows. We don't use it elsewhere in the postgres code, and it seemed unnecessary to have reentrant calls here, so I just replaced it with equivalent strtok() calls.
* We were missing an NSS implementation of pgtls_verify_peer_name_matches_certificate_guts(). I supplied a dummy that's enough to get it building cleanly, but that needs to be filled in properly.

There is still plenty of work to go, but this seemed a sufficient milestone to report progress on.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
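The strtok_r() to strtok() replacement described in the message above can be illustrated with a small standard C sketch. It is not the code from the patch: the function name and the colon-separated input in the usage note are assumptions made for the example. strtok() keeps its position in hidden static state, which is why it is only a safe substitute where, as noted above, reentrant calls are not needed.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "list" in place on any character in "delims", storing up to
 * "max_tokens" token pointers in "tokens". Returns the number stored.
 * strtok() modifies the buffer and is not reentrant, so the caller must
 * pass writable memory and must not interleave another strtok() sequence.
 */
size_t
split_list(char *list, const char *delims, char **tokens, size_t max_tokens)
{
    size_t  n = 0;
    char   *tok;

    assert(list != NULL && delims != NULL && tokens != NULL);

    for (tok = strtok(list, delims);
         tok != NULL && n < max_tokens;
         tok = strtok(NULL, delims))
        tokens[n++] = tok;

    return n;
}
```

Called on a writable buffer holding "alpha:beta::gamma" with ":" as the delimiter, this yields the three tokens alpha, beta and gamma; like strtok_r(), strtok() skips empty fields between consecutive delimiters.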
On 7/31/20 4:44 PM, Andrew Dunstan wrote: > On 7/15/20 6:18 PM, Daniel Gustafsson wrote: >>> On 15 Jul 2020, at 20:35, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: >>> >>> On 5/15/20 4:46 PM, Daniel Gustafsson wrote: >>>> My plan is to keep hacking at this to have it reviewable for the 14 cycle, so >>>> if anyone has an interest in NSS, then I would love to hear feedback on how it >>>> works (and doesn't work). >>> I'll be happy to help, particularly with Windows support and with some >>> of the callback stuff I've had a hand in. >> That would be fantastic, thanks! The password callback handling is still a >> TODO so feel free to take a stab at that since you have a lot of context on >> there. >> >> For Windows, I've include USE_NSS in Solution.pm as Thomas pointed out in this >> thread, but that was done blind as I've done no testing on Windows yet. >> > > OK, here is an update of your patch that compiles and runs against NSS > under Windows (VS2019). > > > In addition to some work that was missing in src/tools/msvc, I had to > make a few adjustments, including: > > > * strtok_r() isn't available on Windows. We don't use it elsewhere in > the postgres code, and it seemed unnecessary to have reentrant calls > here, so I just replaced it with equivalent strtok() calls. > * We were missing an NSS implementation of > pgtls_verify_peer_name_matches_certificate_guts(). I supplied a > dummy that's enough to get it building cleanly, but that needs to be > filled in properly. > > > There is still plenty of work to go, but this seemed a sufficient > milestone to report progress on. > > OK, this version contains pre-generated nss files, and passes a full buildfarm run including the ssl test module, with both openssl and NSS. That should keep the cfbot happy :-) cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 8/3/20 12:46 PM, Andrew Dunstan wrote: > On 7/31/20 4:44 PM, Andrew Dunstan wrote: >> On 7/15/20 6:18 PM, Daniel Gustafsson wrote: >>>> On 15 Jul 2020, at 20:35, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: >>>> >>>> On 5/15/20 4:46 PM, Daniel Gustafsson wrote: >>>>> My plan is to keep hacking at this to have it reviewable for the 14 cycle, so >>>>> if anyone has an interest in NSS, then I would love to hear feedback on how it >>>>> works (and doesn't work). >>>> I'll be happy to help, particularly with Windows support and with some >>>> of the callback stuff I've had a hand in. >>> That would be fantastic, thanks! The password callback handling is still a >>> TODO so feel free to take a stab at that since you have a lot of context on >>> there. >>> >>> For Windows, I've include USE_NSS in Solution.pm as Thomas pointed out in this >>> thread, but that was done blind as I've done no testing on Windows yet. >>> >> OK, here is an update of your patch that compiles and runs against NSS >> under Windows (VS2019). >> >> >> In addition to some work that was missing in src/tools/msvc, I had to >> make a few adjustments, including: >> >> >> * strtok_r() isn't available on Windows. We don't use it elsewhere in >> the postgres code, and it seemed unnecessary to have reentrant calls >> here, so I just replaced it with equivalent strtok() calls. >> * We were missing an NSS implementation of >> pgtls_verify_peer_name_matches_certificate_guts(). I supplied a >> dummy that's enough to get it building cleanly, but that needs to be >> filled in properly. >> >> >> There is still plenty of work to go, but this seemed a sufficient >> milestone to report progress on. >> >> > > OK, this version contains pre-generated nss files, and passes a full > buildfarm run including the ssl test module, with both openssl and NSS. > That should keep the cfbot happy :-) > > rebased on current master. 
cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
> On 3 Aug 2020, at 21:18, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On 8/3/20 12:46 PM, Andrew Dunstan wrote: >> On 7/31/20 4:44 PM, Andrew Dunstan wrote: >>> OK, here is an update of your patch that compiles and runs against NSS >>> under Windows (VS2019). Out of curiosity since I'm not familiar with Windows, how hard/easy is it to install NSS for the purpose of a) hacking on postgres+NSS and b) using postgres with NSS as the backend? >>> * strtok_r() isn't available on Windows. We don't use it elsewhere in >>> the postgres code, and it seemed unnecessary to have reentrant calls >>> here, so I just replaced it with equivalent strtok() calls. Fair enough, that makes sense. >>> * We were missing an NSS implementation of >>> pgtls_verify_peer_name_matches_certificate_guts(). I supplied a >>> dummy that's enough to get it building cleanly, but that needs to be >>> filled in properly. Interesting, not sure how I could've missed that one. >> OK, this version contains pre-generated nss files, and passes a full >> buildfarm run including the ssl test module, with both openssl and NSS. >> That should keep the cfbot happy :-) Exciting, thanks a lot for helping out on this! I've started to look at the required documentation changes during vacation, will hopefully be able to post something soon. cheers ./daniel
On 8/4/20 5:42 PM, Daniel Gustafsson wrote: >> On 3 Aug 2020, at 21:18, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: >> On 8/3/20 12:46 PM, Andrew Dunstan wrote: >>> On 7/31/20 4:44 PM, Andrew Dunstan wrote: >>>> OK, here is an update of your patch that compiles and runs against NSS >>>> under Windows (VS2019). > Out of curiosity since I'm not familiar with Windows, how hard/easy is it to > install NSS for the purpose of a) hacking on postgres+NSS and b) using postgres > with NSS as the backend? I've laid out the process at https://www.2ndquadrant.com/en/blog/nss-on-windows-for-postgresql-development/ >>> OK, this version contains pre-generated nss files, and passes a full >>> buildfarm run including the ssl test module, with both openssl and NSS. >>> That should keep the cfbot happy :-) > Exciting, thanks a lot for helping out on this! I've started to look at the > required documentation changes during vacation, will hopefully be able to post > something soon. > Good. Having got the tests running cleanly on Linux, I'm now going back to work on that for Windows. After that I'll look at the hook/callback stuff. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 5 Aug 2020, at 22:38, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > > On 8/4/20 5:42 PM, Daniel Gustafsson wrote: >>> On 3 Aug 2020, at 21:18, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: >>> On 8/3/20 12:46 PM, Andrew Dunstan wrote: >>>> On 7/31/20 4:44 PM, Andrew Dunstan wrote: >>>>> OK, here is an update of your patch that compiles and runs against NSS >>>>> under Windows (VS2019). >> Out of curiosity since I'm not familiar with Windows, how hard/easy is it to >> install NSS for the purpose of a) hacking on postgres+NSS and b) using postgres >> with NSS as the backend? > > I've laid out the process at > https://www.2ndquadrant.com/en/blog/nss-on-windows-for-postgresql-development/ That's fantastic, thanks for putting that together. >>>> OK, this version contains pre-generated nss files, and passes a full >>>> buildfarm run including the ssl test module, with both openssl and NSS. >>>> That should keep the cfbot happy :-) Turns out the CFBot doesn't like the binary diffs. They are included in this version too but we should probably drop them again it seems. >> Exciting, thanks a lot for helping out on this! I've started to look at the >> required documentation changes during vacation, will hopefully be able to post >> something soon. > > Good. Having got the tests running cleanly on Linux, I'm now going back > to work on that for Windows. > > After that I'll look at the hook/callback stuff. The attached v9 contains mostly a first stab at getting some documentation going, it's far from completed but I'd rather share more frequently to not have local trees deviate too much in case you've had time to hack as well. I had a few documentation tweaks in the code too, but no real functionality change for now. The 0001 patch isn't strictly necessary but it seems reasonable to address the various ways OpenSSL was spelled out in the docs while at updating the SSL portions. 
It essentially ensures that markup around OpenSSL and SSL is used consistently. I deliberately didn't fix the overly long lines in this patch, to make review easier. cheers ./daniel
> > >>>> OK, this version contains pre-generated nss files, and passes a full > >>>> buildfarm run including the ssl test module, with both openssl and NSS. > >>>> That should keep the cfbot happy :-) > > Turns out the CFBot doesn't like the binary diffs. They are included in this > version too but we should probably drop them again it seems. > I did ask Thomas about this, he was going to try to fix it. In principle we should want it to accept binary diffs exactly for this sort of thing. > The attached v9 contains mostly a first stab at getting some documentation > going, it's far from completed but I'd rather share more frequently to not have > local trees deviate too much in case you've had time to hack as well. I had a > few documentation tweaks in the code too, but no real functionality change for > now. > > The 0001 patch isn't strictly necessary but it seems reasonable to address the > various ways OpenSSL was spelled out in the docs while at updating the SSL > portions. It essentially ensures that markup around OpenSSL and SSL is used > consistently. I didn't address the linelengths being too long in this patch to > make review easier instead. > I'll take a look. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Sep 03, 2020 at 03:26:03PM -0400, Andrew Dunstan wrote: >> The 0001 patch isn't strictly necessary but it seems reasonable to address the >> various ways OpenSSL was spelled out in the docs while at updating the SSL >> portions. It essentially ensures that markup around OpenSSL and SSL is used >> consistently. I didn't address the linelengths being too long in this patch to >> make review easier instead. > > I'll take a look. Adding a <productname> markup around OpenSSL in the docs makes things consistent. +1. -- Michael
On Fri, Sep 04, 2020 at 10:23:34AM +0900, Michael Paquier wrote: > Adding a <productname> markup around OpenSSL in the docs makes things > consistent. +1. I have looked at 0001, and applied it after fixing the line length (thanks for not doing it to ease my lookup), and I found one extra place in need of fix. Patch 0002 is failing to apply. -- Michael
> On 17 Sep 2020, at 09:41, Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Sep 04, 2020 at 10:23:34AM +0900, Michael Paquier wrote: >> Adding a <productname> markup around OpenSSL in the docs makes things >> consistent. +1. > > I have looked at 0001, and applied it after fixing the line length > (thanks for not doing it to ease my lookup), and I found one extra > place in need of fix. Thanks! > Patch 0002 is failing to apply. Attached is a v10 rebased to apply on top of HEAD. cheers ./daniel
On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote: > Attached is a v10 rebased to apply on top of HEAD. I am afraid that this needs a new rebase. The patch is failing to apply, per the CF bot. :/ -- Michael
> On 29 Sep 2020, at 07:59, Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote: >> Attached is a v10 rebased to apply on top of HEAD. > > I am afraid that this needs a new rebase. The patch is failing to > apply, per the CF bot. :/ It's failing on binary diffs due to the NSS certificate databases being included to make hacking on the patch easier: File src/test/ssl/ssl/nss/server.crl: git binary diffs are not supported. This is a limitation of the CFBot patch tester; the text portions of the patch still apply with a tiny bit of fuzz. cheers ./daniel
> On 29 Sep 2020, at 09:52, Daniel Gustafsson <daniel@yesql.se> wrote: > >> On 29 Sep 2020, at 07:59, Michael Paquier <michael@paquier.xyz> wrote: >> >> On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote: >>> Attached is a v10 rebased to apply on top of HEAD. >> >> I am afraid that this needs a new rebase. The patch is failing to >> apply, per the CF bot. :/ > > It's failing on binary diffs due to the NSS certificate databases being > included to make hacking on the patch easier: > > File src/test/ssl/ssl/nss/server.crl: git binary diffs are not supported. > > This is a limitation of the CFBot patch tester; the text portions of the patch > still apply with a tiny bit of fuzz. Attached is a new version which doesn't contain the NSS certificate databases to keep the CFBot happy. It also implements server-side passphrase callbacks and re-enables the tests for those. The callback works a bit differently from the OpenSSL one as it must run in the forked process, so it can't run on server reload. There's also no default fallback reading from a TTY like in OpenSSL, so if no callback is set, the always-failing dummy is used. cheers ./daniel
The attached v12 adds support for pgcrypto as well as pg_strong_random, which I believe completes the required subsystems where we have OpenSSL support today. I opted for not adding code to handle the internal shaXXX implementations until the dust settles around the proposal to change the API there. Blowfish is not supported by NSS AFAICT, even though the cipher mechanism is defined, so the internal implementation is used there instead. CAST5 is supported, but segfaults inside NSS on most inputs so support for that is not included for now. cheers ./daniel
Hi, On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote: > From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001 > From: Daniel Gustafsson <daniel@yesql.se> > Date: Thu, 8 Oct 2020 18:44:28 +0200 > Subject: [PATCH] Support for NSS as a TLS backend v12 > --- > configure | 223 +++- > configure.ac | 39 +- > contrib/Makefile | 2 +- > contrib/pgcrypto/Makefile | 5 + > contrib/pgcrypto/nss.c | 773 +++++++++++ > contrib/pgcrypto/openssl.c | 2 +- > contrib/pgcrypto/px.c | 1 + > contrib/pgcrypto/px.h | 1 + Personally I'd like to see this patch broken up a bit - it's quite large. Several of the changes could easily be committed separately, no? > if test "$with_openssl" = yes ; then > + if test x"$with_nss" = x"yes" ; then > + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) > + fi Based on a quick look there's no similar error check for the msvc build. Should there be? > > +if test "$with_nss" = yes ; then > + if test x"$with_openssl" = x"yes" ; then > + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) > + fi Isn't this a repetition of the earlier check? > + CLEANLDFLAGS="$LDFLAGS" > + # TODO: document this set of LDFLAGS > + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS" Shouldn't this use nss-config or such? > +if test "$with_nss" = yes ; then > + AC_CHECK_HEADER(ssl.h, [], [AC_MSG_ERROR([header file <ssl.h> is required for NSS])]) > + AC_CHECK_HEADER(nss.h, [], [AC_MSG_ERROR([header file <nss.h> is required for NSS])]) > +fi Hm. For me, on debian, these headers are not directly in the default include search path, but would be as nss/ssl.h. I don't see you adding nss/ to CFLAGS anywhere? How does this work currently? I think it'd also be better if we could include these files as nss/ssl.h etc - ssl.h is a name way too likely to conflict imo. 
> +++ b/src/backend/libpq/be-secure-nss.c > @@ -0,0 +1,1158 @@ > +/* > + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef > + * our version to avoid compiler warnings on redefinition. > + */ > +#define pg_BITS_PER_BYTE BITS_PER_BYTE > +#undef BITS_PER_BYTE Most compilers/preprocessors don't warn about redefinitions when they would result in the same value (IIRC we have some cases of redefinitions in tree even). Does nspr's differ? > +/* > + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with > + * colliding definitions from ours, causing a much expected compiler error. > + * The definitions are however not actually used in NSPR at all, and are only > + * intended for what seems to be backwards compatibility for apps written > + * against old versions of NSPR. The following comment is in the referenced > + * file, and was added in 1998: > + * > + * This section typedefs the old 'native' types to the new PR<type>s. > + * These definitions are scheduled to be eliminated at the earliest > + * possible time. The NSPR API is implemented and documented using > + * the new definitions. > + * > + * As there is no opt-out from pulling in these typedefs, we define the guard > + * for the file to exclude it. This is incredibly ugly, but seems to be about > + * the only way around it. > + */ > +#define PROTYPES_H > +#include <nspr.h> > +#undef PROTYPES_H Yuck :(. > +int > +be_tls_init(bool isServerStart) > +{ > + SECStatus status; > + SSLVersionRange supported_sslver; > + > + /* > + * Set up the connection cache for multi-processing application behavior. Hm. Do we necessarily want that? Session resumption is not exactly unproblematic... Or does this do something else? > + * If we are in ServerStart then we initialize the cache. If the server is > + * already started, we inherit the cache such that it can be used for > + * connections. 
Calling SSL_ConfigMPServerSIDCache sets an environment > + * variable which contains enough information for the forked child to know > + * how to access it. Passing NULL to SSL_InheritMPServerSIDCache will > + * make the forked child look it up by the default name SSL_INHERITANCE, > + * if env vars aren't inherited then the contents of the variable can be > + * passed instead. > + */ Does this stuff work on windows / EXEC_BACKEND? > + * The below parameters are what the implicit initialization would've done > + * for us, and should work even for older versions where it might not be > + * done automatically. The last parameter, maxPTDs, is set to various > + * values in other codebases, but has been unused since NSPR 2.1 which was > + * released sometime in 1998. > + */ > + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ ); https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_Init says that currently all parameters are ignored? > + /* > + * Import the already opened socket as we don't want to use NSPR functions > + * for opening the network socket due to how the PostgreSQL protocol works > + * with TLS connections. This function is not part of the NSPR public API, > + * see the comment at the top of the file for the rationale of still using > + * it. > + */ > + pr_fd = PR_ImportTCPSocket(port->sock); > + if (!pr_fd) > + ereport(ERROR, > + (errmsg("unable to connect to socket"))); I don't see the comment you're referring to? > + /* > + * Most of the documentation available, and implementations of, NSS/NSPR > + * use the PR_NewTCPSocket() function here, which has the drawback that it > + * can only create IPv4 sockets. Instead use PR_OpenTCPSocket() which > + * copes with IPv6 as well. > + */ > + model = PR_OpenTCPSocket(port->laddr.addr.ss_family); > + if (!model) > + ereport(ERROR, > + (errmsg("unable to open socket"))); > + > + /* > + * Convert the NSPR socket to an SSL socket. 
Ensuring the success of this > + * operation is critical as NSS SSL_* functions may return SECSuccess on > + * the socket even though SSL hasn't been enabled, which introduce a risk > + * of silent downgrades. > + */ > + model = SSL_ImportFD(NULL, model); > + if (!model) > + ereport(ERROR, > + (errmsg("unable to enable TLS on socket"))); It's confusing that these functions do not actually reference the socket via some handle :(. What does opening a socket do here? > + /* > + * Configure the allowed cipher. If there are no user preferred suites, *ciphers? > + > + port->pr_fd = SSL_ImportFD(model, pr_fd); > + if (!port->pr_fd) > + ereport(ERROR, > + (errmsg("unable to initialize"))); > + > + PR_Close(model); A comment explaining why we first import a NULL into the model, and then release the model, and import the real fd would be good. > +ssize_t > +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor) > +{ > + ssize_t n_read; > + PRErrorCode err; > + > + n_read = PR_Read(port->pr_fd, ptr, len); > + > + if (n_read < 0) > + { > + err = PR_GetError(); Yay, more thread global state :(. > + /* XXX: This logic seems potentially bogus? */ > + if (err == PR_WOULD_BLOCK_ERROR) > + *waitfor = WL_SOCKET_READABLE; > + else > + *waitfor = WL_SOCKET_WRITEABLE; Don't we need to handle failed connections somewhere here? secure_read() won't know about PR_GetError() etc? How would SSL errors be signalled upwards here? Also, as you XXX, it's not clear to me that your mapping would always result in waiting for the right event? A tls write could e.g. very well require receiving data etc? > + /* > + * At least one byte with password content was returned, and NSS requires > + * that we return it allocated in NSS controlled memory. If we fail to > + * allocate then abort without passing back NULL and bubble up the error > + * on the PG side. 
> + */ > + password = (char *) PR_Malloc(len + 1); > + if (!password) > + ereport(ERROR, > + (errcode(ERRCODE_OUT_OF_MEMORY), > + errmsg("out of memory"))); > > + strlcpy(password, buf, sizeof(password)); > + explicit_bzero(buf, sizeof(buf)); > + In case of error you're not bzero'ing out the password! Separately, I wonder if we should introduce a function for throwing OOM errors - which then e.g. could print the memory context stats in those places too... > +static SECStatus > +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) > +{ > + SECStatus status; > + Port *port = (Port *) arg; > + CERTCertificate *cert; > + char *peer_cn; > + int len; > + > + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); > + if (status == SECSuccess) > + { > + cert = SSL_PeerCertificate(port->pr_fd); > + len = strlen(cert->subjectName); > + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); > + if (strncmp(cert->subjectName, "CN=", 3) == 0) > + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); > + else > + strlcpy(peer_cn, cert->subjectName, len + 1); > + CERT_DestroyCertificate(cert); > + > + port->peer_cn = peer_cn; > + port->peer_cert_valid = true; Hm. We either should have something similar to /* * Reject embedded NULLs in certificate common name to prevent * attacks like CVE-2009-4034. */ if (len != strlen(peer_cn)) { ereport(COMMERROR, (errcode(ERRCODE_PROTOCOL_VIOLATION), errmsg("SSL certificate's common name contains embedded null"))); pfree(peer_cn); return -1; } here, or a comment explaining why not. Also, what's up with the CN= bit? Why is that needed here, but not for openssl? 
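A minimal sketch of the password-handling concern raised above (the source buffer not being zeroed when allocation fails), written without NSS so it stands alone. Everything prefixed `sketch_` is hypothetical: the allocator parameter stands in for PR_Malloc, and `sketch_wipe` stands in for explicit_bzero. The copy length is passed explicitly because `sizeof()` applied to a `char *` yields the pointer size rather than the allocation size.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Allocator signature; in the quoted code this role is played by PR_Malloc. */
typedef void *(*sketch_alloc_fn) (size_t);

/*
 * Zero memory through a volatile pointer so the stores are not optimized
 * away.  Stand-in for explicit_bzero(), which is not available everywhere.
 */
static void
sketch_wipe(void *p, size_t n)
{
    volatile unsigned char *vp = (volatile unsigned char *) p;

    while (n--)
        *vp++ = 0;
}

/* Allocator that always fails, used to exercise the error path. */
static void *
sketch_failing_alloc(size_t n)
{
    (void) n;
    return NULL;
}

/*
 * Copy the first len bytes of buf into freshly allocated, NUL-terminated
 * memory.  buf is wiped before returning on both the success and the
 * failure path.  Returns NULL if len does not fit in buf or if the
 * allocation fails; the caller reports the error after the wipe.
 */
static char *
sketch_copy_secret(char *buf, size_t buflen, size_t len, sketch_alloc_fn alloc)
{
    char       *password = NULL;

    if (len < buflen)
        password = (char *) alloc(len + 1);
    if (password != NULL)
    {
        memcpy(password, buf, len);
        password[len] = '\0';
    }
    sketch_wipe(buf, buflen);
    return password;
}
```

The design choice here is to wipe first and report afterwards, so that an error path that does not return (such as ereport(ERROR)) can never skip the wipe.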
> +static PRFileDesc * > +init_iolayer(Port *port, int loglevel) > +{ > + const PRIOMethods *default_methods; > + PRFileDesc *layer; > + > + /* > + * Start by initializing our layer with all the default methods so that we > + * can selectively override the ones we want while still ensuring that we > + * have a complete layer specification. > + */ > + default_methods = PR_GetDefaultIOMethods(); > + memcpy(&pr_iomethods, default_methods, sizeof(PRIOMethods)); > + > + pr_iomethods.recv = pg_ssl_read; > + pr_iomethods.send = pg_ssl_write; > + > + /* > + * Each IO layer must be identified by a unique name, where uniqueness is > + * per connection. Each connection in a postgres cluster can generate the > + * identity from the same string as they will create their IO layers on > + * different sockets. Only one layer per socket can have the same name. > + */ > + pr_id = PR_GetUniqueIdentity("PostgreSQL"); Seems like it might not be a bad idea to append Server or something? > + > + /* > + * Create the actual IO layer as a stub such that it can be pushed onto > + * the layer stack. The step via a stub is required as we define custom > + * callbacks. > + */ > + layer = PR_CreateIOLayerStub(pr_id, &pr_iomethods); > + if (!layer) > + { > + ereport(loglevel, > + (errmsg("unable to create NSS I/O layer"))); > + return NULL; > + } Why is this accepting a variable log level? The only caller passes ERROR? > +/* > + * pg_SSLerrmessage > + * Create and return a human readable error message given > + * the specified error code > + * > + * PR_ErrorToName only converts the enum identifier of the error to string, > + * but that can be quite useful for debugging (and in case PR_ErrorToString is > + * unable to render a message then we at least have something). > + */ > +static char * > +pg_SSLerrmessage(PRErrorCode errcode) > +{ > + char error[128]; > + int ret; > + > + /* TODO: this should perhaps use a StringInfo instead.. 
*/ > + ret = pg_snprintf(error, sizeof(error), "%s (%s)", > + PR_ErrorToString(errcode, PR_LANGUAGE_I_DEFAULT), > + PR_ErrorToName(errcode)); > + if (ret) > + return pstrdup(error); > + return pstrdup(_("unknown TLS error")); > +} Why not use psprintf() here? > +++ b/src/include/common/pg_nss.h > @@ -0,0 +1,141 @@ > +/*------------------------------------------------------------------------- > + * > + * pg_nss.h > + * Support for NSS as a TLS backend > + * > + * These definitions are used by both frontend and backend code. > + * > + * Copyright (c) 2020, PostgreSQL Global Development Group > + * > + * IDENTIFICATION > + * src/include/common/pg_nss.h > + * > + *------------------------------------------------------------------------- > + */ > +#ifndef PG_NSS_H > +#define PG_NSS_H > + > +#ifdef USE_NSS > + > +#include <sslproto.h> > + > +PRUint16 pg_find_cipher(char *name); > + > +typedef struct > +{ > + const char *name; > + PRUint16 number; > +} NSSCiphers; > + > +#define INVALID_CIPHER 0xFFFF > + > +/* > + * This list is a partial copy of the ciphers in NSS files lib/ssl/sslproto.h > + * in order to provide a human readable version of the ciphers. It would be > + * nice to not have to have this, but NSS doesn't provide any API addressing > + * the ciphers by name. TODO: do we want more of the ciphers, or perhaps less? > + */ > +static const NSSCiphers NSS_CipherList[] = { > + > + {"TLS_NULL_WITH_NULL_NULL", TLS_NULL_WITH_NULL_NULL}, Hm. Is this whole business of defining array constants in a header just done to avoid having a .c file that needs to be compiled both in frontend and backend code? > +/* > + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with > + * colliding definitions from ours, causing a much expected compiler error. > + * The definitions are however not actually used in NSPR at all, and are only > + * intended for what seems to be backwards compatibility for apps written > + * against old versions of NSPR.
The following comment is in the referenced > + * file, and was added in 1998: > + * > + * This section typedefs the old 'native' types to the new PR<type>s. > + * These definitions are scheduled to be eliminated at the earliest > + * possible time. The NSPR API is implemented and documented using > + * the new definitions. > + * > + * As there is no opt-out from pulling in these typedefs, we define the guard > + * for the file to exclude it. This is incredibly ugly, but seems to be about > + * the only way around it. > + */ There's a lot of duplicated comments here. Could we move either of the files to reference the other for longer ones? > +/* > + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the > + * only way to make NSS use an already set up POSIX file descriptor rather > + * than opening one itself. To quote the NSS documentation: > + * > + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's > + * implementation changes. In practice, this is unlikely to happen because > + * NSPR's implementation has been stable for years and because of NSPR's > + * strong commitment to backward compatibility." > + * > + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket > + * > + * The function is declared in <private/pprio.h>, but as it is a header marked > + * private we declare it here rather than including it. > + */ > +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int); Ugh. This is really the way to do this? How do other applications deal with this problem? > +#if defined(WIN32) > +static const char *ca_trust_name = "nssckbi.dll"; > +#elif defined(__darwin__) > +static const char *ca_trust_name = "libnssckbi.dylib"; > +#else > +static const char *ca_trust_name = "libnssckbi.so"; > +#endif There's really no pre-existing handling for this in nss??? 
> + /* > + * The original design of NSS was for a single application to use a single > + * copy of it, initialized with NSS_Initialize() which isn't returning any > + * handle with which to refer to NSS. NSS initialization and shutdown are > + * global for the application, so a shutdown in another NSS enabled > + * library would cause NSS to be stopped for libpq as well. The fix has > + * been to introduce NSS_InitContext which returns a context handle to > + * pass to NSS_ShutdownContext. NSS_InitContext was introduced in NSS > + * 3.12, but the use of it is not very well documented. > + * https://bugzilla.redhat.com/show_bug.cgi?id=738456 > + * > + * The InitParameters struct passed can be used to override internal > + * values in NSS, but the usage is not documented at all. When using > + * NSS_Init initializations, the values are instead set via PK11_Configure > + * calls so the PK11_Configure documentation can be used to glean some > + * details on these. > + * > + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11/Module_Specs > + > + if (!nss_context) > + { > + char *err = pg_SSLerrmessage(PR_GetError()); > + > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("unable to %s certificate database: %s"), > + conn->cert_database ? "open" : "create", > + err); > + free(err); > + return PGRES_POLLING_FAILED; > + } > + > + /* > + * Configure cipher policy. > + */ > + status = NSS_SetDomesticPolicy(); Why is "domestic" the right thing here? > + > + PK11_SetPasswordFunc(PQssl_passwd_cb); Is it actually OK to do stuff like this when other users of NSS might be present? That's obviously more likely in the libpq case, compared to the backend case (where it's also possible, of course). What prevents us from overriding another user's callback? 
> +ssize_t > +pgtls_read(PGconn *conn, void *ptr, size_t len) > +{ > + PRInt32 nread; > + PRErrorCode status; > + int read_errno = 0; > + > + nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT); > + > + /* > + * PR_Recv blocks until there is data to read or the timeout expires. Zero > + * is returned for closed connections, while -1 indicates an error within > + * the ongoing connection. > + */ > + if (nread == 0) > + { > + read_errno = ECONNRESET; > + return -1; > + } It's a bit confusing to talk about blocking when the socket presumably is in non-blocking mode, and you're also asking to never wait? > + if (nread == -1) > + { > + status = PR_GetError(); > + > + switch (status) > + { > + case PR_WOULD_BLOCK_ERROR: > + read_errno = EINTR; > + break; Uh, isn't this going to cause a busy-loop by the caller? EINTR isn't the same as EAGAIN/EWOULDBLOCK? > + case PR_IO_TIMEOUT_ERROR: > + break; What does this mean? We'll return with a 0 errno here, right? When is this case reachable? E.g. the comment in fe-misc.c: /* pqsecure_read set the error message for us */ for this case doesn't seem to be fulfilled by this. > +/* > + * Verify that the server certificate matches the hostname we connected to. > + * > + * The certificate's Common Name and Subject Alternative Names are considered. > + */ > +int > +pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn, > + int *names_examined, > + char **first_name) > +{ > + return 1; > +} Uh, huh? Certainly doesn't verify anything... > +/* ------------------------------------------------------------ */ > +/* PostgreSQL specific TLS support functions */ > +/* ------------------------------------------------------------ */ > + > +/* > + * TODO: this a 99% copy of the same function in the backend, make these share > + * a single implementation instead. 
> + */ > +static char * > +pg_SSLerrmessage(PRErrorCode errcode) > +{ > + const char *error; > + > + error = PR_ErrorToName(errcode); > + if (error) > + return strdup(error); > + > + return strdup("unknown TLS error"); > +} Btw, why does this need to duplicate strings, instead of returning a const char*? Greetings, Andres Freund
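The EINTR versus EAGAIN/EWOULDBLOCK point in the review above can be sketched as a small mapping function. This is an illustration under stated assumptions, not the patch's code: the `sketch_` enum values are hypothetical stand-ins for the NSPR error codes named in the quoted hunk (the real constants come from NSPR's headers), and treating the timeout case the same as would-block is an assumption, since the review leaves open when that case is reachable.

```c
#include <errno.h>

/* Hypothetical stand-ins for the NSPR error codes named in the review. */
typedef enum
{
    SKETCH_WOULD_BLOCK,         /* stands in for PR_WOULD_BLOCK_ERROR */
    SKETCH_IO_TIMEOUT,          /* stands in for PR_IO_TIMEOUT_ERROR */
    SKETCH_OTHER                /* any other failure */
} sketch_pr_error;

/*
 * Map a read failure to the errno value reported to the caller.
 *
 * "No data available right now" maps to EWOULDBLOCK so that the caller
 * waits for the socket to become readable.  Mapping it to EINTR instead
 * asks the caller to retry immediately, which spins until data arrives.
 */
static int
sketch_errno_for_read_failure(sketch_pr_error err)
{
    switch (err)
    {
        case SKETCH_WOULD_BLOCK:
        case SKETCH_IO_TIMEOUT: /* assumption: treated like would-block */
            return EWOULDBLOCK;
        default:
            /* assumption: everything else is reported as a lost connection */
            return ECONNRESET;
    }
}
```

A real implementation would also need to set an error message for the non-retryable cases, as the fe-misc.c comment quoted in the review expects.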
> On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote: > > Hi, Thanks for your review, much appreciated! > On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote: >> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001 >> From: Daniel Gustafsson <daniel@yesql.se> >> Date: Thu, 8 Oct 2020 18:44:28 +0200 >> Subject: [PATCH] Support for NSS as a TLS backend v12 >> --- >> configure | 223 +++- >> configure.ac | 39 +- >> contrib/Makefile | 2 +- >> contrib/pgcrypto/Makefile | 5 + >> contrib/pgcrypto/nss.c | 773 +++++++++++ >> contrib/pgcrypto/openssl.c | 2 +- >> contrib/pgcrypto/px.c | 1 + >> contrib/pgcrypto/px.h | 1 + > > Personally I'd like to see this patch broken up a bit - it's quite > large. Several of the changes could easily be committed separately, no? Not sure how much of this makes sense committed separately (unless separately means in quick succession), but it could certainly be broken up for the sake of making review easier. I will take a stab at that, but in a follow-up email as I would like the split to be a version just doing the split and not also introducing/fixing things. >> if test "$with_openssl" = yes ; then >> + if test x"$with_nss" = x"yes" ; then >> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) >> + fi > > Based on a quick look there's no similar error check for the msvc > build. Should there be? That's a good question. When embarking on this it seemed quite natural to me that there should be, but now I'm not so sure. Maybe there should be a --with-openssl-preferred like how we handle readline/libedit or just allow multiple and let the last one win? Do you have any input on what would make sense? The only thing I think makes no sense is to allow multiple ones at the same time given the current autoconf switches, even if it would just be to pick say pg_strong_random from one and libpq TLS from another.
>> +if test "$with_nss" = yes ; then >> + if test x"$with_openssl" = x"yes" ; then >> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) >> + fi > > Isn't this a repetition of the earlier check? It is, and if we want to keep such a check it should be broken out into a separate step performed before all library specific checks IMO. >> + CLEANLDFLAGS="$LDFLAGS" >> + # TODO: document this set of LDFLAGS >> + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS" > > Shouldn't this use nss-config or such? Indeed it should, where available. I've added rudimentary support for that without a fallback as of now. >> +if test "$with_nss" = yes ; then >> + AC_CHECK_HEADER(ssl.h, [], [AC_MSG_ERROR([header file <ssl.h> is required for NSS])]) >> + AC_CHECK_HEADER(nss.h, [], [AC_MSG_ERROR([header file <nss.h> is required for NSS])]) >> +fi > > Hm. For me, on debian, these headers are not directly in the default > include search path, but would be as nss/ssl.h. I don't see you adding > nss/ to CFLAGS anywhere? How does this work currently? I had Stockholm-syndromed myself into passing --with-includes and hadn't really realized. Sometimes the obvious is too obvious in a 4000+ LOC patch. > I think it'd also be better if we could include these files as nss/ssl.h > etc - ssl.h is a name way too likely to conflict imo. I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path will still need the direct path to the headers (from autoconf) since nss.h includes NSPR headers as #include <nspr.h> and so on. >> +++ b/src/backend/libpq/be-secure-nss.c >> @@ -0,0 +1,1158 @@ >> +/* >> + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef >> + * our version to avoid compiler warnings on redefinition.
>> + */ >> +#define pg_BITS_PER_BYTE BITS_PER_BYTE >> +#undef BITS_PER_BYTE > > Most compilers/preprocessors don't warn about redefinitions when they > would result in the same value (IIRC we have some cases of redefinitions > in tree even). Does nspr's differ? GCC 8.3 in my Debian installation throws the below warning: In file included from /usr/include/nspr/prtypes.h:26, from /usr/include/nspr/pratom.h:14, from /usr/include/nspr/nspr.h:9, from be-secure-nss.c:45: /usr/include/nspr/prcpucfg.h:1143: warning: "BITS_PER_BYTE" redefined #define BITS_PER_BYTE PR_BITS_PER_BYTE In file included from ../../../src/include/c.h:55, from ../../../src/include/postgres.h:46, from be-secure-nss.c:16: ../../../src/include/pg_config_manual.h:115: note: this is the location of the previous definition #define BITS_PER_BYTE 8 PR_BITS_PER_BYTE is defined per platform in pr/include/md/_<platform>.cfg and is as expected 8. I assume it's that indirection which causes the warning? >> +/* >> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with >> + * colliding definitions from ours, causing a much expected compiler error. >> + * The definitions are however not actually used in NSPR at all, and are only >> + * intended for what seems to be backwards compatibility for apps written >> + * against old versions of NSPR. The following comment is in the referenced >> + * file, and was added in 1998: >> + * >> + * This section typedefs the old 'native' types to the new PR<type>s. >> + * These definitions are scheduled to be eliminated at the earliest >> + * possible time. The NSPR API is implemented and documented using >> + * the new definitions. >> + * >> + * As there is no opt-out from pulling in these typedefs, we define the guard >> + * for the file to exclude it. This is incredibly ugly, but seems to be about >> + * the only way around it. >> + */ >> +#define PROTYPES_H >> +#include <nspr.h> >> +#undef PROTYPES_H > > Yuck :(. That's not an understatement.
Taking another dive into the NSPR code I did however find a proper way to deal with this. Defining NO_NSPR_10_SUPPORT stops NSPR from using the files in obsolete/. So fixed, yay! >> +int >> +be_tls_init(bool isServerStart) >> +{ >> + SECStatus status; >> + SSLVersionRange supported_sslver; >> + >> + /* >> + * Set up the connection cache for multi-processing application behavior. > > Hm. Do we necessarily want that? Session resumption is not exactly > unproblematic... Or does this do something else? From my reading of the docs, and experience with the code, a server application must set up a connection cache in order to accept connections. Not entirely sure, and the docs aren't terribly clear for non SSLv2/v3 environments (it seems to only cache for SSLv2/3 and not TLSv+) but it seems like it may have other uses internally. I will hunt down some more information on the NSS mailing list. >> + * If we are in ServerStart then we initialize the cache. If the server is >> + * already started, we inherit the cache such that it can be used for >> + * connections. Calling SSL_ConfigMPServerSIDCache sets an environment >> + * variable which contains enough information for the forked child to know >> + * how to access it. Passing NULL to SSL_InheritMPServerSIDCache will >> + * make the forked child look it up by the default name SSL_INHERITANCE, >> + * if env vars aren't inherited then the contents of the variable can be >> + * passed instead. >> + */ > > Does this stuff work on windows According to the documentation it does, and Andrew had this working on Windows in an earlier version of the patch. I need to get a proper Windows env for testing/dev up and running as mine has bitrotted to nothingness. > / EXEC_BACKEND? That's a good point, maybe we need to do a SSL_ConfigServerSessionIDCache rather than the MP version for EXEC_BACKEND? Not sure. 
>> + * The below parameters are what the implicit initialization would've done >> + * for us, and should work even for older versions where it might not be >> + * done automatically. The last parameter, maxPTDs, is set to various >> + * values in other codebases, but has been unused since NSPR 2.1 which was >> + * released sometime in 1998. >> + */ >> + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ ); > > https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_Init > says that currently all parameters are ignored? Right, my comment didn't reflect that they're all dead these days, only that one of them has been unused since RUN DMC topped the charts with "It's like that". Comment updated. >> + /* >> + * Import the already opened socket as we don't want to use NSPR functions >> + * for opening the network socket due to how the PostgreSQL protocol works >> + * with TLS connections. This function is not part of the NSPR public API, >> + * see the comment at the top of the file for the rationale of still using >> + * it. >> + */ >> + pr_fd = PR_ImportTCPSocket(port->sock); >> + if (!pr_fd) >> + ereport(ERROR, >> + (errmsg("unable to connect to socket"))); > > I don't see the comment you're referring to? It's referring to the comment discussing PR_ImportTCPSocket being a private API call, yet still used by everyone (which is also discussed later in this review). >> + /* >> + * Most of the documentation available, and implementations of, NSS/NSPR >> + * use the PR_NewTCPSocket() function here, which has the drawback that it >> + * can only create IPv4 sockets. Instead use PR_OpenTCPSocket() which >> + * copes with IPv6 as well. >> + */ >> + model = PR_OpenTCPSocket(port->laddr.addr.ss_family); >> + if (!model) >> + ereport(ERROR, >> + (errmsg("unable to open socket"))); >> + >> + /* >> + * Convert the NSPR socket to an SSL socket. 
Ensuring the success of this >> + * operation is critical as NSS SSL_* functions may return SECSuccess on >> + * the socket even though SSL hasn't been enabled, which introduce a risk >> + * of silent downgrades. >> + */ >> + model = SSL_ImportFD(NULL, model); >> + if (!model) >> + ereport(ERROR, >> + (errmsg("unable to enable TLS on socket"))); > > It's confusing that these functions do not actually reference the socket > via some handle :(. What does opening a socket do here? This specific call converts the socket from a plain NSPR socket to an SSL/TLS capable socket which NSS will work with. This is a required step for "activating" NSS on the socket. >> + /* >> + * Configure the allowed cipher. If there are no user preferred suites, > > *ciphers? Yes, fixed. >> + >> + port->pr_fd = SSL_ImportFD(model, pr_fd); >> + if (!port->pr_fd) >> + ereport(ERROR, >> + (errmsg("unable to initialize"))); >> + >> + PR_Close(model); > > A comment explaining why we first import a NULL into the model, and then > release the model, and import the real fd would be good. I've added a small comment to explain how the model is a configuration template for the actual socket. This part of NSS/NSPR is a bit overcomplicated for how we have connections, it's more geared towards having many open sockets in the same process. >> +ssize_t >> +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor) >> +{ >> + ssize_t n_read; >> + PRErrorCode err; >> + >> + n_read = PR_Read(port->pr_fd, ptr, len); >> + >> + if (n_read < 0) >> + { >> + err = PR_GetError(); > > Yay, more thread global state :(. Sorry about that. >> + /* XXX: This logic seems potentially bogus? */ >> + if (err == PR_WOULD_BLOCK_ERROR) >> + *waitfor = WL_SOCKET_READABLE; >> + else >> + *waitfor = WL_SOCKET_WRITEABLE; > > Don't we need to handle failed connections somewhere here? secure_read() > won't know about PR_GetError() etc? How would SSL errors be signalled > upwards here? 
> > Also, as you XXX, it's not clear to me that your mapping would always > result in waiting for the right event? A tls write could e.g. very well > require receiving data etc? Fixed, but there might be more to be done here. >> + /* >> + * At least one byte with password content was returned, and NSS requires >> + * that we return it allocated in NSS controlled memory. If we fail to >> + * allocate then abort without passing back NULL and bubble up the error >> + * on the PG side. >> + */ >> + password = (char *) PR_Malloc(len + 1); >> + if (!password) >> + ereport(ERROR, >> + (errcode(ERRCODE_OUT_OF_MEMORY), >> + errmsg("out of memory"))); >> >> + strlcpy(password, buf, sizeof(password)); >> + explicit_bzero(buf, sizeof(buf)); >> + > > In case of error you're not bzero'ing out the password! Fixed. > Separately, I wonder if we should introduce a function for throwing OOM > errors - which then e.g. could print the memory context stats in those > places too... +1. I'd be happy to review such a patch. >> +static SECStatus >> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) >> +{ >> + SECStatus status; >> + Port *port = (Port *) arg; >> + CERTCertificate *cert; >> + char *peer_cn; >> + int len; >> + >> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); >> + if (status == SECSuccess) >> + { >> + cert = SSL_PeerCertificate(port->pr_fd); >> + len = strlen(cert->subjectName); >> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); >> + if (strncmp(cert->subjectName, "CN=", 3) == 0) >> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); >> + else >> + strlcpy(peer_cn, cert->subjectName, len + 1); >> + CERT_DestroyCertificate(cert); >> + >> + port->peer_cn = peer_cn; >> + port->peer_cert_valid = true; > > Hm. We either should have something similar to > > /* > * Reject embedded NULLs in certificate common name to prevent > * attacks like CVE-2009-4034. 
> */ > if (len != strlen(peer_cn)) > { > ereport(COMMERROR, > (errcode(ERRCODE_PROTOCOL_VIOLATION), > errmsg("SSL certificate's common name contains embedded null"))); > pfree(peer_cn); > return -1; > } > here, or a comment explaining why not. We should, but it's proving rather difficult as there is no equivalent API call to get the string as well as the expected length of it. > Also, what's up with the CN= bit? Why is that needed here, but not for > openssl? OpenSSL returns only the value portion, whereas NSS returns key=value so we need to skip over the key= part. >> +static PRFileDesc * >> +init_iolayer(Port *port, int loglevel) >> +{ >> + const PRIOMethods *default_methods; >> + PRFileDesc *layer; >> + >> + /* >> + * Start by initializing our layer with all the default methods so that we >> + * can selectively override the ones we want while still ensuring that we >> + * have a complete layer specification. >> + */ >> + default_methods = PR_GetDefaultIOMethods(); >> + memcpy(&pr_iomethods, default_methods, sizeof(PRIOMethods)); >> + >> + pr_iomethods.recv = pg_ssl_read; >> + pr_iomethods.send = pg_ssl_write; >> + >> + /* >> + * Each IO layer must be identified by a unique name, where uniqueness is >> + * per connection. Each connection in a postgres cluster can generate the >> + * identity from the same string as they will create their IO layers on >> + * different sockets. Only one layer per socket can have the same name. >> + */ >> + pr_id = PR_GetUniqueIdentity("PostgreSQL"); > > Seems like it might not be a bad idea to append Server or something? Fixed. >> + /* >> + * Create the actual IO layer as a stub such that it can be pushed onto >> + * the layer stack. The step via a stub is required as we define custom >> + * callbacks. 
>> + */ >> + layer = PR_CreateIOLayerStub(pr_id, &pr_iomethods); >> + if (!layer) >> + { >> + ereport(loglevel, >> + (errmsg("unable to create NSS I/O layer"))); >> + return NULL; >> + } > > Why is this accepting a variable log level? The only caller passes ERROR? Good catch, that's a leftover from a previous version which no longer makes sense. loglevel param removed. >> +/* >> + * pg_SSLerrmessage >> + * Create and return a human readable error message given >> + * the specified error code >> + * >> + * PR_ErrorToName only converts the enum identifier of the error to string, >> + * but that can be quite useful for debugging (and in case PR_ErrorToString is >> + * unable to render a message then we at least have something). >> + */ >> +static char * >> +pg_SSLerrmessage(PRErrorCode errcode) >> +{ >> + char error[128]; >> + int ret; >> + >> + /* TODO: this should perhaps use a StringInfo instead.. */ >> + ret = pg_snprintf(error, sizeof(error), "%s (%s)", >> + PR_ErrorToString(errcode, PR_LANGUAGE_I_DEFAULT), >> + PR_ErrorToName(errcode)); >> + if (ret) >> + return pstrdup(error); > >> + return pstrdup(_("unknown TLS error")); >> +} > > Why not use psrintf() here? That's a good question to which I don't have a good answer. Changed to doing just that. >> +++ b/src/include/common/pg_nss.h >> @@ -0,0 +1,141 @@ >> +/*------------------------------------------------------------------------- >> + * >> + * pg_nss.h >> + * Support for NSS as a TLS backend >> + * >> + * These definitions are used by both frontend and backend code.
>> + * >> + * Copyright (c) 2020, PostgreSQL Global Development Group >> + * >> + * IDENTIFICATION >> + * src/include/common/pg_nss.h >> + * >> + *------------------------------------------------------------------------- >> + */ >> +#ifndef PG_NSS_H >> +#define PG_NSS_H >> + >> +#ifdef USE_NSS >> + >> +#include <sslproto.h> >> + >> +PRUint16 pg_find_cipher(char *name); >> + >> +typedef struct >> +{ >> + const char *name; >> + PRUint16 number; >> +} NSSCiphers; >> + >> +#define INVALID_CIPHER 0xFFFF >> + >> +/* >> + * This list is a partial copy of the ciphers in NSS files lib/ssl/sslproto.h >> + * in order to provide a human readable version of the ciphers. It would be >> + * nice to not have to have this, but NSS doesn't provide any API addressing >> + * the ciphers by name. TODO: do we want more of the ciphers, or perhaps less? >> + */ >> +static const NSSCiphers NSS_CipherList[] = { >> + >> + {"TLS_NULL_WITH_NULL_NULL", TLS_NULL_WITH_NULL_NULL}, > > Hm. Is this whole business of defining array constants in a header just > done to avoid having a .c file that needs to be compiled both in > frontend and backend code? That was the original motivation, but I guess I should just bite the bullet and make it a .c compiled in both frontend and backend? >> +/* >> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with >> + * colliding definitions from ours, causing a much expected compiler error. >> + * The definitions are however not actually used in NSPR at all, and are only >> + * intended for what seems to be backwards compatibility for apps written >> + * against old versions of NSPR. The following comment is in the referenced >> + * file, and was added in 1998: >> + * >> + * This section typedefs the old 'native' types to the new PR<type>s. >> + * These definitions are scheduled to be eliminated at the earliest >> + * possible time. The NSPR API is implemented and documented using >> + * the new definitions.
>> + * >> + * As there is no opt-out from pulling in these typedefs, we define the guard >> + * for the file to exclude it. This is incredibly ugly, but seems to be about >> + * the only way around it. >> + */ > > There's a lot of duplicated comments here. Could we move either of the > files to reference the other for longer ones? I took a stab at this in the attached version. The code is perhaps over-commented in parts but I tried to encode my understanding of NSS into the comments where documentation is lacking, since I assume I'm not the only one who is new to NSS. There might be a need to pare back to keep it focused in case this patch goes further. >> +/* >> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the >> + * only way to make NSS use an already set up POSIX file descriptor rather >> + * than opening one itself. To quote the NSS documentation: >> + * >> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's >> + * implementation changes. In practice, this is unlikely to happen because >> + * NSPR's implementation has been stable for years and because of NSPR's >> + * strong commitment to backward compatibility." >> + * >> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket >> + * >> + * The function is declared in <private/pprio.h>, but as it is a header marked >> + * private we declare it here rather than including it. >> + */ >> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int); > > Ugh. This is really the way to do this? How do other applications deal > with this problem? They either #include <private/pprio.h> or they do it like this (or vendor NSPR which makes calling private APIs less problematic). It sure is ugly, but there is no alternative to using this function.
>> +#if defined(WIN32) >> +static const char *ca_trust_name = "nssckbi.dll"; >> +#elif defined(__darwin__) >> +static const char *ca_trust_name = "libnssckbi.dylib"; >> +#else >> +static const char *ca_trust_name = "libnssckbi.so"; >> +#endif > > There's really no pre-existing handling for this in nss??? NSS_Init does have more or less the above logic (see snippet below), but only when there is a cert database defined. /* * The following code is an attempt to automagically find the external root * module. * Note: Keep the #if-defined chunks in order. HPUX must select before UNIX. */ static const char *dllname = #if defined(XP_WIN32) || defined(XP_OS2) "nssckbi.dll"; #elif defined(HPUX) && !defined(__ia64) /* HP-UX PA-RISC */ "libnssckbi.sl"; #elif defined(DARWIN) "libnssckbi.dylib"; #elif defined(XP_UNIX) || defined(XP_BEOS) "libnssckbi.so"; #else #error "Uh! Oh! I don't know about this platform." #endif In the NSS_INIT_NOCERTDB case there is no such handling of the libname provided by NSS so we need to do that ourselves. >> + /* >> + * The original design of NSS was for a single application to use a single >> + * copy of it, initialized with NSS_Initialize() which isn't returning any >> + * handle with which to refer to NSS. NSS initialization and shutdown are >> + * global for the application, so a shutdown in another NSS enabled >> + * library would cause NSS to be stopped for libpq as well. The fix has >> + * been to introduce NSS_InitContext which returns a context handle to >> + * pass to NSS_ShutdownContext. NSS_InitContext was introduced in NSS >> + * 3.12, but the use of it is not very well documented. >> + * https://bugzilla.redhat.com/show_bug.cgi?id=738456 >> + * >> + * The InitParameters struct passed can be used to override internal >> + * values in NSS, but the usage is not documented at all. 
When using >> + * NSS_Init initializations, the values are instead set via PK11_Configure >> + * calls so the PK11_Configure documentation can be used to glean some >> + * details on these. >> + * >> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11/Module_Specs > >> + >> + if (!nss_context) >> + { >> + char *err = pg_SSLerrmessage(PR_GetError()); >> + >> + printfPQExpBuffer(&conn->errorMessage, >> + libpq_gettext("unable to %s certificate database: %s"), >> + conn->cert_database ? "open" : "create", >> + err); >> + free(err); >> + return PGRES_POLLING_FAILED; >> + } >> + >> + /* >> + * Configure cipher policy. >> + */ >> + status = NSS_SetDomesticPolicy(); > > Why is "domestic" the right thing here? Historically there are three cipher policies in NSS: Domestic, Export and France. These would enable a set of ciphers based on US export restrictions (domestic/export) or French import restrictions. All ciphers would start disabled and then the ciphers belonging to the chosen set would be enabled. Long ago, that was however removed and they now all get enabled by calling any of these three functions. NSS_SetDomesticPolicy enables all implemented ciphers, and the other calls just call NSS_SetDomesticPolicy; I guess that API was kept for backwards compatibility. The below bugzilla entry has a bit more information on this: https://bugzilla.mozilla.org/show_bug.cgi?id=848384 That being said, the comment in the code did not reflect that, so I've reworded it hoping it will be clearer now. >> + >> + PK11_SetPasswordFunc(PQssl_passwd_cb); > > Is it actually OK to do stuff like this when other users of NSS might be > present? That's obviously more likely in the libpq case, compared to the > backend case (where it's also possible, of course). What prevents us > from overriding another user's callback? The password callback pointer is stored in a static variable in NSS (in the file lib/pk11wrap/pk11auth.c).
>> +ssize_t >> +pgtls_read(PGconn *conn, void *ptr, size_t len) >> +{ >> + PRInt32 nread; >> + PRErrorCode status; >> + int read_errno = 0; >> + >> + nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT); >> + >> + /* >> + * PR_Recv blocks until there is data to read or the timeout expires. Zero >> + * is returned for closed connections, while -1 indicates an error within >> + * the ongoing connection. >> + */ >> + if (nread == 0) >> + { >> + read_errno = ECONNRESET; >> + return -1; >> + } > > It's a bit confusing to talk about blocking when the socket presumably > is in non-blocking mode, and you're also asking to never wait? Fair enough, I can agree that the wording isn't spot on. The socket is non-blocking while PR_Recv can block (which is what we ask it not to). I've reworded and moved the comment around to hopefully make it clearer. >> + if (nread == -1) >> + { >> + status = PR_GetError(); >> + >> + switch (status) >> + { >> + case PR_WOULD_BLOCK_ERROR: >> + read_errno = EINTR; >> + break; > > Uh, isn't this going to cause a busy-loop by the caller? EINTR isn't the > same as EAGAIN/EWOULDBLOCK? Right, that's clearly not right. >> + case PR_IO_TIMEOUT_ERROR: >> + break; > > What does this mean? We'll return with a 0 errno here, right? When is > this case reachable? It should, AFAICT, only be reachable when PR_Recv is used with a timeout which we don't do. It was mentioned somewhere that it had happened in no-wait calls due to a bug, but I fail to find that reference now. Either way, I've removed it to fall into the default error handling which now sets errno correctly as that was a paddle short here. > E.g. the comment in fe-misc.c: > /* pqsecure_read set the error message for us */ > for this case doesn't seem to be fulfilled by this. Fixed, I hope. >> +/* >> + * Verify that the server certificate matches the hostname we connected to. >> + * >> + * The certificate's Common Name and Subject Alternative Names are considered.
>> + */ >> +int >> +pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn, >> + int *names_examined, >> + char **first_name) >> +{ >> + return 1; >> +} > > Uh, huh? Certainly doesn't verify anything... Doh, the verification was done as part of the cert validation callback and I had missed moving it to the stub. Fixed and also expanded to closer match how it's done in the OpenSSL implementation. >> +/* ------------------------------------------------------------ */ >> +/* PostgreSQL specific TLS support functions */ >> +/* ------------------------------------------------------------ */ >> + >> +/* >> + * TODO: this a 99% copy of the same function in the backend, make these share >> + * a single implementation instead. >> + */ >> +static char * >> +pg_SSLerrmessage(PRErrorCode errcode) >> +{ >> + const char *error; >> + >> + error = PR_ErrorToName(errcode); >> + if (error) >> + return strdup(error); >> + >> + return strdup("unknown TLS error"); >> +} > > Btw, why does this need to duplicate strings, instead of returning a > const char*? No, it doesn't, and no longer does. The attached includes fixes for the above mentioned issues (and a few small other ones I stumbled across), hopefully without introducing too many new ones. As mentioned, I'll perform the split into multiple patches in a separate version which only performs a split to make it easier to diff the individual patchfile versions. cheers ./daniel
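For anyone following the errno discussion around pgtls_read above, here is a minimal sketch of the kind of mapping being discussed. It deliberately uses stand-in error codes so it compiles without the NSPR headers: sketch_io_error and sketch_set_errno are hypothetical names, not part of the patch or of NSPR. Mapping the would-block case to EWOULDBLOCK follows the review comment; the EIO default is purely an assumption for illustration and may not be what the patch ends up doing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NSPR error codes discussed above. */
typedef enum
{
	SKETCH_WOULD_BLOCK,		/* stands in for PR_WOULD_BLOCK_ERROR */
	SKETCH_CONN_RESET,		/* stands in for a reset/closed connection */
	SKETCH_OTHER			/* anything else */
} sketch_io_error;

/*
 * Translate a stand-in I/O error into an errno value for the caller.
 * A would-block condition must not become EINTR, since the caller
 * would retry immediately and busy-loop instead of waiting on the socket.
 */
static void
sketch_set_errno(sketch_io_error err, int *read_errno)
{
	assert(read_errno != NULL);

	switch (err)
	{
		case SKETCH_WOULD_BLOCK:
			*read_errno = EWOULDBLOCK;
			break;
		case SKETCH_CONN_RESET:
			*read_errno = ECONNRESET;
			break;
		default:
			/* assumption: a generic I/O error for unmapped codes */
			*read_errno = EIO;
			break;
	}
}
```

The point of the default branch is only that no path returns with errno left at 0.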
On 27/10/2020 22:07, Daniel Gustafsson wrote: > /* > * Track whether the NSS database has a password set or not. There is no API > * function for retrieving password status, so we simply flip this to true in > * case NSS invoked the password callback - as that will only happen in case > * there is a password. The reason for tracking this is that there are calls > * which require a password parameter, but doesn't use the callbacks provided, > * so we must call the callback on behalf of these. > */ > static bool has_password = false; This is set in PQssl_passwd_cb function, but never reset. That seems wrong. The NSS database used in one connection might have a password, while another one might not. Or have I completely misunderstood this? - Heikki
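To make the concern concrete, here is a minimal sketch of tracking the flag per connection instead of in a file-level static. It uses stand-in types only: sketch_conn, sketch_conn_init and sketch_passwd_cb are hypothetical names, not part of the patch or of NSS, and it assumes the password callback can be handed a per-connection pointer as its argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state; not the real PGconn. */
typedef struct sketch_conn
{
	bool		has_password;	/* did the callback fire for this connection? */
	const char *password;		/* password to hand back, or NULL */
} sketch_conn;

/* Reset the flag whenever a connection is set up, so state never leaks. */
static void
sketch_conn_init(sketch_conn *conn, const char *password)
{
	assert(conn != NULL);
	conn->has_password = false;
	conn->password = password;
}

/*
 * Stand-in for the password callback. It records on the connection it was
 * invoked for that a password was requested, rather than flipping a global.
 */
static const char *
sketch_passwd_cb(void *arg)
{
	sketch_conn *conn = (sketch_conn *) arg;

	assert(conn != NULL);
	conn->has_password = true;
	return conn->password;
}
```

With this shape, a second connection against a database without a password keeps has_password false even after the callback has fired for the first one.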
Hi, On 2020-10-27 21:07:01 +0100, Daniel Gustafsson wrote: > > On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote: > >> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001 > >> From: Daniel Gustafsson <daniel@yesql.se> > >> Date: Thu, 8 Oct 2020 18:44:28 +0200 > >> Subject: [PATCH] Support for NSS as a TLS backend v12 > >> --- > >> configure | 223 +++- > >> configure.ac | 39 +- > >> contrib/Makefile | 2 +- > >> contrib/pgcrypto/Makefile | 5 + > >> contrib/pgcrypto/nss.c | 773 +++++++++++ > >> contrib/pgcrypto/openssl.c | 2 +- > >> contrib/pgcrypto/px.c | 1 + > >> contrib/pgcrypto/px.h | 1 + > > > > Personally I'd like to see this patch broken up a bit - it's quite > > large. Several of the changes could easily be committed separately, no? > > Not sure how much of this makes sense committed separately (unless separately > means in quick succession), but it could certainly be broken up for the sake of > making review easier. Committing e.g. the pgcrypto pieces separately from the backend code seems unproblematic. But yes, I would expect them to go in close to each other. I'm mainly concerned with smaller review-able units. Have you done testing to ensure that NSS PG cooperates correctly with openssl PG? Is there a way we can make that easier to do? E.g. allowing to build frontend with NSS and backend with openssl and vice versa? > >> if test "$with_openssl" = yes ; then > >> + if test x"$with_nss" = x"yes" ; then > >> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) > >> + fi > > > > Based on a quick look there's no similar error check for the msvc > > build. Should there be? > > Thats a good question. When embarking on this is seemed quite natural to me > that it should be, but now I'm not so sure. Maybe there should be a > --with-openssl-preferred like how we handle readline/libedit or just allow > multiple and let the last one win? Do you have any input on what would make > sense? 
> > The only thing I think makes no sense is to allow multiple ones at the same > time given the current autoconf switches, even if it would just be to pick say > pg_strong_random from one and libpq TLS from another. Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing to check for errors. Even better, of course, would be to allow switching of the SSL backend based on config options (PGC_POSTMASTER GUC for backend, connection string for frontend). Mainly because that would make testing of interoperability so much easier. Obviously still a few places like pgcrypto, randomness, etc, where only a compile time decision seems to make sense. > >> + CLEANLDFLAGS="$LDFLAGS" > >> + # TODO: document this set of LDFLAGS > >> + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS" > > > > Shouldn't this use nss-config or such? > > Indeed it should, where available. I've added rudimentary support for that > without a fallback as of now. When would we need a fallback? > > I think it'd also be better if we could include these files as nss/ssl.h > > etc - ssl.h is a name way too likely to conflict imo. > > I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path > will still need the direct path to the headers (from autoconf) since nss.h > includes NSPR headers as #include <nspr.h> and so on. Hm. Then it's probably not worth going there... 
> >> +static SECStatus > >> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) > >> +{ > >> + SECStatus status; > >> + Port *port = (Port *) arg; > >> + CERTCertificate *cert; > >> + char *peer_cn; > >> + int len; > >> + > >> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); > >> + if (status == SECSuccess) > >> + { > >> + cert = SSL_PeerCertificate(port->pr_fd); > >> + len = strlen(cert->subjectName); > >> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); > >> + if (strncmp(cert->subjectName, "CN=", 3) == 0) > >> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); > >> + else > >> + strlcpy(peer_cn, cert->subjectName, len + 1); > >> + CERT_DestroyCertificate(cert); > >> + > >> + port->peer_cn = peer_cn; > >> + port->peer_cert_valid = true; > > > > Hm. We either should have something similar to > > > > /* > > * Reject embedded NULLs in certificate common name to prevent > > * attacks like CVE-2009-4034. > > */ > > if (len != strlen(peer_cn)) > > { > > ereport(COMMERROR, > > (errcode(ERRCODE_PROTOCOL_VIOLATION), > > errmsg("SSL certificate's common name contains embedded null"))); > > pfree(peer_cn); > > return -1; > > } > > here, or a comment explaining why not. > > We should, but it's proving rather difficult as there is no equivalent API call > to get the string as well as the expected length of it. Hm. Should at least have a test to ensure that's not a problem then. I hope/assume NSS rejects this somewhere internally... > > Also, what's up with the CN= bit? Why is that needed here, but not for > > openssl? > > OpenSSL returns only the value portion, whereas NSS returns key=value so we > need to skip over the key= part. Why is it a conditional path though? > >> +/* > >> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the > >> + * only way to make NSS use an already set up POSIX file descriptor rather > >> + * than opening one itself. 
To quote the NSS documentation: > >> + * > >> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's > >> + * implementation changes. In practice, this is unlikely to happen because > >> + * NSPR's implementation has been stable for years and because of NSPR's > >> + * strong commitment to backward compatibility." > >> + * > >> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket > >> + * > >> + * The function is declared in <private/pprio.h>, but as it is a header marked > >> + * private we declare it here rather than including it. > >> + */ > >> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int); > > > > Ugh. This is really the way to do this? How do other applications deal > > with this problem? > > They either #include <private/pprio.h> or they do it like this (or vendor NSPR > which makes calling private APIs less problematic). It sure is ugly, but there > is no alternative to using this function. Hm - in debian unstable's NSS this function appears to be in nss/ssl.h, not pprio.h: /* ** Imports fd into SSL, returning a new socket. Copies SSL configuration ** from model. */ SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd); and ssl.h starts with: /* * This file contains prototypes for the public SSL functions. > >> + > >> + PK11_SetPasswordFunc(PQssl_passwd_cb); > > > > Is it actually OK to do stuff like this when other users of NSS might be > > present? That's obviously more likely in the libpq case, compared to the > > backend case (where it's also possible, of course). What prevents us > > from overriding another user's callback? > > The password callback pointer is stored in a static variable in NSS (in the > file lib/pk11wrap/pk11auth.c). But, uh, how is that not a problem? What happens if a backend imports libpq? What if plpython imports curl which then also uses nss? > + /* > + * Finally we must configure the socket for being a server by setting the > + * certificate and key. 
> + */ > + status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa); > + if (status != SECSuccess) > + ereport(ERROR, > + (errmsg("unable to configure secure server: %s", > + pg_SSLerrmessage(PR_GetError())))); > + status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0); > + if (status != SECSuccess) > + ereport(ERROR, > + (errmsg("unable to configure server for TLS server connections: %s", > + pg_SSLerrmessage(PR_GetError())))); Why do both of these need to get called? The NSS docs say: /* ** Deprecated variant of SSL_ConfigServerCert. ** ... SSL_IMPORT SECStatus SSL_ConfigSecureServer( PRFileDesc *fd, CERTCertificate *cert, SECKEYPrivateKey *key, SSLKEAType kea); Greetings, Andres Freund
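As an illustration of the embedded-NUL check and the CN= handling discussed above, here is a small sketch in plain C. It assumes a length for the subject string is available from somewhere, which is exactly the part reported as hard to get from NSS, so treat reported_len as a hypothetical input. sketch_extract_cn is a made-up helper, not part of the patch or of NSS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value part of a subject such as
 * "CN=example.org" into dst. Returns 0 on success, -1 if the subject
 * contains an embedded NUL or dst is too small.
 */
static int
sketch_extract_cn(const char *subject, size_t reported_len,
				  char *dst, size_t dstlen)
{
	const char *value = subject;
	size_t		value_len;

	assert(subject != NULL && dst != NULL);

	/* strlen() stops at the first NUL; a mismatch means an embedded NUL */
	if (strlen(subject) != reported_len)
		return -1;

	/* skip the "CN=" key if present, otherwise use the string as-is */
	if (strncmp(subject, "CN=", 3) == 0)
		value += 3;

	value_len = strlen(value);
	if (value_len + 1 > dstlen)
		return -1;

	memcpy(dst, value, value_len + 1);	/* includes the terminating NUL */
	return 0;
}
```

The conditional prefix skip mirrors the quoted code; whether the unprefixed branch can actually be reached is the open question raised in the review.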
>>> Personally I'd like to see this patch broken up a bit - it's quite >>> large. Several of the changes could easily be committed separately, no? >> >> Not sure how much of this makes sense committed separately (unless separately >> means in quick succession), but it could certainly be broken up for the sake of >> making review easier. > > Committing e.g. the pgcrypto pieces separately from the backend code > seems unproblematic. But yes, I would expect them to go in close to each > other. I'm mainly concerned with smaller review-able units. Attached is a v14 where the logical units are separated into individual commits. I hope this split makes it easier to read. The 0006 commit contains things not really related to NSS at all that can be submitted to -hackers independently of this work, but they're still there since this version wasn't supposed to change anything. Most of the changes to sslinfo in 0005 are really only needed in case OpenSSL isn't the only TLS library, but I would argue that they should be considered regardless. There we are still accessing the ->ssl member directly and passing it to OpenSSL rather than using the be_tls_* API that we have. I can extract that portion as a separate patch submission unless there are objections. cheers ./daniel
> On 28 Oct 2020, at 07:39, Andres Freund <andres@anarazel.de> wrote:

> Have you done testing to ensure that NSS PG cooperates correctly with
> openssl PG? Is there a way we can make that easier to do? E.g. allowing
> to build frontend with NSS and backend with openssl and vice versa?

When I wrote the Secure Transport patch I had a patch against PostgresNode
which allowed for overriding the server binaries like so:

SSLTEST_SERVER_BIN=/path/bin/ make -C src/test/ssl/ check

I've used that coupled with manual testing so far to make sure that an openssl
client can talk to an NSS backend and so on. Before any other backend is added
we clearly need *a* way of doing this, one which no doubt will need to be
improved upon to suit more workflows.

This is sort of the same situation as pg_upgrade, where two trees are needed to
really test it.

I can clean that patch up and post as a starting point for discussions.

>>>> if test "$with_openssl" = yes ; then
>>>> + if test x"$with_nss" = x"yes" ; then
>>>> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
>>>> + fi
>>>
>>> Based on a quick look there's no similar error check for the msvc
>>> build. Should there be?
>>
>> Thats a good question. When embarking on this is seemed quite natural to me
>> that it should be, but now I'm not so sure. Maybe there should be a
>> --with-openssl-preferred like how we handle readline/libedit or just allow
>> multiple and let the last one win? Do you have any input on what would make
>> sense?
>>
>> The only thing I think makes no sense is to allow multiple ones at the same
>> time given the current autoconf switches, even if it would just be to pick say
>> pg_strong_random from one and libpq TLS from another.
>
> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing
> to check for errors.

That's another option, with --with-openssl being an alias for --with-ssl=openssl.

After another round of thinking I like this even better as it makes the build
infra cleaner, so the attached patch has this implemented.

> Even better, of course, would be to allow switching of the SSL backend
> based on config options (PGC_POSTMASTER GUC for backend, connection
> string for frontend). Mainly because that would make testing of
> interoperability so much easier. Obviously still a few places like
> pgcrypto, randomness, etc, where only a compile time decision seems to
> make sense.

It would make testing easier, but the expense seems potentially rather high.
How would a GUC switch be allowed to operate, would we have mixed backends or
would we require all openssl connections to be dropped before serving nss ones?

>>>> + CLEANLDFLAGS="$LDFLAGS"
>>>> + # TODO: document this set of LDFLAGS
>>>> + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"
>>>
>>> Shouldn't this use nss-config or such?
>>
>> Indeed it should, where available. I've added rudimentary support for that
>> without a fallback as of now.
>
> When would we need a fallback?

On one of my boxes I have NSS/NSPR installed via homebrew and they don't ship
an nss-config AFAICT. I wouldn't be surprised if there are other cases.

>>> I think it'd also be better if we could include these files as nss/ssl.h
>>> etc - ssl.h is a name way too likely to conflict imo.
>>
>> I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path
>> will still need the direct path to the headers (from autoconf) since nss.h
>> includes NSPR headers as #include <nspr.h> and so on.
>
> Hm. Then it's probably not worth going there...

It does however make visual parsing of the source files easier since it's clear
which ssl.h is being referred to. I'm in favor of keeping it.

>>>> +static SECStatus >>>> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) >>>> +{ >>>> + SECStatus status; >>>> + Port *port = (Port *) arg; >>>> + CERTCertificate *cert; >>>> + char *peer_cn; >>>> + int len; >>>> + >>>> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); >>>> + if (status == SECSuccess) >>>> + { >>>> + cert = SSL_PeerCertificate(port->pr_fd); >>>> + len = strlen(cert->subjectName); >>>> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); >>>> + if (strncmp(cert->subjectName, "CN=", 3) == 0) >>>> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); >>>> + else >>>> + strlcpy(peer_cn, cert->subjectName, len + 1); >>>> + CERT_DestroyCertificate(cert); >>>> + >>>> + port->peer_cn = peer_cn; >>>> + port->peer_cert_valid = true; >>> >>> Hm. We either should have something similar to >>> >>> /* >>> * Reject embedded NULLs in certificate common name to prevent >>> * attacks like CVE-2009-4034. >>> */ >>> if (len != strlen(peer_cn)) >>> { >>> ereport(COMMERROR, >>> (errcode(ERRCODE_PROTOCOL_VIOLATION), >>> errmsg("SSL certificate's common name contains embedded null"))); >>> pfree(peer_cn); >>> return -1; >>> } >>> here, or a comment explaining why not. >> >> We should, but it's proving rather difficult as there is no equivalent API call >> to get the string as well as the expected length of it. > > Hm. Should at least have a test to ensure that's not a problem then. I > hope/assume NSS rejects this somewhere internally... Agreed, I'll try to hack up a testcase. >>> Also, what's up with the CN= bit? Why is that needed here, but not for >>> openssl? >> >> OpenSSL returns only the value portion, whereas NSS returns key=value so we >> need to skip over the key= part. > > Why is it a conditional path though? It was mostly just a belts-and-suspenders thing, I don't have any hard evidence that it's been a thing in any modern NSS version so it can be removed. 
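
For illustration, a minimal sketch of the two points discussed above: rejecting an embedded NUL in the common name and skipping the "CN=" key. It assumes the subject is available as raw bytes plus a length reported by the TLS library, which is exactly the part that is hard to get out of NSS. extract_peer_cn is a hypothetical name, and plain malloc stands in for the backend's memory contexts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: copy the common name out of a
 * subject handed over as (data, len), where len is the length reported by the
 * TLS library rather than strlen().
 *
 * Returns a malloc'd NUL-terminated string, or NULL if the data contains an
 * embedded NUL byte (the CVE-2009-4034 pattern) or allocation fails.
 */
char *
extract_peer_cn(const char *data, size_t len)
{
	static const char prefix[] = "CN=";
	const size_t prefix_len = sizeof(prefix) - 1;
	char	   *cn;

	assert(data != NULL);

	/* An embedded NUL makes the C string shorter than the reported length. */
	if (memchr(data, '\0', len) != NULL)
		return NULL;

	/* key=value style subject: skip the key when it is present. */
	if (len >= prefix_len && memcmp(data, prefix, prefix_len) == 0)
	{
		data += prefix_len;
		len -= prefix_len;
	}

	cn = malloc(len + 1);
	if (cn == NULL)
		return NULL;
	memcpy(cn, data, len);
	cn[len] = '\0';
	return cn;
}
```

Checking with memchr over the reported length, rather than comparing strlen results after the copy, keeps the rejection independent of how the copy was made. Without a trustworthy length from the library the check cannot be written this way, which is why a regression test was suggested instead.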
>>>> +/*
>>>> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
>>>> + * only way to make NSS use an already set up POSIX file descriptor rather
>>>> + * than opening one itself. To quote the NSS documentation:
>>>> + *
>>>> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
>>>> + * implementation changes. In practice, this is unlikely to happen because
>>>> + * NSPR's implementation has been stable for years and because of NSPR's
>>>> + * strong commitment to backward compatibility."
>>>> + *
>>>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
>>>> + *
>>>> + * The function is declared in <private/pprio.h>, but as it is a header marked
>>>> + * private we declare it here rather than including it.
>>>> + */
>>>> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);
>>>
>>> Ugh. This is really the way to do this? How do other applications deal
>>> with this problem?
>>
>> They either #include <private/pprio.h> or they do it like this (or vendor NSPR
>> which makes calling private APIs less problematic). It sure is ugly, but there
>> is no alternative to using this function.
>
> Hm - in debian unstable's NSS this function appears to be in nss/ssl.h,
> not pprio.h:
>
> /*
> ** Imports fd into SSL, returning a new socket. Copies SSL configuration
> ** from model.
> */
> SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd);
>
> and ssl.h starts with:
> /*
> * This file contains prototypes for the public SSL functions.

Right, but that's Import*FD*, not Import*TCPSocket*. We use ImportFD as well
since it's the API for importing an NSPR socket into NSS and enabling SSL/TLS
on it. That's been a public API for a long time. ImportTCPSocket is used to
import an already opened socket into NSPR, else NSPR must open the socket
itself. That part has been kept private for reasons unknown, as it's
incredibly useful.

>>>> + PK11_SetPasswordFunc(PQssl_passwd_cb); >>> >>> Is it actually OK to do stuff like this when other users of NSS might be >>> present? That's obviously more likely in the libpq case, compared to the >>> backend case (where it's also possible, of course). What prevents us >>> from overriding another user's callback? >> >> The password callback pointer is stored in a static variable in NSS (in the >> file lib/pk11wrap/pk11auth.c). > > But, uh, how is that not a problem? What happens if a backend imports > libpq? What if plpython imports curl which then also uses nss? Sorry, that sentence wasn't really finished. What I meant to write was that I don't really have good answers here. The available implementation is via the static var, and there are no alternative APIs. I've tried googling for insights but haven't come across any. The only datapoint I have is that I can't recall there ever being a complaint against libcurl doing this exact thing. That of course doesn't mean it cannot happen or cause problems. >> + /* >> + * Finally we must configure the socket for being a server by setting the >> + * certificate and key. >> + */ >> + status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa); >> + if (status != SECSuccess) >> + ereport(ERROR, >> + (errmsg("unable to configure secure server: %s", >> + pg_SSLerrmessage(PR_GetError())))); >> + status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0); >> + if (status != SECSuccess) >> + ereport(ERROR, >> + (errmsg("unable to configure server for TLS server connections: %s", >> + pg_SSLerrmessage(PR_GetError())))); > > Why do both of these need to get called? The NSS docs say: > > /* > ** Deprecated variant of SSL_ConfigServerCert. > ** > ... 
> SSL_IMPORT SECStatus SSL_ConfigSecureServer(
> PRFileDesc *fd, CERTCertificate *cert,
> SECKEYPrivateKey *key, SSLKEAType kea);

They don't, I had missed the deprecation warning as it's not mentioned at all
in the online documentation:

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/SSL_functions/sslfnc.html

(SSL_ConfigServerCert isn't mentioned there at all, which dates the page to
before it went in and obsoleted SSL_ConfigSecureServer.)

Fixed by removing the superfluous call.

Thanks again for reviewing!

cheers ./daniel
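
On the PK11_SetPasswordFunc question earlier in this message, the concern can be modelled without NSS at all. The sketch below is a stand-in, not NSS code: set_password_func, ask_password and the two callbacks are hypothetical names, and the only property borrowed from the discussion is that the library keeps the callback in a single static variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a library that keeps one callback per process. */
typedef const char *(*password_cb) (void);

static password_cb registered_cb = NULL;	/* the single process-wide slot */

/* Models a setter that stores into a static variable: the last caller wins. */
void
set_password_func(password_cb cb)
{
	assert(cb != NULL);
	registered_cb = cb;
}

/* Models the library asking for a password; NULL if nobody registered. */
const char *
ask_password(void)
{
	return registered_cb != NULL ? registered_cb() : NULL;
}

/* Two independent users of the library living in the same process. */
const char *
libpq_cb(void)
{
	return "libpq";
}

const char *
other_cb(void)
{
	return "other";
}

/* Returns whatever the surviving callback answers after both registered. */
const char *
owner_after_both_register(void)
{
	set_password_func(libpq_cb);
	set_password_func(other_cb);	/* silently replaces the first callback */
	return ask_password();
}
```

In this model the first registrant gets no error and no notification, so whichever component initializes last owns password prompting for everyone in the process. Whether that bites in practice depends on which components share an address space, which is the open question in the thread.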
On 10/29/20 11:20 AM, Daniel Gustafsson wrote: >> On 28 Oct 2020, at 07:39, Andres Freund <andres@anarazel.de> wrote: >> Have you done testing to ensure that NSS PG cooperates correctly with >> openssl PG? Is there a way we can make that easier to do? E.g. allowing >> to build frontend with NSS and backend with openssl and vice versa? > When I wrote the Secure Transport patch I had a patch against PostgresNode > which allowed for overriding the server binaries like so: > > SSLTEST_SERVER_BIN=/path/bin/ make -C src/test/ssl/ check > > I've used that coupled with manual testing so far to make sure that an openssl > client can talk to an NSS backend and so on. Before any other backend is added > we clearly need *a* way of doing this, one which no doubt will need to be > improved upon to suit more workflows. > > This is sort of the same situation as pg_upgrade, where two trees is needed to > really test it. > > I can clean that patch up and post as a starting point for discussions. > >>>>> if test "$with_openssl" = yes ; then >>>>> + if test x"$with_nss" = x"yes" ; then >>>>> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) >>>>> + fi >>>> Based on a quick look there's no similar error check for the msvc >>>> build. Should there be? >>> Thats a good question. When embarking on this is seemed quite natural to me >>> that it should be, but now I'm not so sure. Maybe there should be a >>> --with-openssl-preferred like how we handle readline/libedit or just allow >>> multiple and let the last one win? Do you have any input on what would make >>> sense? >>> >>> The only thing I think makes no sense is to allow multiple ones at the same >>> time given the current autoconf switches, even if it would just be to pick say >>> pg_strong_random from one and libpq TLS from another. >> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing >> to check for errors. 
> Thats another option, with --with-openssl being an alias for --with-ssl=openssl. > > After another round of thinking I like this even better as it makes the build > infra cleaner, so the attached patch has this implemented. > >> Even better, of course, would be to allow switching of the SSL backend >> based on config options (PGC_POSTMASTER GUC for backend, connection >> string for frontend). Mainly because that would make testing of >> interoperability so much easier. Obviously still a few places like >> pgcrypto, randomness, etc, where only a compile time decision seems to >> make sense. > It would make testing easier, but the expense seems potentially rather high. > How would a GUC switch be allowed to operate, would we have mixed backends or > would be require all openssl connectins to be dropped before serving nss ones? > >>>>> + CLEANLDFLAGS="$LDFLAGS" >>>>> + # TODO: document this set of LDFLAGS >>>>> + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS" >>>> Shouldn't this use nss-config or such? >>> Indeed it should, where available. I've added rudimentary support for that >>> without a fallback as of now. >> When would we need a fallback? > One one of my boxes I have NSS/NSPR installed via homebrew and they don't ship > an nss-config AFAICT. I wouldn't be surprised if there are other cases. > >>>> I think it'd also be better if we could include these files as nss/ssl.h >>>> etc - ssl.h is a name way too likely to conflict imo. >>> I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path >>> will still need the direct path to the headers (from autoconf) since nss.h >>> includes NSPR headers as #include <nspr.h> and so on. >> Hm. Then it's probably not worth going there... > It does however make visual parsing of the source files easer since it's clear > which ssl.h is being referred to. I'm in favor of keeping it. 
> >>>>> +static SECStatus >>>>> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) >>>>> +{ >>>>> + SECStatus status; >>>>> + Port *port = (Port *) arg; >>>>> + CERTCertificate *cert; >>>>> + char *peer_cn; >>>>> + int len; >>>>> + >>>>> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); >>>>> + if (status == SECSuccess) >>>>> + { >>>>> + cert = SSL_PeerCertificate(port->pr_fd); >>>>> + len = strlen(cert->subjectName); >>>>> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); >>>>> + if (strncmp(cert->subjectName, "CN=", 3) == 0) >>>>> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); >>>>> + else >>>>> + strlcpy(peer_cn, cert->subjectName, len + 1); >>>>> + CERT_DestroyCertificate(cert); >>>>> + >>>>> + port->peer_cn = peer_cn; >>>>> + port->peer_cert_valid = true; >>>> Hm. We either should have something similar to >>>> >>>> /* >>>> * Reject embedded NULLs in certificate common name to prevent >>>> * attacks like CVE-2009-4034. >>>> */ >>>> if (len != strlen(peer_cn)) >>>> { >>>> ereport(COMMERROR, >>>> (errcode(ERRCODE_PROTOCOL_VIOLATION), >>>> errmsg("SSL certificate's common name contains embedded null"))); >>>> pfree(peer_cn); >>>> return -1; >>>> } >>>> here, or a comment explaining why not. >>> We should, but it's proving rather difficult as there is no equivalent API call >>> to get the string as well as the expected length of it. >> Hm. Should at least have a test to ensure that's not a problem then. I >> hope/assume NSS rejects this somewhere internally... > Agreed, I'll try to hack up a testcase. > >>>> Also, what's up with the CN= bit? Why is that needed here, but not for >>>> openssl? >>> OpenSSL returns only the value portion, whereas NSS returns key=value so we >>> need to skip over the key= part. >> Why is it a conditional path though? 
> It was mostly just a belts-and-suspenders thing, I don't have any hard evidence > that it's been a thing in any modern NSS version so it can be removed. > >>>>> +/* >>>>> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the >>>>> + * only way to make NSS use an already set up POSIX file descriptor rather >>>>> + * than opening one itself. To quote the NSS documentation: >>>>> + * >>>>> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's >>>>> + * implementation changes. In practice, this is unlikely to happen because >>>>> + * NSPR's implementation has been stable for years and because of NSPR's >>>>> + * strong commitment to backward compatibility." >>>>> + * >>>>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket >>>>> + * >>>>> + * The function is declared in <private/pprio.h>, but as it is a header marked >>>>> + * private we declare it here rather than including it. >>>>> + */ >>>>> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int); >>>> Ugh. This is really the way to do this? How do other applications deal >>>> with this problem? >>> They either #include <private/pprio.h> or they do it like this (or vendor NSPR >>> which makes calling private APIs less problematic). It sure is ugly, but there >>> is no alternative to using this function. >> Hm - in debian unstable's NSS this function appears to be in nss/ssl.h, >> not pprio.h: >> >> /* >> ** Imports fd into SSL, returning a new socket. Copies SSL configuration >> ** from model. >> */ >> SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd); >> >> and ssl.h starts with: >> /* >> * This file contains prototypes for the public SSL functions. > Right, but that's Import*FD*, not Import*TCPSocket*. We use ImportFD as well > since it's the API for importing an NSPR socket into NSS and enabling SSL/TLS > on it. Thats been a public API for a long time. 
ImportTCPSocket is used to > import an already opened socket into NSPR, else NSPR must open the socket > itself. That part has been kept private for reasons unknown, as it's > incredibly useful. > >>>>> + PK11_SetPasswordFunc(PQssl_passwd_cb); >>>> Is it actually OK to do stuff like this when other users of NSS might be >>>> present? That's obviously more likely in the libpq case, compared to the >>>> backend case (where it's also possible, of course). What prevents us >>>> from overriding another user's callback? >>> The password callback pointer is stored in a static variable in NSS (in the >>> file lib/pk11wrap/pk11auth.c). >> But, uh, how is that not a problem? What happens if a backend imports >> libpq? What if plpython imports curl which then also uses nss? > Sorry, that sentence wasn't really finished. What I meant to write was that I > don't really have good answers here. The available implementation is via the > static var, and there are no alternative APIs. I've tried googling for > insights but haven't come across any. > > The only datapoint I have is that I can't recall there ever being a complaint > against libcurl doing this exact thing. That of course doesn't mean it cannot > happen or cause problems. > >>> + /* >>> + * Finally we must configure the socket for being a server by setting the >>> + * certificate and key. >>> + */ >>> + status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa); >>> + if (status != SECSuccess) >>> + ereport(ERROR, >>> + (errmsg("unable to configure secure server: %s", >>> + pg_SSLerrmessage(PR_GetError())))); >>> + status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0); >>> + if (status != SECSuccess) >>> + ereport(ERROR, >>> + (errmsg("unable to configure server for TLS server connections: %s", >>> + pg_SSLerrmessage(PR_GetError())))); >> Why do both of these need to get called? The NSS docs say: >> >> /* >> ** Deprecated variant of SSL_ConfigServerCert. >> ** >> ... 
>> SSL_IMPORT SECStatus SSL_ConfigSecureServer(
>> PRFileDesc *fd, CERTCertificate *cert,
>> SECKEYPrivateKey *key, SSLKEAType kea);
> They don't, I had missed the deprecation warning as it's not mentioned at all
> in the online documentation:
>
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/SSL_functions/sslfnc.html
>
> (SSL_ConfigServerCert isn't at all mentioned there which dates it to before
> this went it obsoleting SSL_ConfigSecureServer.)
>
> Fixed by removing the superfluous call.
>

I've been looking through the new patch set, in particular the testing
setup.

The way it seems to proceed is to use the existing openssl-generated
certificates and import them into NSS certificate databases. That seems
fine to bootstrap testing, but it seems to me it would be more sound not
to rely on openssl at all. I'd rather see the Makefile containing
commands to create these from scratch, which mirror the openssl
variants. IOW you should be able to build and test this from scratch,
including certificate generation, without having openssl installed at all.

I also notice that the invocations to pk12util don't contain the "sql:"
prefix to the -d option, even though the database was created with that
prefix a few lines above. That seems like a mistake from my reading of
the pk12util man page.

cheers

andrew
> On 1 Nov 2020, at 14:13, Andrew Dunstan <andrew@dunslane.net> wrote:

> I've been looking through the new patch set, in particular the testing
> setup.

Thanks!

> The way it seems to proceed is to use the existing openssl generated
> certificates and imports them into NSS certificate databases. That seems
> fine to bootstrap testing,

That's pretty much why I opted for using the existing certs: to bootstrap the
patch and ensure OpenSSL-backend compatibility.

> but it seems to me it would be more sound not
> to rely on openssl at all. I'd rather see the Makefile containing
> commands to create these from scratch, which mirror the openssl
> variants. IOW you should be able to build and test this from scratch,
> including certificate generation, without having openssl installed at all.

I don't disagree with this, but I do also believe there is value in testing all
TLS backends with exactly the same certificates to act as a baseline. The
nssfiles target should definitely be able to generate from scratch, but maybe a
combination is the best option?

Being well versed in the buildfarm code, do you have an off-the-cuff idea on
how to do cross library testing such that OpenSSL/NSS compatibility can be
ensured? Andres was floating the idea of making a single sourcetree be able to
have both for testing but more discussion is needed to settle on a way forward.

> I also notice that the invocations to pk12util don't contain the "sql:"
> prefix to the -d option, even though the database was created with that
> prefix a few lines above. That seems like a mistake from my reading of
> the pk12util man page.

Fixed in the attached v16, which also drops the parts of the patchset which
have been submitted separately to -hackers (the sslinfo patch hunks are still
there as they are required).

cheers ./daniel
On 11/1/20 5:04 PM, Daniel Gustafsson wrote:
>> On 1 Nov 2020, at 14:13, Andrew Dunstan <andrew@dunslane.net> wrote:
>> I've been looking through the new patch set, in particular the testing
>> setup.
> Thanks!
>
>> The way it seems to proceed is to use the existing openssl generated
>> certificates and imports them into NSS certificate databases. That seems
>> fine to bootstrap testing,
> That's pretty much why I opted for using the existing certs: to bootstrap the
> patch and ensure OpenSSL-backend compatibility.
>
>> but it seems to me it would be more sound not
>> to rely on openssl at all. I'd rather see the Makefile containing
>> commands to create these from scratch, which mirror the openssl
>> variants. IOW you should be able to build and test this from scratch,
>> including certificate generation, without having openssl installed at all.
> I don't disagree with this, but I do also believe there is value in testing all
> TLS backends with exactly the same certificates to act as a baseline. The
> nssfiles target should definitely be able to generate from scratch, but maybe a
> combination is the best option?

Yeah. I certainly think we need something that shows how we would
generate them from scratch using nss. That said, the importation code is
also useful.

>
> Being well versed in the buildfarm code, do you have an off-the-cuff idea on
> how to do cross library testing such that OpenSSL/NSS compatibility can be
> ensured? Andres was floating the idea of making a single sourcetree be able to
> have both for testing but more discussion is needed to settle on a way forward.

Well, I'd probably try to leverage the knowledge we have in doing
cross-version upgrade testing. It works like this: After the
install-check-C stage each branch saves its binaries and data files in a
special location, adjusting things like library locations to match.
Then, to test that version, it uses that against all the older versions
similarly saved.

We could generalize that saving mechanism and do it if any module
required it. But instead of testing against a different branch, we'd
test against a different animal. So we'd have two animals, one building
with openssl and one with nss, and they would test against each other
(i.e. one as the client and one as the server, and vice versa).

This would involve a deal of work on my part, but it's very doable, I
believe.

We'd need a way to run tests where we could specify the client and
server binary locations.

Anyway, those are my thoughts. Comments welcome.

cheers

andrew

--
Andrew Dunstan
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On 27 Oct 2020, at 21:18, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 27/10/2020 22:07, Daniel Gustafsson wrote:
>> /*
>> * Track whether the NSS database has a password set or not. There is no API
>> * function for retrieving password status, so we simply flip this to true in
>> * case NSS invoked the password callback - as that will only happen in case
>> * there is a password. The reason for tracking this is that there are calls
>> * which require a password parameter, but doesn't use the callbacks provided,
>> * so we must call the callback on behalf of these.
>> */
>> static bool has_password = false;
>
> This is set in PQssl_passwd_cb function, but never reset. That seems wrong. The NSS database used in one connection might have a password, while another one might not. Or have I completely misunderstood this?

(sorry for slow response). You are absolutely right, the has_password flag
must be tracked per connection in PGconn. The attached v17 implements this as
well as a fix for a frontend bug which caused dropped connections, and some
smaller fixups to make strings more translatable.

I've also included a WIP version of SCRAM channel binding in the attached
patch, it's currently failing to connect but someone here might spot the bug
before I do so I figured it's better to include it.

The 0005 patch is now, thanks to the sslinfo patch going in on master, only
containing NSS specific code.

cheers ./daniel
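
As a rough sketch of the per-connection tracking described above, assuming the password callback is handed a caller-supplied context pointer: DemoConn, demo_passwd_cb and demo_open_database are hypothetical names used only for this illustration, and the struct is a trimmed-down stand-in for PGconn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down connection object; the real one is PGconn. */
typedef struct DemoConn
{
	const char *db_password;	/* NULL when the database has no password */
	bool		has_password;	/* set once the password callback has fired */
} DemoConn;

/*
 * Password callback taking a context pointer: the state lands on the
 * connection that was passed in rather than in a file-level static.
 */
const char *
demo_passwd_cb(void *arg)
{
	DemoConn   *conn = (DemoConn *) arg;

	assert(conn != NULL);
	conn->has_password = true;
	return conn->db_password;
}

/* Models the library: the callback only runs when a password is set. */
void
demo_open_database(DemoConn *conn)
{
	conn->has_password = false; /* start clean for every connection */
	if (conn->db_password != NULL)
		(void) demo_passwd_cb(conn);
}
```

With the flag stored on the connection, opening a password-protected database on one connection no longer changes what a second connection to an unprotected database believes, which was the problem with the never-reset static.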
> On 2 Nov 2020, at 15:17, Andrew Dunstan <andrew@dunslane.net> wrote:

> We could generalize that saving mechanism and do it if any module
> required it. But instead of testing against a different branch, we'd
> test against a different animal. So we'd have two animals, one building
> with openssl and one with nss, and they would test against each other
> (i.e. one as the client and one as the sever, and vice versa).

That seems like a very good plan. It would also allow us to test a backend
compiled with OpenSSL 1.0.2 against a frontend with OpenSSL 1.1.1 which might
come in handy when OpenSSL 3.0.0 lands.

> This would involve a deal of work on my part, but it's very doable, I
> believe.

I have no experience with the buildfarm code, but I'm happy to help if there's
anything I can do.

cheers ./daniel
On Nov 4, 2020, at 5:09 AM, Daniel Gustafsson <daniel@yesql.se> wrote: > (sorry for slow response). You are absolutely right, the has_password flag > must be tracked per connection in PGconn. The attached v17 implements this as > well a frontend bugfix which caused dropped connections and some smaller fixups > to make strings more translateable. Some initial notes from building and testing on macOS Mojave. I'm working with both a brew-packaged NSS/NSPR (which includes basic nss-/nspr-config) and a hand-built NSS/NSPR (which does not). 1. In configure.ac: > + LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS" > + CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS" > + > + AC_CHECK_LIB(nss3, SSL_VersionRangeSet, [], [AC_MSG_ERROR([library 'nss3' is required for NSS])]) Looks like SSL_VersionRangeSet is part of libssl3, not libnss3. So this fails with the hand-built stack, where there is no nss-config to populate LDFLAGS. I changed the function to NSS_InitContext and that seems to work nicely. 2. Among the things to eventually think about when it comes to configuring, it looks like some platforms [1] install the headers under <nspr4/...> and <nss3/...> instead of <nspr/...> and <nss/...>. It's unfortunate that the NSS maintainers never chose an official installation layout. 3. I need two more `#define NO_NSPR_10_SUPPORT` guards added in both src/include/common/pg_nss.h src/port/pg_strong_random.c before the tree will compile for me. Both of those files include NSS headers. 4. be_tls_init() refuses to run correctly for me; I end up getting an NSPR assertion that looks like sslMutex_Init not implemented for multi-process applications ! With assertions disabled, this ends up showing a somewhat unhelpful FATAL: unable to set up TLS connection cache: security library failure. (SEC_ERROR_LIBRARY_FAILURE) It looks like cross-process locking isn't actually enabled on macOS, which is a long-standing bug in NSPR [2, 3]. So calls to SSL_ConfigMPServerSIDCache() error out. 
--Jacob [1] https://github.com/erthink/ReOpenLDAP/issues/112 [2] https://bugzilla.mozilla.org/show_bug.cgi?id=538680 [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1192500
> On 6 Nov 2020, at 21:37, Jacob Champion <pchampion@vmware.com> wrote: > Some initial notes from building and testing on macOS Mojave. I'm working with > both a brew-packaged NSS/NSPR (which includes basic nss-/nspr-config) and a > hand-built NSS/NSPR (which does not). Thanks for looking! > 1. In configure.ac: > >> + LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS" >> + CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS" >> + >> + AC_CHECK_LIB(nss3, SSL_VersionRangeSet, [], [AC_MSG_ERROR([library 'nss3' is required for NSS])]) > > Looks like SSL_VersionRangeSet is part of libssl3, not libnss3. So this fails > with the hand-built stack, where there is no nss-config to populate LDFLAGS. I > changed the function to NSS_InitContext and that seems to work nicely. Ah yes, fixed. > 2. Among the things to eventually think about when it comes to configuring, it > looks like some platforms [1] install the headers under <nspr4/...> and > <nss3/...> instead of <nspr/...> and <nss/...>. It's unfortunate that the NSS > maintainers never chose an official installation layout. Yeah, maybe we need to start with the most common path and have fallbacks in case not found? > 3. I need two more `#define NO_NSPR_10_SUPPORT` guards added in both > > src/include/common/pg_nss.h > src/port/pg_strong_random.c > > before the tree will compile for me. Both of those files include NSS headers. Odd that I was able to compile on Linux, but I've added these. > 4. be_tls_init() refuses to run correctly for me; I end up getting an NSPR > assertion that looks like > > sslMutex_Init not implemented for multi-process applications ! > > With assertions disabled, this ends up showing a somewhat unhelpful > > FATAL: unable to set up TLS connection cache: security library failure. (SEC_ERROR_LIBRARY_FAILURE) > > It looks like cross-process locking isn't actually enabled on macOS, which is a > long-standing bug in NSPR [2, 3]. So calls to SSL_ConfigMPServerSIDCache() > error out. 

That's unfortunate since the session cache is required for a server application
backed by NSS. The attached switches to SSL_ConfigServerSessionIDCacheWithOpt
with which one can explicitly make the cache non-shared, which in turn backs
the mutexes with NSPR locks rather than the missing sem_init. Can you test
this version and see if that makes it work?

This version also contains a fix for a channel binding bug that Heikki pointed
out off-list (sadly not The bug) and a few very minor cleanups as well as a
rebase to handle the new pg_strong_random_init. Actually performing the context
init there is still a TODO, but I wanted to get a version out that compiles at
all.

cheers ./daniel
Attachment
On Nov 6, 2020, at 3:11 PM, Daniel Gustafsson <daniel@yesql.se> wrote: > > The attached switches to SSL_ConfigServerSessionIDCacheWithOpt > with which one can explicitly make the cache non-shared, which in turn backs > the mutexes with NSPR locks rather than the missing sem_init. Can you test > this version and see if that makes it work? Yep, I get much farther through the tests with that patch. I'm currently diving into another assertion failure during socket disconnection: Assertion failure: fd->secret == NULL, at prlayer.c:45 cURL has some ominously vague references to this [1], though I'm not sure that we should work around it in the same way without knowing what the cause is... --Jacob [1] https://github.com/curl/curl/blob/4d2f800/lib/vtls/nss.c#L1266
> On 10 Nov 2020, at 21:11, Jacob Champion <pchampion@vmware.com> wrote: > On Nov 6, 2020, at 3:11 PM, Daniel Gustafsson <daniel@yesql.se> wrote: >> The attached switches to SSL_ConfigServerSessionIDCacheWithOpt >> with which one can explicitly make the cache non-shared, which in turn backs >> the mutexes with NSPR locks rather than the missing sem_init. Can you test >> this version and see if that makes it work? > > Yep, I get much farther through the tests with that patch. Great, thanks for confirming. > I'm currently > diving into another assertion failure during socket disconnection: > > Assertion failure: fd->secret == NULL, at prlayer.c:45 > > cURL has some ominously vague references to this [1], though I'm not > sure that we should work around it in the same way without knowing what > the cause is... Digging through the archives from when this landed in curl, the assertion failure was never fully identified back then but happened spuriously. Which version of NSPR is this happening with? cheers ./daniel
On Nov 10, 2020, at 2:28 PM, Daniel Gustafsson <daniel@yesql.se> wrote: > > Digging through the archives from when this landed in curl, the assertion > failure was never fully identified back then but happened spuriously. Which > version of NSPR is this happening with? This is NSPR 4.29, with debugging enabled. The fd that causes the assertion is the custom layer that's added during be_tls_open_server(), which connects a Port as the layer secret. It looks like NSPR is trying to help surface potential memory leaks by asserting if the secret is non-NULL at the time the stack is being closed. In this case, it doesn't matter since the Port lifetime is managed elsewhere, but it looks easy enough to add a custom close in the way that cURL and the NSPR test programs [1] do. Sample patch attached, which gets me to the end of the tests without any assertions. (Two failures left on my machine.) --Jacob [1] https://hg.mozilla.org/projects/nspr/file/bf6620c143/pr/tests/nblayer.c#l354
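To illustrate the custom-close idea described above, here is a minimal self-contained sketch. It does not use NSPR: ModelFileDesc, model_default_close and model_custom_close are hypothetical stand-ins for the layered descriptor, the library's default close method, and the close method installed on the custom layer. The only thing it models is the ordering: clear the borrowed secret first, then hand over to the default close.

```c
#include <stddef.h>

/* Hypothetical stand-in for a layered file descriptor; NOT NSPR's PRFileDesc. */
typedef struct ModelFileDesc
{
    void *secret;                           /* per-layer private data */
    int (*close) (struct ModelFileDesc *fd);
    int closed;
} ModelFileDesc;

/*
 * Stand-in for the library's default close. It refuses to close a layer that
 * still carries a secret, returning -1 where a debug build of the real
 * library would trip an assertion instead.
 */
static int
model_default_close(ModelFileDesc *fd)
{
    if (fd->secret != NULL)
        return -1;
    fd->closed = 1;
    return 0;
}

/*
 * Custom close: the secret is only borrowed (its lifetime is managed
 * elsewhere), so drop the pointer without freeing it and then defer to the
 * default close.
 */
static int
model_custom_close(ModelFileDesc *fd)
{
    fd->secret = NULL;
    return model_default_close(fd);
}
```

In this model, closing a layer that still holds its secret through the default method fails, while the custom method succeeds and leaves the pointed-to object untouched.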
Attachment
On Nov 11, 2020, at 10:17 AM, Jacob Champion <pchampion@vmware.com> wrote: > > (Two failures left on my machine.) False alarm -- the stderr debugging I'd added in to track down the assertion tripped up the "no stderr" tests. Zero failing tests now. --Jacob
On Nov 11, 2020, at 10:57 AM, Jacob Champion <pchampion@vmware.com> wrote: > > False alarm -- the stderr debugging I'd added in to track down the > assertion tripped up the "no stderr" tests. Zero failing tests now. I took a look at the OpenSSL interop problems you mentioned upthread. I don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during connection. I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes were read, in order to satisfy its API. There was some discussion on this upthread: On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <daniel@yesql.se> wrote: > > On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote: >> >>> + case PR_IO_TIMEOUT_ERROR: >>> + break; >> >> What does this mean? We'll return with a 0 errno here, right? When is >> this case reachable? > > It should, AFAICT, only be reachable when PR_Recv is used with a timeout which > we don't do. It mentioned somewhere that it had happened in no-wait calls due > to a bug, but I fail to find that reference now. Either way, I've removed it > to fall into the default error handling which now sets errno correctly as that > was a paddle short here. PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my machine. It doesn't look like the PR_Recv() API has a choice -- if there's no data, it can't return a positive integer, and returning zero means that the socket has been disconnected. So -1 with a timeout error is the only option. I'm not completely sure why this is exposed so easily with an OpenSSL server -- I'm guessing the implementation slices up its packets differently on the wire, causing a read event before NSS is able to decrypt a full record -- but it's worth noting that this case also shows up during NSS-to-NSS psql connections, when handling notifications at the end of every query. PQconsumeInput() reports a hard failure with the current implementation, but its return value is ignored by PrintNotifications(). 
Otherwise this probably would have shown up earlier. (What's the best way to test this case? Are there lower-level tests for the protocol/network layer somewhere that I'm missing?) While patching this case, I also noticed that pgtls_read() doesn't call SOCK_ERRNO_SET() for the disconnection case. That is also in the attached patch. --Jacob
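The return-value mapping argued for above can be sketched as a small self-contained function. This is a model only, not the patch: ModelIoError and model_tls_read_result are hypothetical names, and the error enum stands in for the NSPR error codes so that the example compiles without NSPR. The assumed contract is that a positive count is passed through, a timeout or would-block on a no-wait read is reported as zero bytes read, and a zero-byte read (peer disconnected) becomes -1 with the socket errno set.

```c
#include <errno.h>

/* Hypothetical stand-ins for the error codes a no-wait receive can report. */
typedef enum
{
    MODEL_ERR_NONE,
    MODEL_ERR_WOULD_BLOCK,
    MODEL_ERR_IO_TIMEOUT,
    MODEL_ERR_OTHER
} ModelIoError;

/*
 * Translate the raw result of a no-wait receive into the value the caller
 * expects: >0 bytes read, 0 "nothing available yet", -1 hard failure with
 * *sock_errno set.
 */
static long
model_tls_read_result(long nread, ModelIoError err, int *sock_errno)
{
    *sock_errno = 0;

    if (nread > 0)
        return nread;

    if (nread == 0)
    {
        /* Zero from the receive call means the peer went away. */
        *sock_errno = ECONNRESET;
        return -1;
    }

    switch (err)
    {
        case MODEL_ERR_WOULD_BLOCK:
        case MODEL_ERR_IO_TIMEOUT:
            /* No data yet on a no-wait call: not an error for the caller. */
            return 0;
        default:
            *sock_errno = ECONNRESET;
            return -1;
    }
}
```

The point of the design is that the timeout case never reaches the hard-failure branch, so a caller polling for notifications sees "no bytes" rather than a broken connection.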
Attachment
> On 12 Nov 2020, at 23:12, Jacob Champion <pchampion@vmware.com> wrote: > > On Nov 11, 2020, at 10:57 AM, Jacob Champion <pchampion@vmware.com> wrote: >> >> False alarm -- the stderr debugging I'd added in to track down the >> assertion tripped up the "no stderr" tests. Zero failing tests now. > > I took a look at the OpenSSL interop problems you mentioned upthread. Great, thanks! > I don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during > connection. > > I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes > were read, in order to satisfy its API. There was some discussion on > this upthread: > > On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <daniel@yesql.se> wrote: >> >> On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote: >>> >>>> + case PR_IO_TIMEOUT_ERROR: >>>> + break; >>> >>> What does this mean? We'll return with a 0 errno here, right? When is >>> this case reachable? >> >> It should, AFAICT, only be reachable when PR_Recv is used with a timeout which >> we don't do. It mentioned somewhere that it had happened in no-wait calls due >> to a bug, but I fail to find that reference now. Either way, I've removed it >> to fall into the default error handling which now sets errno correctly as that >> was a paddle short here. > > PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my > machine. It doesn't look like the PR_Recv() API has a choice -- if > there's no data, it can't return a positive integer, and returning zero > means that the socket has been disconnected. So -1 with a timeout error > is the only option. Right, that makes sense. 
> I'm not completely sure why this is exposed so easily with an OpenSSL > server -- I'm guessing the implementation slices up its packets > differently on the wire, causing a read event before NSS is able to > decrypt a full record -- but it's worth noting that this case also shows > up during NSS-to-NSS psql connections, when handling notifications at > the end of every query. PQconsumeInput() reports a hard failure with the > current implementation, but its return value is ignored by > PrintNotifications(). Otherwise this probably would have showed up > earlier. Should there perhaps be an Assert there to catch those? > (What's the best way to test this case? Are there lower-level tests for > the protocol/network layer somewhere that I'm missing?) Not AFAIK. Having been knee-deep now, do you have any ideas on how to implement? > While patching this case, I also noticed that pgtls_read() doesn't call > SOCK_ERRNO_SET() for the disconnection case. That is also in the > attached patch. Ah yes, nice catch. I've incorporated this patch as well as the previous patch for the assertion failure on private callback data into the attached v19 patchset. I also did a spellcheck and pgindent run on it for ease of review. cheers ./daniel
Attachment
On Nov 13, 2020, at 4:14 AM, Daniel Gustafsson <daniel@yesql.se> wrote: >> On 12 Nov 2020, at 23:12, Jacob Champion <pchampion@vmware.com> wrote: >> >> I'm not completely sure why this is exposed so easily with an OpenSSL >> server -- I'm guessing the implementation slices up its packets >> differently on the wire, causing a read event before NSS is able to >> decrypt a full record -- but it's worth noting that this case also shows >> up during NSS-to-NSS psql connections, when handling notifications at >> the end of every query. PQconsumeInput() reports a hard failure with the >> current implementation, but its return value is ignored by >> PrintNotifications(). Otherwise this probably would have showed up >> earlier. > > Should there perhaps be an Assert there to catch those? Hm. From the perspective of helping developers out, perhaps, but from the standpoint of "don't crash when an endpoint outside our control does something strange", I think that's a harder sell. Should the error be bubbled all the way up instead? Or perhaps, if psql isn't supposed to treat notification errors as "hard" failures, it should at least warn the user that something is fishy? >> (What's the best way to test this case? Are there lower-level tests for >> the protocol/network layer somewhere that I'm missing?) > > Not AFAIK. Having been knee-deep now, do you have any ideas on how to > implement? I think that testing these sorts of important edge cases needs a friendly DSL -- something that doesn't want to make devs tear their hair out while building tests. I've been playing a little bit with Scapy [1] to understand more of the libpq v3 protocol; I'll see if that can be adapted for pieces of the TLS handshake in a way that's easy to maintain. If it can be, maybe that'd be a good starting example. > I've incorporated this patch as well as the previous patch for the assertion > failure on private callback data into the attached v19 patchset. 
I also did a > spellcheck and pgindent run on it for ease of review. Commit 6be725e70 got rid of some psql error messaging that the tests were keying off of, so there are a few new failures after a rebase onto latest master. I've attached a patch that gets the SCRAM tests a little further (certificate hashing was caught in an infinite loop). I also added error checks to those loops, along the lines of the existing OpenSSL implementation: if a suitable digest can't be found, the user will see an error like psql: error: could not find digest for OID 'PKCS #1 SHA-256 With RSA Encryption' It's a little verbose but I don't think this case should come up in normal practice. --Jacob [1] https://scapy.net/
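As a rough illustration of a digest lookup that terminates and reports failure, here is a self-contained sketch. The enums, the table and model_find_digest are all hypothetical and do not mirror the NSS OIDs or the code in the patch. The mapping follows the tls-server-end-point rule from RFC 5929, where certificates signed with MD5 or SHA-1 are hashed with SHA-256 and other signature algorithms use their own hash.

```c
#include <stddef.h>

/* Hypothetical signature algorithm and digest identifiers. */
typedef enum
{
    MODEL_SIG_RSA_MD5,
    MODEL_SIG_RSA_SHA1,
    MODEL_SIG_RSA_SHA256,
    MODEL_SIG_RSA_SHA512,
    MODEL_SIG_UNKNOWN
} ModelSigAlg;

typedef enum
{
    MODEL_DIGEST_NONE,
    MODEL_DIGEST_SHA256,
    MODEL_DIGEST_SHA512
} ModelDigest;

static const struct
{
    ModelSigAlg sig;
    ModelDigest digest;
} model_digest_map[] = {
    {MODEL_SIG_RSA_MD5, MODEL_DIGEST_SHA256},   /* upgraded per RFC 5929 */
    {MODEL_SIG_RSA_SHA1, MODEL_DIGEST_SHA256},  /* upgraded per RFC 5929 */
    {MODEL_SIG_RSA_SHA256, MODEL_DIGEST_SHA256},
    {MODEL_SIG_RSA_SHA512, MODEL_DIGEST_SHA512}
};

/*
 * Bounded lookup: returns 0 and sets *out when a digest is found, -1 when
 * the signature algorithm is not in the table. *out is left untouched on
 * failure so the caller can report the error.
 */
static int
model_find_digest(ModelSigAlg sig, ModelDigest *out)
{
    size_t n = sizeof(model_digest_map) / sizeof(model_digest_map[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (model_digest_map[i].sig == sig)
        {
            *out = model_digest_map[i].digest;
            return 0;
        }
    }
    return -1;
}
```

Because the loop is bounded by the table size and has an explicit not-found result, an unrecognized algorithm produces an error for the user instead of spinning.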
Attachment
> On 16 Nov 2020, at 21:00, Jacob Champion <pchampion@vmware.com> wrote: > On Nov 13, 2020, at 4:14 AM, Daniel Gustafsson <daniel@yesql.se> wrote: >> I've incorporated this patch as well as the previous patch for the assertion >> failure on private callback data into the attached v19 patchset. I also did a >> spellcheck and pgindent run on it for ease of review. > > Commit 6be725e70 got rid of some psql error messaging that the tests > were keying off of, so there are a few new failures after a rebase onto > latest master. > > I've attached a patch that gets the SCRAM tests a little further > (certificate hashing was caught in an infinite loop). I also added error > checks to those loops, along the lines of the existing OpenSSL > implementation: if a suitable digest can't be found, the user will see > an error like > > psql: error: could not find digest for OID 'PKCS #1 SHA-256 With RSA Encryption' > > It's a little verbose but I don't think this case should come up in > normal practice. Nice, thanks for the fix! I've incorporated your patch into the attached v20 which also fixes client side error reporting to be more readable. The SCRAM tests are now also hooked up, albeit with SKIP blocks for NSS, so they can start getting fixed. cheers ./daniel
Attachment
On Nov 17, 2020, at 7:00 AM, Daniel Gustafsson <daniel@yesql.se> wrote: > > Nice, thanks for the fix! I've incorporated your patch into the attached v20 > which also fixes client side error reporting to be more readable. I was testing handshake failure modes and noticed that some FATAL messages are being sent through to the client in cleartext. The OpenSSL implementation doesn't do this, because it logs handshake problems at COMMERROR level. Should we switch all those ereport() calls in the NSS be_tls_open_server() to COMMERROR as well (and return explicitly), to avoid this? Or was there a reason for logging at FATAL/ERROR level? Related note, at the end of be_tls_open_server(): > ... > port->ssl_in_use = true; > return 0; > > error: > return 1; > } This needs to return -1 in the error case; the only caller of secure_open_server() does a direct `result == -1` comparison rather than checking `result != 0`. --Jacob
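The return-code mismatch can be shown with a tiny self-contained model. Both function names are hypothetical: model_open_server stands in for be_tls_open_server() with a configurable error value, and model_caller_sees_failure mimics a caller that compares the result against -1 exactly.

```c
/*
 * Stand-in for the TLS open routine: 0 on success, otherwise whatever error
 * value the implementation chose to return.
 */
static int
model_open_server(int handshake_ok, int error_value)
{
    return handshake_ok ? 0 : error_value;
}

/* Stand-in for a caller that tests for failure with an exact == -1 check. */
static int
model_caller_sees_failure(int result)
{
    return result == -1;
}
```

With an error value of 1, a failed handshake slips past the caller's check and is treated as success; with -1 it is detected.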
On Tue, 2020-10-27 at 21:07 +0100, Daniel Gustafsson wrote: > > On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote: > > > > > +static SECStatus > > > +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) > > > +{ > > > + SECStatus status; > > > + Port *port = (Port *) arg; > > > + CERTCertificate *cert; > > > + char *peer_cn; > > > + int len; > > > + > > > + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); > > > + if (status == SECSuccess) > > > + { > > > + cert = SSL_PeerCertificate(port->pr_fd); > > > + len = strlen(cert->subjectName); > > > + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); > > > + if (strncmp(cert->subjectName, "CN=", 3) == 0) > > > + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); > > > + else > > > + strlcpy(peer_cn, cert->subjectName, len + 1); > > > + CERT_DestroyCertificate(cert); > > > + > > > + port->peer_cn = peer_cn; > > > + port->peer_cert_valid = true; > > > > Hm. We either should have something similar to > > > > /* > > * Reject embedded NULLs in certificate common name to prevent > > * attacks like CVE-2009-4034. > > */ > > if (len != strlen(peer_cn)) > > { > > ereport(COMMERROR, > > (errcode(ERRCODE_PROTOCOL_VIOLATION), > > errmsg("SSL certificate's common name contains embedded null"))); > > pfree(peer_cn); > > return -1; > > } > > here, or a comment explaining why not. > > We should, but it's proving rather difficult as there is no equivalent API call > to get the string as well as the expected length of it. I'm going to try to tackle this part next. It looks like NSS uses RFC 4514 (or something like it) backslash-quoting, which this code either needs to undo or bypass before performing a comparison. --Jacob
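For reference, the embedded-null check itself is simple once both the raw bytes and their length are available, which is exactly the part the thread says is hard to get from NSS. A minimal self-contained sketch, with has_embedded_nul as a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the len-byte name contains a NUL byte anywhere inside it,
 * 0 otherwise. A C string copy of such a name would be silently truncated
 * at the first NUL, which is what the check guards against.
 */
static int
has_embedded_nul(const char *name, size_t len)
{
    return memchr(name, '\0', len) != NULL;
}
```

This is equivalent to the `len != strlen(peer_cn)` comparison quoted above, assuming len is the length reported by the certificate rather than one derived from the string.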
On Tue, Nov 17, 2020 at 04:00:53PM +0100, Daniel Gustafsson wrote: > Nice, thanks for the fix! I've incorporated your patch into the attached v20 > which also fixes client side error reporting to be more readable. The SCRAM > tests are now also hooked up, albeit with SKIP blocks for NSS, so they can > start getting fixed. On top of the set of TODO items mentioned in the logs of the patches, this patch set needs a rebase because it does not apply. In order to move on with this set, I would suggest to extract some parts of the patch set independently of the others and have two buildfarm members for the MSVC and non-MSVC cases to stress the parts that can be committed. Just seeing the size, we could move on with: - The ./configure set, with the change to introduce --with-ssl=openssl. - 0004 for strong randoms. - Support for cryptohashes. +/* + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef + * our version to avoid compiler warnings on redefinition. + */ +#define pg_BITS_PER_BYTE BITS_PER_BYTE +#undef BITS_PER_BYTE This could be done separately. src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no? -- Michael
Attachment
> On 18 Jan 2021, at 08:08, Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Nov 17, 2020 at 04:00:53PM +0100, Daniel Gustafsson wrote: >> Nice, thanks for the fix! I've incorporated your patch into the attached v20 >> which also fixes client side error reporting to be more readable. The SCRAM >> tests are now also hooked up, albeit with SKIP blocks for NSS, so they can >> start getting fixed. > > On top of the set of TODO items mentioned in the logs of the patches, > this patch set needs a rebase because it does not apply. Fixed in the attached, which also addresses the points raised earlier by Jacob as well as adds certificates created entirely by NSS tooling as well as initial cryptohash support. There is something iffy with these certs (the test fails on mismatching ciphers and/or signature algorithms) that I haven't been able to pin down, but to get more eyes on this I'm posting the patch with the test enabled. The NSS toolchain requires interactive input which makes the Makefile a bit hacky, ideas on cleaning that up are appreciated. > In order to > move on with this set, I would suggest to extract some parts of the > patch set independently of the others and have two buildfarm members > for the MSVC and non-MSVC cases to stress the parts that can be > committed. Just seeing the size, we could move on with: > - The ./configure set, with the change to introduce --with-ssl=openssl. > - 0004 for strong randoms. > - Support for cryptohashes. I will leave it to others to decide the feasibility of this, I'm happy to slice and dice the commits into smaller bits to for example separate out the --with-ssl autoconf change into a non NSS dependent commit, if that's wanted. > +/* > + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef > + * our version to avoid compiler warnings on redefinition. > + */ > +#define pg_BITS_PER_BYTE BITS_PER_BYTE > +#undef BITS_PER_BYTE > This could be done separately. 
Based on an offlist discussion I believe this was a misunderstanding, but if I instead misunderstood that feel free to correct me with how you think this should be done. > src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no? Good point, fixed. cheers ./daniel
Attachment
> On 4 Dec 2020, at 01:57, Jacob Champion <pchampion@vmware.com> wrote: > > On Nov 17, 2020, at 7:00 AM, Daniel Gustafsson <daniel@yesql.se> wrote: >> >> Nice, thanks for the fix! I've incorporated your patch into the attached v20 >> which also fixes client side error reporting to be more readable. > > I was testing handshake failure modes and noticed that some FATAL > messages are being sent through to the client in cleartext. The OpenSSL > implementation doesn't do this, because it logs handshake problems at > COMMERROR level. Should we switch all those ereport() calls in the NSS > be_tls_open_server() to COMMERROR as well (and return explicitly), to > avoid this? Or was there a reason for logging at FATAL/ERROR level? The ERROR logging made early development easier but then stuck around, I've changed them to COMMERROR returning an error instead in the v21 patch just sent to the list. > Related note, at the end of be_tls_open_server(): > >> ... >> port->ssl_in_use = true; >> return 0; >> >> error: >> return 1; >> } > > This needs to return -1 in the error case; the only caller of > secure_open_server() does a direct `result == -1` comparison rather than > checking `result != 0`. Fixed. cheers ./daniel
On Tue, 2021-01-19 at 21:21 +0100, Daniel Gustafsson wrote: > There is something iffy with these certs (the test fails > on mismatching ciphers and/or signature algorithms) that I haven't been able to > pin down, but to get more eyes on this I'm posting the patch with the test > enabled. Removing `--keyUsage keyEncipherment` from the native_server-* CSR generation seems to let the tests pass for me, but I'm wary of just pushing that as a solution because I don't understand why that would have anything to do with the failure mode (SSL_ERROR_NO_SUPPORTED_SIGNATURE_ALGORITHM). > The NSS toolchain requires interactive input which makes the Makefile > a bit hacky, ideas on cleaning that up are appreciated. Hm. I got nothing, short of a feature request to NSS... --Jacob
> On 20 Jan 2021, at 01:40, Jacob Champion <pchampion@vmware.com> wrote: > > On Tue, 2021-01-19 at 21:21 +0100, Daniel Gustafsson wrote: >> There is something iffy with these certs (the test fails >> on mismatching ciphers and/or signature algorithms) that I haven't been able to >> pin down, but to get more eyes on this I'm posting the patch with the test >> enabled. > > Removing `--keyUsage keyEncipherment` from the native_server-* CSR > generation seems to let the tests pass for me, but I'm wary of just > pushing that as a solution because I don't understand why that would > have anything to do with the failure mode > (SSL_ERROR_NO_SUPPORTED_SIGNATURE_ALGORITHM). Aha, that was a good clue, I had overlooked the required extensions in the CSR. Re-reading RFC 5280 it seems we need keyEncipherment, dataEncipherment and digitalSignature to create a valid SSL Server certificate. Adding those indeed makes the test pass. Skimming the certutil code, *I think* removing it as you did caused a set of defaults to kick in that made it work based on the parameter "--nsCertType sslServer", but it's not entirely easy to make out. Either way, relying on defaults in a test suite seems less than good, so I've extended the Makefile to be explicit about the extensions. The attached v22 rebase incorporates the fixup to the test Makefile, with no further changes on top of that. cheers ./daniel
Attachment
On Wed, 2021-01-20 at 12:58 +0100, Daniel Gustafsson wrote: > Aha, that was a good clue, I had overlooked the required extensions in the CSR. > Re-reading RFC 5280 it seems we need keyEncipherment, dataEncipherment and > digitalSignature to create a valid SSL Server certificate. Adding those indeed > make the test pass. Skimming the certutil code *I think* removing it as you > did cause a set of defaults to kick in that made it work based on the parameter > "--nsCertType sslServer", but it's not entirely easy to make out. Lovely. I didn't expect *removing* an extension to effectively *add* more, but I'm glad it works now. == To continue the Subject Common Name discussion [1] from a different part of the thread: Attached is a v23 version of the patchset that peels the raw Common Name out from a client cert's Subject. This allows the following cases that the OpenSSL implementation currently handles: - subjects that don't begin with a CN - subjects with quotable characters - subjects that have no CN at all Embedded NULLs are now handled in a similar manner to the OpenSSL side, though because this failure happens during the certificate authentication callback, it results in a TLS alert rather than simply closing the connection. For easier review of just the parts I've changed, I've also attached a since-v22.diff, which is part of the 0001 patch. --Jacob [1] https://www.postgresql.org/message-id/7d6a23a7e30540b486abc823f7ced7a93e1da1e8.camel%40vmware.com
Attachment
On Tue, Jan 19, 2021 at 09:21:41PM +0100, Daniel Gustafsson wrote: >> In order to >> move on with this set, I would suggest to extract some parts of the >> patch set independently of the others and have two buildfarm members >> for the MSVC and non-MSVC cases to stress the parts that can be >> committed. Just seeing the size, we could move on with: >> - The ./configure set, with the change to introduce --with-ssl=openssl. >> - 0004 for strong randoms. >> - Support for cryptohashes. > > I will leave it to others to decide the feasibility of this, I'm happy to slice > and dice the commits into smaller bits to for example separate out the > --with-ssl autoconf change into a non NSS dependent commit, if that's wanted. IMO it makes sense to extract the independent pieces and build on top of them. The bulk of the changes is likely going to have a bunch of comments if reviewed deeply, so I think that we had better remove from the stack the small-ish problems to ease the next moves. The ./configure part and replacement of with_openssl by with_ssl is mixed in 0001 and 0002, which is actually confusing. And, FWIW, I would be fine with applying a patch that introduces a --with-ssl with a compatibility kept for --with-openssl. This is what 0001 is doing, actually, similarly to the past switches for --with-uuid. A point that has been mentioned offline by you, but not mentioned on this list. The structure of the modules in src/test/ssl/ could be refactored to help with an easier integration of more SSL libraries. This makes sense taken independently. > Based on an offlist discussion I believe this was a misunderstanding, but if I > instead misunderstood that feel free to correct me with how you think this > should be done. The point would be to rename BITS_PER_BYTE to PG_BITS_PER_BYTE in the code and avoid conflicts. I am not completely sure if others would agree here, but this would remove quite some ifdef/undef stuff from the code dedicated to NSS. 
> > src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no? > > Good point, fixed. Please note that patch 0001 is failing to apply after the recent commit b663a41. There are conflicts in postgres_fdw.out. Also, what's the minimum version of NSS that would be supported? It would be good to define an acceptable older version, to keep that documented and to track that perhaps with some configure checks (?), similarly to what is done for OpenSSL. Patch 0006 has three trailing whitespaces (git diff --check complains). Running the regression tests of pgcrypto, I think that the SHA2 implementation is not completely right. Some SHA2 encoding reports results from already-freed data. I have spotted a second issue within scram_HMAC_init(), where pg_cryptohash_create() remains stuck inside NSS_InitContext(), freezing the regression tests where passwords hashed for SCRAM are created. + ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner); + ctx = MemoryContextAlloc(TopMemoryContext, sizeof(pg_cryptohash_ctx)); +#else + ctx = pg_malloc(sizeof(pg_cryptohash_ctx)); +#endif cryptohash_nss.c cannot use pg_malloc() for frontend allocations. On OOM, your patch would call exit() directly, even within libpq. But shared library callers need to know about the OOM failure. + explicit_bzero(ctx, sizeof(pg_cryptohash_ctx)); + pfree(ctx); For similar reasons, pfree should not be used for the frontend code in cryptohash_nss.c. The fallback should be just a malloc/free set. + status = PK11_DigestBegin(ctx->pk11_context); + + if (status != SECSuccess) + return 1; + return 0; This needs to return -1 on failure, not 1. I really need to study the choice of options for NSS_InitContext() more... But based on the docs I can read on the matter I think that saving nsscontext in pg_cryptohash_ctx is right for each cryptohash built. src/tools/msvc/ is missing an update for cryptohash_nss.c. -- Michael
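The frontend-side points above (plain malloc/free, OOM reported to the caller, -1 on failure) can be sketched in a self-contained way. Everything here is hypothetical: model_hash_ctx and the model_hash_* functions are stand-ins rather than the pg_cryptohash API, and a volatile byte loop is used in place of explicit_bzero() so the example builds without PostgreSQL's port layer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical context; the real one would hold the NSS handles. */
typedef struct model_hash_ctx
{
    int type;
    void *backend_handle;
} model_hash_ctx;

/* Returns NULL on out-of-memory instead of exiting, so a library caller can react. */
static model_hash_ctx *
model_hash_create(int type)
{
    model_hash_ctx *ctx = malloc(sizeof(model_hash_ctx));

    if (ctx == NULL)
        return NULL;
    memset(ctx, 0, sizeof(*ctx));
    ctx->type = type;
    return ctx;
}

/* Returns 0 on success and -1 on failure, never a positive error value. */
static int
model_hash_init(model_hash_ctx *ctx)
{
    if (ctx == NULL)
        return -1;
    return 0;
}

/* Scrub the context before releasing it with free(), matching the malloc() above. */
static void
model_hash_free(model_hash_ctx *ctx)
{
    if (ctx == NULL)
        return;

    volatile unsigned char *p = (volatile unsigned char *) ctx;

    for (size_t i = 0; i < sizeof(*ctx); i++)
        p[i] = 0;
    free(ctx);
}
```

A caller would check the create result for NULL and the init result for a negative value before hashing anything.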
Attachment
On Mon, 2020-07-20 at 15:35 +0200, Daniel Gustafsson wrote: > With this, I have one failing test ("intermediate client certificate is > provided by client") which I've left failing since I believe the case should be > supported by NSS. The issue is most likely that I haven't figured out the right > certinfo incantation to make it so (Mozilla hasn't strained themselves when > writing documentation for this toolchain, or any part of NSS for that matter). I think we're missing a counterpart to this piece of the OpenSSL implementation, in be_tls_init(): if (ssl_ca_file[0]) { ... SSL_CTX_set_client_CA_list(context, root_cert_list); } I think the NSS equivalent to SSL_CTX_set_client_CA_list() is probably SSL_SetTrustAnchors() (which isn't called out in the online NSS docs, as far as I can see). What I'm less sure of is how we want the NSS counterpart to ssl_ca_file to behave. The OpenSSL implementation allows a list of CA names to be sent. Should the NSS side take a list of CA cert nicknames? A list of Subjects? Something else? mod_nss for httpd had a proposed feature [1] to do this that unfortunately withered on the vine, and Google returns ~500 results for "SSL_SetTrustAnchors", so I'm unaware of any prior art in the wild... --Jacob [1] https://bugzilla.redhat.com/show_bug.cgi?id=719401
On Thu, 2021-01-21 at 14:21 +0900, Michael Paquier wrote: > Also, what's the minimum version of NSS that would be supported? It > would be good to define an acceptable older version, to keep that > documented and to track that perhaps with some configure checks (?), > similarly to what is done for OpenSSL. Some version landmarks: - 3.21 adds support for extended master secret, which according to [1] is required for SCRAM channel binding to actually be secure. - 3.26 is Debian Stretch. - 3.28 is Ubuntu 16.04, and RHEL6 (I think). - 3.35 is Ubuntu 18.04. - 3.36 is RHEL7 (I think). - 3.39 gets us final TLS 1.3 support. - 3.42 is Debian Buster. - 3.49 is Ubuntu 20.04. (I'm having trouble finding online package information for RHEL variants, so I've pulled those versions from online support docs. If someone notices that those are wrong please speak up.) So 3.39 would guarantee TLS 1.3 but exclude a decent chunk of still-supported Debian-alikes. Anything less than 3.21 seems actively unsafe unless we disable SCRAM with those versions. Any other important landmarks (whether feature- or distro-related) we need to consider? --Jacob [1] https://tools.ietf.org/html/rfc7677#section-4
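To make the effect of a version floor concrete, here is a small self-contained sketch of a major.minor comparison against the landmarks listed above. model_version_at_least is a hypothetical helper written for this example; a real build would more likely rely on the version macros in the NSS headers or a configure check, which this sketch does not attempt to reproduce.

```c
#include <stdio.h>

/*
 * Returns 1 if the dotted version string "have" is at least
 * want_major.want_minor, 0 otherwise (including when it cannot be parsed).
 * Components are compared numerically, so "3.9" is older than "3.21".
 */
static int
model_version_at_least(const char *have, int want_major, int want_minor)
{
    int major = 0;
    int minor = 0;

    if (sscanf(have, "%d.%d", &major, &minor) != 2)
        return 0;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

With a floor of 3.39, the 3.26 shipped by Debian Stretch would be rejected while the 3.49 in Ubuntu 20.04 would pass; with a floor of 3.21, both would be accepted.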
On Wed, Jan 20, 2021 at 05:07:08PM +0000, Jacob Champion wrote: > Lovely. I didn't expect *removing* an extension to effectively *add* > more, but I'm glad it works now. My apologies for chiming in. I was looking at your patch set here, and while reviewing the strong random and cryptohash parts I have found a couple of mistakes in the ./configure part. I think that the switch from --with-openssl to --with-ssl={openssl} could just be done independently as a building piece of the rest, then the first portion based on NSS could just add the minimum set in configure.ac. Please note that the patch set has been using autoconf from Debian, or something forked from upstream. There were also missing updates in several parts of the code base, and a lack of docs for the new switch. I have spent time checking that with --with-openssl to make sure that the obsolete grammar is still compatible, --with-ssl=openssl and also without it. Thoughts? -- Michael
Attachment
On Wed, 2021-01-27 at 16:39 +0900, Michael Paquier wrote: > My apologies for chiming in. I was looking at your patch set here, > and while reviewing the strong random and cryptohash parts I have > found a couple of mistakes in the ./configure part. I think that the > switch from --with-openssl to --with-ssl={openssl} could just be done > independently as a building piece of the rest, then the first portion > based on NSS could just add the minimum set in configure.ac. > > Please note that the patch set has been using autoconf from Debian, or > something forked from upstream. There were also missing updates in > several parts of the code base, and a lack of docs for the new > switch. I have spent time checking that with --with-openssl to make > sure that the obsolete grammar is still compatible, --with-ssl=openssl > and also without it. > > Thoughts? Seems good to me on Ubuntu; builds with both flavors. From peering at the Windows side: > --- a/src/tools/msvc/config_default.pl > +++ b/src/tools/msvc/config_default.pl > @@ -16,7 +16,7 @@ our $config = { > tcl => undef, # --with-tcl=<path> > perl => undef, # --with-perl=<path> > python => undef, # --with-python=<path> > - openssl => undef, # --with-openssl=<path> > + openssl => undef, # --with-ssl=openssl with <path> > uuid => undef, # --with-uuid=<path> > xml => undef, # --with-libxml=<path> > xslt => undef, # --with-libxslt=<path> So to check understanding: the `openssl` config variable is still alive for MSVC builds; it just turns that into `--with-ssl=openssl` in the fake CONFIGURE_ARGS? <bikeshed color="lightblue"> Since SSL is an obsolete term, and the choice of OpenSSL vs NSS vs [nothing] affects server operation (such as cryptohash) regardless of whether or not connection-level TLS is actually used, what would you all think about naming this option --with-crypto? I.e. --with-crypto=openssl --with-crypto=nss </bikeshed> --Jacob
On Wed, Jan 27, 2021 at 06:47:17PM +0000, Jacob Champion wrote:
> So to check understanding: the `openssl` config variable is still alive
> for MSVC builds; it just turns that into `--with-ssl=openssl` in the
> fake CONFIGURE_ARGS?

Yeah, I think that keeping both variables separated in the MSVC
scripts is the most straight-forward option, as this passes down a
path. Once there is a value for nss, we'd need to properly issue an
error if both OpenSSL and NSS are specified.

> Since SSL is an obsolete term, and the choice of OpenSSL vs NSS vs
> [nothing] affects server operation (such as cryptohash) regardless of
> whether or not connection-level TLS is actually used, what would you
> all think about naming this option --with-crypto? I.e.
>
> --with-crypto=openssl
> --with-crypto=nss

Looking around, curl has multiple switches for each lib with one named
--with-ssl for OpenSSL, but it needs to be able to use multiple
libraries at run time. I can spot that libssh2 uses what you are
proposing. It seems to me that --with-ssl is a bit more popular but
not by that much: wget, wayland, some apache stuff (it uses a path as
option value). Anyway, what you are suggesting sounds like a good idea
in the context of Postgres. Daniel?
--
Michael
Attachment
> On 28 Jan 2021, at 07:06, Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Jan 27, 2021 at 06:47:17PM +0000, Jacob Champion wrote: >> Since SSL is an obsolete term, and the choice of OpenSSL vs NSS vs >> [nothing] affects server operation (such as cryptohash) regardless of >> whether or not connection-level TLS is actually used, what would you >> all think about naming this option --with-crypto? I.e. >> >> --with-crypto=openssl >> --with-crypto=nss > > Looking around, curl has multiple switches for each lib with one named > --with-ssl for OpenSSL, but it needs to be able to use multiple > libraries at run time. To be fair, if we started over in curl I would push back on --with-ssl meaning OpenSSL but that ship has long since sailed. > I can spot that libssh2 uses what you are > proposing. It seems to me that --with-ssl is a bit more popular but > not by that much: wget, wayland, some apache stuff (it uses a path as > option value). Anyway, what you are suggesting sounds like a good in > the context of Postgres. Daniel? SSL is admittedly an obsolete technical term, but it's one that enough people have decided is interchangeable with TLS that it's not a hill worth dying on IMHO. Since postgres won't allow for using libnss or OpenSSL for cryptohash *without* compiling SSL/TLS support (used or not), I think --with-ssl=LIB is more descriptive and less confusing. -- Daniel Gustafsson https://vmware.com/
On Thu, 2021-01-21 at 20:16 +0000, Jacob Champion wrote:
> I think we're missing a counterpart to this piece of the OpenSSL
> implementation, in be_tls_init():

Never mind. Using SSL_SetTrustAnchor is something we could potentially
do if we wanted to further limit the CAs that are actually sent to the
client, but it shouldn't be necessary to get the tests to pass.

I now think that it's just a matter of making sure that the
"server-cn-only" DB has the root_ca.crt included, so that it can
correctly validate the client certificate. Incidentally I think this
should also fix the remaining failing SCRAM test. I'll try to get a
patch out tomorrow, if adding the root CA doesn't invalidate some other
test logic.

--Jacob
On Fri, Jan 29, 2021 at 12:20:21AM +0100, Daniel Gustafsson wrote: > SSL is admittedly an obsolete technical term, but it's one that enough people > have decided is interchangeable with TLS that it's not a hill worth dying on > IMHO. Since postgres won't allow for using libnss or OpenSSL for cryptohash > *without* compiling SSL/TLS support (used or not), I think --with-ssl=LIB is > more descriptive and less confusing. Okay, let's use --with-ssl then for the new switch name. The previous patch is backward-compatible, and will simplify the rest of the set, so let's move on with it. Once this is done, my guess is that it would be cleaner to have a new patch that includes only the ./configure and MSVC changes, and then the rest: test refactoring, cryptohash, strong random and lastly TLS (we may want to cut this a bit more though and perhaps have some restrictions depending on the scope of options a first patch set could support). I'll wait a bit first to see if there are any objections to this change. -- Michael
Attachment
> On 21 Jan 2021, at 06:21, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jan 19, 2021 at 09:21:41PM +0100, Daniel Gustafsson wrote:
>>> In order to
>>> move on with this set, I would suggest to extract some parts of the
>>> patch set independently of the others and have two buildfarm members
>>> for the MSVC and non-MSVC cases to stress the parts that can be
>>> committed. Just seeing the size, we could move on with:
>>> - The ./configure set, with the change to introduce --with-ssl=openssl.
>>> - 0004 for strong randoms.
>>> - Support for cryptohashes.
>>
>> I will leave it to others to decide the feasibility of this, I'm happy to slice
>> and dice the commits into smaller bits to for example separate out the
>> --with-ssl autoconf change into a non NSS dependent commit, if that's wanted.
>
> IMO it makes sense to extract the independent pieces and build on top
> of them. The bulk of the changes is likely going to have a bunch of
> comments if reviewed deeply, so I think that we had better remove from
> the stack the small-ish problems to ease the next moves. The
> ./configure part and replacement of with_openssl by with_ssl is mixed
> in 0001 and 0002, which is actually confusing. And, FWIW, I would be
> fine with applying a patch that introduces a --with-ssl with a
> compatibility kept for --with-openssl. This is what 0001 is doing,
> actually, similarly to the past switches for --with-uuid.

This has been discussed elsewhere in the thread, so let's continue that there.
The attached v23 does however split off --with-ssl for OpenSSL in 0001, adding
the nss option in 0002.

> A point that has been mentioned offline by you, but not mentioned on
> this list. The structure of the modules in src/test/ssl/ could be
> refactored to help with an easier integration of more SSL libraries.
> This makes sense taken independently.

This has been submitted in F513E66A-E693-4802-9F8A-A74C1D0E3D10@yesql.se.

>> Based on an offlist discussion I believe this was a misunderstanding, but if I
>> instead misunderstood that feel free to correct me with how you think this
>> should be done.
>
> The point would be to rename BITS_PER_BYTE to PG_BITS_PER_BYTE in the
> code and avoid conflicts. I am not completely sure if others would
> agree here, but this would remove quite some ifdef/undef stuff from
> the code dedicated to NSS.

Aha, now I see what you mean, sorry for the confusion. That can certainly be
done (and done outside of this patchset), but it admittedly feels a bit
intrusive. If there is consensus that we should namespace our version like
this I'll go ahead and do that.

>>> src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no?
>>
>> Good point, fixed.
>
> Please note that patch 0001 is failing to apply after the recent
> commit b663a41. There are conflicts in postgres_fdw.out.

Fixed.

> Patch 0006 has three trailing whitespaces (git diff --check complains).

Fixed.

> Running the regression tests of pgcrypto, I think that
> the SHA2 implementation is not completely right. Some SHA2 encoding
> reports results from already-freed data.

I've been unable to reproduce this; can you shed some light on it?

> I have spotted a second
> issue within scram_HMAC_init(), where pg_cryptohash_create() remains
> stuck inside NSS_InitContext(), freezing the regression tests where
> password hashed for SCRAM are created.

I think the freezing you saw comes from opening and closing NSS contexts per
cryptohash op (with some patience on my part the test runs OK in ~30s, which
is clearly not in the wheelhouse of acceptable), more on that below.

> +    ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
> +    ctx = MemoryContextAlloc(TopMemoryContext, sizeof(pg_cryptohash_ctx));
> +#else
> +    ctx = pg_malloc(sizeof(pg_cryptohash_ctx));
> +#endif
> cryptohash_nss.c cannot use pg_malloc() for frontend allocations. On
> OOM, your patch would call exit() directly, even within libpq. But
> shared library callers need to know about the OOM failure.

Of course, fixed.

> +    status = PK11_DigestBegin(ctx->pk11_context);
> +
> +    if (status != SECSuccess)
> +        return 1;
> +    return 0;
> This needs to return -1 on failure, not 1.

Doh, fixed.

> I really need to study more the choide of the options chosen for
> NSS_InitContext()... But based on the docs I can read on the matter I
> think that saving nsscontext in pg_cryptohash_ctx is right for each
> cryptohash built.

It's a safe but slow option; NSS wasn't really made for running a single
crypto operation. Since we are opening a context which isn't backed by an NSS
database we could have a static context, which indeed speeds up processing a
lot. The problem with that is that there is no good callsite for closing the
context as the backend is closing down. Since you are knee-deep in the
cryptohash code, do you have any thoughts on this? I've included 0008 which
implements this, with a commented out dummy stub for cleaning up.

Making nss_context static in cryptohash_nss.c is appealing but there is no
good option for closing it there. Any thoughts on how to handle global
contexts like this?

> src/tools/msvc/ is missing an update for cryptohash_nss.c.

Fixed.

--
Daniel Gustafsson https://vmware.com/
Attachment
- v23-0008-NSS-Make-the-cryptohash-NSSInitContext-static-as.patch
- v23-0007-NSS-cryptohash-support.patch
- v23-0006-NSS-contrib-modules.patch
- v23-0005-NSS-Documentation.patch
- v23-0004-NSS-pg_strong_random-support.patch
- v23-0003-NSS-Testharness-updates.patch
- v23-0002-NSS-Frontend-Backend-and-build-infrastructure.patch
- v23-0001-Introduce-with-ssl.patch
> On 29 Jan 2021, at 07:01, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jan 29, 2021 at 12:20:21AM +0100, Daniel Gustafsson wrote:
>> SSL is admittedly an obsolete technical term, but it's one that enough people
>> have decided is interchangeable with TLS that it's not a hill worth dying on
>> IMHO. Since postgres won't allow for using libnss or OpenSSL for cryptohash
>> *without* compiling SSL/TLS support (used or not), I think --with-ssl=LIB is
>> more descriptive and less confusing.
>
> Okay, let's use --with-ssl then for the new switch name. The previous
> patch is backward-compatible, and will simplify the rest of the set,
> so let's move on with it. Once this is done, my guess is that it
> would be cleaner to have a new patch that includes only the
> ./configure and MSVC changes, and then the rest: test refactoring,
> cryptohash, strong random and lastly TLS (we may want to cut this a
> bit more though and perhaps have some restrictions depending on the
> scope of options a first patch set could support).
>
> I'll wait a bit first to see if there are any objections to this
> change.

I'm still not convinced that adding --with-ssl=openssl is worth it before the
rest of NSS goes in (and more importantly, *if* it goes in).

On the one hand, we already have pluggable (for some value of) support for
adding TLS libraries, and adding --with-ssl is one more piece of that puzzle.
We could of course have endless --with-X options instead but as you say,
--with-uuid has set the tone here (and I believe that's good). On the other
hand, if we never add any other library than OpenSSL then it's just complexity
without benefit.

As mentioned elsewhere in the thread, the current v23 patchset has the
--with-ssl change as a separate commit to at least make it visual what it looks
like. The documentation changes are in the main NSS patch though since
documenting --with-ssl when there is only one possible value didn't seem to be
helpful to users who are fully expected to use --with-openssl still.

--
Daniel Gustafsson https://vmware.com/
On Fri, 2021-01-29 at 13:57 +0100, Daniel Gustafsson wrote: > > On 21 Jan 2021, at 06:21, Michael Paquier <michael@paquier.xyz> wrote: > > I really need to study more the choide of the options chosen for > > NSS_InitContext()... But based on the docs I can read on the matter I > > think that saving nsscontext in pg_cryptohash_ctx is right for each > > cryptohash built. > > It's a safe but slow option, NSS wasn't really made for running a single crypto > operation. Since we are opening a context which isn't backed by an NSS > database we could have a static context, which indeed speeds up processing a > lot. The problem with that is that there is no good callsite for closing the > context as the backend is closing down. Since you are kneedeep in the > cryptohash code, do you have any thoughts on this? I've included 0008 which > implements this, with a commented out dummy stub for cleaning up. > > Making nss_context static in cryptohash_nss.c is > appealing but there is no good option for closing it there. Any thoughts on > how to handle global contexts like this? I'm completely new to this code, so take my thoughts with a grain of salt... I think the bad news is that the static approach will need support for ENABLE_THREAD_SAFETY. (It looks like the NSS implementation of pgtls_close() needs some thread support too?) The good(?) news is that I don't understand why OpenSSL's implementation of cryptohash doesn't _also_ need the thread-safety code. (Shouldn't we need to call CRYPTO_set_locking_callback() et al before using any of its cryptohash implementation?) So maybe we can implement the same global setup/teardown API for OpenSSL too and not have to one-off it for NSS... --Jacob
On Fri, Jan 29, 2021 at 02:13:30PM +0100, Daniel Gustafsson wrote: > I'm still not convinced that adding --with-ssl=openssl is worth it before the > rest of NSS goes in (and more importantly, *if* it goes in). > > On the one hand, we already have pluggable (for some value of) support for > adding TLS libraries, and adding --with-ssl is one more piece of that puzzle. > We could of course have endless --with-X options instead but as you say, > --with-uuid has set the tone here (and I believe that's good). On the other > hand, if we never add any other library than OpenSSL then it's just complexity > without benefit. IMO, one could say the same thing for any piece of refactoring we have done in the past to make the TLS/crypto code more modular. There is demand for being able to choose among multiple SSL libs at build time, and we are still in a phase where we evaluate the options at hand. This refactoring is just careful progress, and this is one step in this direction. The piece about refactoring the SSL tests is similar. > As mentioned elsewhere in the thread, the current v23 patchset has the > --with-ssl change as a separate commit to at least make it visual what it looks > like. The documentation changes are in the main NSS patch though since > documenting --with-ssl when there is only one possible value didn't seem to be > helpful to users whom are fully expected to use --with-openssl still. The documentation changes should be part of the patch introducing the switch IMO: a description of the new switch, as well as a paragraph about the old value being deprecated. That's done this way for UUID. -- Michael
Attachment
On Fri, Jan 29, 2021 at 01:57:02PM +0100, Daniel Gustafsson wrote:
> This has been discussed elsewhere in the thread, so let's continue that there.
> The attached v23 does however split off --with-ssl for OpenSSL in 0001, adding
> the nss option in 0002.

While going through 0001, I have found a couple of things.

-CF_SRCS = $(if $(subst no,,$(with_openssl)), $(OSSL_SRCS), $(INT_SRCS))
-CF_TESTS = $(if $(subst no,,$(with_openssl)), $(OSSL_TESTS), $(INT_TESTS))
+CF_SRCS = $(if $(subst openssl,,$(with_ssl)), $(OSSL_SRCS), $(INT_SRCS))
+CF_TESTS = $(if $(subst openssl,,$(with_ssl)), $(OSSL_TESTS), $(INT_TESTS))
It seems to me that this part is inverted, i.e. here the OpenSSL files
and tests (OSSL*) would be used if with_ssl is not openssl.

-ifeq ($(with_openssl),yes)
+ifneq ($(with_ssl),no)
+OBJS += \
+	fe-secure-common.o
+endif
This split is better, good idea.

The two SSL tests still included a reference to with_openssl after
0001:
src/test/ssl/t/001_ssltests.pl:if ($ENV{with_openssl} eq 'yes')
src/test/ssl/t/002_scram.pl:if ($ENV{with_openssl} ne 'yes')

I have refreshed the docs on top to be consistent with the new
configuration, and applied it after more checks. I'll try to look in
more details at the failures with cryptohashes I found upthread.
--
Michael
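To see why the patched condition reads as inverted, it helps to model how GNU make evaluates `$(if $(subst openssl,,$(with_ssl)), A, B)`: `subst` deletes the text `openssl` from the value, and `$(if ...)` takes the first branch only when what remains is non-empty. The Python sketch below mimics just that much. The function names are made up for illustration, and the `$(filter ...)` variant is one possible correction, not necessarily what was committed.

```python
def make_subst(old: str, new: str, text: str) -> str:
    """Model of GNU make's $(subst old,new,text): plain textual replacement."""
    return text.replace(old, new)


def make_if(condition: str, then_value: str, else_value: str) -> str:
    """Model of $(if cond,then,else): true when cond is non-empty after stripping."""
    return then_value if condition.strip() else else_value


def cf_srcs_old(with_openssl: str) -> str:
    # $(if $(subst no,,$(with_openssl)), $(OSSL_SRCS), $(INT_SRCS))
    return make_if(make_subst("no", "", with_openssl), "OSSL_SRCS", "INT_SRCS")


def cf_srcs_patched(with_ssl: str) -> str:
    # $(if $(subst openssl,,$(with_ssl)), $(OSSL_SRCS), $(INT_SRCS))
    return make_if(make_subst("openssl", "", with_ssl), "OSSL_SRCS", "INT_SRCS")


def cf_srcs_filter(with_ssl: str) -> str:
    # Hypothetical alternative: $(if $(filter openssl,$(with_ssl)), ...)
    # $(filter) keeps only the whitespace-separated words equal to "openssl".
    matched = " ".join(word for word in with_ssl.split() if word == "openssl")
    return make_if(matched, "OSSL_SRCS", "INT_SRCS")


if __name__ == "__main__":
    for value in ("yes", "no"):
        print(f"with_openssl={value}: old={cf_srcs_old(value)}")
    for value in ("openssl", "no"):
        print(f"with_ssl={value}: patched={cf_srcs_patched(value)} "
              f"filter={cf_srcs_filter(value)}")
```

In this model `with_ssl=openssl` leaves an empty string after the substitution, so the patched line falls through to the internal sources, which is the inversion described above.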
Attachment
> On 20 Jan 2021, at 18:07, Jacob Champion <pchampion@vmware.com> wrote:

> To continue the Subject Common Name discussion [1] from a different
> part of the thread:
>
> Attached is a v23 version of the patchset that peels the raw Common
> Name out from a client cert's Subject. This allows the following cases
> that the OpenSSL implementation currently handles:
>
> - subjects that don't begin with a CN
> - subjects with quotable characters
> - subjects that have no CN at all

Nice, thanks for fixing this!

> Embedded NULLs are now handled in a similar manner to the OpenSSL side,
> though because this failure happens during the certificate
> authentication callback, it results in a TLS alert rather than simply
> closing the connection.

But returning SECFailure from the cert callback forces NSS to terminate the
connection immediately, doesn't it?

> For easier review of just the parts I've changed, I've also attached a
> since-v22.diff, which is part of the 0001 patch.

I confused my dev trees and forgot to include this in the v23 that I sent out
(which should've been v24), sorry about that. Attached is a v24 which is
rebased on top of today's --with-ssl commit, and now includes your changes.

Additionally I've added a shutdown callback such that we close the connection
immediately if NSS is shutting down from underneath us. I can't imagine a
scenario in which that's benign, so let's take whatever precautions we can.

I've also changed the NSS initialization in the cryptohash code to closer match
what the NSS documentation recommends for similar scenarios, but more on that
downthread where that's discussed.

--
Daniel Gustafsson https://vmware.com/
Attachment
> On 29 Jan 2021, at 19:46, Jacob Champion <pchampion@vmware.com> wrote:

> I think the bad news is that the static approach will need support for
> ENABLE_THREAD_SAFETY.

I did some more reading today and noticed that the NSS documentation (and their
sample code for doing crypto without TLS connections) says to use NSS_NoDB_Init
to perform a read-only init which doesn't require a matching close call. Now,
the docs aren't terribly clear and also seem to have gone offline from MDN,
and skimming the code isn't entirely self-explanatory, so I may well have
missed something. The v24 patchset posted changes to this and at least passes
tests with decent performance so it seems worth investigating.

> (It looks like the NSS implementation of pgtls_close() needs some thread
> support too?)

Storing the context in conn would probably be better?

> The good(?) news is that I don't understand why OpenSSL's
> implementation of cryptohash doesn't _also_ need the thread-safety
> code. (Shouldn't we need to call CRYPTO_set_locking_callback() et al
> before using any of its cryptohash implementation?) So maybe we can
> implement the same global setup/teardown API for OpenSSL too and not
> have to one-off it for NSS...

No idea here, wouldn't that impact pgcrypto as well in that case?

--
Daniel Gustafsson https://vmware.com/
> On 1 Feb 2021, at 14:25, Michael Paquier <michael@paquier.xyz> wrote: > I have refreshed the docs on top to be consistent with the new > configuration, and applied it after more checks. Thanks, I was just about to send a rebased version earlier today with the doc changes in the 0001 patch when this email landed in my inbox =) The v24 posted upthread is now rebased on top of this. > I'll try to look in more details at the failures with cryptohashes I found > upthread. Great, thanks. -- Daniel Gustafsson https://vmware.com/
On Mon, 2021-02-01 at 21:49 +0100, Daniel Gustafsson wrote: > > On 29 Jan 2021, at 19:46, Jacob Champion <pchampion@vmware.com> wrote: > > I think the bad news is that the static approach will need support for > > ENABLE_THREAD_SAFETY. > > I did some more reading today and noticed that the NSS documentation (and their > sample code for doing crypto without TLS connections) says to use NSS_NoDB_Init > to perform a read-only init which don't require a matching close call. Now, > the docs aren't terribly clear and also seems to have gone offline from MDN, > and skimming the code isn't entirelt self-explanatory, so I may well have > missed something. The v24 patchset posted changes to this and at least passes > tests with decent performance so it seems worth investigating. Nice! Not having to close helps quite a bit. (Looks like thread safety for NSS_Init was added in 3.13, so we have an absolute version floor.) > > (It looks like the NSS implementation of pgtls_close() needs some thread > > support too?) > > Storing the context in conn would probably be better? Agreed. > > The good(?) news is that I don't understand why OpenSSL's > > implementation of cryptohash doesn't _also_ need the thread-safety > > code. (Shouldn't we need to call CRYPTO_set_locking_callback() et al > > before using any of its cryptohash implementation?) So maybe we can > > implement the same global setup/teardown API for OpenSSL too and not > > have to one-off it for NSS... > > No idea here, wouldn't that impact pgcrypto as well in that case? If pgcrypto is backend-only then I don't think it should need multithreading protection; is that right? --Jacob
On Mon, 2021-02-01 at 21:49 +0100, Daniel Gustafsson wrote: > > Embedded NULLs are now handled in a similar manner to the OpenSSL side, > > though because this failure happens during the certificate > > authentication callback, it results in a TLS alert rather than simply > > closing the connection. > > But returning SECFailure from the cert callback force NSS to terminate the > connection immediately doesn't it? IIRC NSS will send the alert first, whereas our OpenSSL implementation will complete the handshake and then drop the connection. I'll rebuild with the latest and confirm. > > For easier review of just the parts I've changed, I've also attached a > > since-v22.diff, which is part of the 0001 patch. > > I confused my dev trees and missed to include this in the v23 that I sent out > (which should've been v24), sorry about that. Attached is a v24 which is > rebased on top of todays --with-ssl commit, and now includes your changes. No problem. Thanks! --Jacob
On Tue, Feb 02, 2021 at 12:42:23AM +0000, Jacob Champion wrote: > (Looks like thread safety for NSS_Init was added in 3.13, so we have an > absolute version floor.) If that's the case, I would recommend to add at least something in the section called install-requirements in the docs. > If pgcrypto is backend-only then I don't think it should need > multithreading protection; is that right? No need for it in the backend, unless there are plans to switch from processes to threads there :p libpq, ecpg and anything using them have to care about that. Worth noting that OpenSSL also has some special handling in libpq with CRYPTO_get_id_callback() and that it tracks the number of opened connections. -- Michael
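The connection-counting approach mentioned above, and the "global setup/teardown API" floated earlier in the thread, can be sketched as a lock-protected reference count: the library is initialised when the first user arrives and torn down when the last one leaves. This is only a model of the pattern, with Python standing in for the C that libpq would actually use. `SharedCryptoContext`, `init_library` and `shutdown_library` are hypothetical placeholders, not NSS or OpenSSL calls.

```python
import threading
from typing import Callable


class SharedCryptoContext:
    """Process-wide context: set up on first use, torn down after last use."""

    def __init__(self, init_library: Callable[[], None],
                 shutdown_library: Callable[[], None]) -> None:
        self._init = init_library          # placeholder for the real library init
        self._shutdown = shutdown_library  # placeholder for the real teardown
        self._lock = threading.Lock()      # serialises init/teardown across threads
        self._users = 0

    def acquire(self) -> None:
        """Register one user, initialising the library if it is the first."""
        with self._lock:
            if self._users == 0:
                self._init()
            self._users += 1

    def release(self) -> None:
        """Drop one user, shutting the library down if it was the last."""
        with self._lock:
            if self._users == 0:
                raise RuntimeError("release() without matching acquire()")
            self._users -= 1
            if self._users == 0:
                self._shutdown()


if __name__ == "__main__":
    events = []
    ctx = SharedCryptoContext(lambda: events.append("init"),
                              lambda: events.append("shutdown"))
    ctx.acquire()   # first connection: triggers init
    ctx.acquire()   # second connection: no second init
    ctx.release()
    ctx.release()   # last connection gone: triggers shutdown
    print(events)
```

A backend-only caller would not need the lock, which matches the point that only libpq, ecpg and their users have to care about threads.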
Attachment
On Tue, 2021-02-02 at 00:55 +0000, Jacob Champion wrote:
> On Mon, 2021-02-01 at 21:49 +0100, Daniel Gustafsson wrote:
> > > Embedded NULLs are now handled in a similar manner to the OpenSSL side,
> > > though because this failure happens during the certificate
> > > authentication callback, it results in a TLS alert rather than simply
> > > closing the connection.
> >
> > But returning SECFailure from the cert callback force NSS to terminate the
> > connection immediately doesn't it?
>
> IIRC NSS will send the alert first, whereas our OpenSSL implementation
> will complete the handshake and then drop the connection. I'll rebuild
> with the latest and confirm.

I wasn't able to reproduce the behavior I thought I saw before. In any
case I think the current NSS implementation for embedded NULLs will
work correctly.

> > Attached is a v24 which is
> > rebased on top of todays --with-ssl commit, and now includes your changes.

I have a v25 attached which fixes and re-enables the skipped/todo'd
client certificate and SCRAM tests. (Changes between v24 and v25 are in
since-v24.diff.) The server-cn-only database didn't have the root CA
installed to be able to verify client certificates, so I've added it.

Note that this changes the error message printed during the
invalid-root tests, because NSS is now sending the root of the chain.
So the server's issuer is considered untrusted rather than
unrecognized.

--Jacob
Attachment
On Tue, Feb 02, 2021 at 08:33:35PM +0000, Jacob Champion wrote: > Note that this changes the error message printed during the invalid- > root tests, because NSS is now sending the root of the chain. So the > server's issuer is considered untrusted rather than unrecognized. I think that it is not a good idea to attach the since-v*.diff patches into the threads. This causes the CF bot to fail in applying those patches. Could it be possible to split 0001 into two parts at least with one patch that includes the basic changes for the build and ./configure, and a second with the FE/BE changes? -- Michael
Attachment
On Thu, 2021-02-04 at 16:30 +0900, Michael Paquier wrote: > On Tue, Feb 02, 2021 at 08:33:35PM +0000, Jacob Champion wrote: > > Note that this changes the error message printed during the invalid- > > root tests, because NSS is now sending the root of the chain. So the > > server's issuer is considered untrusted rather than unrecognized. > > I think that it is not a good idea to attach the since-v*.diff patches > into the threads. This causes the CF bot to fail in applying those > patches. Ah, sorry about that. Is there an extension I can use (or lack thereof) that the CF bot will ignore, or does it scan the attachment contents? --Jacob
On Thu, Feb 04, 2021 at 06:35:28PM +0000, Jacob Champion wrote:
> Ah, sorry about that. Is there an extension I can use (or lack thereof)
> that the CF bot will ignore, or does it scan the attachment contents?

The thing is smart, but there are ways to bypass it. Here is the code:
https://github.com/macdice/cfbot/

And here are the patterns looked at:
cfbot_commitfest_rpc.py: groups = re.search('<a href="(/message-id/attachment/[^"]*\\.(diff|diff\\.gz|patch|patch\\.gz|tar\\.gz|tgz|tar\\.bz2))">', line)
--
Michael
Attachment
> On 4 Feb 2021, at 08:30, Michael Paquier <michael@paquier.xyz> wrote:

> Could it be possible to split 0001 into two parts at least with one
> patch that includes the basic changes for the build and ./configure,
> and a second with the FE/BE changes?

Attached is a new patchset where I've tried to split the patches even further
to try and separate out changes for easier review. While not a perfect split
I'm sure, and clearly only for review purposes, I do hope it helps a little.
There is one hunk in 0002 which moves some OpenSSL specific code from
underneath USE_SSL, but that's about the only non-NSS change left in this
patchset AFAICS.

Additionally, this version moves the code in the shared header to a proper .c
file shared between frontend and backend as well as performs some general
cleanup around that.

--
Daniel Gustafsson https://vmware.com/
Attachment
- v26-0010-nss-Build-infrastructure.patch
- v26-0009-nss-Support-NSS-in-cryptohash.patch
- v26-0008-nss-Support-NSS-in-sslinfo.patch
- v26-0007-nss-Support-NSS-in-pgcrypto.patch
- v26-0006-nss-Documentation.patch
- v26-0005-nss-pg_strong_random-support.patch
- v26-0004-nss-Add-NSS-specific-tests.patch
- v26-0003-Refactor-SSL-testharness-for-multiple-library.patch
- v26-0002-nss-Remove-mentions-and-infra-of-OpenSSL-being-t.patch
- v26-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
> On 4 Feb 2021, at 19:35, Jacob Champion <pchampion@vmware.com> wrote: > > On Thu, 2021-02-04 at 16:30 +0900, Michael Paquier wrote: >> On Tue, Feb 02, 2021 at 08:33:35PM +0000, Jacob Champion wrote: >>> Note that this changes the error message printed during the invalid- >>> root tests, because NSS is now sending the root of the chain. So the >>> server's issuer is considered untrusted rather than unrecognized. >> >> I think that it is not a good idea to attach the since-v*.diff patches >> into the threads. This causes the CF bot to fail in applying those >> patches. > > Ah, sorry about that. Is there an extension I can use (or lack thereof) > that the CF bot will ignore, or does it scan the attachment contents? Naming the file .patch.txt should work, and it serves the double purpose of making it extra clear that this is not a patch intended to be applied but one intended to be read for informational purposes. -- Daniel Gustafsson https://vmware.com/
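The `.patch.txt` suggestion can be checked against the pattern from cfbot_commitfest_rpc.py quoted earlier in the thread. The sketch below copies that regular expression verbatim and applies it to made-up archive links. The attachment id `12345`, the exact link layout and the file names are assumptions for illustration only; how cfbot behaves beyond this single pattern is not covered.

```python
import re

# Pattern copied from the cfbot_commitfest_rpc.py line quoted in the thread.
ATTACHMENT_RE = '<a href="(/message-id/attachment/[^"]*\\.(diff|diff\\.gz|patch|patch\\.gz|tar\\.gz|tgz|tar\\.bz2))">'


def cfbot_would_pick_up(filename: str) -> bool:
    """Return True if an archive link to `filename` matches the pattern."""
    # Hypothetical link markup; the real archive page may differ in detail.
    line = f'<a href="/message-id/attachment/12345/{filename}">{filename}</a>'
    return re.search(ATTACHMENT_RE, line) is not None


if __name__ == "__main__":
    for name in ("v26-0001-nss.patch", "since-v24.diff",
                 "since-v24.patch.txt", "since-v24.diff.txt"):
        print(name, cfbot_would_pick_up(name))
```

Because the pattern requires the closing `">` of the link immediately after one of the listed extensions, a trailing `.txt` keeps the file out of the match.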
On Tue, Feb 09, 2021 at 12:08:37AM +0100, Daniel Gustafsson wrote:
> Attached is a new patchset where I've tried to split the patches even further
> to try and separate out changes for easier review. While not a perfect split
> I'm sure, and clearly only for review purposes, I do hope it helps a little.
> There is one hunk in 0002 which moves some OpenSSL specific code from
> underneath USE_SSL, but thats about the only non-NSS change left in this
> patchset AFAICS.

I would have imagined 0010 to be either a 0001 or a 0002 :)

 }
+#endif /* USE_SSL */
+
+#ifndef USE_OPENSSL
 PQsslKeyPassHook_OpenSSL_type
 PQgetSSLKeyPassHook_OpenSSL(void)
Indeed. Let's fix that on HEAD, as an independent thing.

 errmsg("hostssl record cannot match because SSL is not supported by this build"),
- errhint("Compile with --with-ssl=openssl to use SSL connections."),
+ errhint("Compile with --with-ssl to use SSL connections."),
Actually, we could change that directly on HEAD as you suggest. This
code area is surrounded with USE_SSL so there is no need to mention
openssl at all.

-/* Support for overriding sslpassword handling with a callback. */
+/* Support for overriding sslpassword handling with a callback */
Makes sense.

 /*
  * USE_SSL code should be compiled only when compiling with an SSL
- * implementation. (Currently, only OpenSSL is supported, but we might add
- * more implementations in the future.)
+ * implementation.
  */
Fine by me as well, meaning that 0002 could just be committed as-is.

I am also looking at 0003 a bit.
--
Michael
Attachment
> On 9 Feb 2021, at 07:47, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Feb 09, 2021 at 12:08:37AM +0100, Daniel Gustafsson wrote:
>> Attached is a new patchset where I've tried to split the patches even further
>> to try and separate out changes for easier review. While not a perfect split
>> I'm sure, and clearly only for review purposes, I do hope it helps a little.
>> There is one hunk in 0002 which moves some OpenSSL specific code from
>> underneath USE_SSL, but thats about the only non-NSS change left in this
>> patchset AFAICS.
>
> I would have imagined 0010 to be either a 0001 or a 0002 :)

Well, 0010 is a 2 in binary =) Jokes aside, I just didn't want to have a patch
referencing files added by later patches in the series.

> errmsg("hostssl record cannot match because SSL is not supported by this build"),
> - errhint("Compile with --with-ssl=openssl to use SSL connections."),
> + errhint("Compile with --with-ssl to use SSL connections."),
> Actually, we could change that directly on HEAD as you suggest. This
> code area is surrounded with USE_SSL so there is no need to mention
> openssl at all.

We could, the only reason it says =openssl today is that it's the only possible
value but that's an implementation detail. Changing it now before it's shipped
anywhere means the translation will be stable even if another library is
supported.

> 0002 could just be committed as-is.

It can be, it's not the most pressing patch scope reduction but everything
helps of course.

> I am also looking at 0003 a bit.

Thanks. That patch is slightly more interesting in terms of reducing scope
here, but I also think it makes the test code a bit easier to digest when
certificate management is abstracted into the API rather than the job of the
testfile to perform.

--
Daniel Gustafsson https://vmware.com/
On Tue, Feb 09, 2021 at 10:30:52AM +0100, Daniel Gustafsson wrote:
> It can be, it's not the most pressing patch scope reduction but everything
> helps of course.

Okay. I have spent some time on this one and finished it.

> Thanks. That patch is slightly more interesting in terms of reducing scope
> here, but I also think it makes the test code a bit easier to digest when
> certificate management is abstracted into the API rather than the job of the
> testfile to perform.

That's my impression. Still, I am wondering if there could be a
different approach. I need to think more about that first..
--
Michael
> On 10 Feb 2021, at 08:23, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Feb 09, 2021 at 10:30:52AM +0100, Daniel Gustafsson wrote:
>> It can be, it's not the most pressing patch scope reduction but everything
>> helps of course.
>
> Okay. I have spent some time on this one and finished it.

Thanks, I'll post a rebased version on top of this soon.

>> Thanks. That patch is slightly more interesting in terms of reducing scope
>> here, but I also think it makes the test code a bit easier to digest when
>> certificate management is abstracted into the API rather than the job of the
>> testfile to perform.
>
> That's my impression. Still, I am wondering if there could be a
> different approach. I need to think more about that first..

Another option could be to roll SSL config into PostgresNode and expose SSL
connections to every subsystem tested with TAP. Something like:

    $node = get_new_node(..);
    $node->setup_ssl(..);
    $node->set_certificate(..);

That is a fair bit more work, but perhaps we could then more easily find
(and/or prevent) bugs like the one fixed in a45bc8a4f6495072bc48ad40a5aa03.

--
Daniel Gustafsson    https://vmware.com/
On Mon, 2020-07-20 at 15:35 +0200, Daniel Gustafsson wrote:
> This version adds support for sslinfo on NSS for most the functions.

I've poked around to see what can be done about the
unimplemented ssl_client_dn_field/ssl_issuer_field functions. There's a
nasty soup of specs to wade around in, and it's not really clear to me
which ones take precedence since they're mostly centered on LDAP.

My take on it is that OpenSSL has done its own thing here, with almost-
based-on-a-spec-but-not-quite semantics. NSS has no equivalents to many
of the field names that OpenSSL supports (e.g. "commonName"). Likewise,
OpenSSL doesn't support case-insensitivity (e.g. "cn" in addition to
"CN") as many of the relevant RFCs require. They do both support
dotted-decimal representations, so we could theoretically get feature
parity there without a huge amount of work.

For the few attributes that NSS has a public API for retrieving:
- common name
- country
- locality
- state
- organization
- domain component
- org. unit
- DN qualifier
- uid
- email address(es?)
we could hardcode the list of OpenSSL-compatible names, and just
translate manually in sslinfo. Then leave the rest up to dotted-decimal
OIDs.

Would that be desirable, or do we want this interface to be something
more generally compatible with (some as-of-yet unspecified) spec?

--Jacob
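To make the proposal above concrete, here is a minimal sketch of such a translation table, assuming sslinfo would first map an OpenSSL-style field name to a dotted-decimal OID and only then ask the TLS library for the attribute. Everything in it is hypothetical: dn_field_map, is_dotted_oid() and dn_field_to_oid() are illustration-only names that exist in neither the patch nor NSS, and the name spellings should be checked against OpenSSL's object table before anyone relies on them. Matching is deliberately case-sensitive to mirror the OpenSSL behaviour described above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: OpenSSL-style names mapped to an OID. */
typedef struct
{
    const char *long_name;      /* e.g. "commonName" */
    const char *short_name;     /* e.g. "CN" */
    const char *oid;            /* dotted-decimal form */
} dn_field_map;

/* The ten attributes listed above; spellings are assumptions to verify. */
static const dn_field_map dn_fields[] = {
    {"commonName", "CN", "2.5.4.3"},
    {"countryName", "C", "2.5.4.6"},
    {"localityName", "L", "2.5.4.7"},
    {"stateOrProvinceName", "ST", "2.5.4.8"},
    {"organizationName", "O", "2.5.4.10"},
    {"organizationalUnitName", "OU", "2.5.4.11"},
    {"dnQualifier", "dnQualifier", "2.5.4.46"},
    {"domainComponent", "DC", "0.9.2342.19200300.100.1.25"},
    {"userId", "UID", "0.9.2342.19200300.100.1.1"},
    {"emailAddress", "emailAddress", "1.2.840.113549.1.9.1"},
};

/* Return 1 if s looks like "1.2.3": digit groups separated by single dots. */
static int
is_dotted_oid(const char *s)
{
    int digits = 0;
    int dots = 0;

    for (; *s != '\0'; s++)
    {
        if (isdigit((unsigned char) *s))
            digits++;
        else if (*s == '.' && digits > 0 && s[1] != '\0')
        {
            dots++;
            digits = 0;
        }
        else
            return 0;
    }
    return dots > 0;
}

/*
 * Translate a user-supplied field name to an OID string.  Dotted-decimal
 * input is passed through untouched; unknown names yield NULL.
 */
const char *
dn_field_to_oid(const char *name)
{
    assert(name != NULL);

    if (is_dotted_oid(name))
        return name;

    for (size_t i = 0; i < sizeof(dn_fields) / sizeof(dn_fields[0]); i++)
    {
        if (strcmp(name, dn_fields[i].long_name) == 0 ||
            strcmp(name, dn_fields[i].short_name) == 0)
            return dn_fields[i].oid;
    }
    return NULL;
}
```

With a table like this, "commonName", "CN" and "2.5.4.3" would all resolve to the same attribute, while "cn" would be rejected; whether that last part is desirable is exactly the open question.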
> On 17 Feb 2021, at 02:02, Jacob Champion <pchampion@vmware.com> wrote:
> On Mon, 2020-07-20 at 15:35 +0200, Daniel Gustafsson wrote:
>> This version adds support for sslinfo on NSS for most the functions.
>
> I've poked around to see what can be done about the
> unimplemented ssl_client_dn_field/ssl_issuer_field functions. There's a
> nasty soup of specs to wade around in, and it's not really clear to me
> which ones take precedence since they're mostly centered on LDAP.

Thanks for digging!

> we could hardcode the list of OpenSSL-compatible names, and just
> translate manually in sslinfo. Then leave the rest up to dotted-decimal
> OIDs.
>
> Would that be desirable, or do we want this interface to be something
> more generally compatible with (some as-of-yet unspecified) spec?

Regardless of the approach taken, I think this sounds like something that
should be tackled in a follow-up patch if the NSS patch is merged - and
probably only as a follow-up to a patch that adds test coverage to sslinfo.
From the sounds of things we may not be able to guarantee stability across
OpenSSL versions as it is right now?

--
Daniel Gustafsson    https://vmware.com/
> On 10 Feb 2021, at 13:17, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 10 Feb 2021, at 08:23, Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Tue, Feb 09, 2021 at 10:30:52AM +0100, Daniel Gustafsson wrote:
>>> It can be, it's not the most pressing patch scope reduction but everything
>>> helps of course.
>>
>> Okay. I have spent some time on this one and finished it.
>
> Thanks, I'll post a rebased version on top of this soon.

Attached is a rebase on top of this and the recent cryptohash changes to pass
in buffer lengths to the _final function. On top of that, I fixed up and
expanded the documentation, improved SCRAM handling (by using NSS digest
operations which are better suited) and reworded and expanded comments. This
patch version is, I think, feature complete with the OpenSSL implementation.

--
Daniel Gustafsson    https://vmware.com/
Attachment
- v27-0009-nss-Build-infrastructure.patch
- v27-0008-nss-Support-NSS-in-cryptohash.patch
- v27-0007-nss-Support-NSS-in-sslinfo.patch
- v27-0006-nss-Support-NSS-in-pgcrypto.patch
- v27-0005-nss-Documentation.patch
- v27-0004-nss-pg_strong_random-support.patch
- v27-0003-nss-Add-NSS-specific-tests.patch
- v27-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v27-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Wed, 2021-02-17 at 22:19 +0100, Daniel Gustafsson wrote:
> > On 17 Feb 2021, at 02:02, Jacob Champion <pchampion@vmware.com> wrote:
> > Would that be desirable, or do we want this interface to be something
> > more generally compatible with (some as-of-yet unspecified) spec?
>
> Regardless of approach taken I think this sounds like something that should be
> tackled in a follow-up patch if the NSS patch is merged - and probably only as
> a follow-up to a patch that adds test coverage to sslinfo.

Sounds good, and +1 to adding coverage at the same time.

> From the sounds of
> things me may not be able to guarantee stability across OpenSSL versions as it
> is right now?

Yeah. I was going to write that OpenSSL would be unlikely to change
these once they're added for the first time, but after checking GitHub
it looks like they have done so recently [1], as part of a patch
release no less.

--Jacob

[1] https://github.com/openssl/openssl/pull/10029
On Wed, 2021-02-17 at 22:35 +0100, Daniel Gustafsson wrote:
> Attached is a rebase on top of this and the recent cryptohash changes to pass
> in buffer lengths to the _final function. On top of that, I fixed up and
> expanded the documentation, improved SCRAM handling (by using NSS digest
> operations which are better suited) and reworded and expanded comments. This
> patch version is, I think, feature complete with the OpenSSL implementation.

fe-secure-nss.c is no longer compiling as of this patchset; looks like
pgtls_open_client() has a truncated statement.

--Jacob
Greetings, * Daniel Gustafsson (daniel@yesql.se) wrote: > Attached is a rebase which attempts to fix the cfbot Appveyor failure, there > were missing HAVE_ defines for MSVC. > Subject: [PATCH v30 1/9] nss: Support libnss as TLS library in libpq > > This commit contains the frontend and backend portion of TLS support > in libpq to allow encrypted connections. The implementation is done maybe add 'using NSS' to that first sentence. ;) > +++ b/src/backend/libpq/auth.c > @@ -2849,7 +2849,14 @@ CheckCertAuth(Port *port) > { > int status_check_usermap = STATUS_ERROR; > > +#if defined(USE_OPENSSL) > Assert(port->ssl); > +#elif defined(USE_NSS) > + /* TODO: should we rename pr_fd to ssl, to keep consistency? */ > + Assert(port->pr_fd); > +#else > + Assert(false); > +#endif Having thought about this TODO item for a bit, I tend to think it's better to keep them distinct. They aren't the same and it might not be clear what's going on if one was to somehow mix them (at least if pr_fd continues to sometimes be a void*, but I wonder why that's being done..? more on that later..). > +++ b/src/backend/libpq/be-secure-nss.c [...] > +/* default init hook can be overridden by a shared library */ > +static void default_nss_tls_init(bool isServerStart); > +nss_tls_init_hook_type nss_tls_init_hook = default_nss_tls_init; > +static PRDescIdentity pr_id; > + > +static PRIOMethods pr_iomethods; Happy to be told I'm missing something, but the above two variables seem to only be used in init_iolayer.. is there a reason they're declared here instead of just being declared in that function? > + /* > + * Set the fallback versions for the TLS protocol version range to a > + * combination of our minimal requirement and the library maximum. Error > + * messages should be kept identical to those in be-secure-openssl.c to > + * make translations easier. 
> + */ Should we pull these error messages out into another header so that they're in one place to make sure they're kept consistent, if we really want to put the effort in to keep them the same..? I'm not 100% sure that it's actually necessary to do so, but defining these in one place would help maintain this if we want to. Also alright with just keeping the comment, not that big of a deal. > +int > +be_tls_open_server(Port *port) > +{ > + SECStatus status; > + PRFileDesc *model; > + PRFileDesc *pr_fd; pr_fd here is materially different from port->pr_fd, no? As in, one is the NSS raw TCP fd while the other is the SSL fd, right? Maybe we should use two different variable names to try and make sure they don't get confused? Might even set this to NULL after we are done with it too.. Then again, I see later on that when we do the dance with the 'model' PRFileDesc that we just use the same variable- maybe we should do that? That is, just get rid of this 'pr_fd' and use port->pr_fd always? > + /* > + * The NSPR documentation states that runtime initialization via PR_Init > + * is no longer required, as the first caller into NSPR will perform the > + * initialization implicitly. The documentation doesn't however clarify > + * from which version this is holds true, so let's perform the potentially > + * superfluous initialization anyways to avoid crashing on older versions > + * of NSPR, as there is no difference in overhead. The NSS documentation > + * still states that PR_Init must be called in some way (implicitly or > + * explicitly). > + * > + * The below parameters are what the implicit initialization would've done > + * for us, and should work even for older versions where it might not be > + * done automatically. The last parameter, maxPTDs, is set to various > + * values in other codebases, but has been unused since NSPR 2.1 which was > + * released sometime in 1998. In current versions of NSPR all parameters > + * are ignored. 
> + */ > + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ ); > + > + /* > + * The certificate path (configdir) must contain a valid NSS database. If > + * the certificate path isn't a valid directory, NSS will fall back on the > + * system certificate database. If the certificate path is a directory but > + * is empty then the initialization will fail. On the client side this can > + * be allowed for any sslmode but the verify-xxx ones. > + * https://bugzilla.redhat.com/show_bug.cgi?id=728562 For the server side > + * we won't allow this to fail however, as we require the certificate and > + * key to exist. > + * > + * The original design of NSS was for a single application to use a single > + * copy of it, initialized with NSS_Initialize() which isn't returning any > + * handle with which to refer to NSS. NSS initialization and shutdown are > + * global for the application, so a shutdown in another NSS enabled > + * library would cause NSS to be stopped for libpq as well. The fix has > + * been to introduce NSS_InitContext which returns a context handle to > + * pass to NSS_ShutdownContext. NSS_InitContext was introduced in NSS > + * 3.12, but the use of it is not very well documented. > + * https://bugzilla.redhat.com/show_bug.cgi?id=738456 The above seems to indicate that we will be requiring at least 3.12, right? Yet above we have code to work with NSPR versions before 2.1? Maybe we should put a stake in the ground that says "we only support back to version X of NSS", test with that and a few more recent versions and the most recent, and then rip out anything that's needed for versions which are older than that? I have a pretty hard time imagining that someone is going to want to build PG v14 w/ NSS 2.0 ... 
> + { > + char *ciphers, > + *c; > + > + char *sep = ":;, "; > + PRUint16 ciphercode; > + const PRUint16 *nss_ciphers; > + > + /* > + * If the user has specified a set of preferred cipher suites we start > + * by turning off all the existing suites to avoid the risk of down- > + * grades to a weaker cipher than expected. > + */ > + nss_ciphers = SSL_GetImplementedCiphers(); > + for (int i = 0; i < SSL_GetNumImplementedCiphers(); i++) > + SSL_CipherPrefSet(model, nss_ciphers[i], PR_FALSE); > + > + ciphers = pstrdup(SSLCipherSuites); > + > + for (c = strtok(ciphers, sep); c; c = strtok(NULL, sep)) > + { > + if (!pg_find_cipher(c, &ciphercode)) > + { > + status = SSL_CipherPrefSet(model, ciphercode, PR_TRUE); > + if (status != SECSuccess) > + { > + ereport(COMMERROR, > + (errmsg("invalid cipher-suite specified: %s", c))); > + return -1; > + } > + } > + } Maybe I'm a bit confused, but doesn't pg_find_cipher return *true* when a cipher is found, and therefore the '!' above is saying "if we don't find a matching cipher, then run the code to set the cipher ...". Also- we don't seem to complain at all about a cipher being specified that we don't find? Guess I would think that we might want to throw a WARNING in such a case, but I could possibly be convinced otherwise. Kind of wonder just what happens with the current code, I'm guessing ciphercode is zero and therefore doesn't complain but also doesn't do what we want. I wonder if there's a way to test this? I do think we should probably throw an error if we end up with *no* ciphers being set, which doesn't seem to be happening here..? > + /* > + * Set up the custom IO layer. > + */ Might be good to mention that the IO Layer is what sets up the read/write callbacks to be used. > + port->pr_fd = SSL_ImportFD(model, pr_fd); > + if (!port->pr_fd) > + { > + ereport(COMMERROR, > + (errmsg("unable to initialize"))); > + return -1; > + } Maybe a comment and a better error message for this? 
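The two concerns about the cipher loop (the inverted pg_find_cipher() test and the missing error when nothing ends up enabled) can be sketched without NSS. This is an illustration under stated assumptions, not code from the patch: find_cipher() stands in for pg_find_cipher() and is assumed to return true on a match, enable_ciphers() is a hypothetical helper, and the real code would call SSL_CipherPrefSet() at the point where the sketch stores the code in an array. The three table entries use the IANA code points for those suites.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    const char *name;
    uint16_t    code;
} cipher_entry;

/* Stand-in for the lookup table in cipher_nss.c (hypothetical subset). */
static const cipher_entry known_ciphers[] = {
    {"TLS_RSA_WITH_AES_128_GCM_SHA256", 0x009C},
    {"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", 0xC02F},
    {"TLS_AES_128_GCM_SHA256", 0x1301},
};

/* Assumed contract: returns true and sets *code when the name is known. */
static bool
find_cipher(const char *name, uint16_t *code)
{
    for (size_t i = 0; i < sizeof(known_ciphers) / sizeof(known_ciphers[0]); i++)
    {
        if (strcmp(name, known_ciphers[i].name) == 0)
        {
            *code = known_ciphers[i].code;
            return true;
        }
    }
    return false;
}

/*
 * Parse a separator-delimited cipher list.  Known names are recorded in
 * enabled[]; unknown names are counted so the caller can warn about them.
 * Returns the number of ciphers enabled, or -1 if none were (or on OOM).
 */
int
enable_ciphers(const char *list, uint16_t *enabled, int max_enabled,
               int *n_unknown)
{
    const char *sep = ":;, ";
    size_t      len;
    char       *copy;
    int         n = 0;

    assert(list != NULL && enabled != NULL && n_unknown != NULL);
    *n_unknown = 0;

    /* strtok() modifies its input, so work on a private copy */
    len = strlen(list) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, list, len);

    for (char *tok = strtok(copy, sep); tok != NULL; tok = strtok(NULL, sep))
    {
        uint16_t    code;

        if (find_cipher(tok, &code))    /* note: no negation here */
        {
            if (n < max_enabled)
                enabled[n++] = code;    /* real code: SSL_CipherPrefSet() */
        }
        else
            (*n_unknown)++;             /* candidate for a WARNING */
    }

    free(copy);

    /* an empty result means the setting cannot be honoured at all */
    return (n > 0) ? n : -1;
}
```

Whether an unknown name should produce a WARNING or a hard error is left to the caller here, since that choice is still open in the review.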
> + PR_Close(model); This might deserve one also, the whole 'model' construct is a bit different. :) > + port->ssl_in_use = true; > + > + /* Register out shutdown callback */ *our > +int > +be_tls_get_cipher_bits(Port *port) > +{ > + SECStatus status; > + SSLChannelInfo channel; > + SSLCipherSuiteInfo suite; > + > + status = SSL_GetChannelInfo(port->pr_fd, &channel, sizeof(channel)); > + if (status != SECSuccess) > + goto error; > + > + status = SSL_GetCipherSuiteInfo(channel.cipherSuite, &suite, sizeof(suite)); > + if (status != SECSuccess) > + goto error; > + > + return suite.effectiveKeyBits; > + > +error: > + ereport(WARNING, > + (errmsg("unable to extract TLS session information: %s", > + pg_SSLerrmessage(PR_GetError())))); > + return 0; > +} It doesn't have to be much, but I, at least, do prefer to see function-header comments. :) Not that the OpenSSL code has them consistently, so obviously not that big of a deal. Goes for a number of the functions being added. > + /* Found a CN, ecode and copy it into a newly allocated buffer */ *decode > +static PRInt32 > +pg_ssl_read(PRFileDesc *fd, void *buf, PRInt32 amount, PRIntn flags, > + PRIntervalTime timeout) > +{ > + PRRecvFN read_fn; > + PRInt32 n_read; > + > + read_fn = fd->lower->methods->recv; > + n_read = read_fn(fd->lower, buf, amount, flags, timeout); > + > + return n_read; > +} > + > +static PRInt32 > +pg_ssl_write(PRFileDesc *fd, const void *buf, PRInt32 amount, PRIntn flags, > + PRIntervalTime timeout) > +{ > + PRSendFN send_fn; > + PRInt32 n_write; > + > + send_fn = fd->lower->methods->send; > + n_write = send_fn(fd->lower, buf, amount, flags, timeout); > + > + return n_write; > +} > + > +static PRStatus > +pg_ssl_close(PRFileDesc *fd) > +{ > + /* > + * Disconnect our private Port from the fd before closing out the stack. > + * (Debug builds of NSPR will assert if we do not.) 
> + */ > + fd->secret = NULL; > + return PR_GetDefaultIOMethods()->close(fd); > +} Regarding these, I find myself wondering how they're different from the defaults..? I mean, the above just directly called PR_GetDefaultIOMethods() to then call its close() function; are the fd->lower->methods->recv/send not the default methods? I don't quite get what the point is of having our own callbacks here if they just do exactly what the defaults would do (or are there actually no defined defaults and you have to provide these..?). > +/* > + * ssl_protocol_version_to_nss > + * Translate PostgreSQL TLS version to NSS version > + * > + * Returns zero in case the requested TLS version is undefined (PG_ANY) and > + * should be set by the caller, or -1 on failure. > + */ > +static uint16 > +ssl_protocol_version_to_nss(int v, const char *guc_name) guc_name isn't actually used in this function..? Is there some reason to keep it or is it leftover? Also, I get that they do similar jobs and that one is in the frontend and the other is in the backend, but I'm not a fan of having two ssl_protocol_version_to_nss() functions that take different argument types but have the exact same name and do functionally different things.. > +++ b/src/backend/utils/misc/guc.c > @@ -4377,6 +4381,18 @@ static struct config_string ConfigureNamesString[] = > check_canonical_path, assign_pgstat_temp_directory, NULL > }, > > +#ifdef USE_NSS > + { > + {"ssl_database", PGC_SIGHUP, CONN_AUTH_SSL, > + gettext_noop("Location of the NSS certificate database."), > + NULL > + }, > + &ssl_database, > + "", > + NULL, NULL, NULL > + }, > +#endif We don't #ifdef out the various GUCs even if SSL isn't compiled in, so it doesn't seem quite right to be doing so here?
Generally speaking, GUCs that we expect people to use (rather than debugging ones and such) are typically always built, even if we don't build support for that capability, so we can throw a better error message than just some ugly syntax or parsing error if we come across one being set in a non-enabled build. > +++ b/src/common/cipher_nss.c > @@ -0,0 +1,192 @@ > +/*------------------------------------------------------------------------- > + * > + * cipher_nss.c > + * NSS functionality shared between frontend and backend for working > + * with ciphers > + * > + * This should only bse used if code is compiled with NSS support. *be > +++ b/src/include/libpq/libpq-be.h > @@ -200,6 +200,10 @@ typedef struct Port > SSL *ssl; > X509 *peer; > #endif > + > +#ifdef USE_NSS > + void *pr_fd; > +#endif > } Port; Given this is under a #ifdef USE_NSS, does it need to be / should it really be a void*? > +++ b/src/interfaces/libpq/fe-connect.c > @@ -359,6 +359,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = { > "Target-Session-Attrs", "", 15, /* sizeof("prefer-standby") = 15 */ > offsetof(struct pg_conn, target_session_attrs)}, > > + {"cert_database", NULL, NULL, NULL, > + "CertificateDatabase", "", 64, > + offsetof(struct pg_conn, cert_database)}, I mean, maybe nitpicking here, but all the other SSL stuff is 'sslsomething' and the backend version of this is 'ssl_database', so wouldn't it be more consistent to have this be 'ssldatabase'? > +++ b/src/interfaces/libpq/fe-secure-nss.c > + * This logic exist in NSS as well, but it's only available for when there is *exists > + /* > + * The NSPR documentation states that runtime initialization via PR_Init > + * is no longer required, as the first caller into NSPR will perform the > + * initialization implicitly. See be-secure-nss.c for further discussion > + * on PR_Init. 
> + */ > + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0); See the same comment I made above; also, there's a comment earlier in this file saying that we don't even need to PR_Init() ... > + { > + conn->nss_context = NSS_InitContext("", "", "", "", &params, > + NSS_INIT_READONLY | NSS_INIT_NOCERTDB | > + NSS_INIT_NOMODDB | NSS_INIT_FORCEOPEN | > + NSS_INIT_NOROOTINIT | NSS_INIT_PK11RELOAD); > + if (!conn->nss_context) > + { > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("unable to create certificate database: %s"), > + pg_SSLerrmessage(PR_GetError())); > + return PGRES_POLLING_FAILED; > + } > + } That error message seems a bit ... off? Surely we aren't trying to actually create a certificate database here? > + /* > + * Configure cipher policy. > + */ > + status = NSS_SetDomesticPolicy(); > + if (status != SECSuccess) > + { > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("unable to configure cipher policy: %s"), > + pg_SSLerrmessage(PR_GetError())); > + > + return PGRES_POLLING_FAILED; > + } Probably good to pull over at least some parts of the comments made in the backend code about SetDomesticPolicy() actually enabling everything (just like all the policies apparently do)... > + /* > + * If we don't have a certificate database, the system trust store is the > + * fallback we can use. If we fail to initialize that as well, we can > + * still attempt a connection as long as the sslmode isn't verify*. > + */ > + if (!conn->cert_database && conn->sslmode[0] == 'v') > + { > + status = pg_load_nss_module(&ca_trust, ca_trust_name, "\"Root Certificates\""); > + if (status != SECSuccess) > + { > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("WARNING: unable to load NSS trust module \"%s\" : %s"), > + ca_trust_name, > + pg_SSLerrmessage(PR_GetError())); > + > + return PGRES_POLLING_FAILED; > + } > + } Maybe have something a bit more here about "maybe you should specify a cert_database" or such?
> + if (conn->ssl_max_protocol_version && strlen(conn->ssl_max_protocol_version) > 0) > + { > + int ssl_max_ver = ssl_protocol_version_to_nss(conn->ssl_max_protocol_version); > + > + if (ssl_max_ver == -1) > + { > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("invalid value \"%s\" for maximum version of SSL protocol\n"), > + conn->ssl_max_protocol_version); > + return -1; > + } > + > + desired_range.max = ssl_max_ver; > + } In the backend code, we have an additional check to make sure they didn't set the min version higher than the max.. should we have that here too? Either way, seems like we should be consistent. > + * The model can now we closed as we've applied the settings of the model *be > + * onto the real socket. From hereon we should only use conn->pr_fd. *here on Similar comments to the backend code- should we just always use conn->pr_fd? Or should we rename pr_fd to something else? > + /* > + * Specify which hostname we are expecting to talk to. This is required, > + * albeit mostly applies to when opening a connection to a traditional > + * http server it seems. > + */ > + SSL_SetURL(conn->pr_fd, (conn->connhost[conn->whichhost]).host); We should probably also set SNI, if available (NSS 3.12.6 it seems?), since it looks like that's going to be added to the OpenSSL code. > + do > + { > + status = SSL_ForceHandshake(conn->pr_fd); > + } > + while (status != SECSuccess && PR_GetError() == PR_WOULD_BLOCK_ERROR); We don't seem to have this loop in the backend code.. Is there some reason that we don't? Is it possible that we need to have a loop here too? I recall in the GSS encryption code there were definitely things during setup that had to be looped back over on both sides to make sure everything was finished ... > + if (conn->sslmode[0] == 'v') > + return SECFailure; Seems a bit grotty to do this (though I see that the OpenSSL code does too ... at least there we have a comment though, maybe add one here?). 
I would have thought we'd actually do strcmp()'s like above. > + /* > + * Return the underlying PRFileDesc which can be used to access > + * information on the connection details. There is no SSL context per se. > + */ > + if (strcmp(struct_name, "NSS") == 0) > + return conn->pr_fd; > + return NULL; > +} Is there never a reason someone might want the pointer returned by NSS_InitContext? I don't know that there is but it might be something to consider (we could even possibly have our own structure returned by this function which includes both, maybe..?). Not sure if there's a sensible use-case for that or not; I just wanted to bring it up as it's something I asked myself while reading through this patch. > + if (strcmp(attribute_name, "protocol") == 0) > + { > + switch (channel.protocolVersion) > + { > +#ifdef SSL_LIBRARY_VERSION_TLS_1_3 > + case SSL_LIBRARY_VERSION_TLS_1_3: > + return "TLSv1.3"; > +#endif > +#ifdef SSL_LIBRARY_VERSION_TLS_1_2 > + case SSL_LIBRARY_VERSION_TLS_1_2: > + return "TLSv1.2"; > +#endif > +#ifdef SSL_LIBRARY_VERSION_TLS_1_1 > + case SSL_LIBRARY_VERSION_TLS_1_1: > + return "TLSv1.1"; > +#endif > + case SSL_LIBRARY_VERSION_TLS_1_0: > + return "TLSv1.0"; > + default: > + return "unknown"; > + } > + } Not sure that it really matters, but this seems like it might be useful to have as its own function... Maybe even a data structure that both functions use just in opposite directions. Really minor tho.
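The "one data structure used in both directions" idea, together with the min/max sanity check mentioned earlier for the frontend, could look roughly like the sketch below. It is a standalone illustration and an assumption throughout: the VER_TLS_* macros stand in for NSS's SSL_LIBRARY_VERSION_TLS_* constants (defined locally so the block builds without NSS headers; the values are the TLS wire versions), and tls_version_name(), tls_version_from_name() and tls_range_valid() are hypothetical helpers, not functions from the patch. The strings are the ones in the quoted switch; whether the parsing side spells TLS 1.0 the same way is not stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for SSL_LIBRARY_VERSION_TLS_1_x (assumed equal). */
#define VER_TLS_1_0 0x0301
#define VER_TLS_1_1 0x0302
#define VER_TLS_1_2 0x0303
#define VER_TLS_1_3 0x0304

typedef struct
{
    const char *name;
    uint16_t    version;
} tls_version_map;

/* Single source of truth, consulted for both lookups below. */
static const tls_version_map tls_versions[] = {
    {"TLSv1.0", VER_TLS_1_0},
    {"TLSv1.1", VER_TLS_1_1},
    {"TLSv1.2", VER_TLS_1_2},
    {"TLSv1.3", VER_TLS_1_3},
};

#define N_TLS_VERSIONS (sizeof(tls_versions) / sizeof(tls_versions[0]))

/* Library version to display name, "unknown" if not in the table. */
const char *
tls_version_name(uint16_t version)
{
    for (size_t i = 0; i < N_TLS_VERSIONS; i++)
    {
        if (tls_versions[i].version == version)
            return tls_versions[i].name;
    }
    return "unknown";
}

/* Display name to library version, or -1 if the name is not recognised. */
int
tls_version_from_name(const char *name)
{
    assert(name != NULL);

    for (size_t i = 0; i < N_TLS_VERSIONS; i++)
    {
        if (strcmp(tls_versions[i].name, name) == 0)
            return tls_versions[i].version;
    }
    return -1;
}

/*
 * Range check: 0 means "not set" for either bound, in which case there is
 * nothing to compare.  Otherwise the minimum must not exceed the maximum.
 */
int
tls_range_valid(int min_version, int max_version)
{
    if (min_version == 0 || max_version == 0)
        return 1;
    return min_version <= max_version;
}
```

A design note on the sketch: keeping the table static and const means adding a future protocol version touches one line, and the two lookup functions cannot drift apart the way two hand-written switches can.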
:) > diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c > index c601071838..7f10da3010 100644 > --- a/src/interfaces/libpq/fe-secure.c > +++ b/src/interfaces/libpq/fe-secure.c > @@ -448,6 +448,27 @@ PQdefaultSSLKeyPassHook_OpenSSL(char *buf, int size, PGconn *conn) > } > #endif /* USE_OPENSSL */ > > +#ifndef USE_NSS > + > +PQsslKeyPassHook_nss_type > +PQgetSSLKeyPassHook_nss(void) > +{ > + return NULL; > +} > + > +void > +PQsetSSLKeyPassHook_nss(PQsslKeyPassHook_nss_type hook) > +{ > + return; > +} > + > +char * > +PQdefaultSSLKeyPassHook_nss(PK11SlotInfo * slot, PRBool retry, void *arg) > +{ > + return NULL; > +} > +#endif /* USE_NSS */ Isn't this '!USE_NSS'? > diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h > index 0c9e95f1a7..f15af39222 100644 > --- a/src/interfaces/libpq/libpq-int.h > +++ b/src/interfaces/libpq/libpq-int.h > @@ -383,6 +383,7 @@ struct pg_conn > char *sslrootcert; /* root certificate filename */ > char *sslcrl; /* certificate revocation list filename */ > char *sslcrldir; /* certificate revocation list directory name */ > + char *cert_database; /* NSS certificate/key database */ > char *requirepeer; /* required peer credentials for local sockets */ > char *gssencmode; /* GSS mode (require,prefer,disable) */ > char *krbsrvname; /* Kerberos service name */ > @@ -507,6 +508,28 @@ struct pg_conn > * OpenSSL version changes */ > #endif > #endif /* USE_OPENSSL */ > + > +/* > + * The NSS/NSPR specific types aren't used to avoid pulling in the required > + * headers here, as they are causing conflicts with PG definitions. > + */ I'm a bit confused- what are the conflicts being caused here..? Certainly under USE_OPENSSL we use the actual OpenSSL types.. > Subject: [PATCH v30 2/9] Refactor SSL testharness for multiple library > > The SSL testharness was fully tied to OpenSSL in the way the server was > set up and reconfigured. 
This refactors the SSLServer module into a SSL > library agnostic SSL/Server module which in turn use SSL/Backend/<lib> > modules for the implementation details. > > No changes are done to the actual tests, this only change how setup and > teardown is performed. Presumably this could be committed ahead of the main NSS support? > Subject: [PATCH v30 4/9] nss: pg_strong_random support > +++ b/src/port/pg_strong_random.c > +bool > +pg_strong_random(void *buf, size_t len) > +{ > + NSSInitParameters params; > + NSSInitContext *nss_context; > + SECStatus status; > + > + memset(&params, 0, sizeof(params)); > + params.length = sizeof(params); > + nss_context = NSS_InitContext("", "", "", "", &params, > + NSS_INIT_READONLY | NSS_INIT_NOCERTDB | > + NSS_INIT_NOMODDB | NSS_INIT_FORCEOPEN | > + NSS_INIT_NOROOTINIT | NSS_INIT_PK11RELOAD); > + > + if (!nss_context) > + return false; > + > + status = PK11_GenerateRandom(buf, len); > + NSS_ShutdownContext(nss_context); > + > + if (status == SECSuccess) > + return true; > + > + return false; > +} > + > +#else /* not USE_OPENSSL, USE_NSS or WIN32 */ I don't know that it's an issue, but do we actually need to init the NSS context and shut it down every time..? > /* > * Without OpenSSL or Win32 support, just read /dev/urandom ourselves. *or NSS > Subject: [PATCH v30 5/9] nss: Documentation > +++ b/doc/src/sgml/acronyms.sgml > @@ -684,6 +717,16 @@ > </listitem> > </varlistentry> > > + <varlistentry> > + <term><acronym>TLS</acronym></term> > + <listitem> > + <para> > + <ulink url="https://en.wikipedia.org/wiki/Transport_Layer_Security"> > + Transport Layer Security</ulink> > + </para> > + </listitem> > + </varlistentry> We don't have this already..? Surely we should..
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml > index 967de73596..1608e9a7c7 100644 > --- a/doc/src/sgml/config.sgml > +++ b/doc/src/sgml/config.sgml > @@ -1272,6 +1272,23 @@ include_dir 'conf.d' > </listitem> > </varlistentry> > > + <varlistentry id="guc-ssl-database" xreflabel="ssl_database"> > + <term><varname>ssl_database</varname> (<type>string</type>) > + <indexterm> > + <primary><varname>ssl_database</varname> configuration parameter</primary> > + </indexterm> > + </term> > + <listitem> > + <para> > + Specifies the name of the file containing the server certificates and > + keys when using <productname>NSS</productname> for <acronym>SSL</acronym> > + connections. This parameter can only be set in the > + <filename>postgresql.conf</filename> file or on the server command > + line. *SSL/TLS maybe? > @@ -1288,7 +1305,9 @@ include_dir 'conf.d' > connections using TLS version 1.2 and lower are affected. There is > currently no setting that controls the cipher choices used by TLS > version 1.3 connections. The default value is > - <literal>HIGH:MEDIUM:+3DES:!aNULL</literal>. The default is usually a > + <literal>HIGH:MEDIUM:+3DES:!aNULL</literal> for servers which have > + been built with <productname>OpenSSL</productname> as the > + <acronym>SSL</acronym> library. The default is usually a > reasonable choice unless you have specific security requirements. > </para> Shouldn't we say something here wrt NSS? > @@ -1490,8 +1509,11 @@ include_dir 'conf.d' > <para> > Sets an external command to be invoked when a passphrase for > decrypting an SSL file such as a private key needs to be obtained. By > - default, this parameter is empty, which means the built-in prompting > - mechanism is used. > + default, this parameter is empty. When the server is using > + <productname>OpenSSL</productname>, this means the built-in prompting > + mechanism is used. 
When using <productname>NSS</productname>, there is > + no default prompting so a blank callback will be used returning an > + empty password. > </para> Maybe we should point out here that this requires the database to not require a password..? So if they have one, they need to set this, or maybe we should provide a default one.. > +++ b/doc/src/sgml/libpq.sgml > +<synopsis> > +PQsslKeyPassHook_nss_type PQgetSSLKeyPassHook_nss(void); > +</synopsis> > + </para> > + > + <para> > + <function>PQgetSSLKeyPassHook_nss</function> has no effect unless the > + server was compiled with <productname>nss</productname> support. > + </para> We should try to be consistent- above should be NSS, not nss. > + <listitem> > + <para> > + <productname>NSS</productname>: specifying the parameter is required > + in case any password protected items are referenced in the > + <productname>NSS</productname> database, or if the database itself > + is password protected. If multiple different objects are password > + protected, the same password is used for all. > + </para> > + </listitem> > + </itemizedlist> Is this a statement about NSS databases (which I don't think it is) or about the fact that we'll just use the password provided for all attempts to decrypt something we need in the database? Assuming the latter, seems like we could reword this to be a bit more clear. Maybe: All attempts to decrypt objects which are password protected in the database will use this password. ? > @@ -2620,9 +2791,14 @@ void *PQsslStruct(const PGconn *conn, const char *struct_name); > + For <productname>NSS</productname>, there is one struct available under > + the name "NSS", and it returns a pointer to the > + <productname>NSS</productname> <literal>PRFileDesc</literal>. ... SSL PRFileDesc associated with the connection, no? 
> +++ b/doc/src/sgml/runtime.sgml > @@ -2552,6 +2583,89 @@ openssl x509 -req -in server.csr -text -days 365 \ > </para> > </sect2> > > + <sect2 id="nss-certificate-database"> > + <title>NSS Certificate Databases</title> > + > + <para> > + When using <productname>NSS</productname>, all certificates and keys must > + be loaded into an <productname>NSS</productname> certificate database. > + </para> > + > + <para> > + To create a new <productname>NSS</productname> certificate database and > + load the certificates created in <xref linkend="ssl-certificate-creation" />, > + use the following <productname>NSS</productname> commands: > +<programlisting> > +certutil -d "sql:server.db" -N --empty-password > +certutil -d "sql:server.db" -A -n server.crt -i server.crt -t "CT,C,C" > +certutil -d "sql:server.db" -A -n root.crt -i root.crt -t "CT,C,C" > +</programlisting> > + This will give the certificate the filename as the nickname identifier in > + the database which is created as <filename>server.db</filename>. > + </para> > + <para> > + Then load the server key, which require converting it to *requires > Subject: [PATCH v30 6/9] nss: Support NSS in pgcrypto > +++ b/doc/src/sgml/pgcrypto.sgml > <row> > <entry>Blowfish</entry> > <entry>yes</entry> > <entry>yes</entry> > + <entry>yes</entry> > </row> Maybe this should mention that it's with the built-in implementation as blowfish isn't available from NSS? > <row> > <entry>DES/3DES/CAST5</entry> > <entry>no</entry> > <entry>yes</entry> > + <entry>yes</entry> > + </row> Surely CAST5 from the above should be removed, since it's given its own entry now? > @@ -1241,7 +1260,8 @@ gen_random_uuid() returns uuid > <orderedlist> > <listitem> > <para> > - Any digest algorithm <productname>OpenSSL</productname> supports > + Any digest algorithm <productname>OpenSSL</productname> and > + <productname>NSS</productname> supports > is automatically picked up. *or? 
Maybe something more specific though- "Any digest algorithm included with the library that PostgreSQL is compiled with is automatically picked up." ? > Subject: [PATCH v30 7/9] nss: Support NSS in sslinfo > > Since sslinfo to a large extent use the be_tls_* API this mostly *uses > Subject: [PATCH v30 8/9] nss: Support NSS in cryptohash > +++ b/src/common/cryptohash_nss.c > + /* > + * Initialize our own NSS context without a database backing it. > + */ > + memset(&params, 0, sizeof(params)); > + params.length = sizeof(params); > + status = NSS_NoDB_Init("."); We take some pains to use NSS_InitContext elsewhere.. Are we sure that we should be using NSS_NoDB_Init here..? Just a, well, not so quick read-through. Generally it's looking pretty good to me. Will see about playing with it this week. Thanks! Stephen
> On 22 Mar 2021, at 00:49, Stephen Frost <sfrost@snowman.net> wrote: > > Greetings, Thanks for the review! Below is a partial response, I haven't had time to address all your review comments yet but I wanted to submit a rebased patchset directly since the current version doesn't work after recent changes in the tree. I will address the remaining comments tomorrow or the day after. This rebase also includes a fix for pgtls_init which was sent offlist by Jacob. The changes in pgtls_init can potentially be used to initialize the crypto context for NSS to clean up this patch, Jacob is currently looking at that. >> Subject: [PATCH v30 1/9] nss: Support libnss as TLS library in libpq >> >> This commit contains the frontend and backend portion of TLS support >> in libpq to allow encrypted connections. The implementation is done > > maybe add 'using NSS' to that first sentence. ;) Fixed. >> +++ b/src/backend/libpq/auth.c >> @@ -2849,7 +2849,14 @@ CheckCertAuth(Port *port) >> { >> int status_check_usermap = STATUS_ERROR; >> >> +#if defined(USE_OPENSSL) >> Assert(port->ssl); >> +#elif defined(USE_NSS) >> + /* TODO: should we rename pr_fd to ssl, to keep consistency? */ >> + Assert(port->pr_fd); >> +#else >> + Assert(false); >> +#endif > > Having thought about this TODO item for a bit, I tend to think it's > better to keep them distinct. I agree, which is why the TODO comment was there in the first place. I've removed the comment now. > They aren't the same and it might not be > clear what's going on if one was to somehow mix them (at least if pr_fd > continues to sometimes be a void*, but I wonder why that's being > done..? more on that later..). To paraphrase from later in this email, there are collisions between nspr and postgres on things like BITS_PER_BYTE, and there were also collisions on basic types until I learned about NO_NSPR_10_SUPPORT. By moving the juggling of this into common/nss.h we can use proper types without introducing that pollution everywhere.
I will address these places. >> +++ b/src/backend/libpq/be-secure-nss.c > [...] >> +/* default init hook can be overridden by a shared library */ >> +static void default_nss_tls_init(bool isServerStart); >> +nss_tls_init_hook_type nss_tls_init_hook = default_nss_tls_init; > >> +static PRDescIdentity pr_id; >> + >> +static PRIOMethods pr_iomethods; > > Happy to be told I'm missing something, but the above two variables seem > to only be used in init_iolayer.. is there a reason they're declared > here instead of just being declared in that function? They must be there since NSPR doesn't copy these but references them. >> + /* >> + * Set the fallback versions for the TLS protocol version range to a >> + * combination of our minimal requirement and the library maximum. Error >> + * messages should be kept identical to those in be-secure-openssl.c to >> + * make translations easier. >> + */ > > Should we pull these error messages out into another header so that > they're in one place to make sure they're kept consistent, if we really > want to put the effort in to keep them the same..? I'm not 100% sure > that it's actually necessary to do so, but defining these in one place > would help maintain this if we want to. Also alright with just keeping > the comment, not that big of a deal. It might make sense to pull them into common/nss.h, but seeing the error message right there when reading the code does IMO make it clearer so it's a double-edged sword. Not sure what is the best option, but I'm not married to the current solution so if there is consensus to pull them out somewhere I'm happy to do so. >> +int >> +be_tls_open_server(Port *port) >> +{ >> + SECStatus status; >> + PRFileDesc *model; >> + PRFileDesc *pr_fd; > > pr_fd here is materially different from port->pr_fd, no? As in, one is > the NSS raw TCP fd while the other is the SSL fd, right? Maybe we > should use two different variable names to try and make sure they don't > get confused?
Might even set this to NULL after we are done with it > too.. Then again, I see later on that when we do the dance with the > 'model' PRFileDesc that we just use the same variable- maybe we should > do that? That is, just get rid of this 'pr_fd' and use port->pr_fd > always? Hmm, I think you're right. I will try that for the next patchset version. >> + /* >> + * The NSPR documentation states that runtime initialization via PR_Init >> + * is no longer required, as the first caller into NSPR will perform the >> + * initialization implicitly. The documentation doesn't however clarify >> + * from which version this is holds true, so let's perform the potentially >> + * superfluous initialization anyways to avoid crashing on older versions >> + * of NSPR, as there is no difference in overhead. The NSS documentation >> + * still states that PR_Init must be called in some way (implicitly or >> + * explicitly). >> + * >> + * The below parameters are what the implicit initialization would've done >> + * for us, and should work even for older versions where it might not be >> + * done automatically. The last parameter, maxPTDs, is set to various >> + * values in other codebases, but has been unused since NSPR 2.1 which was >> + * released sometime in 1998. In current versions of NSPR all parameters >> + * are ignored. >> + */ >> + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ ); >> + >> + /* >> + * The certificate path (configdir) must contain a valid NSS database. If >> + * the certificate path isn't a valid directory, NSS will fall back on the >> + * system certificate database. If the certificate path is a directory but >> + * is empty then the initialization will fail. On the client side this can >> + * be allowed for any sslmode but the verify-xxx ones. >> + * https://bugzilla.redhat.com/show_bug.cgi?id=728562 For the server side >> + * we won't allow this to fail however, as we require the certificate and >> + * key to exist. 
>> + * >> + * The original design of NSS was for a single application to use a single >> + * copy of it, initialized with NSS_Initialize() which isn't returning any >> + * handle with which to refer to NSS. NSS initialization and shutdown are >> + * global for the application, so a shutdown in another NSS enabled >> + * library would cause NSS to be stopped for libpq as well. The fix has >> + * been to introduce NSS_InitContext which returns a context handle to >> + * pass to NSS_ShutdownContext. NSS_InitContext was introduced in NSS >> + * 3.12, but the use of it is not very well documented. >> + * https://bugzilla.redhat.com/show_bug.cgi?id=738456 > > The above seems to indicate that we will be requiring at least 3.12, > right? Yet above we have code to work with NSPR versions before 2.1? Well, not really. The comment tries to explain the rationale for the parameters passed. Clearly the comment could be improved to make that point clearer. > Maybe we should put a stake in the ground that says "we only support > back to version X of NSS", test with that and a few more recent versions > and the most recent, and then rip out anything that's needed for > versions which are older than that? Yes, right now there is very little in the patch which caters for old versions, the PR_Init call might be one of the few offenders. There has been discussion upthread about settling for a required version, combining the insights learned there with a survey of which versions are commonly available packaged. Once we settle on a version we can confirm if PR_Init is/isn't needed and remove all traces of it if not. > I have a pretty hard time imagining that someone is going to want to build PG > v14 w/ NSS 2.0 ... Let alone compiling 2.0 at all on a recent system.. 
>> + { >> + char *ciphers, >> + *c; >> + >> + char *sep = ":;, "; >> + PRUint16 ciphercode; >> + const PRUint16 *nss_ciphers; >> + >> + /* >> + * If the user has specified a set of preferred cipher suites we start >> + * by turning off all the existing suites to avoid the risk of down- >> + * grades to a weaker cipher than expected. >> + */ >> + nss_ciphers = SSL_GetImplementedCiphers(); >> + for (int i = 0; i < SSL_GetNumImplementedCiphers(); i++) >> + SSL_CipherPrefSet(model, nss_ciphers[i], PR_FALSE); >> + >> + ciphers = pstrdup(SSLCipherSuites); >> + >> + for (c = strtok(ciphers, sep); c; c = strtok(NULL, sep)) >> + { >> + if (!pg_find_cipher(c, &ciphercode)) >> + { >> + status = SSL_CipherPrefSet(model, ciphercode, PR_TRUE); >> + if (status != SECSuccess) >> + { >> + ereport(COMMERROR, >> + (errmsg("invalid cipher-suite specified: %s", c))); >> + return -1; >> + } >> + } >> + } > > Maybe I'm a bit confused, but doesn't pg_find_cipher return *true* when > a cipher is found, and therefore the '!' above is saying "if we don't > find a matching cipher, then run the code to set the cipher ...". Hmm, yes that's broken. Fixed. > Also- we don't seem to complain at all about a cipher being specified that we > don't find? Guess I would think that we might want to throw a WARNING in such > a case, but I could possibly be convinced otherwise. No, I think you're right, we should throw WARNING there or possibly even a higher level. Should that be a COMMERROR even? > Kind of wonder just what happens with the current code, I'm guessing ciphercode > is zero and therefore doesn't complain but also doesn't do what we want. I > wonder if there's a way to test this? We could extend the test suite to set ciphers in postgresql.conf, I'll give it a go. > I do think we should probably throw an error if we end up with *no* > ciphers being set, which doesn't seem to be happening here..? Yeah, that should be a COMMERROR. Fixed. >> + /* >> + * Set up the custom IO layer.
>> + */ > > Might be good to mention that the IO Layer is what sets up the > read/write callbacks to be used. Good point, will do in the next version of the patchset. >> + port->pr_fd = SSL_ImportFD(model, pr_fd); >> + if (!port->pr_fd) >> + { >> + ereport(COMMERROR, >> + (errmsg("unable to initialize"))); >> + return -1; >> + } > > Maybe a comment and a better error message for this? Will do. > >> + PR_Close(model); > > This might deserve one also, the whole 'model' construct is a bit > different. :) Agreed. will do. >> + port->ssl_in_use = true; >> + >> + /* Register out shutdown callback */ > > *our Fixed. >> +int >> +be_tls_get_cipher_bits(Port *port) >> +{ >> + SECStatus status; >> + SSLChannelInfo channel; >> + SSLCipherSuiteInfo suite; >> + >> + status = SSL_GetChannelInfo(port->pr_fd, &channel, sizeof(channel)); >> + if (status != SECSuccess) >> + goto error; >> + >> + status = SSL_GetCipherSuiteInfo(channel.cipherSuite, &suite, sizeof(suite)); >> + if (status != SECSuccess) >> + goto error; >> + >> + return suite.effectiveKeyBits; >> + >> +error: >> + ereport(WARNING, >> + (errmsg("unable to extract TLS session information: %s", >> + pg_SSLerrmessage(PR_GetError())))); >> + return 0; >> +} > > It doesn't have to be much, but I, at least, do prefer to see > function-header comments. :) Not that the OpenSSL code has them > consistently, so obviously not that big of a deal. Goes for a number of > the functions being added. No disagreement from me, I've added comments on a few more functions and will continue to go over the patchset to add them everywhere. Some of these comments are pretty uninteresting and could do with some wordsmithing. >> + /* Found a CN, ecode and copy it into a newly allocated buffer */ > > *decode Fixed. 
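To make the cipher-list points above concrete (the inverted pg_find_cipher test, unknown names going unreported, and an empty result not being an error), here is a minimal standalone sketch. It does not call NSS at all: find_cipher, enable_ciphers and the three-entry lookup table are hypothetical stand-ins for illustration, and treating an unknown name as a hard error rather than a WARNING is an assumption, since the thread leaves that choice open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the name -> cipher code table in cipher_nss.c. */
typedef struct
{
	const char *name;
	uint16_t	code;
} cipher_entry;

static const cipher_entry known_ciphers[] = {
	{"TLS_AES_128_GCM_SHA256", 0x1301},
	{"TLS_AES_256_GCM_SHA384", 0x1302},
	{"TLS_CHACHA20_POLY1305_SHA256", 0x1303},
};

/* Returns true and sets *code when the name is known, false otherwise. */
static bool
find_cipher(const char *name, uint16_t *code)
{
	for (size_t i = 0; i < sizeof(known_ciphers) / sizeof(known_ciphers[0]); i++)
	{
		if (strcmp(known_ciphers[i].name, name) == 0)
		{
			*code = known_ciphers[i].code;
			return true;
		}
	}
	return false;
}

/*
 * Parse a separator-delimited cipher list into enabled[]. Returns the number
 * of ciphers enabled, or -1 if a name is unknown, the list enables nothing,
 * or the input does not fit the local buffers.
 */
int
enable_ciphers(const char *list, uint16_t *enabled, int max_enabled)
{
	char		buf[256];
	const char *sep = ":;, ";
	int			n = 0;

	if (strlen(list) >= sizeof(buf))
		return -1;
	strcpy(buf, list);

	for (char *c = strtok(buf, sep); c != NULL; c = strtok(NULL, sep))
	{
		uint16_t	code;

		/* Act on the cipher only when the lookup succeeded. */
		if (!find_cipher(c, &code))
		{
			fprintf(stderr, "unknown cipher suite: %s\n", c);
			return -1;
		}
		if (n >= max_enabled)
			return -1;
		enabled[n++] = code;
	}

	/* An empty result would leave the connection without any usable suite. */
	if (n == 0)
	{
		fprintf(stderr, "no cipher suites enabled\n");
		return -1;
	}
	return n;
}
```

In the real backend the two fprintf calls would presumably be ereport(COMMERROR, ...) and the array store would be the SSL_CipherPrefSet call from the patch.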
>> +static PRInt32 >> +pg_ssl_read(PRFileDesc *fd, void *buf, PRInt32 amount, PRIntn flags, >> + PRIntervalTime timeout) >> +{ >> + PRRecvFN read_fn; >> + PRInt32 n_read; >> + >> + read_fn = fd->lower->methods->recv; >> + n_read = read_fn(fd->lower, buf, amount, flags, timeout); >> + >> + return n_read; >> +} >> + >> +static PRInt32 >> +pg_ssl_write(PRFileDesc *fd, const void *buf, PRInt32 amount, PRIntn flags, >> + PRIntervalTime timeout) >> +{ >> + PRSendFN send_fn; >> + PRInt32 n_write; >> + >> + send_fn = fd->lower->methods->send; >> + n_write = send_fn(fd->lower, buf, amount, flags, timeout); >> + >> + return n_write; >> +} >> + >> +static PRStatus >> +pg_ssl_close(PRFileDesc *fd) >> +{ >> + /* >> + * Disconnect our private Port from the fd before closing out the stack. >> + * (Debug builds of NSPR will assert if we do not.) >> + */ >> + fd->secret = NULL; >> + return PR_GetDefaultIOMethods()->close(fd); >> +} > > Regarding these, I find myself wondering how they're different from the > defaults..? I mean, the above just directly called > PR_GetDefaultIOMethods() to then call it's close() function- are the > fd->lower_methods->recv/send not the default methods? I don't quite get > what the point is from having our own callbacks here if they just do > exactly what the defaults would do (or are there actually no defined > defaults and you have to provide these..?). It's really just to cope with debug builds of NSPR which assert that fd->secret is null before closing. >> +/* >> + * ssl_protocol_version_to_nss >> + * Translate PostgreSQL TLS version to NSS version >> + * >> + * Returns zero in case the requested TLS version is undefined (PG_ANY) and >> + * should be set by the caller, or -1 on failure. >> + */ >> +static uint16 >> +ssl_protocol_version_to_nss(int v, const char *guc_name) > > guc_name isn't actually used in this function..? Is there some reason > to keep it or is it leftover? It's a leftover from when the function was doing error reporting, fixed. 
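As a rough sketch of the translation contract described in the function comment above (zero when no version is requested, -1 on failure), the mapping could live in one small table that is searched by name or by code. Everything here is an assumption for illustration: the SKETCH_TLS_* values are assumed to mirror NSS's SSL_LIBRARY_VERSION_TLS_* constants, the "TLSv1" spelling is assumed, the function names are hypothetical, and the sketch returns int so that -1 is representable.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in values, assumed to mirror NSS's SSL_LIBRARY_VERSION_TLS_* constants. */
#define SKETCH_TLS_1_0 0x0301
#define SKETCH_TLS_1_1 0x0302
#define SKETCH_TLS_1_2 0x0303
#define SKETCH_TLS_1_3 0x0304

typedef struct
{
	const char *name;
	int			code;
} tls_version_entry;

/* Single source of truth, used for lookups in both directions. */
static const tls_version_entry tls_versions[] = {
	{"TLSv1", SKETCH_TLS_1_0},
	{"TLSv1.1", SKETCH_TLS_1_1},
	{"TLSv1.2", SKETCH_TLS_1_2},
	{"TLSv1.3", SKETCH_TLS_1_3},
};

#define N_TLS_VERSIONS (sizeof(tls_versions) / sizeof(tls_versions[0]))

/* Name -> code. Returns 0 for "no preference" (NULL or empty), -1 if unknown. */
int
tls_version_from_name(const char *name)
{
	if (name == NULL || name[0] == '\0')
		return 0;
	for (size_t i = 0; i < N_TLS_VERSIONS; i++)
	{
		if (strcmp(tls_versions[i].name, name) == 0)
			return tls_versions[i].code;
	}
	return -1;
}

/* Code -> name. Returns "unknown" for codes that are not in the table. */
const char *
tls_version_to_name(int code)
{
	for (size_t i = 0; i < N_TLS_VERSIONS; i++)
	{
		if (tls_versions[i].code == code)
			return tls_versions[i].name;
	}
	return "unknown";
}
```

With one table, the name reported for an established connection and the name accepted as a setting cannot drift apart.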
> Also, I get that they do similar jobs and that one is in the frontend > and the other is in the backend, but I'm not a fan of having two > 'ssl_protocol_version_to_nss()'s functions that take different argument > types but have exact same name and do functionally different things.. Good point, I'll change that. >> +++ b/src/backend/utils/misc/guc.c >> @@ -4377,6 +4381,18 @@ static struct config_string ConfigureNamesString[] = >> check_canonical_path, assign_pgstat_temp_directory, NULL >> }, >> >> +#ifdef USE_NSS >> + { >> + {"ssl_database", PGC_SIGHUP, CONN_AUTH_SSL, >> + gettext_noop("Location of the NSS certificate database."), >> + NULL >> + }, >> + &ssl_database, >> + "", >> + NULL, NULL, NULL >> + }, >> +#endif > > We don't #ifdef out the various GUCs even if SSL isn't compiled in, so > it doesn't seem quite right to be doing so here? Generally speaking, > GUCs that we expect people to use (rather than debugging ones and such) > are typically always built, even if we don't build support for that > capability, so we can throw a better error message than just some ugly > syntax or parsing error if we come across one being set in a non-enabled > build. Of course, fixed. >> +++ b/src/common/cipher_nss.c >> @@ -0,0 +1,192 @@ >> +/*------------------------------------------------------------------------- >> + * >> + * cipher_nss.c >> + * NSS functionality shared between frontend and backend for working >> + * with ciphers >> + * >> + * This should only bse used if code is compiled with NSS support. > > *be Fixed. >> +++ b/src/include/libpq/libpq-be.h >> @@ -200,6 +200,10 @@ typedef struct Port >> SSL *ssl; >> X509 *peer; >> #endif >> + >> +#ifdef USE_NSS >> + void *pr_fd; >> +#endif >> } Port; > > Given this is under a #ifdef USE_NSS, does it need to be / should it > really be a void*? It's to avoid the same BITS_PER_BYTE collision discussed elsewhere in this email. 
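For the GUC point above, one possible shape is to keep the setting defined in every build and reject a non-empty value at check time when NSS support is missing. The following is only a standalone sketch of that idea: check_ssl_database and the SKETCH_USE_NSS macro are hypothetical names, and the real check hook signature in guc.c is different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical check hook for ssl_database. Compile with -DSKETCH_USE_NSS to
 * simulate a build with NSS support; without it, any non-empty value is
 * rejected with an explicit message instead of an "unrecognized parameter"
 * style error.
 */
bool
check_ssl_database(const char *newval, char *errbuf, size_t errlen)
{
#ifndef SKETCH_USE_NSS
	if (newval != NULL && newval[0] != '\0')
	{
		snprintf(errbuf, errlen, "ssl_database is not supported by this build");
		return false;
	}
#endif
	/* Accepted: clear the error buffer so callers never read stale text. */
	if (errlen > 0)
		errbuf[0] = '\0';
	return true;
}
```

The empty default stays valid everywhere, so a stock postgresql.conf keeps working on builds without NSS.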
>> +++ b/src/interfaces/libpq/fe-connect.c >> @@ -359,6 +359,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = { >> "Target-Session-Attrs", "", 15, /* sizeof("prefer-standby") = 15 */ >> offsetof(struct pg_conn, target_session_attrs)}, >> >> + {"cert_database", NULL, NULL, NULL, >> + "CertificateDatabase", "", 64, >> + offsetof(struct pg_conn, cert_database)}, > > I mean, maybe nitpicking here, but all the other SSL stuff is > 'sslsomething' and the backend version of this is 'ssl_database', so > wouldn't it be more consistent to have this be 'ssldatabase'? That's a good point, I was clearly Stockholm syndromed since I hadn't reflected on that but it's clearly wrong. Will fix. >> +++ b/src/interfaces/libpq/fe-secure-nss.c >> + * This logic exist in NSS as well, but it's only available for when there is > > *exists Fixed. >> + /* >> + * The NSPR documentation states that runtime initialization via PR_Init >> + * is no longer required, as the first caller into NSPR will perform the >> + * initialization implicitly. See be-secure-nss.c for further discussion >> + * on PR_Init. >> + */ >> + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0); > > See same comment I made above- and also there's a comment earlier in > this file that we don't need to PR_Init() even ... Right, once we can confirm that the minimum required versions are past the PR_Init dependency then we should remove all of these calls. If we can't remove the calls, the comments should be updated to reflect why they are there. >> + { >> + conn->nss_context = NSS_InitContext("", "", "", "", &params, >> + NSS_INIT_READONLY | NSS_INIT_NOCERTDB | >> + NSS_INIT_NOMODDB | NSS_INIT_FORCEOPEN | >> + NSS_INIT_NOROOTINIT | NSS_INIT_PK11RELOAD); >> + if (!conn->nss_context) >> + { >> + printfPQExpBuffer(&conn->errorMessage, >> + libpq_gettext("unable to create certificate database: %s"), >> + pg_SSLerrmessage(PR_GetError())); >> + return PGRES_POLLING_FAILED; >> + } >> + } > > That error message seems a bit ...
off? Surely we aren't trying to > actually create a certificate database here? Not really no, it does set up a transient database structure for the duration of the connection AFAIK but that's clearly not the level of detail we should be giving users. I've reworded to indicate that NSS init failed, and ideally the pg_SSLerrmessage call will provide appropriate detail. >> + /* >> + * Configure cipher policy. >> + */ >> + status = NSS_SetDomesticPolicy(); >> + if (status != SECSuccess) >> + { >> + printfPQExpBuffer(&conn->errorMessage, >> + libpq_gettext("unable to configure cipher policy: %s"), >> + pg_SSLerrmessage(PR_GetError())); >> + >> + return PGRES_POLLING_FAILED; >> + } > > Probably good to pull over at least some parts of the comments made in > the backend code about SetDomesticPolicy() actually enabling everything > (just like all the policies apparently do)... Good point, will do. >> + /* >> + * If we don't have a certificate database, the system trust store is the >> + * fallback we can use. If we fail to initialize that as well, we can >> + * still attempt a connection as long as the sslmode isn't verify*. >> + */ >> + if (!conn->cert_database && conn->sslmode[0] == 'v') >> + { >> + status = pg_load_nss_module(&ca_trust, ca_trust_name, "\"Root Certificates\""); >> + if (status != SECSuccess) >> + { >> + printfPQExpBuffer(&conn->errorMessage, >> + libpq_gettext("WARNING: unable to load NSS trust module \"%s\" : %s"), >> + ca_trust_name, >> + pg_SSLerrmessage(PR_GetError())); >> + >> + return PGRES_POLLING_FAILED; >> + } >> + } > > Maybe have something a bit more here about "maybe you should specifify a > cert_database" or such? Good point, will expand with more detail.
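The trust-store fallback quoted above decides whether verification is required by looking at conn->sslmode[0] == 'v'. A more explicit way to express the same test is to compare against the two verifying modes by name. This is a minimal sketch under that assumption; sslmode_requires_verification is a hypothetical helper name, not something from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true only for the sslmodes that require the server certificate to
 * be verified. Spelling the modes out avoids accidentally matching any other
 * value that happens to start with 'v'.
 */
bool
sslmode_requires_verification(const char *sslmode)
{
	if (sslmode == NULL)
		return false;

	return strcmp(sslmode, "verify-ca") == 0 ||
		strcmp(sslmode, "verify-full") == 0;
}
```

The condition in the quoted code would then read as !conn->cert_database && sslmode_requires_verification(conn->sslmode).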
>> + if (conn->ssl_max_protocol_version && strlen(conn->ssl_max_protocol_version) > 0) >> + { >> + int ssl_max_ver = ssl_protocol_version_to_nss(conn->ssl_max_protocol_version); >> + >> + if (ssl_max_ver == -1) >> + { >> + printfPQExpBuffer(&conn->errorMessage, >> + libpq_gettext("invalid value \"%s\" for maximum version of SSL protocol\n"), >> + conn->ssl_max_protocol_version); >> + return -1; >> + } >> + >> + desired_range.max = ssl_max_ver; >> + } > > In the backend code, we have an additional check to make sure they > didn't set the min version higher than the max.. should we have that > here too? Either way, seems like we should be consistent. We already test that in src/interfaces/libpq/fe-connect.c. >> + * The model can now we closed as we've applied the settings of the model > > *be Fixed. >> + * onto the real socket. From hereon we should only use conn->pr_fd. > > *here on Fixed. > Similar comments to the backend code- should we just always use > conn->pr_fd? Or should we rename pr_fd to something else? Renaming is probably not a bad idea, will fix. >> + /* >> + * Specify which hostname we are expecting to talk to. This is required, >> + * albeit mostly applies to when opening a connection to a traditional >> + * http server it seems. >> + */ >> + SSL_SetURL(conn->pr_fd, (conn->connhost[conn->whichhost]).host); > > We should probably also set SNI, if available (NSS 3.12.6 it seems?), > since it looks like that's going to be added to the OpenSSL code. Good point, will do. >> + do >> + { >> + status = SSL_ForceHandshake(conn->pr_fd); >> + } >> + while (status != SECSuccess && PR_GetError() == PR_WOULD_BLOCK_ERROR); > > We don't seem to have this loop in the backend code.. Is there some > reason that we don't? Is it possible that we need to have a loop here > too? I recall in the GSS encryption code there were definitely things > during setup that had to be looped back over on both sides to make sure > everything was finished ... 
Off the cuff I can't remember, will look into it. >> + if (conn->sslmode[0] == 'v') >> + return SECFailure; > > Seems a bit grotty to do this (though I see that the OpenSSL code does > too ... at least there we have a comment though, maybe add one here?). > I would have thought we'd actually do strcmp()'s like above. That's admittedly copied from the OpenSSL code, and I agree that it's a bit too clever. Replaced with plain strcmp's to improve readability in both places it occurred. >> + /* >> + * Return the underlying PRFileDesc which can be used to access >> + * information on the connection details. There is no SSL context per se. >> + */ >> + if (strcmp(struct_name, "NSS") == 0) >> + return conn->pr_fd; >> + return NULL; >> +} > > Is there never a reason someone might want the pointer returned by > NSS_InitContext? I don't know that there is but it might be something > to consider (we could even possibly have our own structure returned by > this function which includes both, maybe..?). Not sure if there's a > sensible use-case for that or not just wanted to bring it up as it's > something I asked myself while reading through this patch. Not sure I understand what you're asking for here, did you mean "is there ever a reason"? >> + if (strcmp(attribute_name, "protocol") == 0) >> + { >> + switch (channel.protocolVersion) >> + { >> +#ifdef SSL_LIBRARY_VERSION_TLS_1_3 >> + case SSL_LIBRARY_VERSION_TLS_1_3: >> + return "TLSv1.3"; >> +#endif >> +#ifdef SSL_LIBRARY_VERSION_TLS_1_2 >> + case SSL_LIBRARY_VERSION_TLS_1_2: >> + return "TLSv1.2"; >> +#endif >> +#ifdef SSL_LIBRARY_VERSION_TLS_1_1 >> + case SSL_LIBRARY_VERSION_TLS_1_1: >> + return "TLSv1.1"; >> +#endif >> + case SSL_LIBRARY_VERSION_TLS_1_0: >> + return "TLSv1.0"; >> + default: >> + return "unknown"; >> + } >> + } > > Not sure that it really matters, but this seems like it might be useful > to have as its own function... Maybe even a data structure that both > functions use just in oppostie directions. 
Really minor tho. :) I suppose that wouldn't be a bad thing, will fix. >> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c >> index c601071838..7f10da3010 100644 >> --- a/src/interfaces/libpq/fe-secure.c >> +++ b/src/interfaces/libpq/fe-secure.c >> @@ -448,6 +448,27 @@ PQdefaultSSLKeyPassHook_OpenSSL(char *buf, int size, PGconn *conn) >> } >> #endif /* USE_OPENSSL */ >> >> +#ifndef USE_NSS >> + >> +PQsslKeyPassHook_nss_type >> +PQgetSSLKeyPassHook_nss(void) >> +{ >> + return NULL; >> +} >> + >> +void >> +PQsetSSLKeyPassHook_nss(PQsslKeyPassHook_nss_type hook) >> +{ >> + return; >> +} >> + >> +char * >> +PQdefaultSSLKeyPassHook_nss(PK11SlotInfo * slot, PRBool retry, void *arg) >> +{ >> + return NULL; >> +} >> +#endif /* USE_NSS */ > > Isn't this '!USE_NSS'? Technically it is, but using just /* USE_NSS */ is consistent with the rest of blocks in the file. >> diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h >> index 0c9e95f1a7..f15af39222 100644 >> --- a/src/interfaces/libpq/libpq-int.h >> +++ b/src/interfaces/libpq/libpq-int.h >> @@ -383,6 +383,7 @@ struct pg_conn >> char *sslrootcert; /* root certificate filename */ >> char *sslcrl; /* certificate revocation list filename */ >> char *sslcrldir; /* certificate revocation list directory name */ >> + char *cert_database; /* NSS certificate/key database */ >> char *requirepeer; /* required peer credentials for local sockets */ >> char *gssencmode; /* GSS mode (require,prefer,disable) */ >> char *krbsrvname; /* Kerberos service name */ >> @@ -507,6 +508,28 @@ struct pg_conn >> * OpenSSL version changes */ >> #endif >> #endif /* USE_OPENSSL */ >> + >> +/* >> + * The NSS/NSPR specific types aren't used to avoid pulling in the required >> + * headers here, as they are causing conflicts with PG definitions. >> + */ > > I'm a bit confused- what are the conflicts being caused here..? > Certainly under USE_OPENSSL we use the actual OpenSSL types.. 
It's referring to collisions with for example BITS_PER_BYTE which is defined both by postgres and nspr. Since writing this I've introduced src/common/nss.h to handle it in a single place, so we can indeed use the proper types without polluting the file. Fixed. >> Subject: [PATCH v30 2/9] Refactor SSL testharness for multiple library >> >> The SSL testharness was fully tied to OpenSSL in the way the server was >> set up and reconfigured. This refactors the SSLServer module into a SSL >> library agnostic SSL/Server module which in turn use SSL/Backend/<lib> >> modules for the implementation details. >> >> No changes are done to the actual tests, this only change how setup and >> teardown is performed. > > Presumably this could be committed ahead of the main NSS support? Correct, I think this has merits even if NSS support is ultimately rejected. >> Subject: [PATCH v30 4/9] nss: pg_strong_random support >> +++ b/src/port/pg_strong_random.c >> +bool >> +pg_strong_random(void *buf, size_t len) >> +{ >> + NSSInitParameters params; >> + NSSInitContext *nss_context; >> + SECStatus status; >> + >> + memset(&params, 0, sizeof(params)); >> + params.length = sizeof(params); >> + nss_context = NSS_InitContext("", "", "", "", &params, >> + NSS_INIT_READONLY | NSS_INIT_NOCERTDB | >> + NSS_INIT_NOMODDB | NSS_INIT_FORCEOPEN | >> + NSS_INIT_NOROOTINIT | NSS_INIT_PK11RELOAD); >> + >> + if (!nss_context) >> + return false; >> + >> + status = PK11_GenerateRandom(buf, len); >> + NSS_ShutdownContext(nss_context); >> + >> + if (status == SECSuccess) >> + return true; >> + >> + return false; >> +} >> + >> +#else /* not USE_OPENSSL, USE_NSS or WIN32 */ > > I don't know that it's an issue, but do we actually need to init the NSS > context and shut it down every time..? We need to have a context, and we should be able to set it like how the WIN32 code sets hProvider. I don't remember if there was a reason against that, will revisit.
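The "set it like hProvider" idea above amounts to creating the context on first use and keeping it for later calls. Here is a standalone sketch of that pattern, assuming nothing about NSS itself: sketch_init_context and sketch_generate_random are hypothetical stand-ins for NSS_InitContext and PK11_GenerateRandom, and the random bytes produced here are not cryptographically strong.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the handle returned by NSS_InitContext(). */
typedef struct sketch_context
{
	int			unused;
} sketch_context;

/* Counts initializations so the caching behaviour can be observed. */
static int	sketch_init_calls = 0;

/* Stand-in for NSS_InitContext(); would return NULL on failure. */
static sketch_context *
sketch_init_context(void)
{
	static sketch_context ctx;

	sketch_init_calls++;
	return &ctx;
}

/* Stand-in for PK11_GenerateRandom(); rand() is NOT cryptographically strong. */
static bool
sketch_generate_random(void *buf, size_t len)
{
	unsigned char *p = buf;

	for (size_t i = 0; i < len; i++)
		p[i] = (unsigned char) rand();
	return true;
}

/* Cached context, created on first use and reused by later calls. */
static sketch_context *random_context = NULL;

bool
sketch_strong_random(void *buf, size_t len)
{
	if (random_context == NULL)
	{
		random_context = sketch_init_context();
		if (random_context == NULL)
			return false;
	}

	return sketch_generate_random(buf, len);
}
```

The sketch leaves thread safety, behaviour across fork, and shutting the context down untouched, and any of those could turn out to be a reason against caching it.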
>> /* >> * Without OpenSSL or Win32 support, just read /dev/urandom ourselves. > > *or NSS Fixed. >> Subject: [PATCH v30 5/9] nss: Documentation >> +++ b/doc/src/sgml/acronyms.sgml >> @@ -684,6 +717,16 @@ >> </listitem> >> </varlistentry> >> >> + <varlistentry> >> + <term><acronym>TLS</acronym></term> >> + <listitem> >> + <para> >> + <ulink url="https://en.wikipedia.org/wiki/Transport_Layer_Security"> >> + Transport Layer Security</ulink> >> + </para> >> + </listitem> >> + </varlistentry> > > We don't have this already..? Surely we should.. We really should, especially since we've had <acronym>TLS</acronym> in config.sgml since 2014 (c6763156589). That's another small piece that could be committed on its own to cut down the size of this patchset (even if only by a tiny amount). >> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml >> index 967de73596..1608e9a7c7 100644 >> --- a/doc/src/sgml/config.sgml >> +++ b/doc/src/sgml/config.sgml >> @@ -1272,6 +1272,23 @@ include_dir 'conf.d' >> </listitem> >> </varlistentry> >> >> + <varlistentry id="guc-ssl-database" xreflabel="ssl_database"> >> + <term><varname>ssl_database</varname> (<type>string</type>) >> + <indexterm> >> + <primary><varname>ssl_database</varname> configuration parameter</primary> >> + </indexterm> >> + </term> >> + <listitem> >> + <para> >> + Specifies the name of the file containing the server certificates and >> + keys when using <productname>NSS</productname> for <acronym>SSL</acronym> >> + connections. This parameter can only be set in the >> + <filename>postgresql.conf</filename> file or on the server command >> + line. > > *SSL/TLS maybe? Fixed. >> @@ -1288,7 +1305,9 @@ include_dir 'conf.d' >> connections using TLS version 1.2 and lower are affected. There is >> currently no setting that controls the cipher choices used by TLS >> version 1.3 connections. The default value is >> - <literal>HIGH:MEDIUM:+3DES:!aNULL</literal>.
The default is usually a >> + <literal>HIGH:MEDIUM:+3DES:!aNULL</literal> for servers which have >> + been built with <productname>OpenSSL</productname> as the >> + <acronym>SSL</acronym> library. The default is usually a >> reasonable choice unless you have specific security requirements. >> </para> > > Shouldn't we say something here wrt NSS? We should, but I'm not entirely sure what just yet. Need to revisit that. >> @@ -1490,8 +1509,11 @@ include_dir 'conf.d' >> <para> >> Sets an external command to be invoked when a passphrase for >> decrypting an SSL file such as a private key needs to be obtained. By >> - default, this parameter is empty, which means the built-in prompting >> - mechanism is used. >> + default, this parameter is empty. When the server is using >> + <productname>OpenSSL</productname>, this means the built-in prompting >> + mechanism is used. When using <productname>NSS</productname>, there is >> + no default prompting so a blank callback will be used returning an >> + empty password. >> </para> > > Maybe we should point out here that this requires the database to not > require a password..? So if they have one, they need to set this, or > maybe we should provide a default one.. I've added a sentence on not using a password for the cert database. I'm not sure if providing a default one is a good idea but it's no less insecure than having no password really.. >> +++ b/doc/src/sgml/libpq.sgml >> +<synopsis> >> +PQsslKeyPassHook_nss_type PQgetSSLKeyPassHook_nss(void); >> +</synopsis> >> + </para> >> + >> + <para> >> + <function>PQgetSSLKeyPassHook_nss</function> has no effect unless the >> + server was compiled with <productname>nss</productname> support. >> + </para> > > We should try to be consistent- above should be NSS, not nss. Fixed.
>> + <listitem> >> + <para> >> + <productname>NSS</productname>: specifying the parameter is required >> + in case any password protected items are referenced in the >> + <productname>NSS</productname> database, or if the database itself >> + is password protected. If multiple different objects are password >> + protected, the same password is used for all. >> + </para> >> + </listitem> >> + </itemizedlist> > > Is this a statement about NSS databases (which I don't think it is) or > about the fact that we'll just use the password provided for all > attempts to decrypt something we need in the database? Correct. > Assuming the > latter, seems like we could reword this to be a bit more clear. > > Maybe: > > All attempts to decrypt objects which are password protected in the > database will use this password. Agreed, fixed. >> @@ -2620,9 +2791,14 @@ void *PQsslStruct(const PGconn *conn, const char *struct_name); >> + For <productname>NSS</productname>, there is one struct available under >> + the name "NSS", and it returns a pointer to the >> + <productname>NSS</productname> <literal>PRFileDesc</literal>. > > ... SSL PRFileDesc associated with the connection, no? I was trying to be specific that it's an NSS-defined structure and not a PostgreSQL one which is returned. Fixed. >> +++ b/doc/src/sgml/runtime.sgml >> @@ -2552,6 +2583,89 @@ openssl x509 -req -in server.csr -text -days 365 \ >> </para> >> </sect2> >> >> + <sect2 id="nss-certificate-database"> >> + <title>NSS Certificate Databases</title> >> + >> + <para> >> + When using <productname>NSS</productname>, all certificates and keys must >> + be loaded into an <productname>NSS</productname> certificate database. 
>> + </para> >> + >> + <para> >> + To create a new <productname>NSS</productname> certificate database and >> + load the certificates created in <xref linkend="ssl-certificate-creation" />, >> + use the following <productname>NSS</productname> commands: >> +<programlisting> >> +certutil -d "sql:server.db" -N --empty-password >> +certutil -d "sql:server.db" -A -n server.crt -i server.crt -t "CT,C,C" >> +certutil -d "sql:server.db" -A -n root.crt -i root.crt -t "CT,C,C" >> +</programlisting> >> + This will give the certificate the filename as the nickname identifier in >> + the database which is created as <filename>server.db</filename>. >> + </para> >> + <para> >> + Then load the server key, which require converting it to > > *requires Fixed. >> Subject: [PATCH v30 6/9] nss: Support NSS in pgcrypto >> +++ b/doc/src/sgml/pgcrypto.sgml >> <row> >> <entry>Blowfish</entry> >> <entry>yes</entry> >> <entry>yes</entry> >> + <entry>yes</entry> >> </row> > > Maybe this should mention that it's with the built-in implementation as > blowfish isn't available from NSS? Fixed by adding a Note item. >> <row> >> <entry>DES/3DES/CAST5</entry> >> <entry>no</entry> >> <entry>yes</entry> >> + <entry>yes</entry> >> + </row> > > Surely CAST5 from the above should be removed, since it's given its own > entry now? Indeed, fixed. >> @@ -1241,7 +1260,8 @@ gen_random_uuid() returns uuid >> <orderedlist> >> <listitem> >> <para> >> - Any digest algorithm <productname>OpenSSL</productname> supports >> + Any digest algorithm <productname>OpenSSL</productname> and >> + <productname>NSS</productname> supports >> is automatically picked up. > > *or? Maybe something more specific though- "Any digest algorithm > included with the library that PostgreSQL is compiled with is > automatically picked up." ? Good point, thats better. Fixed. >> Subject: [PATCH v30 7/9] nss: Support NSS in sslinfo >> >> Since sslinfo to a large extent use the be_tls_* API this mostly > > *uses Fixed. 
>> Subject: [PATCH v30 8/9] nss: Support NSS in cryptohash >> +++ b/src/common/cryptohash_nss.c >> + /* >> + * Initialize our own NSS context without a database backing it. >> + */ >> + memset(&params, 0, sizeof(params)); >> + params.length = sizeof(params); >> + status = NSS_NoDB_Init("."); > > We take some pains to use NSS_InitContext elsewhere.. Are we sure that > we should be using NSS_NoDB_Init here..? No, we should probably be using NSS_InitContext. Will fix. > Just a, well, not so quick read-through. Generally it's looking pretty > good to me. Will see about playing with it this week. Thanks again for reviewing, another version which addresses the remaining issues will be posted soon but I wanted to get this out to give further reviews something that properly works. -- Daniel Gustafsson https://vmware.com/
Attachment
- v31-0009-nss-Build-infrastructure.patch
- v31-0008-nss-Support-NSS-in-cryptohash.patch
- v31-0007-nss-Support-NSS-in-sslinfo.patch
- v31-0006-nss-Support-NSS-in-pgcrypto.patch
- v31-0005-nss-Documentation.patch
- v31-0004-nss-pg_strong_random-support.patch
- v31-0003-nss-Add-NSS-specific-tests.patch
- v31-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v31-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
Greetings, * Daniel Gustafsson (daniel@yesql.se) wrote: > > On 22 Mar 2021, at 00:49, Stephen Frost <sfrost@snowman.net> wrote: > > Thanks for the review! Below is a partial response, I haven't had time to > address all your review comments yet but I wanted to submit a rebased patchset > directly since the current version doesn't work after recent changes in the > tree. I will address the remaining comments tomorrow or the day after. Great, thanks! > This rebase also includes a fix for pgtls_init which was sent offlist by Jacob. > The changes in pgtls_init can potentially be used to initialize the crypto > context for NSS to clean up this patch, Jacob is currently looking at that. Ah, cool, sounds good. > > They aren't the same and it might not be > > clear what's going on if one was to somehow mix them (at least if pr_fd > > continues to sometimes be a void*, but I wonder why that's being > > done..? more on that later..). > > To paraphrase from a later in this email, there are collisions between nspr and > postgres on things like BITS_PER_BYTE, and there were also collisions on basic > types until I learned about NO_NSPR_10_SUPPORT. By moving the juggling of this > into common/nss.h we can use proper types without introducing that pollution > everywhere. I will address these places. Ah, ok, and great, that sounds good. > >> +++ b/src/backend/libpq/be-secure-nss.c > > [...] > >> +/* default init hook can be overridden by a shared library */ > >> +static void default_nss_tls_init(bool isServerStart); > >> +nss_tls_init_hook_type nss_tls_init_hook = default_nss_tls_init; > > > >> +static PRDescIdentity pr_id; > >> + > >> +static PRIOMethods pr_iomethods; > > > > Happy to be told I'm missing something, but the above two variables seem > > to only be used in init_iolayer.. is there a reason they're declared > > here instead of just being declared in that function? > > They must be there since NSPR doesn't copy these but reference them. Ah, ok, interesting. 
> >> + /* > >> + * Set the fallback versions for the TLS protocol version range to a > >> + * combination of our minimal requirement and the library maximum. Error > >> + * messages should be kept identical to those in be-secure-openssl.c to > >> + * make translations easier. > >> + */ > > > > Should we pull these error messages out into another header so that > > they're in one place to make sure they're kept consistent, if we really > > want to put the effort in to keep them the same..? I'm not 100% sure > > that it's actually necessary to do so, but defining these in one place > > would help maintain this if we want to. Also alright with just keeping > > the comment, not that big of a deal. > > It might make sense to pull them into common/nss.h, but seeing the error > message right there when reading the code does IMO make it clearer so it's a > doubleedged sword. Not sure what is the best option, but I'm not married to > the current solution so if there is consensus to pull them out somewhere I'm > happy to do so. My thought was to put them into some common/ssl.h or something along those lines but I don't see it as a big deal either way really. You make a good point that having the error message there when reading the code is nice. > > Maybe we should put a stake in the ground that says "we only support > > back to version X of NSS", test with that and a few more recent versions > > and the most recent, and then rip out anything that's needed for > > versions which are older than that? > > Yes, right now there is very little in the patch which caters for old versions, > the PR_Init call might be one of the few offenders. There has been discussion > upthread about settling for a required version, combining the insights learned > there with a survey of which versions are commonly available packaged. > > Once we settle on a version we can confirm if PR_Init is/isn't needed and > remove all traces of it if not. 
I don't really see this as all that hard to do- I'd suggest we look at what systems someone might reasonably deploy v14 on. To that end, I'd say "only systems which are presently supported", so: RHEL7+, Debian 9+, Ubuntu 16.04+. Looking at those, I see: Ubuntu 16.04: 3.28.4 RHEL6: v3.28.4 Debian: 3.26.2 > > I have a pretty hard time imagining that someone is going to want to build PG > > v14 w/ NSS 2.0 ... > > Let alone compiling 2.0 at all on a recent system.. Indeed, and given the above, it seems entirely reasonable to make the requirement be NSS v3+, no? I wouldn't be against making that even tighter if we thought it made sense to do so. > > Also- we don't seem to complain at all about a cipher being specified that we > > don't find? Guess I would think that we might want to throw a WARNING in such > > a case, but I could possibly be convinced otherwise. > > No, I think you're right, we should throw WARNING there or possibly even a > higher elevel. Should that be a COMMERROR even? I suppose the thought I was having was that we might want to allow some string that covered all the OpenSSL and NSS ciphers that someone feels comfortable with and we'd just ignore the ones that don't make sense for the particular library we're currently built with. Making it a COMMERROR seems like overkill and I'm not entirely sure we actually want any warning since we might then be constantly bleating about it. > > Kind of wonder just what happens with the current code, I'm guessing ciphercode > > is zero and therefore doesn't complain but also doesn't do what we want. I > > wonder if there's a way to test this? > > We could extend the test suite to set ciphers in postgresql.conf, I'll give it > a go. That'd be great, thanks! > > I do think we should probably throw an error if we end up with *no* > > ciphers being set, which doesn't seem to be happening here..? > > Yeah, that should be a COMMERROR. Fixed. 
I do think it makes sense to throw a COMMERROR here since the connection is going to end up failing anyway. > >> +pg_ssl_read(PRFileDesc *fd, void *buf, PRInt32 amount, PRIntn flags, > >> + PRIntervalTime timeout) > >> +{ > >> + PRRecvFN read_fn; > >> + PRInt32 n_read; > >> + > >> + read_fn = fd->lower->methods->recv; > >> + n_read = read_fn(fd->lower, buf, amount, flags, timeout); > >> + > >> + return n_read; > >> +} > >> + > >> +static PRInt32 > >> +pg_ssl_write(PRFileDesc *fd, const void *buf, PRInt32 amount, PRIntn flags, > >> + PRIntervalTime timeout) > >> +{ > >> + PRSendFN send_fn; > >> + PRInt32 n_write; > >> + > >> + send_fn = fd->lower->methods->send; > >> + n_write = send_fn(fd->lower, buf, amount, flags, timeout); > >> + > >> + return n_write; > >> +} > >> + > >> +static PRStatus > >> +pg_ssl_close(PRFileDesc *fd) > >> +{ > >> + /* > >> + * Disconnect our private Port from the fd before closing out the stack. > >> + * (Debug builds of NSPR will assert if we do not.) > >> + */ > >> + fd->secret = NULL; > >> + return PR_GetDefaultIOMethods()->close(fd); > >> +} > > > > Regarding these, I find myself wondering how they're different from the > > defaults..? I mean, the above just directly called > > PR_GetDefaultIOMethods() to then call it's close() function- are the > > fd->lower_methods->recv/send not the default methods? I don't quite get > > what the point is from having our own callbacks here if they just do > > exactly what the defaults would do (or are there actually no defined > > defaults and you have to provide these..?). > > It's really just to cope with debug builds of NSPR which assert that fd->secret > is null before closing. And we have to override the recv/send functions for this too..? Sorry, my comment wasn't just about the close() method but about the others too. > >> + /* > >> + * Return the underlying PRFileDesc which can be used to access > >> + * information on the connection details. There is no SSL context per se. 
> >> + */ > >> + if (strcmp(struct_name, "NSS") == 0) > >> + return conn->pr_fd; > >> + return NULL; > >> +} > > > > Is there never a reason someone might want the pointer returned by > > NSS_InitContext? I don't know that there is but it might be something > > to consider (we could even possibly have our own structure returned by > > this function which includes both, maybe..?). Not sure if there's a > > sensible use-case for that or not just wanted to bring it up as it's > > something I asked myself while reading through this patch. > > Not sure I understand what you're asking for here, did you mean "is there ever > a reason"? Eh, poor wording on my part. You're right, the question, reworded again, was "Would someone want to get the context returned by NSS_InitContext?". If we think there's a reason that someone might want that context then perhaps we should allow getting it, in addition to the pr_fd. If there's really no reason to ever want the context from NSS_InitContext then what you have here where we're returning pr_fd is probably fine. > >> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c > >> index c601071838..7f10da3010 100644 > >> --- a/src/interfaces/libpq/fe-secure.c > >> +++ b/src/interfaces/libpq/fe-secure.c > >> @@ -448,6 +448,27 @@ PQdefaultSSLKeyPassHook_OpenSSL(char *buf, int size, PGconn *conn) > >> } > >> #endif /* USE_OPENSSL */ > >> > >> +#ifndef USE_NSS > >> + > >> +PQsslKeyPassHook_nss_type > >> +PQgetSSLKeyPassHook_nss(void) > >> +{ > >> + return NULL; > >> +} > >> + > >> +void > >> +PQsetSSLKeyPassHook_nss(PQsslKeyPassHook_nss_type hook) > >> +{ > >> + return; > >> +} > >> + > >> +char * > >> +PQdefaultSSLKeyPassHook_nss(PK11SlotInfo * slot, PRBool retry, void *arg) > >> +{ > >> + return NULL; > >> +} > >> +#endif /* USE_NSS */ > > > > Isn't this '!USE_NSS'? > > Technically it is, but using just /* USE_NSS */ is consistent with the rest of > blocks in the file. Hrmpf. 
I guess it seems a bit confusing to me to have to go find the opening #ifndef to realize that it's actually !USE_NSS.. In other words, I would think we'd actually want to fix all of these, heh. I only actually see one case on a quick grep where it's wrong for USE_OPENSSL and so that doesn't seem like it's really a precedent and is more of a bug. We certainly say 'not OPENSSL' in one place today too and also have a number of places where we have: #endif ... /* ! WHATEVER */. > >> diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h > >> index 0c9e95f1a7..f15af39222 100644 > >> --- a/src/interfaces/libpq/libpq-int.h > >> +++ b/src/interfaces/libpq/libpq-int.h > >> @@ -383,6 +383,7 @@ struct pg_conn > >> char *sslrootcert; /* root certificate filename */ > >> char *sslcrl; /* certificate revocation list filename */ > >> char *sslcrldir; /* certificate revocation list directory name */ > >> + char *cert_database; /* NSS certificate/key database */ > >> char *requirepeer; /* required peer credentials for local sockets */ > >> char *gssencmode; /* GSS mode (require,prefer,disable) */ > >> char *krbsrvname; /* Kerberos service name */ > >> @@ -507,6 +508,28 @@ struct pg_conn > >> * OpenSSL version changes */ > >> #endif > >> #endif /* USE_OPENSSL */ > >> + > >> +/* > >> + * The NSS/NSPR specific types aren't used to avoid pulling in the required > >> + * headers here, as they are causing conflicts with PG definitions. > >> + */ > > > > I'm a bit confused- what are the conflicts being caused here..? > > Certainly under USE_OPENSSL we use the actual OpenSSL types.. > > It's referring to collisions with for example BITS_PER_BYTE which is defined > both by postgres and nspr. Since writing this I've introduced src/common/nss.h > to handle it in a single place, so we can indeed use the proper types without > polluting the file. Fixed. Great, thanks!
> >> Subject: [PATCH v30 2/9] Refactor SSL testharness for multiple library > >> > >> The SSL testharness was fully tied to OpenSSL in the way the server was > >> set up and reconfigured. This refactors the SSLServer module into a SSL > >> library agnostic SSL/Server module which in turn use SSL/Backend/<lib> > >> modules for the implementation details. > >> > >> No changes are done to the actual tests, this only change how setup and > >> teardown is performed. > > > > Presumably this could be committed ahead of the main NSS support? > > Correct, I think this has merits even if NSS support is ultimately rejected. Ok- could you break it out on to its own thread and I'll see about committing it soonish, to get it out of the way? > >> Subject: [PATCH v30 5/9] nss: Documentation > >> +++ b/doc/src/sgml/acronyms.sgml > >> @@ -684,6 +717,16 @@ > >> </listitem> > >> </varlistentry> > >> > >> + <varlistentry> > >> + <term><acronym>TLS</acronym></term> > >> + <listitem> > >> + <para> > >> + <ulink url="https://en.wikipedia.org/wiki/Transport_Layer_Security"> > >> + Transport Layer Security</ulink> > >> + </para> > >> + </listitem> > >> + </varlistentry> > > > > We don't have this already..? Surely we should.. > > We really should, especially since we've had <acronym>TLS</acronym> in > config.sgml since 2014 (c6763156589). That's another small piece that could be > committed on it's own to cut down the size of this patchset (even if only by a > tiny amount). Ditto on this. :) > >> @@ -1288,7 +1305,9 @@ include_dir 'conf.d' > >> connections using TLS version 1.2 and lower are affected. There is > >> currently no setting that controls the cipher choices used by TLS > >> version 1.3 connections. The default value is > >> - <literal>HIGH:MEDIUM:+3DES:!aNULL</literal>. The default is usually a > >> + <literal>HIGH:MEDIUM:+3DES:!aNULL</literal> for servers which have > >> + been built with <productname>OpenSSL</productname> as the > >> + <acronym>SSL</acronym> library. 
The default is usually a > >> reasonable choice unless you have specific security requirements. > >> </para> > > > > Shouldn't we say something here wrt NSS? > > We should, but I'm not entirely what just yet. Need to revisit that. Not sure if we really want to do this but at least with ssllabs.com, postgresql.org gets an 'A' rating with this set: ECDHE-ECDSA-CHACHA20-POLY1305 ECDHE-RSA-CHACHA20-POLY1305 ECDHE-ECDSA-AES128-GCM-SHA256 ECDHE-RSA-AES128-GCM-SHA256 ECDHE-ECDSA-AES256-GCM-SHA384 ECDHE-RSA-AES256-GCM-SHA384 DHE-RSA-AES128-GCM-SHA256 DHE-RSA-AES256-GCM-SHA384 ECDHE-ECDSA-AES128-SHA256 ECDHE-RSA-AES128-SHA256 ECDHE-ECDSA-AES128-SHA ECDHE-RSA-AES256-SHA384 ECDHE-RSA-AES128-SHA ECDHE-ECDSA-AES256-SHA384 ECDHE-ECDSA-AES256-SHA ECDHE-RSA-AES256-SHA DHE-RSA-AES128-SHA256 DHE-RSA-AES128-SHA DHE-RSA-AES256-SHA256 DHE-RSA-AES256-SHA ECDHE-ECDSA-DES-CBC3-SHA ECDHE-RSA-DES-CBC3-SHA EDH-RSA-DES-CBC3-SHA AES128-GCM-SHA256 AES256-GCM-SHA384 AES128-SHA256 AES256-SHA256 AES128-SHA AES256-SHA DES-CBC3-SHA !DSS Which also seems kinda close to what the default when built with OpenSSL ends up being? Thought the ssllabs report does list which ones it thinks are weak and so we might consider excluding those by default too: https://www.ssllabs.com/ssltest/analyze.html?d=postgresql.org&s=2a02%3a16a8%3adc51%3a0%3a0%3a0%3a0%3a50 > >> @@ -1490,8 +1509,11 @@ include_dir 'conf.d' > >> <para> > >> Sets an external command to be invoked when a passphrase for > >> decrypting an SSL file such as a private key needs to be obtained. By > >> - default, this parameter is empty, which means the built-in prompting > >> - mechanism is used. > >> + default, this parameter is empty. When the server is using > >> + <productname>OpenSSL</productname>, this means the built-in prompting > >> + mechanism is used. When using <productname>NSS</productname>, there is > >> + no default prompting so a blank callback will be used returning an > >> + empty password. 
> >> </para> > > > > Maybe we should point out here that this requires the database to not > > require a password..? So if they have one, they need to set this, or > > maybe we should provide a default one.. > > I've added a sentence on not using a password for the cert database. I'm not > sure if providing a default one is a good idea but it's no less insecure than > having no password really.. I was meaning a default callback to prompt, not sure if that was clear. > > Just a, well, not so quick read-through. Generally it's looking pretty > > good to me. Will see about playing with it this week. > > Thanks again for reviewing, another version which addresses the remaining > issues will be posted soon but I wanted to get this out to give further reviews > something that properly works. Fantastic, thanks again! Stephen
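The cipher handling discussed earlier in this message (ignore names the current library does not know, and fail only when nothing usable is left) could be sketched roughly as below. This is a standalone illustration, not code from the patch: the `known_ciphers` table, `lookup_cipher()` and `parse_cipher_list()` are hypothetical names, the colon separator is an assumption borrowed from the OpenSSL-style strings above, and the numeric values are simply the IANA-assigned suite codes for the names shown.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name -> code table; a real backend would use the library's own. */
typedef struct
{
	const char *name;
	unsigned int code;
} cipher_entry;

static const cipher_entry known_ciphers[] = {
	{"TLS_AES_128_GCM_SHA256", 0x1301},
	{"TLS_AES_256_GCM_SHA384", 0x1302},
	{"TLS_CHACHA20_POLY1305_SHA256", 0x1303},
	{"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", 0xC02F},
};

/* Return the code for a cipher name, or 0 when the name is not known. */
static unsigned int
lookup_cipher(const char *name)
{
	size_t		i;

	for (i = 0; i < sizeof(known_ciphers) / sizeof(known_ciphers[0]); i++)
	{
		if (strcmp(known_ciphers[i].name, name) == 0)
			return known_ciphers[i].code;
	}
	return 0;
}

/*
 * Parse a colon-separated cipher list.  Unknown names are skipped silently
 * so that one string can cover several TLS libraries.  Returns the number of
 * codes stored in "out", or -1 if the list is too long for the local buffer.
 * A return value of 0 means no usable cipher was found; the caller is
 * expected to treat that as an error.
 */
static int
parse_cipher_list(const char *list, unsigned int *out, int max_out)
{
	char		buf[512];
	char	   *tok;
	int			n = 0;

	if (strlen(list) >= sizeof(buf))
		return -1;
	strcpy(buf, list);

	for (tok = strtok(buf, ":"); tok != NULL; tok = strtok(NULL, ":"))
	{
		unsigned int code = lookup_cipher(tok);

		if (code == 0)
			continue;			/* not a cipher this library knows about */
		if (n < max_out)
			out[n++] = code;
	}
	return n;
}
```

With this shape a mixed string such as `TLS_AES_128_GCM_SHA256:ECDHE-RSA-AES128-GCM-SHA256` yields one enabled cipher and no noise in the log, while a string made up only of OpenSSL keywords yields zero, which is where the COMMERROR would be raised.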
On Tue, 2021-03-23 at 00:38 +0100, Daniel Gustafsson wrote: > This rebase also includes a fix for pgtls_init which was sent offlist by Jacob. > The changes in pgtls_init can potentially be used to initialize the crypto > context for NSS to clean up this patch, Jacob is currently looking at that. I'm having a hell of a time trying to get the context stuff working. Findings so far (I have patches in progress for many of these, but it's all blowing up because of the last problem): NSS_INIT_NOROOTINIT is hardcoded for NSS_InitContext(), so we probably don't need to pass it explicitly. NSS_INIT_PK11RELOAD is apparently meant to hack around libraries that do their own PKCS loading; do we need it? NSS_ShutdownContext() can (and does) fail if we've leaked handles to objects, so we need to check its return value. Once this happens, future NSS_InitContext() calls behave poorly. Currently we leak the pr_fd as well as a handful of server_cert handles. NSS_NoDB_Init() is going to pin NSS in memory. For the backend this is probably okay, but for libpq clients that's probably not what we want. The first database loaded by NSS_InitContext() becomes the "default" database. This is what I'm currently hung up on. I can't figure out how to get NSS to use the database that was loaded for the current connection, so in my local patches for the issues above, client certificates fail to load. I can work around it temporarily for the tests, but this will be a problem if any libpq clients load up multiple independent databases for use with separate connections. Anyone know if this is a supported use case for NSS? --Jacob
On Wed, Mar 24, 2021 at 12:05:35AM +0000, Jacob Champion wrote: > The first database loaded by NSS_InitContext() becomes the "default" > database. This is what I'm currently hung up on. I can't figure out how > to get NSS to use the database that was loaded for the current > connection, so in my local patches for the issues above, client > certificates fail to load. I can work around it temporarily for the > tests, but this will be a problem if any libpq clients load up multiple > independent databases for use with separate connections. Anyone know if > this is a supported use case for NSS? Are you referring to the case of threading here? This should be a supported case, as threads created by an application through libpq could perfectly use completely different connection strings. -- Michael
On Tue, Mar 23, 2021 at 12:38:50AM +0100, Daniel Gustafsson wrote: > Thanks again for reviewing, another version which addresses the remaining > issues will be posted soon but I wanted to get this out to give further reviews > something that properly works. I have been looking at the infrastructure of the tests, patches 0002 (some refactoring) and 0003 (more refactoring with tests for NSS), and I am a bit confused by its state. First, I think that the split is not completely clear. For example, patch 0003 has changes for OpenSSL.pm and Server.pm, but wouldn't it be better to have all the refactoring infrastructure only in 0002, with 0003 introducing only the NSS pieces for its internal data and NSS.pm? + keyfile => 'server-password', + nssdatabase => 'server-cn-only.crt__server-password.key.db', + passphrase_cmd => 'echo secret1', 001_ssltests.pl and 002_scram.pl have NSS-related parameters, which does not look like a clean separation to me as there are OpenSSL tests that use some NSS parts, and the main scripts should remain neutral in terms setting contents, including only variables and callbacks that should be filled specifically for each SSL implementation, no? Aren't we missing a second piece here with a set of callbacks for the per-library test paths then? + if (defined($openssl)) + { + copy_files("ssl/server-*.crt", $pgdata); + copy_files("ssl/server-*.key", $pgdata); + chmod(0600, glob "$pgdata/server-*.key") or die $!; + copy_files("ssl/root+client_ca.crt", $pgdata); + copy_files("ssl/root_ca.crt", $pgdata); + copy_files("ssl/root+client.crl", $pgdata); + mkdir("$pgdata/root+client-crldir"); + copy_files("ssl/root+client-crldir/*", "$pgdata/root+client-crldir/"); + } + elsif (defined($nss)) + { + RecursiveCopy::copypath("ssl/nss", $pgdata . "/nss") if -e "ssl/nss"; + } This had better be in its own callback, for example. -- Michael
On Wed, 2021-03-24 at 09:28 +0900, Michael Paquier wrote: > On Wed, Mar 24, 2021 at 12:05:35AM +0000, Jacob Champion wrote: > > I can work around it temporarily for the > > tests, but this will be a problem if any libpq clients load up multiple > > independent databases for use with separate connections. Anyone know if > > this is a supported use case for NSS? > > Are you referring to the case of threading here? This should be a > supported case, as threads created by an application through libpq > could perfectly use completely different connection strings. Right, but to clarify -- I was asking if *NSS* supports loading and using separate certificate databases as part of its API. It seems like the internals make it possible, but I don't see the public interfaces to actually use those internals. --Jacob
Greetings Jacob, * Jacob Champion (pchampion@vmware.com) wrote: > On Wed, 2021-03-24 at 09:28 +0900, Michael Paquier wrote: > > On Wed, Mar 24, 2021 at 12:05:35AM +0000, Jacob Champion wrote: > > > I can work around it temporarily for the > > > tests, but this will be a problem if any libpq clients load up multiple > > > independent databases for use with separate connections. Anyone know if > > > this is a supported use case for NSS? > > > > Are you referring to the case of threading here? This should be a > > supported case, as threads created by an application through libpq > > could perfectly use completely different connection strings. > Right, but to clarify -- I was asking if *NSS* supports loading and > using separate certificate databases as part of its API. It seems like > the internals make it possible, but I don't see the public interfaces > to actually use those internals. Yes, this is done using SECMOD_OpenUserDB, see: https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11_Functions#SECMOD_OpenUserDB also there's info here: https://groups.google.com/g/mozilla.dev.tech.crypto/c/Xz6Emfcue0E We should document that, as mentioned in the link above, the NSS find functions will find certs in all the opened databases. As this would all be under one application which is linked against libpq and passing in different values for ssl_database for different connections, this doesn't seem like it's really that much of an issue. Thanks! Stephen
On Wed, 2021-03-24 at 13:00 -0400, Stephen Frost wrote: > * Jacob Champion (pchampion@vmware.com) wrote: > > Right, but to clarify -- I was asking if *NSS* supports loading and > > using separate certificate databases as part of its API. It seems like > > the internals make it possible, but I don't see the public interfaces > > to actually use those internals. > > Yes, this is done using SECMOD_OpenUserDB, see: > > https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11_Functions#SECMOD_OpenUserDB Ah, I had assumed that the DB-specific InitContext was using this behind the scenes; apparently not. I will give that a try, thanks! > also there's info here: > > https://groups.google.com/g/mozilla.dev.tech.crypto/c/Xz6Emfcue0E > > We should document that, as mentioned in the link above, the NSS find > functions will find certs in all the opened databases. As this would > all be under one application which is linked against libpq and passing > in different values for ssl_database for different connections, this > doesn't seem like it's really that much of an issue. I could see this being a problem if two client certificate nicknames collide across multiple in-use databases, maybe? --Jacob
Greetings, * Jacob Champion (pchampion@vmware.com) wrote: > On Wed, 2021-03-24 at 13:00 -0400, Stephen Frost wrote: > > * Jacob Champion (pchampion@vmware.com) wrote: > > > Right, but to clarify -- I was asking if *NSS* supports loading and > > > using separate certificate databases as part of its API. It seems like > > > the internals make it possible, but I don't see the public interfaces > > > to actually use those internals. > > > > Yes, this is done using SECMOD_OpenUserDB, see: > > > > https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11_Functions#SECMOD_OpenUserDB > > Ah, I had assumed that the DB-specific InitContext was using this > behind the scenes; apparently not. I will give that a try, thanks! > > > also there's info here: > > > > https://groups.google.com/g/mozilla.dev.tech.crypto/c/Xz6Emfcue0E > > > > We should document that, as mentioned in the link above, the NSS find > > functions will find certs in all the opened databases. As this would > > all be under one application which is linked against libpq and passing > > in different values for ssl_database for different connections, this > > doesn't seem like it's really that much of an issue. > > I could see this being a problem if two client certificate nicknames > collide across multiple in-use databases, maybe? Right, in such a case either cert might get returned and it's possible that the "wrong" one is returned and therefore the connection would end up failing, assuming that they aren't actually the same and just happen to be in both. Seems like we could use SECMOD_OpenUserDB() and then pass the result from that into PK11_ListCertsInSlot() and scan through the certs in just the specified database to find the one we're looking for if we really feel compelled to try and address this risk. I've reached out to the NSS folks to see if they have any thoughts about the best way to address this. Thanks, Stephen
> On 24 Mar 2021, at 04:54, Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Mar 23, 2021 at 12:38:50AM +0100, Daniel Gustafsson wrote: >> Thanks again for reviewing, another version which addresses the remaining >> issues will be posted soon but I wanted to get this out to give further reviews >> something that properly works. > > I have been looking at the infrastructure of the tests, patches 0002 > (some refactoring) and 0003 (more refactoring with tests for NSS), and > I am a bit confused by its state. > > First, I think that the split is not completely clear. For example, > patch 0003 has changes for OpenSSL.pm and Server.pm, but wouldn't it > be better to have all the refactoring infrastructure only in 0002, > with 0003 introducing only the NSS pieces for its internal data and > NSS.pm? Yes. Juggling a patchset of this size is error-prone. This is why I opened the separate thread for this where the patch can be kept apart more cleanly, so let's take this discussion over there. I will post an updated patch there shortly. > + keyfile => 'server-password', > + nssdatabase => 'server-cn-only.crt__server-password.key.db', > + passphrase_cmd => 'echo secret1', > 001_ssltests.pl and 002_scram.pl have NSS-related parameters, which > does not look like a clean separation to me as there are OpenSSL tests > that use some NSS parts, and the main scripts should remain neutral in > terms setting contents, including only variables and callbacks that > should be filled specifically for each SSL implementation, no? Aren't > we missing a second piece here with a set of callbacks for the > per-library test paths then? Well, then again, keyfile is an OpenSSL specific parameter, it just happens to be named quite neutrally. I'm not sure how to best express the certificate and key requirements of a test since the testcase is the source of truth in terms of what it requires.
If we introduce a standard set of cert/keys which all backends are required to supply, we could refer to those. Tests that need something more specific can then go into 00X_<library>.pl. There is a balance to strike though: there is a single backend now, with at most one on the horizon which is yet to be decided upon, so making it too generic may end up making test writing overcomplicated. Do you have any concrete ideas?

> + if (defined($openssl))
> + {
> +     copy_files("ssl/server-*.crt", $pgdata);
> +     copy_files("ssl/server-*.key", $pgdata);
> +     chmod(0600, glob "$pgdata/server-*.key") or die $!;
> +     copy_files("ssl/root+client_ca.crt", $pgdata);
> +     copy_files("ssl/root_ca.crt", $pgdata);
> +     copy_files("ssl/root+client.crl", $pgdata);
> +     mkdir("$pgdata/root+client-crldir");
> +     copy_files("ssl/root+client-crldir/*", "$pgdata/root+client-crldir/");
> + }
> + elsif (defined($nss))
> + {
> +     RecursiveCopy::copypath("ssl/nss", $pgdata . "/nss") if -e "ssl/nss";
> + }
> This had better be in its own callback, for example.

Yes, this one is a clearer case, fixed in the v2 patch which will be posted on the separate thread.

--
Daniel Gustafsson https://vmware.com/
On Wed, 2021-03-24 at 14:10 -0400, Stephen Frost wrote:
> * Jacob Champion (pchampion@vmware.com) wrote:
> > I could see this being a problem if two client certificate nicknames
> > collide across multiple in-use databases, maybe?
>
> Right, in such a case either cert might get returned and it's possible
> that the "wrong" one is returned and therefore the connection would end
> up failing, assuming that they aren't actually the same and just happen
> to be in both.
>
> Seems like we could use SECMOD_OpenUserDB() and then pass the result
> from that into PK11_ListCertsInSlot() and scan through the certs in just
> the specified database to find the one we're looking for if we really
> feel compelled to try and address this risk. I've reached out to the
> NSS folks to see if they have any thoughts about the best way to address
> this.

Some additional findings (NSS 3.63), please correct me if I've made any mistakes:

The very first NSSInitContext created is special. If it contains a database, that database will be considered part of the "internal" slot and its certificates can be referenced directly by nickname. If it doesn't have a database, the internal slot has no certificates, and it will continue to have zero certificates until NSS is completely shut down and reinitialized with a new "first" context.

Databases that are opened *after* the first one are given their own separate slots. Any certificates that are part of those databases seemingly can't be referenced directly by nickname. They have to be prefixed by their token name -- a name which you don't have if you used NSS_InitContext() to create the database. You have to use SECMOD_OpenUserDB() instead. This explains some strange failures I was seeing in local testing, where the order of InitContext determined whether our client certificate selection succeeded or failed.

If you SECMOD_OpenUserDB() a database that is identical to the first (internal) database, NSS deduplicates for you and just returns the internal slot. Which seems like it's helpful, except you're not allowed to close that database, and you have to know not to close it by checking to see whether that slot is the "internal key slot". It appears to remain open until NSS is shut down entirely.

But if you open a database that is *not* the magic internal database, and then open a duplicate of that one, NSS creates yet another new slot for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource hog, depending on the global state of the process at the time libpq opens its first connection. We won't be able to control what the parent application will do before loading us up.

It also doesn't look like any of the SECMOD_* machinery that we're looking at is thread-safe, but I'd really like to be wrong...

--Jacob
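To make the nickname point above concrete, here is a minimal sketch of how a lookup name might be assembled depending on which slot a certificate lives in. It assumes the token prefix is joined to the nickname with a colon ("token:nickname"); the helper name qualify_nickname() is hypothetical and is not part of NSS or of the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the name under which a certificate would be
 * looked up. A NULL or empty token_name stands for the internal slot, where
 * the bare nickname is enough. For any other slot the nickname is prefixed
 * with the token name (assumed separator: ':').
 *
 * Returns 0 on success, -1 if dst is too small to hold the result.
 */
static int
qualify_nickname(char *dst, size_t dstlen,
                 const char *token_name, const char *nickname)
{
    int n;

    if (token_name == NULL || token_name[0] == '\0')
        n = snprintf(dst, dstlen, "%s", nickname);
    else
        n = snprintf(dst, dstlen, "%s:%s", token_name, nickname);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t) n >= dstlen) ? -1 : 0;
}
```

The token name itself would have to come from the slot returned when the database was opened, which is why the order of initialization matters in the failures described above.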
> On 23 Mar 2021, at 20:04, Stephen Frost <sfrost@snowman.net> wrote: > > Greetings, > > * Daniel Gustafsson (daniel@yesql.se) wrote: >>> On 22 Mar 2021, at 00:49, Stephen Frost <sfrost@snowman.net> wrote: >> >> Thanks for the review! Below is a partial response, I haven't had time to >> address all your review comments yet but I wanted to submit a rebased patchset >> directly since the current version doesn't work after recent changes in the >> tree. I will address the remaining comments tomorrow or the day after. > > Great, thanks! > >> This rebase also includes a fix for pgtls_init which was sent offlist by Jacob. >> The changes in pgtls_init can potentially be used to initialize the crypto >> context for NSS to clean up this patch, Jacob is currently looking at that. > > Ah, cool, sounds good. > >>> They aren't the same and it might not be >>> clear what's going on if one was to somehow mix them (at least if pr_fd >>> continues to sometimes be a void*, but I wonder why that's being >>> done..? more on that later..). >> >> To paraphrase from a later in this email, there are collisions between nspr and >> postgres on things like BITS_PER_BYTE, and there were also collisions on basic >> types until I learned about NO_NSPR_10_SUPPORT. By moving the juggling of this >> into common/nss.h we can use proper types without introducing that pollution >> everywhere. I will address these places. > > Ah, ok, and great, that sounds good. >>>> + /* >>>> + * Set the fallback versions for the TLS protocol version range to a >>>> + * combination of our minimal requirement and the library maximum. Error >>>> + * messages should be kept identical to those in be-secure-openssl.c to >>>> + * make translations easier. >>>> + */ >>> >>> Should we pull these error messages out into another header so that >>> they're in one place to make sure they're kept consistent, if we really >>> want to put the effort in to keep them the same..? 
I'm not 100% sure >>> that it's actually necessary to do so, but defining these in one place >>> would help maintain this if we want to. Also alright with just keeping >>> the comment, not that big of a deal. >> >> It might make sense to pull them into common/nss.h, but seeing the error >> message right there when reading the code does IMO make it clearer so it's a >> doubleedged sword. Not sure what is the best option, but I'm not married to >> the current solution so if there is consensus to pull them out somewhere I'm >> happy to do so. > > My thought was to put them into some common/ssl.h or something along > those lines but I don't see it as a big deal either way really. You > make a good point that having the error message there when reading the > code is nice. Thinking more on this, I think my vote will be to keep them duplicated in the code for readability. Unless there are strong feelings against I think we at least should start there. >>> Maybe we should put a stake in the ground that says "we only support >>> back to version X of NSS", test with that and a few more recent versions >>> and the most recent, and then rip out anything that's needed for >>> versions which are older than that? >> >> Yes, right now there is very little in the patch which caters for old versions, >> the PR_Init call might be one of the few offenders. There has been discussion >> upthread about settling for a required version, combining the insights learned >> there with a survey of which versions are commonly available packaged. >> >> Once we settle on a version we can confirm if PR_Init is/isn't needed and >> remove all traces of it if not. > > I don't really see this as all that hard to do- I'd suggest we look at > what systems someone might reasonably deploy v14 on. To that end, I'd > say "only systems which are presently supported", so: RHEL7+, Debian 9+, > Ubuntu 16.04+. Sounds reasonable. 
> Looking at those, I see: > > Ubuntu 16.04: 3.28.4 > RHEL6: v3.28.4 > Debian: 3.26.2 I assume these have matching NSPR versions placing the Debian 9 NSPR package as the lowest required version for that? >>> I have a pretty hard time imagining that someone is going to want to build PG >>> v14 w/ NSS 2.0 ... >> >> Let alone compiling 2.0 at all on a recent system.. > > Indeed, and given the above, it seems entirely reasonable to make the > requirement be NSS v3+, no? I wouldn't be against making that even > tighter if we thought it made sense to do so. I think anything but doing that would be incredibly unreasonable. >>> Also- we don't seem to complain at all about a cipher being specified that we >>> don't find? Guess I would think that we might want to throw a WARNING in such >>> a case, but I could possibly be convinced otherwise. >> >> No, I think you're right, we should throw WARNING there or possibly even a >> higher elevel. Should that be a COMMERROR even? > > I suppose the thought I was having was that we might want to allow some > string that covered all the OpenSSL and NSS ciphers that someone feels > comfortable with and we'd just ignore the ones that don't make sense for > the particular library we're currently built with. Making it a > COMMERROR seems like overkill and I'm not entirely sure we actually want > any warning since we might then be constantly bleating about it. Right, with a string like that we'd induce WARNING fatigue quickly. Catching the case of *no* ciphers enabled with a COMMERROR is going some way towards being helpful to the user in debugging the failed connection here. 
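A rough sketch of the behaviour discussed above: unknown entries in the cipher string are skipped silently, and only the case where nothing at all ends up enabled is treated as an error by the caller. The table of names and both function names are hypothetical stand-ins, not what the patch does; a real implementation would consult the TLS library instead of a fixed array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the ciphers the linked TLS library knows. */
static const char *const known_ciphers[] = {
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "DHE-RSA-AES128-GCM-SHA256",
};

/* Return 1 if the len-byte string at name matches a known cipher exactly. */
static int
cipher_is_known(const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(known_ciphers) / sizeof(known_ciphers[0]); i++)
    {
        if (strlen(known_ciphers[i]) == len &&
            strncmp(known_ciphers[i], name, len) == 0)
            return 1;
    }
    return 0;
}

/*
 * Walk a colon-separated cipher list and count the entries we recognize.
 * Unrecognized entries are ignored without a warning, so one string can
 * carry names meant for several TLS libraries. A return value of 0 is the
 * case the caller would report as a hard error.
 */
static int
count_enabled_ciphers(const char *list)
{
    int         enabled = 0;
    const char *p = list;

    while (*p != '\0')
    {
        size_t len = strcspn(p, ":");

        if (len > 0 && cipher_is_known(p, len))
            enabled++;
        p += len;
        if (*p == ':')
            p++;
    }
    return enabled;
}
```

With this shape, a mixed OpenSSL/NSS string produces no noise per connection, while an all-unknown string still surfaces as a single clear failure.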
>>>> +pg_ssl_read(PRFileDesc *fd, void *buf, PRInt32 amount, PRIntn flags, >>>> + PRIntervalTime timeout) >>>> +{ >>>> + PRRecvFN read_fn; >>>> + PRInt32 n_read; >>>> + >>>> + read_fn = fd->lower->methods->recv; >>>> + n_read = read_fn(fd->lower, buf, amount, flags, timeout); >>>> + >>>> + return n_read; >>>> +} >>>> + >>>> +static PRInt32 >>>> +pg_ssl_write(PRFileDesc *fd, const void *buf, PRInt32 amount, PRIntn flags, >>>> + PRIntervalTime timeout) >>>> +{ >>>> + PRSendFN send_fn; >>>> + PRInt32 n_write; >>>> + >>>> + send_fn = fd->lower->methods->send; >>>> + n_write = send_fn(fd->lower, buf, amount, flags, timeout); >>>> + >>>> + return n_write; >>>> +} >>>> + >>>> +static PRStatus >>>> +pg_ssl_close(PRFileDesc *fd) >>>> +{ >>>> + /* >>>> + * Disconnect our private Port from the fd before closing out the stack. >>>> + * (Debug builds of NSPR will assert if we do not.) >>>> + */ >>>> + fd->secret = NULL; >>>> + return PR_GetDefaultIOMethods()->close(fd); >>>> +} >>> >>> Regarding these, I find myself wondering how they're different from the >>> defaults..? I mean, the above just directly called >>> PR_GetDefaultIOMethods() to then call it's close() function- are the >>> fd->lower_methods->recv/send not the default methods? I don't quite get >>> what the point is from having our own callbacks here if they just do >>> exactly what the defaults would do (or are there actually no defined >>> defaults and you have to provide these..?). >> >> It's really just to cope with debug builds of NSPR which assert that fd->secret >> is null before closing. > > And we have to override the recv/send functions for this too..? Sorry, > my comment wasn't just about the close() method but about the others > too. Ah, no we can ditch the .send and .recv functions and stick with the default built-ins, I just confirmed this and removed them. I think they are leftovers from when I injected debug code there during development, they were as you say copies of the default. 
>>>> + /* >>>> + * Return the underlying PRFileDesc which can be used to access >>>> + * information on the connection details. There is no SSL context per se. >>>> + */ >>>> + if (strcmp(struct_name, "NSS") == 0) >>>> + return conn->pr_fd; >>>> + return NULL; >>>> +} >>> >>> Is there never a reason someone might want the pointer returned by >>> NSS_InitContext? I don't know that there is but it might be something >>> to consider (we could even possibly have our own structure returned by >>> this function which includes both, maybe..?). Not sure if there's a >>> sensible use-case for that or not just wanted to bring it up as it's >>> something I asked myself while reading through this patch. >> >> Not sure I understand what you're asking for here, did you mean "is there ever >> a reason"? > > Eh, poor wording on my part. You're right, the question, reworded > again, was "Would someone want to get the context returned by > NSS_InitContext?". If we think there's a reason that someone might want > that context then perhaps we should allow getting it, in addition to the > pr_fd. If there's really no reason to ever want the context from > NSS_InitContext then what you have here where we're returning pr_fd is > probably fine. I can't think of any reason, maybe Jacob who has been knee-deep in NSS contexts have insights which tell a different story? 
>>>> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c >>>> index c601071838..7f10da3010 100644 >>>> --- a/src/interfaces/libpq/fe-secure.c >>>> +++ b/src/interfaces/libpq/fe-secure.c >>>> @@ -448,6 +448,27 @@ PQdefaultSSLKeyPassHook_OpenSSL(char *buf, int size, PGconn *conn) >>>> } >>>> #endif /* USE_OPENSSL */ >>>> >>>> +#ifndef USE_NSS >>>> + >>>> +PQsslKeyPassHook_nss_type >>>> +PQgetSSLKeyPassHook_nss(void) >>>> +{ >>>> + return NULL; >>>> +} >>>> + >>>> +void >>>> +PQsetSSLKeyPassHook_nss(PQsslKeyPassHook_nss_type hook) >>>> +{ >>>> + return; >>>> +} >>>> + >>>> +char * >>>> +PQdefaultSSLKeyPassHook_nss(PK11SlotInfo * slot, PRBool retry, void *arg) >>>> +{ >>>> + return NULL; >>>> +} >>>> +#endif /* USE_NSS */ >>> >>> Isn't this '!USE_NSS'? >> >> Technically it is, but using just /* USE_NSS */ is consistent with the rest of >> blocks in the file. > > Hrmpf. I guess it seems a bit confusing to me to have to go find the > opening #ifndef to realize that it's actally !USE_NSS.. In other words, > I would think we'd actually want to fix all of these, heh. I only > actually see one case on a quick grep where it's wrong for USE_OPENSSL > and so that doesn't seem like it's really a precedent and is more of a > bug. We certainly say 'not OPENSSL' in one place today too and also > have a number of places where we have: #endif ... /* ! WHATEVER */. No disagreement from me. To cut down the size of this patchset however I propose that we tackle this separately and leave this as is in this thread since it's in line with the rest of the file (for now). >>>> Subject: [PATCH v30 2/9] Refactor SSL testharness for multiple library >>>> >>>> The SSL testharness was fully tied to OpenSSL in the way the server was >>>> set up and reconfigured. This refactors the SSLServer module into a SSL >>>> library agnostic SSL/Server module which in turn use SSL/Backend/<lib> >>>> modules for the implementation details. 
>>>> >>>> No changes are done to the actual tests, this only change how setup and >>>> teardown is performed. >>> >>> Presumably this could be committed ahead of the main NSS support? >> >> Correct, I think this has merits even if NSS support is ultimately rejected. > > Ok- could you break it out on to its own thread and I'll see about > committing it soonish, to get it out of the way? It was already on it's own thread, as we discussed offlist. I have since rebased and expanded that patch over in that thread which has gotten review that needs to be addressed. As such, I will not update that patch in the series in this thread but keep the changes on that thread, and then pull them back into here when ready. >>>> Subject: [PATCH v30 5/9] nss: Documentation >>>> +++ b/doc/src/sgml/acronyms.sgml >>>> @@ -684,6 +717,16 @@ >>>> </listitem> >>>> </varlistentry> >>>> >>>> + <varlistentry> >>>> + <term><acronym>TLS</acronym></term> >>>> + <listitem> >>>> + <para> >>>> + <ulink url="https://en.wikipedia.org/wiki/Transport_Layer_Security"> >>>> + Transport Layer Security</ulink> >>>> + </para> >>>> + </listitem> >>>> + </varlistentry> >>> >>> We don't have this already..? Surely we should.. >> >> We really should, especially since we've had <acronym>TLS</acronym> in >> config.sgml since 2014 (c6763156589). That's another small piece that could be >> committed on it's own to cut down the size of this patchset (even if only by a >> tiny amount). > > Ditto on this. :) Done in https://postgr.es/m/27109504-82DB-41A8-8E63-C0498314F5B0@yesql.se >>>> @@ -1288,7 +1305,9 @@ include_dir 'conf.d' >>>> connections using TLS version 1.2 and lower are affected. There is >>>> currently no setting that controls the cipher choices used by TLS >>>> version 1.3 connections. The default value is >>>> - <literal>HIGH:MEDIUM:+3DES:!aNULL</literal>. 
The default is usually a >>>> + <literal>HIGH:MEDIUM:+3DES:!aNULL</literal> for servers which have >>>> + been built with <productname>OpenSSL</productname> as the >>>> + <acronym>SSL</acronym> library. The default is usually a >>>> reasonable choice unless you have specific security requirements. >>>> </para> >>> >>> Shouldn't we say something here wrt NSS? >> >> We should, but I'm not entirely what just yet. Need to revisit that. > > Not sure if we really want to do this but at least with ssllabs.com, > postgresql.org gets an 'A' rating with this set: > > ECDHE-ECDSA-CHACHA20-POLY1305 > ECDHE-RSA-CHACHA20-POLY1305 > ECDHE-ECDSA-AES128-GCM-SHA256 > ECDHE-RSA-AES128-GCM-SHA256 > ECDHE-ECDSA-AES256-GCM-SHA384 > ECDHE-RSA-AES256-GCM-SHA384 > DHE-RSA-AES128-GCM-SHA256 > DHE-RSA-AES256-GCM-SHA384 > ECDHE-ECDSA-AES128-SHA256 > ECDHE-RSA-AES128-SHA256 > ECDHE-ECDSA-AES128-SHA > ECDHE-RSA-AES256-SHA384 > ECDHE-RSA-AES128-SHA > ECDHE-ECDSA-AES256-SHA384 > ECDHE-ECDSA-AES256-SHA > ECDHE-RSA-AES256-SHA > DHE-RSA-AES128-SHA256 > DHE-RSA-AES128-SHA > DHE-RSA-AES256-SHA256 > DHE-RSA-AES256-SHA > ECDHE-ECDSA-DES-CBC3-SHA > ECDHE-RSA-DES-CBC3-SHA > EDH-RSA-DES-CBC3-SHA > AES128-GCM-SHA256 > AES256-GCM-SHA384 > AES128-SHA256 > AES256-SHA256 > AES128-SHA > AES256-SHA > DES-CBC3-SHA > !DSS > > Which also seems kinda close to what the default when built with OpenSSL > ends up being? Thought the ssllabs report does list which ones it > thinks are weak and so we might consider excluding those by default too: > > https://www.ssllabs.com/ssltest/analyze.html?d=postgresql.org&s=2a02%3a16a8%3adc51%3a0%3a0%3a0%3a0%3a50 Agreed, maintaining parity (or thereabouts) with OpenSSL defaults taking industry best practices into account is probably what we should aim for. >>>> @@ -1490,8 +1509,11 @@ include_dir 'conf.d' >>>> <para> >>>> Sets an external command to be invoked when a passphrase for >>>> decrypting an SSL file such as a private key needs to be obtained. 
By >>>> - default, this parameter is empty, which means the built-in prompting >>>> - mechanism is used. >>>> + default, this parameter is empty. When the server is using >>>> + <productname>OpenSSL</productname>, this means the built-in prompting >>>> + mechanism is used. When using <productname>NSS</productname>, there is >>>> + no default prompting so a blank callback will be used returning an >>>> + empty password. >>>> </para> >>> >>> Maybe we should point out here that this requires the database to not >>> require a password..? So if they have one, they need to set this, or >>> maybe we should provide a default one.. >> >> I've added a sentence on not using a password for the cert database. I'm not >> sure if providing a default one is a good idea but it's no less insecure than >> having no password really.. > > I was meaning a default callback to prompt, not sure if that was clear. Ah, no that's not what I thought you meant. Do you have any thoughts on what that callback would look like? Take a password on a TTY input? Below are a few fixes addressed from the original review email: >>> + /* >>> + * Set up the custom IO layer. >>> + */ >> >> Might be good to mention that the IO Layer is what sets up the >> read/write callbacks to be used. > > Good point, will do in the next version of the patchset. Fixed. >>> + port->pr_fd = SSL_ImportFD(model, pr_fd); >>> + if (!port->pr_fd) >>> + { >>> + ereport(COMMERROR, >>> + (errmsg("unable to initialize"))); >>> + return -1; >>> + } >> >> Maybe a comment and a better error message for this? > > Will do. Fixed. >>> >>> + PR_Close(model); >> >> This might deserve one also, the whole 'model' construct is a bit >> different. :) > > Agreed. will do. Fixed. 
>> Also, I get that they do similar jobs and that one is in the frontend >> and the other is in the backend, but I'm not a fan of having two >> 'ssl_protocol_version_to_nss()'s functions that take different argument >> types but have exact same name and do functionally different things.. > > Good point, I'll change that. Fixed. >>> + /* >>> + * Configure cipher policy. >>> + */ >>> + status = NSS_SetDomesticPolicy(); >>> + if (status != SECSuccess) >>> + { >>> + printfPQExpBuffer(&conn->errorMessage, >>> + libpq_gettext("unable to configure cipher policy: %s"), >>> + pg_SSLerrmessage(PR_GetError())); >>> + >>> + return PGRES_POLLING_FAILED; >>> + } >> >> Probably good to pull over at least some parts of the comments made in >> the backend code about SetDomesticPolicy() actually enabling everything >> (just like all the policies apparently do)... > > Good point, will do. Fixed. >>> +int >>> +be_tls_open_server(Port *port) >>> +{ >>> + SECStatus status; >>> + PRFileDesc *model; >>> + PRFileDesc *pr_fd; >> >> pr_fd here is materially different from port->pr_fd, no? As in, one is >> the NSS raw TCP fd while the other is the SSL fd, right? Maybe we >> should use two different variable names to try and make sure they don't >> get confused? Might even set this to NULL after we are done with it >> too.. Then again, I see later on that when we do the dance with the >> 'model' PRFileDesc that we just use the same variable- maybe we should >> do that? That is, just get rid of this 'pr_fd' and use port->pr_fd >> always? > > Hmm, I think you're right. I will try that for the next patchset version. >> Similar comments to the backend code- should we just always use >> conn->pr_fd? Or should we rename pr_fd to something else? > > Renaming is probably not a bad idea, will fix. Both fixed. Additionally, a few other off-list reported issues are also fixed in this version (such as fixing the silly markup doc error and testplan off-by-one etc). -- Daniel Gustafsson https://vmware.com/
Attachment
- v32-0009-nss-Build-infrastructure.patch
- v32-0008-nss-Support-NSS-in-cryptohash.patch
- v32-0007-nss-Support-NSS-in-sslinfo.patch
- v32-0006-nss-Support-NSS-in-pgcrypto.patch
- v32-0005-nss-Documentation.patch
- v32-0004-nss-pg_strong_random-support.patch
- v32-0003-nss-Add-NSS-specific-tests.patch
- v32-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v32-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Fri, 2021-03-26 at 00:22 +0100, Daniel Gustafsson wrote: > > On 23 Mar 2021, at 20:04, Stephen Frost <sfrost@snowman.net> wrote: > > > > Eh, poor wording on my part. You're right, the question, reworded > > again, was "Would someone want to get the context returned by > > NSS_InitContext?". If we think there's a reason that someone might want > > that context then perhaps we should allow getting it, in addition to the > > pr_fd. If there's really no reason to ever want the context from > > NSS_InitContext then what you have here where we're returning pr_fd is > > probably fine. > > I can't think of any reason, maybe Jacob who has been knee-deep in NSS contexts > have insights which tell a different story? The only thing you can do with a context pointer is shut it down, and I don't think that's something that should be exposed. --Jacob
Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:
> On Wed, 2021-03-24 at 14:10 -0400, Stephen Frost wrote:
> > * Jacob Champion (pchampion@vmware.com) wrote:
> > > I could see this being a problem if two client certificate nicknames
> > > collide across multiple in-use databases, maybe?
> >
> > Right, in such a case either cert might get returned and it's possible
> > that the "wrong" one is returned and therefore the connection would end
> > up failing, assuming that they aren't actually the same and just happen
> > to be in both.
> >
> > Seems like we could use SECMOD_OpenUserDB() and then pass the result
> > from that into PK11_ListCertsInSlot() and scan through the certs in just
> > the specified database to find the one we're looking for if we really
> > feel compelled to try and address this risk. I've reached out to the
> > NSS folks to see if they have any thoughts about the best way to address
> > this.
>
> Some additional findings (NSS 3.63), please correct me if I've made any mistakes:
>
> The very first NSSInitContext created is special. If it contains a database, that database will be considered part of the "internal" slot and its certificates can be referenced directly by nickname. If it doesn't have a database, the internal slot has no certificates, and it will continue to have zero certificates until NSS is completely shut down and reinitialized with a new "first" context.
>
> Databases that are opened *after* the first one are given their own separate slots. Any certificates that are part of those databases seemingly can't be referenced directly by nickname. They have to be prefixed by their token name -- a name which you don't have if you used NSS_InitContext() to create the database. You have to use SECMOD_OpenUserDB() instead. This explains some strange failures I was seeing in local testing, where the order of InitContext determined whether our client certificate selection succeeded or failed.

This is more-or-less what we would want though, right..? If a user asks for a connection with ssl_database=blah and sslcert=whatever, we'd want to open database 'blah' and search (just) that database for cert 'whatever'. We could possibly offer other options in the future but certainly this would work and be the most straight-forward and expected behavior.

> If you SECMOD_OpenUserDB() a database that is identical to the first (internal) database, NSS deduplicates for you and just returns the internal slot. Which seems like it's helpful, except you're not allowed to close that database, and you have to know not to close it by checking to see whether that slot is the "internal key slot". It appears to remain open until NSS is shut down entirely.

Seems like we shouldn't do that and should just use SECMOD_OpenUserDB() for opening databases.

> But if you open a database that is *not* the magic internal database,
> and then open a duplicate of that one, NSS creates yet another new slot
> for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource
> hog, depending on the global state of the process at the time libpq
> opens its first connection. We won't be able to control what the parent
> application will do before loading us up.

I would think we'd want to avoid re-opening the same database multiple times, to avoid the duplicate slots and such. If the application code does it themselves, well, there's not much we can do about that, but we could at least avoid doing so in *our* code. I wouldn't expect us to be opening hundreds of databases either and so keeping a simple list around of what we've opened and scanning it seems like it'd be workable. Of course, this could likely be improved in the future but I would think that'd be good for an initial implementation.

We could also just generally caution users in our documentation against using multiple databases. The NSS folks discourage doing so and it doesn't strike me as being a terribly useful thing to do anyway, at least from within one invocation of an application. Still, if we could make it work reasonably well, then I'd say we should go ahead and do so.

> It also doesn't look like any of the SECMOD_* machinery that we're
> looking at is thread-safe, but I'd really like to be wrong...

That's unfortunate but solvable by using our own locks, similar to what's done in fe-secure-openssl.c.

Thanks!

Stephen
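As an illustration of the "simple list plus our own locks" idea above, here is a minimal sketch of a process-wide registry of opened databases, guarded by a mutex and counting users per database. Everything in it (names, fixed limits, comparing databases by their path string) is an assumption made for the example and is not taken from the patch; the actual open and close calls into NSS are left to the caller.

```c
#include <pthread.h>
#include <string.h>

#define MAX_OPEN_DBS 16     /* hypothetical limit, "not hundreds" */
#define MAX_DB_PATH  256

typedef struct OpenDb
{
    char path[MAX_DB_PATH];
    int  refcount;          /* 0 means the entry is unused */
} OpenDb;

static OpenDb          open_dbs[MAX_OPEN_DBS];
static pthread_mutex_t open_dbs_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Register one more user of the database at path. Returns the refcount
 * after the call: 1 means the caller is the first user and has to open the
 * database itself, >1 means it is already open. Returns -1 if the list is
 * full or the path is too long.
 */
static int
db_acquire(const char *path)
{
    int i, free_idx = -1, result = -1;

    if (strlen(path) >= MAX_DB_PATH)
        return -1;

    pthread_mutex_lock(&open_dbs_lock);
    for (i = 0; i < MAX_OPEN_DBS; i++)
    {
        if (open_dbs[i].refcount > 0 && strcmp(open_dbs[i].path, path) == 0)
        {
            result = ++open_dbs[i].refcount;
            break;
        }
        if (open_dbs[i].refcount == 0 && free_idx < 0)
            free_idx = i;   /* remember the first unused entry */
    }
    if (result < 0 && free_idx >= 0)
    {
        strcpy(open_dbs[free_idx].path, path);
        open_dbs[free_idx].refcount = 1;
        result = 1;
    }
    pthread_mutex_unlock(&open_dbs_lock);
    return result;
}

/*
 * Drop one user of the database at path. Returns the remaining refcount:
 * 0 means the caller was the last user and should close the database.
 * Returns -1 if the path is not tracked.
 */
static int
db_release(const char *path)
{
    int i, result = -1;

    pthread_mutex_lock(&open_dbs_lock);
    for (i = 0; i < MAX_OPEN_DBS; i++)
    {
        if (open_dbs[i].refcount > 0 && strcmp(open_dbs[i].path, path) == 0)
        {
            result = --open_dbs[i].refcount;
            break;
        }
    }
    pthread_mutex_unlock(&open_dbs_lock);
    return result;
}
```

A linear scan is enough at this size. Note that comparing path strings would not catch the same database reached through two different paths, and it does nothing about databases the embedding application opened on its own.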
On Fri, 2021-03-26 at 15:33 -0400, Stephen Frost wrote: > * Jacob Champion (pchampion@vmware.com) wrote: > > Databases that are opened *after* the first one are given their own > > separate slots. [...] > > This is more-or-less what we would want though, right..? If a user asks > for a connection with ssl_database=blah and sslcert=whatever, we'd want > to open database 'blah' and search (just) that database for cert > 'whatever'. We could possibly offer other options in the future but > certainly this would work and be the most straight-forward and expected > behavior. Yes, but see below. > > If you SECMOD_OpenUserDB() a database that is identical to the first > > (internal) database, NSS deduplicates for you and just returns the > > internal slot. Which seems like it's helpful, except you're not > > allowed to close that database, and you have to know not to close it > > by checking to see whether that slot is the "internal key slot". It > > appears to remain open until NSS is shut down entirely. > > Seems like we shouldn't do that and should just use SECMOD_OpenUserDB() > for opening databases. We don't have control over whether or not this happens. If the application embedding libpq has already loaded the database into the internal slot via its own NSS initialization, then when we call SECMOD_OpenUserDB() for that same database, the internal slot will be returned and we have to handle it accordingly. It's not a huge amount of work, but it is magic knowledge that has to be maintained, especially in the absence of specialized clientside tests. > > But if you open a database that is *not* the magic internal database, > > and then open a duplicate of that one, NSS creates yet another new slot > > for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource > > hog, depending on the global state of the process at the time libpq > > opens its first connection. We won't be able to control what the parent > > application will do before loading us up. 
> > I would think we'd want to avoid re-opening the same database multiple > times, to avoid the duplicate slots and such. If the application code > does it themselves, well, there's not much we can do about that, but we > could at least avoid doing so in *our* code. I wouldn't expect us to be > opening hundreds of databases either and so keeping a simple list around > of what we've opened and scanning it seems like it'd be workable. Of > course, this could likely be improved in the future but I would think > that'd be good for an initial implementation. > > [...] > > > It also doesn't look like any of the SECMOD_* machinery that we're > > looking at is thread-safe, but I'd really like to be wrong... > > That's unfortuante but solvable by using our own locks, similar > to what's done in fe-secure-openssl.c. Yeah. I was hoping to avoid implementing our own locks and refcounts, but it seems like it's going to be required. --Jacob
Greetings, * Jacob Champion (pchampion@vmware.com) wrote: > On Fri, 2021-03-26 at 15:33 -0400, Stephen Frost wrote: > > * Jacob Champion (pchampion@vmware.com) wrote: > > > Databases that are opened *after* the first one are given their own > > > separate slots. [...] > > > > This is more-or-less what we would want though, right..? If a user asks > > for a connection with ssl_database=blah and sslcert=whatever, we'd want > > to open database 'blah' and search (just) that database for cert > > 'whatever'. We could possibly offer other options in the future but > > certainly this would work and be the most straight-forward and expected > > behavior. > > Yes, but see below. > > > > If you SECMOD_OpenUserDB() a database that is identical to the first > > > (internal) database, NSS deduplicates for you and just returns the > > > internal slot. Which seems like it's helpful, except you're not > > > allowed to close that database, and you have to know not to close it > > > by checking to see whether that slot is the "internal key slot". It > > > appears to remain open until NSS is shut down entirely. > > > > Seems like we shouldn't do that and should just use SECMOD_OpenUserDB() > > for opening databases. > > We don't have control over whether or not this happens. If the > application embedding libpq has already loaded the database into the > internal slot via its own NSS initialization, then when we call > SECMOD_OpenUserDB() for that same database, the internal slot will be > returned and we have to handle it accordingly. > > It's not a huge amount of work, but it is magic knowledge that has to > be maintained, especially in the absence of specialized clientside > tests. Ah.. yeah, fair enough. We could document that we discourage applications from doing so, but I agree that we'll need to deal with it since it could happen. 
> > > But if you open a database that is *not* the magic internal database, > > > and then open a duplicate of that one, NSS creates yet another new slot > > > for the duplicate. So SECMOD_OpenUserDB() may or may not be a resource > > > hog, depending on the global state of the process at the time libpq > > > opens its first connection. We won't be able to control what the parent > > > application will do before loading us up. > > > > I would think we'd want to avoid re-opening the same database multiple > > times, to avoid the duplicate slots and such. If the application code > > does it themselves, well, there's not much we can do about that, but we > > could at least avoid doing so in *our* code. I wouldn't expect us to be > > opening hundreds of databases either and so keeping a simple list around > > of what we've opened and scanning it seems like it'd be workable. Of > > course, this could likely be improved in the future but I would think > > that'd be good for an initial implementation. > > > > [...] > > > > > It also doesn't look like any of the SECMOD_* machinery that we're > > > looking at is thread-safe, but I'd really like to be wrong... > > > > That's unfortuante but solvable by using our own locks, similar > > to what's done in fe-secure-openssl.c. > > Yeah. I was hoping to avoid implementing our own locks and refcounts, > but it seems like it's going to be required. Yeah, afraid so. Thanks! Stephen
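As an illustration of the "simple list of what we've opened" plus "our own locks and refcounts" idea discussed above, here is a minimal sketch in plain C. It is not taken from the patchset: the names OpenedDb, db_registry_acquire() and db_registry_release() are hypothetical, and the actual NSS open/close calls are deliberately left to the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry entry: one per distinct certificate database. */
typedef struct OpenedDb
{
    char       *path;       /* database location as given by the caller */
    int         refcount;   /* connections currently using it */
    struct OpenedDb *next;
} OpenedDb;

static OpenedDb *opened_dbs = NULL;
static pthread_mutex_t opened_dbs_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Take a reference on "path". Returns 1 if the caller holds the first
 * reference and should really open the database, 0 if it was already
 * registered, and -1 on out-of-memory.
 */
int
db_registry_acquire(const char *path)
{
    OpenedDb   *db;
    int         result = 0;

    assert(path != NULL);
    pthread_mutex_lock(&opened_dbs_lock);
    for (db = opened_dbs; db != NULL; db = db->next)
    {
        if (strcmp(db->path, path) == 0)
            break;
    }
    if (db != NULL)
        db->refcount++;
    else
    {
        size_t      len = strlen(path) + 1;

        db = malloc(sizeof(OpenedDb));
        if (db != NULL && (db->path = malloc(len)) != NULL)
        {
            memcpy(db->path, path, len);
            db->refcount = 1;
            db->next = opened_dbs;
            opened_dbs = db;
            result = 1;
        }
        else
        {
            free(db);
            result = -1;
        }
    }
    pthread_mutex_unlock(&opened_dbs_lock);
    return result;
}

/*
 * Drop a reference on "path". Returns 1 if this was the last reference and
 * the caller should close the database, 0 if it is still in use, and -1 if
 * the path was never registered.
 */
int
db_registry_release(const char *path)
{
    OpenedDb  **link;
    int         result = -1;

    assert(path != NULL);
    pthread_mutex_lock(&opened_dbs_lock);
    for (link = &opened_dbs; *link != NULL; link = &(*link)->next)
    {
        OpenedDb   *db = *link;

        if (strcmp(db->path, path) != 0)
            continue;
        if (--db->refcount > 0)
            result = 0;
        else
        {
            *link = db->next;
            free(db->path);
            free(db);
            result = 1;
        }
        break;
    }
    pthread_mutex_unlock(&opened_dbs_lock);
    return result;
}
```

A linear scan under a single mutex matches the expectation stated above that libpq would not be opening hundreds of databases. The sketch only covers databases opened by our own code; it cannot detect what the embedding application has already loaded.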
On Fri, 2021-03-26 at 18:05 -0400, Stephen Frost wrote:
> * Jacob Champion (pchampion@vmware.com) wrote:
> > Yeah. I was hoping to avoid implementing our own locks and refcounts,
> > but it seems like it's going to be required.
>
> Yeah, afraid so.

I think it gets worse, after having debugged some confusing crashes.
There's already been a discussion on PR_Init upthread a bit:

> Once we settle on a version we can confirm if PR_Init is/isn't needed and
> remove all traces of it if not.

What the NSPR documentation omits is that implicit initialization is
not threadsafe. So NSS_InitContext() is technically "threadsafe"
because it's built on PR_CallOnce(), but if you haven't called
PR_Init() yet, multiple simultaneous PR_CallOnce() calls can crash into
each other.

So, fine. We just add our own locks around NSS_InitContext() (or around
a single call to PR_Init()). Well, the first thread to win and
successfully initialize NSPR gets marked as the "primordial" thread
using thread-local state. And it gets a pthread destructor that does...
something. So lazy initialization seems a bit dangerous regardless of
whether or not we add locks, but I can't really prove whether it's
dangerous or not in practice.

I do know that only the primordial thread is allowed to call
PR_Cleanup(), and of course we wouldn't be able to control which thread
does what for libpq clients. I don't know what other assumptions are
made about the primordial thread, or if there are any platform-specific
behaviors with older versions of NSPR that we'd need to worry about. It
used to be that the primordial thread was not allowed to exit before
any other threads, but that restriction was lifted at some point [1].

I think we're going to need some analogue to PQinitOpenSSL() to help
client applications cut through the mess, but I'm not sure what it
should look like, or how we would maintain any sort of API
compatibility between the two flavors. And does libpq already have some
notion of a "main thread" that I'm missing?

--Jacob

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=294955
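The "add our own locks around a single call to PR_Init()" option mentioned above could look roughly like the following sketch. It is self-contained and does not call NSPR: runtime_init() is a stand-in for the real explicit initialization, and ensure_runtime_initialized() and init_from_threads() are hypothetical names used only for this illustration. It serializes the initialization call and nothing more; it does not address which thread ends up as the "primordial" one.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t runtime_once = PTHREAD_ONCE_INIT;
static int  runtime_init_calls = 0;

/* Stand-in for the real one-time setup (e.g. an explicit PR_Init() call). */
static void
runtime_init(void)
{
    runtime_init_calls++;
}

/* Safe to call from any thread, any number of times. */
void
ensure_runtime_initialized(void)
{
    int         rc = pthread_once(&runtime_once, runtime_init);

    assert(rc == 0);
    (void) rc;
}

static void *
init_worker(void *arg)
{
    (void) arg;
    ensure_runtime_initialized();
    return NULL;
}

/*
 * Demo helper: race up to 16 threads into the initializer and report how
 * many times runtime_init() has run in total.
 */
int
init_from_threads(int nthreads)
{
    pthread_t   threads[16];
    int         started = 0;
    int         i;

    if (nthreads > 16)
        nthreads = 16;
    for (i = 0; i < nthreads; i++)
    {
        if (pthread_create(&threads[started], NULL, init_worker, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return runtime_init_calls;
}
```

pthread_once() guarantees that every caller returns only after the initializer has completed, which is the property the implicit NSPR initialization reportedly lacks. Whether libpq should do this lazily at all is exactly the open question raised in the message.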
On Wed, Mar 31, 2021 at 10:15:15PM +0000, Jacob Champion wrote: > I think we're going to need some analogue to PQinitOpenSSL() to help > client applications cut through the mess, but I'm not sure what it > should look like, or how we would maintain any sort of API > compatibility between the two flavors. And does libpq already have some > notion of a "main thread" that I'm missing? Nope as far as I recall. With OpenSSL, the initialization of the SSL mutex lock and the crypto callback initialization is done by the first thread in. -- Michael
Greetings, * Michael Paquier (michael@paquier.xyz) wrote: > On Wed, Mar 31, 2021 at 10:15:15PM +0000, Jacob Champion wrote: > > I think we're going to need some analogue to PQinitOpenSSL() to help > > client applications cut through the mess, but I'm not sure what it > > should look like, or how we would maintain any sort of API > > compatibility between the two flavors. And does libpq already have some > > notion of a "main thread" that I'm missing? > > Nope as far as I recall. With OpenSSL, the initialization of the SSL > mutex lock and the crypto callback initialization is done by the first > thread in. Yeah, we haven't got any such concept in libpq. I do think that some of this can simply be documented as "if you do this, then you need to make sure to do this". Thanks, Stephen
> On 23 Mar 2021, at 00:38, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 22 Mar 2021, at 00:49, Stephen Frost <sfrost@snowman.net> wrote:

Attached is a rebase on top of the recent SSL related commits with a few more
fixes from previous reviews.

>>> +++ b/src/interfaces/libpq/fe-connect.c
>>> @@ -359,6 +359,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
>>>      "Target-Session-Attrs", "", 15, /* sizeof("prefer-standby") = 15 */
>>>      offsetof(struct pg_conn, target_session_attrs)},
>>>
>>> +    {"cert_database", NULL, NULL, NULL,
>>> +        "CertificateDatabase", "", 64,
>>> +    offsetof(struct pg_conn, cert_database)},
>>
>> I mean, maybe nitpicking here, but all the other SSL stuff is
>> 'sslsomething' and the backend version of this is 'ssl_database', so
>> wouldn't it be more consistent to have this be 'ssldatabase'?
>
> That's a good point, I was clearly Stockholm syndromed since I hadn't reflected
> on that but it's clearly wrong. Will fix.

Fixed

>>> +    /*
>>> +     * If we don't have a certificate database, the system trust store is the
>>> +     * fallback we can use. If we fail to initialize that as well, we can
>>> +     * still attempt a connection as long as the sslmode isn't verify*.
>>> +     */
>>> +    if (!conn->cert_database && conn->sslmode[0] == 'v')
>>> +    {
>>> +        status = pg_load_nss_module(&ca_trust, ca_trust_name, "\"Root Certificates\"");
>>> +        if (status != SECSuccess)
>>> +        {
>>> +            printfPQExpBuffer(&conn->errorMessage,
>>> +                              libpq_gettext("WARNING: unable to load NSS trust module \"%s\" : %s"),
>>> +                              ca_trust_name,
>>> +                              pg_SSLerrmessage(PR_GetError()));
>>> +
>>> +            return PGRES_POLLING_FAILED;
>>> +        }
>>> +    }
>>
>> Maybe have something a bit more here about "maybe you should specify a
>> cert_database" or such?
>
> Good point, will expand with more detail.

Fixed.

>>> +    /*
>>> +     * Specify which hostname we are expecting to talk to. This is required,
>>> +     * albeit mostly applies to when opening a connection to a traditional
>>> +     * http server it seems.
>>> +     */
>>> +    SSL_SetURL(conn->pr_fd, (conn->connhost[conn->whichhost]).host);
>>
>> We should probably also set SNI, if available (NSS 3.12.6 it seems?),
>> since it looks like that's going to be added to the OpenSSL code.
>
> Good point, will do.

Actually, it turns out that NSS 3.12.6 introduced the serverside SNI handling
by providing callbacks to respond to hostname verification. There was no
mention of clientside SNI in the NSS documentation that I could find. Reading
the code, however, SSL_SetURL does actually set the SNI extension in the
ClientHello. So, clientside SNI (which is what is proposed for the OpenSSL
backend) is already in.

>>> +    do
>>> +    {
>>> +        status = SSL_ForceHandshake(conn->pr_fd);
>>> +    }
>>> +    while (status != SECSuccess && PR_GetError() == PR_WOULD_BLOCK_ERROR);
>>
>> We don't seem to have this loop in the backend code.. Is there some
>> reason that we don't? Is it possible that we need to have a loop here
>> too? I recall in the GSS encryption code there were definitely things
>> during setup that had to be looped back over on both sides to make sure
>> everything was finished ...
>
> Off the cuff I can't remember, will look into it.

Thinking more about this, I don't think we should have the loop at all in the
frontend either. The reason it was added was to cover cases where we're
confused about blocking but I can't actually see the case I was worried about
in the code so I think it's useless. Removed.

>>> +    if (strcmp(attribute_name, "protocol") == 0)
>>> +    {
>>> +        switch (channel.protocolVersion)
>>> +        {
>>> +#ifdef SSL_LIBRARY_VERSION_TLS_1_3
>>> +            case SSL_LIBRARY_VERSION_TLS_1_3:
>>> +                return "TLSv1.3";
>>> +#endif
>>> +#ifdef SSL_LIBRARY_VERSION_TLS_1_2
>>> +            case SSL_LIBRARY_VERSION_TLS_1_2:
>>> +                return "TLSv1.2";
>>> +#endif
>>> +#ifdef SSL_LIBRARY_VERSION_TLS_1_1
>>> +            case SSL_LIBRARY_VERSION_TLS_1_1:
>>> +                return "TLSv1.1";
>>> +#endif
>>> +            case SSL_LIBRARY_VERSION_TLS_1_0:
>>> +                return "TLSv1.0";
>>> +            default:
>>> +                return "unknown";
>>> +        }
>>> +    }
>>
>> Not sure that it really matters, but this seems like it might be useful
>> to have as its own function... Maybe even a data structure that both
>> functions use just in opposite directions. Really minor tho. :)
>
> I suppose that wouldn't be a bad thing, will fix.

Moved this into a shared function as it's used by both frontend and backend.
It's moved mostly verbatim as it seemed simple enough to not warrant much
complication.

--
Daniel Gustafsson		https://vmware.com/
Attachment
- v33-0008-nss-Support-NSS-in-cryptohash.patch
- v33-0007-nss-Support-NSS-in-sslinfo.patch
- v33-0006-nss-Support-NSS-in-pgcrypto.patch
- v33-0005-nss-Documentation.patch
- v33-0004-nss-pg_strong_random-support.patch
- v33-0003-nss-Add-NSS-specific-tests.patch
- v33-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v33-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
- v33-0009-nss-Build-infrastructure.patch
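Stephen's suggestion in the v33 review of "a data structure that both functions use just in opposite directions" can be sketched as a single lookup table. This is an illustration only, not the shared function from the patch: the DEMO_TLS_* macros, the TlsVersionName type and both function names are hypothetical. The numeric values are the standard TLS wire versions, which to the best of my knowledge are also the values behind NSS's SSL_LIBRARY_VERSION_TLS_1_x macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* TLS wire version numbers (assumed to match SSL_LIBRARY_VERSION_TLS_1_x). */
#define DEMO_TLS_1_0 0x0301
#define DEMO_TLS_1_1 0x0302
#define DEMO_TLS_1_2 0x0303
#define DEMO_TLS_1_3 0x0304

typedef struct
{
    unsigned short version;
    const char *name;
} TlsVersionName;

/* One table serves both lookup directions. */
static const TlsVersionName tls_version_names[] = {
    {DEMO_TLS_1_3, "TLSv1.3"},
    {DEMO_TLS_1_2, "TLSv1.2"},
    {DEMO_TLS_1_1, "TLSv1.1"},
    {DEMO_TLS_1_0, "TLSv1.0"},
};

#define N_TLS_VERSIONS (sizeof(tls_version_names) / sizeof(tls_version_names[0]))

/* Version number to display name; "unknown" if not in the table. */
const char *
tls_version_to_name(unsigned short version)
{
    size_t      i;

    for (i = 0; i < N_TLS_VERSIONS; i++)
    {
        if (tls_version_names[i].version == version)
            return tls_version_names[i].name;
    }
    return "unknown";
}

/* Display name to version number; returns 1 on success, 0 if not found. */
int
tls_name_to_version(const char *name, unsigned short *version)
{
    size_t      i;

    assert(name != NULL && version != NULL);
    for (i = 0; i < N_TLS_VERSIONS; i++)
    {
        if (strcmp(tls_version_names[i].name, name) == 0)
        {
            *version = tls_version_names[i].version;
            return 1;
        }
    }
    return 0;
}
```

The original switch guards the newer versions with #ifdef so that it still builds against older NSS headers. A table-based variant would need the same guards around the corresponding rows if it used the real macros.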
Another rebase to cope with recent changes (hmac, ssl tests etc) that conflicted and broke this patchset. -- Daniel Gustafsson https://vmware.com/
Attachment
- v34-0009-nss-Build-infrastructure.patch
- v34-0008-nss-Support-NSS-in-cryptohash.patch
- v34-0007-nss-Support-NSS-in-sslinfo.patch
- v34-0006-nss-Support-NSS-in-pgcrypto.patch
- v34-0005-nss-Documentation.patch
- v34-0004-nss-pg_strong_random-support.patch
- v34-0003-nss-Add-NSS-specific-tests.patch
- v34-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v34-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Mon, Apr 05, 2021 at 12:13:43AM +0200, Daniel Gustafsson wrote: > Another rebase to cope with recent changes (hmac, ssl tests etc) that > conflicted and broke this patchset. Please find an updated set, v35, attached, and my apologies for breaking again your patch set. While testing this patch set and adjusting the SSL tests with HEAD, I have noticed what looks like a bug with the DN mapping that NSS does not run. The connection strings are the same in v35 and in v34, with dbname only changing in-between. Just to be sure, because I could have done something wrong with the rebase of v35, I have done the same test with v34 applied on top of dfc843d and things are failing. So it seems to me that there is an issue with the DN mapping part. -- Michael
Attachment
- v35-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
- v35-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v35-0003-nss-Add-NSS-specific-tests.patch
- v35-0004-nss-pg_strong_random-support.patch
- v35-0005-nss-Documentation.patch
- v35-0006-nss-Support-NSS-in-pgcrypto.patch
- v35-0007-nss-Support-NSS-in-sslinfo.patch
- v35-0008-nss-Support-NSS-in-cryptohash.patch
- v35-0009-nss-Build-infrastructure.patch
- signature.asc
On Mon, Apr 05, 2021 at 11:12:22AM +0900, Michael Paquier wrote: > Please find an updated set, v35, attached, and my apologies for > breaking again your patch set. While testing this patch set and > adjusting the SSL tests with HEAD, I have noticed what looks like a > bug with the DN mapping that NSS does not run. The connection strings > are the same in v35 and in v34, with dbname only changing in-between. > > Just to be sure, because I could have done something wrong with the > rebase of v35, I have done the same test with v34 applied on top of > dfc843d and things are failing. So it seems to me that there is an > issue with the DN mapping part. For now, I have marked this patch set as returned with feedback as it is still premature for integration, and there are still bugs in it. FWIW, I think that there is a future for providing an alternative to OpenSSL, so, even if it could not make it for this release, I'd like to push forward with this area more seriously as of 15. The recent libcrypto-related refactorings were one step in this direction, as well. -- Michael
> On 25 Mar 2021, at 00:56, Jacob Champion <pchampion@vmware.com> wrote:

> Databases that are opened *after* the first one are given their own separate
> slots. Any certificates that are part of those databases seemingly can't be
> referenced directly by nickname. They have to be prefixed by their token name
> -- a name which you don't have if you used NSS_InitContext() to create the
> database. You have to use SECMOD_OpenUserDB() instead. This explains some
> strange failures I was seeing in local testing, where the order of InitContext
> determined whether our client certificate selection succeeded or failed.

Sorry for the latency in responding, but I'm now back from parental leave.

AFAICT the tokenname for the database can be set with the dbTokenDescription
member in the NSSInitParameters struct passed to NSS_InitContext() (documented
in nss.h). Using this we can avoid the messier SECMOD machinery and use the
token in the auth callback to refer to the database we loaded. I hacked this
up in my local tree (rebased patchset coming soon) and it seems to work as
intended.

--
Daniel Gustafsson		https://vmware.com/
Attached is a rebase to keep bitrot at bay. On top of rebasing and smaller
fixes in comments etc, this version fixes/adds a number of things:

* Performs DN resolution to support the DN mapping
* Locks the SECMOD parts and PR_Init call in the frontend as per Jacob's
  findings upthread
* Properly sets the tokenname of the database to avoid ambiguous lookups in
  case multiple databases are loaded (a better name to ensure uniqueness is a
  TODO)
* Adds a test for certificate lookup without sslcert set

--
Daniel Gustafsson		https://vmware.com/
Attachment
- v36-0009-nss-Build-infrastructure.patch
- v36-0008-nss-Support-NSS-in-cryptohash.patch
- v36-0007-nss-Support-NSS-in-sslinfo.patch
- v36-0006-nss-Support-NSS-in-pgcrypto.patch
- v36-0005-nss-Documentation.patch
- v36-0004-nss-pg_strong_random-support.patch
- v36-0003-nss-Add-NSS-specific-tests.patch
- v36-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v36-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Tue, 2020-10-27 at 23:39 -0700, Andres Freund wrote: > Maybe we should just have --with-ssl={openssl,nss}? That'd avoid > needing > to check for errors. [ apologies for the late reply ] Would it be more proper to call it --with-tls={openssl,nss} ? Regards, Jeff Davis
> On 3 Jun 2021, at 19:37, Jeff Davis <pgsql@j-davis.com> wrote: > > On Tue, 2020-10-27 at 23:39 -0700, Andres Freund wrote: >> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid >> needing >> to check for errors. > > [ apologies for the late reply ] > > Would it be more proper to call it --with-tls={openssl,nss} ? Well, we use SSL for everything else (GUCs, connection params and env vars etc) so I think --with-ssl is sensible. However, SSL and TLS are used quite interchangeably these days so I think it makes sense to provide --with-tls as an alias. -- Daniel Gustafsson https://vmware.com/
On 6/3/21 1:47 PM, Daniel Gustafsson wrote:
>> On 3 Jun 2021, at 19:37, Jeff Davis <pgsql@j-davis.com> wrote:
>>
>> On Tue, 2020-10-27 at 23:39 -0700, Andres Freund wrote:
>>> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid
>>> needing
>>> to check for errors.
>> [ apologies for the late reply ]
>>
>> Would it be more proper to call it --with-tls={openssl,nss} ?
> Well, we use SSL for everything else (GUCs, connection params and env vars etc)
> so I think --with-ssl is sensible.
>
> However, SSL and TLS are used quite interchangeably these days so I think it
> makes sense to provide --with-tls as an alias.
>

Yeah, but it's annoying to have to start every talk I give touching this
subject with the slide that says "When we say SSL we really mean TLS".
Maybe release 15 would be a good time to rename user-visible option
names etc, with support for legacy names.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Thu, 2021-06-03 at 15:53 -0400, Andrew Dunstan wrote: > Yeah, but it's annoying to have to start every talk I give touching > this > subject with the slide that says "When we say SSL we really means > TLS". > Maybe release 15 would be a good time to rename user-visible option > names etc, with support for legacy names. Sounds good to me, though I haven't looked into how big of a diff that will be. Also, do we have precedent for GUC aliases? That might be a little weird. Regards, Jeff Davis
> On 3 Jun 2021, at 22:14, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2021-06-03 at 15:53 -0400, Andrew Dunstan wrote:
>> Yeah, but it's annoying to have to start every talk I give touching
>> this
>> subject with the slide that says "When we say SSL we really means
>> TLS".
>> Maybe release 15 would be a good time to rename user-visible option
>> names etc, with support for legacy names.

Perhaps. Having spent some time in this space, SSL has IMHO become the de
facto term for an encrypted connection at the socket layer, with TLS being the
current protocol suite (additionally, often referred to as SSL/TLS). Offering
tls* counterparts to our ssl GUCs etc will offer a level of correctness but I
doubt we'll ever get rid of ssl* so we might not help too many users by the
added complexity.

It might also put us in a hard spot if the next TLS spec ends up being called
something other than TLS? It's clearly happened before =)

> Sounds good to me, though I haven't looked into how big of a diff that
> will be.
>
> Also, do we have precedent for GUC aliases? That might be a little
> weird.

I don't think we do currently, but I have a feeling the topic has surfaced here
before.

If we end up settling on this being something we want I can volunteer to do the
legwork, but it seems a discussion best had before a patch is drafted.

--
Daniel Gustafsson		https://vmware.com/
Daniel Gustafsson <daniel@yesql.se> writes: > It might also put us a hard spot if the next TLS spec ends up being called > something other than TLS? It's clearly happened before =) Good point. I'm inclined to just stick with the SSL terminology. >> Also, do we have precedent for GUC aliases? That might be a little >> weird. > I don't think we do currently, but I have a feeling the topic has surfaced here > before. We do, look for "sort_mem" in guc.c. So it's not like it'd be inconvenient to implement. But I think user confusion and the potential for the new terminology to fail to be any more future-proof are good reasons to just leave the names alone. regards, tom lane
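For readers unfamiliar with the "sort_mem" precedent mentioned above, the mechanism amounts to a small old-name to new-name table consulted before the normal lookup. The sketch below is a simplified illustration and not the guc.c code: GucAlias and resolve_guc_name() are hypothetical names, the comparison is a plain strcmp(), and the "tls_ciphers" row is an invented example of what an ssl*/tls* alias could look like, not an existing or proposed GUC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
    const char *old_name;
    const char *new_name;
} GucAlias;

static const GucAlias guc_aliases[] = {
    {"sort_mem", "work_mem"},       /* historical rename */
    {"tls_ciphers", "ssl_ciphers"}, /* hypothetical alias, for illustration */
};

/*
 * Map an alias to its canonical name. Names without an alias are returned
 * unchanged, so callers can always continue with the result.
 */
const char *
resolve_guc_name(const char *name)
{
    size_t      i;

    assert(name != NULL);
    for (i = 0; i < sizeof(guc_aliases) / sizeof(guc_aliases[0]); i++)
    {
        if (strcmp(name, guc_aliases[i].old_name) == 0)
            return guc_aliases[i].new_name;
    }
    return name;
}
```

As Tom notes, the implementation cost is small; the objection in the thread is about user confusion and future-proofing, which a lookup table does nothing to address.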
> On 3 Jun 2021, at 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Also, do we have precedent for GUC aliases? That might be a little >>> weird. > >> I don't think we do currently, but I have a feeling the topic has surfaced here >> before. > > We do, look for "sort_mem" in guc.c. I knew it seemed familiar but I failed to find it, thanks for the pointer. -- Daniel Gustafsson https://vmware.com/
On Thu, Jun 3, 2021 at 04:55:45PM -0400, Tom Lane wrote: > Daniel Gustafsson <daniel@yesql.se> writes: > > It might also put us a hard spot if the next TLS spec ends up being called > > something other than TLS? It's clearly happened before =) > > Good point. I'm inclined to just stick with the SSL terminology. I wonder if we should use SSL/TLS in more places in our documentation. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Bruce Momjian <bruce@momjian.us> writes: > I wonder if we should use SSL/TLS in more places in our documentation. No objection to doing that in the docs; I'm just questioning switching the code-visible names. regards, tom lane
> On 3 Jun 2021, at 23:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Bruce Momjian <bruce@momjian.us> writes: >> I wonder if we should use SSL/TLS in more places in our documentation. > > No objection to doing that in the docs; I'm just questioning > switching the code-visible names. As long as it's still searchable by "SSL", "TLS" and "SSL/TLS" and not just the latter. -- Daniel Gustafsson https://vmware.com/
On Thu, Jun 3, 2021 at 11:14 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 3 Jun 2021, at 23:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Bruce Momjian <bruce@momjian.us> writes: > >> I wonder if we should use SSL/TLS in more places in our documentation. > > > > No objection to doing that in the docs; I'm just questioning > > switching the code-visible names. +1. I also don't think it's worth changing the actual names, I think that'll cause more problems than it solves. But we can, and probably should, change the messaging around it, particularly the docs (but probably also comments in the config file). > As long as it's still searchable by "SSL", "TLS" and "SSL/TLS" and not just the > latter. Agreed, making it searchable and easily cross-linkable.. And maybe both terms should be in the glossary. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Fri, 2021-05-28 at 11:04 +0200, Daniel Gustafsson wrote: > Attached is a rebase to keep bitrot at bay. I get a failure during one of the CRL directory tests due to a missing database -- it looks like the Makefile is missing an entry. (I'm dusting off my build after a few months away, so I don't know if this latest rebase introduced it or not.) Attached is a quick patch; does it work on your machine? --Jacob
Attachment
> On 15 Jun 2021, at 00:15, Jacob Champion <pchampion@vmware.com> wrote: > Attached is a quick patch; does it work on your machine? It does, thanks! I've included it in the attached v37 along with a few tiny non-functional improvements in comment spelling etc. -- Daniel Gustafsson https://vmware.com/
Attachment
- v37-0009-nss-Build-infrastructure.patch
- v37-0008-nss-Support-NSS-in-cryptohash.patch
- v37-0007-nss-Support-NSS-in-sslinfo.patch
- v37-0006-nss-Support-NSS-in-pgcrypto.patch
- v37-0005-nss-Documentation.patch
- v37-0004-nss-pg_strong_random-support.patch
- v37-0003-nss-Add-NSS-specific-tests.patch
- v37-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v37-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Wed, 2021-06-16 at 00:08 +0200, Daniel Gustafsson wrote: > > On 15 Jun 2021, at 00:15, Jacob Champion <pchampion@vmware.com> wrote: > > Attached is a quick patch; does it work on your machine? > > It does, thanks! I've included it in the attached v37 along with a few tiny > non-functional improvements in comment spelling etc. Great, thanks! I've been tracking down reference leaks in the client. These open references prevent NSS from shutting down cleanly, which then makes it impossible to open a new context in the future. This probably affects other libpq clients more than it affects psql. The first step to fixing that is not ignoring failures during NSS shutdown, so I've tried a patch to pgtls_close() that pushes any failures through the pqInternalNotice(). That seems to be working well. The tests were still mostly green, so I taught connect_ok() to fail if any stderr showed up, and that exposed quite a few failures. I am currently stuck on one last failing test. This leak seems to only show up when using TLSv1.2 or below. There doesn't seem to be a substantial difference in libpq code coverage between 1.2 and 1.3, so I'm worried that either 1) there's some API we use that "requires" cleanup, but only on 1.2 and below, or 2) there's some bug in my version of NSS. Attached are a few work-in-progress patches. I think the reference cleanups themselves are probably solid, but the rest of it could use some feedback. Are there better ways to test for this? and can anyone reproduce the TLSv1.2 leak? --Jacob
Attachment
> On 16 Jun 2021, at 01:50, Jacob Champion <pchampion@vmware.com> wrote:

> I've been tracking down reference leaks in the client. These open
> references prevent NSS from shutting down cleanly, which then makes it
> impossible to open a new context in the future. This probably affects
> other libpq clients more than it affects psql.

Ah, nice catch, that's indeed a bug in the frontend implementation. The
problem is that the NSS trustdomain cache *must* be empty before shutting down
the context, else this very issue happens. Note this in be_tls_destroy():

    /*
     * It reads a bit odd to clear a session cache when we are destroying the
     * context altogether, but if the session cache isn't cleared before
     * shutting down the context it will fail with SEC_ERROR_BUSY.
     */
    SSL_ClearSessionCache();

Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.

There is another resource leak left (visible in one test after the above is
added), the SECMOD module needs to be unloaded in case it's been loaded.
Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
have yet to figure out (when acquiring a lock with NSSRWLock_LockRead).

> The first step to fixing that is not ignoring failures during NSS
> shutdown, so I've tried a patch to pgtls_close() that pushes any
> failures through the pqInternalNotice(). That seems to be working well.

I'm keeping these in during hacking, with a comment that they need to be
revisited during review since they are mainly useful for debugging.

> The tests were still mostly green, so I taught connect_ok() to fail if
> any stderr showed up, and that exposed quite a few failures.

With your patches I'm seeing a couple of these:

    SSL error: The one-time function was previously called and failed. Its error code is no longer available

This is an error from NSPR, but it's not clear to me which PR_CallOnce call
it's coming from. It seems to be hitting in the SAN and CRL tests, so it
smells of some form of caching implemented with NSPR APIs to me but that's a
mere hunch.

> I am currently stuck on one last failing test. This leak seems to only
> show up when using TLSv1.2 or below.

AFAICT the session cache is avoided for TLSv1.3 due to 1.3 not supporting
renegotiation.

--
Daniel Gustafsson		https://vmware.com/
On Wed, 2021-06-16 at 15:31 +0200, Daniel Gustafsson wrote:
> > On 16 Jun 2021, at 01:50, Jacob Champion <pchampion@vmware.com> wrote:
> > I've been tracking down reference leaks in the client. These open
> > references prevent NSS from shutting down cleanly, which then makes it
> > impossible to open a new context in the future. This probably affects
> > other libpq clients more than it affects psql.
>
> Ah, nice catch, that's indeed a bug in the frontend implementation. The
> problem is that the NSS trustdomain cache *must* be empty before shutting down
> the context, else this very issue happens. Note this in be_tls_destroy():
>
>     /*
>      * It reads a bit odd to clear a session cache when we are destroying the
>      * context altogether, but if the session cache isn't cleared before
>      * shutting down the context it will fail with SEC_ERROR_BUSY.
>      */
>     SSL_ClearSessionCache();
>
> Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.

That's unfortunate. The session cache is global, right? So I'm guessing
we'll need to refcount and lock that call, to avoid cleaning up out
from under a thread that's actively using the cache?

> There is another resource leak left (visible in one test after the above is
> added), the SECMOD module needs to be unloaded in case it's been loaded.
> Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
> have yet to figure out (when acquiring a lock with NSSRWLock_LockRead).
>
> [...]
>
> With your patches I'm seeing a couple of these:
>
>     SSL error: The one-time function was previously called and failed. Its error code is no longer available

Hmm. Adding SSL_ClearSessionCache() (without thread-safety at the
moment) fixes all of the SSL tests for me, and I don't see either the
SECMOD leak or the "one-time function" error that you've mentioned.

What version of NSS are you running? I'm on 3.63. I've attached my
current patchset (based on v37) for comparison.

> > I am currently stuck on one last failing test. This leak seems to only
> > show up when using TLSv1.2 or below.
>
> AFAICT the session cache is avoided for TLSv1.3 due to 1.3 not supporting
> renegotiation.

Nice, at least that mystery is solved. :D

Thanks,
--Jacob
Attachment
> On 16 Jun 2021, at 18:15, Jacob Champion <pchampion@vmware.com> wrote: > > On Wed, 2021-06-16 at 15:31 +0200, Daniel Gustafsson wrote: >>> On 16 Jun 2021, at 01:50, Jacob Champion <pchampion@vmware.com> wrote: >>> I've been tracking down reference leaks in the client. These open >>> references prevent NSS from shutting down cleanly, which then makes it >>> impossible to open a new context in the future. This probably affects >>> other libpq clients more than it affects psql. >> >> Ah, nice catch, that's indeed a bug in the frontend implementation. The >> problem is that the NSS trustdomain cache *must* be empty before shutting down >> the context, else this very issue happens. Note this in be_tls_destroy(): >> >> /* >> * It reads a bit odd to clear a session cache when we are destroying the >> * context altogether, but if the session cache isn't cleared before >> * shutting down the context it will fail with SEC_ERROR_BUSY. >> */ >> SSL_ClearSessionCache(); >> >> Calling SSL_ClearSessionCache() in pgtls_close() fixes the error. > > That's unfortunate. The session cache is global, right? So I'm guessing > we'll need to refcount and lock that call, to avoid cleaning up out > from under a thread that's actively using the the cache? I'm not sure, the documentation doesn't give any answers and implementations of libnss tend to just clear the cache without consideration. In libcurl we do just that, and haven't had any complaints - which doesn't mean it's correct but it's a datapoint. >> There is another resource leak left (visible in one test after the above is >> added), the SECMOD module needs to be unloaded in case it's been loaded. >> Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I >> have yet to figure out (when acquiring a lock with NSSRWLock_LockRead). >> >> [...] >> >> With your patches I'm seeing a couple of these: >> >> SSL error: The one-time function was previously called and failed. 
Its error code is no longer available > > Hmm. Adding SSL_ClearSessionCache() (without thread-safety at the > moment) fixes all of the SSL tests for me, and I don't see either the > SECMOD leak or the "one-time function" error that you've mentioned. Reading the code I don't think a loaded user module is considered a resource that must've been released prior to closing the context. I will dig for what showed up in my tests, but I don't think it was caused by this. > What version of NSS are you running? I'm on 3.63. Right now I'm using what Debian 10 is packaging which is 3.42. Admittedly not hot off the press but I've been trying to develop off a packaged version which we might see users wanting to deploy against should this get shipped. -- Daniel Gustafsson https://vmware.com/
Attached is a rebased version which incorporates your recent patchset for
resource handling, as well as the connect_ok test patch.

I've implemented tracking the close_notify alert that you mentioned offlist,
but it turns out that the alert callbacks in NSS are of limited use so
close_notify is currently the only checked description. The enum which labels
the descriptions in the SSLAlert struct is private, so it's just sending over
an anonymous number apart from close_notify which is zero.

A few other fixups are included as well, like adapting the pending data read
function in the frontend to how the OpenSSL implementation does it.

--
Daniel Gustafsson		https://vmware.com/
Attachment
- v38-0010-nss-Build-infrastructure.patch
- v38-0009-nss-Support-NSS-in-cryptohash.patch
- v38-0008-nss-Support-NSS-in-sslinfo.patch
- v38-0007-nss-Support-NSS-in-pgcrypto.patch
- v38-0006-nss-Documentation.patch
- v38-0005-nss-pg_strong_random-support.patch
- v38-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v38-0003-nss-Add-NSS-specific-tests.patch
- v38-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v38-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Wed, 2021-06-23 at 15:48 +0200, Daniel Gustafsson wrote: > Attached is a rebased version which incorporates your recent patchset for > resource handling, as well as the connect_ok test patch. With v38 I do see the "one-time function was previously called and failed" message you mentioned before, as well as some PR_Assert() crashes. Looks like it's just due to the placement of SSL_ClearSessionCache(); gating it behind the conn->nss_context check ensures that we don't call it if no NSS context actually exists. Patch attached (0001). -- Continuing my jog around the patch... client connections will crash if hostaddr is provided rather than host, because SSL_SetURL can't handle a NULL argument. I'm running with 0002 to fix it for the moment, but I'm not sure yet if it does the right thing for IP addresses, which the OpenSSL side has a special case for. Early EOFs coming from the server don't currently have their own error message, which leads to a confusingly empty connection to server at "127.0.0.1", port 47447 failed: 0003 adds one, to roughly match the corresponding OpenSSL message. While I was fixing that I noticed that I was getting an "unable to verify certificate" error message for the early EOF case, even with sslmode=require. That error message is being printed to conn->errorMessage during pg_cert_auth_handler(), even if we're not verifying certificates, and then that message is included in later unrelated failures. 0004 patches that. --Jacob
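As a rough illustration of the gating described for 0001, the sketch below skips the cache clear when no context exists. It is not the patch itself: conn_sketch is a hypothetical stand-in for the relevant part of PGconn, clear_session_cache_stub() stands in for SSL_ClearSessionCache(), and the actual context shutdown is only hinted at in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of PGconn. */
typedef struct
{
	void	   *nss_context;	/* NULL until an NSS context has been created */
} conn_sketch;

static int	session_cache_clears = 0;

/* Hypothetical stand-in for SSL_ClearSessionCache(); only counts calls. */
static void
clear_session_cache_stub(void)
{
	session_cache_clears++;
}

/*
 * Tear down TLS state for a connection.  Returns 1 if there was a context
 * to tear down, 0 if the call was a no-op.
 */
int
close_tls_sketch(conn_sketch *conn)
{
	assert(conn != NULL);

	/* Without a context there is nothing to clear or shut down. */
	if (conn->nss_context == NULL)
		return 0;

	clear_session_cache_stub();
	/* The real code would shut the NSS context down at this point. */
	conn->nss_context = NULL;
	return 1;
}
```

Calling close_tls_sketch() twice on the same connection is harmless in this sketch, since the second call finds no context and returns early.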
Attachment
> On 19 Jul 2021, at 21:33, Jacob Champion <pchampion@vmware.com> wrote: > ..client connections will crash if > hostaddr is provided rather than host, because SSL_SetURL can't handle > a NULL argument. I'm running with 0002 to fix it for the moment, but > I'm not sure yet if it does the right thing for IP addresses, which the > OpenSSL side has a special case for. AFAICT the idea is to handle it in the cert auth callback, so I've added some PoC code to check for sslsni there and updated the TODO comment to reflect that. I've applied your patches in the attached rebase which passes all tests for me. -- Daniel Gustafsson https://vmware.com/
Attachment
- v39-0010-nss-Build-infrastructure.patch
- v39-0009-nss-Support-NSS-in-cryptohash.patch
- v39-0008-nss-Support-NSS-in-sslinfo.patch
- v39-0007-nss-Support-NSS-in-pgcrypto.patch
- v39-0006-nss-Documentation.patch
- v39-0005-nss-pg_strong_random-support.patch
- v39-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v39-0003-nss-Add-NSS-specific-tests.patch
- v39-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v39-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
Another rebase to work around the recent changes in the ssl Makefile. -- Daniel Gustafsson https://vmware.com/
Attachment
- v40-0010-nss-Build-infrastructure.patch
- v40-0009-nss-Support-NSS-in-cryptohash.patch
- v40-0008-nss-Support-NSS-in-sslinfo.patch
- v40-0007-nss-Support-NSS-in-pgcrypto.patch
- v40-0006-nss-Documentation.patch
- v40-0005-nss-pg_strong_random-support.patch
- v40-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v40-0003-nss-Add-NSS-specific-tests.patch
- v40-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v40-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Tue, 2021-08-10 at 19:22 +0200, Daniel Gustafsson wrote: > Another rebase to work around the recent changes in the ssl Makefile. I have a local test suite that I've been writing against libpq. With the new ssldatabase connection option, one tricky aspect is figuring out whether it's supported or not. It doesn't look like there's any way to tell, from a client application, whether NSS or OpenSSL (or neither) is in use. You'd mentioned that perhaps we should support a call like PQsslAttribute(NULL, "library"); /* returns "NSS", "OpenSSL", or NULL */ so that you don't have to have an actual connection first in order to figure out what connection options you need to supply. Clients that support multiple libpq versions would need to know whether that call is reliable (older versions of libpq will always return NULL, whether SSL is compiled in or not), so maybe we could add a feature macro at the same time? We could also add a new API (say, PQsslLibrary()) but I don't know if that gives us anything in practice. Thoughts? --Jacob
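To make the use case concrete, here is a small sketch of how a client might act on the library name. It assumes the behaviour proposed in this message (a "library" attribute that can be queried without a connection and yields "NSS", "OpenSSL", or NULL); ssldatabase_usable() is a made-up helper name, and the function deliberately takes the string as a parameter so it does not depend on libpq.

```c
#include <stddef.h>
#include <string.h>

/*
 * Decide whether the ssldatabase connection option can be supplied, given
 * the TLS library name reported by libpq.
 *
 * A NULL name is treated as "not usable": it may mean that no TLS library
 * is compiled in, or that libpq is too old to answer the question.
 */
int
ssldatabase_usable(const char *library)
{
	if (library == NULL)
		return 0;
	return strcmp(library, "NSS") == 0;
}
```

With the proposed API a caller would write something like ssldatabase_usable(PQsslAttribute(NULL, "library")). The ambiguity of the NULL case is why a feature macro is suggested alongside it.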
On Wed, Aug 18, 2021 at 12:06:59AM +0000, Jacob Champion wrote: > I have a local test suite that I've been writing against libpq. With > the new ssldatabase connection option, one tricky aspect is figuring > out whether it's supported or not. It doesn't look like there's any way > to tell, from a client application, whether NSS or OpenSSL (or neither) > is in use. That's about guessing which library libpq is compiled with, so yes that's a problem. > so that you don't have to have an actual connection first in order to > figure out what connection options you need to supply. Clients that > support multiple libpq versions would need to know whether that call is > reliable (older versions of libpq will always return NULL, whether SSL > is compiled in or not), so maybe we could add a feature macro at the > same time? Still, the problem is wider than that, no? One cannot know either if a version of libpq is able to work with GSSAPI until they attempt a connection with gssencmode. It seems to me that we should work on the larger picture here. > We could also add a new API (say, PQsslLibrary()) but I don't know if > that gives us anything in practice. Thoughts? Knowing that the GSSAPI stuff is part of fe-secure.c, we may want instead a call that returns a list of supported secure libraries. -- Michael
Attachment
> On 18 Aug 2021, at 02:32, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Aug 18, 2021 at 12:06:59AM +0000, Jacob Champion wrote: >> I have a local test suite that I've been writing against libpq. With >> the new ssldatabase connection option, one tricky aspect is figuring >> out whether it's supported or not. It doesn't look like there's any way >> to tell, from a client application, whether NSS or OpenSSL (or neither) >> is in use. > > That's about guessing which library libpq is compiled with, so yes > that's a problem. > >> so that you don't have to have an actual connection first in order to >> figure out what connection options you need to supply. Clients that >> support multiple libpq versions would need to know whether that call is >> reliable (older versions of libpq will always return NULL, whether SSL >> is compiled in or not), so maybe we could add a feature macro at the >> same time? > > Still, the problem is wider than that, no? One cannot know either if > a version of libpq is able to work with GSSAPI until they attempt a > connection with gssencmode. It seems to me that we should work on the > larger picture here. I think we should do both. PQsslAttribute() already exists, and being able to get the library attribute for NULL conn object when there are multiple libraries makes a lot of sense to me. That doesn’t exclude working on a better way for apps to interrogate the libpq they have at hand for which capabilities it has. Personally I’m not sure what that API could look like, but we should discuss that in a separate thread I guess. -- Daniel Gustafsson https://vmware.com/
Attached is a rebased v41 to keep the patch from bitrot. -- Daniel Gustafsson https://vmware.com/
Attachment
- v41-0008-nss-Support-NSS-in-sslinfo.patch
- v41-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
- v41-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v41-0003-nss-Add-NSS-specific-tests.patch
- v41-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v41-0005-nss-pg_strong_random-support.patch
- v41-0006-nss-Documentation.patch
- v41-0007-nss-Support-NSS-in-pgcrypto.patch
- v41-0009-nss-Support-NSS-in-cryptohash.patch
- v41-0010-nss-Build-infrastructure.patch
Rebased on top of HEAD with off-list comment fixes by Kevin Burke. -- Daniel Gustafsson https://vmware.com/
Attachment
- v42-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
- v42-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v42-0003-nss-Add-NSS-specific-tests.patch
- v42-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v42-0005-nss-pg_strong_random-support.patch
- v42-0006-nss-Documentation.patch
- v42-0007-nss-Support-NSS-in-pgcrypto.patch
- v42-0008-nss-Support-NSS-in-sslinfo.patch
- v42-0009-nss-Support-NSS-in-cryptohash.patch
- v42-0010-nss-Build-infrastructure.patch
On Mon, 2021-07-26 at 15:26 +0200, Daniel Gustafsson wrote: > > On 19 Jul 2021, at 21:33, Jacob Champion <pchampion@vmware.com> wrote: > > ..client connections will crash if > > hostaddr is provided rather than host, because SSL_SetURL can't handle > > a NULL argument. I'm running with 0002 to fix it for the moment, but > > I'm not sure yet if it does the right thing for IP addresses, which the > > OpenSSL side has a special case for. > > AFAICT the idea is to handle it in the cert auth callback, so I've added some > PoC code to check for sslsni there and updated the TODO comment to reflect > that. I dug a bit deeper into the SNI stuff: > + server_hostname = SSL_RevealURL(conn->pr_fd); > + if (!server_hostname || server_hostname[0] == '\0') > + { > + /* If SNI is enabled we must have a hostname set */ > + if (conn->sslsni && conn->sslsni[0]) > + status = SECFailure; conn->sslsni can be explicitly set to "0" to disable it, so this should probably be changed to a check for "1", but I'm not sure that would be correct either. If the user has the default sslsni="1" and supplies an IP address for the host parameter, I don't think we should fail the connection. > + if (host && host[0] && > + !(strspn(host, "0123456789.") == strlen(host) || > + strchr(host, ':'))) > + SSL_SetURL(conn->pr_fd, host); It looks like NSS may already have some code that prevents SNI from being sent for IP addresses, so that part of the guard might not be necessary. (And potentially counterproductive, because it looks like NSS can perform verification against the certificate's SANs if you pass an IP address to SSL_SetURL().) Speaking of IP addresses in SANs, it doesn't look like our OpenSSL backend can handle those. That's a separate conversation, but I might take a look at a patch for next commitfest. --Jacob
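For readers following along, the guard quoted above boils down to a cheap "does this look like an IP literal" test. The sketch below restates it as a standalone function; host_looks_like_ip() is a made-up name, and like the quoted code it is a heuristic rather than a validation (a string such as "1.2" also matches).

```c
#include <assert.h>
#include <string.h>

/*
 * Heuristic from the quoted guard: a host made up only of digits and dots
 * is taken to be an IPv4 literal, and anything containing a colon is taken
 * to be an IPv6 literal.  Returns 1 for "looks like an IP", else 0.
 */
int
host_looks_like_ip(const char *host)
{
	assert(host != NULL);

	if (host[0] == '\0')
		return 0;
	if (strchr(host, ':') != NULL)
		return 1;
	return strspn(host, "0123456789.") == strlen(host);
}
```

If NSS already suppresses SNI for IP addresses, as suggested above, a check like this would only matter for deciding whether to call SSL_SetURL() at all.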
> On 21 Sep 2021, at 02:06, Jacob Champion <pchampion@vmware.com> wrote: > > On Mon, 2021-07-26 at 15:26 +0200, Daniel Gustafsson wrote: >>> On 19 Jul 2021, at 21:33, Jacob Champion <pchampion@vmware.com> wrote: >>> ..client connections will crash if >>> hostaddr is provided rather than host, because SSL_SetURL can't handle >>> a NULL argument. I'm running with 0002 to fix it for the moment, but >>> I'm not sure yet if it does the right thing for IP addresses, which the >>> OpenSSL side has a special case for. >> >> AFAICT the idea is to handle it in the cert auth callback, so I've added some >> PoC code to check for sslsni there and updated the TODO comment to reflect >> that. > > I dug a bit deeper into the SNI stuff: > >> + server_hostname = SSL_RevealURL(conn->pr_fd); >> + if (!server_hostname || server_hostname[0] == '\0') >> + { >> + /* If SNI is enabled we must have a hostname set */ >> + if (conn->sslsni && conn->sslsni[0]) >> + status = SECFailure; > > conn->sslsni can be explicitly set to "0" to disable it, so this should > probably be changed to a check for "1", Agreed. > but I'm not sure that would be > correct either. If the user has the default sslsni="1" and supplies an > IP address for the host parameter, I don't think we should fail the > connection. Maybe not, but doing so is at least in line with how the OpenSSL support will handle the same config AFAICT. Or am I missing something? >> + if (host && host[0] && >> + !(strspn(host, "0123456789.") == strlen(host) || >> + strchr(host, ':'))) >> + SSL_SetURL(conn->pr_fd, host); > > It looks like NSS may already have some code that prevents SNI from > being sent for IP addresses, so that part of the guard might not be > necessary. (And potentially counterproductive, because it looks like > NSS can perform verification against the certificate's SANs if you pass > an IP address to SSL_SetURL().) 
Skimming the NSS code I wasn't able to find the countermeasures, can you provide a reference to where I should look? Feel free to post a new version of the NSS patch with these changes if you want. > Speaking of IP addresses in SANs, it doesn't look like our OpenSSL > backend can handle those. That's a separate conversation, but I might > take a look at a patch for next commitfest. Please do. -- Daniel Gustafsson https://vmware.com/
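The agreed change to the sslsni test is small enough to sketch. The helper name sni_requested() is made up, and treating a NULL or empty setting as "not requested" is an assumption of this sketch, not something settled in the thread.

```c
#include <stddef.h>

/*
 * Returns 1 only when sslsni is explicitly "1".  Checking just for a
 * non-empty string would wrongly treat sslsni="0" as enabled.
 */
int
sni_requested(const char *sslsni)
{
	return sslsni != NULL && sslsni[0] == '1';
}
```

Whether a missing hostname should then fail the connection when SNI is requested is the separate question still being discussed here.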
On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote: > > On 21 Sep 2021, at 02:06, Jacob Champion <pchampion@vmware.com> wrote: > > but I'm not sure that would be > > correct either. If the user has the default sslsni="1" and supplies an > > IP address for the host parameter, I don't think we should fail the > > connection. > > Maybe not, but doing so is at least in line with how the OpenSSL support will > handle the same config AFAICT. Or am I missing something? With OpenSSL, I don't see a connection failure when using sslsni=1 with IP addresses. (verify-full can't work, but that's a separate problem.) > > > + if (host && host[0] && > > > + !(strspn(host, "0123456789.") == strlen(host) || > > > + strchr(host, ':'))) > > > + SSL_SetURL(conn->pr_fd, host); > > > > It looks like NSS may already have some code that prevents SNI from > > being sent for IP addresses, so that part of the guard might not be > > necessary. (And potentially counterproductive, because it looks like > > NSS can perform verification against the certificate's SANs if you pass > > an IP address to SSL_SetURL().) > > Skimming the NSS code I wasn't able find the countermeasures, can you provide a > reference to where I should look? I see the check in ssl_ShouldSendSNIExtension(), in ssl3exthandle.c. > Feel free to post a new version of the NSS patch with these changes if you want. Will do! Thanks, --Jacob
On Mon, 2021-09-27 at 16:29 +0000, Jacob Champion wrote: > On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote: > > > > Feel free to post a new version of the NSS patch with these changes if you want. > > Will do! Something like the attached, v43, I think. (since-v42.diff.txt has the changes only.) This fixes the interaction of IP addresses and SNI for me, and honors sslsni=0. --Jacob
Attachment
- since-v42.diff.txt
- v43-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
- v43-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v43-0003-nss-Add-NSS-specific-tests.patch
- v43-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v43-0005-nss-pg_strong_random-support.patch
- v43-0006-nss-Documentation.patch
- v43-0007-nss-Support-NSS-in-pgcrypto.patch
- v43-0008-nss-Support-NSS-in-sslinfo.patch
- v43-0009-nss-Support-NSS-in-cryptohash.patch
- v43-0010-nss-Build-infrastructure.patch
On Mon, Sep 20, 2021 at 2:38 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > Rebased on top of HEAD with off-list comment fixes by Kevin Burke. > Hello Daniel, I've been playing with your patch on Mac (OS 11.6 Big Sur) and have run into a couple of issues so far. 1. I get 7 warnings while running make (truncated): cryptohash_nss.c:101:21: warning: implicit conversion from enumeration type 'SECOidTag' to different enumeration type 'HASH_HashType' [-Wenum-conversion] ctx->hash_type = SEC_OID_SHA1; ~ ^~~~~~~~~~~~ ... cryptohash_nss.c:134:34: warning: implicit conversion from enumeration type 'HASH_HashType' to different enumeration type 'SECOidTag' [-Wenum-conversion] hash = SECOID_FindOIDByTag(ctx->hash_type); ~~~~~~~~~~~~~~~~~~~ ~~~~~^~~~~~~~~ 7 warnings generated. 2. libpq-refs-stamp fails -- it appears an exit is being injected into libpq on Mac Notes about my environment: I've installed nss via homebrew (at version 3.70) and linked it. Cheers, Rachel
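The first warning comes from storing a value of one enum type in a field of another. A minimal sketch of the usual remedy, an explicit mapping function, is shown below. The two enums are made-up stand-ins for SECOidTag and HASH_HashType with invented values, so this shows the pattern only, not the NSS definitions or the fix that was applied.

```c
/* Made-up stand-ins for NSS's SECOidTag and HASH_HashType. */
typedef enum
{
	SKETCH_OID_UNKNOWN = 0,
	SKETCH_OID_SHA1 = 4,
	SKETCH_OID_SHA256 = 191
} sketch_oid_tag;

typedef enum
{
	SKETCH_HASH_NULL = 0,
	SKETCH_HASH_SHA1 = 3,
	SKETCH_HASH_SHA256 = 5
} sketch_hash_type;

/*
 * Translate explicitly instead of assigning across enum types.  The
 * numeric values differ, so an implicit conversion would silently pick
 * the wrong algorithm.
 */
sketch_hash_type
hash_type_from_oid(sketch_oid_tag tag)
{
	switch (tag)
	{
		case SKETCH_OID_SHA1:
			return SKETCH_HASH_SHA1;
		case SKETCH_OID_SHA256:
			return SKETCH_HASH_SHA256;
		default:
			return SKETCH_HASH_NULL;
	}
}
```

Keeping the field in one enum type and converting at the boundary means the compiler can flag any remaining accidental mix-ups.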
> On 28 Sep 2021, at 01:07, Rachel Heaton <rachelmheaton@gmail.com> wrote: > 1. I get 7 warnings while running make (truncated): > cryptohash_nss.c:101:21: warning: implicit conversion from enumeration > type 'SECOidTag' to different enumeration type 'HASH_HashType' Nice catch, fixed in the attached. > 2. libpq-refs-stamp fails -- it appears an exit is being injected into > libpq on Mac I spent some time investigating this, and there are two cases of _exit() and one atexit() which are coming from the threading code in libnspr (which is the runtime lib required by libnss). On macOS the threading code registers an atexit handler [0] in order to work around issues with __attribute__((destructor)) [1]. The pthreads code also defines PR_ProcessExit [2] which does what it says on the tin, calls exit and not much more [3]. Both of these uses are only compiled when building with pthreads, which can be disabled in autoconf but that seems broken in recent versions of NSPR. I'm fairly sure I've built NSPR with the user pthreads in the past, but if packagers build it like this then we need to conform to that. The PR_CreateProcess() [4] call further calls _exit() [5] in a number of error paths on failing syscalls. The libpq libnss implementation doesn't call either of these, and neither does libnss. I'm not entirely sure what to do here, it clearly requires an exception in the Makefile check of sorts if we deem we can live with this. @Jacob: how did you configure your copy of NSPR? 
-- Daniel Gustafsson https://vmware.com/ [0] https://hg.mozilla.org/projects/nspr/file/tip/pr/src/pthreads/ptthread.c#l1034 [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1399746#c99 [2] https://www-archive.mozilla.org/projects/nspr/reference/html/prinit.html#15859 [3] https://hg.mozilla.org/projects/nspr/file/tip/pr/src/pthreads/ptthread.c#l1181 [4] https://www-archive.mozilla.org/projects/nspr/reference/html/prprocess.html#24535 [5] https://hg.mozilla.org/projects/nspr/file/tip/pr/src/md/unix/uxproces.c#l268
Attachment
- v44-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
- v44-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v44-0003-nss-Add-NSS-specific-tests.patch
- v44-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v44-0005-nss-pg_strong_random-support.patch
- v44-0006-nss-Documentation.patch
- v44-0007-nss-Support-NSS-in-pgcrypto.patch
- v44-0008-nss-Support-NSS-in-sslinfo.patch
- v44-0009-nss-Support-NSS-in-cryptohash.patch
- v44-0010-nss-Build-infrastructure.patch
On Thu, 2021-09-30 at 14:17 +0200, Daniel Gustafsson wrote: > The libpq libnss implementation doesn't call either of these, and neither does > libnss. I thought the refs check only searched for direct symbol dependencies; is that piece of NSPR being statically included somehow? > I'm not entirely sure what to do here, it clearly requires an exception in the > Makefile check of sorts if we deem we can live with this. > > @Jacob: how did you configure your copy of NSPR? I use the Ubuntu 20.04 builtin (NSPR 4.25.0), but it looks like the reason I haven't been seeing this is because I've always used --enable-coverage. If I take that out, I see the same exit check failure. --Jacob
On Thu, 2021-09-30 at 16:04 +0000, Jacob Champion wrote: > On Thu, 2021-09-30 at 14:17 +0200, Daniel Gustafsson wrote: > > The libpq libnss implementation doesn't call either of these, and neither does > > libnss. > > I thought the refs check only searched for direct symbol dependencies; > is that piece of NSPR being statically included somehow? On my machine, at least, exit() is coming in due to a few calls to psprintf(), pstrdup(), and pg_malloc() in the new NSS code. (Disassembly via `objdump -S libpq.so` helped me track those down.) I'm working on a patch. --Jacob
> On 1 Oct 2021, at 02:02, Jacob Champion <pchampion@vmware.com> wrote: > On my machine, at least, exit() is coming in due to a few calls to > psprintf(), pstrdup(), and pg_malloc() in the new NSS code. > (Disassembly via `objdump -S libpq.so` helped me track those down.) I'm > working on a patch. Ah, that makes perfect sense. I was so focused on hunting through what was newly linked against that I overlooked the obvious. Thanks for finding these. -- Daniel Gustafsson https://vmware.com/
On Fri, 2021-10-01 at 08:55 +0200, Daniel Gustafsson wrote: > Ah, that makes perfect sense. I was too focused on hunting in what new was > linked against that I overlooked the obvious. Thanks for finding these. No problem at all :) The exit() check is useful but still a little opaque, I think, especially since (from my newbie perspective) there's so much of the pgcommon staticlib that is forbidden for use in libpq. Fixed in v44, attached; changes in since-v43.diff.txt. --Jacob
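For context on why those helpers trip the check: allocation wrappers that call exit() on failure are fine in a standalone program but not in a library, which should hand the error back to its caller. The sketch below shows the library-friendly shape with plain malloc(). The name dup_or_report() and the error-buffer convention are invented for illustration and are not what the patch uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate src without ever exiting the process.  On allocation failure
 * NULL is returned and a message is written to errbuf; on success errbuf
 * is set to the empty string.  The caller frees the result.
 */
char *
dup_or_report(const char *src, char *errbuf, size_t errlen)
{
	size_t		len;
	char	   *copy;

	assert(src != NULL && errbuf != NULL && errlen > 0);

	len = strlen(src) + 1;
	copy = malloc(len);
	if (copy == NULL)
	{
		snprintf(errbuf, errlen, "out of memory");
		return NULL;
	}
	memcpy(copy, src, len);
	errbuf[0] = '\0';
	return copy;
}
```

A caller inside the library would check for NULL and propagate the message through its normal error path instead of terminating the host application.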
Attachment
- since-v43.diff.txt
- v44-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
- v44-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v44-0003-nss-Add-NSS-specific-tests.patch
- v44-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v44-0005-nss-pg_strong_random-support.patch
- v44-0006-nss-Documentation.patch
- v44-0007-nss-Support-NSS-in-pgcrypto.patch
- v44-0008-nss-Support-NSS-in-sslinfo.patch
- v44-0009-nss-Support-NSS-in-cryptohash.patch
- v44-0010-nss-Build-infrastructure.patch
> On 4 Oct 2021, at 18:14, Jacob Champion <pchampion@vmware.com> wrote: > > On Fri, 2021-10-01 at 08:55 +0200, Daniel Gustafsson wrote: >> Ah, that makes perfect sense. I was too focused on hunting in what new was >> linked against that I overlooked the obvious. Thanks for finding these. > > No problem at all :) The exit() check is useful but still a little > opaque, I think, especially since (from my newbie perspective) there's > so much of the pgcommon staticlib that is forbidden for use in libpq. Thanks! These changes look good. Since you accidentally based this on v43 and not the v44 I posted with the cryptohash fix in, the attached is a v45 with both your v44 and the previous one, all rebased over HEAD. -- Daniel Gustafsson https://vmware.com/
Attachment
- v45-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
- v45-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v45-0003-nss-Add-NSS-specific-tests.patch
- v45-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v45-0005-nss-pg_strong_random-support.patch
- v45-0006-nss-Documentation.patch
- v45-0007-nss-Support-NSS-in-pgcrypto.patch
- v45-0008-nss-Support-NSS-in-sslinfo.patch
- v45-0009-nss-Support-NSS-in-cryptohash.patch
- v45-0010-nss-Build-infrastructure.patch
On Tue, 2021-10-05 at 15:08 +0200, Daniel Gustafsson wrote: > Thanks! These changes looks good. Since you accidentally based this on v43 > and not the v44 I posted with the cryptohash fix in, the attached is a v45 with > both your v44 and the previous one, all rebased over HEAD. Thanks, and sorry about that. --Jacob
Hi all, apologies but I'm having trouble applying the latest patch (v45) to the latest commit on master (6b0f6f79eef2168ce38a8ee99c3ed76e3df5d7ad)
I believe that these patches need to integrate the refactoring in commit b3b4d8e68ae83f432f43f035c7eb481ef93e1583 - git is searching for the wrong text in the existing file, but I'm not sure how to submit a patch against a patch.
I downloaded all of the patches to my local filesystem, and then ran:
for patch in ../../kevinburke/rustls-postgres/patchsets/2021-10-05-gustafsson-mailing-list/*.patch; do git am $patch; done;
I get the following error on the second patch file:
Applying: Refactor SSL testharness for multiple library
error: patch failed: src/test/ssl/t/001_ssltests.pl:7
error: src/test/ssl/t/001_ssltests.pl: patch does not apply
error: patch failed: src/test/ssl/t/SSLServer.pm:26
error: src/test/ssl/t/SSLServer.pm: patch does not apply
Patch failed at 0001 Refactor SSL testharness for multiple library
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Thanks,
Kevin
On Tue, Oct 5, 2021 at 8:05 AM Jacob Champion <pchampion@vmware.com> wrote:
On Tue, 2021-10-05 at 15:08 +0200, Daniel Gustafsson wrote:
> Thanks! These changes looks good. Since you accidentally based this on v43
> and not the v44 I posted with the cryptohash fix in, the attached is a v45 with
> both your v44 and the previous one, all rebased over HEAD.
Thanks, and sorry about that.
--Jacob
For anyone else trying to test out this branch I'm able to get the patches to apply cleanly if I check out e.g. commit 92e6a98c3636948e7ece9a3260f9d89dd60da278.
Kevin
On Thu, Oct 28, 2021 at 9:31 PM Kevin Burke <kevin@burke.dev> wrote:
Hi all, apologies but I'm having trouble applying the latest patch (v45) to the latest commit on master (6b0f6f79eef2168ce38a8ee99c3ed76e3df5d7ad)
I downloaded all of the patches to my local filesystem, and then ran:
for patch in ../../kevinburke/rustls-postgres/patchsets/2021-10-05-gustafsson-mailing-list/*.patch; do git am $patch; done;
I get the following error on the second patch file:
Applying: Refactor SSL testharness for multiple library
error: patch failed: src/test/ssl/t/001_ssltests.pl:7
error: src/test/ssl/t/001_ssltests.pl: patch does not apply
error: patch failed: src/test/ssl/t/SSLServer.pm:26
error: src/test/ssl/t/SSLServer.pm: patch does not apply
Patch failed at 0001 Refactor SSL testharness for multiple library
hint: Use 'git am --show-current-patch=diff' to see the failed patch
I believe that these patches need to integrate the refactoring in commit b3b4d8e68ae83f432f43f035c7eb481ef93e1583 - git is searching for the wrong text in the existing file, but I'm not sure how to submit a patch against a patch.
Thanks,
Kevin
On Tue, Oct 5, 2021 at 8:05 AM Jacob Champion <pchampion@vmware.com> wrote:
On Tue, 2021-10-05 at 15:08 +0200, Daniel Gustafsson wrote:
> Thanks! These changes looks good. Since you accidentally based this on v43
> and not the v44 I posted with the cryptohash fix in, the attached is a v45 with
> both your v44 and the previous one, all rebased over HEAD.
Thanks, and sorry about that.
--Jacob
> On 29 Oct 2021, at 06:31, Kevin Burke <kevin@burke.dev> wrote: Thanks for testing the patch! > I believe that these patches need to integrate the refactoring in commit > b3b4d8e68ae83f432f43f035c7eb481ef93e1583 - git is searching for the wrong text > in the existing file Correct, b3b4d8e68 as well as b4c4a00ea both created conflicts with this patchset. Attached is an updated patchset fixing both of those as well as adding version checks for NSS and NSPR to autoconf (with fallbacks for non-{nss|nspr}-config systems). The versions picked are semi-arbitrary and definitely up for discussion. I chose them mainly as they were the oldest commonly available packages I found, and they satisfy the requirements we have. > I'm not sure how to submit a patch against a patch. If you've done the work of fixing the conflicts in a rebase, the best option is IMO to supply a whole new version of the patchset since that will make the CF patch tester be able to build and test the version. -- Daniel Gustafsson https://vmware.com/
Attachment
- v46-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
- v46-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v46-0003-nss-Add-NSS-specific-tests.patch
- v46-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v46-0005-nss-pg_strong_random-support.patch
- v46-0006-nss-Documentation.patch
- v46-0007-nss-Support-NSS-in-pgcrypto.patch
- v46-0008-nss-Support-NSS-in-sslinfo.patch
- v46-0009-nss-Support-NSS-in-cryptohash.patch
- v46-0010-nss-Build-infrastructure.patch
Attached is a rebase fixing a tiny bug in the documentation which prevented it from being able to compile. -- Daniel Gustafsson https://vmware.com/
Attachment
- v47-0010-nss-Build-infrastructure.patch
- v47-0009-nss-Support-NSS-in-cryptohash.patch
- v47-0008-nss-Support-NSS-in-sslinfo.patch
- v47-0007-nss-Support-NSS-in-pgcrypto.patch
- v47-0006-nss-Documentation.patch
- v47-0005-nss-pg_strong_random-support.patch
- v47-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v47-0003-nss-Add-NSS-specific-tests.patch
- v47-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v47-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Fri, Nov 5, 2021 at 6:01 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > Attached is a rebase fixing a tiny bug in the documentation which prevented it > from being able to compile. > Hello, I'm looking to help out with reviews for this CF and I'm currently looking at this patchset. Currently I'm stuck trying to configure: checking for nss-config... /usr/bin/nss-config checking for nspr-config... /usr/bin/nspr-config ... checking nss/ssl.h usability... no checking nss/ssl.h presence... no checking for nss/ssl.h... no configure: error: header file <nss/ssl.h> is required for NSS This is on Fedora 33 and nss-devel is installed, nss-config is available (and configure finds it) but the directory is different from Ubuntu: (base) [vagrant@fedora ~]$ nss-config --includedir /usr/include/nss3 (base) [vagrant@fedora ~]$ ls -al /usr/include/nss3/ssl.h -rw-r--r--. 1 root root 70450 Sep 30 05:41 /usr/include/nss3/ssl.h So if nss-config --includedir is used then #include <ssl.h> should be used, or if not then #include <nss3/ssl.h> but on this system #include <nss/ssl.h> is not going to work. Thanks
On Tue, Nov 9, 2021 at 1:59 PM Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > On Fri, Nov 5, 2021 at 6:01 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > > > Attached is a rebase fixing a tiny bug in the documentation which prevented it > > from being able to compile. > > > > Hello, I'm looking to help out with reviews for this CF and I'm > currently looking at this patchset. > > currently I'm stuck trying to configure: > > checking for nss-config... /usr/bin/nss-config > checking for nspr-config... /usr/bin/nspr-config > ... > checking nss/ssl.h usability... no > checking nss/ssl.h presence... no > checking for nss/ssl.h... no > configure: error: header file <nss/ssl.h> is required for NSS > > This is on fedora 33 and nss-devel is installed, nss-config is > available (and configure finds it) but the directory is different from > Ubuntu: > (base) [vagrant@fedora ~]$ nss-config --includedir > /usr/include/nss3 > (base) [vagrant@fedora ~]$ ls -al /usr/include/nss3/ssl.h > -rw-r--r--. 1 root root 70450 Sep 30 05:41 /usr/include/nss3/ssl.h > > So if nss-config --includedir is used then #include <ssl.h> should be > used, or if not then #include <nss3/ssl.h> but on this system #include > <nss/ssl.h> is not going to work. FYI, if I make a symlink to get past this, configure completes but compilation fails because nspr/nspr.h cannot be found (I'm not sure why configure doesn't discover this) ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found #include <nspr/nspr.h>In file included from protocol_nss.c:24: ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found #include <nspr/nspr.h> ^~~~~~~~~~~~~ It's a similar issue: $ nspr-config --includedir /usr/include/nspr4
On Tue, Nov 9, 2021 at 2:02 PM Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > On Tue, Nov 9, 2021 at 1:59 PM Joshua Brindle > <joshua.brindle@crunchydata.com> wrote: > > > > On Fri, Nov 5, 2021 at 6:01 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > > > > > Attached is a rebase fixing a tiny bug in the documentation which prevented it > > > from being able to compile. > > > > > > > Hello, I'm looking to help out with reviews for this CF and I'm > > currently looking at this patchset. > > > > currently I'm stuck trying to configure: > > > > checking for nss-config... /usr/bin/nss-config > > checking for nspr-config... /usr/bin/nspr-config > > ... > > checking nss/ssl.h usability... no > > checking nss/ssl.h presence... no > > checking for nss/ssl.h... no > > configure: error: header file <nss/ssl.h> is required for NSS > > > > This is on fedora 33 and nss-devel is installed, nss-config is > > available (and configure finds it) but the directory is different from > > Ubuntu: > > (base) [vagrant@fedora ~]$ nss-config --includedir > > /usr/include/nss3 > > (base) [vagrant@fedora ~]$ ls -al /usr/include/nss3/ssl.h > > -rw-r--r--. 1 root root 70450 Sep 30 05:41 /usr/include/nss3/ssl.h > > > > So if nss-config --includedir is used then #include <ssl.h> should be > > used, or if not then #include <nss3/ssl.h> but on this system #include > > <nss/ssl.h> is not going to work. 
>
> FYI, if I make a symlink to get past this, configure completes but
> compilation fails because nspr/nspr.h cannot be found (I'm not sure
> why configure doesn't discover this)
> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
> #include <nspr/nspr.h>In file included from protocol_nss.c:24:
> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
> #include <nspr/nspr.h>
>          ^~~~~~~~~~~~~
>
> It's a similar issue:
> $ nspr-config --includedir
> /usr/include/nspr4

If these get resolved the next issue is llvm bitcode doesn't compile
because the nss includedir is missing from CPPFLAGS:

/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv
-O2 -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2
-I/usr/include -flto=thin -emit-llvm -c -o be-secure-nss.bc
be-secure-nss.c
In file included from be-secure-nss.c:20:
In file included from ../../../src/include/common/nss.h:38:
In file included from /usr/include/nss/nss.h:34:
/usr/include/nss/seccomon.h:17:10: fatal error: 'prtypes.h' file not found
#include "prtypes.h"
         ^~~~~~~~~~~
1 error generated.
> On 9 Nov 2021, at 22:22, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > On Tue, Nov 9, 2021 at 2:02 PM Joshua Brindle > <joshua.brindle@crunchydata.com> wrote: >> >> On Tue, Nov 9, 2021 at 1:59 PM Joshua Brindle >> <joshua.brindle@crunchydata.com> wrote: >>> Hello, I'm looking to help out with reviews for this CF and I'm >>> currently looking at this patchset. Thanks, much appreciated! >>> currently I'm stuck trying to configure: >>> >>> checking for nss-config... /usr/bin/nss-config >>> checking for nspr-config... /usr/bin/nspr-config >>> ... >>> checking nss/ssl.h usability... no >>> checking nss/ssl.h presence... no >>> checking for nss/ssl.h... no >>> configure: error: header file <nss/ssl.h> is required for NSS >>> >>> This is on fedora 33 and nss-devel is installed, nss-config is >>> available (and configure finds it) but the directory is different from >>> Ubuntu: >>> (base) [vagrant@fedora ~]$ nss-config --includedir >>> /usr/include/nss3 >>> (base) [vagrant@fedora ~]$ ls -al /usr/include/nss3/ssl.h >>> -rw-r--r--. 1 root root 70450 Sep 30 05:41 /usr/include/nss3/ssl.h >>> >>> So if nss-config --includedir is used then #include <ssl.h> should be >>> used, or if not then #include <nss3/ssl.h> but on this system #include >>> <nss/ssl.h> is not going to work. Interesting rename, I doubt any version but NSS 3 and NSPR 4 is alive anywhere and an incremented major version seems highly unlikely. Going back to plain #include <ssl.h> and have the includeflags sort out the correct directories seems like the best option then. Fixed in the attached. 
>> FYI, if I make a symlink to get past this, configure completes but
>> compilation fails because nspr/nspr.h cannot be found (I'm not sure
>> why configure doesn't discover this)
>> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
>> #include <nspr/nspr.h>In file included from protocol_nss.c:24:
>> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
>> #include <nspr/nspr.h>
>>          ^~~~~~~~~~~~~
>>
>> It's a similar issue:
>> $ nspr-config --includedir
>> /usr/include/nspr4

Fixed.

> If these get resolved the next issue is llvm bitcode doesn't compile
> because the nss includedir is missing from CPPFLAGS:
>
> /usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv
> -O2 -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2
> -I/usr/include -flto=thin -emit-llvm -c -o be-secure-nss.bc
> be-secure-nss.c
> In file included from be-secure-nss.c:20:
> In file included from ../../../src/include/common/nss.h:38:
> In file included from /usr/include/nss/nss.h:34:
> /usr/include/nss/seccomon.h:17:10: fatal error: 'prtypes.h' file not found
> #include "prtypes.h"
>          ^~~~~~~~~~~
> 1 error generated.

Fixed.

The attached also resolves the conflicts in pgcrypto following db7d1a7b05. PGP
ElGamal and RSA pubkey functions aren't supported for now as there are no bignum
functions similar to the BN_* in OpenSSL. I will look more into how hard it
would be to support; for now this gets us ahead.

--
Daniel Gustafsson https://vmware.com/
Attachment
- v48-0010-nss-Build-infrastructure.patch
- v48-0009-nss-Support-NSS-in-cryptohash.patch
- v48-0008-nss-Support-NSS-in-sslinfo.patch
- v48-0007-nss-Support-NSS-in-pgcrypto.patch
- v48-0006-nss-Documentation.patch
- v48-0005-nss-pg_strong_random-support.patch
- v48-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v48-0003-nss-Add-NSS-specific-tests.patch
- v48-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v48-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Wed, Nov 10, 2021 at 8:49 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 9 Nov 2021, at 22:22, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > On Tue, Nov 9, 2021 at 2:02 PM Joshua Brindle > > <joshua.brindle@crunchydata.com> wrote: > >> > >> On Tue, Nov 9, 2021 at 1:59 PM Joshua Brindle > >> <joshua.brindle@crunchydata.com> wrote: > > >>> Hello, I'm looking to help out with reviews for this CF and I'm > >>> currently looking at this patchset. > > Thanks, much appreciated! > > >>> currently I'm stuck trying to configure: > >>> > >>> checking for nss-config... /usr/bin/nss-config > >>> checking for nspr-config... /usr/bin/nspr-config > >>> ... > >>> checking nss/ssl.h usability... no > >>> checking nss/ssl.h presence... no > >>> checking for nss/ssl.h... no > >>> configure: error: header file <nss/ssl.h> is required for NSS > >>> > >>> This is on fedora 33 and nss-devel is installed, nss-config is > >>> available (and configure finds it) but the directory is different from > >>> Ubuntu: > >>> (base) [vagrant@fedora ~]$ nss-config --includedir > >>> /usr/include/nss3 > >>> (base) [vagrant@fedora ~]$ ls -al /usr/include/nss3/ssl.h > >>> -rw-r--r--. 1 root root 70450 Sep 30 05:41 /usr/include/nss3/ssl.h > >>> > >>> So if nss-config --includedir is used then #include <ssl.h> should be > >>> used, or if not then #include <nss3/ssl.h> but on this system #include > >>> <nss/ssl.h> is not going to work. > > Interesting rename, I doubt any version but NSS 3 and NSPR 4 is alive anywhere > and an incremented major version seems highly unlikely. Going back to plain > #include <ssl.h> and have the includeflags sort out the correct directories > seems like the best option then. Fixed in the attached. 
>
> >> FYI, if I make a symlink to get past this, configure completes but
> >> compilation fails because nspr/nspr.h cannot be found (I'm not sure
> >> why configure doesn't discover this)
> >> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
> >> #include <nspr/nspr.h>In file included from protocol_nss.c:24:
> >> ../../src/include/common/nss.h:31:10: fatal error: 'nspr/nspr.h' file not found
> >> #include <nspr/nspr.h>
> >>          ^~~~~~~~~~~~~
> >>
> >> It's a similar issue:
> >> $ nspr-config --includedir
> >> /usr/include/nspr4
>
> Fixed.
>
> > If these get resolved the next issue is llvm bitcode doesn't compile
> > because the nss includedir is missing from CPPFLAGS:
> >
> > /usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv
> > -O2 -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2
> > -I/usr/include -flto=thin -emit-llvm -c -o be-secure-nss.bc
> > be-secure-nss.c
> > In file included from be-secure-nss.c:20:
> > In file included from ../../../src/include/common/nss.h:38:
> > In file included from /usr/include/nss/nss.h:34:
> > /usr/include/nss/seccomon.h:17:10: fatal error: 'prtypes.h' file not found
> > #include "prtypes.h"
> >          ^~~~~~~~~~~
> > 1 error generated.
>
> Fixed.

Apologies for the delay, this didn't go to my inbox and I missed it on list.

The bitcode generation is still broken, this time for nspr.h:

/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv
-O2 -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2
-I/usr/include -flto=thin -emit-llvm -c -o be-secure-nss.bc
be-secure-nss.c
In file included from be-secure-nss.c:20:
../../../src/include/common/nss.h:31:10: fatal error: 'nspr.h' file not found
#include <nspr.h>
         ^~~~~~~~
1 error generated.

FWIW I attached the Dockerfile I've been using to test this, primarily
to ensure that there were no openssl devel files lurking around during
compilation. It expects a ./postgres directory with whatever patches
already applied to it.
> > The attached also resolves the conflicts in pgcrypto following db7d1a7b05. PGP > elgamel and RSA pubkey functions aren't supported for now as there is no bignum > functions similar to the BN_* in OpenSSL. I will look into more how hard it > would be to support, for now this gets us ahead. > > -- > Daniel Gustafsson https://vmware.com/ >
Attachment
> On 15 Nov 2021, at 20:51, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > Apologies for the delay, this didn't go to my inbox and I missed it on list. > > The bitcode generation is still broken, this time for nspr.h: Interesting, I am unable to replicate that in my tree but I'll investigate further tomorrow using your Dockerfile. For the sake of testing, does compilation pass for you in the same place without using --with-llvm? -- Daniel Gustafsson https://vmware.com/
On Mon, Nov 15, 2021 at 4:44 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 15 Nov 2021, at 20:51, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > > Apologies for the delay, this didn't go to my inbox and I missed it on list. > > > > The bitcode generation is still broken, this time for nspr.h: > > Interesting, I am unable to replicate that in my tree but I'll investigate > further tomorrow using your Dockerfile. For the sake of testing, does > compilation pass for you in the same place without using --with-llvm? > Yes, it builds and check-world passes. I'll continue testing with this build. Thank you.
On Mon, Nov 15, 2021 at 5:37 PM Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > On Mon, Nov 15, 2021 at 4:44 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > > > On 15 Nov 2021, at 20:51, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > > > > Apologies for the delay, this didn't go to my inbox and I missed it on list. > > > > > > The bitcode generation is still broken, this time for nspr.h: > > > > Interesting, I am unable to replicate that in my tree but I'll investigate > > further tomorrow using your Dockerfile. For the sake of testing, does > > compilation pass for you in the same place without using --with-llvm? > > > > Yes, it builds and check-world passes. I'll continue testing with this > build. Thank you. The previous Dockerfile had some issues due to a hasty port from RHEL to Fedora, attached is one that works with your patchset, llvm currently disabled, and the llvm deps removed. The service file is also attached since it's referenced in the Dockerfile and you'd have had to reproduce it. After building, run with: docker run --name pg-test -p 5432:5432 --cap-add=SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro -d <final docker hash>
Attachment
On Tue, Nov 16, 2021 at 9:45 AM Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
>
> On Mon, Nov 15, 2021 at 5:37 PM Joshua Brindle
> <joshua.brindle@crunchydata.com> wrote:
> >
> > On Mon, Nov 15, 2021 at 4:44 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > >
> > > > On 15 Nov 2021, at 20:51, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> > >
> > > > Apologies for the delay, this didn't go to my inbox and I missed it on list.
> > > >
> > > > The bitcode generation is still broken, this time for nspr.h:
> > >
> > > Interesting, I am unable to replicate that in my tree but I'll investigate
> > > further tomorrow using your Dockerfile. For the sake of testing, does
> > > compilation pass for you in the same place without using --with-llvm?
> > >
> >
> > Yes, it builds and check-world passes. I'll continue testing with this
> > build. Thank you.
>
> The previous Dockerfile had some issues due to a hasty port from RHEL
> to Fedora, attached is one that works with your patchset, llvm
> currently disabled, and the llvm deps removed.
>
> The service file is also attached since it's referenced in the
> Dockerfile and you'd have had to reproduce it.
>
> After building, run with:
> docker run --name pg-test -p 5432:5432 --cap-add=SYS_ADMIN -v
> /sys/fs/cgroup:/sys/fs/cgroup:ro -d <final docker hash>

I think there is a typo in the docs here that prevents them from
building (this diff seems to fix it):

diff --git a/doc/src/sgml/pgcrypto.sgml b/doc/src/sgml/pgcrypto.sgml
index 56b73e033c..844aa31e86 100644
--- a/doc/src/sgml/pgcrypto.sgml
+++ b/doc/src/sgml/pgcrypto.sgml
@@ -767,7 +767,7 @@ pgp_sym_encrypt(data, psw, 'compress-algo=1, cipher-algo=aes256')
 <para>
 Which cipher algorithm to use. <literal>cast5</literal> is only available
 if <productname>PostgreSQL</productname> was built with
- <productname>OpenSSL</productame>.
+ <productname>OpenSSL</productname>.
 </para>
 <literallayout>
 Values: bf, aes128, aes192, aes256, 3des, cast5
On Tue, Nov 16, 2021 at 1:26 PM Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > On Tue, Nov 16, 2021 at 9:45 AM Joshua Brindle > <joshua.brindle@crunchydata.com> wrote: > > > > On Mon, Nov 15, 2021 at 5:37 PM Joshua Brindle > > <joshua.brindle@crunchydata.com> wrote: > > > > > > On Mon, Nov 15, 2021 at 4:44 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > > > > > > > On 15 Nov 2021, at 20:51, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > > > > > > > > Apologies for the delay, this didn't go to my inbox and I missed it on list. > > > > > > > > > > The bitcode generation is still broken, this time for nspr.h: > > > > > > > > Interesting, I am unable to replicate that in my tree but I'll investigate > > > > further tomorrow using your Dockerfile. For the sake of testing, does > > > > compilation pass for you in the same place without using --with-llvm? > > > > > > > > > > Yes, it builds and check-world passes. I'll continue testing with this > > > build. Thank you. > > > > The previous Dockerfile had some issues due to a hasty port from RHEL > > to Fedora, attached is one that works with your patchset, llvm > > currently disabled, and the llvm deps removed. > > > > The service file is also attached since it's referenced in the > > Dockerfile and you'd have had to reproduce it. > > > > After building, run with: > > docker run --name pg-test -p 5432:5432 --cap-add=SYS_ADMIN -v > > /sys/fs/cgroup:/sys/fs/cgroup:ro -d <final docker hash> > > I think there it a typo in the docs here that prevents them from > building (this diff seems to fix it): > > diff --git a/doc/src/sgml/pgcrypto.sgml b/doc/src/sgml/pgcrypto.sgml > index 56b73e033c..844aa31e86 100644 > --- a/doc/src/sgml/pgcrypto.sgml > +++ b/doc/src/sgml/pgcrypto.sgml > @@ -767,7 +767,7 @@ pgp_sym_encrypt(data, psw, 'compress-algo=1, > cipher-algo=aes256') > <para> > Which cipher algorithm to use. 
<literal>cast5</literal> is only available
> if <productname>PostgreSQL</productname> was built with
> - <productname>OpenSSL</productame>.
> + <productname>OpenSSL</productname>.
> </para>
> <literallayout>
> Values: bf, aes128, aes192, aes256, 3des, cast5

After a bit more testing, the server is up and running with an nss
database but before configuring the client database I tried connecting
and got a segfault:

#0 PR_Write (fd=0x0, buf=0x141ba60, amount=84) at io/../../.././nspr/pr/src/io/priometh.c:114
#1 0x00007ff33dfdc62f in pgtls_write (conn=0x13cecb0, ptr=0x141ba60, len=84) at fe-secure-nss.c:583
#2 0x00007ff33dfd6e18 in pqsecure_write (conn=0x13cecb0, ptr=0x141ba60, len=84) at fe-secure.c:295
#3 0x00007ff33dfd04dc in pqSendSome (conn=0x13cecb0, len=84) at fe-misc.c:834
#4 0x00007ff33dfd06c8 in pqFlush (conn=0x13cecb0) at fe-misc.c:972
#5 0x00007ff33dfc257c in pqPacketSend (conn=0x13cecb0, pack_type=0 '\000', buf=0x1414c60, buf_len=80) at fe-connect.c:4619
#6 0x00007ff33dfbfadd in PQconnectPoll (conn=0x13cecb0) at fe-connect.c:2986
#7 0x00007ff33dfbe55c in connectDBComplete (conn=0x13cecb0) at fe-connect.c:2218
#8 0x00007ff33dfbbaef in PQconnectdbParams (keywords=0x1427d10, values=0x1427e60, expand_dbname=1) at fe-connect.c:668
#9 0x000000000043ebc7 in main (argc=2, argv=0x7ffdccd0e2f8) at startup.c:273

It looks like the ssl connection falls through to attempt a non-ssl
connection but at some point conn->ssl_in_use gets set to true,
despite pr_fd and nss_context being null.

This patch fixes the segfault but I suspect is not the correct fix,
due to the error when connecting saying "Success":

--- a/src/interfaces/libpq/fe-secure-nss.c
+++ b/src/interfaces/libpq/fe-secure-nss.c
@@ -498,6 +498,11 @@ pgtls_read(PGconn *conn, void *ptr, size_t len)
      * for closed connections, while -1 indicates an error within the ongoing
      * connection.
      */
+    if (!conn->pr_fd) {
+        SOCK_ERRNO_SET(read_errno);
+        return -1;
+    }
+
     nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT);

     if (nread == 0)
@@ -580,6 +585,11 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
     PRErrorCode status;
     int write_errno = 0;

+    if (!conn->pr_fd) {
+        SOCK_ERRNO_SET(write_errno);
+        return -1;
+    }
+
     n = PR_Write(conn->pr_fd, ptr, len);

     if (n < 0)
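One possible reading of the "Success" message mentioned above: the guard stores write_errno, which the diff context shows is still 0 at that point, so whatever later formats the error prints the text for errno 0. The following is only a self-contained sketch of a guard that reports a nonzero error instead. MockConn, mock_tls_write and the choice of ECONNRESET are illustrative assumptions and are not taken from libpq or from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the few PGconn fields that matter here. */
typedef struct MockConn
{
    void *pr_fd;        /* NULL until a TLS layer has been set up */
    int   ssl_in_use;
} MockConn;

/*
 * Returns the number of bytes "written", or -1 with errno set.
 * A real implementation would call into the TLS library; this sketch
 * simply pretends the whole buffer was sent.
 */
long
mock_tls_write(MockConn *conn, const void *ptr, size_t len)
{
    (void) ptr;

    if (conn->pr_fd == NULL)
    {
        /* Assumed error code: any nonzero errno avoids a "Success" report. */
        errno = ECONNRESET;
        return -1;
    }

    return (long) len;
}
```

As noted in the message itself, a guard like this only hides the symptom; the write path should not be reached at all without a TLS-enabled descriptor.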
> On 17 Nov 2021, at 19:42, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > On Tue, Nov 16, 2021 at 1:26 PM Joshua Brindle > <joshua.brindle@crunchydata.com> wrote: >> I think there it a typo in the docs here that prevents them from >> building (this diff seems to fix it): Ah yes, thanks, I had noticed that one but forgot to send out a new version to make the CFBot green. > After a bit more testing, the server is up and running with an nss > database but before configuring the client database I tried connecting > and got a segfault: Interesting. I'm unable to reproduce this crash, can you show the sequence of commands which led to this? > It looks like the ssl connection falls through to attempt a non-ssl > connection but at some point conn->ssl_in_use gets set to true, > despite pr_fd and nss_context being null. pgtls_close missed setting ssl_in_use to false, fixed in the attached. I've also added some assertions to the connection setup for debugging this. > This patch fixes the segfault but I suspect is not the correct fix, > due to the error when connecting saying "Success": Right, without an SSL enabled FD we should never get here. -- Daniel Gustafsson https://vmware.com/
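The fix described above, clearing ssl_in_use when the TLS state is torn down, can be illustrated with a small self-contained sketch. The struct and both function names are hypothetical stand-ins and do not reproduce the patch's pgtls_close; the consistency check is likewise only an example of the kind of assertion one might add.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the TLS-related fields of a connection. */
typedef struct MockTlsConn
{
    void *pr_fd;
    void *nss_context;
    bool  ssl_in_use;
} MockTlsConn;

/* Tear down the TLS state and clear the flag together with the handles. */
void
mock_tls_close(MockTlsConn *conn)
{
    /* A real implementation would release the library handles here. */
    conn->pr_fd = NULL;
    conn->nss_context = NULL;
    conn->ssl_in_use = false;
}

/* Invariant worth asserting: ssl_in_use implies a usable descriptor. */
bool
mock_tls_state_is_consistent(const MockTlsConn *conn)
{
    return !conn->ssl_in_use || conn->pr_fd != NULL;
}
```

Checking such an invariant at the entry of the read and write paths would flag the state seen in the backtrace (ssl_in_use set while pr_fd is null) before any descriptor is dereferenced.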
Attachment
- v49-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
- v49-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v49-0003-nss-Add-NSS-specific-tests.patch
- v49-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v49-0005-nss-pg_strong_random-support.patch
- v49-0006-nss-Documentation.patch
- v49-0007-nss-Support-NSS-in-pgcrypto.patch
- v49-0008-nss-Support-NSS-in-sslinfo.patch
- v49-0009-nss-Support-NSS-in-cryptohash.patch
- v49-0010-nss-Build-infrastructure.patch
On Tue, Nov 23, 2021 at 9:12 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 17 Nov 2021, at 19:42, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > On Tue, Nov 16, 2021 at 1:26 PM Joshua Brindle > > <joshua.brindle@crunchydata.com> wrote: > > >> I think there it a typo in the docs here that prevents them from > >> building (this diff seems to fix it): > > Ah yes, thanks, I had noticed that one but forgot to send out a new version to > make the CFBot green. > > > After a bit more testing, the server is up and running with an nss > > database but before configuring the client database I tried connecting > > and got a segfault: > > Interesting. I'm unable to reproduce this crash, can you show the sequence of > commands which led to this? It no longer happens with v49, since it was a null deref of the pr_fd which no longer happens. I'll continue testing now, so far it's looking better. Did the build issue with --with-llvm get fixed in this update also? I haven't tried building with it yet. > > It looks like the ssl connection falls through to attempt a non-ssl > > connection but at some point conn->ssl_in_use gets set to true, > > despite pr_fd and nss_context being null. > > pgtls_close missed setting ssl_in_use to false, fixed in the attached. I've > also added some assertions to the connection setup for debugging this. > > > This patch fixes the segfault but I suspect is not the correct fix, > > due to the error when connecting saying "Success": > > Right, without an SSL enabled FD we should never get here. > Thank you.
> On 23 Nov 2021, at 23:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > It no longer happens with v49, since it was a null deref of the pr_fd > which no longer happens. > > I'll continue testing now, so far it's looking better. Great, thanks for confirming. I'm still keen on knowing how you triggered the segfault so I can ensure there are no further bugs around there. -- Daniel Gustafsson https://vmware.com/
On Wed, Nov 24, 2021 at 6:59 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 23 Nov 2021, at 23:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > > It no longer happens with v49, since it was a null deref of the pr_fd > > which no longer happens. > > > > I'll continue testing now, so far it's looking better. > > Great, thanks for confirming. I'm still keen on knowing how you triggered the > segfault so I can ensure there are no further bugs around there. > It happened when I ran psql with hostssl on the server but before I'd initialized my client certificate store.
On Wed, Nov 24, 2021 at 8:46 AM Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
>
> On Wed, Nov 24, 2021 at 6:59 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> >
> > > On 23 Nov 2021, at 23:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> >
> > > It no longer happens with v49, since it was a null deref of the pr_fd
> > > which no longer happens.
> > >
> > > I'll continue testing now, so far it's looking better.
> >
> > Great, thanks for confirming. I'm still keen on knowing how you triggered the
> > segfault so I can ensure there are no further bugs around there.
> >
>
> It happened when I ran psql with hostssl on the server but before I'd
> initialized my client certificate store.

I don't know enough about NSS to know if this is problematic or not
but if I try verify-full without having the root CA in the certificate
store I get:

$ /usr/pgsql-15/bin/psql "host=localhost sslmode=verify-full user=postgres"
psql: error: SSL error: Issuer certificate is invalid.
unable to shut down NSS context: NSS could not shutdown. Objects are still in use.
On Wed, Nov 24, 2021 at 8:49 AM Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
>
> On Wed, Nov 24, 2021 at 8:46 AM Joshua Brindle
> <joshua.brindle@crunchydata.com> wrote:
> >
> > On Wed, Nov 24, 2021 at 6:59 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> > >
> > > > On 23 Nov 2021, at 23:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote:
> > >
> > > > It no longer happens with v49, since it was a null deref of the pr_fd
> > > > which no longer happens.
> > > >
> > > > I'll continue testing now, so far it's looking better.
> > >
> > > Great, thanks for confirming. I'm still keen on knowing how you triggered the
> > > segfault so I can ensure there are no further bugs around there.
> > >
> >
> > It happened when I ran psql with hostssl on the server but before I'd
> > initialized my client certificate store.
>
> I don't know enough about NSS to know if this is problematic or not
> but if I try verify-full without having the root CA in the certificate
> store I get:
>
> $ /usr/pgsql-15/bin/psql "host=localhost sslmode=verify-full user=postgres"
> psql: error: SSL error: Issuer certificate is invalid.
> unable to shut down NSS context: NSS could not shutdown. Objects are
> still in use.

Something is strange with ssl downgrading and a bad ssldatabase

[postgres@11cdfa30f763 ~]$ /usr/pgsql-15/bin/psql "ssldatabase=oops sslcert=client_cert host=localhost"
Password for user postgres:

<freezes here>

On the server side:
2021-11-25 01:52:01.984 UTC [269] LOG: unable to handshake: Encountered end of file (PR_END_OF_FILE_ERROR)

Other than that and I still haven't tested --with-llvm I've gotten
everything working, including with an openssl client. Attached is a
dockerfile that gets to the point where a client can connect with
clientcert=verify-full. I've removed some of the old cruft and
debugging from the previous versions.

Thank you.
Attachment
On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
> > Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
> > backend can handle those. That's a separate conversation, but I might
> > take a look at a patch for next commitfest.
>
> Please do.

Didn't get around to it for November, but I'm putting the finishing
touches on that now.

While I was looking at the new SAN code (in fe-secure-nss.c,
pgtls_verify_peer_name_matches_certificate_guts()), I noticed that code
coverage never seemed to touch a good chunk of it:

> +    for (cn = san_list; cn != san_list; cn = CERT_GetNextGeneralName(cn))
> +    {
> +        char *alt_name;
> +        int rv;
> +        char tmp[512];

That loop can never execute. But I wonder if all of that extra SAN code
should be removed anyway? There's this comment above it:

> +    /*
> +     * CERT_VerifyCertName will internally perform RFC 2818 SubjectAltName
> +     * verification.
> +     */

and it seems like SAN verification is working in my testing, despite
the dead loop.

--Jacob
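To make the dead-loop observation concrete: in the quoted for statement the cursor is initialised to the same pointer it is compared against, so the condition is already false before the first pass. A minimal self-contained sketch follows, using a hypothetical MockName list in place of the NSS general-name list. That the real list is circular (last entry linking back to the first) is an assumption here, and the do/while form is shown only as the usual idiom for such lists, not as code from the patch.

```c
#include <stddef.h>

/* Hypothetical circular singly linked list of names. */
typedef struct MockName
{
    const char      *value;
    struct MockName *next;      /* last node points back to the first */
} MockName;

/* Same shape as the quoted loop: the condition fails before the first pass. */
int
count_with_for(const MockName *head)
{
    int             n = 0;
    const MockName *cn;

    for (cn = head; cn != head; cn = cn->next)
        n++;
    return n;
}

/* Usual idiom for a circular list: visit first, then test for wrap-around. */
int
count_with_do_while(const MockName *head)
{
    int             n = 0;
    const MockName *cn = head;

    if (head == NULL)
        return 0;
    do
    {
        n++;
        cn = cn->next;
    } while (cn != head);
    return n;
}
```

With a three-entry list, count_with_for visits nothing while count_with_do_while visits every entry once, which matches the coverage report showing the loop body as never executed.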
> On 25 Nov 2021, at 14:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > On Wed, Nov 24, 2021 at 8:49 AM Joshua Brindle > <joshua.brindle@crunchydata.com> wrote: >> >> On Wed, Nov 24, 2021 at 8:46 AM Joshua Brindle >> <joshua.brindle@crunchydata.com> wrote: >> I don't know enough about NSS to know if this is problematic or not >> but if I try verify-full without having the root CA in the certificate >> store I get: >> >> $ /usr/pgsql-15/bin/psql "host=localhost sslmode=verify-full user=postgres" >> psql: error: SSL error: Issuer certificate is invalid. >> unable to shut down NSS context: NSS could not shutdown. Objects are >> still in use. Fixed. > Something is strange with ssl downgrading and a bad ssldatabase > [postgres@11cdfa30f763 ~]$ /usr/pgsql-15/bin/psql "ssldatabase=oops > sslcert=client_cert host=localhost" > Password for user postgres: > > <freezes here> Also fixed. > On the server side: > 2021-11-25 01:52:01.984 UTC [269] LOG: unable to handshake: > Encountered end of file (PR_END_OF_FILE_ERROR) This is normal and expected, but to make it easier on users I've changed this error message to be aligned with the OpenSSL implementation. > Other than that and I still haven't tested --with-llvm I've gotten > everything working, including with an openssl client. Attached is a > dockerfile that gets to the point where a client can connect with > clientcert=verify-full. I've removed some of the old cruft and > debugging from the previous versions. Very cool, thanks! I've been unable to reproduce any issues with llvm but I'll keep poking at that. A new version will be posted shortly with the above and a few more fixes. -- Daniel Gustafsson https://vmware.com/
> On 30 Nov 2021, at 20:03, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
>>> Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
>>> backend can handle those. That's a separate conversation, but I might
>>> take a look at a patch for next commitfest.
>>
>> Please do.
>
> Didn't get around to it for November, but I'm putting the finishing
> touches on that now.

Cool, thanks!

> While I was looking at the new SAN code (in fe-secure-nss.c,
> pgtls_verify_peer_name_matches_certificate_guts()), I noticed that code
> coverage never seemed to touch a good chunk of it:
>
>> +    for (cn = san_list; cn != san_list; cn = CERT_GetNextGeneralName(cn))
>> +    {
>> +        char *alt_name;
>> +        int rv;
>> +        char tmp[512];
>
> That loop can never execute. But I wonder if all of that extra SAN code
> should be removed anyway? There's this comment above it:
>
>> +    /*
>> +     * CERT_VerifyCertName will internally perform RFC 2818 SubjectAltName
>> +     * verification.
>> +     */
>
> and it seems like SAN verification is working in my testing, despite
> the dead loop.

Yeah, that's clearly bogus. I followed the bouncing ball reading NSS code and
from what I can tell the comment is correct. I removed the dead code, only
realizing after the fact that I might cause conflict with your tree doing so,
in that case sorry.

I've attached a v50 which fixes the issues found by Joshua upthread, as well as
rebases on top of all the recent SSL and pgcrypto changes.

--
Daniel Gustafsson https://vmware.com/
Attachment
- v50-0010-nss-Build-infrastructure.patch
- v50-0009-nss-Support-NSS-in-cryptohash.patch
- v50-0008-nss-Support-NSS-in-sslinfo.patch
- v50-0007-nss-Support-NSS-in-pgcrypto.patch
- v50-0006-nss-Documentation.patch
- v50-0005-nss-pg_strong_random-support.patch
- v50-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v50-0003-nss-Add-NSS-specific-tests.patch
- v50-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v50-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote: > > On 30 Nov 2021, at 20:03, Jacob Champion <pchampion@vmware.com> wrote: > > > > On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote: > > > > Speaking of IP addresses in SANs, it doesn't look like our OpenSSL > > > > backend can handle those. That's a separate conversation, but I might > > > > take a look at a patch for next commitfest. > > > > > > Please do. > > > > Didn't get around to it for November, but I'm putting the finishing > > touches on that now. > > Cool, thanks! Done and registered in Commitfest. > Yeah, that's clearly bogus. I followed the bouncing ball reading NSS code and > from what I can tell the comment is correct. I removed the dead code, only > realizing after the fact that I might cause conflict with your tree doing so, > in that case sorry. No worries, there weren't any issues with the rebase. > I've attached a v50 which fixes the issues found by Joshua upthread, as well as > rebases on top of all the recent SSL and pgcrypto changes. Thanks! --Jacob
On Wed, Dec 15, 2021 at 5:05 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 25 Nov 2021, at 14:39, Joshua Brindle <joshua.brindle@crunchydata.com> wrote: > > On Wed, Nov 24, 2021 at 8:49 AM Joshua Brindle > > <joshua.brindle@crunchydata.com> wrote: > >> > >> On Wed, Nov 24, 2021 at 8:46 AM Joshua Brindle > >> <joshua.brindle@crunchydata.com> wrote: > > >> I don't know enough about NSS to know if this is problematic or not > >> but if I try verify-full without having the root CA in the certificate > >> store I get: > >> > >> $ /usr/pgsql-15/bin/psql "host=localhost sslmode=verify-full user=postgres" > >> psql: error: SSL error: Issuer certificate is invalid. > >> unable to shut down NSS context: NSS could not shutdown. Objects are > >> still in use. > > Fixed. > > > Something is strange with ssl downgrading and a bad ssldatabase > > [postgres@11cdfa30f763 ~]$ /usr/pgsql-15/bin/psql "ssldatabase=oops > > sslcert=client_cert host=localhost" > > Password for user postgres: > > > > <freezes here> > > Also fixed. > > > On the server side: > > 2021-11-25 01:52:01.984 UTC [269] LOG: unable to handshake: > > Encountered end of file (PR_END_OF_FILE_ERROR) > > This is normal and expected, but to make it easier on users I've changed this > error message to be aligned with the OpenSSL implementation. > > > Other than that and I still haven't tested --with-llvm I've gotten > > everything working, including with an openssl client. Attached is a > > dockerfile that gets to the point where a client can connect with > > clientcert=verify-full. I've removed some of the old cruft and > > debugging from the previous versions. > > Very cool, thanks! I've been unable to reproduce any issues with llvm but I'll > keep poking at that. A new version will be posted shortly with the above and a > few more fixes. 
For v50 this change was required for an llvm build to succeed on my Fedora system: diff --git a/configure b/configure index 25388a75a2..62d554806a 100755 --- a/configure +++ b/configure @@ -13276,6 +13276,7 @@ fi LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS" CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS" + CPPFLAGS="$CPPFLAGS $NSS_CFLAGS $NSPR_CFLAGS" $as_echo "#define USE_NSS 1" >>confdefs.h I'm not certain why configure didn't already have that, configure.ac appears to, but nonetheless it builds, all tests succeed, and a quick tire kicking looks good. Thank you.
Hi, On Wed, Dec 15, 2021 at 11:10:14PM +0100, Daniel Gustafsson wrote: > > I've attached a v50 which fixes the issues found by Joshua upthread, as well as > rebases on top of all the recent SSL and pgcrypto changes. The cfbot reports that the patchset doesn't apply anymore: http://cfbot.cputube.org/patch_36_3138.log === Applying patches on top of PostgreSQL commit ID 74527c3e022d3ace648340b79a6ddec3419f6732 === [...] === applying patch ./v50-0010-nss-Build-infrastructure.patch patching file configure patching file configure.ac Hunk #3 succeeded at 1566 (offset 1 line). Hunk #4 succeeded at 2366 (offset 1 line). Hunk #5 succeeded at 2379 (offset 1 line). patching file src/backend/libpq/Makefile patching file src/common/Makefile patching file src/include/pg_config.h.in Hunk #3 succeeded at 926 (offset 3 lines). patching file src/interfaces/libpq/Makefile patching file src/tools/msvc/Install.pm Hunk #1 FAILED at 440. 1 out of 1 hunk FAILED -- saving rejects to file src/tools/msvc/Install.pm.rej Could you send a rebased version, possibly with an updated configure as reported by Joshua? In the meantime I will switch the entry to Waiting on Author.
> On 15 Jan 2022, at 05:42, Julien Rouhaud <rjuju123@gmail.com> wrote: > On Wed, Dec 15, 2021 at 11:10:14PM +0100, Daniel Gustafsson wrote: >> >> I've attached a v50 which fixes the issues found by Joshua upthread, as well as >> rebases on top of all the recent SSL and pgcrypto changes. > > The cfbot reports that the patchset doesn't apply anymore: Fixed, as well as rebased and fixed up on top of the recent cryptohash error reporting functionality to support that on par with the OpenSSL backend. > ..possibly with an updated configure as reported by Joshua? I must've fat-fingered the "git add -p" for v50 as the fix was in configure.ac but not configure. Fixed now. -- Daniel Gustafsson https://vmware.com/
Attachment
- v51-0011-NSS-experimental-support-for-NSS-in-CI.patch
- v51-0010-nss-Build-infrastructure.patch
- v51-0009-nss-Support-NSS-in-cryptohash.patch
- v51-0008-nss-Support-NSS-in-sslinfo.patch
- v51-0007-nss-Support-NSS-in-pgcrypto.patch
- v51-0006-nss-Documentation.patch
- v51-0005-nss-pg_strong_random-support.patch
- v51-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v51-0003-nss-Add-NSS-specific-tests.patch
- v51-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v51-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
Hi, On Mon, Jan 17, 2022 at 03:09:11PM +0100, Daniel Gustafsson wrote: > > I must've fat-fingered the "git add -p" for v50 as the fix was in configure.ac > but not configure. Fixed now. Thanks! Apparently this version now fails on all OS, e.g.: https://cirrus-ci.com/task/4643868095283200 [22:17:39.965] # Failed test 'certificate authorization succeeds with correct client cert in PEM format' [22:17:39.965] # at t/001_ssltests.pl line 456. [22:17:39.965] # got: '2' [22:17:39.965] # expected: '0' [22:17:39.965] [22:17:39.965] # Failed test 'certificate authorization succeeds with correct client cert in PEM format: no stderr' [22:17:39.965] # at t/001_ssltests.pl line 456. [22:17:39.965] # got: 'psql: error: connection to server at "127.0.0.1", port 50023 failed: certificate present,but not private key file "/home/postgres/.postgresql/postgresql.key"' [22:17:39.965] # expected: '' [22:17:39.965] [22:17:39.965] # Failed test 'certificate authorization succeeds with correct client cert in DER format' [22:17:39.965] # at t/001_ssltests.pl line 475. [22:17:39.965] # got: '2' [22:17:39.965] # expected: '0' [...]
> On 18 Jan 2022, at 07:36, Julien Rouhaud <rjuju123@gmail.com> wrote: > On Mon, Jan 17, 2022 at 03:09:11PM +0100, Daniel Gustafsson wrote: >> >> I must've fat-fingered the "git add -p" for v50 as the fix was in configure.ac >> but not configure. Fixed now. > > Thanks! Apparently this version now fails on all OS, e.g.: Fixed, I had made a mistake in the OpenSSL.pm testcode and failed to catch it in testing. -- Daniel Gustafsson https://vmware.com/
Attachment
- v52-0011-NSS-experimental-support-for-NSS-in-CI.patch
- v52-0010-nss-Build-infrastructure.patch
- v52-0009-nss-Support-NSS-in-cryptohash.patch
- v52-0008-nss-Support-NSS-in-sslinfo.patch
- v52-0007-nss-Support-NSS-in-pgcrypto.patch
- v52-0006-nss-Documentation.patch
- v52-0005-nss-pg_strong_random-support.patch
- v52-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v52-0003-nss-Add-NSS-specific-tests.patch
- v52-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v52-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
On Tue, Jan 18, 2022 at 7:43 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 18 Jan 2022, at 07:36, Julien Rouhaud <rjuju123@gmail.com> wrote: > > > On Mon, Jan 17, 2022 at 03:09:11PM +0100, Daniel Gustafsson wrote: > >> > >> I must've fat-fingered the "git add -p" for v50 as the fix was in configure.ac > >> but not configure. Fixed now. > > > > Thanks! Apparently this version now fails on all OS, e.g.: > > Fixed, I had made a mistake in the OpenSSL.pm testcode and failed to catch it > in testing. LGTM +1
On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote: > I've attached a v50 which fixes the issues found by Joshua upthread, as well as > rebases on top of all the recent SSL and pgcrypto changes. I'm currently tracking down a slot leak. When opening and closing large numbers of NSS databases, at some point we appear to run out of slots and then NSS starts misbehaving, even though we've closed all of our context handles. I don't have anything more helpful to share yet, but I wanted to make a note of it here in case anyone else had seen it or has ideas on what may be causing it. My next move will be to update the version of NSS I'm running. --Jacob
> On 18 Jan 2022, at 17:37, Jacob Champion <pchampion@vmware.com> wrote: > > On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote: >> I've attached a v50 which fixes the issues found by Joshua upthread, as well as >> rebases on top of all the recent SSL and pgcrypto changes. > > I'm currently tracking down a slot leak. When opening and closing large > numbers of NSS databases, at some point we appear to run out of slots > and then NSS starts misbehaving, even though we've closed all of our > context handles. Interesting, are you able to share a reproducer for this so I can assist in debugging it? -- Daniel Gustafsson https://vmware.com/
Hi, On 2022-01-18 13:42:54 +0100, Daniel Gustafsson wrote: > Fixed, I had made a mistake in the OpenSSL.pm testcode and failed to catch it > in testing. > +task: > + name: Linux - Debian Bullseye (nss) > [ copy of a bunch of code ] I also needed similar-but-not-quite-equivalent tasks for the meson patch as well. I just moved to splitting the tasks into a template and a use of it. It's probably not quite right as I did there, but it might be worth looking into: https://github.com/anarazel/postgres/blob/meson/.cirrus.yml#L181 But maybe this case actually has a better solution, see two paragraphs down: > + install_script: | > + DEBIAN_FRONTEND=noninteractive apt-get --yes install libnss3 libnss3-dev libnss3-tools libnspr4 libnspr4-dev This needs an apt-get update beforehand to succeed. That's what caused the last few runs to fail, see e.g. https://cirrus-ci.com/task/6293612580306944 Just duplicating the task doesn't really scale once in tree. What about reconfiguring (note: add --enable-depend) the linux tasks to build against nss, and then run the relevant subset of tests with it? Most tests don't use tcp / SSL anyway, so rerunning a small subset of tests should be feasible? > From 297ee9ab31aa579e002edc335cce83dae19711b1 Mon Sep 17 00:00:00 2001 > From: Daniel Gustafsson <daniel@yesql.se> > Date: Mon, 8 Feb 2021 23:52:22 +0100 > Subject: [PATCH v52 01/11] nss: Support libnss as TLS library in libpq > 16 files changed, 3192 insertions(+), 7 deletions(-) Phew. This is a huge patch. Damn, I only opened this thread to report the CI failure. But now I ended up doing a small review... > +#include "common/nss.h" > + > +/* > + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with > + * colliding definitions from ours, causing a much expected compiler error. > + * Remove backwards compatibility with ancient NSPR versions to avoid this. 
> + */ > +#define NO_NSPR_10_SUPPORT > +#include <nspr.h> > +#include <prerror.h> > +#include <prio.h> > +#include <prmem.h> > +#include <prtypes.h> Duplicated with nss.h. Which brings me to: > +#include <nss.h> Is it a great idea to have common/nss.h when there's a library header nss.h? Perhaps we should have a pg_ssl_{nss,openssl}.h or such? > +/* ------------------------------------------------------------ */ > +/* Public interface */ > +/* ------------------------------------------------------------ */ Nitpicks: I don't think we typically do multiple /* */ comments in a row for this type of thing. I also don't particularly like centering things like this, tends to get inconsistent across comments. > +/* > + * be_tls_open_server > + * > + * Since NSPR initialization must happen after forking, most of the actual > + * setup of NSPR/NSS is done here rather than in be_tls_init. The "Since ... must happen after forking" sounds like it's referencing a previously remarked upon fact. But I don't see anything but a copy of this comment. Does this make some things notably more expensive? Presumably it does remove a bunch of COW opportunities, but likely that's not a huge factor compared to asymmetric crypto negotiation... Maybe some of this commentary should migrate to the file header or such? > This introduce > + * differences with the OpenSSL support where some errors are only reported > + * at runtime with NSS where they are reported at startup with OpenSSL. Found this sentence hard to parse somehow. It seems pretty unfriendly to only have minimal error checking at postmaster startup time. Seems at least the presence and usability of keys should be done *also* at that time? > + /* > + * If no ciphers are specified, enable them all. 
> + */ > + if (!SSLCipherSuites || strlen(SSLCipherSuites) == 0) > + { > + status = NSS_SetDomesticPolicy(); > + if (status != SECSuccess) > + { > + ereport(COMMERROR, > + (errmsg("unable to set cipher policy: %s", > + pg_SSLerrmessage(PR_GetError())))); > + return -1; > + } > + } > + else > + { > + char *ciphers, > + *c; > + > + char *sep = ":;, "; > + PRUint16 ciphercode; > + const PRUint16 *nss_ciphers; > + bool found = false; > + > + /* > + * If the user has specified a set of preferred cipher suites we start > + * by turning off all the existing suites to avoid the risk of down- > + * grades to a weaker cipher than expected. > + */ > + nss_ciphers = SSL_GetImplementedCiphers(); > + for (int i = 0; i < SSL_GetNumImplementedCiphers(); i++) > + SSL_CipherPrefSet(model, nss_ciphers[i], PR_FALSE); > + > + ciphers = pstrdup(SSLCipherSuites); > + > + for (c = strtok(ciphers, sep); c; c = strtok(NULL, sep)) > + { > + if (pg_find_cipher(c, &ciphercode)) > + { > + status = SSL_CipherPrefSet(model, ciphercode, PR_TRUE); > + found = true; > + if (status != SECSuccess) > + { > + ereport(COMMERROR, > + (errmsg("invalid cipher-suite specified: %s", c))); > + return -1; It likely doesn't matter much because the backend will exit, but because COMMERROR doesn't throw, it seems like this will leak "ciphers"? > + } > + } > + } > + > + pfree(ciphers); > + > + if (!found) > + { > + ereport(COMMERROR, > + (errmsg("no cipher-suites found"))); > + return -1; > + } > + } Seems like this could reasonably be done in a separate function? > + server_cert = PK11_FindCertFromNickname(ssl_cert_file, (void *) port); > + if (!server_cert) > + { > + if (dummy_ssl_passwd_cb_called) > + { > + ereport(COMMERROR, > + (errmsg("unable to load certificate for \"%s\": %s", > + ssl_cert_file, pg_SSLerrmessage(PR_GetError())), > + errhint("The certificate requires a password."))); > + return -1; > + } I assume PR_GetError() is some thread-local construct, given it's also used in libpq? 
Why, oh why, do people copy the abysmal "global errno" approach everywhere. > +ssize_t > +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor) > +{ I'm not a fan of duplicating the symbol names between be-secure-openssl.c and this. For one it's annoying for source code navigation. It also seems that at some point we might want to be able to link against both at the same time? Maybe we should name them unambiguously and then use some indirection in a header somewhere? > + ssize_t n_read; > + PRErrorCode err; > + > + n_read = PR_Read(port->pr_fd, ptr, len); > + > + if (n_read < 0) > + { > + err = PR_GetError(); > + > + if (err == PR_WOULD_BLOCK_ERROR) > + { > + *waitfor = WL_SOCKET_READABLE; > + errno = EWOULDBLOCK; > + } > + else > + errno = ECONNRESET; > + } > + > + return n_read; > +} > + > +ssize_t > +be_tls_write(Port *port, void *ptr, size_t len, int *waitfor) > +{ > + ssize_t n_write; > + PRErrorCode err; > + PRIntn flags = 0; > + > + /* > + * The flags parameter to PR_Send is no longer used and is, according to > + * the documentation, required to be zero. > + */ > + n_write = PR_Send(port->pr_fd, ptr, len, flags, PR_INTERVAL_NO_WAIT); > + > + if (n_write < 0) > + { > + err = PR_GetError(); > + > + if (err == PR_WOULD_BLOCK_ERROR) > + { > + *waitfor = WL_SOCKET_WRITEABLE; > + errno = EWOULDBLOCK; > + } > + else > + errno = ECONNRESET; > + } > + > + return n_write; > +} > + > +/* > + * be_tls_close > + * > + * Callback for closing down the current connection, if any. > + */ > +void > +be_tls_close(Port *port) > +{ > + if (!port) > + return; > + /* > + * Immediately signal to the rest of the backend that this connnection is > + * no longer to be considered to be using TLS encryption. > + */ > + port->ssl_in_use = false; > + > + if (port->peer_cn) > + { > + SSL_InvalidateSession(port->pr_fd); > + pfree(port->peer_cn); > + port->peer_cn = NULL; > + } > + > + PR_Close(port->pr_fd); > + port->pr_fd = NULL; What if we failed before initializing pr_fd? 
> + /* > + * Since there is no password callback in NSS when the server starts up, > + * it makes little sense to create an interactive callback. Thus, if this > + * is a retry attempt then give up immediately. > + */ > + if (retry) > + return NULL; That's really not great. Can't we do something like initialize NSS in postmaster, load the key into memory, including prompting, and then shut nss down again? > +/* > + * raw_subject_common_name > + * > + * Returns the Subject Common Name for the given certificate as a raw char > + * buffer (that is, without any form of escaping for unprintable characters or > + * embedded nulls), with the length of the buffer returned in the len param. > + * The buffer is allocated in the TopMemoryContext and is given a NULL > + * terminator so that callers are safe to call strlen() on it. > + * > + * This is used instead of CERT_GetCommonName(), which always performs quoting > + * and/or escaping. NSS doesn't appear to give us a way to easily unescape the > + * result, and we need to store the raw CN into port->peer_cn for compatibility > + * with the OpenSSL implementation. > + */ Do we have a testcase for embedded NULLs in common names? > +static char * > +raw_subject_common_name(CERTCertificate *cert, unsigned int *len) > +{ > + CERTName subject = cert->subject; > + CERTRDN **rdn; > + > + for (rdn = subject.rdns; *rdn; rdn++) > + { > + CERTAVA **ava; > + > + for (ava = (*rdn)->avas; *ava; ava++) > + { > + SECItem *buf; > + char *cn; > + > + if (CERT_GetAVATag(*ava) != SEC_OID_AVA_COMMON_NAME) > + continue; > + > + /* Found a CN, decode and copy it into a newly allocated buffer */ > + buf = CERT_DecodeAVAValue(&(*ava)->value); > + if (!buf) > + { > + /* > + * This failure case is difficult to test. (Since this code > + * runs after certificate authentication has otherwise > + * succeeded, you'd need to convince a CA implementation to > + * sign a corrupted certificate in order to get here.) Why is that hard with a toy CA locally? 
Might not be worth the effort, but if the comment explicitly talks about it being hard... > + * Follow the behavior of CERT_GetCommonName() in this case and > + * simply return NULL, as if a Common Name had not been found. > + */ > + goto fail; > + } > + > + cn = MemoryContextAlloc(TopMemoryContext, buf->len + 1); > + memcpy(cn, buf->data, buf->len); > + cn[buf->len] = '\0'; > + > + *len = buf->len; > + > + SECITEM_FreeItem(buf, PR_TRUE); > + return cn; > + } > + } > + > +fail: > + /* Not found */ > + *len = 0; > + return NULL; > +} > > +/* > + * pg_SSLShutdownFunc > + * Callback for NSS shutdown > + * > + * If NSS is terminated from the outside when the connection is still in use What does "NSS is terminated from the outside when the connection" really mean? Does this mean the client initiating something? > + * we must treat this as potentially hostile and immediately close to avoid > + * leaking the connection in any way. Once this is called, NSS will shutdown > + * regardless so we may as well clean up the best we can. Returning SECFailure > + * will cause the NSS shutdown to return with an error, but it will shutdown > + * nevertheless. nss_data is reserved for future use and is always NULL. > + */ > +static SECStatus > +pg_SSLShutdownFunc(void *private_data, void *nss_data) > +{ > + Port *port = (Port *) private_data; > + > + if (!port || !port->ssl_in_use) > + return SECSuccess; How can that happen? > + /* > + * There is a connection still open, close it and signal to whatever that > + * called the shutdown that it was erroneous. > + */ > + be_tls_close(port); > + be_tls_destroy(); And this doesn't have any dangers around those functions getting called again later? > +void > +pgtls_close(PGconn *conn) > +{ > + conn->ssl_in_use = false; > + conn->has_password = false; > + > + /* > + * If the system trust module has been loaded we must try to unload it > + * before closing the context, since it will otherwise fail reporting a > + * SEC_ERROR_BUSY error. 
> + */ > + if (ca_trust != NULL) > + { > + if (SECMOD_UnloadUserModule(ca_trust) != SECSuccess) > + { > + pqInternalNotice(&conn->noticeHooks, > + "unable to unload trust module"); > + } > + else > + { > + SECMOD_DestroyModule(ca_trust); > + ca_trust = NULL; > + } > + } Might just misunderstand: How can it be ok to destroy ca_trust here? What if there's other connections using it? The same thread might be using multiple connections, and multiple threads might be using connections. Seems very much not thread safe. > +PostgresPollingStatusType > +pgtls_open_client(PGconn *conn) > +{ > + SECStatus status; > + PRFileDesc *model; > + NSSInitParameters params; > + SSLVersionRange desired_range; > + > +#ifdef ENABLE_THREAD_SAFETY > +#ifdef WIN32 > + /* This locking is modelled after fe-secure-openssl.c */ > + if (ssl_config_mutex == NULL) > + { > + while (InterlockedExchange(&win32_ssl_create_mutex, 1) == 1) > + /* loop while another thread owns the lock */ ; > + if (ssl_config_mutex == NULL) > + { > + if (pthread_mutex_init(&ssl_config_mutex, NULL)) > + { > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("unable to lock thread")); > + return PGRES_POLLING_FAILED; > + } > + } > + InterlockedExchange(&win32_ssl_create_mutex, 0); > + } > +#endif > + if (pthread_mutex_lock(&ssl_config_mutex)) > + { > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("unable to lock thread")); > + return PGRES_POLLING_FAILED; > + } > +#endif /* ENABLE_THREAD_SAFETY */ I'd very much like to avoid duplicating this code. Can we put it somewhere combined instead? > + /* > + * The NSPR documentation states that runtime initialization via PR_Init > + * is no longer required, as the first caller into NSPR will perform the > + * initialization implicitly. See be-secure-nss.c for further discussion > + * on PR_Init. > + */ > + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0); Why does this, and several subsequent bits, have to happen under a lock? 
> + if (conn->ssl_max_protocol_version && strlen(conn->ssl_max_protocol_version) > 0) > + { > + int ssl_max_ver = ssl_protocol_param_to_nss(conn->ssl_max_protocol_version); > + > + if (ssl_max_ver == -1) > + { > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("invalid value \"%s\" for maximum version of SSL protocol\n"), > + conn->ssl_max_protocol_version); > + return -1; > + } > + > + desired_range.max = ssl_max_ver; > + } > + > + if (SSL_VersionRangeSet(model, &desired_range) != SECSuccess) > + { > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("unable to set allowed SSL protocol version range: %s"), > + pg_SSLerrmessage(PR_GetError())); > + return PGRES_POLLING_FAILED; > + } Why are some parts returning -1 and some PGRES_POLLING_FAILED? -1 certainly isn't a member of PostgresPollingStatusType. > + /* > + * The error cases for PR_Recv are not documented, but can be > + * reverse engineered from _MD_unix_map_default_error() in the > + * NSPR code, defined in pr/src/md/unix/unix_errors.c. > + */ Can we propose a patch to document them? Don't want to get bitten by this suddenly changing... > From a12769bd793a8e073125c3b3a176b355335646bc Mon Sep 17 00:00:00 2001 > From: Daniel Gustafsson <daniel@yesql.se> > Date: Mon, 8 Feb 2021 23:52:45 +0100 > Subject: [PATCH v52 07/11] nss: Support NSS in pgcrypto > > This extends pgcrypto to be able to use libnss as a cryptographic > backend for pgcrypto much like how OpenSSL is a supported backend. > Blowfish is not a supported cipher in NSS, so the implementation > falls back on the built-in BF code to be compatible in terms of > cipher support. I wish we didn't have pgcrypto in its current form. > From 5079ce8a677074b93ef1f118d535c6dee4ce64f9 Mon Sep 17 00:00:00 2001 > From: Daniel Gustafsson <daniel@yesql.se> > Date: Mon, 8 Feb 2021 23:52:55 +0100 > Subject: [PATCH v52 10/11] nss: Build infrastructure > > Finally this adds the infrastructure to build a postgres installation > with libnss support. 
I would suggest trying to come up with a way to reorder / split the series so that smaller pieces are committable. The way you have this right now leaves you with applying all of it at once as the only realistic way. And this patchset is too large for that. Greetings, Andres Freund
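The return-value question raised in the review above (some paths in pgtls_open_client returning -1, others PGRES_POLLING_FAILED) can be illustrated with a small standalone sketch. This is not code from the patch: DemoPollingStatus, demo_protocol_to_version and demo_set_max_protocol are hypothetical names made up for illustration, and the numeric versions are simply the TLS wire version numbers rather than whatever constants the NSS backend uses. The only point is that a function declared to return a status enum reports every failure through a member of that enum.

```c
#include <string.h>

/* Hypothetical stand-in for a polling status enum; not libpq's own type. */
typedef enum
{
    DEMO_POLLING_FAILED = 0,
    DEMO_POLLING_OK
} DemoPollingStatus;

/*
 * Hypothetical helper: map a protocol name to its TLS wire version,
 * returning -1 for names it does not recognize.
 */
static int
demo_protocol_to_version(const char *name)
{
    if (strcmp(name, "TLSv1") == 0)
        return 0x0301;
    if (strcmp(name, "TLSv1.1") == 0)
        return 0x0302;
    if (strcmp(name, "TLSv1.2") == 0)
        return 0x0303;
    if (strcmp(name, "TLSv1.3") == 0)
        return 0x0304;
    return -1;
}

/*
 * Apply an optional maximum protocol setting.  The helper's -1 stays an
 * internal detail: the caller only ever sees a member of the enum, and
 * *max_version is left untouched when the setting is unset or invalid.
 */
static DemoPollingStatus
demo_set_max_protocol(const char *max_protocol, int *max_version)
{
    int         ver;

    /* An unset or empty parameter keeps the current default. */
    if (max_protocol == NULL || max_protocol[0] == '\0')
        return DEMO_POLLING_OK;

    ver = demo_protocol_to_version(max_protocol);
    if (ver == -1)
        return DEMO_POLLING_FAILED;

    *max_version = ver;
    return DEMO_POLLING_OK;
}
```

Calling demo_set_max_protocol("TLSv1.2", &max) stores 0x0303 in max and returns DEMO_POLLING_OK, while an unrecognized name returns DEMO_POLLING_FAILED and leaves max unchanged.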
On Wed, 2022-01-19 at 10:01 +0100, Daniel Gustafsson wrote: > > On 18 Jan 2022, at 17:37, Jacob Champion <pchampion@vmware.com> wrote: > > > > On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote: > > > I've attached a v50 which fixes the issues found by Joshua upthread, as well as > > > rebases on top of all the recent SSL and pgcrypto changes. > > > > I'm currently tracking down a slot leak. When opening and closing large > > numbers of NSS databases, at some point we appear to run out of slots > > and then NSS starts misbehaving, even though we've closed all of our > > context handles. > > Interesting, are you able to share a reproducer for this so I can assist in > debugging it? (This was in my spam folder, sorry for the delay...) Let me see if I can minimize my current reproduction case and get it ported out of Python. --Jacob
On Tue, 2022-01-25 at 22:26 +0000, Jacob Champion wrote: > On Wed, 2022-01-19 at 10:01 +0100, Daniel Gustafsson wrote: > > > On 18 Jan 2022, at 17:37, Jacob Champion <pchampion@vmware.com> wrote: > > > > > > On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote: > > > > I've attached a v50 which fixes the issues found by Joshua upthread, as well as > > > > rebases on top of all the recent SSL and pgcrypto changes. > > > > > > I'm currently tracking down a slot leak. When opening and closing large > > > numbers of NSS databases, at some point we appear to run out of slots > > > and then NSS starts misbehaving, even though we've closed all of our > > > context handles. > > > > Interesting, are you able to share a reproducer for this so I can assist in > > debugging it? > > (This was in my spam folder, sorry for the delay...) Let me see if I > can minimize my current reproduction case and get it ported out of > Python. Here's my attempt at a Bash port. It has races but reliably reproduces on my machine after 98 connections (there's a hardcoded slot limit of 100, so that makes sense when factoring in the internal NSS slots). --Jacob
Attachment
> On 23 Jan 2022, at 22:20, Andres Freund <andres@anarazel.de> wrote: > On 2022-01-18 13:42:54 +0100, Daniel Gustafsson wrote: Thanks heaps for the review, much appreciated! >> + install_script: | >> + DEBIAN_FRONTEND=noninteractive apt-get --yes install libnss3 libnss3-dev libnss3-tools libnspr4 libnspr4-dev > > This needs an apt-get update beforehand to succeed. That's what caused the last few runs > to fail, see e.g. > https://cirrus-ci.com/task/6293612580306944 Ah, good point. Adding that made it indeed work. > Just duplicating the task doesn't really scale once in tree. Totally agree. This was mostly a hack to see if I could make the CFBot build a tailored build, then life threw school closures etc at me and I sort of forgot about removing it again. > What about > reconfiguring (note: add --enable-depend) the linux tasks to build against > nss, and then run the relevant subset of tests with it? Most tests don't use > tcp / SSL anyway, so rerunning a small subset of tests should be feasible? That's an interesting idea, I think that could work and be reasonably readable at the same time (and won't require in-depth knowledge of Cirrus). As it's the same task it does spend more time towards the max runtime per task, but that's not a problem for now. It's worth keeping in mind though if we deem this to be a way forward with testing multiple settings. >> From 297ee9ab31aa579e002edc335cce83dae19711b1 Mon Sep 17 00:00:00 2001 >> From: Daniel Gustafsson <daniel@yesql.se> >> Date: Mon, 8 Feb 2021 23:52:22 +0100 >> Subject: [PATCH v52 01/11] nss: Support libnss as TLS library in libpq > >> 16 files changed, 3192 insertions(+), 7 deletions(-) > > Phew. This is a huge patch. Yeah =/ .. without going beyond and inventing new things on top of what is needed to replace OpenSSL, a lot of code (and tests) has to be written. If nothing else, this work at least highlights just how much we've come to use OpenSSL. > Damn, I only opened this thread to report the CI failure. 
But now I ended up > doing a small review... Thanks! Next time we meet, I owe you a beverage of choice. >> +#include "common/nss.h" >> + >> +/* >> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with >> + * colliding definitions from ours, causing a much expected compiler error. >> + * Remove backwards compatibility with ancient NSPR versions to avoid this. >> + */ >> +#define NO_NSPR_10_SUPPORT >> +#include <nspr.h> >> +#include <prerror.h> >> +#include <prio.h> >> +#include <prmem.h> >> +#include <prtypes.h> > > Duplicated with nss.h. Which brings me to: Fixed, there and elsewhere. >> +#include <nss.h> > > Is it a great idea to have common/nss.h when there's a library header nss.h? > Perhaps we should have a pg_ssl_{nss,openssl}.h or such? That's a good point, I modelled it after common/openssl.h but I agree it's better to differentiate the filenames. I've renamed it to common/pg_nss.h and we should IMO rename common/openssl.h regardless of what happens to this patch. >> +/* ------------------------------------------------------------ */ >> +/* Public interface */ >> +/* ------------------------------------------------------------ */ > > Nitpicks: > I don't think we typically do multiple /* */ comments in a row for this type > of thing. I also don't particularly like centering things like this, tends to > get inconsistent across comments. This is just a copy/paste from be-secure-openssl.c, but I'm far from married to it so happy to remove. Fixed. >> +/* >> + * be_tls_open_server >> + * >> + * Since NSPR initialization must happen after forking, most of the actual >> + * setup of NSPR/NSS is done here rather than in be_tls_init. > > The "Since ... must happen after forking" sounds like it's referencing a > previously remarked upon fact. But I don't see anything but a copy of this > comment. NSS contexts aren't fork safe, IIRC it's around its use of file descriptors. 
Fairly old NSS documentation and mailing list posts cite hardware tokens (which was a very strong focus in the earlier days of NSS) not being safe to use across forks and thus none of NSS was ever intended to be initialized until after the fork. I've reworded this comment a bit to make that clearer. > Does this make some things notably more expensive? Presumably it does remove a > bunch of COW opportunities, but likely that's not a huge factor compared to > asymmetric crypto negotiation... Right, in the context of setting up crypto across a network connection it's highly likely to drown out the costs. > Maybe some of this commentary should migrate to the file header or such? Maybe, or perhaps README.ssl? Not sure where it would be most reasonable to keep it such that it's also kept up to date. >> This introduce >> + * differences with the OpenSSL support where some errors are only reported >> + * at runtime with NSS where they are reported at startup with OpenSSL. > > Found this sentence hard to parse somehow. > > It seems pretty unfriendly to only have minimal error checking at postmaster > startup time. Seems at least the presence and usability of keys should be done > *also* at that time? I'll look at adding some setup, and subsequent teardown, of NSS at startup during which we could do checking to be more on par with how the OpenSSL backend will report errors. >> + /* >> + * If no ciphers are specified, enable them all. >> + */ >> + if (!SSLCipherSuites || strlen(SSLCipherSuites) == 0) >> + { >> + ... >> + if (status != SECSuccess) >> + { >> + ereport(COMMERROR, >> + (errmsg("invalid cipher-suite specified: %s", c))); >> + return -1; > > It likely doesn't matter much because the backend will exit, but because > COMMERROR doesn't throw, it seems like this will leak "ciphers"? Agreed, it won't matter much in practice but we should clearly pfree it, fixed. 
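As a standalone illustration of the leak discussed above (an early return from inside the strtok() loop that skips the pfree of the copied cipher string), here is a minimal sketch of the usual single-exit cleanup pattern. It is not the fix that went into the patch. It uses plain malloc/free where the backend uses pstrdup/pfree, and demo_find_cipher, demo_enable_cipher and demo_configure_ciphers are hypothetical stand-ins for pg_find_cipher, SSL_CipherPrefSet and the surrounding code. The suite that gets "rejected" is chosen arbitrarily so the failure path can be exercised.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pg_find_cipher(): knows three TLS 1.3 suites. */
static bool
demo_find_cipher(const char *name, unsigned short *code)
{
    static const struct
    {
        const char *name;
        unsigned short code;
    }           known[] = {
        {"TLS_AES_128_GCM_SHA256", 0x1301},
        {"TLS_AES_256_GCM_SHA384", 0x1302},
        {"TLS_CHACHA20_POLY1305_SHA256", 0x1303},
    };

    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
    {
        if (strcmp(name, known[i].name) == 0)
        {
            *code = known[i].code;
            return true;
        }
    }
    return false;
}

/*
 * Hypothetical stand-in for enabling a suite; it arbitrarily rejects
 * 0x1303 so that the error path below is reachable.
 */
static bool
demo_enable_cipher(unsigned short code)
{
    return code != 0x1303;
}

/*
 * Returns the number of suites enabled, or -1 if none were found, a suite
 * was rejected, or memory ran out.  Every path after the allocation goes
 * through the cleanup label, so the copy is always freed.
 */
static int
demo_configure_ciphers(const char *suites)
{
    const char *sep = ":;, ";
    size_t      size = strlen(suites) + 1;
    char       *copy = malloc(size);
    int         enabled = 0;
    int         result = -1;

    if (copy == NULL)
        return -1;
    memcpy(copy, suites, size);     /* strtok() modifies its input */

    for (char *c = strtok(copy, sep); c != NULL; c = strtok(NULL, sep))
    {
        unsigned short code;

        if (!demo_find_cipher(c, &code))
            continue;               /* unknown names are skipped */
        if (!demo_enable_cipher(code))
            goto cleanup;           /* error path still frees the copy */
        enabled++;
    }

    if (enabled > 0)
        result = enabled;

cleanup:
    free(copy);
    return result;
}
```

With this shape, adding another failure condition inside the loop cannot reintroduce the leak as long as it jumps to the cleanup label instead of returning directly.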
>> + pfree(ciphers); >> + >> + if (!found) >> + { >> + ereport(COMMERROR, >> + (errmsg("no cipher-suites found"))); >> + return -1; >> + } >> + } > > Seems like this could reasonably be done in a separate function? Agreed, trimming the length of an already very long function is a good idea. Fixed. > I assume PR_GetError() is some thread-local construct, given it's also used in > libpq? Correct. > Why, oh why, do people copy the abysmal "global errno" approach everywhere. Even better, NSPR has two of them: PR_GetError and PR_GetOSError (the latter isn't used in this implementation, but it could potentially be added to error paths on NSS_InitContext and other calls that read off the filesystem). >> +ssize_t >> +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor) >> +{ > > I'm not a fan of duplicating the symbol names between be-secure-openssl.c and > this. For one it's annoying for source code navigation. It also seems that at > some point we might want to be able to link against both at the same time? > Maybe we should name them unambiguously and then use some indirection in a > header somewhere? We could do that, and that's something that we could do independently of this patch to keep the scope down. Doing it in master now with just the OpenSSL implementation as a consumer would be a logical next step in the TLS library abstraction we've done. >> + PR_Close(port->pr_fd); >> + port->pr_fd = NULL; > > What if we failed before initializing pr_fd? Fixed. >> + /* >> + * Since there is no password callback in NSS when the server starts up, >> + * it makes little sense to create an interactive callback. Thus, if this >> + * is a retry attempt then give up immediately. >> + */ >> + if (retry) >> + return NULL; > > That's really not great. Can't we do something like initialize NSS in > postmaster, load the key into memory, including prompting, and then shut nss > down again? I can look at doing something along those lines. 
It does require setting up a fair bit of infrastructure but if the code is refactored to allow reuse it can probably be done in a fairly readable way. >> +/* >> + * raw_subject_common_name >> + * >> + * Returns the Subject Common Name for the given certificate as a raw char >> + * buffer (that is, without any form of escaping for unprintable characters or >> + * embedded nulls), with the length of the buffer returned in the len param. >> + * The buffer is allocated in the TopMemoryContext and is given a NULL >> + * terminator so that callers are safe to call strlen() on it. >> + * >> + * This is used instead of CERT_GetCommonName(), which always performs quoting >> + * and/or escaping. NSS doesn't appear to give us a way to easily unescape the >> + * result, and we need to store the raw CN into port->peer_cn for compatibility >> + * with the OpenSSL implementation. >> + */ > > Do we have a testcase for embedded NULLs in common names? We don't, neither for OpenSSL nor NSS. AFAICR Jacob spent days trying to get certificate generation to include an embedded NULL byte but in the end gave up. We would have to write our own tools for generating certificates to add that (which may or may not be a bad idea, but it hasn't been done). >> + /* Found a CN, decode and copy it into a newly allocated buffer */ >> + buf = CERT_DecodeAVAValue(&(*ava)->value); >> + if (!buf) >> + { >> + /* >> + * This failure case is difficult to test. (Since this code >> + * runs after certificate authentication has otherwise >> + * succeeded, you'd need to convince a CA implementation to >> + * sign a corrupted certificate in order to get here.) > > Why is that hard with a toy CA locally? Might not be worth the effort, but if > the comment explicitly talks about it being hard... The gist of this comment is that it's hard to do with a stock local CA. I've added a small blurb to clarify that a custom implementation would be required.
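To illustrate why the quoted raw_subject_common_name comment insists on returning an explicit length alongside the buffer: a NUL byte embedded in the CN makes strlen() stop early, so comparing the two values exposes the mismatch. The following is only a minimal sketch in plain C with no NSS calls; cn_has_embedded_nul is a hypothetical helper name and is not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: report whether a raw CN buffer
 * of 'len' bytes contains a NUL byte before its end.  The buffer is assumed
 * to carry a terminating NUL at buf[len], as the quoted comment describes.
 */
bool
cn_has_embedded_nul(const char *buf, size_t len)
{
	/* strlen() stops at the first NUL, so a shorter result means one is embedded */
	return strlen(buf) != len;
}
```

A caller holding the raw CN and its length could use a check along these lines to reject a name before comparing it against a hostname or username; whether the patch does so in this form is not something the thread states.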
>> +/* >> + * pg_SSLShutdownFunc >> + * Callback for NSS shutdown >> + * >> + * If NSS is terminated from the outside when the connection is still in use > > What does "NSS is terminated from the outside when the connection" really > mean? Does this mean the client initiating something? If an extension, or other server-loaded code, interfered with NSS and managed to close contexts in order to interfere with connections this would ensure us closing it down cleanly. That being said, I was now unable to get my old testcase working so I've for now removed this callback from the patch until I can work out if we can make proper use of it. AFAICS other mature NSS implementations aren't using it (OpenLDAP did in the past but have since removed it, will look at how/why). >> + else >> + { >> + SECMOD_DestroyModule(ca_trust); >> + ca_trust = NULL; >> + } >> + } > > Might just misunderstand: How can it be ok to destroy ca_trust here? What if > there's other connections using it? The same thread might be using multiple > connections, and multiple threads might be using connections. Seems very much > not thread safe. Right, that's a leftover from early hacking that I had missed. Fixed. >> + /* This locking is modelled after fe-secure-openssl.c */ >> + if (ssl_config_mutex == NULL) >> + { >> + ... > > I'd very much like to avoid duplicating this code. Can we put it somewhere > combined instead? I can look at splitting it out to fe-secure-common.c. A first step here to keep the goalposts from moving in this patch would be to look at combining lock init in fe-secure-openssl.c:pgtls_init() and fe-connect.c:default_threadlock, and then just apply the same recipe here once landed. This could be done independent of this patch. >> + /* >> + * The NSPR documentation states that runtime initialization via PR_Init >> + * is no longer required, as the first caller into NSPR will perform the >> + * initialization implicitly. See be-secure-nss.c for further discussion >> + * on PR_Init. 
>> + */ >> + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0); > > Why does this, and several subsequent bits, have to happen under a lock? NSS initialization isn't thread-safe, there is more discussion upthread in and around this email: https://postgr.es/m/c8d4bc0dfd266799ab4213f1673a813786ac0c70.camel@vmware.com > Why are some parts returning -1 and some PGRES_POLLING_FAILED? -1 certainly > isn't a member of PostgresPollingStatusType. That was a thinko, fixed. >> + /* >> + * The error cases for PR_Recv are not documented, but can be >> + * reverse engineered from _MD_unix_map_default_error() in the >> + * NSPR code, defined in pr/src/md/unix/unix_errors.c. >> + */ > > Can we propose a patch to document them? Don't want to get bitten by this > suddenly changing... I can certainly propose something on their mailinglist, but I unfortunately wouldn't get my hopes up too high as NSS and documentation aren't exactly best friends (the in-tree docs doesn't cover the API and Mozilla recently removed most of the online docs in their neverending developer site reorg). >> From a12769bd793a8e073125c3b3a176b355335646bc Mon Sep 17 00:00:00 2001 >> From: Daniel Gustafsson <daniel@yesql.se> >> Date: Mon, 8 Feb 2021 23:52:45 +0100 >> Subject: [PATCH v52 07/11] nss: Support NSS in pgcrypto >> >> This extends pgcrypto to be able to use libnss as a cryptographic >> backend for pgcrypto much like how OpenSSL is a supported backend. >> Blowfish is not a supported cipher in NSS, so the implementation >> falls back on the built-in BF code to be compatible in terms of >> cipher support. > > I wish we didn't have pgcrypto in its current form. Yes. Very much yes. I don't think doing anything about that in the context of this patch is wise, but a discussion on where to take pgcrypto in the future would probably be a good idea. 
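On the point above that NSS initialization isn't thread-safe and therefore has to happen under a lock, the general pattern can be sketched as follows. This is an illustration under stated assumptions only: it uses plain POSIX threads, every name in it is hypothetical, and fake_tls_library_init merely stands in for a library init call; it is not the locking code from fe-secure-openssl.c or from the patch.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not the libpq or NSS code. */
static pthread_mutex_t tls_init_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool tls_initialized = false;
static int	tls_init_calls = 0;

/* Stand-in for a library init routine that is not thread-safe. */
static void
fake_tls_library_init(void)
{
	tls_init_calls++;
}

/*
 * Run the init routine at most once, no matter how many threads call in.
 * Returns how many times the underlying init has actually run.
 */
int
ensure_tls_initialized(void)
{
	int			calls;

	pthread_mutex_lock(&tls_init_mutex);
	if (!tls_initialized)
	{
		fake_tls_library_init();
		tls_initialized = true;
	}
	calls = tls_init_calls;
	pthread_mutex_unlock(&tls_init_mutex);

	return calls;
}
```

Because both the check and the call sit inside the same critical section, two threads opening connections at the same time cannot both enter the init routine, which is the property the lock is there to provide.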
>> From 5079ce8a677074b93ef1f118d535c6dee4ce64f9 Mon Sep 17 00:00:00 2001 >> From: Daniel Gustafsson <daniel@yesql.se> >> Date: Mon, 8 Feb 2021 23:52:55 +0100 >> Subject: [PATCH v52 10/11] nss: Build infrastructure >> >> Finally this adds the infrastructure to build a postgres installation >> with libnss support. > > I would suggest trying to come up with a way to reorder / split the series so > that smaller pieces are committable. The way you have this right now leaves > you with applying all of it at once as the only realistic way. And this > patchset is too large for that. I completely agree, the hard part is identifying smaller sets which also make sense and which don't leave the tree in a bad state should anyone check out that specific point in time. The two commits in the patchset that are "easy" to consider for pushing independently in this regard are IMO:
* 0002 Test refactoring to support multiple TLS libraries.
* 0004 Check for empty stderr during connect_ok
The refactoring in 0002 is hopefully not too controversial, but it clearly needs eyes from someone more familiar with modern and idiomatic Perl. 0004 could IMO be pushed regardless of the fate of this patchset (after being floated in its own thread on -hackers). In order to find a good split I think we need to figure out what to optimize for; do we optimize for ease of reverting should that be needed, or along functionality borders, or something else? I don't have good ideas here, but a single 7596 insertions(+), 421 deletions(-) commit is clearly not a good idea. Stephen had an idea off-list that we could look at splitting this across the server/client boundary, which I think is the only idea I've seen so far which has legs. (The first to go in would come with the common code of course.) Do you have any thoughts after reading through the patch?
The attached v53 incorporates the fixes discussed above, and builds green for both OpenSSL and NSS in Cirrus on my Github repo (thanks again for your work on those files) so it will be interesting to see the CFBot running them. Next would be to figure out how to make the MSVC build it, basing an attempt on Andrew's blogpost. -- Daniel Gustafsson https://vmware.com/
Attachment
- v53-0011-NSS-experimental-support-for-NSS-in-CI.patch
- v53-0010-nss-Build-infrastructure.patch
- v53-0009-nss-Support-NSS-in-cryptohash.patch
- v53-0008-nss-Support-NSS-in-sslinfo.patch
- v53-0007-nss-Support-NSS-in-pgcrypto.patch
- v53-0006-nss-Documentation.patch
- v53-0005-nss-pg_strong_random-support.patch
- v53-0004-test-check-for-empty-stderr-during-connect_ok.patch
- v53-0003-nss-Add-NSS-specific-tests.patch
- v53-0002-Refactor-SSL-testharness-for-multiple-library.patch
- v53-0001-nss-Support-libnss-as-TLS-library-in-libpq.patch
Hi, On 2022-01-26 21:39:16 +0100, Daniel Gustafsson wrote: > > What about > > reconfiguring (note: add --enable-depend) the linux tasks to build against > > nss, and then run the relevant subset of tests with it? Most tests don't use > > tcp / SSL anyway, so rerunning a small subset of tests should be feasible? > > That's an interesting idea, I think that could work and be reasonably readable > at the same time (and won't require in-depth knowledge of Cirrus). As it's the > same task it does spend more time towards the max runtime per task, but that's > not a problem for now. It's worth keeping in mind though if we deem this to be > a way forward with testing multiple settings. I think it's a way for a limited number of settings, that each only require a limited number of tests... Rerunning all tests etc is a different story. > > Is it a great idea to have common/nss.h when there's a library header nss.h? > > Perhaps we should have a pg_ssl_{nss,openssl}.h or such? > > That's a good point, I modelled it after common/openssl.h but I agree it's > better to differentiate the filenames. I've renamed it to common/pg_nss.h and > we should IMO rename common/openssl.h regardless of what happens to this patch. +1 > > Does this make some things notably more expensive? Presumably it does remove a > > bunch of COW opportunities, but likely that's not a huge factor compared to > > asymmetric crypto negotiation... > > Right, in the context of setting up crypto across a network connection, that cost is highly > likely to be drowned out. If you start to need to run a helper to decrypt an encrypted private key, and do all the initialization, I'm not sure that holds true anymore... Have you done any connection speed tests? pgbench -C is helpful for that. > > Maybe some of this commentary should migrate to the file header or such? > > Maybe, or perhaps README.ssl? Not sure where it would be most reasonable to > keep it such that it's also kept up to date. Either would work for me.
> >> This introduce > >> + * differences with the OpenSSL support where some errors are only reported > >> + * at runtime with NSS where they are reported at startup with OpenSSL. > > > > Found this sentence hard to parse somehow. > > > > It seems pretty unfriendly to only have minimal error checking at postmaster > > startup time. Seems at least the presence and usability of keys should be done > > *also* at that time? > > I'll look at adding some setup, and subsequent teardown, of NSS at startup > during which we could do checking to be more on par with how the OpenSSL > backend will report errors. Cool. > >> +/* > >> + * raw_subject_common_name > >> + * > >> + * Returns the Subject Common Name for the given certificate as a raw char > >> + * buffer (that is, without any form of escaping for unprintable characters or > >> + * embedded nulls), with the length of the buffer returned in the len param. > >> + * The buffer is allocated in the TopMemoryContext and is given a NULL > >> + * terminator so that callers are safe to call strlen() on it. > >> + * > >> + * This is used instead of CERT_GetCommonName(), which always performs quoting > >> + * and/or escaping. NSS doesn't appear to give us a way to easily unescape the > >> + * result, and we need to store the raw CN into port->peer_cn for compatibility > >> + * with the OpenSSL implementation. > >> + */ > > > > Do we have a testcase for embedded NULLs in common names? > > We don't, neither for OpenSSL or NSS. AFAICR Jacob spent days trying to get a > certificate generation to include an embedded NULL byte but in the end gave up. > We would have to write our own tools for generating certificates to add that > (which may or may not be a bad idea, but it hasn't been done). Hah, that's interesting. 
> >> +/* > >> + * pg_SSLShutdownFunc > >> + * Callback for NSS shutdown > >> + * > >> + * If NSS is terminated from the outside when the connection is still in use > > > > What does "NSS is terminated from the outside when the connection" really > > mean? Does this mean the client initiating something? > > If an extension, or other server-loaded code, interfered with NSS and managed > to close contexts in order to interfere with connections this would ensure us > closing it down cleanly. > > That being said, I was now unable to get my old testcase working so I've for > now removed this callback from the patch until I can work out if we can make > proper use of it. AFAICS other mature NSS implementations aren't using it > (OpenLDAP did in the past but have since removed it, will look at how/why). I think that'd be elog(FATAL) time if we want to do anything (after changing state so that no data is sent to client). > >> + /* > >> + * The error cases for PR_Recv are not documented, but can be > >> + * reverse engineered from _MD_unix_map_default_error() in the > >> + * NSPR code, defined in pr/src/md/unix/unix_errors.c. > >> + */ > > > > Can we propose a patch to document them? Don't want to get bitten by this > > suddenly changing... > > I can certainly propose something on their mailinglist, but I unfortunately > wouldn't get my hopes up too high as NSS and documentation aren't exactly best > friends (the in-tree docs doesn't cover the API and Mozilla recently removed > most of the online docs in their neverending developer site reorg). Kinda makes me question the wisdom of starting to depend on NSS. When openssl docs are vastly outshining a library's, that library really should start to ask itself some hard questions. > In order to find a good split I think we need to figure what to optimize for; > do we optimize for ease of reverting should that be needed, or along > functionality borders, or something else? 
I don't have good ideas here, but a > single 7596 insertions(+), 421 deletions(-) commit is clearly not a good idea. I think the goal should be the ability to incrementally commit. > Stephen had an idea off-list that we could look at splitting this across the > server/client boundary, which I think is the only idea I've so far which has > legs. (The first to go in would come with the common code of course.) Yea, that's the most obvious one. I suspect client-side has a lower complexity, because it doesn't need to replace quite as many things? > The attached v53 incorporates the fixes discussed above, and builds green for > both OpenSSL and NSS in Cirrus on my Github repo (thanks again for your work on > those files) so it will be interesting to see the CFBot running them. Looks like that worked... Greetings, Andres Freund
On Wed, 2022-01-26 at 15:59 -0800, Andres Freund wrote: > > > Do we have a testcase for embedded NULLs in common names? > > > > We don't, neither for OpenSSL or NSS. AFAICR Jacob spent days trying to get a > > certificate generation to include an embedded NULL byte but in the end gave up. > > We would have to write our own tools for generating certificates to add that > > (which may or may not be a bad idea, but it hasn't been done). > > Hah, that's interesting. Yeah, OpenSSL just refused to do it, with any method I could find at least. My personal test suite is using pyca/cryptography and psycopg2 to cover that case. --Jacob
>>> Can we propose a patch to document them? Don't want to get bitten by this >>> suddenly changing... >> >> I can certainly propose something on their mailinglist, but I unfortunately >> wouldn't get my hopes up too high as NSS and documentation aren't exactly best >> friends (the in-tree docs doesn't cover the API and Mozilla recently removed >> most of the online docs in their neverending developer site reorg). > > Kinda makes me question the wisdom of starting to depend on NSS. When openssl > docs are vastly outshining a library's, that library really should start to > ask itself some hard questions. Sadly, there is that. While this is not a new problem, Mozilla has been making some very weird decisions around NSS governance as of late. Another data point is the below thread from libcurl: https://curl.se/mail/lib-2022-01/0120.html -- Daniel Gustafsson https://vmware.com/
On Fri, Jan 28, 2022 at 9:08 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > Kinda makes me question the wisdom of starting to depend on NSS. When openssl > > docs are vastly outshining a library's, that library really should start to > > ask itself some hard questions. Yeah, OpenSSL is very poor, so being worse is not good. > Sadly, there is that. While this is not a new problem, Mozilla has been making > some very weird decisions around NSS governance as of late. Another data point > is the below thread from libcurl: > > https://curl.se/mail/lib-2022-01/0120.html I would really, really like to have an alternative to OpenSSL for PG. I don't know if this is the right thing, though. If other people are dropping support for it, that's a pretty bad sign IMHO. Later in the thread it says OpenLDAP have dropped support for it already as well. -- Robert Haas EDB: http://www.enterprisedb.com
> On 28 Jan 2022, at 15:30, Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Jan 28, 2022 at 9:08 AM Daniel Gustafsson <daniel@yesql.se> wrote: >>> Kinda makes me question the wisdom of starting to depend on NSS. When openssl >>> docs are vastly outshining a library's, that library really should start to >>> ask itself some hard questions. > > Yeah, OpenSSL is very poor, so being worse is not good. Some background on this for anyone interested: Mozilla removed the documentation from the MDN website and the attempt at resurrecting it in the tree (where it should've been all along </rant>) isn't making much progress. Some more can be found in this post on the NSS mailinglist: https://groups.google.com/a/mozilla.org/g/dev-tech-crypto/c/p0MO7030K4A/m/Mx5St_2sAwAJ -- Daniel Gustafsson https://vmware.com/
> On 28 Jan 2022, at 15:30, Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Jan 28, 2022 at 9:08 AM Daniel Gustafsson <daniel@yesql.se> wrote: >>> Kinda makes me question the wisdom of starting to depend on NSS. When openssl >>> docs are vastly outshining a library's, that library really should start to >>> ask itself some hard questions. > > Yeah, OpenSSL is very poor, so being worse is not good. > >> Sadly, there is that. While this is not a new problem, Mozilla has been making >> some very weird decisions around NSS governance as of late. Another data point >> is the below thread from libcurl: >> >> https://curl.se/mail/lib-2022-01/0120.html > > I would really, really like to have an alternative to OpenSSL for PG. > I don't know if this is the right thing, though. If other people are > dropping support for it, that's a pretty bad sign IMHO. Later in the > thread it says OpenLDAP have dropped support for it already as well. I'm counting this and Andres' comment as a -1 on the patchset, and given where we are in the cycle I'll mark it as rejected in the CF app shortly unless anyone objects. -- Daniel Gustafsson https://vmware.com/
Greetings, * Daniel Gustafsson (daniel@yesql.se) wrote: > > On 28 Jan 2022, at 15:30, Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Jan 28, 2022 at 9:08 AM Daniel Gustafsson <daniel@yesql.se> wrote: > >>> Kinda makes me question the wisdom of starting to depend on NSS. When openssl > >>> docs are vastly outshining a library's, that library really should start to > >>> ask itself some hard questions. > > > > Yeah, OpenSSL is very poor, so being worse is not good. > > > >> Sadly, there is that. While this is not a new problem, Mozilla has been making > >> some very weird decisions around NSS governance as of late. Another data point > >> is the below thread from libcurl: > >> > >> https://curl.se/mail/lib-2022-01/0120.html > > > > I would really, really like to have an alternative to OpenSSL for PG. > > I don't know if this is the right thing, though. If other people are > > dropping support for it, that's a pretty bad sign IMHO. Later in the > > thread it says OpenLDAP have dropped support for it already as well. > > I'm counting this and Andres' comment as a -1 on the patchset, and given where > we are in the cycle I'm mark it rejected in the CF app shortly unless anyone > objects. I agree that it's concerning to hear that OpenLDAP dropped support for NSS... though I don't seem to be able to find any information as to why they decided to do so. NSS is clearly still supported and maintained and they do seem to understand that they need to work on the documentation situation and to get that fixed (the current issue seems to be around NSS vs. NSPR and the migration off of MDN to the in-tree documentation as Daniel mentioned, if I followed the discussion correctly in the bug that was filed by the curl folks and was then actively responded to by the NSS/NSPR folks), which seems to be the main issue that's being raised about it by the curl folks and here. 
I'm also very much a fan of having an alternative to OpenSSL and the NSS/NSPR license fits well for us, unlike the alternatives to OpenSSL used by other projects, such as GnuTLS (which is the alternative to OpenSSL that OpenLDAP now has) or other libraries like wolfSSL. Beyond the documentation issue, which I agree is a concern but also seems to be actively realized as an issue by the NSS/NSPR folks, is there some other reason that the curl folks are thinking of dropping support for it? Or does anyone have insight into why OpenLDAP decided to remove support? Thanks, Stephen
Hi, On 2022-01-31 14:24:03 +0100, Daniel Gustafsson wrote: > > On 28 Jan 2022, at 15:30, Robert Haas <robertmhaas@gmail.com> wrote: > > I would really, really like to have an alternative to OpenSSL for PG. > > I don't know if this is the right thing, though. If other people are > > dropping support for it, that's a pretty bad sign IMHO. Later in the > > thread it says OpenLDAP have dropped support for it already as well. > > I'm counting this and Andres' comment as a -1 on the patchset, and given where > we are in the cycle I'm mark it rejected in the CF app shortly unless anyone > objects. I'd make mine more a -0.2 or so. I'm concerned about the lack of non-code documentation and the state of code documentation. I'd like an openssl alternative, although not as much as a few years ago - it seems that the state of openssl has improved compared to most of the other implementations. Greetings, Andres Freund
> On 31 Jan 2022, at 17:24, Stephen Frost <sfrost@snowman.net> wrote: > * Daniel Gustafsson (daniel@yesql.se) wrote: >> I'm counting this and Andres' comment as a -1 on the patchset, and given where >> we are in the cycle I'm mark it rejected in the CF app shortly unless anyone >> objects. > > I agree that it's concerning to hear that OpenLDAP dropped support for > NSS... though I don't seem to be able to find any information as to why > they decided to do so. I was also unable to do that. There is no information that I could see in either the commit message, Bugzilla entry (#9207) or on the mailinglist. Searching the web didn't yield anything either. I've reached out to hopefully get a bit more information. > I'm also very much a fan of having an alternative to OpenSSL and the > NSS/NSPR license fits well for us, unlike the alternatives to OpenSSL > used by other projects, such as GnuTLS (which is the alternative to > OpenSSL that OpenLDAP now has) or other libraries like wolfSSL. Short of platform specific (proprietary) libraries like Schannel and Secure Transport, the alternatives are indeed slim. > Beyond the documentation issue, which I agree is a concern but also > seems to be actively realized as an issue by the NSS/NSPR folks, It is, but it has also been an issue for years to be honest, getting the docs up to scratch will require a very large effort. > is there some other reason that the curl folks are thinking of dropping support > for it? It's also not really used anymore in conjunction with curl, with Red Hat no longer shipping builds against it. -- Daniel Gustafsson https://vmware.com/
> On 31 Jan 2022, at 22:32, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-01-31 14:24:03 +0100, Daniel Gustafsson wrote: >>> On 28 Jan 2022, at 15:30, Robert Haas <robertmhaas@gmail.com> wrote: >>> I would really, really like to have an alternative to OpenSSL for PG. >>> I don't know if this is the right thing, though. If other people are >>> dropping support for it, that's a pretty bad sign IMHO. Later in the >>> thread it says OpenLDAP have dropped support for it already as well. >> >> I'm counting this and Andres' comment as a -1 on the patchset, and given where >> we are in the cycle I'm mark it rejected in the CF app shortly unless anyone >> objects. > > I'd make mine more a -0.2 or so. I'm concerned about the lack of non-code > documentation and the state of code documentation. I'd like an openssl > alternative, although not as much as a few years ago - it seems that the state > of openssl has improved compared to most of the other implementations. IMHO OpenSSL has improved over the OpenSSL of the past - which is great to see - but they have also diverted themselves into writing a full QUIC implementation which *I personally think* is a distraction they don't need. That being said, there aren't too many other options. -- Daniel Gustafsson https://vmware.com/
> On 31 Jan 2022, at 22:48, Daniel Gustafsson <daniel@yesql.se> wrote: >> On 31 Jan 2022, at 17:24, Stephen Frost <sfrost@snowman.net> wrote: >> I agree that it's concerning to hear that OpenLDAP dropped support for >> NSS... though I don't seem to be able to find any information as to why >> they decided to do so. > > I was also unable to do that. There is no information that I could see in > either the commit message, Bugzilla entry (#9207) or on the mailinglist. > Searching the web didn't yield anything either. I've reached out to hopefully > get a bit more information. Support issues and Red Hat dropping OpenLDAP were cited [0] as the main drivers for dropping NSS. -- Daniel Gustafsson https://vmware.com/ [0] https://curl.se/mail/lib-2022-02/0000.html
Greetings, * Daniel Gustafsson (daniel@yesql.se) wrote: > > On 31 Jan 2022, at 22:48, Daniel Gustafsson <daniel@yesql.se> wrote: > >> On 31 Jan 2022, at 17:24, Stephen Frost <sfrost@snowman.net> wrote: > > >> I agree that it's concerning to hear that OpenLDAP dropped support for > >> NSS... though I don't seem to be able to find any information as to why > >> they decided to do so. > > > > I was also unable to do that. There is no information that I could see in > > either the commit message, Bugzilla entry (#9207) or on the mailinglist. > > Searching the web didn't yield anything either. I've reached out to hopefully > > get a bit more information. > > Support issues and Red Hat dropping OpenLDAP were cited [0] as the main drivers > for dropping NSS. That's both very vague and oddly specific, I have to say. Also, not really sure that it's a good reason for other projects to move away, or for the large amount of work put into this effort to be thrown out when it seems to be quite close to finally being done and giving us an alternative, supported and maintained, TLS/SSL library. The concern about the documentation not being easily available is certainly something to consider. I remember in prior reviews not having that much difficulty looking up documentation for functions, and in doing some quick looking around there's certainly some (most?) of the NSS documentation still up; the issue is that the NSPR documentation was taken off of the MDN website and that's referenced from the NSS pages and is obviously something that folks working with NSS need to be able to find the documentation for too. All that said, while having documentation on the web is nice and all, it seems to still be in the source, at least when I grabbed NSPR locally with apt-get source and looked at PR_Recv, I found:

/*
 *************************************************************************
 * FUNCTION: PR_Recv
 * DESCRIPTION:
 *    Receive a specified number of bytes from a connected socket.
 *    The operation will block until some positive number of bytes are
 *    transferred, a time out has occurred, or there is an error.
 *    No more than 'amount' bytes will be transferred.
 * INPUTS:
 *    PRFileDesc *fd
 *      points to a PRFileDesc object representing a socket.
 *    void *buf
 *      pointer to a buffer to hold the data received.
 *    PRInt32 amount
 *      the size of 'buf' (in bytes)
 *    PRIntn flags
 *      must be zero or PR_MSG_PEEK.
 *    PRIntervalTime timeout
 *      Time limit for completion of the receive operation.
 * OUTPUTS:
 *    None
 * RETURN: PRInt32
 *    a positive number indicates the number of bytes actually received.
 *    0 means the network connection is closed.
 *    -1 indicates a failure. The reason for the failure is obtained
 *    by calling PR_GetError().
 **************************************************************************
 */

So, it's not the case that the documentation is completely gone and utterly unavailable to those who are interested in it, it's just in the source rather than being on a nicely formatted webpage. One can find it on the web too, naturally: https://github.com/thespooler/nspr/blob/29ba433ebceda269d2b0885176b7f8cd4c5c2c52/pr/include/prio.h#L1424 (no idea what version that is, just found a random github repo with it, but wouldn't be hard to import the latest version). Considering how much we point people to our source when they're writing extensions and such, this doesn't strike me as quite the dire situation that it first appeared to be based on the initial comments. There is documentation, it's not actually that hard to find if you're working with the library, and the maintainers have stated their intention to work on improving the web-based documentation. Thanks, Stephen
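As an illustration of the return contract spelled out in the PR_Recv header comment quoted above (positive, zero, and -1), the three cases can be told apart as in the sketch below. It deliberately avoids NSPR so that it stands alone: RecvOutcome and classify_recv_result are hypothetical names invented for this example, and a real caller would pass the value returned by PR_Recv and consult PR_GetError() in the failure case.

```c
/* Hypothetical names for illustration; none of these exist in NSPR or libpq. */
typedef enum
{
	RECV_DATA,					/* some bytes were received */
	RECV_CLOSED,				/* the network connection is closed */
	RECV_FAILED					/* failure; a caller would consult PR_GetError() */
} RecvOutcome;

/*
 * Map a PR_Recv-style return value onto the three cases described in the
 * header comment: a positive byte count, zero, and -1.
 */
RecvOutcome
classify_recv_result(int nread)
{
	if (nread > 0)
		return RECV_DATA;
	if (nread == 0)
		return RECV_CLOSED;
	return RECV_FAILED;
}
```

The header comment documents only that -1 signals failure; which specific error codes PR_GetError() can then return is the part discussed earlier in the thread as undocumented.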
Hi, On 2022-02-01 15:12:28 -0500, Stephen Frost wrote: > The concern about the documentation not being easily available is > certainly something to consider. I remember in prior reviews not having > that much difficulty looking up documentation for functions I've definitely several times in the course of this thread asked for documentation about specific bits and there was none. And not just recently. > All that said, while have documentation on the web is nice and all, it > seems to still be in the source, at least when I grabbed NSPR locally > with apt-get source and looked at PR_Recv, I found: What I'm most concerned about is less the way individual functions work, and more a bit higher level things. Like e.g. about not being allowed to fork. Which has significant design implications given postgres' process model... I think some documentation has been re-uploaded in the last few days. I recall the content around https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS being gone too, last time I checked. > So, it's not the case that the documentation is completely gone and > utterly unavailable to those who are interested in it, it's just in the > source rather than being on a nicely formatted webpage. One can find it > on the web too, naturally: > https://github.com/thespooler/nspr/blob/29ba433ebceda269d2b0885176b7f8cd4c5c2c52/pr/include/prio.h#L1424 > (no idea what version that is, just found a random github repo with it, > but wouldn't be hard to import the latest version). It's last been updated 2015... There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I think the upstream source. A project without even a bare-minimal README at the root does have a "internal only" feel to it... Greetings, Andres Freund
On Tue, Feb 1, 2022 at 01:52:09PM -0800, Andres Freund wrote:
> There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
> think the upstream source.
>
> A project without even a bare-minimal README at the root does have an "internal
> only" feel to it...

I agree --- it is a library --- if they don't feel the need to publish
the API, it seems to mean they want to maintain the ability to change it
at any time, and therefore it is inappropriate for other software to
rely on that API.

This is not the same as Postgres extensions needing to read the Postgres
source code --- they are an important but edge use case and we never saw
the need to standardize or publish the internal functions that must be
studied and adjusted possibly for major releases.

This kind of feels like the Chrome JavaScript code that used to be able
to be built separately for PL/v8, but has gotten much harder to do in
the past few years.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.
On 28.01.22 15:30, Robert Haas wrote:
> I would really, really like to have an alternative to OpenSSL for PG.

What are the reasons people want that? With OpenSSL 3, the main reasons
-- license and FIPS support -- have gone away.
> On 3 Feb 2022, at 15:07, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 28.01.22 15:30, Robert Haas wrote:
>> I would really, really like to have an alternative to OpenSSL for PG.
>
> What are the reasons people want that? With OpenSSL 3, the main reasons -- license and FIPS support -- have gone away.

At least it will go away when OpenSSL 3 is FIPS certified, which is yet to
happen (submitted, not processed).

I see quite a few valid reasons to want an alternative, a few off the top of my
head include:

- Using trust stores like Keychain on macOS with Secure Transport. There is
AFAIK something similar on Windows and NSS has its certificate databases.
Especially on client side libpq it would be quite nice to integrate with where
certificates already are rather than rely on files on disks.

- Not having to install OpenSSL, Schannel and Secure Transport would make life
easier for packagers.

- Simply having an alternative. The OpenSSL project's recent venture into
writing transport protocols has made a lot of people worried over their
bandwidth for fixing and supporting core features.

Just my $0.02, everyone's mileage varies on these.

--
Daniel Gustafsson https://vmware.com/
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Feb 1, 2022 at 01:52:09PM -0800, Andres Freund wrote:
> > There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
> > think the upstream source.
> >
> > A project without even a bare-minimal README at the root does have an "internal
> > only" feel to it...
>
> I agree --- it is a library --- if they don't feel the need to publish
> the API, it seems to mean they want to maintain the ability to change it
> at any time, and therefore it is inappropriate for other software to
> rely on that API.

This is really not a reasonable representation of how this library has
been maintained historically nor is there any reason to think that their
policy regarding the API has changed recently. They do have a
documented API and that hasn't changed- it's just that it's not easily
available in web-page form any longer and that's due to something
independent of the library maintenance. They've also done a good job
with maintaining the API as one would expect from a library and so this
really isn't a reason to avoid using it. If there's actual specific
examples of the API not being well maintained and causing issues then
please point to them and we can discuss if that is a reason to consider
not depending on NSS/NSPR.

> This is not the same as Postgres extensions needing to read the Postgres
> source code --- they are an important but edge use case and we never saw
> the need to standardize or publish the internal functions that must be
> studied and adjusted possibly for major releases.

I agree that extensions and public libraries aren't entirely the same
but I don't think it's all that unreasonable for developers that are
using a library to look at the source code for that library when
developing against it, that's certainly something I've done for a
number of different libraries.

> This kind of feels like the Chrome JavaScript code that used to be able
> to be built separately for PL/v8, but has gotten much harder to do in
> the past few years.

This isn't at all like that case, where the maintainers made a very
clear and intentional choice to make it quite difficult for packagers to
pull v8 out to package it. Nothing like that has happened with NSS and
there isn't any reason to think that it will based on what the
maintainers have said and what they've done across the many years that
NSS has been around.

Thanks,

Stephen
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> > On 3 Feb 2022, at 15:07, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> >
> > On 28.01.22 15:30, Robert Haas wrote:
> >> I would really, really like to have an alternative to OpenSSL for PG.
> >
> > What are the reasons people want that? With OpenSSL 3, the main reasons -- license and FIPS support -- have gone away.
>
> At least it will go away when OpenSSL 3 is FIPS certified, which is yet to
> happen (submitted, not processed).
>
> I see quite a few valid reasons to want an alternative, a few off the top of my
> head include:
>
> - Using trust stores like Keychain on macOS with Secure Transport. There is
> AFAIK something similar on Windows and NSS has its certificate databases.
> Especially on client side libpq it would be quite nice to integrate with where
> certificates already are rather than rely on files on disks.
>
> - Not having to install OpenSSL, Schannel and Secure Transport would make life
> easier for packagers.
>
> - Simply having an alternative. The OpenSSL project's recent venture into
> writing transport protocols has made a lot of people worried over their
> bandwidth for fixing and supporting core features.
>
> Just my $0.02, everyone's mileage varies on these.

Yeah, agreed on all of these.

Thanks,

Stephen
On 03.02.22 15:53, Daniel Gustafsson wrote:
> I see quite a few valid reasons to want an alternative, a few off the top of my
> head include:
>
> - Using trust stores like Keychain on macOS with Secure Transport. There is
> AFAIK something similar on Windows and NSS has its certificate databases.
> Especially on client side libpq it would be quite nice to integrate with where
> certificates already are rather than rely on files on disks.
>
> - Not having to install OpenSSL, Schannel and Secure Transport would make life
> easier for packagers.

Those are good reasons for Schannel and Secure Transport, less so for NSS.

> - Simply having an alternative. The OpenSSL project's recent venture into
> writing transport protocols has made a lot of people worried over their
> bandwidth for fixing and supporting core features.

If we want simply an alternative, we had a GnuTLS variant almost done a
few years ago, but in the end people didn't want it enough. It seems to
be similar now.
On Thu, Feb 3, 2022 at 2:16 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> If we want simply an alternative, we had a GnuTLS variant almost done a
> few years ago, but in the end people didn't want it enough. It seems to
> be similar now.

Yeah. I think it's pretty clear that the only real downside of
committing support for GnuTLS or NSS or anything else is that we then
need to maintain that support (or eventually remove it). I don't
really see a problem if Daniel wants to commit this, set up a few
buildfarm animals, and fix stuff when it breaks. If he does that, I
don't see that we're losing anything. But, if he commits it in the
hope that other people are going to step up to do the maintenance
work, maybe that's not going to happen, or at least not without
grumbling. I'm not objecting to this being committed in the sense that
I don't ever want to see it in the tree, but I'm also not volunteering
to maintain it.

As a philosophical matter, I don't think it's great for us - or the
Internet in general - to be too dependent on OpenSSL. Software
monocultures are not great, and OpenSSL has near-constant security
updates and mediocre documentation. Now, maybe anything else we
support will end up having similar issues, or worse. But if we and
other projects are never willing to support anything but OpenSSL, then
there will never be viable alternatives to OpenSSL, because a library
that isn't actually used by the software you care about is of no use.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Feb 3, 2022 at 01:42:53PM -0500, Stephen Frost wrote:
> Greetings,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Tue, Feb 1, 2022 at 01:52:09PM -0800, Andres Freund wrote:
> > > There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
> > > think the upstream source.
> > >
> > > A project without even a bare-minimal README at the root does have an "internal
> > > only" feel to it...
> >
> > I agree --- it is a library --- if they don't feel the need to publish
> > the API, it seems to mean they want to maintain the ability to change it
> > at any time, and therefore it is inappropriate for other software to
> > rely on that API.
>
> This is really not a reasonable representation of how this library has
> been maintained historically nor is there any reason to think that their
> policy regarding the API has changed recently. They do have a
> documented API and that hasn't changed- it's just that it's not easily
> available in web-page form any longer and that's due to something
> independent of the library maintenance. They've also done a good job

So they have always been bad at providing an API, not just now, or is it
that their web content disappeared and they haven't fixed it, for how long?
I guess that is better than the v8 case, but not much. Is posting web
content really that hard for them?

> with maintaining the API as one would expect from a library and so this
> really isn't a reason to avoid using it. If there's actual specific
> examples of the API not being well maintained and causing issues then
> please point to them and we can discuss if that is a reason to consider
> not depending on NSS/NSPR.

I have no specifics.

> > This is not the same as Postgres extensions needing to read the Postgres
> > source code --- they are an important but edge use case and we never saw
> > the need to standardize or publish the internal functions that must be
> > studied and adjusted possibly for major releases.
>
> I agree that extensions and public libraries aren't entirely the same
> but I don't think it's all that unreasonable for developers that are
> using a library to look at the source code for that library when
> developing against it, that's certainly something I've done for a
> number of different libraries.

Wow, you have a much higher tolerance than I do. How do you even know
which functions are the public API if you have to look at the source
code?

> > This kind of feels like the Chrome JavaScript code that used to be able
> > to be built separately for PL/v8, but has gotten much harder to do in
> > the past few years.
>
> This isn't at all like that case, where the maintainers made a very
> clear and intentional choice to make it quite difficult for packagers to
> pull v8 out to package it. Nothing like that has happened with NSS and
> there isn't any reason to think that it will based on what the
> maintainers have said and what they've done across the many years that
> NSS has been around.

As far as I know, the v8 developers didn't say anything, they just
started moving things around to make it easier for them and harder for
packagers --- and they didn't care.

I frankly think we need some public statement from the NSS developers
before moving forward --- there are just too many red flags here, and
once we support it, it will be hard to remove support for it.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.
On Thu, Feb 3, 2022 at 02:33:37PM -0500, Robert Haas wrote:
> As a philosophical matter, I don't think it's great for us - or the
> Internet in general - to be too dependent on OpenSSL. Software
> monocultures are not great, and OpenSSL has near-constant security
> updates and mediocre documentation. Now, maybe anything else we

I don't think it is fair to be criticizing OpenSSL for its mediocre
documentation when the alternative being considered, NSS, has no public
documentation. Can the source-code-defined NSS documentation be
considered better than the mediocre OpenSSL public documentation?

For the record, I do like the idea of adding NSS, but I am concerned
about its long-term maintenance, as you explained.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.
On Fri, Feb 4, 2022 at 1:22 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Feb 3, 2022 at 02:33:37PM -0500, Robert Haas wrote:
> > As a philosophical matter, I don't think it's great for us - or the
> > Internet in general - to be too dependent on OpenSSL. Software
> > monocultures are not great, and OpenSSL has near-constant security
> > updates and mediocre documentation. Now, maybe anything else we
>
> I don't think it is fair to be criticizing OpenSSL for its mediocre
> documentation when the alternative being considered, NSS, has no public
> documentation. Can the source-code-defined NSS documentation be
> considered better than the mediocre OpenSSL public documentation?

I mean, I think it's fair to say that my experiences with trying to
use the OpenSSL documentation have been poor. Admittedly it's been a
few years now so maybe it's gotten better, but my experience was what
it was. In one case, the function I needed wasn't documented at all,
and I had to read the C code, which was weirdly-formatted and had no
comments. That wasn't fun, and knowing that NSS could be an even worse
experience doesn't retroactively turn that into a good one.

> For the record, I do like the idea of adding NSS, but I am concerned
> about its long-term maintenance, as you explained.

It sounds like we come down in about the same place here, in the end.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Feb 4, 2022 at 01:33:00PM -0500, Robert Haas wrote:
> > I don't think it is fair to be criticizing OpenSSL for its mediocre
> > documentation when the alternative being considered, NSS, has no public
> > documentation. Can the source-code-defined NSS documentation be
> > considered better than the mediocre OpenSSL public documentation?
>
> I mean, I think it's fair to say that my experiences with trying to
> use the OpenSSL documentation have been poor. Admittedly it's been a
> few years now so maybe it's gotten better, but my experience was what
> it was. In one case, the function I needed wasn't documented at all,
> and I had to read the C code, which was weirdly-formatted and had no
> comments. That wasn't fun, and knowing that NSS could be an even worse
> experience doesn't retroactively turn that into a good one.

Oh, yeah, the OpenSSL documentation is verifiably mediocre.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.
> On 4 Feb 2022, at 19:22, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Feb 3, 2022 at 02:33:37PM -0500, Robert Haas wrote:
>> As a philosophical matter, I don't think it's great for us - or the
>> Internet in general - to be too dependent on OpenSSL. Software
>> monocultures are not great, and OpenSSL has near-constant security
>> updates and mediocre documentation. Now, maybe anything else we
>
> I don't think it is fair to be criticizing OpenSSL for its mediocre
> documentation when the alternative being considered, NSS, has no public
> documentation. Can the source-code-defined NSS documentation..

Not that it will shift the needle either way, but to give credit where credit
is due: Both NSS and NSPR are documented, and have been since they were
published by Netscape in 1998. The documentation does lack things, and some
parts are quite out of date. That's true and undisputed even by the projects
themselves who state this: "It currently is very deprecated and likely
incorrect or broken in many places".

The recent issue was that Mozilla decided to remove all 3rd party projects (why
they consider their own code 3rd party is a mystery to me) from their MDN site,
and so NSS and NSPR were deleted with no replacement. This was said to be
worked on but didn't happen and no docs were imported into the tree. When
Daniel from curl (the other one, not I) complained, this caused enough momentum
to get this work going and it's now been "done".

NSS: https://firefox-source-docs.mozilla.org/security/nss/
NSPR: https://firefox-source-docs.mozilla.org/nspr/

I am writing "done" above in quotes, since the documentation also needs to be
updated, completed, rewritten, organized etc etc. The above is an import of
what was found, and is in a fairly poor state. Unfortunately, it's still not
in the tree where I personally believe documentation stands the best chance of
being kept up to date. The NSPR documentation is probably the best of the two,
but it's also much less of a moving target.

It is true that the documentation is poor and currently in bad shape with lots
of broken links and heavily disorganized etc. It's also true that I managed to
implement full libpq support without any crystal ball or help from the NSS
folks. The latter doesn't mean we can brush documentation concerns aside, but
let's be fair in our criticism.

> ..be considered better than the mediocre OpenSSL public documentation?

OpenSSL has gotten a lot better in recent years, it's still not great or where
I would like it to be, but a lot better.

--
Daniel Gustafsson https://vmware.com/
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Feb 3, 2022 at 01:42:53PM -0500, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > On Tue, Feb 1, 2022 at 01:52:09PM -0800, Andres Freund wrote:
> > > > There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
> > > > think the upstream source.
> > > >
> > > > A project without even a bare-minimal README at the root does have an "internal
> > > > only" feel to it...
> > >
> > > I agree --- it is a library --- if they don't feel the need to publish
> > > the API, it seems to mean they want to maintain the ability to change it
> > > at any time, and therefore it is inappropriate for other software to
> > > rely on that API.
> >
> > This is really not a reasonable representation of how this library has
> > been maintained historically nor is there any reason to think that their
> > policy regarding the API has changed recently. They do have a
> > documented API and that hasn't changed- it's just that it's not easily
> > available in web-page form any longer and that's due to something
> > independent of the library maintenance. They've also done a good job
>
> So they have always been bad at providing an API, not just now, or is it
> that their web content disappeared and they haven't fixed it, for how long?
> I guess that is better than the v8 case, but not much. Is posting web
> content really that hard for them?

To be clear, *part* of the web-based documentation disappeared and
hasn't been replaced yet. The NSS-specific pieces are actually still
available, it's the NSPR (which is a lower level library used by NSS)
part that was removed from MDN and hasn't been brought back yet, but
which does still exist as comments in the source of the library.

> > with maintaining the API as one would expect from a library and so this
> > really isn't a reason to avoid using it. If there's actual specific
> > examples of the API not being well maintained and causing issues then
> > please point to them and we can discuss if that is a reason to consider
> > not depending on NSS/NSPR.
>
> I have no specifics.

Then I don't understand how the claim you made that "it seems to mean
they want to maintain the ability to change it at any time" has any
merit.

> > > This is not the same as Postgres extensions needing to read the Postgres
> > > source code --- they are an important but edge use case and we never saw
> > > the need to standardize or publish the internal functions that must be
> > > studied and adjusted possibly for major releases.
> >
> > I agree that extensions and public libraries aren't entirely the same
> > but I don't think it's all that unreasonable for developers that are
> > using a library to look at the source code for that library when
> > developing against it, that's certainly something I've done for a
> > number of different libraries.
>
> Wow, you have a much higher tolerance than I do. How do you even know
> which functions are the public API if you have to look at the source
> code?

Because... it's documented? They have public (and private) .h files in
the source tree and the function declarations have large comment blocks
above them which provide a documented API. I'm not talking about having
to decipher from the actual C code what's going on but just reading the
function header comment that provides the documentation of the API for
each of the functions, and there's larger blocks of comments at the top
of those .h files which provide more insight into how the functions in
that particular part of the system work and interact with each other.
Maybe those things would be better as separate README files like what we
do, but maybe not, and I don't see it as a huge failing that they chose
to use a big comment block at the top of their .h files to explain
things rather than separate README files.

Reading comments in code that I'm calling out to, even if it's in
another library (or another part of PG where the README isn't helping me
enough, or due to there not being a README for that particular thing)
almost seems typical, to me anyway. Perhaps the exception being when
there are good man pages.

> I frankly think we need some public statement from the NSS developers
> before moving forward --- there are just too many red flags here, and
> once we support it, it will be hard to remove support for it.

They have made public statements regarding this and it's been linked to
already in this thread:

https://github.com/mdn/content/issues/12471

where they explicitly state that the project is alive and maintained;
further, it now also links to this:

https://bugzilla.mozilla.org/show_bug.cgi?id=1753127

Which certainly seems to have had a fair bit of action taken on it.

Indeed, it looks like they've got a lot of the docs up and online now,
including the documentation for the function that started much of this:

https://firefox-source-docs.mozilla.org/nspr/reference/pr_recv.html#pr-recv

Looks like they're still working out some of the kinks between the NSS
pages and having links from them over to the NSPR pages, but a whole lot
of progress sure looks like it's been made in pretty short order here.

Definitely isn't looking unmaintained to me.

Thanks,

Stephen
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Feb 3, 2022 at 02:33:37PM -0500, Robert Haas wrote:
> > As a philosophical matter, I don't think it's great for us - or the
> > Internet in general - to be too dependent on OpenSSL. Software
> > monocultures are not great, and OpenSSL has near-constant security
> > updates and mediocre documentation. Now, maybe anything else we
>
> I don't think it is fair to be criticizing OpenSSL for its mediocre
> documentation when the alternative being considered, NSS, has no public
> documentation. Can the source-code-defined NSS documentation be
> considered better than the mediocre OpenSSL public documentation?

This simply isn't the case and wasn't even the case at the start of this
thread. The NSPR documentation was only available through the header
files due to it being taken down from MDN. The NSS documentation was
actually still there. Looks like they've now (mostly) fixed the lack of
NSPR documentation, as noted in the recent email that I sent.

> For the record, I do like the idea of adding NSS, but I am concerned
> about its long-term maintenance, as you explained.

They've come out and explicitly said that the project is active and
maintained, and they've been doing regular releases. I don't think
there's really any reason to think that it's not being maintained at
this point.

Thanks,

Stephen
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> I am writing "done" above in quotes, since the documentation also needs to be
> updated, completed, rewritten, organized etc etc. The above is an import of
> what was found, and is in a fairly poor state. Unfortunately, it's still not
> in the tree where I personally believe documentation stands the best chance of
> being kept up to date. The NSPR documentation is probably the best of the two,
> but it's also much less of a moving target.

I wonder about the 'not in tree' bit since it is in the header files,
certainly for NSPR which I've been poking at due to this discussion. I
had hoped that they were generating the documentation on the webpage
from what's in the header files, is that not the case then? Which is
more accurate? If it's a simple matter of spending time going through
what's in the tree and making sure what's online matches that, I suspect
we could find some folks with time to work on helping them there.

If the in-tree stuff isn't accurate then that's a bigger problem, of
course.

> It is true that the documentation is poor and currently in bad shape with lots
> of broken links and heavily disorganized etc. It's also true that I managed to
> implement full libpq support without any crystal ball or help from the NSS
> folks. The latter doesn't mean we can brush documentation concerns aside, but
> let's be fair in our criticism.

Agreed.

Thanks,

Stephen
> On 4 Feb 2022, at 21:03, Stephen Frost <sfrost@snowman.net> wrote:

> I wonder about the 'not in tree' bit since it is in the header files,
> certainly for NSPR which I've been poking at due to this discussion.

What I meant was that the documentation on the website isn't published from
documentation source code (in whichever format) residing in the tree.

That being said, I take that back since I just now in a git pull found that
they had done just that 6 days ago. It's just as messy and incomplete as what
is currently on the web, important APIs like NSS_InitContext are still not
even mentioned more than in a release note, but I think it stands a better
chance of success than before.

> I had hoped that they were generating the documentation on the webpage from
> what's in the header files, is that not the case then?

Not from what I can tell no.

--
Daniel Gustafsson https://vmware.com/