Thread: Serverside SNI support in libpq
SNI was brought up the discussions around the ALPN work, and I have had asks for it off-list, so I decided to dust off an old patch I started around the time we got client-side SNI support but never finished (until now). Since there is discussion and thinking around how we handle SSL right now I wanted to share this early even though it will be parked in the July CF for now. There are a few usecases for serverside SNI, allowing for completely disjoint CAs for different hostnames is one that has come up. Using strict SNI mode (elaborated on below) as a cross-host attack mitigation was mentioned in [0]. The attached patch adds serverside SNI support to libpq, it is still a bit rough around the edges but I'm sharing it early to make sure I'm not designing it in a direction that the community doesn't like. A new config file $datadir/pg_hosts.conf is used for configuring which certicate and key should be used for which hostname. The file is parsed in the same way as pg_ident et.al so it allows for the usual include type statements we support. A new GUC, ssl_snimode, is added which controls how the hostname TLS extension is handled. The possible values are off, default and strict: - off: pg_hosts.conf is not parsed and the hostname TLS extension is not inspected at all. The normal SSL GUCs for certificates and keys are used. - default: pg_hosts.conf is loaded as well as the normal GUCs. If no match for the TLS extension hostname is found in pg_hosts the cert and key from the postgresql.conf GUCs is used as the default (used as a wildcard host). - strict: only pg_hosts.conf is loaded and the TLS extension hostname MUST be passed and MUST have a match in the configuration, else the connection is refused. As of now the patch use default as the initial value for the GUC. The way multiple certificates are handled is that libpq creates one SSL_CTX for each at startup, and switch to the appropriate one when the connection is inspected. Configuration handling is done in secure-common to not tie it to a specific TLS backend (should we ever support more), but the validation of the config values is left for the TLS backend. There are a few known open items with this patch: * There are two OpenSSL callbacks which can be used to inspect the hostname TLS extension: SSL_CTX_set_tlsext_servername_callback and SSL_CTX_set_client_hello_cb. The documentation for the latter says you shouldn't use the former, and the docs for the former says you need it even if you use the latter. For now I'm using SSL_CTX_set_tlsext_servername_callback mainly because the OpenSSL tools themselves use that for SNI. * The documentation is not polished at all and will require a more work to make it passable I think. There are also lot's more testing that can be done, so far it's pretty basic. * I've so far only tested with OpenSSL and haven't yet verified how LibreSSL handles this. -- Daniel Gustafsson [0] https://www.postgresql.org/message-id/e782e9f4-a0cd-49f5-800b-5e32a1b29183%40eisentraut.org
Attachment
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation: not tested This is an interesting feature on PostgreSQL server side where it can swap the certificate settings based on the incoming hostnames in SNI field in client hello message. I think this patch resonate with a patch I shared awhile ago ( https://commitfest.postgresql.org/48/4924/ ) that adds multiple certificate support on the libpq client side while this patch adds multiple certificate support on the server side. My patch allows user to supply multiple certs, keys, sslpasswords in comma separated format and the libpq client will pick one that matches the CA issuer names sent by the server. In relation with your patch, this CA issuer name would match the CA certificate configured in pg_hosts.cfg. I had a look at the patch and here's my comments: + <para> + <productname>PostgreSQL</productname> can be configured for + <acronym>SNI</acronym> using the <filename>pg_hosts.conf</filename> + configuration file. <productname>PostgreSQL</productname> inspects the TLS + hostname extension in the SSL connection handshake, and selects the right + SSL certificate, key and CA certificate to use for the connection. + </para> pg_hosts should also have sslpassword_command just like in the postgresql.conf in case the sslkey for a particular host is encrypted with a different password. + /* + * Install SNI TLS extension callback in case the server is configured to + * validate hostnames. + */ + if (ssl_snimode != SSL_SNIMODE_OFF) + SSL_CTX_set_tlsext_servername_callback(context, sni_servername_cb); If libpq client does not provide SNI, this callback will not be called, so there is not a chance to check for a hostname match from pg_hosts, swap the TLS CONTEXT, or possibly reject the connection even in strict mode. The TLS handshake in such case shall proceed and server will use the certificate specified in postgresql.conf (if these are loaded) to complete the handshake with the client. There is a comment in the patch that reads: > - strict: only pg_hosts.conf is loaded and the TLS extension hostname > MUST be passed and MUST have a match in the configuration, else the > connection is refused. I am not sure if it implies that if ssl_snimode is strict, then the normal ssl_cert, ssl_key and ca_cert…etc settings in postgresql.conf are ignored? thank you Cary Huang ------------- HighGo Software Inc. (Canada) cary.huang@highgo.ca www.highgo.ca
On Fri, May 10, 2024 at 7:23 AM Daniel Gustafsson <daniel@yesql.se> wrote: > The way multiple certificates are handled is that libpq creates one SSL_CTX for > each at startup, and switch to the appropriate one when the connection is > inspected. I fell in a rabbit hole while testing this patch, so this review isn't complete, but I don't want to delay it any more. I see a few possibly-related problems with the handling of SSL_context. The first is that reloading the server configuration doesn't reset the contexts list, so the server starts behaving in really strange ways the longer you test. That's an easy enough fix, but things got weirder when I did. Part of that weirdness is that SSL_context gets set to the last initialized context, so fallback doesn't always behave in a deterministic fashion. But we do have to set it to something, to create the SSL object itself... I tried patching all that, but I continue to see nondeterministic behavior, including the wrong certificate chain occasionally being served, and the servername callback being called twice for each connection (?!). Since I can't reproduce the weirdest bits under a debugger yet, I don't really know what's happening. Maybe my patches are buggy. Or maybe we're running into some chicken-and-egg madness? The order of operations looks like this: 1. Create a list of contexts, selecting one as an arbitrary default 2. Create an SSL object from our default context 3. During the servername_callback, reparent that SSL object (which has an active connection underway) to the actual context we want to use 4. Complete the connection It's step 3 that I'm squinting at. I wondered how, exactly, that worked in practice, and based on this issue the answer might be "not well": https://github.com/openssl/openssl/issues/6109 Matt Caswell appears to be convinced that SSL_set_SSL_CTX() is fundamentally broken. So it might just be FUD, but I'm wondering if we should instead be using the SSL_ flavors of the API to reassign the certificate chain on the SSL pointer directly, inside the callback, instead of trying to set them indirectly via the SSL_CTX_ API. Have you seen any weird behavior like this on your end? I'm starting to doubt my test setup... On the plus side, I now have a handful of debugging patches for a future commitfest. Thanks, --Jacob
On Fri, May 24, 2024 at 12:55 PM Cary Huang <cary.huang@highgo.ca> wrote: > pg_hosts should also have sslpassword_command just like in the postgresql.conf in > case the sslkey for a particular host is encrypted with a different password. Good point. There is also the HBA-related handling of client certificate settings (such as pg_ident)... I really dislike that these things are governed by various different files, but I also feel like I'm opening up a huge can of worms by requesting nestable configurations. > + if (ssl_snimode != SSL_SNIMODE_OFF) > + SSL_CTX_set_tlsext_servername_callback(context, sni_servername_cb); > > If libpq client does not provide SNI, this callback will not be called, so there > is not a chance to check for a hostname match from pg_hosts, swap the TLS CONTEXT, > or possibly reject the connection even in strict mode. I'm mistrustful of my own test setup (see previous email to the thread), but I don't seem to be able to reproduce this. With sslsni=0 set, strict mode correctly shuts down the connection for me. Can you share your setup? (The behavior you describe might be a useful setting in practice, to let DBAs roll out strict protection for new clients gracefully without immediately blocking older ones.) Thanks, --Jacob
On Tue, Dec 3, 2024 at 5:58 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > Have you seen any weird behavior like this on your end? I'm starting > > to doubt my test setup... > > I've not been able to reproduce any behaviour like what you describe. Hm, v2 is different enough that I'm going to need to check my notes and try to reproduce again. At first glance, I am still seeing strange reload behavior (e.g. issuing `pg_ctl reload` a couple of times in a row leads to the server disappearing without any log messages indicating why). > > On the plus side, I now have a handful of > > debugging patches for a future commitfest. > > Do you have them handy for running tests on this version? I'll work on cleaning them up. I'd meant to contribute them individually by now, but I got a bit sidetracked... Thanks! --Jacob
> On 11 Dec 2024, at 01:34, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Dec 04, 2024 at 02:44:18PM +0100, Daniel Gustafsson wrote: >> No worries, I know you have a big path on your plate right now. The attached >> v3 fixes a small buglet in the tests and adds silly reload testing to try and >> stress out any issues. > > Looks like this still fails quite heavily in the CI.. You may want to > look at that. Interestingly enough the CFBot hasn't picked up that there are new version posted and the buildfailure is from the initial patch in the thread, which no longer applies (as the CFBot righly points out). I'll try posting another version later today to see if that gets it unstuck. -- Daniel Gustafsson