Thread: Serverside SNI support in libpq

Serverside SNI support in libpq

From
Daniel Gustafsson
Date:
SNI was brought up the discussions around the ALPN work, and I have had asks
for it off-list, so I decided to dust off an old patch I started around the
time we got client-side SNI support but never finished (until now).  Since
there is discussion and thinking around how we handle SSL right now I wanted to
share this early even though it will be parked in the July CF for now.  There
are a few usecases for serverside SNI, allowing for completely disjoint CAs for
different hostnames is one that has come up.  Using strict SNI mode (elaborated
on below) as a cross-host attack mitigation was mentioned in [0].

The attached patch adds serverside SNI support to libpq, it is still a bit
rough around the edges but I'm sharing it early to make sure I'm not designing
it in a direction that the community doesn't like.  A new config file
$datadir/pg_hosts.conf is used for configuring which certicate and key should
be used for which hostname.  The file is parsed in the same way as pg_ident
et.al so it allows for the usual include type statements we support.  A new
GUC, ssl_snimode, is added which controls how the hostname TLS extension is
handled.  The possible values are off, default and strict:

      - off: pg_hosts.conf is not parsed and the hostname TLS extension is
        not inspected at all. The normal SSL GUCs for certificates and keys
        are used.
      - default: pg_hosts.conf is loaded as well as the normal GUCs. If no
        match for the TLS extension hostname is found in pg_hosts the cert
        and key from the postgresql.conf GUCs is used as the default (used
        as a wildcard host).
      - strict: only pg_hosts.conf is loaded and the TLS extension hostname
        MUST be passed and MUST have a match in the configuration, else the
        connection is refused.

As of now the patch use default as the initial value for the GUC.

The way multiple certificates are handled is that libpq creates one SSL_CTX for
each at startup, and switch to the appropriate one when the connection is
inspected.  Configuration handling is done in secure-common to not tie it to a
specific TLS backend (should we ever support more), but the validation of the
config values is left for the TLS backend.

There are a few known open items with this patch:

* There are two OpenSSL callbacks which can be used to inspect the hostname TLS
extension: SSL_CTX_set_tlsext_servername_callback and
SSL_CTX_set_client_hello_cb.  The documentation for the latter says you
shouldn't use the former, and the docs for the former says you need it even if
you use the latter.  For now I'm using SSL_CTX_set_tlsext_servername_callback
mainly because the OpenSSL tools themselves use that for SNI.

* The documentation is not polished at all and will require a more work to make
it passable I think.  There are also lot's more testing that can be done, so
far it's pretty basic.

* I've so far only tested with OpenSSL and haven't yet verified how LibreSSL
handles this.

--
Daniel Gustafsson

[0] https://www.postgresql.org/message-id/e782e9f4-a0cd-49f5-800b-5e32a1b29183%40eisentraut.org


Attachment

Re: Serverside SNI support in libpq

From
Cary Huang
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

This is an interesting feature on PostgreSQL server side where it can swap the
certificate settings based on the incoming hostnames in SNI field in client
hello message.

I think this patch resonate with a patch I shared awhile ago
( https://commitfest.postgresql.org/48/4924/ ) that adds multiple certificate
support on the libpq client side while this patch adds multiple certificate
support on the server side. My patch allows user to supply multiple certs, keys,
sslpasswords in comma separated format and the libpq client will pick one that
matches the CA issuer names sent by the server. In relation with your patch,
this CA issuer name would match the CA certificate configured in pg_hosts.cfg.

I had a look at the patch and here's my comments:

+   <para>
+    <productname>PostgreSQL</productname> can be configured for
+    <acronym>SNI</acronym> using the <filename>pg_hosts.conf</filename>
+    configuration file. <productname>PostgreSQL</productname> inspects the TLS
+    hostname extension in the SSL connection handshake, and selects the right
+    SSL certificate, key and CA certificate to use for the connection.
+   </para>

pg_hosts should also have sslpassword_command just like in the postgresql.conf in
case the sslkey for a particular host is encrypted with a different password.

+    /*
+     * Install SNI TLS extension callback in case the server is configured to
+     * validate hostnames.
+     */
+    if (ssl_snimode != SSL_SNIMODE_OFF)
+        SSL_CTX_set_tlsext_servername_callback(context, sni_servername_cb);

If libpq client does not provide SNI, this callback will not be called, so there
is not a chance to check for a hostname match from pg_hosts, swap the TLS CONTEXT,
or possibly reject the connection even in strict mode. The TLS handshake in such
case shall proceed and server will use the certificate specified in
postgresql.conf (if these are loaded) to complete the handshake with the client.
There is a comment in the patch that reads:

>  - strict: only pg_hosts.conf is loaded and the TLS extension hostname
>   MUST be passed and MUST have a match in the configuration, else the
>  connection is refused.

I am not sure if it implies that if ssl_snimode is strict, then the normal ssl_cert,
ssl_key and ca_cert…etc settings in postgresql.conf are ignored?

thank you

Cary Huang
-------------
HighGo Software Inc. (Canada)
cary.huang@highgo.ca
www.highgo.ca

Re: Serverside SNI support in libpq

From
Jacob Champion
Date:
On Fri, May 10, 2024 at 7:23 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> The way multiple certificates are handled is that libpq creates one SSL_CTX for
> each at startup, and switch to the appropriate one when the connection is
> inspected.

I fell in a rabbit hole while testing this patch, so this review isn't
complete, but I don't want to delay it any more. I see a few
possibly-related problems with the handling of SSL_context.

The first is that reloading the server configuration doesn't reset the
contexts list, so the server starts behaving in really strange ways
the longer you test. That's an easy enough fix, but things got weirder
when I did. Part of that weirdness is that SSL_context gets set to the
last initialized context, so fallback doesn't always behave in a
deterministic fashion. But we do have to set it to something, to
create the SSL object itself...

I tried patching all that, but I continue to see nondeterministic
behavior, including the wrong certificate chain occasionally being
served, and the servername callback being called twice for each
connection (?!).

Since I can't reproduce the weirdest bits under a debugger yet, I
don't really know what's happening. Maybe my patches are buggy. Or
maybe we're running into some chicken-and-egg madness? The order of
operations looks like this:

1. Create a list of contexts, selecting one as an arbitrary default
2. Create an SSL object from our default context
3. During the servername_callback, reparent that SSL object (which has
an active connection underway) to the actual context we want to use
4. Complete the connection

It's step 3 that I'm squinting at. I wondered how, exactly, that
worked in practice, and based on this issue the answer might be "not
well":

    https://github.com/openssl/openssl/issues/6109

Matt Caswell appears to be convinced that SSL_set_SSL_CTX() is
fundamentally broken. So it might just be FUD, but I'm wondering if we
should instead be using the SSL_ flavors of the API to reassign the
certificate chain on the SSL pointer directly, inside the callback,
instead of trying to set them indirectly via the SSL_CTX_ API.

Have you seen any weird behavior like this on your end? I'm starting
to doubt my test setup... On the plus side, I now have a handful of
debugging patches for a future commitfest.

Thanks,
--Jacob



Re: Serverside SNI support in libpq

From
Jacob Champion
Date:
On Fri, May 24, 2024 at 12:55 PM Cary Huang <cary.huang@highgo.ca> wrote:
> pg_hosts should also have sslpassword_command just like in the postgresql.conf in
> case the sslkey for a particular host is encrypted with a different password.

Good point. There is also the HBA-related handling of client
certificate settings (such as pg_ident)...

I really dislike that these things are governed by various different
files, but I also feel like I'm opening up a huge can of worms by
requesting nestable configurations.

> +       if (ssl_snimode != SSL_SNIMODE_OFF)
> +               SSL_CTX_set_tlsext_servername_callback(context, sni_servername_cb);
>
> If libpq client does not provide SNI, this callback will not be called, so there
> is not a chance to check for a hostname match from pg_hosts, swap the TLS CONTEXT,
> or possibly reject the connection even in strict mode.

I'm mistrustful of my own test setup (see previous email to the
thread), but I don't seem to be able to reproduce this. With sslsni=0
set, strict mode correctly shuts down the connection for me. Can you
share your setup?

(The behavior you describe might be a useful setting in practice, to
let DBAs roll out strict protection for new clients gracefully without
immediately blocking older ones.)

Thanks,
--Jacob



Re: Serverside SNI support in libpq

From
Jacob Champion
Date:
On Tue, Dec 3, 2024 at 5:58 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> > Have you seen any weird behavior like this on your end? I'm starting
> > to doubt my test setup...
>
> I've not been able to reproduce any behaviour like what you describe.

Hm, v2 is different enough that I'm going to need to check my notes
and try to reproduce again. At first glance, I am still seeing strange
reload behavior (e.g. issuing `pg_ctl reload` a couple of times in a
row leads to the server disappearing without any log messages
indicating why).

> > On the plus side, I now have a handful of
> > debugging patches for a future commitfest.
>
> Do you have them handy for running tests on this version?

I'll work on cleaning them up. I'd meant to contribute them
individually by now, but I got a bit sidetracked...

Thanks!
--Jacob



Re: Serverside SNI support in libpq

From
Daniel Gustafsson
Date:
> On 11 Dec 2024, at 01:34, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 04, 2024 at 02:44:18PM +0100, Daniel Gustafsson wrote:
>> No worries, I know you have a big path on your plate right now.  The attached
>> v3 fixes a small buglet in the tests and adds silly reload testing to try and
>> stress out any issues.
>
> Looks like this still fails quite heavily in the CI..  You may want to
> look at that.

Interestingly enough the CFBot hasn't picked up that there are new version
posted and the buildfailure is from the initial patch in the thread, which no
longer applies (as the CFBot righly points out).  I'll try posting another
version later today to see if that gets it unstuck.

--
Daniel Gustafsson