Thread: Serverside SNI support in libpq

Serverside SNI support in libpq

From
Daniel Gustafsson
Date:
SNI was brought up the discussions around the ALPN work, and I have had asks
for it off-list, so I decided to dust off an old patch I started around the
time we got client-side SNI support but never finished (until now).  Since
there is discussion and thinking around how we handle SSL right now I wanted to
share this early even though it will be parked in the July CF for now.  There
are a few usecases for serverside SNI, allowing for completely disjoint CAs for
different hostnames is one that has come up.  Using strict SNI mode (elaborated
on below) as a cross-host attack mitigation was mentioned in [0].

The attached patch adds serverside SNI support to libpq, it is still a bit
rough around the edges but I'm sharing it early to make sure I'm not designing
it in a direction that the community doesn't like.  A new config file
$datadir/pg_hosts.conf is used for configuring which certicate and key should
be used for which hostname.  The file is parsed in the same way as pg_ident
et.al so it allows for the usual include type statements we support.  A new
GUC, ssl_snimode, is added which controls how the hostname TLS extension is
handled.  The possible values are off, default and strict:

      - off: pg_hosts.conf is not parsed and the hostname TLS extension is
        not inspected at all. The normal SSL GUCs for certificates and keys
        are used.
      - default: pg_hosts.conf is loaded as well as the normal GUCs. If no
        match for the TLS extension hostname is found in pg_hosts the cert
        and key from the postgresql.conf GUCs is used as the default (used
        as a wildcard host).
      - strict: only pg_hosts.conf is loaded and the TLS extension hostname
        MUST be passed and MUST have a match in the configuration, else the
        connection is refused.

As of now the patch use default as the initial value for the GUC.

The way multiple certificates are handled is that libpq creates one SSL_CTX for
each at startup, and switch to the appropriate one when the connection is
inspected.  Configuration handling is done in secure-common to not tie it to a
specific TLS backend (should we ever support more), but the validation of the
config values is left for the TLS backend.

There are a few known open items with this patch:

* There are two OpenSSL callbacks which can be used to inspect the hostname TLS
extension: SSL_CTX_set_tlsext_servername_callback and
SSL_CTX_set_client_hello_cb.  The documentation for the latter says you
shouldn't use the former, and the docs for the former says you need it even if
you use the latter.  For now I'm using SSL_CTX_set_tlsext_servername_callback
mainly because the OpenSSL tools themselves use that for SNI.

* The documentation is not polished at all and will require a more work to make
it passable I think.  There are also lot's more testing that can be done, so
far it's pretty basic.

* I've so far only tested with OpenSSL and haven't yet verified how LibreSSL
handles this.

--
Daniel Gustafsson

[0] https://www.postgresql.org/message-id/e782e9f4-a0cd-49f5-800b-5e32a1b29183%40eisentraut.org


Attachment

Re: Serverside SNI support in libpq

From
Cary Huang
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

This is an interesting feature on PostgreSQL server side where it can swap the
certificate settings based on the incoming hostnames in SNI field in client
hello message.

I think this patch resonate with a patch I shared awhile ago
( https://commitfest.postgresql.org/48/4924/ ) that adds multiple certificate
support on the libpq client side while this patch adds multiple certificate
support on the server side. My patch allows user to supply multiple certs, keys,
sslpasswords in comma separated format and the libpq client will pick one that
matches the CA issuer names sent by the server. In relation with your patch,
this CA issuer name would match the CA certificate configured in pg_hosts.cfg.

I had a look at the patch and here's my comments:

+   <para>
+    <productname>PostgreSQL</productname> can be configured for
+    <acronym>SNI</acronym> using the <filename>pg_hosts.conf</filename>
+    configuration file. <productname>PostgreSQL</productname> inspects the TLS
+    hostname extension in the SSL connection handshake, and selects the right
+    SSL certificate, key and CA certificate to use for the connection.
+   </para>

pg_hosts should also have sslpassword_command just like in the postgresql.conf in
case the sslkey for a particular host is encrypted with a different password.

+    /*
+     * Install SNI TLS extension callback in case the server is configured to
+     * validate hostnames.
+     */
+    if (ssl_snimode != SSL_SNIMODE_OFF)
+        SSL_CTX_set_tlsext_servername_callback(context, sni_servername_cb);

If libpq client does not provide SNI, this callback will not be called, so there
is not a chance to check for a hostname match from pg_hosts, swap the TLS CONTEXT,
or possibly reject the connection even in strict mode. The TLS handshake in such
case shall proceed and server will use the certificate specified in
postgresql.conf (if these are loaded) to complete the handshake with the client.
There is a comment in the patch that reads:

>  - strict: only pg_hosts.conf is loaded and the TLS extension hostname
>   MUST be passed and MUST have a match in the configuration, else the
>  connection is refused.

I am not sure if it implies that if ssl_snimode is strict, then the normal ssl_cert,
ssl_key and ca_cert…etc settings in postgresql.conf are ignored?

thank you

Cary Huang
-------------
HighGo Software Inc. (Canada)
cary.huang@highgo.ca
www.highgo.ca

Re: Serverside SNI support in libpq

From
Jacob Champion
Date:
On Fri, May 10, 2024 at 7:23 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> The way multiple certificates are handled is that libpq creates one SSL_CTX for
> each at startup, and switch to the appropriate one when the connection is
> inspected.

I fell in a rabbit hole while testing this patch, so this review isn't
complete, but I don't want to delay it any more. I see a few
possibly-related problems with the handling of SSL_context.

The first is that reloading the server configuration doesn't reset the
contexts list, so the server starts behaving in really strange ways
the longer you test. That's an easy enough fix, but things got weirder
when I did. Part of that weirdness is that SSL_context gets set to the
last initialized context, so fallback doesn't always behave in a
deterministic fashion. But we do have to set it to something, to
create the SSL object itself...

I tried patching all that, but I continue to see nondeterministic
behavior, including the wrong certificate chain occasionally being
served, and the servername callback being called twice for each
connection (?!).

Since I can't reproduce the weirdest bits under a debugger yet, I
don't really know what's happening. Maybe my patches are buggy. Or
maybe we're running into some chicken-and-egg madness? The order of
operations looks like this:

1. Create a list of contexts, selecting one as an arbitrary default
2. Create an SSL object from our default context
3. During the servername_callback, reparent that SSL object (which has
an active connection underway) to the actual context we want to use
4. Complete the connection

It's step 3 that I'm squinting at. I wondered how, exactly, that
worked in practice, and based on this issue the answer might be "not
well":

    https://github.com/openssl/openssl/issues/6109

Matt Caswell appears to be convinced that SSL_set_SSL_CTX() is
fundamentally broken. So it might just be FUD, but I'm wondering if we
should instead be using the SSL_ flavors of the API to reassign the
certificate chain on the SSL pointer directly, inside the callback,
instead of trying to set them indirectly via the SSL_CTX_ API.

Have you seen any weird behavior like this on your end? I'm starting
to doubt my test setup... On the plus side, I now have a handful of
debugging patches for a future commitfest.

Thanks,
--Jacob



Re: Serverside SNI support in libpq

From
Jacob Champion
Date:
On Fri, May 24, 2024 at 12:55 PM Cary Huang <cary.huang@highgo.ca> wrote:
> pg_hosts should also have sslpassword_command just like in the postgresql.conf in
> case the sslkey for a particular host is encrypted with a different password.

Good point. There is also the HBA-related handling of client
certificate settings (such as pg_ident)...

I really dislike that these things are governed by various different
files, but I also feel like I'm opening up a huge can of worms by
requesting nestable configurations.

> +       if (ssl_snimode != SSL_SNIMODE_OFF)
> +               SSL_CTX_set_tlsext_servername_callback(context, sni_servername_cb);
>
> If libpq client does not provide SNI, this callback will not be called, so there
> is not a chance to check for a hostname match from pg_hosts, swap the TLS CONTEXT,
> or possibly reject the connection even in strict mode.

I'm mistrustful of my own test setup (see previous email to the
thread), but I don't seem to be able to reproduce this. With sslsni=0
set, strict mode correctly shuts down the connection for me. Can you
share your setup?

(The behavior you describe might be a useful setting in practice, to
let DBAs roll out strict protection for new clients gracefully without
immediately blocking older ones.)

Thanks,
--Jacob



Re: Serverside SNI support in libpq

From
Jacob Champion
Date:
On Tue, Dec 3, 2024 at 5:58 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> > Have you seen any weird behavior like this on your end? I'm starting
> > to doubt my test setup...
>
> I've not been able to reproduce any behaviour like what you describe.

Hm, v2 is different enough that I'm going to need to check my notes
and try to reproduce again. At first glance, I am still seeing strange
reload behavior (e.g. issuing `pg_ctl reload` a couple of times in a
row leads to the server disappearing without any log messages
indicating why).

> > On the plus side, I now have a handful of
> > debugging patches for a future commitfest.
>
> Do you have them handy for running tests on this version?

I'll work on cleaning them up. I'd meant to contribute them
individually by now, but I got a bit sidetracked...

Thanks!
--Jacob



Re: Serverside SNI support in libpq

From
Daniel Gustafsson
Date:
> On 11 Dec 2024, at 01:34, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 04, 2024 at 02:44:18PM +0100, Daniel Gustafsson wrote:
>> No worries, I know you have a big path on your plate right now.  The attached
>> v3 fixes a small buglet in the tests and adds silly reload testing to try and
>> stress out any issues.
>
> Looks like this still fails quite heavily in the CI..  You may want to
> look at that.

Interestingly enough the CFBot hasn't picked up that there are new version
posted and the buildfailure is from the initial patch in the thread, which no
longer applies (as the CFBot righly points out).  I'll try posting another
version later today to see if that gets it unstuck.

--
Daniel Gustafsson




Re: Serverside SNI support in libpq

From
Daniel Gustafsson
Date:
Attached is a rebase which fixes a few smaller things (and a pgperltidy run);
and adds a paragraph to the docs about how HBA clientname settings can't be
made per certificate set in an SNI config.  As discussed with Jacob offlist,
there might be a case for supporting that but it will be a niche usecase within
a niche feature, so rather than complicating the code for something which might
never be used, it's likely better to document it and await feedback.

Are there any blockers for getting this in?

--
Daniel Gustafsson


Attachment

Re: Serverside SNI support in libpq

From
Jacob Champion
Date:
On Wed, Feb 19, 2025 at 3:13 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> Are there any blockers for getting this in?

> +           SSL_context = ssl_init_context(isServerStart, host);

I'm still not quite following the rationale behind the SSL_context
assignment. To maybe illustrate, attached are some tests that I
expected to pass, but don't.

After adding an additional host and reloading the config, the behavior
of the original fallback host seems to change. Am I misunderstanding
the designed fallback behavior, have I misdesigned my test, or is this
a bug?

Thanks,
--Jacob

Attachment

Re: Serverside SNI support in libpq

From
Daniel Gustafsson
Date:
> On 24 Feb 2025, at 22:51, Jacob Champion <jacob.champion@enterprisedb.com> wrote:
>
> On Wed, Feb 19, 2025 at 3:13 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>> Are there any blockers for getting this in?
>
>> +           SSL_context = ssl_init_context(isServerStart, host);
>
> I'm still not quite following the rationale behind the SSL_context
> assignment. To maybe illustrate, attached are some tests that I
> expected to pass, but don't.
>
> After adding an additional host and reloading the config, the behavior
> of the original fallback host seems to change. Am I misunderstanding
> the designed fallback behavior, have I misdesigned my test, or is this
> a bug?

Thanks for the tests, they did in fact uncover a bug in how fallback was
handled which is now fixed.  In doing so I revamped how the default context
handling is done, it now always use the GUCs in postgresql.conf for
consistency.  The attached v6 rebase contains this as well as your tests as
well as general cleanup and comment writing etc.

--
Daniel Gustafsson


Attachment

Re: Serverside SNI support in libpq

From
Jacob Champion
Date:
On Thu, Feb 27, 2025 at 5:38 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> Thanks for the tests, they did in fact uncover a bug in how fallback was
> handled which is now fixed.  In doing so I revamped how the default context
> handling is done, it now always use the GUCs in postgresql.conf for
> consistency.  The attached v6 rebase contains this as well as your tests as
> well as general cleanup and comment writing etc.

Great, thanks!

Revisiting my concerns from upthread:

On Thu, Jul 25, 2024 at 10:51 AM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> I tried patching all that, but I continue to see nondeterministic
> behavior, including the wrong certificate chain occasionally being
> served, and the servername callback being called twice for each
> connection (?!).

1) The wrong chain being served was due to the fallback bug, now fixed.
2) The servername callback happening twice is due to the TLS 1.3
HelloRetryRequest problem with our ssl_groups (which reminded me to
ping that thread [1]). Switching to TLSv1.2 in order to more easily
see the handshake on the wire makes the problem go away, which
probably did not help my sense of growing insanity last July.

>     https://github.com/openssl/openssl/issues/6109
>
> Matt Caswell appears to be convinced that SSL_set_SSL_CTX() is
> fundamentally broken.

We briefly talked about this in Brussels, and I've been trying to find
proof. Attached are some (very rough) tests that might highlight an
issue.

Basically, the new tests set up three hosts in pg_hosts.conf: one with
no client CA, one with a valid client CA, and one with a
malfunctioning CA (root+server_ca, which can't verify our client
certs). Then it switches out the default CA underneath to make sure it
does not affect the visible behavior, since that CA should not
actually be used in the end.

Unfortunately, the failure modes change depending on the default CA.
If it's not a bug in my tests, I think this may be an indication that
SSL_set_SSL_CTX() isn't fully switching out the client verification
behavior? For example, if the default CA isn't set, the other hosts
don't appear to ask for a client certificate even if they need one.
And vice versa.

--

> +           /*
> +            * Set flag to remember whether CA store has been loaded into this
> +            * SSL_context.
> +            */
> +           if (host->ssl_ca)

I think this should be `if (host->ssl_ca[0])`  -- which, incidentally,
fixes one of the new failing tests on my machine.

>  int
>  be_tls_init(bool isServerStart)
> +{
> +   SSL_CTX    *ctx;
> +   List       *sni_hosts = NIL;
> +   HostsLine   line;

A pointer to `line` is passed down to ssl_init_context(), but it's
only been partially initialized on the stack. Can it be
zero-initialized here instead?

> +       if (ssl_snimode == SSL_SNIMODE_STRICT)
> +       {
> +           ereport(COMMERROR,
> +                   (errcode(ERRCODE_PROTOCOL_VIOLATION),
> +                    errmsg("no hostname provided in callback")));
> +           return SSL_TLSEXT_ERR_ALERT_FATAL;
> +       }

At the moment we're sending an `unrecognized_name` alert in strict
mode if the client doesn't send SNI. RFC 8446 suggests
`missing_extension`:

   Additionally, all implementations MUST support the use of the
   "server_name" extension with applications capable of using it.
   Servers MAY require clients to send a valid "server_name" extension.
   Servers requiring this extension SHOULD respond to a ClientHello
   lacking a "server_name" extension by terminating the connection with
   a "missing_extension" alert.

Should we do that, or should we ignore the suggestion? The problem
with missing_extension, IMO, is that there's absolutely no indication
to the client as to which extension is missing. unrecognized_name is a
little confusing in this case (there was no name sent), but at least
the end user will be able to link that to an SNI problem via search
engine.

> +#hosts_file = 'ConfigDir/pg_hosts.conf' # hosts configuration file
> +                   # (change requires restart)

Nitpickiest nitpick: looks like the other lines use a tab instead of a
space between the setting and the trailing comment.

Thanks,
--Jacob

[1] https://postgr.es/m/CAOYmi%2BnTwu7%3DaUGCkf6L-ULqS8itNP7uc9nUmNLOvbXf2TCgBA%40mail.gmail.com

Attachment