Thread: Experiments with Postgres and SSL

Experiments with Postgres and SSL

From

Greg Stark

Date:

19 January 2023, 03:15:24

I had a conversation a while back with Heikki where he expressed that
it was annoying that we negotiate SSL/TLS the way we do since it
introduces an extra round trip. Aside from the performance
optimization I think accepting standard TLS connections would open the
door to a number of other opportunities that would be worth it on
their own.

So I took a look into what it would take to do and I think it would
actually be quite feasible. The first byte of a standard TLS
connection can't look anything like the first byte of any flavour of
Postgres startup packet because it would be the high order bits of the
length so unless we start having multi-megabyte startup packets....

So I put together a POC patch and it's working quite well and didn't
require very much kludgery. Well, it required some but it's really not
bad. I do have a bug I'm still trying to work out and the code isn't
quite in committable form but I can send the POC patch.

Other things it would open the door to in order from least
controversial to most....

* Hiding Postgres behind a standard SSL proxy terminating SSL without
implementing the Postgres protocol.

* "Service Mesh" type tools that hide multiple services behind a
single host/port ("Service Mesh" is just a new buzzword for "proxy").

* Browser-based protocol implementations using websockets for things
like pgadmin or other tools to connect directly to postgres using
Postgres wire protocol but using native SSL implementations.

* Postgres could even implement an HTTP based version of its protocol
and enable things like queries or browser based tools using straight
up HTTP requests so they don't need to use websockets.

* Postgres could implement other protocols to serve up data like
status queries or monitoring metrics, using HTTP based standard
protocols instead of using our own protocol.

Incidentally I find the logic in ProcessStartupPacket incredibly
confusing. It took me a while before I realized it's using tail
recursion to implement the startup logic. I think it would be way more
straightforward and extensible if it used a much more common iterative
style. I think it would make it possible to keep more state than just
ssl_done and gss_done without changing the function signature every
time for example.

--
greg

Re: Experiments with Postgres and SSL

From

Andrey Borodin

Date:

19 January 2023, 05:45:15

On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote:
>
> So I took a look into what it would take to do and I think it would
> actually be quite feasible. The first byte of a standard TLS
> connection can't look anything like the first byte of any flavour of
> Postgres startup packet because it would be the high order bits of the
> length so unless we start having multi-megabyte startup packets....
>

This is a fascinating idea! I like it a lot.
But..do we have to treat any unknown start sequence of bytes as a TLS
connection? Or is there some definite subset of possible first bytes
that clearly indicates that this is a TLS connection or not?

Best regards, Andrey Borodin.

Re: Experiments with Postgres and SSL

From

Greg Stark

Date:

19 January 2023, 17:07:41

On Thu, 19 Jan 2023 at 00:45, Andrey Borodin <amborodin86@gmail.com> wrote:

> But..do we have to treat any unknown start sequence of bytes as a TLS
> connection? Or is there some definite subset of possible first bytes
> that clearly indicates that this is a TLS connection or not?

Absolutely not, there's only one MessageType that can initiate a
connection, ClientHello, so the initial byte has to be a specific
value. (0x16)

And probably to implement HTTP/Websocket it would probably only peek
at the first byte and check for things like G(ET) and H(EAD) and so
on, possibly only over SSL but in theory it could be over any
connection if the request comes before the startup packet.

Personally I'm motivated by wanting to implement status and monitoring
data for things like Prometheus and the like. For that it would just
be simple GET queries to recognize. But tunneling pg wire protocol
over websockets sounds cool but not really something I know a lot
about. I note that Neon is doing something similar with a proxy:
https://neon.tech/blog/serverless-driver-for-postgres

--
greg

Re: Experiments with Postgres and SSL

From

Vladimir Sitnikov

Date:

19 January 2023, 20:49:30

It would be great if PostgreSQL supported 'start with TLS', however, how could clients activate the feature?

I would like to refrain users from configuring the handshake mode, and I would like to refrain from degrading performance when a new client talks to an old database.

What if the server that supports 'fast TLS' added an extra notification in case client connects with a classic TLS?

Then a capable client could remember host:port and try with newer TLS appoach the next time it connects.

It would be transparent to the clients, and the users won't need to configure 'prefer classic or fast TLS'

The old clients could discard the notification.

Vladimir

Re: Experiments with Postgres and SSL

From

Greg Stark

Date:

19 January 2023, 23:44:22

On Thu, 19 Jan 2023 at 15:49, Vladimir Sitnikov
<sitnikov.vladimir@gmail.com> wrote:
>
> What if the server that supports 'fast TLS' added an extra notification in case client connects with a classic TLS?
> Then a capable client could remember host:port and try with newer TLS appoach the next time it connects.
>
> It would be transparent to the clients, and the users won't need to configure 'prefer classic or fast TLS'
> The old clients could discard the notification.

Hm. I hadn't really thought about the case of a new client connecting
to an old server. I don't think it's worth implementing a code path in
the server like this as it would then become cruft that would be hard
to ever get rid of.

I think you can do the same thing, more or less, in the client. Like
if the driver tries to connect via SSL and gets an error it remembers
that host/port and connects using negotiation in the future.

In practice though, by the time drivers support this it'll probably be
far enough in the future that they can just enable it and you can
disable it if you're connecting to an old server. The main benefit for
the near term is going to be clients that are specifically designed to
take advantage of it because it's necessary to enable the environment
they need -- like monitoring tools and proxies.

I've attached the POC. It's not near committable, mainly because of
the lack of any proper interface to the added fields in Port. I
actually had a whole API but ripped it out while debugging because it
wasn't working out.

But here's an example of psql connecting to the same server via
negotiated SSL or through stunnel where stunnel establishes the SSL
connection and psql is just doing plain text:

stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql
'postgresql://localhost:9432/postgres'
psql (16devel)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384,
compression: off)
Type "help" for help.

postgres=# select * from pg_stat_ssl;
  pid  | ssl | version |         cipher         | bits | client_dn |
client_serial | issuer_dn
-------+-----+---------+------------------------+------+-----------+---------------+-----------
 48771 | t   | TLSv1.3 | TLS_AES_256_GCM_SHA384 |  256 |           |
            |
(1 row)

postgres=# \q
stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql
'postgresql://localhost:8999/postgres'
psql (16devel)
Type "help" for help.

postgres=# select * from pg_stat_ssl;
  pid  | ssl | version |         cipher         | bits | client_dn |
client_serial | issuer_dn
-------+-----+---------+------------------------+------+-----------+---------------+-----------
 48797 | t   | TLSv1.3 | TLS_AES_256_GCM_SHA384 |  256 |           |
            |
(1 row)

-- 
greg

Attachment

v1-0001-Direct-SSL-Connections.patch

Re: Experiments with Postgres and SSL

From

Jacob Champion

Date:

20 January 2023, 01:28:18

On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote:
> I had a conversation a while back with Heikki where he expressed that
> it was annoying that we negotiate SSL/TLS the way we do since it
> introduces an extra round trip. Aside from the performance
> optimization I think accepting standard TLS connections would open the
> door to a number of other opportunities that would be worth it on
> their own.

Nice! I want this too, but for security reasons [1] -- I want to be
able to turn off negotiated (explicit) TLS, to force (implicit)
TLS-only mode.

> Other things it would open the door to in order from least
> controversial to most....
>
> * Hiding Postgres behind a standard SSL proxy terminating SSL without
> implementing the Postgres protocol.

+1

> * "Service Mesh" type tools that hide multiple services behind a
> single host/port ("Service Mesh" is just a new buzzword for "proxy").

If you want to multiplex protocols on a port, now is an excellent time
to require clients to use ALPN on implicit-TLS connections. (There are
no clients that can currently connect via implicit TLS, so you'll
never have another chance to force the issue without breaking
backwards compatibility.) That should hopefully make it harder to
ALPACA yourself or others [2].

ALPN doesn't prevent cross-port attacks though, and speaking of those...

> * Browser-based protocol implementations using websockets for things
> like pgadmin or other tools to connect directly to postgres using
> Postgres wire protocol but using native SSL implementations.
>
> * Postgres could even implement an HTTP based version of its protocol
> and enable things like queries or browser based tools using straight
> up HTTP requests so they don't need to use websockets.
>
> * Postgres could implement other protocols to serve up data like
> status queries or monitoring metrics, using HTTP based standard
> protocols instead of using our own protocol.

I see big red warning lights going off in my head -- in a previous
life, I got to fix vulnerabilities that resulted from bolting HTTP
onto existing protocol servers. Not only do you opt into the browser
security model forever, you also gain the ability to speak for any
other web server already running on the same host.

(I know you have PG committers who are also HTTP experts, and I think
you were hacking on mod_perl well before I knew web servers existed.
Just... please be careful. ;D )

> Incidentally I find the logic in ProcessStartupPacket incredibly
> confusing. It took me a while before I realized it's using tail
> recursion to implement the startup logic. I think it would be way more
> straightforward and extensible if it used a much more common iterative
> style. I think it would make it possible to keep more state than just
> ssl_done and gss_done without changing the function signature every
> time for example.

+1. The complexity of the startup logic, both client- and server-side,
is a big reason why I want implicit TLS in the first place. That way,
bugs in that code can't be exploited before the TLS handshake
completes.

Thanks!
--Jacob

[1] https://www.postgresql.org/message-id/flat/fcc3ebeb7f05775b63f3207ed52a54ea5d17fb42.camel%40vmware.com
[2] https://alpaca-attack.com/

Re: Experiments with Postgres and SSL

From

Vladimir Sitnikov

Date:

20 January 2023, 06:40:19

>I don't think it's worth implementing a code path in
> the server like this as it would then become cruft that would be hard
> to ever get rid of.

Do you think the server can de-support the old code path soon?

> I think you can do the same thing, more or less, in the client. Like
> if the driver tries to connect via SSL and gets an error it remembers
> that host/port and connects using negotiation in the future.

Well, I doubt everybody would instantaneously upgrade to the database that supports fast TLS,
so there will be a timeframe when there will be a lot of old databases, and the clients will be new.
In that case, going with "try fast, ignore exception" would degrade performance for old databases.

I see you suggest caching, however, "degrading one of the cases" might be more painful than

"not improving one of the cases".

I would like to refrain from implementing "parallel connect both ways and check which is faster" in

PG clients (e.g. https://en.wikipedia.org/wiki/Happy_Eyeballs ).

Just wondering: do you consider back-porting the feature to all supported DB versions?

> In practice though, by the time drivers support this it'll probably be
> far enough in the future

I think drivers release more often than the database, and we can get driver support even before the database releases.

I'm from pgjdbc Java driver team, and I think it is unfair to suggest that "driver support is only far enough in the future".

Vladimir

Re: Experiments with Postgres and SSL

From

Greg Stark

Date:

20 January 2023, 16:08:32

On Fri, 20 Jan 2023 at 01:41, Vladimir Sitnikov
<sitnikov.vladimir@gmail.com> wrote:
>
> Do you think the server can de-support the old code path soon?

I don't have any intention to de-support anything. I really only
picture it being an option in environments where the client and server
are all part of a stack controlled by a single group. User tools and
general purpose tools are better served by our current more flexible
setup.

> Just wondering: do you consider back-porting the feature to all supported DB versions?

I can't see that, no.

> > In practice though, by the time drivers support this it'll probably be
> > far enough in the future
>
> I think drivers release more often than the database, and we can get driver support even before the database
releases.
> I'm from pgjdbc Java driver team, and I think it is unfair to suggest that "driver support is only far enough in the
future".

Interesting. I didn't realize this would be so attractive to regular
driver authors. I did think of the Happy Eyeballs technique too but I
agree I wouldn't want to go that way either :)

I guess the server doesn't really have to do anything specific to do
what you want. You could just hard code that servers newer than a
specific version would have this support. Or it could be done with a
"protocol option" -- which wouldn't actually change any behaviour but
would be rejected if the server doesn't support "fast ssl" giving you
the feedback you expect without having much extra legacy complexity.

I guess a lot depends on the way the driver works and the way the
application is structured. Applications that make a single connection
or don't have shared state across connections wouldn't think this way.
And interfaces like libpq would normally just leave it up to the
application to make choices like this. But I guess JVM based
applications are more likely to have long-lived systems that make many
connections and also more likely to make it the driver's
responsibility to manage such things.

--
greg

Re: Experiments with Postgres and SSL

From

Vladimir Sitnikov

Date:

20 January 2023, 17:11:56

>You could just hard code that servers newer than a
> specific version would have this support

Suppose PostgreSQL 21 implements "fast TLS"
Suppose pgjdbc 43 supports "fast TLS"
Suppose PgBouncer 1.17.0 does not support "fast TLS" yet

If pgjdbc connects to the DB via balancer, then the server would
respond with "server_version=21".
The balancer would forward "server_version", so the driver would
assume "fast TLS is supported".

In practice, fast TLS can't be used in that configuration since the
connection will fail when the driver attempts to ask
"fast TLS" from the PgBouncer.

> Or it could be done with a "protocol option"

Would you please clarify what you mean by "protocol option"?

>I guess a lot depends on the way the driver works and the way the
> application is structured

There are cases when applications pre-create connections on startup,
so the faster connections are created the better.
The same case happens when the admin issues "reset connection pool",
so it discards old connections and creates new ones.
People rarely know all the knobs, so I would like to have a "fast by
default" design (e.g. server sending a notification "you may use fast
mode the next time")
rather than "keep old behaviour and require everybody to add fast=true
to their configuration" (e.g. users having to configure
"try_fast_tls_first=true")

Vladimir

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

22 February 2023, 12:26:40

On 20/01/2023 03:28, Jacob Champion wrote:
> On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote:
>> * "Service Mesh" type tools that hide multiple services behind a
>> single host/port ("Service Mesh" is just a new buzzword for "proxy").
> 
> If you want to multiplex protocols on a port, now is an excellent time
> to require clients to use ALPN on implicit-TLS connections. (There are
> no clients that can currently connect via implicit TLS, so you'll
> never have another chance to force the issue without breaking
> backwards compatibility.) That should hopefully make it harder to
> ALPACA yourself or others [2].

Good idea. Do we want to just require the protocol to be "postgres", or 
perhaps "postgres/3.0"? Need to register that with IANA, I guess.

We implemented a protocol version negotiation mechanism in the libpq 
protocol itself, how would this interact with it? If it's just 
"postgres", then I guess we'd still negotiate the protocol version and 
list of extensions after the TLS handshake.

>> Incidentally I find the logic in ProcessStartupPacket incredibly
>> confusing. It took me a while before I realized it's using tail
>> recursion to implement the startup logic. I think it would be way more
>> straightforward and extensible if it used a much more common iterative
>> style. I think it would make it possible to keep more state than just
>> ssl_done and gss_done without changing the function signature every
>> time for example.
> 
> +1. The complexity of the startup logic, both client- and server-side,
> is a big reason why I want implicit TLS in the first place. That way,
> bugs in that code can't be exploited before the TLS handshake
> completes.

+1. We need to support explicit TLS for a long time, so we can't 
simplify by just removing it. But let's refactor the code somehow, to 
make it more clear.

Looking at the patch, I think it accepts an SSLRequest packet even if 
implicit TLS has already been established. That's surely wrong, and 
shows how confusing the code is. (Or I'm reading it incorrectly, which 
also shows how confusing it is :-) )

Regarding Vladimir's comments on how clients can migrate to this, I 
don't have any great suggestions. To summarize, there are several options:

- Add an "fast_tls" option that the user can enable if they know the 
server supports it

- First connect in old-fashioned way, and remember the server version. 
Later, if you reconnect to the same server, use implicit TLS if the 
server version was high enough. This would be most useful for connection 
pools.

- Connect both ways at the same time, and continue with the fastest, 
i.e. "happy eyeballs"

- Try implicit TLS first, and fall back to explicit TLS if it fails.

For libpq, we don't necessarily need to do anything right now. We can 
add the implicit TLS support in a later version. Not having libpq 
support makes it hard to test the server codepath, though. Maybe just 
test it with 'stunnel' or 'openssl s_client'.

- Heikki

Re: Experiments with Postgres and SSL

From

Jacob Champion

Date:

22 February 2023, 22:19:28

On Wed, Feb 22, 2023 at 4:26 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 20/01/2023 03:28, Jacob Champion wrote:
> > If you want to multiplex protocols on a port, now is an excellent time
> > to require clients to use ALPN on implicit-TLS connections. (There are
> > no clients that can currently connect via implicit TLS, so you'll
> > never have another chance to force the issue without breaking
> > backwards compatibility.) That should hopefully make it harder to
> > ALPACA yourself or others [2].
>
> Good idea. Do we want to just require the protocol to be "postgres", or
> perhaps "postgres/3.0"? Need to register that with IANA, I guess.

Unless you plan to make the next minor protocol version fundamentally
incompatible, I don't think there's much reason to add '.0'. (And even
if that does happen, 'postgres/3.1' is still distinct from
'postgres/3'. Or 'postgres' for that matter.) The Expert Review
process might provide some additional guidance?

> We implemented a protocol version negotiation mechanism in the libpq
> protocol itself, how would this interact with it? If it's just
> "postgres", then I guess we'd still negotiate the protocol version and
> list of extensions after the TLS handshake.

Yeah. You could choose to replace major version negotiation completely
with ALPN, I suppose, but there might not be any maintenance benefit
if you still have to support plaintext negotiation. Maybe there are
performance implications to handling the negotiation earlier vs.
later?

Note that older versions of TLS will expose the ALPN in plaintext...
but that may not be a factor by the time a postgres/4 shows up, and if
the next protocol is incompatible then it may not be feasible to hide
the differences via transport encryption anyway.

> Regarding Vladimir's comments on how clients can migrate to this, I
> don't have any great suggestions. To summarize, there are several options:
>
> - Add an "fast_tls" option that the user can enable if they know the
> server supports it

I like that such an option could eventually be leveraged for a
postgresqls:// URI scheme (which should not fall back, ever). There
would be other things we'd have to change first to make that a reality
-- postgresqls://example.com?host=evil.local is problematic, for
example -- but it'd be really nice to have an HTTPS equivalent.

--Jacob

Re: Experiments with Postgres and SSL

From

Greg Stark

Date:

28 February 2023, 18:32:45

On Wed, 22 Feb 2023 at 07:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 20/01/2023 03:28, Jacob Champion wrote:
> > On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote:
> >> * "Service Mesh" type tools that hide multiple services behind a
> >> single host/port ("Service Mesh" is just a new buzzword for "proxy").
> >
> > If you want to multiplex protocols on a port, now is an excellent time
> > to require clients to use ALPN on implicit-TLS connections. (There are
> > no clients that can currently connect via implicit TLS, so you'll
> > never have another chance to force the issue without breaking
> > backwards compatibility.) That should hopefully make it harder to
> > ALPACA yourself or others [2].
>
> Good idea. Do we want to just require the protocol to be "postgres", or
> perhaps "postgres/3.0"? Need to register that with IANA, I guess.

I had never heard of this before, it does seem useful. But if I
understand it right it's entirely independent of this patch. We can
add it to all our Client/Server exchanges whether they're the initial
direct SSL connection or the STARTTLS negotiation?

> We implemented a protocol version negotiation mechanism in the libpq
> protocol itself, how would this interact with it? If it's just
> "postgres", then I guess we'd still negotiate the protocol version and
> list of extensions after the TLS handshake.
>
> >> Incidentally I find the logic in ProcessStartupPacket incredibly
> >> confusing. It took me a while before I realized it's using tail
> >> recursion to implement the startup logic. I think it would be way more
> >> straightforward and extensible if it used a much more common iterative
> >> style. I think it would make it possible to keep more state than just
> >> ssl_done and gss_done without changing the function signature every
> >> time for example.
> >
> > +1. The complexity of the startup logic, both client- and server-side,
> > is a big reason why I want implicit TLS in the first place. That way,
> > bugs in that code can't be exploited before the TLS handshake
> > completes.
>
> +1. We need to support explicit TLS for a long time, so we can't
> simplify by just removing it. But let's refactor the code somehow, to
> make it more clear.
>
> Looking at the patch, I think it accepts an SSLRequest packet even if
> implicit TLS has already been established. That's surely wrong, and
> shows how confusing the code is. (Or I'm reading it incorrectly, which
> also shows how confusing it is :-) )

I'll double check it but I think I tested that that wasn't the case. I
think it accepts the SSL request packet and sends back an N which the
client libpq just interprets as the server not supporting SSL and does
an unencrypted connection (which is tunneled over stunnel unbeknownst
to libpq).

I agree I would want to flatten this logic to an iterative approach
but having wrapped my head around it now I'm not necessarily rushing
to do it now. The main advantage of flattening it would be to make it
easy to support other protocol types which I think could be really
interesting. It would be much clearer to document the state machine if
all the state is in one place and the code just loops through
processing startup packets and going to a new state until the
connection is established. That's true now but you have to understand
how the state is passed in the function parameters and notice that all
the recursion is tail recursive (I think). And extending that state
would require extending the function signature which would get awkward
quickly.

> Regarding Vladimir's comments on how clients can migrate to this, I
> don't have any great suggestions. To summarize, there are several options:
>
> - Add an "fast_tls" option that the user can enable if they know the
> server supports it
>
> - First connect in old-fashioned way, and remember the server version.
> Later, if you reconnect to the same server, use implicit TLS if the
> server version was high enough. This would be most useful for connection
> pools.

Vladimir pointed out that this doesn't necessarily work. The server
may be new enough to support it but it could be behind a proxy like
pgbouncer or something. The same would be true if the server reported
a "connection option" instead of depending on version.

> - Connect both ways at the same time, and continue with the fastest,
> i.e. "happy eyeballs"

That seems way too complex for us to bother with imho.

> - Try implicit TLS first, and fall back to explicit TLS if it fails.

> For libpq, we don't necessarily need to do anything right now. We can
> add the implicit TLS support in a later version. Not having libpq
> support makes it hard to test the server codepath, though. Maybe just
> test it with 'stunnel' or 'openssl s_client'.

I think we should have an option to explicitly enable it in psql, if
only for testing. And then wait five years and switch the default on
it then. In the meantime users can just set it based on their setup.
That's not the way to the quickest adoption but imho the main
advantages of this option are the options it gives users, not the
latency improvement, so I'm not actually super concerned about
adoption rate.

I assume we'll keep the negotiated mode indefinitely because it can
handle any other protocols we might want. For instance, it currently
handles GSSAPI -- which raises the question, are we happy with GSSAPI
having this extra round trip? Is there a similar change we could make
for it? My understanding is that GSSAPI is an abstract interface and
the actual protocol it's invoking could be anything so we can't make
any assumptions about what the first packet looks like. Perhaps we can
do something about pipelining GSSAPI messages so if the negotiation
fails the server just closes the connection but if it accepts it it
does a similar trick with unreading the buffered data and processing
it through the GSSAPI calls.

-- 
greg

Re: Experiments with Postgres and SSL

From

Jacob Champion

Date:

28 February 2023, 19:02:42

On Tue, Feb 28, 2023 at 10:33 AM Greg Stark <stark@mit.edu> wrote:
> On Wed, 22 Feb 2023 at 07:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Good idea. Do we want to just require the protocol to be "postgres", or
> > perhaps "postgres/3.0"? Need to register that with IANA, I guess.
>
> I had never heard of this before, it does seem useful. But if I
> understand it right it's entirely independent of this patch.

It can be. If you want to use it in the strongest possible way,
though, you'd have to require its use by clients. Introducing that
requirement later would break existing ones, so I think it makes sense
to do it at the same time as the initial implementation, if there's
interest.

> We can
> add it to all our Client/Server exchanges whether they're the initial
> direct SSL connection or the STARTTLS negotiation?

I'm not sure it would buy you anything during the STARTTLS-style
opening. You already know what protocol you're speaking in that case.
(So with the ALPACA example, the damage is already done.)

Thanks,
--Jacob

Re: Experiments with Postgres and SSL

From

Greg Stark

Date:

16 March 2023, 20:00:28

Here's an updated patch for direct SSL connections.

I've added libpq client support with a new connection parameter. This
allows testing it easily with psql. It's still a bit hard to see
what's going on though. I'm thinking it would be good to have libpq
keep a string which describes what negotiations were attempted and
failed and what was eventually accepted which psql could print with
the SSL message or expose some other way.

In the end I didn't see how adding an API for this really helped any
more than just saying the API is to stuff the unread data into the
Port structure. So I just documented that. If anyone has any better
idea...

I added documentation for the libpq connection setting.

One thing, I *think* it's ok to replace the send(2) call with
secure_write() in the negotiation. It does mean it's possible for the
connection to fail with FATAL at that point instead of COMMERROR but I
don't think that's a problem.

I haven't added tests. I'm not sure how to test this since to test it
properly means running the server with every permutation of ssl and
gssapi configurations.

Incidentally, some of the configuration combinations -- namely
sslnegotiation=direct and default gssencmode and sslmode results in a
counter-intuitive behaviour. But I don't see a better option that
doesn't mean making the defaults less useful.

Attachment

Re: Experiments with Postgres and SSL

From

Greg Stark

Date:

20 March 2023, 20:31:03

Here's a first cut at ALPN support.

Currently it's using a hard coded "Postgres/3.0" protocol (hard coded
both in the client and the server...). And it's hard coded to be
required for direct connections and supported but not required for
regular connections.

IIRC I put a variable labeled a "GUC" but forgot to actually make it a
GUC. But I'm thinking of maybe removing that variable since I don't
see much of a use case for controlling this manually. I *think* ALPN
is supported by all the versions of OpenSSL we support.

The other patches are unchanged (modulo a free() that I missed in the
client before). They still have the semi-open issues I mentioned in
the previous email.




--
greg

Attachment

Re: Experiments with Postgres and SSL

From

Greg Stark

Date:

20 March 2023, 20:35:38

Sorry, checking the cfbot apparently I had a typo in the #ifndef USE_SSL case.

Attachment

Re: Experiments with Postgres and SSL

From

Greg Stark

Date:

31 March 2023, 07:14:03

On Mon, 20 Mar 2023 at 16:31, Greg Stark <stark@mit.edu> wrote:
>
> Here's a first cut at ALPN support.
>
> Currently it's using a hard coded "Postgres/3.0" protocol

Apparently that is explicitly disrecommended by the IETF folk. They
want something like "TBD" so people don't start using a string until
it's been added to the registry. So I've changed this for now (to
"TBD-pgsql")

Ok, I think this has pretty much everything I was hoping to do.

The one thing I'm not sure of is it seems some codepaths in postmaster
have ereport(COMMERROR) followed by returning an error whereas other
codepaths just have ereport(FATAL). And I don't actually see much
logic in which do which. (I get the principle behind COMMERR it just
seems like it doesn't really match the code).

I realized I had exactly the infrastructure needed to allow pipelining
the SSL ClientHello like Neon wanted to do so I added that too. It's
kind of redundant with direct SSL connections but seems like there may
be reasons to use that instead.

-- 
greg

Attachment

Re: Experiments with Postgres and SSL

From

Greg Stark

Date:

31 March 2023, 07:59:49

And the cfbot wants a small tweak

Attachment

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

04 July 2023, 14:15:49

On 31/03/2023 10:59, Greg Stark wrote:
> IIRC I put a variable labeled a "GUC" but forgot to actually make it a
> GUC. But I'm thinking of maybe removing that variable since I don't
> see much of a use case for controlling this manually. I *think* ALPN
> is supported by all the versions of OpenSSL we support.

+1 on removing the variable. Let's make ALPN mandatory for direct SSL 
connections, like Jacob suggested. And for old-style handshakes, accept 
and check ALPN if it's given.

I don't see the point of the libpq 'sslalpn' option either. Let's send 
ALPN always.

Admittedly having the options make testing different of combinations of 
old and new clients and servers a little easier. But I don't think we 
should add options for the sake of backwards compatibility tests.

> --- a/src/backend/libpq/pqcomm.c
> +++ b/src/backend/libpq/pqcomm.c
> @@ -1126,13 +1126,16 @@ pq_discardbytes(size_t len)
>  /* --------------------------------
>   *             pq_buffer_has_data              - is any buffered data available to read?
>   *
> - * This will *not* attempt to read more data.
> + * Actually returns the number of bytes in the buffer...
> + *
> + * This will *not* attempt to read more data. And reading up to that number of
> + * bytes should not cause reading any more data either.
>   * --------------------------------
>   */
> -bool
> +size_t
>  pq_buffer_has_data(void)
>  {
> -       return (PqRecvPointer < PqRecvLength);
> +       return (PqRecvLength - PqRecvPointer);
>  }

Let's rename the function.

>         /* push unencrypted buffered data back through SSL setup */
>         len = pq_buffer_has_data();
>         if (len > 0)
>         {
>             buf = palloc(len);
>             if (pq_getbytes(buf, len) == EOF)
>                 return STATUS_ERROR; /* shouldn't be possible */
>             port->raw_buf = buf;
>             port->raw_buf_remaining = len;
>             port->raw_buf_consumed = 0;
>         }
> 
>         Assert(pq_buffer_has_data() == 0);
>         if (secure_open_server(port) == -1)
>         {
>             ereport(COMMERROR,
>                     (errcode(ERRCODE_PROTOCOL_VIOLATION),
>                      errmsg("SSL Protocol Error during direct SSL connection initiation")));
>             return STATUS_ERROR;
>         }
> 
>         if (port->raw_buf_remaining > 0)
>         {
>             ereport(COMMERROR,
>                     (errcode(ERRCODE_PROTOCOL_VIOLATION),
>                      errmsg("received unencrypted data after SSL request"),
>                      errdetail("This could be either a client-software bug or evidence of an attempted
man-in-the-middleattack.")));
 
>             return STATUS_ERROR;
>         }
>         if (port->raw_buf)
>             pfree(port->raw_buf);

This pattern is repeated in both callers of secure_open_server(). Could 
we move this into secure_open_server() itself? That would feel pretty 
natural, be-secure.c already contains the secure_raw_read() function 
that reads the 'raw_buf' field.

> const char *
> PQsslAttribute(PGconn *conn, const char *attribute_name)
> {
>     ...
> 
>     if (strcmp(attribute_name, "alpn") == 0)
>     {
>         const unsigned char *data;
>         unsigned int len;
>         static char alpn_str[256]; /* alpn doesn't support longer than 255 bytes */
>         SSL_get0_alpn_selected(conn->ssl, &data, &len);
>         if (data == NULL || len==0 || len > sizeof(alpn_str)-1)
>             return NULL;
>         memcpy(alpn_str, data, len);
>         alpn_str[len] = 0;
>         return alpn_str;
>     }

Using a static buffer doesn't look right. If you call PQsslAttribute on 
two different connections from two different threads concurrently, they 
will write to the same buffer. I see that you copied it from the 
"key_bits" handlng, but it has the same issue.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Experiments with Postgres and SSL

From

Michael Paquier

Date:

04 July 2023, 23:33:07

On Tue, Jul 04, 2023 at 05:15:49PM +0300, Heikki Linnakangas wrote:
> I don't see the point of the libpq 'sslalpn' option either. Let's send ALPN
> always.
>
> Admittedly having the options make testing different of combinations of old
> and new clients and servers a little easier. But I don't think we should add
> options for the sake of backwards compatibility tests.

Hmm.  I would actually argue in favor of having these with tests in
core to stress the previous SSL hanshake protocol, as not having these
parameters would mean that we rely only on major version upgrades in
the buildfarm to test the backward-compatible code path, making issues
much harder to catch.  And we still need to maintain the
backward-compatible path for 10 years based on what pg_dump and
pg_upgrade need to support.
--
Michael

Attachment

signature.asc

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

30 December 2023, 21:51:34

On 05/07/2023 02:33, Michael Paquier wrote:
> On Tue, Jul 04, 2023 at 05:15:49PM +0300, Heikki Linnakangas wrote:
>> I don't see the point of the libpq 'sslalpn' option either. Let's send ALPN
>> always.
>>
>> Admittedly having the options make testing different of combinations of old
>> and new clients and servers a little easier. But I don't think we should add
>> options for the sake of backwards compatibility tests.
> 
> Hmm.  I would actually argue in favor of having these with tests in
> core to stress the previous SSL hanshake protocol, as not having these
> parameters would mean that we rely only on major version upgrades in
> the buildfarm to test the backward-compatible code path, making issues
> much harder to catch.  And we still need to maintain the
> backward-compatible path for 10 years based on what pg_dump and
> pg_upgrade need to support.

Ok, let's keep it.

I started to review this again. There's a lot of little things to fix 
before this is ready for commit, but overall this looks pretty good. A 
few notes / questions on the first two patches (in addition to the few 
comments I made earlier):

If the client sends TLS HelloClient directly, but the server does not 
support TLS, it just closes the connection. It would be nice to still 
send some kind of an error to the client. Maybe a TLS alert packet? I 
don't want to start implementing TLS, but I think a TLS alert packet 
with a suitable error code would be just a constant.

The new CONNECTION_DIRECT_SSL_STARTUP state needs to be moved to end of 
the enum. We cannot change the integer values of existing of enum 
values, or clients compiled with old libpq version would mix up the states.

>     /*
>      * validate sslnegotiation option, default is "postgres" for the postgres
>      * style negotiated connection with an extra round trip but more options.
>      */

What "more options" does the negotiated connection provide?

>     if (conn->sslnegotiation)
>     {
>         if (strcmp(conn->sslnegotiation, "postgres") != 0
>             && strcmp(conn->sslnegotiation, "direct") != 0
>             && strcmp(conn->sslnegotiation, "requiredirect") != 0)
>         {
>             conn->status = CONNECTION_BAD;
>             libpq_append_conn_error(conn, "invalid %s value: \"%s\"",
>                                     "sslnegotiation", conn->sslnegotiation);
>             return false;
>         }
> 
> #ifndef USE_SSL
>         if (conn->sslnegotiation[0] != 'p') {
>             conn->status = CONNECTION_BAD;
>             libpq_append_conn_error(conn, "sslnegotiation value \"%s\" invalid when SSL support is not compiled in",
>                                     conn->sslnegotiation);
>             return false;
>         }
> #endif
>     }

At the same time, the patch allows the combination of "sslmode=disable" 
and "sslnegotiation=requiredirect". Seems inconsistent to error out if 
compiled without SSL support.

>     else
>     {
>         libpq_append_conn_error(conn, "sslnegotiation missing?");
>         return false;
>     }

In the other similar settings, like 'channel_binding' and 'sslcertmode', 
we strdup() the compiled-in default if the option is NULL. I'm not sure 
if that's necessary, I think the compiled-in defaults should get filled 
in conninfo_add_defaults(). If so, then those other places could be 
turned into errors like this too. This seems to be a bit of a mess even 
before this patch.

In pg_conn struct:

> +       bool            allow_direct_ssl_try; /* Try to make a direct SSL connection
> +                                                                          * without an "SSL negotiation packet" */
>         bool            allow_ssl_try;  /* Allowed to try SSL negotiation */
>         bool            wait_ssl_try;   /* Delay SSL negotiation until after
>                                                                  * attempting normal connection */

It's getting hard to follow what combinations of these booleans are 
valid and what they're set to at different stages. I think it's time to 
turn all these into one enum, or something like that.

I intend to continue reviewing this after Jan 8th. I'd still like to get 
this into v17.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

10 January 2024, 08:30:49

Some more comments on this:

1. It feels weird that the combination of "gssencmode=require 
sslnegotiation=direct" combination is forbidden. Sure, the ssl 
negotiation will never happen with gssencmode=require, so the 
sslnegotiation option has no effect. But by that token, should we also 
forbid the combination "sslmode=disable sslnegotiation=direct"? I think 
not. The sslnegotiation option should mean "if we are going to try SSL, 
should we try it in direct or negotiated mode?"

2. Should we allow direct SSL only at the very beginning of a TCP 
connection, or should we also allow it after we have requested GSS and 
the server said no? Like this:

Client: GSSENCRequest
Server: 'N' (gss not supported)
Client: TLS client Hello

On one hand, why not? It saves you a round-trip in this case too. If we 
don't allow it, the client will have to send SSLRequest and wait for 
response, or reconnect to try direct SSL. On the other hand, flexibility 
is not necessarily a good thing in security-critical code like this.

The patch set is confused on whether that's allowed or not. The server 
rejects it. But if you use "gssencmode=prefer 
sslnegotiation=requiredrect", libpq will attempt to do it, and fail.

3. With "sslmode=verify-full sslnegotiation=direct", if the direct SSL 
connection fails because of a problem with the certificate, libpq will 
try again in negotiated SSL mode. That seems pointless. If the server 
responded to the direct TLS Client Hello message with a valid 
ServerHello, that indicates that the server supports direct SSL. If 
anything goes wrong after that, retrying in negotiated mode is not going 
to help.

4. The number of combinations of sslmode, gssencmode and sslnegotiation 
settings is scary. And we have very few tests for them.


Attached patch set addresses the above, but is very much WIP. I 
refactored the state machine in libpq, to make the states and 
transitions more clear. I think that helps, but it's still pretty 
complex. I'm all ears for ideas on how to simplify it further.

I added a new test suite to test the different libpq options. See 
src/test/libpq_encryption. I think this is very much needed, but I'm 
still not very happy with the implementation. Some combinations are 
still impossible to test, like connecting to an older server that 
doesn't support direct SSL, or having the server respond with 'N' to 
GSSEncRequest. I'd also like to check more details of each connection 
attempt, like how many TCP connections are established, to check for 
things like 3. above. Maybe we need to add more logging to libpq or the 
server and check the logs after each test.

I'm tempted to implement a mock server from scratch that could easily be 
instructed to accept/reject the connection at just the right places. But 
that's a lot of work.

I'm going to put this down for now. The attached patch set is even more 
raw than v6, but I'm including it here to "save the work".

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

Re: Experiments with Postgres and SSL

From

Matthias van de Meent

Date:

19 February 2024, 13:14:23

I've been asked to take a look at this thread and review some patches,
and the subject looks interesting enough, so here I am.

On Thu, 19 Jan 2023 at 04:16, Greg Stark <stark@mit.edu> wrote:
> I had a conversation a while back with Heikki where he expressed that
> it was annoying that we negotiate SSL/TLS the way we do since it
> introduces an extra round trip. Aside from the performance
> optimization I think accepting standard TLS connections would open the
> door to a number of other opportunities that would be worth it on
> their own.

I agree that this would be very nice.

> Other things it would open the door to in order from least
> controversial to most....
>
> * Hiding Postgres behind a standard SSL proxy terminating SSL without
> implementing the Postgres protocol.

I think there is also the option "hiding Postgres behind a standard
SNI-based SSL router that does not terminate SSL", as that's arguably
a more secure way to deploy any SSL service than SSL-terminating
proxies.

> * "Service Mesh" type tools that hide multiple services behind a
> single host/port ("Service Mesh" is just a new buzzword for "proxy").

People proxying PostgreSQL seems fine, and enabling better proxying
seems reasonable.

> * Browser-based protocol implementations using websockets for things
> like pgadmin or other tools to connect directly to postgres using
> Postgres wire protocol but using native SSL implementations.
>
> * Postgres could even implement an HTTP based version of its protocol
> and enable things like queries or browser based tools using straight
> up HTTP requests so they don't need to use websockets.
>
> * Postgres could implement other protocols to serve up data like
> status queries or monitoring metrics, using HTTP based standard
> protocols instead of using our own protocol.

I don't think we should be trying to serve anything HTTP-like, even
with a ten-foot pole, on a port that we serve the PostgreSQL wire
protocol on.

If someone wants to multiplex the PostgreSQL wire protocol on the same
port that serves HTTPS traffic, they're welcome to do so with their
own proxy, but I'd rather we keep the PostgreSQL server's socket
handling fundamentaly incapable of servicng protocols primarily used
in web browsers on the same socket that handles normal psql data
connections.

PostgreSQL may have its own host-based authentication with HBA, but
I'd rather not have to depend on it to filter incoming connections
between valid psql connections and people trying to grab the latest
monitoring statistics at some http endpoint - I'd rather use my trusty
firewall that can already limit access to specific ports very
efficiently without causing undue load on the database server.

Matthias van de Meent
Neon (https://neon.tech)

Re: Experiments with Postgres and SSL

From

Matthias van de Meent

Date:

21 February 2024, 23:43:11

On Wed, 10 Jan 2024 at 09:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Some more comments on this:
>
> 1. It feels weird that the combination of "gssencmode=require
> sslnegotiation=direct" combination is forbidden. Sure, the ssl
> negotiation will never happen with gssencmode=require, so the
> sslnegotiation option has no effect. But by that token, should we also
> forbid the combination "sslmode=disable sslnegotiation=direct"? I think
> not. The sslnegotiation option should mean "if we are going to try SSL,
> should we try it in direct or negotiated mode?"

I'm not sure about this either. The 'gssencmode' option is already
quite weird in that it seems to override the "require"d priority of
"sslmode=require", which it IMO really shouldn't.

> 2. Should we allow direct SSL only at the very beginning of a TCP
> connection, or should we also allow it after we have requested GSS and
> the server said no? Like this:
>
> Client: GSSENCRequest
> Server: 'N' (gss not supported)
> Client: TLS client Hello
>
> On one hand, why not? It saves you a round-trip in this case too. If we
> don't allow it, the client will have to send SSLRequest and wait for
> response, or reconnect to try direct SSL. On the other hand, flexibility
> is not necessarily a good thing in security-critical code like this.

I think this should be "no".
Once we start accepting PostgreSQL protocol packets (such as the
GSSENCRequest packet) I don't think we should start treating data
stream corruption as attempted SSL connections.

> The patch set is confused on whether that's allowed or not. The server
> rejects it. But if you use "gssencmode=prefer
> sslnegotiation=requiredrect", libpq will attempt to do it, and fail.

That should then be detected as an incorrect combination of flags in
psql: you can't have direct-to-ssl and put something in front of it.

> 3. With "sslmode=verify-full sslnegotiation=direct", if the direct SSL
> connection fails because of a problem with the certificate, libpq will
> try again in negotiated SSL mode. That seems pointless. If the server
> responded to the direct TLS Client Hello message with a valid
> ServerHello, that indicates that the server supports direct SSL. If
> anything goes wrong after that, retrying in negotiated mode is not going
> to help.

This makes sense.

> 4. The number of combinations of sslmode, gssencmode and sslnegotiation
> settings is scary. And we have very few tests for them.

Yeah, it's not great. We could easily automate this better though. I
mean, can't we run the tests using a "cube" configuration, i.e. test
every combination of parameters? We would use a mapping function of
(psql connection parameter values -> expectations), which would be
along the lines of the attached pl testfile. I feel it's a bit more
approachable than the lists of manual option configurations, and makes
it a bit easier to program the logic of which connection security
option we should have used to connect.
The attached file would be a drop-in replacement; it's tested to work
with SSL only - without GSS - because I've been having issues getting
GSS working on my machine.

> I'm going to put this down for now. The attached patch set is even more
> raw than v6, but I'm including it here to "save the work".

v6 doesn't apply cleanly anymore after 774bcffe, but here are some notes:

Several patches are still very much WIP. Reviewing them on a
patch-by-patch basis is therefore nigh impossible; the specific
reviews below are thus on changes that could be traced back to a
specific patch. A round of cleanup would be appreciated.

> 0003: Direct SSL connections postmaster support
> [...]
> -extern bool pq_buffer_has_data(void);
> +extern size_t pq_buffer_has_data(void);

This should probably be renamed to pg_buffer_remaining_data or such,
if we change the signature like this.

> +    /* Read from the "unread" buffered data first. c.f. libpq-be.h */
> +    if (port->raw_buf_remaining > 0)
> +    {
> +        /* consume up to len bytes from the raw_buf */
> +        if (len > port->raw_buf_remaining)
> +            len = port->raw_buf_remaining;

Shouldn't we also try to read from the socket, instead of only
consuming bytes from the raw buffer if it contains bytes?

> 0008: Allow pipelining data after ssl request
> +            /*
> +             * At this point we should have no data already buffered.  If we do,
> +             * it was received before we performed the SSL handshake, so it wasn't
> +             * encrypted and indeed may have been injected by a man-in-the-middle.
> +             * We report this case to the client.
> +             */
> +            if (port->raw_buf_remaining > 0)
> +                ereport(FATAL,
> +                        (errcode(ERRCODE_PROTOCOL_VIOLATION),
> +                         errmsg("received unencrypted data after SSL request"),
> +                         errdetail("This could be either a client-software bug or evidence of an attempted
man-in-the-middleattack.")));

We currently don't support 0-RTT SSL connections because (among other
reasons) we haven't yet imported many features from TLS1.3, but it
seems reasonable that clients may want to use 0RTT (or, session
resumption in 0 round trips), which would allow encrypted data after
the SSL startup packet.
It seems wise to add something to this note to these comments in
ProcessStartupPacket.

> ALPN

Does the TLS ALPN spec allow protocol versions in the protocol tag? It
would be very useful to detect clients with new capabilities at the
first connection, rather than having to wait for one round trip, and
would allow one avenue for changing the protocol version.

Apart from this, I didn't really find any serious problems in the sum
of these patches. The intermediate states were not great though, with
various broken states in between.

Kind regards,

Matthias van de Meent

Attachment

001_negotiate_encryption.pl

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

22 February 2024, 17:02:51

On 22/02/2024 01:43, Matthias van de Meent wrote:
> On Wed, 10 Jan 2024 at 09:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> 4. The number of combinations of sslmode, gssencmode and sslnegotiation
>> settings is scary. And we have very few tests for them.
> 
> Yeah, it's not great. We could easily automate this better though. I
> mean, can't we run the tests using a "cube" configuration, i.e. test
> every combination of parameters? We would use a mapping function of
> (psql connection parameter values -> expectations), which would be
> along the lines of the attached pl testfile. I feel it's a bit more
> approachable than the lists of manual option configurations, and makes
> it a bit easier to program the logic of which connection security
> option we should have used to connect.
> The attached file would be a drop-in replacement; it's tested to work
> with SSL only - without GSS - because I've been having issues getting
> GSS working on my machine.

+1 testing all combinations. I don't think the 'mapper' function 
approach in your version is much better than the original though. Maybe 
it would be better with just one 'mapper' function that contains all the 
rules, along the lines of: (This isn't valid perl, just pseudo-code)

sub expected_outcome
{
     my ($user, $sslmode, $negotiation, $gssmode) = @_;

     my @possible_outcomes = { 'plain', 'ssl', 'gss' }

     delete $possible_outcomes{'plain'} if $sslmode eq 'require';
     delete $possible_outcomes{'ssl'} if $sslmode eq 'disable';

     delete $possible_outcomes{'plain'} if $user eq 'ssluser';
     delete $possible_outcomes{'plain'} if $user eq 'ssluser';

     if $sslmode eq 'allow' {
    # move 'plain' before 'ssl' in the list
     }
     if $sslmode eq 'prefer' {
    # move 'ssl' before 'plain' in the list
     }

     # more rules here


     # If there are no outcomes left in $possible_outcomes, return 'fail'
     # If there's exactly one outcome left, return that.
     # If there's more, return the first one.
}


Or maybe a table that lists all the combinations and the expected 
outcome. Something lieke this:

             nossluser    nogssuser    ssluser    gssuser        
sslmode=require    fail        ...
sslmode=prefer    plain
sslmode=disable    plain


The problem is that there are more than two dimensions. So maybe an 
exhaustive list like this:

user        sslmode        gssmode        outcome

nossluser    require        disable        fail
nossluser    prefer        disable        plain
nossluser    disable        disable        plain
ssluser        require        disable        ssl
...


I'm just throwing around ideas here, can you experiment with different 
approaches and see what looks best?

>> ALPN
> 
> Does the TLS ALPN spec allow protocol versions in the protocol tag? It
> would be very useful to detect clients with new capabilities at the
> first connection, rather than having to wait for one round trip, and
> would allow one avenue for changing the protocol version.

Looking at the list of registered ALPN tags [0], I can see "http/0.9"; 
"http/1.0" and "http/1.1". I think we'd want to changing the major 
protocol version in a way that would introduce a new roundtrip, though.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Experiments with Postgres and SSL

From

Matthias van de Meent

Date:

28 February 2024, 12:00:52

On Thu, 22 Feb 2024 at 18:02, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 22/02/2024 01:43, Matthias van de Meent wrote:
>> On Wed, 10 Jan 2024 at 09:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> 4. The number of combinations of sslmode, gssencmode and sslnegotiation
>>> settings is scary. And we have very few tests for them.
>>
>> Yeah, it's not great. We could easily automate this better though. I
>> mean, can't we run the tests using a "cube" configuration, i.e. test
>> every combination of parameters? We would use a mapping function of
>> (psql connection parameter values -> expectations), which would be
>> along the lines of the attached pl testfile. I feel it's a bit more
>> approachable than the lists of manual option configurations, and makes
>> it a bit easier to program the logic of which connection security
>> option we should have used to connect.
>> The attached file would be a drop-in replacement; it's tested to work
>> with SSL only - without GSS - because I've been having issues getting
>> GSS working on my machine.
>
> +1 testing all combinations. I don't think the 'mapper' function
> approach in your version is much better than the original though. Maybe
> it would be better with just one 'mapper' function that contains all the
> rules, along the lines of: (This isn't valid perl, just pseudo-code)
>
> sub expected_outcome
> {
[...]
> }
>
> Or maybe a table that lists all the combinations and the expected
> outcome. Something lieke this:
[...]
>
> The problem is that there are more than two dimensions. So maybe an
> exhaustive list like this:
>
> user            sslmode         gssmode         outcome
>
> nossluser       require         disable         fail
> ...

> I'm just throwing around ideas here, can you experiment with different
> approaches and see what looks best?

One issue with exhaustive tables is that they would require a product
of all options to be listed, and that'd require at least 216 rows to
manage: server_ssl 2 * server_gss 2 * users 3 * client_ssl 4 *
client_gss 3 * client_ssldirect 3 = 216 different states. I think the
expected_autcome version is easier in that regard.

Attached an updated version using a single unified connection type
validator using an approach similar to yours. Note that it does fail 8
tests, all of which are attributed to the current handling of
`sslmode=require gssencmode=prefer`: right now, we allow GSS in that
case, even though the user require-d sslmode.

An alternative check that does pass tests with the code of the patch
is commented out, at lines 209-216.

>>> ALPN
>>
>> Does the TLS ALPN spec allow protocol versions in the protocol tag? It
>> would be very useful to detect clients with new capabilities at the
>> first connection, rather than having to wait for one round trip, and
>> would allow one avenue for changing the protocol version.
>
> Looking at the list of registered ALPN tags [0], I can see "http/0.9";
> "http/1.0" and "http/1.1".

Ah, nice.

> I think we'd want to changing the major
> protocol version in a way that would introduce a new roundtrip, though.

I don't think I understand what you meant here, could you correct the
sentence or expand why we want to do that?
Note that with ALPN you could negotiate postgres/3.0 or postgres/4.0
during the handshake, which could save round-trips.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachment

001_negotiate_encryption.pl

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

28 February 2024, 12:10:20

On 28/02/2024 14:00, Matthias van de Meent wrote:
> I don't think I understand what you meant here, could you correct the
> sentence or expand why we want to do that?
> Note that with ALPN you could negotiate postgres/3.0 or postgres/4.0
> during the handshake, which could save round-trips.

Sorry, I missed "avoid" there. I meant:

I think we'd want to *avoid* changing the major protocol version in a 
way that would introduce a new roundtrip, though.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Experiments with Postgres and SSL

From

Jacob Champion

Date:

01 March 2024, 21:49:12

On Wed, Feb 28, 2024 at 4:10 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I think we'd want to *avoid* changing the major protocol version in a
> way that would introduce a new roundtrip, though.

I'm starting to get up to speed with this patchset. So far I'm mostly
testing how it works; I have yet to take an in-depth look at the
implementation.

I'll squint more closely at the MITM-protection changes in 0008 later.
First impressions, though: it looks like that code has gotten much
less straightforward, which I think is dangerous given the attack it's
preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our
protocol doesn't seem particularly replay-safe to me.)

If we're interested in ALPN negotiation in the future, we may also
want to look at GREASE [1] to keep those options open in the presence
of third-party implementations. Unfortunately OpenSSL doesn't do this
automatically yet.

If we don't have a reason not to, it'd be good to follow the strictest
recommendations from [2] to avoid cross-protocol attacks. (For anyone
currently running web servers and Postgres on the same host, they
really don't want browsers "talking" to their Postgres servers.) That
would mean checking the negotiated ALPN on both the server and client
side, and failing if it's not what we expect.

I'm not excited about the proliferation of connection options. I don't
have a lot of ideas on how to fix it, though, other than to note that
the current sslnegotiation option names are very unintuitive to me:
- "postgres": only legacy handshakes
- "direct": might be direct... or maybe legacy
- "requiredirect": only direct handshakes... unless other options are
enabled and then we fall back again to legacy? How many people willing
to break TLS compatibility with old servers via "requiredirect" are
going to be okay with lazy fallback to GSS or otherwise?

Heikki mentioned possibly hard-coding a TLS alert if direct SSL is
attempted without server TLS support. I think that's a cool idea, but
without an official "TLS not supported" alert code (which, honestly,
would be strange to standardize) I'm kinda -0.5 on it. If the client
tells me about a handshake_failure or similar, I'm going to start
investigating protocol versions and ciphersuites; I'm not going to
think to myself that maybe the server lacks TLS support altogether.
(Plus, we need to have a good error message when connecting to older
servers anyway. I think we should be able to key off of the EOF coming
back from OpenSSL; it'd be a good excuse to give that part of the code
some love.)

For the record, I'm adding some one-off tests for this feature to a
local copy of my OAuth pytest suite, which is designed to do the kinds
of testing you're running into trouble with. It's not in any way
viable for a PG17 commit, but if you're interested I can make the
patches available.

--Jacob

[1] https://www.rfc-editor.org/rfc/rfc8701.html
[2] https://alpaca-attack.com/libs.html

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

04 March 2024, 15:29:41

On 01/03/2024 23:49, Jacob Champion wrote:
> On Wed, Feb 28, 2024 at 4:10 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I think we'd want to *avoid* changing the major protocol version in a
>> way that would introduce a new roundtrip, though.
> 
> I'm starting to get up to speed with this patchset. So far I'm mostly
> testing how it works; I have yet to take an in-depth look at the
> implementation.

Thank you!

> I'll squint more closely at the MITM-protection changes in 0008 later.
> First impressions, though: it looks like that code has gotten much
> less straightforward, which I think is dangerous given the attack it's
> preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our
> protocol doesn't seem particularly replay-safe to me.)

Let's drop that patch. AFAICS it's not needed by the rest of the patches.

> If we're interested in ALPN negotiation in the future, we may also
> want to look at GREASE [1] to keep those options open in the presence
> of third-party implementations. Unfortunately OpenSSL doesn't do this
> automatically yet.

Can you elaborate? Do we need to do something extra in the server to be 
compatible with GREASE?

> If we don't have a reason not to, it'd be good to follow the strictest
> recommendations from [2] to avoid cross-protocol attacks. (For anyone
> currently running web servers and Postgres on the same host, they
> really don't want browsers "talking" to their Postgres servers.) That
> would mean checking the negotiated ALPN on both the server and client
> side, and failing if it's not what we expect.

Hmm, I thought that's what the patches does. But looking closer, libpq 
is not checking that ALPN was used. We should add that. Am I right?

> I'm not excited about the proliferation of connection options. I don't
> have a lot of ideas on how to fix it, though, other than to note that
> the current sslnegotiation option names are very unintuitive to me:
> - "postgres": only legacy handshakes
> - "direct": might be direct... or maybe legacy
> - "requiredirect": only direct handshakes... unless other options are
> enabled and then we fall back again to legacy? How many people willing
> to break TLS compatibility with old servers via "requiredirect" are
> going to be okay with lazy fallback to GSS or otherwise?

Yeah, this is my biggest complaint about all this. Not so much the names 
of the options, but the number of combinations of different options, and 
how we're going to test them all. I don't have any great solutions, 
except adding a lot of tests to cover them, like Matthias did.

> Heikki mentioned possibly hard-coding a TLS alert if direct SSL is
> attempted without server TLS support. I think that's a cool idea, but
> without an official "TLS not supported" alert code (which, honestly,
> would be strange to standardize) I'm kinda -0.5 on it. If the client
> tells me about a handshake_failure or similar, I'm going to start
> investigating protocol versions and ciphersuites; I'm not going to
> think to myself that maybe the server lacks TLS support altogether.

Agreed.

> (Plus, we need to have a good error message when connecting to older
> servers anyway.I think we should be able to key off of the EOF coming
> back from OpenSSL; it'd be a good excuse to give that part of the code
> some love.)

Hmm, if OpenSSL sends ClientHello and the server responds with a 
Postgres error packet, OpenSSL will presumably consume the error packet 
or at least part of it. But with our custom BIO, we can peek at the 
server response before handing it to OpenSSL.

If it helps, we could backport a nicer error message to old server 
versions, similar to what we did with SCRAM in commit 96d0f988b1.

> For the record, I'm adding some one-off tests for this feature to a
> local copy of my OAuth pytest suite, which is designed to do the kinds
> of testing you're running into trouble with. It's not in any way
> viable for a PG17 commit, but if you're interested I can make the
> patches available.

Yes please, it would be nice to see what tests you've performed, and 
have it archived.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

05 March 2024, 14:08:54

I hope I didn't joggle your elbow reviewing this, Jacob, but I spent 
some time rebase and fix various little things:

- Incorporated Matthias's test changes

- Squashed the client, server and documentation patches. Not much point 
in keeping them separate, as one requires the other, and if you're only 
interested e.g. in the server parts, just look at src/backend.

- Squashed some of my refactorings with the main patches, because I'm 
certain enough that they're desirable. I kept the last libpq state 
machine refactoring separate though. I'm pretty sure we need a 
refactoring like that, but I'm not 100% sure about the details.

- Added some comments to the new state machine logic in fe-connect.c.

- Removed the XXX comments about TLS alerts.

- Removed the "Allow pipelining data after ssl request" patch

- Reordered the patches so that the first two patches add the tests 
different combinations of sslmode, gssencmode and server support. That 
could be committed separately, without the rest of the patches. A later 
patch expands the tests for the new sslnegotiation option.


The tests are still not distinguishing whether a connection was 
established in direct or negotiated mode. So if we e.g. had a bug that 
accidentally disabled direct SSL connection completely and always used 
negotiated mode, the tests would still pass. I'd like to see some tests 
that would catch that.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

Re: Experiments with Postgres and SSL

From

Jacob Champion

Date:

05 March 2024, 18:57:31

On Tue, Mar 5, 2024 at 6:09 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> I hope I didn't joggle your elbow reviewing this

Nope, not at all!

> The tests are still not distinguishing whether a connection was
> established in direct or negotiated mode. So if we e.g. had a bug that
> accidentally disabled direct SSL connection completely and always used
> negotiated mode, the tests would still pass. I'd like to see some tests
> that would catch that.

+1

On Mon, Mar 4, 2024 at 7:29 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 01/03/2024 23:49, Jacob Champion wrote:
> > I'll squint more closely at the MITM-protection changes in 0008 later.
> > First impressions, though: it looks like that code has gotten much
> > less straightforward, which I think is dangerous given the attack it's
> > preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our
> > protocol doesn't seem particularly replay-safe to me.)
>
> Let's drop that patch. AFAICS it's not needed by the rest of the patches.

Okay, sounds good.

> > If we're interested in ALPN negotiation in the future, we may also
> > want to look at GREASE [1] to keep those options open in the presence
> > of third-party implementations. Unfortunately OpenSSL doesn't do this
> > automatically yet.
>
> Can you elaborate?

Sure: now that we're letting middleboxes and proxies inspect and react
to connections based on ALPN, it's possible that some intermediary
might incorrectly fixate on the "postgres" ID (or whatever we choose
in the end), and shut down connections that carry additional protocols
rather than ignoring them. That would prevent future graceful upgrades
where the client sends both "postgres/X" and "postgres/X+1". While
that wouldn't be our fault, it'd be cold comfort to whoever has that
middlebox.

GREASE is a set of reserved protocol IDs that you can add randomly to
your ALPN list, so any middleboxes that fail to follow the rules will
just break outright rather than silently proliferating. (Hence the
pun: GREASE keeps the joints in the pipe from rusting into place.) The
RFC goes into more detail about how to do it. And I don't know if it's
necessary for a v1, but it'd be something to keep in mind.

> Do we need to do something extra in the server to be
> compatible with GREASE?

No, I think that as long as we use OpenSSL's APIs correctly on the
server side, we'll be compatible by default. This would be a
client-side implementation, to push random GREASE strings into the
ALPN list. (There is a risk that if/when OpenSSL finally starts
supporting this transparently, we'd need to remove it from our code.)

> > If we don't have a reason not to, it'd be good to follow the strictest
> > recommendations from [2] to avoid cross-protocol attacks. (For anyone
> > currently running web servers and Postgres on the same host, they
> > really don't want browsers "talking" to their Postgres servers.) That
> > would mean checking the negotiated ALPN on both the server and client
> > side, and failing if it's not what we expect.
>
> Hmm, I thought that's what the patches does. But looking closer, libpq
> is not checking that ALPN was used. We should add that. Am I right?

Right. Also, it looks like the server isn't failing the TLS handshake
itself, but instead just dropping the connection after the handshake.
In a cross-protocol attack, there's a danger that the client (which is
not speaking our protocol) could still treat the server as
authoritative in that situation.

> > I'm not excited about the proliferation of connection options. I don't
> > have a lot of ideas on how to fix it, though, other than to note that
> > the current sslnegotiation option names are very unintuitive to me:
> > - "postgres": only legacy handshakes
> > - "direct": might be direct... or maybe legacy
> > - "requiredirect": only direct handshakes... unless other options are
> > enabled and then we fall back again to legacy? How many people willing
> > to break TLS compatibility with old servers via "requiredirect" are
> > going to be okay with lazy fallback to GSS or otherwise?
>
> Yeah, this is my biggest complaint about all this. Not so much the names
> of the options, but the number of combinations of different options, and
> how we're going to test them all. I don't have any great solutions,
> except adding a lot of tests to cover them, like Matthias did.

The default gssencmode=prefer is especially problematic if I'm trying
to use sslnegotiation=requiredirect for security. It'll appear to work
at first, but if somehow I get a credential cache into my environment,
libpq will suddenly fall back to plaintext negotiation :(

> > (Plus, we need to have a good error message when connecting to older
> > servers anyway.I think we should be able to key off of the EOF coming
> > back from OpenSSL; it'd be a good excuse to give that part of the code
> > some love.)
>
> Hmm, if OpenSSL sends ClientHello and the server responds with a
> Postgres error packet, OpenSSL will presumably consume the error packet
> or at least part of it. But with our custom BIO, we can peek at the
> server response before handing it to OpenSSL.

I don't think an error packet is going to come back with the
currently-shipped implementations. IIUC, COMMERROR packets are
swallowed instead of emitted before authentication completes. So I see
EOFs when trying to connect to older servers. Do you know of any
situations where we'd see an actual error message on the wire?

> If it helps, we could backport a nicer error message to old server
> versions, similar to what we did with SCRAM in commit 96d0f988b1.

That might be nice regardless, instead of pushing "invalid length of
startup packet" into the logs.

> > For the record, I'm adding some one-off tests for this feature to a
> > local copy of my OAuth pytest suite, which is designed to do the kinds
> > of testing you're running into trouble with. It's not in any way
> > viable for a PG17 commit, but if you're interested I can make the
> > patches available.
>
> Yes please, it would be nice to see what tests you've performed, and
> have it archived.

I've cleaned it up a bit and put it up at [1]. (If you want, I can
attach the GitHub-generated ZIP, so the mailing list has a snapshot.)

These include happy-path tests for direct SSL, some failure modes, and
an example test that combines the GSS and SSL negotiation paths. So
there might be test bugs, but with the v8 patchset, I see the
following failures:

> FAILED client/test_tls.py::test_direct_ssl_without_alpn - AssertionError: client sent unexpected data

I.e. the client doesn't disconnect if the server doesn't select our protocol.

> FAILED client/test_tls.py::test_direct_ssl_failed_negotiation[direct-True] - AssertionError: Regex pattern did not
match.
> FAILED client/test_tls.py::test_direct_ssl_failed_negotiation[requiredirect-False] - AssertionError: Regex pattern
didnot match. 
> FAILED client/test_tls.py::test_gssapi_negotiation - AssertionError: Regex pattern did not match.

These are complaining about the "SSL SYSCALL error: EOF detected"
error messages that the client returns.

> FAILED server/test_tls.py::test_direct_ssl_without_alpn[no application protocols] - Failed: DID NOT RAISE <class
'ssl.SSLError'>
> FAILED server/test_tls.py::test_direct_ssl_without_alpn[incorrect application protocol] - Failed: DID NOT RAISE
<class'ssl.SSLError'> 

I.e. the server allows the handshake to complete without a proper ALPN
selection.

Thanks,
--Jacob

[1] https://github.com/jchampio/pg-pytest-suite

Re: Experiments with Postgres and SSL

From

Jacob Champion

Date:

05 March 2024, 22:30:40

I keep forgetting -- attached is the diff I'm carrying to plug
libpq_encryption into Meson. (The current patchset has a meson.build
for it, but it's not connected.)

--Jacob

Attachment

meson.diff.txt

Re: Experiments with Postgres and SSL

From

Matthias van de Meent

Date:

28 March 2024, 11:15:27

On Tue, 5 Mar 2024 at 15:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> I hope I didn't joggle your elbow reviewing this, Jacob, but I spent
> some time rebase and fix various little things:

With the recent changes to backend startup committed by you, this
patchset has gotten major apply failures.

Could you provide a new version of the patchset so that it can be
reviewed in the context of current HEAD?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

28 March 2024, 12:37:07

On 28/03/2024 13:15, Matthias van de Meent wrote:
> On Tue, 5 Mar 2024 at 15:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> I hope I didn't joggle your elbow reviewing this, Jacob, but I spent
>> some time rebase and fix various little things:
> 
> With the recent changes to backend startup committed by you, this
> patchset has gotten major apply failures.
> 
> Could you provide a new version of the patchset so that it can be
> reviewed in the context of current HEAD?

Here you are.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

Re: Experiments with Postgres and SSL

From

Matthias van de Meent

Date:

04 April 2024, 11:08:08

On Thu, 28 Mar 2024, 13:37 Heikki Linnakangas, <hlinnaka@iki.fi> wrote:
>
> On 28/03/2024 13:15, Matthias van de Meent wrote:
> > On Tue, 5 Mar 2024 at 15:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >>
> >> I hope I didn't joggle your elbow reviewing this, Jacob, but I spent
> >> some time rebase and fix various little things:
> >
> > With the recent changes to backend startup committed by you, this
> > patchset has gotten major apply failures.
> >
> > Could you provide a new version of the patchset so that it can be
> > reviewed in the context of current HEAD?
>
> Here you are.

Sorry for the delay. I've run some tests and didn't find any specific
issues in the patchset.

I did get sidetracked on trying to further improve the test suite,
where I was trying to find out how to use Test::More::subtests, but
have now decided it's not worth the lost time now vs adding this as a
feature in 17.

Some remaining comments:

patches 0001/0002: not reviewed in detail.

Patch 0003:

The read size in secure_raw_read is capped to port->raw_buf_remaining
if the raw buf has any data. While the user will probably call into
this function again, I think that's a waste of cycles.

pq_buffer_has_data now doesn't have any protections against
desynchronized state between PqRecvLength and PqRecvPointer. An
Assert(PqRecvLength >= PqRecvPointer) to that value would be
appreciated.

(in backend_startup.c)
> +         elog(LOG, "Detected direct SSL handshake");

I think this should be gated at a lower log level, or a GUC, as this
wouls easily DOS a logfile by bulk sending of SSL handshake bytes.

0004:

backend_startup.c
> +        if (!ssl_enable_alpn)
> +        {
> +            elog(WARNING, "Received direct SSL connection without ssl_enable_alpn enabled");

This is too verbose, too.

> +       if (!port->alpn_used)
> +        {
> +            ereport(COMMERROR,
> +                    (errcode(ERRCODE_PROTOCOL_VIOLATION),
> +                     errmsg("Received direct SSL connection request without required ALPN protocol negotiation
extension")));

If ssl_enable_alpn is disabled, we shouln't report a COMMERROR when
the client does indeed not have alpn enabled.

0005:

As mentioned above, I'd have loved to use subtests here for the cube()
of tests, but I got in too much of a rabbit hole to get that done.

0006:

In CONNECTION_FAILED, we use connection_failed() to select whether we
need a new connection or stop trying altogether, but that function's
description states:

> + * Out-of-line portion of the CONNECTION_FAILED() macro
> + *
> + * Returns true, if we should retry the connection with different encryption method.

Which to me reads like we should reuse the connection, and try a
different method on that same connection. Maybe we can improve the
wording to something like
+ * Returns true, if we should reconnect with a different encryption method.
to make the reconnect part more clear.

In select_next_encryption_method, there are several copies of this pattern:

if ((remaining_methods & ENC_METHOD) != 0)
{
    conn->current_enc_method = ENC_METHOD;
    return true;
}

I think a helper macro would reduce the verbosity of the scaffolding,
like in the attached SELECT_NEXT_METHOD.diff.txt.

Kind regards,

Matthias van de Meent

Attachment

SELECT_NEXT_METHOD.diff.txt

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

08 April 2024, 01:25:54

Committed this. Thank you to everyone involved!

On 04/04/2024 14:08, Matthias van de Meent wrote:
> Patch 0003:
> 
> The read size in secure_raw_read is capped to port->raw_buf_remaining
> if the raw buf has any data. While the user will probably call into
> this function again, I think that's a waste of cycles.

Hmm, yeah, I suppose we could read more data in the same call. It seems 
simpler not to. The case that "raw_buf_remaining > 0" is a very rare.

> pq_buffer_has_data now doesn't have any protections against
> desynchronized state between PqRecvLength and PqRecvPointer. An
> Assert(PqRecvLength >= PqRecvPointer) to that value would be
> appreciated.

Added.

> 0006:
> 
> In CONNECTION_FAILED, we use connection_failed() to select whether we
> need a new connection or stop trying altogether, but that function's
> description states:
> 
>> + * Out-of-line portion of the CONNECTION_FAILED() macro
>> + *
>> + * Returns true, if we should retry the connection with different encryption method.
> 
> Which to me reads like we should reuse the connection, and try a
> different method on that same connection. Maybe we can improve the
> wording to something like
> + * Returns true, if we should reconnect with a different encryption method.
> to make the reconnect part more clear.

Changed to "Returns true, if we should reconnect and retry with a 
different encryption method".

> In select_next_encryption_method, there are several copies of this pattern:
> 
> if ((remaining_methods & ENC_METHOD) != 0)
> {
>      conn->current_enc_method = ENC_METHOD;
>      return true;
> }
> 
> I think a helper macro would reduce the verbosity of the scaffolding,
> like in the attached SELECT_NEXT_METHOD.diff.txt.

Applied.

In addition to the above, I made heavy changes to the tests. I wanted to 
test not just the outcome (SSL, GSSAPI, plaintext, or fail), but also 
the steps and reconnections needed to get there. To facilitate that, I 
rewrote how the expected outcome was represented in the test script. It 
now uses a table-driven approach, with a line for each test iteration, 
ie. for each different combination of options that are tested.

I then added some more logging, so that whenever the server receives an 
SSLRequest or GSSENCRequest packet, it logs a line. That's controlled by 
a new not-in-sample GUC ("trace_connection_negotiation"), intended only 
for the test and debugging. The test scrapes the log for the lines that 
it prints, and the expected output includes a compact trace of expected 
events. For example, the expected output for "user=testuser 
gssencmode=prefer sslmode=prefer sslnegotiation=direct", when GSS and 
SSL are both disabled in the server, looks like this:

# USER      GSSENCMODE  SSLMODE   SSLNEGOTIATION  EVENTS -> OUTCOME
testuser    prefer      prefer    direct          connect, 
directsslreject, reconnect, sslreject, authok  -> plain

That means, we expect libpq to first try direct SSL, which is rejected 
by the server. It should then reconnect and attempt traditional 
negotiated SSL, which is also rejected. Finally, it should try plaintext 
authentication, without reconnecting, which succeeds.

That actually revealed a couple of slightly bogus behaviors with the 
current code. Here's one example:

# XXX: libpq retries the connection unnecessarily in this case:
nogssuser   require     allow     connect, gssaccept, authfail, 
reconnect, gssaccept, authfail -> fail

That means, with "gssencmode=require sslmode=allow", if the server 
accepts the GSS encryption but refuses the connection at authentication, 
libpq will reconnect and go through the same motions again. The second 
attempt is pointless, we know it's going to fail. The refactoring to the 
libpq state machine fixed that issue as a side-effect.

I removed the server ssl_enable_alpn and libpq sslalpn options. The idea 
was that they could be useful for testing, but we didn't actually have 
any tests that would use them, and you get the same result by testing 
with an older server or client version. I'm open to adding them back if 
we also add tests that benefit from them, but they were pretty pointless 
as they were.

One important open item now is that we need to register a proper ALPN 
protocol ID with IANA.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Experiments with Postgres and SSL

From

Tom Lane

Date:

08 April 2024, 01:28:31

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Committed this. Thank you to everyone involved!

Looks like perlcritic isn't too happy with the test code:
koel and crake say

./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of flagged function ignored - chmod at line
138,column 2.  See pages 208,278 of PBP.  ([InputOutput::RequireCheckedSyscalls] Severity: 5) 
./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of flagged function ignored - open at line 184,
column1.  See pages 208,278 of PBP.  ([InputOutput::RequireCheckedSyscalls] Severity: 5) 


            regards, tom lane

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

08 April 2024, 01:40:00

On 08/04/2024 04:28, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> Committed this. Thank you to everyone involved!
> 
> Looks like perlcritic isn't too happy with the test code:
> koel and crake say
> 
> ./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of flagged function ignored - chmod at line
138,column 2.  See pages 208,278 of PBP.  ([InputOutput::RequireCheckedSyscalls] Severity: 5)
 
> ./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of flagged function ignored - open at line
184,column 1.  See pages 208,278 of PBP.  ([InputOutput::RequireCheckedSyscalls] Severity: 5)
 

Fixed, thanks.

I'll make a note in my personal TODO list to add perlcritic to cirrus CI 
if possible. I rely heavily on that nowadays to catch issues before the 
buildfarm.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

08 April 2024, 08:38:57

On 08/04/2024 04:25, Heikki Linnakangas wrote:
> One important open item now is that we need to register a proper ALPN
> protocol ID with IANA.

I sent a request for that: 
https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Experiments with Postgres and SSL

From

Peter Eisentraut

Date:

24 April 2024, 20:51:29

On 08.04.24 10:38, Heikki Linnakangas wrote:
> On 08/04/2024 04:25, Heikki Linnakangas wrote:
>> One important open item now is that we need to register a proper ALPN
>> protocol ID with IANA.
> 
> I sent a request for that: 
> https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/

Why did you ask for "pgsql"?  The IANA protocol name for port 5432 is 
"postgres".  This seems confusing.

Re: Experiments with Postgres and SSL

From

Peter Eisentraut

Date:

24 April 2024, 20:57:08

On 01.03.24 22:49, Jacob Champion wrote:
> If we're interested in ALPN negotiation in the future, we may also
> want to look at GREASE [1] to keep those options open in the presence
> of third-party implementations. Unfortunately OpenSSL doesn't do this
> automatically yet.
> 
> If we don't have a reason not to, it'd be good to follow the strictest
> recommendations from [2] to avoid cross-protocol attacks. (For anyone
> currently running web servers and Postgres on the same host, they
> really don't want browsers "talking" to their Postgres servers.) That
> would mean checking the negotiated ALPN on both the server and client
> side, and failing if it's not what we expect.

I've been reading up on ALPN.  There is another thread that is 
discussing PostgreSQL protocol version negotiation, and ALPN also has 
"protocol negotiation" in the name and there is some discussion in this 
thread about the granularity oft the protocol names.

I'm concerned that there appears to be some confusion over whether ALPN 
is a performance feature or a security feature.  RFC 7301 appears to be 
pretty clear that it's for performance, not for security.

Looking at the ALPACA attack, I'm not convinced that it's very relevant 
for PostgreSQL.  It's basically just a case of, you connected to the 
wrong server.  And web browsers routinely open additional connections 
based on what data they have previously received, and they liberally 
send along session cookies to those new connections, so I understand 
that this can be a problem.  But I don't see how ALPN is a good defense. 
  It can help only if all other possible services other than http 
implement it and say, you're a web browser, go away.  And what if the 
rogue server is in fact a web server, then it doesn't help at all.  I 
guess there could be some common configurations where there is a web 
server, and ftp server, and some mail servers running on the same TLS 
end point.  But in how many cases is there also a PostgreSQL server 
running on the same end point?  The page about ALPACA also suggests SNI 
as a mitigation, which seems more sensible, because the burden is then 
on the client to do the right thing, and not on all other servers to 
send away clients doing the wrong thing.  And of course libpq already 
supports SNI.

For the protocol negotiation aspect, how does this work if the wrapped 
protocol already has a version negotiation system?  For example, various 
HTTP versions are registered as separate protocols for ALPN.  What if 
ALPN says it's HTTP/1.0 but the actual HTTP requests specify 1.1, or 
vice versa?  What is the actual mechanism where the performance benefits 
(saving round-trips) are created?  I haven't caught up with HTTP 2 and 
so on, so maybe there are additional things at play there, but it is not 
fully explained in the RFCs.  I suppose PostgreSQL would keep its 
internal protocol version negotiation in any case, but then what do we 
need ALPN on top for?

Re: Experiments with Postgres and SSL

From

Jacob Champion

Date:

25 April 2024, 00:57:21

On Wed, Apr 24, 2024 at 1:57 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> I'm concerned that there appears to be some confusion over whether ALPN
> is a performance feature or a security feature.  RFC 7301 appears to be
> pretty clear that it's for performance, not for security.

It was also designed to give benefits for more complex topologies
(proxies, cert selection, etc.), but yeah, this is a mitigation
technique that just uses what is already widely implemented.

> Looking at the ALPACA attack, I'm not convinced that it's very relevant
> for PostgreSQL.  It's basically just a case of, you connected to the
> wrong server.

I think that's an oversimplification. This prevents active MITM, where
an adversary has connected you to the wrong server.

> But I don't see how ALPN is a good defense.
>   It can help only if all other possible services other than http
> implement it and say, you're a web browser, go away.

Why? An ALPACA-aware client will fail the connection if the server
doesn't advertise the correct protocol. An ALPACA-aware server will
fail the handshake if the client doesn't advertise the correct
protocol. They protect themselves, and their peers, without needing
their peers to understand.

> And what if the
> rogue server is in fact a web server, then it doesn't help at all.

It's not a rogue server; the attack is using other friendly services
against you. If you're able to set up an attacker-controlled server,
using the same certificate as the valid server, on a host covered by
the cert, I think it's game over for many other reasons.

If you mean that you can't prevent an attacker from redirecting one
web server's traffic to another (friendly) web server that's running
on the same host, that's correct. Web admins who care would need to
implement countermeasures, like Origin header filtering or something?
I don't think we have a similar concept to that -- it'd be nice! --
but we don't need to have one in order to provide protection for the
other network protocols we exist next to.

> I
> guess there could be some common configurations where there is a web
> server, and ftp server, and some mail servers running on the same TLS
> end point.  But in how many cases is there also a PostgreSQL server
> running on the same end point?

Not only have I seen those cohosted, I've deployed such setups myself.
Isn't that basically cPanel's MO, and a standard setup for <shared web
hosting provider here>? (It's been a while and I don't have a setup
handy to double-check, sorry; feel free to push back if they don't do
that anymore.)

A quick search for "running web server and Postgres on the same host"
seems to yield plenty of conversations. Some of those conversations
say "don't do it", but of course others do not :) Some actively
encourage it for simplicity.

> The page about ALPACA also suggests SNI
> as a mitigation, which seems more sensible, because the burden is then
> on the client to do the right thing, and not on all other servers to
> send away clients doing the wrong thing.  And of course libpq already
> supports SNI.

That mitigates a different attack. From the ALPACA site [1]:

> Implementing these [ALPN] countermeasures is effective in preventing cross-protocol attacks irregardless of hostnames
andports used for application servers. 
> ...
> Implementing these [SNI] countermeasures is effective in preventing same-protocol attacks on servers with different
hostnames,as well as cross-protocol attacks on servers with different hostnames even if the ALPN countermeasures can
notbe implemented. 

SNI is super useful; it's just not always enough. And a strict SNI
check would also be good to do, but it doesn't seem imperative to tie
it to this feature, since same-protocol attacks were already possible
AFAICT. It's the cross-protocol attacks that are new, made possible by
the new handshake.

> For the protocol negotiation aspect, how does this work if the wrapped
> protocol already has a version negotiation system?  For example, various
> HTTP versions are registered as separate protocols for ALPN.  What if
> ALPN says it's HTTP/1.0 but the actual HTTP requests specify 1.1, or
> vice versa?

If a client or server incorrectly negotiates a protocol and then
starts speaking a different one, then it's just protocol-dependent
whether that works or not. HTTP/1.0 and HTTP/1.1 would still be
cross-compatible in some cases. The others, not so much.

> What is the actual mechanism where the performance benefits
> (saving round-trips) are created?

The negotiation gets done as part of the TLS handshake, which had to
be done anyway.

> I haven't caught up with HTTP 2 and
> so on, so maybe there are additional things at play there, but it is not
> fully explained in the RFCs.

Practically speaking, HTTP/2 is negotiated via ALPN in the real world,
at least last I checked. I don't think browsers ever supported the
plaintext h2c:// scheme. There's also an in-band `Upgrade: h2c` path
defined that does not use ALPN at all, but again I don't think any
browsers use it.

> I suppose PostgreSQL would keep its
> internal protocol version negotiation in any case, but then what do we
> need ALPN on top for?

That is entirely up to us. If there's a 4.0 protocol that's completely
incompatible at the network level (multiplexing? QUIC?) then issuing a
new ALPN would probably be useful.

Thanks,
--Jacob

[1] https://alpaca-attack.com/libs.html

Re: Experiments with Postgres and SSL

From

Heikki Linnakangas

Date:

25 April 2024, 07:07:54

On 24/04/2024 23:51, Peter Eisentraut wrote:
> On 08.04.24 10:38, Heikki Linnakangas wrote:
>> On 08/04/2024 04:25, Heikki Linnakangas wrote:
>>> One important open item now is that we need to register a proper ALPN
>>> protocol ID with IANA.
>>
>> I sent a request for that:
>> https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/
> 
> Why did you ask for "pgsql"?  The IANA protocol name for port 5432 is
> "postgres".  This seems confusing.

Oh, I was not aware of that. According to [1], it's actually 
"postgresql". The ALPN registration has not been approved yet, so I'll 
reply on the ietf thread to point that out.

[1] 
https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt

-- 
Heikki Linnakangas
Neon (https://neon.tech)