Thread: Experiments with Postgres and SSL
I had a conversation a while back with Heikki where he expressed that it was annoying that we negotiate SSL/TLS the way we do since it introduces an extra round trip. Aside from the performance optimization I think accepting standard TLS connections would open the door to a number of other opportunities that would be worth it on their own. So I took a look into what it would take to do and I think it would actually be quite feasible. The first byte of a standard TLS connection can't look anything like the first byte of any flavour of Postgres startup packet because it would be the high order bits of the length so unless we start having multi-megabyte startup packets.... So I put together a POC patch and it's working quite well and didn't require very much kludgery. Well, it required some but it's really not bad. I do have a bug I'm still trying to work out and the code isn't quite in committable form but I can send the POC patch. Other things it would open the door to in order from least controversial to most.... * Hiding Postgres behind a standard SSL proxy terminating SSL without implementing the Postgres protocol. * "Service Mesh" type tools that hide multiple services behind a single host/port ("Service Mesh" is just a new buzzword for "proxy"). * Browser-based protocol implementations using websockets for things like pgadmin or other tools to connect directly to postgres using Postgres wire protocol but using native SSL implementations. * Postgres could even implement an HTTP based version of its protocol and enable things like queries or browser based tools using straight up HTTP requests so they don't need to use websockets. * Postgres could implement other protocols to serve up data like status queries or monitoring metrics, using HTTP based standard protocols instead of using our own protocol. Incidentally I find the logic in ProcessStartupPacket incredibly confusing. It took me a while before I realized it's using tail recursion to implement the startup logic. I think it would be way more straightforward and extensible if it used a much more common iterative style. I think it would make it possible to keep more state than just ssl_done and gss_done without changing the function signature every time for example. -- greg
On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote: > > So I took a look into what it would take to do and I think it would > actually be quite feasible. The first byte of a standard TLS > connection can't look anything like the first byte of any flavour of > Postgres startup packet because it would be the high order bits of the > length so unless we start having multi-megabyte startup packets.... > This is a fascinating idea! I like it a lot. But..do we have to treat any unknown start sequence of bytes as a TLS connection? Or is there some definite subset of possible first bytes that clearly indicates that this is a TLS connection or not? Best regards, Andrey Borodin.
On Thu, 19 Jan 2023 at 00:45, Andrey Borodin <amborodin86@gmail.com> wrote: > But..do we have to treat any unknown start sequence of bytes as a TLS > connection? Or is there some definite subset of possible first bytes > that clearly indicates that this is a TLS connection or not? Absolutely not, there's only one MessageType that can initiate a connection, ClientHello, so the initial byte has to be a specific value. (0x16) And probably to implement HTTP/Websocket it would probably only peek at the first byte and check for things like G(ET) and H(EAD) and so on, possibly only over SSL but in theory it could be over any connection if the request comes before the startup packet. Personally I'm motivated by wanting to implement status and monitoring data for things like Prometheus and the like. For that it would just be simple GET queries to recognize. But tunneling pg wire protocol over websockets sounds cool but not really something I know a lot about. I note that Neon is doing something similar with a proxy: https://neon.tech/blog/serverless-driver-for-postgres -- greg
It would be great if PostgreSQL supported 'start with TLS', however, how could clients activate the feature?
I would like to refrain users from configuring the handshake mode, and I would like to refrain from degrading performance when a new client talks to an old database.
What if the server that supports 'fast TLS' added an extra notification in case client connects with a classic TLS?
Then a capable client could remember host:port and try with newer TLS appoach the next time it connects.
It would be transparent to the clients, and the users won't need to configure 'prefer classic or fast TLS'
The old clients could discard the notification.
Vladimir
Vladimir
On Thu, 19 Jan 2023 at 15:49, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote: > > What if the server that supports 'fast TLS' added an extra notification in case client connects with a classic TLS? > Then a capable client could remember host:port and try with newer TLS appoach the next time it connects. > > It would be transparent to the clients, and the users won't need to configure 'prefer classic or fast TLS' > The old clients could discard the notification. Hm. I hadn't really thought about the case of a new client connecting to an old server. I don't think it's worth implementing a code path in the server like this as it would then become cruft that would be hard to ever get rid of. I think you can do the same thing, more or less, in the client. Like if the driver tries to connect via SSL and gets an error it remembers that host/port and connects using negotiation in the future. In practice though, by the time drivers support this it'll probably be far enough in the future that they can just enable it and you can disable it if you're connecting to an old server. The main benefit for the near term is going to be clients that are specifically designed to take advantage of it because it's necessary to enable the environment they need -- like monitoring tools and proxies. I've attached the POC. It's not near committable, mainly because of the lack of any proper interface to the added fields in Port. I actually had a whole API but ripped it out while debugging because it wasn't working out. But here's an example of psql connecting to the same server via negotiated SSL or through stunnel where stunnel establishes the SSL connection and psql is just doing plain text: stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql 'postgresql://localhost:9432/postgres' psql (16devel) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off) Type "help" for help. postgres=# select * from pg_stat_ssl; pid | ssl | version | cipher | bits | client_dn | client_serial | issuer_dn -------+-----+---------+------------------------+------+-----------+---------------+----------- 48771 | t | TLSv1.3 | TLS_AES_256_GCM_SHA384 | 256 | | | (1 row) postgres=# \q stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql 'postgresql://localhost:8999/postgres' psql (16devel) Type "help" for help. postgres=# select * from pg_stat_ssl; pid | ssl | version | cipher | bits | client_dn | client_serial | issuer_dn -------+-----+---------+------------------------+------+-----------+---------------+----------- 48797 | t | TLSv1.3 | TLS_AES_256_GCM_SHA384 | 256 | | | (1 row) -- greg
Attachment
On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote: > I had a conversation a while back with Heikki where he expressed that > it was annoying that we negotiate SSL/TLS the way we do since it > introduces an extra round trip. Aside from the performance > optimization I think accepting standard TLS connections would open the > door to a number of other opportunities that would be worth it on > their own. Nice! I want this too, but for security reasons [1] -- I want to be able to turn off negotiated (explicit) TLS, to force (implicit) TLS-only mode. > Other things it would open the door to in order from least > controversial to most.... > > * Hiding Postgres behind a standard SSL proxy terminating SSL without > implementing the Postgres protocol. +1 > * "Service Mesh" type tools that hide multiple services behind a > single host/port ("Service Mesh" is just a new buzzword for "proxy"). If you want to multiplex protocols on a port, now is an excellent time to require clients to use ALPN on implicit-TLS connections. (There are no clients that can currently connect via implicit TLS, so you'll never have another chance to force the issue without breaking backwards compatibility.) That should hopefully make it harder to ALPACA yourself or others [2]. ALPN doesn't prevent cross-port attacks though, and speaking of those... > * Browser-based protocol implementations using websockets for things > like pgadmin or other tools to connect directly to postgres using > Postgres wire protocol but using native SSL implementations. > > * Postgres could even implement an HTTP based version of its protocol > and enable things like queries or browser based tools using straight > up HTTP requests so they don't need to use websockets. > > * Postgres could implement other protocols to serve up data like > status queries or monitoring metrics, using HTTP based standard > protocols instead of using our own protocol. I see big red warning lights going off in my head -- in a previous life, I got to fix vulnerabilities that resulted from bolting HTTP onto existing protocol servers. Not only do you opt into the browser security model forever, you also gain the ability to speak for any other web server already running on the same host. (I know you have PG committers who are also HTTP experts, and I think you were hacking on mod_perl well before I knew web servers existed. Just... please be careful. ;D ) > Incidentally I find the logic in ProcessStartupPacket incredibly > confusing. It took me a while before I realized it's using tail > recursion to implement the startup logic. I think it would be way more > straightforward and extensible if it used a much more common iterative > style. I think it would make it possible to keep more state than just > ssl_done and gss_done without changing the function signature every > time for example. +1. The complexity of the startup logic, both client- and server-side, is a big reason why I want implicit TLS in the first place. That way, bugs in that code can't be exploited before the TLS handshake completes. Thanks! --Jacob [1] https://www.postgresql.org/message-id/flat/fcc3ebeb7f05775b63f3207ed52a54ea5d17fb42.camel%40vmware.com [2] https://alpaca-attack.com/
>I don't think it's worth implementing a code path in
> the server like this as it would then become cruft that would be hard
> to ever get rid of.
Do you think the server can de-support the old code path soon?
> I think you can do the same thing, more or less, in the client. Like
> if the driver tries to connect via SSL and gets an error it remembers
> that host/port and connects using negotiation in the future.
Well, I doubt everybody would instantaneously upgrade to the database that supports fast TLS,
so there will be a timeframe when there will be a lot of old databases, and the clients will be new.
In that case, going with "try fast, ignore exception" would degrade performance for old databases.
> the server like this as it would then become cruft that would be hard
> to ever get rid of.
Do you think the server can de-support the old code path soon?
> I think you can do the same thing, more or less, in the client. Like
> if the driver tries to connect via SSL and gets an error it remembers
> that host/port and connects using negotiation in the future.
Well, I doubt everybody would instantaneously upgrade to the database that supports fast TLS,
so there will be a timeframe when there will be a lot of old databases, and the clients will be new.
In that case, going with "try fast, ignore exception" would degrade performance for old databases.
I see you suggest caching, however, "degrading one of the cases" might be more painful than
"not improving one of the cases".
I would like to refrain from implementing "parallel connect both ways and check which is faster" in
> In practice though, by the time drivers support this it'll probably be
> far enough in the future
I would like to refrain from implementing "parallel connect both ways and check which is faster" in
PG clients (e.g. https://en.wikipedia.org/wiki/Happy_Eyeballs ).
Just wondering: do you consider back-porting the feature to all supported DB versions?
Just wondering: do you consider back-porting the feature to all supported DB versions?
> In practice though, by the time drivers support this it'll probably be
> far enough in the future
I think drivers release more often than the database, and we can get driver support even before the database releases.
I'm from pgjdbc Java driver team, and I think it is unfair to suggest that "driver support is only far enough in the future".
Vladimir
On Fri, 20 Jan 2023 at 01:41, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote: > > Do you think the server can de-support the old code path soon? I don't have any intention to de-support anything. I really only picture it being an option in environments where the client and server are all part of a stack controlled by a single group. User tools and general purpose tools are better served by our current more flexible setup. > Just wondering: do you consider back-porting the feature to all supported DB versions? I can't see that, no. > > In practice though, by the time drivers support this it'll probably be > > far enough in the future > > I think drivers release more often than the database, and we can get driver support even before the database releases. > I'm from pgjdbc Java driver team, and I think it is unfair to suggest that "driver support is only far enough in the future". Interesting. I didn't realize this would be so attractive to regular driver authors. I did think of the Happy Eyeballs technique too but I agree I wouldn't want to go that way either :) I guess the server doesn't really have to do anything specific to do what you want. You could just hard code that servers newer than a specific version would have this support. Or it could be done with a "protocol option" -- which wouldn't actually change any behaviour but would be rejected if the server doesn't support "fast ssl" giving you the feedback you expect without having much extra legacy complexity. I guess a lot depends on the way the driver works and the way the application is structured. Applications that make a single connection or don't have shared state across connections wouldn't think this way. And interfaces like libpq would normally just leave it up to the application to make choices like this. But I guess JVM based applications are more likely to have long-lived systems that make many connections and also more likely to make it the driver's responsibility to manage such things. -- greg
>You could just hard code that servers newer than a > specific version would have this support Suppose PostgreSQL 21 implements "fast TLS" Suppose pgjdbc 43 supports "fast TLS" Suppose PgBouncer 1.17.0 does not support "fast TLS" yet If pgjdbc connects to the DB via balancer, then the server would respond with "server_version=21". The balancer would forward "server_version", so the driver would assume "fast TLS is supported". In practice, fast TLS can't be used in that configuration since the connection will fail when the driver attempts to ask "fast TLS" from the PgBouncer. > Or it could be done with a "protocol option" Would you please clarify what you mean by "protocol option"? >I guess a lot depends on the way the driver works and the way the > application is structured There are cases when applications pre-create connections on startup, so the faster connections are created the better. The same case happens when the admin issues "reset connection pool", so it discards old connections and creates new ones. People rarely know all the knobs, so I would like to have a "fast by default" design (e.g. server sending a notification "you may use fast mode the next time") rather than "keep old behaviour and require everybody to add fast=true to their configuration" (e.g. users having to configure "try_fast_tls_first=true") Vladimir
On 20/01/2023 03:28, Jacob Champion wrote: > On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote: >> * "Service Mesh" type tools that hide multiple services behind a >> single host/port ("Service Mesh" is just a new buzzword for "proxy"). > > If you want to multiplex protocols on a port, now is an excellent time > to require clients to use ALPN on implicit-TLS connections. (There are > no clients that can currently connect via implicit TLS, so you'll > never have another chance to force the issue without breaking > backwards compatibility.) That should hopefully make it harder to > ALPACA yourself or others [2]. Good idea. Do we want to just require the protocol to be "postgres", or perhaps "postgres/3.0"? Need to register that with IANA, I guess. We implemented a protocol version negotiation mechanism in the libpq protocol itself, how would this interact with it? If it's just "postgres", then I guess we'd still negotiate the protocol version and list of extensions after the TLS handshake. >> Incidentally I find the logic in ProcessStartupPacket incredibly >> confusing. It took me a while before I realized it's using tail >> recursion to implement the startup logic. I think it would be way more >> straightforward and extensible if it used a much more common iterative >> style. I think it would make it possible to keep more state than just >> ssl_done and gss_done without changing the function signature every >> time for example. > > +1. The complexity of the startup logic, both client- and server-side, > is a big reason why I want implicit TLS in the first place. That way, > bugs in that code can't be exploited before the TLS handshake > completes. +1. We need to support explicit TLS for a long time, so we can't simplify by just removing it. But let's refactor the code somehow, to make it more clear. Looking at the patch, I think it accepts an SSLRequest packet even if implicit TLS has already been established. That's surely wrong, and shows how confusing the code is. (Or I'm reading it incorrectly, which also shows how confusing it is :-) ) Regarding Vladimir's comments on how clients can migrate to this, I don't have any great suggestions. To summarize, there are several options: - Add an "fast_tls" option that the user can enable if they know the server supports it - First connect in old-fashioned way, and remember the server version. Later, if you reconnect to the same server, use implicit TLS if the server version was high enough. This would be most useful for connection pools. - Connect both ways at the same time, and continue with the fastest, i.e. "happy eyeballs" - Try implicit TLS first, and fall back to explicit TLS if it fails. For libpq, we don't necessarily need to do anything right now. We can add the implicit TLS support in a later version. Not having libpq support makes it hard to test the server codepath, though. Maybe just test it with 'stunnel' or 'openssl s_client'. - Heikki
On Wed, Feb 22, 2023 at 4:26 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 20/01/2023 03:28, Jacob Champion wrote: > > If you want to multiplex protocols on a port, now is an excellent time > > to require clients to use ALPN on implicit-TLS connections. (There are > > no clients that can currently connect via implicit TLS, so you'll > > never have another chance to force the issue without breaking > > backwards compatibility.) That should hopefully make it harder to > > ALPACA yourself or others [2]. > > Good idea. Do we want to just require the protocol to be "postgres", or > perhaps "postgres/3.0"? Need to register that with IANA, I guess. Unless you plan to make the next minor protocol version fundamentally incompatible, I don't think there's much reason to add '.0'. (And even if that does happen, 'postgres/3.1' is still distinct from 'postgres/3'. Or 'postgres' for that matter.) The Expert Review process might provide some additional guidance? > We implemented a protocol version negotiation mechanism in the libpq > protocol itself, how would this interact with it? If it's just > "postgres", then I guess we'd still negotiate the protocol version and > list of extensions after the TLS handshake. Yeah. You could choose to replace major version negotiation completely with ALPN, I suppose, but there might not be any maintenance benefit if you still have to support plaintext negotiation. Maybe there are performance implications to handling the negotiation earlier vs. later? Note that older versions of TLS will expose the ALPN in plaintext... but that may not be a factor by the time a postgres/4 shows up, and if the next protocol is incompatible then it may not be feasible to hide the differences via transport encryption anyway. > Regarding Vladimir's comments on how clients can migrate to this, I > don't have any great suggestions. To summarize, there are several options: > > - Add an "fast_tls" option that the user can enable if they know the > server supports it I like that such an option could eventually be leveraged for a postgresqls:// URI scheme (which should not fall back, ever). There would be other things we'd have to change first to make that a reality -- postgresqls://example.com?host=evil.local is problematic, for example -- but it'd be really nice to have an HTTPS equivalent. --Jacob
On Wed, 22 Feb 2023 at 07:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 20/01/2023 03:28, Jacob Champion wrote: > > On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark@mit.edu> wrote: > >> * "Service Mesh" type tools that hide multiple services behind a > >> single host/port ("Service Mesh" is just a new buzzword for "proxy"). > > > > If you want to multiplex protocols on a port, now is an excellent time > > to require clients to use ALPN on implicit-TLS connections. (There are > > no clients that can currently connect via implicit TLS, so you'll > > never have another chance to force the issue without breaking > > backwards compatibility.) That should hopefully make it harder to > > ALPACA yourself or others [2]. > > Good idea. Do we want to just require the protocol to be "postgres", or > perhaps "postgres/3.0"? Need to register that with IANA, I guess. I had never heard of this before, it does seem useful. But if I understand it right it's entirely independent of this patch. We can add it to all our Client/Server exchanges whether they're the initial direct SSL connection or the STARTTLS negotiation? > We implemented a protocol version negotiation mechanism in the libpq > protocol itself, how would this interact with it? If it's just > "postgres", then I guess we'd still negotiate the protocol version and > list of extensions after the TLS handshake. > > >> Incidentally I find the logic in ProcessStartupPacket incredibly > >> confusing. It took me a while before I realized it's using tail > >> recursion to implement the startup logic. I think it would be way more > >> straightforward and extensible if it used a much more common iterative > >> style. I think it would make it possible to keep more state than just > >> ssl_done and gss_done without changing the function signature every > >> time for example. > > > > +1. The complexity of the startup logic, both client- and server-side, > > is a big reason why I want implicit TLS in the first place. That way, > > bugs in that code can't be exploited before the TLS handshake > > completes. > > +1. We need to support explicit TLS for a long time, so we can't > simplify by just removing it. But let's refactor the code somehow, to > make it more clear. > > Looking at the patch, I think it accepts an SSLRequest packet even if > implicit TLS has already been established. That's surely wrong, and > shows how confusing the code is. (Or I'm reading it incorrectly, which > also shows how confusing it is :-) ) I'll double check it but I think I tested that that wasn't the case. I think it accepts the SSL request packet and sends back an N which the client libpq just interprets as the server not supporting SSL and does an unencrypted connection (which is tunneled over stunnel unbeknownst to libpq). I agree I would want to flatten this logic to an iterative approach but having wrapped my head around it now I'm not necessarily rushing to do it now. The main advantage of flattening it would be to make it easy to support other protocol types which I think could be really interesting. It would be much clearer to document the state machine if all the state is in one place and the code just loops through processing startup packets and going to a new state until the connection is established. That's true now but you have to understand how the state is passed in the function parameters and notice that all the recursion is tail recursive (I think). And extending that state would require extending the function signature which would get awkward quickly. > Regarding Vladimir's comments on how clients can migrate to this, I > don't have any great suggestions. To summarize, there are several options: > > - Add an "fast_tls" option that the user can enable if they know the > server supports it > > - First connect in old-fashioned way, and remember the server version. > Later, if you reconnect to the same server, use implicit TLS if the > server version was high enough. This would be most useful for connection > pools. Vladimir pointed out that this doesn't necessarily work. The server may be new enough to support it but it could be behind a proxy like pgbouncer or something. The same would be true if the server reported a "connection option" instead of depending on version. > - Connect both ways at the same time, and continue with the fastest, > i.e. "happy eyeballs" That seems way too complex for us to bother with imho. > - Try implicit TLS first, and fall back to explicit TLS if it fails. > For libpq, we don't necessarily need to do anything right now. We can > add the implicit TLS support in a later version. Not having libpq > support makes it hard to test the server codepath, though. Maybe just > test it with 'stunnel' or 'openssl s_client'. I think we should have an option to explicitly enable it in psql, if only for testing. And then wait five years and switch the default on it then. In the meantime users can just set it based on their setup. That's not the way to the quickest adoption but imho the main advantages of this option are the options it gives users, not the latency improvement, so I'm not actually super concerned about adoption rate. I assume we'll keep the negotiated mode indefinitely because it can handle any other protocols we might want. For instance, it currently handles GSSAPI -- which raises the question, are we happy with GSSAPI having this extra round trip? Is there a similar change we could make for it? My understanding is that GSSAPI is an abstract interface and the actual protocol it's invoking could be anything so we can't make any assumptions about what the first packet looks like. Perhaps we can do something about pipelining GSSAPI messages so if the negotiation fails the server just closes the connection but if it accepts it it does a similar trick with unreading the buffered data and processing it through the GSSAPI calls. -- greg
On Tue, Feb 28, 2023 at 10:33 AM Greg Stark <stark@mit.edu> wrote: > On Wed, 22 Feb 2023 at 07:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > Good idea. Do we want to just require the protocol to be "postgres", or > > perhaps "postgres/3.0"? Need to register that with IANA, I guess. > > I had never heard of this before, it does seem useful. But if I > understand it right it's entirely independent of this patch. It can be. If you want to use it in the strongest possible way, though, you'd have to require its use by clients. Introducing that requirement later would break existing ones, so I think it makes sense to do it at the same time as the initial implementation, if there's interest. > We can > add it to all our Client/Server exchanges whether they're the initial > direct SSL connection or the STARTTLS negotiation? I'm not sure it would buy you anything during the STARTTLS-style opening. You already know what protocol you're speaking in that case. (So with the ALPACA example, the damage is already done.) Thanks, --Jacob
Here's an updated patch for direct SSL connections. I've added libpq client support with a new connection parameter. This allows testing it easily with psql. It's still a bit hard to see what's going on though. I'm thinking it would be good to have libpq keep a string which describes what negotiations were attempted and failed and what was eventually accepted which psql could print with the SSL message or expose some other way. In the end I didn't see how adding an API for this really helped any more than just saying the API is to stuff the unread data into the Port structure. So I just documented that. If anyone has any better idea... I added documentation for the libpq connection setting. One thing, I *think* it's ok to replace the send(2) call with secure_write() in the negotiation. It does mean it's possible for the connection to fail with FATAL at that point instead of COMMERROR but I don't think that's a problem. I haven't added tests. I'm not sure how to test this since to test it properly means running the server with every permutation of ssl and gssapi configurations. Incidentally, some of the configuration combinations -- namely sslnegotiation=direct and default gssencmode and sslmode results in a counter-intuitive behaviour. But I don't see a better option that doesn't mean making the defaults less useful.
Attachment
Here's a first cut at ALPN support. Currently it's using a hard coded "Postgres/3.0" protocol (hard coded both in the client and the server...). And it's hard coded to be required for direct connections and supported but not required for regular connections. IIRC I put a variable labeled a "GUC" but forgot to actually make it a GUC. But I'm thinking of maybe removing that variable since I don't see much of a use case for controlling this manually. I *think* ALPN is supported by all the versions of OpenSSL we support. The other patches are unchanged (modulo a free() that I missed in the client before). They still have the semi-open issues I mentioned in the previous email. -- greg
Attachment
Sorry, checking the cfbot apparently I had a typo in the #ifndef USE_SSL case.
Attachment
On Mon, 20 Mar 2023 at 16:31, Greg Stark <stark@mit.edu> wrote: > > Here's a first cut at ALPN support. > > Currently it's using a hard coded "Postgres/3.0" protocol Apparently that is explicitly disrecommended by the IETF folk. They want something like "TBD" so people don't start using a string until it's been added to the registry. So I've changed this for now (to "TBD-pgsql") Ok, I think this has pretty much everything I was hoping to do. The one thing I'm not sure of is it seems some codepaths in postmaster have ereport(COMMERROR) followed by returning an error whereas other codepaths just have ereport(FATAL). And I don't actually see much logic in which do which. (I get the principle behind COMMERR it just seems like it doesn't really match the code). I realized I had exactly the infrastructure needed to allow pipelining the SSL ClientHello like Neon wanted to do so I added that too. It's kind of redundant with direct SSL connections but seems like there may be reasons to use that instead. -- greg
Attachment
And the cfbot wants a small tweak
Attachment
- v6-0005-Allow-pipelining-data-after-ssl-request.patch
- v6-0004-Direct-SSL-connections-ALPN-support.patch
- v6-0002-Direct-SSL-connections-client-support.patch
- v6-0003-Direct-SSL-connections-documentation.patch
- v6-0006-Direct-SSL-connections-some-additional-docs.patch
- v6-0001-Direct-SSL-connections-postmaster-support.patch
On 31/03/2023 10:59, Greg Stark wrote: > IIRC I put a variable labeled a "GUC" but forgot to actually make it a > GUC. But I'm thinking of maybe removing that variable since I don't > see much of a use case for controlling this manually. I *think* ALPN > is supported by all the versions of OpenSSL we support. +1 on removing the variable. Let's make ALPN mandatory for direct SSL connections, like Jacob suggested. And for old-style handshakes, accept and check ALPN if it's given. I don't see the point of the libpq 'sslalpn' option either. Let's send ALPN always. Admittedly having the options make testing different of combinations of old and new clients and servers a little easier. But I don't think we should add options for the sake of backwards compatibility tests. > --- a/src/backend/libpq/pqcomm.c > +++ b/src/backend/libpq/pqcomm.c > @@ -1126,13 +1126,16 @@ pq_discardbytes(size_t len) > /* -------------------------------- > * pq_buffer_has_data - is any buffered data available to read? > * > - * This will *not* attempt to read more data. > + * Actually returns the number of bytes in the buffer... > + * > + * This will *not* attempt to read more data. And reading up to that number of > + * bytes should not cause reading any more data either. > * -------------------------------- > */ > -bool > +size_t > pq_buffer_has_data(void) > { > - return (PqRecvPointer < PqRecvLength); > + return (PqRecvLength - PqRecvPointer); > } Let's rename the function. > /* push unencrypted buffered data back through SSL setup */ > len = pq_buffer_has_data(); > if (len > 0) > { > buf = palloc(len); > if (pq_getbytes(buf, len) == EOF) > return STATUS_ERROR; /* shouldn't be possible */ > port->raw_buf = buf; > port->raw_buf_remaining = len; > port->raw_buf_consumed = 0; > } > > Assert(pq_buffer_has_data() == 0); > if (secure_open_server(port) == -1) > { > ereport(COMMERROR, > (errcode(ERRCODE_PROTOCOL_VIOLATION), > errmsg("SSL Protocol Error during direct SSL connection initiation"))); > return STATUS_ERROR; > } > > if (port->raw_buf_remaining > 0) > { > ereport(COMMERROR, > (errcode(ERRCODE_PROTOCOL_VIOLATION), > errmsg("received unencrypted data after SSL request"), > errdetail("This could be either a client-software bug or evidence of an attempted man-in-the-middleattack."))); > return STATUS_ERROR; > } > if (port->raw_buf) > pfree(port->raw_buf); This pattern is repeated in both callers of secure_open_server(). Could we move this into secure_open_server() itself? That would feel pretty natural, be-secure.c already contains the secure_raw_read() function that reads the 'raw_buf' field. > const char * > PQsslAttribute(PGconn *conn, const char *attribute_name) > { > ... > > if (strcmp(attribute_name, "alpn") == 0) > { > const unsigned char *data; > unsigned int len; > static char alpn_str[256]; /* alpn doesn't support longer than 255 bytes */ > SSL_get0_alpn_selected(conn->ssl, &data, &len); > if (data == NULL || len==0 || len > sizeof(alpn_str)-1) > return NULL; > memcpy(alpn_str, data, len); > alpn_str[len] = 0; > return alpn_str; > } Using a static buffer doesn't look right. If you call PQsslAttribute on two different connections from two different threads concurrently, they will write to the same buffer. I see that you copied it from the "key_bits" handlng, but it has the same issue. -- Heikki Linnakangas Neon (https://neon.tech)
On Tue, Jul 04, 2023 at 05:15:49PM +0300, Heikki Linnakangas wrote: > I don't see the point of the libpq 'sslalpn' option either. Let's send ALPN > always. > > Admittedly having the options make testing different of combinations of old > and new clients and servers a little easier. But I don't think we should add > options for the sake of backwards compatibility tests. Hmm. I would actually argue in favor of having these with tests in core to stress the previous SSL hanshake protocol, as not having these parameters would mean that we rely only on major version upgrades in the buildfarm to test the backward-compatible code path, making issues much harder to catch. And we still need to maintain the backward-compatible path for 10 years based on what pg_dump and pg_upgrade need to support. -- Michael
Attachment
On 05/07/2023 02:33, Michael Paquier wrote: > On Tue, Jul 04, 2023 at 05:15:49PM +0300, Heikki Linnakangas wrote: >> I don't see the point of the libpq 'sslalpn' option either. Let's send ALPN >> always. >> >> Admittedly having the options make testing different of combinations of old >> and new clients and servers a little easier. But I don't think we should add >> options for the sake of backwards compatibility tests. > > Hmm. I would actually argue in favor of having these with tests in > core to stress the previous SSL hanshake protocol, as not having these > parameters would mean that we rely only on major version upgrades in > the buildfarm to test the backward-compatible code path, making issues > much harder to catch. And we still need to maintain the > backward-compatible path for 10 years based on what pg_dump and > pg_upgrade need to support. Ok, let's keep it. I started to review this again. There's a lot of little things to fix before this is ready for commit, but overall this looks pretty good. A few notes / questions on the first two patches (in addition to the few comments I made earlier): If the client sends TLS HelloClient directly, but the server does not support TLS, it just closes the connection. It would be nice to still send some kind of an error to the client. Maybe a TLS alert packet? I don't want to start implementing TLS, but I think a TLS alert packet with a suitable error code would be just a constant. The new CONNECTION_DIRECT_SSL_STARTUP state needs to be moved to end of the enum. We cannot change the integer values of existing of enum values, or clients compiled with old libpq version would mix up the states. > /* > * validate sslnegotiation option, default is "postgres" for the postgres > * style negotiated connection with an extra round trip but more options. > */ What "more options" does the negotiated connection provide? > if (conn->sslnegotiation) > { > if (strcmp(conn->sslnegotiation, "postgres") != 0 > && strcmp(conn->sslnegotiation, "direct") != 0 > && strcmp(conn->sslnegotiation, "requiredirect") != 0) > { > conn->status = CONNECTION_BAD; > libpq_append_conn_error(conn, "invalid %s value: \"%s\"", > "sslnegotiation", conn->sslnegotiation); > return false; > } > > #ifndef USE_SSL > if (conn->sslnegotiation[0] != 'p') { > conn->status = CONNECTION_BAD; > libpq_append_conn_error(conn, "sslnegotiation value \"%s\" invalid when SSL support is not compiled in", > conn->sslnegotiation); > return false; > } > #endif > } At the same time, the patch allows the combination of "sslmode=disable" and "sslnegotiation=requiredirect". Seems inconsistent to error out if compiled without SSL support. > else > { > libpq_append_conn_error(conn, "sslnegotiation missing?"); > return false; > } In the other similar settings, like 'channel_binding' and 'sslcertmode', we strdup() the compiled-in default if the option is NULL. I'm not sure if that's necessary, I think the compiled-in defaults should get filled in conninfo_add_defaults(). If so, then those other places could be turned into errors like this too. This seems to be a bit of a mess even before this patch. In pg_conn struct: > + bool allow_direct_ssl_try; /* Try to make a direct SSL connection > + * without an "SSL negotiation packet" */ > bool allow_ssl_try; /* Allowed to try SSL negotiation */ > bool wait_ssl_try; /* Delay SSL negotiation until after > * attempting normal connection */ It's getting hard to follow what combinations of these booleans are valid and what they're set to at different stages. I think it's time to turn all these into one enum, or something like that. I intend to continue reviewing this after Jan 8th. I'd still like to get this into v17. -- Heikki Linnakangas Neon (https://neon.tech)
Some more comments on this: 1. It feels weird that the combination of "gssencmode=require sslnegotiation=direct" combination is forbidden. Sure, the ssl negotiation will never happen with gssencmode=require, so the sslnegotiation option has no effect. But by that token, should we also forbid the combination "sslmode=disable sslnegotiation=direct"? I think not. The sslnegotiation option should mean "if we are going to try SSL, should we try it in direct or negotiated mode?" 2. Should we allow direct SSL only at the very beginning of a TCP connection, or should we also allow it after we have requested GSS and the server said no? Like this: Client: GSSENCRequest Server: 'N' (gss not supported) Client: TLS client Hello On one hand, why not? It saves you a round-trip in this case too. If we don't allow it, the client will have to send SSLRequest and wait for response, or reconnect to try direct SSL. On the other hand, flexibility is not necessarily a good thing in security-critical code like this. The patch set is confused on whether that's allowed or not. The server rejects it. But if you use "gssencmode=prefer sslnegotiation=requiredrect", libpq will attempt to do it, and fail. 3. With "sslmode=verify-full sslnegotiation=direct", if the direct SSL connection fails because of a problem with the certificate, libpq will try again in negotiated SSL mode. That seems pointless. If the server responded to the direct TLS Client Hello message with a valid ServerHello, that indicates that the server supports direct SSL. If anything goes wrong after that, retrying in negotiated mode is not going to help. 4. The number of combinations of sslmode, gssencmode and sslnegotiation settings is scary. And we have very few tests for them. Attached patch set addresses the above, but is very much WIP. I refactored the state machine in libpq, to make the states and transitions more clear. I think that helps, but it's still pretty complex. I'm all ears for ideas on how to simplify it further. I added a new test suite to test the different libpq options. See src/test/libpq_encryption. I think this is very much needed, but I'm still not very happy with the implementation. Some combinations are still impossible to test, like connecting to an older server that doesn't support direct SSL, or having the server respond with 'N' to GSSEncRequest. I'd also like to check more details of each connection attempt, like how many TCP connections are established, to check for things like 3. above. Maybe we need to add more logging to libpq or the server and check the logs after each test. I'm tempted to implement a mock server from scratch that could easily be instructed to accept/reject the connection at just the right places. But that's a lot of work. I'm going to put this down for now. The attached patch set is even more raw than v6, but I'm including it here to "save the work". -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
- v7-0001-Move-Kerberos-module.patch
- v7-0002-Add-enc-tests.patch
- v7-0003-Direct-SSL-connections-postmaster-support.patch
- v7-0004-WIP-refactorings-to-backend-support.patch
- v7-0005-Direct-SSL-connections-client-support.patch
- v7-0006-Direct-SSL-connections-documentation.patch
- v7-0007-Direct-SSL-connections-ALPN-support.patch
- v7-0008-Allow-pipelining-data-after-ssl-request.patch
- v7-0009-Direct-SSL-connections-some-additional-docs.patch
- v7-0010-Move-code-to-check-for-alpn.patch
- v7-0011-WIP-refactor-state-machine-in-libpq.patch
- v7-0012-Add-tests-for-sslnegotiation.patch
I've been asked to take a look at this thread and review some patches, and the subject looks interesting enough, so here I am. On Thu, 19 Jan 2023 at 04:16, Greg Stark <stark@mit.edu> wrote: > I had a conversation a while back with Heikki where he expressed that > it was annoying that we negotiate SSL/TLS the way we do since it > introduces an extra round trip. Aside from the performance > optimization I think accepting standard TLS connections would open the > door to a number of other opportunities that would be worth it on > their own. I agree that this would be very nice. > Other things it would open the door to in order from least > controversial to most.... > > * Hiding Postgres behind a standard SSL proxy terminating SSL without > implementing the Postgres protocol. I think there is also the option "hiding Postgres behind a standard SNI-based SSL router that does not terminate SSL", as that's arguably a more secure way to deploy any SSL service than SSL-terminating proxies. > * "Service Mesh" type tools that hide multiple services behind a > single host/port ("Service Mesh" is just a new buzzword for "proxy"). People proxying PostgreSQL seems fine, and enabling better proxying seems reasonable. > * Browser-based protocol implementations using websockets for things > like pgadmin or other tools to connect directly to postgres using > Postgres wire protocol but using native SSL implementations. > > * Postgres could even implement an HTTP based version of its protocol > and enable things like queries or browser based tools using straight > up HTTP requests so they don't need to use websockets. > > * Postgres could implement other protocols to serve up data like > status queries or monitoring metrics, using HTTP based standard > protocols instead of using our own protocol. I don't think we should be trying to serve anything HTTP-like, even with a ten-foot pole, on a port that we serve the PostgreSQL wire protocol on. If someone wants to multiplex the PostgreSQL wire protocol on the same port that serves HTTPS traffic, they're welcome to do so with their own proxy, but I'd rather we keep the PostgreSQL server's socket handling fundamentaly incapable of servicng protocols primarily used in web browsers on the same socket that handles normal psql data connections. PostgreSQL may have its own host-based authentication with HBA, but I'd rather not have to depend on it to filter incoming connections between valid psql connections and people trying to grab the latest monitoring statistics at some http endpoint - I'd rather use my trusty firewall that can already limit access to specific ports very efficiently without causing undue load on the database server. Matthias van de Meent Neon (https://neon.tech)
On Wed, 10 Jan 2024 at 09:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > Some more comments on this: > > 1. It feels weird that the combination of "gssencmode=require > sslnegotiation=direct" combination is forbidden. Sure, the ssl > negotiation will never happen with gssencmode=require, so the > sslnegotiation option has no effect. But by that token, should we also > forbid the combination "sslmode=disable sslnegotiation=direct"? I think > not. The sslnegotiation option should mean "if we are going to try SSL, > should we try it in direct or negotiated mode?" I'm not sure about this either. The 'gssencmode' option is already quite weird in that it seems to override the "require"d priority of "sslmode=require", which it IMO really shouldn't. > 2. Should we allow direct SSL only at the very beginning of a TCP > connection, or should we also allow it after we have requested GSS and > the server said no? Like this: > > Client: GSSENCRequest > Server: 'N' (gss not supported) > Client: TLS client Hello > > On one hand, why not? It saves you a round-trip in this case too. If we > don't allow it, the client will have to send SSLRequest and wait for > response, or reconnect to try direct SSL. On the other hand, flexibility > is not necessarily a good thing in security-critical code like this. I think this should be "no". Once we start accepting PostgreSQL protocol packets (such as the GSSENCRequest packet) I don't think we should start treating data stream corruption as attempted SSL connections. > The patch set is confused on whether that's allowed or not. The server > rejects it. But if you use "gssencmode=prefer > sslnegotiation=requiredrect", libpq will attempt to do it, and fail. That should then be detected as an incorrect combination of flags in psql: you can't have direct-to-ssl and put something in front of it. > 3. With "sslmode=verify-full sslnegotiation=direct", if the direct SSL > connection fails because of a problem with the certificate, libpq will > try again in negotiated SSL mode. That seems pointless. If the server > responded to the direct TLS Client Hello message with a valid > ServerHello, that indicates that the server supports direct SSL. If > anything goes wrong after that, retrying in negotiated mode is not going > to help. This makes sense. > 4. The number of combinations of sslmode, gssencmode and sslnegotiation > settings is scary. And we have very few tests for them. Yeah, it's not great. We could easily automate this better though. I mean, can't we run the tests using a "cube" configuration, i.e. test every combination of parameters? We would use a mapping function of (psql connection parameter values -> expectations), which would be along the lines of the attached pl testfile. I feel it's a bit more approachable than the lists of manual option configurations, and makes it a bit easier to program the logic of which connection security option we should have used to connect. The attached file would be a drop-in replacement; it's tested to work with SSL only - without GSS - because I've been having issues getting GSS working on my machine. > I'm going to put this down for now. The attached patch set is even more > raw than v6, but I'm including it here to "save the work". v6 doesn't apply cleanly anymore after 774bcffe, but here are some notes: Several patches are still very much WIP. Reviewing them on a patch-by-patch basis is therefore nigh impossible; the specific reviews below are thus on changes that could be traced back to a specific patch. A round of cleanup would be appreciated. > 0003: Direct SSL connections postmaster support > [...] > -extern bool pq_buffer_has_data(void); > +extern size_t pq_buffer_has_data(void); This should probably be renamed to pg_buffer_remaining_data or such, if we change the signature like this. > + /* Read from the "unread" buffered data first. c.f. libpq-be.h */ > + if (port->raw_buf_remaining > 0) > + { > + /* consume up to len bytes from the raw_buf */ > + if (len > port->raw_buf_remaining) > + len = port->raw_buf_remaining; Shouldn't we also try to read from the socket, instead of only consuming bytes from the raw buffer if it contains bytes? > 0008: Allow pipelining data after ssl request > + /* > + * At this point we should have no data already buffered. If we do, > + * it was received before we performed the SSL handshake, so it wasn't > + * encrypted and indeed may have been injected by a man-in-the-middle. > + * We report this case to the client. > + */ > + if (port->raw_buf_remaining > 0) > + ereport(FATAL, > + (errcode(ERRCODE_PROTOCOL_VIOLATION), > + errmsg("received unencrypted data after SSL request"), > + errdetail("This could be either a client-software bug or evidence of an attempted man-in-the-middleattack."))); We currently don't support 0-RTT SSL connections because (among other reasons) we haven't yet imported many features from TLS1.3, but it seems reasonable that clients may want to use 0RTT (or, session resumption in 0 round trips), which would allow encrypted data after the SSL startup packet. It seems wise to add something to this note to these comments in ProcessStartupPacket. > ALPN Does the TLS ALPN spec allow protocol versions in the protocol tag? It would be very useful to detect clients with new capabilities at the first connection, rather than having to wait for one round trip, and would allow one avenue for changing the protocol version. Apart from this, I didn't really find any serious problems in the sum of these patches. The intermediate states were not great though, with various broken states in between. Kind regards, Matthias van de Meent
Attachment
On 22/02/2024 01:43, Matthias van de Meent wrote: > On Wed, 10 Jan 2024 at 09:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> 4. The number of combinations of sslmode, gssencmode and sslnegotiation >> settings is scary. And we have very few tests for them. > > Yeah, it's not great. We could easily automate this better though. I > mean, can't we run the tests using a "cube" configuration, i.e. test > every combination of parameters? We would use a mapping function of > (psql connection parameter values -> expectations), which would be > along the lines of the attached pl testfile. I feel it's a bit more > approachable than the lists of manual option configurations, and makes > it a bit easier to program the logic of which connection security > option we should have used to connect. > The attached file would be a drop-in replacement; it's tested to work > with SSL only - without GSS - because I've been having issues getting > GSS working on my machine. +1 testing all combinations. I don't think the 'mapper' function approach in your version is much better than the original though. Maybe it would be better with just one 'mapper' function that contains all the rules, along the lines of: (This isn't valid perl, just pseudo-code) sub expected_outcome { my ($user, $sslmode, $negotiation, $gssmode) = @_; my @possible_outcomes = { 'plain', 'ssl', 'gss' } delete $possible_outcomes{'plain'} if $sslmode eq 'require'; delete $possible_outcomes{'ssl'} if $sslmode eq 'disable'; delete $possible_outcomes{'plain'} if $user eq 'ssluser'; delete $possible_outcomes{'plain'} if $user eq 'ssluser'; if $sslmode eq 'allow' { # move 'plain' before 'ssl' in the list } if $sslmode eq 'prefer' { # move 'ssl' before 'plain' in the list } # more rules here # If there are no outcomes left in $possible_outcomes, return 'fail' # If there's exactly one outcome left, return that. # If there's more, return the first one. } Or maybe a table that lists all the combinations and the expected outcome. Something lieke this: nossluser nogssuser ssluser gssuser sslmode=require fail ... sslmode=prefer plain sslmode=disable plain The problem is that there are more than two dimensions. So maybe an exhaustive list like this: user sslmode gssmode outcome nossluser require disable fail nossluser prefer disable plain nossluser disable disable plain ssluser require disable ssl ... I'm just throwing around ideas here, can you experiment with different approaches and see what looks best? >> ALPN > > Does the TLS ALPN spec allow protocol versions in the protocol tag? It > would be very useful to detect clients with new capabilities at the > first connection, rather than having to wait for one round trip, and > would allow one avenue for changing the protocol version. Looking at the list of registered ALPN tags [0], I can see "http/0.9"; "http/1.0" and "http/1.1". I think we'd want to changing the major protocol version in a way that would introduce a new roundtrip, though. -- Heikki Linnakangas Neon (https://neon.tech)
On Thu, 22 Feb 2024 at 18:02, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 22/02/2024 01:43, Matthias van de Meent wrote: >> On Wed, 10 Jan 2024 at 09:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> 4. The number of combinations of sslmode, gssencmode and sslnegotiation >>> settings is scary. And we have very few tests for them. >> >> Yeah, it's not great. We could easily automate this better though. I >> mean, can't we run the tests using a "cube" configuration, i.e. test >> every combination of parameters? We would use a mapping function of >> (psql connection parameter values -> expectations), which would be >> along the lines of the attached pl testfile. I feel it's a bit more >> approachable than the lists of manual option configurations, and makes >> it a bit easier to program the logic of which connection security >> option we should have used to connect. >> The attached file would be a drop-in replacement; it's tested to work >> with SSL only - without GSS - because I've been having issues getting >> GSS working on my machine. > > +1 testing all combinations. I don't think the 'mapper' function > approach in your version is much better than the original though. Maybe > it would be better with just one 'mapper' function that contains all the > rules, along the lines of: (This isn't valid perl, just pseudo-code) > > sub expected_outcome > { [...] > } > > Or maybe a table that lists all the combinations and the expected > outcome. Something lieke this: [...] > > The problem is that there are more than two dimensions. So maybe an > exhaustive list like this: > > user sslmode gssmode outcome > > nossluser require disable fail > ... > I'm just throwing around ideas here, can you experiment with different > approaches and see what looks best? One issue with exhaustive tables is that they would require a product of all options to be listed, and that'd require at least 216 rows to manage: server_ssl 2 * server_gss 2 * users 3 * client_ssl 4 * client_gss 3 * client_ssldirect 3 = 216 different states. I think the expected_autcome version is easier in that regard. Attached an updated version using a single unified connection type validator using an approach similar to yours. Note that it does fail 8 tests, all of which are attributed to the current handling of `sslmode=require gssencmode=prefer`: right now, we allow GSS in that case, even though the user require-d sslmode. An alternative check that does pass tests with the code of the patch is commented out, at lines 209-216. >>> ALPN >> >> Does the TLS ALPN spec allow protocol versions in the protocol tag? It >> would be very useful to detect clients with new capabilities at the >> first connection, rather than having to wait for one round trip, and >> would allow one avenue for changing the protocol version. > > Looking at the list of registered ALPN tags [0], I can see "http/0.9"; > "http/1.0" and "http/1.1". Ah, nice. > I think we'd want to changing the major > protocol version in a way that would introduce a new roundtrip, though. I don't think I understand what you meant here, could you correct the sentence or expand why we want to do that? Note that with ALPN you could negotiate postgres/3.0 or postgres/4.0 during the handshake, which could save round-trips. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Attachment
On 28/02/2024 14:00, Matthias van de Meent wrote: > I don't think I understand what you meant here, could you correct the > sentence or expand why we want to do that? > Note that with ALPN you could negotiate postgres/3.0 or postgres/4.0 > during the handshake, which could save round-trips. Sorry, I missed "avoid" there. I meant: I think we'd want to *avoid* changing the major protocol version in a way that would introduce a new roundtrip, though. -- Heikki Linnakangas Neon (https://neon.tech)
On Wed, Feb 28, 2024 at 4:10 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > I think we'd want to *avoid* changing the major protocol version in a > way that would introduce a new roundtrip, though. I'm starting to get up to speed with this patchset. So far I'm mostly testing how it works; I have yet to take an in-depth look at the implementation. I'll squint more closely at the MITM-protection changes in 0008 later. First impressions, though: it looks like that code has gotten much less straightforward, which I think is dangerous given the attack it's preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our protocol doesn't seem particularly replay-safe to me.) If we're interested in ALPN negotiation in the future, we may also want to look at GREASE [1] to keep those options open in the presence of third-party implementations. Unfortunately OpenSSL doesn't do this automatically yet. If we don't have a reason not to, it'd be good to follow the strictest recommendations from [2] to avoid cross-protocol attacks. (For anyone currently running web servers and Postgres on the same host, they really don't want browsers "talking" to their Postgres servers.) That would mean checking the negotiated ALPN on both the server and client side, and failing if it's not what we expect. I'm not excited about the proliferation of connection options. I don't have a lot of ideas on how to fix it, though, other than to note that the current sslnegotiation option names are very unintuitive to me: - "postgres": only legacy handshakes - "direct": might be direct... or maybe legacy - "requiredirect": only direct handshakes... unless other options are enabled and then we fall back again to legacy? How many people willing to break TLS compatibility with old servers via "requiredirect" are going to be okay with lazy fallback to GSS or otherwise? Heikki mentioned possibly hard-coding a TLS alert if direct SSL is attempted without server TLS support. I think that's a cool idea, but without an official "TLS not supported" alert code (which, honestly, would be strange to standardize) I'm kinda -0.5 on it. If the client tells me about a handshake_failure or similar, I'm going to start investigating protocol versions and ciphersuites; I'm not going to think to myself that maybe the server lacks TLS support altogether. (Plus, we need to have a good error message when connecting to older servers anyway. I think we should be able to key off of the EOF coming back from OpenSSL; it'd be a good excuse to give that part of the code some love.) For the record, I'm adding some one-off tests for this feature to a local copy of my OAuth pytest suite, which is designed to do the kinds of testing you're running into trouble with. It's not in any way viable for a PG17 commit, but if you're interested I can make the patches available. --Jacob [1] https://www.rfc-editor.org/rfc/rfc8701.html [2] https://alpaca-attack.com/libs.html
On 01/03/2024 23:49, Jacob Champion wrote: > On Wed, Feb 28, 2024 at 4:10 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> I think we'd want to *avoid* changing the major protocol version in a >> way that would introduce a new roundtrip, though. > > I'm starting to get up to speed with this patchset. So far I'm mostly > testing how it works; I have yet to take an in-depth look at the > implementation. Thank you! > I'll squint more closely at the MITM-protection changes in 0008 later. > First impressions, though: it looks like that code has gotten much > less straightforward, which I think is dangerous given the attack it's > preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our > protocol doesn't seem particularly replay-safe to me.) Let's drop that patch. AFAICS it's not needed by the rest of the patches. > If we're interested in ALPN negotiation in the future, we may also > want to look at GREASE [1] to keep those options open in the presence > of third-party implementations. Unfortunately OpenSSL doesn't do this > automatically yet. Can you elaborate? Do we need to do something extra in the server to be compatible with GREASE? > If we don't have a reason not to, it'd be good to follow the strictest > recommendations from [2] to avoid cross-protocol attacks. (For anyone > currently running web servers and Postgres on the same host, they > really don't want browsers "talking" to their Postgres servers.) That > would mean checking the negotiated ALPN on both the server and client > side, and failing if it's not what we expect. Hmm, I thought that's what the patches does. But looking closer, libpq is not checking that ALPN was used. We should add that. Am I right? > I'm not excited about the proliferation of connection options. I don't > have a lot of ideas on how to fix it, though, other than to note that > the current sslnegotiation option names are very unintuitive to me: > - "postgres": only legacy handshakes > - "direct": might be direct... or maybe legacy > - "requiredirect": only direct handshakes... unless other options are > enabled and then we fall back again to legacy? How many people willing > to break TLS compatibility with old servers via "requiredirect" are > going to be okay with lazy fallback to GSS or otherwise? Yeah, this is my biggest complaint about all this. Not so much the names of the options, but the number of combinations of different options, and how we're going to test them all. I don't have any great solutions, except adding a lot of tests to cover them, like Matthias did. > Heikki mentioned possibly hard-coding a TLS alert if direct SSL is > attempted without server TLS support. I think that's a cool idea, but > without an official "TLS not supported" alert code (which, honestly, > would be strange to standardize) I'm kinda -0.5 on it. If the client > tells me about a handshake_failure or similar, I'm going to start > investigating protocol versions and ciphersuites; I'm not going to > think to myself that maybe the server lacks TLS support altogether. Agreed. > (Plus, we need to have a good error message when connecting to older > servers anyway.I think we should be able to key off of the EOF coming > back from OpenSSL; it'd be a good excuse to give that part of the code > some love.) Hmm, if OpenSSL sends ClientHello and the server responds with a Postgres error packet, OpenSSL will presumably consume the error packet or at least part of it. But with our custom BIO, we can peek at the server response before handing it to OpenSSL. If it helps, we could backport a nicer error message to old server versions, similar to what we did with SCRAM in commit 96d0f988b1. > For the record, I'm adding some one-off tests for this feature to a > local copy of my OAuth pytest suite, which is designed to do the kinds > of testing you're running into trouble with. It's not in any way > viable for a PG17 commit, but if you're interested I can make the > patches available. Yes please, it would be nice to see what tests you've performed, and have it archived. -- Heikki Linnakangas Neon (https://neon.tech)
I hope I didn't joggle your elbow reviewing this, Jacob, but I spent some time rebase and fix various little things: - Incorporated Matthias's test changes - Squashed the client, server and documentation patches. Not much point in keeping them separate, as one requires the other, and if you're only interested e.g. in the server parts, just look at src/backend. - Squashed some of my refactorings with the main patches, because I'm certain enough that they're desirable. I kept the last libpq state machine refactoring separate though. I'm pretty sure we need a refactoring like that, but I'm not 100% sure about the details. - Added some comments to the new state machine logic in fe-connect.c. - Removed the XXX comments about TLS alerts. - Removed the "Allow pipelining data after ssl request" patch - Reordered the patches so that the first two patches add the tests different combinations of sslmode, gssencmode and server support. That could be committed separately, without the rest of the patches. A later patch expands the tests for the new sslnegotiation option. The tests are still not distinguishing whether a connection was established in direct or negotiated mode. So if we e.g. had a bug that accidentally disabled direct SSL connection completely and always used negotiated mode, the tests would still pass. I'd like to see some tests that would catch that. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
On Tue, Mar 5, 2024 at 6:09 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > I hope I didn't joggle your elbow reviewing this Nope, not at all! > The tests are still not distinguishing whether a connection was > established in direct or negotiated mode. So if we e.g. had a bug that > accidentally disabled direct SSL connection completely and always used > negotiated mode, the tests would still pass. I'd like to see some tests > that would catch that. +1 On Mon, Mar 4, 2024 at 7:29 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 01/03/2024 23:49, Jacob Champion wrote: > > I'll squint more closely at the MITM-protection changes in 0008 later. > > First impressions, though: it looks like that code has gotten much > > less straightforward, which I think is dangerous given the attack it's > > preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our > > protocol doesn't seem particularly replay-safe to me.) > > Let's drop that patch. AFAICS it's not needed by the rest of the patches. Okay, sounds good. > > If we're interested in ALPN negotiation in the future, we may also > > want to look at GREASE [1] to keep those options open in the presence > > of third-party implementations. Unfortunately OpenSSL doesn't do this > > automatically yet. > > Can you elaborate? Sure: now that we're letting middleboxes and proxies inspect and react to connections based on ALPN, it's possible that some intermediary might incorrectly fixate on the "postgres" ID (or whatever we choose in the end), and shut down connections that carry additional protocols rather than ignoring them. That would prevent future graceful upgrades where the client sends both "postgres/X" and "postgres/X+1". While that wouldn't be our fault, it'd be cold comfort to whoever has that middlebox. GREASE is a set of reserved protocol IDs that you can add randomly to your ALPN list, so any middleboxes that fail to follow the rules will just break outright rather than silently proliferating. (Hence the pun: GREASE keeps the joints in the pipe from rusting into place.) The RFC goes into more detail about how to do it. And I don't know if it's necessary for a v1, but it'd be something to keep in mind. > Do we need to do something extra in the server to be > compatible with GREASE? No, I think that as long as we use OpenSSL's APIs correctly on the server side, we'll be compatible by default. This would be a client-side implementation, to push random GREASE strings into the ALPN list. (There is a risk that if/when OpenSSL finally starts supporting this transparently, we'd need to remove it from our code.) > > If we don't have a reason not to, it'd be good to follow the strictest > > recommendations from [2] to avoid cross-protocol attacks. (For anyone > > currently running web servers and Postgres on the same host, they > > really don't want browsers "talking" to their Postgres servers.) That > > would mean checking the negotiated ALPN on both the server and client > > side, and failing if it's not what we expect. > > Hmm, I thought that's what the patches does. But looking closer, libpq > is not checking that ALPN was used. We should add that. Am I right? Right. Also, it looks like the server isn't failing the TLS handshake itself, but instead just dropping the connection after the handshake. In a cross-protocol attack, there's a danger that the client (which is not speaking our protocol) could still treat the server as authoritative in that situation. > > I'm not excited about the proliferation of connection options. I don't > > have a lot of ideas on how to fix it, though, other than to note that > > the current sslnegotiation option names are very unintuitive to me: > > - "postgres": only legacy handshakes > > - "direct": might be direct... or maybe legacy > > - "requiredirect": only direct handshakes... unless other options are > > enabled and then we fall back again to legacy? How many people willing > > to break TLS compatibility with old servers via "requiredirect" are > > going to be okay with lazy fallback to GSS or otherwise? > > Yeah, this is my biggest complaint about all this. Not so much the names > of the options, but the number of combinations of different options, and > how we're going to test them all. I don't have any great solutions, > except adding a lot of tests to cover them, like Matthias did. The default gssencmode=prefer is especially problematic if I'm trying to use sslnegotiation=requiredirect for security. It'll appear to work at first, but if somehow I get a credential cache into my environment, libpq will suddenly fall back to plaintext negotiation :( > > (Plus, we need to have a good error message when connecting to older > > servers anyway.I think we should be able to key off of the EOF coming > > back from OpenSSL; it'd be a good excuse to give that part of the code > > some love.) > > Hmm, if OpenSSL sends ClientHello and the server responds with a > Postgres error packet, OpenSSL will presumably consume the error packet > or at least part of it. But with our custom BIO, we can peek at the > server response before handing it to OpenSSL. I don't think an error packet is going to come back with the currently-shipped implementations. IIUC, COMMERROR packets are swallowed instead of emitted before authentication completes. So I see EOFs when trying to connect to older servers. Do you know of any situations where we'd see an actual error message on the wire? > If it helps, we could backport a nicer error message to old server > versions, similar to what we did with SCRAM in commit 96d0f988b1. That might be nice regardless, instead of pushing "invalid length of startup packet" into the logs. > > For the record, I'm adding some one-off tests for this feature to a > > local copy of my OAuth pytest suite, which is designed to do the kinds > > of testing you're running into trouble with. It's not in any way > > viable for a PG17 commit, but if you're interested I can make the > > patches available. > > Yes please, it would be nice to see what tests you've performed, and > have it archived. I've cleaned it up a bit and put it up at [1]. (If you want, I can attach the GitHub-generated ZIP, so the mailing list has a snapshot.) These include happy-path tests for direct SSL, some failure modes, and an example test that combines the GSS and SSL negotiation paths. So there might be test bugs, but with the v8 patchset, I see the following failures: > FAILED client/test_tls.py::test_direct_ssl_without_alpn - AssertionError: client sent unexpected data I.e. the client doesn't disconnect if the server doesn't select our protocol. > FAILED client/test_tls.py::test_direct_ssl_failed_negotiation[direct-True] - AssertionError: Regex pattern did not match. > FAILED client/test_tls.py::test_direct_ssl_failed_negotiation[requiredirect-False] - AssertionError: Regex pattern didnot match. > FAILED client/test_tls.py::test_gssapi_negotiation - AssertionError: Regex pattern did not match. These are complaining about the "SSL SYSCALL error: EOF detected" error messages that the client returns. > FAILED server/test_tls.py::test_direct_ssl_without_alpn[no application protocols] - Failed: DID NOT RAISE <class 'ssl.SSLError'> > FAILED server/test_tls.py::test_direct_ssl_without_alpn[incorrect application protocol] - Failed: DID NOT RAISE <class'ssl.SSLError'> I.e. the server allows the handshake to complete without a proper ALPN selection. Thanks, --Jacob [1] https://github.com/jchampio/pg-pytest-suite
I keep forgetting -- attached is the diff I'm carrying to plug libpq_encryption into Meson. (The current patchset has a meson.build for it, but it's not connected.) --Jacob
Attachment
On Tue, 5 Mar 2024 at 15:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > I hope I didn't joggle your elbow reviewing this, Jacob, but I spent > some time rebase and fix various little things: With the recent changes to backend startup committed by you, this patchset has gotten major apply failures. Could you provide a new version of the patchset so that it can be reviewed in the context of current HEAD? Kind regards, Matthias van de Meent Neon (https://neon.tech)
On 28/03/2024 13:15, Matthias van de Meent wrote: > On Tue, 5 Mar 2024 at 15:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> >> I hope I didn't joggle your elbow reviewing this, Jacob, but I spent >> some time rebase and fix various little things: > > With the recent changes to backend startup committed by you, this > patchset has gotten major apply failures. > > Could you provide a new version of the patchset so that it can be > reviewed in the context of current HEAD? Here you are. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
On Thu, 28 Mar 2024, 13:37 Heikki Linnakangas, <hlinnaka@iki.fi> wrote: > > On 28/03/2024 13:15, Matthias van de Meent wrote: > > On Tue, 5 Mar 2024 at 15:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> > >> I hope I didn't joggle your elbow reviewing this, Jacob, but I spent > >> some time rebase and fix various little things: > > > > With the recent changes to backend startup committed by you, this > > patchset has gotten major apply failures. > > > > Could you provide a new version of the patchset so that it can be > > reviewed in the context of current HEAD? > > Here you are. Sorry for the delay. I've run some tests and didn't find any specific issues in the patchset. I did get sidetracked on trying to further improve the test suite, where I was trying to find out how to use Test::More::subtests, but have now decided it's not worth the lost time now vs adding this as a feature in 17. Some remaining comments: patches 0001/0002: not reviewed in detail. Patch 0003: The read size in secure_raw_read is capped to port->raw_buf_remaining if the raw buf has any data. While the user will probably call into this function again, I think that's a waste of cycles. pq_buffer_has_data now doesn't have any protections against desynchronized state between PqRecvLength and PqRecvPointer. An Assert(PqRecvLength >= PqRecvPointer) to that value would be appreciated. (in backend_startup.c) > + elog(LOG, "Detected direct SSL handshake"); I think this should be gated at a lower log level, or a GUC, as this wouls easily DOS a logfile by bulk sending of SSL handshake bytes. 0004: backend_startup.c > + if (!ssl_enable_alpn) > + { > + elog(WARNING, "Received direct SSL connection without ssl_enable_alpn enabled"); This is too verbose, too. > + if (!port->alpn_used) > + { > + ereport(COMMERROR, > + (errcode(ERRCODE_PROTOCOL_VIOLATION), > + errmsg("Received direct SSL connection request without required ALPN protocol negotiation extension"))); If ssl_enable_alpn is disabled, we shouln't report a COMMERROR when the client does indeed not have alpn enabled. 0005: As mentioned above, I'd have loved to use subtests here for the cube() of tests, but I got in too much of a rabbit hole to get that done. 0006: In CONNECTION_FAILED, we use connection_failed() to select whether we need a new connection or stop trying altogether, but that function's description states: > + * Out-of-line portion of the CONNECTION_FAILED() macro > + * > + * Returns true, if we should retry the connection with different encryption method. Which to me reads like we should reuse the connection, and try a different method on that same connection. Maybe we can improve the wording to something like + * Returns true, if we should reconnect with a different encryption method. to make the reconnect part more clear. In select_next_encryption_method, there are several copies of this pattern: if ((remaining_methods & ENC_METHOD) != 0) { conn->current_enc_method = ENC_METHOD; return true; } I think a helper macro would reduce the verbosity of the scaffolding, like in the attached SELECT_NEXT_METHOD.diff.txt. Kind regards, Matthias van de Meent
Attachment
Committed this. Thank you to everyone involved! On 04/04/2024 14:08, Matthias van de Meent wrote: > Patch 0003: > > The read size in secure_raw_read is capped to port->raw_buf_remaining > if the raw buf has any data. While the user will probably call into > this function again, I think that's a waste of cycles. Hmm, yeah, I suppose we could read more data in the same call. It seems simpler not to. The case that "raw_buf_remaining > 0" is a very rare. > pq_buffer_has_data now doesn't have any protections against > desynchronized state between PqRecvLength and PqRecvPointer. An > Assert(PqRecvLength >= PqRecvPointer) to that value would be > appreciated. Added. > 0006: > > In CONNECTION_FAILED, we use connection_failed() to select whether we > need a new connection or stop trying altogether, but that function's > description states: > >> + * Out-of-line portion of the CONNECTION_FAILED() macro >> + * >> + * Returns true, if we should retry the connection with different encryption method. > > Which to me reads like we should reuse the connection, and try a > different method on that same connection. Maybe we can improve the > wording to something like > + * Returns true, if we should reconnect with a different encryption method. > to make the reconnect part more clear. Changed to "Returns true, if we should reconnect and retry with a different encryption method". > In select_next_encryption_method, there are several copies of this pattern: > > if ((remaining_methods & ENC_METHOD) != 0) > { > conn->current_enc_method = ENC_METHOD; > return true; > } > > I think a helper macro would reduce the verbosity of the scaffolding, > like in the attached SELECT_NEXT_METHOD.diff.txt. Applied. In addition to the above, I made heavy changes to the tests. I wanted to test not just the outcome (SSL, GSSAPI, plaintext, or fail), but also the steps and reconnections needed to get there. To facilitate that, I rewrote how the expected outcome was represented in the test script. It now uses a table-driven approach, with a line for each test iteration, ie. for each different combination of options that are tested. I then added some more logging, so that whenever the server receives an SSLRequest or GSSENCRequest packet, it logs a line. That's controlled by a new not-in-sample GUC ("trace_connection_negotiation"), intended only for the test and debugging. The test scrapes the log for the lines that it prints, and the expected output includes a compact trace of expected events. For example, the expected output for "user=testuser gssencmode=prefer sslmode=prefer sslnegotiation=direct", when GSS and SSL are both disabled in the server, looks like this: # USER GSSENCMODE SSLMODE SSLNEGOTIATION EVENTS -> OUTCOME testuser prefer prefer direct connect, directsslreject, reconnect, sslreject, authok -> plain That means, we expect libpq to first try direct SSL, which is rejected by the server. It should then reconnect and attempt traditional negotiated SSL, which is also rejected. Finally, it should try plaintext authentication, without reconnecting, which succeeds. That actually revealed a couple of slightly bogus behaviors with the current code. Here's one example: # XXX: libpq retries the connection unnecessarily in this case: nogssuser require allow connect, gssaccept, authfail, reconnect, gssaccept, authfail -> fail That means, with "gssencmode=require sslmode=allow", if the server accepts the GSS encryption but refuses the connection at authentication, libpq will reconnect and go through the same motions again. The second attempt is pointless, we know it's going to fail. The refactoring to the libpq state machine fixed that issue as a side-effect. I removed the server ssl_enable_alpn and libpq sslalpn options. The idea was that they could be useful for testing, but we didn't actually have any tests that would use them, and you get the same result by testing with an older server or client version. I'm open to adding them back if we also add tests that benefit from them, but they were pretty pointless as they were. One important open item now is that we need to register a proper ALPN protocol ID with IANA. -- Heikki Linnakangas Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes: > Committed this. Thank you to everyone involved! Looks like perlcritic isn't too happy with the test code: koel and crake say ./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of flagged function ignored - chmod at line 138,column 2. See pages 208,278 of PBP. ([InputOutput::RequireCheckedSyscalls] Severity: 5) ./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of flagged function ignored - open at line 184, column1. See pages 208,278 of PBP. ([InputOutput::RequireCheckedSyscalls] Severity: 5) regards, tom lane
On 08/04/2024 04:28, Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: >> Committed this. Thank you to everyone involved! > > Looks like perlcritic isn't too happy with the test code: > koel and crake say > > ./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of flagged function ignored - chmod at line 138,column 2. See pages 208,278 of PBP. ([InputOutput::RequireCheckedSyscalls] Severity: 5) > ./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of flagged function ignored - open at line 184,column 1. See pages 208,278 of PBP. ([InputOutput::RequireCheckedSyscalls] Severity: 5) Fixed, thanks. I'll make a note in my personal TODO list to add perlcritic to cirrus CI if possible. I rely heavily on that nowadays to catch issues before the buildfarm. -- Heikki Linnakangas Neon (https://neon.tech)
On 08/04/2024 04:25, Heikki Linnakangas wrote: > One important open item now is that we need to register a proper ALPN > protocol ID with IANA. I sent a request for that: https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/ -- Heikki Linnakangas Neon (https://neon.tech)
On 08.04.24 10:38, Heikki Linnakangas wrote: > On 08/04/2024 04:25, Heikki Linnakangas wrote: >> One important open item now is that we need to register a proper ALPN >> protocol ID with IANA. > > I sent a request for that: > https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/ Why did you ask for "pgsql"? The IANA protocol name for port 5432 is "postgres". This seems confusing.
On 01.03.24 22:49, Jacob Champion wrote: > If we're interested in ALPN negotiation in the future, we may also > want to look at GREASE [1] to keep those options open in the presence > of third-party implementations. Unfortunately OpenSSL doesn't do this > automatically yet. > > If we don't have a reason not to, it'd be good to follow the strictest > recommendations from [2] to avoid cross-protocol attacks. (For anyone > currently running web servers and Postgres on the same host, they > really don't want browsers "talking" to their Postgres servers.) That > would mean checking the negotiated ALPN on both the server and client > side, and failing if it's not what we expect. I've been reading up on ALPN. There is another thread that is discussing PostgreSQL protocol version negotiation, and ALPN also has "protocol negotiation" in the name and there is some discussion in this thread about the granularity oft the protocol names. I'm concerned that there appears to be some confusion over whether ALPN is a performance feature or a security feature. RFC 7301 appears to be pretty clear that it's for performance, not for security. Looking at the ALPACA attack, I'm not convinced that it's very relevant for PostgreSQL. It's basically just a case of, you connected to the wrong server. And web browsers routinely open additional connections based on what data they have previously received, and they liberally send along session cookies to those new connections, so I understand that this can be a problem. But I don't see how ALPN is a good defense. It can help only if all other possible services other than http implement it and say, you're a web browser, go away. And what if the rogue server is in fact a web server, then it doesn't help at all. I guess there could be some common configurations where there is a web server, and ftp server, and some mail servers running on the same TLS end point. But in how many cases is there also a PostgreSQL server running on the same end point? The page about ALPACA also suggests SNI as a mitigation, which seems more sensible, because the burden is then on the client to do the right thing, and not on all other servers to send away clients doing the wrong thing. And of course libpq already supports SNI. For the protocol negotiation aspect, how does this work if the wrapped protocol already has a version negotiation system? For example, various HTTP versions are registered as separate protocols for ALPN. What if ALPN says it's HTTP/1.0 but the actual HTTP requests specify 1.1, or vice versa? What is the actual mechanism where the performance benefits (saving round-trips) are created? I haven't caught up with HTTP 2 and so on, so maybe there are additional things at play there, but it is not fully explained in the RFCs. I suppose PostgreSQL would keep its internal protocol version negotiation in any case, but then what do we need ALPN on top for?
On Wed, Apr 24, 2024 at 1:57 PM Peter Eisentraut <peter@eisentraut.org> wrote: > I'm concerned that there appears to be some confusion over whether ALPN > is a performance feature or a security feature. RFC 7301 appears to be > pretty clear that it's for performance, not for security. It was also designed to give benefits for more complex topologies (proxies, cert selection, etc.), but yeah, this is a mitigation technique that just uses what is already widely implemented. > Looking at the ALPACA attack, I'm not convinced that it's very relevant > for PostgreSQL. It's basically just a case of, you connected to the > wrong server. I think that's an oversimplification. This prevents active MITM, where an adversary has connected you to the wrong server. > But I don't see how ALPN is a good defense. > It can help only if all other possible services other than http > implement it and say, you're a web browser, go away. Why? An ALPACA-aware client will fail the connection if the server doesn't advertise the correct protocol. An ALPACA-aware server will fail the handshake if the client doesn't advertise the correct protocol. They protect themselves, and their peers, without needing their peers to understand. > And what if the > rogue server is in fact a web server, then it doesn't help at all. It's not a rogue server; the attack is using other friendly services against you. If you're able to set up an attacker-controlled server, using the same certificate as the valid server, on a host covered by the cert, I think it's game over for many other reasons. If you mean that you can't prevent an attacker from redirecting one web server's traffic to another (friendly) web server that's running on the same host, that's correct. Web admins who care would need to implement countermeasures, like Origin header filtering or something? I don't think we have a similar concept to that -- it'd be nice! -- but we don't need to have one in order to provide protection for the other network protocols we exist next to. > I > guess there could be some common configurations where there is a web > server, and ftp server, and some mail servers running on the same TLS > end point. But in how many cases is there also a PostgreSQL server > running on the same end point? Not only have I seen those cohosted, I've deployed such setups myself. Isn't that basically cPanel's MO, and a standard setup for <shared web hosting provider here>? (It's been a while and I don't have a setup handy to double-check, sorry; feel free to push back if they don't do that anymore.) A quick search for "running web server and Postgres on the same host" seems to yield plenty of conversations. Some of those conversations say "don't do it", but of course others do not :) Some actively encourage it for simplicity. > The page about ALPACA also suggests SNI > as a mitigation, which seems more sensible, because the burden is then > on the client to do the right thing, and not on all other servers to > send away clients doing the wrong thing. And of course libpq already > supports SNI. That mitigates a different attack. From the ALPACA site [1]: > Implementing these [ALPN] countermeasures is effective in preventing cross-protocol attacks irregardless of hostnames andports used for application servers. > ... > Implementing these [SNI] countermeasures is effective in preventing same-protocol attacks on servers with different hostnames,as well as cross-protocol attacks on servers with different hostnames even if the ALPN countermeasures can notbe implemented. SNI is super useful; it's just not always enough. And a strict SNI check would also be good to do, but it doesn't seem imperative to tie it to this feature, since same-protocol attacks were already possible AFAICT. It's the cross-protocol attacks that are new, made possible by the new handshake. > For the protocol negotiation aspect, how does this work if the wrapped > protocol already has a version negotiation system? For example, various > HTTP versions are registered as separate protocols for ALPN. What if > ALPN says it's HTTP/1.0 but the actual HTTP requests specify 1.1, or > vice versa? If a client or server incorrectly negotiates a protocol and then starts speaking a different one, then it's just protocol-dependent whether that works or not. HTTP/1.0 and HTTP/1.1 would still be cross-compatible in some cases. The others, not so much. > What is the actual mechanism where the performance benefits > (saving round-trips) are created? The negotiation gets done as part of the TLS handshake, which had to be done anyway. > I haven't caught up with HTTP 2 and > so on, so maybe there are additional things at play there, but it is not > fully explained in the RFCs. Practically speaking, HTTP/2 is negotiated via ALPN in the real world, at least last I checked. I don't think browsers ever supported the plaintext h2c:// scheme. There's also an in-band `Upgrade: h2c` path defined that does not use ALPN at all, but again I don't think any browsers use it. > I suppose PostgreSQL would keep its > internal protocol version negotiation in any case, but then what do we > need ALPN on top for? That is entirely up to us. If there's a 4.0 protocol that's completely incompatible at the network level (multiplexing? QUIC?) then issuing a new ALPN would probably be useful. Thanks, --Jacob [1] https://alpaca-attack.com/libs.html
On 24/04/2024 23:51, Peter Eisentraut wrote: > On 08.04.24 10:38, Heikki Linnakangas wrote: >> On 08/04/2024 04:25, Heikki Linnakangas wrote: >>> One important open item now is that we need to register a proper ALPN >>> protocol ID with IANA. >> >> I sent a request for that: >> https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/ > > Why did you ask for "pgsql"? The IANA protocol name for port 5432 is > "postgres". This seems confusing. Oh, I was not aware of that. According to [1], it's actually "postgresql". The ALPN registration has not been approved yet, so I'll reply on the ietf thread to point that out. [1] https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt -- Heikki Linnakangas Neon (https://neon.tech)