Re: Experiments with Postgres and SSL - Mailing list pgsql-hackers
From | Jacob Champion |
---|---|
Subject | Re: Experiments with Postgres and SSL |
Date | |
Msg-id | CAOYmi+=cnV-8V8TndSkEF6Htqa7qHQUL_KnQU8-DrT0Jjnm3_Q@mail.gmail.com Whole thread Raw |
In response to | Re: Experiments with Postgres and SSL (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Experiments with Postgres and SSL
|
List | pgsql-hackers |
On Tue, Mar 5, 2024 at 6:09 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > I hope I didn't joggle your elbow reviewing this Nope, not at all! > The tests are still not distinguishing whether a connection was > established in direct or negotiated mode. So if we e.g. had a bug that > accidentally disabled direct SSL connection completely and always used > negotiated mode, the tests would still pass. I'd like to see some tests > that would catch that. +1 On Mon, Mar 4, 2024 at 7:29 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 01/03/2024 23:49, Jacob Champion wrote: > > I'll squint more closely at the MITM-protection changes in 0008 later. > > First impressions, though: it looks like that code has gotten much > > less straightforward, which I think is dangerous given the attack it's > > preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our > > protocol doesn't seem particularly replay-safe to me.) > > Let's drop that patch. AFAICS it's not needed by the rest of the patches. Okay, sounds good. > > If we're interested in ALPN negotiation in the future, we may also > > want to look at GREASE [1] to keep those options open in the presence > > of third-party implementations. Unfortunately OpenSSL doesn't do this > > automatically yet. > > Can you elaborate? Sure: now that we're letting middleboxes and proxies inspect and react to connections based on ALPN, it's possible that some intermediary might incorrectly fixate on the "postgres" ID (or whatever we choose in the end), and shut down connections that carry additional protocols rather than ignoring them. That would prevent future graceful upgrades where the client sends both "postgres/X" and "postgres/X+1". While that wouldn't be our fault, it'd be cold comfort to whoever has that middlebox. GREASE is a set of reserved protocol IDs that you can add randomly to your ALPN list, so any middleboxes that fail to follow the rules will just break outright rather than silently proliferating. (Hence the pun: GREASE keeps the joints in the pipe from rusting into place.) The RFC goes into more detail about how to do it. And I don't know if it's necessary for a v1, but it'd be something to keep in mind. > Do we need to do something extra in the server to be > compatible with GREASE? No, I think that as long as we use OpenSSL's APIs correctly on the server side, we'll be compatible by default. This would be a client-side implementation, to push random GREASE strings into the ALPN list. (There is a risk that if/when OpenSSL finally starts supporting this transparently, we'd need to remove it from our code.) > > If we don't have a reason not to, it'd be good to follow the strictest > > recommendations from [2] to avoid cross-protocol attacks. (For anyone > > currently running web servers and Postgres on the same host, they > > really don't want browsers "talking" to their Postgres servers.) That > > would mean checking the negotiated ALPN on both the server and client > > side, and failing if it's not what we expect. > > Hmm, I thought that's what the patches does. But looking closer, libpq > is not checking that ALPN was used. We should add that. Am I right? Right. Also, it looks like the server isn't failing the TLS handshake itself, but instead just dropping the connection after the handshake. In a cross-protocol attack, there's a danger that the client (which is not speaking our protocol) could still treat the server as authoritative in that situation. > > I'm not excited about the proliferation of connection options. I don't > > have a lot of ideas on how to fix it, though, other than to note that > > the current sslnegotiation option names are very unintuitive to me: > > - "postgres": only legacy handshakes > > - "direct": might be direct... or maybe legacy > > - "requiredirect": only direct handshakes... unless other options are > > enabled and then we fall back again to legacy? How many people willing > > to break TLS compatibility with old servers via "requiredirect" are > > going to be okay with lazy fallback to GSS or otherwise? > > Yeah, this is my biggest complaint about all this. Not so much the names > of the options, but the number of combinations of different options, and > how we're going to test them all. I don't have any great solutions, > except adding a lot of tests to cover them, like Matthias did. The default gssencmode=prefer is especially problematic if I'm trying to use sslnegotiation=requiredirect for security. It'll appear to work at first, but if somehow I get a credential cache into my environment, libpq will suddenly fall back to plaintext negotiation :( > > (Plus, we need to have a good error message when connecting to older > > servers anyway.I think we should be able to key off of the EOF coming > > back from OpenSSL; it'd be a good excuse to give that part of the code > > some love.) > > Hmm, if OpenSSL sends ClientHello and the server responds with a > Postgres error packet, OpenSSL will presumably consume the error packet > or at least part of it. But with our custom BIO, we can peek at the > server response before handing it to OpenSSL. I don't think an error packet is going to come back with the currently-shipped implementations. IIUC, COMMERROR packets are swallowed instead of emitted before authentication completes. So I see EOFs when trying to connect to older servers. Do you know of any situations where we'd see an actual error message on the wire? > If it helps, we could backport a nicer error message to old server > versions, similar to what we did with SCRAM in commit 96d0f988b1. That might be nice regardless, instead of pushing "invalid length of startup packet" into the logs. > > For the record, I'm adding some one-off tests for this feature to a > > local copy of my OAuth pytest suite, which is designed to do the kinds > > of testing you're running into trouble with. It's not in any way > > viable for a PG17 commit, but if you're interested I can make the > > patches available. > > Yes please, it would be nice to see what tests you've performed, and > have it archived. I've cleaned it up a bit and put it up at [1]. (If you want, I can attach the GitHub-generated ZIP, so the mailing list has a snapshot.) These include happy-path tests for direct SSL, some failure modes, and an example test that combines the GSS and SSL negotiation paths. So there might be test bugs, but with the v8 patchset, I see the following failures: > FAILED client/test_tls.py::test_direct_ssl_without_alpn - AssertionError: client sent unexpected data I.e. the client doesn't disconnect if the server doesn't select our protocol. > FAILED client/test_tls.py::test_direct_ssl_failed_negotiation[direct-True] - AssertionError: Regex pattern did not match. > FAILED client/test_tls.py::test_direct_ssl_failed_negotiation[requiredirect-False] - AssertionError: Regex pattern didnot match. > FAILED client/test_tls.py::test_gssapi_negotiation - AssertionError: Regex pattern did not match. These are complaining about the "SSL SYSCALL error: EOF detected" error messages that the client returns. > FAILED server/test_tls.py::test_direct_ssl_without_alpn[no application protocols] - Failed: DID NOT RAISE <class 'ssl.SSLError'> > FAILED server/test_tls.py::test_direct_ssl_without_alpn[incorrect application protocol] - Failed: DID NOT RAISE <class'ssl.SSLError'> I.e. the server allows the handshake to complete without a proper ALPN selection. Thanks, --Jacob [1] https://github.com/jchampio/pg-pytest-suite
pgsql-hackers by date: