Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert - Mailing list pgsql-hackers
From | Daniel Gustafsson |
---|---|
Subject | Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert |
Date | |
Msg-id | 9B6FE12B-D665-4764-895B-839CF650D543@yesql.se Whole thread Raw |
In response to | Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert (Jacob Champion <jchampion@timescale.com>) |
Responses |
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
|
List | pgsql-hackers |
> On 14 Apr 2023, at 19:34, Jacob Champion <jchampion@timescale.com> wrote: > > On Fri, Apr 14, 2023 at 7:20 AM Daniel Gustafsson <daniel@yesql.se> wrote: >> And again with the attachment. > > After some sleep... From inspection I think the final EOF branch could > be masked by the new branch, if verification has failed but was already > ignored. > > To test that, I tried hanging up on the client partway through the > server handshake, and I got some strange results. With the patch, using > sslmode=require and OpenSSL 1.0.1, I see: > > connection to server at "127.0.0.1", port 50859 failed: SSL error: > certificate verify failed: self signed certificate > > Which is wrong -- we shouldn't care about the self-signed failure if > we're not using verify-*. I was going to suggest a patch like the following: > >> if (r == -1) >> - libpq_append_conn_error(conn, "SSL SYSCALL error: %s", >> - SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf))); >> + { >> + /* >> + * If we get an X509 error here without an error in the >> + * socket layer it means that verification failed without >> + * it being a protocol error. A common cause is trying to >> + * a default system CA which is missing or broken. >> + */ >> + if (!save_errno && vcode != X509_V_OK) >> + libpq_append_conn_error(conn, "SSL error: certificate verify failed: %s", >> + X509_verify_cert_error_string(vcode)); >> + else >> + libpq_append_conn_error(conn, "SSL SYSCALL error: %s", >> + SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf))); >> + } >> else >> libpq_append_conn_error(conn, "SSL SYSCALL error: EOF detected"); > > But then I tested my case against PG15, and I didn't get the EOF message > I expected: > > connection to server at "127.0.0.1", port 50283 failed: SSL SYSCALL > error: Success This "error: Success" error has been reported to the list numerous times as misleading, and I'd love to make progress on improving error reporting during the v17 cycle. > So it appears that this (hanging up on the client during the handshake) > is _also_ a case where we could get a SYSCALL error with a zero errno, > and my patch doesn't actually fix the misleading error message. > > That makes me worried, but I don't really have a concrete suggestion to > make it better, yet. I'm not opposed to pushing this as a fix for the > tests, but I suspect we'll have to iterate on this more. So, taking a step back. We know that libpq error reporting for SSL errors isn't great, the permutations of sslmodes and OpenSSL versions and the very fine-grained error handling API of OpenSSL make it hard to generalize well. That's not what we're trying to solve here. What we are trying solve is this one case where we know exactly what went wrong, and we know that the error message as-is will be somewhere between misleading and utterly bogus. The committed feature is working as intended, and the connection is refused as it should when no CA is available, but we know it's a situation which is quite easy to get oneself into (a typo in an environment variable can be enough). So what we can do is pinpoint that specific case and leave the unknowns to the current error reporting for consistency with older postgres versions. The attached checks for the specific known error, and leave all the other cases to the same logging that we have today. It relies on the knowledge that system sslrootcert configs has deferred loading, and will run with verify-full. So if we see an X509 failure in loading the local issuer cert here then we know the the user wanted to use the system CA pool for certificate verification but the root CA cannot be loaded for some reason. -- Daniel Gustafsson
Attachment
pgsql-hackers by date: