Thread: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

[PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Thomas Habets
Date:
With Letsencrypt now protecting web servers left and right, and it makes sense to me to just re-use the cert that the server may already have installed.

I've tested this on debian with the client compiled from the master branch, against a 13.3 server.

This is my first patch to postgresql, so I apologize for any process errors. I tried to follow https://wiki.postgresql.org/wiki/Submitting_a_Patch

Hope this list takes attachments.

--
typedef struct me_s {
 char name[]      = { "Thomas Habets" };
 char email[]     = { "thomas@habets.se" };
 char kernel[]    = { "Linux" };
 char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt" };
 char pgp[] = { "9907 8698 8A24 F52F 1C2E  87F6 39A4 9EEA 460A 0169" };
 char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;
Attachment
Thomas Habets <thomas@habets.se> writes:
> With Letsencrypt now protecting web servers left and right, and it makes
> sense to me to just re-use the cert that the server may already have
> installed.

I'm confused by your description of this patch.  AFAIK, OpenSSL verifies
against the system-wide CA pool by default.  Why do we need to do
anything?

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
thomas@habets.se
Date:
On Mon, 6 Sep 2021 20:47:37 +0100, Tom Lane <tgl@sss.pgh.pa.us> said:
> I'm confused by your description of this patch.  AFAIK, OpenSSL verifies
> against the system-wide CA pool by default.  Why do we need to do
> anything?

Experimentally, no it doesn't. Or if it does, then it doesn't verify
the CN/altnames of the cert.

sslmode=require allows self-signed and name mismatch.

verify-ca errors out if there is no ~/.postgresql/root.crt. verify-full too.

It seems that currently postgresql verifies the name if and only if
verify-full is used, and then only against ~/.postgresql/root.crt CA file.

But could be that I missed a config option?

--
typedef struct me_s {
  char name[]      = { "Thomas Habets" };
  char email[]     = { "thomas@habets.se" };
  char kernel[]    = { "Linux" };
  char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt" };
  char pgp[] = { "9907 8698 8A24 F52F 1C2E  87F6 39A4 9EEA 460A 0169" };
  char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Andrew Dunstan
Date:
On 9/6/21 6:21 PM, thomas@habets.se wrote:
> On Mon, 6 Sep 2021 20:47:37 +0100, Tom Lane <tgl@sss.pgh.pa.us> said:
>> I'm confused by your description of this patch.  AFAIK, OpenSSL verifies
>> against the system-wide CA pool by default.  Why do we need to do
>> anything?
> Experimentally, no it doesn't. Or if it does, then it doesn't verify
> the CN/altnames of the cert.
>
> sslmode=require allows self-signed and name mismatch.
>
> verify-ca errors out if there is no ~/.postgresql/root.crt. verify-full too.
>
> It seems that currently postgresql verifies the name if and only if
> verify-full is used, and then only against ~/.postgresql/root.crt CA file.
>
> But could be that I missed a config option?



That's my understanding. But can't you specify a CA cert in the system's
CA store if necessary? e.g. on my Fedora system I think it's
/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
thomas@habets.se
Date:
On Tue, 7 Sep 2021 15:16:51 +0100, Andrew Dunstan <andrew@dunslane.net> said:
> can't you specify a CA cert in the system's
> CA store if necessary? e.g. on my Fedora system I think it's
> /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt

I could, but that seems more like a workaround, where I have to change
things around as LetsEncrypt switches to another root (I believe they
have in the past, but I'm not sure), or the server decides to switch
from LetsEncrypt to something else. Then all clients need to update.

Such a decision could actually be made by whoever runs the webserver,
not the database, and the database just reuses the cert and gets a
free ride for cert renewals.

So in other words postgresql currently doesn't use the system database
at all, and the workaround is to find and copy from the system
database. I agree that is a workaround.

If you think this is enough of a corner case that the workaround is
acceptable, or the added complexity of another sslmode setting isn't
worth fixing this edge case, then I assume you have more knowledge
about postgres is used in the field than I do.

But it's not just about today. I would hope that now with LE that
every user of SSL starts using "real" certs. Postgres default settings
imply that most people who even enable SSL will not verify the CA nor
the name, which is a shame.

--
typedef struct me_s {
  char name[]      = { "Thomas Habets" };
  char email[]     = { "thomas@habets.se" };
  char kernel[]    = { "Linux" };
  char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt" };
  char pgp[] = { "9907 8698 8A24 F52F 1C2E  87F6 39A4 9EEA 460A 0169" };
  char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;



thomas@habets.se writes:
> On Tue, 7 Sep 2021 15:16:51 +0100, Andrew Dunstan <andrew@dunslane.net> said:
>> can't you specify a CA cert in the system's
>> CA store if necessary? e.g. on my Fedora system I think it's
>> /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt

> I could, but that seems more like a workaround, where I have to change
> things around as LetsEncrypt switches to another root (I believe they
> have in the past, but I'm not sure), or the server decides to switch
> from LetsEncrypt to something else. Then all clients need to update.

I experimented with loading a real (not self-signed, not from a private
CA) cert into the server, and I can confirm these results when trying
to use sslmode=verify-ca or sslmode=verify-full:

* libpq fails the connection if ~/.postgresql/root.crt is missing
or empty.

* If I put an irrelevant cert into ~/.postgresql/root.crt, then
libpq reports "SSL error: certificate verify failed".  So the
verification insists that the server's cert chain to whatever
is in root.crt.

This does seem to make it unreasonably painful to use a real SSL cert
for a PG server.  If you've gone to the trouble and expense of getting
a real cert, it should not be on you to persuade the clients that
it's valid.  I agree with Thomas that copying the system trust store
into users' home directories is a completely horrid idea, from both
the ease-of-use and maintenance standpoints.

This is not how I supposed it worked, so I'm coming around to the idea
that we need to do something.  I don't like the details of Thomas'
proposal though; specifically I don't see a need to invent a new sslmode
value.  I think it should just be "if ~/.postgresql/root.crt doesn't
exist, use the system's default trust store".

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Andrew Dunstan
Date:
On 9/7/21 10:57 AM, thomas@habets.se wrote:
> On Tue, 7 Sep 2021 15:16:51 +0100, Andrew Dunstan <andrew@dunslane.net> said:
>> can't you specify a CA cert in the system's
>> CA store if necessary? e.g. on my Fedora system I think it's
>> /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt
> I could, but that seems more like a workaround, where I have to change
> things around as LetsEncrypt switches to another root (I believe they
> have in the past, but I'm not sure), or the server decides to switch
> from LetsEncrypt to something else. Then all clients need to update.
>
> Such a decision could actually be made by whoever runs the webserver,
> not the database, and the database just reuses the cert and gets a
> free ride for cert renewals.
>
> So in other words postgresql currently doesn't use the system database
> at all, and the workaround is to find and copy from the system
> database. I agree that is a workaround.
>
> If you think this is enough of a corner case that the workaround is
> acceptable, or the added complexity of another sslmode setting isn't
> worth fixing this edge case, then I assume you have more knowledge
> about postgres is used in the field than I do.
>
> But it's not just about today. I would hope that now with LE that
> every user of SSL starts using "real" certs. Postgres default settings
> imply that most people who even enable SSL will not verify the CA nor
> the name, which is a shame.


It would be if it were true, but it's not. In talks I give on
PostgreSQL+SSL I highly recommend people use verify-full. And the CA
doesn't have to be one that's publicly known. We cater for both public
and private CAs.

You don't have to copy anything to achieve what you want. Just set the
sslrootcert parameter of your connection to point to the system file. e.g.

psql "sslmode=verify-full sslrootcert=/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt ..."


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Andrew Dunstan <andrew@dunslane.net> writes:
> You don't have to copy anything to achieve what you want. Just set the
> sslrootcert parameter of your connection to point to the system file. e.g.

> psql "sslmode=verify-full sslrootcert=/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt ..."

While that does work for me, it seems pretty OS-specific and
user-unfriendly.  Why should ordinary users need to know that
much about their platform's OpenSSL installation?

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Andrew Dunstan
Date:
On 9/7/21 11:47 AM, Tom Lane wrote:
>
> This is not how I supposed it worked, 


That happens to me more than I usually admit -)


> so I'm coming around to the idea
> that we need to do something.  I don't like the details of Thomas'
> proposal though; specifically I don't see a need to invent a new sslmode
> value.  I think it should just be "if ~/.postgresql/root.crt doesn't
> exist, use the system's default trust store".
>
>             


I agree sslmode is the wrong vehicle.

An alternative might be to allow a magic value for sslrootcert, say
"system" which would make it go and look in the system's store wherever
that is, without the user having to know exactly where. OTOH it would
require that the user knows that the system's store is being used, which
might not be a bad thing.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Andrew Dunstan <andrew@dunslane.net> writes:
> On 9/7/21 11:47 AM, Tom Lane wrote:
>> so I'm coming around to the idea
>> that we need to do something.  I don't like the details of Thomas'
>> proposal though; specifically I don't see a need to invent a new sslmode
>> value.  I think it should just be "if ~/.postgresql/root.crt doesn't
>> exist, use the system's default trust store".

> An alternative might be to allow a magic value for sslrootcert, say
> "system" which would make it go and look in the system's store wherever
> that is, without the user having to know exactly where. OTOH it would
> require that the user knows that the system's store is being used, which
> might not be a bad thing.

Yeah, that would mostly fix the usability concern.  I guess what it
comes down to is whether you think that public or private certs are
likely to be the majority use-case in the long run.  The shortage of
previous requests for this feature says that right now, just about
everyone is using self-signed or private-CA certs for Postgres
servers.  So it would likely be a long time, if ever, before public-CA
certs become the majority use-case.

On the other hand, even if I'm using a private CA, there's a lot
to be said for adding its root cert to system-level trust stores
rather than copying it into individual users' home directories.
So I still feel like there's a pretty good case for allowing use
of the system store to happen by default.  (As I said, I'd always
thought that was *already* what would happen.)

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Greg Stark
Date:
On Tue, 7 Sept 2021 at 12:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I guess what it
> comes down to is whether you think that public or private certs are
> likely to be the majority use-case in the long run.  The shortage of
> previous requests for this feature says that right now, just about
> everyone is using self-signed or private-CA certs for Postgres
> servers.  So it would likely be a long time, if ever, before public-CA
> certs become the majority use-case.

Well the main thing making public CA certs a pain is precisely tools
that are a pain to configure to use public CA certs so it's a bit of a
chicken and egg problem. Projects like LetsEncrypt are all about
making public CA certs work easily without any additional effort.

However I have a different question. Are the system certificates
intended or general purpose certificates? Do they have their intended
uses annotated on the certificates? Does SSL Verification have any
logic deciding which certificates are appropriate for signing servers?

I ask because the only authority I'm personally aware of is the web
browser consortium that approves signers for web site domains. That's
what web browsers need but I'm not sure those are the same authorities
that are appropriate for internal services like databases.


-- 
greg



Greg Stark <stark@mit.edu> writes:
> However I have a different question. Are the system certificates
> intended or general purpose certificates? Do they have their intended
> uses annotated on the certificates? Does SSL Verification have any
> logic deciding which certificates are appropriate for signing servers?

AFAIK, once you've stuck a certificate into the system store, it
will be trusted by every service on your machine.  Most distros
ship system-store contents that are basically just designed for
web browers, because the web is the only widely-applicable use
case.  Like you said, chicken and egg problem.

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Greg Stark
Date:
Hm. Let's Encrypt's FAQ tells me I'm on the right track with that
question but the distinctinos are far more coarse than I was worried
about:


Does Let’s Encrypt issue certificates for anything other than SSL/TLS
for websites?

Let’s Encrypt certificates are standard Domain Validation
certificates, so you can use them for any server that uses a domain
name, like web servers, mail servers, FTP servers, and many more.

Email encryption and code signing require a different type of
certificate that Let’s Encrypt does not issue.


So it sounds like, at least for SSL connections, we should use the
same certificate authorities used to authenticate web sites. If ever
we implemented signed extensions, for example, it might require
different certificates -- I don't know what that means for the SSL
validation rules and the storage for them.



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Cameron Murdoch
Date:
Hi,

I manage a bunch of Postgres servers at Oslo University and we use real ssl certs on all our servers.

I was actually really surprised to discover that the libpq default is sslmode=require and that the root cert defaults to a file under the user’s home directory. I have been planning to use our management system (CFEngine) to globally change the client settings to verify-ca and to use the system trust store.

So that’s a +1 to use the system cert store for client connections.

I also agree that the proposed patch is not the right way to go as it is essentially the same as verify-full, and I think that the correct fix would be to change the default.

Thanks
C

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Thomas Habets
Date:
On Sat, 18 Sept 2021 at 00:10, Cameron Murdoch <cam@macaroon.net> wrote:
I also agree that the proposed patch is not the right way to go as it is essentially the same as verify-full, and I think that the correct fix would be to change the default.

But these are two changes:
1. Actually verify against a CA
2. Actually check the CN/altnames

Anything short of "verify-full" is in my view "not checking". Even with a private CA this allows for a lot of lateral movement in an org, as if you have one cert you have them all, for impersonation purposes.

Changing such a default is a big change. Maybe long term it's worth the short term pain, though. Long term it'd be the config of least surprise, in my opinion.
But note that one has to think about all the settings, such that the default is not more checking than "require", which might also be surprising.

A magic setting of the file to be "system" sounds good for my use cases, at least.



--
typedef struct me_s {
 char name[]      = { "Thomas Habets" };
 char email[]     = { "thomas@habets.se" };
 char kernel[]    = { "Linux" };
 char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt" };
 char pgp[] = { "9907 8698 8A24 F52F 1C2E  87F6 39A4 9EEA 460A 0169" };
 char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Cameron Murdoch
Date:
On Sat, 18 Sep 2021 at 12:57, Thomas Habets <thomas@habets.se> wrote:

But these are two changes:
1. Actually verify against a CA
2. Actually check the CN/altnames

Anything short of "verify-full" is in my view "not checking". Even with a private CA this allows for a lot of lateral movement in an org, as if you have one cert you have them all, for impersonation purposes.

100% agree. I suspect that many postgres users are not completely aware that by default their ssl connections do not check the CA or CN/altnames.


Changing such a default is a big change.

Agreed. It is going to break existing installs that rely on the current behaviour.

There are two defaults to worry about here:

sslmode=prefer
sslrootcert=~/.postgresql/root.crt

Having sslrootcert use the system trust store if ~/.postgresql/root.crt doesn’t exist would seem like a good change.

Changing sslmode to default to something else would mostly likely break a ton of existing installations, and there are plenty of use cases were ssl isn’t used. Trying ssl first and without afterwards probably is still a sensible default. However…

I haven’t completely through this through, but what if the sslmode=prefer logic was:

1. Try ssl first, with both CA and CN checking (ie same as verify-full)
2. Print warnings appropriate to what type of ssl connection can be made
3. If all else fails, try without ssl.

In other words start with verify-full and downgrade gracefully to prefer, but actually tell the user that this has happen.

Essentially sslmode=prefer is a type of opportunistic encryption. I’m suggesting making it try stronger levels of ssl opportunistically. Require, verify-ca and verify-full can keep their semantics, or rather, they should all try verify-full first and then downgrade (with warnings logged) to the level they actually enforce.

Thanks
C

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Andrew Dunstan
Date:

On 9/17/21 5:35 PM, Greg Stark wrote:
> Hm. Let's Encrypt's FAQ tells me I'm on the right track with that
> question but the distinctinos are far more coarse than I was worried
> about:
>
>
> Does Let’s Encrypt issue certificates for anything other than SSL/TLS
> for websites?
>
> Let’s Encrypt certificates are standard Domain Validation
> certificates, so you can use them for any server that uses a domain
> name, like web servers, mail servers, FTP servers, and many more.
>
> Email encryption and code signing require a different type of
> certificate that Let’s Encrypt does not issue.



Presumably this should be a certificate something like our client certs,
where the subject designates a user id or similar (e.g. an email
address) rather than a domain name.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Sat, 2021-09-18 at 14:20 +0200, Cameron Murdoch wrote:
> Having sslrootcert use the system trust store if
> ~/.postgresql/root.crt doesn’t exist would seem like a good change.

Fallback behavior can almost always be exploited given the right
circumstances. IMO, if I've told psql to use a root cert, it really
needs to do that and not trust anything else.

> Changing sslmode to default to something else would mostly likely
> break a ton of existing installations, and there are plenty of use
> cases were ssl isn’t used. Trying ssl first and without afterwards
> probably is still a sensible default. However…

The discussion on changing the sslmode default behavior seems like it
can be separated from the use of system certificates. Not to shut down
that branch of the conversation, but is there enough tentative support
for an "sslrootcert=system" option to move forward with that, while
also discussing potential changes to the sslmode defaults?

The NSS patchset [1] also deals with this problem. FWIW, it currently
treats an empty ssldatabase setting as "use the system's (Mozilla's)
trusted roots".

--Jacob

[1] https://www.postgresql.org/message-id/flat/FAB21FC8-0F62-434F-AA78-6BD9336D630A@yesql.se

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Andrew Dunstan
Date:
On 9/22/21 2:36 PM, Jacob Champion wrote:
> On Sat, 2021-09-18 at 14:20 +0200, Cameron Murdoch wrote:
>> Having sslrootcert use the system trust store if
>> ~/.postgresql/root.crt doesn’t exist would seem like a good change.
> Fallback behavior can almost always be exploited given the right
> circumstances. IMO, if I've told psql to use a root cert, it really
> needs to do that and not trust anything else.
>
>> Changing sslmode to default to something else would mostly likely
>> break a ton of existing installations, and there are plenty of use
>> cases were ssl isn’t used. Trying ssl first and without afterwards
>> probably is still a sensible default. However…
> The discussion on changing the sslmode default behavior seems like it
> can be separated from the use of system certificates. Not to shut down
> that branch of the conversation, but is there enough tentative support
> for an "sslrootcert=system" option to move forward with that, while
> also discussing potential changes to the sslmode defaults?
>
> The NSS patchset [1] also deals with this problem. FWIW, it currently
> treats an empty ssldatabase setting as "use the system's (Mozilla's)
> trusted roots".
>


I think we need to be consistent on this. NSS builds and OpenSSL builds
should act the same, mutatis mutandis.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 22 Sep 2021, at 20:59, Andrew Dunstan <andrew@dunslane.net> wrote:

> I think we need to be consistent on this. NSS builds and OpenSSL builds
> should act the same, mutatis mutandis.

I 100% agree.  Different TLS backends should be able use different truststores
etc but once the server is running they must be identical in terms of how they
interact with a connecting client.  I've tried hard to match our OpenSSL
implementation when hacking on the NSS support, but no doubt I've slipped up
somewhere so indepth reviews like what Jacob et.al have done is needed (and
very welcome).

--
Daniel Gustafsson        https://vmware.com/




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
thomas@habets.se
Date:
On Tue, 28 Sep 2021 02:09:11 +0100, Bruce Momjian <bruce@momjian.us> said:
> I don't think public CA's are not a good idea for complex setups since
> they open the ability for an external party to create certificates that
> are trusted by your server's CA, e.g., certificate authentication.

I'm not arguing for, and in fact would argue against, public CA for
client certs.

So that's a separate issue.

Note that mTLS prevents a MITM attack that exposes server data even if
server cert is compromised or re-issued, so if the install is using
client certs (with private CA) then the public CA for server matters
much less.

You can end up at the wrong server, yes, and provide data as INSERT,
but can't steal or corrupt existing data.

And you say for complex setups. Fair enough. But currently I'd say the
default is wrong, and what should be default is not configurable.

--
typedef struct me_s {
  char name[]      = { "Thomas Habets" };
  char email[]     = { "thomas@habets.se" };
  char kernel[]    = { "Linux" };
  char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt" };
  char pgp[] = { "9907 8698 8A24 F52F 1C2E  87F6 39A4 9EEA 460A 0169" };
  char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Bruce Momjian
Date:
On Tue, Sep 28, 2021 at 02:54:39AM -0700, thomas@habets.se wrote:
> On Tue, 28 Sep 2021 02:09:11 +0100, Bruce Momjian <bruce@momjian.us> said:
> > I don't think public CA's are not a good idea for complex setups since
> > they open the ability for an external party to create certificates that
> > are trusted by your server's CA, e.g., certificate authentication.
> 
> I'm not arguing for, and in fact would argue against, public CA for
> client certs.
> 
> So that's a separate issue.
> 
> Note that mTLS prevents a MITM attack that exposes server data even if
> server cert is compromised or re-issued, so if the install is using
> client certs (with private CA) then the public CA for server matters
> much less.
> 
> You can end up at the wrong server, yes, and provide data as INSERT,
> but can't steal or corrupt existing data.
> 
> And you say for complex setups. Fair enough. But currently I'd say the
> default is wrong, and what should be default is not configurable.

Agreed, I think this needs much more discussion and documentation.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Mon, Oct 4, 2021 at 9:14 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Sep 28, 2021 at 02:54:39AM -0700, thomas@habets.se wrote:
> > And you say for complex setups. Fair enough. But currently I'd say the
> > default is wrong, and what should be default is not configurable.
>
> Agreed, I think this needs much more discussion and documentation.

I'd like to try to get this conversation started again. To pique
interest I've attached a new version of 0001, which implements
`sslrootcert=system` instead as suggested upthread. In 0002 I went
further and switched the default sslmode to `verify-full` when using
the system CA roots, because I feel pretty strongly that anyone
interested in using public CA systems is also interested in verifying
hostnames. (Otherwise, why make the switch?)

Notes:
- 0001, like Thomas' original patch, uses
SSL_CTX_set_default_verify_paths(). This will load both a default file
and a default directory. This is probably what most people want if
they're using the system roots -- just give me whatever the local
system wants me to use! -- but sslrootcert currently deals with files
only, I think. Is that a problem?
- The implementation in 0002 goes all the way down to
conninfo_add_defaults(). Maybe this is overly complex. Should I just
make sslmode a derived option, via connectOptions2()?

Thanks,
--Jacob

Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
thomas@habets.se
Date:
On Tue, 25 Oct 2022 01:03:23 +0100, Jacob Champion
<jchampion@timescale.com> said:
> I'd like to try to get this conversation started again. To pique
> interest I've attached a new version of 0001, which implements
> `sslrootcert=system` instead as suggested upthread. In 0002 I went
> further and switched the default sslmode to `verify-full` when using
> the system CA roots, because I feel pretty strongly that anyone
> interested in using public CA systems is also interested in verifying
> hostnames. (Otherwise, why make the switch?)

Yeah I agree that not forcing verify-full when using system CAs is a
giant foot-gun, and many will stop configuring just until it works.

Is there any argument for not checking hostname when using a CA pool
for which literally anyone can create a cert that passes?

It makes sense for self-signed, or "don't care", since that provides
at least protection against passive attacks, but if someone went out
of their way to get a third party signed cert, then it doesn't.

One downside to this approach is that now one option will change the
value of another option. For SSL mode (my rejected patch :-) ) that
makes maybe some more sense.

For users, what is more surprising: A foot-gun that sounds safe, or
one option that overrides another?

--
typedef struct me_s {
  char name[]      = { "Thomas Habets" };
  char email[]     = { "thomas@habets.se" };
  char kernel[]    = { "Linux" };
  char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt" };
  char pgp[] = { "9907 8698 8A24 F52F 1C2E  87F6 39A4 9EEA 460A 0169" };
  char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Andrew Dunstan
Date:
On 2022-10-25 Tu 07:01, thomas@habets.se wrote:
> On Tue, 25 Oct 2022 01:03:23 +0100, Jacob Champion
> <jchampion@timescale.com> said:
>> I'd like to try to get this conversation started again. To pique
>> interest I've attached a new version of 0001, which implements
>> `sslrootcert=system` instead as suggested upthread. In 0002 I went
>> further and switched the default sslmode to `verify-full` when using
>> the system CA roots, because I feel pretty strongly that anyone
>> interested in using public CA systems is also interested in verifying
>> hostnames. (Otherwise, why make the switch?)
> Yeah I agree that not forcing verify-full when using system CAs is a
> giant foot-gun, and many will stop configuring just until it works.
>
> Is there any argument for not checking hostname when using a CA pool
> for which literally anyone can create a cert that passes?
>
> It makes sense for self-signed, or "don't care", since that provides
> at least protection against passive attacks, but if someone went out
> of their way to get a third party signed cert, then it doesn't.
>
> One downside to this approach is that now one option will change the
> value of another option. For SSL mode (my rejected patch :-) ) that
> makes maybe some more sense.
>
> For users, what is more surprising: A foot-gun that sounds safe, or
> one option that overrides another?


I don't find too much difficulty in having one option's default depend
on another's value, as long as it's documented.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Tue, Oct 25, 2022 at 4:01 AM <thomas@habets.se> wrote:
> Yeah I agree that not forcing verify-full when using system CAs is a
> giant foot-gun, and many will stop configuring just until it works.
>
> Is there any argument for not checking hostname when using a CA pool
> for which literally anyone can create a cert that passes?

I don't think so. For verify-ca to make any sense, the system CA pool
would need to be very strictly curated, and IMO we already have that
use case covered today.

If there are no valuable use cases for weaker checks, then we could go
even further than my 0002 and just reject any weaker sslmodes
outright. That'd be nice.

--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Tue, Oct 25, 2022 at 7:26 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> I don't find too much difficulty in having one option's default depend
> on another's value, as long as it's documented.

My patch is definitely missing the documentation for that part right
now; I wanted to get feedback on the approach before wordsmithing too
much.

Thanks,
--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Tue, Oct 25, 2022 at 1:20 PM Jacob Champion <jchampion@timescale.com> wrote:
> I wanted to get feedback on the approach before wordsmithing too
> much.

I've added this to tomorrow's CF [1]. Thomas, if you get (or already
have) a PG community username, I can add you as an author.

Thanks,
--Jacob

[1] https://commitfest.postgresql.org/40/3990/



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
thomas@habets.se
Date:
On Mon, 31 Oct 2022 23:36:16 +0000, Jacob Champion
<jchampion@timescale.com> said:
>> I wanted to get feedback on the approach before wordsmithing too
>> much.
>
> I've added this to tomorrow's CF [1]. Thomas, if you get (or already
> have) a PG community username, I can add you as an author.

Sweet. I just created an account with username `habets`.

--
typedef struct me_s {
  char name[]      = { "Thomas Habets" };
  char email[]     = { "thomas@habets.se" };
  char kernel[]    = { "Linux" };
  char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt" };
  char pgp[] = { "9907 8698 8A24 F52F 1C2E  87F6 39A4 9EEA 460A 0169" };
  char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Tue, Nov 1, 2022 at 5:30 AM <thomas@habets.se> wrote:
> Sweet. I just created an account with username `habets`.

Added!

OpenSSL 3.0.0 doesn't get along with one of my new tests:

    # Failed test 'sslrootcert=system does not connect with private CA: matches'
    # at /Users/admin/pgsql/src/test/ssl/t/001_ssltests.pl line 453.
    # 'psql: error: connection to server at "127.0.0.1", port 56124
failed: SSL error: unregistered scheme'
    # doesn't match '(?^:SSL error: certificate verify failed)'
    # Looks like you failed 1 test of 191.

I'm not familiar with "unregistered scheme" in this context and will
need to dig in.

--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Tue, Nov 1, 2022 at 10:03 AM Jacob Champion <jchampion@timescale.com> wrote:
> I'm not familiar with "unregistered scheme" in this context and will
> need to dig in.

Unfortunately I can't reproduce with 3.0.0 on Ubuntu :(

I'm suspicious that it may be related to [1], in which case the
problem might be fixed by upgrading to the latest OpenSSL. But that's
just a guess.

--Jacob

[1] https://github.com/openssl/openssl/issues/18691



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Tue, Nov 1, 2022 at 10:55 AM Jacob Champion <jchampion@timescale.com>
wrote:
> On Tue, Nov 1, 2022 at 10:03 AM Jacob Champion <jchampion@timescale.com> wrote:
> > I'm not familiar with "unregistered scheme" in this context and will
> > need to dig in.
>
> Unfortunately I can't reproduce with 3.0.0 on Ubuntu :(

Sorry, when rereading my own emails I suspect they didn't make much
sense to readers. The failure I'm talking about is in cfbot [1], on the
Monterey/Meson build, which is using OpenSSL 3.0.0. I unfortunately
cannot reproduce this on my own Ubuntu machine.

There is an additional test failure with LibreSSL, which doesn't appear
to honor the SSL_CERT_FILE environment variable. This isn't a problem in
production -- if you're using LibreSSL, you'd presumably understand that
you can't use that envvar -- but it makes testing difficult, because I
don't yet know a way to tell LibreSSL to use a different set of roots
for the duration of a test. Has anyone dealt with this before?

> If there are no valuable use cases for weaker checks, then we could go
> even further than my 0002 and just reject any weaker sslmodes
> outright. That'd be nice.

I plan to take this approach in a future v3, with the opinion that it'd
be better for this feature to start life as strict as possible.

--Jacob

[1] https://cirrus-ci.com/task/6176610722775040



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Thu, Nov 3, 2022 at 4:39 PM Jacob Champion <jchampion@timescale.com> wrote:
> There is an additional test failure with LibreSSL, which doesn't appear
> to honor the SSL_CERT_FILE environment variable. This isn't a problem in
> production -- if you're using LibreSSL, you'd presumably understand that
> you can't use that envvar -- but it makes testing difficult, because I
> don't yet know a way to tell LibreSSL to use a different set of roots
> for the duration of a test. Has anyone dealt with this before?

Fixed in v3, with a large hammer (configure-time checks). Hopefully
I've missed a simpler solution.

> > If there are no valuable use cases for weaker checks, then we could go
> > even further than my 0002 and just reject any weaker sslmodes
> > outright. That'd be nice.

Done. sslrootcert=system now prevents you from explicitly setting a
weaker sslmode, to try to cement it as a Do What I Mean sort of
feature. If you need something weird then you can still jump through
the hoops by setting sslrootcert to a real file, same as today.

The macOS/OpenSSL 3.0.0 failure is still unfixed.

Thanks,
--Jacob

Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Michael Paquier
Date:
On Mon, Nov 07, 2022 at 05:04:14PM -0800, Jacob Champion wrote:
> Done. sslrootcert=system now prevents you from explicitly setting a
> weaker sslmode, to try to cement it as a Do What I Mean sort of
> feature. If you need something weird then you can still jump through
> the hoops by setting sslrootcert to a real file, same as today.
>
> The macOS/OpenSSL 3.0.0 failure is still unfixed.

Err, could you look at that?  I am switching the patch as waiting on
author.
--
Michael

Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Thu, Dec 1, 2022 at 9:26 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Nov 07, 2022 at 05:04:14PM -0800, Jacob Champion wrote:
> > The macOS/OpenSSL 3.0.0 failure is still unfixed.
>
> Err, could you look at that?  I am switching the patch as waiting on
> author.

Thanks for the nudge -- running with OpenSSL 3.0.7 in CI did not fix
the issue. I suspect a problem with our error stack handling...

Separately from this, our brew cache in Cirrus is extremely out of
date. Is there something that's supposed to be running `brew update`
(or autoupdate) that is stuck or broken?

Thanks,
--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Fri, Dec 2, 2022 at 9:58 AM Jacob Champion <jchampion@timescale.com> wrote:
> Thanks for the nudge -- running with OpenSSL 3.0.7 in CI did not fix
> the issue. I suspect a problem with our error stack handling...

It is a problem with the error queue, but *whose* problem is probably
up for debate. The queue looks like this after SSL_connect() returns:

    error:16000069:STORE
routines:ossl_store_get0_loader_int:unregistered
scheme:crypto/store/store_register.c:237:scheme=file
    error:80000002:system library:file_open:No such file or
directory:providers/implementations/storemgmt/file_store.c:269:calling
stat(/usr/local/etc/openssl@3/certs)
    error:16000069:STORE
routines:ossl_store_get0_loader_int:unregistered
scheme:crypto/store/store_register.c:237:scheme=file
    error:80000002:system library:file_open:No such file or
directory:providers/implementations/storemgmt/file_store.c:269:calling
stat(/usr/local/etc/openssl@3/certs)
    error:16000069:STORE
routines:ossl_store_get0_loader_int:unregistered
scheme:crypto/store/store_register.c:237:scheme=file
    error:80000002:system library:file_open:No such file or
directory:providers/implementations/storemgmt/file_store.c:269:calling
stat(/usr/local/etc/openssl@3/certs)
    error:0A000086:SSL
routines:tls_post_process_server_certificate:certificate verify
failed:ssl/statem/statem_clnt.c:1883:

Note that the error we care about is at the end, not the front.

We are not the first using Homebrew to run into this, and best I can
tell, it is a brew-specific bug. The certificate directory that's been
configured isn't actually installed by the formula. (A colleague here
was able to verify the same behavior on their local machine, so it's
not a Cirrus problem.)

The confusing "unrecognized scheme" message has thrown at least a few
people off the scent. That refers to an OpenSSL STORE URI, not the URI
describing the peer. (Why `file://` is considered "unregistered" is
beyond me, considering the documentation says that file URI support is
built into libcrypto.) From inspection, that error is put onto the
queue before checking to see if the certificate directory exists, and
then it's popped back off the queue if the directory is found(?!).
Unfortunately, the directory isn't there for Homebrew, which means we
get both errors, the first of which is not actually helpful. And then
it pushes the pair of errors two more times, for reasons I haven't
bothered looking into yet.

Maybe this is considered an internal error caused by a packaging bug,
in which case I expect the formula maintainers to ask why it worked
for 1.1. Maybe it's a client error because we're not looking for the
best error on the queue, in which case I ask how we're supposed to
know which error is the most interesting. (I actually kind of know the
answer to this -- OpenSSL's builtin clients appear to check the front
of the queue first, to see if it's an SSL-related error, and then if
it's not they grab the error at the end of the queue instead. To which
I ask: *what?*) Maybe clients are expected to present the entirety of
the queue. But then, why are three separate copies of the same errors
spamming the queue? We can't present that.

I'm considering filing an issue with OpenSSL, to see what they suggest
a responsible client should do in this situation...

> Separately from this, our brew cache in Cirrus is extremely out of
> date. Is there something that's supposed to be running `brew update`
> (or autoupdate) that is stuck or broken?

(If this is eventually considered a bug in the formula, we'll need to
update to get the fix regardless.)

--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Mon, Dec 5, 2022 at 10:53 AM Jacob Champion <jchampion@timescale.com> wrote:
> We are not the first using Homebrew to run into this, and best I can
> tell, it is a brew-specific bug. The certificate directory that's been
> configured isn't actually installed by the formula. (A colleague here
> was able to verify the same behavior on their local machine, so it's
> not a Cirrus problem.)

Correction -- it is installed, but then it's removed during `brew
cleanup`. I asked about it over on their discussion board [1].

> (If this is eventually considered a bug in the formula, we'll need to
> update to get the fix regardless.)

For now, it's worked around in v4. This should finally get the cfbot
fully green.

(The "since diff" is now in range-diff format; if you use them, let me
know if this is more or less useful than before.)

Thanks!
--Jacob

[1] https://github.com/orgs/Homebrew/discussions/4030

Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Thu, Dec 8, 2022 at 3:10 PM Jacob Champion <jchampion@timescale.com> wrote:
> For now, it's worked around in v4. This should finally get the cfbot
> fully green.

Cirrus's switch to M1 Macs changed the Homebrew installation path, so
v5 adjusts the workaround accordingly.

--Jacob

Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jelte Fennema
Date:
Huge +1 from me. On Azure we're already using public CAs to sign certificates for our managed postgres offerings[1][2]. Right now, our customers have to go to the hassle of downloading a specific root cert or finding their OS default location. Neither of these allow us to give users a simple copy-pastable connection string that uses secure settings. This would change this and make it much easier for our customers to use secure connections to their database.

I have two main questions:
1. From the rest of the thread it's not entirely clear to me why this patch goes for the sslrootcert=system approach, instead of changing what sslrootcert='' means when using verify-full. Like Tom Lane suggested, we could change it to try ~/.postgresql/root.crt and if that doesn't exist make it try the system store, instead of erroring out like it does now when ~/.postgresql/root.crt doesn't exist. This approach seems nicer to me, as it doesn't require introducing another special keyword. It would also remove the need for the changing of defaults depending on the value of sslrootcert. NOTE: For sslmode=verify-ca we should still error out if ~/.postgresql/root.crt doesn't exist, because as mentioned upthread it is trivial to get a cert from these CAs.

2. Should we allow the same approach with ssl_ca_file on the server side, for client cert validation?



On Fri, 6 Jan 2023 at 10:42, Jacob Champion <jchampion@timescale.com> wrote:
On Thu, Dec 8, 2022 at 3:10 PM Jacob Champion <jchampion@timescale.com> wrote:
> For now, it's worked around in v4. This should finally get the cfbot
> fully green.

Cirrus's switch to M1 Macs changed the Homebrew installation path, so
v5 adjusts the workaround accordingly.

--Jacob

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Andrew Dunstan
Date:
On 2023-01-06 Fr 05:17, Jelte Fennema wrote:
> Huge +1 from me. On Azure we're already using public CAs to sign
> certificates for our managed postgres offerings[1][2]. Right now, our
> customers have to go to the hassle of downloading a specific root cert
> or finding their OS default location. Neither of these allow us to
> give users a simple copy-pastable connection string that uses secure
> settings. This would change this and make it much easier for our
> customers to use secure connections to their database.
>
> I have two main questions:
> 1. From the rest of the thread it's not entirely clear to me why this
> patch goes for the sslrootcert=system approach, instead of changing
> what sslrootcert='' means when using verify-full. Like Tom Lane
> suggested, we could change it to try ~/.postgresql/root.crt and if
> that doesn't exist make it try the system store, instead of erroring
> out like it does now when ~/.postgresql/root.crt doesn't exist. This
> approach seems nicer to me, as it doesn't require introducing another
> special keyword. It would also remove the need for the changing of
> defaults depending on the value of sslrootcert. NOTE: For
> sslmode=verify-ca we should still error out if ~/.postgresql/root.crt
> doesn't exist, because as mentioned upthread it is trivial to get a
> cert from these CAs.


One reason might be that it doesn't give you any way not to fall back on
the system store. Maybe that's important, maybe not. I don't know that
there would be much extra ease in doing it the other way, you're going
to have to specify some ssl options anyway.


>
> 2. Should we allow the same approach with ssl_ca_file on the server
> side, for client cert validation?


+1 for doing this, although I think client certs are less likely to have
been issued by a public CA.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jelte Fennema
Date:
> One reason might be that it doesn't give you any way not to fall back on
> the system store.

To not fall back to the system store you could still provide the exact path
to the CA cert file.

> +1 for doing this, although I think client certs are less likely to have
> been issued by a public CA.

I totally agree that it's less likely. And I definitely don't want to block this
patch on this feature. Especially since configuring your database server
is much easier than configuring ALL the clients that ever connect to your
database.

However, I would like to give a use case where use public CA signed
client authentication can make sense:
Authenticating different nodes in a citus cluster to each other. If such
nodes already have a public CA signed certificate for their hostname
to attest their identity for regular clients, then you can set up client
side auth on each of the nodes so that each node in the
cluster can connect as any user to each of the other nodes in
the cluster by authenticating with that same certificate.



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Andrew Dunstan
Date:
On 2023-01-06 Fr 09:28, Jelte Fennema wrote:
>> One reason might be that it doesn't give you any way not to fall back on
>> the system store.
> To not fall back to the system store you could still provide the exact path
> to the CA cert file.


I guess. I don't have strong feelings one way or the other about this.


>
>> +1 for doing this, although I think client certs are less likely to have
>> been issued by a public CA.
> I totally agree that it's less likely. And I definitely don't want to block this
> patch on this feature. Especially since configuring your database server
> is much easier than configuring ALL the clients that ever connect to your
> database.
>
> However, I would like to give a use case where use public CA signed
> client authentication can make sense:
> Authenticating different nodes in a citus cluster to each other. If such
> nodes already have a public CA signed certificate for their hostname
> to attest their identity for regular clients, then you can set up client
> side auth on each of the nodes so that each node in the
> cluster can connect as any user to each of the other nodes in
> the cluster by authenticating with that same certificate.


Yeah, I have done that sort of thing with pgbouncer auth using an ident
map. (There's probably a good case for making ident maps for useful by
adopting the +role mechanism from pg_hba.conf processing, but that's a
separate issue).


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Fri, Jan 6, 2023 at 2:18 AM Jelte Fennema <postgres@jeltef.nl> wrote:
>
> Huge +1 from me. On Azure we're already using public CAs to sign certificates for our managed postgres
offerings[1][2].Right now, our customers have to go to the hassle of downloading a specific root cert or finding their
OSdefault location. Neither of these allow us to give users a simple copy-pastable connection string that uses secure
settings.This would change this and make it much easier for our customers to use secure connections to their database. 

Thanks! Timescale Cloud is in the same boat.

> I have two main questions:
> 1. From the rest of the thread it's not entirely clear to me why this patch goes for the sslrootcert=system approach,
insteadof changing what sslrootcert='' means when using verify-full. Like Tom Lane suggested, we could change it to try
~/.postgresql/root.crtand if that doesn't exist make it try the system store, instead of erroring out like it does now
when~/.postgresql/root.crt doesn't exist. 

I mentioned it briefly upthread, but to expand: For something this
critical to security, I don't like solutions that don't say exactly
what they do. What does the following connection string mean?

    $ psql 'host=example.org sslmode=verify-full'

If it sometimes means to use root.crt (if one exists) and sometimes to
use the system store, then
1) it's hard to audit the actual behavior without knowing the state of
the filesystem, and
2) if I want to connect to a server using the system store, and *only*
the system store, then I have to make sure that the default root.crt
never exists. But what if other software on my system relies on it?

It also provides a bigger target for exploit chains, because I can
remove somebody's root.crt file and their connections may try to
continue with an effectively weaker verification level instead of
erroring out. I realize that for many people this is a nonissue ("if
you can delete the root cert, you can probably do much worse") but IME
arbitrary file deletion vulnerabilities are treated with less concern
than arbitrary file writes.

By contrast,

    $ psql 'host=example.org sslrootcert=system sslmode=verify-full'

has a clear meaning. We'll never use a root.crt.

(A downside of reusing sslrootcert like this is the cross-version
hazard. The connection string 'host=example.org sslrootcert=system'
means something strong with this patchset, but something very weak to
libpq 15 and prior. So clients should probably continue to specify
sslmode=verify-full explicitly for the foreseeable future.)

> This approach seems nicer to me, as it doesn't require introducing another special keyword.

Maybe I'm being overly aspirational, but one upside to that special
keyword is that it's a clear signal that the user wants to use the
public CA model. So we have the opportunity to remove footguns
aggressively when we see this mode. In the future we may have further
opportunities to strengthen sslrootcert=system (OCSP and/or
must-staple support?) that would be harder to roll out by default if
we're just trying to guess what the user wants.

> It would also remove the need for the changing of defaults depending on the value of sslrootcert.

Agreed. Personally I think the benefit of 0002 outweighs its cost, but
maybe there's a better way to implement it.

> 2. Should we allow the same approach with ssl_ca_file on the server side, for client cert validation?

I don't know enough about this use case to implement it safely. We'd
have to make sure the HBA entry is checking the hostname (so that we
do the reverse DNS dance), and I guess we'd need to introduce a new
clientcert verify-* mode? Also, it seems like server operators are
more likely to know exactly which roots they need, at least compared
to clients. I agree the feature is useful, but I'm not excited about
attaching it to this patchset.

--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jelte Fennema
Date:
Thanks for clarifying your reasoning. I now agree that ssrootcert=system
is now the best option.

> > 2. Should we allow the same approach with ssl_ca_file on the server side, for client cert validation?
>
> I don't know enough about this use case to implement it safely. We'd
> have to make sure the HBA entry is checking the hostname (so that we
> do the reverse DNS dance), and I guess we'd need to introduce a new
> clientcert verify-* mode? Also, it seems like server operators are
> more likely to know exactly which roots they need, at least compared
> to clients. I agree the feature is useful, but I'm not excited about
> attaching it to this patchset.

The main thing would be to have ssl_ca_file=system check against
the certs from the system CA store. And probably we would want
to disallow clientcert=verify-ca when ssl_ca_file is set to system.
Other than that I don't think anything is necessary. I definitely agree
that this patch should not be blocked on this. But it seems simple
enough to implement and imho it would be a bit confusing if ssl_ca_file
does not support the "system" value, but sslrootcert does.

I also took a closer look at the code, and the only comment I have is:

> appendPQExpBuffer(&conn->errorMessage,

These calls can all be replaced by the recently added libpq_append_conn_error

Finally I tested this against a Postgres server I created on Azure and
the new value works as expected. The only thing that I think would be
good to change is the error message when sslmode=verify-full, and
sslrootcert is not provided, but ~/.postgresql/root.crt is also not available.
I think it would be good for the error to mention sslrootcert=system

> psql: error: connection to server at "xxx.postgres.database.azure.com" (123.456.789.123), port 5432 failed: root
certificatefile "/home/jelte/.postgresql/root.crt" does not exist 
> Either provide the file or change sslmode to disable server certificate verification.

Changing that last line to something like (maybe removing the part
about disabling server certificate verification):
> Either provide the file using the sslrootcert parameter, or use sslrootert=system to use the OS certificate store, or
changesslmode to disable server certificate verification. 

On Fri, 6 Jan 2023 at 19:20, Jacob Champion <jchampion@timescale.com> wrote:
>
> On Fri, Jan 6, 2023 at 2:18 AM Jelte Fennema <postgres@jeltef.nl> wrote:
> >
> > Huge +1 from me. On Azure we're already using public CAs to sign certificates for our managed postgres
offerings[1][2].Right now, our customers have to go to the hassle of downloading a specific root cert or finding their
OSdefault location. Neither of these allow us to give users a simple copy-pastable connection string that uses secure
settings.This would change this and make it much easier for our customers to use secure connections to their database. 
>
> Thanks! Timescale Cloud is in the same boat.
>
> > I have two main questions:
> > 1. From the rest of the thread it's not entirely clear to me why this patch goes for the sslrootcert=system
approach,instead of changing what sslrootcert='' means when using verify-full. Like Tom Lane suggested, we could change
itto try ~/.postgresql/root.crt and if that doesn't exist make it try the system store, instead of erroring out like it
doesnow when ~/.postgresql/root.crt doesn't exist. 
>
> I mentioned it briefly upthread, but to expand: For something this
> critical to security, I don't like solutions that don't say exactly
> what they do. What does the following connection string mean?
>
>     $ psql 'host=example.org sslmode=verify-full'
>
> If it sometimes means to use root.crt (if one exists) and sometimes to
> use the system store, then
> 1) it's hard to audit the actual behavior without knowing the state of
> the filesystem, and
> 2) if I want to connect to a server using the system store, and *only*
> the system store, then I have to make sure that the default root.crt
> never exists. But what if other software on my system relies on it?
>
> It also provides a bigger target for exploit chains, because I can
> remove somebody's root.crt file and their connections may try to
> continue with an effectively weaker verification level instead of
> erroring out. I realize that for many people this is a nonissue ("if
> you can delete the root cert, you can probably do much worse") but IME
> arbitrary file deletion vulnerabilities are treated with less concern
> than arbitrary file writes.
>
> By contrast,
>
>     $ psql 'host=example.org sslrootcert=system sslmode=verify-full'
>
> has a clear meaning. We'll never use a root.crt.
>
> (A downside of reusing sslrootcert like this is the cross-version
> hazard. The connection string 'host=example.org sslrootcert=system'
> means something strong with this patchset, but something very weak to
> libpq 15 and prior. So clients should probably continue to specify
> sslmode=verify-full explicitly for the foreseeable future.)
>
> > This approach seems nicer to me, as it doesn't require introducing another special keyword.
>
> Maybe I'm being overly aspirational, but one upside to that special
> keyword is that it's a clear signal that the user wants to use the
> public CA model. So we have the opportunity to remove footguns
> aggressively when we see this mode. In the future we may have further
> opportunities to strengthen sslrootcert=system (OCSP and/or
> must-staple support?) that would be harder to roll out by default if
> we're just trying to guess what the user wants.
>
> > It would also remove the need for the changing of defaults depending on the value of sslrootcert.
>
> Agreed. Personally I think the benefit of 0002 outweighs its cost, but
> maybe there's a better way to implement it.
>
> > 2. Should we allow the same approach with ssl_ca_file on the server side, for client cert validation?
>
> I don't know enough about this use case to implement it safely. We'd
> have to make sure the HBA entry is checking the hostname (so that we
> do the reverse DNS dance), and I guess we'd need to introduce a new
> clientcert verify-* mode? Also, it seems like server operators are
> more likely to know exactly which roots they need, at least compared
> to clients. I agree the feature is useful, but I'm not excited about
> attaching it to this patchset.
>
> --Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Andrew Dunstan
Date:
On 2023-01-09 Mo 10:07, Jelte Fennema wrote:
> Thanks for clarifying your reasoning. I now agree that ssrootcert=system
> is now the best option.


Cool, that looks like a consensus.


>
>>> 2. Should we allow the same approach with ssl_ca_file on the server side, for client cert validation?
>> I don't know enough about this use case to implement it safely. We'd
>> have to make sure the HBA entry is checking the hostname (so that we
>> do the reverse DNS dance), and I guess we'd need to introduce a new
>> clientcert verify-* mode? Also, it seems like server operators are
>> more likely to know exactly which roots they need, at least compared
>> to clients. I agree the feature is useful, but I'm not excited about
>> attaching it to this patchset.


I'm confused. A client cert might not have a hostname at all, and isn't
used to verify the connecting address, but to verify the username. It
needs to have a CN/DN equal to the user name of the connection, or that
maps to that name via pg_ident.conf.


cheers


andrew

-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Mon, Jan 9, 2023 at 7:40 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> I'm confused. A client cert might not have a hostname at all, and isn't
> used to verify the connecting address, but to verify the username. It
> needs to have a CN/DN equal to the user name of the connection, or that
> maps to that name via pg_ident.conf.

Right. But I don't know anything about the security model for using a
publicly issued server certificate as a client certificate. So if you
tell me that your only requirement is that the hostname/CN matches an
entry in your ident file, and that you don't need to verify that the
certificate identifying example.org is actually coming from
example.org, or do any sort of online revocation processing to help
mitigate the risks from that, or even handle wildcards or SANs in the
cert -- fine, but I don't know the right questions to ask to review
that case for safety or correctness. It'd be better to ask someone who
is already comfortable with it.

From my perspective, ssl_ca_file=system sure *looks* like something
reasonable for me to choose as a DBA, but I'm willing to guess it's
not actually reasonable for 99% of people. (If you get your pg_ident
rule wrong, for example, the number of people who can attack you is
scoped by the certificates issued by your CA... which for 'system'
would be the entire world.) By contrast I would have no problem
recommending sslrootcert=system as a default: a certificate can still
be misissued, but a would-be attacker would still have to get you to
connect to them. That model and its risks are, I think, generally well
understood by the community.

--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Mon, Jan 9, 2023 at 7:07 AM Jelte Fennema <postgres@jeltef.nl> wrote:
> I also took a closer look at the code, and the only comment I have is:
>
> > appendPQExpBuffer(&conn->errorMessage,
>
> These calls can all be replaced by the recently added libpq_append_conn_error

Argh, thanks for the catch. Fixed.

> Finally I tested this against a Postgres server I created on Azure and
> the new value works as expected. The only thing that I think would be
> good to change is the error message when sslmode=verify-full, and
> sslrootcert is not provided, but ~/.postgresql/root.crt is also not available.
> I think it would be good for the error to mention sslrootcert=system

Good idea. The wording I chose in v6 is

    Either provide the file, use the system's trusted roots with
sslrootcert=system, or change sslmode to disable server certificate
verification.

What do you think?

Thanks!
--Jacob

Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jelte Fennema
Date:
LGTM. As far as I can tell this is ready for a committer.

On Wed, 11 Jan 2023 at 00:15, Jacob Champion <jchampion@timescale.com> wrote:
>
> On Mon, Jan 9, 2023 at 7:07 AM Jelte Fennema <postgres@jeltef.nl> wrote:
> > I also took a closer look at the code, and the only comment I have is:
> >
> > > appendPQExpBuffer(&conn->errorMessage,
> >
> > These calls can all be replaced by the recently added libpq_append_conn_error
>
> Argh, thanks for the catch. Fixed.
>
> > Finally I tested this against a Postgres server I created on Azure and
> > the new value works as expected. The only thing that I think would be
> > good to change is the error message when sslmode=verify-full, and
> > sslrootcert is not provided, but ~/.postgresql/root.crt is also not available.
> > I think it would be good for the error to mention sslrootcert=system
>
> Good idea. The wording I chose in v6 is
>
>     Either provide the file, use the system's trusted roots with
> sslrootcert=system, or change sslmode to disable server certificate
> verification.
>
> What do you think?
>
> Thanks!
> --Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Wed, Jan 11, 2023 at 6:37 AM Jelte Fennema <postgres@jeltef.nl> wrote:
>
> LGTM. As far as I can tell this is ready for a committer.

Thanks for the reviews!

--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Magnus Hagander
Date:
On Wed, Jan 11, 2023 at 6:27 PM Jacob Champion <jchampion@timescale.com> wrote:
On Wed, Jan 11, 2023 at 6:37 AM Jelte Fennema <postgres@jeltef.nl> wrote:
>
> LGTM. As far as I can tell this is ready for a committer.

Thanks for the reviews!

Sorry to jump in (very) late in this game. So first, I like this general approach :)

It feels icky to have to add configure tests just to make a test work. But I guess there isn't really a way around that if we want to test the full thing.

However, shouldn't we be using X509_get_default_cert_file_env() to get the name of the env? Granted it's  unlikely to be anything else, but if it's an API you're supposed to use. (In an ideal world that function would not return anything in LibreSSL but I think it does include something, and then just ignores it?)

--

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Wed, Jan 11, 2023 at 10:23 AM Magnus Hagander <magnus@hagander.net> wrote:
> Sorry to jump in (very) late in this game. So first, I like this general approach :)

Thanks!

> It feels icky to have to add configure tests just to make a test work. But I guess there isn't really a way around
thatif we want to test the full thing. 

I agree...

> However, shouldn't we be using X509_get_default_cert_file_env() to get the name of the env? Granted it's  unlikely to
beanything else, but if it's an API you're supposed to use. (In an ideal world that function would not return anything
inLibreSSL but I think it does include something, and then just ignores it?) 

I think you're right, but before I do that, is the cure better than
the disease? It seems like that would further complicate a part of the
Perl tests that is already unnecessarily complicated. (The Postgres
code doesn't use the envvar at all.) Unless you already know of an
OpenSSL-alike that doesn't use that same envvar name?

--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Magnus Hagander
Date:
On Wed, Jan 11, 2023 at 8:06 PM Jacob Champion <jchampion@timescale.com> wrote:
On Wed, Jan 11, 2023 at 10:23 AM Magnus Hagander <magnus@hagander.net> wrote:
> Sorry to jump in (very) late in this game. So first, I like this general approach :)

Thanks!

> It feels icky to have to add configure tests just to make a test work. But I guess there isn't really a way around that if we want to test the full thing.

I agree...

> However, shouldn't we be using X509_get_default_cert_file_env() to get the name of the env? Granted it's  unlikely to be anything else, but if it's an API you're supposed to use. (In an ideal world that function would not return anything in LibreSSL but I think it does include something, and then just ignores it?)

I think you're right, but before I do that, is the cure better than
the disease? It seems like that would further complicate a part of the
Perl tests that is already unnecessarily complicated. (The Postgres
code doesn't use the envvar at all.) Unless you already know of an
OpenSSL-alike that doesn't use that same envvar name?

Fair point. No, I have not run into one, I just recalled having seen the API :) 

And you're right -- I didn't consider that we were looking at that one in the *perl* code, not the C code. In the C code it would've been a trivial replacement. In the perl, I agree it's not worth it -- at least not until we run into a platform where it *would' matter.

--

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jelte Fennema
Date:
FYI the last patch does not apply cleanly anymore. So a rebase is needed.



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Thu, Feb 16, 2023 at 1:35 AM Jelte Fennema <postgres@jeltef.nl> wrote:
>
> FYI the last patch does not apply cleanly anymore. So a rebase is needed.

Thanks for the nudge, v7 rebases over the configure conflict from 9244c11afe.

--Jacob

Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On 2/16/23 10:38, Jacob Champion wrote:
> Thanks for the nudge, v7 rebases over the configure conflict from 9244c11afe.

I think/hope this is well-baked enough for a potential commit this CF,
so I've adjusted the target version. Let me know if there are any
concerns about the approach.

Thanks!
--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
"Gregory Stark (as CFM)"
Date:
It does look like a rebase for meson.build would be helpful. I'm not
marking it waiting on author just for meson.build conflicts but it
would be perhaps more likely to be picked up by a committer if it's
showing green in cfbot.



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Tue, Mar 14, 2023 at 11:01 AM Gregory Stark (as CFM)
<stark.cfm@gmail.com> wrote:
> It does look like a rebase for meson.build would be helpful. I'm not
> marking it waiting on author just for meson.build conflicts but it
> would be perhaps more likely to be picked up by a committer if it's
> showing green in cfbot.

Rebased over yesterday's Meson changes in v8.

Thanks!
--Jacob

Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 14 Mar 2023, at 20:20, Jacob Champion <jchampion@timescale.com> wrote:

> Rebased over yesterday's Meson changes in v8.

I had a look at this and agree that it's something we should do.  The patch
seems quite close to committable, I just have a few comments on it:

+  # Let tests differentiate between vanilla OpenSSL and LibreSSL.
+  AC_CHECK_DECLS([LIBRESSL_VERSION_NUMBER], [], [], [#include <openssl/opensslv.h>])
We have a check for SSL_CTX_set_cert_cb which is specifically done since it's
not present in Libressl.  Rather than spending more cycles in autoconf/meson,
couldn't we use HAVE_SSL_CTX_SET_CERT_CB for this test?  (Longer term, maybe we
should make the checks properly distinguish between OpenSSL and LibreSSL as
they are diverging, but thats not for this patch to tackle.)


+    # brew cleanup removes the empty certs directory in OPENSSLDIR, causing
+    # OpenSSL to report unexpected errors ("unregistered scheme") during
+    # verification failures. Put it back for now as a workaround.
+    #
+    #   https://github.com/orgs/Homebrew/discussions/4030
+    #
+    # Note that $(brew --prefix openssl) will give us the opt/ prefix but not
+    # the etc/ prefix, so we hardcode the full path here. openssl@3 is pinned
+    # above to try to minimize the chances of this changing beneath us, but it's
+    # brittle...
+    mkdir -p "/opt/homebrew/etc/openssl@3/certs"
I can agree with the comment that this seems brittle. How about moving the installation of openssl to after the brew
cleanupstage to avoid the need for this? While that may leave more in the cache, it seems more palatable. Something
likethis essentially: 

    brew install <everything but openssl>
    brew cleanup -s
    # Comment about why OpenSSL is kept separate
    brew install openssl@3


+       libpq_append_conn_error(conn, "weak sslmode \"%s\" may not be used with sslrootcert=system",
+                               conn->sslmode);
I think we should help the user by indicating which sslmode we allow in this
message.


+
+    /*
+     * sslmode is not specified. Let it be filled in with the compiled
+     * default for now, but if sslrootcert=system, we'll override the
+     * default later before returning.
+     */
+    sslmode_default = option;
As a not to self and other reviewers, "git am" misplaced this when applying the
patch such that the result was syntactically correct but semantically wrong,
causing very weird test errors.


+    sslmode_default->val = strdup("verify-full");
This needs to be checked for OOM error.


-   if (fnbuf[0] != '\0' &&
-       stat(fnbuf, &buf) == 0)
+   if (strcmp(fnbuf, "system") == 0)
I'm not a fan of magic values, but sadly I don't have a better idea for this.
We should however document that the keyword takes precedence over a file with
the same name (even though the collision is unlikely).


+       if (SSL_CTX_set_default_verify_paths(SSL_context) != 1)
OpenSSL documents this as "A missing default location is still treated as a
success.", is that something we need to document or in any way deal with?
(Skimming the OpenSSL code I'm not sure it's actually correct in v3+, but I
might very well have missed something.)

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On 3/31/23 02:14, Daniel Gustafsson wrote:
>> On 14 Mar 2023, at 20:20, Jacob Champion <jchampion@timescale.com> wrote:
> 
>> Rebased over yesterday's Meson changes in v8.
> 
> I had a look at this and agree that it's something we should do.

Great, thanks for the review!

> +  # Let tests differentiate between vanilla OpenSSL and LibreSSL.
> +  AC_CHECK_DECLS([LIBRESSL_VERSION_NUMBER], [], [], [#include <openssl/opensslv.h>])
> We have a check for SSL_CTX_set_cert_cb which is specifically done since it's
> not present in Libressl.  Rather than spending more cycles in autoconf/meson,
> couldn't we use HAVE_SSL_CTX_SET_CERT_CB for this test?  (Longer term, maybe we
> should make the checks properly distinguish between OpenSSL and LibreSSL as
> they are diverging, but thats not for this patch to tackle.)

I can make that change; note that it'll also skip some of the new tests
with OpenSSL 1.0.1, where there's no SSL_CTX_set_cert_cb. If that's
acceptable, it should be an easy switch.

> I can agree with the comment that this seems brittle. How about moving the installation of openssl to after the brew
cleanupstage to avoid the need for this? While that may leave more in the cache, it seems more palatable. Something
likethis essentially:
 
> 
>     brew install <everything but openssl>
>     brew cleanup -s    
>     # Comment about why OpenSSL is kept separate
>     brew install openssl@3

That looks much better to me, but it didn't work when I tried it. One or
more of the packages above it (and/or the previous cache?) has already
installed OpenSSL as one of its dependencies, so the last `brew install`
becomes a no-op. I tried an `install --force` as well, but that didn't
seem to do anything differently. :/

> +       libpq_append_conn_error(conn, "weak sslmode \"%s\" may not be used with sslrootcert=system",
> +                               conn->sslmode);
> I think we should help the user by indicating which sslmode we allow in this
> message.

Added in v9.

> +
> +    /*
> +     * sslmode is not specified. Let it be filled in with the compiled
> +     * default for now, but if sslrootcert=system, we'll override the
> +     * default later before returning.
> +     */
> +    sslmode_default = option;
> As a not to self and other reviewers, "git am" misplaced this when applying the
> patch such that the result was syntactically correct but semantically wrong,
> causing very weird test errors.

Lovely... I've formatted v9 with a longer patch context.

> +    sslmode_default->val = strdup("verify-full");
> This needs to be checked for OOM error.

Whoops, should be fixed now.

> -   if (fnbuf[0] != '\0' &&
> -       stat(fnbuf, &buf) == 0)
> +   if (strcmp(fnbuf, "system") == 0)
> I'm not a fan of magic values, but sadly I don't have a better idea for this.
> We should however document that the keyword takes precedence over a file with
> the same name (even though the collision is unlikely).

Added a note to the docs.

> +       if (SSL_CTX_set_default_verify_paths(SSL_context) != 1)
> OpenSSL documents this as "A missing default location is still treated as a
> success.", is that something we need to document or in any way deal with?
> (Skimming the OpenSSL code I'm not sure it's actually correct in v3+, but I
> might very well have missed something.)

I think it's still true in v3+, because that sounds exactly like the
brew issue we're working around in Cirrus. I'm not sure if there's much
for us to do in that case, short of reimplementing the OpenSSL defaults
logic and checking it ourselves. (And even that would look different
between OpenSSL and LibreSSL...)

Is there something we could document that's more helpful than "make sure
your installation isn't broken"?

--Jacob
Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 31 Mar 2023, at 19:59, Jacob Champion <jchampion@timescale.com> wrote:

>> +  # Let tests differentiate between vanilla OpenSSL and LibreSSL.
>> +  AC_CHECK_DECLS([LIBRESSL_VERSION_NUMBER], [], [], [#include <openssl/opensslv.h>])
>> We have a check for SSL_CTX_set_cert_cb which is specifically done since it's
>> not present in Libressl.  Rather than spending more cycles in autoconf/meson,
>> couldn't we use HAVE_SSL_CTX_SET_CERT_CB for this test?  (Longer term, maybe we
>> should make the checks properly distinguish between OpenSSL and LibreSSL as
>> they are diverging, but thats not for this patch to tackle.)
>
> I can make that change; note that it'll also skip some of the new tests
> with OpenSSL 1.0.1, where there's no SSL_CTX_set_cert_cb. If that's
> acceptable, it should be an easy switch.

I'm not sure I follow, AFAICT it's present all the way till 3.1 at least?  What
am I missing?

>> I can agree with the comment that this seems brittle. How about moving the installation of openssl to after the brew
cleanupstage to avoid the need for this? While that may leave more in the cache, it seems more palatable. Something
likethis essentially: 
>>
>>     brew install <everything but openssl>
>>     brew cleanup -s
>>     # Comment about why OpenSSL is kept separate
>>     brew install openssl@3
>
> That looks much better to me, but it didn't work when I tried it. One or
> more of the packages above it (and/or the previous cache?) has already
> installed OpenSSL as one of its dependencies, so the last `brew install`
> becomes a no-op. I tried an `install --force` as well, but that didn't
> seem to do anything differently. :/

Ugh, that's very unfortunate, I guess we're stuck with this then.  If we can't
make brew cleanup not remove it then any hack applied to make it stick around
will be equally brittle so we might as well mkdir it back.

>> +       if (SSL_CTX_set_default_verify_paths(SSL_context) != 1)
>> OpenSSL documents this as "A missing default location is still treated as a
>> success.", is that something we need to document or in any way deal with?
>> (Skimming the OpenSSL code I'm not sure it's actually correct in v3+, but I
>> might very well have missed something.)
>
> I think it's still true in v3+, because that sounds exactly like the
> brew issue we're working around in Cirrus. I'm not sure if there's much
> for us to do in that case, short of reimplementing the OpenSSL defaults
> logic and checking it ourselves. (And even that would look different
> between OpenSSL and LibreSSL...)

Right, that's clearly not something we want to do.

> Is there something we could document that's more helpful than "make sure
> your installation isn't broken"?

I wonder if there is an openssl command line example for verifying defaults
that we can document and refer to?

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Sun, Apr 2, 2023 at 1:36 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 31 Mar 2023, at 19:59, Jacob Champion <jchampion@timescale.com> wrote:
> > I can make that change; note that it'll also skip some of the new tests
> > with OpenSSL 1.0.1, where there's no SSL_CTX_set_cert_cb. If that's
> > acceptable, it should be an easy switch.
>
> I'm not sure I follow, AFAICT it's present all the way till 3.1 at least?  What
> am I missing?

I don't see it anywhere in my 1.0.1 setup, and Meson doesn't define
HAVE_SSL_CTX_SET_CERT_CB when built against it.

> > Is there something we could document that's more helpful than "make sure
> > your installation isn't broken"?
>
> I wonder if there is an openssl command line example for verifying defaults
> that we can document and refer to?

We could maybe have them connect to a known host:

    $ echo Q | openssl s_client -connect postgresql.org:443 -verify_return_error

Alternatively, OpenSSL will show you the OPENSSLDIR:

    $ openssl version -d
    OPENSSLDIR: "/usr/lib/ssl"

and then we could tell users to ensure they have a populated certs/
directory or a cert.pem file underneath it. That'll be prone to rot,
though (e.g. OpenSSL 3 introduces the default store in addition to the
default file+directory).

Thanks,
--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 3 Apr 2023, at 21:04, Jacob Champion <jchampion@timescale.com> wrote:
>
> On Sun, Apr 2, 2023 at 1:36 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>> On 31 Mar 2023, at 19:59, Jacob Champion <jchampion@timescale.com> wrote:
>>> I can make that change; note that it'll also skip some of the new tests
>>> with OpenSSL 1.0.1, where there's no SSL_CTX_set_cert_cb. If that's
>>> acceptable, it should be an easy switch.
>>
>> I'm not sure I follow, AFAICT it's present all the way till 3.1 at least?  What
>> am I missing?
>
> I don't see it anywhere in my 1.0.1 setup, and Meson doesn't define
> HAVE_SSL_CTX_SET_CERT_CB when built against it.

Doh, sorry, my bad.  I read and wrote 1.0.1 but was thinking about 1.0.2.  You
are right, in 1.0.1 that API does not exist.  I'm not all too concerned with
skipping this tests on OpenSSL versions that by the time 16 ships are 6 years
EOL - and I'm not convinced that spending meson/autoconf cycles to include them
is warranted.

Longer term I'd want to properly distinguish between LibreSSL and OpenSSL, but
then we should have a bigger discussion on what we want to use these values for.

>>> Is there something we could document that's more helpful than "make sure
>>> your installation isn't broken"?
>>
>> I wonder if there is an openssl command line example for verifying defaults
>> that we can document and refer to?
>
> We could maybe have them connect to a known host:
>
>    $ echo Q | openssl s_client -connect postgresql.org:443 -verify_return_error

Something along these lines is probably best, if we do it at all.  Needs
sleeping on.

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Mon, Apr 3, 2023 at 12:40 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> Doh, sorry, my bad.  I read and wrote 1.0.1 but was thinking about 1.0.2.  You
> are right, in 1.0.1 that API does not exist.  I'm not all too concerned with
> skipping this tests on OpenSSL versions that by the time 16 ships are 6 years
> EOL - and I'm not convinced that spending meson/autoconf cycles to include them
> is warranted.

Cool. v10 keys off of HAVE_SSL_CTX_SET_CERT_CB, instead.

> > We could maybe have them connect to a known host:
> >
> >    $ echo Q | openssl s_client -connect postgresql.org:443 -verify_return_error
>
> Something along these lines is probably best, if we do it at all.  Needs
> sleeping on.

Sounds good.

Thanks!
--Jacob

Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 3 Apr 2023, at 23:09, Jacob Champion <jchampion@timescale.com> wrote:
>
> On Mon, Apr 3, 2023 at 12:40 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>> Doh, sorry, my bad.  I read and wrote 1.0.1 but was thinking about 1.0.2.  You
>> are right, in 1.0.1 that API does not exist.  I'm not all too concerned with
>> skipping this tests on OpenSSL versions that by the time 16 ships are 6 years
>> EOL - and I'm not convinced that spending meson/autoconf cycles to include them
>> is warranted.
>
> Cool. v10 keys off of HAVE_SSL_CTX_SET_CERT_CB, instead.

I squashed and pushed v10 with a few small comment tweaks for typos and some
pgindenting. Thanks!

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Wed, Apr 5, 2023 at 2:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> I squashed and pushed v10 with a few small comment tweaks for typos and some
> pgindenting. Thanks!

Thank you very much!

--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Peter Eisentraut
Date:
On 05.04.23 23:29, Jacob Champion wrote:
> On Wed, Apr 5, 2023 at 2:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>> I squashed and pushed v10 with a few small comment tweaks for typos and some
>> pgindenting. Thanks!
> 
> Thank you very much!

This patch (8eda731465) makes the ssl tests fail for me:

not ok 121 - sslrootcert=system does not connect with private CA: matches

#   Failed test 'sslrootcert=system does not connect with private CA: 
matches'
#   at t/001_ssltests.pl line 479.
#                   'psql: error: connection to server at "127.0.0.1", 
port 53971 failed: SSL SYSCALL error: Undefined error: 0'
#     doesn't match '(?^:SSL error: certificate verify failed)'

This is with OpenSSL 3.1.0 from macOS/Homebrew.

If I instead use OpenSSL 1.1.1t, then the tests pass.




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 12 Apr 2023, at 09:11, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 05.04.23 23:29, Jacob Champion wrote:
>> On Wed, Apr 5, 2023 at 2:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>> I squashed and pushed v10 with a few small comment tweaks for typos and some
>>> pgindenting. Thanks!
>> Thank you very much!
>
> This patch (8eda731465) makes the ssl tests fail for me:
>
> not ok 121 - sslrootcert=system does not connect with private CA: matches
>
> #   Failed test 'sslrootcert=system does not connect with private CA: matches'
> #   at t/001_ssltests.pl line 479.
> #                   'psql: error: connection to server at "127.0.0.1", port 53971 failed: SSL SYSCALL error:
Undefinederror: 0' 
> #     doesn't match '(?^:SSL error: certificate verify failed)'
>
> This is with OpenSSL 3.1.0 from macOS/Homebrew.
>
> If I instead use OpenSSL 1.1.1t, then the tests pass.

Thanks for the report, I'll look at it today as an open item for 16.

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 12 Apr 2023, at 09:11, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 05.04.23 23:29, Jacob Champion wrote:
>> On Wed, Apr 5, 2023 at 2:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>> I squashed and pushed v10 with a few small comment tweaks for typos and some
>>> pgindenting. Thanks!
>> Thank you very much!
>
> This patch (8eda731465) makes the ssl tests fail for me:
>
> not ok 121 - sslrootcert=system does not connect with private CA: matches
>
> #   Failed test 'sslrootcert=system does not connect with private CA: matches'
> #   at t/001_ssltests.pl line 479.
> #                   'psql: error: connection to server at "127.0.0.1", port 53971 failed: SSL SYSCALL error:
Undefinederror: 0' 
> #     doesn't match '(?^:SSL error: certificate verify failed)'
>
> This is with OpenSSL 3.1.0 from macOS/Homebrew.
>
> If I instead use OpenSSL 1.1.1t, then the tests pass.

I am unable to reproduce this (or any failure) with OpenSSL 3.1 built from
source (or 3.0 or 3.1.1-dev) or installed via homebrew (on macOS 12 with Intel
CPU).  Do you have any more clues from logs what might've happened?

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Wed, Apr 12, 2023 at 2:24 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 12 Apr 2023, at 09:11, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> > #   Failed test 'sslrootcert=system does not connect with private CA: matches'
> > #   at t/001_ssltests.pl line 479.
> > #                   'psql: error: connection to server at "127.0.0.1", port 53971 failed: SSL SYSCALL error:
Undefinederror: 0' 
> > #     doesn't match '(?^:SSL error: certificate verify failed)'
> >
> > This is with OpenSSL 3.1.0 from macOS/Homebrew.
> >
> > If I instead use OpenSSL 1.1.1t, then the tests pass.
>
> I am unable to reproduce this (or any failure) with OpenSSL 3.1 built from
> source (or 3.0 or 3.1.1-dev) or installed via homebrew (on macOS 12 with Intel
> CPU).  Do you have any more clues from logs what might've happened?

This looks similar (but not identical) to the brew bug we're working
around for Cirrus, in which `brew cleanup` breaks the OpenSSL
installation and turns certificate verification failures into
bizarrely unhelpful messages.

Peter, you should have a .../etc/openssl@3/certs directory somewhere
in your Homebrew installation prefix -- do you, or has Homebrew
removed it by mistake?

--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Peter Eisentraut
Date:
On 12.04.23 18:54, Jacob Champion wrote:
> Peter, you should have a .../etc/openssl@3/certs directory somewhere
> in your Homebrew installation prefix -- do you, or has Homebrew
> removed it by mistake?

I don't have that, but I don't have it for openssl@1.1 either.  I have

~$ ll /usr/local/etc/openssl@3
total 76
drwxr-xr-x 7 peter admin   224 2023-03-08 08:49 misc/
lrwxr-xr-x 1 peter admin    27 2023-03-21 13:41 cert.pem -> 
../ca-certificates/cert.pem
-rw-r--r-- 1 peter admin   412 2023-03-21 13:41 ct_log_list.cnf
-rw-r--r-- 1 peter admin   412 2023-03-21 13:41 ct_log_list.cnf.dist
-rw-r--r-- 1 peter admin   351 2023-03-08 08:57 fipsmodule.cnf
-rw-r--r-- 1 peter admin 12386 2023-03-13 10:49 openssl.cnf
-rw-r--r-- 1 peter admin 12292 2023-03-21 13:41 openssl.cnf.default
-rw-r--r-- 1 peter admin 12292 2023-03-08 08:49 openssl.cnf.dist
-rw-r--r-- 1 peter admin 12292 2023-03-21 13:41 openssl.cnf.dist.default




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 12 Apr 2023, at 21:43, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 12.04.23 18:54, Jacob Champion wrote:
>> Peter, you should have a .../etc/openssl@3/certs directory somewhere
>> in your Homebrew installation prefix -- do you, or has Homebrew
>> removed it by mistake?
>
> I don't have that, but I don't have it for openssl@1.1 either.

The important bit is that your OPENSSLDIR points to a directory which has the
content OpenSSL needs.

> I have
>
> ~$ ll /usr/local/etc/openssl@3
> total 76
> drwxr-xr-x 7 peter admin   224 2023-03-08 08:49 misc/
> lrwxr-xr-x 1 peter admin    27 2023-03-21 13:41 cert.pem -> ../ca-certificates/cert.pem
> -rw-r--r-- 1 peter admin   412 2023-03-21 13:41 ct_log_list.cnf
> -rw-r--r-- 1 peter admin   412 2023-03-21 13:41 ct_log_list.cnf.dist
> -rw-r--r-- 1 peter admin   351 2023-03-08 08:57 fipsmodule.cnf
> -rw-r--r-- 1 peter admin 12386 2023-03-13 10:49 openssl.cnf
> -rw-r--r-- 1 peter admin 12292 2023-03-21 13:41 openssl.cnf.default
> -rw-r--r-- 1 peter admin 12292 2023-03-08 08:49 openssl.cnf.dist
> -rw-r--r-- 1 peter admin 12292 2023-03-21 13:41 openssl.cnf.dist.default

Assuming that's your OPENSSLDIR, then that looks like it should (it's precisely
what I have).

Just to further rule out any issues in the installation, If you run the command
from upthread, does that properly verify postgresql.org?

echo Q | <path to>openssl@3/bin/openssl s_client -connect postgresql.org:443 -verify_return_error

Is the failure repeatable enough that you might be able to tease something out
of the log?  I've been trying again today but been unable to reproduce this =(

We don't have great coverage of macOS in the buildfarm sadly, I wonder if can
get sifaka to run the SSL tests if we ask nicely?

--
Daniel Gustafsson




Daniel Gustafsson <daniel@yesql.se> writes:
> We don't have great coverage of macOS in the buildfarm sadly, I wonder if can
> get sifaka to run the SSL tests if we ask nicely?

I was just looking into that, but it seems like it'd be a mess.

I have a modern openssl installation from MacPorts, but if
I try to select that I am going to end up compiling with
-I/opt/local/include -L/opt/local/lib, which exposes all of the
metric buttload of stuff that MacPorts tends to pull in.  sifaka
is intended to test in a reasonably-default macOS environment,
and that would be far from it.

Plausible alternatives include:

1. Hand-built private copy of openssl.  longfin is set up that way,
but I'm not really eager to duplicate that approach, especially if
we want to test cutting-edge openssl.

2. Run a second BF animal that's intentionally pointed at the MacPorts
environment, in hopes of testing what MacPorts users would see.

#2 feels like it might not be a waste of cycles, and certainly that
machine is underworked at the moment.  Thoughts?

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 12 Apr 2023, at 22:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> We don't have great coverage of macOS in the buildfarm sadly, I wonder if can
>> get sifaka to run the SSL tests if we ask nicely?
>
> I was just looking into that, but it seems like it'd be a mess.
>
> I have a modern openssl installation from MacPorts, but if
> I try to select that I am going to end up compiling with
> -I/opt/local/include -L/opt/local/lib, which exposes all of the
> metric buttload of stuff that MacPorts tends to pull in.  sifaka
> is intended to test in a reasonably-default macOS environment,
> and that would be far from it.

That makes sense.

> Plausible alternatives include:
>
> 1. Hand-built private copy of openssl.  longfin is set up that way,
> but I'm not really eager to duplicate that approach, especially if
> we want to test cutting-edge openssl.
>
> 2. Run a second BF animal that's intentionally pointed at the MacPorts
> environment, in hopes of testing what MacPorts users would see.
>
> #2 feels like it might not be a waste of cycles, and certainly that
> machine is underworked at the moment.  Thoughts?

I think #2 would be a good addition.  Most won't build OpenSSL themselves so
seeing builds from envs that are reasonable to expect in the wild is more
interesting.

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
(Peter, your emails are being redirected to spam for me, FYI.
Something about messagingengine.)

On Wed, Apr 12, 2023 at 12:57 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 12 Apr 2023, at 21:43, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> > On 12.04.23 18:54, Jacob Champion wrote:
> >> Peter, you should have a .../etc/openssl@3/certs directory somewhere
> >> in your Homebrew installation prefix -- do you, or has Homebrew
> >> removed it by mistake?
> >
> > I don't have that, but I don't have it for openssl@1.1 either.

AFAIK this behavior started with 3.x.

> The important bit is that your OPENSSLDIR points to a directory which has the
> content OpenSSL needs.
>
> > I have
> >
> > ~$ ll /usr/local/etc/openssl@3
> > total 76
> > drwxr-xr-x 7 peter admin   224 2023-03-08 08:49 misc/
> > lrwxr-xr-x 1 peter admin    27 2023-03-21 13:41 cert.pem -> ../ca-certificates/cert.pem
> > -rw-r--r-- 1 peter admin   412 2023-03-21 13:41 ct_log_list.cnf
> > -rw-r--r-- 1 peter admin   412 2023-03-21 13:41 ct_log_list.cnf.dist
> > -rw-r--r-- 1 peter admin   351 2023-03-08 08:57 fipsmodule.cnf
> > -rw-r--r-- 1 peter admin 12386 2023-03-13 10:49 openssl.cnf
> > -rw-r--r-- 1 peter admin 12292 2023-03-21 13:41 openssl.cnf.default
> > -rw-r--r-- 1 peter admin 12292 2023-03-08 08:49 openssl.cnf.dist
> > -rw-r--r-- 1 peter admin 12292 2023-03-21 13:41 openssl.cnf.dist.default
>
> Assuming that's your OPENSSLDIR, then that looks like it should (it's precisely
> what I have).

It surprises me that you can get a successful test with a missing
certs directory. If I remove the workaround in Cirrus, I get the
following error, which looks the same to me:

    [20:40:00.253](0.000s) not ok 121 - sslrootcert=system does not
connect with private CA: matches
    [20:40:00.253](0.000s) #   Failed test 'sslrootcert=system does
not connect with private CA: matches'
    #   at /Users/admin/pgsql/src/test/ssl/t/001_ssltests.pl line 479.
    [20:40:00.253](0.000s) #                   'psql: error:
connection to server at "127.0.0.1", port 57681 failed: SSL SYSCALL
error: Undefined error: 0'
    #     doesn't match '(?^:SSL error: certificate verify failed)'

(That broken error message has changed since 3.0; now it's busted in a
new way as of 3.1, I guess.)

Does the test start passing if you create an empty certs directory? It
still wouldn't explain why Daniel's setup is succeeding...

--Jacob



Oh!  I was a little behind on MacPorts updates, and after
pulling the latest (taking their openssl from 3.0.8 to 3.1.0)
I can duplicate Peter's problem:

# +++ tap check in src/test/ssl +++
t/001_ssltests.pl .. 120/?
#   Failed test 'sslrootcert=system does not connect with private CA: matches'
#   at t/001_ssltests.pl line 479.
#                   'psql: error: connection to server at "127.0.0.1", port 58910 failed: SSL SYSCALL error: Undefined
error:0' 
#     doesn't match '(?^:SSL error: certificate verify failed)'
t/001_ssltests.pl .. 196/? # Looks like you failed 1 test of 205.
t/001_ssltests.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/205 subtests
t/002_scram.pl ..... ok
t/003_sslinfo.pl ... ok

Test Summary Report
-------------------
t/001_ssltests.pl (Wstat: 256 Tests: 205 Failed: 1)
  Failed test:  121
  Non-zero exit status: 1
Files=3, Tests=247, 14 wallclock secs ( 0.02 usr  0.01 sys +  2.04 cusr  1.54 csys =  3.61 CPU)
Result: FAIL
make: *** [check] Error 1

So whatever this is, it's not strictly Homebrew's issue.

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Peter Eisentraut
Date:
On 12.04.23 22:52, Jacob Champion wrote:
> It surprises me that you can get a successful test with a missing
> certs directory. If I remove the workaround in Cirrus, I get the
> following error, which looks the same to me:
> 
>      [20:40:00.253](0.000s) not ok 121 - sslrootcert=system does not
> connect with private CA: matches
>      [20:40:00.253](0.000s) #   Failed test 'sslrootcert=system does
> not connect with private CA: matches'
>      #   at /Users/admin/pgsql/src/test/ssl/t/001_ssltests.pl line 479.
>      [20:40:00.253](0.000s) #                   'psql: error:
> connection to server at "127.0.0.1", port 57681 failed: SSL SYSCALL
> error: Undefined error: 0'
>      #     doesn't match '(?^:SSL error: certificate verify failed)'
> 
> (That broken error message has changed since 3.0; now it's busted in a
> new way as of 3.1, I guess.)
> 
> Does the test start passing if you create an empty certs directory? It
> still wouldn't explain why Daniel's setup is succeeding...

After

mkdir /usr/local/etc/openssl@3/certs

the tests pass!




Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 12.04.23 22:52, Jacob Champion wrote:
>> Does the test start passing if you create an empty certs directory? It
>> still wouldn't explain why Daniel's setup is succeeding...

> After
> mkdir /usr/local/etc/openssl@3/certs
> the tests pass!

Likewise, though MacPorts unsurprisingly uses a different place:

$ openssl info -configdir
/opt/local/libexec/openssl3/etc/openssl
$ sudo mkdir /opt/local/libexec/openssl3/etc/openssl/certs
$ make check PG_TEST_EXTRA=ssl
... success!

So this smells to me like a new OpenSSL bug: they should tolerate
a missing certs dir like they used to.  Who wants to file it?

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 12 Apr 2023, at 23:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> On 12.04.23 22:52, Jacob Champion wrote:
>>> Does the test start passing if you create an empty certs directory? It
>>> still wouldn't explain why Daniel's setup is succeeding...
>
>> After
>> mkdir /usr/local/etc/openssl@3/certs
>> the tests pass!
>
> Likewise, though MacPorts unsurprisingly uses a different place:
>
> $ openssl info -configdir
> /opt/local/libexec/openssl3/etc/openssl
> $ sudo mkdir /opt/local/libexec/openssl3/etc/openssl/certs
> $ make check PG_TEST_EXTRA=ssl
> ... success!
>
> So this smells to me like a new OpenSSL bug: they should tolerate
> a missing certs dir like they used to.  Who wants to file it?

They are specifying that: "A missing default location is still treated as a
success".  That leaves out the interesting bit of what a success means here,
and how it should work when verifications are requested.  That being said, the
same is written in the 1.1.1 manpage.

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 12 Apr 2023, at 23:46, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 12 Apr 2023, at 23:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> So this smells to me like a new OpenSSL bug: they should tolerate
>> a missing certs dir like they used to.  Who wants to file it?
>
> They are specifying that: "A missing default location is still treated as a
> success".  That leaves out the interesting bit of what a success means here,
> and how it should work when verifications are requested.  That being said, the
> same is written in the 1.1.1 manpage.

After a little bit of digging I have a vague idea.

OpenSSL will treat a missing default location as a success simply due to the
fact that it mainly just stores the path, loading of the certs is deferred
until use (which maps well to the error we are seeing).  Patching OpenSSL to
report all errors makes no difference, a missing default is indeed not an error
even with errors turned on.

The change in OpenSSL 3 is the addition of certificate stores via ossl_store
API.  When SSL_CTX_set_default_verify_paths() is called it will in 1.1.1 set
the default (hardcoded) filename and path; in 3 it also sets the default store.
Stores are initialized with a URL, and the default store falls back to using the
default certs dir as the URI as no store is set.

If I patch OpenSSL 3 to skip setting the default store, the tests pass even
with a missing cert directory. This is effectively the 1.1.1 behavior.

The verification error we are hitting is given to us in the verify_cb which
we've short circuited.  The issue we have is that we cannot get PGconn in
verify_cb so logging an error is tricky.

I need to sleep on this before I do some more digging to figure out if OpenSSL
considers this to be the intended behavior, a regression in 3, or if we have a
bug in how we catch verification errors which is exposed by a non-existing
store.  I'll add an open item for this in the morning to track how we'd like to
proceed.

--
Daniel Gustafsson




On Wed, Apr 12, 2023, 19:30 Daniel Gustafsson <daniel@yesql.se> wrote:
>
>  The issue we have is that we cannot get PGconn in
> verify_cb so logging an error is tricky.


Hm, the man page talks about a "ex_data mechanism" which seems to be
referring to this Rube Goldberg device
https://www.openssl.org/docs/man3.1/man3/SSL_get_ex_data.html

It looks like X509_STORE_CTX_set_app_data() and
X509_STORE_CTX_get_app_data() would be convenience macros to do this.



Daniel Gustafsson <daniel@yesql.se> writes:
> On 12 Apr 2023, at 22:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Plausible alternatives include:
>> 2. Run a second BF animal that's intentionally pointed at the MacPorts
>> environment, in hopes of testing what MacPorts users would see.
>> 
>> #2 feels like it might not be a waste of cycles, and certainly that
>> machine is underworked at the moment.  Thoughts?

> I think #2 would be a good addition.  Most won't build OpenSSL themselves so
> seeing builds from envs that are reasonable to expect in the wild is more
> interesting.

I have an animal cranked up and awaiting approval; however, it fails
the src/test/ldap tests in v11, apparently because aa1419e63 was not
back-patched.  Barring objections, I'll back-patch that before
bringing the animal on-line (or Thomas can do it, if he wishes).

            regards, tom lane



Daniel Gustafsson <daniel@yesql.se> writes:
>> On 12 Apr 2023, at 22:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2. Run a second BF animal that's intentionally pointed at the MacPorts
>> environment, in hopes of testing what MacPorts users would see.

> I think #2 would be a good addition.  Most won't build OpenSSL themselves so
> seeing builds from envs that are reasonable to expect in the wild is more
> interesting.

Done, reporting as "indri".

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 13 Apr 2023, at 18:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>>> On 12 Apr 2023, at 22:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> 2. Run a second BF animal that's intentionally pointed at the MacPorts
>>> environment, in hopes of testing what MacPorts users would see.
>
>> I think #2 would be a good addition.  Most won't build OpenSSL themselves so
>> seeing builds from envs that are reasonable to expect in the wild is more
>> interesting.
>
> Done, reporting as "indri".

Great, thanks heaps! That will for sure be helpful going forward.

Regarding the thread; I hope to have a suggestion for a way forward regarding
the open issue later tonight.

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 13 Apr 2023, at 18:42, Daniel Gustafsson <daniel@yesql.se> wrote:

> Regarding the thread; I hope to have a suggestion for a way forward regarding
> the open issue later tonight.

After reading OpenSSL code and documentation, I think the simplest solution is
to explicitly check for X509 errors when OpenSSL reports SSL_ERROR_SYSCALL.
It's not documented why this particular errorcode is used, but AFAICT it's
because while it is a cert verification failure, the cause of it is an IO error
in reading a non-existing file or directory.

The attached diff passes the tests on OpenSSL 1.0.1 through 3.1 as well as on
LibreSSL. Thoughts?

--
Daniel Gustafsson


Attachment
Daniel Gustafsson <daniel@yesql.se> writes:
> The attached diff passes the tests on OpenSSL 1.0.1 through 3.1 as well as on
> LibreSSL. Thoughts?

1. You can't assume that errno starts out zero, unless you zero it
right before SSL_connect.

2. I wonder whether it's safe to assume that errno (a/k/a SOCK_ERRNO)
can't be clobbered by SSL_get_verify_result.

3. It seems weird to refer to errno directly just a couple lines away
from where we refer to it via SOCK_ERRNO.  Will this even compile
on Windows?

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 14 Apr 2023, at 00:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> The attached diff passes the tests on OpenSSL 1.0.1 through 3.1 as well as on
>> LibreSSL. Thoughts?
>
> 1. You can't assume that errno starts out zero, unless you zero it
> right before SSL_connect.

Maybe we should do that regardless of this?  We do for reading and writing but
not in open_client_SSL, and I can't off the top of my head think of a good
reason not to?

> 2. I wonder whether it's safe to assume that errno (a/k/a SOCK_ERRNO)
> can't be clobbered by SSL_get_verify_result.
>
> 3. It seems weird to refer to errno directly just a couple lines away
> from where we refer to it via SOCK_ERRNO.  Will this even compile
> on Windows?

Good points, it should of course be SOCK_ERRNO.  The attached saves off errno
and reinstates it to avoid clobbering.  Will test it on Windows in the morning
as well.

--
Daniel Gustafsson


Attachment
Daniel Gustafsson <daniel@yesql.se> writes:
> Good points, it should of course be SOCK_ERRNO.  The attached saves off errno
> and reinstates it to avoid clobbering.  Will test it on Windows in the morning
> as well.

I think instead of this:

+                    SOCK_ERRNO_SET(save_errno);

you could just do this:

                         libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
-                                           SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+                                           SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf)));

Although ... we're already assuming that SSL_get_error and ERR_get_error
don't clobber errno.  Maybe SSL_get_verify_result doesn't either.
Or we could make it look like this:

+    SOCK_ERRNO_SET(0);
     ERR_clear_error();
     r = SSL_connect(conn->ssl);
     if (r <= 0)
+       int            save_errno = SOCK_ERRNO;
        int            err = SSL_get_error(conn->ssl, r);
        unsigned long ecode;

        ...

-                                           SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+                                           SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf)));

to remove all doubt.

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 14 Apr 2023, at 01:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> Good points, it should of course be SOCK_ERRNO.  The attached saves off errno
>> and reinstates it to avoid clobbering.  Will test it on Windows in the morning
>> as well.
>
> I think instead of this:
>
> +                    SOCK_ERRNO_SET(save_errno);
>
> you could just do this:
>
>                         libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
> -                                           SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
> +                                           SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf)));
>
> Although ... we're already assuming that SSL_get_error and ERR_get_error
> don't clobber errno.  Maybe SSL_get_verify_result doesn't either.
> Or we could make it look like this:
>
> +    SOCK_ERRNO_SET(0);
>     ERR_clear_error();
>     r = SSL_connect(conn->ssl);
>     if (r <= 0)
> +       int            save_errno = SOCK_ERRNO;
>        int            err = SSL_get_error(conn->ssl, r);
>        unsigned long ecode;
>
>        ...
>
> -                                           SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
> +                                           SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf)));
>
> to remove all doubt.

I mainly put save_errno back into SOCK_ERRNO for greppability, I don't have any
strong opinions either way so I went with the latter suggestion.  Attached v3
does the above change and passes the tests both with a broken and working
system CA pool.  Unless objections from those with failing local envs I propose
this is pushed to close the open item.

--
Daniel Gustafsson


Attachment
Daniel Gustafsson <daniel@yesql.se> writes:
> I mainly put save_errno back into SOCK_ERRNO for greppability, I don't have any
> strong opinions either way so I went with the latter suggestion.  Attached v3
> does the above change and passes the tests both with a broken and working
> system CA pool.  Unless objections from those with failing local envs I propose
> this is pushed to close the open item.

One more question when looking at it with fresh eyes: should the argument
of X509_verify_cert_error_string be "ecode" or "vcode"?

            regards, tom lane



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:

> On 14 Apr 2023, at 15:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> I mainly put save_errno back into SOCK_ERRNO for greppability, I don't have any
>> strong opinions either way so I went with the latter suggestion.  Attached v3
>> does the above change and passes the tests both with a broken and working
>> system CA pool.  Unless objections from those with failing local envs I propose
>> this is pushed to close the open item.
>
> One more question when looking at it with fresh eyes: should the argument
> of X509_verify_cert_error_string be "ecode" or "vcode"?

Good catch, it should be vcode.

--
Daniel Gustafsson





Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 14 Apr 2023, at 16:20, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 14 Apr 2023, at 15:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Daniel Gustafsson <daniel@yesql.se> writes:
>>> I mainly put save_errno back into SOCK_ERRNO for greppability, I don't have any
>>> strong opinions either way so I went with the latter suggestion.  Attached v3
>>> does the above change and passes the tests both with a broken and working
>>> system CA pool.  Unless objections from those with failing local envs I propose
>>> this is pushed to close the open item.
>>
>> One more question when looking at it with fresh eyes: should the argument
>> of X509_verify_cert_error_string be "ecode" or "vcode"?
>
> Good catch, it should be vcode.

And again with the attachment.

--
Daniel Gustafsson


Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Fri, Apr 14, 2023 at 7:20 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> And again with the attachment.

After some sleep... From inspection I think the final EOF branch could
be masked by the new branch, if verification has failed but was already
ignored.

To test that, I tried hanging up on the client partway through the
server handshake, and I got some strange results. With the patch, using
sslmode=require and OpenSSL 1.0.1, I see:

    connection to server at "127.0.0.1", port 50859 failed: SSL error:
certificate verify failed: self signed certificate

Which is wrong -- we shouldn't care about the self-signed failure if
we're not using verify-*. I was going to suggest a patch like the following:

>                     if (r == -1)
> -                       libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
> -                                         SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
> +                   {
> +                       /*
> +                        * If we get an X509 error here without an error in the
> +                        * socket layer it means that verification failed without
> +                        * it being a protocol error. A common cause is trying to
> +                        * a default system CA which is missing or broken.
> +                        */
> +                       if (!save_errno && vcode != X509_V_OK)
> +                           libpq_append_conn_error(conn, "SSL error: certificate verify failed: %s",
> +                                                   X509_verify_cert_error_string(vcode));
> +                       else
> +                           libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
> +                                             SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf)));
> +                   }
>                     else
>                         libpq_append_conn_error(conn, "SSL SYSCALL error: EOF detected");

But then I tested my case against PG15, and I didn't get the EOF message
I expected:

    connection to server at "127.0.0.1", port 50283 failed: SSL SYSCALL
error: Success

So it appears that this (hanging up on the client during the handshake)
is _also_ a case where we could get a SYSCALL error with a zero errno,
and my patch doesn't actually fix the misleading error message.

That makes me worried, but I don't really have a concrete suggestion to
make it better, yet. I'm not opposed to pushing this as a fix for the
tests, but I suspect we'll have to iterate on this more.

Thanks,
--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 14 Apr 2023, at 19:34, Jacob Champion <jchampion@timescale.com> wrote:
>
> On Fri, Apr 14, 2023 at 7:20 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>> And again with the attachment.
>
> After some sleep... From inspection I think the final EOF branch could
> be masked by the new branch, if verification has failed but was already
> ignored.
>
> To test that, I tried hanging up on the client partway through the
> server handshake, and I got some strange results. With the patch, using
> sslmode=require and OpenSSL 1.0.1, I see:
>
>    connection to server at "127.0.0.1", port 50859 failed: SSL error:
> certificate verify failed: self signed certificate
>
> Which is wrong -- we shouldn't care about the self-signed failure if
> we're not using verify-*. I was going to suggest a patch like the following:
>
>>                    if (r == -1)
>> -                       libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
>> -                                         SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
>> +                   {
>> +                       /*
>> +                        * If we get an X509 error here without an error in the
>> +                        * socket layer it means that verification failed without
>> +                        * it being a protocol error. A common cause is trying to
>> +                        * a default system CA which is missing or broken.
>> +                        */
>> +                       if (!save_errno && vcode != X509_V_OK)
>> +                           libpq_append_conn_error(conn, "SSL error: certificate verify failed: %s",
>> +                                                   X509_verify_cert_error_string(vcode));
>> +                       else
>> +                           libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
>> +                                             SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf)));
>> +                   }
>>                    else
>>                        libpq_append_conn_error(conn, "SSL SYSCALL error: EOF detected");
>
> But then I tested my case against PG15, and I didn't get the EOF message
> I expected:
>
>    connection to server at "127.0.0.1", port 50283 failed: SSL SYSCALL
> error: Success

This "error: Success" error has been reported to the list numerous times as
misleading, and I'd love to make progress on improving error reporting during
the v17 cycle.

> So it appears that this (hanging up on the client during the handshake)
> is _also_ a case where we could get a SYSCALL error with a zero errno,
> and my patch doesn't actually fix the misleading error message.
>
> That makes me worried, but I don't really have a concrete suggestion to
> make it better, yet. I'm not opposed to pushing this as a fix for the
> tests, but I suspect we'll have to iterate on this more.

So, taking a step back.  We know that libpq error reporting for SSL errors
isn't great, the permutations of sslmodes and OpenSSL versions and the very
fine-grained error handling API of OpenSSL make it hard to generalize well.
That's not what we're trying to solve here.

What we are trying solve is this one case where we know exactly what went
wrong, and we know that the error message as-is will be somewhere between
misleading and utterly bogus.  The committed feature is working as intended,
and the connection is refused as it should when no CA is available, but we know
it's a situation which is quite easy to get oneself into (a typo in an
environment variable can be enough).  So what we can do is pinpoint that
specific case and leave the unknowns to the current error reporting for
consistency with older postgres versions.

The attached checks for the specific known error, and leave all the other cases
to the same logging that we have today.  It relies on the knowledge that system
sslrootcert configs has deferred loading, and will run with verify-full.  So if
we see an X509 failure in loading the local issuer cert here then we know the
the user wanted to use the system CA pool for certificate verification but the
root CA cannot be loaded for some reason.

--
Daniel Gustafsson


Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Fri, Apr 14, 2023 at 3:36 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> This "error: Success" error has been reported to the list numerous times as
> misleading, and I'd love to make progress on improving error reporting during
> the v17 cycle.

Agreed!

> The attached checks for the specific known error, and leave all the other cases
> to the same logging that we have today.  It relies on the knowledge that system
> sslrootcert configs has deferred loading, and will run with verify-full.  So if
> we see an X509 failure in loading the local issuer cert here then we know the
> the user wanted to use the system CA pool for certificate verification but the
> root CA cannot be loaded for some reason.

This LGTM; I agree with your reasoning. Note that it won't fix the
(completely different) misleading error message for OpenSSL 3.0, but
since that's an *actively* unhelpful error message coming back from
OpenSSL, I don't think we want to override it. For 3.1, we have no
information and we're trying to fill in the gaps.

--Jacob



Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 17 Apr 2023, at 18:20, Jacob Champion <jchampion@timescale.com> wrote:

> Note that it won't fix the
> (completely different) misleading error message for OpenSSL 3.0, but
> since that's an *actively* unhelpful error message coming back from
> OpenSSL, I don't think we want to override it.

Agreed, the best we can do there is to memorize it in the test.

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 18 Apr 2023, at 14:32, Daniel Gustafsson <daniel@yesql.se> wrote:
> 
>> On 17 Apr 2023, at 18:20, Jacob Champion <jchampion@timescale.com> wrote:
> 
>> Note that it won't fix the
>> (completely different) misleading error message for OpenSSL 3.0, but
>> since that's an *actively* unhelpful error message coming back from
>> OpenSSL, I don't think we want to override it.
> 
> Agreed, the best we can do there is to memorize it in the test.

This has been done and the open item marked as completed.

--
Daniel Gustafsson




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Jacob Champion
Date:
On Fri, Apr 21, 2023 at 12:55 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> This has been done and the open item marked as completed.

Thanks! Now that the weirdness is handled by the tests, I think we can
remove the Cirrus workaround. Something like the attached, which
passes the macOS Meson suite for me.

--Jacob

Attachment

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From
Daniel Gustafsson
Date:
> On 21 Apr 2023, at 18:56, Jacob Champion <jchampion@timescale.com> wrote:
>
> On Fri, Apr 21, 2023 at 12:55 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>> This has been done and the open item marked as completed.
>
> Thanks! Now that the weirdness is handled by the tests, I think we can
> remove the Cirrus workaround. Something like the attached, which
> passes the macOS Meson suite for me.

Agreed, I had this on my TODO list for when the test fix patch had landed.
Verified in CI for me as well so pushed.  Thanks!

--
Daniel Gustafsson