Thread: TLS certificate alternate trust paths issue in libpq - certificate chain validation failing

Hello, I've recently joined the list on a tip from one of the maintainers of jdbc-postgres. I wanted to discuss an issue we've run into and find out whether the fix we've worked out is the right thing to do, or whether there is actually a bug that needs to be fixed.

The full details can be found at github.com/pgjdbc/pgjdbc/discussions/3236 - in summary, both jdbc-postgres and the psql cli seem to be affected by an issue validating the certificate chain up to a publicly trusted root certificate that has cross-signed an intermediate certificate coming from a Postgres server in Azure, when using sslmode=verify-full and trying to rely on the default path for sslrootcert.

The workaround we came up with is to add the original root cert, not the root that cross-signed the intermediate, to root.crt, in order to avoid needing to specify sslrootcert=<the default path>. This allows the full chain to be verified.

I believe either root should work when placed there, without my needing to explicitly specify sslrootcert=<the default path>. But if I use the CA that cross-signed the intermediate cert and don't specify sslrootcert=<some path, either default or not>, chain validation fails.

Thank you,

Thomas
On Tue, Apr 30, 2024 at 2:41 PM Thomas Spear <speeddymon@gmail.com> wrote:
> The full details can be found at github.com/pgjdbc/pgjdbc/discussions/3236 - in summary, both jdbc-postgres and the psql cli seem to be affected by an issue validating the certificate chain up to a publicly trusted root certificate that has cross-signed an intermediate certificate coming from a Postgres server in Azure, when using sslmode=verify-full and trying to rely on the default path for sslrootcert.

Hopefully someone more familiar with the Azure cross-signing setup
sees something obvious and chimes in, but in the meantime there are a
couple things I can think to ask:

1. Are you sure that the server is actually putting the cross-signed
intermediate in the chain it's serving to the client?

2. What version of OpenSSL? There used to be validation bugs with
alternate trust paths; hopefully you're not using any of those (I
think they're old as dirt), but it doesn't hurt to know.

3. Can you provide a sample public certificate chain that should
validate and doesn't?

Thanks,
--Jacob



On Tue, Apr 30, 2024 at 5:19 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote:
> On Tue, Apr 30, 2024 at 2:41 PM Thomas Spear <speeddymon@gmail.com> wrote:
>> The full details can be found at github.com/pgjdbc/pgjdbc/discussions/3236 - in summary, both jdbc-postgres and the psql cli seem to be affected by an issue validating the certificate chain up to a publicly trusted root certificate that has cross-signed an intermediate certificate coming from a Postgres server in Azure, when using sslmode=verify-full and trying to rely on the default path for sslrootcert.
>
> Hopefully someone more familiar with the Azure cross-signing setup
> sees something obvious and chimes in, but in the meantime there are a
> couple things I can think to ask:
>
> 1. Are you sure that the server is actually putting the cross-signed
> intermediate in the chain it's serving to the client?


Hello Jacob, thanks for your reply.

I can say I'm reasonably certain. I dumped out the certificates presented by the server using openssl, and the chain that gets output includes "Microsoft Azure RSA TLS Issuing CA 08".
On https://www.microsoft.com/pkiops/docs/repository.htm the page says that that cert was cross-signed by the DigiCert RSA G2 root.
The postgres server appears to send the Microsoft root certificate instead of the DigiCert one, which should be fine. The server sends the "Microsoft RSA Root Certificate Authority 2017" root.
As far as I understand, a server sending a root certificate along with the intermediate is a big no-no, but that's a topic for a different thread and audience most likely. :)

> 2. What version of OpenSSL? There used to be validation bugs with
> alternate trust paths; hopefully you're not using any of those (I
> think they're old as dirt), but it doesn't hurt to know.


The openssl version in my Windows test system is 3.0.7. It's running Almalinux 9 in WSL2, so openssl is from the package manager. The container image I'm using has an old-as-dirt openssl 1.1.1k. It's built on a RHEL UBI8 base image, so it doesn't surprise me that the package-manager-provided openssl there is old as dirt. If we need to, I'll look at making a build of 3.x for this container, or temporarily switching the base layer to Ubuntu to test.
 
> 3. Can you provide a sample public certificate chain that should
> validate and doesn't?


I'll get back to you on this one. I'll have to check one of our public cloud postgres instances to see if I can reproduce the issue there in order to get a chain that I can share because the system where I'm testing is a locked down jump host to our Azure GovCloud infrastructure, and I can't copy anything out from it.

Thanks again

--Thomas

On Wed, May 1, 2024 at 6:48 AM Thomas Spear <speeddymon@gmail.com> wrote:
> I dumped out the certificates presented by the server using openssl, and the chain that gets output includes "Microsoft Azure RSA TLS Issuing CA 08".
> On https://www.microsoft.com/pkiops/docs/repository.htm the page says that that cert was cross-signed by the DigiCert RSA G2 root.

It's been a while since I've looked at cross-signing, but that may not
be enough information to prove that it's the "correct" version of the
intermediate. You'd need to know the Issuer, not just the Subject, for
all the intermediates that were given to the client. (It may not match
the one they have linked on their support page.)
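To make the Issuer-versus-Subject point concrete, here is a minimal Python sketch that walks a served chain purely by distinguished name: starting at the leaf, it follows each certificate's Issuer to the certificate with that Subject and reports which trusted root, if any, the path ends at. It assumes name matching is enough, which real validators such as OpenSSL do not (they also check signatures, key identifiers, validity dates and extensions). The Microsoft names are the ones reported later in this thread; the DigiCert DN and the leaf name are placeholders, and `Cert` and `find_anchor` are hypothetical helpers.

```python
from typing import NamedTuple, Optional


class Cert(NamedTuple):
    subject: str  # distinguished name of this certificate
    issuer: str   # distinguished name of whoever signed it


def find_anchor(chain: list, trusted_roots: set) -> Optional[str]:
    """Follow Issuer -> Subject links from the leaf (chain[0]) and return the
    trusted root name the path ends at, or None if no trusted root is reached.

    Simplification: only names are compared; signatures, key identifiers,
    validity periods and extensions are ignored.
    """
    by_subject = {c.subject: c for c in chain}
    current = chain[0]
    seen = set()
    while current.subject not in seen:
        seen.add(current.subject)
        if current.issuer in trusted_roots:
            return current.issuer
        nxt = by_subject.get(current.issuer)
        if nxt is None:
            return None  # issuer was neither sent by the server nor trusted
        current = nxt
    return None  # reached an untrusted self-signed cert (or a name loop)


MS_ROOT = "C=US,O=Microsoft Corporation,CN=Microsoft RSA Root Certificate Authority 2017"
CA_08 = "C=US,O=Microsoft Corporation,CN=Microsoft Azure RSA TLS Issuing CA 08"
DIGICERT_ROOT = "CN=DigiCert Global Root G2"  # placeholder DN, assumed
LEAF = "CN=example-server"                    # hypothetical leaf subject

# Chain as described in the thread: leaf, Microsoft-issued CA 08, Microsoft root.
served = [Cert(LEAF, CA_08), Cert(CA_08, MS_ROOT), Cert(MS_ROOT, MS_ROOT)]

# Same intermediate subject, but the DigiCert-issued (cross-signed) copy instead.
cross_signed = [Cert(LEAF, CA_08), Cert(CA_08, DIGICERT_ROOT)]

print(find_anchor(served, {MS_ROOT}))              # the Microsoft root DN
print(find_anchor(served, {DIGICERT_ROOT}))        # None
print(find_anchor(cross_signed, {DIGICERT_ROOT}))  # the DigiCert root DN
```

Under these assumptions, the chain as served only terminates at the Microsoft root; a DigiCert root can anchor it only if the DigiCert-issued copy of the intermediate is available to the client, for example because the server sends it.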

> The postgres server appears to send the Microsoft root certificate instead of the DigiCert one, which should be fine. The server sends the "Microsoft RSA Root Certificate Authority 2017" root.
> As far as I understand, a server sending a root certificate along with the intermediate is a big no-no, but that's a topic for a different thread and audience most likely. :)

To me, that only makes me more suspicious that the chain the server is
sending you may not be the chain you're expecting. Especially since
you mentioned on the other thread that the MS root is working and the
DigiCert root is not.

> The openssl version in my Windows test system is 3.0.7. It's running Almalinux 9 in WSL2, so openssl is from the package manager. The container image I'm using has an old-as-dirt openssl 1.1.1k.

I'm not aware of any validation issues with 1.1.1k, for what it's
worth. If upgrading helps, great! -- but I wouldn't be surprised if it
didn't.

> I'll have to check one of our public cloud postgres instances to see if I can reproduce the issue there in order to get a chain that I can share because the system where I'm testing is a locked down jump host to our Azure GovCloud infrastructure, and I can't copy anything out from it.

Yeah, if at all possible, that'd make it easier to point at any
glaring problems.

Thanks,
--Jacob



On Wed, May 1, 2024 at 9:23 AM Jacob Champion <jacob.champion@enterprisedb.com> wrote:
> On Wed, May 1, 2024 at 6:48 AM Thomas Spear <speeddymon@gmail.com> wrote:
>> I dumped out the certificates presented by the server using openssl, and the chain that gets output includes "Microsoft Azure RSA TLS Issuing CA 08".
>> On https://www.microsoft.com/pkiops/docs/repository.htm the page says that that cert was cross-signed by the DigiCert RSA G2 root.
>
> It's been a while since I've looked at cross-signing, but that may not
> be enough information to prove that it's the "correct" version of the
> intermediate. You'd need to know the Issuer, not just the Subject, for
> all the intermediates that were given to the client. (It may not match
> the one they have linked on their support page.)


Fair enough. The server certificate's issuer is C=US,O=Microsoft Corporation,CN=Microsoft Azure RSA TLS Issuing CA 08
The intermediate's issuer is C=US,O=Microsoft Corporation,CN=Microsoft RSA Root Certificate Authority 2017 so I think that you're absolutely correct. The intermediate on the support page reflects the DigiCert issuer, but the one from the server reflects the Microsoft issuer.

Circling back to my original question, why is there a difference in behavior?

What I believe should be happening isn't what's happening:
1. If ~/.postgresql/root.crt contains the MS root, and I don't specify sslrootcert= -- successful validation
2. If ~/.postgresql/root.crt contains the MS root, and I specify sslrootcert= -- successful validation
3. If ~/.postgresql/root.crt contains the DigiCert root, and I don't specify sslrootcert= -- validation fails
4. If ~/.postgresql/root.crt contains the DigiCert root, and I specify sslrootcert= -- successful validation

Case 3 should succeed IMHO since case 4 does.
 
>> The postgres server appears to send the Microsoft root certificate instead of the DigiCert one, which should be fine. The server sends the "Microsoft RSA Root Certificate Authority 2017" root.
>> As far as I understand, a server sending a root certificate along with the intermediate is a big no-no, but that's a topic for a different thread and audience most likely. :)
>
> To me, that only makes me more suspicious that the chain the server is
> sending you may not be the chain you're expecting. Especially since
> you mentioned on the other thread that the MS root is working and the
> DigiCert root is not.


Yeah, I agree. So then I need to talk to MS about why the portal is giving us the wrong root -- and I'll open a support ticket with them for this. I still don't understand why the above difference in behavior happens, though. Is that specifically because the server is sending the MS root? It still doesn't seem to make a whole lot of sense. If the DigiCert root can validate the chain when it's explicitly passed, it should be able to validate the chain when it's implicitly the only root cert available to a postgres client.
 
>> The openssl version in my Windows test system is 3.0.7. It's running Almalinux 9 in WSL2, so openssl is from the package manager. The container image I'm using has an old-as-dirt openssl 1.1.1k.
>
> I'm not aware of any validation issues with 1.1.1k, for what it's
> worth. If upgrading helps, great! -- but I wouldn't be surprised if it
> didn't.

 
I was thinking the same, honestly. If it breaks for jdbc-postgres on 1.1.1k and for the psql cli on 3.0.7, then it's likely not an issue with the OpenSSL version.

>> I'll have to check one of our public cloud postgres instances to see if I can reproduce the issue there in order to get a chain that I can share because the system where I'm testing is a locked down jump host to our Azure GovCloud infrastructure, and I can't copy anything out from it.
>
> Yeah, if at all possible, that'd make it easier to point at any
> glaring problems.


I should be able to do this today.

Thanks again!

--Thomas
On Wed, May 1, 2024 at 8:17 AM Thomas Spear <speeddymon@gmail.com> wrote:
> Circling back to my original question, why is there a difference in behavior?
>
> What I believe should be happening isn't what's happening:
> 1. If ~/.postgresql/root.crt contains the MS root, and I don't specify sslrootcert= -- successful validation
> 2. If ~/.postgresql/root.crt contains the MS root, and I specify sslrootcert= -- successful validation
> 3. If ~/.postgresql/root.crt contains the DigiCert root, and I don't specify sslrootcert= -- validation fails
> 4. If ~/.postgresql/root.crt contains the DigiCert root, and I specify sslrootcert= -- successful validation

Number 4 is the only one that seems off to me given what we know. If
you're saying that the server's chain never mentions DigiCert as an
issuer, then I see no reason that the DigiCert root should ever
successfully validate the chain. You mentioned on the other thread
that

> We eventually found the intermediate cert was missing from the system trust, so we tried adding that without success

and that has me a little worried. Why would the intermediate need to
be explicitly trusted?

I also notice from the other thread that sometimes you're testing on
Linux and sometimes you're testing on Windows, and that you've mixed
in a couple of different sslmodes during debugging. So I want to make
absolutely sure: are you _certain_ that case number 4 is a true
statement? In other words, there's nothing in your default root.crt
except the DigiCert root, you've specified exactly the same path in
sslrootcert as the one that's loaded by default, and your experiments
are all using verify-full?

The default can also be modified by a bunch of environmental factors,
including $PGSSLROOTCERT, $HOME, the effective user ID, etc. (On
Windows I don't know what the %APPDATA% conventions are.) If you empty
out your root.crt file, you should get a clear message that libpq
tried to load certificates from it and couldn't; otherwise, it's
finding the default somewhere else.
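The lookup described above can be sketched in Python as follows. This is a rough model, not libpq code: the precedence (an explicit sslrootcert, then $PGSSLROOTCERT, then the per-user default file) follows the libpq documentation, while the home-directory rule ($HOME if set, otherwise the effective user's home, passed in here as `fallback_home`) and reading APPDATA from the environment on Windows are assumptions based on the factors listed above. `default_root_cert_path` is a hypothetical helper.

```python
import ntpath
import posixpath
from typing import Mapping, Optional


def default_root_cert_path(
    sslrootcert: Optional[str],
    env: Mapping[str, str],
    is_windows: bool,
    fallback_home: str,
) -> str:
    """Hypothetical model of which root certificate file libpq will read.

    Assumed precedence:
      1. an explicit sslrootcert connection parameter,
      2. the PGSSLROOTCERT environment variable,
      3. the per-user default file.
    """
    if sslrootcert:
        return sslrootcert
    if env.get("PGSSLROOTCERT"):
        return env["PGSSLROOTCERT"]
    if is_windows:
        # Assumption: %APPDATA%\postgresql\root.crt, approximated via the env var.
        return ntpath.join(env.get("APPDATA", ""), "postgresql", "root.crt")
    # Assumption: $HOME if set, else the effective user's home directory.
    home = env.get("HOME") or fallback_home
    return posixpath.join(home, ".postgresql", "root.crt")


# Same effective user, different environments: the "default" file moves around.
print(default_root_cert_path(None, {"HOME": "/home/thomas"}, False, "/home/euid-user"))
print(default_root_cert_path(None, {}, False, "/home/euid-user"))
print(default_root_cert_path(None, {"PGSSLROOTCERT": "/tmp/ca.pem"}, False, "/home/euid-user"))
```

Emptying whichever file a lookup like this points at is a quick way to confirm that it is really the one being read.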

--Jacob



On Wed, May 1, 2024 at 12:31 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote:
> On Wed, May 1, 2024 at 8:17 AM Thomas Spear <speeddymon@gmail.com> wrote:
>> Circling back to my original question, why is there a difference in behavior?
>>
>> What I believe should be happening isn't what's happening:
>> 1. If ~/.postgresql/root.crt contains the MS root, and I don't specify sslrootcert= -- successful validation
>> 2. If ~/.postgresql/root.crt contains the MS root, and I specify sslrootcert= -- successful validation
>> 3. If ~/.postgresql/root.crt contains the DigiCert root, and I don't specify sslrootcert= -- validation fails
>> 4. If ~/.postgresql/root.crt contains the DigiCert root, and I specify sslrootcert= -- successful validation

> Number 4 is the only one that seems off to me given what we know.
 
I see how that could be true.
 
> If you're saying that the server's chain never mentions DigiCert as an
> issuer, then I see no reason that the DigiCert root should ever
> successfully validate the chain. You mentioned on the other thread
> that
>
>> We eventually found the intermediate cert was missing from the system trust, so we tried adding that without success
>
> and that has me a little worried. Why would the intermediate need to
> be explicitly trusted?


Right, so just to be clear, all of the details in the other thread came from testing done in a container running on Kubernetes, so when we added the intermediate to the "system trust", it was the container's Java trust store. When that didn't work, we removed it from the Dockerfile again so that future builds didn't include trust for that cert.
 
> I also notice from the other thread that sometimes you're testing on
> Linux and sometimes you're testing on Windows, and that you've mixed
> in a couple of different sslmodes during debugging. So I want to make
> absolutely sure: are you _certain_ that case number 4 is a true
> statement? In other words, there's nothing in your default root.crt
> except the DigiCert root, you've specified exactly the same path in
> sslrootcert as the one that's loaded by default, and your experiments
> are all using verify-full?
>
> The default can also be modified by a bunch of environmental factors,
> including $PGSSLROOTCERT, $HOME, the effective user ID, etc. (On
> Windows I don't know what the %APPDATA% conventions are.) If you empty
> out your root.crt file, you should get a clear message that libpq
> tried to load certificates from it and couldn't; otherwise, it's
> finding the default somewhere else.


I redid the command line tests to be sure, this time from the Windows command prompt, so that I couldn't rely on my bash command history from AlmaLinux and instead had to type everything out by hand.
It does fail to validate for case 4 after all. I must have had a copy/paste error during past tests.

With no root.crt file present, the psql command complains that root.crt is missing as well.

So then it sounds like putting the MS root in root.crt (as we have done to fix this) is the correct thing to do, and there's no issue. It doesn't seem libpq will use the trusted roots that are typically located in either /etc/ssl or /etc/pki so we have to provide the root in the path where libpq expects it to be to get verify-full to work properly.

Thanks for helping me to confirm this. I'll get a case open with MS regarding the wrong root download from the portal in GovCloud.

--Thomas
On Wed, May 1, 2024 at 11:57 AM Thomas Spear <speeddymon@gmail.com> wrote:
> It does fail to validate for case 4 after all. I must have had a copy/paste error during past tests.

Okay, good. Glad it's behaving as expected!

> So then it sounds like putting the MS root in root.crt (as we have done to fix this) is the correct thing to do, and there's no issue. It doesn't seem libpq will use the trusted roots that are typically located in either /etc/ssl or /etc/pki so we have to provide the root in the path where libpq expects it to be to get verify-full to work properly.

Right. Versions 16 and later will let you use `sslrootcert=system` to
load those /etc locations more easily, but if the MS root isn't in the
system PKI stores and the server isn't sending the DigiCert chain then
that probably doesn't help you.
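For `sslrootcert=system`, the PostgreSQL 16 documentation says the system's trusted CA roots are loaded and that, for OpenSSL, those locations can be redirected with the SSL_CERT_FILE and SSL_CERT_DIR environment variables. One rough way to see what the defaults are on a Linux client is Python's `ssl` module, assuming the interpreter links the same OpenSSL build as libpq (not guaranteed). `has_root` is a hypothetical helper; `get_ca_certs()` only lists certificates loaded from the CA file, not entries in a hashed CA directory that OpenSSL reads on demand, so a False result is not conclusive.

```python
import ssl


def openssl_default_locations() -> dict:
    """OpenSSL's default CA file/directory as reported by Python's ssl module."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,                  # None if the file doesn't exist
        "capath": paths.capath,                  # None if the directory doesn't exist
        "cafile_env": paths.openssl_cafile_env,  # env var overriding the file
        "capath_env": paths.openssl_capath_env,  # env var overriding the directory
    }


def has_root(common_name: str) -> bool:
    """Hypothetical check: did the default context load a CA with this CN?

    Only certificates loaded from the CA file are listed; a hashed CA
    directory is read lazily, so False does not prove the root is absent.
    """
    ctx = ssl.create_default_context()
    for cert in ctx.get_ca_certs():
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName" and value == common_name:
                    return True
    return False


print(openssl_default_locations())
print(has_root("Microsoft RSA Root Certificate Authority 2017"))
```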

> Thanks for helping me to confirm this. I'll get a case open with MS regarding the wrong root download from the portal in GovCloud.

Happy to help!

Have a good one,
--Jacob