Thread: pgsql: Log details for client certificate failures

pgsql: Log details for client certificate failures

From
Peter Eisentraut
Date:
Log details for client certificate failures

Currently, debugging client certificate verification failures is
mostly limited to looking at the TLS alert code on the client side.
For simple deployments, sometimes it's enough to see "sslv3 alert
certificate revoked" and know exactly what needs to be fixed, but if
you add any more complexity (multiple CA layers, misconfigured CA
certificates, etc.), trying to debug what happened based on the TLS
alert alone can be an exercise in frustration.

Luckily, the server has more information about exactly what failed in
the chain, and we already have the requisite callback implemented as a
stub.  We fill that in, collect the data, and pass the constructed
error message back to the main code via a static variable.  This lets
us add our error details directly to the final "could not accept SSL
connection" log message, as opposed to issuing intermediate LOGs.

It ends up looking like

    LOG:  connection received: host=localhost port=43112
    LOG:  could not accept SSL connection: certificate verify failed
    DETAIL:  Client certificate verification failed at depth 1: unable to get local issuer certificate.
            Failed certificate data (unverified): subject "/CN=Test CA for PostgreSQL SSL regression test client
certs",serial number 2315134995201656577, issuer "/CN=Test root CA for PostgreSQL SSL regression test suite". 

The length of the Subject and Issuer strings is limited to prevent
malicious client certs from spamming the logs.  In case the truncation
makes things ambiguous, the certificate's serial number is also
logged.

Author: Jacob Champion <pchampion@vmware.com>
Discussion: https://www.postgresql.org/message-id/flat/d13c4a5787c2a3f83705124f0391e0738c796751.camel@vmware.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/3a0e385048adf84d212e010d82db16228e5f4060

Modified Files
--------------
src/backend/libpq/be-secure-openssl.c | 112 +++++++++++++++++++++++++++++++++-
src/test/ssl/conf/client-long.config  |  14 +++++
src/test/ssl/ssl/client-long.crt      |  20 ++++++
src/test/ssl/ssl/client-long.key      |  27 ++++++++
src/test/ssl/sslfiles.mk              |   2 +-
src/test/ssl/t/001_ssltests.pl        |  40 +++++++++++-
src/test/ssl/t/SSL/Backend/OpenSSL.pm |   2 +-
7 files changed, 210 insertions(+), 7 deletions(-)


Re: pgsql: Log details for client certificate failures

From
Justin Pryzby
Date:
On Fri, Jul 15, 2022 at 03:35:28PM +0000, Peter Eisentraut wrote:
> Log details for client certificate failures

This seems to be causing cfbot/linux to fail since yesterday.

-- 
Justin



Re: pgsql: Log details for client certificate failures

From
Peter Eisentraut
Date:
On 16.07.22 13:37, Justin Pryzby wrote:
> On Fri, Jul 15, 2022 at 03:35:28PM +0000, Peter Eisentraut wrote:
>> Log details for client certificate failures
> 
> This seems to be causing cfbot/linux to fail since yesterday.

Hmm, the Cirrus CI build works for me: 
https://cirrus-ci.com/task/6520112718020608

Isn't the cfbot using the same thing?



Re: pgsql: Log details for client certificate failures

From
Justin Pryzby
Date:
On Sat, Jul 16, 2022 at 01:57:25PM +0200, Peter Eisentraut wrote:
> On 16.07.22 13:37, Justin Pryzby wrote:
> > On Fri, Jul 15, 2022 at 03:35:28PM +0000, Peter Eisentraut wrote:
> > > Log details for client certificate failures
> > 
> > This seems to be causing cfbot/linux to fail since yesterday.
> 
> Hmm, the Cirrus CI build works for me:
> https://cirrus-ci.com/task/6520112718020608
> 
> Isn't the cfbot using the same thing?

It's using the same thing, so I suppose the failure is intermittent.
It failed for me just now: https://cirrus-ci.com/task/6502388730167296

-- 
Justin



Re: pgsql: Log details for client certificate failures

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Sat, Jul 16, 2022 at 01:57:25PM +0200, Peter Eisentraut wrote:
>> Isn't the cfbot using the same thing?

> It's using the same thing, so I suppose the failure is intermittent.
> It failed for me just now: https://cirrus-ci.com/task/6502388730167296

It looks like a timing problem, which you'd probably never reproduce
on an unloaded development system.  On the presumably-busy CI hosts,
I think we're managing to examine the log file before the backend has
written the log entry.

Can we rejigger the backend so that the log entry is written before
we shove the failure response out to the client?  Maybe not given the
implementation technology.  In that case the alternatives seem to be
(1) insert a delay in the log_like checks (ugh) or
(2) abandon attempting to test this particular behavior.

            regards, tom lane



Re: pgsql: Log details for client certificate failures

From
Tom Lane
Date:
> Justin Pryzby <pryzby@telsasoft.com> writes:
>> It failed for me just now: https://cirrus-ci.com/task/6502388730167296

Looking at the cfbot, an annoyingly large percentage of current runs
are being broken by this.  I think we need to do this as a stopgap:

> (2) abandon attempting to test this particular behavior.

If somebody wants to think up a better answer later, great, but
the current situation is completely not tolerable.

            regards, tom lane



Re: pgsql: Log details for client certificate failures

From
Andres Freund
Date:
Hi,

On 2022-07-16 10:45:29 -0400, Tom Lane wrote:
> > Justin Pryzby <pryzby@telsasoft.com> writes:
> >> It failed for me just now: https://cirrus-ci.com/task/6502388730167296
> 
> Looking at the cfbot, an annoyingly large percentage of current runs
> are being broken by this.  I think we need to do this as a stopgap:
> 
> > (2) abandon attempting to test this particular behavior.
> 
> If somebody wants to think up a better answer later, great, but
> the current situation is completely not tolerable.

Yea, it's quite annoying. I think it might be worth to not wait for Peter to
chime in, but to just add a skip to the test for now?

Greetings,

Andres Freund



Re: pgsql: Log details for client certificate failures

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2022-07-16 10:45:29 -0400, Tom Lane wrote:
>> Looking at the cfbot, an annoyingly large percentage of current runs
>> are being broken by this.  I think we need to do this as a stopgap:
>>> (2) abandon attempting to test this particular behavior.

> Yea, it's quite annoying. I think it might be worth to not wait for Peter to
> chime in, but to just add a skip to the test for now?

I was thinking of commenting out the log_like parts of the affected
tests.  Maybe that's too cute, but at least it'd still run through
the code of interest, even if we don't verify the output.

Anyway, if Peter doesn't answer pretty soon I'll go do that.

            regards, tom lane



Re: pgsql: Log details for client certificate failures

From
Andres Freund
Date:
Hi,

On 2022-07-16 17:33:47 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Yea, it's quite annoying. I think it might be worth to not wait for Peter to
> > chime in, but to just add a skip to the test for now?
> 
> I was thinking of commenting out the log_like parts of the affected
> tests.  Maybe that's too cute, but at least it'd still run through
> the code of interest, even if we don't verify the output.

WFM.

> Anyway, if Peter doesn't answer pretty soon I'll go do that.

It's about midnight in Germany, so unlikely to come soon...

Greetings,

Andres Freund



Re: pgsql: Log details for client certificate failures

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2022-07-16 17:33:47 -0400, Tom Lane wrote:
>> Anyway, if Peter doesn't answer pretty soon I'll go do that.

> It's about midnight in Germany, so unlikely to come soon...

Done now.

            regards, tom lane



Re: pgsql: Log details for client certificate failures

From
Justin Pryzby
Date:
On Sat, Jul 16, 2022 at 06:26:57PM -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-07-16 17:33:47 -0400, Tom Lane wrote:
> >> Anyway, if Peter doesn't answer pretty soon I'll go do that.
> 
> > It's about midnight in Germany, so unlikely to come soon...
> 
> Done now.

I think this is still failing
https://cirrus-ci.com/task/6142528721256448

You can see the cfbot user's test history here
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/



Re: pgsql: Log details for client certificate failures

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> I think this is still failing
> https://cirrus-ci.com/task/6142528721256448

Sigh ... misleading formatting led me to see one log_like as
belonging to a connect_ok test, when it belongs to a connect_fails.
Fixed, thanks.

            regards, tom lane