Thread: [PATCH] Log details for client certificate failures

[PATCH] Log details for client certificate failures

From: Jacob Champion
Hello,

(I'm cleaning up some old git branches and found this. It was helpful
when I was trying to debug failures between an NSS client and an
OpenSSL server, and it seems general enough to help for more
complicated OpenSSL-only setups as well.)

Currently, debugging client cert verification failures is mostly
limited to looking at the TLS alert code on the client side. For simple
deployments, usually it's enough to see "sslv3 alert certificate
revoked" and know exactly what needs to be fixed, but if you add any
more complexity (multiple CA layers, misconfigured CA certificates,
etc.), trying to debug what happened based on the TLS alert alone can
be an exercise in frustration.

Luckily, the server has more information about exactly what failed in
the chain, and we already have the requisite callback implemented as a
stub, so I've filled it out with error handling and added a COMMERROR
log so that a DBA can debug client failures more easily.
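
To give an idea of the shape, the callback ends up roughly like this (a
simplified sketch, not the patch itself; the buffer size and message
wording are illustrative):

    static int
    verify_cb(int ok, X509_STORE_CTX *ctx)
    {
        if (!ok)
        {
            int         errcode = X509_STORE_CTX_get_error(ctx);
            int         depth = X509_STORE_CTX_get_error_depth(ctx);
            X509       *cert = X509_STORE_CTX_get_current_cert(ctx);
            char        subject[256] = "(unknown)";

            /* One-line rendering of the subject, truncated to the buffer. */
            if (cert != NULL)
                X509_NAME_oneline(X509_get_subject_name(cert),
                                  subject, sizeof(subject));

            ereport(COMMERROR,
                    (errmsg("client certificate verification failed at depth %d: %s",
                            depth, X509_verify_cert_error_string(errcode)),
                     errdetail("failed certificate's subject: %s", subject)));
        }

        /* Leave OpenSSL's verdict unchanged; this only adds logging. */
        return ok;
    }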

It ends up looking like

    LOG:  connection received: host=localhost port=44120
    LOG:  client certificate verification failed at depth 1: unable to get local issuer certificate
    DETAIL:  failed certificate's subject: /CN=Test CA for PostgreSQL SSL regression test client certs
    LOG:  could not accept SSL connection: certificate verify failed

It might be even nicer to make this available to the client, but I
think the server log is an appropriate place for this information -- an
admin might not want to advertise exactly why a client certificate has
failed verification (other than what's already available via the TLS
alert, that is), and I think more complicated failures (with
intermediate CAs, etc.) are going to need administrator intervention
anyway. So having to check the logs doesn't seem like a big hurdle.

One question/concern -- the Subject that's printed to the logs could be
pretty big (OpenSSL limits the incoming certificate chain to 100K, by
default), which introduces an avenue for intentional log spamming. Is
there an existing convention for limiting the length of log output used
for debugging? Maybe I should just hardcode a smaller limit and
truncate anything past that? Or we could just log the Common Name,
which should be limited to 64 bytes...

I'll add this to the July commitfest.

Thanks,
--Jacob

Attachment

Re: [PATCH] Log details for client certificate failures

From: Peter Eisentraut
On 03.05.22 19:04, Jacob Champion wrote:
> One question/concern -- the Subject that's printed to the logs could be
> pretty big (OpenSSL limits the incoming certificate chain to 100K, by
> default), which introduces an avenue for intentional log spamming. Is
> there an existing convention for limiting the length of log output used
> for debugging? Maybe I should just hardcode a smaller limit and
> truncate anything past that? Or we could just log the Common Name,
> which should be limited to 64 bytes...

The information in pg_stat_ssl is limited to NAMEDATALEN (see struct 
PgBackendSSLStatus).

It might make sense to align what your patch prints to identify 
certificates with what is shown in that view.



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Tue, 2022-05-03 at 21:06 +0200, Peter Eisentraut wrote:
> The information in pg_stat_ssl is limited to NAMEDATALEN (see struct
> PgBackendSSLStatus).
> 
> It might make sense to align what your patch prints to identify
> certificates with what is shown in that view.

Sure, a max length should be easy enough to do. Is there a reason to
limit to NAMEDATALEN specifically? I was under the impression that we
would rather not have had that limitation in the stats framework, if we
could have avoided it. (In particular I think NAMEDATALEN will cut off
the longest possible Common Name by just five bytes.)

Thanks,
--Jacob

Re: [PATCH] Log details for client certificate failures

From: Peter Eisentraut
On 04.05.22 01:05, Jacob Champion wrote:
> On Tue, 2022-05-03 at 21:06 +0200, Peter Eisentraut wrote:
>> The information in pg_stat_ssl is limited to NAMEDATALEN (see struct
>> PgBackendSSLStatus).
>>
>> It might make sense to align what your patch prints to identify
>> certificates with what is shown in that view.
> 
> Sure, a max length should be easy enough to do. Is there a reason to
> limit to NAMEDATALEN specifically? I was under the impression that we
> would rather not have had that limitation in the stats framework, if we
> could have avoided it. (In particular I think NAMEDATALEN will cut off
> the longest possible Common Name by just five bytes.)

Just saying that cutting it off appears to be acceptable.  A bit more 
than 63 bytes should be okay for the log.

In terms of aligning what is printed, I meant that pg_stat_ssl uses the 
issuer plus serial number to identify the certificate unambiguously.



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Wed, 2022-05-04 at 15:53 +0200, Peter Eisentraut wrote:
> Just saying that cutting it off appears to be acceptable.  A bit more
> than 63 bytes should be okay for the log.

Gotcha.

> In terms of aligning what is printed, I meant that pg_stat_ssl uses the
> issuer plus serial number to identify the certificate unambiguously.

Oh, that's a great idea. I'll do that too.

Thanks!
--Jacob

Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Thu, 2022-05-05 at 15:12 +0000, Jacob Champion wrote:
> On Wed, 2022-05-04 at 15:53 +0200, Peter Eisentraut wrote:
> > In terms of aligning what is printed, I meant that pg_stat_ssl uses the
> > issuer plus serial number to identify the certificate unambiguously.
> 
> Oh, that's a great idea. I'll do that too.

v2 limits the maximum subject length and adds the serial number to the
logs.

Thanks!
--Jacob

Attachment

Re: [PATCH] Log details for client certificate failures

From: Peter Eisentraut
On 13.05.22 00:36, Jacob Champion wrote:
> On Thu, 2022-05-05 at 15:12 +0000, Jacob Champion wrote:
>> On Wed, 2022-05-04 at 15:53 +0200, Peter Eisentraut wrote:
>>> In terms of aligning what is printed, I meant that pg_stat_ssl uses the
>>> issuer plus serial number to identify the certificate unambiguously.
>>
>> Oh, that's a great idea. I'll do that too.
> 
> v2 limits the maximum subject length and adds the serial number to the
> logs.

I wrote that pg_stat_ssl uses the *issuer* plus serial number to 
identify a certificate.  What your patch shows is the subject and the 
serial number, which isn't the same thing.  Let's get that sorted out 
one way or the other.

Another point, your patch produces

     LOG:  connection received: host=localhost port=44120
     LOG:  client certificate verification failed at depth 1: ...
     DETAIL:  failed certificate had subject ...
     LOG:  could not accept SSL connection: certificate verify failed

I guess what we really would like is

     LOG:  connection received: host=localhost port=44120
     LOG:  could not accept SSL connection: certificate verify failed
     DETAIL:  client certificate verification failed at depth 1: ...
     failed certificate had subject ...

But I suppose that would be very cumbersome to produce with the callback 
structure provided by OpenSSL?

I'm not saying the proposed way is unacceptable, but maybe it's worth 
being explicit about this tradeoff.



Re: [PATCH] Log details for client certificate failures

From: Graham Leggett
On 30 Jun 2022, at 10:43, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

> I wrote that pg_stat_ssl uses the *issuer* plus serial number to identify a certificate.  What your patch shows is the subject and the serial number, which isn't the same thing.  Let's get that sorted out one way or the other.

Quick observation on this one, the string format of an issuer and serial number is defined as a “Certificate Exact Assertion” in RFC 4523.

I added this to httpd a while back:

SSL_CLIENT_CERT_RFC4523_CEA

It would be good to interoperate.

Regards,
Graham

Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Thu, Jun 30, 2022 at 2:43 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 13.05.22 00:36, Jacob Champion wrote:
> > v2 limits the maximum subject length and adds the serial number to the
> > logs.
>
> I wrote that pg_stat_ssl uses the *issuer* plus serial number to
> identify a certificate.  What your patch shows is the subject and the
> serial number, which isn't the same thing.  Let's get that sorted out
> one way or the other.

Sorry for the misunderstanding! v3 adds the Issuer to the logs as well.

I wanted to clarify that this "issuer" has not actually been verified,
but all I could come up with was "purported issuer" which doesn't read
well to me. "Claimed issuer"? "Alleged issuer"? Thoughts?

> Another point, your patch produces
>
>      LOG:  connection received: host=localhost port=44120
>      LOG:  client certificate verification failed at depth 1: ...
>      DETAIL:  failed certificate had subject ...
>      LOG:  could not accept SSL connection: certificate verify failed
>
> I guess what we really would like is
>
>      LOG:  connection received: host=localhost port=44120
>      LOG:  could not accept SSL connection: certificate verify failed
>      DETAIL:  client certificate verification failed at depth 1: ...
>      failed certificate had subject ...
>
> But I suppose that would be very cumbersome to produce with the callback
> structure provided by OpenSSL?

I was about to say "yes, very cumbersome", but I actually think we
might be able to do that without bubbling the error up through
multiple callback layers, using SSL_set_ex_data() and friends. I'll
take a closer look.
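
Roughly the mechanism I have in mind, as fragments (names like
cert_errdetail_index are made up, and depth/errstr stand for values
already available in the callback):

    /* Reserve a slot once, e.g. while setting up the SSL_CTX. */
    static int cert_errdetail_index = -1;

    cert_errdetail_index = SSL_get_ex_new_index(0, NULL, NULL, NULL, NULL);

    /* In the verify callback: find this connection's SSL* and stash details. */
    SSL *ssl = X509_STORE_CTX_get_ex_data(ctx,
                                          SSL_get_ex_data_X509_STORE_CTX_idx());

    SSL_set_ex_data(ssl, cert_errdetail_index,
                    psprintf("depth %d: %s", depth, errstr));

    /* Back in be_tls_open_server(), after SSL_accept() has failed: */
    char *details = SSL_get_ex_data(port->ssl, cert_errdetail_index);

    /* ... feed "details" to errdetail_internal() in the final ereport ... */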

Thanks!
--Jacob

Attachment

Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Thu, Jun 30, 2022 at 2:54 AM Graham Leggett <minfrin@sharp.fm> wrote:
>
> I added this to httpd a while back:
>
> SSL_CLIENT_CERT_RFC4523_CEA
>
> It would be good to interoperate.

What kind of interoperation did you have in mind? Are there existing
tools that want to scrape this information for observability?

I think the CEA syntax might not be a good fit for this particular
patch: first, we haven't actually verified the certificate, so no one
should be using it to assert certificate equality (and I'm truncating
the Issuer anyway, to avoid letting someone flood the logs). Second,
this is designed to be human-readable rather than machine-readable.

Thanks,
--Jacob



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Fri, Jul 1, 2022 at 1:51 PM Jacob Champion <jchampion@timescale.com> wrote:
> Sorry for the misunderstanding! v3 adds the Issuer to the logs as well.

Resending v3; I messed up the certificate diff with my gitconfig.

--Jacob

Attachment

Re: [PATCH] Log details for client certificate failures

From: Peter Eisentraut
On 05.07.22 18:34, Jacob Champion wrote:
> On Fri, Jul 1, 2022 at 1:51 PM Jacob Champion <jchampion@timescale.com> wrote:
>> Sorry for the misunderstanding! v3 adds the Issuer to the logs as well.
> 
> Resending v3; I messed up the certificate diff with my gitconfig.

This patch looks pretty good to me.  Some minor details:

I looked into how you decode the serial number.  I have found some code 
elsewhere that passed the result of X509_get_serialNumber() directly to 
ASN1_INTEGER_get().  But I guess a serial number of maximum length 20 
octets wouldn't fit into a 32-bit long.  (There is 
ASN1_INTEGER_get_int64(), but that requires OpenSSL 1.1.0.)  Does that 
match your understanding?

For the detail string, I think we could do something like:

DETAIL:  Failed certificate data (unverified): subject '%s', serial 
number %s, issuer '%s'



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Thu, Jul 7, 2022 at 2:50 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> I looked into how you decode the serial number.  I have found some code
> elsewhere that passed the result of X509_get_serialNumber() directly to
> ASN1_INTEGER_get().  But I guess a serial number of maximum length 20
> octets wouldn't fit into a 32-bit long.  (There is
> ASN1_INTEGER_get_int64(), but that requires OpenSSL 1.1.0.)  Does that
> match your understanding?

Yep. And the bit lengths of the serial numbers used in the test suite
are in the low 60s already. Many people will just randomize their
serial numbers, so I think BN_bn2dec() is the way to go.
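
For reference, the conversion is only a few OpenSSL calls; a rough
sketch with error checks elided (cert is the failed certificate):

    ASN1_INTEGER *serial = X509_get_serialNumber(cert);
    BIGNUM       *b = ASN1_INTEGER_to_BN(serial, NULL);
    char         *decimal = BN_bn2dec(b);  /* arbitrary-precision decimal */

    /* ... include "decimal" in the log message ... */

    OPENSSL_free(decimal);
    BN_free(b);

Since BN_bn2dec() is arbitrary precision, 20-octet (or randomized)
serial numbers are no problem.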

> For the detail string, I think we could do something like:
>
> DETAIL:  Failed certificate data (unverified): subject '%s', serial
> number %s, issuer '%s'

Done that way in v4.

I also added an optional 0002 that bubbles the error info up to the
final ereport(ERROR), using errdetail() and errhint(). I can squash it
into 0001 if you like it, or drop it if you don't. (This approach
could be adapted to the client, too.)

Thanks!
--Jacob

Attachment

Re: [PATCH] Log details for client certificate failures

From: Graham Leggett
On 01 Jul 2022, at 22:59, Jacob Champion <jchampion@timescale.com> wrote:

>> I added this to httpd a while back:
>>
>> SSL_CLIENT_CERT_RFC4523_CEA
>>
>> It would be good to interoperate.
>
> What kind of interoperation did you have in mind? Are there existing
> tools that want to scrape this information for observability?

This is for human troubleshooting.

> I think the CEA syntax might not be a good fit for this particular
> patch: first, we haven't actually verified the certificate, so no one
> should be using it to assert certificate equality (and I'm truncating
> the Issuer anyway, to avoid letting someone flood the logs). Second,
> this is designed to be human-readable rather than machine-readable.

This is what a CEA looks like:

{ serialNumber 400410167207191393705333222102472642510002355884, issuer rdnSequence:"CN=Foo UK G1,O=Foo,C=UK" }

Whitespace and escaping is important above.

When troubleshooting, you want a string like the above that you can cut and paste and search for in other systems and
logfiles. The verification status of the cert isn’t an issue at this point: you have a system in front of you where it
doesn’t work when it should, and you need to know exactly what’s connecting, not what you think you’re connecting to,
and you need precise data.

Please don’t invent another format, or try and truncate the data. This is a huge headache when troubleshooting.

Regards,
Graham
—




Re: [PATCH] Log details for client certificate failures

From: Peter Eisentraut
On 08.07.22 20:39, Jacob Champion wrote:
> I also added an optional 0002 that bubbles the error info up to the
> final ereport(ERROR), using errdetail() and errhint(). I can squash it
> into 0001 if you like it, or drop it if you don't. (This approach
> could be adapted to the client, too.)

I squashed those two together.  I also adjusted the error message a bit 
more for project style.  (We can put both lines into detail.)

I had to read up on this "ex_data" API.  Interesting.  But I'm wondering 
a bit about how the life cycle of these objects is managed.  What 
happens if the allocated error string is deallocated before its 
containing object?  Or vice versa?  How do we ensure we don't 
accidentally reuse the error string when the code runs again?  (I guess 
currently it can't?)  Maybe we should avoid this and just put the 
errdetail itself into a static variable that we unset once the message 
is printed?
Attachment

Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Sat, Jul 9, 2022 at 6:49 AM Graham Leggett <minfrin@sharp.fm> wrote:
> Please don’t invent another format, or try and truncate the data. This is a huge headache when troubleshooting.

I hear you, and I agree that correlating these things across machines
is something we should be making easier. I'm just not convinced that
the particular format you've proposed, with a new set of rules for
quoting and escaping, needs to be part of this patch. (And I think
there are good reasons to truncate unverified cert data, so there'd
have to be clear benefits to offset the risk of opening it up.)

Searching Google for "issuer rdnSequence" comes up with mostly false
positives related to LDAP filtering and certificate dumps, and the
true positives seem to be mail threads that you've participated in. Do
many LDAP servers log certificate failures in this format by default?
(For that matter, does httpd?) The discussion at the time you added
this to httpd [1] seemed to be making the point that this was a niche
format, suited mostly for interaction with LDAP filters -- and Kaspar
additionally pointed out that it's not a canonical format, so all of
our implementations would have to have an ad hoc agreement to choose
exactly one encoding.

If you're using randomized serial numbers, you should be able to grep
for those by themselves and successfully match many different formats,
no? To me, that seems good enough for a first patch, considering we
don't currently log any of this information.

--Jacob

[1] https://lists.apache.org/thread/1665qc4mod7ppp58qk3bqc2l3wtl3lkn



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Mon, Jul 11, 2022 at 6:09 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> I squashed those two together.  I also adjusted the error message a bit
> more for project style.  (We can put both lines into detail.)

Oh, okay. Log parsers don't have any issues with that?

> I had to read up on this "ex_data" API.  Interesting.  But I'm wondering
> a bit about how the life cycle of these objects is managed.  What
> happens if the allocated error string is deallocated before its
> containing object?  Or vice versa?

Yeah, I'm currently leaning heavily on the lack of any memory context
switches here. And I end up leaking out a pointer to the stale stack
of be_tls_open_server(), which is gross -- it works since there are no
other clients, but that could probably come back to bite us.

The ex_data API exposes optional callbacks for new/dup/free (I'm
currently setting those to NULL), so we can run custom code whenever
the SSL* is destroyed. If you'd rather the data have the same lifetime
of the SSL* object, we can switch to malloc/strdup/free (or even
OPENSSL_strdup() in later versions). But since we don't have any use
for the ex_data outside of this function, maybe we should just clear
it before we return, rather than carrying it around.

> How do we ensure we don't
> accidentally reuse the error string when the code runs again?  (I guess
> currently it can't?)

The ex_data is associated with the SSL*, not the global SSL_CTX*, so
that shouldn't be an issue. A new SSL* gets created at the start of
be_tls_open_server().

>  Maybe we should avoid this and just put the
> errdetail itself into a static variable that we unset once the message
> is printed?

If you're worried about the lifetime of the palloc'd data being too
short, does switching to a static variable help in that case?

--Jacob



Re: [PATCH] Log details for client certificate failures

From: Peter Eisentraut
On 13.07.22 01:06, Jacob Champion wrote:
>> I had to read up on this "ex_data" API.  Interesting.  But I'm wondering
>> a bit about how the life cycle of these objects is managed.  What
>> happens if the allocated error string is deallocated before its
>> containing object?  Or vice versa?
> 
> Yeah, I'm currently leaning heavily on the lack of any memory context
> switches here. And I end up leaking out a pointer to the stale stack
> of be_tls_open_server(), which is gross -- it works since there are no
> other clients, but that could probably come back to bite us.
> 
> The ex_data API exposes optional callbacks for new/dup/free (I'm
> currently setting those to NULL), so we can run custom code whenever
> the SSL* is destroyed. If you'd rather the data have the same lifetime
> of the SSL* object, we can switch to malloc/strdup/free (or even
> OPENSSL_strdup() in later versions). But since we don't have any use
> for the ex_data outside of this function, maybe we should just clear
> it before we return, rather than carrying it around.

Concretely, I was thinking like the attached top-up patch.

The other way can surely be made to work somehow, but this seems much 
simpler and with fewer questions about the details.
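
Sketched out, the idea is a static pointer that the verify callback
fills in and be_tls_open_server() consumes and then resets (this is
just the shape of it; subject/serial/issuer and ecode below are
placeholders, and the attached patch may differ in detail):

    /* set by the verify callback, consumed once per connection attempt */
    static const char *cert_errdetail = NULL;

    /* in the verify callback, on failure: */
    cert_errdetail = psprintf("Failed certificate data (unverified): "
                              "subject \"%s\", serial number %s, issuer \"%s\"",
                              subject, serial, issuer);

    /* in be_tls_open_server(), when SSL_accept() reports a failure: */
    ereport(COMMERROR,
            (errmsg("could not accept SSL connection: %s", SSLerrmessage(ecode)),
             cert_errdetail ? errdetail_internal("%s", cert_errdetail) : 0));
    cert_errdetail = NULL;      /* never reuse a stale message */
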
Attachment

Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Thu, Jul 14, 2022 at 1:12 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> Concretely, I was thinking like the attached top-up patch.
>
> The other way can surely be made to work somehow, but this seems much
> simpler and with fewer questions about the details.

Ah, seeing it side-by-side helps. That's much easier, I agree.

Thanks,
--Jacob



Re: [PATCH] Log details for client certificate failures

From: Peter Eisentraut
On 14.07.22 23:09, Jacob Champion wrote:
> On Thu, Jul 14, 2022 at 1:12 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> Concretely, I was thinking like the attached top-up patch.
>>
>> The other way can surely be made to work somehow, but this seems much
>> simpler and with fewer questions about the details.
> 
> Ah, seeing it side-by-side helps. That's much easier, I agree.

Committed like that.




Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On 7/15/22 09:34, Peter Eisentraut wrote:
> Committed like that.

Thanks for all the reviews!

--Jacob




Re: [PATCH] Log details for client certificate failures

From: Andres Freund
Hi,

On 2022-07-15 09:46:40 -0700, Jacob Champion wrote:
> On 7/15/22 09:34, Peter Eisentraut wrote:
> > Committed like that.
> 
> Thanks for all the reviews!

This might have been discussed somewhere, but I'm worried about emitting
unescaped data from pre-auth clients. What guarantees that subject / issuer
name only contain printable ascii-chars? Printing terminal control chars or
such would not be great, nor would splitting a string at a multi-byte boundary.

Greetings,

Andres Freund



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On 7/15/22 12:11, Andres Freund wrote:
> This might have been discussed somewhere, but I'm worried about emitting
> unescaped data from pre-auth clients. What guarantees that subject / issuer
> name only contain printable ascii-chars? Printing terminal control chars or
> such would not be great, nor would splitting a string at a multi-byte boundary.

Hm. The last time I asked about that, Magnus pointed out that we reflect
port->user_name as-is [1], so I kind of stopped worrying about it. Is
this more dangerous than that? (And do we want to fix it now,
regardless?) What guarantees are we supposed to be making for log encoding?

Thanks,
--Jacob

[1]
https://www.postgresql.org/message-id/CABUevExVHryTasKmtJW5RtU-dBesYj4bV7ggpeVMfiPCHCvLNA%40mail.gmail.com



Re: [PATCH] Log details for client certificate failures

From: Andres Freund
Hi,

On 2022-07-15 13:20:59 -0700, Jacob Champion wrote:
> On 7/15/22 12:11, Andres Freund wrote:
> > This might have been discussed somewhere, but I'm worried about emitting
> > unescaped data from pre-auth clients. What guarantees that subject / issuer
> > name only contain printable ascii-chars? Printing terminal control chars or
> > such would not be great, nor would splitting a string at a multi-byte boundary.
> 
> Hm. The last time I asked about that, Magnus pointed out that we reflect
> port->user_name as-is [1], so I kind of stopped worrying about it.

I think we need to fix a number of these. But no, I don't think we should just
add more because we've not been careful in a bunch of other places.


> Is this more dangerous than that?

Hard to say.


> (And do we want to fix it now, regardless?)

Yes.


> What guarantees are we supposed to be making for log encoding?

I don't know, but I don't think not caring at all is a good
option. Particularly for unauthenticated data I'd say that escaping everything
but printable ascii chars is a sensible approach.

Greetings,

Andres Freund



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On 7/15/22 13:35, Andres Freund wrote:
>> (And do we want to fix it now, regardless?)
> 
> Yes.

Cool. I can get on board with that.

>> What guarantees are we supposed to be making for log encoding?
> 
> I don't know, but I don't think not caring at all is a good
> option. Particularly for unauthenticated data I'd say that escaping everything
> but printable ascii chars is a sensible approach.

It'll also be painful for anyone whose infrastructure isn't in a Latin
character set... Maybe that's worth the tradeoff for a v1.

Is there an acceptable approach that could centralize it, so we fix it
once and are done? E.g. a log_encoding GUC and either conversion or
escaping in send_message_to_server_log()?

--Jacob



Re: [PATCH] Log details for client certificate failures

From: Andres Freund
Hi,

On 2022-07-15 14:01:53 -0700, Jacob Champion wrote:
> On 7/15/22 13:35, Andres Freund wrote:
> >> (And do we want to fix it now, regardless?)
> > 
> > Yes.
> 
> Cool. I can get on board with that.
> 
> >> What guarantees are we supposed to be making for log encoding?
> > 
> > I don't know, but I don't think not caring at all is a good
> > option. Particularly for unauthenticated data I'd say that escaping everything
> > but printable ascii chars is a sensible approach.
> 
> It'll also be painful for anyone whose infrastructure isn't in a Latin
> character set... Maybe that's worth the tradeoff for a v1.

I don't think it's a huge issue, or really avoidable, pre-authentication.
Don't we require all server-side encodings to be supersets of ascii?

We already have pg_clean_ascii() and use it for application_name, fwiw.


> Is there an acceptable approach that could centralize it, so we fix it
> once and are done? E.g. a log_encoding GUC and either conversion or
> escaping in send_message_to_server_log()?

Introducing escaping to ascii for all log messages seems like it'd be
incredibly invasive, and would remove a lot of worthwhile information. Nor
does it really address the whole scope - consider e.g. the truncation in this
patch, that can't be done correctly by the time send_message_to_server_log()
is reached - just chopping in the middle of a multi-byte string would have
made the string invalidly encoded. And we can't perform encoding conversion
from client data until we've gone further into the authentication process, I
think.

Always escaping ANSI escape codes (or rather the non-printable ascii range) is
more convincing. Then we'd just need to make sure that client controlled data
is properly encoded before handing it over to other parts of the system.

I can see a point in a log_encoding GUC at some point, but it seems a bit
separate from the discussion here.

Greetings,

Andres Freund



Re: [PATCH] Log details for client certificate failures

From: Tom Lane
Andres Freund <andres@anarazel.de> writes:
> We already have pg_clean_ascii() and use it for application_name, fwiw.

Yeah, we should just use that.  If anyone wants to upgrade the situation
for non-ASCII data later, fixing it for all of these cases at once would
be appropriate.

            regards, tom lane



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On 7/15/22 14:19, Andres Freund wrote:
> On 2022-07-15 14:01:53 -0700, Jacob Champion wrote:
>> On 7/15/22 13:35, Andres Freund wrote:
>>> I don't know, but I don't think not caring at all is a good
>>> option. Particularly for unauthenticated data I'd say that escaping everything
>>> but printable ascii chars is a sensible approach.
>>
>> It'll also be painful for anyone whose infrastructure isn't in a Latin
>> character set... Maybe that's worth the tradeoff for a v1.
> 
> I don't think it's a huge issue, or really avoidable, pre-authentication.
> Don't we require all server-side encodings to be supersets of ascii?

Well, I was going to say that for this feature, where the goal is to
debug a failed certificate chain, having to manually unescape the logged
certificate names if your infrastructure already handled UTF-8 natively
would be a real pain.

But your later point about truncation makes that moot; I forgot that my
patch can truncate in the middle of a UTF-8 sequence, so you're probably
dealing with replacement glyphs anyway. I don't really have a leg to
stand on there.

> We already have pg_clean_ascii() and use it for application_name, fwiw.

That seems much worse than escaping for this particular patch; if your
cert's Common Name is in (non-ASCII) UTF-8 then all you'll see is
"CN=?????????" in the log lines that were supposed to be helping you
root-cause. Escaping would be much more helpful in this case.

>> Is there an acceptable approach that could centralize it, so we fix it
>> once and are done? E.g. a log_encoding GUC and either conversion or
>> escaping in send_message_to_server_log()?
> 
> Introducing escaping to ascii for all log messages seems like it'd be
> incredibly invasive, and would remove a lot of worthwhile information. Nor
> does it really address the whole scope - consider e.g. the truncation in this
> patch, that can't be done correctly by the time send_message_to_server_log()
> is reached - just chopping in the middle of a multi-byte string would have
> made the string invalidly encoded. And we can't perform encoding conversion
> from client data until we've gone further into the authentication process, I
> think.
> 
> Always escaping ANSI escape codes (or rather the non-printable ascii range) is
> more convincing. Then we'd just need to make sure that client controlled data
> is properly encoded before handing it over to other parts of the system.
> 
> I can see a point in a log_encoding GUC at some point, but it seems a bit
> separate from the discussion here.

Fair enough.

--Jacob



Re: [PATCH] Log details for client certificate failures

From: Andres Freund
Hi,

On 2022-07-15 14:51:38 -0700, Jacob Champion wrote:
> > We already have pg_clean_ascii() and use it for application_name, fwiw.
> 
> That seems much worse than escaping for this particular patch; if your
> cert's Common Name is in (non-ASCII) UTF-8 then all you'll see is
> "CN=?????????" in the log lines that were supposed to be helping you
> root-cause. Escaping would be much more helpful in this case.

I'm doubtful that's all that common. But either way, I suggest a separate
patch to deal with that...

Greetings,

Andres Freund



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Fri, Jul 15, 2022 at 4:45 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-07-15 14:51:38 -0700, Jacob Champion wrote:
> > That seems much worse than escaping for this particular patch; if your
> > cert's Common Name is in (non-ASCII) UTF-8 then all you'll see is
> > "CN=?????????" in the log lines that were supposed to be helping you
> > root-cause. Escaping would be much more helpful in this case.
>
> I'm doubtful that's all that common.

Probably not, but the more systems that support it without weird
usability bugs, the more common it will hopefully become.

> But either way, I suggest a separate patch to deal with that...

Proposed fix attached, which uses \x-escaping for bytes outside of
printable ASCII.
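
The core of it is just a byte loop, roughly like this (a sketch of the
idea using a StringInfo helper; the attached patch doesn't necessarily
spell it this way):

    /* Append "src" to "buf", escaping bytes outside printable ASCII as \xNN. */
    static void
    append_escaped_ascii(StringInfo buf, const char *src)
    {
        const unsigned char *p;

        for (p = (const unsigned char *) src; *p != '\0'; p++)
        {
            if (*p >= 0x20 && *p <= 0x7E)
                appendStringInfoChar(buf, (char) *p);
            else
                appendStringInfo(buf, "\\x%02x", *p);
        }
    }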

Thanks,
--Jacob

Attachment

Re: [PATCH] Log details for client certificate failures

From: Andres Freund
Hi,

On 2022-07-19 09:07:31 -0700, Jacob Champion wrote:
> On Fri, Jul 15, 2022 at 4:45 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-07-15 14:51:38 -0700, Jacob Champion wrote:
> > > That seems much worse than escaping for this particular patch; if your
> > > cert's Common Name is in (non-ASCII) UTF-8 then all you'll see is
> > > "CN=?????????" in the log lines that were supposed to be helping you
> > > root-cause. Escaping would be much more helpful in this case.
> >
> > I'm doubtful that's all that common.
> 
> Probably not, but the more systems that support it without weird
> usability bugs, the more common it will hopefully become.
> 
> > But either way, I suggest a separate patch to deal with that...
> 
> Proposed fix attached, which uses \x-escaping for bytes outside of
> printable ASCII.

I don't think this should be open coded in the ssl part of the code. IMO this
should replace the existing ascii escape function instead. I strongly oppose
open coding this functionality in prepare_cert_name().

Greetings,

Andres Freund



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
[resending to list]

On 7/19/22 09:14, Andres Freund wrote:
> IMO this should replace the existing ascii escape function instead.

That will affect the existing behavior of application_name and
cluster_name; is that acceptable?

--Jacob



Re: [PATCH] Log details for client certificate failures

From: Tom Lane
Jacob Champion <jchampion@timescale.com> writes:
> On 7/19/22 09:14, Andres Freund wrote:
>> IMO this should replace the existing ascii escape function instead.

> That will affect the existing behavior of application_name and
> cluster_name; is that acceptable?

I think Andres' point is exactly that these should all act alike.

Having said that, I struggle to see why we are panicking about badly
encoded log data from this source while blithely ignoring the problems
posed by non-ASCII role names, database names, and tablespace names.

            regards, tom lane



Re: [PATCH] Log details for client certificate failures

From: Andres Freund
Hi,

On 2022-07-19 12:39:43 -0400, Tom Lane wrote:
> Having said that, I struggle to see why we are panicking about badly
> encoded log data from this source while blithely ignoring the problems
> posed by non-ASCII role names, database names, and tablespace names.

I think we should fix these as well. I'm not as concerned about post-auth
encoding issues (i.e. tablespace name) as about pre-auth data (role name,
database name) - obviously being allowed to log in already is a pretty good
filter...

Greetings,

Andres Freund



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Tue, Jul 19, 2022 at 10:09 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-07-19 12:39:43 -0400, Tom Lane wrote:
> > Having said that, I struggle to see why we are panicking about badly
> > encoded log data from this source while blithely ignoring the problems
> > posed by non-ASCII role names, database names, and tablespace names.
>
> I think we should fix these as well. I'm not as concerned about post-auth
> encoding issues (i.e. tablespace name) as about pre-auth data (role name,
> database name) - obviously being allowed to log in already is a pretty good
> filter...

v2 adds escaping to pg_clean_ascii(). My original attempt used
StringInfo allocation, but that didn't play well with guc_malloc(), so
I switched to a two-pass API where the caller allocates. Let me know
if I'm missing something obvious; this way is more verbose than I'd
like...

Thanks,
--Jacob

Attachment

Re: [PATCH] Log details for client certificate failures

From: Andres Freund
Hi,

On 2022-07-19 15:08:38 -0700, Jacob Champion wrote:
> v2 adds escaping to pg_clean_ascii(). My original attempt used
> StringInfo allocation, but that didn't play well with guc_malloc(), so
> I switched to a two-pass API where the caller allocates. Let me know
> if I'm missing something obvious; this way is more verbose than I'd
> like...

Hm, that's pretty awkward. Perhaps we can have a better API for
everything but guc.c?

Or alternatively, perhaps we can just make pg_clean_ascii() return NULL
if allocation failed and then guc_strdup() the result in guc.c?

If we end up needing a two phase approach, why use the same function for
both phases? That seems quite awkward.

Greetings,

Andres Freund



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Tue, Jul 19, 2022 at 3:38 PM Andres Freund <andres@anarazel.de> wrote:
> Or alternatively, perhaps we can just make pg_clean_ascii() return NULL
> if allocation failed and then guc_strdup() the result in guc.c?

The guc_strdup() approach really reduces the amount of code, so that's
what I did in v3. I'm not following why we need to return NULL on
failure, though -- both palloc() and guc_malloc() ERROR on failure, so
is it okay to keep those semantics the same?

> If we end up needing a two phase approach, why use the same function for
> both phases? That seems quite awkward.

Mostly so the byte counting always agrees between the two phases, no
matter how the implementation evolves. But it's hopefully moot now.

--Jacob

Attachment

Re: [PATCH] Log details for client certificate failures

From: Tom Lane
Jacob Champion <jchampion@timescale.com> writes:
> The guc_strdup() approach really reduces the amount of code, so that's
> what I did in v3. I'm not following why we need to return NULL on
> failure, though -- both palloc() and guc_malloc() ERROR on failure, so
> is it okay to keep those semantics the same?

guc_malloc's behavior varies depending on elevel.  It's *not*
equivalent to palloc.

            regards, tom lane



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Wed, Jul 20, 2022 at 3:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> guc_malloc's behavior varies depending on elevel.  It's *not*
> equivalent to palloc.

Right, sorry -- a better way for me to ask the question:

I'm currently hardcoding an elevel of ERROR on the new guc_strdup()s,
because that seems to be a common case for the check hooks. If that's
okay, is there any reason not to use palloc() semantics for
pg_clean_ascii()? (And if it's not okay, why?)

--Jacob



Re: [PATCH] Log details for client certificate failures

From: Tom Lane
Jacob Champion <jchampion@timescale.com> writes:
> I'm currently hardcoding an elevel of ERROR on the new guc_strdup()s,
> because that seems to be a common case for the check hooks.

Really?  That's almost certainly NOT okay.  As an example, if you
have a problem with a new value loaded from postgresql.conf during
SIGHUP processing, throwing ERROR will cause the postmaster to exit.

I wouldn't be too surprised if there are isolated cases where people
didn't understand what they were doing and wrote that, but that
needs to be fixed not emulated.

            regards, tom lane



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Wed, Jul 20, 2022 at 3:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jacob Champion <jchampion@timescale.com> writes:
> > I'm currently hardcoding an elevel of ERROR on the new guc_strdup()s,
> > because that seems to be a common case for the check hooks.
>
> Really?  That's almost certainly NOT okay.  As an example, if you
> have a problem with a new value loaded from postgresql.conf during
> SIGHUP processing, throwing ERROR will cause the postmaster to exit.

v4 attempts to fix this by letting the check hooks pass
MCXT_ALLOC_NO_OOM to pg_clean_ascii(). (It's ignored in the frontend,
which just mallocs.)

> I wouldn't be too surprised if there are isolated cases where people
> didn't understand what they were doing and wrote that, but that
> needs to be fixed not emulated.

I might be missing something, but in guc.c at least it appears to be
the rule and not the exception.

Thanks,
--Jacob

Attachment

Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Thu, Jul 21, 2022 at 4:29 PM Jacob Champion <jchampion@timescale.com> wrote:
> v4 attempts to fix this by letting the check hooks pass
> MCXT_ALLOC_NO_OOM to pg_clean_ascii(). (It's ignored in the frontend,
> which just mallocs.)

Ping -- should I add an open item somewhere so this isn't lost?

--Jacob



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Thu, Jul 28, 2022 at 9:19 AM Jacob Champion <jchampion@timescale.com> wrote:
> On Thu, Jul 21, 2022 at 4:29 PM Jacob Champion <jchampion@timescale.com> wrote:
> > v4 attempts to fix this by letting the check hooks pass
> > MCXT_ALLOC_NO_OOM to pg_clean_ascii(). (It's ignored in the frontend,
> > which just mallocs.)
>
> Ping -- should I add an open item somewhere so this isn't lost?

Trying again. Peter, is this approach acceptable? Should I try something else?

Thanks,
--Jacob



Re: [PATCH] Log details for client certificate failures

From: Peter Eisentraut
On 09.09.22 00:32, Jacob Champion wrote:
> On Thu, Jul 28, 2022 at 9:19 AM Jacob Champion <jchampion@timescale.com> wrote:
>> On Thu, Jul 21, 2022 at 4:29 PM Jacob Champion <jchampion@timescale.com> wrote:
>>> v4 attempts to fix this by letting the check hooks pass
>>> MCXT_ALLOC_NO_OOM to pg_clean_ascii(). (It's ignored in the frontend,
>>> which just mallocs.)
>>
>> Ping -- should I add an open item somewhere so this isn't lost?
> 
> Trying again. Peter, is this approach acceptable? Should I try something else?

This looks fine to me.  Committed.



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Tue, Sep 13, 2022 at 7:11 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> This looks fine to me.  Committed.

Thanks!

--Jacob



Re: [PATCH] Log details for client certificate failures

From: Masahiko Sawada
Hi,

On Tue, Sep 13, 2022 at 11:11 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 09.09.22 00:32, Jacob Champion wrote:
> > On Thu, Jul 28, 2022 at 9:19 AM Jacob Champion <jchampion@timescale.com> wrote:
> >> On Thu, Jul 21, 2022 at 4:29 PM Jacob Champion <jchampion@timescale.com> wrote:
> >>> v4 attempts to fix this by letting the check hooks pass
> >>> MCXT_ALLOC_NO_OOM to pg_clean_ascii(). (It's ignored in the frontend,
> >>> which just mallocs.)
> >>
> >> Ping -- should I add an open item somewhere so this isn't lost?
> >
> > Trying again. Peter, is this approach acceptable? Should I try something else?
>
> This looks fine to me.  Committed.

While looking at the recent changes for check_cluster_name() I found
this thread. Regarding the following change made by commit
45b1a67a0fc, there is possibly a small memory leak:

 static bool
 check_cluster_name(char **newval, void **extra, GucSource source)
 {
+   char       *clean;
+
    /* Only allow clean ASCII chars in the cluster name */
-   pg_clean_ascii(*newval);
+   clean = pg_clean_ascii(*newval, MCXT_ALLOC_NO_OOM);
+   if (!clean)
+       return false;
+
+   clean = guc_strdup(WARNING, clean);
+   if (!clean)
+       return false;

+   *newval = clean;
    return true;
 }

pg_clean_ascii() uses palloc_extended() to allocate memory in the
Postmaster context for the cleaned string, and clean is then
overwritten with the new memory allocated by guc_strdup(). Nothing
references the memory allocated by pg_clean_ascii() any more, and it
lasts for the postmaster's lifetime. Valgrind memcheck also shows:

 1 bytes in 1 blocks are definitely lost in loss record 4 of 70
    at 0xCD2A16: palloc_extended (mcxt.c:1239)
    by 0xD09437: pg_clean_ascii (string.c:99)
    by 0x7A5CF3: check_cluster_name (variable.c:1061)
    by 0xCAF7CD: call_string_check_hook (guc.c:6365)
    by 0xCAA724: InitializeOneGUCOption (guc.c:1439)
    by 0xCAA0ED: InitializeGUCOptions (guc.c:1268)
    by 0x99B245: PostmasterMain (postmaster.c:691)
    by 0x858896: main (main.c:197)

I think we can fix it by the attached patch but I'd like to discuss
whether it's worth fixing it.
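
The idea is simply to guc_strdup() the cleaned string and release the
palloc'd copy; something like the following, which may differ slightly
from the attached patch:

    static bool
    check_cluster_name(char **newval, void **extra, GucSource source)
    {
        char       *clean;

        /* Only allow clean ASCII chars in the cluster name */
        clean = pg_clean_ascii(*newval, MCXT_ALLOC_NO_OOM);
        if (!clean)
            return false;

        *newval = guc_strdup(WARNING, clean);
        pfree(clean);           /* release the palloc'd intermediate copy */
        if (!*newval)
            return false;

        return true;
    }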

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Tue, Sep 27, 2022 at 1:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think we can fix it by the attached patch but I'd like to discuss
> whether it's worth fixing it.

Whoops. So every time it's changed, we leak a little postmaster memory?

Your patch looks good to me and I see no reason not to fix it.

Thanks,
--Jacob



Re: [PATCH] Log details for client certificate failures

From: Masahiko Sawada
On Wed, Sep 28, 2022 at 4:44 AM Jacob Champion <jchampion@timescale.com> wrote:
>
> On Tue, Sep 27, 2022 at 1:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I think we can fix it by the attached patch but I'd like to discuss
> > whether it's worth fixing it.
>
> Whoops. So every time it's changed, we leak a little postmaster memory?

No. Since cluster_name is PGC_POSTMASTER, we leak a little postmaster
memory only once when starting up. application_name is PGC_USERSET but
since we normally allocate memory in PortalMemoryContext we eventually
can free it. Since check_cluster_name and check_application_name are
similar, I changed both for consistency.

Regards,

-- 
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: [PATCH] Log details for client certificate failures

From: Jacob Champion
On Tue, Sep 27, 2022 at 6:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> No. Since cluster_name is PGC_POSTMASTER, we leak a little postmaster
> memory only once when starting up. application_name is PGC_USERSET but
> since we normally allocate memory in PortalMemoryContext we eventually
> can free it.

Oh, I see; thank you for the correction. And even if someone put an
application_name into their postgresql.conf, and then changed it a
bunch of times, we'd free the leaked memory from the config_cxt that's
created in ProcessConfigFile().

Is there a reason we don't provide a similar temporary context during
InitializeGUCOptions()? Naively it seems like that would suppress any
future one-time leaks, and maybe cut down on some Valgrind noise. Then
again, maybe there's just not that much demand for pallocs during GUC
hooks.

Thanks,
--Jacob



Re: [PATCH] Log details for client certificate failures

From: Masahiko Sawada
On Thu, Sep 29, 2022 at 1:43 AM Jacob Champion <jchampion@timescale.com> wrote:
>
> On Tue, Sep 27, 2022 at 6:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > No. Since cluster_name is PGC_POSTMATER, we leak a little postmaster
> > memory only once when starting up. application_name is PGC_USERSET but
> > since we normally allocate memory in PortalMemoryContext we eventually
> > can free it.
>
> Oh, I see; thank you for the correction. And even if someone put an
> application_name into their postgresql.conf, and then changed it a
> bunch of times, we'd free the leaked memory from the config_cxt that's
> created in ProcessConfigFile().

Right.

>
> Is there a reason we don't provide a similar temporary context during
> InitializeGUCOptions()? Naively it seems like that would suppress any
> future one-time leaks, and maybe cut down on some Valgrind noise. Then
> again, maybe there's just not that much demand for pallocs during GUC
> hooks.

While this seems like a future-proof idea, I wonder if it might be overkill
since we don't need to worry about accumulation of leaked memory in
this case. Given that check_cluster_name is the only case where we
found a small memory leak, I think it's adequate to fix just that.

Fixing this issue suppresses Valgrind's complaint, but since the
boot value of cluster_name is "" the memory leak we avoid is only
1 byte.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: [PATCH] Log details for client certificate failures

From: Peter Eisentraut
On 29.09.22 06:52, Masahiko Sawada wrote:
> While this seems like a future-proof idea, I wonder if it might be overkill
> since we don't need to worry about accumulation of leaked memory in
> this case. Given that check_cluster_name is the only case where we
> found a small memory leak, I think it's adequate to fix just that.
> 
> Fixing this issue suppresses Valgrind's complaint, but since the
> boot value of cluster_name is "" the memory leak we avoid is only
> 1 byte.

I have committed this.  I think it's better to keep the code locally 
robust and not to have to rely on complex analysis of how GUC memory 
management works.



Re: [PATCH] Log details for client certificate failures

From: Masahiko Sawada
On Sat, Oct 1, 2022 at 7:53 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 29.09.22 06:52, Masahiko Sawada wrote:
> > While this seems like a future-proof idea, I wonder if it might be overkill
> > since we don't need to worry about accumulation of leaked memory in
> > this case. Given that check_cluster_name is the only case where we
> > found a small memory leak, I think it's adequate to fix just that.
> >
> > Fixing this issue suppresses Valgrind's complaint, but since the
> > boot value of cluster_name is "" the memory leak we avoid is only
> > 1 byte.
>
> I have committed this.  I think it's better to keep the code locally
> robust and not to have to rely on complex analysis of how GUC memory
> management works.

Thanks! Agreed.

Regards,

-- 
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: [PATCH] Log details for client certificate failures

From: Ranier Vilela
Hi,
I think that 9d58bf left a tiny oversight.

guc_strdup uses strdup, so it must be freed with free, not pfree.
IMO, a single free call could be used too.

regards,
Ranier Vilela
Attachment

Re: [PATCH] Log details for client certificate failures

From: Ranier Vilela
On Wed, Oct 5, 2022 at 09:19, Ranier Vilela <ranier.vf@gmail.com> wrote:
> Hi,
> I think that 9d58bf left a tiny oversight.
>
> guc_strdup uses strdup, so it must be freed with free, not pfree.
> IMO, a single free call could be used too.

Sorry, my fault.
Please ignore this.

regards,
Ranier Vilela