Thread: could not accept SSL connection: Success

could not accept SSL connection: Success

From
Carla Iriberri
Date:
Hi all,

We've noticed the following connection error logs recently:

    sql_error_code = XX000 LOG:  could not accept SSL connection: Success

We're seeing this on PostgreSQL databases running on Ubuntu Focal 20.04 with
different PostgreSQL versions (13.5, 13.4, 12.9, 10.19...).

After going through the source code I think that this comes from a
`SSL_ERROR_SYSCALL` where the errcode itself is 0, given the "Success" error
that's getting logged.

The server is accepting other (TLSv1.3) SSL connections from the same source
around the same time when this happens, so I understand this error/behavior is
likely due to the client itself closing the connection.

I saw previous discussions where different errors were logged with the "Success"
message and this was corrected/treated as a bug, but I couldn't find similar
reports specific to "could not accept SSL connection". Is this a known issue or
case?

Regards,
Carla
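
For illustration, the "Success" text in that log line is consistent with an
errno of 0 being passed through strerror(), which is what glibc prints for
that value (other C libraries may word it differently). Below is a minimal
sketch, assuming a hypothetical helper named format_accept_failure() that
stands in for the backend's "%m" expansion; it is not PostgreSQL code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only (not PostgreSQL code).
 * Builds the log text the way a "%m"-style specifier would: the fixed
 * message followed by strerror() of the saved errno value.
 */
void
format_accept_failure(char *buf, size_t buflen, int saved_errno)
{
    snprintf(buf, buflen, "could not accept SSL connection: %s",
             strerror(saved_errno));
}
```

Calling format_accept_failure(buf, sizeof(buf), 0) produces the message with
whatever the platform's strerror(0) returns, which is how a failure with no
errno set can end up logged as a success.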

Re: could not accept SSL connection: Success

From
Michael Paquier
Date:
On Mon, Jan 17, 2022 at 05:05:52PM +0100, Carla Iriberri wrote:
> I saw previous discussions where different errors were logged with the
> "Success" message and this was corrected/treated as a bug, but I couldn't
> find similar reports specific to "could not accept SSL connection". Is this
> a known issue or case?

Not that I recall from recent mailing list traffic, but my memory may be
falling short.  The error comes from the backend as you say, where this
log would expect something in saved_errno to feed %m.

And the upstream documentation says:
https://www.openssl.org/docs/manmaster/man3/SSL_get_error.html

"On an unexpected EOF, versions before OpenSSL 3.0 returned
SSL_ERROR_SYSCALL, nothing was added to the error stack, and errno was
0. Since OpenSSL 3.0 the returned error is SSL_ERROR_SSL with a
meaningful error on the error stack."

This would mean that relying on %m would be wrong for this case.  And
I guess that you are using a version of OpenSSL older than 3.0?
--
Michael
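
The distinction drawn in the OpenSSL documentation quoted above can be
sketched as a small decision function. This is a self-contained illustration
under stated assumptions, not the backend's code: the enum
demo_ssl_error and the function demo_describe_accept_failure() are
hypothetical stand-ins, and the enum values only mirror the names
SSL_ERROR_SYSCALL and SSL_ERROR_SSL rather than including the OpenSSL
headers. The "EOF detected" wording is likewise just one possible choice.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two OpenSSL result codes discussed. */
enum demo_ssl_error
{
    DEMO_SSL_ERROR_SSL,         /* stands in for SSL_ERROR_SSL */
    DEMO_SSL_ERROR_SYSCALL      /* stands in for SSL_ERROR_SYSCALL */
};

/*
 * Pick the text to append to "could not accept SSL connection: ".
 *
 * saved_errno is the errno captured right after the failed call;
 * stack_msg is the message taken from the library's error stack, or
 * NULL if the stack was empty.
 */
const char *
demo_describe_accept_failure(enum demo_ssl_error err, int saved_errno,
                             const char *stack_msg)
{
    switch (err)
    {
        case DEMO_SSL_ERROR_SYSCALL:
            /* A real system call failure: errno is meaningful. */
            if (saved_errno != 0)
                return strerror(saved_errno);
            /* Pre-3.0 unexpected EOF: errno is 0, so do not use it. */
            return "EOF detected";
        case DEMO_SSL_ERROR_SSL:
            /* 3.0 and later report the EOF here, via the error stack. */
            return stack_msg != NULL ? stack_msg : "unknown SSL error";
    }
    return "unrecognized SSL error code";
}
```

With these stand-ins, a SYSCALL result with errno 0 maps to "EOF detected"
instead of strerror(0), while an SSL result simply passes the error-stack
text through.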

Re: could not accept SSL connection: Success

From
Carla Iriberri
Date:
Thanks, Michael, that's it, indeed! I had missed that part of the
OpenSSL docs. These PG instances are running on Ubuntu Focal hosts that come
with OpenSSL 1.1.1.
 
We had never seen these in the previous Xenial images because those
were using OpenSSL 1.0.2, and from what I've seen the bug was introduced
in 1.1.0.

Thanks again,
Carla


Re: could not accept SSL connection: Success

From
Thomas Munro
Date:
On Thu, Jan 20, 2022 at 12:06 AM Carla Iriberri
<ciriberri@salesforce.com> wrote:
> On Wed, Jan 19, 2022 at 5:42 AM Michael Paquier <michael@paquier.xyz> wrote:
>> "On an unexpected EOF, versions before OpenSSL 3.0 returned
>> SSL_ERROR_SYSCALL, nothing was added to the error stack, and errno was
>> 0. Since OpenSSL 3.0 the returned error is SSL_ERROR_SSL with a
>> meaningful error on the error stack."

> Thanks, Michael, that's it, indeed! I had missed that part of the
> OpenSSL docs. These PG instances are running on Ubuntu Focal hosts that come
> with OpenSSL 1.1.1.

Good news, I'm glad they nailed that down.  I recall that this
behaviour was a bit of a moving target in earlier versions:

https://www.postgresql.org/message-id/CAEepm%3D3cc5wYv%3DX4Nzy7VOUkdHBiJs9bpLzqtqJWxdDUp5DiPQ%40mail.gmail.com



Re: could not accept SSL connection: Success

From
Michael Paquier
Date:
On Thu, Jan 20, 2022 at 09:05:35AM +1300, Thomas Munro wrote:
> Good news, I'm glad they nailed that down.  I recall that this
> behaviour was a bit of a moving target in earlier versions:
>
> https://www.postgresql.org/message-id/CAEepm%3D3cc5wYv%3DX4Nzy7VOUkdHBiJs9bpLzqtqJWxdDUp5DiPQ%40mail.gmail.com

Ahh..  So you saw the same problem a couple of years back.  That
thread did not get much attention.

I don't think that it makes much sense to leave this unchecked as the
message is confusing as it stands.  Perhaps we could do something like
the attached by adding a note about OpenSSL 3.0 to revisit this code
once we unplug support for 1.1.1 and avoiding the errno==0 case?  The
frontend has its own handling of socket failures as it requires thread
support, which makes things different from the backend, but it seems to
me that we could see cases where SOCK_ERRNO is also 0.  That's mostly
what you suggested on the other thread.

The error handling changes are really cosmetic, so I'd rather leave
the back-branches out of that.  Thoughts?
--
Michael

Re: could not accept SSL connection: Success

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> I don't think that it makes much sense to leave this unchecked as the
> message is confusing as it stands.  Perhaps we could do something like
> the attached by adding a note about OpenSSL 3.0 to revisit this code
> once we unplug support for 1.1.1 and avoiding the errno==0 case?

If I'm reading this patch correctly, you have it calling the case
"EOF detected" in one place, "internal failure" in another, and
failing to touch several more places where we deal with
SSL_ERROR_SYSCALL.  I don't find that to be an improvement ---
inconsistency is worse than a confusing error message.

Personally I'm satisfied to leave it as-is, since this issue apparently
occurs only in a minority of OpenSSL versions, and not the newest.

            regards, tom lane



Re: could not accept SSL connection: Success

From
Michael Paquier
Date:
On Wed, Jan 19, 2022 at 07:58:43PM -0500, Tom Lane wrote:
> Personally I'm satisfied to leave it as-is, since this issue apparently
> occurs only in a minority of OpenSSL versions, and not the newest.

Leaving things in their current state is fine by me.  Would it be
better to add a note about the business with 3.0 though?  My gut is
telling me that we'd better revisit those code paths in a couple of
years when support for legacy OpenSSL is removed, and by then we will
most likely have forgotten about all those details.
--
Michael

Re: could not accept SSL connection: Success

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Leaving things in their current state is fine by me.  Would it be
> better to add a note about the business with 3.0 though?

What do you envision saying?  "We don't need to do anything here
for 3.0" doesn't seem helpful.

            regards, tom lane



Re: could not accept SSL connection: Success

From
Michael Paquier
Date:
On Wed, Jan 19, 2022 at 08:06:30PM -0500, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > Leaving things in their current state is fine by me.  Would it be
> > better to add a note about the business with 3.0 though?
>
> What do you envision saying?  "We don't need to do anything here
> for 3.0" doesn't seem helpful.

Nope, but the idea would be to keep around a note that we may want to
revisit this area of the code based on the state of upstream, because
our code is currently shaped based on problems that OpenSSL has dealt
with.  I am not completely sure, but something along the lines of:
"OpenSSL 1.1.1 and older versions return nothing on an unexpected EOF,
and errno may not be set.  3.0 reports SSL_ERROR_SSL with a
meaningful error set on the stack, so this could be reworked once
support for older versions is removed."

Perhaps that's just nannyism on my side; this is really minor in the
end.
--
Michael
