Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0 - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0
Msg-id 1113966.1644280235@sss.pgh.pa.us
In response to Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-bugs
I wrote:
> The seeming timing problem with the two CRL tests remains.

I spent some more time poking at this, and found that:

* There are three tests, not two, that intermittently fail.
They are at 001_ssltests.pl lines 565, 608, 618.  It's suspicious
that these are exactly the tests that expect to see "sslv3 alert"
or "tlsv1 alert" conditions rather than anything higher-level;
but I don't have any insight as to why that might be relevant.

* The failure occurs on the WRITE side, not the read side; the
'server closed the connection unexpectedly' message we see coming
back from libpq is from pqsecure_raw_write.  (I verified this by
changing the texts of the various instances of that message.)

* If I make my_sock_write ignore EPIPE/ECONNRESET, as per the
attached entirely uncommittable patch, the errors go away.

I hypothesize that something about OpenBSD scheduling is allowing the
server to (sometimes) exit before the client-side openssl has flushed
all its buffers, and the client-side code doesn't handle that well.
It's not very clear why this wouldn't be affecting all users of
OpenSSL, but there you have it.
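
The condition hypothesized above can be reproduced in miniature. The
sketch below is an illustration only, not code from libpq or the SSL
test suite. It assumes a Unix-domain socketpair as a stand-in for the
client/server connection (the failing tests use TCP, where the first
write after the peer closes may still succeed and only a later one
fails with EPIPE or ECONNRESET), and errno_after_peer_close is a
made-up name.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one byte to a stream socket whose peer has already closed.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the socket pair could not be created.
 */
int
errno_after_peer_close(void)
{
    int     sv[2];
    ssize_t n;
    int     saved;

    /* Ignore SIGPIPE process-wide so the failed write reports EPIPE. */
    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]);               /* the "server" end goes away first */

    n = write(sv[0], "x", 1);   /* the "client" still has data to send */
    saved = (n < 0) ? errno : 0;

    close(sv[0]);
    return saved;
}
```

The point is only that the writer learns of the peer's exit through a
write error, before it has had a chance to read whatever the peer sent
just before closing.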

While the attached is surely no good as a general patch, could we
get away with ignoring EPIPE/ECONNRESET in writes during connection
startup?  We'd notice the failure soon enough on the read side if
it's not this problem.  (This seems a bit related to libpq's other
hacks that postpone recognition of write failures.)
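
A minimal sketch of what "ignore these errors only during connection
startup" could look like, written as a standalone decision function
rather than as a change to my_sock_write. The conn_phase enum and
write_result_for are hypothetical names invented for this example;
libpq tracks connection state differently, and this is not a proposed
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical connection phases, for illustration only. */
typedef enum
{
    PHASE_STARTUP,
    PHASE_ESTABLISHED
} conn_phase;

/*
 * Decide what a BIO-style write callback should report.
 *   sent: result of the low-level send (negative on failure)
 *   err:  errno captured right after a failed send
 *   size: number of bytes the caller asked to write
 */
int
write_result_for(conn_phase phase, int sent, int err, int size)
{
    bool    peer_gone;

    assert(size >= 0);

    if (sent >= 0)
        return sent;            /* normal (possibly partial) write */

    peer_gone = (err == EPIPE || err == ECONNRESET);

    /*
     * During startup, pretend the data was written; a real failure is
     * expected to show up on the next read instead.
     */
    if (phase == PHASE_STARTUP && peer_gone)
        return size;

    return sent;                /* all other failures are reported as-is */
}
```

With this shape, write_result_for(PHASE_STARTUP, -1, EPIPE, 42) reports
42 bytes written, while the same error in PHASE_ESTABLISHED is passed
through unchanged.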

By the by, today's fairywren failure [1] sure looks related:

#   Failed test 'intermediate client certificate is missing: matches'
#   at t/001_ssltests.pl line 608.
#                   'psql: error: connection to server at "127.0.0.1", port 50577 failed: could not receive data from server: Software caused connection abort (0x00002745/10053)
# SSL SYSCALL error: Software caused connection abort (0x00002745/10053)
# could not send startup packet: No error (0x00000000/0)'
#     doesn't match '(?^:SSL error: tlsv1 alert unknown ca)'

This is evidently on the read side, not the write side, so it's not
quite the same thing, but ...

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2022-02-07%2021%3A04%3A53

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 9f735ba437..11084a6a07 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1697,6 +1697,10 @@ my_sock_write(BIO *h, const char *buf, int size)
                 BIO_set_retry_write(h);
                 break;

+            case EPIPE:
+            case ECONNRESET:
+                return size;
+
             default:
                 break;
         }
