could not receive data from WAL stream: SSL SYSCALL error: Success - Mailing list pgsql-hackers

From Thomas Munro
Subject could not receive data from WAL stream: SSL SYSCALL error: Success
Date
Msg-id CAEepm=3cc5wYv=X4Nzy7VOUkdHBiJs9bpLzqtqJWxdDUp5DiPQ@mail.gmail.com
List pgsql-hackers
Hi hackers,

I heard a report of an error like this from a user of openssl
1.1.0f-3+deb9u on Debian:

pg_basebackup: could not receive data from WAL stream: SSL SYSCALL
error: Success

I noticed that some man pages for SSL_get_error say this under
SSL_ERROR_SYSCALL:
    Some non-recoverable I/O error occurred. The OpenSSL error queue may
    contain more information on the error. For socket I/O on Unix systems,
    consult errno for details.
 

But others say:
    Some I/O error occurred. The OpenSSL error queue may contain more
    information on the error. If the error queue is empty (i.e.
    ERR_get_error() returns 0), ret can be used to find out more about the
    error: If ret == 0, an EOF was observed that violates the protocol. If
    ret == -1, the underlying BIO reported an I/O error (for socket I/O on
    Unix systems, consult errno for details).
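
To make the second wording concrete, here is a rough sketch of the decision
procedure it describes, written as a pure function so it does not need
OpenSSL to compile. The enum and the function name are hypothetical and exist
only for illustration; queued_error stands for the value ERR_get_error()
would return, and ret for the return value of the failed SSL call.

```c
#include <assert.h>

/* Hypothetical labels for the outcomes named in the older man page text. */
typedef enum
{
    SYSCALL_SEE_ERROR_QUEUE,    /* ERR_get_error() returned non-zero */
    SYSCALL_EOF,                /* queue empty and ret == 0 */
    SYSCALL_SEE_ERRNO,          /* queue empty and ret == -1 */
    SYSCALL_UNSPECIFIED         /* not covered by the quoted text */
} syscall_error_kind;

/*
 * Hypothetical helper, not an OpenSSL or libpq function: interprets
 * SSL_ERROR_SYSCALL the way the older documentation describes it.
 */
static syscall_error_kind
classify_ssl_error_syscall(unsigned long queued_error, int ret)
{
    /* SSL_get_error() reports SSL_ERROR_NONE when ret > 0. */
    assert(ret <= 0);

    if (queued_error != 0)
        return SYSCALL_SEE_ERROR_QUEUE;
    if (ret == 0)
        return SYSCALL_EOF;
    if (ret == -1)
        return SYSCALL_SEE_ERRNO;
    return SYSCALL_UNSPECIFIED;
}
```

The first wording collapses all of this into "consult errno", which is the
only branch that does not say when errno is actually meaningful.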
 

While wondering if it was the documentation or the behaviour that
changed and what it all means, I came across some discussion and a
reverted commit here:

https://github.com/openssl/openssl/issues/1903

The error reported to me seems to have occurred on a release whose man
page *doesn't* describe the ERR_get_error() == 0 case (unlike some of
the earlier tags you can get to from here):

https://github.com/openssl/openssl/blob/OpenSSL_1_1_0-stable/doc/ssl/SSL_get_error.pod

And yet clearly errno didn't hold an error number from a failed
syscall, which seems consistent with the older documented behaviour.

Perhaps pgtls_read(), pgtls_write() and open_client_SSL() should add
"&& ecode != 0" to the if statements in their SSL_ERROR_SYSCALL case
so that this case would fall to the "EOF detected" message instead of
logging the nonsensical (and potentially uninitialised?) errno
message, if indeed this is the behaviour described in older releases.  On
the other hand, without documentation to support it in the current
release, we don't really *know* that it's an EOF condition.  Due to
this murkiness and the fact that it's mostly harmless anyway, I'm not
proposing a change, but I thought I'd share this in case it makes more
sense to someone more familiar with this stuff.
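
For concreteness, here is a minimal sketch of what the extra "&& ecode != 0"
test would do. It is not the libpq source: describe_ssl_syscall_error() is a
hypothetical stand-in for the SSL_ERROR_SYSCALL branch, n is assumed to be
the return value of the failed SSL call, and ecode the errno value saved
right after it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not libpq code: builds the message for the
 * SSL_ERROR_SYSCALL case.  Only trust errno when the call failed (n < 0)
 * AND errno actually holds an error number; otherwise report EOF.
 */
static const char *
describe_ssl_syscall_error(int n, int ecode, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    if (n < 0 && ecode != 0)
        snprintf(buf, buflen, "SSL SYSCALL error: %s", strerror(ecode));
    else
        snprintf(buf, buflen, "SSL SYSCALL error: EOF detected");

    return buf;
}
```

With n = -1 and ecode = 0, which is presumably the situation in the report,
this produces "SSL SYSCALL error: EOF detected" instead of a message built
from strerror(0).  Whether EOF is really the right interpretation there is
exactly the open question above.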

-- 
Thomas Munro
http://www.enterprisedb.com

