Re: Rare SSL failures on eelpout - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Rare SSL failures on eelpout
Date
Msg-id 26030.1551733682@sss.pgh.pa.us
Whole thread Raw
In response to Re: Rare SSL failures on eelpout  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Rare SSL failures on eelpout
List pgsql-hackers
I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> That suggests that we could perhaps handle ECONNRESET both at startup
>> packet send time (for certificate rejection, eelpout's case) and at
>> initial query send (for idle timeout, bug #15598's case) by attempting
>> to read.  Does that make sense?

> Hmm ... it definitely makes sense that we shouldn't assume that a *write*
> failure means there is nothing left to *read*.

After staring at the code for awhile, I am thinking that there may be
a bug of that ilk, but if so it's down inside OpenSSL.  Perhaps it's
specific to the OpenSSL version you're using on eelpout?  There is
not anything we could do differently in libpq, AFAICS, because it's
OpenSSL's responsibility to read any data that might be available.

I also looked into the idea that we're doing something wrong on the
server side, allowing the final error message to not get flushed out.
A plausible theory there is that SSL_shutdown is returning a WANT_READ
or WANT_WRITE error and we should retry it ... but that doesn't square
with your observation upthread that it's returning SSL_ERROR_SSL.

It's all very confusing, but I think there's a nontrivial chance
that this is an OpenSSL bug, especially since we haven't been able
to replicate it elsewhere.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: monitoring CREATE INDEX [CONCURRENTLY]
Next
From: Tom Lane
Date:
Subject: Re: Allowing extensions to supply operator-/function-specific info