Re: Rare SSL failures on eelpout - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Rare SSL failures on eelpout
Msg-id CA+hUKGJDOLkCcuT3q4Ofu8Ojo9n4PNKsUc1108tv=r1=36bbeQ@mail.gmail.com
In response to Re: Rare SSL failures on eelpout  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Rare SSL failures on eelpout
List pgsql-hackers
On Wed, Mar 6, 2019 at 6:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > You can see that poll() already knew the other end had closed the
> > socket.  Since this is clearly timing... let's see, yeah, I can make
> > it fail every time by adding sleep(1) before the comment "Send the
> > startup packet.".  I assume that'll work on any Linux machine?
>
> Great idea, but no cigar --- doesn't do anything for me except make
> the ssl test really slow.  (I tried it on RHEL6 and Fedora 28 and, just
> for luck, current macOS.)  What this seems to prove is that the thing
> that's different about eelpout is the particular kernel it's running,
> and that that kernel has some weird timing behavior in this situation.
>
> I've also been experimenting with reducing libpq's SO_SNDBUF setting
> on the socket, with more or less the same idea of making the sending
> of the startup packet slower.  No joy there either.
>
> Annoying.  I'd be happier about writing code to fix this if I could
> reproduce it :-(

Hmm.  Note that eelpout only started failing this way with OpenSSL
1.1.1.  But I just tried the sleep(1) trick on an x86 box running the
same versions of Debian, OpenSSL, etc., and it didn't reproduce the
failure.  So eelpout (a super cheap virtualised 4-core ARMv8 system
rented from scaleway.com running Debian Buster with a kernel
identifying itself as 4.9.23-std-1 and libc6 2.28-7) is indeed
starting to look pretty weird.  Let me know if you want to log in and
experiment on that machine.
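
For anyone who wants to poke at the general pattern without SSL in the
way, here is a minimal sketch of "the peer has already closed, poll()
knows it, and we write anyway".  It is only an illustration under
stated assumptions: it uses a local AF_UNIX socketpair as a stand-in
for the TCP connection, the function name and the message strings are
made up, and it does not reproduce the eelpout failure or anything
kernel-specific.

```python
import select
import socket


def write_after_peer_close():
    """Return (poll_events, send_error, pending_data) once the peer has closed.

    Hypothetical helper: an AF_UNIX socketpair stands in for the
    client/server TCP connection; no SSL or libpq code is involved.
    """
    client, server = socket.socketpair()
    try:
        # The "server" queues an error message and closes immediately.
        server.sendall(b"FATAL: connection rejected")
        server.close()

        # poll() on the client side already reports the socket as readable
        # (typically together with a hang-up flag).
        poller = select.poll()
        poller.register(client, select.POLLIN)
        events = poller.poll(1000)[0][1]

        # Writing anyway, as the client does with its startup packet.
        send_error = None
        try:
            client.sendall(b"startup packet")
        except OSError as exc:
            send_error = exc

        # The message queued before the close can still be read here.
        pending = client.recv(1024)
        return events, send_error, pending
    finally:
        client.close()


if __name__ == "__main__":
    events, send_error, pending = write_after_peer_close()
    print("POLLIN set:", bool(events & select.POLLIN))
    print("POLLHUP set:", bool(events & select.POLLHUP))
    print("send error:", send_error)
    print("pending data:", pending)
```

With a socketpair the write fails but the queued message is still
readable afterwards.  A real TCP connection can behave differently
after a reset, so treat this as a picture of the ordering problem
rather than a reproducer.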

-- 
Thomas Munro
https://enterprisedb.com

