Re: Rare SSL failures on eelpout - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Rare SSL failures on eelpout
Msg-id 6920.1551805678@sss.pgh.pa.us
In response to Re: Rare SSL failures on eelpout  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Rare SSL failures on eelpout
List pgsql-hackers
Thomas Munro <thomas.munro@gmail.com> writes:
> You can see that poll() already knew the other end had closed the
> socket.  Since this is clearly timing... let's see, yeah, I can make
> it fail every time by adding sleep(1) before the comment "Send the
> startup packet.".  I assume that'll work on any Linux machine?

Great idea, but no cigar --- doesn't do anything for me except make
the ssl test really slow.  (I tried it on RHEL6 and Fedora 28 and, just
for luck, current macOS.)  What this seems to prove is that the thing
that's different about eelpout is the particular kernel it's running,
and that that kernel has some weird timing behavior in this situation.
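
For anyone who wants to look at the poll() side of this in isolation, here is a minimal sketch of the quoted observation that poll() can already know the other end has closed before the client writes anything. It is only an illustration under loud assumptions: it uses a Unix-domain socketpair rather than a TCP/SSL connection, the function name is made up, and it does not reproduce the eelpout failure or its timing.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not libpq code.
 * Returns 1 if poll() reports the peer's close on a connected socket
 * before we have sent anything, 0 if it does not, -1 on error.
 */
int
peer_close_visible_to_poll(void)
{
    int           sv[2];
    struct pollfd pfd;
    int           rc;

    /* A connected pair stands in for the client/server connection. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* The "server" end goes away first, as in the failing case. */
    close(sv[1]);

    pfd.fd = sv[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Wait up to one second for the close to become visible. */
    rc = poll(&pfd, 1, 1000);
    close(sv[0]);

    if (rc < 0)
        return -1;

    /* EOF shows up as POLLIN and/or POLLHUP depending on the platform. */
    return (rc > 0 && (pfd.revents & (POLLIN | POLLHUP))) ? 1 : 0;
}
```

What differs between kernels is presumably not whether the close is reported but what happens to a subsequent write and to any data still unread, which this sketch does not exercise.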

I've also been experimenting with reducing libpq's SO_SNDBUF setting
on the socket, with more or less the same idea of making the sending
of the startup packet slower.  No joy there either.
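
For reference, shrinking a socket's send buffer looks roughly like the sketch below. This is not libpq's code: the function names and the 1024-byte request are assumptions chosen for illustration. The value is read back because kernels may round or clamp the request (Linux, for example, doubles it and enforces a minimum), so the effective size can differ from what was asked for.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not libpq code.
 * Request a send buffer of 'requested' bytes on 'sock' and return the
 * size the kernel actually reports afterwards, or -1 on error.
 */
int
shrink_sndbuf(int sock, int requested)
{
    int       actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                   &requested, sizeof(requested)) < 0)
        return -1;

    /* Read the value back: the kernel is free to adjust the request. */
    if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
        return -1;

    return actual;
}

/*
 * Small driver: create an unconnected TCP socket, ask for a 1024-byte
 * send buffer, and return the effective size (or -1 on error).
 */
int
demo_sndbuf(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    int actual;

    if (sock < 0)
        return -1;

    actual = shrink_sndbuf(sock, 1024);
    close(sock);
    return actual;
}
```

Because of the kernel's minimum, a startup packet of ordinary size may still fit in the buffer in one go, which would be consistent with this approach not slowing the send down.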

Annoying.  I'd be happier about writing code to fix this if I could
reproduce it :-(

            regards, tom lane

PS: but now I'm wondering about trying other non-Linux kernels.

