> On 12 Nov 2020, at 23:12, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Nov 11, 2020, at 10:57 AM, Jacob Champion <pchampion@vmware.com> wrote:
>>
>> False alarm -- the stderr debugging I'd added in to track down the
>> assertion tripped up the "no stderr" tests. Zero failing tests now.
>
> I took a look at the OpenSSL interop problems you mentioned upthread.
Great, thanks!
> I don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during
> connection.
>
> I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes
> were read, in order to satisfy its API. There was some discussion on
> this upthread:
>
> On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>> On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote:
>>>
>>>> + case PR_IO_TIMEOUT_ERROR:
>>>> + break;
>>>
>>> What does this mean? We'll return with a 0 errno here, right? When is
>>> this case reachable?
>>
>> It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
>> we don't do. It mentioned somewhere that it had happened in no-wait calls due
>> to a bug, but I fail to find that reference now. Either way, I've removed it
>> to fall into the default error handling which now sets errno correctly as that
>> was a paddle short here.
>
> PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my
> machine. It doesn't look like the PR_Recv() API has a choice -- if
> there's no data, it can't return a positive integer, and returning zero
> means that the socket has been disconnected. So -1 with a timeout error
> is the only option.
Right, that makes sense.
> I'm not completely sure why this is exposed so easily with an OpenSSL
> server -- I'm guessing the implementation slices up its packets
> differently on the wire, causing a read event before NSS is able to
> decrypt a full record -- but it's worth noting that this case also shows
> up during NSS-to-NSS psql connections, when handling notifications at
> the end of every query. PQconsumeInput() reports a hard failure with the
> current implementation, but its return value is ignored by
> PrintNotifications(). Otherwise this probably would have showed up
> earlier.
Should there perhaps be an Assert there to catch those?
> (What's the best way to test this case? Are there lower-level tests for
> the protocol/network layer somewhere that I'm missing?)
Not AFAIK. Having been knee-deep now, do you have any ideas on how to
implement?
> While patching this case, I also noticed that pgtls_read() doesn't call
> SOCK_ERRNO_SET() for the disconnection case. That is also in the
> attached patch.
Ah yes, nice catch.
I've incorporated this patch as well as the previous patch for the assertion
failure on private callback data into the attached v19 patchset. I also did a
spellcheck and pgindent run on it for ease of review.
cheers ./daniel