Re: Support for NSS as a libpq TLS backend - Mailing list pgsql-hackers

From Daniel Gustafsson
Subject Re: Support for NSS as a libpq TLS backend
Date
Msg-id 7B5A88DF-863A-4B00-B841-0285C1FF70DF@yesql.se
Whole thread Raw
In response to Re: Support for NSS as a libpq TLS backend  (Jacob Champion <pchampion@vmware.com>)
Responses Re: Support for NSS as a libpq TLS backend  (Jacob Champion <pchampion@vmware.com>)
List pgsql-hackers
> On 12 Nov 2020, at 23:12, Jacob Champion <pchampion@vmware.com> wrote:
>
> On Nov 11, 2020, at 10:57 AM, Jacob Champion <pchampion@vmware.com> wrote:
>>
>> False alarm -- the stderr debugging I'd added in to track down the
>> assertion tripped up the "no stderr" tests. Zero failing tests now.
>
> I took a look at the OpenSSL interop problems you mentioned upthread.

Great, thanks!

> I don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during
> connection.
>
> I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes
> were read, in order to satisfy its API. There was some discussion on
> this upthread:
>
> On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>> On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote:
>>>
>>>> +            case PR_IO_TIMEOUT_ERROR:
>>>> +                break;
>>>
>>> What does this mean? We'll return with a 0 errno here, right? When is
>>> this case reachable?
>>
>> It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
>> we don't do.  It mentioned somewhere that it had happened in no-wait calls due
>> to a bug, but I fail to find that reference now.  Either way, I've removed it
>> to fall into the default error handling which now sets errno correctly as that
>> was a paddle short here.
>
> PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my
> machine. It doesn't look like the PR_Recv() API has a choice -- if
> there's no data, it can't return a positive integer, and returning zero
> means that the socket has been disconnected. So -1 with a timeout error
> is the only option.

Right, that makes sense.

> I'm not completely sure why this is exposed so easily with an OpenSSL
> server -- I'm guessing the implementation slices up its packets
> differently on the wire, causing a read event before NSS is able to
> decrypt a full record -- but it's worth noting that this case also shows
> up during NSS-to-NSS psql connections, when handling notifications at
> the end of every query. PQconsumeInput() reports a hard failure with the
> current implementation, but its return value is ignored by
> PrintNotifications(). Otherwise this probably would have showed up
> earlier.

Should there perhaps be an Assert there to catch those?

> (What's the best way to test this case? Are there lower-level tests for
> the protocol/network layer somewhere that I'm missing?)

Not AFAIK.  Having been knee-deep now, do you have any ideas on how to
implement?

> While patching this case, I also noticed that pgtls_read() doesn't call
> SOCK_ERRNO_SET() for the disconnection case. That is also in the
> attached patch.

Ah yes, nice catch.

I've incorporated this patch as well as the previous patch for the assertion
failure on private callback data into the attached v19 patchset.  I also did a
spellcheck and pgindent run on it for ease of review.

cheers ./daniel


Attachment

pgsql-hackers by date:

Previous
From: Andrey Borodin
Date:
Subject: Re: MultiXact\SLRU buffers configuration
Next
From: Amit Kapila
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions