Re: Support for NSS as a libpq TLS backend - Mailing list pgsql-hackers

From Jacob Champion
Subject Re: Support for NSS as a libpq TLS backend
Date
Msg-id FBA0C365-3B19-4E40-98F4-8B0064DB7127@vmware.com
Whole thread Raw
In response to Re: Support for NSS as a libpq TLS backend  (Jacob Champion <pchampion@vmware.com>)
Responses Re: Support for NSS as a libpq TLS backend  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-hackers
On Nov 11, 2020, at 10:57 AM, Jacob Champion <pchampion@vmware.com> wrote:
>
> False alarm -- the stderr debugging I'd added in to track down the
> assertion tripped up the "no stderr" tests. Zero failing tests now.

I took a look at the OpenSSL interop problems you mentioned upthread. I
don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during
connection.

I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes
were read, in order to satisfy its API. There was some discussion on
this upthread:

On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> On 20 Oct 2020, at 21:15, Andres Freund <andres@anarazel.de> wrote:
>>
>>> +            case PR_IO_TIMEOUT_ERROR:
>>> +                break;
>>
>> What does this mean? We'll return with a 0 errno here, right? When is
>> this case reachable?
>
> It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
> we don't do.  It mentioned somewhere that it had happened in no-wait calls due
> to a bug, but I fail to find that reference now.  Either way, I've removed it
> to fall into the default error handling which now sets errno correctly as that
> was a paddle short here.

PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my
machine. It doesn't look like the PR_Recv() API has a choice -- if
there's no data, it can't return a positive integer, and returning zero
means that the socket has been disconnected. So -1 with a timeout error
is the only option.

I'm not completely sure why this is exposed so easily with an OpenSSL
server -- I'm guessing the implementation slices up its packets
differently on the wire, causing a read event before NSS is able to
decrypt a full record -- but it's worth noting that this case also shows
up during NSS-to-NSS psql connections, when handling notifications at
the end of every query. PQconsumeInput() reports a hard failure with the
current implementation, but its return value is ignored by
PrintNotifications(). Otherwise this probably would have showed up
earlier.

(What's the best way to test this case? Are there lower-level tests for
the protocol/network layer somewhere that I'm missing?)

While patching this case, I also noticed that pgtls_read() doesn't call
SOCK_ERRNO_SET() for the disconnection case. That is also in the
attached patch.

--Jacob


Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Next
From: Tom Lane
Date:
Subject: Re: Bogus documentation for bogus geometric operators