Thread: buildfarm server suddenly not talking to old SSL stacks?
My buildfarm animals dromedary and prairiedog have been failing since
around 9AM EDT on Sunday.  The buildfarm script output isn't very
detailed:

getting branches of interest (https://buildfarm.postgresql.org/branches_of_interest.txt) at ./run_branches.pl line 129.

but trying it manually yields

$ curl https://buildfarm.postgresql.org/branches_of_interest.txt
curl: (35) error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

The same thing works fine on newer machines though, as does fetching with
http: instead of https:.  Have we done something recently to create an
incompatibility with old SSL stacks?

regards, tom lane
On 2018-Jul-16, Tom Lane wrote:
> My buildfarm animals dromedary and prairiedog have been failing since
> around 9AM EDT on Sunday.  The buildfarm script output isn't very
> detailed:
>
> getting branches of interest (https://buildfarm.postgresql.org/branches_of_interest.txt) at ./run_branches.pl line 129.
>
> but trying it manually yields
>
> $ curl https://buildfarm.postgresql.org/branches_of_interest.txt
> curl: (35) error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
>
> The same thing works fine on newer machines though, as does fetching with
> http: instead of https:.  Have we done something recently to create an
> incompatibility with old SSL stacks?

Yeah, there were a few updates that day at 11am UTC; particularly the
ca-certificates package was updated (to version 20161130+nmu1+deb9u1).
I don't know why this would be significant (is the server trying to
verify the client's cert?), but here's the changelog:

ca-certificates (20161130+nmu1+deb9u1) stretch; urgency=medium

  * debian/ca-certificates.postinst:
    Prevent postinst failure on read-only /usr/local. Closes: #843722
  * debian/control:
    Remove Christian Perrier from uploaders at his request. Closes: #894070
  * mozilla/{certdata.txt,nssckbi.h}:
    Update Mozilla certificate authority bundle to version 2.22.
    Closes: #858064
    The following certificate authorities were added (+):
    + "AC RAIZ FNMT-RCM"
    + "Amazon Root CA 1"
    + "Amazon Root CA 2"
    + "Amazon Root CA 3"
    + "Amazon Root CA 4"
    + "D-TRUST Root CA 3 2013"
    + "GDCA TrustAUTH R5 ROOT"
    + "LuxTrust Global Root 2"
    + "SSL.com EV Root Certification Authority ECC"
    + "SSL.com EV Root Certification Authority RSA R2"
    + "SSL.com Root Certification Authority ECC"
    + "SSL.com Root Certification Authority RSA"
    + "Symantec Class 1 Public Primary Certification Authority - G4"
    + "Symantec Class 1 Public Primary Certification Authority - G6"
    + "Symantec Class 2 Public Primary Certification Authority - G4"
    + "Symantec Class 2 Public Primary Certification Authority - G6"
    + "TrustCor ECA-1"
    + "TrustCor RootCert CA-1"
    + "TrustCor RootCert CA-2"
    + "TUBITAK Kamu SM SSL Kok Sertifikasi - Surum 1"
    The following certificate authorities were removed (-):
    - "ACEDICOM Root"
    - "AddTrust Public Services Root"
    - "AddTrust Qualified Certificates Root"
    - "ApplicationCA - Japanese Government"
    - "Buypass Class 2 CA 1"
    - "CA Disig Root R1"
    - "Certinomis - Autorité Racine"
    - "China Internet Network Information Center EV Certificates Root"
    - "CNNIC ROOT"
    - "Comodo Secure Services root"
    - "Comodo Trusted Services root"
    - "DST ACES CA X6"
    - "EBG Elektronik Sertifika Hizmet Saglayicisi"
    - "Equifax Secure CA"
    - "Equifax Secure eBusiness CA 1"
    - "Equifax Secure Global eBusiness CA"
    - "GeoTrust Global CA 2"
    - "IGC/A"
    - "Juur-SK"
    - "Microsec e-Szigno Root CA"
    - "PSCProcert"
    - "Root CA Generalitat Valenciana"
    - "RSA Security 2048 v3"
    - "Security Communication EV RootCA1"
    - "S-TRUST Authentication and Encryption Root CA 2005 PN"
    - "Swisscom Root CA 1"
    - "Swisscom Root EV CA 2"
    - "TUBITAK UEKAE Kok Sertifika Hizmet Saglayicisi - Surum 3"
    - "TURKTRUST Certificate Services Provider Root 2007"
    - "TÜRKTRUST Elektronik Sertifika Hizmet Sağlayıcısı H6"
    - "UTN USERFirst Hardware Root CA"
    - "Verisign Class 1 Public Primary Certification Authority"
    - "Verisign Class 2 Public Primary Certification Authority - G2"
    - "Verisign Class 3 Public Primary Certification Authority"
    - "WellsSecure Public Root Certificate Authority"

 -- Michael Shuler <michael@pbandjelly.org>  Sat, 07 Jul 2018 01:08:40 +0200

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2018-Jul-16, Tom Lane wrote:
>> My buildfarm animals dromedary and prairiedog have been failing since
>> around 9AM EDT on Sunday. ... Have we done something recently to create an
>> incompatibility with old SSL stacks?

> Yeah, there were a few updates that day at 11am UTC; particularly the
> ca-certificates package was updated (to version 20161130+nmu1+deb9u1).

Ah, that sounds plausibly related.  Guess I need a certificate update
on those machines.  Thanks!

regards, tom lane
On Tue, Jul 17, 2018 at 7:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > On 2018-Jul-16, Tom Lane wrote:
> >> My buildfarm animals dromedary and prairiedog have been failing since
> >> around 9AM EDT on Sunday. ... Have we done something recently to create an
> >> incompatibility with old SSL stacks?
> > Yeah, there were a few updates that day at 11am UTC; particularly the
> > ca-certificates package was updated (to version 20161130+nmu1+deb9u1).
> Ah, that sounds plausibly related. Guess I need a certificate update
> on those machines. Thanks!
We also changed some of the server setup so there is now a haproxy that's doing the SSL termination. So there is probably a slightly different configuration of available SSL algorithms and such as well. It might be either one of those two, both changes happened not too far apart on that day.
On 07/16/2018 08:31 PM, Tom Lane wrote:
> My buildfarm animals dromedary and prairiedog have been failing since
> around 9AM EDT on Sunday.  The buildfarm script output isn't very
> detailed:
>
> getting branches of interest (https://buildfarm.postgresql.org/branches_of_interest.txt) at ./run_branches.pl line 129.
>
> but trying it manually yields
>
> $ curl https://buildfarm.postgresql.org/branches_of_interest.txt
> curl: (35) error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
>
> The same thing works fine on newer machines though, as does fetching with
> http: instead of https:.  Have we done something recently to create an
> incompatibility with old SSL stacks?

Maybe something to do with this?:

https://blog.pcisecuritystandards.org/are-you-ready-for-30-june-2018-sayin-goodbye-to-ssl-early-tls

>
> regards, tom lane
>

-- 
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, Jul 17, 2018 at 3:20 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 07/16/2018 08:31 PM, Tom Lane wrote:
> > My buildfarm animals dromedary and prairiedog have been failing since
> > around 9AM EDT on Sunday. The buildfarm script output isn't very
> > detailed:
> >
> > getting branches of interest (https://buildfarm.postgresql.org/branches_of_interest.txt) at ./run_branches.pl line 129.
> >
> > but trying it manually yields
> >
> > $ curl https://buildfarm.postgresql.org/branches_of_interest.txt
> > curl: (35) error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
> >
> > The same thing works fine on newer machines though, as does fetching with
> > http: instead of https:. Have we done something recently to create an
> > incompatibility with old SSL stacks?
>
> Maybe something to do with this?:
> https://blog.pcisecuritystandards.org/are-you-ready-for-30-june-2018-sayin-goodbye-to-ssl-early-tls
Our buildfarm does not require PCI classification :P
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jul 17, 2018 at 7:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> My buildfarm animals dromedary and prairiedog have been failing since
>>> around 9AM EDT on Sunday. ... Have we done something recently to create
>>> an incompatibility with old SSL stacks?

> We also changed some of the server setup so there is now a haproxy that's
> doing the SSL termination. So there is probably a slightly different
> configuration of available SSL algorithms and such as well. It might be
> either one of those two, both changes happened not too far apart on that
> day.

Hm.  Closer investigation suggests that there's something else wrong.
While, as I said, curl works for non-SSL connections:

$ curl http://buildfarm.postgresql.org/branches_of_interest.txt
REL9_3_STABLE
REL9_4_STABLE
REL9_5_STABLE
REL9_6_STABLE
REL_10_STABLE
REL_11_STABLE
HEAD

doing the same thing the way the buildfarm script does it does not work:

$ perl -MLWP::Simple -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
500 Can't connect to buildfarm.postgresql.org:80 (No route to host) <URL:http://buildfarm.postgresql.org/branches_of_interest.txt>

That's on dromedary's host with perl 5.10.0.  Even weirder, it
*does* work on prairiedog's host with perl 5.8.3.  I think that the
latter installation is newer and hence may have newer copies of
some CPAN-supplied modules, but I'm not sure how to debug further.

Also, on prairiedog's host, this is what I get for the https case:

$ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("https://buildfarm.postgresql.org/branches_of_interest.txt");'
500 Can't connect to buildfarm.postgresql.org:443 <URL:https://buildfarm.postgresql.org/branches_of_interest.txt>

which isn't terribly informative but it doesn't look like an SSL
certificate failure.

I've temporarily revived prairiedog by changing its config to report
to http not https.  But dromedary is dead in the water until this
gets sorted.

BTW, Noah's AIX critters may be suffering from the same problem;
I'd have expected them to report in by now on recent HEAD changes...

regards, tom lane
On 2018-Jul-17, Tom Lane wrote:
> doing the same thing the way the buildfarm script does it does not work:
>
> $ perl -MLWP::Simple -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
> 500 Can't connect to buildfarm.postgresql.org:80 (No route to host) <URL:http://buildfarm.postgresql.org/branches_of_interest.txt>

I don't know if Varnish catches https calls as well as http, but if it
does, it could very well be related.  A Varnish cache was added recently
to buildfarm.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jul 17, 2018 at 7:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Tue, Jul 17, 2018 at 7:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> My buildfarm animals dromedary and prairiedog have been failing since
> >>> around 9AM EDT on Sunday. ... Have we done something recently to create
> >>> an incompatibility with old SSL stacks?
> > We also changed some of the server setup so there is now a haproxy that's
> > doing the SSL termination. So there is probably a slightly different
> > configuration of available SSL algorithms and such as well. It might be
> > either one of those two, both changes happened not too far apart on that
> > day.
>
> Hm. Closer investigation suggests that there's something else wrong.
> While, as I said, curl works for non-SSL connections:
>
> $ curl http://buildfarm.postgresql.org/branches_of_interest.txt
> REL9_3_STABLE
> REL9_4_STABLE
> REL9_5_STABLE
> REL9_6_STABLE
> REL_10_STABLE
> REL_11_STABLE
> HEAD
>
> doing the same thing the way the buildfarm script does it does not work:
>
> $ perl -MLWP::Simple -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
> 500 Can't connect to buildfarm.postgresql.org:80 (No route to host) <URL:http://buildfarm.postgresql.org/branches_of_interest.txt>

OK, that's just weird. It's failing to connect on port *80* with a "No route to host" error? That sounds more like it would be on a network layer?

I could understand many weird errors on it, but no route to host seems extremely weird. Almost indicates it would be connecting to the wrong IP.

> That's on dromedary's host with perl 5.10.0. Even weirder, it
> *does* work on prairiedog's host with perl 5.8.3. I think that the
> latter installation is newer and hence may have newer copies of
> some CPAN-supplied modules, but I'm not sure how to debug further.
>
> Also, on prairiedog's host, this is what I get for the https case:
>
> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("https://buildfarm.postgresql.org/branches_of_interest.txt");'
> 500 Can't connect to buildfarm.postgresql.org:443 <URL:https://buildfarm.postgresql.org/branches_of_interest.txt>
>
> which isn't terribly informative but it doesn't look like an SSL
> certificate failure.

That one I believe more in since it could be because of SSL issues. What do you get with curl on that one?
On Tue, Jul 17, 2018 at 7:08 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2018-Jul-17, Tom Lane wrote:
> > doing the same thing the way the buildfarm script does it does not work:
> >
> > $ perl -MLWP::Simple -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
> > 500 Can't connect to buildfarm.postgresql.org:80 (No route to host) <URL:http://buildfarm.postgresql.org/branches_of_interest.txt>
>
> I don't know if Varnish catches https calls as well as http, but if it
> does, it could very well be related. A Varnish cache was added recently
> to buildfarm.
https is terminated in haproxy and relayed from there. Varnish doesn't speak native https.
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jul 17, 2018 at 7:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, on prairiedog's host, this is what I get for the https case:
>>
>> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("https://buildfarm.postgresql.org/branches_of_interest.txt");'
>> 500 Can't connect to buildfarm.postgresql.org:443 <URL:https://buildfarm.postgresql.org/branches_of_interest.txt>
>>
>> which isn't terribly informative but it doesn't look like an SSL
>> certificate failure.

> That one I believe more in since it could be because of SSL issues. What do
> you get with curl on that one?

Both machines show the same behavior with curl:

$ curl https://buildfarm.postgresql.org/branches_of_interest.txt
curl: (35) error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

$ curl http://buildfarm.postgresql.org/branches_of_interest.txt
REL9_3_STABLE
REL9_4_STABLE
REL9_5_STABLE
REL9_6_STABLE
REL_10_STABLE
REL_11_STABLE
HEAD

Now, curl is the OS-supplied one and probably isn't sharing any userspace
infrastructure at all with prairiedog's Perl stack.  On the other hand,
dromedary is using Apple's perl installation so it's possible that it
shares root-certificate infrastructure with curl.

regards, tom lane
On Tue, Jul 17, 2018 at 7:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Tue, Jul 17, 2018 at 7:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Also, on prairiedog's host, this is what I get for the https case:
> >>
> >> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("https://buildfarm.postgresql.org/branches_of_interest.txt");'
> >> 500 Can't connect to buildfarm.postgresql.org:443 <URL:https://buildfarm.postgresql.org/branches_of_interest.txt>
> >>
> >> which isn't terribly informative but it doesn't look like an SSL
> >> certificate failure.
> > That one I believe more in since it could be because of SSL issues. What do
> > you get with curl on that one?
>
> Both machines show the same behavior with curl:
>
> $ curl https://buildfarm.postgresql.org/branches_of_interest.txt
> curl: (35) error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

Ah. Some googling shows that does seem to indicate an old version of OpenSSL.

The old config rejected sslv2 and sslv3, but allowed tlsv1.
The new one refuses both tlsv1 and tlsv1.1, allowing only tlsv1.2.

As a check if this might be it, I have at least temporarily removed that restriction. Can you try again now?

> $ curl http://buildfarm.postgresql.org/branches_of_interest.txt
> REL9_3_STABLE
> REL9_4_STABLE
> REL9_5_STABLE
> REL9_6_STABLE
> REL_10_STABLE
> REL_11_STABLE
> HEAD
>
> Now, curl is the OS-supplied one and probably isn't sharing any userspace
> infrastructure at all with prairiedog's Perl stack. On the other hand,
> dromedary is using Apple's perl installation so it's possible that it
> shares root-certificate infrastructure with curl.
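
[Editor's sketch] The protocol-version theory in this message (tlsv1 and tlsv1.1 refused, only tlsv1.2 allowed) could be checked from any client by attempting one handshake per protocol version. The following is a minimal sketch, not something anyone in the thread ran: the function name `probe_tls` is hypothetical, port 443 is assumed, and the `-tls1`, `-tls1_1` and `-tls1_2` flags of `openssl s_client` are only available where the local OpenSSL build supports those protocols. An old OpenSSL that lacks `-tls1_2` will simply report a failure for it, which is itself informative.

```shell
# Hypothetical helper (not from the thread): try one TLS handshake per
# protocol version against host:443 and report which ones succeed.
probe_tls() {
    host=$1
    for flag in -tls1 -tls1_1 -tls1_2; do
        # stdin from /dev/null makes s_client exit right after the handshake
        if openssl s_client -connect "$host:443" "$flag" </dev/null >/dev/null 2>&1; then
            echo "$flag: handshake ok"
        else
            echo "$flag: handshake failed (or flag unsupported by this openssl)"
        fi
    done
}
```

Usage would look like `probe_tls buildfarm.postgresql.org`; a server restricted to TLS 1.2 would be expected to fail the first two probes.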
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jul 17, 2018 at 7:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Both machines show the same behavior with curl:
>> $ curl https://buildfarm.postgresql.org/branches_of_interest.txt
>> curl: (35) error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert
>> protocol version

> Ah. Some googling shows that does seem to indicate an old version of
> OpenSSL.
> The old config rejected sslv2 and sslv3, but allowed tlsv1.
> The new one refuses both tlsv1 and tlsv1.1, allowing only tlsv1.2.
> As a check if this might be it, I have at least temporarily removed that
> restriction. Can you try again now?

Same results, both via curl and via perl.

regards, tom lane
On Tue, Jul 17, 2018 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Tue, Jul 17, 2018 at 7:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Both machines show the same behavior with curl:
> >> $ curl https://buildfarm.postgresql.org/branches_of_interest.txt
> >> curl: (35) error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert
> >> protocol version
> > Ah. Some googling shows that does seem to indicate an old version of
> > OpenSSL.
> > The old config rejected sslv2 and sslv3, but allowed tlsv1.
> > The new one refuses both tlsv1 and tlsv1.1, allowing only tlsv1.2.
> > As a check if this might be it, I have at least temporarily removed that
> > restriction. Can you try again now?
> Same results, both via curl and via perl.
Ha. I changed the client config instead of the server :/ Sorry about that, once more?
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jul 17, 2018 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Magnus Hagander <magnus@hagander.net> writes:
>>> The old config rejected sslv2 and sslv3, but allowed tlsv1.
>>> The new one refuses both tlsv1 and tlsv1.1, allowing only tlsv1.2.
>>> As a check if this might be it, I have at least temporarily removed that
>>> restriction. Can you try again now?

>> Same results, both via curl and via perl.

> Ha. I changed the client config instead of the server :/ Sorry about that,
> once more?

Better.  On prairiedog, I now get

$ curl https://buildfarm.postgresql.org/branches_of_interest.txt
curl: (60) SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). The default
 bundle is named curl-ca-bundle.crt; you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

and with -k it actually works:

$ curl -k https://buildfarm.postgresql.org/branches_of_interest.txt
REL9_3_STABLE
REL9_4_STABLE
REL9_5_STABLE
REL9_6_STABLE
REL_10_STABLE
REL_11_STABLE
HEAD

and what's more useful for the purpose at hand, so does perl:

$ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("https://buildfarm.postgresql.org/branches_of_interest.txt");'
REL9_3_STABLE
REL9_4_STABLE
REL9_5_STABLE
REL9_6_STABLE
REL_10_STABLE
REL_11_STABLE
HEAD

The fact that Perl is happy may have something to do with my having just
updated Mozilla::CA on these machines, which so far as I can find is
Perl's only source of root certs.  But curl is using the OS' keystore
which of course is horribly behind the times.

The results on dromedary are even more interesting:

$ curl https://buildfarm.postgresql.org/branches_of_interest.txt
REL9_3_STABLE
REL9_4_STABLE
REL9_5_STABLE
REL9_6_STABLE
REL_10_STABLE
REL_11_STABLE
HEAD

(So, system keystore less out of date here...)

$ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
500 Can't connect to buildfarm.postgresql.org:80 (No route to host) <URL:http://buildfarm.postgresql.org/branches_of_interest.txt>

$ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("https://buildfarm.postgresql.org/branches_of_interest.txt");'
REL9_3_STABLE
REL9_4_STABLE
REL9_5_STABLE
REL9_6_STABLE
REL_10_STABLE
REL_11_STABLE
HEAD

I have no idea what to make of the fact that http: still fails with this
perl version.  But I think we've conclusively proven that the problem with
https: is down to these machines trying to use tlsv1.

So the next question is what to do about it.  Is tls < 1.2 officially
deprecated these days, or was that configuration change just accidental?

I can probably restore these machines to functionality by updating
whichever Perl module knows about TLS (anyone know which that is?),
so if you want to undo the config change, it's OK by me.  But other
owners of ancient buildfarm critters might be less happy about it.

regards, tom lane
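
[Editor's sketch] The two curl failures seen in this thread are different classes of problem: exit status 35 is a failed TLS handshake (here, the protocol version being rejected), while 60 means the handshake got far enough for certificate verification to fail against the local CA bundle. A minimal sketch of telling them apart in a script follows; the function name `classify_curl_exit` is hypothetical, and only exit statuses documented by curl are mapped.

```shell
# Hypothetical helper (not from the thread): map a curl exit status to
# the failure class it indicates.
classify_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        7)  echo "could not connect to host" ;;
        35) echo "TLS handshake failed (for example, protocol version rejected)" ;;
        60) echo "server certificate could not be verified against the CA bundle" ;;
        *)  echo "other curl error ($1)" ;;
    esac
}
```

A typical use would be `curl -sS -o /dev/null "$url"; classify_curl_exit $?`, which separates "the server will not speak my TLS version" from "my root certificates are out of date".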
On Tue, Jul 17, 2018 at 8:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Tue, Jul 17, 2018 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

<snip>

> The results on dromedary are even more interesting:
>
> $ curl https://buildfarm.postgresql.org/branches_of_interest.txt
> REL9_3_STABLE
> REL9_4_STABLE
> REL9_5_STABLE
> REL9_6_STABLE
> REL_10_STABLE
> REL_11_STABLE
> HEAD
>
> (So, system keystore less out of date here...)
>
> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
> 500 Can't connect to buildfarm.postgresql.org:80 (No route to host) <URL:http://buildfarm.postgresql.org/branches_of_interest.txt>
>
> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("https://buildfarm.postgresql.org/branches_of_interest.txt");'
> REL9_3_STABLE
> REL9_4_STABLE
> REL9_5_STABLE
> REL9_6_STABLE
> REL_10_STABLE
> REL_11_STABLE
> HEAD
>
> I have no idea what to make of the fact that http: still fails with this
> perl version.

Yeah, that part is super weird. Do we know if that worked before? Or has it been using https for a while?

> But I think we've conclusively proven that the problem with
> https: is down to these machines trying to use tlsv1.
>
> So the next question is what to do about it. Is tls < 1.2 officially
> deprecated these days, or was that configuration change just accidental?

It absolutely is. I actually thought we had already blocked that in the *previous* setup, but clearly we hadn't :)

That said, the buildfarm doesn't really do things that are that sensitive. So we can probably turn it off on that individual machine if we have to. Right now our config management will flip the configuration right back shortly, but I can probably get that sorted out pretty easily.

> I can probably restore these machines to functionality by updating
> whichever Perl module knows about TLS (anyone know which that is?),
> so if you want to undo the config change, it's OK by me. But other
> owners of ancient buildfarm critters might be less happy about it.

I think what you'd need is a new version of openssl.

But it might be hard to get in on all of them. Let's see if we can turn off the restriction for a while, and see if the other BF animals also recover.
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jul 17, 2018 at 8:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The results on dromedary are even more interesting:
>>
>> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
>> 500 Can't connect to buildfarm.postgresql.org:80 (No route to host) <URL:http://buildfarm.postgresql.org/branches_of_interest.txt>

> Yeah, that part is super weird. Do we know if that worked before? Or has it
> been using https for a while?

It looks like I installed Perl https support on that machine on
2017-01-14, so I'd guess dromedary has been using https since then.

>> I can probably restore these machines to functionality by updating
>> whichever Perl module knows about TLS (anyone know which that is?),
>> so if you want to undo the config change, it's OK by me.  But other
>> owners of ancient buildfarm critters might be less happy about it.

> I think what you'd need is a new version of openssl.

Yeah, I'd just come to that conclusion after researching things a bit
(although it looks like IO::Socket::SSL has some relevant fixes too).

> But it might be hard to get in on all of them. Let's see if we can turn off
> the restriction for a while, and see if the other BF animals also recover.

The bigger issue here is that if we force buildfarm members to run
openssl >= x.y, I'd say that's tantamount to desupporting openssl < x.y.
Are we ready to desupport versions that don't have TLS 1.2?  I think
that might well be reasonable to do in HEAD, but I'm less enthused about
it for the back branches.

regards, tom lane
On Tue, Jul 17, 2018 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Tue, Jul 17, 2018 at 8:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> The results on dromedary are even more interesting:
> >>
> >> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
> >> 500 Can't connect to buildfarm.postgresql.org:80 (No route to host) <URL:http://buildfarm.postgresql.org/branches_of_interest.txt>
> > Yeah, that part is super weird. Do we know if that worked before? Or has it
> > been using https for a while?
>
> It looks like I installed Perl https support on that machine on
> 2017-01-14, so I'd guess dromedary has been using https since then.

So it could be something else. I have no idea what it would be though, since port 80 seems to work from elsewhere.

> >> I can probably restore these machines to functionality by updating
> >> whichever Perl module knows about TLS (anyone know which that is?),
> >> so if you want to undo the config change, it's OK by me. But other
> >> owners of ancient buildfarm critters might be less happy about it.
> > I think what you'd need is a new version of openssl.
>
> Yeah, I'd just come to that conclusion after researching things a bit
> (although it looks like IO::Socket::SSL has some relevant fixes too).
>
> > But it might be hard to get in on all of them. Let's see if we can turn off
> > the restriction for a while, and see if the other BF animals also recover.
>
> The bigger issue here is that if we force buildfarm members to run
> openssl >= x.y, I'd say that's tantamount to desupporting openssl < x.y.
> Are we ready to desupport versions that don't have TLS 1.2? I think
> that might well be reasonable to do in HEAD, but I'm less enthused about
> it for the back branches.
Yeah, that's definitely a bigger problem.
We could always use http for those and not https. But surely that's *worse* than using an https that's considered insecure. Completely skipping it must be worse... And I don't think separating out the site into "submissions can do 1.0 but viewers can only do 1.2+" is reasonable, not given that the only things that actually pass credentials *are* the submissions.
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jul 17, 2018 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
>>> 500 Can't connect to buildfarm.postgresql.org:80 (No route to host)
>>> <URL:http://buildfarm.postgresql.org/branches_of_interest.txt>

> So it could be something else. I have no idea what it would be though,
> since port 80 seems to work from elsewhere.

Oh, and before you decide to get back in the water ... I just tried this
stuff from my RHEL6 server.  curl is fine, but:

$ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
REL9_3_STABLE
REL9_4_STABLE
REL9_5_STABLE
REL9_6_STABLE
REL_10_STABLE
REL_11_STABLE
HEAD
$ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("https://buildfarm.postgresql.org/branches_of_interest.txt");'
500 Can't connect to buildfarm.postgresql.org:443 (connect: Network is unreachable) <URL:https://buildfarm.postgresql.org/branches_of_interest.txt>

Now I'm completely confused.  This is going through a different ISP
and significantly different network path to reach rackspace, but
that shouldn't have anything to do with it?  *Something* is darn
weird here.

regards, tom lane
On 07/17/2018 08:58 PM, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Tue, Jul 17, 2018 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
>>>> 500 Can't connect to buildfarm.postgresql.org:80 (No route to host)
>>>> <URL:http://buildfarm.postgresql.org/branches_of_interest.txt>
>
>> So it could be something else. I have no idea what it would be though,
>> since port 80 seems to work from elsewhere.
>
> Oh, and before you decide to get back in the water ... I just tried this
> stuff from my RHEL6 server.  curl is fine, but:
>
> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("http://buildfarm.postgresql.org/branches_of_interest.txt");'
> REL9_3_STABLE
> REL9_4_STABLE
> REL9_5_STABLE
> REL9_6_STABLE
> REL_10_STABLE
> REL_11_STABLE
> HEAD
> $ perl -MLWP::Simple -MLWP::Protocol::https -e 'LWP::Simple::getprint("https://buildfarm.postgresql.org/branches_of_interest.txt");'
> 500 Can't connect to buildfarm.postgresql.org:443 (connect: Network is unreachable) <URL:https://buildfarm.postgresql.org/branches_of_interest.txt>
>
> Now I'm completely confused.  This is going through a different ISP
> and significantly different network path to reach rackspace, but
> that shouldn't have anything to do with it?  *Something* is darn
> weird here.

given it does not yet seem to have been discussed in this thread - ipv4
vs ipv6 (either direct or indirect through CGN or similar technologies)?

Stefan
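
[Editor's sketch] One way to test the IPv4-versus-IPv6 idea raised here is to force each address family in turn with curl's `-4` and `-6` options and compare the outcomes. This is only a sketch under assumptions: the function name `check_families` and the 5-second connect timeout are illustrative choices, not something from the thread.

```shell
# Hypothetical helper (not from the thread): fetch a URL once per address
# family so an IPv4-only or IPv6-only failure becomes visible.
check_families() {
    url=$1
    for fam in -4 -6; do
        if curl "$fam" -s --connect-timeout 5 -o /dev/null "$url" 2>/dev/null; then
            echo "curl $fam: ok"
        else
            echo "curl $fam: failed"
        fi
    done
}
```

For example, `check_families http://buildfarm.postgresql.org/branches_of_interest.txt` prints one line per family; a client library that prefers an unreachable family could plausibly produce "No route to host" or "Network is unreachable" errors while curl succeeds over the other family.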
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> On 07/17/2018 08:58 PM, Tom Lane wrote:
>> Now I'm completely confused.  This is going through a different ISP
>> and significantly different network path to reach rackspace, but
>> that shouldn't have anything to do with it?  *Something* is darn
>> weird here.

> given it does not yet seem to have been discussed in this thread - ipv4
> vs ipv6 (either direct or indirect through CGN or similar technologies)?

Good thought, but at least at my end it's all IPv4, and traceroute doesn't
suggest there's anything else between.  The RHEL6 box sees this traceroute:

$ traceroute buildfarm.postgresql.org
traceroute to buildfarm.postgresql.org (174.143.35.217), 30 hops max, 60 byte packets
 1  router1.sss.pgh.pa.us (192.168.168.5)  3.705 ms  4.465 ms  5.528 ms
 2  192.168.252.29 (192.168.252.29)  18.278 ms  19.285 ms  20.719 ms
 3  gw.aspStation.net (66.207.128.1)  22.197 ms  23.351 ms  24.578 ms
 4  144.232.10.211 (144.232.10.211)  25.990 ms  27.226 ms  28.582 ms
 5  144.232.10.210 (144.232.10.210)  29.884 ms  31.063 ms  33.532 ms
 6  144.232.14.7 (144.232.14.7)  32.215 ms  46.337 ms  44.983 ms
 7  144.232.15.174 (144.232.15.174)  46.679 ms 144.232.14.10 (144.232.14.10)  34.810 ms  35.586 ms
 8  144.232.15.121 (144.232.15.121)  35.414 ms  34.871 ms  35.634 ms
 9  sl-above1-722053-0.sprintlink.net (144.228.205.158)  35.691 ms  35.101 ms  35.580 ms
10  ae1.cr2.dca2.us.zip.zayo.com (64.125.20.121)  35.510 ms  34.947 ms  35.528 ms
11  ae27.cs2.dca2.us.eth.zayo.com (64.125.30.248)  83.111 ms  82.605 ms  87.833 ms
12  ae4.cs2.lga5.us.eth.zayo.com (64.125.29.30)  98.348 ms  97.706 ms  97.689 ms
13  ae3.cs2.ord2.us.eth.zayo.com (64.125.29.213)  85.612 ms  85.198 ms  85.710 ms
14  ae5.cs2.den5.us.eth.zayo.com (64.125.29.216)  85.700 ms  85.143 ms  85.800 ms
15  ae7.cs2.den5.us.eth.zayo.com (64.125.31.237)  91.259 ms  90.791 ms  91.310 ms
16  ae28.er1.dfw2.us.zip.zayo.com (64.125.26.15)  85.946 ms  69.331 ms  69.892 ms
17  128.177.70.86.IPYX-076520-900-ZYO.zip.zayo.com (128.177.70.86)  69.761 ms  87.568 ms  88.741 ms
18  * * *
19  74.205.108.121 (74.205.108.121)  85.957 ms be42.coreb.dfw1.rackspace.net (74.205.108.125)  85.334 ms be41.corea.dfw1.rackspace.net (74.205.108.113)  85.820 ms
20  core5-coreb.dfw1.rackspace.net (74.205.108.27)  76.332 ms po1.CoreA.core6.dfw1.rackspace.net (72.32.111.13)  84.456 ms po2.CoreB.core6.dfw1.rackspace.net (72.32.111.15)  77.344 ms
21  core5-aggr313a.dfw1.rackspace.net (67.192.56.63)  83.224 ms  82.713 ms  71.320 ms
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

while the buildfarm critters are going through verizon:

$ traceroute buildfarm.postgresql.org
traceroute to buildfarm.postgresql.org (174.143.35.217), 64 hops max, 40 byte packets
 1  router2 (192.168.168.2)  5.031 ms  4.108 ms  4.195 ms
 2  * * *
 3  b3309.pitbpa-lcr-22.verizon-gni.net (130.81.28.70)  133.770 ms  8.250 ms  8.069 ms
 4  * * *
 5  * * *
 6  0.et-7-3-0.br1.iad8.alter.net (140.222.239.83)  16.920 ms 0.et-5-1-5.br1.iad8.alter.net (140.222.0.65)  16.547 ms  14.896 ms
 7  xe-2-1-0.er2.iad10.us.zip.zayo.com (64.125.13.173)  15.706 ms  14.482 ms  14.481 ms
 8  ae1.cr2.dca2.us.zip.zayo.com (64.125.20.121)  15.731 ms  17.841 ms  17.265 ms
 9  ae27.cs2.dca2.us.eth.zayo.com (64.125.30.248)  68.536 ms  68.973 ms  67.971 ms
10  ae4.cs2.lga5.us.eth.zayo.com (64.125.29.30)  67.021 ms  68.627 ms  67.130 ms
11  ae3.cs2.ord2.us.eth.zayo.com (64.125.29.213)  67.277 ms  67.727 ms  68.079 ms
12  ae5.cs2.den5.us.eth.zayo.com (64.125.29.216)  66.520 ms  69.493 ms  73.931 ms
13  ae7.cs2.den5.us.eth.zayo.com (64.125.31.237)  193.146 ms  66.854 ms  67.034 ms
14  ae28.er1.dfw2.us.zip.zayo.com (64.125.26.15)  149.815 ms  67.910 ms  70.480 ms
15  128.177.70.86.ipyx-076520-900-zyo.zip.zayo.com (128.177.70.86)  68.615 ms  78.398 ms  70.127 ms
16  * * *
17  be41.coreb.dfw1.rackspace.net (74.205.108.117)  73.871 ms 74.205.108.121 (74.205.108.121)  67.374 ms  70.223 ms
18  core5-corea.dfw1.rackspace.net (74.205.108.11)  72.848 ms po1.corea.core6.dfw1.rackspace.net (72.32.111.13)  69.176 ms core5-corea.dfw1.rackspace.net (74.205.108.11)  70.037
ms 19 po2.core6.aggr313a.rackspace.net (72.32.111.211) 203.537 ms 70.130 ms core5-aggr313a.dfw1.rackspace.net (67.192.56.63) 70.854 ms 20 * * * 21 * * * 22 * * * 23 * * * 24 * * * 25 * * * 26 * * * 27 * * * 28 * * * 29 * * * 30 * * * Also, if the issue is somewhere in between, that fails to explain why "curl" works but not perl. regards, tom lane
On 07/17/2018 09:22 PM, Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> given it does not yet seem to have been discussed in this thread - ipv4
>> vs ipv6 (either direct or indirect through CGN or similar technologies)?
>
> Good thought, but at least at my end it's all IPv4, and traceroute
> doesn't suggest there's anything else between.
>
> Also, if the issue is somewhere in between, that fails to explain why
> "curl" works but not perl.

not sure that proves that v4 vs v6 is out entirely, there could be other
factors involved (like your isp mapping v4 to v6 on the router(!) maybe
combined with dns64/dns46 related tricks).

It should still be possible to figure out where exactly the "network
unreachable" comes from - whether it is something the local tcp/ip stack
generates or something that comes in as an icmp-error from remote using
tcpdump or similar.

Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> On 07/17/2018 09:22 PM, Tom Lane wrote:
>> Also, if the issue is somewhere in between, that fails to explain why
>> "curl" works but not perl.

> not sure that proves that v4 vs v6 is out entirely, there could be other
> factors involved (like your isp mapping v4 to v6 on the router(!) maybe
> combined with dns64/dns46 related tricks).
> It should still be possible to figure out where exactly the "network
> unreachable" comes from - whether it is something the local tcp/ip stack
> generates or something that comes in as an icmp-error from remote using
> tcpdump or similar.

Good idea ... tcpdump says *nothing at all* is happening during the
https request, which led me to try strace'ing the perl run, and that
tells the tale:

[ lots of setup omitted ]
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "\370-\1\0\0\1\0\0\0\0\0\0\tbuildfarm\npostgresq"..., 42, MSG_NOSIGNAL, NULL, 0) = 42
poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
ioctl(3, FIONREAD, [70]) = 0
recvfrom(3, "\370-\201\200\0\1\0\1\0\0\0\0\tbuildfarm\npostgresq"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 70
close(3) = 0
socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffe430799c0) = -1 EINVAL (Invalid argument)
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffe430799c0) = -1 EINVAL (Invalid argument)
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
bind(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:4800:1501:1::217", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
close(3) = 0
write(2, "500 Can't connect to buildfarm.p"..., 83500 Can't connect to buildfarm.postgresql.org:443 (connect: Network is unreachable)) = 83
write(2, " <URL:https://buildfarm.postgres"..., 65 <URL:https://buildfarm.postgresql.org/branches_of_interest.txt>
) = 65

So for some reason, perl's https support is trying to bind to the IPv6
address of buildfarm.postgresql.org, even though no IPv6 support is
configured at all on this machine. I wonder how long that's been going
on? Has anything about the machine's DNS entries changed recently?
(Also, "ssh buildfarm.postgresql.org" binds to IPv4 just fine.)

Also, checking the equally inexplicable failure on dromedary, it looks
like the explanation might be similar there, only reversed: the http:
request produces zero interface traffic, suggesting that it's getting
mapped to an IPv6 address. I don't seem to have a working strace
equivalent on that machine so it's harder to be sure about it.

regards, tom lane
On 07/17/2018 10:14 PM, Tom Lane wrote:
> So for some reason, perl's https support is trying to bind to the IPv6
> address of buildfarm.postgresql.org, even though no IPv6 support is
> configured at all on this machine. I wonder how long that's been going
> on? Has anything about the machine's DNS entries changed recently?
> (Also, "ssh buildfarm.postgresql.org" binds to IPv4 just fine.)
>
> Also, checking the equally inexplicable failure on dromedary, it looks
> like the explanation might be similar there, only reversed: the http:
> request produces zero interface traffic, suggesting that it's getting
> mapped to an IPv6 address. I don't seem to have a working strace
> equivalent on that machine so it's harder to be sure about it.

I don't think there have been any recent changes on (DNS) v6 for
brentalia - afaics in our internal revision control we have had v6 on
that box for at least 2 years now.

However could it be that whatever DNS resolver those boxes are using
just started to return AAAAs as well (the strsize in the strace output
is not large enough to see the actual response from the local resolver)
- like as part of your ISP enabling v6?

Also note that the bind() call does actually return 0 so not sure it is
perl to blame that it tries a connect() as well...

Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> On 07/17/2018 10:14 PM, Tom Lane wrote:
>> So for some reason, perl's https support is trying to bind to the IPv6
>> address of buildfarm.postgresql.org, even though no IPv6 support is
>> configured at all on this machine. I wonder how long that's been going
>> on? Has anything about the machine's DNS entries changed recently?
>> (Also, "ssh buildfarm.postgresql.org" binds to IPv4 just fine.)

> I don't think there have been any recent changes on (DNS) v6 for
> brentalia - afaics in our internal revision control we have had v6 on
> that box for at least 2 years now.
> However could it be that whatever DNS resolver those boxes are using
> just started to return AAAAs as well (the strsize in the strace output
> is not large enough to see the actual response from the local resolver)

The nameserver is one I run locally, and the only change it's seen lately
is RHEL6's occasional security updates. I don't think that's where the
issue came in.

The full nameserver interaction is

sendto(3, "\x21\x86\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x09\x62\x75\x69\x6c\x64\x66\x61\x72\x6d\x0a\x70\x6f\x73\x74\x67\x72\x65\x73\x71\x6c\x03\x6f\x72\x67\x00\x00\x1c\x00\x01", 42, MSG_NOSIGNAL, NULL, 0) = 42

recvfrom(3, "\x21\x86\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x09\x62\x75\x69\x6c\x64\x66\x61\x72\x6d\x0a\x70\x6f\x73\x74\x67\x72\x65\x73\x71\x6c\x03\x6f\x72\x67\x00\x00\x1c\x00\x01\xc0\x0c\x00\x1c\x00\x01\x00\x00\x06\xc1\x00\x10\x20\x01\x48\x00\x15\x01\x00\x01\x00\x00\x00\x00\x00\x00\x02\x17", 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 70

I don't have anything handy like wireshark installed on this machine, but
I see the hex for buildfarm's IPv6 address in that response, and *not*
the hex for its IPv4 address. Conversely, when I try the http: URL,
I see a different query and only the IPv4 address in the response:

sendto(3, "\xa8\x93\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x09\x62\x75\x69\x6c\x64\x66\x61\x72\x6d\x0a\x70\x6f\x73\x74\x67\x72\x65\x73\x71\x6c\x03\x6f\x72\x67\x00\x00\x01\x00\x01", 42, MSG_NOSIGNAL, NULL, 0) = 42

recvfrom(3, "\xa8\x93\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x09\x62\x75\x69\x6c\x64\x66\x61\x72\x6d\x0a\x70\x6f\x73\x74\x67\x72\x65\x73\x71\x6c\x03\x6f\x72\x67\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00\x01\xd5\x00\x04\xae\x8f\x23\xd9", 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 58

It looks like Perl is specifically asking for AAAA in preference to A
records, but only for https:. Weird.

regards, tom lane
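For anyone following along without wireshark, the two exchanges above can be decoded by hand. The following is only a minimal Python sketch, not a general DNS parser: it assumes one question and one answer whose name is a compression pointer (which is what both responses above contain), and the helper names `skip_name`, `qtype_of` and `answer_address` are made up for illustration. The byte strings are the ones from the strace output.

```python
import ipaddress
import struct

QTYPE_NAMES = {1: "A", 28: "AAAA"}

# Encoded question name: buildfarm.postgresql.org
NAME = (b"\x09\x62\x75\x69\x6c\x64\x66\x61\x72\x6d"
        b"\x0a\x70\x6f\x73\x74\x67\x72\x65\x73\x71\x6c"
        b"\x03\x6f\x72\x67\x00")

HTTPS_QUERY = b"\x21\x86\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00" + NAME + b"\x00\x1c\x00\x01"
HTTPS_RESPONSE = (b"\x21\x86\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00" + NAME + b"\x00\x1c\x00\x01"
                  b"\xc0\x0c\x00\x1c\x00\x01\x00\x00\x06\xc1\x00\x10"
                  b"\x20\x01\x48\x00\x15\x01\x00\x01\x00\x00\x00\x00\x00\x00\x02\x17")
HTTP_QUERY = b"\xa8\x93\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00" + NAME + b"\x00\x01\x00\x01"
HTTP_RESPONSE = (b"\xa8\x93\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00" + NAME + b"\x00\x01\x00\x01"
                 b"\xc0\x0c\x00\x01\x00\x01\x00\x00\x01\xd5\x00\x04"
                 b"\xae\x8f\x23\xd9")


def skip_name(msg, pos):
    """Return the offset just past the DNS name that starts at pos."""
    while True:
        length = msg[pos]
        if length == 0:              # root label terminates the name
            return pos + 1
        if length & 0xC0 == 0xC0:    # compression pointer occupies 2 bytes
            return pos + 2
        pos += 1 + length


def qtype_of(msg):
    """Return the record type asked for in the first question."""
    pos = skip_name(msg, 12)         # the fixed DNS header is 12 bytes
    (qtype,) = struct.unpack_from("!H", msg, pos)
    return QTYPE_NAMES.get(qtype, str(qtype))


def answer_address(msg):
    """Return the address carried by the first answer record."""
    pos = skip_name(msg, 12) + 4     # skip QTYPE and QCLASS
    pos = skip_name(msg, pos)        # skip the answer's name
    _rtype, _rclass, _ttl, rdlength = struct.unpack_from("!HHIH", msg, pos)
    rdata = msg[pos + 10:pos + 10 + rdlength]
    return str(ipaddress.ip_address(rdata))


print(qtype_of(HTTPS_QUERY), answer_address(HTTPS_RESPONSE))
print(qtype_of(HTTP_QUERY), answer_address(HTTP_RESPONSE))
```

The decoded addresses match the ones visible elsewhere in the thread: the IPv6 address in the failing connect() call and the IPv4 address shown by traceroute.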
> On Jul 17, 2018, at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> The nameserver is one I run locally, and the only change it's seen lately
> is RHEL6's occasional security updates. I don't think that's where the
> issue came in.
>
> The full nameserver interaction is
>
> sendto(3, "\x21\x86\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x09\x62\x75\x69\x6c\x64\x66\x61\x72\x6d\x0a\x70\x6f\x73\x74\x67\x72\x65\x73\x71\x6c\x03\x6f\x72\x67\x00\x00\x1c\x00\x01", 42, MSG_NOSIGNAL, NULL, 0) = 42

00 1c is AAAA, so this is requesting the AAAA for buildfarm.postgresql.org

> recvfrom(3, "\x21\x86\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x09\x62\x75\x69\x6c\x64\x66\x61\x72\x6d\x0a\x70\x6f\x73\x74\x67\x72\x65\x73\x71\x6c\x03\x6f\x72\x67\x00\x00\x1c\x00\x01\xc0\x0c\x00\x1c\x00\x01\x00\x00\x06\xc1\x00\x10\x20\x01\x48\x00\x15\x01\x00\x01\x00\x00\x00\x00\x00\x00\x02\x17", 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 70
>
> I don't have anything handy like wireshark installed on this machine, but
> I see the hex for buildfarm's IPv6 address in that response, and *not*
> the hex for its IPv4 address. Conversely, when I try the http: URL,
> I see a different query and only the IPv4 address in the response:
>
> sendto(3, "\xa8\x93\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x09\x62\x75\x69\x6c\x64\x66\x61\x72\x6d\x0a\x70\x6f\x73\x74\x67\x72\x65\x73\x71\x6c\x03\x6f\x72\x67\x00\x00\x01\x00\x01", 42, MSG_NOSIGNAL, NULL, 0) = 42

and 00 01 is A.

> recvfrom(3, "\xa8\x93\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x09\x62\x75\x69\x6c\x64\x66\x61\x72\x6d\x0a\x70\x6f\x73\x74\x67\x72\x65\x73\x71\x6c\x03\x6f\x72\x67\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00\x01\xd5\x00\x04\xae\x8f\x23\xd9", 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 58
>
> It looks like Perl is specifically asking for AAAA in preference to A
> records, but only for https:. Weird.

Rather weird.

Cheers,
Steve
On 07/17/2018 11:29 PM, Tom Lane wrote:
> It looks like Perl is specifically asking for AAAA in preference to A
> records, but only for https:. Weird.

not really weird I think - the buildfarm uses LWP and for SSL support it
might use (iirc) either Crypt::SSLeay (older versions before unbundling of
LWP::Protocol::https) or IO::Socket::SSL which has this in its docs:

"Please be aware that with the IPv6 capable super classes, it will look first for the IPv6 address of a given hostname. If the resolver provides an IPv6 address, but the host cannot be reached by IPv6, there will be no automatic fallback to IPv4. To avoid these problems you can enforce IPv4 for a specific socket by using the Domain or Family option with the value AF_INET as described in IO::Socket::IP. Alternatively you can enforce IPv4 globally by loading IO::Socket::SSL with the option 'inet4', in which case it will use the IPv4 only class IO::Socket::INET as the super class."

So maybe removing the IO::Socket::INET6 superclass/package from the system
will get it working (or hacking the buildfarm script).

Stefan
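The documented behaviour (IPv6 first, no automatic fallback to IPv4) contrasts with the common pattern of trying every resolved address in turn. As a rough illustration in Python rather than Perl, and not a description of what LWP or IO::Socket::SSL actually do, a fallback loop might look like the sketch below. `connect_with_fallback`, `tcp_connect` and `resolve` are hypothetical names; the connector is passed in as a parameter only so the loop can be exercised without a network.

```python
import socket


def connect_with_fallback(addresses, connect):
    """Try each (family, sockaddr) pair in order and return the first success.

    addresses: list of (family, sockaddr) pairs, in resolver order.
    connect:   callable(family, sockaddr) returning a connection or raising OSError.
    """
    last_error = None
    for family, sockaddr in addresses:
        try:
            return connect(family, sockaddr)
        except OSError as exc:
            last_error = exc         # remember the failure and try the next address
    if last_error is None:
        raise OSError("no addresses to try")
    raise last_error


def tcp_connect(family, sockaddr):
    """Open a TCP connection of the given address family."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def resolve(host, port):
    """Return (family, sockaddr) pairs for host, as ordered by getaddrinfo."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _name, sockaddr in infos]
```

On a machine like the one in this thread, `connect_with_fallback(resolve("buildfarm.postgresql.org", 443), tcp_connect)` would be expected to get ENETUNREACH on the AAAA result and then move on to the A result instead of giving up.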
On Wed, Jul 18, 2018 at 2:57 AM, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
> not really weird I think - the buildfarm uses LWP and for SSL support it might use (iirc) either Crypt::SSLeay (older versions before unbundling of LWP::Protocol::https) or IO::Socket::SSL which has this in its docs:
>
> "Please be aware that with the IPv6 capable super classes, it will look first for the IPv6 address of a given hostname. If the resolver provides an IPv6 address, but the host cannot be reached by IPv6, there will be no automatic fallback to IPv4. To avoid these problems you can enforce IPv4 for a specific socket by using the Domain or Family option with the value AF_INET as described in IO::Socket::IP. Alternatively you can enforce IPv4 globally by loading IO::Socket::SSL with the option 'inet4', in which case it will use the IPv4 only class IO::Socket::INET as the super class."
>
> So maybe removing the IO::Socket::INET6 superclass/package from the system will get it working (or hacking the buildfarm script).

Tom, please see if adding this at the top of the failing script fixes it:
use IO::Socket::SSL qw (inet);
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom, please see if adding this at the top of the failing script fixes it:
> use IO::Socket::SSL qw (inet);

No, that doesn't work at all, but

use IO::Socket::SSL qw (inet4);

does fix it. Not sure how far that helps though --- we'd not want to put
that in the buildfarm client would we?

Some more detail: tracing shows that IO::Socket::INET6 is getting used,
and that contains code that purports to make the correct decision between
IPv6 and IPv4, but it's going wrong. It looks like what it *actually*
does is make sure that both the local and remote addresses can be resolved
in the same address family. I think that the local address is probably
"localhost", which RHEL6 will helpfully resolve as either 127.0.0.1 or ::1
regardless of whether there's any other support for IPv6 anyplace,
allowing INET6 to predict that the connection will work ... which it
doesn't, but the code doesn't want to retry after failing that step.

Perhaps I could fix this by rejiggering things so that localhost only
resolves as 127.0.0.1, but I don't really want to muck with that.
Removing the perl-IO-Socket-INET6 package would be less invasive.

regards, tom lane
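Tom's reading of IO::Socket::INET6 can be restated as a small model. The sketch below is in Python and is not the module's code: `pick_family_as_described` is a made-up function that only mirrors the behaviour described in the message above (pick one address family in which both the local and the remote name resolve, then never revisit that choice). The IPv6-first preference order is an assumption taken from the IO::Socket::SSL documentation quoted earlier in the thread.

```python
import socket


def pick_family_as_described(local_families, remote_families,
                             preference=(socket.AF_INET6, socket.AF_INET)):
    """Return the first preferred family in which both names resolve.

    This models the decision only; it says nothing about whether a
    connection in that family can actually be routed.
    """
    for family in preference:
        if family in local_families and family in remote_families:
            return family
    return None


# "localhost" resolving to both 127.0.0.1 and ::1, as described for the RHEL6 box.
local = {socket.AF_INET, socket.AF_INET6}
# buildfarm.postgresql.org publishes both A and AAAA records.
remote = {socket.AF_INET, socket.AF_INET6}

chosen = pick_family_as_described(local, remote)
print("chosen family:", chosen.name)
```

In this model, restricting the local side to IPv4 only (the "localhost only resolves as 127.0.0.1" idea) makes the function return AF_INET, which is consistent with the workaround Tom mentions but prefers not to apply.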
On 07/20/2018 01:11 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom, please see if adding this at the top of the failing script fixes it:
>> use IO::Socket::SSL qw (inet);
>
> No, that doesn't work at all, but
>
> use IO::Socket::SSL qw (inet4);
>
> does fix it. Not sure how far that helps though --- we'd not want to put
> that in the buildfarm client would we?

maybe a more general option to "force ipv4 or ipv6" akin to what most
unix networking related utilities support with -4 and -6 might be useful?

On the other side I wonder whether passing in "MultiHomed" to the
IO::Socket::INET6 constructor behind LWP's back might work - though the
docs are pretty light on any details on its actual behaviour:

https://metacpan.org/pod/release/SHLOMIF/IO-Socket-INET6-2.72/lib/IO/Socket/INET6.pm#CONSTRUCTOR

> Some more detail: tracing shows that IO::Socket::INET6 is getting used,
> and that contains code that purports to make the correct decision between
> IPv6 and IPv4, but it's going wrong. It looks like what it *actually*
> does is make sure that both the local and remote addresses can be resolved
> in the same address family. I think that the local address is probably
> "localhost", which RHEL6 will helpfully resolve as either 127.0.0.1 or ::1
> regardless of whether there's any other support for IPv6 anyplace,
> allowing INET6 to predict that the connection will work ... which it
> doesn't, but the code doesn't want to retry after failing that step.
>
> Perhaps I could fix this by rejiggering things so that localhost only
> resolves as 127.0.0.1, but I don't really want to muck with that.
> Removing the perl-IO-Socket-INET6 package would be less invasive.

yeah the removal seems easier but do you actually know yet why the
system started behaving differently in that regard?

Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> maybe a more general option to "force ipv4 or ipv6" akin to what most
> unix networking related utilities support with -4 and -6 might be useful?

+1

> On the other side I wonder whether passing in "MultiHomed" to the
> IO::Socket::INET6 Constructor behind LWPs back might work - though the
> docs are pretty light on any details on its actual behaviour:

No, I already looked at the code :-(. MultiHomed allows it to try
multiple IP addresses obtained from getaddrinfo, but it's already made
up its mind whether to use IPv4 or IPv6, and only addresses of the
given type will be tried. (The loop logic looks more than slightly
broken, too, at least in the 2.56 version I've got here. I do not
think the author was very clear on whether he needed to handle multiple
local addresses or multiple remote addresses, but AFAICS it will only
work in the unlikely case that you've got *both*, because it loops through
both getaddrinfo results in lockstep.)

> yeah the removal seems easier but do you actually know yet why the
> system started behaving differently in that regard?

I don't know that it ever was different. I've never tried to run the
buildfarm client on this machine; I just happened to try the manual
getprint(".../branches_of_interest.txt") invocation that I'd also been
testing on my buildfarm hosts. Presumably, the RHEL/Fedora machines that
are in the buildfarm have different network environments where it's not
a problem.

regards, tom lane
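The "-4 / -6" idea that gets a +1 above is easy to picture in the abstract. The following Python sketch only illustrates the shape of such an option and is not a proposal for the buildfarm client, which is written in Perl; `build_parser` and `resolve` are hypothetical names. The flags simply select the address family that is handed to the resolver.

```python
import argparse
import socket


def build_parser():
    """Command-line parser with curl/ssh-style -4 and -6 switches."""
    parser = argparse.ArgumentParser(description="resolve a host over a chosen IP version")
    parser.add_argument("-4", dest="family", action="store_const",
                        const=socket.AF_INET, help="use IPv4 addresses only")
    parser.add_argument("-6", dest="family", action="store_const",
                        const=socket.AF_INET6, help="use IPv6 addresses only")
    # With neither flag, leave the choice to the resolver.
    parser.set_defaults(family=socket.AF_UNSPEC)
    return parser


def resolve(host, port, family):
    """Return only the socket addresses of the requested family."""
    infos = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
    return [sockaddr for _family, _type, _proto, _name, sockaddr in infos]
```

With `-4`, `resolve()` asks getaddrinfo for AF_INET results only, so an unreachable AAAA record never enters the picture; in this sketch, if both flags are given, the last one on the command line wins.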