Thread: PG 7.3.1 with ssl on linux hangs (testcase available)
since I moved from postgres 7.1 to 7.3 I get hangs when executing a select on certaint rows with binary data using psql The problem is reproducible and affects only single rows in the table . The hangs happen only when I connect via network unsing psql -h localhost template1 And the problem is gone, when ssl is switched off in postgresql.conf But the problem seems to be on the client side, cause its the same using a 7.1 server. I have generated a testcase with 2 rows, one works the other not: The table looks like this create table testy (id integer, data bytea ); (I can send a dump of the table with the offending rows to anybody interessted by mail -- its about 130 K ) select * from testy where id=28162 --> works select * from testy where id=28163 --> hangs + process goes up to 99% utilisation The offending row is much larger, that might trigger. I use a debian system with: postgresql 7.3.3-1 client and server -- Hans-Jürgen Hay <hjh@alterras.de>
Hans-Jürgen Hay <hjh@alterras.de> writes: > The hangs happen only when I connect via network unsing psql -h > localhost template1 > And the problem is gone, when ssl is switched off in postgresql.conf We have heard a couple of reports of problems with SSL connections, but so far I've not been able to reproduce any trouble here. Details about your SSL installation might help. regards, tom lane
Tom Lane wrote: >Hans-Jürgen Hay <hjh@alterras.de> writes: > > >>The hangs happen only when I connect via network unsing psql -h >>localhost template1 >>And the problem is gone, when ssl is switched off in postgresql.conf >> >> > >We have heard a couple of reports of problems with SSL connections, >but so far I've not been able to reproduce any trouble here. Details >about your SSL installation might help. > > > Hi Tom, good to hear that's not a singular problem with pgadmin3... I currently don't have a system to reproduce this, because I didn't succeed making my 7.4 server accepting ssl-connections. I posted for installation help on pgsql-admin but didn't get a reaction so far. RTFM up and down I couldn't see what I configured wrong (--with-openssl, ssl=true). Can you give me a hint? If I can reproduce the problem, I might be able to trace down that ssl thing. Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > I currently don't have a system to reproduce this, because I didn't > succeed making my 7.4 server accepting ssl-connections. I posted for > installation help on pgsql-admin but didn't get a reaction so far. RTFM > up and down I couldn't see what I configured wrong (--with-openssl, > ssl=true). Can you give me a hint? No, I didn't see what you did wrong either. Try looking in the postmaster log to see if any useful error messages appear. regards, tom lane
Tom Lane wrote: >Andreas Pflug <pgadmin@pse-consulting.de> writes: > > >>I currently don't have a system to reproduce this, because I didn't >>succeed making my 7.4 server accepting ssl-connections. I posted for >>installation help on pgsql-admin but didn't get a reaction so far. RTFM >>up and down I couldn't see what I configured wrong (--with-openssl, >>ssl=true). Can you give me a hint? >> >> > >No, I didn't see what you did wrong either. Try looking in the >postmaster log to see if any useful error messages appear. > I pushed client_min_messages and log_min_messages to debug5, and the only suspicious message at startup is LOG: could not load root cert file "/usr/data/pgsql-7.4/root.crt": No such file or directory DETAIL: Will not verify client certificates. which shouldn't be a problem (pg_hba.conf is configured to trust). Any connect attempt will log DEBUG: forked new backend, pid=1826 socket=8 DEBUG: proc_exit(0) DEBUG: shmem_exit(0) DEBUG: exit(0) DEBUG: reaping dead processes DEBUG: child process (pid 1826) exited with exit code 0 with the client side message "server does not support SSL, but SSL was required". OpenSSL is 0.9.6g Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > I pushed client_min_messages and log_min_messages to debug5, and the > only suspicious message at startup is > LOG: could not load root cert file "/usr/data/pgsql-7.4/root.crt": No > such file or directory > DETAIL: Will not verify client certificates. > which shouldn't be a problem (pg_hba.conf is configured to trust). That's expected if you don't provide a root.crt file. (I don't, and it still works for me.) > Any connect attempt will log > DEBUG: forked new backend, pid=1826 socket=8 > DEBUG: proc_exit(0) > DEBUG: shmem_exit(0) > DEBUG: exit(0) > DEBUG: reaping dead processes > DEBUG: child process (pid 1826) exited with exit code 0 > with the client side message "server does not support SSL, but SSL was > required". OpenSSL is 0.9.6g Bizarre. I looked through the recent sslmode patch, and while I found some things I didn't like, none of them explain this. Could you perhaps get a trace of the server-to-client interaction? Either strace'ing psql or watching the IP traffic with a packet sniffer should do --- but make sure you get the full contents of each packet sent and received. regards, tom lane
Tom Lane wrote: >Bizarre. I looked through the recent sslmode patch, and while I found >some things I didn't like, none of them explain this. Could you perhaps >get a trace of the server-to-client interaction? Either strace'ing psql >or watching the IP traffic with a packet sniffer should do --- but make >sure you get the full contents of each packet sent and received. > > Hi Tom, the attached file is a tcpdump of a connect attempt. Regards, Andreas 00:14:21.128861 localhost.32821 > localhost.postgresql: F 2173273900:2173273900(0) ack 2168553850 win 32767 <nop,nop,timestamp108609 106946> (DF) 4500 0034 0340 4000 4006 3982 7f00 0001 7f00 0001 8035 1538 8189 872c 8141 817a 8011 7fff 0dd5 0000 0101 080a 0001 a841 0001 a1c2 00:14:21.131604 localhost.postgresql > localhost.32821: F 1:1(0) ack 1 win 32767 <nop,nop,timestamp 108609 108609> (DF) 4500 0034 0341 4000 4006 3981 7f00 0001 7f00 0001 1538 8035 8141 817a 8189 872d 8011 7fff 0755 0000 0101 080a 0001 a841 0001 a841 00:14:21.131635 localhost.32821 > localhost.postgresql: . ack 2 win 32767 <nop,nop,timestamp 108609 108609> (DF) 4500 0034 0342 4000 4006 3980 7f00 0001 7f00 0001 8035 1538 8189 872d 8141 817b 8010 7fff 0755 0000 0101 080a 0001 a841 0001 a841 00:14:21.132540 localhost.32822 > localhost.postgresql: S 2177528657:2177528657(0) win 32767 <mss 16396,sackOK,timestamp108609 0,nop,wscale 0> (DF) 4500 003c 0343 4000 4006 3977 7f00 0001 7f00 0001 8036 1538 81ca 7351 0000 0000 a002 7fff 5ce0 0000 0204 400c 0402 080a 0001 a841 0000 0000 0103 0300 00:14:21.132565 localhost.postgresql > localhost.32822: S 2176273229:2176273229(0) ack 2177528658 win 32767 <mss 16396,sackOK,timestamp108609 108609,nop,wscale 0> (DF) 4500 003c 0344 4000 4006 3976 7f00 0001 7f00 0001 1538 8036 81b7 4b4d 81ca 7352 a012 7fff e787 0000 0204 400c 0402 080a 0001 a841 0001 a841 0103 0300 00:14:21.132582 localhost.32822 > localhost.postgresql: . ack 1 win 32767 <nop,nop,timestamp 108609 108609> (DF) 4500 0034 0345 4000 4006 397d 7f00 0001 7f00 0001 8036 1538 81ca 7352 81b7 4b4e 8010 7fff 50a5 0000 0101 080a 0001 a841 0001 a841 00:14:21.132638 localhost.32822 > localhost.postgresql: P 1:9(8) ack 1 win 32767 <nop,nop,timestamp 108609 108609> (DF) 4500 003c 0346 4000 4006 3974 7f00 0001 7f00 0001 8036 1538 81ca 7352 81b7 4b4e 8018 7fff 358c 0000 0101 080a 0001 a841 0001 a841 0000 0008 04d2 162f 00:14:21.132647 localhost.postgresql > localhost.32822: . ack 9 win 32767 <nop,nop,timestamp 108609 108609> (DF) 4500 0034 0347 4000 4006 397b 7f00 0001 7f00 0001 1538 8036 81b7 4b4e 81ca 735a 8010 7fff 509d 0000 0101 080a 0001 a841 0001 a841 00:14:21.132833 localhost.filenet-tms > localhost.filenet-tms: udp 24 6000 0000 0020 1140 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0001 8000 8000 0020 4aa6 0200 0000 1800 0000 0000 0000 9b05 0000 0000 0000 0000 0000 00:14:21.134264 localhost.postgresql > localhost.32822: P 1:2(1) ack 9 win 32767 <nop,nop,timestamp 108609 108609> (DF) 4500 0035 0348 4000 4006 3979 7f00 0001 7f00 0001 1538 8036 81b7 4b4e 81ca 735a 8018 7fff 0294 0000 0101 080a 0001 a841 0001 a841 4e 00:14:21.134284 localhost.32822 > localhost.postgresql: . ack 2 win 32767 <nop,nop,timestamp 108609 108609> (DF) 4500 0034 0349 4000 4006 3979 7f00 0001 7f00 0001 8036 1538 81ca 735a 81b7 4b4f 8010 7fff 509c 0000 0101 080a 0001 a841 0001 a841
Hi Tom, reducing the traced conversation to its essence it's just a ssl-request to a non-ssl server :-( >localhost.32822 > localhost.postgresql: 0000 0008 04d2 162f > > >localhost.postgresql > localhost.32822: 4e > And no hints from serverlog. Are there some more debugging options that I could enable in the backend to be a bit more verbose about the communication process? Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > the attached file is a tcpdump of a connect attempt. Hm. The postmaster is sending back 'N' indicating that it does not want to do SSL. Are you sure you are connecting to an SSL-enabled postmaster? Also, is your connection by any chance IPV6? It doesn't look like it from the tcpdump, but I'm not sure I know how to tell. The relevant bit of code in the postmaster is #ifdef USE_SSL /* No SSL when disabled or on Unix sockets */ if (!EnableSSL || port->laddr.addr.ss_family != AF_INET) SSLok = 'N'; else SSLok = 'S'; /* Support for SSL */ #else SSLok = 'N'; /* No support for SSL */ #endif so one of the paths that chooses 'N' is being followed. The test on AF_INET is broken, since it should allow for IPV6, but otherwise I don't see what's getting you. regards, tom lane
Andreas Pflug <pgadmin@pse-consulting.de> writes: > reducing the traced conversation to its essence it's just a ssl-request > to a non-ssl server :-( Yeah, I had just come to the same conclusion. Either you're connecting to the wrong server (check those port numbers?) or there's something wrong with the bit of postmaster code I just showed you. regards, tom lane
Tom Lane wrote: >Hm. The postmaster is sending back 'N' indicating that it does not want >to do SSL. > >Are you sure you are connecting to an SSL-enabled postmaster? > >Also, is your connection by any chance IPV6? It doesn't look like it >from the tcpdump, but I'm not sure I know how to tell. > Ah, that's it! My code examination just had reached ProcessStartupPacket, when your mail arrived. The kernel is IPV6, and incoming IPV4 connection will be handled IPV6; I can tell this from pg_hba.conf experiments that my installation requires the ::ffff:192.168.0.0/24 entry. >The relevant bit of code in the postmaster is > >#ifdef USE_SSL > /* No SSL when disabled or on Unix sockets */ > if (!EnableSSL || port->laddr.addr.ss_family != AF_INET) > SSLok = 'N'; > else > SSLok = 'S'; /* Support for SSL */ >#else > SSLok = 'N'; /* No support for SSL */ >#endif > According to the comment, it should be if (!EnableSSL || port->laddr.addr.ss_family == AF_UNIX) and after changing this it works! Quite hard to find, probably most users you'd ask "do you use IPV6" would have answered "no". I didn't opt vor IPV6 either, it's a stock SuSE 8.1. Immediately, I checked the reported ssl hangs, and found both cases working. Dave reported this failing on snake.pgadmin.org, which uses a different openssl (0.9.7a), I'll see if I can check that version (won't be before monday, need some sleep now). Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > According to the comment, it should be > if (!EnableSSL || port->laddr.addr.ss_family == AF_UNIX) > and after changing this it works! Right. I've committed the fix. One bug less, now to find the next one ... > Dave reported this failing on snake.pgadmin.org, which uses a > different openssl (0.9.7a), I'll see if I can check that version (won't > be before monday, need some sleep now). Sounds good, see you then. regards, tom lane
Ah-hah, the problem is timing-dependent. I was able to reproduce it after I tried connecting from one machine to another. The patch against 7.3.4 is attached. regards, tom lane Index: fe-misc.c =================================================================== RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-misc.c,v retrieving revision 1.85 diff -c -r1.85 fe-misc.c *** fe-misc.c 24 Oct 2002 23:35:55 -0000 1.85 --- fe-misc.c 4 Aug 2003 17:22:19 -0000 *************** *** 553,559 **** --- 553,570 ---- * file is ready. Grumble. Fortunately, we don't expect this path to * be taken much, since in normal practice we should not be trying to * read data unless the file selected for reading already. + * + * In SSL mode it's even worse: SSL_read() could say WANT_READ and then + * data could arrive before we make the pqReadReady() test. So we must + * play dumb and assume there is more data, relying on the SSL layer to + * detect true EOF. */ + + #ifdef USE_SSL + if (conn->ssl) + return 0; + #endif + switch (pqReadReady(conn)) { case 0: Index: fe-secure.c =================================================================== RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v retrieving revision 1.15.2.5 diff -c -r1.15.2.5 fe-secure.c *** fe-secure.c 10 Apr 2003 23:03:13 -0000 1.15.2.5 --- fe-secure.c 4 Aug 2003 17:22:19 -0000 *************** *** 278,298 **** libpq_gettext("SSL SYSCALL error: %s\n"), SOCK_STRERROR(SOCK_ERRNO)); else printfPQExpBuffer(&conn->errorMessage, libpq_gettext("SSL SYSCALL error: EOF detected\n")); break; case SSL_ERROR_SSL: printfPQExpBuffer(&conn->errorMessage, libpq_gettext("SSL error: %s\n"), SSLerrmessage()); /* fall through */ case SSL_ERROR_ZERO_RETURN: - pqsecure_close(conn); SOCK_ERRNO = ECONNRESET; n = -1; break; default: printfPQExpBuffer(&conn->errorMessage, libpq_gettext("Unknown SSL error code\n")); break; } } --- 278,302 ---- libpq_gettext("SSL SYSCALL error: %s\n"), SOCK_STRERROR(SOCK_ERRNO)); else + { printfPQExpBuffer(&conn->errorMessage, libpq_gettext("SSL SYSCALL error: EOF detected\n")); + SOCK_ERRNO = ECONNRESET; + n = -1; + } break; case SSL_ERROR_SSL: printfPQExpBuffer(&conn->errorMessage, libpq_gettext("SSL error: %s\n"), SSLerrmessage()); /* fall through */ case SSL_ERROR_ZERO_RETURN: SOCK_ERRNO = ECONNRESET; n = -1; break; default: printfPQExpBuffer(&conn->errorMessage, libpq_gettext("Unknown SSL error code\n")); + n = -1; break; } } *************** *** 334,354 **** libpq_gettext("SSL SYSCALL error: %s\n"), SOCK_STRERROR(SOCK_ERRNO)); else printfPQExpBuffer(&conn->errorMessage, libpq_gettext("SSL SYSCALL error: EOF detected\n")); break; case SSL_ERROR_SSL: printfPQExpBuffer(&conn->errorMessage, libpq_gettext("SSL error: %s\n"), SSLerrmessage()); /* fall through */ case SSL_ERROR_ZERO_RETURN: - pqsecure_close(conn); SOCK_ERRNO = ECONNRESET; n = -1; break; default: printfPQExpBuffer(&conn->errorMessage, libpq_gettext("Unknown SSL error code\n")); break; } } --- 338,362 ---- libpq_gettext("SSL SYSCALL error: %s\n"), SOCK_STRERROR(SOCK_ERRNO)); else + { printfPQExpBuffer(&conn->errorMessage, libpq_gettext("SSL SYSCALL error: EOF detected\n")); + SOCK_ERRNO = ECONNRESET; + n = -1; + } break; case SSL_ERROR_SSL: printfPQExpBuffer(&conn->errorMessage, libpq_gettext("SSL error: %s\n"), SSLerrmessage()); /* fall through */ case SSL_ERROR_ZERO_RETURN: SOCK_ERRNO = ECONNRESET; n = -1; break; default: printfPQExpBuffer(&conn->errorMessage, libpq_gettext("Unknown SSL error code\n")); + n = -1; break; } }