Thread: PG 7.3.1 with ssl on linux hangs (testcase available)

PG 7.3.1 with ssl on linux hangs (testcase available)

From
Hans-Jürgen Hay
Date:
since I moved from postgres 7.1 to 7.3 I get hangs when executing a
select on certaint rows with binary data using psql

The problem is reproducible and affects only single rows in the table .
The hangs happen only when I connect via network unsing psql -h
localhost template1

And the problem is gone, when ssl is switched off in postgresql.conf
But the problem seems to be on the client side, cause its the same using
a 7.1 server.

I have generated a testcase with 2 rows, one works the other not:

The table looks like this

create table testy (id integer, data bytea
);
  (I can send a dump of the table with the offending rows to anybody
interessted by mail -- its about 130 K )
  select * from testy where id=28162    --> works  select * from testy where id=28163    --> hangs + process goes up to
99% utilisation

The offending row is much larger, that might trigger.
  I use a debian system with:  postgresql                               7.3.3-1 client and server


--
Hans-Jürgen Hay <hjh@alterras.de>



Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Tom Lane
Date:
Hans-Jürgen Hay <hjh@alterras.de> writes:
> The hangs happen only when I connect via network unsing psql -h
> localhost template1
> And the problem is gone, when ssl is switched off in postgresql.conf

We have heard a couple of reports of problems with SSL connections,
but so far I've not been able to reproduce any trouble here.  Details
about your SSL installation might help.
        regards, tom lane


Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Andreas Pflug
Date:
Tom Lane wrote:

>Hans-Jürgen Hay <hjh@alterras.de> writes:
>
>
>>The hangs happen only when I connect via network unsing psql -h
>>localhost template1
>>And the problem is gone, when ssl is switched off in postgresql.conf
>>
>>
>
>We have heard a couple of reports of problems with SSL connections,
>but so far I've not been able to reproduce any trouble here.  Details
>about your SSL installation might help.
>
>
>

Hi Tom,

good to hear that's not a singular problem with pgadmin3...
I currently don't have a system to reproduce this, because I didn't
succeed making my 7.4 server accepting ssl-connections. I posted for
installation help on pgsql-admin but didn't get a reaction so far. RTFM
up and down I couldn't see what I configured wrong (--with-openssl,
ssl=true). Can you give me a hint? If I can reproduce the problem, I
might be able to trace down that ssl thing.

Regards,
Andreas



Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> I currently don't have a system to reproduce this, because I didn't
> succeed making my 7.4 server accepting ssl-connections. I posted for
> installation help on pgsql-admin but didn't get a reaction so far. RTFM
> up and down I couldn't see what I configured wrong (--with-openssl,
> ssl=true). Can you give me a hint?

No, I didn't see what you did wrong either.  Try looking in the
postmaster log to see if any useful error messages appear.

            regards, tom lane

Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Andreas Pflug
Date:
Tom Lane wrote:

>Andreas Pflug <pgadmin@pse-consulting.de> writes:
>
>
>>I currently don't have a system to reproduce this, because I didn't
>>succeed making my 7.4 server accepting ssl-connections. I posted for
>>installation help on pgsql-admin but didn't get a reaction so far. RTFM
>>up and down I couldn't see what I configured wrong (--with-openssl,
>>ssl=true). Can you give me a hint?
>>
>>
>
>No, I didn't see what you did wrong either.  Try looking in the
>postmaster log to see if any useful error messages appear.
>

I pushed client_min_messages and log_min_messages to debug5, and the
only suspicious message at startup is

LOG:  could not load root cert file "/usr/data/pgsql-7.4/root.crt": No
such file or directory
DETAIL:  Will not verify client certificates.

which shouldn't be a problem (pg_hba.conf is configured to trust).

Any connect attempt will log
DEBUG:  forked new backend, pid=1826 socket=8
DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
DEBUG:  reaping dead processes
DEBUG:  child process (pid 1826) exited with exit code 0

with the client side message "server does not support SSL, but SSL was
required". OpenSSL is 0.9.6g

Regards,
Andreas

Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> I pushed client_min_messages and log_min_messages to debug5, and the
> only suspicious message at startup is
> LOG:  could not load root cert file "/usr/data/pgsql-7.4/root.crt": No
> such file or directory
> DETAIL:  Will not verify client certificates.
> which shouldn't be a problem (pg_hba.conf is configured to trust).

That's expected if you don't provide a root.crt file.  (I don't, and it
still works for me.)

> Any connect attempt will log
> DEBUG:  forked new backend, pid=1826 socket=8
> DEBUG:  proc_exit(0)
> DEBUG:  shmem_exit(0)
> DEBUG:  exit(0)
> DEBUG:  reaping dead processes
> DEBUG:  child process (pid 1826) exited with exit code 0

> with the client side message "server does not support SSL, but SSL was
> required". OpenSSL is 0.9.6g

Bizarre.  I looked through the recent sslmode patch, and while I found
some things I didn't like, none of them explain this.  Could you perhaps
get a trace of the server-to-client interaction?  Either strace'ing psql
or watching the IP traffic with a packet sniffer should do --- but make
sure you get the full contents of each packet sent and received.

            regards, tom lane

Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Andreas Pflug
Date:
Tom Lane wrote:

>Bizarre.  I looked through the recent sslmode patch, and while I found
>some things I didn't like, none of them explain this.  Could you perhaps
>get a trace of the server-to-client interaction?  Either strace'ing psql
>or watching the IP traffic with a packet sniffer should do --- but make
>sure you get the full contents of each packet sent and received.
>
>
Hi Tom,

the attached file is a tcpdump of a connect attempt.

Regards,
Andreas

00:14:21.128861 localhost.32821 > localhost.postgresql: F 2173273900:2173273900(0) ack 2168553850 win 32767
<nop,nop,timestamp108609 106946> (DF) 
             4500 0034 0340 4000 4006 3982 7f00 0001
             7f00 0001 8035 1538 8189 872c 8141 817a
             8011 7fff 0dd5 0000 0101 080a 0001 a841
             0001 a1c2
00:14:21.131604 localhost.postgresql > localhost.32821: F 1:1(0) ack 1 win 32767 <nop,nop,timestamp 108609 108609> (DF)
             4500 0034 0341 4000 4006 3981 7f00 0001
             7f00 0001 1538 8035 8141 817a 8189 872d
             8011 7fff 0755 0000 0101 080a 0001 a841
             0001 a841
00:14:21.131635 localhost.32821 > localhost.postgresql: . ack 2 win 32767 <nop,nop,timestamp 108609 108609> (DF)
             4500 0034 0342 4000 4006 3980 7f00 0001
             7f00 0001 8035 1538 8189 872d 8141 817b
             8010 7fff 0755 0000 0101 080a 0001 a841
             0001 a841
00:14:21.132540 localhost.32822 > localhost.postgresql: S 2177528657:2177528657(0) win 32767 <mss
16396,sackOK,timestamp108609 0,nop,wscale 0> (DF) 
             4500 003c 0343 4000 4006 3977 7f00 0001
             7f00 0001 8036 1538 81ca 7351 0000 0000
             a002 7fff 5ce0 0000 0204 400c 0402 080a
             0001 a841 0000 0000 0103 0300
00:14:21.132565 localhost.postgresql > localhost.32822: S 2176273229:2176273229(0) ack 2177528658 win 32767 <mss
16396,sackOK,timestamp108609 108609,nop,wscale 0> (DF) 
             4500 003c 0344 4000 4006 3976 7f00 0001
             7f00 0001 1538 8036 81b7 4b4d 81ca 7352
             a012 7fff e787 0000 0204 400c 0402 080a
             0001 a841 0001 a841 0103 0300
00:14:21.132582 localhost.32822 > localhost.postgresql: . ack 1 win 32767 <nop,nop,timestamp 108609 108609> (DF)
             4500 0034 0345 4000 4006 397d 7f00 0001
             7f00 0001 8036 1538 81ca 7352 81b7 4b4e
             8010 7fff 50a5 0000 0101 080a 0001 a841
             0001 a841
00:14:21.132638 localhost.32822 > localhost.postgresql: P 1:9(8) ack 1 win 32767 <nop,nop,timestamp 108609 108609> (DF)
             4500 003c 0346 4000 4006 3974 7f00 0001
             7f00 0001 8036 1538 81ca 7352 81b7 4b4e
             8018 7fff 358c 0000 0101 080a 0001 a841
             0001 a841 0000 0008 04d2 162f
00:14:21.132647 localhost.postgresql > localhost.32822: . ack 9 win 32767 <nop,nop,timestamp 108609 108609> (DF)
             4500 0034 0347 4000 4006 397b 7f00 0001
             7f00 0001 1538 8036 81b7 4b4e 81ca 735a
             8010 7fff 509d 0000 0101 080a 0001 a841
             0001 a841
00:14:21.132833 localhost.filenet-tms > localhost.filenet-tms: udp 24
             6000 0000 0020 1140 0000 0000 0000 0000
             0000 0000 0000 0001 0000 0000 0000 0000
             0000 0000 0000 0001 8000 8000 0020 4aa6
             0200 0000 1800 0000 0000 0000 9b05 0000
             0000 0000 0000 0000
00:14:21.134264 localhost.postgresql > localhost.32822: P 1:2(1) ack 9 win 32767 <nop,nop,timestamp 108609 108609> (DF)
             4500 0035 0348 4000 4006 3979 7f00 0001
             7f00 0001 1538 8036 81b7 4b4e 81ca 735a
             8018 7fff 0294 0000 0101 080a 0001 a841
             0001 a841 4e
00:14:21.134284 localhost.32822 > localhost.postgresql: . ack 2 win 32767 <nop,nop,timestamp 108609 108609> (DF)
             4500 0034 0349 4000 4006 3979 7f00 0001
             7f00 0001 8036 1538 81ca 735a 81b7 4b4f
             8010 7fff 509c 0000 0101 080a 0001 a841
             0001 a841

Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Andreas Pflug
Date:
Hi Tom,

reducing the traced conversation to its essence it's just a ssl-request
to a non-ssl server :-(

>localhost.32822 > localhost.postgresql: 0000 0008 04d2 162f
>
>
>localhost.postgresql > localhost.32822:  4e
>
And no hints from serverlog.
Are there some more debugging options that I could enable in the backend
to be a bit more verbose about the communication process?

Regards,
Andreas

Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> the attached file is a tcpdump of a connect attempt.

Hm.  The postmaster is sending back 'N' indicating that it does not want
to do SSL.

Are you sure you are connecting to an SSL-enabled postmaster?

Also, is your connection by any chance IPV6?  It doesn't look like it
from the tcpdump, but I'm not sure I know how to tell.

The relevant bit of code in the postmaster is

#ifdef USE_SSL
        /* No SSL when disabled or on Unix sockets */
        if (!EnableSSL || port->laddr.addr.ss_family != AF_INET)
            SSLok = 'N';
        else
            SSLok = 'S';        /* Support for SSL */
#else
        SSLok = 'N';            /* No support for SSL */
#endif

so one of the paths that chooses 'N' is being followed.  The test on
AF_INET is broken, since it should allow for IPV6, but otherwise I
don't see what's getting you.

            regards, tom lane

Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> reducing the traced conversation to its essence it's just a ssl-request
> to a non-ssl server :-(

Yeah, I had just come to the same conclusion.  Either you're connecting
to the wrong server (check those port numbers?) or there's something
wrong with the bit of postmaster code I just showed you.

            regards, tom lane

Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Andreas Pflug
Date:
Tom Lane wrote:

>Hm.  The postmaster is sending back 'N' indicating that it does not want
>to do SSL.
>
>Are you sure you are connecting to an SSL-enabled postmaster?
>
>Also, is your connection by any chance IPV6?  It doesn't look like it
>from the tcpdump, but I'm not sure I know how to tell.
>
Ah, that's it! My code examination just had reached
ProcessStartupPacket, when your mail arrived.
The kernel is IPV6, and incoming IPV4 connection will be handled IPV6; I
can tell this from pg_hba.conf experiments that my installation requires
the ::ffff:192.168.0.0/24 entry.

>The relevant bit of code in the postmaster is
>
>#ifdef USE_SSL
>        /* No SSL when disabled or on Unix sockets */
>        if (!EnableSSL || port->laddr.addr.ss_family != AF_INET)
>            SSLok = 'N';
>        else
>            SSLok = 'S';        /* Support for SSL */
>#else
>        SSLok = 'N';            /* No support for SSL */
>#endif
>

According to the comment, it should be

if (!EnableSSL || port->laddr.addr.ss_family == AF_UNIX)

and after changing this it works! Quite hard to find, probably most
users you'd ask "do you use IPV6" would have answered "no". I didn't opt
vor IPV6 either, it's a stock SuSE 8.1.

Immediately, I checked the reported ssl hangs, and found both cases
working. Dave reported this failing on snake.pgadmin.org, which uses a
different openssl (0.9.7a), I'll see if I can check that version (won't
be before monday, need some sleep now).

Regards,
Andreas

Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> According to the comment, it should be
> if (!EnableSSL || port->laddr.addr.ss_family == AF_UNIX)
> and after changing this it works!

Right.  I've committed the fix.  One bug less, now to find the next one
...

> Dave reported this failing on snake.pgadmin.org, which uses a
> different openssl (0.9.7a), I'll see if I can check that version (won't
> be before monday, need some sleep now).

Sounds good, see you then.

            regards, tom lane

Re: PG 7.3.1 with ssl on linux hangs (testcase available)

From
Tom Lane
Date:
Ah-hah, the problem is timing-dependent.  I was able to reproduce it
after I tried connecting from one machine to another.

The patch against 7.3.4 is attached.

            regards, tom lane

Index: fe-misc.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-misc.c,v
retrieving revision 1.85
diff -c -r1.85 fe-misc.c
*** fe-misc.c    24 Oct 2002 23:35:55 -0000    1.85
--- fe-misc.c    4 Aug 2003 17:22:19 -0000
***************
*** 553,559 ****
--- 553,570 ----
       * file is ready. Grumble.    Fortunately, we don't expect this path to
       * be taken much, since in normal practice we should not be trying to
       * read data unless the file selected for reading already.
+      *
+      * In SSL mode it's even worse: SSL_read() could say WANT_READ and then
+      * data could arrive before we make the pqReadReady() test.  So we must
+      * play dumb and assume there is more data, relying on the SSL layer to
+      * detect true EOF.
       */
+
+ #ifdef USE_SSL
+     if (conn->ssl)
+         return 0;
+ #endif
+
      switch (pqReadReady(conn))
      {
          case 0:
Index: fe-secure.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v
retrieving revision 1.15.2.5
diff -c -r1.15.2.5 fe-secure.c
*** fe-secure.c    10 Apr 2003 23:03:13 -0000    1.15.2.5
--- fe-secure.c    4 Aug 2003 17:22:19 -0000
***************
*** 278,298 ****
                                        libpq_gettext("SSL SYSCALL error: %s\n"),
                                        SOCK_STRERROR(SOCK_ERRNO));
                  else
                      printfPQExpBuffer(&conn->errorMessage,
                                        libpq_gettext("SSL SYSCALL error: EOF detected\n"));
                  break;
              case SSL_ERROR_SSL:
                  printfPQExpBuffer(&conn->errorMessage,
                        libpq_gettext("SSL error: %s\n"), SSLerrmessage());
                  /* fall through */
              case SSL_ERROR_ZERO_RETURN:
-                 pqsecure_close(conn);
                  SOCK_ERRNO = ECONNRESET;
                  n = -1;
                  break;
              default:
                  printfPQExpBuffer(&conn->errorMessage,
                                    libpq_gettext("Unknown SSL error code\n"));
                  break;
          }
      }
--- 278,302 ----
                                        libpq_gettext("SSL SYSCALL error: %s\n"),
                                        SOCK_STRERROR(SOCK_ERRNO));
                  else
+                 {
                      printfPQExpBuffer(&conn->errorMessage,
                                        libpq_gettext("SSL SYSCALL error: EOF detected\n"));
+                     SOCK_ERRNO = ECONNRESET;
+                     n = -1;
+                 }
                  break;
              case SSL_ERROR_SSL:
                  printfPQExpBuffer(&conn->errorMessage,
                        libpq_gettext("SSL error: %s\n"), SSLerrmessage());
                  /* fall through */
              case SSL_ERROR_ZERO_RETURN:
                  SOCK_ERRNO = ECONNRESET;
                  n = -1;
                  break;
              default:
                  printfPQExpBuffer(&conn->errorMessage,
                                    libpq_gettext("Unknown SSL error code\n"));
+                 n = -1;
                  break;
          }
      }
***************
*** 334,354 ****
                                        libpq_gettext("SSL SYSCALL error: %s\n"),
                                        SOCK_STRERROR(SOCK_ERRNO));
                  else
                      printfPQExpBuffer(&conn->errorMessage,
                                        libpq_gettext("SSL SYSCALL error: EOF detected\n"));
                  break;
              case SSL_ERROR_SSL:
                  printfPQExpBuffer(&conn->errorMessage,
                        libpq_gettext("SSL error: %s\n"), SSLerrmessage());
                  /* fall through */
              case SSL_ERROR_ZERO_RETURN:
-                 pqsecure_close(conn);
                  SOCK_ERRNO = ECONNRESET;
                  n = -1;
                  break;
              default:
                  printfPQExpBuffer(&conn->errorMessage,
                                    libpq_gettext("Unknown SSL error code\n"));
                  break;
          }
      }
--- 338,362 ----
                                        libpq_gettext("SSL SYSCALL error: %s\n"),
                                        SOCK_STRERROR(SOCK_ERRNO));
                  else
+                 {
                      printfPQExpBuffer(&conn->errorMessage,
                                        libpq_gettext("SSL SYSCALL error: EOF detected\n"));
+                     SOCK_ERRNO = ECONNRESET;
+                     n = -1;
+                 }
                  break;
              case SSL_ERROR_SSL:
                  printfPQExpBuffer(&conn->errorMessage,
                        libpq_gettext("SSL error: %s\n"), SSLerrmessage());
                  /* fall through */
              case SSL_ERROR_ZERO_RETURN:
                  SOCK_ERRNO = ECONNRESET;
                  n = -1;
                  break;
              default:
                  printfPQExpBuffer(&conn->errorMessage,
                                    libpq_gettext("Unknown SSL error code\n"));
+                 n = -1;
                  break;
          }
      }