Thread: More SSL crash woes

More SSL crash woes

From
Jeff Amiel
Date:
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3
(csl-sol210-3_4-branch+sol_rpath)"

As the proud author of this previous post:
http://archives.postgresql.org/pgsql-general/2007-08/msg01911.php

I never found a real answer except to disable SSL on the connections between my master and
subscriber nodes (instead shuttling data over a secure tunnel).

Things have been peachy ever since.

Today....first db crash in quite some time...seemingly unrelated to slony.

Stack trace looks eerily familiar:

Core was generated by `/usr/local/pgsql/bin/postgres -D /db'.
Program terminated with signal 11, Segmentation fault.
#0  0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8

*grumble*

I have located nothing unusual occurring at the time of the event....
We have developers that connect from win32 and Fedora boxes via pgAdmin III and they use SSL
connections...but they have ALWAYS connected using SSL.....

Any suggestions?  I really need to either provide an explanation or make SOME change to
prevent a recurrence....

Upgrade openssl? (we are on 0.9.8e)
Remove ALL encrypted connection capabilities (via pg_hba.conf) and force connectivity over a secure tunnel?
Punt?

Looks like this box does not have postgres compiled with --enable-debug....but dunno if it would help anyway, being
that the crash occurs in libcrypto....

Any help would be appreciated.

Re: More SSL crash woes

From
Tom Lane
Date:
Jeff Amiel <jamiel@istreamimaging.com> writes:
> As the proud author of this previous post:
> http://archives.postgresql.org/pgsql-general/2007-08/msg01911.php

(Non-HTML posts are preferred on these lists)

> I never found a real answer except to disable SSL on the connections between my master and
> subscriber nodes (instead shuttling data over a secure tunnel)

The previous thread suggested that you might have a problem with
different bits of code being linked to different versions of libssl.
Did you ever resolve that?  Given the lack of other reports, I'm
pretty suspicious that it's something like that, rather than a real
bug in either slony or PG.

            regards, tom lane

Re: More SSL crash woes

From
Alvaro Herrera
Date:
Jeff Amiel wrote:

> Stack trace looks eerily familiar:
>
> Core was generated by `/usr/local/pgsql/bin/postgres -D /db'.
> Program terminated with signal 11, Segmentation fault.
> #0  0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8
>
> *grumble*

Did you try installing the OpenSSL with debugging symbols and getting a
better stack trace?  If this is a bug in OpenSSL I'm sure they'd like to
hear it.

Some random browsing shows that sk_value is part of a stack
implementation in OpenSSL which has been there for many years.
http://www.openssl.org/news/changelog.html
http://www.columbia.edu/~ariel/ssleay/stack.html

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: More SSL crash woes

From
Jeff Amiel
Date:
Tom Lane wrote:
> The previous thread suggested that you might have a problem with
> different bits of code being linked to different versions of libssl.
> Did you ever resolve that?  Given the lack of other reports, I'm
> pretty suspicious that it's something like that, rather than a real
> bug in either slony or PG.
>

# ldd /usr/local/pgsql/bin/postgres
        ...
        libssl.so.0.9.8 =>       /usr/local/ssl/lib/libssl.so.0.9.8
        libcrypto.so.0.9.8 =>    /usr/local/ssl/lib/libcrypto.so.0.9.8
# ldd /usr/local/pgsql/bin/slon
        ...
        libssl.so.0.9.8 =>       /usr/local/ssl/lib/libssl.so.0.9.8
        libcrypto.so.0.9.8 =>    /usr/local/ssl/lib/libcrypto.so.0.9.8

Now there are 2 subscriber nodes that connect to this node for slony
replication...
One is running the same version (libssl 0.9.8e) but one is running
0.9.7e-p1 2.
Could this be an issue?

So let's ask what is different between my config and the rest of the
world....

The stack trace actually was one more level deep and the reference to
'output_cert_chain' got me thinking....
#0  0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8
#1  0xfef5b05b in ssl3_output_cert_chain () from /usr/local/ssl/lib/libssl.so.0.9.8
#2  0x00000000 in ?? ()

Is it unique that I use SSL for encryption but not for authentication?
I have no root.crt (and see the warning in my logs about "could not
load root certificate file "root.crt": No such file or directory.  Will
not verify client certificates.")
Is this unusual?  Do other people use SSL with postgres JUST for encryption?

Is there something wrong with the way we build/install libssl?
We currently do a pkgadd of the binary from sunfreeware:

/usr/sfw/bin/wget ftp://ftp.sunfreeware.com/pub/freeware/intel/10/openssl-0.9.8e-sol10-x86-local.gz
gzip -d openssl-0.9.8e-sol10-x86-local.gz
pkgadd -d openssl-0.9.8e-sol10-x86-local

I went back and researched the nearly identical problems we were having
under FreeBSD and the stack trace (using a slightly different/older
version of libssl) looks like a different spot:

(gdb) bt
#0  0x2838e492 in SHA1_Init () from /lib/libcrypto.so.3
#1  0x2838a14a in X509_check_private_key () from /lib/libcrypto.so.3
#2  0x2838a459 in EVP_DigestInit_ex () from /lib/libcrypto.so.3

Any other thoughts?

Re: More SSL crash woes

From
Jeff Amiel
Date:
Jeff Amiel wrote:
>
>
> Now there are 2 subscriber nodes that connect to this node for slony
> replication...
> One is running the same version (libssl 0.9.8e) but one is running
> 0.9.7e-p1 2.
> could this be an issue?

Note that both nodes are set to 'hostnossl' in the pg_hba.conf.


Re: More SSL crash woes

From
Tom Lane
Date:
Jeff Amiel <jamiel@istreamimaging.com> writes:
> Now there are 2 subscriber nodes that connect to this node for slony
> replication...
> One is running the same version (libssl 0.9.8e) but one is running
> 0.9.7e-p1 2.
> could this be an issue?

Seems unlikely, that would mean that openssl failed to preserve a
compatible on-the-wire protocol across versions ...

> Is there something wrong with the way we build/install libssl?

That's what I'm suspecting at this point, but I've got too little
acquaintance with the Solaris world to guess what it is.

One idea: you are linking to /usr/local/ssl/lib/libssl.so, but is
it possible that when you compile PG it is finding the header files
for some other version?

            regards, tom lane

Re: More SSL crash woes

From
Jeff Amiel
Date:
Tom Lane wrote:
> One idea: you are linking to /usr/local/ssl/lib/libssl.so, but is
> it possible that when you compile PG it is finding the header files
> for some other version?
>

Yes...if I could figure out how the include path is being set on the
postgresql build.
I'm looking at the config.log and I see no reference to -I (to set the
include path).
It simply references the header files as "openssl/ssl.h".

Any way to tell the default include path for gcc?
There are two sets:

/usr/sfw/include/openssl/ssl.h (older, incorrect one)
/usr/local/ssl/include/openssl/ssl.h (newer, 'correct' one)

I guess I could build something that #includes openssl/ssl.h and
'butcher' the bad one and see what happens.
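
On the question of gcc's default include path: as far as I know, running the preprocessor verbosely on an empty input (gcc -E -v -x c /dev/null) prints the directories searched for #include <...>, in order, on stderr. The sketch below is only an illustration under that assumption. The function names gcc_include_dirs and first_header_dir are made up for this example, and any -I flags added by the real build are not accounted for.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of gcc or PostgreSQL.

# gcc_include_dirs
# Print the compiler's #include <...> search directories, one per line,
# in search order. Assumes the compiler (CC, default gcc) prints the usual
# "search starts here" / "End of search list." markers with -v.
gcc_include_dirs() {
    "${CC:-gcc}" -E -v -x c /dev/null 2>&1 |
        sed -n '/^#include <\.\.\.> search starts here:/,/^End of search list\./p' |
        sed -e '1d' -e '$d' -e 's/^ *//'
}

# first_header_dir HEADER DIR...
# Print the first DIR, in the order given, that contains HEADER.
# Returns non-zero if no directory contains it.
first_header_dir() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example (needs a working compiler; breaks on directories with spaces):
#   first_header_dir openssl/ssl.h $(gcc_include_dirs)
```

If the directory printed is /usr/sfw/include rather than /usr/local/ssl/include, the build would be picking up the older ssl.h.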

Re: More SSL crash woes

From
Jeff Amiel
Date:
Tom Lane wrote:
> One idea: you are linking to /usr/local/ssl/lib/libssl.so, but is
> it possible that when you compile PG it is finding the header files
> for some other version?
>

Sure enough...I put a #ERROR at the top of the 'old/incorrect' ssl.h and
did a make clean/make and it errored out.
So I was building with 0.9.8 libraries...but 0.9.7 header files.

That can't be good.
I guess that would explain why nobody else on the planet has seen this
issue....  :)


thanks much for the assist!
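
A less invasive check than editing the header might be to compare the version each copy of the headers declares. The sketch below assumes each include directory has an openssl/opensslv.h next to ssl.h that defines OPENSSL_VERSION_NUMBER as a plain hex literal (true of the 0.9.x headers as far as I know, not necessarily of much newer releases). The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# header_openssl_version FILE
# Print the hex value of OPENSSL_VERSION_NUMBER defined in FILE
# (an opensslv.h), without any trailing "L" suffix.
header_openssl_version() {
    sed -n 's/^#[[:space:]]*define[[:space:]]\{1,\}OPENSSL_VERSION_NUMBER[[:space:]]\{1,\}\(0x[0-9A-Fa-f]*\).*/\1/p' "$1" |
        head -n 1
}

# same_openssl_release HEX1 HEX2
# Succeed when both numbers agree on major, minor, and fix level, i.e. the
# first five hex digits after "0x" (so 0.9.8 and 0.9.7 differ).
same_openssl_release() {
    a=$(printf '%s' "${1#0x}" | cut -c1-5)
    b=$(printf '%s' "${2#0x}" | cut -c1-5)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example, using the two include directories named in this thread:
#   old=$(header_openssl_version /usr/sfw/include/openssl/opensslv.h)
#   new=$(header_openssl_version /usr/local/ssl/include/openssl/opensslv.h)
#   same_openssl_release "$old" "$new" || echo "header versions differ: $old vs $new"
```

One possible fix, assuming the stock PostgreSQL configure script, is to state both paths explicitly with --with-includes=/usr/local/ssl/include and --with-libraries=/usr/local/ssl/lib, so the headers and the library come from the same install.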

Re: More SSL crash woes

From
Tom Lane
Date:
Jeff Amiel <jamiel@istreamimaging.com> writes:
> Sure enough...I put a #ERROR at the top of the 'old/incorrect' ssl.h and
> did a make clean/make and errored out.
> So I was building with 0.9.8 libraries...but 0.9.7 header files.

Fascinating.  I read your previous mail and was about to reply that
/usr/local/include is normally first on gcc's default search path.
However, I believe it's possible to configure it differently when
gcc is built, and you must be working with such a build.

(No, I don't remember how to tell what the default path really is.  But
the gcc manual might tell you.)

            regards, tom lane