Thread: pgcrypto related backend crash on solaris 10/x86_64

pgcrypto related backend crash on solaris 10/x86_64

From: Stefan Kaltenbrunner
Date:
I brought clownfish back onto the buildfarm (I'm still a bit dubious
about the unexplained failures, which seem to be VMware emulation bugs,
but this one is easily reproducible) and enabled --with-openssl after
the recent OpenSSL/pgcrypto-related fixes, but I'm still getting a
backend crash during the pgcrypto regression tests:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clownfish&dt=2007-09-09%2012:14:50



The backtrace looks like:

program terminated by signal SEGV (no mapping at the fault address)
0xfffffd7fff241b61: AES_encrypt+0x0241: xorq     (%r15,%rdx,8),%rbx
(dbx) where
=>[1] AES_encrypt(0x5, 0x39dc9a7a, 0xf560e7b50e, 0x90ca350d49,
0xf560e7b50ea90dfb, 0x6b6b6b6b), at 0xfffffd7fff241b61
  [2] 0x0(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x0


Stefan


Re: pgcrypto related backend crash on solaris 10/x86_64

From: "Marko Kreen"
Date:
On 9/9/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
> I brought clownfish back onto the buildfarm (I'm still a bit dubious
> about the unexplained failures, which seem to be VMware emulation bugs,
> but this one is easily reproducible) and enabled --with-openssl after
> the recent OpenSSL/pgcrypto-related fixes, but I'm still getting a
> backend crash during the pgcrypto regression tests:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clownfish&dt=2007-09-09%2012:14:50
>
>
>
> The backtrace looks like:
>
> program terminated by signal SEGV (no mapping at the fault address)
> 0xfffffd7fff241b61: AES_encrypt+0x0241: xorq     (%r15,%rdx,8),%rbx
> (dbx) where
> =>[1] AES_encrypt(0x5, 0x39dc9a7a, 0xf560e7b50e, 0x90ca350d49,
> 0xf560e7b50ea90dfb, 0x6b6b6b6b), at 0xfffffd7fff241b61
>   [2] 0x0(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x0

This is crashing because of the crippled OpenSSL on some versions
of Solaris.  Zdenek Kotala posted a workaround for that; I am
cleaning it up but have not yet found the time to finalize it.

I'll try to post v03 of Zdenek's patch ASAP.

-- 
marko


Re: pgcrypto related backend crash on solaris 10/x86_64

From: Tom Lane
Date:
"Marko Kreen" <markokr@gmail.com> writes:
> On 9/9/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
>> I brought clownfish back onto the buildfarm (I'm still a bit dubious
>> about the unexplained failures, which seem to be VMware emulation bugs,
>> but this one is easily reproducible) and enabled --with-openssl after
>> the recent OpenSSL/pgcrypto-related fixes, but I'm still getting a
>> backend crash during the pgcrypto regression tests:

> This is crashing because of the crippled OpenSSL on some versions
> of Solaris.  Zdenek Kotala posted a workaround for that; I am
> cleaning it up but have not yet found the time to finalize it.

But clownfish was working fine up through Aug 2, and the only change in
pgcrypto since then could hardly have introduced this failure:
http://archives.postgresql.org/pgsql-committers/2007-08/msg00306.php

So I think there's more to it than Marko's explanation.  Maybe clownfish
now has a different OpenSSL version installed than before?
        regards, tom lane


Re: pgcrypto related backend crash on solaris 10/x86_64

From: Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
>> On 9/9/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
>>> I brought clownfish back onto the buildfarm (I'm still a bit dubious
>>> about the unexplained failures, which seem to be VMware emulation bugs,
>>> but this one is easily reproducible) and enabled --with-openssl after
>>> the recent OpenSSL/pgcrypto-related fixes, but I'm still getting a
>>> backend crash during the pgcrypto regression tests:
> 
>> This is crashing because of the crippled OpenSSL on some versions
>> of Solaris.  Zdenek Kotala posted a workaround for that; I am
>> cleaning it up but have not yet found the time to finalize it.
> 
> But clownfish was working fine up through Aug 2, and the only change in
> pgcrypto since then could hardly have introduced this failure:
> http://archives.postgresql.org/pgsql-committers/2007-08/msg00306.php
> 
> So I think there's more to it than Marko's explanation.  Maybe clownfish
> now has a different OpenSSL version installed than before?

No, clownfish was not building with OpenSSL before, because of that
"crippled OpenSSL" issue. I was under the assumption that the above
commit incorporated the complete fix from Zdenek, so I enabled it
again, only to find that it is still not working ...


Stefan


Re: pgcrypto related backend crash on solaris 10/x86_64

From: Zdenek Kotala
Date:
Marko Kreen wrote:
> On 9/9/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
>> I brought clownfish back onto the buildfarm (I'm still a bit dubious
>> about the unexplained failures, which seem to be VMware emulation bugs,
>> but this one is easily reproducible) and enabled --with-openssl after
>> the recent OpenSSL/pgcrypto-related fixes, but I'm still getting a
>> backend crash during the pgcrypto regression tests:
>>
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clownfish&dt=2007-09-09%2012:14:50
>>
>>
>>
>> The backtrace looks like:
>>
>> program terminated by signal SEGV (no mapping at the fault address)
>> 0xfffffd7fff241b61: AES_encrypt+0x0241: xorq     (%r15,%rdx,8),%rbx
>> (dbx) where
>> =>[1] AES_encrypt(0x5, 0x39dc9a7a, 0xf560e7b50e, 0x90ca350d49,
>> 0xf560e7b50ea90dfb, 0x6b6b6b6b), at 0xfffffd7fff241b61
>>   [2] 0x0(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x0
> 
> This is crashing because of the crippled OpenSSL on some versions
> of Solaris.  Zdenek Kotala posted a workaround for that; I am
> cleaning it up but have not yet found the time to finalize it.
> 
> I'll try to post v03 of Zdenek's patch ASAP.
> 

However, I guess there will still be a problem with the regression
tests: pgcrypto will report an error when the user tries to use a
stronger cipher, which produces a diff between the expected and actual
output.

I don't know whether it is possible to select a different expected
output depending on whether strong crypto is installed. Maybe some
magic in the Makefile/configure could do it. The test would be:

# ldd /usr/postgres/8.2/lib/pgcrypto.so  | grep libcrypto_extra
#   libcrypto_extra.so.0.9.8 =>      (file not found)

If the output contains "(file not found)", the library is either not
installed or not in the library path (/usr/sfw/lib).
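A minimal shell sketch of that check, of the kind a Makefile or configure test could call. The function names crypto_extra_missing and check_pgcrypto are hypothetical, and the only assumption about ldd is the "(file not found)" marker shown in the output above.

```shell
#!/bin/sh
# Sketch of the ldd-based check described above. Assumption: Solaris ldd
# prints "(file not found)" for a dependency it cannot resolve, as in the
# example output.

# Reads ldd output on stdin. Exit status 0 means libcrypto_extra is listed
# as a dependency but could not be resolved, i.e. the strong-crypto
# library is not installed or not on the library path (e.g. /usr/sfw/lib).
crypto_extra_missing() {
    grep 'libcrypto_extra' | grep -q 'file not found'
}

# Hypothetical wrapper around ldd for a given pgcrypto.so path.
# Prints "missing", "ok", or "no such file".
check_pgcrypto() {
    if [ ! -f "$1" ]; then
        echo "no such file"
        return 2
    fi
    if ldd "$1" 2>/dev/null | crypto_extra_missing; then
        echo "missing"
        return 1
    fi
    echo "ok"
}
```

For example, check_pgcrypto /usr/postgres/8.2/lib/pgcrypto.so would print "missing" on a machine where ldd produces the output above. Note that "ok" only means ldd did not report libcrypto_extra as unresolved; it is also printed when pgcrypto.so does not link against libcrypto_extra at all.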

    Zdenek


Re: pgcrypto related backend crash on solaris 10/x86_64

From: "Marko Kreen"
Date:
On 9/11/07, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
> Marko Kreen wrote:
> > This is crashing because of the crippled OpenSSL on some versions
> > of Solaris.  Zdenek Kotala posted a workaround for that; I am
> > cleaning it up but have not yet found the time to finalize it.
> >
> > I'll try to post v03 of Zdenek's patch ASAP.

> However, I guess there will still be a problem with the regression
> tests: pgcrypto will report an error when the user tries to use a
> stronger cipher, which produces a diff between the expected and actual
> output.
>
> I don't know whether it is possible to select a different expected
> output depending on whether strong crypto is installed. Maybe some
> magic in the Makefile/configure could do it. The test would be:
>
> # ldd /usr/postgres/8.2/lib/pgcrypto.so  | grep libcrypto_extra
> #   libcrypto_extra.so.0.9.8 =>      (file not found)
>
> If the output contains "(file not found)", the library is either not
> installed or not in the library path (/usr/sfw/lib).

Failing regression tests are fine; it is good if the user can
easily see that the OS is broken.

-- 
marko


Re: pgcrypto related backend crash on solaris 10/x86_64

From: Zdenek Kotala
Date:
Marko Kreen wrote:
> On 9/11/07, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
>> Marko Kreen wrote:
>>> This is crashing because of the crippled OpenSSL on some versions
>>> of Solaris.  Zdenek Kotala posted a workaround for that; I am
>>> cleaning it up but have not yet found the time to finalize it.
>>>
>>> I'll try to post v03 of Zdenek's patch ASAP.
> 
>> However, I guess there will still be a problem with the regression
>> tests: pgcrypto will report an error when the user tries to use a
>> stronger cipher, which produces a diff between the expected and actual
>> output.
>>
>> I don't know whether it is possible to select a different expected
>> output depending on whether strong crypto is installed. Maybe some
>> magic in the Makefile/configure could do it. The test would be:
>>
>> # ldd /usr/postgres/8.2/lib/pgcrypto.so  | grep libcrypto_extra
>> #   libcrypto_extra.so.0.9.8 =>      (file not found)
>>
>> If the output contains "(file not found)", the library is either not
>> installed or not in the library path (/usr/sfw/lib).
> 
> Failing regression tests are fine; it is good if the user can
> easily see that the OS is broken.
> 

But if the build machines keep complaining about this problem, we can
easily overlook other problems. There are two possible solutions:
1) modify the regression test, or 2) recommend installing the crypto
package on all affected build machines.

Anyway, I plan to add a mention to the Solaris FAQ once we have the
final patch. I also think it would be good to mention this in the
pgcrypto README, or to add a comment to the regression test's expected
output file, which will be visible in regression.diffs.

    Zdenek



Re: pgcrypto related backend crash on solaris 10/x86_64

From: Stefan Kaltenbrunner
Date:
Zdenek Kotala wrote:
> Marko Kreen wrote:
>> On 9/11/07, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
>>> Marko Kreen wrote:
>>>> This is crashing because of the crippled OpenSSL on some versions
>>>> of Solaris.  Zdenek Kotala posted a workaround for that; I am
>>>> cleaning it up but have not yet found the time to finalize it.
>>>>
>>>> I'll try to post v03 of Zdenek's patch ASAP.
>>
>>> However, I guess there will still be a problem with the regression
>>> tests: pgcrypto will report an error when the user tries to use a
>>> stronger cipher, which produces a diff between the expected and actual
>>> output.
>>>
>>> I don't know whether it is possible to select a different expected
>>> output depending on whether strong crypto is installed. Maybe some
>>> magic in the Makefile/configure could do it. The test would be:
>>>
>>> # ldd /usr/postgres/8.2/lib/pgcrypto.so  | grep libcrypto_extra
>>> #   libcrypto_extra.so.0.9.8 =>      (file not found)
>>>
>>> If the output contains "(file not found)", the library is either not
>>> installed or not in the library path (/usr/sfw/lib).
>>
>> Failing regression tests are fine; it is good if the user can
>> easily see that the OS is broken.
>>
> 
> But if the build machines keep complaining about this problem, we can
> easily overlook other problems. There are two possible solutions:
> 1) modify the regression test, or 2) recommend installing the crypto
> package on all affected build machines.
> 
> Anyway, I plan to add a mention to the Solaris FAQ once we have the
> final patch. I also think it would be good to mention this in the
> pgcrypto README, or to add a comment to the regression test's expected
> output file, which will be visible in regression.diffs.

Well, in my opinion we should simply fail the regression tests (not
crash, as we do now) when we have to deal with such a crippled OpenSSL
installation. Adding information about that issue to the Solaris FAQ
also seems like a good thing.


Stefan