BUG #2246: Bad malloc interactions: ecpg, openssl - Mailing list pgsql-bugs

From Andy Klosterman
Subject BUG #2246: Bad malloc interactions: ecpg, openssl
Msg-id 20060207201545.CAEC2F0AC7@svr2.postgresql.org
List pgsql-bugs
The following bug has been logged online:

Bug reference:      2246
Logged by:          Andy Klosterman
Email address:      andrew5@ece.cmu.edu
PostgreSQL version: 8.1.0
Operating system:   Debian testing: Linux nc3 2.4.27-2-386 #1 Wed Nov 30 21:38:51 JST 2005 i686 GNU/Linux
Description:        Bad malloc interactions: ecpg, openssl
Details:

Before going into a full description and working out some example code for
this situation, I'm fishing for interest in tracking it down and fixing it
(or not).

I (pre-)compile a program with ecpg; it connects to a remote Postgres
instance over an SSL connection (set up in pg_hba.conf, with the appropriate
certificates installed). The application terminates prematurely with the
following error:
*** glibc detected *** corrupted double-linked list: 0x0807c830 ***
Abort.

(Without an SSL connection (as set in pg_hba.conf), the program executes just
fine. This leads me to suspect the SSL libraries.)

The back trace from gdb looks like this (which doesn't appear to be too
informative, but looks like an exception stack):
    #0  0x401bc851 in kill () from /lib/libc.so.6
    #1  0x4014a309 in pthread_kill () from /lib/libpthread.so.0
    #2  0x4014a6c0 in raise () from /lib/libpthread.so.0
    #3  0x401bc606 in raise () from /lib/libc.so.6
    #4  0x401bd971 in abort () from /lib/libc.so.6
    #5  0x401ef930 in __fsetlocking () from /lib/libc.so.6
    #6  0x401f52b9 in malloc_usable_size () from /lib/libc.so.6
    #7  0x401f5395 in malloc_usable_size () from /lib/libc.so.6
    #8  0x401f5a43 in malloc_trim () from /lib/libc.so.6
    #9  0x401f5d51 in free () from /lib/libc.so.6
    #10 0x4052ce6c in zcfree () from /usr/lib/libz.so.1
    #11 0x4052f83f in inflateEnd () from /usr/lib/libz.so.1
    #12 0x4040f262 in COMP_rle () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #13 0x0807e680 in ?? ()
    #14 0x00000000 in ?? ()

After a bit of digging around online, I discovered the MALLOC_CHECK_
environment variable and how it changes the behavior of malloc (see man 3
malloc). The above back trace was taken without MALLOC_CHECK_ in the
environment (i.e., after unsetenv MALLOC_CHECK_).

Running with MALLOC_CHECK_ equal to 2 or 1 allows my program to run to
completion.

With MALLOC_CHECK_ set to 0 (which is supposed to ignore corruption), I get
a segfault.  Running inside gdb gets me the following back trace:
    #0  0x403d6f73 in ASN1_template_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #1  0x403d6e0d in ASN1_primitive_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #2  0x403d7023 in ASN1_item_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #3  0x403d0c07 in X509_CERT_AUX_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #4  0x403d077a in X509_CINF_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #5  0x403d6e35 in ASN1_primitive_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #6  0x403d7023 in ASN1_item_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #7  0x403d0927 in X509_free () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
    #8  0x402d16f3 in pqsecure_destroy () from /usr/lib/libpq.so.4
    #9  0x402c387a in PQconninfoFree () from /usr/lib/libpq.so.4
    #10 0x402c39c3 in PQfinish () from /usr/lib/libpq.so.4
    #11 0x4002f41b in ECPGget_connection () from /usr/lib/libecpg.so.5
    #12 0x40030223 in ECPGdisconnect () from /usr/lib/libecpg.so.5
    #13 0x0804a113 in DBDisconnect (arg_connection=0x8054faf "client_correctness") at client_test.pgcc:215
    #14 0x0804a64e in DoCorrectnessChecks () at client_test.pgcc:278
    #15 0x0804aaa1 in main (argc=7, argv=0xbffffa84) at client_test.pgcc:523

PURE SPECULATION:  It looks like there is either trouble in the interaction
between Postgres and the SSL library or just a bit of trouble within the SSL
library.
SPECULATION: Another possibility is that I misunderstand some aspect of
multi-threaded interactions with Postgres (I open a uniquely named connection
to the DB for each thread of my test program). Maybe I need a "lock" around
the code that makes DB connections, to make sure that only one happens at a
time (if that is the case, it might be better handled within Postgres/SSL).

PROCEEDING FURTHER: If there is any desire on the part of any developers to
pursue this further, I'm open.  As things stand right now, I have
workarounds:
1. Don't use an SSL connection to the DB.
2. Do a "setenv MALLOC_CHECK_ 1" (or 2) and it works.
