Re: BUG #2246: Bad malloc interactions: ecpg, openssl - Mailing list pgsql-bugs

From Andrew Klosterman
Subject Re: BUG #2246: Bad malloc interactions: ecpg, openssl
Date
Msg-id Pine.LNX.4.53L-ECE.CMU.EDU.0602131538360.18395@blossom.pdl.cmu.edu
In response to Re: BUG #2246: Bad malloc interactions: ecpg, openssl  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #2246: Bad malloc interactions: ecpg, openssl
Re: BUG #2246: Bad malloc interactions: ecpg, openssl
List pgsql-bugs
On Mon, 13 Feb 2006, Tom Lane wrote:

> Andrew Klosterman <andrew5@ece.cmu.edu> writes:
> > I threw in a pthread mutex around the code making the database connections
> > for each of my threads.  The problem is still there ("corrupted
> > double-linked list").
>
> > Even tuning things down and instructing my code to only run a single
> > pthread manifests the problem over an SSL connection.
>
> Hmm.  Based on that, the problem is starting to smell more like a
> garden-variety memory clobber, for instance malloc'ing a chunk smaller
> than the data that's later stuffed into it.  It might be worth running
> the program under something like ElectricFence, which will catch the
> offender on-the-spot rather than only later when corruption of malloc's
> private data structures is detected.
>
> Looking back at your original message, I wonder if it could be the
> combination of ecpg and SSL that triggers it?  I'd have thought that
> libpq/SSL alone would be pretty well wrung out, but ecpg is not so
> widely used.
>
> BTW, you did say this was i386 right?  If it were a 64-bit architecture,
> I'd be about ready to bet money on the wrong-malloc-size-calculation
> theory.
>
> > Tracking down exactly what's tickling the problem in this case could be
> > tricky...
>
> Yeah :-(.  If you aren't able to narrow it further by yourself, please
> try to put together a self-contained test case.
>
>             regards, tom lane

I just ran the program under Electric Fence, and this is what I get in
gdb:

  Electric Fence 2.1 Copyright (C) 1987-1998 Bruce Perens.

ElectricFence Aborting: Allocating 0 bytes, probably a bug.

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 16384 (LWP 24753)]
0x401c3851 in kill () from /lib/libc.so.6
(gdb) bt
#0  0x401c3851 in kill () from /lib/libc.so.6
#1  0x40139dd5 in EF_Abort () from /usr/lib/libefence.so.0
#2  0x40139823 in memalign () from /usr/lib/libefence.so.0
#3  0x401399ad in malloc () from /usr/lib/libefence.so.0
#4  0x40139a10 in calloc () from /usr/lib/libefence.so.0
#5  0x404a182f in krb5_set_default_tgs_ktypes () from /usr/lib/libkrb5.so.3
#6  0x402c8b3f in ?? () from /usr/lib/libpq.so.4
#7  0x402ded88 in ?? () from /usr/lib/libpq.so.4
#8  0x00000000 in ?? ()

Looks like something fishy is going on between libpq and libkrb5.  I'm
especially suspicious because I'm not using Kerberos for authentication
at all.

I am developing on i386 (more or less).
# uname -m
i686

--Andrew J. Klosterman
andrew5@ece.cmu.edu
