On Mon, 13 Feb 2006, Tom Lane wrote:
> Andrew Klosterman <andrew5@ece.cmu.edu> writes:
> > I threw in a pthread mutex around the code making the database connections
> > for each of my threads. The problem is still there ("corrupted
> > double-linked list").
>
> > Even tuning things down and instructing my code to only run a single
> > pthread manifests the problem over an SSL connection.
>
> Hmm. Based on that, the problem is starting to smell more like a
> garden-variety memory clobber, for instance malloc'ing a chunk smaller
> than the data that's later stuffed into it. It might be worth running
> the program under something like ElectricFence, which will catch the
> offender on-the-spot rather than only later when corruption of malloc's
> private data structures is detected.
>
> Looking back at your original message, I wonder if it could be the
> combination of ecpg and SSL that triggers it? I'd have thought that
> libpq/SSL alone would be pretty well wrung out, but ecpg is not so
> widely used.
>
> BTW, you did say this was i386 right? If it were a 64-bit architecture,
> I'd be about ready to bet money on the wrong-malloc-size-calculation
> theory.
>
> > Tracking down exactly what's tickling the problem in this case could be
> > tricky...
>
> Yeah :-(. If you aren't able to narrow it further by yourself, please
> try to put together a self-contained test case.
>
> regards, tom lane
I ran the program under Electric Fence as you suggested, and this is what
I get in gdb:
Electric Fence 2.1 Copyright (C) 1987-1998 Bruce Perens.
ElectricFence Aborting: Allocating 0 bytes, probably a bug.
Program received signal SIGILL, Illegal instruction.
[Switching to Thread 16384 (LWP 24753)]
0x401c3851 in kill () from /lib/libc.so.6
(gdb) bt
#0 0x401c3851 in kill () from /lib/libc.so.6
#1 0x40139dd5 in EF_Abort () from /usr/lib/libefence.so.0
#2 0x40139823 in memalign () from /usr/lib/libefence.so.0
#3 0x401399ad in malloc () from /usr/lib/libefence.so.0
#4 0x40139a10 in calloc () from /usr/lib/libefence.so.0
#5 0x404a182f in krb5_set_default_tgs_ktypes () from /usr/lib/libkrb5.so.3
#6 0x402c8b3f in ?? () from /usr/lib/libpq.so.4
#7 0x402ded88 in ?? () from /usr/lib/libpq.so.4
#8 0x00000000 in ?? ()
It looks like something fishy is going on between libpq and libkrb5. I'm
especially suspicious because I'm not using Kerberos for authentication at
all.
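One caveat on reading this trace: a zero-byte request is legal C. calloc(0, n) may return either NULL or a unique pointer that can be passed to free(), so this abort on its own may be benign rather than the actual clobber. Electric Fence traps such calls by default because they are often the result of a bug, and its man page documents the EF_ALLOW_MALLOC_0 environment variable for tolerating them. A minimal sketch of what the flagged call amounts to (the helper name is hypothetical, not taken from libkrb5 or libpq):

```c
#include <stdlib.h>

/* Hypothetical helper: performs the kind of zero-byte calloc() that
 * Electric Fence reported. Returns 1 once the request and the matching
 * free() have completed, which standard C permits. */
int zero_byte_calloc_ok(void)
{
    /* The result is implementation-defined: NULL or a unique pointer.
     * Either way it must never be dereferenced. */
    void *p = calloc(0, sizeof(int));

    /* free(NULL) is a no-op, so this is safe in both cases. */
    free(p);
    return 1;
}
```

With the ordinary libc allocator this simply returns 1. Linked against libefence it would abort as shown in the backtrace unless EF_ALLOW_MALLOC_0 is set to a non-zero value, which should let the run continue to the next real error.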
I am developing on i386 (more or less).
# uname -m
i686
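For reference, the wrong-malloc-size-calculation theory describes a bug that can stay hidden on i386 and only bite on 64-bit. A rough sketch, assuming a table of string pointers that is sized with sizeof(int) by mistake; all function names here are made up for illustration and are not from libpq or ecpg:

```c
#include <stddef.h>
#include <stdlib.h>

/* Buggy sizing: enough bytes for n ints, not for n pointers.
 * On i386 both are 4 bytes, so the mistake goes unnoticed; on a typical
 * 64-bit ABI pointers are 8 bytes and the buffer is half the size needed. */
size_t buggy_table_bytes(size_t n)
{
    return n * sizeof(int);
}

/* Correct sizing: derive the element size from the element type. */
size_t correct_table_bytes(size_t n)
{
    return n * sizeof(char *);
}

/* Allocates a zero-filled table of n string pointers with the correct
 * size. Returns NULL on allocation failure; the caller frees the result. */
char **alloc_string_table(size_t n)
{
    return calloc(n, sizeof(char *));
}
```

Since this machine reports i686, the two size calculations agree here, which is why that particular theory is less likely on this box.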
--Andrew J. Klosterman
andrew5@ece.cmu.edu