Thread: Bug in ecpg lib ?

Bug in ecpg lib ?

From
leif@crysberg.dk
Date:
Hi guys,

   I'm using PostgreSQL in a server project that uses many forks and many threads in each forked process.

   Almost every time I do a pthread_cancel() I get a SIGSEGV. I have then linked libmudflapth into my program to
catch the problem sooner, and now that reports either 'invalid pointer' or 'double free or corruption' when a thread is
cancelled. Typically I have 2 database connections opened before any of the threads are created. I am pretty sure that
I'm only using 1 connection in any 1 thread, i.e. only 2 of the threads are doing database access, each using its own
allocated connection.

   After the main thread has done a pthread_cancel() I get a "mudflapth dump" with the following traceback (the abort
comes from the mudflapth lib when detecting the bad pointer):

#0  0xffffe405 in __kernel_vsyscall ()
#1  0xf7ca2335 in raise () from /lib32/libc.so.6
#2  0xf7ca3cb1 in abort () from /lib32/libc.so.6
#3  0xf7cdb6ec in ?? () from /lib32/libc.so.6
#4  0xf7ce71ab in free () from /lib32/libc.so.6
#5  0xf7dec061 in free (buf=0x87ed138) at ../../../libmudflap/mf-hooks1.c:241
#6  0xf7ef2b5c in ecpg_sqlca_key_destructor () from /lib32/libecpg.so.6
#7  0xf7dcebb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
#8  0xf7dcf509 in start_thread () from /lib32/libpthread.so.0
#9  0xf7d5008e in clone () from /lib32/libc.so.6

   Looking at ecpg_sqlca_key_destructor(), it seems to me that the sqlca can be deallocated several times !? (I'm
not too much into the Postgres code including ecpg, so that is a novice point of view.)

   I have tried both pgsql-8.3.5 and pgsql-8.4rc1, with exactly the same result, and on many different Linux
systems, mainly Slackware 10.2 and Ubuntu 7. On all systems I have configured and compiled Postgres with this configure
line:

./configure --prefix=/usr/local/Packages/pgsql-8.3.5 --with-openssl --enable-thread-safety

   Please help,

 Leif

Re: Bug in ecpg lib ?

From
"Albe Laurenz"
Date:
leif@crysberg.dk wrote:
>    I'm using PostgreSQL in a server project that uses many 
> forks and many threads in each forked process.
> 
>    Almost every time I do a pthread_cancel() I get a SIGSEGV. 
> I have then linked the libmudflapth into my program to catch 
> the problem sooner and now that reports either 'invalid 
> pointer' or 'double free or corruption' when a thread is 
> cancelled. Typically I have 2 database connection opened 
> before any of the threads are created. I am pretty sure that 
> I'm only using 1 connection in any 1 thread, i.e. only 2 of 
> the threads are doing database access and using each their 
> allocated connection.
> 
>    After the main thread has done a pthread_cancel() I get a 
> "mudflapth dump" with the following trace back (the abort 
> comes from the mudflapth lib when detecting the bad pointer):
> 
> #0  0xffffe405 in __kernel_vsyscall ()
> #1  0xf7ca2335 in raise () from /lib32/libc.so.6
> #2  0xf7ca3cb1 in abort () from /lib32/libc.so.6
> #3  0xf7cdb6ec in ?? () from /lib32/libc.so.6
> #4  0xf7ce71ab in free () from /lib32/libc.so.6
> #5  0xf7dec061 in free (buf=0x87ed138) at ../../../libmudflap/mf-hooks1.c:241
> #6  0xf7ef2b5c in ecpg_sqlca_key_destructor () from /lib32/libecpg.so.6
> #7  0xf7dcebb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
> #8  0xf7dcf509 in start_thread () from /lib32/libpthread.so.0
> #9  0xf7d5008e in clone () from /lib32/libc.so.6
> 
>    Looking in the ecpg_sqlca_key_destructor(), it seems to me 
> that the sqlca can be deallocated several times !? (I'm not 
> too much into the Postgres code including ecpg, so that is a 
> novice point of view.)
> 
>    I have tried both pgsql-8.3.5 and pgsql-8.4rc1, with 
> exactly the same result and on many different Linux 
> systems, mainly Slackware 10.2 and Ubuntu 7. I have on all 
> systems configured and compiled Postgres with this configure line:
> 
> ./configure --prefix=/usr/local/Packages/pgsql-8.3.5 
> --with-openssl --enable-thread-safety

Could you create a small sample program that reproduces the bug?

That would make it easier for me or somebody else to do something about it.

Yours,
Laurenz Albe

Re: Bug in ecpg lib ?

From
leif@crysberg.dk
Date:
Hi Laurenz,

   Thanks for the suggestion. It sure wasn't easy, but I should have done that right away. It turned out not to be in
the ecpg module, but somewhere in my own code (of course ;-) ). At least I haven't been able to reproduce it in a simple
example, and I haven't figured out where in my own code yet either.

 Leif



Re: Bug in ecpg lib ?

From
leif@crysberg.dk
Date:
Hi Laurenz,

   I have now generated a rather small example in which I experience the problem (attached). It is linked with the
mudflapth library using the commands below. You may have to change the DBNAME and DBUSER. The delay just before the
pthread_cancel(), i.e. sleep(10), is rather critical for the problem to appear, and you might have to change it to
something less. On some very slow machines I wasn't able to produce the problem.

$ ecpg crashex.pgc
$ /usr/local/Packages/gcc-4.4.0/bin/gcc -O0 -c  -fmudflap -fmudflapth -fomit-frame-pointer \
-B/usr/local/Packages/gcc-4.4.0/bin/ -Wwrite-strings -std=gnu89 -ggdb -fPIC -Wall -I/usr/local/Packages/pgsql/include \
-I./Modules -I./ -o crashex.o crashex.c
$ /usr/local/Packages/gcc-4.4.0/bin/gcc -O0 -B/usr/local/Packages/gcc-4.4.0/bin/ -Wl -o crashex crashex.o \
-L/usr/local/Packages/pgsql/lib -lecpg -lpq -lmudflapth -lpthread -ldl
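
The attached crashex.pgc is not reproduced in this thread. As a rough illustration of the "start a worker, wait, then pthread_cancel()" pattern described above, here is a minimal plain C sketch without any ECPG calls. The names work(), cancel_after() and the semaphore are made up for this illustration, and it assumes the default deferred cancellation type, so the worker is only cancelled at a cancellation point such as sleep().

```c
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t started;           /* posted by the worker once it is running */

/* Hypothetical worker: loops forever; sleep() is a cancellation point. */
static void *work(void *unused)
{
    (void) unused;
    sem_post(&started);
    for (;;)
        sleep(1);
    return NULL;                /* not reached */
}

/*
 * Start the worker, wait delay_seconds, cancel it and join it.
 * Returns 1 if pthread_join() reports PTHREAD_CANCELED, 0 if not,
 * and -1 if the thread could not be created.
 */
int cancel_after(unsigned int delay_seconds)
{
    pthread_t tid;
    void *result = NULL;

    sem_init(&started, 0, 0);
    if (pthread_create(&tid, NULL, work, NULL) != 0)
    {
        sem_destroy(&started);
        return -1;
    }
    sem_wait(&started);         /* make sure the worker has really started */
    sleep(delay_seconds);       /* the delay that is varied in the report */
    pthread_cancel(tid);
    pthread_join(tid, &result); /* thread-specific destructors have run by now */
    sem_destroy(&started);
    return result == PTHREAD_CANCELED;
}
```

Calling cancel_after(10) from main() would mimic the sleep(10) mentioned above; joining the cancelled thread means its thread-specific data destructors have finished before the main thread continues.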

   And this is the output from running the program:

leif$ LD_LIBRARY_PATH=/usr/local/Packages/gcc-4.4.0/lib/ ./crashex
Couldn't open somename@localhost:5432
2+2=0.
*** glibc detected *** /home/leif/tmp/crashex: free(): invalid pointer: 0x081f3958 ***
======= Backtrace: =========
/lib32/libc.so.6[0xf7c30615]
/lib32/libc.so.6(cfree+0x90)[0xf7c34080]
/usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0(__real_free+0x3f1)[0xf7d39061]
/lib32/libecpg.so.6[0xf7e3fb5c]
/lib32/libpthread.so.0[0xf7d1bbb0]
/lib32/libpthread.so.0[0xf7d1c509]
/lib32/libc.so.6(clone+0x5e)[0xf7c9d08e]
======= Memory map: ========
08048000-0804a000 r-xp 00000000 08:0a 1671173                            /home/leif/tmp/crashex
0804a000-0804b000 rwxp 00001000 08:0a 1671173                            /home/leif/tmp/crashex
0804b000-081f8000 rwxp 0804b000 00:00 0                                  [heap]
f71eb000-f71ec000 ---p f71eb000 00:00 0
f71ec000-f79ec000 rwxp f71ec000 00:00 0
f79ec000-f79f5000 r-xp 00000000 08:01 97934                              /lib32/libnss_files-2.7.so
f79f5000-f79f7000 rwxp 00008000 08:01 97934                              /lib32/libnss_files-2.7.so
f79f7000-f79ff000 r-xp 00000000 08:01 97936                              /lib32/libnss_nis-2.7.so
f79ff000-f7a01000 rwxp 00007000 08:01 97936                              /lib32/libnss_nis-2.7.so
f7a01000-f7a15000 r-xp 00000000 08:01 97931                              /lib32/libnsl-2.7.so
f7a15000-f7a17000 rwxp 00013000 08:01 97931                              /lib32/libnsl-2.7.so
f7a17000-f7a19000 rwxp f7a17000 00:00 0
f7a19000-f7a20000 r-xp 00000000 08:01 97932                              /lib32/libnss_compat-2.7.so
f7a20000-f7a22000 rwxp 00006000 08:01 97932                              /lib32/libnss_compat-2.7.so
f7a22000-f7a2c000 r-xp 00000000 08:08 227374                             /usr/lib32/libgcc_s.so.1
f7a2c000-f7a2d000 rwxp 0000a000 08:08 227374                             /usr/lib32/libgcc_s.so.1
f7a2d000-f7a2f000 rwxp f7a2d000 00:00 0
f7a2f000-f7a38000 r-xp 00000000 08:01 97927                              /lib32/libcrypt-2.7.so
f7a38000-f7a3a000 rwxp 00008000 08:01 97927                              /lib32/libcrypt-2.7.so
f7a3a000-f7a61000 rwxp f7a3a000 00:00 0
f7a61000-f7b4a000 r-xp 00000000 08:01 98015                              /lib32/libcrypto.so.0.9.7
f7b4a000-f7b5c000 rwxp 000e8000 08:01 98015                              /lib32/libcrypto.so.0.9.7
f7b5c000-f7b5f000 rwxp f7b5c000 00:00 0
f7b5f000-f7b8d000 r-xp 00000000 08:01 98021                              /lib32/libssl.so.0.9.7
f7b8d000-f7b90000 rwxp 0002d000 08:01 98021                              /lib32/libssl.so.0.9.7
f7b90000-f7bb3000 r-xp 00000000 08:01 97929                              /lib32/libm-2.7.so
f7bb3000-f7bb5000 rwxp 00023000 08:01 97929                              /lib32/libm-2.7.so
f7bb5000-f7bb6000 rwxp f7bb5000 00:00 0
f7bb6000-f7bc2000 r-xp 00000000 08:01 97971                              /lib32/libpgtypes.so.3.0
f7bc2000-f7bc4000 rwxp 0000c000 08:01 97971                              /lib32/libpgtypes.so.3.0
f7bc4000-f7d0d000 r-xp 00000000 08:01 97925                              /lib32/libc-2.7.so
f7d0d000-f7d0e000 r-xp 00149000 08:01 97925                              /lib32/libc-2.7.so
f7d0e000-f7d10000 rwxp 0014a000 08:01 97925                              /lib32/libc-2.7.so
f7d10000-f7d13000 rwxp f7d10000 00:00 0
f7d13000-f7d15000 r-xp 00000000 08:01 97928                              /lib32/libdl-2.7.so
f7d15000-f7d17000 rwxp 00001000 08:01 97928                              /lib32/libdl-2.7.so
f7d17000-f7d2b000 r-xp 00000000 08:01 97939                              /lib32/libpthread-2.7.so
f7d2b000-f7d2d000 rwxp 00013000 08:01 97939                              /lib32/libpthread-2.7.so
f7d2d000-f7d2f000 rwxp f7d2d000 00:00 0
f7d2f000-f7d49000 r-xp 00000000 08:09 934128                             /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0.0.0
f7d49000-f7d4c000 rwxp 0001a000 08:09 934128                             /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0.0.0
f7d4c000-f7e1b000 rwxp f7d4c000 00:00 0
f7e1b000-f7e35000 r-xp 00000000 08:01 98012                              /lib32/libpq.so.5.0
f7e35000-f7e36000 rwxp 0001a000 08:01 98012                              /lib32/libpq.so.5.0
f7e36000-f7e44000 r-xp 00000000 08:01 97969                              /lib32/libecpg.so.6.0
f7e44000-f7f05000 rwxp 0000d000 08:01 97969                              /lib32/libecpg.so.6.0
f7f18000-f7f1b000 rwxp f7f18000 00:00 0
f7f1b000-f7f38000 r-xp 00000000 08:01 97922                              /lib32/ld-2.7.so
f7f38000-f7f3a000 rwxp 0001c000 08:01 97922                              /lib32/ld-2.7.so
fff3f000-fff59000 rwxp 7ffffffe5000 00:00 0                              [stack]
ffffe000-fffff000 r-xp ffffe000 00:00 0                                  [vdso]
Aborted (core dumped)


leif$ gdb ~/tmp/crashex core.30920
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib32/libecpg.so.6...done.
Loaded symbols for /lib32/libecpg.so.6
Reading symbols from /lib32/libpq.so.5...done.
Loaded symbols for /lib32/libpq.so.5
Reading symbols from /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0...done.
Loaded symbols for /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0
Reading symbols from /lib32/libpthread.so.0...done.
Loaded symbols for /lib32/libpthread.so.0
Reading symbols from /lib32/libdl.so.2...done.
Loaded symbols for /lib32/libdl.so.2
Reading symbols from /lib32/libc.so.6...done.
Loaded symbols for /lib32/libc.so.6
Reading symbols from /lib32/libpgtypes.so.3...done.
Loaded symbols for /lib32/libpgtypes.so.3
Reading symbols from /lib32/libm.so.6...done.
Loaded symbols for /lib32/libm.so.6
Reading symbols from /lib32/libssl.so.0...done.
Loaded symbols for /lib32/libssl.so.0
Reading symbols from /lib32/libcrypto.so.0...done.
Loaded symbols for /lib32/libcrypto.so.0
Reading symbols from /lib32/libcrypt.so.1...done.
Loaded symbols for /lib32/libcrypt.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib32/libgcc_s.so.1...done.
Loaded symbols for /usr/lib32/libgcc_s.so.1
Reading symbols from /lib32/libnss_compat.so.2...done.
Loaded symbols for /lib32/libnss_compat.so.2
Reading symbols from /lib32/libnsl.so.1...done.
Loaded symbols for /lib32/libnsl.so.1
Reading symbols from /lib32/libnss_nis.so.2...done.
Loaded symbols for /lib32/libnss_nis.so.2
Reading symbols from /lib32/libnss_files.so.2...done.
Loaded symbols for /lib32/libnss_files.so.2

warning: Lowest section in system-supplied DSO at 0xffffe000 is .hash at ffffe0b4
Program terminated with signal 6, Aborted.
[New process 30922]
[New process 30920]
#0  0xffffe405 in __kernel_vsyscall ()
(gdb) bt
#0  0xffffe405 in __kernel_vsyscall ()
#1  0xf7bef335 in raise () from /lib32/libc.so.6
#2  0xf7bf0cb1 in abort () from /lib32/libc.so.6
#3  0xf7c286ec in ?? () from /lib32/libc.so.6
#4  0xf7c30615 in ?? () from /lib32/libc.so.6
#5  0xf7c34080 in free () from /lib32/libc.so.6
#6  0xf7d39061 in free (buf=0x81f3958) at ../../../libmudflap/mf-hooks1.c:241
#7  0xf7e3fb5c in ecpg_sqlca_key_destructor () from /lib32/libecpg.so.6
#8  0xf7d1bbb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
#9  0xf7d1c509 in start_thread () from /lib32/libpthread.so.0
#10 0xf7c9d08e in clone () from /lib32/libc.so.6
(gdb)

   As you might have noticed, this particular run is on a 64-bit architecture (Ubuntu 8.04) and the crashex program was
generated on a 32-bit machine with gcc-4.4.0. I have tried with PostgreSQL versions 8.3.5 and 8.3.7. All give the same
result, though the specific program addresses of course might differ from system to system.

   Please help,

 Leif



Attachment

Re: Bug in ecpg lib ?

From
"Albe Laurenz"
Date:
leif@crysberg.dk wrote:
>    I have now generated a rather small example where I 
> experience the problem, attached. It is linked with the 
> mudflapth library using the commands below. You may have to 
> change the DBNAME and DBUSER. The delay just before the 
> pthread_cancel(), i.e. sleep(10), is rather critical for the 
> problem to appear and you might have to change it to 
> something less. On some very slow machines I wasn't able to 
> produce the problem.
> 
[...]
> 
>    And this is the output from running the program:
> 
> leif$ LD_LIBRARY_PATH=/usr/local/Packages/gcc-4.4.0/lib/ ./crashex
> Couldn't open somename@localhost:5432
> 2+2=0.
> *** glibc detected *** /home/leif/tmp/crashex: free(): 
> invalid pointer: 0x081f3958 ***
[...]
> Aborted (core dumped)
> 
> 
> leif$ gdb ~/tmp/crashex core.30920 
[...]
> #0  0xffffe405 in __kernel_vsyscall ()
> (gdb) bt
> #0  0xffffe405 in __kernel_vsyscall ()
> #1  0xf7bef335 in raise () from /lib32/libc.so.6
> #2  0xf7bf0cb1 in abort () from /lib32/libc.so.6
> #3  0xf7c286ec in ?? () from /lib32/libc.so.6
> #4  0xf7c30615 in ?? () from /lib32/libc.so.6
> #5  0xf7c34080 in free () from /lib32/libc.so.6
> #6  0xf7d39061 in free (buf=0x81f3958) at 
> ../../../libmudflap/mf-hooks1.c:241
> #7  0xf7e3fb5c in ecpg_sqlca_key_destructor () from 
> /lib32/libecpg.so.6
> #8  0xf7d1bbb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
> #9  0xf7d1c509 in start_thread () from /lib32/libpthread.so.0
> #10 0xf7c9d08e in clone () from /lib32/libc.so.6
> (gdb) 

I ran your sample with gdb against PostgreSQL 8.4, and
ecpg_sqlca_key_destructor() was called only once, for a valid pointer,
one that was previously allocated with malloc().
Could you check if ecpg_sqlca_key_destructor() is called more than once if
you run the sample?

Are you aware that in your sample run the connection attempt failed?
It does not matter, though; ecpg should do the right thing anyway.

What I notice about your program is that you connect to the database
in the main thread, then start a new thread and use the connection in that
new thread.

I don't know, but I'd expect that since ecpg keeps a thread-specific
sqlca, this could cause problems. Indeed I find with the debugger that in
your sample sqlca is allocated and initialized twice, once when the
database connection is attempted, and once when the SQL statement is run.

I think that the "good" way to do it would be:
- start a thread
- connect to the database
- do work
- disconnect from the database
- terminate the thread

Maybe somebody who knows more about ecpg can say if what you are doing
should work or not.

Yours,
Laurenz Albe

Re: Bug in ecpg lib ?

From
"Albe Laurenz"
Date:
I wrote: 
> What I notice about your program is that you connect to the database
> in the main thread, then start a new thread and use the connection in that
> new thread.
> 
> I don't know, but I'd expect that since ecpg keeps a thread-specific
> sqlca, this could cause problems. Indeed I find with the debugger that in
> your sample sqlca is allocated and initialized twice, once when the
> database connection is attempted, and once when the SQL statement is run.
> 
> I think that the "good" way to do it would be:
> - start a thread
> - connect to the database
> - do work
> - disconnect from the database
> - terminate the thread

I thought some more about that, and it is obviously nonsense.
Why shouldn't you use a connection object in a different thread?

I'll try to come up with some more findings to help figure out
what's going on.

Yours,
Laurenz Albe

Re: Bug in ecpg lib ?

From
leif@crysberg.dk
Date:
Hi Laurenz,

    Thank you for your effort. I appreciate it very much.

    I have been trying to figure this thing out myself too, breakpointing and single-stepping my way through some of
the ecpg code, but without much clarification. (Mostly I learned new things about pthreads.) I have been trying to
figure out whether this is a real thing or more a mudflapth "mis-judgement". Also, on most machines (the faster ones)
mudflap complains either about "invalid pointer in free()" or "double free() or corruption". I haven't been able to
verify this yet. Specifically on one (slower) machine, I have only seen this mudflapth complaint once, though I have
been both running and debugging it on that machine many times.

    Are you sure what you suggest is nonsense ? In the light of the sqlca struct being "local" to each thread ? I tried
to put the open and close connection within the thread, but I was still able to get the mudflap complaint.
Theoretically, I guess one could use just 1 connection for all db access in all threads just having them enclosed within
pthread_mutex_[un]lock()s !? (Not what I do, though.)
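
The idea of sharing one connection between threads and wrapping every access in pthread_mutex_lock()/pthread_mutex_unlock() could look roughly like the sketch below. It is plain C without any ECPG calls: struct demo_conn and run_query() are hypothetical stand-ins for a connection and a statement, and the counter only exists to show that no update is lost.

```c
#include <pthread.h>

#define DEMO_THREADS 4
#define DEMO_QUERIES 1000

/* Hypothetical stand-in for a database connection; not an ECPG or libpq type. */
struct demo_conn
{
    pthread_mutex_t lock;
    long            queries_run;
};

/* Stand-in for running one statement: touches conn only while holding the lock. */
static void run_query(struct demo_conn *conn)
{
    pthread_mutex_lock(&conn->lock);
    conn->queries_run++;
    pthread_mutex_unlock(&conn->lock);
}

static void *worker(void *arg)
{
    struct demo_conn *conn = arg;
    int i;

    for (i = 0; i < DEMO_QUERIES; i++)
        run_query(conn);
    return NULL;
}

/* Runs DEMO_THREADS workers against one shared "connection" and returns the total. */
long run_shared_connection_demo(void)
{
    struct demo_conn conn;
    pthread_t tids[DEMO_THREADS];
    int i;

    pthread_mutex_init(&conn.lock, NULL);
    conn.queries_run = 0;
    for (i = 0; i < DEMO_THREADS; i++)
        pthread_create(&tids[i], NULL, worker, &conn);
    for (i = 0; i < DEMO_THREADS; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&conn.lock);
    return conn.queries_run;
}
```

One caveat for a program that cancels its threads: if the locked section contains a cancellation point, a thread cancelled there leaves the mutex locked unless it has registered a cleanup handler with pthread_cleanup_push().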

    And for your previous mail: Yes, I know that my example does not make the connection, but still does the
select...  It doesn't matter, however; if it does make a connection, it still bumps out.
    And yes, I am aware that I open the connection in the "main thread" and use it in another. This is the way the real
daemon program was designed.

    Once again, thank you,

 Leif



Re: Bug in ecpg lib ?

From
"Albe Laurenz"
Date:
lj@crysberg.dk wrote:
>     I have been trying to figure this thing out myself too, 
> breakpointing and single stepping my way through some of the 
> ecpg code, but without much clarification. (More that I 
> learned new things about pthread). I have been trying to 
> figure out whether this is a real thing or more a mudflapth 
> "mis-judgement". Also on most (the faster ones) machines 
> mudflap complains either about "invalid pointer in free()" or 
> "double free() or corruption". I haven't been able to verify 
> this yet. Specifically on one (slower) machine, I have only 
> seen this mudflapth complaint once, though I have been both 
> running and debugging it on that many times.
> 
>     Are you sure what you suggest is nonsense ? In the light 
> of the sqlca struct being "local" to each thread ? I tried to 
> put the open and close connection within the thread, but I 
> was still able to get the mudflap complaint. Theoretically, I 
> guess one could use just 1 connection for all db access in 
> all threads just having them enclosed within 
> pthread_mutex_[un]lock()s !? (Not what I do, though.)

The sqlca is local to each thread, but that should not be a problem.
On closer scrutiny of the source, it works like this:

Whenever a thread performs an SQL operation, it will allocate
an sqlca in its thread-specific data area (TSD) in the ECPG function
ECPGget_sqlca(). When the thread exits or is cancelled, the
sqlca is freed by pthread by calling the ECPG function
ecpg_sqlca_key_destructor(). pthread makes sure that each
destructor function is only called once per thread.

So when several threads use a connection, there will be
several sqlca's around, but that should not matter as they get
freed when the thread exits.
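
The mechanism described above can be tried out without ECPG. The following is a minimal sketch in plain C that follows the same pattern: a key created once with a destructor, a per-thread structure allocated on first use, and worker threads that are cancelled. All names (demo_ctx, get_ctx(), ctx_destructor(), count_destructor_calls()) are invented for the illustration and are not ECPG functions.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for struct sqlca_t; not the real ECPG type. */
struct demo_ctx { int code; };

static pthread_key_t   ctx_key;
static pthread_once_t  ctx_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int             destructor_calls;
static sem_t           ready;   /* posted once per worker after it allocated */

/* Called by pthread at thread exit or cancellation if the slot is non-NULL. */
static void ctx_destructor(void *arg)
{
    pthread_mutex_lock(&count_lock);
    destructor_calls++;
    pthread_mutex_unlock(&count_lock);
    free(arg);
}

static void ctx_key_init(void)
{
    pthread_key_create(&ctx_key, ctx_destructor);
}

/* Returns the calling thread's structure, allocating it on first use. */
static struct demo_ctx *get_ctx(void)
{
    struct demo_ctx *ctx;

    pthread_once(&ctx_once, ctx_key_init);
    ctx = pthread_getspecific(ctx_key);
    if (ctx == NULL)
    {
        ctx = calloc(1, sizeof(*ctx));
        if (ctx != NULL)
            pthread_setspecific(ctx_key, ctx);
    }
    return ctx;
}

static void *worker(void *unused)
{
    struct demo_ctx *ctx = get_ctx();

    (void) unused;
    if (ctx != NULL)
        ctx->code = 4;
    sem_post(&ready);
    for (;;)
        sleep(1);               /* cancellation point */
    return NULL;                /* not reached */
}

/* Starts nthreads workers, cancels them all, returns the number of destructor calls. */
int count_destructor_calls(int nthreads)
{
    pthread_t *tids = malloc(nthreads * sizeof(*tids));
    int i;

    if (tids == NULL)
        return -1;
    sem_init(&ready, 0, 0);
    destructor_calls = 0;
    for (i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (i = 0; i < nthreads; i++)
        sem_wait(&ready);       /* every worker has allocated its structure */
    for (i = 0; i < nthreads; i++)
    {
        pthread_cancel(tids[i]);
        pthread_join(tids[i], NULL);
    }
    sem_destroy(&ready);
    free(tids);
    return destructor_calls;
}
```

With three workers, count_destructor_calls(3) should report three destructor calls, one per cancelled thread. The calling thread never calls get_ctx() here, so it owns no structure of its own.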

After some experiments, I would say that mudflap's complaint
is a mistake.

I've compiled your program against a debug-enabled PostgreSQL 8.4.0 with

$ ecpg crashex

$ gcc -Wall -O0 -g -o crashex crashex.c -I /magwien/postgres-8.4.0/include \
-L/magwien/postgres-8.4.0/lib -lecpg -Wl,-rpath,/magwien/postgres-8.4.0/lib

and run a gdb session:

$ gdb
GNU gdb Red Hat Linux (6.3.0.0-1.138.el3rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".

   Set the program to be debugged:

(gdb) file crashex
Reading symbols from /home/laurenz/ecpg/crashex...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".

   This is where the source of libecpg is:

(gdb) dir /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib
Source directories searched: /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib:$cdir:$cwd

   Start the program (main thread):

(gdb) break main
Breakpoint 1 at 0x804892c: file crashex.pgc, line 54.
(gdb) run
Starting program: /home/laurenz/ecpg/crashex 
[Thread debugging using libthread_db enabled]
[New Thread -1218572160 (LWP 29290)]
[Switching to Thread -1218572160 (LWP 29290)]

Breakpoint 1, main (argc=1, argv=0xbfffce44) at crashex.pgc:54
54      PerformTask( 25 );
(gdb) delete
Delete all breakpoints? (y or n) y

   Set breakpoint #2 in the function where sqlca is freed:

(gdb) break ecpg_sqlca_key_destructor
Breakpoint 2 at 0x457a27: file misc.c, line 124.
(gdb) list misc.c:124
119    
120    #ifdef ENABLE_THREAD_SAFETY
121    static void
122    ecpg_sqlca_key_destructor(void *arg)
123    {
124        free(arg);                    /* sqlca structure allocated in ECPGget_sqlca */
125    }
126    
127    static void
128    ecpg_sqlca_key_init(void)

   Set breakpoint #3 where a new sqlca is allocated in ECPGget_sqlca():

(gdb) break misc.c:147
Breakpoint 3 at 0x457ad2: file misc.c, line 147.
(gdb) list misc.c:134,misc.c:149
134    struct sqlca_t *
135    ECPGget_sqlca(void)
136    {
137    #ifdef ENABLE_THREAD_SAFETY
138        struct sqlca_t *sqlca;
139    
140        pthread_once(&sqlca_key_once, ecpg_sqlca_key_init);
141    
142        sqlca = pthread_getspecific(sqlca_key);
143        if (sqlca == NULL)
144        {
145            sqlca = malloc(sizeof(struct sqlca_t));
146            ecpg_init_sqlca(sqlca);
147            pthread_setspecific(sqlca_key, sqlca);
148        }
149        return (sqlca);
(gdb) cont
Continuing.

   Breakpoint #3 is hit when the main thread allocates an sqlca during connect:

Breakpoint 3, ECPGget_sqlca () at misc.c:147
147            pthread_setspecific(sqlca_key, sqlca);
(gdb) where
#0  ECPGget_sqlca () at misc.c:147
#1  0x00456d57 in ECPGconnect (lineno=41, c=0, name=0x9bf2008 "test@localhost:1238", 
    user=0x8048a31 "laureny", passwd=0x0, connection_name=0x8048a14 "dbConn", autocommit=0)
    at connect.c:270
#2  0x080488a3 in PerformTask (TaskId=25) at crashex.pgc:41
#3  0x08048936 in main (argc=1, argv=0xbfffce44) at crashex.pgc:54

   This is the address of the main thread's sqlca:

(gdb) print sqlca
$1 = (struct sqlca_t *) 0x9bf2028
(gdb) cont
Continuing.
[New Thread 27225008 (LWP 29343)]
[Switching to Thread 27225008 (LWP 29343)]

   Breakpoint #3 is hit again when the new thread allocates its sqlca when it executes the SELECT statement:

Breakpoint 3, ECPGget_sqlca () at misc.c:147
147            pthread_setspecific(sqlca_key, sqlca);
(gdb) where
#0  ECPGget_sqlca () at misc.c:147
#1  0x004579aa in ecpg_init (con=0x0, connection_name=0x8048a14 "dbConn", lineno=22) at misc.c:107
#2  0x00451a97 in ECPGdo (lineno=22, compat=0, force_indicator=1, 
    connection_name=0x8048a14 "dbConn", questionmarks=0 '\0', st=0, query=0x8048a1b "select 2 + 2")
    at execute.c:1470
#3  0x080487f7 in Work () at crashex.pgc:22
#4  0x00c8cdd8 in start_thread () from /lib/tls/libpthread.so.0
#5  0x003e5fca in clone () from /lib/tls/libc.so.6

   This is the address of the new thread's sqlca:

(gdb) print sqlca
$2 = (struct sqlca_t *) 0x9c16ee8
(gdb) cont
Continuing.
2+2=0.

   Breakpoint #2 is hit when the new thread is canceled:

Breakpoint 2, ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
124        free(arg);                    /* sqlca structure allocated in ECPGget_sqlca */
(gdb) where
#0  ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
#1  0x00c8d799 in deallocate_tsd () from /lib/tls/libpthread.so.0
#2  0x00c8cde6 in start_thread () from /lib/tls/libpthread.so.0
#3  0x003e5fca in clone () from /lib/tls/libc.so.6

   The freed pointer is the sqlca of the new thread:

(gdb) print arg
$3 = (void *) 0x9c16ee8

   And the program terminates with no problems.

(gdb) cont
Continuing.
[Thread 27225008 (zombie) exited]

Program exited normally.
(gdb) quit


This all looks just like it should, doesn't it?

Yours,
Laurenz Albe

Re: Bug in ecpg lib ?

From
leif@crysberg.dk
Date:
Hello Laurenz,

   Thank you for your very thorough walk through the 'ecpg use' of threads with respect to the sqlca. It was very clear
and specific. I reproduced what you did almost exactly as you have done, and I could then also play around with things to
see what happens 'if'... I have learned much about threads and ecpg, which I'm sure will be very helpful. Also, I'm
afraid I have to agree with you that it must be a mudflap flop ;-)   ...   unfortunately, because now I'm back to
the real problem in the larger program and how to track that error.

   I'm pleased that it wasn't an ecpg bug, and I know now not to use mudflap for tracking my problem.

   Thanks for your big effort on this,

 Leif


----- "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:

> lj@crysberg.dk wrote:
> >     I have been trying to figure this thing out myself too,
> > breakpointing and single stepping my way through some of the
> > ecpg code, but without much clarification. (More that I
> > learned new things about pthread). I have been trying to
> > figure out whether this is a real thing or more a mudflapth
> > "mis-judgement". Also on most (the faster ones) machines
> > mudflap complains either about "invalid pointer in free()" or
> > "double free() or corruption". I haven't been able to verify
> > this yet. Specifically on one (slower) machine, I have only
> > seen this mudflapth complaint once, though I have been both
> > running and debugging it on that many times.
> >
> >     Are you sure what you suggest is nonsense ? In the light
> > of the sqlca struct being "local" to each thread ? I tried to
> > put the open and close connection within the thread, but I
> > was still able to get the mudflap complaint. Theoretically, I
> > guess one could use just 1 connection for all db access in
> > all threads just having them enclosed within
> > pthread_mutex_[un]lock()s !? (Not what I do, though.)
>
> The sqlca is local to each thread, but that should not be a problem.
> On closer scrutiny of the source, it works like this:
>
> Whenever a thread performs an SQL operation, it will allocate
> an sqlca in its thread-specific data area (TSD) in the ECPG function
> ECPGget_sqlca(). When the thread exits or is cancelled, the
> sqlca is freed by pthread by calling the ECPG function
> ecpg_sqlca_key_destructor(). pthread makes sure that each
> destructor function is only called once per thread.
>
> So when several threads use a connection, there will be
> several sqlca's around, but that should not matter as they get
> freed when the thread exits.
>
> After some experiments, I would say that mudflap's complaint
> is a mistake.
>
> I've compiled your program against a debug-enabled PostgreSQL 8.4.0
> with
>
> $ ecpg crashex
>
> $ gcc -Wall -O0 -g -o crashex crashex.c -I
> /magwien/postgres-8.4.0/include \
> -L/magwien/postgres-8.4.0/lib -lecpg
> -Wl,-rpath,/magwien/postgres-8.4.0/lib
>
> and run a gdb session:
>
> $ gdb
> GNU gdb Red Hat Linux (6.3.0.0-1.138.el3rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for
> details.
> This GDB was configured as "i386-redhat-linux-gnu".
>
>    Set the program to be debugged:
>
> (gdb) file crashex
> Reading symbols from /home/laurenz/ecpg/crashex...done.
> Using host libthread_db library "/lib/tls/libthread_db.so.1".
>
>    This is where the source of libecpg is:
>
> (gdb) dir
> /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib
> Source directories searched:
> /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib:$cdir:$cwd
>
>    Start the program (main thread):
>
> (gdb) break main
> Breakpoint 1 at 0x804892c: file crashex.pgc, line 54.
> (gdb) run
> Starting program: /home/laurenz/ecpg/crashex
> [Thread debugging using libthread_db enabled]
> [New Thread -1218572160 (LWP 29290)]
> [Switching to Thread -1218572160 (LWP 29290)]
>
> Breakpoint 1, main (argc=1, argv=0xbfffce44) at crashex.pgc:54
> 54      PerformTask( 25 );
> (gdb) delete
> Delete all breakpoints? (y or n) y
>
>    Set breakpoint #2 in the function where sqlca is freed:
>
> (gdb) break ecpg_sqlca_key_destructor
> Breakpoint 2 at 0x457a27: file misc.c, line 124.
> (gdb) list misc.c:124
> 119
> 120    #ifdef ENABLE_THREAD_SAFETY
> 121    static void
> 122    ecpg_sqlca_key_destructor(void *arg)
> 123    {
> 124        free(arg);                    /* sqlca structure allocated in ECPGget_sqlca */
> 125    }
> 126
> 127    static void
> 128    ecpg_sqlca_key_init(void)
>
>    Set breakpoint #3 where a new sqlca is allocated in
> ECPGget_sqlca():
>
> (gdb) break misc.c:147
> Breakpoint 3 at 0x457ad2: file misc.c, line 147.
> (gdb) list misc.c:134,misc.c:149
> 134    struct sqlca_t *
> 135    ECPGget_sqlca(void)
> 136    {
> 137    #ifdef ENABLE_THREAD_SAFETY
> 138        struct sqlca_t *sqlca;
> 139
> 140        pthread_once(&sqlca_key_once, ecpg_sqlca_key_init);
> 141
> 142        sqlca = pthread_getspecific(sqlca_key);
> 143        if (sqlca == NULL)
> 144        {
> 145            sqlca = malloc(sizeof(struct sqlca_t));
> 146            ecpg_init_sqlca(sqlca);
> 147            pthread_setspecific(sqlca_key, sqlca);
> 148        }
> 149        return (sqlca);
> (gdb) cont
> Continuing.
>
>    Breakpoint #3 is hit when the main thread allocates an sqlca during
> connect:
>
> Breakpoint 3, ECPGget_sqlca () at misc.c:147
> 147            pthread_setspecific(sqlca_key, sqlca);
> (gdb) where
> #0  ECPGget_sqlca () at misc.c:147
> #1  0x00456d57 in ECPGconnect (lineno=41, c=0, name=0x9bf2008
> "test@localhost:1238",
>     user=0x8048a31 "laureny", passwd=0x0, connection_name=0x8048a14
> "dbConn", autocommit=0)
>     at connect.c:270
> #2  0x080488a3 in PerformTask (TaskId=25) at crashex.pgc:41
> #3  0x08048936 in main (argc=1, argv=0xbfffce44) at crashex.pgc:54
>
>    This is the address of the main thread's sqlca:
>
> (gdb) print sqlca
> $1 = (struct sqlca_t *) 0x9bf2028
> (gdb) cont
> Continuing.
> [New Thread 27225008 (LWP 29343)]
> [Switching to Thread 27225008 (LWP 29343)]
>
>    Breakpoint #3 is hit again when the new thread allocates its sqlca
> when it executes the SELECT statement:
>
> Breakpoint 3, ECPGget_sqlca () at misc.c:147
> 147            pthread_setspecific(sqlca_key, sqlca);
> (gdb) where
> #0  ECPGget_sqlca () at misc.c:147
> #1  0x004579aa in ecpg_init (con=0x0, connection_name=0x8048a14
> "dbConn", lineno=22) at misc.c:107
> #2  0x00451a97 in ECPGdo (lineno=22, compat=0, force_indicator=1,
>     connection_name=0x8048a14 "dbConn", questionmarks=0 '\0', st=0,
> query=0x8048a1b "select 2 + 2")
>     at execute.c:1470
> #3  0x080487f7 in Work () at crashex.pgc:22
> #4  0x00c8cdd8 in start_thread () from /lib/tls/libpthread.so.0
> #5  0x003e5fca in clone () from /lib/tls/libc.so.6
>
>    This is the address of the new thread's sqlca:
>
> (gdb) print sqlca
> $2 = (struct sqlca_t *) 0x9c16ee8
> (gdb) cont
> Continuing.
> 2+2=0.
>
>    Breakpoint #2 is hit when the new thread is canceled:
>
> Breakpoint 2, ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
> 124        free(arg);                    /* sqlca structure allocated in ECPGget_sqlca */
> (gdb) where
> #0  ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
> #1  0x00c8d799 in deallocate_tsd () from /lib/tls/libpthread.so.0
> #2  0x00c8cde6 in start_thread () from /lib/tls/libpthread.so.0
> #3  0x003e5fca in clone () from /lib/tls/libc.so.6
>
>    The freed pointer is the sqlca of the new thread:
>
> (gdb) print arg
> $3 = (void *) 0x9c16ee8
>
>    And the program terminates with no problems.
>
> (gdb) cont
> Continuing.
> [Thread 27225008 (zombie) exited]
>
> Program exited normally.
> (gdb) quit
>
>
> This all looks just like it should, doesn't it?
>
> Yours,
> Laurenz Albe
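
The quoted remark about using a single connection for all threads, enclosed in pthread_mutex_[un]lock() calls, can be illustrated roughly as follows. This is only a sketch under assumptions: shared_conn, run_statement() and shared_run() are hypothetical names, the counter merely stands in for real database work, and whether libecpg supports sharing a connection this way is not established in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for one shared database connection. */
struct shared_conn
{
    pthread_mutex_t lock;           /* serializes all use of the connection */
    long            statements;     /* counts "statements" run on it */
};

struct worker_arg
{
    struct shared_conn *conn;
    int                 count;
};

/* Placeholder for one EXEC SQL statement on the shared connection. */
static void run_statement(struct shared_conn *c)
{
    pthread_mutex_lock(&c->lock);
    c->statements++;                /* the database call would go here */
    pthread_mutex_unlock(&c->lock);
}

static void *conn_worker(void *p)
{
    struct worker_arg *a = p;
    int i;

    for (i = 0; i < a->count; i++)
        run_statement(a->conn);
    return NULL;
}

/* Run nthreads workers, each issuing per_thread statements; return the total. */
long shared_run(int nthreads, int per_thread)
{
    struct shared_conn conn;
    struct worker_arg arg;
    pthread_t *tids;
    long total;
    int i, started, rc;

    if (nthreads <= 0)
        return 0;
    tids = malloc((size_t) nthreads * sizeof *tids);
    if (tids == NULL)
        return -1;

    rc = pthread_mutex_init(&conn.lock, NULL);
    assert(rc == 0);
    (void) rc;
    conn.statements = 0;
    arg.conn = &conn;
    arg.count = per_thread;         /* read-only, shared by all workers */

    for (started = 0; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, conn_worker, &arg) != 0)
            break;
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    total = conn.statements;
    pthread_mutex_destroy(&conn.lock);
    free(tids);
    return total;
}
```

One caveat when this is combined with pthread_cancel(): a real database call may well contain cancellation points, and a thread cancelled while holding the mutex leaves it locked unless a cleanup handler registered with pthread_cleanup_push() releases it.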