Thread: Bug in ecpg lib ?
Hi guys,

I'm using PostgreSQL in a server project that uses many forks and many threads in each forked process.

Almost every time I do a pthread_cancel() I get a SIGSEGV. I have then linked libmudflapth into my program to catch the problem sooner, and now that reports either 'invalid pointer' or 'double free or corruption' when a thread is cancelled. Typically I have 2 database connections opened before any of the threads are created. I am pretty sure that I'm only using 1 connection in any 1 thread, i.e. only 2 of the threads are doing database access, each using its own allocated connection.

After the main thread has done a pthread_cancel() I get a "mudflapth dump" with the following trace back (the abort comes from the mudflapth lib when detecting the bad pointer):

#0 0xffffe405 in __kernel_vsyscall ()
#1 0xf7ca2335 in raise () from /lib32/libc.so.6
#2 0xf7ca3cb1 in abort () from /lib32/libc.so.6
#3 0xf7cdb6ec in ?? () from /lib32/libc.so.6
#4 0xf7ce71ab in free () from /lib32/libc.so.6
#5 0xf7dec061 in free (buf=0x87ed138) at ../../../libmudflap/mf-hooks1.c:241
#6 0xf7ef2b5c in ecpg_sqlca_key_destructor () from /lib32/libecpg.so.6
#7 0xf7dcebb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
#8 0xf7dcf509 in start_thread () from /lib32/libpthread.so.0
#9 0xf7d5008e in clone () from /lib32/libc.so.6

Looking at ecpg_sqlca_key_destructor(), it seems to me that the sqlca can be deallocated several times!? (I'm not too much into the Postgres code including ecpg, so that is a novice point of view.)

I have tried both pgsql-8.3.5 and pgsql-8.4rc1, with exactly the same result, on many different Linux systems, mainly Slackware 10.2 and Ubuntu 7. On all systems I have configured and compiled Postgres with this configure line:

./configure --prefix=/usr/local/Packages/pgsql-8.3.5 --with-openssl --enable-thread-safety

Please help,

Leif
leif@crysberg.dk wrote:
> I'm using PostgreSQL in a server project that uses many
> forks and many threads in each forked process.
>
> Almost everytime I do a pthread_cancel() I get a SIGSEGV.
> [...]

Could you create a small sample program that reproduces the bug?

That would make it easier for me or somebody else to do something about it.

Yours,
Laurenz Albe
Hi Laurenz,

Thanks for the suggestion. It sure wasn't easy, but I should have done that right away. It turned out not to be in the ecpg module, but somewhere in my own code (of course ;-) ). At least I haven't been able to reproduce it in a simple example and I haven't figured out where in my own code yet either.

Leif

----- "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
> [...]
> Could you create a small sample program that reproduces the bug?
>
> That would make it easier for me or somebody else to do something
> about it.
>
> Yours,
> Laurenz Albe
Hi Laurenz,

I have now generated a rather small example where I experience the problem, attached. It is linked with the mudflapth library using the commands below. You may have to change the DBNAME and DBUSER. The delay just before the pthread_cancel(), i.e. sleep(10), is rather critical for the problem to appear and you might have to change it to something less. On some very slow machines I wasn't able to produce the problem.

$ ecpg crashex.pgc
$ /usr/local/Packages/gcc-4.4.0/bin/gcc -O0 -c -fmudflap -fmudflapth -fomit-frame-pointer -B/usr/local/Packages/gcc-4.4.0/bin/ -Wwrite-strings -std=gnu89 -ggdb -fPIC -Wall -I/usr/local/Packages/pgsql/include -I./Modules -I./ -o crashex.o crashex.c
$ /usr/local/Packages/gcc-4.4.0/bin/gcc -O0 -B/usr/local/Packages/gcc-4.4.0/bin/ -Wl -o crashex crashex.o -L/usr/local/Packages/pgsql/lib -lecpg -lpq -lmudflapth -lpthread -ldl

And this is the output from running the program:

leif$ LD_LIBRARY_PATH=/usr/local/Packages/gcc-4.4.0/lib/ ./crashex
Couldn't open somename@localhost:5432
2+2=0.
*** glibc detected *** /home/leif/tmp/crashex: free(): invalid pointer: 0x081f3958 ***
======= Backtrace: =========
/lib32/libc.so.6[0xf7c30615]
/lib32/libc.so.6(cfree+0x90)[0xf7c34080]
/usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0(__real_free+0x3f1)[0xf7d39061]
/lib32/libecpg.so.6[0xf7e3fb5c]
/lib32/libpthread.so.0[0xf7d1bbb0]
/lib32/libpthread.so.0[0xf7d1c509]
/lib32/libc.so.6(clone+0x5e)[0xf7c9d08e]
======= Memory map: ========
08048000-0804a000 r-xp 00000000 08:0a 1671173 /home/leif/tmp/crashex
0804a000-0804b000 rwxp 00001000 08:0a 1671173 /home/leif/tmp/crashex
0804b000-081f8000 rwxp 0804b000 00:00 0 [heap]
f71eb000-f71ec000 ---p f71eb000 00:00 0
f71ec000-f79ec000 rwxp f71ec000 00:00 0
f79ec000-f79f5000 r-xp 00000000 08:01 97934 /lib32/libnss_files-2.7.so
f79f5000-f79f7000 rwxp 00008000 08:01 97934 /lib32/libnss_files-2.7.so
f79f7000-f79ff000 r-xp 00000000 08:01 97936 /lib32/libnss_nis-2.7.so
f79ff000-f7a01000 rwxp 00007000 08:01 97936 /lib32/libnss_nis-2.7.so
f7a01000-f7a15000 r-xp 00000000 08:01 97931 /lib32/libnsl-2.7.so
f7a15000-f7a17000 rwxp 00013000 08:01 97931 /lib32/libnsl-2.7.so
f7a17000-f7a19000 rwxp f7a17000 00:00 0
f7a19000-f7a20000 r-xp 00000000 08:01 97932 /lib32/libnss_compat-2.7.so
f7a20000-f7a22000 rwxp 00006000 08:01 97932 /lib32/libnss_compat-2.7.so
f7a22000-f7a2c000 r-xp 00000000 08:08 227374 /usr/lib32/libgcc_s.so.1
f7a2c000-f7a2d000 rwxp 0000a000 08:08 227374 /usr/lib32/libgcc_s.so.1
f7a2d000-f7a2f000 rwxp f7a2d000 00:00 0
f7a2f000-f7a38000 r-xp 00000000 08:01 97927 /lib32/libcrypt-2.7.so
f7a38000-f7a3a000 rwxp 00008000 08:01 97927 /lib32/libcrypt-2.7.so
f7a3a000-f7a61000 rwxp f7a3a000 00:00 0
f7a61000-f7b4a000 r-xp 00000000 08:01 98015 /lib32/libcrypto.so.0.9.7
f7b4a000-f7b5c000 rwxp 000e8000 08:01 98015 /lib32/libcrypto.so.0.9.7
f7b5c000-f7b5f000 rwxp f7b5c000 00:00 0
f7b5f000-f7b8d000 r-xp 00000000 08:01 98021 /lib32/libssl.so.0.9.7
f7b8d000-f7b90000 rwxp 0002d000 08:01 98021 /lib32/libssl.so.0.9.7
f7b90000-f7bb3000 r-xp 00000000 08:01 97929 /lib32/libm-2.7.so
f7bb3000-f7bb5000 rwxp 00023000 08:01 97929 /lib32/libm-2.7.so
f7bb5000-f7bb6000 rwxp f7bb5000 00:00 0
f7bb6000-f7bc2000 r-xp 00000000 08:01 97971 /lib32/libpgtypes.so.3.0
f7bc2000-f7bc4000 rwxp 0000c000 08:01 97971 /lib32/libpgtypes.so.3.0
f7bc4000-f7d0d000 r-xp 00000000 08:01 97925 /lib32/libc-2.7.so
f7d0d000-f7d0e000 r-xp 00149000 08:01 97925 /lib32/libc-2.7.so
f7d0e000-f7d10000 rwxp 0014a000 08:01 97925 /lib32/libc-2.7.so
f7d10000-f7d13000 rwxp f7d10000 00:00 0
f7d13000-f7d15000 r-xp 00000000 08:01 97928 /lib32/libdl-2.7.so
f7d15000-f7d17000 rwxp 00001000 08:01 97928 /lib32/libdl-2.7.so
f7d17000-f7d2b000 r-xp 00000000 08:01 97939 /lib32/libpthread-2.7.so
f7d2b000-f7d2d000 rwxp 00013000 08:01 97939 /lib32/libpthread-2.7.so
f7d2d000-f7d2f000 rwxp f7d2d000 00:00 0
f7d2f000-f7d49000 r-xp 00000000 08:09 934128 /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0.0.0
f7d49000-f7d4c000 rwxp 0001a000 08:09 934128 /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0.0.0
f7d4c000-f7e1b000 rwxp f7d4c000 00:00 0
f7e1b000-f7e35000 r-xp 00000000 08:01 98012 /lib32/libpq.so.5.0
f7e35000-f7e36000 rwxp 0001a000 08:01 98012 /lib32/libpq.so.5.0
f7e36000-f7e44000 r-xp 00000000 08:01 97969 /lib32/libecpg.so.6.0
f7e44000-f7f05000 rwxp 0000d000 08:01 97969 /lib32/libecpg.so.6.0
f7f18000-f7f1b000 rwxp f7f18000 00:00 0
f7f1b000-f7f38000 r-xp 00000000 08:01 97922 /lib32/ld-2.7.so
f7f38000-f7f3a000 rwxp 0001c000 08:01 97922 /lib32/ld-2.7.so
fff3f000-fff59000 rwxp 7ffffffe5000 00:00 0 [stack]
ffffe000-fffff000 r-xp ffffe000 00:00 0 [vdso]
Aborted (core dumped)

leif$ gdb ~/tmp/crashex core.30920
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib32/libecpg.so.6...done.
Loaded symbols for /lib32/libecpg.so.6
Reading symbols from /lib32/libpq.so.5...done.
Loaded symbols for /lib32/libpq.so.5
Reading symbols from /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0...done.
Loaded symbols for /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0
Reading symbols from /lib32/libpthread.so.0...done.
Loaded symbols for /lib32/libpthread.so.0
Reading symbols from /lib32/libdl.so.2...done.
Loaded symbols for /lib32/libdl.so.2
Reading symbols from /lib32/libc.so.6...done.
Loaded symbols for /lib32/libc.so.6
Reading symbols from /lib32/libpgtypes.so.3...done.
Loaded symbols for /lib32/libpgtypes.so.3
Reading symbols from /lib32/libm.so.6...done.
Loaded symbols for /lib32/libm.so.6
Reading symbols from /lib32/libssl.so.0...done.
Loaded symbols for /lib32/libssl.so.0
Reading symbols from /lib32/libcrypto.so.0...done.
Loaded symbols for /lib32/libcrypto.so.0
Reading symbols from /lib32/libcrypt.so.1...done.
Loaded symbols for /lib32/libcrypt.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib32/libgcc_s.so.1...done.
Loaded symbols for /usr/lib32/libgcc_s.so.1
Reading symbols from /lib32/libnss_compat.so.2...done.
Loaded symbols for /lib32/libnss_compat.so.2
Reading symbols from /lib32/libnsl.so.1...done.
Loaded symbols for /lib32/libnsl.so.1
Reading symbols from /lib32/libnss_nis.so.2...done.
Loaded symbols for /lib32/libnss_nis.so.2
Reading symbols from /lib32/libnss_files.so.2...done.
Loaded symbols for /lib32/libnss_files.so.2

warning: Lowest section in system-supplied DSO at 0xffffe000 is .hash at ffffe0b4
Program terminated with signal 6, Aborted.
[New process 30922]
[New process 30920]
#0 0xffffe405 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe405 in __kernel_vsyscall ()
#1 0xf7bef335 in raise () from /lib32/libc.so.6
#2 0xf7bf0cb1 in abort () from /lib32/libc.so.6
#3 0xf7c286ec in ?? () from /lib32/libc.so.6
#4 0xf7c30615 in ?? () from /lib32/libc.so.6
#5 0xf7c34080 in free () from /lib32/libc.so.6
#6 0xf7d39061 in free (buf=0x81f3958) at ../../../libmudflap/mf-hooks1.c:241
#7 0xf7e3fb5c in ecpg_sqlca_key_destructor () from /lib32/libecpg.so.6
#8 0xf7d1bbb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
#9 0xf7d1c509 in start_thread () from /lib32/libpthread.so.0
#10 0xf7c9d08e in clone () from /lib32/libc.so.6
(gdb)

As you might have noticed, this particular run is on a 64bit architecture (Ubuntu 8.04) and the crashex program is generated on a 32bit machine with gcc-4.4.0. I have tried with PostgreSQL version 8.3.5 and 8.3.7. All give the same result, though the specific program addresses of course might differ from system to system.

Please help,

Leif

----- leif@crysberg.dk wrote:
> Hi Laurenz,
>
> Thanks for the suggestion. It sure wasn't easy, but I should have
> done that right away.
> [...]
leif@crysberg.dk wrote:
> I have now generate a rather small example where I
> experience the problem, attached. It is linked with the
> mudflapth library using the commands below. You may have to
> change the DBNAME and DBUSER. The delay just before the
> pthread_cancel(), i.e. sleep(10), is rather critical for the
> problem to appear and you might have to change it to
> something less. On some very slow machines I wasn't able to
> produce the problem.
> [...]
>
> And this is the output from running the program:
>
> leif$ LD_LIBRARY_PATH=/usr/local/Packages/gcc-4.4.0/lib/ ./crashex
> Couldn't open somename@localhost:5432
> 2+2=0.
> *** glibc detected *** /home/leif/tmp/crashex: free():
> invalid pointer: 0x081f3958 ***
[...]
> Aborted (core dumped)
>
>
> leif$ gdb ~/tmp/crashex core.30920
[...]
> #0 0xffffe405 in __kernel_vsyscall ()
> (gdb) bt
> #0 0xffffe405 in __kernel_vsyscall ()
> #1 0xf7bef335 in raise () from /lib32/libc.so.6
> #2 0xf7bf0cb1 in abort () from /lib32/libc.so.6
> #3 0xf7c286ec in ?? () from /lib32/libc.so.6
> #4 0xf7c30615 in ?? () from /lib32/libc.so.6
> #5 0xf7c34080 in free () from /lib32/libc.so.6
> #6 0xf7d39061 in free (buf=0x81f3958) at
> ../../../libmudflap/mf-hooks1.c:241
> #7 0xf7e3fb5c in ecpg_sqlca_key_destructor () from
> /lib32/libecpg.so.6
> #8 0xf7d1bbb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
> #9 0xf7d1c509 in start_thread () from /lib32/libpthread.so.0
> #10 0xf7c9d08e in clone () from /lib32/libc.so.6
> (gdb)

I ran your sample with gdb against PostgreSQL 8.4, and ecpg_sqlca_key_destructor() was called only once, for a valid pointer, one that was previously allocated with malloc().

Could you check if ecpg_sqlca_key_destructor() is called more than once if you run the sample?

Are you aware that in your sample run the connection attempt failed? It does not matter, ecpg should do the right thing anyway.

What I notice about your program is that you connect to the database in the main thread, then start a new thread and use the connection in that new thread.

I don't know, but I'd expect that since ecpg keeps a thread-specific sqlca, this could cause problems. Indeed I find with the debugger that in your sample sqlca is allocated and initialized twice, once when the database connection is attempted, and once when the SQL statement is run.

I think that the "good" way to do it would be:
- start a thread
- connect to the database
- do work
- disconnect from the database
- terminate the thread

Maybe somebody who knows more about ecpg can say if what you are doing should work or not.

Yours,
Laurenz Albe
I wrote:
> What I notice about your program is that you connect to the database
> in the main thread, then start a new thread and use the connection in that
> new thread.
>
> I don't know, but I'd expect that since ecpg keeps a thread-specific
> sqlca, this could cause problems. Indeed I find with the debugger that in
> your sample sqlca is allocated and initialized twice, once when the
> database connection is attempted, and once when the SQL statement is run.
>
> I think that the "good" way to do it would be:
> - start a thread
> - connect to the database
> - do work
> - disconnect from the database
> - terminate the thread

I thought some more about that, and it is obviously nonsense. Why shouldn't you use a connection object in a different thread?

I'll try to come up with some more findings to help figure out what's going on.

Yours,
Laurenz Albe
Hi Laurenz,

Thank you for your effort. I appreciate it very much.

I have been trying to figure this thing out myself too, breakpointing and single stepping my way through some of the ecpg code, but without much clarification. (More that I learned new things about pthread). I have been trying to figure out whether this is a real thing or more a mudflapth "mis-judgement". Also on most machines (the faster ones) mudflap complains either about "invalid pointer in free()" or "double free() or corruption". I haven't been able to verify this yet. Specifically on one (slower) machine, I have only seen this mudflapth complaint once, though I have been both running and debugging it on that machine many times.

Are you sure what you suggest is nonsense? In the light of the sqlca struct being "local" to each thread? I tried to put the open and close connection within the thread, but I was still able to get the mudflap complaint. Theoretically, I guess one could use just 1 connection for all db access in all threads, just having them enclosed within pthread_mutex_[un]lock()s!? (Not what I do, though.)

And for your previous mail: Yes, I know that my example does not make the connection, but is still doing the select... It doesn't matter, however: if it does make a connection, it still bumps out. And yes, I am aware that I open the connection in the "main thread" and use it in another. This is the way the real daemon program was designed.

Once again, thank you,

Leif

----- "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
> [...]
> I thought some more about that, and it is obvioisly nonsense.
> Why shouldn't you use a connection object in a different thread?
>
> I'll try to come up with some more findings to help figure out
> what's going on.
>
> Yours,
> Laurenz Albe
lj@crysberg.dk wrote:
> I have been trying to figure this thing out myself too,
> breakpointing and single stepping my way through some of the
> ecpg code, but without much clarification. (More that I
> learned new things about pthread). I have been trying to
> figure out whether this is a real thing or more a mudflapth
> "mis-judgement". Also on most (the faster ones) machines
> mudflap complains either about "invalid pointer in free()" or
> "double free() or corruption". I haven't been able to verify
> this yet. Specifically on one (slower) machine, I have only
> seen this mudflapth complaint once, though I have been both
> running and debugging it on that many times.
>
> Are you sure what you suggest is nonsense ? In the light
> of the sqlca struct being "local" to each thread ? I tried to
> put the open and close connection within the thread, but I
> was still able to get the mudflap complaint. Theoretically, I
> guess one could use just 1 connection for all db access in
> all threads just having them enclosed within
> pthread_mutex_[un]lock()s !? (Not what I do, though.)

The sqlca is local to each thread, but that should not be a problem. On closer scrutiny of the source, it works like this:

Whenever a thread performs an SQL operation, it will allocate an sqlca in its thread-specific data area (TSD) in the ECPG function ECPGget_sqlca(). When the thread exits or is cancelled, the sqlca is freed by pthread by calling the ECPG function ecpg_sqlca_key_destructor(). pthread makes sure that each destructor function is only called once per thread.

So when several threads use a connection, there will be several sqlca's around, but that should not matter as they get freed when the thread exits.

After some experiments, I would say that mudflap's complaint is a mistake.
I've compiled your program against a debug-enabled PostgreSQL 8.4.0 with

$ ecpg crashex

$ gcc -Wall -O0 -g -o crashex crashex.c -I /magwien/postgres-8.4.0/include \
  -L/magwien/postgres-8.4.0/lib -lecpg -Wl,-rpath,/magwien/postgres-8.4.0/lib

and run a gdb session:

$ gdb
GNU gdb Red Hat Linux (6.3.0.0-1.138.el3rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".

Set the program to be debugged:

(gdb) file crashex
Reading symbols from /home/laurenz/ecpg/crashex...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".

This is where the source of libecpg is:

(gdb) dir /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib
Source directories searched: /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib:$cdir:$cwd

Start the program (main thread):

(gdb) break main
Breakpoint 1 at 0x804892c: file crashex.pgc, line 54.
(gdb) run
Starting program: /home/laurenz/ecpg/crashex
[Thread debugging using libthread_db enabled]
[New Thread -1218572160 (LWP 29290)]
[Switching to Thread -1218572160 (LWP 29290)]

Breakpoint 1, main (argc=1, argv=0xbfffce44) at crashex.pgc:54
54              PerformTask( 25 );
(gdb) delete
Delete all breakpoints? (y or n) y

Set breakpoint #2 in the function where sqlca is freed:

(gdb) break ecpg_sqlca_key_destructor
Breakpoint 2 at 0x457a27: file misc.c, line 124.
(gdb) list misc.c:124
119
120     #ifdef ENABLE_THREAD_SAFETY
121     static void
122     ecpg_sqlca_key_destructor(void *arg)
123     {
124             free(arg);              /* sqlca structure allocated in ECPGget_sqlca */
125     }
126
127     static void
128     ecpg_sqlca_key_init(void)

Set breakpoint #3 where a new sqlca is allocated in ECPGget_sqlca():

(gdb) break misc.c:147
Breakpoint 3 at 0x457ad2: file misc.c, line 147.
(gdb) list misc.c:134,misc.c:149
134     struct sqlca_t *
135     ECPGget_sqlca(void)
136     {
137     #ifdef ENABLE_THREAD_SAFETY
138             struct sqlca_t *sqlca;
139
140             pthread_once(&sqlca_key_once, ecpg_sqlca_key_init);
141
142             sqlca = pthread_getspecific(sqlca_key);
143             if (sqlca == NULL)
144             {
145                     sqlca = malloc(sizeof(struct sqlca_t));
146                     ecpg_init_sqlca(sqlca);
147                     pthread_setspecific(sqlca_key, sqlca);
148             }
149             return (sqlca);
(gdb) cont
Continuing.

Breakpoint #3 is hit when the main thread allocates an sqlca during connect:

Breakpoint 3, ECPGget_sqlca () at misc.c:147
147                     pthread_setspecific(sqlca_key, sqlca);
(gdb) where
#0 ECPGget_sqlca () at misc.c:147
#1 0x00456d57 in ECPGconnect (lineno=41, c=0, name=0x9bf2008 "test@localhost:1238",
   user=0x8048a31 "laureny", passwd=0x0, connection_name=0x8048a14 "dbConn", autocommit=0)
   at connect.c:270
#2 0x080488a3 in PerformTask (TaskId=25) at crashex.pgc:41
#3 0x08048936 in main (argc=1, argv=0xbfffce44) at crashex.pgc:54

This is the address of the main thread's sqlca:

(gdb) print sqlca
$1 = (struct sqlca_t *) 0x9bf2028
(gdb) cont
Continuing.
[New Thread 27225008 (LWP 29343)]
[Switching to Thread 27225008 (LWP 29343)]

Breakpoint #3 is hit again when the new thread allocates its sqlca when it executes the SELECT statement:

Breakpoint 3, ECPGget_sqlca () at misc.c:147
147                     pthread_setspecific(sqlca_key, sqlca);
(gdb) where
#0 ECPGget_sqlca () at misc.c:147
#1 0x004579aa in ecpg_init (con=0x0, connection_name=0x8048a14 "dbConn", lineno=22) at misc.c:107
#2 0x00451a97 in ECPGdo (lineno=22, compat=0, force_indicator=1,
   connection_name=0x8048a14 "dbConn", questionmarks=0 '\0', st=0, query=0x8048a1b "select 2 + 2")
   at execute.c:1470
#3 0x080487f7 in Work () at crashex.pgc:22
#4 0x00c8cdd8 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003e5fca in clone () from /lib/tls/libc.so.6

This is the address of the new thread's sqlca:

(gdb) print sqlca
$2 = (struct sqlca_t *) 0x9c16ee8
(gdb) cont
Continuing.
2+2=0.

Breakpoint #2 is hit when the new thread is canceled:

Breakpoint 2, ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
124             free(arg);              /* sqlca structure allocated in ECPGget_sqlca */
(gdb) where
#0 ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
#1 0x00c8d799 in deallocate_tsd () from /lib/tls/libpthread.so.0
#2 0x00c8cde6 in start_thread () from /lib/tls/libpthread.so.0
#3 0x003e5fca in clone () from /lib/tls/libc.so.6

The freed pointer is the sqlca of the new thread:

(gdb) print arg
$3 = (void *) 0x9c16ee8

And the program terminates with no problems.

(gdb) cont
Continuing.
[Thread 27225008 (zombie) exited]

Program exited normally.
(gdb) quit

This all looks just like it should, doesn't it?

Yours,
Laurenz Albe
Hello Laurenz, Thank you for your very thorough walk through the 'ecpg use' of threads with respect to the sqlca. It was very clear andspecific. I reproduced what you did almost exactly as you have done and I could then also play around with things to seewhat happens 'if'... I have learned much about threads and ecpg, which I'm sure will be very helpful. Also I'm afraidI have to agree with you that it must be a mudflap flop ;-) ... unfortunately, because now I'm then back to thereal problem in the larger program and how to track that error. I'm pleased that it wasn't an ecpg bug, and I know now not to use mudflap for tracking my problem. Thanks for your big effort on this, Leif ----- "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote: > lj@crysberg.dk wrote: > > I have been trying to figure this thing out myself too, > > breakpointing and single stepping my way through some of the > > ecpg code, but without much clarification. (More that I > > learned new things about pthread). I have been trying to > > figure out whether this is a real thing or more a mudflapth > > "mis-judgement". Also on most (the faster ones) machines > > mudflap complains either about "invalid pointer in free()" or > > "double free() or corruption". I haven't been able to verify > > this yet. Specifically on one (slower) machine, I have only > > seen this mudflapth complaint once, though I have been both > > running and debugging it on that many times. > > > > Are you sure what you suggest is nonsense ? In the light > > of the sqlca struct being "local" to each thread ? I tried to > > put the open and close connection within the thread, but I > > was still able to get the mudflap complaint. Theoretically, I > > guess one could use just 1 connection for all db access in > > all threads just having them enclosed within > > pthread_mutex_[un]lock()s !? (Not what I do, though.) > > The sqlca is local to each thread, but that should not be a problem. 
> On closer scrutiny of the source, it works like this: > > Whenever a thread performs an SQL operation, it will allocate > an sqlca in its thread-specific data area (TSD) in the ECPG function > ECPGget_sqlca(). When the thread exits or is cancelled, the > sqlca is freed by pthread by calling the ECPG function > ecpg_sqlca_key_destructor(). pthread makes sure that each > destructor function is only called once per thread. > > So when several threads use a connection, there will be > several sqlca's around, but that should not matter as they get > freed when the thread exits. > > After some experiments, I would say that mudflap's complaint > is a mistake. > > I've compiled your program against a debug-enabled PostgreSQL 8.4.0 > with > > $ ecpg crashex > > $ gcc -Wall -O0 -g -o crashex crashex.c -I > /magwien/postgres-8.4.0/include \ > -L/magwien/postgres-8.4.0/lib -lecpg > -Wl,-rpath,/magwien/postgres-8.4.0/lib > > and run a gdb session: > > $ gdb > GNU gdb Red Hat Linux (6.3.0.0-1.138.el3rh) > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and > you are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for > details. > This GDB was configured as "i386-redhat-linux-gnu". > > Set the program to be debugged: > > (gdb) file crashex > Reading symbols from /home/laurenz/ecpg/crashex...done. > Using host libthread_db library "/lib/tls/libthread_db.so.1". > > This is where the source of libecpg is: > > (gdb) dir > /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib > Source directories searched: > /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib:$cdir:$cwd > > Start the program (main thread): > > (gdb) break main > Breakpoint 1 at 0x804892c: file crashex.pgc, line 54. 
> (gdb) run
> Starting program: /home/laurenz/ecpg/crashex
> [Thread debugging using libthread_db enabled]
> [New Thread -1218572160 (LWP 29290)]
> [Switching to Thread -1218572160 (LWP 29290)]
>
> Breakpoint 1, main (argc=1, argv=0xbfffce44) at crashex.pgc:54
> 54          PerformTask( 25 );
> (gdb) delete
> Delete all breakpoints? (y or n) y
>
> Set breakpoint #2 in the function where sqlca is freed:
>
> (gdb) break ecpg_sqlca_key_destructor
> Breakpoint 2 at 0x457a27: file misc.c, line 124.
> (gdb) list misc.c:124
> 119
> 120     #ifdef ENABLE_THREAD_SAFETY
> 121     static void
> 122     ecpg_sqlca_key_destructor(void *arg)
> 123     {
> 124             free(arg); /* sqlca structure allocated in ECPGget_sqlca */
> 125     }
> 126
> 127     static void
> 128     ecpg_sqlca_key_init(void)
>
> Set breakpoint #3 where a new sqlca is allocated in ECPGget_sqlca():
>
> (gdb) break misc.c:147
> Breakpoint 3 at 0x457ad2: file misc.c, line 147.
> (gdb) list misc.c:134,misc.c:149
> 134     struct sqlca_t *
> 135     ECPGget_sqlca(void)
> 136     {
> 137     #ifdef ENABLE_THREAD_SAFETY
> 138             struct sqlca_t *sqlca;
> 139
> 140             pthread_once(&sqlca_key_once, ecpg_sqlca_key_init);
> 141
> 142             sqlca = pthread_getspecific(sqlca_key);
> 143             if (sqlca == NULL)
> 144             {
> 145                     sqlca = malloc(sizeof(struct sqlca_t));
> 146                     ecpg_init_sqlca(sqlca);
> 147                     pthread_setspecific(sqlca_key, sqlca);
> 148             }
> 149             return (sqlca);
> (gdb) cont
> Continuing.
>
> Breakpoint #3 is hit when the main thread allocates an sqlca during connect:
>
> Breakpoint 3, ECPGget_sqlca () at misc.c:147
> 147                     pthread_setspecific(sqlca_key, sqlca);
> (gdb) where
> #0  ECPGget_sqlca () at misc.c:147
> #1  0x00456d57 in ECPGconnect (lineno=41, c=0, name=0x9bf2008 "test@localhost:1238",
>     user=0x8048a31 "laureny", passwd=0x0, connection_name=0x8048a14 "dbConn", autocommit=0)
>     at connect.c:270
> #2  0x080488a3 in PerformTask (TaskId=25) at crashex.pgc:41
> #3  0x08048936 in main (argc=1, argv=0xbfffce44) at crashex.pgc:54
>
> This is the address of the main thread's sqlca:
>
> (gdb) print sqlca
> $1 = (struct sqlca_t *) 0x9bf2028
> (gdb) cont
> Continuing.
> [New Thread 27225008 (LWP 29343)]
> [Switching to Thread 27225008 (LWP 29343)]
>
> Breakpoint #3 is hit again when the new thread allocates its sqlca
> when it executes the SELECT statement:
>
> Breakpoint 3, ECPGget_sqlca () at misc.c:147
> 147                     pthread_setspecific(sqlca_key, sqlca);
> (gdb) where
> #0  ECPGget_sqlca () at misc.c:147
> #1  0x004579aa in ecpg_init (con=0x0, connection_name=0x8048a14 "dbConn", lineno=22) at misc.c:107
> #2  0x00451a97 in ECPGdo (lineno=22, compat=0, force_indicator=1,
>     connection_name=0x8048a14 "dbConn", questionmarks=0 '\0', st=0,
>     query=0x8048a1b "select 2 + 2")
>     at execute.c:1470
> #3  0x080487f7 in Work () at crashex.pgc:22
> #4  0x00c8cdd8 in start_thread () from /lib/tls/libpthread.so.0
> #5  0x003e5fca in clone () from /lib/tls/libc.so.6
>
> This is the address of the new thread's sqlca:
>
> (gdb) print sqlca
> $2 = (struct sqlca_t *) 0x9c16ee8
> (gdb) cont
> Continuing.
> 2+2=0.
>
> Breakpoint #2 is hit when the new thread is canceled:
>
> Breakpoint 2, ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
> 124             free(arg); /* sqlca structure allocated in ECPGget_sqlca */
> (gdb) where
> #0  ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
> #1  0x00c8d799 in deallocate_tsd () from /lib/tls/libpthread.so.0
> #2  0x00c8cde6 in start_thread () from /lib/tls/libpthread.so.0
> #3  0x003e5fca in clone () from /lib/tls/libc.so.6
>
> The freed pointer is the sqlca of the new thread:
>
> (gdb) print arg
> $3 = (void *) 0x9c16ee8
>
> And the program terminates with no problems.
>
> (gdb) cont
> Continuing.
> [Thread 27225008 (zombie) exited]
>
> Program exited normally.
> (gdb) quit
>
>
> This all looks just like it should, doesn't it?
>
> Yours,
> Laurenz Albe
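
For anyone who wants to try the same mechanism without ECPG in the picture, the allocation pattern Laurenz describes can be sketched as a small standalone piece of C. This is only an illustration under assumptions, not the libecpg code: the struct and every demo_* name are made up for the example, and only the pthread calls are real. One worker thread lazily allocates its block in thread-specific data, the caller cancels it, and a counter records how often the key destructor ran.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for ECPG's struct sqlca_t (not the real layout). */
struct demo_sqlca { long sqlcode; };

static pthread_key_t   demo_key;
static pthread_once_t  demo_key_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  demo_cond = PTHREAD_COND_INITIALIZER;
static int demo_ready = 0;            /* worker has allocated its block */
static int demo_destructor_calls = 0; /* how often the destructor ran */

/* Run by pthread for a thread whose value is non-NULL when it exits
 * or is cancelled. */
static void demo_key_destructor(void *arg)
{
    pthread_mutex_lock(&demo_lock);
    demo_destructor_calls++;
    pthread_mutex_unlock(&demo_lock);
    free(arg);
}

static void demo_key_init(void)
{
    int rc = pthread_key_create(&demo_key, demo_key_destructor);
    assert(rc == 0);
    (void) rc;
}

/* Same shape as the ECPGget_sqlca() listing: allocate lazily, per thread. */
static struct demo_sqlca *demo_get_sqlca(void)
{
    struct demo_sqlca *sqlca;

    pthread_once(&demo_key_once, demo_key_init);
    sqlca = pthread_getspecific(demo_key);
    if (sqlca == NULL)
    {
        sqlca = calloc(1, sizeof(*sqlca));
        if (sqlca != NULL)
            pthread_setspecific(demo_key, sqlca);
    }
    return sqlca;
}

static void *demo_worker(void *unused)
{
    (void) unused;
    demo_get_sqlca();           /* first use allocates this thread's block */

    pthread_mutex_lock(&demo_lock);
    demo_ready = 1;
    pthread_cond_signal(&demo_cond);
    pthread_mutex_unlock(&demo_lock);

    for (;;)
        sleep(1);               /* sleep() is a cancellation point */
    return NULL;
}

/* Start one worker, cancel it, and report how often the destructor ran.
 * Returns -1 if the thread could not be created or was not cancelled. */
int demo_run_cancel(void)
{
    pthread_t tid;
    void *result;
    int calls;

    pthread_mutex_lock(&demo_lock);
    demo_ready = 0;
    demo_destructor_calls = 0;
    pthread_mutex_unlock(&demo_lock);

    if (pthread_create(&tid, NULL, demo_worker, NULL) != 0)
        return -1;

    /* Wait until the worker really owns a block before cancelling it. */
    pthread_mutex_lock(&demo_lock);
    while (!demo_ready)
        pthread_cond_wait(&demo_cond, &demo_lock);
    pthread_mutex_unlock(&demo_lock);

    pthread_cancel(tid);
    if (pthread_join(tid, &result) != 0 || result != PTHREAD_CANCELED)
        return -1;

    pthread_mutex_lock(&demo_lock);
    calls = demo_destructor_calls;
    pthread_mutex_unlock(&demo_lock);
    return calls;
}
```

Built with something like gcc -pthread and called from main(), demo_run_cancel() should report a single destructor call for the cancelled worker, which matches the single hit on breakpoint #2 in the session above. The thread that calls demo_run_cancel() never allocates a block here, so nothing is left over for it.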