Thread: ECPG still having thread problems on Linux
Hi all,

it looks like Lee's ECPG (and libpq) thread-safety patches have been applied, and configure --with-threads is also added. I have been doing some testing.

On FreeBSD 4.8, the attached sample app runs without a problem.

However, I still encounter a threading problem on Linux (RedHat 7.3).

I have done the following:
1) cvs update
2) ./configure --with-threads && make && su -c "make install"
3) compiled cn.pgc as follows:
   a) ecpg -t cn.pgc
   b) gcc -I/usr/local/pgsql/include -L/usr/local/pgsql/lib \
      -lecpg -lpgtypes -pthread cn.c
4) ./a.out - one thread runs to completion (inserts 5 records),
   the other hangs (manages one insert, then blocks forever)

Using gdb, I attached to the thread that has locked up, and the backtrace looks like this:

(gdb) backtrace
#0  0x420e0187 in poll () from /lib/i686/libc.so.6
#1  0x4007d8cc in pqSocketPoll () from /usr/local/pgsql/lib/libpq.so.3
#2  0x4007d7ed in pqSocketCheck () from /usr/local/pgsql/lib/libpq.so.3
#3  0x4007d71f in pqWaitTimed () from /usr/local/pgsql/lib/libpq.so.3
#4  0x4007d6f5 in pqWait () from /usr/local/pgsql/lib/libpq.so.3
#5  0x4007bb53 in PQgetResult () from /usr/local/pgsql/lib/libpq.so.3
#6  0x4007bcbb in PQexec () from /usr/local/pgsql/lib/libpq.so.3
#7  0x40026d81 in ECPGexecute () from /usr/local/pgsql/lib/libecpg.so.4
#8  0x4002724c in ECPGdo () from /usr/local/pgsql/lib/libecpg.so.4
#9  0x08048927 in ins2 ()
#10 0x40043faf in pthread_start_thread () from /lib/i686/libpthread.so.0

Can anyone shed some light on why the behaviour differs between these two platforms?

Also, perhaps someone out there with access to a different Linux setup (maybe a more recent build than RedHat 7.3, or a different distro) could try this app themselves to help verify whether this is something that's stuffed on that release. I think I can rule out this problem being a quirk of my particular setup, as 3 different machines (all running RH7.3) give identical results.

Build env:
Linux 2.4.18-3
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-113)

Regards, Philip Yarra.

Philip, both your SELECTs are using the same database connection (and it's undefined which one it is) without any locking. You need to add "AT clauses" to specify an explicit connection. See attached diff.

However, I've not tried it... I'll try and get some time!

L.

Philip Yarra writes:
> Hi all, it looks like Lee's ECPG (and libpq) thread-safety patches
> have been applied, and configure --with-threads is also added. I
> have been doing some testing.
>
> On FreeBSD 4.8, the attached sample app runs without a problem.
>
> However, I still encounter a threading problem on Linux (RedHat 7.3).
>
> I have done the following:
> 1) cvs update
> 2) ./configure --with-threads && make && su -c "make install"
> 3) compiled cn.pgc as follows:
>    a) ecpg -t cn.pgc
>    b) gcc -I/usr/local/pgsql/include -L/usr/local/pgsql/lib \
>       -lecpg -lpgtypes -pthread cn.c
> 4) ./a.out - one thread runs to completion (inserts 5 records),
>    the other hangs (manages one insert, then blocks forever)

*** cn.pgc	2003-06-25 10:29:55.000000000 +0100
--- cn.pgc.new	2003-06-25 10:29:45.000000000 +0100
***************
*** 36,46 ****
  EXEC SQL END DECLARE SECTION;
  EXEC SQL WHENEVER sqlerror sqlprint;
  EXEC SQL CONNECT TO :cs AS test1;
! EXEC SQL SET AUTOCOMMIT TO ON;
  for (i = 0; i < 5; i++)
  {
  printf("thread1 inserting\n");
! EXEC SQL INSERT INTO foo VALUES(:bar);
  printf("==>thread1 insert done\n");
  }
  EXEC SQL DISCONNECT test1;
--- 36,46 ----
  EXEC SQL END DECLARE SECTION;
  EXEC SQL WHENEVER sqlerror sqlprint;
  EXEC SQL CONNECT TO :cs AS test1;
! EXEC SQL AT test1 SET AUTOCOMMIT TO ON;
  for (i = 0; i < 5; i++)
  {
  printf("thread1 inserting\n");
! EXEC SQL AT test1 INSERT INTO foo VALUES(:bar);
  printf("==>thread1 insert done\n");
  }
  EXEC SQL DISCONNECT test1;
***************
*** 57,67 ****
  EXEC SQL END DECLARE SECTION;
  EXEC SQL WHENEVER sqlerror sqlprint;
  EXEC SQL CONNECT TO :cs AS test2;
! EXEC SQL SET AUTOCOMMIT TO ON;
  for (i = 0; i < 5; i++)
  {
  printf("thread2 inserting\n");
! EXEC SQL INSERT INTO foo VALUES(:bar);
  printf("==>thread2 insert done\n");
  }
  EXEC SQL DISCONNECT test2;
--- 57,67 ----
  EXEC SQL END DECLARE SECTION;
  EXEC SQL WHENEVER sqlerror sqlprint;
  EXEC SQL CONNECT TO :cs AS test2;
! EXEC SQL AT test2 SET AUTOCOMMIT TO ON;
  for (i = 0; i < 5; i++)
  {
  printf("thread2 inserting\n");
! EXEC SQL AT test2 INSERT INTO foo VALUES(:bar);
  printf("==>thread2 insert done\n");
  }
  EXEC SQL DISCONNECT test2;

On Wed, 25 Jun 2003 07:35 pm, Lee Kindness wrote:
> Philip, both your SELECTs are using the same database connection (and
> it's undefined which one it is) without any locking. You need to add
> "AT clauses" to specify an explicit connection. See attached diff.

Ah, that'd be it. I spent some time debugging last night, and I'd realised the problem lay in the fact that the preproc was outputting NULL as the connection name, but was unsure why. Your changes allowed both threads to complete their inserts, which is great news for us!

I'll add that "AT" clause to my list of updates for the documentation - it might be important. It's kinda.... absent... from the manual. I might also add a section on using pthreads with ECPG, since people porting from Informix or Sybase might require such info up front.

> However, i've not tried it... I'll try and get some time!

That'd be great if you could... there appears to still be a problem occurring at "EXEC SQL DISCONNECT con_name". I'll look into it tonight if I can.

All this does kinda raise the interesting question of why it worked at all on FreeBSD... probably different scheduling and blind luck, I suppose.

Thanks for the response - I'm a happy man. By 7.4, we should be able to start porting our apps to Postgres in earnest.

Regards, Philip.

On Thu, 26 Jun 2003 11:19 am, Philip Yarra wrote:
> there appears to still be a problem
> occurring at "EXEC SQL DISCONNECT con_name". I'll look into it tonight if I
> can.

I did some more poking around last night, and believe I have found the issue: RedHat Linux 7.3 (the only distro I have access to currently) ships with a fairly challenged pthreads implementation. The default mutex type (which you get from PTHREAD_MUTEX_INITIALIZER) is, according to the man page, PTHREAD_MUTEX_FAST_NP, which is not a recursive mutex. If a thread owns a mutex and attempts to lock the mutex again, it will hang.

By replacing PTHREAD_MUTEX_INITIALIZER with PTHREAD_MUTEX_RECURSIVE_NP for the two mutexes that are used recursively (debug_mutex and connections_mutex) I got my sample app to work flawlessly on Linux RedHat 7.3.

Sadly, the _NP suffix is used to indicate non-portable, so of course my FreeBSD box steadfastly refused to compile it. Darn.

The correct way to do this appears to be:

    pthread_mutexattr_t mattr;
    pthread_mutexattr_init(&mattr);
    pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_RECURSIVE);

(will verify this against FreeBSD when I get home, and Tru64 man page indicates support for this too, so I'll test that later). It won't work on RedHat Linux 7.3... I guess something like:

    #ifdef DODGY_PTHREADS
    #define PTHREAD_MUTEX_RECURSIVE PTHREAD_MUTEX_RECURSIVE_NP
    #endif

might do it... if we could detect the problem during configure. How is this sort of detection handled in other cases (such as long long, etc)?

The other solution I can think of is to eradicate the two recursive locks I found. One is simple: ECPGlog calls ECPGdebug, and both share debug_mutex - it ought to be okay to use different mutexes for each of these functions (there's a risk someone might call ECPGdebug while someone else is running through ECPGlog, but I think it is less likely, since it is a debug mechanism.)

The second recursive lock I found is ECPGdisconnect calling ECPGget_connection, both of which share a mutex.
Would it be okay if we did the following:

 - ECPGdisconnect() still locks connections_mutex, but calls ECPGget_connection_nr() instead of ECPGget_connection()
 - ECPGget_connection() becomes a locking wrapper, which locks connections_mutex then calls ECPGget_connection_nr()
 - ECPGget_connection_nr() is a non-locking function which implements what ECPGget_connection() currently does.

I'm not sure if this sort of thing is okay (and there may be other recursive locking scenarios that I haven't exercised yet). What approach should I take? I'm leaning towards eradicating recursive locks, unless someone has a good reason not to.

> All this does kinda raise the interesting question of why it worked at all
> on FreeBSD... probably different scheduling and blind luck, I suppose.

FreeBSD 4.8 must have PTHREAD_MUTEX_RECURSIVE as default mutex type. I'm a bit concerned about FreeBSD 4.2 though - I noticed (before I blew it away in favour of 4.8) that its pthreads implementation came from a package called linuxthreads.tgz - it might have inherited the same problematic behaviour. Could someone with access to or knowledge of FreeBSD 4.2 check what the default mutex type is there?

Regards, Philip.

I can just see the ad for 7.3's pthreads implementation: "Fast mutexes: zero to deadlock in 6.9 milliseconds!"
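
For illustration, the split between a locking wrapper and a non-locking lookup proposed above might look roughly like the following. This is only a minimal sketch under stated assumptions, not the actual ECPG source: struct conn, add_connection() and remove_connection() are hypothetical stand-ins, and the list handling is simplified. Only the mutex name connections_mutex and the "_nr" naming idea come from the message.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a connection list entry (not the real ECPG struct). */
struct conn
{
	const char  *name;
	struct conn *next;
};

/* A plain, non-recursive mutex is sufficient once no function relocks it. */
static pthread_mutex_t connections_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct conn *all_connections = NULL;

/* Non-locking lookup: the caller must already hold connections_mutex. */
static struct conn *
get_connection_nr(const char *name)
{
	struct conn *c;

	for (c = all_connections; c != NULL; c = c->next)
		if (strcmp(c->name, name) == 0)
			return c;
	return NULL;
}

/* Locking wrapper for callers that do not hold the mutex. */
struct conn *
get_connection(const char *name)
{
	struct conn *c;

	pthread_mutex_lock(&connections_mutex);
	c = get_connection_nr(name);
	pthread_mutex_unlock(&connections_mutex);
	return c;
}

/* Hypothetical helper: push a connection onto the list under the lock. */
void
add_connection(struct conn *c)
{
	pthread_mutex_lock(&connections_mutex);
	c->next = all_connections;
	all_connections = c;
	pthread_mutex_unlock(&connections_mutex);
}

/*
 * Stand-in for the disconnect path: takes the mutex exactly once and uses
 * the non-locking lookup, so it never tries to relock a mutex it owns.
 * Returns 1 if the named connection was found and unlinked, 0 otherwise.
 */
int
remove_connection(const char *name)
{
	struct conn **pp;
	struct conn *c;
	int			found = 0;

	pthread_mutex_lock(&connections_mutex);
	c = get_connection_nr(name);
	if (c != NULL)
	{
		/* Walk the link pointers until we reach the one pointing at c. */
		for (pp = &all_connections; *pp != c; pp = &(*pp)->next)
			;
		*pp = c->next;
		found = 1;
	}
	pthread_mutex_unlock(&connections_mutex);
	return found;
}
```

With this shape, the default (fast) mutex type never sees a second lock attempt from the thread that already owns it, so no recursive mutex type is needed for this path.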

On Fri, Jun 27, 2003 at 10:45:46AM +1000, Philip Yarra wrote:
> ECPGget_connection, both of which share a mutex. Would it be okay if we did
> the following:
> ...

As you know I have never tried using threads, so feel free to go ahead and change this. Either commit to cvs or send me a patch.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De
ICQ: 179140304, AIM: michaelmeskes, Jabber: meskes@jabber.org
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!