Thread: --enable-thread-safety bug
Hello List,

I am running 8.3.1 on FreeBSD 6.2 patch-7. The FreeBSD ports turn on --enable-thread-safety during configure of pg. When running my app, after some time I have been getting a core dump - sig 11.

#0  0x28333b96 in memcpy () from /lib/libc.so.6
(gdb) bt
#0  0x28333b96 in memcpy () from /lib/libc.so.6
#1  0x280d0122 in ecpg_init_sqlca (sqlca=0x0) at misc.c:100
#2  0x280d0264 in ECPGget_sqlca () at misc.c:145
#3  0x280d056c in ecpg_log (
    format=0x280d1d78 "free_params line %d: parameter %d = %s\n") at misc.c:243
#4  0x280c9758 in free_params (paramValues=0x836fe00, nParams=104,
    print=1 '\001', lineno=3303) at execute.c:1045
#5  0x280c9f08 in ecpg_execute (stmt=0xa726f00) at execute.c:1298
#6  0x280ca978 in ECPGdo (lineno=3303, compat=0, force_indicator=1,
    connection_name=0x0, questionmarks=0 '\0', st=0,
    query=0x806023c "update T_UNIT_STATUS_LOG set ip_address = $1 :: inet , last_ip_address = $2 :: inet , unit_date = $3 :: timestamp with time zone , unit_raw_time = $4 , status_date = now () , unit_ac"...) at execute.c:1636
#7  0x08057a46 in UpdateTUSL (pCachedUnit=0x807b680, msg=0xbfbf8850 "",
    p_threshold=80, p_actualIP=0xbfbfe880 "24.39.85.226")
    at srm2_monitor_db.pgc:3303
#8  0x0804f174 in main (argc=3, argv=0xbfbf7fc0)
    at srm2_monitor_server.c:3265
(gdb) f 2
#2  0x280d0264 in ECPGget_sqlca () at misc.c:145
145             ecpg_init_sqlca(sqlca);
(gdb) p sqlca
$1 = (struct sqlca_t *) 0x0

Looking at the code in misc.c, I see:

struct sqlca_t *
ECPGget_sqlca(void)
{
#ifdef ENABLE_THREAD_SAFETY
    struct sqlca_t *sqlca;

    pthread_once(&sqlca_key_once, ecpg_sqlca_key_init);

    sqlca = pthread_getspecific(sqlca_key);
    if (sqlca == NULL)
    {
        sqlca = malloc(sizeof(struct sqlca_t));
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        ecpg_init_sqlca(sqlca);
        pthread_setspecific(sqlca_key, sqlca);
    }
    return (sqlca);
#else
    return (&sqlca);
#endif
}

The return from malloc should be checked to make sure it succeeds - right???

Steve
Steve Clark <sclark@netwolves.com> writes:
> The return from malloc should be checked to make sure it succeeds -
> right???

Probably, but what do you expect the code to do if it doesn't succeed? This function seems not to have any defined error-return convention.

			regards, tom lane
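Tom's question is the crux: what would a caller get back on failure? Below is a minimal sketch of one possible convention, assuming "return NULL and make callers check" is acceptable. The names sqlca_standin, init_sqlca and get_sqlca_checked are hypothetical; the control flow mirrors the ECPGget_sqlca() code quoted above, but this is not ecpglib's actual code.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ecpg's struct sqlca_t; the real one has more fields. */
struct sqlca_standin
{
    long sqlcode;
    char sqlstate[5];
};

static pthread_key_t sqlca_key;
static pthread_once_t sqlca_key_once = PTHREAD_ONCE_INIT;

static void sqlca_key_init(void)
{
    /* free() is registered as the destructor for a thread's value at thread exit */
    pthread_key_create(&sqlca_key, free);
}

/* Put the struct into a "no error" state (roughly analogous to ecpg_init_sqlca). */
static void init_sqlca(struct sqlca_standin *s)
{
    memset(s, 0, sizeof *s);
    memcpy(s->sqlstate, "00000", 5);
}

/*
 * Return this thread's struct, allocating it on first use.  Unlike the
 * code quoted above, a failed malloc() makes the function return NULL
 * instead of handing a NULL pointer to the init routine (which is where
 * the memcpy() in the backtrace crashed).  Callers must check the result.
 */
struct sqlca_standin *get_sqlca_checked(void)
{
    struct sqlca_standin *s;

    pthread_once(&sqlca_key_once, sqlca_key_init);
    s = pthread_getspecific(sqlca_key);
    if (s == NULL)
    {
        s = malloc(sizeof *s);
        if (s == NULL)
            return NULL;
        init_sqlca(s);
        if (pthread_setspecific(sqlca_key, s) != 0)
        {
            free(s);            /* don't leak if the value cannot be stored */
            return NULL;
        }
    }
    return s;
}
```

As Tom points out, this only moves the problem: every caller, including any code that reads sqlca.sqlcode, would then need its own NULL check.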
Tom Lane wrote:
> Steve Clark <sclark@netwolves.com> writes:
>
>>The return from malloc should be checked to make sure it succeeds -
>>right???
>
> Probably, but what do you expect the code to do if it doesn't succeed?
> This function seems not to have any defined error-return convention.
>
> 			regards, tom lane

Retry the malloc (maybe there is a memory leak when --enable-thread-safety is enabled), send an out-of-memory message to the postgres log, abort the transaction - I don't know. I am not a postgres developer, so I don't know all the issues. All I know is that, as a user, having a program like postgres just sig 11 is unacceptable! As a commercial developer of software for over 30 years, I would never just do nothing.

My $.02

Steve
On Sat, Mar 22, 2008 at 11:28:24AM -0400, Steve Clark wrote:
> Retry - the malloc - maybe there is a memory leak when
> --enable-thread-safety is enabled,
> send an out of memory message to the postgres log, abort the
> transaction - I don't know I am
> not a postgres developer so I don't know all the issues. All I know
> as a user having a program
> like postgres just sig 11 is unacceptable! As a commercial developer
> of software for over 30 years
> I would never just do nothing.

Note this is in your application, not the server. Only your program died. Of course the transaction got aborted, since the client (you) disconnected. There is no way for this to write to the server log, since it may be on another machine...

As to the issue at hand: it looks like your program ran out of memory. Can you confirm the memory was running low? Even if it handled it by returning NULL, the caller would die because it also needs memory.

Do you create and destroy a lot of threads? It seems this memory won't be freed.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
Martijn van Oosterhout <kleptog@svana.org> writes:
> Note this is in your application, not the server. Only your program
> died. Of course the transaction got aborted, since the client (you)
> disconnected. There is no way for this to write to the server log,
> since it may be on another machine...

Right. And note that if we don't have enough memory for the struct that was requested, we *certainly* don't have enough to do anything interesting. We could try

	fprintf(stderr, "out of memory\n");
	exit(1);

but even that I would give only about 50-50 odds of success; and more to the point, how is this any better for an application than a core dump? It's still summary termination.

> Do you create and destroy a lot of threads since it seems this memory
> won't be freed?

The OP's program isn't threaded at all, since he was apparently running with a non-threaded ecpg/libpq before. This means that the proposal of looping till someone else frees memory is at least as silly as allowing the core dump to happen.

			regards, tom lane
Martijn van Oosterhout wrote:
> On Sat, Mar 22, 2008 at 11:28:24AM -0400, Steve Clark wrote:
[...]
> As to the issue at hand: it looks like your program ran out of memory.
> Can you confirm the memory was running low? Even if it handled it by
> returning NULL, the caller will die because it also needs memory.
>
> Do you create and destroy a lot of threads since it seems this memory
> won't be freed?
>
> Have a nice day,

My program has no threads. As I pointed out, if I change the default Makefile in the FreeBSD ports system to not enable thread safety, my program runs just fine for days on end. It appears to me, without any kind of close examination, that there is a memory leak in the ecpg library when enable-thread-safety is turned on.

I had an earlier problem in 8.2.6 where, if enable-thread-safety was turned on, sqlca would always be zero whether or not there was an error. This appears to me to be a problem in the ecpg library when thread safety is enabled.

Have a nice day.

Steve
Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
[...]
> The OP's program isn't threaded at all, since he was apparently running
> with a non-threaded ecpg/libpq before. This means that the proposal of
> looping till someone else frees memory is at least as silly as allowing
> the core dump to happen.
>
> 			regards, tom lane

I guess the real question is why we are running out of memory when this option is enabled. Since my app doesn't use threads, that points to a memory leak in the ecpg library when enable-thread-safety is turned on.

Steve
On Sat, Mar 22, 2008 at 12:42:51PM -0400, Tom Lane wrote:
> > Do you create and destroy a lot of threads since it seems this memory
> > won't be freed?
>
> The OP's program isn't threaded at all, since he was apparently running
> with a non-threaded ecpg/libpq before. This means that the proposal of
> looping till someone else frees memory is at least as silly as allowing
> the core dump to happen.

I found an old report where someone found that get/setspecific wasn't working and a new copy of the structure was being allocated each time.

http://www.mail-archive.com/pgsql-general@postgresql.org/msg42918.html

That was on Solaris, though. It would be instructive to test this by calling that function several times in succession and making sure it returns the same address each time.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org>
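Martijn's suggested check could be written as a small harness like the one below. In a real test you would call ecpglib's own ECPGget_sqlca() (with struct sqlca_t in place of the stand-in type) from a program linked the same way as the failing application; here two hypothetical getters, stable_getter and leaky_getter, take its place so the sketch runs on its own.

```c
#include <stdlib.h>

/* Hypothetical stand-in; a real test would use struct sqlca_t from ecpg's header. */
struct sqlca_standin
{
    long sqlcode;
};

typedef struct sqlca_standin *(*sqlca_getter)(void);

/*
 * Call get() `calls` times; return 1 if every call yields the same
 * non-NULL address, 0 otherwise.  A getter whose thread-specific lookup
 * never finds the stored value hands out fresh memory and fails this.
 */
int returns_stable_address(sqlca_getter get, int calls)
{
    struct sqlca_standin *first = get();

    if (first == NULL)
        return 0;
    for (int i = 1; i < calls; i++)
        if (get() != first)
            return 0;
    return 1;
}

/* Well-behaved example: always returns the same static object. */
static struct sqlca_standin shared_sqlca;
struct sqlca_standin *stable_getter(void)
{
    return &shared_sqlca;
}

/* Misbehaving example: allocates (and leaks) a new object on every call. */
struct sqlca_standin *leaky_getter(void)
{
    return calloc(1, sizeof(struct sqlca_standin));
}
```

If the real ECPGget_sqlca() failed this check in a non-threaded build, that would match the Solaris report.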
On Sat, Mar 22, 2008 at 12:51:30PM -0400, Steve Clark wrote:
> My program had no threads - as I pointed out if I change the default
> Makefile in the FreeBSD ports system to not enable thread safety my
> programs runs just fine for days on end. It appears to me without any
> kind of close examination that there is a memory leak in the ecpg
> library when enable thread safety is turned on.

There are just a few variables covered by ENABLE_THREAD_SAFETY. I wonder how the program manages to allocate enough memory to eat all of it. Could you give us some more info about your source code? Do you use descriptors? Auto-allocation?

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!
Michael Meskes wrote:
> On Sat, Mar 22, 2008 at 12:51:30PM -0400, Steve Clark wrote:
[...]
> There are just a few variables covered by ENABLE_THREAD_SAFETY. I wonder
> how the program manages to spend so much time allocating memory to eat
> all of it. Could you give us some more info about your source code? Do
> you use descriptors? Auto allocating?
>
> Michael

Hi Michael,

I'm not exactly sure what you are asking about with descriptors and auto-allocating.

The program processes about 800000 packets a day, each of which can update several tables. It runs continuously, reading UDP packets coming in over the internet from systems at remote locations.

It has a global

    exec sql include sqlca;

then a number of functions that get called, each function having its own declare section:

    xxx(args, ...)
    {
        EXEC SQL BEGIN DECLARE SECTION;
        /* a bunch of variables */
        EXEC SQL END DECLARE SECTION;

        /* various EXEC SQL inserts, updates and selects, with checks of
           sqlca.sqlcode to determine if each SQL statement succeeded */
    }

Steve
Steve Clark wrote:
> Michael Meskes wrote:
[...]
> with checks of sqlca.sqlcode to determine if the sql statement succeeded.
> }
>
> Steve

To further illustrate our code, below is a typical exec sql statement:

    exec sql insert into t_unit_event_log
        (event_log_no, unit_serial_no, event_type, event_category,
         event_mesg, event_severity, event_status, event_ref_log_no,
         event_logged_by, event_date, alarm, last_updated_by,
         last_updated_date)
      values
        (nextval('seq_event_log_no'), :h_serial_no, 'ALERT', :h_category,
         :h_mesg, :h_sev, 3, NULL, current_user, now(), :h_alarm,
         current_user, now());

    if (sqlca.sqlcode != 0)
    {
        VARLOG(INFO, LOG_LEVEL_DBG4, "could not insert into T_UNIT_EVENT_LOG\n");
        VARLOG(INFO, LOG_LEVEL_DBG4, "insertTUEL returns %d\n", ret);
        return ret;
    }
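Why this code path runs so often: my understanding, which should be checked against the installed ecpg sqlca.h, is that the header defines sqlca as a macro that dereferences ECPGget_sqlca(), so every sqlca.sqlcode test like the one above calls that function (the backtrace also shows ecpg's own logging calling it). The sketch below imitates that arrangement with a counting getter; get_sqlca_standin, getter_calls and statement_failed are hypothetical names used only for illustration.

```c
/* Hypothetical stand-in struct and call counter; not ecpg's real definitions. */
struct sqlca_standin
{
    long sqlcode;
};

static struct sqlca_standin the_sqlca;
static int getter_calls = 0;

struct sqlca_standin *get_sqlca_standin(void)
{
    getter_calls++;             /* count how often user code reaches the getter */
    return &the_sqlca;
}

/* Assumed to mirror ecpg's header: the name sqlca expands to a function call. */
#define sqlca (*get_sqlca_standin())

/* Same shape as the check that follows each EXEC SQL statement above. */
int statement_failed(void)
{
    return sqlca.sqlcode != 0;
}
```

If the getter allocated on every call instead of reusing its object, each such check would also leak one struct.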
Steve Clark wrote:
> I guess the real question is why we are running out of memory when
> this option is enabled.
> Since my app doesn't use threads that points to a memory leak in the
> ecpg library when enable thread
> safety is turned on.

It might be worth building ecpg with debug symbols, then running your app, linked to that ecpg, under Valgrind. If you are able to produce more specific information about how the leak occurs in the context of your application, people here may be more able to help you.

--
Craig Ringer
Craig Ringer wrote:
> Steve Clark wrote:
[...]
> It might be worth building ecpg with debug symbols then running your
> app, linked to that ecpg, under Valgrind. If you are able to produce
> more specific information about how the leak occurs in the context of
> your application people here may be more able to help you.
>
> --
> Craig Ringer

Hi Craig,

I could do that - but in my situation I am not using threads, so I really don't need --enable-thread-safety turned on. The FreeBSD ports maintainer for postgresql decided everybody should have it whether they needed it or not. I simply deleted the option from the FreeBSD Makefile, rebuilt the port, relinked my app, and no more problem. I just thought the postgresql developers would want to know there was a bug. If they don't care to investigate or troubleshoot the bug, that is fine by me.

I just find it interesting that a non-threaded program causes a memory leak when used with postgres libraries that are compiled with --enable-thread-safety - doesn't seem too safe to me.

Have a nice day.

Steve
Steve Clark <sclark@netwolves.com> writes:
> I could do that - but in my situation I am not using threads so I
> really don't need --enable-thread-safety
> turned on. The freebsd ports maintainer for postgresql decided
> everybody should have it whether they
> needed it or not. I simply deleted the option from the freebsd
> makefile rebuilt the port - relinked my app
> and no more problem. I just thought the postgresql developers would
> want to know there was a bug. If
> they don't care to investigate or trouble shoot the bug it is fine by me.

I don't think you grasp the situation, Steve. Having enable-thread-safety turned on is standard across a wide swath of the world, and yet nobody else has reported severe memory leaks in ecpg. So there's something very specific to what your app is doing that triggers the problem. There's little point in anyone else investigating unless you can give them a test case that reproduces the misbehavior.

I can assure you we would like to fix the problem if we can find it. But with no cooperation from you, we'll just have to wait until someone else stumbles across it and can show us exactly how to make it happen.

			regards, tom lane
On Sat, Mar 22, 2008 at 04:58:28PM -0400, Steve Clark wrote:
> Not exactly sure what you are asking about - descriptors and auto
> allocating.

So I guess you don't use either feature. :-)

> The program processes about 800000 packets a day, which can update
> several tables.
> It runs continuously reading udp packets from systems at remote locations
> coming in over the internet.

But the code for processing all those statements is the same with and without threading enabled.

One piece of code that does differ is the allocation of sqlca, but this structure is a mere 215 bytes (about). Even if it were allocated 800000 times, that would amount to a memory loss of about 164MB. Which brings up the question: how long does the application run before it segfaults?

As Tom already pointed out, without more information there simply is no way for us to find out what's going on. We are more than willing to dig into it, but we need more to be able to.

Michael
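Michael's estimate works out as 215 bytes x 800000 = 172,000,000 bytes, or about 164 MiB. The tiny helper below (leak_mib is a hypothetical name) redoes that arithmetic. Note that it assumes one leaked struct per packet; if several statements per packet each reached the allocation, the total would scale up accordingly.

```c
/*
 * Worst-case memory lost, in MiB (1 MiB = 1048576 bytes), if one
 * fixed-size allocation of bytes_per_leak bytes leaks per event.
 */
double leak_mib(double bytes_per_leak, double leaks)
{
    return bytes_per_leak * leaks / (1024.0 * 1024.0);
}
```

For example, leak_mib(215.0, 800000.0) gives roughly 164.03.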
Michael Meskes wrote:
> On Sat, Mar 22, 2008 at 04:58:28PM -0400, Steve Clark wrote:
[...]
> One code that differs is allocation of sqlca, but given that this
> structure has a mere 215 bytes (about). Even if it was allocated 800000
> times it would make up for a memory loss of about 164MB. Which brings up
> the question how long the application runs until it segfaults.
>
> As Tom already pointed out, without more information there simply is no
> way for us to find out what's going on. We are more than willing to dig
> into it, but we need more to be able to.
>
> Michael

OK, I tried Valgrind, and after a while it dies with a Valgrind assertion error before providing any useful data.

So I tried linking with -lc_r, and that appears to have stopped the leak. Without -lc_r, "top" showed my app quickly climbing to over 150 MB in memory size; it is now staying steady at about 8 MB, which is about what it used when I compiled the ecpg lib without --enable-thread-safety.

Now why does this make a difference in ecpg?

HTH,
Steve

If anyone cares, below is the Valgrind assertion failure:

valgrind: vg_malloc2.c:1008 (vgPlain_arena_malloc): Assertion `new_sb != ((void*)0)' failed.
==4166==    at 0xB802BE1F: (within /usr/local/lib/valgrind/stage2)
==4166==    by 0xB802BE1E: (within /usr/local/lib/valgrind/stage2)
==4166==    by 0xB802BE5D: vgPlain_core_assert_fail (in /usr/local/lib/valgrind/stage2)
==4166==    by 0xB8028091: vgPlain_arena_malloc (in /usr/local/lib/valgrind/stage2)

sched status:

Thread 1: status = Runnable, associated_mx = 0x0, associated_cv = 0x0
==4166==    at 0x3C03894B: calloc (in /usr/local/lib/valgrind/vgpreload_memcheck.so)

Note: see also the FAQ.txt in the source distribution.
It contains workarounds to several common problems.

If that doesn't help, please report this bug to: valgrind.kde.org

In the bug report, send all the above text, the valgrind version, and what Linux distro you are using. Thanks.
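One hypothesis that would fit both the -lc_r observation and the earlier 8.2.6 symptom, though it is not confirmed anywhere in this thread and the libc behaviour is an assumption to verify against FreeBSD's sources: suppose that, without -lc_r, the program gets no-op stub versions of the thread-specific-data calls, so "get" always reports nothing stored and "set" claims success without storing anything. Then the ECPGget_sqlca() logic quoted in the first post would allocate a fresh struct on every call. That would leak steadily, and it would also make sqlca always read as zero, since each access would see a newly initialized struct. The sketch models this with hypothetical stub functions (stub_getspecific, stub_setspecific) and an allocation counter.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical no-op stubs standing in for pthread_getspecific() and
 * pthread_setspecific(): nothing is ever stored, yet "set" reports success.
 */
static void *stub_getspecific(void)
{
    return NULL;
}

static int stub_setspecific(const void *value)
{
    (void) value;
    return 0;
}

static int allocations = 0;

/* Same control flow as the ECPGget_sqlca() code quoted in the first post. */
void *get_sqlca_with_stubs(size_t size)
{
    void *s = stub_getspecific();

    if (s == NULL)
    {
        s = calloc(1, size);            /* zeroed, like a freshly initialized sqlca */
        if (s == NULL)
            return NULL;
        allocations++;
        (void) stub_setspecific(s);     /* "succeeds", but the pointer is lost */
    }
    return s;
}
```

Under this model, every call allocates a new block and nothing is ever freed, which is the pattern Martijn's same-address test would expose.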