Thread: still memory leaks with libpgtcl
Weird, I thought these were fixed already. Downloaded 7.2.3 source today, was *a lot* astonished, that the pg_on_connection_loss and the fix for finding a free connection slot (PSetResultId) were not in the source tarball. Fetched latest versions from CVS (both patches included), compiled tcl 8.3.5 and did set conn [pg_connect template1] pg_disconnect $conn in an endless loop, which result's in a constant increase in memory usage. Has anybody encountered this problem also ? What went wrong here - the sources in the tarball are oviously old (I submitted 2 patches several month ago, one was a bug fix, ok the other was an enhancement, but can anybody explain, why at least the bug fix did *not* find it's way to the tarball , and why are there 2 projects in parallel - libpgtcl and the pgtcl project - that is only confusing people. Has anyone experience with pgtcl project, it is still beta, and I'm in a production environment, so I need really stable stuff. Gerhard -- Gerhard Hintermayer http://www.inode.at/g.hintermayer
Gerhard Hintermayer wrote: > Weird, I thought these were fixed already. > > Downloaded 7.2.3 source today, was *a lot* astonished, that the > pg_on_connection_loss and the fix for finding a free connection slot > (PSetResultId) were not in the source tarball. > Fetched latest versions from CVS (both patches included), compiled tcl > 8.3.5 and did > > set conn [pg_connect template1] > pg_disconnect $conn > > in an endless loop, which result's in a constant increase in memory usage. > Has anybody encountered this problem also ? > > What went wrong here - the sources in the tarball are oviously old (I > submitted 2 patches several month ago, one was a bug fix, ok the other > was an enhancement, but can anybody explain, why at least the bug fix > did *not* find it's way to the tarball , and why are there 2 projects in > parallel - libpgtcl and the pgtcl project - that is only confusing people. > > Has anyone experience with pgtcl project, it is still beta, and I'm in a > production environment, so I need really stable stuff. > > Gerhard > Is there absoulutely noone using postgreSQL with tcl/tk ? I cannot beleave that. I'd be happy to hear at least the preferred pgtcl sources -either from the postgreSQL sources of from the pgtcl project. And what's the differences ? -- Gerhard Hintermayer http://www.inode.at/g.hintermayer
g.hintermayer@inode.at wrote: > ... > Is there absoulutely noone using postgreSQL with tcl/tk ? I cannot beleave that. > I'd be happy to hear at least the preferred pgtcl sources -either from the > postgreSQL sources of from the pgtcl project. And what's the differences ? Sure people are using it. As I understand it: The Tcl interface included with PostgreSQL-7.3.1 is the preferred, production release. The new libpgtcl on gborg.postgresql.org is beta-test; when it is done, the Tcl interface will be unbundled from the PostgreSQL releases (like the perl5 interface already is). The latest release of the unbundled libpgtcl-1.4b3 seems pretty good to me (I was unable to break it), so I suggest you try it if you can. As to your specific issues: 1) The PostgreSQL-7.3.1 released libpgtcl does have the connection loss handling function. 2) I'm not sure what you are referring to by "fix for finding a free connection slot (PSetResultID)". If you mean finding a free result structure slot (PGSetResultID), then I see code changes there and I think that is fixed in both 7.3.1 and 1.4b3. It seems to work as it should: I can allocate 128, but not 129 result structures. 3) Memory leak on connect/disconnect unfortunately is not fixed in either the production or beta versions. I'm seeing about 4KB loss per connect/disconnect. I'm sure this is serious for some applications, but others would probably not have trouble here. I can't remember: was there a patch - has this leak been tracked down?
ljb <lbayuk@mindspring.com> writes: > 3) Memory leak on connect/disconnect unfortunately is not fixed in either > the production or beta versions. I'm seeing about 4KB loss per > connect/disconnect. The notes I made about this say: 2002-09-02 19:41 tgl * src/interfaces/libpgtcl/: pgtclCmds.c, pgtclId.c: Partialsolution for 'unexpected EOF' problem with pg_disconnect: it seemswehave a choice between annoying messages and leaking memory (ordumping core, but that's right out). That was nearly four months ago, and so I don't recall the details, other than that I went for the "leak memory" choice :-( If anyone can improve the situation, step right up with a patch ... regards, tom lane
ljb wrote: > Sure people are using it. As I understand it: The Tcl interface included > with PostgreSQL-7.3.1 is the preferred, production release. The new > libpgtcl on gborg.postgresql.org is beta-test; when it is done, the Tcl > interface will be unbundled from the PostgreSQL releases (like the perl5 > interface already is). The latest release of the unbundled libpgtcl-1.4b3 > seems pretty good to me (I was unable to break it), so I suggest you try it > if you can. > OK, good to know. In the meanwhile I upgraded to 7.3.1, so 1) and 2) are OK. > As to your specific issues: > > 1) The PostgreSQL-7.3.1 released libpgtcl does have the connection loss > handling function. > > 2) I'm not sure what you are referring to by "fix for finding a free > connection slot (PSetResultID)". If you mean finding a free result > structure slot (PGSetResultID), then I see code changes there and I think > that is fixed in both 7.3.1 and 1.4b3. It seems to work as it should: I > can allocate 128, but not 129 result structures. > > 3) Memory leak on connect/disconnect unfortunately is not fixed in either > the production or beta versions. I'm seeing about 4KB loss per > connect/disconnect. I'm sure this is serious for some applications, but > others would probably not have trouble here. I can't remember: was there a > patch - has this leak been tracked down? Definitely serious, took me a long time to find out, why two machines hat to be hard rebooted because they ran out of memory after some weeks running with no problems. Changed the "connect/disconnect when idle" strategy to "connect at startup and stay so". If I'd track down the problem and sent a patch, has the wheel to be reinvented for gborg ? Two concurrent developments is not a good thing. Oops, forgot to check that thread. Thanks for your response. -- Gerhard Hintermayer http://www.inode.at/g.hintermayer
g.hintermayer@inode.at wrote: >... > If I'd track down the problem and sent a patch, has the wheel to be reinvented > for gborg ? Two concurrent developments is not a good thing. Did you see Tom Lane's reply on this thread? The problem is known but fixing the leak causes a crash. The code is in libpgtcl/pgtclId.c; look for this comment: * XXX Unfortunately, while this works fine if we are closing due to * explicit pg_disconnect, Tcl versionsthrough 8.3.3 dump core if we * try to do it during interpreter shutdown. Not clear why, or if * there isa workaround. For now, accept leakage of the (fairly * small) amount of memory taken for the channel state representation. I've confirmed that the crash occurs with Tcl 8.3.4 also. But I've got a patch which seems to fix it - that is, it neither leaks on disconnect nor crashes on shutdown after my patch is applied. At least it doesn't with my extremely minimal testing. If you want to try this, go ahead. This patch is seriously ugly and 'highly suspect', and I am NOT suggesting this be included in the PostgreSQL release. Really. But it works. Maybe. (This is for PostgreSQL-7.3.1) --- src/interfaces/libpgtcl/pgtclId.c.bak 2002-10-17 10:53:32.000000000 -0400 +++ src/interfaces/libpgtcl/pgtclId.c 2003-01-07 21:51:53.000000000 -0500 @@ -300,12 +300,10 @@ * small) amount of memory taken for the channel state representation. * Note we are not leakinga socket, since libpq closed that already. */ -#ifdef NOT_USED#if TCL_MAJOR_VERSION >= 8 - if (connid->notifier_channel != NULL) + if (connid->notifier_channel != NULL && interp != NULL) Tcl_UnregisterChannel(NULL, connid->notifier_channel);#endif -#endif /* * We must use Tcl_EventuallyFree because we don't want the connid
ljb <lbayuk@mindspring.com> writes: > This patch is > seriously ugly and 'highly suspect', and I am NOT suggesting this be > included in the PostgreSQL release. Really. But it works. Maybe. What we really need to do is reel in a Tcl guru to look at this. I think that there is a Tcl bug here somewhere; even if not, we seem to need someone who knows the Tcl library inside-and-out to tell us what we're doing wrong. I don't have the time or interest to pursue this with the Tcl community, but perhaps one of you can do it? > -#ifdef NOT_USED > #if TCL_MAJOR_VERSION >= 8 > - if (connid->notifier_channel != NULL) > + if (connid->notifier_channel != NULL && interp != NULL) > Tcl_UnregisterChannel(NULL, connid->notifier_channel); > #endif > -#endif Cool solution if it works --- but I'll not accept it unless someone can explain to me *why* it fixes the problem, and preferably just where in the Tcl documentation this is promised to be a correct solution... regards, tom lane
ljb wrote: > I've confirmed that the crash occurs with Tcl 8.3.4 also. But I've got a > patch which seems to fix it - that is, it neither leaks on disconnect nor > crashes on shutdown after my patch is applied. At least it doesn't with my > extremely minimal testing. If you want to try this, go ahead. This patch is > seriously ugly and 'highly suspect', and I am NOT suggesting this be > included in the PostgreSQL release. Really. But it works. Maybe. > > (This is for PostgreSQL-7.3.1) > > --- src/interfaces/libpgtcl/pgtclId.c.bak 2002-10-17 10:53:32.000000000 -0400 > +++ src/interfaces/libpgtcl/pgtclId.c 2003-01-07 21:51:53.000000000 -0500 > @@ -300,12 +300,10 @@ > * small) amount of memory taken for the channel state representation. > * Note we are not leaking a socket, since libpq closed that already. > */ > -#ifdef NOT_USED > #if TCL_MAJOR_VERSION >= 8 > - if (connid->notifier_channel != NULL) > + if (connid->notifier_channel != NULL && interp != NULL) > Tcl_UnregisterChannel(NULL, connid->notifier_channel); > #endif > -#endif > > /* > * We must use Tcl_EventuallyFree because we don't want the connid Ah, I think my brain does have memory leaks too, but I remember now, that Tom Lane fixed that after I found out, that the logfile was filled with "unexpected EOF on client connection" each time I pg_disconnected. Does your patch produce the above log-entries ? I'd be willing to post a question on comp.lang.tcl or dig through the Tcl_Channel documentation, but I still see the problem of two concurrent developments (gborg/libpgtcl). Gborg does heavily use Tcl_Obj instead of strings, and as far as I can remember, the code differs a lot, so it's likely that if ther's a solution in libpgtcl, it has to be dropped and searched again, when gborg will replace libpgtcl. -- Gerhard Hintermayer http://www.inode.at/g.hintermayer
The Tcl_Channel type mismatch was wrong alarm, ths is detected automatically (but should be corrected in future versions anyhow). The SEGV is, when Tcl_DontCallWhenDeleted is called, when the channel is shut down (PgDelConnectionId). Unfortunately notifies->interp is NULL and Tcl_DontCallWhenDeleted does not like that at all. This might be the reason, why the patch from ljb (sorry, no name) helps. I will track down that deeper tomorrow when I'm at home, posting via Google takes too long. Gerhard
tgl@sss.pgh.pa.us wrote: > What we really need to do is reel in a Tcl guru to look at this. > I think that there is a Tcl bug here somewhere; even if not, we seem > to need someone who knows the Tcl library inside-and-out to tell us > what we're doing wrong. > > I don't have the time or interest to pursue this with the Tcl community, > but perhaps one of you can do it? > >> -#ifdef NOT_USED >> #if TCL_MAJOR_VERSION >= 8 >> - if (connid->notifier_channel != NULL) >> + if (connid->notifier_channel != NULL && interp != NULL) >> Tcl_UnregisterChannel(NULL, connid->notifier_channel); >> #endif >> -#endif > > Cool solution if it works --- but I'll not accept it unless someone can > explain to me *why* it fixes the problem, and preferably just where in > the Tcl documentation this is promised to be a correct solution... It doesn't fix the problem; it avoids the problem. The crash happens when the DriverCloseProc calls UnregisterChannel on shutdown, not on a normal disconnect. The leak happens when UnregisterChannel is not called on normal disconnect. The patch avoids the issue by not calling UnregisterChannel if interp is NULL, as it seems to be only if called during shutdown. So the patch relies on undocumented behavior inside Tcl - that the DriverCloseProc() is called with interp==NULL during shutdown. I agree that something is wrong here, either with how Tcl implements the unregister/shutdown, or with how libpgtcl uses channels. As I said before, I don't recommend this patch be applied, except for people who have this specific problem (long-running program doing frequent connect/disconnects).
g.hintermayer@inode.at wrote: >... > the logfile was filled with "unexpected > EOF on client connection" each time I pg_disconnected. > Does your patch produce the above log-entries ? Not that I can see, front-end or back-end.
Here's a short answer on my posting "deep Tcl_Channel question" to c.l.t. >>Can anybody of the tcl-gurus please enlighten me.>>I have a problem with the tcl extension to the postgreSQL database.>>Theextension wraps a client channel around an existing TCP-Socket>>(which ist created by an external library).>>Theproblem is, that the closeProc of the created Tcl_Channel is>>sometimes called with interp NULL and sometimeswith a valid interp.>>Some tcl-functions are very unhappy with passed NULL-interp (e.g.>>Tcl_DontCallWhenDeleted).>>>>Myquestion is: When is the close proc passed a valid interp and when>>a NULL pointeras parameter ?>>>>Thanks>>>>Gerhard >Off hand, I think Tcl_Close() is called with a NULL interp when Tcl is>shutting down. Or, if the channel was never associatedto a particular>interp with Tcl_RegisterChannel().>>If you use Tcl_RegisterChannel(), you might not need to usea>Tcl_InterpDeleteProc for Tcl_CallWhenDeleted().>-->David Gravereaux <davygrvy@pobox.com>>[species: human; planet: earth,milkyway,alphasector] This is just like I did see it, in a normal interpreter shutdown and in case of an error interp is NULL, and a valid interp else. So I think it is ok, to check for NULL interp berfore calling Tcl functions that don't check for NULL pointers themselves. Tom are you happy with that explanation ? Gerhard
ljb <lbayuk@mindspring.com> wrote in message > [...] > (This is for PostgreSQL-7.3.1) > > --- src/interfaces/libpgtcl/pgtclId.c.bak 2002-10-17 10:53:32.000000000 -0400 > +++ src/interfaces/libpgtcl/pgtclId.c 2003-01-07 21:51:53.000000000 -0500 > @@ -300,12 +300,10 @@ > * small) amount of memory taken for the channel state representation. > * Note we are not leaking a socket, since libpq closed that already. > */ > -#ifdef NOT_USED > #if TCL_MAJOR_VERSION >= 8 > - if (connid->notifier_channel != NULL) > + if (connid->notifier_channel != NULL && interp != NULL) > Tcl_UnregisterChannel(NULL, connid->notifier_channel); > #endif > -#endif > > /* > * We must use Tcl_EventuallyFree because we don't want the connid Your patch does indeed fix the memory leakage but seems to introduce another problem. Every once in a while I do get segfaults, when I enter the event loop after pg_disconnect-ing (well, it looks like that to me). I have to say, that this is the only script, where the error occurs. All others (~500) work with no problem at all. I was trying to reproduce that in a short script, but - no way. I am not even sure, if this is postgreSQL related. I did build a static wish with --enable-symbols=all and --enable-memory to track the error down. Hope I'll see more then. Gerhard
Gerhard Hintermayer wrote: > ljb <lbayuk@mindspring.com> wrote in message > >>[...] >>(This is for PostgreSQL-7.3.1) >> >>--- src/interfaces/libpgtcl/pgtclId.c.bak 2002-10-17 10:53:32.000000000 -0400 >>+++ src/interfaces/libpgtcl/pgtclId.c 2003-01-07 21:51:53.000000000 -0500 >>@@ -300,12 +300,10 @@ >> * small) amount of memory taken for the channel state representation. >> * Note we are not leaking a socket, since libpq closed that already. >> */ >>-#ifdef NOT_USED >> #if TCL_MAJOR_VERSION >= 8 >>- if (connid->notifier_channel != NULL) >>+ if (connid->notifier_channel != NULL && interp != NULL) >> Tcl_UnregisterChannel(NULL, connid->notifier_channel); >> #endif >>-#endif >> >> /* >> * We must use Tcl_EventuallyFree because we don't want the connid > > > Your patch does indeed fix the memory leakage but seems to introduce > another problem. Every once in a while I do get segfaults, when I > enter the event loop after pg_disconnect-ing (well, it looks like that > to me). I have to say, that this is the only script, where the error > occurs. All others (~500) work with no problem at all. I was trying to > reproduce that in a short script, but - no way. I am not even sure, if > this is postgreSQL related. I did build a static wish with > --enable-symbols=all and --enable-memory to track the error down. Hope > I'll see more then. > > Gerhard I found out (actually it was Joe English from TCT), that the segfault was due to an error in tk (Bug #671330). So I see no reason, why the above patch should'nt be applied to further releases. I have running lot's of scripts with the above patch and not having any problem with both memory leaks or segfaults. Gerhard -- Gerhard Hintermayer http://www.inode.at/g.hintermayer
Gerhard Hintermayer <g.hintermayer@inode.at> writes: > I found out (actually it was Joe English from TCT), that the segfault > was due to an error in tk (Bug #671330). So I see no reason, why the > above patch should'nt be applied to further releases. I have running > lot's of scripts with the above patch and not having any problem with > both memory leaks or segfaults. Okay, I've applied this patch into CVS head and also REL7_3 branch. It'll be in PG 7.3.2. regards, tom lane