Thread: Mac OS X, PostgreSQL, PL/Tcl
Hoping someone can help me figure out why I can't get PL/Tcl to load without crashing the backend on Mac OS 10.3.2. I compile Tcl and PostgreSQL, create the database, and then run the following:

create function plpgsql_call_handler() RETURNS LANGUAGE_HANDLER as 'plpgsql.so' language 'c';
create trusted procedural language 'plpgsql' HANDLER plpgsql_call_handler LANCOMPILER 'PL/pgSQL';
create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER as 'pltcl.so' language 'c';
create trusted procedural language 'pltcl' HANDLER pltcl_call_handler LANCOMPILER 'PL/Tcl';

The PL/pgSQL part loads fine. The PL/Tcl part crashes the server, and psql reports this:

psql:/Users/scott/pgtest/add_languages.sql:12: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
psql:/Users/scott/pgtest/add_languages.sql:12: connection to server was lost

I have tried the exact same procedure on Linux without any problems, using the exact same scripts, setup etc. I've tried both PG 7.4.1 and a CVS copy from 11 Feb. I've used gcc 3.3, 3.1 and 2.85. I've tried loading PL/Tcl without loading PL/pgSQL at all, same problem. I tried Tcl 8.4.3, 8.4.4 and 8.4.5. pgtclsh runs fine.

I used ktrace to attach to the PG process and it's generating a SIGSEGV. I get several "file name too long" errors before the SEGV. Problem is probably not with PG, but could be with Tcl and/or Mac OS X loadable libs. Here's the significant portion of the trace (you can find the whole output at http://scottg.net/pgktrace.txt):

... stuff prior ...
27296 postgres 0.000021 NAMI "/usr/lib/libicucore.A.dylib"
27296 postgres 0.000019 RET open 114/0x72
27296 postgres 0.000009 CALL fstat(0x72,0xbfffdf50)
27296 postgres 0.000009 RET fstat 0
27296 postgres 0.000047 CALL load_shared_file(0x9019060c,0x605000,0x13b680,0xbfffdd60,0x4,0xbfffdcf0,0xbfffdd64)
27296 postgres 0.000053 NAMI "/usr/lib/libicucore.A.dylib"
27296 postgres 0.000135 RET load_shared_file 0
27296 postgres 0.000034 CALL close(0x72)
27296 postgres 0.000015 RET close 0
27296 postgres 0.000113 CALL stat(0x800200,0xbfffde20)
27296 postgres 0.000016 NAMI " /libSystem.B.dylib"
27296 postgres 0.000023 RET stat -1 errno 2 No such file or directory
27296 postgres 0.000021 CALL stat(0x800200,0xbfffde20)
27296 postgres 0.000009 NAMI " /libSystem.B.dylib"
27296 postgres 0.000017 RET stat -1 errno 2 No such file or directory
27296 postgres 0.004552 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000044 RET stat -1 errno 63 File name too long
27296 postgres 0.000019 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000012 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000009 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x90104e34,0xbfffd3b0)
27296 postgres 0.000118 NAMI "/"
27296 postgres 0.000019 RET stat 0
27296 postgres 0.000012 CALL lstat(0x182f600,0xbfffd3b0)
27296 postgres 0.000007 NAMI "."
27296 postgres 0.000016 RET lstat 0
27296 postgres 0.000009 CALL stat(0x182f600,0xbfffd1a0)
27296 postgres 0.000006 NAMI ".."
27296 postgres 0.000018 RET stat 0
27296 postgres 0.000009 CALL open(0x182f600,0x4,0xfefefeff)
... more stuff ...
27296 postgres 0.000007 NAMI "../../../../../.."
27296 postgres 0.000021 RET stat 0
27296 postgres 0.000008 CALL open(0x182f600,0x4,0)
27296 postgres 0.000008 NAMI "../../../../../.."
27296 postgres 0.000016 RET open 114/0x72
27296 postgres 0.000009 CALL fstat(0x72,0xbfffd1a0)
27296 postgres 0.000007 RET fstat 0
27296 postgres 0.000007 CALL fcntl(0x72,0x2,0x1)
27296 postgres 0.000007 RET fcntl 0
27296 postgres 0.000008 CALL fstatfs(0x72,0xbfffd200)
27296 postgres 0.000007 RET fstatfs 0
27296 postgres 0.000009 CALL fstat(0x72,0xbfffd3b0)
27296 postgres 0.000007 RET fstat 0
27296 postgres 0.000008 CALL getdirentries(0x72,0x182fa00,0x1000,0x501b74)
27296 postgres 0.000065 RET getdirentries 640/0x280
27296 postgres 0.000015 CALL lseek(0x72,0,0,0)
27296 postgres 0.000007 RET lseek 0
27296 postgres 0.000009 CALL close(0x72)
27296 postgres 0.000009 RET close 0
27296 postgres 0.000007 CALL lstat(0x182f600,0xbfffd3b0)
27296 postgres 0.000007 NAMI "../../../../../../"
27296 postgres 0.000019 RET lstat 0
27296 postgres 0.000024 CALL stat(0xbfffd4f0,0xbfffd900)
27296 postgres 0.000009 RET stat -1 errno 63 File name too long
27296 postgres 0.140906 PSIG SIGSEGV SIG_DFL
26999 postgres 0.004582 CSW resume kernel
26999 postgres 0.000025 RET select -1 errno 4 Interrupted system call
26999 postgres 0.000010 PSIG SIGCHLD caught handler=0xe59ac mask=0x0 code=0x0
26999 postgres 0.000302 CALL sigprocmask(0x3,0x23fc74,0)
26999 postgres 0.000036 RET sigprocmask 0
26999 postgres 0.000037 CALL wait4(0xffffffff,0xbfffe670,0x1,0)
26999 postgres 0.000086 RET wait4 27296/0x6aa0
26999 postgres 0.000258 CALL write(0x2,0xbfffdd10,0x3d)
26999 postgres 0.000031 GIO fd 2 wrote 61 bytes
    "LOG: server process (PID 27296) was terminated by signal 11
    "
26999 postgres 0.000009 RET write 61/0x3d
26999 postgres 0.000020 CALL write(0x2,0xbfffdd10,0x34)
26999 postgres 0.000013 GIO fd 2 wrote 52 bytes
    "LOG: terminating any other active server processes
    "
26999 postgres 0.000008 RET write 52/0x34
26999 postgres 0.000032 CALL kill(0x6a35,0x3)
26999 postgres 0.000020 RET kill 0
26999 postgres 0.000011 CALL sendto(0x6e,0xbfffe5a0,0x18,0,0,0)

thanks,
/s.
Scott Goodwin <scott@scottg.net> writes:
> Hoping someone can help me figure out why I can't get PL/Tcl to load
> without crashing the backend on Mac OS 10.3.2.

FWIW, pltcl seems to work for me. Using up-to-date Darwin 10.3.2 and PG CVS tip, I did
    configure --with-tcl --without-tk
then make, make install, etc. pltcl installs and passes its regression test.

> psql:/Users/scott/pgtest/add_languages.sql:12: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.

Can you provide a stack trace for this?

regards, tom lane
Ok, so it's something specific to my setup. I created a test account, logged in and compiled PostgreSQL there with a clean shell environment, and it worked fine. So I'm shooting myself in the foot in my login environment. *sigh*.

thanks,
/s.

On Feb 21, 2004, at 1:51 AM, Tom Lane wrote:
> Scott Goodwin <scott@scottg.net> writes:
>> Hoping someone can help me figure out why I can't get PL/Tcl to load
>> without crashing the backend on Mac OS 10.3.2.
>
> FWIW, pltcl seems to work for me. Using up-to-date Darwin 10.3.2
> and PG CVS tip, I did
>     configure --with-tcl --without-tk
> then make, make install, etc. pltcl installs and passes its regression
> test.
>
>> psql:/Users/scott/pgtest/add_languages.sql:12: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>
> Can you provide a stack trace for this?
>
> regards, tom lane
Found the problem. If I have a very long environment variable exported and I start PG, PG crashes when I try to load PL/Tcl. In my case I use color ls and I have a very long LS_COLORS environment variable set.

I have duplicated the problem by renaming my .bashrc and logging back in. With this clean environment, I started PG and loaded PL/Tcl without any problems. I then created the following environment variable on the command line:

LONG_VAR=aaaaaaaaaaaaaaaaaa:bbbbbbbbbbbbbbbbbbb:cccccccccccccccccc:
ddddddddddddddddddd:eeeeeeeeeeeeeeeeeee:fffffffffffffff:
ggggggggggggggggg:hhhhhhhhhhhhhhhhhhhh:iiiiiiiiiiiiiiiiiii:
jjjjjjjjjjjjjjjjjjjjj:kkkkkkkkkkkkkkkkkkkkkk:llllllllllllllllllll:
mmmmmmmmmmmmmmmmmmmmmmm:nnnnnnnnnnnnnnnnnnnnnnnnn:
ooooooooooooooooooooooo:pppppppppppppppppppppp:qqqqqqqqqqqqqqqqqqqqqqq:
rrrrrrrrrrrrrrrrrrrrrrr:ssssssssssssssssssssssssss:
ttttttttttttttttttttttttttt:uuuuuuuuuuuuuuuuuuuuuuuuu:
vvvvvvvvvvvvvvvvvvvvvv:wwwwwwwwwwwwwwwwwwwwwwwwwwwwww:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:yyyyyyyyyyyyyyyyyyyyyyyyyyyyy:
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

and exported it. (Obviously the line above is going to be broken into multiple lines by the mailer...) Then I stopped and restarted PG, loaded PL/Tcl, and PG crashed. You *must* stop and restart PG for the problem to exhibit itself; otherwise it won't pick up the change in the environment. I suspect I'm running into a buffer overflow situation.

Ok, it fails consistently when LONG_VAR is 523 characters or greater; works consistently when LONG_VAR is 522 characters or smaller. Might not fail at the same number for others.

/s.

To prove that this was the problem, I cleaned out my environment by moving my .bashrc file to another name, logged out, logged in, start

On Feb 21, 2004, at 1:51 AM, Tom Lane wrote:
> Scott Goodwin <scott@scottg.net> writes:
>> Hoping someone can help me figure out why I can't get PL/Tcl to load
>> without crashing the backend on Mac OS 10.3.2.
>
> FWIW, pltcl seems to work for me. Using up-to-date Darwin 10.3.2
> and PG CVS tip, I did
>     configure --with-tcl --without-tk
> then make, make install, etc. pltcl installs and passes its regression
> test.
>
>> psql:/Users/scott/pgtest/add_languages.sql:12: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>
> Can you provide a stack trace for this?
>
> regards, tom lane
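
The reproduction described in the message above (one exported variable of 523 or more characters, followed by a server restart) can be sketched in a few lines of shell. This is only an illustration: the helper name make_filler is made up for this sketch, the filler is a plain run of the letter "a" rather than the colon-separated value from the message, and 523 is assumed to be the length of the value, which the thread does not spell out.

```shell
#!/bin/sh
# Print a string of N copies of the letter "a" (POSIX awk, no trailing newline).
# make_filler is a hypothetical helper, not something from the thread.
make_filler() {
    awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) printf "a" }'
}

# 523 characters was the smallest failing length reported above;
# the threshold may well differ on other machines.
LONG_VAR=$(make_filler 523)
export LONG_VAR

echo "LONG_VAR is ${#LONG_VAR} characters long"

# The server inherits its environment at startup, so stop and restart it
# from this same shell before trying to load PL/Tcl again.
```

Changing 523 to 522 and restarting the server again should, if the report holds on your system, make the crash go away.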
Scott Goodwin <scott@scottg.net> writes:
> Found the problem. If I have a very long environment variable exported
> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
> color ls and I have a very long LS_COLORS environment variable set.

Interesting. Did you check whether the limiting factor is the longest variable length, or the total size of the environment? ("env|wc" would probably do as an approximation for the latter.)

regards, tom lane
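
To tell the two candidates apart, it helps to measure both numbers in the shell that starts the server. The following is a rough sketch along the lines of the "env|wc" suggestion; the function names are invented here, and the longest-entry figure is only approximate because a value containing newlines is split across several lines of env output.

```shell
#!/bin/sh
# Total size of the environment in bytes, as reported by "env | wc -c".
env_total_bytes() {
    env | wc -c | tr -d ' '
}

# Length of the longest NAME=value line printed by env.
# Approximate: values that contain newlines are counted line by line.
env_longest_entry() {
    env | awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

echo "total bytes:   $(env_total_bytes)"
echo "longest entry: $(env_longest_entry)"
```

Running this once in the crashing environment and once in the clean one gives two pairs of numbers to compare against the failure threshold.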
I'm certain that the length of a single env var is the only factor involved, and not the size of the environment itself. If I log in to my normal environment and unset LS_COLORS, everything works fine. If I move my .bashrc out of the way, log in fresh and create an env var > 522 chars, it fails. My login environment is much larger than the environment I get without .bashrc, and setting a single env var to > 522 chars duplicates the problem in both envs, leading me to believe that env size doesn't have an effect on this problem.

I've now set my PG startup script to 'unset LS_COLORS' before starting PG, and this works great.

Has anyone else tried to duplicate this problem? I'm using Mac OS 10.3.2, PG 7.4.1, Tcl 8.4.5.

/s.

On Feb 22, 2004, at 12:21 PM, Tom Lane wrote:
> Scott Goodwin <scott@scottg.net> writes:
>> Found the problem. If I have a very long environment variable exported
>> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
>> color ls and I have a very long LS_COLORS environment variable set.
>
> Interesting. Did you check whether the limiting factor is the longest
> variable length, or the total size of the environment? ("env|wc" would
> probably do as an approximation for the latter.)
>
> regards, tom lane
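
The 'unset LS_COLORS' workaround mentioned above might look something like the sketch below. The wrapper name run_without_ls_colors is hypothetical, and the pg_ctl line in the comment is only an example of what a startup script could call; the actual script is not shown in the thread.

```shell
#!/bin/sh
# Run a command in a subshell with LS_COLORS removed from its environment.
# The calling shell keeps its own LS_COLORS, so colored ls still works there.
run_without_ls_colors() {
    (
        unset LS_COLORS
        "$@"
    )
}

# Hypothetical use in a startup script; adjust the command to your own setup:
#   run_without_ls_colors pg_ctl -D "$PGDATA" start
```

This only removes the one variable known to trigger the crash here; any other exported variable above the threshold would presumably need the same treatment.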
Scott Goodwin <scott@scottg.net> writes:
> Found the problem. If I have a very long environment variable exported
> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
> color ls and I have a very long LS_COLORS environment variable set.

I was able to duplicate this. I am not entirely sure why the problem is dependent on the environment size, but I now know what causes it. It seems Darwin's libc keeps its own copy of the argv pointer, and when we move argv and then scribble on the original, it causes problems for subsequent code that tries to look at argv[0] to determine the executable's location. (It's a good thing Darwin is open source, 'cause I'm not sure we'd have ever seen the connection if we hadn't been able to look at the source code for their libc.)

The fix is basically

+ #if defined(__darwin__)
+ #include <crt_externs.h>
+ #endif

+ #if defined(__darwin__)
+ 	*_NSGetArgv() = new_argv;
+ #endif

which you can stick into main.c if you need a workaround. I applied a more extensive patch to HEAD that refactors this code into ps_status.c, but I'm disinclined to apply that patch to stable branches...

regards, tom lane
I'll grab the CVS PG copy and try it out. Is this something the Darwin folks should be notified about? It might cause problems with other apps.

thanks,
/s.

On Feb 22, 2004, at 4:47 PM, Tom Lane wrote:
> Scott Goodwin <scott@scottg.net> writes:
>> Found the problem. If I have a very long environment variable exported
>> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
>> color ls and I have a very long LS_COLORS environment variable set.
>
> I was able to duplicate this. I am not entirely sure why the problem is
> dependent on the environment size, but I now know what causes it.
> It seems Darwin's libc keeps its own copy of the argv pointer, and when
> we move argv and then scribble on the original, it causes problems for
> subsequent code that tries to look at argv[0] to determine the
> executable's location. (It's a good thing Darwin is open source, 'cause
> I'm not sure we'd have ever seen the connection if we hadn't been able
> to look at the source code for their libc.)
>
> The fix is basically
>
> + #if defined(__darwin__)
> + #include <crt_externs.h>
> + #endif
>
> + #if defined(__darwin__)
> + 	*_NSGetArgv() = new_argv;
> + #endif
>
> which you can stick into main.c if you need a workaround. I applied a
> more extensive patch to HEAD that refactors this code into ps_status.c,
> but I'm disinclined to apply that patch to stable branches...
>
> regards, tom lane
Scott Goodwin <scott@scottg.net> writes:
> I'll grab the CVS PG copy and try it out. Is this something the Darwin
> folks should be notified about? It might cause problems with other
> apps.

It's unlikely that they'll consider it their problem.

regards, tom lane