Thread: Mac OS X, PostgreSQL, PL/Tcl

Mac OS X, PostgreSQL, PL/Tcl

From
Scott Goodwin
Date:
Hoping someone can help me figure out why I can't get PL/Tcl to load  
without crashing the backend on Mac OS 10.3.2.

I compile Tcl, PostgreSQL, create the database and then run the  
following:

create function plpgsql_call_handler() RETURNS LANGUAGE_HANDLER
as 'plpgsql.so' language 'c';

create trusted procedural language 'plpgsql'  HANDLER plpgsql_call_handler  LANCOMPILER 'PL/pgSQL';

create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER
as 'pltcl.so' language 'c';

create trusted procedural language 'pltcl'   HANDLER pltcl_call_handler   LANCOMPILER 'PL/Tcl';

The PL/pgSQL part loads fine. The PL/Tcl part crashes the server, and  
psql reports this:

psql:/Users/scott/pgtest/add_languages.sql:12: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
psql:/Users/scott/pgtest/add_languages.sql:12: connection to server was lost

I have tried the exact same procedure on Linux without any problems  
using the exact same scripts, setup etc. I've tried both PG 7.4.1 and a  
CVS copy from 11 Feb. I've used gcc 3.3, 3.1 and 2.85. I've tried  
loading PL/Tcl without loading PL/pgSQL at all, same problem. I tried  
Tcl 8.4.3, 8.4.4 and 8.4.5. pgtclsh runs fine.

I used ktrace to attach to the PG process and it's generating a  
SIGSEGV. I get several "file name too long" errors before the SEGV.  
Problem is probably not with PG, but could be with Tcl and/or Mac OS X  
loadable libs. Here's the significant portion of it (you can find the  
whole output trace at http://scottg.net/pgktrace.txt):

... stuff prior ...
 27296 postgres 0.000021 NAMI  "/usr/lib/libicucore.A.dylib"
 27296 postgres 0.000019 RET   open 114/0x72
 27296 postgres 0.000009 CALL  fstat(0x72,0xbfffdf50)
 27296 postgres 0.000009 RET   fstat 0
 27296 postgres 0.000047 CALL  load_shared_file(0x9019060c,0x605000,0x13b680,0xbfffdd60,0x4,0xbfffdcf0,0xbfffdd64)
 27296 postgres 0.000053 NAMI  "/usr/lib/libicucore.A.dylib"
 27296 postgres 0.000135 RET   load_shared_file 0
 27296 postgres 0.000034 CALL  close(0x72)
 27296 postgres 0.000015 RET   close 0
 27296 postgres 0.000113 CALL  stat(0x800200,0xbfffde20)
 27296 postgres 0.000016 NAMI  "
/libSystem.B.dylib"
 27296 postgres 0.000023 RET   stat -1 errno 2 No such file or directory
 27296 postgres 0.000021 CALL  stat(0x800200,0xbfffde20)
 27296 postgres 0.000009 NAMI  "
/libSystem.B.dylib"
 27296 postgres 0.000017 RET   stat -1 errno 2 No such file or directory
 27296 postgres 0.004552 CALL  stat(0x182ea00,0xbfffd430)
 27296 postgres 0.000044 RET   stat -1 errno 63 File name too long
 27296 postgres 0.000019 CALL  stat(0x182ea00,0xbfffd430)
 27296 postgres 0.000008 RET   stat -1 errno 63 File name too long
 27296 postgres 0.000012 CALL  stat(0x182ea00,0xbfffd430)
 27296 postgres 0.000008 RET   stat -1 errno 63 File name too long
 27296 postgres 0.000013 CALL  stat(0x182ea00,0xbfffd430)
 27296 postgres 0.000008 RET   stat -1 errno 63 File name too long
 27296 postgres 0.000013 CALL  stat(0x182ea00,0xbfffd430)
 27296 postgres 0.000008 RET   stat -1 errno 63 File name too long
 27296 postgres 0.000013 CALL  stat(0x182ea00,0xbfffd430)
 27296 postgres 0.000008 RET   stat -1 errno 63 File name too long
 27296 postgres 0.000013 CALL  stat(0x182ea00,0xbfffd430)
 27296 postgres 0.000008 RET   stat -1 errno 63 File name too long
 27296 postgres 0.000013 CALL  stat(0x182ea00,0xbfffd430)
 27296 postgres 0.000009 RET   stat -1 errno 63 File name too long
 27296 postgres 0.000013 CALL  stat(0x90104e34,0xbfffd3b0)
 27296 postgres 0.000118 NAMI  "/"
 27296 postgres 0.000019 RET   stat 0
 27296 postgres 0.000012 CALL  lstat(0x182f600,0xbfffd3b0)
 27296 postgres 0.000007 NAMI  "."
 27296 postgres 0.000016 RET   lstat 0
 27296 postgres 0.000009 CALL  stat(0x182f600,0xbfffd1a0)
 27296 postgres 0.000006 NAMI  ".."
 27296 postgres 0.000018 RET   stat 0
 27296 postgres 0.000009 CALL  open(0x182f600,0x4,0xfefefeff)

... more stuff ...

 27296 postgres 0.000007 NAMI  "../../../../../.."
 27296 postgres 0.000021 RET   stat 0
 27296 postgres 0.000008 CALL  open(0x182f600,0x4,0)
 27296 postgres 0.000008 NAMI  "../../../../../.."
 27296 postgres 0.000016 RET   open 114/0x72
 27296 postgres 0.000009 CALL  fstat(0x72,0xbfffd1a0)
 27296 postgres 0.000007 RET   fstat 0
 27296 postgres 0.000007 CALL  fcntl(0x72,0x2,0x1)
 27296 postgres 0.000007 RET   fcntl 0
 27296 postgres 0.000008 CALL  fstatfs(0x72,0xbfffd200)
 27296 postgres 0.000007 RET   fstatfs 0
 27296 postgres 0.000009 CALL  fstat(0x72,0xbfffd3b0)
 27296 postgres 0.000007 RET   fstat 0
 27296 postgres 0.000008 CALL  getdirentries(0x72,0x182fa00,0x1000,0x501b74)
 27296 postgres 0.000065 RET   getdirentries 640/0x280
 27296 postgres 0.000015 CALL  lseek(0x72,0,0,0)
 27296 postgres 0.000007 RET   lseek 0
 27296 postgres 0.000009 CALL  close(0x72)
 27296 postgres 0.000009 RET   close 0
 27296 postgres 0.000007 CALL  lstat(0x182f600,0xbfffd3b0)
 27296 postgres 0.000007 NAMI  "../../../../../../"
 27296 postgres 0.000019 RET   lstat 0
 27296 postgres 0.000024 CALL  stat(0xbfffd4f0,0xbfffd900)
 27296 postgres 0.000009 RET   stat -1 errno 63 File name too long
 27296 postgres 0.140906 PSIG  SIGSEGV SIG_DFL
 26999 postgres 0.004582 CSW   resume kernel
 26999 postgres 0.000025 RET   select -1 errno 4 Interrupted system call
 26999 postgres 0.000010 PSIG  SIGCHLD caught handler=0xe59ac mask=0x0 code=0x0
 26999 postgres 0.000302 CALL  sigprocmask(0x3,0x23fc74,0)
 26999 postgres 0.000036 RET   sigprocmask 0
 26999 postgres 0.000037 CALL  wait4(0xffffffff,0xbfffe670,0x1,0)
 26999 postgres 0.000086 RET   wait4 27296/0x6aa0
 26999 postgres 0.000258 CALL  write(0x2,0xbfffdd10,0x3d)
 26999 postgres 0.000031 GIO   fd 2 wrote 61 bytes
       "LOG:  server process (PID 27296) was terminated by signal 11
       "
 26999 postgres 0.000009 RET   write 61/0x3d
 26999 postgres 0.000020 CALL  write(0x2,0xbfffdd10,0x34)
 26999 postgres 0.000013 GIO   fd 2 wrote 52 bytes
       "LOG:  terminating any other active server processes
       "
 26999 postgres 0.000008 RET   write 52/0x34
 26999 postgres 0.000032 CALL  kill(0x6a35,0x3)
 26999 postgres 0.000020 RET   kill 0
 26999 postgres 0.000011 CALL  sendto(0x6e,0xbfffe5a0,0x18,0,0,0)
 


thanks,

/s.



Re: Mac OS X, PostgreSQL, PL/Tcl

From
Tom Lane
Date:
Scott Goodwin <scott@scottg.net> writes:
> Hoping someone can help me figure out why I can't get PL/Tcl to load  
> without crashing the backend on Mac OS 10.3.2.

FWIW, pltcl seems to work for me.  Using up-to-date Darwin 10.3.2
and PG CVS tip, I did
    configure --with-tcl --without-tk
then make, make install, etc.  pltcl installs and passes its regression
test.

> psql:/Users/scott/pgtest/add_languages.sql:12: server closed the  
> connection unexpectedly
>          This probably means the server terminated abnormally
>          before or while processing the request.

Can you provide a stack trace for this?
        regards, tom lane


Re: Mac OS X, PostgreSQL, PL/Tcl

From
Scott Goodwin
Date:
Ok, so it's something specific to my setup. I created a test account, 
logged in and compiled postgresql there with a clean shell environment 
and it worked fine. So I'm shooting myself in the foot in my login 
environment. *sigh*.

thanks,

/s.


On Feb 21, 2004, at 1:51 AM, Tom Lane wrote:

> Scott Goodwin <scott@scottg.net> writes:
>> Hoping someone can help me figure out why I can't get PL/Tcl to load
>> without crashing the backend on Mac OS 10.3.2.
>
> FWIW, pltcl seems to work for me.  Using up-to-date Darwin 10.3.2
> and PG CVS tip, I did
>     configure --with-tcl --without-tk
> then make, make install, etc.  pltcl installs and passes its regression
> test.
>
>> psql:/Users/scott/pgtest/add_languages.sql:12: server closed the
>> connection unexpectedly
>>          This probably means the server terminated abnormally
>>          before or while processing the request.
>
> Can you provide a stack trace for this?
>
>             regards, tom lane
>



Re: Mac OS X, PostgreSQL, PL/Tcl

From
Scott Goodwin
Date:
Found the problem. If I have a very long environment variable exported
and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
color ls and I have a very long LS_COLORS environment variable set.

I have duplicated the problem by renaming my .bashrc and logging back
in. With this clean environment, I started PG and loaded PL/Tcl without
any problems. I then created the following environment variable on the
command line:

LONG_VAR=aaaaaaaaaaaaaaaaaa:bbbbbbbbbbbbbbbbbbb:cccccccccccccccccc: 
ddddddddddddddddddd:eeeeeeeeeeeeeeeeeee:fffffffffffffff: 
ggggggggggggggggg:hhhhhhhhhhhhhhhhhhhh:iiiiiiiiiiiiiiiiiii: 
jjjjjjjjjjjjjjjjjjjjj:kkkkkkkkkkkkkkkkkkkkkk:llllllllllllllllllll: 
mmmmmmmmmmmmmmmmmmmmmmm:nnnnnnnnnnnnnnnnnnnnnnnnn: 
ooooooooooooooooooooooo:pppppppppppppppppppppp:qqqqqqqqqqqqqqqqqqqqqqq: 
rrrrrrrrrrrrrrrrrrrrrrr:ssssssssssssssssssssssssss: 
ttttttttttttttttttttttttttt:uuuuuuuuuuuuuuuuuuuuuuuuu: 
vvvvvvvvvvvvvvvvvvvvvv:wwwwwwwwwwwwwwwwwwwwwwwwwwwwww: 
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:yyyyyyyyyyyyyyyyyyyyyyyyyyyyy: 
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

and exported it. (Obviously the line above is going to be broken into  
multiple lines by the mailer...).

Then I stopped and restarted PG, loaded PL/Tcl and PG crashed. You
*must* stop and restart PG for the problem to exhibit itself, otherwise  
it won't pick up the change in the environment. I suspect I'm running  
into a buffer overflow situation.

Ok, it fails consistently when LONG_VAR is 523 characters or greater;  
works consistently when LONG_VAR is 522 characters or smaller. Might  
not fail at the same number for others.

/s.



On Feb 21, 2004, at 1:51 AM, Tom Lane wrote:

> Scott Goodwin <scott@scottg.net> writes:
>> Hoping someone can help me figure out why I can't get PL/Tcl to load
>> without crashing the backend on Mac OS 10.3.2.
>
> FWIW, pltcl seems to work for me.  Using up-to-date Darwin 10.3.2
> and PG CVS tip, I did
>     configure --with-tcl --without-tk
> then make, make install, etc.  pltcl installs and passes its regression
> test.
>
>> psql:/Users/scott/pgtest/add_languages.sql:12: server closed the
>> connection unexpectedly
>>          This probably means the server terminated abnormally
>>          before or while processing the request.
>
> Can you provide a stack trace for this?
>
>             regards, tom lane
>



Re: Mac OS X, PostgreSQL, PL/Tcl

From
Tom Lane
Date:
Scott Goodwin <scott@scottg.net> writes:
> Found the problem. If I have a very long environment variable exported  
> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
> color ls and I have a very long LS_COLORS environment variable set.

Interesting.  Did you check whether the limiting factor is the longest
variable length, or the total size of the environment?  ("env|wc" would
probably do as an approximation for the latter.)
        regards, tom lane


Re: Mac OS X, PostgreSQL, PL/Tcl

From
Scott Goodwin
Date:
I'm certain that the length of a single env var is the only factor
involved, and not the size of the environment itself. If I log in to my
normal environment and unset LS_COLORS, everything works fine. If I
move my .bashrc out of the way, log in fresh and create an env var > 522
chars, it fails. My login environment is much larger than the
environment I get without .bashrc, and setting a single
env var to > 522 chars duplicates the problem in both envs, leading me
to believe that env size doesn't have an effect on this problem. I've
now set my PG startup script to 'unset LS_COLORS' before starting PG,
and this works great. Has anyone else tried to duplicate this problem?
I'm using Mac OS 10.3.2, PG 7.4.1, Tcl 8.4.5.

/s.


On Feb 22, 2004, at 12:21 PM, Tom Lane wrote:

> Scott Goodwin <scott@scottg.net> writes:
>> Found the problem. If I have a very long environment variable exported
>> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
>> color ls and I have a very long LS_COLORS environment variable set.
>
> Interesting.  Did you check whether the limiting factor is the longest
> variable length, or the total size of the environment?  ("env|wc" would
> probably do as an approximation for the latter.)
>
>             regards, tom lane
>



Re: Mac OS X, PostgreSQL, PL/Tcl

From
Tom Lane
Date:
Scott Goodwin <scott@scottg.net> writes:
> Found the problem. If I have a very long environment variable exported  
> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
> color ls and I have a very long LS_COLORS environment variable set.

I was able to duplicate this.  I am not entirely sure why the problem is
dependent on the environment size, but I now know what causes it.
It seems Darwin's libc keeps its own copy of the argv pointer, and when
we move argv and then scribble on the original, it causes problems for
subsequent code that tries to look at argv[0] to determine the
executable's location.  (It's a good thing Darwin is open source, 'cause
I'm not sure we'd have ever seen the connection if we hadn't been able
to look at the source code for their libc.)

The fix is basically

+ #if defined(__darwin__)
+ #include <crt_externs.h>
+ #endif

+ #if defined(__darwin__)
+         *_NSGetArgv() = new_argv;
+ #endif

which you can stick into main.c if you need a workaround.  I applied a
more extensive patch to HEAD that refactors this code into ps_status.c,
but I'm disinclined to apply that patch to stable branches...
        regards, tom lane


Re: Mac OS X, PostgreSQL, PL/Tcl

From
Scott Goodwin
Date:
I'll grab the CVS PG copy and try it out. Is this something the Darwin 
folks should be notified about? It might cause problems with other 
apps.

thanks,

/s.


On Feb 22, 2004, at 4:47 PM, Tom Lane wrote:

> Scott Goodwin <scott@scottg.net> writes:
>> Found the problem. If I have a very long environment variable exported
>> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
>> color ls and I have a very long LS_COLORS environment variable set.
>
> I was able to duplicate this.  I am not entirely sure why the problem 
> is
> dependent on the environment size, but I now know what causes it.
> It seems Darwin's libc keeps its own copy of the argv pointer, and when
> we move argv and then scribble on the original, it causes problems for
> subsequent code that tries to look at argv[0] to determine the
> executable's location.  (It's a good thing Darwin is open source, 
> 'cause
> I'm not sure we'd have ever seen the connection if we hadn't been able
> to look at the source code for their libc.)
>
> The fix is basically
>
> + #if defined(__darwin__)
> + #include <crt_externs.h>
> + #endif
>
> + #if defined(__darwin__)
> +         *_NSGetArgv() = new_argv;
> + #endif
>
> which you can stick into main.c if you need a workaround.  I applied a
> more extensive patch to HEAD that refactors this code into ps_status.c,
> but I'm disinclined to apply that patch to stable branches...
>
>             regards, tom lane
>



Re: [BUGS] Mac OS X, PostgreSQL, PL/Tcl

From
Tom Lane
Date:
Scott Goodwin <scott@scottg.net> writes:
> I'll grab the CVS PG copy and try it out. Is this something the Darwin 
> folks should be notified about? It might cause problems with other 
> apps.

It's unlikely that they'll consider it their problem.
        regards, tom lane