Thread: PostgreSQL 7.4RC1 crashes on Panther

PostgreSQL 7.4RC1 crashes on Panther

From: Scott Goodwin
I've encountered a problem where the PostgreSQL database crashes when
attempting to load pltcl.so on Mac OS 10.3. PostgreSQL fails because
memory cannot be allocated during a shmget call. Here is the exact
error message:

FATAL:  could not create shared memory segment: Cannot allocate memory
DETAIL:  Failed system call was shmget(key=5432001, size=3809280,
03600).
HINT:  This error usually means that PostgreSQL's request for a shared
memory segment exceeded available memory or swap space. To reduce the
request size (currently 3809280 bytes), reduce PostgreSQL's
shared_buffers parameter (currently 300) and/or its max_connections
parameter (currently 50).
         The PostgreSQL documentation contains more information about
shared memory configuration.
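
For a rough sense of where the 3809280 bytes go, the arithmetic below splits the request into the buffer pages and everything else. It is only a sketch: it assumes the default 8192-byte block size, which this message does not state, and the names BLOCK_SIZE and buffer_pool_bytes are made up for the illustration.

```shell
#!/bin/sh
# Split the shmget request from the error above into buffer pool vs. the rest.
REQUEST=3809280      # bytes, from the DETAIL line
SHARED_BUFFERS=300   # from the HINT line
BLOCK_SIZE=8192      # ASSUMPTION: default block size, not stated in the thread

# Bytes taken by the buffer pages themselves: buffers * block size.
buffer_pool_bytes() { echo $(( $1 * $2 )); }

pool=$(buffer_pool_bytes "$SHARED_BUFFERS" "$BLOCK_SIZE")
echo "buffer pool:     $pool bytes"
echo "everything else: $(( REQUEST - pool )) bytes"
```

Under that assumption the buffer pages account for 2457600 of the 3809280 bytes, so lowering shared_buffers is the quickest way to shrink the request.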


Here's the code that triggers it:

create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER
    as 'pltcl.so' language 'c';


I have 1GB of memory and very little running on the PowerBook (I
rebooted just to be sure I started with a clean system).

I'm not sure whether this is a PostgreSQL problem or a Mac OS 10.3
problem, but I can load plpgsql.so right before loading pltcl.so and
only the pltcl.so load fails. If I comment out the plpgsql.so load and
try again, it still fails on the pltcl.so load. I'm compiling against
a locally compiled version of Tcl 8.4.4. Here are the configure
settings:

./configure \
     --prefix=$INSTALL/postgresql \
     --with-tcl \
     --with-tclconfig=$INSTALL/tcl/lib \
     --with-includes=$INSTALL/tcl/include:$INSTALL/readline/include \
     --with-libraries=$INSTALL/readline/lib \
     --without-tk \
     --without-openssl


thanks,

/s.

Re: PostgreSQL 7.4RC1 crashes on Panther

From: Tom Lane
Scott Goodwin <scott@scottg.net> writes:
> FATAL:  could not create shared memory segment: Cannot allocate memory

> Here's the code that triggers it:
> create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER
>     as 'pltcl.so' language 'c';

I don't think so.  That's a startup failure; it can not be triggered by
executing a SQL command, because if the postmaster is alive enough to
accept a SQL command in the first place, it's already gotten past
creation of the shared memory segment.

> Not sure whether this is a PostgreSQL problem or a Mac OS 10.3 problem,

It's a user problem.  If you're going to run multiple
shared-memory-using applications, it's up to you to adjust the kernel
limit or the per-application requests to fit.  I can't tell from this
what other app is using shared memory, though.  Are you trying to start
more than one postmaster?  If not, see whether OS X provides "ipcs" ---
that would give you some data about what shared-memory requests are
already present in the system.

            regards, tom lane
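
Where ipcs is available, listing the existing shared memory segments is a one-liner. The wrapper below is only a sketch: list_shm_segments is a made-up helper name, and it prints a message instead of failing when the tool is missing.

```shell
#!/bin/sh
# List System V shared memory segments, if the ipcs tool is installed.
list_shm_segments() {
    echo "== System V shared memory segments =="
    if command -v ipcs >/dev/null 2>&1; then
        # -m restricts the listing to shared memory segments.
        ipcs -m || echo "ipcs failed"
    else
        echo "ipcs not found on this system"
    fi
}

list_shm_segments
```

A leftover segment could then be removed with "ipcrm -m <shmid>", using the ID shown in the listing.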

Re: PostgreSQL 7.4RC1 crashes on Panther

From: Tom Lane
Scott Goodwin <scott@scottg.net> writes:
> psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: server
> closed the connection unexpectedly
>          This probably means the server terminated abnormally
>          before or while processing the request.

> ...output in the log file is:

> LOG:  server process (PID 2739) was terminated by signal 10

Here's the real problem --- why are you getting a SIGBUS while trying to
load the pltcl handler function?  I suspect something broken in Tcl's
shared library, but dunno what.  You should be getting a core file from
the crashed process --- can you get a stack trace from it with gdb?

> FATAL:  could not create shared memory segment: Cannot allocate memory
> DETAIL:  Failed system call was shmget(key=5432001, size=3809280,
> 03600).

This is evidently happening during attempted restart after the backend
crash.  I suspect it is a matter of the OS not having released the old
memory segment yet, together with the SHMMAX limit being too tight to
allow two such segments to exist concurrently.  Are you able to start
the server by hand immediately afterwards, or a few seconds afterwards?
Or do you have to reboot before it will restart?

            regards, tom lane
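
As a rough illustration of the two-segment arithmetic described above: one segment of the requested size can fit under a limit that two cannot. The 4194304-byte limit below is an assumed value chosen for the example, since the thread does not say what limit was in effect, and segments_fit is a hypothetical helper.

```shell
#!/bin/sh
# Would COUNT segments of SIZE bytes fit under LIMIT bytes in total?
segments_fit() {
    size=$1; count=$2; limit=$3
    [ $(( size * count )) -le "$limit" ]
}

SEG=3809280      # request size from the shmget error in the thread
LIMIT=4194304    # ASSUMPTION: a 4 MB limit, not stated in the thread

if segments_fit "$SEG" 1 "$LIMIT"; then
    echo "one segment: fits"
else
    echo "one segment: too large"
fi

if segments_fit "$SEG" 2 "$LIMIT"; then
    echo "two segments: fit"
else
    echo "two segments: too large"
fi
```

With these numbers the first check passes and the second fails, which matches a server that starts cleanly but cannot recreate its segment while the old one still exists.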

Re: PostgreSQL 7.4RC1 crashes on Panther

From: Tom Lane
Scott Goodwin <scott@scottg.net> writes:
> After recompiling with GCC 3.1 it fails when I'm running initdb to
> create the cluster -- it's a shmget error again. I believe that takes
> both Tcl and PostgreSQL out of the suspect pool and leaves Mac OS 10.3
> as the primary culprit.

Does the failure go away if you reboot?  I'm wondering whether the
conflicting shared memory segment is simply left over from your last
failure.  (Try "ipcs" if you have it; I don't think 10.2 did, but maybe
Apple saw the light for 10.3.)

We know that PG worked okay on 10.3 betas about a month back,
so I'm doubtful that there's any serious problem in 10.3 final.
Unfortunately I don't have a copy of 10.3 final yet to confirm ...

            regards, tom lane

Re: PostgreSQL 7.4RC1 crashes on Panther

From: Tom Lane
Scott Goodwin <scott@scottg.net> writes:
> Just compiled PG 7.3.4 with GCC 3.1 on Panther and it exhibits the same
> problem, but generates a SIGSEGV instead of a SIGBUS.

I tried this on the 10.3 beta that I have (from about a month back) and
indeed I get a core dump while trying to create the pltcl call handler.

Steps to reproduce:
    configure --with-tcl --without-tk    (no tk support?)
    make, make install, make installcheck are all ok
    createlang pltcl regression
    ... kaboom ...

The stack trace looks like this:

(gdb) bt
#0  0x900f5fc0 in memcmp ()
#1  0x901c6734 in ?? ()
#2  0x8fe09c18 in __dyld_call_image_init_routines ()
#3  0x8fe11880 in __dyld_link_in_need_modules ()
#4  0x8fe134e4 in __dyld__dyld_link_module ()
#5  0x9003f5c8 in NSLinkModule ()
#6  0x00100f50 in pg_dlopen (filename=0x200d2c4 "/Users/tgl/testversion/lib/postgresql/pltcl.so") at dynloader.c:26
#7  0x001a1110 in load_external_function (filename=0xc01650 "", funcname=0x202b5a4 "pltcl_call_handler", signalNotFound=1 '\001', filehandle=0xbfffe344) at dfmgr.c:127
#8  0x00055c6c in fmgr_c_validator (fcinfo=0x0) at pg_proc.c:639
#9  0x001a3838 in OidFunctionCall1 (functionId=0, arg1=180982) at fmgr.c:1210
#10 0x000552bc in ProcedureCreate (procedureName=0xbfffe5f0 "", procNamespace=2200, replace=0 '\0', returnsSet=0 '\0', returnType=2280, languageObjectId=13, languageValidator=2247, prosrc=0x20234c0 "pltcl_call_handler", probin=0x20235b4 "$libdir/pltcl", isAgg=0 '\0', security_definer=0 '\0', isStrict=0 '\0', volatility=118 'v', parameterCount=0, parameterTypes=0xbfffe710) at pg_proc.c:331
#11 0x000880c0 in CreateFunction (stmt=0x234b3c) at functioncmds.c:515
...

But plpgsql works fine.  Also, the same code works fine in OS X 10.2.*.
Seems like either the Tcl shared library is broken in 10.3, or Apple
broke something in the dynamic linker, or our dynamic-library-loading
code is doing something that was OK with 10.2 but isn't OK with 10.3.
I guess we need to call in some OS X experts ... Marko, can you take
a look?

BTW, the failure to restart after the crash is explained here:
http://archives.postgresql.org/pgsql-hackers/2003-11/msg00321.php
I'll have that fixed for 7.4, but I dunno what to do about pltcl's
problem.

            regards, tom lane

looking for a kind soul for psqlODBC help (OSX)

From: Theodore Petrosky
I am sorry to post this here, but I did post in the ODBC section and
I don't think there are many OS X users there.

I am having a problem with Mac OS X and psqlODBC. If anyone
experienced with OS X and ODBC is willing to help, please respond
off-list.

I really need this and can't get it to work.

Ted


Re: PostgreSQL 7.4RC1 crashes on Panther

From: Tom Lane
It turns out that the "createlang pltcl" failure on OS X 10.3 was due to
our ps_status code doing the wrong thing.  I have committed a fix.

            regards, tom lane

Re: PostgreSQL 7.4RC1 crashes on Panther

From: Scott Goodwin
Hi Tom,

On Nov 4, 2003, at 4:48 PM, Tom Lane wrote:

>> Here's the code that triggers it:
>> create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER
>>     as 'pltcl.so' language 'c';
>
> I don't think so.  That's a startup failure; it can not be triggered by
> executing a SQL command, because if the postmaster is alive enough to
> accept a SQL command in the first place, it's already gotten past
> creation of the shared memory segment.

I have to differ here. The problem is triggered by the create function
statement above; it happens after startup, and it happens on Mac OS
10.3. Here are the commands I'm using, in the order I'm using them.
I'll be glad to admit I'm the one screwing it up, but I don't see
where.

# Define vars
ROOT=/Users/scott/m
INSTALL=$ROOT/install
PG=$INSTALL/postgresql
PGLIB=$PG/lib
PGDATA=$ROOT/var/db
PORT=5432
DB=m
OPS=$ROOT/ops   # assumed from the script path shown in the psql output

DYLD_LIBRARY_PATH=$INSTALL/tcl/lib:$INSTALL/postgresql/lib:$INSTALL/openssl/lib
export DYLD_LIBRARY_PATH


# Initialize the database cluster
$PG/bin/initdb -D $PGDATA --locale=C -L $PG/share

...output of the above command is:

The files belonging to this database system will be owned by user
"scott".
This user must also own the server process.

The database cluster will be initialized with locale C.

creating directory /Users/scott/m/var/db... ok
creating directory /Users/scott/m/var/db/base... ok
creating directory /Users/scott/m/var/db/global... ok
creating directory /Users/scott/m/var/db/pg_xlog... ok
creating directory /Users/scott/m/var/db/pg_clog... ok
selecting default max_connections... 30
selecting default shared_buffers... 200
creating configuration files... ok
creating template1 database in /Users/scott/m/var/db/base/1... ok
initializing pg_shadow... ok
enabling unlimited row size for system tables... ok
initializing pg_depend... ok
creating system views... ok
loading pg_description... ok
creating conversions... ok
setting privileges on built-in objects... ok
creating information schema... ok
vacuuming database template1... ok
copying template1 to template0... ok

Success. You can now start the database server using:

     /Users/scott/m/install/postgresql/bin/postmaster -D
/Users/scott/m/var/db
or
     /Users/scott/m/install/postgresql/bin/pg_ctl -D
/Users/scott/m/var/db -l logfile start



# Start the database
$PG/bin/pg_ctl start -D $PGDATA -l $ROOT/database/postgres.log -o "-i"

...at this point the database is running, as shown by ps:

scott  2712   0.0  0.1    37288    936 std  S    12:10PM   0:00.02 /Users/scott/m/install/postgresql/bin/postmaster -i -D /Users/scott/m/var/db
scott  2715   0.0  0.0    38276    168 std  S    12:10PM   0:00.00 /Users/scott/m/install/postgresql/bin/postmaster -i -D /Users/scott/m/var/db
scott  2717   0.0  0.0    37288    260 std  S    12:10PM   0:00.00 /Users/scott/m/install/postgresql/bin/postmaster -i -D /Users/scott/m/var/db

...and by the log file:

LOG:  database system was shut down at 2003-11-06 12:10:49 CST
LOG:  checkpoint record is at 0/9B13D8
LOG:  redo record is at 0/9B13D8; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 534; next OID: 17142
LOG:  database system is ready


# Create the database
$PG/bin/psql -d template1 -c "create database $DB"

...output on the command line:
CREATE DATABASE


# Add PL/pgsql and PL/tcl
$PG/bin/psql -d $DB -f $OPS/database/sql/add_languages.sql

...output on the command line is:

psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: server
closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.
psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: connection
to server was lost

...output in the log file is:

LOG:  server process (PID 2739) was terminated by signal 10
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
FATAL:  could not create shared memory segment: Cannot allocate memory
DETAIL:  Failed system call was shmget(key=5432001, size=3809280,
03600).
HINT:  This error usually means that PostgreSQL's request for a shared
memory segment exceeded available memory or swap space. To reduce the
request size (currently 3809280 bytes), reduce PostgreSQL's
shared_buffers parameter (currently 300) and/or its max_connections
parameter (currently 50).
         The PostgreSQL documentation contains more information about
shared memory configuration.

...at this point, the server is no longer running.



The add_languages.sql file contains:

create function plpgsql_call_handler() RETURNS LANGUAGE_HANDLER
    as 'plpgsql.so' language 'c';

create trusted procedural language 'plpgsql'
    HANDLER plpgsql_call_handler
    LANCOMPILER 'PL/pgSQL';

create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER
    as 'pltcl.so' language 'c';

create trusted procedural language 'pltcl'
    HANDLER pltcl_call_handler
    LANCOMPILER 'PL/Tcl';


(Line 13 of my add_languages.sql corresponds to the creation of the
pltcl call handler -- I left off comments at the top of the file when I
copied and pasted it here).

The above process worked fine with PostgreSQL 7.3.4 on Mac OS 10.2.8.


The next thing I tried was reducing the shared memory footprint:
   max_connections = 10
   shared_buffers = 40

I then wiped out the database area, and followed the exact same process
above. This time around, it didn't complain about shmget problems, but
it still caught a SIGBUS; it restarted gracefully, as shown by the log
file:

LOG:  server process (PID 2959) was terminated by signal 10
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2003-11-06 12:28:02 CST
LOG:  checkpoint record is at 0/9B13D8
LOG:  redo record is at 0/9B13D8; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 534; next OID: 17142
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 0/9B1418
LOG:  record with zero length at 0/9CDA00
LOG:  redo done at 0/9CD9DC
LOG:  database system is ready


The final thing I tried was altering the add_languages.sql file. I
commented out the parts that loaded Tcl, wiped out the database, and
followed the same procedure above, leaving max_connections and
shared_buffers as defaults (50 and 300). This worked great -- I can
load PL/pgsql fine, it's only when I attempt to load Tcl that it barfs.


>> Not sure whether this is a PostgreSQL problem or a Mac OS 10.3
>> problem,
>
> It's a user problem.  If you're going to run multiple
> shared-memory-using applications, it's up to you to adjust the kernel
> limit or the per-application requests to fit.  I can't tell from this
> what other app is using shared memory, though.  Are you trying to start
> more than one postmaster?  If not, see whether OS X provides "ipcs" ---
> that would give you some data about what shared-memory requests are
> already present in the system.

After this last test I started Mac OS X's Activity Monitor and looked
at the postgres processes -- there were three, as shown in the 'ps' output
above. Shared memory size was between 3 and 5 MB for each. This is on a
PowerBook with 1GB of memory, and with Activity Monitor showing 626MB
of that as being free. VM size is showing 3.84GB. I'm as sure as I can
be that I'm not running into a resource problem.


I added the following to the /System/Library/StartupItems/SystemTuning
file:

sysctl -w kern.sysv.shmmax=167772160 # bytes: 160 megs
sysctl -w kern.sysv.shmmin=1
sysctl -w kern.sysv.shmmni=32
sysctl -w kern.sysv.shmseg=8
sysctl -w kern.sysv.shmall=65536 # 4k pages: 256 megs

rebooted and reran the experiment -- problem still exists.
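
As a quick sanity check on the units in those comments, the sketch below redoes the arithmetic and then tries to read the live value back. The helper names bytes_to_mib and pages_to_mib are made up for the example, and the 4096-byte page size is taken from the "4k pages" comment above.

```shell
#!/bin/sh
# Convert a byte count to whole mebibytes.
bytes_to_mib() { echo $(( $1 / 1024 / 1024 )); }

# Convert a page count to mebibytes, given the page size in bytes.
pages_to_mib() { echo $(( $1 * $2 / 1024 / 1024 )); }

echo "shmmax: $(bytes_to_mib 167772160) MiB"    # prints 160
echo "shmall: $(pages_to_mib 65536 4096) MiB"   # prints 256

# Read back the live setting where the key exists; on systems without
# kern.sysv.shmmax this prints a fallback message instead of failing.
live=$(sysctl -n kern.sysv.shmmax 2>/dev/null) || live="(not available on this system)"
echo "kern.sysv.shmmax now: $live"
```

If the value read back does not match what the startup file sets, the settings were probably not applied at boot.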


One thing I'm going to try next is using an earlier version of GCC.
Panther defaults to:

    gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1495);

I've used gcc_select to go back to GCC 3.1 and I'm rebuilding all the
parts now.


I'll keep digging as I have time.


thanks,

/s.

Re: PostgreSQL 7.4RC1 crashes on Panther

From: Scott Goodwin
Just compiled PG 7.3.4 with GCC 3.1 on Panther and it exhibits the same
problem, but generates a SIGSEGV instead of a SIGBUS. Here's the log:

LOG:  server process (pid 12078) was terminated by signal 11
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing shared memory and
semaphores
LOG:  database system was interrupted at 2003-11-06 14:19:26 CST
LOG:  checkpoint record is at 0/80212C
LOG:  redo record is at 0/80212C; undo record is at 0/0; shutdown TRUE
LOG:  next transaction id: 480; next oid: 16976
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 0/80216C
LOG:  ReadRecord: record with zero length at 0/81E754
LOG:  redo done at 0/81E730
LOG:  database system is ready

A reboot does not help -- it still fails. I recompiled with GCC 3.1 and
it's failing at pltcl load again. I rebooted, then tried to add the
languages again. plpgsql was already loaded from the last time, but
shared memory failed again when it tried to load pltcl.

ipcs isn't installed on Panther. Strangely though, I've found ipcs in
the Darwin source tree (previous version) under /usr/bin, and in the
same place in the FreeBSD source tree.

/s.




On Nov 6, 2003, at 2:41 PM, Tom Lane wrote:

> Scott Goodwin <scott@scottg.net> writes:
>> psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: server
>> closed the connection unexpectedly
>>          This probably means the server terminated abnormally
>>          before or while processing the request.
>
>> ...output in the log file is:
>
>> LOG:  server process (PID 2739) was terminated by signal 10
>
> Here's the real problem --- why are you getting a SIGBUS while trying
> to
> load the pltcl handler function?  I suspect something broken in Tcl's
> shared library, but dunno what.  You should be getting a core file from
> the crashed process --- can you get a stack trace from it with gdb?
>
>> FATAL:  could not create shared memory segment: Cannot allocate memory
>> DETAIL:  Failed system call was shmget(key=5432001, size=3809280,
>> 03600).
>
> This is evidently happening during attempted restart after the backend
> crash.  I suspect it is a matter of the OS not having released the old
> memory segment yet, together with the SHMMAX limit being too tight to
> allow two such segments to exist concurrently.  Are you able to start
> the server by hand immediately afterwards, or a few seconds afterwards?
> Or do you have to reboot before it will restart?
>
>             regards, tom lane

Re: PostgreSQL 7.4RC1 crashes on Panther

From: Scott Goodwin
After recompiling with GCC 3.1 it fails when I'm running initdb to
create the cluster -- it's a shmget error again. I believe that takes
both Tcl and PostgreSQL out of the suspect pool and leaves Mac OS 10.3
as the primary culprit. I installed Panther last week from scratch
(reformatted disk etc.) and haven't made any mods to it aside from the
SystemTuning params today. I haven't had any other apps crash, and I
use the system all day with Apple's apps, AOLserver, OpenSSL and
others. I tried gdb to get a backtrace but the signal gets caught by
postgres, so it doesn't dump me back to the gdb command line. I'll have
to set breakpoints, have GDB do something with the signal, or mod PG to
not catch it. That'll have to wait until tomorrow or Saturday.

thanks for the assist,

/s.
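
One possible way to keep gdb in control when the signal arrives is to attach to the backend process that runs the failing statement and tell gdb to stop on the signal. The sketch below only writes a gdb command file. The file name pltcl-crash.gdb and the helper write_gdb_commands are hypothetical, and whether the gdb shipped with Panther behaves this way is an assumption.

```shell
#!/bin/sh
# Write a small gdb command file that stops on the crash signals,
# resumes the attached process, and prints a backtrace once it stops.
write_gdb_commands() {
    cat > "$1" <<'EOF'
handle SIGBUS stop print
handle SIGSEGV stop print
continue
bt
EOF
}

CMDFILE="${TMPDIR:-/tmp}/pltcl-crash.gdb"   # hypothetical file name
write_gdb_commands "$CMDFILE"
echo "wrote $CMDFILE"

# Then attach to the backend serving the psql session (PID taken from ps):
#   gdb -x "$CMDFILE" /path/to/postgres <backend-pid>
# and run the failing CREATE FUNCTION statement in that session.
```

Attaching to the backend rather than the postmaster matters here because the log shows the signal terminating a server process, not the parent.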



Re: PostgreSQL 7.4RC1 crashes on Panther

From: Scott Goodwin
Awesome! Thanks so much for the fix -- I depend on PostgreSQL and Tcl
on my PowerBook to do development work.

cheers,

/s.

On Nov 8, 2003, at 2:09 PM, Tom Lane wrote:

> It turns out that the "createlang pltcl" failure on OS X 10.3 was due
> to
> our ps_status code doing the wrong thing.  I have committed a fix.
>
>             regards, tom lane