Thread: postmaster crashing

postmaster crashing

From
psql-mail@freeuk.com
Date:
I have been trying to find out more about the postmaster crashing, but
things seem to be getting stranger! I am experiencing problems running
postmaster in gdb too (see end of message).

I will put all the information in this posting for completeness;
apologies for the duplicated sections.

I am running postgresql 7.3.4 on ia64 Red Hat Advanced Server 3 beta.
Now compiled from 7.3.4 source downloaded from postgresql.org.

Tsearch2 compiled from tsearch-v2-stable.tar.gz

I am very stuck, so thank you for any ideas or guesses about what's
happening or how to further research the problem.

Potentially useful output below (in the order i did it):

cd postgresql7.3.4_src_dir
./configure --enable-debug
<- rest of install process from top of readme ->

<- tsearch2 compile/install ->

initdb on /data
createdb test

<- create a table in test and populate it with test data ->
<- query test data successfully ->

test=# SELECT 'Our first string used today'::tsvector;
tsvector
---------------------------------------
'Our' 'used' 'first' 'today' 'string'
(1 row)

test=# SELECT to_tsvector( 'default', 'this is many words' );
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!#


LOG: server process (pid 7698) was terminated by signal 11
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing shared memory and
semaphores
LOG: database system was interrupted at 2003-09-02 13:26:42 UTC
LOG: checkpoint record is at 0/967458
LOG: redo record is at 0/967458; undo record is at 0/0; shutdown TRUE
LOG: next transaction id: 581; next oid: 25098
LOG: database system was not properly shut down; automatic recovery in
progress
FATAL: The database system is starting up
LOG: ReadRecord: record with zero length at 0/9674A0
LOG: redo is not required
LOG: database system is ready

<-shutdown backend ->

After more poking i discovered that the to_tsvector function call does
not cause a seg fault in the backend if you pass it only numbers,
characters and whitespace, but instead works as desired.

ddd postmaster
<- run postmaster with -D /data ->

psql test
<- seg fault, similar LOG message to above but now with signal 5 ->

psql db_that_not_exist
<- seg fault as previous ->


How do i get the core files to examine? There never seem to be any
produced, even outside the debuggers. I can't even connect to the db
when it's running in the debugger.

Thanks for reading this far,
Grateful for any help or suggestions,
Matt

--

Re: postmaster crashing

From
Tom Lane
Date:
psql-mail@freeuk.com writes:
> How do i get the core files to examine? There never seem to be any
> produced, even outside the debuggers.

Most likely you have launched the postmaster under "ulimit -c 0", which
prevents core dumps.  This seems to be the default state in recent Linux
releases, for reasons I cannot fathom :-(.  I put "ulimit -c unlimited"
into the postmaster launch script whenever I am working on Linux.

            regards, tom lane
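
A minimal sketch of what such a launch script might look like, assuming
a POSIX shell. The function names and the pgbin/pgdata arguments are
illustrative and not taken from the thread. Because "ulimit -c unlimited"
fails when the hard limit is lower, the sketch raises the soft limit only
as far as the hard limit allows.

```shell
#!/bin/sh

# Raise the soft core-file size limit to the current hard limit.
# The limit applies to this shell and to every process it starts.
enable_core_dumps() {
    ulimit -S -c "$(ulimit -H -c)"
}

# Hypothetical wrapper: $1 is the directory holding the postmaster
# binary, $2 is the data directory. Both are placeholders.
start_postmaster() {
    pgbin=$1
    pgdata=$2
    enable_core_dumps || return 1
    "$pgbin/postmaster" -D "$pgdata"
}
```

The limit has to be set in the same shell (or script) that launches the
postmaster; setting it in a different login session has no effect on an
already running server.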

Re: postmaster crashing

From
psql-mail@freeuk.com
Date:
> From: Tom Lane <tgl@sss.pgh.pa.us>
>
> psql-mail@freeuk.com writes:
> > How do i get the core files to examine? There never seem to be any
> > produced, even outside the debuggers.
>
> Most likely you have launched the postmaster under "ulimit -c 0", which
> prevents core dumps.  This seems to be the default state in recent Linux
> releases, for reasons I cannot fathom :-(.  I put "ulimit -c unlimited"
> into the postmaster launch script whenever I am working on Linux.
>
>                         regards, tom lane
>

I have set "ulimit -c unlimited" as you suggested,
i then copied postmaster to /home/postgres
and ran it as postgres from there...
but still no core files. Where should they appear?

I tried running from the command line and from within gdb and ddd.

Still the same segfaulting problem with to_tsvector(). Is it a problem
with tsearch2?

Thanks,
Mat

--

Re: postmaster crashing

From
Tom Lane
Date:
psql-mail@freeuk.com writes:
> I have set "ulimit -c unlimited" as you suggested,
> i then copied postmaster to /home/postgres
> and ran it as postgres from there...
> but still no core files. Where should they appear?

In $PGDATA/base/yourdbnumber/core (under some OSes the file name might
be "core" plus a number).

            regards, tom lane
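
One way to look for such files is sketched below, assuming a POSIX
shell and find(1). The function name is made up for illustration; it
searches the per-database directories under base/ for files named
"core" or "core.<something>".

```shell
#!/bin/sh

# List core files under <data directory>/base.
# $1: the data directory (the value of $PGDATA).
find_core_files() {
    find "$1/base" -type f \( -name core -o -name 'core.*' \) 2>/dev/null
}
```

For example, find_core_files "$PGDATA" prints one path per core file
found, or nothing if there are none.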

Re: postmaster crashing

From
psql-mail@freeuk.com
Date:
> psql-mail@freeuk.com writes:
> > I have set "ulimit -c unlimited" as you suggested,
> > i then copied postmaster to /home/postgres
> > and ran it as postgres from there...
> > but still no core files. Where should they appear?
>
> In $PGDATA/base/yourdbnumber/core (under some OSes the file name might
> be "core" plus a number).
>
>             regards, tom lane

I have one core file (yes, just one, despite many duplications of the
problem) and i can't get another one to appear. This one came from
$PGDATA/db_num as you said, and had .number on the end.
I think the core file is from an instance of postmaster run from within
gdb, as it has signal 5 rather than 11.

Below is the gdb output.

Thank you for your help!

Core was generated by `postgres: mat test [local] SELECT               '

Re: postmaster crashing

From
psql-mail@freeuk.com
Date:
First - apologies for the stuff about "i don't understand why there's
only one core file"; i now have a post-it note saying "ulimit gets
reset at reboot" (i assume that's what happened).

So please find below a potentially more useful core file gdb output:

Core was generated by `postgres: mat test [local] SELECT  '.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4...done.
Loaded symbols for /usr/lib/libreadline.so.4
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6.1...done.
Loaded symbols for /lib/tls/libm.so.6.1
Reading symbols from /lib/tls/libc.so.6.1...done.
Loaded symbols for /lib/tls/libc.so.6.1
Reading symbols from /lib/ld-linux-ia64.so.2...done.
Loaded symbols for /lib/ld-linux-ia64.so.2
Reading symbols from /usr/local/pgsql/lib/tsearch2.so...done.
Loaded symbols for /usr/local/pgsql/lib/tsearch2.so
#0  SN_create_env (S_size=0, I_size=2, B_size=1) at api.c:6
6           z->p = create_s();
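
The output above shows only frame #0. A full backtrace can be pulled
from a core file non-interactively; the sketch below assumes a gdb that
supports the -batch and -ex options. The function name is illustrative,
and both paths are supplied by the caller.

```shell
#!/bin/sh

# Print a backtrace from a core file.
# $1: the executable that produced the core, $2: the core file.
backtrace_from_core() {
    exe=$1
    core=$2
    if [ ! -r "$exe" ] || [ ! -r "$core" ]; then
        echo "usage: backtrace_from_core <executable> <core-file>" >&2
        return 2
    fi
    # -batch exits once the commands have run; "bt" prints the call stack.
    gdb -batch -ex "bt" "$exe" "$core"
}
```

The executable argument should be the backend binary that actually
crashed. With the install prefix seen in the gdb output that is
presumably something under /usr/local/pgsql/bin, but that path is an
assumption.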

--

Re: postmaster crashing

From
Tom Lane
Date:
psql-mail@freeuk.com writes:
> #0  SN_create_env (S_size=0, I_size=2, B_size=1) at api.c:6
> 6           z->p = create_s();

Hm.  Is it possible you're running out of memory?  If the crash is right
there, and not inside create_s(), it seems like a null return from calloc
is the only explanation.  This code doesn't seem to be checking for
calloc() failure (I wonder why it's not using palloc/pfree anyway).

            regards, tom lane

Re: postmaster crashing

From
psql-mail@freeuk.com
Date:
The server has 4GB of memory and is not running any other services - so
i wouldn't expect there to be memory shortage problems.

The gdb output yesterday was from a postmaster running from the default
postgresql.conf

I have since changed the kernel settings to allow 1GB of shared mem and
changed shared_buffers in postgresql.conf to request just under that.

The core dump for the higher memory allocation gives identical output
to the one i posted yesterday.

Thank you for your assistance,
Mat

--

Re: postmaster crashing

From
Tom Lane
Date:
psql-mail@freeuk.com writes:
> The server has 4GB of memory and is not running any other services - so
> i wouldn't expect there to be memory shortage problems.

That has nothing whatever to do with how much memory the kernel will let
any one process have.  Check what ulimit settings the postmaster is
running under (particularly -d, -m, -v).

> I have since changed the kernel settings to allow 1GB of shared mem and
> changed shared_buffers in postgresql.conf to request just under that.

If anything, that's counterproductive for this problem.  I'm not sure
whether shared memory is counted against your ulimit -d, but it might
be.

            regards, tom lane
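
A small sketch for printing the three limits mentioned above, assuming
a POSIX shell whose ulimit builtin accepts -d, -m and -v (bash and dash
on Linux do). The function name is illustrative. It reports the limits
of the shell it runs in, so it is only meaningful when called from the
same script or session that starts the postmaster.

```shell
#!/bin/sh

# Print the soft data-segment (-d), resident-set (-m) and
# virtual-memory (-v) limits of the current shell, one per line.
show_memory_limits() {
    for flag in d m v; do
        printf '%s %s\n' "-$flag" "$(ulimit -S "-$flag")"
    done
}
```

Calling show_memory_limits just before the postmaster is started in the
launch script records the values the server will inherit.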

Re: postmaster crashing

From
psql-mail@freeuk.com
Date:
Tom Lane writes:
> That has nothing whatever to do with how much memory the kernel will let
> any one process have.  Check what ulimit settings the postmaster is
> running under (particularly -d, -m, -v).

The ulimit settings you asked about look ok (others included for info):

ulimit -d, -m, -v :  unlimited

-t, -f, -c all unlimited

-l 16
-n 1024
-s 10240
-u 8094

--