Thread: postmaster crashing
I have been trying to find out more about the postmaster crashing, but things seem to be getting stranger! I am also having problems running postmaster in gdb (see end of message).

I will put all the information in this posting for completeness; apologies for the duplicated sections.

I am running PostgreSQL 7.3.4 on ia64 Red Hat Advanced Server 3 beta, now compiled from the 7.3.4 source downloaded from postgresql.org. Tsearch2 was compiled from tsearch-v2-stable.tar.gz.

I am very stuck, so thank you for any ideas or guesses about what's happening or how to research the problem further.

Potentially useful output below (in the order I did it):

cd postgresql7.3.4_src_dir
./configure --enable-debug
<- rest of install process from top of readme ->
<- tsearch2 compile/install ->
initdb on /data
createdb test
<- create a table in test and populate it with test data ->
<- query test data successfully ->

test=# SELECT 'Our first string used today'::tsvector;
               tsvector
---------------------------------------
 'Our' 'used' 'first' 'today' 'string'
(1 row)

test=# SELECT to_tsvector( 'default', 'this is many words' );
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!#

LOG: server process (pid 7698) was terminated by signal 11
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing shared memory and semaphores
LOG: database system was interrupted at 2003-09-02 13:26:42 UTC
LOG: checkpoint record is at 0/967458
LOG: redo record is at 0/967458; undo record is at 0/0; shutdown TRUE
LOG: next transaction id: 581; next oid: 25098
LOG: database system was not properly shut down; automatic recovery in progress
FATAL: The database system is starting up
LOG: ReadRecord: record with zero length at 0/9674A0
LOG: redo is not required
LOG: database system is ready
<- shutdown backend ->

After more poking I discovered that the to_tsvector function call does not cause a seg fault in the backend if you pass it only numbers, characters and whitespace, but instead works as desired.

ddd postmaster
<- run postmaster with -D /data ->
psql test
<- seg fault, similar LOG message to above but now with signal 5 ->
psql db_that_not_exist
<- seg fault as previous ->

How do I get the core files to examine? There never seem to be any produced, even outside the debuggers. I can't even connect to the db when it's running in the debugger.

Thanks for reading this far,
Grateful for any help or suggestions,
Matt
--
psql-mail@freeuk.com writes:
> How do I get the core files to examine? There never seem to be any
> produced, even outside the debuggers.

Most likely you have launched the postmaster under "ulimit -c 0", which prevents core dumps. This seems to be the default state in recent Linux releases, for reasons I cannot fathom :-(. I put "ulimit -c unlimited" into the postmaster launch script whenever I am working on Linux.

regards, tom lane
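
For readers following along: "ulimit -c" is the shell's view of the per-process RLIMIT_CORE resource limit, which child processes inherit from whatever launched them. The sketch below is not from the thread; it only illustrates, using the standard getrlimit()/setrlimit() calls, how a process could check whether core dumps are disabled and raise its soft limit up to the hard limit. The function names are made up for the illustration.

```c
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the soft core-file limit is 0
 * (the "ulimit -c 0" state), 0 if core files are allowed, -1 on error. */
int core_dumps_disabled(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur == 0 ? 1 : 0;
}

/* Hypothetical helper: raise the soft core-file limit to the hard limit.
 * This matches "ulimit -c unlimited" only when the hard limit is itself
 * unlimited. Returns 0 on success, -1 on error. */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Because the limit is inherited, it is the limit in the environment that actually starts the postmaster that matters, which is why putting the setting into the launch script is the dependable place for it.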
> From: Tom Lane <tgl@sss.pgh.pa.us>
>
> psql-mail@freeuk.com writes:
> > How do I get the core files to examine? There never seem to be any
> > produced, even outside the debuggers.
>
> Most likely you have launched the postmaster under "ulimit -c 0", which
> prevents core dumps. This seems to be the default state in recent Linux
> releases, for reasons I cannot fathom :-(. I put "ulimit -c unlimited"
> into the postmaster launch script whenever I am working on Linux.
>
> regards, tom lane
>

I have set "ulimit -c unlimited" as you suggested. I then copied postmaster to /home/postgres and ran it as postgres from there, but there are still no core files. Where should they appear? I tried running from the command line and from within gdb and ddd.

Still the same segfaulting problem with to_tsvector(). Is it a problem with tsearch2?

Thanks,
Mat
--
psql-mail@freeuk.com writes:
> I have set "ulimit -c unlimited" as you suggested,
> I then copied postmaster to /home/postgres
> and ran it as postgres from there...
> but still no core files. Where should they appear?

In $PGDATA/base/yourdbnumber/core (under some OSes the file name might be "core" plus a number).

regards, tom lane
> psql-mail@freeuk.com writes:
> > I have set "ulimit -c unlimited" as you suggested,
> > I then copied postmaster to /home/postgres
> > and ran it as postgres from there...
> > but still no core files. Where should they appear?
>
> In $PGDATA/base/yourdbnumber/core (under some OSes the file name might
> be "core" plus a number).
>
> regards, tom lane

I have one core file (yes, just one, despite many duplications of the problem) and I can't get another one to appear. This one came from $PGDATA/db_num as you said, and had .number on the end.

I think the core file is from an instance of postmaster run from within gdb, as it has signal 5 rather than 11. Below is the gdb output. Thank you for your help!

Core was generated by `postgres: mat test [local] SELECT '
First, apologies for the stuff about "I don't understand why there's only one core file". I now have a post-it note saying "ulimit gets reset at reboot" (I assume that's what happened).

So please find below a potentially more useful core file gdb output:

Core was generated by `postgres: mat test [local] SELECT '.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4...done.
Loaded symbols for /usr/lib/libreadline.so.4
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6.1...done.
Loaded symbols for /lib/tls/libm.so.6.1
Reading symbols from /lib/tls/libc.so.6.1...done.
Loaded symbols for /lib/tls/libc.so.6.1
Reading symbols from /lib/ld-linux-ia64.so.2...done.
Loaded symbols for /lib/ld-linux-ia64.so.2
Reading symbols from /usr/local/pgsql/lib/tsearch2.so...done.
Loaded symbols for /usr/local/pgsql/lib/tsearch2.so
#0  SN_create_env (S_size=0, I_size=2, B_size=1) at api.c:6
6           z->p = create_s();
--
psql-mail@freeuk.com writes:
> #0 SN_create_env (S_size=0, I_size=2, B_size=1) at api.c:6
> 6 z->p = create_s();

Hm. Is it possible you're running out of memory? If the crash is right there, and not inside create_s(), it seems like a null return from calloc is the only explanation. This code doesn't seem to be checking for calloc() failure (I wonder why it's not using palloc/pfree anyway).

regards, tom lane
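
To make the suspected failure mode concrete: if calloc() returns NULL and the result is dereferenced straight away (as in a line of the form z->p = ...), the process gets a segmentation fault at exactly that line. The sketch below is not the tsearch2 or Snowball source; demo_env, demo_create_env and demo_close_env are hypothetical names used only to show what checking each calloc() result and unwinding on failure might look like.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a stemmer environment; NOT the real struct. */
struct demo_env {
    char *p;       /* working buffer */
    int  *I;       /* integer slots */
    int   I_size;  /* number of integer slots */
};

/* Allocate an environment, checking every calloc() result.
 * Returns NULL on allocation failure instead of crashing. */
struct demo_env *demo_create_env(int I_size, size_t buf_size)
{
    struct demo_env *z = calloc(1, sizeof *z);

    if (z == NULL)
        return NULL;              /* never dereference a failed allocation */

    /* calloc(0, ...) may legitimately return NULL, so request >= 1 byte */
    z->p = calloc(buf_size ? buf_size : 1, 1);
    if (z->p == NULL) {
        free(z);
        return NULL;
    }

    if (I_size > 0) {
        z->I = calloc((size_t) I_size, sizeof *z->I);
        if (z->I == NULL) {
            free(z->p);
            free(z);
            return NULL;
        }
        z->I_size = I_size;
    }
    return z;
}

/* Release everything demo_create_env() allocated; NULL is accepted. */
void demo_close_env(struct demo_env *z)
{
    if (z == NULL)
        return;
    free(z->I);
    free(z->p);
    free(z);
}
```

Inside a PostgreSQL backend the usual alternative is palloc/pfree, as mentioned above, since palloc reports an error rather than returning NULL.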
The server has 4GB of memory and is not running any other services, so I wouldn't expect there to be memory shortage problems.

The gdb output yesterday was from a postmaster running with the default postgresql.conf. I have since changed the kernel settings to allow 1GB of shared memory and changed shared_buffers in postgresql.conf to request just under that. The core dump for the higher memory allocation gives identical output to the one I posted yesterday.

Thank you for your assistance,
Mat
--
psql-mail@freeuk.com writes:
> The server has 4GB of memory and is not running any other services - so
> I wouldn't expect there to be memory shortage problems.

That has nothing whatever to do with how much memory the kernel will let any one process have. Check what ulimit settings the postmaster is running under (particularly -d, -m, -v).

> I have since changed the kernel settings to allow 1GB of shared mem and
> changed shared_buffers in postgresql.conf to request just under that.

If anything, that's counterproductive for this problem. I'm not sure whether shared memory is counted against your ulimit -d, but it might be.

regards, tom lane
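
As a rough illustration of what those three settings refer to: in bash on Linux, "ulimit -d", "-m" and "-v" correspond to the RLIMIT_DATA, RLIMIT_RSS and RLIMIT_AS resource limits. The sketch below is not from the thread and assumes a Linux system (RLIMIT_RSS is not in POSIX); describe_limit and print_memory_limits are made-up names. Note that getrlimit() reports bytes, whereas the shell's ulimit prints these three limits in kilobytes.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: write "<label>: unlimited" or "<label>: N bytes"
 * for the soft limit of one resource. Returns 0 on success, -1 on error. */
int describe_limit(int resource, const char *label, char *buf, size_t buflen)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        snprintf(buf, buflen, "%s: unlimited", label);
    else
        snprintf(buf, buflen, "%s: %llu bytes", label,
                 (unsigned long long) rl.rlim_cur);
    return 0;
}

/* Print the soft limits that bound how much memory this process may use. */
void print_memory_limits(void)
{
    char line[128];

    if (describe_limit(RLIMIT_DATA, "data segment (ulimit -d)", line, sizeof line) == 0)
        puts(line);
    if (describe_limit(RLIMIT_RSS, "resident set (ulimit -m)", line, sizeof line) == 0)
        puts(line);
    if (describe_limit(RLIMIT_AS, "address space (ulimit -v)", line, sizeof line) == 0)
        puts(line);
}
```

These calls report the limits of the process that makes them, so the values only describe the postmaster if they are read in the same environment that launches it (for example, from the launch script itself rather than from a separate login shell).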
Tom Lane writes:
> That has nothing whatever to do with how much memory the kernel will let
> any one process have. Check what ulimit settings the postmaster is
> running under (particularly -d, -m, -v).

The ulimit settings you requested look OK (others included for info):

ulimit -d, -m, -v: unlimited
-t, -f, -c: all unlimited
-l: 16
-n: 1024
-s: 10240
-u: 8094
--