Thread: first crash
I had my first production crash today.

PostgreSQL v7.3
NetBSD/Alpha 1.6

Postmaster was started a few months ago and things went fine for months. Then I started noticing psql.core dumps when exiting from psql. On a fresh startup of postmaster, psql doesn't core dump on exit.

Today postmaster crashed without a core dump. When I tried to start postmaster up again, it gave me a strange error about using ipcrm to remove a stale shared memory block. The main postmaster had disappeared from 'ps' but a few others remained...

Once I removed those stale connections and started up postmaster again, things were back to normal and psql didn't core dump upon exit (\q).

Is there anything I can do to help track down this issue?

---
$ ps axj | grep postmaster
pgsql 22589 1 16910 a36100 0 S ?? 0:09.87 postmaster: pgsql testdb [local] idle
pgsql 22593 1 16910 a36100 0 S ?? 0:09.70 postmaster: pgsql testdb [local] idle
pgsql 22596 1 16910 a36100 0 S ?? 0:09.77 postmaster: pgsql testdb [local] idle
pgsql 22604 1 16910 a36100 0 S ?? 0:10.43 postmaster: pgsql testdb [local] idle
pgsql 22607 1 16910 a36100 0 S ?? 0:10.14 postmaster: pgsql testdb [local] idle
pgsql 3266 3028 3265 916d80 2 R+ p3 0:00.00 grep postmaster (sh)

$ pg_ctl start -D /var/pgsql/data-7.3
pg_ctl: Another postmaster may be running. Trying to start postmaster anyway.
Found a pre-existing shared memory block (key 5432001, id 327680) still in use.
If you're sure there are no old backends still running,
remove the shared memory block with ipcrm(1), or just
delete "/var/pgsql/data-7.3/postmaster.pid".
pg_ctl: cannot start postmaster
Examine the log output.

$ kill -HUP 22589 22593 22596 22604 22607
$ ps axj | grep postmaster
---
"Thomas T. Thai" <tom@minnesota.com> writes: > I had my first production crash today. > PostgreSQL v7.3 > NetBSD/Alpha 1.6 Why are you not running 7.3.2? > Is there anything I can do to help track down this issue? Provide stack traces from the core dumps. If you're not getting core dumps, you may need to set "ulimit -c unlimited" before launching the postmaster. Also, keep in mind that the postmaster itself never does chdir(), so it would try to dump core in whatever directory you are in when you launch it. Make sure this directory is writable by postgres, or you'll get no core... Also, did you look to see if there is anything interesting in the postmaster log? It could be that this was not a core dump but a panic exit, in which case the postmaster would have written a complaint to the log before exiting. regards, tom lane
> "Thomas T. Thai" <tom@minnesota.com> writes: >> I had my first production crash today. >> PostgreSQL v7.3 >> NetBSD/Alpha 1.6 > > Why are you not running 7.3.2? I just loaded up 7.3.2. Will watch it closely to see. [...] > chdir(), so it would try to dump core in whatever directory you are in > when you launch it. Make sure this directory is writable by postgres, > or you'll get no core... I did a find starting at / but no core for postmaster. > Also, did you look to see if there is anything interesting in the > postmaster log? It could be that this was not a core dump but a panic > exit, in which case the postmaster would have written a complaint to the > log before exiting. Unfortunately I didn't run it with the log option on. -- Thomas