Thread: first crash

first crash

From
"Thomas T. Thai"
Date:
I had my first production crash today.

PostgreSQL v7.3
NetBSD/Alpha 1.6

Postmaster was started a few months ago. Things went fine for months.
Then I started noticing psql.core dumps when exiting from psql. On a fresh
startup of postmaster, psql doesn't core dump on exit.

Today postmaster crashed without a core dump. When I tried to start
postmaster up again, it gave me a strange error about using ipcrm to
remove a stale shared memory block.

The main postmaster disappeared from 'ps' but 4 others remained...

Once I removed those stale connections and started up postmaster again,
things were back to normal and psql didn't core dump upon exit (\q).

Is there anything I can do to help track down this issue?

---
$ ps axj | grep postmaster
pgsql   22589     1 16910 a36100    0 S    ??    0:09.87 postmaster: pgsql testdb [local] idle
pgsql   22593     1 16910 a36100    0 S    ??    0:09.70 postmaster: pgsql testdb [local] idle
pgsql   22596     1 16910 a36100    0 S    ??    0:09.77 postmaster: pgsql testdb [local] idle
pgsql   22604     1 16910 a36100    0 S    ??    0:10.43 postmaster: pgsql testdb [local] idle
pgsql   22607     1 16910 a36100    0 S    ??    0:10.14 postmaster: pgsql testdb [local] idle
pgsql    3266  3028  3265 916d80    2 R+   p3    0:00.00 grep postmaster (sh)
$ pg_ctl start -D /var/pgsql/data-7.3
pg_ctl: Another postmaster may be running.  Trying to start postmaster anyway.
Found a pre-existing shared memory block (key 5432001, id 327680) still in use.
If you're sure there are no old backends still running,
remove the shared memory block with ipcrm(1), or just
delete "/var/pgsql/data-7.3/postmaster.pid".
pg_ctl: cannot start postmaster
Examine the log output.
$ kill -HUP 22589 22593 22596 22604 22607
$ ps axj | grep postmaster
---



Re: first crash

From
Tom Lane
Date:
"Thomas T. Thai" <tom@minnesota.com> writes:
> I had my first production crash today.
> PostgreSQL v7.3
> NetBSD/Alpha 1.6

Why are you not running 7.3.2?

> Is there anything I can do to help track down this issue?

Provide stack traces from the core dumps.  If you're not getting core
dumps, you may need to set "ulimit -c unlimited" before launching the
postmaster.  Also, keep in mind that the postmaster itself never does
chdir(), so it would try to dump core in whatever directory you are in
when you launch it.  Make sure this directory is writable by postgres,
or you'll get no core...
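
The two preconditions above (a raised core size limit and a writable launch directory) can be checked before starting the postmaster. What follows is only a minimal sketch: the helper name check_core_setup is hypothetical, and the paths in the usage comment are assumptions taken from the session earlier in the thread.

```shell
# Hypothetical helper: check that a process started from this shell
# is able to leave a core file in the given directory.
check_core_setup() {
    dir=${1:-.}
    # Raise the core size limit for this shell and everything launched from it.
    if ! ulimit -c unlimited 2>/dev/null; then
        echo "warning: could not raise the core file size limit" >&2
    fi
    # The postmaster never calls chdir(), so the launch directory must be writable.
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir is writable"
    else
        echo "problem: $dir is not a writable directory" >&2
        return 1
    fi
}

# Example use (assumed paths; run as the user that owns the postmaster):
#   cd /var/pgsql && check_core_setup . && pg_ctl start -D /var/pgsql/data-7.3
```

The function has to run in the same shell that launches the postmaster, because the limit set by ulimit is inherited only by processes started from that shell.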

Also, did you look to see if there is anything interesting in the
postmaster log?  It could be that this was not a core dump but a panic
exit, in which case the postmaster would have written a complaint to the
log before exiting.
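
If the server output was captured to a file, a quick scan for high-severity messages is one way to look for such a complaint. This is a rough sketch: the function name scan_pg_log and the log path in the usage comment are hypothetical, and it assumes the log lines carry the usual severity prefixes such as PANIC or FATAL.

```shell
# Hypothetical helper: print numbered log lines that mention PANIC or FATAL.
scan_pg_log() {
    grep -n -E 'PANIC|FATAL' "$1"
}

# Example use (assumed log location):
#   scan_pg_log /var/pgsql/postmaster.log
```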

            regards, tom lane

Re: first crash

From
"Tom"
Date:
> "Thomas T. Thai" <tom@minnesota.com> writes:
>> I had my first production crash today.
>> PostgreSQL v7.3
>> NetBSD/Alpha 1.6
>
> Why are you not running 7.3.2?

I just loaded up 7.3.2. Will watch it closely to see.

[...]
> chdir(), so it would try to dump core in whatever directory you are in
> when you launch it.  Make sure this directory is writable by postgres,
> or you'll get no core...

I ran a find starting at / but found no core file for postmaster.

> Also, did you look to see if there is anything interesting in the
> postmaster log?  It could be that this was not a core dump but a panic
> exit, in which case the postmaster would have written a complaint to the
> log before exiting.

Unfortunately I didn't run it with the log option on.

--
Thomas