Thread: Postmaster crashes periodically

Postmaster crashes periodically

From: Philip Warner
Probably once per week or so I get a message like:

    Server process (pid 12243) exited with status 11 at Tue Oct 10 01:20:01 2000
    Terminating any active server processes...

Followed by one of these for each backend:

    NOTICE:  Message from PostgreSQL backend:
            The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
            I have rolled back the current transaction and am going to terminate your database system connection and exit.
            Please reconnect to the database system and repeat your query.

I'd be interested in any hints as to how to track this down since I can't
seem to find any core files etc. This is under Linux 2.2.10 & PG 7.0.2. I'm
tempted to start using one postmaster per database (with different ports)
just so I can insulate each database from the others and see if there is a
specific database causing the problem.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: Postmaster crashes periodically

From: Tom Lane
Philip Warner <pjw@rhyme.com.au> writes:
>     Server process (pid 12243) exited with status 11 at Tue Oct 10 01:20:01 2000

> I'd be interested in any hints as to how to track this down since I can't
> seem to find any core files etc. This is under Linux 2.2.10 & PG 7.0.2.

A backend SEGV crash like this (status 11 is signal 11, SIGSEGV) would
normally leave a core file in the $PGDATA/base/whicheverdb/ directory.
If you don't see one, the odds are that the postmaster, and hence the
backend, was launched with a ulimit setting that prohibits dumping core
("ulimit -c 0", I think, but check
your local man pages).  Unfortunately a lot of Linux distros are set up
in such a way that any process launched from a system startup script
runs with exactly that ulimit setting by default.  You might try
tweaking your PG start script to do "ulimit -c unlimited" just before
starting the postmaster.
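
As a rough illustration of that start-script tweak, here is a minimal
sketch. The variable names and the commented-out postmaster line are
placeholders, not taken from any particular distro's script; adapt them
to whatever your start script actually does.

```shell
#!/bin/sh
# Sketch of a start-script fragment (names here are illustrative only).

# Record the core-file size limit inherited from the boot environment.
# A value of "0" means core dumps are disabled.
core_before=$(ulimit -c)

# Try to lift the soft limit. This fails if the hard limit is lower,
# in which case the limit has to be raised earlier in the boot sequence.
if ulimit -c unlimited 2>/dev/null; then
    core_status="raised"
else
    core_status="unchanged"
fi
core_after=$(ulimit -c)

echo "core limit: before=$core_before after=$core_after ($core_status)"

# The postmaster, and every backend it forks, inherits the limit set above.
# Placeholder start command, assuming PGDATA is already set:
# postmaster -D "$PGDATA" >>"$PGDATA/server.log" 2>&1 &
```

If the echoed line still shows after=0, the hard limit is what is
blocking core dumps, and changing the start script alone will not help.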

            regards, tom lane