Thread: Backend died abnormally - postgresql 7.2.1-5

Backend died abnormally - postgresql 7.2.1-5

From: "Rick Eicher II"
Date: 16 July 2002, 10:22:50

Hello all,

I am happy to report that this is the first time I have had a moment of
trouble with postgresql.

I have upgraded to 7.2.1-5 from version 7.1.3-2 (Red Hat RPMs).


I did a pg_dumpall on the old version, installed the new version and
then restored the databases.

The problem seems to be that the backend is dying. This produces the
following errors in the Apache logs.

    ###############################################################
    NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
    DBD::Pg::db disconnect failed: rollback failed at /usr/local/lib/perl5/site_perl/5.6.1//FS/UID.pm line 68.
    NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
    DBD::Pg::db commit failed: begin failed at /usr/local/lib/perl5/site_perl/5.6.1//FS/Record.pm line 270.

#################################################################


I have searched the archives and docs and found some discussion
suggesting that a lack of memory and/or swap space might be the cause.
I see no indication of this in the output of the 'free' command, taken
right after the errors occurred.


################################################################
  [root@nemisis httpd]# free
             total       used       free     shared    buffers     cached
Mem:        512440     504104       8336       1240     130028     303896
-/+ buffers/cache:      70180     442260
Swap:        80284        312      79972

################################################################

1. Does this system need more memory?

2. What should be my next step in finding this problem?

Thank you for your time,
Rick Eicher II

Re: Backend died abnormally - postgresql 7.2.1-5

From: nconway@klamath.dyndns.org (Neil Conway)
Date: 16 July 2002, 10:37:28

On Tue, Jul 16, 2002 at 09:22:47AM -0500, Rick Eicher II wrote:
>     NOTICE:  Message from PostgreSQL backend:
>         The Postmaster has informed me that some other backend
>         died abnormally and possibly corrupted shared memory.
>         I have rolled back the current transaction and am
>         going to terminate your database system connection and exit.
>         Please reconnect to the database system and repeat your query.

>   [root@nemisis httpd]# free
>              total       used       free     shared    buffers     cached
> Mem:        512440     504104       8336       1240     130028     303896
> -/+ buffers/cache:      70180     442260
> Swap:        80284        312      79972

> 1. Does this system need more memory?

Doesn't look like it. In general, it might be wise to use a bit more
swap, but that doesn't appear to be causing the problem.

> 2. What should be my next step in finding this problem?

Is the crash reproducible, and if so, can you post the query or
situation that causes the crash to occur? (you can enable query
logging with debug_print_query in postgresql.conf)
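
As a rough sketch of how one might confirm that the setting is really
active and not still commented out, something like the following could
be used. The helper name active_setting and the file location
$PGDATA/postgresql.conf are assumptions for illustration; a packaged
install may keep the file elsewhere, and where the logged queries end up
depends on how the postmaster's output is redirected.

```shell
# Hypothetical helper: print uncommented assignments of a setting in a
# postgresql.conf-style file. Lines starting with '#' are not matched.
# Usage: active_setting NAME FILE
active_setting() {
    grep -E "^[[:space:]]*$1([[:space:]]|=)" "$2"
}

# Example (not run here; assumes PGDATA is set):
#   active_setting debug_print_query "$PGDATA/postgresql.conf"
```

If this prints nothing, the line is probably still commented out or the
wrong file is being edited.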

Is there a core file in one of your database directories -- and if
so, can you get a backtrace from it using gdb? It might also be
useful to get a backtrace from a debugging build (--enable-debug).

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Re: Backend died abnormally - postgresql 7.2.1-5

From: nconway@klamath.dyndns.org (Neil Conway)
Date: 16 July 2002, 11:04:16

[Cc'ed to -general so that others can help]

On Tue, Jul 16, 2002 at 09:51:48AM -0500, Rick Eicher II wrote:
> > Is the crash reproducible, and if so, can you post the query or
> > situation that causes the crash to occur? (you can enable query
> > logging with debug_print_query in postgresql.conf)
>
> The crash is reproducible. Some examples of a query would be:
>
>    Select * from cust_main where last='smith';
>    Select * from cust_main;

Are there any additional errors in the logs?

With an error that fundamental, I'd suspect hardware problems, namely
bad RAM. Would it be possible to run memtest86 on the machine?

> I do have some join queries, but I seem to get this error with any kind
> of query. If I issue the same query four times, I will get the error
> once. I have uncommented this line (and others) in postgresql.conf but
> do not get any log entries after restart.
>
> >
> > Is there a core file in one of your database directories -- and if
> > so, can you get a backtrace from it using gdb? It might also be
> > useful to get a backtrace from a debugging build (--enable-debug).
>
> No core file.

Are you sure your system is set up to allow core dumps -- i.e.
does "ulimit -c" produce "unlimited"?

Also, make sure you're looking in the right place for
core files ($PGDATA/base/$oid_of_db/core)
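
A minimal sketch of that search, assuming PGDATA points at the data
directory (the helper name find_core_files is made up for this example).
Note that the limit that matters is the one in effect for the process
that started the postmaster, which is not necessarily what an
interactive root shell reports.

```shell
# Hypothetical helper: list core files below a PostgreSQL data directory.
# Core files may be named "core" or "core.<pid>" depending on the kernel.
# Usage: find_core_files [DATA_DIR]   (defaults to $PGDATA)
find_core_files() {
    data_dir="${1:-$PGDATA}"
    find "$data_dir/base" -type f \( -name core -o -name 'core.*' \)
}

# Example (not run here):
#   ulimit -c            # core size limit of the current shell; 0 disables dumps
#   find_core_files      # searches $PGDATA/base/<oid>/ for core files
```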

> Should I get the source and build it instead of using rpms?

Might be a good bet -- at the least, it should produce a more
helpful backtrace.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Re: Backend died abnormally - postgresql 7.2.1-5

From: "Rick Eicher II"
Date: 16 July 2002, 11:34:50

> On Tue, Jul 16, 2002 at 09:51:48AM -0500, Rick Eicher II wrote:
> > > Is the crash reproducible, and if so, can you post the query or
> > > situation that causes the crash to occur? (you can enable query
> > > logging with debug_print_query in postgresql.conf)
> >
> > The crash is reproducible. Some examples of a query would be:
> >
> >    Select * from cust_main where last='smith';
> >    Select * from cust_main;
>
> Are there any additional errors in the logs?
>
> With an error that fundamental, I'd suspect hardware problems, namely
> bad RAM. Would it be possible to run memtest86 on the machine?
>
> > I do have some join queries, but I seem to get this error with any
> > kind of query. If I issue the same query four times, I will get the
> > error once. I have uncommented this line (and others) in
> > postgresql.conf but do not get any log entries after restart.
> >
> > >
> > > Is there a core file in one of your database directories -- and if
> > > so, can you get a backtrace from it using gdb? It might also be
> > > useful to get a backtrace from a debugging build (--enable-debug).
> >
> > No core file.
>
> Are you sure your system is set up to allow core dumps -- i.e.
> does "ulimit -c" produce "unlimited"?

[root@nemisis root]# ulimit -c
1000000

> Also, make sure you're looking in the right place for
> core files ($PGDATA/base/$oid_of_db/core)

I was not looking in the right place before, but I still find no core
files.

> > Should I get the source and build it instead of using rpms?

I am still deciding on the best plan of attack, since this is a
production machine, but I think I will run memtest86 on it first.

> Might be a good bet -- at the least, it should produce a more
> helpful backtrace.
>
> Cheers,
>
> Neil
>
> --
> Neil Conway <neilconway@rogers.com>
> PGP Key ID: DB3C29FC

Re: Backend died abnormally - postgresql 7.2.1-5

From: Tom Lane
Date: 16 July 2002, 12:09:57

There should be more information in the postmaster log than you've shown
us, too.  The messages you reported are all from backends *other* than
the one that actually crashed.  At the very least the log should have
the postmaster's report of an unexpected child death, with a signal
code.

Also, if you can reproducibly provoke the error, try attaching to the
backend process with gdb before you do so.  gdb should be able to give a
backtrace from the crash point even if no core file results.
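
A rough sketch of that procedure, under a few assumptions: backends show
a process title beginning with "postgres:" in ps output, the statistics
helper processes are titled "postgres: stats ...", and the server binary
lives at /usr/bin/postgres. The helper name backend_pids is made up for
this example; adjust the path and filter to match the actual install.

```shell
# Hypothetical helper: reduce `ps -eo pid,args` output to the PIDs of
# PostgreSQL backends. Skips processes assumed to be titled
# "postgres: stats ..." (a database user literally named "stats" would
# be skipped as well).
backend_pids() {
    awk '$2 == "postgres:" && $3 != "stats" { print $1 }'
}

# Example session (not run here; the binary path is an assumption):
#   ps -eo pid,args | backend_pids    # pick the PID of your own connection
#   gdb /usr/bin/postgres <pid>       # attach gdb to that backend
#   (gdb) continue                    # let the backend resume
#   ... issue the query that provokes the crash ...
#   (gdb) bt                          # backtrace once gdb reports the signal
```

Attaching pauses the backend until "continue" is issued, so on a
production machine it is probably best done against a dedicated test
connection.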

            regards, tom lane