Thread: Backend died abnormally - postgresql 7.2.1-5
Hello all,

I am happy to report that this is the first time I have had a moment of
trouble with postgresql.

I have upgraded to 7.2.1-5 from version 7.1.3-2 (Redhat rpms). I did a
pg_dumpall on the old version, installed the new version and then
restored the databases.

The problem seems to be that the backend is dying. This gives the
following error in the apache logs.

###############################################################
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
DBD::Pg::db disconnect failed: rollback failed at /usr/local/lib/perl5/site_perl/5.6.1//FS/UID.pm line 68.
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
DBD::Pg::db commit failed: begin failed at /usr/local/lib/perl5/site_perl/5.6.1//FS/Record.pm line 270.
#################################################################

I have searched the archives/docs and see some talk that a lack of
memory and/or swap space might be the cause. I see no indication of this
in the output of the 'free' command, taken right after the errors occurred.

################################################################
[root@nemisis httpd]# free
             total       used       free     shared    buffers     cached
Mem:        512440     504104       8336       1240     130028     303896
-/+ buffers/cache:      70180     442260
Swap:        80284        312      79972
################################################################

1. Does this system need more memory?
2. What should be my next step in finding this problem?

Thank you for your time,
Rick Eicher II
Re: Backend died abnormally - postgresql 7.2.1-5
From
nconway@klamath.dyndns.org (Neil Conway)
Date:
On Tue, Jul 16, 2002 at 09:22:47AM -0500, Rick Eicher II wrote:
> NOTICE: Message from PostgreSQL backend:
>         The Postmaster has informed me that some other backend
>         died abnormally and possibly corrupted shared memory.
>         I have rolled back the current transaction and am
>         going to terminate your database system connection and exit.
>         Please reconnect to the database system and repeat your query.

> [root@nemisis httpd]# free
>              total       used       free     shared    buffers     cached
> Mem:        512440     504104       8336       1240     130028     303896
> -/+ buffers/cache:      70180     442260
> Swap:        80284        312      79972

> 1. Does this system need more memory?

Doesn't look like it. In general, it might be wise to use a bit more
swap, but that doesn't appear to be causing the problem.

> 2. What should be my next step in finding this problem?

Is the crash reproducible, and if so, can you post the query or
situation that causes the crash to occur? (you can enable query
logging with debug_print_query in postgresql.conf)

Is there a core file in one of your database directories -- and if
so, can you get a backtrace from it using gdb? It might also be
useful to get a backtrace from a debugging build (--enable-debug).

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
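As a side note on why the 'free' output above does not point to a memory shortage: most of the "used" figure is buffers and page cache, which the kernel can reclaim on demand. The short sketch below redoes that arithmetic with the numbers from the quoted output (in kB). The variable names are illustrative only, and it assumes the older 'free' layout shown in this thread, where the "-/+ buffers/cache" line is derived from the "Mem:" line.

```shell
#!/bin/sh
# Figures copied from the "Mem:" line of the free output in this thread (kB).
total=512440
used=504104
free_kb=8336
buffers=130028
cached=303896

# Memory actually held by applications: used minus reclaimable buffers/cache.
app_used=$((used - buffers - cached))

# Memory the kernel could hand out: free plus the same reclaimable amount.
app_free=$((free_kb + buffers + cached))

echo "used by applications:      ${app_used} kB"
echo "available to applications: ${app_free} kB"

# Sanity check: the two figures should add up to the total.
if [ $((app_used + app_free)) -eq "$total" ]; then
    echo "consistent with total (${total} kB)"
fi
```

The two results match the "-/+ buffers/cache" line (70180 used, 442260 free), which is why roughly 430 MB of the 512 MB can be treated as available despite only 8336 kB being listed as free.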
Re: Backend died abnormally - postgresql 7.2.1-5
From
nconway@klamath.dyndns.org (Neil Conway)
Date:
[Cc'ed to -general so that others can help]

On Tue, Jul 16, 2002 at 09:51:48AM -0500, Rick Eicher II wrote:
> > Is the crash reproducible, and if so, can you post the query or
> > situation that causes the crash to occur? (you can enable query
> > logging with debug_print_query in postgresql.conf)
>
> The crash is reproducible. Some examples of a query would be:
>
> Select * from cust_main where last='smith';
> Select * from cust_main;

Are there any additional errors in the logs?

With an error that fundamental, I'd suspect hardware problems, namely
bad RAM. Would it be possible to run memtest86 on the machine?

> I do have some joins queries but I seem to get this error with any of
> query. If I issue the same query four times I will get the error one
> time. I have uncommented this line (and others) in postgresql.conf but
> do not get any log entries after restart.
>
> > Is there a core file in one of your database directories -- and if
> > so, can you get a backtrace from it using gdb? It might also be
> > useful to get a backtrace from a debugging build (--enable-debug).
>
> No core file.

Are you sure your system is setup to allow core dumps -- i.e.
does "ulimit -c" produce "unlimited"?

Also, make sure you're looking in the right place for
core files ($PGDATA/base/$oid_of_db/core)

> Should I get the source and build it instead of using rpms?

Might be a good bet -- at the least, it should produce a more
helpful backtrace.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
> On Tue, Jul 16, 2002 at 09:51:48AM -0500, Rick Eicher II wrote:
> > > Is the crash reproducible, and if so, can you post the query or
> > > situation that causes the crash to occur? (you can enable query
> > > logging with debug_print_query in postgresql.conf)
> >
> > The crash is reproducible. Some examples of a query would be:
> >
> > Select * from cust_main where last='smith';
> > Select * from cust_main;
>
> Are there any additional errors in the logs?
>
> With an error that fundamental, I'd suspect hardware problems, namely
> bad RAM. Would it be possible to run memtest86 on the machine?
>
> > I do have some joins queries but I seem to get this error with any of
> > query. If I issue the same query four times I will get the error one
> > time. I have uncommented this line (and others) in postgresql.conf but
> > do not get any log entries after restart.
> >
> > > Is there a core file in one of your database directories -- and if
> > > so, can you get a backtrace from it using gdb? It might also be
> > > useful to get a backtrace from a debugging build (--enable-debug).
> >
> > No core file.
>
> Are you sure your system is setup to allow core dumps -- i.e.
> does "ulimit -c" produce "unlimited"?

[root@nemisis root]# ulimit -c
1000000

> Also, make sure you're looking in the right place for
> core files ($PGDATA/base/$oid_of_db/core)

I was not looking in the right place before. But still no core files
found.

> > Should I get the source and build it instead of using rpms?

Trying to decide what is the best plan of attack since this is a
production machine. But I think I will run the memtest86 on it first.

> Might be a good bet -- at the least, it should produce a more
> helpful backtrace.
>
> Cheers,
>
> Neil
>
> --
> Neil Conway <neilconway@rogers.com>
> PGP Key ID: DB3C29FC
There should be more information in the postmaster log than you've
shown us, too. The messages you reported are all from backends *other*
than the one that actually crashed. At the very least the log should
have the postmaster's report of an unexpected child death, with a
signal code.

Also, if you can reproducibly provoke the error, try attaching to the
backend process with gdb before you do so. gdb should be able to give
a backtrace from the crash point even if no core file results.

regards, tom lane
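The core-file checks and the gdb suggestion from this thread can be strung together as a few shell steps. This is only a rough sketch under stated assumptions: find_cores is a hypothetical helper, the fallback data directory is a guess at the usual Red Hat RPM location and may not match this machine, and the gdb steps are left as comments because the backend PID has to be picked by hand.

```shell
#!/bin/sh
# Hypothetical helper: list files named core* under <datadir>/base,
# i.e. the $PGDATA/base/$oid_of_db/ directories mentioned in the thread.
find_cores() {
    find "$1/base" -type f -name 'core*' 2>/dev/null
}

# The core-size limit is inherited from whatever started the postmaster,
# so it should be checked in that environment, not only in a login shell.
echo "core limit in this shell: $(ulimit -c)"

# Assumption: PGDATA is set; the fallback path is a guess, adjust as needed.
datadir="${PGDATA:-/var/lib/pgsql/data}"
find_cores "$datadir" || echo "could not search $datadir/base"

# Attaching gdb to a live backend (as a user allowed to trace the process):
#   ps ax | grep postgres   # pick the PID of the backend serving your session
#   gdb -p <PID>            # older gdb: gdb /path/to/postgres <PID>
#   (gdb) continue          # then run the failing query from the client
#   (gdb) bt                # after the crash, print the backtrace
```

With a failure rate of roughly one query in four, it may take several attempts against the attached backend before the crash is caught.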