Re: Problems restarting after database crashed (signal - Mailing list pgsql-general

From Scott Marlowe
Subject Re: Problems restarting after database crashed (signal
Msg-id 1088660143.4056.12.camel@localhost.localdomain
In response to Re: Problems restarting after database crashed (signal  ("Scott Marlowe" <smarlowe@qwest.net>)
List pgsql-general
On Wed, 2004-06-30 at 21:41, Scott Marlowe wrote:
> On Wed, 2004-06-30 at 18:57, Christopher Cashell wrote:
> > Yesterday, while attempting to access a database, I received errors
> > saying that the database was inaccessible.  After investigating a
> > little, I found the following in the PostgreSQL log files:
> >
> > 2004-06-30 08:30:19 [24119] LOG:  checkpoint process (PID 28423) was
> > terminated by signal 11
>
> > Eventually I attempted to shut it down and restart it, however that
> > failed too.  When I attempted to shut it down, I discovered a hung
> > 'startup subprocess' that can't be killed.
> >
> > nexus:~# ps aux | grep postgres
> > postgres 28424  0.0  1.5 16804 3044 pts/313  D    08:35   0:06 postgres:
> > startup subprocess
> > nexus:~# kill -9 28424
> > nexus:~# ps aux | grep postgres
> > postgres 28424  0.0  1.5 16804 3044 pts/313  D    08:35   0:06 postgres:
> > startup subprocess
> > nexus:~#
>
> The combination of a Sig 11 failure and a process stuck in a D state
> makes me lean towards thinking it's bad hardware (CPU or memory).  Have
> you tested this machine?

Oh, and possibly a buggy kernel or kernel module somewhere as well.
I didn't mean to leave that out; I have had problems with some kernels
under heavy parallel loads doing stupid things that look just like this.
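
For context on why the kill -9 above had no effect: signal 11 is SIGSEGV, and the "D" in the ps output means uninterruptible sleep, where the process is blocked inside the kernel and signals (including SIGKILL) are not acted on until that kernel call returns. A minimal sketch of checking for that state on Linux follows. The function names are made up for illustration, and it assumes the /proc/<pid>/stat layout (pid, command name in parentheses, then a one-letter state).

```python
from pathlib import Path


def parse_stat_state(stat_text: str) -> str:
    """Return the one-letter state field from the contents of /proc/<pid>/stat."""
    # The second field (the command name) is wrapped in parentheses and may
    # itself contain spaces or ')' characters, so cut after the LAST ')'
    # instead of splitting the whole line on whitespace.
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def is_uninterruptible(pid: int) -> bool:
    """True if the process is currently in state D (Linux only; reads /proc)."""
    stat_text = Path(f"/proc/{pid}/stat").read_text()
    return parse_stat_state(stat_text) == "D"
```

Calling is_uninterruptible(28424) on the affected machine would be one way to confirm what ps is reporting. A process that stays in D for minutes is usually waiting on I/O or on a driver that never answers, which points at the kernel or hardware rather than at PostgreSQL itself.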

