Re: Problems restarting after database crashed (signal 11). - Mailing list pgsql-general

From: Christopher Cashell
Subject: Re: Problems restarting after database crashed (signal 11).
Msg-id: 20040701023758.GB30122@zyp.org
In response to: Re: Problems restarting after database crashed (signal 11).  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Problems restarting after database crashed (signal 11).  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-general

At Wed, 30 Jun 04, Unidentified Flying Banana Tom Lane, said:
> Christopher Cashell <topher-pgsql@zyp.org> writes:
> > Eventually I attempted to shut it down and restart it, however that
> > failed too.  When I attempted to shut it down, I discovered a hung
> > 'startup subprocess' that can't be killed.
>
> This is interesting because it seems just about exactly like this
> recent Red Hat bug report:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=126885

Hrm.  Yes, it does appear to be a very similar, if not identical, issue.

> As I commented there, I think that it must be a kernel or hardware
> issue --- Postgres itself can surely not make an unkillable process.
> However it's common to see processes that don't respond to kill if
> they are stuck inside a kernel I/O request.  That could mean either
> unresponsive hardware or a kernel bug.

That is somewhat along the lines of what I was thinking, although I have
had no problems like this before.  The machine has been running for over
100 days, and the database as well, without issue.

28424 postgres  18   0 16804 3044  15m D  0.0  1.6   0:06.72 postmaster

Note that it does have a process status of 'D', or uninterruptible
sleep.  That would explain the unkillable part, though I'm curious how
it ended up there.  Unless it just happened to be in a really bad spot
when Postgres segfaulted. . . although I wouldn't expect that to
affect the 'startup subprocess'.
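
For anyone wanting to check whether other processes are stuck in the same
state, here is a minimal Python sketch.  It is not from the original
message: it assumes a Linux /proc filesystem, and the function names
parse_stat and uninterruptible_processes are purely illustrative.  It
reads the state field of each /proc/<pid>/stat file and reports the
processes marked 'D'.

```python
import os


def parse_stat(stat_text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split around the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    command = stat_text[open_paren + 1:close_paren]
    # The single-letter process state is the first field after the command.
    state = stat_text[close_paren + 1:].split()[0]
    return pid, command, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, command, state = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the file could not be parsed
        if state == "D":
            found.append((pid, command))
    return found


if __name__ == "__main__":
    for pid, command in uninterruptible_processes():
        print(pid, command)
```

A process that shows up here on every run, as opposed to briefly during
heavy I/O, is most likely waiting inside the kernel, which fits the
kernel-or-hardware theory quoted above.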

> I wonder whether you have any similarities in hardware or Linux kernel
> to the person who filed the above report?

Here's all the information I can provide for this machine:

 IBM IntelliStation Z Pro
 Model: 6899-12U
 Dual Pentium Pro 200
 192MB RAM
 4.5 GB IBM SCSI HDD
 9 GB IBM SCSI HDD
 6.4 GB WD HDD

 The database resides on the 4.5 GB SCSI, with the pg_xlog directory
 symlinked from there, and actually existing on the 9GB SCSI.

nexus:~$ uname -a
Linux nexus.zyp.org 2.6.4 #1 SMP Thu Mar 11 14:04:49 CST 2004 i686 GNU/Linux
nexus:~$ uptime
21:15:39 up 107 days, 20:57,  7 users,  load average: 2.04, 2.31, 2.38

If there's any other information I can provide, please let me know.

I'm going to reboot the box right now, and cross my fingers, hoping
it'll come back up. ;-)

>             regards, tom lane

--
| Christopher
+------------------------------------------------+
| Here I stand.  I can do no other.              |
+------------------------------------------------+

