Re: Postmaster hangs - Mailing list pgsql-bugs
| From | Karen Pease |
|---|---|
| Subject | Re: Postmaster hangs |
| Date | |
| Msg-id | 1256528847.25178.25.camel@localhost.localdomain |
| In response to | Re: Postmaster hangs (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Postmaster hangs, Re: Postmaster hangs |
| List | pgsql-bugs |
kill -9 does kill postmaster (or at least seems to). But I can't figure
out a way to get it restarted without a reboot -- I don't know what I'm
missing. The Fedora postgres restart scripts don't do the trick, and I
couldn't get it to work with pg_ctl either.
kill -9 doesn't work on the locked-up httpd processes, so clearing
those requires restarting the system.
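
A process that survives kill -9 is typically in uninterruptible sleep, which Linux reports as state "D" in /proc/<pid>/stat. The following is a minimal sketch, not part of the original report, for listing such processes on a Linux system. The function names `parse_stat_state` and `uninterruptible_processes` are hypothetical, chosen only for illustration.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) parsed from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    left, right = stat_line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the line was unparsable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

If the stuck httpd or postgres processes show up in this list, that would be consistent with them waiting inside the kernel rather than inside application code, though it does not by itself identify the cause.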
[meme@chmmr]$ cat /proc/version
Linux version 2.6.27.37-170.2.104.fc10.i686
(mockbuild@xenbuilder4.fedora.phx.redhat.com) (gcc version 4.3.2
20081105 (Red Hat 4.3.2-7) (GCC) ) #1 SMP Mon Oct 12 22:01:53 EDT 2009
Postgres is by default in /var/lib/pgsql. When / started running out of
space, I moved it to /scratch and symlinked:
lrwxrwxrwx 1 root root 15 2009-09-11 16:57 pgsql -> /scratch/pgsql//
/ is on md0 and is RAID-1. /scratch is on md1 and is RAID-6:
[meme@chmmr]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md0 64G 42G 18G 71% /
/dev/md1 2.5T 2.2T 239G 91% /scratch
/dev/sdb1 190M 38M 143M 21% /boot
/dev/sde1 190M 86M 95M 48% /boot2
/dev/sdd1 190M 86M 95M 48% /boot3
/dev/sda1 190M 86M 95M 48% /boot4
/dev/sdc1 190M 86M 95M 48% /boot5
tmpfs 1000M 0 1000M 0% /dev/shm
[meme@chmmr]$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid6 sde4[0] sdc4[4] sda4[3] sdb4[2] sdd4[1]
2722005120 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
md0 : active raid1 sde3[0] sdc3[4] sda3[3] sdb3[2] sdd3[1]
67119488 blocks [5/5] [UUUUU]
unused devices: <none>
Both filesystems are ext4.
Thanks for your help!
- Karen
On Sun, 2009-10-25 at 23:13 -0400, Tom Lane wrote:
> Karen Pease <meme@daughtersoftiresias.org> writes:
> > It'll get through about three or four of them (out of hundreds) before
> > it locks up. Now, before lockup, postmaster is very active. It shows
> > up on top. The computer's hard drives clack nonstop. Etc. But once it
> > locks up (without warning), all of that stops. Postmaster does nothing.
> > The computer goes silent. I can't ctrl-break the psql process. If I
> > try to start a new psql process, it won't get past the password prompt
> > -- psql will hang. All Apache processes involving postgres queries
> > hang. The postgres server cannot be restarted by any normal means (the
> > only solution I've found that works is a reboot). And so forth.
>
> This sounds to me like it's a kernel problem, possibly triggered by
> misbehaving disk hardware. What you might try to confirm is a kill -9
> on whichever postgres backend seems to be stuck. If that fails to
> remove the process, then it's definitely a kernel issue --- try googling
> "uninterruptible disk wait" and similar phrases.
>
> The cases that I've run into personally have been due to poor error
> handling for a disk failure condition in a kernel-level disk driver.
> If that's what it is for you, the bottom-level problem might be an
> unreadable disk block somewhere. Or it might just be a garden variety
> kernel bug. What's the platform?
>
> regards, tom lane
>