Re: Postmaster hangs - Mailing list pgsql-bugs
From | Karen Pease |
---|---|
Subject | Re: Postmaster hangs |
Date | |
Msg-id | 1256528847.25178.25.camel@localhost.localdomain |
In response to | Re: Postmaster hangs (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Postmaster hangs
Re: Postmaster hangs |
List | pgsql-bugs |
kill -9 does kill postmaster (or at least seems to).  But I can't figure out a way to get it restarted without a reboot -- I don't know what I'm missing.  The Fedora postgres restart scripts don't do the trick, and I couldn't get it to work with pg_ctl either.

kill -9 doesn't work on the locked up httpd processes.  So that has to have the system restarted.

[meme@chmmr]$ cat /proc/version
Linux version 2.6.27.37-170.2.104.fc10.i686 (mockbuild@xenbuilder4.fedora.phx.redhat.com) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #1 SMP Mon Oct 12 22:01:53 EDT 2009

Postgres is by default in /var/lib/pgsql.  When / started running out of space, I moved it to /scratch and symlinked:

lrwxrwxrwx 1 root root 15 2009-09-11 16:57 pgsql -> /scratch/pgsql//

/ is on md0 and is RAID-1.  /scratch is on md1 and is RAID-6:

[meme@chmmr]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0               64G   42G   18G  71% /
/dev/md1              2.5T  2.2T  239G  91% /scratch
/dev/sdb1             190M   38M  143M  21% /boot
/dev/sde1             190M   86M   95M  48% /boot2
/dev/sdd1             190M   86M   95M  48% /boot3
/dev/sda1             190M   86M   95M  48% /boot4
/dev/sdc1             190M   86M   95M  48% /boot5
tmpfs                1000M     0 1000M   0% /dev/shm

[meme@chmmr]$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid6 sde4[0] sdc4[4] sda4[3] sdb4[2] sdd4[1]
      2722005120 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]

md0 : active raid1 sde3[0] sdc3[4] sda3[3] sdb3[2] sdd3[1]
      67119488 blocks [5/5] [UUUUU]

unused devices: <none>

Both filesystems are EXT-4.

Thanks for your help!

 - Karen

On Sun, 2009-10-25 at 23:13 -0400, Tom Lane wrote:
> Karen Pease <meme@daughtersoftiresias.org> writes:
> > It'll get through about three or four of them (out of hundreds) before
> > it locks up.  Now, before lockup, postmaster is very active.  It shows
> > up on top.  The computer's hard drives clack nonstop.  Etc.  But once it
> > locks up (without warning), all of that stop.  Postmaster does nothing.
> > The computer goes silent.  I can't ctrl-break the psql process.  If I
> > try to start a new psql process, it won't get past the password prompt
> > -- psql will hang.  All Apache processes involving postgres queries
> > hang.  The postgres server cannot be restarted by any normal means (the
> > only solution I've found that works is a reboot).  And so forth.
>
> This sounds to me like it's a kernel problem, possibly triggered by
> misbehaving disk hardware.  What you might try to confirm is a kill -9
> on whichever postgres backend seems to be stuck.  If that fails to
> remove the process, then it's definitely a kernel issue --- try googling
> "uninterruptible disk wait" and similar phrases.
>
> The cases that I've run into personally have been due to poor error
> handling for a disk failure condition in a kernel-level disk driver.
> If that's what it is for you, the bottom-level problem might be an
> unreadable disk block somewhere.  Or it might just be a garden variety
> kernel bug.  What's the platform?
>
> 			regards, tom lane
>
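
As a rough illustration of the "uninterruptible disk wait" check suggested in the quoted reply: on Linux, a process that ignores kill -9 because it is blocked inside the kernel normally shows state "D" in /proc/<pid>/stat. The sketch below is not from the thread; the function names parse_state and uninterruptible_processes are made up for this example, and it assumes a Linux-style /proc layout.

```python
import os


def parse_state(stat_line):
    """Return (pid, command, state) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split on the LAST ')' in the line.
    left, right = stat_line.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    state = right.split()[0]  # first field after the command name
    return int(pid_text), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for processes in state 'D' (uninterruptible sleep).

    proc_root is a parameter only so the function can be pointed at a
    test directory; on a real system the default is what you want.
    """
    stuck = []
    if not os.path.isdir(proc_root):
        return stuck  # no /proc here (not Linux); nothing to report
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "mdstat"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the file was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

If the hung postgres or httpd processes turn up in this list and stay there, that would be consistent with the kernel-level explanation above, though it does not by itself say which disk or driver is at fault.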